Install
Works with Drupal: ^10 || ^11
Alternative installation files
Download tar.gz
149.68 KB
MD5: 6e9361e490e5630bb77112ce8d815228
SHA-1: 6cfa17a2830e8cd27359875cfcba29df16686b17
SHA-256: 8834f4c13ff6b013ddbab406a688f7089eb633b0e077bc865c4b265ff671a1fa
Download zip
183.34 KB
MD5: 4de81726fb1623cc1a87b21fbfeb7925
SHA-1: f34c241143b19a8e0af7eb64a83bbe448652c2a6
SHA-256: 346ad1705789ae54ce4d9de6b8ec6d2bd3c4cf45239577b8e8c0ce2105743ea2
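To confirm a downloaded archive matches the published SHA-256 value, you can hash it locally. The sketch below is a minimal example using Python's standard hashlib module; the helper names (sha256_of, verify) are hypothetical and not part of the module. Composer installs do not need this step.

```python
import hashlib

# SHA-256 values published above for the 1.8.0 archives.
EXPECTED_SHA256 = {
    "tar.gz": "8834f4c13ff6b013ddbab406a688f7089eb633b0e077bc865c4b265ff671a1fa",
    "zip": "346ad1705789ae54ce4d9de6b8ec6d2bd3c4cf45239577b8e8c0ce2105743ea2",
}


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read fixed-size chunks until read() returns b"" (end of file).
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, kind: str) -> bool:
    """Compare a downloaded archive ("tar.gz" or "zip") with its published checksum."""
    return sha256_of(path) == EXPECTED_SHA256[kind]
```

For example, verify("opensolr_search-1.8.0.zip", "zip") should return True for an intact download; the file name here is only illustrative.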
Release notes
Opensolr Search 1.8.0
Added
- Crawler Settings UI: a new admin section to configure parallel threads, request delay (0.1–10 s), and crawl mode (1–6) per site. Settings are saved in Drupal config and passed to the Opensolr crawler API on every crawl.
- Multilingual crawling support: confirmed working with 7 languages, 10 threads, and zero language mismatches. Sitemap-only crawling (mode 5) ensures that only URLs listed in the sitemap are indexed, preventing pollution from locale variants.
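The ranges listed for the Crawler Settings UI can be expressed as a simple validation step. This is a sketch only: the field names (threads, delay_s, mode) and the plan_max_threads parameter are hypothetical, and the lower bound of 1 thread is an assumption; only the 0.1–10 s delay and 1–6 mode ranges come from these notes. The module itself implements this in Drupal, not Python.

```python
from dataclasses import dataclass
from typing import List

# Ranges from the 1.8.0 release notes; names are hypothetical.
MIN_DELAY_S = 0.1
MAX_DELAY_S = 10.0
VALID_MODES = range(1, 7)  # crawl modes 1-6


@dataclass(frozen=True)
class CrawlerSettings:
    threads: int    # parallel threads (assumed minimum of 1)
    delay_s: float  # request delay in seconds
    mode: int       # crawl mode, e.g. 5 = sitemap-only


def validate(settings: CrawlerSettings, plan_max_threads: int) -> List[str]:
    """Return a list of validation errors; an empty list means the settings are acceptable."""
    errors = []
    if not 1 <= settings.threads <= plan_max_threads:
        errors.append(f"threads must be between 1 and {plan_max_threads}")
    if not MIN_DELAY_S <= settings.delay_s <= MAX_DELAY_S:
        errors.append(f"delay must be between {MIN_DELAY_S} and {MAX_DELAY_S} seconds")
    if settings.mode not in VALID_MODES:
        errors.append("mode must be between 1 and 6")
    return errors
```

Checking on the server side, rather than relying only on the JavaScript cap in the form, matches the "Request delay validation" fix below: a crafted form submission cannot bypass it.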
Fixed
- Wrong language indexed for multilingual pages. Root cause: the crawler replaced the requested URL with the page's canonical URL but kept the original page's HTML content. When Drupal's canonical tag pointed locale variants to a primary locale (e.g. /ro-md/page → canonical /en-gb/page), Romanian content was indexed under the en-gb URI. Fix: if the canonical URL differs from the crawled URL, the page is skipped entirely; the canonical URL is indexed when the crawler reaches it directly.
- Crawler link leaking in modes 4–6: HTML link extraction at level 1 discovered locale-variant URLs from language switchers, navigation links, and RSS feeds, polluting the crawl queue. Fix: modes 4–6 now extract links only at level 0 (the start URL itself). Sitemaps are still followed at any depth, and RSS feeds are no longer treated as sitemaps.
- Flush to Solr stuck at the end of a crawl: when multiple worker processes exited simultaneously, a race condition made pidcount detection unreliable, leaving buffered documents unflushed. Fix: every exiting worker now flushes the shared buffer, which is safe because Solr upserts by document ID.
- Request delay validation: server-side validation prevents saving values below 0.1 seconds.
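The two crawler rules described in the first two fixes can be sketched as per-page decisions. This is an illustration under stated assumptions, not the Opensolr crawler's code: the function names are hypothetical, the URL normalization (lowercased host, no fragment, no trailing slash) is assumed, and treating modes 1–3 as extracting links at every level is an assumption because these notes do not describe those modes. Sitemap following, which still happens at any depth, is not modeled.

```python
from typing import Optional
from urllib.parse import urljoin, urlsplit

# Per 1.8.0, modes 4-6 extract HTML links only from the start URL (level 0).
LEVEL0_ONLY_MODES = {4, 5, 6}


def normalize(url: str) -> str:
    """Assumed normalization: lowercase scheme/host, drop fragment and trailing slash."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/") or "/"
    query = f"?{parts.query}" if parts.query else ""
    return f"{parts.scheme.lower()}://{parts.netloc.lower()}{path}{query}"


def should_index(crawled_url: str, canonical_url: Optional[str]) -> bool:
    """Skip a page whose canonical points elsewhere; that URL is indexed when crawled directly."""
    if not canonical_url:
        return True
    # Resolve relative canonical hrefs against the crawled page's URL.
    absolute = urljoin(crawled_url, canonical_url)
    return normalize(absolute) == normalize(crawled_url)


def should_extract_links(mode: int, level: int) -> bool:
    """In modes 4-6 only level 0 contributes HTML links; other modes assumed unrestricted."""
    if mode in LEVEL0_ONLY_MODES:
        return level == 0
    return True
```

With these rules, a crawl of /ro-md/page whose canonical is /en-gb/page indexes nothing for that request, so Romanian HTML can no longer end up under the en-gb URI.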
Improved
- Sitemap chunking: reduced from 5000 to 500 URLs per sitemap file for better crawler performance.
- Parallel Threads default: the field defaults to the account maximum, and JavaScript caps it at the user's plan limit. The field description mentions firewall IP whitelisting.
- Crawl status labels: "Indexed Pages" renamed to "Discovered Pages" for accuracy.
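Splitting a URL list into sitemap files of at most 500 entries can be illustrated as follows. This is a generic sketch of the chunking idea using the standard sitemaps.org urlset format, not the module's actual generator; the function names are hypothetical.

```python
from typing import Iterator, List, Sequence
from xml.sax.saxutils import escape

URLS_PER_SITEMAP = 500  # value in 1.8.0 (previously 5000)


def chunk_urls(urls: Sequence[str], size: int = URLS_PER_SITEMAP) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` URLs, one per sitemap file."""
    for start in range(0, len(urls), size):
        yield list(urls[start:start + size])


def render_sitemap(urls: List[str]) -> str:
    """Render one sitemap file; escape() turns characters such as & into XML entities."""
    entries = "".join(f"<url><loc>{escape(u)}</loc></url>" for u in urls)
    return ('<?xml version="1.0" encoding="UTF-8"?>'
            '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
            f"{entries}</urlset>")
```

For 1,201 URLs this produces three files of 500, 500, and 201 entries; with the old limit of 5000, the same site would fit in a single, larger file.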