Closed (won't fix)
Project:
Apache Solr Search
Version:
6.x-1.x-dev
Component:
Miscellaneous
Priority:
Normal
Category:
Task
Assigned:
Unassigned
Issue tags:
Reporter:
Created:
9 Feb 2009 at 15:28 UTC
Updated:
19 Dec 2011 at 18:22 UTC
On our production site with Apache Solr, we decided to turn off indexing of search/* altogether in robots.txt, because:
* spidering was putting a heavy load on our site
* Google results included many (many!) irrelevant (IMO) results for page after page of facet combinations
We also added an XML sitemap to make Google index us correctly.
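A quick way to confirm that a robots.txt rule really blocks the faceted URLs is Python's standard urllib.robotparser module. This is only a minimal sketch: the rule set and both URLs below are made-up examples, not copied from our site. The rule is written as a plain /search/ prefix because, as far as I know, the standard-library parser does prefix matching and does not expand wildcards.

```python
from urllib.robotparser import RobotFileParser

# Assumed rule set for illustration; a real robots.txt has more entries.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search/
"""


def is_crawlable(path, robots_txt=ROBOTS_TXT, agent="Googlebot"):
    """Return True if `agent` may fetch `path` under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


# Hypothetical URLs: one faceted search page, one ordinary content page.
FACET_URL = "/search/apachesolr_search/drupal?filters=tid:12"
NODE_URL = "/node/1"

if __name__ == "__main__":
    for url in (FACET_URL, NODE_URL):
        print(url, "crawlable" if is_crawlable(url) else "blocked")
```

Running the same check against a site's actual robots.txt contents would show whether the facet pages are covered by the rule.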
I think we should give some thought to the very real possibility of spamming search engines through add/remove facet links.
This article provides some quick pointers and links to other sources: http://www.netconcepts.com/faceted-navigation-article/
Comments
Comment #1
pwolanin commented
Yes, I agree this is critical, but it is probably something we need to address as a core patch.
Comment #2
greggles
The core robots.txt disallows search by default.
Google's webmaster guidelines recommend this as well:
Comment #3
greggles
My comment wasn't clear on my recommended course of action: I'd say this is 'by design' or 'won't fix'.
Comment #4
janusman commented
I was unaware that robots.txt disallowed /search/*.
Marking won't fix.
Comment #5
cpliakas commented
Related discussion posted against Facet API at #1370342: Implement a setting to add "rel=nofollow" to facet links. This gets more complex with D7, where searches can reside outside of the "search/*" path. In addition.
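To illustrate the rel=nofollow idea, here is a rough sketch in Python rather than Drupal code. It is not how Facet API implements it. The function name facet_link and the SEARCH_PREFIXES setting are hypothetical; the prefixes are configurable because D7 searches can live outside "search/*".

```python
from html import escape
from urllib.parse import urlsplit

# Hypothetical setting: path prefixes under which search pages live.
SEARCH_PREFIXES = ("/search/",)


def facet_link(href, text, search_prefixes=SEARCH_PREFIXES):
    """Render an anchor tag, adding rel="nofollow" for search/facet URLs."""
    path = urlsplit(href).path
    attrs = f'href="{escape(href, quote=True)}"'
    if path.startswith(tuple(search_prefixes)):
        # Ask crawlers not to follow links into facet combinations.
        attrs += ' rel="nofollow"'
    return f"<a {attrs}>{escape(text)}</a>"


if __name__ == "__main__":
    print(facet_link("/search/apachesolr_search/drupal?filters=tid:12", "Drupal"))
    print(facet_link("/node/1", "About"))
    # A site with a custom search path passes its own prefixes.
    print(facet_link("/products/shoes", "Shoes", search_prefixes=("/products/",)))
```

Note that nofollow is only a hint on the link itself; it does not replace the robots.txt rule, which is what keeps crawlers off the pages when they arrive from elsewhere.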
Comment #6
cpliakas commented
Older issue posted against Faceted Search: #197783: Module makes database balloon in size - avoid logging the guided searches.
Comment #7
cpliakas commented
Also posted an issue against the SEO Checklist module: #1376398: Validate SEO approach taken by Facet API and determine if it would be worth adding an item to the checklist