On our production site with Apache Solr, we decided to disallow crawling of search/* altogether in robots.txt, because:

* spidering was putting a heavy load on our site
* Google results included many (many!) irrelevant (IMO) results for page after page of facet combinations

We also added an XML sitemap to make Google index us correctly.
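For illustration, the robots.txt change and sitemap described above might look roughly like this (the sitemap URL is a placeholder, not our actual production file):

```
# Block crawlers from search result pages and their facet combinations
User-agent: *
Disallow: /search/

# Point crawlers at the canonical sitemap instead
Sitemap: https://example.com/sitemap.xml
```

Note that robots.txt rules are prefix matches, so `Disallow: /search/` covers every facet-combination URL under that path without needing a wildcard.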

I think that we should think a bit about what to do about the very real possibility of spamming search engines through add/remove facet links.

This article provides some quick pointers and links to other sources: http://www.netconcepts.com/faceted-navigation-article/

Comments

pwolanin’s picture

Yes, I agree this is critical, but it's probably something we need to address as a core patch.

greggles’s picture

The core robots.txt disallows search by default.

Google's webmaster guidelines recommend this as well:

Use robots.txt to prevent crawling of search results pages or other auto-generated pages that don't add much value for users coming from search engines.

greggles’s picture

My comment wasn't clear on my recommended course of action: I'd say this is 'by design' or 'won't fix'.

janusman’s picture

Status: Active » Closed (won't fix)

I was unaware robots.txt disallowed /search/*.

Marking won't fix.

cpliakas’s picture

Related discussion posted against Facet API at #1370342: Implement a setting to add "rel=nofollow" to facet links. This gets more complex in D7, where searches can reside outside of the "search/*" path.
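For illustration, the setting proposed in #1370342 would render facet links roughly like this (the path, query string, and label here are hypothetical, not Facet API's actual output):

```html
<!-- Hypothetical facet link with rel="nofollow" so crawlers
     are discouraged from following every facet combination -->
<a href="/search/site/laptops?f[0]=brand" rel="nofollow">Brand (12)</a>
```

Unlike a robots.txt disallow, rel="nofollow" travels with the link itself, so it also covers searches living outside the "search/*" path in D7.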

cpliakas’s picture