After #1702486: Change behaviour of profile listings: new sort order and/or block from robots.txt has been addressed to some degree, I wonder if we should block further listing pages from robots as well.

These listing pages are all expensive to generate and do not provide much info by themselves.

All nodes are linked to from the tracker, which is generated more efficiently and is not as costly.

The obvious idea is to block all paged pages except the tracker.

We could even keep the first page of each paged listing in the index, i.e. block all pages that match /node?page= but not /node itself.
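
A rough sketch of what such a rule could look like in robots.txt, assuming Googlebot's wildcard extension (the * pattern is not part of the original robots.txt standard, so other crawlers may ignore it):

  User-agent: *
  # Block paginated listings such as /node?page=1 while /node itself stays crawlable.
  Disallow: /*?page=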

I am not an SEO expert and I am not sure if this change would have any consequences.

Please discuss.

Comments

killes@www.drop.org

I've had a look at the stats Google provides about the number of indexed pages on d.o.

Last August, there were about 5 million indexed pages, and the number was roughly constant. Then there was a jump to 9 million in early November, followed by a steady increase until about June, topping out at over 18 million indexed pages, and then a slight decrease to about 14 million.

This is obviously nuts; we don't really have that much content. If we add up user profiles and nodes, we have fewer than 2.5 million pages.

The bloated number is probably due to all these paged link lists. There are about 200k results alone for

site:drupal.org/taxonomy

Ben Finklea

The question is how deeply Googlebot would have to delve to find an arbitrary node. Being linked only from the profile pages may not be ideal (I'm not sure).

If any of those nodes are more than 5-6 levels deep from the home page (or they're not linked from anywhere, i.e. they're orphans), then it's best to either leave things as they are or create a sitemap so that Googlebot will find them.
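
If we did go the sitemap route, it is just an XML file listing canonical URLs; a minimal sketch, with an illustrative node URL:

  <?xml version="1.0" encoding="UTF-8"?>
  <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <url>
      <loc>https://drupal.org/node/123456</loc>
    </url>
  </urlset>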

Then there is the fly-by-the-seat-of-your-pants option: look at how many pages are indexed and at the daily traffic, ban the bot, wait a week, and check again. Kieran would kill me for recommending this... but if things are as you describe, it shouldn't have any effect at all.

Also consider that there are other ways to handle performance issues. That's not my area, though.

killes@www.drop.org

Ben, _all_ nodes are linked through the tracker page: https://drupal.org/tracker?page=37704

The tracker would be the only paged listing that we would not disallow in robots.txt, to ensure that all nodes can still be found.
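
Sticking with the robots.txt approach, the tracker exception could be sketched roughly like this (untested; it relies on Google treating the longer, more specific Allow rule as taking precedence over the shorter Disallow rule):

  User-agent: *
  # Paginated tracker URLs stay crawlable; other ?page= URLs are blocked.
  Allow: /tracker?page=
  Disallow: /*?page=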

Another option would be to add a "noindex" header to the paged pages.
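
For reference, the noindex variant would mean emitting either an HTTP header or a meta tag on the paginated responses, for example:

  X-Robots-Tag: noindex, follow

or, in the page head:

  <meta name="robots" content="noindex, follow" />

Using "noindex, follow" would keep the listings out of the index while still letting the crawler follow their links, which a robots.txt disallow would not.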

My concern is not so much performance, but index bloat on Google's side, which they might think is something we did deliberately.

apaderno

Project: Drupal.org site moderators » Drupal.org infrastructure
Component: Site organization » Other
Issue summary: View changes
drumm

Status: Active » Closed (won't fix)

We haven’t been having problems with crawlers that pay attention to robots.txt lately.