I'm fairly certain that Google will reindex the same page once for every value in a drop-down list if you use one as an exposed filter. I'm suggesting that

# URL Variables
disallow: /*list
disallow: /*of
disallow: /*drop
disallow: /*down
disallow: /*url
disallow: /*variables

be added to robots.txt, or at least that a warning be added somewhere so web admins know how to handle the issue. Might spiders try other values in other URL variables as well? I'm guessing at the robots.txt syntax; is it correct?

Similar yet different issue
#280281: Views pager duplicates content

Comments

mikeytown2’s picture

Or would you do something like this:

Disallow: /path/of/view?
Allow: /path/of/view
Allow: /path/of/view?page=*

Or this

Disallow: /*?
Allow: /*?page=

when using clean URLs.

Having Views insert a "noindex" meta tag might be the answer.

EDIT:
http://www.google.com/support/webmasters/bin/answer.py?answer=76329&hl=en
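
The "noindex" idea could look roughly like the sketch below. It is only an illustration in Python, not how Views works: `robots_meta_tag` and its `allowed_params` argument are made-up names, and the assumption is that any query parameter other than `page` marks a filtered duplicate.

```python
from urllib.parse import urlsplit, parse_qsl

def robots_meta_tag(url, allowed_params=("page",)):
    """Hypothetical helper: return a noindex tag for filtered URLs, else ''."""
    query = urlsplit(url).query
    # Collect the parameter names, keeping ones with empty values too.
    names = [name for name, _ in parse_qsl(query, keep_blank_values=True)]
    # Any parameter besides the pager is treated as an exposed filter.
    if any(name not in allowed_params for name in names):
        return '<meta name="robots" content="noindex" />'
    return ""

print(robots_meta_tag("http://example.com/path/of/view?sort=asc"))
```

With this rule, /path/of/view and /path/of/view?page=2 get no tag, while anything carrying a filter value does.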

mikeytown2’s picture

Actually, now that I think about it, I'm disabling anything that creates a URL variable until we get robots.txt figured out. Duplicate content is a killer. ?page= is fine; everything else isn't.

mikeytown2’s picture

https://www.google.com/webmasters/tools/dashboard

  1. Click on your URL, or add it.
  2. Go to Tools.
  3. Choose "Analyze robots.txt".
  4. Find the "Test URLs against this robots.txt file" box.
  5. Type in some test URLs.

Once I get some feedback on this, I'll add it to the handbook page.

This is what I came up with:

# URL Variables
Disallow: /*?
Allow: /*?page=
Disallow: /*?page=*&*
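
For anyone who wants to check these three rules offline, here is a rough Python sketch. It assumes Google-style matching: `*` matches any run of characters, the longest matching pattern wins, and Allow wins a tie. Other crawlers may resolve conflicts differently, and `is_allowed` and `pattern_to_regex` are names made up for this example.

```python
import re

# The three rules from the robots.txt above, in order.
RULES = [
    ("disallow", "/*?"),
    ("allow", "/*?page="),
    ("disallow", "/*?page=*&*"),
]

def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex."""
    anchored = pattern.endswith("$")  # a trailing '$' anchors the end
    if anchored:
        pattern = pattern[:-1]
    # Each '*' becomes '.*'; everything else is matched literally.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def is_allowed(path, rules=RULES):
    """Assumed precedence: longest pattern wins, Allow wins ties."""
    matches = [
        (len(pattern), kind == "allow")
        for kind, pattern in rules
        if pattern_to_regex(pattern).match(path)
    ]
    if not matches:
        return True  # no rule applies, so crawling is allowed
    # Tuples sort by length first, then True (allow) above False.
    return max(matches)[1]

for path in ["/view", "/view?page=2", "/view?sort=asc",
             "/view?page=2&sort=asc", "/?page=0"]:
    print(path, "->", "allowed" if is_allowed(path) else "blocked")
```

Under those assumptions, plain paths and ?page= URLs stay crawlable, while any other query string, or a pager combined with another variable, is blocked.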

halisemre’s picture

I have a question.
If I use
Disallow: /*?
Allow: /*?page=
Disallow: /*?page=*&*

http://www.mysite.com/?page=1 is ok
http://www.mysite.com/?page=2 is ok

But what about

http://www.mysite.com/?page=0
It is the same as http://www.mysite.com/, so it effectively duplicates the front page.

Is there a way to eliminate this problem?

mikeytown2’s picture

I updated the handbook page:
http://drupal.org/node/345620

Another way is to not link to page=0, or to 301-redirect it.
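
A minimal Python sketch of the 301 idea, assuming page=0 always shows the same content as the URL without it. `redirect_target` is a hypothetical helper; how it would be hooked into Drupal or the web server is not shown here.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def redirect_target(url):
    """Return the URL to 301 to, or None if no redirect is needed."""
    parts = urlsplit(url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    # Drop only the redundant first-page marker; keep everything else.
    kept = [(k, v) for k, v in params if not (k == "page" and v == "0")]
    if len(kept) == len(params):
        return None  # nothing was removed, so serve the page as-is
    return urlunsplit(parts._replace(query=urlencode(kept)))

print(redirect_target("http://www.mysite.com/?page=0"))
```

Here http://www.mysite.com/?page=0 maps to http://www.mysite.com/, while ?page=1 and higher are left alone.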

merlinofchaos’s picture

Category: bug » support
Status: Active » Fixed

How is this a bug? Views has no control over the robots.txt. Looks like you guys have it sorted, anyhow.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.