Because duplicate content is considered harmful for search engine optimization (Drupal SEO Group), you should take steps to avoid duplication if you use Views. The main tool for this is your robots.txt file (Google Webmasters/Site owners Help -> Dynamic pages, Google Webmasters/Site owners Help -> Duplicate content). If a view has any exposed filters, every filter value shows up as a variable in the URL, so the same content becomes reachable under many different URLs. This is duplicate content and should be controlled via robots.txt (Google Webmasters/Site owners Help -> URL structure). In short, if you use Views, you need to use robots.txt! The following information will help you optimize robots.txt for Google. Other major search engines and robots are covered here... via wikipedia cited source.
I discuss only Google because it is the only search engine that lets you test your robots.txt file against any URL you want (Google Webmasters/Site owners Help -> Checking robots.txt).
First things first: enable clean URLs! Let's say you have a view that exposes the title to the user so they can refine or search the view. The view is now reachable both at its plain URL and at a URL carrying the filter variable, which gives search engines two duplicate versions of that view. This isn't good SEO.
Let's say that view also uses a pager, so you need to allow the page URL variable and disallow every other URL variable. This can be accomplished by adding the following to the robots.txt file in the root of your web server.
# Disallow all URL variables except for page
Disallow: /*?
Allow: /*?page=
Disallow: /*?page=*&*
Disallow: /*?page=0*
I tested this using the robots.txt tool provided in Google's webmaster tools, so it may not work with other search engines.
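To see how the four rules above interact, here is a minimal Python sketch of Google-style matching. It assumes that `*` matches any run of characters, that the longest matching pattern wins, and that Allow wins a tie. The helper names `pattern_to_regex` and `is_allowed` and the sample paths are made up for illustration. This is not Google's actual implementation, so treat it as a way to reason about the rules and not as a replacement for the robots.txt tool.

```python
import re

# The rules from the robots.txt snippet above, in file order.
RULES = [
    ("disallow", "/*?"),
    ("allow", "/*?page="),
    ("disallow", "/*?page=*&*"),
    ("disallow", "/*?page=0*"),
]


def pattern_to_regex(pattern):
    """Turn a robots.txt path pattern into a regex anchored at the path start."""
    anchored = pattern.endswith("$")  # a trailing "$" means "end of URL"
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and let each "*" match any run of characters.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(path, rules=RULES):
    """Assumed precedence: the longest matching pattern wins, Allow wins ties."""
    best = None  # (pattern length, is_allow) of the best match so far
    for kind, pattern in rules:
        if pattern_to_regex(pattern).match(path):
            candidate = (len(pattern), kind == "allow")
            if best is None or candidate > best:
                best = candidate
    # A path that no rule matches may be crawled.
    return True if best is None else best[1]


if __name__ == "__main__":
    for path in ("/my-view", "/my-view?page=2", "/my-view?title=foo",
                 "/my-view?page=2&title=foo", "/my-view?page=0"):
        print(path, "->", "allowed" if is_allowed(path) else "blocked")
```

Under these assumptions only `/my-view` and `/my-view?page=2` come out as allowed. The filtered URLs are blocked, and so is `/my-view?page=0`, which shows the same first page as the plain view URL.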
Multiple views of content
I believe it is better to have only your nodes indexed, or only a single, all-encompassing view. Having both will lead to duplicate content and thus a lower overall page rank. On my site, I use Views in conjunction with Directory to display my multiple taxonomies. The directory therefore contains duplicate content, so I need to keep it from being indexed while still having my nodes indexed. This is where XML Sitemap is key. I have only XML Sitemap, XML Sitemap: Engines and XML Sitemap: Node enabled because I want only my nodes to be submitted. I use pathauto for my taxonomies and nodes, putting my nodes in one directory and my taxonomy terms in another. Then in robots.txt all I have to do is disallow my taxonomy root directory, like this:
# No taxonomy
Disallow: /taxonomy-dir-name
Submitting a sitemap is key; otherwise the search engine may never find everything. The main point is to put your nodes in a directory that can be easily separated from all your views. Allow the nodes, disallow the views.
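The effect of the taxonomy rule can be checked locally with Python's standard `urllib.robotparser`, which handles plain prefix rules like this one. In the sketch below, the `User-agent: *` line, the `example.com` host and the `/node-dir-name` path are assumptions added for illustration; only `/taxonomy-dir-name` comes from the rule above.

```python
from urllib.robotparser import RobotFileParser

# Feed the rules to the parser directly instead of fetching them from a site.
parser = RobotFileParser()
parser.parse([
    "User-agent: *",
    "# No taxonomy",
    "Disallow: /taxonomy-dir-name",
])

# Hypothetical URLs: one taxonomy listing and one node.
taxonomy_url = "http://example.com/taxonomy-dir-name/some-term"
node_url = "http://example.com/node-dir-name/some-post"

print(parser.can_fetch("*", taxonomy_url))  # False: under the disallowed directory
print(parser.can_fetch("*", node_url))      # True: nodes stay crawlable
```

Because the rule is a plain prefix match, it also blocks any other path that merely starts with the same characters, which is one more reason to give the taxonomy directory a name that no node path shares.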
200 returned for a nonexistent path
http://drupal.org/project/modules/google.com <- This should return a 404, but a 200 is given. Potentially a duplicate content penalty!
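One way to find out whether your own views behave like this is to request a few paths that should not exist and look at the status codes. The following is a minimal sketch using only the Python standard library. The function names `http_status` and `find_soft_404s` and the `example.com` URLs are hypothetical, so replace the URLs with a view path on your own site plus a made-up extra path component.

```python
import urllib.error
import urllib.request


def http_status(url):
    """Return the HTTP status code of a GET request to url."""
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urlopen raises for 4xx/5xx responses; the code is on the exception.
        return err.code


def find_soft_404s(urls, get_status=http_status):
    """Return the URLs that answer 200 although they should not exist."""
    return [url for url in urls if get_status(url) == 200]


if __name__ == "__main__":
    # Hypothetical paths that should not exist on the site being checked.
    bogus_urls = [
        "http://example.com/my-view/this-should-not-exist",
        "http://example.com/this-should-not-exist",
    ]
    for url in find_soft_404s(bogus_urls):
        print("Returned 200 but should be a 404:", url)
```

The status lookup is passed in as a parameter so that the check can be tried with a stub before pointing it at a live site.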
Views404 is a module designed to handle this situation.
Here's my "eureka" thread in the Views issue tracker.