I am trying to use XML Sitemap with Domain Access. My site has 58 virtual subdomains. The sitemap appears to generate the way I need it to: it generates all the entries at maindomain.com.
However, when Google attempts to crawl the URLs, it returns errors saying that the URL subdomain.maindomain.com/nodename is not allowed for this sitemap location. All of the content I have indexed is permitted to be viewed either on its subdomain or on the main domain, so somehow the URL that Google is seeing is on a subdomain, hence the error. If you could provide any insight into how to manage this so that Google does not see or attempt to crawl the subdomains, I would really appreciate it. I have scoured Google looking for answers, and it is not a heavily addressed subject. Ideally, I would like to block access to any subdomains through robots.txt and have Google crawl and index my nodes on the main domain.
Thanks
Comments
Comment #1
avpaderno commentedIf you don't have it installed, you can try RobotsTxt, which is intended for cases like yours. With it installed, XML Sitemap adds a line to robots.txt that declares the authoritative sitemap for the site.
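For reference, the directive in question is the standard `Sitemap:` line from the sitemaps.org protocol; it would look roughly like this (maindomain.com is a placeholder for your base domain):

```
Sitemap: http://maindomain.com/sitemap.xml
```

Note that a sitemap declared at this location may only contain URLs on that same host, which is why the crawler rejects subdomain URLs.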
Comment #2
1959mvp commentedI looked into the RobotsTxt module, but I am using Domain Access, not multi-site; there is a major difference. Domain Access serves content to subdomains based on permissions, using wildcard DNS, so the subdomains are virtual. It seems that XML Sitemap is correctly building the sitemap using the base domain for all the URLs, but the crawler is somehow able to crawl the subdomains and returns the error "URL not allowed for a sitemap at this location". I need to figure out how to block the crawlers from accessing content on the subdomains.
Thanks,
Comment #3
avpaderno commentedI don't think there is an answer for such a particular situation. I am going to close the support request.
Comment #4
Anonymous (not verified) commentedThis isn't a function of this project. You'll probably need to modify the .htaccess rules or the httpd conf rules. You may need different IP addresses for each virtual domain. I'm not all that proficient with the virtual-domain process, but based on what Google Webmaster states, a different IP address may help.
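One way to approach the .htaccess route, sketched here under the assumption of Apache with mod_rewrite enabled (maindomain.com and robots-subdomain.txt are placeholder names, not anything this project ships), is to serve a blocking robots.txt on every virtual subdomain while the main domain keeps its normal one:

```apacheconf
# Sketch only: serve a different robots.txt on virtual subdomains.
# Assumes Apache with mod_rewrite; maindomain.com is your base domain.
RewriteEngine On
# Any host that is not maindomain.com (or www.maindomain.com)...
RewriteCond %{HTTP_HOST} !^(www\.)?maindomain\.com$ [NC]
# ...gets robots-subdomain.txt instead of the regular robots.txt.
RewriteRule ^robots\.txt$ /robots-subdomain.txt [L]
```

where robots-subdomain.txt simply disallows everything:

```
User-agent: *
Disallow: /
```

With this in place, crawlers see "Disallow: /" on every subdomain but the normal robots.txt (and sitemap) on the main domain, without needing separate IP addresses.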
Comment #6
szy commentedSo, is there a working solution for Drupal 6?
What I can see now is that:
- node URLs in the sitemap do not contain the proper domain, only the base one,
- a single node shows up in every domain's sitemap, regardless of its Domain Access settings,
- I can't find db_rewrite_sql in the XML Sitemap module's code.
I remember there was a patch for Drupal 5, but I can't find any solution that works properly on D6. Is there one? Don't you use sitemaps on your D6 sites?
Szy.
Comment #7
Anonymous (not verified) commentedOpen a new support request.