Hi,
For some reason, Google has indexed the URLs of submitted forms, complete with weird, randomly renamed variables, e.g. ?c744_name=4e&660c_name=c1&9299_name=97&0402_name=2a
So when the page is unpublished, Google reports over 50 variants of it, inflating and polluting the error report.
If the variable names stayed the same, we could exclude them as URL parameters in Google Webmaster Tools, but they change on every post.
The canonical URL is in place, but it seems to be ignored.
Any solution?
Comments
Comment #1
iva2k commented: Thanks for posting!
The URLs are produced by BOTCHA. The randomly generated codes in the variables come from one of its recipes, which creates unique URLs to deter bot posters. As you noticed, this causes problems for search engine crawlers, which follow every link and blindly accumulate an unbounded number of URL variables.
I can think of two fixes, and both should probably be implemented:
1. Make all BOTCHA-generated links "nofollow", so indexers will not try to unroll the infinite list of links.
2. Remove the URL variable after the link is followed, so it does not stick around in all subsequent URLs. This is good for regular users too, since their URLs will not be polluted with the leftover variable.
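To illustrate fix 2, here is a minimal Python sketch of stripping BOTCHA-style parameters from a URL. It assumes the generated names always look like the ones in the report above (four hex characters followed by `_name`); that pattern and the helper name `strip_botcha_params` are hypothetical, not part of BOTCHA, and a real fix would live in the module's PHP code.

```python
import re
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical pattern: matches names like "c744_name" or "0402_name".
# BOTCHA's actual naming scheme may differ; adjust to what the recipes emit.
BOTCHA_PARAM = re.compile(r"[0-9a-f]{4}_name")


def strip_botcha_params(url: str) -> str:
    """Return the URL with BOTCHA-style query parameters removed."""
    parts = urlsplit(url)
    # keep_blank_values=True so unrelated empty parameters are preserved
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not BOTCHA_PARAM.fullmatch(key)
    ]
    # An empty query makes urlunsplit drop the "?" entirely
    return urlunsplit(parts._replace(query=urlencode(kept)))


if __name__ == "__main__":
    print(strip_botcha_params(
        "https://example.com/node/1?c744_name=4e&660c_name=c1&9299_name=97&0402_name=2a"
    ))
    print(strip_botcha_params("https://example.com/node/1?page=2&c744_name=4e"))
```

Unrelated parameters such as `page=2` survive, so the cleaned URL can still be used for a redirect. Note that `urlencode` may re-encode surviving values slightly differently from the original (for example, spaces become `+`).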
Both options require some work on BOTCHA code. I hope people can contribute patches, as I'm completely booked.
Comment #2
MastaP commented: Thanks for your response, iva2k.
At this point, I am wondering how Googlebot reaches this URL in the first place.
Does it post forms!?
1. As for adding nofollow: I'm not sure adding it to a form action is valid. Your thoughts?
2. I agree this would make URLs cleaner, although I'm not sure it helps with indexing.
Comment #3
iva2k commented: Good question. Googlebot would have to post forms to get URL variables from BOTCHA, and there is no "nofollow" for form post URLs. I have no answer to that. Perhaps googling can help explain it (pun intended).
Comment #4
iva2k commented: I just did some googling:
http://googlewebmastercentral.blogspot.com/2008/04/crawling-through-html...
http://stackoverflow.com/questions/2038228/does-google-follow-buttons-in...
https://www.mattcutts.com/blog/bot-obedience-herding-googlebot
It looks like we should add a "noindex" robots meta tag to all pages that have BOTCHA variables in the URL. It may actually be very straightforward to do in the recipes.
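The decision logic for the noindex approach could look roughly like this Python sketch. It assumes BOTCHA parameter names follow the `c744_name` shape from the original report; the pattern and the helper name `robots_meta_for` are hypothetical. In Drupal, the actual tag would be emitted from the module's PHP code.

```python
import re
from urllib.parse import urlsplit, parse_qsl

# Hypothetical pattern for BOTCHA-generated parameter names (e.g. "c744_name").
BOTCHA_PARAM = re.compile(r"[0-9a-f]{4}_name")

NOINDEX_META = '<meta name="robots" content="noindex">'


def robots_meta_for(url: str) -> str:
    """Return a robots noindex meta tag if the URL carries BOTCHA-style parameters, else ''."""
    query = urlsplit(url).query
    names = (key for key, _ in parse_qsl(query, keep_blank_values=True))
    if any(BOTCHA_PARAM.fullmatch(name) for name in names):
        return NOINDEX_META
    return ""


if __name__ == "__main__":
    print(robots_meta_for("https://example.com/node/1?c744_name=4e&660c_name=c1"))
    print(repr(robots_meta_for("https://example.com/node/1?page=2")))
```

The same directive can also be sent as an `X-Robots-Tag: noindex` HTTP response header, which avoids touching the page markup.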