Hi,
For some reason, Google indexes the submitted form URLs,
with all the weird variable renaming, e.g. ?c744_name=4e&660c_name=c1&9299_name=97&0402_name=2a

So when the page is unpublished, Google reports over 50 variants of this page, inflating and polluting the error report.
If the variable names remained stable, we could exclude them as URL parameters in Google Webmaster Tools, but they change on every post.

The canonical URL is in place, but it does not seem to be taken into account.

Any solution?

Comments

iva2k’s picture

Thanks for posting!

The URLs are produced by BOTCHA. The randomly generated codes in the variable names come from one of the recipes, which makes the URLs unique in order to deter bot posters. As you noticed, this creates a problem with indexing engines, which follow all links and blindly accumulate an unbounded number of URL variants.

I can think of two things, which probably both should be implemented:

1. Make all BOTCHA-generated links "nofollow", so indexers will not try to unroll the infinite list of links.

2. Remove the URL variables after the link is followed (so they do not stick around in all subsequent URLs). This would be good for regular users too, since their URLs would not be polluted with leftover variables.
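
To illustrate option 2, here is a rough sketch of stripping the recipe variables from a URL. It is Python rather than Drupal/PHP, so it is not BOTCHA code. The pattern (four hex characters followed by "_name") is only an assumption based on the example URL in the original post, and strip_botcha_params is a made-up name.

```python
import re
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumed pattern, guessed from the example URL: four hex characters + "_name".
BOTCHA_PARAM = re.compile(r"^[0-9a-f]{4}_name$")


def strip_botcha_params(url):
    """Return the URL with BOTCHA-looking query variables removed."""
    parts = urlsplit(url)
    # Keep every query variable whose name does not match the assumed pattern.
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if not BOTCHA_PARAM.match(name)
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


if __name__ == "__main__":
    print(strip_botcha_params(
        "http://example.com/contact?c744_name=4e&page=2&660c_name=c1"
    ))
```

A real fix would redirect to the cleaned URL once the form has been processed, and would take the variable names from the recipe itself instead of guessing them with a regular expression.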

Both options require some work on the BOTCHA code. I hope people can contribute patches, as I'm completely booked.

MastaP’s picture

Thanks for your response, iva2k.

At this point, I am wondering how Googlebot ever reaches this URL in the first place.
Does it post forms!?

1- As for adding nofollow: I am not sure adding it to the form action is valid. Your thoughts?

2- I agree this would make it cleaner, although I am not sure it applies here in terms of indexing.

iva2k’s picture

Good question. Googlebot has to post forms to get URL variables from BOTCHA, and there is no "nofollow" for form post URLs. I have no answer to that. Perhaps googling can help to understand it (pun intended).

iva2k’s picture

I just did some googling:
http://googlewebmastercentral.blogspot.com/2008/04/crawling-through-html...
http://stackoverflow.com/questions/2038228/does-google-follow-buttons-in...
https://www.mattcutts.com/blog/bot-obedience-herding-googlebot

Looks like we should add a "noindex" tag to all pages with BOTCHA variables in the URL. It may actually be very straightforward to do in the recipes.
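
Roughly, the check could look like the sketch below. Again, this is Python for illustration and not recipe code. The variable-name pattern is an assumption taken from the example URL in the original post, and both function names are hypothetical.

```python
import re
from urllib.parse import urlsplit, parse_qsl

# Assumed pattern, guessed from the example URL: four hex characters + "_name".
BOTCHA_PARAM = re.compile(r"^[0-9a-f]{4}_name$")


def has_botcha_params(url):
    """True if any query variable name looks like a BOTCHA recipe variable."""
    query = urlsplit(url).query
    return any(
        BOTCHA_PARAM.match(name)
        for name, _ in parse_qsl(query, keep_blank_values=True)
    )


def robots_meta_tag(url):
    """Return a robots noindex meta tag for such pages, else an empty string."""
    if has_botcha_params(url):
        return '<meta name="robots" content="noindex" />'
    return ""


if __name__ == "__main__":
    print(robots_meta_tag("http://example.com/contact?c744_name=4e&660c_name=c1"))
```

The same check could instead drive an "X-Robots-Tag: noindex" response header. In a recipe the variable names are already known, so no pattern matching should be needed there.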