I know the module maintainer's stance on this kind of request from the long-closed issue #1238938: Noindex specific node paths and pages, namely that

IMHO, it would not be correct to use the noindex metatag to deal with this.

However, on the same page gisle goes on to say:

Feel free to reopen if you think there exists a use case for this where using permissions to regulate robot access to these paths cannot be done.

So I decided to reopen the request, this time covering all sorts of pages and not only nodes, because today on one of our projects we found that Google had indexed a bunch of pages like comment/reply/30/15, even though paths like comment/* are blocked by robots.txt.

Researching this further I found Google recommends:

To properly prevent your URL from appearing in Google Search results, you should password-protect the files on your server or use the noindex meta tag or response header (or remove the page entirely).

We definitely cannot password-protect pages like comment/reply/30/15, nor can we remove them, so the only option is to inject the noindex meta tag on such pages. That is where this module would be very handy, but unfortunately it concentrates on nodes only.
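For reference, the two mechanisms named in the Google quote above can be sketched as follows. This is only an illustration in Python, not Drupal code: the helper name `add_noindex_meta` and the sample page are hypothetical, while the `robots` meta tag and the `X-Robots-Tag` response header are the standard forms.

```python
# Illustrative only: the two standard ways to ask search engines not to index a page.
NOINDEX_META = '<meta name="robots" content="noindex">'
NOINDEX_HEADER = ("X-Robots-Tag", "noindex")


def add_noindex_meta(html: str) -> str:
    """Insert the robots meta tag just before the first </head>, if there is one."""
    if "</head>" not in html:
        return html  # nothing to attach the tag to; leave the markup untouched
    return html.replace("</head>", NOINDEX_META + "</head>", 1)


# Hypothetical page markup for a URL such as comment/reply/30/15.
page = "<html><head><title>Reply</title></head><body>...</body></html>"
print(add_noindex_meta(page))
print("%s: %s" % NOINDEX_HEADER)
```

Either form should be enough on its own; the header variant has the advantage of also working for responses that are not HTML.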

However, imagine use cases where Drupal sites have lots of custom non-node pages that badly need the noindex tag. Of course, the same effect can be achieved through templates, but extending this module would also be a very good idea. Thanks for your consideration!

Comments

nickonom created an issue. See original summary.

gisle:

First: There is no way this project will be expanded to "cover all sort of possible pages". However, I believe that doing so is in the scope of Metatag – also linked from the project page with the following note:

This project provides a very comprehensive framework setting metatags. Use it if you require capabilities beyond the scope of this project

As for the use case you describe, it applies to comments specifically. You want them to be readable by anonymous visitors, but not indexed by Google and other well-behaved search engines. It is correct that this cannot be blocked by robots.txt, as it only regulates how robots crawl your site, not how it is indexed (so comments can still be indexed if they have a URL and somebody, somewhere links to that URL). So you need to use the noindex metatag to regulate indexing.
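The crawl-versus-index distinction can be illustrated with Python's standard `urllib.robotparser`. This is a minimal sketch: the `Disallow: /comment/` rule and the `example.com` URLs are assumptions modelled on the paths mentioned in this issue, not taken from any actual site.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt rule, modelled on "paths like comment/* are blocked".
robots_txt = """\
User-agent: *
Disallow: /comment/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())


def may_crawl(url: str) -> bool:
    """Answer the only question robots.txt can answer: may a robot fetch this URL?"""
    return parser.can_fetch("*", url)


print(may_crawl("https://example.com/comment/reply/30/15"))
print(may_crawl("https://example.com/node/1"))
```

A disallowed URL is never fetched by a well-behaved robot, so the robot never sees a noindex tag on it either, yet the bare URL can still be listed if other pages link to it. For a noindex tag to be seen, the robot has to be allowed to fetch the page.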

I may accept a patch to accommodate this particular use-case, provided it is to the point and well-written.

So you may first want to look at Metatag and see whether it can solve your problem. While I have a rather minimalist attitude to module management, my impression is that the maintainers of Metatag are much more expansive in what they implement, and they may be more open to patches than I am.

However, I believe the expansive strategy of Metatag comes at a price: Metatag is a behemoth that currently has 153 open bug reports. This project has a very small footprint and 0 open bug reports. I would like to keep it that way.

nickonom:

Metatag is for pages, and because comments are just parts of pages, I don't think the Metatag module can save the day in my use case.

I understand and totally accept your approach of keeping it simple and concentrating on nodes only. I just thought that if a new module for all kinds of pages were to be developed, it would basically do exactly what this module does (inject the noindex tag when necessary), just with the set of cases expanded to "cover all sort of possible pages". So it wouldn't be much different; it would just start incorporating other conditions. Hence my suggestion.

gisle:

I just thought that if a new module for all kinds of pages were to be developed, it would basically do exactly what this module does (inject the noindex tag when necessary), just with the set of cases expanded to "cover all sort of possible pages". So it wouldn't be much different; it would just start incorporating other conditions. Hence my suggestion.

This module works by attaching a noindex-field to a node (i.e. an instance of the "content" entity) that says "noindex this". Extending this to allow this field to be added to other Drupal entities such as comments and users would be fairly trivial and use the same code.

Expanding this to "cover all sort of possible pages", including non-entity based content, special pages and aggregates (such as those produced by Views) would require a very different approach (probably some sort of filter based on the URL) and would probably mean that a total rewrite of the module from scratch would be necessary.

If you want this, and Metatag does not solve it, the best approach is probably to create a new module (e.g. something named "Url Noindex") that filters on URLs. You can of course create that yourself, or hire somebody (including me) to create it.
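To make the URL-filter idea concrete, here is a minimal, language-neutral sketch in Python (not Drupal code, and not how this module works). The pattern list and the function names `should_noindex` and `robots_headers` are hypothetical; matching uses the standard library's `fnmatch`, whose `*` also matches across slashes.

```python
from fnmatch import fnmatchcase

# Hypothetical configuration: site-relative path patterns that should not be indexed.
NOINDEX_PATTERNS = ["comment/*", "user/*/edit"]


def should_noindex(path: str, patterns=NOINDEX_PATTERNS) -> bool:
    """Return True if the request path matches any configured pattern."""
    normalized = path.strip("/")  # treat "/comment/reply/1" and "comment/reply/1" alike
    return any(fnmatchcase(normalized, pattern) for pattern in patterns)


def robots_headers(path: str) -> dict:
    """Extra response headers for a path: X-Robots-Tag when it matches, else none."""
    return {"X-Robots-Tag": "noindex"} if should_noindex(path) else {}


print(robots_headers("comment/reply/30/15"))
print(robots_headers("node/1"))
```

A filter like this decides per request rather than per entity, which is why it would also cover Views pages and other non-entity paths, and why it shares little with a field attached to a node.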

gisle:

Status: Active » Closed (won't fix)

It has been a year, with no new information. Time to close.