This code should be removed from robots.txt:
# Paths (clean URLs)
Disallow: /admin/
Disallow: /comment/reply/
Disallow: /node/add/
Disallow: /search/
Disallow: /user/register/
Disallow: /user/password/
Disallow: /user/login/
Disallow: /user/logout/
# Paths (no clean URLs)
Disallow: /?q=admin/
Disallow: /?q=comment/reply/
Disallow: /?q=node/add/
Disallow: /?q=search/
Disallow: /?q=user/password/
Disallow: /?q=user/register/
Disallow: /?q=user/login/
Disallow: /?q=user/logout/
This code should be added to these pages instead:
<meta name="robots" content="noindex">
There are two main reasons for doing this.
- First, without a "noindex" meta tag, search engines may still list these pages in search results; they just won't crawl them.
- Second, by allowing robots to follow links on these pages, the PageRank of the linked pages (primary menu, blocks) will increase.
See: http://www.seomoz.org/blog/serious-robotstxt-misuse-high-impact-solutions
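The distinction the article draws can be demonstrated with Python's standard-library `urllib.robotparser` (an illustrative sketch, not part of the proposed patch): a `Disallow` rule only governs whether a polite crawler fetches a path; it says nothing about whether the URL gets indexed.

```python
# Sketch: robots.txt Disallow rules control *crawling*, not indexing.
# A disallowed URL can still appear in search results if other pages link to it.
import urllib.robotparser

rules = """\
User-agent: *
Disallow: /admin/
Disallow: /user/login/
"""

parser = urllib.robotparser.RobotFileParser()
parser.parse(rules.splitlines())

# Polite crawlers will skip these paths entirely...
print(parser.can_fetch("*", "/admin/"))       # False
print(parser.can_fetch("*", "/user/login/"))  # False
# ...while ordinary content remains crawlable.
print(parser.can_fetch("*", "/node/123"))     # True
```

Because the crawler never fetches the disallowed page, it also never sees any "noindex" meta tag placed there, which is exactly the conflict this issue is about.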
Comment | File | Size | Author |
---|---|---|---|
#15 | Bildschirmfoto 2017-06-09 um 16.31.15.png | 120.85 KB | nodestroy |
#15 | Bildschirmfoto 2017-06-09 um 16.31.05.png | 140.47 KB | nodestroy |
Comments
Comment #1
yonailo commented: subscribing
Comment #2
yonailo commented: subscribing
Comment #3
RobLoach commented: Interesting... Would this use both robots.txt and the meta tag, or just the meta tag?
Related: #495608: Move parts of robotstxt module into core.
Comment #4
j0nathan commented: Subscribing.
Comment #5
pillarsdotnet commented
Comment #6
userok commented: By allowing those links to be crawled, wouldn't that impact site performance?
E.g., on every instance of /comment/reply, the page needs to be crawled first in order to reach the 'noindex' meta tag.
I'm a bit hazy about indexing, so I could be completely wrong.
Comment #7
Roger34 commented: I am not sure whether this post belongs on this page; since it did not get a reply elsewhere (http://drupal.org/node/22265), I am posting it here:
I use the default robots.txt in Drupal 6.22, but Google Webmaster Central's performance overview shows that prohibited directories are also being accessed. Do you suggest I would be better off adding the paths via meta tags? I have the following listed in Google Webmaster Central under example page loading times:
/admin/content/add 1.9
/node/add/story 2.3
/node/add/article 3.1
/rss.xml 0.6
/node/15008/edit 2.2
/admin/settings 0.9
/admin/reports/status 1.6
/admin/reports/status/run-cron 120.01
In addition to Disallow: /admin/, do I also need to specify, for example, /admin/reports/status/run-cron? I am sure no one wants the crawler to spend 120 seconds on cron.
I also do not want Google to crawl node/add/.
Would appreciate a reply.
Comment #8
ar-jan commented: I like this idea. Re #6: yes, I think so; the page would have to be crawled. So apart from any possible performance impact, for very large sites (thousands of pages) this would mean the crawler spending time on irrelevant /comment/reply pages, crawling time that should be spent on actual content. (Unless noindex pages are 'free' as far as crawling effort is concerned?)
@Roger: no, this is not the place to ask; this is an issue about changing the way Drupal core works. But: the default robots.txt prevents search engines from even crawling those pages, and everything under /admin/ is already disallowed by that line, so you shouldn't have to add that particular cron page. Better check that your robots.txt is accessible. For further questions, head back to the forums or IRC.
Comment #8.0
philbar commented: Update for clarity
Comment #9
jhedstrom commented: Moving to 8.1.
Comment #10
iantresman commented: Just a note that I think some of the paths in the robots.txt file appear to be incorrect and do not block the required pages. See "robots.txt paths incorrect".
Comment #14
chapf commented: Today, after checking the access logs of a Drupal-based website I administer, I was rather surprised to see Googlebot constantly crawling some of the pages listed in the stock robots.txt file.
So I found this issue and then read some of the documentation Google provides for its crawler; it clearly states that robots.txt is not a reliable way to prevent a page from being indexed or listed in the search results. Basically useless for the purpose many people still assume it fulfills! (Source: https://support.google.com/webmasters/answer/6062608?hl=en)
Now, I don't mean to be annoying, but is there any concrete plan to move forward on this problem, or should I look into contrib modules? That seems wrong to me, since this little file is core functionality, and there are also other open issues for robots.txt listed above.
Comment #15
nodestroy commented: From my point of view, we should completely remove those URLs from robots.txt and replace them with an X-Robots-Tag implementation, if that's possible with compatibility in mind.
Listing a specific page in robots.txt is no guarantee that it will be kept out of the index; see the example below from drupal.org.
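A minimal sketch of the X-Robots-Tag approach suggested here (hypothetical helper and path list, not Drupal's actual API): the server attaches the header to responses for paths that should stay out of the index, which works for any response type, not just HTML.

```python
# Hedged sketch: attach "X-Robots-Tag: noindex" to responses for paths
# that must not be indexed. The prefixes mirror the stock robots.txt entries.
NOINDEX_PREFIXES = ("/admin/", "/user/login/", "/user/register/", "/comment/reply/")

def add_robots_header(path, headers):
    """Append the noindex header when the request path matches a prefix."""
    if path.startswith(NOINDEX_PREFIXES):
        headers.append(("X-Robots-Tag", "noindex"))
    return headers

print(add_robots_header("/user/login/", [("Content-Type", "text/html")]))
# [('Content-Type', 'text/html'), ('X-Robots-Tag', 'noindex')]
print(add_robots_header("/node/1", [("Content-Type", "text/html")]))
# [('Content-Type', 'text/html')]
```

Unlike a meta tag, the header could also cover non-HTML resources (PDFs, feeds), which is one reason X-Robots-Tag is often preferred for this kind of blanket rule.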
Comment #16
no2e commented: @nodestroy: robots.txt is used to prevent crawling, not indexing. Your screenshots show exactly this: the page is indexed, but not crawled (hence the search result snippet refers to robots.txt as the reason why no relevant snippet could be shown). X-Robots-Tag (and the equivalent meta robots tag) prevents indexing, not crawling; for bots to notice it, they have to crawl the page, of course.
In both cases, "prevent" is of course not meant in a technical sense. Both mechanisms rely on the bot and the search engine being polite.
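The indexing side of no2e's distinction can be sketched with the standard-library `html.parser` (illustrative only): a search engine must fetch and parse the page before it can see a robots meta tag, which is why the tag controls indexing but can never control crawling.

```python
# Sketch: extract the directives from a <meta name="robots"> tag.
# A bot only sees these after fetching the page, i.e. after crawling it.
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collect the directives of any <meta name="robots"> tag."""
    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name", "").lower() == "robots":
            self.directives.extend(
                d.strip() for d in attrs.get("content", "").split(","))

page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
parser = RobotsMetaParser()
parser.feed(page)
print(parser.directives)  # ['noindex', 'follow']
```

Note the "noindex, follow" combination matches the proposal at the top of this issue: the page is kept out of the index while its links still pass PageRank.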