robots.txt is part of the core distribution. I think it should be something like robots.txt.example or similar, so that we do not have to update it or change it.

See Programming: Never Hack Core and Site Building: Never Hack Core.

#11 drupal-495608-11.patch8.62 KBtim.plunkett
PASSED: [[SimpleTest]]: [MySQL] 29,429 pass(es). View
#5 robotstxt.patch9.23 KBRobLoach
PASSED: [[SimpleTest]]: [MySQL] 29,429 pass(es). View


RobLoach’s picture

Title: robots.txt is part of core, breaks "never hack core"-principle » "Never hack core"-principle broken by robots.txt
Version: 6.0 » 7.x-dev
Status: Needs review » Active
Issue tags: +robots.txt, +Don't Hack Core

In Drupal's current state, in order to add stuff to robots.txt, one must either modify robots.txt or delete the file and use the RobotsTXT module. Adding custom entries to robots.txt is common practice on almost any site, so telling people to "never hack core" makes absolutely no sense here.

In order to make this sane, requests to /robots.txt should output the standard robots.txt content. Instead of this being a static file, however, it would be output from a variable/hook. Note that this should also work when mod_rewrite is unavailable.
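Roughly something like this (just a sketch to illustrate the idea; the function and hook names are placeholders, not a final API):

```php
/**
 * Page callback for /robots.txt (name is a placeholder).
 *
 * Collects lines from every module implementing hook_robotstxt(),
 * lets other modules alter them, and returns the result as plain text.
 */
function drupal_get_robotstxt() {
  $content = module_invoke_all('robotstxt');
  drupal_alter('robotstxt', $content);
  return implode("\n", $content);
}
```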

seutje’s picture

I like this idea, but since it's not a bug and it does involve some changes, it doesn't seem feasible for Drupal 7.

Damien Tournoud’s picture

Version: 7.x-dev » 8.x-dev
Category: task » feature

Agreed with #2.

Dave Reid’s picture

Yar, I be supporting renaming the file to example.robots.txt although I'd love to get it as an actual hook_robotstxt() and hook_robotstxt_alter() in core.

RobLoach’s picture

Status: Active » Needs work
Issue tags: +delivery callback
9.23 KB
PASSED: [[SimpleTest]]: [MySQL] 29,429 pass(es). View

This patch does a few things...

  • Leaves robots.txt where it is so if the server does not have Clean URLs, it will still get the default robots.txt
  • When Clean URLs are active, however, it'll send the request over to Drupal to handle
  • Uses hook_robotstxt() and hook_robotstxt_alter() to construct the robots.txt
  • Tries to output the text via hook_menu's delivery callback (not working)

Anyone know how $page['#theme_wrappers'] works?

  // Search engine control.
  $items['robots.txt'] = array(
    'page callback' => 'drupal_get_robotstxt',
    'access callback' => TRUE,
    'type' => MENU_CALLBACK,
    'delivery callback' => 'drupal_deliver_txt_page',
  );

I guess we should base this on ajax_deliver() instead?

Also, should there be a variable that hook_robotstxt() checks before grabbing from the file for the default value?

+ * Implements hook_robotstxt().
+ */
+function system_robotstxt() {
+  // Cache the robots.txt content from the file system.
+  $robotstxt = &drupal_static(__FUNCTION__, array());
+  if (empty($robotstxt)) {
+    if ($cache = cache_get(__FUNCTION__)) {
+      $robotstxt = $cache->data;
+    }
+    else {
+      // Check the robotstxt variable first before falling back to the file contents.
+      $robotstxt = variable_get('robotstxt', file(realpath('robots.txt'), FILE_IGNORE_NEW_LINES));
+      cache_set(__FUNCTION__, $robotstxt);
+    }
+  }
+  return $robotstxt;
+}

NikLP’s picture

Sounds like a heap of good ideas, +1 from me.

Josh The Geek’s picture

+++ modules/system/txt.tpl.php	1 Jan 1970 00:00:00 -0000
@@ -0,0 +1,25 @@
+// $Id: html.tpl.php,v 1.6 2010/11/24 03:30:59 webchick Exp $

No $Id$ after the Great Git Migration (tggm). Can you reroll this patch with Git? Also, it was the wrong $Id$ anyway: if you copy a file with an $Id$, you change it back to $Id$ from its expanded form.

There should also probably be a system_robotstxt() like you suggested that contains the usual defaults. Should a test be included? +1 to the whole idea.


catch’s picture

Subscribing. Increasingly I'd like us to stop supporting non-clean URLs, at least for things that are only needed on production sites. Then we wouldn't need double logic for so much stuff.

Regardless, this seems like a good plan.

RobLoach’s picture

Every time I put together a site with a staging or multisite setup, I always hit this. Once again, going to add it to my hit list.

j0nathan’s picture


tim.plunkett’s picture

Status: Needs work » Needs review
8.62 KB
PASSED: [[SimpleTest]]: [MySQL] 29,429 pass(es). View

Reroll with git.

RobLoach’s picture

I'm still not sure about drupal_deliver_txt_page(). Is there a better/cleaner way to output just text in Drupal?

Also, this is interesting: #1032234: Use Robots Meta Tag rather than robots.txt when possible

pillarsdotnet’s picture

jeremyr’s picture

Would there be a way to drop the robots.txt file into the respective sites/ folder for a multi-site setup? Each site may need its own unique file, and it just makes sense to have customizations in the same folder as settings.php.

I'm currently facing this issue with an existing set of D6 sites.

j0nathan’s picture

A solution as described in comment #14 would also benefit Aegir, which hosts multiple sites on a single platform.

RobLoach’s picture

> Would there be a way to drop the robots.txt file into the respective sites/ folder for a multi-site setup? Each site may need its own unique file, and it just makes sense to have customizations in the same folder as settings.php.

Although that does sound handy, I think it's something we should pass off to contrib to handle. First thing is getting hook_robotstxt() in. Then the Robots.txt module for Drupal 8 could worry about loading in additional robots entries from the sites directories.
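Contrib could handle that with the same hook, something like this (untested sketch, module name made up; conf_path() returns the active site's directory in D7):

```php
/**
 * Implements hook_robotstxt() for a hypothetical contrib module.
 *
 * Appends entries from sites/<site>/robots.txt when that file exists.
 */
function robotstxt_sites_robotstxt() {
  $file = conf_path() . '/robots.txt';
  return file_exists($file) ? file($file, FILE_IGNORE_NEW_LINES) : array();
}
```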

andypost’s picture

I suppose we can't get this patch in without dropping non-clean URL support. So first, robots.txt should be moved to example.robots.txt, and only after this patch lands could we start making clean URLs a requirement.

Also I'd like to point out that the system module is not a good place for robots, given #679112: Time for system.module and most of includes to commit seppuku

EDIT: Also let's fix #180379-45: Fix path matching in robots.txt

lpalgarvio’s picture

neat :)

Does a contrib module really have to exist? Can this be merged into D8 core? A GUI makes sense.

joachim’s picture


I just saw a patch to a contrib module which recommends that users add lines to robots.txt, and that got me thinking -- surely this should be done with a hook_robotstxt ;)

> Also I'd like to point to that system module is not a good place for robots

Should we move it to a robotstxt.module?

joachim’s picture

Title: "Never hack core"-principle broken by robots.txt » generate robots.txt from a hook so users don't have to hack core to change it

Better title.

pillarsdotnet’s picture

Title: generate robots.txt from a hook so users don't have to hack core to change it » Move all or part of robotstxt module into core.

How is the patch in #5 different from the RobotsTxt module?

joachim’s picture

Neat, I didn't know about that!

Looking at that project page, I'd say this:

> and gives you the chance to edit it, on a per-site basis, from the web UI

which isn't in the patch. IMO that can stay in contrib.

joestewart’s picture

A little related info, hopefully useful. Aegir currently looks in the site files directory for a robots.txt and falls back to the one in Drupal root. Apache commit:

#1173954: Support for per-site robots.txt

andypost’s picture

If core could run as a service, or without the node module, I think this functionality should live in a module.
Having example.robots.txt makes no sense because it brings more questions in the forums.
Probably core could ship with a default set of rules, but the UI can live in contrib, as the token module does.

andypost’s picture

Hey, it seems nobody is working on this, so maybe move this issue to D9?

lpalgarvio’s picture

Seems to be the wisest decision.

klonos’s picture

Should we at the very least:

1. rename the file to default.robots.txt or example.robots.txt
2. require the same copy-rename procedure that we require for default.settings.php during installation (could be automated if no robots.txt exists already).

1. would prevent any custom file from being overwritten with each update.
2. would ensure that a robots.txt file always exists.

RobLoach’s picture

Title: Move all or part of robotstxt module into core. » Move parts of robotstxt module into core.

> Hey, it seems nobody is working on this, so maybe move this issue to D9?

As long as we get the patch up to par, then it might still be able to get in.

> How is the patch in #5 different from the RobotsTxt module?

It attempts to use Drupal's rendering engine rather than outputting text and exiting the process.

> Should we move it to a robotstxt.module?

Introducing a robotstxt.module to Drupal core could be an option. The current patch sticks it directly into system.module, and we all know system.module is already pretty large.

Questions left to get this patch up to par:

  1. How does one "properly" output a Drupal-generated text file in Drupal 8?
  2. Do we stick it into a robotstxt module in Drupal core, or stick it directly into system.module?
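For 1., the cleanest thing I can think of in D8 is a controller that returns a bare Symfony Response instead of a render array. A rough sketch (class, namespace, and collection mechanism are all placeholders, not a settled design):

```php
namespace Drupal\robotstxt\Controller;

use Symfony\Component\HttpFoundation\Response;

/**
 * Hypothetical controller for /robots.txt.
 */
class RobotsTxtController {

  /**
   * Serves the generated robots.txt as text/plain.
   */
  public function content() {
    // However the entries end up being collected (hook, event, service).
    $lines = \Drupal::moduleHandler()->invokeAll('robotstxt');
    return new Response(implode("\n", $lines), 200, array('Content-Type' => 'text/plain'));
  }

}
```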
andypost’s picture

Status: Needs review » Needs work
+++ b/includes/common.inc
@@ -218,6 +218,22 @@ function drupal_get_profile() {
+function drupal_get_robotstxt() {

@@ -2543,6 +2559,116 @@ function drupal_deliver_html_page($page_callback_result) {
+function drupal_deliver_txt_page($page_callback_result) {

+++ b/modules/system/system.module
--- /dev/null
+++ b/modules/system/txt.tpl.php

Maybe it's better to introduce this as a core service?

RobLoach’s picture

Status: Needs work » Active
andypost’s picture

Status: Active » Needs work

The only way to do this in D8 core is a route with a controller.

RobLoach’s picture

Issue tags: +Needs reroll

Likely needs a reroll, and a switch over to a controller. robots.txt has been bugging me since the Drupal 5 days. Would love to get it out of the root so that we don't have to deal with patch workflows.

andypost’s picture

Yes, the controller should get $request to allow fine-tuning of the hook results for each search bot.

Albert Volkman’s picture

Version: 8.x-dev » 9.x-dev
Issue tags: -Needs reroll

Moving to 9.x.

Albert Volkman’s picture

Issue summary: View changes

Reference "Never Hack Core" docs.

mc0e’s picture

Issue summary: View changes

Why was this moved back to 9.x-dev? Seems like it's a few major versions overdue already, and should be given higher priority than that.

catch’s picture

Version: 9.x-dev » 8.1.x-dev
andypost’s picture

Version: 8.1.x-dev » 8.0.x-dev
Category: Feature request » Task

So the 8.x version of the robotstxt module works now; it makes sense to discuss at least the approach...

Answers to #28:
The module just sends alterable strings (see the implementation); all we need is to configure proper caching.

@catch I think that's a task with BC:
1) rename robots.txt to example.robots.txt, as we have for .gitignore
2) add a controller and route with proper caching + reading of the example file or config
3) leave the contrib module to swap the controller and provide a UI
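For 2), the route could look something like this (sketch only; the module and controller names are placeholders):

```yaml
# robotstxt.routing.yml (hypothetical module name)
robotstxt.content:
  path: '/robots.txt'
  defaults:
    _controller: '\Drupal\robotstxt\Controller\RobotsTxtController::content'
  requirements:
    _access: 'TRUE'
```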

lpalgarvio’s picture

Version: 8.0.x-dev » 8.1.x-dev
marcingy’s picture

Version: 8.1.x-dev » 8.2.x-dev

Should be 8.2, as 8.1 is feature frozen.