Note: I have reported this issue to the Drupal security team and was advised to create an issue.
Problem/Motivation
Drupal tries to hide its install files via robots.txt. This does not work when Drupal is installed in a subfolder, since search engines only check the root of a domain for robots.txt.
Normally that is only an SEO issue. However, when the server is misconfigured, or when the user has started but not finished the installation process and left a writable settings.php in place, it becomes a security issue for the site. A malicious user could complete the Drupal installation using an external database server and then gain PHP access to the server.
This is not a security issue in Drupal itself, since it requires a mistake by the server admin or site maintainer, but we could save people from being hacked by hiding their vulnerability. You can find vulnerable sites using Google. I won't post the links or sites here in public, but I will send links by mail on request. In total, one can easily find a few dozen sites.
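To illustrate why a root-level robots.txt does not cover a subfolder install, here is a minimal Python sketch using the standard library's urllib.robotparser. The host example.com, the folder name subfolder, and the single Disallow rule are assumptions made for illustration; they are not copied from any real site.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser


def robots_txt_url(page_url):
    """Return the only robots.txt location a crawler consults for page_url."""
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def is_blocked(robots_lines, page_url):
    """Check page_url against robots.txt rules given as a list of lines."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return not parser.can_fetch("*", page_url)


# Hypothetical rule set with a root-relative rule for install.php.
RULES = ["User-agent: *", "Disallow: /install.php"]

root_install = "http://example.com/install.php"
sub_install = "http://example.com/subfolder/install.php"

print(robots_txt_url(sub_install))       # http://example.com/robots.txt
print(is_blocked(RULES, root_install))   # True
print(is_blocked(RULES, sub_install))    # False
```

The rule is matched as a path prefix from the domain root, so /subfolder/install.php is not covered, and a robots.txt placed inside /subfolder/ is never requested.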
Proposed resolution
We cannot protect the user from all possible follies, but vulnerable sites shouldn't be discoverable through a search engine. Google advises using the noindex meta tag or the X-Robots-Tag HTTP header to make sure a page is not indexed:
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=156449
Here is an overview of how well other search engines support the two mechanisms:
http://michaeljaylissner.com/blog/support-for-x-robots-tag-http-header-a...
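As a rough sketch of the two mechanisms mentioned above, the following Python snippet renders the same directives once as a robots meta tag and once as an X-Robots-Tag response header. The function names are hypothetical and this is not the attached patch; it only shows what each variant looks like on the wire.

```python
from html import escape


def robots_meta_tag(directives):
    """Render a robots meta tag for the <head> of an HTML page."""
    content = ", ".join(directives)
    return '<meta name="robots" content="%s" />' % escape(content, quote=True)


def x_robots_header(directives):
    """Return the equivalent HTTP response header as a (name, value) pair."""
    return ("X-Robots-Tag", ", ".join(directives))


print(robots_meta_tag(["noindex", "nofollow"]))
print(x_robots_header(["noindex", "nofollow"]))
```

The meta tag only works for HTML responses, while the header can be attached to any content type, which is the trade-off discussed in this summary.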
Attached is a patch that adds the noindex meta tag. I chose the meta tag because it has the best support among search engines, although the HTTP header would be more flexible for content types other than HTML and would need fewer lines of code. Both approaches work and prevent the site from being indexed. I have set up test sites to validate the patch here:
http://d7-dev-test.a7n.de/
As you can see, only the unpatched site appears in Google:
https://www.google.de/search?q=site:d7-dev-test.a7n.de
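Besides waiting for a search engine to crawl the test sites, the presence of the tag can be checked directly in the page source. The following is a minimal Python sketch, assuming the installer page HTML is already available as a string; the class name, function name, and the two sample pages are made up for illustration.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collect the content of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.contents = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() == "robots":
            self.contents.append(attributes.get("content") or "")


def has_noindex(html_text):
    """Return True if the page asks crawlers not to index it."""
    finder = RobotsMetaFinder()
    finder.feed(html_text)
    return any("noindex" in content.lower() for content in finder.contents)


# Hypothetical page sources standing in for a patched and an unpatched installer.
PATCHED = ('<html><head><meta name="robots" content="noindex, nofollow" />'
           '</head><body>Install</body></html>')
UNPATCHED = '<html><head><title>Install</title></head><body>Install</body></html>'

print(has_noindex(PATCHED))    # True
print(has_noindex(UNPATCHED))  # False
```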
Remaining tasks
Review patch.
User interface changes
None.
API changes
None.
Follow-up issues
Maybe automatically make settings.php non-writable after 24 hours.
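One way the follow-up idea could look is sketched below in Python, purely as an illustration. It assumes that "age" means the time since the file was last modified and that dropping all write permission bits is an acceptable way to lock the file; the function name and both assumptions are hypothetical and not part of any proposed patch.

```python
import os
import stat
import time

WRITE_BITS = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH


def lock_if_stale(path, max_age_seconds=24 * 60 * 60, now=None):
    """Remove all write permission bits from path if it is still writable
    and was last modified more than max_age_seconds ago.

    Returns True when the permissions were changed.
    """
    now = time.time() if now is None else now
    info = os.stat(path)
    is_writable = bool(info.st_mode & WRITE_BITS)
    is_stale = now - info.st_mtime > max_age_seconds
    if is_writable and is_stale:
        # Keep the read/execute bits, clear only the write bits.
        os.chmod(path, stat.S_IMODE(info.st_mode) & ~WRITE_BITS)
        return True
    return False
```

Such a check would have to be triggered by something, for example a periodic job or a page request, which is one of the design questions a follow-up issue would need to settle.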
Comments
Comment #1
s.Daniel commented: One more try.
Comment #3
s.Daniel commented: #1: install.php-noindex-meta-tag-1760330-1.patch queued for re-testing.
Comment #4
cosmicdreams commented: http://support.google.com/webmasters/bin/answer.py?hl=en&answer=93710 provides a more concise description.
Please namespace the key for the installer. The key currently used could easily be trampled on by an unsuspecting dev.
Comment #5
s.Daniel commented: Thanks, I changed the variable and key name to be more descriptive.
Comment #6
sun commented: Great work! :)
1) This already describes what is being done to some extent, but it would be great to have some additional explanation for why this is needed in this comment here (and e.g. cannot be achieved via robots.txt); i.e., let's transfer some more details from the issue summary into the code comment.
2) Minor: Can we add a blank line before and after this code block?
Let's rename the key to 'install_meta_robots'
Comment #7
s.Daniel commented: Thanks for the review. :)
I made the proposed corrections.
Comment #8
sun commented: Thanks! :)
Comment #9
droplet commented: Why nofollow? I'm not raising an SEO concern, but for the purpose of this issue, is it needed here?
Comment #10
sun commented: I guess nofollow is there to prevent the crawler from following any links on the page?
Unless I'm mistaken, noindex only affects the page it is on, but doesn't affect any subsequent resources that can be reached from that page.
In any case, having both is safer, and I don't see a reason to hold up this patch in case noindex might semantically be unnecessary.
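The distinction between the two directives can be written down as a small Python sketch. The function names are hypothetical; the treatment of "none" as shorthand for "noindex, nofollow" follows Google's robots meta tag documentation, and other crawlers may interpret directives differently.

```python
def parse_robots_directives(content):
    """Split a robots meta content value into a set of lowercase directives."""
    return {part.strip().lower() for part in content.split(",") if part.strip()}


def may_index(directives):
    """noindex (or none) asks crawlers to keep this page out of the index."""
    return not ({"noindex", "none"} & directives)


def may_follow(directives):
    """nofollow (or none) asks crawlers not to follow links on this page."""
    return not ({"nofollow", "none"} & directives)


only_noindex = parse_robots_directives("noindex")
both = parse_robots_directives("noindex, nofollow")

print(may_index(only_noindex), may_follow(only_noindex))  # False True
print(may_index(both), may_follow(both))                  # False False
```

With noindex alone, the page itself stays out of the index but links on it may still be followed, which is why the patch sets both.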
Comment #11
s.Daniel commented: Right. For all other internal install pages I see no reason to have Google crawl them, so I think it's better not to let Google follow the links.
We could use the first install page for SEO purposes and link to d.o. Then we could consider removing the nofollow from the page and putting it on individual internal links instead. However, as far as I know, Google might treat that as link spam, which would have a negative effect, and in any case there would be no big positive effect, since Drupal install sites are usually new, with low PageRank and link value. That would be a separate issue with a lot to discuss, though.
Comment #12
catch commented: This seems reasonable. It's silly to have vulnerabilities you can just Google for, a bit like the PHP filter.
Committed/pushed to 8.x. It seems like we could backport this to D7?
Comment #13
jbrown commented: Is it possible to test the installer?
Comment #14
jfhovinne commented: Here is the patch for D7.
Tested installation with the patch applied: the install works as expected and the new meta tag is correctly added to the install pages.
Comment #14.0
jfhovinne commented: typo
Comment #15
parthipanramesh commented: Tested. Looks fine to me and this is a good approach to solve the "issue".
Comment #16
David_Rothstein commented: Why is the backported D7 patch so different from the D8 patch (including missing all those code comments)?
The D8 patch literally applies directly to D7 (just without the core/ directory) and seems to work fine, so I'm posting that instead and will commit it if it passes tests.
Comment #17
David_Rothstein commented: Committed to 7.x - thanks! http://drupalcode.org/project/drupal.git/commit/cb7127c