Problem/Motivation
Drupal's .htaccess and web.config files provide reasonable protection for sensitive files from prying eyes. A FilesMatch directive (Apache) and a rewrite rule (IIS) block access to certain files based on a file name match. The history of these files shows that we only kept adding new patterns and never cleaned them up.
They provide protection for:
- Accessing PHP files directly via the web server (module, install, inc, profile, etc).
- Accessing sensitive static files (twig, yml, po, composer.json, composer.lock, etc).
- Accessing the temporary edit files that Vim and Emacs create for said files (swo, swp, ~, etc).
- Patch residue (.bak, .orig, #foo.php#, etc).
Among these rules, some are outdated and no longer relevant, such as code-style.pl (code-style.pl was removed many years ago).
The regular expression is getting difficult to read, and there is room for some (micro-)optimization.
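To make the four categories above concrete, here is a minimal Python sketch. The pattern is a hypothetical, simplified one assembled from the examples in this summary; it is not the expression shipped in Drupal's .htaccess, and `is_blocked` is an illustrative helper name.

```python
import re

# Hypothetical, simplified pattern covering the four categories listed above.
# It is NOT the pattern shipped in Drupal's .htaccess.
BLOCKED = re.compile(
    r"\.(module|install|inc|profile|twig|yml|po)$"  # PHP and static file extensions
    r"|^composer\.(json|lock)$"                     # exact file names
    r"|\.(swo|swp|bak|orig)$|~$"                    # editor and patch residue
    r"|^#.*#$"                                      # emacs auto-save files such as #foo.php#
)


def is_blocked(file_name: str) -> bool:
    """Return True when the bare file name matches one of the blocked patterns."""
    return BLOCKED.search(file_name) is not None


if __name__ == "__main__":
    for name in ["node.module", "composer.json", "settings.php~", "#foo.php#", "index.php"]:
        print(name, is_blocked(name))
```

Even in this reduced form the alternation mixes extensions, exact names and residue patterns, which is the readability problem described above.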
Proposed resolution
Evaluate the current list of rewrite rules. Remove the ones that we no longer need, and combine/optimize them.
Remaining tasks
* Discuss which components to remove because they are no longer necessary.
* Optimize the rewrite rules to adapt to modern directory structures (the .well-known directory is excluded from ^\..* matching, for example).
* Optimize the regular expression with non-capturing groups.
* Discuss changing the order of the rewrite rules to make them easier to read (file extensions, followed by exact file names, followed by dot files, followed by edit/patch residue, etc).
* Remove file patterns that we block but that are no longer relevant. For example, SVN-related file matches are not necessary because a parent-level dot-directory match will block them. Other than that, Entries.*|Repository|Root|Tag|Template|all-wcprops|entries|format seems to have survived from as early as 2003, but I couldn't find concrete evidence of why they should be blocked, considering we block the .svn directory they are contained in.
* Update .htaccess and web.config configuration.
* Update nginx and other web server documentation as necessary.
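Two of the tasks above, non-capturing groups and the .well-known exclusion, can be sketched in Python. The patterns and helper names below are assumptions made for illustration, not the rules from any patch on this issue. The sketch also shows why separate SVN entries are redundant once dot-directories are blocked.

```python
import re

# Capturing vs non-capturing group: both accept the same names; the second
# form simply does not record the matched extension.
CAPTURING = re.compile(r"\.(engine|inc|install)$")
NON_CAPTURING = re.compile(r"\.(?:engine|inc|install)$")

# Hypothetical dot-path rule: block any path segment starting with a dot,
# except a top-level .well-known directory (negative lookahead).
DOT_PATH = re.compile(r"^\.(?!well-known(?:/|$))|/\.")


def same_verdict(name: str) -> bool:
    """True when both group styles agree on whether the name matches."""
    return bool(CAPTURING.search(name)) == bool(NON_CAPTURING.search(name))


def is_hidden_path(path: str) -> bool:
    """True when the relative path contains a blocked dot-file or dot-directory."""
    return DOT_PATH.search(path) is not None


if __name__ == "__main__":
    print(is_hidden_path(".well-known/acme-challenge/token"))
    print(is_hidden_path("sites/default/.svn/entries"))
```

Under these assumptions, anything inside a .svn directory is already rejected by the dot-path rule, so it needs no file-name entry of its own.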
User interface changes
None.
API changes
None.
The release notes will need to say that we updated the .htaccess file, and the tests need to be updated of course. No other API changes.
Data model changes
None.
| Comment | File | Size | Author |
|---|---|---|---|
| #26 | 3017095-nr-bot.txt | 149 bytes | needs-review-queue-bot |
| #12 | interdiff_9-12.txt | 696 bytes | tatarbj |
| #12 | 3017095-12-htaccess-regex.patch | 2.3 KB | tatarbj |
Issue fork drupal-3017095
Comments
Comment #2
tatarbj
Comment #3
chewie commented
Comment #4
chewie commented
Here is a patch with some regular expression optimizations from my side.
For example, access to trash/backup files is now removed in all cases.
It also removes the records for the mysterious Entries.*|Repository|Root|Tag|Template|all-wcprops|entries|format. If you have any idea why these files need to be kept in .htaccess/web.config, please remind me.
Comment #5
chewie commented
Currently, I have no idea how to make the regular expression more readable. It could be done with the /.../x modifier. Unfortunately, Apache, for instance, does not support PCRE standards (https://httpd.apache.org/docs/2.4/glossary.html#regex); otherwise it would be possible to use syntax like this: https://softwareengineering.stackexchange.com/questions/194975/readable-...
From my side, I don't see much sense for now in protecting every possible file that could contain sensitive data. It is more important to find a way for developers to extend the protection to their own files with sensitive data.
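To illustrate the readability idea behind the /x modifier, here is a sketch using Python's `re.VERBOSE` flag, which plays the same role. It only demonstrates the syntax; the extension list is a shortened, assumed sample, and nothing here implies that the same layout can be pasted into .htaccess or web.config.

```python
import re

# Verbose mode ignores whitespace in the pattern and allows trailing comments,
# so each group of alternatives can be documented in place.
# The extension list is an assumed, shortened sample.
READABLE = re.compile(
    r"""
    \.(?:
        engine | inc | install | module | profile   # PHP code files
      | po | twig | yml                             # static files
    )$
    | ^composer\.(?:json|lock)$                     # Composer metadata
    """,
    re.VERBOSE,
)


def is_protected(file_name: str) -> bool:
    """Return True when the file name matches the commented pattern."""
    return READABLE.search(file_name) is not None


if __name__ == "__main__":
    for name in ["views.module", "composer.lock", "index.php"]:
        print(name, is_protected(name))
```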
Comment #6
chewie commented
Comment #7
chewie commented
Comment #8
ayesh commented
Thanks for the awesome work! It looks like the tests failed due to emacs temp files (/~$/) not being blocked.
- I have updated the match from @Chewie to exclude all files ending with a tilde. These files are almost always temporary files and, regardless of the extension, should be blocked. The current rule does not block all extensions with the tilde; just .module and .php~, which leaves other temp files (.install~, yml~, et al) accessible. Regex tester.
- Because a .lock ending is a common pattern for lock files, I have moved the composer.lock rule to a generic .lock rule.
- Along with .lock extensions, one could think of blocking .pid files, but we'd have bigger problems if there are any .pid files out in the open. I don't think we need to go that far.
- One more improvement would be making the rules case insensitive. For Apache-style FilesMatch, a group that starts like ?i: makes it case insensitive. For IIS, the <match ignoreCase="false" /> does the trick.
Comment #9
ayesh commented
This new one blocks all dot-files in the root except .well-known (which is used by the ACME protocol, the Safari password change URL, etc). I also added tests for said files.
Comment #11
ayesh commented
Comment #12
tatarbj
It seems good to me; my patch only fixes a small typo. I believe it's RTBC as a clean-up issue.
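For readers following the patch discussion, the rules described in #8 (any name ending in a tilde, a generic .lock rule, and case-insensitive extensions) can be sketched in Python roughly as follows. The pattern and the `is_denied` helper are assumptions made for illustration, not the contents of the patch; Python's scoped `(?i:...)` group is used here to show a case-insensitive group.

```python
import re

# Hypothetical rule set:
#   - the extension list, including a generic ".lock", is matched
#     case-insensitively through the scoped (?i:...) group;
#   - any file name ending in a tilde is blocked, whatever its extension.
RULE = re.compile(r"(?i:\.(?:module|install|yml|lock)$)|~$")


def is_denied(file_name: str) -> bool:
    """Return True when the file name falls under one of the sketched rules."""
    return RULE.search(file_name) is not None


if __name__ == "__main__":
    for name in ["node.install~", "composer.lock", "NODE.MODULE", "readme.txt"]:
        print(name, is_denied(name))
```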
Comment #19
fathershawn
"re-rolled" these changes against 9.2.x
Comment #22
draenen commented
The block for Entries.* caused a problem on our site because some PDF files start with "Entries". It took a while to track down why those PDFs were returning 403s, so it would be nice to clean these up if they're not needed. The patch from #12 worked great.
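The false positive described in this comment can be reproduced with a short Python sketch. The alternation is copied from the issue summary; the surrounding anchors and the sample file names are assumptions made for illustration.

```python
import re

# Legacy SVN-era names from the issue summary. The ^...$ anchoring is an
# assumption about how the list is applied to a bare file name.
LEGACY = re.compile(
    r"^(?:Entries.*|Repository|Root|Tag|Template|all-wcprops|entries|format)$"
)


def is_legacy_blocked(file_name: str) -> bool:
    """Return True when the file name is caught by the legacy alternation."""
    return LEGACY.search(file_name) is not None


if __name__ == "__main__":
    # Hypothetical file names: the first is an ordinary upload caught by "Entries.*".
    for name in ["Entries-2021.pdf", "entries", "report.pdf"]:
        print(name, is_legacy_blocked(name))
```

Because "Entries.*" accepts any suffix, an unrelated upload whose name merely starts with "Entries" is rejected along with the SVN metadata files the rule was presumably written for.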
Comment #26
needs-review-queue-bot commented
The Needs Review Queue Bot tested this issue. It either no longer applies to Drupal core, or fails the Drupal core commit checks. Therefore, this issue status is now "Needs work".
Apart from a re-roll or rebase, this issue may need more work to address feedback in the issue or MR comments. To progress an issue, incorporate this feedback as part of the process of updating the issue. This helps other contributors to know what is outstanding.
Consult the Drupal Contributor Guide to find step-by-step guides for working with issues.