Problem/Motivation

Following discussion with the Drupal Security Team, it was agreed that this could be handled in a public "security improvements" issue.

At present Drupal's file API allows filenames to be created which could be dangerous if they're not handled safely. This is not a directly exploitable vulnerability, but improvements could be made that would reduce the likelihood of filenames being used as part of a chained attack.

Command injection is a specific concern here.

https://owasp.org/www-community/attacks/Command_Injection

https://portswigger.net/web-security/os-command-injection

Steps to reproduce

In some cases, browsers will escape/encode certain characters in a normal file upload, but it may be possible to avoid that escaping using a tool like Burp Suite, or perhaps via web services (rest / jsonapi).

An example of a dangerous filename which I believe a normal file field will currently accept is:

foo";echo `whoami`; #.txt

A fairly recent improvement to filename handling (which we could build upon) is described in this Change Record:

https://www.drupal.org/node/2972665

Proposed resolution

  • Always remove/replace specific characters that may be used for command injection e.g. " ; # |` and if possible ' &.
  • Disallow spaces by default in filenames (makes it quite a lot harder to achieve meaningful command injection).

Remaining tasks

  • Implement improvements.
  • Add tests (e.g. in \Drupal\Tests\file\Functional\SaveUploadTest ).
  • Ensure that improvements also apply to web services and other uses of the API if possible.

User interface changes

Default filename handling will change - notably:

  • Some "special characters" - mostly punctuation - will be removed from filenames on upload.
  • Spaces in filenames will be removed / replaced by default, but this remains configurable.

Introduced terminology

n/a?

API changes

Changes to filename handling may represent an API change.

Data model changes

n/a?

Release notes snippet

tbc

Issue fork drupal-3516706

Command icon Show commands

Start within a Git clone of the project using the version control instructions.

Or, if you do not have SSH keys set up on git.drupalcode.org:

Comments

mcdruid created an issue. See original summary.

mcdruid’s picture

Issue summary: View changes
mcdruid’s picture

Issue summary: View changes
kim.pepper’s picture

All of these are available as optional settings. Are you saying we should default to safer settings, or enforce them?

mcdruid’s picture

I'd recommend that we enforce the removal (/ substitution) of the most dangerous characters.

For removal of spaces, that'd be ok as a default (would people really want to turn it off?!)

mcdruid’s picture

@kim.pepper although the existing options would remove all of the characters I highlighted as a concern, they do more than that.

I wouldn't advocate stripping filenames down to only ASCII / alpha-numeric characters (by default or as a mandatory setting) because of how restrictive that is for different languages.

I'm proposing a fairly limited number of forbidden characters - those which have significance for the shell - which would be a subset of what would be stripped by any of the existing options IIUC?

mcdruid’s picture

Having said that, I went to look for a definitive list of "dangerous characters" and ended up with quite a lot of "it depends".

This is not definitive, but e.g. https://stackoverflow.com/questions/15783701/which-characters-need-to-be...

There are some quite good lists using e.g. the shell escaping option that printf (and apparently ls) offer - e.g.

No, character % does not need to be escaped
No, character + does not need to be escaped
No, character - does not need to be escaped
No, character . does not need to be escaped
No, character / does not need to be escaped
No, character : does not need to be escaped
No, character = does not need to be escaped
No, character @ does not need to be escaped
No, character _ does not need to be escaped
Yes, character   needs to be escaped
Yes, character ! needs to be escaped
Yes, character " needs to be escaped
Yes, character # needs to be escaped
Yes, character $ needs to be escaped
Yes, character & needs to be escaped
Yes, character ' needs to be escaped
Yes, character ( needs to be escaped
Yes, character ) needs to be escaped
Yes, character * needs to be escaped
Yes, character , needs to be escaped
Yes, character ; needs to be escaped
Yes, character < needs to be escaped
Yes, character > needs to be escaped
Yes, character ? needs to be escaped
Yes, character [ needs to be escaped
Yes, character \ needs to be escaped
Yes, character ] needs to be escaped
Yes, character ^ needs to be escaped
Yes, character ` needs to be escaped
Yes, character { needs to be escaped
Yes, character | needs to be escaped
Yes, character } needs to be escaped

I feel like most of those make sense; and there are a fair number of characters that would be allowed.

Whether this would be more or less confusing / user friendly than just stripping all punctuation though is perhaps a usability question.

Looks like the code mostly uses regex; I wonder whether the punct character class would be any use. It'd remove the "safe" characters from the list above which we may not want to do.

To be clear what I'm looking for here is something that would remove all the dangerous shell characters but not limit users to strict ASCII alphanum without e.g. accented letters etc..

mcdruid’s picture

FWIW by default Wordpress removes a load of "special" characters from filenames:

https://github.com/WordPress/wordpress-develop/blob/6.7.2/src/wp-include...

function sanitize_file_name( $filename ) {

...

	$special_chars = array( '?', '[', ']', '/', '\\', '=', '<', '>', ':', ';', ',', "'", '"', '&', '$', '#', '*', '(', ')', '|', '~', '`', '!', '{', '}', '%', '+', '’', '«', '»', '”', '“', chr( 0 ) );

...

	$filename = str_replace( $special_chars, '', $filename );

Joomla seems to have decided to let the underlying OS decide what it will and will not allow in filenames :shrug:

https://github.com/joomla/joomla-cms/issues/33214

kim.pepper’s picture

Status: Active » Needs review
Issue tags: +Needs tests

Here's a basic implementation. I haven't looked at any tests yet.

mcdruid’s picture

Status: Needs review » Active

Thanks, that's a good start. Looks like you've borrowed the list of chars to strip from wordpress. Perhaps we could remove a few of the "safe" ones based on the list in #7; we're only talking about a few e.g. % + = :.

FWIW here's how typo3 does this:

https://github.com/TYPO3-CMS/core/blob/v13.4.9/Classes/Resource/Driver/L...

The comments suggest that it's pretty strict:

... any character not matching [.a-zA-Z0-9_-] is substituted by '_'

...but in actual fact it's a bit more nuanced as there are some settings around allowing / remapping utf8 characters.

I'll try to do some work on this but am short on time just now.

mcdruid’s picture

Status: Active » Needs review

sorry, didn't mean to change status

kim.pepper’s picture

Removed config option, and updated tests now we replace special chars.

smustgrave’s picture

Issue tags: -Needs tests

Tests appear to be there so removing that tag.

Looking at the solution were all options taken?

kim.pepper’s picture

Looking at the solution were all options taken?

Do you mean are all the special characters included?

smustgrave’s picture

Mean the proposed solution read like there were several approaches? Unless I read that wrong

kim.pepper’s picture

I think we agreed on stripping special characters. It was more about which characters to strip. I have a feeling stripping whitespace might be contentious.

smustgrave’s picture

Can what’s not making it in be scratched from the proposed solutio

smustgrave’s picture

Sorry to be a stickler

cmlara’s picture

As a site owner, I'm going to miss Parentheses, Ampersand and Space., A little less Brackets, Pound Sign and Exclamation Point though I encounter them.

As a security engineer: This is not a bad hardening, however it is certainly not a fix for wherever the faults actually occur.

As someone who has used this in the past to perform sample RCE's against site setups; I can say yes this would make it much harder to exploit.

I also wish to reiterate that for these characters to cause issue someone has had to make some very serious mistakes down stream, mistakes that this change likely does not actually remove the vulnerabilities , just at most makes it significantly harder to exploit (move from Basic to Complex, saving 1 point on the Drupal Vulnerability Scoring ) and maybe move the Impacted Environment from All to Uncommon (2 points on the Drupal Score scale).

I agree with @smustgrave the 'excluded' ideas would be nice to have listed for historical purposes.

smustgrave’s picture

Status: Needs review » Needs work

For the summary cleanup please

kim.pepper’s picture

TBH I don't have a strong opinion on which characters should be included or excluded. I'll happily defer to those with a stronger security background. I _do_ think removing whitespace will be disruptive to site owners.

mcdruid’s picture

Status: Needs work » Needs review
Issue tags: +Needs usability review

Thanks - I'm going to tag this for a usability review (per https://www.drupal.org/docs/develop/issues/issue-procedures-and-etiquett...) as I agree that it's likely stripping spaces from filenames on uploads may not be universally welcomed.

It would, however, have benefits in terms of Security Hardening (it's very hard to achieve Command Injection without any spaces). (Acknowledging the comment in #20 that if a command injection vulnerability of similar exists, this is not the only fix that's needed.)

There are also other benefits like making it easier to iterate over batches of files in the shell etc.. but that's probably out of scope here.

I personally loathe spaces in filenames but I am probably not representative of a broad cross section of users.

So far I think there's some consensus that stripping "special characters" from filenames likely provides enough benefit (in terms of security) to outweigh annoyance some users may feel at having e.g. punctuation marks removed from their filenames, but stripping spaces may be seen as a step too far.

It could be a default that site owners could disable in the file handling options. I'd argue to make the special character stripping mandatory, but that's also open to debate.

Previous comments include some light research on how other CMSs/Frameworks do this. Wordpress and Typo3 both strip (or substitute) most punctuation by default, and it looks like they both swap out spaces for either an underscore or dash AFAICS.

Drupal already has an option to replace whitespace in filenames with _ or - but it's not enabled by default. I'd like to turn that on by default as part of this issue.

What does the UX-Team think about having mandatory stripping of special chars from filenames, plus enabling swapping spaces out for - or _ by default?

(I've not yet updated the IS to eliminate any proposed options as I don't think we have eliminated any yet.. agree we should do that when we get to that point though.)

mcdruid’s picture

I've posted some more details about my motivation for pursuing this change in a blog post about a recent Security Advisory:

https://www.mcdruid.co.uk/article/hacking-ai-module-drupal-cms

Hopefully the extra context this provides outweighs any whiff of self-promotion :)

To summarise the proposal briefly, the change would be:

  • Filenames always have dangerous characters stripped (or replaced with an inert character like -). In practice this means most but not all punctuation (e.g. see list in #8).
  • Filenames have spaces replaced by -. This is existing optional functionality which would be enabled by default.
  • The replacement character used in both cases could be configurable (again, that functionality already exists).

This would be pretty similar to how filenames are sanitised by both WordPress and Typo3.

kim.pepper’s picture

Rebased on 11.x

I also switch the file install config to replace_whitespace: true. I'm not sure this is sufficient as a default or whether we want to go the extra step and enable it via an update hook. This will be disruptive as per #2

mcdruid’s picture

Issue tags: -Needs usability review

Thank you to the Usability Team who reviewed and discussed this issue in their meeting #3526141: Drupal Usability Meeting 2025-05-30. I believe it's still on the todo list to comment here, but the meeting recording and transcript are linked to from that issue so we can glean details of their review in the meantime.

To attempt to summarise - the question was:

What does the UX-Team think about having mandatory stripping of special chars from filenames, plus enabling swapping spaces out for - or _ by default?

...and the answer:

benji: okay, there's a rather long list of special characters that are hard coded.

benji: ... You can't remove anything from that list. It's it's hard coded. The only thing you can change in configuration is whether to replace spaces.

benji: ... the main usability question is whether to allow people to override it.

benji: ... I think I agree that we should

benji: use the more secure default. So by default, replace spaces.

benji: Let let people override. And Ralph agrees.

benji: So we're unanimous on that.

Ralf Koller: [T]hat looks like a reasonable change, definitely.

I'll ask @benjifisher to correct me if that's not quite accurate.

Another important issue that was brought up was whether we're proposing to retrospectively tackle existing files.

I think that's a "no"; the likelihood of unintended consequences is really high if we tried to rename existing files, and we should limit the scope to future uploads only. (This is what Benji and the team correctly assumed).

So let's proceed on the basis that we want to:

  • Introduce mandatory replacement / removal of "dangerous characters", and..
  • Enable replacement of spaces in filenames by default, but allow this to be overridden.

Benji mentioned in the UX meeting that any overrides may have to happen only in settings.php as otherwise an attacker can potentially make those changes themselves (e.g. via XSS in the UI).

That's definitely a good point, but I think in the case of allowing spaces in filenames I'd be comfortable allowing that change to made in the UI if the dangerous characters are always removed with no way of overriding that behaviour.

mcdruid’s picture

Issue summary: View changes

Updates to IS now that we've agreed a proposed solution.

kim.pepper’s picture

Rebased on 11.x and fixed tests. We had space ' ' in the list of special chars, but this is handled separately.

kim.pepper’s picture

Looks like the test fail is an unrelated PHP 8.5 test.

The failure is in core/tests/Drupal/Tests/Composer/Plugin/Scaffold/Functional/ScaffoldUpgradeTest.php

kim.pepper’s picture

benjifisher’s picture

@mcdruid:

I am sorry for the delay in adding a comment. As you said in Comment #26, the Usability team looked at this issue on 2025-05-30. I just reviewed the discussion, and your summary in #26 looks correct.

For the record, the attendees at the usability meeting were benjifisher, rkoller, and simohell.

Thanks for clarifying that the change will not affect new uploads. In fact, the change to the whitespace default will only affect newly installed sites (or sites that newly install the file module).

At the meeting, we had the impression that stripping punctuation was non-negotiable, so the only change we fully considered was the default handling of whitespace.

If you want more feedback from the usability team, a good way to reach out is in the #ux channel in Slack.

smustgrave’s picture

From reading #26 it sounds like the idea was allow for overriding the replacement list.

kim.pepper’s picture

Added the special chars to config with a default fallback.

needs-review-queue-bot’s picture

Status: Needs review » Needs work
StatusFileSize
new91 bytes

The Needs Review Queue Bot tested this issue. It no longer applies to Drupal core. Therefore, this issue status is now "Needs work".

This does not mean that the patch necessarily needs to be re-rolled or the MR rebased. Read the Issue Summary, the issue tags and the latest discussion here to determine what needs to be done.

Consult the Drupal Contributor Guide to find step-by-step guides for working with issues.

benjifisher’s picture

@kimpepper:

I see that you updated the MR. Should the issue status now be NR instead of NW?

kim.pepper’s picture

Tests are still failing so I left at NW

kim.pepper’s picture

This hasn't moved in a while so I'm creating a MR to just default to removing whitespace to see what breaks.

kim.pepper’s picture

I've been debugging but still can't figure out why change the whitespace setting causes the core/modules/file/tests/src/Functional/SaveUploadTest.php error message to go from:

"The file <em class="placeholder">x�xx.gif</em> could not be uploaded because the name is invalid."

to

The specified file <em class="placeholder">x�xx.gif</em> could not be uploaded.<ul><li>Only files with the following extensions are allowed: <em class="placeholder">jpg jpeg gif png txt doc xls pdf ppt pps odt ods odp</em>

...even though that extension is allowed. 🤔

mcdruid’s picture

StatusFileSize
new347.33 KB

Thanks for picking this back up.

I'm not yet sure exactly why this is happening, but looks like the problem's in \Drupal\file\EventSubscriber\FileEventSubscriber::sanitizeFilename

...specifically:

    if ($fileSettings->get('filename_sanitization.replace_whitespace')) {
      $filename = preg_replace('/\s/u', $replacement, trim($filename));
    }

When the filename containing the invalid unicode goes through that regex, it seems to end up set to null.

We can reproduce this in isolation in a PHP (8.3.6) shell:

php > $filename = "x\xc0xx.gif";
php > $replacement = '-';

php > var_dump($filename);
string(8) "x�xx.gif"

php > $filename = preg_replace('/\s/u', $replacement, trim($filename));

php > var_dump($filename);
NULL

So the sanitization ends up effectively destroying the filename (the part before extension) completely and we're left with a filename of just gif without the leading dot (this is back in \Drupal\file\Upload\FileUploadHandler::handleFileUpload after the FileUploadSanitizeNameEvent has been dispatched):

whitespace regex blows the filename away leaving just the string gif

That filename then fails the allowed extension validation because... well, everything's gone pear shaped.

Not sure why the whitespace stripping regex is doing this, but that seems to be why we're seeing the weird test failure.

mcdruid’s picture

php > $filename = "x\xc0xx.gif";
php > $replacement = '-';
php > $filename = preg_replace('/\s/u', $replacement, trim($filename));

php > var_dump(preg_last_error());
int(4)

php > var_dump(PREG_BAD_UTF8_ERROR);
int(4)

AFAICS we're hitting:

https://www.php.net/manual/en/pcre.constants.php#constant.preg-bad-utf8-...

...the last error was caused by malformed UTF-8 data (only when running a regex in UTF-8 mode).

So I'm not sure if that's a problem with the test data or what.. but if it's possible to feed that input to the sanitisation as a filename, we probably ought to handle the error more gracefully.

It doesn't look like preg_replace() throws anything catchable here, so we might have to check for the null value and decide if we want to check preg_last_error() to provide detailed feedback / logging about what's gone wrong, or simply bail out without trying to do further validation etc..

kim.pepper’s picture

I tried checking for a NULL return value from preg_replace() and logging the error message.

kim.pepper’s picture

I feel we should have a separate check for invalid UTF-8 characters outside the whitespace replacement.

kim.pepper’s picture

Per #43 I added an explicit check in FileUploadHandler and changed from logging an error to throwing an exception in FileEventSubscriber::sanitizeFilename()

kim.pepper’s picture

Status: Needs work » Needs review

I think a check at the top of FileEventSubscriber is the simplest approach.

kim.pepper’s picture

I wonder if we should split off a new issue and consider it a bug for not handling invalid UTF-8?

mcdruid’s picture

Yeah I think a separate issue to work on the unicode handling problem would make sense.

kim.pepper’s picture

OK will take a look tomorrow.

kim.pepper’s picture

Title: Disallow dangerous filenames e.g. command injection characters » [PP-1] Disallow dangerous filenames e.g. command injection characters
Status: Needs review » Postponed

I think we should postpone this on #3562543: FileEventSubscriber::sanitizeFilename() does not check for invalid UTF-8 chars It would be great if those following this issue could review that.

Version: 11.x-dev » main

Drupal core is now using the main branch as the primary development branch. New developments and disruptive changes should now be targeted to the main branch.

Read more in the announcement.