Problem/Motivation
While working on #3516706: [PP-1] Disallow dangerous filenames e.g. command injection characters we discovered that invalid UTF-8 characters can cause preg_replace() to return NULL. You need to call preg_last_error() or preg_last_error_msg() to see that it caused a PREG_BAD_UTF8_ERROR.
This error is not currently checked and a $filename of NULL is then passed to subsequent validation checks. These will fail, but the reason isn't obvious. For example, you might get an 'invalid extension error' which is confusing to users.
Steps to reproduce
Proposed resolution
In Drupal\file\EventSubscriber\FileEventSubscriber::sanitizeFilename() check for invalid UTF-8 characters before doing any other validation checks and throw a Drupal\Core\File\Exception\FileException if invalid. This will get caught and handled in file_save_upload().
Remaining tasks
User interface changes
Introduced terminology
API changes
Data model changes
Release notes snippet
Issue fork drupal-3562543
Show commands
Start within a Git clone of the project using the version control instructions.
Or, if you do not have SSH keys set up on git.drupalcode.org:
Comments
Comment #3
kim.pepperReady for review.
Comment #4
kim.pepperRan a test-only job to show the failure.
Comment #5
smustgrave commentedPretty simple review :)
Test-only was already ran here https://git.drupalcode.org/issue/drupal-3562543/-/jobs/7621295 which shows the test coverage.
Changes in core/modules/file/src/EventSubscriber/FileEventSubscriber.php look good and even better have a simple comment added to further explain.
LGTM!
Comment #7
quietone commentedI applied the suggestions and now the Kernel tests are failing.
Comment #8
kim.pepperThey look unrelated:
core/modules/package_manager/tests/src/Kernel/PhpTufValidatorTest.phpComment #10
kim.pepperSeemed like a random fail. Rebased on main and back to green.
Comment #11
smustgrave commentedRebased and test failure appears to be random /builds/core/modules/layout_builder/tests/src/FunctionalJavascript/InlineBlockTest.php:528.
Feedback appears to be addressed on this one though.
Comment #12
dwwLeft a few concerns/musings on the MR. Not worth downgrading to NR over them, but wanted to at least ask before this is committed.
Thanks,
-Derek
p.s. I thought I was going to have to point out #3545276: expectDeprecation() is deprecated and friends, but I realized this is an exception test, not a deprecation.
Comment #14
alexpottI was about to commit this and think I wondered whether throwing an exception here is the correct place. I think we should be adding this check to \Drupal\file\Validation\Constraint\UploadedFileConstraintValidator so we can output a message to the user in the form rather than triggering a 500.
I think we need to leave the exception in as this will cover places not using the \Drupal\file\Upload\FileUploadHandler but adding this as a violation via the constraint will be a make nicer UI.
Comment #15
alexpottAnother thought that adds a layer of complexity was does transliteration do with these characters if it is enabled? It does...
I wonder if that offers us a way forward rather than rejecting the file - if transliteration is not enabled we add similar logic to \Drupal\file\EventSubscriber\FileEventSubscriber::sanitizeFilename() instead of throwing an exception.
Comment #16
alexpottWe'd have to do something like...
Inspired by \Drupal\Component\Transliteration\PhpTransliteration::transliterate() ... maybe we could refactor this into public static method on \Drupal\Component\Transliteration\PhpTransliteration so we don;t have to duplicate code.
Comment #19
smustgrave commented@shivansh_0909 fixes should continue in the existing MR please.
Comment #20
shivansh_0909 commentedI have opened a merge request to improve the UTF-8 exception message assertion.
Merge request: https://git.drupalcode.org/project/drupal/-/merge_requests/15138
Happy to make any changes if needed. Thanks!
Comment #21
dwwAnd LLM use needs to be disclosed.
And fixes should actually address the given feedback.
Comment #22
shivansh_0909 commentedI have addressed the UTF-8 exception message assertion and fixed all PHPCS and PHPStan issues. The pipeline is now passing. Ready for review.
Comment #24
dww@shivansh_0909 - at this point, it’s not clear if you’re a human or an LLM agent. These “contributions” are totally unhelpful for this issue. You need to read what we’ve said and act accordingly. I’ve closed your MR. If you actually want to help, you’ll have to read and understand comments 13, 14 and 15, then make the suggested changes and push them to the existing MR. If you continue to push commits to new branches that aren’t directly addressing those comments, we’re going to assume you’re using automated tools to generate slop and waste our time, and will likely request that your account be banned.
Comment #25
alexpottAlso before we do #13 or #15 we need to have a discussion about what is the correct approach. Currently we have:
UploadedFileConstraintValidatorand continue to throw an exception inFileEventSubscriber::sanitizeFilename()FileEventSubscriber::sanitizeFilename()We need to decide what to do before we start coding.
I think because of the existing behaviour of transliterate and our approach of manipulating the filename in
FileEventSubscriber::sanitizeFilename()we should do that rather than error. The question is how and when. Should we only do this if we've not transliterated and'filename_sanitization.replace_whitespace'or'filename_sanitization.deduplicate_separators'is enabled... as that is the only way an error is going to be caused. Or should we add a new event listener that does this sanitisation at the very beginning of the process so all other event listeners can be sure the filename is valid unicode. Not sure.Comment #26
shivansh_0909 commented@dww Hey, thanks for the feedback and sorry for the confusion earlier I realize I jumped in without fully understanding the ongoing discussion and should’ve contributed to the existing MR instead of opening a new one. I’ll go through comments 13–15 properly to understand the approach being discussed before making any further contributions. Appreciate the guidance.