Problem/Motivation
Per #1314214: MySQL driver does not support full UTF-8 (emojis, asian symbols, mathematical symbols), there is currently no uniqueness constraint on the uri field in the file_managed table enforced in the db schema in D8, because of limitations on lengths of keys under utf8mb4 with lowest-common-denominator MySQL configs. Instead we rely on a Drupal/application-level constraint in \Drupal\file\Plugin\Validation\Constraint\FileUriUnique.
However, this constraint is not currently enforced in file_save_upload(), as the validate() method is never explicitly called on the $file object.
Coupled with the way new filenames are chosen in file_destination() and file_create_filename() (based on checking file_exists() on a candidate name), this can lead to a race condition whereby more than one entry can be created in the file_managed table with the same value for the uri field.
When this happens, there is a risk of data loss: for example, if garbage collection deletes temporary files which have the same uri as other (permanent) files.
Slow implementations of hook_file_validate() (e.g. in the ClamAV module) can exacerbate the race condition, by increasing the likelihood that multiple parallel(ish) processes select the same filename before anything is written to disk.
Proposed resolution
Enforce the unique constraint on files' uri to avoid multiple rows with the same value in the file_managed table.
Remaining tasks
- Review the patch.
- Commit the patch.
User interface changes
There may be a new error message presented to users if they try to upload a file and the race condition occurs. However, it actually seems that changes since the initial report of this bug make it less likely that this will happen in the UI (see #52 for example).
API changes
None.
Data model changes
None.
Release notes snippet
Use "Problem/Motivation" section above?
Original report by manarth
First discovered in the ClamAV module: #2864506: Long ajax process causing files to disappear.
A slow implementation of hook_file_validate() can cause inappropriate entries in the file_managed table, causing an in-use file to be treated as orphaned, and therefore deleted on a cron run. This causes permanent loss of data.
More significantly, the upload appears to succeed and the data is available for a period of time, whilst the data-loss happens on cron, between 6 hours and 3 months after the data was uploaded. This can make the cause of the issue harder to recognise, and reduce the likelihood that the original file is still available for recovery.
Steps to reproduce:
Not possible to manually reproduce since 8.6.x (see #52) - see the new test instead: SaveUploadTest.php::testDuplicate()
- Prepare a Drupal install with a content-type that contains a file field.
- Enable a module which has a slow implementation of hook_file_validate(); either use the example code provided below, or install this sandbox (machine name: slow_file_upload)
- Go to the node-add page for the relevant content-type
- Click the "browse" button, and choose a file to upload
- Click the form's "Save and publish" button quickly, before the AJAX upload + validate process has completed
The outcome is that two entries are added to the file_managed table, and one entry added to the file_usage table. Here is an example:
mysql> select * from file_managed;
+-----+--------------------------------------+----------+------+----------+----------------------------+------------+----------+--------+------------+------------+
| fid | uuid | langcode | uid | filename | uri | filemime | filesize | status | created | changed |
+-----+--------------------------------------+----------+------+----------+----------------------------+------------+----------+--------+------------+------------+
| 1 | 3bcdd711-988e-4906-b8c6-0b22df271b93 | en | 1 | foo.txt | public://2017-04/foo_0.txt | text/plain | 9 | 0 | 1492187343 | 1492187343 |
| 2 | 0a7a9874-7e73-43da-8eeb-a7caa03ae63d | en | 1 | foo.txt | public://2017-04/foo_0.txt | text/plain | 9 | 1 | 1492187346 | 1492187346 |
+-----+--------------------------------------+----------+------+----------+----------------------------+------------+----------+--------+------------+------------+
2 rows in set (0.00 sec)
mysql> select * from file_usage;
+-----+--------+------+----+-------+
| fid | module | type | id | count |
+-----+--------+------+----+-------+
| 2 | file | node | 1 | 1 |
+-----+--------+------+----+-------+
1 row in set (0.00 sec)
One entry is correctly recorded in the file_managed table, with a corresponding file_usage entry; the second is an orphaned file. However, both entries share the same uri. When the cron cleanup process deletes orphaned files, the valid, in-use file is deleted, causing permanent data loss.
The issue was originally reported in the context of the ClamAV module; however, I've been able to reproduce it using nothing more than a sleep() call in a new debug module:
/**
* Implements hook_file_validate().
*/
function mymodule_file_validate(Drupal\file\FileInterface $file) {
$errors = [];
if (!isset($_POST['op']) || ($_POST['op'] != 'Save')) {
// Add a delay if we're not actually saving the node.
sleep(30);
}
return $errors;
}
This particularly affects the ClamAV module, where certain virus-scan configurations can add 10 seconds to the file-validate process, but this issue could affect any scenario where validation invokes a slow external process.
Comment | File | Size | Author |
---|---|---|---|
#91 | 2869855-91.patch | 7.64 KB | mcdruid |
#91 | 2869855-91_test_only.patch | 5.96 KB | mcdruid |
#91 | interdiff-2869855-88-91.txt | 919 bytes | mcdruid |
#81 | 2869855-81.patch | 7.32 KB | mcdruid |
#81 | 2869855-81_test_only.patch | 5.64 KB | mcdruid |
Comments
Comment #2
manarth CreditAttribution: manarth as a volunteer commented

Comment #3
xjm
Yikes. Thanks for reporting this!

Comment #4
xjm

Comment #5
cilefen CreditAttribution: cilefen commented
@manarth: Thank you for discovering and reporting this issue. @xjm, @alexpott, @effulgentsia, @lauriii, @catch and I discussed this issue at a recent meeting and agreed it is critical priority because of data loss.
Comment #6
NancyDru
Does this also affect 7.x?
Comment #7
Wim Leers
Exactly the same thing is being reported at #2884052: Uploading a managed file on a custom form that allows multiple files also triggers the remove button which results in a duplicate temporary record and results in the file being deleted.
It sounds like this is in fact a duplicate of that, which would mean that hook_file_validate() is irrelevant here.

Comment #8
catch
Going to mark this as a duplicate of #2884052: Uploading a managed file on a custom form that allows multiple files also triggers the remove button which results in a duplicate temporary record and results in the file being deleted since it does sound identical.
Comment #9
vijaycs85

Comment #10
mcdruid
Un-marking this as a dupe of #2884052: Uploading a managed file on a custom form that allows multiple files also triggers the remove button which results in a duplicate temporary record and results in the file being deleted, as I think they're very closely related, but not exactly the same thing. I'd argue the other issue should stay focused on the JS fix, whereas this one should look at the problem of duplicate rows in the file_managed table, and the race condition in file_save_upload.
I'll update the IS shortly.
I was able to reproduce the problem of ending up with duplicate rows in the file_managed table for the same file if you add a slow implementation of hook_file_validate, and submit the node form containing a file field before the ajax upload has completed. It's even easier to manually reproduce if you make the validation hook slow only when it's not processing a full node form submit e.g.:
The result is indeed that you end up with two rows for the same file in file_managed, with the same uri. The one from the node form submission is marked as FILE_STATUS_PERMANENT (1) whereas the one from the ajax upload is temporary (0):
The temp file (status 0) will be cleaned up by a cron run, and that will blow away the permanent file (status 1) too, as they share the same uri.
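Why the shared uri is so destructive can be shown with a plain-PHP simulation of that cleanup. The data structures here are hypothetical stand-ins for the file_managed rows; this is not Drupal's actual cron code:

```php
<?php

// Create a real file on disk standing in for the uploaded file.
$path = tempnam(sys_get_temp_dir(), 'shared_uri_');
file_put_contents($path, 'important data');

// Two file_managed-style rows sharing one uri, as in the duplicate-row
// scenario. Status 0 = temporary (orphaned), 1 = permanent (in use).
$file_managed = [
  ['fid' => 1, 'uri' => $path, 'status' => 0],
  ['fid' => 2, 'uri' => $path, 'status' => 1],
];

// Cron-style cleanup: delete temporary files from disk.
foreach ($file_managed as $row) {
  if ($row['status'] === 0) {
    unlink($row['uri']);
  }
}

// The permanent entry's data is gone too, because it shared the uri.
var_dump(file_exists($file_managed[1]['uri'])); // bool(false)
```

The permanent row survives in the database, but the bytes it points at are gone.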
I think the problem here is that there's potentially quite a big gap between file_save_upload deciding what filename (/ uri) it's going to use for an upload:
http://cgit.drupalcode.org/drupal/tree/core/modules/file/file.module?h=8...
...and actually writing that file to disk:
http://cgit.drupalcode.org/drupal/tree/core/modules/file/file.module?h=8...
...and eventually saving the file to the db:
http://cgit.drupalcode.org/drupal/tree/core/modules/file/file.module?h=8...
I think the significant gap though is between the call to file_destination and actually moving the upload to that destination.
In between, validation takes place and this is why a slow validation exacerbates the problem.
file_destination - and file_create_filename which it calls - decide on the filename which should be used based on checking file_exists.
This means if more than one thread processes an upload with the same filename / upload settings more-or-less in parallel, they can both decide to use the same destination based on the absence of a file at that path when they call file_destination.
One very simple - in fact, crude - way of getting around this is to "claim" the destination once it's been derived by touching the file on disk e.g.:
With this in place, trying to reproduce the issue with the duplicate entries gives us something more like:
...where the cleanup of the temp file would no longer blow away the permanent one.
I'm not suggesting we actually implement that, but I think it illustrates the point. I'll include a patch with just that change to get the ball rolling. (Reasons not to do this include that some stream wrappers may not support touch, and you may end up creating a mess on the disk if you touch files which don't end up with corresponding entries in the db, and are therefore never cleaned up).
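The "claim by creating" idea could look something like this in plain PHP (illustrative only - this is not the patch, and the helper name claim_destination() is made up). Using fopen() with mode 'x', which fails if the file already exists, makes the check and the claim a single step rather than separate file_exists() and touch() calls:

```php
<?php

// Hypothetical sketch: claim a destination by creating an empty
// placeholder at the same moment we check for existence. fopen() mode 'x'
// fails if the file already exists, so two parallel processes cannot both
// claim the same path.
function claim_destination(string $directory, string $basename): string {
  $info = pathinfo($basename);
  $ext = isset($info['extension']) ? '.' . $info['extension'] : '';
  $candidate = $directory . '/' . $basename;
  $counter = 0;
  while (($handle = @fopen($candidate, 'x')) === false) {
    // Already claimed (or exists): try the next numbered candidate.
    $candidate = $directory . '/' . $info['filename'] . '_' . $counter . $ext;
    $counter++;
  }
  fclose($handle);
  return $candidate;
}

$dir = sys_get_temp_dir() . '/claim_demo_' . getmypid();
@mkdir($dir, 0777, true);
$first = claim_destination($dir, 'foo.txt');
$second = claim_destination($dir, 'foo.txt');

// The second claim was forced onto a different name.
var_dump($first === $second); // bool(false)
```

The caveats in the comment above still apply: some stream wrappers may not support this, and placeholders with no matching db entry would need cleaning up.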
What other approaches could be taken to "claiming" the destination before the file is actually written to disk?
The existing locking API could be used; as well as checking file_exists, the upload process could acquire a lock for a not-yet-used destination. That seems almost like a fit, but I'm not sure it's really what the lock API is for.
We could implement something new where an entry is made in the db (/ pluggable storage backend) when a destination is claimed but the file has yet to be written to disk (assuming there isn't already something like this that I'm not aware of).
Fundamentally, it probably shouldn't be possible to end up with multiple entries in the file_managed table with the same value for uri. Perhaps the scope of this issue should be limited to ensuring that doesn't happen during file_save_upload though.
Comment #11
mcdruid

Comment #12
mcdruid
Another thought: it might improve the UI if we disabled the submit button on the node form while the ajax upload is being processed?
Comment #13
mcdruid
Noting also that file_save_upload works in almost exactly the same way in D7 in terms of the call to file_destination (which is also pretty much the same), validation, and then drupal_move_uploaded_file. So it seems quite likely that the same issue applies to D7?
Comment #14
mcdruid
Another way of looking at this would be to tolerate the duplicate entries in file_managed, but ensure that file_cron doesn't actually delete files from disk if there's another file ( / row in the table) with the same uri.
That's sort of similar, but not quite the same as what's being looked at in #2821423: Dealing with unexpected file deletion due to incorrect file usage.
Personally, I think I'd prefer to avoid the duplicate entries.
Comment #15
mcdruid

Comment #17
mcdruid
Had a quick look at this in D7. AFAICS the same problem exists, but there's a significant difference in that D7's schema includes this:
http://cgit.drupalcode.org/drupal/tree/modules/system/system.install?h=7...
So if you end up with the race condition between two upload processes trying to use the same uri, one will hit an error e.g.:
In a couple of tests I did using a slow implementation of hook_file_validate (identical to the D8 one outlined in a previous comment), it was the ajax upload's temp file (status = 0) which hit the Integrity constraint violation, and therefore the problem being discussed in this issue was avoided, albeit somewhat crudely.
In the absence of a more elegant solution to this race condition, adding a constraint in the schema in D8 would be one workaround here.
Comment #18
mcdruid
Turns out the uniqueness constraint on the uri field in the file_managed table was removed as part of #1314214-257: MySQL driver does not support full UTF-8 (emojis, asian symbols, mathematical symbols) because of the limitations on lengths of keys under utf8mb4 with lowest-common-denominator MySQL configs. Per that issue:
That application-level constraint is provided by core/modules/file/src/Plugin/Validation/Constraint/FileUriUnique.php and is tested in:
https://api.drupal.org/api/drupal/core%21modules%21file%21tests%21src%21...
However, AFAICS that constraint is never enforced in file_save_upload, as the validate method is never explicitly called on the $file object.
Plus the uri is actually set on the file object after validation:
(The comment at the end there is misleading; the last line typically sets the uri to the actual path where the file will be stored e.g. public://2018-01/druplicon.png)
Adding in a quick crude test after the uri has been set, but before the file is saved:
...shows that the uniqueness constraint would trigger a violation when this race condition results in two uploads using the same uri e.g.:
[Fri Jan 19 12:13:32.608268 2018] [:error] [pid 859] The file <em class="placeholder">public://2018-01/druplicon_5.png</em> already exists. Enter a unique file URI., referer: http://example.com/node/add/article
So it seems to me that we need to fix the fact that the "Drupal-level unique constraint on the File URI field instead of a database-level unique constraint" isn't enforced at the moment for uploads processed by file_save_upload.
(Or, of course, we could put the constraint back into the db if transliteration of file names landed.)
Comment #19
mcdruid
Here's a patch which adds a call to the validate method on the file object itself.
To make this work, we have to move $file->setFileUri to before $file->validate, but (if we change nothing else) it seems that we can't set the uri before calling file_validate. That's probably worthy of a comment if we go with this approach.
I've also blown away the comment about moving the file to the temp directory as that seems to be legacy cruft AFAICS.
In my testing, this prevents the race condition creating two files with matching URIs in the file_managed table. I've also confirmed that the error message from the FileUriUnique constraint is displayed when appropriate (in order to see it I had to hack a value into the file_managed table which I knew would collide with the next uri that file_destination would come up with).

Comment #21
mcdruid
Will have a look at why the patch caused a handful of upload tests to fail...
Also, discussed with @catch - we should file a follow-up to reconcile $file->validate and (hook)_file_validate.

Comment #22
mcdruid
AFAICS the tests are failing because of the extra validation e.g.:
So those tests may need a few tweaks in order to prevent them from triggering the violation of the unique constraint (which we've not been checking properly before).
These are the tests in question:
Comment #23
mcdruid
The reason the tests are failing is that the validation of the constraint is currently happening when $replace is FILE_EXISTS_REPLACE.
A little later in file_save_upload Drupal does this:
...so the record in the file_managed table will be updated for this replacement upload, and a duplicate record is not created. It's therefore not valid to flag a violation of the constraint.
I don't think it makes sense to only call $file->validate() if $replace is not FILE_EXISTS_REPLACE, as there could theoretically be other constraints which should be tested (I don't think that's the case at present, but it could be).
One way around this would be to implement the constraint as one of the validators passed to file_validate(), which would mean we could decide not to check for uniqueness when $replace is FILE_EXISTS_REPLACE.
I'm not sure if the constraint is ever tested by Drupal without this patch... I'll see if I can find that out before thinking about removing the existing constraint in order to implement it differently.
Another option would be to see if we can test the constraint(s) as part of the actual $file->save().

Comment #24
mcdruid
Here's a simple way of making this work: leave the call to $file->validate() until just before $file->save(), which means we don't get a violation for FILE_EXISTS_REPLACE as the existing file entity is going to be updated.
All the tests which failed with #19 pass locally with this patch.
It's a bit ugly doing two rounds of validation, but AFAICS if the first round fails, the second will never be reached so we shouldn't end up displaying two sets of error messages.
(This patch restores the misleading comment about moving files to the temp directory, but that's probably out of scope here anyway - I'll file a follow up for that).
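For readers following along, the shape of this approach is roughly the following. This is a pseudocode sketch based on the comments above, not the actual patch; names are abridged from the real _file_save_upload_single() flow:

```php
// Pseudocode sketch (not the actual diff) of validate-just-before-save:
$file->setFileUri($destination);
// ... move the uploaded file into place, run file_validate() etc. ...
$file->setValidationRequired(TRUE);
$violations = $file->validate();
foreach ($violations as $violation) {
  // A duplicate uri trips the FileUriUnique constraint here.
  $errors[] = $violation->getMessage();
}
if (empty($errors)) {
  $file->save();
}
```

Because validation happens after the FILE_EXISTS_REPLACE handling has settled on updating the existing entity, a replacement upload no longer triggers a spurious uniqueness violation.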
Oh, and this does successfully prevent the duplicate rows in file_managed when the race condition's caused by submitting the node form while the ajax upload is still underway.

Comment #25
mcdruid
Follow-up to remove the misleading comment: #2938853: Remove outdated and misleading comment re. temp directory from file_save_upload
Comment #26
webchick
Looks like this still is in need of tests.
Comment #27
mcdruid
Here's a test which I think confirms the problem.
It simulates the race condition which allows two managed files to be created with the same URI by:
* Uploading a test file.
* Removing the file from disk, but leaving the entry in the file_managed table.
* Uploading the same file again.
At present this results in two entries in the file_managed table with the same URI, which is the situation we're trying to avoid.
So we expect this test to fail.
I don't think the patch from 24 will make this test pass successfully, as it should make the second upload fail validation.
Hopefully this test is useful to illustrate the problem we're trying to solve though.
I suppose the next question is whether to adapt the test so that it "matches" the patch, or the other way around.
To change the patch so that it makes this test pass would mean altering the way that Drupal chooses the file name (URI) it should use for an upload, as opposed to running the validation on the file object to enforce the URI uniqueness constraint once the wrong (duplicate) file name has already been derived. Perhaps that would be a better approach, but it seems like it will require more significant changes than the current patch.
Either way, I think we do still want to ensure that the URI uniqueness constraint is enforced during uploads, which is not the case at present (as this test hopefully illustrates).
Comment #29
mcdruid
Here's another test-only patch which should fail, showing that the URI uniqueness constraint is not being enforced when an upload creates a duplicate URI in the file_managed table.
...plus a patch which enforces the URI uniqueness constraint in file_save_upload and should therefore make the new test(s) pass.

Comment #31
mcdruid
Confirming that was the expected result; test-only patch failing and the other passing.
Comment #32
mcdruid

Comment #34
wturrell CreditAttribution: wturrell as a volunteer commented
Rerolled for 8.7.x
The test could need refactoring. The reason the patch wouldn't apply is the changes in #2870448: Convert web tests to browser tests for file module - I've not studied it in detail; all I've done is move it to the correct file. No changes to the patch itself.
Comment #37
wturrell CreditAttribution: wturrell as a volunteer commented
(again)

Comment #38
wturrell CreditAttribution: wturrell as a volunteer commented

Comment #41
wturrell CreditAttribution: wturrell as a volunteer commented
Can someone else verify the extensive refactoring in #2894193: Multiple file upload does not validate extensions correctly has fixed this issue in core? (I asked there about it as well.)
Using the steps in the issue summary, I can no longer reproduce the duplicate DB entries.
Comment #42
mcdruid
Thanks for looking at this!
Unfortunately I don't think the changes have fixed the duplicate db entries issue.
I ran the test from this issue (having pulled all the latest changes), but commented out the assertion that checks for the extra validation the patch in this issue introduces:
(I also updated the way the test checks the max fid based on the test from the other issue).
That last assertion failed, and checking in the test db:
So it looks like the problem is still there AFAICS; the test ended up with two managed files with the same value for uri.

Comment #43
wturrell CreditAttribution: wturrell as a volunteer commented
@mcdruid ok, firstly, having got tests running locally, I can confirm what you found, i.e. the new test is still failing on 8.7.x.
I'm confused (largely because I'm guessing) about how to reroll the patch.
e.g. I tried adding your $file->setValidationRequired(TRUE); in the same place (line 965) and the $violations foreach loop (the one that appends to the $errors array) after the call to file_validate(), but if I do that, every test screenshot is showing a failed file upload message with the error: 'This value should be of the correct primitive type.'

Comment #44
mcdruid
Thanks again for wrestling with these; the core module had changed quite significantly in the meantime!
I think these patches bring the changes up-to-date.
I've made a change in the test-only patch by commenting out the check for the message which would only ever appear with the extra validation in place. In other words, the test-only patch focuses on whether the duplicate entries were created in the file_managed table.
I'm getting the expected results locally, let's see if d.o agrees...
Comment #46
mcdruid
...and that was the expected outcome.
Leaving this as "Needs review" as that's what needs to happen for the actual changes in the patch now.
Comment #47
wturrell CreditAttribution: wturrell as a volunteer commented
What's the shortest sleep interval we can reasonably use? i.e. with sleep(600) as now, all core test jobs will take 10 minutes longer?
Comment #48
andypost
Sleep 300 should be enough
Comment #49
mcdruid
Ooohhhhhh! I'm glad you pointed that out. I hadn't noticed the sleep() being introduced (it wasn't there in the patches from #29).
The sleep in the steps to reproduce is not actually necessary in the test; that's just to make one thread pause in order to allow the race condition with another thread to be reproduced manually.
In the test, we do this instead:
Here are the same patches again without the sleep (unsurprisingly the tests run a lot quicker!)
Comment #51
mcdruid
...and confirming again that we expected the test-only patch to fail.
...and that "Needs review" is appropriate.
Comment #52
wturrell CreditAttribution: wturrell as a volunteer commented
Just to point out, as mentioned previously, I've been unable to reproduce this manually for a while - I've tested today on a few different versions of core (without the patch) and it seems that's been the case since 8.6.0 (8.5.x is still affected, including 8.5.8).
I'd like to be more precise about what changed, however I gave up trying to run git bisect across releases because of all the complications with composer and the database.
I've added a note to the issue summary, updated the old test code there with the more recent version (and also created a module others can use, which contains the hook that adds a 30 second delay). With new versions of core, if you test manually, you won't see any new entry in the file_managed table until the sleep() command has finished executing.
Comment #53
mcdruid
Thanks wturrell, looks like you're right... the original steps to manually reproduce this no longer seem to produce quite the same outcome.
I'm not sure exactly what's changed in the interim either.
I think the test (e.g. in the patches from #49) still proves that there's a potential race condition because the uniqueness constraint on file_managed.uri is not being enforced... which - I believe - can still lead to the issue described whereby files are unintentionally / unexpectedly deleted.

Comment #55
rjg CreditAttribution: rjg commented
I am able to recreate this issue in Drupal 8.7.5. I've had a handful of files go missing in Production and have been trying to track down the issue, and just now I am able to consistently replicate it.
- hook_file_validate() with a 5 second sleep

When the AJAX request for the second file field (remove image) is initiated before the browser receives the response from the first file field (upload PDF), the form is not yet updated with the AJAX response, and the second request includes the "new" PDF file and core acts on it. It will create a second new file entity with the uploaded file, though this one will be marked as "temporary".
The race condition comes in where _file_save_upload_single checks for duplicate file names before running hook_file_validate() and before actually saving the file. So, if the 2nd AJAX request makes it to _file_save_upload_single before the first AJAX request is finished, then both requests will create separate file entities with the same exact URI.

Comment #56
rjg CreditAttribution: rjg commented
Adding some more information to my previous comment (#55):
- hook_file_validate() to force a delay, the first file AJAX request almost always finishes before getting a chance to click the second file field.

So I feel the race condition in _file_save_upload_single must be leading to both file entities getting the same URI.
I don't know if the file fields' concurrent AJAX requests problem should be a separate issue from this one. Anyone have thoughts on this? My first reaction is that ideally multiple form AJAX requests would be forced to be synchronous, whether through some queuing mechanism or by disabling other AJAX form fields until the first AJAX response is received. Definitely sounding like larger scope than this issue. EDIT: related to https://www.drupal.org/project/drupal/issues/2884052 ?
I think originally this race condition issue occurred with 1 file upload and with a node save POST, and may have been fixed coincidentally due to other core updates because I can't recreate this issue in core 8.7. If I upload 1 file and click save before the AJAX request is done, I get an error that my file field is required but empty.
Comment #57
rjg CreditAttribution: rjg commented
Drilling down more into _file_save_upload_single():
When two file entity AJAX requests execute at the same time, the second request (for example to remove an image field) has the form data for the 'new' file upload, and that file gets uploaded a second time (scoping issue? the AJAX request passes the field name in to the file widget processor and it's executing actions on a separate field as well). If getDestinationFilename() gets called on the second AJAX request before moveUploadedFile() is called on the first, the second file entity that was created earlier in the function will keep the same file URI.
Two file AJAX requests can execute at the same time if the first is a file upload of either a large file, a slow client connection, or uploading a remote file, and the second file field is modified before the first request finishes.
It makes sense to call getDestinationFilename() early on, before the file name length and hook_file_validate() logic. Maybe it should immediately touch the destination to alleviate filename conflicts, as long as it deletes the blank file if any errors are thrown.

Comment #58
wturrell CreditAttribution: wturrell as a volunteer commented
@rjg: does applying the patch make any difference to your situation?
Comment #59
rjg CreditAttribution: rjg commented
@wturrell great news, the patch in #49 fixed the duplicate file issue for me:
I tested on core 8.7.7.
There's room for improvement in the content management UX in the entity form, but I believe this addresses the critical issue (data loss).
Testing steps & results:
- hook_file_validate() with sleep(5); (or 10) in an active module
- a file field and an image field

What I think is bad UX is the image removal silently failing. If the image field is not required, it would be easy for a content manager to assume the file was removed and overlook that it's still displayed on the node edit form.
Since the image removal did not work, I tested again to see what happens with a new node and uploading 2 files at once. The same error occurred and duplicate files were prevented, and there was no data loss or unintended results with the image field. Testing steps & results:
Comment #60
dpagini CreditAttribution: dpagini as a volunteer commented
Based on the previous comment, thought it was appropriate to push this to RTBC status, and potentially get the next round of reviews.
I'm having a little bit of a hard time following the problem here, but what's confusing to me is that the latest "test only" patch has code updates to the file.module file. Wondering if that was an oversight, because I see the other test-only patches only had a test method...?
Comment #61
Krzysztof Domański
1. The assertResponse and assertRaw calls use 2 parameters, but the method signatures take 1 parameter.
2. There may be less code.
3. Fixed test only.
@dpagini Good point!
Comment #63
Krzysztof Domański
Uploaded a new test-only patch. Commented out the test for the message that does not yet exist.
Comment #65
Krzysztof Domański
Cleaning patches and comments.
1. It has been reviewed in #60 so back to RTBC. Small changes - see interdiff-49-65.
2. I updated the test-only patch, because in #49 it had code updates to the file.module file.
The test-only patch has the commented-out line below. That is because this message does not yet exist (it will be added in a new patch).
Comment #70
klonos
Since this has been confirmed to exist in D7 too, should this be tagged for backport?
Comment #71
catch
@klonos for 7.x backport it's best to create a separate issue against 7.x, linking back to this one.
Comment #72
Wim Leers
This seems to be bringing the test coverage that we have for REST/JSON:API file uploads (\Drupal\Tests\rest\Functional\FileUploadResourceTestBase::testPostFileUploadDuplicateFile() and \Drupal\Tests\jsonapi\Functional\FileUploadTest::testPostFileUploadDuplicateFile() respectively) over to form-uploaded files.
Does this race condition then also exist there?
Both REST and JSON:API already do perform entity validation. I think that means they're safe from this?
Should we expand the REST/JSON:API test coverage to also simulate this race condition?
Comment #73
mcdruid
@klonos see #17 - D7 is protected by the unique constraint on the file uri in the schema, so although it's possible that the code will try, the db won't allow two rows in the file_managed table with the same value for uri. Perhaps there's room for improvement there, but I'm not sure a backport for D7 would be prioritised given that the schema prevents the race condition from causing the data loss issue outlined here.
@Wim Leers thanks for the review; I'll try to update the IS and have a look at REST/JSON:API unless anyone beats me to it.
Comment #74
mcdruid

Comment #75
mcdruid

Comment #76
mcdruid
Hopefully the IS is in better shape now.
Comment #78
mcdruid
Confirmed that REST and JSON:API already do entity validation on file uploads and therefore enforce the unique constraint.
Added test coverage for both:
\Drupal\Tests\jsonapi\Functional\FileUploadTest::testPostFileUploadDuplicateFileRaceCondition()
\Drupal\Tests\rest\Functional\FileUploadResourceTestBase::testPostFileUploadDuplicateFileRaceCondition()
...which should hopefully verify this - I've added these to the test_only patch, but these should pass where we expect \Drupal\Tests\file\Functional\SaveUploadTest::testDuplicate() to fail.
I'd hope to put this back to RTBC if the only failure is that one in the test_only patch.
Comment #81
mcdruid
Oops, I see what I did there. In the REST and JSON:API tests, we're working directly with a $response object, so calling $this->assertResourceErrorResponse(422, ...) doesn't work.
Changed that to simply checking for the 422 status code in order to hopefully see these passing. We could add more assertions to check for e.g. the error text "Unprocessable Entity: file validation failed." (and the rest).
I'm not sure what the best way of doing that would be in these tests, other than something like:
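The snippet originally posted here was not preserved in this extract. A rough, hypothetical reconstruction of that kind of assertion might look like the following - the field names in the decoded body follow the JSON:API error object format (errors[0].detail), but the exact structure and message text are assumptions, not copied from the patch:

```php
<?php

use Drupal\Component\Serialization\Json;

// Hypothetical sketch: assert on the raw $response directly, since
// assertResourceErrorResponse() can't be used with it here.
$this->assertSame(422, $response->getStatusCode());

// Decode the body and check the validation error text appears in the
// first error's detail (JSON:API error object shape).
$body = Json::decode((string) $response->getBody());
$this->assertStringContainsString(
  'Unprocessable Entity: file validation failed.',
  $body['errors'][0]['detail'] ?? ''
);
```

As the comment says, a cleaner approach (using the dedicated assertion helpers) was found later in the thread.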
Suggestions for how to do that better welcome!
Comment #83
dpagini
This looks good to me, and other than the added test changes, there really hasn't been much changed from previous patches, so I think this is appropriate to move back.
Comment #84
dpagini
Is it worth mentioning/considering any front-end regression from #59?
Comment #85
mcdruid
Ok, good that we got the expected result that time.
However, I misunderstood the errors in #78 and I think we can use $this->assertResourceErrorResponse(422, ...) instead of just checking for the status codes.
It caught me out that the function signature's slightly different for assertResourceErrorResponse() in rest and jsonapi.
With any luck these should work, and they test the full validation error messages when the file URI already exists.
Interdiff is with #78 as I'm trying to fix those failures. We know #81 worked when only checking status code.
[edit: cross-posted with 83 and 84... yes, thanks - I agree, and will put this back to RTBC if/when the tests pass as expected]
Comment #88
mcdruid
Nearly there, hopefully.
The error message in the rest module actually seems to come from \Drupal\rest\Plugin\rest\resource\EntityResourceValidationTrait, which means it's very slightly different to what I was expecting.
It also seems to include a trailing newline (\n), which we can see being added in the foreach() in that trait's validate() method, but that's one for a followup if we want to address it.
Let's try again...
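For reference, the message-building loop in EntityResourceValidationTrait::validate() is roughly of this shape. This is a paraphrased sketch rather than a verbatim copy of core, so details (exception class, exact wording) may differ slightly:

```php
<?php

use Symfony\Component\HttpKernel\Exception\UnprocessableEntityHttpException;

// Paraphrased sketch of the validation trait's behaviour: run entity
// validation and turn any violations into a 422 response.
$violations = $entity->validate();
if ($violations->count() > 0) {
  $message = "Unprocessable Entity: validation failed.\n";
  foreach ($violations as $violation) {
    // Each violation appends its own line ending in "\n" - this is
    // where the trailing newline on the final message comes from.
    $message .= $violation->getPropertyPath() . ': '
      . $violation->getMessage() . "\n";
  }
  throw new UnprocessableEntityHttpException($message);
}
```

The trailing "\n" after the last violation is the cosmetic issue spun off into the follow-up below.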
Comment #89
mcdruid
Follow-up for the trailing newline is #3082130: \Drupal\rest\Plugin\rest\resource\EntityResourceValidationTrait produces error messages with a trailing newline.
If that were to be committed first, we'd have to remove the newline from the $message in the REST module test here.
Comment #91
mcdruid
Oh dear, just some escaping I shouldn't have left in the expected error message this time, I think...
Comment #93
mcdruid
Got there in the end!
Comment #94
mcdruid
Comment #95
alexpott
Crediting @manarth for filing the issue, @dpagini and @Wim Leers for code review, and @cilefen, @catch and @xjm for issue triage.
Committed e103770 and pushed to 8.8.x. Thanks!
I've committed only to 8.8.x for now, just in case triggering the validation throws up some unexpected issues. That said, this should be safe for backport, because we should not be saving invalid entities in the database. Also, 8.8.x is just around the corner.
One possible follow-up is that the message is not the most user-friendly. I didn't push back on this because:
Comment #97
Wim Leers
#78 +++++++++++
Thank you so much for adding REST + JSON:API test coverage for this too! 👏
Comment #99
Gábor Hojtsy
Comment #100
pameeela
If there's a possible disruption as a result of this, can we please add a release note snippet to the IS?
If no disruption, we can just untag it as 8.8.0 release notes and include it as a critical bug fix.
Comment #101
catch
There shouldn't be any disruption here; it's 100% a bug fix, and something the vast majority of sites won't run into.
Comment #102
Wim Leers
Restoring the tag that @Gábor Hojtsy added and I think @catch accidentally removed :)
Comment #103
pameeela
Thanks @Wim Leers - we’re using the release notes tag for issues that should be called out as disruptive, so it should be removed - but this will still be included in the list of fixes.
Comment #104
Wim Leers
Ah! My bad :) Thanks for clarifying that, @pameeela! 👍
Comment #105
alexpott
Turns out there is disruption here :) - see #3089697: Regression due to File::setValidationRequired in _file_save_upload_single(). However, I think we should fix this before 8.8.0 is released, because I'm not sure there's any need for the disruption.