Problem/Motivation
When a webform fails validation, it may create an additional submission, rather than update the currently-edited existing draft submission.
Steps to reproduce
1) Create a webform with one element which can be used to fail server-side validation. I used a text field with unique validation.
1a) Set the webform submission settings / submission draft settings to "Allow users to save multiple drafts" and also "Automatically safe as draft when paging, previewing, and when there are validation errors".
2) Create and save one submission with a value to be duplicated in the next submission.
3) Revisit the webform's View tab and attempt to submit another submission with the same value (i.e. will fail unique validation).
4) Observe in another browser tab that there are now two submissions.
5) Click the submission's Submit button again.
6) Observe in the other tab that there are now three submissions: the original, the first draft, and a duplicate draft.
7) Repeating steps 5 and 6 will show that each failed submission attempt creates an additional draft submission
Proposed resolution
I believe that when a submission is autosaved, the submission id should be stored in a hidden form element and this should be used to update the existing submission.
This bug exist both when the submission is created via a webform element and directly from the webform's entity path's view tab. In my application, instead of a duplicate submission, I get a database integrity exception attempting to insert the same sid into the database. I have not yet figured out why my use case causes a different behavior, but I think a fix to this bug will also fix my issue.
One possibility would be to include the sid in the form along with a hash with the build id for security.
| Comment | File | Size | Author |
|---|---|---|---|
| #44 | webform_duplicate-drafts-3164012-44.patch | 9.22 KB | danchadwick |
| #42 | webform_duplicate-drafts-3164012-41.patch | 9.27 KB | danchadwick |
| #38 | 3164012-38.patch | 9.05 KB | jrockowitz |
| #36 | 3164012-36.patch | 8.72 KB | jrockowitz |
| #34 | 3164012-34.patch | 11.45 KB | jrockowitz |
Comments
Comment #2
danchadwick commentedThe crux of the issue is a check-and-egg problem. When adding a new submission, first the new submission is created with any default values and then the WebformSubmissionForm is created for that new entity. If upon a validation failure, the new submission is auto-saved, then the next time the form is submitted on the same path, another new submission (i.e. a duplicate) is created.
As a workaround for my application, I save the submission id indexed by the form build id both when the form is initially built and in a validation handler that runs after autosave. I do this in a WebformHandler, but it could be done elsewhere. I use a PrivateTempStore created via the 'user.private_tempstore' service. Then in the route controller that creates the webform form for drafts, I check the store for the incoming build id (which I awkwardly get from $_POST since there isn't a form yet), try to retrieve the submission id, and if found generate the entity form using the submission with an 'add' operation.
This may well not be the correct solution for fixing this in Webform itself.
Comment #3
jrockowitz commentedWe should figure out to get the saved submission's sid into the $form_state and use it.
Comment #4
danchadwick commentedBy the time you have a form_state, the entity is already attached to the form. It might be possible to use the form cache to retrieve the sid earlier, but some of that is protected as I recall.
I tried using form_state storage without success.
My solution doesn't scale well because the cache expiration is 7 days. For a busy site, that could be a ton of build ids.
Comment #5
jrockowitz commentedThe attached webform replicates the issue of multiple drafts when you click 'Submit' with entering any value.
I ran into the same issue around storing the sid in the $form_state and building the form.
The attached patch just tries to prevent the issue from occurring by not autosaving multiple drafts on validation errors.
Comment #6
danchadwick commentedPlease dear god no, let's not make the solution be not supporting the feature that is critical to my application. :)
Comment #7
danchadwick commentedHow about:
1) Include the sid in a hidden field on the form. Use #default_value to ensure the value is returned from the browser. Be sure the set the sid after autosave() or maybe save().
2) Include a hash of the sid with the build id on the form. Again, use #default_value.
3) In (I think) setEntity(), check the $_POST for the sid, and if found, validate the sid using the hash and the build_id.
4) If valid, load the submission for the sid. Validate that a) the current user is the owner, b) it is a submission for the same webform, c) the current user has update access, and d) it is a draft.
5) If so, use that submission in preference to the entity passed.
Thoughts?
Comment #8
jrockowitz commentedI think a hidden field can be tampered with via the browser.
I am not seeing any way to store the saved submission id during the validation state because the form is not being rebuilt.
Comment #9
jrockowitz commentedIf we have to securely pass information from one-page request to another, using the $_SESSION might be the most secure way to do it.
At the same time, creating a user session has performance implications.@see #11Comment #10
danchadwick commentedI believe a session is how webform 7 does this.
The idea of the hidden field is that it is also hashed with the build id to prevent tampering. This is the same security as a token, I think. Furthermore, it the submission is validated to both be a draft and owned by the author, so the only one being "harmed" is the author themself.
Or we can use my workaround. The documentation on PrivateTempStore specifically says it can be used for multi-step forms.
I'm not sure if webform 8 supports drafts for anonymous users, but that may affect the choice too.
Comment #11
jrockowitz commentedWebform 8 does support drafts for anonymous users.
I just realized that when drafts are enabled for anonymous users the $_SESSION is used.
@see \Drupal\webform\WebformSubmissionStorage::setAnonymousSubmission
@see \Drupal\webform\WebformSubmissionStorage::hasAnonymousSubmissionTracking
Comment #12
jrockowitz commentedBecause of the duplicate records, I am marking this as 'Major'. We also need to add some basic test coverage that makes sure duplicate submissions are no longer created for autosaved multiple draft with validation error.
Comment #13
danchadwick commentedI am assuming you're working on a solution. If not, let me know and I'll work on it. Thx for you support. And, good god, don't you sleep?
Comment #14
jrockowitz commentedI think the patch from #19 is a good starting point.
We need to add test coverage via \Drupal\Tests\webform\Functional\Settings\WebformSettingsDraftTest. It looks like the
test_form_draft_multiplewebform can be used to replicate the issue.@DanChadwick help writing the test coverage is appreciated.
Comment #15
danchadwick commentedApologies for not seeing patch #9. :|
I fixed two issues:
1) The session must be set in autosave() regardless of whether the submission is new or not. This is because for each GET or POST, an initial new submission is created before setEntity() is called. On the first POST, the submission will be new and the session sid is set. On the second POST, that session sid is used and deleted, but now the session is existing. In autosave, it needs to set the session sid again.
That said, I believe the only case in which this logic is needed is when the form will not be rebuilt due to errors. Therefore, I added a test for hasAnyErrors(). This avoids setting the session unnecessarily and potentially leaving the session in use.
2) It is possible that the submission might be deleted in between one POST and the next by another session. In this case, the session sid will exist, but there won't be a submission for it (i.e. load() will fail). I added a test just like the other cases in setEntity().
I have manually tested this with validation errors. I tested by my application and a simpler webform. I also tested this with preview (where this new code won't be executed) and I cannot create a failure, so I think preview and prev/next page always worked and will continue to work.
I have reviewed the security issue that occurred in webform 7 around this same feature and convinced myself that using $_SESSION in this manner is secure.
Tests aren't my strength, so I'm hoping you can write whatever test you feel is appropriate. Again, thank you for your support and dedication.
Comment #16
danchadwick commentedWeird. Didn't think I touched priority. Back to Major.
Comment #17
jrockowitz commentedThe patch looks good. I am fine with writing the test.
Comment #18
danchadwick commentedThought of a couple of issues.
1) User start creating a draft. It gets autosaved with sid 100 and has validation errors. The user then visits another path to generate a new draft on a different webform. Submission 100 is retrieved and used with the wrong webform. Solution is to save at least the webform id and maybe the source entity type and id in the session.
2) Same setup with submission 100, but the user starts creating another submission on the same webform (and with say the same source entity). Instead of getting the new submission with the defaults as defined by the path, submission 100 is presented instead. The solution here is (I think) to save the build_id in the session. Unfortunately setEntity() doesn't have access to $form, so it would have to get it from $_POST. Ordinarily this would give me pause since that can be forged, but I don't see how a forged build_id can match the session's build_id without it being proper operation. If build_id is checked, then maybe we don't have to check the webform id (and maybe source entity type and id).
If we do check build_id, we need to make sure that it doesn't break anything with multi-step webform, particularly using the browser's back button.
I will poke around these issues more tomorrow. Are there other concerns to worry about?
Comment #19
jrockowitz commentedWe should track the webform and source entity in the
$_SESSION. We could probably format the$_SESSIONkey aswebform_draft_multiple_sid.{webform}.{source_entity_type}.{source_entity_id}. We might need to create a::getDraftMultipleSessionIdhelper function.Because we are using and removing
$_SESSIONin the same request, I am not too concerned about a user having multiple concurrent drafts that cause a conflict.Comment #20
danchadwick commentedAm I wrong here? The setting the the session data and the reading of it happen in consecutive HTTP requests, not the same request. Gets set as the form has failed validation. Response goes back to browser. User clicks submit again. Route controller rebuilds a new submission for that route, setEntity reads the session data to replace the submission.
I will spend some time today validating what failures I can makes happen.
Comment #21
jrockowitz commentedYou are right, after validation fails the $_SESSION has the multiple draft sid until the next form POST.
We need to improve the tracking and storing of the multiple draft's sid. My concern with storing the multiple draft sid in a hidden field is more around complexity but it might be the only option.
Comment #22
danchadwick commentedHere's an implementation that uses a hidden field. The tricky part is that the WebformSubmissionForm form object is created, its setEntity() method called, and saved in the $form_state build info under as 'callable_object'. When the form is POST'ed, the cached form state is restored, restoring the outdated form object that was in effect when the form was created. This form object refers to the original newly-created submission entity, rather than the autosaved one. Therefore after autosaving, the form cache must be deleted for the form and the form and form_state will be built afresh on the next HTTP request.
I haven't (yet) looked at what can happen with the browser back button, but that should be tested. Overall I think this is a much sounder approach because the submission id is kept with the submission data in the form itself.
It does use $_POST data, but it validates the submission to be a submission that the user could edit anyhow. So I don't think a hacker can exploit this implementation. If this is a greater concern, it would be possible to also store the submission hashed with the build_id and validate that too.
The webform 7 multiple draft implementation is very similar and has passed security team review.
Comment #23
danchadwick commentedDammit. Some sort of stale form problem causing the priority to revert. Back to Major again.
Comment #24
jrockowitz commentedThe patch fixes the issue but I can't get the tests to confirm that the issue is fixed. This might be cause we are deleting the form builder cache. I will have to work on the tests later.
The source entity is not always set and to the comparison between the draft and current submission was throwing an error. I moved the comparison to \Drupal\webform\WebformSubmissionForm::compareSubmissions.
I also renamed
untrusted_draft_sidtowebform_draft_multiple_sid. I am not sure we need to advertise an 'untrusted' input.Comment #25
jrockowitz commentedI did a little more research in the automated testing issue and during automated testing, multiple drafts are being created. This issue might be specific to the automated tests.
Comment #26
danchadwick commentedThe reason I advertised "untrusted" is for the security of future developers. That value in the submission values is readily forged. I didn't want some future developer to look in $form_state['values'] with a debugger, see the SID, use it for something trusted, and create a needless vulnerability. Just my thoughts; it's your project.
I've been thinking more about the form cache. The first GET and first POST will each have unique build_id cache entries which once the submission is created will be outdated. Only the POST build_id will get deleted.
So:
- View webform (GET), new entity created, form cached as build_id 1
- Submit (POST). form and state retrieved from cache as build_id 1, updated, and saved as build_id 2. Autosave deletes build_id 2.
- Press browser back and submit form with build_id 1. Stale form retrieved and duplicate entry created.
I was thinking of disabling the form cache if multiple drafts+autosave, but I think AJAX needs them.
https://api.drupal.org/api/drupal/core%21lib%21Drupal%21Core%21Form%21Fo...
It might be necessary to delete the form cache in setEntity too so that the incoming build_id is deleted when there is no sid in the $_POST.
I will test more on this today.
Comment #27
danchadwick commentedAlso, I am seeing not just duplicate submissions, but SQL errors due to duplicated uuid's. I thought I'd mention it. I'm not sure why sometimes I get a a duplicate and sometimes I get a SQL INSERT exception. I presume the uuid is being cached via the form_state's referenced to the form object's reference to the entity.
Comment #28
jrockowitz commentedAt this point, let's agree that the solution to this problem is going to be a somewhat hack. Ideally, we need simple, self-contained, and easy to understand solution.
The attached patch is a different approach, where the draft is saved before the form is built and submission (secure) token is set to a hidden input. I am hoping this is the most stable solution and we can focus on improving this one block of code.
Comment #29
danchadwick commentedThanks for your work on this. I like your solution. I addressed two issues:
1) Form caching will make this not work for the reasons I described in #22. I realize now that by default the form is not cached, which is probably why #28 worked for you. In my application, there is AJAX on the form, which enables caching. I disabled caching when multiple drafts + autosave is in use and in my testing, this prevents the duplicate submissions (which also have duplicate uuids, so throw an SQL exception). I have tested this patch with AJAX, and I will test it more this afternoon unless you have a better patch to test.
2) The token is not secure. It would be possible for someone to forge the token and cause the form to proceed with it. I'm not sure of the exact failure mode -- whether it is information disclosure or also data forgery. I added the submission test which limits the issue to other submission by the same user on the same webform with the same source entity. Since the user has access to these anyhow, I see no vulnerability. I'm not sure whether an access test should be added too.
Comment #30
danchadwick commentedPatch #29. Whoops.
Comment #31
jrockowitz commentedThe token is being used securely because \Drupal\webform\WebformSubmissionStorageInterface::loadFromToken checks the webform, source entity, and current user account in the query. The token is being used exactly the same way when it is passed via the URL to load a multiple draft.
I don't think we need the
::compareSubmissionsmethod but I will try to clean up the patch and add some test coverage.Comment #32
jrockowitz commentedI added the test coverage, moved the code to WebformSubmissionForm::setEntityForDraftMultiple, and added more comments.
I am open to any improvements.
Comment #33
danchadwick commentedI'll test more tomorrow, but "$form_state->disableCache();" needs to go before the test for 'op'. If the form is allowed to be cached, buildForm is never called and the stale form object is used.
Comment #34
jrockowitz commentedThe automated test is not catching any cache issue. If we do find a caching we should be able to test it.
Attached is a mix of the two approaches and might address potential caching issues.
Comment #35
danchadwick commentedTo reveal the form cache issue, the form cache must be used. The way that happens for me is for the first operation on the new submission to use AJAX. The creates a POST which updates the form and caches it. I'm setting breakpoints in FormBuilder on setCache and getCache.
#34 creates exactly one extra submission.
1) Initial get. No token so setEntity() does nothing special. buildForm() creates the token field with the token for a new (unsaved) submission. "OV-27sfX1HSX7JlVpEdg45fkAgxSgFuv3IaXvT2zZys"
Form is not cached in FormBuilder::processForm(). Draft is not saved yet.
2) AJAX POST request. setEntity() finds the token from 1) in the form but not in the database, so it saves the new second submission. "B0sVW85B_t1m-rp_wccUCHhipSealugXm9_XSAPfozw"
Form cache miss in FormBuilder::processForm(). buildForm() called, creating token field with this token.
Form is cached in FormBuilder::rebuildForm()
Note that the value of the token in the browser's form is still from step 1 ("OV-....")
3) Form submit. setEntity() finds a token in the POST data, but not in the database (because that draft was never saved). It saves a different new draft. "BDu5h_ZEsKPxUmyszD-arilJTSWAkc7-ZWYshAS2Gtc"
FormBuilder::buildForm() has a form cache hit (form was cached in 2). The cached $form_state refers to the form object when it was cached. $form_state->getBuildInfo()['callback_object']->getEntity()->getToken() is "B0sVW85B_t1m-rp_wccUCHhipSealugXm9_XSAPfozw".
And now there are two submissions.
Comment #36
jrockowitz commentedYep, I was able to replicate the Ajax issue.
LOL. I think we are back to using $_SESSION since it is the most reliable way to pass data from one page request to another.
Based on our previous discussion, we need to track the submission's token and form's build id. The token and build id are unique and random enough that I think we don't have to worry about forgery.
Please review the attached patch.
Comment #37
danchadwick commentedTesting results:
- Initial Get. No special processing in setEntity(), no session, no cache set
- AJAX Post. No special processing in setEntity(), no session, cache miss, cache set. Token: "JzyL8TU1-txa1kscouhzRtjJJtf0DosIpa_FYfv9ftc"
- Submit #1. No special processing in setEntity(). Cache hit; form restored. Session set "JzyL8TU1-txa1kscouhzRtjJJtf0DosIpa_FYfv9ftc" => "form-GfC-TpxHnFjSPSKrYyiK1iYMtejOIm51EEuFod7uBnU". No new cache set.
- Submit #2. Session detected, submission restored into form object, session unset. Cache hit; form restored (the version created from the AJAX Post). Attempt to save the entity in autosave() results in SQL exception, duplicate uuid on INSERT.
Because a form cache his restores the form object (WebformSubmissionForm), and because that includes the entity, any cache hit MUST be of an up-to-date copy of the form object. This implies
A) always setting the cache (such as in Submit #1) OR
B) never setting the cache (such as in adding $form_state->cacheDisable() in WebformSubmissionForm::buildForm() ) OR
C) delete the now-stale cache when the submission is first created ($was_new).
Another issue is the browser back button, but I didn't get far enough to test that. I think because the form is failing validation, it keeps the same build_id.
Happy to help with a patch, but I'm not sure which direction you prefer. Again, thanks a TON for your help with this tricky issue.
Comment #38
jrockowitz commentedSo caching is becoming the main issue. I think the solution is to use the ::afterBuild callback to set the webform submission token in a hidden input value. Now we don't need use the $_SESSION. The very basic tests are passing.
Comment #39
jrockowitz commentedComment #40
danchadwick commentedTesting results:
- Initial GET. Nothing special in setEntity. buildForm adds hidden field. afterBuild adds the value "eIXD5HvkT2I8Gon3tkan53RjkJb7mA_zDhkj1fQOGBE". Not cached.
- AJAX POST. Token found in $_POST, but not in database. New entity has token "KRPoKYWsZPEf0mxwj8PrWnkgTsjv54qp7e8L1hbGH_I". Cache miss. formBuild and afterBuild set hidden field to this new token. Form rebuilt. Submission is the same. formBuild creates field without a #value. Form is cached. afterBuild sets field to "KRPoKYWsZPEf0mxwj8PrWnkgTsjv54qp7e8L1hbGH_I". Submission hasn't been autosaved yet.
- Submit #1. setEntity finds token in $_POST but not in database. Yet another new submission with token "bhQ7si_c3BJZRzTAFTCQui-wI_MJM2arn5Iyc0iTSe0". Cache hit has hidden field without a #value. Afterbuild adds the token, but since there was a cache hit, this is from the cached form object's entity, which is "KRPoKYWsZPEf0mxwj8PrWnkgTsjv54qp7e8L1hbGH_I". autoSave creates a new submission, the one with this token ("KRP..."). Form is not cached because it not being rebuilt.
- Submit #2. setEntity finds the token and retrieves the submission from the database. Token is "KRP...". Cache hit. Cached form object's entity's token is "KRP...", BUT the cached form object's entity is still new. afterBuild is called and sets the hidden field's #value to "KRP...". autosave is called with the submission with token "KRP...", but new. SQL INSERT exception.
The cache cannot be allowed to contain a reference a form object which has an out-of-date entity.
I still think of the strategies from #37 is needed, or an alternative.
Comment #41
jrockowitz commentedI am stumped. I am going to take a break from working on this issue.
I am open to disabling the cache. Please include some test coverage with any patch.
Comment #42
danchadwick commentedI chose to start with your latest patch. I prefer the hidden field over the session because it is bundled with the submission data and it isn't a global variable (as a session is). There is also no session data to be left hanging should the submission be abandoned. I prefer the use of the token over the sid in the hidden variable because a) the token exists before the submission is saved and has an sid and b) it is inherently immune to forgery which sid's are not.
The only changes I made relative to #38 are
a) eliminating the two-step creating of the hidden field because I'm hidden field will always reflect the form_state's form object's entity, and
b) Deleting the form's cache whenever a new submission is autosaved with multiple drafts enabled. In this case, the submission in the form_state's form object's entity must be updated with the saved version. This will happen on the next POST in setEntity().
I left your tests from #38 intact. I think it's important (actually more important) to understand the form handling as follows:
WITH AJAX AND CACHING
- Initial GET. New submission. setEntity() nothing special. buildForm adds hidden field with value "-l3R789MFil4CttSa5AntyUntA2FTOcqAwvTYT-I4L0". Not cached.
- AJAX POST. setEntity() detects the token in the field, but the submission isn't in the database. It uses the second new submission with token "fdS9dRsPBVKitNM8I8VnyMp5jtU97teH-3qQ4YOtw1Y". Cache miss. buildForm add a hidden field with "fdS...". Form is rebuilt. buildForm() adds the same hidden field with "fdS..." Form is cached.
- Submit #1. setEntity() again detects the token but doesn't find the submission in the database. It uses the third new submission with a token of "-bzvYfm3oeyNGUaTNmnpTqGWBLeD_46KlMYjXRHpTeg". Cache hit. The form's hidden token is now "fdS...". autosave() saves the new submission with a token of "fDS". It deletes the form from the cache because the cache is now state (referring to a new submission, not the just-saved submission). The form is not cached (not being rebuilt).
- Submit #2. setEntity() finds the token in the POST and successfully loads it from the database and uses it as its entity. Cache miss because submit #1 deleted the cache entry. buildForm() adds the hidden field with the token "fdS...". autosave() saves the existing submission. Because the submission wasn't new, the cache wasn't delete (but it doesn't exist anyhow). Form again isn't cached because not rebuilding.
- Submit #3. Same as submit #2.
- AJAX #2. setEntity() finds the token and retrieves the submission. Cache miss. buildForm adds field with token "fdS..." Form rebuilt. Again buildForm() adds hidden field. Cache is set.
- AJAX #3. setEntity() finds token and retrieves submission. Cache hit. Form rebuilt. buildForm adds hidden field again with "fdS...". Cache set.
- AJAX #4. Same as #3.
- Submit #4. setEntity() finds token and retrieves submission. Cache hit. No call to buildForm(). Submission saved (existing). Form cache NOT deleted because submission wasn't new. Cache not set.
- Submit #5. Save as Submit #4.
WITHOUT AJAX AND CACHING.
Really the same as above, except the cache is never set, so there are never cache hits.
Comment #43
danchadwick commentedThinking about this. The caching part of this problem (not the setEntity() part of finding the right submission) does not seem to be limited to multiple drafts. I'm pretty sure that a single-draft webform with a AJAX POST initial request will cache the form and cause the second submit to retrieve a stale cache form object and its entity.
I want to confirm that, but if so, the test for multiple drafts after the autosave should be removed.
Comment #44
danchadwick commentedAck. As I suspected in #43, the issue with leaving a stale cache entry in when autosaving exists even for webform which don't have the multiple-draft option enabled. The attach patch is the same as #41, but removes the test for multiple drafts in autosave() when deleting the form cache entry. Without this, the failure is an SQL insert exception because of the duplicate uuid.
This patch is ready for review.
Comment #45
jrockowitz commentedThe patch from #44 works perfectly. I think we have come up with the best solution for a very difficult and challenging problem.
I think all the errors that we were seeing (and now fixing) have been plaguing webform users for a long time.
Comment #46
danchadwick commentedThank you for your dedication to webform in general and this issue in particular. Tough one.
Changing the version since this patch doesn't apply cleanly to 5.19 anymore due to git collisions in the tests. I am using the patch in production now applied to 5.20-beta4.
Comment #49
jrockowitz commented