Overview

Follow-up to #3490565: Add rudimentary conflict prevention to the preview end-point
In that issue we put in logic to throw an exception if client didn't send the latest auto-save hash for the entity and regions. This prevents 2 users for editing at the same time and wiping out each others changes

Trying to get the e2e tests to pass I discovered that rapid fire requests could cause this exception to be trigger without only 1 user editing the page.

So we decided to scope #3490565 to just the backend parts

In creating the front-end solution we discovered 2 cases which the back-end solution made in #3490565 did not cover

  1. A browser session that opened an entity in XB right after the entity was published that therefore has no auto-save data but remains open and inactive while in another browser session the entity is edited and then published again and after that the original browser session makes a request with the stale entity/layout information thus wiping out the changes made since that browser session was open

    This is because we cannot determine the difference between

    • a client that has no autosave hashes because the entity has just been published and they have the latest changes
    • a client that has no autosave hashes because at time the browser made the last request there were no auto-saves because the entity had just been published, but now the version they have locally is 1 or many published versions behind

    Because \Drupal\experience_builder\Controller\ApiLayoutController::getAutoSaveHashes() only returns the hash data for entities that have current auto-save entries, in both scenarios above it will be return an empty array. therefore we don't send enough information back and forth between the client and the server to differentiate between these 2 scenarios.

    If we don't differentiate between these 2 scenarios then client in the second case could submit a request that would pass validateAutoSaves() but would save information in the auto-save that would wipe out the published changes and revert back to previous version of the entity.

  2. Rapid fire client requests, or requests with a slow network

    Because in 0.x \Drupal\experience_builder\Controller\ApiLayoutController::validateAutoSaves() will throw an exception if the client does not have the latest auto-save hashes, 1 client request cannot be fired until the previous client request has returned with the auto-save hashes. Because we don't want to lock the UI until requests come back from the server the current solution will not work

Proposed resolution

  1. To handle first case in the overview where a client session has old entity/layout/model data AND no auto-saves is able to make a request:

    Change \Drupal\experience_builder\Controller\ApiLayoutController::getClientAutoSaveData to return nested array with both

    • hash: the hash from the auto-save entry
    • autoSaveRevision: (the latest saved revision if revisionable otherwise a hash of $entity->toArray()) + changed time if supported

    In practice for config entities this would just be the hash of the entity array. For nodes and pages it would be [REVISION_ID-CHANGED_TIME]

    This means that \Drupal\experience_builder\Controller\ApiLayoutController::getAutoSaveHashes() would never return an empty array even if there were no auto-saved entries. Therefore the server can distinguish between a client session that was started 1 or more saved versions ago and a client that was just opened, even if they both have no auto-save hashes

  2. Handing "Rapid fire client requests, or requests with a slow network" section in the summary:

    Because we don't want to lock the UI until the last request returns and we trust the the client to keep track of updated data if requests from the same client session, we don't need to check the hashes are the same. This allows a more seamless UX for users on slow networks. But we don't want a user to be able to switch to different browser tab and have the hashes not be checked because it might not have the latest changes. So for this reason a client session is the same browser tab but not some browser session and different tabs.

    With each request to experience_builder.api.layout.patch and experience_builder.api.layout.post we will require a unique clientInstanceId. The server will check if this client id matches the last client_id in the auto-save entry for all of the entities being edited, page or node + regions. If they all match then the auto-hashes do not need to be checked for matches but the autoSaveRevision values will still be checked.

  3. In \Drupal\experience_builder\Controller\ApiLayoutController::validateAutoSaves() remove the special casing for phpunit tests.

User interface changes

If another user has triggered an auto-save since the current user has made changes then the current user will get an error

CommentFileSizeAuthor
#55 1127-2.png460.56 KBneha_bawankar
#55 1127-1.png150.9 KBneha_bawankar
Command icon Show commands

Start within a Git clone of the project using the version control instructions.

Or, if you do not have SSH keys set up on git.drupalcode.org:

Comments

tedbow created an issue. See original summary.

tedbow’s picture

Issue summary: View changes
tedbow’s picture

Title: [PP-1] Enforce conflict enforcement outside of tests and e2e tests » [PP-1] Add client support conflict detection via the `autoSaves' key

tedbow’s picture

I brought in the changes from #3490565: Add rudimentary conflict prevention to the preview end-point because I made progress getting the e2e test to pass, so I thought I would the current state

tedbow’s picture

The only 2 e2e tests that are failing now are `entity-form-field-types-test.cy.js` which does pass for me locally and `media-library.cy.js` which did fail locally but I didn't see an 409 request errors in the log. So maybe these are unrelated random test errors? Not sure, retesting

larowlan’s picture

Title: [PP-1] Add client support conflict detection via the `autoSaves' key » Add client support conflict detection via the `autoSaves' key
Status: Postponed » Active
wim leers’s picture

Priority: Normal » Major
Issue tags: -stable blocker +beta blocker, +Usability

AFAICT this is also a beta blocker? Otherwise we haven't solved the current auto-save-data-loss scenarios described in #3490565: Add rudimentary conflict prevention to the preview end-point?

nagwani’s picture

tedbow’s picture

I am actually surprised that no existing e2e tests failed 🎉

2 new tests failed

  1. rapid-component-addition.cy.js: I am not sure this can really be test reliability give the random test fails we have in e2e and this test would solely rely on timing. I am removing it
  2. auto-save-conflict.cy.js: This was attempt to try to simulate 2 users with 2 different browser sessions having a conflict. I am not sure how to do this in cypress and searched around a little and couldn't get it to work. Someone with more experience might have better luck
jessebaker’s picture

Assigned: Unassigned » tedbow

I am having a hard time coming up with a robust solution on the client side.

There are two different issues.

The first issue is concurrent editing from two different client instances. That can be either two different users or the same user in two different tabs.

This is solved by the autoSaves hash being passed back and forth and checked and a warning being issued in the instance that a conflict/mismatch arises. ✅

The second issue is proving more difficult to tackle. This issue occurs when a single user in a single client is able to trigger the mismatch scenario by making multiple requests in quick succession.

The issue happens in the following scenario:

  1. The client sends an update to the server with the current autoSaves hash (we’ll call it 1).
  2. While that request is still in flight, the client sends another update to the server. This request also has autoSave hash 1 because the first request has not returned yet with the updated hash.
  3. The server receives the first request and generates a new autoSave hash (2)
  4. The response to the first request comes back to the client and the local autoSave hash is updated to 2.
  5. The server receives the second request (which has autoSave hash 1) and rejects it (because it should be 2!) as 409
  6. The response to the second request is received by the client as 409 and the user is shown an error.

Potential solutions (to problem 2)

  1. “Lock” the UI when a request is in flight so that subsequent requests cannot be sent
  2. Silently discard new requests made when a request is in flight (this is what Claude’s attempt is doing in !MR1080)
  3. Mark in flight requests as "aborted" when a subsequent request is made. When an aborted response returns, discard the returned data knowing that the second response is on the way.
  4. Create a client-side “pending” system. While a request is in flight, subsequent data updates to the layout/model are stored as a local variable. When a response is received the local variable is checked and if the variable is set, generate a new request with the data from the variable combined with the autoSaves hash from the most recent response. Then Immediately unset the variable. If multiple data updates are made before the response comes back, they can each just overwrite the local variable as each change is a cumulative data that includes all previous data updates.
  5. On loading the React app generate a uuid to use as a client instance ID. Alongside the autoSave hash also send the client instance ID to the server. When the server receives consecutive requests with the same client instance ID, bypass the autoSave hash check as we know that they have come from a single instance and thus can’t be conflicting concurrent changes.

I've investigated the above solutions and have the following conclusions:

  1. This has poor UX implications. If we visually indicate the UI is locked it appears flickery on a good connection. If we just silently ignore user actions it makes the app feel unresponsive
  2. This allows the server and client to get out of date. The user makes 3 actions but the 2nd and 3rd are silently discarded. It looks like the actions have been made, but on the server only the 1st actually happened.
  3. I spent a long time implementing this only to find that it doesn't solve the issue. The subsequent request still has the out of data autoSave hash at the point it is dispatched so it still leads to a 409 error.
  4. I have spent even longer attempting to create this system. After 2 days I've not been able to successfully get this working with RTK query. I DO think it's possible, it's just VERY difficult and is resulting in a large amount of additional complexity
  5. I have just had this idea and I'm writing this comment on the issue because I'd like to get a second opinion on if its feasible. I think it would work and is less complex than the pending system above.
tedbow’s picture

I think there is pontential problem with #11 5)

they have come from a single instance and thus can’t be conflicting concurrent changes.

Is this true? What if the client makes 3 request in sequence and for some reason request 1 takes a long time to connect? Basically request 2 and 3 are processed and then request 1 gets processed. Wouldn't that wipe out changes in 2 and 3? Is what I suggesting even possible, maybe on randomly sluggish networks? Maybe ok to handle this edge case?

larowlan’s picture

It may be that #3492065: Enable concurrent editing: Replace the postPreview action with atomic equivalents is a better use of effort if the approach here (which was meant to be a short-term quick fix) is not possible.

wim leers’s picture

Status: Active » Postponed (maintainer needs more info)
Related issues: +#3492065: Enable concurrent editing: Replace the postPreview action with atomic equivalents

+1 for re-assessing #3492065: Enable concurrent editing: Replace the postPreview action with atomic equivalents. It's now >6 months old, but still relevant.

Marking Postponed (maintainer needs more info) and leaving assigned to @tedbow to convey we should first check that avenue.

tedbow’s picture

Assigned: tedbow » Unassigned
Issue summary: View changes
Status: Postponed (maintainer needs more info) » Needs work

re #13 and #14 I am currently a -1 on doing #3492065: Enable concurrent editing: Replace the postPreview action with atomic equivalents because

  1. There is nothing in #3492065 for "User interface changes"(I just copied that part of #3490565: Add rudimentary conflict prevention to the preview end-point to here now)
    if we "Enable concurrent editing" there probably will be some very tricky UX problems and we should figure the desired outcomes before we start making the solution

    Will concurrent editing just work perfectly and we all agree exactly what that means so there is no describe the UX? 😜

    for instance what happens if I am editing props for component, which should show the updates I as type, at the same time another user does another operation that moved that operation at out of view? This would either be by the other user reorder components at the same level as mine or adding another component above it that puts my out of view. Or it could be changing the props of component above mine that changes the height.

  2. Is there no possibility of conflict still in that system? I understand it would be less possible because you are allowing 2 users to edit different parts of the page. But 2 users could still try to edit the same component where 1 user didn't start with the latest changes that were just auto-saved correct?

    If so think a lot of the problems documented in #11 still exist just at a different level down

    I think #12 out of sync requests from the same user could still be problem. Especially because we send auto-save requests as you type for single prop of single component

    So I think you we wouldn't be saving time but not having to solve #11 it would just be a different level, per component

  3. #3492065: Enable concurrent editing: Replace the postPreview action with atomic equivalents just seems more complicated so there would be more unknown unknowns. I think we are further along on the current implementation so there would less, less risk
tedbow’s picture

Just wanted to make clear my comments in #16 were all with the idea that we have limited time before Beta1 - re #3515932: Milestone 1.0.0-beta1: Enable creation of non-throwaway sites and probably have a lot of other issues. The product decision has been made that we don't need concurrent editing in Beta1. Not sure about GA. but I assume we have a lot of other features that were prioritized in for Beta1

If we had more development resources and there weren't other features that were deemed beta blockers I would be all for getting concurrent editing in before beta1. But also if also if we made that decision won't the first order of business be defining the UX for that?

jessebaker’s picture

@tedbow I've found a bug that I think needs to be addressed server side.

Steps

  1. Ensure you have no autosaves/pending changes
  2. Reload the page to ensure you start fresh
  3. Make a change (e.g. edit a component prop)
  4. Make another change. Note that these changes both make requests with the correct/latest autoSave hash in the payload
  5. Publish all changes
  6. Make a third change. Note that this request correctly sends the last autoSave hash that was received in .4
  7. The response to this third change is a 409.

I suspect the latest autoSave hash on the server side is either being deleted or updated when changes are published but not communicated back to the FE so the client ends up with an out of date hash.

I think that either a) publishing should not update the hash, or b) the response from a publish action needs to include the updated hash to the FE can store it for the next request.

penyaskito made their first commit to this issue’s fork.

tedbow’s picture

@jessebaker I found the problem that is causing #18

It is because during Publish we delete the auto-save entry, including the hashes and the client instance id, for the items that were published. This is because there is no auto-saved information at this point. Everything matches what should have been just saved. We don't want this to show up "Review X changes"

So when the client sends the auto-saved hashes when find a mismatch, client has auto-saved hashes and the server doesn't. Now we could change the server side mechanism to stored this information separate from the actual auto-save information we will need to publish but that would make the server side more complex. Also it is also possible that this particular entity will never be edited within XB again, so this keeps us from having to keep auto-saved hash information for any entity that was ever edited in XB forever.

So I was going to suggest that when submit a publish request and get response from the server that you delete the auto-save hashes. That would solve the problem in #18 but I did make me think of another problem, that actually exists right now

  1. User 1 open xb/xb_page/1/editor, does not make any changes, there is no auto-saved hash because ther are not changes yet
  2. User 1 leave their browser open
  3. User 2 opens xb/xb_page/1/editor makes some changes
  4. User 2 publishes xb_page 1
  5. User 1, makes a change in their browser session that they never closed

I think the effect would be that User 1 would make an auto-saved version that would not be aware of the published changes and so would wipe out those changes when it was published again.

We could solve this by not deleting the auto-saved entry the server-side when publishing, and therefore throwing a conflict error for User 1. But again this would require us to keep auto-save hash information forever for all entities, which could start to add up on larger sites. Because we don't know if there is some browser session somewhere that was left open or not.

So I don't think this is a solution.

I think what we need to do is have the client be aware of the starting point version of the entity it is editing.

So for example for a page or node this would be version number, vid or changed property.
Basically the server would return to the client, along with the "autosaves" key, a "verison_id" key of the last saved version on the server. The client would always return this key.

So in the example above User 1's last request would properly get a 409 error because it's version number would be 1 version number behind. But this would also require the client to request /xb/api/v0/layout/xb_page/1 after publishing. Though after #3516517: [META] Selective Publishing and Reverting the client could try to be more selective and not request if the current page was not publish it would probably be safer to always just request it.

There are probably unrelated reasons why the client should always call /xb/api/v0/layout/xb_page/1 after publishing. For example Drupal has hook_entity_presave which allows altering an entity before it is saved. Of course XB will not be aware of any changes made in custom code here, therefore if XB doesn't call /xb/api/v0/layout/xb_page/1 after publishing the auto-save request could wipe out any changes that were made in any hook_entity_presave code

tedbow’s picture

I pushed up 86ce2085 which will return the revision id under "auto" . Probably at least autoSaves: Record<string, string>; will need to updated on the front-end because this now an object.

but it seems to work manually testing

tedbow’s picture

Note I have not had time to fix the phpunit tests but the e2e test passed so that is good sign. Just to need to update the phpunit tests to expect to a nested structure with revision info from the client.

jessebaker’s picture

@tedbow re: #18 #20

1) I think perhaps we should create/address this in a follow up
2) +1 to using 'changed' or even perhaps 'lastModified' and using a timestamp - I could see that information actually being useful elsewhere in the UI at some point.
3) After publishing I think we already refetch some data so I don't see it being a problem to update the page.

jessebaker’s picture

There are probably unrelated reasons why the client should always call /xb/api/v0/layout/xb_page/1 after publishing.

#20

As of my latest commit the FE now re-requests the page after publishing changes IF the current page is amongst the list of pending changes.

In my testing there is one remaining bug that I can see that is quite tricky to reproduce.

  1. In a browser tab start from a blank page, no autosaves (everything published)
  2. Copy the URL and open a new tab so you have two tabs with the same blank page state.
  3. In Tab 1, add a Heading component and immediately publish all changes
  4. Switch to Tab 2.
  5. Preview is now out of date (showing blank still, but should have a Heading on it)
  6. Add a Hero component
  7. There should be a conflict warning but there isn't.

From what I can see, the issue is that the autosave hash has autoSaveRevision but it's always "1". Nothing I do ever seems to increment that number (that is actually a string).

 {
  "autoSaveRevision": "1",
  "hash": "efa8dd3ce4f5f9e1"
}
tedbow’s picture

re #24 ok might know what is happening.....

tedbow’s picture

@jessebaker thanks for testing this. I was testing this with nodes and it was working.

Basically entity have a seperate revision id but if we create a new revision on every save this won't increment. I guess Page is not incrementing which I guess makes sense because we don't have a way to revert to old revisions or view old revisions AFIACT

So I used the "changed" time also in the revision for the client

tedbow’s picture

Title: Add client support conflict detection via the `autoSaves' key » For content entities add client support conflict detection via the `autoSaves' key
Related issues: +#3530827: For config entities add client support conflict detection via the `autoSaves' key

changing title to make clear we are only handling content entities, node and pages with layouts

We will need this too #3530827: For config entities add client support conflict detection via the `autoSaves' key

tedbow’s picture

Status: Needs work » Needs review
effulgentsia’s picture

Issue tags: +sprint
wim leers’s picture

How do I begin reviewing this MR? There's lots of comments here, and the issue summary doesn't talk about the client_id and clientInstanceId this MR is adding.

Can we get an up-to-date issue summary to facilitate a review? And ideally also sprinkle ℹ️ … comments over the MR to make it clearer still 🙏

P.S.: if #3512616: Document the reasons for not using Workspaces for saving in XB 1.0.0 had landed, then I'd have asked to update that ADR. In fact … any reason not to do that now, @tedbow? 😇🙏 That'd be even better than an issue summary update!

wim leers’s picture

Assigned: Unassigned » tedbow
Status: Needs review » Needs work
tedbow’s picture

Issue summary: View changes
Issue tags: -Needs reroll

re-rolled, added some more code comments and updating the summary but not finished. Leaving assigned to myself as I am working on the summary

tedbow’s picture

Issue summary: View changes
tedbow’s picture

Issue summary: View changes

updated more.... but still more to update

leaving assigned to myself

tedbow’s picture

Issue summary: View changes
tedbow’s picture

Issue tags: +Needs tests

Change the MR so that \Drupal\experience_builder\Controller\ApiLayoutController::patch only validates the region that contains the component being updated.
I chatted with @jessebaker, @wimleers and @effulgentsia about this. Right now we are just going to conflict if all entity auto-saves don't match because it forces the user refresh the browser and we can then certain other parts of the front-end are in-sync after the refresh

I need to add test to demonstrate how this works

tedbow’s picture

Issue tags: -Needs tests

Added test coverage for regions to \Drupal\Tests\experience_builder\Kernel\ApiLayoutControllerPostTest::testOutdatedAutoSave. \Drupal\Tests\experience_builder\Kernel\ApiLayoutControllerPatchTest::test already had a test for a conflict with a region

tedbow’s picture

Assigned: tedbow » Unassigned
Issue summary: View changes
Status: Needs work » Needs review
Issue tags: -Needs issue summary update

@wim leers Re: #31 I have update the summary. I have had a chance to update #3512616: Document the reasons for not using Workspaces for saving in XB 1.0.0 but I think this issue can be reviewed. Most of the back-end changes are in \Drupal\experience_builder\Controller\ApiLayoutController::validateAutoSaves. \Drupal\Tests\experience_builder\Kernel\ApiLayoutControllerPostTest::testOutdatedAutoSave is the most explicit test of this. \Drupal\Tests\experience_builder\Kernel\ApiLayoutControllerPatchTest::test() also catches ConflictHttpException which is thrown from validateAutoSaves()

tedbow’s picture

Created follow-up #3532056: For less conflict errors only validate region auto-save on layout PATCH, in that issue I mention we might just want to wait until #3492065: Enable concurrent editing: Replace the postPreview action with atomic equivalents. The back-end work for #3532056 would be simple but I am not sure about the front-end changes that would be needed. We would need to send different info from the client but we might need to reload differently.

Also added a todo in the code

nagwani’s picture

Assigned: Unassigned » larowlan
larowlan’s picture

Assigned: larowlan » Unassigned

Left a review (which I assume is why this was assigned to me)
Unassigning

tedbow’s picture

Assigned: Unassigned » larowlan

I addressed a couple points in @larowlan's review

larowlan’s picture

Assigned: larowlan » Unassigned

Replied on the issue
There's a few open threads but only one question from me about using ::toArray vs ::normalizeEntity

tedbow’s picture

Assigned: Unassigned » larowlan

Assigning back to @larowlan because I answered his question on the MR

larowlan’s picture

Assigned: larowlan » Unassigned

@tedbow's response regarding using the hash makes sense - thanks

tedbow’s picture

tedbow’s picture

Assigned: Unassigned » larowlan
tedbow’s picture

Status: Needs review » Reviewed & tested by the community

I going to RTBC this based on @bnjmnm MR approval and @larowlan saying the backend changes look good

wim leers credited bnjmnm.

wim leers’s picture

Assigned: larowlan » Unassigned

I trust Lee & Ben, so going ahead and merging! 🚢

wim leers’s picture

Assigned: Unassigned » tedbow

Approved by proxy: since @larowlan +1'd the BE.

Unfortunately:

Rebase failed: Rebase locally, resolve all conflicts, then push the branch. Try again.

@tedbow: can you rebase locally & merge once it's green again? 🙏

  • tedbow committed 9f99e82f on 0.x authored by jessebaker
    Issue #3526907 by tedbow, jessebaker, larowlan, bnjmnm: For content...
tedbow’s picture

Assigned: tedbow » Unassigned
Status: Reviewed & tested by the community » Fixed

🎉

neha_bawankar’s picture

StatusFileSize
new150.9 KB
new460.56 KB

Tested changes on branch 0.x , for following scenarios :

Scenario

Result

Status

  1. User A, open "Test Page" immediately after publishing
  2. Do not make any changes; leave the browser idle and do not trigger auto-save.
  3. User B, open "Test Page" and add new component to page (Druplicon)
  4. User B, publish "Test Page"
  5. User A, make a change and save or trigger auto-save.
testcase01 PASS
  1. User A, open "Test Page" add component image
  2. User B, open "Test Page" (Image and Druplicon both present )
  3. User B, publish "Test Page"
  4. User A, reload page
Changes published with Image and Druplicon PASS
  1. User A , start making rapid changes (such as fast typing or repeated component moves) that trigger multiple almost-simultaneous auto-save requests.
  2. Simulate slow network conditions (using browser DevTools network throttling set to 3G)
  3. Attempt to continue editing
  4. Publish the changes before all the made changes as reflected
test2 PASS
  1. In User A, open "Test Page" right after publish
  2. In User B, publish the "Test Page"multiple times (create several versions).
  3. In User A, make a change and try to auto-save or publish.
User A on edit gets message to refresh the page , and on Refresh is updated to latest version PASS
  1. In User A, open "Test Page" right after publish.
  2. Add component to page “Image“
  3. Leave page for few hours (1hr)
User A should be able to continue editing from the point were they left PASS
wim leers’s picture

Issue tags: -sprint

Thanks! 🥳

tedbow’s picture

re #55 @neha_bawankar wow thanks for the detailed testing report!!!

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.