Problem/Motivation
We currently have some content types (containing lots of fields/paragraphs) that take a lot of time to save. In some cases over 30 seconds. A big part of this is caused by the recursive entity tracking and scanning a lot of fields.
Proposed resolution
Since we do not have any automatic jobs running based on entity usage information, it is ok if the entity usage information is a little bit out of date to improve the performance. The idea was to allow the entity usage tracking to be built via a queue on cron runs. We are then making sure cron runs every 5 or 10 minutes, do the information would never be out of date too much.
Remaining tasks
- Add a setting 'Update entity usage information via a queue', turned off by default
- Make sure we add a proper warning/disclaimer explaining this means entity usage information is always going to be out of date, and should only be enabled when the entity usage information is not being used to automatically update or delete content.
- Make the update hooks write entity IDs and the operation to the queue if the setting is enabled.
- Make the queue handle the tracking as usual based on the operation.
User interface changes
New checkbox in settings page.
API changes
Data model changes
| Comment | File | Size | Author |
|---|---|---|---|
| #25 | 3015287-entity-usage-tracker-queue.patch | 8.81 KB | ruuds |
| #24 | 3015287-24.patch | 9.47 KB | ricovandevin |
| #22 | 3015287-23.patch | 9.83 KB | ricovandevin |
| #18 | 3015287-entity-usage-tracker-queue.patch | 8.28 KB | betoaveiga |
| #12 | 3015287-12.patch | 8.39 KB | seanb |
Issue fork entity_usage-3015287
Show commands
Start within a Git clone of the project using the version control instructions.
Or, if you do not have SSH keys set up on git.drupalcode.org:
Comments
Comment #2
marcoscanoI'd really like to restrict this as much as possible, only leaving it to people who are really aware of the side effects it might have. In that context, how about we add the setting + the behavior if the setting is turned on, but don't expose it on the settings form?
This way the only way to enable the cron-based processing would be through CMI, programmatically or drush. There is already a similar mechanism if you want to have the warning message in a custom form which is not a delete form (see
entity_usage.settings.delete_warning_form_classes).Comment #3
seanbThat sounds perfectly reasonable. Let's do that.
Comment #4
johannijdam commentedHere is a patch for adding the new/deleted entities to the queue. This can be set by adding a line to the settings file. Read the docs for more info.
Comment #5
seanbThanks @johannijdam. Looking good, I'll try it on my local machine and let you know if it works for me.
Comment #6
marcoscanoThanks! Some minor comments:
typo: s/support/supports
Nitpick: These 2 lines span over 80 chars.
Technically this isn't the only way to set it... :) How about we rephrase this with something along the lines of
"This can be enabled by importing config, setting the value programmatically with custom code, or overriding it at runtime by adding the following line to your settings.php file:"?We need to add
queue_trackingto theentity_usage.settingsschema as well.Can
insertbe a constant in the queue class instead?Same here, I believe a constant would be better.
Same here.
Same here.
Same here.
Here too we would use the constants.
And one relevant consideration:
I wonder the side effects of this deferred loading. Kind of an edge case, but what if someone edited the entity between the moment this item was added to the queue, and the moment the item is processed? We might be trying to track something that doesn't actually correspond to what happened in the first save, since the entity is now different. I'd assume it would be safer to add the whole entity object to the queue item, at least this way each processed item will register whatever was being saved by the user. But I don't know if serializing in the queue item a whole entity object has any side effects at all...
Now that I think of it, I realize maybe an alternative approach could be to avoid the use of the queue processing, and rely on a DestructableInterface service. I would still keep the setting (which could be called something like "background_tracking" instead of "queue_tracking"), and when it's switched on, instead of triggering the tracking synchronously, we just register the tracking to be done at destruction time, which is after the request was sent back to the browser. This has the advantage of not needing to rely on cron running, and significantly reduces the time during which there would be a mismatch in usage numbers. (it would actually only be out-of-date during the time needed for the tracking operation itself to take place).
What do you think?
Comment #7
seanbI think this is what Search API does as well. I like it. We'll get started.
Comment #8
johannijdam commentedHere is the new version of the patch. Removed the queue method, and use the background tracking instead.
Comment #9
seanbFrom https://symfony.com/doc/current/components/http_kernel.html#the-kernel-t...
So this means the "background" process is only actually a background process when using PHP FPM. I think that's not a big problem, but still something worth mentioning in the README.txt.
Comment #10
marcoscanoThanks! I think the approach in #8 is much better. It would be great to have test coverage for the 5 operations we are affecting, making sure they work as expected when working in "background mode".
Re #9, I didn't know about that, thanks for sharing! Yes it's probably better to make this clarification explicit in the readme file.
Comment #11
marcoscanoComment #12
seanbRerolled #8.
Comment #13
duaelfrThis looks great!
I have issues on a website where the process is so long that I get a Gateway Timeout and lost all my edits on the content. I have to disable the module to be able to update some nodes, that's painful.
I'd love to see this commited!
I feel that the setting should be added in the settings form and have an upgrade path to preserve BC, though. I could do this but I'd need some guidance to write the tests asked in #10.
Comment #14
duaelfrI gave this a harder look and even if it does fix the Gateway Timeout issue, it still breaks the max_execution_time.
Would it make sense to delay a bit more those calculations to use a Queue instead?
Comment #15
seanbI guess we could technically move it to a queue, but that also means the tracked data is no longer accurate. For most sites I don't see that as a big issue, but this should definitely be optional at best. Since this could cause confusion it should not be enabled without fully understanding the consequences.
Let's create a separate issue to add queues.
Comment #16
moshe weitzman commentedIs anyone successfully using this patch in Production (or something similar)? The patch looks good to me, just looking for some assurance before we proceed.
Comment #17
seanbWe developed, and are using, the patch in the website of the University of Rotterdam (https://www.eur.nl) for over 2 years without any issues. Over 200 editors are working with the site, so I guess if there was an issue we would probably have found it. It would be great to get some more feedback though!
Comment #18
betoaveigaHere is a patch against 8.x-2.0-beta6 based on the patch to use a Queue instead of the background track at the end of the request. It checks that storage and entity can be loaded before processing. It might be paranoic, but we are loading things later, so we must check if those things can load.
Comment #21
ricovandevin commentedRerolled #12
Comment #22
ricovandevin commentedApparently the Web IDE created the new file in another location than I expected. Fixed.
Comment #23
askibinski commentedFYI this patch is also comitted to the entity_track module, which works with the 4.x branch of entity_usage.
Comment #24
ricovandevin commentedReroll
Comment #25
ruuds commentedI've changed @BetoAveiga 's patch to make it work with 2.0.0-beta12 (this patch uses the Drupal queue for updating the usage of entities).
Comment #26
alexpottI wonder if the performance issue here still the case. The entity usage plugins are now much more performant.
I think this might still be desirable but for me it is probably not a stable blocker anymore.