Problem/Motivation
#597236: Add entity caching to core added the entity cache to core, but unfortunately it did not consider enabling it when loading an entity by revision ID.
There are contrib modules such as Entity Reference Revisions, Paragraphs and core modules such as Content Moderation, which rely heavily on loading entity revisions.
A developer might build a system where she is using ContentEntityStorageBase::loadRevision()
to load a default entity revision and by doing so one will currently not profit from the entity cache.
ContentEntityStorageBase::loadRevision()
is neither using the static entity cache nor the persistent entity cache, which has as a consequence that the function is inconsistent with ContentEntityStorageBase::load()
, as if you retrieve an entity multiple times in a single request with the latter one it will always return a reference to the same entity object, but if you retrieve it multiple times with the former one you will always get a new entity object and thus a completely different object reference.
The attached test demonstrates all the problems.
Proposed resolution
- Enable static entity caching for
ContentEntityStorageBase::loadRevision()
for all revisions. - Enable persistent entity caching for
ContentEntityStorageBase::loadRevision()
for all revisions. - Ensure that
ContentEntityStorageBase::load()
andContentEntityStorageBase::loadRevision()
return a reference to the same entity object, if loading the default entity revision. - To ensure backwards compatibility
$storage::resetCache()
will be invalidating all cached revisions. Include an optimization to invalidate only the default revision when the entity has only revisionable fields.
Remaining tasks
Review.
User interface changes
none
API changes
\Drupal\Core\Entity\RevisionableStorageInterface::loadRevisionUnchanged()
to load an entity revision bypassing the static cache.\Drupal\Core\Entity\RevisionableStorageInterface::resetRevisionCache($revision_id)
for reseting the static and persistent cache for specific revision IDs$storage->loadRevision($revision_id) === $storage->loadRevision($revision_id)
now evaluates to TRUE$storage->load($id) === $storage->loadRevision($revision_id)
now evaluates to TRUE if $revision_id references the current default revision
Data model changes
none
Comment | File | Size | Author |
---|---|---|---|
#139 | plus-persistent-cache.png | 227.02 KB | deviantintegral |
#139 | static-cache.png | 228.04 KB | deviantintegral |
Issue fork drupal-2620980
Show commands
Start within a Git clone of the project using the version control instructions.
Or, if you do not have SSH keys set up on git.drupalcode.org:
Comments
Comment #2
hchonovI am currently working on a possible patch.
Comment #4
hchonovFirst implementation....
Comment #6
hchonovComment #8
hchonovComment #10
hchonovComment #12
hchonovComment #13
hchonovComment #14
hchonovIf loading a default revision by revision id it should be added to the static entity cache as well.
Comment #15
hchonovComment #17
hchonovRerolling for 8.1.x.. Nothing changed between this and #14.
Comment #19
hchonovAs now the function loadRevision is moved to the ContentEntityStorageInterface we have to use this inteface when writing tests and mockups for it.
Comment #20
catch+1. When entity caching was added originally, we didn't have separate methods for loading revisions. This is also why #1730874: Add support for loading multiple revisions at once wasn't implemented for revisions at the same time (i.e. node_load(1, 2) was the API to get node 1 at revision 2. Given the workflow intiative there'll be a lot more need for loading and viewing revisions - as well as the contrib modules that reference specific revisions (although I'm skeptical about that except for very specific cases).
Haven't reviewed the patch yet.
Comment #21
cspitzlaymostly a reroll, the change to the use statements is already in core now.Sorry, I misread the rejects. Will post an updated patch.
Comment #22
cspitzlayreroll of #19Comment #23
cspitzlayreroll of #19
Comment #25
cspitzlayComment #27
hchonovComment #31
jibranShould we profile this as well?
Comment #32
hchonovNot really sure that this needs profiling as it addresses mostly inconsistencies :
So the changes here are needed and not necessary done for performance but mostly for consistency reasons.
Comment #33
timmillwoodAfter working on #2809227: Store revision id in originalRevisionId property to use after revision id is updated I found that most of the stuff is already in the patch from #27 so I've been looking at fixing the failing tests.
The only failing test that matters is
\Drupal\Tests\content_moderation\Kernel\ContentModerationStateTest
and this seems to be due to where we load the ContentModerationState entity in\Drupal\content_moderation\Plugin\Field\ModerationStateFieldItemList::getModerationState
.We use $storage->loadRevision(), but $content_moderation_state->fields['moderation_state'] and $content_moderation_state->values['moderation_state'] seem to have different values.Comment #34
timmillwoodThink I'm getting closer to what's causing the issue here:
In this part of the test we load the ContentModerationState entity, change the moderation state.
then in the save method we don't actually save the ContentModeration state, we save the parent entity. Therefore the ContentModerationState where we changed the moderation state is still cached with the wrong information.
Comment #35
timmillwoodThis should fix it.
Comment #37
timmillwoodComment #38
hchonovYou do not need to reset the cache here. You could only exchange the entity :
Comment #39
timmillwood@hchonov - That produces a new error, will dig into why:
Comment #40
timmillwoodSo
$related_entity->moderation_state->entity = $this;
as suggested in #38 won't ever work because we need to set the moderation state to a ModerationState entity, not a ContentModerationState.Therefore
$related_entity->moderation_state->entity = $this->moderation_state->entity;
or$related_entity->moderation_state->target_id = $this->moderation_state->target_id;
(as set in #35) are the only options. Although both if these require a cache reset.Comment #41
hchonovOk I now understand what you mean but I do not understand really your previous statement:
Whenever you load an entity in the system it should be a reference to the same entity object no matter if you load by id or reference. That said I do not get it how at one place you have the previously cached entity and at another one an other object. Do you think this might be a bug in the patch? Would you please clarify in more details what exactly happens?
Comment #42
timmillwoodWe have a ContentModerationState which has an entity reference field to a ModerationState config entity, it also references the entity the moderation state is for (such as a node or block) but not via entity reference field.
In the test we change the moderation state of an entity by updating it's ContentModerationState. So load revision X of ContentModerationState, and change the ModerationState reference from Y to Z. When staving the ContentModerationState it doesn't actually save it just updates the $related_entity->moderation_state->entity. So next time revision X of ContentModerationState is loaded it still has ModerationState Z and not Y as it should.
Comment #43
timmillwoodIt'd like to use
getOriginalRevisionId()
to fix a bug in 8.2.x, so switching this to 8.2.x, but not sure it's a valid patch release patch,Comment #44
hchonovThe patch still has some places, which need a rework.
We could not remove the revision functions from the EntityStorageInterface until D9.
These two functions should be put into a new interface, otherwise this represents an API break.
Comment #45
timmillwoodAddressed items from #44
Comment #48
timmillwoodI think it's not applying because the patch was for 8.3.x and this issue was for 8.2.x
Ideally I'd like to see this in 8.2.x, but I can't see that happening.
Comment #49
timmillwoodComment #52
timmillwoodFor #44.1 I needed to add these methods back in more than just the interface.
Comment #53
amateescu CreditAttribution: amateescu for Pfizer, Inc. commentedIn general, how does this play with #1596472: Replace hard coded static cache of entities with cache backends? Would it buy us anything if we do that issue first? Or would that one be harder to do if we go with this one first?
Skimmed a bit through the patch and a few points stand out:
Wouldn't it be easier if we would key the static revision cache by entity ID first and then revision ID (i.e. $entityRevisions[$entity_id][$revision_id])?
Core assumes integer revision IDs in the storage class, so I think we can make the same assumption here and everywhere else we deal with a revision ID.
Related to the point above, we're missing the return type here.
Comment #54
timmillwoodI had not seen #1596472: Replace hard coded static cache of entities with cache backends, maybe it'd be better to revisit that? and in the mean time I go back to #2809227: Store revision id in originalRevisionId property to use after revision id is updated and just implement the
getOriginalRevisionId()
elements from this issue?Comment #55
BerdirNot sure if it matters in which order we get those issues in, the second will need a reroll but not sure which way round it will be harder.
should we only do this if we have a revision key? getEntityKey() will check it too, but it also checks first if it already has a key.
Not really for performance, more for reading the code.
this would then simply change to use the corresponding static cache class.
The first part of the comment doesn't really make sense to me.
I understand this exists to prevent that loading an entity by revision and then by ID or the other way round always returns the same, statically cached object?
About the @todo, what if we do what @amateescu suggested and cache it as [entity-id][revision-id]. If we then additionally do a special case for the default revision and cache that as [entity-id]['default'] instead of the actual ID, then we could actually directly return that if we have it?
Comment #56
hchonovAddressing #53:
If we do this then when the method ::loadRevision($revision_id) is called we do not now at this point to which entity id the revision id is assigned, sure we could iterate over the array and search for it - not sure what is better.
Addressing #54:
I would say that here we already have something working and test coverage for it, so as soon as the nitpicks are cleared it should be ready to go and later the other issue for exchanging the cache backend will need a reroll to implement the missing functionality. We have a problem here and the sooner we fix it the better.
Addressing #55:
Sure, we could first check if the revision entity key is present, but this adds just more code that is not necessary needed as getRevisionId will make this check as well and if the key is not present it will just return NULL.
Yes, you understand the intention correctly and yes I think the comment could need a refactoring :).
If we do it this way then for the default revision we would need two entries in the entity cache array -
[entity-id]['default'] and [entity-id]['revision_id'] so that loadRevision could search for the revision id in the entity cache array.
Comment #57
catchJust in general #1596472: Replace hard coded static cache of entities with cache backends has a working patch and needs reviews, and in turn would allow migrations to stop clearing persistent entity caches when they reclaim memory once the dependent LRU issue is in.
Comment #58
timmillwoodAfter talking on IRC today we decided to implement the
getOriginalRevisionId()
part of this issue in #2809227: Store revision id in originalRevisionId property to use after revision id is updated, then come back to this.Comment #59
amateescu CreditAttribution: amateescu for Pfizer, Inc. commentedHere's a patch without the new methods and the interface. This can be used after the other gets in.
Comment #60
timmillwoodI applied the patch from #59 on top of the latest patch in #2809227: Store revision id in originalRevisionId property to use after revision id is updated and all seemed to work ok.
Comment #61
hchonovUpdating the patch based on the changes in #2809227: Store revision id in originalRevisionId property to use after revision id is updated and I've made also a little bit of code refactoring.
Comment #62
hchonovIn #61 I've started optimizing loadRevisionUnchanged but forgot to finish my work .. oups.
Comment #63
hchonovIn doLoadRevision only write to the persistent cache if we are loading a default revision.
Comment #64
hchonov#2809227: Store revision id in originalRevisionId property to use after revision id is updated got committed so this is issue is not postponed anymore on it. I've re-rolled the patch.
Comment #65
jibranThanks for the re-roll some observations.
This can be removed now after #2809227: Store revision id in originalRevisionId property to use after revision id is updated.
'internal' seems redundant here.
How about restoring the
ignoreStaticCache
after theloadRevision
?It should be string|int instead.
AFAIK revision ID can only be int. Can a revision ID be string?
If we initialize
$result
withNULL
then we don't need this 'else'.Can't we do this after #2809227: Store revision id in originalRevisionId property to use after revision id is updated?
Just increasing the patch size so let's not do this.
We can use the parent methods and
return NULL;
.return NULL;
maybe?Let's use class constants.
Comment #67
hchonov#65
2., 6., 8. & 11. Done
1. The todo is about determining the revision id based on the enitity id, so I do not understand how #2809227: Store revision id in originalRevisionId property to use after revision id is updated is solving this?
3. There is no need of restoring it, as it is only set by loadRevisionUnchanged and read by loadRevision. I've actually forgotten to introduce the property to the class, so I've added it now.
4.1 Well content entities use integer for their ids so why should it be string|int instead of only int?
4.2 I've kept this identical to EntityStorageInterface and actually like said in 4.1 content entities use integers for their ids, so if changing anything maybe allowing only integers?
5. The doc is now defining the param of the revision id as being int.
7. Like in 1. I do not see how #2809227: Store revision id in originalRevisionId property to use after revision id is updated should be of any help here?
9., 10. I've added the same todo to the new methods now as the one from KeyValueContentEntityStorage::createTranslation :
I've also removed the changes to the content_moderation module, as there are new fails again and I am unable of understanding what happens there.
Comment #70
hchonovFixing the content_moderation test fails.
Comment #72
tstoecklerThis is problematic because
getFromStaticEntityRevisionCache()
can returnNULL
. In general I'm not sure why we try to fetch something from the cache when we already have an entity that we could just return.What is this?
Comment #73
tstoecklerJust found one more reason why the
return $this->getFromStaticEntityRevisionCache($entity->getRevisionId());
is wrong:Inside
$storage->loadUnchanged()
we end up calling$this->load()
, but before that we clear the static cache by callingparent::resetCache()
. But we don't clear the static revision cache. So the call to$this->getFromStaticEntityRevisionCache()
that then happens inload()
(insideloadUnchanged()
) will potentially re-use a statically cached revision that might have already been tampered with before. So it fundamentally breaks the contract ofloadUnchanged()
.Note that this particular issue could be fixed by explicitly clearing the static revision cache in
loadUnchanged()
below theparent::resetCache()
call.Comment #74
tstoecklerI think it is important that we add test coverage to expose #73 before trying to fix it. I tried to cook up something locally but failed so far, so will try to explain what happened:
An entity was being saved with an updated field value. As part of saving
$entity->original
was being populated with the return value fromloadUnchanged()
. Because of the above behavior,loadUnchanged()
actually returned the already modified entity (this part I could not yet isolate in a test, see below). Then when saving the dedicated table data, Drupal sees if anything changed for the given field, but since$entity->original
was actually identical to$entity
, Drupal ignored this field and did not persist the values to the database, even though the field values were (both) different from the ones in the database.So in other words:
This will print the value that the entity had before, and not
'bar'
.Regarding not being able to isolate this behavior, you need to create a situation where the entity that you have modified is in the static cache but the persistent cache does not have that entity. Then we will reach
$this->load()
inContentEntityStorageBase::loadUnchanged()
and the craziness begins...Comment #75
tstoecklerFirst a re-roll for #2776975: March 3, 2017: Convert core to array syntax coding standards for Drupal 8.3.x RC phase. Working on a test again now.
Comment #76
hchonov#72.1
This will not happen, because before we retrieve the entity from the static cache we first check if it is there and if not we put it into the static cache. This happens just two rows before the one you are referencing to.
#72.2
I guess this is a some left over from something I was experimenting with, so we can just remove it.
#73
This wasn't earlier the case. We previously cleared the cache by the CESB::resetCache implementation. This changed in #2753675: loadUnchanged should use the persistent cache to load the unchanged entity if possible instead of invalidating it and I've forgotten to update the patch in this issue here. So yes, we have to clear the static revision cache additionally. Good catch :).
Comment #77
tstoecklerIt still seems unnecessary, though. Why not just
return $entity;
?Comment #78
hchonovI was able to reproduce the problem and it would happen if the entity is statically and persistently cacheable and also revisionable. The use case is as follow:
$entity = $storage->create();
$entity->save();
$entity = $storage->load($entity->id());
// Do something to invalidate the entity cache e.g. create a field config and save it in an entity insert hook.
$unchanged = $storage->loadUnchanged($entity->id);
What happens now is that loadUnchanged will return the same reference to the entity object because the static entity cache has not been reset properly.
The attached patch has the test for this and will fail.
Comment #79
BerdirSee also #2859042: Impossible to update an entity revision if the field value you are updating matches the default revision. for a similar problem that exists in HEAD with forward revisions where changes aren't stored. That issue might also need this, although I'm not sure if we should implicitly rely on the static caching here, we might want to make it more explicit.
Comment #80
hchonovFixing the failing test provided in #78.
Comment #82
timmillwoodTempted to RTBC this, but it's a little over my head.
So just going to add a "looks good to me" comment.
Comment #83
vijaycs85Few review comments:
Minor: We could avoid 2 additional calls to getRevisionId by storing in local variable.
getRevisionFromPersistentCache already doing a postLoad() do we still need one here?
can be single line return statement.
empty if block
load() is already in EntityStorageInterface. Do we need to redefine here?
More than 2 spaces.
Comment #84
hchonov#83:
1. Done.
2. Nice catch.. Actually currently in HEAD we execute the post load only in the loading functions, not in the ones retrieving from the persistent cache, therefore we should remove that piece of code from the patch from getRevisionFromPersistentCache. Done.
3. Done.
4. Done.
5. I've redefined load, loadRevision and deleteRevision in ContentEntityStorageInterface in order for them to have an annotation that they are returning content entities, but this is not necessary change for the current patch -> reverting that change.
6. Done.
Thank you, @vijaycs85!
Comment #85
vijaycs85Thanks @hchonov. interdiff looks good to me. I would like to do a manual testing to make sure getting proposed resolution. If it's too much to ask/do "steps to reproduce" in issue summary (as I understand it's more of technical fix than UI change), can we have a test-only patch to prove it still fails?
Comment #86
hchonov@vijaycs85, I've extended the already provided ContentEntityStaticCacheTest in the patch. As it tests also the new method loadRevisionUnchanged I had to remove that piece of testing the new method in order to provide a test only patch, but of course I've kept it in the whole patch :).
Comment #89
jhedstromPatch no longer applies.
And a few minor issues:
Any
@todo
needs a corresponding issue linked from the code comment.Can the revision_id really be a string? Elsewhere in the patch it's just an int...
Ditto here.
Comment #90
Sam152 CreditAttribution: Sam152 as a volunteer and at PreviousNext commentedStraight reroll for the minute. Looking at the feedback in #89 and reviewing again shortly.
Comment #91
hchonovUnfortunately the current patch is not considering non-revisionable fields and they will be cached together with the entity, but we don't have any way yet of loading their current values. I can think of three ways for solving this:
\Drupal\Core\Entity\Sql\SqlContentEntityStorage::saveToDedicatedTables()
always call::hasFieldValueChanged()
for non-revisionable fields and if they've changed invalidate the revision cache tag for that entity - still have to introduce the revision cache tag.Any opinions on which option is or might be the better choice?
Comment #92
Sam152 CreditAttribution: Sam152 as a volunteer and at PreviousNext commentedWould it be worth adding a test case for the scenario in #91 so a few of these options can be evaluated.
Comment #93
kfritscheNot sure why in #90 the method isDefaultRevisionPublished was copied from ModerationInformation to EntityOperations.
So for now a re-roll without that move.
Still open #89 and #91
Comment #95
hchonovThis is a re-roll and at the same time adaptation to the new storage logic as a lot of things have changed in the storage in the meanwhile and no straightforward re-roll was possible. I've tried to further simplify the patch and reuse as much as possible from the provided infrastructure.
This patch still doesn't fix #91.
Comment #97
hchonovOne of the test fails was caused by an endless loop triggered by the workspace entity query class, where if during entity load an entity query is executed then in the preparation phase another entity load will occur leading to the endless loop. I am not sure how we can deal with such a problem, but for now I've flagged the query with a tag, which is checked in the workspace entity query class to prevent loading the entity again.
Comment #98
amateescu CreditAttribution: amateescu for Pfizer, Inc. commentedWe already have a recursion protection in
EntityOperations::entityLoad()
, which is a bit ugly but it will be removed by #2924218: Clearing the persistent entity cache every time we switch between workspaces is super wasteful, so I would suggest to either wait for that issue to land or re-work this patch on top the latest one over there so you don't waste too much time chasing soon-to-be-dead code :)Comment #99
hchonov@amateescu, it is nice, that there is already an issue about that :). I think I am gonna stick for now to the small fix I have here as otherwise the patch will get much bigger ... I guess I'll have to deal with the removed code later :).
So I was trying to develop a test for #91, but it wasn't failing. Then I've realized that the clearing of the revision caches wasn't optimal and this is why the revision caches have been always deleted on entity save... I've optimized the patch now and the test is failing now as we would've expect.
Comment #100
hchonovAnd here is solution for that problem.. It is not nice but we would have to wait for #2862574: Add ability to track an entity object's dirty fields (and see if it has changed) to get a nice solution.
Comment #103
hchonovI was struggling a lot with the latest test failures, until I've asked @tstoeckler for help and he found the reason for the test failures very quickly. We think that the problem existed even before the patch, but we don't why it hasn't occurred before. What happens is that a duplicate of an entity with a path alias is created and the path alias is not unset on the duplicate leading to two entries of the same alias pointing to two different entities. Later on an incoming request for that alias with the patch applied the duplicate entity was picked instead of the first one.
@tstoeckler++
Comment #105
hchonovHere is a fix for the last test failures... I had to add a clean up of the persistent revision cache on entity delete and entity revision delete.
Comment #106
hchonovI was working on additional test coverage for deleting entities and entity revisions and discovered some more problems with the patch, which I've fixed in the new version:
- The entity revision ID field is not flagged as revisionable and therefore it has to be excluded explicitly of the check for changes in non-revisionable fields.
- During the re-rolls somehow I've forgot the piece of code updating the default revision status.
- In some comments I was using "ids" instead of "IDs".
Comment #108
hchonovThe test failures are showing that the patch is starting to function properly :).
We have places in the tests where field values are altered directly in the revision table through the database API and the entities are loaded by revision IDs. As the revisions weren't cached previously this was fine, but now the revisions are cached persistently and therefore the persistent revision cache has to be cleared after such an action.
Comment #109
tstoecklerStarted reviewing, got about a third of the way through. Most notably, this patch now needs to be adapted for #1596472: Replace hard coded static cache of entities with cache backends.
Some more comments:
There should be an empty line before the
@see
's. Also, I don't think we're allowed to use omit the class name even if it's a reference within the same class.This kind of seems the wrong way around. Couldn't we instead - whenever we put a revision in the revision static cache check if it is the default revision and then populate the entity static cache directly?
I know this is mostly copied from
::loadMultiple()
, so feel free to ignore, but wanted to mention it nonetheless.$passed_*
is a bit strange, in my opinion.!empty()
could be omitted, although I guess some people like keeping it around.The equivalent code in
::loadMultiple()
contains a check for (the equivalent of)if ($passed_revision_ids)
here. Unless I'm missing something, that is in fact pointless, so that it's fine to omit it here, but I think we should then also remove it from::loadMultiple()
. I'm fine with doing that in a follow-up if you don't want to do that here.Here, again,
::loadMultiple()
has a check for (the equivalent of)if ($revision_ids === NULL || $revision_ids)
and here, it actually is a performance improvement to avoid that call, unless I'm missing something.Why can we remove this? It seems it should still be called, at least on the queried entities?
The comment doesn't match the code, or at least it's written in an unclear way. Per the comment I would think we only have to clear the static revision cache if a new revision was created but it is cleared unconditionally.
Also, the first sentence contains "only" twice.
We could add a
@todo
to remove this in https://www.drupal.org/project/drupal/issues/2975307Seems unnecessary to do this twice, can we not just include
$current_language
in the second loop?Also, I think the
if ($original->hasTranslation($langcode)
is missing anelse
. If the default revision adds a new translation we should always clear the persistent revision cache for that revision, no?All this leads me to wonder, whether this complex check is even wort it or whether it's not easier to just always clear the persistent revision cache in this case even though there might be some cases where it's not necessary.
This seems pretty unfortunate. Wouldn't it be nicer to simply clear the persistent revision cache of the default revision when saving a new revision as the default revision?
I think
isDefaultRevision()
should returnTRUE
even if the entity type is not revisionable, so I think we can consolidate those two calls.I'm not sure I agree with that. Even though this is of course to be avoided generally people may be modifying the database directly.
My thinking is that this would mostly be true for the non-revision persistent cache, as well. I.e. if the default revision does not get saved or deleted, we shouldn't have to clear the persistent cache. We do, though, unconditionally, so I think we should do the same for the revision cache.
Since this method is only ever used in conjunction with
::resetStaticRevisionCache()
I wonder whether instead of this method we should just add a::resetStaticRevisionCacheByIds()
method or similar (or maybe even a flag to::resetStaticRevisionCache()
?Since this is often used conditionally instead of
::buildCacheId()
what do you think about instead of a new method, adding a$revision = FALSE
parameter tobuildCacheId()
instead? That way we could simply pass down the$revision
parameter in a bunch of the methods above.Comment #111
tstoecklerInteresting, why is this necessary?
Seems we can just assign (=) here, instead of adding (+=).
Comment #112
tstoecklerI just found a major problem with this patch: This method needs to take into account not only
$this->entities
but also$this-entityRevisions
. It can be the case that there are statically cached revisions that are not also cached in$this->entities
and in that case those revisions will never be cleared usingresetRevisionCache()
Comment #113
tstoecklerWhat is also problematic is that if you have multiple revisions of the same entity in the revision cache you cannot clear all of the revisions using the API. Only the revision that happens to be the "loaded revision" will be cleared. I think there should be some way to explicitly clear by revision ID.
Comment #114
tstoecklerHad an in-person chat with @hchonov and @mkalkbrenner about this yesterday. Here are some notes. After writing this I will attempt to re-implement this (i.e. not re-roll in the strict sense) on top of current 8.7.x and the
MemoryCache
implementation.MemoryCache
for the static revision cacheEntityStorageInterface::resetCache($ids)
. With the current patch only the default revision is cleared from the revision cache. I.e. after calling$storage->resetCache([5])
there might still be a revision in the revision cache with the entity ID 5.I think backwards-compatibility is the most important thing here, which in this case means not to break the existing expectations of users. However, one could argue this for both implementations:
resetCache()
is the only method to clear an entity's (or entity revision's!) cache right now, we need to make sure that it continues to clear everything related to the entity including all of its revisionsresetCache()
should not mess with the newly introduced revision cache as that would be expanding its functionality. We just need to clear the default revision to prevent that from being used when subsequently callingload()
resetRevisionCache()
take an array of revision IDs instead of entity IDs. @hchonov agreed with this. This is relevant to the dicussion above, because, even if we only clear the default revision inresetCache($old_revision->id())
we could then tell people if they want non-default revisions to be cleared they can callresetRevisionCache($old_revision_id->getRevisionId())
Comment #115
tstoecklerWhile starting with the re-implementation, some more questions came up, so this is another "review" of the current patch.
So I guess this is because
::loadUnchanged()
actually doesn't just ignore the static cache, but clears it. So with$a === $b
will yieldFALSE
.So by using this flag and not actually clearing the static cache you can achieve a situation where with
$a === $b
will yieldTRUE
. That is nice, butloadUnchanged()
andloadRevisionUnchanged()
loadRevisionUnchanged()
also clear the static cache. Even though it would be a bit unfortunate, perhaps the consistency outweighs the idempotency? Not sure.Since
loadMultiple()
is not being overidden here, I think this "fallback to the revision cache" only happens when loading a single entity, not multiple ones. That is a bit unfortunate, if true. I think if we implement #109.2 we can remove this entire method override and then also get rid of this inconsistency.While this goes again #109.14 I just now realized that we are extending the base methods and introducing an additional parameter. Since we end up doing the if-else in the method, anyway, I wonder whether we shouldn't introduce separate
getFromStaticRevisionCache()
andsetStaticRevisionCache()
methods.Comment #116
tstoecklerOK, opened #3013917: Minor improvements for (Content|)EntityStorageBase for #109.3 + #109.4
Comment #117
tstoecklerOK, here's a complete rework of the patch done in collaboration with @hchonov and me. Done with literal pair programming and then also internal patch-ping-pong until we were both at a point where we are confident with the patch.
For now not providing an in-depth explaining of the decisions here, will add that to the issue summary, so we can keep that updated if we refine the implementation here.
First, I want to see what kind of test failures we have now.
Comment #119
hchonovWe had some unfinished work regarding the cache tags, which seems to fix some of the test fails.
The changes in
SqlContentEntityStorageTest
are required as it is mocking a content entity only through theEntityInterface
, but now inContentEntityBase:setPersistentCache()
we call::isDefaultRevision()
on the entity, but that method isn't present on theEntityInterface
.Comment #121
tstoecklerHere's an interdiff for the coding standards fixes.
Comment #122
hchonovThe problem with the workspaces tests was caused because of the preLoad hooks. With our revision cache, if an entity is loaded by revision ID in its default revision, then that entity will be put both in the entity and entity revision cache. In this case the preLoad will not occur, when loading the entity by ID because the entity already exists in the static entity cache. I and @tstoeckler think, that the idea for having the preLoad after the retrieval from the static cache was not to execute the preLoad hooks multiple times, but even if we put the preLoad before the retrieval from the static cache, we still could skip executing the preLoad hooks. For this we could create additional entries in the static cache for preLoaded entities and retrieve them from there before executing the preLoad hooks. The cache IDs of preLoaded entities should then be extended by a dedicated prefix or suffix, for example "preLoaded".
I've also applied the coding standards from #121 to the new patch.
Comment #124
hchonovAn optimization contained in this patch for resetting the cache only on updates is causing the test fail of
\Drupal\KernelTests\Core\Command\DbDumpTest::testScriptLoad()
I've just created a dedicated issue for this optimization, where the test is also being adjusted - #3015374: Prevent unnecessary reseting of the entity cache for new entities on entity create/insertP.S. it appears that the fails in
Drupal\Tests\taxonomy\Kernel\Migrate\d7\MigrateTaxonomyTermTest
are also caused by this optimization.The next version of the patch for the current issue should therefore have that optimization removed.
Comment #125
hchonov1. Removed the optimization for skipping the cache reset on entity insert.
2. Fixed the content moderation test fails by adjusting Content Moderation to have proper expectation when having revision cache and properly shared entity references.
The necessary changes in Content Moderation might be seen as an API break, but if looked at them as such, then this is also a necessary change because of the increased usage of loading an entity by revision ID introduced both by Content Moderation and Workspaces. Additionally all usages of Entity Reference Revisions / Paragraphs will also profit from the new revision cache.
Comment #126
amateescu CreditAttribution: amateescu for Pfizer, Inc. commentedIt seems to me that we're trying to do too many things in this issue. Since the static cache seems to be the most problematic part, how about splitting up the work and create a separate issue for enabling persistent caching of revisions, since that brings the most performance benefits anyway?
Comment #127
hchonovI guess you are right, that this should simplify the patch a little bit.
My main reason for keeping this together until now was that the performance benefit is only one side of the coin. The other is consistency and as important as the performance gain.
What I mean is that currently we have
$storage->load($id) === $storage->load($id)
evaluating to TRUE, but having both
$entity = $storage->load($id); $entity === $storage->loadRevision($entity->getRevisionId())
and
$storage->loadRevision($revision_id) === $storage->loadRevision($revision_id)
evaluating to FALSE.
Comment #128
amateescu CreditAttribution: amateescu for Pfizer, Inc. commentedTbh, I think the consistency part is only an artificial problem, if we can call it a problem at all. Are there any examples (code or even conceptual) where an identical comparison check between entity revision objects is actually useful or can not be accomplished in another way?
Because the other side of the coin (the performance aspect) offers real and measurable benefit :)
Comment #129
hchonovI am sorry, but one cannot simply define consistency as an "artificial problem". When talking about DX, then consistency is really important. If I as developer write code based on the assumption, that the storage is always returning the same entity reference when loading an entity by ID, then I would have the same expectation when loading an entity by revision ID.
Here is an example test case from core, when loading an entity by ID, but the very same use case is also valid when loading an entity by revision ID:
Like in the case of
config_entity_static_cache_test_config_test_load()
one could have implemented ahook_entity_load()
to enhance the entity by additional information, which is retrieved from a slow storage, but it is valid only for the current request and therefore one could not usehook_entity_storage_load()
, which will be caching the entity into the persistent cache.If this is your problem with the static cache, then I have good news :). It doesn't only ensure consistency, but also brings performance gain, which also "offers real and measurable benefit :)".
Call 100 times
$storage->loadRevision($revision_id
with enabled static cache and the entity storage will have to retrieve the entity from the persistent cache only once.Call 100 times
$storage->loadRevision($revision_id)
without enabled static cache and the entity storage will have to retrieve the entity from the persistent cache not only once, but 100 times.This is why I see the static and persistent cache of
::loadRevision()
as one complete feature.Comment #130
hchonovComment #131
hchonovComment #132
amateescu CreditAttribution: amateescu for Pfizer, Inc. commented@hchonov, I really can't tell if you're being serious or not. I know what static caching does.
I don't have any problem with the static cache, the two sides of the coin were performance and consistency.
That's super helpful to know! Thanks!
Please read #126 again, all I'm saying is that it would be easier if we add persistent caching for revisions first, which doesn't have any questionable things like #122 and #125 to address, and then work on enabling static cache as well and discuss those issues in depth.
Comment #133
chr.fritschJust a re-roll of #125
Comment #135
Wim LeersI just finished reading the entire issue and skimmed every linked issue that is not yet closed.
Thanks for the many interesting issues that have been linked from here, for the multiple separate issues that have already landed that cleaned up some areas, and for the thoughtful discussion here! Especially kudos to @hchonov and @tstoeckler for their meticulous work. 👏👏👏
It seems that the importance of this issue has only grown since this was opened, like @catch pointed out in #20 about two and a half years ago. I think that's even more true today.
#133's reroll is failing on a test that has existed since April 2018, so something in the foundations has probably changed that's causing that failure — it's not a new test, it's one that was green in #125.
I understand and sympathize with #132 in that it's indeed easier to develop two related things in a single patch. Sometimes it's absolutely necessary. I don't think it is absolutely necessary in this case. A 50K patch against Entity API is hard to land. I think you'll find that by extracting the static caching from this patch and leaving that as a second step (with the code already fully done!) you'll get to commits (and hence progress) faster 🙂 Actually, it may even be informative to just split up the patch here in a "phase 1: persistent revision caching" patch and a "phase 2; static revision caching" patch plus a combined patch. That alone will make this easier to review, because it'll be clearer how the two relate to each other. I'm sure it's already clear to those who spent many (MANY!) hours writing this patch, but it's hard for reviewers to gain that same level of understanding.
So I propose that as a first step, we split this patch up in two
*-do-not-test.patch
patches just to make reviews easier. Let's then check again whether we believe splitting this up in two issues is worth it.Rebased #133, it conflicted with #3025427: Add @trigger_error() to deprecated EntityManager->EntityTypeBundleInfo methods. Also fixed trivial nits (see interdiff).
🤔 Why tentatively? Code should be doing things with certainty, not tentatively. That word frightens me, especially deep in Entity API :)
🤔 Wouldn't
$are_revision_ids
be a clearer name?🤔 Per-revision cache tags will lead to an exponential increase in the number of cache tags in existence for some sites. The
cachetags
DB table will grow enormous.Is this absolutely necessary? Why?
🤔
$is_revision
would be clearer?🤔 From reading #122, I thought this work had already been done. Apparently it still needs to happen?
🤔 Why do we need to change when the cache is reset? How confident are we that this won't cause behavior changes?
🤔 Why are these changes necessary?
I think: Because direct DB queries don't trigger the necessary invalidation. This should be listed in the issue summary as a sort-of-API-change (I know it isn't really, but if core was still doing this until now, then probably some contrib/custom modules are too). It definitely needs a change record of its own.
Tagging
for this.Tagging
for this.Comment #138
deviantintegral CreditAttribution: deviantintegral at Lullabot commentedStraight reroll against 8.8.x, lets see if any more tests fail.
Comment #139
deviantintegral CreditAttribution: deviantintegral at Lullabot commentedFortunately (well, unfortunately for me!) I've got that real-world performance problem.
We're replicating an existing API that returns a response that is both wide (many entities) and deep (many entity references within paragraph fields). Our serializer calls
referencedEntities
on paragraph fields, which means thatloadRevision
is called. On one request we're calling loadRevision ~340 times.The total response times can be ignored - this is with other caches disabled and other unrelated performance fixes to make.
The static cache alone saves around 340ms.
A second load with a primed persistent cache saves an additional 1080ms or so:
Some ab results without xhgui (as that affects absolute numbers significantly)
Before this patch:
After:
The test failure is triggered in
\Drupal\Tests\workspaces\Kernel\WorkspaceIntegrationTest::assertWorkspaceStatus
when it's switching the workspace and checking the home page view. That's as far as I've got - I've not used Workspaces at all so it will take some time to debug.Comment #140
Berdir> Our serializer calls referencedEntities on paragraph field
Don't do that. We implemented an optimization in ERR that assumes that the referenced paragraph revision is the default revision, which is typically the case when loading the default revision of the host entity. Only if that doesn't match we try to load by revision. At least until this issue is resolved.
However, we forgot to implement that for referencedEntities() too, so instead, just do foreach ($items as $item) and access ->entity.
Comment #141
deviantintegral CreditAttribution: deviantintegral at Lullabot commentedThanks! I'll take a look at that.
I haven't found the root cause of the failure, but I did find a side effect that may help in tracking it down. In
\Drupal\Tests\workspaces\Kernel\WorkspaceIntegrationTest::assertWorkspaceStatus()
, if we callloadUnchanged()
before rendering the view, the test passes. Remove that call and we're back to failing.Comment #143
apadernoComment #144
geaseReroll of patch from #138 against 8.9.x.
Comment #145
geaseIn this patch update, preLoad implementing hook_entity_preload() was moved before static cache when loading an entity, to allow this hook implementations still to act where there is already a default revision loaded. It was actually a todo in this patch. And separate cache for preload hook results was introduced. And the latter is kept separate from "normal" entities static cache. The patch also contains a test for that.
There was another test failure in content moderation. The assertion that failed was introduced in this patch, and i can't see why it should be met. Once reverted ContentModerationStateTest.php to the head 8.9 branch, test passes.
Comment #146
geaseMinor updates to the test.
Comment #149
geaseAnother try at fixing all tests. This time that's because loadUnchanged() doesn't check its argument. Created a separate issue for that #3099709: Prevent null parameter value in loadUnchanged() like in load()
Comment #151
geaseSeems that !empty() was too risky condition. Changing to !==NULL.
Comment #156
gngn CreditAttribution: gngn at Computer Manufaktur GmbH commentedWe encountered the problem showing a view with node revision data.
The count of SQL-Statements skyrocketed to more than 5000 and the page got so slow it exceeded PHP's max_execution_time of 240 seconds.
My colleague um adapted the patch in #151 so we could apply it to 9.3.5.
Seems to work for us, but we did not perform much testing yet - so we will upload the patch for 9.3.5 after we did a little more testing.
Comment #157
hchonov@gngn that's great! I was hoping also to be able soon to dedicate some time and get back to it... It would be great if you could post the reroll of the patch and then we can see what the automated tests tell. After having some time passed I also think like it was suggested here that we should divide the issue in two parts - one for the persistent and one for static cache, otherwise this issue will just get bigger and not have high chances of getting in. When you upload the patch and we fix the eventual test failures I would start working on dividing the approach.
Comment #158
gngn CreditAttribution: gngn at Computer Manufaktur GmbH commentedReroll for 9.3.5 and 9.3.6 (hope it applies to 9.4.x-dev).
We stumbled upon this issue trying to resolve a timeout issue with revisions.
The attached reroll of #135 seemed to help - but now we found out that we only reduced the number of timeouts - so we are still testing...
Comment #159
apadernoIt's preloaded entities.
Comment #160
gngn CreditAttribution: gngn at Computer Manufaktur GmbH commented@apaderno thanks for taking a look.
Do you mean the typo in the comment (preloaded entites should be preloaded entities) - or do you mean we are missing something about the preloaded entities?
Comment #161
apaderno@gngn I meant the typo in the comment.
Comment #162
yogeshmpawarResolved CSpell errors in #158 & Addressed #159.
Comment #164
yogeshmpawarUpdated patch will fix test failures.
Comment #166
yogeshmpawarFix test failures & added interdiff.
Comment #167
apadernoComment #168
yogeshmpawarThanks @apaderno, Fixed CS issues & added an interdiff.
Comment #171
needs-review-queue-bot CreditAttribution: needs-review-queue-bot as a volunteer commentedThe Needs Review Queue Bot tested this issue. It either no longer applies to Drupal core, or fails the Drupal core commit checks. Therefore, this issue status is now "Needs work".
Apart from a re-roll or rebase, this issue may need more work to address feedback in the issue or MR comments. To progress an issue, incorporate this feedback as part of the process of updating the issue. This helps other contributors to know what is outstanding.
Consult the Drupal Contributor Guide to find step-by-step guides for working with issues.
Comment #172
muriqui CreditAttribution: muriqui at CivicActions for National Science Foundation commentedReroll of #168, resolving two conflicts:
$items
array and$this->cacheBackend->setMultiple($items)
, rather thanthis->cacheBackend->set()
as in #168.$entity->expects($this->any())->method('isDefaultRevision')->will($this->returnValue(TRUE));
to$entity->expects($this->any())->method('isDefaultRevision')->willReturn(TRUE);
Comment #175
weseze CreditAttribution: weseze as a volunteer commentedPatch applies correctly to latest releases of both D9 and D10. (haven't tested dev)
We are using it to fix serious performance issues in a heavy customised layout builder setup. Layout builder pages can contains up to 100 inline blocks. (but really there is no actual hard limit for us)
Before the patch (while editing and translating pages with LB) we had 5-10.000 SQL queries taking 3-5seconds of loading time. (depending on the complexity of the layout builder setup on the page)
After the patch this went down to few hundred SQL queries and e few hundred ms of loading time.
This is a lifesaver for us, so thanks to everyone here for all the work put in!
Comment #178
capysara CreditAttribution: capysara at Bounteous commentedI rerolled patch #172 and moved it into a MR, but it still needs work to get all the tests to pass.
I'm hiding the patches so that future fixes can continue in the MR.