Problem/Motivation

Saving an entity is a complex process. At moment the following happens

1. Load an entity (it might be retrieved from the entity cache).
2. Trigger entity save.
2.1. loadUnchanged is called to set $entity->original.
2.2. resetCache will reset the persistent and the static cache.
2.2. When the original entity is loaded it will be put back again in the persistent and static cache.
2.3. After the entity is saved it will be removed again from the persistent and static cache.

Like you see a lot of DELETE and INSERT statements for the cached entity will occur while it is being saved, which
offers a place for performance optimization.

Additionally there are couple of places in core where loadUnchanged is called even not as part of the save process, but as an example when an entity form is validated - see \Drupal\Core\Entity\Plugin\Validation\Constraint\EntityChangedConstraintValidator::validate.

Proposed resolution

loadUnchanged actually does not have to delete the entity from the persistent cache, as the entity in the persistent cache will not be changed anyway and will always be the most up to date one.

loadUnchanged should not reset the persistent cache and try loading the entity from there.

Not a perfect measure but something I've measured in one of our systems where we save a lot of entities at once -

without the patch five times in a row:
13.168s
11.808s
12.153s
11.523s
15.467s

-> average = 12,8238s

with the patch five times in a row:
9.796s
9.380s
9.503s
9.661s
9.771s

-> average = 9,6222s

Remaining tasks

Review the proof of concept.

User interface changes

none

API changes

none

Data model changes

none

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

hchonov created an issue. See original summary.

hchonov’s picture

Title: Improve entity saving the improving loadUnchanged and caching the entity again after it is saved » Optimize entity saving by improving loadUnchanged and caching the entity again after it is saved
hchonov’s picture

Status: Active » Needs review

Status: Needs review » Needs work

The last submitted patch, optimize_entity_save_poc.patch, failed testing.

hchonov’s picture

Issue summary: View changes
Status: Needs work » Needs review
FileSize
2.29 KB

The just saved entity should be persistently cached before the hooks have been executed, if we want to cache it statically as well, we would have to clone the entity object, but as the static cache is usually not needed after save we'll not statically cache the entity. The only place this could be done atm is in ContentEntityStorageBase::invokeFieldPostSave, but probably we could think of some better solution.

Status: Needs review » Needs work

The last submitted patch, 5: 2753675-5.patch, failed testing.

Version: 8.2.x-dev » 8.3.x-dev

Drupal 8.2.0-beta1 was released on August 3, 2016, which means new developments and disruptive changes should now be targeted against the 8.3.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

hchonov’s picture

Status: Needs work » Needs review
FileSize
1.66 KB
2.27 KB

It looks like the problems are caused by putting the entity into the persistent cache after it has been saved.. Lets try out only with changes in loadUnchanged.

hchonov’s picture

A new implementation where loadUnchanged will not destroy the entity reference loaded already in the static entity cache.

Currently if we load load multiple times we will always get the same object reference.. until we call loadUnchanged and then calling load again would not return the same object reference as the previous times ... I've changed that now and loadUnchanged is no longer destroying the previous entity object reference.

Let's see what happens :)

hchonov’s picture

This should be better now with a test provided.

The last submitted patch, 8: 2753675-8.patch, failed testing.

The last submitted patch, 9: 2753675-9.patch, failed testing.

Status: Needs review » Needs work

The last submitted patch, 10: 2753675-10.patch, failed testing.

hchonov’s picture

Status: Needs work » Needs review
FileSize
4.66 KB
674 bytes

As the entities are put into the persistent cache before the post load has been executed we have to execute it if we have retrieved the entity directly from the persistent cache.

hchonov’s picture

Title: Optimize entity saving by improving loadUnchanged and caching the entity again after it is saved » loadUnchanged should not alter the entity caches and instead of reseting the cache retrieve the entity from the persistent cache if present
Issue summary: View changes

Status: Needs review » Needs work

The last submitted patch, 14: 2753675-14.patch, failed testing.

hchonov’s picture

Not a perfect measure but something I've measured in one of our systems where we save a lot of entities at once -

without the patch five times in a row:
13.168s
11.808s
12.153s
11.523s
15.467s

-> average = 12,8238s

with the patch five times in a row:
9.796s
9.380s
9.503s
9.661s
9.771s

-> average = 9,6222s

I would say this makes a difference ;)

hchonov’s picture

Issue summary: View changes
timmillwood’s picture

I've been looking into why this is breaking content_moderation and I can't really work it out.

Looking at the failing test in \Drupal\content_moderation\Tests\ModerationLocaleTest

The following code there is failing:

    // Now check that we can create a new draft of the translation.
    $edit = [
      'title[0][value]' => 'New draft of translated node',
    ];
    $this->drupalPostForm('fr/node/' . $english_node->id() . '/edit', $edit, t('Save and Create New Draft (this translation)'));
    $this->assertText(t('Article New draft of translated node has been updated.'));
    $english_node = $this->drupalGetNodeByTitle('Another node', TRUE);
    $french_node = $english_node->getTranslation('fr');
    $this->assertEqual($french_node->moderation_state->target_id, 'published');
    $this->assertTrue($french_node->isPublished());
    $this->assertEqual($french_node->getTitle(), 'Translated node', 'The default revision of the published translation remains the same.');

Above this section we assert that $french_node is published. Then using the UI we create a new draft. So the published version stays as the default, and the draft becomes a forward revision. When we load the entity it should load the default revision which is published, however it seems to be loading the default revision. I assume that when the new draft is saved it gets cached, so when we load it's loading the cached draft revision, rather than the default revision?

Berdir’s picture

Looking into this. Something in content_moderation is wrong, because it marks the draft as default revision.

Berdir’s picture

Yeah. \Drupal\content_moderation\EntityOperations::isDefaultRevisionPublished() does a load() from storage and that relies on the fact that loadUnchanged() clears the cache.

Works fine if you change that to loadUnchanged().

Also, not too happy about the no-cache fallback. load() will actually still update and write the persistent cache, so we gain very little and have some pretty strange workarounds.

I see two alternative options:
a) Move this down to the sql implementation. Call gFromStorage() directly, to work around static and persistent cache completely.
b) just call parent::loadUnchanged(). This is IMHO a very rare situation anyway and might not be worth to optimize?

timmillwood’s picture

good find @Berdir!

Berdir’s picture

And because we now know that original is actually the same, we can simply use that.

That said, this shows that this causes a behaviour change when calling load during presave. Custom code that doesn't know about original probably uses this too.

I would suggest we explicitly unset the static cache, that keeps that behaviour without making it any slower as long as load isn't called.

timmillwood’s picture

Thinking about this, the isDefaultRevisionPublished() calling load() makes sense to make sure the default revision is published makes sense.

why isnt load returning the default revision?

Berdir’s picture

Static cache, you get the exact same object again.

Load unchanged is also the default revision.

timmillwood’s picture

@Berdir - I think load should always get the default, and the cache should always get cleared at the right time to make sure that's the case.

hchonov’s picture

Also, not too happy about the no-cache fallback. load() will actually still update and write the persistent cache, so we gain very little and have some pretty strange workarounds.

I see two alternative options:
a) Move this down to the sql implementation. Call gFromStorage() directly, to work around static and persistent cache completely.
b) just call parent::loadUnchanged(). This is IMHO a very rare situation anyway and might not be worth to optimize?

Hmm what is the problem if the entity is not present in the persistent cache to call load and put it there, but not in the static cache like it is currently done in the patch? What we gain is that when loadUnchaged is called multiple times, what actually happens in core during save (EntityChangedConstraintValidator::validate and EntityStorageBase::doPreSave), and if the entity is not present in the persistent cache it would be put there only once and not multiple times like now. And when dealing with a lot of entities, this makes a big difference. Additionally you might have custom code, where you want to retrieve the unchanged entity on save as well.

Berdir’s picture

@timmillwood: load() does always return the default revision. As does loadUnchanged(). But load() also returns the statically cached object. And you are in *pre-save*, the entity has not yet been saved. The fact that the cache is cleared on HEAD is not a documented behavior, it is a side effect of how loadUnchanged() works. If another backend would implement it differently, your code would fail. But as I already wrote, I think we should explicitly invalidate the static cache as a BC layer.

@hchonov: Did you do any tests as to when it actually goes in the no-cache case? To be able to save an entity, you always need to load it first. As long as you load and save the default revision, the static cache could only *not* exist when you have some concurrency with another process that saves too (and then you have bigger problems) or you invalidate yourself before saving, which again, should be pretty rare. And even the non-default-revision case would go away once the issue that adds caching for that goes away.

I'm just trying to avoid code/complexity/side effects for what I think is a very rare case. But as written above, at a minimum, we should remove the part that sets back the static cache, which would actually simplify your code a lot already, as you don't need to get the existing entity or set it back, you simply need to invalidate it.

Berdir’s picture

Another case that could theoretically happen is e.g. with forms and ajax, that would store the entity in form_state over multiple requests and someone *might* invalidate during that time. But even with that, note that as long as you are on node/NID/edit or any other URL with the NID in the URL it will upcast/load on any request anyway.

hchonov’s picture

@hchonov: Did you do any tests as to when it actually goes in the no-cache case?

No, I did not tested the case when the entity is not present in the persistent cache, but I am pretty sure we need this fallback, even if the entity is saved in a concurrent process we still want to get the newest one entity - this is actually why the EntityChangedConstraintValidator::validate is using loadUnchanged.

But as written above, at a minimum, we should remove the part that sets back the static cache, which would actually simplify your code a lot already, as you don't need to get the existing entity or set it back, you simply need to invalidate it.

I do need to set back the static cache in order for the following to work:

$entity_first_load = $storage->load($id);
$storage->loadUnchanged($id);
$entity_second_load = $storage->load($id);
$this->assertSame($entity_first_load, $entity_second_load);
hchonov’s picture

Another case that could theoretically happen is e.g. with forms and ajax, that would store the entity in form_state over multiple requests and someone *might* invalidate during that time. But even with that, note that as long as you are on node/NID/edit or any other URL with the NID in the URL it will upcast/load on any request anyway.

The entity stored into the form state should be used until the user finally submits the form, then we would need a conflict management solution in case the entity has been saved in the background. I am going to work on such a solution soon, but this is a different issue.

Berdir’s picture

Yes, it needs to *work*. But both my suggestions would work too. It is a different question whether we should *optimize* for it.

About the static cache:

If we would design this on a clean slate then I would agree with you, it would make sense.

However, this is already not the case in HEAD because loadUnchanged() resets the cache. So I don't think we have existing test coverage that does anything like that.

But the test fails actually prove that we have code that is expecting the opposite. that doing load() after a loadUnchanged() must *not* return the same object. And I think it would be better to not break that in 8.x. I have a feeling that more contrib and custom code will rely on that, especially if they don't know about ->original. I'm pretty sure they will pass if you just remove the setStaticCache() call in your code.

Berdir’s picture

Correction: It will not be enough to just remove that call, as we need to do it for both cases.. when loading from persistent storage as well.

Berdir’s picture

Implemented my suggestion, also added more comments to explain why we are doing it.

For the record, with the static cache reset, calling loadUnchanged() and just load() then behaves pretty much the same. So technically, we could simply call ::load() after unsetting the static cache, which would basically have the same effect as the current implementation, with only a minimal performance difference.

This passed at least one failing content moderation tests and will likely be green. I still think that content_moderation should use loadUnchanged() or simply ->original as we know that that they are all 100% the same thing, and for BC, can also not change to something else.

Status: Needs review » Needs work

The last submitted patch, 34: load-unchanged-2753675-34.patch, failed testing.

Berdir’s picture

So, it is failing on the new test added here, but that isn't passing on HEAD either and as I attempted to explain above, actually has side effects, so I think we should not add that test. We are not fixing a bug, we are doing a performance optimization and we should attempt to do so without any kind of behavior change.

hchonov’s picture

These changes + #2825247: loadUnchanged should be loading also the unchanged referenced entities in order to have a complete unchanged entity structure loaded are needed for the autosave_form module and without them the module is not working for inline entity forms based on entity references. I am going to add tests there, but I still do not get it what is the problem of fixing loadUnchanged messing up with the caches? I as developer would expect that loadUnchanged will not break what load is returning.

hchonov’s picture

Status: Needs work » Needs review
FileSize
2.32 KB
2.48 KB

So I've investigated further and the problems I have are caused by the current behavior where the persistent cache is emptied, but the problems are gone when #34 is applied.

So I agree to leave the topic of this issue just optimisation we probably could discuss in an another issue to change the behavior and add the failing test. Removing it from the current patch now.

hchonov’s picture

So I've just noticed that we forgot to simulate the behavior of the parent at 100% by not putting the loaded unchanged entity into the static cache.

hchonov’s picture

Title: loadUnchanged should not alter the entity caches and instead of reseting the cache retrieve the entity from the persistent cache if present » loadUnchanged should use the persistent cache to load the unchanged entity if possible instead of invalidating it
Issue summary: View changes
tstoeckler’s picture

Status: Needs review » Needs work

I think this looks great and will be a pretty huge performance boost for people heavily using Inline Entity Form / Paragraphs / ...

Some comments:

  1. +++ b/core/lib/Drupal/Core/Entity/ContentEntityStorageBase.php
    @@ -627,6 +627,43 @@ protected function setPersistentCache($entities) {
    +    // and then reloads the entity. That is slow, especially if this is  done
    

    Double space before "done".

  2. +++ b/core/lib/Drupal/Core/Entity/ContentEntityStorageBase.php
    @@ -627,6 +627,43 @@ protected function setPersistentCache($entities) {
    +    // an entity. Attempt to optimize this for content entities by trying to
    +    // load them directly from the persistent cache again.
    +    $entities = $this->getFromPersistentCache($ids);
    

    1. "Attempt to" is superfluous in my opinion.

    2. I guess it's obvious for everyone in this issue, but would still be nice to pin-point the difference between static cache and persistent cache here, something like, adding: "... as in contrast to the static cache that will never be changed until the entity is saved." or something.

  3. +++ b/core/lib/Drupal/Core/Entity/ContentEntityStorageBase.php
    @@ -627,6 +627,43 @@ protected function setPersistentCache($entities) {
    +    // The cache invalidation in the parent has the side effect that loading the
    +    // same entity again during the save process (for example in
    +    // hook_entity_presave()) will load the unchanged entity. Simulate this
    +    // by explicitly removing the entity from the static cache.
    +    $this->removeFromStaticCache($id);
    

    Very minor, but in terms of code flow, I think it would be more readable (and more in line with the parent function) if this were first.

  4. +++ b/core/lib/Drupal/Core/Entity/ContentEntityStorageBase.php
    @@ -627,6 +627,43 @@ protected function setPersistentCache($entities) {
    +      $this->setStaticCache($entities);
    

    This should be wrapped in if ($this->entityType->isStaticallyCacheable()) { so that it matches EntityStorageBase::loadMultiple()

  5. +++ b/core/lib/Drupal/Core/Entity/EntityStorageBase.php
    @@ -154,6 +154,16 @@ protected function setStaticCache(array $entities) {
       /**
    +   * Removes an entity from the static cache.
    +   *
    +   * @param mixed $id
    +   *   The ID of the entity to remove from the static cache.
    +   */
    +  protected function removeFromStaticCache($id) {
    +    unset($this->entities[$id]);
    +  }
    

    I think it makes sense to introduce this, but if so we should also use it in EntityStorageBase::resetCache() and ContentEntityStorageBase::resetCache() (not sure why that doesn't call the parent, but I guess that's a separate issue).

    Also, even though I like "remove" better because it is more descriptive of what is actually happening, should this be named "resetStaticCache()" to be consistent with "resetCache()"?

Marking needs work for 4. and 5.

hchonov’s picture

Status: Needs work » Needs review
FileSize
2.66 KB
3.36 KB

I've renamed the function to resetStaticCache and left it at first time as protected and moved it into the ContentEntityStorageBase, as actually the function is 1:1 copy of the EntityStorageBase::resetCache.

hchonov’s picture

Oh and I am not really sure we have to check if the entity type is statically cacheable before putting the entity into the static cache through ::setStaticCache as the method itself will make this check. What do you think?

tstoeckler’s picture

+++ b/core/lib/Drupal/Core/Entity/ContentEntityStorageBase.php
@@ -627,6 +627,46 @@ protected function setPersistentCache($entities) {
+    $this->resetStaticCache($ids);

@@ -647,6 +687,24 @@ public function resetCache(array $ids = NULL) {
+  protected function resetStaticCache(array $ids = NULL) {
+    if ($this->entityType->isStaticallyCacheable() && isset($ids)) {
+      foreach ($ids as $id) {
+        unset($this->entities[$id]);
+      }
+    }
+    else {
+      $this->entities = [];
+    }
+  }

Sorry for being so picky about this, but as far as I can see, this is now a 100% copy of EntityStorageBase::resetCache(), right? I think that's not ideal, TBH. Maybe we can just call parent::resetCache() for now, and I will open a follow-up issue to improve the code flow in all those different functions. I.e. maybe we can then add resetStaticCache() in EntityStorageBase directly and make the code a little bit more self-documenting. Would you be OK with that?

Oh and I am not really sure we have to check if the entity type is statically cacheable before putting the entity into the static cache through ::setStaticCache as the method itself will make this check. What do you think?

That's a good point, but EntityStorageBase::loadMultiple() does that as well, so I think it's nice to be consistent. That is also something I would improve in a follow-up (i.e. we could do that in the same one).

hchonov’s picture

So instead of defining a new method resetStaticCache we just call parent::resetCache and document that the parent is resetting only the static cache? Yes, this is fine with me.

tstoeckler’s picture

Yes, exactly. Then (assuming that everyone agrees with this) I would open a follow-up to improve the code-flow of the various methods there. But that way we can keep this patch here focused and minimal.

catch’s picture

I haven't fully caught up with this issue but hchonov pinged me to discuss.

Just quickly though I think it's OK to change the behaviour for people calling load() with no revision ID inside entity saving - they should just use $entity->original if that's what they want, and that's an easy change to do now- i.e. it's a bug in the contrib/custom module that happens to not be exposed at the moment, if they're lucky.

hchonov’s picture

Addressing #44.

tstoeckler’s picture

Status: Needs review » Reviewed & tested by the community

Thank you, perfect.

Opened #2826366: [PP-1] Stream-line (Content|)EntityStorageBase::resetCache() and friends for improving the code flow.

Rajab Natshah’s picture

+1 More Testing

alexpott’s picture

Status: Reviewed & tested by the community » Fixed

Committed c628bd5 and pushed to 8.3.x. Thanks!

This looks really nice.

  • alexpott committed c628bd5 on 8.3.x
    Issue #2753675 by hchonov, Berdir, timmillwood, tstoeckler:...

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.