Problem/Motivation
On a site that is using paragraphs fields on taxonomy terms the taxonomy_post_update_make_taxonomy_term_revisionable
is taking hours. If the paragraphs fields are removed the update is quick.
I think this is a bug because the currently in HEAD updates can take hours. This is happened because all the paragraphs entities are loaded during every save. The problem seems to be coming from the line
$needs_save = $this->entity instanceof EntityNeedsSaveInterface && $this->entity->needsSave();
In \Drupal\entity_reference_revisions\Plugin\Field\FieldType\EntityReferenceRevisionsItem::preSave() this causes the paragraph entity to be loaded. So your taxonomy terms have lots of paragraphs then this results in many many additional entity loads.
Proposed resolution
Stop calling preSave methods from \Drupal\Core\Entity\Sql\SqlContentEntityStorage::restore()
. Generally these are not necessary. If they are then you can override the restore implementation in the entity's storage class. For example in TermStorage we do:
/**
* {@inheritdoc}
*/
public function restore(EntityInterface $entity) {
$entity->preSave($this);
parent::restore($entity);
}
Remaining tasks
User interface changes
None
API changes
None. There is a behaviour change to an @internal method. \Drupal\Core\Entity\Sql\SqlContentEntityStorage::restore()
no longer does
$entity->preSave($this);
$this->invokeFieldMethod('preSave', $entity);
Data model changes
None
Release notes snippet
@tbd - I don't think this needs one because we're changing the behaviour of a method marked @internal and there is a change record.
Comment | File | Size | Author |
---|---|---|---|
#62 | 3066439-follow-up.patch | 1.05 KB | alexpott |
#58 | 3066439-2-58.patch | 13.11 KB | alexpott |
#58 | 56-58-interdiff.txt | 756 bytes | alexpott |
#56 | 3066439-2-55.patch | 13.11 KB | alexpott |
#55 | 53-55-interdiff.txt | 2.58 KB | alexpott |
Comments
Comment #2
alexpottComment #3
alexpottComment #4
alexpottSilly me. That is not going to work. I changed approaches to minimise API change last moment and forgot to test locally. We can't just add that to the interface without breaking the world.
Comment #5
alexpottSo this shouldn't break the world.
Comment #6
daniel.bosenI tested this patch in conjunction with #3066440: Taxonomy update to revisionable is taking hours with paragraphs fields on our installation with about 6000 terms and 15000 paragraphs and the update took a minute instead of 5 hours, as it did before.
This is quite nice indeed!
Comment #7
Ghost of Drupal PastMight not be part of this patch but EntityStorageInterface::restore probably needs better documentation and probably tests too because I can't find adequate documentation or test either but it might be just me. What is a "restore"? How is it different from save, when should it be called instead of save? I filed #3066674: Missing typehints in SqlFieldableEntityTypeListenerTrait to begin travelling the road to understanding.
Comment #8
catchThis might be the same root cause as #3056540: taxonomy_post_update_make_taxonomy_term_revisionable() timeouts with large numbers of terms.
Still needs tests and a change record but otherwise looks good.
Comment #9
alexpottComment #10
alexpottHere's a test for the restore method and the calling of the field methods including the new preRestore.
Comment #11
alexpottBeen discussing with @amateescu about why are we invoking preSave on the fields. Maybe it is not necessary.
Comment #12
alexpott@amateescu asked me to file this as well. This will fail...
Not sure about #11.
Comment #14
alexpottDiscussed with @amateescu. We agreed that removing all calls to pre-save is the right thing to do as only removing the call to field presave but leaving the entity presave in felt inconsistent and unexpected. This means that it is on the entity type to ensure it is restorable before any restore is done. So for taxonomy terms we can call preSave() in the its storage class's implementation of restore. This is similar to https://git.drupalcode.org/project/entityqueue/blob/e099b13506f664afe46e...
Now we've move the preSave to taxonomy term storage it feels like some special for taxonomy terms to set up the parent term ID in the following situation]
and its okay that field preSave is not being called.
Comment #15
alexpottI've updated the change record.
Comment #16
alexpottSome docs updates.
Comment #17
amateescu CreditAttribution: amateescu for Pfizer, Inc. commentedThis storage class should be named
EntityTestUpdateStorage
:)Let's throw the exception here directly instead of calling the entity's
preSave()
method. The goal of that exception is to test for something going wrong while calling$storage->restore()
, so it doesn't matter if it's thrown by the entity class or the storage handler.I'm not sure about this, why don't we set the custom storage class directly in the entity type definition?
Comment #18
alexpott@amateescu good points! Fixed them all.
Comment #19
alexpottCommas are nice.
Comment #20
amateescu CreditAttribution: amateescu for Pfizer, Inc. commentedNice! I think this is ready to go, hoping that the testbot agrees as well.
Comment #23
alexpottWhoops. Too many entity_test modules.
Comment #24
amateescu CreditAttribution: amateescu for Pfizer, Inc. commentedYay, all green :)
Comment #25
casey CreditAttribution: casey at SWIS commentedGreat, we just had an issue while upgrading a site to D8.7 caused by these calls to preSave (without calling presave hooks).
Not sure if I should open a separate issue, so I just give some background here:
* The site implements a hook_ENTITY_TYPE_presave that overrides the behavior of MenuLinkContent::preSave() always setting $entity->setRequiresRediscovery(FALSE) for non-"internal:" links.
* The post update menu_link_content_post_update_make_menu_link_content_revisionable() calls this new ::restore() method which in turn calls $entity->preSave() for any menu link content.
* Our hook_ENTITY_TYPE_presave is never called, and thus our rediscover override is not called.
* Now that the rediscover flag was not set for the menu links we want to be rediscovered (so we can programatically attach children links to it) the menu_tree was totally broken.
Calling $entity->preSave (without invoking hooks) in ::restore() is not only slow, it can destroy your data!
Comment #26
amateescu CreditAttribution: amateescu for Pfizer, Inc. commented@casey, thanks for sharing that use case, it's super useful! And I think it makes this issue critical.
Comment #27
amateescu CreditAttribution: amateescu for Pfizer, Inc. commented@casey, just to clarify my understanding of the problem described in #25, I think the current patch fixes it because we stop calling
MenuLinkContent::preSave()
duringrestore()
so the discoverability and the menu tree are not impacted anymore by the conversion to revisionable, right?Comment #28
catchIt's possible that this will result in a custom/contrib entity type running into the same issue as taxonomy module, but I think the risk of that is outweighed by the criticality of the bug and the number of cases we know are affected. So +1 on the general approach.
One point though:
This is only saving three lines of code, why not copy it? It runs the potentially of running additional code that could be added to Term::preSave() as it currently is. Or at least add a clear comment explaining why it needs to be called here.
Comment #29
alexpott@catch yeah I ummed and ahhed on that when I posted #23 but now given #25 I think I now prefer copying the code. It makes it explicit and sets a good example for contrib & custom and means that the docs are adhered too - i.e. no presave is called during a restore.
Comment #30
amateescu CreditAttribution: amateescu for Pfizer, Inc. commentedI also considered pointing out that we could copy that code, so I'm glad both of you think it's better this way :)
Comment #31
casey CreditAttribution: casey at SWIS commented@amateescu #27, correct
Comment #32
Ghost of Drupal PastIf I understand correctly the goals of
restore
thenTermStorage::restore
is wrong -- my understanding is you have the entity coming from somewhere else but it's a entity previously saved in D8 (perhaps in another Drupal instance, of course) and as such the target_id was zeroed on a previous (presumed)save()
. If this mental model is correct thenrestore
should be declaredfinal
so you can't break the assumption.Comment #33
BerdirI was wondering about that too. Term parents used to be very special as it was only a half-field, but that has been changed a while ago and I think that is there (and not only a default value for the field) to make it impossible to break the tree by having no value at all.
However, it should never have no value in a restore situation. So I think this method isn't needed. I'm not quite sure yet if there *are* use cases for overriding that method and if it should be final or not.
Comment #34
amateescu CreditAttribution: amateescu for Pfizer, Inc. commentedHere's at least one use case for being able to "massage" the data before the restore process: https://git.drupalcode.org/project/entityqueue/blob/e099b13506f664afe46e...
If we want to be able to remove the
restore()
override from the latest patch, we need to look into and fix the failures from the patch in #12.Comment #35
alexpottSo here's a clue about what';s going on. It appears making the term parent field revisionable has a problem.
Backup data - will be the same for both runs
Broken - without the taxonomy storage restore override
Works - with the taxonomy storage restore override
Comment #36
alexpottOh and it's not just the revision table...
It looks like we have a multilingual issue
Comment #37
alexpottSo #2543726: Make $term->parent behave like any other entity reference field, to fix REST and Migrate support and de-customize its Views integration did an incorrect update. It used the wrong langcode for the langcode of the parent reference. These fields are not translatable and therefore, I think, though it'd be good for an entity translation expert to confirm, should use the site's default langcode. Here's a fix that proves it - can't be the final fix though :(
Comment #38
alexpottSo something like...
Comment #39
alexpottAnd now with some commentary and protections against custom and contrib code.
Comment #40
amateescu CreditAttribution: amateescu for Pfizer, Inc. commentedExcellent find and fix!
It's -> Its :)
"is a sign" -> "could be a sign"?
We should add some test coverage for this in
\Drupal\Tests\taxonomy\Functional\Update\TaxonomyParentUpdateTest::testTaxonomyUpdateParents()
.Comment #41
catchThis is a content change, could it go in a post update instead?
Also very nice find this is a lot better.
Comment #42
alexpottRe #41 - nope unfortunately it can't - it has to come before the revisionable post update. And the post updates are by definition not in any guaranteed order.
Comment #43
Berdirwhat about terms whose default langcode isn't the site default langcode?
I think it should be the default langcode of the entitty, not the site.
Comment #44
alexpott@Berdir is of course correct... @Berdir was who I had in mind when I wrote
:)
It needs to be what we do in
\Drupal\Core\Entity\Sql\SqlContentEntityStorage::saveToDedicatedTables()
ie...Comment #45
Ghost of Drupal Past> Note that overriding this method to fix data prior to restoring is a sign that the current data is corrupt.
My primary language is not English (it's PHP :P ) but wouldn't "Do not override this method unless fixing corrupt data prior to restoring is necessary" be better? And if this is what it is, is it the role of
restore
to fix corrupted data, isn't that the role of the caller? I still would love to seerestore
asfinal
. The test could overridedoSave
instead ; by makingrestore
final we could remove this note altogether and send a strong message about how it should be used. Sorry I am a bit stubborn here, I am mostly just scared of what we are getting into again, we had special saves for the update path before and it ended up in tears and very slightly corrupted data which eventually caused the weirdest migration faults...Comment #46
amateescu CreditAttribution: amateescu for Pfizer, Inc. commentedThe documentation of
hook_post_update_NAME()
says that ordering is possible:Comment #47
alexpottRe #46 - I still don't think that this is a post update function. It's fixing a data corruption.
Comment #48
alexpottHere's a patch that addresses @Berdir's point in #43. The update get considerably more complex :(
Also here's a test against 8.6.x that shows we have a data corruption issue. Note it will not fail against 8.7.x as the new revisionable update is fixing things (atm in HEAD).
Comment #49
alexpottGiven we've got complex batching going on in the update let's test it.
Comment #50
alexpott@chx's suggestion of not overriding restore for the exception test is a good one. It allows us to revisit the
final
discussion in the future and makes us adhere to the docs better. Ie. there are no examples of overriding in core.Comment #52
amateescu CreditAttribution: amateescu for Pfizer, Inc. commentedNote that we are setting
entity_update_batch_size
to 1 since #3007606: Force all update path tests to only run one entity per batch, so we're executing 65 batch process for this update test :)Comment #53
alexpott@amateescu we're not executing 65 batch process - that's not how it works. It means we looping in
_batch_process()
more and checking resources more often.Anyhow let's bump the number to 30 and I also noticed a bug in my update function :).
Comment #54
amateescu CreditAttribution: amateescu for Pfizer, Inc. commentedSince we're doing straight db queries, we need to ensure that the entity type is using core's default SQL content entity storage.
I don't think we should document how code used to work :)
How about adding this test method to the existing
TaxonomyParentUpdateTest
test class?Comment #55
alexpottThanks for the review @amateescu.
Re #54
1. Fixed
2. Rewrote the docs
3. We can't do this. We're using different database dumps.
Comment #56
alexpottHere's the patch
Comment #57
amateescu CreditAttribution: amateescu for Pfizer, Inc. commentedNice! I think this patch is good to go now :)
Comment #58
alexpott@chx pondered the difference between
is_a()
andis_subclass_of()
- as far as I can see there's only really the third param difference is this case. Core prefers is_subclass_of() - so swapping the patch to use that.Comment #61
catchLooks great now. Committed/pushed to 8.8.x and cherry-picked to 8.7.x, thanks!
Comment #62
alexpottVery very small follow-up to fix PHP 5.5 on 8.7.x
Comment #63
BerdirLooks good but is the first part really only for php 5.5/8.7?
Comment #64
alexpott@Berdir we can apply this to both branches... one less line of code for 8.8.x doesn't really matter.
Committed 19071a4 and pushed to 8.8.x and to 8.7.x.
Comment #68
heddnDoes the CR related here need to be updated or published? It is still in draft status.