I tested on a profile2 entity save, with numerous fields, changing one one value.
Without patching the field API, I got 409 SQL queries with about 98 DELETE, 144 SELECT, and 36 INSERTS ; Others are just modules reacting upon hooks and bootstrap. Once applied, it lowered to a total of 54 SQL queries all inclusive (same query). [Actually those result were the result of an erroneously catched AJAX request]
My xdebug stack trace, which actually weights 12 M (which is huge) went down to 11 M (which is not really significant, but 1 Mo on a less bloated site is a lot). I couldn't find any significant CPU abnomalies due to field values deep comparison. [EDIT: This is still true after catching the right profile trace, even better since it lowered to 10 M]
Doing deep array comparison for field values upon the original
entity property did this, I excluded field data from writing whenever they match.
This probably cause a effing lot of CPU consumption but I guess that in other scenarios such as Commerce site order/cart update or delete, this might actually drastically the SQL concurrency and give a serious backend performance boost, at the cost of a maybe significant frontend CPU load.
So, I'm just testing for fun, if tests passes I'd try on some live sites.
For Drupal 7 see #2470619: Do not attempt field storage write when field content did not change. Current issue is about Drupal 8, for which the change is not trivial.
Comment | File | Size | Author |
---|---|---|---|
#95 | 92-95-interdiff.txt | 917 bytes | alexpott |
#95 | 2297817.95.patch | 15.54 KB | alexpott |
#95 | 92-95-interdiff.txt | 917 bytes | alexpott |
#92 | interdiff.txt | 1008 bytes | yched |
#92 | 2297817.92.patch | 15.54 KB | yched |
Comments
Comment #1
pounardComment #2
pounardSeriously, my numbers are totally wrong, it sometime happens that SQL query count lowers for an unknown reason, I'd say there are some other modules doing it wrong and messing up with my caches, continuing introspecting this.
Comment #4
pounardHum most field values come from the form API while being saved and there are values left that don't belong to field schema, so the comparison can't work without cleanup the input values first in most cases, I'll try to do that.
Comment #5
pounardNew test with a better comparison algorithm that prunes empty values and prune non field schema driven properties.
With this patch, I actually reduced DELETE queries from 98 to 10, and INSERT queries from 36 to 14 on my test use case (changing a single field value in the profile2 form for which at least 40 fields are present in the form).
Comment #6
pounardSomething that cannot be reduced is the
entity_load_unchanged()
function that loads the original entity and resets cache at some point before: everything gets loaded from database no cache at all in this specific scenario. On my original 400+ SQL queries, I did remove 100 of them due to a buggy module, and the only abberation are the 96 calls tofield_sql_storage_field_storage_load()
due to theentity_load_unchanged()
function. For those, we cannot really do better.Comment #8
pounardJust a minor itch with revisions: when inserting a new revision it's not possible to skip any fields because they would be missing to the newly created revision.
Comment #9
pounardComment #10
pounardAfter some discussion with Berdir, I'd try another way and fix this into the storage itself. I'm not fond of this variant since having it in the
field_attach_update()
function would mean that other storage engines that default would have it for free ; In the other hand applying it into the engine makes it really simpler to fix in Drupal 8 code which has not diverged that much.Comment #11
pounardI will fix for D7 first, because I do need it on real life projects, but I'll do the D8 fix too if I can find some time for it.
Comment #12
pounardComment #13
pounardHere it is, cleaned up a bit function names, added a lot of inline documentation (better having too much than none) and fixed false positives when comparing field item values due to widget elements ordering not being the same as in the database storage.
This new patch is done directly into the storage engine.
My local tests won't pass because of weird file permission errors, anyway this is supposed to pass here.
Comment #15
pounardWhy but why !!! AAAHHH
Comment #16
pounardThis one should pass, but I suspect it reveals a weird bug or a bad behavior when saving an entity with no original one attached to it, but passing something else than an entity on the
original
property of the node.Comment #17
pounardOups.
Comment #18
pounardDrupal 7 patch is ready to roll :) Working gracefully and shows a real benefit when you have a high number of entity writes.
Here is a Drupal 8 version for fun, I'm not sure at all it will ever pass, at least node save works great on my working copy. Note that I'm not well acquainted with Drupal 8 entity and field API yet so there might be numerous best ways to achieve this, at least it's a start. Hope you like it Berdir :D
Comment #19
pounardFor the record, my comments in there are pretty much wrong I did rewrote it something like 10 times ! And the most important comments about Form API leaving unordered and arbitrary element values in the field items from the Drupal 7 patch should be put in the Drupal 8 patch too.
That's likely to be a problem at some point that field values may contain arbitrary data and unordered properties, it makes comparison very difficult, and data might even never be the same from one runtime to another ! Nothing could confuse more the developers than moving properties in there.
Comment #20
pounardOff topic: I also figured that profiling a single entity display page generates a xdebug profiler trace that weights 40M. That's huge ! Even on the most bloated Drupal 7 sites I had to work on, I never went more than 20M on a single page (that's quite a difference). It does not implies that performance is that wrong, it's just a stack trace, but that's oftenly quite relevant.
Comment #22
pounardPlease note that Drupal 7 is working, and Drupal 8 version will differ a lot because Field API si not the same, I'd really like to see Drupal 7 unblocked even thought it's not ready for Drupal 8.
Comment #23
pounardDrupal 7 patch needs review.
Comment #24
pounardThis is not a test anymore, it is actually working. I'd like you, core commiters and maintainers, to study the possibility of commiting the Drupal 7 core patch without going throught Drupal 8.
Drupal 8 did diverge so much in Field API from Drupal 7 that it is a way more complex bug, and both patches won't have nothing in common.
Please consider reviewing this a Drupal 7 only patch.
Comment #26
mgiffordEven if the patches look nothing alike, the feature needs to exist in D8 before it is brought into D7. There's not much point reviewing the D7 patch unless you are actively running into this problem.
Comment #27
BerdirThanks for digging up this issue, I was looking for it :)
Yes. The code in 8.0.x is really not that different, it was moved around, but the concepts are still exactly the same.
And as @mgifford said, there's little chance that a change like this will make it into 7.x, and definitely not without implementing it in 8.x first.
@alexpott started working on a changed-fields concept in #2456257: [PP-1] Normalizers must set $entity->_restSubmittedFields for entity POSTing/PATCHing to work, I had something a bit different in mind as that won't help much with e.g. saving entities in a form.
Comment #28
pounard9 month, many Drupal 7 releases, and this work is lost in space because of D8-first insanity. Do it for D8 if you want, but I won't help you for this, I already tried, and we already lost valuable months for thousand of existing in production Drupal 7 sites.
Comment #29
BerdirThat may be true, but there's nothing that we can do about it, Drupal 7 isn't moving much and there's zero chance that it will be committed in 7.x, and nobody is looking at it. However, we do have a fair chance to get it into 8.x, and have it widely tested there, and then a 7.x backport might still be an option.
It also has nothing to do with "D8-first-insanity", it's just extremely hard to find people to review 7.x patches. I have a few of those patches myself. If this would have been RTBC for weeks/months, then would understand your point, but that's not what this has been blocked on.
Here's a reroll and a few fixes, no interdiff as it required quite a few changes to apply and is a on a different class now. Passed some of the tests where it failed before.
Comment #30
BerdirNow with actual patch.
Comment #31
BerdirDon't need the null stuff actually, missed that it's checked already.
Comment #34
BerdirisNewRevision() doesn't work anymore at that point, apparently, it's set back earlier.
Comment #35
pounardIt's untrue, Drupal 7 is live and tens of thousands of production sites are running it. Drupal 8.x won't be stable nor used for criticial production services before months or years if it things don't go as expected. Many people are actually turning their back to Drupal 7 only because core contrib such as you tell this exact same thing, while thousands of them are actually just expecting a bit of attention. If you're not capable of maintaining the stable version of the product, I won't ever trust you ever for the next release to be a good product. Let's be honest, this is just taking production site hostages just because you don't care about it. I do care about it, because this is what I do for a living: maintaining production sites. Drupal 8 is today just an unstable, slow, and fashionable toy while we have real constraints out there in the real world.
EDIT: And I am sorry for the rant, but sad reality is sad, lots of Drupal 7 patch are commited only when Drupal 8 changed enougth so that they're not applicable anymore, and that somewhat makes me angry because I encountered the case more than once. I did re-run tests over your patch because I'm not really why this it should fail this way.
Comment #37
pounardAbout the Drupal 8 patch, the *[Data|Thingy]Interface had an ::equals() method, this patch would be a 1-liner; but that's another topic.
Comment #40
Fabianx CreditAttribution: Fabianx commentedI actually think chances are there this could get into D7, but yes it needs to go into D8 first.
And often enough D8 found some problems that D7 would have run into later, so the backport policy is fine.
@pounard: Please add to the groups.drupal.org/high-performance list of D7 performance patches, then it gets more visibility and testing.
Even announcing in that group usually gets you at least 3-4 reviewers ...
Comment #41
BerdirThis should fix the last test fail. We can't return early when checking for empty, other translations might have a value.
Comment #43
BerdirStrange, not exactly sure why that is happening.
Comment #44
BerdirComment #45
amateescu CreditAttribution: amateescu commentedThis is the same optimization that we have for menu links even in D7, so it makes perfect sense to bring it on the entity level as well.
We definitely need to take this approach because TypedData/FieldItemList objects can work with their internal properties so we'll save a lot php magic function calls by not accessing them through the entity object.
I'm not sure these are actually needed, the field items are probably already filtered to death up to this point :)
@pounard, you might be interested in this issue #2462101: [policy, no patch] Update backport policy to account for 8.x minor versions and 7.x divergence, although in your case it's more like "frustrating" than "confusing" :)
Comment #46
Fabianx CreditAttribution: Fabianx commentedCan we get a follow-up for that?
RTBC, looks great to me.
Do we need additional test coverage for that?
Comment #47
BerdirI think that was a crosspost with @amateescu, I'd like to run some performance tests with this, and explore ways to make the comparison code faster and nicer. But thanks for the RTBC :)
Comment #48
pounardEven though I'm glad to see this finally waking up, the fact is that D8 did take the D7 version as hostage for a year. I still think the backport policy is stupid and should not exists. I won't do D8 patches, already did, lots of time, but D8 is moving too fast and I don't have a few days per week to spend to follow something that is inconsistent in time; even if I am following what's happening for D8, I don't have the man time to learn deeply its internals since everything I'll learn will be outdated a few weeks later. Every patch I made in this context was getting broken by another one, even when a D8 contributor finally vouch for it, new concerns around the patch arise and D7 is put into sleep until all core contributors agree about a radically different D8 solution: this sometime takes up to 2 years (not a joke) to see a D7 one-liner able to get in, just because D8 was too broken or too different to get it in. In almost all cases, this ends up with a radically different solution in D8. Both issues in almost all cases should be split and have their own lifetime, they are NOT the same issue, they just embrace the same base concept. D8 and D7 are not the same product anymore, get over it and just open two issues.
Comment #49
plach#2428795: Translatable entity 'changed' timestamps are not working at all is trying to add an internal state to entities to track field value changes. This might be useful aside from the storage context. Actually it's currently limited to just track which translation objects have changes, but could extended to field values. Thoughts?
Comment #50
Fabianx CreditAttribution: Fabianx commented#47: Per the backport policy, we should get the minimal fix into D8 OR split this off for a larger fix and allow D7 to go ahead.
It is fine to check profiling, but #45.1 is out of scope for a minimal fix.
Comment #51
pounardThanks a lot @Fabianx.
@platch is right there might be a less odd solution than iterating over all levels of field values.
I think that @Berdir's patch in #44 is probably the smaller fix that can be done, and it passes tests, but I'm convinced that the @todo I left about ::equals() or ::compare() would be the best approach to solve this.
Comment #52
mkalkbrennerI had a look at the patch #44. It's right that it's just a minimal change. Nevertheless if we introduce a tracking of "dirty" fields within entities triggered by value changes / additions / removals, function hasFieldValueChanged() can skip all its logic and just access that list of dirty fields.
There are already two patches (I'm aware of) targeting such a tracking:
#2456257: [PP-1] Normalizers must set $entity->_restSubmittedFields for entity POSTing/PATCHing to work
#2428795: Translatable entity 'changed' timestamps are not working at all
Comment #53
pounardSo, just open a Drupal 8 issue and leave Drupal 7 go with its own then ^^
Comment #54
BerdirSo, tested this on my install profile with 33 nodes with different content, just loaded them all and saved them again. Most have quite a few configurable fields. More realistic use cases would be to change a field or two, but that could also be a base field, like publishing or unpublishing a bunch of nodes. So this basically shows the most optimistic use case of re-saving without changing any configurable field.
Time spent in Drupal\Core\Entity\Sql\SqlContentEntityStorage::saveToDedicatedTables() goes from 400ms with HEAD to 100ms with the patch. Most of the remaining time is now spent in hasFieldValueChanged() (85ms), so there's definitely room for improvements, but at the same time, there's also the xhprof overhead to consider, as usual, that's close to doubling the execution time. When comparing with just microtime(), the execution time just for the save calls is:
HEAD:
xhprof enabled: 1.6s
no xhprof: 1s.
Patch:
xhprof enabled: 1.25s
no xhprof: 0.75s
So, this is definitely a nice improvement. And that site does have almost no content. I've run the same script on a site with 500k nodes and limited it to loading and saving 50 nodes, without xhprof:
HEAD: 1.66s
Patch: 0.95s
The test script:
What I'm worried about is that hasFieldValueChanged() has no explicit test coverage. We know it's not breaking anything, but just from looking at the test results, we don't actually know if it is filtering out things. Will think about writing some tests for that.
Comment #55
Fabianx CreditAttribution: Fabianx commentedSo 20-25% faster for the less content case and 40% faster for the much content case. WOW!
So regarding xhprof overhead:
One thing that has been overlooked is that xhprof overhead is almost _constant_ (because it is a constant call per PHP CPU tick more), that means if you test a site with xhprof enabled both patched and unpatched and then do the same with xhprof off, the absolute numbers will vary widely, but in all benchmarks the percentage wise was always very near together OR within the margin of error even.
So if we save 60k function calls then that will have less effect in absolute numbers, but almost the same effect in percentage numbers, because those 60k calls have a constant overhead caused by the profiling.
This is the same here again, so I think we can stop the paranoia with xhprof. ;) (though providing numbers without xhprof is good of course!)
-------------------------
That all said:
This is numbers wise a critical issue based on what it can save. Yes, it is not a regression from D7, but it blocks a D7 patch.
Therefore - in being honest - marking as such.
Entity Saving even in D7 is very slow and I have seen lots of entity saving related slowness recently on some sites in New Relic, so something to improve that is very much appreciated.
I agree some explicit test coverage would be nice.
Thanks for the benchmarks, Berdir!
Comment #56
pounardI think that it won't be easy to test this effectively since the field content may vary upon the scenario, when you save from a form you have a highly polluted set of data that comes from the form metadata or inputs that are not directly data to save. That's something that should be fixed too. The most important thing in my opinion is to test the entity save more than this function itself, and that's what core already does cover pretty well.
Comment #57
jibranNW because of #55
Comment #58
Anonymous (not verified) CreditAttribution: Anonymous commentedOh sweet. There is an issue for this. I was just thinking about it :D
Comment #59
xjmComment #60
alexpottI created FieldItemListInterface::equals and added a unit test for it.
Comment #61
Fabianx CreditAttribution: Fabianx commentedLooks great!
I think a unit test is all we need, therefore: RTBC.
Comment #62
BerdirLooks good, just some code standard stuff.
The unit tests are great, but they also won't help answer the question if we need that filter calls or not.
The only thing I'm wondering about is if we want to move the field item level comparing into the FieldItem class...
We don't really need "sadly" here I think ;)
afaik coding standard requires a comment here.
I know it's long, but I think coding standard says we don't abbreviate variables :)
Comment #63
pounardFrom what I see this issue is definitely taking over into "not so simple to resolve" and therefore should not block the Drupal 7 issue, as per the backport policy states.
Comment #64
pounardComment #65
xjmComment #66
yched CreditAttribution: yched commentedThat @todo is done ?
Minor - the actual flow might be clearer as :
if ($field_definition->isTranslatable()) {
$langcodes = array_keys($entity->getTranslationLanguages());
}
else {
$langcodes = [$entity->getUntranslated()->language()->getId()];
}
param name is misleading, it's an ItemList, not an item
--> $items_to_compare ? or just $items ?
Comment #67
yched CreditAttribution: yched commentedThen, thinking about the overall logic :
Current patch deletes / re-inserts all values in all languages if one value changed in one language.
Instead of hasFieldValueChanged() ("did a value change in any language, TRUE/FALSE"), we could do getChangedLanguages() ("list of languages where a value changed"), and then delete / re-insert values only for those languages ?
Would mean the same number of queries as with current patch (one DELETE + one multi-INSERT per field where some value changed in any language). Just, it would delete and re-send much less data on multilingual sites (95% use case is to update values in one language only)
OTOH, this would mean actually doing the comparison in all languages, while the current patch stops comparing at the first language where a change was found, so that's potentially extra CPU overhead.
Would that be worth benchmarking ?
Comment #68
Fabianx CreditAttribution: Fabianx commented#67: Is it possible to do the language comparisons in a follow-up? Or would that complicate a future version and mean a BC break for things introduced here?
There is a whole meta devoted to language + revisions and writing data and dirty entities.
Comment #69
alexpottI don;t think making the comparison on the field item level is worth it. The current method compares multiple field items at the same time and is quicker because of that. The current patch adds a further optimisation by checking early for equivalence so we don't do unnecessary array walking and sorting.
re #62
re #66
$list_co_compare
Comment #70
Fabianx CreditAttribution: Fabianx commentedBack to RTBC, all feedback has been addressed.
Comment #71
yched CreditAttribution: yched commentedre: @alexpott :
Is that in reply to #67 ? Because that is not what I'm suggesting there :-)
re @Fabianx #68:
Refining to #67 in a followup would mean breaking the protected ContentStorageBase::hasFieldValueChanged() introduced here. Technically that's a BC break, would only break subclasses, meaning mostly Mongo, which probably woudn't use the method anyway.
Then it's mostly about splitting the thinking and discussion into two issues :-)
Then sorry, NW for a last remark on the current patch :
I'm not too comfortable with a generic $items->equals($other_items) method that has a side effect of modifying both $items and $other_items.
- Could we leave it to the caller ?
- Or maybe forget the generic FieldItemListInterface::equals(), and scale down to just having an internal ContentEntityStorageBase::compare($items_1, $items_2) helper, which can have any side effect it wants ?
Comment #72
alexpott@yched re "I don;t think making the comparison on the field item level is worth it" - nope that was a reply to @Berdir
re:
Nice catch - I totally agree - it is wonky. Fixed the filtering on equals and added tests around this.
Comment #73
Fabianx CreditAttribution: Fabianx commentedLooks great, leaving to yched or berdir to do the final RTBC. (according to new policy this would be: needs subsystem maintainer sign-off)
Comment #74
plachI don't think this if is required, the code in the IF branch should always work.
I don't think is covering the case where we remove translations: can't we just compare
array_keys($entity->getTranslationLanguages())
andarray_keys($original->getTranslationLanguages())
before entering the loop?Comment #75
alexpott@plach thanks. Fixed both of your points.
Comment #76
plachLooks good, thanks! Can we please add test coverage for the translation removal case?
Comment #77
alexpott@plach is there not implicit test coverage of this?
Comment #78
plachWell, the entity translation kernel test should provide implicit coverage for that, but then why the previous patch was green?
I'm also wondering whether it would be better to
asort
the langcode arrays before comparing them. I don't think language order could change during a request, but better safe than sorry.Comment #79
alexpott@plach I don't think we should bother using
asort
because this is a performance change and doing unnecessary sorting seems unnecessary.Added a test - but unfortunately it does not fail using the code from #72 :(
I found a bug with how we're detecting new revisions on entities that are not using revisions.
Comment #80
mkalkbrennerBTW I gave the new method FieldItemList::equals() a try and reused it for #2428795: Translatable entity 'changed' timestamps are not working at all (see #48). Seems to work :-)
Comment #81
yched CreditAttribution: yched commentedAgreed with the last patch, but I'll let @plach RTBC ?
Comment #82
yched CreditAttribution: yched commented:-)
Comment #83
plachOk. I'll see whether I can provide a failing test as a follow-up. No point in holding this up.
Comment #85
alexpottIt turns out that we mark the entity has not a new revision a little bit early.
Comment #86
alexpottlol
Comment #88
alexpottHmmm... we have problems with - is the code below actually valid?
It is called from
Drupal\path\Plugin\Field\FieldType\PathItem
Comment #89
alexpottDiscussed with @Berdir and we decided that we should change urlInfo to only throw an exception when we don't have an ID.
Comment #90
yched CreditAttribution: yched commented@alexpott: can you summarize why we need to move the "enforceIsNew() / setNewRevision()" bits around ?
Also, looks like the patch now *always* sets enforceIsNew(FALSE) the moment an entity is saved, instead of only in the "if $is_new" code branch previously ?
Comment #91
alexpott@yched I moved that because when I moved the setNewRevision() it seemed consistent to move this as well. So we only set the isNew and isNewRevision properties once the entity is completely saved and all the hooks can rely on this being consistent too.
Atm if you call isNew inside a field insert hook it will return false - even though you know you dealing with a new entity.
Comment #92
yched CreditAttribution: yched commentedDiscussed in DDD, makes sense noew.
Attached patch with a minor suggested change just to make reading that code easier next time I bump into it :-)
RTBC if those changes are ok with you.
Comment #93
yched CreditAttribution: yched commented@alexpott says he's OK with interdiff #92, so RTBC
Comment #94
catchVery minor nit but shouldn't this be $original_items since it's just a local variable?
This is nice that that's all we have to do.
A long time ago, we discussed using the setters to mark when a field was changed, which would allow us to drop $original. That should improve performance on entity writes further, since we'd save the loading we have to do to get $original. However I have a feeling we'd still have to do the comparison since for form submissions everything in the form would be 'changed' even if it isn't (unless we somehow did the comparison in the form submit and only set the changed items?). I'm not sure if that is actually feasible yet or not though.
Comment #95
alexpottFixed #94.1 - thanks for the review. Since it is such a minor nit I'm setting straight back to rtbc.
Comment #96
alexpottComment #97
yched CreditAttribution: yched commentedYeah, we definitely don't do that ;-)
So yes, marking changes through the setters would still need to actually compare the values, because a form submit "sets" all fields.
Comment #98
webchickLooks like catch's feedback was addressed. I did have one question which was why we're doing this special for entities and not, say, also config objects or whatever, but Alex explained because those aren't doing multiple table writes. Fair enough.
Committed and pushed to 8.0.x. Thanks!
Moving back to 7.x for backport, though I'm less sure it's critical there.
Comment #100
BerdirA separate issue was openend for D7: #2470619: Do not attempt field storage write when field content did not change. So I think we can mark this as fixed.
Comment #101
mkalkbrennerdeleted
Comment #102
plachAssigning back to me so I don't forget about that non-failing test.
Comment #103
mkalkbrennerHmmm,
I tried to use this function to detect changes in preSave() of an entity, in other word I compared some fields of $this against $this->original. On fields that obviously did not change this FieldItemList::equals() nevertheless returned FALSE. The reason sems to be this final check:
return $value1 === $value2;
This obviously has to fail on values like these timestamps:
or these booleans:
Comment #104
alexpottphp--
Comment #105
mkalkbrennerShould I re-open this issue or open a new one?
Comment #106
mkalkbrennerI opened a follow-up:
#2475237: Method FieldItemList::equals() uses type safe comparison limiting usefulness
Comment #107
joelpittetYou'd almost need to preemptively cast the types to their intended types before comparison, is that even a viable possibility?
PHP-- indeed!
Comment #108
Anonymous (not verified) CreditAttribution: Anonymous commentedIt's not PHP's fault. You know how the language works so don't blame it. The issue is the inconsistency of data handling, that's all. And I don't see a reason why not use == instead of ===.
The issue here should be the tests not catching this.
Comment #109
joelpittet@ivanjaros I was thinking things like this may pop-up:
Comment #110
mkalkbrenner@joelpittet, @ivanjaros: The discussion continues at #2475237: Method FieldItemList::equals() uses type safe comparison limiting usefulness. Please post you're comments there.
Comment #112
xjmRemoving tag since there was a separate 7.x issue.