Proposed commit message
Issue #2432791 by alexpott, vijaycs85, tim.plunkett, joshtaylor, Fabianx, Berdir, yched, bojanz: Skip Config::save schema validation of config data for trusted data
Problem/Motivation
Configuration Import is slow and needs a lot of memory.
A lot of that is the schema validation in Config::save. We also recalculate the dependencies for all configuration entities.
Proposed resolution
Add a trusted flag to Config::save() / ConfigEntityBase::save() so that casting to schema and calculating dependencies can be skipped for module and theme installation.
This does not affect schema coverage during testing because we still have the config save event listener that checks all configuration - and if any module ships with invalid config this will find it :)
Furthermore, once we implement import validators for dependencies and schema we can skip them during import making that quicker too. See
- #2414951: Validate configuration schema before importing configuration
- #2416109: Validate configuration dependencies before importing configuration
Remaining tasks
- Review
- Commit
User interface changes
None
API changes
- Add a trust_data flag to
Config::save()
- Add
ConfigEntityInterface::trustData()
andConfigEntityInterface::hasTrustedData()
- Add
config_export
annotation to configuration entity types - Add
ConfigEntityTypeInterface::getPropertiesToExport()
Beta phase evaluation
Issue category | Bug because major performance/DX/UX impact on config import |
---|---|
Issue priority | Critical because of the massive performance improvements to the installer and cold cache view of node/1 |
Comment | File | Size | Author |
---|---|---|---|
#97 | 2432791.97.patch | 70.88 KB | alexpott |
#97 | 94-97-interdiff.txt | 2.66 KB | alexpott |
#94 | 2432791.94.patch | 68.55 KB | alexpott |
#94 | 85-94-interdiff.txt | 3.76 KB | alexpott |
#85 | interdiff.txt | 948 bytes | tim.plunkett |
Comments
Comment #1
Fabianx CreditAttribution: Fabianx commentedComment #2
Fabianx CreditAttribution: Fabianx commentedComment #3
alexpottSomething like this?
Note that this patch should fail because we have tests that schema are enforced during module install.
Comment #5
alexpottWhoops.
Comment #7
alexpottWill still fail (ithink) but less :)
Comment #8
catchComment #9
alexpottComment #10
alexpottComment #11
alexpottComment #13
alexpottFingers-crossed.
Comment #15
dawehnerWhy do we consider something as not trusted, once we have saved it? Maybe add some line of documentation to explain that behaviour.
Comment #16
Gábor HojtsyThe IS says "Add a trusted flag so that these steps can be skipped for module installation.". Would be good to extend the IS on what is the scope of the proposed trusted flag, what would it mean, etc. Likewise, as @dawehner pointed out the code would need some docs too. Its hard to evaluate this patch without knowing its scope.
Comment #17
alexpottNot sure why this change is exposing the bug in the contact module but it is - it is probably due to removing the calculate dependencies change.
Comment #19
Fabianx CreditAttribution: Fabianx commentedDo you think views could use this check too to skip the $this->addCacheMetadata() step or should this be follow-up in another issue?
I like the patch a lot already!
Comment #20
alexpottFixed the last test fail.
Comment #21
vijaycs85Minor: makeTrusted/markTrusted would be more appropriate.
Minor: isTrusted?
Minor: $is_trusted = FALSE
!!float ?
Comment #22
Fabianx CreditAttribution: Fabianx commented#21: +1 to isTrusted() and markTrusted().
Comment #23
Wim Leerss/$trusted_data/$is_trusted/
It's a boolean, not data, the parameter name should match that.
There's no parameter?
Comment #24
bojanz CreditAttribution: bojanz commentedComment #25
alexpottThanks for the reviews everyone.
re @vijaycs85 #21
re @Wim Leers #23
Comment #26
BerdirLooks like drush si standard is about 45s vs. 43s with this, only did a single run, so might vary a bit.
I guess I was hoping for a bit more, but we still have runtime configuration changes, that need to be checked, and thefore we end up parsing/loading the config schema after every installed module anyway.
One obvious example is saving the module list in ModuleInstaller ;)
Actually, even if we e.g. do call it there with save(TRUE), we still need need the config schema for default configuration, because config entities need to config schema in toArray().
To summarize, while this makes things a bit faster, there is no way to avoid re-parsing config schema after every installed module.
Comment #27
Fabianx CreditAttribution: Fabianx commentedIt saves quite some memory with non-drush install though ...
Comment #28
BerdirSure, it helps, just not quite as much as I'd hoped it would :)
Comment #31
alexpottRerolled.
Comment #33
alexpottThis patch continues to find interesting things :)
Comment #34
vijaycs85+1 to RTBC.
Comment #36
Fabianx CreditAttribution: Fabianx commentedWorks nicely here.
RTBC
Comment #37
vijaycs85updating beta evaluation...
Comment #38
vijaycs85Comment #39
joshtaylor CreditAttribution: joshtaylor commentedReroll due to https://www.drupal.org/node/2090115
Comment #41
vijaycs85seems fails are due to new config introduced in #2090115: Don't install a module when its default configuration has unmet dependencies
Comment #43
vijaycs85oops... Missed one...
Comment #45
alexpottThe rerolls from #39 and included a lot of unrelated change. Interdiff is to #33.
Comment #47
alexpottFixing new test failures.
Comment #48
Fabianx CreditAttribution: Fabianx commentedI don't want to hold this up as the patch is really really good, but I still wonder why we use two different schemas to do the same thing ...
That feels weird in a way.
I forgot:
Is there a reason for that?
I think the parameter needs to be $has_trusted_data here as well.
Comment #49
alexpottComment #50
Fabianx CreditAttribution: Fabianx commentedRTBC then, thanks for the information!
Edit: Does this need the API change tag?
Comment #51
alexpottWe can stop the schema cache rebuilding during extension install too.
Comment #52
Fabianx CreditAttribution: Fabianx commentedLooks great to me!
I wonder if its almost easier to make trusted the default and opt-in for validation, but that can be a follow-up discussion.
I also wondered if you would like some new profiling for #51?
Comment #53
alexpottDone some per testing... I'm not sure this is worth it. I've fixed all the untrusted writes during both standard and minimal install. As @Berdir pointed out in #26 we need to do something about ConfigEntityBase::toArray().
Minimal install
Without the patch
With the patch
With all configuration writes are trusted.
Standard
Without patch
With patch
With all configuration writes are trusted.
Comment #54
Fabianx CreditAttribution: Fabianx commentedPre-liminary numbers on #53:
With patch
* 1 m - 1m 3 secs 'real time' without patch
* 23.* secs user time - without patch
With patch
* 48 - 51 secs 'real time'
* 16.* secs 'user time'
Comment #56
alexpottSo the patch attached no longer uses schema to work out what properties to export to config from config entities. It would also allow camel case properties to be mapped to snake case keys in config.
@Fabianx can you profile this - I'm seeing a 10 second improvement during a standard install in Drush - but not a lot of memory improvements.
Comment #57
alexpottI left some debug in :(
Comment #59
bojanz CreditAttribution: bojanz commentedNice! Note that I started a similar mapping in #2411967: Allow config entities to specify the config key <-> entity property mapping in order to resolve #1832630: [policy, then patch] Decide on coding standard for configuration keys of ConfigEntities, I'm fine with most of that work happening here instead.
Comment #60
alexpottFixed some of the tests.
Comment #61
Fabianx CreditAttribution: Fabianx for Acquia commentedTagging
Comment #62
Fabianx CreditAttribution: Fabianx for Acquia commentedNeeds some profiling, RTBC from my side unless something is missing ...
Comment #63
tim.plunkettAdding this tag not because it improves DX, but because it regresses.
Comment #64
bojanz CreditAttribution: bojanz commentedReviewed the patch.
The good things:
1) Improved performance, and the ability to mark data as "trusted".
This is major use case in Commerce 2.x because we import known valid data from our libraries (currencies, address formats, tax types, tax rates).
2) The ability to map entity properties to config keys. This resolves #1832630: [policy, then patch] Decide on coding standard for configuration keys of ConfigEntities, an issue we've been stuck on for years, which is a big source of core inconsistency, causing core to not respect its own coding standards, and core entity types to somethimes use camelCase properties, and sometimes not.
Most config entities have at least one property that will need to be mapped this way.
3) Much less important than #1 and #2, a config entity can now be saved before it has a schema declared, making it easier to get started with some prototype code. I'm not sure how important this is, but I'll "allow" the argument.
The bad things:
1) Config entity types now need to define another annotation key, listing their properties. This is another step, and duplication, since they will end up declaring the properties in schema as well. So, not exactly elegant.
However, I'm okay with it, since #2 basically mandates it (and the patch in #2411967: Allow config entities to specify the config key <-> entity property mapping started doing the same), and because performance.
Comment #65
joshtaylor CreditAttribution: joshtaylor commentedReroll.
Comment #67
alexpott@joshtaylor These lines needed to be merged.
Comment #68
alexpott@tim.plunkett and @bojanz actaully if the ConfigEntityType does not provide an annotation it falls back to using schema - also if you do provide a mapping you don't have to provide a schema. So in someways this patch increases flexibility and developer choices.
Comment #69
alexpottWith this patch
Standard site install using drush
Viewing node/1 on a code cache
XHprof overall summary:
Without this patch
Standard site install using drush
Viewing node/1 on a code cache
XHprof overall summary:
Given the 10 second improvement on standard install and the 0.5 sec improvement on cold cache I think this should be critical. The reason for the node/1 cold cache performance boost is the entity displays a cached and in building their cache entry they use
toArray()
which in HEAD requires schema and in this patch does not.Comment #70
yched CreditAttribution: yched commentedWondering what triggers a cache/serialization of the EntityDisplay in node/1 ?[edit: sorry, I misread @alexpott's comment. More on that in a new comment below]
Also - that number on the cost of toArray() using schema seems scary...
Comment #71
alexpott@Fabianx noticed that we hadn't quite finished removing
config_schema_test.schema_in_install
New comparison available here... This shows cold cache node/1 - the improvement has dropped to a 17% reduction in wall time - still significant. Xhprof is unreliable with this many function calls :(. But even with that this is still critical because that level of improvement is still a massive 118ms.
Comment #72
joshtaylor CreditAttribution: joshtaylor commentedSeeing some nice speed improvements when installing via drush as well..
Without:
With:
Comment #73
Fabianx CreditAttribution: Fabianx for Acquia commentedRTBC, looks great now!
Comment #74
Fabianx CreditAttribution: Fabianx for Acquia commentedAdded a proposed commit message.
Comment #75
yched CreditAttribution: yched commentedAlso, added some thoughts based on this in #1977206-36: Default serialization of ConfigEntities. Opinions welcome :-)
Comment #76
alexpottTicking the boxes in the commit credit form to make @Fabianx's suggested commit message the commit message.
Comment #77
yched CreditAttribution: yched commentedI initially understood that as an EntityViewDisplay being serialized, which felt odd for node/1.
But actually that's EntityManager::getViewModes() - its data is cached, but on cold cache this internally (getAllDisplayModesByEntityType()) calls toArray() on every view_mode / form_mode config entity.
That's not specific to node/1, it would happen on any page after a cache rebuild, when rebuilding local tasks data (form modes / view modes add local tasks in Field UI screens)
getAllDisplayModesByEntityType() calling toArray() on every view_mode / form_mode config entity feels a bit weird, but I don't really have a better proposal. It's really the cost of the schema-based toArray() that is scary :-/
Comment #78
alexpottre #70/#75
@yched the cache of the EntityDisplay is caused by calling
Drupal\Core\Entity\EntityManager::getAllDisplayModesByEntityType
which is called as a part ofDrupal\node\NodeViewBuilder::getBuildDefaults
.@yched and yep using schema for toArray() has always been a bit scary because of toArray()'s reliance on schema being available.
Comment #79
BerdirMore numbers:
PHP 5.6, HEAD:
PHP 5.6, patch:
PHP 7, HEAD:
PHP 7, patch:
$properties_for_annotation is used exactly once, why not just loop over $this->config_export?
this is a bit weird, because it's not actually static ;)
Also wondering if we shouldn't camelCase it, to make it clear that it's not an annotation property?
$this->normalizedPropertyMap or something?
why false and not null? in other places, like entity load, we switched from returning false to null.
Comment #80
xjmSo this will need a reroll for #2204697: Move getConfigPrefix() to ConfigEntityTypeInterface apparently -- also, looks like some feedback in #79 to address.
Comment #81
alexpottAddressed everything in #79 and done the reroll due to #80. Interdiff is tricky due to reroll. C&p of the relevant code that addresses #79.
No weird local variables anymore :)
Comment #82
BerdirYes, getPropertiesToExport() looks much better now :)
Is it worth to have a change record for this? It's kind of new, but also something that existing entity types are expected to implement and possibly also custom hook_install() code.
Comment #83
tim.plunkettWhat's to stop us from moving this into getPropertiesToExport(), in order to avoid recalculation of this as well?
Why camelCase for one and not the other? Just because one is intended for use in an annotation?
In the calling code we don't differentiate between [] and NULL (by using empty()), why not just always return an array?
Adding some unit tests to cover the changes to config entities.
Comment #85
tim.plunkettMy apologies. I shouldn't refactor right before uploading without rerunning tests :)
Comment #86
Berdir2. Yes, that's exactly why I suggested camel case. It makes sense IMHO to separate it from the annotation keys, because setting it could have a very weird effect. But it's not too important if others disagree.
Comment #87
BerdirOn 3. I kind of like the explicit definition of not knowing the properties to export. But if we do 1. (and I kind of like it), then that should make this irrelevant because we always would have an array.
Comment #88
alexpottThe problem with (1) is that we then need to provide the full name - since without that we can't determine the schema. So doing this would make yched sad - see #1977206: Default serialization of ConfigEntities
Comment #89
yched CreditAttribution: yched commentedRe "1": so yeah, #1977206: Default serialization of ConfigEntities will need to determine cases where ConfigEntityBase ::serialize() can safely call toArray() and be sure that it won't rely on config schema.
AFAICT, the easiest way will be to check for the presence of a config_export key directly, so I'm not sure we will really use getExportProperties(), we have no actual use for its return value anyway, what we need is the end result of toArray().
That would mean ::serialize() makes assumptions on how ::toArray() works (config_export key present == toArray() won't rely on the schema), but I don't really see a better option ?
Comment #90
yched CreditAttribution: yched commentedIn fewer words :-), I don't think I see an issue with getExportProperties() internalizing the "config_export key or use config schema" logic ?
Comment #91
yched CreditAttribution: yched commentedPosted a PoC in #1977206-41: Default serialization of ConfigEntities
Comment #92
alexpottI'd still rather not unnecessarily tie this method to schema and the configuration entity's configuration object name.
Comment #93
alexpottWe've done the necessary profiling see #69, #72 and #79
Comment #94
alexpottAdded some tests which found some edge cases in Config and Configuration entities if schema has changed the underlying data.
Comment #95
alexpottUpdating suggested commit message.
Comment #97
alexpottFix up the tests.
Comment #98
Fabianx CreditAttribution: Fabianx commentedBack to RTBC, looks great now!
Comment #99
catchCommitted/pushed to 8.0.x, thanks!
Comment #101
tim.plunkettWow that was fast. Guess #83.1 and #83.3 didn't make it :(
Comment #102
Fabianx CreditAttribution: Fabianx commented#101: Sorry, I missed that it was not yet answered.
Can we do #83.1 as a non-critical follow-up?
Comment #103
catchI also missed that it wasn't answered, seems OK for a follow-up though indeed.
Comment #104
vijaycs85created #2483407: Allow configuration schema fallback in ConfigEntityType::getPropertiesToExport() to work without an ID
Comment #105
alexpott#92 is my response to #83.1 and .3. I just don't think it is necessary to couple ConfigEntityType to the configuration object name of a Config entity object.