Problem/Motivation
Follow-up from #3199691: [Symfony 6] Symfony no longer parses octal numbers starting with 0 alone.
- We want to use modern features in Symfony YAML but are being constrained by the complexity of supporting these features in PECL YAML
- PECL YAML uses libYAML, and libYAML still does not have YAML 1.2 support. https://github.com/yaml/libyaml/issues/20
- We originally added support for PECL YAML due to concerns with the performance of Symfony's parser, especially when compared with parsing JSON when that was a possible config storage format. Symfony is generally parsing YAML on build, not runtime.
- PECL YAML was not installed on PHP 8 for ages and when it was we didn't see a massive speed up
- I think all the important parts of YAML parsing use a FileCache. This means that if the regular cache is not available and the APC cache is primed with these files then YAML parsing does not happening. This means that it would be possible to warm these caches prior to a cache clear on production as FileCaches are not cleared during a cache clear.
- Performance tests in #20 showed the performance difference with PECL YAML is negligible
Steps to reproduce
Proposed resolution
Keep an eye on libYAML, and track any divergency/workarounds in the meantime that we might be able to undo once both YAML parses have 1.2 support.
- Remove the YAML implementation switching to only use Symfony YAML
- Move the logic from
\Drupal\Component\Serialization\YamlSymfonyto\Drupal\Component\Serialization\Yaml - Replace
\Drupal\Core\Serialization\Yamlwith an alias toDrupal\Component\Serialization\Yaml - Remove the Yaml PECL validation work-arounds
Remaining tasks
User interface changes
API changes
Data model changes
Release notes snippet
| Comment | File | Size | Author |
|---|
Issue fork drupal-3205480
Show commands
Start within a Git clone of the project using the version control instructions.
Or, if you do not have SSH keys set up on git.drupalcode.org:
- 3205480-drop-pecl-yaml
changes, plain diff MR !6224
Comments
Comment #2
andypostComment #8
wim leersComment #9
wim leersPer @alexpott at https://drupal.slack.com/archives/C2THUBAVA/p1699953782042549?thread_ts=...
As an additional benefit: we'd be able to use
enums directly in our simple config & config entities' YAML representations: https://symfony.com/blog/new-in-symfony-6-2-improved-enum-support#enums-...Thoughts?
Comment #10
wim leersA first example where this would benefit greatly: #3401493: Convert 3 LOCALE_TRANSLATION_OVERWRITE_* constants used in the UI to a backed enum.
Comment #11
catchComment #12
alexpottAlso this means we can't use enums as arguments in the container. See https://git.drupalcode.org/project/drupal/-/jobs/353909 ...
Comment #13
alexpott#2951046: Allow parsing and writing PHP class constants and enums in YAML files is very interesting.
Comment #14
wim leersThis blocks the original/better approach in #3402168: Follow-up for #3361534: Config validation errors can still occur for contrib modules, disrupting contrib and other issues like it.
Comment #15
catchWe originally added support for PECL YAML due to concerns with the performance of Symfony's parser, especially when compared with parsing JSON when that was a possible config storage format. Symfony is generally parsing YAML on build, not runtime.
This was ten years ago though, but I think we need a comparison of something YAML-heavy like router rebuilding with and without the extension to see how they compare now.
Comment #16
andypostFYI since this 10 years PECL yaml extension has no changes and the same for libyaml which still has no support for 1.2 https://github.com/yaml/libyaml/issues/20
Comment #17
andypostMoreover 2 unit tests are broken when yaml extension enabled
Comment #18
wim leersI ran the latter. This is with:
symfony/yaml6.0.Results of the benchmark, which parses 3 YAML files 1,000 times for each parser:
Comment #19
andypostFYI last night the tarballs for new releases of PHP are published, I will test it today and tomorrow will roll out new images
Comment #20
wim leersFirst result for each of the 3 files: ~0.2 seconds for Symfony versus ~0.05 seconds for PECL YAML in #18, for the most complex YAML file. That's 4 times slower!
OTOH … how often does Drupal parse 1,000 YAML files? 🤔
Modifying the benchmark to match Drupal's needs
Let's check with a good body of YAML files in Drupal core. Those in the Standard install profile:
Results
Conclusion
This suggests to me that assuming a fairly recent machine (in terms of CPU + SSD) and PHP >=8.1 (the minimum for Drupal 10), the real-world performance difference is negligible.
Comment #21
alexpott@Wim Leers it's not just config... which is not usually parsed during a cold cache... it's services / routing / menu links etc...
Comment #22
catchYeah the ones I'm concerned about are these, particularly on more complex sites with 200-400 modules installed.
- container rebuild
- router rebuild
- library definitions
- menu links
- menu tasks
- menu actions
If you clear caches from the performance page, then you have to go through 100% of these before you get the next paint. iirc router rebuild happens on the POST request after the form submission, and the rest will happen on the subsequent GET request, but they nearly all block everything else and each other. Give or take menus though since they might happen within bigpipe placeholder replacement.
If you clear caches on a high traffic site, then the longer these take, the more chance of stampedes where multiple requests are trying to rebuild the same cache items before any one of them has finished.
If we take a site with 300 modules installed, then assume that every module has routes and services, and 50% of modules have at least one of libraries, menus, links or actions (so 2.5 YAML files per module overall), that is:
300 + 300 + 150 = 750 YAML files, plus a handful of extras for core.services.yml, core.libraries.yml etc.
For 400 modules 400 + 400 + 200 = 1000.
For 100 modules: 100 + 100 + 50 = 250.
Then if 94 YAML files leads to a 10 millisecond performance regression, we could assume that the reasonable worst case of 1000 YAML files leads to a ~100ms performance regression - this is enough to compound a stampede situation or to create noticeable drag on UI operations like clearing caches or installing new modules.
However even 100ms is not that bad compared to issues like #2940755: block_content block derivatives do not scale to thousands of block_content entities which also kick in during the same situations.
Autowiring should eventually mitigate some of container rebuilding (or at least shift it away from YAML), since there'll be less YAML to parse. The container is the least shiftable/most blocking thing, so that is good.
Routes as attributes will also shift some things away from YAML and that's probably where the bulk of the YAML verbosity is after services.
#3257725: Add a cache prewarm API would mitigate this a bit too.
So it still might be fine but don't think it's negligible.
Comment #23
alexpottAlso the reason this got pushed on again was enums... but it turns out we have a solution for PECL yaml for that ...so shrug and carry on might be okay here.
Comment #24
andypostAs yaml extension now added to 8.1-8.3 images let's see how it affect pipelines
Comment #25
smustgrave commentedIs this actually postponed on pipeline rtesting? Didn't see an MR or patch to review.
Not sure the right status.
Comment #26
andypostAs there's no regressions (images using yaml extension last week) the next step is #2157447: Add hook_requirements() for PECL YAML parser
Comment #27
mars0test commentedHi,
We can consider that many complex sites use CDNs or caching services to enhance performance. Without them, maintaining high performance for a complex Drupal website can be challenging.
Another question: Is it normal to perform a cache-clear on a production environment? Also, is YAML parsing the most significant issue when it occurs? ^^"
"Maybe we can try to favor the Symfony YAML parser when it comes to parsing the services.yaml file and improve our management of service declarations, which is not entirely equivalent to what we can do in Symfony 6.
Comment #28
alexpottI think given #3108309: Support Yaml::PARSE_CUSTOM_TAGS in \Drupal\Component\Serialization\YamlSymfony::decode we should look at this again. I think all the important parts of YAML parsing use a FileCache. This means that if the regular cache is not available and the APC cache is primed with these files then YAML parsing does not happening. This means that it would be possible to warm these caches prior to a cache clear on production as FileCaches are not cleared during a cache clear.
I've checked and the following stuff uses FileCache.
Note that config does not use FileCache - but this is okay because config is not read from files anymore during runtime - it is read for the config table.
Therefore I think the use-case to support PECL Yaml is tenuous and the complexity is not worth it. Plus it makes adoption of new features from Symfony Yaml (which is under more active development) harder.
Comment #29
andypostAbsence of file cache for config is separate issue, which is annoying only installing from config
Comment #30
alexpottIn order to not read Yaml during multiple cache rebuild from the UI with APCu installed and using standard we need to land both:
#3414825: InfoParser should use FileCache
#3414807: \Drupal\Core\Asset\LibraryDiscoveryParser should use FileCache to cache the parsed YAML
Wrt to #3389567: Drupal\Core\Config\FileStorage::getAllCollectionNamesHelper should use some caching - I think it is wrong to use file cache for something that will only ever be read once.
Comment #31
wim leers@alexpott Very interesting. Reviewed both of those issues 🏓
Comment #32
catchGiven we've run into issues more than a couple of times, I think we should go ahead here once we're confident that filecache is being used in all the places it should be.
Comment #33
ghost of drupal pastI really, really, really do not want to get involved in decision making but it seems this issue focuses on build time decoding.
However,
SerializationInterface::encodecan and is used run time. Dropping support for the vastly faster PECL emitter here is nothing short of disastrous. Please keep it. For my use case, keeping only YamlPecl::encode and throwing a not supported in YamlPecl::decode is a perfectly valid solution, cutting the maintenance burden down to practically nothing while keeping run time performance high. Thanks for your consideration.Edit: oh, and there's no need for the proxy class either here. Just please keep a
serialization.yaml_peclservice or some such. Thanks again.Comment #34
alexpott@Ghost of Drupal Past FWIW \Drupal\Component\Serialization\Yaml::encode enforces all Yaml encoding via Symfony so if you are using either \Drupal\Component\Serialization\Yaml or \Drupal\Core\Serialization\Yaml then encoding is via the slower Symfony already. And if you want to use PECL for encoding you'd have to write your own serializer already...
Comment #35
andypostSerialisation is weakly used even for igbinary it needs tons of hacks ala #2836091: Swap serialization in DatabaseBackendFactory
Having parts overridable out of box is good to have, speed of parsing by pecl still beats all usetland implementations, so sensible places can use SF implementation (config export) but the rest can have ability to choose
Comment #36
alexpott@andypost the drivers behind this are the we have needs for Symfony features on decode elsewhere see #3108309: Support Yaml::PARSE_CUSTOM_TAGS in \Drupal\Component\Serialization\YamlSymfony::decode for example. I think that once we have file cache for all the Yaml decoding in core there is no reason to automatically swap to PECL YAML for decoding (if it is available) and if we don't do that then we can consider deprecating the YamlPecl class because we are not using it in core. The deprecation can be a follow-up issue we people can argue whether or not we should keep it.
Comment #37
alexpottPushed an MR that disable the automatic switching to YamlPecl. FWIW we already know that this is not going to result in a massive slow down of test because of @Wim Leers excellent research above AND the fact that PECL YAML was not installed on PHP 8 for ages and when it was we didn't see a massive speed up.
Doing this issue will also allow us to close :
Comment #39
kim.pepperLeft a couple of comments.
Comment #40
alexpottThanks for the review @kim.pepper
Comment #41
needs-review-queue-bot commentedThe Needs Review Queue Bot tested this issue. It fails the Drupal core commit checks. Therefore, this issue status is now "Needs work".
This does not mean that the patch necessarily needs to be re-rolled or the MR rebased. Read the Issue Summary, the issue tags and the latest discussion here to determine what needs to be done.
Consult the Drupal Contributor Guide to find step-by-step guides for working with issues.
Comment #42
alexpottComment #43
wim leersFound only nits, so keeping at . I think this kind of low-level change with broad impact should be reviewed by multiple people, so not yet RTBC'ing.
Comment #44
needs-review-queue-bot commentedThe Needs Review Queue Bot tested this issue. It no longer applies to Drupal core. Therefore, this issue status is now "Needs work".
This does not mean that the patch necessarily needs to be re-rolled or the MR rebased. Read the Issue Summary, the issue tags and the latest discussion here to determine what needs to be done.
Consult the Drupal Contributor Guide to find step-by-step guides for working with issues.
Comment #45
longwaveMR needs rebasing, has some unrelated commits in it.
Added some questions to the MR.
Comment #46
kim.pepperUpdated the IS with the decisions made in the comments.
Comment #47
kim.pepperComment #48
kim.pepperComment #49
kim.pepperUpdated the CR
Comment #50
alexpottAddressed @kim.pepper's review - thanks for all the issue management @kim.pepper.
Comment #51
andypostAdded 2 more comments
Comment #52
andypostrebased to remove merged-in commits
Comment #53
andypost@alexpott looks like deprecation with class_alias is shorter then common
Comment #54
andypostAs pipeline showing
\Drupal\Core\Serialization\Yamlstill has some usage in core, so maybe needs follow-up to deprecate it too- https://git.drupalcode.org/issue/drupal-3205480/-/pipelines/80716
- https://git.drupalcode.org/issue/drupal-3205480/-/pipelines/80720
Comment #55
smustgrave commentedThere a recommended way for testing this one?
Comment #56
alexpott@smustgrave not really - Yaml reading is such an integral part of core that this code is called 1000s of time per test run. It's tested - the thing to check is the deprecations and does everything make sense.
Comment #57
smustgrave commentedBased on that nothing appears to have broken as tests are all green
Verified the deprecation has a working/correct link to CR 3415489
First time seeing an entire class deprecated but makes sense.
I see test coverage for the deprecations too.
LGTM!
Comment #58
catch#3205480-32: Drop PECL YAML library support in favor of only Symfony YAML still needs an answer for me - are there any places we aren't using filecache but should be? I think @alexpott mentioned config schema in slack.
Comment #59
smustgrave commentedThat's my mistake for jumping the gun.
Comment #60
wim leersAgreed with @catch, but the concrete example of config schema discovery is already okay, I think?
\Drupal\Core\Config\Schema\ConfigSchemaDiscoveryusesconfig.storage.schemawhich is\Drupal\Core\Config\ExtensionInstallStorage\Drupal\Core\Config\ExtensionInstallStorageextends\Drupal\Core\Config\InstallStoragewhich extends\Drupal\Core\Config\FileStorage\Drupal\Core\Config\FileStoragehas been usingFileCachesince #2473179: Use FileCache for config storageThere's many layers here, so I'd like someone to double-check 😅
Comment #61
alexpottRe #60 the file cache usage in
\Drupal\Core\Config\FileStorageis only in memory storage and not APCu cache. So if we read the schema twice in a request then we'd benefit - but not after a cache clear.I think this is okay as we don't actually read schema during a cache clear or regular use of your site. But I also think we should open an issue to add schema to the APCu cache.
Comment #62
alexpottOpened #3421483: Use APCu file cache for config schema
Comment #63
wim leersAFAICT that means we can RTBC this.
I re-reviewed the entire MR. I have only one nit: the use of
assertEquals()where we should be able to useassertSame(). Not commit-blocking though.Comment #64
longwaveCouple of nits on the MR.
Comment #65
alexpottAddressed @longwave's feedback as it was minor nits putting back to rtbc.
Comment #67
catchThis looks good now and unblocks some other issues.
Committed/pushed to 11.x and cherry-picked to 10.3.x, thanks!
Comment #69
wim leersThis unblocked #2951046: Allow parsing and writing PHP class constants and enums in YAML files and #3108309: Support Yaml::PARSE_CUSTOM_TAGS in \Drupal\Component\Serialization\YamlSymfony::decode! 🎉
AFAICT #3414647: Always use YamlSymfony for loading service yaml files is now obsolete?
P.S.: @catch, any chance the push to
11.xdidn't go through?Comment #71
kim.pepperPublished the CR
Comment #72
neclimduloh... sucks we don't have a standards compliant YAML parser anymore.
Comment #74
rsaddington commentedHowdy,
We've been working with a customer who is experiencing issues with the new (and slower) Symfony YAML parser.
When importing large config files they are seeing things 4x slower in Drupal 10.3 vs. 10.2
Is there a way we can continue to make the old php::yaml_parser available in Drupal 10.3?
Perhaps as a config option or as a settings directive?
Comment #75
donquixote commentedIs there any policy on prefering Drupal\Component\Serialization\Yaml over Drupal\Core\Serialization\Yaml?
Currently I see both used in Drupal core to equal amount:
- 38 matches for "use Drupal\\Core\\Serialization\\Yaml;"
- 34 matches for "use Drupal\\Component\\Serialization\\Yaml;"
- (All of lib/Drupal/Component uses Drupal\\Core\\Serialization\\Yaml, as it should.)
To me it seems the most sensible to standardize on Component Yaml, unless we have plans to replace the simple class_alias() in the future, to insert additional functionality in Core Yaml. But even then we would get mixed results, if the choice between the two is a bit random.
EDIT: The reason I ask is this:
I remember some code quality tool telling me to only use Component Yaml.
But my IDE autocomplete keeps putting Core Yaml at the top, possibly because it is used so often in core.