Problem/Motivation

Follow-up from #3199691: [Symfony 6] Symfony no longer parses octal numbers starting with 0 alone.

Steps to reproduce

Proposed resolution

Keep an eye on libYAML, and track any divergency/workarounds in the meantime that we might be able to undo once both YAML parses have 1.2 support.

  • Remove the YAML implementation switching to only use Symfony YAML
  • Move the logic from \Drupal\Component\Serialization\YamlSymfony to \Drupal\Component\Serialization\Yaml
  • Replace \Drupal\Core\Serialization\Yaml with an alias to Drupal\Component\Serialization\Yaml
  • Remove the Yaml PECL validation work-arounds

Remaining tasks

User interface changes

API changes

Data model changes

Release notes snippet

Issue fork drupal-3205480

Command icon Show commands

Start within a Git clone of the project using the version control instructions.

Or, if you do not have SSH keys set up on git.drupalcode.org:

Comments

catch created an issue. See original summary.

andypost’s picture

Version: 9.2.x-dev » 9.3.x-dev

Drupal 9.2.0-alpha1 will be released the week of May 3, 2021, which means new developments and disruptive changes should now be targeted for the 9.3.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 9.3.x-dev » 9.4.x-dev

Drupal 9.3.0-rc1 was released on November 26, 2021, which means new developments and disruptive changes should now be targeted for the 9.4.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 9.4.x-dev » 9.5.x-dev

Drupal 9.4.0-alpha1 was released on May 6, 2022, which means new developments and disruptive changes should now be targeted for the 9.5.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 9.5.x-dev » 10.1.x-dev

Drupal 9.5.0-beta2 and Drupal 10.0.0-beta2 were released on September 29, 2022, which means new developments and disruptive changes should now be targeted for the 10.1.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 10.1.x-dev » 11.x-dev

Drupal core is moving towards using a “main” branch. As an interim step, a new 11.x branch has been opened, as Drupal.org infrastructure cannot currently fully support a branch named main. New developments and disruptive changes should now be targeted for the 11.x branch, which currently accepts only minor-version allowed changes. For more information, see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

wim leers’s picture

wim leers’s picture

Title: PECL YAML library supports YAML 1.1, but Symfony supports YAML 1.2 » Drop PECL YAML library support in favor of only Symfony's YAML?
Status: Active » Needs review

Per @alexpott at https://drupal.slack.com/archives/C2THUBAVA/p1699953782042549?thread_ts=...

We should consider deprecating pecl yaml support

As an additional benefit: we'd be able to use enums directly in our simple config & config entities' YAML representations: https://symfony.com/blog/new-in-symfony-6-2-improved-enum-support#enums-...

Thoughts?

wim leers’s picture

catch’s picture

Issue tags: +needs profiling
alexpott’s picture

Also this means we can't use enums as arguments in the container. See https://git.drupalcode.org/project/drupal/-/jobs/353909 ...

alexpott’s picture

catch’s picture

We originally added support for PECL YAML due to concerns with the performance of Symfony's parser, especially when compared with parsing JSON when that was a possible config storage format. Symfony is generally parsing YAML on build, not runtime.

This was ten years ago though, but I think we need a comparison of something YAML-heavy like router rebuilding with and without the extension to see how they compare now.

andypost’s picture

FYI since this 10 years PECL yaml extension has no changes and the same for libyaml which still has no support for 1.2 https://github.com/yaml/libyaml/issues/20

andypost’s picture

Moreover 2 unit tests are broken when yaml extension enabled

wim leers’s picture

Issue tags: -needs profiling
  1. https://klm.no-ip.org/blog/2021/10-06-pecls-yaml-way-faster-than-symfony... from ~2 years ago
  2. https://github.com/koenpunt/yaml-php-benchmark from 8 years ago is outdated, but https://github.com/koenpunt/yaml-php-benchmark/pull/2 from August 2023 updates it.

I ran the latter. This is with:

  • PHP 8.1.25, with PECL YAML 2.2.2 which uses libYAML 0.2.5
  • symfony/yaml 6.0.
  • on a 2021 M1 Max MacBook Pro

Results of the benchmark, which parses 3 YAML files 1,000 times for each parser:

Run 1
$ php -f ./benchmark.php
PECL YAML
Total: 0.047293186187744
Average: 4.7293186187744E-5
Total: 0.042062044143677
Average: 4.2062044143677E-5
Total: 0.035311937332153
Average: 3.5311937332153E-5

Symfony YAML
Total: 0.20844388008118
Average: 0.00020844388008118
Total: 0.1512598991394
Average: 0.0001512598991394
Total: 0.088942050933838
Average: 8.8942050933838E-5

Spyc YAML
Total: 0.13850688934326
Average: 0.00013850688934326
Total: 0.094258785247803
Average: 9.4258785247803E-5
Total: 0.074319124221802
Average: 7.4319124221802E-5
Run 2
$ php -f ./benchmark.php
PECL YAML
Total: 0.051701068878174
Average: 5.1701068878174E-5
Total: 0.042438983917236
Average: 4.2438983917236E-5
Total: 0.035650968551636
Average: 3.5650968551636E-5

Symfony YAML
Total: 0.20855808258057
Average: 0.00020855808258057
Total: 0.15146589279175
Average: 0.00015146589279175
Total: 0.089360952377319
Average: 8.9360952377319E-5

Spyc YAML
Total: 0.13911104202271
Average: 0.00013911104202271
Total: 0.093008995056152
Average: 9.3008995056152E-5
Total: 0.07398796081543
Average: 7.398796081543E-5
Run 3
$ php -f ./benchmark.php
PECL YAML
Total: 0.049553155899048
Average: 4.9553155899048E-5
Total: 0.042342901229858
Average: 4.2342901229858E-5
Total: 0.035673856735229
Average: 3.5673856735229E-5

Symfony YAML
Total: 0.20927906036377
Average: 0.00020927906036377
Total: 0.15109014511108
Average: 0.00015109014511108
Total: 0.089078903198242
Average: 8.9078903198242E-5

Spyc YAML
Total: 0.13884806632996
Average: 0.00013884806632996
Total: 0.093702077865601
Average: 9.3702077865601E-5
Total: 0.074218034744263
Average: 7.4218034744263E-5
andypost’s picture

FYI last night the tarballs for new releases of PHP are published, I will test it today and tomorrow will roll out new images

wim leers’s picture

First result for each of the 3 files: ~0.2 seconds for Symfony versus ~0.05 seconds for PECL YAML in #18, for the most complex YAML file. That's 4 times slower!

OTOH … how often does Drupal parse 1,000 YAML files? 🤔

Modifying the benchmark to match Drupal's needs

Let's check with a good body of YAML files in Drupal core. Those in the Standard install profile:

$ cd yaml-php-benchmark/yaml
$ mv ~/core/core/profiles/standard/config/install/*.yml .
$ mv ~/core/core/profiles/standard/config/optional/*.yml .
$ cd ..
$ ls -al yaml  | wc -l
      94
# apply attached patch — which changes it from benchmarking individual files to ALL the files in the "yaml" directory

Results

$ php -f ./benchmark.php
PECL YAML
Total: 3.8904910087585
Average: 0.0038904910087585

Symfony YAML
Total: 13.704869031906
Average: 0.013704869031906

Conclusion

  • There is barely any variance now, because A) no results are shown for individual files, B) 94 YAML files instead of 3.
  • 3.9 vs 13.7 seconds for parsing all Standard config 1,000 times ⇒ PECL YAML is 3.5 times faster — not 4.
  • … but in the real world that'd mean 3.9 milliseconds vs 13.7 milliseconds

This suggests to me that assuming a fairly recent machine (in terms of CPU + SSD) and PHP >=8.1 (the minimum for Drupal 10), the real-world performance difference is negligible.

alexpott’s picture

@Wim Leers it's not just config... which is not usually parsed during a cold cache... it's services / routing / menu links etc...

catch’s picture

Yeah the ones I'm concerned about are these, particularly on more complex sites with 200-400 modules installed.

- container rebuild
- router rebuild
- library definitions
- menu links
- menu tasks
- menu actions

If you clear caches from the performance page, then you have to go through 100% of these before you get the next paint. iirc router rebuild happens on the POST request after the form submission, and the rest will happen on the subsequent GET request, but they nearly all block everything else and each other. Give or take menus though since they might happen within bigpipe placeholder replacement.

If you clear caches on a high traffic site, then the longer these take, the more chance of stampedes where multiple requests are trying to rebuild the same cache items before any one of them has finished.

If we take a site with 300 modules installed, then assume that every module has routes and services, and 50% of modules have at least one of libraries, menus, links or actions (so 2.5 YAML files per module overall), that is:

300 + 300 + 150 = 750 YAML files, plus a handful of extras for core.services.yml, core.libraries.yml etc.

For 400 modules 400 + 400 + 200 = 1000.

For 100 modules: 100 + 100 + 50 = 250.

Then if 94 YAML files leads to a 10 millisecond performance regression, we could assume that the reasonable worst case of 1000 YAML files leads to a ~100ms performance regression - this is enough to compound a stampede situation or to create noticeable drag on UI operations like clearing caches or installing new modules.

However even 100ms is not that bad compared to issues like #2940755: block_content block derivatives do not scale to thousands of block_content entities which also kick in during the same situations.

Autowiring should eventually mitigate some of container rebuilding (or at least shift it away from YAML), since there'll be less YAML to parse. The container is the least shiftable/most blocking thing, so that is good.

Routes as attributes will also shift some things away from YAML and that's probably where the bulk of the YAML verbosity is after services.

#3257725: Add a cache prewarm API would mitigate this a bit too.

So it still might be fine but don't think it's negligible.

alexpott’s picture

Also the reason this got pushed on again was enums... but it turns out we have a solution for PECL yaml for that ...so shrug and carry on might be okay here.

andypost’s picture

As yaml extension now added to 8.1-8.3 images let's see how it affect pipelines

smustgrave’s picture

Status: Needs review » Postponed

Is this actually postponed on pipeline rtesting? Didn't see an MR or patch to review.

Not sure the right status.

andypost’s picture

As there's no regressions (images using yaml extension last week) the next step is #2157447: Add hook_requirements() for PECL YAML parser

mars0test’s picture

Hi,

We can consider that many complex sites use CDNs or caching services to enhance performance. Without them, maintaining high performance for a complex Drupal website can be challenging.

Another question: Is it normal to perform a cache-clear on a production environment? Also, is YAML parsing the most significant issue when it occurs? ^^"

"Maybe we can try to favor the Symfony YAML parser when it comes to parsing the services.yaml file and improve our management of service declarations, which is not entirely equivalent to what we can do in Symfony 6.

alexpott’s picture

Status: Postponed » Needs work

I think given #3108309: Support Yaml::PARSE_CUSTOM_TAGS in \Drupal\Component\Serialization\YamlSymfony::decode we should look at this again. I think all the important parts of YAML parsing use a FileCache. This means that if the regular cache is not available and the APC cache is primed with these files then YAML parsing does not happening. This means that it would be possible to warm these caches prior to a cache clear on production as FileCaches are not cleared during a cache clear.

I've checked and the following stuff uses FileCache.

  • Service yaml parsing
  • Route yaml parsing
  • Info.yaml parsing
  • Anything that uses \Drupal\Component\Discovery\YamlDiscovery (so plugins like permissions, actions etc..)

Note that config does not use FileCache - but this is okay because config is not read from files anymore during runtime - it is read for the config table.

Therefore I think the use-case to support PECL Yaml is tenuous and the complexity is not worth it. Plus it makes adoption of new features from Symfony Yaml (which is under more active development) harder.

andypost’s picture

Absence of file cache for config is separate issue, which is annoying only installing from config

alexpott’s picture

In order to not read Yaml during multiple cache rebuild from the UI with APCu installed and using standard we need to land both:
#3414825: InfoParser should use FileCache
#3414807: \Drupal\Core\Asset\LibraryDiscoveryParser should use FileCache to cache the parsed YAML

Wrt to #3389567: Drupal\Core\Config\FileStorage::getAllCollectionNamesHelper should use some caching - I think it is wrong to use file cache for something that will only ever be read once.

wim leers’s picture

@alexpott Very interesting. Reviewed both of those issues 🏓

catch’s picture

Given we've run into issues more than a couple of times, I think we should go ahead here once we're confident that filecache is being used in all the places it should be.

ghost of drupal past’s picture

I really, really, really do not want to get involved in decision making but it seems this issue focuses on build time decoding.

However, SerializationInterface::encode can and is used run time. Dropping support for the vastly faster PECL emitter here is nothing short of disastrous. Please keep it. For my use case, keeping only YamlPecl::encode and throwing a not supported in YamlPecl::decode is a perfectly valid solution, cutting the maintenance burden down to practically nothing while keeping run time performance high. Thanks for your consideration.

Edit: oh, and there's no need for the proxy class either here. Just please keep a serialization.yaml_pecl service or some such. Thanks again.

alexpott’s picture

@Ghost of Drupal Past FWIW \Drupal\Component\Serialization\Yaml::encode enforces all Yaml encoding via Symfony so if you are using either \Drupal\Component\Serialization\Yaml or \Drupal\Core\Serialization\Yaml then encoding is via the slower Symfony already. And if you want to use PECL for encoding you'd have to write your own serializer already...

andypost’s picture

Serialisation is weakly used even for igbinary it needs tons of hacks ala #2836091: Swap serialization in DatabaseBackendFactory

Having parts overridable out of box is good to have, speed of parsing by pecl still beats all usetland implementations, so sensible places can use SF implementation (config export) but the rest can have ability to choose

alexpott’s picture

@andypost the drivers behind this are the we have needs for Symfony features on decode elsewhere see #3108309: Support Yaml::PARSE_CUSTOM_TAGS in \Drupal\Component\Serialization\YamlSymfony::decode for example. I think that once we have file cache for all the Yaml decoding in core there is no reason to automatically swap to PECL YAML for decoding (if it is available) and if we don't do that then we can consider deprecating the YamlPecl class because we are not using it in core. The deprecation can be a follow-up issue we people can argue whether or not we should keep it.

alexpott’s picture

Status: Needs work » Needs review

Pushed an MR that disable the automatic switching to YamlPecl. FWIW we already know that this is not going to result in a massive slow down of test because of @Wim Leers excellent research above AND the fact that PECL YAML was not installed on PHP 8 for ages and when it was we didn't see a massive speed up.

Doing this issue will also allow us to close :

kim.pepper’s picture

Left a couple of comments.

alexpott’s picture

Thanks for the review @kim.pepper

needs-review-queue-bot’s picture

Status: Needs review » Needs work
StatusFileSize
new5.17 KB

The Needs Review Queue Bot tested this issue. It fails the Drupal core commit checks. Therefore, this issue status is now "Needs work".

This does not mean that the patch necessarily needs to be re-rolled or the MR rebased. Read the Issue Summary, the issue tags and the latest discussion here to determine what needs to be done.

Consult the Drupal Contributor Guide to find step-by-step guides for working with issues.

alexpott’s picture

Status: Needs work » Needs review
wim leers’s picture

Found only nits, so keeping at Needs review. I think this kind of low-level change with broad impact should be reviewed by multiple people, so not yet RTBC'ing.

needs-review-queue-bot’s picture

Status: Needs review » Needs work
StatusFileSize
new90 bytes

The Needs Review Queue Bot tested this issue. It no longer applies to Drupal core. Therefore, this issue status is now "Needs work".

This does not mean that the patch necessarily needs to be re-rolled or the MR rebased. Read the Issue Summary, the issue tags and the latest discussion here to determine what needs to be done.

Consult the Drupal Contributor Guide to find step-by-step guides for working with issues.

longwave’s picture

MR needs rebasing, has some unrelated commits in it.

Added some questions to the MR.

kim.pepper’s picture

Issue summary: View changes

Updated the IS with the decisions made in the comments.

kim.pepper’s picture

Title: Drop PECL YAML library support in favor of only Symfony's YAML? » Drop PECL YAML library support in favor of only Symfony YAML
kim.pepper’s picture

Issue summary: View changes
kim.pepper’s picture

Issue summary: View changes

Updated the CR

alexpott’s picture

Status: Needs work » Needs review

Addressed @kim.pepper's review - thanks for all the issue management @kim.pepper.

andypost’s picture

Added 2 more comments

andypost’s picture

rebased to remove merged-in commits

andypost’s picture

@alexpott looks like deprecation with class_alias is shorter then common

andypost’s picture

As pipeline showing \Drupal\Core\Serialization\Yaml still has some usage in core, so maybe needs follow-up to deprecate it too

- https://git.drupalcode.org/issue/drupal-3205480/-/pipelines/80716
- https://git.drupalcode.org/issue/drupal-3205480/-/pipelines/80720

smustgrave’s picture

There a recommended way for testing this one?

alexpott’s picture

@smustgrave not really - Yaml reading is such an integral part of core that this code is called 1000s of time per test run. It's tested - the thing to check is the deprecations and does everything make sense.

smustgrave’s picture

Status: Needs review » Reviewed & tested by the community

Based on that nothing appears to have broken as tests are all green

Verified the deprecation has a working/correct link to CR 3415489

First time seeing an entire class deprecated but makes sense.

I see test coverage for the deprecations too.

LGTM!

catch’s picture

Status: Reviewed & tested by the community » Needs review

#3205480-32: Drop PECL YAML library support in favor of only Symfony YAML still needs an answer for me - are there any places we aren't using filecache but should be? I think @alexpott mentioned config schema in slack.

smustgrave’s picture

That's my mistake for jumping the gun.

wim leers’s picture

Agreed with @catch, but the concrete example of config schema discovery is already okay, I think?

  • \Drupal\Core\Config\Schema\ConfigSchemaDiscovery uses config.storage.schema which is \Drupal\Core\Config\ExtensionInstallStorage
  • \Drupal\Core\Config\ExtensionInstallStorage extends \Drupal\Core\Config\InstallStorage which extends \Drupal\Core\Config\FileStorage
  • \Drupal\Core\Config\FileStorage has been using FileCache since #2473179: Use FileCache for config storage

There's many layers here, so I'd like someone to double-check 😅

alexpott’s picture

Re #60 the file cache usage in \Drupal\Core\Config\FileStorage is only in memory storage and not APCu cache. So if we read the schema twice in a request then we'd benefit - but not after a cache clear.

I think this is okay as we don't actually read schema during a cache clear or regular use of your site. But I also think we should open an issue to add schema to the APCu cache.

alexpott’s picture

wim leers’s picture

Status: Needs review » Reviewed & tested by the community

AFAICT that means we can RTBC this.

I re-reviewed the entire MR. I have only one nit: the use of assertEquals() where we should be able to use assertSame(). Not commit-blocking though.

longwave’s picture

Status: Reviewed & tested by the community » Needs work

Couple of nits on the MR.

alexpott’s picture

Status: Needs work » Reviewed & tested by the community

Addressed @longwave's feedback as it was minor nits putting back to rtbc.

  • catch committed 47d03890 on 10.3.x
    Issue #3205480 by alexpott, andypost, Wim Leers, catch, kim.pepper,...
catch’s picture

Version: 11.x-dev » 10.3.x-dev
Status: Reviewed & tested by the community » Fixed

This looks good now and unblocks some other issues.

Committed/pushed to 11.x and cherry-picked to 10.3.x, thanks!

wim leers’s picture

  • catch committed 0efb1513 on 11.x
    Issue #3205480 by alexpott, andypost, Wim Leers, catch, kim.pepper,...
kim.pepper’s picture

Published the CR

neclimdul’s picture

oh... sucks we don't have a standards compliant YAML parser anymore.

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.

rsaddington’s picture

Howdy,

We've been working with a customer who is experiencing issues with the new (and slower) Symfony YAML parser.

When importing large config files they are seeing things 4x slower in Drupal 10.3 vs. 10.2

Is there a way we can continue to make the old php::yaml_parser available in Drupal 10.3?

Perhaps as a config option or as a settings directive?

donquixote’s picture

Is there any policy on prefering Drupal\Component\Serialization\Yaml over Drupal\Core\Serialization\Yaml?

Currently I see both used in Drupal core to equal amount:
- 38 matches for "use Drupal\\Core\\Serialization\\Yaml;"
- 34 matches for "use Drupal\\Component\\Serialization\\Yaml;"
- (All of lib/Drupal/Component uses Drupal\\Core\\Serialization\\Yaml, as it should.)

To me it seems the most sensible to standardize on Component Yaml, unless we have plans to replace the simple class_alias() in the future, to insert additional functionality in Core Yaml. But even then we would get mixed results, if the choice between the two is a bit random.

EDIT: The reason I ask is this:
I remember some code quality tool telling me to only use Component Yaml.
But my IDE autocomplete keeps putting Core Yaml at the top, possibly because it is used so often in core.