Problem/Motivation

From the DrupalCon Barcelona Hard Problems Meeting on performance:

  1. ImageStyle/derivative stuff aggregate generation - remove state gets, file_exists(), cold cache memory/cpu
  2. Cron subscriber: state get/config get on EVERY request, we can only get rid of the config get by moving it into a module; we’ll still need a config get, but then you can at least avoid the config get if you uninstall that module https://www.drupal.org/node/2507031
    • Alex: can’t we cache that?
    • Berdir: No, wasn’t able to do so even after 6 months of trying. I used a cache collector, but that didn’t work out.
    • Patch: https://www.drupal.org/node/2575105

Proposed resolution

The state service / \Drupal\Core\State\State extends \Drupal\Core\Cache\CacheCollector. This results in less queries to the database as a single cache entry stores commonly accessed state items.

Most code will not need to make any changes.

NOTE: due to a mistake proper discussion of the issue takes place in #2 -> #35 and #65 and after.

Remaining tasks

Decide on approach. See #136 and #137

User interface changes

None.

API changes

\Drupal\Core\State\State::__construct now requires a cache backend and the lock service to be injected. Support for calling this without those services is removed in Drupal 10.

Data model changes

None.

Release notes snippet

A new $settings['state_cache'] setting has been added, which should be set to TRUE to improve performance, but can be set to FALSE if any modules are storing large amounts of data or frequently changing data in the state store. In Drupal 11, the setting will be removed and the state cache will be permanently enabled. See the change record for more information.

CommentFileSizeAuthor
#144 interdiff-133-142.txt800 bytesquietone
#135 2575105-133.patch18.06 KBquietone
#135 interdiff-133-135.txt1.33 KBquietone
#133 2575105-133.patch18.06 KBberdir
#131 2575105-2363351-combined-131-interdiff.txt1.79 KBberdir
#131 2575105-2363351-combined-131.patch21.17 KBberdir
#128 drupal-core-init-cache_default_bin_backends-2363351-88-interdiff.txt1.79 KBberdir
#128 drupal-core-init-cache_default_bin_backends-2363351-88.patch19.38 KBberdir
#127 2575105-2363351-combined-123.patch17.6 KBberdir
#123 2575105-123.patch14.29 KBberdir
#121 2575105-121.patch14.18 KBberdir
#120 2575105-121-is-106-with-no-test.patch8.9 KBmathilde_dumond
#119 2575105-119-is-106-with-no-tests.patch0 bytesmathilde_dumond
#117 2575105-117.patch14.18 KBberdir
#116 2575105-116-d10.0.patch14.21 KBberdir
#114 2575105-114.patch14.12 KBberdir
#113 interdiff_112-113.txt2.34 KBpradhumanjain2311
#113 2575105-113.patch14.12 KBpradhumanjain2311
#112 2575105-112.patch16.77 KBberdir
#106 2575105-106.patch16.79 KBberdir
#101 2575105-101-interdiff.txt4.44 KBberdir
#101 2575105-101.patch16.82 KBberdir
#92 2575105-92-interdiff.txt987 bytesberdir
#92 2575105-92.patch16.25 KBberdir
#89 2575105-89-interdiff.txt1.49 KBberdir
#89 2575105-89.patch16.27 KBberdir
#87 2575105-87-interdiff.txt5.31 KBberdir
#87 2575105-87.patch15.79 KBberdir
#81 2575105-81.patch13.83 KBalexpott
#81 76-81-interdiff.txt4.27 KBalexpott
#76 2575105-state-cache-76-interdiff.txt449 bytesberdir
#76 2575105-state-cache-76.patch12.99 KBberdir
#71 2575105-state-cache-71-8.9-do-not-test.patch11.84 KBberdir
#70 2575105-state-cache-70-interdiff.txt2.63 KBberdir
#70 2575105-state-cache-70.patch12.99 KBberdir
#70 2575105-state-cache-70-test.patch12.99 KBberdir
#69 2575105-state-cache-69-interdiff.txt1.81 KBberdir
#69 2575105-state-cache-69.patch10.35 KBberdir
#67 2575105-state-cache-67-interdiff.txt458 bytesberdir
#67 2575105-state-cache-67.patch9.16 KBberdir
#66 2575105-state-cache-66.patch9.17 KBberdir
#65 2575105-state-cache-65.patch8.73 KBberdir
#55 2575105-55.patch716 bytesalexpott
#49 2575105-49-revert.patch10.43 KBamateescu
#44 2575105-revert.patch10.45 KBcatch
#44 2575105-revert.patch4.33 KBcatch
#43 2575105-revert.patch6.13 KBcatch
#40 2575105-revert.patch10.26 KBcatch
#30 cache-collector-state-2575105-28.patch10.32 KBberdir
#28 cache-collector-state-2575105-28.patch10.32 KBberdir
#13 cache-collector-state-2575105-13.patch10.32 KBberdir
#10 cache-collector-state-2575105-10-interdiff.txt1.27 KBberdir
#10 cache-collector-state-2575105-10.patch11.19 KBberdir
#4 cache-collector-state-2575105-4-interdiff.txt4.75 KBberdir
#4 cache-collector-state-2575105-4.patch10.19 KBberdir
#2 cache-collector-state-2575105-1.patch5.44 KBberdir

Issue fork drupal-2575105

Command icon Show commands

Start within a Git clone of the project using the version control instructions.

Or, if you do not have SSH keys set up on git.drupalcode.org:

Comments

Berdir created an issue. See original summary.

berdir’s picture

Status: Active » Needs review
StatusFileSize
new5.44 KB

First patch. Manual testing is working fine, lets see how many tests are failing.

Tests *will* fail due to stale caches in the parent site.

Status: Needs review » Needs work

The last submitted patch, 2: cache-collector-state-2575105-1.patch, failed testing.

berdir’s picture

Status: Needs work » Needs review
StatusFileSize
new10.19 KB
new4.75 KB

That's not a lot of fails and only within tests in almost all cases.

Weird, almost makes me wonder if this is actually working ;)

The last submitted patch, 2: cache-collector-state-2575105-1.patch, failed testing.

berdir’s picture

Issue tags: +needs profiling

It is working, the reason we don't have test fails is because we already have a static cache that we invalidate after every web request.

The race condition that I was fighting with a long time ago in the original issue seems to be gone, yay. To be sure, I'll do 10 or so test runs after the sprints are over.

I did some profiling, but the absolute numbers don't really change that much, due to mysql cache and so on. But we are trading 5 db queries on the frontpage for uid 1 with one call to apc (I'm using the bootstrap bin now, not 100% sure about that).

The risk of course is that we end up with a lot of data in the cache collector that we don't really need on most requests, especially if state is used incorrectly.

We already have quite a few big variables in there, like the admin route list (which is ~80 with just core and 200+ on my site) and the theme extension objects.

I also found #2154461: Consider persistent caching and/or CacheCollector for state system again, where we thought about using the cache separately for each item. With bootstrap/fast chained, that might also be a valid option and less risky if state is mis-used. Thoughts?

berdir’s picture

Issue tags: -needs profiling

I did the profiling (as I wrote, not much to see, so no stats)

wim leers’s picture

Priority: Normal » Major
Issue tags: +D8 cacheability, +DrupalCon Barcelona 2015

Looks wonderfully simple!

  1. +++ b/core/core.services.yml
    @@ -397,7 +397,9 @@ services:
    +    arguments: ['@keyvalue', '@cache.bootstrap', '@lock']
    

    Is some of the State used even on non-HTML responses? If it is, then I think @cache.bootstrap is fine. Otherwise, I think introducing @cache.state may make sense?

  2. +++ b/core/lib/Drupal/Core/EventSubscriber/KernelDestructionSubscriber.php
    @@ -64,7 +64,7 @@ public function onKernelTerminate(PostResponseEvent $event) {
    -    $events[KernelEvents::TERMINATE][] = array('onKernelTerminate', 100);
    +    $events[KernelEvents::TERMINATE][] = array('onKernelTerminate', -100);
    

    Interesting. Let's document why.

  3. +++ b/core/lib/Drupal/Core/State/State.php
    @@ -22,19 +25,17 @@ class State implements StateInterface {
    -   * Constructs a State object.
    +   * Constructs an State object.
    

    I think the original was correct.

wim leers’s picture

I also found #2154461: Consider persistent caching and/or CacheCollector for state system again, where we thought about using the cache separately for each item. With bootstrap/fast chained, that might also be a valid option and less risky if state is mis-used. Thoughts?

So over there, this was said:

  1. Add a core state storage implementation that uses the cache API to cache each individual query, and default to it.
  2. Add an implementation that uses CacheCollector so that these end up as a single query.

The second is tempting, but it does have the potential disadvantage that rarely used state keys could pollute the cache for all pages - for example cron timings, deleted fields and other stuff that's rarely accessed but in the same system.

It seems like the potential for damage is relatively big. If even a single contrib/custom module creates big State entries (think hundreds of kilobytes or maybe even megabytes), then every page is slowed down massively. With the chained inconsistent consistent cache backend that uses APCu, it seems like it may be better to have multiple small I/Os instead.

berdir’s picture

Fixed the nitpicks.

On separate cache entries vs. cache collector and which cache bin to use, I discussed that with different people including @alex.pott, @Wim Leers and @dawehner.

The thing is that it is a) easy to use a different cache bin simply by changing service arguments and b) also possible to write a different implementation and use that instead.

We want to optimize for the 90% use case, where we assume that people will end up using really problematic modules since contrib modules should hopefully get fix and improved by someone if they do it wrong and many might also not have APCu. It's also possible to do it badly with separate cache entries, if you don't have APCu and are doing many different state get calls per page.

Therefore, the decision is to use the cache collector approach with the bootstrap cache bin, since even fast chained has a single cache get to check if the cache isn't stale, so re-using that makes sense.

We will look into moving some parts away from state even, e.g. the non admin routes since that is now cached and not needed on every or even many requests, but that's a different issue.

Status: Needs review » Needs work

The last submitted patch, 10: cache-collector-state-2575105-10.patch, failed testing.

wim leers’s picture

+++ b/core/tests/Drupal/Tests/Core/Render/RendererBubblingTest.php
--- a/index.php
+++ b/index.php

+++ b/index.php
+++ b/index.php
@@ -8,6 +8,11 @@

@@ -8,6 +8,11 @@
  * See COPYRIGHT.txt and LICENSE.txt files in the "core" directory.
  */
 
+$uprofiler_path = '/var/www/html';
+include_once $uprofiler_path . '/uprofiler_lib/utils/uprofiler_lib.php';
+include_once $uprofiler_path . '/uprofiler_lib/utils/uprofiler_runs.php';
+uprofiler_enable(UPROFILER_FLAGS_NO_BUILTINS + UPROFILER_FLAGS_MEMORY);
+
 use Drupal\Core\DrupalKernel;
 use Symfony\Component\HttpFoundation\Request;
 
@@ -20,3 +25,9 @@

@@ -20,3 +25,9 @@
 $response->send();
 
 $kernel->terminate($request, $response);
+
+$uprofiler_data = uprofiler_disable();
+$uprofiler_runs = new uprofilerRuns_Default();
+$namespace = 'd8';
+$id = $uprofiler_runs->save_run($uprofiler_data, $namespace);
+print "<a href='http://localhost/uprofiler_html/?run=$id&sort=excl_wt&source=$namespace'>uprofiler</a>";
 

This is why it failed :)

Once that's fixed, this is ready.

berdir’s picture

Status: Needs work » Needs review
StatusFileSize
new10.32 KB

...

wim leers’s picture

Issue summary: View changes
Status: Needs review » Reviewed & tested by the community

Improved the IS.

Regarding profiling: see #6.

wim leers’s picture

Issue summary: View changes
berdir’s picture

To provide some more numbers, I also checked a production site in new relic, and over the last 7 days, new relic reports ~10 queries to key_value on a node page, most if not all (on that version) are state, and one query takes about 0.7ms. It also reports ~1200 queries/minute against that table, so most of them should go away with caching this.

The last submitted patch, 10: cache-collector-state-2575105-10.patch, failed testing.

Status: Reviewed & tested by the community » Needs work

The last submitted patch, 13: cache-collector-state-2575105-13.patch, failed testing.

wim leers’s picture

Status: Needs work » Reviewed & tested by the community
GET http://ec2-52-89-208-32.us-west-2.compute.amazonaws.com/checkout/user/login returned 0 (0 bytes).

I asked testbot 3083 to be killed.

catch’s picture

Status: Reviewed & tested by the community » Needs review

We're still using state in https://api.drupal.org/api/drupal/core!modules!field!field.purge.inc/fun...

I'd be happier if we moved that to k/v first, or at least made an explicit decision that deleted fields in the bootstrap bin is OK.

wim leers’s picture

Status: Needs review » Reviewed & tested by the community

So, catch, I think you're saying that the rationale given in #10 by Berdir is insufficiently convincing?

We want to optimize for the 90% use case, where we assume that people will end up using really problematic modules since contrib modules should hopefully get fix and improved by someone if they do it wrong […]

Therefore, the decision is to use the cache collector approach with the bootstrap cache bin, […]

We will look into moving some parts away from state even, e.g. the non admin routes since that is now cached and not needed on every or even many requests, but that's a different issue.

i.e. you feel that the "deleted fields state" must be fixed before this can go in, because it could pollute things too much. But like Berdir said in #10 already, we already have the routing.non_admin_routes state entry, which we should also remove.

Deleting fields feels like a relative edge case, and can be refactored/improved independently of this issue. Furthermore, if I read https://api.drupal.org/api/drupal/core%21modules%21field%21field.purge.i... correctly, then it looks like this can only ever temporarily "polluate" state: while the field is being purged. So it's also self-recovering. Which — if correct — is another reason to do that in a separate issue that this issue does not need to be blocked on?

This inclines me to RTBC it again. Feel free to re-un-RTBC — I may be missing something :)

catch’s picture

Status: Reviewed & tested by the community » Needs review

@Wim I think it's convincing for contrib, but only if core sets a good example as to what's in state vs. not.

We could remove the caching of non-admin routes and just get it from this cache instead.

Version: 8.0.x-dev » 8.1.x-dev

Drupal 8.0.6 was released on April 6 and is the final bugfix release for the Drupal 8.0.x series. Drupal 8.0.x will not receive any further development aside from security fixes. Drupal 8.1.0-rc1 is now available and sites should prepare to update to 8.1.0.

Bug reports should be targeted against the 8.1.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.2.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

fabianx’s picture

I have a way to resolve this issue in a BC-compatible way with clear expecations set for contrib:

- Create a decorator, which decorates the state service (instead of changing it)
- Call this decorator state.cached
- Use state.cached instead of state for core services that only store small things

Profit!

fabianx’s picture

Issue summary: View changes
Status: Needs review » Needs work

Version: 8.1.x-dev » 8.2.x-dev

Drupal 8.1.9 was released on September 7 and is the final bugfix release for the Drupal 8.1.x series. Drupal 8.1.x will not receive any further development aside from security fixes. Drupal 8.2.0-rc1 is now available and sites should prepare to upgrade to 8.2.0.

Bug reports should be targeted against the 8.2.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.3.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

The last submitted patch, 13: cache-collector-state-2575105-13.patch, failed testing.

berdir’s picture

Status: Needs work » Needs review
StatusFileSize
new10.32 KB

Here's a reroll for now. Will try the separate service idea but that will result in a *way* larger patch, with more conflict potentation.

Status: Needs review » Needs work

The last submitted patch, 28: cache-collector-state-2575105-28.patch, failed testing.

berdir’s picture

Version: 8.2.x-dev » 8.3.x-dev
StatusFileSize
new10.32 KB

Lets try again for 8.3.x

berdir’s picture

Status: Needs work » Needs review

Status: Needs review » Needs work

The last submitted patch, 30: cache-collector-state-2575105-28.patch, failed testing.

fabianx’s picture

Nice work! :)

berdir’s picture

I've been thinking about that state.cached idea, but I'm not convinced. How do we keep those services in sync? What if someone writes a key directly through state and then we don't see that in state.cache and don't invalidate the cache?

I was thinking if we could maybe instead introduce a setting (container parameter?) to enable or disable caching completey.

That said, I looked at some of our sites and it looks all pretty OK to me. What's interesting is that the by far greatest key in there is locale.translation_status. And that's a list of information for each project and it's only something that's needed in the backend while checking/updating locale data. And we save it per project. So it seems like it should be a separate key value collection IMHO. I'll open an issue for that. Not sure if that is an API or not, but IMHO not. There are API functions to get the data.

The total lenght of all keys is 144340 on one project, with 37 keys, the biggest are:

mysql> select name, length(value) from key_value where collection = 'state' order by length(value) desc;
+---------------------------------------+---------------+
| name                                  | length(value) |
+---------------------------------------+---------------+
| locale.translation_status             |         61375 |
| system.theme.data                     |         24406 |
| system.module.files                   |         21732 |
| monitoring.available_sensors          |         10033 |
| routing.non_admin_routes              |          7978 |
| system.test_mail_collector            |          5977 |
| search_api_solr.endpoint.data         |          3877 |
| views.view_route_names                |          3791 |
| router.path_roots                     |          2218 |
| system.theme.files                    |           804 |

Another project is 136532 total length, 87 keys, biggest keys:

mysql> select name, length(value) from key_value where collection = 'state' order by length(value) desc;
+-----------------------------------------------------+---------------+
| name                                                | length(value) |
+-----------------------------------------------------+---------------+
| system.module.files                                 |         24430 |
| system.theme.data                                   |         23145 |
| routing.non_admin_routes                            |         21786 |
| locale.translation_status                           |         13677 |
| monitoring.available_sensors                        |         11193 |
| system.javascript_parsed                            |          7565 |
| ****:presenter                             |          7446 |
| theme.active_theme.seven                            |          5349 |
| theme.active_theme.suedostschweiz                   |          4087 |
| search_api_solr.endpoint.data                       |          3622 |
| views.view_route_names                              |          2881 |
| router.path_roots                                   |          2653 |
berdir’s picture

#2834580: Move locale.translation_status state key to a separate key/value collection

Just because the 2-3 projects that I checked don't have big data in there of course doesn't mean that there are no sites that do. We could make my suggested setting opt-in or even add a requirements hook or so if it uses the default backend to warn about big data in it.

Version: 8.3.x-dev » 8.4.x-dev

Drupal 8.3.0-alpha1 will be released the week of January 30, 2017, which means new developments and disruptive changes should now be targeted against the 8.4.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.4.x-dev » 8.5.x-dev

Drupal 8.4.0-alpha1 will be released the week of July 31, 2017, which means new developments and disruptive changes should now be targeted against the 8.5.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.5.x-dev » 8.6.x-dev

Drupal 8.5.0-alpha1 will be released the week of January 17, 2018, which means new developments and disruptive changes should now be targeted against the 8.6.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

catch’s picture

Category: Task » Bug report
Priority: Major » Critical

This got committed by mistake here:

commit 714799324429f301188a1ff9113907728d4c1e18
Author: Alex Pott <alex.a.pott@googlemail.com>
Date:   Mon Dec 19 09:27:43 2016 +0000

    Issue #2834580 by Berdir: Move locale.translation_status state key to a separate key/value collection


But as far as I can tell, the State cache item never gets set. Moving to critical bug for a revert.

catch’s picture

Status: Needs work » Needs review
StatusFileSize
new10.26 KB

Patch doesn't reverse apply any more, here's a revert patch based on #30.

Status: Needs review » Needs work

The last submitted patch, 40: 2575105-revert.patch, failed testing. View results
- codesniffer_fixes.patch Interdiff of automated coding standards fixes only.

berdir’s picture

Hu, that's weird, wasn't aware that we did that accidently, sorry about providing a messed up patch in the other issue :)

> 18:50:39 Fatal error: Uncaught Error: Cannot access parent:: when current class scope has no parent in /var/www/html/core/lib/Drupal/Core/State/State.php:101

catch’s picture

Status: Needs work » Needs review
StatusFileSize
new6.13 KB

One line wasn't fully reverted.

catch’s picture

StatusFileSize
new4.33 KB
new10.45 KB

Better when you create the patch properly too.

The last submitted patch, 43: 2575105-revert.patch, failed testing. View results
- codesniffer_fixes.patch Interdiff of automated coding standards fixes only.

The last submitted patch, 44: 2575105-revert.patch, failed testing. View results
- codesniffer_fixes.patch Interdiff of automated coding standards fixes only.

Status: Needs review » Needs work

The last submitted patch, 44: 2575105-revert.patch, failed testing. View results
- codesniffer_fixes.patch Interdiff of automated coding standards fixes only.

Version: 8.6.x-dev » 8.7.x-dev

Drupal 8.6.0-alpha1 will be released the week of July 16, 2018, which means new developments and disruptive changes should now be targeted against the 8.7.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

amateescu’s picture

Status: Needs work » Needs review
StatusFileSize
new10.43 KB

This patch is causing bugs like #3007827: The state cache is not cleared when setting a state, here's a revert patch created manually.

amateescu’s picture

Status: Needs review » Reviewed & tested by the community

The revert works fine, let's get it in so we can continue working on the original patch.

  • catch committed 7fcabc7 on 8.7.x
    Issue #2575105 by Berdir, catch, amateescu: Use cache collector for...

  • catch committed cc29449 on 8.6.x
    Issue #2575105 by Berdir, catch, amateescu: Use cache collector for...
catch’s picture

Status: Reviewed & tested by the community » Needs work

Committed/pushed the revert to 8.7.x and 8.6.x.

Back to CNW.

alexpott’s picture

The revert has broken PHP 5 testing :(

core/modules/system/tests/src/Functional/System/ShutdownFunctionsTest.php now errors with

1) Drupal\Tests\system\Functional\System\ShutdownFunctionsTest::testShutdownFunctions
null does not match expected type "array".
alexpott’s picture

Status: Needs work » Needs review
StatusFileSize
new716 bytes

This fixes it. It might be a separate issue that this revert has exposed. Before the revert there are two callbacks before the callbacks start being called. _system_test_first_shutdown_function and the lock release callback. After the revert there's only _system_test_first_shutdown_function - in PHP 5 seems to be a bug with a foreach where the array being looped over changes. See https://3v4l.org/le78R

The fix is to use while and next for a more reliable process - see https://3v4l.org/clIom

andypost’s picture

It also needs to prevent serialization

+++ b/core/lib/Drupal/Core/State/State.php
@@ -19,18 +16,20 @@ class State extends CacheCollector implements StateInterface {
+  protected $cache = [];

it brings state here, so needs __sleep()/__wakeup() to prevent static cache from serialization

  • xjm committed 8861a54 on 8.7.x
    Issue #2575105 by alexpott: hotfix for PHP 5 test failures in...

  • xjm committed 29aaab0 on 8.6.x
    Issue #2575105 by alexpott: hotfix for PHP 5 test failures in...
xjm’s picture

Baffled as to how the recent commit exposed this, but since HEAD is busted, I just went ahead and committed #55. Leaving this open for #56 and for followup investigation or a proper fix since the patch in #55 changes code that hasn't been touched in a year. Maybe it should be opened as a separate issue?

alexpott’s picture

Category: Bug report » Task
Priority: Critical » Major

@xjm yes it it baffling. I think it's another thing we broke when we converted code for the each() deprecation. But the break was hidden because other changes including this one meant that we had more than just the _system_test_first_shutdown_function in $callbacks. This meant when _system_test_first_shutdown_function adds yet another callback we weren't at the end of the loop and the test didn't break. When this issue was reverted it became the only callback and therefore it broke as per https://3v4l.org/le78R

I ponder given the random set of things that has happened to this issue if we should close it and start with a fresh issue.

Since we're now back to where we were before this got erroneously committed as part of another issue (see #39) setting the issue metadata back to where it is.

alexpott’s picture

I've opened #3008140: Improve shutdown function testing to add test coverage so there's additional test coverage so we don't get caught out by #55 again.

Version: 8.7.x-dev » 8.8.x-dev

Drupal 8.7.0-alpha1 will be released the week of March 11, 2019, which means new developments and disruptive changes should now be targeted against the 8.8.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.8.x-dev » 8.9.x-dev

Drupal 8.8.0-alpha1 will be released the week of October 14th, 2019, which means new developments and disruptive changes should now be targeted against the 8.9.x-dev branch. (Any changes to 8.9.x will also be committed to 9.0.x in preparation for Drupal 9’s release, but some changes like significant feature additions will be deferred to 9.1.x.). For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

Version: 8.9.x-dev » 9.1.x-dev

Drupal 8.9.0-beta1 was released on March 20, 2020. 8.9.x is the final, long-term support (LTS) minor release of Drupal 8, which means new developments and disruptive changes should now be targeted against the 9.1.x-dev branch. For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

berdir’s picture

StatusFileSize
new8.73 KB

Lets see if we can revive this. Maybe we should do a new issue as this is indeed pretty confusing with the reverts and bugfixes being committed there.

berdir’s picture

StatusFileSize
new9.17 KB

Adding BC.

berdir’s picture

I suspect that the CLI problems reported by #3007827: The state cache is not cleared when setting a state are related to #2272685: Incorrect handling of invalid cache items in apcu backend. cache collector is the only thing in core to actually rely on cache invalidation and the existing ones probably don't use a fast chained cache backend. And cache invalidation with the apcu backend is a mess, to put it carefully.

As a workaround, switching to a different cache backend that doesn't use fast chained. means an extra cache get from the database/redis/..., but state is fairly frequently invalidated, compared to other bootstrap things, so might actually not be a bad idea anyway.

Status: Needs review » Needs work

The last submitted patch, 67: 2575105-state-cache-67.patch, failed testing. View results

berdir’s picture

Status: Needs work » Needs review
StatusFileSize
new10.35 KB
new1.81 KB

Well, that is stupid. I forgot the call to persist() even though I likely wrote that documentation myself that you have to do that (although I don't remember why we don't do it by default. I was 7 years ago...).

While the apcu thing didn't help, this is the real problem. I'll see if I can somehow write a better test for this.

berdir’s picture

Here is a unit test that ensures that you can update values that already are in the cache. I realized that there are two conditions for this bug to happen that might be common during cron jobs but not so much in tests:

a) The value must already be in the cache, so that we don't trigger \Drupal\Core\State\State::resolveCacheMiss() which would mark it for writing, whether or not it changes during the request
b) At least one other miss/set/delete must happen in the same request, otherwise \Drupal\Core\Cache\CacheCollector::$keysToPersist is empty, the cache had been invalidated, nothing is written and it will build it again from scratch on the next request.

I'm testing both cases for b), to also make sure that the cache is updated if nothing else changes.

Attaching a version of the unit test with commented out persist to show that it catches the bug and an interdiff with just the test.

berdir’s picture

A patch for Drupal 8.9.

The last submitted patch, 70: 2575105-state-cache-70-test.patch, failed testing. View results
- codesniffer_fixes.patch Interdiff of automated coding standards fixes only.

The last submitted patch, 70: 2575105-state-cache-70.patch, failed testing. View results
- codesniffer_fixes.patch Interdiff of automated coding standards fixes only.

berdir’s picture

Status: Needs review » Needs work
berdir’s picture

+++ b/core/tests/Drupal/Tests/Core/State/StateTest.php
@@ -0,0 +1,71 @@
+
+namespace Drupal\Tests\Core\Site;
+

namespace here is wrong.

berdir’s picture

Status: Needs work » Needs review
StatusFileSize
new12.99 KB
new449 bytes

Fixed that.

alexpott’s picture

  1. +++ b/core/lib/Drupal/Core/EventSubscriber/KernelDestructionSubscriber.php
    @@ -60,7 +60,10 @@ public function onKernelTerminate(TerminateEvent $event) {
    -    $events[KernelEvents::TERMINATE][] = ['onKernelTerminate', 100];
    +    // Run this subscriber after others as those might use services that need
    +    // to be terminated as well or run code that needs to run before
    +    // termination.
    +    $events[KernelEvents::TERMINATE][] = ['onKernelTerminate', -100];
    

    This change feels quite impactful. Is there not an argument for adding another tag - something like needs late destruction

    I worry that something is contrib is expecting to come after this. Like will this break supercache? http://grep.xnddx.ru/search?text=onKernelTerminate&filename=

  2. +++ b/core/lib/Drupal/Core/State/State.php
    @@ -16,20 +19,26 @@ class State implements StateInterface {
    +      @trigger_error('The cache.bootstrap service must be passed to State::__construct(), it is required before drupal:10.0.0.', E_USER_DEPRECATED);
    ...
    +      @trigger_error('The lock service must be passed to State::__construct(), it is required before drupal:10.0.0.', E_USER_DEPRECATED);
    

    The format for these is @trigger_error('Calling ' . __METHOD__ . '() without the $image_factory argument is deprecated in drupal:9.1.0 and is required in drupal:10.0.0. See https://www.drupal.org/node/3173719', E_USER_DEPRECATED);

Version: 9.1.x-dev » 9.2.x-dev

Drupal 9.1.0-alpha1 will be released the week of October 19, 2020, which means new developments and disruptive changes should now be targeted for the 9.2.x-dev branch. For more information see the Drupal 9 minor version schedule and the Allowed changes during the Drupal 9 release cycle.

berdir’s picture

1. Trying to track down why exactly I made the change and -100, but it was in the first patch already, I guess I didn't pick -100 for a reason. You're right, this is a tricky change.

That said, your search term only finds methods that happen to have the same name, but there are plenty of other subscribers if you search for http://grep.xnddx.ru/search?text=KernelEvents%3A%3ATERMINATE&filename=. I would assume that most of those don't have dependencies between themself, but who knows.

instead of a second tag, maybe an attribute on the tag could work too.

I did debug again why I did this and the reason is \Drupal\automated_cron\EventSubscriber\AutomatedCron, that also has a prio of 100. So we could also "fix" at least core by setting automated cron to 101. The question is what's more likely to break?

a) Something not fully working because it calls state *after* \Drupal\Core\EventSubscriber\KernelDestructionSubscriber and then at least caching of that is messed up (it would never end up in the cache if only called there)
b) Something has to be called *after* this or another destruct event.

I'm honestly unsure. I think a) probably has a smaller impact, if something later on indeed uses state, worst case is that it won't benefit from the new cache. But it might be hard to improve that then in the affected custom/contrib code. b) has higher risk of breaking something but more "potential" improvement.

Given that, for minimal BC impact, we could set automated cron to 101 and discuss moving the kernel destruct to later by default or offering a late destruct option in a follow-up?

alexpott’s picture

I think changing automated cron to 101 is a good idea. And opening the follow-up to discuss late destruction / changing the priority.

alexpott’s picture

StatusFileSize
new4.27 KB
new13.83 KB

Patch addresses #80, #79 and #77.

alexpott’s picture

Issue summary: View changes

Created the change record https://www.drupal.org/node/3177901 and updated issue summary.

berdir’s picture

One thing that we haven't discussed anymore is the risk of possibly very large state items. Before that was a minor issue as you only had to load it when you used it. But now it will sooner or later end up the cache and we have to load it on every page.

I guess at a minimum, we should document in the change record that state should be used only for smaller stuff. The issue summary originally used a state.cached service, so the collector usage would be fully optional, but that would require additional changes and at least writes to regular state would also need to invalidate the cache.

One alternative idea I just had was to skip caching large entries automatically, we'd just ignore them and don't call persist() on them, just pass them through. Basically a strlen() > 10/100kb or so? I think the overhead of that would be fairly minimal.

alexpott’s picture

One alternative idea I just had was to skip caching large entries automatically, we'd just ignore them and don't call persist() on them, just pass them through. Basically a strlen() > 10/100kb or so? I think the overhead of that would be fairly minimal.

Seems like a good idea. Maybe this could be baked into the cache collector pattern? Like isn't this true for all use-cases?

andypost’s picture

daffie’s picture

@Berdir said:

One alternative idea I just had was to skip caching large entries automatically, we'd just ignore them and don't call persist() on them, just pass them through. Basically a strlen() > 10/100kb or so? I think the overhead of that would be fairly minimal.

@alexpott responded:

Seems like a good idea. Maybe this could be baked into the cache collector pattern? Like isn't this true for all use-cases?

berdir’s picture

Status: Needs work » Needs review
StatusFileSize
new15.79 KB
new5.31 KB

I thought about that, but that's not really possible right now. Subclasses are responsible right now to call persist(), and within persist, we no longer know the size. At the same time, it is a bit awkward because writing the test made me realize that we can also not call invalidateCache() then. Because if an item is big *and* being set, we would invalidate the cache for no reason. IMHO also shows again that this whole thing with forcing clients to call persist just doesn't really seem like the best idea :)

So I think we need to do it in State, but then we can also set it to 10k characters whereas other implementations might want to use a higher limit (could be done through a property).

Also did some slight refactoring to have less duplication by reusing the setMultiple()/deleteMultiple methods.

Maybe create a follow-up to automatically persist() including the a max size implementation that could be set/overridden in the subclass?

berdir’s picture

Status: Needs review » Needs work

Something failed very badly :)

berdir’s picture

Status: Needs work » Needs review
StatusFileSize
new16.27 KB
new1.49 KB

huh, so you are telling me that not everything in state is a string? But that assumption worked so well in my unit test... (Never trust a unit test)

Not sure if there is a faster way to get the size of an array or object than serializing it. This means an extra serialize call which is not so great. Only on a set/cache miss, but still.

jibran’s picture

+++ b/core/core.services.yml
@@ -462,7 +462,9 @@ services:
+    arguments: ['@keyvalue', '@cache.default', '@lock']

+++ b/core/lib/Drupal/Core/State/State.php
@@ -16,20 +24,26 @@ class State implements StateInterface {
+      $cache = \Drupal::cache('discovery');

Isn't discover and default two separate bins?

berdir’s picture

Status: Needs review » Needs work

Yes, leftover, I switched over to default to avoid bugs with the apcu backend, forgot to update it for the BC fallback.

berdir’s picture

Status: Needs work » Needs review
StatusFileSize
new16.25 KB
new987 bytes

Fixed that.

jibran’s picture

Thanks, a couple of more observations:

  • Do we need BC tests here? I reckon we don't need them.
  • Do we want to collect some before and after stats here?
berdir’s picture

> Do we want to collect some before and after stats here?

We have this in production on lots of sites, some of them very large, I don't have any exact numbers, but between this and #3131585: Performance regression caused by using the last installed entity definitions you can expect that queries against key_value will basically vanish for regular requests unless you have some other module that queries key value directly. And since I fixed the cache inconsistency issue, we haven't had any issues with it.

alexpott’s picture

Not sure if there is a faster way to get the size of an array or object than serializing it. This means an extra serialize call which is not so great. Only on a set/cache miss, but still.

This bit feels really tricky. The most likely stuff to fall into this size restriction is the entity type info... here's the rows greater than 10000 on a real site I'm working on which has plenty of custom entity types:

| entity.definitions.installed                         | contact_message.field_storage_definitions                              |          8609 |
| entity.definitions.installed                         | path_alias.field_storage_definitions                                   |          9576 |
| entity.definitions.installed                         | site__referrer.field_storage_definitions                               |         11044 |
| entity.definitions.installed                         | site__request.field_storage_definitions                                |         11469 |
| entity.definitions.bundle_field_map                  | easy_email                                                             |         11526 |
| entity.definitions.installed                         | site__offer_group.field_storage_definitions                            |         11847 |
| entity.definitions.installed                         | codes_pool_collection.field_storage_definitions                        |         12234 |
| entity.definitions.installed                         | file.field_storage_definitions                                         |         12357 |
| entity.definitions.installed                         | site__region.field_storage_definitions                                 |         12797 |
| entity.definitions.installed                         | message.field_storage_definitions                                      |         14694 |
| entity.definitions.installed                         | site__store.field_storage_definitions                                  |         18551 |
| entity.definitions.installed                         | site__referee.field_storage_definitions                                |         18919 |
| entity.definitions.installed                         | block_content.field_storage_definitions                                |         20156 |
| entity.definitions.installed                         | taxonomy_term.field_storage_definitions                                |         21079 |
| entity.definitions.installed                         | site__file_upload_type.field_storage_definitions                       |         22231 |
| entity.definitions.installed                         | association_channel_map.field_storage_definitions                      |         22562 |
| entity.definitions.installed                         | site__leads.field_storage_definitions                                  |         22725 |
| entity.definitions.installed                         | comment.field_storage_definitions                                      |         23232 |
| entity.definitions.installed                         | site__banner.field_storage_definitions                                 |         25463 |
| entity.definitions.installed                         | site__offer.field_storage_definitions                                  |         27384 |
| entity.definitions.installed                         | user.field_storage_definitions                                         |         27716 |
| entity.definitions.installed                         | menu_link_content.field_storage_definitions                            |         29164 |
| entity.definitions.installed                         | site__form_submission.field_storage_definitions                        |         30170 |
| entity.definitions.installed                         | site__sub_offer.field_storage_definitions                              |         34275 |
| entity.definitions.installed                         | site__custom_form.field_storage_definitions                            |         36089 |
| entity.definitions.installed                         | site__channel.field_storage_definitions                                |         41726 |
| entity.definitions.installed                         | easy_email.field_storage_definitions                                   |         42517 |
| entity.definitions.installed                         | site__customer.field_storage_definitions                               |         46881 |
| entity.definitions.installed                         | node.field_storage_definitions                                         |         50498 |
| entity.definitions.installed                         | site__association.field_storage_definitions                            |         60462 |
| state                                                | routing.non_admin_routes                                               |        191093 |

/me wonders how often routing.non_admin_routes is used...

berdir’s picture

@alexpott: This issue/limit only affects state, other key value collections aren't cached, not like this.

berdir’s picture

But still, thinking about it more, "Only on a set/cache miss, but still." means that if you have something above the defined size, it will be a cache miss every time it is requested.

And your non admin routes thing is large and I think used a lot. inspecting, I can see that it has its own cache wrapper around state, so it's actually not that much of a problem: \Drupal\Core\Routing\RoutePreloader::onRequest. Still, the state cache means we could drop the extra cache wrapper. But only if we _don't_ limit the size. Tricky :)

Might be worth inspecting a bit what's in there though. I wondered before if the implementation there is too simplistic, I'm pretty sure there are a _ton_ of routes in there that are actually never or hardly ever needed for link access/generation.

catch’s picture

"Only on a set/cache miss, but still." means that if you have something above the defined size, it will be a cache miss every time it is requested.

Could we cache the size that we calculate in the cache collector, with a special format or whatever so it's distinguishable from actual data? Then rather than recalculating each time we'd just know we have to look it up. Then it really would only be done on cache misses.

berdir’s picture

Hm. interesting idea, but I think tricky with the current setup. I think that could only be done in the actual logic that does write the cache, but then we would have to do it in the CacheCollector base class (or duplicate it into State). But as mentioned, the decision *if* something should be persisted is done in the subclasses, including immediately invalidating the cache in case of a set(), so if we'd change persist() to ignore known-large-items, we'd still end up invalidating the cache.

alexpott’s picture

#3120301: RoutePreloader: prevent preloading of routes generated by JSON:API fixes the massive route preload entry for me. So now this issue will add a double cache.

berdir’s picture

StatusFileSize
new16.82 KB
new4.44 KB

Given my thoughts about the length checking overhead on every load possibly being worse than just caching it unless it is _really_ large, but then you might have different issues anyway, I'm going back to an earlier patch (#81).

> So now this issue will add a double cache.

Given that the related issue is now fixed. lets fix that problem by just removing the separate RoutePreloader cache then? That means an even bigger benefit, because we save a separate cache get.

For a second, I was worried about the routes cache tag there, but that's not an issue at all, the state cache already invalidates itself if it changes. That made me wonder if we should optimize to not write the still fairly large state key if it didn't change (chances of that on cache clears and redeploys seems large and we likely already have it loaded/in cache now), but that's out of scope.

I did have another look around some of our larger sites and since we don't use jsonapi, the routes preload entry is managable even without that core key, the biggest thing I found was a system.js_cache_files state entry with 87kb, that's an array with 620 entries. The same site has one instance of state misuse where it uses it as sort of a cache with dynamic (date) keys, so there are in total 6k keys and close to 1M of data in there, but the cache collector approach will ensure that it doesn't need to load that much. The current cache entry length there is 150kb as a serialized string.

Status: Needs review » Needs work

The last submitted patch, 101: 2575105-101.patch, failed testing. View results

Version: 9.2.x-dev » 9.3.x-dev

Drupal 9.2.0-alpha1 will be released the week of May 3, 2021, which means new developments and disruptive changes should now be targeted for the 9.3.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

berdir’s picture

catch’s picture

Title: Use cache collector for state » [PP-1] Use cache collector for state
berdir’s picture

Status: Needs work » Needs review
StatusFileSize
new16.79 KB

Reroll.

Status: Needs review » Needs work

The last submitted patch, 106: 2575105-106.patch, failed testing. View results

Version: 9.3.x-dev » 9.4.x-dev

Drupal 9.3.0-rc1 was released on November 26, 2021, which means new developments and disruptive changes should now be targeted for the 9.4.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

quietone’s picture

Found this because this is an issue that has already been committed but still open. In fact I see two commits.

Maybe we should do a new issue as this is indeed pretty confusing with the reverts and bugfixes being committed there.

I agree. (From #65)

Not that I recommend continuing in this issue, but if it makes sense to do so, can the Issue Summary be brought up to date?

Version: 9.4.x-dev » 9.5.x-dev

Drupal 9.4.0-alpha1 was released on May 6, 2022, which means new developments and disruptive changes should now be targeted for the 9.5.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 9.5.x-dev » 10.1.x-dev

Drupal 9.5.0-beta2 and Drupal 10.0.0-beta2 were released on September 29, 2022, which means new developments and disruptive changes should now be targeted for the 10.1.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

berdir’s picture

StatusFileSize
new16.77 KB

Reroll for D10.

pradhumanjain2311’s picture

StatusFileSize
new14.12 KB
new2.34 KB

Try to fix cs errors.

berdir’s picture

Status: Needs work » Needs review
StatusFileSize
new14.12 KB

Another reroll.

Note: The conflicting commit has been reverted on 10.0, use the patch above there, this is for 10.1.

Status: Needs review » Needs work

The last submitted patch, 114: 2575105-114.patch, failed testing. View results

berdir’s picture

StatusFileSize
new14.21 KB

Sorry for the noise, another conflict on 10.0.

berdir’s picture

Status: Needs work » Needs review
StatusFileSize
new14.18 KB

And a reroll for 10.1, same conflict for 10.0 and 10.1, but the patches are elswhere not the same. #2363351: Cached services can't be used in service providers/modifiers is close now, once that's in I will check if there are any test fails left.

Status: Needs review » Needs work

The last submitted patch, 117: 2575105-117.patch, failed testing. View results

mathilde_dumond’s picture

StatusFileSize
new0 bytes

version of the patch from #106 without the tests, to have a patch that applies on D9. test changes are excluded to avoid rerolls for D9

mathilde_dumond’s picture

StatusFileSize
new8.9 KB

sorry about the empty patch, here is again patch 106 with no test

berdir’s picture

Status: Needs work » Needs review
StatusFileSize
new14.18 KB

Another reroll. There's no point in updating the deprecation message versions until the blocker is resolved.

Status: Needs review » Needs work

The last submitted patch, 121: 2575105-121.patch, failed testing. View results

berdir’s picture

Status: Needs work » Needs review
StatusFileSize
new14.29 KB

Another reroll. 10.1 is busy.

Status: Needs review » Needs work

The last submitted patch, 123: 2575105-123.patch, failed testing. View results

berdir’s picture

Note for people following along here. #2363351: Cached services can't be used in service providers/modifiers is now a hard requirement to use this on 10.1 due to the new DevelopmentCompilerPass that use state. I'm not sure if state is the correct thing to do use there but for now, you need the current patch from that other issue to be able to use this.

Version: 10.1.x-dev » 11.x-dev

Drupal core is moving towards using a “main” branch. As an interim step, a new 11.x branch has been opened, as Drupal.org infrastructure cannot currently fully support a branch named main. New developments and disruptive changes should now be targeted for the 11.x branch, which currently accepts only minor-version allowed changes. For more information, see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

berdir’s picture

Status: Needs work » Needs review
StatusFileSize
new17.6 KB

Here's a combined patch with the one from the issue. I'm seeing some very weird things on 10.1 with Symfony 6.3 with this (something about Ghost/Lazy services and missing properties). Lets see what testbot thinks.

berdir’s picture

Can reproduce the errors also with just core on update.php. It's going through DevelopmentSettingsPass as expected, that's not visible in the backtrace though. Error looks like this:

Error: Call to a member function insert() on null in Drupal\Core\Lock\DatabaseLockBackend->acquire() (line 71 of core/lib/Drupal/Core/Lock/DatabaseLockBackend.php).

Somehow it doesn't work to properly initialize a Ghost object so early and then it explodes.

Workaround is to not use the state service there but use the key value store directly and bypass caching and CacheCollector stuff. Not pretty but avoids the issue and actually kind of makes sense as we are not interested in persisting those values in the cache.

(edit: wrong patch name, but the contents should be correct)

The last submitted patch, 127: 2575105-2363351-combined-123.patch, failed testing. View results

Status: Needs review » Needs work
berdir’s picture

Status: Needs work » Needs review
StatusFileSize
new21.17 KB
new1.79 KB

Ah, much better. Fixing that unit test, then this should green, but still blocked on the cached services issue, even the workaround for DevelopmentSettingsPass as there are some tests that break.

smustgrave’s picture

Status: Needs review » Postponed
berdir’s picture

Title: [PP-1] Use cache collector for state » Use cache collector for state
Status: Postponed » Needs review
StatusFileSize
new18.06 KB

That is in, rerolled. Conflicted on the deprecated state stuff.

smustgrave’s picture

Status: Needs review » Needs work

Deprecation messages need updating. Same for CR please

quietone’s picture

Status: Needs work » Needs review
StatusFileSize
new1.33 KB
new18.06 KB

All I did was to update two trigger_error messages.

berdir’s picture

I'm slightly torn on this. I strongly believe that this has considerable performance improvements, we're seeing large reductions of query counts with this on production sites.

That said, there are definitely cases where cache is _not_ required and it's wasteful to load things on every request that are only needed on cron jobs or even cache rebuils. See the changes in #128. I'm wondering if we should instead introduce a new service that wraps state and make caching an opt-in. It would implement the same service, so it's a drop-in replacement. Not sure about \Drupal::state(), could be a new method as we want to promote it IMHO, or a new parameter.

That would deal with concerns about sites/custom code that stores a lot of data in state that is not frequently. Thoughts?

The downside is that if a key is inconsistently used, sometimes with cached state, sometimes not would result in cache invalidation issues. That's technically also the case when bypassing the state service and using keyvalue directly with the current patch.

catch’s picture

Should we be moving some keys out of state to a separate collection? I was thinking about the recent development settings page since that's only accessed on container rebuilds. Not sure what that collection would be called though, like state-2...

We're comparing this to Drupal 7 variables which loaded the entire table of state + config on every request so we're already quite a bit more efficient, but yeah it does seems a waste.

smustgrave’s picture

So this issue doesn't stall should this be NW for #137?

catch’s picture

@smustgrave if we did #136 it would be needs work for that, but if we do #137 it would not, instead we'd open a spin-off and go ahead here.

smustgrave’s picture

Status: Needs review » Reviewed & tested by the community

To keep the issue from stalling going to mark it then. If we need a follow up for #136 can do that for everyone here.

Status: Reviewed & tested by the community » Needs work

The last submitted patch, 135: 2575105-133.patch, failed testing. View results

quietone’s picture

Issue summary: View changes
Status: Needs work » Needs review

I am reviewing issues that I have bookmarked that were committed and then reverted.

As I read #136 and #137 there is not yet agreement on the approach so setting this to needs review for a decision on that point.

There are failing tests and this should be converted to an MR. I will do that now.

quietone’s picture

Issue tags: +no-needs-review-bot
StatusFileSize
new800 bytes

There was a conflict when applying the patch so here is the interdiff.

I am also tagging no-needs-review-bot to emphasize that a decision should be made before working on the MR.

smustgrave’s picture

Question who can make the decision though?

smustgrave’s picture

@berdir or @catch could you make the calls the remaining tasks? Not sure who can.

catch’s picture

@berdir's idea is to opt-in individual keys for caching.

My idea is two have two services/collections - one for cached state things, one for infrequent.

I think two collection is more robust - no chance of something being requested both cached and uncached, but we need to name it, and we also need to decide whether the new or old collection is the one that gets the caching.

Once possibility would be state vs. state_cached (or state.cached) and move the frequent stuff over to it.

berdir’s picture

Status: Needs review » Needs work

I don't think I was thinking of doing it opt-in per key, I also assumed we have two state services, not two collections though. Although I suppose that does kind of make it per-key.

I'm not sure I get how exactly we would "move" stuff over. Wouldn't that essentially be an API change? Lets take the maintenance check for example. If we just switch core over, then other modules and for example drush that check the old one will be broken as that will just not be there anymore. We'd need a BC layer on the non-cached state (which I assume would be the current state collection) that redirects keys to the cached version, so we need to somehow give it a list of all those keys, which sounds complicated.

The thing is, we already have non-cached collections, that's every other collection stored in keyvalue storage. If you don't want to have it cached, use \Drupal::keyValue('yourmodule')->get('foo'), that's just a few extra characters (DI is a bit more involved with the factory). The only reason we'd bother with a cached and non-cached state version is it wasn't cached until now and there might be sites out there that have all kinds of things in there due to custom and maybe some contrib modules using it ways it's not really meant to be used.

Of course "the way it's meant to be used" is kind of being changed here, no existing documentation clearly says it should only be used for small bits of information and limited cases, it's just somewhat implied by the examples. Also, the fact that it uses the key value store and is "just" a specific key value collection and the key value API itself doesn't seem to be actually documented anywhere (except the query to get all keys on the documentation page). Neither on https://api.drupal.org/api/drupal/core%21core.api.php/group/info_types/10 nor is there a key value API on https://www.drupal.org/docs/develop/drupal-apis, and https://www.drupal.org/docs/8/api/state-api/overview (which I think is very confusing, especially the reset part).

Anyway, a counter-proposal/idea to the two different things idea:
* We introduce a setting that in D10 defaults to FALSE, so you have to opt-in to get the cache, with a requirements check that points out if the setting is not explicitly set and recommends to either set it to TRUE or FALSE explicitly
* in D11+, we switch the default to TRUE.
* We improve the state documentation to make clear in which cases it should be used and when it's better to use your own key value collection
* Consider moving some examples in core to a different key value collection to have fewer bad examples, coming back to the BC concerns though. I don't think we need to provide BC for example for the twig development mode setting though?

(Also setting to needs work so that @smustgrave no longer needs to be sad about this being stuck in the needs review queue ;))

berdir’s picture

+++ b/core/lib/Drupal/Core/Routing/RoutePreloader.php
@@ -73,17 +68,7 @@ public function onRequest(KernelEvent $event) {
 
-      // Ensure that the state query is cached to skip the database query, if
-      // possible.
-      $key = 'routing.non_admin_routes';
-      if ($cache = $this->cache->get($key)) {
-        $routes = $cache->data;
-      }
-      else {
-        $routes = $this->state->get($key, []);
-        $this->cache->set($key, $routes, Cache::PERMANENT, ['routes']);
-      }
-
+      $routes = $this->state->get('routing.non_admin_routes', []);
       if ($routes) {

if we add the setting, what are we going to do with this bit? drop the cache, sites with disabled cache will just get an extra query? Add extra complexity and do a check for the cache setting, only doing it if global state cache is disabled?

catch’s picture

Consider moving some examples in core to a different key value collection to have fewer bad examples, coming back to the BC concerns though. I don't think we need to provide BC for example for the twig development mode setting though?

This sounds good and I also don't think we need to provide BC for that one (hopefully none of the ones we move, but definitely not that one).

The $settings opt-in also sounds like a good plan.

#149 I could go for either of those approaches, not sure which is best, but at least it'll be temporary.

berdir’s picture

Status: Needs work » Needs review

It's a bit ugly, but I implemented the setting now in State, have to override a bunch of methods to avoid unnecessary cache get/set/invalidate calls.

This should have an effect on the new performance tests in core, haven't looked much at them yet, I guess that's going to change now.

catch’s picture

From https://git.drupalcode.org/project/drupal/-/merge_requests/5636/pipeline... looks like we're saving approx 10 database queries on nearly every page request, which is great and what I'd expect based on how many state checks we do usually.

Also there's a possibility it's a fluke since occasionally we get runs this fast already, but the full test run was 7m53 seconds with all functional tests finishing under 4 minutes - it may be enough of a change that it's making test runs faster overall. Shouldn't get too excited until that's happened a couple more times though.

Consider moving some examples in core to a different key value collection to have fewer bad examples, coming back to the BC concerns though. I don't think we need to provide BC for example for the twig development mode setting though?

Yeah I think that's fine - we can add a change record and the specifics of whether and when keys are set aren't part of the public API, obviously if contrib is actually relying on something a lot we'd need to think about it more carefully, but we can do that case-by-case.

berdir’s picture

> From https://git.drupalcode.org/project/drupal/-/merge_requests/5636/pipeline... looks like we're saving approx 10 database queries on nearly every page request, which is great and what I'd expect based on how many state checks we do usually.

Yes, saw that too. I was confused about the increase of queries in that umami test though, I need to look at what that exactly does, I wonder if there's some timing issues/state cache race conditions as it does multiple requests in quick succession.

Reproducing that manually in blackfire with a user without any permissions in umami gives me _very_ different numbers and what I expect. -4 state queries, +1 cache_default query, 33/3 in total. And not 17/18.

I think now that it's opt in, it would make more sense to move this to cache_discovery so we use the fast-chained backend for it? That should save another query then.

longwave’s picture

Checking on two separate production databases that I work on:

> select name, length(value) from key_value where collection='state' and length(value) > 10000;
+--------------------------------------+---------------+
| name                                 | length(value) |
+--------------------------------------+---------------+
| drupal_css_cache_files               |         30824 |
| routing.non_admin_routes             |         16115 |
| simple_sitemap.queue_stashed_results |       8136706 |
| system.js_cache_files                |         70931 |
+--------------------------------------+---------------+
4 rows in set (0.005 sec)
> select name, length(value) from key_value where collection='state' and length(value) > 10000;
+--------------------------------------+---------------+
| name                                 | length(value) |
+--------------------------------------+---------------+
| drupal_css_cache_files               |         10519 |
| simple_sitemap.queue_stashed_results |       6331055 |
| system.js_cache_files                |         11710 |
+--------------------------------------+---------------+
3 rows in set (0.001 sec)

Looks like simple_sitemap should be storing whatever that data is somewhere else.

catch’s picture

Reproducing that manually in blackfire with a user without any permissions in umami gives me _very_ different numbers and what I expect. -4 state queries, +1 cache_default query, 33/3 in total. And not 17/18.

One thing to note is that cache requests in the performance tests include those from chained fast, so if blackfire is ignoring those, it'll be a huge difference.

I think now that it's opt in, it would make more sense to move this to cache_discovery so we use the fast-chained backend for it

Seems sensible to me. isn't it using bootstrap already? If so that's already using chained fast.

catch’s picture

RefreshVariablesTrait calls $state->resetCache() which in the MR calls clear() which resets both the persistent and static caches, it needs to be ::reset() instead. Pushed a commit. Test should fail in the right direction now.

catch’s picture

#156 makes the change in the performance test look right to me, however we're seeing other test failures in runs which aren't consistent. I think the issue is that any test using state to track.. well.. state between requests, now needs to either switch to WaitTerminateTestTrait or to use the key value service directly without the state wrapper - because the cache will be updated in terminate. Pushed a commit for LanguageNegotiationContentEntityTest which was failing locally for me.

Before I found this, I thought I found a race condition in the state implementation - the cache entries were being invalidated before key value is written to. That's not the cause of the test failures after all, but I still think it's a valid change.

catch’s picture

Got to a green test run, but had to do some horrible things in LanguageNegotiationContentEntityTest and some of the test failures are random so could still be a couple of false positives.

I think the proper approach to those is going to be switching from state to raw key value in the test modules/test classes, if we do that we can probably undo TerminateWaitTestTrait. However this at least confirms that all of the test failures are due to test and tested site going out of sync when explicitly using the state API to track/alter things in the test/test modules themselves, and not real bugs caused by the change at all.

smustgrave’s picture

Any recommendation for reviewing this one?

catch’s picture

Issue tags: +Needs followup

I think the main remaining things are:

1. We should open issues for taking things out of state and into key/value directly which are accessed extremely rarely. The new development settings stuff that's only on container rebuild / form save are the obvious ones, there are probably more though.

Also an issue against simple_sitemap would make sense too.

2. Decide whether the changes in core/modules/language/tests/src/Functional/LanguageNegotiationContentEntityTest.php are acceptable here with a follow-up to move the test modules to use key/value, or whether we should refactor those test modules to use key/value in this issue. This also goes for the various test modules used in the tests that are getting WaitTerminateTestTrait added. I could go either way here - refactoring the test modules adds scope, but it'll be less churn because we then wouldn't be adding WaitTerminateTestTrait and removing it again.

Tagging needs follow-up since we need that for #1 even if we need to make a decision on #2.

berdir’s picture

Status: Needs review » Needs work

Did a rebase after the query logging changes. Ignored all changes to StandardPerformanceTest and OpenTelemetryAuthenticatedPerformanceTest, that will need to be redone completely.

Couldn't run either test on my current system, both fail with "Test was run in child process and ended unexpectedly", will need try somewhere else.

kristiaanvandeneynde’s picture

Assigned: Unassigned » kristiaanvandeneynde

I'll take care of the performance stuff real quick

kristiaanvandeneynde’s picture

Assigned: kristiaanvandeneynde » Unassigned

Went green on my local (the performance stuff)

berdir’s picture

Thanks, that looks great.

One thing I find a bit interesting is that the cache get count goes up. But we're also removing a cache get here in RoutePreloader, does that not run on these requests? Maybe because all these test scenarios actually hit dynamic page cache.

Would be kind of nice to have a test that doesn't use that as that's hiding _a lot_ of stuff, but I guess it could also get a bit overwhelming, that's going to be a lot of queries I guess. and not in scope of this issue.

catch’s picture

@berdir see #3414373: Route preloading doesn't run on dynamic page cache hits.

There are some cold/lukewarm cache tests in the Umami performance tests which don't do query assertions yet, I've been hesitant to add assertions to them while we had so many random test failures, but we should probably try that now. This isn't the same as no dynamic page cache at all though so we still might want that too.

kristiaanvandeneynde’s picture

I would assume the extra get is coming from the collector going to the cache once to get all of the state stuff, whereas before the state stuff were non-cache-related queries, which showed up in the list of individual queries we're tracking.

Quick guess: The cache get that is saved in RoutePreloader shows up in those test cases where the cache get count does not go up by one. The ones where it does go up, RoutePreloader doesn't save a cache get and state introduces one extra get. Just a guess, though.

Edit: Cross-posted with @catch and my guess seems to match what he's saying.

andypost’s picture

kristiaanvandeneynde’s picture

Last fail is Drupal\Tests\media\FunctionalJavascript\MediaSourceOEmbedVideoTest::testMediaOEmbedVideoSource, I'm not getting any wiser from the artifacts. I'm not seeing anything related to media in the changes either.

catch’s picture

Worth running the job again to see if it's an existing random. I don't remember that one specifically but there's been a lot of random media test failures.

andypost’s picture

The test reported already but only once in #2829040-188: [meta] Known intermittent, random, and environment-specific test failures and error is different, so maybe it's real issue with redirect?

Behat\Mink\Exception\ExpectationException: Current page is
    "/media/add/test_media_oembed_type", but "/admin/content/media" expected.
andypost’s picture

it's because of error The provided URL does not represent a valid oEmbed resource. for https://vimeo.com/7073899

andypost’s picture

traced down the test locally (reproducible) til \Drupal\media\OEmbed\ResourceFetcher::fetchResource()

cURL error 28: Operation timed out after 5003 milliseconds with 0 bytes received (see https://curl.haxx.se/libcurl/c/libcurl-errors.html) for http://172.30.1.2/media_test_oembed/resource?url=https://vimeo.com/7073899&altered=1

kristiaanvandeneynde’s picture

Why can't I do a fast-forward pull on this branch? Happened twice now already 🤔

Re #172, are we sending a GET request to Vimeo every time we run our tests? I'm not sure how happy Vimeo would be with us doing so :)

andypost’s picture

@kristiaanvandeneynde no, there's mapping in state to return local file and somehow this request is failed.

I used to add debug print and I see that request failed before the controller returns result(

steinmb’s picture

Very interesting work. Xhprof lead me to this issue, as I am tracking down a performance issues in a D10.2.x —site.

select name, length(value) from key_value where collection='state' and length(value) > 10000;

 custom_fs_connector.basic_data          | 808712
 custom_fs_connector.info_type_name_list |  27342
 custom_room_info                            | 304317
 custom_fs_connector.course_code_list    | 101678

This is information that not often change, tough by reading through the issue and try to understand what is going on, I would say, this data should not be stored here but the site custom module. From my understanding is this data that not often change.

kristiaanvandeneynde’s picture

Ah yes, now that you mention it: ResourceController::setResourceUrl() uses state.

Then the failure makes slightly more sense, we're changing how state works and now a test using state is failing.

longwave’s picture

Rebased, I expect the performance tests to fail as I couldn't figure out what the new expected counts were supposed to be from the conflicts.

Also reverted the AutomatedCron event subscriber priority back to 100 and the docs change in core.api.php, I think now that needs_destruction is handled after all terminate events (following #3412556: Allow needs_destruction services to run on page cache hits) that all terminate event subscribers are safe to use state.

@kristiaanvandeneynde fast forward pulls do not work when the branch has been rebased and force pushed, most MR contributors tend to use the rebase method instead of merging in 11.x as merge commits are sometimes tricky to work with and it makes the set of commits in the MR cleaner.

longwave’s picture

Maybe a can of worms but is this another use for a feature flag module over settings.php? We could enable the feature flag module on new installs but existing sites can opt in over time, and eventually we can deprecate/remove it?

Changed my mind, this is more low level and belongs in settings.php.

longwave’s picture

Status: Needs work » Needs review

MediaSourceOEmbedVideoTest sets state during the test and then expects the site under test to read that, but now we can't (easily) invalidate the site's state cache so the test doesn't have the effect it thinks it does. To solve this I switched the test ResourceController to use key value instead of state.

Also updated the performance test counts, and updated the change record a bit.

longwave’s picture

ExtensionList::getPathNames() appears to wrap cache around state, we can perhaps drop the cache wrapper in a followup.

kristiaanvandeneynde’s picture

In general this looks really good! I'm wondering why we're adding the WaitTerminateTestTrait without calling its method, though? I couldn't find anything in core that automatically calls that method if it's present either.

longwave’s picture

Removed WaitTerminateTestTrait seemingly with no harm done.

longwave’s picture

Removed the cache wrapper from ExtensionList, hoping this will remove another cache get.

edit: it did not, but it should still save some cache accesses.

kristiaanvandeneynde’s picture

Status: Needs review » Needs work

Just went over all the changes again and they look good to me. I would just like to see a follow-up so we don't forget to change the state_cache setting in D11.

Also wondering if we need a deprecation notice for dropping the $cache argument from RoutePreloader, although I couldn't find any other example of that in core or the documentation. The docs describe what to do when A, B, C becomes A, C; but not what to do if it becomes A, B. I would err on the side of adding a trigger_error still, using func_get_args().

I have no other objections. Good job on figuring out MediaSourceOEmbedVideoTest, by the way! Setting to NW for the 2 points above but feel free to ping me when those are addressed and I'll gladly set this to RTBC.

longwave’s picture

Thanks for reviewing, opened #3436954: Remove state_cache setting in Drupal 11.

#160 asked for some other followups, so leaving the tag in place for now.

Will add a deprecation for the RoutePreloader argument, and also change LanguageNegotiationContentEntityTest to use keyvalue in this issue, given we have precedent in the ResourceController for the OEmbed test.

longwave’s picture

Status: Needs work » Needs review
longwave’s picture

Also opened #3436960: simple_sitemap.queue_stashed_results should be in keyvalue instead of state, unsure if @catch wants to open any more followups or not - leaving the tag in place for now.

longwave’s picture

Issue summary: View changes

Added a release note given we are adding a new setting here.

The tests failed a couple of times but I think I was just unlucky with random fails - they just passed again so hopefully this is all good now.

catch’s picture

Tagging as a release highlight, because along with other changes in Drupal 10.3 we're dramatically decreasing the database queries on all page requests, can go in a general 'performance improvements' paragraph.

I opened #3436971: Move the TestWaitTerminate middleware to be added conditionally which I'd been thinking about for a while, but seeing the query in the diff reminded me of it.

Now that tests using state have been updated to k/v here, I think that's everything I was worried about in #160 so removing the tag.

kristiaanvandeneynde’s picture

RTBC for me, should fix the 2 drupal:10.0.0 references to drupal:10.3.0, though. Holding off on those grounds.

longwave’s picture

Fixed the deprecation version, don't know what I was thinking there.

In a followup I think we should add a note to https://api.drupal.org/api/drupal/core%21core.api.php/group/info_types/10 and perhaps https://www.drupal.org/docs/develop/drupal-apis/state-api/state-api-over... mentioning that the state store is meant for frequently used and infrequently changed small pieces of data, and the key-value store should be used for larger or more frequently changing data.

kristiaanvandeneynde’s picture

Status: Needs review » Reviewed & tested by the community

Agreed. Code looks good to me now, so RTBC.

catch’s picture

  • catch committed 16964d02 on 11.x
    Issue #2575105 by Berdir, catch, longwave, quietone,...
catch’s picture

Version: 11.x-dev » 10.3.x-dev
Status: Reviewed & tested by the community » Fixed

Really nice to be able to commit this one, we've come a long way since variable_get().

I had to resolve a merge conflict on 10.3.x, it was trivial, let's see whether I did I right or not...

Committed/pushed to 11.x and cherry-picked to 10.3.x, thanks!

  • catch committed 64e78946 on 10.3.x
    Issue #2575105 by Berdir, catch, longwave, quietone,...

berdir’s picture

Wow. Very tempted to insert the "It's been 84 years..." meme. This took a while, thanks for the help to push this over the finish line.

I've clarified and expanded the change record: https://www.drupal.org/node/3177901, specifically around the current opt-in/opt-out behavior and how to check for problematic usages of the state API.

I've also added a short paragraph on recommended/correct usage of the state API and I created a follow-up issue to improve API docs as well as the documentation on drupal.org: #3437176: Improve documentation of the state and key value API

longwave’s picture

This feels like a good heuristic:

Sites with more than 100 entries and/or a total value size/length of 100'000

Wonder if we should add a hook_requirements status check for this to help people before Drupal 11 enables this for good?

berdir’s picture

I just made those numbers up, felt like a nice, even numbers, not sure about them yet, especially the length one.

There is already a hook_requirements() warning, the problem is that there is no API to get that information, we would need to check the used key value implementation and then hardcode them or something like that, was not sure about the putting that complexity in there. What meant to do is link the change record there in that message so that people can read through that if they have questions. Maybe we can do a follow-up commit or issue for that?

jurgenhaas’s picture

Question: we have a sub-class of State in ECA, how should we declare the constructor and the service, if we want to support both Drupal 10.2 and 10.3? This seems impossible with this change, or am I missing something?

catch’s picture

I found https://git.drupalcode.org/project/eca/-/commit/3bb22ea5820fd81f72e46c44...

Since this is a subclass with its own service definition, why pass the extra services into your own constructor for both 10.2 and 10.3 - they won't be used in 10.2 but that seems OK?

jurgenhaas’s picture

Unfortunately not. We have an extra service to inject and our service declaration looks like this:

  eca.state:
    class: Drupal\eca\EcaState
    parent: state
    arguments: ['@datetime.time']

So, for 10.3 the constructor needs to look like this:

public function __construct(KeyValueFactoryInterface $key_value_factory, CacheBackendInterface $cache, LockBackendInterface $lock, TimeInterface $time) {

but for 10.2 it needs to look like that:

public function __construct(KeyValueFactoryInterface $key_value_factory, TimeInterface $time) {
catch’s picture

@jurgenhaas why can't you inject cache and lock into 10.2?

jurgenhaas’s picture

@catch ah, you mean I could update the service declaration there to mimic what 10.3 is going to do in the parent service? They should work, I'll give that a try in #3437321: Add support for cache collector for state in core

longwave’s picture

Spotted two issues here that deserve followups:

1. The new constructor calls the CacheCollector constructor with a fixed cache key; this will need to be changed in any subclasses, so we should move this to an overridable class constant.

2. We inject the bootstrap cache bin in services.yml but the BC fallback uses the discovery bin.

longwave’s picture

#206.1 can actually just be solved by overwriting the property, but opened #3437347: State cache collector uses a fixed cache and collection ID anyway.

#206.2 is a bug so opened #3437344: State cache bin is inconsistently chosen

catch’s picture

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.

klemendev’s picture

We have observed that on some of our websites

SELECT name, LENGTH(value) FROM key_value WHERE collection = 'state' ORDER BY LENGTH(value) DESC LIMIT 100;

yields length numbers in orders of 200k, which is not ok according to https://www.drupal.org/node/3177901.

The vast majority of the length comes from drupal_js_cache_files.

https://www.drupal.org/node/3177901 recommends optimizing if

Sites with more than 100 entries and/or a total value size/length of 100,000 should review their specific entries using the following query and either optimize their usage or disable state caching as a temporary measure:

but how can we optimize something that originates from the core?

kristiaanvandeneynde’s picture

Could you file a new issue about this please? If core is setting that much data through drupal_js_cache_files, that definitely needs a dedicated issue.

berdir’s picture

Core removed any usage of the string system.drupal_js_cache_files 12 years ago when it was renamed to system.js_cache_files. And that too was deprecated and removed in in 10.2 or 10.1.

Whatever you have there is probably not core or at least not actively used anymore. Try searching your code base for it, if you don't find anything, you can delete those records. FWIW, since this is a cache collector, if nobody *uses* those state keys, they will also not end up in the cache.

klemendev’s picture

No occurrence of drupal_js_cache_files in the codebase, it seems this is stale stuff. The website we spotted this on used to run on all versions starting D7, so maybe it is an artifact from these times.

quietone’s picture