Problem

On every web request, DrupalKernel::getCachedContainerDefinition() loads the compiled container definition from the database via the cache.container service.

Loading the container is pretty slow: it involves not only fetching 500KB+ of serialized data, but also unserializing it into PHP arrays.

This database query happens during early bootstrap, before the main service container is available, before ChainedFastBackend is initialized, and before APCu-backed cache bins are accessible.

This means the container loading cannot benefit from APCu even though the container definition is ideal for APCu caching (large, rarely changes, read on every request).

The cache.container service was deliberately excluded from ChainedFastBackend in #2248767: Use fast, local cache back-end (APCu, if available) for low-write caches (bootstrap, discovery, and config) because of the chicken-and-egg problem: APCu cache services aren't available before the container that defines them is loaded.

Note: sites using Redis (#3007374: Document bootstrap_container_definition in README) or Memcache (#2882755: Move container to memcache) can override bootstrap_container_definition in settings.php to use those backends, but this still requires a network round-trip to the cache server on every request.

Solution

So I did an experiment (the serious kind) and added an "APCu fast path" in getCachedContainerDefinition() that calls the raw APCu functions directly. We're basically skipping over the Drupal cache services, which I think is okay.

I’m getting back up to speed with the caching layer, so to be on the safe side, I modeled the design after ChainedFastBackend.
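A minimal sketch of the fast-path idea, assuming the ChainedFastBackend-style rule that a local entry is valid only if it was created at or after the last write recorded in the persistent backend. The function name, array keys, and the plain arrays standing in for APCu and the database are all hypothetical; this is not the code from the merge request.

```php
<?php

/**
 * Hypothetical fast-path lookup; not the code from the merge request.
 *
 * $fast stands in for APCu and $persistent for the cache_container bin, so
 * the sketch runs without the APCu extension. In real code the $fast reads
 * and writes would be apcu_fetch() and apcu_store() calls.
 */
function load_container_definition(array &$fast, array $persistent, string $key, float $now): ?array {
  // Small timestamp kept next to the large definition in the persistent
  // store. It is missing after a full cache clear.
  $last_write = $persistent['last_write_timestamp'] ?? NULL;

  // Fast path: trust the local copy only if it is well formed and was
  // created at or after the last recorded write.
  $entry = $fast[$key] ?? NULL;
  if ($last_write !== NULL
    && is_array($entry)
    && isset($entry['created'], $entry['data'])
    && is_array($entry['data'])
    && $entry['created'] >= $last_write) {
    return $entry['data'];
  }

  // Slow path: read the full definition from the persistent store.
  $definition = $persistent['definition'] ?? NULL;
  if (!is_array($definition)) {
    return NULL;
  }

  // Warm the fast store only when there is a timestamp to validate against.
  if ($last_write !== NULL) {
    $fast[$key] = ['created' => $now, 'data' => $definition];
  }
  return $definition;
}
```

Note that the small timestamp is still read from the persistent store on every call; only the large definition fetch is skipped on a hit.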

Benchmarks

HTTP TTFB improvement (Umami, 200 requests per scenario, APCu-only change):

Anonymous pages:

| Page | Baseline p50 | APCu p50 | Baseline p95 | APCu p95 | Baseline mean | APCu mean |
| --- | --- | --- | --- | --- | --- | --- |
| Front | 23.0ms | 20.2ms (-12%) | 38.6ms | 26.3ms (-32%) | 25.5ms | 20.8ms (-18%) |
| Article | 27.9ms | 22.4ms (-20%) | 41.4ms | 35.1ms (-15%) | 29.2ms | 24.1ms (-17%) |
| Recipes | 21.7ms | 19.9ms (-8%) | 32.4ms | 25.3ms (-22%) | 23.1ms | 20.4ms (-12%) |


Authenticated pages (admin user):

| Page | Baseline p50 | APCu p50 | Baseline p95 | APCu p95 | Baseline mean | APCu mean |
| --- | --- | --- | --- | --- | --- | --- |
| Front | 38.9ms | 34.3ms (-12%) | 50.7ms | 40.7ms (-20%) | 40.4ms | 35.1ms (-13%) |
| Article | 37.7ms | 34.8ms (-8%) | 51.6ms | 43.3ms (-16%) | 39.5ms | 35.8ms (-9%) |
| Recipes | 38.8ms | 34.6ms (-11%) | 48.7ms | 41.6ms (-15%) | 40.5ms | 35.7ms (-12%) |


These are HTTP TTFB measurements through DDEV's proxy (TLS + nginx + FPM) on my localhost. I'm seeing a consistent 8-20% p50 improvement across all scenarios, with p95 gains of 15-32%. It feels important enough to evaluate more closely.
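For reference, the percentages in the tables are relative changes against the baseline, rounded to whole percent. A small sketch of that arithmetic (the function name is hypothetical):

```php
<?php

/**
 * Relative change of $candidate against $baseline, in whole percent.
 * Negative values mean the candidate is faster than the baseline.
 */
function relative_change_percent(float $baseline, float $candidate): int {
  return (int) round(($candidate - $baseline) / $baseline * 100);
}
```

For example, `relative_change_percent(23.0, 20.2)` gives -12, matching the anonymous Front p50 cell.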

Disclaimer: I used Claude Code throughout, though much of the work was mine and I reviewed everything carefully.

Some extra details

  • Test coverage also follows the patterns from ChainedFastBackendTest, but I added tests for various edge cases: corrupted APCu entries, APCu-unavailable fallback, cold start, and explicit invalidation. ChainedFastBackend's unit tests cover 5 scenarios; ours cover 11. We could consider porting some of my new tests to ChainedFastBackend, but I'm not sure that is needed.
  • Added an in-memory APCu emulator in DrupalKernelApcuTestKernel to validate the flow without a real APCu extension (CLI lacks APCu in this environment). A second test kernel (DrupalKernelNoApcuTestKernel) overrides getContainerApcuKey() to return NULL, simulating environments without APCu.
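To illustrate the emulator idea, here is a minimal array-backed stand-in that mirrors the return conventions of apcu_fetch(), apcu_store(), and apcu_delete(). The class name and the simulateFull() switch are hypothetical and are not taken from DrupalKernelApcuTestKernel; this is only a sketch of how such a test double could look (it assumes PHP 8.0+ for the `mixed` type).

```php
<?php

/**
 * Hypothetical array-backed stand-in for the apcu_* functions in tests.
 */
final class InMemoryApcu {

  /** @var array<string, mixed> Stored items keyed by cache key. */
  private array $items = [];

  /** When TRUE, store() fails the way apcu_store() does if memory is full. */
  private bool $full = FALSE;

  /** Like apcu_fetch(): returns FALSE and sets $success to FALSE on a miss. */
  public function fetch(string $key, ?bool &$success = NULL): mixed {
    $success = array_key_exists($key, $this->items);
    return $success ? $this->items[$key] : FALSE;
  }

  /** Like apcu_store(): returns FALSE when the value cannot be stored. */
  public function store(string $key, mixed $value): bool {
    if ($this->full) {
      return FALSE;
    }
    $this->items[$key] = $value;
    return TRUE;
  }

  /** Like apcu_delete(): returns TRUE only if the key existed. */
  public function delete(string $key): bool {
    $existed = array_key_exists($key, $this->items);
    unset($this->items[$key]);
    return $existed;
  }

  /** Test helper: makes later store() calls fail. */
  public function simulateFull(bool $full = TRUE): void {
    $this->full = $full;
  }

}
```

A test kernel could route its APCu calls through an object like this, which also makes the "APCu memory full" case easy to trigger.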

Edge cases

I tried to cover all the following edge cases:

| Scenario | Behavior |
| --- | --- |
| CLI (Drush, tests) | APCu skipped (apcu_enabled() returns FALSE). Falls to DB. |
| Installation | InstallerKernel uses allow_dumping=FALSE. No caching. |
| Update.php | UpdateKernel::cacheDrupalContainer() returns FALSE. No caching. |
| Container rebuild mid-request | APCu cleared in invalidateContainer(). DB fallback. |
| APCu memory full | apcu_store() returns FALSE silently. DB fallback. |
| Multisite | Site path in APCu key prevents collisions. |
| Multi-server deploy | Cache key includes VERSIONS_HASH. Key changes on deploy. |
| Multi-server drush cr | Timestamp in DB is removed by deleteAll(). Other servers detect missing timestamp and skip APCu on next request. |
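The Multisite and Multi-server deploy rows both depend on how the APCu key is built. A rough sketch, assuming the key is derived from the site path and a deploy-specific hash; the function name, key prefix, and use of SHA-256 are illustrative assumptions, and the actual format in the merge request may differ:

```php
<?php

/**
 * Hypothetical key builder; the real key format may differ.
 *
 * Returns NULL when APCu is unavailable, so callers fall back to the DB.
 */
function container_apcu_key(string $site_path, string $deploy_hash, bool $apcu_available): ?string {
  if (!$apcu_available) {
    return NULL;
  }
  // The NUL separator keeps ("a", "bc") and ("ab", "c") from colliding.
  return 'drupal.container.' . hash('sha256', $site_path . "\0" . $deploy_hash);
}
```

Two sites on the same server then get different keys, and a deploy that changes the hash stops matching entries written by the previous code base.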


Issue fork drupal-3583040

Comments

dries created an issue. See original summary.

dries commented:

Status: Active » Needs review

Tests are passing, so moving this to "Needs review".

dries commented:

Addressed both points:

  1. Removed testReturnsNullWhenBothEmpty, testApcuHitWithEqualTimestamp, and testWriteThenReadRoundTrip
  2. Replaced StubCacheBackend with MemoryBackend. Eliminates the need for CID tracking and the custom test backend, as @longwave suggested.

Less is more.

catch commented:

Couple of things:

1. Because we need to check the persistent backend for the invalidation timestamp, we don't save a database round trip. It does mean a much smaller entry to get from the database so it's possibly worth it despite this.

A possible solution would be to use cache_bootstrap for the invalidation timestamp and cache this somewhere for re-use in the real cache_bootstrap chained fast service.

2. Would it be possible to use the chained fast backend wired into the bootstrap container instead of hardcoding apcu calls in the kernel?

berdir commented:

If everything in ChainedFast is fully injected then I think it would work to set up the bootstrap container definition for that. Doing it with a different backend is tricky though since that isn't supported, it has to be the same bin.

I guess we assumed that we still need the persistent bin lookup and that's why it was skipped, but I was wondering about that before. redis also optimizes the persistent backend lookup to be a simpler direct check without per-bin deletion lookups, see: https://www.md-systems.ch/en/blog/2025-01-26/redis-startup-performance-i....

redis also supports https://relay.so/, that essentially includes its own ChainedFast-like implementation with shared invalidation, see https://relay.so/docs/1.x/introduction. The default/recommended configuration for that includes the container bin, so this is already possible there.

dries commented:

Refactored based on feedback from @catch and @berdir. The result is both cleaner code and less code. That said, I'm not 100% sure I did it correctly. The new ChainedFastBackend version is a tiny bit slower (~1-2ms on my laptop) due to the extra indirection. The overall performance gain is still significant though.