Problem/Motivation

Twig caching produces a directory structure looking like the following:

sites/default/files/php/twig
  - 5915e35d823c3_container.html.twig_5ytYVUs_gdkx819cOy1xqJ2Z_
  - 5915e35d823c3_datetime-form.html.twig_F5e-roY2x9_yW4qxQPvi924S5
   ...

These file names consist of

  • The prefix, regenerated by uniqid() every time the available Twig extensions change (which is triggered by module installs), and as such problematic on multi-webhead environments. This prefix is already stored in state so that it is shared across all webheads.
  • The twig filename
  • Protection provided by PHP file storage
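
Put together, a compiled template file name is roughly the concatenation of those three pieces (illustrative only; the exact assembly happens in the Twig PHP storage cache):

// Illustrative only: how the pieces above combine into a cache file name.
$prefix = '5915e35d823c3';                  // uniqid(), shared via state
$template = 'container.html.twig';          // the source template name
$suffix = '5ytYVUs_gdkx819cOy1xqJ2Z_';      // the protection part described above
$name = $prefix . '_' . $template . '_' . $suffix;
// => 5915e35d823c3_container.html.twig_5ytYVUs_gdkx819cOy1xqJ2Z_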

The current method of clearing the Twig cache (as seen in drupal_flush_all_caches()) is this:

  // Wipe the Twig PHP Storage cache.
  PhpStorageFactory::get('twig')->deleteAll();

By default (see \Drupal\Core\PhpStorage\PhpStorageFactory::get()), it is stored in the public:// stream wrapper, specifically in public://php/:

    if (!isset($configuration['directory'])) {
      $configuration['directory'] = PublicStream::basePath() . '/php';
    }

So, by default, it is written to public://, which by definition is shared across all webheads.

However, on hosting environments that use multiple web heads (such as Acquia Cloud Enterprise), the public:// stream wrapper is backed by a high-latency distributed file system. This is fine performance-wise for downloadable files (because they can be cached by a reverse proxy such as Varnish), but not for files that are required tens if not hundreds or thousands of times per second, which is the case for compiled Twig templates. Even when those files are cached in PHP's OpCache, the file system is checked to see whether the file has changed in the system, still resulting in many file system hits. If that file system then has high latency, this leads to terrible performance.

So, Acquia Cloud Enterprise, like many other Drupal hosting platforms, does something like this in settings.php:

$settings['php_storage']['twig'] = array(
  'directory' => '/some/writable/local/directory',
);

The result is that on hosting platforms that do this, this code:

  // Wipe the Twig PHP Storage cache.
  PhpStorageFactory::get('twig')->deleteAll();

really only clears the local directory that stores the Twig cache, on the current webhead only.

The root cause of this problem is quite simple: drupal_flush_all_caches() assumes that the Twig PHP Storage Cache's directory is shared across web servers.

Proposed resolution

Either:

  1. Use $settings['deployment_identifier'] to key compiled Twig templates by; multi-webhead hosting platforms setting this to the git commit hash would then solve it (see the sketch below) (#5 (catch) + #8 (Wim Leers) + #14 (Berdir)).
  2. Add a Twig::invalidate() API that drupal_flush_all_caches() would call, which deletes the state entry we already have. This forces a recalculation of the prefix across all webheads (#7 (dawehner) + #10 (msonnabaum)).
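
For solution 1, the settings.php side could look roughly like this (a sketch only; how the commit hash reaches PHP is platform-specific, and the environment variable name below is made up):

// settings.php sketch: key compiled Twig templates by the deployed code.
// DEPLOYED_GIT_COMMIT is a hypothetical variable a multi-webhead platform
// could expose; any string that changes on every deployment works.
if ($commit = getenv('DEPLOYED_GIT_COMMIT')) {
  $settings['deployment_identifier'] = \Drupal::VERSION . '-' . $commit;
}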

Concerns for solution 1 are that nobody knows about $settings['deployment_identifier']; the general response is that if you run multiple webheads you already need to know about such things anyway (see #14).

Concerns for solution 2 are that it makes PhpStorage dependent on State (or Cache). This is no longer a valid concern, as it already depends on it (the Twig cache prefix already lives in State).

A concern that applies to both is that stale files can linger forever (#48 + #49 + #52). This is impossible to solve though; Drupal can't be responsible for deleting those stale files, because it may require a long-running process to delete potentially many thousands of files. Therefore, it must be up to each hosting platform to delete those files in a background job. #49 contains one possible approach (the one pioneered by Pantheon).
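
For illustration, such a platform-side cleanup job could be as simple as the following sketch (the directory and age threshold are assumptions; this is something a host would run from cron, not something core would ship):

// Scavenger sketch: on each webhead, remove compiled templates that have
// not been touched for a while, outside the deploy path.
$dir = '/some/writable/local/directory/twig';
$threshold = strtotime('-30 days');
$files = new RecursiveIteratorIterator(
  new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS)
);
foreach ($files as $file) {
  if ($file->isFile() && $file->getMTime() < $threshold) {
    @unlink($file->getPathname());
  }
}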

Remaining tasks

User interface changes

None.

API changes

None.

Data model changes

None.

Comments

msonnabaum created an issue. See original summary.

cilefen’s picture

dsouza_rohan’s picture

Wim Leers’s picture

catch’s picture

We should at least use $deployment_identifier to key twig templates by.

catch’s picture

Also, a general note: I agree with splitting this out from the phpstorage issue - there are things like $deployment_identifier we can do that don't require changes to phpstorage itself.

dawehner’s picture

In order to invalidate information across multiple webheads, you need an easy way to share information. Much like we now use the cache backend for the container, we could use a state/cache entry that the compiled files are keyed by:

  • Twig::invalidate() => generates a new random value for the twig hash, invalidates the cache, and stores the value in state
  • At runtime the phpstorage uses this hash when loading compiled files, potentially cached, so no DB query is required

Would that simply work?
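
A rough sketch of that idea (the state key name and the code are illustrative, not actual core code):

// Shared prefix in state: regenerating (or deleting) it on any webhead
// invalidates the compiled templates on all of them, because the file
// names no longer match.
$state = \Drupal::state();
$prefix = $state->get('twig_cache_prefix');
if (empty($prefix)) {
  // uniqid() rather than a purely random value, so each regeneration
  // differs from the previous one.
  $prefix = uniqid();
  $state->set('twig_cache_prefix', $prefix);
}
// Twig::invalidate() would then boil down to:
// $state->delete('twig_cache_prefix');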

Wim Leers’s picture

Why not just deployment identifier?

What you say can work too (the random hash would be cleared in a single shared location and would therefore affect all web heads), but what's the benefit?

IMO the benefit of a deployment identifier is that you also prevent race conditions when two web heads are not updated at the exact same time (e.g. a few seconds apart).

dawehner’s picture

Why not just deployment identifier?

Well, of course the deployment identifier works for people who are aware of it, but the fact that people run into this problem shows that they don't know about it.
This issue is potentially about solving the problem in a different way.

What you say can work too (the random hash would be cleared in a single shared location and would therefore affect all web heads), but what's the benefit?

At the moment the invalidation just happens on the one webhead that receives the cache clear. If we use cache/state, we share the invalidation between web heads automatically, so something like drush cr would actually work as expected.

msonnabaum’s picture

What dawehner proposes is very much what I had in mind. We need to think about this in terms of correctness first.

dawehner’s picture

Status: Active » Needs review
FileSize
2.52 KB

So here is a patch for it. I talked for a bit with @fabianx about it and he seemed to be in favour of it as well.

Status: Needs review » Needs work

The last submitted patch, 11: 2752961-11.patch, failed testing.

dawehner’s picture

Status: Needs work » Needs review
FileSize
3.09 KB
578 bytes

Let's quickly fix this failure.

Berdir’s picture

But the downside of that is that phpstorage now depends on cache/state?

We actually have a cache backend that uses phpstorage, if someone were to use that for bootstrap, that wouldn't end well :)

Yes, you have to know about deployment identifier. But if you do multi-head deployments, then you have to *know* stuff anyway. You have to know what to share, how to share, how to deploy/invalidate.

I think what most of those sites actually want for production, be it single or multi-head, is a phpstorage backend that isn't read only but ignores delete requests, because you know that it will not change unless you actually do a deployment. And then you can either use a deployment identifier or explicitly delete the files on deploy.
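
A sketch of what that could look like (not in core; the class and namespace are made up), selected through the existing $settings['php_storage'] mechanism:

namespace Drupal\mysite\PhpStorage;

use Drupal\Component\PhpStorage\MTimeProtectedFileStorage;

/**
 * PHP storage that ignores delete requests; files only change on deploy.
 */
class DeployOnlyFileStorage extends MTimeProtectedFileStorage {

  public function delete($name) {
    // Pretend the delete succeeded; the deploy process wipes the directory.
    return TRUE;
  }

  public function deleteAll() {
    return TRUE;
  }

}

It could then be selected with $settings['php_storage']['twig']['class'] = 'Drupal\mysite\PhpStorage\DeployOnlyFileStorage'; since PhpStorageFactory::get() honours a 'class' key in that configuration.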

The problem that I see with both this patch and the deployment identifier approach is that we don't actually delete the old files. Over time, won't that become a problem?

dawehner’s picture

I see a potential problem in that indeed.

Over time, won't that become a problem?

Mh, in order to solve that properly we could introduce a filename prefix that points to the deployment identifier or similar. That way we could clean up old files once they no longer match the current deployment identifier.

dawehner’s picture

Here is the alternative approach, which at least allows using the deployment identifier.

Status: Needs review » Needs work

The last submitted patch, 16: 2752961-16.patch, failed testing.

chr.fritsch’s picture

Status: Needs work » Needs review
FileSize
2.1 KB

I fixed the tests to bring back the attention to this issue.

Status: Needs review » Needs work

The last submitted patch, 18: no_reliable_method-2752961-18.patch, failed testing.

Berdir’s picture

Status: Needs work » Needs review

The earlier failures were random test fails.

anavarre’s picture

Any reviewer? It's a pretty stubborn issue with multiple web heads.

dawehner’s picture

--- a/sites/default/default.settings.php
+++ b/sites/default/default.settings.php
@@ -295,7 +295,7 @@
  * custom code that changes the container, changing this identifier will also
  * allow the container to be invalidated as soon as code is deployed.
  */
-# $settings['deployment_identifier'] = \Drupal::VERSION;
+$settings['deployment_identifier'] = \Drupal::VERSION;

One thing I'm wondering: does this mean we should have some default fallback for the deployment identifier to always contain the Drupal version? Maybe a dedicated method Settings::getDeploymentIdentifier() and use that by default?
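
Something like this hypothetical method (it does not exist in core) is what I have in mind:

// Hypothetical addition to \Drupal\Core\Site\Settings: fall back to the
// Drupal version when no deployment identifier is configured.
public static function getDeploymentIdentifier() {
  return static::get('deployment_identifier', \Drupal::VERSION);
}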

catch’s picture

https://api.drupal.org/api/drupal/core!lib!Drupal!Core!DrupalKernel.php/... uses the version string and the deployment identifier.

dawehner’s picture

Right, given that, should we simply leave default.settings.php alone and call it a day?

Cottser’s picture

@dawehner in other words leave default.settings.php as is, correct?

dawehner’s picture

@dawehner in other words leave default.settings.php as is, correct?

Yeah I would think so?

Version: 8.1.x-dev » 8.2.x-dev

Drupal 8.1.9 was released on September 7 and is the final bugfix release for the Drupal 8.1.x series. Drupal 8.1.x will not receive any further development aside from security fixes. Drupal 8.2.0-rc1 is now available and sites should prepare to upgrade to 8.2.0.

Bug reports should be targeted against the 8.2.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.3.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

xjm’s picture

Status: Needs review » Needs work
Issue tags: +Triaged D8 major

@alexpott, @cilefen, @catch, @effulgentsia, and I discussed this issue today and agreed that it is a major bug because it reduces scalability and can lead to invalid caches, fatals, etc. on multiple web heads.

I thought #2301163: Create a phpstorage backend that works with a local disk but coordinates across multiple webheads had been fixed and was going to reference that, but no such luck. This issue was spun off from that one. I started to move this to the theme system, but it looks like the fix is actually still for the PHP storage here.

Is there any way to provide some manner of test coverage here?

Also, sounds like the patch needs work to remove the default.settings.php hunk (or perhaps expand the documentation for the line instead of uncommenting it).

Thanks everyone!

bradjones1’s picture

Re-rolled.

Version: 8.2.x-dev » 8.3.x-dev

Drupal 8.2.6 was released on February 1, 2017 and is the final full bugfix release for the Drupal 8.2.x series. Drupal 8.2.x will not receive any further development aside from critical and security fixes. Sites should prepare to update to 8.3.0 on April 5, 2017. (Drupal 8.3.0-alpha1 is available for testing.)

Bug reports should be targeted against the 8.3.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.4.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

moshe weitzman’s picture

Yeah, I think some more documentation about the value of deployment_identifier would be nice.

I guess the site should increment it for each deployment? So, something like $settings['deployment_identifier'] = \Drupal::VERSION . '-deploy1'. The next time you deploy you change it to '-deploy2', and so on. Seems like a hassle compared to the State solution.

P.S. @anavarre et al - Acquia could use its settings.inc to automatically set a unique deployment_identifier.

moshe weitzman’s picture

Is there some other cache backend that twig could/should use in a multi web-head situation?

dawehner’s picture

In a good environment one could use environment variables to get this deployment identifier; see #2501901: Allow configuring the environment via an environment variable for a discussion about leveraging those.

bkosborne’s picture

As someone prepping to launch a D8 site on multiple web heads, it seems like there are just two current options to resolve this:

1. After any twig template has changed and has been deployed, SSH into each web server and clear the Twig cache: \Drupal\Core\PhpStorage\PhpStorageFactory::get("twig")->deleteAll();
2. Apply this patch above, then establish a method where the deployment identifier variable is changed every time you deploy updates to Twig templates.

Is that correct?

webflo’s picture

That is correct.

bkosborne’s picture

Heads up for anyone on Acquia: This patch alone won't work. In Acquia's settings.php file that's included on all sites, it sets $settings['php_storage']['twig']['secret'] = $drupal_hash_salt. Not sure why it does that when that's already what core does in PhpStorageFactory. Anyway, you need to unset that secret in your settings.php file so that the logic in PhpStorageFactory runs.

MKorostoff’s picture

Hey guys, just a thought, but would it be possible to make Drupal configurable such that twig cache can reside in a shared location, much like other caches? Given the option, I'd be very happy to store twig cache in a memcache server that is shared amongst all of my apache servers. Thanks!

MKorostoff’s picture

Here's what I don't understand about this issue: I currently see in my project a directory called sites/default/files/php/twig. In this directory is a bunch of files that sure look like they're the twig cache. This is significant because this directory is shared among all my apache servers. So my question is: if this directory isn't the twig cache, what is it? Am I understanding this accurately if I say that these files are the "compiled" output of twig templating, but the actual cache is elsewhere?

Berdir’s picture

Nope, that is the cache; there is nothing else. If they are in a shared directory, then everything is working correctly. But the performance isn't great, because writing and accessing those files is not as fast as it could be. Being able to do that in a local directory on each server, especially something that's actually tmpfs, would be faster.

This issue is about supporting such scenarios better.

You can even use local folders already if you include a deployment step that clears the folder on every server on deployment, as long as you don't have a scenario where configuration changes could result in *changed* compiled twig templates. There's nothing in core that does that AFAIK, but it's technically possible.

MKorostoff’s picture

@Berdir, thanks for the reply. So here's what I don't get: currently, Acquia hosting officially recommends SSHing into each individual server and clearing the twig cache. So if you have 15 apache servers, for the time being, that means opening and closing 15 ssh sessions. The thing is, every Acquia production instance has a filesystem that is shared by all of the web servers. So my question is: when I ssh into a server to clear the twig cache, what am I clearing? If I were only clearing the contents of sites/default/files/php/twig, wouldn't I just have to do it once on any randomly selected web server?

Berdir’s picture

If it is not in a local directory, then that is enough, yes. In fact, a normal full cache clear like drush cr will already do that as well.

bkosborne’s picture

@MKorostoff Acquia does not store the twig cache files on the shared filesystem; instead they are on temp file stores for each web head. I believe I saw a comment in the Acquia config file, where that's set, saying it's related to performance.

MKorostoff’s picture

@bkosborne thanks!

deviantintegral’s picture

would it be possible to make Drupal configurable such that twig cache can reside in a shared location, much like other caches? Given the option, I'd be very happy to store twig cache in a memcache server that is shared amongst all of my apache servers.

I did some investigation into this a few months back, as I was thinking the same thing. I'll copy my notes here.

  • My initial thoughts about using memcache likely won't pan out. The PHP opcache doesn't work with eval(), and skipping the opcache is a pretty significant performance hit, especially when sites will have hundreds or thousands of templates. One (sketchy) idea I had was to create a stream wrapper, but that adds a ton of extra code just to load a template.
  • This twig PR makes it clear that upstream twig will never support anything other than the filesystem for templates. There's an interface now instead of being hardcoded, but the answer seems to be "you're doing it wrong" if you can't use the filesystem for compiled templates.

The options I see are, ordered from least to most complex:

  • If we end up resolving this by having a shared temporary directory, then we could fairly easily map templates to temp. Unless we decided to back that with RAM, we'd still have some sort of persistent storage, in which case we might as well save the complexity and just use public://.
  • Precompile templates during a site build job. Symfony supports this, but Drupal doesn't. We'd have to include this as a part of all builds for all sites. We'd have to write a custom template loader as well. This is somewhat complex as it requires a bootstrapped Drupal to compile templates.
  • Compile at runtime, but without templates being shared across web servers. This could hurt performance after a deployment (each webhead would independently compile templates), and would only work if we could ensure temp is wiped out on all webheads after a deployment.
  • Store template compilations in memcache, and write a stream wrapper so compiled templates can be stored in the opcache. This is the most risky strategy - I'm not even sure if opcaches work with custom stream wrappers.

If I remember right, twig template files use the hash of the file contents in their names. In other words, there isn't much of a reason to clear template caches unless disk space is a concern. After a deployment, even if old templates are on the web nodes, any template changes should result in a different hash.

For this set of sites, we ended up writing and using https://www.drupal.org/project/twig_temp and not worrying about clearing templates as we were deploying with Docker, so filling up all the disk space was never an issue.

deviantintegral’s picture

Oh, and warming the template caches at build isn't an option either, even if you wrote a drush command for it, since #template can be used in render arrays to generate templates at runtime. You could warm all file-based templates, but dynamic templates would still be generated on a per-web-node basis.

Wim Leers’s picture

#42 is correct, Acquia Cloud Enterprise has multiple webheads, and to not hit the slow (because network latency) filesystem that is shared across webheads, it stores compiled Twig templates on the local (ephemeral) storage.

#44: Thanks so much for writing up your thought process! But I'm confused why some of the options you listed even made the list. Feedback below:

If we end up resolving this by having a shared temporary directory, then we could fairly easily map templates to temp. Unless we decided to back that with RAM, then we'd be having some sort of persistent storage, in which case we might as well save the complexity and just use public://.

We should not require the use of a shared temporary directory.

Precompile templates during a site build job. Symfony supports this, but Drupal doesn't. We'd have to include this as a part of all builds for all sites. We'd have to write a custom template loader as well. This is somewhat complex as it requires a bootstrapped Drupal to compile templates.

This is not possible, because in Drupal it's impossible to enumerate all Twig templates. This would be splendid though; it'd remove some deployment risks.

Compile at runtime, but without templates being shared across web servers. This could hurt performance after a deployment (each webhead would independently compile templates), and would only work if we could ensure temp is wiped out on all webheads after a deployment.

That is effectively what Acquia Cloud Enterprise does. Indeed the current work-around is to wipe temp on all webheads after deployment. However, an actual solution would be to do what this patch has been doing: ensure there is an identifier/hash/… that all webheads respect from a single source of truth (either State in the DB, or deployment_identifier in the code, see the discussion in #1 through #15).
Let's not forget this is first and foremost a synchronization/coordination problem.

Store template compilations in memcache, and write a stream wrapper so compiled templates can be stored in the opcache. This is the most risky strategy - I'm not even sure if opcaches work with custom stream wrappers.

Drupal can't require memcache. We could store it in the DB instead, but that'd just mean we'd be using the DB as a cache storage. Per \Drupal\Component\PhpStorage\PhpStorageInterface's docs, that'd be supported. But we'd be incurring network latency for every single file. The whole point is to avoid that latency.

If I remember right, twig template files use the hash of the file contents in their names. In other words, there isn't much of a reason to clear template caches unless disk space is a concern. After a deployment, even if old templates are on the web nodes, any template changes should result in a different hash.

… but the point of caching compiled Twig templates is that we don't need to hit the file system for every single template on every single request. You're right that we hash the contents, but we don't check if the original Twig template actually has been modified on every single request. And the compiled Twig templates are stored in PHP's OpCache, so we don't hit the file system at all.

The problem is that (as the IS says) after a deployment, drupal_flush_all_caches() is called, which calls PhpStorageFactory::get('twig')->deleteAll();, but that only affects the storage on the current machine. All other webheads still have the compiled Twig templates, and are blissfully unaware that Twig templates may have been updated.

So what we need, is some hash in State or the deployment_identifier in settings.php to be modified for every deployment, so that coordination across multiple webheads is possible.

(Getting back into this, so I hope I didn't make any mistakes/am not misremembering certain things.)

Wim Leers’s picture

Issue summary: View changes

Now, let's update the IS, because it's currently biased towards Acquia Cloud Enterprise: it portrays the problem as something that affects all Drupal 8 sites/hosting, which isn't true.

Also, let's list the proposed solutions that can work, and the concerns raised against them.

Hopefully that gets this issue back on track again :)

Berdir’s picture

#46 looks correct to me. Only comment is that IMHO, if the infrastructure supports it, I would recommend some kind of deployment job that clears the folder on all servers, rather than Drupal providing a changing identifier.

Because with such an identifier, the old files will not be removed until you do a cache clear in Drupal (and then only on one server), so there is a risk that you end up with a considerable number of files on some servers, which in turn can slow down accessing those files and writing new ones.

That apparently is a hard problem to solve (not sure why); platform.sh enterprise does not support that yet either.

Wim Leers’s picture

Issue summary: View changes

Good point, added that concern to the IS.

And funny enough, the Acquia Cloud ticket was updated less than an hour ago with information about an alternative solution, pioneered by Pantheon: https://github.com/pantheon-systems/drops-8/pull/182/files. It does this:

/**
 * Place Twig cache files in the Pantheon rolling temporary directory.
 * A fresh rolling temporary directory is provided on every code deploy,
 * guarenteeing that cached files that may be invalidated by code updates
 * are always fresh. Note that in the case of twig cache files, the
 * rendered results are also cached in the database, so a cache clear is
 * still necessary to see updated results after a code deploy.
 */
if (isset($_ENV['PANTHEON_ROLLING_TMP']) && isset($_ENV['PANTHEON_DEPLOYMENT_IDENTIFIER'])) {
  // Relocate the compiled twig files to <binding-dir>/tmp/ROLLING/twig.
  // The location of ROLLING will change with every deploy.
  $settings['php_storage']['twig']['directory'] = $_ENV['PANTHEON_ROLLING_TMP'];
  // Ensure that the compiled twig templates will be rebuilt whenever the
  // deployment identifier changes.  Note that a cache rebuild is also necessary.
  $settings['deployment_identifier'] = $_ENV['PANTHEON_DEPLOYMENT_IDENTIFIER'];
  $settings['php_storage']['twig']['secret'] = $_ENV['DRUPAL_HASH_SALT'] . $settings['deployment_identifier'];
}

As part of that approach, one could also delete the "old temp" on every deployment. It wouldn't introduce problems because it's not shared across webheads anyway, so there's no coordination problem. And it'd solve the "ever growing file system usage" problem described in #48 AFAICT.

Thoughts?

Does this make solution 2 in the IS (introducing Twig::invalidate(), which updates a hash/key/ID stored in State or Cache) more appealing again?

moshe weitzman’s picture

I support solution #2 for the reasons dawehner mentions in #9. It works for most people the way they would expect. I'm not overly bothered by the fact that invalidation relies on State.

pwolanin’s picture

Minor point, related to other fixes we made in the Twig system: we don't necessarily want a random value to check against, but rather a unique value each time (e.g. microtime() or uniqid()), since a truly random value can repeat (in theory).

greg.1.anderson’s picture

In the Pantheon solution, the `ROLLING` portion of the `PANTHEON_ROLLING_TMP` environment variable is simply the deployment identifier. As was previously mentioned, deleting the stale twig cache files is not critical, but not deleting them at all causes them to pile up over time. Pantheon solves this problem by deleting stale rolling temporary directories slowly over time in a backend scavenge process. The scavenger can look up the deployment identifier for each binding directory, so it knows to skip the directory that contains the active set of cache files. It also avoids deleting any directory that has been modified too recently, just for good measure. Drupal sites with very complex custom themes can have a very large number of twig cache files, which can sometimes take a significant amount of time to delete. Keeping scavenging out of the deployment process is therefore beneficial, as that's one less thing for deploys to have to wait for.

It seems to me that solving the multi-head cache scavenge problem in Drupal is not the most practical solution. The services that manage the different heads are better suited for cleanup; deleting a lot of twig files as part of a web request, similar to the way poor man's cron works for sites without cron, would be even worse than doing it at deploy time. If code to do this is added to core, it should be an opt-in service for folks to turn on. Since single-head already works, perhaps multi-head could be solved in documentation only.

@deviantintegral's https://www.drupal.org/project/twig_temp will suffice for some use cases as well.

Wim Leers’s picture

#50: I don't know PHPStorage well enough to know/predict what the consequences are of making it dependent on State. Hopefully @Berdir can chime in.

Since single-head already works, perhaps multi-head could be solved in documentation only.

+1!

Wim Leers’s picture

I just noticed that #2569119: Use Crypt::hashBase64(), not hash('crc32b') or sha1 for placeholder tokens made some changes to TwigEnvironment. The changes (and the code) are a bit confusing, but what matters is that there's now a twig_cache_prefix that is recalculated (and set to uniqid()) whenever:

  1. the Twig extension hash changes (which can only happen when the container is rebuilt and \Drupal\Core\DependencyInjection\Compiler\TwigExtensionPass discovers new or updated twig.extension-tagged services during that container rebuild), or
  2. the twig_extension_hash_prefix State entry does not exist (either because it's a fresh install, or because this key-value pair was deleted).

(The accompanying change record: https://www.drupal.org/node/2833264)

This means that proposed solution 2, which includes the introduction of Twig::invalidate(), could be as simple as this:

public function invalidate() {
  $this->state->delete('twig_extension_hash_prefix');
}
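
drupal_flush_all_caches() would then call it through the existing 'twig' service, roughly:

// Sketch: in drupal_flush_all_caches(), instead of (or in addition to)
// wiping the PHP storage on the local webhead only.
\Drupal::service('twig')->invalidate();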

Patch attached.

(I think this is the change that @pwolanin is referring to in #51?)

Wim Leers’s picture

FileSize
1.87 KB
3.16 KB

The twig_extension_hash_prefix key is a very confusing misnomer, because it contains more than just the extension hash prefix.

Let's at least move that to a class constant: that prevents bugs due to typos, and allows us to make its purpose clearer.

Wim Leers’s picture

Issue summary: View changes

#48's concern that files can linger, which was confirmed and elaborated on by #52, has been added to the IS by me in #49. I attributed it as a problem for solution 1. But really, it's a problem for both solutions.

IS updated.

Wim Leers’s picture

The patch in #54/#55 implements solution 2. But we could actually have it implement both solutions. Essentially by changing

new TwigPhpStorageCache($cache, $this->twigCachePrefix)

to:

new TwigPhpStorageCache($cache, $this->twigCachePrefix . Settings::get('deployment_identifier'))

The current solution still requires drupal_flush_all_caches() to be called to get updated Twig templates; we shouldn't require that.

In other words:

  1. #54/#55 basically fix the "Clear all caches" button at /admin/config/development/performance for multi-webhead setups
  2. but they still don't fix the "code deployment with updated Twig templates" problem

Also relying on the deployment identifier would fix that.

Thoughts?

Wim Leers’s picture

Also, per #56, we'll need documentation for multiwebhead hosting platforms, so it's clear what Drupal expects them to do/provide. And undoubtedly we want a change record to announce that to the world.

dawehner’s picture

  1. +++ b/core/lib/Drupal/Core/Template/TwigEnvironment.php
    @@ -63,7 +77,7 @@ public function __construct($root, CacheBackendInterface $cache, $twig_extension
         if ($options['cache'] === TRUE) {
    -      $current = $state->get('twig_extension_hash_prefix', ['twig_extension_hash' => '']);
    +      $current = $state->get(static::CACHE_PREFIX_METADATA_KEY, ['twig_extension_hash' => '']);
    

    So does that mean we have an uncached state query on every request in HEAD?

  2. +++ b/core/lib/Drupal/Core/Template/TwigEnvironment.php
    @@ -83,6 +97,15 @@ public function __construct($root, CacheBackendInterface $cache, $twig_extension
       /**
    +   * Invalidates all compiled Twig templates.
    +   *
    +   * @see \drupal_flush_all_caches
    +   */
    +  public function invalidate() {
    +    $this->state->delete(static::CACHE_PREFIX_METADATA_KEY);
    +  }
    +
    

    We should document here how this helps multi-webhead deployments, IMHO.

Wim Leers’s picture

#59:

  1. Yep, since #2569119: Use Crypt::hashBase64(), not hash('crc32b') or sha1 for placeholder tokens… Introduced by @pwolanin in #2569119-62: Use Crypt::hashBase64(), not hash('crc32b') or sha1 for placeholder tokens based on discussion with @alexpott apparently. Fabianx RTBC'd. Nobody thought of that. (Man, I really wish we had automated performance tests, that'd have caught this: #2308683: [meta] Create a 'Profiling' install profile, for testbot-powered simple profiling and easier local profiling.)
  2. Oh, yes, docs are still needed. For now, just asking for a review of the approach in #55 (which implements solution 2 in the IS), as well as what thoughts are about also implementing solution 1 as I proposed in #57.

dawehner’s picture

Sorry for not commenting on the actual fix. I think it solves the problem perfectly!

Crell’s picture

Hi, representative from a multi-webhead host here. :-)

If I'm following the proposal here properly, it's just a question of how we generate a unique value for each deploy/change to use, right? Either use a host-provided value or generate a random one on each cache clear (which would also happen after a deploy and thus pick up that use case, too)?

We (Platform.sh) have a git tree hash value we can trivially map into the settings array so option 1 would be fine with us. My concern is that it assumes that the templates will only ever change on a deploy; while in a logical world with most frameworks that's the case, it would preclude user-editable Twig files that get compiled like any other Twig file. I'm not sure if anyone is doing that yet, but it's conceptually something that I know we discussed on and off many times in the Panels/SCOTCH world. (Think, basically, Contemplate module but not terrible. Generate a Twig file based on user config, compile it, the compiled version gets used so there's no performance overhead and no PHP is stored in the DB so there's no security issue.)

That would imply option 2 is more robust, as it still leaves the door open for configuration-based changes to the compiled twig templates. Leveraging the deployment identifier as part of that seems fine though; in fact I'm going to go add that value to our standard project template now that I know it exists. :-)

The other question is cleanup. I agree that blocking a cache clear or delete on deleting many files, possibly over a network connection, is a bad idea. However, we shouldn't just punt it to hosts unless we give hosts a clear way to do so. If I read the current proposal correctly (and I may well not be), the prefix is just a file prefix; that means the cleanup process needs to "delete all files in $dir except those that begin with $value_in_the_state_system"? That seems error prone and inefficient, and the cleanup process would need access to Drupal anyway to look up that value. (Since, per above, commit hash alone is insufficient.)

Using the ID as a directory name rather than file prefix would help a little, but ideally we'd still allow an outside process to do the cleanup without needing to access Drupal systems. If we need that, then we may as well let Drupal do it itself in a cron task.

'course, now that I think about that, though... If the whole point is to let each web head have the cache on a *local* disk, is it really going to be that slow? Slow enough that we can't just do it inline on deploy and eat the extra second or two to rm a directory? (And for changes coming from config without deploy, meh, disk is cheap.)

I may be missing some thing here...

dawehner’s picture

Issue summary: View changes
Issue tags: -Needs change record

The more I think about it, the more I like Wim's solution. I updated the issue summary to explain what the status quo actually is and how the state solution solves the problem, especially because it doesn't worry about the cleanup step, for the reasons described in the issue summary. Not sure anyone would care about it, but in case someone does, there is always room for another issue; solving that here would just derail the solution we have.

Here is a change record: https://www.drupal.org/node/2879963
For a second I was wondering whether we should use \Twig_Environment::clearTemplateCache, but it turns out commit cacfb069b2e65d5487489677238142b464135b40 in Twig removed that capability; see https://github.com/twigphp/Twig/issues/1879 for context.

Berdir’s picture

Yeah, since we have the state key already, makes sense to build on top of that.

However, the patch completely removes deleting the actual files; we can't rely on some infrastructure process existing by default, can we? That's also going to be annoying for local debugging when you need to debug something in a generated twig file.

What about keeping the default delete for single-head server as a default but provide a subclass that deletes the state key instead?

dawehner’s picture

What about doing both for now? A multi-webhead project can still implement its own subclass, which does something better, if they want.

Crell’s picture

Question: How likely is a cache clear without redeploy to change a cached Twig file?

I noted above the potential for config-driven templates, but I don't know if anyone is doing that currently. Anyone else know?

Here's the challenge I see, as a host:

1) If a cache clear does not change compiled twig files, then the deploy ID is sufficient and totally easy. In which case the state ID is kinda irrelevant.

2) If cache clear does change compiled twig files and cache clears are frequent compared to deploys, then any clean up script MUST leverage the state ID and thus MUST run from within Drupal.

3) If cache clear does change compiled twig files and cache clears are not frequent compared to deploys, then it's safe enough for hosts to ignore the extra files and fall back to option 1.

So then the two questions are:

1) Do cache clears change compiled twig files (viz, is that a thing we want to support)
2) Are cache clears sufficiently frequent compared to deploys that we care about them being separate events?

greg.1.anderson’s picture

Cleanup in cron would be fine. Providing a default cleanup algorithm that can be overridden would also be fine. If supporting downloading twig templates or in-browser twig editing et al. is desired, then managing the deployment identifier inside Drupal makes sense.

I am wary about doing cleanup at the instant that the twig files are invalidated. If another process on the same web head was accessing the twig cache at exactly the same time, then you would thrash the cache. At the end of the day, the inefficiencies of this sort of thing are not going to be too great; however, in any cleanup scheme it would still behove you to avoid deleting files that might still be in use (i.e. by a process still holding the old deployment identifier in a local or static variable).

greg.1.anderson’s picture

Posted before I saw #66. From my standpoint, those questions are still open. However, it seems that the cost of managing the state ID from within Drupal is not great, whereas NOT managing this ID in Drupal prevents these scenarios from working smoothly on multi-webhead hosts.