Problem/Motivation

Most webservers have a maximum size for headers.

Proposed resolution

We could implement an (optional) hashing strategy that would significantly reduce the size of the headers that are sent out.

Acquia purge already includes a similar solution.

CommentFileSizeAuthor
#63 purge_cache_tags_minify-2952277-63.patch14.86 KBbceyssens
#62 purge_cache_tags_minify-2952277-62.patch16.25 KBbceyssens
#61 purge_cache_tags_minify-2952277-61.patch14.51 KBNickDJM
#57 purge_cache_tags_minify-2952277-57.patch14.32 KBrp7
#54 purge_cache_tags_minify-2952277-54.patch7.05 KBuros.ceh
#53 purge_cache_tags_minify-2952277-53.patch14.54 KBjmaties
#51 purge_cache_tags_restrict_tags-2952277.patch846 bytesastoker88
#49 purge_cache_tags_minify-2952277-49.patch7.44 KBastoker88
#48 purge_cache_tags_minify-2952277-48.patch5.32 KBastoker88
#45 3069291_9.3.patch32 KBmpp
#38 purge_cache_tags_minify-2952277-38.patch7.04 KBkiseleva.t
#37 interdiff-36-37.txt1.1 KBMiroslavBanov
#37 purge_cache_tags_minify-2952277-37.patch7.04 KBMiroslavBanov
#36 purge_cache_tags_minify-2952277-36.patch7.24 KBkiseleva.t
#31 2952277_31.patch14.92 KBseanB
#30 2952277_30.patch14.97 KBseanB
#28 2952277_28.patch14.85 KBmpp
#27 purge-2952277-27.patch14.88 KBvtcore
#25 purge-2952277-25.patch15.26 KBjonhattan
#24 purge-2952277-24.patch15.26 KBjonhattan
#2 2952277-cache-tag-minify-2.patch7.95 KBbucefal91
#4 2952277-cache-tag-minify-4.patch16.41 KBbucefal91
#4 interdiff-2-4.txt468 bytesbucefal91
#10 purge-2952277-10.patch13 KBGaëlG
#10 purge-2952277-4-10.interdiff.txt7.97 KBGaëlG
#10 interdiff-4-10.txt7.97 KBGaëlG
#15 purge-minify_cache_tags-2952277-15.patch13.71 KBrp7
#16 purge-minify_cache_tags-2952277-16.patch14.15 KBGaëlG
#16 purge-minify_cache_tags-2952277-16-15.interdiff.txt673 bytesGaëlG
#19 purge-2952277-19.patch15.16 KBndobromirov
#19 interdiff-2952277-16-19.patch4.26 KBndobromirov
#21 interdiff-2952277-19-21.patch377 bytesndobromirov
#21 purge-2952277-21.patch15.18 KBndobromirov
#22 purge-2952277-22.patch15.09 KBrp7
#22 interdiff-2952277-21-22.patch605 bytesrp7
#23 interdiff-2952277-22-23.patch2.52 KBjonhattan
#23 purge-2952277-23.patch15.26 KBjonhattan

Issue fork purge-2952277

Command icon Show commands

Start within a Git clone of the project using the version control instructions.

Or, if you do not have SSH keys set up on git.drupalcode.org:

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

mpp created an issue. See original summary.

bucefal91’s picture

hello!

Just recently I leveraged purge module for Varnish caching and we do hit the issue of too long headers (apache + mod_fcgi). The limit is 8kb and I think one has to re-compile apache in order to bump it (https://maxchadwick.xyz/blog/http-response-header-size-limit-with-mod-pr... , https://github.com/webdevops/TYPO3-metaseo/issues/233)

I read through #2844852: Minify the cache tags sent in the header and I understand the concerns of nielsvm. I still think running a minor risk of hash collision is better than running no Varnish on anonymous HTML :)

I would propose to introduce some kind of salt into the hash to cover the issues of collision. Considering https://www.drupal.org/project/acquia_purge/issues/2844852#comment-12513413 .. basically for the website developers is OK to have some collisions, it is very undesirable to have such a collision where one tag is updated very frequently and another is part of too many pages. Then it would invalidate too much of cache too frequently.
If developer encounters such case in his setup, he could just change the salt to kind of re-distribute the hashes in a new fashion. With certain (rather high) degree of luck, a new salt will change collisions and the "frequent" and the "popular" cache tags will no longer collide. It does require some thinking and analysis from the developer, but that's at least some kind of a solution. The salt could be kept in Drupal State API and we could expose some "reset salt" button on the admin UI.

Also, is there any particular reason md5() was used in https://www.drupal.org/project/acquia_purge/issues/2844852 ? I am by far not an expert in cryptography and hashing, but nowadays md5 used even in non-sensitive context will trigger warnings and questions upon a security audit. I checked, D8 core password hashing routine uses hash('sha512')

On a different note, there is also an alternative to pure "blind" hashing. Hitting 8kb limit is not so easy, after all, for a single page. Normally it should be some "family" of cache tags that "explode" the length of the header. In the case of https://www.drupal.org/project/acquia_purge/issues/2844852 it was taxonomy terms. In my case there are few such families: paragraph:*, node:*. If our real estate (header size) is limited, we should optimize its consumption. We could abbreviate (compress) the most common (at least in Drupal core) tags to 1 letter. Things like (using regexp syntax to make it shorter):

  • node:(\d+) => n:$1
  • config:(.+) => c:$1
  • block => b
  • taxonomy_term:(\d+) => t:$1
  • paragraph:(\d+) => p:$1

Such rather primitive string substitutions can significantly help in shrinking length of the header. Also, they do not obfuscate matters as badly as md5 (that's to consider #2890018: Introduce a debug mode for tags not to be minified in the HTTP headers). They also expose a risk of collisions, but I think the risk of undesired collisions as I explained them above is smaller with the substring substitution. The patch I attach takes this approach - I introduce a minification service (so both Cache-Tag header and Purger plugins get to use the same logic on minification without code duplication). I also introduce a hook to collect the substring replacement in cache tags - through this hook one could tailor the replacements to his own specific case (in case minifying core tags is not enough and his website has lots of custom cache-tags). The patch works quite well on my localhost. I am going to run a few more tests and will deploy it onto the server where the website I am working on is hosted.

And the last thought... If your page has more than 1k cache tags (8 bytes per cache tag) than there is nearly no way to solve it by any kind of manipulation/minimifaction. I guess fundamentally proper way is to have Drupal "orchestrate" cache tag => page URL relation in some kind of key-value storage. This would significantly increase amount of HTTP Ban requests and it would be wise to research if there's any limit on the BAN requests that most of proxy caches pose (Varnish, CloudFlare, Nginx, etc)

bucefal91’s picture

Status: Active » Needs review
bucefal91’s picture

More thorough testing reveals a few more sweat spots for minifying bytes :)

Quentin Massez’s picture

Hello,

I had a problem of too large header due to the size of cache-tags.
This patch solved my problem. Thanks !

I also noticed something on cache tags that seems to me unoptimized : I had both user_list and user:uid cache tags.
But it's not really related to this issue and I should probably open another one.

GaëlG’s picture

I also noticed something on cache tags that seems to me unoptimized : I had both user_list and user:uid cache tags.
But it's not really related to this issue and I should probably open another one.

#3001276: Remove individual entity cache tags when an entity list cache tag is here (no need for node:42 if node_list is here)

GaëlG’s picture

Status: Needs review » Reviewed & tested by the community

For now the minification from patch #4 seems to be enough for our use case (not that much tags). I change this to "reviewed" because of Quentin's feedback. Thank you @bucefal91!

If it happened not to be enough in the future, there could be another optimization: remove every "node:XX" tag when there's a "node_list" tag (and the same for terms, etc.). It seems like this cannot be done easily in core, as I just figured out in #3001276: Remove individual entity cache tags when an entity list cache tag is here (no need for node:42 if node_list is here).

bucefal91’s picture

I am glad my patch was useful to others too. From my side, let me also mention that we've been running my patch #4 in production (D8 + Purge + Varnish setup) for half a year now. No issues came up.

MiroslavBanov’s picture

1. Why is there the #2 patch file committed as part of the #4 patch?

2. Should we add a Null CacheTagMinificator implementation, at least for debugging purposes, in case someone suspects issues with the minification?

3. Also please note that acquia_purge and cloudflare are using hashing. I don't think hashing should be dismissed easily, I think it is very effective, and it might be implemented in a follow-up issue and become a popular alternative service implementation.

GaëlG’s picture

@bucefal As our problematic website continued to grow, we hit the max header size again, even with the minifier. So that I implemented what I told in #7, as an addition to #4.

@MiroslavBanov
1. It's presumably a mistake.

2. A Null CacheTagMinificator implementation might be useful, as a follow-up issue maybe?

3. Yes, a follow-up issue for hashing might be useful. Personally I like this minification method because in case of bugs, tags are still somewhat "human-readable".

GaëlG’s picture

MiroslavBanov’s picture

Agreed, #9.2 and #9.3 can be implemented in a follow-up issue.

Nice that we are making this effort to optimize. I don't think that the currently proposed solution matches my needs very effectively, but that can be fixed in follow-ups. Cheers!

Wim Leers’s picture

Issue tags: +D8 cacheability
Related issues: +#2491561: Port to Drupal 8

I did something similar a long time ago at #2491561: Port to Drupal 8.

almunnings’s picture

I noticed that my tags weren't getting minified after patch #10 when sent back to varnish for a BAN command.

Should the [invalidation:expression] token perhaps call in a minifyCacheTag() ?

rp7’s picture

Status: Reviewed & tested by the community » Needs review
FileSize
13.71 KB

I can confirm #14. After applying this patch, Varnish purges no longer work because the non-minified/simplified cache tags are sent in the BAN request.

The issue arises because the tokens [invalidations:separated_pipe], [invalidations:separated_comma] & [invalidations:separated_tab] don't use the minify/simplify services yet. Added the follow lines (having issues creating an interdiff) in purge_tokens_tokens():

\Drupal::service('purge.cache_tags_simplifier')->simplifyCacheTags($expressions);
$cache_tag_minificator = \Drupal::service('purge.cache_tag_minificator');
$expressions = array_map([$cache_tag_minificator, 'minifyCacheTag'], $expressions);

Updated patch attached.

GaëlG’s picture

@rp7 Nice catch! Here's an update of patch #15 so that it also works with the [invalidation:expression] token.

ndobromirov’s picture

Issue summary: View changes

This should handle menu_link_content_list as well.
Very often all pages that have headers and footers will have this and at least 10s of menu-links in there.

Maybe consider the shortening of:
menu_link_content:39 to either of:

  • m_l_c:39
  • mlc:39
  • ml:39

I think it's this module's responsibility to support all entities that are in core and can go into the tens of items on a page.

Please consider this core issue as well #2145751: Introduce ENTITY_TYPE_list:BUNDLE cache tag and add it to single bundle listing.

ndobromirov’s picture

Overall thanks for this initiative.
With a manual test of all of the above ideas, I've managed to shrink a particular page header from 1760 bytes to 680

Media items should be minified as well.
media:1 -> md:1, ma:1 or m:1.

ndobromirov’s picture

Here are the proposed changes for the menu_link_content entities and their list tag.
Extended new interfaces doc-blocks with high level descriptions and examples.
Small tweaks here and there.

Adding needs tests tag due to the fact that the 2 new services are not unit-tested.

ndobromirov’s picture

Status: Needs work » Needs review
ndobromirov’s picture

Here is another version that support media minification.
As medias are way more used compared to the menus.
I am switching the replacements a bit and now media: gets the m: token prefix.

rp7’s picture

Patch in #15 for fixing the invalidations tokens applies the simplifier to the tokens. This transforms [ 'node:123', 'node_list' ] to [ 'node_list' ], which is obviously incorrect. Responses can very well be cached with only the node:123 cache tag, but not the node_list cache tag.

Removed this line, so now only the minificator is used for invalidating.

jonhattan’s picture

Added this minifications for core:

'image.style.': => 'is.'
'responsive_images.styles.': => 'ris.',

Also, I think paragraph: should be out since it is not core, but anyway changed to paragraph to also match paragraph_wrapper. If we're going to add common contrib modules, page_manager deserves a shorten (page_manager_route_name:page_manager.page_v_home).

Also added a simplification for media_list.

jonhattan’s picture

Another typo (responsive_images -> responsive_image)

MiroslavBanov’s picture

The solution is obviously very needed and a step up for a lot of sites. But I see a problem here that there is no way to disable any of this, except in code.

vtcore’s picture

Rerolled #25 for the latest 8.x-3.x-dev (commit 19adf78)

mpp’s picture

Added missing newline in purge.services.yml and rerolled against latest dev.

willeaton’s picture

This might need a rerole, it returns patch errors now. Just commenting for now.

seanB’s picture

seanB’s picture

Sorry, #30 contains a mistake. This is a correct reroll.

adamfranco’s picture

Status: Needs review » Reviewed & tested by the community

I've been using #31 for several days after running into huge headers caused by the Group module's many cache tags (its many-to-many relationships add tags for all three of the group, the group-content mapping-entity, and the content entity). This patch and the custom hook implementations below have reduced my cache-tag headers from the 8KB-17KB range down to 300B-3KB, a drastic improvement. I think this patch strikes the right balance of doing just enough to be helpful out-of-the-box while still being extensible and customizable to fit special needs.

 /**
  * Implements hook_purge_cache_tag_minify_dictionary().
  */
function mymodule_purge_cache_tag_minify_dictionary() {
   // Note that the underlying implementation uses strtr() which replaces keys
   // starting with the longest replacement first, and doesn't re-replace within
   // the results.
   return [
     // Common configs.
     'config:user.role.anonymous' => 'c:u.r.anon',
     'config:user.role.authenticated' => 'c:u.r.auth',
     'config:filter.format.' => 'c:ffmt.',
     'config:easy_breadcrumb.settings' => 'c:eb.s',
     'config:field.storage.' => 'c:f.st.',
     'responsive_image.styles.responsive_' => 'ris.r_',

     // Taxonomy.
     'taxonomy_vocabulary' => 'tv',
     'taxonomy_term' => 'tt',

     // Block Visibility Groups.
     'block_visibility_group:' => 'bvg:',
     'config:block_visibility_groups.block_visibility_group.' => 'c:bvgs.',

     // Workbench Access.
     'config:workbench_access.access_scheme.' => 'c:wba.as.',

     // Group.
     'group' => 'g',
     'group:' => 'g:',
     'group_content:' => 'gc:',
     'group_content_list' => 'gcl',
     'group_content_list:' => 'gcl:',
     'group_content_list:entity:' => 'gcl:e:',
     'group_content_list:plugin:' => 'gcl:p:',
     'group_content_list:plugin:group_membership' => 'gcl:p:gm',
     'group_content_list:plugin:group_membership:' => 'gcl:p:gm:',
     'group_content_list:plugin:group_membership:entity:' => 'gcl:p:gm:e:',
     'group_content_list:plugin:group_membership:group:' => 'gcl:p:gm:g:',
     'group_content_list:plugin:group_node' => 'gcl:p:gn',
     'group_content_list:plugin:group_node:' => 'gcl:p:gn:',
     'entity:' => 'e:',
   ];
 }

 /**
  * Implements hook_purge_cache_tags_simplify_dictionary().
  */
 function mymodule_purge_cache_tags_simplify_dictionary() {
   return [
     // If group_list is in the tags, then the cache will be invalidated if any
     // group changes, so there is no need to also list individual group:xx tags.
     'group_list' => '/^group\:/',
   ];
 }

The only potential improvement I can think of would be to make minification a configurable option -- I'd always want minification myself and I think it is a good default, but maybe others have a use case for the raw cache-tags.

lhridley’s picture

Status: Reviewed & tested by the community » Needs work

Marking Needs Work, patch is failing testing.

idebr’s picture

#33 The tests have been failing since Aug 2017, see https://www.drupal.org/pift-ci-job/1595419. This issue still needs tests though, so the issue status change is valid regardless.

Lendude’s picture

Dev tests should be fixed now by #3035053: Dev builds are broken., so requeued #31 to see if that doesn't break anything existing

kiseleva.t’s picture

Status: Needs work » Needs review
FileSize
7.24 KB

I've applied the patch, but in some cases, the Cache-Tags header is still too big.
Changed that similar to the Acquia Purge module to use hashes.

MiroslavBanov’s picture

About patch #36, I actually contributed a bit regarding the hash function, but it is less optimal as it's a base32, and that means more collisions. Here is the base64 version of it. This has 16777216 unique hashes for a 4 character base64 hash, rather than the 1048576 for a 4 character base32 hash.

Of course, in terms of a solution it's still not perfect - there can be collisions, there is still no way to disable it, and no way to switch to a different minification approach.

kiseleva.t’s picture

hugronaphor’s picture

I confirm that #38 solved the issue for me.

adamfranco’s picture

Comment #9 MiroslavBanov
...
2. Should we add a Null CacheTagMinificator implementation, at least for debugging purposes, in case someone suspects issues with the minification?

Comment #37 MiroslavBanov
Of course, in terms of a solution it's still not perfect - there can be collisions, there is still no way to disable it, and no way to switch to a different minification approach.

Especially as this patch drifts away from human-readable key-minification into more opaque hashes, I'm feeling like maybe it is time to make the minifier a plugin that can be configured or swapped. @MiroslavBanov's idea of a null-minifier would allow disabling minification, a key-only minifier (patch #31) could be used by those for whom it is sufficient and want to avoid collisions, and a hashing minifier (patch #38) could be optimized for absolute minimum size at the expense of collisions causing potentially unnecessary purges. Making minification optional would hopefully also allow for easier real-world testing of the different strategies since site administrators could change the setting live (with a cache rebuild) rather than needing to patch code and redeploy.

fago’s picture

Running into this issue as well.

I think adding pluggability is not necessarsy at this mount when all we would like is an option to disable this during development. So let's simply add a config option or even just a setting for it?

Sutharsan’s picture

Status: Needs review » Needs work
+++ b/src/CacheTagMinificator.php
@@ -0,0 +1,23 @@
+  public function minifyCacheTag($cache_tag) {
+    $length = 4;

According to my calculation this hash length results in a high risk of collision.
Using the calculation at https://preshing.com/20110504/hash-collision-probabilities/
I come with the following:

4 char hash: 1638 hashes: chance of unique 1.097E-9
6 char hash: 1170 hashes: chance of unique 0.960
7 char hash: 1023 hashes: chance of unique 0.998
8 char hash: 909 hashes: chance of unique 0.99990

Based on this, I propose a hash length of 8 chars.

Disclaimer: I'm a statistics noob.

MiroslavBanov’s picture

Making the hash longer makes the minification less effective. Could be better to have the option to configure the length.
Also note that a collision is usually not the end of the world. Some cached items would just be invalidated when they shouldn't.

aspilicious’s picture

This saved my life :)

mpp’s picture

Status: Needs work » Needs review
FileSize
32 KB

Fixed merge conflict.

mpp’s picture

I created an issue fork to facilitate further development on this issue. A patch is available on https://git.drupalcode.org/project/purge/-/merge_requests/5.diff

It would be useful to continue the work there.

astoker88’s picture

Re-rolled the patch to work against latest branch - PHP 7.4 Drupal 9.2.6

astoker88’s picture

#48 is broken fixed up here. @mpp didn't see yours until after id posted, so #49 is just your diff!

BramDriesen’s picture

Interesting, I'll also test this in a couple of days :)

astoker88’s picture

Looking through the patch here, we are applying the cache tags minification to other types of invalidators.
We use a combination of "tag" and "path", the tag purger works, however just noticed that the path invalidator is always returning a blank value, causing a lack of invalidation.

Suggest we add a check so that the token replacement minification happens only when the type is tag, not path or any other scenario.

GaëlG’s picture

We still had problems on very long views pages (maps) when using latest patches. It's because the simplification logic has been removed
since patch #36. So I made a little module which adds it back: https://www.drupal.org/project/cache_tags_simplify

jmaties’s picture

I think you are right adamfranco in #40
I have mixed both patches (#31 and last) and make it configurable.

uros.ceh’s picture

Re-rolling the patch #38 for compatibility with 3.4 module release.

Status: Needs review » Needs work

The last submitted patch, 54: purge_cache_tags_minify-2952277-54.patch, failed testing. View results

Renrhaf’s picture

Patch works well, no more 502 errors because headers are too big for Nginx.
Thank you.

masipila’s picture

I'm testing (and reading) the patch #57.

+/**
+ * Implements hook_purge_cache_tags_simplify_dictionary().
+ */
+function purge_purge_cache_tags_simplify_dictionary() {
+  // Most common list cache tags.
+  return [
+    'config:block_list' => '/^config\:block\./',
+    'menu_link_content_list' => '/^menu_link_content\:/',
+    'media_list' => '/^media\:/',
+    'node_list' => '/^node\:/',
+    'file_list' => '/^file\:/',
+    'taxonomy_term_list' => '/^taxonomy_term\:/',
+    'user_list' => '/^user\:/',
+  ];
+}

Could somebody please elaborate once more what's the conceptual logic behind this?

Isn't it so that:

  • An individual node page, will only have node:42 cache tag
  • But a View that renders nodes will have also node_list (and node_list:<bundle>) so that whenever new nodes have been created, the Views that render these nodes will be invalidated.

So are we throwing away the node:xxx tags if and only if also the node_list tag is also present? In other words, the individual node pages would still have the node:xxx cache tags?

If my thinking-out-loud is correct here, can we please elaborate the documentation block of purge_purge_cache_tags_simplify_dictionary()?

Cheers,
Markus

afinnarn’s picture

I'm not sure how helpful this comment is, but I tried the patch in #54 and got the following error.

The website encountered an unexpected error. Please try again later.
ArgumentCountError: Too few arguments to function Drupal\purge\Plugin\Purge\TagsHeader\TagsHeaderBase::__construct(), 3 passed in /var/www/html/docroot/modules/contrib/acquia_purge/src/Plugin/Purge/TagsHeader/AcquiaCloudSiteHeader.php on line 41 and exactly 4 expected in Drupal\purge\Plugin\Purge\TagsHeader\TagsHeaderBase->__construct() (line 32 of modules/contrib/purge/src/Plugin/Purge/TagsHeader/TagsHeaderBase.php). 

I had to switch to other work and didn't have time to look into the error, but I will update the thread if I do. Posting since someone else might encounter the error or know how to fix in a patch.

afinnarn’s picture

Okay, the relevant patch in the Acquia Purge module is here and it says to re-roll into this patch: https://www.drupal.org/project/acquia_purge/issues/3157178

I will look into this now in my local dev environment.

NickDJM’s picture

Rerolled #57 for latest version.

bceyssens’s picture

I'm not a fan of the hashing solution as you have no idea what the content is. The dictionary looks like a lot of work to maintain.
As an alternative method I added compression strategy, keeping only the first letter of each word. It is fully automated and you could still retrieve the original cache tag.

As you should still have the option to disable minification logic completely, I changed the configuration form to a select.
Example minitication ratios (number of characters in the cache tags header):

  • None: 4030
  • Dictionary: 2337
  • Compress: 1300
  • Hash: 734
bceyssens’s picture

FileSize
14.86 KB

Replacing incorrect patch