Problem/Motivation

Most webservers have a maximum size for headers.

Proposed resolution

We could implement an (optional) hashing strategy that would significantly reduce the size of the headers that are sent out.

Acquia purge already includes a similar solution.

CommentFileSizeAuthor
#92 2952277-92.patch18.73 KBduaelfr
#80 purge-2952277-MR5--20251006-2.diff20.94 KBrecrit
#77 2952277-77.patch19.83 KBjelle_s
#76 interdiff.txt7.88 KBjelle_s
#76 2952277-76.patch381.79 KBjelle_s
#72 purge_cache_tags_minify-2952277-72.patch15.46 KBj_bekaert
#71 purge_cache_tags_minify-2952277-71.patch12.84 KBmichel.g
#70 purge_cache_tags_minify-2952277-70.patch12.38 KBmattbloomfield
#69 interdiff_66-69.txt7.93 KBdruplr
#69 purge_cache_tags_minify-2952277-69.patch7.22 KBdruplr
#66 purge_cache_tags_minify-2952277-66.patch7.35 KBmattbloomfield
#65 purge_cache_tags_minify-2952277-65.patch7.34 KBmattbloomfield
#63 purge_cache_tags_minify-2952277-63.patch14.86 KBbceyssens
#62 purge_cache_tags_minify-2952277-62.patch16.25 KBbceyssens
#61 purge_cache_tags_minify-2952277-61.patch14.51 KBnickdjm
#57 purge_cache_tags_minify-2952277-57.patch14.32 KBrp7
#54 purge_cache_tags_minify-2952277-54.patch7.05 KBuros.ceh
#53 purge_cache_tags_minify-2952277-53.patch14.54 KBjmaties
#51 purge_cache_tags_restrict_tags-2952277.patch846 bytesastoker88
#49 purge_cache_tags_minify-2952277-49.patch7.44 KBastoker88
#48 purge_cache_tags_minify-2952277-48.patch5.32 KBastoker88
#45 3069291_9.3.patch32 KBmpp
#38 purge_cache_tags_minify-2952277-38.patch7.04 KBkiseleva.t
#37 interdiff-36-37.txt1.1 KBmiroslavbanov
#37 purge_cache_tags_minify-2952277-37.patch7.04 KBmiroslavbanov
#36 purge_cache_tags_minify-2952277-36.patch7.24 KBkiseleva.t
#31 2952277_31.patch14.92 KBseanb
#30 2952277_30.patch14.97 KBseanb
#28 2952277_28.patch14.85 KBmpp
#27 purge-2952277-27.patch14.88 KBvtcore
#25 purge-2952277-25.patch15.26 KBjonhattan
#24 purge-2952277-24.patch15.26 KBjonhattan
#2 2952277-cache-tag-minify-2.patch7.95 KBbucefal91
#4 2952277-cache-tag-minify-4.patch16.41 KBbucefal91
#4 interdiff-2-4.txt468 bytesbucefal91
#10 purge-2952277-10.patch13 KBgaëlg
#10 purge-2952277-4-10.interdiff.txt7.97 KBgaëlg
#10 interdiff-4-10.txt7.97 KBgaëlg
#15 purge-minify_cache_tags-2952277-15.patch13.71 KBrp7
#16 purge-minify_cache_tags-2952277-16.patch14.15 KBgaëlg
#16 purge-minify_cache_tags-2952277-16-15.interdiff.txt673 bytesgaëlg
#19 purge-2952277-19.patch15.16 KBndobromirov
#19 interdiff-2952277-16-19.patch4.26 KBndobromirov
#21 interdiff-2952277-19-21.patch377 bytesndobromirov
#21 purge-2952277-21.patch15.18 KBndobromirov
#22 purge-2952277-22.patch15.09 KBrp7
#22 interdiff-2952277-21-22.patch605 bytesrp7
#23 interdiff-2952277-22-23.patch2.52 KBjonhattan
#23 purge-2952277-23.patch15.26 KBjonhattan

Issue fork purge-2952277

Command icon Show commands

Start within a Git clone of the project using the version control instructions.

Or, if you do not have SSH keys set up on git.drupalcode.org:

Comments

mpp created an issue. See original summary.

bucefal91’s picture

StatusFileSize
new7.95 KB

hello!

Just recently I leveraged purge module for Varnish caching and we do hit the issue of too long headers (apache + mod_fcgi). The limit is 8kb and I think one has to re-compile apache in order to bump it (https://maxchadwick.xyz/blog/http-response-header-size-limit-with-mod-pr... , https://github.com/webdevops/TYPO3-metaseo/issues/233)

I read through #2844852: Minify the cache tags sent in the header and I understand the concerns of nielsvm. I still think running a minor risk of hash collision is better than running no Varnish on anonymous HTML :)

I would propose to introduce some kind of salt into the hash to cover the issues of collision. Considering https://www.drupal.org/project/acquia_purge/issues/2844852#comment-12513413 .. basically for the website developers is OK to have some collisions, it is very undesirable to have such a collision where one tag is updated very frequently and another is part of too many pages. Then it would invalidate too much of cache too frequently.
If developer encounters such case in his setup, he could just change the salt to kind of re-distribute the hashes in a new fashion. With certain (rather high) degree of luck, a new salt will change collisions and the "frequent" and the "popular" cache tags will no longer collide. It does require some thinking and analysis from the developer, but that's at least some kind of a solution. The salt could be kept in Drupal State API and we could expose some "reset salt" button on the admin UI.

Also, is there any particular reason md5() was used in https://www.drupal.org/project/acquia_purge/issues/2844852 ? I am by far not an expert in cryptography and hashing, but nowadays md5 used even in non-sensitive context will trigger warnings and questions upon a security audit. I checked, D8 core password hashing routine uses hash('sha512')

On a different note, there is also an alternative to pure "blind" hashing. Hitting 8kb limit is not so easy, after all, for a single page. Normally it should be some "family" of cache tags that "explode" the length of the header. In the case of https://www.drupal.org/project/acquia_purge/issues/2844852 it was taxonomy terms. In my case there are few such families: paragraph:*, node:*. If our real estate (header size) is limited, we should optimize its consumption. We could abbreviate (compress) the most common (at least in Drupal core) tags to 1 letter. Things like (using regexp syntax to make it shorter):

  • node:(\d+) => n:$1
  • config:(.+) => c:$1
  • block => b
  • taxonomy_term:(\d+) => t:$1
  • paragraph:(\d+) => p:$1

Such rather primitive string substitutions can significantly help in shrinking length of the header. Also, they do not obfuscate matters as badly as md5 (that's to consider #2890018: Introduce a debug mode for tags not to be minified in the HTTP headers). They also expose a risk of collisions, but I think the risk of undesired collisions as I explained them above is smaller with the substring substitution. The patch I attach takes this approach - I introduce a minification service (so both Cache-Tag header and Purger plugins get to use the same logic on minification without code duplication). I also introduce a hook to collect the substring replacement in cache tags - through this hook one could tailor the replacements to his own specific case (in case minifying core tags is not enough and his website has lots of custom cache-tags). The patch works quite well on my localhost. I am going to run a few more tests and will deploy it onto the server where the website I am working on is hosted.

And the last thought... If your page has more than 1k cache tags (8 bytes per cache tag) than there is nearly no way to solve it by any kind of manipulation/minimifaction. I guess fundamentally proper way is to have Drupal "orchestrate" cache tag => page URL relation in some kind of key-value storage. This would significantly increase amount of HTTP Ban requests and it would be wise to research if there's any limit on the BAN requests that most of proxy caches pose (Varnish, CloudFlare, Nginx, etc)

bucefal91’s picture

Status: Active » Needs review
bucefal91’s picture

StatusFileSize
new16.41 KB
new468 bytes

More thorough testing reveals a few more sweat spots for minifying bytes :)

quentin massez’s picture

Hello,

I had a problem of too large header due to the size of cache-tags.
This patch solved my problem. Thanks !

I also noticed something on cache tags that seems to me unoptimized : I had both user_list and user:uid cache tags.
But it's not really related to this issue and I should probably open another one.

gaëlg’s picture

I also noticed something on cache tags that seems to me unoptimized : I had both user_list and user:uid cache tags.
But it's not really related to this issue and I should probably open another one.

#3001276: Remove individual entity cache tags when an entity list cache tag is here (no need for node:42 if node_list is here)

gaëlg’s picture

Status: Needs review » Reviewed & tested by the community

For now the minification from patch #4 seems to be enough for our use case (not that much tags). I change this to "reviewed" because of Quentin's feedback. Thank you @bucefal91!

If it happened not to be enough in the future, there could be another optimization: remove every "node:XX" tag when there's a "node_list" tag (and the same for terms, etc.). It seems like this cannot be done easily in core, as I just figured out in #3001276: Remove individual entity cache tags when an entity list cache tag is here (no need for node:42 if node_list is here).

bucefal91’s picture

I am glad my patch was useful to others too. From my side, let me also mention that we've been running my patch #4 in production (D8 + Purge + Varnish setup) for half a year now. No issues came up.

miroslavbanov’s picture

1. Why is there the #2 patch file committed as part of the #4 patch?

2. Should we add a Null CacheTagMinificator implementation, at least for debugging purposes, in case someone suspects issues with the minification?

3. Also please note that acquia_purge and cloudflare are using hashing. I don't think hashing should be dismissed easily, I think it is very effective, and it might be implemented in a follow-up issue and become a popular alternative service implementation.

gaëlg’s picture

StatusFileSize
new13 KB
new7.97 KB
new7.97 KB

@bucefal As our problematic website continued to grow, we hit the max header size again, even with the minifier. So that I implemented what I told in #7, as an addition to #4.

@MiroslavBanov
1. It's presumably a mistake.

2. A Null CacheTagMinificator implementation might be useful, as a follow-up issue maybe?

3. Yes, a follow-up issue for hashing might be useful. Personally I like this minification method because in case of bugs, tags are still somewhat "human-readable".

gaëlg’s picture

miroslavbanov’s picture

Agreed, #9.2 and #9.3 can be implemented in a follow-up issue.

Nice that we are making this effort to optimize. I don't think that the currently proposed solution matches my needs very effectively, but that can be fixed in follow-ups. Cheers!

wim leers’s picture

Issue tags: +D8 cacheability
Related issues: +#2491561: Port to Drupal 8

I did something similar a long time ago at #2491561: Port to Drupal 8.

almunnings’s picture

I noticed that my tags weren't getting minified after patch #10 when sent back to varnish for a BAN command.

Should the [invalidation:expression] token perhaps call in a minifyCacheTag() ?

rp7’s picture

Status: Reviewed & tested by the community » Needs review
StatusFileSize
new13.71 KB

I can confirm #14. After applying this patch, Varnish purges no longer work because the non-minified/simplified cache tags are sent in the BAN request.

The issue arises because the tokens [invalidations:separated_pipe], [invalidations:separated_comma] & [invalidations:separated_tab] don't use the minify/simplify services yet. Added the follow lines (having issues creating an interdiff) in purge_tokens_tokens():

\Drupal::service('purge.cache_tags_simplifier')->simplifyCacheTags($expressions);
$cache_tag_minificator = \Drupal::service('purge.cache_tag_minificator');
$expressions = array_map([$cache_tag_minificator, 'minifyCacheTag'], $expressions);

Updated patch attached.

gaëlg’s picture

@rp7 Nice catch! Here's an update of patch #15 so that it also works with the [invalidation:expression] token.

ndobromirov’s picture

Issue summary: View changes

This should handle menu_link_content_list as well.
Very often all pages that have headers and footers will have this and at least 10s of menu-links in there.

Maybe consider the shortening of:
menu_link_content:39 to either of:

  • m_l_c:39
  • mlc:39
  • ml:39

I think it's this module's responsibility to support all entities that are in core and can go into the tens of items on a page.

Please consider this core issue as well #2145751: Introduce ENTITY_TYPE_list:BUNDLE cache tag and add it to single bundle listing.

ndobromirov’s picture

Overall thanks for this initiative.
With a manual test of all of the above ideas, I've managed to shrink a particular page header from 1760 bytes to 680

Media items should be minified as well.
media:1 -> md:1, ma:1 or m:1.

ndobromirov’s picture

Status: Needs review » Needs work
Issue tags: +Needs tests
StatusFileSize
new15.16 KB
new4.26 KB

Here are the proposed changes for the menu_link_content entities and their list tag.
Extended new interfaces doc-blocks with high level descriptions and examples.
Small tweaks here and there.

Adding needs tests tag due to the fact that the 2 new services are not unit-tested.

ndobromirov’s picture

Status: Needs work » Needs review
ndobromirov’s picture

StatusFileSize
new377 bytes
new15.18 KB

Here is another version that support media minification.
As medias are way more used compared to the menus.
I am switching the replacements a bit and now media: gets the m: token prefix.

rp7’s picture

StatusFileSize
new15.09 KB
new605 bytes

Patch in #15 for fixing the invalidations tokens applies the simplifier to the tokens. This transforms [ 'node:123', 'node_list' ] to [ 'node_list' ], which is obviously incorrect. Responses can very well be cached with only the node:123 cache tag, but not the node_list cache tag.

Removed this line, so now only the minificator is used for invalidating.

jonhattan’s picture

StatusFileSize
new15.26 KB
new2.52 KB

Added this minifications for core:

'image.style.': => 'is.'
'responsive_images.styles.': => 'ris.',

Also, I think paragraph: should be out since it is not core, but anyway changed to paragraph to also match paragraph_wrapper. If we're going to add common contrib modules, page_manager deserves a shorten (page_manager_route_name:page_manager.page_v_home).

Also added a simplification for media_list.

jonhattan’s picture

StatusFileSize
new15.26 KB

There was a typo in my last patch

jonhattan’s picture

StatusFileSize
new15.26 KB

Another typo (responsive_images -> responsive_image)

miroslavbanov’s picture

The solution is obviously very needed and a step up for a lot of sites. But I see a problem here that there is no way to disable any of this, except in code.

vtcore’s picture

StatusFileSize
new14.88 KB

Rerolled #25 for the latest 8.x-3.x-dev (commit 19adf78)

mpp’s picture

StatusFileSize
new14.85 KB

Added missing newline in purge.services.yml and rerolled against latest dev.

willeaton’s picture

This might need a rerole, it returns patch errors now. Just commenting for now.

seanb’s picture

StatusFileSize
new14.97 KB

Rerolled.

seanb’s picture

StatusFileSize
new14.92 KB

Sorry, #30 contains a mistake. This is a correct reroll.

adamfranco’s picture

Status: Needs review » Reviewed & tested by the community

I've been using #31 for several days after running into huge headers caused by the Group module's many cache tags (its many-to-many relationships add tags for all three of the group, the group-content mapping-entity, and the content entity). This patch and the custom hook implementations below have reduced my cache-tag headers from the 8KB-17KB range down to 300B-3KB, a drastic improvement. I think this patch strikes the right balance of doing just enough to be helpful out-of-the-box while still being extensible and customizable to fit special needs.

 /**
  * Implements hook_purge_cache_tag_minify_dictionary().
  */
function mymodule_purge_cache_tag_minify_dictionary() {
   // Note that the underlying implementation uses strtr() which replaces keys
   // starting with the longest replacement first, and doesn't re-replace within
   // the results.
   return [
     // Common configs.
     'config:user.role.anonymous' => 'c:u.r.anon',
     'config:user.role.authenticated' => 'c:u.r.auth',
     'config:filter.format.' => 'c:ffmt.',
     'config:easy_breadcrumb.settings' => 'c:eb.s',
     'config:field.storage.' => 'c:f.st.',
     'responsive_image.styles.responsive_' => 'ris.r_',

     // Taxonomy.
     'taxonomy_vocabulary' => 'tv',
     'taxonomy_term' => 'tt',

     // Block Visibility Groups.
     'block_visibility_group:' => 'bvg:',
     'config:block_visibility_groups.block_visibility_group.' => 'c:bvgs.',

     // Workbench Access.
     'config:workbench_access.access_scheme.' => 'c:wba.as.',

     // Group.
     'group' => 'g',
     'group:' => 'g:',
     'group_content:' => 'gc:',
     'group_content_list' => 'gcl',
     'group_content_list:' => 'gcl:',
     'group_content_list:entity:' => 'gcl:e:',
     'group_content_list:plugin:' => 'gcl:p:',
     'group_content_list:plugin:group_membership' => 'gcl:p:gm',
     'group_content_list:plugin:group_membership:' => 'gcl:p:gm:',
     'group_content_list:plugin:group_membership:entity:' => 'gcl:p:gm:e:',
     'group_content_list:plugin:group_membership:group:' => 'gcl:p:gm:g:',
     'group_content_list:plugin:group_node' => 'gcl:p:gn',
     'group_content_list:plugin:group_node:' => 'gcl:p:gn:',
     'entity:' => 'e:',
   ];
 }

 /**
  * Implements hook_purge_cache_tags_simplify_dictionary().
  */
 function mymodule_purge_cache_tags_simplify_dictionary() {
   return [
     // If group_list is in the tags, then the cache will be invalidated if any
     // group changes, so there is no need to also list individual group:xx tags.
     'group_list' => '/^group\:/',
   ];
 }

The only potential improvement I can think of would be to make minification a configurable option -- I'd always want minification myself and I think it is a good default, but maybe others have a use case for the raw cache-tags.

lisa.rae’s picture

Status: Reviewed & tested by the community » Needs work

Marking Needs Work, patch is failing testing.

idebr’s picture

#33 The tests have been failing since Aug 2017, see https://www.drupal.org/pift-ci-job/1595419. This issue still needs tests though, so the issue status change is valid regardless.

lendude’s picture

Dev tests should be fixed now by #3035053: Dev builds are broken., so requeued #31 to see if that doesn't break anything existing

kiseleva.t’s picture

Status: Needs work » Needs review
StatusFileSize
new7.24 KB

I've applied the patch, but in some cases, the Cache-Tags header is still too big.
Changed that similar to the Acquia Purge module to use hashes.

miroslavbanov’s picture

About patch #36, I actually contributed a bit regarding the hash function, but it is less optimal as it's a base32, and that means more collisions. Here is the base64 version of it. This has 16777216 unique hashes for a 4 character base64 hash, rather than the 1048576 for a 4 character base32 hash.

Of course, in terms of a solution it's still not perfect - there can be collisions, there is still no way to disable it, and no way to switch to a different minification approach.

kiseleva.t’s picture

StatusFileSize
new7.04 KB

Fixed small syntax error in patch #37 (appears in #36).

hugronaphor’s picture

I confirm that #38 solved the issue for me.

adamfranco’s picture

Comment #9 MiroslavBanov
...
2. Should we add a Null CacheTagMinificator implementation, at least for debugging purposes, in case someone suspects issues with the minification?

Comment #37 MiroslavBanov
Of course, in terms of a solution it's still not perfect - there can be collisions, there is still no way to disable it, and no way to switch to a different minification approach.

Especially as this patch drifts away from human-readable key-minification into more opaque hashes, I'm feeling like maybe it is time to make the minifier a plugin that can be configured or swapped. @MiroslavBanov's idea of a null-minifier would allow disabling minification, a key-only minifier (patch #31) could be used by those for whom it is sufficient and want to avoid collisions, and a hashing minifier (patch #38) could be optimized for absolute minimum size at the expense of collisions causing potentially unnecessary purges. Making minification optional would hopefully also allow for easier real-world testing of the different strategies since site administrators could change the setting live (with a cache rebuild) rather than needing to patch code and redeploy.

fago’s picture

Running into this issue as well.

I think adding pluggability is not necessarsy at this mount when all we would like is an option to disable this during development. So let's simply add a config option or even just a setting for it?

sutharsan’s picture

Status: Needs review » Needs work
+++ b/src/CacheTagMinificator.php
@@ -0,0 +1,23 @@
+  public function minifyCacheTag($cache_tag) {
+    $length = 4;

According to my calculation this hash length results in a high risk of collision.
Using the calculation at https://preshing.com/20110504/hash-collision-probabilities/
I come with the following:

4 char hash: 1638 hashes: chance of unique 1.097E-9
6 char hash: 1170 hashes: chance of unique 0.960
7 char hash: 1023 hashes: chance of unique 0.998
8 char hash: 909 hashes: chance of unique 0.99990

Based on this, I propose a hash length of 8 chars.

Disclaimer: I'm a statistics noob.

miroslavbanov’s picture

Making the hash longer makes the minification less effective. Could be better to have the option to configure the length.
Also note that a collision is usually not the end of the world. Some cached items would just be invalidated when they shouldn't.

aspilicious’s picture

This saved my life :)

mpp’s picture

Status: Needs work » Needs review
StatusFileSize
new32 KB

Fixed merge conflict.

mpp’s picture

I created an issue fork to facilitate further development on this issue. A patch is available on https://git.drupalcode.org/project/purge/-/merge_requests/5.diff

It would be useful to continue the work there.

astoker88’s picture

StatusFileSize
new5.32 KB

Re-rolled the patch to work against latest branch - PHP 7.4 Drupal 9.2.6

astoker88’s picture

StatusFileSize
new7.44 KB

#48 is broken fixed up here. @mpp didn't see yours until after id posted, so #49 is just your diff!

bramdriesen’s picture

Interesting, I'll also test this in a couple of days :)

astoker88’s picture

StatusFileSize
new846 bytes

Looking through the patch here, we are applying the cache tags minification to other types of invalidators.
We use a combination of "tag" and "path", the tag purger works, however just noticed that the path invalidator is always returning a blank value, causing a lack of invalidation.

Suggest we add a check so that the token replacement minification happens only when the type is tag, not path or any other scenario.

gaëlg’s picture

We still had problems on very long views pages (maps) when using latest patches. It's because the simplification logic has been removed
since patch #36. So I made a little module which adds it back: https://www.drupal.org/project/cache_tags_simplify

jmaties’s picture

StatusFileSize
new14.54 KB

I think you are right adamfranco in #40
I have mixed both patches (#31 and last) and make it configurable.

uros.ceh’s picture

StatusFileSize
new7.05 KB

Re-rolling the patch #38 for compatibility with 3.4 module release.

Status: Needs review » Needs work

The last submitted patch, 54: purge_cache_tags_minify-2952277-54.patch, failed testing. View results

renrhaf’s picture

Patch works well, no more 502 errors because headers are too big for Nginx.
Thank you.

rp7’s picture

StatusFileSize
new14.32 KB

A few changes to patch in #53:

  • Cleaned up the settings form
  • Added config schema definition for the simplification_logic config property
masipila’s picture

I'm testing (and reading) the patch #57.

+/**
+ * Implements hook_purge_cache_tags_simplify_dictionary().
+ */
+function purge_purge_cache_tags_simplify_dictionary() {
+  // Most common list cache tags.
+  return [
+    'config:block_list' => '/^config\:block\./',
+    'menu_link_content_list' => '/^menu_link_content\:/',
+    'media_list' => '/^media\:/',
+    'node_list' => '/^node\:/',
+    'file_list' => '/^file\:/',
+    'taxonomy_term_list' => '/^taxonomy_term\:/',
+    'user_list' => '/^user\:/',
+  ];
+}

Could somebody please elaborate once more what's the conceptual logic behind this?

Isn't it so that:

  • An individual node page, will only have node:42 cache tag
  • But a View that renders nodes will have also node_list (and node_list:<bundle>) so that whenever new nodes have been created, the Views that render these nodes will be invalidated.

So are we throwing away the node:xxx tags if and only if also the node_list tag is also present? In other words, the individual node pages would still have the node:xxx cache tags?

If my thinking-out-loud is correct here, can we please elaborate the documentation block of purge_purge_cache_tags_simplify_dictionary()?

Cheers,
Markus

afinnarn’s picture

I'm not sure how helpful this comment is, but I tried the patch in #54 and got the following error.

The website encountered an unexpected error. Please try again later.
ArgumentCountError: Too few arguments to function Drupal\purge\Plugin\Purge\TagsHeader\TagsHeaderBase::__construct(), 3 passed in /var/www/html/docroot/modules/contrib/acquia_purge/src/Plugin/Purge/TagsHeader/AcquiaCloudSiteHeader.php on line 41 and exactly 4 expected in Drupal\purge\Plugin\Purge\TagsHeader\TagsHeaderBase->__construct() (line 32 of modules/contrib/purge/src/Plugin/Purge/TagsHeader/TagsHeaderBase.php). 

I had to switch to other work and didn't have time to look into the error, but I will update the thread if I do. Posting since someone else might encounter the error or know how to fix in a patch.

afinnarn’s picture

Okay, the relevant patch in the Acquia Purge module is here and it says to re-roll into this patch: https://www.drupal.org/project/acquia_purge/issues/3157178

I will look into this now in my local dev environment.

nickdjm’s picture

StatusFileSize
new14.51 KB

Rerolled #57 for latest version.

bceyssens’s picture

StatusFileSize
new16.25 KB

I'm not a fan of the hashing solution as you have no idea what the content is. The dictionary looks like a lot of work to maintain.
As an alternative method I added compression strategy, keeping only the first letter of each word. It is fully automated and you could still retrieve the original cache tag.

As you should still have the option to disable minification logic completely, I changed the configuration form to a select.
Example minitication ratios (number of characters in the cache tags header):

  • None: 4030
  • Dictionary: 2337
  • Compress: 1300
  • Hash: 734
bceyssens’s picture

StatusFileSize
new14.86 KB

Replacing incorrect patch

swatigarg06’s picture

I tested patch #63 and it seems to be failing for 8.x-3.4.

mattbloomfield’s picture

I rerolled an older patch for the new version of the module.

mattbloomfield’s picture

StatusFileSize
new7.35 KB

Please disregard the last one, here's a new patch

andypost’s picture

dasginganinja’s picture

I just went through the past issues and tried to get an understanding of what's going on in this thread. Hopefully this summary helps guide this issue towards an actual patch.

I was starting with patch #49.

There is a MR for this thread that is based on #49 available here: Merge Request !5

Notes for #49 patch / #66
- Patch by default uses MD5 with first 4 characters in #49 and what is expected with this patch for those who has applied it
- Patch 66 seems like it's #49 version rerolled
- Patch #51 can be added to #49 to limit it to just being tags. This is an issue with path invalidation cases and should be applied.
- This does not have any configuration / schema

New functionality / schema changes in #61/63
- Seems like there is a lot of configurability in these -- good work!
- 61 vs 63 was the latest comparison for the _NEW_ work for alternate methods of shortening tags
- Both contain dictionary and MD5 approaches
- 63 has a new "compress" function as well as all the previous work (except #50?)
- 63 changes from simplification_logic to minification_logic configuration key. This means that anybody on simplification_logic key will need to migrate to that. Perhaps we fall back to the previous simplification_logic key?
- 63 will allow the original tag through if there is no config or it is set to null
- 61 will default to md5 if the simplification_logic config is not present

Missing functionality overlooked in #51 as this logic applies to path type expressions as well. This should be incorporated into all of these patches.

I think it would be great if #61 / #63 could be combined into a similar request, using the simplification_logic key. The code in #63 is a lot easier to read FWIW.

I hope this helps move things forward.

druplr’s picture

StatusFileSize
new7.22 KB
new7.93 KB

Here is @mattbloomfield's patch #66 (which as @dasginganinja indicated, is #49 rerolled) with whitespaces corrected. This does not include the new functionality / schema changes in #61/63.

mattbloomfield’s picture

StatusFileSize
new12.38 KB

Thanks druplr. I used your patch to reroll #61. I saw #63 was failing, so I'm not sure what we might want to do with that.

michel.g’s picture

StatusFileSize
new12.84 KB

Add a check on TagInvalidation because this code was also replacing tokens on other validators (e.g. URL)

j_bekaert’s picture

StatusFileSize
new15.46 KB

Combination of #71 with the added functionality of #63.

prudloff made their first commit to this issue’s fork.

prudloff’s picture

I updated the MR with the latest patch from comment 72.
We still need to add tests.

I also feel like this is getting hard to review.
It might be best to split this into several smaller issues (one for the base system and one for each minification strategy for example).

jelle_s’s picture

It seems like #52 got overlooked. Somewhere in this discussion the simplification logic was removed from the patch, while some hooks dealing with simplification are still in there. I would suggest adding the simplification of the dictionary back in, minification by the dictionary alone seems to not be enough in some cases.

jelle_s’s picture

StatusFileSize
new381.79 KB
new7.88 KB

I re-added the simplify logic and added a setting for it.

Side note: I believe the form for these settings and the route to that form should be moved to the purge_ui module?

jelle_s’s picture

StatusFileSize
new19.83 KB

Patch didn't apply to 3.6 for me, this one should

tuwebo’s picture

Hello,
I've just run into this issue. Given that #3395776: Make POST requests render cacheable I was wondering what is going to happen when using the "Mignification logic: Compress", since it looks like the core RenderCache::get is hardcoding "CACHE_MISS_IF_UNCACHEABLE_HTTP_METHOD" and the mignification logic is converting it as cachemissifuncacheablehttpmethod:f

  public function get(array $elements) {
    if (!$this->isElementCacheable($elements)) {
      return FALSE;
    }

    $bin = $elements['#cache']['bin'] ?? 'render';
    if (($cache_bin = $this->cacheFactory->get($bin)) && $cache = $cache_bin->get($elements['#cache']['keys'], CacheableMetadata::createFromRenderArray($elements))) {
      if (!$this->requestStack->getCurrentRequest()->isMethodCacheable()) {
        if (!empty(array_filter($cache->tags, fn (string $tag) => str_starts_with($tag, 'CACHE_MISS_IF_UNCACHEABLE_HTTP_METHOD:')))) {
          return FALSE;
        }
      }
      return $cache->data;
    }
    return FALSE;
  }

Should I use the dictionary logic instead or is there any other configuration that I am missing?

recrit made their first commit to this issue’s fork.

recrit’s picture

StatusFileSize
new20.94 KB

Uploading a static patch for the latest MR updates that supports simplifying {ENTITY TYPE}_list:{ENTITY BUNDLE}. Example: "node_list:article".

recrit’s picture

After doing some more testing with "Simplify cache tags" enabled, there is a fundamental issue with this feature.

The cache tag header is used by a Varnish to tag the cached response that it receives from Drupal. When cache tag simplification used, the "{ENTITY_TYPE}:{ENTITY_ID}" tag (e.g. "node:1234") is removed from the header value when the node is saved.
For example:
- Drupal cache tags for the headers: "node:1234", "node_list" . (EDIT 1) The list tag is being added by a block / paragraph on the page that displays a list of related nodes.
- Simplified tags: "node_list"
- RESULT = BUG: Varnish does not tag the node page with it's own tag "node:1234". So if the Drupal cache tag "node:1234" is invalidated by itself, then the Varnish page is not invalidated. You have to invalidate it with "node_list" which is wrong since it clears all pages.

EDIT 1: Clarified how the "node_list" Drupal cache tag was being added to the page.

prudloff’s picture

I don't think it's a bug, node_list is invalidated when any node is updated, so node:1234 is redundant here.

When a node is updated, Drupal invalidates both the node:xxx and node_list tags.

recrit’s picture

@prudloff Not a hard bug, but maybe a warning / caveat for others. You can no longer run ` drush p:invalidate tag node:1`, it will not work. Anyone using "Simplify cache tags" should evaluate their pages to confirm the expected cache tags are present.

For my use case, I was able to use `hook_purge_cache_tags_simplify_dictionary_alter()` to ensure that the current page's entity context had tags for the node, group content / relation, and group. So that clearing any of those specific tags, clears those pages.

geek-merlin’s picture

Status: Needs work » Closed (duplicate)

Judging from the IS, this is a dup of #2844620: Automatically split cache debug headers into multiple lines when they exceed 8k, which fixes the problem in all cases, and is already rtbc.

Now that this issue is closed, review the contribution record.

As a contributor, attribute any organization that helped you, or if you volunteered your own time.

Maintainers, credit people who helped resolve this issue.

driskell’s picture

Status: Closed (duplicate) » Needs work

@geek-merlin I think this is related and a good solution but not a duplicate as that issue only impacts the debug headers but this issue was about the Cache-Tags header the contrib module appends in its TagsHeader plugins. The same solution could be used potentially.

seanb’s picture

Maybe this is unrelated, but we recently had an issue with large headers in our tests. We got nginx throwing the infamous upstream sent too big header while reading response header.

For us, this was triggered by the _drupal_error_header() function, adding a X-Drupal-Assertion for each error or notice in the tests. After the Drupal 11 update, we had a lot of deprecated code that is now still working, but needs to be fixed before we update to Drupal 12. Having a lot of these notices caused nginx to throw the error.

Obviously, we need to fix the deprecations. For now, we added a workaround in our settings.local.php for the tests.

  // Disable deprecation errors in tests.
  set_error_handler(static function ($error_level, $message, $filename = NULL, $line = NULL) {
    // Ignore deprecations in tests.
    if ($error_level !== E_USER_DEPRECATED && $error_level !== E_DEPRECATED) {
      _drupal_error_handler($error_level, $message, $filename, $line);
    }
  });

Just posting this here since it was very difficult to debug this weird nginx error for our tests. Hope this helps someone.

jan kellermann’s picture

@geek-merlin

This would be the patch:

diff --git a/src/EventSubscriber/CacheableResponseSubscriber.php b/src/EventSubscriber/CacheableResponseSubscriber.php
index f9fe74c..9a121d7 100644
--- a/src/EventSubscriber/CacheableResponseSubscriber.php
+++ b/src/EventSubscriber/CacheableResponseSubscriber.php
@@ -66,7 +66,10 @@ class CacheableResponseSubscriber implements EventSubscriberInterface {
             throw new \LogicException("Header plugin '$pluginId' should return a non-empty string on ::getHeaderName()!");
           }

-          $response->headers->set($name, $header->getValue($tags));
+          $values = $header->getValue($tags);
+          foreach (explode("\n", wordwrap($values, 8000)) as $delta => $row) {
+            $response->headers->set($name . ($delta > 0 ? "-{$delta}" : ''), $row);
+          }
         }
       }
     }

But Apache doesnt merge the headers to one header line. If you want to split up the header, you have to modify the ban-vcl also:

      ban("obj.http.Cache-Tags ~ " + req.http.Cache-Tags);
      ban("obj.http.Cache-Tags-1 ~ " + req.http.Cache-Tags);
      ban("obj.http.Cache-Tags-2 ~ " + req.http.Cache-Tags);
      ban("obj.http.Cache-Tags-3 ~ " + req.http.Cache-Tags);
      ban("obj.http.Cache-Tags-4 ~ " + req.http.Cache-Tags);
      ban("obj.http.Cache-Tags-5 ~ " + req.http.Cache-Tags);
      ban("obj.http.Cache-Tags-6 ~ " + req.http.Cache-Tags);
      ban("obj.http.Cache-Tags-7 ~ " + req.http.Cache-Tags);

Not sure if this is the best way or whether we should keep trying to reduce the size of the cache tags.

duaelfr made their first commit to this issue’s fork.

duaelfr’s picture

Some plugins, like the purger from the Bunny CDN module, might use the getExpression method on TagInvalidation objects instead of casting them to strings. I updated the MR to have the same processing on this method as the __toString() method.

duaelfr’s picture

Now tags are minified (or not, depending on configuration) directly in TagInvalidation::getExpression(). Calls to purge.cache_tag_minificator are not needed anymore in hook_tokens_tokens.

duaelfr’s picture

StatusFileSize
new18.73 KB

Patch provided for composer.

falco010’s picture

Hi @duaelfr,

We were using the MR diff of this issue as a patch in our project.

After your changes in the MR, the functionality of this issue seemed to be broken (Not sure which part/commit from you). It would not be properly invalidated anymore. As extra info: we are using the 'Hash (using MD5 algorithm)' Minification logic.

After switching back to the patch in #80, the invalidation and functionality of this issue continued to work properly again.

kmonty made their first commit to this issue’s fork.