Problem/Motivation
We are experiencing excessive cache clearing due to hashed cache tag collisions in this module. Each tag sent to Cloudflare appears to be hashed to 3 characters, and each character is a hexadecimal digit (i.e. 16 possible values per character). This means there are only 16^3 = 4096 possible hashes that can be sent to Cloudflare, so the probability of collisions is high.
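To put the collision risk in numbers, here is a minimal birthday-problem sketch. It assumes hashes are uniformly distributed over the available space, which is an idealization; the module's actual hash function is not shown in this issue. The function name `collision_probability` is hypothetical.

```python
def collision_probability(num_tags: int, space_size: int) -> float:
    """Probability that at least two of num_tags uniformly random hashes collide."""
    if num_tags > space_size:
        return 1.0  # pigeonhole: more tags than possible hashes
    p_unique = 1.0
    for i in range(num_tags):
        # The (i+1)-th tag must avoid the i hashes already taken.
        p_unique *= (space_size - i) / space_size
    return 1.0 - p_unique


if __name__ == "__main__":
    # Compare the 3-hex-character space against a few tag counts.
    for n in (25, 76, 200):
        print(n, round(collision_probability(n, 16 ** 3), 3))
```

Under that uniformity assumption, roughly 76 distinct tags already give a better-than-even chance of at least one collision in a 4096-value space.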
Steps to reproduce
Example: The tags 'file_list' and 'config:system.menu.language' result in the same hash '144'.
Proposed resolution
We found that changing the number of possible characters to 36 (26 letters plus 10 digits) and increasing the length of the hash to 4 alleviated the problem for us. It results in 36^4 = 1,679,616 (~1.68 million) possible hashes, which reduces the chance of collisions considerably.
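The idea can be sketched as follows. This is an illustration only, not the code from the attached patch: the use of MD5, the `ALPHABET` ordering, and the names `short_hash` and `hash_space` are all assumptions made for the example.

```python
import hashlib
import string

# 10 digits + 26 lowercase letters = 36 symbols (assumed alphabet).
ALPHABET = string.digits + string.ascii_lowercase


def hash_space(alphabet_size: int, length: int) -> int:
    """Number of distinct hashes for a given alphabet size and hash length."""
    return alphabet_size ** length


def short_hash(tag: str, length: int = 4) -> str:
    """Map a cache tag to a short base-36 string (hypothetical scheme)."""
    # Interpret the 128-bit MD5 digest as one large integer.
    value = int.from_bytes(hashlib.md5(tag.encode("utf-8")).digest(), "big")
    chars = []
    for _ in range(length):
        # Peel off one base-36 digit per iteration.
        value, remainder = divmod(value, len(ALPHABET))
        chars.append(ALPHABET[remainder])
    return "".join(chars)


if __name__ == "__main__":
    print(hash_space(16, 3), hash_space(36, 4))
    print(short_hash("file_list"), short_hash("config:system.menu.language"))
```

Because the digest is far larger than 36^4, the bias introduced by taking remainders is negligible in this sketch.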
However, this increases the size of the header. We have not run into any issues with it yet but we also added a config setting that allows us to remove any tags from the list based on prefix (e.g. config tags) and this decreases the header size.
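A prefix-based exclusion like the one described above could look roughly like this. The function name `filter_tags` and the `excluded_prefixes` parameter are hypothetical and do not reflect the actual config key or code in the patch.

```python
def filter_tags(tags, excluded_prefixes):
    """Drop every cache tag that starts with one of the excluded prefixes."""
    prefixes = tuple(excluded_prefixes)
    if not prefixes:
        return list(tags)  # nothing to exclude
    return [tag for tag in tags if not tag.startswith(prefixes)]


if __name__ == "__main__":
    tags = ["file_list", "config:system.menu.language", "node:1"]
    # Excluding config tags shrinks the list that ends up in the header.
    print(filter_tags(tags, ["config:"]))
```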
| Comment | File | Size | Author |
|---|---|---|---|
| #2 | cloudflare-increase-hash-size-3401335-2.patch | 7.52 KB | kleinmp |
Comments
Comment #2
kleinmp commented
Attaching patch.
Comment #3
jody lynn
Confirmed that we are running this patch in production, and our Cloudflare cache hit rate increased significantly.
Comment #4
almunnings
This patch has worked well for us.
We were seeing excessive collisions across entities, and Cloudflare was invalidating completely unrelated content.
This patch is excellent.
Comment #5
altcom_neil commented
Hi
We also ran into this issue and a related one: cache tags in the same Cloudflare account clear across all environments, so the UAT site's cache tags will clear the production site's cache if you use the same account during development. We have added a patch that allows you to prefix the cache tag with an environment character so that cache tags are unique per environment.
See https://www.drupal.org/project/cloudflare/issues/3394651
In that code we increased the length of the hashed cache tag (before adding the environment character) to 6 characters, giving 16^6 = 16.7 million unique codes, as we didn't spot the better improvement of using the larger character set that you have used here. We have been using this code on sites with in excess of 100,000 nodes and we haven't run into any header size issues, so 4-character tags should be fine.
If you use 6 characters with the 36-character set, you are up to over 2 billion unique hashes (36^6 is about 2.18 billion)!
Should the length of the hash value be a config value, with a minimum of 4, so that it can be configured on a site-by-site basis? Very large sites with millions of entities could potentially have more cache tags than the ~1.68 million possible 4-character hashes.
Cheers, Neil
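The two suggestions in the comment above (a configurable hash length with a minimum of 4, and a one-character environment prefix) could be sketched like this. All names here are hypothetical and are not taken from either patch.

```python
MIN_HASH_LENGTH = 4  # minimum suggested in the comment above


def resolve_hash_length(configured: int) -> int:
    """Clamp a site-configured hash length to the suggested minimum."""
    return max(MIN_HASH_LENGTH, int(configured))


def prefix_for_environment(hashed_tag: str, env_char: str) -> str:
    """Prepend a single environment character so hashes differ per environment."""
    if len(env_char) != 1:
        raise ValueError("environment prefix must be exactly one character")
    return env_char + hashed_tag


if __name__ == "__main__":
    # "p" for production and "u" for UAT are made-up example prefixes.
    print(prefix_for_environment("a1b2", "p"), prefix_for_environment("a1b2", "u"))
    print(resolve_hash_length(3), resolve_hash_length(6))
```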
Comment #6
mandclu commented
Thanks @kleinmp for identifying this, and for providing a fix. Merged in.