Problem/Motivation

We are experiencing excessive cache clearing caused by hashed cache tag collisions in this module. Each tag sent to Cloudflare appears to be 3 characters long, and each character is a hexadecimal digit (16 possible values). That leaves only 16^3 = 4096 possible hashes, so the probability of collisions is high.
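To put a rough number on "high", here is a minimal sketch of the standard birthday-problem estimate. It assumes the hashes are uniformly distributed, which may not hold exactly for the module's real hash function; `collision_probability` is a hypothetical helper name, not part of the module.

```python
def collision_probability(num_tags: int, hash_space: int) -> float:
    """Chance that at least two of num_tags distinct tags share a hash,
    assuming hashes are spread uniformly over hash_space values."""
    if num_tags > hash_space:
        return 1.0  # pigeonhole: more tags than possible hashes
    p_no_collision = 1.0
    for i in range(num_tags):
        # The (i+1)-th tag must avoid the i hashes already taken.
        p_no_collision *= (hash_space - i) / hash_space
    return 1.0 - p_no_collision


# Compare the 3-character hex space with a few tag counts.
for n in (10, 76, 200):
    print(n, round(collision_probability(n, 16 ** 3), 3))
```

By this estimate, a set of about 76 distinct tags already has a better than even chance of containing at least one colliding pair in a 4096-value space.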

Steps to reproduce

Example: The tags 'file_list' and 'config:system.menu.language' result in the same hash '144'.

Proposed resolution

We found that changing the number of possible characters to 36 (26 letters plus 10 digits) and increasing the length of the hash to 4 alleviated the problem for us. This gives 36^4 = 1,679,616 (about 1.7 million) possible hashes, which reduces the chance of collisions considerably.
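As an illustration of the idea only (this is not the attached patch), a 4-character base-36 hash could be sketched like this. The use of MD5 as the underlying digest and the name `short_hash` are assumptions made for the example.

```python
import hashlib

# 10 digits plus 26 lowercase letters = 36 symbols.
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz"


def short_hash(tag: str, length: int = 4) -> str:
    """Map a cache tag to a fixed-length base-36 string.

    Assumption: MD5 is used only as a convenient, well-mixed digest;
    the module's real hashing may differ.
    """
    digest = int(hashlib.md5(tag.encode("utf-8")).hexdigest(), 16)
    # Reduce the digest to the available hash space (36 ** length values).
    value = digest % (len(ALPHABET) ** length)
    chars = []
    for _ in range(length):
        value, remainder = divmod(value, len(ALPHABET))
        chars.append(ALPHABET[remainder])
    # Digits were produced least-significant first, so reverse them.
    return "".join(reversed(chars))


print(short_hash("file_list"), short_hash("config:system.menu.language"))
```

Reducing the digest modulo 36^length and zero-padding keeps every output the same length, so each tag adds a predictable number of bytes to the header.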

However, this increases the size of the header. We have not run into any issues with it yet, but we also added a config setting that lets us remove tags from the list by prefix (e.g. config tags), which decreases the header size.
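The prefix-based filtering can be sketched as follows. The function name and the example prefixes are hypothetical and only show the shape of the idea, not the module's actual config setting.

```python
def drop_tags_by_prefix(tags, excluded_prefixes):
    """Return the tags that do not start with any excluded prefix."""
    prefixes = tuple(excluded_prefixes)  # str.startswith accepts a tuple
    return [tag for tag in tags if not tag.startswith(prefixes)]


tags = ["node:1", "config:system.menu.language", "file_list", "config:views.view.frontpage"]
# Hypothetical setting: skip all config tags before hashing.
print(drop_tags_by_prefix(tags, ["config:"]))
```

Filtering before hashing means the dropped tags never reach the header at all, so the saving is one hash (plus separator) per excluded tag.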

Comments

kleinmp created an issue. See original summary.

kleinmp’s picture

Attaching patch.

jody lynn’s picture

Status: Active » Reviewed & tested by the community

Confirmed: we are running this patch in production, and our Cloudflare cache hit rate increased significantly.

almunnings’s picture

This patch has worked well for us.
We were seeing excessive collisions across entities, and Cloudflare was invalidating completely unrelated content.

This patch is excellent.

altcom_neil’s picture

Hi

We also ran into this issue, along with a related one: purging cache tags clears every environment that shares the same Cloudflare account, so the UAT site's cache tags will clear the production site's cache if both use the same account during development. We have added a patch that lets you prefix the cache tag with an environment character so that cache tags are unique per environment.
See https://www.drupal.org/project/cloudflare/issues/3394651
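A minimal sketch of the environment-prefix idea, assuming a 6-character hex hash taken from an MD5 digest and a single-character environment marker. Both the function name and the marker characters are hypothetical; see the linked issue for the real patch.

```python
import hashlib


def env_scoped_hash(tag: str, env_char: str, length: int = 6) -> str:
    """Prefix a truncated hex hash with one environment character.

    Assumptions: MD5 as the digest and a one-character marker such as
    'p' (production) or 'u' (UAT). Neither is confirmed by the module.
    """
    if len(env_char) != 1:
        raise ValueError("env_char must be exactly one character")
    hashed = hashlib.md5(tag.encode("utf-8")).hexdigest()[:length]
    return env_char + hashed


# The same Drupal tag yields different Cloudflare tags per environment.
print(env_scoped_hash("node:1", "p"), env_scoped_hash("node:1", "u"))
```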

In that code we increased the length of the hashed cache tag (before adding the environment character) to 6 hexadecimal characters, giving 16^6 = about 16.7 million unique codes, as we didn't spot the better improvement of using the larger character set that you have used here. We have been using this code on sites with in excess of 100,000 nodes and haven't run into any header size issues, so 4-character tags should be fine.
If you use 6 characters with the 36-character set, you get over 2 billion unique hashes!
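For reference, the hash-space sizes mentioned in this thread can be checked with a few lines of arithmetic:

```python
# (alphabet size, hash length) for each scheme discussed in this thread.
schemes = {
    "3 hex chars (original)": (16, 3),
    "4 base-36 chars (this patch)": (36, 4),
    "6 hex chars": (16, 6),
    "6 base-36 chars": (36, 6),
}

# Number of distinct hashes = alphabet_size ** length.
sizes = {name: base ** length for name, (base, length) in schemes.items()}

for name, size in sizes.items():
    print(f"{name}: {size:,}")
```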

Should the length of the hash value be a config value, with a minimum of 4, so that it can be configured on a site-by-site basis? Very large sites with millions of entities could potentially have more than 1.6 million cache tags.

Cheers, Neil

mandclu’s picture

Status: Reviewed & tested by the community » Fixed

Thanks @kleinmp for identifying this, and for providing a fix. Merged in.

  • mandclu committed bbfb0a7d on 2.0.x
    Issue #3401335 by kleinmp, mandclu: Excessive Tag Hash Collisions
    

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.