Follow-up from #2224847: Automatically shorten cid's in Cache\DatabaseBackend and PhpBackend.

Problem/Motivation

Fetching items from the DB cache might be faster if the cache ID were hashed and stored in a binary column, instead of using a longer key in a VARCHAR column.

Proposed resolution

Use a 256-bit (or similarly sized, collision-resistant) hash of the human-readable cache ID as the actual cache ID, so that we have a fixed-length, randomized string.
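A minimal Python sketch of the idea, assuming SHA-256 as the hash and UTF-8 as the encoding of the human-readable cache ID. The function names are hypothetical and this is not Drupal's implementation. The raw digest is always 32 bytes (the form a binary column would hold), and its hex form is always 64 ASCII characters, however long the original cid is.

```python
import hashlib


def hashed_cid(cid: str) -> bytes:
    """Hypothetical helper: 32-byte SHA-256 digest of a human-readable cache ID."""
    return hashlib.sha256(cid.encode("utf-8")).digest()


def hashed_cid_hex(cid: str) -> str:
    """Hypothetical helper: the same digest as 64 hex characters (for a text column)."""
    return hashlib.sha256(cid.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    short_cid = "config:system.site"
    long_cid = "entity_view:node:1:full:" + "en:" * 200  # an artificially long cid
    for cid in (short_cid, long_cid):
        # Input length varies; both output lengths stay fixed (32 bytes / 64 chars).
        print(len(cid), len(hashed_cid(cid)), len(hashed_cid_hex(cid)))
```

Storing the raw digest rather than the hex string halves the key size, which is the motivation for the binary column.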

Collect some data to show if the difference is meaningful.

A hash is much more uniformly distributed, which makes it a better key for BTREE lookups in the database back end. We can further optimize the MySQL implementation by using a BINARY column, which avoids the overhead of applying the utf8 character set to data that is not UTF-8 text.
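A hedged sketch of a cache table keyed by the binary digest. It uses SQLite from the Python standard library as a stand-in: the BLOB column here plays the role a BINARY(32) column would play in MySQL. Table, column, and function names are hypothetical. It also counts distinct leading bytes as a crude illustration of key spread; it does not measure lookup speed, which is the data this issue proposes to collect.

```python
import hashlib
import sqlite3


def cid_key(cid: str) -> bytes:
    """Hypothetical helper: fixed-length 32-byte SHA-256 key for a cache ID."""
    return hashlib.sha256(cid.encode("utf-8")).digest()


def create_cache_table(conn: sqlite3.Connection) -> None:
    # SQLite stand-in: BLOB here plays the role of a MySQL BINARY(32) column.
    conn.execute("CREATE TABLE cache (cid BLOB PRIMARY KEY, data TEXT)")


def cache_set(conn: sqlite3.Connection, cid: str, data: str) -> None:
    # The human-readable cid is never stored; only its digest is.
    conn.execute(
        "INSERT OR REPLACE INTO cache (cid, data) VALUES (?, ?)",
        (cid_key(cid), data),
    )


def cache_get(conn: sqlite3.Connection, cid: str):
    # Lookups hash the cid the same way, then match on the binary key.
    row = conn.execute(
        "SELECT data FROM cache WHERE cid = ?", (cid_key(cid),)
    ).fetchone()
    return row[0] if row is not None else None


def distinct_leading_bytes(keys) -> int:
    """Count distinct first bytes among the keys (a crude spread measure)."""
    return len({k[0] for k in keys})


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_cache_table(conn)
    cache_set(conn, "entity_view:node:1:full", "<rendered node 1>")
    print(cache_get(conn, "entity_view:node:1:full"))

    cids = [f"entity_view:node:{i}:full" for i in range(1000)]
    raw = [c.encode("utf-8") for c in cids]
    hashed = [cid_key(c) for c in cids]
    # Human-readable cids share a prefix; hashed keys spread across byte values.
    print(distinct_leading_bytes(raw), distinct_leading_bytes(hashed))
```

The readable cids all begin with the same byte, while their digests scatter across nearly all 256 possible first bytes.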

Remaining tasks

Comments

pwolanin

Status: Needs review » Active
pwolanin

@danblack - the likelihood of a SHA-256 or SHA-512/256 collision is not going to be significantly altered by research. It's possible some less-than-ideal behavior will be found, but not at a level that would matter for this use case.

If this were a significant worry, someone somewhere would have seen a collision between git hashes actually happen. Git uses 160-bit SHA-1, which has some known flaws (that is why we don't use it in new code). Roughly, you'd need as many git commits as there are stars in the universe to find a collision by chance. Finding a SHA-256 collision would take a vastly larger number.
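To put this argument in numbers, here is a hedged Python sketch of the standard birthday-bound approximation. It assumes an ideal, uniformly distributed hash (SHA-1's known weaknesses matter for deliberate attacks, not for this chance-collision estimate). The function name is hypothetical.

```python
import math


def collision_probability(n: int, bits: int) -> float:
    """Birthday-bound approximation of the chance that any two of n random
    `bits`-bit hashes collide: p ~= 1 - exp(-n(n-1) / 2^(bits+1)).
    Assumes the hash behaves like a uniformly random function."""
    # Python's int / int true division rounds the exact ratio once.
    exponent = n * (n - 1) / 2 ** (bits + 1)
    # -expm1(-x) stays accurate when the probability is tiny.
    return -math.expm1(-exponent)


if __name__ == "__main__":
    for bits in (160, 256):
        for n in (10**12, 2**64, 2**80):
            print(bits, n, collision_probability(n, bits))
```

With 160 bits, a chance collision only becomes likely (about 39%) around 2^80 items; with 256 bits, even 2^64 cache entries give a probability of roughly 2^-129.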

http://stackoverflow.com/questions/4014090/is-it-safe-to-ignore-the-poss...

Wim Leers

Status: Active » Closed (won't fix)
Related issues: +#2224847: Automatically shorten cid's in Cache\DatabaseBackend and PhpBackend

This was not yet done, but we're now using hashes if the CID becomes too long: #2224847: Automatically shorten cid's in Cache\DatabaseBackend and PhpBackend.

Doing what this issue says, which goes even further, hampers DX. So I think we've gone as far as we want to with this. Feel free to reopen.