Follow-up from #2224847: Automatically shorten cid's in Cache\DatabaseBackend and PhpBackend.
Problem/Motivation
Fetching items from the DB cache might be faster if the cache ID were hashed and stored in a binary column instead of using a longer key on a VARCHAR column.
Proposed resolution
Use a 256-bit hash (or one of similar size, so that it is collision-resistant) of the human-readable cache ID as the actual cache ID, so we have a fixed-length, randomized string.
Collect some data to show if the difference is meaningful.
A hash will be much more uniformly distributed, making it a better actual key for BTREE lookups in the database back-end. We can further optimize the MySQL implementation by using a BINARY column to avoid the overhead of the utf8 character set being used for non-utf8 data.
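A minimal sketch of the idea, written in Python rather than Drupal's PHP purely for illustration. The function name `hashed_cid` and the sample cache ID are hypothetical, and SHA-256 is assumed as the 256-bit hash; the issue does not settle on a specific algorithm.

```python
import hashlib


def hashed_cid(cid: str) -> bytes:
    """Map a human-readable cache ID to a fixed-length 32-byte key.

    Assumption: SHA-256 is the chosen hash. The raw digest (not the hex
    string) is returned so it could be stored in a 32-byte binary column.
    """
    return hashlib.sha256(cid.encode("utf-8")).digest()


# Hypothetical cache IDs of very different lengths yield keys of equal size.
short_key = hashed_cid("config:system.site")
long_key = hashed_cid("render:" + "x" * 500)
print(len(short_key), len(long_key))
```

Storing the raw digest rather than its hex form halves the key size, which is the point of pairing the hash with a BINARY column.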
Comments
Comment #1
pwolanin commented
Comment #2
pwolanin commented
@danblack - the likelihood of a sha-256 or sha-512/256 collision is not going to be significantly altered by research. It's possible some less-than-ideal behavior will be found, but not at a level that would matter for this use case.
If this were a significant worry, someone somewhere would have seen a collision on git hashes actually happen. That is a 160-bit sha-1, which has some known flaws (which is why we don't use it for new code). Roughly, you'd need as many git commits as there are stars in the universe to find a collision by chance. It would take a vastly larger number to find a sha-256 collision.
http://stackoverflow.com/questions/4014090/is-it-safe-to-ignore-the-poss...
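For a rough sense of scale, the chance of an accidental collision can be estimated with the standard birthday-bound approximation. This is only a sketch: the function name `collision_probability` is hypothetical, and it assumes the hash behaves like a uniformly random function, which ignores any deliberate attack on sha-1.

```python
import math


def collision_probability(n_items: int, hash_bits: int) -> float:
    """Approximate the chance that any two of n_items share a hash.

    Uses the birthday approximation p ~= 1 - exp(-n(n-1) / 2^(bits+1)),
    assuming hash outputs are uniformly distributed.
    """
    exponent = -(n_items * (n_items - 1)) / (2 ** (hash_bits + 1))
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(exponent)


# Compare a 160-bit and a 256-bit hash for one billion cache IDs.
print(collision_probability(10**9, 160))
print(collision_probability(10**9, 256))
```

Under this approximation, a 160-bit hash needs on the order of 2^80 items before a chance collision becomes likely, and a 256-bit hash needs on the order of 2^128.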
Comment #3
wim leers
This was not yet done, but we're now using hashes if the CID becomes too long: #2224847: Automatically shorten cid's in Cache\DatabaseBackend and PhpBackend.
Doing what this issue says, which goes even further, hampers DX. So I think we've gone as far as we want to with this. Feel free to reopen.