The comment.module uses a special trick to get threaded comments out in one query, sorted by thread. It stores a key of the form '1.2.3/' where each number is the index of the comment at that depth.
Because these keys are sorted as strings, we don't use regular integers (because 10 would be sorted before 2). Instead we use the sequence 1,2,...,9,90,91,92,...,99,990,991,...
This is a very inefficient system, because it adds another digit every 10 items. It doesn't scale well and generates excessively long keys. Long keys means slower db.
So, I thought of a better scheme: each number consists of a single digit for length, followed by a regular number. So:
00, 01, 02, ..., 09, 100, 101, 102, ..., 198, 199, 2000, 2001, 2002, ..., 2998, 2999, 3000, 3001, ...
Compared to regular decimal numbers, it only has 1 extra digit, and it retains string sorting order up to millions of comments. We can make it even more efficient by using higher base numbers. PHP goes up to base 36, so it's trivial to encode/decode these numbers with built-in routines.
The last hurdle is to convert existing comments, which is relatively easy with the update system.
Patch attached.
Comment | File | Size | Author |
---|---|---|---|
#1 | comment_26.patch | 4.74 KB | JonBob |
comment_24.patch | 5.38 KB | Steven | |
Comments
Comment #1
JonBob CreditAttribution: JonBob commentedAt the price of just a little waste (1/36), we can make the "vancode" generation much simpler. Just go from 0z to 110 instead of 100. I'm scared of anyone but Steven maintaining this if it stays as-is...
By the way, is "vancode" an invented term? Google isn't too familiar with it.
I like the rest of the patch.
Comment #2
chx CreditAttribution: chx commentedWell, JonBob is right but the functionality is just fantastic.
Comment #3
Steven CreditAttribution: Steven commentedWell the 1/36 waste is indeed not that important and saves a lot of code.
Committed the simplified version to HEAD.
Comment #4
(not verified) CreditAttribution: commentedComment #5
drewish CreditAttribution: drewish commentedI wish there was more discussion of the origin of the term "vancode" it does seem to be a total Google fail.