The comment.module uses a special trick to get threaded comments out in one query, sorted by thread. It stores a key of the form '1.2.3/' where each number is the index of the comment at that depth.

Because these keys are sorted as strings, we don't use regular integers (because 10 would be sorted before 2). Instead we use the sequence 1,2,...,9,90,91,92,...,99,990,991,...

This is a very inefficient system, because it adds another digit every 10 items. It doesn't scale well and generates excessively long keys. Long keys means slower db.

So, I thought of a better scheme: each number consists of a single digit for length, followed by a regular number. So:
00, 01, 02, ..., 09, 100, 101, 102, ..., 198, 199, 2000, 2001, 2002, ..., 2998, 2999, 3000, 3001, ...

Compared to regular decimal numbers, it only has 1 extra digit, and it retains string sorting order up to millions of comments. We can make it even more efficient by using higher base numbers. PHP goes up to base 36, so it's trivial to encode/decode these numbers with built-in routines.

The last hurdle is to convert existing comments, which is relatively easy with the update system.

Patch attached.

CommentFileSizeAuthor
#1 comment_26.patch4.74 KBJonBob
comment_24.patch5.38 KBSteven
Support from Acquia helps fund testing for Drupal Acquia logo

Comments

JonBob’s picture

FileSize
4.74 KB

At the price of just a little waste (1/36), we can make the "vancode" generation much simpler. Just go from 0z to 110 instead of 100. I'm scared of anyone but Steven maintaining this if it stays as-is...

By the way, is "vancode" an invented term? Google isn't too familiar with it.

I like the rest of the patch.

chx’s picture

Status: Needs review » Reviewed & tested by the community

Well, JonBob is right but the functionality is just fantastic.

Steven’s picture

Status: Reviewed & tested by the community » Fixed

Well the 1/36 waste is indeed not that important and saves a lot of code.

Committed the simplified version to HEAD.

Anonymous’s picture

Status: Fixed » Closed (fixed)
drewish’s picture

Version: x.y.z » 4.7.0

I wish there was more discussion of the origin of the term "vancode" it does seem to be a total Google fail.