Drupal\Component\Uuid\Php::generate() is all but undocumented, and contains bugs:
-
Four hex chars are grabbed for
$clock_seq_hi_and_reservedinstead of two (the high byte is then discarded in the bitwise AND), and then the two chars used for$clock_seq_loware re-used as the first two chars of$nodes. -
$clock_seq_lowis a string (of two hex chars), but the formatting treats it as an integer, using printf placeholder%02xinstead of%s -
All the processing for
$time_hi_and_versionworks, but is completely unnecessary. That evidentially originated in the Ruby code this was (loosely) based on, but in this version we already have a string of hex characters, and merely need to replace the first character of the string with "4" (rather than converting to an integer, performing bitwise ops to set the top four bits, and re-formatting as a string again). -
The method of obtaining the initial hex string is flawed, reducing the available UUID space by first hashing the original random number, and then truncating the result.
The original (not truncated) hashes of the 16 random bytes should be at least nearly as good as the data it was based on; but truncating those hashes to 32 characters will undoubtedly introduce collisions, thus reducing the UUID space (because there is no margin for error at all -- every single permutation of 32 hex chars is needed to represent all the possible 16-byte values).
OTOH, PHP has
bin2hex()which will give us (in considerably fewer CPU cycles) precisely what we really wanted -- the hexadecimal representation of the original random number. -
And finally, as mentioned, there's very little documentation for any of this, save for a pointer to the RFC (which actually has enough inconsistencies to cause serious confusion, so we definitely need a pointer to its errata as well).
| Comment | File | Size | Author |
|---|---|---|---|
| #22 | interdiff-2179537-19-22.txt | 1017 bytes | jweowu |
| #22 | drupal-uuid_bug_fixes_and_doc-2179537-22.patch | 2.94 KB | jweowu |
| #19 | interdiff-2179537-18-19.txt | 1.25 KB | jweowu |
| #19 | drupal-uuid_bug_fixes_and_doc-2179537-19.patch | 2.93 KB | jweowu |
| #18 | interdiff-2179537-17-18.txt | 529 bytes | jweowu |
Comments
Comment #1
jweowu commentedComment #2
jweowu commentedAnother thing -- the method of obtaining the initial hex string seems flawed to me.
The original (not truncated) hashes of the 16 random bytes should be at least nearly as good as the data it was based on; but truncating those hashes to 32 characters will undoubtedly introduce collisions, thus reducing the UUID space (because there is no margin for error at all -- every single permutation of 32 hex chars is needed to represent all the possible 16-byte values).
OTOH, PHP has
bin2hex()which will do precisely what we really want.Comment #3
jweowu commentedComment #4
jweowu commentedComment #5
jhedstromPatch still applies here, and the unit test still has 100% test coverage of the method. I think the technical changes need more review though--the logic in #2 makes sense to me.
Comment #6
jweowu commentedFWIW I checked the spec and its errata carefully when writing this patch (and reviewing/writing patches for the UUID contrib module), but I agree that someone else should verify the correctness of the code.
The relevant sections of the RFC aren't very long. The only real confusion is over the bit-ordering, which the referenced errata clarifies. The new comments in the code should make it as easy as possible to follow.
There's also http://en.wikipedia.org/wiki/Universally_unique_identifier#Version_4_.28... (which obviously isn't the authoritative source, but is a simpler statement of the format).
Comment #8
jweowu commentedA fairly gratuitous title change in the hope that someone reviews this 20-month-old fix before Drupal 8 is released.
I've hesitantly upgraded the priority to "Major" on the basis that this component has system-wide use. I thought "Normal" seemed appropriate at the time, but it will seem a little absurd to me if D8 gets released with a knowingly-incorrect UUID generator, and I presume this is completely lost in the sea of 2,500 "normal" D8 bugs (albeit not necessarily much more visible in the sea of 500 "major" D8 bugs).
Comment #10
alexpottFixing this is important given that the RFC says that the clock sequence is important.
See https://www.drupal.org/coding-standards/docs
Comment #11
jweowu commentedThank you for reviewing this, alexpott.
I'll change the comments if necessary, but...
I know there are coding standards for comments, but are they not in the service of maximising code readability and comprehension, and consequently open to exceptions where sensible?
For the record, I believe these comments:
are objectively better than these comments:
The positioning and alignment of the former makes both the hexadecimal values and the purpose of the sequence of binary operations instantly comprehensible and clear. The latter version of the comments can only detract from that clarity. (Not to any huge extent, sure; but it's still making it worse than it was, while doubling the number of lines in the process!)
You might elaborate on the comments in the latter version so they're not just a number on a line, but then you're padding out something which should have required no padding -- I believe the original version makes maximum sense with minimum waste/distraction. Why would we mess with that?
I'm not especially bothered by adding a full stop to the end of each comment (although, again, it's not an improvement for the binary number comments), but I really do object to not using end-of-line comments in cases when they make the code easier to read. I'll go ahead and change them if they need to be changed, of course; I'm just not in favour of making things objectively worse for the sake of conventions, so I wanted to say something about it before doing it, in case exceptions are allowable?
n.b. There is also a stated exception whereby "If each line of a list needs a separate comment, the comments may be given on the same line", and I would suggest that the instances in this patch are of a similar purpose, even if not actual list items.
Comment #12
alexpott@jweowu I agree with the fact that comments on the same line are better. But in order to do that we need to fix our coding standards - which takes a long time. We could open a followup to add the values in once we have a new standard and do something like this...
Comment #13
claudiu.cristeaDon't we need some unit testing here?
Comment #14
alexpottThere are unit tests already and the changes here does not break it - the fix here is going to be impossible to test to be honest
Comment #15
wiifmHey @alexpott, the new comments seem to make more sense, and adhere to the comment standards.
I am not able to technically review the UUID implementation, but surely if anyone understands the spec now, it would be @jweowu. Thanks Phil for reading this!
+1 RTBC from me.
Comment #16
jweowu commentedI've tweaked alexpott's revised comments for readability; added mention of the missing
$clock_seq_lowto the list of variable names; and switched to binary format for the$clock_seq_hi_and_reservedmanipulations in order to restore the clarity of those calculations without the need for the comments I'd used originally.I'm happy with the revised comments. Those and some minor formatting aside, the only code change between #2 and this version is the change from
0x3Fto0b00111111, and0x80to0b10000000(n.b. "Binary integer literals are available since PHP 5.4.0" -- http://php.net/manual/en/language.types.integer.php ).Hi wiifm :)
Comment #17
jweowu commentedAnd a trivial change from
"4"to'4', in case that was going to bother anyone :)Comment #18
jweowu commentedSpotted and fixed a typo in a comment.
Edit: Actually, the typo is in the original code. The RFC uses the name 'node', not 'nodes'.
I'll fix and re-roll...
Comment #19
jweowu commentedCorrect fix for the
node/nodesnaming mismatch.Comment #20
alexpottUnfortunately this is a coding standard violation :(
Comment #21
jweowu commentedSigh. I'm hating the commenting standards quite a bit right now.
Ok, I'd best get phpcs set up and make sure I'm not submitting any more invalid comments. I'll post a new version shortly.
Comment #22
jweowu commentedWell I've silenced that complaint with just an empty // instead of a completely blank line between those comments, and phpcs is happy.
I also took another look at the referenced ruby code and decided that this PHP bears so little resemblance to that, that there's just no purpose to pointing to it as even a loose basis. I'd keep it if they were noticeably similar, but they're just not IMHO, so I've removed those comments from the header.
Comment #23
jweowu commentedAny chance of another review? To the best of my knowledge, this patch is good to go.
Comment #24
alexpott@jweowu sorry this fell off my radar (my issue on d.o moves very very quickly). I've reviewed this several times and it is good to go. It is not testable because of the nature of the fix and the existing test coverage shows we've broken nothing.
Comment #25
jweowu commentedMany thanks, alexpott.
Comment #28
catchCommitted/pushed to 8.2.x and cherry-picked to 8.1.x and 8.0.x. Thanks!
Comment #29
jweowu commentedIt doesn't seem like the 8.0.x commit has occurred?
Comment #30
catchHmm not sure what happened, just pushed again though.