Drupal\Component\Uuid\Php::generate() is all but undocumented, and contains bugs:

  • Four hex chars are grabbed for $clock_seq_hi_and_reserved instead of two (the high byte is then discarded in the bitwise AND), and then the two chars used for $clock_seq_low are re-used as the first two chars of $nodes.

  • $clock_seq_low is a string (of two hex chars), but the formatting treats it as an integer, using printf placeholder %02x instead of %s

  • All the processing for $time_hi_and_version works, but is completely unnecessary. That evidentially originated in the Ruby code this was (loosely) based on, but in this version we already have a string of hex characters, and merely need to replace the first character of the string with "4" (rather than converting to an integer, performing bitwise ops to set the top four bits, and re-formatting as a string again).

  • The method of obtaining the initial hex string is flawed, reducing the available UUID space by first hashing the original random number, and then truncating the result.

    The original (not truncated) hashes of the 16 random bytes should be at least nearly as good as the data it was based on; but truncating those hashes to 32 characters will undoubtedly introduce collisions, thus reducing the UUID space (because there is no margin for error at all -- every single permutation of 32 hex chars is needed to represent all the possible 16-byte values).

    OTOH, PHP has bin2hex() which will give us (in considerably fewer CPU cycles) precisely what we really wanted -- the hexadecimal representation of the original random number.

  • And finally, as mentioned, there's very little documentation for any of this, save for a pointer to the RFC (which actually has enough inconsistencies to cause serious confusion, so we definitely need a pointer to its errata as well).

Comments

jweowu’s picture

Status: Active » Needs review
StatusFileSize
new2.59 KB
jweowu’s picture

StatusFileSize
new2.64 KB

Another thing -- the method of obtaining the initial hex string seems flawed to me.

The original (not truncated) hashes of the 16 random bytes should be at least nearly as good as the data it was based on; but truncating those hashes to 32 characters will undoubtedly introduce collisions, thus reducing the UUID space (because there is no margin for error at all -- every single permutation of 32 hex chars is needed to represent all the possible 16-byte values).

OTOH, PHP has bin2hex() which will do precisely what we really want.

jweowu’s picture

Issue summary: View changes
jweowu’s picture

Issue summary: View changes
jhedstrom’s picture

Patch still applies here, and the unit test still has 100% test coverage of the method. I think the technical changes need more review though--the logic in #2 makes sense to me.

jweowu’s picture

FWIW I checked the spec and its errata carefully when writing this patch (and reviewing/writing patches for the UUID contrib module), but I agree that someone else should verify the correctness of the code.

The relevant sections of the RFC aren't very long. The only real confusion is over the bit-ordering, which the referenced errata clarifies. The new comments in the code should make it as easy as possible to follow.

There's also http://en.wikipedia.org/wiki/Universally_unique_identifier#Version_4_.28... (which obviously isn't the authoritative source, but is a simpler statement of the format).

jweowu’s picture

Title: Bug fixes and documentation for Drupal\Component\Uuid\Php::generate() » Drupal 8 has a broken UUID generator
Priority: Normal » Major

A fairly gratuitous title change in the hope that someone reviews this 20-month-old fix before Drupal 8 is released.

I've hesitantly upgraded the priority to "Major" on the basis that this component has system-wide use. I thought "Normal" seemed appropriate at the time, but it will seem a little absurd to me if D8 gets released with a knowingly-incorrect UUID generator, and I presume this is completely lost in the sea of 2,500 "normal" D8 bugs (albeit not necessarily much more visible in the sea of 500 "major" D8 bugs).

alexpott’s picture

Status: Needs review » Needs work
  1. +++ b/core/lib/Drupal/Component/Uuid/Php.php
    @@ -22,27 +27,32 @@ class Php implements UuidInterface {
    -    $clock_seq_low = substr($hex, 20, 2);
    -    $nodes = substr($hex, 20);
    ...
    +    $hex = bin2hex(Crypt::randomBytes(16));
    

    Fixing this is important given that the RFC says that the clock sequence is important.

    The initial value MUST NOT be correlated to the node identifier.

  2. This patch fixes our generator and complies with RFC afaics.
  3. Unfortunately the new comments are not quite adhering to our coding standards...
    FILE: .../dev/sites/drupal8alt.dev/core/lib/Drupal/Component/Uuid/Php.php
    ----------------------------------------------------------------------
    FOUND 10 ERRORS AFFECTING 7 LINES
    ----------------------------------------------------------------------
     33 | ERROR | [x] Comments may not appear after statements
     34 | ERROR | [x] Inline comments must end in full-stops, exclamation
        |       |     marks, colons, or question marks
     34 | ERROR | [x] Comments may not appear after statements
     41 | ERROR | [x] Inline comments must start with a capital letter
     44 | ERROR | [x] Comments may not appear after statements
     45 | ERROR | [x] Inline comments must end in full-stops, exclamation
        |       |     marks, colons, or question marks
     45 | ERROR | [x] Comments may not appear after statements
     47 | ERROR | [x] Comments may not appear after statements
     48 | ERROR | [x] Inline comments must end in full-stops, exclamation
        |       |     marks, colons, or question marks
     48 | ERROR | [x] Comments may not appear after statements
    

    See https://www.drupal.org/coding-standards/docs

jweowu’s picture

Thank you for reviewing this, alexpott.

I'll change the comments if necessary, but...

I know there are coding standards for comments, but are they not in the service of maximising code readability and comprehension, and consequently open to exceptions where sensible?

For the record, I believe these comments:

$clock_seq_hi_and_reserved &= 0x3F; // 00111111
$clock_seq_hi_and_reserved |= 0x80; // 10000000

are objectively better than these comments:

// 00111111.
$clock_seq_hi_and_reserved &= 0x3F;
// 10000000.
$clock_seq_hi_and_reserved |= 0x80;

The positioning and alignment of the former makes both the hexadecimal values and the purpose of the sequence of binary operations instantly comprehensible and clear. The latter version of the comments can only detract from that clarity. (Not to any huge extent, sure; but it's still making it worse than it was, while doubling the number of lines in the process!)

You might elaborate on the comments in the latter version so they're not just a number on a line, but then you're padding out something which should have required no padding -- I believe the original version makes maximum sense with minimum waste/distraction. Why would we mess with that?

I'm not especially bothered by adding a full stop to the end of each comment (although, again, it's not an improvement for the binary number comments), but I really do object to not using end-of-line comments in cases when they make the code easier to read. I'll go ahead and change them if they need to be changed, of course; I'm just not in favour of making things objectively worse for the sake of conventions, so I wanted to say something about it before doing it, in case exceptions are allowable?

n.b. There is also a stated exception whereby "If each line of a list needs a separate comment, the comments may be given on the same line", and I would suggest that the instances in this patch are of a similar purpose, even if not actual list items.

alexpott’s picture

Status: Needs work » Needs review
StatusFileSize
new3.05 KB
new2.78 KB

@jweowu I agree with the fact that comments on the same line are better. But in order to do that we need to fix our coding standards - which takes a long time. We could open a followup to add the values in once we have a new standard and do something like this...

claudiu.cristea’s picture

Issue tags: +Needs tests

Don't we need some unit testing here?

alexpott’s picture

Issue tags: -Needs tests

There are unit tests already and the changes here does not break it - the fix here is going to be impossible to test to be honest

wiifm’s picture

Hey @alexpott, the new comments seem to make more sense, and adhere to the comment standards.

I am not able to technically review the UUID implementation, but surely if anyone understands the spec now, it would be @jweowu. Thanks Phil for reading this!

+1 RTBC from me.

jweowu’s picture

I've tweaked alexpott's revised comments for readability; added mention of the missing $clock_seq_low to the list of variable names; and switched to binary format for the $clock_seq_hi_and_reserved manipulations in order to restore the clarity of those calculations without the need for the comments I'd used originally.

I'm happy with the revised comments. Those and some minor formatting aside, the only code change between #2 and this version is the change from 0x3F to 0b00111111, and 0x80 to 0b10000000 (n.b. "Binary integer literals are available since PHP 5.4.0" -- http://php.net/manual/en/language.types.integer.php ).

Hi wiifm :)

jweowu’s picture

And a trivial change from "4" to '4', in case that was going to bother anyone :)

jweowu’s picture

Spotted and fixed a typo in a comment.

Edit: Actually, the typo is in the original code. The RFC uses the name 'node', not 'nodes'.

I'll fix and re-roll...

jweowu’s picture

Correct fix for the node/nodes naming mismatch.

alexpott’s picture

+++ b/core/lib/Drupal/Component/Uuid/Php.php
@@ -22,27 +24,39 @@ class Php implements UuidInterface {
+    // the field names defined in RFC 4122 section 4.1.2.
+
+    // Use characters 0-11 to generate 32-bit $time_low and 16-bit $time_mid.

Unfortunately this is a coding standard violation :(

jweowu’s picture

Sigh. I'm hating the commenting standards quite a bit right now.

Ok, I'd best get phpcs set up and make sure I'm not submitting any more invalid comments. I'll post a new version shortly.

jweowu’s picture

Well I've silenced that complaint with just an empty // instead of a completely blank line between those comments, and phpcs is happy.

I also took another look at the referenced ruby code and decided that this PHP bears so little resemblance to that, that there's just no purpose to pointing to it as even a loose basis. I'd keep it if they were noticeably similar, but they're just not IMHO, so I've removed those comments from the header.

jweowu’s picture

Any chance of another review? To the best of my knowledge, this patch is good to go.

alexpott’s picture

Status: Needs review » Reviewed & tested by the community

@jweowu sorry this fell off my radar (my issue on d.o moves very very quickly). I've reviewed this several times and it is good to go. It is not testable because of the nature of the fix and the existing test coverage shows we've broken nothing.

jweowu’s picture

Many thanks, alexpott.

  • catch committed 424515e on 8.2.x
    Issue #2179537 by jweowu, alexpott: Drupal 8 has a broken UUID generator
    

  • catch committed 2669386 on 8.1.x
    Issue #2179537 by jweowu, alexpott: Drupal 8 has a broken UUID generator...
catch’s picture

Status: Reviewed & tested by the community » Fixed

Committed/pushed to 8.2.x and cherry-picked to 8.1.x and 8.0.x. Thanks!

jweowu’s picture

It doesn't seem like the 8.0.x commit has occurred?

catch’s picture

Hmm not sure what happened, just pushed again though.

  • catch committed f996547 on 8.0.x
    Issue #2179537 by jweowu, alexpott: Drupal 8 has a broken UUID generator...

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.