Problem/Motivation
Investigating a problem with bulk form submissions / actions on a site with a few different languages, we've come across the situation where bulk_form_keys are being generated which look like this:
en-gb-1234
Where 'en-gb' is the language code, and 1234 is the nid.
At present loadEntityFromBulkFormKey explodes the key on '-' and expects to find either two items (langcode and ID) or three items (langcode, ID and vid).
protected function loadEntityFromBulkFormKey($bulk_form_key) {
$key_parts = explode('-', $bulk_form_key);
$revision_id = NULL;
// If there are 3 items, vid will be last.
if (count($key_parts) === 3) {
$revision_id = array_pop($key_parts);
}
// The first two items will always be langcode and ID.
$id = array_pop($key_parts);
$langcode = array_pop($key_parts);
This means that if there's a langcode with a dash in it, the id is mistakenly used as the vid and bad things happen (in this case Drupal was writing rows to the node_field_revision with the nid of the nodes in question set as the value for vid).
Proposed resolution
Assuming that it's valid for langcodes to contain hyphens (which is fairly common), something else should be used as the separator for the bulk_form_key to avoid this situation.
Remaining tasks
Commit.
User interface changes
None.
API changes
None.
Data model changes
None.
Why this should be an RC target
Data is corrupted. As per the problem report: If there's a langcode with a dash in it, then nid is mistakenly used as the vid, so Drupal is writing rows to the node_field_revision with the nid of the nodes in question set as the value for vid.
Comment | File | Size | Author |
---|---|---|---|
#43 | interdiff-2599246-40-43.txt | 1.02 KB | mcdruid |
#43 | better_bulk_form_key_separator-2599246-43.patch | 8.9 KB | mcdruid |
#40 | interdiff-2599246-33-40.txt | 1.53 KB | mcdruid |
#40 | better_bulk_form_key_separator-2599246-40.patch | 8.9 KB | mcdruid |
#33 | interdiff-2599246-31-33.txt | 690 bytes | mcdruid |
Comments
Comment #2
mcdruidPatch which uses the PATH_SEPARATOR predefined constant instead of a '-'.
I'm open to suggestions about better alternatives to use as this separator; it looks like it could be just about anything other than a '-' so long as it's valid HTML when used in the value attribute of a form element.
To be clear, the difference this makes in the form HTML is that we get e.g.
<input type="checkbox" class="form-checkbox" value="en-gb:1234" name="node_bulk_form[0]" id="edit-node-bulk-form-0" data-drupal-selector="edit-node-bulk-form-0">
...instead of:
<input type="checkbox" class="form-checkbox" value="en-gb-1234" name="node_bulk_form[0]" id="edit-node-bulk-form-0" data-drupal-selector="edit-node-bulk-form-0">
Comment #3
mcdruidComment #4
Gábor HojtsyThanks, good find. This is indeed bad. We should have tests to prove it did not work with en-gb, etc. Extending the existing test should be fine and hopefully doable easily. Apart from that:
If its only used within this class, it should be a class constant, not a global one.
Also I would just define it as ':', its more predictable across environments.
Comment #5
mcdruidThanks for the really useful suggestions Gábor.
Here's an updated patch using a class constant simply set to ':'.
I've made a separate patch for the (very simple) update to Drupal\node\Tests\Views\BulkFormTest as this makes it easier to verify that tests fail - as expected - before the patch which changes the separator is applied.
I'm happy to provide one overall patch if that would be useful.
I originally tried to change 'fr' to 'fr-ca' in the test in order to include a language code with a region subtag, but for some reason I don't fully understand that caused problems. Simply changing 'fr' to 'en-gb' had the expected result; I'm seeing 5 failures before the separator patch is applied e.g.:
With the patch applied to core/modules/system/src/Plugin/views/field/BulkForm.php I get 100% passes again.
Comment #7
mcdruidOk, so that test fail was actually what we expected to happen, and what Gábor asked for:
Here's a patch which combines changes from both of the patches which I posted separately before. There are no actual changes compared to the patches in #5.
I expect this patch to pass tests.
Comment #8
Gábor HojtsyThanks, that looks good :) Yay! Hope to get this in an RC, it would be a big problem to release without this fixed.
Comment #11
Gábor HojtsyPatch does not apply anymore :/
Comment #12
druprad+Needs reroll
Comment #13
mcdruidre-rolling
Comment #14
mcdruidThis collided with #2598620: Wrong assertion on BulkFormTest so required a simple manual fix:
That was the only change I had to make.
Comment #15
Gábor HojtsyYay, thanks!
Comment #20
mcdruidOops - uploaded the wrong version of that patch, sorry.
Comment #21
mcdruidBack to RTBC per Gábor in #15, now that I've submitted the correct patch.
Comment #25
Gábor HojtsyComment #26
effulgentsia CreditAttribution: effulgentsia at Acquia commentedI discussed this with @catch, and yep, data corruption bugs are good to fix during RC rather than punt to after release.
Comment #27
effulgentsia CreditAttribution: effulgentsia at Acquia commentedCan we add a comment for why we can be confident that langcode, ID, and revision ID cannot have ':' in them?
Comment #28
mcdruidGood idea. I'll make some notes here, then work on a suggested comment to accompany the definition of BULK_FORM_KEY_SEPARATOR.
Language tag
The comments in \Drupal\Core\Language\LanguageInterface refer to:
That RFC states:
I'd think that confirms that a ':' is safe here.
IDs
\Drupal\Core\Entity\EntityTypeInterface includes a comment for the getKeys() method ( / function):
So I think that would suggest that we're safe to assume id (and revision id?) will always be numeric.
However, \Drupal\Core\Entity\EntityInterface defines the id:
...which suggests it could be a string (as opposed to "its value must be numeric.").
I'll keep researching, but would welcome any input on whether we can be confident that the ids will always be numeric.
Comment #29
Gábor Hojtsy@mcdruid: I think we need to make some assumptions, but the dash assumption before this patch was bad :) If we think the entity IDs may be strings, then there is no definite character to rely on for separators I think.
Comment #30
mcdruidHmm... it seems it's not safe to assume that entity IDs will be numeric.
See, for example, #1823494: Field API assumes serial/integer entity IDs, but the entity system does not where one of the comments talks about using ids from an external system in the form of "namespace:1234", which would be a problem with this this patch as it stands.
If we want to be totally confident that there won't be any collision between the character(s) we use as a separator, and the actual data that's being embedded in the value of this key, perhaps we should use something like base64 encoding? That way we know for certain which characters will be in the (encoded) data, and we can safely choose a separator which will never occur in the data itself. A colon would be a safe choice as it should never appear in base64.
Some simple test code as an example:
This will safely transmit the value of the ID despite the fact that it contains the character we're using as a separator:
It's not exactly pretty, and it means the keys are not really human-readable any more, but I think this approach gets around making any assumptions about what will be in the inputs that go into the key.
I'll have a look at getting this into the existing code and adding to the test coverage.
Comment #31
mcdruidNew patch which runs the parts of the key through base64_encode on the way in, and base64_decode on the way back out.
Example markup this produces:
The bulk node form seems to work okay with this in place, and running (patched) Drupal\node\Tests\Views\BulkFormTest locally I get 100% passes.
As for adding additional test coverage (e.g. check what happens when an entity id is non-numeric and / or contains the separator character), that can't go into node's BulkFormTest, as nids (and vids) are always integers. I'll see if there's a different test that it would make sense to add to, or whether we'd need to add a new test.
Comment #32
Gábor HojtsyI don't think we go to this length elsewhere to support string ids but well, I think that will ultimately be a committer or component maintainer decision.
Extra whitespace at end of first line. You should break the line as soon as needed but not sooner.
Comment #33
mcdruidThanks Gábor; new patch hopefully fixes the comment.
I agree that the Base64 seems like "a sledgehammer to crack a nut" in most cases (e.g. node, user etc..) but it does seem mostly harmless and should allow for any entities to be handled by the bulk form.
As you say, happy to leave this to the discretion of a committer or component maintainer; if we don't need the Base64 encoding to cater for arbitrary string ids, the patch from #20 should be sufficient.
Comment #34
Gábor HojtsyYay, thanks!
Comment #38
alexpottGiven that it is technically possible to have string entity IDs I'm not too sure about this separator either... how about using a tab character or something.
Comment #39
Gábor HojtsyHm. Using a tab character would also eliminate the need for base64 encoding.
Comment #40
mcdruidCouldn't a tab character also appear in an entity ID if they can be arbitrary strings (as unlikely as that seems)?
Gábor, I think you were right when you said:
alexpott made a very sensible suggestion in IRC involving serializing the key parts; this means not worrying about choosing one or more safe characters to use as a separator. We agreed base64 encoding probably still makes sense as we have no idea what characters might be in the actual IDs and therefore whether they'd produce valid HTML even inside a serialised string.
So, here's a new patch where the key parts are first serialized and then base64 encoded. This means we don't need a separator at all, so the class constant for that is gone.
This produces markup like this in the bulk form (which I've checked works as expected):
Drupal\node\Tests\Views\BulkFormTest passes when I run it locally with this version.
Comment #41
Gábor HojtsyI would stear clear of unserializing data that the client might have tempered with. See https://wiki.php.net/rfc/secure_unserialize
Comment #42
Gábor HojtsyAs per http://php.net/unserialize PHP 7 comes with an option to avoid unserializing arbitrary classes, that does not seem to be possible in earlier versions.
Comment #43
mcdruidGood catch Gábor, thank you.
So how about using json_encode / json_decode instead? IIUC they don't have the same security risk as unserialize (e.g. http://stackoverflow.com/questions/12292029/is-it-safe-to-call-json-deco... ), and there's also the advantage that they're apparently faster and produce less verbose output.
Example markup (the value is indeed shorter than with serialize):
Comment #44
Gábor HojtsyThat looks fine to me. I am not entirely convinced the base64 is required, given that Drupal should properly escape the attribute contents when printing, but it definitely does not hurt. Especially if alexpott agreed with it :)
Comment #48
alexpottI think the base64encode is fine - it makes it more difficult to tamper with a node ID and nice catch re the security concerns of unserialize.
Committed 103cbc8 and pushed to 8.0.x. Thanks!
Comment #51
Gábor HojtsyYay, thanks!