Problem/Motivation

Investigating a problem with bulk form submissions / actions on a site with a few different languages, we've come across the situation where bulk_form_keys are being generated which look like this:

en-gb-1234

Where 'en-gb' is the language code, and 1234 is the nid.

At present loadEntityFromBulkFormKey explodes the key on '-' and expects to find either two items (langcode and ID) or three items (langcode, ID and vid).

  protected function loadEntityFromBulkFormKey($bulk_form_key) {
    $key_parts = explode('-', $bulk_form_key);
    $revision_id = NULL;

    // If there are 3 items, vid will be last.
    if (count($key_parts) === 3) {
      $revision_id = array_pop($key_parts);
    }   

    // The first two items will always be langcode and ID.
    $id = array_pop($key_parts);
    $langcode = array_pop($key_parts);

This means that if there's a langcode with a dash in it, the id is mistakenly used as the vid and bad things happen (in this case Drupal was writing rows to the node_field_revision with the nid of the nodes in question set as the value for vid).

Proposed resolution

Assuming that it's valid for langcodes to contain hyphens (which is fairly common), something else should be used as the separator for the bulk_form_key to avoid this situation.

Remaining tasks

Commit.

User interface changes

None.

API changes

None.

Data model changes

None.

Why this should be an RC target

Data is corrupted. As per the problem report: If there's a langcode with a dash in it, then nid is mistakenly used as the vid, so Drupal is writing rows to the node_field_revision with the nid of the nodes in question set as the value for vid.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

mcdruid created an issue. See original summary.

mcdruid’s picture

Patch which uses the PATH_SEPARATOR predefined constant instead of a '-'.

I'm open to suggestions about better alternatives to use as this separator; it looks like it could be just about anything other than a '-' so long as it's valid HTML when used in the value attribute of a form element.

To be clear, the difference this makes in the form HTML is that we get e.g.

<input type="checkbox" class="form-checkbox" value="en-gb:1234" name="node_bulk_form[0]" id="edit-node-bulk-form-0" data-drupal-selector="edit-node-bulk-form-0">

...instead of:

<input type="checkbox" class="form-checkbox" value="en-gb-1234" name="node_bulk_form[0]" id="edit-node-bulk-form-0" data-drupal-selector="edit-node-bulk-form-0">

mcdruid’s picture

Version: 8.1.x-dev » 8.0.x-dev
Priority: Normal » Major
Status: Active » Needs review
Gábor Hojtsy’s picture

Status: Needs review » Needs work
Issue tags: +Needs tests, +D8MI, +language-content, +sprint

Thanks, good find. This is indeed bad. We should have tests to prove it did not work with en-gb, etc. Extending the existing test should be fine and hopefully doable easily. Apart from that:

+++ b/core/modules/system/src/Plugin/views/field/BulkForm.php
@@ -24,6 +24,8 @@
+const BULK_FORM_KEY_SEPARATOR = PATH_SEPARATOR;
+

If its only used within this class, it should be a class constant, not a global one.

Also I would just define it as ':', its more predictable across environments.

mcdruid’s picture

Thanks for the really useful suggestions Gábor.

Here's an updated patch using a class constant simply set to ':'.

I've made a separate patch for the (very simple) update to Drupal\node\Tests\Views\BulkFormTest as this makes it easier to verify that tests fail - as expected - before the patch which changes the separator is applied.

I'm happy to provide one overall patch if that would be useful.

I originally tried to change 'fr' to 'fr-ca' in the test in order to include a language code with a region subtag, but for some reason I don't fully understand that caused problems. Simply changing 'fr' to 'en-gb' had the expected result; I'm seeing 5 failures before the separator patch is applied e.g.:

1: British English translation has been unpublished

"pvTxzv7z [2:en-gb] (Original translation) - The following content translations will be deleted:" found

With the patch applied to core/modules/system/src/Plugin/views/field/BulkForm.php I get 100% passes again.

Status: Needs review » Needs work

The last submitted patch, 5: better_bulk_form_key_separator_test-2599246-5.patch, failed testing.

mcdruid’s picture

Status: Needs work » Needs review
FileSize
8.99 KB

Ok, so that test fail was actually what we expected to happen, and what Gábor asked for:

We should have tests to prove it did not work with en-gb

Here's a patch which combines changes from both of the patches which I posted separately before. There are no actual changes compared to the patches in #5.

I expect this patch to pass tests.

Gábor Hojtsy’s picture

Issue summary: View changes
Status: Needs review » Reviewed & tested by the community
Issue tags: -Needs tests +rc target triage

Thanks, that looks good :) Yay! Hope to get this in an RC, it would be a big problem to release without this fixed.

The last submitted patch, 5: better_bulk_form_key_separator_test-2599246-5.patch, failed testing.

Status: Reviewed & tested by the community » Needs work

The last submitted patch, 7: better_bulk_form_key_separator-2599246-6.patch, failed testing.

Gábor Hojtsy’s picture

Patch does not apply anymore :/

druprad’s picture

Issue tags: +Needs reroll

+Needs reroll

mcdruid’s picture

Assigned: Unassigned » mcdruid

re-rolling

mcdruid’s picture

Assigned: mcdruid » Unassigned
Status: Needs work » Needs review
Issue tags: -Needs reroll
FileSize
8.99 KB

This collided with #2598620: Wrong assertion on BulkFormTest so required a simple manual fix:

<      $this->assertTrue($node->isPublished(), 'Node has been made sticky');
---
>      $this->assertTrue($node->isSticky(), 'Node has been made sticky');

That was the only change I had to make.

Gábor Hojtsy’s picture

Status: Needs review » Reviewed & tested by the community

Yay, thanks!

Status: Reviewed & tested by the community » Needs work

The last submitted patch, 14: better_bulk_form_key_separator-2599246-14.patch, failed testing.

The last submitted patch, 5: better_bulk_form_key_separator_test-2599246-5.patch, failed testing.

The last submitted patch, 7: better_bulk_form_key_separator-2599246-6.patch, failed testing.

The last submitted patch, 14: better_bulk_form_key_separator-2599246-14.patch, failed testing.

mcdruid’s picture

Status: Needs work » Needs review
FileSize
8.99 KB

Oops - uploaded the wrong version of that patch, sorry.

mcdruid’s picture

Status: Needs review » Reviewed & tested by the community

Back to RTBC per Gábor in #15, now that I've submitted the correct patch.

The last submitted patch, 5: better_bulk_form_key_separator_test-2599246-5.patch, failed testing.

The last submitted patch, 7: better_bulk_form_key_separator-2599246-6.patch, failed testing.

The last submitted patch, 14: better_bulk_form_key_separator-2599246-14.patch, failed testing.

Gábor Hojtsy’s picture

Issue summary: View changes
effulgentsia’s picture

Issue tags: -rc target triage +rc target

I discussed this with @catch, and yep, data corruption bugs are good to fix during RC rather than punt to after release.

effulgentsia’s picture

Status: Reviewed & tested by the community » Needs work
+++ b/core/modules/system/src/Plugin/views/field/BulkForm.php
@@ -36,6 +36,11 @@ class BulkForm extends FieldPluginBase implements CacheableDependencyInterface {
+   * Separator used in bulk_form_keys.
+   */
+  const BULK_FORM_KEY_SEPARATOR = ':';
+

Can we add a comment for why we can be confident that langcode, ID, and revision ID cannot have ':' in them?

mcdruid’s picture

Good idea. I'll make some notes here, then work on a suggested comment to accompany the definition of BULK_FORM_KEY_SEPARATOR.

Language tag

The comments in \Drupal\Core\Language\LanguageInterface refer to:

  /** 
   * Language code referring to the default language of data, e.g. of an entity.
   *
   * See the BCP 47 syntax for defining private language tags:
   * http://www.rfc-editor.org/rfc/bcp/bcp47.txt
   */
  const LANGCODE_DEFAULT = 'x-default';

That RFC states:

2.1. Syntax

A language tag is composed from a sequence of one or more "subtags",
each of which refines or narrows the range of language identified by
the overall tag. Subtags, in turn, are a sequence of alphanumeric
characters (letters and digits), distinguished and separated from
other subtags in a tag by a hyphen ("-", [Unicode] U+002D).

I'd think that confirms that a ':' is safe here.

IDs

\Drupal\Core\Entity\EntityTypeInterface includes a comment for the getKeys() method ( / function):

  /** 
   * Gets an array of entity keys.
   *
   * @return array
   *   An array describing how the Field API can extract certain information
   *   from objects of this entity type:
   *   - id: The name of the property that contains the primary ID of the
   *     entity. Every entity object passed to the Field API must have this
   *     property and its value must be numeric.
   *   - revision: (optional) The name of the property that contains the
   *     revision ID of the entity. The Field API assumes that all revision IDs
   *     are unique across all entities of a type. If this entry is omitted
   *     the entities of this type are not revisionable.

So I think that would suggest that we're safe to assume id (and revision id?) will always be numeric.

However, \Drupal\Core\Entity\EntityInterface defines the id:

  /** 
   * Gets the identifier.
   *
   * @return string|int|null
   *   The entity identifier, or NULL if the object does not yet have an
   *   identifier.
   */
  public function id();

...which suggests it could be a string (as opposed to "its value must be numeric.").

I'll keep researching, but would welcome any input on whether we can be confident that the ids will always be numeric.

Gábor Hojtsy’s picture

@mcdruid: I think we need to make some assumptions, but the dash assumption before this patch was bad :) If we think the entity IDs may be strings, then there is no definite character to rely on for separators I think.

mcdruid’s picture

Hmm... it seems it's not safe to assume that entity IDs will be numeric.

See, for example, #1823494: Field API assumes serial/integer entity IDs, but the entity system does not where one of the comments talks about using ids from an external system in the form of "namespace:1234", which would be a problem with this this patch as it stands.

If we want to be totally confident that there won't be any collision between the character(s) we use as a separator, and the actual data that's being embedded in the value of this key, perhaps we should use something like base64 encoding? That way we know for certain which characters will be in the (encoded) data, and we can safely choose a separator which will never occur in the data itself. A colon would be a safe choice as it should never appear in base64.

Some simple test code as an example:

const BULK_FORM_KEY_SEPARATOR = ':';

function generate_key($lang, $id, $vid = null) {
  $parts = array($lang, $id, $vid);
  $parts = array_filter($parts);
  $parts = array_map('base64_encode', $parts);
  $key = implode(BULK_FORM_KEY_SEPARATOR, $parts);
  return $key;
}

function decode_key($key) {
  $parts = explode(BULK_FORM_KEY_SEPARATOR, $key);
  $parts = array_map('base64_decode', $parts);
  return $parts;
}

$key = generate_key('en-gb', 'ab:cd', 5678);
print_r(array('key' => $key, 'parts' => decode_key($key)));

This will safely transmit the value of the ID despite the fact that it contains the character we're using as a separator:

$ php test.php
Array
(
    [key] => ZW4tZ2I=:YWI6Y2Q=:NTY3OA==
    [parts] => Array
        (
            [0] => en-gb
            [1] => ab:cd
            [2] => 5678
        )

)

It's not exactly pretty, and it means the keys are not really human-readable any more, but I think this approach gets around making any assumptions about what will be in the inputs that go into the key.

I'll have a look at getting this into the existing code and adding to the test coverage.

mcdruid’s picture

Status: Needs work » Needs review
FileSize
9.2 KB
854 bytes

New patch which runs the parts of the key through base64_encode on the way in, and base64_decode on the way back out.

Example markup this produces:

<input type="checkbox" class="form-checkbox" value="ZW4=:Mg==" name="node_bulk_form[0]" id="edit-node-bulk-form-0" data-drupal-selector="edit-node-bulk-form-0">

The bulk node form seems to work okay with this in place, and running (patched) Drupal\node\Tests\Views\BulkFormTest locally I get 100% passes.

As for adding additional test coverage (e.g. check what happens when an entity id is non-numeric and / or contains the separator character), that can't go into node's BulkFormTest, as nids (and vids) are always integers. I'll see if there's a different test that it would make sense to add to, or whether we'd need to add a new test.

Gábor Hojtsy’s picture

Status: Needs review » Needs work

I don't think we go to this length elsewhere to support string ids but well, I think that will ultimately be a committer or component maintainer decision.

+++ b/core/modules/system/src/Plugin/views/field/BulkForm.php
@@ -464,7 +469,11 @@ protected function calculateEntityBulkFormKey(EntityInterface $entity, $use_revi
+    // Base64 encode the parts of the key ¶
+    //  to ensure no collision with the separator.

Extra whitespace at end of first line. You should break the line as soon as needed but not sooner.

mcdruid’s picture

Status: Needs work » Needs review
FileSize
9.25 KB
690 bytes

Thanks Gábor; new patch hopefully fixes the comment.

I agree that the Base64 seems like "a sledgehammer to crack a nut" in most cases (e.g. node, user etc..) but it does seem mostly harmless and should allow for any entities to be handled by the bulk form.

As you say, happy to leave this to the discretion of a committer or component maintainer; if we don't need the Base64 encoding to cater for arbitrary string ids, the patch from #20 should be sufficient.

Gábor Hojtsy’s picture

Status: Needs review » Reviewed & tested by the community

Yay, thanks!

The last submitted patch, 5: better_bulk_form_key_separator_test-2599246-5.patch, failed testing.

The last submitted patch, 7: better_bulk_form_key_separator-2599246-6.patch, failed testing.

The last submitted patch, 14: better_bulk_form_key_separator-2599246-14.patch, failed testing.

alexpott’s picture

Status: Reviewed & tested by the community » Needs work
+++ b/core/modules/system/src/Plugin/views/field/BulkForm.php
@@ -36,6 +36,11 @@ class BulkForm extends FieldPluginBase implements CacheableDependencyInterface {
   /**
+   * Separator used in bulk_form_keys.
+   */
+  const BULK_FORM_KEY_SEPARATOR = ':';
+

Given that it is technically possible to have string entity IDs I'm not too sure about this separator either... how about using a tab character or something.

Gábor Hojtsy’s picture

Hm. Using a tab character would also eliminate the need for base64 encoding.

mcdruid’s picture

Couldn't a tab character also appear in an entity ID if they can be arbitrary strings (as unlikely as that seems)?

Gábor, I think you were right when you said:

If we think the entity IDs may be strings, then there is no definite character to rely on for separators

alexpott made a very sensible suggestion in IRC involving serializing the key parts; this means not worrying about choosing one or more safe characters to use as a separator. We agreed base64 encoding probably still makes sense as we have no idea what characters might be in the actual IDs and therefore whether they'd produce valid HTML even inside a serialised string.

So, here's a new patch where the key parts are first serialized and then base64 encoded. This means we don't need a separator at all, so the class constant for that is gone.

This produces markup like this in the bulk form (which I've checked works as expected):

<input type="checkbox" class="form-checkbox" value="YToyOntpOjA7czoyOiJlbiI7aToxO3M6MToiMiI7fQ==" name="node_bulk_form[0]" id="edit-node-bulk-form-0" data-drupal-selector="edit-node-bulk-form-0">

Drupal\node\Tests\Views\BulkFormTest passes when I run it locally with this version.

Gábor Hojtsy’s picture

Status: Needs review » Needs work

I would stear clear of unserializing data that the client might have tempered with. See https://wiki.php.net/rfc/secure_unserialize

unserialize() in PHP has certain security issues, brought by the fact that serialized data can include objects with data, and once these objects are instantiated, destructors are called when they are destroyed. This could allow to inject serialized data into application which may perform actions not intended by application writer, such as carelessly assuming unserialized data have certain types and calling functions on them which may have different meaning for different objects. Besides destructors, functions like __toString, __call, etc. can be vulnerable.

Gábor Hojtsy’s picture

As per http://php.net/unserialize PHP 7 comes with an option to avoid unserializing arbitrary classes, that does not seem to be possible in earlier versions.

mcdruid’s picture

Good catch Gábor, thank you.

So how about using json_encode / json_decode instead? IIUC they don't have the same security risk as unserialize (e.g. http://stackoverflow.com/questions/12292029/is-it-safe-to-call-json-deco... ), and there's also the advantage that they're apparently faster and produce less verbose output.

Example markup (the value is indeed shorter than with serialize):

<input type="checkbox" class="form-checkbox" value="WyJlbiIsIjEiXQ==" name="node_bulk_form[0]" id="edit-node-bulk-form-0" data-drupal-selector="edit-node-bulk-form-0">
Gábor Hojtsy’s picture

Status: Needs review » Reviewed & tested by the community

That looks fine to me. I am not entirely convinced the base64 is required, given that Drupal should properly escape the attribute contents when printing, but it definitely does not hurt. Especially if alexpott agreed with it :)

The last submitted patch, 5: better_bulk_form_key_separator_test-2599246-5.patch, failed testing.

The last submitted patch, 7: better_bulk_form_key_separator-2599246-6.patch, failed testing.

The last submitted patch, 14: better_bulk_form_key_separator-2599246-14.patch, failed testing.

alexpott’s picture

Status: Reviewed & tested by the community » Fixed

I think the base64encode is fine - it makes it more difficult to tamper with a node ID and nice catch re the security concerns of unserialize.

Committed 103cbc8 and pushed to 8.0.x. Thanks!

  • alexpott committed 103cbc8 on 8.0.x
    Issue #2599246 by mcdruid, Gábor Hojtsy: data in bulk_form_key should...

Status: Fixed » Needs work

The last submitted patch, 43: better_bulk_form_key_separator-2599246-43.patch, failed testing.

Gábor Hojtsy’s picture

Status: Needs work » Fixed
Issue tags: -sprint

Yay, thanks!

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.