Updated: Comment #0

Problem/Motivation

Follow-up from #2136197-75: Move field/instance/widget/formatter settings out of annotation / plugin definition.

Currently this code assumes that field settings always have values as plain structure

+++ b/core/lib/Drupal/Core/TypedData/TypedDataManager.php
@@ -259,6 +259,13 @@ public function getPropertyInstance(TypedDataInterface $object, $property_name,
     }
     $key = $definition->getDataType();
     if ($settings = $definition->getSettings()) {
+      // Do not throw notices if settings values is array.
+      foreach ($settings as $k => $v) {
+        if (is_array($v)) {
+          debug($v, "key: $k");
+          $settings[$k] = ''; // stub for array settings or maybe serialize?
+        }
+      }
       $key .= ':' . implode(',', $settings);
     }
     $key .= ':' . $object->getPropertyPath() . '.';

This implode() throws notice:
Array to string conversionimplode(',', Array) Drupal\Core\TypedData\TypedDataManager->getPropertyInstance(Object, 0, NULL) Drupal\Core\TypedData\Plugin\DataType\ItemList->createItem(0) Drupal\Core\Field\FieldItemList->__construct(Object, NULL, NULL) Drupal\Core\TypedData\TypedDataManager->createInstance('list', Array) Drupal\Core\TypedData\TypedDataManager->create(Object) Drupal\system\Tests\Entity\EntityFieldTest->testEntityConstraintValidation() Drupal\simpletest\TestBase->run() simpletest_script_run_one_test('227', 'Drupal\system\Tests\Entity\EntityFieldTest')

Proposed resolution

Allow field settings with nested arrays,

Remaining tasks

Find a way to replace implode() or refactor this code

User interface changes

no

API changes

no

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

andypost’s picture

andypost’s picture

FileSize
994 bytes

Let's just calculate a hash, not sure how this affects performance

Berdir’s picture

Status: Needs review » Needs work

Do we even need the hash? As long as you don't have more than 2-3 settings, just the json string is shorter than the hash and likely also easier to debug.

Also needs a comment, something like "Add a string representation of the settings to the key to ensure that different objects are created for data types with different settings."?

andypost’s picture

Status: Needs work » Needs review
FileSize
1.19 KB

Added a comment, also explained the reason for hash

fago’s picture

I see. Given we've objects as data definitions now, I wonder whether we couldn't just the spl object id for identifying definitions and start with that as the key? That means, that as long as we start off with shared definition objects (what we really should make sure it's the case anyway), we'll have the prototyping.

Given that, I think we could even vastly simplify the logic and just prototype based on a passed in $definition object of the data type to be created.

andypost’s picture

Nice idea, let's see

Status: Needs review » Needs work

The last submitted patch, 6: 2221577-hash-settings-6.patch, failed testing.

andypost’s picture

Status: Needs work » Needs review
FileSize
896 bytes

Seems list part is needed

Status: Needs review » Needs work

The last submitted patch, 8: 2221577-hash-settings-8.patch, failed testing.

andypost’s picture

The last submitted patch, 8: 2221577-hash-settings-8.patch, failed testing.

andypost’s picture

Status: Needs work » Needs review
FileSize
834 bytes

Discussed this with @fago, we still need the property path here

Status: Needs review » Needs work

The last submitted patch, 13: 2221577-hash-settings-12.patch, failed testing.

andypost’s picture

@fago there's still a lot of exceptions somehow

yched’s picture

Weird - I ran a couple of those tests locally, they passed.

Status: Needs work » Needs review
andypost’s picture

FileSize
931 bytes

Suppose one-line should be fine

andypost’s picture

Green! That's was bot flux...

alexpott’s picture

Status: Needs review » Reviewed & tested by the community
Related issues: +#2271419: Allow field types, widgets, formatters to specify config dependencies
alexpott’s picture

Status: Reviewed & tested by the community » Needs work

Hmmm... actually i think the solution is not quite right.

+++ b/core/lib/Drupal/Core/TypedData/TypedDataManager.php
@@ -253,11 +253,7 @@ public function getPropertyInstance(TypedDataInterface $object, $property_name,
-    $key = $definition->getDataType();
-    if ($settings = $definition->getSettings()) {
-      $key .= ':' . implode(',', $settings);
-    }
-    $key .= ':' . $object->getPropertyPath() . '.';

In order for this static cache to work then we have to be using exactly the definition object. That was not the case with the old code.

What is wrong with $key .= ':' . serialize($settings);?

andypost’s picture

The reason to not use serialization is performance that's why I initially used a hash()

So to prevent duplication of definition objects the spl_object_hash() looks better approach so no static cache needed

alexpott’s picture

@andypost but my point is that once we use spl_object_hash() the static cache that is being set up here is less likely to be used because using that means that exactly the same field definition object needs to be pass in - not just the same values - the same instantiated object.

Berdir’s picture

Which should be the case if you create multiple nodes of the same bundle for example. Also used to be the case to be the same for all bundles for base fields, not sure if that changed. With overrides, certainly.

yched’s picture

Priority: Major » Critical
Status: Needs work » Reviewed & tested by the community

This is blocking #2271419: Allow field types, widgets, formatters to specify config dependencies which is critical, so this is critical ?

I tend to agree with @andypost and @Berdir here. The perf optim here are mostly about "this is the *same* definition object". I'd think this is good enough, and more robust, than building a compound key with "all the things that matter, including serializing stuff" ?

The existence of overrides means definitions *objects* differ for base fields across bundles, sure. I'd say that's fair game ?

Since this is also the approach suggested by @fago, temptatively kicking back in @alexpott's yard ?

yched’s picture

FileSize
1003 bytes
828 bytes

That could really use a comment, though.

yched’s picture

Also, no need to use both : and . as separators between the spl_hash() and the propertyPath().

swentel’s picture

+++ b/core/lib/Drupal/Core/TypedData/TypedDataManager.php
@@ -253,11 +253,10 @@ public function getPropertyInstance(TypedDataInterface $object, $property_name,
+    // - and for the same parent hiararchy.

It's hierarchy no ?

alexpott’s picture

Wasn't the . separator used because getPropertyPath() returns stuff with dots in - see

/**
   * {@inheritdoc}
   */
  public function getPropertyPath() {
    if (isset($this->parent)) {
      // The property path of this data object is the parent's path appended
      // by this object's name.
      $prefix = $this->parent->getPropertyPath();
      return (strlen($prefix) ? $prefix . '.' : '') . $this->name;
    }
    // If no parent is set, this is the root of the data tree. Thus the property
    // path equals the name of this data object.
    elseif (isset($this->name)) {
      return $this->name;
    }
    return '';
  }
yched’s picture

Status: Reviewed & tested by the community » Needs work

Hem, so much for harmless post-RTBC changes...

But actually, I looked into the surrounding code a bit more, and now I'm just confused :-/

I overlooked that that line at the top of TDM::getPropertyInstance() so far:
$definition = $object->getRoot()->getDataDefinition();

The $definition that is being looked at to answer "do I have a prototype I can reuse for the new Data object I'm asked to create" is the definition of the root object.
(The method then adds to the confusion by later using the same $definition var name for "the DataDefinition of the actual object I need to create" - i.e the FieldItemList, FieldItem, etc...)

So the logic currently is : re-use the prototype if we have already created objects at the same property path (e.g body.0) on a "similar" root object - for some definition of "similar".
In an Entity/FieldItemList/FieldItem context, that root object is always the entity, and $definition is the EntityDataDefinition for the entity.

Then :
- I have to admit I don't understand what this specific issue here is about :-)
$entity_data_definition->getSettings() always returns an empty array anyway.
- As a side note, not sure what the "if ($definition instanceof ListDataDefinition) {" block is used for, it looks like it's never executed - at least not when manipulating entities/fields. Was added in #2112239: Convert base field and property definitions, (discussed in #11 / #13 / #23), so was definitely in a "field" context. For the record, it's the same issue that added the "settings" lookup we're discussing here.
- The $key used to determine if a prototype can be re-used is, in current HEAD, 'entity:[entity_type]:[bundle]:.[property_path]".
We never look at the field definition. "Same entity type + bundle + field name ? Reuse the prototype". I don't know, that sounds... overly accepting ?

I'm a confused atm about what exact logic we'd want here, but yeah, we cannot go with "similar = same runtime EntityDataDefinition object according to spl_object_hash()", because after #2002138: Use an adapter for supporting typed data on ContentEntities, EntityDataDefinition objects are created at runtime for each entity (Entity::getTypedData() / EntityTypeData::createFromEntity())

As a proof - if I add the following debug code in TDM::getPropertyInstance() :

if (no protottype for the $key) {
  print('create prototype for $key');
  ... 
}
// Debug:
else {
  print('reuse prototype for $key');
}

Then running the following session in "drush php" (node 1 and 2 are both article nodes):
HEAD:

> node_load(1)->title->value;
create prototype for entity:node:article:.title
> node_load(2)->title->value;
reused prototype for entity:node:article:.title

PATCH (spl_object_hash):

> node_load(1)->title->value;
create prototype for entity:node:article:.title
> node_load(2)->title->value;
create prototype for entity:node:article:.title

In short :
- the code and logic in TDM::getPropertyInstance() could really use some cleanup / clarification / possibly rethinking ?
- in the current logic, spl_object_hash() is not an option...
- but it's not clear to me what's the actual issue around field settings we're trying to address here to begin with :-)

alexpott’s picture

+++ b/core/lib/Drupal/Core/TypedData/TypedDataManager.php
@@ -253,11 +253,10 @@ public function getPropertyInstance(TypedDataInterface $object, $property_name,
-      $key .= ':' . implode(',', $settings);

@yched this is the problem we are trying to fix since this can fail if $settings is a multi-dimensional array - this

$entity_data_definition->getSettings() always returns an empty array anyway.

Is not always true.

yched’s picture

@alexpott: ok, then what I don't get is when is this not an empty array ? It was in the couple cases I stepped through.
+ I might be overlooking something, but I don't see what would be the settings of an EntityDataDefinition ?

Maybe it would help to have an example dump of what is found in getSettings() if not empty ?
Or more simply put : how do we reproduce the bug discussed here ? :-)

alexpott’s picture

Status: Needs work » Needs review
FileSize
1.28 KB
618 bytes

Explanation in #2271419-49: Allow field types, widgets, formatters to specify config dependencies.

Test only patch attached.

Patch attached should also address the cache misses reported in #30

Status: Needs review » Needs work

The last submitted patch, 33: 2221577.test-only.patch, failed testing.

yched’s picture

Status: Needs work » Reviewed & tested by the community

OK, so the fails are caused by EntityFieldTest::testEntityConstraintValidation() creating a FieldItemList directly without a parent entity.

Then, when an Item is created, getPropertyInstance() is called with $object = the FieldItemList
- $object->getRoot() = $object, since it has no known parent
- $definition is the ListDataDefinition instead of the EntityDataDefinition
- special code for "if ($definition instanceof ListDataDefinition) { use the ItemDefinition instead }"
- that ItemDefinition is the FieldItemDataDefinition, and its ->settings() are the field settings...

This is not supposed to happen in the regular flow (we don't support FieldItemLists without a parent entity), but we do have a test that does it. This screws the "prototype cache" logic, but no biggie in a test.

Also, I guess outside of the "Entity/FieldItemList" case, we do want to support standalone ItemLists without a parent, and with ItemDataDeffinitions that might have non-scalar settings.

So, yeah, serializing looks like the only valid option in that case.

I do intend to try to cleanup / clarify the content of getPropertyInstance(), but for that issue here, #33 should be good ?

andypost’s picture

Issue tags: +needs profiling

As this one blocks other issues the fix is fine.
OTOH memory consumption should grow with a straight serialized data so needs profiling.

+++ b/core/lib/Drupal/Core/TypedData/TypedDataManager.php
@@ -255,7 +255,7 @@ public function getPropertyInstance(TypedDataInterface $object, $property_name,
-      $key .= ':' . implode(',', $settings);
+      $key .= ':' . serialize($settings);

I still think that we need to hash serialized serialized data to make data structure smaller.
Also having ":" does not make any sense because serialized data could have a lot of it

yched’s picture

I'm not really worried about memory consumption. As explained above, $definition->getSettings() is an empty array in of the 99% actual use cases - at least in the current logic for prototype reuse that is based on comparing definitions of the root object.

Also, the presence of ':' in the serialized data is not really problematic, since we're only concatenating cache key parts, and never try epxlode them back apart.

But well, since the case for serialize() is pretty rare anyway, we could also chose to serialize() + hash. No strong opinion, TBH.

alexpott’s picture

FileSize
1.57 KB
986 bytes

I'm not really worried about memory consumption.

Well better safe that sorry - let's hash the output of serialise so the size of the output is predictable.

yched’s picture

Alright. Confirming RTBC.

Dries’s picture

Status: Reviewed & tested by the community » Fixed

Committed to 8.x. Thank you!

  • Dries committed ccbe585 on
    Issue #2221577 by andypost, alexpott, yched: Fix assumption that field...
yched’s picture

yched’s picture

About the use of Crypt::hashBase64() :

#2346763: Improve views field rendering performance by caching entity display objects has a similar case of serializing an array of settings for a static cache key.
Over there, @msonnabaum used crc32(serialize($array)).

Quoting him from #2346763-10: Improve views field rendering performance by caching entity display objects :

I used crc32 because it's the fastest way to hash something for non-cryptographic purposes. It has a higher collision rate, but for a static cache with other strings attached, the risk of that seems extremely low.

Using sha256, base64-ing, then doing a string replace as Crypt::hashBase64 does, is totally overkill.

The difference is perhaps a micro-optimization, but I think it's generally a bad habit to not distinguish between cryptographic and non/cryptographic hash requirements

So, should we change to crc32() here as well for consistency ?

andypost’s picture

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.