toArray() now uses the config schema to detect what to export. That is slow.

In my profiling, with a bunch of config fields, ConfigEntityBase::toArray() is called 57 times and that takes 9.3ms, 2.2% of the whole request. Despite using APC as the cache backend for the config schema.

Also, EntityManager::getFieldDefinitions(), with 3 calls, in my case takes 12ms (which includes the 9.3ms from above), that is more than the other 52 calls to the apc cache backend combined.

It is also problematic because a lot can go wrong and FieldInstanceConfig::toArray() also calls getFieldStorageDefinition()->getType(), so while waking up the field instance, we already call out/back to the entity manager for the storage definitions/FieldStorageConfig, which is then woken up too.

Can we find a way to make __wakup() work that does not involve toArray() and __construct() We should have most of the information available now without those tricks, as the field is lazy loaded?

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

Berdir’s picture

Title: Field storage and instance call toArray() in __wakeup() » Field storage and instance call toArray() in __wakeup() is very slow, remove it?
Status: Active » Needs review
FileSize
1.71 KB

This does seem too easy :)

pounard’s picture

As long as you don't have any resources references or other objects references, there's great chances that basic PHP serialization remains safe.

alexpott’s picture

Status: Needs review » Reviewed & tested by the community

@pounard: Yep the __sleep() will remove all of these so removing the wakeUp is fine.

@Berdir nice one! Good to go.

Anonymous’s picture

nice, berdir++

yched’s picture

The logic in __sleep() / __wakeup() so far was :
- __sleep() saves to the same array that is used for regular config storage
- on that array, __wakeup() performs the same steps than when reading from config storage (i.e pass the array through __construct())

With the patch, we "downcast" with toArray(), but no longer "upcast" through __construct(). This happens to work now because FieldStorageConfig / FieldInstanceConfig __construct() have been simplified quite a bit in the past couple weeks, less critical stuff happens there. But still, that feels incorrect. The contract for toArray() is "scale the object down to an array of values for storage, those values are what your __construct() will receive when the object is re-instantiated".

If it's toArray() that is costly, what if we just remove the toArray() call, but keep $this->__construct(get_object_vars($this)); ?
TBH, I don't really remember why the array_intersect_key() thing was added...

Also : the __sleep() implementations have this :

    // Serialize $entityTypeId property so that toArray() works when waking up.
    $properties[] = 'entityTypeId';

Not needed anymore if __wakeup() no longer calls toArray().

catch’s picture

Status: Reviewed & tested by the community » Needs review
Berdir’s picture

We do not use toArray() values. We only use the mix of toArray() and the object properties to figure out which object properties we want to serialize (the part of the values that exist as properties on the class), then we return those keys.

Then PHP knows which properties of our object should be serialized and sets them back automatically before it calls __wakeup(). The current implementation there then collects those *again* and then sends them to the constructor to be re-assigned a second time.

This was only necessary while we were actively loading stuff in the constructor as we needed that to happen. I really don't see the point anymore in doing this. Instead of toArray(), we could also call it with a hardcoded list of keys, which would make it a bit more explicit, but then we'd have to maintain that property list in case we want to change it.

The logic you describe is how SerializableInterface works, our current implementation is a weird mix of __sleep/__wakeup() and that interface because we had to make that __construct() call work somehow (or duplicate the relevant code in __wakeup()).

I can't see where entityTypeId would be used in that class at all, I guess that was renamed at some point and now just works?

yched’s picture

Hm, right, my #5 is outdated, "serialize to toArray() like on regular config storage" was what we were doing initially when we used php's Serializable interface.

But then we moved to __sleep() / __wakeup() because of those weird fails across PHP versions, and __sleep() doesn't let specifiy *values* to serialize, only the *names* of the properties of the current object to include in the serialization. So we used array_keys(toArray()) as a way to separate "basic properties" from "runtime, lazy-loaded or lazy-computed properties that shouldn't be serialized". We need to intersect with the actual properties present in the actual object, because FieldInstanceConfig::toArray() adds 'field_type' for config storage, which is currently not present as an actual property in the object, and would trigger notices.

[Note that we *should* probably keep "field_type" as an actual property in FieldInstanceConfig, since you pointed out there are critical code paths where we only load the FieldStorage to be able to get the field type - and we do save that info in the yml file, just to ditch it on load... But that's probably a different task]

I still think the behavior described in #5 (mimick what happens on regular config storage, i.e serialization as storage in a different backend) would be the right generic approach for all ConfigEntities, since they too get serialized, but, unlike ContentEntities, don't have a generic mechanism - I tried that in #1977206: Default serialization of ConfigEntities, but stalled. Looks like __sleep() / __wakeup() won't allow that though.

Meanwhile, I guess I'm fine with the current patch : keep __sleep(), remove __wakeup() and stop going through the __construct(), since we don't do anything important in there.

Regarding $properties[] = 'entityTypeId':
It was added in #2256679: Use config schema to determine which config entity properties to export regardless of whether they are public or protected, which added a base implementation of ConfigEntityBase::toArray() that, well, is suddenly more expensive :-), and that relies on $this->getEntityType() (and thus $this->getEntityTypeId(), and thus $this->entityTypeId).

It wouldn't be needed anymore if __wakeup() stopped calling toArray() but still called __construct(). But if we stop calling __construct() too, then I guess it's still needed, otherwise it looks like an unserialized object would have no $this->entityTypeId at all ? We'd just need to update the comment in __sleep(), because it wrongfully points at a toArray() call in __wakeup() that no longer exists.

Berdir’s picture

Yep, and those tricks with the intersection of the toArray() keys and object properties is pretty hard to understand stuff.

What about being very explicit about what we do not want to serialize? See the attached patch, no magic, just getting all the properties, unset the 2-3 we don't want and then return that. We end up serializing a handful of additional properties from the base classes too that we don't really need, but on the other hand, it's also much more reliable in case Entity or ConfigEntityBase suddenly adds something that it needs to stick there...

[Note that we *should* probably keep "field_type" as an actual property in FieldInstanceConfig, since you pointed out there are critical code paths where we only load the FieldStorage to be able to get the field type - and we do save that info in the yml file, just to ditch it on load... But that's probably a different task]

Yep, already discussed with @alexpott and he agreed that makes sense. There's a lot more unfortunately for which we load the field storage, but it's a start.

yched’s picture

Well, that's precisely what I wanted to avoid with #1977206: Default serialization of ConfigEntities - have a generic serialization code that works for all config entity types, rather than have each ConfigEntity type figure out its serialization code (or break slient havoc if it doesn't - as we know, some entity types don't even realize they're being serialized by some stuff somewhere).

The current code in __sleep() (or the one in patch #1) could potentially work for all entity types. IIRC from #1977206: Default serialization of ConfigEntities, it was Views config entities that were choking, but then it's more Views that would be a special case that need specific overrides.

I'd personnally rather stick with #1 (with the comment fix mentioned in #7) and not lose all hopes of #1977206: Default serialization of ConfigEntities :-). Anyone feel free to disagree though, I won't oppose #9 if people favor that.

Berdir’s picture

Hm, that is true, but for me, the current code is too much magic, as mentioned, like including the entityTypeId like that. One thing that is for example then missing is the original ID, that is a not a problem for fields as you can't change their ID, but for config entities where that is allowed, this could result in nasty problems because it would break the rename detection if you try to save them.

Basically, you trade "not needing to know about a specific config entity type" with "needing to know about (internal) stuff of the base classes. If that happens in the base class, then the situation is again a bit different, as those are then your internals.

Anyway, both options are fine with me as well. Any other opinions?

yched’s picture

Status: Needs review » Reviewed & tested by the community

Well, internal properties like entityTypeId & originalID were taken care of by calling the __construct() on __wakeup() ;-)

But well,
- running the __construct() would make sense if what we stored were the values of toArray() (= what CMI storage does), and makes less sense if we store the values of the live object as we currently do. Storing the values of toArray() would require moving to Serializable instead of __sleep() / __wakeup(), so not for this issue.
- agreed with the reasoning that, ideally, serialization would be taken care of generically in ConfigEntityBase, and then that code having to deal with the internals of ConfigEntities is fine.

So, I'm fine with moving forward with #9 for now, with the hope that we can get #1977206: Default serialization of ConfigEntities right with Serializable.

yched’s picture

tim.plunkett’s picture

Version: 8.x-dev » 8.0.x-dev

This is running against old code, let's make sure it still passes.

alexpott’s picture

Status: Reviewed & tested by the community » Fixed

Committed 54a49d5 and pushed to 8.x. Thanks!

  • alexpott committed 54a49d5 on 8.0.x
    Issue #2313053 by Berdir: Fixed Field storage and instance call toArray...

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.