Problem/Motivation

The time to add a new field, and the time to delete a field, grows with the number of existing fields.

I've been doing some benchmarking of the entity field api to explore whether it could serve as the foundation for webform module in D8 (see Issue #2075941). This benchmarking uncovered the following behaviour in 8.0.0-beta9:

  • The time to add a new field grows with the existing field count. At 1,000 existing fields, the time is ~1 second.
  • The time to delete a field grows with the existing field count. At 1,000 fields, the time is ~10 seconds.
  • Field management, form and display configuration, and entity render times seem unaffected.

This behaviour could pose performance problems for sites with a large number of fields, webform or no.

Steps to Reproduce

I used the attached script of custom drush commands to benchmark. (Change filename to 'entityform.drush.inc' and place in .drush folder).

// Add 100 content types.
drush ebct node 100

// Add 10 fields to each content type.
drush ebaf node all 10

Then add/delete a field via the UI.

Proposed resolution

I'll preface this by saying I am relatively new to D8 core, but I believe the cause has to do with certain methods defined in EntityManagerInterface that fetch field config/field storage config definitions for all bundles of a single entity type, or for all bundles of all entity types. For example, EntityManager::getFieldMap() builds a map of all field definitions across all entities. This method is executed when a field is deleted, and xhprof tells me it's very costly (~9 seconds in the above benchmarking).

I'm not sure if this is related, but from poking around in the DB I also noticed that for each entity type, there exists a serialized array of all field_storage_config entities belonging to it. In the above benchmarking (1,000 total node fields), the size of the serialized array for the node entity type was approaching 1MB in size.

A solution would involve making these methods accept entity type and bundle arguments so that field configuration is only fetched in smaller pieces at a time. Code that uses these methods would need to be refactored. I suspect this would require some refactoring of the config architecture for fields as well, but I'm not familiar enough with those internals to say for sure.

Remaining tasks

1. Validate above results.
2. If validated, isolate causes and decide whether fixes are necessary.
3. If a fix is deemed necessary, create patch(es).

User interface changes

None that I can see. This is all back-end stuff.

API changes

Changes to the EntityManager as discussed above.
Possibly changes to the config/caching system for fields.

Files: 
CommentFileSizeAuthor
#30 entity_field_bench.tar_.gz11.98 KBmesch
#27 field-map-state-2473983-27-interdiff.txt27.56 KBBerdir
#27 field-map-state-2473983-27.patch34.21 KBBerdir
PASSED: [[SimpleTest]]: [PHP 5.4 MySQL] 91,933 pass(es). View
#18 field_map.php_.txt1.04 KBBerdir
#14 field-map-state-2473983-14-interdiff.txt9.5 KBBerdir
#14 field-map-state-2473983-14.patch13.29 KBBerdir
PASSED: [[SimpleTest]]: [PHP 5.4 MySQL] 91,697 pass(es). View
#7 field-map-state-2473983-7-interdiff.txt1.76 KBBerdir
#7 field-map-state-2473983-7.patch6 KBBerdir
FAILED: [[SimpleTest]]: [PHP 5.4 MySQL] 91,611 pass(es), 13 fail(s), and 1 exception(s). View
#5 field-map-state-2473983-5.patch5.93 KBBerdir
FAILED: [[SimpleTest]]: [PHP 5.4 MySQL] 89,220 pass(es), 599 fail(s), and 1,017 exception(s). View
entityform.drush_.txt6.02 KBmesch

Comments

mesch’s picture

Issue summary: View changes
hass’s picture

Category: Feature request » Bug report
Priority: Normal » Critical
Berdir’s picture

We actually discussed this yesterday at a dev days dinner with @fago and @amateescu.

And yes, we're aware of the getFieldMap() bottleneck, and we had two ideas on how to resolve that yesterday:

1) Kill it. But that might not be possible.
2) Make it a persistent storage, not a cache. Introduce new methods on entity manager to notify him about new fields + he listens on on some events, then maintain the field map in state/key value.

Note that I don't think that using bundles configurable fields like this is a scalable solution for webform. Even if we fix this, the configuration system is just not mean to store tens of thousands of configuration files. As I explained in my D8 contrib overview presentation yesterday at DDD, that's a limitation that the contact.module has, otherwise, it's a very interesting alternative if you just need a few contact forms on your site with a few fields.

Maybe you can use the widgets/formatters/field item classes anyway and create runtime FieldDefinition(Storage)Interface objects, that aren't exposed through entity manager....

Berdir’s picture

Assigned: Unassigned » Berdir

Might experiment with this a bit during my train ride...

Berdir’s picture

Status: Active » Needs review
FileSize
5.93 KB
FAILED: [[SimpleTest]]: [PHP 5.4 MySQL] 89,220 pass(es), 599 fail(s), and 1,017 exception(s). View

Status: Needs review » Needs work

The last submitted patch, 5: field-map-state-2473983-5.patch, failed testing.

Berdir’s picture

Status: Needs work » Needs review
FileSize
6 KB
FAILED: [[SimpleTest]]: [PHP 5.4 MySQL] 91,611 pass(es), 13 fail(s), and 1 exception(s). View
1.76 KB

Fixing some obvious bugs.

Status: Needs review » Needs work

The last submitted patch, 7: field-map-state-2473983-7.patch, failed testing.

dawehner’s picture

+++ b/core/lib/Drupal/Core/Entity/EntityManager.php
@@ -640,19 +641,34 @@ public function getFieldMap() {
+
+        $this->cacheSet($cid, $this->fieldMap, Cache::PERMANENT, array('entity_types'));

So is the plan to remove the cache there?

Berdir’s picture

Probably not, because we still have a the loop over the field definitions, which might still result in quite a lot of separate cache calls and memory usage.

My idea was to have the mix, with maintaining the per-bundle fields in state, and the base fields by looping over them, otherwise we'd have to keep a separate list of them and act on entity type creation and changes. And I don't think those are a scaling problem, we have much, much less entity types than entity types * their bundles.

dawehner’s picture

Ah I see, it makes sense to have kinda both. Just tagging the issue as we need to update and describe the implemented proposal

Berdir’s picture

Assigned: Berdir » Unassigned

The remaining test failures should be fixable by implementing onFieldDefinitionDelete() and remove the field again (remember to remove the field completely if no bundles are left) and call that from postDelete() in FieldConfig. They're all about field storage configs getting deleted automatically.

I was also wondering if we should update that and do a config entity query there instead for the check, but that's not really related.

I don't have time before tonight, so if anyone wants to work on this issue, this should be relatively easy to push forward (that method, documentation, unit tests will be harder...)

catch’s picture

Splitting out seems OK, but why use state? Using the cache API but ensuring it's write-through (and not cleared when it shouldn't be) should be enough?

Berdir’s picture

Status: Needs work » Needs review
FileSize
13.29 KB
PASSED: [[SimpleTest]]: [PHP 5.4 MySQL] 91,697 pass(es). View
9.5 KB

A write through cache is still a cache. It would get deleted during cache clears and we would have to spend 9s and tons of memory to rebuild the list.

catch’s picture

It would get deleted during cache clears

Which cache clears though? We're pretty close to not invalidating caches when we don't need to.

Berdir’s picture

drupal_flush_all_caches(). We still do that when people clear the cache manually, on update.php and so on.

I'll do some profiling/benchmarking with field creating and cache clears soon.

catch’s picture

drupal_flush_all_caches(). We still do that when people clear the cache manually, on update.php and so on.

Right I think that's preferable to putting caches into state to work around this. We could also remove the full cache clear from update.php if we want to - vast majority of updates don't need one, and rebuild.php or drush cc all can always be called separately.

The distinction between cache/state/key value is quite nuanced so core should really try to keep the separation clear. I've seen a (high-ish usage) Drupal 7 contrib module use variable_set()/variable_get() as a static cache/global replacement before.

Also drupal_flush_all_caches() or at least rebuild.php I'd expect to clear something like this and force a rebuild. If it got corrupted somehow (which is possible given no transaction support anywhere here) the only way to fix that would be to go in and manually delete the state entry.

Berdir’s picture

FileSize
1.04 KB

Strange.. did some profiling using the attached script, but I didn't quite get the results that I expected. the new approach was faster until I got to bundle + 20 fields #250, then it exploded from taking 3 seconds to 10+. HEAD has a similar effect and it is equally slow but it's slower before.

Looks like there are some other bottlenecks, profiling now....

HEAD:

0: Created bundle with 20 fields in 0.39s
1: Created bundle with 20 fields in 0.36s
2: Created bundle with 20 fields in 0.45s
3: Created bundle with 20 fields in 0.48s
4: Created bundle with 20 fields in 0.4s
5: Created bundle with 20 fields in 0.47s
6: Created bundle with 20 fields in 0.5s
7: Created bundle with 20 fields in 0.48s
8: Created bundle with 20 fields in 0.51s
9: Created bundle with 20 fields in 0.48s
...
284: Created bundle with 20 fields in 12.26s
285: Created bundle with 20 fields in 12.27s
286: Created bundle with 20 fields in 11.98s
287: Created bundle with 20 fields in 12.14s
288: Created bundle with 20 fields in 12.06s
289: Created bundle with 20 fields in 11.99s
290: Created bundle with 20 fields in 12.05s
291: Created bundle with 20 fields in 12.01s
292: Created bundle with 20 fields in 12.05s
293: Created bundle with 20 fields in 12.26s
294: Created bundle with 20 fields in 12.9s
295: Created bundle with 20 fields in 12.25s
296: Created bundle with 20 fields in 12.25s
297: Created bundle with 20 fields in 12.3s
298: Created bundle with 20 fields in 12.6s
299: Created bundle with 20 fields in 12.29s
300: Created bundle with 20 fields in 12.44s

Patch:

0: Created bundle with 20 fields in 0.3s
1: Created bundle with 20 fields in 0.4s
2: Created bundle with 20 fields in 0.37s
3: Created bundle with 20 fields in 0.34s
4: Created bundle with 20 fields in 0.38s
5: Created bundle with 20 fields in 0.37s
6: Created bundle with 20 fields in 0.45s
7: Created bundle with 20 fields in 0.42s
8: Created bundle with 20 fields in 0.54s
9: Created bundle with 20 fields in 0.58s
10: Created bundle with 20 fields in 0.59s
11: Created bundle with 20 fields in 0.42s
12: Created bundle with 20 fields in 0.45s
13: Created bundle with 20 fields in 0.48s
...
231: Created bundle with 20 fields in 3.85s
232: Created bundle with 20 fields in 3.79s
233: Created bundle with 20 fields in 3.83s
234: Created bundle with 20 fields in 3.71s
235: Created bundle with 20 fields in 4.35s
236: Created bundle with 20 fields in 4.35s
237: Created bundle with 20 fields in 4.64s
238: Created bundle with 20 fields in 5.42s
239: Created bundle with 20 fields in 5.76s
240: Created bundle with 20 fields in 6.34s
241: Created bundle with 20 fields in 6.8s
242: Created bundle with 20 fields in 7.11s
243: Created bundle with 20 fields in 7.94s
244: Created bundle with 20 fields in 8.59s
245: Created bundle with 20 fields in 8.56s
246: Created bundle with 20 fields in 8.47s
247: Created bundle with 20 fields in 8.49s
248: Created bundle with 20 fields in 8.7s
249: Created bundle with 20 fields in 10.29s
250: Created bundle with 20 fields in 11.38s
...
288: Created bundle with 20 fields in 12.86s
289: Created bundle with 20 fields in 12.77s
290: Created bundle with 20 fields in 12.87s
291: Created bundle with 20 fields in 12.77s
292: Created bundle with 20 fields in 12.96s
293: Created bundle with 20 fields in 13.05s
294: Created bundle with 20 fields in 12.99s
295: Created bundle with 20 fields in 13.23s
296: Created bundle with 20 fields in 13.1s
297: Created bundle with 20 fields in 13.16s
298: Created bundle with 20 fields in 13.1s
299: Created bundle with 20 fields in 13.28s
300: Created bundle with 20 fields in 13.25s
Berdir’s picture

Interesting, looks like I've been running into two problems..

a) Config save getting slow if you have a lot of config. 76% of whole time is spent when creating 10 bundles with 10 fields, when I already have 300++ bundles with 20 fields is in Drupal\Core\Config\Entity\ConfigEntityBase::preSave(), which ensures that the UUID is unique. See See #2423591: Optimize config entity import and specifically #2430219: Implement a key value store to optimise config entity lookups for that problem.

b) Some sort of overhead because I did that before all in the same script. When I started again to just create the mentioned 10 bundles, then it was considerably faster (2.2s for a bundle with 10 fields).

Also, I'm much stupid, because creating fields doesn't actually call getFieldMap(). So the good part is that getFieldMap() is not responsible for the first part of this issue (creating fields), that's really only the UUID problem.

The bad part? getFieldMap() is actually way slower than I thought. In my current site, with 340 bundles, getFieldMap() with xhprof enabled took 80 seconds on HEAD. With the patch, 67 milliseconds. And with xhprof disabled, that's still around 60s on HEAD.

@catch: Sure state was just a first implementation because I was lazy. We can use a key value with a separate collection for example, or something like that. and variable_get()set() is very different from either of those two, because we don't have a global cache for them all.

yched’s picture

On a vacation, didn't read all atm.

Just: being able to use field definitions not stored in configuration would be cool, but the problem is that the entity system is completely not injectable atm when it comes to field definitions - you cannot inject arbitrary FDs to use in an entity, the FDs are always pulled from the "official registry" in the EM.

Would be very valuable to fix that IMO, but that might be not completely trivial :-/

mesch’s picture

@Berdir, thanks for looking at this. I will try to help where I can, but I'm still getting familiar w/ D8 core.

@yched re: #20, that would be fantastic. I wonder whether entities could control their own field and bundle configurations, perhaps by way of the entity storage handler. I'd imagine this is probably more of a nice to have for D9 though...

yched’s picture

Could get a look on the patch at last, +1 on the approach.
Looks like we could unpack the code a bit, shorten some var names, but other than this this looks awesome.

Also, not sure why k/v would be more appropriate than state ?

Berdir’s picture

state *is* key value. The difference is that state has certain semantics, we already have static caching and I can totally see someone adding an implementation that provides persistent caching, e.g. using a cache collector. state is more for small pieces of data, althoug we mis-use it elsewhere already.

We could for example use a separate key value collection and use a key per entity type, then the update logic would be easier as we'd have fewer nestings, and getFeldMap() could load all values of the collection.

mesch’s picture

If a way to conduct performant queries on config entity properties emerges from #2430219: Implement a key value store to optimise config entity lookups, could we do away with getFieldMap() entirely? Grepping about I find three uses of getFieldMap() in core:

\Drupal\forum\Controller\ForumController::create()
- needs to find bundles associated with a single node field.

\Drupal\field\Entity\FieldStorageConfig::getBundles()
- needs to find bundles associated with a single field for a single entity type.

\Drupal\Core\Entity\EntityManager::getFieldMapByFieldType()
- creates a map of fields for a specific field type.

In each case, a tailor-made query would be preferable to incurring the overhead of creating a map of all fields on the site, but obviously it depends on how queryable the config schema is.

Berdir’s picture

That's not an option.

The field map also contains base fields and other fields registered through hooks. For example, you could define an entity type that has a comment field as a base field, that would not work unless the field map contains those fields as well.

fago’s picture

Great findings! The k/v approach sounds good to me, I agree that state does not fit.

Just: being able to use field definitions not stored in configuration would be cool, but the problem is that the entity system is completely not injectable atm when it comes to field definitions - you cannot inject arbitrary FDs to use in an entity, the FDs are always pulled from the "official registry" in the EM.

Yep, that would be huge as it would enable the door for contrib to mess with per entity fields also. But yeah, that's for sure non trivial and might be too destructive at this stage :/

Berdir’s picture

FileSize
34.21 KB
PASSED: [[SimpleTest]]: [PHP 5.4 MySQL] 91,933 pass(es). View
27.56 KB

So, switched to key value collection. I changed the constructor, using a collection as a services is a bit strange and it's an overhead as we need to instantiate the database factory service and the collection object even though we will very likely not use it.

Also, lots and lots of unit tests.

Berdir’s picture

Title: Evaluate Entity Field API Scalability » [meta] Evaluate Entity Field API Scalability
Priority: Critical » Major

I've created #2482295: Rebuilding field map with many bundles/fields is very slow for the field map problem.

We have #2430219: Implement a key value store to optimise config entity lookups to solve the problem with the slow config creation, we also have #2423591: Optimize config entity import to look for further improvements there.

@alexpott also found #2482231: Deleting configuration entities is super slow once you have a few., which is the reason for why deleting a field is slow. We expect that the field map improvements will solve that as well, but we are keeping the issue open to do more profiling once that is fixed.

As far as I see, all known problems have separate issues, and those that must be addressed are critical on their own. It makes sense to keep this open to do more profiling with similar and different scenarios, but I think we can change this to major again. In case we find more critical problems, we can always create new issues or change it again.

Wim Leers’s picture

mesch’s picture

FileSize
11.98 KB

I did some further benchmarking against HEAD (9f4d5e8) with the attached script via drush scr entity_field_bench.php | tee [output_file]. Results also attached.

The results seem to suggest the number of field storage configs have a much greater bearing on performance than the number of field instances.

mgifford’s picture

Status: Needs review » Needs work

Patch no longer applies.

Version: 8.0.x-dev » 8.1.x-dev

Drupal 8.0.6 was released on April 6 and is the final bugfix release for the Drupal 8.0.x series. Drupal 8.0.x will not receive any further development aside from security fixes. Drupal 8.1.0-rc1 is now available and sites should prepare to update to 8.1.0.

Bug reports should be targeted against the 8.1.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.2.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Berdir’s picture

Category: Bug report » Plan
Status: Needs work » Fixed

I think we can safely close this issue now.

There are certainly limits, the most obvious one seems to simply be the amount of configuration files, which slows down config sync.

We had a site with around 100 contact forms, each with a good amount of fields, a lot of them shared. Plus form and view displays. That was around 2k config files if I remember correctly. We've cleaned up a bit since then, but we had no obvious problem with that amount of forms and that should be enough for the vast majority of the sites.

I wouldn't use this to build something like webform.com, where you might have thousands of complex, completely custom forms. But we never designed for that and I don't think that's a use case that we need to support with entity bundles/fields.

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.