Problem/Motivation

Quoting myself from #2995537: This is how I'm using JSON API and JSON API Extras

there were ~250 nodes on the page and on a cold cache, the page takes 30-40 sec to load all the nodes.... Loading ~250 nodes from the DB is a pain but I don't think JSON API can do anything about it.

This is not a big problem for non-fieldable, non-revisionable and non-translatable entities but as soon as we start using it for fieldable or revisionable or translatable entities or fieldable, revisionable and translatable entities performance degrades hugely.

Here are some suggestions:

  • Is using entity API a must?
  • Don't load full entity object when it is listing page.
  • Load requested field values via EFQ instead of loading full entity.
  • Purpose above solution for the core entity API improvments

Proposed resolution

TBD

Remaining tasks

TBD

User interface changes

None

API changes

TBD

Data model changes

None

Comments

jibran created an issue. See original summary.

jibran’s picture

gabesullice’s picture

Version: 8.x-1.x-dev » 8.x-2.x-dev
Issue tags: -performace +API-First Initiative, +Performance, +needs profiling
StatusFileSize
new51.79 KB

This is a very large patch that I do not propose to commit in its current form. It has many components to it that could themselves be separate issues.

But I've seen a few performance related remarks lately and would love it if you would report on whether this does anything to improve uncached responses, especially those with includes and sparse field sets. Before and after statistics would be most appreciated.

e0ipso’s picture

Load requested field values via EFQ instead of loading full entity.

Are we losing entity cache? Is that wise in the aggreggate of performance? We should not focus only on the cold cache scenario, but the big picture.

jibran’s picture

Are we losing entity cache?

Yes, IMO core should provide some kind of API to load partial entities which support entity cache.

e0ipso’s picture

@gabesullice what's the strategy towards performance with the patch in #3? Is it to avoid normalization of entities that have been already normalized? If that's the case I agree we can get a big improvement by avoiding unnecessary work. What about a less disruptive change like a static cache for rasterized entities, would that achieve the same goal?

Also, did you work off a performance profile for this, or out of common sense?

gabesullice’s picture

@e0ipso, common sense. That's why I'm soliciting data from people :P

A coupled strategies that I hope will improve performance are:

  1. It processes includes prior to normalization and loads as much as it can in bulk. So, /jsonapi/node/article?include=uid with 50 nodes collects all the target user IDs and makes a single loadMultiple call, vs calling $node->uid->get('entity')->getValue() 50 times.
  2. Because it does this, it no longer needs to normalize fields that aren't included in a sparse field set. Right now, we still normalize them in order to pick up includes.
  3. Doing both those things means JsonApiDocumentTopLevelNormalizer doesn't get called recursively for includes, which means we avoid a lot of unnecessary work for errors and links. I'm hoping that this will make the normalizer process easier to profile for further performance issues.
  4. A side benefit of that is that normalization become a lot easier to reason about, hopefully making #2994473: [META] JSON API's normalizers support schema tracking, to guarantee comprehensive schema easier for us as well.

What about a less disruptive change like a static cache for rasterized entities, would that achieve the same goal?

I've thought about caching normalizations, but it'd have to take a different form. This particular suggestion becomes a non-issue with the above changes because it'll never normalize an include more than once, since normalizing is no longer responsible for includes and includes are deduplicated prior to the normalization process.

e0ipso’s picture

StatusFileSize
new330.24 KB

Relevant IRC conversation.


Let's move this from experiment to actual issues. EXCITING!

gabesullice’s picture

Title: Improve performance for entity collection endpoint » [META] Improve collection and include performance
wim leers’s picture

#7:

  1. Doing both those things means JsonApiDocumentTopLevelNormalizer doesn't get called recursively for includes, which means we avoid a lot of unnecessary work for errors and links. I'm hoping that this will make the normalizer process easier to profile for further performance issues.

That's what #2997600: Resolve included resources prior to normalization does, right? And that's what the IRC conversation in #8 is referring to, right?

Also: YAY for this issue! And more importantly: YAY for this becoming a problem! For the longest time, nobody ever complained about performance, even though I/we raised concerns and sometimes preemptively addressed them. People actually complaining about this means they're using it for increasingly ambitious and complex projects. Which is what's necessary for this functionality to mature! 🎉

wim leers’s picture

Category: Task » Plan
gabesullice’s picture

@jibran brought this up during our API-First Weekly Meeting:

JSON API's collection routes are bundle-specific, but we use the {entity_type}_list cache tag on them because there's not a less generic list cache tag available. That means that if any node--article entity is updated, every node--page collection is invalidated for no reason.

There's already an open Drupal core issue that would help us ameliorate that problem: #2145751: Introduce ENTITY_TYPE_list:BUNDLE cache tag and add it to single bundle listing

wim leers’s picture

Interestingly, @Berdir just pinged me about that on Twitter too: https://twitter.com/berdir/status/1059769594985005057 — perhaps now is the time to start pushing forward that core issue? :) Seems like we're finding increasingly many reasons to fix that issue!

jibran’s picture

StatusFileSize
new96.65 KB

Here is some good news by addressing #3011099: Only serialize sparse_fieldset fields in \Drupal\jsonapi\Normalizer\EntityNormalizer::normalize(), #3007091: Performance issue in ConfigurableResourceType and introducing some static cache for the custom computed fields, we were able to improve the response time a lot.

Thank you for all the help.

Note: Before stats were taken with JSONAPI 1.x and after stats were taken with JSONAPI 2.x.

gabesullice’s picture

Wow! Thanks @jibran. Those are really good numbers.

I'm sure that it really depends on which fields are left unserialized and how many, i.e., those numbers will be different for everyone. But, the whole point of sparse field sets is to tailor and increase performance for your particular needs. It wasn't previously possible, but now it is!

s. B    s. A    % faster
11.37 | 3.75 |  67.02
22.48 | 6.82 |  69.66
22.09 | 6.85 |  68.99
17.49 | 5.32 |  69.58
1.33  | 1.61 | -21.05
     Average:   50.84
wim leers’s picture

Impressive! 👏

Well done, @jibran — thanks for speaking up, constructively criticizing, helping change happen and then reporting back on concrete impact! Thanks! 🙏

wim leers’s picture

Status: Active » Closed (outdated)

#20: Thanks, that issue has landed since then. #3011497: [upstream] Too many calls to \Symfony\Component\Serializer\Serializer::getNormalizer() is closed in favor of #3014283: Use ResourceTypeFields to statically determine the field normalizer now. That leaves only #2819335: Resource (entity) normalization should use partial caching as one of the remaining open JSON:API issues listed in this issue.

@ndobromirov, perhaps you want to take on #2819335: Resource (entity) normalization should use partial caching? :) That would have a far bigger performance impact than many of the issues you've worked on so far.

Then there's #3013596: Generating resource links (`self` + `related`) is too slow — noticeable when generating 140K of them, which @ndobromirov opened.

Finally, there's of course #2145751: Introduce ENTITY_TYPE_list:BUNDLE cache tag and add it to single bundle listing, which is a core issue to address more fundamental architectural problems; problems in core, not JSON:API. @jibran, I've been keeping an eye on that, but @Berdir has already been providing great reviews, so I don't think you've been blocked yet :)


IMHO we can now close this issue. It fulfilled its destiny of identifying several performance bottlenecks. We had a meaningful discussion that started in September and lasted a few months. Now the next steps are #3014283: Use ResourceTypeFields to statically determine the field normalizer and #2819335: Resource (entity) normalization should use partial caching. Quite a few performance improvements have already landed. At this point, this is outdated, and it's pointless to keep it up-to-date. Just look at JSON:API issues tagged Performance: https://www.drupal.org/project/issues/search/jsonapi?status%5B%5D=Open&i...