Problem/Motivation

We want to have a reliable schema for our resources, but the output of our normalizers is not enforced to a particular type or object shape. On top of this the normalizer to use is determined at run-time. These make the normalized outputs for fields a bit brittle, from a consistency stand point.

Proposed resolution

#3014277: ResourceTypes should know about their fields introduced the concept of field resources. Each resource field object should have a normalizer statically defined. Then we can change the entity normalizer (both content and config) to something like:

// Pseudo-code.
$enabled_fields = array_filter($resource_type->getResourceFields(), 'isEnabled');
$enabled_attributes = array_filter($enabled_fields, instanceof ResourceAttributeDefinition);
$enabled_relationships = array_filter($enabled_fields, instanceof ResourceRelationshipDefinition);
return [
  'type' => $type,
  'id' => $id,
  'attributes' => array_map($enabled_attributes, 'normalize'),
  'relationships' => array_map($enabled_relationships, 'normalize'),
];

Comments

e0ipso created an issue. See original summary.

wim leers’s picture

Title: Use ResourceFieldInterface to statically determine the field normalizer » Use ResourceFieldInterface to statically determine the field normalizer (also fixes #3011497)
Issue tags: +API-First Initiative, +Performance

Very very interesting! The best part is that we can do this without breaking BC, so we can do this after 2.0! 🎉

Also: it's great to see that we're now in a place where we can work on performance instead of stabilizing! 🤘

wim leers’s picture

ndobromirov’s picture

ndobromirov’s picture

StatusFileSize
new26.24 KB
new136.5 KB

Just an update here... This is my current biggest slow-down on jsonapi 2.x. This is using around 19-20% of the total page generation time with cold caches. See profile data below:

Global time:

Method time:

Can I have some guide where to look to start implementing this, even if it's on a PoC level?

gabesullice’s picture

@ndobromirov, I think the simplest way to see if a static map of normalizers would improve performance would be to override Drupal\jsonapi\Serializer\Serializer::getNormalizer and implement some kind of cache to prevent this loop from repeatedly calling supportsNormalization:

foreach ($this->normalizers as $normalizer) {
    if ($normalizer instanceof NormalizerInterface && $normalizer->supportsNormalization($data, $format, $context)) {
        return $normalizer;
    }
}

LMK if you want to chat about it at all.

wim leers’s picture

@gabesullice That's funny, I was advising against that in #3016129-2: [upstream] Fetching normalizers seems slow :P And to instead do this issue, the scope of which is bigger than what you describe in #6. If we're going to do #6, then let's do it in #3016129: [upstream] Fetching normalizers seems slow, and then let's do this issue afterwards.

gabesullice’s picture

To be clear, @ndobromirov asked about a PoC, and this was the simplest thing I could think of to test whether statically determining field normalizers will actually be a significant improvement.

I suspect most calls to supportsNormalization are actually on the field property level and this issues proposed solution won't be as big a win as we hope it will be performance-wise, even if it's a step in the right direction.

e0ipso’s picture

@ndobromirov one good way to get a profile of that would be to temporarily alter the symfony method in your local install.

wim leers’s picture

#9++!

Thanks so much, @ndobromirov, for being all over the issue queue in the last week or so, and chiming in everywhere with your excellent input! 👍 🤘 👏

ndobromirov’s picture

StatusFileSize
new26.27 KB
new139.67 KB

Ok so hacking the upstream was a big WIN: ~700k function calls have dropped as well as about half a second cold cache time (with xhprof running).

Should we try to push this upstream in some way?

Here is the code changes + results...

    /**
     * Returns a matching normalizer.
     *
     * @param mixed  $data    Data to get the serializer for
     * @param string $format  Format name, present to give the option to normalizers to act differently based on formats
     * @param array  $context Options available to the normalizer
     *
     * @return NormalizerInterface|null
     */
    private function getNormalizer($data, $format, array $context)
    {
        static $cache = [];

        $cid = get_class($data) . ':' . $format;
        if (isset($cache[$cid])) {
          return $cache[$cid];
        }

        foreach ($this->normalizers as $normalizer) {
            if ($normalizer instanceof NormalizerInterface && $normalizer->supportsNormalization($data, $format, $context)) {
                return $cache[$cid] = $normalizer;
            }
        }
    }

Top level:

Method level:

wim leers’s picture

~700k function calls have dropped

🤘

Could you apply the changes you quoted in #11 to JSON:API's override of the Symfony Serializer class? If that's feasible, then it'd be pretty easy to commit this if my fellow maintainers feel that's warranted :)

ndobromirov’s picture

I've opened this issue in symfony, so let's see :).

ndobromirov’s picture

The problem is that it's impossible, mainly because getting normalizers is a private method in the symfony class. We might be able to do it indirectly but it is going to be ugly and result in kind of re-implementing the whole base class in our own...

jibran’s picture

FWIW, they have already improved getNormalizer in Symfnoy 4 https://github.com/symfony/serializer/blob/master/Serializer.php#L222

jibran’s picture

But one can always hack upstream using own fork which you have to keep up to date or you can use composer patches and patch the upstream.

e0ipso’s picture

I don't think we should even if we could. With the current normalizer implementation the first normalizer is not guaranteed to be the correct one. This issue aims to fix that.

I wanted the profile to see what's the utopic potential improvement. It's been proven to be A LOT as we suspected.

gabesullice’s picture

This issue is about using ResourceFieldInterface to statically determine which field normalizer to use. I proposed changing Drupal\jsonapi\Serializer\Serializer::getNormalizer method (notice the JSON:API namespace) so that we could predict what would happen if we only had a cache of field normalizers but not entity-level or typed data level normalizers (as opposed to all normalizers).

~700k function calls have dropped as well as about half a second cold cache time (with xhprof running).

700k is huge. But ~1/2s isn't a huge portion of the total time. We have to assume that our actual results will be less ideal because we don't have the luxury of caching all normalizers. We only have the luxury of updating the normalizer interfaces that JSON:API controls (only field-level and up).

That means we should expect < 50% of the improvement we see here. Why? First, because entity-level normalizers are inconsequential. There will only every be a few hundred entities in a response at most (we'd run out of memory after that) and so they can only represent a very small portion of supportsNormalization function calls. For every entity-level normalizer, we'll have at least one field normalizer at play. But typically, field-level normalizers will be far more numerous (we can expect each entity to have many fields). But again, every field has at least one property to be normalized. Each of those properties will be on the typed data level, which JSON:API has not isolated from Drupal's normalization system and so we can't fix without core's help (which will be hard to do without breaking BC and/or updating a lot of normalizers).


Conclusion: we should cache all the normalizers we can, but let's not get too excited just yet! It's gonna be an uphill battle.

ndobromirov’s picture

Why not try to update only the serilizers in core to 4.x?

gabesullice’s picture

Why not try to update only the serilizers in core to 4.x?

"We can't fix without core's help (which will be hard to do without breaking BC and/or updating a lot of normalizers)... It's gonna be an uphill battle." :P

Will you open a core issue so we can get that ball rolling?

ndobromirov’s picture

Status: Active » Postponed

Postponing on the other one, as that's the place it needs to be fixed... :(

e0ipso’s picture

Agreed on #18, it articulates very well what I meant on #17 as utopic potential improvement.

wim leers’s picture

This issue was about restructuring certain things. In #3, I described how hopefully this issue would remove the need for the work-around-thing-we-can-do-right-now from #3011497: [upstream] Too many calls to \Symfony\Component\Serializer\Serializer::getNormalizer().

Unfortunately, #6 through #22 as far as I can tell essentially rescoped this issue back to the same scope as #3011497: [upstream] Too many calls to \Symfony\Component\Serializer\Serializer::getNormalizer().

I find this very confusing now.

e0ipso’s picture

@Wim Leers I think it's because of "(also fixes #3011497)" in the title.

So far there was only discussion to see what would be the potential gain in performance. I see the original scope unchanged from #3.

gabesullice’s picture

Title: Use ResourceFieldInterface to statically determine the field normalizer (also fixes #3011497) » Use ResourceFieldInterface to statically determine the field normalizer
Issue summary: View changes
Status: Postponed » Active

Unfortunately, #6 through #22 ... rescoped this issue back to the same scope as #3011497

I disagree. In #18, I explained why this issue as proposed will not address #3011497: [upstream] Too many calls to \Symfony\Component\Serializer\Serializer::getNormalizer() in a meaningful way and @e0ipso agreed. So it does not have the same scope. It's simply a proposed solution for better schema support.

Essentially, what was determined in 6-22 was that #2 should not have scoped this issue to fix #3011497: [upstream] Too many calls to \Symfony\Component\Serializer\Serializer::getNormalizer() also. They're distinct problems.

#22 should not have marked this Postponed on an upstream change considering the original scope of the issue.

e0ipso’s picture

I disagree. In #18, I explained why this issue as proposed will not address #3011497: [upstream] Too many calls to \Symfony\Component\Serializer\Serializer::getNormalizer() in a meaningful way and @e0ipso agreed. So it does not have the same scope. It's simply a proposed solution for better schema support.

Wait. I agreed that we won't ever be able to solve the problem #3011497: [upstream] Too many calls to \Symfony\Component\Serializer\Serializer::getNormalizer() beyond a certain point. This issue will solve it for as much as we can do.

gabesullice’s picture

Whoops! Sorry I misunderstood!

ndobromirov’s picture

The core issue has frozen, so the only way to get any further speed-ups for me is this one...


Any bullet points that can be followed to start implementation for this one? I get the overall idea but my know-how in jsonapi is somewhat limited to implement this from scratch :). Even if this can not achieve the utopic 20% speed-up as see in #11, if we can implement this and see it's 10%, this is still a lot of server time to be spared in the long run.
wim leers’s picture

Title: Use ResourceFieldInterface to statically determine the field normalizer » [PP-1] Use ResourceFieldInterface to statically determine the field normalizer
Status: Active » Postponed
slv_’s picture

A bit late to the party, but I'm getting a similar performance behavior to what @ndobromirov described. As it's already clear the normalizers issue comes from upstream. After much digging and hacking locally, I'm seeing around a 30% performance improvement using in-memory caching of the normalizers. In our case this is huge.

Also, as the API grows in complexity (in our case we need to have quite a few custom normalizers), the issue gets much worse, as this scales per number of available normalizers, as well as the amount of data to serve.

From looking at different issues in core / symfony / jsonapi, it looks like the only viable solution is to patch projects to make core allow ^3.4^||^4.3 for the serializer (and implement in every normalizer the new interface introduced in version 4.1 of the serializer component, since otherwise nothing is cached by default).

Or alternatively, implement a custom serializer in contrib that extends the symfony one and replaces the call `getNormalizer()` to a custom `ownGetNormalizer()`, which would include the caching behavior. Extending the method is just not possible as it's declared private.

ndobromirov’s picture

#31 I know it's not the best of fixes and it was already pointed in the other issue, but you can test this one with composer patches against symfony.serializer package.

"symfony/serializer": {
  "Add a cache to symfony/serializer::getNormalizer()": "https://www.drupal.org/files/issues/2018-12-18/symfony-serializer-3.4.20-getNormalizer-cache.patch"
}

You can check my benchmarks and a patch.

I will be using that above solution, until #3017291: \Symfony\Component\Serializer\Serializer::getNormalizer() is slow gets in core. Pretty much the same thing you are proposing.

slv_’s picture

Hey @ndobromirov. Thanks for the heads up. I had looked into that issue and other of the places where you posted, as it seems we were both facing pretty much the same issues. We ended up using our own Serializer class because.

1.- Symfony's one doesn't actually cache the normalizers unless you make them implement an interface and a method that returns a boolean (cacheable VS not cacheable normalizer). Out of the box if you don't touch anything, it actually makes things worse because it triggers twice the calls! The way it was implemented, in short, wasn't quite polished imho.
2.- It also happened that in our case, a lot of the normalizers would apply or not to the data, based on entity type / bundle, and field data, so we couldn't "just cache the normalizer" in the way that Symfony cache allows, which assumes that "same object" <=> "same normalizer" in all cases. (we noticed this after the 30% improvement I mentioned above) So the initial performance gains started to decrease as we noticed that a lot of the normalizers we had cached, had to be cached in some other way. Having a Serializer class allowed us to cache them by entity_type, bundle, field_name, etc, as appropriate.

jibran’s picture

Project: JSON:API » Drupal core
Version: 8.x-2.x-dev » 8.8.x-dev
Component: Code » jsonapi.module

Moving to core issue queue as #3014277: ResourceTypes should know about their fields is RTBC.

wim leers’s picture

Title: [PP-1] Use ResourceFieldInterface to statically determine the field normalizer » Use ResourceFieldInterface to statically determine the field normalizer
Status: Postponed » Active

#3014277: ResourceTypes should know about their fields landed, so … unpostponing 🤓🥳

e0ipso’s picture

Ohhh! Thanks for bringing a notification to my inbox about this. This is one of the issues I was excited a while ago (when I created it).

I think this may lead us to have a complicated relationship with the Symfony serialization component. We want to abandon the run-time discovery feature, but we are stuck with the existing normalizers… including the discovery methods.

gabesullice’s picture

Title: Use ResourceFieldInterface to statically determine the field normalizer » Use ResourceTypeFields to statically determine the field normalizer

#36 +1

I honestly think we should just abandon the Symfony serializer for everything above the DataType level and roll our own. It's more trouble than it's worth to support it. We're already abusing it with CacheableNormalizations and by completely opting out of the service tag feature by requiring normalizers to be in the JSON:API namespace.

wim leers’s picture

Assigned: Unassigned » wim leers

#37++ Exactly :) I already started working on a patch for exactly that last week.

Version: 8.8.x-dev » 8.9.x-dev

Drupal 8.8.0-alpha1 will be released the week of October 14th, 2019, which means new developments and disruptive changes should now be targeted against the 8.9.x-dev branch. (Any changes to 8.9.x will also be committed to 9.0.x in preparation for Drupal 9’s release, but some changes like significant feature additions will be deferred to 9.1.x.). For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

Version: 8.9.x-dev » 9.1.x-dev

Drupal 8.9.0-beta1 was released on March 20, 2020. 8.9.x is the final, long-term support (LTS) minor release of Drupal 8, which means new developments and disruptive changes should now be targeted against the 9.1.x-dev branch. For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

Version: 9.1.x-dev » 9.2.x-dev

Drupal 9.1.0-alpha1 will be released the week of October 19, 2020, which means new developments and disruptive changes should now be targeted for the 9.2.x-dev branch. For more information see the Drupal 9 minor version schedule and the Allowed changes during the Drupal 9 release cycle.

Version: 9.2.x-dev » 9.3.x-dev

Drupal 9.2.0-alpha1 will be released the week of May 3, 2021, which means new developments and disruptive changes should now be targeted for the 9.3.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

bbrala’s picture

Hey Wim, this kinda ended on you having started on a prach, but i assume you never finished there? Or is there some work somewhere you never posted?

wim leers’s picture

Assigned: wim leers » Unassigned

Oh my … I have absolutely no idea what happened here back then. I also can't find a patch for this locally anymore…

I doubt I got very far, otherwise I'd have posted it here.

Still … sorry for having kept you hanging here :/

Version: 9.3.x-dev » 9.4.x-dev

Drupal 9.3.0-rc1 was released on November 26, 2021, which means new developments and disruptive changes should now be targeted for the 9.4.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

wim leers’s picture

#3031214: Introduce "deterministic" normalizers was closed in #3031214-19: Introduce "deterministic" normalizers because #3252872: Use CacheableSupportsMethodInterface for performance improvement in normalizers fixed it by using new upstream facilities! 🥳

And #3017291: \Symfony\Component\Serializer\Serializer::getNormalizer() is slow could in principle be solved in the same way — opened by @ndobromirov at #21 at @gabesullice's request in #20.

Is this still relevant once that is fixed? Skimming this again, per #37 and #38, it does seem like issue intends to go significantly further?

bbrala’s picture

#3017291: \Symfony\Component\Serializer\Serializer::getNormalizer() is slow can be closed I think since we did that in #3252872: Use CacheableSupportsMethodInterface for performance improvement in normalizers.

This issue is still relevant since we still want to find a way to make normlizers a bit more determinitic in what they use to normalize/denormalize. I'm still not too sure though how we should go about that tbh.

bradjones1’s picture

Per #47:

This issue is still relevant since we still want to find a way to make [field normalization] a bit more determinitic in what they use to normalize/denormalize. I'm still not too sure though how we should go about that tbh.

Trying to help refine what the goal should be now that #3252872: Use CacheableSupportsMethodInterface for performance improvement in normalizers landed and addressed the performance/deterministic component of this problem space.

Per #37:

I honestly think we should just abandon the Symfony serializer for everything above the DataType level and roll our own. It's more trouble than it's worth to support it. We're already abusing it with CacheableNormalizations and by completely opting out of the service tag feature by requiring normalizers to be in the JSON:API namespace.

That seems to be a pretty heavy lift and would affect other subsystems beyond json:api?

From the IS:

Each resource field object should have a normalizer statically defined.

How would this work with things like jsonapi_extras, which implements enhancer plugins that change the normalization and schema thereof?

Version: 9.4.x-dev » 9.5.x-dev

Drupal 9.4.0-alpha1 was released on May 6, 2022, which means new developments and disruptive changes should now be targeted for the 9.5.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 9.5.x-dev » 10.1.x-dev

Drupal 9.5.0-beta2 and Drupal 10.0.0-beta2 were released on September 29, 2022, which means new developments and disruptive changes should now be targeted for the 10.1.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 10.1.x-dev » 11.x-dev

Drupal core is moving towards using a “main” branch. As an interim step, a new 11.x branch has been opened, as Drupal.org infrastructure cannot currently fully support a branch named main. New developments and disruptive changes should now be targeted for the 11.x branch, which currently accepts only minor-version allowed changes. For more information, see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 11.x-dev » main

Drupal core is now using the main branch as the primary development branch. New developments and disruptive changes should now be targeted to the main branch.

Read more in the announcement.