Problem/Motivation
This bug took a lot of hours to track down - I found 2-3 unrelated bugs on the way and various dead end rabbit holes before realising what was happening.
Twig has traditionally used output buffering to render templates. This is incompatible with Fibers, because once the output buffer starts, other things can be outputted to it out of order.
This has actually been fixed in Twig already, see @fabpot's note in https://github.com/twigphp/Twig/issues/3599. However, Twig still uses output buffering by default, yield is only an option. So we need a two line change to core, then it should work.
I will post a fix-only MR here soon, this can then be integrated into #1237636: Lazy load multiple entities at a time using fibers and give us a green run there.
I tried to write a failing kernel test for this earlier in the week, but failed - however that was before I understood the root cause of the bug so maybe I can adapt that to expose the issue now. Will add that later.
Steps to reproduce
Proposed resolution
There are two changes:
1. Switch Twig to use use_yield, this will be the only option in Twig 4.
2.use_yield results in even later rendering of MarkupInterface objects than we currently do, this turns out to be too late when a mutable object is rendered and also mutated inside a Twig macro. This happens in core in menu templates that render menu link levels recursively. To resolve this we can cast MarkupInterface objects to a string in TwigEnvironment::escapeFilter() so that they're cast when they're yielded (e.g. before the next line in the template runs) rather than even later.
Remaining tasks
User interface changes
Introduced terminology
API changes
Data model changes
Release notes snippet
Issue fork drupal-3546376
Show commands
Start within a Git clone of the project using the version control instructions.
Or, if you do not have SSH keys set up on git.drupalcode.org:
Comments
Comment #2
catchThe bug was initially surfaced via a single Umami test failure in #1237636: Lazy load multiple entities at a time using fibers - this asserts the order of various content on the Umami front page.
What was happening with the bug itself was elusive for a while. Initially I thought that placeholders and their output were getting 'swapped' somehow. Need to open a couple of spin-off issues for cleaning up code that ultimately turned out not to be a problem, will link from here once that's done.
Closer inspection of the lazy builder markup output indicated that instead of being swapped, bits of placeholders were getting mashed together in the output of different placeholders. This was even more confusing and I still couldn't track it down. We're not assigning output to a static cache or anything like that anywhere, Fiber rendering can't be doing anything that weird even to recursive rendering of render arrays etc.
Then in the shower this morning (really) I remembered that Twig uses output buffering. So once the output buffer was started, different bits of different lazy builder markup were going into the output buffer depending on which placeholders suspended fibers.
Comment #3
catchMR may be the most hours per lines of code I've ever submitted.
Comment #5
catchAdded some test coverage.
(Note: a percentage of the test coverage/test module was written using Claude cli, had to rewrite a lot of it but at least the skeleton was Claude. This is because Tag1 sponsored some time to work on any core/contrib issue, but specifically using Claude in order to evaluate it. I chose this one because I knew if I was able to track down the bug that I'd want to try to write a kernel test for it that didn't involve nested media/image rendering (which is what surfaced it in the entity loading issue). I don't think it saved any time overall since things constantly go wrong and it takes as much time thinking how to prompt it to get what you want as it would to have typed it out, and the things that do work could often be copypastad from a different test or module and altered (which is what I usually do to save time). The one thing that maybe saved time waswhen I asked it for some alternative ways to execute PHP code during Twig template rendering it suggested a custom twig function, which was a fairly minimalist way to do that - probably would have go there myself but hadn't yet at the point I prompted. - so a mixed bag).
Comment #7
catchDraft test only MR pipeline shows the correct test failure: https://git.drupalcode.org/issue/drupal-3546376/-/jobs/6520638#L4067
Comment #8
catchComment #9
catchMaking it clear we're changing a Twig option here and not rewriting how it renders templates.
Comment #10
godotislateA couple comments on the MR. Also it looks like some Nightwatch tests are consistently failing.
Comment #11
catchThe nightwatch failures are reproducible with the standard profile - you need to add a second level of menu items, then things get weird.
It looks like it's possible to have twig nodes that aren't compatible with yield - e.g. that print or echo, but I can't see any sign of anything like that yet.
Comment #12
berdirMenu templates use a recursive block/function, I think that's the only example of that, has to be related to that?
(nothing is ever easy)
Comment #13
catchThis is (at least part of) the markup difference:
The missing class (
primary-nav__menu--level-1orprimary-nav__menu--level-2is added in the olivero primary menu template with ::addClass(). This happens inside the menu_links macro so it's very possible it's something to do with the macro/recursion, no idea what though.Comment #15
godotislateFound the issue, and it's really annoying.
So in the menu templates, such as
core/themes/olivero/templates/navigation/menu--primary-menu.html.twig, theaddClass()call is subsequently followed byremoveClass()in the next line:What appears to be happening because of
yieldis thatrenderVaris called on theattributesobject at the time of iteration through the returned generator, so the addClass is negated by the removeClass.I've opened MR 13228 with a POC of a quick "fix" (I don't really think it should be done this way) in the Olivero templates to demonstrate what's going on.
Comment #16
catchThat is indeed very annoying.
Had a play around with different approaches, if we make sure create_attribute also clones, we can skip the toArray() - same overall approach though.
Pushed a commit for the original MR for that.
Comment #18
godotislateSome general thoughts:
Comment #19
catchI looked at what was going on with Twig rendering a bit more.
Here's the relevant output from the actual generated twig template (e.g. what gets written to sites/default/files/php/twig):
I was confused why the ::escapeFilter() call doesn't cause the attribute to cast to string there, because Attributes implement ::__toString()
However, escapeFilter() passes through MarkupInterface objects without casting to string. Now I actually understand what's going on finally (maybe).
I've pushed a change that reverts all the template changes, and special cases atrributes in escapeFilter so that they get cast earlier.
Olivero menu rendering still works, let's see if this causes any other test failures though. If it works and doesn't cause other problems, that lets us switch to Yield with no changes for template authors.
I feel like we could even go a step further and cast any MarkupInterface object to a string in ::escapeFilter(), leaving only Twig Markup object completely intact there. But will see if we get a green test run first.
Comment #20
godotislateTo answer my own question, it looks like it's a combination of using a macro with the
attributevariable being a reference to an object that is mutable by methods likeaddClass()andremoveClass().If I put the following method in a kernel test class, only the last assert fails:
Comment #21
catchOK pushed another change that now early-casts for all MarkupInterface objects, this would mean anything attribute-like either now or in the future would also be covered.
Without use_yield, the MarkupInterface objects that are preserved in TwigExtension::escapeFilter() get immediately cast to a string anyway as soon as they leave it, with use_yield that can be slightly delayed which causes the problem here.
While this changes which method they get cast to a string by, I'm pretty sure it's only a fraction earlier with nothing else happening to them in-between, which allows us to keep the behaviour transparent for everything else.
If I'm overlooking something, we could go back to only casting Attributes early and handle the rest in a follow-up.
Comment #22
catchCopy and pasted #20 into a new test class. The unit test change doesn't prove that we fixed the bug, only that we changed the behaviour, the nightwatch test is... a nightwatch test.
Comment #23
catchUpdated the issue summary. With this approach I don't think we need a change record for Twig template authors, but we probably need a general one in case someone somewhere is doing something weird with print in Twig nodes or something.
Comment #24
catchAdded a change record.
Comment #25
godotislateHave one question on the MR about possibly dead code, which I came about by checking what happens if
$autoescapeisFALSE, though I don't know in what circumstances that happens.I think this makes sense and is fine, but it would probably be best to get another opinion. I think we might need another pair of eyes on the kernel test as well.
Comment #26
godotislateOne other option, though I'm not sure it's better: in
escapeFilter(), return a clone of the object if an Attribute object is passed in.Comment #27
catch@godotislate
I was doing this in an earlier revision, but didn't really want to special-case Attribute because something else could theoretically be implemented similarly and suffer from the same problem.
@alexpott
I thought about this, and it looks to me like we just wanted to pass through anything that doesn't get sanitized with the least amount of change + it will get cast to a string almost immediately after this logic runs anyway. The method name is only about sanitization not about casting, so there might have been a reluctance to do more than was advertised too. But also I reckon we probably didn't think about the sanitization-skipping logic too much.
Comment #28
nod_Played with the patch a bit, the attribute stuff I'm doing with htmx doesn't break. And things seems in order in the admin UI. I asked @grimreaper to test it with ui_patterns/display_builder since they might do more complex things with twig than core (everything fine, testing didn't show any issue). Maybe we need to test twig_tweak too?
Comment #29
nod_Comment #30
alexpottI've reviewed the twig_tweak module and that one does not really involve itself in escape filtering or MarkupInterface or Twig/Markup objects so I think we're good here.
Perhaps it is worth a follow-up to consider not wrapping in Twig\Markup objects and returning strings here - less object creation FTW but I think this is a safe middle position.
Committed 3ec578f and pushed to 11.x. Thanks!
Comment #33
andypostCR could be published and MRs closed
Comment #36
nod_Thanks for the reminder, done
Comment #37
alexpottI created the follow-up #3547965: Don't create Twig\Markup objects unnecessarily - I reviewed the upstream code more carefully and concluded I was overly-cautious in recommending that we use Twig\Markup objects. I guess a potential benefit of the old code was that we wouldn't cast some objects to strings if somehow later on they are no used - this can save a query with translatable strings - but we're already losing that here when we cast them to make the Twig\Markup object so there's no tangible benefit. Sorry.