Support for Drupal 7 is ending on 5 January 2025—it’s time to migrate to Drupal 10! Learn about the many benefits of Drupal 10 and find migration tools in our resource center.
I'm not too fond of the way we currently handle the 'sanitize' step (where field module should run check_markup() and stuff if needed) - currently on field_attach_view() , before building the render arrays in field_default_view().
It might be better suited as a pre_render step.
Leaving this as a reminder for further pondering.
Comment | File | Size | Author |
---|---|---|---|
#4 | core-remove_early_sanitization_recursion-371724-1.patch | 1.94 KB | jenlampton |
#2 | node.png | 509.93 KB | jenlampton |
Comments
Comment #1
yched CreditAttribution: yched commentedWe might want to piggyback what comes out of #369011: Performance: do check_markup() in text_field_load()
Comment #2
jenlamptonIt looks like the sanitization got moved all the way up into
hook_field_load()
which is way too early, and is causing me some serious recursion and performance headaches for lots of different scenarios.Here's my use case: I have a site that's using "shortcodes" (read: custom input filter) to render various different things inside the body field of a node. These things can be single field off a same node (inline images) a collection of fields off the same node (image / credit / caption), or sometimes, a teaser for another node on the site, like so:
Scenario 1
Only a single node involved, but pulling content from a second field into the body also causes headaches.
As this node A is being loaded, here's what happens:
* node_load(node A)
* add fields for node A
* run check_markup on body field for node A
* * replacement pattern calls node_load(node A)
* * add fields for node A
* * run check_markup on body field for node A
* * * replacement pattern calls node_load(node A)
* * * add fields for node A
* * * run check_markup on body field for node A
etc, etc, (recursion)
Since this node has not yet FINISHED loading by the time the second node_load() is called, there is nothing in the cache yet, so, this causes infinite recursion. I added a recursion-detection stop into the custom input filter, but now we have another problem. The first time the node is loaded, recursion is detected and rendering is aborted. The embedded content does not appear in the node body. Reload the page a second time, and the embedded content appears in the body. Clear the cache: broken. Reload page: fine. There's no way we can get the embedded content to appear in the first page load! :(
Scenario 2
If two nodes reference each other via shortcodes, we also have a problem. The first node (call it node A) often references a second node (call it node B) and commonly, node B also references node A back.
As this node A is being loaded, here's what happens:
* node_load(node A)
* add fields for node A
* run check_markup on body field for node A
* * replacement pattern calls node_load(node B)
* * add fields for node B
* * run check_markup on body field for node B
* * * replacement pattern calls node_load(node A)
* * * add fields for node A
* * * run check_markup on body field for node A
* * * * replacement pattern calls node_load(node B)
etc, etc (recursion)
since neither node has FINISHED loading yet by the time the filters are run this causes infinite recursion.
The solution here seems simple: don't filter the content on load.
To me, this current order of things is architecturally a little bonkers - "load" is supposed to mean "get the thing out of the database". Sanitization should happen much later in the process, as part of
node_view()
or maybe even in prepare. Swapping around the order of things to get a performance gain seems a little like cheating.In my case, the only valid work-around was to disable the cacheability of my filter. This is what the previous developer on this site had done, and the site was suffering from extreme performance problems because of it. The site loads upwards of 10 nodes per (node) page, and without being able to cache *any* of the filtered output - that's very expensive.
Comment #3
jenlamptonComment #4
jenlamptonAttached is a patch that removes the early sanitization. So far my testing hasn't revealed any issues caused by this. I'll keep testing on my production site and report back.
Comment #5
jenlamptonI just read through all of #369011: Performance: do check_markup() in text_field_load() and though I agree with a 10% performance gain on principle, I disagree with moving view-logic into the load-logic in order to get it. That's how we end up with bugs like this.
I would love to hear some feedback from @catch @yched or anyone who participated in #369011: Performance: do check_markup() in text_field_load() here, now that we know what kinds of problems this is causing.
Comment #6
yched CreditAttribution: yched commentedThat seems like a drastic change for D7 at this point :-/
Maybe there would be a way to avoid the recursion by deferring the invocation of hook_field_load() until after all field values have been read from the database ?
Comment #7
jenlamptonIsn't the point of
hook_field_load()
to read field values from the database?Comment #8
yched CreditAttribution: yched commentedNo, the storage layer is in charge of loading the values. hook_field_load() runs after that, and allows the field type to do extra massaging or load additional stuff on top of the raw values loaded from the db.
For example, for D7 file fields, file_field_load() receives the stored $item['fid'], and adds the info from the {file} records.
Comment #9
jenlamptonIf running the sanitization in
*_load()
is already too early, how does having another earlier phase (storage layer) help here? I'm happy to try another approach, but I'm not sure I understand how to do that just yet.Comment #10
jurgenhaasI came across this issue on a customer site that has some pretty expensive filters (highlighting terms and phrases from a big taxonomy) which is extremely slowing down the performance of the site. Even worse if nodes get exported where a huge number of nodes need to be processed all at once.
It comes down to the sanitization in the text module and I'm not getting it why _text_sanitizes() calls check_markup() without setting $cache to TRUE. The $cache variable in the signature of check_markup defaults to FALSE and hence, the filter result of text fields is never cached, right?
I was looking around for other field loaders or sanitizers in core and contrib and they all seem to be setting cache to TRUE. Why is the text module of core not doing it?
In my use case it accelerates the site massively and I checked a lot of other things and it doesn't seem to break anything. Even from a logical point of view it seems that this is the correct way of handling it.
Am I wrong or am I missing something?
Comment #11
jenlamptonPatch in #4 still applies cleanly to Drupal 7.58
@jurgenhaas I think your suggestion is a good one, but is perhaps unrelated to this issue. Here we're talking about problems with running the sanitization (and token replacement) before nodes are finished loading, causing incomplete replacements since the data we need hasn't been created yet. I think you might want to open a new issue about the cache flag.
Comment #12
jenlamptonPatch in #4 still applies cleanly to Drupal 7.59.
Comment #13
amit.drupal CreditAttribution: amit.drupal as a volunteer and at gai Technologies Pvt Ltd for gai Technologies Pvt Ltd commentedPatch in #4 is working fine.
Comment #14
jenlamptonPatch in #4 still applies cleanly to Drupal 7.63.
Comment #15
jenlamptonPatch in #4 still applies cleanly to Drupal 7.69, through 7.94.