I noticed today that rdf_preprocess_username() calls user_load() individually for every comment or node author displayed on a page, that's potentially a lot of database queries and processing, which would better be handled by rdf_comment_load() and rdf_node_load() pulling all needed user objects into memory ready for the preprocess hook to use that data. Or possibly even user_comment_load() and user_node_load() doing the same so that other modules don't run into the same issue. As we move towards requiring API functions to be called rather than faking objects with direct queries, this is going to become more an more of an issue.
However, there are two serious drawbacks to loading one entity into another:
1. It breaks http://drupal.org/project/entitycache - since now you have duplicated cached data which needs to be cleared or get out of sync.
2. It can lead to infinite recursion.
Let's take for example, a theoretical node_profile() module, which loads a profile node into the user object - I'm not sure if modules like bio or content_profile do this, but I remember seeing that og does something similar. At the same time, node module loads a full user object into the node object. I've done this D6 style for simplicity since the same potential is there too, but we special case a lot more things in D6 with direct queries so it doesn't come up so much.
// First load the user.
user_load(1);
// This triggers hook_user_load().
function node_profile_user_load($account) {
$nid = node_profile_get_node($account->uid);
$node->node_profile = node_load($nid);
}
// Which triggers user_node_load().
function user_node_load($node) {
user_load($node->uid);
}
Since the original node_load() can't complete until the user load has completed, which triggers node_load() again with the same $nid, you have an infinite loop.
I think the only answer to this, especially at this stage in the release cycle, is a new optional hook along the lines of hook_node_build_multiple() (also applied to other entities which have that same stage in the rendering process). This would be exactly the same conceptually as http://api.drupal.org/api/function/field_attach_prepare_view/7 - except for non-fields to act.
Then either RDF or user module or whatever could implement that hook and save loading things one by one, without the risk of blowing things up.
| Comment | File | Size | Author |
|---|---|---|---|
| #39 | view_multiple.patch | 903 bytes | catch |
| #36 | entity_prepare_view-636992-36.patch | 8.17 KB | yched |
| #29 | entity_prepare_view.patch | 8.2 KB | catch |
| #27 | entity_prepare_view.patch | 8.21 KB | catch |
| #24 | entity_prepare_view.patch | 7.52 KB | catch |
Comments
Comment #1
catchHere's a start on a patch, don't have time to do more than this today so posting what I have.
Patch includes a fix for #638048: rdf.module triggers a user_load() per node or comment author.
Comment #2
catchTagging since we have to deal with the RDF performance issue here too.
Comment #3
yched commentedI probably agree with the approach (I'm not that familiar with RDF code), but this raises a DX question:
Originally the field_attach_*() namespace was supposed to be "the functions that fieldable entities are supposed to call in their workflow".
field_attach_load() is now already taken care of by the entity layer. #367006: [meta] Field attach API integration for entity forms is ungrokable automates all the 'form-related' f_a_*() calls (form, form_submit, form_validate) to the entity layer as well. entity modules need not and in fact should not call those functions.
This patch is a little different, it results in "call entity_prepare_view() where you called f_a_prepare_view() previously".
It feels like the "Field Attach API", from 'push' in D6 (CCK pushes its stuff in hook_nodeapi()), and 'pull' in early D7 Field API (entity modules call the right functions at the right time), is slowly moving to an intermediate state that smells like a DX WTF...
Comment #4
sunHoly cow. Nice finding.
Comment #5
catch@yched
Yeah I'm a bit torn on this - I'm hoping in D8 that we can extend the API to full CRUD and rendering, but that doesn't have to be done piecemeal here. However the main issue is that field_attach_prepare_view() isn't re-entrant and in some cases needs to be called in more than one place for the same entity - see comments in node_build_multiple() - exactly the same problem is going to affect this new hook.
I really, really don't want node / comment etc. to be handling making their hook invocations re-entrant themselves, so that leans towards a helper function, however there's possibly no reason not to add this separately and leave the field_attach_prepare_view() calls where they are. In general I think we need a rule that wherever there's a field_* call there's an associated entity-level hook invocation - although I think this is the only point we've departed from that.
Comment #6
catchNew patch.
Moved the field_attach_prepare_view() calls back to where they were, this keeps the helper function though.
Removed hook_$entity_type_prepare_view() since there's as yet no use-case for it in core - this way we're adding one new hook instead of six and it more closely mirrors the field API. That's up for discussion of course but keeps the patch as small and unobtrusive as possible.
Did some basic tests on a node with 50 comments generated by devel. In this example there are approximately 36 user objects being loaded by rdf (some comments have the same author).
Without patch: Executed 302 queries in 205.35 milliseconds.
With patch: Executed 174 queries in 121.74 milliseconds.
That's 50% of queries on that page being generated by rdf_preprocess_username() without the patch, ouch - we do separate lookups for the user table, field cache, user roles and user pictures per-username otherwise.
Added system.api.php docs, didn't yet look at how to deal with the problem of invoking this on $foo_build_multiple vs. $foo_build - would like to get yched's thoughts on this since ideally we'll have exactly the same pattern in both field_atttach_prepare_view() and entity_prepare_view() for resolving that.
Comment #8
catchMinus stupid typo.
Comment #9
bjaspan commentedsubscribe
Comment #10
yched commentedNot true anymore.
Can we move $entity_type as the 1st param, to match Field API ? or is "$entities first" a pattern on the entity API side ?
"This module" ?
Convoluted ;-)
More generally on the approach - er, I'm a bit confused right now ;-)
This review is powered by Dreditor.
Comment #11
yched commented"field_attach_prepare_view() isn't re-entrant" : is this the same issue than #493314-58: Add multiple hook for formatters ? ("direct calls to [comment|node]_build($single_object) do not trigger field_attach_prepare_view().")
Then maybe we need some '#view_prepared = TRUE' flags - a bit similar to FAPI's ' #validated. Not sure this flag can apply to an $entity as a whole, or needs to be field-by-field...
Comment #12
catchFixed yched's points in #9.
On #10 - yes, same thing. I was thinking of a static keyed by entity type and ID but that's got more overhead than a flag so I'd prefer that option if we can come up with something.
Comment #13
catchComment #14
yched commentedStumbled on ""node_build() doesn't trigger field_attach_prepare_view()" once more in #652834: Field formatters as render arrays.
Comment #15
catchSo because we do field_attach_prepare_view() and entity_prepare_view() separately, they can't share the same flag.
That gives us two rough options:
1. $entity->#entity_view_prepared / $entity->#field_prepared
2. $entity->#entity_prepared /$entity->field[0]['#prepared']
These should be entirely internal (apart from the fact people will see them when they do dsm($node)) so it's just a case of if we need #2 to allow field level rendering to be done separately during a request. I'll defer to yched on that - however if we can live with $entity->#entity_view_prepared here then I can re-roll with that.
Comment #16
Anonymous (not verified) commentedsubscribe
Comment #17
yched commentedTwo tricky things:
- we cannot always assume a field_attach_*() (processes all fields of an object) workflow. Fields can be displayed 'on their own' using a custom formatter (by the currently broken field_display_field() / field_format() functions or their updated counterparts in #612894: field_format() is just a pile of code that doesn't work)
- The build mode is important. Which build mode you're being built in determines the formatter to use and thus which hook_field_formatter_prepare_view() will be called.
The same node could be displayed in several build modes within the same page request (e.g 'full' node in main content, but with a couple fields deported in a 'sideblock' build mode in a sidebar).
I'd say a global $entity->#field_prepared flag might be enough if we find the proper place(s) to reset it after the job is done.
I'm still wondering whether making field_attach_view() itself 'multiple objects' (as apposed to f_a_prepare_view() being multiple-object because it does some db requests, and f_a_view() being single-object because originally it doesn't really need to be 'multiple' in itself) would not let both ops be merged into a single one (f_a_view() takes care of prepare_view), and solve the mess of 'sometimes one of them is not run'.
Would mainly affect field_default_view() (internal code). Only API change is hook_field_attach_view_alter() - not even necessarily, I think.
Comment #18
yched commented"make field_attach_view() itself multiple-objects": forget it, I guess. This still wouldn't solve the issue that node_build_multiple($nodes) calls node_build() / node_build_content(), but the last 2 can also be called directly on a single node.
Comment #19
catchfield_attach_view() being multiple would be nice, but yeah it doesn't solve the specific issue.
Here's the patch with $object->entity_view_prepared added. Not trying to solve the fields issue here since there's an existing issue, and this one has the main API addition / performance change attached which needs to be resolved sooner rather than later.
Comment #20
Anonymous (not verified) commentedThe patch looks pretty good. Also, everything appears correct after applying it. :)
Comment #21
yched commentedcatch: agreed.
Minor: we should specify that the array has to be keyed by object id.
Shouldn't we call the hook only with entities where entity_view_prepared is not TRUE ?
(and friends)
Nitpick: entity API being lower level that field API, I'd suggest calling e_prepare_view() before f_a_prepare_view()
In an earlier comment I suggested having the $type param come first. Was done for entity_prepare_view(), but now hook_entity_prepare_view() is off.
Also, given the entity_view_prepared flag, shouldn't we also call entity_prepare_view() from 'single' node_build() ?
I'm on crack. Are you, too?
Comment #22
catchKeyed by ID - added to comment.
Only prepare items without the flag - it already does that, see the $prepare array, but I shuffled code comments and logic slightly to make that more obvious / robust.
Entity hooks - I followed the pattern of field first, entity specific hooks second established elsewhere - i.e. field_attach_load() runs before both hook_entity_load() and hook_node_load(). I nearly added hook_ENTITY_TYPE_prepare_view() but don't want to add so many new hooks this late in the cycle, especially when our only core use-case works well with the single hook.
Arguments to hook_entity_prepare_view() - these are consistent with http://api.drupal.org/api/function/hook_entity_load/7 - I'd rather keep those consistent between themselves now, then look at changing these when and if we have a complete entity API in D8.
Added a call in node_build_content().
Comment #24
catchbah.
Comment #25
yched commentedAgreed with catch's answers in #22.
"Only prepare items without the flag - it already does that see the $prepare array"
Doh, my bad.
Minor drawback is that hook_entity_prepare_view receives $entities where entity_view_prepared is already set to TRUE. No biggie I guess, since by contract it receives $entities that do need to be prepared, so it shouldn't need to actually check the flag. Setting the flag after the actual 'prepare' hook has run would mean an additional loop on the $entities array. Probably not worth.
Comment #26
sunNote that I didn't really read any of the follow-ups, just the patch.
The PHPDoc of entity_prepare_view() lacks some information, IMHO. At least, I don't really have a clue what it does, and why I would want to invoke it.
The same applies more or less to the corresponding hook, although I get a slightly better idea of it after reading that.
On a related note, I don't really understand why entity_prepare_view() is invoked *after* field_attach_prepare_view()... I would have assumed that fields are the cause for potential recursion, which may not be true, but then this really needs much more docs to be not confusing.
In general though, none of the PHPDoc mentions the critical cause for this function, why we implemented it, and how it is supposed to work - i.e. "API documentation". ;)
Lastly, can we test this?
I'm on crack. Are you, too?
Comment #27
catch@yched - I'd prefer it if we only had a single route to render an entity as much as possible - i.e. not having the separate code flows for lists and single object views, but we're too late for that in D7, so better to encapsulate the issues that causes in one place if possible. We could stick the 'prepared' flag in after the hook invocation - that would be more correct, but it's more verbose too for the same effect.
@sun - I updated the phpdoc for entity_prepare_view(), see if that helps explain things.
I have not added tests yet. We implement the hook in rdf.module so it already has some code coverage, all we'd really be doing otherwise is testing that module_invoke_all() works. The re-entrant code hunks we have some actual real bugs (I think taxonomy previews on nodes) which we should add tests for when fixing those in Field API - it'll be the same pattern in both.
The real test for the RDF hunks is the query count and benchmarks, which I don't see an obvious way to account for in simpletest.
HEAD - node/n, 50 comments, anonymous user with access to view comments.
Patch:
I make that ~30%.
Comment #28
yched commented"You may optionally" sounds weird. Shouldn't it be compulsory ? "If you're an entity, you need to call this". You cannot predict what contrib or 3rd party use cases will be, right ?
double space
Hm, for those cases we generally have hook_field_attach_[op]() hooks. f_a_prepare_view() fails to call such a hook currently - different issue though.
This review is powered by Dreditor.
Comment #29
catchI've changed it to "you should". Some entities may never go through the standard rendering process, such as files in HEAD, so didn't want to make the wording too strong. Also removed the double space.
I've left the final comment as is for now.
Comment #30
yched commentedThanks !
We do need to fix this for field's own _prepare_view() op. I'll try to do this asap, but it won't be within this week.
+ opened #660514: f_a_prepare_view() has no associated hook
Comment #31
catchI've just opened #670648: New comments link on node teasers issues two queries per node which depends on this.
Comment #32
yched commentedPosted a fix for the field API counterpart in #661494: direct calls to node_view() do not trigger f_a_prepare_view()
Comment #33
dries commentedSorry, this no longer applies.
Comment #36
yched commentedRerolled after #661494: direct calls to node_view() do not trigger f_a_prepare_view() and the 'build / view' renames.
Comment #37
catchThanks for re-roll, all looks proper.
Comment #38
webchickSince this is one of the top 5 performance improvements, and it looks like a reasonable solution to the problem. I do get concerned about how many of these 'phases' we're adding everywhere, but also can't exactly think of anything better to do.
Committed to HEAD. But one thing I noticed too late is that:
are wrong. They're both view something something. But since I couldn't remember what and couldn't find it easily, marking CNW for that, as well as documenting this API change.
We also should add tests for this. So adding a tag for that as well.
Comment #39
catchHere's a followup for build_multiple/view_multiple() - build_content() is right though.
I've added this to the upgrade modules page.
Comment #40
webchickGreat, thanks! :D
Committed that small folllow-up to HEAD.