Support for Drupal 7 is ending on 5 January 2025—it’s time to migrate to Drupal 10! Learn about the many benefits of Drupal 10 and find migration tools in our resource center.
Looks very similar to the taxonomy and user versions, upload.module is converted - one possible bit of optimisation would be to make upload_node take an array of nodes and grab every file in one go, at the moment we only reduce queries if there's >1 file upload attached to a node. Tests pass.
Comment | File | Size | Author |
---|---|---|---|
#22 | file_load_multiple.patch | 17 KB | catch |
#18 | file_load_multiple.patch | 20.01 KB | catch |
#17 | file_load_multiple.patch | 20.01 KB | catch |
#15 | file_load_multiple.patch | 20.05 KB | catch |
#14 | file_load_multiple.patch | 19.87 KB | catch |
Comments
Comment #1
drewish CreditAttribution: drewish commentedsweet! i'd been meaning to start working on this patch.
looking over it i didn't see anything big that jumped out at me... the indenting on:
+ // Remove any loaded files from the array if they don't match $conditions.
was off by a space.
i'd love to expand the unit tests a bit so they cover the file_multiple_load() in more depth.
we also need to update the hook_node_load() docs.
Comment #2
catchModifined hook_file_load() docs, fixed whitespace, fully converted upload_nodeapi_load() so that it takes two queries to get all the files attached to the nodes passed in (this compares to a query per file per node in HEAD).
Still needs some extra tests so leaving at CNW.
Comment #3
catchNow with tests.
in irc drewish suggested removing the mixed param to file_load and make it just take fid. I'm a bit split on this since it seems like filepath (for example), ought to be unique, so providing a function to get a single file by that makes sense - but if it's only used in testing (which it seems like it is), then maybe that's not necessary.
Comment #4
drewish CreditAttribution: drewish commentedcatch, ought to is the key word. see #329301: Rename {files} to {file} and add unique key to the filepath column
Comment #5
catchfile_load() takes only $fid as argument now.
Comment #6
catchbetter variable naming via drewish in irc.
Comment #7
catchAvoid a nested foreach in upload_nodeapi_load() and a small testing cleanup.
Comment #8
catchdrewish suggested some better docs for the end of file_load_multiple() - array_intersect_key() etc. - I want to abstract this and maybe some other bits of the multiple_load stuff out into helper function(s) which will allow for hopefully making each one a lot less verbose, but that's in a followup patch.
Comment #9
catchHaven't benchmarked this yet but here's some maths.
10 nodes with 5 uploads each, quite a lot but not outlandish.
HEAD:
Each node requires 1 query in upload_load() to grab the fids = 10 queries
Each file_load requires a query = 50
Each upload_file_load() requires a query = 50
Each file had seven cats, oh wait...
= 110
Patch:
One query each in upload_nodeapi_load(), file_load_multiple() and upload_file_load()
= 3
Viewing a single node with 5 uploads would be 11 queries vs. 3 with the patch, so a saving there as well.
Comment #10
drewish CreditAttribution: drewish commentedcatch, this looks *really* good. I'll confess that I spent a bit of time scratching my head wondering why hook_file_load($files) wasn't going by reference... then it finally clicked that the array isn't byref but the objects inside it were. Anyway...
I made a few changes to white space and comments.
In upload_nodeapi_load() I added this bit:
to avoid the extra query when loading types without uploading enabled.
I flushed out the tests a bit. We could probably do more but I think this gets the important bits. In my mind this is RTBC but I'd love to have a third party give it a look.
Comment #11
Dave ReidInitial patch code review (will do a through functionality test in a little bit):
In file_load_multiple:
$query->conditions('f.' . $field, $value);
should be$query->condition(...)
.In upload_file_load()/hook_file_load:
With
db_query(...)->fetchAll(PDO::FETCH_ASSOC)
, do not need to cast results into arrays in the foreach loops. They are already arrays.In upload_nodeapi_load():
Could easily be condensed to:
Comment #12
catchThanks Dave, fixed the typo and the array casting.
With the last point - afaik there's nothing stopping two uploads sharing the same fid (particularly if there's contrib interaction with the upload UI), and fetchAllAssoc('fid') would overwrite should that be the case - so I deliberately didn't use it here. If I'm wrong on that, then yes it would indeed be more concise, have left it for now though.
Comment #13
drewish CreditAttribution: drewish commentedActually you don't even need contrib to get multiple nodes sharing an fid... just translate the node. It looks really awkward though and to help prevent this from being "optimized" later I've reworked the comments in this function to explain why we're doing it the long way.
Comment #14
catchAdded some clarifications about the $conditions array after some review with Drewish and Damz in irc (this pretty much matches the node_load_multiple() docs now). Also we can do a select * rather than hitting the schema in the select now. This might be worth porting back into taxonomy_load_multiple() too.
Comment #15
catchCross-posted with drewish, here's a merge of the two.
Comment #16
drewish CreditAttribution: drewish commentedI gave it another close look and at this point I'd say it's good to go.
Comment #17
catchRe-rolled for the new spangly db_placeholders() replacement.
Comment #18
catchAnd without whitespace silliness.
Quick benchmarks - node/1 with 5 uploads. Touch faster. Can't get devel generate to work with uploads in HEAD so will have to wait for those.
HEAD:
5.57 #/sec
5.69 #/sec
5.78 #/sec
Patch:
5.82 #/sec
5.87 #/sec
5.83 #/sec
Comment #19
catchGot devel to generate some uploads. Ten nodes on front page, an upload for each:
Devel confirms that we save 27 queries in the same comparison - each node with an upload currently requires 1 initial query + 2 per upload, that's 30 queries in the above example, it's three with the patch regardless of how many nodes or uploads are loaded.
HEAD:
Executed 68 queries in 17.74 milliseconds.
Patch
Executed 41 queries in 14.2 milliseconds
Comment #20
Dries CreditAttribution: Dries commentedcatch, can you try to disable the new file caching and then run the benchmarks again?
Comment #21
catchHi Dries,
I'd love to, but I don't know what 'the new file caching' is.
Comment #22
catchwebchick and Damz suggest it's the $file_cache static, which makes sense of course. And after some discussion in irc, there doesn't seem a lot of reason to have it in this case - i.e. files attached to nodes will be cached in the node static anyway, and we're not going to load the same file on the same page that often.
Here's a patch which rips that bit out, and benchmarks. A bit faster if anything.
HEAD:
4.10
4.11
4.15
Patch
4.42
4.34
4.31
Comment #23
Dries CreditAttribution: Dries commentedWhen reviewing the patch, the static caching looked completely redundant to me, so I was wondering if we could rip it out. I should have been more explicit about that in my comment. Either way, you did exactly what I was hoping for.
I committed this patch to CVS HEAD. +1 for catch. Thanks! :)
At one point, we'll need a CHANGELOG.txt entry to document all the "load multiple" stuff.
Comment #24
catchExcellent, thanks!
Documented here: http://drupal.org/node/224333#file_load_multiple
See also #347250: Multiple load users and #351797: Multiple load vocabularies.