Looks very similar to the taxonomy and user versions, upload.module is converted - one possible bit of optimisation would be to make upload_node take an array of nodes and grab every file in one go, at the moment we only reduce queries if there's >1 file upload attached to a node. Tests pass.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

drewish’s picture

Status: Needs review » Needs work

sweet! i'd been meaning to start working on this patch.

looking over it i didn't see anything big that jumped out at me... the indenting on:
+ // Remove any loaded files from the array if they don't match $conditions.
was off by a space.

i'd love to expand the unit tests a bit so they cover the file_multiple_load() in more depth.

we also need to update the hook_node_load() docs.

catch’s picture

FileSize
11.53 KB

Modifined hook_file_load() docs, fixed whitespace, fully converted upload_nodeapi_load() so that it takes two queries to get all the files attached to the nodes passed in (this compares to a query per file per node in HEAD).

Still needs some extra tests so leaving at CNW.

catch’s picture

Status: Needs work » Needs review
FileSize
13.11 KB

Now with tests.

in irc drewish suggested removing the mixed param to file_load and make it just take fid. I'm a bit split on this since it seems like filepath (for example), ought to be unique, so providing a function to get a single file by that makes sense - but if it's only used in testing (which it seems like it is), then maybe that's not necessary.

drewish’s picture

catch’s picture

FileSize
14.83 KB

file_load() takes only $fid as argument now.

catch’s picture

FileSize
14.81 KB

better variable naming via drewish in irc.

catch’s picture

FileSize
14.64 KB

Avoid a nested foreach in upload_nodeapi_load() and a small testing cleanup.

catch’s picture

FileSize
14.74 KB

drewish suggested some better docs for the end of file_load_multiple() - array_intersect_key() etc. - I want to abstract this and maybe some other bits of the multiple_load stuff out into helper function(s) which will allow for hopefully making each one a lot less verbose, but that's in a followup patch.

catch’s picture

Haven't benchmarked this yet but here's some maths.

10 nodes with 5 uploads each, quite a lot but not outlandish.

HEAD:
Each node requires 1 query in upload_load() to grab the fids = 10 queries
Each file_load requires a query = 50
Each upload_file_load() requires a query = 50
Each file had seven cats, oh wait...
= 110

Patch:
One query each in upload_nodeapi_load(), file_load_multiple() and upload_file_load()
= 3

Viewing a single node with 5 uploads would be 11 queries vs. 3 with the patch, so a saving there as well.

drewish’s picture

FileSize
20.01 KB

catch, this looks *really* good. I'll confess that I spent a bit of time scratching my head wondering why hook_file_load($files) wasn't going by reference... then it finally clicked that the array isn't byref but the objects inside it were. Anyway...

I made a few changes to white space and comments.

In upload_nodeapi_load() I added this bit:

  // If there aren't any vids don't bother trying to load files.
  if (empty($node_vids)) {
    return;
  }

to avoid the extra query when loading types without uploading enabled.

I flushed out the tests a bit. We could probably do more but I think this gets the important bits. In my mind this is RTBC but I'd love to have a third party give it a look.

Dave Reid’s picture

Initial patch code review (will do a through functionality test in a little bit):

In file_load_multiple:
$query->conditions('f.' . $field, $value); should be $query->condition(...).

In upload_file_load()/hook_file_load:
With db_query(...)->fetchAll(PDO::FETCH_ASSOC), do not need to cast results into arrays in the foreach loops. They are already arrays.

In upload_nodeapi_load():

+  // Fetch the fids associated with these node revisions.
+  $result = db_query('SELECT u.fid, u.nid, u.vid FROM {upload} u WHERE u.vid IN (' . db_placeholders($node_vids) . ') ORDER BY u.weight, u.fid', $node_vids);
+
+  // We need to build an array of the file ids for file_load_multiple() and an
+  // array of upload records for matching files to nodes.
+  $fids = array();
+  $uploads = array();
+  foreach ($result as $record) {
+    $fids[] = $record->fid;
+    $uploads[] = $record;
+  }

Could easily be condensed to:

$uploads = db_query('SELECT u.fid, u.nid, u.vid FROM {upload} u WHERE u.vid IN (' . db_placeholders($node_vids) . ') ORDER BY u.weight, u.fid', $node_vids)->fetchAllAssoc('fid');
$fids = array_keys($uploads);
catch’s picture

FileSize
19.72 KB

Thanks Dave, fixed the typo and the array casting.

With the last point - afaik there's nothing stopping two uploads sharing the same fid (particularly if there's contrib interaction with the upload UI), and fetchAllAssoc('fid') would overwrite should that be the case - so I deliberately didn't use it here. If I'm wrong on that, then yes it would indeed be more concise, have left it for now though.

drewish’s picture

FileSize
20.21 KB

Actually you don't even need contrib to get multiple nodes sharing an fid... just translate the node. It looks really awkward though and to help prevent this from being "optimized" later I've reworked the comments in this function to explain why we're doing it the long way.

catch’s picture

FileSize
19.87 KB

Added some clarifications about the $conditions array after some review with Drewish and Damz in irc (this pretty much matches the node_load_multiple() docs now). Also we can do a select * rather than hitting the schema in the select now. This might be worth porting back into taxonomy_load_multiple() too.

catch’s picture

FileSize
20.05 KB

Cross-posted with drewish, here's a merge of the two.

drewish’s picture

Status: Needs review » Reviewed & tested by the community

I gave it another close look and at this point I'd say it's good to go.

catch’s picture

FileSize
20.01 KB

Re-rolled for the new spangly db_placeholders() replacement.

catch’s picture

FileSize
20.01 KB

And without whitespace silliness.

Quick benchmarks - node/1 with 5 uploads. Touch faster. Can't get devel generate to work with uploads in HEAD so will have to wait for those.

HEAD:
5.57 #/sec
5.69 #/sec
5.78 #/sec

Patch:
5.82 #/sec
5.87 #/sec
5.83 #/sec

catch’s picture

Got devel to generate some uploads. Ten nodes on front page, an upload for each:

ab -c1 -n500 http://d7.7/node

All numbers are requests/sec

HEAD:
4.68
4.66
4.68

Patch:
4.73
4.83
4.69

Devel confirms that we save 27 queries in the same comparison - each node with an upload currently requires 1 initial query + 2 per upload, that's 30 queries in the above example, it's three with the patch regardless of how many nodes or uploads are loaded.

HEAD:
Executed 68 queries in 17.74 milliseconds.

Patch
Executed 41 queries in 14.2 milliseconds

Dries’s picture

catch, can you try to disable the new file caching and then run the benchmarks again?

catch’s picture

Hi Dries,

I'd love to, but I don't know what 'the new file caching' is.

catch’s picture

FileSize
17 KB

webchick and Damz suggest it's the $file_cache static, which makes sense of course. And after some discussion in irc, there doesn't seem a lot of reason to have it in this case - i.e. files attached to nodes will be cached in the node static anyway, and we're not going to load the same file on the same page that often.

Here's a patch which rips that bit out, and benchmarks. A bit faster if anything.

HEAD:
4.10
4.11
4.15

Patch
4.42
4.34
4.31

Dries’s picture

Status: Reviewed & tested by the community » Fixed

When reviewing the patch, the static caching looked completely redundant to me, so I was wondering if we could rip it out. I should have been more explicit about that in my comment. Either way, you did exactly what I was hoping for.

I committed this patch to CVS HEAD. +1 for catch. Thanks! :)

At one point, we'll need a CHANGELOG.txt entry to document all the "load multiple" stuff.

catch’s picture

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for two weeks with no activity.