Multiple load files [#348201]

Comment	File	Size	Author
#22	file_load_multiple.patch	17 KB	catch

#18	file_load_multiple.patch	20.01 KB	catch

#17	file_load_multiple.patch	20.01 KB	catch

#15	file_load_multiple.patch	20.05 KB	catch

#14	file_load_multiple.patch	19.87 KB	catch

#13	file_348201.patch	20.21 KB	drewish

#12	file_load_multiple.patch	19.72 KB	catch

#10	file_348201.patch	20.01 KB	drewish

#8	file_load_multiple.patch	14.74 KB	catch

#7	file_load_multiple.patch	14.64 KB	catch

#6	file_load_multiple.patch	14.81 KB	catch

#5	file_load_multiple.patch	14.83 KB	catch

#3	file_load_multiple.patch	13.11 KB	catch

#2	file_load_multiple.patch	11.53 KB	catch

	file_load_multiple.patch	8.03 KB	catch

Comment #1

drewish CreditAttribution: drewish commented 18 December 2008 at 08:01

Status:

Needs review

» Needs work

sweet! i'd been meaning to start working on this patch.

looking over it i didn't see anything big that jumped out at me... the indenting on:
+ // Remove any loaded files from the array if they don't match $conditions.
was off by a space.

i'd love to expand the unit tests a bit so they cover the file_multiple_load() in more depth.

we also need to update the hook_node_load() docs.

Log in or register to post comments

Comment #2

catch

he/him

English

CreditAttribution: catch commented 18 December 2008 at 23:33

File	Size
file_load_multiple.patch	11.53 KB

Modifined hook_file_load() docs, fixed whitespace, fully converted upload_nodeapi_load() so that it takes two queries to get all the files attached to the nodes passed in (this compares to a query per file per node in HEAD).

Still needs some extra tests so leaving at CNW.

Log in or register to post comments

Comment #3

catch

he/him

English

CreditAttribution: catch commented 19 December 2008 at 00:06

Status:

Needs work

» Needs review

File	Size
file_load_multiple.patch	13.11 KB

Now with tests.

in irc drewish suggested removing the mixed param to file_load and make it just take fid. I'm a bit split on this since it seems like filepath (for example), ought to be unique, so providing a function to get a single file by that makes sense - but if it's only used in testing (which it seems like it is), then maybe that's not necessary.

Log in or register to post comments

Comment #4

drewish CreditAttribution: drewish commented 19 December 2008 at 00:12

catch, ought to is the key word. see #329301: Rename {files} to {file} and add unique key to the filepath column

Log in or register to post comments

Comment #5

catch

he/him

English

CreditAttribution: catch commented 19 December 2008 at 00:43

File	Size
file_load_multiple.patch	14.83 KB

file_load() takes only $fid as argument now.

Log in or register to post comments

Comment #6

catch

he/him

English

CreditAttribution: catch commented 19 December 2008 at 00:55

File	Size
file_load_multiple.patch	14.81 KB

better variable naming via drewish in irc.

Log in or register to post comments

Comment #7

catch

he/him

English

CreditAttribution: catch commented 19 December 2008 at 01:35

File	Size
file_load_multiple.patch	14.64 KB

Avoid a nested foreach in upload_nodeapi_load() and a small testing cleanup.

Log in or register to post comments

Comment #8

catch

he/him

English

CreditAttribution: catch commented 19 December 2008 at 02:05

File	Size
file_load_multiple.patch	14.74 KB

drewish suggested some better docs for the end of file_load_multiple() - array_intersect_key() etc. - I want to abstract this and maybe some other bits of the multiple_load stuff out into helper function(s) which will allow for hopefully making each one a lot less verbose, but that's in a followup patch.

Log in or register to post comments

Comment #9

catch

he/him

English

CreditAttribution: catch commented 19 December 2008 at 02:50

Haven't benchmarked this yet but here's some maths.

10 nodes with 5 uploads each, quite a lot but not outlandish.

HEAD:
Each node requires 1 query in upload_load() to grab the fids = 10 queries
Each file_load requires a query = 50
Each upload_file_load() requires a query = 50
Each file had seven cats, oh wait...
= 110

Patch:
One query each in upload_nodeapi_load(), file_load_multiple() and upload_file_load()
= 3

Viewing a single node with 5 uploads would be 11 queries vs. 3 with the patch, so a saving there as well.

Log in or register to post comments

Comment #10

drewish CreditAttribution: drewish commented 19 December 2008 at 03:14

File	Size
file_348201.patch	20.01 KB

catch, this looks *really* good. I'll confess that I spent a bit of time scratching my head wondering why hook_file_load($files) wasn't going by reference... then it finally clicked that the array isn't byref but the objects inside it were. Anyway...

I made a few changes to white space and comments.

In upload_nodeapi_load() I added this bit:

  // If there aren't any vids don't bother trying to load files.
  if (empty($node_vids)) {
    return;
  }

to avoid the extra query when loading types without uploading enabled.

I flushed out the tests a bit. We could probably do more but I think this gets the important bits. In my mind this is RTBC but I'd love to have a third party give it a look.

Log in or register to post comments

Comment #11

Dave Reid

he/him

English

Nebraska USA

CreditAttribution: Dave Reid commented 19 December 2008 at 19:34

Initial patch code review (will do a through functionality test in a little bit):

In file_load_multiple:
$query->conditions('f.' . $field, $value); should be $query->condition(...).

In upload_file_load()/hook_file_load:
With db_query(...)->fetchAll(PDO::FETCH_ASSOC), do not need to cast results into arrays in the foreach loops. They are already arrays.

In upload_nodeapi_load():

+  // Fetch the fids associated with these node revisions.
+  $result = db_query('SELECT u.fid, u.nid, u.vid FROM {upload} u WHERE u.vid IN (' . db_placeholders($node_vids) . ') ORDER BY u.weight, u.fid', $node_vids);
+
+  // We need to build an array of the file ids for file_load_multiple() and an
+  // array of upload records for matching files to nodes.
+  $fids = array();
+  $uploads = array();
+  foreach ($result as $record) {
+    $fids[] = $record->fid;
+    $uploads[] = $record;
+  }

Could easily be condensed to:

$uploads = db_query('SELECT u.fid, u.nid, u.vid FROM {upload} u WHERE u.vid IN (' . db_placeholders($node_vids) . ') ORDER BY u.weight, u.fid', $node_vids)->fetchAllAssoc('fid');
$fids = array_keys($uploads);

Log in or register to post comments

Comment #12

catch

he/him

English

CreditAttribution: catch commented 19 December 2008 at 21:11

File	Size
file_load_multiple.patch	19.72 KB

Thanks Dave, fixed the typo and the array casting.

With the last point - afaik there's nothing stopping two uploads sharing the same fid (particularly if there's contrib interaction with the upload UI), and fetchAllAssoc('fid') would overwrite should that be the case - so I deliberately didn't use it here. If I'm wrong on that, then yes it would indeed be more concise, have left it for now though.

Log in or register to post comments

Comment #13

drewish CreditAttribution: drewish commented 19 December 2008 at 22:42

File	Size
file_348201.patch	20.21 KB

Actually you don't even need contrib to get multiple nodes sharing an fid... just translate the node. It looks really awkward though and to help prevent this from being "optimized" later I've reworked the comments in this function to explain why we're doing it the long way.

Log in or register to post comments

Comment #14

catch

he/him

English

CreditAttribution: catch commented 19 December 2008 at 23:20

File	Size
file_load_multiple.patch	19.87 KB

Added some clarifications about the $conditions array after some review with Drewish and Damz in irc (this pretty much matches the node_load_multiple() docs now). Also we can do a select * rather than hitting the schema in the select now. This might be worth porting back into taxonomy_load_multiple() too.

Log in or register to post comments

Comment #15

catch

he/him

English

CreditAttribution: catch commented 19 December 2008 at 23:23

File	Size
file_load_multiple.patch	20.05 KB

Cross-posted with drewish, here's a merge of the two.

Log in or register to post comments

Comment #16

drewish CreditAttribution: drewish commented 19 December 2008 at 23:43

Status:

Needs review

» Reviewed & tested by the community

I gave it another close look and at this point I'd say it's good to go.

Log in or register to post comments

Comment #17

catch

he/him

English

CreditAttribution: catch commented 24 December 2008 at 14:49

File	Size
file_load_multiple.patch	20.01 KB

Re-rolled for the new spangly db_placeholders() replacement.

Log in or register to post comments

Comment #18

catch

he/him

English

CreditAttribution: catch commented 24 December 2008 at 16:17

File	Size
file_load_multiple.patch	20.01 KB

And without whitespace silliness.

Quick benchmarks - node/1 with 5 uploads. Touch faster. Can't get devel generate to work with uploads in HEAD so will have to wait for those.

HEAD:
5.57 #/sec
5.69 #/sec
5.78 #/sec

Patch:
5.82 #/sec
5.87 #/sec
5.83 #/sec

Log in or register to post comments

Comment #19

catch

he/him

English

CreditAttribution: catch commented 29 December 2008 at 14:24

Got devel to generate some uploads. Ten nodes on front page, an upload for each:

ab -c1 -n500 http://d7.7/node

All numbers are requests/sec

HEAD:
4.68
4.66
4.68

Patch:
4.73
4.83
4.69

Devel confirms that we save 27 queries in the same comparison - each node with an upload currently requires 1 initial query + 2 per upload, that's 30 queries in the above example, it's three with the patch regardless of how many nodes or uploads are loaded.

HEAD:
Executed 68 queries in 17.74 milliseconds.

Patch
Executed 41 queries in 14.2 milliseconds

Log in or register to post comments

Comment #20

Dries CreditAttribution: Dries commented 30 December 2008 at 20:51

catch, can you try to disable the new file caching and then run the benchmarks again?

Log in or register to post comments

Comment #21

catch

he/him

English

CreditAttribution: catch commented 30 December 2008 at 21:49

Hi Dries,

I'd love to, but I don't know what 'the new file caching' is.

Log in or register to post comments

Comment #22

catch

he/him

English

CreditAttribution: catch commented 30 December 2008 at 22:46

File	Size
file_load_multiple.patch	17 KB

webchick and Damz suggest it's the $file_cache static, which makes sense of course. And after some discussion in irc, there doesn't seem a lot of reason to have it in this case - i.e. files attached to nodes will be cached in the node static anyway, and we're not going to load the same file on the same page that often.

Here's a patch which rips that bit out, and benchmarks. A bit faster if anything.

HEAD:
4.10
4.11
4.15

Patch
4.42
4.34
4.31

Log in or register to post comments

Comment #23

Dries CreditAttribution: Dries commented 31 December 2008 at 11:09

Status:

Reviewed & tested by the community

» Fixed

When reviewing the patch, the static caching looked completely redundant to me, so I was wondering if we could rip it out. I should have been more explicit about that in my comment. Either way, you did exactly what I was hoping for.

I committed this patch to CVS HEAD. +1 for catch. Thanks! :)

At one point, we'll need a CHANGELOG.txt entry to document all the "load multiple" stuff.