As explained at length in #367753: Bulk deletion, comments 17-32, field_attach_query() as currently implemented has a significant design flaw. It is basically never safe to call, because on a large site it can return an arbitrarily large amount of data, consuming all available memory, and there is no way to avoid the problem. Also, it turns out that the FIELD_QUERY_RETURN_VALUES option cannot be implemented correctly in any reasonable way.

This issue will fix these two problems by (1) removing FIELD_QUERY_RETURN_VALUES entirely, to be replaced by a call to field_attach_load() following field_attach_query(), and (2) adding $count and $offset arguments to field_attach_query() to allow paging through results. Callers that really want to can still specify FIELD_QUERY_NO_LIMIT for $count.

I am pulling these changes out of my Bulk Delete implementation in hopes of making that patch less scary.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

bjaspan’s picture

Status: Active » Needs review
FileSize
22.33 KB

First try.

Dries’s picture

Status: Needs review » Fixed

Easy to read, well-documented and tested. Committed to CVS HEAD.

yched’s picture

Status: Fixed » Reviewed & tested by the community
FileSize
1.71 KB

Followup for the hooks doc.

bjaspan’s picture

Status: Reviewed & tested by the community » Needs review

Holy cow, Batman! That was a fast commit. Unfortunately, I wasn't done. :-) New patch:

* Incorporates yched's doc fixes from #3.
* Adds tests for the count/cursor functionality of field_attach_query().
* Fixes a bug in the count/cursor functionality revealed by the tests. :-)
* Adds a $options argument to field_attach_load() making it possible to load a single field, since that is what callers to field_attach_query() will generally want; making them load the entire object as a replacement for RETURN_VALUES is too big a performance hit.

There are no tests for the field_attach_load() changes yet. I NEVER thought I'd have to say this, but Dries please do not commit this patch yet. :-) :-) :-)

bjaspan’s picture

I should add that the f_a_query() changes deserve a review by the original requesters of f_a_query() to make sure the new API is sufficient. I do not think we have an alternative, but that is what reviews are for.

yched’s picture

Barry: no patch :-)

bjaspan’s picture

FileSize
12.92 KB

Oops.

yched’s picture

Also : not sure of what

 * @param $count
 *   The number of results that is requested. This is only a
 *   hint to the storage engine(s); callers should be prepared to
 *   handle fewer or more results.

means for calling code. I.e : "when do I know I got all the results and can stop iterating over f_a_query() ?" when I get less than $count results ? when results is empty ?.
If I trust field_sql_storage_field_storage_query(), it seems "when I get less than $count results" works. Only thing is: results being keyed by object type, counting the actual # of results is (moderately) tedious.

I'm wondering if hook_field_storage_query() could set $cursor to a special value (defined constant equal to -1 ?) when it knows we're done (that would be when $count > 0 after exiting the loop ?)

_taxonomy_clean_field_cache() in #491190: Provide a taxonomy term field type (clean relevant field cache entries when a term is updated or deleted) is going to be one of the 1st actual use case, and I'm not sure which sample calling code to advise there.

bjaspan’s picture

The idea behind $count not being exact is that, especially since data is stored per delta and not per object, it may not be efficient to insist on the exact request number being returned.

I'm wondering if hook_field_storage_query() could set $cursor to a special value (defined constant equal to -1 ?) when it knows we're done (that would be when $count > 0 after exiting the loop ?)

Sounds like a good plan.

bjaspan’s picture

FileSize
14.58 KB

Added FIELD_QUERY_COMPLETE as suggested in #8. No tests for f_a_load($options) yet.

bjaspan’s picture

FileSize
15.89 KB

Tests for field_attach_load($options['field_name']) added.

Should we move the $age argument into $options?

Incidentally, I think I'm the one that created the pattern of a separate function for foo() and foo_revision() in Field API, and frankly I've never liked it. I'd be happy to see the foo_revision() functions go away, with users using whatever argument foo_revision() uses when it calls foo() anyway. But that's a separate issue.

bjaspan’s picture

The answer is yes, we should move $age into $options, but this is getting farther afield from the purpose of this patch, so let's leave it for a followup issue.

I now think the patch is ready for an RTBC and commit.

yched’s picture

FileSize
78.04 KB

- PHPdoc for field_attach_load() and field_attach_load_revision(), $options param:
"Note that returned objects may contain data for other fields, for example if they are read from a cache."
Doesn't seem true, since the code sets $cacheable to FALSE if there is an $options['field_name'], and nothing will be read from cache ?

- Do we still want to have field_sql_storage_field_storage_query() return "stub objects" ? Previously we only returned
$return[$row->type][$id] = $row->entity_id; for FIELD_QUERY_RETURN_IDS.
Why not, I guess. The caller will not be able to build those stub objects with just an id if he needs them (which I suspect is the case for bulk delete ?)

- Optimization in field_sql_storage_field_storage_query: when arriving at the last rows, and the last query returns n < (query range) rows, the code recently committed needs an additional query returning *no* rows before we can exit the do-while loop.
Similarly, with patch #11, FIELD_QUERY_COMPLETE is set only if no results were found, so f_a_query() callers always need an additional call after the one which returned the last actual results (thus an additional 'empty' query).

Attached patch should get rid of those extra queries by keeping two separate counters ($row_count and $obj_count). When the query asked for n rows returns less than n rows, we can exit the do-while and set FIELD_QUERY_COMPLETE (I also changed the code from decremental counters to incremental counters, having ++s mixed with --s was a bit confusing)

Also, removed the references to FIELD_QUERY_RETURN_IDS in the tests.
I didn't change anything related to the first two remarks, waiting for Barry's confirmation.

yched’s picture

FileSize
17.31 KB

Doh, same patch without commenting out tests in FieldAttachTestCase (I often comment out unrelated tests so that I can faster run and re-run the test that interests me, and regularly forget to uncomment before rolling patches...)

yched’s picture

FileSize
17.3 KB

Sigh. And without the (done) note-to-self TODO.

bjaspan’s picture

Instead of removing the comment about other field data being returned, perhaps we should replace $cacheable with $cache_read and $cache_write. There is no reason not to read objects from the cache just because we only want one field, but we cannot add such objects to the cache.

The reason for f_a_query() to return stub objects is so callers can pass the returned array directly to f_a_load(), yes.

I'm fine with your optimization to return FIELD_QUERY_COMPLETE one call earlier. I thought it was good to make one last call to verify no new data had been added, but that is not really essential.

yched’s picture

$cache_read and $cache_write sound good.

yched’s picture

FileSize
17.83 KB

Updated f_a_load() with $cache_read and $cache_write.

Which should make this ready.

yched’s picture

Status: Needs review » Reviewed & tested by the community

Test are green, so RTBC.

Status: Reviewed & tested by the community » Needs work

The last submitted patch failed testing.

yched’s picture

Status: Needs work » Reviewed & tested by the community

Rerolled. Getting this one in would make it easier to reroll and review #367753: Bulk deletion.

yched’s picture

FileSize
17.83 KB

Doh, here's the patch.

yched’s picture

Also, Dries: webchick prefers to leave this one to you, since you're already part of it.

Dries’s picture

Status: Reviewed & tested by the community » Fixed

Committed to CVS HEAD. Thanks.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.