field_attach_query is totally useless. Edit: this proposal allows querying for entities based on their properties (both stored in the entity table and in field columns) but nothing more. Ie. if you want to list nodes created by users who joined the last 10 days that wont work. But, I am fairly sure quite a number of sites (yes, very likely examiner.com too) can be built by using this small query builder.

/**
 * Retrieve entities matching a given set of conditions.
 *
 * Note that the query 'conditions' only apply to the stored values.
 * In a regular field_attach_load() call, field values additionally go through
 * hook_field_load() and hook_field_attach_load() invocations, which can add
 * to or affect the raw stored values. The results of field_attach_query()
 * might therefore differ from what could be expected by looking at a regular,
 * fully loaded entity.
 *
 * Not all storage engines are required to support queries on all column, or
 * with all operators below. A FieldQueryException will be raised if an
 * unsupported condition is specified or if the query does not specify exactly
 * one field storage engine.
 *
 * @param $entity_conditions
 *   Either a list of conditions, where each condition is
 *   array(entity type, column, value, operator) or just a list of
 *   entity types.
 * @param $field_conditions
 *   Another list of conditions, where each condition is
 *   array(field_name, column, value, operator, delta group,
 *   language group). delta group and language_group are optional arbitrary
 *   identifiers, conditions in the same group must have the same delta /
 *   language.
 *
 * Supported operators:
 *   - '=', '!=', '>', '>=', '<', '<=', 'STARTS_WITH': these operators expect
 *     the value as a literal of the same type as the column,
 *   - 'IN', 'NOT IN': this operator expects the value as an array of
 *     literals of the same type as the column.
 *   - 'BETWEEN': this operator expects the value as an array of two literals
 *     of the same type as the column.
 *   The operator can be ommitted, and will default to 'IN' if the value is an
 *   array, or to '=' otherwise.
 *
 * Example values for $entity_conditions:
 * @code
 * array(
 *   array('node', 'status', 1),
 *   array('node', 'created', 1234567890, '>'),
 * );
 * array('node', 'user');
 * @endcode
 *
 * Example values for $field_conditions:
 *   @code
 *   array(
 *     array('counter', 'value', 12, '>'),
 *     array('classification', 'tid', 15),
 *   );
 *   array(
 *     array('body', 'format', 1, '=', 'node_body'),
 *     array('body', 'summary', 'dear', 'STARTS_WITH', 'node_body'),
 *   );
 *   @endcode
 *
 * @param $options
 *   An associative array of additional options:
 *   - deleted: TRUE to include deleted fields.
 *   - limit: The number of results that is requested. This is only a hint to
 *     the storage engine(s); callers should be prepared to handle fewer or
 *     more results. Specify FIELD_QUERY_NO_LIMIT to retrieve all available
 *     entities. This option has a default value of 0 so callers must make an
 *     explicit choice to potentially retrieve an enormous result set.
 *   - count: If TRUE, return a single count of all matching entities; limit and
 *     cursor are ignored.
 *   - age: Internal use only. Use field_attach_query_revisions() instead of
 *     passing FIELD_LOAD_REVISION.
 *     - FIELD_LOAD_CURRENT (default): query the most recent revisions for all
 *       entities. The results will be keyed by entity type and entity id.
 *     - FIELD_LOAD_REVISION: query all revisions. The results will be keyed by
 *       entity type and entity revision id.
 *   - sort: A list of sorts. Every sort is a list. To sort on an entity column,
 *     use array('entity', $column, $direction) where $column is a column
 *     defined in hook_schema() for the entity base table and $direction is
 *     'asc' or 'desc'. To sort on a field column, use array('field',
 *     $field_name, $colum_name, $direction).
 *   @code
 *     array(
 *       array('entity', 'status', 'asc'),
 *       array('field', 'classification', 'tid', 'desc'),
 *     )
 *   @endcode
 * @return
 *   An array or an object implementing Traversable, each row will be an array
 *   or an object, containing at least entity_type and entity_id.
 */
CommentFileSizeAuthor
#146 tests.patch33.73 KBdmitrig01
#144 entity-query-780154-144.patch101.6 KBchx
#143 entity-query-780154-143.patch101.6 KBchx
#137 entity-query-780154-135.patch101.58 KBchx
#134 entity-query-780154-134.patch71.52 KBbjaspan
#131 780154-131.patch101.66 KBchx
#130 780154-130.patch101.74 KBchx
#120 780154-120.patch101.29 KBjhodgdon
#119 780154-119.patch99.59 KBchx
#118 780154-118.patch99.59 KBchx
#111 780154_entity_field_query_111.patch101.01 KBaspilicious
#109 780154-minor-reorg-109.patch99.67 KBpwolanin
#108 780154-minor-reorg-108.patch66.38 KBpwolanin
#106 780154-doccleaned.patch100.69 KBjhodgdon
#100 entity_field_query_21.patch99.15 KBdhthwy
#98 entity_field_query.patch86.38 KBchx
#96 entity_field_query_19.patch86.77 KBdhthwy
#94 field_listing_api.patch77.65 KBbcn
#90 fields_listing_api.patch79.32 KBbcn
#87 entity_field_query.patch77.64 KBchx
#80 entity_field_query.patch76.24 KBchx
#79 entity_field_query_16.patch63.43 KBdawehner
#77 entity_field_query.patch76.73 KBchx
#74 entity_field_query.patch76.59 KBchx
#67 entity_field_query.patch75.38 KBchx
#58 780154_entity_field_query_58.patch74.94 KBaspilicious
#57 entity_field_query.patch73.28 KBchx
#54 entity_field_query.patch73.27 KBchx
#50 entity_field_query.patch73.1 KBchx
#49 entity_field_query.patch73.34 KBchx
#46 entity_field_query.patch73.21 KBchx
#45 entity_field_query.patch73.22 KBchx
#40 entity_field_query.patch72.72 KBchx
#37 entity_field_query.patch58 KBchx
#35 entity_field_query.patch57.1 KBchx
#34 entity_field_query.patch58.78 KBchx
#29 entity_field_query.patch70.11 KBchx
#24 entity_field_query.patch67.67 KBchx
#22 entity_field_query.patch68.67 KBchx
#21 entity_field_query.patch71 KBchx
#18 field_attach_query.patch49.48 KBchx
#6 field_attach_query.patch48.6 KBchx
Support from Acquia helps fund testing for Drupal Acquia logo

Comments

dmitrig01’s picture

Crell’s picture

I am intrigued by this concept. Positively so. The API implied by the docblock above is, I think, not good, as it's still passing around lots of complicated arrays, but the principle is sound. I'd be much happier with an interface that was modeled on a join-less, reduced capability version of the SelectQuery builder in the DB layer as it will be much easier to work with and read. If we can get something like this in core, though, that would be highly nifty.

It's kinda like a micro-views for fields, which is cool.

chx’s picture

Sure thing we can flip this around to be

$query = new fieldAttachQuery;
$query
  ->fieldCondition($field_name, $column_name, $value, $operator);
  ->entityCondition($entity_column, $value, $operator);
  ->sort($field_name, $field_column, $direction);
$entities = $query->execute();
foreach ($entities as $entity) ...

that's a miniscule detail and I am not opposed to it if that's what it takes to get my baby views in core.

Crell’s picture

Yep, that's a much nicer looking approach. I like. (Give or take details that we'll hash out in later comments once I've had a chance to look at the code, of course.)

Damien Tournoud’s picture

That would be a really nice addition/replacement to our current (very limited) field_attach_query().

I quickly discussed the API with chx. It seems that what we only need here is a class that:

(1) collect the query data,
(2) determine which field storage module applies to the query (and raise an exception if a query is attempted across storage modules)
(3) call the hook_field_storage_query() of the field storage module and pass it relevant data

chx’s picture

FileSize
48.6 KB
chx’s picture

So we have this really pretty field API, gives you forms, insert/update/delete and load. But, if you want to create a list of anything (like the Drupal frontpage, users tagged by a term etc) you have the following options:

  1. Run field_attach_query and retrieve the field values for every entity. You dont have another choice here because there is no sort and so limit is useless. Repeat for every column you have a condition on... this does not lead anywhere.
  2. One day in the very distant future every storage engine will have Views support and field integration too. Hopefully.
  3. Just give up and write direct queries against the storage engine you use today. Hopefully it's the right choice in 1-2 years too... because you have tied your applicaiton to that database.

If this gets in then a surprisingly large amount of listings can simply go through this simple (and yes it IS simple) query builder which is trivial to support by a field storage engine (the SQL implementation is 80 LoC).

While we were developing examiner.com and the mongodb query backend we needed to change the data structure a few times of how fields are stored in mongodb and rewrite all the 80+ queries we had. By hand. Several times. It was fun. Maybe not. So to add to 3. you should hope that not just the database backend you picked is up to the task but the Drupal integration too. 'Cos if this would be in, we could just change the field_storage_query and noone is the wiser.

chx’s picture

Also note that this patch was not even possible to write before we did months of work with the field API so we know what's really needed and what are the pitfalls. Read the delta_group doxygen for the pitfall of the decade :)

bleen’s picture

Wow ... that's the most commented patch I'm ever seen. Rockin'

bjaspan’s picture

subscribe

chx’s picture

Also note that I wrote the docs and Dmitri wrote tests based on docs and I wrote the implementation totally separately. Test driven development. Worked fairly well even if in the current state the tests dont pass. Before the OOP wrapper they did :)

catch’s picture

field_attach_query() was left intentionally somewhat useless because when it was added, we had no idea what non-SQL storage backends would look like - see issues like #552716: Impossible to join or order by on field tables without using Views. However now we do, we know it's far too limited to be of use with the only viable non-SQL fields backend we have.

I looked at the initial phpdoc for the patch with chx at SF but haven't reviewed the code yet, however this offers roughly as much functionality as you get with the PHP mongodb extension (and not more than that except perhaps some edge cases), and a very useful subset of what's possible with custom queries or Views against SQL field storage.

We're running into similar issues in #556022: When a text format is deleted, some modules (like text.module) don't update their data, even though we say they do where barry just wrote an initial patch for field_attach_bulk_update().

chx’s picture

Note that this is more generic than what mongodb allows. A little more but still. But that's not a problem, I am just saying, we are not trying to sneak in a MongoDB query builder.

bjaspan’s picture

Agreed that we always knew f_a_query() was very weak and we were waiting for more experience to see how to do it better. It's great for Drupal that Examiner provided that experience. Conceptually, I approve of this approach.

If I do not specify any entity types or conditions, just fields, this code lets me query field data from a storage engine different than the storage engine of the entity the field is attached to, right? That is important. We do not want to require that fields on an entity be stored in the same storage engine as the entity if we are willing to not cross-query them.

It does not look like this patch yet supports 'deleted' queries properly, and thus deleted fields purge won't work. Presumably the tests will catch that until it works.

The entire Field API is procedural (because we decided this big a change plus designing the right OO interface was too much at once). Is making this one part of the Field API OO really a good idea? On the other hand, I do recognize that the OO interface is probably a cleaner way of specifying the args to the query. So I'm torn on this.

Nothing else in the Field API touches data outside the Field API internal SQL tables or the field storage engine. So this version of field_attach_query() is a new kind of animal. Should it be part of the "Field Entity API", thus field_entity_query() or fieldEntityQuery()?

+++ modules/field/field.attach.inc	2010-04-26 01:13:43 +0000
@@ -1044,108 +1044,224 @@ function field_attach_delete_revision($e
+ * Not all storage engines are required to support queries on all column, or

Typo: "columns"

+++ modules/field/field.attach.inc	2010-04-26 01:13:43 +0000
@@ -1044,108 +1044,224 @@ function field_attach_delete_revision($e
+    return $query->execute();

It's not immediately obvious why execute() and entityQuery() can't cause a recursive loop.

+++ modules/field/modules/field_sql_storage/field_sql_storage.module	2010-04-26 01:14:45 +0000
@@ -472,129 +472,85 @@ function field_sql_storage_field_storage
+    $query->condition("$table_alias.deleted", !$deleted);

The current semantics are that f_a_query() restricts to non-deleted data unless a condition on 'deleted' is specified, in which case we respect the condition. It looks to me like the patch always restricts to non-deleted, period (b/c there is no way to specify otherwise).

90 critical left. Go review some!

chx’s picture

  1. Yes, I forgot to add a deleted method, but the code is there. As said the OOP wrapper has issues.
  2. The entity API on the other hand is quite OOP already. What about EntityFieldQuery?
  3. Also "starts with" op is not implemented
  4. Re. recursive loop: because $query is a SelectQueryInterface object, courtesy of DBTNG, I will add a comment.
  5. Edit: the code, in itself does not implement cross engine querying but does not stop you from doing it, either as your storage query callback will get fieldConditions and entityConditions and then let you cook whatever you can cook with them. Good luck writing a cross engine query, however. Also, if your driver is an entity storage aside from being a field storage too, then it can peruse the pre hook to implement the case when there are no fieldConditions only entityConditions.

Thanks for approving.

bjaspan’s picture

re #15: On item 5, my question was not whether the code supports cross-engine queries. My question was whether there is any assumption that if entities are stored in SQL, fields in mongodb, and I query the field without any entity conditions, will it work? I think it will, because the field storage module that builds up its query will not try to connect to any entity storage since it isn't given any entity types to connect to, I just want to make sure that is part of the official specification. Otherwise, we lose the ability to support "remote fields."

re: EntityFieldQuery. I actually like that more for a few reasons:

1. Field API is currently only concerned with fields. Entities are already concerned with both themselves and with the Field Attach APIs. So the Entity API is higher-level than Field API, and it makes sense for entity-and-field query logic to belong to the Entity API.

2. It eliminates the dichotomy with Field API being procedural except for the query function.

3. By positioning the Entity Field Query hook as outside the Field API, it means a conforming Field storage engine does not have to implement. This is good, as it is not obvious that all field storage engines can implement it, even just for queries across multiple fields stored in that engine. For example, a read-only remote field storage engine against Amazon skus or Flickr photos or whatever that exposes fields like "book" or "photo" and "user" may not support a lot of cross-field querying. Field SQL Storage will choose to implement it because as core's default storage engine it wants to support both Entities and Fields, but other storage engines won't have to.

chx’s picture

I will make the documentation cleaner -- there is already a warning the field storage engine can throw an exception if it gets a combination it can't support. That combination can be just from field conditions or from field and entity conditions both. We do not restrict or demand anything from the storage engine. The fieldAttachQuery (soon entityFieldQuery) methods pretty much just collect information and then hand over to the relevant storage engine. Also, I provided a utiltity function which handles the degenerate no-field-only-entity conditions but as mentioned above , even this can be overtaken in the pre query callback.

chx’s picture

FileSize
49.48 KB

Just for fun here is the latest patch which adds a deleted method, "TRUE to only return deleted data, FALSE to return non-deleted data, NULL to return everything. Defaults to FALSE." -- do we want FALSE or NULL for a default? and makes the tests pass. Still needs renaming. I have changed the comments slightly on the class to more emphasize that who-knows-what will succeed:

Storage engines are not required to support all kinds of queries. A FieldQueryException will be raised if an unsupported condition is specified or if the query has field conditions / sorts which are stored in different field storage engines. It entirely depends on the field storage engine what sort of queries it supports and what kind it does not. For example, if a field is stored in a separate datastore than the entities they belong to then it's very likely that passing in both a fieldCondition and an entityCondition will raise this exception.
bleen’s picture

Status: Active » Needs review

here testbot ... here boy

Status: Needs review » Needs work

The last submitted patch, field_attach_query.patch, failed testing.

chx’s picture

Status: Needs work » Needs review
FileSize
71 KB

The patch might look huge but a real lot of it is just documentation. There is a lot of that.

chx’s picture

FileSize
68.67 KB

Bah! Another reason is a patch mixup, not that helps a lot....

Status: Needs review » Needs work

The last submitted patch, entity_field_query.patch, failed testing.

chx’s picture

Status: Needs work » Needs review
FileSize
67.67 KB

Keeping up with HEAD

Status: Needs review » Needs work

The last submitted patch, entity_field_query.patch, failed testing.

aaron’s picture

looks useful; a step towards a light views in core. +1 and subscribe.

yched’s picture

Slowly getting out of a "days fully packed with off-drupal stuff" period, I didn't notice this thread so far.
I like the direction too.

+++ includes/entity.inc	2010-05-03 09:11:30 +0000
@@ -298,3 +298,339 @@ class DrupalDefaultEntityController impl
+ * engine and hands over the query to hook_field_storage_query. If there are
+ * neither field conditions nor field orders then
+ * EntityFieldQuery::entityQuery() queries the SQL entity tables.

The entity "base table" is not necessarily SQL ?

+++ includes/entity.inc	2010-05-03 09:11:30 +0000
@@ -298,3 +298,339 @@ class DrupalDefaultEntityController impl
+   * @param $field_name
+   *   Name of the field.

f_a_query() used $field_id. That was pretty critical for the 'field purge' code : there can be several fields with the same name (one real, non-deleted field + one or more deleted-not-yet-purged fields)

+++ includes/entity.inc	2010-05-03 09:11:30 +0000
@@ -298,3 +298,339 @@ class DrupalDefaultEntityController impl
+   * @param $column
+   *   A column defined in the hook_field_schema() of this field. entity_id and
+   *   bundle can also be used.

Strictly speaking, 'entity_id' and 'bundle' are then more 'entity conditions' than 'field conditions', aren't they ? It just so happens that it's the Field storages who know those properties by a generic, entity-agnostic name (entity_id vs nid, uid... / bundle vs type, vocab_machine_name...)

With the current state of the proposed API, you can either
- use entityCondition(), but you then need to specify the relevant column name for the entity-type ('nid', 'uid')
- use fieldCondition(), and you can then use a generic property name ('entity_id')

OK I guess, but a bit confusing...

+++ includes/entity.inc	2010-05-03 09:11:30 +0000
@@ -298,3 +298,339 @@ class DrupalDefaultEntityController impl
+   *   - '=', '!=', '>', '>=', '<', '<=', 'STARTS_WITH': these operators expect

removes 'ENDS_WITH' (fair enough) and 'CONTAINS' (more problematic ?). Not all engines are supposed to support all operators anyway...

+++ includes/entity.inc	2010-05-03 09:11:30 +0000
@@ -298,3 +298,339 @@ class DrupalDefaultEntityController impl
+  public function entityOrderBy($column, $direction) {

Shouldn't this one take an $entity_type param, just like entityCondition() ?

BTW, entityTypes() method / $entity_type param confuse me, since entityQuery() only supports querying against exactly 1 entity type.

+++ includes/entity.inc	2010-05-03 09:11:30 +0000
@@ -298,3 +298,339 @@ class DrupalDefaultEntityController impl
+    $query->addExpression(':entity_type', 'entity_type', array(':entity_type' => $entity_type));

addExpression() ? why not condition() ?

+++ modules/field/field.api.php	2010-05-03 23:49:41 +0000
@@ -1620,6 +1625,8 @@ function hook_field_storage_pre_update($
+<<<<<<< TREE
+=======

:-)

+++ modules/field/field.attach.inc	2010-05-03 05:45:19 +0000
@@ -1042,132 +1033,6 @@ function field_attach_delete_revision($e
-function field_attach_query_revisions($field_id, $conditions, $options = array()) {

The proposed API doesn't currently support this notion of 'I want to query against current revisions only' or 'I want to query against all revisions'

+++ modules/field/field.crud.inc	2010-05-03 08:10:30 +0000
@@ -993,21 +993,17 @@ function field_purge_batch($batch_size) 
-          // field_attach_query() may return more results than we asked for.
-          // Stop when he have done our batch size.
-          if ($batch_size-- <= 0) {
-            return;
-          }
-

Not sure if that's still needed or not. IIRC, the fact that f_a_query() could return more results that the requested $range was kind of tightly related to the fact that fields have deltas spanning over several rows.

+++ modules/field/modules/field_sql_storage/field_sql_storage.module	2010-05-03 23:44:17 +0000
@@ -472,129 +472,122 @@ function field_sql_storage_field_storage
+          $query->where("$table_alias.$column = " . $groups[$column][$group_name] . ".$column");

where() ? why not condition() ?

+++ modules/field/tests/field_test.field.inc	2010-05-03 09:24:02 +0000
@@ -26,6 +26,14 @@ function field_test_field_info() {
+      'description' => t('Another dummy field.'),

'another dummy field *type*' :-)

+++ modules/field/tests/field_test.field.inc	2010-05-03 09:24:02 +0000
@@ -33,18 +41,36 @@ function field_test_field_info() {
+  else {
+    return array(
+      'columns' => array(
+        'shape' => array(

Nitpick, I'd rather see an explicit switch ($field['type']) {

+++ modules/field/tests/field_test.install	2010-05-03 09:14:41 +0000
@@ -50,6 +50,37 @@ function field_test_schema() {
+  $schema['test_entity_1'] = array(
+    'description' => 'The base table for test_entities.',

That's the base table for 'test_entity_1' entity type. The description for 'test_entity_2' table needs fixing too.

Those additional test entity types were a bit hastily crammed onto the existing test entity types : having
test_entity, test_cacheable_entity, test_entity_1, test_entity_2, is not that great.

+++ modules/simpletest/tests/entity_query.test	2010-05-03 09:54:39 +0000
@@ -0,0 +1,304 @@
+   * Test field_attach_query().

:-)

General nitpicks :
- 3rd person for the PHPdoc one-liners
- I think we're now supposed to add an empty line between the last @param and the @return sections
87 critical left. Go review some!

yched’s picture

Additionally :

- the code duplication between entityQuery() and the part of field_sql_storage_field_storage_query() that handles $entity_conditions is a bit unfortunate.

- f_a_query() did not perform a full load() on the selected entities, but only returned a stub object with base properties (id, bundle, revision id if applicable). Most of the times, all you'll want is a list of ids. Do we really need to force a full load ?

chx’s picture

Status: Needs work » Needs review
FileSize
70.11 KB

I think there are still a few failures left but I think quite some of these are coming from the tests themselves being... weird.

chx’s picture

I crossposted with yched, sorry, have not seen his review! Will work on that too. Quick remark: yes the entity_id and the bundle are a bit weird will elaborate on those.

Edit: the list of ids is totally useless. What are you going to do with them if not load them? If you dont load you lose the opportunity to entitycache them foir example.

Crell’s picture

I may be passing the IDs off to some other routine that is going to do something with them. That routine doesn't need full objects yet, just the IDs. Perhaps we're putting those IDs into a queue for later processing. We don't want to load the entities now, we want to load them when we process that item in the queue.

yched’s picture

The existing core use cases for f_a_query() are examples of 'full objects not needed' (necessarily, since f_a_query() currently only returns ids)

catch’s picture

I also think we should return IDs rather than full objects, queue is the obvious use case but I'm sure there's others. It's also easy to create a wrapper which loads full objects if we want that too. Will try to give this a proper look asap.

chx’s picture

FileSize
58.78 KB

Sure thing. Edit: this does not yet contain yched's findings.

chx’s picture

FileSize
57.1 KB

Erm. I left in some debug code. I think we have about a dozen fails left and then addressing yched's concerns. Now we return the same stub entities as field_attach_query did.

Status: Needs review » Needs work

The last submitted patch, entity_field_query.patch, failed testing.

chx’s picture

Status: Needs work » Needs review
FileSize
58 KB

STILL need to address yched's concerns but this should, you know, pass. The rest should be easy.

Dries’s picture

I begun to look at this, but will need to take another pass it before I fully grok it.

It looks like this patch has the support from many people I trust -- I'm left wondering if they recommend me to make a code freeze exception for this.

Crell’s picture

I haven't looked at the code yet, so I cannot comment on that specifically. However, I do support the concept and assuming the code is up to snuff or can be brought up to snuff I would support allowing this in. It should simplify a lot of ugliness, and we know it's based on real-world use cases. API cleanup and simplification is still on the table, is it not?

chx’s picture

FileSize
72.72 KB

After some discussion with Crell, we have arrived to this version which sports entity, field and property conditions. entity conditions are on entity_id, revision_id, entity_type and bundle. property conditions are on the entity properties, like node.created. field conditions are unchanged. This addresses yched's concerns and unless I missed something this should be ready.

webchick’s picture

Priority: Normal » Critical

API cleanup and simplification is still on the table, is it not?

No!!!! And I really don't know how we can make this any more flamingly clear! :( We had 18 months of code thaw, followed by a good 6 months of "polish" explicitly for API clean-ups and simplifications. We're done, apart from critical fixes. Our goal at this point is simply to get Drupal 7 out the door (that's right! you've forced me to resort to blink tags! :P). The fact that I'm reading questions like this in May 2010 from one of the top 25 contributors to Drupal 7 makes my stomach absolutely churn... ugh. If you have ideas on how we can help make the fact that polish phase is over more blindingly obvious, please let us know.

However, the general "vibe" here from all the various Field API maintainers seems to be that f_a_query() is so resoundingly useless that being stuck with it for 3-4 years is a worse fate than breaking the code freeze rule once again for this patch. So let's increase priority accordingly, then.

catch’s picture

Priority: Critical » Normal

Going to review this today but I'll argue for the code freeze exception first...

I think this is worth a code freeze exception. field_attach_query() has a lot of well documented limitations, which have been found to be more limiting than they need to be, to the point where it's not actually used for anything apart from deletion and other fairly internal stuff. We also have some critical bugs, #353458: hook_file_references() was not designed for a highly flexible field storage and #556022: When a text format is deleted, some modules (like text.module) don't update their data, even though we say they do, for example which would benefit from a better query API for field values (whether it's the right fix for them is a different question, but it's definitely a factor). Additionally field_attach_query() is used in only a handful of places in core, and while I've not grepped, I very much doubt it has a lot of usage in contrib either.

More long term, this opens the way for field storage agnostic modules - which is currently impossible to do apart from simple loading and saving of field values. It allows for both storage agnostic queries using the API directly, and potentially storage agnostic views as well. Currently you need to choose between an SQL backend or a mongodb backend before worrying about nodes, users etc. If someone writes the entity query views backend, modules shipping with default views could provide ones which work with multiple field storage engines. That would really, really help it to get them more widely used during D7, and would knock one of the major parts off the list for D8 as well.

catch’s picture

Priority: Normal » Critical

cross-posted.

Status: Needs review » Needs work

The last submitted patch, entity_field_query.patch, failed testing.

chx’s picture

Status: Needs work » Needs review
FileSize
73.22 KB

I forgot to convert the queries in field.test to the new methods. Also, I forgot to do yched's request to use field ids instead of names. But that's done too and it again should pass tests. For the first time, you can just tell the system to query all values a field has -- it was necessary for the field tests and allowed me to write a really clean field_has_data implementation, the previous ones were hacky.

chx’s picture

FileSize
73.21 KB

Oh noes I posted the wrong one :( that one has a few failures in file due to me using field_id instead of 'id'. (That field structures have an 'id' instead of a 'field_id' is another DX #fail but that's totally D8 material)

chx’s picture

About the freeze exception, let's quote bjaspan:

Agreed that we always knew f_a_query() was very weak and we were waiting for more experience to see how to do it better. It's great for Drupal that Examiner provided that experience.

Also, I spent two complete weekends and a bit more than that to get this patch done a whole week before the beta deadline (Bartik says May 17) so it can get reviews and then get in comfortably before beta. I consider this crucial in order to see the pluggable field storage properly utilized.

catch’s picture

+++ includes/entity.inc	2010-05-10 02:47:10 +0000
@@ -298,3 +298,475 @@ class DrupalDefaultEntityController impl
+   *  If EntityFieldQuery::propertyCondition() or
+   *  EntityFieldQuery::propertyOrderBy() is also called for this query then
+   *  for $name 'entity_type' $value must be a string and the $operator is not
+   *  used. If neither is called, field storage engines might support field
+   *  conditions on more than one entity type and so $value can be either a
+   *  scalar or an array and $operator is used.

I had to read this a few times. It makes sense, but it's two very long sentences. I think we could do something like "If an order by or entity property is part of the overall query, then this can only be used to query one entity type, and operator is ignored. However some field storage engines may allow cross-entity queries on fields, in which case $name may be an array, and $operator is respected.

+++ includes/entity.inc	2010-05-10 02:47:10 +0000
@@ -298,3 +298,475 @@ class DrupalDefaultEntityController impl
+   * @return EntityFieldQuery
+   *   The called object.

Not sure what dbtng says here, but feels like it should be a "an object of class foo" or something.

+++ includes/entity.inc	2010-05-10 02:47:10 +0000
@@ -298,3 +298,475 @@ class DrupalDefaultEntityController impl
+   * @param $field_id
+   *   The id of the field.

Sorry but why? This completely breaks portable queries between different sites which may well not have the same field ID for the same field. I'd like to see field IDs backed out eventually, let's not bake them in further here. It should be possible to take a field name, get the field ID from that, then use the id internally if that's necessary.

+++ includes/entity.inc	2010-05-10 02:47:10 +0000
@@ -298,3 +298,475 @@ class DrupalDefaultEntityController impl
+   *   Possible values:
+   *   - '=', '!=', '>', '>=', '<', '<=', 'STARTS_WITH', 'CONTAIN': these

"CONTAINS" no?
What about IS NULL and IS NOT NULL?

+++ includes/entity.inc	2010-05-10 02:47:10 +0000
@@ -298,3 +298,475 @@ class DrupalDefaultEntityController impl
+   *   The value to test the field against. In most cases, this is a scalar. For more
+   *   complex options, it is an array. The meaning of each element in the array is

Exceeds 80 chars.

+++ includes/entity.inc	2010-05-10 02:47:10 +0000
@@ -298,3 +298,475 @@ class DrupalDefaultEntityController impl
+  public function propertyHelper(SelectQuery $select_query, $entity_base_table) {

Could we call this propertyProcess() or something?

+++ modules/field/modules/field_sql_storage/field_sql_storage.module	2010-05-10 03:03:51 +0000
@@ -482,127 +482,106 @@ function field_sql_storage_field_storage
+  // Figure out whether the entity is revisioned or not and JOIN based on this
+  // information.

Do we need to join on revision id given that field storage is split between current and revisioned?

+++ modules/simpletest/tests/entity_query.test	2010-05-10 03:03:47 +0000
@@ -0,0 +1,328 @@
+    $this->assertTrue($pass, t("Can't query the universe."));

;)

89 critical left. Go review some!

chx’s picture

FileSize
73.34 KB

Changed those two long sentences into two shorter sentences. That return is from DBTNG straight. Param now can be a field or a field name. CONTAINS yes, fixed that. Renamed to processProperty. Yes we need to join on revision_id because ... oopsie, we forgot to join in the revision table, did we? Fixed.

chx’s picture

FileSize
73.1 KB

More minor fixes and one major: we always join the base entity table, revision table be damned, the node_revisions table is so ... ouch we do not want to waste effort on supporting that.

catch’s picture

Looks great to me now, IMO it's RTBC but could do with an extra review or two.

Anonymous’s picture

Status: Needs review » Needs work

the comments at the top of the file definitely need work, nitpicks below.

+ * ..... It entirely depends on the field storage
+ * engine what sort of queries it supports and what kind it does not.

that sentence feels redundant and could go away with no loss in meaning. also, a mile-high description should probably come before diving into explanations about the exceptions that may be thrown by different storage implementations?

+ * The class methods itself just collection information passed to the methods,
+ * there is no logic before hook_entity_query() is fired in the execute method
+ * so this hook can change EntityFieldQuery totally. If this hook does not
+ * provide a result, then EntityFieldQuery::execute() finds the field storage
+ * engine and hands over the query to hook_field_storage_query. If there are
+ * neither field conditions nor field orders then
+ * EntityFieldQuery::entityQuery() queries the SQL entity tables should
+ * those exist.

hmm. not sure if i grok the code 100% yet, but after reading the tests, i *think* a possible first paragraph could read:

This class acts as a container for conditions which will be passed to storage implementations via its execute() method. Consuming code should create a query object, use methods such as fieldCondition() and fieldOrderBy() to add conditions to the entity query, then call execute() to retrieve the results.

@code
  // This code block could use some work....
  $query = new EntityFieldQuery();
  $query->fieldCondition($field, 'value', 2, '>');
  foreach ($query->execute() as $entity_type => $entities) {
    foreach ($entity_ids as $entity_id => $stub_entity) {
    }
  }
@endcode
+  function entityCondition($name, $value, $operator = NULL) {

missing a public|private|protected keyword?

+   * Note that this only affects field condiitions, properties are always
+   * using the base table.

maybe ", properties always use the base table." ?

+      elseif (array_diff($storage, $field['storage'])) {

should be "else if", i think, but we seem to have it both ways in core...

+  protected function propertyQuery() {

no docblock for this method.

+function _field_sql_storage_query_join_entity(SelectQuery $select_query, $entity_type, $field_base_table) {;

umm, stray semi-colon? also, missing a @return in the docblock.

Anonymous’s picture

overall, big +1 on this patch :-)

the high level things i wonder are:

- do we want to make the results drupal_alter()'able? i'm ok if that's for another patch?

foreach (module_implements('entity_query') as $module) {

- i'm probably just missing something obvious as to why we have to do it this way, but if i'm reading this correctly, then every module that implements hook_entity_query will need to interrogate the EntityFieldQuery object passed to it to figure out if its interested? that feels like a lot of boilerplate in a lot of places - is there any way at all we can split this into more hooks based on the data added to the EntityFieldQuery? also, is there any way we can look at the data in the object and figure out that we can skip straight to the fields query? i'm late to this game, so i might be way off, but i'd thought i'd throw it out there in case.

- we don't seem to have any OR support. say i wanted "all entites with [insert entity level condition here] OR [insert field level condition here]". feels like that would be useful, so i'm wondering if we want to add support for that (ok if that's another patch). and that seems to go against some of the comments about different storage backends for entities and the fields on the entities - but are we putting up an unnecessary API barrier here?

- could we put this block in a helper method and call it when the appropriate field conditions are called, so we die when the error happened, not later in execute():

+    foreach ($this->fields as $field) {
+      if (!isset($storage)) {
+        $storage = $field['storage'];
+      }
+      elseif (array_diff($storage, $field['storage'])) {
+        throw new EntityFieldQueryException(t("Can't handle more than one field storage engine"));
+      }
+    }

more nitpicks:

+ *   A EntityFieldQuery, the properties are helper methods are documented

should be "the properties and helper methods".

+   *  If an order by or a property condition is part of the overall query, then
+   *  this can only be used to query one entity type, and operator is ignored.
+   *  If neither is used then field storage engines might support field
+   *  conditions on more than one entity type and so $value can be either a
+   *  scalar or an array and $operator is used.

should this block live above the @param bits in entityCondition()'s docblock?

+   * Adds a condition.
+   * This is a helper for hook_entity_query() and hook_field_storage_query().

missing blank line.

chx’s picture

Status: Needs work » Needs review
FileSize
73.27 KB

Discussed with beejeebus and made a lot of comment changes to make them better. Thanks for the help. We do not support OR. We do not put logic in the collecting methods either as hook_entity_query should be able to make a decision -- someone might write an implementation that supports two different storage engines. Also there is no point in splitting this hook apart (not to mention the lack of ability) probably only storage engine modules (and storage engine helpers) will implement this. Anyways, here is the new version.

Anonymous’s picture

ok, i'm much happier with this patch now. will leave it to someone with more fields-API fu to RTBC it.

aspilicious’s picture

Status: Needs review » Needs work
+++ modules/field/field.api.php	2010-05-10 16:50:13 +0000
@@ -1432,24 +1432,22 @@ function hook_field_storage_delete_revis
+ *   in the class.
  * @return

needs a newline before @return

+++ modules/field/modules/field_sql_storage/field_sql_storage.module	2010-05-10 16:49:39 +0000
@@ -482,127 +482,100 @@ function field_sql_storage_field_storage
+ * Add the base entity table to a field query object.

Hmm if this function isn't a hook this needs to be "adds"

+++ modules/system/system.api.php	2010-05-10 16:50:07 +0000
@@ -268,6 +268,22 @@ function hook_entity_update($entity, $ty
+ *   in the class.
+ * @return
+ *   NULL if the implementation did not handle the query, an array of arrays

need newline before @return

Powered by Dreditor.

chx’s picture

Status: Needs work » Needs review
FileSize
73.28 KB

Sigh. Next time i will write less comments so I do not need to reroll because of two newlines and an "s".

aspilicious’s picture

If you read http://drupal.org/node/1354 , you don't need to reroll.
Awsome comments and tests but if the comments don't follow the standards someone else will open another issue to fix them, it's easier to fix them now.

I rerolled with some more fixes.
1) normal functions have to begin with a 3th person verb (yes every normal function)
2) @see function() can't end with a . (I just fixed all of those in core so *please* don't do this ever again)

yched’s picture

+++ includes/entity.inc	11 May 2010 08:33:40 -0000
@@ -290,3 +290,489 @@
+   * If an order by or a property condition is part of the overall query, then
+   * this can only be used to query one entity type, and the operator is
+   * ignored. If neither is used then field storage engines might support field
+   * conditions on more than one entity type and so value can be either a
+   * scalar or an array and the operator is used.

Confusing - this paragraph looks like it is generic while it in fact only applies to conditions on 'entity_type' ?

+++ includes/entity.inc	11 May 2010 08:33:40 -0000
@@ -290,3 +290,489 @@
+   *   A column defined in the hook_field_schema() of this field. If this is
+   *   ommitted then the field_id is used only to find the storage engine to
+   *   talk to and that will run a query based on entity and property
+   *   conditions.

Unclear to me - fieldCondition() whithout an actual condition on a column ? What are the use cases this addresses ?

+++ includes/entity.inc	11 May 2010 08:33:40 -0000
@@ -290,3 +290,489 @@
+  public function propertyCondition($column, $value, $operator = NULL) {

Having the 'generic' entity properties ('entity_type', 'bundle', 'entity_id') handled outside fieldCondition() is much nicer.
The separation of entityCondition() and entityProperty() sounds unfortunate, though.
Would it be possible to only keep one entityCondition() method, internally special-casing the finite, hardcoded list of 'generic' properties.

+ What about sorts ?
The patch has fieldOrderBy() and propertyOrderBy(). Can we sort by 'entity_id' ?
Same reasoning, could we have fieldOrderBy() and entityOrderBy() (deals with both 'generic' and specific entity properties) ?

88 critical left. Go review some!

Crell’s picture

The reason for having 3 separate methods is that there are three separate "levels" at which one can filter: Those things that all entities have, period; those things that all entities of a given type have (properties); and fields, which may or may not exist at the bundle level. Those have slightly different workflows and different implications for what other methods you can call.

For instance, filtering only to entities of type "user" and then the "published" property of 1 makes no logical sense at all, because the user entity has no such property. So the available properties and fields to filter on depend on entity filtering, which is therefore split off as a separate "thing on which to filter".

Ordering gets split off the same way for the same reason, and for parallelism.

I still need to take the time to dig through the code itself... :-(

bjaspan’s picture

re #41: Should this patch go into core at this time?

It seems to me that it does not *have* to. We could leave the existing field_attach_query() in place and have this new functionality renamed in the Entity Field Query contrib module (how many sites besides Examiner.com will even care?). Perhaps we could even change the PHPdocs for f_a_query() to say that the function is likely deprecated to be replaced in D8 by said contrib module.

Please note that I'm not necessarily advocating for or against this approach, I am just pointing out that AFAICT it would work. So far, I do not really have an opinion either way.

I will try to do a code review this week.

yched’s picture

re Crell #60 : ack for the rationale about the 3 separate "add condition" functions - entityCondition(), entityProperty(), fieldCondition().
Yet the naming draws a similarity between entityCondition() (the generic properties) and fieldCondition(), rather than between fieldCondition() and entityProperty() ("may or may not be there" properties).

Ordering gets split off the same way for the same reason, and for parallelism

patch only has fieldOrderBy() and propertyOrderBy()

chx’s picture

Patch will get propertyOrderBy soon -- and the three conditions are named propertyCondition, entityCondition and fieldCondition, there is no entityPropery whatsoever.

yched’s picture

"there is no entityProperty() whatsoever."

Er, doh. I guess I was tired when I posted this. Sorry bout that.

webchick’s picture

I'd love to hear reactions to bjaspan's suggestion at #61 to move this effort to a Entity Field Query contrib module, for possible inclusion in core for D8. At the rate this patch is changing, it fills me with absolute dread that we're either going to lock in something sub-optimal to Drupal 7, or have 20,000 follow-up patches when we're trying to bash down the criticals queue and get D7 out the door.

catch’s picture

I'm a bit wary doing that while we still have issues like #353458: hook_file_references() was not designed for a highly flexible field storage and #556022: When a text format is deleted, some modules (like text.module) don't update their data, even though we say they do knocking around and pretty much stalled. For example this allows for the equivalent of SELECT 1 FROM {table} LIMIT 0, 1; which I don't think field_attach_query() does, and that may end up being necessary for both of those issues depending on how they go (or it might not). On the other hand, if it actually ends up not blocking those issues whatsoever, then sure it could be a contrib module, but in this particular case I don't feel there's much risk to it being in core either.

Also we already have something baked in that's sub-optimal, so this can't make that situation worse. If someone comes up with something even better in contrib for D7 and it's committed to D8, that'd be fine too - committing this doesn't block people working on improvements from contrib.

catch@catch-laptop:~/www/7$ grep -rl field_attach_query *
modules/field/field.attach.inc
modules/field/modules/list/list.module
modules/field/field.crud.inc
modules/field/tests/field.test
modules/field/field.module
modules/field/field.api.php
modules/file/file.module

The reference in list.module is a @todo comment which hasn't been implemented yet (and is for $has_data - we don't want a count for that). The reference in file.module is in file_get_file_references() which is #353458: hook_file_references() was not designed for a highly flexible field storage anyway.

On "how many sites besides Examiner.com will even care?", well four months ago I'd probably be sitting here saying the same thing. However having used mongodb field storage on Examiner I can't imagine building a Drupal 7 site with > 5000 nodes without it. I have a Drupal 6 site which is entirely dependent on views and block caching not to completely fall over, and mongo field storage + this API finally points to a situation where listing performance could be acceptable without hundreds of hours of development overhead. That site is single server, run in my own time on almost zero budget, about the tenth the size of Drupal.org in terms of content. The advantage of having sites run on alpha is that they're able to find issues and contribute back fixes before a release, this is the single greatest contribution which could be made from that effort.

chx’s picture

FileSize
75.38 KB

Any non-trivial sized sites will likely need a different storage engine than the default. We knew this when we wrote field API. We knew, Barry wrote PBS to prove the API is flexible enough, that there is a way out. This adds entity sorting and implements entity conditions if there are no field conditions.

Entity Property Field Works One entity type only?
- - - - n/a
- - + + -
- + - + +
- + + + +
+ - - + +
+ - + + -
+ + - + +
+ + + + +
fago’s picture

Generally +1 for such an implementation, not sure whether it is right to include it that late or not. I leave that up to others and give some general, high-level feedback & suggestions to the code instead:

+ * Do not modify the class properties, they are internal and only public
+ * because the hook_entity_query() and hook_storage_field_query() hooks need
+ * them.

Hm, I'd much prefer having them be really private by making them protected. A hook for optionally altering the execution of the query looks awkward too me. Why not implement a simple factory?
Introduce a small info hook that specifies the class to use for the queries + a small function for creating the object -> entity_query(). That way a module can easily override the class for altering internal logic and overriding execute.
(And it is hook_field_storage_query()).

E.g. we could leverage hook_entity_info() for specifying the class + let the controller specify the default query class to use if no module specifies another one. Thus a mongodb entity controller could provide a mongodb implementation as default. For that add an interface + a base class implementing it + a sql implementation.

For altering I'd suggest to just make public methods the return references on the internal variables. Then it's still possible to alter, but it's clear that the variables are internal. db-tng does it that way too.

Apart from that having entityCondition() + propertyCondition() + fieldCondition() is also confusing to me. While it's nice to have a way to say give me bundle 'foo' regardless of the bundle key that is in use, does it work with using the concrete bundle key + propertyCondition() the same way? Is it a shortcut, or the only way to do it?
It really shouldn't make that a difference for the caller whether a property is the bundle key or not, so just adding a condition for it should still work.
I see the point of having separate fieldConditions(), but ideally that what be also hidden from the user creates a query - internally it should make use of the right condition though. Thus we'd have a single condition() method, powerful enough to serve any of those three, while it would internally split it up as necessary. If there is invalid combination, exception thrown - done.

chx’s picture

I am very very strongly against a factory function which makes the code flow incredible hard to follow. I hate we have all these variable-named classnames which makes it a nightmare to find anything in Drupal 7. No tools work because you have no idea what variable holds what class. Especially given how damned rare it is going to be override this class vs how often it will be used so how important it is to be able to able to debug. I feel the tradeoff we came up with here is way better than this factory madness. Adding reference-returning getters make the code a lot more bloated. The only reason we are using OOP here is to have nice places to put doxygen on. You can't leverage entity info 'cos you are querying field data which can be across entity types.

Of the trinity of the conditions, you should understand that there is no good single way of expressing them unified. First of all, field conditions have two coordinates (field name, column) vs the rest only have one (name). Now, entity conditions need to be separated because they have widely different behaviours: for fields they are auxilliary data but for properties they are merely derivatives, another name for some properties. field values must have entity_type and object keys saved together because again they can belong to diff entity types.

Edit: and it should be noted that I do not like the way DBTNG turned out and I have complained in many issues about it being extremely hard to debug. Don't claim DBTNG as an example to follow to me.

chx’s picture

Status: Needs review » Closed (won't fix)

Moral of the story: leave an issue open long enough and someone will come and hopelessly derail it.

catch’s picture

Status: Closed (won't fix) » Needs review

The measure of this patch should be two things:

1. Is it more useful than field_attach_query()? (yes).
2. Does it result in less lock in for D7 field_attach_query() does? (also yes).

I would hate to see something which deals with those two issues, which are not small ones, bogged down in OOP implementation details, if that happens, then it'll just get punted to D8, and every field storage engine, whether core, mongo, per-bundle or whatever else shows up during the D7 life cycle will have to re-implement this stuff from scratch, either custom storage-engine specific queries or their own unique views backends - making it impossible to port code between them without manual intervention.

This is the equivalent to pluggable cache backends - we have had an imperfect architecture for those in core for 3-4 releases, it works great in terms of allowing people to change their caching backend without having to refactor their code base. Field storage does that with the single exception of this API.

fago’s picture

>Moral of the story: leave an issue open long enough and someone will come and hopelessly derail it.
!?
I just wanted to help shaping the architecture by providing my view of how the API should work. If you disagree or feel it it's out of scope, what hinders us from discussing it - as you have already started? I surely don't wanted to derail this issue, on the contrary I'd really like to see something like this in d7.

ad #71: Agreed, sounds reasonable!

@factory making it hard to trace code:
That's a problem of any pluggable system. $function() is not easier to trace either. The question is, do we want a pluggable system at this place?
Well, the point is the EntityFieldQuery class is tied to the DB, but entities could live somewhere else. Thus using hook_entity_info() for specifying the query implementation - or allowing the controller to define it, fixes that.

catch’s picture

The problem is that you can have a field-only query, which queries mongo, and doesn't care that you have a tags field attached to the 'node' entity, and also attached to the 'mongodb_comment' entity or whatever - that's a perfectly valid query which this API supports - i.e. it will return something for each entity type which the field is attached to. So making this pluggable on the entity level wouldn't account for non-parallel mixed storage (fields in mongo, some entities in SQL, some in mongo) - which is unfortunately not that unlikely.

Noticed a typo in the patch:

+  public function endityOrderBy($name, $direction) {
chx’s picture

FileSize
76.59 KB

typo fixed. more tests coming the coming days to catch such typos but it's 1:20am. And , those tests can be a follow up issue.

What hinders us from discussing? I can give you a long list of issues where the OOP theorists and purists have won over practical and lean, that's what. Once these have raised, it's very hard to keep code from bloating over in the name of theory and clean. Edit: and writing getters that return references is just that: bloat because they violate encapsulation the very same as public properties...

fago’s picture

>The problem is that you can have a field-only query, which queries mongo, and doesn't care that you have a tags field attached to the 'node' entity, and also attached to the 'mongodb_comment' entity or whatever - that's a perfectly valid query which this API supports - i.e. it will return something for each entity type which the field is attached to.

I see that a field-only query across different entities though doesn't play well with that - which controller should be use for? For that we probably have to separate the field querying part from the entity querying part, such that the field querying part is usable on its own too. That's probably already too much for d7 now too, thus "out of scope". However when only db-stored entities are supported, we need at least make that clear.

fago’s picture

ad #74:
There is surely something true in that. I didn't want to kick of OOP discussions - the comment just read awkward to me and I thought about ways of doing it better - sry for that.

- hook_storage_field_query()
+ hook_field_storage_query()

chx’s picture

FileSize
76.73 KB

Addressed the comments in #75 and #76. And I am glad we are over this OOP....

aspilicious’s picture

What happend with those verbs I corrected? :(

dawehner’s picture

FileSize
63.43 KB

@@ -2783,6 +2783,22 @@ function user_role_change_permissions($r
  */
 function user_role_grant_permissions($rid, array $permissions = array()) {
   $modules = user_permission_get_modules();
+  // Disabled modules update too and a permission => module mapping is needed
+  // for those too.
+  if (defined('MAINTENANCE_MODE') && MAINTENANCE_MODE == 'update') {
+    $disabled_modules = db_query("SELECT name, filename, info FROM {system} WHERE type = 'module' AND status = 0 AND schema_version > :schema ORDER BY name", array(':schema' => SCHEMA_UNINSTALLED));
+    foreach ($disabled_modules as $module) {

This comes from another patch. Here is a updated version.
Powered by Dreditor.

chx’s picture

FileSize
76.24 KB

aspilicious sorry, i have added your changes. I skipped dereine's patch (edit: it's missing the new test file) but have added lots of comments and a distinct on multiple values to field storage query based on DamZ's feedback.

Crell’s picture

Status: Needs review » Needs work

Comments on the latest patch, in the order I notice them while reading the file top to bottom...

I am not at all in favor of these myriad public properties. We already have an established solution for exposing them to alter hooks, and it's bad DX for some systems to have reference getters (which by nature read as "you're doing something dangerous, be careful or it's going to asplode") and some to have just bare properties with a comment that most people will never actually see that says "please don't touch these, k?" The functions themselves are small and do not add any appreciable runtime overhead. (Lines of code is a very poor way of measuring "bloat"; it is a red herring.)

The object properties should declare their type with a @var doc statement, eg @var array.

In the docblock for fieldCondition(): "An arbitrary identifier, conditions in the same group must have the same language." That's a comma splice. That comma should be a period, or perhaps a semi-colon. That problem appears twice in that docblock.

"Filters on the data being deleted." - Should probably read "Filters the result set on whether or not an entity has been marked for deletion."

Should the count() method just modify in place, or return a cloned object that is a count query for parallelism with the way SelectQuery::countQuery() works? (It may make sense the way it is, but as it's different than its closest similar system it's worth asking.)

public function processRangeCount($select_query) clearly requires a SelectQuery as its parameter, so it should be type hinted as well. Actually to be really correct it requires SelectQueryInterface, not SelectQuery, so the docblock should be adjusted as well.

public function addCondition(SelectQuery $select_query, $sql_field, $condition) should type hint SelectQueryInterface, not SelectQuery. The same is true of a few other methods, it looks like.

Don't these two methods only work if the entity is stored locally in the DB? What do we lose if the entity in question is non-local?

Why isn't hook_entity_query() formally an alter hook? It's being used the same way as hook_query_alter(), and is documented as being an alter hook. It should be hook_entity_query_alter(). That removes a few lines from the execute() method as well, as it can just call drupal_alter(). If there's some reason it can't, that needs to be documented inline.

In propertyQuery(), there's this code block:

+    if (!empty($entity_info['entity keys']['revision'])) {
+      $id_map['revision_id'] = $entity_info['entity keys']['revisionq'];
+    }

I suspect that 'q' is a typo. :-)

+ throw new EntityFieldQueryException(t('Do know how to execute a condition on @key for @entity_type', array('@key' => $key, '@entity_type' => $entity_type)));

That sentence does not parse grammatically. Do you mean something like "Unable to execute a condition on..."?

I still think we should be returning ids, not stub entities. If all I want is IDs (say for tossing into a queue for later processing), why go to the trouble of loading up all the stubs? I'm going to have to them iterate them and get just the IDs out, and throw the rest away. If I do need real entity objects, one call to entity_load_multiple() should take care of it in a cache-friendly way.

Even if we don't have a full factory with swappable classes, what about a utility wrapper for creating a new field query object? That way it's chainable, which a constructor is not (because PHP is annoying like that).

function field_sql_storage_field_storage_query(EntityFieldQuery $query) ...

If you're not going to have an interface for EntityFieldQuery, don't type hint it. Type hinting a class directly is a coding standards violation.

In field_sql_storage_field_storage_query(), when iterating fieldConditions the code uses the indirect $tablename_function(). When iterating orderBy, it calls _field_sql_storage_tablename() directly. Shouldn't the latter also be using $tablename_function()? (Same for column name.)

The new test class puts way too much into a single test method. Those should all be separate test methods. It's bad unit testing form to cram all of that into a single method.

function EntityFieldQueryHelper($query, $intended_results, $message, $ordered = FALSE) (in the test class) is misnamed. It should begin with a lowercase letter.

On the whole, I dig this patch. The new code looks a lot easier to follow than the old. Most of the comments above are relatively minor and easily fixed.

The only architectural question I have is hook_entity_query(). For lack of a better way to describe it at 1:30 am, I don't get it. It looks like it's an alter hook and a magic callback rolled into one, which doesn't make sense to me at all. If the intent is that each storage backend needs its own execute() implementation, OK I get that. Normally that would be a case for subclassing and a factory, but in this case we don't know the storage engine we need at creation time so we can't do that anyway. The next solution would be to use the object just as a collector and pass it off to an execution object that will run it. Since storage backends aren't objects but a pseudo-namespace based on the module that defines it, a magic callback is the currently most common way to handle it. It's a horribly ugly approach, but until storage engines improve it's the best we've got. :-)

That said, it's still wrong to merge the alter hook and execution magic callback. If I want to alter a query but let someone else execute it, then I have to mess with module weights. That's a hack. Also, I don't understand why there's a default implementation in the execute() method if we expect any storage engine to implement its own. If we have a default of some kind, shouldn't that live elsewhere as a default callback?

I would refactor the execute() method to be something like:

public function execute() {
  drupal_alter('entity_query', $this);
  $engine = $this->figureOutTheEngineSomehow();
  $function = $engine . '_entity_query_execute';
  return $function($this);
}

And then $this->figureOutTheEngineSomehow() can return "default" for whatever case would have resulted in the current patch continuing in execute(). Then there's a default_entity_query_execute() (or whatever) that handles whatever the default case is. That way it's a consistent pattern regardless of the code path.

Then then gives us hook_entity_query_alter() (a real alter hook) and a magic callback called $module_entity_query_execute(), which aren't trying to pretend to be the same thing.

Does that make any sense? :-)

sun’s picture

Status: Needs work » Needs review

Great work, awesome reviews so far. Definitely required for D7.

yched’s picture

Crell #81 :

I still think we should be returning ids, not stub entities. If all I want is IDs (say for tossing into a queue for later processing), why go to the trouble of loading up all the stubs?

The interest of returning a stub object is that you get both ids and revisions ids (when querying over past revisions).
The current field_attach_query() function does :

 * @return 
 *   An array keyed by entity type (e.g. 'node', 'user'...), then by entity id
 *   or revision id (depending of the value of the $age parameter). The values
 *   are pseudo-entities with the bundle, id, and revision id fields filled in

so you get relevant ids as keys of the result array.
AFAICT, the current patch keeps that return format

catch’s picture

If the array is keyed by entity ID, nothing stops us from doing node_load_multiple(array_keys($result['node'])); It's not the most convenient, but doesn't require a loop.

Crell’s picture

Status: Needs review » Needs work

I spoke with chx in IRC last night, and I was conflating hook_entity_query and hook_field_storage_query, which is what was confusing me. We went back and forth a bit, and came up with the following:

Instead of hook_entity_query, we have hook_entity_query_alter() which works the same as hook_query_alter() essentially. That gets called with drupal_alter($this); One of the things an alter hook can do if it feels like it is call $query->setExecutor() (or something like that) with the name of a function. After the alter hook, if $this->alternateExecFunction (or whatever) is set, we call that function and return its result rather than running the rest of the execute() method.

That cleanly separates the alter hook logic from the "clobber the entire execution process and do it yourself" logic, which were previously being mixed.

Crell’s picture

chx and I chewed over the public property question at length in IRC (and it got rather heavily digested, let me tell you...) End conclusion: We'll stick with public properties here, but document why this is an exception to the OOP standards in place, because this is a valid exception to them. To wit, the following paragraph should be added to the class's docblock:

"Normally we would not want to have public properties on the object as that allows the object's state to become inconsistent too easily. However, this class's standard use case involves primarily code that does need to have direct access to the collected properties in order to handle alternate execution routines. We therefore use public properties for simplicity. Note that code that is simply creating and running a Field query should still use the appropriate methods add conditions to the query."

The rest of my comments in #81 and #85 still stand but should be easy enough to implement.

chx, let's finish this puppy off. :-)

chx’s picture

Status: Needs work » Needs review
FileSize
77.64 KB

Crell, forward, forward, back, back, block.

aspilicious’s picture

+ * @param $select_query
+ * A SelectQuery which has entity_tpe, entity_id, revision_id and bundle
+ * fields added.
+ * @return
+ * @see EntityFieldQQuery::execute().

Newlines needed + indentation errors

+ * Alter or an EntityFieldQuery.
+ *
+ * @param EntityFieldQuery $query
+ * A EntityFieldQuery, the properties and helper methods are documented
+ * in the class. One of the most important properties to be changed is
+ * EntityFieldQuery::executeCallback. If this is set to an existing function,
+ * this function will get the query as its single argument and its result
+ * will be the result of EntityFieldQuery::execute(). This can be used to
+ * change the behaviour of EntityFieldQuery entirely. For example, the
+ * default impleemntation can only deal with one field storage engine, it's

=> Alter or and EtityFieldQuery sounds weird (but I can be wrong)
=> impleemntation is wrong (I'm sure about that ;) )

chx’s picture

I am sure there will be other concerns ;) and then I will roll these in. That wanted to be "Alter or execute an".

bcn’s picture

FileSize
79.32 KB

Reroll to fix items in #88 (i think), a few occurrences of 'a entity...' which should be 'an entity', and a few other typo's.

Status: Needs review » Needs work

The last submitted patch, fields_listing_api.patch, failed testing.

bcn’s picture

Status: Needs work » Needs review

#90: fields_listing_api.patch queued for re-testing.

Status: Needs review » Needs work

The last submitted patch, fields_listing_api.patch, failed testing.

bcn’s picture

Status: Needs work » Needs review
FileSize
77.65 KB

Re-rolled the changes (by hand this time) into chx's from #87. Not sure why the previous one wouldn't apply here...

Status: Needs review » Needs work

The last submitted patch, field_listing_api.patch, failed testing.

dhthwy’s picture

Status: Needs work » Needs review
FileSize
86.77 KB

Think this patch takes care of #88.

Also included in this patch:

More tests.
Fixed a bug caught by one of the new tests when doing a condition on an entity_id.

I hope this works, this is my first patch with git.

bcn’s picture

Status: Needs review » Needs work
+++ includes/entity.inc
@@ -290,3 +290,616 @@ class DrupalDefaultEntityController implements DrupalEntityControllerInterface {
+ * conditions, and will raise a EntityFieldQueryException when an usupported

An EntityFieldQueryException

+++ includes/entity.inc
@@ -290,3 +290,616 @@ class DrupalDefaultEntityController implements DrupalEntityControllerInterface {
+   *   @see EntityFieldQQuery::execute().

EntityFieldQuery::execute()

+++ modules/field/field.api.php
@@ -1440,24 +1440,21 @@ function hook_field_storage_delete_revision($entity_type, $entity, $fields) {
+ * Execute a EntityFieldQuery.

An EntityFieldQuery

+++ modules/field/field.api.php
@@ -1440,24 +1440,21 @@ function hook_field_storage_delete_revision($entity_type, $entity, $fields) {
+ *   A EntityFieldQuery, the properties and helper methods are documented

An EntityFieldQuery

78 critical left. Go review some!

chx’s picture

Status: Needs work » Needs review
FileSize
86.38 KB

Fixed typos, more than #97 asked for. Huge props to dhthwy who have doubled the number of tests we have from 20 to 39! And those are in one method because it's way way faster than separate methods.

chx’s picture

Note that the initial tests were written by dmitrig01 so he needs to get credit too :)

dhthwy’s picture

FileSize
99.15 KB

66 tests now.

Another bug fix by chx found by one of the new tests in this patch.

jhodgdon’s picture

Status: Needs review » Needs work

The docblocks in this patch need a small amount of editing work to comply with our style guidelines:
http://drupal.org/node/1354
http://drupal.org/node/338208

- Classes/functions - doc first line should start with a verb in 2nd person, like "Retrieves" rather than "Retrieve".
- The proper punctuation for "e.g." is "e.g., " and it should not have "for" before it (e.g. means "for example").
- In lists with commas, put a comma before the final end. E.g., "Shoes, socks, and shirts" not "Shoes, socks and shirts".

That was all I found in the first 20 lines of the patch... I don't have time to edit it all for style right now, but it needs to be done before the patch is good to go. Thanks.

aspilicious’s picture

I'll try to look at it once more....

+   * @return
+   *   @see EntityFieldQuery::execute().
+   */

Stil isn't fixed (newline and indentation and NO point at the end)

+  /**
+   * Fetch the results of an EntityFieldQuery and compare.
+   *
+   * @param $query

Fetches

+  /**
+   * Test field_attach_query().
+   */

Tests

chx’s picture

Status: Needs work » Needs review

That's not a review I am acting on. I am going to fix if a relevant review appears, but there is no need to set this to CNW for now. (This more and more looks like a penalty for writing too much docs)

jhodgdon’s picture

I am cleaning up the doc now (and hopefully reviewing the patch as well). Reporting back (with cleaned doc patch) in an hour or so...

matt2000’s picture

(o) (o)

jhodgdon’s picture

FileSize
100.69 KB

OK. Here's a patch with cleaned up doc.

I also removed an apparently debugging line of code from field.attach.inc:

if (!$entity_type) file_put_contents('/tmp/log', var_export(debug_backtrace(), TRUE));

Also, one question: At this stage of the game you are proposing to get rid of field_attach_query()? I realize it was limited, but anyone who has been doing D7 porting of field-related modules has probably been forced to use it... getting rid of it seems rather disruptive?

I added a function body to the query alter hook. The hook_field_storage_query still needs one.

jhodgdon’s picture

Issue tags: +Needs documentation

I just scrolled up and read the arguments about why field_attach_query is useless (sorry, hadn't read the backscroll before).

I agree. But at the same time, as one of probably a very few people who actually *used* the stupid function, it is just going to be one more way I'll have to chase head when it goes away and my D7 contrib module port will break *again*. For being in code freeze, this has happened surprisingly often... Sigh.

Someone needs to write this up for the module update guide, for sure!

pwolanin’s picture

FileSize
66.38 KB

Reorganization of the execute() method to make it more clear how we end up with SQL storage as a default.

pwolanin’s picture

FileSize
99.67 KB

oops that missed the new test file

chx’s picture

If we need to mess with the approx two people who [did not run screaming from field_attach_query and used direct queries instead] vs. mess with everyone who will build a D7 site in the future I know whom would I mess with. (I will buy both of you a good $beverage at the next 'Con, OK?)

aspilicious’s picture

Minor doc issue fixed

jhodgdon’s picture

RE #110: OK, as one of those two people. Beverage optional. Better Drupal API++.

My plan for later today is to try out this new patch with my contrib module that is/was using field_attach_query() and see how it goes, as a way of reviewing the patch. If all goes well, I'll (a) be able to keep up with D7 HEAD and (b) mark this RTBC. But I have a few things to take care of first (clients, etc.). So give me a few hours...

jhodgdon’s picture

So... I'm trying to use this new EntityFieldQuery to do what I was previously doing with field_read_instances()/field_attach_query(), and I am a bit unsure...

This particular contrib module allows you to search the contents of files attached to nodes, limited to fields and content types the admin has chosen. So what I need to do is query to find all the files attached to those particular content types using those particular fields.

What I am doing now is (pre-this-patch):
- Use field_read_instances() to find a list of all instances (i.e., field I'm interested in, attached to node type I'm insterested in).
- On each instance, use field_attach_query() to find a list of all nodes that have a file attached via that instance.

What I was hoping was that I could use EntityFieldQuery to do that in one step. Is that possible, or can I just use it to replace the field_attach_query() step?

And even so... once I have the field instance, I am not sure how to use an EntityFieldQuery to find the file attachments.

Current code, just for reference (with some error checking and things removed for simplicity):

$instances = field_read_instances(array(
    'object_type' => $objtypes,
    'bundle' => $bundles,
    'field_name' => $fieldtypes,
));

foreach ($instances as $instance) {
  $field_id = $instance['field_id'];
  $field_name = $instance['field_name'];
  $bundle = $instance['bundle'];
  $obj_type = $instance['object_type'];

  $specifics = field_attach_query($field_id, array(
      array('type', $obj_type),
      array('bundle', $bundle),
    ),
    array(
      'limit' => FIELD_QUERY_NO_LIMIT,
    ));

  $specifics = $specifics[$obj_type];
  $ids = array_keys($specifics);

  $objs = entity_load($obj_type, $ids, array(), TRUE);
  foreach ($objs as $id => $obj) {
        // do what I need to do with these entities
  }
}
jhodgdon’s picture

Oh... Sorry, needed to read the doc a little closer. It looks like fieldCondition() allows me to omit the columns. I'll try that...

jhodgdon’s picture

Hmmmm... Well, on closer reading, I don't think this is going to help. If I call fieldCondition() with no condition on the field values, it doesn't seem as though there are any conditions added to the query. So... ???

jhodgdon’s picture

Status: Needs review » Needs work

So... either the documentation needs to make clear how to create this query, or (as I think) the query class is lacking the ability to do the query I could previously do with field_read_instances()/field_attach_query(). Either way, I think the patch needs work.

chx’s picture

Let's discuss how to enhance the docs. field_has_data has some hints on how to do this query:

$query = new EntityFieldQuery;
$query
  ->fieldCondition($field_name)
  ->entityCondition('entity_type', $entity_type)
  ->entityCondition('bundle', $bundle)
chx’s picture

Status: Needs work » Needs review
FileSize
99.59 KB

Not just fixed the relevant docs with the help of jhodgdon but separated the loop that adds the tables from the loop that adds the condition in field_sql_storage_field_storage_query so now you can ask for entities that have field data in field A and field B and ... previous patches allowed that only for field A. This have simplified the code in field_sql_storage_field_storage_query a little as now there is one separate loop for joins and then one for conditions and one for orders. Previously both conditions and orders joined.

chx’s picture

FileSize
99.59 KB
jhodgdon’s picture

FileSize
101.29 KB

Hah! With chx's help and the patch in 119, I was able to port my module (Search by Page) to use the new API.

The code in #113 became:

 foreach ($fieldtypes as $field_name) {
    $query = new EntityFieldQuery;
    $query
      ->entityCondition('entity_type', 'node', '=')
      ->entityCondition('bundle', $bundles, 'IN')
      ->fieldCondition($field_name);

    $results = $query->execute();
    $objs = entity_load('node', array_keys($results['node']));

    foreach ($objs as $id => $obj) {
       // Do whatever I need to do with each node $obj.
    }
}

Note that in #113 I had defined $objtypes = array('node').

Anyway. I'm ready to say "Put this in core!".

But I took one last look through the docs and found a couple of remaining spelling/grammar errors and a couple of whitespace issues (spaces at ends of lines etc.), which of course I couldn't resist fixing. So, here's a slightly modified patch.

I am ready for this to be marked RTBC, as lon as chx is happy with the minor changes I just made, and assuming the test bot is still happy.

jhodgdon’s picture

Status: Needs review » Reviewed & tested by the community

OK, I will go ahead and mark it RTBC. chx can mark it not-RTBC if he's not happy with the patch in #120.

My modified Search by Page module works fine with this patch, and hopefully the test bot will come back all green.

And I think we can use something like #113/#120 as an example for the module update page... if it needs to go on the module update page? I am not sure, since it's not a 6/7 diff at this point?

chx’s picture

I am superb happy with the attention of detail this patch got lately. Can't wait for it to get in.

andypost’s picture

+++ modules/field/field.crud.inc	3 Jun 2010 22:45:57 -0000
@@ -1036,25 +1036,23 @@
+    $query = new EntityFieldQuery;

This

+++ modules/field/field.module	3 Jun 2010 22:45:57 -0000
@@ -877,8 +855,12 @@
+  $query = new EntityFieldQuery();

And this

Why they are different all over a patch?

jhodgdon’s picture

Why is this a problem? I don't know which is preferred, having parens for class construction with no args or not having parens, or why it's a big problem that some parts of a patch that one person wrote are a bit differrent from other parts of a patch that someone else wrote.

Hmmm...
http://www.php.net/manual/en/language.oop5.decon.php
The examples here show () in every case.

http://www.php.net/manual/en/language.oop5.references.php
The examples here don't show () at all.

http://drupal.org/coding-standards
has absolutely nothing about any coding standard for this.

aspilicious’s picture

I must admit that it was wondering the same thing. It's not really a problem just looks messy.

catch’s picture

Let's open a separate code style issue, thrash it out, then fix it across core, in a separate issue, separate from this one. Did I mention it should be a separate issue?

jhodgdon’s picture

Excellent idea. So let's get this in as-is...
#818206: Apply coding standard for no-arg constructors

webchick’s picture

Since Dries already took a gander at this one, and since this looks like it goes way over my head, leaving for him to finish up.

bjaspan’s picture

Generally, bravo; this looks very good.

My review is fairly shallow (though only some of my comments are only about doc :-), it is all I have time for at the moment. I found a number of things I thought were confusing and did not take the time to go figure out why they are right, so I just left them as questions. Hopefully by posting questions which reveal I don't fully understand how this patch works yet I will be motivated to finish figuring it out. :-)

+++ modules/field/modules/field_sql_storage/field_sql_storage.module	3 Jun 2010 22:45:57 -0000
@@ -482,127 +482,112 @@
+    $select_query->join($tablename, $table_alias, "$table_alias.etid = $field_base_table.etid AND $table_alias.$id_key = $field_base_table.$id_key");

While reading at this point, it is not clear to me why this will not result in tablename($query->fields[0]) always being joined twice, because $fields[0] is the base table of the query and then it is being added in again here.

+++ modules/field/modules/field_sql_storage/field_sql_storage.module	3 Jun 2010 22:45:57 -0000
@@ -482,127 +482,112 @@
+ * @param SelectQuery $select_query

Wasn't there a previous comment that this type hint (and others like it) should be SelectQueryInterface, or be removed?

+++ includes/entity.inc	3 Jun 2010 22:45:56 -0000
@@ -290,3 +290,647 @@
+   * TRUE to return only deleted data, FALSE to return only non-deleted data,
+   * NULL to return everything.

TRUE and FALSE are clear here because we're setting a boolean property. However, $query->deleted(NULL) is going to be pretty confusing. If this were my patch, I would define a constant FIELD_QUERY_ALL or something to use in place of NULL.

+++ includes/entity.inc	3 Jun 2010 22:45:56 -0000
@@ -290,3 +290,647 @@
+  /**
+   * The field IDs used.
+   *
+   * @var array
+   */
+  public $fields = array();

This documentation is wrong; see comments below.

+++ includes/entity.inc	3 Jun 2010 22:45:56 -0000
@@ -290,3 +290,647 @@
+   *   An arbitrary identifier: conditions in the same group must have the same
+   *   $delta_group. For example, if a multivalue field has two columns,
+   *   'color' and 'shape', and two values are red/square and blue/circle, and
+   *   the query should find red/circle, then if you supply two conditions
+   *   (color = red, shape = circle), it will not correctly find this entity.
+   *   If the two conditions are given with the same $delta_group, then the
+   *   query will work as expected.
...
+   *   An arbitrary identifier: conditions in the same group must have the same
+   *   $language_group.

My reading of this example is that if we want to find red/circle, the query should return no results because no such value exists, but in fact it will return both entities because one value is red and the other value is a circle.

However, the text here says "it will not correct find this entity." This text suggests that the desired behavior would be to find an entity, when the desired behavior is not to find one, but the actual behavior is to find one.

So I suggest making the example more explicit: "Without a delta_group, the conditions (color=red, shape=circle) would find an entity with field values red/square and blue/circle because one delta is red and the other is circle. With a delta group, the query will require that the same delta is both red and circle, and thus the query will only return the desired results." Or something.

I do not think this is just nit-picking. This is confusing functionality, and the example needs to be crystal clear.

+++ includes/entity.inc	3 Jun 2010 22:45:56 -0000
@@ -290,3 +290,647 @@
+    $index = count($this->fields);
+    $this->fields[$index] = $field;
+    if (isset($column)) {
+      $this->fieldConditions[$index] = array(
+        'field' => $field,
+        'column' => $column,
+        'value' => $value,
+        'operator' => $operator,
+        'delta_group' => $delta_group,
+        'language_group' => $language_group,
+      );

This confuses me. Up above $this->fields is doc'ed as "the Field IDs used". So I figured it was an array of unique field IDs. In fact it is an array of non-unique Field structures.

If we set two fieldConditions on the same field (e.g. color=red and shape=circle, or whatever), then the field ends up in $this->fields array twice (and in $this->fieldConditions twice, as opposed to $this->fieldConditions[$field_id] being an array of two elements). That's fine so long as the code everywhere expects that, and I'm sure it does since otherwise nothing would work. But the doc for $this->fields is wrong and confusing. The data structures should be explained so that the code makes sense.

+++ includes/entity.inc	3 Jun 2010 22:45:56 -0000
@@ -290,3 +290,647 @@
+   * Note that this only affects field conditions. Properties always use the
+   * base table.

The term "base table" is somewhat out of scope here. I'd suggest: "Entity property conditions as always applied to the current revision."

+++ includes/entity.inc	3 Jun 2010 22:45:56 -0000
@@ -290,3 +290,647 @@
+      elseif (array_diff($storage, $field['storage'])) {
+        throw new EntityFieldQueryException(t("Can't handle more than one field storage engine"));

Is it worth saving the storage module name separately so this can be a simple string comparison instead of an array comparison?

+++ modules/field/modules/list/list.module	3 Jun 2010 22:45:57 -0000
@@ -99,7 +99,7 @@
- * exists in the databae (use field_attach_query()) to find out.
+ * exists in the field storage (use EntityFieldQuery) to find out.

Amusingly, you preserved the incorrect close-paren which should go after "out".

+++ modules/simpletest/tests/entity_query.test	1 Jan 1970 00:00:00 -0000
@@ -0,0 +1,849 @@
+    // First, of type test_entity_bundle_key.

Huh?

72 critical left. Go review some!

chx’s picture

Status: Reviewed & tested by the community » Needs review
FileSize
101.74 KB
  1. Fixed fields[0] added twice.
  2. Added RETURN_ALL constant.
  3. Removed the mention of field IDs from doc, added 4 lines of comments instead.
  4. Hopefully the new doxygen for the delta group makes more sense.
  5. changed storage to just the storage module.
  6. fixed the comment in list module.
  7. fixed the comment in the test.
chx’s picture

FileSize
101.66 KB

delta group doxygen fixes from catch. I ran the entity tests and all 66 pass still no wonder, the changes are so small.

catch’s picture

Status: Needs review » Reviewed & tested by the community

While I wrote the docs change I think it's clearer now,looks like chx dealt with the rest of Barry's review too. I'm marking this RTBC again. #808534: file_get_file_references() is completely broken is pretty much blocked on this going in unless we want to do that work twice.

bjaspan’s picture

Status: Reviewed & tested by the community » Needs work

I see at least one place now where RETURN_ALL isn't used that it should be. Also, I'm now REALLY confused because chx said he fixed fields[0] being added to the query twice, which suggests that I was right that it WAS being added twice (once as FROM and once as JOIN), except I do not actually see that change in the code. So now I wonder if chx changed something other than what I was referring to, and wonder what that is. But I'm running off to a sprint demo so I can't diff the diffs just now. I will look at this today (see, I told you doing a review would get me more involved!).

bjaspan’s picture

Okay, I do now see where chx put in code to prevent fields[0] from being added to the query twice. To convince myself that we really were doing an extra join before and that the new change fixes it, I hacked in some debug logging:

  function finishQuery($select_query, $id_key = 'entity_id') {
    if ($this->range) {
      $select_query->range($this->range['start'], $this->range['length']);
    }
    if ($this->count) {
      file_put_contents('/tmp/entity_query.log', $select_query->__toString(), FILE_APPEND);
      return $select_query->countQuery()->execute()->fetchField();
    }
    file_put_contents('/tmp/entity_query.log', $select_query->__toString()."\n\n", FILE_APPEND);

I then ran EntityFieldQueryTestCase with and without the change to avoid adding fields[0] twice. The first entity query run by the tests that is affected is, old way:

SELECT DISTINCT field_revision_lcsz8rij_field_name.entity_id AS entity_id, field_revision_lcsz8rij_field_name.revision_id AS revision_id, field_revision_lcsz8rij_field_name.bundle AS bundle, fcet.type AS entity_type
FROM 
{field_revision_lcsz8rij_field_name} field_revision_lcsz8rij_field_name
INNER JOIN {field_config_entity_type} fcet ON fcet.etid = field_revision_lcsz8rij_field_name.etid
INNER JOIN {field_revision_lcsz8rij_field_name} field_revision_lcsz8rij_field_name0 ON field_revision_lcsz8rij_field_name0.etid = field_revision_lcsz8rij_field_name.etid AND field_revision_lcsz8rij_field_name0.revision_id = field_revision_lcsz8rij_field_name.revision_id
WHERE  (field_revision_lcsz8rij_field_name0.lcsz8rij_field_name_value >= :db_condition_placeholder_0) AND (field_revision_lcsz8rij_field_name.deleted = :db_condition_placeholder_1) AND (fcet.type = :db_condition_placeholder_2) 

and new way:

SELECT field_revision_robxro8n_field_name.entity_id AS entity_id, field_revision_robxro8n_field_name.revision_id AS revision_id, field_revision_robxro8n_field_name.bundle AS bundle, fcet.type AS entity_type
FROM 
{field_revision_robxro8n_field_name} field_revision_robxro8n_field_name
INNER JOIN {field_config_entity_type} fcet ON fcet.etid = field_revision_robxro8n_field_name.etid
WHERE  (field_revision_robxro8n_field_name.robxro8n_field_name_value >= :db_condition_placeholder_0) AND (field_revision_robxro8n_field_name.deleted = :db_condition_placeholder_1) AND (fcet.type = :db_condition_placeholder_2) 

So, yes, we were using table field_revision_lcsz8rij_field_name twice, once in the FROM and once in

INNER JOIN {field_revision_lcsz8rij_field_name} field_revision_lcsz8rij_field_name0 ON field_revision_lcsz8rij_field_name0.etid = field_revision_lcsz8rij_field_name.etid AND field_revision_lcsz8rij_field_name0.revision_id = field_revision_lcsz8rij_field_name.revision_id

However, I also that the query changed from SELECT DISTINCT to just SELECT. The distinct setting comes from field_sql_storage_field_storage_query():

  // Do not add fields[0] twice, we already have the first alias.
  unset($query->fields[0]);
  $table_aliases[0] = $field_base_table;
  // Add tables for the fields used.
  foreach ($query->fields as $key => $field) {
    if ($field['cardinality'] != 1) {
      $select_query->distinct();
    }

So I think by preventing the base field table from being added twice we introduced a bug in which if the field base table has cardinality > 0 we used to mark the query distinct and now we do not. This looks easy to fix by duplicating the cardinality check outside the loop for fields[0], but as I am now setting in my sprint demo meeting and trying to pay attention, I can't quite be sure enough to fix it right now. :-)

I did, however, roll a new patch that uses RETURN_ALL in some comments that were still referring to deleted(NULL).

arlinsandbulte’s picture

Status: Needs work » Needs review
bjaspan’s picture

Status: Needs review » Needs work

The query example in my previous comment is somewhat confusing because the field names are randomly generated and thus different in the two examples. (I've always objected to the use of randomName() in Drupal's tests for this exact reason. Tests should be 100% consistent and reproducible.) To make the example more clear, I hacked up EntityFieldQueryTestCase to name the fields consistently:

    static $count = 0;
    $this->field_names[0] = $field_name = drupal_strtolower('f'.($count++). '_field_name');

I then re-ran the tests both before and after the fields[0] fix. The first affected query, old way:

SELECT DISTINCT field_revision_f0_field_name.entity_id AS entity_id, field_revision_f0_field_name.revision_id AS revision_id, field_revision_f0_field_name.bundle AS bundle, fcet.type AS entity_type
FROM 
{field_revision_f0_field_name} field_revision_f0_field_name
INNER JOIN {field_config_entity_type} fcet ON fcet.etid = field_revision_f0_field_name.etid
INNER JOIN {field_revision_f0_field_name} field_revision_f0_field_name0 ON field_revision_f0_field_name0.etid = field_revision_f0_field_name.etid AND field_revision_f0_field_name0.revision_id = field_revision_f0_field_name.revision_id
WHERE  (field_revision_f0_field_name0.f0_field_name_value >= :db_condition_placeholder_0) AND (field_revision_f0_field_name.deleted = :db_condition_placeholder_1) AND (fcet.type = :db_condition_placeholder_2) 

and new way:

SELECT field_revision_f0_field_name.entity_id AS entity_id, field_revision_f0_field_name.revision_id AS revision_id, field_revision_f0_field_name.bundle AS bundle, fcet.type AS entity_type
FROM 
{field_revision_f0_field_name} field_revision_f0_field_name
INNER JOIN {field_config_entity_type} fcet ON fcet.etid = field_revision_f0_field_name.etid
WHERE  (field_revision_f0_field_name.f0_field_name_value >= :db_condition_placeholder_0) AND (field_revision_f0_field_name.deleted = :db_condition_placeholder_1) AND (fcet.type = :db_condition_placeholder_2) 

It shows the same result as my previous comment but is easier to compare.

chx’s picture

Status: Needs work » Needs review
FileSize
101.58 KB

That's easy to fix -- I have moved the initial select query create inside the loop. Less code, even. Note that Barry's patch left out the new test file.

bjaspan’s picture

Status: Needs review » Reviewed & tested by the community

Approved.

jhodgdon’s picture

Status: Reviewed & tested by the community » Needs review

Some of the doxygen comments in this patch need editing. Please do not commit until this is done. If no one gets around to it before I do, I should have time in the next day or two.

ADDED: If you don't mind, I would also like the chance to verify that my contrib module still works with these changes, since it may (so far) be the only customer for this field query class.

jhodgdon’s picture

This is the one I noticed:

+  /**
+   * Return everything regardless they are deleted or not.
+   *
+   * @see EntityFieldQuery::deleted()
+   */
+  const RETURN_ALL = NULL;

I think it should say "Indicates that both deleted and non-deleted fields should be returned."

I didn't read past that point (short on time today). If that's the only new docblock, and the above change is made, and you're sure the patch still works, I'm OK with RTBC.

chx’s picture

Filed a followup at #821910: PHPDoc rules (and not Doxygen as one would assume) . Also, we will need a followup which JOINs less (this is not trivial so it's not in this issue). And another which adds an alter to the SQL storage module so that node access has a chance to add metadata / hook in otherwise. These are followups.

jhodgdon’s picture

chx says there is a new comment on + * @param $delta_group in function fieldCondition. I'm fine with that doc.

So if #140 is done I am for RTBC.

chx’s picture

Status: Needs review » Reviewed & tested by the community
FileSize
101.6 KB

Rerolled with #140.

chx’s picture

Wow, coder found three whitespace errors.

Dries’s picture

Status: Reviewed & tested by the community » Fixed

I've spent a good amount of time (and 2 beers) looking at this patch over the last couple of weeks, and decided to commit it to CVS HEAD. Great job all!

dmitrig01’s picture

Status: Fixed » Needs review
FileSize
33.73 KB

You forgot to commit the tests - don't drink and Drupal! (and no Drupalling Under the Influence)

Status: Needs review » Needs work

The last submitted patch, tests.patch, failed testing.

marcingy’s picture

The tests look committed to me, leaving open for a second pair of eyes.

marcingy’s picture

Status: Needs work » Needs review
dmitrig01’s picture

Status: Needs review » Fixed

Yes, looks like it did. I'm reading commits off Heine's feed and maybe it didn't pick it up for some reason.

rfay’s picture

Issue tags: +API change

OK, this was an API change. at least: field_attach_query() was replaced by EntityFieldQuery.

Please post a coherent description of the API change impact and I'll announce it to the dev list.

Island Usurper’s picture

Feel free to use or edit this as needed.

To convert code that is using field_attach_query(), the main task is to categorize the conditions into entity, field and property conditions. The array of conditions passed to field_attach_query() becomes multiple calls to the respective condition methods of EntityFieldQuery. Each condition is an array containing the name of the column and the value to compare it against. These are merely arguments to the condition methods, similar to the arguments of SelectQuery::condition().

Entity conditions are easy because there are only four possibilities: 'entity_type', 'bundle', 'revision_id', and 'entity_id'. Note that these names are not the same as the entity-specific columns that were used in field_attach_query() (e.g.: 'type', 'nid', and 'vid' for nodes).

EntityFieldQuery allows for multiple fields to be queried, but converted code will probably only need one field condition. Instead of the internal field ID, EntityFieldQuery::fieldCondition() takes the field name or the return value of field_info_field() as the first argument. The column name, value, and operator from the conditions array follow as usual.

Property conditions compare entity data that isn't stored in fields, such as $node->sticky. Using these requires the entityCondition on entity_type. I don't believe field_attach_query() supported anything like this, but it may be that using it will simplify the surrounding code.

The options that changed how field_attach_query() worked have analogous methods that set flags on the EntityFieldQuery object.

Example code demonstrating the changes needed. The query counts the number of nodes whose types are in $types and have the taxonomy term $child:

   $field = field_info_field('taxonomy_catalog');
   $children = taxonomy_term_load_multiple($tids);
 
   foreach ($children as $child) {
-    $conditions = array(
-      array('tid', $child->tid),
-      array('type', 'node'),
-      array('bundle', $types),
-    );
-
-    $child->nodes = field_attach_query($field['id'], $conditions, array('count' => TRUE));
+    $query = new EntityFieldQuery();
+    $query->entityCondition('entity_type', 'node')
+      ->entityCondition('bundle', $types)
+      ->fieldCondition($field, 'tid', $child->tid)
+      ->count();
+
+    $child->nodes = $query->execute();

     $catalog->children[] = $child
   }
jhodgdon’s picture

See also comments #113 (before) and #120 (after) above, which give an example of a contrib module that I ported to use the new code.

dman’s picture

*sigh* I'm another HEAD-follower that deserves a beer from someone.
Though I'm following a few weeks behind..

OK, previously, I was getting taxonomy_xml D7 to work by
- extending terms with a field called 'guid' so I can match taxonomy terms to a solid GUID, external URI. (previously used taxonomy_enhancer for that in D6)
eg http://www.w3.org/TR/2003/PR-owl-guide-20031209/food#PastaWithWhiteSauce from the named W3C RDF taxonomy example I'm using as one of the test cases.

I have a function called taxonomy_xml_get_term_by_guid($guid, $vid = NULL) that resolves this for me.

Given a GUID, what is the term that locally represents that?
That is,
what is the term that has a field GUID with the given value?

Before:

  // first, find the field ID I use $guid_field_id . This is mostly static
  // then
   $field_lookup = field_attach_query($guid_field_id, array(array('value', $guid)));

  // Load that entity (assume only one valid match)
  if (! empty($field_lookup['taxonomy_term'])) {
    $term_entity = array_pop($field_lookup['taxonomy_term']);
    $term = taxonomy_term_load($term_entity->tid);
    return $term;
  }

After:

    // find the field info for the GUID field I'm using. ( using field_info_field() )
    // Now, 
    // Find the thing(s)
    // of type taxonomy_term
    // that has a field (guid)
    // with a value of (desired guid)
    $id_lookup_query = new EntityFieldQuery();
    $id_lookup_query->entityCondition('entity_type', 'taxonomy_term')
      ->fieldCondition($guid_field_info, 'value', $guid);

    $field_lookup = $id_lookup_query->execute();
    // $field_lookup is now an array of entities (or at least entity ids) that match the lookup
    // eg
    // array('taxonomy_term' => array(
    //   9 => object('tid' => 9, 'vocabulary_machine_name' => 'food'),
    // ))

    /// Fully load that entity (assume only one valid match)
    if (! empty($field_lookup['taxonomy_term'])) {
      $term_entity = array_pop($field_lookup['taxonomy_term']);
      $term = taxonomy_term_load($term_entity->tid);
      return $term;
    }

Yahoo. My import log works again.

Found http://www.w3.org/TR/2003/PR-owl-guide-20031209/food#EdibleThing = EdibleThing 70
Pasta # 75 is a child of EdibleThing # 70
Found http://www.w3.org/TR/2003/PR-owl-guide-20031209/food#Pasta = Pasta 75
PastaWithWhiteSauce # 76 is a child of Pasta # 75
Found http://www.w3.org/TR/2003/PR-owl-guide-20031209/food#PastaWithWhiteSauce = PastaWithWhiteSauce 76
PastaWithLightCreamSauce # 83 is a child of PastaWithWhiteSauce # 76 (source)

Thanks Island Usurper for the example #152 that gave me enough to go on!

I probably should get the vid filter back in there also??, but not right now.

Posting my before and after just as migration notes if anyone else needs examples.

chx’s picture

Also you can do

    if (! empty($field_lookup['taxonomy_term'])) {
      $tids = array_keys($field_lookup['taxonomy_term']);
      $term = taxonomy_term_load($tids[0]);
      return $term;
    }

Or

    if (! empty($field_lookup['taxonomy_term'])) {
      list($tid) = each($field_lookup['taxonomy_term']);
      $term = taxonomy_term_load($tid);
      return $term;
    }
rfay’s picture

@dman: I feel your pain. As a result, I try to announce API changes in the Development mailing list, so if you're not on that you may want to subscribe. This one was announced on June 17: http://lists.drupal.org/pipermail/development/2010-June/035734.html

Of course if anybody has a better idea of how to announce API changes, we're looking forward to it.

dman’s picture

Thanks, that post is actually how I found myself here, after searching for 'field_attach_query' and wondering why my code had it in it and what I had been thinking at the time...

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

the greenman’s picture

Since I can find no other issue for raising this, I am going to pop in here to gain a few eyeballs.

There is no way to chain conditions on the same field.

When I attempt to select by two different columns on the same field, the query generated creates a join that breaks any combined logic.

Working with the new Organic Groups, I would like to select all active members of a group. Group membership is now a field, so I create a field query that specifies the group id and the state.

$query
    ->entityCondition('entity_type', 'user')
    ->fieldCondition('group_audience', 'gid', $gid);
    ->fieldCondition('group_audience', 'state', (array) $states, 'IN');

The generated SQL however creates both these fieldCondition statements as joins, meaning the state returned is for the correct user, but may have nothing to do with the group specified.

Although I understand that this may be a limitation of the "possibly not sql" back-end, we should probably have some way to achieve this.

jhodgdon’s picture

This issue is closed.

If you have discovered a documentation or functionality problem with this API, please file a separate issue. If you need support or help, please use one of the Community and Support options (click on Community and Support in the drupal.org header) such as support forums or IRC.

Damien Tournoud’s picture

@the greenman: look at the $delta_group and $language_group parameters. They are designed precisely for what you are looking for.