Placeholder issue, not sure I get a clear picture of this for now.

The question of indexes was not discussed during the FiC sprint.
#392494: Fields API needs to allow searching of scalars kind of begs for it, and it will probably be required by #412518: Convert taxonomy_node_* related code to use field API + upgrade path.

The main issues for indexes in CCK were the UI, and preserving indexes during field storage migration (per field / per type). So the D7 core issue itself is simplified for now.

Off hand, the main questions are that come to mind are :

- Who defines indexes :
a) the field-type module
b) the caller of field_create_field()
c) both (hardcoded defaults by the field type, overridable on field creation)

As I argued in several CCK threads, a) is OK for nodereference or taxonomy fields, but leads to index bloat when applied to other field types (text, number). It's the actual query usage (Views, custom queries) that should dictate what indexes are created.
b) or c) raise the question of #367013: Field update (field_update_field) : how do you change or add indexes to a field once it's been created ?

- We cannot assume SQL storage anymore
So we need a storage-agnostic index definition. Shouldn't be too much of an issue, we already do that for field storage 'columns', and indexes are simpler beasts. If a storage engine doesn't support indexes, that's not our problem, as long as it makes it clear to its users.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

catch’s picture

Priority: Normal » Critical

Subscribing and bumping priority to critical.

yched’s picture

Status: Active » Needs review
FileSize
12.68 KB

Discussions in other threads have showed that reference fields (nodereference field, filefield, taxo as field, formatted text as referencing to a text format...) will have to selectively clear the field cache when one of the referenceable entities (node, file, taxo term, text format...) is updated or deleted. This will rely on field_attach_query() and requires indexes on the reference colulmn. So we do need at least option a) above (indexes imposed by the field type)
Attached patch goes for c) : field types can specify hardcoded indexes, and the $field structure provided to field_create_field() can specify additional indexes.

The question of how field indexes can be added to existing fields is tied to field_update_field(), still to be tackled.

- field-type module's hook_field_columns() now returns 'columns' and 'indexes', and has thus been renamed (back) hook_field_schema() (which some comments still referred to).
Granted, 'columns' is not consistent with regular hook_schema() definitions, but using 'fields' in our case seems out of the question, and it has been agreed somewhere that it's rather hook_schema's 'fields' entry that should be renamed...
- those indexes can only involve the data columns. You cannot specify a multicolumn index mixing 'value' and 'bundle'.
- In order to store $field['indexes'], the patch modifies the schema of the field_config table to use a serialized 'data' column, storing everything that doesn't require it's own dedicated column - just like what we have for the field_config_instance table.

Still a few TODOs for PHPdocs, but setting to CNR for testbot and feedback.

moshe weitzman’s picture

looks reasonable to me.

bjaspan’s picture

Haven't tried it but seems reasonable.

I notice that if you pass in indexes to field_create_field(), it completely replaces the indexes specified by the field type instead of adding to them. I guess that is reasonable since the field creator may know more about how the field will be used. This means the field types need to document clearly what indexes they create by default so that field creators know how to keep those indexes and add more if that is what they want to do.

yched’s picture

Status: Needs review » Needs work

More specifically, the field definition and field type indexes are merged together. If the field definition defines an index with the same name as one of the field type indexes, field definition wins (you can even remove the index by providing an empty array of columns). I added a few simpletests for field_create_field() to explicitly check that behavior.
Of course, the docs should make it clear that you override the field-type indexes at your own risks, and should only do so if you know what you're doing.

Will post the new patch when I'm done with the PHPdocs.

yched’s picture

Status: Needs work » Needs review
FileSize
20.47 KB

Updated with tests and PHPdocs. I think we're all set.

Summary of what the patch does (see #2 and #5 for the rationale)
- let field types specify default indexes; repurposes hook_field_columns() to hook_field_schema().
- adds default indexes to existing core field types.
- let field_create_field() accept additional indexes, or override the default
- add tests that check that overrides between default indexes and runtime-specified indexes take place as expected. The db API doesn't provide index inspection, so we cannot have tests for the SQL storage module to check that indexes are actually created in the db.
- removes a few more 'field_name' => drupal_strtolower() in FieldTestCase, in favor of explicit 'field_[n]' names.
- adds/updates PHPdocs for the feature. Documenting the 'override default indexes in field_create_field()' part expanded into documenting the default values of the non-required $field and $instance properties for field_create_field() and field_create_instance()

bjaspan’s picture

yched, you've been doing an amazing amount of great work. Thank you!

catch’s picture

Status: Needs review » Needs work

Didn't review the functionality in full, but yched's description sounds really good.

Below is code style review only:

-      'field_name'        => array('type' => 'varchar', 'length' => 32, 'not null' => TRUE, 'default' => ''),
-      'bundle'            => array('type' => 'varchar', 'length' => 128, 'not null' => TRUE, 'default' => ''),
-      'widget_type'       => array('type' => 'varchar', 'length' => 128, 'not null' => TRUE, 'default' => ''),
-      'widget_module'     => array('type' => 'varchar', 'length' => 128, 'not null' => TRUE, 'default' => ''),
+      'field_name'        => array(
+        'type' => 'varchar',
+        'length' => 32,
+        'not null' => TRUE,
+        'default' => ''
+      ),

Looks like this is only a whitespace change, makes the patch harder to review.

An array of indexes on data columns, using the same definition format
+ *     than Schema API index specifications. Only columns that appear in the
+ *     'columns' setting are allowed. Note that field types can specify
+ *     default indexes, that can be modified or added to when creating a field.

Couple of grammar issues - suggest this:

An array of indexes on data columns, using the same definition format
+ *     as Schema API index specifications. Only columns that appear in the
+ *     'columns' setting are allowed. Note that field types can specify
+ *     default indexes, which can be modified or added to when creating a field.
+  $field['columns'] = $schema['columns'];
+  // 'indexes' can be both hardcoded in the field type, and specified in the
+  // incoming $field definition.
+  $field += array(

Very minor but this gets quite dense, could do with a blank space above the comment.

+ *   Other properties, if ommitted

should be 'omitted'.

+ *   A field instance structure. The field_name and bundle properties are
+ *   required.
+ *   Other properties, if ommitted, will be given the following default values:
+ *   - label: the field name
+ *   - description: empty string
+ *   - weight: 0
+ *   - required: FALSE
+ *   - default_value_function: empty string
+ *   - settings: each ommitted setting is given the default value specified in
+ *     hook_field_info().
+ *   - widget:
+ *     - type: the default widget specified in hook_field_info().
+ *     - settings: each ommitted setting is given the default value specified in
+ *       hook_field_widget_info().
+ *   - display:
+ *     Settings for the 'full' build mode will be added, and each build mode
+ *     will be completed with the follwong default values:
+ *     - label: 'above'
+ *     - exclude: FALSE
+ *       TODO This is subject to change, see: http://drupal.org/node/367215
+ *     - type: the default formatter specified in hook_field_info().
+ *     - settings: each ommitted setting is given the default value specified in
+ *       hook_field_formatter_info().

This doesn't look like it has anything to do with indexes? There's more whitespace fixes below as well. If there's a reason for it to be added in this patch, typo on 'follwong'.

yched’s picture

Status: Needs work » Needs review
FileSize
18.86 KB

Updated for catch's comments.

About the last remark : That's what I was referring to with "Documenting the 'override default indexes in field_create_field()' part expanded into documenting the default values of the non-required $field and $instance properties for field_create_field() and field_create_instance()".
The PHPdoc changes for field_create_instance are not related to indexes, but it keeps the docs for field_create_field() and field_create_instance() inline.

bjaspan’s picture

Status: Needs review » Reviewed & tested by the community
FileSize
20.14 KB

Nice work. I fixed a few comment typos/wordos but made no functional changes. New patch attached, RTBC.

One question that should NOT hold up a commit: Should we allow unique keys to be specified by a field type or field_create_field() caller? Probably not, because inter-object validation is not something the Field API currently considers, and if we want that it can be in a separate issues. I'm just asking since we are now returning a more complete schema structure from hook_field_schema().

Dries’s picture

Status: Reviewed & tested by the community » Fixed

Looks great. Thanks -- committed.

yched’s picture

Status: Fixed » Reviewed & tested by the community

Yes, I've been wondering about unique indexes too. If at some point we wanted to add a 'unique field values' feature, it should be done at a higher level, because we'll want to catch duplicates and raise errors at validate time, meaning before the storage write.
So, maybe 'unique key' would be handy as a syntax to specify the feature in the $field definition, but I'm not sure it would translate to actual unique keys at the storage level.
Besides, it seems users might want to specify 'unique values' :
- across all instances
- per instance (per bundle)
- per object

yched’s picture

Status: Reviewed & tested by the community » Fixed

Oops, crosspost. Thanks Dries !

yched’s picture

FWIW, here's a screenshot of the options screen proposed by the unique_field contrib D6 module : http://drupal.org/node/393650.
The storage engine encapsulation will probably make the 'combiniation of fiald_a, field_b, field_c must be unique' feature tricky performance-wise...

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.