These functions exist in field_sql_storage.module:

_field_sql_storage_tablename
_field_sql_storage_revision_tablename
_field_sql_storage_columnname
_field_sql_storage_indexname

Furthermore, the function _field_sql_storage_schema exposes a Drupal table schema from which all of this can be learned (and _field_sql_storage_schema calls some of the functions listed above).

In order to avoid creating materialized Views (leads to data bloat) piecemeal and without an API (leads to unintelligibility and risk for bugs) for each use case where we need to join field instance value data with Drupal core table data, we need to let Drupal know the table and column names so it can construct queries which join on field instance data tables.

Plan:

  1. Turn some or all of the above functions into field storage API "hooks."
  2. Document that they're optional for field storage modules. Likely only SQL storage modules could return something meaningful.
  3. All code that depends on field attach functions would need to handle the possibility that no value is returned.
  4. Return to fixing #412518: Convert taxonomy_node_* related code to use field API + upgrade path and other issues that depend on this.

This was discussed with yched and chx at Drupalcon code sprint Saturday.

CommentFileSizeAuthor
#24 field_api_storage_exposed-569224-24.patch10.86 KBAnonymous (not verified)
#22 field_api_storage_exposed-569224-21.patch10.71 KBAnonymous (not verified)
#18 field_api_storage_exposed-569224-18.patch10.51 KBAnonymous (not verified)
#17 field_api_storage_exposed-569224-17.patch10.32 KBAnonymous (not verified)
#15 field_api_storage_exposed-569224-15.patch9.87 KBAnonymous (not verified)
#14 field_api_storage_exposed-569224-14.patch9.99 KBAnonymous (not verified)
#13 field_api_storage_exposed-569224-13.patch6.55 KBAnonymous (not verified)
#11 field_api_storage_exposed-569224-11.patch3.54 KBAnonymous (not verified)
#10 field_api_storage_exposed-569224-10.patch3.06 KBAnonymous (not verified)
#8 field_api_storage_exposed-569224-8.patch2.65 KBAnonymous (not verified)
Support from Acquia helps fund testing for Drupal Acquia logo

Comments

chx’s picture

All code that depends on field attach functions would need to handle the possibility that no value is returned. <= yes but this is a very little extra code. For example in taxonomy page you already have a piece of code that checks whether there are any nodes for that page and displays the "sorry no dice!" message if not. So , in pseudocode:

$output = array();
$table_data = field_api_give_me_table_data();
if ($table_data) { This if should be all your error handling.
  produce output
}
if (!$output) {
  no nodes here; // This code already exists.
}

so definitely do not die, throw an exception etc just walk out gracefully. After all the message "no nodes to display" is correct because you did not find any nodes that this code can display. Anyone storing taxonomy data in non-SQL will write their own Views3 backend and list for themselves or something like that anyways. Just dont produce errors where none is welcome.

catch’s picture

Category: task » bug

Can't ship core without this.

bjaspan’s picture

Category: bug » task

In principal this is all fine. We do not need to force lowest-common-denominator storage functionality on code that wants to require a specific kind of storage.

yched and I discussed the api for this a bit this morning. I was thinking we would create a hook like this:

// Return internal details about the storage for $field.  For example, an SQL storage module might return the 
// Schema API structure for the table.  A key/value storage module might return the server name, 
// authentication credentials, and bin name.
function hook_field_storage_details($field);

// Return an array of storage details for $field keyed by the field storage module(s) storing the data. 
// The returned array may have more than one entry if multiple storage modules are storing data for
// the field; for example, the data might be stored in the local SQL database for use by Views but also in a
// shared key/value store for super-fast parallel reads of fields for object loading.
function field_info_storage_details($field);

In the above approach, Field API would invoke hook_field_storage_details() for all modules on a field and any module that stores it (or that will store it, because no data may have been provided yet) returns its data.

yched's idea was that basically this exact information would exist, but instead of having to call field_info_storage_details($field) to get it, the info would be part of the basic field structure available from field_info_field($field_name).

I probably like yched's idea better. I'm not sure if there is any practical difference in the difficulty of implementation.

chx’s picture

Barry's idea pretty works for me, how the exact implementation happens matters less to me. Once this is done, PBS needs to check whether it's storing something tha'ts SQL bound originally and dont do squat if it's not. Or so I think. Per field storage will help thsi a lot.

yched’s picture

I don't think there are many differences in the two approaches in #3. Having the info available in $field definition array would just be field API taking care of calling hook_field_storage_details() in _field_info_prepare_field().

Also, I'm not sure returning a schema API structure is what we really want. The important info is
a) the table name
b) for each of the field's 'columns' ('value', 'format'...), the name of the actual db column.

bjaspan’s picture

Category: task » bug

Didn't mean to change the category in #3.

bjaspan’s picture

Continuing my slow march through the critical queue... though this one is blocked on #443422: 'per field' storage engine.

Anonymous’s picture

Status: Active » Needs review
FileSize
2.65 KB

Here's my first draft. It does put the whole schema into the storage array of the field info. It probably doesn't need to. Do we need details about revision storage?

A screenshot devel field info output is here: http://files.me.com/bangpound/c0w0iy

bjaspan’s picture

Status: Needs review » Needs work

Thanks for taking this on! Comments so far:

* Storage details need to be per-instance, not per-field. Otherwise, we lose the most obvious use case of being able to expose storage info for per-bundle tables.
* The storage details need to be documented as read-only; no module can safely ever write anything except through Field Attach API.
* We probably should expose revision storage info, yes; the Field API supports revisions, and modules may want efficient access to that data.
* Multiple modules should be able to add storage details to instances, under different keys. For example, a field's primar storage engine might be local SQL, but a contrib module might also store the data in memcache. Some other module may want to query that field and know that if memcache details are available, it can be accessed faster.
* You are exposing the whole table schema, but you are not actually exposing the mapping between field column name and table column name (e.g. for the body field, the field column 'value' is stored in the table column 'body_column'). This I think is why yched suggested just returning tablename => (field_column => table_column). If a caller wants the full schema, it can use drupal_get_schema(tablename) to get it; generally, it won't be needed.

The hardest part to figure out, and the reason I've still been thinking about this issue instead of writing a patch, has to do with the use case of per-field vs. per-bundle storage. Assume per-field storage is the default, but someone turns on the per-bundle storage module. When Views goes to query the field's data, it wants to know that it can read it from the per-bundle table. Options:

1. Per-field storage details are in $instance['storage']['field_sql_module']. Per-bundle details go in $instance['storage']['pbs'], and Views just has to know to look there. Not that great.
2. Per-field storage details are in $instance['storage']['field_sql_module']. pbs.module alters the info to refer to is per-bundle tables. Better, I think.
3. Per-field storage details are in $instance['storage']['sql'], and pbs.module alters them there. This means that the storage details key name is not necessarily tied to the storage module name, but can be a generic key to refer to a storage location that other modules may want to tweak. We just let the generic key names evolve on their own.
4. We decide that the whole idea of a non-storage module using the _pre-hooks to change or duplicate storage is an idea that only made sense before we had per-field storage engines, and now pbs.module should be rewritten as a full storage engine that uses its per-bundle tables when possible and otherwise dispatches to the field_sql_storage engine (i.e.: pbs.module would be specifically written as an add-on storage module on top of field_sql_module). But then we would still need to decide the key for $instance['storage']. It would basically still have to be 'sql', because otherwise Views etc. wouldn't know where to look for storage details depending on whether pbs.module was enabled or not.

The difference between 3 and 4 is that with #3, this patch has to provide a way for modules to alter storage details from other modules, whereas in #4 if you want to do any kind of duplicate/denormalized/sharded storage, you have to write a new storage engine that layers on top of other storage engines.

I think 4 might be the long-term correct approach, but it will probably need a storage-routing layer since otherwise every storage engine that wants to duplicate/denormalize/etc will have to be tied to a specific lower-level engine, and we don't have that yet. So for now, I think 3 is the most straightforward: let modules alter storage details provided by other modules.

Comments?

Anonymous’s picture

Don't review this patch. It's not significantly different from the previous one. I'm just putting it here for safe keeping. I'll resume working on it this evening.

Anonymous’s picture

Status: Needs work » Needs review
FileSize
3.54 KB

A new patch. The gathering is done in _field_info_collate_fields().

The storage details are stuck onto instances. This is the output of field_info_instances() http://skitch.com/bangpound/nnh6b/execute-php-code-drupal-7

bjaspan’s picture

Status: Needs review » Needs work

Nice work! Review:

hook_field_storage_details_alter() needs the $field and $instance for the $details being altered. Otherwise, it can't know if or how to alter the storage details. Document that the $instance structure will not already have details filled in, they have to be read and modified in the $details argument.

This needs tests. Suggested tests:

1. Create a field using field_sql_storage, create an instance. Retrieve the instance with field_info_instance(), make sure the results look right.

2. Implement field_test_field_storage_details_alter and change the storage details for any field named "field_test_change_my_storage_details" to something else. Then create that field, get its storage details, and verify they look right. (Notice that you need the $field/$instance structures for this to work.)

Anonymous’s picture

Status: Needs work » Needs review
FileSize
6.55 KB

Added $field and $instance to the hook_field_storage_details_alter().
Added documentation about where to change the details and noting that they're not in the instance yet.
Added tests for field_sql_storage.
Began to add field_test_field_storage_details_alter but didn't finish.

Anonymous’s picture

This patch has tests for hook_field_storage_details_alter.

Anonymous’s picture

New patch. $instances isn't optional.

bjaspan’s picture

Status: Needs review » Needs work

This is good, very close now.

No biggie, but you can leave out most of the $instance options when creating an instance for tests, making the code simpler. You only really need field_name and bundle.

You should not have to call _field_info_collate_fields(TRUE) explicitly in your tests; if you do, there is a bug somewhere.

Documenting in field.test that field_test module is magically changing the storage details b/c of your field name, and that moon/mars is correct, will be helpful.

Why test only one of the column lists in details? Seems likt it would be easy to make a loop and test them both, and by explicit name instead of via array_shift($details).

Anonymous’s picture

Status: Needs work » Needs review
FileSize
10.32 KB
Anonymous’s picture

New screenshot shows that the current and revisions tables are identified in the array of storage details.

http://skitch.com/bangpound/ndbw2/welcome-to-drupal-7-drupal-7

This patch implements those changes.

Status: Needs review » Needs work

The last submitted patch failed testing.

Anonymous’s picture

Status: Needs work » Needs review

huh?

Dries’s picture

I'm slightly confused by the 'current' tables -- where does that come from? That does not seem to be documented.

Anonymous’s picture

The 'current' and 'revisions' refers to the age of field instance values that can be loaded from the field storage. In trying to document this, I discovered FIELD_LOAD_CURRENT and FIELD_LOAD_REVISION constants. Should I use these for the array keys?

I've updated the patch with new documentation on hook_field_storage_details.

bjaspan’s picture

re #22: Yeah, I like using FIELD_LOAD_CURRENT and FIELD_LOAD_REVISION instead of current/revision.

Anonymous’s picture

Here it is. Documentation on the hooks is still not my best work... the 'age' part is tricky to explain in a few lines of phpdoc.

Dries’s picture

Status: Needs review » Fixed

This patch looks good. I made some minor tweaks to the documentation and committed it to CVS HEAD. The documentation could use some more love, but that's fine to do as a follow-up. Great job bangpound.

yched’s picture

It's been difficult for me to find net acces the last couple days, sorry about that. The patch that went in looks excellent to me. Great job !

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.