This issue is for committing the initial version of the Field API to Drupal.

Reviewing and finally committing this patch will be a somewhat complex process. The patch itself adds a fair bit of code, currently around 290K, so "reviewing it" like smaller patches is not really practical. Even so, the patch by no means contains everything the Field API will or must eventually contain. However, we cannot really wait to get "everything" done before committing anything. So, the plan is to have the initial patch reviewed as well as is practical, do whatever additional really mandatory development is required (i.e. try to make sure it does not break everything or have really egregious design flaws), commit it, and then do more via follow-up patches.

If you want to help with Field API but are not sure how to get started, please read the How to Help with Field API page: http://drupal.org/node/361849.

If you find reviewing the patch overwhelming, you can provide just as much or more value by reviewing the API documentation: http://drupal.org/node/362764.

A variety of background reading will be helpful:

* The Fields in Core Code Sprint Final Report: http://drupal.org/node/361042
* The Data Architecture Design Sprint Final Report: http://groups.drupal.org/node/9221

This issue obsoletes the previous Fields in Core issue at http://drupal.org/node/265604.

The first patch will be arriving here shortly.


Comments

bjaspan’s picture

FileSize
288.15 KB
markus_petrux’s picture

subscribe

bjaspan’s picture

FileSize
289.42 KB

I used the wrong diff options.

David Strauss’s picture

If you just want to browse the code for Drupal CVS HEAD + Fields in Core, we have a repository browser set up:

http://vcs.fourkitchens.com/drupal/7-fic/files

KarenS’s picture

People following this should be aware there is now a 'mother' document that lists all the Fields in core issues at http://drupal.org/node/361849.

KarenS’s picture

The above patch is the core portion of the new code; I've just committed the contrib portion to CCK HEAD so you can use the UI to try things out. Many things are still broken, but it is generally working. I'll try to keep CCK HEAD current with the new code as we move along. I also added two new modules from our Bazaar repository to CCK HEAD: 'demofield', a simple module to create a demo field, and 'PBS', to create a single table for all fields in a content type (now called a 'bundle').

cburschka’s picture

Copying my subscription from the obsoleted issue...

Anonymous’s picture

Likewise

mfer’s picture

/me wishes there was a subscribe button

David Strauss’s picture

FileSize
315.65 KB

Attached are test failures of HEAD + the latest contents of the Bazaar repository.

bjaspan’s picture

Hmmm. Shouldn't that have been generated automatically?

David Strauss’s picture

@bjaspan It probably would get generated automatically if I posted an updated patch, but I run the test process after each merge from CVS HEAD to ensure I'm not committing massive new failures back to the repository, and I may as well post the results. I will probably just put the PDFs on the Four Kitchens server in the future and link to the results from my commit message.

bjaspan’s picture

Assigned: bjaspan » Unassigned

This is a team effort *and* I do not want to clutter up the [#xxx] links with "Assigned to." :-)

swentel’s picture

Subscribing for review later, exciting stuff!

bjaspan’s picture

So far we've had some subscribes but no reviews. If you find reviewing the patch overwhelming, you can provide just as much or more value by reviewing the API documentation: http://drupal.org/node/362764.

Please?

agentrickard’s picture

Just a little review based on reading the API code (no testing). Hooray for the use of hook_modules_(un)installed() and _enabled(). That's a nice little tool in D7.

Not crazy about the function name field_transpose_array_rows_cols(). This is a variant of array_flip, and the name should indicate that. Maybe array_flip_recursive()?
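For readers unfamiliar with the function being discussed: a transpose of this kind swaps the two levels of keys in a nested array. A minimal sketch (illustrative only -- this is not the patch's actual implementation, and example_transpose() is a made-up name):

```php
<?php
// Hypothetical sketch of a rows/cols transpose; the real
// field_transpose_array_rows_cols() in the patch may differ.
function example_transpose(array $array) {
  $result = array();
  foreach ($array as $key => $row) {
    foreach ($row as $sub_key => $value) {
      // Swap the two key levels: $array[$key][$sub_key]
      // becomes $result[$sub_key][$key].
      $result[$sub_key][$key] = $value;
    }
  }
  return $result;
}

// example_transpose(array('a' => array('x' => 1), 'b' => array('x' => 2)))
// returns array('x' => array('a' => 1, 'b' => 2)).
```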

I would use filter_xss_admin() instead of field_filter_xss() simply for a consistent administrator experience. There needs to be a very compelling reason why text that is valid in the footer is not valid for fields, and the comments do not explain the need for this function.

The argument list for field_format() is painfully long (6 elements). Drupal 7 has been moving away from these types of functions. Is there a good way to consolidate the arguments here? Perhaps this can be two arrays? One with $field data and one with $formatter data? Pseudo-code:

function field_format($field, $formatter = array()) {
  // ... Unpack the arguments; this just shows what data is provided.
  $field['name'] = $name; // The name of the field, to prevent using $field as both a string and an array, as in the current code.
  $field['data'] = $field_data; // The array of field data.
  $field['object_type'] = $obj_type; // This might just be $field['type']...
  $field['object'] = $object;
  $field['item'] = $item;
  $formatter['name'] = $formatter_name;
  $formatter['settings'] = $formatter_settings;
}

Above, I am thinking about the changes to functions like l(), where the long sequence of arguments has been deprecated. However, the first 4 arguments are all required.... Still, I would prefer to pass a $field object or array that contains those 4 elements instead. Standardizing the $field definition should make development easier.

This comment also applies to field_view_field().

field_access should not use $op, but instead be split into field_access_edit() and field_access_view(), in keeping with recent changes to core hooks.
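To illustrate that suggestion, the de-opified API could take roughly this shape (none of these function names exist in the patch as posted; this is purely a sketch):

```php
<?php
// Hypothetical split of field_access() into per-operation functions,
// mirroring recent changes to core hooks. _field_access() here stands
// in for whatever shared checks the two operations have in common.
function field_access_view($field, $account = NULL) {
  // Same checks field_access('view', ...) would perform.
  return _field_access('view', $field, $account);
}

function field_access_edit($field, $account = NULL) {
  return _field_access('edit', $field, $account);
}
```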

bjaspan’s picture

FileSize
329.52 KB

New version of patch. No major structural changes, mostly just tightening down some screws and adding docs. Ken's comments from #16 are *not* included.

Dries’s picture

I'm tempted to commit the Fields API patch as is. In many ways, it has been reviewed already, and it is too big to start nitpicking. I'm thinking that this time, we can refine it _after_ it has landed.

agentrickard’s picture

I would agree with Dries. My comments are largely semantics, in response to bjaspan's call for review. I would RTBC, but I haven't actually applied the patch...

Owen Barton’s picture

Issue tags: +Fields in Core

Adding a Fields in Core tag, so we can flag issues that are waiting for this one.

bjaspan’s picture

Dries: The rate of change has slowed so I think committing it is fine in terms of Field API functionality. We're not done, but a lot is there. I think the hard requirement for a commit should be "doesn't break anything that previously worked and doesn't cause core test failures," just like any patch. There are two ways I know we aren't there yet:

* The Field and Field Storage SQL modules are marked required but I do not think that fully configures them properly during install. So we need to fix that up.

* As David showed, we are causing some core test failures. We should fix those.

I'll shift my (currently recovering from food-poisoned) attention to those two tasks. After that, I'd be perfectly happy with a commit.

Dries’s picture

bjaspan: that makes sense, and I'm happy to proceed as outlined. Let's go for it! :)

alex_b’s picture

Subscribe.

bjaspan’s picture

Status: Needs work » Needs review
FileSize
317.68 KB

I'm setting this patch to CNR so the patch-bot will test it. Karen wants to finish #366377: Default values and #366383: Clean up new 'List' field type before this patch gets committed. I agree with her at least on the first one, because without that fix the CCK UI module displays errors every time you create a field. Most people will test Field API by running CCK, and we do not want their first impression to be "it's buggy."

bjaspan’s picture

FileSize
319.98 KB

The previous patch omitted some changes to core .inc files that I thought were spurious bzr diffs.

yched’s picture

Hem, the patch starts with changes to form.inc, that sneaked in from a previous attempt and led to a separate core patch. I removed it in the bzr repo, sorry about that...
Shouldn't make testbot cough, though.

kika’s picture

Perhaps I'm too late to the party, but has the future (and possible phase-out) of the CCK acronym been considered? There are several references to "CCK" in this patch, in the documentation -- do they need to be there? Can we just refer to "the contributed Field configuration UI" or something similar?

Note: I am only questioning references to CCK in core -- not renaming the contributed CCK modules (yet).

Anonymous’s picture

FAPI might work; #$@! that one is already taken. ;p How about FLDAPI?

bjaspan’s picture

Two responses:

1. I've never been a fan of the acronym CCK or name Content Construction Kit itself. CCK conveys no meaning at all and Content Construction Kit is cumbersome and not entirely clear either. I'd be happy to see the name replaced. Perhaps we should hold a naming contest! :-)

2. I am also troubled by the references to CCK/"the Field config UI" in the core patch. The fact that we find it necessary to refer to a contrib module from core is pretty good evidence that perhaps the referred-to functionality should be in core itself.

The plan since Feb 2008 has been not to include a "CCK UI" in core. The primary reason is just to keep the size of the initial commit down to something manageable. I also initially had conceptual objections on the grounds that core should contain mechanisms and functionality that enable features to be built but, in general, should not contain too many features in and of itself for reasons of maintainability and flexibility (because "lots o' features" is what contrib is for). However, my conceptual objections to a CCK UI in core are weakening. CCK is a fundamental strength of Drupal and we ought to ensure that 100% of users have access to it.

However, I'm still opposed to the current CCK UI going into core because, frankly, I do not think the UI model is very good. Certainly the implementation is way better than it used to be; Karen and Yves have made great improvements. However, the fundamental user experience of the UI is the same as it always has been, and I think it is poor. Nate's new Form Builder may be the right direction, or perhaps some completely different approach is best. The problem is that once we put a CCK UI into core it cannot change any more quickly than a new major Drupal release which, after D7, won't be until at least late 2010. I think that's too long to commit to the current UI model.

KarenS’s picture

I think trying to rename CCK will be an exercise in futility. There will be lots of ideas and no consensus and it will tie up lots of time that could be better spent doing other things and I am 99% sure that at the end of the day we would end up leaving it alone because we can't agree on an alternative. CCK has meaning for anyone now using Drupal, which is a lot of people. 'Acquia' is a meaningless name, as is any brand name. The name becomes meaningful as people associate it with what it is or does.

I agree the UI is not ideal, and I agree about leaving it out of core for now, but for a different reason. Again, we could spend a lot of cycles trying to re-invent it without coming to any conclusions. Look at the very very very long process that Views went through to try to reinvent its UI, and you still have plenty of people who think it went the wrong way.

I like Nate's form builder, but CCK is not a form builder, it is a content creator, and that's not the same thing -- you select the type of fields you want, and based on the type of fields you select, you have some options about what kinds of widgets are appropriate and you add them to your content. You can't do it the other way around. You can't choose a select widget and then decide what kind of field it is. And you can't change your mind about the field type later. And you do need a way to change your mind about what widget to use with your field, but you will have to be limited by the other widgets that work with that field. And there are a number of field settings that you need to lock down and not let anyone change. All those things might not be that easy to do with the form builder. We need to see if Nate can find a way to make the form builder into a content builder that will work the way CCK needs to work.

In the meantime, the current system, however far from ideal it is, works. And the reason I would leave it out of core for now is so we can continue to change it as much as necessary until we find a better system.

But ultimately we need a UI in core, and core needs to know that its API will work right in a UI. Some of the comments in the core code now are to document why certain things must be done one way or another and how it impacts the UI. The comments will make more sense once there is a UI in core, but we need to keep that information in the documentation for perspective.

yched’s picture

Not sure I see the differences between Karen's and Barry's positions, but I agree :-)

Having a usable 'semi-straight forward-port from CCK D6' UI for the core Field API is essential for the initial acceptance of the Field API; it lets people start familiarizing themselves with the concepts, etc.
I'm also opposed to *this* UI entering core. I also don't like the current CCK UI. It just sucks less in D6 than in D5, but at the end of the D6 cycle it has been the main roadblock for new features:
- formatter settings (already supported by the D7 Field API, but not in the current UI)
- separate ordering (weights) and grouping (fieldgroups) for forms and for each display context

Fields are feature-rich. I'd say they are going to be the most complex and feature-rich thing in core. They need a UI with richer concepts and UI metaphors than what's currently needed (and thus offered) in core: drawers, AHAH subforms, popups, whatever. Not saying the future of a Fields UI is necessarily Views 2 -- but at least something in that direction...
I do not really see us finding the right feature-rich UI in time for a D7 release, *and* having it full-fledged and polished enough that it can be frozen for a whole core cycle...

kika’s picture

But what about removing these 5+ "CCK" references from the patch and taking this discussion to "Fields in Core" on g.d.o?

bjaspan’s picture

Status: Needs review » Reviewed & tested by the community
FileSize
322.82 KB

I believe this patch meets the criteria discussed in #21, meaning Dries is ready to commit it. Karen also managed to squeeze in some key fixes for default values and List and Option Widget modules, because she rocks.

So, setting this to RTBC. We know it is not perfect, but it is good enough to get started.

catch’s picture

While it doesn't stop us moving code around after commit, why is field.autoload.inc doing manual includes and calling private functions? I just applied the patch, grepped for "function field_attach_load_revision", ended up in autoload.inc, then had to schlep off to field.attach.inc to find the right function. Doesn't seem like anything which couldn't be handled by the registry instead.

KarenS’s picture

@catch, that is explained (somewhat) in #363967: Resolve auto-loading strategy.

bjaspan’s picture

Status: Reviewed & tested by the community » Needs work

I just discovered this change in the Field API patch that I do not understand. We should not commit the patch until we verify it is correct. Florian added it on 12/18/08 (during the sprint), bzr revision 229:

=== modified file 'modules/system/system.admin.inc'
--- modules/system/system.admin.inc	2009-01-27 01:00:13 +0000
+++ modules/system/system.admin.inc	2009-01-30 23:17:27 +0000
@@ -577,14 +577,7 @@ function system_modules($form_state = ar
   cache_clear_all('schema', 'cache');
   // Get current list of modules.
   $files = module_rebuild_cache();
-
-  // Remove hidden modules from display list.
-  foreach ($files as $filename => $file) {
-    if (!empty($file->info['hidden']) || !empty($file->info['required'])) {
-      unset($files[$filename]);
-    }
-  }
-
+  
   uasort($files, 'system_sort_modules_by_info_name');
 
   // If the modules form was submitted, then system_modules_submit() runs first
@@ -603,6 +596,10 @@ function system_modules($form_state = ar
 
   // Iterate through each of the modules.
   foreach ($files as $filename => $module) {
+    // Do not display hidden modules.
+    if (!empty($module->info['hidden']) || !empty($module->info['required'])) {
+      continue; 
+    }
     $extra = array();
     $extra['enabled'] = (bool) $module->status;
     // If this module requires other modules, add them to the array.

I note that Field module is now required, though it was not during the sprint. I do not know if this has anything to do with this patch.

KarenS’s picture

That's probably what was causing what I thought was a core bug that created all kinds of problems when non-required modules are dependent on required modules. I think this should not be in there, but maybe Florian can explain it.

bjaspan’s picture

I was wondering the same thing. In any case, no other required modules are ever listed as dependencies (e.g. no module lists node as a dependency), so there is no reason field.module should be listed as a dependency of anything.

Still, we need to figure out what is up with this system.module patch. I pinged Florian about it.

floretan’s picture

On the module administration page, hidden modules are removed from the general module array, but not from each module's dependencies. So if you set test_field.module (hidden) to be dependent on field.module, you get an undefined index error when trying to display the dependents of field.module (which is pulled out of the array containing all the module information, from which hidden modules are removed).

This was a quick fix for that error we were seeing, but I realize that it shouldn't be part of the "Fields in core" patch. Now that fields are required by default, it's not needed anyway. I reverted the changes from revision 229 and committed the revert to bzr.

bjaspan’s picture

Status: Needs work » Reviewed & tested by the community
FileSize
326.64 KB
sun’s picture

This patch contains a lot of coding-style flaws. I tried to work on them based on #34, but it seems that my work is obsolete already. :(

Please set this issue to CNW after committing, so we can fix the introduced issues.

bjaspan’s picture

@sun: Until this patch gets committed, you can make patches against it following the instructions at [#361849]. If you post a new full version of the patch with your own incremental changes, we will not easily be able to use it.

Once this patch gets applied, the normal d.o process will apply.

Dries’s picture

A couple of quick comments:

- It is not clear why "option widgets" is called "option widgets" instead of just "options". The other field modules do not add "widget" either.

- We sometimes write "optionwidgets" in user facing text (e.g. modules page). We shouldn't glue these words together. (I think we should drop the word 'widget'.)

Dries’s picture

I have decided that I will commit this patch Tuesday evening at 5pm CET (11am EST) -- assuming the patch applies then. This gives everyone 60 hours for more "nitpicking", and maybe we can get some suggestions in still.

KarenS’s picture

I'm open to changing 'Option_widgets' to 'Options'. I can make that change, but I'm first checking to see if anyone else sees a reason not to.

moshe weitzman’s picture

Very exciting. A few thoughts:

  1. Looks like _field_attach_load() maintains its own cache. Since this works for all fieldable entities, I would guess that #111127: Cache node_load is obsolete now. Agreed?
  2. It is bad sportsmanship that we are not yet supporting node_load_multiple(). We worked hard to get that into core. This could be a dealbreaker for Tuesday commit IMO. Yes, there is a comment about it in the code but that doesn't make it right - "// TODO D7 : to be consistent we might want to make hook_field_load() accept multiple objects too. ".
  3. // TODO : to my knowledge, no field module has any use for 'presave' on D6.
    +  // should we keep this ?

    Yes, we should. This is a very needed entry point. Remove the comment.

  4. This should be a drupal_alter() to be consistent with similar hooks on the node and page level: + foreach (module_implements('field_attach_view') as $module) {. I know that this would be slightly inconsistent with other field attach hooks, but that's how we usually handle the view operation.
  5. "leaving the garbage collection for a separate process". Could we elaborate or point people somewhere for more info?
  6. I'm not clear on what field.autoload.inc is for. The functions there look a lot like an avoidance of the auto-load feature of the code registry. We should never need to manually require_once() anymore except very early in the bootstrap. If this is needed, lets explain why in the @file comment.
  7. behaviors is a well-established term in our javascript code. Perhaps we should find a different term?
  8. // TODO: check that field_name < 32 chars.. Can we increase that limit? A bit harsh IMO.
  9. "Load a field instance.". Should be plural.
  10. typo: "build_nmode"
  11. There are some unfamiliar drupal_render properties in field_default_view(). Would be great to document the code flow during widget render in particular (perhaps field render as well). Might be in the patch already somewhere.
  12. +  // TODO : why do we need this ?
    +  $form['#cache'] = FALSE;
    

    Looks a bit troubling. If a form has fields it can't be cached? Is this AHAH related?

That's it for now -- only got through 33% of the patch!

yched’s picture

- cacheability: caching node_load would be more efficient; it also caches non-field additions -- and for now there are quite a few, since no core feature has been moved to 'fields' yet. Field API currently takes the best of both worlds: we let fieldable entities declare whether they implement their own cache (like node would). The field-level cache only triggers on entities that do not maintain their own cache.
This lowers the cost of being 'fieldable': you don't *need* to implement your own cache, but you can if you want.

- we *are* supporting multiple loads :-) field_attach_load() and field_sql_storage_load() are multiple.
The TODO comment is only about hook_field_attach_load(): the hook that lets field types alter the data that has been fetched from the db (in D6, mainly used by modules that handle their own storage completely: amazon field, computed field...).

- comment about 'presave' - we would have a 'free for all' hook_field_attach_presave(). The question is rather 'do we want a field-type hook hook_field_presave()'? I'm still not sold.

- drupal_alter instead of hook_field_attach_view() : yes, probably. I opened #367525: drupal_alter instead of hook_field_attach_view() for this.

- // TODO: check that field_name < 32 chars.. Can we increase that limit? A bit harsh IMO.
No, because of db limits on the length of column names. We generate db column names like [FIELD_NAME]_[value|format|whatever_the_field_type_exposes_as_a_'column']

- + $form['#cache'] = FALSE;
I honestly don't know / can't remember... Unless this rings a bell to somebody, I think the only way to find out is to remove it and see what breaks :-p. We're starting to have form tests now, but they currently don't test AHAH 'add more' behavior.
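To make yched's length constraint concrete, here is a sketch (the field and column names are illustrative only) of how the generated column names combine the field name with each column the field type exposes, which is why the field name itself must stay well under the database identifier limit:

```php
<?php
// Illustrative only: a field named 'field_body' whose type exposes
// 'value' and 'format' columns would yield these db column names.
// Capping field names at 32 chars leaves room for the suffix within
// typical db identifier limits (e.g. 64 chars in MySQL).
$field_name = 'field_body';
$db_columns = array();
foreach (array('value', 'format') as $column) {
  $db_columns[] = $field_name . '_' . $column;
}
// $db_columns is array('field_body_value', 'field_body_format').
```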

yched’s picture

- "Load a field instance.". Should be plural.
- typo: "build_nmode"

Fixed in bzr repo.

moshe weitzman’s picture

Thanks yched for the corrections and replies. Hopefully someone else will address the other ones. I just edited my post to use a numbered list for easier conversation. Got through some more of the patch:

  1. FIELD.MODULE
  2. field_add_more_js() accesses $_POST directly and looks really raw. I am no expert on ahah, but this smells bad.
  3. I'm not a fan of the name field.info.inc since we already have info files and they mean something different than field_info. But no big deal.
  4. +// $Id: field.module,v 1.8 2008/12/10 19:09:44 yched Exp $. I suppose these existing Id lines will be overwritten?
  5. +function field_init() {
    +  module_load_include('inc', 'field', 'field.crud');
    +  module_load_include('inc', 'field', 'field.autoload');
    +}

    There are module_load_include() calls in field.module and field.install which are likely not needed with the code registry.

  6. + * On order to keep marker rows in the database,. s/On/In
  7. field_filter_xss(): if this is really needed, it should be in filter.inc because non field modules have similar needs.
  8. + * The $variables array contains the following arguments:. s/arguments/keys. I would also remove the '$' before each key.
  9. $variables['page'] = (bool)menu_get_object();. This is node specific. Perhaps omit it.
  10. Large commented out block in FieldInfoTestCase class. Could we uncomment or remove or implement this?
  11. FIELD_SQL_STORAGE
  12. field_sql_storage.module is required?
  13. field_sql_storage_field_storage_write(). doxygen please.
  14. Lets omit history like
    + * Content module will set the weight, field name and delta values
    + * for each form element. This is a change from earlier CCK versions
    + * where the widget managed its own multiple values.
  15. OPTIONWIDGETS
  16. The comment above option_widgets_field_new() looks misplaced.
  17. typo: "add a remainder in the messages area."
  18. // Set #element_validate in a way that it will not wipe out other
    +  // validation functions already set by other modules.
    +  if (empty($element['#element_validate'])) {
    +    $element['#element_validate'] = array();
    +  }
    

    This happens over and over. IMO, it can be omitted. There is not even a NOTICE when creating an array from scratch using $example[] = 'foo'

  19. + * The default theme will display N/A for a radio list and blank for a select.. '- None -' instead of 'blank'.
  20. TEXT
  21. I don't see a trim() in theme_field_formatter_text_trimmed(). Might be happening elsewhere.
  22. I am not seeing '#text_format' key in the form element. See node_body_field() for an example. This is new for D7.
  23. FIELD.TPL.PHP
  24. + * - $node: The node object.. Only for node attached fields? Same for teaser and page.
  25. NODE.MODULE
  26. if (!empty($type->old_type) && $type->old_type != $type->type) {
    +      field_attach_rename_bundle($type->old_type, $type->type);
    +    }

    Could we move this to field_attach_node_type('update').

  27. Similarly, I see + field_attach_presave('node', $node); in node_save(). Any reason why field.module needs to go first and can't just implement hook_nodeapi_presave()? This mingling of field.module into node.module and user.module is a bit dirty. If it is an order of operations issue, lets give field module preferred treatment in node_invoke_nodeapi().
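Regarding item 18 above: PHP does create the array automatically, so the guard is indeed unnecessary. A quick demonstration (the validator name is made up):

```php
<?php
// Appending to an unset array key raises no notice; PHP creates
// the array on the fly.
$element = array();
$element['#element_validate'][] = 'mymodule_validate';
// $element['#element_validate'] is now array('mymodule_validate').
```
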
chx’s picture

About the field_sql_storage -- yes, the module is IMO required, and it'll change roles once one of the handler patches is in.

KarenS’s picture

#14 was already fixed in bzr (http://vcs.fourkitchens.com/drupal/7-fic/revision/540).

#16 the whole business of field_new() in the Optionwidgets code should be removed, that was in the original code and is no longer used. The form_alter there now belongs in CCK, not core. I'm fixing both of these in bzr repo.

Not having heard any response to the question of changing the name of Option widgets to 'Options', except for Dries's original suggestion, I'm planning to start making that change in the code.

yched’s picture

#26 / #27: that's not the direction we defined for the field_attach API (pull, not push): field.module knows nothing about nodes, users, etc., and as such implements no node or user hooks. The whole job of being 'fieldable' is left to the entities, mainly by calling the right field_attach_* function at the time that makes sense for them. Field module won't be able to say or do anything about contrib entities, yet they can be 'fieldable' if they call the right functions. Not a question of order (firing field stuff before regular nodeapi stuff doesn't really matter AFAICT), but a consistent direction in the API.
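The 'pull' model can be sketched as a hypothetical fieldable entity's save path. The entity type name and function here are made up, and the specific field_attach_*() calls are assumed from the ones discussed above (field_attach_presave() appears in node_save() in the patch):

```php
<?php
// Hypothetical fieldable entity: the entity module pulls the Field API
// in by calling field_attach_*() itself; field.module implements no
// entity hooks. 'myentity' and mymodule_save() are made-up names.
function mymodule_save($entity) {
  // Let field types massage their data before the entity is stored.
  field_attach_presave('myentity', $entity);

  // ... entity-specific storage logic goes here ...

  // Hand the attached field data to the field storage engine.
  field_attach_insert('myentity', $entity);
}
```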

catch’s picture

Agree with Moshe that hook_field_attach() should take the array of objects - most core object hook_$object_load() do this now (see hook_file_load() for a lower level example which is usually attached as a field), and in the case of files/uploads it already saves a lot of queries.

I still really, really dislike field.autoload.inc, despite reading the referenced threads, but I imagine that'll have to be de-uglied in a follow-up issue.

@yched - while it makes sense for Field API to operate like that, what's to stop node and user modules from implementing hook_nodeapi_* themselves to call the field_attach functions? (and taxonomy, for that matter). That would disentangle field loading from regular object loading. I guess it depends whether we see hook_nodeapi being deprecated at some point in favour of this.

I don't see any reference to materialized views in the patch - is there a followup issue for that?

On my Drupal 7 install, I can see:

List Defines list field types. Use with Optionwidgets to create selection lists.
Number Defines numeric field types.
Optionwidgets Defines selection, check box and radio button widgets for text and numeric fields.
Text

on admin/build/modules. Since we have no CCK UI going in with this patch, what are these doing there? We currently have zero core modules which expose no UI whatsoever; let's not introduce any new ones -- especially ones which in Drupal 6 would've given you a UI for adding fields of this type with CCK. That's going to cause a lot of confusion as it stands.

bjaspan’s picture

No one ever actually replied to Ken's comments in #16. So:

field_transpose_array_rows_cols(): I actually prefer the name as is (though I did not create it). "transpose" has a precise mathematical meaning which this function implements, whereas "flip_recursive" could mean anything.

filter_xss_admin() vs field_filter_xss(): My bet is that this got created in response to a SA. Karen or Yves?

field_format() argument list: A large number of field API functions accept arguments ($obj_type, $object, $field, $items, others...). The "others" list is rarely (never?) more than one or two arguments. So, to make the list shorter using a $params array we would logically have to move all four standard arguments into the $params array. Then, all Field API functions would just take a single $params array as an argument, which would result in very poor DX and a hard to use API.

field_access(): Yes, this will probably be de-opified. Field access is being left for a future patch.

Thank you, Ken!

bjaspan’s picture

re #47: Thanks, Moshe. Here are my replies.

47.1. Addressed.
47.2. hook_field_attach_load() may not even access the database. There is no reason to assume making it multiple will be much more efficient. That said, we plan to do it (#362024: Field Attach: Make hook_field_load() be 'multiple' like field_attach_load()), but I disagree that it should hold up this patch.
47.3. Removed.
47.4. I know nuthin' 'bout this.
47.5. Bulk deletion and garbage collection turns out to be a complex topic. I just created a separate issue for it: #367753: Bulk deletion. For now, I'll just say that no part of Drupal core handles bulk deletes in an acceptable manner so the fact that Field API has not yet implemented it either is again not a reason to hold up this patch. :-)
47.6. field.autoload.inc is necessary because the code registry is only useful for callbacks from menu, theme, etc. It is useless for API functions. This is addressed in #363967: Resolve auto-loading strategy and can be changed/removed later if necessary.
47.7. I can't think of a better term at the moment. Anyone?
47.8. Addressed.
47.9. Addressed.
47.10. Addressed.
47.11. I know nuthin' 'bout this.
47.12. I know nuthin' 'bout this.

I will also strike out those items which I think are addressed/resolved. You or others can remove the strikeouts if you disagree.

webchick’s picture

Can we please get a new copy of the patch to review now that a bunch of the previous comments have been incorporated?

bjaspan’s picture

Responding to catch in #54 (I can't handle Moshe's excellent but long list of comments tonight):

We all agree hook_field_attach_load() should be multiple and there is a separate issue for that. Not critical.

Regarding the debate about field.autoload.inc... I sense a new episode of The DX Files approaching.

Absolutely nothing prevents node.module from implementing its own hook_nodeapi hooks that call Field Attach API functions. However, if Field Attach API is not called before other hook_nodeapi hooks, then the other hook_nodeapi hooks won't be able to see the field data. Does that matter? Don't know. node.module can make that decision.

Materialized Views is not going in with the initial Field API patch.

Regarding the List field type and Option widgets, the fact that no CCK UI exists doesn't matter. A module can still create a field using the List field type and Option widgets, just as it can create a field using the Text field type and the textfield widget.

The point about Field API being the first module with zero UI in core is a good one. Perhaps Field API should not exist in module form at all and should just be a collection of .inc files in the includes/field directory. I recall that we had a reason to make it a module... but I cannot remember what that reason was. I do observe that since it is a required module it is invisible on the admin/build/modules page, so I'm not sure whether actually being a module has any downside.

We also discussed having all the field type modules in core (text, number, options, possibly node_ and user_reference) also be required so they are always available. I still think this is a good idea. When a new D7 user tries to use this funky "create your own content type" functionality, they should not first have to go enable extra modules to get basic things like text and number fields.

bjaspan’s picture

@dries: In #44 you ask about Option Widgets vs. Options. The reason Option Widgets is named differently from the other field modules is that Option Widgets only exports widget types, whereas the others export field types and widgets types. Perhaps this is a weak reason.

In my previous comment I suggested that all core field and widget modules should be required == always enabled == hidden on the modules page, at which point it doesn't even matter to the user what they are called because they are invisible.

eaton’s picture

I'm tempted to commit the Fields API patch as is.

I've weighed in on the architectural issues I have with this patch in other locations, and I'll try to lay out my concerns with it as simply as possible for the benefit of those who didn't participate in the discussions on groups.drupal.org.

I don't have a deep enough knowledge of all the details of the patch (due primarily to its size), save the one I have focused on: the storage mechanism. Simply put, the one-table-per-field storage mechanism implemented in this patch is unacceptable for inclusion in the shipping version of Drupal core. chx and Barry Jaspan have noted that the implementation of it (the code that determines what actual physical tables are created, altered, and written to when fields are managed by administrators) is swappable. The current mechanism is easier to implement than the one that today's CCK uses, so if it is an interim 'shim' to get a working version of the patch committed while the final storage mechanism is completed, that's one thing. But unless we add that strict caveat, the patch constitutes a crippling of one of Drupal's most important pieces of functionality.

Why?
A move to per-field table storage rather than the normalized storage currently used by CCK ensures that performance will suffer, and suffer greatly. In extremely simple early testing (adding just four simple fields to a content type) the table-per-field approach is ~65% slower than today's CCK, and that number gets worse with each field that's added. The performance hit occurs on reads, writes, and updates. It increases the risk of data corruption in non-transactional engines given that every single node save will be spread over a much larger number of writes.

No benchmarks have been posted for this patch since the original ones on groups.drupal.org over a month ago, and those early numbers should be a point of grave concern.

Two solutions to this problem have been proposed: one is the addition of a new large-scale API for mirroring the results of cross-table selects (like all the fields for a node) in a single secondary table. This is a useful technique for spot optimization (drupal.org's "recent posts" tracker has benefited from it), but it is not a universal solution to solve a systemic, large-scale slowdown. It also comes with its own nontrivial overhead and tradeoffs (data latency, more danger of data corruption on non-transactional DBs, and so on). The second proposed solution is a smaller, lighter-weight module that would implement a similar solution, optimized only for custom fields. Neither solution is, as far as I can tell, included in this patch.

Both solutions, while extremely useful in certain case-by-case performance optimization scenarios, are not solutions to the underlying dangers of putting each field in its own table. Fortunately there is a solution that is accounted for in this patch: the code that handles mapping field metadata to discrete database tables, and generating the SchemaAPI definitions for them, is pluggable. Again, according to Barry Jaspan and chx, it is possible to use the current patch as a stake in the ground, then replace it with a smarter field-to-table mapping engine, the same way FormAPI was committed to core before all core modules were ported to it.

In many ways, it has been reviewed already, and it is too big to start nitpicking. I'm thinking that this time, we can refine it _after_ it has landed.

If by refine you mean implementing a different field to schema mapping engine and replacing the current version with it, I'm willing to accept that the patch as it currently stands is an incomplete interim solution that gives us a stake in the ground, even if it's unacceptable for a shipping Drupal 7.

Otherwise -- if we consider the one-table-per-field storage mechanism our new official architecture, I must voice my strenuous objection to this patch being committed. This isn't a slight on the awesome work that's gone into it, just grave concern about the ramifications of one specific component.

eaton’s picture

Status: Reviewed & tested by the community » Needs review

I'm setting this back to 'code needs review.' I would like to set it back to 'Code needs work', but that depends on the 'shim versus final' status of the table-per-field approach.

webchick’s picture

Status: Needs review » Reviewed & tested by the community

I can post some anecdotal evidence about what Eaton is saying...

Each time that we do a Lullabot training gig and show off CCK, the very first question raised by someone (usually with a rather smug smile on their face since they're some Oracle DBA with 20 years of Java experience or whatever ;)) is "Yeah, but what is that doing behind the scenes in the database, hmmm?"

When we go into PHPMyAdmin and show them that it is creating one table per content type with one column per field by default, except in the case of shared/multiple-value fields where it will auto-normalize the schema, they sit back down, impressed. We've actually had DBAs at major companies sign off on Drupal after learning this about it.

This reaction will go away if we go with the each-field-in-its-own-table approach. Especially when they learn that the only way to reach anything near previous performance levels is to move a bunch of DB logic to the application layer.

Is there a reason we can't have Barry's D5/D6 CCK facsimile storage the default, along with materialized views of that data, for a boost of performance? Seems like that'd be the best of both worlds.

webchick’s picture

Status: Reviewed & tested by the community » Needs review

Oops. Didn't mean to reset status.

eaton’s picture

Is there a reason we can't have Barry's D5/D6 CCK facsimile storage the default, along with materialized views of that data, for a boost of performance? Seems like that'd be the best of both worlds.

Barry's module -- per_bundle_fields -- would implement a mirror of the current CCK field structure on top of this patch's table-per-field mechanism. It's much lighter weight than the materialized views proposal, and doesn't aspire to quite as much flexibility as the full Materialized Views system. This makes it easier to analyze and test, but we would still face all of the performance problems we do today in CCK, with the addition of the 'extra queries on every insert and update, doubled data storage, and the risk of corruption' problems.

Ultimately, the addition of a duplicated layer of data on top of table-per-field storage is a side discussion. Materialized views can be implemented on top of today's CCK storage style too: it doesn't require that we decompose our tables into per-field storage.

The underlying question is, should we store each field in its own table, or should we maintain one table for each 'entity' (a node type, a user role, etc) that has a particular group of fields attached to it. Four years ago the original CCK architects opted for the latter, and while the administrative tools atop CCK took much longer to come to fruition, we are still benefiting from their original choice of storage strategies. Per-field tables is a return to one of the fundamental design problems of Flexinode.

walkah’s picture

Having just spent dinner discussing this issue with Eaton (prior to his comment) and having not said much of anything publicly about it at all, I thought I'd throw $0.02 at it:

One of the major concerns here is it drastically changes the nature of Drupal. Traditionally, we have worked very hard to ensure that Drupal works well "out of the box" in a very wide array of environments, while allowing hooks/etc for those working at extreme ends of the scale (witness: swappable cache.inc or sqlite support). What the default storage model here does is mean that for *most* use-cases Drupal won't work out of the box. (Add more than 4 fields to your content and things slow to a crawl? Enable materialized views on my mom's garden club site?)

Of all the issues I've seen/heard/dealt with in CCK over the years - the dancing to keep data normalized is actually, imo, one of its strengths. It's something that "we" spent a lot of time *four* years ago benchmarking.

I share the aforementioned grave concerns with this patch as-is.

webchick’s picture

Materialized views can be implemented on top of today's CCK storage style too: it doesn't require that we decompose our tables into per-field storage.

Just for clarity's sake, that's what I meant. :)

eaton’s picture

Also, a point of clarification about my "Which way should we store things?" question.

If the answer is, "Yes, we should store things per-entity, but there was not time to implement that in the sprint" I will say that the current approach is an excellent temporary shim to allow the APIs and the system to be tested in core while more efficient storage is readied for Drupal 7's "release". If the answer is, "No, we should store each field in its own table because that's the right way to do it," the community needs to have that discussion and settle it explicitly, the same way it did 4 years ago when the question was first posed.

We are very fortunate that the approach is a swappable implementation, not unlike the temporary 'compatibility' inc file that formapi included during the early days of the patch to keep core working while the conversion work was completed.

eaton’s picture

Point of note: early versions of the discussions mentioned in this thread were captured at:

http://drupal.org/cck-status
http://drupal.org/node/17302

The decisions captured in the two above posts are actually very similar to the approach taken in the current patch: the table-per-field approach was abandoned by the CCK team for the reasons that have been raised in this post. We will be condemning ourselves to the same cycle of experimentation and correction that they already endured. Let's learn from the work that's already gone into CCK.

David Strauss’s picture

Before you brag too much about impressing Oracle DBAs, did you show them what CCK does when you reconfigure a field to be shared or allow multiple values? Did you mention how it not only dangerously moves canonical data around but also breaks existing production code without warning? Did you show them that, for many common uses of CCK (read: Views), a temporary table and filesort are nearly inevitable given the typical JOINs and ORDER BY/WHERE conditions? In any case, I'm not impressed with arguments that hinge on impressing others in demonstrations. Drupal's merits are proven in demanding production environments.

Barry's contributed module imposes minor additional overhead on each node save. Because his tables are per-bundle, the maximum overhead is one additional write per bundle (and node types each have one associated bundle). That means one additional write per node save. Tracker 2 added two additional writes per node save *and* per comment save on Drupal.org, and we've hardly seen appreciable write delays or replication problems because of it. Moreover, Barry's patch allows Drupal to maintain a data store that is more reliable (no canonical data in flight), easier to code against (no complex API for knowing what fields are in what table), and implemented with lean code (fields *combined with* Barry's module is less code than the old approach).

I'm not convinced that the current Field API implementation -- when combined with Barry's module -- has any notable disadvantage over the current CCK approach.

David Strauss’s picture

In response to #68, the conclusion in http://drupal.org/node/17302 was to go with "normalized database tables with node caching." That is, table per field with something not unlike Barry's work (if not a higher level object cache).

eaton’s picture

Before you brag too much about impressing Oracle DBAs, did you show them what CCK does when you reconfigure a field to be shared or allow multiple values?

Yes. We did. They said that only an idiot would try to do that without treating it as a fundamental data migration task, as serious as bulk importing or exporting. I would not have put it as strongly as they did, but we did show them, told them the implications, and they considered it a non-issue for that reason.

For the record, I have no problem eliminating the ability to magically convert single-value fields to multi-value fields via the CCK UI. We don't allow users to convert number fields to text fields, or node reference fields to user reference fields. Single vs. multi-value and shared vs. not shared is just as significant of a decision and should be handled the same way, IMO.

Did you show them that, for many common uses of CCK (read: Views), a temporary table and filesort are nearly inevitable given the typical JOINs and ORDER BY/WHERE conditions?

Yes, and this patch will not solve that problem. It makes it worse, as the sprint team's own benchmarks demonstrated. Both your Materialized Views module and Barry's Per Bundle Storage module work to solve the temp table/filesort issue, but the approach they take does not require table-per-field storage, so this objection is a red herring.

Barry's contributed module imposes minor additional overhead to each node save. Because his tables are per-bundle, the maximum overhead is one additional write per bundle (and node types each have one associated bundle).

That's a deliberate misrepresentation, David. It is only one query more than this patch, but it requires considerably more queries than today's CCK. For a content type with, say, three fields, it would require three extra queries beyond what CCK currently imposes. For a content type with ten fields, it would require ten more queries than CCK. For a content type with twenty fields, twenty additional queries would be required... And so on. That is not an unreasonable number of fields: many of our clients, the same high-performance sites you refer to, need that many fields to capture the metadata on their complex content.

Decomposing per-entity storage into per-field storage is, at the end of the day, not necessary to eliminate the dangerous code you warn about: the red flags you mention above all relate to single-to-multivalue migration code in CCK. Simplifying CCK's code while preserving that one feature has come at the cost of performance. In addition, both 'mirror the data in a denormalized table' approaches face similar problems to today's CCK when you change a single-value field to a multi-value field. When field meta-data changes, the 'mirror' table must be altered and all of its data updated in the same way that today's CCK schemas are altered. The code to do that will have to be written, and will have to live somewhere -- pushing it up the stack to a layer of data-mirroring doesn't make it free.

In any case, I'm not impressed with arguments that hinge on impressing others in demonstrations. Drupal's merits are proven in demanding production environments.

You're deliberately ignoring substantive arguments in favor of snide responses to a single anecdote.

I would like to reiterate my earlier statement: this Fields in Core patch is a tremendously valuable piece of work, and the move to entity-neutral fields (users, comments, nodes, etc can all have fields) is a tremendous advance. The one exception, and it is a serious one, is the move to table-per-field storage. Because the Field Storage SQL module is swappable, per-entity storage can be implemented later, replacing the table-per-field approach before Drupal 7 ships. However, if you believe that should not be done, and would oppose it, I must oppose this patch.

eaton’s picture

In response to #68, the conclusion in http://drupal.org/node/17302 was to go with "normalized database tables with node caching." That is, table per field with something not unlike Barry's work (if not a higher level object cache).

That's correct. As I noted in #68, they decided on a storage model similar to what this patch, plus Barry's module, implements.

As I also noted in #68, Jonbob, John VanDyk, and the rest of the CCK team proceeded down that path, found it unacceptable once it had been implemented, and explicitly refactored CCK to use the current per-entity storage model. At the very least, that should give us pause.

markus_petrux’s picture

I would like to say that I agree with Eaton here, so I will not repeat the same points.

Is there any reason for not building MV, PBS or whatever on top of current (CCK2) model?

merlinofchaos’s picture

I've been largely staying out of this because when I got dragged into this the first time it went pretty badly, but I want to make sure that it's well known that I solidly agree with Eaton on this one.

David Strauss’s picture

In response to #73, I have a reluctance to use a data model that requires refactoring the data when reconfiguring fields. I'd rather accept the burden of additional writes to the database than use a data model that changes during what appears to a user as a simple reconfiguration. Some of my largest clients have accidentally broken production code through inadvertent field reconfiguration. Either we need to make field reconfiguration more "scary" to users or fix the underlying reason field reconfiguration is scary. I'd rather do the latter, and Field API addresses that concern.

I see the merits of Eaton's latest post (#71), but I'd still opt for pushing the current patch into core now. Waiting for a Field API that pleases everyone at every level will further lengthen the time core and contrib have to take advantage of newfound Field API capabilities. And if Field API should migrate to canonical per-bundle storage, we would have to implement an impressively robust migration architecture. The current CCK conversion code is not sufficient. Barry has lofty intentions here, but they are rather difficult to implement.

markus_petrux’s picture

I would say the fact that users can break the DB is not a reason for implementing a model that's not as efficient as the one that's been working to date. Rather, I would introduce whatever is needed to avoid the problem. For example, additional checks could be implemented to ensure certain operations cannot be done, or maybe even add some kind of batch handlers to deal with DB changes that cannot be resolved directly from the 'Manage fields' page (ALTER TABLEs affecting thousands, millions of rows may take a lot), etc.

eaton’s picture

FileSize
51.48 KB
53.56 KB
29.43 KB
25.83 KB

For those who aren't deeply immersed in CCK's innards, I've prepared a couple of illustrations that show the current CCK storage mechanism, the table-per-field mechanism implemented by this patch, and the methods proposed for both Barry's 'per bundle storage' module and David's Materialized Views module. The latter two are not part of this patch but have been proposed as possible solutions to the serious performance issues.

CCK Today [link]
This illustration shows the current storage mechanism in CCK 2.0 for Drupal 6. Each node type gets its own table to store the data for its custom fields. Two kinds of fields (multiple-value fields and fields that are shared between content types) can't be lumped into these combined tables, however. CCK has problems in two specific situations:

  1. An administrator decides to make a normal field like [Custom Field 1] into [1:N Field] or [Common Field 1]. CCK not only creates a new database table, it has to copy all of the old data from the old [Custom Field 1] into the new table, then delete it from the combined table.
  2. An administrator defines a query that requires the database to sort the results using the values in [Entity 1] and [Custom Field 1]. Because those two values are in separate tables, there is no way to create a speed-boosting index to accelerate this query. The more content the site gets, the slower and slower it will become. We see this when we create a view that sorts nodes by creation date and a CCK field. The real pain begins when it also needs to sort on [Common Field 1]; the more data is split up across tables, the more punishing those sorting scenarios become.
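
To make the second gotcha concrete, here is a simplified SQL sketch of this layout. The table and column names follow CCK's D6 naming conventions but are invented for illustration, and the schemas are heavily trimmed:

```sql
-- One combined table per content type holds its single-value, unshared fields:
CREATE TABLE content_type_story (
  nid INT NOT NULL,
  vid INT NOT NULL,
  field_subtitle_value VARCHAR(255),
  field_rating_value INT,
  PRIMARY KEY (vid)
);

-- Multi-value or shared fields each get their own table:
CREATE TABLE content_field_tags (
  nid INT NOT NULL,
  vid INT NOT NULL,
  delta INT NOT NULL,           -- position within the multi-value list
  field_tags_value VARCHAR(255),
  PRIMARY KEY (vid, delta)
);

-- The problem case: sorting on a node column and a field column at once.
-- No single index can cover an ORDER BY that spans two tables, so the
-- database falls back to a temporary table and filesort as content grows.
SELECT n.nid, n.created, s.field_rating_value
FROM node n
JOIN content_type_story s ON s.vid = n.vid
ORDER BY n.created DESC, s.field_rating_value DESC;
```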

The Fields in Core patch [link]
This illustration demonstrates the way data is stored with the Fields in Core patch. Every field gets its own separate table hanging off of the entity it's attached to. This makes turning [Custom Field 1] into [1:N Field] or [Common Field 1] really easy, because they are all stored in the same way already. Unfortunately, data is now stored in even more tables, requiring more inserts, more updates, and more joins to assemble all the data for a given entity like a node or a user.

This means that the sorting issues we discussed above are amplified: there is absolutely no way to create an index that speeds up sorting or filtering based on several fields at the same time, because they are always split into separate tables. We've traded the "changing single fields to multivalue fields" problem for the "queries are now punishingly, brutally slow" problem that was noted in the earlier discussions on groups.drupal.org. A content type with 4 fields, for example, was 65% slower than today's CCK. This is the way the patch, as it appears in this thread, operates.
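
For comparison, a hypothetical sketch of the table-per-field layout. All names and columns are invented for illustration; the actual schema generated by the patch may differ:

```sql
-- Every field lives in its own table, already shaped like a multi-value field:
CREATE TABLE field_data_subtitle (
  etid INT NOT NULL,            -- which entity type (node, user, ...)
  eid INT NOT NULL,             -- which entity
  delta INT NOT NULL,
  subtitle_value VARCHAR(255),
  PRIMARY KEY (etid, eid, delta)
);
CREATE TABLE field_data_rating (
  etid INT NOT NULL,
  eid INT NOT NULL,
  delta INT NOT NULL,
  rating_value INT,
  PRIMARY KEY (etid, eid, delta)
);

-- Loading one node now means one join (or one query) per field, and saving
-- it means one write per field -- and no index can span the tables:
SELECT n.nid, s.subtitle_value, r.rating_value
FROM node n
LEFT JOIN field_data_subtitle s ON s.etid = 1 AND s.eid = n.nid
LEFT JOIN field_data_rating r ON r.etid = 1 AND r.eid = n.nid
WHERE n.nid = 42;
```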

The Fields in Core patch, plus Per-Bundle-Storage [link]
As David Strauss mentioned above, Barry Jaspan wrote a module called Per Bundle Storage. This illustration demonstrates how it works. Fields are still stored in their own tables, as with the second illustration, and these per-field tables are used whenever data is created or updated. The Per Bundle Storage module also creates an additional table, however: it mirrors the storage style of Today's CCK, and is kept in sync by inserting, updating, and deleting whenever entities like nodes and users are saved.

As long as other modules use the Bundle table when running their select queries, select performance would be equal to that of today's CCK. However, inserts and updates would need to be made against the Bundle table as well as all of the individual field-specific tables. If other code wrote directly to the field-specific tables, or wrote directly to the Bundle table, the data would fall out of sync. In addition, this approach would suffer from the same two problems noted in the CCK Today description: sorts that touch both the node table and the bundle table would still be un-indexable, and changes to the field definitions that require data migration in CCK today would require the same tricky data migration in the bundle table. It's possible that the entire bundle table could be cleared out and regenerated from scratch when schema changes are made, but this would simplify the code at the expense of site uptime: because all modules would be doing their selects against the bundle table, views and other listing pages would have no data until the Bundle table was rebuilt from the individual tables.
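
Sketched as SQL (all table and column names invented for illustration), a single node save under this scheme would look roughly like:

```sql
-- One write per field into the canonical per-field tables:
INSERT INTO field_data_subtitle (etid, eid, delta, subtitle_value)
VALUES (1, 42, 0, 'A subtitle');
INSERT INTO field_data_rating (etid, eid, delta, rating_value)
VALUES (1, 42, 0, 5);

-- Plus one write to keep the combined bundle mirror in sync; if anything
-- writes to one side without the other, the two copies drift apart:
INSERT INTO bundle_story (etid, eid, subtitle_value, rating_value)
VALUES (1, 42, 'A subtitle', 5);
```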

The Fields in Core patch, plus Materialized Views [link]
This fourth approach is not implemented yet, but has been proposed in IRC discussions and groups.drupal.org. The Materialized Views module, which was present in the bzr repository from the fields in core sprint but not in the final .patch file here in the issue queue, provides an API on which this type of system could be built.

How does it work? It's a more aggressive version of the Per Bundle Storage concept; it can combine the fields native to a given entity (like a user or a node) along with the custom fields that are attached to it into a single flat table. This eliminates the indexed sorting problems that plague all three approaches above, with the lone exception of the [1:N Field], which still has to live in its own table. The Materialized Views module supports even more aggressive approaches; a single Materialized View table could hold node data, node revision data, all a node's (single-value) field data, the comment statistics for the node, the user account information for the node's author, and more. (This is the approach used on drupal.org to speed up the tracker page -- work David Strauss spearheaded to great effect).

What's the downside? Like the Per-Bundle-Storage illustration above, Materialized Views require that everything be kept carefully in sync. The more data that's kept in the flattened table, the more work has to be done to keep it up to date and the more frequently it has to be synced. Managing the rules that trigger the updates and ensuring that they happen in a timely fashion is not a simple programming task. In addition, as with Per-Bundle-Storage, any time the underlying field structure changes, the entire materialized table must be altered and rebuilt. This is a time-consuming operation, more than equivalent to the time taken by the 'Alter the schema' gotcha described in the CCK Today section. For some sites it is still a huge performance win, but it doesn't come for free by any means.

The Upshot
CCK today is as optimized as it can get without altering the node table itself. However, it has some serious gotchas when building common views with large data sets, and migrating data from single-value to multi-value fields is a big task. The Fields in Core patch as it stands simplifies CCK's code by storing everything in its own table, as if it were a multi-value field already. It's a big win for code simplicity, at the cost of amplifying the performance problems dramatically. The Per Bundle Storage and Materialized Views proposals would implement additional tables on top of the Fields in Core structure to allow more efficient select queries, but would add more work on inserts and updates. Also, neither is part of the patch as it currently exists.

merlinofchaos’s picture

Point of note: I believe the requirement for data migration is being overstated. I don't believe this should be a requirement.

eaton’s picture

I see the merits of Eaton's latest post (#71), but I'd still opt for pushing the current patch into core now.

Thanks for noting that, David -- I really appreciate it and I don't want any of the conversation to appear angry or heated. I'm just concerned about the implications and don't want to let it slide without things being ironed out and discussed clearly.

As long as we consider "canonical per-bundle storage" a requirement for core that can be developed after this patch is committed, rather than a pie-in-the-sky future dream, I'm perfectly fine with the patch going in as it currently stands so that we can flush out bugs, module developers can start working against the API, and so on. The advantage of the patch's current implementation is that we can swap out Field Storage SQL module for the 'Today's CCK' storage style without breaking modules written against its API.

And if Field API should migrate to canonical per-bundle storage, we would have to implement an impressively robust migration architecture.

If Drupal 7 ships with canonical per-bundle storage as its default, I'm not sure why this would be any more daunting than per-field storage. Today's CCK uses per-bundle storage: converting legacy Drupal 6 CCK data to this patch's per-field storage would require just as much migration code, no?

dww’s picture

Eaton++

I haven't had time (and don't now) to closely review the code here, but it's obvious that it is a tremendous accomplishment and an extremely valuable contribution. I support committing something now to make it easier for the rest of our development process to improve it, and most of what's been described here is a huge win for core.

However, if table-per-field is not just a temporary placeholder, but the "it should now and forever work like this" position, I too must be opposed to this going in as-is. If a simple site with relatively modest custom field requirements will need materialized views to prevent grinding to a halt, we're moving in the wrong direction.

The objections about CCK's current data migration behavior potentially causing havoc are arguments about the current CCK UI, not its storage mechanism.

markus_petrux’s picture

Followup to #77 / CCK Today / 1:

An administrator decides to make a normal field like [Custom Field 1] into [1:N Field] or [Common Field 1]. CCK not only creates a new database table, it has to copy all of the old data from the old [Custom Field 1] into the new table, then delete it from the combined table.

I would say that's what one would have to do if it was not CCK but a custom application. It's a general issue when changing DB designs.

To avoid problems when tables are big, maybe this could be approached by generating the instructions to perform the DB changes that can be resolved using some kind of batch processing.

Dries’s picture

@dries: In #44 you ask about Option Widgets vs. Options. The reason Option Widgets is named differently from the other field modules is that Option Widgets only exports widget types, whereas the others export field types and widgets types. Perhaps this is a weak reason.

I'd prefer to get rid of the 'widget(s)' part in both the module name and the end-user facing name. I think it is a weak reason -- I don't think the end-user needs to deal with those semantic differences. We can simply document these in the module.

In my previous comment I suggested that all core field and widget modules should be required == always enabled == hidden on the modules page, at which point it doesn't even matter to the user what they are called because they are invisible.

I agree. They should be required and hidden, IMO. I still think we need to clean up the module name.

catch’s picture

@bjaspan's #58
I realise materialized views isn't in this patch, hence asking where the follow up issue is. Storage is the primary concern people have with this, and while I personally think either the per-bundle storage module or materialized views are promising if they're available to any average site admin (i.e., in core, and with support for both the most common core queries and Views by default, despite the extra writes involved), their absence from the patch means we're entirely reliant on future issues to resolve this, and it'd be good to have one to point to. Plus the last time I discussed the per-bundle storage module with bjaspan in irc, he said there were no plans to have it in core - which runs the risk of leaving average site builders in the lurch, since materialized views in core, along with sensible defaults and built-in Views support is a big undertaking in itself.

In terms of the text/options modules - yes, let's make them required and hidden. edit: cross-posted with Dries, so +1 then.

chx’s picture

Quite a conundrum. One thing that somehow has not been mentioned lately: fields are not updateable http://groups.drupal.org/node/17758. The very definition of field vs instance is that the things you want to change go into the instance. A contrib module can be written to update, of course.

With that said, I think the easy solution, to ease the fears of everyone is to make a pledge that we will put the per bundle storage module into core as a followup. Barry, is there anything that would stop this?

moshe weitzman’s picture

Basically, we need to hear from KarenS and/or yched on this. They are the ones who recommended per field storage based on pages and pages of issues, and nights and nights of pain with per bundle storage. yched even built a batch API in order to support the data flight needs of per bundle, and he still prefers per field. None of us are as informed as them about the perils of per bundle. At the sprint, they preferred per field and made a good argument for it. I think some folks are idealizing per bundle and D6 CCK.

Also, we will probably wrap node_save() in a transaction in D7, thus eliminating worries that "data could get corrupted due to many writes".
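The transaction point can be sketched in a few lines. This is a hypothetical illustration (invented table names, SQLite standing in for Drupal's database layer): if all of a node's per-field INSERTs run inside one transaction, a failure partway through rolls everything back, so no half-written node is left behind.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE field_data_body (nid INTEGER, value TEXT)")
conn.execute("CREATE TABLE field_data_teaser (nid INTEGER, value TEXT)")

def save_node(conn, nid, fields):
    """Write every field row atomically; any failure rolls back all of them."""
    try:
        # Using the connection as a context manager wraps the writes in one
        # transaction: commit on success, rollback on any exception.
        with conn:
            for table, value in fields:
                # Table names cannot be bound parameters; interpolation here is
                # acceptable only because this is an illustration.
                conn.execute("INSERT INTO %s (nid, value) VALUES (?, ?)" % table,
                             (nid, value))
    except sqlite3.Error:
        return False
    return True

ok = save_node(conn, 1, [("field_data_body", "hello"), ("field_data_teaser", "hi")])
# The second field targets a missing table, so the whole node save rolls back.
bad = save_node(conn, 2, [("field_data_body", "x"), ("no_such_table", "y")])
```

After the failed save, node 2 has no rows in any field table, which is exactly the guarantee multi-table per-field storage needs.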

bjaspan’s picture

re #85:

Fields are partially updateable, though field_update_field() is not yet implemented because even *with* per-field storage there are complexities we have yet to fully resolve. We're close, though.

Putting per-bundle storage into core would not address eaton's concerns.

re #60-84:

I'll respond later today.

eaton’s picture

None of us are as informed as them about the perils of per bundle.

I'm eager to hear Karen and yched's thoughts on this. I want to note, though, that the tangly Batch-API driven migration code you refer to is required even if we use per-field storage and materialized views. It will live in the Materialized Views module instead of the Fields module, but it will have to exist as long as we allow people to change single value fields to multi-value fields, or to turn one-entity fields into shared fields.

We are not saving ourselves the work of writing and maintaining that code by moving to per-field storage, just punting that work to the optimization layer, making things many times slower in the process.

I stand by the assertion that converting single value fields to multi-value fields should not be supported in the API or the UI. It's a task for an enterprising third-party, and should be treated as a data migration task (moving one field to a new field) rather than a settings change.

chx’s picture

Because the Field Storage SQL module is swappable, per-entity storage can be implemented later, replacing the table-per-field approach before Drupal 7 ships. However, if you believe that should not be done, and would oppose it, I must oppose this patch.

This is why I think Eaton would be fine with per bundle storage.

eaton’s picture

This is why I think Eaton would be fine with per bundle storage.

If the native storage schema provided by the field_storage_sql module is per-bundle rather than per-field, yes. And I'm even okay with per-field storage as a temporary shim that lets the API go into core while we implement per-bundle "canonical storage", as David Strauss says.

That would allow materialized views as a spot performance optimization on top of CCK's schema on high-traffic sites, but would preserve the reasonable "native" performance of today's CCK for most sites.

chx’s picture

It would seem that committing Per Bundle Storage and #363787: Plugins: Swappable subsystems for core, then setting up a handler that stores into per-field storage when a field has no set maximum multiplicity and into per-bundle storage otherwise, would be a performant solution. Now we just need opinions from Eaton, Barry, Karen and yched on this plan.

Edit: to make it absolutely clear, my solution is to store in PBS only or per field only, but never both. When an upgrade system exists, it will need to batch anyway, and it can use the storage API, happily oblivious to how fields are stored. Introducing the tables to Views will be a bit more work, but CCK mostly has this already.
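The routing rule described above could look something like the sketch below. It is purely illustrative: the function name and field-array keys are invented, and it assumes unlimited cardinality is represented by -1 (as the later FIELD_CARDINALITY_UNLIMITED constant does).

```python
# Hypothetical dispatch for the hybrid plan: unlimited-cardinality (or shared)
# fields get their own table, everything else goes into the bundle table.
UNLIMITED = -1  # assumed sentinel for "no set maximum", as in FIELD_CARDINALITY_UNLIMITED

def storage_for(field):
    """Pick a storage backend name for a field definition (illustrative only)."""
    if field.get("cardinality", 1) == UNLIMITED or field.get("shared", False):
        return "per_field"
    return "per_bundle"
```

Because the decision depends only on the field definition, both backends stay invisible behind the storage API, which is the point of making it pluggable.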

eaton’s picture

I concur. To clarify, implementing what chx describes can be done just by changing the field_storage_sql module. No additional layers on top of it would be needed to preserve CCK's current performance profile. Materialized Views could still be added on top for high-traffic sites that need hot spot optimization.

Ideally, CCK's current approach (non-shared single-value fields in one table per entity, shared and multi-value fields in their own tables) would be a followup patch to this one. Future improvements are also possible, like rolling 'limited count multi-value fields' into the combined storage table as well, but the key point is making this combined table the native storage mechanism, not just a layer of mirrored data on top.

With that as a required followup patch, I would wholeheartedly support this patch, and am willing to help implement the followup patch.

KarenS’s picture

Actually, re #86, that makes it sound like the decision to go with per-field storage came from me or Yves. When we sat at lunch, went around the table, and voted on our preferred solutions, I said my preference was per-field storage for multiple/shared fields and per-bundle for the others. To alleviate the problems of changing that schema once the tables were created, I would have users decide at field-creation time whether the field was potentially sharable (and if so, create it as a separate table from the beginning) or not (and put it in the bundle table). As I recall, the argument from everyone else was that this wouldn't allow people to change their minds later, and that that was a deal-breaker. Only later, when David said he thought MV would provide a way to do per-field storage without the performance problems, did I agree to that idea.

There are lots of wins for per-field storage: it's not just that it makes the API simpler, it also makes it trivial to change most aspects of a field, because the field is always stored the same way no matter what. You can decide later to share it, make it multiple, change it to non-multiple, or whatever, with absolutely no change in the way it is stored. It is absolutely predictable. You don't have to figure out a way to tell where the data is stored; it is always in the same place. And because of those things, there is virtually no way you will accidentally destroy data by changing a setting. There are many, many ways that the per-field storage assumption simplified the code, so I was totally in favor of that solution, if it would work.

I think everyone agrees we can't have our core solution allow for changing the database schema -- it has to be locked down one way or the other. If we lock it down any way other than per-field we have to create a UI that prevents some kinds of changes and communicates to users when they create fields exactly what things they can later change and which they cannot, and help them make irrevocable decisions about things like whether or not the field should be potentially shared or potentially multiple in the future.

One of the flaws in the current UI is that it is too easy to make changes that break things, and we have to eliminate all the places where that can happen. With per-field storage there are very few changes that will break things, so that job is made much easier and the UI can be simpler. Anything other than per-field storage introduces a lot more complexity into the UI and makes it harder for end-users who don't know anything about DB storage to understand. So the biggest impact of that change is probably in the UI.

I'm not at all opposed to going with some solution that uses an optimized combination of per-field and per-bundle storage. I never was opposed to that. I was waiting for MV to mature to the point where we can tell if it will do what we need, and for people who know far more than I do about the impact of these things on high-performance sites to weigh in with recommendations. I am more than willing to go whichever direction makes the most sense.

bjaspan’s picture

This argument is not necessary.

"We" have no moral objection to hybrid storage. The Field API code sprinters voted 6-1 in favor of per-field storage because we were already making enormous changes, and we knew that per-field storage could be implemented correctly in reasonable time. Also, we (or at least I) remain unconvinced there is a valid use case for per-bundle storage for Drupal (we have no data supporting it; the 65% number from earlier does not count). However, we made the storage subsystem pluggable so additional modules can be written to replace it. If someone implements a superior field storage module, great. If someone tries to implement any important/useful field storage mechanism and discovers that the Field API is somehow designed in a way that makes it impossible, we can change the Field API to accommodate. There is no problem here.

I should warn whoever wants to implement the hybrid storage engine and make it the default, however, that a different set of very vocal people will start hyperventilating when you tell them that you cannot change the cardinality of a field once you add data to it. We've been down this road many times.

As for the purported performance killing nature of per-field storage, I'd like to point out that D5 CCK used per-field storage 100% of the time, too. Yes, it saved data in per-content-type tables. But node_load() performed a separate SELECT query for every field for every field-cache miss. node_save() performed a separate INSERT/UPDATE for every field, every time. Views generated a separate JOIN for every field, every time. Yes, even for fields in per-content-type tables. Everyone thought we were doing this cool efficient thing and were very impressed, except we weren't doing it. Surprise! Only with D6 did we actually implement the optimization of using single queries for per-content-type tables and for merging redundant JOINs in Views.

Sure, we'd like to avoid a performance regression. We'd also like to avoid a functionality regression. We can't avoid both in a core-worthy fashion until we implement field migration, which has its own set of complex, unresolved issues.

In the meantime, we must either commit this patch with its current storage mechanism, or wait an unknown length of time for someone to produce meaningful performance data and write a new storage mechanism. I'm pretty sure I know Dries' choice.

alex_b’s picture

#93 - "I think everyone agrees we can't have our core solution allow for changing the database schema -- it has to be locked down one way or the other. If we lock it down any way other than per-field we have to create a UI that prevents some kinds of changes and communicates to users when they create fields exactly what things they can later change and which they cannot, and help them make irrevocable decisions about things like whether or not the field should be potentially shared or potentially multiple in the future."

I'd like to build on Karen's point: assuming there is a significant performance issue with per-field-only storage, would it be reasonable to have the site builder make the decision between performance and flexibility? With the first option below, the field would be locked down to per-bundle storage, not shared; with the others, it would be stored per field.

(*) single value, non shared (fast, can't be changed later)
( ) single value
( ) multiple value 

I understand that this approach isn't exactly ideal as it adds another setting to the build UI and it would add more complexity to the code but is there any other reason why this would be a bad idea?

eaton’s picture

As for the purported performance killing nature of per-field storage, I'd like to point out that D5 CCK used per-field storage 100% of the time, too. Yes, it saved data in per-content-type tables. But node_load() performed a separate SELECT query for every field for every field-cache miss. node_save() performed a separate INSERT/UPDATE for every field, every time. Views generated a separate JOIN for every field, every time. Yes, even for fields in per-content-type tables. Everyone thought we were doing this cool efficient thing and were very impressed, except we weren't doing it. Surprise! Only with D6 did we actually implement the optimization of using single queries for per-content-type tables and for merging redundant JOINs in Views.

First, CCK in Drupal 5 did not use per-field storage. Its SQL was inefficient, and was optimized in CCK 2 to eliminate that problem. Moving to per-field storage not only rolls that progress back, it makes that optimization impossible without the overhead of additional middleware layers like materialized views. Second, please refer to #77 and the explanation of the second 'pitfall'. The performance-killing quality of CCK is not simply multiple joins or selects: the gravest danger is the inability to index across table boundaries for sorts and filters. In this way, this patch is far worse than both CCK in Drupal 5 and CCK in Drupal 6.

I tried to be very explicit about the precise nature of the performance dangers in my post: as David Strauss has noted, the entire reason for the Materialized Views system is avoiding the penalty for unindexable sorts and filters. Per-field storage will not only roll back the hard-won efficiency improvements in CCK 2 and Views 2 query building, it will ensure that we encounter the unindexable sort penalty on every content type.
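The unindexable-sort pitfall is easy to reproduce. The sketch below is illustrative only (invented schema, SQLite standing in for MySQL): with per-bundle storage one composite index serves a two-field sort, but once the same two fields live in separate joined tables, no single index spans them and the database must sort the joined result set itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Per-bundle layout: both sortable fields live in one table, covered by one
# composite index, so a two-field sort can be served straight from the index.
conn.execute("CREATE TABLE bundle "
             "(nid INTEGER PRIMARY KEY, field_a INTEGER, field_b INTEGER)")
conn.execute("CREATE INDEX idx_ab ON bundle (field_a, field_b)")

# Per-field layout: one table per field; each index spans only its own table.
conn.execute("CREATE TABLE field_a (nid INTEGER PRIMARY KEY, value INTEGER)")
conn.execute("CREATE TABLE field_b (nid INTEGER PRIMARY KEY, value INTEGER)")
conn.execute("CREATE INDEX idx_a ON field_a (value)")
conn.execute("CREATE INDEX idx_b ON field_b (value)")

for nid in range(1, 1001):
    a, b = nid % 7, nid % 13
    conn.execute("INSERT INTO bundle VALUES (?, ?, ?)", (nid, a, b))
    conn.execute("INSERT INTO field_a VALUES (?, ?)", (nid, a))
    conn.execute("INSERT INTO field_b VALUES (?, ?)", (nid, b))

plan_bundle = conn.execute(
    "EXPLAIN QUERY PLAN SELECT nid FROM bundle ORDER BY field_a, field_b").fetchall()
plan_join = conn.execute(
    "EXPLAIN QUERY PLAN SELECT a.nid FROM field_a a JOIN field_b b ON a.nid = b.nid "
    "ORDER BY a.value, b.value").fetchall()

# plan_bundle walks idx_ab; plan_join has to build a temporary B-tree for the
# ORDER BY (MySQL would call this a filesort), because no index spans both
# value columns.
```

The temporary B-tree grows with the joined result set, which is why the penalty shows up on every listing page that sorts or filters on more than one field.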

Also, we (or at least I) remain unconvinced there is a valid use case for per-bundle storage for Drupal (we have no data supporting it; the 65% number from earlier does not count).

Do you mean that there is no data suggesting that per-field storage will be slower than CCK 2.0's per-entity storage? This sounds like a case for benchmarking. To date, though, the only data posted -- as well as common sense applied to SQL -- points to a considerable slowdown.

I should warn whoever wants to implement the hybrid storage engine and make it the default, however, that a different set of very vocal people will start hyperventilating when you tell them that you cannot change the cardinality of a field once you add data to it. We've been down this road many times.

The difficulty of migrating or rebuilding data when schemas change does not go away with per-field storage: it's just pushed to the performance layer with Materialized Views. If you're saying that those mechanisms won't be necessary at all, and that the earlier benchmarks posted by the Fields in Core team were incorrectly pessimistic, then we should discuss that.

In the meantime, we must either commit this patch with its current storage mechanism, or wait an unknown length of time for someone to produce meaningful performance data and write a new storage mechanism.

I've reiterated that I'm in favor of committing the patch as long as we commit to implementing a better storage implementation before D7 ships. I'm not interested in holding up this patch while we wait for the 'perfect' storage system, but I and others consider the current per-field storage approach unacceptable for final use in the shipping version of Drupal. Obviously there is disagreement on this point.

joshmiller’s picture

Status: Needs review » Reviewed & tested by the community

+1 for removing the word "widget"

joshmiller’s picture

Status: Reviewed & tested by the community » Needs review

umm how did I change the status?

SORRY!

syscrusher’s picture

I haven't yet tested in great detail, but I did try this as an in-place upgrade to an existing D7 install from UNSTABLE-4. Running update.php doesn't seem to invoke the install hook for the new field module, so you get fatal errors about missing field_* functions. I can fix this manually, but right now it was just easier to reinstall Drupal from scratch, with an empty database, which worked fine.

Given that I was upgrading from UNSTABLE-* to a patch that isn't even committed to HEAD, I am fairly confident that my "bug" is a non-starter that won't get fixed, and shouldn't get fixed. But it's still a data point, and I felt it should be documented. If the reply is, "no need to fix this", that's fine.

This looks like great work so far, albeit not yet production-ready. I have mixed feelings about the database schema issues and will refrain from commenting until I have had more time to think about it. I will definitely follow this issue with great interest, however, as my most visible module is strongly impacted.

Kind regards,

Syscrusher

Aron Novak’s picture

FileSize
5.03 KB
4.21 KB

I would like to share my benchmark results:

Load 100 nodes with 10, 20, 50, 100 fields:
http://img.skitch.com/20090202-gse9ya36ret61esucaicfcc4gu.png

Create 100 nodes with 10, 20, 50, 100 fields:
http://img.skitch.com/20090202-kcqmbh1b3wmtp7xhfkj4kt3pjd.png

The results are in milliseconds.

You can reproduce the tests with the help of the attached .test files.
It would be good if fields-in-core experts could review the benchmarking code.

The results show that creating and loading nodes with fields is significantly slower in Drupal 7 + FieldAPI patch than with Drupal 6.9 CCK.

Test setup:

  • Free system memory before the test: approx. 800 Mbyte
  • Differences of repeated execution were under 10%
  • The fic test suite started to consume lots of memory.

Lots of code is borrowed from CCK Crud test.
To reproduce:
cckbench.test is for Drupal 6.x and Simpletest 6.x-2.x-dev.
fieldsbench.test is for Drupal HEAD and patch from #41

I am also working on a benchmark for Drupal 7 + Fields in Core patch with per bundle storage module (http://groups.drupal.org/node/18302)

syscrusher’s picture

Here is a question about the design:

For multi-valued fields, is there any concept in the field API of a *weight* that can be assigned to each *value* of the field within a single entity instance?

For example, if the field type is an image, and it is multi-valued, the content creator should have a way to reorder the photos in the list.

This applies also to my links_related module, which allows users to attach URLs to arbitrary content types. As you can imagine, I plan to migrate that functionality into this new core feature. But the list of related links is an *ordered* list based by default on the order in which links were added, but re-orderable by the content editor.

I believe, if I understand the new core feature correctly, that the "delta" field in the existing schema will take care of this need, but can someone clarify how this would work?
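On the delta question: in the SQL storage, each value of a multi-valued field occupies its own row keyed by (entity id, delta), and values load in delta order, so reordering amounts to rewriting deltas. A hypothetical sketch (the table name and reorder helper are invented, not actual Field API code):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One row per value; (nid, delta) identifies a single value slot.
conn.execute("CREATE TABLE field_related_links "
             "(nid INTEGER, delta INTEGER, url TEXT, PRIMARY KEY (nid, delta))")
conn.executemany("INSERT INTO field_related_links VALUES (?, ?, ?)",
                 [(1, 0, "http://a.example"),
                  (1, 1, "http://b.example"),
                  (1, 2, "http://c.example")])

def reorder(conn, nid, urls_in_new_order):
    """Persist an editor's drag-and-drop reordering by rewriting deltas."""
    conn.execute("DELETE FROM field_related_links WHERE nid = ?", (nid,))
    conn.executemany("INSERT INTO field_related_links VALUES (?, ?, ?)",
                     [(nid, delta, url)
                      for delta, url in enumerate(urls_in_new_order)])

reorder(conn, 1, ["http://c.example", "http://a.example", "http://b.example"])
loaded = [row[0] for row in conn.execute(
    "SELECT url FROM field_related_links WHERE nid = 1 ORDER BY delta")]
```

So yes, the delta column covers the ordered-list use case; the widget just has to write the deltas in the order the editor chose.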

Also, are multiple storage engines allowed at the same time? That is, can a contrib module provide its own storage engine that manages data that is external to Drupal core's physical database schema? From the online documentation, this *seems* to be the case, but I want to be sure I've understood correctly.

Kind regards,

Scott

chx’s picture

Status: Needs review » Reviewed & tested by the community

I put this back to RTBC because what Barry and Karen say is, basically: "no, we have no moral objection to hybrid storage. Go ahead and write it. If the Field API needs to be changed to make a hybrid storage module possible, we'll change it." And what I say, as the guy who made sure the storage engine is pluggable, is that the update process already needs to batch. Changing cardinality or any other field setting needs to batch load/save every affected node, and this has yet to be written (we said so during the sprint, though I am less sure how much this got communicated to the outside world). This is exactly because of the pluggable storage engine: no code outside the storage modules has any clue how a field is stored, and if you change any field settings, it becomes a different field that might be stored differently. Is this a functionality regression? Maybe. But it can be fixed, and it'll be a lot more correct than the current very scary code in CCK. And it allows storage in web services :)

catch’s picture

@Aron, we know that a straight node_load() in HEAD is slower than in D6 due to db_select() - so for the benchmarks to be viable, you'd need to run the test with D6 minus CCK/ D7 minus Fields in Core installed to see what the base difference is.

eaton’s picture

Status: Reviewed & tested by the community » Needs review

It's also worth noting that the benchmarks Aron posted are not the worst-case scenario for per-field storage.

With a large pool of nodes, a query that sorts on three or four of those custom fields cannot be indexed... which greatly amplifies the CCK snafu discussed in #77. This is totally separate from the number of selects and joins needed during a simple node_load(). Aron, thanks for taking the time to take this stab at the benchmarks. It's a very important aspect of the patch.

eaton’s picture

Status: Needs review » Reviewed & tested by the community

Didn't intend to push it back to Needs Review. I don't believe it's ready for commit, but it's obviously a matter of opinion at this point, and getting into a 'status tug-of-war' won't help anyone.

yched’s picture

We (the FiC 'team' and the community) can definitely look into improving the default db storage mechanism after the current patch is committed.
Taking the 'per field' (PF) decision in the sprint did let us move forward and do a *tremendous* cleanup and refactoring of CCK code and concepts, and get rid of storage-structure specific assumptions all around the code. I cannot really start to summarize all the issues that got simplified or abstracted out. From hair-pulling to reasonably fluent.

Moving back to 'per bundle' (PB) by default should be much more doable with the current level of code separation. It *will*, however, bubble up to some API-level concepts
(what is 'shared', what is 'migration', what field_update_field() allows).

PF storage has been the conclusion of both the Chicago DADS sprint in Feb 08 and the Boston FiC sprint in Dec 08. The whole issue was rehashed from the start both times with different attendees, and in between in discussions on g.d.o. In fact the discussion has been rehashed *so* many times it's hard to keep track. It basically always boils down to user experience (allowing people to change their minds on stuff like 'cardinality' and 'shared') vs maintainability and performance of the db storage.

As Karen said, the collective choice of moving to 'per field' (PF) in Boston relied on the assumption that an MV-like solution (or Barry's PBS, which came after) can answer the performance issue. I myself am not enough of a db guru to have a definitive opinion about it. It was, though, a deliberate acceptance of higher write costs in exchange for storage predictability and user flexibility. We know from the CCK issue queue that there is demand for more flexibility, not less. Putting more restraints on users makes people scream (heck, they cannot stand that changing a field machine name is impossible - so, cardinality, shareability?)
Putting currently basic operations aside as being 'migration' is not a solution. Someone will need to handle this and offer users the flexibility. CCK contrib? Another contrib module? It doesn't really change the fact that users will want a way to do this easily, and the lower in the APIs and concepts this is supported, the less will break when they do.

Also note that a notion of intermediate tables duplicating the data in 'more-efficient-to-query' tables was suggested by Dries at some point in spring. David's MVs are the result of his own considerations, leading him to the same (only more specified) conclusions. So the idea was in the air. Obviously, not a definite sign that this is *the* way to go.

I'd also like to warn against the 'it worked in CCK D6, why don't we just keep it as is?' arguments. They don't stand; Fields in core is not CCK.
- We're moving from a UI-driven, user-triggered concept, to an API with fields primarily defined in code, and a UI being just one way to define them or act on them.
An API means anything can change at any time, and we have to clearly define what's allowed and what is not. Also, the Batch API is not an option at this level.
- Everyone agrees we do *not* want field migration on the main data-storage
- Yet we have to preserve a maximum of the features users are accustomed to...

merlinofchaos’s picture

Users scream about stuff. So it goes; we're pretty used to it. People who don't understand the complexities of schema changes aren't qualified to have their screaming heard. People have come to accept that view names can't change, and yes, there's the occasional whine about this (and some guy actually tried to convince me to use database values, heh heh heh), but the fact is, that's not the real argument. If we implement every feature because users scream about it, we lose, and we know that.

bjaspan’s picture

FileSize
324.96 KB

Rerolled the patch from bzr for the latest tweaks.

chx’s picture

I don't believe it's ready for commit

But why? You said you want per bundle storage by default. Every sprinter answered: sure, go ahead; it was just simpler for the sprint, so it's per field in this patch, but nothing stops further patches from happening. You said you are willing to work on that. I will help. So... what's the problem? (The only disadvantage was a perceived harder data migration, which is a red herring, as I pointed out already.)

alex_b’s picture

Status: Reviewed & tested by the community » Needs review

The results of Aron Novak's benchmarks in #100 should raise a flag. They strengthen the argument that the patch at hand does not perform well enough to be the default behavior of Drupal core.

From my point of view, flexible field configuration isn't worth the scale of performance drawbacks that #100 suggests: according to the benchmark, in Drupal 7 + FIC patch, creating 100 nodes with 20 fields would take about 3.5 times longer than in Drupal 6 + CCK. catch pointed out in #103 that the node load tests are flawed, so I won't cite them here. (I would love to see even more benchmarking and more eyes on the benchmarks in #100.)

#94

I should warn whoever wants to implement the hybrid storage engine and make it the default, however, that a different set of very vocal people will start hyperventilating when you tell them that you cannot change the cardinality of a field once you add data to it. We've been down this road many times.

This makes me think there is not enough consensus that the storage engine dilemma will have to be addressed in core once committed, nor on how it needs to be addressed. I am saying this with an eye on eaton's comment in #67 and chx's comment in #109. I guess my biggest concern is that we could get stuck with a Field API in core that does not perform, and without an agreement on how it needs to be fixed.

Therefore, I don't think this patch is ready to be committed. All in all, committing without a solid concept on how to follow up later could cost us even more time.

While I have to voice my grave concerns I'd like to make a point in saying that I do really appreciate the great work that has gone into this effort. I am really happy to see this huge improvement to Drupal getting together.

David Strauss’s picture

@eaton You are technically correct that CCK D5 did not exclusively use per-field storage. But Barry's point is that, despite the per-type tables in CCK D5, fields were saved, read, and otherwise queried as if they were in individual tables.

As for the schema modification issue, I don't agree that a potential one-time migration during the D7 upgrade process is equivalent to the current issue of needing to migrate data on field reconfiguration. An upgrade is exactly the sort of time such a change is acceptable, even expected.

@Aron Novak What was the field configuration you used? Was node caching in place? Could you run the same benchmarks on CCK D5?

markus_petrux’s picture

Looking at the latest patch, there seems to be no support for non-consecutive deltas(*). Will this be added at some point? If yes, I guess it will affect the following issues of CCK2:

- #196421: Deleting unwanted multiple values / multiple values delta issues
- #119102: Combo field - group different fields into one

...or can both things be implemented separately, at different times? What is the FiC team's vision on this?

(*) In fact, it is not exactly "non-consecutive deltas", but the ability for multiple value fields to deal with empty items. This is related to the way field_set_empty() works and the fact that items should not be removed when empty, but only when the user marks them for removal. More or less... this is needed for something like the multigroup, where several multiple value fields need to stay in sync with each other.

Dries’s picture

I will go ahead and commit this patch later today.

I think we need to move forward with this patch, so other people can proceed with related work (e.g. get the UI in core, finish the documentation, etc.) or take advantage of the new API (e.g. eliminate the profile module, explore taxonomy terms as fields, etc.). What is important at this point is that we distribute and amplify the remaining work. Committing this patch is the best strategy, because it will enable us to start driving innovation, rather than waste time trying to stay in sync with HEAD.

I will consider and accept patches that implement performance improvements when they emerge. It looks like we have people who volunteered to work on these improvements, so we should be in good shape. It is important that we work on the performance improvements _now_ rather than 3 months from now.

eaton’s picture

But why? You said you want per bundle storage by default. Every sprinter answered, sure, go ahead.

If that's the case my concerns are lessened; from the last set of posts, I understood that there was disagreement about whether a performance problem exists at all. If I've misunderstood, my apologies.

The patch is going in, so there's no reason to keep up the back-and-forth in this thread. Work will begin on more detailed performance benchmarks and the changes to field_storage_sql.

Dries’s picture

Status: Needs review » Fixed

Committed to CVS HEAD. Thanks for everyone who helped -- now let's make it even better. :)

Damien Tournoud’s picture

Category: task » bug
Status: Fixed » Active

Congratulations! HEAD is now broken for both SQLite and PostgreSQL.

SQLite:
- the first Field test results in an "An HTTP error 500 occurred." error.

PostgreSQL:
- won't install anymore: PDOException: INSERT INTO field_config_entity_type (type) VALUES (:db_insert_placeholder_0) - Array ( ) SQLSTATE[23505]: Unique violation: 7 ERROR: duplicate key value violates unique constraint "field_config_entity_type_type_key" in _field_sql_storage_etid() (line 72 of /var/www/Drupal/Core7/modules/field/modules/field_sql_storage/field_sql_storage.module).

Damien Tournoud’s picture

Title: Field API initial patch » HEAD BROKEN: Field API initial patch
mgifford’s picture

So just ran a cvs update for my Drupal 7 sandbox and got "Fatal error: Call to undefined function field_attach_load() in [snip]/modules/user/user.module on line 218"

I've not been playing with that module so I'm assuming it isn't me.

Mike

Damien Tournoud’s picture

@mgifford: 7 to 7 updates are *not* supported. Please wipe your db (or at least the registry in your case) and try again.

mgifford’s picture

Thanks Damien,

Deleting the registry records didn't work:
TRUNCATE TABLE `registry`
TRUNCATE TABLE `registry_file`

However, wiping the db & starting fresh did the trick.

Mike

David_Rothstein’s picture

Congratulations on this getting in - great news!

However, not only is HEAD broken right now, there are also failing tests... 7 of the field attach tests are currently failing, so the testbot seems to be going around marking every patch on d.o. as 'code needs work' because of it :(

Aron Novak’s picture

FileSize
2.38 KB
10.95 KB

I did some more benchmarking, just to see the whole picture.
First, following catch's suggestion, I measured plain node creation/load (without CCK or fields) on D6.x and D7, so the results can be used as a baseline.
Here are the results:
D6.9
------
100 nodes are created in 195.38 milliseconds.
100 nodes are loaded in 131.26 milliseconds.

D7
---
100 nodes are created in 309.73 milliseconds. (~1.5X slower than 6.9)
100 nodes are loaded in 554.86 milliseconds. (~4X slower than 6.9)

It's clear that D7 is much slower in node handling, exactly as catch stated. Let's compute an offset from these:

  • load offset: 554.86 - 131.26 = 423.60 ms
  • create offset: 309.73 - 195.38 = 114.35 ms

While this is not totally correct (because the DB:TNG overhead affects the field inserts as well), we can use it as an approximation to correct the difference between D7 and D6:
http://img.skitch.com/20090203-x9w1pyqam9xx4iic3caipycc2g.png
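The correction described above is just a subtraction of the bare-node cost. A minimal sketch of the arithmetic (the two bare-node timings come from the D6.9 and D7 runs above; the raw D7/FIC measurement at the end is a made-up placeholder, since the real per-field numbers are only in the linked images):

```python
# Bare-node timings from the runs above (ms per 100 nodes).
d6 = {"create": 195.38, "load": 131.26}
d7 = {"create": 309.73, "load": 554.86}

# The offset is the extra cost D7 pays just for bare node handling.
offset = {op: round(d7[op] - d6[op], 2) for op in d6}
print(offset)  # {'create': 114.35, 'load': 423.6}

# A corrected D7/FIC timing subtracts this offset, so only the
# field-handling cost is compared against D6/CCK. 900.0 here is a
# hypothetical raw measurement, not a number from this thread.
raw_d7_fic_load = 900.0
corrected = raw_d7_fic_load - offset["load"]
```

As noted in the comment, this is only an approximation, because the DBTNG overhead also affects the per-field queries, not just the bare node load.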

Here are the corrected results:

Loading nodes D7/FIC vs D6/CCK http://img.skitch.com/20090203-dp4awuytbn71rpcgi5heenk543.png
Creating nodes D7/FIC vs D6/CCK http://img.skitch.com/20090203-ewqex2mmrtfhs7894ghdgxnfy.png

The other thing is PBS (Per-Bundle Storage). I wrote a test file for that too. (I did an ugly hack to write my own special edition of node_load, just to grab the additional field data from the PBS table. This code is just for testing and is definitely useless in the wild.)
The results are below:
Tabular:
http://img.skitch.com/20090203-k86chmjdfd1n567ubrmiduaipa.png

In graphs:
Loading nodes D7/FIC vs D7/FIC+PBS http://img.skitch.com/20090203-d5nujnf7ctu224fm3pcjcc36hk.png
Creating nodes D7/FIC vs D7/FIC+PBS http://img.skitch.com/20090203-pw426sx16fc6jgh7kgm2fpi1sj.png

It's not surprising that if we store the fields in one table, they can be fetched really fast, and it scales well.

I found a problem: when I tried to create 100 fields, PBS failed to create the related MySQL table; it seems to be a storage-engine-dependent limit.
With PBS, node creation is slow because of the data mirroring. This may be improved in the future, so this result is only valid for the current status.

Personally, I'm really glad that Field API will be committed. The above results are only for further consideration.

Test conditions: same as #100.

To reproduce:
The fieldsbundlebench.test file is for 7-fic (bzr repo); turn on the pbs module as well.
nodebench.test is for both D6.x and D7; it measures plain node performance.

catch’s picture

Aron, could you do me one more benchmark? D6 loading vs. D7 loading with a looped node_load() vs. D7 loading all with node_load_multiple() with and without PBS?

David Strauss’s picture

Having read the benchmarks, I'm a little worried about the tax we're paying with the new DB abstraction layer. If it's adding 100ms+ to bare node loads and saves (which don't run that many queries), it's probably adding a ton of the overhead we're seeing with D7 fields versus CCK D6.

Also, it should not be taking longer to load nodes than save them.

eaton’s picture

I found a problem, when I tried to create 100 fields, PBS failed to create the related MySQL table, it seems to be a storage engine-dependent limit.

Yeah, that's definitely a fundamental limit. One thing that confuses me a bit is that you note that PBS makes loading nodes considerably faster, while in the graphs it appears to be just the opposite. Is that an error, or is there a new and wackier twist?

catch’s picture

When #324313: Load multiple nodes and terms at once went in, we knew that db_select() was adding about a 1-3% hit for a full node/n page load (and about 1 ms per node_load() when narrowed down to just that function); http://drupal.org/node/324313#comment-1128423 is where Justin Randell worked this out.

It looks like Aron's benchmarks show a significant extra slowdown over and above that though.

bjaspan’s picture

FileSize
736 bytes

The problem with pgsql is here:

function _field_sql_storage_etid($obj_type) {
  $etid = variable_get('field_sql_storage_' . $obj_type . '_etid', NULL);
  if (is_null($etid)) {
    $etid = db_insert('field_config_entity_type')->fields(array('type' => $obj_type))->execute();
    variable_set('field_sql_storage_' . $obj_type . '_etid', $etid);
  }
  return $etid;
}

The db_insert() query is not returning the last insert id. Inside InsertQuery_pgsql::execute(), we have:

    $schema = drupal_get_schema($this->table);

$schema ends up as NULL. This is happening inside install_tasks('main', 'configure'). Somehow, field_sql_storage.module's schema is not available, even though its tables already exist.

I've attached a patch that fixes the problem by not depending on the schema being present. I'm not sure if it is the "right" fix, but it works.

bjaspan’s picture

Status: Active » Needs review
catch’s picture

Status: Needs review » Needs work

The postgres issue got discussed on irc last night but doesn't seem to have been mentioned here - can you try with the patch at #349671: Make the postgresql driver independant of the schema ? If that's not it, this should use ->fetchField() instead of db_result().

Aron Novak’s picture

About PBS performance on node loading: I made a mistake; in the PBS case, the offsets were not subtracted.

Please do not compare these values to the previously posted results. Here no offset is applied, and there was more free memory, so not much swapping.

Third round of benchmarking (only node loading, 100 nodes):
D6 repeated node_load()s: ~75ms
D7, repeated node_load()s : ~260ms
D7, node_load_multiple() : ~60ms
About PBS results: if there aren't any additional fields, my experimental node_load behaves just like D7 without any patches.
D7 + PBS, repeated node_load()s, 10 fields : ~220ms
D7 + PBS, repeated node_load()s, 20 fields : ~265ms
D7 + PBS, node_load_multiple(), 10 fields : ??
D7 + PBS, node_load_multiple(), 20 fields : ??

I could not test D7+PBS + node_load_multiple(), because of the following:

$this->node_load_multiple(NULL, NULL, TRUE);

$t_load = 'cckbench_load_'. $type;
timer_start($t_load);
$loaded_nodes = $this->node_load_multiple($nodes);
$this->assertTrue(TRUE, t('100 nodes (!fi fields) are loaded in !sec seconds (node_load_multiple).', array('!fi' => count($this->fields), '!sec' => timer_read($t_load))));

I always got unrealistically low values (under 30 ms) despite resetting the cache, although if I inspected the contents of $loaded_nodes, they were correct. The code ran on the bzr repo Drupal.

bjaspan’s picture

Status: Needs work » Needs review

The patch at #349671: Make the postgresql driver independant of the schema fixes the problem. This confirms that the pgsql issue is not a Field API bug but a pgsql driver bug.

There is still the pending SQLite test problem. Is the policy for SQLite and pgsql that it is up to the maintainers of those drivers to fix bugs related to those drivers, and not to hold up other issues due to bugs in the db drivers? If so, I am inclined to mark this issue 'fixed' again.

eaton’s picture

Status: Needs review » Fixed

I'd agree. The problem lies there, this patch just happened to stumble into that bug.

NancyDru’s picture

Status: Fixed » Needs review

I just downloaded the current 7.x-dev version and get this immediately:
Fatal error: Call to undefined function field_attach_load() in C:\www\webapps\drupal-7\modules\node\node.module on line 917

Php 5.2.5, MySql 5.0.45, Windows XP SP3

catch’s picture

Status: Needs review » Fixed

Nancy, you need to empty your database and reinstall - there's no HEAD-HEAD upgrade path.

@Aron, interesting benchmarks, can we continue on another issue somewhere, this information is important and shouldn't get lost now this one is 'fixed'.

Jose Reyero’s picture

Status: Fixed » Needs work

> David Strauss: Having read the benchmarks, I'm a little worried about the tax we're paying with the new DB abstraction layer. If it's adding 100ms+ to bare node loads and saves (which don't run that many queries

I think this is a very interesting question that may not belong here, but it may have a great impact on all these benchmarks.

As I'm new to the new DB abstraction layer too, I don't know whether we are creating a new (big) query object for each query, or whether we have in place some 'object recycling' (reusing the same object for subsequent queries), which is common in other languages and systems.

So this is my question: should we open a new thread to discuss DB layer performance and the possibility of some 'object recycling', or is there one already?
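For readers unfamiliar with the term: "object recycling" here just means keeping one query object alive and re-binding its parameters, instead of constructing a fresh object for every query. A minimal illustrative sketch (in Python, with a made-up Query class standing in for a heavyweight query builder; this is not Drupal's DBTNG API):

```python
class Query:
    """Hypothetical stand-in for a heavyweight query-builder object."""
    def __init__(self):
        self.params = {}          # construction cost is paid here

    def bind(self, **params):
        self.params = params      # cheap: just re-point the parameters
        return self

    def execute(self):
        # A real driver would prepare and run a statement here.
        return dict(self.params)

# No recycling: a new object is built for every query.
naive = [Query().bind(nid=n).execute() for n in range(3)]

# Recycling: one object is reused; only the parameters change.
q = Query()
recycled = [q.bind(nid=n).execute() for n in range(3)]

assert naive == recycled  # same results, one allocation instead of three
```

Whether this actually saves anything in PHP depends on how expensive the query object's construction is relative to the query itself, which is exactly what the benchmarks above would need to isolate.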

webchick’s picture

Status: Needs work » Active

@Nancy: Yes, you'll get that error with an existing 7.x install. You'll need to re-install from scratch to make it go away (we do call it 'unstable' for a reason. ;D).

However, this issue is not fixed. We still urgently need the broken "Field attach tests" fixed. We can't enable the testing bot until that happens, which means I can't commit patches bigger than a line or two.

Damien Tournoud’s picture

Status: Active » Needs work

One of the fields tests fail for me on MySQL:

Field attach tests: 105 passes, 7 fails, and 0 exceptions

On SQLite, it's even impossible to launch the Field attach tests:

An HTTP error 500 occurred.

On PostgreSQL (with #349671: Make the postgresql driver independant of the schema applied), this is not that great either:

Field attach tests: 28 passes, 7 fails, and 7 exceptions
Field form tests: 78 passes, 21 fails, and 19 exceptions

I don't see how this is fixed.

Is the policy for SQLite and pgsql that it is up to the maintainers of those drivers to fix bugs related to those drivers, and not to hold up other issues due to bugs in the db drivers? If so, I am inclined to mark this issue 'fixed' again.

SQLite and PostgreSQL are supported drivers. It's up to the patch makers to test their patches properly on these database engines. Of course, there are bugs in the db drivers (there are too in MySQL), but that's not an excuse not to test patches properly.

NancyDru’s picture

@catch, webchick: Thanks. That did make it go away, unfortunately everything else went away too.

bjaspan’s picture

Status: Needs work » Reviewed & tested by the community
FileSize
608 bytes

With the attached patch, which fixes a genuine Field API bug that triggered on pgsql but not mysql, plus the patch in #349671: Make the postgresql driver independant of the schema, which fixes a pgsql driver bug, all Field API tests pass using pgsql on Windows XP.

My actual Field API patch is an obvious one-line fix, so I'm RTBC'ing it myself.

webchick’s picture

Status: Reviewed & tested by the community » Needs work
FileSize
1.92 MB

Sorry, Barry. Tests are still failing for me on Mac even with that patch (and, presumably, testing bot as well since it's the same failures it marked on a bunch of patches after this was committed).

I've attached the output from SimpleTest.

Damien Tournoud’s picture

With the patch in #139, we are back to:

Field attach tests: 105 passes, 7 fails, and 0 exceptions

On both MySQL and PostgreSQL. At least that's consistent ;)

Damien Tournoud’s picture

And, exactly the same on SQLite!

302 passes, 7 fails, and 0 exceptions
yched’s picture

Assigned: Unassigned » yched

working on the field attach test failures

webchick’s picture

Assigned: yched » Unassigned

In the meantime, I've committed #139 to at least make HEAD installable on pgsql/sqlite. Thanks!

webchick’s picture

Sorry, yched. :( Didn't mean to unassign you. :(

yched’s picture

Assigned: Unassigned » yched

Never mind :-)
The test failures are caused by the changes in drupal_render that went in just after the fields patch got committed, BTW.

bjaspan’s picture

Title: HEAD BROKEN: Field API initial patch » Field API initial patch

... in which case, SCREW YOU, OTHER-PATCH, for getting us blamed for breaking head! I'm removing the "HEAD BROKEN" denigration from this issue because we did not break it.

yched’s picture

Title: Field API initial patch » HEAD BROKEN: Field API initial patch
Status: Needs work » Reviewed & tested by the community

OK, I can't for the life of me manage to roll a proper patch against HEAD on my current (on the road) setup, so here are the needed changes. It's late here, so I'm deferring to a good soul to wrap up a proper patch.

field.module needed to be updated to the new behavior of #theme / #theme_wrapper
+ the full explanation for the change in common.inc is in http://drupalbin.com/7861; chx and eaton gave a +1 on IRC, so I dare set this to RTBC.
This sets us back to 100% tests pass.

modules/field/field.module

// line 560
-      // Use isset() to avoid undefined index message on #children when field values are empty.
-      $variables['items'][$delta]['view'] = isset($element['items'][$delta]['#children']) ? $element['items'][$delta]['#children'] : '';
+      $variables['items'][$delta]['view'] = drupal_render_children($element['items'], array($delta));
// line 572
-    $variables['items'][0]['view'] = $element['items']['#children'];
+    $variables['items'][0]['view'] = drupal_render_children($element, array('items'));

modules/field/field.default.inc

// line 152
-      '#type' => 'field',
+      '#theme' => 'field',
// line 174
-      '#theme' => $theme,
+      '#theme_wrapper' => $theme,

includes/common.inc

// line 3382
-function drupal_render_children($element, $children_keys = NULL) {
+function drupal_render_children(&$element, $children_keys = NULL) {
yched’s picture

Title: HEAD BROKEN: Field API initial patch » Field API initial patch

Oops, didn't mean to put back the shame on us.

sun’s picture

FileSize
59.76 KB

#148 as a patch. As was mentioned before: Windows line-endings.

webchick’s picture

Category: bug » feature
Status: Reviewed & tested by the community » Fixed

Awesome. Thanks SO much yched/sun! Tentatively switching the bot back on... and resetting issue status.

webchick’s picture

Testing bot looks happy again. Hooray!! :D Great work, all!

sun’s picture

Status: Fixed » Needs review
FileSize
442.69 KB

Sorry, but major parts of the code-base have been screwed up due to this patch.

Status: Needs review » Needs work

The last submitted patch failed testing.

sun’s picture

Status: Needs work » Needs review
FileSize
442.7 KB

The last patch uncovered a wrong function signature for hook_field_storage_write() in api.field.php. That said, the filename api.field.php is wrong; leaving that for a separate clean-up issue, though.

webchick’s picture

Status: Needs review » Fixed

Committed that too, which will probably break every spin-off FiC patch in the queue, but much better to do that now than later. :)

Marking this fixed for what I hope will be the final time. ;D

yched’s picture

Thanks for fixing the Windows linebreaks snafu, sun!
Not much of a mystery: the culprit is either Karen or me, since we were the only Windows people committing to the bzr repo. Very possibly my own personal bad.

Anonymous’s picture

[off-topic post for windows users]
RE: #157

I use VIM to edit files, and I tend to put the following line at either the beginning or the end of the files:

// vim:ft=php:sts=2:sw=2:ts=2:et:ai:sta:ff=unix

and I edit the diff file to remove this change. I also execute :1,$retab! before creating the patch.

yched’s picture

Assigned: yched » Unassigned

Not sure I remember why this was assigned to me at some point, but it's not really fair for posterity :-)

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

sun’s picture

For anyone coming here via CVS annotate and wondering why this was introduced here - see #347959: modules_installed is broken during testing (which thankfully reverts it)