There should be a way to allow a column to be indexed, so that Views (or other queries) take advantage of the speed increases.

I would suggest at the basic parsing a 'index' => TRUE in the database columns arrays.

yched mentioned there was an issue or groups discussion about this. Sorry if I'm duplicating this, or saying something that's already been hashed out. I didn't see that in a search.

Thanks,
Aaron

CommentFileSizeAuthor
#19 content-index-231453-19.patch9.36 KBmarkus_petrux
Support from Acquia helps fund testing for Drupal Acquia logo

Comments

aaron’s picture

there could even be a case made for having 'primary' => TRUE as well, if we ever decided to allow an editor to force unique values, although forcing 'not null' => FALSE will stop this from happening.

Owen Barton’s picture

I have added a patch for D5 CCK that does a (simple) approach like this in http://drupal.org/node/241078, with the aim that we could get a more fine grained index control for fields in D6 CCK.

Moonshine’s picture

I was thinking about this issue today while indexing some columns. Is there a reason why CCK couldn't offer a "Index field" option during the field definition process. I realize it would only create a singular column index but would certainly be useful in many Views situations.

Now that hook_field_settings($op = 'database columns') now uses Schema API syntax, I would think this could be done reasonably? Or am I overlooking something...

dopry’s picture

I think you can return an index, unless someone can prove me wrong... would be a great performance boost... I'll explore tomorrow if I get the chance.

Owen Barton’s picture

Status: Active » Postponed

Please see (and help test!) http://drupal.org/node/241078 which allows exactly that (passing in an 'INDEX' => TRUE attribute).

Once that patch is in we can look at supporting multi-column index, index lengths and index types, which would be the next step, if we want to take this further.

yched’s picture

Status: Postponed » Active

Let's reopen this.

As I wrote in #241078: Reverse node-reference views relationship : to ensure upgradability (and a non-nightmarish maintanance of the D6 upgrade path), this should be done in D6 first before a D5 backport can be considered.
I also think index creation needs to be an explicit decision of admins. We cannot automatically create indexes on all fields: would lead to unacceptable db bloat (explosion in db file size, db write execution times, etc...)

I see several steps for this feature, each can be worked on and committed separately for progressive enhancement.

1) In hook_field_settings('columns'), field modules define which of their columns are 'indexable' (I don't think there's currently any sense in letting admins index the 'format' column of a text field)
On a field's settings page, admins specify (checkboxes) on which columns they want to create an index - Help text : 'Note: only create indexes if you know you have a reason to do so'. To begin with, we allow only one index per field, and indexes are not configurable (size...). Keeps the UI issues bite-sized, and should be enough for most cases.

Ideally, this actually takes place in the 'basic settings' form rather than the field settings, since the set of db columns depends of the settings (see text.module's 'format' column - or even better, date module: date, date2, timezone, offset...). We want to avoid situations we had in D5, where changing some settings invalidates other parts in the form.

2) We implement Views' 'analyze' hook to generate reports of the CCK fields that could benefit from an index, given the current set of active views.
Extra bonus points for an AHAH-updated report directly below the 'create index' checkboxes :-)

3) If time / resources permit (meaning: er, unlikely :-) )
- multiple indexes (probably requires some AHAH magic if we want to keep this on the field settings page)
- index settings ?
- order for multi-column indexes - yuck...

Thoughts ?

[edit - forgot to add: can we agree this is post 2.0 stuff ? :-)]

moshe weitzman’s picture

i wonder if we shouldn't do the Views stuff first. The output of the anayliss could be code you could paste into an update function so that your site would be optimized. roughly like how Views emits helpful themeing code. In this scenario, CCK needs no changes.

yes, definately post 2.0.

yched’s picture

@Moshe:
This would probably be the way to go for a generic index analyzer (non specific to CCK fields).
Woudn't apply well for CCK fields because indexes added manually (or through db_create_index() calls pasted in an update) would not be persistent. Share / unshare the field and the index is gone without you even knowing it.
CCK needs to set and persist the indexes on its columns itself, so 2) without 1) is not really satisfactory (at least as far as CCK is concerned, of course)

Anyway, a Views index analyzer, be it generic or limited to CCK fields, is not exactly a trivial thing. You need to take detailed settings into account (for instance, does a string filter with the 'ends with' operator benefit from an index ? dunno for sure, but I'm no expert on db indexes), and to figure out the exact query clauses that get generated.
The handlers themselves know best, so this would probably need a views_handler::suggest_index() method.
If we go generic, we'd also need a way for CCK fields to report results ('You should edit this field's settings and add an index on (col1, col2)') different from other db fields ('db_create_index()' code snippets). Mainly thinking out loud here.
This 'Analyze Views' part probably deserve its own thread.

Owen Barton’s picture

I agree that a generic index adder (and ideally analyzer) would be a valuable contribution. Because of all the different module interactions and use cases, index optimization goes beyond just CCK, for example there are several taxonomy and i18n modules that need additional indexes to work well. I also agree that the required indexes is strongly linked to the queries that are run, so tying it to the data source (i.e. CCK) does not really make sense.

In addition, since we have http://api.drupal.org/api/function/hook_schema_alter/6 the index adder module should be able to add indexes transparently to any schema without needing to tie into that module directly. As long as we use this hook we probably don't need to do any work within CCK - obviously indexes would be lost if fields move to their own table or whatever, but this is probably the best behavior anyway since that alone changes the performance in ways that justifies reanalyzing the indexes (and if you are doing that on a busy, large, live site you had better be careful and preemptive doing this anyway).

In this case I think we can probably close this ticket - I suggest we continue this discussion at http://groups.drupal.org/node/16483 (a wiki page to flesh this idea out further). If it turns out we actually do need to add an API to CCK to assist with this we can reopen this ticket.

yched’s picture

CCK data tables are currently not generated / updated against the schema, so a hook_schema_alter solution wouldn't work here.
Now, we updated that part of the code quite a while ago, and it is quite complex (lots of dependencies / race cases), so I cannot remember the exact reason for this, nor whether this could be changed.

yched’s picture

Thinking about this a little more :
- detecting which indexes are actually relevant is a different issue - whole task in itself, no clear horizon yet.
- the best UI to add indexes is phpMyAdmin :-)

Lowest hanging fruit IMO is to let admins use phpMyAdmin or their regular db admin tool to add whatever indexes they see fit, and simply ensure that when some field gets moved around, CCK preserves the indexes it finds on the columns it migrates.
Would require some db inspection - not much more than 50 lines of code to steal from Barry's schema.module, which has it all already (mysql + pgsql).

What do you guys think ?
Any taker ? :-)

killes@www.drop.org’s picture

I've created a patch http://drupal.org/node/338934 that hangs very low. It accidentally does exactly what aaron suggested in less than 10 lines of added code.

I agree about the problem of index bloat and would not suggest to add indices by default to cck core widgets.

But if you write special widgets it is quite convenient.

And that's what cck is about.

markus_petrux’s picture

May we awake this?

There's a clear need to provide indexes for node and user reference fields. This is necessary as soon as you join for these fields, and probably a huge percent of users do.

This could be solved by the implementation of the 'index' attribute for hook_field_settings('database columns'), then:

1) adapt the code in content_table_schema().
2) adapt the code in content_alter_db().
3) implement the new 'index' attribute in node and user reference fields.
4) implement a new hook_update_N() to create these indexes on existing node and user reference fields.
5) anything else?

Is there any problem on doing just this?

Owen Barton’s picture

Note that I already wrote a patch that does exactly what markus describes (for Drupal 5, but should be easy to port) over at http://drupal.org/node/241078#comment-1018872 - there is some views code chunk (for that issue) that you can ignore.

yched’s picture

re markus #13: the only drawback is that index creation automatically based on field type easily leads to index bloat.
Well, I'm possibly ready to let go on this for noderef / userref.

IMO it would be best to have content_alter_db() *preserve indexes it finds in the table* (not just based on the hook_field_settings('database columns') information). This lets you add use PHPmyAdmin to add your own indexes on a specific text field you know would benefit from indexing, and be sure CCK doesn't wipe out the indexes next time the field is moved around. See my comment #11 above.

markus_petrux’s picture

Agree on that this should not be used for all fields. I was mostly worried about noderef/userref fields, where it is clearly a must, and most sites won't do it themselves, so the overall performance on Drupal/CCK could be enhanced a little if we can implement this. :)

I'm not sure what you mean on not wiping existing indexes. As far as I can see, content_alter_db() in CCK2 build the table schema programmatically, it does not ask the DB server for the existing schema. So existing indexes would only be removed if the table is recreated, which is something that may happen already. Please, correct me if I'm wrong.

Other than that, I'll be happy to roll a patch to implement the features described in #13.

moshe weitzman’s picture

"preserve indexes it finds in the table" is a nice feature but not one we are obligated to support IMO. it really is important that the schema in drupal match the schema in the DB, indexes included. consider what will happen when simpletests are run - the manual indexes are not created. it is really easy to use hook_schema_alter() and i would not feel bad requiring that in this case.

markus_petrux’s picture

Ugh, adding hook_schema_alter() here would allow other modules add more columns and this was discarded recently. See: #417122: Allow drupal_alter() on Field and Widget Settings

Agree it would be nice, but it's a bit more tricky to do it properly. It may require to do this in batches, and that complicate things too much, I think.

markus_petrux’s picture

Status: Active » Needs review
FileSize
9.36 KB

Here we go!

I had to create a function to check if an index exists. It supports MySQL and PostgrSQL. I was able to test only with MySQL, but this is based on #360854: db_index_exists() missing, module updates cannot handle indexes properly, so I assume the SQL statement itself was tested by someone else, also there are references in that issue to the PgSQL manual.

That patches includes hook_update_N() implementations for node and user reference fields, so that all existing fields can be updated. If other indexes exist, those won't be dropped, unless the table is re-created for some reason, which is unlikely to happen existing for noderefs/userrefs.

Tested here and worked like a charm. Please, critique! :)

markus_petrux’s picture

Anyone had the chance to test the patch in #19?

Testing is easy. Note that the hook_update_N() functions in node and user reference fields can be executed as many times as needed. These functions only do something when the indexes do NOT exist.

If you don't run update.php, then you can also check if the indexes are created when submitting the field settings form of node/user reference fields.

markus_petrux’s picture

Status: Needs review » Fixed

Committed to CCK2 and CCK3 with a slight variation on the patch in #19. The diff is that function content_db_index_exists() is located in content.module, so hook_update_N() in node and user reference field do not need to load includes/content.admin.inc.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

mikeytown2’s picture

Project related to this
http://drupal.org/project/dbtuner