Many core tables have very poor indexes - for example {system}. At the very least we should do a basic audit of the core tables where queries run on every page load.
For a start, the system table. The most common queries are of the form SELECT * FROM {system} WHERE type = 'module' AND status = 1
and SELECT * FROM {system} WHERE type = 'module' AND name = '%s'
Right now, neither of these queries uses any index. Also, the existing index 'weight' is useless and will never be used.
mysql> explain select * FROM system where type = 'module' AND status = 1 ORDER by weight, filename;
+----+-------------+--------+------+---------------+------+---------+------+------+-----------------------------+
| id | select_type | table  | type | possible_keys | key  | key_len | ref  | rows | Extra                       |
+----+-------------+--------+------+---------------+------+---------+------+------+-----------------------------+
|  1 | SIMPLE      | system | ALL  | NULL          | NULL | NULL    | NULL |   41 | Using where; Using filesort |
+----+-------------+--------+------+---------------+------+---------+------+------+-----------------------------+
if we add the key: KEY `type_status` (`type`(8),`status`,`weight`,`filename`)
we do much better (see module.inc line ~61 - both queries have the same explain result):
mysql> explain select * FROM system where type = 'module' AND status = 1 ORDER by weight, filename;
mysql> explain select * FROM system where type = 'module' AND status = 1 AND bootstrap = 1 ORDER by weight, filename;
+----+-------------+--------+------+---------------+-------------+---------+-------------+------+-------------+
| id | select_type | table  | type | possible_keys | key         | key_len | ref         | rows | Extra       |
+----+-------------+--------+------+---------------+-------------+---------+-------------+------+-------------+
|  1 | SIMPLE      | system | ref  | type_status   | type_status | 28      | const,const |   13 | Using where |
+----+-------------+--------+------+---------------+-------------+---------+-------------+------+-------------+
Adding an index like KEY `name_type` (`name`(16),`type`(8))
optimizes the other sort of query.
mysql> explain SELECT * FROM system where type = 'module' AND name = 'system';
+----+-------------+--------+------+-----------------------+-----------+---------+-------------+------+-------------+
| id | select_type | table  | type | possible_keys         | key       | key_len | ref         | rows | Extra       |
+----+-------------+--------+------+-----------------------+-----------+---------+-------------+------+-------------+
|  1 | SIMPLE      | system | ref  | type_status,name_type | name_type | 72      | const,const |    1 | Using where |
+----+-------------+--------+------+-----------------------+-----------+---------+-------------+------+-------------+
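Taken together, the two keys quoted above could be added with DDL along these lines (a sketch against the D6 {system} table; the prefix lengths mirror the keys as written above):

```sql
-- Sketch only: the two indexes proposed above for {system}.
-- The (8)/(16) prefix lengths keep the index small while staying
-- selective enough for the common type/status and type/name lookups.
ALTER TABLE system
  ADD KEY type_status (type(8), status, weight, filename),
  ADD KEY name_type (name(16), type(8));
```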
Starting patch attached just for system.schema.
Comment | File | Size | Author |
---|---|---|---|
#190 | too-unique-164532-182.patch | 4.8 KB | hswong3i |
#181 | too-unique-164532-181.patch | 4.75 KB | pwolanin |
#170 | too-unique-164532-170b.patch | 5.85 KB | pwolanin |
#150 | too-unique-164532-150.patch | 4.48 KB | pwolanin |
#148 | too-unique-164532-148b.patch | 1.44 KB | pwolanin |
Comments
Comment #1
pwolanin CreditAttribution: pwolanin commented: A small start too on taxonomy, suggested by David Strauss.
Comment #2
moshe weitzman CreditAttribution: moshe weitzman commented: subscribe ... thanks for diving into this.
Comment #3
Zothos CreditAttribution: Zothos commented: Where else can indexes increase performance? It can't only be the taxonomy table... :P
Comment #4
BioALIEN CreditAttribution: BioALIEN commented: Finally someone has stepped up to this. Thanks pwolanin!
Last month I analysed the database of a 5.x install. Here are the comments I noted:
1. node table, phpMyAdmin spit out a
"PRIMARY & INDEX keys should not both be set for column 'nid'. More than one INDEX key was created for column 'status'."
2. blocks table, NO INDEX DEFINED! - added INDEX on delta or module columns?
3. flood table has no INDEX! - Create INDEX on timestamp?
4. All CCK generated tables require "nid" to be set as INDEX. Issue submitted to CCK devs: http://drupal.org/node/162535
The list goes on but it is related to contrib modules rather than core itself.
Hope this helps!
Comment #5
pwolanin CreditAttribution: pwolanin commented: Thanks - there was an article recently (http://www.dave-cohen.com/node/1066) that pointed out that some tables such as blocks are missing a PRIMARY key (bad).
Comment #6
pwolanin CreditAttribution: pwolanin commented: The frickin Actions module's table has a broken schema index declaration, no primary key, and no validation function to avoid inserting duplicates.
Also, the {locales_target} is missing a Primary key - should it be (lid, language, plid) or (lid, language, plural) or something else???
2 of 3 of the search module tables don't have a Primary key- is there another patch pending for these tables?
Comment #7
ChrisKennedy CreditAttribution: ChrisKennedy commented: http://drupal.org/node/147298 has some earlier work on this issue by David, if you haven't seen it.
Comment #8
David Strauss: Subscribing.
Comment #9
pwolanin CreditAttribution: pwolanin commented: wtf? Trying to double-check that the suggested patch by David Strauss for removing the bid column from {blocks} makes sense - there is obviously some confused code in block module:
So somehow the delta from {boxes} and the bid are the same in some cases?
Comment #10
David Strauss: @pwolanin My patch in the old issue isn't nearly complete, and I wouldn't make such a change on this issue because this issue is performance-focused.
The reason I removed the bid column on the old issue's patch is because it's redundant, at best. We already identify blocks by module and delta; it's unnecessary to introduce another column. Also, because we so casually delete and insert rows from/into the blocks table to refresh it, an auto_increment column isn't even a reliable primary key.
Comment #11
pwolanin CreditAttribution: pwolanin commented: @David Strauss - yes - seems an autoincrement doesn't make much sense. Still, perhaps defining a unique index on (module, delta, theme) would be simpler than removing this column (at least for 6.x)?
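The unique index pwolanin suggests might look like this (a sketch; on MySQL, prefix lengths or a shorter theme column may be needed if theme stays at VARCHAR(255), as the key-length problem later in the thread shows):

```sql
-- Sketch: enforce one row per (module, delta, theme) in {blocks},
-- making the auto_increment bid column redundant as an identifier.
ALTER TABLE blocks
  ADD UNIQUE KEY module_delta_theme (module, delta, theme);
```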
Comment #12
bjaspan CreditAttribution: bjaspan commented: Subscribe.
Comment #13
m3avrck CreditAttribution: m3avrck commented: This can still make it into Drupal 6 since it is both a bug and performance fix.
Comment #14
catch: Just a quick note that http://drupal.org/node/146688 was a duplicate, and has a different suggested index to the latest patch, doesn't it?
David Strauss' post from that issue here for ref:
Comment #15
catch: http://drupal.org/node/172724
was also duplicate.
Comment #16
coupet CreditAttribution: coupet commented: subscribe... we need a comprehensive solution for ver. 6
Comment #17
jaydub CreditAttribution: jaydub commented: subscribe
Comment #18
dmitrig01 CreditAttribution: dmitrig01 commented: Bump. Needs to be re-rolled for actions -> triggers.
Comment #19
pwolanin CreditAttribution: pwolanin commented: Split off the Trigger module portion here: http://drupal.org/node/178650
Comment #20
pwolanin CreditAttribution: pwolanin commented: Re-rolling patch with the rest of the changes in the above patch.
Comment #21
catch: This patch, http://drupal.org/node/147298, is also related, although I'm not sure if/which should be marked as duplicate.
Comment #22
catch: #171685 was a duplicate.
Comment #23
Dries CreditAttribution: Dries commented: I'd love to commit this patch but it is hard to verify whether the changes actually improve things or not. All the information seems to be spread across multiple issues. Is there a good way to test this, or do we just commit this and assume it does more good than harm?
Comment #24
JirkaRybka CreditAttribution: JirkaRybka commented: Another issue is http://drupal.org/node/185126 (system table, containing a patch, with upgrade path and benchmark attempt too (seems to be a 3.5% gain for cached pages)).
Comment #25
NancyDru: I can confirm (at least under 5.x) that adding theme, module, delta as a key to blocks makes a visible performance difference, although I have not benchmarked it.
Comment #26
NancyDru: Also note that the "nid" and "status" indexes on node are redundant.
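For illustration, dropping the redundant single-column indexes would look something like this (the index names here are hypothetical; actual names depend on how the table was created):

```sql
-- {node} already has PRIMARY KEY (nid), so a separate index on nid
-- alone adds write overhead without helping any query.
ALTER TABLE node DROP INDEX node_nid;
-- Likewise, a lone status index is redundant if a composite index
-- already leads with status.
ALTER TABLE node DROP INDEX node_status;
```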
Comment #27
catch: Is this "needs work" because there's no upgrade path? What else needs to be done to get it in?
#69111 was yet another duplicate.
Comment #28
catch: Dries: http://drupal.org/node/146688 states that taxonomy_get_tree (forum index) currently uses a filesort. This is a big issue on drupal.org and any site with a decent number of forums. The proposed index by David Strauss in that issue is exactly the same as the one in this patch.
However, the system changes look much better dealt with over at http://drupal.org/node/185126 - I've attached a patch with that section taken out so the system module index can be dealt with in its own right.
So what does this need to get in? Upgrade path?
Comment #29
catch: ack! The schema files are gone, of course.
So reroll to move into .install + upgrade path. Sorry for so many posts in a row.
Comment #30
catch: This patch moves the schema changes back to .install.
This section:
Appears to have already been changed, so I left that out. Marking back to review now. This still needs an upgrade path assuming agreement on the indexes themselves.
Comment #31
JirkaRybka CreditAttribution: JirkaRybka commented: Related issue: http://drupal.org/node/85453 - statistics tables.
Comment #32
catch: #147298 was closed in favour of this one, however David Strauss' patch has a lot of work that didn't make it over here (and is against .schema files). It deals with aggregator, block, filter, system and taxonomy. system is best dealt with on that other issue, but I'll try to roll anything that doesn't conflict into a new patch against HEAD today. We'll still need an upgrade path for this once that's done, though.
Comment #33
catch: OK, this is a re-roll of the previous patches in this issue with the addition of David Strauss' suggested changes linked above.
There are three schema changes in this (the removal of id columns in the block, taxonomy and filter module schemas and their replacement with better primary keys) - I'm not sure if that will get in at this stage with beta, but I've left them in. If they should be taken out I'm happy to re-roll without those sections and either revive or create separate issues for those particular changes rather than complicate this one.
Leaving at "needs review", but we still have zero upgrade path for any of these changes as yet, so more work to be done (and probably not by me, all I'm doing is reading issues and cut and pasting to get this presentable).
This is untested but I'll clear my dev copy + search through modules to see how much is broken by the schema changes a bit later.
Comment #34
catch: Thinking about it, we should absolutely not introduce any schema changes in this patch. I'll look for any relevant block/taxonomy/filter issues and split those changes over to there, probably for D7.
So this patch only deals with index changes, still needs review and upgrade path.
Comment #35
catch: block.module currently only has a primary key on bid, nothing else, so presumably there are index optimisations possible which we could deal with here.
Linked issues above removed bid (autoincrement) entirely, I've split these to #14158 with a re-rolled patch for D7, don't see changes of that size getting into D6 at this point.
Comment #36
catch: More reading around - sorry for the multiple bumps, every time I think I'm done I find another issue trying to deal with this.
node.install has these indexes on {node}.
This leads to redundant indexes on nid and status; the attached patch removes the ('nid', 'vid') unique key and the 'status' index as reported by nancyw in #26 (and phpMyAdmin). There's still some redundancy even with this, but it'll require better eyes than mine to work out how to remove them cleanly without making something else worse.
Comment #37
David Strauss: At this stage, we probably can't remove important indexes and columns in a way that would require code refactoring.
Comment #38
NancyDru: I thought bid (in the boxes table) was how the box was linked to the block. If you remove it, how do they tie together?
Comment #39
David Strauss: @nancyw Blocks are uniquely identified by module and delta. bid is a redundant unique index.
Comment #40
JirkaRybka CreditAttribution: JirkaRybka commented: On the boxes table, indeed bid is used, but the patch earlier in this issue removed bid on the blocks table, where it seems to be really unused. The relation (as seen in the code) seems to be blocks.delta - boxes.bid. But anyway, this is D7 stuff, schema changes, so let's proceed without this, as suggested above.
Comment #41
NancyDru: Ah, I see now. I was confused because there is no "bid" in the blocks table in D5. But I did manually add a primary key on theme, module, and delta and saw a performance difference.
Comment #42
catch: This is the one main change I can think of that isn't in the patch as yet. I also looked through all D6 tables and I'm pretty sure the parent and tid indexes in {term_hierarchy} could go - the primary key is ('tid', 'parent') anyway.
Comment #43
David Strauss: The (tid, parent) key is not usable for queries selecting on parent alone.
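This is MySQL's leftmost-prefix rule: a composite key on (tid, parent) can serve queries filtering on tid, or on tid and parent together, but not on parent alone. A quick sketch:

```sql
-- Served by PRIMARY KEY (tid, parent): tid is a leftmost prefix.
SELECT parent FROM term_hierarchy WHERE tid = 5;

-- NOT served by (tid, parent): parent is not a leftmost prefix,
-- so the separate index on parent is still needed for this query.
SELECT tid FROM term_hierarchy WHERE parent = 0;
```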
Comment #44
JirkaRybka CreditAttribution: JirkaRybka commented: Other frequently performed queries are in the locale module. I didn't test it, but the indexes don't seem optimal to me either, and these tables are big (thousands of rows).
'locales_source' has primary key 'lid' and index 'source'(30), but almost all queries also use the new 'textgroup' column, so I guess it should be an index too. 'locales_target' has indexes 'lid', 'plid', 'plural', 'language' (which is fine), but no primary key (not sure whether this is a problem; the unique identifier here is lid+language FYI), and the upgrade path is missing the new 'language' index (!!!) which might be a separate issue.
The frequent runtime queries are:
Uncached string (>75 chars long):
SELECT s.lid, t.translation, s.version FROM {locales_source} s LEFT JOIN {locales_target} t ON s.lid = t.lid AND t.language = '%s' WHERE s.source = '%s' AND s.textgroup = 'default'
Cache rebuild (after new string added):
SELECT s.source, t.translation, t.language FROM {locales_source} s LEFT JOIN {locales_target} t ON s.lid = t.lid AND t.language = '%s' WHERE s.textgroup = 'default' AND s.version = '%s' AND LENGTH(s.source) < 75
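Given that both runtime queries filter on textgroup, one candidate (untested, as JirkaRybka says) would be a composite index leading with it, for example:

```sql
-- Sketch only: both locale queries above filter on s.textgroup, and
-- the uncached-string lookup also filters on s.source, so an index
-- leading with textgroup followed by a prefix of source could serve
-- both. The (30) prefix mirrors the existing 'source' index.
ALTER TABLE locales_source
  ADD KEY textgroup_source (textgroup, source(30));
```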
Comment #45
pwolanin CreditAttribution: pwolanin commented: @David - for blocks is the unique key really (module, delta), or is it (module, delta, theme)?
Comment #46
David Strauss: @pwolanin Sorry, you're correct. You need the theme column to form a candidate key.
Comment #47
catch: OK, so block.module needs a key on theme, module, delta.
@JirkaRybka - if the upgrade path is bad in locale, should we split it to its own issue or try to include it here?
Marking to needs work for those, and because there's still no upgrade path in my patch from #36
Comment #48
JirkaRybka CreditAttribution: JirkaRybka commented: Re: #47 - I guess we should create a new issue if it's only the locale upgrade path (which I will as time permits), but if we decide to change some keys for locale here, then it might go together.
--------------------------------------
Edit (2007-11-11): Issue created at http://drupal.org/node/191236
Comment #49
David Strauss: Here's my latest work. I'd like to have people agree on what we want before I spend time writing update code.
Comment #50
David Strauss: I'd like people to review my proposed changes. Once we can agree on the indexes, I'll write the update code.
Comment #51
dmitrig01 CreditAttribution: dmitrig01 commented: The index on comments.status is good - usually we select comments where status = 1.
The order in statistics.install is odd
Comment #52
David Strauss: "The index on comments.status is good - usually we select comments where status = 1."
What queries did you have in mind? The mere existence of "status = 1" in the WHERE clause does not mean the index is useful.
Comment #53
chx CreditAttribution: chx commented: If it's CNW no one will look into it. That's not what we want at this point.
Comment #54
catch: The {term_node} indexes should probably also be applied to {forum} - it's the same fields, many similar queries (more if this patch gets in).
Comment #55
NancyDru: As long as you're looking at clean-up tasks, what is "tid" in the Permissions table? It is always 0 on all my sites.
Comment #56
scott.mclewin CreditAttribution: scott.mclewin commented: One of the slow queries showing up on my sites is the cron-job cleanup of the watchdog logs. In my reading of the patch, I don't see this timestamp field indexed. Granted, this is not an eighteen-second query that my users explicitly sit through when browsing to a page, but it is tying up the database server, which slows down other requests.
# Query_time: 18 Lock_time: 0 Rows_sent: 0 Rows_examined: 0
DELETE FROM watchdog WHERE timestamp < 1191564006;
explain needs a select, so here's a variation on the delete:
mysql> explain select * FROM watchdog WHERE timestamp < 1191564006;
+----+-------------+----------+------+---------------+------+---------+------+--------+-------------+
| id | select_type | table    | type | possible_keys | key  | key_len | ref  | rows   | Extra       |
+----+-------------+----------+------+---------------+------+---------+------+--------+-------------+
|  1 | SIMPLE      | watchdog | ALL  | NULL          | NULL | NULL    | NULL | 120879 | Using where |
+----+-------------+----------+------+---------------+------+---------+------+--------+-------------+
1 row in set (0.00 sec)
Granted, I've kept too much watchdog log around (something I'm fixing), but the point for this thread is that the timestamp field in the watchdog table is a candidate for this patch.
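The index being suggested here is simply the following (a sketch; note the INSERT-cost objection raised in the next comment, since every watchdog write during a page load would then also have to maintain the index):

```sql
-- Candidate index for the cron clean-out query
-- (DELETE FROM watchdog WHERE timestamp < ...), turning the full
-- table scan above into a range scan on timestamp.
ALTER TABLE watchdog ADD INDEX (timestamp);
```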
Comment #57
David Strauss: I'm against adding an index to watchdog. It will slow down INSERTs, which do occur during user page loads. Plus, watchdog clean-outs are moving more toward fixed row limits, anyway.
Comment #58
scott.mclewin CreditAttribution: scott.mclewin commented: That's a good point, and I didn't know about the row-limit approach being taken/considered for the watchdog logs.
Comment #59
catch: I ran this snippet which identifies worst-performing indexes, linked from here. I think David Strauss' patch already deals with some of these redundant indexes, but here's the results for reference (3 results removed because they were duplicates between Drupal installs).
same page also says:
-which at least those node indexes are likely to fall under.
Comment #60
klando CreditAttribution: klando commented: PostgreSQL (>= 8.2) users should enable the statistics collector, add lots of indexes, and after a few days select from pg_stat* to see which indexes are used, never used, etc. It is usually a good starting point as it provides very good data on what was read from disk or memory, via an index or not.
Comment #61
JirkaRybka CreditAttribution: JirkaRybka commented: The locales_target lang/language/locale (or whatever it is named through the versions) index will get much more selective once you have more languages set up. With only one language, all entries have the same value, obviously.
Comment #62
catch: Well, it looks like the only things that need doing with this patch are the statistics order and the upgrade path. None of the proposed changes have been challenged, and additional indexes can always have their own issue later on.
edit: #49 still applies with offset. So it's small reroll of that plus upgrade path. Do we have a good example of an upgrade which adds and/or drops indexes and works on pgsql?
Comment #63
Gábor Hojtsy: Anyone taking this up?
Comment #64
pwolanin CreditAttribution: pwolanin commented: If there is consensus now on the changes required, I'll try to work on the remaining pieces.
Comment #65
catch: pwolanin, nothing added in #49 has been disputed as yet, and a lot in there had been proposed in other, duplicate, issues of this one. If there's anything missing it's easy to do a follow-up patch as well.
Comment #66
David Strauss: If this patch causes any problems, there are only two possible causes:
1. We're trying to store invalid data into a table.
2. Or, we misunderstand what sort of data we should be storing in a table.
Comment #67
pwolanin CreditAttribution: pwolanin commented: Looking through the patch in #49 - whose comment is this?
Comment #68
pwolanin CreditAttribution: pwolanin commented: ok - patch applies cleanly, but install breaks:
Comment #69
NancyDru: I'm glad I'm not the only one to run into the length problem on blocks.
I have mine set to: theme(100), module(50), delta(6)
I can't imagine a theme name that long, but it worked and my performance improved, so I didn't change it.
If the theme name was changed to VARCHAR(64) like the module name, it might take care of the whole thing.
Comment #70
NancyDru: Just to confirm that, I changed one of my test sites so that theme is defined as VARCHAR(64). Then I deleted and recreated the primary key without lengths and it worked fine.
Comment #71
pwolanin CreditAttribution: pwolanin commented: Found this lovely error too, now that the filter index is unique:
This is due to a bug in system.install - I'll file a separate patch, since that's a trivial fix: http://drupal.org/node/194029
New patch attached. Limits the theme name to 64 chars for the index. The module field is already set to 64. Are theme names really allowed to be 255 chars?
Also, correct a defect in the {history} schema definition.
Comment #72
FiReaNGeL CreditAttribution: FiReaNGeL commented: You can limit the index size on one part (such as theme name) to, say, 10 characters instead of limiting the field itself. You really don't have to index the full string here; and I'm sure there are other places like this in already existing indexes.
Comment #73
pwolanin CreditAttribution: pwolanin commented: ah - I cross-posted with nancyw - I think I'll agree that changing the column to 64 (rather than the index alone) is the better solution.
Here's a revised patch with a start on the update path.
Comment #74
pwolanin CreditAttribution: pwolanin commented: @FiReaNG3L - yes, I also thought of that first, but it doesn't really make sense to limit the length of the indexed part if we are trying to achieve a unique index. You cannot be sure the index is unique unless you index the full length of the field.
Comment #75
FiReaNGeL CreditAttribution: FiReaNGeL commented: pwolanin: my bad, I missed the unique part. If we can't change the table substantially (add a primary key field or something), then I agree that reducing the size of the field would be the only option. 255 chars for such a field is overkill anyway - what's the theme with the longest name nowadays?
Comment #76
catch: Fresh install, enabled all modules, installed schema.module - all smooth so far.
Comment #77
catch
Comment #78
pwolanin CreditAttribution: pwolanin commented: ok - patch with (untested) update code. The code does not yet cover taxonomy, locale, or user.
locale needs bigger work that will be captured here: http://drupal.org/node/194310
taxonomy and user can be added to this patch (if anyone has time).
Comment #79
pwolanin CreditAttribution: pwolanin commented: oops, the last patch missed upload too. This one should have everything except locale - taxonomy and user are included.
Comment #80
catch: OK, I installed 5.x-dev and enabled all core modules, upgraded to D6, and installed schema module. The update went smoothly but the following errors were reported:
Comment #81
bjaspan CreditAttribution: bjaspan commented: Regarding primary key fields not being 'not null', please read http://drupal.org/node/159328. Bottom line: add 'not null' => TRUE to all field definitions in the primary key.
When schema.module says an index is "missing in database" it means the index is declared by the module's hook_schema() but no index has been created in the database, presumably because the database has been upgraded from D5 and the update functions do not yet create that index.
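In plain SQL terms, bjaspan's point is that primary-key columns must be NOT NULL. A hypothetical repair for a table like {upload} might look like this (the column types here are illustrative, not taken from the actual schema):

```sql
-- Sketch: MySQL requires PRIMARY KEY columns to be NOT NULL, so the
-- column definitions must be fixed before (or while) adding the key.
ALTER TABLE upload
  MODIFY vid INT UNSIGNED NOT NULL DEFAULT 0,
  MODIFY fid INT UNSIGNED NOT NULL DEFAULT 0,
  ADD PRIMARY KEY (vid, fid);
```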
Comment #82
pwolanin CreditAttribution: pwolanin commented: um, well - I'll have to run schema module on my other computer. Apparently it needs PHP 5.1 (see: http://drupal.org/node/194764 ).
Anyhow, here's another (as yet untested) patch that should correct the errors above.
Comment #83
catch: Quick 5.x-6.x-patched upgrade - I still get this in schema module. Everything else looks fixed though!
fwiw in the actual 6.x database itself, vocabulary has a primary key on vid, and nothing else. ought to be weight and name according to schema definition.
Comment #84
pwolanin CreditAttribution: pwolanin commented: New patch catches the missing vocabulary index. I think everything else matches the schema definition.
Comment #85
catch: Indeed it does.
Comment #86
bjaspan CreditAttribution: bjaspan commented: Errors on pgsql.
* warning: pg_query() [function.pg-query]: Query failed: ERROR: constraint "upload_pkey" does not exist in C:\Documents and Settings\bjaspan\Home\WORK\drupal\head\includes\database.pgsql.inc on line 155.
* user warning: query: ALTER TABLE upload DROP CONSTRAINT upload_pkey in C:\Documents and Settings\bjaspan\Home\WORK\drupal\head\includes\database.pgsql.inc on line 790.
* warning: pg_query() [function.pg-query]: Query failed: ERROR: multiple primary keys for table "upload" are not allowed in C:\Documents and Settings\bjaspan\Home\WORK\drupal\head\includes\database.pgsql.inc on line 155.
* user warning: query: ALTER TABLE upload ADD PRIMARY KEY (vid,fid) in C:\Documents and Settings\bjaspan\Home\WORK\drupal\head\includes\database.pgsql.inc on line 778.
It's dinnertime, I'll try to look into this later.
Comment #87
pwolanin CreditAttribution: pwolanin commented: I'm getting ready to try it on postgres - did you get those errors on a 5.x upgrade, or on install?
Comment #88
pwolanin CreditAttribution: pwolanin commented: A clean install with PostgreSQL 8.2 (obviously applying the patch before the install) doesn't give me any errors - and schema module reports only "Match" for all installed tables.
Comment #89
pwolanin CreditAttribution: pwolanin commented: ok - I'm seeing about these same sorts of errors now on a D5 -> D6 update. Looks at first like the system update that's supposed to match the MySQL and PostgreSQL schemas needs to do something more with the upload table? The index I see is named file_revisions_pkey, not upload_pkey.
Comment #90
pwolanin CreditAttribution: pwolanin commented: Attached patch straightens out the primary key and other indexes for {upload} by fixing up system_update_6022(). This prevents the errors described above.
as expected, schema module reports a mismatch for locale module:
* languages
indexes list: missing in database
the {languages} index probably needs to be fixed in: http://drupal.org/node/194310 since it's not clear if those update functions will come before or after.
Comment #91
bjaspan CreditAttribution: bjaspan commented: With patch #90 applied, I get
* warning: pg_query() [function.pg-query]: Query failed: ERROR: index "locales_target_language_idx" does not exist in C:\Documents and Settings\bjaspan\Home\WORK\drupal\head\includes\database.pgsql.inc on line 155.
* user warning: query: DROP INDEX locales_target_language_idx in C:\Documents and Settings\bjaspan\Home\WORK\drupal\head\includes\database.pgsql.inc on line 854.
along with the languages.list index warning, which sounds like it is expected. Note that I have not actually reviewed any of the code or new indexes for correctness, I just ran the D5 > D6 update for pgsql.
Comment #92
pwolanin CreditAttribution: pwolanin commented: ah - at the last minute I added a line to drop an index - turns out it doesn't exist in the 5.x schema. I did it because in the patch for locale.install this index is removed from the 6.x schema. The attached patch is tested and should update without errors on pgSQL.
Comment #93
jaydub CreditAttribution: jaydub commented: Testing results with both MySQL (5.0.38) and PostgreSQL (8.1):
Fresh Drupal 6 Beta 3
- Installed all modules
- Installed Schema module
- Ran schema compare. No mismatched tables
- Installed the patch. No errors in MySQL or PostgreSQL
- Ran schema compare. 3 mismatched tables in both MySQL and PostgreSQL:
upload:
upload
* primary key:
declared: array('vid', 'fid')
actual: array('fid', 'vid')
* indexes fid: missing in database
* indexes vid: unexpected (not an error)
locale:
languages
* indexes list: missing in database
book:
book
* unique keys nid: missing in database
* indexes nid: unexpected (not an error)
Re-ran test with fresh Drupal 5.3
- Installed all modules
- Installed Drupal 6 Beta 3 over Drupal 5 and ran upgrade. No errors in MySQL or PostgreSQL
- Installed Schema module
- Ran schema compare. No mismatched tables
- Installed the patch. No errors in MySQL or PostgreSQL
- Ran schema compare. 3 mismatched tables in both MySQL and PostgreSQL same as above.
Comment #94
catch: jaydub - please test with the latest CVS checkout, or at least the latest development tarball:
http://drupal.org/node/97368
http://drupal.org/node/320
Also, you'll need to apply the patch before you run update.php from 5.3 to 6.x since that's the main issue here.
Comment #95
catch: Needs a re-roll in light of this: http://drupal.org/node/194310
Comment #96
pwolanin CreditAttribution: pwolanin commented: Simple re-roll for system.install function numbering, etc. - not tested yet.
Comment #97
catch: #96 applies cleanly and passes schema testing on a D5-6 update (MySQL). The missing locale index is still there but was being dealt with elsewhere, I think.
Comment #98
pwolanin CreditAttribution: pwolanin commented: Re-roll including putting the proper schema API stuff into locale.install based on http://drupal.org/node/194310#comment-637940
Comment #99
pwolanin CreditAttribution: pwolanin commented: ok - getting some errors on PG upgrade:
I'm not sure if the first set is due to this patch - second set certainly is
Comment #100
pwolanin CreditAttribution: pwolanin commented
Comment #101
pwolanin CreditAttribution: pwolanin commented: ok - the first error is due to bad use of quotes in a different patch. See:
http://drupal.org/node/188498#comment-646495
new patch attached which fixes locale module error above (and renumbers locale's update functions).
With this patch schema module reports 100% match of schema to definition with 5.x->6.x update on postgres
Comment #102
pwolanin CreditAttribution: pwolanin commented
Comment #103
catch: Tested again on MySQL and this fully passes schema testing on a 5-6 upgrade. Should be ready.
Comment #104
Gábor Hojtsy: Can we run benchmarks on some key pages with and without the patch? Dries had a concern up above. I read all the follow-ups in this issue, and it seems like all the indexes added came from actual site experience, slow query logs, worst-performing indexes, etc. It would be great to validate this with some benchmarks though.
Comment #105
David Strauss: "It would be great to validate this with some benchmarks though."
Just a friendly reminder that this patch isn't just about performance. The new and improved UNIQUE indexes improve data integrity.
Comment #106
Gábor Hojtsy: Also, the "stupid" comment on user roles should be toned down a bit IMHO. We know not everything in Drupal is best architecture-wise, but we can document this more politely IMHO. (Of course this is a minor little nitpick which I can modify on commit even, so take your time with the benchmarks instead.)
Comment #107
webchick: Made benchmarking this into a GHOP task: http://code.google.com/p/google-highly-open-participation-drupal/issues/...
Comment #108
Gábor Hojtsy
Comment #109
pwolanin CreditAttribution: pwolanin commented: No one seems to have claimed this as a GHOP task yet - but we do need it done soon...
Comment #110
webchick: I specified as part of the task that it expires when this issue is marked 'fixed'. So anyone is free to benchmark, regardless of whether a student signs on to do this or not.
Comment #111
catch: system_update_6041() just got committed, so this needs a reroll.
Comment #112
CorniI CreditAttribution: CorniI commented: I claimed this task one minute ago in GHOP ;)
I'm in IRC Corni
Comment #113
Gábor Hojtsy: Great, awaiting your feedback.
Comment #114
pwolanin: here's a re-rolled patch - it applies cleanly, but not tested further.
Comment #115
pwolanin
Comment #116
catch: Just a quick note to say Corni kindly let us know that he'll be unable to begin this task until at least the weekend, and it's been reopened for now on the GHOP task list. So this is still open for anyone to bench.
Comment #117
pwolanin: ok - some very crude benchmarks - I installed all core modules that create tables, created some users and content (~1500 nodes) with devel_generate, added a couple of feeds manually. Increased items on the front page to 30.
(note - this exercise makes me think poll module and aggregator have some bugs).
using ab (1.3) across my local net to the server (an old Dell running kubuntu with Apache 2.2, PHP 5.2, MySQL 5.0 - no query cache).
Aggregator:
Tracker
did a mysql dump (omitting create table commands):
dropped all tables. Applied patch from #114. Installed Drupal and enabled modules. Truncated all tables from mysql prompt and then loaded the SQL dump back in.
Node
Aggregator
Tracker
So - the benchmarks in this case look essentially unchanged. However, that's expected since many of the index changes relate to data integrity, not speed.
Comment #118
klando: I haven't parsed the whole patch, nor applied it.
But, my 2 cents:
On tables like that (only a few thousand rows), indexes are not really used, and the change is invisible because of caching.
Data integrity should also be enforced by using foreign keys.
I don't know what the benefit is of changing a primary key like that: PK(a,b) vs PK(b,a) - what do you really expect?
Anyway, the best thing is to look at the EXPLAIN and EXPLAIN ANALYZE results.
Benchmarks must be done on bigger databases.
Comment #120
catch: @klando Reordering of primary keys (and other indexes) - MySQL reads indexes left to right, so (a,b) gives a free index on (a). This eliminates the overhead of some duplicate indexes which are removed in the patch.
MyISAM doesn't have proper (any?) support for foreign keys.
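catch's left-to-right rule can be sketched in plain SQL (the table and column names here are illustrative, not from the patch):

```sql
-- MySQL can use a composite index for any leftmost prefix of its columns.
CREATE TABLE example (
  a INT NOT NULL,
  b INT NOT NULL,
  KEY a_b (a, b)
);

EXPLAIN SELECT * FROM example WHERE a = 1;            -- can use a_b: (a) is a leftmost prefix
EXPLAIN SELECT * FROM example WHERE a = 1 AND b = 2;  -- can use the full a_b index
EXPLAIN SELECT * FROM example WHERE b = 2;            -- cannot use a_b: (b) alone is not a prefix
```

This is why a separate index on (a) next to an existing (a,b) is usually redundant, which is what the patch relies on when removing duplicate indexes.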
Comment #121
pwolanin: @klando - feel free to come up with and do better benchmarks. We all know ab is not the best tool.
Otherwise, I think this is RTBC....
Comment #122
catch: Anyone I 'know' is welcome to a 5.x dump with c. 19k nodes, 2.5k users and 250k comments for benchmarking if that helps. But yes, I'm moving this back to RTBC. Benchmarks are good, but this is about more than performance (and it would only be an issue if they showed a performance hit, which they don't, so we only win here).
Comment #123
David Strauss: "so (a,b) gives a free index on (a)"
That's 99% true. There is a touch more overhead for MySQL to use an (a, b) index for just (a) than a simple (a) index. But *generally*, that doesn't merit the disk space and processing time needed to create an index on (a) when you already have (a, b).
Comment #124
CorniI: OK, I saw that I'm able to claim a task and do it within a week, not in 2 days, so I'll do this Saturday. But is this already done, or do you want me to benchmark this too (Platform: Gentoo Linux, MySQL 5.0, PHP 5.2.5, Core2Duo E6320, 4GB RAM)?
Comment #125
catch: David Strauss, thanks for the clarification!
@CorniI: until this gets committed it's not done yet. Alternative benchmarks (and with a bigger dataset) would be great. As would running some relevant queries through EXPLAIN like klando suggested, although that's outside the remit of the GHOP task, of course.
Comment #126
pwolanin: @Corni - an additional round of benchmarking would be well worthwhile if you are set up to do it.
Comment #127
CorniI: OK, I've a db of ~580MB created (devel_generate is teh rul0r ;)). Is this enough, or shall I add one or two hundred megs? I'll post the benchmark the next day (Saturday), so you're able to commit and fix this issue as fast as possible :)
Comment #128
pwolanin: @CorniI - see if you can find someone to show you how to put a translation in too, plus some feeds, etc. Maybe also some user profile categories and data if possible.
Probably that's big enough - make sure that you can dump and re-load without the table structure (unless you can think of a better way to bench the same content with different indexes). As above, make sure you have no query cache. Hit several of the relevant pages - such as: a listing of users with a profile category, one or more taxonomy pages, aggregator, and a page with lots of comments.
Comment #129
CorniI: This is my GHOP task; if any of the maintainers can say it's OK, then I'll post it at Google's tracker, so you can mark my task as closed :)
Specs:
Core2Duo E6320, overclocked to 2.24 GHz, 4GB RAM, unknown HDD
Preparations:
I prepared a database of ~585M
>102000 Rows in comments
~84000 in node
~27000 in term_data
~16850000 in term_node
200000 in users
and 10000 in vocabulary
All modules are enabled except search; the search index can't be built with that amount of data (at least, with 100 nodes/cron, cron stops executing with PHP errors, and with fewer nodes/cron I get the message "cron failed" when running it manually).
I manually added one forum container + forum + 1 thread, and some randomly chosen heise news feeds. I enabled all blocks and scattered them randomly over the page ;)
I ran cron directly before I started ab.
What I benched:
the aggregator page
the tracker page
the index
a heavily loaded, big node with >4200 pages of comments and a huge number of terms
I exited X for the benchmarks, so there shouldn't be much load except from ab and apache/mysql.
After benching the unindexed tables, I deleted all tables, applied the patch and reinstalled Drupal. I activated all modules, truncated all tables, imported my old SQL dump, and redid the benchmarks.
All benchmarks are done with
$ab2 -c1 -n500 url
Compared results:
/node
unpatched
patched:
Result:
for an unknown reason, the patched version is a bit slower than the unpatched Drupal
heavily overloaded node:
unpatched:
patched:
Result:
Here we have a slightly faster Drupal - as expected for the database-heavy stuff
/tracker
unpatched:
patched:
Result:
For an unknown reason, we take more time :(
/aggregator
unpatched:
patched:
Result:
Here we gain a bit of speed, not that much, but, a bit :)
Overall conclusion: I can't recommend anything, sorry guys. But I'd commit the patch due to the better node load time.
Comment #130
SteveR
Comment #131
SteveR: Just subscribing
Comment #132
David Strauss: These performance tests have been entirely untargeted for what this patch affects.
Comment #133
CorniI: And for what *is* the patch? And what should I have tested as part of GHOP #63?
Comment #134
hass: Needs a reroll for system update 6042 -> 6043
Comment #135
pwolanin: @DavidStrauss - please do share which benchmarks you think would be valuable - this issue has been held up on this.
Looking at the patch, I see changes to the aggregator tables - though probably no performance change from these. The changes for blocks and book are more about data integrity. There might be some performance gain from the changes to contact, maybe some gains when listing filters, perhaps a gain for poll when checking whether a user already voted, a significant gain when looking up a particular user's access in statistics module, perhaps a gain when listing vocabularies, and perhaps a gain when looking up users by role. So benchmarking these last 3 seems useful.
@CorniI - looking at the number of bytes you got per request, I think there was some flaw in your process - it looks like you were just hitting the "Access denied" page. You have < 300 bytes, while I had, for example, 73281 bytes. Did you have access enabled for anonymous users (especially for aggregator, etc.)?
I had this problem when I first tried benchmarking - try running a test with ab at high verbosity (e.g. ab -v6 -n1) so that you can be sure you are getting a 200 response code. Also, you might want -c5 or some such.
Comment #136
pwolanin: re-rolled patch - applies cleanly.
Comment #137
pwolanin: @CorniI - also, you could run a set of EXPLAIN queries with the two sets of indexes and show that we go from a filesort to using the index. See the very top of the issue for one example.
Comment #138
catch: Still RTBC.
Comment #139
Dries: I've tested this patch and it seems to work. While the performance tests might not be extensive, I'll trust the community's judgement with regard to performance. I also value David's point about improved data integrity, although it might not be the best way to guarantee this. All in all, excellent job folks. :-)
Comment #140
catch: Nice to see this go in :)
Just realised that some of the reason for the inconclusive benchmarking may be that the biggest performance gain here was in system - already dealt with separately in #185126, so it didn't make it into the final patch despite being mentioned a lot early on.
Comment #141
CorniI: What do you expect me to do for GHOP now? Can David Strauss give me a hint? I feel a bit lost ;)
Comment #142
AjK: Just like to point out that making the users.mail field UNIQUE has now made it impossible to register a user without a unique email address. For the most part people would say "yes, and so....?". Well, the old drupal.module (moved from core to contrib as site_network.module) uses this. I.e., it would create a new user without an email address. So that's one "use case" for a user without an email address.
You can see this now by creating a d.o account and then using that to create a new login on g.d.o (you get a user without an email address which you can subsequently add/alter after a successful login).
Here's the error message:-
This is derived from line 314 of users.module:-
It's worth noting that Drupal records "New external user" in the watchdog just fine; it just fails to create the user account since the mail field isn't present in that INSERT. Since the anonymous user already has an empty users.mail field, the DB quite rightly rejects the SQL.
If we are going to start making fields "unique" should we not be testing for uniqueness and the required presence of the field in any INSERT?
As for site_network.module, I'm not entirely sure which way to go. I could make it return the email address, but (a) I'm not sure that's wise in a world of "my data is private, don't give it away at all" and (b) it would make site_network.module not backward compatible with the older drupal.module.
Anyway, let's not discuss site_network.module on this issue. From my perspective this is "should we be making db fields UNIQUE without actually testing for uniqueness?" and can we have a generic query building line (as above) that's likely to fail if a unique field is missing? From the above, Drupal worked just fine and was unaware of a problem. It just failed to create the user.
Comment #143
Gábor Hojtsy: AjK: (offtopic) OpenID module is in a similar situation, so it requires that you provide your email address before saving you as a user (and it even validates the email address with an activation mail sent).
(ontopic) Well, yes, if we have unique fields, the code should check whether there is an existing row with that value. I just encountered the same problem with taxonomy terms (on the forum admin interface) while testing. I accidentally tried to add two forum terms with the same name, and I got a PHP error, instead of a Drupal warning. That's the same unique field problem.
Comment #144
NancyDru: Andy, I don't know about your systems, but I've been unable to add users (aside from the drupal module) without a unique email address since 4.7.6 (where I started). I find it a bit annoying because I sign myself up as "admin" (user/1) and as "nancy" (usually user/2), so I have to use two separate email addresses. (Maybe this is another use case.)
I would be more concerned that drupal/site_network doesn't store an email address. It should be retrieving that and saving on my site.
Comment #145
AjK: Gábor (offtopic), yup, the first thing I did was to look at OpenID to see what it was doing. When transferred to the OpenID server you have to supply a "nickname" and an "email address", so I thought that's how it was working. However, when I actually tried a full-cycle OpenID login, all I got was a drupal_set_message("Login failed", "error") and nothing in the watchdog log as a clue to why it failed. I'm assuming at this point that it failed due to the way I create "test sites" on my local server, but I'm not entirely sure yet. I would have expected a failed login to leave a record in the watchdog, though.
Getting back on topic, having looked at the patch it's entirely "database schema" related and no application code at all. I think if database schema changes are made then the application also needs to alter to cater for any specific instances (like those we are now facing). Imho this is one patch that should be rolled back.
Comment #146
AjK: Looking into this a little further, I initially tried to add some simple application checks. For example, using db_affected_rows() to ensure the INSERT was successful was my first method. Another method could be to check that the UNIQUE fields exist and are in fact unique.
However, user_save() should always return a user object, so trying to return something different to indicate a failure (such as FALSE) breaks the API. And we're not in a position to start changing the API at this late stage of the release cycle.
So, in the case of user_save() there doesn't appear to be a bailout condition to handle errors of this type.
I've only looked at users.mail being made UNIQUE. As Gábor points out above, he has hit a similar problem elsewhere in taxonomy. Given that this patch "liberally sprinkled" database constraints around without any application code to support them, I'm still of the humble opinion this needs rolling back (even though I am a great fan of making the database work for you!).
Comment #147
pwolanin: So - at the very least the users one probably needs to change now. Webernet also pointed out that the new unique indexes may cause a failure during 5.x -> 6.x updates. For example, if someone already had two taxonomy terms with the same name, or two users with blank e-mail addresses.
Comment #148
pwolanin: This makes the new user mail unique key a simple index instead.
The patch also needs to add validation to the taxonomy forms, and checks in the update code to ensure that the unique conditions now imposed on the taxonomy tables are met before adding the unique keys.
Comment #149
chx: Adding unique keys is a feature; past feature freeze they never should happen. I have not looked into this issue before, but the unique keys added here should be removed IMNSHO and re-added, one by one, in separate feature issues for D7. The reason? They could break updates.
Comment #150
pwolanin: a more complete rollback patch - untested
Comment #151
chx: I do not want to RTBC yet, but a CNR is in order.
Comment #152
David Strauss: Removing the UNIQUE index on the mail column is *wrong*. We use mail addresses to look up accounts; we cannot have duplicates. If you want the value absent for the mail column, allow it to be NULL. Multiple NULL values aren't considered a violation of the UNIQUE constraint.
Comment #153
pwolanin: @David - we all agree it is wrong, but user_save() does not do any checks to ensure that the query won't fail. In other words, the amount of work on the PHP code to support this properly could be substantial, and is probably a bad idea at RC1. Also, is the duplicate-NULL behaviour true across the DBs we support (or hope to support)?
Comment #154
David Strauss: @pwolanin How are we handling user_save() where the username is already taken? We could do the same thing with an email address that's already in use. And, yes, allowing duplicate NULL values in a UNIQUE index is a widespread feature in databases, not just a MySQL thing.
I just don't see how we can justify having things like password recovery use email lookups and not enforce uniqueness of email addresses for accounts.
What is it: are they unique or not? We should either enforce uniqueness across the board or remove the checks entirely.
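A minimal sketch of the NULL behaviour David describes (an illustrative table, not Drupal's actual users schema, which stores an empty string rather than NULL for a missing address):

```sql
CREATE TABLE accounts (
  uid  INT NOT NULL PRIMARY KEY,
  mail VARCHAR(64) DEFAULT NULL,
  UNIQUE KEY mail (mail)
);

INSERT INTO accounts VALUES (1, 'a@example.com');  -- OK
INSERT INTO accounts VALUES (2, NULL);             -- OK
INSERT INTO accounts VALUES (3, NULL);             -- still OK: NULLs don't collide in a UNIQUE index
INSERT INTO accounts VALUES (4, 'a@example.com');  -- fails with a duplicate key error
```

Two empty strings ('') do collide, though, which is exactly the failure mode reported above for accounts created without an email address.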
Comment #155
pwolanin: @David - the user registration/edit form does validation checks to ensure that the name and e-mail are unique.
However, user_save() does not check. So right now, if you are foolish enough to make an API call to user_save() with a duplicate username, I think you'll get a failed query as well. And worse, you may get back the uid of the previously inserted user, and overwrite their data in hook_user.
http://api.drupal.org/api/function/user_save/6
Opened this issue to attempt some modicum of error checking in user_save(): http://drupal.org/node/204705
Comment #156
catch: Tested #150 on a clean install, and on a clean 5.x -> 6.x upgrade.
No issues, passed schema testing etc.
Marking back to review. However, IIRC this was committed before RC1 was released - do we need an upgrade path from RC1 to RC2?
Comment #157
coupet: Data integrity versus code logic?
Comment #158
pwolanin: @coupet - the problem is the lack of synchronization between the code and the DB constraints. If the code has not been updated to respect the new DB constraints, then it's a bug in the code.
Comment #159
Gábor Hojtsy: Actually, David had a reason to CNW this in #152. Along the same lines, we should check whether existing code validates uniqueness, and only remove the constraint where such validation is not in place and is not feasible to implement IMHO.
(I also got a note about an upgrade report with locale tables with unique indexes introduced in Drupal 6, which is also due to broken data: http://drupal.org/node/205613 but I am not sure at all that we should keep supporting broken data).
Comment #160
pwolanin: @Gabor - in #152 "wrong" is more of a moral/idealist/DBA judgment than a PHP code judgment. In Drupal 5, the system works OK without this constraint. If you want to make this column unique for D6, it will require a new update function at the very least, and perhaps a reworking of the schema so the mail column is no longer NOT NULL, plus a reworking of user_save() to allow it to save a NULL value to the mail column.
Comment #161
Gábor Hojtsy: pwolanin: Well, understood.
Now, we need to think about RC1 users. Although we did not support upgrades from betas, we have a tradition to support upgrades from RC1 onward. After all this is an RC, so people are possibly using it as "this is nearly a release", and rightly so. So as catch noted in #156, we should think about providing an upgrade path. The tricky part is that we need to modify the existing update code as well, since we would not like to cause problems there with the unique indexes added on upgrades (from Drupal 5), but we'd like to support upgrades from D6RC1, where the unique indexes are in place.
Comment #162
catch: So we have:
5.x - RC1 - RC2
RC1 - RC2
5.x - RC2
System updates run sequentially, right? Is it off the wall to suggest a new one to drop the uniques and create the new indexes, leaving the existing update intact? That may be stupid, but if it's not, then it'd mean all cases get the same upgrade path - just 5.x -> RC2 (and onwards) gets it all in one go, with no special cases.
Comment #163
Gábor Hojtsy: The problem with the unique indexes is that you might have a "broken" data set (i.e. multiple user accounts without an email address), so you get an error that the index cannot be applied. This issue was reopened in part because of this symptom. Simply put, we should not leave any of the later-dropped unique indexes created on the update path.
Comment #164
catch: Of course, doh! I should've realised from reading the issue back a few posts. Thanks for the explanation.
Comment #165
Gábor Hojtsy: A few things we can do (all start with applying the changes to the existing update code, but continue in alternative ways):
1. Add a new update function which detects if there are unique indexes already added (RC1), and "demotes" them to plain indexes. If these unique indexes are not detected, do nothing.
2. Add a variable_set() to the existing, modified update function to store the info that we are running through it from a pre-RC1 update (those who already run the update will not have that variable). Add a new update function which demotes the unique indexes only if that variable is not set. If the variable is set, do nothing.
The first acts on database schema detection; the second acts on a missing variable. The effect is the same: after the update, the database will be in a consistent state regardless of a direct 5 => RC2 update or 5 => RC1 => RC2.
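Whichever detection method is used, the "demotion" itself reduces to something like the following (MySQL syntax; the index names are an assumption based on the failed queries quoted in #168, and a real update function would go through the schema API rather than raw ALTERs):

```sql
-- Demote the RC1-era unique keys to plain indexes.
ALTER TABLE users DROP INDEX mail, ADD INDEX mail (mail);
ALTER TABLE term_data DROP INDEX vid_name, ADD INDEX vid_name (vid, name);
```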
Comment #166
pwolanin: @Gabor - is there a core schema API function for determining the actual type of an index, or does that require the schema module?
Comment #167
Gábor Hojtsy: Well, I don't know of such introspection functions in Drupal itself; I looked into it now but did not find anything, not even for checking whether an index exists at all, let alone the type of the index. What we have done previously is communicate 'this update function ran' type of things with variable_set() and variable_get(); there are quite a few examples of this in (previous) update code.
Comment #168
Gábor Hojtsy: Just tested the update on a data set of a running Drupal site (before the patch):
Failed: ALTER TABLE {users} ADD UNIQUE KEY mail (mail)
Failed: ALTER TABLE {term_data} ADD UNIQUE KEY vid_name (vid, name)
Failed: ALTER TABLE {term_node} ADD PRIMARY KEY (tid, vid)
By the way, from the error reports, all are the result of the same bad data, but fixing that would be too far-fetched for this "feature update", so I can validate the need to not add these indexes in a rush.
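For anyone hitting those failures, the offending rows can be found before the ALTERs run with queries along these lines (unprefixed table names; adjust for your table prefix):

```sql
-- Users sharing a mail value (several empty addresses count as duplicates):
SELECT mail, COUNT(*) AS n FROM users GROUP BY mail HAVING n > 1;

-- Terms duplicated within a vocabulary:
SELECT vid, name, COUNT(*) AS n FROM term_data GROUP BY vid, name HAVING n > 1;

-- term_node rows that would collide on the new (tid, vid) primary key:
SELECT tid, vid, COUNT(*) AS n FROM term_node GROUP BY tid, vid HAVING n > 1;
```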
Comment #169
fractile81: I just wanted to chime in and say that the update fails for me when adding the UNIQUE to mail as well. Also, option 2 in #165 sounds like the path of least resistance for fixing this IMHO.
Comment #170
pwolanin: here's a go at an update function per option 2 in #165. Not tested - please have at it.
It leaves the one on blocks - is that one good? It's on ('theme', 'module', 'delta'), which seems safe. The module/delta combination must be unique for each block, and you can't have two themes with the same name.
Also seems ok based on the way the form is built:
http://api.drupal.org/api/function/block_admin_display_form/6
Though if the blocks one is ok, maybe we should leave the one on filters too? ('format', 'module', 'delta')
Comment #171
David Strauss: We should try to leave the one on filters; it helped us find one bug already.
Comment #172
catch: #170 works smoothly for an RC1 to 6.x-dev upgrade and passes schema.module testing. Didn't test 5.x -> RC1 -> 6.x or straight 6.x yet.
Comment #173
Gábor Hojtsy: Agreed with pwolanin and David on #171. I also have an interesting update issue in my test data with the tid, vid index on term_node added here, but I need to investigate that more.
Comment #174
pwolanin: @Gabor - that tid, vid index is just a conversion from the unique key (i.e. I didn't spend a lot of thought on it).
Setting to CNW to put back the filter unique key (unless I'm misreading #173?).
Comment #175
Gábor Hojtsy: pwolanin: yes, you understood it right.
I took some time to dig down and get insight on the tid, vid issue (without the patch):
- system_update_6001 adds the vid field to term_node so that term relations can be versioned
- it does
UPDATE {term_node} t SET vid = (SELECT vid FROM {node} n WHERE t.nid = n.nid)
- my term_node table had nid, tid relations for nonexistent nodes, so their vid became 0
- because there were multiple (10) instances of this, the addition of the unique index on tid, vid failed
So to solve this, I added
'DELETE FROM {term_node} WHERE nid NOT IN (SELECT nid FROM {node})'
to my usual pre-upgrade cleanup scripts, which makes the whole update work fine and "go green" (i.e. no failed queries) at the end. Note that I have a very old data set which was migrated to Drupal from another system in 2003 and has run on Drupal since then, so these 10 faulty associations might have come in at any time. That said, these kinds of errors will probably be rare, but the above query should not hurt (it just removes term_node associations for nodes you have already deleted), so I'd suggest adding it to the code just before we add the tid, vid unique index.
Comment #176
Gábor HojtsyBTW just to note, my update is all green now (without the patch) because I have added that DELETE and because I cleaned up the term_data and users table data, where the unique indexes caused problems earlier. I had two users with the same email address and a few terms with same names (tids in succession, so probably a symptom of PHP crashing and users resubmitting). So as said, my original data set validates the need for this patch, as others will not do these fine grained data set corrections I do, and the above note in #175 is just as addendum to add a safe data cleanup query into Drupal to help make a unique index work, which could otherwise fail as well.
Comment #177
pwolanin: @Gabor - will that query:
'DELETE FROM {term_node} WHERE nid NOT IN (SELECT nid FROM {node})'
scale to (e.g.) 200,000 nodes? Do we need to do that in batches?
Comment #178
catch: I have three rows returned by
SELECT * FROM term_node WHERE nid NOT IN (SELECT nid FROM node);
This is a database from 4.5 which had a phpbb2drupal import run on it during 4.7. So rare, but not unheard of.
Comment #179
catch: If a site has 200,000 rows returned by that query, they have more serious problems than this update failing IMO.
Comment #180
Gábor Hojtsy: pwolanin: I have already quoted an UPDATE query above which "joins" the term_node and node tables, so it will touch even more rows. It does not work with a list of 200,000 values though. I am not sure how all the databases resolve these IN subqueries, and I don't have a bigger data set to performance-test it with.
Comment #181
pwolanin: how about:
'DELETE FROM {term_node} WHERE vid = 0'
? Here's a re-rolled patch which leaves the filter table alone and adds this extra cleanup query.
Comment #182
pwolanin
Comment #183
David Strauss: Faster alternative to #178:
SELECT * FROM term_node tn
LEFT OUTER JOIN node n ON n.nid = tn.nid
WHERE n.vid IS NULL
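If the goal is cleanup rather than inspection, the same join can drive a MySQL multi-table DELETE (a sketch only; the committed patch went with the simpler DELETE ... WHERE vid = 0 from #181):

```sql
-- Remove term_node rows whose node no longer exists.
DELETE tn
FROM term_node tn
LEFT OUTER JOIN node n ON n.nid = tn.nid
WHERE n.nid IS NULL;
```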
Comment #184
pwolanin: ran both an update from 5.x and a clean install with this patch (w/ MySQL 5.0). Schema module reports no mismatches, so the patch code is at least correct.
Comment #185
yched: http://drupal.org/node/206941 looks like another occurrence of non-code-backed unique indexes on the {term_data} table.
Comment #186
catch: I can confirm this passes schema.module with a clean 5.x -> 6.x upgrade.
'DELETE FROM {term_node} WHERE vid = 0'
ought to work fine, since we already have the vid column by the time we get to system_update_6044(). It could be added to 6001 as well, but there's no need, since only sites which already upgraded to RC1 with bad entries in term_node (probably zero sites) will be affected by the unique index anyway. (I have to admit to running Gabor's subquery on my actual site to clean it out, so I no longer have test data without restoring an older backup.) So, RTBC.
Comment #187
catch: Hmm, does this need testing on pgsql as well?
Comment #188
Pancho
Comment #189
chx: No, it does not need testing with PostgreSQL. Schema API just works. Very nice job, and it cures my fears; thanks to all who participated (pwolanin, David Strauss, catch and Gabor - hope I have not forgotten anyone).
Comment #190
hswong3i: Rerolled patch from #181 against CVS HEAD. Tested with Debian etch PostgreSQL 8.1.9; passes system_update_6044() with no problems.
Comment #191
Gábor Hojtsy: Thanks for the hard work and all the testing, folks; committed.
Comment #192
Anonymous: Automatically closed -- issue fixed for two weeks with no activity.
Comment #193
sammos: Hello, I'm confused about something - I thought that installing 5.7 would pick up this patch, but the sitedoc module still recommends adding indexes. Also, my 5.5 to 5.7 update script picked up no new updates. So do existing Drupal installs need to manually add the indexes? And if so, has anybody made an update script? Thanks!
Comment #194
NancyDru: Well, Sitedoc tells you that there are no indexes, not that you have to have them.
This was not a back-ported fix. You may go in and find the applicable ones to back-port manually.
Comment #195
sammos: Yes, I know - I didn't mean to imply otherwise. But I'd like to have them, and figured maybe somebody wrote an update script for v5. I'll see about writing one and posting it. Any thoughts welcome on whether it's better to backtrace this thread or to look at the code in the sitedoc module, as it also made suggestions. Thanks!
Comment #196
NancyDru: Hmm, I don't remember what suggestions I would have made other than viewing this thread. I'll have to go look again.
Comment #197
sinmao: I've upgraded to 5.10. Do I still need this patch? Thx.
Comment #198
sinmao: Hi, so were you able to make an update script?
Comment #199
NancyDru: This is for 6.x, not 5.x.
Comment #200
sinmao: Um... what should I do with 5.10, then? Thx for the reply.
Comment #201
NancyDru: If you are having specific problems, manually apply the indexes as shown in this patch.