Many core tables have very poor indexes - for example {system}. At the very least we should do a basic audit of the core tables where queries run on every page load.
For a start, the system table. The most common queries are of the form SELECT * FROM {system} WHERE type = 'module' AND status = 1
and SELECT * FROM {system} WHERE type = 'module' AND name = '%s'
Right now, neither of these queries uses any index. Also, the existing index 'weight' is useless and will never be used.
mysql> explain select * FROM system where type = 'module' AND status = 1 ORDER by weight, filename;
+----+-------------+--------+------+---------------+------+---------+------+------+-----------------------------+
| id | select_type | table  | type | possible_keys | key  | key_len | ref  | rows | Extra                       |
+----+-------------+--------+------+---------------+------+---------+------+------+-----------------------------+
|  1 | SIMPLE      | system | ALL  | NULL          | NULL | NULL    | NULL |   41 | Using where; Using filesort |
+----+-------------+--------+------+---------------+------+---------+------+------+-----------------------------+
if we add the key: KEY `type_status` (`type`(8),`status`,`weight`,`filename`)
we do much better (see module.inc line ~61 - both queries have the same explain result):
mysql> explain select * FROM system where type = 'module' AND status = 1 ORDER by weight, filename;
mysql> explain select * FROM system where type = 'module' AND status = 1 AND bootstrap = 1 ORDER by weight, filename;
+----+-------------+--------+------+---------------+-------------+---------+-------------+------+-------------+
| id | select_type | table  | type | possible_keys | key         | key_len | ref         | rows | Extra       |
+----+-------------+--------+------+---------------+-------------+---------+-------------+------+-------------+
|  1 | SIMPLE      | system | ref  | type_status   | type_status | 28      | const,const |   13 | Using where |
+----+-------------+--------+------+---------------+-------------+---------+-------------+------+-------------+
Adding an index like KEY `name_type` (`name`(16),`type`(8))
optimizes the other sort of query.
mysql> explain SELECT * FROM system where type = 'module' AND name = 'system';
+----+-------------+--------+------+-----------------------+-----------+---------+-------------+------+-------------+
| id | select_type | table  | type | possible_keys         | key       | key_len | ref         | rows | Extra       |
+----+-------------+--------+------+-----------------------+-----------+---------+-------------+------+-------------+
|  1 | SIMPLE      | system | ref  | type_status,name_type | name_type | 72      | const,const |    1 | Using where |
+----+-------------+--------+------+-----------------------+-----------+---------+-------------+------+-------------+
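Taken together, the two keys quoted above could be added with DDL along these lines (a sketch against the D6 {system} table; the prefix lengths mirror the keys as written above):

```sql
-- Sketch only: the two indexes proposed above for {system}.
-- The (8)/(16) prefix lengths keep the index small while staying
-- selective enough for the common type/status and type/name lookups.
ALTER TABLE system
  ADD KEY type_status (type(8), status, weight, filename),
  ADD KEY name_type (name(16), type(8));
```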
Starting patch attached just for system.schema.
Comment | File | Size | Author |
---|---|---|---|
#190 | too-unique-164532-182.patch | 4.8 KB | hswong3i |
#181 | too-unique-164532-181.patch | 4.75 KB | pwolanin |
#170 | too-unique-164532-170b.patch | 5.85 KB | pwolanin |
#150 | too-unique-164532-150.patch | 4.48 KB | pwolanin |
#148 | too-unique-164532-148b.patch | 1.44 KB | pwolanin |
Comments
Comment #1
pwolanin CreditAttribution: pwolanin commented: A small start too on taxonomy, suggested by David Strauss.
Comment #2
moshe weitzman CreditAttribution: moshe weitzman commented: subscribe ... thanks for diving into this.
Comment #3
Zothos CreditAttribution: Zothos commented: Where else can indexes increase performance? It can't only be the taxonomy table... :P
Comment #4
BioALIEN CreditAttribution: BioALIEN commented: Finally someone has stepped up to this. Thanks pwolanin!
Last month I analysed the database of a 5.x install. Here are the comments I noted:
1. node table, phpMyAdmin spit out a
"PRIMARY & INDEX keys should not both be set for column 'nid'. More than one INDEX key was created for column 'status'."
2. blocks table, NO INDEX DEFINED! - added INDEX on delta or module columns?
3. flood table has no INDEX! - Create INDEX on timestamp?
4. All CCK generated tables require "nid" to be set as INDEX. Issue submitted to CCK devs: http://drupal.org/node/162535
The list goes on but it is related to contrib modules rather than core itself.
Hope this helps!
Comment #5
pwolanin CreditAttribution: pwolanin commented: Thanks - there was an article recently (http://www.dave-cohen.com/node/1066) that pointed out that some tables such as blocks are missing a PRIMARY key (bad).
Comment #6
pwolanin CreditAttribution: pwolanin commented: The frickin Actions module's table has a broken schema index declaration, no primary key, and no validation function to avoid inserting duplicates.
Also, the {locales_target} is missing a Primary key - should it be (lid, language, plid) or (lid, language, plural) or something else???
2 of 3 of the search module tables don't have a Primary key- is there another patch pending for these tables?
Comment #7
ChrisKennedy CreditAttribution: ChrisKennedy commented: http://drupal.org/node/147298 has some earlier work on this issue by David, if you haven't seen it.
Comment #8
David Strauss: Subscribing.
Comment #9
pwolanin CreditAttribution: pwolanin commented: wtf? Trying to double-check that the suggested patch by David Strauss for removing the bid column from {blocks} makes sense - there is obviously some confused code in block module:
So somehow the delta from {boxes} and the bid are the same in some cases?
Comment #10
David Strauss: @pwolanin My patch in the old issue isn't nearly complete, and I wouldn't make such a change on this issue because this issue is performance-focused.
The reason I removed the bid column on the old issue's patch is because it's redundant, at best. We already identify blocks by module and delta; it's unnecessary to introduce another column. Also, because we so casually delete and insert rows from/into the blocks table to refresh it, an auto_increment column isn't even a reliable primary key.
Comment #11
pwolanin CreditAttribution: pwolanin commented: @David Strauss - yes - seems an autoincrement doesn't make much sense. Still, perhaps defining a unique index on (module, delta, theme) would be simpler than removing this column (at least for 6.x)?
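The unique index pwolanin suggests might look like this (a sketch; on MySQL, prefix lengths or a shorter theme column may be needed if theme stays at VARCHAR(255), as the key-length problem later in the thread shows):

```sql
-- Sketch: enforce one row per (module, delta, theme) in {blocks},
-- making the auto_increment bid column redundant as an identifier.
ALTER TABLE blocks
  ADD UNIQUE KEY module_delta_theme (module, delta, theme);
```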
Comment #12
bjaspan CreditAttribution: bjaspan commented: Subscribe.
Comment #13
m3avrck CreditAttribution: m3avrck commented: This can still make it into Drupal 6 since it is both a bug and performance fix.
Comment #14
catch: Just a quick note that http://drupal.org/node/146688 was a duplicate, and has a different suggested index to the latest patch, doesn't it?
David Strauss' post from that issue here for ref:
Comment #15
catch: http://drupal.org/node/172724
was also duplicate.
Comment #16
coupet CreditAttribution: coupet commented: subscribe... we need a comprehensive solution for ver. 6
Comment #17
jaydub CreditAttribution: jaydub commented: subscribe
Comment #18
dmitrig01 CreditAttribution: dmitrig01 commented: Bump. Needs to be re-rolled for actions -> triggers.
Comment #19
pwolanin CreditAttribution: pwolanin commented: Split off the Trigger module portion here: http://drupal.org/node/178650
Comment #20
pwolanin CreditAttribution: pwolanin commented: Re-rolling patch with the rest of the changes in the above patch.
Comment #21
catch: This patch, http://drupal.org/node/147298, is also related, although I'm not sure if/which should be marked as duplicate.
Comment #22
catch: #171685 was a duplicate.
Comment #23
Dries CreditAttribution: Dries commented: I'd love to commit this patch but it is hard to verify whether the changes actually improve things or not. All the information seems to be spread across multiple issues. Is there a good way to test this, or do we just commit this and assume it does more good than harm?
Comment #24
JirkaRybka CreditAttribution: JirkaRybka commented: Another issue is http://drupal.org/node/185126 (system table, containing a patch, with upgrade path and benchmark attempt too (seems to be a 3.5% gain for cached pages)).
Comment #25
NancyDru: I can confirm (at least under 5.x) that adding theme, module, delta as a key to blocks makes a visible performance difference, although I have not benchmarked it.
Comment #26
NancyDru: Also note that the "nid" and "status" indexes on node are redundant.
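For illustration, dropping the redundant single-column indexes would look something like this (the index names here are hypothetical; actual names depend on how the table was created):

```sql
-- {node} already has PRIMARY KEY (nid), so a separate index on nid
-- alone adds write overhead without helping any query.
ALTER TABLE node DROP INDEX node_nid;
-- Likewise, a lone status index is redundant if a composite index
-- already leads with status.
ALTER TABLE node DROP INDEX node_status;
```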
Comment #27
catch: Is this "needs work" because there's no upgrade path? What else needs to be done to get it in?
#69111 was yet another duplicate.
Comment #28
catch: Dries: http://drupal.org/node/146688 states that taxonomy_get_tree (forum index) currently uses a filesort. This is a big issue on drupal.org and any site with a decent number of forums. The proposed index by David Strauss in that issue is exactly the same as the one in this patch.
However, the system changes look much better dealt with over at http://drupal.org/node/185126 - I've attached a patch with that section taken out so the system module index can be dealt with in its own right.
So what does this need to get in? Upgrade path?
Comment #29
catch: ack! The schema files are gone, of course.
So reroll to move into .install + upgrade path. Sorry for so many posts in a row.
Comment #30
catch: This patch moves the schema changes back to .install.
This section:
Appears to have already been changed, so I left that out. Marking back to review now. This still needs an upgrade path assuming agreement on the indexes themselves.
Comment #31
JirkaRybka CreditAttribution: JirkaRybka commented: Related issue: http://drupal.org/node/85453 - statistics tables.
Comment #32
catch: #147298 was closed in favour of this one, however David Strauss' patch has a lot of work that didn't make it over here (and is against .schema files). It deals with aggregator, block, filter, system and taxonomy. system is best dealt with on that other issue, but I'll try to roll anything that doesn't conflict into a new patch against HEAD today. We'll still need an upgrade path for this once that's done, though.
Comment #33
catch: OK, this is a re-roll of the previous patches in this issue with the addition of David Strauss' suggested changes linked above.
There are three schema changes in this (the removal of id columns in the block, taxonomy and filter module schemas and their replacement with better primary keys) - I'm not sure if that will get in at this stage with beta, but I've left them in. If they should be taken out I'm happy to re-roll without those sections and either revive or create separate issues for those particular changes rather than complicate this one.
Leaving at "needs review", but we still have zero upgrade path for any of these changes as yet, so more work to be done (and probably not by me, all I'm doing is reading issues and cut and pasting to get this presentable).
This is untested but I'll clear my dev copy + search through modules to see how much is broken by the schema changes a bit later.
Comment #34
catch: Thinking about it, we should absolutely not introduce any schema changes in this patch. I'll look for any relevant block/taxonomy/filter issues and split those changes over to there, probably for D7.
So this patch only deals with index changes, still needs review and upgrade path.
Comment #35
catch: block.module currently only has a primary key on bid, nothing else, so presumably there are index optimisations possible which we could deal with here.
Linked issues above removed bid (autoincrement) entirely, I've split these to #14158 with a re-rolled patch for D7, don't see changes of that size getting into D6 at this point.
Comment #36
catch: More reading around - sorry for the multiple bumps, every time I think I'm done I find another issue trying to deal with this.
node.install has these indexes on {node}.
This leads to redundant indexes on nid and status; the attached patch removes the ('nid', 'vid') unique key and the 'status' index as reported by nancyw in #26 (and phpMyAdmin). There's still some redundancy even with this, but it'll require better eyes than mine to work out how to remove them cleanly without making something else worse.
Comment #37
David Strauss: At this stage, we probably can't remove important indexes and columns in a way that would require code refactoring.
Comment #38
NancyDru: I thought bid (in the boxes table) was how the box was linked to the block. If you remove it, how do they tie together?
Comment #39
David Strauss: @nancyw Blocks are uniquely identified by module and delta. bid is a redundant unique index.
Comment #40
JirkaRybka CreditAttribution: JirkaRybka commented: On the boxes table, indeed bid is used, but the patch earlier in this issue removed bid on the blocks table, where it seems to be really unused. The relation (as seen in the code) seems to be blocks.delta - boxes.bid. But anyway, this is D7 stuff, schema changes, so let's proceed without this, as suggested above.
Comment #41
NancyDru: Ah, I see now. I was confused because there is no "bid" in the blocks table in D5. But I did manually add a primary key on theme, module, and delta and saw a performance difference.
Comment #42
catch: This is the one main change I can think of that isn't in the patch as yet. I also looked through all D6 tables and I'm pretty sure the parent and tid indexes in {term_hierarchy} could go - the primary key is ('tid', 'parent') anyway.
Comment #43
David Strauss: The (tid, parent) key is not usable for queries selecting on parent alone.
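This is MySQL's leftmost-prefix rule: a composite key on (tid, parent) can serve queries filtering on tid, or on tid and parent together, but not on parent alone. A quick sketch:

```sql
-- Served by PRIMARY KEY (tid, parent): tid is a leftmost prefix.
SELECT parent FROM term_hierarchy WHERE tid = 5;

-- NOT served by (tid, parent): parent is not a leftmost prefix,
-- so the separate index on parent is still needed for this query.
SELECT tid FROM term_hierarchy WHERE parent = 0;
```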
Comment #44
JirkaRybka CreditAttribution: JirkaRybka commented: Other frequently performed queries are in the locale module. I didn't test it, but the indexes don't seem optimal to me either, and these tables are big (thousands of rows).
'locales_source' has primary key 'lid' and index 'source'(30), but almost all queries also use the new 'textgroup' column, so I guess it should be an index too. 'locales_target' has indexes 'lid', 'plid', 'plural', 'language' (which is fine), but no primary key (not sure whether this is a problem; the unique identifier here is lid+language FYI), and the upgrade path is missing the new 'language' index (!!!) which might be a separate issue.
The frequent runtime queries are:
Uncached string (>75 chars long):
SELECT s.lid, t.translation, s.version FROM {locales_source} s LEFT JOIN {locales_target} t ON s.lid = t.lid AND t.language = '%s' WHERE s.source = '%s' AND s.textgroup = 'default'
Cache rebuild (after new string added):
SELECT s.source, t.translation, t.language FROM {locales_source} s LEFT JOIN {locales_target} t ON s.lid = t.lid AND t.language = '%s' WHERE s.textgroup = 'default' AND s.version = '%s' AND LENGTH(s.source) < 75
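Given that both runtime queries filter on textgroup, one candidate (untested, as JirkaRybka says) would be a composite index leading with it, for example:

```sql
-- Sketch only: both locale queries above filter on s.textgroup, and
-- the uncached-string lookup also filters on s.source, so an index
-- leading with textgroup followed by a prefix of source could serve
-- both. The (30) prefix mirrors the existing 'source' index.
ALTER TABLE locales_source
  ADD KEY textgroup_source (textgroup, source(30));
```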
Comment #45
pwolanin CreditAttribution: pwolanin commented: @David - for blocks is the unique key really (module, delta), or is it (module, delta, theme)?
Comment #46
David Strauss: @pwolanin Sorry, you're correct. You need the theme column to form a candidate key.
Comment #47
catch: OK, so block.module needs a key on theme, module, delta.
@JirkaRybka - if the upgrade path is bad in locale, should we split it to its own issue or try to include it here?
Marking to needs work for those, and because there's still no upgrade path in my patch from #36
Comment #48
JirkaRybka CreditAttribution: JirkaRybka commented: Re: #47 - I guess we should create a new issue if it's only the locale upgrade path (which I will as time permits), but if we decide to change some keys for locale here, then it might go together.
--------------------------------------
Edit (2007-11-11): Issue created at http://drupal.org/node/191236
Comment #49
David Strauss: Here's my latest work. I'd like to have people agree on what we want before I spend time writing update code.
Comment #50
David Strauss: I'd like people to review my proposed changes. Once we can agree on the indexes, I'll write the update code.
Comment #51
dmitrig01 CreditAttribution: dmitrig01 commented: The index on comments.status is good - usually we select comments where status = 1.
The order in statistics.install is odd
Comment #52
David Strauss: "The index on comments.status is good - usually we select comments where status = 1."
What queries did you have in mind? The mere existence of "status = 1" in the WHERE clause does not mean the index is useful.
Comment #53
chx CreditAttribution: chx commented: If it's CNW no one will look into it. That's not what we want at this point.
Comment #54
catch: The {term_node} indexes should probably also be applied to {forum} - it's the same fields, many similar queries (more if this patch gets in).
Comment #55
NancyDru: As long as you're looking at clean-up tasks, what is "tid" in the Permissions table? It is always 0 on all my sites.
Comment #56
scott.mclewin CreditAttribution: scott.mclewin commented: One of the slow queries showing up on my sites is the cron-job cleanup of the watchdog logs. In my reading of the patch, I don't see this timestamp field indexed. Granted, this is not an eighteen-second query that my users explicitly sit through when browsing to a page, but it is tying up the database server, which slows down other requests.
# Query_time: 18 Lock_time: 0 Rows_sent: 0 Rows_examined: 0
DELETE FROM watchdog WHERE timestamp < 1191564006;
explain needs a select, so here's a variation on the delete:
mysql> explain select * FROM watchdog WHERE timestamp < 1191564006;
+----+-------------+----------+------+---------------+------+---------+------+--------+-------------+
| id | select_type | table    | type | possible_keys | key  | key_len | ref  | rows   | Extra       |
+----+-------------+----------+------+---------------+------+---------+------+--------+-------------+
|  1 | SIMPLE      | watchdog | ALL  | NULL          | NULL | NULL    | NULL | 120879 | Using where |
+----+-------------+----------+------+---------------+------+---------+------+--------+-------------+
1 row in set (0.00 sec)
Granted, I've kept too much watchdog log around (something I'm fixing), but the point for this thread is that the timestamp field in the watchdog table is a candidate for this patch.
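The index being suggested here is simply the following (a sketch; note the INSERT-cost objection raised in the next comment, since every watchdog write during a page load would then also have to maintain the index):

```sql
-- Candidate index for the cron clean-out query
-- (DELETE FROM watchdog WHERE timestamp < ...), turning the full
-- table scan above into a range scan on timestamp.
ALTER TABLE watchdog ADD INDEX (timestamp);
```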
Comment #57
David Strauss: I'm against adding an index to watchdog. It will slow down INSERTs, which do occur during user page loads. Plus, watchdog clean-outs are moving more toward fixed row limits, anyway.
Comment #58
scott.mclewin CreditAttribution: scott.mclewin commented: That's a good point, and I didn't know about the row-limit approach being taken/considered for the watchdog logs.
Comment #59
catch: I ran this snippet which identifies worst-performing indexes, linked from here. I think David Strauss' patch already deals with some of these redundant indexes, but here's the results for reference (3 results removed because they were duplicates between Drupal installs).
same page also says:
-which at least those node indexes are likely to fall under.
Comment #60
klando CreditAttribution: klando commented: PostgreSQL (>= 8.2) users should enable the statistics collector, add lots of indexes, and after a few days select from pg_stat* to see which indexes are used, never used, etc. It is usually a good starting point as it provides very good data on what was read from disk or memory, via an index or not.
Comment #61
JirkaRybka CreditAttribution: JirkaRybka commented: The locales_target lang/language/locale (or whatever it is named through the versions) index will get much more selective once you have more languages set up. With only one language, all entries have the same value, obviously.
Comment #62
catch: Well, it looks like the only things that need doing with this patch are the statistics order and the upgrade path. None of the proposed changes have been challenged, and additional indexes can always have their own issue later on.
edit: #49 still applies with offset. So it's small reroll of that plus upgrade path. Do we have a good example of an upgrade which adds and/or drops indexes and works on pgsql?
Comment #63
Gábor Hojtsy: Anyone taking this up?
Comment #64
pwolanin CreditAttribution: pwolanin commented: If there is consensus now on the changes required, I'll try to work on the remaining pieces.
Comment #65
catch: pwolanin, nothing added in #49 has been disputed as yet, and a lot in there had been proposed in other, duplicate, issues of this one. If there's anything missing it's easy to do a follow-up patch as well.
Comment #66
David Strauss: If this patch causes any problems, there are only two possible causes:
1. We're trying to store invalid data into a table.
2. Or, we misunderstand what sort of data we should be storing in a table.
Comment #67
pwolanin CreditAttribution: pwolanin commented: Looking through the patch in #49 - whose comment is this?
Comment #68
pwolanin CreditAttribution: pwolanin commented: ok - patch applies cleanly, but install breaks:
Comment #69
NancyDru: I'm glad I'm not the only one to run into the length problem on blocks.
I have mine set to: theme(100), module(50), delta(6)
I can't imagine a theme name that long, but it worked and my performance improved, so I didn't change it.
If the theme name was changed to VARCHAR(64) like the module name, it might take care of the whole thing.
Comment #70
NancyDru: Just to confirm that, I changed one of my test sites so that theme is defined as VARCHAR(64). Then I deleted and recreated the primary key without lengths and it worked fine.
Comment #71
pwolanin CreditAttribution: pwolanin commented: Found this lovely error too, now that the filter index is unique:
This is due to a bug in system.install - I'll file a separate patch, since that's a trivial fix: http://drupal.org/node/194029
New patch attached. Limits the theme name to 64 chars for the index. The module field is already set to 64. Are theme names really allowed to be 255 chars?
Also, correct a defect in the {history} schema definition.
Comment #72
FiReaNGeL CreditAttribution: FiReaNGeL commented: You can limit the index size on one part (such as theme name) to, say, 10 characters instead of limiting the field itself. You really don't have to index the full string here; and I'm sure there are other places like this in already existing indexes.
Comment #73
pwolanin CreditAttribution: pwolanin commented: ah - I cross-posted with nancyw - I think I'll agree that changing the column to 64 (rather than the index alone) is the better solution.
Here's a revised patch with a start on the update path.
Comment #74
pwolanin CreditAttribution: pwolanin commented: @FiReaNG3L - yes, I also thought of that first, but it doesn't really make sense to limit the length of the indexed part if we are trying to achieve a unique index. You cannot be sure the index is unique unless you index the full length of the field.
Comment #75
FiReaNGeL CreditAttribution: FiReaNGeL commented: pwolanin: my bad, I missed the unique part. If we can't change the table substantially (add a primary key field or something), then I agree that reducing the size of the field would be the only option. 255 chars for such a field is overkill anyway - what's the theme with the longest name nowadays?
Comment #76
catch: Fresh install, enabled all modules, installed schema.module - all smooth so far.
Comment #77
catch
Comment #78
pwolanin CreditAttribution: pwolanin commented: ok - patch with (untested) update code. The code does not yet cover taxonomy, locale, or user.
locale needs bigger work that will be captured here: http://drupal.org/node/194310
taxonomy and user can be added to this patch (if anyone has time).
Comment #79
pwolanin CreditAttribution: pwolanin commented: oops, the last patch missed upload too. This one should have everything except locale - taxonomy and user are included.
Comment #80
catch: OK, I installed 5.x-dev and enabled all core modules, upgraded to D6, and installed schema module. The update went smoothly but the following errors were reported:
Comment #81
bjaspan CreditAttribution: bjaspan commented: Regarding primary key fields not being 'not null', please read http://drupal.org/node/159328. Bottom line: add 'not null' => TRUE to all field definitions in the primary key.
When schema.module says an index is "missing in database" it means the index is declared by the module's hook_schema() but no index has been created in the database, presumably because the database has been upgraded from D5 and the update functions do not yet create that index.
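In plain SQL terms, bjaspan's point is that primary-key columns must be NOT NULL. A hypothetical repair for a table like {upload} might look like this (the column types here are illustrative, not taken from the actual schema):

```sql
-- Sketch: MySQL requires PRIMARY KEY columns to be NOT NULL, so the
-- column definitions must be fixed before (or while) adding the key.
ALTER TABLE upload
  MODIFY vid INT UNSIGNED NOT NULL DEFAULT 0,
  MODIFY fid INT UNSIGNED NOT NULL DEFAULT 0,
  ADD PRIMARY KEY (vid, fid);
```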
Comment #82
pwolanin CreditAttribution: pwolanin commented: um, well - I'll have to run schema module on my other computer. Apparently it needs PHP 5.1 (see: http://drupal.org/node/194764 ).
Anyhow, here's another (as yet untested) patch that should correct the errors above.
Comment #83
catch: Quick 5.x-6.x-patched upgrade - I still get this in schema module. Everything else looks fixed though!
fwiw in the actual 6.x database itself, vocabulary has a primary key on vid, and nothing else. ought to be weight and name according to schema definition.
Comment #84
pwolanin CreditAttribution: pwolanin commented: New patch catches the missing vocabulary index. I think everything else matches the schema definition.
Comment #85
catch: Indeed it does.
Comment #86
bjaspan CreditAttribution: bjaspan commented: Errors on pgsql.
* warning: pg_query() [function.pg-query]: Query failed: ERROR: constraint "upload_pkey" does not exist in C:\Documents and Settings\bjaspan\Home\WORK\drupal\head\includes\database.pgsql.inc on line 155.
* user warning: query: ALTER TABLE upload DROP CONSTRAINT upload_pkey in C:\Documents and Settings\bjaspan\Home\WORK\drupal\head\includes\database.pgsql.inc on line 790.
* warning: pg_query() [function.pg-query]: Query failed: ERROR: multiple primary keys for table "upload" are not allowed in C:\Documents and Settings\bjaspan\Home\WORK\drupal\head\includes\database.pgsql.inc on line 155.
* user warning: query: ALTER TABLE upload ADD PRIMARY KEY (vid,fid) in C:\Documents and Settings\bjaspan\Home\WORK\drupal\head\includes\database.pgsql.inc on line 778.
It's dinnertime, I'll try to look into this later.
Comment #87
pwolanin CreditAttribution: pwolanin commented: I'm getting ready to try it on postgres - did you get those errors on a 5.x upgrade, or on install?
Comment #88
pwolanin CreditAttribution: pwolanin commented: A clean install with PostgreSQL 8.2 (obviously applying the patch before the install) doesn't give me any errors - and schema module reports only "Match" for all installed tables.
Comment #89
pwolanin CreditAttribution: pwolanin commented: ok - I'm seeing about these same sorts of errors now on a D5 -> D6 update. Looks at first like the system update that's supposed to match the MySQL and PostgreSQL schemas needs to do something more with the upload table? The index I see is named file_revisions_pkey, not upload_pkey.
Comment #90
pwolanin CreditAttribution: pwolanin commented: Attached patch straightens out the primary key and other indexes for {upload} by fixing up system_update_6022(). This prevents the errors described above.
as expected, schema module reports a mismatch for locale module:
* languages
indexes list: missing in database
the {languages} index probably needs to be fixed in: http://drupal.org/node/194310 since it's not clear if those update functions will come before or after.
Comment #91
bjaspan CreditAttribution: bjaspan commented: With patch #90 applied, I get
* warning: pg_query() [function.pg-query]: Query failed: ERROR: index "locales_target_language_idx" does not exist in C:\Documents and Settings\bjaspan\Home\WORK\drupal\head\includes\database.pgsql.inc on line 155.
* user warning: query: DROP INDEX locales_target_language_idx in C:\Documents and Settings\bjaspan\Home\WORK\drupal\head\includes\database.pgsql.inc on line 854.
along with the languages.list index warning, which sounds like it is expected. Note that I have not actually reviewed any of the code or new indexes for correctness, I just ran the D5 > D6 update for pgsql.
Comment #92
pwolanin CreditAttribution: pwolanin commented: ah - at the last minute I added a line to drop an index - turns out it doesn't exist in the 5.x schema. I did it because in the patch for locale.install this index is removed from the 6.x schema. The attached patch is tested and should update without errors on pgSQL.
Comment #93
jaydub CreditAttribution: jaydub commented: Testing results with both MySQL (5.0.38) and PostgreSQL (8.1):
Fresh Drupal 6 Beta 3
- Installed all modules
- Installed Schema module
- Ran schema compare. No mismatched tables
- Installed the patch. No errors in MySQL or PostgreSQL
- Ran schema compare. 3 mismatched tables in both MySQL and PostgreSQL:
upload:
upload
* primary key:
declared: array('vid', 'fid')
actual: array('fid', 'vid')
* indexes fid: missing in database
* indexes vid: unexpected (not an error)
locale:
languages
* indexes list: missing in database
book:
book
* unique keys nid: missing in database
* indexes nid: unexpected (not an error)
Re-ran test with fresh Drupal 5.3
- Installed all modules
- Installed Drupal 6 Beta 3 over Drupal 5 and ran upgrade. No errors in MySQL or PostgreSQL
- Installed Schema module
- Ran schema compare. No mismatched tables
- Installed the patch. No errors in MySQL or PostgreSQL
- Ran schema compare. 3 mismatched tables in both MySQL and PostgreSQL same as above.
Comment #94
catch: jaydub - please test with the latest CVS checkout, or at least the latest development tarball:
http://drupal.org/node/97368
http://drupal.org/node/320
Also, you'll need to apply the patch before you run update.php from 5.3 to 6.x since that's the main issue here.
Comment #95
catch: Needs a re-roll in light of this: http://drupal.org/node/194310
Comment #96
pwolanin CreditAttribution: pwolanin commented: Simple re-roll for system.install function numbering, etc. - not tested yet.
Comment #97
catch: #96 applies cleanly and passes schema testing on a D5-6 update (MySQL). The missing locale index is still there but was being dealt with elsewhere, I think.
Comment #98
pwolanin CreditAttribution: pwolanin commented: Re-roll including putting the proper schema API stuff into locale.install based on http://drupal.org/node/194310#comment-637940
Comment #99
pwolanin CreditAttribution: pwolanin commented: ok - getting some errors on PG upgrade:
I'm not sure if the first set is due to this patch - second set certainly is
Comment #100
pwolanin CreditAttribution: pwolanin commented
Comment #101
pwolanin CreditAttribution: pwolanin commented: ok - the first error is due to bad use of quotes in a different patch. See:
http://drupal.org/node/188498#comment-646495
new patch attached which fixes locale module error above (and renumbers locale's update functions).
With this patch schema module reports 100% match of schema to definition with 5.x->6.x update on postgres
Comment #102
pwolanin CreditAttribution: pwolanin commented
Comment #103
catch: Tested again on MySQL and this fully passes schema testing on a 5-6 upgrade. Should be ready.
Comment #104
Gábor Hojtsy: Can we run benchmarks on some key pages with and without the patch? Dries had a concern up above. I read all the follow-ups in this issue, and it seems like all the indexes added came from actual site experience, slow query logs, worst-performing indexes, etc. It would be great to validate this with some benchmarks though.
Comment #105
David Strauss: "It would be great to validate this with some benchmarks though."
Just a friendly reminder that this patch isn't just about performance. The new and improved UNIQUE indexes improve data integrity.
Comment #106
Gábor Hojtsy: Also, the "stupid" comment on user roles should be toned down a bit IMHO. We know not everything in Drupal is best architecture-wise, but we can document this more politely IMHO. (Of course this is a minor little nitpick which I can modify on commit even, so take your time with the benchmarks instead.)
Comment #107
webchick: Made benchmarking this into a GHOP task: http://code.google.com/p/google-highly-open-participation-drupal/issues/...
Comment #108
Gábor Hojtsy
Comment #109
pwolanin CreditAttribution: pwolanin commented: No one seems to have claimed this as a GHOP task yet - but we do need it done soon...
Comment #110
webchick: I specified as part of the task that it expires when this issue is marked 'fixed'. So anyone is free to benchmark, regardless of whether a student signs on to do this or not.
Comment #111
catch: system_update_6041() just got committed, so this needs a reroll.
Comment #112
CorniI CreditAttribution: CorniI commented: I claimed this task one minute ago in GHOP ;)
I'm in IRC Corni
Comment #113
Gábor Hojtsy: Great, awaiting your feedback.
Comment #114
pwolanin: here's a re-rolled patch - it applies cleanly, but not tested further.
Comment #115
pwolanin
Comment #116
catch: Just a quick note to say Corni kindly let us know that he'll be unable to begin this task until at least the weekend, and it's been reopened for now on the GHOP task list. So this is still open for anyone to bench.
Comment #117
pwolanin: ok - some very crude benchmarks - I installed all core modules that create tables, created some users and content (~1500 nodes) with devel_generate, added a couple of feeds manually. Increased items on the front page to 30.
(note - this exercise makes me think poll module and aggregator have some bugs).
using ab (1.3) across my local net to the server (an old Dell running kubuntu with Apache 2.2, PHP 5.2, MySQL 5.0 - no query cache).
Aggregator:
Tracker
did a mysql dump (omitting create table commands):
dropped all tables. Applied patch from #114. Installed Drupal and enabled modules. Truncated all tables from mysql prompt and then loaded the SQL dump back in.
Node
Aggregator
Tracker
So - the benchmarks in this case look essentially unchanged. However, that's expected since many of the index changes relate to data integrity, not speed.
Comment #118
klando: I haven't parsed the whole patch, nor applied it.
But, my 2 cents:
On tables like that (only a few thousand rows), indexes are not really used, and the change is invisible because of caching.
Data integrity should also be enforced by using foreign keys.
I don't know what the benefit is of changing a primary key like that: PK(a,b) vs PK(b,a) - what do you really expect?
Anyway, the best thing is to look at the EXPLAIN and EXPLAIN ANALYZE results.
Benchmarks must be done on bigger databases.
Comment #120
catch: @klando Reordering of primary keys (and other indexes) - MySQL reads indexes left to right, so (a,b) gives a free index on (a). This eliminates the overhead of some duplicate indexes which are removed in the patch.
MyISAM doesn't have proper (any?) support for foreign keys.
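catch's left-to-right rule can be sketched in plain SQL (the table and column names here are illustrative, not from the patch):

```sql
-- MySQL can use a composite index for any leftmost prefix of its columns.
CREATE TABLE example (
  a INT NOT NULL,
  b INT NOT NULL,
  KEY a_b (a, b)
);

EXPLAIN SELECT * FROM example WHERE a = 1;            -- can use a_b: (a) is a leftmost prefix
EXPLAIN SELECT * FROM example WHERE a = 1 AND b = 2;  -- can use the full a_b index
EXPLAIN SELECT * FROM example WHERE b = 2;            -- cannot use a_b: (b) alone is not a prefix
```

This is why a separate index on (a) next to an existing (a,b) is usually redundant, which is what the patch relies on when removing duplicate indexes.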
Comment #121
pwolanin: @klando - feel free to come up with and do better benchmarks. We all know ab is not the best tool.
Otherwise, I think this is RTBC....
Comment #122
catch: Anyone I 'know' is welcome to a 5.x dump with c. 19k nodes, 2.5k users and 250k comments for benchmarking if that helps. But yes, I'm moving this back to RTBC. Benchmarks are good, but this is about more than performance (and it would only be an issue if they showed a performance hit, which they don't, so we only win here).
Comment #123
David Strauss: "so (a,b) gives a free index on (a)"
That's 99% true. There is a touch more overhead for MySQL to use an (a, b) index for just (a) than a simple (a) index. But *generally*, that doesn't merit the disk space and processing time needed to create an index on (a) when you already have (a, b).
Comment #124
CorniI: OK, I saw that I'm able to claim a task and do it within a week, not in 2 days, so I'll do this Saturday. But is this already done, or do you want me to benchmark this too (Platform: Gentoo Linux, MySQL 5.0, PHP 5.2.5, Core2Duo E6320, 4GB RAM)?
Comment #125
catch: David Strauss, thanks for the clarification!
@CorniI: until this gets committed it's not done yet. Alternative benchmarks (and with a bigger dataset) would be great. As would running some relevant queries through EXPLAIN like klando suggested, although that's outside the remit of the GHOP task, of course.
Comment #126
pwolanin: @Corni - an additional round of benchmarking would be well worthwhile if you are set up to do it.
Comment #127
CorniI: OK, I've a db of ~580MB created (devel_generate is teh rul0r ;)). Is this enough, or shall I add one or two hundred megs? I'll post the benchmark the next day (Saturday), so you're able to commit and fix this issue as fast as possible :)
Comment #128
pwolanin: @CorniI - see if you can find someone to show you how to put a translation in too, plus some feeds, etc. Maybe also some user profile categories and data if possible.
Probably that's big enough - make sure that you can dump and re-load without the table structure (unless you can think of a better way to bench the same content with different indexes). As above, make sure you have no query cache. Hit several of the relevant pages - such as: a listing of users with a profile category, one or more taxonomy pages, aggregator, and a page with lots of comments.
Comment #129
CorniI: This is my GHOP task; if any of the maintainers can say it's OK, then I'll post it at Google's tracker, so you can mark my task as closed :)
Specs:
Core2Duo E6320, overclocked to 2.24 GHz, 4GB RAM, unknown HDD
Preparations:
I prepared a database of ~585M
>102000 Rows in comments
~84000 in node
~27000 in term_data
~16850000 in term_node
200000 in users
and 10000 in vocabulary
All modules are enabled except search; the search index can't be built with that amount of data (at least, with 100 nodes/cron, cron stops executing with PHP errors, and with fewer nodes/cron I get the message "cron failed" when running it manually).
I manually added one forum container + forum + 1 thread, and some randomly chosen heise news feeds. I enabled all blocks and scattered them randomly over the page ;)
I ran cron directly before I started ab.
What I benched:
the aggregator page
the tracker page
the index
a heavily loaded, big node with >4200 pages of comments and a huge number of terms
I exited X for the benchmarks, so there shouldn't be much load except from ab and apache/mysql.
After benching the unindexed tables, I deleted all tables, applied the patch and reinstalled Drupal. I activated all modules, truncated all tables, imported my old SQL dump, and redid the benchmarks.
All benchmarks are done with
$ab2 -c1 -n500 url
Compared results:
/node
unpatched
patched:
Result:
for an unknown reason, the patched version is a bit slower than the unpatched Drupal
heavily overloaded node:
unpatched:
patched:
Result:
Here we have a slightly faster Drupal - as expected for the database-heavy stuff
/tracker
unpatched:
patched:
Result:
For an unknown reason, we take more time :(
/aggregator
unpatched:
patched:
Result:
Here we gain a bit of speed, not that much, but, a bit :)
Overall conclusion: I can't recommend anything, sorry guys. But I'd commit the patch due to the better node load time.
Comment #130
SteveR
Comment #131
SteveR: Just subscribing
Comment #132
David Strauss: These performance tests have been entirely untargeted for what this patch affects.
Comment #133
CorniI: And for what *is* the patch? And what should I have tested as part of GHOP #63?
Comment #134
hass: Needs a reroll for system update 6042 -> 6043
Comment #135
pwolanin: @DavidStrauss - please do share which benchmarks you think would be valuable - this issue has been held up on this.
Looking at the patch, I see changes to the aggregator tables - though probably no performance change from these. The changes for blocks and book are more about data integrity. There might be some performance gain from the changes to contact, maybe some gains when listing filters, perhaps a gain for poll when checking whether a user already voted, a significant gain when looking up a particular user's access in statistics module, perhaps a gain when listing vocabularies, and perhaps a gain when looking up users by role. So benchmarking these last 3 seems useful.
@CorniI - looking at the number of bytes you got per request, I think there was some flaw in your process - it looks like you were just hitting the "Access denied" page. You have < 300 bytes, while I had, for example, 73281 bytes. Did you have access enabled for anonymous users (especially for aggregator, etc.)?
I had this problem when I first tried benchmarking - try running a test with ab at high verbosity (e.g. ab -v6 -n1) so that you can be sure you are getting a 200 response code. Also, you might want -c5 or some such.
Comment #136
pwolanin: re-rolled patch - applies cleanly.
Comment #137
pwolanin: @CorniI - also, you could run a set of EXPLAIN queries with the two sets of indexes and show that we go from a filesort to using the index. See the very top of the issue for one example.
Comment #138
catch: Still RTBC.
Comment #139
Dries: I've tested this patch and it seems to work. While the performance tests might not be extensive, I'll trust the community's judgement with regard to performance. I also value David's point about improved data integrity, although it might not be the best way to guarantee this. All in all, excellent job folks. :-)
Comment #140
catch: Nice to see this go in :)
Just realised that some of the reason for the inconclusive benchmarking may be that the biggest performance gain here was in system - already dealt with separately in #185126, so it didn't make it into the final patch despite being mentioned a lot early on.
Comment #141
CorniI: What do you expect me to do for GHOP now? Can David Strauss give me a hint? I feel a bit lost ;)
Comment #142
AjK: Just like to point out that making the users.mail field UNIQUE has now made it impossible to register a user without a unique email address. For the most part people would say "yes, and so....?". Well, the old drupal.module (moved from core to contrib as site_network.module) uses this. I.e., it would create a new user without an email address. So that's one "use case" for a user without an email address.
You can see this now by creating a d.o account and then using that to create a new login on g.d.o (you get a user without an email address which you can subsequently add/alter after a successful login).
Here's the error message:-
This is derived from line 314 of users.module:-
It's worth noting that Drupal records "New external user" in the watchdog just fine; it just fails to create the user account since the mail field isn't present in that INSERT. Since the anonymous user already has an empty users.mail field, the DB quite rightly rejects the SQL.
If we are going to start making fields "unique" should we not be testing for uniqueness and the required presence of the field in any INSERT?
As for site_network.module, I'm not entirely sure which way to go. I could make it return the email address, but (a) I'm not sure that's wise in a world of "my data is private, don't give it away at all" and (b) it would make site_network.module not backward compatible with the older drupal.module.
Anyway, let's not discuss site_network.module on this issue. From my perspective this is "should we be making db fields UNIQUE without actually testing for uniqueness?" and can we have a generic query building line (as above) that's likely to fail if a unique field is missing? From the above, Drupal worked just fine and was unaware of a problem. It just failed to create the user.
Comment #143
Gábor Hojtsy: AjK: (offtopic) OpenID module is in a similar situation, so it requires that you provide your email address before saving you as a user (and it even validates the email address with an activation mail sent).
(ontopic) Well, yes, if we have unique fields, the code should check whether there is an existing row with that value. I just encountered the same problem with taxonomy terms (on the forum admin interface) while testing. I accidentally tried to add two forum terms with the same name, and I got a PHP error, instead of a Drupal warning. That's the same unique field problem.
Comment #144
NancyDru: Andy, I don't know about your systems, but I've been unable to add users (aside from the drupal module) without a unique email address since 4.7.6 (where I started). I find it a bit annoying because I sign myself up as "admin" (user/1) and as "nancy" (usually user/2), so I have to use two separate email addresses. (Maybe this is another use case.)
I would be more concerned that drupal/site_network doesn't store an email address. It should be retrieving that and saving on my site.
Comment #145
AjK: Gábor (offtopic), yup, the first thing I did was to look at OpenID to see what it was doing. When transferred to the OpenID server you have to supply a "nickname" and an "email address", so I thought that's how it was working. However, when I actually tried a full-cycle OpenID login, all I got was a drupal_set_message("Login failed", "error") and nothing in the watchdog log as a clue to why it failed. I'm assuming at this point that it failed due to the way I create "test sites" on my local server, but I'm not entirely sure yet. I would have expected a failed login to leave a record in the watchdog, though.
Getting back on topic, having looked at the patch it's entirely "database schema" related and no application code at all. I think if database schema changes are made then the application also needs to alter to cater for any specific instances (like those we are now facing). Imho this is one patch that should be rolled back.
Comment #146
AjK: Looking into this a little further, I initially tried to add some simple application checks. For example, using db_affected_rows() to ensure the INSERT was successful was my first method. Another method could be to check that the UNIQUE fields exist and are in fact unique.
However, user_save() should always return a user object, so trying to return something different to indicate a failure (such as FALSE) breaks the API. And we're not in a position to start changing the API at this late stage of the release cycle.
So, in the case of user_save() there doesn't appear to be a bailout condition to handle errors of this type.
I've only looked at users.mail being made UNIQUE. As Gábor points out above, he has hit a similar problem elsewhere in taxonomy. Given that this patch "liberally sprinkled" database constraints around without any application code to support them, I'm still of the humble opinion this needs rolling back (even though I am a great fan of making the database work for you!).
Comment #147
pwolanin: So - at the very least the users one probably needs to change now. Webernet also pointed out that the new unique indexes may cause a failure during 5.x -> 6.x updates. For example, if someone already had two taxonomy terms with the same name, or two users with blank e-mail addresses.
Comment #148
pwolanin: This makes the new user mail unique key a simple index instead.
The patch also needs to add validation to the taxonomy forms, and checks in the update code to ensure that the unique conditions now imposed on the taxonomy tables are met before adding the unique keys.
Comment #149
chx: Adding unique keys is a feature; past feature freeze they never should happen. I have not looked into this issue before, but the unique keys added here should be removed IMNSHO and re-added, one by one, in separate feature issues for D7. The reason? They could break updates.
Comment #150
pwolanin: a more complete rollback patch - untested
Comment #151
chx: I do not want to RTBC yet, but a CNR is in order.
Comment #152
David Strauss: Removing the UNIQUE index on the mail column is *wrong*. We use mail addresses to look up accounts; we cannot have duplicates. If you want the value absent for the mail column, allow it to be NULL. Multiple NULL values aren't considered a violation of the UNIQUE constraint.
Comment #153
pwolanin: @David - we all agree it is wrong, but user_save() does not do any checks to ensure that the query won't fail. In other words, the amount of work on the PHP code to support this properly could be substantial, and is probably a bad idea at RC1. Also, is the duplicate-NULL behaviour true across the DBs we support (or hope to support)?
Comment #154
David Strauss: @pwolanin How are we handling user_save() where the username is already taken? We could do the same thing with an email address that's already in use. And, yes, allowing duplicate NULL values in a UNIQUE index is a widespread feature in databases, not just a MySQL thing.
I just don't see how we can justify having things like password recovery use email lookups and not enforce uniqueness of email addresses for accounts.
What is it: are they unique or not? We should either enforce uniqueness across the board or remove the checks entirely.
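A minimal sketch of the NULL behaviour David describes (an illustrative table, not Drupal's actual users schema, which stores an empty string rather than NULL for a missing address):

```sql
CREATE TABLE accounts (
  uid  INT NOT NULL PRIMARY KEY,
  mail VARCHAR(64) DEFAULT NULL,
  UNIQUE KEY mail (mail)
);

INSERT INTO accounts VALUES (1, 'a@example.com');  -- OK
INSERT INTO accounts VALUES (2, NULL);             -- OK
INSERT INTO accounts VALUES (3, NULL);             -- still OK: NULLs don't collide in a UNIQUE index
INSERT INTO accounts VALUES (4, 'a@example.com');  -- fails with a duplicate key error
```

Two empty strings ('') do collide, though, which is exactly the failure mode reported above for accounts created without an email address.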
Comment #155
pwolanin: @David - the user registration/edit form does validation checks to ensure that the name and e-mail are unique.
However, user_save() does not check. So right now, if you are foolish enough to make an API call to user_save() with a duplicate username, I think you'll get a failed query as well. And worse, you may get back the uid of the previously inserted user, and overwrite their data in hook_user.
http://api.drupal.org/api/function/user_save/6
Opened this issue to attempt some modicum of error checking in user_save(): http://drupal.org/node/204705
Comment #156
catch: Tested #150 on a clean install, and on a clean 5.x -> 6.x upgrade.
No issues, passed schema testing etc.
Marking back to review. However, IIRC this was committed before RC1 was released - do we need an upgrade path from RC1 to RC2?
Comment #157
coupet: Data integrity versus code logic?
Comment #158
pwolanin: @coupet - the problem is the lack of synchronization between the code and the DB constraints. If the code has not been updated to respect the new DB constraints, then it's a bug in the code.
Comment #159
Gábor Hojtsy: Actually, David had a reason to CNW this in #152. Along the same lines, we should check whether existing code validates uniqueness, and only remove the constraint where such validation is not in place and is not feasible to implement IMHO.
(I also got a note about an upgrade report with locale tables with unique indexes introduced in Drupal 6, which is also due to broken data: http://drupal.org/node/205613 but I am not sure at all that we should keep supporting broken data).
Comment #160
pwolanin: @Gabor - in #152 "wrong" is more of a moral/idealist/DBA judgment than a PHP code judgment. In Drupal 5, the system works OK without this constraint. If you want to make this column unique for D6, it will require a new update function at the very least, and perhaps a reworking of the schema so the mail column is no longer NOT NULL, plus a reworking of user_save() to allow it to save a NULL value to the mail column.
Comment #161
Gábor Hojtsy: pwolanin: Well, understood.
Now, we need to think about RC1 users. Although we did not support upgrades from betas, we have a tradition to support upgrades from RC1 onward. After all this is an RC, so people are possibly using it as "this is nearly a release", and rightly so. So as catch noted in #156, we should think about providing an upgrade path. The tricky part is that we need to modify the existing update code as well, since we would not like to cause problems there with the unique indexes added on upgrades (from Drupal 5), but we'd like to support upgrades from D6RC1, where the unique indexes are in place.
Comment #162
catch: So we have:
5.x - RC1 - RC2
RC1 - RC2
5.x - RC2
System updates run sequentially, right? Is it off the wall to suggest a new one to drop the uniques and create the new indexes, leaving the existing update intact? That may be stupid, but if it's not, then it'd mean all cases get the same upgrade path - just 5.x -> RC2 (and onwards) gets it all in one go, with no special cases.
Comment #163
Gábor Hojtsy: The problem with the unique indexes is that you might have a "broken" data set (i.e. multiple user accounts without an email address), so you get an error that the index cannot be applied. This issue was reopened in part because of this symptom. Simply put, we should not leave any of the later-dropped unique indexes created on the update path.
Comment #164
catch: Of course, doh! I should've realised from reading the issue back a few posts. Thanks for the explanation.
Comment #165
Gábor Hojtsy: A few things we can do (all start with applying the changes to the existing update code, but continue in alternative ways):
1. Add a new update function which detects if there are unique indexes already added (RC1), and "demotes" them to plain indexes. If these unique indexes are not detected, do nothing.
2. Add a variable_set() to the existing, modified update function to store the info that we are running through it from a pre-RC1 update (those who already run the update will not have that variable). Add a new update function which demotes the unique indexes only if that variable is not set. If the variable is set, do nothing.
The first acts on database schema detection; the second acts on a missing variable. The effect is the same: after the update, the database will be in a consistent state regardless of a direct 5 => RC2 update or 5 => RC1 => RC2.
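Whichever detection method is used, the "demotion" itself reduces to something like the following (MySQL syntax; the index names are an assumption based on the failed queries quoted in #168, and a real update function would go through the schema API rather than raw ALTERs):

```sql
-- Demote the RC1-era unique keys to plain indexes.
ALTER TABLE users DROP INDEX mail, ADD INDEX mail (mail);
ALTER TABLE term_data DROP INDEX vid_name, ADD INDEX vid_name (vid, name);
```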
Comment #166
pwolanin: @Gabor - is there a core schema API function for determining the actual type of an index, or does that require the schema module?
Comment #167
Gábor Hojtsy: Well, I don't know of such introspection functions in Drupal itself; I looked into it now but did not find anything, not even for checking whether an index exists at all, let alone the type of the index. What we have done previously is communicate 'this update function ran' type of things with variable_set() and variable_get(); there are quite a few examples of this in (previous) update code.
Comment #168
Gábor Hojtsy: Just tested the update on a data set of a running Drupal site (before the patch):
Failed: ALTER TABLE {users} ADD UNIQUE KEY mail (mail)
Failed: ALTER TABLE {term_data} ADD UNIQUE KEY vid_name (vid, name)
Failed: ALTER TABLE {term_node} ADD PRIMARY KEY (tid, vid)
By the way, from the error reports, all are the result of the same bad data, but fixing that would be too far-fetched for this "feature update", so I can validate the need to not add these indexes in a rush.
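For anyone hitting those failures, the offending rows can be found before the ALTERs run with queries along these lines (unprefixed table names; adjust for your table prefix):

```sql
-- Users sharing a mail value (several empty addresses count as duplicates):
SELECT mail, COUNT(*) AS n FROM users GROUP BY mail HAVING n > 1;

-- Terms duplicated within a vocabulary:
SELECT vid, name, COUNT(*) AS n FROM term_data GROUP BY vid, name HAVING n > 1;

-- term_node rows that would collide on the new (tid, vid) primary key:
SELECT tid, vid, COUNT(*) AS n FROM term_node GROUP BY tid, vid HAVING n > 1;
```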
Comment #169
fractile81: I just wanted to chime in and say that the update fails for me when adding the UNIQUE to mail as well. Also, option 2 in #165 sounds like the path of least resistance for fixing this IMHO.
Comment #170
pwolanin: here's a go at an update function per option 2 in #165. Not tested - please have at it.
It leaves the one on blocks - is that one good? It's on ('theme', 'module', 'delta'), which seems safe. The module/delta combination must be unique for each block, and you can't have two themes with the same name.
Also seems ok based on the way the form is built:
http://api.drupal.org/api/function/block_admin_display_form/6
Though if the blocks one is ok, maybe we should leave the one on filters too? ('format', 'module', 'delta')
Comment #171
David Strauss: We should try to leave the one on filters; it helped us find one bug already.
Comment #172
catch: #170 works smoothly for an RC1 to 6.x-dev upgrade and passes schema.module testing. Didn't test 5.x -> RC1 -> 6.x or straight 6.x yet.
Comment #173
Gábor Hojtsy: Agreed with pwolanin and David on #171. I also have an interesting update issue in my test data with the tid, vid index on term_node added here, but I need to investigate that more.
Comment #174
pwolanin: @Gabor - that tid, vid index is just a conversion from the unique key (i.e. I didn't spend a lot of thought on it).
Setting to CNW to put back the filter unique key (unless I'm misreading #173?).
Comment #175
Gábor Hojtsy: pwolanin: yes, you understood it right.
I took some time to dig down and get insight on the tid, vid issue (without the patch):
- system_update_6001 adds the vid field to term_node so that term relations can be versioned
- it does
UPDATE {term_node} t SET vid = (SELECT vid FROM {node} n WHERE t.nid = n.nid)
- my term_node table had nid, tid relations for nonexistent nodes, so their vid became 0
- because there were multiple (10) instances of this, the addition of the unique index on tid, vid failed
So to solve this, I added
'DELETE FROM {term_node} WHERE nid NOT IN (SELECT nid FROM {node})'
to my usual pre-upgrade cleanup scripts, which makes the whole update work fine and "go green" (i.e. no failed queries) at the end. Note that I have a very old data set which was migrated to Drupal from another system in 2003 and has run on Drupal since then, so these 10 faulty associations might have come in at any time. That said, these kinds of errors will probably be rare, but the above query should not hurt (it just removes term_node associations for nodes you have already deleted), so I'd suggest adding it to the code just before we add the tid, vid unique index.
Comment #176
Gábor HojtsyBTW just to note, my update is all green now (without the patch) because I have added that DELETE and because I cleaned up the term_data and users table data, where the unique indexes caused problems earlier. I had two users with the same email address and a few terms with same names (tids in succession, so probably a symptom of PHP crashing and users resubmitting). So as said, my original data set validates the need for this patch, as others will not do these fine grained data set corrections I do, and the above note in #175 is just as addendum to add a safe data cleanup query into Drupal to help make a unique index work, which could otherwise fail as well.
Comment #177
pwolanin: @Gabor - will that query:
'DELETE FROM {term_node} WHERE nid NOT IN (SELECT nid FROM {node})'
scale to (e.g.) 200,000 nodes? Do we need to do that in batches?
Comment #178
catch: I have three rows returned by
SELECT * FROM term_node WHERE nid NOT IN (SELECT nid FROM node);
This is a database from 4.5 which had a phpbb2drupal import run on it during 4.7. So rare, but not unheard of.
Comment #179
catch: If a site has 200,000 rows returned by that query, they have more serious problems than this update failing IMO.
Comment #180
Gábor Hojtsy: pwolanin: I have already quoted an UPDATE query above which "joins" the term_node and node tables, so it will touch even more rows. It does not work with a list of 200,000 values though. I am not sure how all the databases resolve these IN subqueries, and I don't have a bigger data set to performance-test it with.
Comment #181
pwolanin: how about:
'DELETE FROM {term_node} WHERE vid = 0'
? Here's a re-rolled patch which leaves the filter table alone and adds this extra cleanup query.
Comment #182
pwolanin
Comment #183
David Strauss: Faster alternative to #178:
SELECT * FROM term_node tn
LEFT OUTER JOIN node n ON n.nid = tn.nid
WHERE n.vid IS NULL
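If the goal is cleanup rather than inspection, the same join can drive a MySQL multi-table DELETE (a sketch only; the committed patch went with the simpler DELETE ... WHERE vid = 0 from #181):

```sql
-- Remove term_node rows whose node no longer exists.
DELETE tn
FROM term_node tn
LEFT OUTER JOIN node n ON n.nid = tn.nid
WHERE n.nid IS NULL;
```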
Comment #184
pwolanin: ran both an update from 5.x and a clean install with this patch (w/ MySQL 5.0). Schema module reports no mismatches, so the patch code is at least correct.
Comment #185
yched: http://drupal.org/node/206941 looks like another occurrence of non-code-backed unique indexes on the {term_data} table.
Comment #186
catch: I can confirm this passes schema.module with a clean 5.x -> 6.x upgrade.
'DELETE FROM {term_node} WHERE vid = 0'
ought to work fine, since we already have the vid column by the time we get to system_update_6044(). It could be added to 6001 as well, but there's no need, since only sites which already upgraded to RC1 with bad entries in term_node (probably zero sites) will be affected by the unique index anyway. (I have to admit to running Gabor's subquery on my actual site to clean it out, so I no longer have test data without restoring an older backup.) So, RTBC.
Comment #187
catch: Hmm, does this need testing on pgsql as well?
Comment #188
Pancho
Comment #189
chx: No, it does not need testing with PostgreSQL. Schema API just works. Very nice job, and it cures my fears; thanks to all who participated (pwolanin, David Strauss, catch and Gabor - hope I have not forgotten anyone).
Comment #190
hswong3i: Rerolled patch from #181 against CVS HEAD. Tested with Debian etch PostgreSQL 8.1.9; passes system_update_6044() with no problems.
Comment #191
Gábor Hojtsy: Thanks for the hard work and all the testing, folks; committed.
Comment #192
Anonymous: Automatically closed -- issue fixed for two weeks with no activity.
Comment #193
sammos: Hello, I'm confused about something - I thought that installing 5.7 would pick up this patch, but the sitedoc module still recommends adding indexes. Also, my 5.5 to 5.7 update script picked up no new updates. So do existing Drupal installs need to manually add the indexes? And if so, has anybody made an update script? Thanks!
Comment #194
NancyDru: Well, Sitedoc tells you that there are no indexes, not that you have to have them.
This was not a back-ported fix. You may go in and find the applicable ones to back-port manually.
Comment #195
sammos: Yes, I know - I didn't mean to imply otherwise. But I'd like to have them, and figured maybe somebody wrote an update script for v5. I'll see about writing one and posting it. Any thoughts welcome on whether it's better to backtrace this thread or to look at the code in the sitedoc module, as it also made suggestions. Thanks!
Comment #196
NancyDru: Hmm, I don't remember what suggestions I would have made other than viewing this thread. I'll have to go look again.
Comment #197
sinmao: I've upgraded to 5.10. Do I still need this patch? Thx.
Comment #198
sinmao: Hi, so were you able to make an update script?
Comment #199
NancyDru: This is for 6.x, not 5.x.
Comment #200
sinmao: Um... what should I do with 5.10, then? Thx for the reply.
Comment #201
NancyDru: If you are having specific problems, manually apply the indexes as shown in this patch.