Currently, there are the following indexes on the {user_relationships} table.
mysql> SHOW INDEXES FROM user_relationships;
*************************** 1. row ***************************
Table: user_relationships
Non_unique: 0
Key_name: PRIMARY
Seq_in_index: 1
Column_name: requester_id
Collation: A
Cardinality: 13768
Sub_part: NULL
Packed: NULL
Null:
Index_type: BTREE
Comment:
*************************** 2. row ***************************
Table: user_relationships
Non_unique: 0
Key_name: PRIMARY
Seq_in_index: 2
Column_name: requestee_id
Collation: A
Cardinality: 234072
Sub_part: NULL
Packed: NULL
Null:
Index_type: BTREE
Comment:
*************************** 3. row ***************************
Table: user_relationships
Non_unique: 0
Key_name: PRIMARY
Seq_in_index: 3
Column_name: rtid
Collation: A
Cardinality: 234072
Sub_part: NULL
Packed: NULL
Null:
Index_type: BTREE
Comment:
*************************** 4. row ***************************
Table: user_relationships
Non_unique: 1
Key_name: requester_id
Seq_in_index: 1
Column_name: requester_id
Collation: A
Cardinality: 10177
Sub_part: NULL
Packed: NULL
Null:
Index_type: BTREE
Comment:
*************************** 5. row ***************************
Table: user_relationships
Non_unique: 1
Key_name: requestee_id
Seq_in_index: 1
Column_name: requestee_id
Collation: A
Cardinality: 6502
Sub_part: NULL
Packed: NULL
Null:
Index_type: BTREE
Comment:
*************************** 6. row ***************************
Table: user_relationships
Non_unique: 1
Key_name: rtid
Seq_in_index: 1
Column_name: rtid
Collation: A
Cardinality: 11
Sub_part: NULL
Packed: NULL
Null:
Index_type: BTREE
Comment:
*************************** 7. row ***************************
Table: user_relationships
Non_unique: 1
Key_name: rid
Seq_in_index: 1
Column_name: rid
Collation: A
Cardinality: 234072
Sub_part: NULL
Packed: NULL
Null:
Index_type: BTREE
Comment:
7 rows in set (0.01 sec)
As can be seen from these indexes, rtid has a very low cardinality. This means that any indexes (PRIMARY and rtid) can cause quite a performance hit. I think it's a fair assumption that most sites will have very few contact types compared to the number of relationships, but I'd love to hear about any cases to the contrary. On this site in particular, there is only one relationship type (it's using InnoDB so cardinality is an estimate), so removing the indexes on rtid provides a very significant performance improvement.
The following indexes are working pretty well for me in testing:
mysql> SHOW INDEXES FROM user_relationships;
*************************** 1. row ***************************
Table: user_relationships
Non_unique: 1
Key_name: requester_id
Seq_in_index: 1
Column_name: requester_id
Collation: A
Cardinality: 14655
Sub_part: NULL
Packed: NULL
Null:
Index_type: BTREE
Comment:
Index_comment:
*************************** 2. row ***************************
Table: user_relationships
Non_unique: 1
Key_name: requestee_id
Seq_in_index: 1
Column_name: requestee_id
Collation: A
Cardinality: 12212
Sub_part: NULL
Packed: NULL
Null:
Index_type: BTREE
Comment:
Index_comment:
*************************** 3. row ***************************
Table: user_relationships
Non_unique: 1
Key_name: rid
Seq_in_index: 1
Column_name: rid
Collation: A
Cardinality: 219825
Sub_part: NULL
Packed: NULL
Null:
Index_type: BTREE
Comment:
Index_comment:
*************************** 4. row ***************************
Table: user_relationships
Non_unique: 1
Key_name: relationship
Seq_in_index: 1
Column_name: requester_id
Collation: A
Cardinality: 10991
Sub_part: NULL
Packed: NULL
Null:
Index_type: BTREE
Comment:
Index_comment:
*************************** 5. row ***************************
Table: user_relationships
Non_unique: 1
Key_name: relationship
Seq_in_index: 2
Column_name: requestee_id
Collation: A
Cardinality: 219825
Sub_part: NULL
Packed: NULL
Null:
Index_type: BTREE
Comment:
Index_comment:
Removing the rtid index is easy. Unfortunately, removing the PRIMARY key would expose the table to duplicate rows. I'd really like to have a way to enforce the uniqueness of (requester_id, requestee_id, rtid) without the performance impact of the index, but I'm not sure how to do that.
As a rough benchmark, these index changes brought the average time loading /relationships/1 from 4.8s to 1.8s.
Comment | File | Size | Author |
---|---|---|---|
#2 | 1187608.2-drop_evil_indexes-6.x.patch | 1.8 KB | deviantintegral |
#2 | 1187608.2-drop_evil_indexes-7.x.patch | 1.62 KB | deviantintegral |
Comments
Comment #1
mrf CreditAttribution: mrf commentedI'd be shocked if anyone comes forward with more the 10 relationship types on a site, so I think you're right to make the assumption about cardinality.
Just a heads up that 6.x is mainly in maintenance mode and big changes are likely to go un-commited, but I'd love to see an exception for the db performance improvements you have been suggesting. I'm assuming similar gains could also apply to 7.x.
Comment #2
deviantintegral CreditAttribution: deviantintegral commentedHere's a pair of patches.
Comment #3
BerdirI'm not convinced that removing the primary key is an improvement.
Some EXPLAIN's would be nice, on a site with your amount of data (I currently only have a few test sites). Should be easy by clicking around in the relationship listings and using devel.module's query log.
Comment #4
deviantintegral CreditAttribution: deviantintegral commentedHere's a query:
Here's an EXPLAIN:
The following queries show the difference in performance. The {user_relationships} table has around 220K rows.
Comment #5
BerdirWould it be possible that you can send me the content of user_relationships table? I'd like to do my own tests with a database of your size. I don't think it contains any personal information...
Comment #6
giorgio79 CreditAttribution: giorgio79 commentedhttp://www.lullabot.com/articles/slow-queries-check-cardinality-your-mys...
Comment #7
anavarreSubscribe
Comment #8
mdupontIf the primary key is removed, won't it allow duplicate entries? How about at least having a unique constraint on the requester_id, requestee_id, rtid tuple?
Comment #9
rjbrown99 CreditAttribution: rjbrown99 commentedSubscribe, happy to test on a reasonably large 6.x site - few relationship types (two) but lots of users and established relationships.
Comment #10
deviantintegral CreditAttribution: deviantintegral commented@mdupont: Unfortuantely, adding the unique constraint creates an index, that the MySQL optimizer then uses. The only way around that is to use an index hint, which is MySQL only. I suppose we could check for the database type, and then add it, but I'm hesitant to add DB-specific code.
http://dev.mysql.com/doc/refman/5.1/en/index-hints.html
Comment #11
BerdirOk, finally found the time to play around with your data set!
This looks good. In fact, it looks like the performance actually gets *worse* when you have multiple types, I've tested it with 2 and 4 types and performance just got worse and worse.
Also important, removing the primary key makes updates/inserts *much* faster. To test multiple types, I've used the following query to distribute the rtid's randomly across the data set: "update user_relationships set rtid = (requestee_id % 4 + 1)". This took almost 2 minutes with the primary key and 2 seconds without ;)
Sure, this is not a relevant query in any way, but it still gives an idea of the performance an INSERT would have...
Avoiding duplicates needs to be handled anyway in code so the only thing we loose is the absolute proof that there are none.
Updated the patch (in the meantime, the module name changed, unrelated column names changed and we already added a 7003 update function... o.0). I did manually run the tests and verified the upgrade path.
Back to 6.x-1.x!