I've been doing some testing with Relation to see where it starts to run into performance issues. We built a relation that relates content nodes to a parent node in a manner very similar to OG content. I tested with around 250 "group" nodes and a large number of content nodes. It looks like once field_data_endpoints has 20-30K relations in it, most views queries become a bottleneck due to the inner join to find the relation. For example:
SELECT DISTINCT field_data_endpoints0.entity_type AS entity_type, field_data_endpoints0.entity_id AS entity_id, field_data_endpoints0.revision_id AS revision_id, field_data_endpoints0.bundle AS bundle
FROM
field_data_endpoints field_data_endpoints0
INNER JOIN field_data_endpoints field_data_endpoints1 ON field_data_endpoints1.entity_type = field_data_endpoints0.entity_type AND field_data_endpoints1.entity_id = field_data_endpoints0.entity_id
INNER JOIN relation relation ON relation.rid = field_data_endpoints0.entity_id
WHERE (field_data_endpoints0.endpoints_entity_type = 'node') AND (field_data_endpoints1.endpoints_entity_id = '280') AND (field_data_endpoints1.delta = field_data_endpoints0.delta) AND (field_data_endpoints0.deleted = '0') AND (relation.relation_type = 'about') AND (field_data_endpoints0.entity_type = 'relation')EXPLAINs as:
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: relation
type: ref
possible_keys: PRIMARY,relation_types
key: relation_types
key_len: 767
ref: const
rows: 1
Extra: Using where; Using index; Using temporary
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: field_data_endpoints0
type: ref
possible_keys: PRIMARY,entity_type,deleted,entity_id,endpoints_relation
key: PRIMARY
key_len: 391
ref: const,juno.relation.rid,const
rows: 1
Extra: Using where
*************************** 3. row ***************************
id: 1
select_type: SIMPLE
table: field_data_endpoints1
type: ref
possible_keys: PRIMARY,entity_type,entity_id
key: PRIMARY
key_len: 390
ref: const,juno.relation.rid
rows: 1
Extra: Using where; DistinctIt's the temporary table from the inner join on field_data_endpoints that is killing performance.
I'm sure others are using Relation with far more data then I was testing with, so does anyone have suggestions for improving performance? Or is it simply a problem of having a large number of source entities pointing to a small number of target entities, reducing index performance? It seems like I might just have to go an alternate storage engine for the field, but I would have expected to hit the limit at far more than 30K nodes.
| Comment | File | Size | Author |
|---|---|---|---|
| #18 | add_separate_indexes_to_endpoint_columns-1649398-18.patch | 1.03 KB | mikran |
| #14 | 1649398_field_data_endpoints_add_indexes_2.sql_.gz | 168 bytes | hclaria |
| #2 | 1649398.2-tables.sql_.gz | 608.47 KB | deviantintegral |
Comments
Comment #1
chx commentedIf you could sanitize the DB and attach the tarball that would be great. When I have tested relation I havent seen such things. However, I wonder -- can you get rid of that DISTINCT? Does that help?
Comment #2
deviantintegral commentedRemoving DISTINCT helps in that it does remove the need temporary table, but the query still takes just as long to execute.
Here's a copy of the {relation} and {field_data_endpoints} table.
Comment #3
bendikrb commentedThe exact same problem over here.
After importing a rather large dataset, I now have 600k+ records in {field_data_endpoints} and about 300k in {relation} table. My relations is between standard drupal nodes.
I'm getting no less than 3000ms per query, often times as much as 6000ms!
Starting to get cold feet...
Comment #4
mikran commentedEven though this is still in support requests the performance is a concern for a stable release so marking this as a blocker too. #1707058: Relation 7.x-1.0
Comment #5
chx commentedI will try to slot this in within a few days. Please ping me on IRC/Twitter/email if I don't before Aug 13.
Comment #6
steveoliver commentedMarking this a major priority task and unassigning from chx as he's super busy and has less than no time for Relation these days.
Comment #7
leenx commentedWe have 122K field_data_endpoints and 61K relations for a start and see 112502ms for page view.
Tried APC, memcached and even tuned mysql pretty much, but seeing a query very similar to the above query in the mysql slow query log. All else seems fine. Pages without relations are fine. Was thinking that maybe entitycache might help, if we could push into a cache bin of some sort.
So this is a little detailed follow and hope somebody else has an idea on a fix.
Comment #8
rafamd commentedAuch, we were planning to use relation on an even bigger dataset... did you find any solution to this issue?
Comment #9
bendikrb commentedI ended up building the site with pgsql instead, which got alot faster.
Also, I wrongly blamed some of the issues I had on the relation module, when infact the most significant problem was with the relation_select module (it deep-loads the relation entity on every entity view, among other flaws.
Comment #10
mikran commentedCore issue #1859084: Conditions on multi-column fields in EntityFieldQuery can solve some problems with RelationQuery
Comment #11
hclaria commentedWe encountered the same issue for one of our customer's website.
The issue started as soon as the table was getting filled with records at ~60 000 records.
We did manage to solve the issue permanently by adding missing indexes on the following columns :
KEY `sh_endpoints_entity_type` (`endpoints_entity_type`),
KEY `sh_endpoints_entity_id` (`endpoints_entity_id`),
KEY `sh_endpoints_r_index` (`endpoints_r_index`)
Those missing indexes on such a big table were causing the MySQL daemon to execute a table scan. Depending on your physical machine performances (innodb_buffer_pool_size, memory and so on) the query was stuck on the "copying to tmp table" state because result dataset was too large versus tmp_table_size // max_heap_table_size -> flush to the disk implying slow I/Os.
Give it a try and let me know if it could be the official permanent fix regarding this performance issue.
Comment #12
mikran commentedGreat news. I've also noticed slowness with "copying to tmp table" state so that makes sense. Do you have patch that adds the keys?
Comment #13
hclaria commentedNot familiar with the community's patches best practices, hope the following patch which adds the missing indexes will be correct.
Regards,
Comment #14
hclaria commentedMissing semicolons in previous patch, this one should work.
Comment #15
chx commentedThanks for the patch! A few pointers:
Comment #16
hclaria commentedThen I'll let anyone willing to do so, not enough free time to write a proper patch.
Comment #17
mikran commentedComment #18
mikran commentedI rolled this into a proper patch but I didn't do any actual benchmarking yet.
Comment #19
attiks commentedPatch looks good and contains and update hook
Comment #20
mikran commentedThis needs some numbers that prove that the issue is solved, or at least better.
Comment #21
zxaos commentedI've inherited a site exhibiting this problem that's clocking over 30 hours now on a query in this format (granted, it's doing 23 inner joins rather than the two in the example).
I'll try the patch and report back.Sorry, am not going to be able to try the patch.Comment #22
uhkis commentedSome performance testing with slightly modified query from summary:
Row count:
Query used:
Before patch:
After patch:
I also used 2 databases to make sure that the results are not cached.
Comment #23
mikran commentedComment #25
mikran commented