Performance issues with many relations [#1649398]

I've been doing some testing with Relation to see where it starts to run into performance issues. We built a relation that relates content nodes to a parent node in a manner very similar to OG content. I tested with around 250 "group" nodes and a large number of content nodes. It looks like once field_data_endpoints has 20-30K relations in it, most views queries become a bottleneck due to the inner join to find the relation. For example:

SELECT DISTINCT field_data_endpoints0.entity_type AS entity_type, field_data_endpoints0.entity_id AS entity_id, field_data_endpoints0.revision_id AS revision_id, field_data_endpoints0.bundle AS bundle
FROM 
field_data_endpoints field_data_endpoints0
INNER JOIN field_data_endpoints field_data_endpoints1 ON field_data_endpoints1.entity_type = field_data_endpoints0.entity_type AND field_data_endpoints1.entity_id = field_data_endpoints0.entity_id
INNER JOIN relation relation ON relation.rid = field_data_endpoints0.entity_id
WHERE  (field_data_endpoints0.endpoints_entity_type = 'node') AND (field_data_endpoints1.endpoints_entity_id = '280') AND (field_data_endpoints1.delta = field_data_endpoints0.delta) AND (field_data_endpoints0.deleted = '0') AND (relation.relation_type = 'about') AND (field_data_endpoints0.entity_type = 'relation')

EXPLAINs as:

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: relation
         type: ref
possible_keys: PRIMARY,relation_types
          key: relation_types
      key_len: 767
          ref: const
         rows: 1
        Extra: Using where; Using index; Using temporary
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: field_data_endpoints0
         type: ref
possible_keys: PRIMARY,entity_type,deleted,entity_id,endpoints_relation
          key: PRIMARY
      key_len: 391
          ref: const,juno.relation.rid,const
         rows: 1
        Extra: Using where
*************************** 3. row ***************************
           id: 1
  select_type: SIMPLE
        table: field_data_endpoints1
         type: ref
possible_keys: PRIMARY,entity_type,entity_id
          key: PRIMARY
      key_len: 390
          ref: const,juno.relation.rid
         rows: 1
        Extra: Using where; Distinct

It's the temporary table from the inner join on field_data_endpoints that is killing performance.

I'm sure others are using Relation with far more data then I was testing with, so does anyone have suggestions for improving performance? Or is it simply a problem of having a large number of source entities pointing to a small number of target entities, reducing index performance? It seems like I might just have to go an alternate storage engine for the field, but I would have expected to hit the limit at far more than 30K nodes.

Comment	File	Size	Author
#18	add_separate_indexes_to_endpoint_columns-1649398-18.patch	1.03 KB	mikran
#14	1649398_field_data_endpoints_add_indexes_2.sql_.gz	168 bytes	hclaria
#13	1649398_field_data_endpoints_add_indexes.sql_.gz	165 bytes	hclaria
#2	1649398.2-tables.sql_.gz	608.47 KB	deviantintegral

Comments

Comment #1

chx commented 4 July 2012 at 09:18

If you could sanitize the DB and attach the tarball that would be great. When I have tested relation I havent seen such things. However, I wonder -- can you get rid of that DISTINCT? Does that help?

Comment #2

deviantintegral commented 4 July 2012 at 17:29

Status	File	Size
new	1649398.2-tables.sql_.gz	608.47 KB

Removing DISTINCT helps in that it does remove the need temporary table, but the query still takes just as long to execute.

Here's a copy of the {relation} and {field_data_endpoints} table.

Comment #3

bendikrb commented 23 July 2012 at 14:11

The exact same problem over here.
After importing a rather large dataset, I now have 600k+ records in {field_data_endpoints} and about 300k in {relation} table. My relations is between standard drupal nodes.

I'm getting no less than 3000ms per query, often times as much as 6000ms!

Starting to get cold feet...

Comment #4

mikran commented 6 August 2012 at 05:07

Issue tags:

+7.x-1.0 blocker

Even though this is still in support requests the performance is a concern for a stable release so marking this as a blocker too. #1707058: Relation 7.x-1.0

Comment #5

chx commented 6 August 2012 at 11:33

Assigned:

Unassigned

» chx

I will try to slot this in within a few days. Please ping me on IRC/Twitter/email if I don't before Aug 13.

Comment #6

steveoliver commented 18 October 2012 at 19:44

Assigned:	chx	» Unassigned
Category:	support	» task
Priority:	Normal	» Major

Marking this a major priority task and unassigning from chx as he's super busy and has less than no time for Relation these days.

Comment #7

leenx commented 31 October 2012 at 18:39

We have 122K field_data_endpoints and 61K relations for a start and see 112502ms for page view.

Tried APC, memcached and even tuned mysql pretty much, but seeing a query very similar to the above query in the mysql slow query log. All else seems fine. Pages without relations are fine. Was thinking that maybe entitycache might help, if we could push into a cache bin of some sort.

So this is a little detailed follow and hope somebody else has an idea on a fix.

Comment #8

rafamd commented 18 March 2013 at 15:58

Issue tags:

+scalability

Auch, we were planning to use relation on an even bigger dataset... did you find any solution to this issue?

Comment #9

bendikrb commented 27 March 2013 at 18:36

I ended up building the site with pgsql instead, which got alot faster.

Also, I wrongly blamed some of the issues I had on the relation module, when infact the most significant problem was with the relation_select module (it deep-loads the relation entity on every entity view, among other flaws.

Comment #10

mikran commented 27 March 2013 at 19:13

Core issue #1859084: Conditions on multi-column fields in EntityFieldQuery can solve some problems with RelationQuery

Comment #11

hclaria commented 8 September 2014 at 10:17

We encountered the same issue for one of our customer's website.
The issue started as soon as the table was getting filled with records at ~60 000 records.

We did manage to solve the issue permanently by adding missing indexes on the following columns :
KEY `sh_endpoints_entity_type` (`endpoints_entity_type`),
KEY `sh_endpoints_entity_id` (`endpoints_entity_id`),
KEY `sh_endpoints_r_index` (`endpoints_r_index`)

Those missing indexes on such a big table were causing the MySQL daemon to execute a table scan. Depending on your physical machine performances (innodb_buffer_pool_size, memory and so on) the query was stuck on the "copying to tmp table" state because result dataset was too large versus tmp_table_size // max_heap_table_size -> flush to the disk implying slow I/Os.

Give it a try and let me know if it could be the official permanent fix regarding this performance issue.

Comment #12

mikran commented 8 September 2014 at 10:29

Great news. I've also noticed slowness with "copying to tmp table" state so that makes sense. Do you have patch that adds the keys?

Comment #13

hclaria commented 8 September 2014 at 11:00

Status	File	Size
new	1649398_field_data_endpoints_add_indexes.sql_.gz	165 bytes

Not familiar with the community's patches best practices, hope the following patch which adds the missing indexes will be correct.

Regards,

Comment #14

hclaria commented 8 September 2014 at 11:04

Status	File	Size
new	1649398_field_data_endpoints_add_indexes_2.sql_.gz	168 bytes

1 file was hidden/shown/deleted

Status	File	Size
hidden	1649398_field_data_endpoints_add_indexes.sql_.gz	165 bytes

Missing semicolons in previous patch, this one should work.

Comment #15

chx commented 8 September 2014 at 15:33

Thanks for the patch! A few pointers:

Please do not gz them.
Please write an update hook https://api.drupal.org/api/drupal/modules%21system%21system.api.php/func...
Please use https://api.drupal.org/api/drupal/includes%21database%21database.inc/fun...

Comment #16

hclaria commented 8 September 2014 at 16:45

Then I'll let anyone willing to do so, not enough free time to write a proper patch.

Comment #17

mikran commented 10 October 2014 at 10:53

Assigned:

Unassigned

» mikran

Comment #18

mikran commented 10 October 2014 at 12:38

Status:

Active

» Needs review

Status	File	Size
new	add_separate_indexes_to_endpoint_columns-1649398-18.patch	1.03 KB

I rolled this into a proper patch but I didn't do any actual benchmarking yet.

Comment #19

attiks commented 10 October 2014 at 20:49

Status:

Needs review

» Reviewed & tested by the community

Patch looks good and contains and update hook

Comment #20

mikran commented 29 November 2014 at 12:20

Status:

Reviewed & tested by the community

» Needs review

This needs some numbers that prove that the issue is solved, or at least better.

Comment #21

zxaos commented 5 December 2014 at 14:44

I've inherited a site exhibiting this problem that's clocking over 30 hours now on a query in this format (granted, it's doing 23 inner joins rather than the two in the example).

~~I'll try the patch and report back.~~ Sorry, am not going to be able to try the patch.

Comment #22

uhkis commented 18 December 2014 at 08:49

Some performance testing with slightly modified query from summary:
Row count:

MariaDB [relation]> select count(*) from field_data_endpoints\G
*************************** 1. row ***************************
count(*): 82486

Query used:

SELECT DISTINCT field_data_endpoints0.entity_type AS entity_type, field_data_endpoints0.entity_id AS entity_id, field_data_endpoints0.revision_id AS revision_id, field_data_endpoints0.bundle AS bundle FROM field_data_endpoints field_data_endpoints0 INNER JOIN field_data_endpoints field_data_endpoints1 ON field_data_endpoints1.entity_type = field_data_endpoints0.entity_type AND field_data_endpoints1.entity_id = field_data_endpoints0.entity_id INNER JOIN relation relation ON relation.rid = field_data_endpoints0.entity_id WHERE  (field_data_endpoints1.endpoints_entity_id = '21280') AND (field_data_endpoints1.delta = field_data_endpoints0.delta) AND (field_data_endpoints0.deleted = '0') AND (field_data_endpoints0.entity_type = 'relation');

Before patch:

2 rows in set (0.37 sec)

After patch:

2 rows in set (0.00 sec)

I also used 2 databases to make sure that the results are not cached.

Comment #23

mikran commented 18 December 2014 at 08:56

Status:

Needs review

» Fixed

Comment #24

18 December 2014 at 08:57

mikran committed f693812 on 7.x-1.x authored by hclaria

Issue #1649398 by hclaria, mikran, Uhkis: Fixed performance issues by...

Comment #25

mikran commented 18 December 2014 at 08:59

Parent issue:

» #1707058: Relation 7.x-1.0

Comment #26

1 January 2015 at 09:04

Status:

Fixed

» Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.

Performance issues with many relations

Comments

Comment #1

Comment #2

Comment #3

Comment #4

Comment #5

Comment #6

Comment #7

Comment #8

Comment #9

Comment #10

Comment #11

Comment #12

Comment #13

Comment #14

Comment #15

Comment #16

Comment #17

Comment #18

Comment #19

Comment #20

Comment #21

Comment #22

Comment #23

Comment #24

Comment #25

Comment #26

Parent issue

News items

Our community

Documentation

Drupal code base

Governance of community