I've been doing some testing with Relation to see where it starts to run into performance issues. We built a relation that relates content nodes to a parent node in a manner very similar to OG content. I tested with around 250 "group" nodes and a large number of content nodes. It looks like once field_data_endpoints has 20-30K relations in it, most views queries become a bottleneck due to the inner join to find the relation. For example:

SELECT DISTINCT field_data_endpoints0.entity_type AS entity_type, field_data_endpoints0.entity_id AS entity_id, field_data_endpoints0.revision_id AS revision_id, field_data_endpoints0.bundle AS bundle
FROM 
field_data_endpoints field_data_endpoints0
INNER JOIN field_data_endpoints field_data_endpoints1 ON field_data_endpoints1.entity_type = field_data_endpoints0.entity_type AND field_data_endpoints1.entity_id = field_data_endpoints0.entity_id
INNER JOIN relation relation ON relation.rid = field_data_endpoints0.entity_id
WHERE  (field_data_endpoints0.endpoints_entity_type = 'node') AND (field_data_endpoints1.endpoints_entity_id = '280') AND (field_data_endpoints1.delta = field_data_endpoints0.delta) AND (field_data_endpoints0.deleted = '0') AND (relation.relation_type = 'about') AND (field_data_endpoints0.entity_type = 'relation')

EXPLAINs as:

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: relation
         type: ref
possible_keys: PRIMARY,relation_types
          key: relation_types
      key_len: 767
          ref: const
         rows: 1
        Extra: Using where; Using index; Using temporary
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: field_data_endpoints0
         type: ref
possible_keys: PRIMARY,entity_type,deleted,entity_id,endpoints_relation
          key: PRIMARY
      key_len: 391
          ref: const,juno.relation.rid,const
         rows: 1
        Extra: Using where
*************************** 3. row ***************************
           id: 1
  select_type: SIMPLE
        table: field_data_endpoints1
         type: ref
possible_keys: PRIMARY,entity_type,entity_id
          key: PRIMARY
      key_len: 390
          ref: const,juno.relation.rid
         rows: 1
        Extra: Using where; Distinct

It's the temporary table from the inner join on field_data_endpoints that is killing performance.

I'm sure others are using Relation with far more data then I was testing with, so does anyone have suggestions for improving performance? Or is it simply a problem of having a large number of source entities pointing to a small number of target entities, reducing index performance? It seems like I might just have to go an alternate storage engine for the field, but I would have expected to hit the limit at far more than 30K nodes.

Comments

chx’s picture

If you could sanitize the DB and attach the tarball that would be great. When I have tested relation I havent seen such things. However, I wonder -- can you get rid of that DISTINCT? Does that help?

deviantintegral’s picture

StatusFileSize
new608.47 KB

Removing DISTINCT helps in that it does remove the need temporary table, but the query still takes just as long to execute.

Here's a copy of the {relation} and {field_data_endpoints} table.

bendikrb’s picture

The exact same problem over here.
After importing a rather large dataset, I now have 600k+ records in {field_data_endpoints} and about 300k in {relation} table. My relations is between standard drupal nodes.

I'm getting no less than 3000ms per query, often times as much as 6000ms!

Starting to get cold feet...

mikran’s picture

Issue tags: +7.x-1.0 blocker

Even though this is still in support requests the performance is a concern for a stable release so marking this as a blocker too. #1707058: Relation 7.x-1.0

chx’s picture

Assigned: Unassigned » chx

I will try to slot this in within a few days. Please ping me on IRC/Twitter/email if I don't before Aug 13.

steveoliver’s picture

Assigned: chx » Unassigned
Category: support » task
Priority: Normal » Major

Marking this a major priority task and unassigning from chx as he's super busy and has less than no time for Relation these days.

leenx’s picture

We have 122K field_data_endpoints and 61K relations for a start and see 112502ms for page view.

Tried APC, memcached and even tuned mysql pretty much, but seeing a query very similar to the above query in the mysql slow query log. All else seems fine. Pages without relations are fine. Was thinking that maybe entitycache might help, if we could push into a cache bin of some sort.

So this is a little detailed follow and hope somebody else has an idea on a fix.

rafamd’s picture

Issue tags: +scalability

Auch, we were planning to use relation on an even bigger dataset... did you find any solution to this issue?

bendikrb’s picture

I ended up building the site with pgsql instead, which got alot faster.

Also, I wrongly blamed some of the issues I had on the relation module, when infact the most significant problem was with the relation_select module (it deep-loads the relation entity on every entity view, among other flaws.

mikran’s picture

Core issue #1859084: Conditions on multi-column fields in EntityFieldQuery can solve some problems with RelationQuery

hclaria’s picture

We encountered the same issue for one of our customer's website.
The issue started as soon as the table was getting filled with records at ~60 000 records.

We did manage to solve the issue permanently by adding missing indexes on the following columns :
KEY `sh_endpoints_entity_type` (`endpoints_entity_type`),
KEY `sh_endpoints_entity_id` (`endpoints_entity_id`),
KEY `sh_endpoints_r_index` (`endpoints_r_index`)

Those missing indexes on such a big table were causing the MySQL daemon to execute a table scan. Depending on your physical machine performances (innodb_buffer_pool_size, memory and so on) the query was stuck on the "copying to tmp table" state because result dataset was too large versus tmp_table_size // max_heap_table_size -> flush to the disk implying slow I/Os.

Give it a try and let me know if it could be the official permanent fix regarding this performance issue.

mikran’s picture

Great news. I've also noticed slowness with "copying to tmp table" state so that makes sense. Do you have patch that adds the keys?

hclaria’s picture

Not familiar with the community's patches best practices, hope the following patch which adds the missing indexes will be correct.

Regards,

hclaria’s picture

StatusFileSize
new168 bytes

Missing semicolons in previous patch, this one should work.

chx’s picture

Thanks for the patch! A few pointers:

  1. Please do not gz them.
  2. Please write an update hook https://api.drupal.org/api/drupal/modules%21system%21system.api.php/func...
  3. Please use https://api.drupal.org/api/drupal/includes%21database%21database.inc/fun...
hclaria’s picture

Then I'll let anyone willing to do so, not enough free time to write a proper patch.

mikran’s picture

Assigned: Unassigned » mikran
mikran’s picture

Status: Active » Needs review
StatusFileSize
new1.03 KB

I rolled this into a proper patch but I didn't do any actual benchmarking yet.

attiks’s picture

Status: Needs review » Reviewed & tested by the community

Patch looks good and contains and update hook

mikran’s picture

Status: Reviewed & tested by the community » Needs review

This needs some numbers that prove that the issue is solved, or at least better.

zxaos’s picture

I've inherited a site exhibiting this problem that's clocking over 30 hours now on a query in this format (granted, it's doing 23 inner joins rather than the two in the example).

I'll try the patch and report back. Sorry, am not going to be able to try the patch.

uhkis’s picture

Some performance testing with slightly modified query from summary:
Row count:

MariaDB [relation]> select count(*) from field_data_endpoints\G
*************************** 1. row ***************************
count(*): 82486

Query used:

SELECT DISTINCT field_data_endpoints0.entity_type AS entity_type, field_data_endpoints0.entity_id AS entity_id, field_data_endpoints0.revision_id AS revision_id, field_data_endpoints0.bundle AS bundle FROM field_data_endpoints field_data_endpoints0 INNER JOIN field_data_endpoints field_data_endpoints1 ON field_data_endpoints1.entity_type = field_data_endpoints0.entity_type AND field_data_endpoints1.entity_id = field_data_endpoints0.entity_id INNER JOIN relation relation ON relation.rid = field_data_endpoints0.entity_id WHERE  (field_data_endpoints1.endpoints_entity_id = '21280') AND (field_data_endpoints1.delta = field_data_endpoints0.delta) AND (field_data_endpoints0.deleted = '0') AND (field_data_endpoints0.entity_type = 'relation');

Before patch:

2 rows in set (0.37 sec)

After patch:

2 rows in set (0.00 sec)

I also used 2 databases to make sure that the results are not cached.

mikran’s picture

Status: Needs review » Fixed

  • mikran committed f693812 on 7.x-1.x authored by hclaria
    Issue #1649398 by hclaria, mikran, Uhkis: Fixed performance issues by...
mikran’s picture

Parent issue: » #1707058: Relation 7.x-1.0

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.