Problem/Motivation

loadUnchanged is expensive as it is loading a whole entity and running post load hooks and additionally creating a new object in the memory. An entity query should be less expensive and faster.

Proposed resolution

Determine if the entity has been changed meanwhile through an entity query instead loading the unchanged entity.

As the changed field might not be named "changed" in contrib, we check if the entity has this field and only in this case perform an entity query with conditions for each translation on that field, otherwise fallback to loading the unchanged entity.

Remaining tasks

User interface changes

API changes

Data model changes

Comments

hchonov created an issue. See original summary.

hchonov’s picture

Status: Active » Needs review
StatusFileSize
new1.46 KB

A proof of concept patch.

Status: Needs review » Needs work

The last submitted patch, 2: 2837707-PoC.patch, failed testing.

hchonov’s picture

Status: Needs work » Needs review
StatusFileSize
new1.47 KB
new824 bytes

Oups..

hchonov’s picture

StatusFileSize
new11.11 KB
new9.54 KB

Adding the changed key to the entity annotation.

Status: Needs review » Needs work

The last submitted patch, 5: 2837707-5.patch, failed testing.

hchonov’s picture

Status: Needs work » Needs review
StatusFileSize
new11.12 KB
new820 bytes

Status: Needs review » Needs work

The last submitted patch, 7: 2837707-7.patch, failed testing.

hchonov’s picture

Status: Needs work » Needs review
StatusFileSize
new9.66 KB
new1.75 KB

It looks like adding dynamically the content_translation_changed field to the entity metadata keys in content_translation_entity_type_alter is causing problems on existing sites when rebuilding the cache and going into some infinite loops... So I am removing it for now.

Status: Needs review » Needs work

The last submitted patch, 9: 2837707-9.patch, failed testing.

hchonov’s picture

Status: Needs work » Needs review
hchonov’s picture

Actually entity queries are complex and expensive so another option would be to introduce a new system table: entity_changed_tracking and a hook_entity_insert/update and upsert the changed time in the table if saving the default revision, then query this table instead of building an entity query.

Any thoughts on this?

amateescu’s picture

Since \Drupal\Core\Entity\EntityChangedTrait hard-codes the field name to 'changed', I think we can get away with hard-coding it as well in the entity query for now, and open a separate feature request issue to make the changed field name configurable, which ultimately would look like what we did in #2248983: Define the revision metadata base fields in the entity annotation in order for the storage to create them only in the revision table.

@hchonov mentioned in IRC that someone could implement EntityChangedInterface without using EntityChangedTrait, but that's an edge case IMO, for which we can keep the current loadUnchanged() code as a fallback when an entity type doesn't have a 'changed' field.

amateescu’s picture

Re #12: That would mean another custom table that alternative storage engines would have to implement, so it seems a bit of overkill for bypassing a simple entity query :)

hchonov’s picture

Well we could probably use the key value store instead of creating a new table? So what do you think is better - entity query with hardcoded changed field name vs hook_entity_insert/update + key value storage?

hchonov’s picture

StatusFileSize
new8.19 KB

What I was proposing is something like this by using the key value store for entity changed tracking. If you do not like the idea or think it is not worth the optimization then I would do an entity query instead. The current solution does not require any update hooks as there is a fallback in the EntityChangedConstraintValidator to load the unchanged entity if the key value store does not contain the changed time for the given entity.

hchonov’s picture

StatusFileSize
new8.26 KB
new1.66 KB

Oups .. a small mistake only using id as a name, but I had to use entity_type_id + entity id.

The last submitted patch, 16: 2837707-16.patch, failed testing.

Status: Needs review » Needs work

The last submitted patch, 17: 2837707-17.patch, failed testing.

amateescu’s picture

Well we could probably use the key value store instead of creating a new table? So what do you think is better - entity query with hardcoded changed field name vs hook_entity_insert/update + key value storage?

My personal opinion/preference would be entity query with hardcoded changed field name + fallback to load the unchanged entity.

The downside of the key value approach is that we need to track that an entity throughout its entire lifetime (create, update, delete) and then it's not really a performance optimization anymore.

Version: 8.3.x-dev » 8.4.x-dev

Drupal 8.3.0-alpha1 will be released the week of January 30, 2017, which means new developments and disruptive changes should now be targeted against the 8.4.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.4.x-dev » 8.5.x-dev

Drupal 8.4.0-alpha1 will be released the week of July 31, 2017, which means new developments and disruptive changes should now be targeted against the 8.5.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

hchonov’s picture

Issue summary: View changes
Status: Needs work » Needs review
StatusFileSize
new10.36 KB
new10.38 KB

My personal opinion/preference would be entity query with hardcoded changed field name + fallback to load the unchanged entity.

The downside of the key value approach is that we need to track that an entity throughout its entire lifetime (create, update, delete) and then it's not really a performance optimization anymore.

@amateescu++

After the changes, that we've made in #2837022: Concurrently editing two translations of a node may result in data loss for non-translatable fields now I think as well that tracking the entity throughout its lifetime will be complex, because we would need to track each translation.

In the new patch I check if the entity has a changed field and if so perform an entity query, otherwise fallback to loading the unchanged entity from the storage. An interdiff doesn't really make sense, as the validator has changed its check to individually check each translation.

Status: Needs review » Needs work

The last submitted patch, 23: 2837707-23-8.5.x.patch, failed testing. View results

hchonov’s picture

Status: Needs work » Needs review
StatusFileSize
new3.92 KB

The other changes introducing the entity changed tracking should be removed as well.

hchonov’s picture

StatusFileSize
new4.1 KB
new1.5 KB

If the entity type isn't translatable we should not add the language code to the entity query condition, as otherwise an exception will be thrown for a non-existent column "langcode":

Column not found: 1054 Unknown column 'entity_test_constraints.langcode' in 'on clause': SELECT base_table.id AS id, base_table.id AS base_table_id

.

The last submitted patch, 25: 2837707-25.patch, failed testing. View results

Status: Needs review » Needs work

The last submitted patch, 26: 2837707-26.patch, failed testing. View results

hchonov’s picture

Status: Needs work » Needs review
StatusFileSize
new4.39 KB
new4.52 KB

I've forgot to narrow down the query based on the entity ID :).

But there is some problem with the entity queries when querying the same field in an OR condition group and specifying the langcode.

The failing patch uses a query like the following one:

$query = \Drupal::entityTypeManager()->getStorage('node')->getQuery(); 
$changed_time_conditions = $query->orConditionGroup()->
  condition('changed', 1513858630, '>', 'de')->
  condition('changed', 1513858653, '==', 'en'); 
$query->condition('nid', 1)
  ->condition($changed_time_conditions)
  ->execute();

which unfortunately is compiled to SQL where the language code is not part of each condition for the changed timestamp, but is used for the INNER JOIN:

SELECT base_table.vid AS vid, base_table.nid AS nid
FROM 
{node} base_table
INNER JOIN {node_field_data} node_field_data ON node_field_data.nid = base_table.nid
LEFT JOIN {node_field_data} node_field_data_2 ON node_field_data_2.nid = base_table.nid AND node_field_data_2.langcode = :langcode0
WHERE (node_field_data.nid = :db_condition_placeholder_1) AND ((node_field_data_2.changed > :db_condition_placeholder_2) or (node_field_data_2.changed > :db_condition_placeholder_3)); Array
(
    [:db_condition_placeholder_1] => 1
    [:db_condition_placeholder_2] => 1513858630
    [:db_condition_placeholder_3] => 1513858653
    [:langcode0] => de
)

The passing patch uses a query which looks like this:

$query = \Drupal::entityTypeManager()->getStorage('node')->getQuery('OR');
$and1 = $query->andConditionGroup()
  ->condition('nid', 1)
  ->condition('langcode', 'de')
  ->condition('changed', 1513858630, '>');
$and2 = $query->andConditionGroup()
  ->condition('nid', 1)
  ->condition('langcode', 'en')
  ->condition('changed', 1513858653, '>');
$query->condition($and1)
  ->condition($and2);
$query->execute();

which compiles to SQL where the language code is part of each condition for the changed timestamp:

SELECT base_table.vid AS vid, base_table.nid AS nid
FROM 
{node} base_table
INNER JOIN {node_field_data} node_field_data ON node_field_data.nid = base_table.nid
INNER JOIN {node_field_data} node_field_data_2 ON node_field_data_2.nid = base_table.nid
WHERE ((node_field_data.nid = :db_condition_placeholder_0) and (node_field_data.langcode = :db_condition_placeholder_1) and (node_field_data.changed > :db_condition_placeholder_2)) OR ((node_field_data_2.nid = :db_condition_placeholder_3) and (node_field_data_2.langcode = :db_condition_placeholder_4) and (node_field_data_2.changed > :db_condition_placeholder_5)); Array
(
    [:db_condition_placeholder_0] => 1
    [:db_condition_placeholder_1] => en
    [:db_condition_placeholder_2] => 1513858630
    [:db_condition_placeholder_3] => 1
    [:db_condition_placeholder_4] => de
    [:db_condition_placeholder_5] => 1513858654
)
hchonov’s picture

StatusFileSize
new4.48 KB
new2.41 KB

I was able to move the comparison of the entity ID out of each condition for the changed timestamp, which results in the following SQL and should have better performance:

SELECT base_table.vid AS vid, base_table.nid AS nid
FROM 
{node} base_table
INNER JOIN {node_field_data} node_field_data ON node_field_data.nid = base_table.nid
INNER JOIN {node_field_data} node_field_data_2 ON node_field_data_2.nid = base_table.nid
INNER JOIN {node_field_data} node_field_data_3 ON node_field_data_3.nid = base_table.nid
WHERE (node_field_data.nid = :db_condition_placeholder_0) AND (((node_field_data_2.langcode = :db_condition_placeholder_1) and (node_field_data_2.changed > :db_condition_placeholder_2)) or ((node_field_data_3.langcode = :db_condition_placeholder_3) and (node_field_data_3.changed == :db_condition_placeholder_4)))

The last submitted patch, 29: 2837707-29-entity-query-wrong.patch, failed testing. View results

The last submitted patch, 29: 2837707-29-entity-query-right.patch, failed testing. View results

Version: 8.5.x-dev » 8.6.x-dev

Drupal 8.5.0-alpha1 will be released the week of January 17, 2018, which means new developments and disruptive changes should now be targeted against the 8.6.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.6.x-dev » 8.7.x-dev

Drupal 8.6.0-alpha1 will be released the week of July 16, 2018, which means new developments and disruptive changes should now be targeted against the 8.7.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

kfritsche’s picture

Status: Needs review » Needs work

I'm not an mysql expert but somehow this query is slow on our side.

We have 8 languages and around 4000 entities (16.000 rows in the table - so not all is translated to all languages yet)

The query build by entity API is:

SELECT base_table.vid AS vid, base_table.id AS id
FROM
  as_wf_node_state base_table
    INNER JOIN as_wf_node_state_field_data as_wf_node_state_field_data ON as_wf_node_state_field_data.id = base_table.id
    LEFT JOIN as_wf_node_state_field_data as_wf_node_state_field_data_2 ON as_wf_node_state_field_data_2.id = base_table.id
    LEFT JOIN as_wf_node_state_field_data as_wf_node_state_field_data_3 ON as_wf_node_state_field_data_3.id = base_table.id
    LEFT JOIN as_wf_node_state_field_data as_wf_node_state_field_data_4 ON as_wf_node_state_field_data_4.id = base_table.id
    LEFT JOIN as_wf_node_state_field_data as_wf_node_state_field_data_5 ON as_wf_node_state_field_data_5.id = base_table.id
    LEFT JOIN as_wf_node_state_field_data as_wf_node_state_field_data_6 ON as_wf_node_state_field_data_6.id = base_table.id
    LEFT JOIN as_wf_node_state_field_data as_wf_node_state_field_data_7 ON as_wf_node_state_field_data_7.id = base_table.id
    LEFT JOIN as_wf_node_state_field_data as_wf_node_state_field_data_8 ON as_wf_node_state_field_data_8.id = base_table.id
    LEFT JOIN as_wf_node_state_field_data as_wf_node_state_field_data_9 ON as_wf_node_state_field_data_9.id = base_table.id
WHERE (as_wf_node_state_field_data.id = '1354')
  AND (((as_wf_node_state_field_data_2.langcode = 'en')
  and (as_wf_node_state_field_data_2.changed > '1547743190'))
  or ((as_wf_node_state_field_data_3.langcode = 'de')
    and (as_wf_node_state_field_data_3.changed > '1547743190'))
  or ((as_wf_node_state_field_data_4.langcode = 'de-AT')
    and (as_wf_node_state_field_data_4.changed > '1547743190'))
  or ((as_wf_node_state_field_data_5.langcode = 'nl')
    and (as_wf_node_state_field_data_5.changed > '1547743190'))
  or ((as_wf_node_state_field_data_6.langcode = 'el')
    and (as_wf_node_state_field_data_6.changed > '1547743190'))
  or ((as_wf_node_state_field_data_7.langcode = 'sl')
    and (as_wf_node_state_field_data_7.changed > '1547743190'))
  or ((as_wf_node_state_field_data_8.langcode = 'es')
    and (as_wf_node_state_field_data_8.changed > '1547743190'))
  or ((as_wf_node_state_field_data_9.langcode = 'it')
    and (as_wf_node_state_field_data_9.changed > '1547743190')))

Running this query takes 1 minute.

Using explain it doesn't seem to be a missing index problem or similar, but sitll here the output of explain.

1	SIMPLE	as_wf_node_state_field_data	ref	PRIMARY,as_wf_node_state__id__default_langcode__langcode	as_wf_node_state__id__default_langcode__langcode	4	const	8	Using index
1	SIMPLE	base_table	eq_ref	PRIMARY	PRIMARY	4	dev_as8.as_wf_node_state_field_data.id	1
1	SIMPLE	as_wf_node_state_field_data_2	ref	PRIMARY,as_wf_node_state__id__default_langcode__langcode	PRIMARY	4	dev_as8.as_wf_node_state_field_data.id	2
1	SIMPLE	as_wf_node_state_field_data_3	ref	PRIMARY,as_wf_node_state__id__default_langcode__langcode	PRIMARY	4	dev_as8.as_wf_node_state_field_data.id	2
1	SIMPLE	as_wf_node_state_field_data_4	ref	PRIMARY,as_wf_node_state__id__default_langcode__langcode	PRIMARY	4	dev_as8.as_wf_node_state_field_data.id	2
1	SIMPLE	as_wf_node_state_field_data_5	ref	PRIMARY,as_wf_node_state__id__default_langcode__langcode	PRIMARY	4	dev_as8.as_wf_node_state_field_data.id	2
1	SIMPLE	as_wf_node_state_field_data_6	ref	PRIMARY,as_wf_node_state__id__default_langcode__langcode	PRIMARY	4	dev_as8.as_wf_node_state_field_data.id	2
1	SIMPLE	as_wf_node_state_field_data_7	ref	PRIMARY,as_wf_node_state__id__default_langcode__langcode	PRIMARY	4	dev_as8.as_wf_node_state_field_data.id	2
1	SIMPLE	as_wf_node_state_field_data_8	ref	PRIMARY,as_wf_node_state__id__default_langcode__langcode	PRIMARY	4	dev_as8.as_wf_node_state_field_data.id	2
1	SIMPLE	as_wf_node_state_field_data_9	ref	PRIMARY,as_wf_node_state__id__default_langcode__langcode	PRIMARY	4	dev_as8.as_wf_node_state_field_data.id	2	Using where

Loading the full entity is better in our case than this query...

amateescu’s picture

Since #2753675: loadUnchanged should use the persistent cache to load the unchanged entity if possible instead of invalidating it, loadUnchanged() is a very simple "persistent cache get" in most cases, and no entity query will ever be more performant than that.

I think we should close this issue as won't fix. @hchonov, do you agree?

hchonov’s picture

Status: Needs work » Closed (won't fix)

@amateescu, yes, you are right - we cannot improve this further.