Follow-up to #2658268: Cache persistence after derivative creation

Problem/Motivation

\Drupal\crop\Entity\Crop::findCrop() is a often-used to find a crop type for the given URI and crop type.
It uses an entity query on "URI", "type" and sorts the results on "cid" field.

Proposed resolution

"cid" seems like unnecessary here. Also, there should be a query index on "URI" and "type" fields.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

mbovan created an issue. See original summary.

slashrsm’s picture

Issue tags: +D8Media

Sort by "cid" is there to have predictable results in an (unlikely) case of more crops being in DB for a given image style, uri combination. By using the one with the lowest cid we make sure we always return the oldest one.

I am not sure if that is strictly needed. Maybe we could handle this on save and prevent such duplicates from being inserted in the first place.

Definitely +1 for index.

Berdir’s picture

But don't we identify the crop based on the URI?

Ah, I guess there could be duplicates if you use multiple crop modules? But that seems... problematic anyway :)

also, that sorts by cid ascending, so it will always use the first.

anyway, we can keep that and just add that to the index too. See \Drupal\paragraphs\ParagraphStorageSchema for an example that adds a combined index and \paragraphs_update_8008() for the necessary update function.

This is probably a pretty good task for the drupalcon sprints...

matiasmiranda’s picture

Currently I'm working in a site with large amount of images cropped using focal_point module.

I'm experienced an overload in the RDS instance of MySQL then I could observe that the overload occurs due to a copy of data to memory, performed by MySQL.

This is an example of MySQL Show Process List output

Copying to tmp table | SELECT base_table.vid AS vid, base_table.cid AS cid FROM crop base_table INNER JOIN crop_field_data crop_field_data ON crop_field_data.cid = base_table.cid WHERE (crop_field_data.uri LIKE 'public://2017-04/12345.jpg' ESCAPE '\\') AND (crop_field_data.type = 'focal_point') GROUP BY base_table.vid, base_table.cid ORDER BY base_table.cid ASC LIMIT 1 OFFSET 0

Currently I have 15k registers in crop table, for that reason when the data is copied the memory is exhausted, then MySQL starts copying the temporary data to disk instead memory and the use of processor reachs 100%, and then the server crashes

In some cases MySQL creates internal temporary tables while processing statements, in our case due to the ORDER BY and the GROUP BY clauses.

The crop table has column "cid" as PRIMARY and Auto_incremet, for that reason we can remove ORDER BY, because in that case the result set is delivered ordered by PRIMARY KEY.

But that is not enough, because range(0,1) in findCrop still adding a GROUP BY clause to the statement, to avoid this is necessary the metaData simple_query = TRUE

By this way, findCrop will retrieve the oldest cid for a certain URI and Crop Type, without copying data to tmp table in memory or disk.

I attach a patch to be reviewed.

dagmar’s picture

Status: Active » Needs review
Berdir’s picture

not sure the simple_query thing is something that should be set from outside.

I started working on optimizing entity queries in #2875033: Optimize joins and table selection in SQL entity query implementation, would be interesting to see how that affects that query. the patch is definitely not yet ready though.

matiasmiranda’s picture

Interesting.. maybe the simple_query not should be seted outside, anyway for this case it solves the problem, and will work fine until your optimization in entity query be ready (will be nice to have a retro-compatibility in order to avoid remove the metaData :)

matiasmiranda’s picture

You're right @Berdir, is not a good idea use metaData here. So, I rewrote the findCrop method to use a simple query using db api instead entity query.

Here goes a new patch to be reviewed

Status: Needs review » Needs work

The last submitted patch, 8: improving_findCrop_performance-2800739-8.patch, failed testing.

matiasmiranda’s picture

Status: Needs work » Needs review
FileSize
1005 bytes

Oops, the "Reformat Code" of my IDE made a mistake.. I'm sorry for that
Here goes again..

Berdir’s picture

Status: Needs review » Needs work

Yes, that's possibly the fastest option, I did something like that as well in redirect.module. However, then it either needs be a method on the storage or a separate service, tagged with backend_overridable, see the redirect.repository service.

So that if someone wants to replace the storage and use MongoDB or something like that, they can provide an alternative implementation of that service.

woprrr’s picture

Assigned: Unassigned » woprrr

Hi all in that case we can purpose a new Crop Storage like "\Drupal\comment\CommentStorage" way to use that fastest option without loose other system compatibility. I try to purpose you a new one today !

woprrr’s picture

Status: Needs work » Needs review
FileSize
2.24 KB
2.38 KB

Here the patch with "CropStorage" implemented and use in findCrop() method and that permit overrides Crop handler to change it for MongoDB or something like that.

@Berdir ++ for backend_overridable I don't know that I see that. Which method are better in our case ? EntityStorage or backend_overridable service ?

Berdir’s picture

+++ b/src/CropStorage.php
@@ -5,8 +5,28 @@ namespace Drupal\crop;
+
+    return $query->execute()->fetchField();

You could consider to already return null or the loaded entity here with $cid ? $this->load($cid) : NULL;

Would make the static findCrop() method a simple one-liner.

woprrr’s picture

You totally right !! And I really love to make methods more simpler/cleaner as possible. Here you are the patch with refactoring applied.

Berdir’s picture

Status: Needs review » Reviewed & tested by the community

I'd write "if nothing matches the search parameters" or so, but other than that, looks fine to me.

Status: Reviewed & tested by the community » Needs work

The last submitted patch, 15: improve_findcrop-2800739-15.patch, failed testing. View results

woprrr’s picture

Status: Needs work » Reviewed & tested by the community
FileSize
2.39 KB
415 bytes

Here the grammar fixe + follow-up issue to solve deprecated BaseFieldDefinition breaking the tests #2925305: Change all deprecated BaseFieldDefinition use

Status: Reviewed & tested by the community » Needs work

The last submitted patch, 18: improve_findcrop-2800739-18.patch, failed testing. View results

woprrr’s picture

Status: Needs work » Reviewed & tested by the community
Berdir’s picture

Looks good to me, lets get this in for both 8.x-1.x and 8.x-2.x.

  • woprrr committed 0662430 on 8.x-1.x
    Issue #2800739 by woprrr, matiasmiranda, Berdir, mbovan: Improve...

  • woprrr committed 32f60ab on 8.x-2.x
    Issue #2800739 by woprrr, matiasmiranda, Berdir, mbovan: Improve...
woprrr’s picture

Status: Reviewed & tested by the community » Fixed

Merged in both branches Thanks all :) good job.

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.