TaxonomyTermController::cacheGet() takes more time than it needs to.
The method was showing up as the top offender in xhprof "Excl CPU (microsec)" with 137 ms. This is a very site-specific number, obviously.

The method contains a foreach() loop that checks if the term name really matches.

Main optimization:
- The foreach() loop should only happen if isset($conditions['name']) is true.

Possibly other optimizations:
- drupal_strtolower($conditions['name']) should be taken out of the foreach() loop.
- The term should not be converted to array!
- Maybe build a new array instead of modifying the existing one?

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

donquixote created an issue. See original summary.

donquixote’s picture

Two versions of this patch.
One builds a new array, the other uses unset() to remove elements of the existing array.

donquixote’s picture

The patches rename $ids to $tids. I don't really have a strong opinion about this.

The last submitted patch, 2: D7-2835798-TaxonomyTermController-cacheGet-v0.patch, failed testing.

donquixote’s picture

v0 needs old-school array syntax.

donquixote’s picture

I notice that the existing TaxonomyTermController::cacheGet() contains a bug.
Look at the brackets:

      if (isset($conditions['name']) && drupal_strtolower($conditions['name'] != drupal_strtolower($term_values['name']))) {

Ouch..

This code is equivalent to this:

      if (isset($conditions['name']) && $conditions['name'] != drupal_strtolower($term_values['name'])) {

because (bool)strtolower(TRUE) === TRUE, and (bool)strtolower(FALSE) === FALSE, see https://3v4l.org/F9HsF

I think originallly the intended code was this:

      if (isset($conditions['name']) && drupal_strtolower($conditions['name']) != drupal_strtolower($term_values['name'])) {

This would be equivalent in behavior to the patch in #5.

This does seem like a BC-breaking behavior change, if we only look at this local code.

However, it actually does not make a difference, because the same terms that are discarded here are loaded later with the query.
Also, we can assume that anything we would discard here was already discarded before in parent::cacheGet().

The actual change that this causes is which entities are loaded from (static) cache, and which are loaded from the database.
The following code snippet demonstrates this, when run in Devel php execution:

function _x() {
  // Assume we have
  // - one term 'photo printers' (lowercase)
  // - one term 'Printers' (ucfirst)
  foreach ([
    'photo printers',
    'Photo printers',
    'printers',
    'Printers',
  ] as $expected_name) {
    $terms0 = taxonomy_term_load_multiple(FALSE, ['name' => $expected_name]);
    $terms1 = taxonomy_term_load_multiple(FALSE, ['name' => $expected_name]);

    $terms_intersect = array_intersect_key($terms0, $terms1);
    $terms_diff_0 = array_diff_key($terms0, $terms1);
    $terms_diff_1 = array_diff_key($terms1, $terms0);

    $tids_diff_pointer = [];
    $tids_same_pointer = [];
    foreach ($terms_intersect as $tid => $term) {
      if ($terms0[$tid] !== $terms1[$tid]) {
        $tids_diff_pointer[] = $tid;
      }
      else {
        $tids_same_pointer[] = $tid;
      }
    }

    dpm(get_defined_vars(), $expected_name);
  }
}

Observation:
- If a lowercase name condition finds a lowercase term, then the same term object from cache will be returned on second call.
- Any other combination causes subsequent calls to return a different object.

We can use this to blow up memory usage:

function _y($expected_name) {
  $identiterms = [];

  // Initialize all caches before calling mem_get_usage().
  taxonomy_term_load_multiple(
    FALSE,
    ['name' => $expected_name]);

  $mem_before = memory_get_usage();
  for ($i = 0; $i < 1000; ++$i) {
    $terms = taxonomy_term_load_multiple(
      FALSE,
      ['name' => $expected_name]);
    $identiterms[] = reset($terms);
  }
  $mem_after = memory_get_usage();
  // I have this useful MemoryUtil::bytesGetHuman() function in my code somewhere.
  // Feel free to do something else instead.
  dpm(MemoryUtil::bytesGetHuman($mem_after - $mem_before));
}

_y('photo printers'); // -> 94.34 KB
_y('Photo printers'); // -> 7.36 MB
_y('printers'); // -> 5.24 MB (this term is a bit smaller)
_y('Printers'); // -> 5.24 MB

So, it seems that everything is fine in the case where term name and search term are both lowercase.

However, even in this case, the database query is run 1000 times!
And not just that, also the DrupalDefaultEntityController::attachLoad() is called 1000 times.
These entity objects are loaded and built, and then discarded because we have them already in the cache.

You can see why in DrupalDefaultEntityController::load(). In case of condition without given ids, this will run the query no matter what.

----------------------

Conclusion

DrupalDefaultEntityController::load() is really quite stupid and inefficient if called with conditions (and without ids).
Fixing this would be out of scope for this issue.

TaxonomyTermController::cacheGet() has some flaws that can be fixed locally. This issue should focus on those flaws.
The patch in #5 is getting close, but I think we can do better after this analysis.

The behavior of TaxonomyTermController::cacheGet() itself does not need to be preserved 100%. It only needs to make sure to not load any terms that the query would not load. So it has to imitate the filtering of TaxonomyTermController::buildQuery() as well as possible.

donquixote’s picture

Category: Task » Bug report

I'd say this qualifies as a bug report, even if the only consequence is the possibility of high memory usage in some obscure cases.

donquixote’s picture

Issue tags: +Performance
donquixote’s picture

Title: Optimize TaxonomyTermController::cacheGet() » Optimize TaxonomyTermController::cacheGet(), and prevent memory leaks.
donquixote’s picture

Btw, the implementation of TaxonomyTermController::buildQuery() and ::cacheGet() also prevent array-valued conditions.

// This will cause "Warning: addcslashes() expects parameter 1 to be string, array given in DatabaseConnection->escapeLike()", and $terms will be empty.
$terms = taxonomy_term_load_multiple(FALSE, ['name' => ['printers']]);

// This works just fine, and returns the correct nodes.
$nodes = node_load_multiple(FALSE, ['title' => ['Frontpage']]);
donquixote’s picture

It is also interesting how the code treats integer values, especially zero.

dpm(count(taxonomy_term_load_multiple(FALSE, ['name' => 0])));  // -> 0
dpm(count(taxonomy_term_load_multiple(FALSE, ['name' => 'photo printers'])));  // -> 1
dpm(count(taxonomy_term_load_multiple(FALSE, ['name' => 0])));  // -> 1
dpm(count(taxonomy_term_load_multiple(FALSE, ['name' => '0'])));  // -> 0

In the third call, the term 'photo printers' is already in memory, and 0 == 'photo printers' is TRUE. See https://3v4l.org/PB4Er

donquixote’s picture

donquixote’s picture

New issue for the more general problem in DrupalDefaultEntityController: #2888658: Cached entities are loaded again, if $condition is given.

donquixote’s picture