TaxonomyTermController::cacheGet() takes more time than it needs to.
The method was showing up as the top offender in xhprof "Excl CPU (microsec)" with 137 ms. This is a very site-specific number, obviously.
The method contains a foreach() loop that checks if the term name really matches.
Main optimization:
- The foreach() loop should only happen if isset($conditions['name'])
is true.
Possibly other optimizations:
- drupal_strtolower($conditions['name'])
should be taken out of the foreach() loop.
- The term should not be converted to array!
- Maybe build a new array instead of modifying the existing one?
Comment | File | Size | Author |
---|---|---|---|
#14 | D7-2835798-14-TaxonomyTermController-optimize.patch | 3.94 KB | donquixote |
#14 | D7-2835798-14-vs-12-TaxonomyTermController-optimize.interdiff.txt | 554 bytes | donquixote |
Comments
Comment #2
donquixote CreditAttribution: donquixote as a volunteer commentedTwo versions of this patch.
One builds a new array, the other uses unset() to remove elements of the existing array.
Comment #3
donquixote CreditAttribution: donquixote as a volunteer commentedThe patches rename
$ids
to$tids
. I don't really have a strong opinion about this.Comment #5
donquixote CreditAttribution: donquixote as a volunteer commentedv0 needs old-school array syntax.
Comment #6
donquixote CreditAttribution: donquixote as a volunteer commentedI notice that the existing
TaxonomyTermController::cacheGet()
contains a bug.Look at the brackets:
Ouch..
This code is equivalent to this:
because
(bool)strtolower(TRUE) === TRUE
, and(bool)strtolower(FALSE) === FALSE
, see https://3v4l.org/F9HsFI think originallly the intended code was this:
This would be equivalent in behavior to the patch in #5.
This does seem like a BC-breaking behavior change, if we only look at this local code.
However, it actually does not make a difference, because the same terms that are discarded here are loaded later with the query.
Also, we can assume that anything we would discard here was already discarded before in
parent::cacheGet()
.The actual change that this causes is which entities are loaded from (static) cache, and which are loaded from the database.
The following code snippet demonstrates this, when run in Devel php execution:
Observation:
- If a lowercase name condition finds a lowercase term, then the same term object from cache will be returned on second call.
- Any other combination causes subsequent calls to return a different object.
We can use this to blow up memory usage:
So, it seems that everything is fine in the case where term name and search term are both lowercase.
However, even in this case, the database query is run 1000 times!
And not just that, also the
DrupalDefaultEntityController::attachLoad()
is called 1000 times.These entity objects are loaded and built, and then discarded because we have them already in the cache.
You can see why in
DrupalDefaultEntityController::load()
. In case of condition without given ids, this will run the query no matter what.----------------------
Conclusion
DrupalDefaultEntityController::load()
is really quite stupid and inefficient if called with conditions (and without ids).Fixing this would be out of scope for this issue.
TaxonomyTermController::cacheGet()
has some flaws that can be fixed locally. This issue should focus on those flaws.The patch in #5 is getting close, but I think we can do better after this analysis.
The behavior of TaxonomyTermController::cacheGet() itself does not need to be preserved 100%. It only needs to make sure to not load any terms that the query would not load. So it has to imitate the filtering of
TaxonomyTermController::buildQuery()
as well as possible.Comment #7
donquixote CreditAttribution: donquixote as a volunteer commentedI'd say this qualifies as a bug report, even if the only consequence is the possibility of high memory usage in some obscure cases.
Comment #8
donquixote CreditAttribution: donquixote as a volunteer commentedComment #9
donquixote CreditAttribution: donquixote as a volunteer commentedComment #10
donquixote CreditAttribution: donquixote as a volunteer commentedBtw, the implementation of TaxonomyTermController::buildQuery() and ::cacheGet() also prevent array-valued conditions.
Comment #11
donquixote CreditAttribution: donquixote as a volunteer commentedIt is also interesting how the code treats integer values, especially zero.
In the third call, the term 'photo printers' is already in memory, and
0 == 'photo printers'
is TRUE. See https://3v4l.org/PB4ErComment #12
donquixote CreditAttribution: donquixote as a volunteer commentedHere is an implementation that I think is smarter than what we do now.
Comment #13
donquixote CreditAttribution: donquixote as a volunteer commentedNew issue for the more general problem in DrupalDefaultEntityController: #2888658: Cached entities are loaded again, if $condition is given.
Comment #14
donquixote CreditAttribution: donquixote as a volunteer commentedOops. Leftover snippet in #12.