Problem/Motivation

The following statistics were extracted from a particular SQL instance. The site is a big one, using menu position rules extensively, having 720 active ones for now. Biggest issues in it are bytes sent and average execution time.

SQL query statistics from percona-tools: totals, top 10 slowest queries and the details for the slowest one:

# Totals:
# Overall: 68.51M total, 611 unique, 196.49 QPS, 0.11x concurrency _______
# Time range: 2016-04-21 11:05:35 to 2016-04-25 11:56:51
# Attribute          total     min     max     avg     95%  stddev  median
# ============     ======= ======= ======= ======= ======= ======= =======
# Exec time         38523s    51us    267s   562us     1ms    64ms   125us
# Lock time          3616s       0   429ms    52us    66us    65us    47us
# Rows sent          4.52G       0   2.92M   70.84  420.77   1.38k    1.96
# Rows examine      10.56G       0   2.92M  165.54  420.77   3.13k    4.96
# Rows affecte      28.96k       0   7.24k    0.00       0    0.90       0
# Bytes sent       901.28G      11 664.23M  13.79k  13.78k 268.08k  313.99
# Query size         5.17G       6  79.56k   80.97  102.22  110.15   54.21

# Profile
# Rank Query ID           Response time    Calls    R/Call  V/M   Item
# ==== ================== ================ ======== ======= ===== ========
#    1 0x8C2FFCA0307FA532 14136.2340 36.7%  2650059  0.0053  0.00 SELECT menu_position_rules
#    2 0x9D0CB602DD025CCE  2931.5101  7.6%  3995441  0.0007  0.00 SELECT domain_alias
#    3 0x8BA7189ACA371127  2807.3743  7.3%  8248377  0.0003  0.00 SELECT domain
#    4 0x67A347A2812914DF  1592.6305  4.1%    10216  0.1559 16... SELECT CUSTOM-TABLE-IN-NON-DRUPAL-PROJECT (Ignore).
#    5 0x0A47FF1E378DDAD2  1466.4478  3.8%  2221700  0.0007  0.00 SELECT domain_alias
#    6 0x6CA6B1186679597D  1276.9851  3.3%    35092  0.0364  0.00 SELECT menu_links menu_router
#    7 0x412A031543B4EA48  1199.9383  3.1%  4254064  0.0003  0.00 SELECT languages
#    8 0x0A971376736289F0  1139.7765  3.0% 11610182  0.0001  0.24 SELECT domain_export
#    9 0xE7091269E98B03C3   955.1257  2.5%    11469  0.0833  0.00 SELECT redirect
#   10 0xF598075701F7C596   893.9671  2.3%  7357200  0.0001  0.00 SELECT date_format_locale

# Query 1: 7.60 QPS, 0.04x concurrency, ID 0x8C2FFCA0307FA532 at byte 3879110581
# This item is included in the report because it matches --limit.
# Scores: V/M = 0.00
# Time range: 2016-04-21 11:05:35 to 2016-04-25 11:56:51
# Attribute    pct   total     min     max     avg     95%  stddev  median
# ============ === ======= ======= ======= ======= ======= ======= =======
# Count          3 2650059
# Exec time     36  14136s     2ms      4s     5ms     9ms     5ms     5ms
# Lock time      4    173s    35us     4ms    65us    76us     8us    63us
# Rows sent     39   1.77G     716     716     716     716       0     716
# Rows examine  33   3.53G   1.40k   1.40k   1.40k   1.40k       0   1.40k
# Rows affecte   0       0       0       0       0       0       0       0
# Bytes sent    78 707.67G 280.01k 280.01k 280.01k 280.01k       0 280.01k
# Query size     3 187.02M      74      74      74      74       0      74
# String:
# Databases    DB
# Hosts        web1 (885378/33%)... 2 more
# Last errno   0
# Users        DBUser
# Query_time distribution
#   1us
#  10us
# 100us
#   1ms  ################################################################
#  10ms  ###
# 100ms  #
#    1s  #
#  10s+
# Tables
#    SHOW TABLE STATUS FROM `DB` LIKE 'menu_position_rules'
#    SHOW CREATE TABLE `DB`.`menu_position_rules`
# EXPLAIN /*!50100 PARTITIONS*/
SELECT * FROM menu_position_rules WHERE enabled = '1' ORDER BY weight, rid

Notes:
On this particular portal, there is plenty of writes, so query cache is hindering performance and it's disabled. This results into the fact that this particular query is executed always. Even if that is the case, the slowness comes mostly from the network data transfer, not the query execution within the server.

This is because it does select * every time, it will continue to degrade as more rules are going to be added in future. Out of the whole SQL network traffic generated by the portal, this query s responsible for 25-30% of it on average (or 707 GB for 4 days based on the aggregates). This is the most time-consuming query on the portal, utilizing ~36% of the total SQL server time spent on queries.

Proposed resolution

There are 2 directions that can be taken on:
1. Offload DB from the query by adding persistent cache around it. This should be relatively fast and easy. Moving the network delays from the DB to the cache layer. There is one cave-at there will be 1 big cache item that might just move the bottleneck to the cache and nothing more. It would be best to have the rules granulated and have many cache items with small sizes (if possible). As this cache is based on the list of rules, any rule that is changed, deleted or updated should invalidate the whole cache data, so cache consistency is left-out to the CRUD methods menu_position_(add|edit|delete)_rule methods. Cache is cleared also when rules are reordered.

2. Optimize the query itself. We should aim in 2 directions here. First reduce columns requested, as * is really non-scalable for long. Second, we are filtering based on the ENABLED flag. Can we have some more filters added, so we can reduce the result set rows count? (This is implemented in comment 6).

Note that 1 and 2 are not mutually exclusive, on the contrary, they supplement each other.

Remaining tasks

Discussion, Patch, Reviews, RTBC, Commit.

Patch in comment 9 has points 1 and 2 implemented. With functionality related to point 2 on my workload I am experiencing 10-20% result-set size reduction, depending on page context values.

User interface changes

New administrative setting option to handle enable/disable of the persistent cache logic.

API changes

Reamed the old parameter to reflect it's core feature - static caching (use_static_cache). Affecting only the function itself.

API addition - added one more optional parameter (last) to menu_position_evaluate_rules method - use_cache defaulting to TRUE. It is ignored if the setting value is disabled.

Data model changes

None.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

ndobromirov created an issue. See original summary.

ndobromirov’s picture

Issue summary: View changes

Updated the listing with top 10 slowest queries to have a comparison.

ndobromirov’s picture

As a first step, it should be pretty safe to change

SELECT * FROM menu_position_rules WHERE enabled = '1' ORDER BY weight, rid

to

SELECT rid, conditions, menu_name, plid, mlid FROM menu_position_rules WHERE enabled = '1' ORDER BY weight, rid

excluding the columns admin_title, weight and enabled:
1. The admin_title is not used in this part of the code and it's more UI related.
2. We know that all of them are enabled, so this can be safely ignored from the result-set.
3. We use the weight for ordering and that is already there. We do not need the weight value.

Any thoughts on this?

ndobromirov’s picture

Can we use the

// Retrieve the list of menus the path is already in.
$menu_names = db_query('SELECT menu_name FROM {menu_links} WHERE link_path = :path', array(':path' => $context['path']))->fetchCol();

to additionally filter-out the menu position rules with a menu_name not in (...) condition?
Only if no one implements the alter hook, then no one can change the matches to true, so it is meaningless to select this records either way. At least in theory.

I ca make a POC patch on this tomorrow including the stuff from the previous comment.

ndobromirov’s picture

Can we check how many modules on drupal.org are actually implementing that hook?
Do you have any stats on this?

ndobromirov’s picture

Status: Active » Needs review
FileSize
2.05 KB

Here is attempt at the patch and let's see what the test-bot will say :). It includes the changes described above + some more:
1. Reduced columns (this might need a revert due to the tokens feature).
2. Attempt at additional filtering of the result-set.
3. Small performance micro-optimizations.
4. Changed db_query to db_select.

Does it sound meaningful to have a persistent cache layer around this function (similar to the static one) that is already in-place, so we can spare the 2 DB calls altogether with 1 cache_get (on warm caches)?

ndobromirov’s picture

Version: 7.x-2.x-dev » 7.x-1.x-dev

Moved the issue to 1.x, as it was assigned to 2.x incorrectly.

Based on calculations and some tests the query result size is 8-10% smaller because of 1. In my case this will result in 15-18GB less network traffic per day. I am not able to test 2 at the moment.

ndobromirov’s picture

Added a persistent cache layer to fully offload the DB from the requested menu_position_rules data. The data stored in the cache is only the activated rules for a given context, so it should not tax the cache layer that much also.

ndobromirov’s picture

Updated solution description.
Added some fixes to the cache layer implementation.
Added new optional parameter to the menu_position_evaluate_rules method that will handle the persistent cache logic.
Tweaks in comments.
Performed tests on the filtration logic from the first patch. It should reduces the result-set by 1-10% depending on the current menu in my case. This fully depends on how many menu's there are in the system and how the rules are associated with them. It might trigger a bigger gain in other use-cases than mine.

ndobromirov’s picture

Issue summary: View changes

Updated issue description.
Added totals from the slow log query statistics to have a better comparison.
Improved wording on the problem, proposed solution and remaining tasks sections.
Provided some metrics that are observed after the changes.

BarisW’s picture

Hi Nikolay,

thanks for the patch and the perfect issue summary. Would you please run a test on a smaller site as well? To see if your solution doesn't slow down normal websites?
Im happy to commit this but I need others to test it as well. So let's wait 'till this is RTBC and to have some more stats.

ndobromirov’s picture

Hi,

This weekend I plan to prepare a test setup with a clean d7 based on standard profile + some common modules (ctools, views etc.). Validate the patch against that and provide some XHProf stats from there.

Some notes for the change:
1. module_implements("menu_position_rule_alter") - This should be statically cached in Drupal already (if not - we are warming the cache for the hook 5-10 lines below), so it's negligible overhead .
2. The rest is just making the SQL query lighter by reducing transferred payload, so it should only add improvements, It does not matter the scale.
3. The db_query changed to db_select - adds a negligible overhead for the query preparation itself.
4. The cache key implementation change is negligible overhead.

For cold caches - there is overhead of the cache layer -> cache get (miss) + db query.

For warm caches (usual case for the requests, as rules are only added occasionally):
- There is only one cache-get (taking only the rules for the context, not all of them).
- The data is un-serialized once, not per rule, so generally should be faster (less function calls).
- We cache only the rules that were activated for a context, so we are doing less loops and checks for triggering those rules.

So as I see it we are trading speed for cache memory and it should work consistently no-matter the scale.

Note that this will create cache entry per context (with many items this might pose issues for people that are constrained for memory).
Four our environment this should tax about 200 MB cache memory for 200k pages (but we have memcache cluster memory to spare in the GB range).

As this might not be wanted behavior for some, we can add configuration to enable/disable the persistent cache through a settings key or a check-box in the config page. This settings will just control (override) the $use_cache parameter internally, ignoring any input, based on configuration's value.

$conf['menu_position_use_persistent_rules_cache'] = TRUE; // Defaults to FALSE.
ndobromirov’s picture

FileSize
2.37 KB
7.55 KB

The following patch adds the configuration to enable/disable the persistent cache mentioned in #12.

It is disabled by default, so when testing the patch either add the following to the settings.php $conf['menu_position_rules_cache_enabled'] = 1; or configure the setting to Enabled from admin/structure/menu-position/settings.

This way, when the patch is applied, it will only improve the SQL query (by reducing the payload), making the whole patch safer.

ndobromirov’s picture

FileSize
7.89 KB
1.17 KB

I've done some profiling on a standard D7 with with 4 menu items and for each of them there is 1 rule.
I've then created a node that will be selected by a single rule and profiled against it's page with:

ab -n 1000 -c 1 URL

Page cache is disabled.
The test is with persistent cache enabled (see #13).
Using memcache for the cache API back-end.

Tests are performed against the latest version of the patch attached here (in #14).

Results on my laptop (Ubuntu, i7, SSD):
1. Without patch:

Concurrency Level:      1
Time taken for tests:   40.789 seconds
Complete requests:      1000
Failed requests:        0
Total transferred:      10030000 bytes
HTML transferred:       9523000 bytes
Requests per second:    24.52 [#/sec] (mean)
Time per request:       40.789 [ms] (mean)
Time per request:       40.789 [ms] (mean, across all concurrent requests)
Transfer rate:          240.13 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:    36   41   1.1     41      47
Waiting:       35   39   1.0     39      44
Total:         36   41   1.1     41      47

2. With patch:

Concurrency Level:      1
Time taken for tests:   40.258 seconds
Complete requests:      1000
Failed requests:        0
Total transferred:      10030000 bytes
HTML transferred:       9523000 bytes
Requests per second:    24.84 [#/sec] (mean)
Time per request:       40.258 [ms] (mean)
Time per request:       40.258 [ms] (mean, across all concurrent requests)
Transfer rate:          243.30 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:    37   40   1.6     40      47
Waiting:       35   38   1.5     38      44
Total:         37   40   1.6     40      47

There is small, but consistent improvement of about ~0.5 ms (~1%) in terms of total page speed on this small scale.
In the current test setup XHprof did'n't show any meaningful difference in memory usage.

ndobromirov’s picture

Note that within this test everything was on the local machine, so network delays were close to none. When rules data increases and DB and web servers are separated (very likely) the difference will increase.

BarisW’s picture

Status: Needs review » Reviewed & tested by the community

Thanks for the stats. Looks good to me. I'll commit this to 7.x-1.x-dev and test it some more on several sites.
If all keeps working fine, I'll create a new release.

BarisW’s picture

Version: 7.x-1.x-dev » 8.x-1.x-dev
Category: Bug report » Task
Priority: Major » Normal
Status: Reviewed & tested by the community » Patch (to be ported)

Committed. Thanks again.

Can we have this logic in D8 too?

ndobromirov’s picture

It should be possible.
I will port it when I manage to find some free time :).
One more reason to get more familiar with D8.

ndobromirov’s picture

Issue summary: View changes

Updated issue description to reflect the latest state of the patch.

KlemenDEV’s picture

Any updates on if this could be ported to D8?

joelpittet’s picture

Status: Patch (to be ported) » Closed (outdated)

When someone runs into this problem again we can look at it in D10. I'll close this for now.

Thanks for all your work on the D7 version