Problem/Motivation
Follow-up from #3496652: Allow nightly testing against development versions of dependencies, #3491405: Add performance testing, #3494505: Run performance tests in their own job etc.
The performance job currently allows failure, so a build won't fail if performance tests do.
I think this runs the risk of problems being introduced and then being hard to track down, so it would be good to have the job fail properly.
Generally, when performance tests fail due to a change (whether a contrib update or a recipe change), the only thing that needs to happen is to adjust the assertions to the new value. Usually one extra cache get, or query, or 1kb of CSS is fine. Having the test fail and be updated just means we know what's happening when it happens. If it's unexpected, can always open a follow-up to investigate - this is how we handle the majority of changes with core.
If something suddenly adds 20 database queries or 30 cache gets or 50kb of CSS, there's probably a problem though - and the only way to find this out is to have precise assertions in the first place.
One complication is when upcoming releases of contrib modules affect performance tests (something that doesn't happen with core). This should be somewhat mitigated by #3496652: Allow nightly testing against development versions of dependencies. If a contrib module adds one extra query in a dev release, a performance test can be adjusted to use assertGreaterThan() and assertLessThan() instead of assertSame() so it passes with both versions, with a follow-up to go back to assertSame() with the new number.
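The dev-release workaround can be sketched in plain PHP (illustrative only, not actual Drupal CMS test code; the real tests use PHPUnit's assertGreaterThan()/assertLessThan() inside the performance test classes):

```php
<?php
// Illustrative sketch: a range check that passes for both the tagged
// release (50 queries) and a dev release that adds one (51), standing in
// for temporarily replacing assertSame() with assertGreaterThan()/
// assertLessThan() bounds. Numbers are made up for the example.
function queryCountAccepted(int $count, int $min, int $max): bool {
  return $count >= $min && $count <= $max;
}

var_dump(queryCountAccepted(50, 50, 51)); // tagged release: bool(true)
var_dump(queryCountAccepted(51, 50, 51)); // dev release with one extra query: bool(true)
var_dump(queryCountAccepted(52, 50, 51)); // unexpected extra query: bool(false)
```

Once the new contrib version is tagged and required, the follow-up collapses the range back to a single assertSame() value.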
Comments
Comment #2
phenaproxima: I really don't wanna use the strict asserts on query counts, and I admit I don't understand the reasons why we're doing that. What's wrong with setting some upper bounds on queries?
Comment #3
catch: @phenaproxima So, for example, let's say hitting the front page with a cold render cache takes 50 queries. To avoid a strict query count, you do assertLessThanOrEqual(60, $query_count).
Issue A adds 10 extra queries, because it adds a poorly optimized block from a contrib module to the front page (or whatever).
Issue B adds 1 extra query because it installs redirect module, which does one extra database query on every page to look up the redirects.
Now issue B is responsible for upping the limit, even though it's 'innocent' and issue A went in undetected.
Either someone has to go through previous commits to find out what added the 10 queries in the meantime, which is much more time-consuming than bumping a digit on each issue that raises or lowers the query count one by one.
Or no-one does it and you end up at 130 queries in a year with no explanation as to why.
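The scenario above can be spelled out numerically (a toy sketch, not real test code; the baseline and ceiling are the made-up numbers from the comment):

```php
<?php
// Toy sketch: a fixed ceiling of 60 over a real baseline of 50 lets
// issue A's +10 regression land silently, and issue B's innocent +1
// query is what finally trips the assertion.
$ceiling = 60;
$baseline = 50;

$afterIssueA = $baseline + 10; // poorly optimised block added
var_dump($afterIssueA <= $ceiling); // bool(true): issue A goes in undetected

$afterIssueB = $afterIssueA + 1; // redirect module's per-page lookup
var_dump($afterIssueB <= $ceiling); // bool(false): issue B gets the blame
```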
Comment #4
phenaproxima: That makes sense. But I don't see how it prevents contributors (or hell, me) from mindlessly changing the query count in order to get tests passing in the name of delivering some feature or bug fix - which, I'm pretty sure, will always be the higher priority for Drupal CMS, given its mission and development philosophy. It's unrealistic to expect me, or any contributor, to do a performance deep dive when we add a module, or change some configuration, and the query count goes up (or down) by some amount. We'll just adjust the test and move on.
So I guess it's just hard to see the benefit of fine-grained performance testing when our product is a constellation of modules we don't really control, and our emphasis as a product is more on features than performance. In our particular situation, I fear it will be more of a barrier to contribution, and a nuisance, than a useful guardrail against suddenly introducing poor performance. What am I missing?
Your point about the dangers of upper bounds is well-taken, but I still worry that strict assertions are going to work against us from a velocity and contribution standpoint. So as a compromise...what if we adopted upper bounds, but made them fairly marginal? For example - what if we gave the query count a 5-query margin of error (or even less than that)? I'd like to find the right balance between having breathing room, and catching major performance regressions before they are introduced.
(Another thought occurs to me, which is more of a general performance testing thing and therefore not in scope - I don't know if number of queries is as useful a metric as time spent waiting for queries. Is there any way to measure the latter? If so, is that at all useful? After all, it would be entirely possible to stay within a certain number of queries while still running really shitty queries that are wasteful and slow.)
Comment #5
phenaproxima:
Comment #6
catch: Drupal CMS is intended to be used by small to medium-sized sites and organisations; this should mean:
1. Good front-end performance out of the box, without needing to write custom code or install a bunch of extra modules on top. I think #3493442: Klaro should be inactive until Vimeo or Youtube content is added, saving over 200kb of JavaScript and CSS for every site visitor out of the box, is a great example of this working. There is no point providing search engine optimisation recipes if Core Web Vitals scores are bad; it undoes all the effort.
2. A snappy admin experience. People evaluating Drupal CMS either via a demo or a local install will be put off if basic admin and editorial tasks mean waiting seconds for pages to load. #3493406: Add render caching for the navigation render array was found by the Drupal CMS tests because Navigation wasn't included in core's tests yet, and that's already fixed in 11.2 (I think we should still be able to backport it to 11.1).
3. It should be able to run on cheaper hosting plans. This doesn't mean having to scale to hundreds of thousands of entities out of the box, but it should mean using render caching effectively and things like that. Look at the price ranges on https://www.bluehost.com/wordpress/wordpress-hosting or https://www.dreamhost.com/wordpress/#wp-plans and compare the prices to Drupal hosts, or the fact those big hosting companies don't even offer Drupal-specific hosting plans (or not that I can find).
Being able to run on very low-cost hosting is to a large extent a Drupal core problem (which is being worked on). The recent installer/module install changes and memory-leak plugging in 11.1 are a good example of progress there. But it also relies on contrib/Drupal CMS not reversing gains made in core.
If module/recipe X is added or updated and it causes a regression in performance tests, there are two easy things that can be done:
1. Open a Drupal CMS follow-up to look into it more while going ahead for now.
2. Open an issue against the contrib module linking back to the performance test results, suggesting there might be a problem. #3491681: Klaro library is way too large is an example of this working (the issue existed two days before I would have opened it).
Both of these make it infinitely easier for either the contrib maintainer or someone like me to have a look later.
So there's another reason not to do this, but I forgot to mention it above. If you look at https://api.drupal.org/api/drupal/core%21profiles%21standard%21tests%21s... it includes the actual queries being executed. This means as soon as there's a change, you can see exactly why/what changed, without having to do any kind of additional profiling. This has saved a lot of time and messing about in core issues already (once we got it working). It's basically devel query log for tests. But for that to work, you need to assert on the actual queries.
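In essence, asserting on the query log means an ordinary array comparison, so the failure diff names the exact query that changed. A simplified sketch (the query strings are placeholders, and this is plain PHP rather than the actual test trait code):

```php
<?php
// Simplified sketch: comparing the recorded query log to an expected
// list, the way an assertSame() on two arrays would. When they differ,
// the diff pinpoints the exact new query - no extra profiling needed.
$expected = [
  'SELECT "cid", "data" FROM "cache_render" ...',
  'SELECT "name", "route" FROM "router" ...',
];
$actual = [
  'SELECT "cid", "data" FROM "cache_render" ...',
  'SELECT "name", "route" FROM "router" ...',
  'SELECT "rid" FROM "redirect" ...', // newly added by a module update
];

if ($actual !== $expected) {
  // Prints only the query that was added.
  print_r(array_values(array_diff($actual, $expected)));
}
```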
Overall, I think having exact query counts plus query-log assertions with allow-failure would be better than a required-to-pass job with fuzzy assertions.
There's an inherent problem adding phpunit test coverage for anything based on timings, because even running the same test on the same machine results in different times, let alone running it on different machines. So we've avoided any kind of time-based assertions because they can never be reliable enough to pick a pass/fail threshold and not cause problems in one or the other direction.
However, the grafana dashboard that you can get locally with ddev gander, and which we're hoping to set up for Drupal CMS, does include the timings: https://gander.tag1.io/explore?panes=%7B%22Tkt%22:%7B%22datasource%22:%2... (hopefully the link works)
If not, going to https://gander.tag1.io/?orgId=1&refresh=30s and clicking on an 'Umami front page cold cache' will show the same trace.
So it's easy to see the timings, it's just not a good idea to assert on them. There's also a secondary issue that almost any database query on a nearly-empty database is going to be fast, to flush out slow queries we'd need to test against a lot of content. I have thought about running EXPLAIN on every query and then failing if there's a filesort or similar, but not got anywhere near that yet.
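The EXPLAIN idea is purely speculative at this point, but as a sketch: given EXPLAIN output rows decoded into associative arrays (for example from a PDO fetch), a test could flag query plans that indicate slow operations. The helper name and the row shape here are hypothetical:

```php
<?php
// Speculative sketch of the EXPLAIN idea above (not implemented
// anywhere): scan MySQL EXPLAIN rows for markers of slow plans such as
// filesorts or temporary tables, and let a test fail on them.
function hasSlowPlan(array $explainRows): bool {
  foreach ($explainRows as $row) {
    $extra = $row['Extra'] ?? '';
    if (str_contains($extra, 'Using filesort') || str_contains($extra, 'Using temporary')) {
      return true;
    }
  }
  return false;
}

var_dump(hasSlowPlan([['Extra' => 'Using index']]));                 // bool(false)
var_dump(hasSlowPlan([['Extra' => 'Using where; Using filesort']])); // bool(true)
```

As noted, this would only be meaningful when run against a site with a lot of content, since almost any query on a nearly empty database is fast regardless of its plan.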
Comment #7
phenaproximaThat's all pretty persuasive. Thanks for taking the time to explain it.
Here's what I propose:
- Use assertLessThanOrEqual() assertions, but keep their current numbers. That makes it okay for there to suddenly be fewer queries, but you need to explicitly adjust the tests if more queries are added.
Comment #8
catch: This also causes problems, unfortunately. However, as I was writing up a long response I may have found a way to mitigate most of them:
The first problem:
- Let's say you use assertLessThanOrEqual(60). The actual query count is 60.
- One issue/contrib update removes a query, the assertion stays the same, the real query count goes down to 59, unknown to anyone.
- Another issue/contrib update adds a query, now you're back to 60 - but this additional query is due to a bug. Again it's not recorded.
- Another issue adds one more query, this one is necessary, but now you have to increase the query count to 61, but you didn't learn anything from the previous two issues.
The second problem: as discussed above, performance tests allow asserting on the exact queries being run. I didn't implement this in the Drupal CMS performance tests yet, mainly because #3494208: Get coffee data only when the search box is opened made that incredibly noisy (it's now fixed, but not in a tagged release yet).
However, once you add assertions on the actual queries, which is ideal for debugging and understanding changes, you can't use assertLessThan() any more, because you're asserting that two arrays are identical.
Maybe, though, instead of asserting the exact queries we could use array_intersect()/array_diff() or something similar, so that it's fine if queries are removed from the expected list, but not if new ones are added.
This would solve the inability to assert on exact queries with some degree of fuzziness, it would make it easier for tests to take into account the canary job (can add a query without it failing on the non canary job), and it would prevent new queries from slipping through even if the overall count is way under.
The only thing that leaves then is that eventually there will be crufty queries in the assertions that never run, but it's easy to adjust the test locally to exact assertions, delete a load of stuff from the array/lower thresholds, then change the assertions back and push an MR to clean things up.
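The one-directional array_diff() idea could look something like this (a hypothetical helper, not the eventual Drupal CMS implementation; query strings are placeholders):

```php
<?php
// Hypothetical sketch of the mitigation described above: queries removed
// from the expected list are tolerated, newly added queries are not.
function findUnexpectedQueries(array $expected, array $actual): array {
  // Queries present in the run but absent from the expected list.
  return array_values(array_diff($actual, $expected));
}

$expected = ['SELECT a', 'SELECT b', 'SELECT c'];

// A query was removed upstream: no failure.
assert(findUnexpectedQueries($expected, ['SELECT a', 'SELECT c']) === []);

// A query was added: the diff names it, so the test can fail with detail
// even when the overall count is still under any threshold.
assert(findUnexpectedQueries($expected, ['SELECT a', 'SELECT b', 'SELECT d']) === ['SELECT d']);
echo "ok\n";
```

A test would assert that the returned array is empty, while periodically pruning crufty never-run queries from the expected list as described above.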
We can switch to assertLessThan() before #3494208: Get coffee data only when the search box is opened is fully resolved in Drupal CMS, but sorting out the specific query assertions really needs it, IMO, to avoid a lot of noise/cruft that would immediately need to be removed.
Comment #10
catch: Worked on this a bit:
1. A couple of assertions are failing on HEAD already, updated those here so they pass.
2. Added specific database queries along with an assertLessThanOrEqual() assertion as described above. This should still find new queries without being as noisy when they're removed.
3. Added @todos on 11.2 to improve the specific assertions further.
4. Removed allow_failure: true from the performance test job, although it's easy to revert that and move it to another follow-up if you want to get the test improvements in first.
I haven't made a 1.0.x MR yet, but I'm happy to do so once the overall approach here has had review. Or I could do the backport after initial commit, or alternatively not backport and let this settle in a bit until it becomes the release branch.
Comment #11
phenaproxima: I'm comfortable with this change. I think it's good to have some solid guard rails against sudden performance problems, and I've seen enough consistency from the performance test since it was introduced to convince me that it will genuinely catch performance regressions and not become a nuisance that slows development down.
Comment #13
phenaproxima: Merged into 1.x. Thanks!
I'm also not sure about backporting, so leaving this open to continue considering that.
Comment #14
phenaproxima: I think this is as far as we're gonna take this. In 2.x, I have turned the performance test job into a manually triggered job (which also supports scheduled runs), because the performance tests have proven to be somewhat flaky and constantly need tweaking and maintenance as our upstream dependencies change. The effort of keeping them up to date is not worth it, and is slowing development down.
So although I do think there is value in the performance tests, and that we should keep them and their infrastructure, I don't think they should be part of our day-to-day CI runs, and certainly not something that should be commit-blocking.