Problem/Motivation
Postponed on #3346765: Add PerformanceTestBase for allowing browser performance assertions within FunctionalJavaScriptTests and #3352389: Add open-telemetry/sdk and open-telemetry/exporter-otlp as dev dependencies.
See #638078: Automated performance testing for core for rationale.
We're not able to easily see changes on core backend and frontend performance over time, this MR adds the capability to display graphs in a grafana/tempo instance as well as browse individual traces to see what's different.
Steps to reproduce
Proposed resolution


#3346765: Add PerformanceTestBase for allowing browser performance assertions within FunctionalJavaScriptTests gives us the ability to pass or fail tests based on certain performance characteristics, but this only works for things you can count.
For things that can't be counted but only timed (like how long a request takes from first byte to first paint), we could instead look at graphing these over time and flagging changes.
Open Telemetry is a suite of tools that will allow us to do this via adding some development dependencies, and setting up a 'collector' on Drupal.org infrastructure.
We can add the capability to PerformanceTestBase to send traces to OpenTelemetry, these can then be made available for browsing in signoz. signoz will show both trends over time and allow you to drill down into individual traces. See the screenshot for a proof of concept.
Traces are currently using data from the performance navigation API: https://developer.mozilla.org/en-US/docs/Web/API/PerformanceNavigationTi... via executeScript() - this lets us log this information without running any extra JavaScript on the tested site.
Note that the MR doesn't include the OpenTelemetry collector or the configuration to enable this, that will have to be handled on the infrastructure side. However there is a test repo using this patch that gives you a full end-to-end installation via ddev. https://github.com/tag1consulting/google-drupal
Remaining tasks
Once we have the initial framework for sending traces, we can add more tests, and more data to traces.
#3377660: Add authenticated user telemetry tests
#3377657: Add database query spans to otel traces
#3377654: Support 'interaction to next paint' (or close equivalent) in performance testing
#3377655: Add script, style, image HTTP requests to otel traces
#3379761: Add the commit hash to OpenTelemetry traces
#3379757: Track largestContentfulPaint::candidate events in PerformanceTestBase and allow assertions on them
#3379750: Figure out cold start issue with chromedriver performance logs
User interface changes
API changes
A new PerformanceTestTrait is added, it is included in the existing PerformanceTestBase.
A PerformanceData value object has been added in the Drupal\Tests namespace, this is only used by PerformanceTestTrait.
To get performance data to assert on within a test:
Before:
$this->drupalGet('node/1');
$this->assertSame(2, $this->styleSheetCount);
$this->assertSame(2, $this->scriptCount);
After, to record performance data to assert on, explicitly enable profiling via calling the 'collectPerformanceData()' method, which takes a callable.
$performance_data = $this->collectPerformanceData(function () {
$this->drupalGet('node/1');
});
$this->assertSame(2, $performance_data->getStylesheetCount());
$this->assertSame(1, $performance_data->getScriptCount());
The former behaviour of always recording performance data and adding information on PerformanceTestBase properties is removed, but is not in a tagged release yet, so no bc is provided, just existing tests updated:
To log telemetry to open telemetry, call the new ::logTelemetry() method, API is otherwise the same as ::collectPerformancedata(), so it's possible to combine both telemetry logging but also assert on PerformanceData from the same request.
$this->logTelemetry('umamiFrontPageWarmCache', function () {
$this->drupalGet('<front>');
});
Data model changes
Release notes snippet
Tests extending PerformanceTestBase can now additionally send OpenTelemetry traces to an open telemetry endpoint, but setting the OTEL_COLLECTORenvironment variable. An OpenTelemetry collecter must be accessible from the environment running the test.
| Comment | File | Size | Author |
|---|---|---|---|
| #92 | 100x_PerformanceTest.patch | 2.01 KB | spokje |
| #85 | 3352459-followup.patch | 915 bytes | catch |
| #60 | Screenshot 2023-09-18 at 2.28.37 PM.png | 498.62 KB | smustgrave |
| #54 | 3352459-nr-bot.txt | 90 bytes | needs-review-queue-bot |
| #52 | 3352459-nr-bot.txt | 90 bytes | needs-review-queue-bot |
Issue fork drupal-3352459
Show commands
Start within a Git clone of the project using the version control instructions.
Or, if you do not have SSH keys set up on git.drupalcode.org:
- 3352459-trait
changes, plain diff MR !4900
- 3352459-log-traces-from
changes, plain diff MR !4226
Comments
Comment #2
catchComment #3
catchComment #4
catchCouple more lines to bring in from the other issue.
Comment #5
catchRe-rolled based on the latest changes in #3346765: Add PerformanceTestBase for allowing browser performance assertions within FunctionalJavaScriptTests.
Comment #6
catchExtra cleanup.
Comment #7
andypost$_ENV['OTEL_COLLECTOR']needs some request object)Comment #8
catchStill need to fix #7, couple more changes now this is based on the base class from the parent issue.
Comment #9
catchHad a quick look at getting environment variables in Symfony and they have a whole separate API for it, couldn't see a simple wrapper like the request object unless I'm missing something obvious.
New patch which abandons first content paint and largest contentful paint for now since they don't seem to be reliably available in the chromedriver log, and just does time to first byte instead.
Comment #10
catchNeed to set span.kind to server for it to show up in the Jaeger monitoring tab.
Comment #11
catchAdding an extra request and renaming the anonymous test so we can log/graph node/1 too.
Comment #12
catchUsing the performance timing API now - seems more reliable.
Comment #13
catchI've updated the issue summary to show a screenshot of how this looks in Jaeger/Prometheus and a bit more information about the implementation.
Comment #14
catchComment #16
catchComment #17
catchComment #18
catch#3346765: Add PerformanceTestBase for allowing browser performance assertions within FunctionalJavaScriptTests landed so this now just needs #3352389: Add open-telemetry/sdk and open-telemetry/exporter-otlp as dev dependencies.
Comment #19
catchRe-rolled now that #3346765: Add PerformanceTestBase for allowing browser performance assertions within FunctionalJavaScriptTests is in.
Comment #20
wim leersExciting!
#3352389: Add open-telemetry/sdk and open-telemetry/exporter-otlp as dev dependencies landed so updating title.
Is this ready for review?
Comment #21
catchCould definitely use high-level review!
The best way to try this out is to use the repo at https://github.com/tag1consulting/google-drupal, should be just a couple of steps if you already have ddev, or a handful of steps if you don't. That way you get a local instance and can see what it looks like - very sparse at the moment but shows where things could go.
As it currently stands there is no public opentelemetry instance for us to log to, but I think the dependency issue landing and this one getting some review would be enough to make it worthwhile setting one up (although possibly not jaeger/prometheus - have been discussing using https://signoz.io/ with @nnewton but that is still @todo and TBD). Until we have a telemetry instance and gitlab CI setting the environment variables, this will be dead code in core and only useful for local development. So there is a bit of chicken and egg there.
Comment #22
needs-review-queue-bot commentedThe Needs Review Queue Bot tested this issue. It fails the Drupal core commit checks. Therefore, this issue status is now "Needs work".
This does not mean that the patch needs to be re-rolled or the MR rebased. Read the Issue Summary, the issue tags and the latest discussion here to determine what needs to be done.
Consult the Drupal Contributor Guide to find step-by-step guides for working with issues.
Comment #23
catchSwitching to an MR now the two dependencies have landed.
Comment #25
catchThe testing repo needs updating for the latest core commits again so best to hold off trying to test that for now.[edit, this needs #3352389-34: Add open-telemetry/sdk and open-telemetry/exporter-otlp as dev dependencies to land]. [edit again: now fixed in the testing repo] Patch should be the same so leaving at needs review.Comment #26
heddnI think we have a solution for LCP mentioned in the IS. Maybe we should update that. The actual code here looks fine and has gone through several rounds of reviews already.
Comment #27
catchI sort of found a way to get Largest Contentful Paint originally, but also it was unreliable due to the way chrome logging works, so that is still TBD. The code for that (as far as I got) is in #3352459-8: Add OpenTelemetry Application Performance Monitoring to core performance tests.
The problem is that the chrome performance log can only get you Largest Contentful Paint candidates and it's up to interpretation which one is the actual Largest Contentful Paint (it should be the last one for any particular request generally).
What I was finding was that the first request in a test would not produce anything for LCP (i.e. there would be no candidates at all), but then the second or third request would, which suggests some kind of warm up or race condition issue with the chrome performance log or some artifact of retrieving the log programmatically in phpunit vs. looking at it in dev tools.
This is also an issue with the performance typing API which I switched to in #3352459-12: Add OpenTelemetry Application Performance Monitoring to core performance tests - i.e. it's better for everything else but still unreliable for LCP.
Comment #28
catchComment #29
catchComment #30
catchComment #31
catchHave done a bit more work on the dashboard, switching from Jaeger + Prometheus standalone, to Prometheus + Grafana + Grafana Tempo. Grafana gives us a lot more options to display data, and Grafana Tempo handles traces. This is still early days but you can see the possibilities.
I've also updated the MR to handle largest contentful paint again, after figuring out the weird things going on in the chrome performance logs (see inline comments). The dashboard is now able to expose the LCP issue in Umami from #3349447: Improve Largest Contentful Paint in Umami front page on the front page - compare the timings for to node/1.
Comment #32
catchComment #33
catchComment #34
catchComment #35
catchI've posted a chromium bug report here: https://bugs.chromium.org/p/chromium/issues/detail?id=1463436
There's two further ways to get the LCP event:
1. embed JavaScript on the tested site and log to opentelemetry from the tested site instead of the test. i.e. we'd need an opentelemetry module which we install during tests. This is how you'd do it for a production site because of course you don't control the browser in that case.
I started doing this with chromedriver instead because 1. I didn't think it would be this hard ;) 2. there's some data from the performance log we can assert on, so it's useful not only for opentelemetry 3. it leaves the tested site 'clean' without additional JavaScript - for example we have tests that ensure there is no anonymous js at all in the standard profile, once we add js we can't test the non-js state.
2. Inject some js via chromedriver, with an event listener for performance events into the browser instance, have that listen to the various events, using its own timer (so ignoring the timestamps of the events), then collecting that log in the test. but... no idea if that's possible with the behat/mink APIs.
We'll need to run some PHP on the tested site to collect things like database queries etc. but that is completely unavoidable (and it'll be easier to account for than having js vs. none at all). But.. we could switch to a more 'production' APM set-up with the trade-offs acknowledged if we need to.
Comment #36
catchAnother dashboard update, but getting much closer to something actually useful now.
Comment #37
catchThe dashboard/tests are now at the point where you can visually see the improvement from #3349447: Improve Largest Contentful Paint in Umami front page in the graphs on the OpenTelemetry side (which is the repo in the issue summary).
The green dots are LCP, the blue dots are FCP, the yellow dots are TTFB.
After the patch is applied, the green dots and blue dots merge - this means instead of two Largest Contentful Paint events, there is only one, which happens to be the same paint as the First Contentful Paint. Can also see the timings go down by 150-200ms.
I'll try to reproduce this one more time and post some better screenshots including traces next week.
Comment #38
catchComment #39
catchComment #41
heddnRan through the setup instructions and got this running locally. The prometheus/grafana/tempo UI looks really snazzy. There still a few nits on the MR, but once those are addressed, I don't have any objects to this. LGTM.
Comment #42
heddnI was looking through what we log to otel today and wondered if there was any way to get the ttfb, lcp and fcp numbers returned from PerformanceTestBase after calling
drupalGet(). In addition to logging to otel, we could also write asserts that one of these stats is less then X seconds? I can see where that would be a bad thing generally speaking because of slow testbots or other one-off issues. So not adding the feature is probably a good idea. But just raising it for completeness so we can quickly reject it.Comment #43
andypostComment #44
catchIt's definitely possible, the problem is once we have random failures no-one trusts anything, which is already a problem with normal js testing.
I had to split #3361833: Add image request counts to PerformanceTestBase out of the PerformanceTestBase issue since you sometimes get 7 or 8 images loaded between requests, which I assume is something slightly non-deterministic about how chrome calculates whether an image needs to be lazy loaded or not. Haven't gone back to that one recently.
Having said that, with Umami the FCP and LCP are exactly the same event after #3349447: Improve Largest Contentful Paint in Umami front page, and this is probably already the case for the standard profile. So... I wonder if we could use assertIdentical()? That would only catch a certain class of issues but it'd be more than nothing. We'd probably need to do the 'run the test 10,000 times' thing to make they really are identical and there's aren't occasional requests where chrome does something different.
On actually running these tests, my proposal is we run them sort-of twice:
1. On normal phpunit test runs, we don't set OTEL_COLLECTOR, so that collection doesn't happen. However running the test will make sure they don't get broken by other changes in core.
2. In a separate gitlab pipeline, we'd run the tests on a schedule against each branch (hourly + after each commit might be good), with no concurrency, so that the machine is doing one thing at a time. Hopefully that way we'll get more consistent results. However while this will hopefully help to avoid huge variation in the otel data, it won't help with consistency in timings for general test runs.
General note I looked into combining the test methods into a single method and clearing caches in the appropriate places etc. to try to reduce the number of times we need to re-install core, but that led to inconsistent results all over the place. So if we want to try to do that, it would be better done in a follow-up issue when we've got a consistent set of results to compare against. The chrome performance log is unfortunately not easy to work with like this.
I'll try to go through the MR nits soon. Also want to add a similar test case to the umami one for the standard profile. And maybe some logged in tests for both.
We need the grafana stack + gitlab pipeline set up before this can actually do anything publicly accessible (although that's part of the google project this is funded by), so while it would be great to have it committed, there are more steps to getting it going.
Comment #45
catchI've opened some follow-up issues postponed on this one for adding more information to traces. It would be possible to do here, but trying to keep this reviewable.
#3377660: Add authenticated user telemetry tests
#3377657: Add database query spans to otel traces
#3377654: Support 'interaction to next paint' (or close equivalent) in performance testing
#3377655: Add script, style, image HTTP requests to otel traces
Comment #46
catchComment #47
heddnI think we can do RTBC again.
Comment #48
wim leersLooking great, but I have a few questions, and am slightly worried about so many
@todos without follow-up issues. 😅Comment #49
catchI've added the following three follow-up issues as requested by Wim. Think I was able to resolve all the other threads too.
#3379761: Add the commit hash to OpenTelemetry traces
#3379757: Track largestContentfulPaint::candidate events in PerformanceTestBase and allow assertions on them
#3379750: Figure out cold start issue with chromedriver performance logs
Comment #50
andypost@catch is there a reason why https://git.drupalcode.org/project/drupal/-/tree/3352459-log-traces-from... branch exists in core repo?
Comment #51
catchNo good reason! Deleted!
Comment #52
needs-review-queue-bot commentedThe Needs Review Queue Bot tested this issue. It no longer applies to Drupal core. Therefore, this issue status is now "Needs work".
This does not mean that the patch needs to be re-rolled or the MR rebased. Read the Issue Summary, the issue tags and the latest discussion here to determine what needs to be done.
Consult the Drupal Contributor Guide to find step-by-step guides for working with issues.
Comment #53
catchComment #54
needs-review-queue-bot commentedThe Needs Review Queue Bot tested this issue. It no longer applies to Drupal core. Therefore, this issue status is now "Needs work".
This does not mean that the patch needs to be re-rolled or the MR rebased. Read the Issue Summary, the issue tags and the latest discussion here to determine what needs to be done.
Consult the Drupal Contributor Guide to find step-by-step guides for working with issues.
Comment #55
catchSorry bot but I don't believe you.
Comment #56
catchComment #57
mherchelIs this MR right? There's over 1000 changes... many of them inconsequential.
Comment #58
catchNot sure what happened to it, but did a rebase which seems to have cleaned up the MR.
Comment #59
smustgrave commentedThink it would be really neat to have this included and maybe incorporated into the review process. Not sure who to ping though for all those approvals haha
Comment #60
smustgrave commentedCloned the repo and setup locally, had to do some workarounds as chromedriver addon doesn't work on M1 Macbooks.
Was able to get running and verified the charts populating, very neat!
Don't think I can mark till release manager and framework managers sign off.
But question, with all the work of gitlab where would this fall into that workflow? Would/could it be a tab reviewers could see like the "test-only" tab being worked on?
Comment #61
catchSo the idea is that we'd configure a specific pipeline job that sets the environment variable with an open telemetry endpoint, and then run only --group OpenTelemetry tests (directly with phpunit, not run-tests.sh). This would run only on scheduled core pipelines, not MRs or commits. That way the data going into the graphs is from core branches and on a predictable schedule.
No because this only populates the grafana/tempo installation per the above screenshots, and it's specific for things that can vary slightly between tests like timings that we can't easily add assertions for. However, once we have a public grafana/tempo instance for gitlab ci to point to, we'd be able to have a public dashboard either at a URL like core-performance.drupal.org and/or embedded somewhere. Exactly where to put it is tbd. Here's Gutenburg's (very different) dashboard for comparison: https://www.codevitals.run/project/gutenberg
Comment #62
smustgrave commentedAh makes sense! +1 from me.
Comment #63
fabianx commentedTested locally!
Works really well.
For those that want to do that on a M1/M2 from the google-drupal repo, here are the steps:
The rest works normally.
Reviewed:
There had been 4 things I flagged:
- a) One nit / question
- b) A suggestion to add a proper API, so that object properties do not need to be used as API
- c) I read now that JS wants to be avoided to load on the page directly, so it's probably answered, but would still be good to make this clear. (as this design decision is only in the chromium bug report)
- d) It's unclear how spans are working in relation to scopes; might be good to add some comments to explain how that works
Overall looks really really good and once these questions are answered, we can go back to RTBC!
Comment #64
fabianx commentedRTBC!
Thanks, Nat! The new API looks splendid and works fantastically well locally.
Comment #65
catchAddressed @quietone's points on the MR (I think) - leaving RTBC since these were 100% comment changes.
Comment #66
catchRemoving the FEFM and release manager review tags, we'll need RM review if we add alerting or similar in (which would be on the grafana end), but just for having this in core and producing the dashboard I think the main thing is the implementation here.
Comment #68
catchI started to look at a middleware, but realised we'd need a static + getter and setter to set telemetry logging on and off, then another static + getter and setter to hold stylesheet/script counts so that the test can assert on them.
However, I then realised a trait should be possible without the above drawbacks, so I've pushed a new branch that starts work on this.
Originally rejected a trait here here because PerformanceTestBase needs to override setUp() etc. in a way that normal class inheritance can continue to function, but we can do that by still having PerformanceTestBase and it calling out to some helpers. Then a different test base would have to do the same thing but that's only a little bit of boilerplate not hundreds of lines of code.
@alexpott mentioned in slack that a middleware would make logging telemetry data from non-drupalGet code easier - I think we can achieve this with the trait via a similar trick to logTelemetryData() so have added ::collectPerformanceData($callable, ?PerformanceData $performance_data);
This will either allow for logging performance data from things like form submissions, link clicking or drupalPost() out of the box, or with very little modification in a follow-up. It also means that other requests (log ins, cache warming), don't bother looping over the chrome performance log to collect request counts etc.
However it's a large refactoring of the existing PerformanceTestBase class that's already in HEAD, and I don't want to get bogged down here, so unless the API change here helps to get this issue committed faster, we might want to move that to a follow-up? There's still time to change the API before 10.2.x alpha/beta is tagged but not that much, and I don't want to be then adding bc layers in this issue too.
Comment #69
bhanu951 commentedSeems a random test failure. Initiating retesting.
Comment #70
catchIf the new approach looks OK we should update https://www.drupal.org/node/3366904, none of this is in a tagged release yet so just updating the existing change record should be fine.
I am not 100% on the PerformanceData value object, it gets rid of the class properties and makes everything more explicit but it also makes the profile requests more verbose. Generally the performance requests being more verbose feels like it's probably fine, it's good to make that explicit rather than implicit to differentiate them from setup/warmup steps.
Comment #71
fabianx commentedRe-reviewed and the changes look good to me.
Originally I thought that the collectPerformanceData() could just return the $performance_data object, but as it also needs to return the result of the inner closure, passing it as an argument is the best solution indeed.
I agree that a trait is the best compromise between an inflexible class and a hard-to-register MiddleWare.
A MiddleWare can easily be created in the future to test normal page requests using the new trait.
Back to RTBC!
Comment #72
catchYeah I didn't have a better idea. I don't know when we ever actually use the return value of drupalGet(), especially in web driver tests, but if don't return the callable return, there's no way to do it.
I guess we could maybe do something like PerformanceData::getOriginalReturnValue() or something, and stick it in there, then we can get rid of the argument and have less boilerplate. That would be an easy change to make but will wait for feedback before doing it.
Comment #73
fabianx commentedOh - I love that idea of just storing the returnValue in the performanceData.
++ to less boilerplate.
both feels really good.
And given it's an edge case that probably is not needed in 90% of cases, it's good to optimize for the common case.
Comment #74
catchUpdated the MR for the change in #72.
Comment #75
fabianx commentedBack to RTBC! Changes look great to me!
Nice API!
Comment #76
catchI've updated the 'API changes' section here, this can become the basis for the updated CR after commit too.
Comment #77
catchComment #79
alexpottRemoving the needs framework manager review tag. I've reviewed the code extensively and @catch and I have discussed all the API surface and these discussions have lead to the introduction of the trait and use of privates and immutable value objects. I think we're at the point where we can put this in and be confident that future changes to the API will be small and we can always increase the surface area if needs be.
Comment #80
alexpottCommitted a8f52aa253a and pushed to 11.x. Thanks!
Comment #81
catchUpdated https://drupal.org/node/3366904.
Comment #82
catchLooks like the commit didn't go through.
Comment #84
alexpottPushed... sorry.
Comment #85
catchManaged to get a wrong array key in some of the refactoring. This code only runs when tracing is enabled which is not the case yet.
Comment #86
alexpottCommitted the follow-up - no point waiting on tests as tests aren't tested :)
Comment #88
andypostThere's failure in SQlite job - probably because for sqlite database the testing directory is kept
Comment #89
andypostRef https://git.drupalcode.org/project/drupal/-/jobs/142771
Comment #90
catchThat seems like it would be intermittent, the tests themselves don't do anything special.
Comment #91
spokjeComment #92
spokjeComment #93
alexpottSo what I think is happening here is that the compiled javascript is being written to the site while we're trying to remove the test directory :) and this is happening because we've enable js and css aggregation.
Comment #94
spokje@alexpott That seems to make sense (per usual).
Not sure why this only hits SQLite, but that seems to have a fail rate of ~45%.
Should we open a follow-up, or revert this and fix here?
Comment #95
catchI've opened #3392125: Random test failure in Drupal\Tests\demo_umami\FunctionalJavascript\PerformanceTest, if it's #93 there might be a quick fix.
Also I'm a bit surprised that it's hitting this test in particular, because it predates this issue, rather than the new tests that were added here, but maybe it's one of the logic changes somewhere.
Comment #96
catchHave a non-optimal but working patch on #3392125: Random test failure in Drupal\Tests\demo_umami\FunctionalJavascript\PerformanceTest so going to go ahead and move this back to fixed.
Comment #97
catchUpdate just to say there is now a working public dashboard at http://grafana.tag1demo.tag1.io/d/teMVIdjVz/umami?orgId=1&refresh=30s&fr... it is currently populated by pushing buttons on the MR from #3391689: Add a performance tests job to gitlab and send data to OpenTelemetry which could use review.
Comment #98
tyler36 commentedUnable to get the tests running with the demo repo
- Submit MR to fix composer setup
Now I'm getting errors:
Does the demo repo include the latest patch/MR? Are there other steps required?
Comment #99
catch@tyler36 open telemetry has a new release out so you may be running into an issue with that.
Comment #100
spokjeThere's at least one problem with the new release we know of: #3395453: Bump "open-telemetry/*" to latest to make daily "updated deps" QA run pass again
Comment #101
tyler36 commentedThe testing repo provided in the "steps to reproduce" didn't include a
composer.lockfile. So following the provided instruction resulted in the latest version.Comment #102
catchI've committed #3395453: Bump "open-telemetry/*" to latest to make daily "updated deps" QA run pass again so a composer update on the sandbox repo might fix the issue for you now.
Note that the repo also currently doesn't account for fixes in #3391689: Add a performance tests job to gitlab and send data to OpenTelemetry but those shouldn't (hopefully) block getting something up and running.
Comment #104
gábor hojtsyI think we agreed at 10.1 that we don't include this in the annoucement until there is something practical, and now there is I think so adding 10.2 highlights tag.