Problem/Motivation
#1014086: Stampedes and cold cache performance issues with css/js aggregation changes aggregate URLs to a hash + query string with theme, delta, language and libraries.
The libraries list is the 'minimum representative set' (i.e. leaves of the dependency tree that are not dependents of anything else), but can still get quite long. user/1 visiting the front page of the standard profile ends up with query strings nearly 600 chars long. That gives us some headroom before it reaches 1,000, but it's not indefinite.
There is no RFC limit on URL or query string length, and we no longer support old versions of IE that used to have an arbitrary limit of around 2,000, but it's still possible for servers to set a maximum length (for example see discussion in https://stackoverflow.com/questions/812925/what-is-the-maximum-possible-...).
The most likely way someone would run into this is a site with 300 modules on quite a restrictive hosting platform, which is entirely possible, even if it'll be relatively uncommon.
@alexpott suggested compressing the query string if it's going to go over 950 characters.
Steps to reproduce
Proposed resolution
There are a couple of ways we could do this:
1. In the original issue, before we had the 'minimum representative set' option for libraries, #1014086-100: Stampedes and cold cache performance issues with css/js aggregation suggested keeping a lookup table, using a base64 increment or similar, so that each library could be represented by just one or two characters in the query string. We'd have to build that lookup table on library discovery and consult it both when building URLs and when building aggregates. Somewhat complex to implement, but it would be extremely efficient for the actual URL length, which would end up looking something like A%2c0%2cf%2cbA
instead of drupal/once%2cdrupal/backbone%2cdrupal/ajax
2. @alexpott suggested using gzcompress(), i.e.:
strlen(base64_encode(gzcompress("contextual/drupal.contextual-links%2Csystem/base%2Colivero/global-styling%2Ccore/drupal.active-link%2Colivero/powered-by-block%2Colivero/feed /drupal.debounce%2Ctoolbar/toolbar%2Cuser/drupal.user.icons%2Ccore/shepherd%2Ctour/tour-styling%2Ctour/tour%2Ccore/drupal.tabbingmanager%2Ccontextual/drupal.contextual-tool
This reduces 544 characters => 324 characters.
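For illustration, option 2's round trip can be sketched in a few lines. This is a hedged sketch, not the patch's code; the library list below is a shortened stand-in for the full string quoted above:

```php
<?php

// Option 2 sketch: compress the comma-separated library list and
// base64-encode it so it is safe to carry in a query string.
$libraries = 'contextual/drupal.contextual-links,system/base,olivero/global-styling,'
  . 'core/drupal.active-link,olivero/powered-by-block,toolbar/toolbar,'
  . 'user/drupal.user.icons,core/drupal.tabbingmanager';

$compressed = base64_encode(gzcompress($libraries));

// Reversing the steps gives back the original list.
assert(gzuncompress(base64_decode($compressed)) === $libraries);
printf("%d => %d characters\n", strlen($libraries), strlen($compressed));
```

The repetition in library names ("drupal.", "olivero/", etc.) is what makes zlib effective on such short strings.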
I think we should go with #2, at least until there's a compelling reason to go with #1, because #2 will be trivial to implement, and #1 will be hard.
A further complication is we've been hoping to use the URL information for ajaxPageState: #3279206: Dynamically determine ajaxPageState based on libraries. I guess if we're only ever going to pass that list back to PHP at some point, maybe it's fine to just pass the encoded list around? But is it that simple?
Remaining tasks
User interface changes
API changes
Data model changes
Release notes snippet
Comment | File | Size | Author |
---|---|---|---|
#42 | 3303067-42.patch | 13.94 KB | catch |
#42 | 3303067-42-interdiff.txt | 722 bytes | catch |
#38 | 3303067-39.patch | 13.93 KB | catch |
#38 | interdiff-39.txt | 1.97 KB | catch |
#37 | 3303067-37.patch | 14.94 KB | catch |
Comments
Comment #2
nod_: We can also add a new data attribute on the script tags that holds this information in clear format, so we don't need to mess around with URL parsing.
Comment #3
nod_: I think it'd be that simple. We don't use it in the frontend; it's only sent back to PHP.
Comment #4
catch: Took a look at this.
I started off trying to implement compression only when the list was going to be over 950 chars, but then wondered if we really need the two code-paths:
We need to deal with both include and exclude, which increases the risk of the query string getting very long, since any limit will apply to the entire string rather than to each part. Compressing only above 475 chars per parameter is quite a low threshold, and if we start checking whether both strings are present etc., that seems a bit overkill.
The main reason to conditionally compress would be to save some CPU cycles, and maybe to make debugging easier. However, this is all behind caching, and if we move some code around, we do the compression once per asset type, not once per asset group. I'm also not sure two different formats help debugging that much over a shorter code path.
Additionally, given the query string is on every asset URL, compressing is going to save some bytes on the HTML page itself: say we've got a 550-char string that's on four JS assets and four CSS assets, that's 2,200 characters, and compression gets that down closer to 1,000.
This has all given me another idea too, but uploading a theoretically working patch before trying that.
Comment #5
catch: Alright, the more comprehensive version seems like it might be a goer. No interdiff because it's more or less an entirely new patch, but here's what's different.
The one new constraint is that this now needs to handle a nested array, so we can't use implode()/explode(). We also can't use PHP's serialize()/unserialize() because the value is user data, so we use json_encode()/json_decode() instead.
The 'include' and 'exclude' query parameters potentially have a lot of strings in common; since we might have to compress both, we might as well compress them into a single string. Compressing one slightly larger string should also be more CPU-efficient than compressing two smaller ones.
We can also include the theme in there, since that's common to every URL and similarly may contain duplicate strings (i.e. 'olivero' provides a couple of 'olivero/foo' libraries).
We also drop the separate 'include/exclude/theme' query argument keys, and chuck all this in a single 'data' key, which doesn't save a lot but slightly offsets the extra characters from using json_encode().
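The single-'data'-key idea can be sketched like this (a hedged illustration: the key names and values are made up for the example, not the patch's exact format):

```php
<?php

// Sketch of bundling theme + include + exclude into one compressed 'data'
// query parameter. serialize() is avoided because the value round-trips
// through user-controllable input; json_encode()/json_decode() are safe
// and handle the nested arrays that implode()/explode() can't.
$data = [
  'theme' => 'olivero',
  'include' => ['core/once', 'olivero/global-styling'],
  'exclude' => ['core/drupal.ajax'],
];

$query_value = base64_encode(gzcompress(json_encode($data)));
$restored = json_decode(gzuncompress(base64_decode($query_value)), TRUE);
assert($restored === $data);
```

Compressing everything together lets zlib exploit the strings shared between 'include', 'exclude' and the theme name.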
Comparisons of a real example (front page of a 10.1.x standard install as user/1 again):
HEAD: 638 chars
Only compressing 'include', and using implode(): 438 chars
Newest approach: 477 chars
So this results in a slightly longer compressed string, but it's more robust against a string with both include and exclude query parameters.
One more possible approach here, but again going to upload progress.
Comment #6
catch: One more iteration.
Instead of using json_encode()/json_decode(), have the new UrlHelper methods only accept a string.
However, we still compress all the bits of the URL we can, by custom-encoding the array: 437 chars.
This results in a URL literally one character shorter than when we only encode 'include', but it'll be more robust when there's a list of libraries in 'exclude' and cheaper to generate. If someone wanted to use the compress/uncompress functions with json_encode()/json_decode() they still can - just call it on an array before passing in the string.
Part of the reason I wanted to try a slightly more generic approach is because there's a possibility we can use this for #956186: Allow AJAX to use GET requests too.
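The string-only helper approach described in this comment might look roughly like the following sketch (the class is a stand-in and the method names follow the discussion, not necessarily the committed core code):

```php
<?php

// String-in/string-out helpers, roughly as discussed: callers that need
// to compress an array can json_encode() it first themselves.
final class UrlHelperSketch {

  /**
   * Compresses a string for use as a query parameter value.
   */
  public static function compressQueryParameter(string $data): string {
    return base64_encode(gzcompress($data));
  }

  /**
   * Reverses ::compressQueryParameter().
   */
  public static function uncompressQueryParameter(string $compressed): string|false {
    $decoded = base64_decode($compressed, TRUE);
    return $decoded === FALSE ? FALSE : gzuncompress($decoded);
  }

}

$libraries = 'system/base,olivero/global-styling,core/drupal.active-link';
$short = UrlHelperSketch::compressQueryParameter($libraries);
assert(UrlHelperSketch::uncompressQueryParameter($short) === $libraries);
```

Keeping the API string-based avoids baking a serialization format into the helper.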
Comment #7
catch
Comment #8
catch
Comment #9
catch: Missed the cspell issue; rewording to get around it.
Comment #10
nod_: On the umami frontpage with this URL, `~` as a separator: 794.
Not too happy about having the language, theme, include, exclude params as set indexes of the data array; feels like a future headache.
One thing to be aware of is that using `gzcompress` makes it impossible to use from the JS (at least without using something like zlib, which adds quite a bit of JS just for that).
One thing we could do easily is replace the `,` separator with `~`, because that one isn't urlencoded, so we save 2 chars for each separator, which can add up as seen above. I haven't seen a `~` in a library name declaration, so that should be safe enough; we could always prevent people from using that character in the library definition.
Comment #12
catch: This is true, but if we just need to pass it back to PHP, could the AJAX API just pass the encoded data and we deal with it in PHP again? We might need a proper generic API for PHP to get the relevant information in that case, though.
Yeah I kind of think #4 (but maybe with the compression API in UrlHelper like the latter patches) might end up being the way to go here.
Comment #13
catch: Backed that out; new version has the same logic as #4, but drops the LibraryCompresser class in favour of the UrlHelper methods from later patches.
Also improved the test coverage a bit.
Interdiff is against #4 since it's basically a clean-up of that approach.
Wow, that is a big difference; we might want to open a new issue just for that change since it should be an easy one?
Comment #14
catch: Hadn't fully removed LibraryCompresser.
Comment #15
catch: More test improvements, and accounting for another failure condition: we don't want logs getting flooded if people put garbage in the 'include' or 'exclude' query parameters. Since it's a client error, not a logic error, we suppress the warning; test coverage added for it.
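The failure mode being handled here can be illustrated with a small sketch (the function name is hypothetical): feeding garbage to gzuncompress() raises a PHP warning, so the decode path suppresses it and returns FALSE instead of flooding the logs:

```php
<?php

// Hypothetical helper showing the suppressed-warning pattern: malformed
// 'include'/'exclude' values are a client error, so return FALSE quietly
// rather than letting gzuncompress() emit a warning on every bad request.
function uncompress_query_parameter(string $value): string|false {
  $decoded = base64_decode($value, TRUE);
  if ($decoded === FALSE) {
    return FALSE;
  }
  return @gzuncompress($decoded);
}

// Valid round trip still works.
assert(uncompress_query_parameter(base64_encode(gzcompress('core/once'))) === 'core/once');
// Valid base64, but not zlib data: FALSE, no warning emitted.
assert(uncompress_query_parameter(base64_encode('not zlib data')) === FALSE);
// Not even valid base64 (strict mode rejects it): FALSE.
assert(uncompress_query_parameter('!!not base64!!') === FALSE);
```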
Comment #16
catch: Whitespace.
Comment #17
nod_: That comment is either unnecessary or in the wrong place.
Could be `explode(',', $exclude_string)` here, no?
That should be `$include[]` here, no? (Without the "s".)
Comment #18
catch: Should address #17.
Comment #19
nod_: Still at 528 chars, all good. It makes sense that compressing only the include and exclude values gives the shortest string, since there's a lot of repetition inside it ("core/", "core/drupal.", "umami/", etc.), so it checks out that it's the most efficient.
Tried replacing "," with "~", and the string ends up longer, at 540 chars. So no need to open a follow-up for that, since the separator never appears in the URL anyway.
This will mean that if we use that include (and exclude) parameters for ajaxPageState, the backend will need to be able to receive a list of encoded arguments instead of the exhaustive list of libraries on the page. Not relevant to this issue and not a big deal in itself, something to keep in mind for BC and all that.
Given all that it's RTBC for me.
Comment #20
catch: I'm also hopeful this will help with #956186: Allow AJAX to use GET requests.
Comment #21
Wim Leers: 🙈 Nit: 80 cols formatting
🤓 I was confused by "uncompress" — never seen that before! So did some searching and … apparently "uncompress" is rarely used, "uncompressed" is: to indicate that something is not/never was compressed.
I think one usually uses "decompress" — just like "encode & decode".
Sorry 😬
🤔 Why did these get moved to a different location in the code?
s/include/exclude/
Wow, super elegant test coverage! 🤩
🤓 Nit: this feels like it belongs in a separate test method.
Comment #22
catch: #21.2: So I nearly used ::decompress() here, but PHP uses gzuncompress() https://www.php.net/manual/en/function.gzuncompress.php and that's what we're using to compress/un-/de-compress, so I went with that. I also checked out "uncompress", and it appears to be more widely used for the specific case of compressed files, i.e. https://www.merriam-webster.com/dictionary/uncompress
So... I don't have a strong preference, but that was the thinking behind it. Do you still think it should be decompress?
#21.3: The query string is nearly identical for every aggregate, so this means we compress the same string once, outside the foreach loop. It's behind a cache, but it doesn't hurt. If there were a suitable place, we could even do it once between CSS and JS, since the list of libraries is the same, but there's no obvious spot for that at the moment.
Comment #23
Wim Leers: Hah! That's more than good enough for me 🤓
Comment #24
catch: Should hopefully address #21.
Comment #25
nod_: Patch still applies and works. Review from #21 addressed.
All good :)
Comment #26
olli: @see uncompressQueryParameter()
It seems to double-encode the unsafe characters from base64 encoding (+, / and =).
With "base64url" encoding, the parameter would be a few characters shorter. Something like this: https://www.php.net/manual/en/function.base64-encode.php#123098 ?
@see compressQueryParameter()
Do we need to add ext-zlib to composer.json for the gz* functions?
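The suggested base64url variant can be sketched as follows (the function names are illustrative; the php.net comment linked above takes the same approach):

```php
<?php

// base64url sketch per RFC 4648 section 5: '+' and '/' become '-' and
// '_', and the '=' right-padding is stripped. base64_decode() tolerates
// missing padding in non-strict mode, so decoding still works.
function base64url_encode(string $data): string {
  return rtrim(strtr(base64_encode($data), '+/', '-_'), '=');
}

function base64url_decode(string $data): string|false {
  return base64_decode(strtr($data, '-_', '+/'));
}

$compressed = base64url_encode(gzcompress('core/once,core/drupal.ajax'));

// No characters remain that urlencode() would need to escape.
assert(urlencode($compressed) === $compressed);
assert(gzuncompress(base64url_decode($compressed)) === 'core/once,core/drupal.ajax');
```

Avoiding the percent-escapes for +, / and = is where the saved characters come from.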
Comment #27
andypost: The zlib PHP extension can't be disabled, so there's no need to change composer.json.
Comment #28
catch: Marking needs work for #26; 'base64url' encoding is an easy change to make, and worth it to save some characters.
Comment #29
catch
Comment #30
catch: Fixing the typo.
Comment #31
Wim Leers: Nit: I don't think we support Internet Explorer anymore? 🤓 (Also: 🥳)
Fascinating that we can uncompress even though `=` was omitted?! 🤯
🤔 This looks like the message for `!$query->has('include')`, but it's the message for `::uncompressQuery()` returning `FALSE`?
EDIT: ah, yes, the "has" check already happened above. We just need to tweak this message :)
Comment #32
nod_: Some libraries in umami changed, so I went back to the same commit I used in #10 to have numbers we can compare. With the latest patch we're at 502; we have a winner.
For reference, using commit 6a1855c2, `~` as a separator: 794.
Comment #33
catch: #31.1: Removed the IE reference.
#31.2: This is because `=` is only used for right-padding the string and doesn't form part of the actual compressed data, so we're able to just strip it (note that it's replaced with `''`). We could also have used rtrim() there, but that'd mean more nested string manipulation. I added a comment.
#31.3: Adjusted the comment to match the language used for the 'exclude' error.
#31.4/#32: nice!
Comment #34
nod_: Seems like we're good to go now.
Comment #35
Wim Leers: #33: TIL `=` is used for right-padding data in base64 encoding! I actually always wondered what that was for, and never looked into it! 😅 Thanks for teaching me something! 🙏
#32: very nice!
RTBC++
Just realized one last thing 😬 What happens when you have cached responses (in Dynamic Page Cache, Page Cache, Varnish, a CDN, or any other reverse proxy) that still contain uncompressed aggregate URL query strings? Do those still work?
Oh, HAH! #1014086: Stampedes and cold cache performance issues with css/js aggregation is `10.1.x`-only, so no existing sites are on this yet! And when updating to a new minor, caches are cleared already anyway 👍
Comment #36
alexpott: I don't think we need to move the code blocks around. I think we can add `UrlHelper::compressQueryParameter()` where it is needed, and not move all the code and change the flow, just in case someone somewhere is relying on the `if ($libraries) {` behaviour ... and to keep the changes to the minimum necessary to implement this.
Comment #37
catch: That's a good point. Kept the new additional comment, since it's more important now that we have to base64-encode and compress the string, but moved things back where they were otherwise.
Comment #38
catch: More can be inside the `if ($libraries)`.
Comment #40
catch
Comment #41
olli: The double encoding is gone now.
Drop "urlencoded" or replace it with "URL-safe" or "Takes a compressed query parameter and converts it back to the original."?
@see \Drupal\Component\Utility\UrlHelper::compressQueryParameter()
Other than that looks good to me.
Comment #42
catch: Should address #41.
Comment #43
b_sharpe (at ImageX): #42 looks good and addresses #41; working as expected for me and tests are passing. I don't see anything else here. RTBC.
Comment #44
alexpott: Committed 6d62cf2 and pushed to 10.1.x. Thanks!
The new code has test coverage and the URLs are smaller - nice.
Comment #46
Wim Leers: YAY!
Hopefully see y'all in #3303067: Compress aggregate URL query strings next 🤓
Comment #47
voleger: #1945262: Replace custom weights with dependencies in library declarations; introduce "before" and "after" for conditional ordering
Comment #48
Wim Leers: @voleger, oops, yes, that's what I meant 😅