Problem/Motivation
Drupal uses PHP's parse_str function to parse querystrings in multiple places. This function incorrectly assumes that a parameter name can occur only once and returns just the last value.
Example with drush php:
>>> print \Drupal\Core\Url::fromUri('https://example.com/page?tag=one&tag=two')->toString();
https://example.com/page?tag=two
This behavior is a PHP-ism and is not a web standard. Other platforms may and do use the format of repeated param name to pass multiple values.
Further reading:
Proposed resolution
Implement a version of parse_str that is backwards compatible, i.e. treats ?tag=one&tag=two the same way as ?tag[0]=one&tag[1]=two.
Remaining tasks
- Decide if this ticket should be for internal and external URLs, or external URLs only. See #31 and #41.
- Add a way to control the output of query arguments.
User interface changes
None
API changes
New static method Drupal\Core\Url::parseString($string).
Data model changes
None.
Release notes snippet
Fixed a bug in parsing query strings with repeated param names.
| Comment | File | Size | Author |
|---|---|---|---|
| #59 | 3038774-59-11.3.x-reroll.patch | 13.42 KB | jopdebeeck |
| #56 | 3038774-56-11.x-reroll.patch | 14.26 KB | jopdebeeck |
| #51 | 3038774-51-10.2.3-reroll.patch | 13.8 KB | glenndw |
| #49 | 3038774-49.patch | 14.11 KB | jopdebeeck |
| #45 | reroll_diff_40-45.txt | 7.1 KB | afireintheattic |
Issue fork drupal-3038774
Show commands
Start within a Git clone of the project using the version control instructions.
Or, if you do not have SSH keys set up on git.drupalcode.org:
Comments
Comment #2
blazey commentedComment #3
blazey commentedComment #5
blazey commentedComment #6
blazey commentedComment #9
panchoIt makes sense to have this configurable, though it's important to point out that neither one nor the other interpretation is an industry standard. There simply is no standard. Every platform handles this in a different way, see this overview from 2009:

(source: https://www.owasp.org/images/b/ba/AppsecEU09_CarettoniDiPaola_v0.8.pdf)
Please note that some hacker tricks make use of non-consistent interpretation of repeated query parameters, a behaviour that is known as "HTTP Parameter Pollution" (https://www.owasp.org/images/b/ba/AppsecEU09_CarettoniDiPaola_v0.8.pdf). This doesn't mean that abandoning parse_str() for a more flexible solution is a bad idea, it just means:
Please also note that W3C recommends
Comment #10
panchoAs we're currently in line with PHP, while there is no clear web / industry standard, I can't see how this is a bug. Maybe a task.
Found another good overview with some interesting references at the bottom: https://www.owasp.org/index.php/Testing_for_HTTP_Parameter_pollution_(OTG-INPVAL-004)
This is also interesting: https://cs.brown.edu/~vpk/papers/arc.acns12.pdf
Comment #11
blazey commentedIt really is a bug. The reason why I'm raising both this and #3038781: Output multiple values of a query param with repeating param name is that we've got an external web app that requires this a way of providing arguments and we need to store links to that app's screens. Urls fed into Drupal are correct but the output is broken.
Comment #12
pancho@blazey
Just to make it clear, I already said I'm sympathetic to the idea of allowing different parsing strategies for query parameters. And I'm sure you have a proper use-case and need it to work that way.
However, while the fact that you need this feature might make it an important feature, it doesn't make it a bug. There's nothing wrong with our current parsing strategy. It's just not the one you happen to need for your external web app.
If something's broken, then it's the expectation that different platforms would parse query parameters the same way.
I'm in favor of allowing other parsing strategies as a feature but not at the cost of breaking the expectations of other sitebuilders. And certainly not at the cost of unconsidered HPP risks.
It doesn't seem you read my links, and/or understand how sensitive this issue is. I'm leaving it there though and not going to argue with you further. I basically support your idea, so good luck with it.
Comment #13
blazey commented@Pancho I fully agree that this issue needs to be treated seriously and that there are security concerns. This is a matter of the shape of the solution.
However, when we take a step back and try to categorize the issue, we need to take these two facts into consideration:
1 - The standard doesn't say anything about the need for query string parameters to have unique names
2 - This is how it currently works in core:
This shows that we're making assumptions that go beyond the standards. That could still be ok-ish if we only used the Url class to represent internal links, as we'd always be building and parsing them ourselves. That's not the case though. The Url component is designed to handle external links as well and the example above shows that it's impossible to have it parse a valid address in that particular case.
Comment #14
blazey commentedFixing some of the tests.
Comment #15
blazey commentedComment #16
leksat commentedMarking this RTBC because it works :)
To reduce the security risk we could use a well known library instead of maintaining custom code.
Something like https://github.com/thephpleague/uri-query-parser could work. But this one requires PHP >= 7.2. Does this mean it can't be used with Drupal 8?
Comment #17
blazey commentedIf it's going to be committed at any point, please credit @leksat. He did a detailed review of the patch and helped with fixing the test failures. Thanks!
Comment #18
larowlanwe need a comment here
we definitely should be adding extensive unit tests for this
should this be @see?
Comment #19
blazey commentedThanks @larowlan. Here's a patch with all 3 points addressed.
Comment #20
larowlanDoesn't apply
Are we sure we want to override such a low-level function of PHP.
After discussing this issue with a long-term prolific contributor, the opinion was broached that perhaps this is something that belongs lower down the stack, e.g. in Guzzles PSR7 library (they already have parse_query implementation that covers a lot of the use cases here, but is not as thorough as parse_str).
Said contributor also pointed out that the current implementation doesn't handle nesting such as
Comment #21
blazey commentedThat's great news! Now we have two external functions, each covering a part of the spectrum. Here's a patch that is an attempt to combine both tools into a complete solution.
Nested values have been added to the unit test's data provider.
Comment #23
johnwebdev commentedThe test fails because parse_query returns null if the key has no value, and parse_str returns an empty string. Let's change that value to an empty string for BC, and also update the test to ensure the expected values matches type as well.
Comment #24
leksat commentedSorry for pushing this back, but I have found few issues.
I tested the patch with some random query strings. The test code was:
Results:
query=stringparse_strUrlHelper::parseStringarr[]=1&arr[]=2arr[][]=1&arr[][]=2arr=1&arr[]=2parseString takes first
a=1&b[]=2Comment #25
blazey commentedThanks for these examples @leksat. I'm working on a solution that would behave more like the
parse_str. The patch should be ready tomorrow.Comment #26
blazey commentedIt turns out
parse_queryintroduces more issues than it solves, so here's another take on the problem. It works like this:parse_strparse_strindividuallyThis way we get rid of problems with parsing (everything goes through
parse_str) and ordering (the initial execution dictates the order of keys).All the examples provided by @leksat have been added as test cases plus an additional one for upper / lower case letters in key names. One additional inconsistency between the parsing functions has been spotted. It turns out that
parse_strturnso%20eintoo_e(noto e) so this case has been updated.Comment #27
blazey commentedThe patch from #26 after a rebase.
Comment #30
carolpettirossi commentedWe faced the same issue and applying patch #27 worked for us.
Comment #31
alexpottIs this problem only being hit when trying to render external URLs entered into menus and link fields? Because if so I think there might be a less invasive solution. We can short circuit the \Drupal\Core\Utility\UnroutedUrlAssembler::buildExternalUrl() method and do something like:
We only need to process the URL if we changing the scheme, query string or fragments and this is not at all the usual case. With this code links with a value of
https://example.com/page?tag=one&tag=twoare outputted correctly for both link fields in content and in menus. That way this change would become a performance optimisation for displaying external links instead of being some that moves low level code from C to PHP.Comment #32
carolpettirossi commentedRe-rolling #27 patch for Drupal 9
Comment #35
nikitagupta commentedfixed the test cases.
Comment #36
carolpettirossi commentedI'm facing an issue when rendering a link when it contains more than one value for the same query param (e.g. industry_sector below)
So, for example, this URL (
/search-jobs?opportunity_types=Job&industry_sectors=1&industry_sectors=2&industry_sectors=3&start=0&default=1) added to a Link field is displayed like:/search-jobs?opportunity_types=Job&industry_sectors%5B0%5D=1&industry_sectors%5B1%5D=2&industry_sectors%5B2%5D=3&start=0&default=1Any ideas where we could fix this part?
Comment #37
daffie commentedAdded the requested change by @alexpott.
Comment #38
daffie commentedFixed the coding violations.
Comment #40
daffie commentedRemoved the following test from the patch:
This test fails because we added the code as requested by @alexpott in comment #31.
Comment #41
carolpettirossi commentedGuys, just a quick update that even with the new patch provided on #40, the issue (#36) when rendering a link when it contains more than one value for the same query param still happens.
For external URLs, I think this would address it: https://www.drupal.org/project/drupal/issues/3007243
However, how should we deal with internal URLs? Any ideas?
Comment #42
carolpettirossi commentedComment #45
afireintheattic commentedRe-rolled #40 for 9.3 support -- patch and diff is attached.
Comment #48
jopdebeeck commentedRe-rolled #40 for 9.4.12
Comment #49
jopdebeeck commentedRemade the patch from #48
Re-rolled #40 for 9.4.12
Comment #51
glenndw commentedRe-rolled #49 for 10.2.3
Comment #52
andyf commentedIt looks like there's an unanswered question of ticket scope: are we only interested in external URLs, or should we also deal with internal? Personally I'd suggest that if we can't simply solve both, then it makes sense to at least support external URLs along the lines of #31 and if we want to handle internal links do that in a separate ticket. The only issue with #31 that I can see is the slight WTF you'd get from adding an entirely separate query param (or even a fragment) to such an external URL: I don't think you'd expect that to change how the duplicate query parameters are handled. ie...
@carolpettirossi could you give any extra detail on why/how you're using local links with duplicate query parameter names? Thanks!
Comment #53
mauriciopieper commentedI can confirm patch #51 works on Drupal 10.3.
Comment #54
michelle-buckby commentedHello, I have tried using the patch at #51 on a 10.2.11 project.
This patch applies and retains the duplicate query parameter, following the proposed solution pattern (?tag[0]=one&tag[1]=two) but with an error clicking the link: A client error happened
```
Symfony\Component\HttpKernel\Exception\BadRequestHttpException: Input value "keyword" contains a non-scalar value. in Symfony\Component\HttpKernel\HttpKernel->handle() (line 83 of /var/www/html/vendor/symfony/http-kernel/HttpKernel.php).
```
The link field is using an internal url going to a filtered by keyword search.
`/collection/search?keyword%5B0%5D=Ryakuga&keyword%5B1%5D=haya-oshie`
When there are more than one of the same query parameter, only the last value is outputting.
I've tried the same patch on a 10.3.10 project, again patch applies, retains the duplicate parameter with the pattern of (?tag[0]=one&tag[1]=two),however clicking the link strips off all parameters completely.
Wondering if anyone else has seen anything similar (also trying to dig into if this is just something in our internal setup).
Comment #55
froboyJust... one more for the grinder... we have an external system (Daxko) that is providing links with even stranger keys... 😭
date_ranges%5B0%5D.start=&date_ranges%5B0%5D.enddecoded
date_ranges[0].start=&date_ranges[0].endCore's link formatter is mashing those together into
date_ranges[0]=which... the external system doesn't like.
Comment #56
jopdebeeck commentedRe-rolled #51 for 11.x
Comment #57
alexpottI would encourage people to review #3554522: Only parse and mess with the URI in \Drupal\Core\Utility\UnroutedUrlAssembler::buildExternalUrl() if we have to as that issue extracts the minimum change from this issue as recommended by me back in #3038774-31: Url only outputs the last value of a query parameter.
Comment #58
alexpottIf we do want to do this I recommend we investigate the use of https://uri.thephpleague.com/ and do not use our own code from URI and URL parsing.
Comment #59
jopdebeeck commentedRerolled patch for 11.3
Comment #60
rcaldeiradev commentedUpdated from 10.5.3 to 10.6.1: can't reproduce the issue anymore.