Problem/Motivation

Drupal uses PHP's parse_str function to parse querystrings in multiple places. This function incorrectly assumes that a parameter name can occur only once and returns just the last value.

Example with drush php:

>>> print \Drupal\Core\Url::fromUri('https://example.com/page?tag=one&tag=two')->toString();
https://example.com/page?tag=two

This behavior is a PHP-ism and is not a web standard. Other platforms may and do use the format of repeated param name to pass multiple values.

Further reading:

Proposed resolution

Implement a version of parse_str that is backwards compatible, i.e. treats ?tag=one&tag=two the same way as ?tag[0]=one&tag[1]=two.

Remaining tasks

  • Decide if this ticket should be for internal and external URLs, or external URLs only. See #31 and #41.
  • Add a way to control the output of query arguments.

User interface changes

None

API changes

New static method Drupal\Core\Url::parseString($string).

Data model changes

None.

Release notes snippet

Fixed a bug in parsing query strings with repeated param names.

Issue fork drupal-3038774

Command icon Show commands

Start within a Git clone of the project using the version control instructions.

Or, if you do not have SSH keys set up on git.drupalcode.org:

    Comments

    blazey created an issue. See original summary.

    blazey’s picture

    Issue summary: View changes
    blazey’s picture

    Assigned: blazey » Unassigned
    Status: Active » Needs review
    StatusFileSize
    new738 bytes
    new9.67 KB

    Status: Needs review » Needs work

    The last submitted patch, 3: 3038774-3.patch, failed testing. View results
    - codesniffer_fixes.patch Interdiff of automated coding standards fixes only.

    blazey’s picture

    StatusFileSize
    new738 bytes
    new10.01 KB
    blazey’s picture

    Status: Needs work » Needs review

    The last submitted patch, 5: 3038774-5-test-only.patch, failed testing. View results

    Status: Needs review » Needs work

    The last submitted patch, 5: 3038774-5.patch, failed testing. View results
    - codesniffer_fixes.patch Interdiff of automated coding standards fixes only.

    pancho’s picture

    Issue tags: +Needs security review, +http parameter pollution
    StatusFileSize
    new149.18 KB

    It makes sense to have this configurable, though it's important to point out that neither one nor the other interpretation is an industry standard. There simply is no standard. Every platform handles this in a different way, see this overview from 2009:
    Overview
    (source: https://www.owasp.org/images/b/ba/AppsecEU09_CarettoniDiPaola_v0.8.pdf)

    Please note that some hacker tricks make use of non-consistent interpretation of repeated query parameters, a behaviour that is known as "HTTP Parameter Pollution" (https://www.owasp.org/images/b/ba/AppsecEU09_CarettoniDiPaola_v0.8.pdf). This doesn't mean that abandoning parse_str() for a more flexible solution is a bad idea, it just means:

    • we need to be very careful changing anything
    • we need extensive test coverage.
    • both on the parsing and the output side this probably has to remain configurable to allow for consistent setups in heterogenous architectures.

    Please also note that W3C recommends

    that HTTP server implementors, and in particular, CGI implementors support the use of ";" in place of "&"

    pancho’s picture

    Category: Bug report » Task

    As we're currently in line with PHP, while there is no clear web / industry standard, I can't see how this is a bug. Maybe a task.

    Found another good overview with some interesting references at the bottom: https://www.owasp.org/index.php/Testing_for_HTTP_Parameter_pollution_(OTG-INPVAL-004)

    This is also interesting: https://cs.brown.edu/~vpk/papers/arc.acns12.pdf

    blazey’s picture

    Category: Task » Bug report

    It really is a bug. The reason why I'm raising both this and #3038781: Output multiple values of a query param with repeating param name is that we've got an external web app that requires this a way of providing arguments and we need to store links to that app's screens. Urls fed into Drupal are correct but the output is broken.

    pancho’s picture

    @blazey

    It really is a bug.

    Urls fed into Drupal are correct but the output is broken.

    Just to make it clear, I already said I'm sympathetic to the idea of allowing different parsing strategies for query parameters. And I'm sure you have a proper use-case and need it to work that way.

    However, while the fact that you need this feature might make it an important feature, it doesn't make it a bug. There's nothing wrong with our current parsing strategy. It's just not the one you happen to need for your external web app.

    If something's broken, then it's the expectation that different platforms would parse query parameters the same way.
    I'm in favor of allowing other parsing strategies as a feature but not at the cost of breaking the expectations of other sitebuilders. And certainly not at the cost of unconsidered HPP risks.

    It doesn't seem you read my links, and/or understand how sensitive this issue is. I'm leaving it there though and not going to argue with you further. I basically support your idea, so good luck with it.

    blazey’s picture

    @Pancho I fully agree that this issue needs to be treated seriously and that there are security concerns. This is a matter of the shape of the solution.

    However, when we take a step back and try to categorize the issue, we need to take these two facts into consideration:

    1 - The standard doesn't say anything about the need for query string parameters to have unique names

    2 - This is how it currently works in core:

    >>> print \Drupal\Core\Url::fromUri('https://example.com/page?tag=one&tag=two')->toString();
    https://example.com/page?tag=two
    

    This shows that we're making assumptions that go beyond the standards. That could still be ok-ish if we only used the Url class to represent internal links, as we'd always be building and parsing them ourselves. That's not the case though. The Url component is designed to handle external links as well and the example above shows that it's impossible to have it parse a valid address in that particular case.

    blazey’s picture

    StatusFileSize
    new831 bytes
    new10.2 KB

    Fixing some of the tests.

    blazey’s picture

    Status: Needs work » Needs review
    leksat’s picture

    Status: Needs review » Reviewed & tested by the community

    Marking this RTBC because it works :)

    To reduce the security risk we could use a well known library instead of maintaining custom code.
    Something like https://github.com/thephpleague/uri-query-parser could work. But this one requires PHP >= 7.2. Does this mean it can't be used with Drupal 8?

    blazey’s picture

    If it's going to be committed at any point, please credit @leksat. He did a detailed review of the patch and helped with fixing the test failures. Thanks!

    larowlan’s picture

    Status: Reviewed & tested by the community » Needs work
    1. +++ b/core/lib/Drupal/Component/Utility/UrlHelper.php
      @@ -419,4 +419,60 @@ public static function isValid($url, $absolute = FALSE) {
      +   * @return array
      

      we need a comment here

    2. +++ b/core/lib/Drupal/Component/Utility/UrlHelper.php
      @@ -419,4 +419,60 @@ public static function isValid($url, $absolute = FALSE) {
      +  public static function parseString($string) {
      

      we definitely should be adding extensive unit tests for this

    3. +++ b/core/lib/Drupal/Component/Utility/UrlHelper.php
      @@ -419,4 +419,60 @@ public static function isValid($url, $absolute = FALSE) {
      +    // http://php.net/manual/en/function.parse-str.php#76792
      

      should this be @see?

    blazey’s picture

    Status: Needs work » Needs review
    StatusFileSize
    new11.61 KB
    new2.03 KB

    Thanks @larowlan. Here's a patch with all 3 points addressed.

    larowlan’s picture

    Status: Needs review » Needs work

    Doesn't apply

    Are we sure we want to override such a low-level function of PHP.

    After discussing this issue with a long-term prolific contributor, the opinion was broached that perhaps this is something that belongs lower down the stack, e.g. in Guzzles PSR7 library (they already have parse_query implementation that covers a lot of the use cases here, but is not as thorough as parse_str).

    Said contributor also pointed out that the current implementation doesn't handle nesting such as

    parse_str('foo[1][2]=bar&foo[2][1]=baz', $result); print_r($result);
    
    blazey’s picture

    Status: Needs work » Needs review
    StatusFileSize
    new11.49 KB
    new2.98 KB

    After discussing this issue with a long-term prolific contributor, the opinion was broached that perhaps this is something that belongs lower down the stack, e.g. in Guzzles PSR7 library (they already have parse_query implementation that covers a lot of the use cases here, but is not as thorough as parse_str).

    That's great news! Now we have two external functions, each covering a part of the spectrum. Here's a patch that is an attempt to combine both tools into a complete solution.

    the current implementation doesn't handle nesting

    Nested values have been added to the unit test's data provider.

    Status: Needs review » Needs work

    The last submitted patch, 21: 3038774-21.patch, failed testing. View results
    - codesniffer_fixes.patch Interdiff of automated coding standards fixes only.

    johnwebdev’s picture

    Status: Needs work » Needs review
    StatusFileSize
    new11.63 KB
    new1.3 KB

    The test fails because parse_query returns null if the key has no value, and parse_str returns an empty string. Let's change that value to an empty string for BC, and also update the test to ensure the expected values matches type as well.

    leksat’s picture

    Status: Needs review » Needs work

    Sorry for pushing this back, but I have found few issues.

    I tested the patch with some random query strings. The test code was:

    $ drush eval '$s = "query=string"; parse_str($s, $r); var_dump($r); var_dump(\Drupal\Component\Utility\UrlHelper::parseString($s));'
    

    Results:

    query=string parse_str UrlHelper::parseString notes
    arr[]=1&arr[]=2
    array(1) {
      'arr' =>
      array(2) {
        [0] =>
        string(1) "1"
        [1] =>
        string(1) "2"
      }
    }
    array(1) {
      'arr' =>
      array(1) {
        [0] =>
        string(5) "Array"
      }
    }
    arr[][]=1&arr[][]=2
    array(1) {
      'arr' =>
      array(2) {
        [0] =>
        array(1) {
          [0] =>
          string(1) "1"
        }
        [1] =>
        array(1) {
          [0] =>
          string(1) "2"
        }
      }
    }
    array(1) {
      'arr' =>
      array(1) {
        [0] =>
        array(1) {
          [0] =>
          string(5) "Array"
        }
      }
    }
    same as above
    arr=1&arr[]=2
    array(1) {
      'arr' =>
      array(1) {
        [0] =>
        string(1) "2"
      }
    }
    array(1) {
      'arr' =>
      string(1) "1"
    }
    parse_str takes second
    parseString takes first
    a=1&b[]=2
    array(2) {
      'a' =>
      string(1) "1"
      'b' =>
      array(1) {
        [0] =>
        string(1) "2"
      }
    }
    array(2) {
      'b' =>
      array(1) {
        [0] =>
        string(1) "2"
      }
      'a' =>
      string(1) "1"
    }
    order change
    blazey’s picture

    Assigned: Unassigned » blazey

    Thanks for these examples @leksat. I'm working on a solution that would behave more like the parse_str. The patch should be ready tomorrow.

    blazey’s picture

    Assigned: blazey » Unassigned
    Status: Needs work » Needs review
    StatusFileSize
    new4.47 KB
    new717.26 KB

    It turns out parse_query introduces more issues than it solves, so here's another take on the problem. It works like this:

    1. Scan all the key=value pairs in the query
    2. Move all the repeated keys that do not use the array notation into a separate array
    3. Pass the query string (with the potential-data-loss items removed) through parse_str
    4. Iterate over the removed items and pass each one to parse_str individually
    5. Merge the intermediate results into the initial result

    This way we get rid of problems with parsing (everything goes through parse_str) and ordering (the initial execution dictates the order of keys).

    All the examples provided by @leksat have been added as test cases plus an additional one for upper / lower case letters in key names. One additional inconsistency between the parsing functions has been spotted. It turns out that parse_str turns o%20e into o_e (not o e) so this case has been updated.

    blazey’s picture

    StatusFileSize
    new13.04 KB

    The patch from #26 after a rebase.

    Version: 8.8.x-dev » 8.9.x-dev

    Drupal 8.8.0-alpha1 will be released the week of October 14th, 2019, which means new developments and disruptive changes should now be targeted against the 8.9.x-dev branch. (Any changes to 8.9.x will also be committed to 9.0.x in preparation for Drupal 9’s release, but some changes like significant feature additions will be deferred to 9.1.x.). For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

    Version: 8.9.x-dev » 9.1.x-dev

    Drupal 8.9.0-beta1 was released on March 20, 2020. 8.9.x is the final, long-term support (LTS) minor release of Drupal 8, which means new developments and disruptive changes should now be targeted against the 9.1.x-dev branch. For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

    carolpettirossi’s picture

    Status: Needs review » Reviewed & tested by the community

    We faced the same issue and applying patch #27 worked for us.

    alexpott’s picture

    Status: Reviewed & tested by the community » Needs review

    Is this problem only being hit when trying to render external URLs entered into menus and link fields? Because if so I think there might be a less invasive solution. We can short circuit the \Drupal\Core\Utility\UnroutedUrlAssembler::buildExternalUrl() method and do something like:

      /**
       * {@inheritdoc}
       */
      protected function buildExternalUrl($uri, array $options = [], $collect_bubbleable_metadata = FALSE) {
        // Early return to not unnecessarily alter external URLs.
        if (empty($options['query']) && empty($options['fragment']) && !isset($options['https'])) {
          return $collect_bubbleable_metadata ? (new GeneratedUrl())->setGeneratedUrl($uri) : $uri;
        }
    

    We only need to process the URL if we changing the scheme, query string or fragments and this is not at all the usual case. With this code links with a value of https://example.com/page?tag=one&tag=two are outputted correctly for both link fields in content and in menus. That way this change would become a performance optimisation for displaying external links instead of being some that moves low level code from C to PHP.

    carolpettirossi’s picture

    StatusFileSize
    new12.33 KB

    Re-rolling #27 patch for Drupal 9

    Status: Needs review » Needs work

    The last submitted patch, 32: 3038774-32.patch, failed testing. View results
    - codesniffer_fixes.patch Interdiff of automated coding standards fixes only.

    Version: 9.1.x-dev » 9.2.x-dev

    Drupal 9.1.0-alpha1 will be released the week of October 19, 2020, which means new developments and disruptive changes should now be targeted for the 9.2.x-dev branch. For more information see the Drupal 9 minor version schedule and the Allowed changes during the Drupal 9 release cycle.

    nikitagupta’s picture

    Status: Needs work » Needs review
    StatusFileSize
    new12.8 KB
    new1.32 KB

    fixed the test cases.

    carolpettirossi’s picture

    I'm facing an issue when rendering a link when it contains more than one value for the same query param (e.g. industry_sector below)

    So, for example, this URL (/search-jobs?opportunity_types=Job&industry_sectors=1&industry_sectors=2&industry_sectors=3&start=0&default=1) added to a Link field is displayed like: /search-jobs?opportunity_types=Job&industry_sectors%5B0%5D=1&industry_sectors%5B1%5D=2&industry_sectors%5B2%5D=3&start=0&default=1

    Any ideas where we could fix this part?

    daffie’s picture

    StatusFileSize
    new2.13 KB
    new14.99 KB

    Added the requested change by @alexpott.

    daffie’s picture

    StatusFileSize
    new2.29 KB
    new15.02 KB

    Fixed the coding violations.

    Status: Needs review » Needs work

    The last submitted patch, 38: 3038774-38.patch, failed testing. View results

    daffie’s picture

    Status: Needs work » Needs review
    StatusFileSize
    new14.3 KB

    Removed the following test from the patch:

    diff --git a/core/tests/Drupal/KernelTests/Core/Url/LinkGenerationTest.php b/core/tests/Drupal/KernelTests/Core/Url/LinkGenerationTest.php
    index 3157a63dc7..b658195d19 100644
    --- a/core/tests/Drupal/KernelTests/Core/Url/LinkGenerationTest.php
    +++ b/core/tests/Drupal/KernelTests/Core/Url/LinkGenerationTest.php
    @@ -62,4 +62,12 @@ public function testHookLinkAlter() {
         $this->assertRaw('<strong>Test!</strong>');
       }
     
    +  /**
    +   * Tests how hook_link_alter() can affect escaping of the link text.
    +   */
    +  public function testMultipleGetArguments() {
    +    $url = Url::fromUri('http://example.com/page?tag=one&tag=two');
    +    $this->assertEquals('http://example.com/page?tag%5B0%5D=one&tag%5B1%5D=two', $url->toString());
    +  }
    +
     }
    

    This test fails because we added the code as requested by @alexpott in comment #31.

    carolpettirossi’s picture

    Guys, just a quick update that even with the new patch provided on #40, the issue (#36) when rendering a link when it contains more than one value for the same query param still happens.

    For external URLs, I think this would address it: https://www.drupal.org/project/drupal/issues/3007243

    However, how should we deal with internal URLs? Any ideas?

    carolpettirossi’s picture

    Status: Needs review » Needs work

    Version: 9.2.x-dev » 9.3.x-dev

    Drupal 9.2.0-alpha1 will be released the week of May 3, 2021, which means new developments and disruptive changes should now be targeted for the 9.3.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

    Version: 9.3.x-dev » 9.4.x-dev

    Drupal 9.3.0-rc1 was released on November 26, 2021, which means new developments and disruptive changes should now be targeted for the 9.4.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

    afireintheattic’s picture

    StatusFileSize
    new14.24 KB
    new7.1 KB

    Re-rolled #40 for 9.3 support -- patch and diff is attached.

    Version: 9.4.x-dev » 9.5.x-dev

    Drupal 9.4.0-alpha1 was released on May 6, 2022, which means new developments and disruptive changes should now be targeted for the 9.5.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

    Version: 9.5.x-dev » 10.1.x-dev

    Drupal 9.5.0-beta2 and Drupal 10.0.0-beta2 were released on September 29, 2022, which means new developments and disruptive changes should now be targeted for the 10.1.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

    jopdebeeck’s picture

    StatusFileSize
    new10.21 KB

    Re-rolled #40 for 9.4.12

    jopdebeeck’s picture

    StatusFileSize
    new14.11 KB

    Remade the patch from #48

    Re-rolled #40 for 9.4.12

    Version: 10.1.x-dev » 11.x-dev

    Drupal core is moving towards using a “main” branch. As an interim step, a new 11.x branch has been opened, as Drupal.org infrastructure cannot currently fully support a branch named main. New developments and disruptive changes should now be targeted for the 11.x branch, which currently accepts only minor-version allowed changes. For more information, see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

    glenndw’s picture

    StatusFileSize
    new13.8 KB

    Re-rolled #49 for 10.2.3

    andyf’s picture

    Issue summary: View changes

    It looks like there's an unanswered question of ticket scope: are we only interested in external URLs, or should we also deal with internal? Personally I'd suggest that if we can't simply solve both, then it makes sense to at least support external URLs along the lines of #31 and if we want to handle internal links do that in a separate ticket. The only issue with #31 that I can see is the slight WTF you'd get from adding an entirely separate query param (or even a fragment) to such an external URL: I don't think you'd expect that to change how the duplicate query parameters are handled. ie...

    // Assuming we're using a solution that only implements #31.
    \Drupal\Core\Url::fromUri('https://example.com/page?tag=one&tag=two')->toString();
    // Returns https://example.com/page?tag=one&tag=two
    \Drupal\Core\Url::fromUri('https://example.com/page?tag=one&tag=two', ['query' => ['foo' => 'bar']])->toString();
    // Returns https://example.com/page?tag=two&foo=bar
    
    

    @carolpettirossi could you give any extra detail on why/how you're using local links with duplicate query parameter names? Thanks!

    mauriciopieper’s picture

    I can confirm patch #51 works on Drupal 10.3.

    michelle-buckby’s picture

    Hello, I have tried using the patch at #51 on a 10.2.11 project.
    This patch applies and retains the duplicate query parameter, following the proposed solution pattern (?tag[0]=one&tag[1]=two) but with an error clicking the link: A client error happened

    ```
    Symfony\Component\HttpKernel\Exception\BadRequestHttpException: Input value "keyword" contains a non-scalar value. in Symfony\Component\HttpKernel\HttpKernel->handle() (line 83 of /var/www/html/vendor/symfony/http-kernel/HttpKernel.php).
    ```

    The link field is using an internal url going to a filtered by keyword search.
    `/collection/search?keyword%5B0%5D=Ryakuga&keyword%5B1%5D=haya-oshie`
    When there are more than one of the same query parameter, only the last value is outputting.

    I've tried the same patch on a 10.3.10 project, again patch applies, retains the duplicate parameter with the pattern of (?tag[0]=one&tag[1]=two),however clicking the link strips off all parameters completely.

    Wondering if anyone else has seen anything similar (also trying to dig into if this is just something in our internal setup).

    froboy’s picture

    Just... one more for the grinder... we have an external system (Daxko) that is providing links with even stranger keys... 😭

    date_ranges%5B0%5D.start=&amp;date_ranges%5B0%5D.end

    decoded

    date_ranges[0].start=&date_ranges[0].end

    Core's link formatter is mashing those together into

    date_ranges[0]=

    which... the external system doesn't like.

    jopdebeeck’s picture

    StatusFileSize
    new14.26 KB

    Re-rolled #51 for 11.x

    alexpott’s picture

    I would encourage people to review #3554522: Only parse and mess with the URI in \Drupal\Core\Utility\UnroutedUrlAssembler::buildExternalUrl() if we have to as that issue extracts the minimum change from this issue as recommended by me back in #3038774-31: Url only outputs the last value of a query parameter.

    alexpott’s picture

    If we do want to do this I recommend we investigate the use of https://uri.thephpleague.com/ and do not use our own code from URI and URL parsing.

    jopdebeeck’s picture

    StatusFileSize
    new13.42 KB

    Rerolled patch for 11.3

    rcaldeiradev’s picture

    Updated from 10.5.3 to 10.6.1: can't reproduce the issue anymore.

    Version: 11.x-dev » main

    Drupal core is now using the main branch as the primary development branch. New developments and disruptive changes should now be targeted to the main branch.

    Read more in the announcement.