This was brought to my attention by my colleague robbyahn this week.

<a href="http://drupal.org/project/modules/?f[0]=drupal_core%3A103&amp;f[1]=bs_project_sandbox%3A0">D7 modules</a>
is not valid in HTML5 and cannot be made so with D8, D7 or any Drupal. (without theme hacks)

This is a core issue with the drupal_http_build_query() function
Under the url() function.

Simply:
The "[" and "]" characters are (now) illegal in HTML5 URLs!

After some digging, that started with a "Huh?" and proceeded to my calling both the W3C 'experimental' validator and http://html5.validator.nu/ broken ... I find that HTML5 does not use URLs or URIs, but IRIs instead. This seems to have come in around the time of supporting international UTF-8 international characters in web addresses last decade.

This minimal HTML5 Document hard, red-flag fails when pasted into the W3C validator set as HTML5.

<!DOCTYPE html><html>
<head><title>example</title></head>
<body>
<a href="http://drupal.org/project/modules/?f[0]=drupal_core%3A103&amp;f[1]=bs_project_sandbox%3A0">Drupal 7 modules</a>
</body>
</html>

>Validation Output:

1 Error

  1. Only local images are allowed.

    Line 4,
    Column 100
    :
    Bad value http://drupal.org/project/modules/?f[0]=drupal_core%3A103&f[1]=bs_projec... for attribute href on element a: Illegal character in query component.

    …s/?f[0]=drupal_core%3A103&#38;amp;f[1]=bs_project_sandbox%3A0&#34;<strong title="Position where error was detected.">&#62;</strong>Drupal 7 modules&#60;/a&#62;
    Syntax of IRI reference:
    Any URL. For example: /hello, #canvas, or http://example.org/. Characters should be represented in NFC and spaces should be escaped as %20.

A valid version is

<!DOCTYPE html><html>
<head><title>example</title></head>
<body>
<a href="http://drupal.org/project/modules/?f%5B0%5D=drupal_core%3A103&amp;f%5B1%5D=bs_project_sandbox%3A0">Drupal 7 modules</a>
</body>
</html>

Which has "[" and "]" replaced with the URL-encoded "%5B" and "%5D" respectively.

In my limited testing so far doing so is browser-friendly (though suck-ier to read) and the Apache/PHP request handler does what it should when decoding it. Its functionally OK.

I am aware that nested GET arrays with [] arguments in the query strings has been a cool, lazy feature of PHP since forever. But if we (our clients) want a clean validation report and also want to use interesting arguments in URLs, this can be fixed.
Patch details below...

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

dman’s picture

Our problem is the line in drupal_http_build_query()
which goes:

$key = ($parent ? $parent . '[' . rawurlencode($key) . ']' : rawurlencode($key));

So.

Either we fix that up to become

$key = ($parent ? $parent . '%5B' . rawurlencode($key) . '%5D' : rawurlencode($key));

or (I would prefer) we let the system sort this (and any similar issues) out with URL-encoding the key, always, by changing

$params[] = $key . '=' . str_replace('%2F', '/', rawurlencode($value));

to

$params[] = urlencode($key) . '=' . str_replace('%2F', '/', rawurlencode($value));
dman’s picture

Status: Active » Needs review
FileSize
524 bytes

Suggested strawman patch attached.
Input welcome. Is the really a huge problem? I've not found much noise about it in the PHP-HTML5 discssions online or in any existing issue search on html5

I am aware that if applied (to Drupals core url() function) it may trigger an amount of regression sparks.

It's not *killing* anything in the real world so far, but a W3C green-light is an important thing (to us)

Damien Tournoud’s picture

Status: Needs review » Needs work

This is never a problem in practice, but we should fix it anyway.

You patch leads to $key being encoded twice, so needs work for that. We probably also need to fix the tests.

dman’s picture

Status: Needs work » Needs review
FileSize
1.68 KB

Ah yeah, I see. Because drupal_http_build_query recurses upon itself it can double up the encoding.
So I guess the first place would be the place to do it. I didn't study it too hard when just trying to see if it was worth trying to fix at all..

Nice that there is a test already there for this!
Testbot, see if this will do better...
In my test I'm using urldecode() instead of a literal string, as I don't care how it arrives at the result, as long as the result ends up correct. OTOH, it doesn't prove that this fix does anything at all either, just that it's fully compatible with previous behavior. Hm.

Damien Tournoud’s picture

Status: Needs review » Needs work
-    $key = ($parent ? $parent . '[' . rawurlencode($key) . ']' : rawurlencode($key));
+    $key = ($parent ? $parent . rawurlencode('[' . $key . ']') : rawurlencode($key));

You can move the rawurlencode() outside the ternary operation.

-    $this->assertEqual(drupal_http_build_query(array('a' => array('b' => '2', 'c' => '3'), 'd' => 'foo')), 'a[b]=2&a[c]=3&d=foo', 'Nested array was properly encoded.');
+    $this->assertEqual(urldecode(drupal_http_build_query(array('a' => array('b' => '2', 'c' => '3'), 'd' => 'foo'))), 'a[b]=2&a[c]=3&d=foo', 'Nested array was properly encoded.');

No, please check the result explicitly. Modified like this, this test would pass with our without your patch :)

superspring’s picture

Assigned: dman » Unassigned
Status: Needs work » Needs review
FileSize
2.47 KB

Same patch as above, reworked with Damien's changes.

coderintherye’s picture

The test result says this passed, but the message here seems to be stuck at test request sent. What's up testbot?

dman’s picture

Status: Needs review » Needs work

try bumping to requeue I guess. It's well old now...

dman’s picture

Status: Needs work » Needs review

and testbot ahoy?

waverate’s picture

Has this been reviewed/committed in D8 so that it can be backported to D7?

waverate’s picture

Issue summary: View changes

wording improvement

joseph.olstad’s picture

Hi Waverate, I applied the changes shown in the patch from comment #6 to D7 and they work without modification. So the code works on D7.

Any chance this can get committed into D7 and D8 core (and D6 if someone can test it?) ? We applied this fix and were able to pass w3c validation on our facet search page.

Anyone else using this patch?
Thanks,

joseph.olstad’s picture

renamed patch to follow naming convention (description_description-NODEID-COMMENTNUMBER.patch ) thus w3c_url_validation_html5_changes-1827854-12.patch

Status: Needs review » Needs work

The last submitted patch, 12: w3c_url_validation_html5_changes-1827854-12.patch, failed testing.

joseph.olstad’s picture

Category: Task » Bug report
Status: Needs work » Needs review

Not sure why #12 failed (it is a copy of #6),
So use #6 instead, it passed.

dcam’s picture

Status: Needs review » Needs work
Issue tags: +Needs backport to D7, +Needs reroll

Closed the D7 backport issue, #2289867: [D7] W3C HTML5 Validation error with PHP array-based query-strings built with url() containing [ ] characters, as a duplicate.

I kind of doubt this is backportable, but I'll tag it and the committers can make the call.

#6 needs a reroll.

superspring’s picture

Status: Needs work » Needs review
FileSize
636 bytes

Rerolling... Looks like the majority of this patch has already been done.

The tests in
- core/modules/system/lib/Drupal/system/Tests/Common/UrlTest.php:testDrupalGetQueryParameters
may be sufficient now. Someone else can look at these if they would like some more.

Status: Needs review » Needs work

The last submitted patch, 16: html5-url-changes-1827854-16.patch, failed testing.

joseph.olstad’s picture

The patch 5 works on D7. I've got over 300 contrib modules running for quite some time using this patch and no side-effects yet seen. The client web browser handles decoding and nothing really changes from the drupal side of things except for properly encoding the two square brackets before generating a link.

The simpleTest failure seen above is actually a problem with UrlHelperTest.php .

Line 40 of UrlHelperTest.php should be changed..

It's now:

      array(array('a' => array('b' => '2', 'c' => '3'), 'd' => 'foo'), 'a[b]=2&a[c]=3&d=foo', 'Nested array was properly encoded.'),

Should be changed to:

      array(array('a' => array('b' => '2', 'c' => '3'), 'd' => 'foo'), 'a%5Bb%5D=2&a%5Bc%5D=3&d=foo', 'Nested array was properly encoded (with square bracket encoding).'),

%5B is the [
%5D is the ]

If you look at line 37, 38 , 39 they have the encoding for special characters using % but line 40 does not have the encoding.

So, the code is right, the test case UrlHelperTest.php needs to be updated for the new patch to pass testing .

Click here to view the source code for the test case

Thanks

joseph.olstad’s picture

Status: Needs work » Needs review

See comments #5, #6 and #18

gdaw’s picture

We may be able to test this in more than one D7 environment, but as Joseph reported we've seen no problems.

star-szr’s picture

Status: Needs review » Needs work
Issue tags: -Needs reroll

This doesn't reroll but does need further work.

joseph.olstad’s picture

We've been using this w3c validation fix to core for months now and it works.

Obviously the simple test needs to be adjusted, but the actual patch from superspring in comment #5 is good.

Anyone know how to adjust/update the simplytest? Obviously it's checking for the square bracket [ character instead of the uriencoded characters

for instance "[" is uri encoded as "%5B"
and "]" is uri encoded as "%5D"

the patch does this correctly but the simply test keeps checking for the non-encoded characters [ and ].

Please adjust the simplytest accordingly and then the patch will pass testing. If we can get this included in core then Drupal core will become HTML5 w3c compliant.

rooby’s picture

Issue tags: +Needs tests
bceyssens’s picture

Status: Needs work » Needs review
FileSize
1.67 KB

Rerolled

dawehner’s picture

So for D8 we have the following comment:

   * @todo Remove this function once PHP 5.4 is required as we can use just
   *   http_build_query() directly.
joseph.olstad’s picture

We're still using the D7 version of this patch.

cdonner’s picture

This is all good, until you paste such a URL as a target into the Redirect module. At that point the % gets encoded again, resulting in %255B and %255D, respectively. Filter query broken. You have to put the [] back into the URL before using it anywhere else in Drupal. Things like this make me want to throw Drupal out and write my site from scratch.

joseph.olstad’s picture

In response to dawehner and using "http_build_query() directly" (instead of rawurlencode?) , the official system requirements page says that 5.4.5 or higher is required for D8. So that means that the D8 patch could now use http_build_query()

"Drupal 8: PHP 5.4.5 or higher"

joseph.olstad’s picture

cdonner in comment #27 you mentioned the redirect module. I installed and tested the "redirect" module (version 7.x-1.0-rc1) in combination with this core patch and the redirect worked correctly for a link that contained square brackets and there was no problem dealing with the patch that url encodes the square brackets. The redirect of url encoded square brackets still works correctly after patching. The client browser interprets the square brackets correctly. Our production D7 site has been using the D7 version of this patch since it was created and we have over 400 modules and features and not one of them has had a problem with this.

martijn@ibuildings.nl’s picture

Status: Needs review » Reviewed & tested by the community
FileSize
1.67 KB

Just applied and reviewed the patch in #24. Changes seem sensible to me and I haven't found any breakage.
Re-uploaded the patch to make it run on the new test infrastructure.

legolasbo’s picture

My colleague mdeletter has reviewed this issue but is being hindered by the fact that is not a "confirmed user" yet. His comment got lost in translation which is why I am posting his comment for him.

Reviewed and tested the patch in #24. The changes seems sensible and I couldn't find any breakage.
Re-uploaded the patch to run it on the new test infrastructure.

Please credit mdeletter on commit for reviewing the patch.

legolasbo’s picture

Also D.O bugged out with his patch upload, which is why I am re-uploaded the patch again.

In case it wasn't already clear, patches #30 and #32 are identical to patch #24 and only re-uploaded to make sure they are running in the new testing infrastructure.

alexpott’s picture

Status: Reviewed & tested by the community » Needs review

Can someone confirm that we have test coverage of making a request using square brackets and it still working as expected after this change?

legolasbo’s picture

@alexpott,

Why does the way we generate a URL have anything to do with how a request is handled?

alexpott’s picture

@legolasbo so I might not have asked the question probably - but what we need is confirmation that PHP still handles the query string as expected.

legolasbo’s picture

@alexpott, So if I understand you correctly, you want to make sure that a request to example.com/blog?tags%5B3%5D=3 and example.com/blog?tags[3]=3 results in the same page without errors?

We have been unable to locate any automated tests that verify this in D8. However, since the request is (AFAIK) handled by Symfony, they should have test coverage for this (assumption). I have manually tested the case mentioned above and it is handled as expected.

Version: 8.0.x-dev » 8.1.x-dev

Drupal 8.0.6 was released on April 6 and is the final bugfix release for the Drupal 8.0.x series. Drupal 8.0.x will not receive any further development aside from security fixes. Drupal 8.1.0-rc1 is now available and sites should prepare to update to 8.1.0.

Bug reports should be targeted against the 8.1.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.2.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.1.x-dev » 8.2.x-dev

Drupal 8.1.9 was released on September 7 and is the final bugfix release for the Drupal 8.1.x series. Drupal 8.1.x will not receive any further development aside from security fixes. Drupal 8.2.0-rc1 is now available and sites should prepare to upgrade to 8.2.0.

Bug reports should be targeted against the 8.2.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.3.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.2.x-dev » 8.3.x-dev

Drupal 8.2.6 was released on February 1, 2017 and is the final full bugfix release for the Drupal 8.2.x series. Drupal 8.2.x will not receive any further development aside from critical and security fixes. Sites should prepare to update to 8.3.0 on April 5, 2017. (Drupal 8.3.0-alpha1 is available for testing.)

Bug reports should be targeted against the 8.3.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.4.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

anrikun’s picture

Testing the same patch as #32 for D7

jamesrward’s picture

Re #36 this is tested by symfony in HttpFoundation/Tests/RequestTest.php line 648:
2.8/src/Symfony/Component/HttpFoundation/Tests/RequestTest.php

Linked to 2.8 as that is what D8 core is using.

array('foo[]=1&foo[]=2', 'foo%5B%5D=1&foo%5B%5D=2', 'allows array notation'),

This is confirming that the un-encoded query string gets encoded properly. Encoded query strings are handled as-is. I tried hacking in the example in #36 and it passed the same test.

Would love to see this move towards RTBC for D8 here so we can get the D7 backport in.

jamesrward’s picture

Status: Needs review » Needs work

The last submitted patch, 42: html5-changes-1827854-42.patch, failed testing.

joseph.olstad’s picture

@jamesrward , looks like 4 new (failing) tests also need the conversion from [] to %5B%5D

joseph.olstad’s picture

@alexpott after this change nothing breaks , this is because the webbrowser automatically converts the square brackets , what is broken is w3c compliance (as silly as it is in this case IMHO ) when drupal without the patch renders links with [] , then when the w3c validator runs, it sees the square brackets and flags it as non html5 compliant. The patch does not prevent square brackets from being used (thats actually handled by web browsers when you click on a link). What the patch actually does is make Drupal link rendering be html5 compliant.

been using the D7 patch with hundreds of contrib modules and custom modules , no problem. Did not have to rewrite any code in my contrib or custom modules.

jamesrward’s picture

Here's a patch and an interdiff with the BigPipe tests fixed. Basically these tests expected to put square bracket arguments in and get square bracket arguments out. Our patch urlencodes those brackets so the input and output no-longer match. Per #45 I don't think we are actually breaking BigPipe with the patch in #42, just the tests. To me this is a good indicator that we need to get this fix in soon. The longer core stays like this, the more places we are going to have to hunt down and replace non-W3C compliant tests.

jamesrward’s picture

Status: Needs work » Needs review
jamesrward’s picture

Status: Needs review » Reviewed & tested by the community

I think the only thing left to decide here is if this goes on 8.3.x as a bug-fix or waits for 8.4. Paging @alexpott

jamesrward’s picture

Issue tags: -Needs tests +Baltimore2017
joseph.olstad’s picture

@jamesrward, yes you are correct, just the tests needed adjusting.

alexpott’s picture

Given http_build_query does this:

>>> http_build_query(['a' => ['foo' => 'bar']]);
=> "a%5Bfoo%5D=bar"

Our own equivalent will now do the same.

Let's get this done. I considered whether or not we need a CR for this and I don't think so. The return of \Drupal\Component\Utility\UrlHelper::buildQuery() is still a valid URL rawencoded string - only more so.

I guess it is technically possible that someone somewhere is doing preg_match on the output but that'd be wrong... you'd do:

>>> parse_str(http_build_query(['a' => ['foo' => 'bar']]), $result)
=> null
>>> $result
=> [
     "a" => [
       "foo" => "bar",
     ],
   ]
alexpott’s picture

I tested core's menu form for @cdonner's comment #27.
I installed standard and did an advanced search after logging in as user 1. This gave me the beautiful url of search/node?keys=search%20term&f%5B0%5D=type%3Aarticle&f%5B1%5D=language%3Aen&advanced-form=1 and I used that as a menu link. All works great.

alexpott’s picture

Status: Reviewed & tested by the community » Patch (to be ported)

Committed 02d56af and pushed to 8.4.x. Thanks!

I will discuss backporting this to 8.3.x with the other committers.

I credited @Damien Tournoud for his review that identified an issue with the original patch. Thanks to everyone else for their comments.

Perhaps someone can open a follow up to convert \Drupal\Component\Utility\UrlHelper::buildQuery() to just called http_build_query() - it will be a minor performance improvement.

  • alexpott committed 02d56af on 8.4.x
    Issue #1827854 by jamesrward, dman, superspring, joseph.olstad,...
alexpott’s picture

Status: Patch (to be ported) » Fixed

Discussed with @catch we agreed to backport to 8.3.x

  • alexpott committed 6febec1 on 8.3.x
    Issue #1827854 by jamesrward, dman, superspring, joseph.olstad,...
alexpott’s picture

The followup to use http_build_query() already exists - there are problems with it see #2322059: Keep the UrlHelper::buildQuery method and remove @todo

joseph.olstad’s picture

Version: 8.3.x-dev » 7.x-dev

D7 patch is ready to go, been using it on several sites for over a year without issues.

joseph.olstad’s picture

Status: Fixed » Reviewed & tested by the community
xjm’s picture

Version: 7.x-dev » 8.3.x-dev
Status: Reviewed & tested by the community » Fixed

@joseph.olstad, can you create a separate issue for D7? Our current process is to create separate issues for D7 if needed and just reference the D8 one in the related issues. (Also, none of the file attachments on this issue are D7 patches.)

jamesrward’s picture

@xjm backport issue re-opened at #2289867: [D7] W3C HTML5 Validation error with PHP array-based query-strings built with url() containing [ ] characters

#40 appears to be latest D7 ready so I will drop it into that issue for testing.

xjm’s picture

Thanks @jamesrward!

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.

andrewmacpherson’s picture

this is the only issue tagged with "w3c validator", moving to the other tag for posterity.

joseph.olstad’s picture