Support for Drupal 7 is ending on 5 January 2025—it’s time to migrate to Drupal 10! Learn about the many benefits of Drupal 10 and find migration tools in our resource center.
url() still accepts a query string both as string AND as an array.
Ugly code everywhere. if ... then ... else ... then ... massacre.
What breaks?
Comment | File | Size | Author |
---|---|---|---|
#74 | 578520-follow-up-3.patch | 1.16 KB | c960657 |
#70 | 578520-follow-up-2.patch | 2.52 KB | c960657 |
#68 | 578520-follow-up-1.patch | 2.46 KB | c960657 |
#65 | drupal.url-query-array.65.patch | 65.44 KB | sun |
#64 | drupal.url-query-array.64.patch | 65.39 KB | sun |
Comments
Comment #2
sunMore?
Comment #4
sunLovely Batch API. :)
Comment #5
sunAdded type-hinting to ensure it's an array.
Comment #6
sunRemaining todos.
Especially not sure about the second change - I think that's in drupalGet(). According to parse_str(), that should retrieve key/value pairs for a passed in query string-alike string.
Not sure whether this is the same, but I guess so.
Also opens up the question why we don't use http://de2.php.net/manual/en/function.http-build-query.php in general? Anyway, separate issue...
I'm on crack. Are you, too?
Comment #8
sunThis is kind of patch that really really really really really really highlights how important it is to break backwards compatibility when doing API changes.
Debugging the test failures and finding the real cause is a monster pain, I can tell you.
Comment #9
Dries CreditAttribution: Dries commentedWelp, I didn't know about http_build_query -- seems like it is worth investigating (in a separate issue). Good find.
Comment #11
catchsubcribing.
Comment #12
sunLovely $destination munging in a couple of form button submit handlers....
Comment #14
sunThis one should finally pass.
Comment #15
sunHmmm.... those tests pass for me locally :(
Comment #17
sunComment #18
sunpwolanin asked for a better title.
Comment #20
sunAll tests keep passing on my local Windows machine. WTF?
Comment #22
sunSorry guys, I really tried everything and still can't replicate those failing tests. Tested with new install with patch applied, ran all tests, all pass.
Maybe a PHPWTF on Windows/Unix?
Comment #23
c960657 CreditAttribution: c960657 commentedPlease add type-hinting here (NULL is still allowed, because it is specified as the default value).
The function name suggests that the function returns a string, but it returns an array. How about naming it drupal_get_query() ?
There is usually no blank line between @param and @return.
I suggest you add a comment explaining the rationale for using
array_merge(array('q'), array_keys($_COOKIE))
as the default value. Also, this default is probably not good when called from url() for an external URL. I suggest you change url() to pass an empty array for $exclude when $options['external'] is TRUE.Use type hinting for $exclude also.
The Doxygen comment for url() should be updated to reflect that $query must now be an array.
Here the array is replaced by a string. Please use a new variable name for the encoded value. I suggest you don't put it in $options, but make a regular variable, e.g. $query_string, just like $script and $prefix.
I can reproduce the testbot problems if I turn off clean URLs. I think the problem is in the
// Without Clean URLs.
part of url().pager_get_querystring() no longer returns a query string but an array. I think the function should be renamed, e.g. to pager_get_query(). Also, the Doxygen comment needs to reflect the change in return type. Likewise for tablesort_get_querystring(). If you these two functions are renamed, we also get rid of the two different ways of spelling query_string/querystring.
The Doxygen comment for drupal_get_destination() says "Prepare a destination query string ...", but the function no longer returns a string.
TableSort::getQueryString() no longer returns a query string, and should probably be renamed to getQuery(). Likewise, $ts['query_string'] should be renamed to $ts['query'].
It looks like the filtering logic (i.e. $exclude) is duplicated in drupal_get_query_string() and drupal_query_string_encode(). How changing the latter to just
return http_build_query(drupal_get_query_string($query, $exclude), '', '&');
?The assertion message needs updating. I suggest you just remove the function call from the display string. Likewise for the following test.
This changes the behaviour of the test. Before it requested system-test/destination?xyz1234xyz, now it requests system-test/destination?destination=xyz1234xyz. I don't think it matters, though. I think the test should be extended to two cases, with and without a ?destination=xyz1234xyz argument, but we can do that in another issue if you prefer.
Comment #24
sunThanks for that awesome review!
And also thanks for catching clean URLs as cause - I didn't thought of that. (And it makes me a bit worry that the existing test that should have caught this should have failed on my box, too.)
I incorporated almost all of your suggestions except:
- Renaming to drupal_get_query() (including pager and tablesort functions): Too ambiguous with database queries. Thought of alternatives, but wasn't able to come up with better names. I think that "query string", when referring to URLs, is a pretty common term. So the only thing we could do would be to squeeze a 'url_' somewhere into those function names, but I'd like to defer that to a separate issue.
- Blank line between @param and @return: Actually, we do, and most developers prefer it. It's just not defined in the coding standards yet. There is an issue about this floating around somewhere, but I'm unable to find it currently.
- Generation of external URLs, which optionally allows to append a custom query string, is already covered in tests. NIH. :)
- http_build_query() cannot be used, because it doesn't use rawurlencode() like drupal_query_string_encode() does.
Additionally, I merged a stale test case for url(), which only tested external URLs, into the existing URL generation unit test, where l(), url(), aso. are already tested.
So, plenty of additional clean-ups in this patch - hence the almost doubled size...
Comment #26
c960657 CreditAttribution: c960657 commentedGood job! Just a brief follow-up before bedtime:
Should be “a URL” instead of “an URL”.
I don't understand what you mean by “ambitious”?
Agreed - when we are talking about the actual string representation, i.e.
foo=1&bar=2
. I think it is really confusing for developers that a function called get_foo_string() returns an array.Comment #27
sunThis one should pass.
Can we defer that function renaming to a separate issue, please? Those function names are a tiny WTF compared to the major WTF of $query being allowed to be two different data types.
Comment #29
sunAdditionally.... boy, that url() test in common.test is a *unit* test. Replaced those novels with developer friendly output.
Comment #31
sunThe test failures are caused by SimpleTest's own internal URL handling. Specifically, DrupalWebTestCase->getAbsoluteUrl().
I've fixed the bug after hours of debugging, but it's certainly not fixing the cause: That entire DrupalWebTestCase->getAbsoluteUrl() looks totally wrong to me. I've left a @todo plus lengthy inline comment in there explaining the parts I understand.
Also added tests to SimpleTest to ensure proper URL handling inside of DrupalWebTestCase.
In case this one doesn't pass, then this is beyond my horizon, and aliens will eat me tonight.
Comment #32
mfbLeaving at needs review for more reviews, but user_login_destination(), a new function in user.module, will also need to be patched
Comment #33
sun/me stops counting other issues to clean up in this patch.
Comment #34
mfbHmm why did you move the function? That's where I originally had it, but Dries suggested making it a generic function in user module for other authentication modules.
Comment #35
sunBecause you missed the accompanying PHPDoc that explains why this function lives there?
Comment #36
mattyoung CreditAttribution: mattyoung commentedsub
Comment #37
mfbUsing http_build_query() makes drupal_query_string_encode() quite a bit faster (depending on the complexity of the array, from two to four times faster in my testing). The function I benchmarked is:
The only difference is that http_build_query() URL-encodes the square brackets as of PHP 5.1.3, after this bug report: http://bugs.php.net/bug.php?id=36656 I don't think URL-encoding square brackets would cause any issues (and apparently would make the URLs more RFC-compliant).
EDIT: One additional difference, http_build_query() removes any query parameter with a NULL value. Whereas, drupal_query_string_encode() treats NULL value as equivalent to empty string, i.e. does not remove it.
Comment #38
sunCan we please move that discussion about whether http_build_query() may be used or not or whether it may be faster into a separate issue? :)
It was horrible pain to get this patch done and I don't see why any http_build_query()-related talk can't be moved elsewhere. The much more important part of this patch is the API clean-up to make $query in url() always an array.
Thus, I'd love to see a RTBC here.
Comment #39
mfbAnother minor issue I see, in system_admin_compact_page(),
drupal_goto($_GET['q'], drupal_get_destination());
can simply bedrupal_goto();
Otherwise you end up on admin?destination=admin for no good reason.Comment #40
sunSorry, that would potentially change the current behavior, so I'm not going to incorporate that change in this patch. :)
Comment #41
mfbThe patch changes the current behavior, my suggestion preserves current behavior.
Comment #42
sunSee, that's what happens if you delve into the "defender" role! ;) Thanks, man! :)
Comment #43
c960657 CreditAttribution: c960657 commentedIt's not that I am trying to pile up unrelated stuff in this issue, but I'd rather not that we add a completely new function, drupal_get_query_string(), and then immediately after file another issue about renaming it drupal_get_query(), knowing that the latter patch may be rejected due to API freeze.
Your patch still renames pager_get_querystring() to pager_get_query_string() and tablesort_get_querystring() to tablesort_get_query_string(), but there is a few variables left using the old spelling, $querystring.
I think this looks a bit confusing, with $query not referring to parameters as key-value-pairs but an array of 0, 1, or 2 URL-encoded strings. And later $query is reused to contain a string. Why not do
$options['query']['q'] = $path;
and then build the entire query string using drupal_query_string_encode() ?This change is perfectly fine, but I suggest that you change the order of the arguments to array_merge(). This would make it more obvious to readers of the code, that the 'sort' and 'order' arguments take precedence over any arguments in the $ts['query'] array. I know that the array does not contain these entries, but only after looking it up.
Now that drupal_get_destination() no longer calls urlencode(), the urldecode() here should be removed.
Comment #44
mfbCouple more issues:
* in field_ui.admin.inc, I don't think you want the urldecode() call inSee above.$destinations[] = urldecode($destination['destination']);
because destination is no longer urlencoded by drupal_get_destination().* in drupal_web_test_case.php, in two places you specify $this->drupalGet($url, array('absolute' => TRUE)) but drupalGet() always sets $options[absolute] = TRUE so this is unnecessary.
Comment #45
Dries CreditAttribution: Dries commentedI reviewed this patch and it is a great clean-up.
I'm wondering about the performance impact of this: url() and l() are in the critical path ... some performance testing would be good.
Some additional review:
"An list" should be "A list".
The explanation of $exclude is a bit wonky. Doesn't really explain why that parameter is useful, or to be used. Lacks some context.
It is a bit strange that $exclude is marked optional. Technically, it is optional, but in practice it is a parameter that needs to be non-empty for the function to do anything useful.
I think this change is harmless, but we have to be careful for XSS attacks. If I manually specify a custom query string attribute in the address bar, will it survive the array_merge()?
We don't usually abbreviate simple variables. We can use $absolute here.
Comment #46
sunOne of my other patches broke this one. :)
Straight re-roll for now.
I'll have another look at the follow-ups later on. However, especially the suggestion to rename to drupal_get_query() does not work for me, because when reading that function name I'd rather expect a database query, not URL query parameters. drupal_url_get_query(), url_get_query(), or drupal_get_query_parameters() may all work better, but then again, renaming those functions will take 5 minutes whenever an agreement is there, and I don't want to see this patch hold off, just because of that.
Comment #47
sun- Added drupal_parse_url() to properly parse a URL into an associative array suitable for url(). Basically an improved fork of my patch in #582456: Make confirm_form() respect query string destination.
- Changed query parameter handling for non-clean-URLs as suggested by c960657 in #43. Was that way before already, but changed in order to find the cause for the failing tests (which has been found elsewhere now, as mentioned before).
- Removed urldecode() in field_ui.admin.inc.
- Removed 'absolute' => TRUE changes from drupal_web_test_case.php.
@Dries:
- Regarding performance: Since most URLs already use the array-syntax for query parameters, the only difference in performance is the few times in core that url() was invoked with a string for $options['query']. There are no further changes regarding the amount of times drupal_query_string_encode()/drupal_get_query_string() or drupal_get_destination() is called.
- Optional arguments to drupal_get_query_string(): Actually, invoking drupal_get_query_string() without any arguments is a valid use-case. It defaults to $query = $_GET and $exclude = array('q'). Hence, you get $_GET with 'q' filtered out - the most common use-case.
- Security in tablesort: The last patch equaled the logic in the current code we have. Anyway, due to c960657's aforementioned suggestion, the 'sort' and 'order' parameters now come last, so they always have precedence.
...
Which leaves us with the function naming issue. Please all speak up now.
Comment #49
c960657 CreditAttribution: c960657 commentedSee also the suggested implementation of drupal_parse_url() in #553262: Make getAbsoluteURL() standard compliant. PHP's parse_url() is not suitable for parsing relative URLs.
Comment #50
sunReverted that change in url() for non-clean-URLs, because slashes in q= are escaped with that and we want those to be readable (according to comments in existing code). There are even tests that verify that.
Comment #51
sun@c960657: The patch in #553262: Make getAbsoluteURL() standard compliant seems to aim a completely different purpose. For regular URLs, we don't need any extra parsing. We could just use parse_url(), if url() would still support $options['query'] to be a string. However, this patch here removes that ambiguous support for string or array, so we need a simple wrapper around parse_url() to get a result that is suitable for url().
The second and more hidden issue that drupal_parse_url() fixes is that there are several places where code tries to directly pass $_GET['destination'] to url(). However, url() does not support a relative path containing a query string or fragment. Query parameters or a fragment always have to be passed via $options for internal/relative paths.
Comment #53
sunMy patch in #118345: Revamp hook_user_form/_register/_validate/_submit/_insert/_update broke this. :)
Feedback, anyone?
Comment #54
c960657 CreditAttribution: c960657 commentedThis can be fixed by changing drupal_query_string_encode() to use drupal_encode_path() instead of rawurlencode() (if I understand you correctly).
Well, to some extent. But I'd rather not have two versions in core for parsing URLs. I think we should combine the two in one function (assuming that the patch for that bug is accepted). Perhaps we can take this discussion in #337261: parse_url in drupal_goto breaks destination url so that it does not block this issue - drupal_goto() seems to be the only function calling your suggested drupal_parse_url().
I think query_parameters (or query_array) is better than query_string. Note that DatabaseStatementInterface::getQueryString() actually returns an SQL string with placeholders, so there is already total confusion :-) But okay, if there is agreement that we commit this in two stages, I'd be happy to roll the renaming patch when we get there.
Comment #55
sun#582456: Make confirm_form() respect query string destination will use the introduced drupal_parse_url() as well. Actually, that issue is currently blocked on this one. :(
So we need drupal_parse_url() for drupal_goto() in this patch and for confirm_form() in aforementioned issue. Yes, we can discuss further enhancements to that function in #337261: parse_url in drupal_goto breaks destination url.
Now I finally had some clues:
- drupal_encode_path() is a totally weird name. It's a wrapper around rawurlencode(). Therefore should be named drupal_rawurlencode().
- drupal_query_string_encode() is a custom implementation of http_build_query(). It returns a drupal_rawurlencode()'d query string. Therefore should be named drupal_http_build_query().
- And we now have drupal_get_query_parameters(). A bit long, but yeah. Basically, this function is a custom implementation of array_diff_key(). It just accepts different data to filter out of the first argument array. Actually, it works with any array, not necessarily query parameters. However, for now, let's go with that. Optionally re-think the function name in a separate issue.
- Renamed tablesort and pager "query_string" function suffix to "query_parameters".
- Renamed TableSort->getQueryString() to ->getQueryParameters().
- Added tests for drupal_get_query_parameters(). Let me note that the existing $exclude argument handling of the current/old drupal_encode_query_string() contains a bug. ;)
Benchmarks... it looks like my system is totally unsuitable for doing benchmarks. Anyway:
HEAD:
Patch:
Lastly, we want to put all PHP function wrappers and custom implementations of PHP functions into a new doxygen group: #584966: Add doxygen group for PHP function wrappers in Drupal
Comment #56
c960657 CreditAttribution: c960657 commenteddrupal_encode_path() should only be used on paths, not on query string parameters. That is why its name was changed from drupal_urlencode() in #310139: drupal_query_string_encode() should not call drupal_urlencode(). AFAICT you are reintroducing the bug that was fixed in #310139.
Note that when #284899: Drupal url problem with clean urls is fixed, the function will indeed be a simple wrapper around rawurlencode().
Good point.
w00t :-)
Comment #57
sunFurther discussion on IRC made me partially revert a couple of the last changes.
However, drupal_encode_path() is still moved to the URL/HTTP request handling function group and I rewrote its description to prevent this in the future.
Comment #58
sunNote to self: Always. Add. Tests.
This one should be ready to go. Added a unit test for drupal_parse_url(), and cleaned up PHPDoc some further.
Comment #59
sunRTBC, anyone?
As mentioned before, at least one issue is blocked by this one.
Comment #60
sunFixed a few nitpicks by chx.
Comment #61
Dries CreditAttribution: Dries commentedI'd love to see c960657 sign off on this patch, and mark it RTBC when he thinks it is.
Comment #62
sunc960657 somehow vanished... the latest patch is based on a IRC discussion with c960657, and based on that discussion, I think he was fine with the implementation.
Comment #63
c960657 CreditAttribution: c960657 commentedShouldn't this be
array('destination' => "node/$node->nid#comment-form")
?You should specify all three parameters for http_build_query(),
http_build_query($query, '', '&')
. Otherwise its result depends on an ini setting, i.e. the test may fail if the setting is set to an unexpected value.Apart from that it looks good.
Comment #64
sunThanks! I tried hard to catch you on IRC. :)
Attached patch fixes those two issues.
Comment #65
sunAlso preparing for #584966: Add doxygen group for PHP function wrappers in Drupal. :)
Comment #66
c960657 CreditAttribution: c960657 commentedThis looks good now :-)
Comment #67
Dries CreditAttribution: Dries commentedGreat! Good job. Committed to CVS HEAD!
Comment #68
c960657 CreditAttribution: c960657 commentedSmall follow-up based on the example URL in #337261: parse_url in drupal_goto breaks destination url. Apparently
parse_url('foo/bar:1')
returnsarray ('host' => 'foo', 'port' => 1, 'path' => '/bar:1')
.Comment #69
sunActually, it does:
But your specific example of a path containing a non-urlencoded colon is interpreted differently:
We rawurlencode() all query parameters, and we urldecode() $_GET['destination'] in drupal_goto().
Shouldn't we double-encode the 'destination' query parameter instead then, because we know that we'll urldecode it?
If not, then we at least want to make the inline comments more specific, i.e. mention the exact problem of parse_url().
Comment #70
c960657 CreditAttribution: c960657 commentedI don't think the problem is specific to destination.
drupal_parse_url('foo/bar:1')
should work.It's not a huge problem. The URL http://localhost/en/facetsearch/results/author:1 isn't generated by url() (because url() would have encoded the colon), so I guess the problem only occurs for "hand crafted" URLs. Anyway, the documentation for parse_url() explicitly says that it "doesn't work with relative URLs" (though elsewhere it states that "partial URLs are also accepted"), so I suggest we use the approach suggested in the patch anyway to guard against other (currently unknown) edge cases.'
I have extended the inline comment with an example.
Comment #71
sunThis code also runs for absolute (but not external) URLs -- so we probably want to use
here instead (ltrim(), not trim(), so a trailing slash is kept) + also add a test for that.
I'm on crack. Are you, too?
Comment #72
sunuhm, scratch that. That would make absolute paths relative. ;)
So, looks much better now, and I agree that we should do this to be on the safe side.
Comment #73
Dries CreditAttribution: Dries commentedCommitted to CVS HEAD. Thanks!
Comment #74
c960657 CreditAttribution: c960657 commentedAnother small follow-up:
In testGetAbsoluteUrl(), we need to call variable_set() instead of accessing $conf['clean_url'] directly, because $conf is overwritten by dwtc::refreshVariables() that is called from dwtc::drupalGet(). Without this patch, the test fails when clean URLs are enabled.
Comment #75
Dries CreditAttribution: Dries commentedCommitted to CVS HEAD. Thanks.
Comment #77
catchhttp://api.drupal.org/api/function/drupal_query_string_encode/6 drupal_get_query_string() and friends no longer exist, couldn't find any documentation for that in the upgrade guide.
Comment #78
jpmckinney CreditAttribution: jpmckinney commentedNo longer a WTF.
Comment #79
c960657 CreditAttribution: c960657 commented@sun, I just realized that misunderstood your question about double-encoding in #69. Somehow I missed the fact the we call urldecode() in drupal_goto(). We shouldn't do that, so I just filed #796120: Do not urldecode() $_GET parameters in drupal_goto(). Your patch didn't do any double-encoding, so with the fix for #796120 I think we're good.
Comment #80
c960657 CreditAttribution: c960657 commentedThe renamed functions are now mentioned in the upgrade guide: http://drupal.org/update/modules/6/7#get_querystring.
The change to url() was already documented: http://drupal.org/update/modules/6/7#url_query_parameter.