Hello,
RE: Lines 1506-1510 in apachesolr_seach.module
// Replace keys with their rawurldecoded value
if (isset($fv['search_block_form'])) {
$raw_keys = rawurlencode($fv['search_block_form']);
$form_state['redirect'] = str_replace($fv['search_block_form'], $raw_keys, $form_state['redirect']);
}
The comment says the keys will be replaced with urldecoded value, while the code replaces the keys with the urlencoded value.
This was the apparant cause for the double encoding of our search string on the search results page. The problem disappeared once I changed:
$raw_keys = rawurlencode($fv['search_block_form']) TO $raw_keys = rawurldecode($fv['search_block_form']);
Is this a legitimate bug or am I missing something? Let me know if I should generate a patch.
Thank you for all your hard work.
Regards,
Mike
Comment | File | Size | Author |
---|---|---|---|
#69 | apachesolr-1688150-69.patch | 5.06 KB | raystuart |
| |||
#67 | apachesolr-1688150-67.patch | 4.96 KB | herved |
| |||
#64 | apachesolr-1688150-64.patch | 681 bytes | shrop |
#63 | apachesolr.patch | 2.26 KB | pBartels |
#62 | apachesolr.patch | 2 KB | pBartels |
Comments
Comment #1
Nick_vhthis, my friend, is very much a bug.
Can you make a patch? This is a very good find and easy to fix.
Comment #2
HalfChem CreditAttribution: HalfChem commentedThe patch is attached.
This is my first patch, so please let me know if there is anything incorrect.
Regards,
-Mike-
Comment #3
Cameron Tod CreditAttribution: Cameron Tod commentedComment #5
Cameron Tod CreditAttribution: Cameron Tod commentedRerolled patch.
Comment #6
alanmackenzie CreditAttribution: alanmackenzie commentedI can confirm that patch in #5 applies to 7.x-1.x-dev correctly and fixes the issue.
Comment #7
Nick_vhActually the code is correct, the comment is not
You can test it :
try typing test/test/test as keyword in the search block. You'll see that with your patch, the keyword is only test instead of test/test/test. Hence the urlencode. So, I guess a patch that fixes the comment would be welcome :-)
Comment #8
Nick_vhComment #9
Nick_vhcommitted in combination with another commit. Let me know if this needs a follow-up
Comment #11
shawn_smiley CreditAttribution: shawn_smiley commentedPart of the patch in #5 didn't make it into the release/dev tree.
The relevant code that is missing is on line 1502 of apachesolr_search.module:
The code
Should be
Re-applying the patch from #5 to the rc3 release fixed the double encoding issue.
Comment #12
ianthomas_ukShaun_Smiley: See comment #7. Please can you add some steps to reproduce.
Comment #13
shawn_smiley CreditAttribution: shawn_smiley commentedHere is our environment related to search:
On the site we use Solr for both the global site search and several pre-defined search pages (e.g. All support documents) and have a number of facets enabled (mostly taxonomy facets). We also have the "Search Form" block enabled and placed in the header region of the theme.
The problem only occurs when using the "Search Form" block.
So steps to reproduce would be:
The result is that the search form block appears to be URL encoding "water savings" to "water%20savings" during the submission and then the solr module is doing another encoding changing it to "water%2520savings".
I have reproduced this on a clean Drupal/solr site as well containing just the Solr module, solr multisite, and Facet API.
Comment #14
jhedstromI'm seeing this behavior in 7.x-1.0-rc3, and can confirm that the patch from #5 is not in the repo.
Comment #15
jhedstromRerolled #5 to cleanly apply.
Comment #16
shenzhuxi CreditAttribution: shenzhuxi commentedhttp://drupal.org/node/1774916
I find it only happen on php 5.4
Is it because "phpversion()>=5.3 will compliant with RFC 3986, while phpversion()<=5.2.7RC1 is not compliant with RFC 3986."?Comment #17
pwolanin CreditAttribution: pwolanin commentedRight, so this is a bug, but the fix is clearly not complete since it breaks searches w/ slashes.
Comment #18
rszrama CreditAttribution: rszrama commentedCan we get a second review on this? I'm not sure what you mean about it breaking searches with slashes - we just put this patch on drupalcommerce.org to fix our search block woes, and it works great. Searches with slashes turn up just fine and function equivalent to Drupal core search itself, with the DC.org search URL being generated w/ this patch and the ByWombats.com URL being generated by the core search block without Apache Solr:
Upon review, it seems to be ready to commit unless you think core itself is broken without Apache Solr as well.
(On the off chance you meant forward or backslashes, backslashes are still escaped properly.)
Comment #19
pwolanin CreditAttribution: pwolanin commentedOdd, I'll have to re-test. When I tried it a day or so ago it didn't work.
Comment #20
Nick_vhSo there seems to be a lot of confusion here
If someone would be so kind to see if this is not a php issue (see http://drupal.org/node/1774916 as mentioned by shenzhuxi)
Also, try to make a simpletest (should be easy right?) that replicates the issue so we have a reproducable way of fixing this.
If we manage that, we are already one step further. I have not seen any of this behavior recently so I'd inclined to say that this could be the problem of php indeed.
Comment #21
rszrama CreditAttribution: rszrama commentedDrupalCommerce.org is hosted on Acquia Dev Cloud running PHP 5.2.4-2ubuntu5.26 for what it's worth. On ByWombats.com where the search block form submits just fine without double encoding (and without this module installed), I'm using PHP 5.2.6-2ubuntu4.2. In other words, these are both running PHP 5.2.x and the Apache Solr module is the main difference disrupting the way the search block form is working. Once patched, it works fine on DC.org.
If you look at the patch, this is a case of the exact opposite function being called with respect to the search block form (rawurlencode() vs. rawurldecode()), so I just don't get how it could be PHP related.
Comment #22
kevishie CreditAttribution: kevishie commentedPatched work perfect for me. I was getting double encoding when submitting a search with the default search block. For example a search for "Blah Blah" would return "Blah%2520Blah"
Thanks
Comment #23
shenzhuxi CreditAttribution: shenzhuxi commented@kevishie
Do you use php 5.4 built-in web server too?
http://drupal.org/node/1774916#comment-6505704
Comment #24
rszrama CreditAttribution: rszrama commented@shenzhuxi Note my comment above that this also affects 5.2.x versions.
Comment #25
shenzhuxi CreditAttribution: shenzhuxi commentedI can't reproduce it on 5.2.x after testing. It's only on php 5.4 built-in web server.
Comment #26
rszrama CreditAttribution: rszrama commentedOk; at least, it's only reproducible for you on PHP 5.4... we can't deny my test results and phpinfo()'s. : P
Could be something peculiar about the PHP configuration on Dev Cloud. : ?
Comment #27
Nick_vhBasically we need to make 100% that this works for both versions AND we need to figure out where it goes wrong.
is anyone up to some comparison?
Comment #28
rszrama CreditAttribution: rszrama commentedI'm wondering why it isn't just considered a simple typo. : ?
Comment #29
MathieuMa CreditAttribution: MathieuMa commentedPatch #15 helped me, thanks.
I'm continuing to check another issue which seem related : accents and special chars get double encoded in the request.
Here is an example for a search for
l'oréal
- interestingly the requested word is double encoded, but not the spellcheck request :GET /solr/xxx/select?start=0&rows=10&&spellcheck=true&q=l%26%23039%3Bor%C3%A9al&fl=id%2Centity_id%2Centity_type%2Cbundle%2Cbundle_name%2Clabel%2Cis_comment_count%2Cds_created%2Cds_changed%2Cscore%2Cpath%2Curl%2Cis_uid%2Ctos_name&spellcheck.q=l%27or%C3%A9al&qf=taxonomy_names%5E2.0&qf=label%5E5.0&qf=content%5E40&qf=ts_comments%5E20&qf=tos_name%5E3.0&hl.fl=content%2Cts_comments&wt=json&json.nl=map
(I'm now trying to figure how the module works - it's my first real check of this module)
Comment #30
Nick_vhThere is only 1 real test
Search for test % / and make sure that all characters are encoded correctly. If the slash does not propagate back to the search result, we did something wrong and it will break for example facetapi_pretty_paths module as it depends on slashes
Comment #31
MathieuMa CreditAttribution: MathieuMa commentedWhen doing this search, I receive the following URL :
http://xxx/search/site/test%20%25%20
I see a redirection, where the trailing / is removed.
Testing with
test é / %
removes everything after the / in the search field, but not in the URL :http://xxx/search/site/test%20é%20/%20%25
Now, there is something odd : doing the exact same test with
l'oréal
now works as expected (the patch mentioned earlier here was applied). Maybe the remaining issue was due to a caching bug somewhere (I was in production, so we have the full caching stack enabled - varnish, memcached, apc, drupal)I also added lucene-memory.jar to the java classpath this morning, it was missing (but I don't see how it could be related)
Comment #32
Nick_vhI have not been able to catch a bug aside from the spellcheck. We should look at the impact fo the urldecode for the spellcheck and perhaps add the same code there. I do agree we should check if we can make the url bar a little cleaner with those special chars.
Here is why we do the encode for the query. This might be a totally different issue however.
Comment #33
Nick_vhSo what happens is the above code executes? You'll get an url like this : "search/site/test%20%26%25%24%40%20///"
How?
What does this return?
"search/site/test%20%26%25%24%40%20///"
Solution?
We solved this by "double encoding" of the query, this might not be the best solution, I agree. But removing that is not the solution.
Proposal?
What if we only encode the slashes twice? Would this help?
Attached is a patch that does just that + an image below to prove it
Comment #34
MathieuMa CreditAttribution: MathieuMa commentedThis indeed seems to be a nice solution : it doesn't break the URL, nor does it break the repeated search in the search box.
(and we don't get an additional redirect / wrong arguments passed because of the trailing/included slash)
Now I understand why this double encoding was done, there was a reason in the first place (which I didn't caught - bit It couldn't be something done without a meaning)
Comment #35
Nick_vhCommitted to 7.x-1.x. Thanks for all the input!
Comment #36
marcelovaniI had the same problem on the search form on results page, adding the patch for 7.x
Comment #37
cpliakas CreditAttribution: cpliakas commentedHi marcelovani.
As noted in #35, this patch was already committed to the 7.x-1.x branch of Apache Solr Search Integration. However, we would love some validation that this issue affects the 6.x-3.x branch, and if so we would be accepting patches for that version of the module. Reverting the status.
Thanks,
Chris
Comment #38
marcelovaniHi cpliakas
I believe the patch commited was for apachesolr_search.module, which fixes the problem with the form in the search box.
The patch I am submitting is for apachesolr_search.pages.inc, which fixes the problem with the form in the results page.
Maybe I should have put it into another issue?
Comment #39
brant CreditAttribution: brant commentedAgree with marcelovani -- had the same problem with an apachesolr_search_custom_page_search_form, and the patch in #36 to apachesolr_search.pages.inc fixes it for me (under 7.x). Changing status to needs review.
FWIW, for those running IIS (all 3 of you), this issue will trigger the following exception:
[HttpException (0x80004005): A potentially dangerous Request.Path value was detected from the client (%).]
...as the requestFiltering default config thinks the double-encoding is a bad idea.
HTH!
Comment #40
brant CreditAttribution: brant commentedDo note, however, that with these two patches applied, forward-slashes are still double-encoded (which is incorrect according to RFC), and any search containing a forward-slash will still cause the exception mentioned above under IIS. I do not know what the right answer here is, but seems like it might be outside of the _submit() functions.
Comment #41
jos_s CreditAttribution: jos_s commentedI also applied patch #36 to apachesolr_search.pages.inc. This solved the problem for me. I suggest it is committed there too.
Comment #42
nrahlstr CreditAttribution: nrahlstr commentedI am still seeing this error on Drupal-7.22 and apachesolr-7.x-1.3 even though your patch is definitely applied. When searching for "test &%$@ //// fishing" I get the resulting URL:
/search/site/test%2520%2526%2525%2524%2540%2520%252F%252F%252F%252F%2520fishing
and a not very pretty Apache Internal Server Error page.
No doubt it could be a problem between the chair and the keyboard...or has something else changed?
Here is what the relevant code looks like:
Thanks for any input!
Nathan
Comment #43
nrahlstr CreditAttribution: nrahlstr commentedOne additional datapoint to mention. RHEL6 php:
PHP 5.3.3 (cli) (built: Nov 29 2012 04:12:23)
Copyright (c) 1997-2010 The PHP Group
Zend Engine v2.3.0, Copyright (c) 1998-2010 Zend Technologies
Thanks!
Nathan
Comment #44
nrahlstr CreditAttribution: nrahlstr commentedI think the php version is the issue. I'll investigate and respond back here.
Thanks!
Nathan
Comment #45
nrahlstr CreditAttribution: nrahlstr commentedTurns out my issue as reported last month had nothing to do with the PHP version.
We have CA SiteMinder around our Drupal site that was intercepting the encoded characters as XSS and shutting down the page...wow was that a hard one to find!
I've built a work-around into the module in my implementation. Sadly I think I will have to maintain a local patch.
Thanks,
Nathan
Comment #46
nrahlstr CreditAttribution: nrahlstr commentedI wrote up a description of how I resolved my siteminder + crazy (XSS potential) search terms on my blog here:
http://wp.me/p3OExS-K
Thanks!
Nathan
Comment #47
Nick_vhSetting this back to 6.x-3.x and needs backport. Any other adjustment to 7.x-1.x requires a new issue. Thanks for all the valuable input!
Comment #48
barancekk CreditAttribution: barancekk commentedAfter using the patch from from #36 slash "/" character gets properly into url but didn't get to the search phrase. For example when searching for "1/2inch screew" url redirect url is proper "site/custom_search/1%252f2inch%20screew" but the search phrase is only "1" and everything after "/" dissapears with slash too. I found that this is happening for custom defined searches that are handlend with apachesolr_search_custom_page() callback.
I am using 6.x-3.0-rc1+55-dev version when this is definitely happening. Found out this is happening because when drupal bootsraps and choosing which callback to use from menu_router it parses arguments by the "/" character. For example if I have defined "custom_search/%" as a custom search callback and I am passing it "custom_search/1%252f2inch%20screew" drupal encodes this url and makes "custom_search/1/2inch screew" and routes is as "custom_search/%" where is passing 1 as an argument "$keys" for apachesolr_search_custom_page() callback.
Because of this I made a patch that will add the rest of the search string to the $keys in apachesolr_search_custom_page() to make work the custom search work properly. I am not sure which versions of the apachesolr module have still this problem but can confirm for "6.x-3.0-rc1+55-dev".
Patch is in the attachment.
Comment #49
barancekk CreditAttribution: barancekk commentedThis is patch for comment #48.
Comment #50
yannickooCan anyone link the commit to the
7.x-1.x
branch? I cannot find it :/Comment #51
oknateThis fix worked for me:
see
https://www.drupal.org/files/1688150-double_encoding_search_terms.patch
at
https://www.drupal.org/node/1688150
Comment #52
oknateComment #53
oknateright now, when it redirects, it double encodes the search term in the url,
Search for anything (2+ words) http://qa.nysenate.codeandtheory.net/search/global/jesse%2520hamilton
2. Observe copy below "Search Results"
Actual result: YOUR SEARCH FOR "JESSE%20HAMILTON" GAVE BACK 90 RESULTS.
Comment #54
edmund518 CreditAttribution: edmund518 as a volunteer commentedTo get rid of the space(s) at the begin and end of the string:
$redirect_value = trim(str_replace("/","%2f",$form_state['values']['keys']));
Personally I preferred this method
or
$redirect_value = trim(rawurldecode($form_state['values']['keys']));
Comment #55
edmund518 CreditAttribution: edmund518 as a volunteer commentedThe patch...
Comment #56
pwolanin CreditAttribution: pwolanin as a volunteer and at Acquia commentedComment #57
alibama CreditAttribution: alibama as a volunteer commentedi'm still seeing this error.... really breaks the site to pieces... tried latest patch on latest dev
Comment #58
joachim CreditAttribution: joachim commentedWhat I'm seeing here is that this issue isn't yet fixed, though a patch was committed some time ago, and made it into the 7.x-1.0 release:
However, with 7.x-1.7, a search term of "<>chief O'Brien<b>" gets me a URL of search/site/%253Cb%253Echief%2520O%2527Brien%253Cb%253E (and a page title of "%3Cb%3Echief%20O%27Brien%3Cb%3E" but that is a separate problem and I'll file another issue with a patch).
Patch #55:
> To get rid of the space(s) at the begin and end of the string:
That's best left to another issue to avoid muddying the waters. This issue is already pretty hard to follow!
Patch #52: this fixes the problem for me. However, the documentation above that line is now completely wrong:
I am not sure what it should say instead! So setting this to needs work.
Comment #59
alibama CreditAttribution: alibama as a volunteer commented@joachim - fwiw I updated from solr 3.x to solr 5.x and that seems to have resolved this.... so it may be a non-issue for newer sites (btw solr 5 works **really** well)
Comment #60
joachim CreditAttribution: joachim commentedMy site is on 4.10.
Comment #61
joachim CreditAttribution: joachim commentedAh, with patch #55, a '/' in the search terms causes anything after it to get ignored, because it becomes a path component delimiter in the URL.
Eg searching for 'Either/Or' produces a URL of /search/site/Either/Or and the search term is thus taken to be just 'Either'.
Comment #62
pBartels CreditAttribution: pBartels commentedHello,
the apachesolr search did not really work for us without the double encoded search string.
When submitting a search via the drupal search block it did not work for example for search strings including + and &.
When submitting a search via the search block provided by apachesolr on the search page it did work, because of the double encoding.
When applying a filter or changing a sort order it stopped working again because the accordings links were not properly (double) encoded.
An easy solution was to double encode all filter and sort links and the search block search submission. I had to ensure as well to decode html entities submitted by the drupal search block autocompletion because this broke the solr search as well.
I provided an according patch.
Another solution I tried was to remove the double encoding from the search string in apachesolrs custom search block. But it seems apachesolr or drupal only works with the double encoded search strings, as drupal or better php, seems to perform a urldecode on the search parameters before passing the search to apachesolr. Search strings including an ampersand are rendered unusable this way as "TermA&TermB" is transformed to only "TermA", as drupal seems to think "TermB" is a variable (afaik).
I hope this helps somebody and will be fixed in upcoming releases. With the patch, eveything works fine now for us.
Comment #63
pBartels CreditAttribution: pBartels commentedforgot the pagination...
Comment #64
shrop CreditAttribution: shrop at Mediacurrent commentedUploading a rerolled patch for #51 by oknate so it will apply cleanly. Removed paths in the diff that included "docroot/sites/all/modules/contrib/apachesolr".
Comment #65
shrop CreditAttribution: shrop at Mediacurrent commentedComment #66
shrop CreditAttribution: shrop at Mediacurrent commentedComment #67
herved CreditAttribution: herved commentedHello everyone, I just stumbled on this issue.
1. The path in $form_state['redirect'] will go through rawurlencode() in url() so we can't use rawurlencode() here otherwise we would double encode it. I agree with #55 to only encode the slashes ('/' to '%2f').
2. So this means we need to do the opposite ('%2f' to '/') when retrieving the search string in apachesolr_search_custom_page().
ATM that function retrieves the search string like this:
$keys = rawurldecode($keys);
This is not what we want because $keys comes from the menu router page_arguments and is already url-decoded.
We don't want to double decode the search string.
So we should only decode the slash we encoded earlier. So it becomes:
$keys = str_replace('%2f', '/', $keys);
3. This also impacts the title callbacks which should also do the same ('%2f' to '/').
In this patch I changed a bit the behavior of titles slightly:
- always replace % placeholders (with empty string if needed).
So if no search strings were entered we don't get for example: "Search results: %terms" but "Search results: "
- and it also includes the fix from #1314664-45: Search pages custom pages title setting does nothing
Let me know if this works for you,
Hervé
Comment #68
kerby70 CreditAttribution: kerby70 at FFW commentedAside: For a very similar, but custom issue.
I had a similar issue on a custom search form block. Had some issues when trying a similar solution to the one discussed here, the resulting encoded string was different than the on-page form would create. I found it was simpler to rawurldecode & re-urlencode like this `$search_query = urlencode(rawurldecode($search))`. Some of the issues I had were related to spaces ` ` vs `+` and `&` vs `%26`. For instance `/search?search=test & test2` vs `/search?search=test+%26+test2`.
Comment #69
raystuart CreditAttribution: raystuart at Tag1 Consulting commentedRe-rolling #67