Background Info
On the backend, I have a node type called "Burst" with a custom field to store a URL (provided by the Link module, of course).
On the frontend, I want to take the URL the user entered in that custom field and link to it.
My needs are simple so I went to "Manage display" for the "Burst" content type ( found at /admin/structure/types/manage/burst/display ) and made the field visible. Additionally, I set the format to "Title, as link (default)"
Encountered problem
Thus far, everything has been working well with each "Burst" content linking to the stored URL. However, in one instance, I found it butchering the URL as follows:
User input URL stored by the custom field:
http://irinawerning.com/index.php?/back-to-the-fut/back-to-the-future/
The URL on the frontend:
http://irinawerning.com/index.php?%2Fback-to-the-fut%2Fback-to-the-futur...
I only caught this problem when I was browsing and clicking around. Clicking the link on the frontend for this particular case takes me to a blank page. At first, I thought the stored URL was wrong but that's not the case. Looking at the HTML source code, I get the above different URL. The forward slashes "/" are changed to "%2F" rendering the link wrong. (uh... not working)
I am by no means knowledgeable in this area and I am taking a stab in the air but this seems somewhat related to URL encoding issues. That's my best guess.
Either way, please help.
| Comment | File | Size | Author |
|---|---|---|---|
| #38 | link_module_displays-1914072-38.patch | 1.97 KB | sinduri |
Comments
Comment #1
eevensen commentedSubscribe
Comment #2
KoshaK commentedthere is SEVERAL issues opened already, including me, that relates to this. I belive the problem is in " ? " query used in the link. everithing after the " ? " converts in to %code , for testing purposed removing ? from the link dispayed fine, but ofcause the link will not work.
Comment #3
ottar commentedAccording to http://perishablepress.com/stop-using-unsafe-characters-in-urls/, a URL usually "has the same interpretation when an octet is represented by a character and when it encoded. However, this is not true for reserved characters: encoding a character reserved for a particular scheme may change the semantics of a URL.".
The RFC 1738 standard for URI queries lists $ & + , / : ; = ? @ as reserved characters that only need encoding when not used for their defined reserved purposes.
Unfortunately, the Link Module encodes reserved characters (at least some of them) that are part of the URI query part, i.e. everything that follows after the ?. This may therefore create problems. Here's an example:
The European Commission's so-called comitology register is using URLs of the following type: http://ec.europa.eu/transparency/regcomitology/index.cfm?do=search.docum...
which includes the following query string
?do=search.documentdetail&x8OmaW%2FL46BuLhL6WqnmkJEglLwooD5HgaQpDucw9D0n%2FQhs71dMAJ5dvcXCvNIj
When put into a link field, the / (forward slash) will be transformed (encoded) into %2F, resulting in the URL:
http://ec.europa.eu/transparency/regcomitology/index.cfm?do=search.docum...
where the query string now reads
?do=search.documentdetail&x8OmaW%2FL46BuLhL6WqnmkJEglLwooD5HgaQpDucw9D0n%2FQhs71dMAJ5dvcXCvNIj
Here, the two / have been encoded into %2F (but the & has not been encoded?!). This encoding seems to change the semantic of the URL, as pointed out above. Instead of taking you to the specific page, the encoded URL now brings you to the site's homepage.
So, can anyone explain why the Link Module encodes (some) reserved characters in ? query strings? Could or should this be changed in a future version? If not, it seems that I am not the only one who would highly appreciate a patch that would stop/avoid such encoding!
Comment #4
gunzip commentedsame for me. i've got unreachable urls with colon and semicolon wrongly encoded...
Comment #5
gunzip commentedComment #6
summit commentedHi,
This is set as head-issue about this.
In https://www.drupal.org/node/1984398 I see a patch for question mark.
Looking forward for a permanent solution for this link malforming!
Greetings, Martijn
Comment #7
dgtlmoon commentedA work around for those affected, this patch needs its equivalent added to the .test
Comment #8
dgtlmoon commentedupdated patch, it was not taking into consideration #2333119: Output broken when using array parameters in query
Comment #9
dgtlmoon commentedFinal patch, removes the original query
Comment #10
mh86 commentedI'm facing the same problem. Tested the patch from #9 and it did not immediately solve my issue with semicolons in query strings.
I had to add the semicolon to following line:
if ($query_parts = preg_split('/(&|&|=|;)/i', $parts['query'])) {instead of
if ($query_parts = preg_split('/(&|&|=)/i', $parts['query'])) {This got the link somehow working, even though the original semicolons got replaced by ampersands in the final URL. Also there are probably more characters we need to take care of ($ & + , / : ; = ?)
Furthermore I'm wondering why we are splitting external URLs in such a complicated way at all, why not just use the URL the user provided (with check_url))?
Comment #11
dgtlmoon commentedOK so slight refactor
Comment #13
dgtlmoon commentedThird time
Comment #14
dgtlmoon commentedComment #17
dgtlmoon commentedThere's some really horribly named and commented functions in this module
Comment #18
dgtlmoon commentedComment #19
dgtlmoon commentedComment #20
klausiThat patch does not work with http://ec.europa.eu/transparency/regcomitology/index.cfm?do=search.docum... from above or http://example.com/cgi-bin/appl/selfservice.pl?action=jobdetail;job_pub_...
So for user-entered external links we must touch them as little as possible to keep them as is. Here is a solution that emulates check_url() but does not mess up the "&" character.
Comment #22
klausiWe should also check for UTF-8, same as check_url() would do.
Comment #25
prasannag commentedThe patch at #22 is working fine. +RTBC
Comment #26
prasannag commentedComment #27
Paul Lomax commentedPatch in #22 fixes some long standing issues for us! +RTBC
Comment #28
xen commentedNew version of #22. I believe that no html quoting should be necessary as l() should take care of that. The non-external link handling code doesn't do any html quoting.
Comment #30
xen commentedGah, forgot status.
Comment #32
amkloseI might be doing this wrong. I'm trying to link to an internal page while including url parameters for the view filters.
The URL I entered in the link field is
/request-a-quote?field_expertise=64,but it's rendered as
/request-a-quote%3Ffield_expertise%3D64I applied patch #28 and that doesn't seem to make any difference. Does anybody know what I might be doing wrong?
Comment #33
klausi@Xen: Right, just tested that we don't need the extra HTML quoting.
Patch does not apply anymore, rerolled.
Comment #34
klausiPatch does not apply anymore, rerolled.
Comment #35
kyuubi commentedHi,
I'm having the same issue and the patch doesn't fix it.
Like @amklose mentioned in my case a url like:
hybridauth/window/Google?destination=frontpage%3Fredirect%3Dfirst-login&destination_error=frontpage
becomes
hybridauth/window/Google%3Fdestination%3Dfrontpage%253Fredirect%253Dfirst-login%26destination_error%3Dfrontpage
I think the issue is that the patch assumes this problem only occurs for external urls while this happen for internal ones as well.
Thanks,
Duarte
Comment #36
pifagorComment #37
klausiLet's focus this issue on external links only for now to not make the scope here too big.
The patch is still ready for review, we just need some test cases to demonstrate what we want to fix here.
Comment #38
sinduriNew version of #34