Background Info

On the backend, I have a node type called "Burst" with a custom field to store a URL (provided by the Link module, of course).
On the frontend, I want to take the URL the user entered in that custom field and link to it.

My needs are simple so I went to "Manage display" for the "Burst" content type ( found at /admin/structure/types/manage/burst/display ) and made the field visible. Additionally, I set the format to "Title, as link (default)"

Encountered problem

Thus far, everything has been working well with each "Burst" content linking to the stored URL. However, in one instance, I found it butchering the URL as follows:

User input URL stored by the custom field:
http://irinawerning.com/index.php?/back-to-the-fut/back-to-the-future/

The URL on the frontend:
http://irinawerning.com/index.php?%2Fback-to-the-fut%2Fback-to-the-futur...

I only caught this problem when I was browsing and clicking around. Clicking the link on the frontend for this particular case takes me to a blank page. At first, I thought the stored URL was wrong but that's not the case. Looking at the HTML source code, I get the above different URL. The forward slashes "/" are changed to "%2F" rendering the link wrong. (uh... not working)

I am by no means knowledgeable in this area and I am taking a stab in the air but this seems somewhat related to URL encoding issues. That's my best guess.

Either way, please help.

Comments

eevensen’s picture

Subscribe

KoshaK’s picture

there is SEVERAL issues opened already, including me, that relates to this. I belive the problem is in " ? " query used in the link. everithing after the " ? " converts in to %code , for testing purposed removing ? from the link dispayed fine, but ofcause the link will not work.

ottar’s picture

According to http://perishablepress.com/stop-using-unsafe-characters-in-urls/, a URL usually "has the same interpretation when an octet is represented by a character and when it encoded. However, this is not true for reserved characters: encoding a character reserved for a particular scheme may change the semantics of a URL.".

The RFC 1738 standard for URI queries lists $ & + , / : ; = ? @ as reserved characters that only need encoding when not used for their defined reserved purposes.

Unfortunately, the Link Module encodes reserved characters (at least some of them) that are part of the URI query part, i.e. everything that follows after the ?. This may therefore create problems. Here's an example:

The European Commission's so-called comitology register is using URLs of the following type: http://ec.europa.eu/transparency/regcomitology/index.cfm?do=search.docum...

which includes the following query string

?do=search.documentdetail&x8OmaW%2FL46BuLhL6WqnmkJEglLwooD5HgaQpDucw9D0n%2FQhs71dMAJ5dvcXCvNIj

When put into a link field, the / (forward slash) will be transformed (encoded) into %2F, resulting in the URL:
http://ec.europa.eu/transparency/regcomitology/index.cfm?do=search.docum...

where the query string now reads

?do=search.documentdetail&x8OmaW%2FL46BuLhL6WqnmkJEglLwooD5HgaQpDucw9D0n%2FQhs71dMAJ5dvcXCvNIj

Here, the two / have been encoded into %2F (but the & has not been encoded?!). This encoding seems to change the semantic of the URL, as pointed out above. Instead of taking you to the specific page, the encoded URL now brings you to the site's homepage.

So, can anyone explain why the Link Module encodes (some) reserved characters in ? query strings? Could or should this be changed in a future version? If not, it seems that I am not the only one who would highly appreciate a patch that would stop/avoid such encoding!

gunzip’s picture

same for me. i've got unreachable urls with colon and semicolon wrongly encoded...

gunzip’s picture

Priority: Normal » Critical
summit’s picture

Hi,
This is set as head-issue about this.
In https://www.drupal.org/node/1984398 I see a patch for question mark.

Looking forward for a permanent solution for this link malforming!
Greetings, Martijn

dgtlmoon’s picture

Version: 7.x-1.1 » 7.x-1.x-dev
StatusFileSize
new991 bytes

A work around for those affected, this patch needs its equivalent added to the .test

dgtlmoon’s picture

Status: Active » Needs review
StatusFileSize
new979 bytes

updated patch, it was not taking into consideration #2333119: Output broken when using array parameters in query

dgtlmoon’s picture

StatusFileSize
new1017 bytes

Final patch, removes the original query

mh86’s picture

Status: Needs review » Needs work

I'm facing the same problem. Tested the patch from #9 and it did not immediately solve my issue with semicolons in query strings.
I had to add the semicolon to following line:
if ($query_parts = preg_split('/(&|&|=|;)/i', $parts['query'])) {

instead of
if ($query_parts = preg_split('/(&|&|=)/i', $parts['query'])) {

This got the link somehow working, even though the original semicolons got replaced by ampersands in the final URL. Also there are probably more characters we need to take care of ($ & + , / : ; = ?)

Furthermore I'm wondering why we are splitting external URLs in such a complicated way at all, why not just use the URL the user provided (with check_url))?

dgtlmoon’s picture

Status: Needs work » Needs review
StatusFileSize
new1.57 KB

OK so slight refactor

Status: Needs review » Needs work

The last submitted patch, 11: 1914072-query-parts.patch, failed testing.

dgtlmoon’s picture

StatusFileSize
new1.16 KB

Third time

dgtlmoon’s picture

Status: Needs work » Needs review

The last submitted patch, 7: 1914072-query-parts.patch, failed testing.

Status: Needs review » Needs work

The last submitted patch, 13: 1914072-query-parts.patch, failed testing.

dgtlmoon’s picture

There's some really horribly named and commented functions in this module

/**
 * Unpacks the item attributes for use.
 */
function _link_load($field, $item, $instance) {
dgtlmoon’s picture

StatusFileSize
new1.08 KB
dgtlmoon’s picture

Status: Needs work » Needs review
StatusFileSize
new1.08 KB
klausi’s picture

StatusFileSize
new2.19 KB

That patch does not work with http://ec.europa.eu/transparency/regcomitology/index.cfm?do=search.docum... from above or http://example.com/cgi-bin/appl/selfservice.pl?action=jobdetail;job_pub_...

So for user-entered external links we must touch them as little as possible to keep them as is. Here is a solution that emulates check_url() but does not mess up the "&" character.

Status: Needs review » Needs work

The last submitted patch, 20: link-external-1914072-29.patch, failed testing.

klausi’s picture

Status: Needs work » Needs review
StatusFileSize
new2.36 KB
new1.26 KB

We should also check for UTF-8, same as check_url() would do.

Status: Needs review » Needs work

The last submitted patch, 22: link-external-1914072-22.patch, failed testing.

The last submitted patch, 22: link-external-1914072-22.patch, failed testing.

prasannag’s picture

The patch at #22 is working fine. +RTBC

prasannag’s picture

Status: Needs work » Reviewed & tested by the community
Paul Lomax’s picture

Patch in #22 fixes some long standing issues for us! +RTBC

xen’s picture

StatusFileSize
new1.97 KB

New version of #22. I believe that no html quoting should be necessary as l() should take care of that. The non-external link handling code doesn't do any html quoting.

The last submitted patch, 22: link-external-1914072-22.patch, failed testing.

xen’s picture

Status: Reviewed & tested by the community » Needs review

Gah, forgot status.

Status: Needs review » Needs work

The last submitted patch, 28: link_module_displays-1914072-28.patch, failed testing.

amklose’s picture

I might be doing this wrong. I'm trying to link to an internal page while including url parameters for the view filters.

The URL I entered in the link field is /request-a-quote?field_expertise=64,
but it's rendered as /request-a-quote%3Ffield_expertise%3D64

I applied patch #28 and that doesn't seem to make any difference. Does anybody know what I might be doing wrong?

klausi’s picture

Status: Needs work » Needs review
StatusFileSize
new1.62 KB

@Xen: Right, just tested that we don't need the extra HTML quoting.

Patch does not apply anymore, rerolled.

klausi’s picture

StatusFileSize
new1.66 KB

Patch does not apply anymore, rerolled.

kyuubi’s picture

Hi,
I'm having the same issue and the patch doesn't fix it.
Like @amklose mentioned in my case a url like:

hybridauth/window/Google?destination=frontpage%3Fredirect%3Dfirst-login&destination_error=frontpage

becomes

hybridauth/window/Google%3Fdestination%3Dfrontpage%253Fredirect%253Dfirst-login%26destination_error%3Dfrontpage

I think the issue is that the patch assumes this problem only occurs for external urls while this happen for internal ones as well.

Thanks,
Duarte

pifagor’s picture

Status: Needs review » Needs work
klausi’s picture

Title: Link Module displays malformed URL » Link Module displays malformed external URL
Issue tags: +Needs tests

Let's focus this issue on external links only for now to not make the scope here too big.

The patch is still ready for review, we just need some test cases to demonstrate what we want to fix here.

sinduri’s picture

StatusFileSize
new1.97 KB

New version of #34