Hi,
I see this module might switch maintainers, but this feature request is pretty simple, and I've even solved it locally on my site (internal company site). If I submit a URL that contains a "^" character, it displays the "Not a Valid URL" error and will not accept it. The "^" character is a valid character in a URL and does not need to be escaped using "%". For me, the solution was to edit line 677 of link.module:

// $query = "(\/?\?([?a-z0-9+_|\-\.\/\\\\%=&,$'():;*@\[\]]*))";
$query = "(\/?\?([?a-z0-9+_|\^\-\.\/\\\\%=&,$'():;*@\[\]]*))";
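
For illustration, the effect of the one-character change can be checked in isolation. This is only a minimal sketch: the helper name query_matches(), the anchoring with ^ and $, the /i flag, and the sample query string are assumptions made for the example, not code taken from link.module.

```php
<?php
// Hypothetical helper (not part of link.module): tests a candidate query
// string against one of the two $query fragments quoted above.
function query_matches(string $query_pattern, string $candidate): bool {
  // Anchoring the fragment and matching case-insensitively is an assumption
  // made for this sketch; the module may wrap the fragment differently.
  return preg_match('/^' . $query_pattern . '$/i', $candidate) === 1;
}

// The original character class, without the caret.
$original = "(\/?\?([?a-z0-9+_|\-\.\/\\\\%=&,$'():;*@\[\]]*))";
// The locally patched character class, with "\^" added.
$patched = "(\/?\?([?a-z0-9+_|\^\-\.\/\\\\%=&,$'():;*@\[\]]*))";

// Made-up query string containing an unencoded caret.
$sample = '?filter=^abc';

var_dump(query_matches($original, $sample)); // bool(false)
var_dump(query_matches($patched, $sample));  // bool(true)
```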

There might be some other valid characters that could be escaped as well. I only came across this one because we use it as part of a basic regular expression passed in the URL. Thanks!

Paul

Comments

ridgerunner

Wrong! The caret ("^") is NOT a valid URL character and must be percent encoded if you follow the recommendations of the IETF. This is one of the "unsafe" characters defined in RFC1738 (which was updated by RFC3986).

RFC1738: 2.2 URL Character Encoding Issues
...
Unsafe:

Characters can be unsafe for a number of reasons. The space
character is unsafe because significant spaces may disappear and
insignificant spaces may be introduced when URLs are transcribed or
typeset or subjected to the treatment of word-processing programs.
The characters "<" and ">" are unsafe because they are used as the
delimiters around URLs in free text; the quote mark (""") is used to
delimit URLs in some systems. The character "#" is unsafe and should
always be encoded because it is used in World Wide Web and in other
systems to delimit a URL from a fragment/anchor identifier that might
follow it. The character "%" is unsafe because it is used for
encodings of other characters. Other characters are unsafe because
gateways and other transport agents are known to sometimes modify
such characters. These characters are "{", "}", "|", "\", "^", "~",
"[", "]", and "`".

All unsafe characters must always be encoded within a URL.
...

RFC3986 updated RFC1738 to allow the tilde character ("~") to be used (and the "#" character is the reserved delimiter for the fragment portion of the URI), but the "^" character must still be percent-encoded if you wish to adhere to the IETF recommendations. That said, yes, some people do not follow the recommended procedures and use the caret unencoded in the query portion of the URI, but this is still a mangling of the standard and certainly should not be encouraged IMHO.
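
As a rough sketch of the encoded alternative: PHP's built-in rawurlencode() percent-encodes the caret as %5E, so a value containing "^" can be passed in the query string without relaxing the validation pattern. The parameter name "filter", the sample value, and the example.com URL are made up for illustration.

```php
<?php
// Made-up value containing a caret, e.g. a small regular expression.
$value = '^abc';

// rawurlencode() leaves only letters, digits and "-", "_", ".", "~" unencoded,
// so the caret becomes %5E.
$encoded = rawurlencode($value);
var_dump($encoded); // string(6) "%5Eabc"

// Hypothetical URL built with the encoded value.
$url = 'http://example.com/search?filter=' . $encoded;
var_dump($url); // string(39) "http://example.com/search?filter=%5Eabc"

// The receiving side recovers the original value by decoding.
var_dump(rawurldecode($encoded) === $value); // bool(true)
```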

sreynen

Status: Active » Closed (won't fix)

#1 seems like a good reason not to "fix" this. The whole point of this module is to enforce valid URLs, and this would work against that.