Things to know:
All menu paths are URL-encoded on output (by Drupal) when placed in the GET query.
All GET values (including the menu path) are URL-decoded on input (by PHP).
This means that the URLs resulting from user-defined menu paths and aliases will always be valid, even for menu paths that use punctuation like "#" or "!" or arbitrary Unicode characters.
For example:
Path/Alias = blog/Bunnies are made of people!?
Resulting URI = http://example.com/base-path/?q=blog/Bunnies+are+made+of+people%21%3F
Path/Alias = blog/My résumé
Resulting URI = http://example.com/base-path/?q=blog/My+r%C3%A9sum%C3%A9
Path/Alias = blog/アニメ
Resulting URI = http://example.com/base-path/?q=blog/%E3%82%A2%E3%83%8B%E3%83%A1
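The encode-on-output, decode-on-input round trip can be sketched in a few lines. This is only an illustration in Python, not Drupal's or PHP's actual code: `encode_menu_path` and `decode_menu_path` are hypothetical names, and `quote_plus` / `unquote_plus` are assumed to be close enough to the PHP behaviour for these three examples.

```python
from urllib.parse import quote_plus, unquote_plus


def encode_menu_path(path: str) -> str:
    # Hypothetical helper: percent-encode the path as UTF-8, write spaces
    # as "+", and leave "/" readable, as in the URIs listed above.
    return quote_plus(path, safe="/")


def decode_menu_path(value: str) -> str:
    # Hypothetical helper: reverse the encoding, as happens to GET values
    # on input.
    return unquote_plus(value)


if __name__ == "__main__":
    aliases = [
        "blog/Bunnies are made of people!?",
        "blog/My résumé",
        "blog/アニメ",
    ]
    for alias in aliases:
        encoded = encode_menu_path(alias)
        # The alias must survive the round trip unchanged.
        assert decode_menu_path(encoded) == alias
        print("http://example.com/base-path/?q=" + encoded)
```

Because every alias decodes back to exactly what was entered, nothing in the encoding step itself requires the alias to be limited to URL-safe characters.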
In spite of this, path.module requires that path aliases contain only characters valid in relative URLs. This makes no sense. The attached patch removes this restriction.
This is a necessary step towards allowing e.g. pathauto to support arbitrary languages. The current practice of transliterating letters to ASCII and removing accents is a hack: it produces 'prettier' URLs, but they are less meaningful to search engines. It is also useless for languages which do not use the Latin script.
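To see why accent stripping breaks down outside the Latin script, here is a rough sketch. It is not pathauto's actual transliteration code; `ascii_transliterate` is a hypothetical helper that assumes the simplest possible approach (Unicode NFKD decomposition, then discarding anything that is not ASCII).

```python
import unicodedata


def ascii_transliterate(text: str) -> str:
    # Hypothetical helper: split accented letters into base letter plus
    # combining mark (NFKD), then drop every non-ASCII code point.
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


if __name__ == "__main__":
    # Latin text loses only its accents.
    print(repr(ascii_transliterate("My résumé")))
    # Katakana has no ASCII decomposition, so nothing is left.
    print(repr(ascii_transliterate("アニメ")))
```

Under this assumption the Latin alias stays recognisable, while the Japanese one is reduced to an empty string.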
Note that the 'odd' escapes for the Unicode characters above are perfectly normal. This is the standard used for IRIs (the i18n'd form of URIs, see RFC 3987), and it is supported by all the major browsers and search engines.
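The escapes are nothing more than the UTF-8 bytes of each non-ASCII character written as %XX. A minimal sketch, assuming a simplified reading of the IRI-to-URI mapping (`iri_path_to_uri` is a hypothetical name, and ASCII characters are passed through untouched without any further checks):

```python
from urllib.parse import unquote


def iri_path_to_uri(text: str) -> str:
    # Hypothetical helper: keep ASCII characters as they are and replace
    # each non-ASCII character with its UTF-8 bytes in %XX form.
    parts = []
    for char in text:
        if ord(char) < 128:
            parts.append(char)
        else:
            parts.extend(f"%{byte:02X}" for byte in char.encode("utf-8"))
    return "".join(parts)


if __name__ == "__main__":
    escaped = iri_path_to_uri("blog/アニメ")
    print(escaped)
    # Decoding the escapes as UTF-8 restores the original characters.
    print(unquote(escaped))
```

Each katakana character takes three bytes in UTF-8, which is why three characters turn into nine escapes.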
However, because of phishing abuse, some browsers will not show the Unicode characters in some or all IRIs in the address bar and/or status bar. e.g. Japanese Wikipedia on Google.
Comments
Comment #1
Steven commented:
Comment #2
chx commented: Lovely patch. Less restrictions, more features, less code, more comments.
Comment #3
Dries commented: Committed to CVS HEAD! :)