I get "Not a valid URL" when entering a URL that contains Hebrew characters. This is a big issue: there is no reason for it, since paths and file names can contain non-Latin characters these days.
Thanks!
Comments
Comment #1
Krummrey commented: Same for the German umlauts (äöü) and ß.
These do appear in working URLs:
www.grüne.de
Comment #2
dropcube commented: I confirm the same; it is not possible to add a URL with 'ñ' in it.
Comment #3
jcfiala commented: Quite right, folks - there shouldn't be any reason for that. I'll move this issue up in my mental list. That said, if anyone would like to work on a patch, that would be great!
Comment #4
dropcube commented
Comment #5
jcfiala commented: Better title, I agree. Also, let's get the version number correct, ya?
Comment #6
dropcube commented: Some related issues in core: http://drupal.org/project/issues/search/drupal?issue_tags=IDN
Comment #7
dropcube commented
Comment #8
dropcube commented
Comment #9
hass commented: +
Comment #10
jcfiala commented: Alright, so here's what's been done. If you're using international URLs that aren't working, you can now (in the latest patches) go into the field settings and turn off URL validation. This lets people enter just about any URL they like without throwing an error.
I've also added the ß and ñ.
Hebrew characters are giving me a bit of trouble, though. Is there an expert in the house who can help me change the regular expressions in link_validate_url() to include Hebrew characters without listing every possible one?
Comment #11
Amir Simantov commented: @#10
Good to hear that some progress has been made. However, I am afraid that neither option is good:
Turning off validation: this goes too far. One of the great benefits of the Link module is that it prevents entering a bad string by mistake.
Adding specific characters: more will turn up as people who use other languages run into this issue, so adding characters one by one is an endless task.
I will try to get some ideas from people on how to solve this problem in a more generic way.
Amir
Comment #12
hass commented: "ß" is not allowed in domain/host names (only in the path)! ä, ü, ö are allowed.
Comment #13
jcfiala commented, quoting hass: "ß" is not allowed in domain/host names (only in the path)! ä, ü, ö are allowed.
hass, you have to give the code time to get into the dev release. I only uploaded it an hour ago.
Comment #14
jcfiala commented: @11: I agree that turning off validation is excessive!
The basic problem is that I need to modify the regex to include Unicode letters without including Unicode symbols. I'm not yet familiar enough with this to do it: I tried adding \u#### to the regex and got errors back. I'm open to suggestions and help of any sort!
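A minimal sketch of the generic approach asked for here, written in Python for illustration (the module itself is PHP): instead of listing characters, match "letters and digits from any script" with a Unicode-aware class. In PHP's PCRE the analogous classes are \p{L} and \p{N} together with the /u modifier; PCRE does not accept \uXXXX escapes (it uses \x{XXXX}), which may explain the errors above. The pattern and the is_valid_path() helper are hypothetical, built from the RFC 3986 path characters, and are not link_validate_url()'s actual regex.

```python
import re

# Hypothetical path check, not link_validate_url()'s real regex.
# In a Python 3 str pattern, \w matches letters and digits from any script
# (plus "_"), so Hebrew, Cyrillic, ñ, ä and ß pass without being listed,
# while symbols such as "©" or "€" do not.
# The other characters are RFC 3986 unreserved/sub-delims plus ":" and "@".
PATH_SEGMENT = r"(?:[\w\-.~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})*"
PATH_RE = re.compile(r"(?:/" + PATH_SEGMENT + r")*")


def is_valid_path(path: str) -> bool:
    """Return True if `path` contains only characters allowed in a URL path."""
    return PATH_RE.fullmatch(path) is not None


for candidate in ["/content/הוספת-פורום", "/straße", "/a©b", "/bad%zz"]:
    print(candidate, is_valid_path(candidate))
```

One limitation of this sketch: combining marks (for example Hebrew vowel points) are not matched by \w and would need an extra class.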
Comment #15
yhager commented: Is there an RFC for this that we can use for the validation?
Comment #16
hass commented: @jcfiala: No, "ß" is a letter that is not allowed in hostnames. You cannot register domains with this letter. If you have allowed it in your code, this is wrong.
Comment #17
jcfiala commented: @hass
Ah, that's what you meant. Do you have a reference that declares "ß" character-non-grata for hostnames?
Comment #18
hass commented: http://www.google.de/m/search?q=idn+allowed+characters
Comment #19
jcfiala commented: Interesting.
Okay, I'll change the code so that "ß" is not allowed in the domain name, but is allowed in the path.
Comment #20
jcfiala commented: Okay, ß, which I have learned from Wikipedia is called the Eszett, is no longer allowed in domain names but can be used elsewhere in the URL.
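The hostname/path distinction for ß can be seen with Python's built-in idna codec, which implements IDNA 2003 (RFC 3490). Its nameprep step maps ß to "ss", so the character never survives in an encoded hostname, while in the path it is simply percent-encoded. This is a sketch for illustration only; IDNA 2008 (RFC 5892) instead treats ß as a valid, distinct character, so the right answer depends on which IDNA version a validator follows.

```python
from urllib.parse import quote

# Python's built-in "idna" codec implements IDNA 2003 (RFC 3490).
# Nameprep maps "ß" to "ss", so it cannot appear in the encoded hostname.
host_ascii = "straße.de".encode("idna").decode("ascii")

# In the path, ß is just percent-encoded as its UTF-8 bytes (C3 9F).
path_ascii = quote("/straße")

print(host_ascii)  # strasse.de
print(path_ascii)  # /stra%C3%9Fe
```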
Comment #21
hass commented: chx suggested in #389278: Create IDN encoding and decoding functions to write a separate module only for IDN validation. I also think this would be a great idea, but I'm not familiar enough with all the IDN rules to maintain such a module.
Comment #23
Amir Simantov commented: @#20 jcfiala: this is not fixed at all; reactivating.
Comment #24
jcfiala commented: Amir, can you please give more detail on what exactly is not fixed, and which version you were testing?
Giving exact URLs that failed and should not have (or that were accepted and should not have been) is _really_ useful for writing tests.
Comment #25
Amir Simantov commented: OK, here is a link to an item on the Drupal Israel site (you may see gibberish if you have no Hebrew fonts installed):
http://www.drupal.org.il/content/הוספת-פורום
Thanks.
Comment #26
unic commented: And I can't set a URL for Cyrillic domain names.
For example: http://президент.рф
Comment #27
unic commented: I think a general solution is required, regardless of culture.
Comment #28
jcfiala commented: I quite agree that a general solution is required. That's part of the reason this ticket is currently stalled: I don't have a general solution on hand, and higher-priority items are keeping me busy.
Comment #29
unic commented: I think you can use Punycode conversion and then validate as usual.
Complete Punycode converter class: http://www.phpclasses.org/browse/download/1/file/5845/name/idna_convert.....
Just
And don't forget the "рф" domain :-)
Is this helpful?
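A minimal Python sketch of the "convert to Punycode, then validate as usual" idea, assuming Python's built-in idna codec (IDNA 2003) rather than the idna_convert PHP class linked above. The function name idn_to_ascii_url() and the ASCII label rule are illustrative assumptions, not part of the Link module.

```python
import re
from typing import Optional
from urllib.parse import quote, urlsplit, urlunsplit

# Ordinary ASCII hostname label: 1-63 letters, digits or hyphens,
# not starting or ending with a hyphen (Punycode labels "xn--..." pass).
LABEL_RE = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def idn_to_ascii_url(url: str) -> Optional[str]:
    """Return an ASCII-only version of an http(s) URL, or None if invalid.

    Hypothetical helper: Punycode-encode the host, percent-encode the path
    and query, then run the usual ASCII checks on the result.
    """
    parts = urlsplit(url)
    # Only absolute http(s) URLs with a host; userinfo is rejected for brevity.
    if parts.scheme not in ("http", "https") or not parts.hostname or "@" in parts.netloc:
        return None
    try:
        port = parts.port  # raises ValueError for a malformed port
        # Built-in codec: IDNA 2003 (nameprep + Punycode), label by label.
        host = parts.hostname.encode("idna").decode("ascii")
    except ValueError:  # UnicodeError is a subclass of ValueError
        return None
    if not all(LABEL_RE.fullmatch(label) for label in host.split(".")):
        return None
    netloc = host if port is None else f"{host}:{port}"
    path = quote(parts.path, safe="/%")  # UTF-8 percent-encoding
    query = quote(parts.query, safe="=&%")
    # The fragment is passed through unchanged in this sketch.
    return urlunsplit((parts.scheme, netloc, path, query, parts.fragment))


print(idn_to_ascii_url("http://президент.рф"))
print(idn_to_ascii_url("http://www.drupal.org.il/content/הוספת-פורום"))
print(idn_to_ascii_url("http://exa mple.com"))  # None: space in the host
```

The design choice is to keep one ASCII validation path: every internationalized URL is first reduced to its ASCII form, so the existing checks never need to know about individual scripts.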
Comment #30
dqd commented: @unic, I thought the same. Can you please post your idea again there? There is a chance of getting support for this.
Please go to #1319520: Gathered: Internationalized domain names (punycode)
to join the discussion I set up on this and to contribute to a faster implementation. I am marking this issue as a duplicate to make sure that everybody joins the centralized discussion on this task, which I would like to give more attention to going forward.
Thanks for the effort.
Comment #31
virgo commented: Hi,
Is there anything that works as of now?
At the moment, if you put something like äriportaal.ee in a link field, it generates a link like yourdomain.com/%C3%A4riportaal.ee.
What can be done?
Regards,
Virgo
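The behaviour described above appears to be what happens when a link has no scheme: "äriportaal.ee" then parses as a relative path rather than a host, so it is resolved against the site and percent-encoded. A hedged Python sketch of one possible workaround, assuming that any scheme-less input not starting with "/" is meant as an external host; the helper ensure_scheme() is hypothetical and not part of the Link module.

```python
from urllib.parse import urlsplit

# Without a scheme, the host name ends up in the path component.
parts = urlsplit("äriportaal.ee")
print(repr(parts.scheme), repr(parts.netloc), repr(parts.path))


def ensure_scheme(link: str, default_scheme: str = "http") -> str:
    """Hypothetical helper: prepend a scheme to bare host names.

    Links that already have a scheme, and site-internal paths starting
    with "/", are returned unchanged.
    """
    if link.startswith("/") or urlsplit(link).scheme:
        return link
    return f"{default_scheme}://{link}"


print(ensure_scheme("äriportaal.ee"))        # http://äriportaal.ee
print(ensure_scheme("https://example.com"))  # https://example.com
print(ensure_scheme("/node/1"))              # /node/1
```

The resulting host would still need IDNA conversion before any ASCII-only validation is applied.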