I get "Not a valid URL" when entering a URI that contains Hebrew characters. This is a big issue - there is no logic behind it, as paths and file names may contain characters other than Latin ones these days.

Thanks!

Comments

Krummrey’s picture

Same for the German umlauts (äöü) and ß.

These do appear in working URLs, for example:
www.grüne.de

dropcube’s picture

I can confirm the same: it is not possible to add a URL with 'ñ' in it.

jcfiala’s picture

Assigned: Unassigned » jcfiala

Quite right, folks - there shouldn't be any reason for that. I'll move this issue up in my mental list. That said, if anyone would like to work on a patch, that would be great!

dropcube’s picture

Title: Allow non-English characters in a URI » Allow Internationalized domain name and non-ASCII characters in a URI
jcfiala’s picture

Version: 6.x-2.5 » 6.x-2.6

Better title, I agree. Also, let's get the Version number correct, ya?

dropcube’s picture

Title: Allow Internationalized domain name and non-ASCII characters in a URI » Allow Internationalized domain name
Issue tags: +IDN
dropcube’s picture

Title: Allow Internationalized domain name » Allow Internationalized domain names
hass’s picture

+

jcfiala’s picture

Version: 6.x-2.6 » 6.x-2.x-dev
Priority: Critical » Normal
Status: Active » Postponed (maintainer needs more info)

Alright, so here's what's been done. If you're using international URLs that aren't working, you can now (in the latest patches) go into the field settings and turn off URL validation. This will allow people to enter just about any URL they feel like, without throwing an error.

I've also added the ß and ñ.

Hebrew characters are giving me a bit of trouble, though. Is there an expert in the house who can work with me on how to change the regular expressions in link_validate_url() to include the Hebrew characters without listing every possible one?

Amir Simantov’s picture

@#10
Good to hear that some progress has been made. However, I am afraid that neither of the options is good:

Turning off validation - well, this is going too far - one of the greater benefits of using the link module is that it prevents entering a bad string by mistake.

Adding specific characters - there will be more to find as people who use other languages run into this issue; adding more and more specific characters is an endless task.

I will try to get some ideas from people on how to solve this problem in a more generic way.

Amir

hass’s picture

"ß" is not allowed in domain/host names (only in the path)! ä, ü, ö are allowed.

jcfiala’s picture

"ß" is not allowed in domain/host names (only in the path)! ä, ü, ö are allowed.

hass, you have to give the code time to get into the dev release. I only uploaded it an hour ago.

jcfiala’s picture

@11: I agree that turning off validation is excessive!

The base problem is that I need to modify the regex to include Unicode letters without also including Unicode symbols. This is something I'm not familiar enough with to do yet - I attempted to add \u#### to the regex, and got errors back. I'm open to suggestions and help of any sort!
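One possible direction, as a minimal sketch only: PHP's PCRE does not accept \u#### escapes, but with the /u modifier it understands Unicode property classes such as \p{L} (any letter), so letters can be allowed without listing them and without admitting symbols. The function name example_is_url_word() is hypothetical and is not part of the Link module.

```php
<?php
// Hypothetical helper for illustration; not the module's link_validate_url().
function example_is_url_word($text) {
  // /u makes PCRE treat the pattern and the subject as UTF-8.
  // \p{L} = any Unicode letter, \p{M} = combining marks, \p{N} = any number.
  // Symbols (such as U+2603, the snowman) are in none of these classes.
  return preg_match('/^[\p{L}\p{M}\p{N}_-]+$/u', $text) === 1;
}

var_dump(example_is_url_word('הוספת-פורום')); // Hebrew letters and a hyphen: true
var_dump(example_is_url_word('grüne'));       // true
var_dump(example_is_url_word('☃'));           // a symbol, not a letter: false
```

A single code point can still be written explicitly as \x{05D0} under /u if it ever needs special treatment.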

yhager’s picture

Is there an RFC for this we can use for the validation?

hass’s picture

@jcfiala: No, "ß" is a letter that is not allowed in hostnames. You cannot register domains with this letter. If you have allowed it in your code, this is wrong.

jcfiala’s picture

@hass

Ah, that's what you meant. Do you have a reference that declares "ß" character-non-grata for hostnames?

hass’s picture

jcfiala’s picture

Status: Postponed (maintainer needs more info) » Active

Interesting.

Okay, I'll change the code so that "ß" is not allowed in the domain name, but is allowed in the path.

jcfiala’s picture

Status: Active » Fixed

Okay, the ß, which I have learned from Wikipedia to call the Eszett character, is no longer allowed in domain names, but can be used elsewhere in the URL.
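For readers following along, the host-versus-path rule described here could look roughly like the sketch below. It is not the committed code: the function name example_validate_parts() and both patterns are assumptions made for illustration, and ports, query strings and fragments are simply rejected to keep it short.

```php
<?php
// Hypothetical sketch: different character rules for the host and the path.
function example_validate_parts($url) {
  // Split by regex; optional http(s) scheme, then host, then optional path.
  if (!preg_match('~^(?:https?://)?([^/?#:]+)(/[^?#]*)?$~u', $url, $m)) {
    return FALSE;
  }
  $host = $m[1];
  $path = isset($m[2]) ? $m[2] : '';
  // Host: dot-separated labels of letters, digits and hyphens, but no Eszett.
  $host_ok = preg_match('/^(?:[\p{L}\p{N}-]+\.)+[\p{L}\p{N}-]+$/u', $host) === 1
    && strpos($host, 'ß') === FALSE;
  // Path: any letter (ß included), marks, digits and a little punctuation.
  $path_ok = $path === ''
    || preg_match('~^[\p{L}\p{M}\p{N}/._%-]+$~u', $path) === 1;
  return $host_ok && $path_ok;
}

var_dump(example_validate_parts('http://www.grüne.de'));      // true
var_dump(example_validate_parts('http://straße.de/'));        // false, ß in host
var_dump(example_validate_parts('http://example.de/straße')); // true, ß in path
```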

hass’s picture

In #389278: Create IDN encoding and decoding functions, chx suggested writing a separate module just for IDN validation. I also think this would be a great idea... but I'm not familiar enough with all the IDN rules to maintain such a module.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

Amir Simantov’s picture

Status: Closed (fixed) » Active

@#20 - jcfiala - This is not fixed at all; re-activating.

jcfiala’s picture

Status: Active » Postponed (maintainer needs more info)

Amir, can you please give more detail on what, exactly, is not fixed at all, and which version you were testing when you were trying it?

Giving exact urls that failed and should not have (or which were accepted and should not have) is _really_ useful for writing tests.

Amir Simantov’s picture

OK - here is a link to an item in Drupal Israel site - you may see gibberish if you have no Hebrew fonts installed:

http://www.drupal.org.il/content/הוספת-פורום

Thanks.

unic’s picture

And I can't set a URL for Cyrillic domain names.
For example: http://президент.рф

unic’s picture

Version: 6.x-2.x-dev » 6.x-2.9
Status: Postponed (maintainer needs more info) » Active

I think a general solution is required, regardless of language or culture.

jcfiala’s picture

I quite agree that a general solution is required. That's part of the reason why this ticket is currently stalled - I don't have a general solution on hand, and higher-priority items are keeping me busy.

unic’s picture

I think you can convert the URL to Punycode and then validate it as usual.

Complete Punycode converter class: http://www.phpclasses.org/browse/download/1/file/5845/name/idna_convert.....

Just

<?php
  $IDN = new idna_convert();
  $punycoded = $IDN->encode($url);
  //validate $punycoded
?>

And don't forget "рф" domain :-)
Is this helpful?
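A variant of the same idea, as a sketch under assumptions: instead of the idna_convert class, it uses idn_to_ascii() from PHP's intl extension (which may not be installed everywhere) and converts only the host part, leaving the scheme and path untouched. The helper name example_punycode_host() is hypothetical.

```php
<?php
// Hypothetical helper: convert only the host of a URL to Punycode.
// Assumes the intl extension; returns NULL when it is not available.
function example_punycode_host($url) {
  if (!function_exists('idn_to_ascii')) {
    return NULL; // No intl extension; the caller has to fall back.
  }
  // Capture scheme, host, and everything after the host.
  if (!preg_match('~^([a-z][a-z0-9+.-]*://)([^/?#:]+)(.*)$~iu', $url, $m)) {
    return FALSE;
  }
  $ascii_host = idn_to_ascii($m[2], IDNA_DEFAULT, INTL_IDNA_VARIANT_UTS46);
  if ($ascii_host === FALSE) {
    return FALSE; // The host is not a valid internationalized name.
  }
  return $m[1] . $ascii_host . $m[3];
}

// With intl installed, the host becomes xn--bcher-kva.de.
var_dump(example_punycode_host('http://bücher.de/katalog'));
```

The result can then be passed to the existing ASCII-only validation; a non-ASCII path would still need separate handling.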

dqd’s picture

Title: Allow Internationalized domain names » Allow Internationalized domain names (new link to the discussion below)
Status: Active » Closed (duplicate)

@unic, I thought the same. Can you please post your idea anew? There is a chance for support on this.

Please go to #1319520: Gathered: Internationalized domain names (punycode)

to join a discussion I set up on this and to help get it implemented faster. I am marking this issue as a duplicate to make sure that everybody joins the centralized discussion on this task, which I would like to give more attention from now on.

Thanks for the effort.

virgo’s picture

Hi-

Is there anything that works as of now?

At the moment, if you put something like äriportaal.ee in a link field, it generates a link like yourdomain.com/%C3%A4riportaal.ee

What can be done?

Regards,

Virgo