As a top-level domain, we should allow anything with more than two English letters (for example, .berlin), because site administrators and module maintainers alike can never keep up with ICANN inventing new top-level domains all the time:
At the time of writing, the list of TLDs contains more than 1,300 entries.
Currently you get this validation error:
The value provided for Webseite is not a valid URL.
This is an alternate solution to #1846202: Periodically Update TLD list from Current IANA List.
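To illustrate the problem, here is a minimal sketch in Python (the patterns are hypothetical simplifications, not the module's actual regex): a strict 2–3-letter TLD rule rejects .berlin, while the generous rule proposed in this issue accepts any TLD of 2–63 letters.

```python
import re

# Hypothetical patterns for illustration only (not link.module's regex).
# Old-style rule: only 2-3 letter TLDs (plus a hardcoded whitelist).
strict_tld = re.compile(r'\.[a-z]{2,3}$')
# Generous rule proposed here: any TLD of 2-63 letters.
generous_tld = re.compile(r'\.[a-z]{2,63}$')

print(bool(strict_tld.search('example.berlin')))    # False: rejected today
print(bool(generous_tld.search('example.berlin')))  # True: accepted
print(bool(generous_tld.search('example.com')))     # True: still accepted
```

The generous rule deliberately trades precision for future-proofing: it will accept TLDs that do not exist, but it never breaks when ICANN adds new ones.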
Comment | File | Size | Author |
---|---|---|---|
#98 | link-2299657-for-7.x-1.4.patch | 2.83 KB | dsnopek |
#92 | allow_any_tld_because-2299657-92.patch | 3.31 KB | renatog |
#77 | After.png | 88.23 KB | renatog |
#77 | Before.png | 16.88 KB | renatog |
#76 | Screen Shot 2017-06-28 at 5.07.32 PM.png | 78.05 KB | rootwork |
Comments
Comment #1
tobiberlin: I have the same issue now with exactly the same domain extension (".berlin"). Would it be possible to patch the validation so that domain extensions with more than 3 characters (or however many are currently accepted) are possible? It must be the regular expression at line 1180 and following in link.module.
Unfortunately I am not familiar with regex, so maybe someone who can handle regex can help?!
Best,
Tobias
Comment #2
tobiberlin: Found out that line 13 is where the domain endings that get checked are defined:
Looking at all the new domain endings, I think it is time to change the whole regex against which external links are checked.
Comment #3
GoddamnNoise CreditAttribution: GoddamnNoise commented: I've found the same problem with the new TLD .gal. If I patch the link.module file to add this new TLD to the list defined by the LINK_DOMAINS constant, it accepts the new TLD without any validation errors.
I agree with tobiberlin; maybe it's time to update the regex to support all the new TLDs out there. Or, at least, a configuration option could be added in the admin backend to populate the link_extra_domains variable with new TLDs not directly supported by the LINK_DOMAINS constant:
Meanwhile, we can set the link_extra_domains variable in the settings.php file adding all the new TLDs we want to be supported, like this:
$conf['link_extra_domains'] = array('berlin', 'gal');
Comment #4
drumm: This has been reported for Drupal.org: #2353449: request update of field values to allow new tlds
Comment #5
Leeteq CreditAttribution: Leeteq commented: The issue in #4 is a closed (duplicate) support request pointing back to this issue; updating the Link module to support the new TLDs is the solution.
Comment #6
brice_gato CreditAttribution: brice_gato commented: A small correction on #4: $conf['link_extra_domains'] is an array, not a string. I added my new TLDs successfully and everything works like a charm:
$conf['link_extra_domains'] = array('paris', 'berlin', 'newyork');
Comment #7
GoddamnNoise CreditAttribution: GoddamnNoise commented: Thanks brice_gato, it was a mistake. I've edited my previous post to fix it.
Comment #8
drumm: Aha, I didn't read the code closely enough to see that the new TLDs were easy to whitelist. I added them all for Drupal.org: https://bitbucket.org/drupalorg-infrastructure/drupal.org/commits/b34e14...
Comment #9
jcfiala CreditAttribution: jcfiala commented: Well, I'm certainly interested in hearing of patches. Since it's possible to override the list of domains in the settings.php file, though, it's not something I think of as a Major problem.
Comment #10
mikeytown2 CreditAttribution: mikeytown2 commented: http://en.wikipedia.org/wiki/List_of_Internet_top-level_domains
http://data.iana.org/TLD/tlds-alpha-by-domain.txt has the full list, from what I can tell. Cron job to pull it in once a week?
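The IANA file format is simple enough that a weekly cron job would mostly be a download plus a parse like this. A Python sketch of the parsing step only (the real module is PHP, and the download/cron wiring is omitted):

```python
# Sketch of parsing the IANA TLD list (tlds-alpha-by-domain.txt).
# The file starts with a "# Version ..." comment line, followed by one
# upper-case TLD per line; internationalized TLDs appear as XN-- punycode.
sample = """# Version 2014100400, Last Updated Sat Oct  4 07:07:01 2014 UTC
COM
BERLIN
XN--ZFR164B
"""

def parse_iana_tlds(text):
    """Return the TLDs as a lower-case list, skipping comments/blanks."""
    return [line.strip().lower()
            for line in text.splitlines()
            if line.strip() and not line.startswith('#')]

print(parse_iana_tlds(sample))  # ['com', 'berlin', 'xn--zfr164b']
```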
Comment #11
jcfiala CreditAttribution: jcfiala commented: @mikeytown2: Although you could certainly do that, it seems pretty heavy functionality to put into a field, though I could see a module that builds off of Link and fetches that list.
Comment #12
Vacilando CreditAttribution: Vacilando commented: Much needed!
Note: #1846202: Periodically Update TLD list from Current IANA List also contains some work towards pulling all valid TLDs from IANA.
Comment #13
quicksketch: I agree with @jcfiala. Tracking TLDs is going to be more work than it's worth at this point. TLDs have turned into a wild west. I think at this point we should bail on attempting to track top-level domains. Can we just nix the entire `_link_domains()` function, `LINK_DOMAINS` constant, and `link_extra_domains` variable? Perhaps if we wanted to keep some TLD validation ability, we should instead introduce a `link_allowed_domains` variable. If empty, then all TLDs are allowed just by basic regex checking.
Comment #14
klonos: +1
Comment #15
saltednut: We had some patches going for this in #2474547: Many new domains not supported by Link module, notably .website and .io, but @quicksketch closed that issue as a duplicate.
Comment #16
samuel.mortenson: Agreed. With the number of new TLDs, whitelisting seems futile anyway, plus I don't see any downside to allowing users to enter whatever they want for the TLD as long as it only contains alphanumeric characters (per http://data.iana.org/TLD/tlds-alpha-by-domain.txt, some TLDs contain numbers, although it's rare).
Comment #17
GoddamnNoise CreditAttribution: GoddamnNoise commented: Agreed with #16.
Comment #18
kenorb CreditAttribution: kenorb commented
Comment #19
Leeteq CreditAttribution: Leeteq commented: It makes sense to have a specific/other (sub?)module automatically update TLDs.
IMO, a black/whitelisting function should be included anyway.
This module could provide an option to treat the manually maintained list as either a blacklist or a whitelist.
I would personally prefer a radio button plus a long-text field where one could edit/maintain/paste a list of TLDs, one per line. That list would be taken into account both when NOT using the other suggested (sub?)module to automatically retrieve the latest valid public list, AND along with that module IF the option is set to blacklist, to limit the public list.
Comment #20
Leeteq CreditAttribution: Leeteq commented
Comment #21
jmart CreditAttribution: jmart commented: I looked in the database's variable table and link_extra_domains was not yet defined.
I added the following in my template.php:
variable_set('link_extra_domains', array('church'));
Refreshed my homepage and cleared the cache. Then I checked the database and saw that my variable was set. Then I commented it out from template.php. Now www.somechurch.church works.
+1 on the idea that the module should provide a GUI for a whitelist of TLDs. The automatic idea is nice, but there is a lot of overhead for a few edge cases. Also, going into settings.php is not good for basic site admins, and some hosts don't let you in there.
Comment #22
egfrith CreditAttribution: egfrith as a volunteer commented: I've got a patch that implements the idea of @quicksketch in #13:
* Remove link_extra_domains variable
* Replace with link_allowed_domains
* If link_allowed_domains is set, only validate URLs with TLDs in link_allowed_domains, or other 2-character TLDs
* If link_allowed_domains is not set, validate any TLD between 2 and 63 characters in length, as per https://tools.ietf.org/html/rfc1034.
Most of the changes are within the _link_domains() function.
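The rule described in the bullets above can be sketched as follows. This is an illustrative Python version of the logic (the function name and signature are mine, not the module's `_link_domains()` API):

```python
import re

# Open rule from RFC 1034: any TLD between 2 and 63 letters.
TLD_RE = re.compile(r'^[a-z]{2,63}$')

def tld_is_valid(tld, allowed_domains=None):
    """If an allowed list is configured, accept only listed TLDs or any
    2-character (country-code style) TLD; otherwise accept any TLD of
    2-63 letters."""
    tld = tld.lower()
    if allowed_domains:
        return tld in allowed_domains or (len(tld) == 2 and tld.isalpha())
    return bool(TLD_RE.match(tld))

print(tld_is_valid('berlin'))                  # True: no list configured
print(tld_is_valid('berlin', ['com', 'org']))  # False: not whitelisted
print(tld_is_valid('de', ['com', 'org']))      # True: 2-character TLD
```

The design point is that the safe default (no variable set) is the open behaviour, so site admins who do nothing are never blocked by a new TLD.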
Comment #23
egfrith CreditAttribution: egfrith as a volunteer commented
Comment #25
egfrith CreditAttribution: egfrith as a volunteer commented: I've refined the standard regexp, since it forbade hyphens and numbers, which are actually allowed by https://tools.ietf.org/html/rfc1034 and required for TLDs such as XN--ZFR164B.
I've also updated the test of external links to reflect the changed functionality.
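A label pattern in that spirit, sketched in Python (an illustration of the refinement, not the patch itself): letters, digits, and interior hyphens are allowed, the label must start and end with a letter or digit, and the length is 2-63 characters, which is what punycode TLDs like XN--ZFR164B need.

```python
import re

# Permissive RFC-1034-style label for a TLD: 2-63 characters, letters,
# digits and interior hyphens, no leading or trailing hyphen.
label = re.compile(r'^[a-z0-9][a-z0-9-]{0,61}[a-z0-9]$', re.IGNORECASE)

print(bool(label.match('XN--ZFR164B')))  # True: punycode TLD accepted
print(bool(label.match('berlin')))       # True
print(bool(label.match('-bad')))         # False: leading hyphen rejected
```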
Comment #27
egfrith CreditAttribution: egfrith as a volunteer commented: The problem with the last patch is that there is now nothing that rules out "rss.xml" from being a valid domain; previously, the ".xml" domain would not have matched $external_pattern. To try to fix this, I've moved the match against $internal_pattern_file above $external_pattern. This will, I think, break the recognition of "example.com" as a domain, but that is the price of not having a list of TLDs.
Comment #28
egfrith CreditAttribution: egfrith as a volunteer commented
Comment #30
egfrith CreditAttribution: egfrith as a volunteer commented: Another option is to give up on recognising protocol-less strings with dots in them as internal links, i.e. not testing that rss.xml is an INTERNAL_LINK.
Comment #31
reptilex CreditAttribution: reptilex commented: Why is there no hybrid between this solution and the one offered in https://www.drupal.org/node/1846202, which uses a script to retrieve the current valid domains from ICANN?
Comment #32
egfrith CreditAttribution: egfrith as a volunteer commented: It strikes me that we should really find out, and probably copy, how link fields are validated in Drupal 8, now that the link module is part of core. I had a quick look, but the structure of the code has changed quite a bit from Drupal 7. I'll take a longer look sometime, but would be happy if someone else gets there first.
Comment #33
katjam CreditAttribution: katjam commented: I think in Drupal 8 the URI is validated as a reachable link.
Comment #34
JonMcL CreditAttribution: JonMcL commented: The solution in #30 is working well for me with the new .nyc TLD.
Comment #35
Remon CreditAttribution: Remon commented: Patch #30 applies and works nicely on the dev branch as well as the link-7.x-1.4 release.
Comment #36
jp.stacey CreditAttribution: jp.stacey at Magnetic Phield commented: The patch in #30 looks good. It also solves an important problem with hardcoded whitelists: developers often use local URLs beginning with http://foo.vagrant/ or http://bar.local/.
Arguably the most reliable source of information on a specific domain's validity is a DNS lookup, so if we're not planning on doing that, then a configurable (and default-open) whitelist is definitely the best way to avoid hurting developers.
Comment #37
Remon CreditAttribution: Remon commented: I've taken the liberty of changing the status to RTBC.
Comment #38
kristiaanvandeneynde: Looks good, just a few nitpicks:
Can go straight back to RTBC AFAIAC when the above issues are fixed.
Comment #39
jtsnow CreditAttribution: jtsnow at Mirum Agency commented: This patch addresses the following items mentioned in comment #38:
* Removed the trailing whitespace.
* Edited doxygen comments to fit 80 character limit.
There is no need to delete `link_extra_domains`, since it can still be used. The patch introduces a sensible default in case the `link_extra_domains` variable is not set.
Comment #41
kristiaanvandeneynde: The new variable is `link_allowed_domains`; the old `link_extra_domains` is no longer used, so it should be removed in an update hook. Also, the new variable should be deleted in hook_uninstall().
Comment #42
klausi: Closed #2747561: Allow any TLD because site admins can never keep up with ICANN as a duplicate.
Comment #43
klausi
Comment #44
klausiUnrelated changes, they should be removed from the patch. Make sure to remove all unrelated whitespace changes, so that patch review is easier.
First sentence of a function comment should be just one line under 80 characters. I suggest "Returns the list of allowed domains.". Further comments should be a separate paragraph.
Comment #45
ChuChuNaKu CreditAttribution: ChuChuNaKu as a volunteer and at Acquia commented: I made the following changes to the latest patch, per the comments below:
#41:
- Removed the unused link_extra_domains variable in an update hook.
- Deleted the new link_allowed_domains variable in hook_uninstall().
#44:
- Removed unrelated whitespace changes.
- Changed the first sentence of the _link_domains() function comment.
Comment #47
ChuChuNaKu CreditAttribution: ChuChuNaKu as a volunteer and at Acquia commented: Here's an attempt at a passing test.
Comment #48
rootwork: I can confirm that #47 applies and fixes the issue for me. Given the technical review above, I'll let others confirm this and set it to RTBC.
Comment #49
doraf CreditAttribution: doraf commented: We're running into an issue with a .community TLD address, and this patch looks like it would help solve it. We haven't been able to test the patch yet, but my team and I are hoping it gets committed soon. Thanks!
Comment #50
DamienMcKenna: +1
Comment #51
rootwork
Comment #52
ultimike: Yes! This appears to be a sorely needed addition!
-mike
Comment #53
klausi: This is a bad example for a test case, because you should always be explicit about what your test requires, not simply take whatever is on some admin role, which is really obscure.
I think the tests are failing because the new "administer fields" permission is missing. Can you try again with that one set?
Also, the test fixes are unrelated to this issue, so they should probably be done in a separate issue.
Comment #54
Graber CreditAttribution: Graber as a volunteer commented: This patch solves the problem, but URLs like http://some.domain.com/a-z/?D=0&S=0&DT=||&id=11 now don't get validated properly, and the unpatched version didn't have this problem.
OK, it's a completely different issue and applies to the 1.4 version.
Comment #55
idebr CreditAttribution: idebr at ezCompany commented: @klausi I have contributed a patch that fixes the failing tests in #2843813: Fix failing tests due to missing 'administer fields' permission
Comment #56
ChuChuNaKu CreditAttribution: ChuChuNaKu as a volunteer and at Acquia commented: Thanks for your help, @idebr! This patch removes the simpletests included in comment #47 and as a result may fail the automated testing. However, once #2843813: Fix failing tests due to missing 'administer fields' permission is accepted, this patch should pass automated testing.
Comment #57
ChuChuNaKu CreditAttribution: ChuChuNaKu as a volunteer and at Acquia commented
Comment #59
idebr CreditAttribution: idebr at ezCompany commented: @ChuChuNaKu Thanks, your patch works as expected.
I updated the patch to remove the trailing-whitespace changes. Those changes are already in other popular patches for the Link module and caused the patch to fail to apply. For reference: https://www.drupal.org/node/2666912#comment-11204309
Comment #60
idebr CreditAttribution: idebr at ezCompany commented
Comment #62
rv0 CreditAttribution: rv0 at Coworks.be commented: Patch works fine for me. Hope the tests are working again soon...
Comment #63
renatog: Hi people, attached is a patch that adds a settings page.
By default, the value stays the same.
If needed, it can be changed on the settings page.
Comment #65
renatog: Committed to the dev branch.
Regards.
Comment #66
rootwork: @RenatoG Honestly, I don't think the fix in #63 addresses the issue, and it's a little surprising that you abandoned the path of the patches upthread in this issue without really explaining why.
The issue title is "Allow any TLD because site admins can never keep up with ICANN."
The fix in #63 makes the default list of TLDs configurable by admins, which is clearly better than it not being configurable. But it still puts the burden on site admins to know about and keep updated a list of valid TLDs. And unless I'm missing something, there's no way to turn off this check entirely (and allow any TLD).
If you don't want to remove this check entirely, as was done all along in this issue until the commit abandoned that path, could we at least have a way to turn the check off entirely for admins who don't want to maintain this list?
Comment #67
r0nn1ef CreditAttribution: r0nn1ef as a volunteer commented: I kind of agree with @rootwork that configurable is better than not, but there is a downloadable list of valid TLDs from ICANN. Couldn't a cron job be set up to download the list periodically and store it in the variable? The page to download the list is at https://www.icann.org/resources/pages/tlds-2012-02-25-en.
Comment #68
othermachines CreditAttribution: othermachines commented: Agree with @rootwork that there should be an option to turn validation off.
Comment #69
kristiaanvandeneynde: Disagree. The list will keep growing and growing with all the custom TLDs being sold nowadays. Making this configurable will only result in bad UX, because the people configuring something through a UI are often the same people who wouldn't know how or where to find a recent list of TLDs to begin with.
I prefer the original approach of removing the check altogether.
Comment #71
renatog: I agree. Reverted for review of #60.
Comment #72
renatog: Re-uploading patch #60 by @idebr.
Regards.
Comment #73
kristiaanvandeneynde: Same comment as #41. The patch in #60 had it right regarding the variables. Uninstall should no longer care about the link_extra_domains variable.
Comment #74
renatog: Patch with the #43 recommendations.
Regards.
Comment #75
rootwork: #74 looks good to me. I'll try to do a full test later this afternoon.
Comment #76
rootwork: All looks good.
Without the patch, edit form errors on save:
With the patch, edit form saves and link is preserved:
Comment #77
renatog: Thank you very much for your review, @rootwork.
It also works for the case in #2861854: .dance domain is not a valid url.
Before:
After:
Comment #79
renatog: Fixed.
Committed to the dev branch.
Thank you all for the contributions.
Good work.
Regards.
Comment #81
renatog: Hi people.
Reverted, because the change based on #60 was accepting every link in validation.
Look:
Link: www.link.incorrectlink
And it passed validation:
That is incorrect, isn't it?!
Reverted in the dev branch.
Regards.
Comment #82
renatog
Comment #83
renatog
Comment #84
klausi: I'm sure *.incorrectlink will soon be registered with ICANN and will be one of the next TLDs :-P
www.link.incorrectlink is a perfectly correct link and should pass validation because "incorrectlink" might be a TLD in the future.
Comment #85
renatog: OK, so we should update "http://www.example.frog/" in link.validate.test, right?!
Thanks @klausi
Good Work and Good Weekend.
Regards.
Comment #86
klausi: Yep, we should update the tests to reflect that.
Comment #87
rootwork: Here's a good list of valid and invalid URLs:
http://formvalidation.io/validators/uri/
WHATWG has a good living standard for URLs, which (in my opinion) better represents the way URLs are actually used right now, compared to the more restrictive RFC 3986. RFC 3986 only permits ASCII (non-ASCII characters are converted to punycode). The WHATWG guidelines are more generous:
Here's the WHATWG example list of valid and invalid URLs:
https://url.spec.whatwg.org/#urls
Both standards incorporate IPv6 addresses.
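To illustrate the ASCII/punycode point above: RFC 3986 URLs are ASCII-only, so internationalized domain labels are converted with IDNA ("punycode") before they appear in a URL. Python's standard-library codec demonstrates the conversion (this is just an illustration of the encoding, unrelated to the module's code):

```python
# IDNA-encode a non-ASCII domain label; the result is the ASCII
# "xn--" punycode form that appears in RFC 3986 URLs.
print('bücher'.encode('idna'))  # b'xn--bcher-kva'
```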
Comment #88
renatog
Comment #89
renatog: Hi people.
Attached is the patch with the fix.
Good work.
Regards.
Comment #90
rootwork: Apart from the testing failure, I wonder if it would be possible to still test for valid domains. I posted the references in #87 because there are guidelines for valid URLs, even if we allow any TLD. What do you think, @RenatoG?
Comment #91
renatog
Comment #92
renatog: Hi guys!
@rootwork: I agree with you.
Attached is a new patch with test updates based on #87.
Tests: passed.
Thank you very much.
Good work.
Regards.
Comment #93
renatog
Comment #94
klausi: Looks good, thanks for the update!
Comment #96
renatog: Hi people!
Fixed.
Thank you all for the contributions.
Committed and pushed to the dev branch: 7.x-1.x.
Good work.
Regards.
Comment #98
dsnopek: For those still on Link 1.4 who aren't ready to move to the 1.5 betas, here's a version of this patch that works with 1.4.
Comment #99
iMiksu: What about non-alphabetic TLDs? From the full list, you can see TLDs like:
Root Zone Database: http://www.iana.org/domains/root/db
Comment #100
amazingrando: I'm on Link 1.4 and the patch in #98 worked for me.
Comment #101
steva1982 CreditAttribution: steva1982 commented: Patch #98 would be very useful.
When do you think you will commit it?
Thanks.
Comment #102
iaminawe CreditAttribution: iaminawe commented: Unfortunately, the patch in #98 no longer works against the latest release, 7.x-1.6. It would be great if this functionality were integrated into the full release.
Comment #103
DamienMcKenna: Anyone using this patch should grab the 7.x-1.x-dev snapshots until the maintainers can do a 7.x-1.7 release (https://www.drupal.org/project/link/issues/3037467).
Comment #104
drumm: The commit for this, 232dc95, went into 7.x-1.5-beta2, and I've seen it working in 7.x-1.5 and 7.x-1.6.