This issue has been reported privately to the security team but it was decided to handle this as public security improvement since no direct vulnerability is involved. This issue was reported by mvonfrie.

Problem/Motivation

valid_url() report http://- as valid URL:
$valid = valid_url('http://-', true);

Urls look like this after the scheme (protocol) according to RFC 3986, section 3.2:

authority   = [ userinfo "@" ] host [ ":" port ]
host        = IP-literal / IPv4address / reg-name

reg-name should be "conform to the DNS syntax"

Important quote from below research:

The labels must follow the rules for ARPANET host names. They must
start with a letter, end with a letter or digit, and have as interior
characters only letters, digits, and hyphen.

UrlHelper::isValid() should reject a hostname that starts, ends, or is only a hyphen.

Proposed resolution

Improve the regex in UrlHelper::isValid() (valid_url() in Drupal 7).

Remaining tasks

Write a patch.
Review
Commit

User interface changes

none.

API changes

none.

Original comments:

#1 benjy commented April 2, 2015 at 6:35am
Component:		» Code
This just seems like a bug to me at this point and could easily be handled in public.
Comment #2 Pere Orga commented April 2, 2015 at 3:43pm
I have tried a few browsers and all think URL http://- is valid.

http://-: http://i.imgur.com/zjaPlzv.png
http://localhost: http://i.imgur.com/UzXV4Xe.png
Comment #3 benjy commented April 3, 2015 at 3:57am
From: https://www.ietf.org/rfc/rfc3986.txt

Scheme names consist of a sequence of characters beginning with a
letter and followed by any combination of letters, digits, plus
("+"), period ("."), or hyphen ("-"). Although schemes are case-
insensitive, the canonical form is lowercase and documents that
specify schemes must do so with lowercase letters. An implementation
should accept uppercase letters as equivalent to lowercase in scheme
names (e.g., allow "HTTP" as well as "http") for the sake of
robustness but should only produce lowercase scheme names for
consistency.

scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
So the example with the hyphen is valid because technically it could be part of the scheme?
Comment #4 benjy commented April 3, 2015 at 4:09am
OK, more reading, my first comment wasn't the reason.

So as this issue reports, the format after the scheme is: host        = IP-literal / IPv4address / reg-name

And reg-name is made up of: reg-name      = *( unreserved / pct-encoded / sub-delims )

And unreserved is: unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"

Which explains the valid url when starting with a hyphen.
Comment #5 mvonfrie commented April 4, 2015 at 4:07pm
Yes, but exactly before the definition of

reg-name    = *( unreserved / pct-encoded / sub-delims )

the RFC says

Such a name consists of a sequence of domain labels separated by ".",
each domain label starting and ending with an alphanumeric character
and possibly also containing "-" characters.
This is not covered by the formal syntax specification. That means every host name can contain hyphens, but neither start nor end with them or just be a hyphen:

The labels must follow the rules for ARPANET host names. They must
start with a letter, end with a letter or digit, and have as interior
characters only letters, digits, and hyphen.

https://tools.ietf.org/html/rfc1035, 2.3.1
So in my understanding http://- is not a valid url as - is not a valid hostname.
Comment #6 Pere Orga commented April 4, 2015 at 7:36pm
In any case I also think this could be discussed in public. It doesn't look like any harm could be done.
Comment #7 benjy commented April 6, 2015 at 6:11am
+1 for public from me.
Support from Acquia helps fund testing for Drupal Acquia logo

Comments

Ayesh’s picture

We definitely have a problem!
Domain names cannot start or end with hyphens, but:

  valid_url('http://-example.com-', TRUE); // true

filter_var works as it should:

  filter_var('http://-example.com-', FILTER_VALIDATE_URL); // false
  filter_var('http://-', FILTER_VALIDATE_URL); // false

It all probably comes down to * as the only correct regex to validate URLs with today's and future TLDs (http://වෙබ්.පාර්ලිමේන්තුව.ලංකා/ anyone?).

Version: 8.0.x-dev » 8.1.x-dev

Drupal 8.0.6 was released on April 6 and is the final bugfix release for the Drupal 8.0.x series. Drupal 8.0.x will not receive any further development aside from security fixes. Drupal 8.1.0-rc1 is now available and sites should prepare to update to 8.1.0.

Bug reports should be targeted against the 8.1.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.2.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.1.x-dev » 8.2.x-dev

Drupal 8.1.9 was released on September 7 and is the final bugfix release for the Drupal 8.1.x series. Drupal 8.1.x will not receive any further development aside from security fixes. Drupal 8.2.0-rc1 is now available and sites should prepare to upgrade to 8.2.0.

Bug reports should be targeted against the 8.2.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.3.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.2.x-dev » 8.3.x-dev

Drupal 8.2.6 was released on February 1, 2017 and is the final full bugfix release for the Drupal 8.2.x series. Drupal 8.2.x will not receive any further development aside from critical and security fixes. Sites should prepare to update to 8.3.0 on April 5, 2017. (Drupal 8.3.0-alpha1 is available for testing.)

Bug reports should be targeted against the 8.3.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.4.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.3.x-dev » 8.4.x-dev

Drupal 8.3.6 was released on August 2, 2017 and is the final full bugfix release for the Drupal 8.3.x series. Drupal 8.3.x will not receive any further development aside from critical and security fixes. Sites should prepare to update to 8.4.0 on October 4, 2017. (Drupal 8.4.0-alpha1 is available for testing.)

Bug reports should be targeted against the 8.4.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.5.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.4.x-dev » 8.5.x-dev

Drupal 8.4.4 was released on January 3, 2018 and is the final full bugfix release for the Drupal 8.4.x series. Drupal 8.4.x will not receive any further development aside from critical and security fixes. Sites should prepare to update to 8.5.0 on March 7, 2018. (Drupal 8.5.0-alpha1 is available for testing.)

Bug reports should be targeted against the 8.5.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.6.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.5.x-dev » 8.6.x-dev

Drupal 8.5.6 was released on August 1, 2018 and is the final bugfix release for the Drupal 8.5.x series. Drupal 8.5.x will not receive any further development aside from security fixes. Sites should prepare to update to 8.6.0 on September 5, 2018. (Drupal 8.6.0-rc1 is available for testing.)

Bug reports should be targeted against the 8.6.x-dev branch from now on, and new development or disruptive changes should be targeted against the 8.7.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

jibran’s picture

Version: 8.6.x-dev » 7.x-dev

valid_url doesn't exist in D8 anymore so moving the issue to D7.

jibran’s picture

Title: valid_url() allows invalid values » Url::isValid() allows invalid values
Version: 7.x-dev » 8.8.x-dev

Fix valid_url() (Url::isValid() in Drupal 8)

ah!

apaderno’s picture

Version: 8.8.x-dev » 8.9.x-dev
apaderno’s picture

Status: Active » Needs work
Issue tags: +Needs issue summary update

There isn't any Url::isValid() method, in Drupal 8.8.x.

apaderno’s picture

Title: Url::isValid() allows invalid values » UrlHelper::isValid() allows invalid values
Issue summary: View changes
Status: Needs work » Active
Issue tags: -Needs issue summary update
gapple’s picture

Here's a test-only patch, to cover leading and trailing hyphens in domain names.

gapple’s picture

Status: Active » Needs review
FileSize
1.74 KB

- Separate out the IPv4 test - it's now slightly more strict by requiring 4 number segments, but still allows invalid addresses such as 000.299.300.999
- Require that each domain segment is at least one alphanumeric character, and hyphens are only valid prior to an alphanumeric character or percent-encoded character.

Status: Needs review » Needs work

The last submitted patch, 14: drupal-2474191-14.patch, failed testing. View results

Version: 8.9.x-dev » 9.1.x-dev

Drupal 8.9.0-beta1 was released on March 20, 2020. 8.9.x is the final, long-term support (LTS) minor release of Drupal 8, which means new developments and disruptive changes should now be targeted against the 9.1.x-dev branch. For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

mrinalini9’s picture

Status: Needs work » Needs review
FileSize
1.74 KB

Rerolled patch to 9.1.x.

Version: 9.1.x-dev » 9.2.x-dev

Drupal 9.1.0-alpha1 will be released the week of October 19, 2020, which means new developments and disruptive changes should now be targeted for the 9.2.x-dev branch. For more information see the Drupal 9 minor version schedule and the Allowed changes during the Drupal 9 release cycle.

quietone’s picture

Issue summary: View changes
quietone’s picture

Retesting with 9.2.x

If that passed, then this is ready for review.

quietone’s picture

Title: UrlHelper::isValid() allows invalid values » UrlHelper::isValid() should handle hyphens, '-', correctly
ranjith_kumar_k_u’s picture

FileSize
52.74 KB
52.65 KB
192.48 KB

I have tested the last patch on 9.2 dev with few examples
Tested code
tested code

Before patch
before patch

After patch
after patch

greggles’s picture

Issue summary: View changes

The tests in #22 seem to indicate this is working properly. Also updating the issue summary to make it easier to find what the "right" behavior is.

dawehner’s picture

+++ b/core/tests/Drupal/Tests/Component/Utility/UrlHelperTest.php
@@ -101,6 +101,11 @@ public function providerTestInvalidAbsolute() {
+      '-',
+      '-example.com',
+      'example-.com',
+      'example.-com',
+      'example.com-',

I wonder whether it would be worth testing some combinations of minuses as well, like -`example-.com` or so, this way more cases are covered

Ayesh’s picture

Just a heads-up - upstream PHP has some changes, that parse_url function returns different results on latest PHP builds. I will update with more concrete examples soon.

Version: 9.2.x-dev » 9.3.x-dev

Drupal 9.2.0-alpha1 will be released the week of May 3, 2021, which means new developments and disruptive changes should now be targeted for the 9.3.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 9.3.x-dev » 9.4.x-dev

Drupal 9.3.0-rc1 was released on November 26, 2021, which means new developments and disruptive changes should now be targeted for the 9.4.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

vikashsoni’s picture

Applied #17 patch in drupal-9.3.x-dev looking good and applied successfully
Thanks for the patch

benjifisher’s picture

Issue summary: View changes
Status: Needs review » Needs work
Issue tags: +Needs issue summary update

Thanks to everyone who has worked on this. Regular expressions are so much fun, aren't they?

I reviewed the issue summary, the comments, the code, and the test. I did not read the RFCs referenced in the issue summary. I would like to suggest several changes.

The most complicated problem is that the current code has a single regular expression that is supposed to match either a domain name or an IPv4 address: (?:[a-z0-9\-\.]|%[0-9a-f]{2})+. That single regular expression is not strict enough, and I think it makes sense to have separate expressions for the two options (as @gapple pointed out in #14).

The problem is that, when you do that, you are making the match for IPv4 addresses more strict. That is a good thing to do, but it is outside the scope of this issue.

We should change either the patch or the scope so that they match. I suggest we change the scope. That means updating the issue title and the description. When you do that, please be as clear as you can in the "Proposed resolution" section.


In #25, @Ayesh pointed out that there are changes to the parse_url() function in PHP 8. I reviewed the PHP docs for parse_url(). I think the only changes are whether "empty" query strings and fragments are represented by NULL or ''. That should not affect this issue.


I would like to point out that testValidAbsolute() already includes ex-ample.com as one of its test cases.


The following suggestions mostly refer to the main part of the patch:

-        (?:
-          (?:[a-z0-9\-\.]|%[0-9a-f]{2})+                        # A domain name or a IPv4 address
+        (?:                                                     # A domain name
+          (?:[a-z0-9]
+            (?:-?(?:[a-z0-9]|%[0-9a-f]{2}))*
+            (?:\.([a-z0-9]
+              (?:-?(?:[a-z0-9]|%[0-9a-f]{2}))*
+            ))*
+          )
+          |(?:[0-9]{1,3}(?:\.[0-9]{1,3}){3})                    # or a IPv4 address
           |(?:\[(?:[0-9a-f]{0,4}:)*(?:[0-9a-f]{0,4})\])         # or a well formed IPv6 address
  1. In #24, @dawehner suggested more test cases, with multiple hyphens. I think that is a good idea, and it is easy to add more test cases.
  2. Minor point: "or a IPv4 address" should be "or an IPv4 address". Better: match the format of the IPv6 comment.
  3. Move the "A domain name" comment to the correct line (down 1).
  4. The regular expression for a part of the domain name does not enforce either part of "start with a letter, end with a letter or digit". It should be something like [a-z]...[a-z0-9]. Except that "letter" should allow a URL-encoded character, so make that (?:[a-z]|%[0-9a-f]{2})...(?:[a-z0-9]|%[0-9a-f]{2})
  5. The regular expression -?(?:[a-z0-9]|%[0-9a-f]{2}) requires each hyphen to be followed by an alphanumeric character or URL-encoded non-ASCII character. This is stricter and more complicated than the correct regular expression, [a-z0-9\-]|%[0-9a-f]{2}.
  6. Add more test cases, such as "ex--ample.com", related to that last point.

Version: 9.4.x-dev » 9.5.x-dev

Drupal 9.4.0-alpha1 was released on May 6, 2022, which means new developments and disruptive changes should now be targeted for the 9.5.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 9.5.x-dev » 10.1.x-dev

Drupal 9.5.0-beta2 and Drupal 10.0.0-beta2 were released on September 29, 2022, which means new developments and disruptive changes should now be targeted for the 10.1.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

Version: 10.1.x-dev » 11.x-dev

Drupal core is moving towards using a “main” branch. As an interim step, a new 11.x branch has been opened, as Drupal.org infrastructure cannot currently fully support a branch named main. New developments and disruptive changes should now be targeted for the 11.x branch, which currently accepts only minor-version allowed changes. For more information, see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.