There is quite a bit of delicate regex being used to parse the URL string. We cannot remove all of it, because some is still needed, but is there a reason why parse_url() is not being used where it can be?

Comments

quicksketch’s picture

Mostly because link's checking is *very* loose. It also parses internal (without http://www) and external links. I'm not entirely sure parse_url() couldn't be used, but if it could, it'd be great to reduce some of that code. We've had many issues in the past to loosen the URL checking to allow all kinds of invalid characters, so it might be reopening several problems if we switch the validation mechanism.

dragonwize’s picture

Well, parse_url() does not validate the URL; it only parses it. So as long as the URL follows the standard structure (:// after the protocol, except for file:///, which allows three slashes; the query begins with ?; and so on), parse_url() will be able to break the URL down correctly and easily. We can then use most of the current regex just for validation, when and where it is needed.
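
To make the idea concrete, here is a minimal sketch of "parse first, validate afterwards". The function names (example_link_split, example_link_host_is_valid) and the host pattern are hypothetical and are not taken from link module; only parse_url() and preg_match() are real PHP functions.

```php
<?php
/**
 * Hypothetical helper: split a link into its components with parse_url().
 *
 * Returns FALSE when parse_url() cannot parse the string at all, otherwise
 * an array that always has scheme, host, path, query and fragment keys.
 */
function example_link_split($url) {
  $parts = parse_url($url);
  if ($parts === FALSE) {
    // Seriously malformed URL; parse_url() gave up.
    return FALSE;
  }
  // Add defaults for components that were absent, so callers can rely on
  // the keys existing. Keys that parse_url() did return are kept as-is.
  return $parts + array(
    'scheme' => NULL,
    'host' => NULL,
    'path' => '',
    'query' => NULL,
    'fragment' => NULL,
  );
}

/**
 * Hypothetical validator for a single component.
 *
 * The pattern is illustrative only; it is not the regex link module uses.
 */
function example_link_host_is_valid($host) {
  return (bool) preg_match('/^[a-z0-9](?:[a-z0-9.-]*[a-z0-9])?\z/i', $host);
}
```

With this split, an internal path such as node/1?page=2 comes back with no scheme or host, so the existing regex would only need to run on the components that are actually present.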

Link is already extremely loose in its validation, URLs are constantly gaining more valid characters, and there are already more issues requesting that the validation be loosened further or even turned off. So we might want to think about just dropping all validation except the most basic, like parse_url() does, or adding an option to enable validation instead of disabling it. Just a thought.
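
As a rough sketch of what opt-in validation could look like: the function name example_link_is_acceptable, the $strict flag, and the scheme list below are all assumptions made for illustration, not existing link module code or settings.

```php
<?php
/**
 * Hypothetical check: basic parsing by default, stricter rules on request.
 *
 * @param $url
 *   The raw link value entered by the user.
 * @param $strict
 *   FALSE (default): accept anything parse_url() can parse.
 *   TRUE: additionally require a known scheme when a scheme is present.
 */
function example_link_is_acceptable($url, $strict = FALSE) {
  $url = trim($url);
  if ($url === '') {
    return FALSE;
  }
  $parts = parse_url($url);
  if ($parts === FALSE) {
    return FALSE;
  }
  if (!$strict) {
    // Loose mode: being parseable is enough.
    return TRUE;
  }
  // Strict mode (assumed rule set): links without a scheme are treated as
  // internal paths; links with a scheme must use one from this list.
  $allowed = array('http', 'https', 'ftp', 'file');
  if (isset($parts['scheme']) && !in_array(strtolower($parts['scheme']), $allowed, TRUE)) {
    return FALSE;
  }
  return TRUE;
}
```

The point of the default being FALSE is that sites get the loose behaviour unless an admin deliberately turns the stricter checks on.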

budda’s picture

jcfiala’s picture

Version: 6.x-2.5 » 6.x-2.9
Assigned: Unassigned » jcfiala

I should have a look at this some more, and see about getting the same change into the 7.x branch.

dqd’s picture

Status: Active » Closed (duplicate)
Issue tags: +field validation

Dear followers of this issue: please read the project page info of link module for further validation issues. There is already an issue to collect and discuss all possible validation scenarios in general. That's why I will mark this one here as duplicate. I need all concentration inside the ONE and only discussion to move forward. After a D7 implementation we will provide a D6 backport.

Explanation: There are too many corner cases and user validation wishes to implement them all serially, one after the other. We would end up with a cluttered 40-line settings form for validation methods alone, with options conflicting with each other. I think the right way is to find a perhaps more complex but all-embracing configuration method that lets the admin better decide how and when to validate the URL, including a good description that helps to set it up.