This, of course, contradicts the statement under input formats, which declares that "Web page addresses and e-mail addresses turn into links automatically."

Seen in links that contain Hebrew characters. For example: domain.com/א

If you have no Hebrew characters installed and configured on your computer, you will see gibberish after the slash :-)

Comments

Drave Robber’s picture

Title: Web page addresses and e-mail addresses DOES NOT turn into links in comments for non-English characters » Web page addresses and e-mail addresses DO NOT turn into links if they contain non-English characters
Version: 6.9 » 6.26
Component: comment.module » filter.module

Indeed:

This has nothing to do with comment.module, however, and everything to do with the filters that input formats consist of. (This would happen in a node body, too.)

Related issue (it seems ! also breaks it): #1610342: Some API links are not converted into proper links

Sepero’s picture

Version: 6.26 » 6.28

This and a few other bugs (1, 2) appear to be a result of filter.module trying to be too smart.
All this logic for deciding what is accepted as a URL is overkill: [a-zA-Z0-9@:%_+*~#?&=.,/;-]*[a-zA-Z0-9@:%_+*~#&=/;-]))([.,?!]*?)

I've just looked into the D7 code for this, and it still does not recognize foreign characters. Why try to be so smart and accidentally restrict valid things? Perhaps instead identify what is known to be invalid, and use a much simpler non-greedy, negation-style regex. For example, allow every character except \)\<\s and perhaps a few others, and stop before exceptions such as a trailing period or comma followed by whitespace.
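To make the difference concrete, here is a minimal sketch comparing the whitelist quoted above with a negated class of the kind proposed. The `$negated` pattern, the sample text and the `u` modifier are assumptions made for illustration; this is not the code in filter.module.

```php
<?php
// ASCII whitelist quoted above from the Drupal 6 filter.
$whitelist = '[a-zA-Z0-9@:%_+*~#?&=.,/;-]*[a-zA-Z0-9@:%_+*~#&=/;-]';
// Hypothetical negated class: anything except whitespace, ")" and "<",
// where the last character must also not be trailing punctuation.
$negated = '[^\s)<]*[^\s)<.,?!]';

$text = 'See http://domain.com/א, it is broken.';

// The "u" modifier (an addition here) makes PCRE treat the input as UTF-8.
preg_match('`https?://' . $whitelist . '`u', $text, $old);
preg_match('`https?://' . $negated . '`u', $text, $new);

echo $old[0], "\n"; // prints http://domain.com/
echo $new[0], "\n"; // prints http://domain.com/א
```

The whitelist stops at the first character it does not know, so the link is cut off after the slash. The negated class keeps the Hebrew letter and still leaves the trailing comma outside the link.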

Sepero’s picture

Here's a fix. Go into modules/filter/filter.module and replace the function _filter_url with this:

function _filter_url($text, $format) {
  // Pass length to regexp callback
  _filter_url_trim(NULL, variable_get('filter_url_length_'. $format, 72));

  $text = ' '. $text .' ';

  $head = "(<p>|<li>|<br\s*/?>|[\s\(]|&nbsp;)";
  $tail = "([.,?!]*?)(?=([\s\)\<]))";
  $url_protocol = "http://|https://|ftp://|mailto:|smb://|afp://|file://|gopher://|news://|ssl://|sslv2://|sslv3://|tls://|tcp://|udp://";
  $url_address = "[^\s\)\<@]*[^\s\)\<\.,?!]";
  // RFC 3490 (IDNA) allows non-ASCII characters in domain names, and so in the domain part of e-mail addresses. http://www.faqs.org/rfcs/rfc3490.html
  $email_address = "[^\s@\(\)\<\>]";
  
  // Match absolute URLs.
  $text = preg_replace_callback("`$head(($url_protocol)($url_address))$tail`i", '_filter_url_parse_full_links', $text);

  // Match e-mail addresses.
  $text = preg_replace("`$head($email_address+@$email_address+\.$email_address{2,4})$tail`i", '\1<a href="mailto:\2">\2</a>\3', $text);

  // Match www domains/addresses.
  $text = preg_replace_callback("`$head(www\.$url_address)$tail`i", '_filter_url_parse_partial_links', $text);
  $text = substr($text, 1, -1);

  return $text;
}

It fixes all these open bugs:
https://drupal.org/node/550464
https://drupal.org/node/2016089
https://drupal.org/node/1899246
https://drupal.org/node/1480992
https://drupal.org/node/1055864

It also fixes these unreported bugs:
Can't use foreign characters in e-mail addresses.
Can't use many valid characters like '$%' in e-mail addresses.
Can't precede www web addresses with an HTML break.
Many HTML codes can't be used to end a URL or e-mail link.
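As a rough check of the e-mail claims, the e-mail rule from the replacement _filter_url() above can be tried on its own, outside Drupal. The helper `link_emails()` and the sample addresses are hypothetical, and the `u` modifier is an addition so that the input is treated as UTF-8; treat this as a sketch, not as a test of the full filter.

```php
<?php
// Patterns copied from the replacement _filter_url() in this thread.
$head = "(<p>|<li>|<br\s*/?>|[\s\(]|&nbsp;)";
$tail = "([.,?!]*?)(?=([\s\)\<]))";
$email_address = "[^\s@\(\)\<\>]";

// Hypothetical helper, not part of Drupal: applies only the e-mail rule.
function link_emails($text, $head, $email_address, $tail) {
  $pattern = "`$head(" . $email_address . "+@" . $email_address . "+\."
    . $email_address . "{2,4})$tail`iu";
  // Pad with spaces so addresses at the start or end can match,
  // then strip the padding again, as _filter_url() does.
  $text = preg_replace($pattern, '\1<a href="mailto:\2">\2</a>\3', ' ' . $text . ' ');
  return substr($text, 1, -1);
}

$unicode = link_emails('Write to jörg@example.com today', $head, $email_address, $tail);
$symbols = link_emails('Write to a$%b@example.com today', $head, $email_address, $tail);

echo $unicode, "\n";
echo $symbols, "\n";
```

Both addresses come back wrapped in a mailto: link. One thing to watch: the final `{2,4}` part also accepts punctuation, so an address immediately followed by a comma (as in "x@example.com, ") would take the comma into the link. The samples here avoid that case.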

Hanno’s picture

Status: Active » Closed (outdated)

Automatically closed because Drupal 6 is no longer supported. If the issue verifiably applies to later versions, please reopen with details and update the version.

apaderno’s picture

Version: 6.28 » 6.x-dev
Issue summary: View changes