This one is converted into a link:

http://id.wikipedia.org/wiki/Polandia

This one is not:

http://pl.wikipedia.org/wiki/Język_indonezyjski

The URL matching rules should be improved somehow,
e.g. so that a URL extends up to the next space, tag or line break.
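
As a rough illustration of the "up to the next space, tag or line break" idea, here is a minimal sketch. The names sketch_autolink() and sketch_autolink_callback() are made up for this example and are not part of Drupal. It only handles http:// and https://, and it assumes the input is valid UTF-8.

```php
<?php
// Hypothetical helper, not part of Drupal core or any contributed module.
function sketch_autolink($text) {
  // A scheme, then every character up to the next whitespace or '<'.
  // The /u modifier makes PCRE treat pattern and subject as UTF-8;
  // preg_replace_callback() returns NULL if the subject is not valid UTF-8.
  $pattern = '~(?<=^|[\s>(])(https?://[^\s<]+)~u';
  return preg_replace_callback($pattern, 'sketch_autolink_callback', $text);
}

function sketch_autolink_callback($match) {
  // Sentence punctuation at the very end is kept as text, not as URL.
  $url = rtrim($match[1], '.,?!');
  $tail = (string) substr($match[1], strlen($url));
  $safe = htmlspecialchars($url, ENT_QUOTES, 'UTF-8');
  return '<a href="' . $safe . '">' . $safe . '</a>' . $tail;
}

echo sketch_autolink("See http://pl.wikipedia.org/wiki/Język_indonezyjski.\n");
```

With this approach the non-ASCII characters stay inside the link because nothing except whitespace and '<' ends the URL. A closing parenthesis directly after the URL would still be swallowed by this sketch, so it is a starting point rather than a finished rule.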

Comments

kenorb’s picture

Status: Active » Needs review

I changed to a different filter, which works:
BBCode (Provided by bbcode)
Converts BBCode to HTML.
http://drupal.org/project/bbcode

Maybe it would be good to copy BBCode's replacement rules?

Filter method:

/**
* URL filter. Automatically converts text web addresses (URLs, e-mail addresses,
* ftp links, etc.) into hyperlinks.
*/
function _filter_url($text, $format) {
  // Pass length to regexp callback
  _filter_url_trim(NULL, variable_get('filter_url_length_'. $format, 72));

  $text = ' '. $text .' ';

  // Match absolute URLs.
  $text = preg_replace_callback("`(<p>|<li>|<br\s*/?>|[ \n\r\t\(])((http://|https://|ftp://|mailto:|smb://|afp://|file://|gopher://|news://|ssl://|sslv2://|sslv3://|tls://|tcp://|udp://)([a-zA-Z0-9@:%_+*~#?&=.,/;-]*[a-zA-Z0-9@:%_+*~#&=/;-]))([.,?!]*?)(?=(</p>|</li>|<br\s*/?>|[ \n\r\t\)]))`i", '_filter_url_parse_full_links', $text);

  // Match e-mail addresses.
  $text = preg_replace("`(<p>|<li>|<br\s*/?>|[ \n\r\t\(])([A-Za-z0-9._-]+@[A-Za-z0-9._+-]+\.[A-Za-z]{2,4})([.,?!]*?)(?=(</p>|</li>|<br\s*/?>|[ \n\r\t\)]))`i", '\1<a href="mailto:\2">\2</a>\3', $text);

  // Match www domains/addresses.
  $text = preg_replace_callback("`(<p>|<li>|[ \n\r\t\(])(www\.[a-zA-Z0-9@:%_+*~#?&=.,/;-]*[a-zA-Z0-9@:%_+~#\&=/;-])([.,?!]*?)(?=(</p>|</li>|<br\s*/?>|[ \n\r\t\)]))`i", '_filter_url_parse_partial_links', $text);
  $text = substr($text, 1, -1);

  return $text;
}
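
To see why the second URL from the report is skipped, the absolute-URL pattern can be tried on its own. The snippet below is only a sketch: the pattern is a reduced copy of the one in _filter_url() (just the http:// and https:// schemes, character classes unchanged), and sketch_is_linked() is a made-up helper name.

```php
<?php
// Reduced copy of the absolute-URL pattern above: only two schemes are kept.
$pattern = '`(<p>|<li>|<br\s*/?>|[ \n\r\t\(])((http://|https://)([a-zA-Z0-9@:%_+*~#?&=.,/;-]*[a-zA-Z0-9@:%_+*~#&=/;-]))([.,?!]*?)(?=(</p>|</li>|<br\s*/?>|[ \n\r\t\)]))`i';

// Hypothetical helper: returns 1 if the pattern matches, 0 otherwise.
function sketch_is_linked($pattern, $url) {
  // _filter_url() pads the text with a space on both sides before matching.
  return preg_match($pattern, ' ' . $url . ' ');
}

var_dump(sketch_is_linked($pattern, 'http://id.wikipedia.org/wiki/Polandia'));
var_dump(sketch_is_linked($pattern, 'http://pl.wikipedia.org/wiki/Język_indonezyjski'));
```

The first call should report 1 and the second 0. The character classes only list ASCII characters, so the match cannot continue past "ę", and the lookahead then fails because the URL is not followed by whitespace, a closing tag or ")" at that point.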

BBCode method:

<?php
  // We cannot evaluate the variable in callback function because
  // there is no way to pass the $format variable
  if (variable_get("bbcode_encode_mailto_$format", 1)) {
    // Replacing email addresses with encoded html
    $body = preg_replace_callback('#\[email(?::\w+)?\]([\w\.\-\+~@]+)\[/email(?::\w+)?\]#si', '_bbcode_encode_mailto', $body);
    $body = preg_replace_callback('#\[email=(.*?)(?::\w+)?\](.*?)\[/email(?::\w+)?\]#si', '_bbcode_encode_mailto', $body);
  }
  else {
    $body = preg_replace(
      array('#\[email(?::\w+)?\](.*?)\[/email(?::\w+)?\]#si','#\[email=(.*?)(?::\w+)?\]([\w\s]+)\[/email(?::\w+)?\]#si'),
      array('<a href="mailto:\\1" class="bb-email">\\1</a>', '<a href="mailto:\\1" class="bb-email">\\2</a>'),
      $body);
  }

  // Turns web and e-mail addresses into clickable links
  if (variable_get("bbcode_make_links_$format", 1)) {

    // pad with a space so we can match things at the start of the 1st line
    $ret = ' ' . $body;
    // padding to already filtered links
    $ret = preg_replace('#(<a.+>)(.+</a>)#i', "$1\x07$2", $ret);

    // matches an "xxx://yyyy" URL at the start of a line, or after a space.
    // xxxx can only be alpha characters.
    // yyyy is anything up to the first space, newline, comma, double quote or <
    $ret = preg_replace('#(?<=^|[\t\r\n >\(\[\]\|])([a-z]+?://[\w\-]+\.([\w\-]+\.)*\w+(:[0-9]+)?(/[^ "\'\(\n\r\t<\)\[\]\|]*)?)((?<![,\.])|(?!\s))#i', '<a href="\1">\1</a>', $ret);

    // matches a "www|ftp.xxxx.yyyy[/zzzz]" kinda lazy URL thing
    // Must contain at least 2 dots. xxxx contains either alphanum, or "-"
    // zzzz is optional.. will contain everything up to the first space, newline,
    // comma, double quote or <.
    $ret = preg_replace('#([\t\r\n >\(\[\|])(www|ftp)\.(([\w\-]+\.)*[\w]+(:[0-9]+)?(/[^ "\'\(\n\r\t<\)\[\]\|]*)?)#i', '\1<a href="http://\2.\3">\2.\3</a>', $ret);

    // matches an email@domain type address at the start of a line, or after a space.
    // Note: Only the following chars are valid: alphanums, "-", "_" and/or ".".
    if (variable_get("bbcode_encode_mailto_$format", 1))
      $ret = preg_replace_callback("#([\t\r\n ])([a-z0-9\-_.]+?)@([\w\-]+\.([\w\-\.]+\.)*[\w]+)#i", '_bbcode_encode_mailto', $ret);
    else
      $ret = preg_replace('#([\t\r\n ])([a-z0-9\-_.]+?)@([\w\-]+\.([\w\-\.]+\.)*[\w]+)#i', '\\1<a href="mailto:\\2@\\3">\\2@\\3</a>', $ret);
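
For comparison, here is a small sketch that runs only the "xxx://yyyy" rule from the BBCode excerpt on its own. The pattern is copied from the code above; sketch_bbcode_links() is a made-up wrapper, and the example.com URL is just an illustration.

```php
<?php
// Absolute-URL pattern copied from the BBCode excerpt above.
$pattern = '#(?<=^|[\t\r\n >\(\[\]\|])([a-z]+?://[\w\-]+\.([\w\-]+\.)*\w+(:[0-9]+)?(/[^ "\'\(\n\r\t<\)\[\]\|]*)?)((?<![,\.])|(?!\s))#i';

// Hypothetical wrapper around that single rule.
function sketch_bbcode_links($pattern, $text) {
  // The module pads the body with a leading space; the same is done here.
  return preg_replace($pattern, '<a href="\1">\1</a>', ' ' . $text);
}

echo sketch_bbcode_links($pattern, 'http://pl.wikipedia.org/wiki/Język_indonezyjski'), "\n";
echo sketch_bbcode_links($pattern, 'http://example.com/a_(b)'), "\n";
```

The path part is defined by what it excludes rather than by a list of allowed characters, so the non-ASCII URL is linked in full. The same exclusion list contains "(" and ")", so in the second call the link should end after "a_" and "(b)" is left as plain text. Copying these rules as they are would therefore fix one case but not every URL.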

sdrycroft’s picture

The URL filter also fails if there are parentheses within an anchor:

Doesn't work:
http://iz.carnegiemnh.org/cranefly/tipulinae.htm#Tipula_(Nippotipula)_abdominalis

Works:
http://iz.carnegiemnh.org/cranefly/tipulinae.htm#Tipula_Nippotipula_abdo...
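
One possible way to allow parentheses inside a URL while still dropping a ")" that merely closes the surrounding sentence is to count them. The following is only a sketch under that assumption; sketch_split_trailing() is a made-up name, not an existing Drupal function, and the example.com URL is an illustration.

```php
<?php
// Hypothetical helper: splits a URL candidate into the URL proper and the
// trailing characters that should stay outside the link.
function sketch_split_trailing($candidate) {
  $url = $candidate;
  while ($url !== '') {
    $last = substr($url, -1);
    if (strpos('.,?!', $last) !== FALSE) {
      // Sentence punctuation is never kept at the end of the URL.
      $url = (string) substr($url, 0, -1);
    }
    elseif ($last === ')' && substr_count($url, ')') > substr_count($url, '(')) {
      // A ")" without a matching "(" inside the URL belongs to the text.
      $url = (string) substr($url, 0, -1);
    }
    else {
      break;
    }
  }
  return array($url, (string) substr($candidate, strlen($url)));
}

print_r(sketch_split_trailing('http://iz.carnegiemnh.org/cranefly/tipulinae.htm#Tipula_(Nippotipula)_abdominalis'));
print_r(sketch_split_trailing('http://example.com/a_(b)).'));
```

The first call should leave the URL untouched, since its parentheses are balanced and it does not end in punctuation. The second should return "http://example.com/a_(b)" as the URL and ")." as the remainder.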

kenorb’s picture

Status: Needs review » Active

Eric Schaefer’s picture

The same happens with German umlauts.

sun’s picture

Status: Active » Closed (duplicate)

Thanks for taking the time to report this issue.

However, I am marking this as a duplicate of #161217: URL filter breaks generated href tags.
You can follow up on that issue to track its status instead. If any information from this issue is missing in the other issue, please make sure you provide it over there.