testcase html:

<a href="http://drupal.org/" title="the official website">Drupal</a> is an open source content management platform.

result:

Drupal
is an open source content management platform.

expected result:

Drupal [1] is an open source content management platform.

[1] http://drupal.org/

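problem (the pattern only matches when href is the last attribute in the tag, so the title attribute breaks it):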

$pattern = '@(<a[^>]+?href="([^"]*)">(.+?)</a>)@i';

fix:

$pattern = '@(<a[^>]+?href="([^"]*)"[^>]*?>(.+?)</a>)@i';

Patch attached.
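To make the difference concrete, here is a minimal sketch that runs both patterns against the test case. The two patterns and the HTML are copied from this report; the helper example_links_to_footnotes() is hypothetical and only illustrates the expected footnote output. It is not the code in drupal_html_to_text().

```php
<?php
// Patterns and test HTML copied from the report above.
$old_pattern = '@(<a[^>]+?href="([^"]*)">(.+?)</a>)@i';
$new_pattern = '@(<a[^>]+?href="([^"]*)"[^>]*?>(.+?)</a>)@i';

$html = '<a href="http://drupal.org/" title="the official website">Drupal</a> is an open source content management platform.';

// Hypothetical helper (an assumption for illustration, not Drupal's code):
// replaces each matched link with "text [n]" and appends the collected
// URLs as numbered footnotes.
function example_links_to_footnotes($pattern, $html) {
  $urls = array();
  $text = preg_replace_callback($pattern, function ($m) use (&$urls) {
    $urls[] = $m[2];                          // capture group 2: the href value
    return $m[3] . ' [' . count($urls) . ']'; // capture group 3: the link text
  }, $html);
  foreach ($urls as $i => $url) {
    $text .= "\n\n[" . ($i + 1) . '] ' . $url;
  }
  return $text;
}

// The old pattern demands "> directly after the href value, so it misses
// any link that has further attributes after href.
var_dump(preg_match($old_pattern, $html)); // int(0)
var_dump(preg_match($new_pattern, $html)); // int(1)

echo example_links_to_footnotes($new_pattern, $html), "\n";
// Drupal [1] is an open source content management platform.
//
// [1] http://drupal.org/
```

Note that both patterns still assume double-quoted href values; single-quoted or unquoted attributes are outside what this sketch covers.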

If I see this right, drupal_html_to_text() also fails for UTF-8 text with non-ASCII characters, but that's another issue.

Attachment: drupal_html_to_text.patch (844 bytes, author: ax)

Comments

Gábor Hojtsy:

Status: Needs review » Fixed

That is pretty straightforward and trivial. Committed. Thanks.

Anonymous:

Status: Fixed » Closed (fixed)