We are currently experiencing problems with oEmbed when the body contains certain HTML.

I have tracked down the problem to line 14 in oembed_legacy.inc:

$text = preg_replace_callback("[regex]", '_oembed_preg_parse', $text);

The above regex seems to be a bit to hard to complete when the body contains something like the following:

<p style=" color:#c9c8cd; font-family:Arial,sans-serif; font-size:14px; line-height:17px; margin-bottom:0; margin-top:8px; overflow:hidden; padding:8px 0 7px; text-align:center; text-overflow:ellipsis; white-space:nowrap;">
</p>

(the inline-styling are not ours, this markup comes from embedding an instagram-post)

It seems that it fails because it is not a possible match to have no URL. This means that it has to try to match the string with any given value, backtracking every time a value is not found. This results in a lot of backtracking potentially. Adding a ? after the URL-matching will allow it to pass when no URL is found, avoiding a lof of the backtracking. Also, with a more complex body-field (the entire embed-code) it fails at group 6. The same fix applies there, adding a ? after the [ \n\r\t\)] in that group fixes it.

So the proposed regex is: `(^|<p(?:\s[^>]*)*>|<li(?:\s[^>]*)*>|<br(?:\s[^>]*)*>|[ \n\r\t\(])((http://|https://|ftp://|mailto:|smb://|afp://|file://|gopher://|news://|ssl://|sslv2://|sslv3://|tls://|tcp://|udp://)([a-zA-Z0-9@:%_+*~#?&=.,/;-]*[a-zA-Z0-9@:%_+*~#&=/;-]))?([.,?!]*?)(?=($|</p>|</li>|<br\s*/?>|[ \n\r\t\)]?))`i

Also, setting the case-insensitive-switch and adding A-Z seems redundant.

So the improved regex-expression would be:
`(^|<p(?:\s[^>]*)*>|<li(?:\s[^>]*)*>|<br(?:\s[^>]*)*>|[ \n\r\t\(])((http://|https://|ftp://|mailto:|smb://|afp://|file://|gopher://|news://|ssl://|sslv2://|sslv3://|tls://|tcp://|udp://)([a-z0-9@:%_+*~#?&=.,/;-]*[a-z0-9@:%_+*~#&=/;-]))?([.,?!]*?)(?=($|</p>|</li>|<br\s*/?>|[ \n\r\t\)]?))`i

I will create a patch and attach in a moment.

CommentFileSizeAuthor
#2 oembed_legacy_regex-2602294-2.patch1.1 KBmian3010
Support from Acquia helps fund testing for Drupal Acquia logo

Comments

mian3010 created an issue. See original summary.

mian3010’s picture

Attached patch