I'm using the media_youtube module with the eminline module to transform YouTube URLs into embedded videos. YouTube video IDs can contain hyphens (-), and URLs with such IDs are not transformed. I've tracked the problem down to the regex in the _eminline_url() function. I'm surprised that this issue hasn't been noticed in the year since the regex was committed in #811090: Embedded Inline Media filter regex out of date - URL can't be wrapped in HTML tags.

(?<=[\s(>])((http://|https://)([a-zA-Z0-9@:%_+*~#?&=.,/;-\[\]]*[a-zA-Z0-9@:%_+*~#&=/;-\[\]]))(?=[.,?!\s)<])

In the current regex the hyphen sits in the middle of each character class but is not escaped, so it is read as a range operator (;-\[) rather than as a literal hyphen. Simply escaping the hyphen, as follows, resolves the issue for me.

(?<=[\s(>])((http://|https://)([a-zA-Z0-9@:%_+*~#?&=.,/;\-\[\]]*[a-zA-Z0-9@:%_+*~#&=/;\-\[\]]))(?=[.,?!\s)<])
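To see what the unescaped hyphen does, it can help to list the characters that the tail of the character class accepts in each version. This is a minimal sketch in Python rather than PHP; it assumes Python's `re` treats these character classes the same way PCRE does, and `matched_chars` is a hypothetical helper written only for this illustration.

```python
import re

# Tail of the character class as written in the current regex (hyphen unescaped)
# and as written in the proposed fix (hyphen escaped).
UNESCAPED = re.compile(r"[;-\[\]]")
ESCAPED = re.compile(r"[;\-\[\]]")


def matched_chars(pattern):
    """Return every printable ASCII character accepted by the character class."""
    return "".join(chr(c) for c in range(0x20, 0x7F) if pattern.fullmatch(chr(c)))


unescaped_chars = matched_chars(UNESCAPED)
escaped_chars = matched_chars(ESCAPED)

print(unescaped_chars)  # everything from ';' through '[', plus ']'
print(escaped_chars)    # only the four literal characters
```

Unescaped, the class accepts the whole range from ; to [, which includes < and > but not the hyphen itself. Escaped, it accepts only the four literal characters.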

My PHP version is 5.3.10 in case you care.

Comment    File    Size    Author
#1    eminline-Regex_hyphen_fix-1868588-1.patch    720 bytes    tangent

Comments

tangent’s picture

Here is a patch.

tangent’s picture

Status: Active » Needs review

Sorry, forgot to change status.

tangent’s picture

Below is a URL which fails to transform for me, due to the hyphen in the URL.

http://www.youtube.com/watch?v=P0xVp3N-M84
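One way to reproduce this outside Drupal is to run the pattern from the issue summary against the URL directly. The sketch below uses Python's `re` as a stand-in for PHP's PCRE (an assumption), and the surrounding sentence in `text` is made up for the example. It shows only what the regex captures, not what the module then does with it.

```python
import re

# Pattern as quoted in the issue summary (hyphen unescaped inside the classes).
CURRENT = re.compile(
    r"(?<=[\s(>])((http://|https://)"
    r"([a-zA-Z0-9@:%_+*~#?&=.,/;-\[\]]*[a-zA-Z0-9@:%_+*~#&=/;-\[\]]))"
    r"(?=[.,?!\s)<])"
)

# Hypothetical input: the failing URL surrounded by plain text.
text = "Watch http://www.youtube.com/watch?v=P0xVp3N-M84 now"

match = CURRENT.search(text)
current_url = match.group(1) if match else None
print(current_url)
```

The character class stops at the hyphen, and the regex then backtracks to the last position followed by one of the allowed trailing characters (here the ?). The captured URL is therefore cut off before the query string and the video ID is lost.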

reswild’s picture

Status: Needs review » Reviewed & tested by the community

I've tested the patch, and it seems to be working as it should.

zoo33’s picture

OMG yes. RTBC!

The current regex breaks URLs without hyphens too. Example:

<p>http://www.youtube.com/watch?v=2eDW7W_au6Y</p><p>Some nice music therapy for all of you.</p>

gets filtered down to:

http://www.youtube.com/watch?v=2eDW7W_au6Y</p><p>Some

…breaking the markup of the page. The video shows, but strange things happen with the content that comes after.

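If you don't escape the hyphen, it acts as a range ("from/to") operator inside the character class. That makes the regex too greedy in my case, because the range also accepts < and >, and too narrow in other cases, because a literal hyphen no longer matches. The patch fixes it.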
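The difference can be checked by running both versions of the pattern against the markup above. This is a sketch using Python's `re` in place of PHP's PCRE (an assumption); the names `CURRENT`, `PATCHED`, `PREFIX` and `SUFFIX` are introduced only for this illustration.

```python
import re

# Shared lookbehind/scheme prefix and lookahead suffix from the issue summary.
PREFIX = r"(?<=[\s(>])((http://|https://)"
SUFFIX = r"(?=[.,?!\s)<])"

# Current regex: unescaped hyphen forms the range ;-[ inside each class.
CURRENT = re.compile(
    PREFIX
    + r"([a-zA-Z0-9@:%_+*~#?&=.,/;-\[\]]*[a-zA-Z0-9@:%_+*~#&=/;-\[\]]))"
    + SUFFIX
)
# Patched regex: hyphen escaped, so it is a literal character.
PATCHED = re.compile(
    PREFIX
    + r"([a-zA-Z0-9@:%_+*~#?&=.,/;\-\[\]]*[a-zA-Z0-9@:%_+*~#&=/;\-\[\]]))"
    + SUFFIX
)

markup = (
    "<p>http://www.youtube.com/watch?v=2eDW7W_au6Y</p>"
    "<p>Some nice music therapy for all of you.</p>"
)

current_match = CURRENT.search(markup).group(1)
patched_match = PATCHED.search(markup).group(1)
print(current_match)
print(patched_match)

# The patched pattern also keeps a hyphenated video ID intact.
hyphen_text = "Watch http://www.youtube.com/watch?v=P0xVp3N-M84 now"
patched_hyphen = PATCHED.search(hyphen_text).group(1)
print(patched_hyphen)
```

With the current pattern the match runs through the closing and opening paragraph tags and stops at the first space. With the patched pattern it stops at the end of the URL.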

ron_s’s picture

Title: URL detection regex does not match hyphens » URL detection regex does not match hyphens / breaks HTML markup

Very nice, thank you for this patch... can confirm it is working for us as well. Like zoo33, we were seeing embedded videos break the markup regardless of whether or not there was a hyphen. For example, with HTML like this:

<p><a href="http://www.youtube.com/watch?v=RmMojHCPMJ4">http://www.youtube.com/watch?v=RmMojHCPMJ4</a></p>
<p>This is sample text</p>

... the regex was causing the embedded video HTML to break. The video would be rendered, but PHP would try to fix the broken HTML by wrapping both the video and the text which follows it in anchor tags. So the "This is sample text" in my example above would be converted to this in the final version.

<p><a href="http://www.youtube.com/watch?v=RmMojHCPMJ4">This is sample text</a></p>

Thanks for taking time to share a solution. I've added to the title in case anyone might be looking for this regarding the broken HTML issue.

  • Commit d33c125 on 6.x-2.x by aaron:
    Issue #1868588 by tangent: URL detection regex does not match hyphens /...
aaron’s picture

Status: Reviewed & tested by the community » Fixed

Committed to http://drupalcode.org/project/emfield.git/commit/d33c125.

Sorry, and thank you for your patience.

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.