Problem/Motivation
URL replacement for Youtube and Vimeo works great when the URLs are in the body of text on their own. However, if you have a Youtube URL as an HTML element attribute (for example, an tag's href attribute) then the module attempts to also replace those links. Which causes all sorts of broken HTML. This is not limited to just tags, this could cause all sorts of problems with data attributes, or other similar things.
Steps to reproduce
Create some content with the following HTML
<p>This is my youtube URL: https://www.youtube.com/watch?v=HmZKgaHa3Fg</p>
This is a <a href="https://www.youtube.com/watch?v=S7SLep244ss">link to another youtube video</a>
If you attempt to process that, the video in the anchor href will also be replaced and you will end up with some very weird HTML elements in your DOM.
Proposed resolution
Modify the regular expressions such that they will not match URLs if they have any character other than the following ones preceding it:
Any space character (\s)
A closing > character
A closing ] character
Or if the link is the very first thing on the line
This will prevent any HTML attributes from triggering the replacement, unless of course they are badly formed and have spaces or those other characters in them, which they should not.
We can also consolidate the different Youtube regexes into one and clean up the regexes to make it more strict. For example, at the moment during the URL replacement it could in theory match URLs like so:
swww.youtube.com
httpwww.youtube.com
syoutube.com
There is another issue that touches on some of this code too, so I have incorporated those changes into my patch too:
| Comment | File | Size | Author |
|---|---|---|---|
| #6 | url_to_video_filter-html_attribute_issues-3191775-6.patch | 13.2 KB | huzooka |
| #3 | url_to_video_filter-html_attribute_issues-3191775-3.patch | 5.06 KB | owenbush |
| #3 | interdiff-2-3.txt | 1.9 KB | owenbush |
Comments
Comment #2
owenbush commentedAttaching a patch to address all the issues outlined. This is being moved to needs review now there is a patch.
Comment #3
owenbush commentedAn issue was discovered with youtu.be URLs which I should now have addressed. Attached is an interdiff and a new patch.
Comment #4
huzookaI had issues with additional query args, so I improved #3 a bit:
https://www.youtube.com/watch?v=iutmWwkOcYg&t=20s(previously the idendified ID wasiutmWwkOcYgt=20s, now it isiutmWwkOcYg)v,UrlToVideoFilterService::convertYouTubeUrlToEmbedCodewas not able to find the ID - e.g.https://www.youtube.com/watch?app=desktop&v=iutmWwkOcYNow it is able to get the ID.UrlToVideoFilterService::convertYouTubeUrlToEmbedCodereturns an embed placeholder with the whole link as ID. So I changed all thepreg_replace()functions withpreg_match()- it there is no match,UrlToVideoFilterService::convertYouTubeUrlToEmbedCodereturns NULL.?mark. With this patch the issue is gone.(With the changes I applied unit test is passing on my side.)
Comment #5
huzookaMy following test "string" still fails:
I get
Why? Because in
UrlToVideoFilterService::convertYouTubeUrls(), every occurrence of the first link gets replaced by its embed code, which means that the second link,https://youtu.be/iutmWwkOcYg?t=10will be also partially replaced, excluding the query arg(s) (in this case,?t=10).I went ahead and introduced a much more stricter replacement performed by
preg_replace()(see interdiff).Comment #6
huzookaHad to fix the YouTube pattern: their video IDs can contain underscores!
Comment #7
jaypanThank you for your work on this. This has been implemented in 3.x.x.