Problem/Motivation

URL replacement for Youtube and Vimeo works great when the URLs are in the body of text on their own. However, if you have a Youtube URL as an HTML element attribute (for example, an tag's href attribute) then the module attempts to also replace those links. Which causes all sorts of broken HTML. This is not limited to just tags, this could cause all sorts of problems with data attributes, or other similar things.

Steps to reproduce

Create some content with the following HTML

<p>This is my youtube URL: https://www.youtube.com/watch?v=HmZKgaHa3Fg</p>

This is a <a href="https://www.youtube.com/watch?v=S7SLep244ss">link to another youtube video</a>

If you attempt to process that, the video in the anchor href will also be replaced and you will end up with some very weird HTML elements in your DOM.

Proposed resolution

Modify the regular expressions such that they will not match URLs if they have any character other than the following ones preceding it:

Any space character (\s)
A closing > character
A closing ] character
Or if the link is the very first thing on the line

This will prevent any HTML attributes from triggering the replacement, unless of course they are badly formed and have spaces or those other characters in them, which they should not.

We can also consolidate the different Youtube regexes into one and clean up the regexes to make it more strict. For example, at the moment during the URL replacement it could in theory match URLs like so:

swww.youtube.com
httpwww.youtube.com
syoutube.com

There is another issue that touches on some of this code too, so I have incorporated those changes into my patch too:

#3190674: Fix query separator in YouTube parser

Comments

owenbush created an issue. See original summary.

owenbush’s picture

Status: Active » Needs review
StatusFileSize
new5.06 KB

Attaching a patch to address all the issues outlined. This is being moved to needs review now there is a patch.

owenbush’s picture

StatusFileSize
new1.9 KB
new5.06 KB

An issue was discovered with youtu.be URLs which I should now have addressed. Attached is an interdiff and a new patch.

huzooka’s picture

I had issues with additional query args, so I improved #3 a bit:

  1. Video IDs are identified in links like https://www.youtube.com/watch?v=iutmWwkOcYg&t=20s (previously the idendified ID was iutmWwkOcYgt=20s, now it is iutmWwkOcYg)
  2. If the link's first arg wasn't v, UrlToVideoFilterService::convertYouTubeUrlToEmbedCode was not able to find the ID - e.g. https://www.youtube.com/watch?app=desktop&v=iutmWwkOcY Now it is able to get the ID.
  3. If anything goes wrong, UrlToVideoFilterService::convertYouTubeUrlToEmbedCode returns an embed placeholder with the whole link as ID. So I changed all the preg_replace() functions with preg_match() - it there is no match, UrlToVideoFilterService::convertYouTubeUrlToEmbedCode returns NULL.
  4. With the patch #3, the discovered YouTube video IDs ended with a ? mark. With this patch the issue is gone.

(With the changes I applied unit test is passing on my side.)

huzooka’s picture

StatusFileSize
new12.79 KB
new6.01 KB

My following test "string" still fails:

<p>Youtu.be:</p>

<p>https://youtu.be/iutmWwkOcYg</p>

<p>Youtu.be with start time:</p>

<p>https://youtu.be/iutmWwkOcYg?t=10</p>

I get

<p>Youtu.be:</p>

<player of https://youtu.be/iutmWwkOcYg />

<p>Youtu.be with start time:</p>

<player of https://youtu.be/iutmWwkOcYg />?t=10

Why? Because in UrlToVideoFilterService::convertYouTubeUrls(), every occurrence of the first link gets replaced by its embed code, which means that the second link, https://youtu.be/iutmWwkOcYg?t=10 will be also partially replaced, excluding the query arg(s) (in this case, ?t=10).

I went ahead and introduced a much more stricter replacement performed by preg_replace() (see interdiff).

huzooka’s picture

StatusFileSize
new13.2 KB
new10.34 KB

Had to fix the YouTube pattern: their video IDs can contain underscores!

jaypan’s picture

Status: Needs review » Fixed

Thank you for your work on this. This has been implemented in 3.x.x.

Now that this issue is closed, review the contribution record.

As a contributor, attribute any organization that helped you, or if you volunteered your own time.

Maintainers, credit people who helped resolve this issue.

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.