Problem/Motivation

For a while, as per #3186164: oEmbed providers.json returns errors when retrieved from Drupal, all pages which display oEmbed content coming from YouTube would fail with an exception:

The fetched resource did not have a valid Content-Type header.

This is because the fetched resources were returning text/html as their Content-Type header, which our oEmbed system doesn't recognize.

The original bug has since been fixed by YouTube, but the oEmbed system has been extended by contrib (for example, the oEmbed Providers module) and could be used to interact with virtually any provider, which could return any sort of Content-Type header. Therefore, it makes sense to make the oEmbed system a little less strict overall when it sees an unexpected Content-Type header. This will also make us more resistant to similar problems in the future, should they happen at YouTube or Vimeo (the two providers core supports OOTB) again.

Steps to reproduce

N/A

Proposed resolution

Since JSON seems to be (anecdotally) the most popular format for oEmbed data, let's keep our current allowance for XML, but otherwise just default to treating oEmbed data as JSON, regardless of the Content-Type header.

Remaining tasks

Merge it!

User interface changes

None.

API changes

None.

Data model changes

None.

Release notes snippet

TBD

Issue fork drupal-3186415

Command icon Show commands

Start within a Git clone of the project using the version control instructions.

Or, if you do not have SSH keys set up on git.drupalcode.org:

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

dan2k3k4 created an issue. See original summary.

pookmish’s picture

I've started to notice the same problem earlier this week and it is now occurring on every attempt. The suggested fix does solve the issue.

My suggestion would be to combine that new else condition with the text/javascript conditions. It would safeguard any unknown issue when the headers return a json or javascript response but the content is malformed.

Also I would guess tests are in need for this fix.

philltran’s picture

I have been running into the same issue.

Thanks for posting up your fix. Arbitrarily parsing the data as json scares me as we have no control on what is coming back from the oembed provider especially if the provider is not YouTube.

Oddly add `&format=json` to the end of the youtube oembed url does not change the content-type header.

almunnings’s picture

Related issue upstream on Google

https://issuetracker.google.com/issues/174576256

randell’s picture

#4 patch works for Drupal 9.0.8.

chrisfromredfin’s picture

I discovered similar. The site I'm working on is peppered with YouTube videos, and this exception was whitescreening any page with this. I assume YouTube will fix their header at some point in the nearish future, but until then, I did similar:

+    elseif (strstr($format, 'text/html')) {
+      @json_decode($content);
+      if (json_last_error() == JSON_ERROR_NONE) {
+        $data = Json::decode($content);
+      }
+      else {
+        // Bat country.
+        throw new ResourceException('The fetched resource did not have a valid Content-Type header.', $url);
+      }
+    }

The more correct thing would probably be to wrap the logic for attempted JSON parsing in it's own function, like isJson(), but I didn't know where that should live in Drupal - a utility class, or just a private method on this class or what...

larowlan’s picture

Status: Active » Postponed
Issue tags: +Bug Smash Initiative

This is an upstream bug

AnticoVeneziano’s picture

didn't worked form (drupal 9.1)
not able to patch: "error: input non riconosciuto"
patch applied by hand: video don't show...
sorry

LOBsTerr’s picture

It's also happens for Drupal 8.9

philltran’s picture

I applied patch #4 and it is working for me on 8.9 as a temporary measure until google corrects the upstream issue.

stuckinconcretejungle’s picture

I applied patch #4 and it is working for me on 8.9 too, thanks.

TechnoTim2010’s picture

I am seeing the same issue, have a site making heavy use of youtube oembeds to show online content for an offline theatre due to COVID.

More annoyingly destroys our advent calendar.

I was wondering why noone was watching the videos.
Running 8.9.10

Tried patch #4 but not solved the issue.

TechnoTim2010’s picture

From related issue https://www.drupal.org/project/drupal/issues/3186164#comment-13925067
This drush command fixed it for me.
drush cset media.settings oembed_providers_url https://gist.githubusercontent.com/Berdir/75849d8b01bafbb15264bd93ced5a2f2/raw/3f20bd0a163f71fa46a1b80dc53fd08890dfc36e/providers.json
Thanks to @mathieucarset

benjarlett’s picture

Whatever that juju is in #15 it's impressive and worked for me too :)

Sergiy Bovdyr’s picture

Thank you so much! Fix number 4 works for me (Drupal 9.1). Added this to my file composer.json and run composer update.

    "require": {
        "cweagans/composer-patches": "^1.6",

   -- // --

        "patches": {
            "drupal/core": {
                "view() error": "https://www.drupal.org/files/issues/2020-12-04/3186415-8.9.x-youtube-headers-html.patch"
            }
        }
Sergiy Bovdyr’s picture

AnticoVeneziano’s picture

even the #17 didn't work... patch correctly applied by composer
(cache cleaned five or six times...) nothing happens
still same error
it's incredible! a lot of videos to show...

AndrewsizZ’s picture

Patch works for drupal 9. Seems like youtube today changed API

meder’s picture

#4 (= #17) worked for me at Drupal 9.1

sense-design’s picture

Patch #4 works for me on Drupal 8.9.11

Alejo D’s picture

I agreed with this patch, i did the same workaround to fix it but seems like a final solution because YouTube now returns a JSON as text/html.

Tested #4 on D 9.1

danheisel’s picture

Thanks!!! Patch in #4 works brilliantly with 8.9.11. Saved me from a few hours of trouble shooting this morning. Cheers!!

komlenic’s picture

It's probably not necessary, but I'll add another confirmation that the patch in #4 fixes this on 8.9.10 and 8.9.11.

MariaJ’s picture

Patch #4 works for me, Drupal 8.9.6
Thanks!!

othermachines’s picture

I just ran into this as well. Patch in #4 works great as a temporary fix while things get fixed upstream. Thanks!

cilefen’s picture

I would change the exception message in #4 to simply "The fetched resource could not be parsed as JSON." if this becomes a permanent fix. The reason for that is that text/html actually would be a valid Content-Type header according to the new code.

philltran’s picture

Google has responded to the issue created regarding this issue. A fix is supposed to be released on Monday.

https://issuetracker.google.com/issues/174576256

ksenzee’s picture

#28 Adding this workaround to Drupal doesn't make JSON served as text/html into a valid oEmbed response. It's just an invalid response we have a workaround for. As it happens, we already have a couple of workarounds in that code: we allow for application/xml and text/javascript Content-Type headers, but if servers return anything but application/json or text/xml, they're violating section 2.3 of the oEmbed spec. I think the exception message is okay as is.

phenaproxima’s picture

I'm of mixed mind here. On the one hand, I agree with @ksenzee that we are respecting the oEmbed spec as written, and providers are violating it, and that therefore we should limit the number of workarounds we're willing to add to core.

On the other hand...since adding the oEmbed system to core, we have discovered that, in practice, providers not infrequently violate the spec with abandon. And it's not too likely we're going to get upstream fixes made to, y'know, YouTube or SoundCloud or others. We just kinda have to take what we're given, because not supporting YouTube isn't an option.

So I'm not super sure how to go about future-proofing this. That said, this particular situation is an upstream bug, so it's probably okay to wait on Google to fix it before seeing if we need changes in core.

Chris Matthews’s picture

From https://noembed.com/

oEmbed is nice. Unfortunately, not everything supports oEmbed. Worse, the sites that do support it don't provide a consistent interface. Noembed provides a single url to get embeddable content from a large list of sites, even sites without oEmbed support!

Additionally, Noembed guarantees that all responses will have html, title, url, and provider_name fields. This means fewer special cases spent building up your own HTML.

A simple demo is available here.

Usage

Treat Noembed like a regular oEmbed provider, but use any of the supported sites for the url parameter. Noembed also supports a callback parameter for JSONP.

An example request might look like this:

http://noembed.com/embed?url=http%3A//www.youtube.com/watch%3Fv%3DbDOYN-6gdRE&callback=my_embed_function

And the response will look like:

my_embed_function(
  {
    "width" : 425,
    "author_name" : "schmoyoho",
    "author_url" : "http://www.youtube.com/user/schmoyoho",
    "version" : "1.0",
    "provider_url" : "http://www.youtube.com/",
    "provider_name" : "YouTube",
    "thumbnail_width" : 480,
    "thumbnail_url" : "http://i3.ytimg.com/vi/bDOYN-6gdRE/hqdefault.jpg",
    "height" : 344,
    "thumbnail_height" : 360,
    "html" : "<iframe type='text/html' width='425' height='344' src='http://www.youtube.com/embed/bDOYN-6gdRE' frameborder=0></iframe>",
    "url" : "http://www.youtube.com/watch?v=bDOYN-6gdRE",
    "type" : "rich",
    "title" : "Auto-Tune the News #8: dragons. geese. Michael Vick. (ft. T-Pain)"
  }
)
larowlan’s picture

Phenaproxima - is the fix here to catch the exception and output a log message/fallback?

I don't think we should be special casing providers that don't meet the spec

kreynen’s picture

Just had this happen as well.

phenaproxima’s picture

Phenaproxima - is the fix here to catch the exception and output a log message/fallback?

I'm not sure what the "fix" is, exactly, since this is a case where YouTube is sending us bad data due to an apparent bug on their end. So, technically, we don't need to do anything.

However, I did have a thought for how to make this more fault-tolerant: how about we allow the resource fetcher to take, as a constructor parameter, an object that implements \Drupal\Component\Serialization\SerializationInterface to use as the fallback decoder? If the fetched resource has an unknown Content-Type header, we just pass it to the fallback decoder and hope for the best. If the decoder throws InvalidDataTypeException, we catch that and re-throw it, wrapped in a ResourceException.

Thoughts? IMHO, this approach would improve the robustness of the resource fetcher, which seems necessary given how much of a wild west the oEmbed world can be, while avoiding special cases for specific providers.

phenaproxima’s picture

Title: OEmbed throws fetched resource did not have a valid Content-Type header for text/html responses » Make oEmbed resource fetcher more tolerant of unexpected Content-Type headers
Status: Postponed » Needs review

Re-titling in light of the potential direction we can take here, which I've implemented in the merge request.

amjad1233’s picture

This has been resolved upstream.

Reference :
https://issuetracker.google.com/issues/174576256

b2f’s picture

This patch could still be useful, but I confirm the original YouTube problem as been fixed by Google as of now.

Alejo D’s picture

Seems like Youtube still having the issue with some videos.

curl -I "https://www.youtube.com/oembed?url=https://www.youtube.com/watch?v=cxxeRzfJ1_c"
HTTP/2 401
expires: Tue, 27 Apr 1971 19:44:06 GMT
content-type: text/html; charset=utf-8
p3p: CP="This is not a P3P policy! See http://support.google.com/accounts/answer/151657?hl=es-419 for more info."
strict-transport-security: max-age=31536000
cache-control: no-cache
x-content-type-options: nosniff
content-length: 0
date: Wed, 09 Dec 2020 14:08:46 GMT
server: YouTube Frontend Proxy
x-xss-protection: 0
alt-svc: h3-29=":443"; ma=2592000,h3-T051=":443"; ma=2592000,h3-Q050=":443"; ma=2592000,h3-Q046=":443"; ma=2592000,h3-Q043=":443"; ma=2592000,quic=":443"; ma=2592000; v="46,43"
phenaproxima’s picture

Version: 9.1.x-dev » 9.2.x-dev

Switching this to 9.2.x, since that's the active development branch. Hopefully this will be backported as well, but that's up to committers.

kaynen’s picture

Status: Needs review » Reviewed & tested by the community

The patch in #37 continues to work and seems to add some additional error handling that would be useful. The video in #39 likely has "Allow Embedding" turned off in my testing. The response with HTTP/2 401 is always in the response with any video I test from Youtube that has embedding turned off.

phenaproxima’s picture

Crediting @kaynen for manual testing we did together during DrupalCon Europe 2020.

alexpott’s picture

Status: Reviewed & tested by the community » Needs work

@larowlan has added unresolved comments to the merge request

pianomansam’s picture

For those of us still on 8.9 and have applied patch #4, there is a flaw. If the decoded JSON object doesn't return an array, then ResourceFetcher::createResource() will complain. This will cause you to be unable to unpublish (or possibly make other edits) to the media item. You'll also receive a 500 from media fields using the media library.

I encountered this issue when a YouTube video that was in the media library became unpublished.

Here is an updated patch that double-checks that the JSON object decodes correctly to an array.

Note that this issue doesn't exist prior to applying patch #4. So if another solution comes along, I'm sure this won't be an issue. This only applies to sites using patch #4.

pianomansam’s picture

Apologies for the bad patch.

cilefen’s picture

+    elseif (strstr($format, 'text/javascript') || strstr($format, 'application/json') || strstr($format, 'text/html')) {
+      try {
+        $data = Json::decode($content);
+        if (!is_array($data)) {
+          throw new ResourceException('Could not retrieve the oEmbed resource.', $url);
+        }
+      }
+      catch (InvalidDataTypeException $e) {
+        throw new ResourceException('The fetched resource did not have a valid Content-Type header and could not be parsed with JSON', $url);
+      }

The exception message wording could be refined. "'Could not retrieve the oEmbed resource" may not be accurate because it actually means that data could not be decoded was decoded but was not formatted as expected. "The fetched resource did not have a valid Content-Type header and could not be parsed with JSON" should probably read "...or could not be parsed as JSON".

walangitan’s picture

Updated the patch in #45 with the suggestions in #46. This patch was rerolled incorrectly, refer to #48.

walangitan’s picture

Hiding #47 as it was rerolled incorrectly.

Updated the patch in #45 with the suggestions in #46. This is for 8.9.x.

cilefen’s picture

Status: Needs work » Needs review
phenaproxima’s picture

Status: Needs review » Needs work

This still needs work, to address the comments in @larowlan's review of the merge request. We probably should cease using patches in this issue; the merge request supersedes them.

phenaproxima’s picture

Assigned: Unassigned » phenaproxima

Self-assigning to deal with Lee's review.

ThirstySix’s picture

#48 Didn't work even patch correctly applied by composer.json.
I tried with &format=json. No luck with version 8.9.11 :(
Same issue "Could not retrieve the oEmbed resource."

Ghost of Drupal Past’s picture

Ghost of Drupal Past’s picture

FileSize
1.53 KB

Whopsie, that was not against core.

ThirstySix’s picture

#54 Didn't work. Youtube embed shows "403 Forbidden" error
Tested with 8.9.11

Berdir’s picture

This patch was specifically for a temporary regression in youtube that has since been fixed. If you get access denied then it means the video is unpublished or can't be embedded, see #44, has nothing to do with this problem. And if you get other errors, it could for example be that your URL is wrong and contains other arguments like list which won't work.

Lendude’s picture

FYI there does sees to be a new regression that causes the same error to show up, but for a completely different reason. If you use a youtube URL without a www. prefix, it will throw a 403. It didn't before, it does now.

So completely unrelated and again what seems to be an upstream problem, but posting here because this seems to be the issue that people look at first, opened #3188842: Youtube oEmbed without www prefix shows "403 Forbidden" error and shows "Could not retrieve the oEmbed resource." for it

larowlan’s picture

Status: Needs review » Reviewed & tested by the community

The merge request https://git.drupalcode.org/issue/drupal-3186415/-/tree/3186415-oembed-th... looks RTBC to me.

To be clear I'm RTBCing the merge request, not the patch

alexpott’s picture

Status: Reviewed & tested by the community » Needs review

Why are we passing in the fallback decoder? It seems an unnecessary complexity. How about we change
elseif (strstr($format, 'text/javascript') || strstr($format, 'application/json')) {
to
else {
and always assume JSON. And wrap $data = Json::decode($content); in a try catch. For me that'd be following the robustness principle.

phenaproxima’s picture

Why are we passing in the fallback decoder? It seems an unnecessary complexity. How about we change
elseif (strstr($format, 'text/javascript') || strstr($format, 'application/json')) {
to
else {
and always assume JSON. And wrap $data = Json::decode($content); in a try catch. For me that'd be following the robustness principle.

I'm not sure I concur here. IMHO, the fallback decoder is just that -- a swappable fallback for situations we don't recognize, because we know that some providers don't obey the oEmbed specification as strictly as they maybe should. If a resource returns a known Content-Type header, we don't need to fall back. We know exactly how to handle those.

But if we don't recognize the Content-Type header, JSON might very well not be a safe assumption (which would be a rare situation, but if it arose, that's exactly why the fallback decoder is swappable). For example, what if it were an XML response, but with a text/html MIME type? In such a case, you'd almost certainly want to inject a fallback decoder, since it might not be possible to change the upstream behavior. The point of adding the fallback decoder is to empower developers to handle strange situations.

So to me, the way the code is currently structured is much clearer about what we will accept vs. what we will try to accept, and helps developers deal with weirdness from oEmbed providers.

Just my $.02.

Ghost of Drupal Past’s picture

And wrap $data = Json::decode($content); in a try catch

Json::decode does not throw exceptions.

larowlan’s picture

about we change
elseif (strstr($format, 'text/javascript') || strstr($format, 'application/json')) {
to
else {
and always assume JSON

That works for me, but with the caveat that there's no exception so we'd be checking for false or json_last_error instead of catching an exception

Originally I was against special casing one broken provider, but this isn't doing that, its just assuming that json is more likely than XML, which I can live with.

Version: 9.2.x-dev » 9.3.x-dev

Drupal 9.2.0-alpha1 will be released the week of May 3, 2021, which means new developments and disruptive changes should now be targeted for the 9.3.x-dev branch. For more information see the Drupal core minor version schedule and the Allowed changes during the Drupal core release cycle.

kim.pepper’s picture

Issue tags: +#pnx-sprint

longwave’s picture

Status: Needs review » Reviewed & tested by the community

OEmbed endpoints seem to like to make unexpected changes on us, so this should help protect against that. The new test coverage handles a few more cases where we might see strange responses.

effulgentsia’s picture

The YouTube example that's in the issue summary has been fixed by YouTube. Are there any current examples of oEmbed providers that return an incorrect Content-Type header?

phenaproxima’s picture

Status: Reviewed & tested by the community » Needs review
Issue tags: +Needs issue summary update

I'm not aware of any others with this problem that are supported specifically by core; we only do YouTube and Vimeo by default. That said, modules like oEmbed Providers let sites interact with a virtually any provider, which may do anything, so slightly increasing the fault tolerance here seems like a wise idea. I'll update the issue summary and re-categorize this as a feature request.

phenaproxima’s picture

Category: Bug report » Feature request
Issue summary: View changes
Status: Needs review » Reviewed & tested by the community
Issue tags: -Needs issue summary update

Rewrote the issue summary as a feature request, so restoring RTBC.

phenaproxima’s picture

Issue summary: View changes
phenaproxima’s picture

Issue summary: View changes
effulgentsia’s picture

Status: Reviewed & tested by the community » Needs review

For me, this should be a "works as designed". From https://www.w3.org/Protocols/rfc2616/rfc2616-sec7.html#sec7.2.1:

If and only if the media type is not given by a Content-Type field, the recipient MAY attempt to guess the media type via inspection of its content...

Therefore, to comply with HTTP specification, I don't think we should treat the response body as JSON just because it happens to be parseable as JSON, when the server is explicitly telling us that it's HTML.

the oEmbed system has been extended by contrib (for example, the oEmbed Providers module) and could be used to interact with virtually any provider, which could return any sort of Content-Type header

If contrib wants to break HTTP specification in order to compensate for a provider's bug, it can implement a Guzzle middleware to intercept the response and change the response header.

phenaproxima’s picture

I think we have to kung-fu fight now, @effulgentsia. :) I feel strongly in the opposite direction.

While I think that everything you've said is technically correct, I have to ask: how do we benefit from strict compliance with HTTP in this case? I don't see how we, or anyone, does. Because if there is one thing we have learned the hard way since the oEmbed system was committed, it's that providers occasionally do weird things, and stuff breaks from time to time. When that happens, it is site builders, authors, and end users who are punished. Their sites stop working, without warning, due to no fault of their own, and they are left without any immediate corrective action they can take, unless they have the skill and time to hack their way through custom development (assuming a workaround is even possible).

I cannot imagine that such an experience increases anybody's confidence in Drupal.

And sure, we could have a contrib module that implements a workaround. But that's not much better, because imagine the scenario: every YouTube video embedded on your site has stopped working and whole pages, maybe even your home page, are breaking due to oEmbed exceptions. You are frazzled and have to frantically figure out what's going wrong. Is it really a good thing if we force people to find some random "break glass in case of obscure YouTube emergency" contrib module? I would argue that it is not. It makes Drupal look complex, intimidating, unfriendly, and fragile.

I'm all for keeping complexity out of core, but the proposed change is one that actually simplifies core a little, whilst adding no new API surface -- both of which benefit core developers -- and makes it harder for sites to break in the face of inevitable quirks and bugs on oEmbed providers' end...which benefits site owners and end users. You're right, it departs from the HTTP specification, but in the name of making things work a little better for the people who are actually affected...that feels like an acceptable trade-off.

And hey, maybe I'm completely in the wrong here, and maybe there's some massive benefit that I'm not seeing. I've been wrong before and I can be wrong again. But if that's the case, I definitely need to understand the rationale better, and why this would not just be punishing end users.

Now, having said my piece, I do not want to leave this issue in "needs review", or it will rot for eternity in limbo. I'd prefer if committers weighed in collaboratively to make an ultimate decision here. Therefore, I'm tagging this for both product and framework manager review.

rlnorthcutt’s picture

FWIW, I agree with @phenaproxima on this. If this is something that Google can unexpectedly break, then we have to assume that other providers will... and Google may well again. The proposal does provide a greater measure of reasonable robustness, doesn't change the interface, and is testable and secure... the benefits to the user outweigh the benefits of architectural rigidity. This is a good example of when it sometimes makes sense to break convention.

longwave’s picture

Status: Needs review » Reviewed & tested by the community

For what it's worth I also entirely agree with @phenaproxima here, having already fell foul of the YouTube issue on a production site, and with the full expectation that a similar situation can and will occur in the future. In situations like this I think we should aim to follow Postel's robustness principle: "be conservative in what you send, be liberal in what you accept".

Therefore I'm respectfully setting this back to RTBC in order to get the manager reviews on this.

effulgentsia’s picture

Seems that other projects agree with you as well: Wordpress, Embed/embed, and Typo3's oembed extension all seem to parse the JSON without checking the content-type.

That makes it all the more likely that providers can fail to provide the correct header without people noticing, and I agree that if Drupal sites are the only ones that start failing when that happens, then that's not good for Drupal site owners/users or for the Drupal project.

Ok, given the widespread precedent for it in the ecosystem, I'm convinced we should do this as well. Therefore, removing the "Needs framework manager review" tag, and I think that also means we don't need product manager review, so removing that tag too.

Gábor Hojtsy’s picture

I agree based on #76 that this makes sense, thanks!

effulgentsia’s picture

Saving issue credits.

  • effulgentsia committed 167686b on 9.3.x
    Issue #3186415 by phenaproxima, Charlie ChX Negyesi, walangitan,...

  • effulgentsia committed e232dfa on 9.2.x
    Issue #3186415 by phenaproxima, Charlie ChX Negyesi, walangitan,...
effulgentsia’s picture

Version: 9.3.x-dev » 9.2.x-dev

Thanks, everyone! Pushed to 9.3.x, and cherry picked to 9.2.x since this is non-disruptive and it's possible that YouTube or another provider will encounter this again prior to 9.2's EOL.

effulgentsia’s picture

Status: Reviewed & tested by the community » Fixed

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.