Problem/Motivation
For a while, as per #3186164: oEmbed providers.json returns errors when retrieved from Drupal, all pages which display oEmbed content coming from YouTube would fail with an exception:
The fetched resource did not have a valid Content-Type header.
This is because the fetched resources were returning text/html
as their Content-Type header, which our oEmbed system doesn't recognize.
The original bug has since been fixed by YouTube, but the oEmbed system has been extended by contrib (for example, the oEmbed Providers module) and could be used to interact with virtually any provider, which could return any sort of Content-Type header. Therefore, it makes sense to make the oEmbed system a little less strict overall when it sees an unexpected Content-Type header. This will also make us more resistant to similar problems in the future, should they happen at YouTube or Vimeo (the two providers core supports OOTB) again.
Steps to reproduce
N/A
Proposed resolution
Since JSON seems to be (anecdotally) the most popular format for oEmbed data, let's keep our current allowance for XML, but otherwise just default to treating oEmbed data as JSON, regardless of the Content-Type header.
Remaining tasks
Merge it!
User interface changes
None.
API changes
None.
Data model changes
None.
Release notes snippet
TBD
Comment | File | Size | Author |
---|
Issue fork drupal-3186415
Show commands
Start within a Git clone of the project using the version control instructions.
Or, if you do not have SSH keys set up on git.drupalcode.org:
Comments
Comment #3
pookmish CreditAttribution: pookmish commentedI've started to notice the same problem earlier this week and it is now occurring on every attempt. The suggested fix does solve the issue.
My suggestion would be to combine that new
else
condition with thetext/javascript
conditions. It would safeguard any unknown issue when the headers return ajson
orjavascript
response but the content is malformed.Also I would guess tests are in need for this fix.
Comment #4
almunningsPatch for Drupal 8.9 based on above.
Also cleanly applies to 9.1.
Comment #5
philltran CreditAttribution: philltran commentedI have been running into the same issue.
Thanks for posting up your fix. Arbitrarily parsing the data as json scares me as we have no control on what is coming back from the oembed provider especially if the provider is not YouTube.
Oddly add `&format=json` to the end of the youtube oembed url does not change the content-type header.
Comment #6
almunningsRelated issue upstream on Google
https://issuetracker.google.com/issues/174576256
Comment #7
randell CreditAttribution: randell commented#4 patch works for Drupal 9.0.8.
Comment #8
chrisfromredfinI discovered similar. The site I'm working on is peppered with YouTube videos, and this exception was whitescreening any page with this. I assume YouTube will fix their header at some point in the nearish future, but until then, I did similar:
The more correct thing would probably be to wrap the logic for attempted JSON parsing in it's own function, like isJson(), but I didn't know where that should live in Drupal - a utility class, or just a private method on this class or what...
Comment #9
larowlanThis is an upstream bug
Comment #10
AnticoVeneziano CreditAttribution: AnticoVeneziano commenteddidn't worked form (drupal 9.1)
not able to patch: "error: input non riconosciuto"
patch applied by hand: video don't show...
sorry
Comment #11
LOBsTerr CreditAttribution: LOBsTerr at European Commission and European Union Institutions, Agencies and Bodies commentedIt's also happens for Drupal 8.9
Comment #12
philltran CreditAttribution: philltran commentedI applied patch #4 and it is working for me on 8.9 as a temporary measure until google corrects the upstream issue.
Comment #13
stuckinconcretejungle CreditAttribution: stuckinconcretejungle as a volunteer commentedI applied patch #4 and it is working for me on 8.9 too, thanks.
Comment #14
TechnoTim2010 CreditAttribution: TechnoTim2010 commentedI am seeing the same issue, have a site making heavy use of youtube oembeds to show online content for an offline theatre due to COVID.
More annoyingly destroys our advent calendar.
I was wondering why noone was watching the videos.
Running 8.9.10
Tried patch #4 but not solved the issue.
Comment #15
TechnoTim2010 CreditAttribution: TechnoTim2010 commentedFrom related issue https://www.drupal.org/project/drupal/issues/3186164#comment-13925067
This drush command fixed it for me.
drush cset media.settings oembed_providers_url https://gist.githubusercontent.com/Berdir/75849d8b01bafbb15264bd93ced5a2f2/raw/3f20bd0a163f71fa46a1b80dc53fd08890dfc36e/providers.json
Thanks to @mathieucarset
Comment #16
benjarlett CreditAttribution: benjarlett commentedWhatever that juju is in #15 it's impressive and worked for me too :)
Comment #17
Sergiy Bovdyr CreditAttribution: Sergiy Bovdyr commentedThank you so much! Fix number 4 works for me (Drupal 9.1). Added this to my file composer.json and run composer update.
Comment #18
Sergiy Bovdyr CreditAttribution: Sergiy Bovdyr commentedComment #19
AnticoVeneziano CreditAttribution: AnticoVeneziano commentedeven the #17 didn't work... patch correctly applied by composer
(cache cleaned five or six times...) nothing happens
still same error
it's incredible! a lot of videos to show...
Comment #20
AndrewsizZ CreditAttribution: AndrewsizZ as a volunteer and at AnyforSoft for AnyforSoft commentedPatch works for drupal 9. Seems like youtube today changed API
Comment #21
meder CreditAttribution: meder commented#4 (= #17) worked for me at Drupal 9.1
Comment #22
sense-designPatch #4 works for me on Drupal 8.9.11
Comment #23
Alejo DI agreed with this patch, i did the same workaround to fix it but seems like a final solution because YouTube now returns a JSON as text/html.
Tested #4 on D 9.1
Comment #24
danheisel CreditAttribution: danheisel commentedThanks!!! Patch in #4 works brilliantly with 8.9.11. Saved me from a few hours of trouble shooting this morning. Cheers!!
Comment #25
komlenic CreditAttribution: komlenic commentedIt's probably not necessary, but I'll add another confirmation that the patch in #4 fixes this on 8.9.10 and 8.9.11.
Comment #26
MariaJ CreditAttribution: MariaJ commentedPatch #4 works for me, Drupal 8.9.6
Thanks!!
Comment #27
othermachines CreditAttribution: othermachines commentedI just ran into this as well. Patch in #4 works great as a temporary fix while things get fixed upstream. Thanks!
Comment #28
cilefen CreditAttribution: cilefen as a volunteer commentedI would change the exception message in #4 to simply "The fetched resource could not be parsed as JSON." if this becomes a permanent fix. The reason for that is that text/html actually would be a valid Content-Type header according to the new code.
Comment #29
philltran CreditAttribution: philltran commentedGoogle has responded to the issue created regarding this issue. A fix is supposed to be released on Monday.
https://issuetracker.google.com/issues/174576256
Comment #30
ksenzee#28 Adding this workaround to Drupal doesn't make JSON served as text/html into a valid oEmbed response. It's just an invalid response we have a workaround for. As it happens, we already have a couple of workarounds in that code: we allow for application/xml and text/javascript Content-Type headers, but if servers return anything but application/json or text/xml, they're violating section 2.3 of the oEmbed spec. I think the exception message is okay as is.
Comment #31
phenaproximaI'm of mixed mind here. On the one hand, I agree with @ksenzee that we are respecting the oEmbed spec as written, and providers are violating it, and that therefore we should limit the number of workarounds we're willing to add to core.
On the other hand...since adding the oEmbed system to core, we have discovered that, in practice, providers not infrequently violate the spec with abandon. And it's not too likely we're going to get upstream fixes made to, y'know, YouTube or SoundCloud or others. We just kinda have to take what we're given, because not supporting YouTube isn't an option.
So I'm not super sure how to go about future-proofing this. That said, this particular situation is an upstream bug, so it's probably okay to wait on Google to fix it before seeing if we need changes in core.
Comment #32
Chris Matthews CreditAttribution: Chris Matthews as a volunteer and at City of Oaks Design commentedFrom https://noembed.com/
oEmbed is nice. Unfortunately, not everything supports oEmbed. Worse, the sites that do support it don't provide a consistent interface. Noembed provides a single url to get embeddable content from a large list of sites, even sites without oEmbed support!
Additionally, Noembed guarantees that all responses will have
html
,title
,url
, andprovider_name
fields. This means fewer special cases spent building up your own HTML.A simple demo is available here.
Usage
Treat Noembed like a regular oEmbed provider, but use any of the supported sites for the
url
parameter. Noembed also supports acallback
parameter for JSONP.An example request might look like this:
http://noembed.com/embed?url=http%3A//www.youtube.com/watch%3Fv%3DbDOYN-6gdRE&callback=my_embed_function
And the response will look like:
Comment #33
larowlanPhenaproxima - is the fix here to catch the exception and output a log message/fallback?
I don't think we should be special casing providers that don't meet the spec
Comment #34
kreynen CreditAttribution: kreynen at University of Colorado Boulder commentedJust had this happen as well.
Comment #35
phenaproximaI'm not sure what the "fix" is, exactly, since this is a case where YouTube is sending us bad data due to an apparent bug on their end. So, technically, we don't need to do anything.
However, I did have a thought for how to make this more fault-tolerant: how about we allow the resource fetcher to take, as a constructor parameter, an object that implements
\Drupal\Component\Serialization\SerializationInterface
to use as the fallback decoder? If the fetched resource has an unknown Content-Type header, we just pass it to the fallback decoder and hope for the best. If the decoder throws InvalidDataTypeException, we catch that and re-throw it, wrapped in a ResourceException.Thoughts? IMHO, this approach would improve the robustness of the resource fetcher, which seems necessary given how much of a wild west the oEmbed world can be, while avoiding special cases for specific providers.
Comment #36
phenaproximaRe-titling in light of the potential direction we can take here, which I've implemented in the merge request.
Comment #37
amjad1233This has been resolved upstream.
Reference :
https://issuetracker.google.com/issues/174576256
Comment #38
b2f CreditAttribution: b2f at Axess Open Web Services commentedThis patch could still be useful, but I confirm the original YouTube problem as been fixed by Google as of now.
Comment #39
Alejo DSeems like Youtube still having the issue with some videos.
Comment #40
phenaproximaSwitching this to 9.2.x, since that's the active development branch. Hopefully this will be backported as well, but that's up to committers.
Comment #41
kaynen CreditAttribution: kaynen as a volunteer commentedThe patch in #37 continues to work and seems to add some additional error handling that would be useful. The video in #39 likely has "Allow Embedding" turned off in my testing. The response with
HTTP/2 401
is always in the response with any video I test from Youtube that has embedding turned off.Comment #42
phenaproximaCrediting @kaynen for manual testing we did together during DrupalCon Europe 2020.
Comment #43
alexpott@larowlan has added unresolved comments to the merge request
Comment #44
pianomansam CreditAttribution: pianomansam at Rapid Development Group for Christian Reformed Church of America commentedFor those of us still on 8.9 and have applied patch #4, there is a flaw. If the decoded JSON object doesn't return an array, then
ResourceFetcher::createResource()
will complain. This will cause you to be unable to unpublish (or possibly make other edits) to the media item. You'll also receive a 500 from media fields using the media library.I encountered this issue when a YouTube video that was in the media library became unpublished.
Here is an updated patch that double-checks that the JSON object decodes correctly to an array.
Note that this issue doesn't exist prior to applying patch #4. So if another solution comes along, I'm sure this won't be an issue. This only applies to sites using patch #4.
Comment #45
pianomansam CreditAttribution: pianomansam at Rapid Development Group commentedApologies for the bad patch.
Comment #46
cilefen CreditAttribution: cilefen as a volunteer commentedThe exception message wording could be refined. "'Could not retrieve the oEmbed resource" may not be accurate because it actually means that data
could not be decodedwas decoded but was not formatted as expected. "The fetched resource did not have a valid Content-Type header and could not be parsed with JSON" should probably read "...or could not be parsed as JSON".Comment #47
walangitan CreditAttribution: walangitan at Chromatic commentedUpdated the patch in #45 with the suggestions in #46.This patch was rerolled incorrectly, refer to #48.Comment #48
walangitan CreditAttribution: walangitan at Chromatic commentedHiding #47 as it was rerolled incorrectly.
Updated the patch in #45 with the suggestions in #46. This is for 8.9.x.
Comment #49
cilefen CreditAttribution: cilefen as a volunteer commentedComment #50
phenaproximaThis still needs work, to address the comments in @larowlan's review of the merge request. We probably should cease using patches in this issue; the merge request supersedes them.
Comment #51
phenaproximaSelf-assigning to deal with Lee's review.
Comment #52
ThirstySix CreditAttribution: ThirstySix as a volunteer and commented#48 Didn't work even patch correctly applied by composer.json.
I tried with &format=json. No luck with version 8.9.11 :(
Same issue "Could not retrieve the oEmbed resource."
Comment #53
Ghost of Drupal PastInvalidDataTypeException is only thrown by the YAML parts of Drupal, not the JSON.
Let's try this.
Comment #54
Ghost of Drupal PastWhopsie, that was not against core.
Comment #55
ThirstySix CreditAttribution: ThirstySix as a volunteer and commented#54 Didn't work. Youtube embed shows "403 Forbidden" error
Tested with 8.9.11
Comment #56
BerdirThis patch was specifically for a temporary regression in youtube that has since been fixed. If you get access denied then it means the video is unpublished or can't be embedded, see #44, has nothing to do with this problem. And if you get other errors, it could for example be that your URL is wrong and contains other arguments like list which won't work.
Comment #57
LendudeFYI there does sees to be a new regression that causes the same error to show up, but for a completely different reason. If you use a youtube URL without a www. prefix, it will throw a 403. It didn't before, it does now.
So completely unrelated and again what seems to be an upstream problem, but posting here because this seems to be the issue that people look at first, opened #3188842: Youtube oEmbed without www prefix shows "403 Forbidden" error and shows "Could not retrieve the oEmbed resource." for it
Comment #58
larowlanThe merge request https://git.drupalcode.org/issue/drupal-3186415/-/tree/3186415-oembed-th... looks RTBC to me.
To be clear I'm RTBCing the merge request, not the patch
Comment #59
alexpottWhy are we passing in the fallback decoder? It seems an unnecessary complexity. How about we change
elseif (strstr($format, 'text/javascript') || strstr($format, 'application/json')) {
to
else {
and always assume JSON. And wrap $data = Json::decode($content); in a try catch. For me that'd be following the robustness principle.
Comment #60
phenaproximaI'm not sure I concur here. IMHO, the fallback decoder is just that -- a swappable fallback for situations we don't recognize, because we know that some providers don't obey the oEmbed specification as strictly as they maybe should. If a resource returns a known Content-Type header, we don't need to fall back. We know exactly how to handle those.
But if we don't recognize the Content-Type header, JSON might very well not be a safe assumption (which would be a rare situation, but if it arose, that's exactly why the fallback decoder is swappable). For example, what if it were an XML response, but with a
text/html
MIME type? In such a case, you'd almost certainly want to inject a fallback decoder, since it might not be possible to change the upstream behavior. The point of adding the fallback decoder is to empower developers to handle strange situations.So to me, the way the code is currently structured is much clearer about what we will accept vs. what we will try to accept, and helps developers deal with weirdness from oEmbed providers.
Just my $.02.
Comment #61
Ghost of Drupal PastJson::decode does not throw exceptions.
Comment #62
larowlanThat works for me, but with the caveat that there's no exception so we'd be checking for false or json_last_error instead of catching an exception
Originally I was against special casing one broken provider, but this isn't doing that, its just assuming that json is more likely than XML, which I can live with.
Comment #64
kim.pepperComment #66
longwaveOEmbed endpoints seem to like to make unexpected changes on us, so this should help protect against that. The new test coverage handles a few more cases where we might see strange responses.
Comment #67
effulgentsia CreditAttribution: effulgentsia at Acquia commentedThe YouTube example that's in the issue summary has been fixed by YouTube. Are there any current examples of oEmbed providers that return an incorrect Content-Type header?
Comment #68
phenaproximaI'm not aware of any others with this problem that are supported specifically by core; we only do YouTube and Vimeo by default. That said, modules like oEmbed Providers let sites interact with a virtually any provider, which may do anything, so slightly increasing the fault tolerance here seems like a wise idea. I'll update the issue summary and re-categorize this as a feature request.
Comment #69
phenaproximaRewrote the issue summary as a feature request, so restoring RTBC.
Comment #70
phenaproximaComment #71
phenaproximaComment #72
effulgentsia CreditAttribution: effulgentsia at Acquia commentedFor me, this should be a "works as designed". From https://www.w3.org/Protocols/rfc2616/rfc2616-sec7.html#sec7.2.1:
Therefore, to comply with HTTP specification, I don't think we should treat the response body as JSON just because it happens to be parseable as JSON, when the server is explicitly telling us that it's HTML.
If contrib wants to break HTTP specification in order to compensate for a provider's bug, it can implement a Guzzle middleware to intercept the response and change the response header.
Comment #73
phenaproximaI think we have to kung-fu fight now, @effulgentsia. :) I feel strongly in the opposite direction.
While I think that everything you've said is technically correct, I have to ask: how do we benefit from strict compliance with HTTP in this case? I don't see how we, or anyone, does. Because if there is one thing we have learned the hard way since the oEmbed system was committed, it's that providers occasionally do weird things, and stuff breaks from time to time. When that happens, it is site builders, authors, and end users who are punished. Their sites stop working, without warning, due to no fault of their own, and they are left without any immediate corrective action they can take, unless they have the skill and time to hack their way through custom development (assuming a workaround is even possible).
I cannot imagine that such an experience increases anybody's confidence in Drupal.
And sure, we could have a contrib module that implements a workaround. But that's not much better, because imagine the scenario: every YouTube video embedded on your site has stopped working and whole pages, maybe even your home page, are breaking due to oEmbed exceptions. You are frazzled and have to frantically figure out what's going wrong. Is it really a good thing if we force people to find some random "break glass in case of obscure YouTube emergency" contrib module? I would argue that it is not. It makes Drupal look complex, intimidating, unfriendly, and fragile.
I'm all for keeping complexity out of core, but the proposed change is one that actually simplifies core a little, whilst adding no new API surface -- both of which benefit core developers -- and makes it harder for sites to break in the face of inevitable quirks and bugs on oEmbed providers' end...which benefits site owners and end users. You're right, it departs from the HTTP specification, but in the name of making things work a little better for the people who are actually affected...that feels like an acceptable trade-off.
And hey, maybe I'm completely in the wrong here, and maybe there's some massive benefit that I'm not seeing. I've been wrong before and I can be wrong again. But if that's the case, I definitely need to understand the rationale better, and why this would not just be punishing end users.
Now, having said my piece, I do not want to leave this issue in "needs review", or it will rot for eternity in limbo. I'd prefer if committers weighed in collaboratively to make an ultimate decision here. Therefore, I'm tagging this for both product and framework manager review.
Comment #74
rlnorthcuttFWIW, I agree with @phenaproxima on this. If this is something that Google can unexpectedly break, then we have to assume that other providers will... and Google may well again. The proposal does provide a greater measure of reasonable robustness, doesn't change the interface, and is testable and secure... the benefits to the user outweigh the benefits of architectural rigidity. This is a good example of when it sometimes makes sense to break convention.
Comment #75
longwaveFor what it's worth I also entirely agree with @phenaproxima here, having already fell foul of the YouTube issue on a production site, and with the full expectation that a similar situation can and will occur in the future. In situations like this I think we should aim to follow Postel's robustness principle: "be conservative in what you send, be liberal in what you accept".
Therefore I'm respectfully setting this back to RTBC in order to get the manager reviews on this.
Comment #76
effulgentsia CreditAttribution: effulgentsia at Acquia commentedSeems that other projects agree with you as well: Wordpress, Embed/embed, and Typo3's oembed extension all seem to parse the JSON without checking the content-type.
That makes it all the more likely that providers can fail to provide the correct header without people noticing, and I agree that if Drupal sites are the only ones that start failing when that happens, then that's not good for Drupal site owners/users or for the Drupal project.
Ok, given the widespread precedent for it in the ecosystem, I'm convinced we should do this as well. Therefore, removing the "Needs framework manager review" tag, and I think that also means we don't need product manager review, so removing that tag too.
Comment #77
Gábor HojtsyI agree based on #76 that this makes sense, thanks!
Comment #78
effulgentsia CreditAttribution: effulgentsia at Acquia commentedSaving issue credits.
Comment #81
effulgentsia CreditAttribution: effulgentsia at Acquia commentedThanks, everyone! Pushed to 9.3.x, and cherry picked to 9.2.x since this is non-disruptive and it's possible that YouTube or another provider will encounter this again prior to 9.2's EOL.
Comment #82
effulgentsia CreditAttribution: effulgentsia at Acquia commented