This issue is spun off from #2831274-390: Bring Media entity module to core as Media module.
There has been some disagreement and bikeshedding about how to handle updates to media entity thumbnails. Currently, media entities have an updateThumbnail() method but it is an internal method and should not be called by external code. We need to decide on what the API will be for this, implement a patch if necessary, and finalize this once and for all!
- We want to download a thumbnail for media (when not queued) before saving the media item to prevent a double save.
- Since Media::preSave() is part of the database transaction we can't use that to download thumbnail from an external source. This causes fails in SQLite.
- Not only downloading thumbnail can be slow, other API's can also be slow or unavailable causing similar issues.
This is postponed on #2831274: Bring Media entity module to core as Media module.
Remaining Tasks
Collect all the history of this issue, all the bikeshedding, and so forth, and move it into this issue so that we can have a coherent place to talk about it and decide how to proceed.- Write a patch if necesssary.
- Commit the patch, if it exists, and breathe a collective sigh of relief.
Proposed solution
- We are going to override the media storage handler to call a new
updateMetadata()on media before the transaction starts. We will need to document why this is done since ideally we would do this inMedia::preSave(). - We move
shouldUpdateThumbnail()to the storage handler since we don't want it as a public method in Media. - We will remove
updateQueuedThumbnail()and make the queue useupdateThumbnail(). - When the thumbnail is empty, we add a default value in
Media::preSave(), this is done for the media name as well.
| Comment | File | Size | Author |
|---|---|---|---|
| #133 | 2878119.124_133.raw-diff.txt | 40.32 KB | dww |
| #133 | 2878119-133.patch | 58.91 KB | dww |
| #124 | interdiff-122-124.txt | 640 bytes | seanb |
| #124 | 2878119-124.patch | 64.53 KB | seanb |
| #122 | 2878119-122.patch | 64.87 KB | seanb |
Comments
Comment #2
phenaproxima#2831274: Bring Media entity module to core as Media module is in, so this is no longer postponed!
Comment #5
osopolarFWIW: There is a related issue in the queue of Media entity image module.
Comment #6
phenaproxima#2831944: Implement media source plugin for remote video via oEmbed has stretched the current thumbnail non-API to its absolute limit, and necessitated some ugly, temporary, thumbnail-related hacks in the Media module in order to land that issue. It is therefore imperative that we get this issue sorted out, so I'm bumping the priority.
Comment #7
phenaproximaA first attempt, just to see how many tests break. The main difference between this and what we already have is that, during Media::preSave(), the default thumbnail will always be used, so as to avoid expensive and dicey thumbnail download operations.
Comment #9
seanbCan we close #2862465: Move \Drupal\media\Entity\Media::updateThumbnail into its own service as a duplicate of this one? Or close this and continue in the other issue?
Comment #10
phenaproximaI'm closing the other issue because this is more general. We have the opportunity to totally redesign and improve Media's thumbnail API, so let's do that in here.
Comment #11
phenaproximaThis will almost certainly break tests, but it illustrates the solution I had in mind for this knotty problem. It's nice to have the slickness of thumbnail updates being part of the save operation, but it's extremely dicey and bad to have that logic be part of the Media entity class. This moves the logic to the storage handler, which should hopefully give us the best of both worlds.
Comment #12
seanbSo, just had a nice discussion about this with @phenaproxima and @marcoscano. We decided on the following:
updateThumbnail()on media before the transaction starts. We will need to document why this is done since ideally we would do this inMedia::preSave().shouldUpdateThumbnail()to the storage handler since we don't want it as a public method in Media.updateQueuedThumbnail()and make the queue useupdateThumbnail().Media::preSave(), this is done for the media name as well.Please let me know if any of this should be changed.
Comment #13
seanbImportant change of direction. Phenaproxima noticed that other external metadata could have the same issues as the thumbnail downloads. Slow API's could cause even fetching the name metadata to be slow.
To fix this we should probably not only update the thumbnail before the save transaction or from the queue, but all metadata. Instead of
updateThumbnail()we would add aupdateMetadata()method toMediaInterface.The only problem we have is the media name. This is defined as metadata. If fetching the name fails, or is done through the queue, we need a temp name for the media item.
Comment #14
phenaproximaA first, honest attempt with the agreed-upon direction. Let's see how many tests this breaks (hint: a lot).
Comment #16
marcoscano@phenaproxima nice work! I'm glad we are cleaning and simplifying some of the thumbnail-handling, it was a bit obscure so far.
I know we always try to minimize the API surface we add to the interface, but could we consider making these methods public as well?
I definitely see use cases for contrib or custom code wanting to perform actions such as: revert the thumbnail to the default one, force-trigger another update of the thumbnail at some random point, or updating just the thumbnail without necessarily updating the mapped fields, etc.
I have thoughts on how hardcoding the ALT and TITLE here would impact the solution being cooked in #2956368: MediaThumbnailFormatter produces unhelpful text alternative and title attributes for media thumbnails. But once this is in theory just some "temporary" initial values, I guess we shouldn't worry too much at this point.
If we end up making this public, though, then I'd try to have some optional arguments so callers could opt-out of this hardcoding of ALT/TITLE, if wanted.
Currently if the source does not provide a default name metadata, the source base falls back to a name made of
media:{type machine name}:{uuid}. Should we use that here, in name of consistency?Looking at the implementation this doesn't seem to completely describe what this method does. (For instance it also updates the thumbnail, which is not a mapped metadata)
Nit: My phpcs is complaining we need a class description string here.
Are we sure the queued-thumbnails scenario is well handled here? In that case, when the worker triggers the saving of the entity, the media item will already have a thumbnail (the default one), wont' be new, and the source field would not have been changed. So unless I'm seeing something wrong, this will just re-save the entity in that case?
How about we document here why this method exists, and why the call to
parent::save()must be executed after the::updateMetadata()call?This may be a personal preference, but I tend to hate when the docblock doesn't give me enough info and forces me to read the code implementation to understand what the function does... :) Maybe just adding something like
"Metadata should be updated whenever the sourcefield is changed, or if this is a new entity being created."to the current text would improve this help text a lot!Can we call this method
::shouldUpdateMetadata()?Apparently
$entity->originalis set as part ofEntityStorageBase::save(), so I guess we could directly always use::loadUnchanged()here?Is it too crazy to wonder if this would better live inside
MediaSourceInterfacerather than here? No strong opinions on this though...If we do move it to the source, I would move the
::shouldUpdateMetadata()method there too.Comment #17
marcoscanoDidn't address any of #16, just trying to fix some tests.
One of the things I noticed while working on this is that we were always setting the "temporary" name, while we should only do it in queued operations. Otherwise we bypass all of the default naming provided by the sources. I've changed this in the
MediaStorage::save()and it seems to work well now.Fixed too some other edge cases not being dealt with, according to the tests :).
Also, removed this chunk of the test
Drupal\Tests\media\Kernel\MediaSourceTest::testThumbnail:Because I cannot think of a normal use-case where the user removes an existing thumbnail and we need to detect that. The basefield is not available on the form display, so probably the only way of removing the thumbnail is... by code?
Again, in the same test, I've commented out:
With the current approach, metadata is updated/mapped only:
- When the media is created and non-queued
- When the queue item is processed
- When a media is saved AND source field changed
In that chunk of code we test something else. Is this an intended feature we want to support? I personally don't think so, but I left the test commented out so it's more explicit we need to discuss this further.
Comment #18
seanbActually, I think this is intentionally. The name could be from an external webservice that might not be available or really slow. We can't depend on the media source providing it, same as the thumbnail.
This also made me notice something else. When looking at the patch the name currently only is provided by the media source when the name is empty. This is a problem when you want to have a media source set the name when you change the source field for example. I found a bug about that: #2933012: Expose the filename as metadata for file based media sources.
Might be best to solve this through #2882473: Hide the media name basefield from the entity form by default.
Do we really need this parameter? Just don't call the methos if you don't want to update right?
We should probably remove this check right? If we call the method, just update everything. Why only when empty or if override is true?
See my comment above about the parameter.
Good question, I guess updating the metadata only when the source field changes is a better than updating it every time. So when you only clear or change a metadata field, we should probably not automatically update that.
I think the point of the test was to check if an empty thumbnail field gets updated when empty. Also when removing the file entity the thumbnail field is also cleared right? I think the test is a good one to have.
Comment #19
marcoscano@seanB thanks for reviewing!
Well, that's why the queue-based approach exists, right? If the media type is configured not to use the queue, then it is assumed that metadata can be fetched in real-time, even if that's an expensive operation. Or maybe I'm missing something here :)
As I mentioned in #2882473-18: Hide the media name basefield from the entity form by default, IMHO I believe this is an impossible problem to solve, without risking to lose user-entered data. For me the safest approach is just to stick to update only if empty.
1)
Not really... once now the fields will not be empty (because we have generated a default value prior to saving), when the queue worker runs, it would do nothing if the
$overrideparameter isn't there (as the test failures demonstrate). The override parameter would be indeed unnecessary if we removed also the ->isEmpty() check, but in this case we would run into the other issues I mentioned above.2) and 3) same as before
4) Yep, that's my opinion too
5) Hm, how is this different than 4? The source field hasn't changed here, so we sholdn't update the thumbnail, even if it's empty. If some code deleted its value, I guess it's logical to expect that the same code should be responsible for setting a new one? (once the user can't delete the thumbnail through the UI).
Thanks!
Comment #20
seanbThe media source name could be empty though if the source fails to get one (external service could be down). But I agree we should not undo the logic in
getName(). How about we add the default name code frompreSave()togetName(), and move the code you added inMediaStoragetoupdateMetadata()(since the name could actually be metadata)?I'll add my thoughts about #2882473: Hide the media name basefield from the entity form by default in that issue.
If we remove the complete if statement (
if ($this->get($field_name)->isEmpty() || $override) {, including theisEmpty() check) and always update the metadata this will not be the case right? The code calling the method should determine whether or not it actually wants to update the metadata.You are right, we should replace this with a test more specific for
updateMetadata().Comment #21
phenaproximaoEmbed support has landed (again -- hopefully won't get reverted this time), so this needs a re-roll since that patch changed a lot of Media's save logic (in order to work around the very problems we're trying to solve in this issue, that is).
oEmbed support lit a fire under this issue's butt because it highlighted the weaknesses of Media's remote media handling. We were able to hack Media enough to bludgeon it into working and consistently passing tests, but the hacks in question are unmaintainable and were always intended to be temporary anyway.
Now that core officially supports remote media, this issue is critically important to making that functionality rock-solid. Therefore, even though we are adding something(s) to Media's API, I'd like to propose that this become a beta target for 8.6.x, or at least be marked for backport to 8.6.x, since this will likely miss 8.6.0's alpha deadline. Therefore, tagging this for release manager review.
Comment #22
phenaproximaThis is extremely important to Media remaining stable, so I'm marking it as Media-critical.
Comment #23
phenaproximaBriefly discussed this issue with @catch in Slack. Because this is an API addition, we should be able to land it during the 8.6.0 beta period, so I'm marking this as a beta target since honestly, I think we'll be lucky to land it during the alpha stage. Once 8.6.0 is out, backporting this issue will be a dicey proposition, but we'll cross that bridge when we get there.
Comment #24
catchThis adds a single method to MediaInterface which is fine under the 1-1 interface rule.
I agree this is an important issue (would probably reclassify as a bug), but while it seems like a high impact/low risk change up until beta (assuming that's the only API impact), less sure about it getting in during rc or a patch release.
Comment #25
phenaproximaThanks, @catch!
Fire up your coffee makers and let's land this during alpha/beta.
Comment #26
seanbRerolled and just uploading my progress and checking what the testbot thinks. Also trief to address the comments in #16:
1. We discussed this on Slack and decided that we would only add public methods we need for now. If there are good reasons to make other methods public, feel free to open a follow up and let's discuss it.
2. There is a follow up for better default, so let's improve that in the other issue.
3. Fixed. Changed it to match the source base.
4. Fixed.
5. Fixed.
6. Fixed, added a call to updateMetadata() before saving the media item from the queue worker. I can't think of a reason why we would not always want to update the metadata when processing the queue?
7. Fixed. At least added some documentation, but please let me know if this could be further improved.
8. Fixed.
9. Fixed.
10. Fixed.
11. I don't think we need to make this a public method on the media source. This is pretty generic. If needed we can fix this in a follow up.
No interdiff because the reroll messed it up.
Comment #28
seanbI'm affraid we will still need presave to set the default thumbnail and the default name for media. That should fix the tests.
Comment #30
deviantintegral commentedWith this patch, there still seems to be a key limitation with thumbnail handling: there is no good way to handle transient errors when retrieving thumbnails. For example:
This simplest approach is to not catch any exceptions in getMetadata() in a media source plugin. However, if thumbnails aren't queued, then those exceptions are never caught by the media entity form, and end users are left with a 503. If a source plugin does catch an exception and log it, it's still required by the getMetadata() method to return NULL. For thumbnails, this means the default thumbnail is saved as the canonical thumbnail, and the entity is not re-inserted into the queue to try again later.
This comes up with any metadata, and not just thumbnails, but at least with other fields an empty mapped field can be updated the next time the media item is saved.
I suggest we add an exception along the lines of MetadataNotAvailableException, that is thrown by source plugins when an error prevents loading an attribute. In this issue, we add a catch for this with thumbnail handling, leaving another catch / error handler for the main media entity form in a followup.
Comment #31
seanbAdded the default thumbnail alt/title through presave as well. Also fixed an issue in the queue. Last but not least updated the text on the media type form since we are no longer queueing thumbnail downloads, but all metadata updates. We might want to have a followup to actually rename the queue?
#30 I agree this could be an issue. What to do when metadata is not available? We could stop the media item from being saved, but that doesn't work when the metadata is fetched in the queue. I think this is a separate issue and best discussed and fixed in a separate issue though.
Comment #33
seanbSigh, broke another test changing the media type form.
Comment #34
marcoscanoNice work, thanks @seanB!
A) If I recall correctly from the slack chat we had, we still need to document very clearly that the default name generation that source plugins provide should always be done without any remote API calls, once it will be called from inside the transaction. I can't think of any better place for this other than the
MediaSourceInterface::getMetadata()docblock, although this only refers to the default name metadata.B) I believe this could be a good place for fixing #30, and I like the exception-based approach. I'm just not sure what would be the best behavior when the media entity catches the exception thrown by the source when the metadata generation failed. Should we force-queue it so it can be retried later? Use an empty value and show an error to the user? Other ideas?
Also,
Correct me if I'm wrong, but doesn't this represent a change respect to the previous behavior? (maybe I'm just confused about this point).
If I understand it correctly, before the mapped fields were only populated if empty, regardless of everything else. With the new code, when we update metadata, we always update as well the mapped fields, regardless if they are empty or not. If this is the case, in the uncommon but possible scenario where a user is saving a media item with a different source value but manually-entered mapped field, we would override the manually-entered data with the fetched metadata. It may actually make sense, but if it's a new behavior we should at least document this very well. (Probably in the metadata mapping form?)
Also, once
::updateMetadata()is public, I believe we should explicitly indicate on its docblock that whoever is calling this method programmatically will also update mapped fields, regardless of their current values.Could we simplify this with
@return $this?Could we simplify this with
@return $this?Nitpick: sorry technically this sentence should read:
If we don't want to move this method to the source interface, maybe we should use then
MediaSourceInterface::getSourceFieldValue()instead here to get the source field value?That is a public method of the source interface, and I was under the assumption that it should be used as the "canonical" way of getting the source field value.
Now that we are not dealing with thumbnails only anymore, maybe a better label for this could be something like "Queue metadata fetching" ?
Again, maybe "Fetch metadata via a queue (...)" ?
(No strong feelings though)
Comment #35
seanb#34
A. Fixed. Added documentation to
MediaSourceInterface::getMetadata().B. Off course it could be done in this issue, but unavailable remote metadata is a separate issue imho.
1. Yes, we now always update metadata if the source changes. If the metadata did not update before when a source field changed, this is a bug right? Mapped metadata fields should not be manually filled by a user. Maybe we should even go as far and disable them in the media form? That would probably be a follow up as well.
2. Fixed.
3. Fixed. This was a little inconsitent in
MediaInterface, so changed it forsetCreatedTime()as well. The setName() method already used@return $this.4. Fixed.
5. Fixed.
6. Fixed.
7. Fixed.
While doing nr 6 and 7, I noticed the queue was still named specifically
ThumbnailDownloader. Also the flag on media types was namedqueue_thumbnail_downloads. I feel this is confusing and let's just rip the band-aid off.I renamed the flag and deprecated
thumbnailDownloadsAreQueued()andsetQueueThumbnailDownloadsStatus()in favor of newmetadataFetchingIsQueued()andsetQueueMetadataFetchingStatus()methods.I also renamed the QueueWorker plugin from
ThumbnailDownloadertoMetadataFetcher. I think we are allowed to do that. Maybe we have to keep the plugin ID the same though?Hope this doesn't break the tests too bad.
Comment #36
phenaproximaWe can probably implement an alter hook so as to "alias" the old plugin ID --
hook_queue_info_alter()should do the trick.Comment #38
seanbAdded
hook_queue_info_alter()for BC and fixed the tests (hopefully all of them).Comment #40
seanbAnother try to fix the tests.
Comment #41
marcoscanoThanks @seanB!
A) +1 for removing the band-aid and renaming the queue. But the same reasoning (i.e. Let's fix everything that makes the metadata handling inconsistent/weird/fragile/etc in this issue instead of only half-solving them) would apply, IMHO, to try and have at least a very basic approach to what to do when remote metadata is temporarily unavailable. I'd try to see how the solution for that looks like here, if we realize it's starting to get out of hand, then we can defer it to a follow-up.
B) About #34-1: I still think this is a big issue that we should try to solve here. As I said, making sense or not, the previous behavior is something that is out there since a year already, so people can certainly be relying on that. It's not that weird actually, it just means that the feature is: " When mapping metadata to entity fields, these will be populated when empty with metadata from the API. ", instead of " When mapping metadata to entity fields, these will be updated every time the source field value changes ", which is the behavior you are describing.
Should we have a @todo indicating that this should be removed in D9, or similar ? (not sure how this type of workaround to keep BC is documented)
Is it possible that we've lost the
::getSourceValueField()you had added in one of the previous patches?Also, ::getSourceValueField() will return the first value of the field, not a FieldItemInterface object. Because of that, I think:
a) we should improve the documentation of
::getSourceFieldValue()to clearly indicate that. (Probably in a follow-up?)b) Use a standard value comparison, instead of
::equals()c) Check if we have test coverage for this. I have the feeling that the tests should have picked the wrong usage in a previous patch, not sure if they did.
Looks like there's a typo here.
Comment #42
seanb#41
A) Just had a discussion about this with Marcos. Metadata fetching is the responsibility of the source. There could be a lot of different reasons why metadata was not fetched correctly, and not in all cases does it actually make sense to fetch it again. We could let sources throw an exception, but what do we do next? Maybe it's better that media sources determine what should happen. For example, a media source could add the media item to the queue for later processing. Another thing we discussed is that it would be nice if users had some way of triggering metadata fetching from the interface. If for some reason only the name changed in a remote document management system, the source value could be the same but there should probably be a way for a user to get the latest metadata and update it. Let's create a follow up for that.
B) Also discussed this with Marcos. He's right (off course) that we currently have the following sentence on the media type edit/create page that people could depend on:
Information will only be mapped if the entity field is empty.. I personally think that we should always update metadata, at least when the source field changes, but if currently people are depending on source not updating filled in fields, then I guess we shouldn't change that. So we should probably make this configurable. We could leave the old behaviour for existing sites, but should probably change the default to always update metadata for new ones. I've added a checkbox to configure this per media type to the patch.1. Fixed. There does not really seem to be a standard for it though.
2. Fixed.
3. Fixed.
Let's hope the new checkbox doesn't break too much.
Comment #44
seanbAhh that needed a reroll.
Comment #45
phenaproximaI really like this patch. It's so logical and it deeply un-breaks the crappy metadata updating code we originally landed in core. Beautiful work. I cannot wait to see this committed. That said, we have some (not horrible) work to do...
I don't know if I like the naming here. How about we call this overwrite_mapped_metadata, and describe it as "Overwrite previously mapped metadata values"?
The entity system is not guaranteed during an update hook, so we need to do this with the config factory. Also, we will need a test of this update path.
$queues can be type hinted as an array.
Let's add a @see to this issue, for reference, and a @deprecated so we can easily remove it.
Should "once" be "since"?
Let's rename $always_update_mapped_metadata to $should_overwrite, because that would more clearly describe what the flag represents.
I don't think we should do this here; it convolutes the method. loadThumbnail() should always take a URI and always return a file entity, period. Let's move this default thumbnail URI stuff elsewhere.
This might call for a change record, since it's pretty much a reversal of what Media has been doing for two minor versions now.
Are we deprecating this method? If so, we need to throw a deprecation notice and link to a change record.
Should use $this->get('queue_metadata_fetching').
I find this method name kind of obscure. How about queueMetadataFetching()?
s/should/must. We need to be clear about that here.
Correction: it *must* not use any remote API calls. There's no reason not to be emphatic about this.
Not technically in scope, but I think this makes things clearer so let us keep it :)
Nit: "source field" is two words.
Why aren't we using equals() any more? I think that is a much safer way to determine equality than this is, since it keeps field type-specific logic properly encapsulated.
I think we need to make it clear that "metadata" includes thumbnails. Otherwise, users will wonder why their new media items are displaying a generic thumbnail instead of the one they were expecting.
"Metadata and thumbnails", to make it clear that we consider the thumbnail to be a piece of metadata.
We'll need to link to a change record.
Let's change any assertEquals() calls we're touching here to assertSame(), while we're at it.
Comment #46
phenaproximaAlso, we need to improve the issue summary, since this scope has expanded.
Comment #47
marcoscanoMediaSourceInterface::getSourceFieldValue()returns the value of the first delta, not a FieldItemInterface object.Comment #48
phenaproximaWe don't need to use getSourceFieldValue(). How about we just get the source field name (which is easy to do), then call get() on the media item and the original version, then call equals()? That's what we were doing in an older version of the patch...
Comment #49
marcoscanoI know, and I specifically
bothered Sean until he agreedrecommended that we changed it to::getSourceFieldValue()because that's the "canonical way of getting a source field value™" of a media entity... That's a public method on the source interface, and we are using it in multiple other places of the code. It would be inconsistent (and in my opinion invalidate the separation of concerns) if we get the values directly. Sources know about their assets, they should do whatever is necessary inside::getSourceFieldValue()and everyone should, IMHO, use that. If we use it in some places, and in others we get direct values from the entity fields, we lose part of the benefit of having the method in the first place...Comment #50
phenaproximaSince it's a method of MediaSourceInterface, I guess that makes sense. But I think this could potentially backfire on us later, in circumstances where the === comparison we're doing is not going to be a sufficient determinant of whether the value has changed. For example, if the source field value is a formatted text field, then changing the format could constitute a significant enough change that an update is required, even if the primary value has not changed.
Having said that...maybe that's an edge case and I'm being overly paranoid. And even if someone does encounter it, it is workaround-able by ninjas. So for now, let's stick with what we have in this patch, and open a follow-up to talk more about whether we need a more robust way to determine if a change has occurred.
Comment #51
marcoscano@phenaproxima thanks for reviewing!
1- Fixed
2- Pending
3- Fixed
4- Fixed
5- Fixed
6- Fixed
7- Fixed
8- Pending
9- Pending
10- Fixed
11- Fixed
12- Fixed
13- Fixed
14- 👍
15- Fixed
16- Won't fix (as per #50)
17- Fixed
18- Fixed
19- Pending
20- How about a follow-up to change all of them at once? I find extremely confusing to have some asserts using one method and others a different one. It will make future readers scratch their heads for a moment unnecessarily.
Didn't look at the tests yet.
Opened follow-ups:
- #42-A: #2983456: Expose triggering update of media metadata + thumbnail to end users
- #50: #2983457: Investigate if a more robust way of detecting media source field changes is necessary
- #45.20: #2983458: Use assertSame() instead of assertEquals() in Media tests
Comment #52
marcoscanoTrying to fix some tests.
It turns out we have several tests that manipulate "broken" media assets (i.e. without a source field on the entity). Now that we call
::getSourceFieldValue()as part of the saving process, they all broke. I believe it is legitimate to fix this here, which is where we are introducing the change. The idea here is that if the entity has a source field configured in the source config, but that field is not available, we just return NULL.Thoughts?
Comment #53
marcoscanoNow that I think more about this, I believe we should do it differently.
This is a method that source plugins can (and should) override when appropriate, so we should not limit this to the fist delta, which is an implementation detail for the types we ship in core, but not necessarily true for all sources.
I suggest then we leave this more generic with something such as:
Which is what this patch implements. I'd be cool to defer this to a follow-up though, if we believe this needs more thorough discussion.
Comment #54
marcoscanoMore test-fixing
Comment #55
marcoscanoOf course having Umami media types committed in the meantime messed up with us as well :P
Another try.
Comment #56
seanbHere is a first stab at a change record: https://www.drupal.org/node/2983661
#45
2. Fixed.
8. See change record.
9. Added a link to the change record.
19. Added a link to the change record.
Also changed some minor things from #52 and #53 and added the new 'overwrite_mapped_metadata' to all media type config in the standard profile and Umami.
Comment #57
marcoscanoSoooo.... @seanB and I just realized that this code comment and this UI text are both wrong, because this code is indeed always updating the metadata when the source field changed, so the non-override thing was only happening when you save the entity without changing the source field value. Which we are not changing in this patch, so the whole checkbox to make overriding configurable is useless.
(╯°□°)╯︵ ┻━┻
Comment #58
seanbNew patch.
Comment #60
phenaproximaLooking really nice!
I think we want to use clear() here, not set().
Let's put the @deprecated in the function doc comment, and link to the change record under that.
I'd ideally like to move this into the source plugin, but we can do that in a follow-up.
We need to throw an E_USER_DEPRECATED error here.
Formatting is wrong here: it should be something like
- default_name: (explanation here)We can improve this parameter description. What does it mean for metadata fetching to be queued? This is a good place for us to explain that.
We should assert that $queue_statuses contains at least one TRUE value, so as to be sure our update test is actually testing something (
$this->assertContains(TRUE, $queue_statues))There's no need for $config_factory, just use $this->config() to retrieve editable configuration.
Why are we doing this in the test? Surely we just want to read back the updated configuration and assert things?
Regarding #57, I discussed briefly with @marcoscano on Slack. I'm really glad we discovered this inconsistency now and are removing the incorrect text/comments in this patch. Landing this patch will really untangle the hideous logic surrounding thumbnail/metadata updates, so with that in mind I suggest we open a follow-up issue to make overwriting configurable and address that in a dedicated issue/patch. Changing it now is out of scope.
Thanks for your continued excellent work, @marcoscano and @seanB!
Comment #61
deviantintegral commentedI think there's some UX implications of queuing all metadata and not just thumbnails. For example, here is what an end user would see after embedding a YouTube video, before the queue has run:
Also, it looks like the name of the media item is not updated when processing the queue.
I'd suggest:
This is actually something being asked for on a project I'm on. It would be especially useful when editors override video metadata. For example, a video title and summary is wrong in an upstream source. It's (temporarily) overridden in Drupal, and some way to say "please revert this back to the source metadata" would be very nice. I filed #2983724: Add a user interface to fetch and revert mapped metadata for this.
Filed #2983728: Handle metadata errors gracefully.
Comment #62
marcoscano#61 brings some valid points, thanks @deviantintegral !
Some notes:
This is actually a bug with the "Remote video" type shipped in D8.6, where the title metadata is not mapped to the name basefield. It is being fixed in #2983325: Use title -> name mapping for the "Remote video" type.
Also, we briefly discussed yesterday too the issue with the iframe dimensions and scroll bars. It turns out that the oembed request we perform will respect the formatter settings defined for that field. By default these settings are null (no values for width / height), so the iFrame will be built with whatever dimensions the oEmbed provider decided to send back as default. Unfortunately it seems that Youtube sends a very unreasonable 480px-width iframe... :/ Not sure if it happens with all videos, or if we should hardcode more reasonable default values for the formatter, but this should probably be better discussed in another issue I guess.
Having that said, I believe the other points are very accurate and we should probably try to solve them here too.
If the queue worker is responsible for publishing back the item when metadata is updated, I'm not sure other scenarios where this would be useful. Do you have something elese specific in mind?
About triggering manually the metadata update, we started to look into that in #2983456: Expose triggering update of media metadata + thumbnail to end users. Not entirely sure if this would make #2983724: Add a user interface to fetch and revert mapped metadata a duplicate, once you mention the ability to also revert the already mapped metadata. However, with the current patch, I'm not sure if such scenario would be necessary. Basically, the metadata will be updated (and mapped fields overriden) every time the source field changes. However, if the source field doesn't change, users can simply edit the item and manually change the field values, saving the entity again. No metadata update would be triggered in this second save operation. Would that be enough for the use case you had in mind?
Thanks!
Comment #63
phenaproximaThanks for the very excellent points you raised, @deviantintegral! I agree with all of them 100%. However, I'm not sure this is the issue in which to solve them. They are all very important UX problems, but I think this issue needs to stay tightly focused on the problem at hand, which is correcting the poor design of the underlying API for metadata retrieval. Until we do that, very little meaningful progress can be made, and we are sitting with a major design flaw that can potentially compromise the operation of sites that have a lot of media items (especially ones that use oEmbed). Since there is UX fallout here, I think we can and should fix the lowest-hanging fruit now (i.e., setting a message when saving a media item), but the other points should be handled in high priority follow-ups.
I am completely in favor of opening those issues now and trying to land them as soon as possible (i.e., let's shoot for 8.6!), but in this issue, I'd prefer to deal only with the API-level problems.
Comment #64
marcoscanoHowever, the "broken" name issue for queued remote videos could be seen as a regression we are introducing with this patch.
Testing scenario 1:
- Drupal 8.6.x HEAD
- Enable Media
- Go to the "/admin/structure/media/manage/remote_video", map the "Resource title" to the "Name" field, and check "Queue thumbnails download"
- Create a media item by pasting a youtube URL
- This is what you get:
Testing scenario 2:
- Drupal 8.6.x HEAD with patch #58 applied
- Enable Media
- Go to the "/admin/structure/media/manage/remote_video", map the "Resource title" to the "Name" field, and check "Queue metadata fetching"
- Create a media item by pasting a youtube URL
- This is what you get:
IMHO it's hard to solve this without assuming that the queued items are not "yet ready for public consumption"...
Comment #65
marcoscanoThis addresses the feedback from #60, thanks for reviewing!
Leaving in Needs Work because I believe we should at least fix:
- Improve the confirmation message when the item is queued
- Implement a new checkbox (selected by default) allowing sites to configure whether the queued item should be created unpublished, and publish it once the worker has successfully updated the metadata.
Comment #66
marcoscanoUps, forgot the patch.
Comment #67
marcoscanoWe discussed this with @daniel.bosen, @chr.fritsch, @seanB, @Gabor Hotjsy at Drupal Dev Days. The consensus about the above mentioned regression and how to handle it appears to be:
For newly created media items, it is indeed bad UX to have a "broken" item published for an undetermined time. For existing entities, however, if you just update the source field, it would probably be bad to unpublish something that is already available on the site. Those entities would have (for example) a title that might not reflect the new video, but it's not a "broken" default value for the title, so we probably shouldn't force it to be unpublished.
So we think the best compromise is:
- When a user is creating a new media item for the first time (and it's queued), we will unpublish it until the item is processed by the queue worker.
- We will also store in the queue item the status that the user-defined when saving the form
- The queue worker, after updating the metadata, will restore the item's status to the status the user selected when saving the form
- We will not change the publishing status of existing entities when processing them in the queue
- We will adjust the confirmation message to reflect all these scenarios, so people know what's happening.
Gábor also pointed out that it would be weird if the user didn't have visibility over the media item that they just created. So that indicates us that fixing #2889855: Unpublished media entity can not be accessed by owner and update any media/delete any media access possibly cached by user is probably a must for the above plan to make sense.
Comment #68
phenaproximaI think that plan sounds good. +1 from me.
Comment #69
seanbHere is a patch implementing improved media save messages and unpublishing new queued items. Still have to fix the test fail in #66 by adding one or more media types with queueing enabled to the fixture.
Comment #71
seanbRerolled after the latest changes and tried to fix the tests.
Comment #72
deviantintegral commentedDid some basic tests by hand and so far this is working as advertised. A few code notes below.
This is great!
+1
I don't see a status message for this anymore. And comment typo :D
This is a pretty long message and I wonder if it would be a bit overwhelming. How about something like:
"The @media_type has been saved will be published once all @media_source_label information has been retrieved. <a>Download media information now</a>."_However_ as a site builder, if I've queued processing it's probably because I don't want to worry about PHP timeouts. As an editor, what I'd probably want is some way to say "this is a really important asset, so please put it at the top of the queue". Perhaps we leave out a link entirely for now.
Doesn't
$saved_messagehave to be inline with the t() method call to be picked up by translation tools? Or is that obsolete now?Should we document that thumbnail_alt is not required but highly recommended?
I could see developers thinking "I just need to return $media->id() here". Should we clarify that it represents the media item in the "remote" system? Or just simply "... represents a given media item. This is not the media entity ID, as the same media item may be referenced by multiple media entities. Typically this is a remote URL of some kind."
Should we summarize why we override the base class - something like "Storage handler for media that handles queued metadata fetching."
This method feels like it belongs on the media entity itself, given that's the "subject" of the method.
Should this be "media entity" instead of "media item"?
Comment #73
seanbAddressed some of the comments in #72
3. Not sure what you mean, the status message is updated in the patch? Fixed the typo.
4. I feel the proposed message will probably lead to confusion. I agree a short message is better, but I think explaining the metadata fetching is queued for later processing and providing a way to fix that is necessary. If we want to fix that, we could allow editors to override queueing while create a media item, but I think that is out of scope for this issue. I left this for now.
5. Not exactly sure what you mean with "have to be inline with the t() method"?
6. Fixed. Added some documentation.
7. Fixed. Removed the word "uniquely". Most media types are currently local files, so I don't think we can assume this typically is a remote URL.
8. Fixed. Added a comment to explain the storage is overriden to handle metadata fetching outside of the database transaction.
9. When adding media to core it was decided to use media item instead of media entity. Also left this for now. I still see some occurences of media entity in some places, we should probably fix that in a follow up.
Comment #74
seanbChanging status to get some more feedback.
Comment #76
phenaproximaI love this patch! It is so much better than what we had before and fixes so many problems, it's just so many orders of magnitude of improvement that I feel like I need a cold shower after reading it. Great job!!
The @deprecated needs to explain the nature of the deprecation as per policy (referencing the existing change record), and there should be a blank line before the @see.
$queues['media_metadata_fetcher'] is guaranteed to exist, so I think we could drop the
ifcheck. Additionally, we should have $queues['media_entity_thumbnail'] be a reference to $queues['media_metadata_fetcher'], so$queues['media_entity_thumbnail'] = &$queues['media_metadata_fetcher'].We're overwriting the mapped field every time we update metadata. Isn't this a BC break?
'langcode' should use $translation->language()->getId(), to keep the abstraction sealed.
Thank Cthulhu this method is going away! That's a load off.
Why did the name of the parameter change? We can leave it as-is.
I feel like $queue_metadata_fetching should default to TRUE for convenience. Can we update the interface and this method?
This patch doesn't implement a link. Let's add a @todo which references the related issue.
"...might be temporarily incorrect". Also, what if the item has been saved unpublished? We should only tell them the item will be published later if they have actually saved it as published, no?
Is this necessary? If so, can we explain why it is?
Should be "source-specific".
Same here.
"available" --> "possible"
"...for the thumbnail alt text".
Nice explanation!! Love it.
Let's call this $queue_item, since that's consistent with createItem().
This could use a comment.
Nit, but I've seen patches be kicked back for this even though I was the one who originally wrote it this way: let's put this all on one line.
This could benefit from a comment.
"directly" --> "immediately on save"
This should mention that the thumbnail is also considered metadata.
Let's change the description to "Whether the thumbnail and metadata should be fetched via a queue."
Can we remove "to download later"?
"directly" --> "immediately on save".
Why are we calling updateMetadata()? The storage handler calls that anyway if TRUE is passed to save().
I think we can use
$this->config($media_type_config_name)->get()here.Comment #77
marcoscanoFor the record, posting here a summary of a recent discussion @phenaproxima and I had on slack about BC:
In #34.1, I pointed out a possible BC break being introduced, because I was under the assumption that mapped fields were only populated/updated when empty. It turns out I was wrong (as indicated in #57), so basically we are not changing the previous behavior.
::updateMetadata(), so no mapped fields will be overwrittenIn fact, after this patch we will actually improve the consistency, once the texts that lead to confusion are part of the code that is being removed.
Comment #78
deviantintegral commentedThis is getting close!
I meant the link in this message, which was pointed out in #76.
You can't use variables inside of t() unless we previously know the string has been translated. See Variables in user interface text.
I missed that, and have been using "item" all through out a module to refer to the remote object specifically. Thanks for pointing that out!
Ok, so then it will be the responsibility of media API clients specifically call updateMetadata()? For example, if a webhook notifies a Drupal site about an update. I think that makes sense. If a module wanted to offer editorial overrides of metadata, and preserve them through an updateMetadata call, I guess they could just check for $media->original in getMetadata()?
Comment #80
seanbI think this should address all feedback.
#76
1. Fixed
2. Fixed
3. See #77. It looked that way at first, but it actually isn't.
4. Fixed
5. Yay!
6. Fixed
7. I'm not sure about this. Most media types should not have to be queued, only external ones. We could discuss this in a follow up, but I think queueing should be the exception for specific cases, not the default.
8. Fixed
9. Fixed
10. Fixed
11. Fixed
12. Fixed
13. Fixed
14. Fixed
15. Yay!
16. Fixed
17. Fixed
18. Fixed
19. Fixed
20. Fixed
21. Fixed, changed the text a bit to make sure it stays under 80 chars.
22. Fixed
23. Fixed
24. Fixed
25. Fixed
26. Fixed
#78
Thanks! Changed the way the translation texts are built to allow the translation to be picked up while at the same time allowing users to translate the full message.
Yeah, updating mapped metadata is a difficult issue. Maybe you want to update it when changing the source field (for example the title of a youtube video), maybe not (for example the width of the video)? But I guess sources can implement their own logic in getMetadata() and other modules can also revert updated metadata during preSave().
Comment #81
phenaproximaLooks great!
Let's reword this a bit: "@deprecated in Drupal 8.6.0 and will be removed before Drupal 9. Use the 'media_metadata_fetcher' queue worker instead of 'media_entity_thumbnail'."
Seems to me like this could be streamlined to avoid repeating these hard-coded strings:
Although, on second thought, this could translate incorrectly since it assumes each sentence makes sense in that order. Alternatively, we could just set three separate messages...
Comment #82
seanb#81
1. Fixed
2. We need to translate the text as a whole so we can't avoid repeating some of the texts. Setting up individual messages is an option, but I prefer the current setup for the same reason we need to translate the text as a whole. Translation of individual parts of the texts is not optimal. Adding some comments to explain what's going on hopefully makes it a little better.
Comment #83
seanbRemoving some tags.
The checkbox for optionally fetching metadata was removed again in #58. So no need to update the IS anymore.
The change record is: https://www.drupal.org/node/2983661
Comment #84
phenaproximaWe don't need to do this. Let's just click the "Edit" link instead.
Let's add a comment indicating that we expect this test to be absent because the media item is non-new.
Why are we removing these lines? I ask because, looking at MediaStorage, the metadata should absolutely be updated on save if the source field has changed.
Same here; won't the field be re-mapped on save? I'm sure there's a good reason for this, but we need to be able explain the removal of test coverage to committers :)
Ditto.
An extra semicolon was added to the end of this otherwise unchanged line.
Comment #85
seanb#84
1. Fixed
2. Fixed
3. Fixed
4. In the old situation, we updated the metadata when the source field changed or when the metadata field was empty. In this patch, we only update metadata if the source field changes. The new approach allows users to override metadata fields with an empty value (until the source fields changes) and seems more efficient for sources with external metadata.
5. See 4
6. Fixed
Comment #87
seanbFixed the tests.
Comment #88
phenaproximaCommitters, please backport this issue.
Why? Because it fixes a major architectural problem that was discovered while working on #2831944: Implement media source plugin for remote video via oEmbed.
The problem is that, due to the way the Media class was handling thumbnail and metadata mapping, it was possible for Drupal to be waiting on HTTP/network requests while saving remote media items -- while the database was in a locked transactional state! We jury-rigged a horrible hack to work around this in
\Drupal\media\Entity\Media::save()(the hack is removed in this issue), which was always intended to be temporary. The problem is, if you call\Drupal::entityTypeManager()->getStorage('media')->save($media_item)in the current 8.6 API, you'll be able to reproduce the conditions that caused the problem. This is very, very bad indeed and actually sort of breaks the API if you think about it (depending how you call the save() method, you might break your site). Without this fix, releasing 8.6.0 and trumpeting our new oEmbed system would be...well, it would be very not good.So, given the severity of the problem, the thoroughness of the fix, and the fact that it corrects our API and stabilizes remote media items (as well as making for much cleaner code), I'm officially requesting a backport to 8.6 even if we are adding an API method (which is covered under the BC policy's 1:1 rule).
Comment #89
alexpottIf you do the value move in the media type config entity then any exported media type will also be updated. You also can trigger a deprecation notice so that developers can know that they need to update their default configuration.
I find multiple tests in the same updatetestbase test a bit odd since they are all using the same stating data.
Comment #90
samuel.mortensonSome review, mostly questions about behavior:
This repeated empty check sticks out to me - is this working around a bug, and if so can we document it or open a follow up?
Where is the
$in_queueargument documented?This looks scary - what are the replication steps for new media to be unpublished? Would #2889855: Unpublished media entity can not be accessed by owner and update any media/delete any media access possibly cached by user make this worse?
Comment #91
samuel.mortensonI also noticed that a queue item is created for every entity save, even if an item is already in the queue. Is this intentional?
Comment #92
catchThat's more or less unavoidable until we have #1548286: API for handling unique queue items.
Comment #93
seanb#89
1. Not exactly sure what you mean? Should we remove the update hook and move the value in one of the methods on the MediaType entity? Also throw a deprecation notice from the mediaType entity as well? Or should we add a deprecation notice in the update hook?
2. I guess we could create different fixtures for each update test, but in this case adding some config to the existing fixture seemed to make sense. Don't have a lot experience with this though, so if this is a blocking issue I might need some help.
#90
1. Added some documentation. A media name is required and we expect it to be there in a lot of places. We don't want to require this for editors though. That is why we have 3 ways to set a title at the moment.
We need the empty checks to make sure the fallbacks only kick in when a title is not already available.
2. Fixed
3. We unpublish a media item temporarily when:
We preferably want to land #2983456: Expose triggering update of media metadata + thumbnail to end users soon as well to provide users a way to fix this.
The issue #2889855: Unpublished media entity can not be accessed by owner and update any media/delete any media access possibly cached by user becomes more important, but this already is a pretty critical issue if you ask me.
#91
Yeah, as catch already mentioned we don't have a nice way to get around this at the moment.
Comment #94
vijaycs85Here is few review comments/questions:
Use bundle()-> instead of bundle->?
What happens here if there is no thumbnail URI? In real time or via the queue, it's going to be reported back as metadata updated?
Could we also check $saved here before queueing? If save failed for some reason, no point in fetching metadata?
Comment #95
effulgentsia commentedI think this should be marked @internal, since it's only intended to be called by MediaStorage.
I don't think we need to mark the entire class as @internal. Is it any more internal than any of our other entity type storage handlers?
Seems to me like whether or not we're called from the queue worker results in sufficiently different behavior that it warrants a separate method instead of an optional parameter. Perhaps
saveFromQueue()? In which case, we should also define that in aMediaStorageInterface.Comment #96
effulgentsia commentedRetitling this issue to focus on the goal. The API changes in the patch are just in service of that goal.
Comment #97
effulgentsia commentedSimplifying grammar :)
Comment #98
effulgentsia commentedThis might not stay Critical if other committers disagree, but raising it to that for now. I think it satisfies at least some of the criteria of https://www.drupal.org/core/issue-priority#critical-task, such as a concrete performance issue, a viable plan to resolve it, affects the runtime operation of the site, can't be committed to a patch release, and my own discretion.
I'd love to see this land before beta. Whether or not it's eligible between beta and RC might hinge on if others agree with the priority being Critical.
Comment #99
seanb#94
1. That doesn't work, bundle() returns a string and the magic getter returns
Drupal\Core\Field\EntityReferenceFieldItemList2. Good point, changed it so we now only set the thumbnail from the metadata if a URI is returned. If this fails it falls back to the default thumbnail in
preSave().3. If saving fails an exception is thrown, we only get SAVED_NEW or SAVED_UPDATED back so there is no extra value in checking that I think?
#95
1. Fixed.
2. Fixed.
3. Fixed.
Comment #101
seanbAh, new media library config.
Comment #102
vijaycs85Thanks for the update @seanB
As per docblock,
Comment #103
phenaproximaNot seeing a whole lot wrong here.
Let's mention that this MUST come before the thumbnail logic below, because $translation->label() is called and we MUST have a name set before we call that, lest it result in a remote API call or some other naughtiness.
Maybe we should cast $queue_metadata_fetching to bool here.
Where are we setting the item to unpublished?
Needs be elseif.
This is patch noise, but I'll let it slide :)
We should mention that the names of these attributes are specified by the default_name_metadata_attribute and thumbnail_uri_metadata_attribute properties of the plugin definition.
Let's expand this a little to explain why we fetch metadata outside of the transaction.
I don't think we need to keep $should_update_metadata in its own variable; the method name is self-explanatory. So this should be
if (!$should_queue && $this->shouldUpdateMetadata($media)).I'm not sure if, from a UX perspective, which would be less jarring -- allowing the item to have its original status and letting the metadata be temporarily wrong, or unilaterally changing the item's status until the queue runs. Both options suck, but our backs are against the wall for now. Maybe we should get an expert UX opinion on which would be preferable.
We don't need this @throws because we are never explicitly throwing the exception.
Would it be easy to load the same revision, unchanged (or copy-and-paste the code from the parent class which loads and sets $entity->original during save)? If so, that might smooth out potential Content Moderation-related turbulence. If it's not easy, then let's keep this as-is for now but add a @todo referencing the issue which will make this compatible with Content Moderation, and escalate that issue to major.
Shouldn't we check that the default thumbnail is used if the thumbnail is removed?
Comment #104
effulgentsia commentedIs the metadata even temporarily wrong? Suppose we kept it published and someone were to view it prior to the queue running... wouldn't it just result in
$media->getSource()->getMetadata()being called at that time?Comment #105
seanb#102 Fixed. This is not properly documented on
EntityStorageInterface::save(), but I guess that's another issue.#103
1. Fixed
2. Fixed
3. That would be
MediaStorage::save()4. Fixed
5. :)
6. Fixed
7. Fixed
8. Fixed
9. We tried this for youtube video's. You could have a video published on your site where the name is
media:remote_video:7cb1fdf5-2ad6-44ce-918c-ce480c266e87. Besides that the video would not even be shown since the embed code is not fetched yet. When search engines index this content, or non-editors see this, this could lead to confusion. So that's why we decided that showing nothing is a little less evil (for at least search engines and non-editor users). That being said, it is important to land #2983456: Expose triggering update of media metadata + thumbnail to end users pretty soon after this patch, to allow users to trigger the immediate update of metadata if needed.10. Fixed
11. Added a todo since there seem to be some things that need to be discussed about the latest vs default revision etc.
12. Fixed, good point!
#104
The
getMetadata()method is supposed to be used when saving and the fetched values should be stored in the media item. When viewing a media item, you are just displaying the saved field values and no external API calls should be needed.Comment #106
chr.fritsch@seanB and @phenaproxima put in a lot of effort into this issue and fixed one of the major problems with media.
So this looks really good to me and I hope it will still make it into 8.6.
Comment #107
seanbChanged #2889855: Unpublished media entity can not be accessed by owner and update any media/delete any media access possibly cached by user to critical since we decided that issue is a blocker for this patch.
The decision was made to temporarily unpublish media when metadata fetching is queued. We do this to make sure incomplete media items are not shown on the site and/or possibly indexed by search engines. Without the permission added in #2889855: Unpublished media entity can not be accessed by owner and update any media/delete any media access possibly cached by user, there is no way for a user to see the media item that it just created.
Comment #109
effulgentsia commentedThis patch didn't make it into beta, but I still think has a good chance of making it into RC.
Let's update that label too.
Let's change this to use
ConfigEntityUpdater. See #2989256: Old post_update functions that ->save() config entities without calling ->trustData() first mask errors in other post_update functions for why.I think we need a
MediaStorageInterfacethat defines thesaveFromQueuemethod, so that the queue worker can type-hint on the interface, not the implementation.Is there a reason for this to be internal? If so, let's document that reason. But I don't think it should be, since I think we'll be supporting the concept of resaving from the queue for a while, especially since we also have a config key for it.
What about moving this to within the
saveFromQueue()implementation? That way,MediaStoragecontrols both the unpublishing and the publishing as its own implementation detail.Comment #110
effulgentsia commentedHm, I'm also starting to wonder why have this logic in a storage handler? Why not have Media itself implement the desired save() and saveFromQueue() methods? Maybe there's good reasons for the storage handler, but if so, what are they?
Comment #111
phenaproximaBecause any entity can be saved in two ways: $entity->save() and $storage->save($entity). It needs to work the same way either way, and since the first form is just a more concise variant of the second form, the storage handler is what makes the most sense for this logic.
Comment #112
xjmSo what is the actual impact on site owners of this bug? The issue summary does not describe what I'd consider a critical issue. It sounds like the bug is:
If site owners have remote media types on their sites, their content will sometimes be saved without all of the remote metadata. There is not a data integrity problem because the data is downloaded eventually. The concern is mainly that the thumbnails will not appear in the meanwhile and so, for a contrib media source, the user (or maybe a search engine) might see the default fallback thumbnail rather than the preferred thumbnail for awhile.
Unless media items in HEAD are currently taking like 100ms extra to save because of the thumbnail functionality, or unless there is an actual data integrity problem or race condition, this issue doesn't seem like it would be critical. It definitely does sound major, which is serious enough in its own right without needing to bend the critical priority.
However, if there is a race condition, data integrity issue, etc. in HEAD, or if saving a media item hangs/whitescreens, then critical seems appropriate.
Comment #113
effulgentsia commentedRe #112, yeah, what I thought was the Critical issue when I wrote #98 was when you added a remote video (media/add/remote_video), that an HTTP request was being issued inside of a database transaction. However, I'm not currently able to reproduce that on 8.6 HEAD, because even in 8.6 HEAD,
Media::save()already does the metadata fetching prior to callingparent::save(). So, yeah, a question for folks working on this issue: other than some code/API cleanup, what is this issue solving?Comment #114
larowlanmy reading of this indicates that we might have an issue in this scenario
* User saves media as published
* Storage handler queues metadata fetch and marks it as unpublished, noting it needs to be re-published
* Content editor goes back and resaves as unpublished
* Storage handler queues another metadata fech and leaves it as unpublished but notes original status as unpulished
* queue runs, first task publishes, second on unpublishes
In an instance where the first queue item is run and the second one is deferred to a future cron run, do we end up with the situation where the queue overrides the content-editors intention to unpublish
Comment #115
samuel.mortenson@larowlan Wouldn't the
$media->isNew()check in the snippet you posted prevent this scenario?Comment #116
phenaproximaIf you call $media->save(), there is no chance that you will be waiting on an HTTP request during a database transaction.
If you call
Drupal::entityTypeManager()->getStorage('media')->save($media), you might.Since both ways are valid ways of saving a media item (and in fact, the second one is probably more "correct"), we need to fix this so that, no matter how you save your media items, you cannot end up in an HTTP request during a database transaction.
That's why this is critical. The crux of the issue is not whether authors have the correct metadata or not at save time; it's that they could potentially scramble their database if they are saving too many media items the "wrong" way at once.
Now, if that means we split off a new issue and fix the immediate problem without cleaning up the API as well, so be it. We can improve the actual API for 8.7.x. I defer that decision to the committers. But the main problem (the possibility of waiting for HTTP during a database transaction) is the actual critical bug that must be fixed now for 8.6.
Comment #117
larowlanThat makes the scenario worse, as this could occur right?
Comment #118
xjmYeah, if we can separate "prevent database scrambling" from "add improved APIs", we should definitely do that.
@larowlan, @effulgentsia, @plach, and I discussed this and we're somewhat concerned about the risk in general. This sounds like the sort of tricky bug where fixing it might introduce related regressions in data integrity, performance, or user expectation (with regard to the queuing process, etc.) We also discussed that the proposed queue/publishing workflow could have unintended side effects on sites using Content Moderation.
At the least, we'll want some test cases around the Workflow scenarios, and some thorough manual testing overall.
Thanks @phenaproxima!
Comment #119
catchI'm also concerned about the unpublishing - especially cases where the media item should be unpublished for some other reason.
If we're concerned about search indexing and caching, then we should prevent caching when the media item is rendered with the placeholder/incomplete metadata - although given items are saved again when they get the correct metadata, cache tags should be invalidated, so not sure even this is a problem at least on the cache side?
Twitter shows embedded articles with a placeholder thumbnail sometimes for minutes at a time, so there are user experience patterns where embedded media takes a while to update, and it doesn't to me feel less jarring than something being unpublished.
For example if I create a media item from a node/article form via the field (when this option is available), and the media item is unpublished, won't it just completely fail to show when the node is rendered? Would I even be able to reference the media item from the node after creating it or would it be immediately unavailable?
Then that node referencing the unpublished media could also get cached/search indexed in the interim, leading to a similar problem to the one this issue is trying to solve - except that with the node case it will be even harder to prevent caching (because referencing unpublished media items is a valid case if the media item is unpublished later).
I do agree with ensuring things happen outside the transaction though.
Comment #120
seanbI see the automatic unpublishing is something we need to discuss a little more. Removed it from this patch since it was not part of the original scope for this issue, and even though showing a media item to users before metadata is fetched could be confusing, this issue is not completely new. We also have #2983456: Expose triggering update of media metadata + thumbnail to end users to allow users to fetch metadata directly, which might be good enough.
Hope this takes away the concerns to commit this patch.
#109
1. Fixed
2. Fixed
3. Fixed. Added an internal interface for queued media only.
4. I think
saveFromQueue()should be reserved for the queue defined in the module. The supported/documented way to save media is through thesave()method. If any calling code want to update the metadata directly it should callupdateMetadata()on the media item first, not "misuse"saveFromQueue().5. Removed the automatic unpublishing.
Comment #122
seanbFor some reason the new
MediaStorageQueuedInterfacewas in the interdiff but not in the patch.Comment #124
seanbMissed one assert that should be removed.
Comment #125
marcoscanoWell, as pointed out in #64, the "temporarily broken" name is something being introduced in this patch, that's the reason we decided to do the unpublishing here and not in a follow up.
One argument could be made that for most affected sites, it would be preferable to keep things as they are today rather than having the effect mentioned in #64.
Comment #126
seanbAs catch mentioned in #119:
I agree having media unpublished is not necessarily better. Providing users with a way to fix it through #2983456: Expose triggering update of media metadata + thumbnail to end users would really help, with that option users might not even have a problem? Besides that, users also still have the choice to create the media as unpublished, or to disable queuing on a media type.
Maybe we could add a warning to the media form while creating it, instead of after creating, so users know that they are creating a media item that is temporarily incorrect. They can choose to not publish it immediately instead of the module making this decision?
Comment #127
phenaproximaAfter discussion with @seanB is Slack, I've opened #2990896: Move Media::save() logic into storage handler to prevent data integrity issues when media items with remote content are saved, which is critical, to move the Media::save() stuff into the storage handler. That one needs to be fixed in 8.6 and has a much narrower scope. That is just about moving the current hack to a better spot. We can sort out the API and UX implications further in this issue, which no longer needs to be critical (or backported to 8.6.x).
Comment #128
deviantintegral commentedThis reads as if there's a b/c break if your source doesn't implement the default name. Actually, that is a b/c break given that media sources don't have to extend from
MediaSourceBase. A cursory search in core isn't showing any cases that will fail if NULL is returned, so perhaps we should change this toevery source should provide?\Drupal\media\MediaStorage::savecallsupdateMetadata, but the method parameter is only typehinted toEntityInterface. We should probably add an instanceof check here.Likewise, should
public function saveFromQueue(EntityInterface $media);be a MediaInterface?Should this just be
@return int?Comment #130
dww#124 isn't applying to 8.7.x anymore. Just re-queued bot to confirm. Anyone setup to quickly re-roll this?
Also NW for #128, all of which seems like good feedback to me.
Haven't done a thorough review yet.
Thanks, everyone! Would love to ses (and help get) this fixed in core.
Cheers,
-Derek
Comment #131
idebr commentedComment #133
dwwI keep getting nailed by this, and would love to see it in core. I just tried to re-roll it for 8.7.x branch (which is where I need it right now). It was *really* gnarly, and I probably broke all kinds of things. ;) Running tests locally now, I predict all sorts of failures...
BUT, hopefully this is a step closer to working again...
Sadly, interdiff is majorly confused, so I'm having to attach a raw diff of the two patch files.
I suspect the test bot will fail miserably with this, but let's see what happens. ;)
Would *love* to get some input and help from the media maintainers on this...
Comment #135
seanbFetching all metadata in a queue is something we got quite some pushback on. But if we expand the API to allow metadata refresh, that might be a totally different issue. That might be better addressed in #2983456: Expose triggering update of media metadata + thumbnail to end users.
So maybe just close the this issue as won’t fix, and added the relevant API parts to refresh metadata in #2983456: Expose triggering update of media metadata + thumbnail to end users. It would significantly simplify the solution. I think all we need is a
updateMetadata()method to improve this and have a way for users to execute that when needed.Comment #136
dwwWFM. See #2983456-14: Expose triggering update of media metadata + thumbnail to end users.
Thanks!
-Derek