This issue is spun off from #2831274-390: Bring Media entity module to core as Media module.

There has been some disagreement and bikeshedding about how to handle updates to media entity thumbnails. Currently, media entities have an updateThumbnail() method but it is an internal method and should not be called by external code. We need to decide on what the API will be for this, implement a patch if necessary, and finalize this once and for all!

  • We want to download a thumbnail for media (when not queued) before saving the media item to prevent a double save.
  • Since Media::preSave() is part of the database transaction we can't use that to download thumbnail from an external source. This causes fails in SQLite.
  • Not only downloading thumbnail can be slow, other API's can also be slow or unavailable causing similar issues.

This is postponed on #2831274: Bring Media entity module to core as Media module.

Remaining Tasks

  • Collect all the history of this issue, all the bikeshedding, and so forth, and move it into this issue so that we can have a coherent place to talk about it and decide how to proceed.
  • Write a patch if necesssary.
  • Commit the patch, if it exists, and breathe a collective sigh of relief.

Proposed solution

  • We are going to override the media storage handler to call a new updateMetadata() on media before the transaction starts. We will need to document why this is done since ideally we would do this in Media::preSave().
  • We move shouldUpdateThumbnail() to the storage handler since we don't want it as a public method in Media.
  • We will remove updateQueuedThumbnail() and make the queue use updateThumbnail().
  • When the thumbnail is empty, we add a default value in Media::preSave(), this is done for the media name as well.
CommentFileSizeAuthor
#133 2878119.124_133.raw-diff.txt40.32 KBdww
#133 2878119-133.patch58.91 KBdww
#124 interdiff-122-124.txt640 bytesseanb
#124 2878119-124.patch64.53 KBseanb
#122 2878119-122.patch64.87 KBseanb
#120 interdiff-105-120.txt10.4 KBseanb
#120 2878119-120.patch63.85 KBseanb
#105 interdiff-101-105.txt8.02 KBseanb
#105 2878119-105.patch65.81 KBseanb
#101 interdiff-99-101.txt1.36 KBseanb
#101 2878119-101.patch64.84 KBseanb
#99 interdiff-93-99.txt7.26 KBseanb
#99 2878119-99.patch63.48 KBseanb
#93 interdiff-87-93.txt3.6 KBseanb
#93 2878119-93.patch63.16 KBseanb
#93 2878119-93.patch63.16 KBseanb
#87 interdiff-85-88.txt1.51 KBseanb
#87 2878119-88.patch62.36 KBseanb
#85 interdiff-82-85.txt3.36 KBseanb
#85 2878119-85.patch62.33 KBseanb
#82 interdiff-80-82.txt1.73 KBseanb
#82 2878119-82.patch63.19 KBseanb
#80 interdiff-73-80.txt14.43 KBseanb
#80 2878119-80.patch62.73 KBseanb
#73 interdiff-71-73.txt3.87 KBseanb
#73 2878119-73.patch61.53 KBseanb
#71 interdiff-69-71.txt3.3 KBseanb
#71 2878119-71.patch61.25 KBseanb
#69 interdiff-66-69.txt7.18 KBseanb
#69 2878119-69.patch57.78 KBseanb
#66 interdiff-58-65.txt5.47 KBmarcoscano
#66 2878119-65.patch52.72 KBmarcoscano
#64 test2.png263.38 KBmarcoscano
#64 test1.png263.75 KBmarcoscano
#61 media:remote_video:cf4451ef-0a2c-489d-87c8-7003cf6a883d | Drush Site-Install 2018-07-04 16-30-43.png937.78 KBdeviantintegral
#58 interdiff-56-58.txt18.93 KBseanb
#58 2878119-58.patch52.51 KBseanb
#56 interdiff-55-56.txt18.24 KBseanb
#56 2878119-56.patch57.49 KBseanb
#55 interdiff-54-55.txt4.14 KBmarcoscano
#55 2878119-55.patch62.55 KBmarcoscano
#54 interdiff-53-54.txt1.37 KBmarcoscano
#54 2878119-54.patch59.41 KBmarcoscano
#53 interdiff-52-53.txt1.03 KBmarcoscano
#53 2878119-53.patch59.39 KBmarcoscano
#52 interdiff-51-52.txt1.65 KBmarcoscano
#52 2878119-52.patch59.19 KBmarcoscano
#51 interdiff-44-51.txt18.7 KBmarcoscano
#51 2878119-51.patch58.62 KBmarcoscano
#44 2878119-44.patch58.23 KBseanb
#42 interdiff-40-42.txt17.51 KBseanb
#42 2878119-42.patch58.22 KBseanb
#40 interdiff-38-40.txt7.28 KBseanb
#40 2878119-40.patch51.22 KBseanb
#38 interdiff-35-38.txt2.59 KBseanb
#38 2878119-38.patch51.22 KBseanb
#35 interdiff-33-35.txt32.58 KBseanb
#35 2878119-35.patch50.16 KBseanb
#33 interdiff-31-33.txt1.25 KBseanb
#33 2878119-33.patch26 KBseanb
#31 interdiff-28-31.txt3.64 KBseanb
#31 2878119-31.patch24.75 KBseanb
#28 interdiff-26-28.txt4.56 KBseanb
#28 2878119-28.patch22.89 KBseanb
#26 2878119-26.patch23.34 KBseanb
#17 2878119-17.patch23.01 KBmarcoscano
#17 interdiff-14-17.txt7.46 KBmarcoscano
#14 2878119-14.patch19.53 KBphenaproxima
#11 2878119-11.patch7.29 KBphenaproxima
#7 2878119-7.patch4.04 KBphenaproxima

Comments

phenaproxima created an issue. See original summary.

phenaproxima’s picture

Status: Postponed » Active

Version: 8.4.x-dev » 8.5.x-dev

Drupal 8.4.0-alpha1 will be released the week of July 31, 2017, which means new developments and disruptive changes should now be targeted against the 8.5.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Version: 8.5.x-dev » 8.6.x-dev

Drupal 8.5.0-alpha1 will be released the week of January 17, 2018, which means new developments and disruptive changes should now be targeted against the 8.6.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

osopolar’s picture

FWIW: There is a related issue in the queue of Media entity image module.

phenaproxima’s picture

Priority: Normal » Major
Issue tags: +Media Initiative

#2831944: Implement media source plugin for remote video via oEmbed has stretched the current thumbnail non-API to its absolute limit, and necessitated some ugly, temporary, thumbnail-related hacks in the Media module in order to land that issue. It is therefore imperative that we get this issue sorted out, so I'm bumping the priority.

phenaproxima’s picture

Status: Active » Needs review
StatusFileSize
new4.04 KB

A first attempt, just to see how many tests break. The main difference between this and what we already have is that, during Media::preSave(), the default thumbnail will always be used, so as to avoid expensive and dicey thumbnail download operations.

Status: Needs review » Needs work

The last submitted patch, 7: 2878119-7.patch, failed testing. View results

seanb’s picture

Can we close #2862465: Move \Drupal\media\Entity\Media::updateThumbnail into its own service as a duplicate of this one? Or close this and continue in the other issue?

phenaproxima’s picture

I'm closing the other issue because this is more general. We have the opportunity to totally redesign and improve Media's thumbnail API, so let's do that in here.

phenaproxima’s picture

StatusFileSize
new7.29 KB

This will almost certainly break tests, but it illustrates the solution I had in mind for this knotty problem. It's nice to have the slickness of thumbnail updates being part of the save operation, but it's extremely dicey and bad to have that logic be part of the Media entity class. This moves the logic to the storage handler, which should hopefully give us the best of both worlds.

seanb’s picture

Issue summary: View changes

So, just had a nice discussion about this with @phenaproxima and @marcoscano. We decided on the following:

  • We want to download a thumbnail for media (when not queued) before saving the media item to prevent a double save.
  • Since Media::preSave() is part of the database transaction we can't use that to download thumbnail from an external source. This causes fails in SQLite.
  • We are going to override the media storage handler to call a new updateThumbnail() on media before the transaction starts. We will need to document why this is done since ideally we would do this in Media::preSave().
  • We move shouldUpdateThumbnail() to the storage handler since we don't want it as a public method in Media.
  • We will remove updateQueuedThumbnail() and make the queue use updateThumbnail().
  • When the thumbnail is empty, we add a default value in Media::preSave(), this is done for the media name as well.

Please let me know if any of this should be changed.

seanb’s picture

Title: Build a stable API for updating media item thumbnails » Build a stable API for updating media item metadata
Issue summary: View changes

Important change of direction. Phenaproxima noticed that other external metadata could have the same issues as the thumbnail downloads. Slow API's could cause even fetching the name metadata to be slow.

To fix this we should probably not only update the thumbnail before the save transaction or from the queue, but all metadata. Instead of updateThumbnail() we would add a updateMetadata() method to MediaInterface.

The only problem we have is the media name. This is defined as metadata. If fetching the name fails, or is done through the queue, we need a temp name for the media item.

phenaproxima’s picture

Status: Needs work » Needs review
StatusFileSize
new19.53 KB

A first, honest attempt with the agreed-upon direction. Let's see how many tests this breaks (hint: a lot).

Status: Needs review » Needs work

The last submitted patch, 14: 2878119-14.patch, failed testing. View results

marcoscano’s picture

@phenaproxima nice work! I'm glad we are cleaning and simplifying some of the thumbnail-handling, it was a bit obscure so far.

  1. +++ b/core/modules/media/src/Entity/Media.php
    @@ -166,137 +167,97 @@ public function getSource() {
    +  protected function updateMappedMetadata() {
    ...
    +  protected function updateThumbnail() {
    ...
    +  protected function setDefaultThumbnail() {
    

    I know we always try to minimize the API surface we add to the interface, but could we consider making these methods public as well?

    I definitely see use cases for contrib or custom code wanting to perform actions such as: revert the thumbnail to the default one, force-trigger another update of the thumbnail at some random point, or updating just the thumbnail without necessarily updating the mapped fields, etc.

  2. +++ b/core/modules/media/src/Entity/Media.php
    @@ -166,137 +167,97 @@ public function getSource() {
    +    $this->thumbnail->alt = $this->t('Thumbnail', [], [
    

    I have thoughts on how hardcoding the ALT and TITLE here would impact the solution being cooked in #2956368: MediaThumbnailFormatter produces unhelpful text alternative and title attributes for media thumbnails. But once this is in theory just some "temporary" initial values, I guess we shouldn't worry too much at this point.

    If we end up making this public, though, then I'd try to have some optional arguments so callers could opt-out of this hardcoding of ALT/TITLE, if wanted.

  3. +++ b/core/modules/media/src/Entity/Media.php
    @@ -305,45 +266,17 @@ protected function shouldUpdateThumbnail($is_new = FALSE) {
    +        $translation->setName($this->bundle->entity->label() . ' ' . $this->uuid());
    

    Currently if the source does not provide a default name metadata, the source base falls back to a name made of media:{type machine name}:{uuid}. Should we use that here, in name of consistency?

  4. +++ b/core/modules/media/src/MediaInterface.php
    @@ -64,4 +64,12 @@ public function setCreatedTime($timestamp);
    +   * Updates all mapped metadata for the media item.
    

    Looking at the implementation this doesn't seem to completely describe what this method does. (For instance it also updates the thumbnail, which is not a mapped metadata)

  5. +++ b/core/modules/media/src/MediaStorage.php
    @@ -0,0 +1,155 @@
    +/**
    

    Nit: My phpcs is complaining we need a class description string here.

  6. +++ b/core/modules/media/src/MediaStorage.php
    @@ -0,0 +1,155 @@
    +    // Only update the metadata if the queue is disabled for the media item
    +    // _and_ an update is actually needed (i.e., the value of the source field
    +    // has changed).
    +    if (!$should_queue && $this->shouldUpdate($media)) {
    +      $media->updateMetadata();
    +    }
    

    Are we sure the queued-thumbnails scenario is well handled here? In that case, when the worker triggers the saving of the entity, the media item will already have a thumbnail (the default one), wont' be new, and the source field would not have been changed. So unless I'm seeing something wrong, this will just re-save the entity in that case?

  7. +++ b/core/modules/media/src/MediaStorage.php
    @@ -0,0 +1,155 @@
    +    $saved = parent::save($media);
    

    How about we document here why this method exists, and why the call to parent::save() must be executed after the ::updateMetadata() call?

  8. +++ b/core/modules/media/src/MediaStorage.php
    @@ -0,0 +1,155 @@
    +   *   TRUE if the media item's metadata needs to be updated, otherwise FALSE.
    

    This may be a personal preference, but I tend to hate when the docblock doesn't give me enough info and forces me to read the code implementation to understand what the function does... :) Maybe just adding something like "Metadata should be updated whenever the sourcefield is changed, or if this is a new entity being created." to the current text would improve this help text a lot!

  9. +++ b/core/modules/media/src/MediaStorage.php
    @@ -0,0 +1,155 @@
    +  protected function shouldUpdate(MediaInterface $media) {
    

    Can we call this method ::shouldUpdateMetadata()?

  10. +++ b/core/modules/media/src/MediaStorage.php
    @@ -0,0 +1,155 @@
    +    $original = isset($media->original)
    

    Apparently $entity->original is set as part of EntityStorageBase::save(), so I guess we could directly always use ::loadUnchanged() here?

  11. +++ b/core/modules/media/src/MediaStorage.php
    @@ -0,0 +1,155 @@
    +  protected function hasSourceFieldChanged(MediaInterface $media, MediaInterface $original) {
    

    Is it too crazy to wonder if this would better live inside MediaSourceInterface rather than here? No strong opinions on this though...
    If we do move it to the source, I would move the ::shouldUpdateMetadata() method there too.

marcoscano’s picture

StatusFileSize
new7.46 KB
new23.01 KB

Didn't address any of #16, just trying to fix some tests.

One of the things I noticed while working on this is that we were always setting the "temporary" name, while we should only do it in queued operations. Otherwise we bypass all of the default naming provided by the sources. I've changed this in the MediaStorage::save() and it seems to work well now.

Fixed too some other edge cases not being dealt with, according to the tests :).

Also, removed this chunk of the test Drupal\Tests\media\Kernel\MediaSourceTest::testThumbnail:

    // Remove the thumbnail and make sure that it is auto-updated on save.
    $media->thumbnail->target_id = NULL;
    $this->assertEquals('public://thumbnail2.jpg', $media_source->getMetadata($media, 'thumbnail_uri'), 'Value of the thumbnail metadata attribute is not correct.');
    $media->save();
    $this->assertEquals('public://thumbnail2.jpg', $media->thumbnail->entity->getFileUri(), 'New thumbnail was not added to the media item.');
    $this->assertEquals('Mr. Jones', $media->thumbnail->title, 'Title text was not set on the thumbnail.');
    $this->assertEquals('Thumbnail', $media->thumbnail->alt, 'Alt text was not set on the thumbnail.');

Because I cannot think of a normal use-case where the user removes an existing thumbnail and we need to detect that. The basefield is not available on the form display, so probably the only way of removing the thumbnail is... by code?

Again, in the same test, I've commented out:

    // Remove the value of the mapped field and make sure that it is re-mapped
    // on save.
    \Drupal::state()->set('media_source_test_attributes', [
      $attribute_name => ['title' => 'Attribute to map', 'value' => 'Snowball'],
    ]);
    $media->{$field_name}->value = NULL;
    $this->assertEquals('Snowball', $media_source->getMetadata($media, $attribute_name), 'Value of the metadata attribute is not correct.');
    $media->save();
    $this->assertEquals('Snowball', $media->get($field_name)->value, 'Metadata attribute was not mapped to the field.');

With the current approach, metadata is updated/mapped only:
- When the media is created and non-queued
- When the queue item is processed
- When a media is saved AND source field changed

In that chunk of code we test something else. Is this an intended feature we want to support? I personally don't think so, but I left the test commented out so it's more explicit we need to discuss this further.

seanb’s picture

One of the things I noticed while working on this is that we were always setting the "temporary" name, while we should only do it in queued operations.

Actually, I think this is intentionally. The name could be from an external webservice that might not be available or really slow. We can't depend on the media source providing it, same as the thumbnail.
This also made me notice something else. When looking at the patch the name currently only is provided by the media source when the name is empty. This is a problem when you want to have a media source set the name when you change the source field for example. I found a bug about that: #2933012: Expose the filename as metadata for file based media sources.
Might be best to solve this through #2882473: Hide the media name basefield from the entity form by default.

  1. +++ b/core/modules/media/src/Entity/Media.php
    @@ -180,12 +180,16 @@ public function updateMetadata() {
    +  protected function updateMappedMetadata($override = FALSE) {
    

    Do we really need this parameter? Just don't call the methos if you don't want to update right?

  2. +++ b/core/modules/media/src/Entity/Media.php
    @@ -180,12 +180,16 @@ public function updateMetadata() {
    +      if ($this->get($field_name)->isEmpty() || $override) {
    

    We should probably remove this check right? If we call the method, just update everything. Why only when empty or if override is true?

  3. +++ b/core/modules/media/src/MediaInterface.php
    @@ -67,9 +67,13 @@ public function getSource();
    +  public function updateMetadata($override = FALSE);
    

    See my comment above about the parameter.

  4. +++ b/core/modules/media/tests/src/Kernel/MediaSourceTest.php
    @@ -131,13 +131,14 @@ public function testMetadataMapping() {
    +    // @todo: Is this a feature we really want to support?
    +//    \Drupal::state()->set('media_source_test_attributes', [
    +//      $attribute_name => ['title' => 'Attribute to map', 'value' => 'Snowball'],
    +//    ]);
    +//    $media->{$field_name}->value = NULL;
    +//    $this->assertEquals('Snowball', $media_source->getMetadata($media, $attribute_name), 'Value of the metadata attribute is not correct.');
    +//    $media->save();
    +//    $this->assertEquals('Snowball', $media->get($field_name)->value, 'Metadata attribute was not mapped to the field.');
    

    Good question, I guess updating the metadata only when the source field changes is a better than updating it every time. So when you only clear or change a metadata field, we should probably not automatically update that.

  5. +++ b/core/modules/media/tests/src/Kernel/MediaSourceTest.php
    @@ -189,14 +190,6 @@ public function testThumbnail() {
    -    // Remove the thumbnail and make sure that it is auto-updated on save.
    -    $media->thumbnail->target_id = NULL;
    -    $this->assertEquals('public://thumbnail2.jpg', $media_source->getMetadata($media, 'thumbnail_uri'), 'Value of the thumbnail metadata attribute is not correct.');
    -    $media->save();
    -    $this->assertEquals('public://thumbnail2.jpg', $media->thumbnail->entity->getFileUri(), 'New thumbnail was not added to the media item.');
    -    $this->assertEquals('Mr. Jones', $media->thumbnail->title, 'Title text was not set on the thumbnail.');
    -    $this->assertEquals('Thumbnail', $media->thumbnail->alt, 'Alt text was not set on the thumbnail.');
    -
    

    I think the point of the test was to check if an empty thumbnail field gets updated when empty. Also when removing the file entity the thumbnail field is also cleared right? I think the test is a good one to have.

marcoscano’s picture

@seanB thanks for reviewing!

Actually, I think this is intentionally. The name could be from an external webservice that might not be available or really slow.

Well, that's why the queue-based approach exists, right? If the media type is configured not to use the queue, then it is assumed that metadata can be fetched in real-time, even if that's an expensive operation. Or maybe I'm missing something here :)

When looking at the patch the name currently only is provided by the media source when the name is empty. This is a problem when you want to have a media source set the name when you change the source field for example.

As I mentioned in #2882473-18: Hide the media name basefield from the entity form by default, IMHO I believe this is an impossible problem to solve, without risking to lose user-entered data. For me the safest approach is just to stick to update only if empty.

1)

Do we really need this parameter? Just don't call the methos if you don't want to update right?

Not really... once now the fields will not be empty (because we have generated a default value prior to saving), when the queue worker runs, it would do nothing if the $override parameter isn't there (as the test failures demonstrate). The override parameter would be indeed unnecessary if we removed also the ->isEmpty() check, but in this case we would run into the other issues I mentioned above.

2) and 3) same as before

4) Yep, that's my opinion too

5) Hm, how is this different than 4? The source field hasn't changed here, so we sholdn't update the thumbnail, even if it's empty. If some code deleted its value, I guess it's logical to expect that the same code should be responsible for setting a new one? (once the user can't delete the thumbnail through the UI).

Thanks!

seanb’s picture

Well, that's why the queue-based approach exists, right?

The media source name could be empty though if the source fails to get one (external service could be down). But I agree we should not undo the logic in getName(). How about we add the default name code from preSave() to getName(), and move the code you added in MediaStorage to updateMetadata() (since the name could actually be metadata)?

I'll add my thoughts about #2882473: Hide the media name basefield from the entity form by default in that issue.

Not really... once now the fields will not be empty (because we have generated a default value prior to saving), when the queue worker runs, it would do nothing if the $override parameter isn't there (as the test failures demonstrate).

If we remove the complete if statement (if ($this->get($field_name)->isEmpty() || $override) {, including the isEmpty() check) and always update the metadata this will not be the case right? The code calling the method should determine whether or not it actually wants to update the metadata.

Hm, how is this different than 4? The source field hasn't changed here, so we sholdn't update the thumbnail, even if it's empty.

You are right, we should replace this with a test more specific for updateMetadata().

phenaproxima’s picture

oEmbed support has landed (again -- hopefully won't get reverted this time), so this needs a re-roll since that patch changed a lot of Media's save logic (in order to work around the very problems we're trying to solve in this issue, that is).

oEmbed support lit a fire under this issue's butt because it highlighted the weaknesses of Media's remote media handling. We were able to hack Media enough to bludgeon it into working and consistently passing tests, but the hacks in question are unmaintainable and were always intended to be temporary anyway.

Now that core officially supports remote media, this issue is critically important to making that functionality rock-solid. Therefore, even though we are adding something(s) to Media's API, I'd like to propose that this become a beta target for 8.6.x, or at least be marked for backport to 8.6.x, since this will likely miss 8.6.0's alpha deadline. Therefore, tagging this for release manager review.

phenaproxima’s picture

Issue tags: +Media critical

This is extremely important to Media remaining stable, so I'm marking it as Media-critical.

phenaproxima’s picture

Issue tags: +beta target

Briefly discussed this issue with @catch in Slack. Because this is an API addition, we should be able to land it during the 8.6.0 beta period, so I'm marking this as a beta target since honestly, I think we'll be lucky to land it during the alpha stage. Once 8.6.0 is out, backporting this issue will be a dicey proposition, but we'll cross that bridge when we get there.

catch’s picture

This adds a single method to MediaInterface which is fine under the 1-1 interface rule.

I agree this is an important issue (would probably reclassify as a bug), but while it seems like a high impact/low risk change up until beta (assuming that's the only API impact), less sure about it getting in during rc or a patch release.

phenaproxima’s picture

Thanks, @catch!

Fire up your coffee makers and let's land this during alpha/beta.

seanb’s picture

Status: Needs work » Needs review
StatusFileSize
new23.34 KB

Rerolled and just uploading my progress and checking what the testbot thinks. Also trief to address the comments in #16:

1. We discussed this on Slack and decided that we would only add public methods we need for now. If there are good reasons to make other methods public, feel free to open a follow up and let's discuss it.
2. There is a follow up for better default, so let's improve that in the other issue.
3. Fixed. Changed it to match the source base.
4. Fixed.
5. Fixed.
6. Fixed, added a call to updateMetadata() before saving the media item from the queue worker. I can't think of a reason why we would not always want to update the metadata when processing the queue?
7. Fixed. At least added some documentation, but please let me know if this could be further improved.
8. Fixed.
9. Fixed.
10. Fixed.
11. I don't think we need to make this a public method on the media source. This is pretty generic. If needed we can fix this in a follow up.

No interdiff because the reroll messed it up.

Status: Needs review » Needs work

The last submitted patch, 26: 2878119-26.patch, failed testing. View results
- codesniffer_fixes.patch Interdiff of automated coding standards fixes only.

seanb’s picture

Status: Needs work » Needs review
StatusFileSize
new22.89 KB
new4.56 KB

I'm affraid we will still need presave to set the default thumbnail and the default name for media. That should fix the tests.

Status: Needs review » Needs work

The last submitted patch, 28: 2878119-28.patch, failed testing. View results

deviantintegral’s picture

With this patch, there still seems to be a key limitation with thumbnail handling: there is no good way to handle transient errors when retrieving thumbnails. For example:

  • The thumbnail URI returns a temporary error, such as a 503. More commonly, I've seen at least one (internal, enterprise) video service return thumbnail URIs in videos where the thumbnails don't exist on the CDN yet.
  • The URI returns a 403 or 401, such as when credentials have been changed for an API. This is most common when queuing thumbnails, as there may be a gap between pulling media data and downloading the thumbnail.

This simplest approach is to not catch any exceptions in getMetadata() in a media source plugin. However, if thumbnails aren't queued, then those exceptions are never caught by the media entity form, and end users are left with a 503. If a source plugin does catch an exception and log it, it's still required by the getMetadata() method to return NULL. For thumbnails, this means the default thumbnail is saved as the canonical thumbnail, and the entity is not re-inserted into the queue to try again later.

This comes up with any metadata, and not just thumbnails, but at least with other fields an empty mapped field can be updated the next time the media item is saved.

I suggest we add an exception along the lines of MetadataNotAvailableException, that is thrown by source plugins when an error prevents loading an attribute. In this issue, we add a catch for this with thumbnail handling, leaving another catch / error handler for the main media entity form in a followup.

seanb’s picture

Status: Needs work » Needs review
StatusFileSize
new24.75 KB
new3.64 KB

Added the default thumbnail alt/title through presave as well. Also fixed an issue in the queue. Last but not least updated the text on the media type form since we are no longer queueing thumbnail downloads, but all metadata updates. We might want to have a followup to actually rename the queue?

#30 I agree this could be an issue. What to do when metadata is not available? We could stop the media item from being saved, but that doesn't work when the metadata is fetched in the queue. I think this is a separate issue and best discussed and fixed in a separate issue though.

Status: Needs review » Needs work

The last submitted patch, 31: 2878119-31.patch, failed testing. View results

seanb’s picture

Status: Needs work » Needs review
StatusFileSize
new26 KB
new1.25 KB

Sigh, broke another test changing the media type form.

marcoscano’s picture

Status: Needs review » Needs work

Nice work, thanks @seanB!

A) If I recall correctly from the slack chat we had, we still need to document very clearly that the default name generation that source plugins provide should always be done without any remote API calls, once it will be called from inside the transaction. I can't think of any better place for this other than the MediaSourceInterface::getMetadata() docblock, although this only refers to the default name metadata.

B) I believe this could be a good place for fixing #30, and I like the exception-based approach. I'm just not sure what would be the best behavior when the media entity catches the exception thrown by the source when the metadata generation failed. Should we force-queue it so it can be retried later? Use an empty value and show an error to the user? Other ideas?

Also,

  1. +++ b/core/modules/media/src/Entity/Media.php
    @@ -166,36 +165,59 @@ public function getSource() {
    +  /**
    +   * Updates all fields that are mapped to metadata attributes.
    +   */
    +  protected function updateMappedMetadata() {
    +    $field_map = $this->bundle->entity->getFieldMap();
    +    foreach ($field_map as $attribute_name => $field_name) {
    +      $this->set($field_name, $this->getSource()->getMetadata($this, $attribute_name));
    +    }
    +  }
    

    Correct me if I'm wrong, but doesn't this represent a change respect to the previous behavior? (maybe I'm just confused about this point).
    If I understand it correctly, before the mapped fields were only populated if empty, regardless of everything else. With the new code, when we update metadata, we always update as well the mapped fields, regardless if they are empty or not. If this is the case, in the uncommon but possible scenario where a user is saving a media item with a different source value but manually-entered mapped field, we would override the manually-entered data with the fetched metadata. It may actually make sense, but if it's a new behavior we should at least document this very well. (Probably in the metadata mapping form?)
    Also, once ::updateMetadata() is public, I believe we should explicitly indicate on its docblock that whoever is calling this method programmatically will also update mapped fields, regardless of their current values.

  2. +++ b/core/modules/media/src/Entity/Media.php
    @@ -166,36 +165,59 @@ public function getSource() {
        * @return \Drupal\media\MediaInterface
    ...
    +   *   The updated media entity.
    

    Could we simplify this with @return $this ?

  3. +++ b/core/modules/media/src/MediaInterface.php
    @@ -64,4 +64,12 @@ public function setCreatedTime($timestamp);
    +   * @return \Drupal\media\MediaInterface
    +   *   The updated media entity.
    

    Could we simplify this with @return $this ?

  4. +++ b/core/modules/media/src/MediaStorage.php
    @@ -0,0 +1,166 @@
    +    // - the queueing is disabled _and_ an update is actually needed (i.e., the
    +    //   value of the source field has changed), or
    

    Nitpick: sorry technically this sentence should read:

    // ... (i.e., this is a new entity or the value of the source field has changed) ...
    
  5. +++ b/core/modules/media/src/MediaStorage.php
    @@ -0,0 +1,166 @@
    +      $current_items = $media->get($source_field);
    

    If we don't want to move this method to the source interface, maybe we should use then
    MediaSourceInterface::getSourceFieldValue() instead here to get the source field value?

    That is a public method of the source interface, and I was under the assumption that it should be used as the "canonical" way of getting the source field value.

  6. +++ b/core/modules/media/src/MediaTypeForm.php
    @@ -207,13 +207,13 @@ public function form(array $form, FormStateInterface $form_state) {
    -        'queue_thumbnail_downloads' => $this->t('Queue thumbnail downloads'),
    +        'queue_thumbnail_downloads' => $this->t('Queue metadata downloads'),
    

    Now that we are not dealing with thumbnails only anymore, maybe a better label for this could be something like "Queue metadata fetching" ?

  7. +++ b/core/modules/media/src/MediaTypeForm.php
    @@ -207,13 +207,13 @@ public function form(array $form, FormStateInterface $form_state) {
    +    $form['workflow']['options']['queue_thumbnail_downloads']['#description'] = $this->t('Download metadata via a queue. When using remote media sources, downloading metadata could be a slow process. Using a queue allows for this process to be handled in the background.');
    

    Again, maybe "Fetch metadata via a queue (...)" ?

    (No strong feelings though)

seanb’s picture

Status: Needs work » Needs review
StatusFileSize
new50.16 KB
new32.58 KB

#34

A. Fixed. Added documentation to MediaSourceInterface::getMetadata().
B. Off course it could be done in this issue, but unavailable remote metadata is a separate issue imho.

1. Yes, we now always update metadata if the source changes. If the metadata did not update before when a source field changed, this is a bug right? Mapped metadata fields should not be manually filled by a user. Maybe we should even go as far and disable them in the media form? That would probably be a follow up as well.
2. Fixed.
3. Fixed. This was a little inconsitent in MediaInterface, so changed it for setCreatedTime() as well. The setName() method already used @return $this.
4. Fixed.
5. Fixed.
6. Fixed.
7. Fixed.

While doing nr 6 and 7, I noticed the queue was still named specifically ThumbnailDownloader. Also the flag on media types was named queue_thumbnail_downloads. I feel this is confusing and let's just rip the band-aid off.

I renamed the flag and deprecated thumbnailDownloadsAreQueued() and setQueueThumbnailDownloadsStatus() in favor of new metadataFetchingIsQueued() and setQueueMetadataFetchingStatus() methods.

I also renamed the QueueWorker plugin from ThumbnailDownloader to MetadataFetcher. I think we are allowed to do that. Maybe we have to keep the plugin ID the same though?

# Plugins
Particular plugin classes should not be considered part of the public API. References to plugin IDs and settings in default configuration can be relied upon however.

Hope this doesn't break the tests too bad.

phenaproxima’s picture

Maybe we have to keep the plugin ID the same though?

We can probably implement an alter hook so as to "alias" the old plugin ID -- hook_queue_info_alter() should do the trick.

Status: Needs review » Needs work

The last submitted patch, 35: 2878119-35.patch, failed testing. View results

seanb’s picture

Status: Needs work » Needs review
StatusFileSize
new51.22 KB
new2.59 KB

Added hook_queue_info_alter() for BC and fixed the tests (hopefully all of them).

Status: Needs review » Needs work

The last submitted patch, 38: 2878119-38.patch, failed testing. View results

seanb’s picture

Status: Needs work » Needs review
StatusFileSize
new51.22 KB
new7.28 KB

Another try to fix the tests.

marcoscano’s picture

Status: Needs review » Needs work

Thanks @seanB!

A) +1 for removing the band-aid and renaming the queue. But the same reasoning (i.e. Let's fix everything that makes the metadata handling inconsistent/weird/fragile/etc in this issue instead of only half-solving them) would apply, IMHO, to try and have at least a very basic approach to what to do when remote metadata is temporarily unavailable. I'd try to see how the solution for that looks like here, if we realize it's starting to get out of hand, then we can defer it to a follow-up.

B) About #34-1: I still think this is a big issue that we should try to solve here. As I said, making sense or not, the previous behavior is something that is out there since a year already, so people can certainly be relying on that. It's not that weird actually, it just means that the feature is: " When mapping metadata to entity fields, these will be populated when empty with metadata from the API. ", instead of " When mapping metadata to entity fields, these will be updated every time the source field value changes ", which is the behavior you are describing.

  1. +++ b/core/modules/media/media.module
    @@ -336,3 +336,15 @@ function media_preprocess_media_reference_help(&$variables) {
    +  // The queue worker plugin for downloading thumbnails was changed to update
    +  // all metadata, so to maintain backwards compatibility, ensure that
    +  // media_entity_thumbnail is an alias of media_metadata_fetcher.
    

    Should we have a @todo indicating that this should be removed in D9, or similar ? (not sure how this type of workaround to keep BC is documented)

  2. +++ b/core/modules/media/src/MediaStorage.php
    @@ -0,0 +1,166 @@
    +      $original_items = $original->get($source_field);
    +      $current_items = $media->get($source_field);
    

    Is it possible that we've lost the ::getSourceValueField() you had added in one of the previous patches?

    Also, ::getSourceValueField() will return the first value of the field, not a FieldItemInterface object. Because of that, I think:
    a) we should improve the documentation of ::getSourceFieldValue() to clearly indicate that. (Probably in a follow-up?)
    b) Use a standard value comparison, instead of ::equals()
    c) Check if we have test coverage for this. I have the feeling that the tests should have picked the wrong usage in a previous patch, not sure if they did.

  3. +++ b/core/modules/media/src/MediaTypeInterface.php
    @@ -33,23 +33,48 @@
    +   * Returns whether metadata is fetched via a queued.
    

    Looks like there's a typo here.

seanb’s picture

Status: Needs work » Needs review
StatusFileSize
new58.22 KB
new17.51 KB

#41

A) Just had a discussion about this with Marcos. Metadata fetching is the responsibility of the source. There could be a lot of different reasons why metadata was not fetched correctly, and not in all cases does it actually make sense to fetch it again. We could let sources throw an exception, but what do we do next? Maybe it's better that media sources determine what should happen. For example, a media source could add the media item to the queue for later processing. Another thing we discussed is that it would be nice if users had some way of triggering metadata fetching from the interface. If for some reason only the name changed in a remote document management system, the source value could be the same but there should probably be a way for a user to get the latest metadata and update it. Let's create a follow up for that.

B) Also discussed this with Marcos. He's right (off course) that we currently have the following sentence on the media type edit/create page that people could depend on: Information will only be mapped if the entity field is empty.. I personally think that we should always update metadata, at least when the source field changes, but if currently people are depending on source not updating filled in fields, then I guess we shouldn't change that. So we should probably make this configurable. We could leave the old behaviour for existing sites, but should probably change the default to always update metadata for new ones. I've added a checkbox to configure this per media type to the patch.

1. Fixed. There does not really seem to be a standard for it though.
2. Fixed.
3. Fixed.

Let's hope the new checkbox doesn't break too much.

Status: Needs review » Needs work

The last submitted patch, 42: 2878119-42.patch, failed testing. View results

seanb’s picture

Status: Needs work » Needs review
StatusFileSize
new58.23 KB

Ahh that needed a reroll.

phenaproxima’s picture

Status: Needs review » Needs work
Issue tags: -Needs reroll +Needs change record, +Needs update path tests

I really like this patch. It's so logical and it deeply un-breaks the crappy metadata updating code we originally landed in core. Beautiful work. I cannot wait to see this committed. That said, we have some (not horrible) work to do...

  1. +++ b/core/modules/media/config/schema/media.schema.yml
    @@ -30,9 +30,12 @@ media.type.*:
    +    always_update_mapped_metadata:
    +      type: boolean
    +      label: 'Whether mapped metadata should always be updated'
    

    I don't know if I like the naming here. How about we call this overwrite_mapped_metadata, and describe it as "Overwrite previously mapped metadata values"?

  2. +++ b/core/modules/media/media.install
    @@ -163,3 +163,17 @@ function media_update_8600() {
    +function media_update_8601() {
    +  /** @var \Drupal\media\MediaTypeInterface[] $media_types */
    +  $media_types = \Drupal::entityTypeManager()->getStorage('media_type')->loadMultiple();
    +  foreach ($media_types as $media_type) {
    +    $media_type->set('queue_metadata_fetching', $media_type->get('queue_thumbnail_downloads'))
    +      ->set('queue_thumbnail_downloads', NULL)
    +      ->set('always_update_mapped_metadata', FALSE)
    +      ->save();
    +  }
    +}
    

    The entity system is not guaranteed during an update hook, so we need to do this with the config factory. Also, we will need a test of this update path.

  3. +++ b/core/modules/media/media.module
    @@ -336,3 +336,16 @@ function media_preprocess_media_reference_help(&$variables) {
    +function media_queue_info_alter(&$queues) {
    

    $queues can be type hinted as an array.

  4. +++ b/core/modules/media/media.module
    @@ -336,3 +336,16 @@ function media_preprocess_media_reference_help(&$variables) {
    +  // The queue worker plugin for downloading thumbnails was changed to update
    +  // all metadata, so to maintain backwards compatibility, ensure that
    +  // media_entity_thumbnail is an alias of media_metadata_fetcher. This will be
    +  // dropped in Drupal 9.0.0.
    

    Let's add a @see to this issue, for reference, and a @deprecated so we can easily remove it.

  5. +++ b/core/modules/media/src/Entity/Media.php
    @@ -166,36 +165,62 @@ public function getSource() {
    +      // Set the media name as well, once this could involve retrieving remote
    

    Should "once" be "since"?

  6. +++ b/core/modules/media/src/Entity/Media.php
    @@ -166,36 +165,62 @@ public function getSource() {
    +    $always_update_mapped_metadata = $this->bundle->entity->mappedMetadataIsAlwaysUpdated();
    

    Let's rename $always_update_mapped_metadata to $should_overwrite, because that would more clearly describe what the flag represents.

  7. +++ b/core/modules/media/src/Entity/Media.php
    @@ -217,8 +242,13 @@ protected function updateThumbnail($from_queue = FALSE) {
    +    if (!$thumbnail_uri) {
    +      $default_thumbnail_filename = $this->getSource()->getPluginDefinition()['default_thumbnail_filename'];
    +      $thumbnail_uri = \Drupal::config('media.settings')->get('icon_base_uri') . '/' . $default_thumbnail_filename;
    +    }
    

    I don't think we should do this here; it convolutes the method. loadThumbnail() should always take a URI and always return a file entity, period. Let's move this default thumbnail URI stuff elsewhere.

  8. +++ b/core/modules/media/src/Entity/MediaType.php
    @@ -124,6 +125,13 @@ class MediaType extends ConfigEntityBundleBase implements MediaTypeInterface, En
    +  /**
    +   * Whether mapped metadata should always be updated.
    +   *
    +   * @var bool
    +   */
    +  protected $always_update_mapped_metadata = TRUE;
    

    This might call for a change record, since it's pretty much a reversal of what Media has been doing for two minor versions now.

  9. +++ b/core/modules/media/src/Entity/MediaType.php
    @@ -158,14 +166,28 @@ public function setDescription($description) {
    +  public function setQueueThumbnailDownloadsStatus($queue_metadata_fetching) {
    +    return $this->setQueueMetadataFetchingStatus();
    +  }
    +
    

    Are we deprecating this method? If so, we need to throw a deprecation notice and link to a change record.

  10. +++ b/core/modules/media/src/Entity/MediaType.php
    @@ -158,14 +166,28 @@ public function setDescription($description) {
    +  public function metadataFetchingIsQueued() {
    +    return $this->queue_metadata_fetching;
    +  }
    

    Should use $this->get('queue_metadata_fetching').

  11. +++ b/core/modules/media/src/Entity/MediaType.php
    @@ -158,14 +166,28 @@ public function setDescription($description) {
    +  public function setQueueMetadataFetchingStatus($queue_metadata_fetching) {
    +    return $this->set('queue_metadata_fetching', $queue_metadata_fetching);
    

    I find this method name kind of obscure. How about queueMetadataFetching()?

  12. +++ b/core/modules/media/src/MediaSourceInterface.php
    @@ -107,6 +107,17 @@ public function getMetadataAttributes();
    +   * there are 2 special metadata attributes every source should provide:
    

    s/should/must. We need to be clear about that here.

  13. +++ b/core/modules/media/src/MediaSourceInterface.php
    @@ -107,6 +107,17 @@ public function getMetadataAttributes();
    +   *   should not use any remote API calls. A source specific attribute should
    

    Correction: it *must* not use any remote API calls. There's no reason not to be emphatic about this.

  14. +++ b/core/modules/media/src/MediaSourceInterface.php
    @@ -178,13 +189,13 @@ public function prepareViewDisplay(MediaTypeInterface $type, EntityViewDisplayIn
    +   * Get the first value stored in the source field.
    

    Not technically in scope, but I think this makes things clearer so let us keep it :)

  15. +++ b/core/modules/media/src/MediaStorage.php
    @@ -0,0 +1,164 @@
    +   *   Metadata should be updated whenever the sourcefield is changed, or if
    

    Nit: "source field" is two words.

  16. +++ b/core/modules/media/src/MediaStorage.php
    @@ -0,0 +1,164 @@
    +      $original_value = $source->getSourceFieldValue($original);
    +      $current_value = $source->getSourceFieldValue($media);
    +
    +      if ($original_value !== $current_value) {
    +        return TRUE;
    +      }
    

    Why aren't we using equals() any more? I think that is a much safer way to determine equality than this is, since it keeps field type-specific logic properly encapsulated.

  17. +++ b/core/modules/media/src/MediaTypeForm.php
    @@ -207,13 +207,15 @@ public function form(array $form, FormStateInterface $form_state) {
    +    $form['workflow']['options']['queue_metadata_fetching']['#description'] = $this->t('Fetches metadata via a queue. When using remote media sources, downloading metadata could be a slow process. Using a queue allows for this process to be handled in the background.');
    

    I think we need to make it clear that "metadata" includes thumbnails. Otherwise, users will wonder why their new media items are displaying a generic thumbnail instead of the one they were expecting.

  18. +++ b/core/modules/media/src/MediaTypeInterface.php
    @@ -33,23 +33,48 @@
    +   * Returns whether metadata is fetched via a queue.
    

    "Metadata and thumbnails", to make it clear that we consider the thumbnail to be a piece of metadata.

  19. +++ b/core/modules/media/src/MediaTypeInterface.php
    @@ -33,23 +33,48 @@
    +   * @deprecated in Drupal 8.6.x, to be removed before Drupal 9.0.0. Use
    +   *   ::metadataFetchingIsQueued() instead.
        */
       public function thumbnailDownloadsAreQueued();
    

    We'll need to link to a change record.

  20. +++ b/core/modules/media/tests/src/Kernel/MediaSourceTest.php
    @@ -122,22 +122,38 @@ public function testMetadataMapping() {
    +    $this->assertEquals('Pinkeye', $media->get($field_name)->value, 'Metadata attribute was mapped to the field.');
    

    Let's change any assertEquals() calls we're touching here to assertSame(), while we're at it.

phenaproxima’s picture

Also, we need to improve the issue summary, since this scope has expanded.

marcoscano’s picture

Why aren't we using equals() any more? I think that is a much safer way to determine equality than this is, since it keeps field type-specific logic properly encapsulated.

MediaSourceInterface::getSourceFieldValue() returns the value of the first delta, not a FieldItemInterface object.

phenaproxima’s picture

MediaSourceInterface::getSourceFieldValue() returns the value of the first delta, not a FieldItemInterface object.

We don't need to use getSourceFieldValue(). How about we just get the source field name (which is easy to do), then call get() on the media item and the original version, then call equals()? That's what we were doing in an older version of the patch...

marcoscano’s picture

We don't need to use getSourceFieldValue(). How about we just get the source field name (which is easy to do), then call get() on the media item and the original version, then call equals()? That's what we were doing in an older version of the patch...

I know, and I specifically bothered Sean until he agreed recommended that we changed it to ::getSourceFieldValue() because that's the "canonical way of getting a source field value™" of a media entity... That's a public method on the source interface, and we are using it in multiple other places of the code. It would be inconsistent (and in my opinion invalidate the separation of concerns) if we get the values directly. Sources know about their assets, they should do whatever is necessary inside ::getSourceFieldValue() and everyone should, IMHO, use that. If we use it in some places, and in others we get direct values from the entity fields, we lose part of the benefit of having the method in the first place...

phenaproxima’s picture

Issue tags: +Needs followup

Since it's a method of MediaSourceInterface, I guess that makes sense. But I think this could potentially backfire on us later, in circumstances where the === comparison we're doing is not going to be a sufficient determinant of whether the value has changed. For example, if the source field value is a formatted text field, then changing the format could constitute a significant enough change that an update is required, even if the primary value has not changed.

Having said that...maybe that's an edge case and I'm being overly paranoid. And even if someone does encounter it, it is workaround-able by ninjas. So for now, let's stick with what we have in this patch, and open a follow-up to talk more about whether we need a more robust way to determine if a change has occurred.

marcoscano’s picture

@phenaproxima thanks for reviewing!

1- Fixed
2- Pending
3- Fixed
4- Fixed
5- Fixed
6- Fixed
7- Fixed
8- Pending
9- Pending
10- Fixed
11- Fixed
12- Fixed
13- Fixed
14- 👍
15- Fixed
16- Won't fix (as per #50)
17- Fixed
18- Fixed
19- Pending
20- How about a follow-up to change all of them at once? I find extremely confusing to have some asserts using one method and others a different one. It will make future readers scratch their heads for a moment unnecessarily.

Didn't look at the tests yet.

Opened follow-ups:
- #42-A: #2983456: Expose triggering update of media metadata + thumbnail to end users
- #50: #2983457: Investigate if a more robust way of detecting media source field changes is necessary
- #45.20: #2983458: Use assertSame() instead of assertEquals() in Media tests

marcoscano’s picture

StatusFileSize
new59.19 KB
new1.65 KB

Trying to fix some tests.

--- a/core/modules/media/src/MediaSourceBase.php
+++ b/core/modules/media/src/MediaSourceBase.php
@@ -332,7 +332,7 @@ public function getSourceFieldValue(MediaInterface $media) {
 
     /** @var \Drupal\Core\Field\FieldItemInterface $field_item */
     $field_item = $media->get($source_field)->first();
-    return $field_item->{$field_item->mainPropertyName()};
+    return $field_item ? $field_item->{$field_item->mainPropertyName()} : NULL;
   }

It turns out we have several tests that manipulate "broken" media assets (i.e. without a source field on the entity). Now that we call ::getSourceFieldValue() as part of the saving process, they all broke. I believe it is legitimate to fix this here, which is where we are introducing the change. The idea here is that if the entity has a source field configured in the source config, but that field is not available, we just return NULL.

Thoughts?

marcoscano’s picture

StatusFileSize
new59.39 KB
new1.03 KB
+++ b/core/modules/media/src/MediaSourceInterface.php
@@ -178,13 +189,13 @@ public function prepareViewDisplay(MediaTypeInterface $type, EntityViewDisplayIn
   /**
-   * Get the primary value stored in the source field.
+   * Get the first value stored in the source field.
    *
    * @param MediaInterface $media
    *   A media item.
    *
    * @return mixed
-   *   The source value.
+   *   The first value of the source field.
    *
    * @throws \RuntimeException
    *   If the source field for the media source is not defined.
diff --git a/core/modules/media/src/MediaStorage.php b/core/modules/media/src/MediaStorage.php

Now that I think more about this, I believe we should do it differently.

This is a method that source plugins can (and should) override when appropriate, so we should not limit this to the fist delta, which is an implementation detail for the types we ship in core, but not necessarily true for all sources.

I suggest then we leave this more generic with something such as:

public function prepareFormDisplay(MediaTypeInterface $type, EntityFormDisplayInterface $display);
 
   /**
-   * Get the first value stored in the source field.
+   * Get the media asset's source value.
    *
    * @param MediaInterface $media
    *   A media item.
    *
    * @return mixed
-   *   The first value of the source field.
+   *   The value that uniquely represents a given media asset. This usually
+   *   means a scalar value present in the first delta of the source field, but
+   *   source plugins are free to have their own logic to determine what this
+   *   value should be.
    *
    * @throws \RuntimeException
    *   If the source field for the media source is not defined.

Which is what this patch implements. I'd be cool to defer this to a follow-up though, if we believe this needs more thorough discussion.

marcoscano’s picture

StatusFileSize
new59.41 KB
new1.37 KB

More test-fixing

marcoscano’s picture

StatusFileSize
new62.55 KB
new4.14 KB

Of course having Umami media types committed in the meantime messed up with us as well :P

Another try.

seanb’s picture

Status: Needs work » Needs review
StatusFileSize
new57.49 KB
new18.24 KB

Here is a first stab at a change record: https://www.drupal.org/node/2983661

#45

2. Fixed.
8. See change record.
9. Added a link to the change record.
19. Added a link to the change record.

Also changed some minor things from #52 and #53 and added the new 'overwrite_mapped_metadata' to all media type config in the standard profile and Umami.

marcoscano’s picture

Soooo.... @seanB and I just realized that this code comment and this UI text are both wrong, because this code is indeed always updating the metadata when the source field changed, so the non-override thing was only happening when you save the entity without changing the source field value. Which we are not changing in this patch, so the whole checkbox to make overriding configurable is useless.

(╯°□°)╯︵ ┻━┻

seanb’s picture

StatusFileSize
new52.51 KB
new18.93 KB

New patch.

The last submitted patch, 56: 2878119-56.patch, failed testing. View results

phenaproxima’s picture

Status: Needs review » Needs work

Looking really nice!

  1. +++ b/core/modules/media/media.install
    @@ -163,3 +163,16 @@ function media_update_8600() {
    +      ->set('queue_thumbnail_downloads', NULL)
    

    I think we want to use clear() here, not set().

  2. +++ b/core/modules/media/media.module
    @@ -336,3 +336,18 @@ function media_preprocess_media_reference_help(&$variables) {
    +  // @deprecated
    +  if (isset($queues['media_metadata_fetcher'])) {
    +    $queues['media_entity_thumbnail'] = $queues['media_metadata_fetcher'];
    +  }
    

    Let's put the @deprecated in the function doc comment, and link to the change record under that.

  3. +++ b/core/modules/media/src/Entity/Media.php
    @@ -239,115 +260,27 @@ protected function loadThumbnail($thumbnail_uri = NULL) {
    +        $default_thumbnail_filename = $this->getSource()->getPluginDefinition()['default_thumbnail_filename'];
    +        $thumbnail_uri = \Drupal::config('media.settings')->get('icon_base_uri') . '/' . $default_thumbnail_filename;
    

    I'd ideally like to move this into the source plugin, but we can do that in a follow-up.

  4. +++ b/core/modules/media/src/Entity/MediaType.php
    @@ -158,14 +158,28 @@ public function setDescription($description) {
    +  public function setQueueThumbnailDownloadsStatus($queue_metadata_fetching) {
    +    return $this->queueMetadataFetching($queue_metadata_fetching);
    +  }
    

    We need to throw an E_USER_DEPRECATED error here.

  5. +++ b/core/modules/media/src/MediaSourceInterface.php
    @@ -107,6 +107,17 @@ public function getMetadataAttributes();
    +   * - default_name
    +   *   The default name generated for a media item. The generated default name
    +   *   must not use any remote API calls. A source specific attribute should
    +   *   be used to fetch a name for the media item from a remote API.
    +   * - thumbnail_uri
    +   *   The URI of the thumbnail file for the media item, e.g.
    +   *   public://directory/thumbnail.jpg.
    

    Formatting is wrong here: it should be something like - default_name: (explanation here)

  6. +++ b/core/modules/media/src/MediaTypeInterface.php
    @@ -33,23 +33,52 @@
    +   * @param bool $queue_metadata_fetching
    +   *   The queue metadata fetching flag.
    

    We can improve this parameter description. What does it mean for metadata fetching to be queued? This is a good place for us to explain that.

  7. +++ b/core/modules/media/tests/src/Functional/Update/MediaUpdateTest.php
    @@ -74,4 +74,31 @@ public function testOEmbedConfig() {
    +    $config_factory = $this->container->get('config.factory');
    +    $media_type_config_names = $config_factory->listAll('media.type.');
    +
    +    $queue_statuses = [];
    +    foreach ($media_type_config_names as $media_type_config_name) {
    +      $queue_statuses[$media_type_config_name] = $config_factory->get($media_type_config_name)->get('queue_thumbnail_downloads');
    +    }
    

    We should assert that $queue_statuses contains at least one TRUE value, so as to be sure our update test is actually testing something ($this->assertContains(TRUE, $queue_statues))

  8. +++ b/core/modules/media/tests/src/Functional/Update/MediaUpdateTest.php
    @@ -74,4 +74,31 @@ public function testOEmbedConfig() {
    +      $media_type_config = $config_factory->getEditable($media_type_config_name);
    

    There's no need for $config_factory, just use $this->config() to retrieve editable configuration.

  9. +++ b/core/modules/media/tests/src/Functional/Update/MediaUpdateTest.php
    @@ -74,4 +74,31 @@ public function testOEmbedConfig() {
    +      $media_type_config->set('queue_metadata_fetching', $media_type_config->get('queue_thumbnail_downloads'))
    +        ->set('queue_thumbnail_downloads', NULL)
    +        ->save();
    

    Why are we doing this in the test? Surely we just want to read back the updated configuration and assert things?

Regarding #57, I discussed briefly with @marcoscano on Slack. I'm really glad we discovered this inconsistency now and are removing the incorrect text/comments in this patch. Landing this patch will really untangle the hideous logic surrounding thumbnail/metadata updates, so with that in mind I suggest we open a follow-up issue to make overwriting configurable and address that in a dedicated issue/patch. Changing it now is out of scope.

Thanks for your continued excellent work, @marcoscano and @seanB!

deviantintegral’s picture

I think there's some UX implications of queuing all metadata and not just thumbnails. For example, here is what an end user would see after embedding a YouTube video, before the queue has run:

Default view of a queued media item

  1. The created status message doesn't tell the user that metadata will be processed later. To an editor, it looks like the media item is just broken.
  2. The default title is the entity type / bundle / uuid. Let's say 500 items are bulk-imported. Until the queue is processed, there's no easy way to find a specific item from the content view.
  3. By default, the media type set to publish new content. That means that search engines and end users may be exposed to the unprocessed media item. Since there's no core functionality to publish after fetching the metadata, how are sites supposed to handle this state?
  4. Rendering the media item may require metadata that's missing. For example, perhaps the remote video source shouldn't render a YouTube video until it has the dimensions of the video.

Also, it looks like the name of the media item is not updated when processing the queue.

I'd suggest:

  1. When metadata is queued, we should indicate to the user that's happening.
  2. When a new media item is created, and metadata is queued, we should set the status to unpublished. When the queue item is processed, we should publish the media item if "published" is set in the media type. We'd probably need to include that setting in the queue item data to handle changing the media type while the queue is still being processed so it reflects the state of when the queue item was created.
  3. Some method or flag on Media entities to indicate that metadata is entirely missing and hasn't been fetched yet. This would be flipped when updateMetadata() finishes.

Another thing we discussed is that it would be nice if users had some way of triggering metadata fetching from the interface.

This is actually something being asked for on a project I'm on. It would be especially useful when editors override video metadata. For example, a video title and summary is wrong in an upstream source. It's (temporarily) overridden in Drupal, and some way to say "please revert this back to the source metadata" would be very nice. I filed #2983724: Add a user interface to fetch and revert mapped metadata for this.

B. Off course it could be done in this issue, but unavailable remote metadata is a separate issue imho.

Filed #2983728: Handle metadata errors gracefully.

marcoscano’s picture

#61 brings some valid points, thanks @deviantintegral !

Some notes:

Also, it looks like the name of the media item is not updated when processing the queue.

This is actually a bug with the "Remote video" type shipped in D8.6, where the title metadata is not mapped to the name basefield. It is being fixed in #2983325: Use title -> name mapping for the "Remote video" type.

Also, we briefly discussed yesterday too the issue with the iframe dimensions and scroll bars. It turns out that the oembed request we perform will respect the formatter settings defined for that field. By default these settings are null (no values for width / height), so the iFrame will be built with whatever dimensions the oEmbed provider decided to send back as default. Unfortunately it seems that Youtube sends a very unreasonable 480px-width iframe... :/ Not sure if it happens with all videos, or if we should hardcode more reasonable default values for the formatter, but this should probably be better discussed in another issue I guess.

Having that said, I believe the other points are very accurate and we should probably try to solve them here too.

  • +1 for clearly indicating in the confirmation message when the metadata is queued, and that the media item is "not yet complete"
  • I am a bit unsure about the idea of always unpublishing the media item before the queue is processed. It clearly makes sense in the case of remote videos where the default name is very unfriendly, but I wonder if that would be generalizable enough. Maybe having another setting on the media type such as "Unpublish the media item while waiting in the queue" (states-enabled by the "Use queue" checkbox), would be a good compromise? In any case, I agree that having an "unprocessed" or "half-ready" media item exposed to users and search engines for an undetermined amount of time is a bad idea that we should solve here.
  • Some method or flag on Media entities to indicate that metadata is entirely missing and hasn't been fetched yet. This would be flipped when updateMetadata() finishes.

    If the queue worker is responsible for publishing back the item when metadata is updated, I'm not sure other scenarios where this would be useful. Do you have something elese specific in mind?

About triggering manually the metadata update, we started to look into that in #2983456: Expose triggering update of media metadata + thumbnail to end users. Not entirely sure if this would make #2983724: Add a user interface to fetch and revert mapped metadata a duplicate, once you mention the ability to also revert the already mapped metadata. However, with the current patch, I'm not sure if such scenario would be necessary. Basically, the metadata will be updated (and mapped fields overriden) every time the source field changes. However, if the source field doesn't change, users can simply edit the item and manually change the field values, saving the entity again. No metadata update would be triggered in this second save operation. Would that be enough for the use case you had in mind?

Thanks!

phenaproxima’s picture

Thanks for the very excellent points you raised, @deviantintegral! I agree with all of them 100%. However, I'm not sure this is the issue in which to solve them. They are all very important UX problems, but I think this issue needs to stay tightly focused on the problem at hand, which is correcting the poor design of the underlying API for metadata retrieval. Until we do that, very little meaningful progress can be made, and we are sitting with a major design flaw that can potentially compromise the operation of sites that have a lot of media items (especially ones that use oEmbed). Since there is UX fallout here, I think we can and should fix the lowest-hanging fruit now (i.e., setting a message when saving a media item), but the other points should be handled in high priority follow-ups.

I am completely in favor of opening those issues now and trying to land them as soon as possible (i.e., let's shoot for 8.6!), but in this issue, I'd prefer to deal only with the API-level problems.

marcoscano’s picture

StatusFileSize
new263.75 KB
new263.38 KB

However, I'm not sure this is the issue in which to solve them. They are all very important UX problems, but I think this issue needs to stay tightly focused on the problem at hand, which is correcting the poor design of the underlying API for metadata retrieval.

However, the "broken" name issue for queued remote videos could be seen as a regression we are introducing with this patch.

Testing scenario 1:

- Drupal 8.6.x HEAD
- Enable Media
- Go to the "/admin/structure/media/manage/remote_video", map the "Resource title" to the "Name" field, and check "Queue thumbnails download"
- Create a media item by pasting a youtube URL
- This is what you get:

Testing scenario 2:

- Drupal 8.6.x HEAD with patch #58 applied
- Enable Media
- Go to the "/admin/structure/media/manage/remote_video", map the "Resource title" to the "Name" field, and check "Queue metadata fetching"
- Create a media item by pasting a youtube URL
- This is what you get:

IMHO it's hard to solve this without assuming that the queued items are not "yet ready for public consumption"...

marcoscano’s picture

This addresses the feedback from #60, thanks for reviewing!

Leaving in Needs Work because I believe we should at least fix:
- Improve the confirmation message when the item is queued
- Implement a new checkbox (selected by default) allowing sites to configure whether the queued item should be created unpublished, and publish it once the worker has successfully updated the metadata.

marcoscano’s picture

StatusFileSize
new52.72 KB
new5.47 KB

Ups, forgot the patch.

marcoscano’s picture

We discussed this with @daniel.bosen, @chr.fritsch, @seanB, @Gabor Hotjsy at Drupal Dev Days. The consensus about the above mentioned regression and how to handle it appears to be:

For newly created media items, it is indeed bad UX to have a "broken" item published for an undetermined time. For existing entities, however, if you just update the source field, it would probably be bad to unpublish something that is already available on the site. Those entities would have (for example) a title that might not reflect the new video, but it's not a "broken" default value for the title, so we probably shouldn't force it to be unpublished.

So we think the best compromise is:

- When a user is creating a new media item for the first time (and it's queued), we will unpublish it until the item is processed by the queue worker.
- We will also store in the queue item the status that the user-defined when saving the form
- The queue worker, after updating the metadata, will restore the item's status to the status the user selected when saving the form
- We will not change the publishing status of existing entities when processing them in the queue
- We will adjust the confirmation message to reflect all these scenarios, so people know what's happening.

Gábor also pointed out that it would be weird if the user didn't have visibility over the media item that they just created. So that indicates us that fixing #2889855: Unpublished media entity can not be accessed by owner and update any media/delete any media access possibly cached by user is probably a must for the above plan to make sense.

phenaproxima’s picture

I think that plan sounds good. +1 from me.

seanb’s picture

Status: Needs work » Needs review
StatusFileSize
new57.78 KB
new7.18 KB

Here is a patch implementing improved media save messages and unpublishing new queued items. Still have to fix the test fail in #66 by adding one or more media types with queueing enabled to the fixture.

Status: Needs review » Needs work

The last submitted patch, 69: 2878119-69.patch, failed testing. View results
- codesniffer_fixes.patch Interdiff of automated coding standards fixes only.

seanb’s picture

Status: Needs work » Needs review
StatusFileSize
new61.25 KB
new3.3 KB

Rerolled after the latest changes and tried to fix the tests.

deviantintegral’s picture

Status: Needs review » Needs work

Did some basic tests by hand and so far this is working as advertised. A few code notes below.

  1. +++ b/core/modules/media/media.module
    @@ -336,3 +336,20 @@ function media_preprocess_media_reference_help(&$variables) {
    +  // The queue worker plugin for downloading thumbnails was changed to update
    

    This is great!

  2. +++ b/core/modules/media/src/Entity/Media.php
    @@ -97,9 +97,8 @@ public function getName() {
    +    return $name->value;
    

    +1

  3. +++ b/core/modules/media/src/MediaForm.php
    @@ -63,15 +63,28 @@ public function save(array $form, FormStateInterface $form_state) {
    +    // If the metadata fetching is queued, show a link to do it immediatelly.
    

    I don't see a status message for this anymore. And comment typo :D

  4. +++ b/core/modules/media/src/MediaForm.php
    @@ -63,15 +63,28 @@ public function save(array $form, FormStateInterface $form_state) {
    +        $saved_message .= ' The media item has been configured to retrieve additional information in the background. Some information shown on the page might temporarily not be correct. The item is unpublished and will be published automatically when the additional information is retrieved.';
    

    This is a pretty long message and I wonder if it would be a bit overwhelming. How about something like:

    "The @media_type has been saved will be published once all @media_source_label information has been retrieved. <a>Download media information now</a>."

    _However_ as a site builder, if I've queued processing it's probably because I don't want to worry about PHP timeouts. As an editor, what I'd probably want is some way to say "this is a really important asset, so please put it at the top of the queue". Perhaps we leave out a link entirely for now.

  5. +++ b/core/modules/media/src/MediaForm.php
    @@ -63,15 +63,28 @@ public function save(array $form, FormStateInterface $form_state) {
    +    $this->messenger()->addStatus($this->t($saved_message, $t_args));
    

    Doesn't $saved_message have to be inline with the t() method call to be picked up by translation tools? Or is that obsolete now?

  6. +++ b/core/modules/media/src/MediaSourceInterface.php
    @@ -107,6 +107,16 @@ public function getMetadataAttributes();
    +   * there are 2 special metadata attributes every source must provide:
    

    Should we document that thumbnail_alt is not required but highly recommended?

  7. +++ b/core/modules/media/src/MediaSourceInterface.php
    @@ -178,13 +188,16 @@ public function prepareViewDisplay(MediaTypeInterface $type, EntityViewDisplayIn
    +   *   The value that uniquely represents a given media item. This usually means
    

    I could see developers thinking "I just need to return $media->id() here". Should we clarify that it represents the media item in the "remote" system? Or just simply "... represents a given media item. This is not the media entity ID, as the same media item may be referenced by multiple media entities. Typically this is a remote URL of some kind."

  8. +++ b/core/modules/media/src/MediaStorage.php
    @@ -0,0 +1,171 @@
    + * Defines the storage handler class for media.
    

    Should we summarize why we override the base class - something like "Storage handler for media that handles queued metadata fetching."

  9. +++ b/core/modules/media/src/MediaStorage.php
    @@ -0,0 +1,171 @@
    +   * Determines if the source field value of a media item has changed.
    

    This method feels like it belongs on the media entity itself, given that's the "subject" of the method.

    Should this be "media entity" instead of "media item"?

seanb’s picture

StatusFileSize
new61.53 KB
new3.87 KB

Addressed some of the comments in #72

3. Not sure what you mean, the status message is updated in the patch? Fixed the typo.

+++ b/core/modules/media/src/MediaForm.php
@@ -63,15 +63,28 @@ public function save(array $form, FormStateInterface $form_state) {
+    $saved_message = '';
...
+      $saved_message = '@type %label has been created.';
...
+      $saved_message = '@type %label has been updated.';
...
+    // If the metadata fetching is queued, show a link to do it immediatelly.
+    if ($this->entity->bundle->entity->metadataFetchingIsQueued()) {
+      if ($saved === SAVED_NEW) {
+        $saved_message .= ' The media item has been configured to retrieve additional information in the background. Some information shown on the page might temporarily not be correct. The item is unpublished and will be published automatically when the additional information is retrieved.';
+      }
+      else {
+        $saved_message .= ' The media item has been configured to retrieve additional information in the background. Some information shown on the page might temporarily not be correct.';
+      }
+    }
+
+    $this->messenger()->addStatus($this->t($saved_message, $t_args));

4. I feel the proposed message will probably lead to confusion. I agree a short message is better, but I think explaining the metadata fetching is queued for later processing and providing a way to fix that is necessary. If we want to fix that, we could allow editors to override queueing while create a media item, but I think that is out of scope for this issue. I left this for now.

5. Not exactly sure what you mean with "have to be inline with the t() method"?

6. Fixed. Added some documentation.

7. Fixed. Removed the word "uniquely". Most media types are currently local files, so I don't think we can assume this typically is a remote URL.

8. Fixed. Added a comment to explain the storage is overriden to handle metadata fetching outside of the database transaction.

9. When adding media to core it was decided to use media item instead of media entity. Also left this for now. I still see some occurences of media entity in some places, we should probably fix that in a follow up.

seanb’s picture

Status: Needs work » Needs review

Changing status to get some more feedback.

The last submitted patch, 53: 2878119-53.patch, failed testing. View results

phenaproxima’s picture

Status: Needs review » Needs work
Issue tags: -Needs update path tests

I love this patch! It is so much better than what we had before and fixes so many problems, it's just so many orders of magnitude of improvement that I feel like I need a cold shower after reading it. Great job!!

  1. +++ b/core/modules/media/media.module
    @@ -336,3 +336,20 @@ function media_preprocess_media_reference_help(&$variables) {
    +/**
    + * Implements hook_queue_info_alter().
    + *
    + * @deprecated
    + * @see https://www.drupal.org/node/2983661
    + */
    

    The @deprecated needs to explain the nature of the deprecation as per policy (referencing the existing change record), and there should be a blank line before the @see.

  2. +++ b/core/modules/media/media.module
    @@ -336,3 +336,20 @@ function media_preprocess_media_reference_help(&$variables) {
    +  if (isset($queues['media_metadata_fetcher'])) {
    +    $queues['media_entity_thumbnail'] = $queues['media_metadata_fetcher'];
    +  }
    

    $queues['media_metadata_fetcher'] is guaranteed to exist, so I think we could drop the if check. Additionally, we should have $queues['media_entity_thumbnail'] be a reference to $queues['media_metadata_fetcher'], so $queues['media_entity_thumbnail'] = &$queues['media_metadata_fetcher'].

  3. +++ b/core/modules/media/src/Entity/Media.php
    @@ -166,36 +165,59 @@ public function getSource() {
    +    foreach ($field_map as $attribute_name => $field_name) {
    +      $this->set($field_name, $source->getMetadata($this, $attribute_name));
    +    }
    

    We're overwriting the mapped field every time we update metadata. Isn't this a BC break?

  4. +++ b/core/modules/media/src/Entity/Media.php
    @@ -239,115 +260,27 @@ protected function loadThumbnail($thumbnail_uri = NULL) {
    +        $translation->thumbnail->alt = $this->t('Thumbnail', [], ['langcode' => $translation->langcode->value]);
    

    'langcode' should use $translation->language()->getId(), to keep the abstraction sealed.

  5. +++ b/core/modules/media/src/Entity/Media.php
    @@ -373,55 +306,6 @@ public function preSaveRevision(EntityStorageInterface $storage, \stdClass $reco
    -  /**
    -   * {@inheritdoc}
    -   */
    -  public function save() {
    

    Thank Cthulhu this method is going away! That's a load off.

  6. +++ b/core/modules/media/src/Entity/MediaType.php
    @@ -158,14 +158,29 @@ public function setDescription($description) {
    +  public function setQueueThumbnailDownloadsStatus($queue_metadata_fetching) {
    

    Why did the name of the parameter change? We can leave it as-is.

  7. +++ b/core/modules/media/src/Entity/MediaType.php
    @@ -158,14 +158,29 @@ public function setDescription($description) {
    +  public function queueMetadataFetching($queue_metadata_fetching) {
    

    I feel like $queue_metadata_fetching should default to TRUE for convenience. Can we update the interface and this method?

  8. +++ b/core/modules/media/src/MediaForm.php
    @@ -63,15 +63,28 @@ public function save(array $form, FormStateInterface $form_state) {
    +    // If the metadata fetching is queued, show a link to do it immediately.
    

    This patch doesn't implement a link. Let's add a @todo which references the related issue.

  9. +++ b/core/modules/media/src/MediaForm.php
    @@ -63,15 +63,28 @@ public function save(array $form, FormStateInterface $form_state) {
    +        $saved_message .= ' The media item has been configured to retrieve additional information in the background. Some information shown on the page might temporarily not be correct. The item is unpublished and will be published automatically when the additional information is retrieved.';
    

    "...might be temporarily incorrect". Also, what if the item has been saved unpublished? We should only tell them the item will be published later if they have actually saved it as published, no?

  10. +++ b/core/modules/media/src/MediaSourceBase.php
    @@ -330,6 +330,10 @@ public function getSourceFieldValue(MediaInterface $media) {
    +    if ($media->get($source_field)->isEmpty()) {
    +      return NULL;
    +    }
    

    Is this necessary? If so, can we explain why it is?

  11. +++ b/core/modules/media/src/MediaSourceInterface.php
    @@ -107,6 +107,20 @@ public function getMetadataAttributes();
    +   * defined for the source. Besides the source specific metadata attributes,
    

    Should be "source-specific".

  12. +++ b/core/modules/media/src/MediaSourceInterface.php
    @@ -107,6 +107,20 @@ public function getMetadataAttributes();
    +   *   default name must not use any remote API calls. A source specific
    

    Same here.

  13. +++ b/core/modules/media/src/MediaSourceInterface.php
    @@ -107,6 +107,20 @@ public function getMetadataAttributes();
    +   * If available, it is highly recommended to also implement a metadata
    

    "available" --> "possible"

  14. +++ b/core/modules/media/src/MediaSourceInterface.php
    @@ -107,6 +107,20 @@ public function getMetadataAttributes();
    +   * attribute for the alt text by setting the thumbnail_alt_metadata_attribute
    

    "...for the thumbnail alt text".

  15. +++ b/core/modules/media/src/MediaSourceInterface.php
    @@ -178,13 +192,15 @@ public function prepareViewDisplay(MediaTypeInterface $type, EntityViewDisplayIn
    +   *   The value that represents a given media item. This usually means a scalar
    +   *   value present in the first delta of the source field, but source plugins
    +   *   are free to have their own logic to determine what this value should be.
    

    Nice explanation!! Love it.

  16. +++ b/core/modules/media/src/MediaStorage.php
    @@ -0,0 +1,174 @@
    +    $queue_data = [];
    

    Let's call this $queue_item, since that's consistent with createItem().

  17. +++ b/core/modules/media/src/MediaStorage.php
    @@ -0,0 +1,174 @@
    +    if ($should_queue && $media->isNew()) {
    +      $queue_data['original_status'] = $media->isPublished();
    +      $media->setUnpublished();
    +    }
    

    This could use a comment.

  18. +++ b/core/modules/media/src/MediaStorage.php
    @@ -0,0 +1,174 @@
    +    $original = isset($media->original)
    +      ? $media->original
    +      : $this->loadUnchanged($media->id());
    

    Nit, but I've seen patches be kicked back for this even though I was the one who originally wrote it this way: let's put this all on one line.

  19. +++ b/core/modules/media/src/MediaStorage.php
    @@ -0,0 +1,174 @@
    +      if (!$original->hasTranslation($langcode)) {
    +        return TRUE;
    +      }
    

    This could benefit from a comment.

  20. +++ b/core/modules/media/src/MediaTypeInterface.php
    @@ -33,23 +33,54 @@
    +   *   metadata should be fetched directly.
    

    "directly" --> "immediately on save"

  21. +++ b/core/modules/media/src/MediaTypeInterface.php
    @@ -33,23 +33,54 @@
    +   * Sets a flag to indicate that metadata should be fetched via a queue.
    

    This should mention that the thumbnail is also considered metadata.

  22. +++ b/core/modules/media/src/MediaTypeInterface.php
    @@ -33,23 +33,54 @@
    +   * @param bool $queue_metadata_fetching
    +   *   The queue metadata fetching flag.
    

    Let's change the description to "Whether the thumbnail and metadata should be fetched via a queue."

  23. +++ b/core/modules/media/src/MediaTypeInterface.php
    @@ -33,23 +33,54 @@
    +   *   TRUE if metadata is fetched via a queue to download later, FALSE if the
    

    Can we remove "to download later"?

  24. +++ b/core/modules/media/src/MediaTypeInterface.php
    @@ -33,23 +33,54 @@
    +   *   metadata should be fetched directly.
    

    "directly" --> "immediately on save".

  25. +++ b/core/modules/media/src/Plugin/QueueWorker/MetadataFetcher.php
    @@ -58,10 +58,17 @@ public static function create(ContainerInterface $container, array $configuratio
    +    if ($media = $this->storage->load($data['id'])) {
    +      $media->updateMetadata();
    +      // When metadata fetching for a new media item is queued, the media item
    +      // is unpublished by default. When the media item was originally intended
    +      // to be published, we need to make sure the media item is automatically
    +      // published after the metadata is fetched.
    +      if (!empty($data['original_status'])) {
    +        $media->setPublished();
    +      }
    +      $this->storage->save($media, TRUE);
    

    Why are we calling updateMetadata()? The storage handler calls that anyway if TRUE is passed to save().

  26. +++ b/core/modules/media/tests/src/Functional/Update/MediaUpdateTest.php
    @@ -75,4 +75,29 @@ public function testOEmbedConfig() {
    +      $queue_statuses[$media_type_config_name] = $config_factory->get($media_type_config_name)->get('queue_thumbnail_downloads');
    

    I think we can use $this->config($media_type_config_name)->get() here.

marcoscano’s picture

For the record, posting here a summary of a recent discussion @phenaproxima and I had on slack about BC:

In #34.1, I pointed out a possible BC break being introduced, because I was under the assumption that mapped fields were only populated/updated when empty. It turns out I was wrong (as indicated in #57), so basically we are not changing the previous behavior.

  • On first save, all mapped fields are empty, so they will be populated
  • On entity update:
    • If the source field has changed, the mapped fields will be updated, regardless if they are empty or not.
    • If the source field has not changed, the new patch will not call ::updateMetadata(), so no mapped fields will be overwritten

In fact, after this patch we will actually improve the consistency, once the texts that lead to confusion are part of the code that is being removed.

deviantintegral’s picture

This is getting close!

3. Not sure what you mean, the status message is updated in the patch? Fixed the typo.

I meant the link in this message, which was pointed out in #76.

5. Not exactly sure what you mean with "have to be inline with the t() method"?

You can't use variables inside of t() unless we previously know the string has been translated. See Variables in user interface text.

When adding media to core it was decided to use media item instead of media entity.

I missed that, and have been using "item" all through out a module to refer to the remote object specifically. Thanks for pointing that out!

If the source field has not changed, the new patch will not call ::updateMetadata(), so no mapped fields will be overwritten

Ok, so then it will be the responsibility of media API clients specifically call updateMetadata()? For example, if a webhook notifies a Drupal site about an update. I think that makes sense. If a module wanted to offer editorial overrides of metadata, and preserve them through an updateMetadata call, I guess they could just check for $media->original in getMetadata()?

Version: 8.6.x-dev » 8.7.x-dev

Drupal 8.6.0-alpha1 will be released the week of July 16, 2018, which means new developments and disruptive changes should now be targeted against the 8.7.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

seanb’s picture

Status: Needs work » Needs review
StatusFileSize
new62.73 KB
new14.43 KB

I think this should address all feedback.

#76

1. Fixed
2. Fixed
3. See #77. It looked that way at first, but it actually isn't.
4. Fixed
5. Yay!
6. Fixed
7. I'm not sure about this. Most media types should not have to be queued, only external ones. We could discuss this in a follow up, but I think queueing should be the exception for specific cases, not the default.
8. Fixed
9. Fixed
10. Fixed
11. Fixed
12. Fixed
13. Fixed
14. Fixed
15. Yay!
16. Fixed
17. Fixed
18. Fixed
19. Fixed
20. Fixed
21. Fixed, changed the text a bit to make sure it stays under 80 chars.
22. Fixed
23. Fixed
24. Fixed
25. Fixed
26. Fixed

#78

You can't use variables inside of t() unless we previously know the string has been translated. See Variables in user interface text.

Thanks! Changed the way the translation texts are built to allow the translation to be picked up while at the same time allowing users to translate the full message.

If a module wanted to offer editorial overrides of metadata, and preserve them through an updateMetadata call, I guess they could just check for $media->original in getMetadata()?

Yeah, updating mapped metadata is a difficult issue. Maybe you want to update it when changing the source field (for example the title of a youtube video), maybe not (for example the width of the video)? But I guess sources can implement their own logic in getMetadata() and other modules can also revert updated metadata during preSave().

phenaproxima’s picture

Looks great!

  1. +++ b/core/modules/media/media.module
    @@ -340,7 +340,9 @@ function media_preprocess_media_reference_help(&$variables) {
    + * @deprecated in Drupal 8.6.x, will be removed before Drupal 9.0. Use
    + *   \Drupal\media\Plugin\QueueWorker\MetadataFetcher instead.
    

    Let's reword this a bit: "@deprecated in Drupal 8.6.0 and will be removed before Drupal 9. Use the 'media_metadata_fetcher' queue worker instead of 'media_entity_thumbnail'."

  2. +++ b/core/modules/media/src/MediaForm.php
    @@ -58,32 +58,36 @@ public function form(array $form, FormStateInterface $form_state) {
    +      $saved_message = $this->t('@type %label has been created.', $t_args);
    +
    +      if ($is_queued && $is_published) {
    +        $saved_message = $this->t('@type %label has been created. The media item has been configured to retrieve additional information in the background. Some information shown on the page might be temporarily incorrect. The item is unpublished and will be published automatically when the additional information is retrieved.', $t_args);
    +      }
    +      else if ($is_queued) {
    +        $saved_message = $this->t('@type %label has been created. The media item has been configured to retrieve additional information in the background. Some information shown on the page might be temporarily incorrect.', $t_args);
    +      }
    

    Seems to me like this could be streamlined to avoid repeating these hard-coded strings:

    $saved_message = '@type %label has been created.';
    
    if ($is_queued) {
      $saved_message .= ' The media item has been configured...';
    
      if ($is_published) {
        $saved_message .= ' The item is unpublished...';
      }
    }
    $this->t($saved_message, $t_args);
    

    Although, on second thought, this could translate incorrectly since it assumes each sentence makes sense in that order. Alternatively, we could just set three separate messages...

seanb’s picture

StatusFileSize
new63.19 KB
new1.73 KB

#81

1. Fixed
2. We need to translate the text as a whole so we can't avoid repeating some of the texts. Setting up individual messages is an option, but I prefer the current setup for the same reason we need to translate the text as a whole. Translation of individual parts of the texts is not optimal. Adding some comments to explain what's going on hopefully makes it a little better.

seanb’s picture

Removing some tags.

The checkbox for optionally fetching metadata was removed again in #58. So no need to update the IS anymore.
The change record is: https://www.drupal.org/node/2983661

phenaproxima’s picture

  1. +++ b/core/modules/media/tests/src/Functional/MediaUiFunctionalTest.php
    @@ -183,6 +183,41 @@ public function testMediaWithMultipleMediaTypes() {
    +    $page->pressButton('Save');
    +    $media_id = $this->container->get('entity_type.manager')
    +      ->getStorage('media')
    +      ->getQuery()
    +      ->execute();
    +    $media_id = reset($media_id);
    

    We don't need to do this. Let's just click the "Edit" link instead.

  2. +++ b/core/modules/media/tests/src/Functional/MediaUiFunctionalTest.php
    @@ -183,6 +183,41 @@ public function testMediaWithMultipleMediaTypes() {
    +    $assert_session->pageTextNotContains('The item is unpublished and will be published automatically when the additional information is retrieved.');
    

    Let's add a comment indicating that we expect this test to be absent because the media item is non-new.

  3. +++ b/core/modules/media/tests/src/Kernel/MediaSourceTest.php
    @@ -114,30 +114,17 @@ public function testMetadataMapping() {
    -    // Now change the value of the source field and make sure that the mapped
    -    // values update too.
    -    $this->assertSame('Pinkeye', $media_source->getMetadata($media, $attribute_name), 'Value of the metadata attribute is not correct.');
    -    $media->set('field_media_test', 'some_new_value');
    

    Why are we removing these lines? I ask because, looking at MediaStorage, the metadata should absolutely be updated on save if the source field has changed.

  4. +++ b/core/modules/media/tests/src/Kernel/MediaSourceTest.php
    @@ -114,30 +114,17 @@ public function testMetadataMapping() {
    -    // Remove the value of the mapped field and make sure that it is re-mapped
    -    // on save.
    -    \Drupal::state()->set('media_source_test_attributes', [
    -      $attribute_name => ['title' => 'Attribute to map', 'value' => 'Snowball'],
    -    ]);
    -    $media->{$field_name}->value = NULL;
    -    $this->assertSame('Snowball', $media_source->getMetadata($media, $attribute_name), 'Value of the metadata attribute is not correct.');
    -    $media->save();
    -    $this->assertSame('Snowball', $media->get($field_name)->value, 'Metadata attribute was not mapped to the field.');
    

    Same here; won't the field be re-mapped on save? I'm sure there's a good reason for this, but we need to be able explain the removal of test coverage to committers :)

  5. +++ b/core/modules/media/tests/src/Kernel/MediaSourceTest.php
    @@ -189,14 +176,6 @@ public function testThumbnail() {
    -    // Remove the thumbnail and make sure that it is auto-updated on save.
    -    $media->thumbnail->target_id = NULL;
    -    $this->assertSame('public://thumbnail2.jpg', $media_source->getMetadata($media, 'thumbnail_uri'), 'Value of the thumbnail metadata attribute is not correct.');
    -    $media->save();
    -    $this->assertSame('public://thumbnail2.jpg', $media->thumbnail->entity->getFileUri(), 'New thumbnail was not added to the media item.');
    -    $this->assertSame('Mr. Jones', $media->thumbnail->title, 'Title text was not set on the thumbnail.');
    -    $this->assertEquals('Thumbnail', $media->thumbnail->alt, 'Alt text was not set on the thumbnail.');
    

    Ditto.

  6. +++ b/core/modules/media/tests/src/Kernel/MediaSourceTest.php
    @@ -223,7 +202,7 @@ public function testThumbnail() {
    +    $this->assertSame('public://thumbnail2.jpg', $media_source->getMetadata($media, 'alternative_thumbnail_uri'), 'Value of the thumbnail metadata attribute is not correct.');;
    

    An extra semicolon was added to the end of this otherwise unchanged line.

seanb’s picture

StatusFileSize
new62.33 KB
new3.36 KB

#84

1. Fixed
2. Fixed
3. Fixed
4. In the old situation, we updated the metadata when the source field changed or when the metadata field was empty. In this patch, we only update metadata if the source field changes. The new approach allows users to override metadata fields with an empty value (until the source fields changes) and seems more efficient for sources with external metadata.
5. See 4
6. Fixed

Status: Needs review » Needs work

The last submitted patch, 85: 2878119-85.patch, failed testing. View results
- codesniffer_fixes.patch Interdiff of automated coding standards fixes only.

seanb’s picture

Status: Needs work » Needs review
StatusFileSize
new62.36 KB
new1.51 KB

Fixed the tests.

phenaproxima’s picture

Issue tags: +backport

Committers, please backport this issue.

Why? Because it fixes a major architectural problem that was discovered while working on #2831944: Implement media source plugin for remote video via oEmbed.

The problem is that, due to the way the Media class was handling thumbnail and metadata mapping, it was possible for Drupal to be waiting on HTTP/network requests while saving remote media items -- while the database was in a locked transactional state! We jury-rigged a horrible hack to work around this in \Drupal\media\Entity\Media::save() (the hack is removed in this issue), which was always intended to be temporary. The problem is, if you call \Drupal::entityTypeManager()->getStorage('media')->save($media_item) in the current 8.6 API, you'll be able to reproduce the conditions that caused the problem. This is very, very bad indeed and actually sort of breaks the API if you think about it (depending how you call the save() method, you might break your site). Without this fix, releasing 8.6.0 and trumpeting our new oEmbed system would be...well, it would be very not good.

So, given the severity of the problem, the thoroughness of the fix, and the fact that it corrects our API and stabilizes remote media items (as well as making for much cleaner code), I'm officially requesting a backport to 8.6 even if we are adding an API method (which is covered under the BC policy's 1:1 rule).

alexpott’s picture

  1. +++ b/core/modules/media/media.install
    @@ -163,3 +163,16 @@ function media_update_8600() {
    +    $media_type_config->set('queue_metadata_fetching', $media_type_config->get('queue_thumbnail_downloads'))
    +      ->clear('queue_thumbnail_downloads')
    

    If you do the value move in the media type config entity then any exported media type will also be updated. You also can trigger a deprecation notice so that developers can know that they need to update their default configuration.

  2. +++ b/core/modules/media/tests/src/Functional/Update/MediaUpdateTest.php
    @@ -75,4 +75,29 @@ public function testOEmbedConfig() {
    +  /**
    +   * Tests the media type configuration updates.
    +   *
    +   * @see media_update_8601()
    +   */
    +  public function testMediaTypeConfig() {
    

    I find multiple tests in the same updatetestbase test a bit odd since they are all using the same stating data.

samuel.mortenson’s picture

Some review, mostly questions about behavior:

  1. +++ b/core/modules/media/src/Entity/Media.php
    @@ -166,36 +165,59 @@ public function getSource() {
    +      if ($translation->get('name')->isEmpty()) {
    +        $translation->setName($translation->getName());
    +      }
    
    @@ -239,115 +260,27 @@ protected function loadThumbnail($thumbnail_uri = NULL) {
    +      if ($translation->get('name')->isEmpty()) {
    +        $translation->setName('media:' . $this->bundle() . ':' . $this->uuid());
    +      }
    

    This repeated empty check sticks out to me - is this working around a bug, and if so can we document it or open a follow up?

  2. +++ b/core/modules/media/src/MediaStorage.php
    @@ -0,0 +1,176 @@
    +  /**
    +   * {@inheritdoc}
    +   */
    +  public function save(EntityInterface $media, $in_queue = FALSE) {
    

    Where is the $in_queue argument documented?

  3. +++ b/core/modules/media/src/MediaStorage.php
    @@ -0,0 +1,176 @@
    +    // When metadata is fetched via a queue, the media item temporarily contains
    +    // empty metadata fields. The media item is unpublished until the metadata
    +    // is fetched via the queue.
    +    if ($should_queue && $media->isNew()) {
    +      $queue_item['original_status'] = $media->isPublished();
    +      $media->setUnpublished();
    +    }
    

    This looks scary - what are the replication steps for new media to be unpublished? Would #2889855: Unpublished media entity can not be accessed by owner and update any media/delete any media access possibly cached by user make this worse?

samuel.mortenson’s picture

I also noticed that a queue item is created for every entity save, even if an item is already in the queue. Is this intentional?

catch’s picture

That's more or less unavoidable until we have #1548286: API for handling unique queue items.

seanb’s picture

StatusFileSize
new63.16 KB
new63.16 KB
new3.6 KB

#89

1. Not exactly sure what you mean? Should we remove the update hook and move the value in one of the methods on the MediaType entity? Also throw a deprecation notice from the mediaType entity as well? Or should we add a deprecation notice in the update hook?
2. I guess we could create different fixtures for each update test, but in this case adding some config to the existing fixture seemed to make sense. Don't have a lot experience with this though, so if this is a blocking issue I might need some help.

#90

1. Added some documentation. A media name is required and we expect it to be there in a lot of places. We don't want to require this for editors though. That is why we have 3 ways to set a title at the moment.

  • A user entered a title, if the field is shown on the form
  • A default title is generated by the media source from the metadata, this is only done directly if metadata is not queued
  • When the metadata is queued, a default title is generated from the bundle/UUID as a last resort

We need the empty checks to make sure the fallbacks only kick in when a title is not already available.
2. Fixed
3. We unpublish a media item temporarily when:

  • Create a media type and enable queued metadata fetching.
  • Create a media item for this type and keep the published checkbox on.
  • On save, we temporarily unpublish the media item, since the default title is not very helpful and metadata fields are all still empty. This is not a media item you would want to show to the visitors of your site.
  • When the queue is processed, the media item is published again.

We preferably want to land #2983456: Expose triggering update of media metadata + thumbnail to end users soon as well to provide users a way to fix this.
The issue #2889855: Unpublished media entity can not be accessed by owner and update any media/delete any media access possibly cached by user becomes more important, but this already is a pretty critical issue if you ask me.

#91

Yeah, as catch already mentioned we don't have a nice way to get around this at the moment.

vijaycs85’s picture

Here is few review comments/questions:

  1. +++ b/core/modules/media/src/Entity/Media.php
    @@ -166,36 +165,60 @@ public function getSource() {
    +    $field_map = $this->bundle->entity->getFieldMap();
    
    +++ b/core/modules/media/src/MediaForm.php
    @@ -58,20 +58,43 @@ public function form(array $form, FormStateInterface $form_state) {
    +    $is_queued = $this->entity->bundle->entity->metadataFetchingIsQueued();
    
    +++ b/core/modules/media/src/MediaStorage.php
    @@ -0,0 +1,188 @@
    +    $should_queue = $media->bundle->entity->metadataFetchingIsQueued();
    

    Use bundle()-> instead of bundle->?

  2. +++ b/core/modules/media/src/Entity/Media.php
    @@ -166,36 +165,60 @@ public function getSource() {
    +    if ($thumbnail_uri) {
    

    What happens here if there is no thumbnail URI? In real time or via the queue, it's going to be reported back as metadata updated?

  3. +++ b/core/modules/media/src/MediaStorage.php
    @@ -0,0 +1,188 @@
    +    $saved = parent::save($media);
    ...
    +    if ($should_queue) {
    ...
    +      $this->queue->createItem($queue_item);
    

    Could we also check $saved here before queueing? If save failed for some reason, no point in fetching metadata?

effulgentsia’s picture

  1. +++ b/core/modules/media/src/MediaInterface.php
    @@ -64,4 +63,11 @@ public function setCreatedTime($timestamp);
    +  public function updateMetadata();
    

    I think this should be marked @internal, since it's only intended to be called by MediaStorage.

  2. +++ b/core/modules/media/src/MediaStorage.php
    @@ -0,0 +1,188 @@
    + * @internal
    

    I don't think we need to mark the entire class as @internal. Is it any more internal than any of our other entity type storage handlers?

  3. +++ b/core/modules/media/src/MediaStorage.php
    @@ -0,0 +1,188 @@
    +  public function save(EntityInterface $media, $in_queue = FALSE) {
    

    Seems to me like whether or not we're called from the queue worker results in sufficiently different behavior that it warrants a separate method instead of an optional parameter. Perhaps saveFromQueue()? In which case, we should also define that in a MediaStorageInterface.

effulgentsia’s picture

Title: Build a stable API for updating media item metadata » Whether queued or not, update the media thumbnail and metadata prior to beginning the entity save database transaction

Retitling this issue to focus on the goal. The API changes in the patch are just in service of that goal.

effulgentsia’s picture

Title: Whether queued or not, update the media thumbnail and metadata prior to beginning the entity save database transaction » Whether queued or not, update the media thumbnail and metadata before beginning the entity save database transaction

Simplifying grammar :)

effulgentsia’s picture

Priority: Major » Critical

This might not stay Critical if other committers disagree, but raising it to that for now. I think it satisfies at least some of the criteria of https://www.drupal.org/core/issue-priority#critical-task, such as a concrete performance issue, a viable plan to resolve it, affects the runtime operation of the site, can't be committed to a patch release, and my own discretion.

I'd love to see this land before beta. Whether or not it's eligible between beta and RC might hinge on if others agree with the priority being Critical.

seanb’s picture

StatusFileSize
new63.48 KB
new7.26 KB

#94

1. That doesn't work, bundle() returns a string and the magic getter returns Drupal\Core\Field\EntityReferenceFieldItemList
2. Good point, changed it so we now only set the thumbnail from the metadata if a URI is returned. If this fails it falls back to the default thumbnail in preSave().
3. If saving fails an exception is thrown, we only get SAVED_NEW or SAVED_UPDATED back so there is no extra value in checking that I think?

#95

1. Fixed.
2. Fixed.
3. Fixed.

Status: Needs review » Needs work

The last submitted patch, 99: 2878119-99.patch, failed testing. View results
- codesniffer_fixes.patch Interdiff of automated coding standards fixes only.

seanb’s picture

Status: Needs work » Needs review
StatusFileSize
new64.84 KB
new1.36 KB

Ah, new media library config.

vijaycs85’s picture

Thanks for the update @seanB

3. If saving fails an exception is thrown, we only get SAVED_NEW or SAVED_UPDATED back so there is no extra value in checking that I think?

As per docblock,

   * @return bool|int
   *   If the record insert or update failed, returns FALSE. If it succeeded,
   *   returns SAVED_NEW or SAVED_UPDATED, depending on the operation performed.
   */
  abstract protected function doSave($id, EntityInterface $entity);
phenaproxima’s picture

Status: Needs review » Needs work

Not seeing a whole lot wrong here.

  1. +++ b/core/modules/media/src/Entity/Media.php
    @@ -239,115 +268,28 @@ protected function loadThumbnail($thumbnail_uri = NULL) {
    +      // If the media item does not have a title yet, for instance when the
    +      // metadata fetching is queued, generate a default name.
    +      if ($translation->get('name')->isEmpty()) {
    +        $translation->setName('media:' . $this->bundle() . ':' . $this->uuid());
    +      }
    

    Let's mention that this MUST come before the thumbnail logic below, because $translation->label() is called and we MUST have a name set before we call that, lest it result in a remote API call or some other naughtiness.

  2. +++ b/core/modules/media/src/Entity/MediaType.php
    @@ -158,14 +158,30 @@ public function setDescription($description) {
    +  public function queueMetadataFetching($queue_metadata_fetching) {
    +    return $this->set('queue_metadata_fetching', $queue_metadata_fetching);
    

    Maybe we should cast $queue_metadata_fetching to bool here.

  3. +++ b/core/modules/media/src/MediaForm.php
    @@ -58,20 +58,43 @@ public function form(array $form, FormStateInterface $form_state) {
    +      if ($is_queued && $is_published) {
    +        $saved_message = $this->t('@type %label has been created. The media item has been configured to retrieve additional information in the background. Some information shown on the page might be temporarily incorrect. The item is unpublished and will be published automatically when the additional information is retrieved.', $t_args);
    

    Where are we setting the item to unpublished?

  4. +++ b/core/modules/media/src/MediaForm.php
    @@ -58,20 +58,43 @@ public function form(array $form, FormStateInterface $form_state) {
    +      else if ($is_queued) {
    

    Needs be elseif.

  5. +++ b/core/modules/media/src/MediaInterface.php
    @@ -51,8 +51,7 @@ public function getCreatedTime();
    -   * @return \Drupal\media\MediaInterface
    -   *   The called media item.
    +   * @return $this
    

    This is patch noise, but I'll let it slide :)

  6. +++ b/core/modules/media/src/MediaSourceInterface.php
    @@ -107,6 +107,20 @@ public function getMetadataAttributes();
    +   * - default_name: The default name generated for a media item. The generated
    +   *   default name must not use any remote API calls. A source-specific
    +   *   attribute should be used to fetch a name for the media item from a
    +   *   remote API.
    +   * - thumbnail_uri: The URI of the thumbnail file for the media item, e.g.
    +   *   public://directory/thumbnail.jpg.
    

    We should mention that the names of these attributes are specified by the default_name_metadata_attribute and thumbnail_uri_metadata_attribute properties of the plugin definition.

  7. +++ b/core/modules/media/src/MediaStorage.php
    @@ -0,0 +1,189 @@
    + * The default storage is overridden to handle metadata fetching outside of the
    + * database transaction.
    

    Let's expand this a little to explain why we fetch metadata outside of the transaction.

  8. +++ b/core/modules/media/src/MediaStorage.php
    @@ -0,0 +1,189 @@
    +    $should_update_metadata = $this->shouldUpdateMetadata($media);
    +    if (!$should_queue && $should_update_metadata) {
    

    I don't think we need to keep $should_update_metadata in its own variable; the method name is self-explanatory. So this should be if (!$should_queue && $this->shouldUpdateMetadata($media)).

  9. +++ b/core/modules/media/src/MediaStorage.php
    @@ -0,0 +1,189 @@
    +    if ($should_queue && $media->isNew()) {
    +      $queue_item['original_status'] = $media->isPublished();
    +      $media->setUnpublished();
    +    }
    

    I'm not sure if, from a UX perspective, which would be less jarring -- allowing the item to have its original status and letting the metadata be temporarily wrong, or unilaterally changing the item's status until the queue runs. Both options suck, but our backs are against the wall for now. Maybe we should get an expert UX opinion on which would be preferable.

  10. +++ b/core/modules/media/src/MediaStorage.php
    @@ -0,0 +1,189 @@
    +   * @throws \Drupal\Core\Entity\EntityStorageException
    +   *   In case of failures, an exception is thrown.
    +   *
    

    We don't need this @throws because we are never explicitly throwing the exception.

  11. +++ b/core/modules/media/src/MediaStorage.php
    @@ -0,0 +1,189 @@
    +    // Try to load the unchanged media item, so we can compare its source field
    +    // value with the current value. If there is no original version, treat the
    +    // media item as new and update its metadata.
    +    $original = isset($media->original) ? $media->original : $this->loadUnchanged($media->id());
    

    Would it be easy to load the same revision, unchanged (or copy-and-paste the code from the parent class which loads and sets $entity->original during save)? If so, that might smooth out potential Content Moderation-related turbulence. If it's not easy, then let's keep this as-is for now but add a @todo referencing the issue which will make this compatible with Content Moderation, and escalate that issue to major.

  12. +++ b/core/modules/media/tests/src/Kernel/MediaSourceTest.php
    @@ -189,14 +186,6 @@ public function testThumbnail() {
    -    // Remove the thumbnail and make sure that it is auto-updated on save.
    -    $media->thumbnail->target_id = NULL;
    -    $this->assertSame('public://thumbnail2.jpg', $media_source->getMetadata($media, 'thumbnail_uri'), 'Value of the thumbnail metadata attribute is not correct.');
    -    $media->save();
    -    $this->assertSame('public://thumbnail2.jpg', $media->thumbnail->entity->getFileUri(), 'New thumbnail was not added to the media item.');
    -    $this->assertSame('Mr. Jones', $media->thumbnail->title, 'Title text was not set on the thumbnail.');
    -    $this->assertEquals('Thumbnail', $media->thumbnail->alt, 'Alt text was not set on the thumbnail.');
    

    Shouldn't we check that the default thumbnail is used if the thumbnail is removed?

effulgentsia’s picture

I'm not sure if, from a UX perspective, which would be less jarring -- allowing the item to have its original status and letting the metadata be temporarily wrong, or unilaterally changing the item's status until the queue runs. Both options suck, but our backs are against the wall for now.

Is the metadata even temporarily wrong? Suppose we kept it published and someone were to view it prior to the queue running... wouldn't it just result in $media->getSource()->getMetadata() being called at that time?

seanb’s picture

Status: Needs work » Needs review
StatusFileSize
new65.81 KB
new8.02 KB

#102 Fixed. This is not properly documented on EntityStorageInterface::save(), but I guess that's another issue.

   * @return
   *   SAVED_NEW or SAVED_UPDATED is returned depending on the operation
   *   performed.
   *
   * @throws \Drupal\Core\Entity\EntityStorageException
   *   In case of failures, an exception is thrown.

#103

1. Fixed
2. Fixed
3. That would be MediaStorage::save()
4. Fixed
5. :)
6. Fixed
7. Fixed
8. Fixed
9. We tried this for youtube video's. You could have a video published on your site where the name is media:remote_video:7cb1fdf5-2ad6-44ce-918c-ce480c266e87. Besides that the video would not even be shown since the embed code is not fetched yet. When search engines index this content, or non-editors see this, this could lead to confusion. So that's why we decided that showing nothing is a little less evil (for at least search engines and non-editor users). That being said, it is important to land #2983456: Expose triggering update of media metadata + thumbnail to end users pretty soon after this patch, to allow users to trigger the immediate update of metadata if needed.
10. Fixed
11. Added a todo since there seem to be some things that need to be discussed about the latest vs default revision etc.
12. Fixed, good point!

#104

The getMetadata() method is supposed to be used when saving and the fetched values should be stored in the media item. When viewing a media item, you are just displaying the saved field values and no external API calls should be needed.

chr.fritsch’s picture

Status: Needs review » Reviewed & tested by the community

@seanB and @phenaproxima put in a lot of effort into this issue and fixed one of the major problems with media.

So this looks really good to me and I hope it will still make it into 8.6.

seanb’s picture

Changed #2889855: Unpublished media entity can not be accessed by owner and update any media/delete any media access possibly cached by user to critical since we decided that issue is a blocker for this patch.

The decision was made to temporarily unpublish media when metadata fetching is queued. We do this to make sure incomplete media items are not shown on the site and/or possibly indexed by search engines. Without the permission added in #2889855: Unpublished media entity can not be accessed by owner and update any media/delete any media access possibly cached by user, there is no way for a user to see the media item that it just created.

The last submitted patch, 73: 2878119-73.patch, failed testing. View results

effulgentsia’s picture

Status: Reviewed & tested by the community » Needs work

This patch didn't make it into beta, but I still think has a good chance of making it into RC.

  1. +++ b/core/modules/media/config/schema/media.schema.yml
    @@ -30,7 +30,7 @@ media.type.*:
    -    queue_thumbnail_downloads:
    +    queue_metadata_fetching:
           type: boolean
           label: 'Whether the thumbnail downloads should be queued'
    

    Let's update that label too.

  2. +++ b/core/modules/media/media.install
    @@ -163,3 +163,16 @@ function media_update_8600() {
    +function media_update_8601() {
    +  $config_factory = \Drupal::configFactory();
    +  foreach ($config_factory->listAll('media.type.') as $media_type_config_name) {
    +    $media_type_config = $config_factory->getEditable($media_type_config_name);
    +    $media_type_config->set('queue_metadata_fetching', $media_type_config->get('queue_thumbnail_downloads'))
    +      ->clear('queue_thumbnail_downloads')
    +      ->save();
    +  }
    +}
    

    Let's change this to use ConfigEntityUpdater. See #2989256: Old post_update functions that ->save() config entities without calling ->trustData() first mask errors in other post_update functions for why.

  3. +++ b/core/modules/media/src/MediaStorage.php
    @@ -0,0 +1,189 @@
    +class MediaStorage extends SqlContentEntityStorage {
    

    I think we need a MediaStorageInterface that defines the saveFromQueue method, so that the queue worker can type-hint on the interface, not the implementation.

  4. +++ b/core/modules/media/src/MediaStorage.php
    @@ -0,0 +1,189 @@
    +   * @internal
    +   */
    +  public function saveFromQueue(EntityInterface $media) {
    

    Is there a reason for this to be internal? If so, let's document that reason. But I don't think it should be, since I think we'll be supporting the concept of resaving from the queue for a while, especially since we also have a config key for it.

  5. +++ b/core/modules/media/src/Plugin/QueueWorker/MetadataFetcher.php
    @@ -58,10 +58,16 @@ public static function create(ContainerInterface $container, array $configuratio
    +      if (!empty($data['original_status'])) {
    +        $media->setPublished();
    +      }
    

    What about moving this to within the saveFromQueue() implementation? That way, MediaStorage controls both the unpublishing and the publishing as its own implementation detail.

effulgentsia’s picture

Hm, I'm also starting to wonder why have this logic in a storage handler? Why not have Media itself implement the desired save() and saveFromQueue() methods? Maybe there's good reasons for the storage handler, but if so, what are they?

phenaproxima’s picture

Because any entity can be saved in two ways: $entity->save() and $storage->save($entity). It needs to work the same way either way, and since the first form is just a more concise variant of the second form, the storage handler is what makes the most sense for this logic.

xjm’s picture

So what is the actual impact on site owners of this bug? The issue summary does not describe what I'd consider a critical issue. It sounds like the bug is:

If site owners have remote media types on their sites, their content will sometimes be saved without all of the remote metadata. There is not a data integrity problem because the data is downloaded eventually. The concern is mainly that the thumbnails will not appear in the meanwhile and so, for a contrib media source, the user (or maybe a search engine) might see the default fallback thumbnail rather than the preferred thumbnail for awhile.

Unless media items in HEAD are currently taking like 100ms extra to save because of the thumbnail functionality, or unless there is an actual data integrity problem or race condition, this issue doesn't seem like it would be critical. It definitely does sound major, which is serious enough in its own right without needing to bend the critical priority.

However, if there is a race condition, data integrity issue, etc. in HEAD, or if saving a media item hangs/whitescreens, then critical seems appropriate.

effulgentsia’s picture

Re #112, yeah, what I thought was the Critical issue when I wrote #98 was when you added a remote video (media/add/remote_video), that an HTTP request was being issued inside of a database transaction. However, I'm not currently able to reproduce that on 8.6 HEAD, because even in 8.6 HEAD, Media::save() already does the metadata fetching prior to calling parent::save(). So, yeah, a question for folks working on this issue: other than some code/API cleanup, what is this issue solving?

larowlan’s picture

+++ b/core/modules/media/src/MediaStorage.php
@@ -0,0 +1,189 @@
+    if ($should_queue && $media->isNew()) {
+      $queue_item['original_status'] = $media->isPublished();
+      $media->setUnpublished();
+    }

my reading of this indicates that we might have an issue in this scenario

* User saves media as published
* Storage handler queues metadata fetch and marks it as unpublished, noting it needs to be re-published
* Content editor goes back and resaves as unpublished
* Storage handler queues another metadata fech and leaves it as unpublished but notes original status as unpulished
* queue runs, first task publishes, second on unpublishes

In an instance where the first queue item is run and the second one is deferred to a future cron run, do we end up with the situation where the queue overrides the content-editors intention to unpublish

samuel.mortenson’s picture

* Content editor goes back and resaves as unpublished
* Storage handler queues another metadata fech and leaves it as unpublished but notes original status as unpulished

@larowlan Wouldn't the $media->isNew() check in the snippet you posted prevent this scenario?

phenaproxima’s picture

So, yeah, a question for folks working on this issue: other than some code/API cleanup, what is this issue solving?

If you call $media->save(), there is no chance that you will be waiting on an HTTP request during a database transaction.

If you call Drupal::entityTypeManager()->getStorage('media')->save($media), you might.

Since both ways are valid ways of saving a media item (and in fact, the second one is probably more "correct"), we need to fix this so that, no matter how you save your media items, you cannot end up in an HTTP request during a database transaction.

That's why this is critical. The crux of the issue is not whether authors have the correct metadata or not at save time; it's that they could potentially scramble their database if they are saving too many media items the "wrong" way at once.

Now, if that means we split off a new issue and fix the immediate problem without cleaning up the API as well, so be it. We can improve the actual API for 8.7.x. I defer that decision to the committers. But the main problem (the possibility of waiting for HTTP during a database transaction) is the actual critical bug that must be fixed now for 8.6.

larowlan’s picture

@larowlan Wouldn't the $media->isNew() check in the snippet you posted prevent this scenario?

That makes the scenario worse, as this could occur right?

  1. editor adds media item as published
  2. queue code sets to unpublished and flags to move back to published
  3. editor edits and marks as unpublished
  4. queue code runs and set to published
xjm’s picture

Yeah, if we can separate "prevent database scrambling" from "add improved APIs", we should definitely do that.

@larowlan, @effulgentsia, @plach, and I discussed this and we're somewhat concerned about the risk in general. This sounds like the sort of tricky bug where fixing it might introduce related regressions in data integrity, performance, or user expectation (with regard to the queuing process, etc.) We also discussed that the proposed queue/publishing workflow could have unintended side effects on sites using Content Moderation.

At the least, we'll want some test cases around the Workflow scenarios, and some thorough manual testing overall.

Thanks @phenaproxima!

catch’s picture

I'm also concerned about the unpublishing - especially cases where the media item should be unpublished for some other reason.

If we're concerned about search indexing and caching, then we should prevent caching when the media item is rendered with the placeholder/incomplete metadata - although given items are saved again when they get the correct metadata, cache tags should be invalidated, so not sure even this is a problem at least on the cache side?

Twitter shows embedded articles with a placeholder thumbnail sometimes for minutes at a time, so there are user experience patterns where embedded media takes a while to update, and it doesn't to me feel less jarring than something being unpublished.

For example if I create a media item from a node/article form via the field (when this option is available), and the media item is unpublished, won't it just completely fail to show when the node is rendered? Would I even be able to reference the media item from the node after creating it or would it be immediately unavailable?

Then that node referencing the unpublished media could also get cached/search indexed in the interim, leading to a similar problem to the one this issue is trying to solve - except that with the node case it will be even harder to prevent caching (because referencing unpublished media items is a valid case if the media item is unpublished later).

I do agree with ensuring things happen outside the transaction though.

seanb’s picture

Status: Needs work » Needs review
StatusFileSize
new63.85 KB
new10.4 KB

I see the automatic unpublishing is something we need to discuss a little more. Removed it from this patch since it was not part of the original scope for this issue, and even though showing a media item to users before metadata is fetched could be confusing, this issue is not completely new. We also have #2983456: Expose triggering update of media metadata + thumbnail to end users to allow users to fetch metadata directly, which might be good enough.

Hope this takes away the concerns to commit this patch.

#109

1. Fixed
2. Fixed
3. Fixed. Added an internal interface for queued media only.
4. I think saveFromQueue() should be reserved for the queue defined in the module. The supported/documented way to save media is through the save() method. If any calling code want to update the metadata directly it should call updateMetadata() on the media item first, not "misuse" saveFromQueue().
5. Removed the automatic unpublishing.

Status: Needs review » Needs work

The last submitted patch, 120: 2878119-120.patch, failed testing. View results

seanb’s picture

Status: Needs work » Needs review
StatusFileSize
new64.87 KB

For some reason the new MediaStorageQueuedInterface was in the interdiff but not in the patch.

Status: Needs review » Needs work

The last submitted patch, 122: 2878119-122.patch, failed testing. View results

seanb’s picture

Status: Needs work » Needs review
StatusFileSize
new64.53 KB
new640 bytes

Missed one assert that should be removed.

marcoscano’s picture

Removed it from this patch since it was not part of the original scope for this issue, and even though showing a media item to users before metadata is fetched could be confusing, this issue is not completely new

Well, as pointed out in #64, the "temporarily broken" name is something being introduced in this patch, that's the reason we decided to do the unpublishing here and not in a follow up.

One argument could be made that for most affected sites, it would be preferable to keep things as they are today rather than having the effect mentioned in #64.

seanb’s picture

As catch mentioned in #119:

Twitter shows embedded articles with a placeholder thumbnail sometimes for minutes at a time, so there are user experience patterns where embedded media takes a while to update, and it doesn't to me feel less jarring than something being unpublished.

I agree having media unpublished is not necessarily better. Providing users with a way to fix it through #2983456: Expose triggering update of media metadata + thumbnail to end users would really help, with that option users might not even have a problem? Besides that, users also still have the choice to create the media as unpublished, or to disable queuing on a media type.

Maybe we could add a warning to the media form while creating it, instead of after creating, so users know that they are creating a media item that is temporarily incorrect. They can choose to not publish it immediately instead of the module making this decision?

phenaproxima’s picture

Priority: Critical » Major
Issue tags: -Media critical, -backport

After discussion with @seanB is Slack, I've opened #2990896: Move Media::save() logic into storage handler to prevent data integrity issues when media items with remote content are saved, which is critical, to move the Media::save() stuff into the storage handler. That one needs to be fixed in 8.6 and has a much narrower scope. That is just about moving the current hack to a better spot. We can sort out the API and UX implications further in this issue, which no longer needs to be critical (or backported to 8.6.x).

deviantintegral’s picture

   * there are 2 special metadata attributes every source must provide:
   * - The default name generated for a media item. The generated default name

This reads as if there's a b/c break if your source doesn't implement the default name. Actually, that is a b/c break given that media sources don't have to extend from MediaSourceBase. A cursory search in core isn't showing any cases that will fail if NULL is returned, so perhaps we should change this to every source should provide?

\Drupal\media\MediaStorage::save calls updateMetadata, but the method parameter is only typehinted to EntityInterface. We should probably add an instanceof check here.

Likewise, should public function saveFromQueue(EntityInterface $media); be a MediaInterface?

   * @return bool|int
   *   SAVED_NEW or SAVED_UPDATED is returned depending on the operation
   *   performed.

Should this just be @return int?

Version: 8.7.x-dev » 8.8.x-dev

Drupal 8.7.0-alpha1 will be released the week of March 11, 2019, which means new developments and disruptive changes should now be targeted against the 8.8.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

dww’s picture

Status: Needs review » Needs work

#124 isn't applying to 8.7.x anymore. Just re-queued bot to confirm. Anyone setup to quickly re-roll this?

Also NW for #128, all of which seems like good feedback to me.

Haven't done a thorough review yet.

Thanks, everyone! Would love to ses (and help get) this fixed in core.

Cheers,
-Derek

idebr’s picture

Issue tags: +Needs reroll

Version: 8.8.x-dev » 8.9.x-dev

Drupal 8.8.0-alpha1 will be released the week of October 14th, 2019, which means new developments and disruptive changes should now be targeted against the 8.9.x-dev branch. (Any changes to 8.9.x will also be committed to 9.0.x in preparation for Drupal 9’s release, but some changes like significant feature additions will be deferred to 9.1.x.). For more information see the Drupal 8 and 9 minor version schedule and the Allowed changes during the Drupal 8 and 9 release cycles.

dww’s picture

Status: Needs work » Needs review
Issue tags: -Needs reroll
StatusFileSize
new58.91 KB
new40.32 KB

I keep getting nailed by this, and would love to see it in core. I just tried to re-roll it for 8.7.x branch (which is where I need it right now). It was *really* gnarly, and I probably broke all kinds of things. ;) Running tests locally now, I predict all sorts of failures...

BUT, hopefully this is a step closer to working again...

Sadly, interdiff is majorly confused, so I'm having to attach a raw diff of the two patch files.

% diff -up 2878119-124.patch 2878119-133.patch > 2878119.124_133.raw-diff.txt

I suspect the test bot will fail miserably with this, but let's see what happens. ;)

Would *love* to get some input and help from the media maintainers on this...

Status: Needs review » Needs work

The last submitted patch, 133: 2878119-133.patch, failed testing. View results

seanb’s picture

Fetching all metadata in a queue is something we got quite some pushback on. But if we expand the API to allow metadata refresh, that might be a totally different issue. That might be better addressed in #2983456: Expose triggering update of media metadata + thumbnail to end users.

So maybe just close the this issue as won’t fix, and added the relevant API parts to refresh metadata in #2983456: Expose triggering update of media metadata + thumbnail to end users. It would significantly simplify the solution. I think all we need is a updateMetadata() method to improve this and have a way for users to execute that when needed.

dww’s picture

Status: Needs work » Closed (won't fix)