Problem/Motivation

This is spun off from #2878119-116: Whether queued or not, update the media thumbnail and metadata before beginning the entity save database transaction:

If you call $media->save(), there is no chance that you will be waiting on an HTTP request during a database transaction.

If you call Drupal::entityTypeManager()->getStorage('media')->save($media), you might.

Since both ways are valid ways of saving a media item (and in fact, the second one is probably more "correct"), we need to fix this so that, no matter how you save your media items, you cannot end up in an HTTP request during a database transaction.

That's why this is critical. The crux of the issue is not whether authors have the correct metadata or not at save time; it's that they could potentially scramble their database if they are saving too many media items the "wrong" way at once.

Now, if that means we split off a new issue and fix the immediate problem without cleaning up the API as well, so be it. We can improve the actual API for 8.7.x. I defer that decision to the committers. But the main problem (the possibility of waiting for HTTP during a database transaction) is the actual critical bug that must be fixed now for 8.6.

Proposed resolution

Move the logic in Media::save() to the media entity's storage handler, so that no matter how you save, HTTP stuff is done outside the database transaction.

Remaining tasks

Do it.

User interface changes

None.

API changes

None.

Data model changes

None.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

phenaproxima created an issue. See original summary.

xjm’s picture

Title: Move Media::save() logic into storage handler » Move Media::save() logic into storage handler to prevent data integrity issues when media items with remote content are saved
seanB’s picture

Status: Active » Needs review
FileSize
23.21 KB

This should be a push in the right direction.

seanB’s picture

Already found some things to improve. Also saw the tests were failing so added a fix for that as well.

The last submitted patch, 3: 2990896-3.patch, failed testing. View results
- codesniffer_fixes.patch Interdiff of automated coding standards fixes only.

phenaproxima’s picture

  1. +++ b/core/modules/media/src/Entity/Media.php
    @@ -239,115 +242,21 @@ protected function loadThumbnail($thumbnail_uri = NULL) {
    +        $default_thumbnail_filename = $this->getSource()->getPluginDefinition()['default_thumbnail_filename'];
    +        $thumbnail_uri = \Drupal::config('media.settings')->get('icon_base_uri') . '/' . $default_thumbnail_filename;
    

    Let's move this outside of the foreach loop so we only have to do it one time, or not at all.

  2. +++ b/core/modules/media/src/MediaStorageQueuedInterface.php
    @@ -0,0 +1,31 @@
    +interface MediaStorageQueuedInterface extends SqlEntityStorageInterface {
    

    Let's just call this MediaStorageInterface, and it should extend EntityStorageInterface, not SqlEntityStorageInterface. Actually, I'm not convinced we even need an interface at all; an interface implies an API, and we want to avoid introducing any API in this patch if we can. So maybe we should remove this interface entirely, and typehint the storage handler as MediaStorage in the queue worker.

effulgentsia’s picture

Re #6.2, why is this patch adding the saveFromQueue() method at all? I thought the scope of this issue was *only* moving Media::save() to MediaStorage::save(). Everything else here seems out of scope for the Critical issue, and rather the scope of #2878119: Whether queued or not, update the media thumbnail and metadata before beginning the entity save database transaction. Or am I misunderstanding something?

If I'm right though, then I think that would mean no new interface needed yet by MediaStorage, and instead the queue worker could typehint to EntityStorageInterface, since all it would need to call is save().

seanB’s picture

Re #6.2, why is this patch adding the saveFromQueue() method at all? I thought the scope of this issue was *only* moving Media::save() to MediaStorage::save().

The code in Media::save() checks several things to determine if the thumbnail or metadata should be updated. Checking if the source field changed was the most important. When we move the hasSourceFieldChanged method to check if the source field changed, the Media::shouldUpdateThumbnail() makes no sense anymore, since it relied on that method being in Media and not the Storage. This led to a whole chain of things depending on each other (which was a big reason why we wanted to refactor it in the first place).

When updating the thumbnail becomes part of the storage, saving from the queue becomes more complex. The storage now needs to know if it's being saved from a queue or not. That's why the saveFromQueue() was also introduced in this patch. I personally agree with phenaproxima that we shouldn't add this to the API, but in #2878119-109: Whether queued or not, update the media thumbnail and metadata before beginning the entity save database transaction it was suggested we should add it.

phenaproxima’s picture

@seanB is correct -- the logic involved in saving media is so complex and wound around itself that we end up needing a saveFromQueue() -- there are several methods in Media that it is, effectively, replacing (anything with the $from_queue) parameter.

I say we should:

  • Remove the interface
  • Mark saveFromQueue() explicitly internal
  • Mark MediaStorage explicitly internal

I believe that the latest patch is indeed doing the minimum of what's needed to fix the issue at hand.

effulgentsia’s picture

First of all, I really apologize for not helping #2878119: Whether queued or not, update the media thumbnail and metadata before beginning the entity save database transaction land sooner. I'll do my best to help it land as soon as possible into 8.7.

However, now that we're already past 8.6 beta and with the RC being this week, I'd really like to keep this patch's changes as small as possible. What do you think of something like this?

For 8.6, I don't want to change the queue worker or anything else that's currently calling $media->save() to calling the storage handler's save() instead. That way, if there are existing modules out there that override the Media entity type class, they can keep working as they do now. However, what this patch does is simply makes core's Media class functional for contrib code that calls the storage handler's save() method directly, which is a legitimate thing for contrib to do, and which is why this issue is Critical. If there's agreement on this approach, I think all that's left is to add a test for that exact scenario (invoking the storage handler's save() directly).

effulgentsia’s picture

+++ b/core/modules/media/src/PrepareSaveInterface.php
@@ -0,0 +1,27 @@
+ * @todo Consider making this a non-internal interface in Drupal 8.7.

I opened #2992426: Add entity methods and hooks for executing pre/post code outside of the save transaction to propose adding an API for all entity types to have the ability to run some of their save() code outside of the transaction.

phenaproxima’s picture

I'm not too keen on the idea of having an interface, because one thing is true of Drupalists -- if you put it there, they will use it, internal or not :) Besides, we may have to swiftly deprecate said interface in 8.7.x. My preference would be to simply expose an internal prepareSave() method on the Media class, and leave it at that -- I imagine it would be allowed under the BC policy, and I doubt anything in contrib is extending/replacing the Media class (although that's worth a grep).

effulgentsia’s picture

phenaproxima’s picture

There are two known places in contrib where the Media class is being extended, and they're both in unit tests: http://grep.xnddx.ru/search?text=extends+Media+&filename=

Therefore, I think having a method_exists() check is smart. I like this patch; I think we could probably land this to patch over the critical.

phenaproxima’s picture

Status: Needs review » Reviewed & tested by the community

In fact, I'll go so far as to say this is RTBC once green on all backends.

This is a late-beta critical bug fix; our goal is to prevent the worst-case scenario in the smallest number of lines possible, not to finesse it. We must add a public method, so we're marking it internal (which is par for the course; the Media class already has one of those), which gives us the latitude to remove it in a minor release if we need to. Cleaning up the API -- absolutely a noble pursuit, with all the general crazy yak shaving that entails -- should be an official 8.7 target for the Media Initiative.

But for now, let's correct the immediate problem at hand.

xjm’s picture

#13 seems like a pretty good compromise to me as well. The only question I had is what impact swapping out the storage handler would have on an existing site or module. It seems like at a minimum we should guarantee a cache clear to ensure an up-to-date entity definition?

xjm’s picture

Status: Reviewed & tested by the community » Needs review
xjm’s picture

we're marking it internal... which gives us the latitude to remove it in a minor release if we need to.

Well, not exactly: https://www.drupal.org/core/deprecation#internal

But yes, I agree with marking it internal and with not adding an interface until such time as it becomes a part of the parent API.

effulgentsia’s picture

It seems like at a minimum we should guarantee a cache clear to ensure an up-to-date entity definition?

Done.

I think all that's left is to add a test for that exact scenario (invoking the storage handler's save() directly)

I still think that's a good idea to do, but not sure if it can be done in time for RC's commit freeze.

phenaproxima’s picture

I think all that's left is to add a test for that exact scenario (invoking the storage handler's save() directly)

One quick option is to modify a few of Media's existing tests, which (as you might guess) do a lot of saving. We could just change some of the calls from $media->save() to $storage->save($media), especially in the functional oEmbed tests, which are most likely to suffer from this problem. Also, this patch should always be tested on all backends, since it was originally SQLite which turned up the problem (because it does not have row-level locking).

phenaproxima’s picture

Assigned: Unassigned » phenaproxima
Status: Needs review » Needs work
Issue tags: +Needs tests

Self-assigning to write the tests.

phenaproxima’s picture

Assigned: phenaproxima » Unassigned
Status: Needs work » Needs review
Issue tags: -Needs tests
FileSize
6.28 KB
10.07 KB

This should hopefully do the trick. It's light, but hopefully enough to prove the point.

In a nutshell, this changes certain parts of an extensive kernel test (MediaSourceTest) which covers operations like setting the default name, initial metadata mapping, and setting the default thumbnail -- all of which are done in prepareSave(). Sometimes the test will call $storage->save($media), and other times it will call $media->save(). The behavior should be totally identical in either case, since with this patch, $media->save() directly delegates to the storage handler. I think this proves the fix, since the whole idea is that we want to be able to mix and match either style of saving media without seeing different behavior.

phenaproxima’s picture

At @effulgentsia's request, here is a fail patch and updated test. I had to more carefully select which points of MediaSourceTest should call $storage->save($media), since only certain specific conditions will trigger the failure. But trigger it, they do.

seanB’s picture

Not sure I agree with changing the existing some of the Media::save() calls to the storage save. I can see it happen that someone changes this back later without realising the purpose of those changes.

Can't we just have a specific test to verify there is no difference in saving from the storage or the entity?

Something like this (didn't test this yet, but it should give you an idea):

  /**
   * Tests that the media storage saves media the same way the entity does.
   */
  public function testMediaSave() {
    $field_name = 'field_to_map_to';
    $attribute_name = 'attribute_to_map';
    $storage = FieldStorageConfig::create([
      'entity_type' => 'media',
      'field_name' => $field_name,
      'type' => 'string',
    ]);
    $storage->save();
    FieldConfig::create([
      'field_storage' => $storage,
      'bundle' => $this->testMediaType->id(),
      'label' => 'Field to map to',
    ])->save();

    // Define mapping and make sure that the value was stored in the field.
    \Drupal::state()->set('media_source_test_attributes', [
      $attribute_name => ['title' => 'Attribute to map', 'value' => 'Snowball'],
    ]);
    $this->testMediaType->setFieldMap([$attribute_name => $field_name])->save();

    // Assert the media item saves itself correctly.
    $media = Media::create([
      'bundle' => $this->testMediaType->id(),
      'field_media_test' => 'some_value',
    ]);
    $media->save();
    $this->assertSame('media:' . $media->bundle() . ':' . $media->uuid(), $media->getName(), 'Default name was not saved correctly.');
    $this->assertSame('Snowball', $media->get($field_name)->value, 'Metadata attribute was not mapped to the field.');

    // Assert the media storage saves the media item correctly.
    $media = Media::create([
      'bundle' => $this->testMediaType->id(),
      'field_media_test' => 'some_value',
    ]);
    $this->storage->save($media);
    $this->assertSame('media:' . $media->bundle() . ':' . $media->uuid(), $media->getName(), 'Default name was not saved correctly.');
    $this->assertSame('Snowball', $media->get($field_name)->value, 'Metadata attribute was not mapped to the field.');
  }

Status: Needs review » Needs work

The last submitted patch, 23: 2990896-23-FAIL.patch, failed testing. View results

phenaproxima’s picture

Status: Needs work » Needs review

Fail patch failed. Hooray!

phenaproxima’s picture

How does this look? It explicitly tests all three mappings (name, metadata, thumbnail), using save() and save($media).

The last submitted patch, 27: 2990896-27-FAIL.patch, failed testing. View results

seanB’s picture

Loving it! Just on minor last thing, wouldn't it be better to check the actual values we expect instead of checking if the values are not empty?

+++ b/core/modules/media/tests/src/Kernel/MediaSourceTest.php
@@ -15,6 +15,73 @@
+    $this->assertFalse($a->get('name')->isEmpty());
+    $this->assertFalse($b->get('name')->isEmpty());
...
+    $this->assertFalse($a->get('field_to_map_to')->isEmpty());
+    $this->assertFalse($b->get('field_to_map_to')->isEmpty());
phenaproxima’s picture

We could do that, but it's beside the point of test. Unless save() is called, the name field is kept empty (which is masked by getName(), which delegates to the source plugin if the field is empty), so asserting its emptiness is good enough. Same thing for the mapped value.

seanB’s picture

Status: Needs review » Reviewed & tested by the community

Fair enough! I think this is ready. Since I didn't write any code in the final patch I should be able to RTBC this.

  • catch committed 1e7b231 on 8.7.x
    Issue #2990896 by phenaproxima, effulgentsia, seanB, xjm: Move Media::...

  • catch committed 57818db on 8.6.x
    Issue #2990896 by phenaproxima, effulgentsia, seanB, xjm: Move Media::...
catch’s picture

Version: 8.7.x-dev » 8.6.x-dev
Status: Reviewed & tested by the community » Fixed

Committed/pushed to 8.7.x and cherry-picked to 8.6.x. Thanks!

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.