Problem/Motivation
Images referenced in text fields using <img src="…" data-entity-type="file" data-entity-uuid="…" /> have their usage tracked by the editor_file_reference filter plugin, and the accompanying hook_entity_insert() + hook_entity_update() + hook_entity_delete() + hook_entity_revision_delete() hooks.
Unfortunately, hook_entity_update() does not take translations into account.
Therefore, images referenced in a text field are lost when a new image is being added while creating a new translation without creating a new revision (because file usage incorrectly reaches zero, causing the permanent file to be marked temporary, and then eventually being deleted — after 6 hours by default).
There's likely also other bugs, such as when changing the default language — see #2810355: $entity->isDefaultTranslation() behaves incorrectly when changing default translation, causing file/image field usage to be set to zero, causing files to be deleted for that. Or when deleting a translation.
Steps to reproduce
- install the
languagemodule - create 2 languages: X & Y
- install the
content_translation - mark the
articlenode type'sbodyfield as translatable - create an article in language X, using the
Basic HTML
format for thebodyfield, with an image inserted into thebodyfield via the text editor's image dialog - verify that the uploaded file has its
statuscolumn in thefile_managedtable set to1,
and it has 1 usage in thefile_usagetable - create a new translation in language Y for that node, uncheck "Create new revision", then save (without changing any fields)
-
- expected result: the uploaded file has 2 uses in the
file_usagetable - actual result: the uploaded file has 1 use in the
file_usagetable
- expected result: the uploaded file has 2 uses in the
- go to either translation and remove the
<img>from the body -
- expected result: the uploaded file still has its
statuscolumn in thefile_managedtable set to1, and it has 1 use in thefile_usagetable - actual result: the uploaded file now has its
statuscolumn in thefile_managedtable set to0, and it has 0 uses in thefile_usagetable
- expected result: the uploaded file still has its
- run cron 6 hours later (this is the default value of the
temporary_maximum_agesetting insystem.file) -
- expected result: the uploaded file is still present (because usage > 0)
- actual result: the uploaded file is absent (because usage = 0)
Proposed resolution
- Extend test coverage to ensure all potential edge cases wrt translations & revisions are tested.
- Fix logic to make tests pass.
To prevent this class of bugs in the future, Entity Field API should provide an API to iterate over the values of a field across all translations & revisions.
Remaining tasks
- Have >=1 Entity Field API maintainer either provide code or review until the code is doing what they deem is the correct way to iterate over the values for a certain field across all revisions and all translations.
- Decide whether this logic should live in a helper function of
editor.moduleor whether it belongs in Entity API, and whether we should block this issue on that. - Expand test coverage to cover all edge cases for translations & revisions. We likely need input from an Entity Field API maintainer to indicate what all those edge cases are.
- #2810355: $entity->isDefaultTranslation() behaves incorrectly when changing default translation, causing file/image field usage to be set to zero, causing files to be deleted must land before the expanded test coverage can pass.
- (soft requirement) Fix the other bug surfaced here by @Leksat (in #17.1 + #18.1 + #22 + #26) — we have #2892368: editor.module's hook_entity_update() does not count file usages correctly when adding/removing images without creating new revisions for that. Ideally it'd already land since it'd simplify this patch, and that in itself does not cause data loss, so there is no need for an update path, which makes that one simpler.
User interface changes
None.
API changes
TBD
Data model changes
None.
| Comment | File | Size | Author |
|---|---|---|---|
| #49 | interdiff.txt | 656 bytes | dawehner |
| #49 | 2708411-49.patch | 21.16 KB | dawehner |
| #44 | interdiff.txt | 10.76 KB | dawehner |
| #44 | 2708411-44.patch | 21.11 KB | dawehner |
| #38 | 2708411-29v8.2-2.patch | 20.33 KB | quiron |
Comments
Comment #2
LuxianPS: patch was created using 8.0.x branch, but apparently nothing change in 8.1.x. I hope it will apply :)
Comment #4
LuxianBy mistake I uploaded the reverted patch. This one should work.
Comment #5
wim leersGreat catch! :) Patch looks great, this now just needs test coverage.
\Drupal\Tests\editor\Kernel\EditorFileUsageTestalready has test coverage, this patch just needs to expand it.Comment #6
LuxianI just realized that an update hook will also be welcomed for content that was already created :(
Comment #7
wim leers#6 Hm… you are probably right. That should not be too difficult fortunately. But first, let's focus on test coverage.
Comment #8
LuxianOk, this patch contains only the test, let's see how it fails. Complete patch will follow.
Comment #9
LuxianOf course I forgot the patch :-)
Comment #11
leksat commentedBased on Luxian's work.
Overall:
- added editor_entity_translation_insert() and hook_entity_translation_delete()
- editor_entity_update() updated to iterate over all changed translations
Comment #13
leksat commentedI think I found a better way to do the same. I reverted the previous changes to the editor.module and updated _editor_get_file_uuids_by_field() to gather field text from all field translations.
Comment #14
leksat commentedAdded an update hook to recalculate file usages.
Comment #16
leksat commentedA test failure in #13 seems to be unrelated.
The patch looks good from my side. Ready for reviews!
Comment #17
wim leersI'm not sure this is actually better. Can you explain why this is better?
I don't think this is a good idea. It's an API change. And I don't think this is even necessary.
It's possible to use
->delete()instead.This won't work. What if you have 10K, 100K, 1M entities?
This must use the batch API. Although I'm not sure how to do that in updates. I think there are already some update hooks that do this though.
Comment #18
leksat commented@Wim Leers, thanks for the review!
1. I noticed that during entity insertion, the editor module counts all file usages. I mean, if a single image is used twice in a field, the usage will be 2.
However, during entity update, array_diff() is used. It removed duplicate elements from the resulting array so that file usages may be updated incorrectly.
For example:
- we create an entity referencing two identical images [1, 1] => 2 usages of the file
- we update entity removing 1 image => array_diff([1, 1], [1]) => [] => file usage stays the same
- we update entity removing the last image => image usage is decreased by 1 => 1 "lost" file usage remains
I was able to reproduce this on simplytest.me
Probably this also needs a test coverage.
2. Yeah, this is not nice. Will try to find a better way.
3. Even if we have 1M entities, memory usage should be fine. But, yeah, I forgot about the time limit. Will check for batch examples.
Comment #19
wim leersWe don't have explicit test coverage for that, but I don't think that's wrong. If you have two file fields on an entity, and they both reference the same file, you'd also count it twice.
Great work here, thank you for pushing it forward!
Comment #20
leksat commentedReverted API changes. The update function now uses batch.
Comment #21
wim leers#20 addressed #17.2 and #17.3, but not #17.1. See #19.1 for why I think the entire
diffOnce()thing is wrong/unwanted.AWESOME work on the Batch-based update hook! Did you test it? Does it work?
Comment #22
leksat commented@Wim Leers,
For #17.1:
I think there is a misunderstanding. The diffOnce() thingy does not change the current editor behavior (if one image is used 2 times in a node, it will be counted twice). Instead, it fixes an additional bug described in #18.1. Can you please check that comment again?
I tested batch-update manually:
- generated 150 nodes and 150 files with devel
- updated some nodes to have some images (including one-image-several-times case)
- applied patch, ran update
- step-debugged important places of the update function
-- ensured that usages were first cleared and then recalculated
-- ensured that batch works
- ensured that new usages are correct
Thanks for the review!
Comment #23
wim leersThis is great! It means I can remove the tag. But I'm afraid I can't remove the tag: your manual testing needs to be proven by an automated test too.
Comment #24
leksat commentedUhh... Okay. Never did this before, but I hope there are some examples in the core.
Comment #25
swentel commentedAlso wondering out loud, do we need a test also that file usage is /not/ incremented when the body itself is not translatable (so synced accross translations) ? Granted, that scenario is probably not something that will happen a lot in the real world, but it's a use case though.
Comment #26
wim leersOh! Sorry, I indeed misread. Thanks!
I'm afraid you're 100% correct :) We totally failed to add test coverage for this case in #1932652: Add image uploading to WYSIWYGs through editor.module.
That means this patch is close, and on the right path, but just needs a bit more refinement:
This needs unit test coverage in
\Drupal\Tests\Core\Common\DiffArrayTest.Trailing space.
This is NOT iterating over all revisions! That seems wrong.
Needs FQCN. (Yes, this was already wrong before.)
Comment #27
wim leersI was going to remove the tag, because we do have "regular" tests (just not yet upgrade path tests), but @swentel just suggested another edge case that doesn't have test coverage yet.
Comment #28
wim leersSee #2725415-18: Text Editor module fails to track usage of images uploaded in text_with_summary fields, allows uploaded images to be deleted and #2729953: Clarify handling of multiple references to the same file from entities. Let's change the behavior of multiple references to the same file in a single revision in a separate issue, that's out of scope here.
Comment #29
leksat commentedIn this patch:
- added unit test for DiffArray::diffOnce() (that was a really good idea! I caught a fatal error :))
- update now processes revisions as well
- added test for the update
Important: the update test uses Entity API prior to
$this->runUpdates()call. That works, but I'm not sure if that's how it should be. If API usage is not allowed before updates are executed, I guess the only way would be to provide a new database dump for this update.Comment #30
gábor hojtsyComment #31
gábor hojtsyUsing the proper media tag now.
Comment #33
dawehnerI'm we are loosing files, I think this is by no way not a critical issue.
Comment #35
dawehnerI'm seriously wondering whether we should have form of iterating over translations and revisions or so. Having custom behaviour here feels really weird.
Comment #36
wim leersI very much agree. Back when this was added, every reviewer understood:
hook_entity_(insert|update|…)()to be operating on/called for every revisionThis is clearly not true.
Entity API's hooks are extremely confusing in that regard. It's likely lots of contrib modules will have the same problems. In fact, that's apparently why a bunch of new hooks were added in #2382675: hook_entity_create() affects the data of new translations of existing entities in unexpected and undocumented ways. This was committed in November 2015, shortly before D8's release. This logic in
editormodule was written more than two years before that.Furthermore, the entity translation "insert" and "delete" hooks were added around the same time (#1810370: Entity Translation API improvements) as this code was written. The "entity translation create/delete" hooks are about "translation storage". There is no corresponding "update" hook. This seems to further confirm that just looking at revisions should be okay, but apparently it is not.
So, at this point, I have no idea what is actually the correct way. We really need better documentation. Should we move this to the
entity systemcomponent for feedback from the Entity API maintainers?It's also not clear to me why the much simpler approach in #4 is not sufficient. #11 changed the implementation significantly.
Comment #37
quironHi all,
I needed the patch #29 but for 8.2 there was an error on the patch apply, so I have recreated it for 8.2.
hope be usefull.
Comment #38
quironSorry, there was a typo error :S
this is the correct one
Comment #39
wim leersThe comments/work in #1239558: Deleting an entity with revisions and file/image field does not release file usage of non-default revisions, causing files to linger from #83 until #90 only further confirm what I wrote in #36.
Comment #40
gábor hojtsyA translation is not a revision and a revision is not necessarily a translation. Revisions and translations are the two axes on the "spreadsheet" of an entity. If you use the UI AND have revisions enabled, then a new translation change would create a new revision (with copy of all data for other languages in that revision). If you don't use revisions or you use the API, then you can change multiple translations within one revision. Basically the revisions are columns on the spreadsheet and translations are rows. Does that help? :)
Comment #41
wim leers@Gábor: Yes, that's clear now. But the thing is that it's very unclear in Entity API itself, which is why nobody noticed this during the review process back when this functionality was added. It's not clear from Entity API documentation, and it's not clear by reading Entity API's hooks. Grep
entity.api.phpfor . I think it'd be worth adding #40's explanation toentity.api.php.Comment #42
dawehnerMaybe we could open up an issue to either document that or even provide some API helpers to deal with that.
Note: Using the entity API in a
hook_update_Nis dangerous. I could imagine that this is safe to be done in ahook_post_update_NNote: We could use a dataprovider here.
Comment #43
dawehnerWorking on it for a while
Comment #44
dawehnerhook_post_update_N()Comment #46
leksat commentedI don't really remember, but I guess I did it this way because it's theoretically possible to programmatically create a new entity having several translations from the beginning. I doubt if Drupal will fire translation hooks for each translation in this case.
Comment #47
wim leers#2810355: $entity->isDefaultTranslation() behaves incorrectly when changing default translation, causing file/image field usage to be set to zero, causing files to be deleted looks suspiciously similar. That'd not be surprising, because
editormodule essentially copied this logic fromfilemodule.Comment #48
alexpottRecently @catch, @webchick, @xjm, @cottser, @cilefen and I discussed #2801777: Give users the option to prevent drupal from automatically marking unused files as temporary , #2810355: $entity->isDefaultTranslation() behaves incorrectly when changing default translation, causing file/image field usage to be set to zero, causing files to be deleted and #2708411: [PP-1] editor.module's editor_file_reference filter not tracking file usage correctly for translations. We all agreed that the current file usage behaviour that is resulting in files being deleted when they are being used by a site is unacceptable and a critical issue. We've created #2821423: Dealing with unexpected file deletion due to incorrect file usage to plan for how to fix and mitigate the related issues.
Comment #49
dawehnerThis could fix a bunch of those errors ...
Comment #51
berdirSounds like you had a loop in the update function?
This is a bit strange.
LANGCODE_NOT_APPLICABLE is an actual language that an entity can have, here we use it to indicate that we should not get a specific translation but pick the one we got.
It would make a lot more sense to merge those two things together, and only loop in the if case.
if (is translatable) {
loop over languages, merge the text together)
else {
just do it once with the current translation, as it doesn't matter through which translation we access it.
We will duplicate 2-3 lines of code with that, but I think the result will make more sense.
this isn't quite so easy and might be the reason for your loop?
here you care about revisions, but you still store the result as entity_ids in either case.
also $result is weird when you actually have the count?
here you don't seem to care about revisions anymore, so you will never reach the total.
Also, pager() doesn't make sense, pager() relies on GET query arguments. You need to page yourself, by doing the proper range().
One approach for dealing with revisions would be to query over entity IDs and then load all revisions for that if it supports revisions. And hope that there aren't millions of revisions ;)
You could also have a look at paragraphs_post_update_set_paragraphs_parent_fields(), that's a "loop over all revisions or ids" post update function that we wrote in paragraphs and it *seems* to work :)
Comment #53
tstoecklerSeems this not being re-added, unless I'm missing something.
Comment #54
wim leersI went through the entire issue.
The following comments/discussions were especially important:
editor.module'shook_entity_update()implementation. In #9, he also added test coverage.hook_entity_update(), but instead useshook_entity_translation_insert(). He also implementedhook_entity_translation_delete(). And expanded the test coverage.hook_entity_translation_insert()+hook_entity_translation_delete()implementations. So this goes back to only relying onhook_entity_update()to also detect the "change events" thathook_entity_translation_insert()+hook_entity_translation_delete()are specifically designed for.editor_file_referencefilter plugin.editor.module'shook_entity_update(): not only does it fail in case of new/removed translations, it also fails when adding/removing images without creating a new revision (i.e. when editing a particular revision): if images are added/removed, then the file usage is never incremented/decremented accordingly. It's probably a good idea to split this off to a separate issue. Opened #2892368: editor.module's hook_entity_update() does not count file usages correctly when adding/removing images without creating new revisions for this.hook_update_N(), buthook_post_update_N(). He made that happen in #44 + #49.Having looked at this again, I now disagree with my own analysis in #47. It's a similar problem space to #2708411: [PP-1] editor.module's editor_file_reference filter not tracking file usage correctly for translations, but it's not the same. The difference:
editor.module's use ofhook_entity_insert()+hook_entity_update()+hook_entity_delete()+editor_entity_revision_delete()that were introduced in #1932652: Add image uploading to WYSIWYGs through editor.module. It looks like it should also usehook_entity_translation_insert()+hook_entity_translation_delete(), but those were added after #1932652 had already been started, in #1810370: Entity Translation API improvements. But it also turns out that we actually don't even want to use those hooks (as the last several patches demonstrate) — what we want is for Entity Field API to provide a standardized way to iterate over the values for a field across all revisions and all translations.\Drupal\file\Plugin\Field\FieldType\FileFieldItemList::delete()removing file usages, and it relying on$entity->isDefaultTranslation()(i.e.\Drupal\Core\TypedData\TranslatableInterface::isDefaultTranslation()) to determine whether a translation is being deleted or not, and apparently$entity->isDefaultTranslation()behaves incorrectly in case the default translation is being changedFinally, a patch review.
I think this patch is in a fairly good place. But there are certain things that still need to happen:
[PP-1].editor.modulethat @Berdir can then review & approve. Then we'll have a working, green patch, with all the logic sound. Then either as a blocker or in a follow-up issue, we can turn that into a proper API. Perhaps onContentEntityBase. Perhaps as a trait. Something likegetFieldValuePerRevisionAndPerTranslation($field_name)which returns an array with each item in the array containing['value' => …, 'revision id' => …, 'langcode' => …]. Let's discuss the details.Comment #55
wim leersCleaning up issue tags.
Comment #56
wim leersFix HTML in IS.
Comment #57
wim leersFix HTML in IS.
Comment #58
wim leersOne final clarification in the proposed resolution.
Comment #59
wim leersThis is actually not limited to . It's that there's no translation handling at all.
Comment #60
catchNow that #2801777: Give users the option to prevent drupal from automatically marking unused files as temporary is in, this no longer results in unexpected file deletion. #2821423: Dealing with unexpected file deletion due to incorrect file usage is open as a critical meta-issue to discuss the broader approach on this. So I’m going to downgrade this to major after discussion with @xjm and @cilefen, since we feel effort would be better spent on replacing the file_usage system at this point. We can still fix this bug of course as a major.
Comment #64
wim leersMarked #3090997: Translation and revisions is not supported when file upload to editor as a duplicate.