Problem/Motivation
Right now, when you complete a Drupal 8 migration, AFAICS in all cases, you get your node/comment titles moved over, but the bodies display as blank:
The data for the fields is in the database, but getting stripped out on the front end.
According to the migrate team, this happens when filter formats not recognized by Drupal 8; they get migrated as filter_null, a filter that simply returns an empty string. However, to an "end user" it just appears that migrations don't work, even for very simple core cases where no fancy non-core input formats were used.
Major/Migrate critical because this represents a pretty significant usability hurdle, and harms user confidence in the system early on.
Proposed resolution
Instead, map mismatched filter formats to fallback_format (defaults to plain text).
The one possible reason not to do this is PHP code filter, where you could inadvertently expose database credentials that would now be exposed in plain text. However, we could probably deal with this case specially. (For example, explicitly setting that one to filter_null to indicate it needs attention.)
Remaining tasks
Write a patch + testsReview itCommit it!- Publish change record
User interface changes
By default, migrations will now display at least something for body field content for nodes and comments. (As well as other rich text fields.)
API changes
?
Data model changes
N/A?
Release notes snippet
Migrations now default to the fallback filter format, fixing a problem where rich text fields were not displaying post-migration.
Comment | File | Size | Author |
---|---|---|---|
#38 | 3061571-36.patch | 19.61 KB | alexpott |
#38 | 21-36-interdiff.txt | 1.76 KB | alexpott |
#23 | node-view-no-body-displayed.png | 6.35 KB | mradcliffe |
#23 | node-edit-no-text-format.png | 12.87 KB | mradcliffe |
#21 | 3061571-21.patch | 19.22 KB | Wim Leers |
Comments
Comment #2
mikelutzOf course, in actually looking into the code after the call and reminding myself how it actually works and why it's not going to be quite that simple. :-)
Formats such as plain_text, full_html, etc in the standard profile consist of a collection of filter plugins such as filter_html_escape(Display any HTML as plain text) or filter_url(Convert URLs into links) or the dreaded php_code filter included in D7. Each filter is run in succession, taking the output of the previous filter transforming it somehow, and then passing it on.
Formats are all migrated from the D7 site. If you had plain_text in D7, migrate creates a plain_text in D8. If you have a php format in d7, you get a php format in D8. When we migrate the formats, we migrate each filter in the list. They changed from an id and callback returned from hook_filter_info() in D7 to to a plugin in D8 but the core filters all have the same id.
Filters live in code. They are defined in hooks in D7 and as plugins in D8, but they cannot be migrated, they must be provided by core or an installed module.
During the format migration, the system loops through the format's filters if a D7 filter id doesn't match a D8 filter plugin id, it's the filter that is replaced with filter_null in the format's chain (which takes anything and spits out an empty string to the next filter in the chain). In core, it's usually the php_code filter that is the one getting stripped out, although a contrib or custom filter that is missing in D8 would also be converted to filter_null in the chain.
So the field has a valid format, it's just the format itself that contains one or more invalid filters.
So, we have to make some decisions. I strongly suspect the biggest offender will be the php_code filter, since that was provided in core in D7 and there is no replacement in D8. I wonder if it would be reasonable to add a new php_code filter to core which strips out php code without evaluating it (we can certainly change the name/id and map it in a migration). I also wonder if it would be a good idea to add an empty or pass-through filter to core, and maintain a list of popular contrib d7 filters that are considered safe to replace with a pass through, rather than forcing any filter that doesn't exist on the new site to be replaced with filter_null. At least then we could try to keep as many of the site's formats as possible.
Otherwise, what the original plan translates to is essentially:
Attempt to migrate a format
Loop through the format's filters
IF a filter is not found THEN
IF the filter is php_code THEN
replace the filter with filter_null
ELSE
abort migrating the format at all
After which, the field migration will run and the text fields will migrate with a invalid format, and then fallback to the fallback format.
Thoughts?
Comment #3
heddnHere's my thoughts after reviewing the notes from above.
Let's work on simple solution first. It would require a CR, so tagging.
Comment #4
bogdog400 CreditAttribution: bogdog400 commentedI found this page because my body fields were disappearing. After pulling out some hair, I found this error in the log:
Missing filter plugin: filter_null.
After stumbling around, I found that if I went to Home>Administration>Configuration>Content authoring and started editing the content types, I would start getting messages that "filter_null" couldn't be found and so it would be deleted when I saved the text format. After saving the text format for Full HTML, the body text started appearing.
I think it would be better to have a better default than filter_null that just deletes content.
Comment #5
mikelutz@bogdog400 nothing is deleted, you simpler have to configure your formats to allow it to display.
It's tricky from a security issue. Because we don't know what the missing filter actually did, we don't know what of the content is actually safe to display, so we really can't display any of it by default. The purpose of this issue is to find some way for a person to make that decision during the migration rather than having to find out what to do later.
Comment #6
bogdog400 CreditAttribution: bogdog400 commentedYes, I understand that it doesn't delete content permanently, but it disappears from the page in a way that could be caused by dozens of different issues. Perhaps it's a misplaced block. Perhaps it's a permissions issue. I went through a dozen hypotheticals before finding this solution.
It might better to have an error filter that just inserts an error message with a pointer to an issue page.
Comment #8
Wim Leers#2946889-22: Missing migration filters that are replaced with filter_null may have invalid settings applied is a step in this direction.
I think the next step is to build a registry of filters and/or an audit step ahead of the migration that allows the user to choose to drop certain filters.
For example, the https://www.drupal.org/project/typogrify and https://www.drupal.org/project/image_resize_filter filters are purely cosmetic. If we'd have a registry for those, we could safely drop those during migration, which would then result in the body fields showing up right away.
Contrast this with e.g. https://www.drupal.org/project/markdown — without this, you'd get nonsensical output.
Comment #9
Wim LeersA different proposal
I propose an intermediate solution which combines the best of both.
In Drupal 8, filters must declare their type — this was introduced in #1782838: WYSIWYG in core: round one — filter types (change record:
https://www.drupal.org/node/1817474).
If we'd have similar metadata on Drupal 7 filters, we could make much smarter decisions automatically. Like I already mentioned in #8, not all filters are equally important.
Today's reality
Right now, any text format that uses a contrib filter gets a
filter_null
filter inserted automatically (the only exception appears to be the Markdown module + filter, thanks to #2711053: Migration from D7 broken in 8.1.x). Any timefilter_null
is used, all body fields using that text format show up as empty.Even if the filter is purely cosmetic: typographical enhancements, URL transformations, tokens getting expanded — those things not working automatically after a migration is acceptable. It's easy to install add a filter to a text format after the migration. That's the whole point of the filter system: filters are non-destructive. Which means the user's original input data is never modified, and any fancy transformations happen at render time only.
This looks broken and is scary: it seems as if all content was lost during the migration. It makes it seem as if content was destroyed.
Proposed new reality
Maintain in Drupal 8 core's
\Drupal\filter\Plugin\migrate\process\FilterID
filter types for the most commonly installed contributed filters.If a filter does not have a Drupal 8 equivalent and it is a transformation-only filter:
EDIT: the D7 contrib filters in the patch are the first ten I encountered while going through https://www.drupal.org/project/project_module?f%5B0%5D=&f%5B1%5D=&f%5B2%... :)
Comment #10
Wim LeersRebased.
Comment #11
Wim LeersSee #3104111: [meeting] Migrate Meeting 2020-01-02 — we have +1s from Migration system maintainers @mikelutz and @heddn! 🥳
Comment #12
Wim LeersI audited the top 300 editor/filter modules, or the first 6 pages out of a total of 14 pages. The pages were sorted by most installed first. Module 300 had only 89 sites using it. That's less than 0.013%.
Which means we're firmly in the 99.9% territory of D7 sites benefiting from the metadata I've gathered here.
(Note that I first used http://grep.xnddx.ru/search?text=filter_info%28%29&filename= to find all modules providing filters, but I noticed pretty quickly that the vast majority of modules were not being found, so I had to resort to a manual process of browsing d.o, navigating to gitlab, selecting the right branch, searching for
filter_info
.)Comment #13
Wim LeersI was uniquely qualified to perform the analysis in #12. But adding the test coverage and refactoring per @mikelutz and @heddn's guidance can be done by anybody who is familiar with the migration system. So, unassigning.
(I might get back to this, but if somebody else can push this across the finish line in the mean time, that'd be amazing.)
Comment #14
heddnWe don't use t() in messages. Its because they are like exceptions.
Comment #15
Wim LeersSome more stats for #12:
TYPE_MARKUP_LANGUAGE
, so cannot be dropped (because without them, it's not humanly possible to compare the content on the source and destination)TYPE_HTML_RESTRICTOR
, so cannot be dropped (for security reasons)In other words, #12 enables many more D7 sites to migrate to D8 without necessarily having the same filters be available. In some cases, the filter is entirely cosmetic, in others it's only used in a relatively small number of cases so easy to fix manually, and in others it's going to be desirable to port the filter later — during the "polish" phase of the project.
Comment #16
mradcliffeI do like this message having the plugin id so I think that
sprintf
would work if the ids are safe values to print. Does the message does get printed in the UI when using Migrate Drupal UI? I don't remember.Comment #17
Wim Leers#16: No, messages are not exposed in
migrate_drupal_ui
, they're only available via the existing "watchdog" UI.Comment #18
Wim LeersComment #19
heddnStill needs tests. Since there's a lot of these, probably a sampling of the filter ids are needed.
This could use some docs on why this is needed. or split it out into another issue. I don't see any reason for its existence previously mentioned in this issue.
Comment #20
Wim LeersAdded test coverage and fixed coding standards violations.
Comment #21
Wim LeersRE:
SubProcess
: I'd swear I needed to do this for stuff to work … but now it's been a month since I wrote that code and I stupidly didn't write down the "why" back then 🙈So … let's remove it and let's see if it passes!
Comment #22
Wim Leers🥳 Green! Happy to see the
SubProcess
hunk gone :)Change record created: https://www.drupal.org/node/3109982
Comment #23
mradcliffeI applied the patch to 8.8.1 and ran a migration from Drupal 7. This didn't seem to get me back to a working state with content being displayed on nodes.
On Drupal 7, the Filtered HTML (id: 1) text format has the following filters enabled (url, autop, wysiwyg, htmlcorrector). The only non-core one is wysiwyg filter provided by wysiwyg_filter. This should get picked up by the patch on migration, and it does, but the end result does not display the content.
On Drupal 8, the filter format migration failed to import the Filtered HTML (id: 1) text format. There are two messages:
1. Filter wysiwyg could not be mapped to an existing filter plugin; defaulting to filter_null
2. The configuration property filters.filter_null.settings.nofollow_domains.0 doesn't exist. (/mnt/gfs/ACQUIASITE/livedev/docroot/core/lib/Drupal/Core/Config/Schema/ArrayElement.php:76)
The nodes that have this text format do not show any content.
So it looks like the patch is working, but still failing to do what is proposed because the configuration schema doesn't match for filter_null and it's still trying to migrate the settings from drupal 7.
Comment #24
mradcliffeOh, I see. There's a change in 8.9.x that makes this not compatible with 8.8.x :(
I'll put this back in Needs review and see if I can replicate in 8.8.x by applying the change to FilterSettings.
Comment #25
mradcliffeOnce I applied the commit from #2946889: Missing migration filters that are replaced with filter_null may have invalid settings applied, re-ran the filter format migration as an update, then the Filtered HTML text format was created, saved it, and the node edit screen displayed correctly in Drupal 8.8.1.
Comment #26
Wim Leers#25: any chance you could include before vs after screenshots of what you describe in #25? 🙏😊
(Thanks for the manual testing by the way!)
Comment #27
myrto_a CreditAttribution: myrto_a as a volunteer and commentedfor me it was resolved by going to admin/config/content/formats and by enabling CKeditor where it was not set
Comment #28
mikelutzI like the approach and the list of contrib filters seems reasonable. The format migration is not generally one that it ran incrementally or updated, so this should provide no issues for people running ongoing migrations. It will, however help alleviate a common and confusing issue when running migrations, where body text doesn't appear to come through
Comment #29
alexpottCommitted and pushed f8c0cf381d to 9.0.x and adf48cb3cc to 8.9.x. Thanks!
I think we should consider backporting this task to 8.8.x as it makes migrating to D8 simpler.
Comment #32
Wim Leers+1
Comment #33
mradcliffeWe also would need to back port #2946889: Missing migration filters that are replaced with filter_null may have invalid settings applied.
Comment #34
mikelutz#2946889: Missing migration filters that are replaced with filter_null may have invalid settings applied had already been backported to 8.8. I don't object to putting this in 8.8 too. It technically changes how a process plugin works, but as I said above, this is a migration you run once, so you either have the old behavior or the new, we are not breaking any ongoing process, plus the change is only ever an improvement, plus it's just so important to the project to make migrating as easy as possible that I'm +1 to putting this in 8.8 if possible.
Comment #35
alexpottThis broke PHP 7.4 due to
I hotfixed the test to not create an exception but pass a flag in instead.
Comment #38
alexpottHere's a complete patch for 8.8.x
Comment #39
mikelutzComment #40
je1 CreditAttribution: je1 commentedApplied this patch to 8.8.x, and did "drush mim upgrade_d7_node_blog --update", the blog entries updated with correct created dates and titles but the blog entry body and summary are blanks.
got several of these from the drush command:
Missing file with ID 24647. ImageItem.php:329 [warning]
reports shows:
Location http://.../admin/config/content/formats/manage/php_code?_wrapper_format=drupal_ajax&ajax_form=1&destination=%2Fadmin%2Fconfig%2Fcontent%2Fformats
Message Missing filter plugin: filter_null.
Not sure how to go about fixing that, just migrated from d7 to d8 and need the content migrated. The book type contents worked but not the basic pages or blogs content.
Comment #41
Wim Leers#21: found why it was necessary — see #3126063: Harden SubProcess process plugin.
Comment #42
xjmGot confused trying to apply this to 9.1.x. It's already been committed and the 8.8 patch is in #38.
Comment #44
quietone CreditAttribution: quietone as a volunteer commentedNot sure why this is NW the tests are passing. And I reran them to be sure. So, restoring the RTBC from #40.
Comment #47
benjifisherThe bot moved this issue from 8.8.x to 8.9.x. Since this issue has already been fixed for 8.9.x and was open as a backport to 8.8.x, I am marking it fixed and removing "backport" from the title.
Comment #48
quietone CreditAttribution: quietone as a volunteer commentedThere is still a bit to do here. The change record needs to be published. It looks fine to me but I don't know much about filters.
Comment #49
benjifisherAs my comment (#47) says, I meant to mark this issue Fixed, but I seem to have marked it Active. #multitasking
Yes, the CR is still unpublished. That CR refers to both this issue and the follow-up #3126063: Harden SubProcess process plugin. I am not sure what the right thing to do is. Maybe we should do our best to get that issue fixed, or maybe we should update the CR to describe how the system currently works.
I am also not sure what the status of this issue should be right now. Maybe NR is best, since at least that means it is more likely to be noticed at the weekly migration meeting.
Comment #50
benjifisherAt #3154841: [meeting] Migrate Meeting 2020-07-09, we agreed that the CR can be published even though the follow-up issue is still RTBC. I published the CR, so now I will mark this issue as Fixed.
Comment #52
TomChiverton CreditAttribution: TomChiverton commentedStill happens on fresh D7 to D9 migration.
Comment #53
Kristen PolI'm getting:
errors during a Drupal 6 => Drupal 9 migration via Migrate Drupal UI.
UPDATE: It seems like going through and saving all the text filters may have fixed this. When I re-saved the node, it didn't but when I went and saved all the text filters, I see the content now. It's possible it's from something else but it's worth a try if people are having issues.
Comment #54
quietone CreditAttribution: quietone as a volunteer commentedThis results when the source uses a filter that does not exist on the destination. So not really a bug and the data has been migrated. To get it to display see the sections about filters at Known issues when upgrading from Drupal 6 or 7 to Drupal 8