Problem/Motivation

Drupal 8 has separate field storage types Text (plain) and Text (formatted). There's also Text (plain, long) and Text (formatted, long). The important part here is that this selection is done on field storage level.

In Drupal 7, the text processing settings are defined on Field instance settings. In other words, the text processing settings can be plain text on one content type and filtered html on another.

In Drupal 6, the text processing settings are defined on global field settings which is equivalent of Drupal 8 field storage settings.

There are two issues on how text fields are migrated to Drupal 8.

  • Drupal 7 text fields and long text fields are currently migrated to Drupal 8 as Text (formatted) / Text (formatted, long). This is problematic from two different aspects: a) single-line text fields are migrated as Text (formatted) which is rarely the desired behavior and b) migrating content of D7 plain text field to a Drupal 8 Text (formatted) field might have security implications if the content filtering settings are not correctly set.
  • When analyzing the Drupal 7 issue mentioned above, it was identified that the Drupal 6 text formatting settings were not respected even though they can be mapped 1:1 to Drupal 8. This topic is handled as a separate child issue: #2849861: D6 text area formatting settings not respected when migrating to D8

Proposed resolution

For the Drupal 7 text fields, the proposed resolution is as follows:

* Common base for d7_field, d7_field_instance sources
* if the text or text_long field has only plain text instances, migrate to string or string_long field.
* If the text or text_long field has only filtered text instances, migrate to text or text_long field.
* if the text or text_long field has both plain text and filtered text instances, don't migrate and log a message indicating the required steps in order to fix this (fix the instances on the source site or use custom migration)
* If there are instances of both text-y and string-y types, string-y instances fall back to text with plain_text filter
* 'entity_exists' process plugin (to check for plain_text filter)
* 'log' process plugin (to log errors if the filter doesn't exist)

The proposed resolution for the Drupal 6 issue is covered in the child issue: #2849861: D6 text area formatting settings not respected when migrating to D8

Remaining tasks

  • Agree if the proposed resolution is the way to go.

  • Figure out how the D7 field instances can be fetched from the context of the field storage migration.

  • Modify the field storage migration logic so that it takes the D7 instance filtering settings into account.
  • Tests, lots of them.

User interface changes

None.

API changes

None, hopefully.

Data model changes

None, hopefully.

---clips---
When this issue was analyzed, all possible permutations of the Drupal 6 and Drupal 7 text fields were analyzed. The findings of this analysis is listed below for reference purposes.

Summary of the analysis: D7 - D8 migrations

Scenario ID: D7/1

D7 Field type: Text
D7 Widget type: Text field
D7 Global field settings: 255 or less
Current D8 field storage type: Text (formatted)

This scenario is currently being discussed in the comments of this issue. D7 text processing settings are on field instance settings, not in global field storage settings. It is not possible to determine the text processing from field instance settings so we need to choose if we want to migrate to Text (plain) or Text (formatted).

Single-line text fields are typically not used for formatted text (but of course it is possible that this would be the case). It is being proposed that we would change the migration of this scenario so that the target D8 field type would be Text (plain) instead of Text (formatted).

Scenario ID: D7/2

D7 Field type: Text
D7 Widget type: Text field
D7 Global field settings: 256 or more
Current D8 field storage type: Text (formatted)

This scenario has a bug: if the field length exceeds 255 characters, we have data loss because the current D8 target field type Text (formatted) is limited to 255 characters. Child issue has been created to change the target field type to Text (formatted, long). See #2849864: D7 text fields with long field length incorrectly migrated to D8.

This scenario is currently being discussed in the comments of this issue. The same comments as in the scenario D7/1 apply with the exception that the discussion is whether the D8 target field type should be Text (plain, long) or Text (formatted, long). Update: now that the field length is working as expected, this scenario is effectively same as D7/1. In other words, the question is between Text (plain) vs. Text (formatted).

Scenario ID: D7/3

D7 Field type: Long text
D7 Widget type: Text area (multiple rows)
D7 Global field settings: n/a
Current D8 field storage type: Text (formatted, long)

No issues found in the analysis. No changes proposed. This scenario is here for reference purposes only.

Scenario ID: D7/4

D7 Field type: Long text and summary
D7 Widget type: Text area with summary
D7 Global field settings: n/a
Current D8 field storage type: Text (formatted, long, with summary)

No issues found in the analysis. No changes proposed. This scenario is here for reference purposes only.

Summary of the analysis: D6 - D8 migrations

Scenario ID: D6/1

D6 Field type: text
D6 Widget type: Text field
D6 Global field settings: a) Text processing Plain text b) max length less than 256
Current D8 field storage type: Text (plain)

No issues found in the analysis. No changes proposed. This scenario is here for reference purposes only.

Scenario ID: D6/2

D6 Field type: text
D6 Widget type: Text field
D6 Global field settings: a) Text processing Plain b) max length greater than 255.
Current D8 field storage type: Text (plain, long)

No issues found in the analysis. No changes proposed. This scenario is here for reference purposes only.

Scenario ID: D6/3

D6 Field type: text
D6 Widget type: Text field
D6 Global field settings: a) Text processing Plain b) max length not defined
Current D8 field storage type: Text (plain, long)

No issues found in the analysis. No changes proposed. This scenario is here for reference purposes only.

Scenario ID: D6/4

D6 Field type: text
D6 Widget type: Text field
D6 Global field settings: a) Text processing Filtered text b) max length less than 256
Current D8 field storage type: Text (formatted)

No issues found in the analysis. No changes proposed. This scenario is here for reference purposes only.

Scenario ID: D6/5

D6 Field type: text
D6 Widget type: Text field
D6 Global field settings: a) Text processing Filtered text b) max length greater than 255
Current D8 field storage type: Text (formatted, long)

No issues found in the analysis. No changes proposed. This scenario is here for reference purposes only.

Scenario ID: D6/6

D6 Field type: text
D6 Widget type: Text field
D6 Global field settings: a) Text processing Filtered text b) max length not defined
Current D8 field storage type: Text (formatted, long)

No issues found in the analysis. No changes proposed. This scenario is here for reference purposes only.

Scenario ID: D6/7

D6 Field type: text
D6 Widget type: Text area (multiple rows)
D6 Global field settings: a) Text processing Plain b) max length: does not matter
Current D8 field storage type: Text (formatted, long)

The analysis revealed a bug in this scenario. D6 has text processing on global settings. If the text processing has been set to Plain Text in D6, that should be maintained in the migration. A child issue has been created to handle this issue. See #2849861: D6 text area formatting settings not respected when migrating to D8

Scenario ID: D6/8

D6 Field type: text
D6 Widget type: Text area (multiple rows)
D6 Global field settings: a) Text processing Filtered text b) max length: does not matter
Current D8 field storage type: Text (formatted, long)

No issues found in the analysis. No changes proposed. This scenario is here for reference purposes only.

Scenario ID: D6/9

D6 Field type: text
D6 Widget type: Select list
D6 Global field settings: Does not matter
Current D8 field storage type: List (text)

No issues found in the analysis. No changes proposed. This scenario is here for reference purposes only.

Scenario ID: D6/10

D6 Field type: text
D6 Widget type: Check boxes / radio buttons
D6 Global field settings: Does not matter
Current D8 field storage type: List (text)

No issues found in the analysis. No changes proposed. This scenario is here for reference purposes only.

Scenario ID: D6/11

D6 Field type: text
D6 Widget type: Single on/off checkbox
D6 Global field settings: Does not matter
Proposal for D8 migration: Boolean

No issues found in the analysis. No changes proposed. This scenario is here for reference purposes only.

Members fund testing for the Drupal project. Drupal Association Learn more

Comments

jeffwpetersen created an issue. See original summary.

jeffwpetersen’s picture

Title: Plain Text D7 to D8 fails » Plain Text D7 to D8 error
jeffwpetersen’s picture

Title: Plain Text D7 to D8 error » Plain Text D7 to D8 format error
masipila’s picture

Title: Plain Text D7 to D8 format error » D7 Plain text fields incorrectly migrated to D8 as Text (formatted)

I'm also experiencing this issue.

On my D7 site, I have a field of type Text and this field is migrated to D8 as Text (formatted) when I was expecting it to be migrated as Text (plain)

When I was googling about this, I found this issue: #2324649: Migrate text fields correctly based on their text_processing setting.. The description of that issue says that fields without text processing will use string/string_long types and those with text processing will continue to be text_*.

If I look at the field_config database table in the D7 database, the config for one of my plain text fields look like this:

a:8:{s:9:"type_name";s:7:"gallery";s:12:"entity_types";a:0:{}s:8:"settings";a:4:{s:10:"max_length";i:255;s:15:"text_processing";s:1:"0";s:14:"allowed_values";s:0:"";s:18:"allowed_values_php";s:0:"";}s:12:"translatable";s:1:"0";s:7:"storage";a:5:{s:4:"type";s:17:"field_sql_storage";s:8:"settings";a:0:{}s:6:"module";s:17:"field_sql_storage";s:6:"active";s:1:"1";s:7:"details";a:1:{s:3:"sql";a:2:{s:18:"FIELD_LOAD_CURRENT";a:1:{s:29:"field_data_field_photo_credit";a:2:{s:5:"value";s:24:"field_photo_credit_value";s:6:"format";s:25:"field_photo_credit_format";}}s:19:"FIELD_LOAD_REVISION";a:1:{s:33:"field_revision_field_photo_credit";a:2:{s:5:"value";s:24:"field_photo_credit_value";s:6:"format";s:25:"field_photo_credit_format";}}}}}s:12:"foreign keys";a:1:{s:6:"format";a:2:{s:5:"table";s:13:"filter_format";s:7:"columns";a:1:{s:6:"format";s:6:"format";}}}s:7:"indexes";a:1:{s:6:"format";a:1:{i:0;s:6:"format";}}s:2:"id";s:2:"20";}

Unserialized version of the settings:

    [settings] => Array
        (
            [max_length] => 255
            [text_processing] => 0
            [allowed_values] => 
            [allowed_values_php] => 
        )

The text_processing settings is present but it has value 0. Could it be that the migrate script is only looking if the text_processing is set (evaluates TRUE in this example) when it should be checking if it is set and has a value that is something else than 0?

Cheers,
Markus

quietone’s picture

Issue tags: +migrate-d7-d8
masipila’s picture

Ok, found the root cause for this issue.

Background information of my D7 test site

My site is originally built on D5, upgraded to D6 and then to D7. I'm now about to migrate the D7 site to D8. My site has two languages, FI and EN. I don't think these have anything to do with this, but I'll mention it here anyway since I haven't tested yet with a vanilla D7 site yet.

Migration preparations steps:

  1. I created the new D8 site
  2. I enalbed language, interface translation, content translation
  3. I enabled migrate_upgrade, migrate_tools, migrate_plus
  4. I created the migrations with drush migrate-upgrade --configure-only

My findings

When I execute drush migrate-import upgrade_d7_field, I can see that getFieldType() of class \Drupal\text\Plugin\migrate\cckfield\TextField is invoked.

The code of this method is as follows:

public function getFieldType(Row $row) {
  $widget_type = $row->getSourceProperty('widget_type');

  if ($widget_type == 'text_textfield') {
    $settings = $row->getSourceProperty('global_settings');
    $field_type = $settings['text_processing'] ? 'text' : 'string';
    if (empty($settings['max_length']) || $settings['max_length'] > 255) {
      $field_type .= '_long';
    }
    return $field_type;
  }
  else {
    switch ($widget_type) {
      case 'optionwidgets_buttons':
      case 'optionwidgets_select':
        return 'list_string';
      case 'optionwidgets_onoff':
        return 'boolean';
      case 'text_textarea':
        return 'text_long';
      default:
        return parent::getFieldType($row);
    }
  }
}

What we can see here is that we are trying to get the source property widget_type of the row and if that type would be text_textfield, then we're trying to check what is the text processing settings. If $settings['text_processing'] evaluates FALSE, the $type is set to string.

The problem here is caused by the fact that at least on my site $widget_type = $row->getSourceProperty('widget_type') is always NULL, which leads to a situation where the logic described above is never reached and the method falls back to the else-clause.

I debugged this with with

$debug = $row->getSource();
drush_print(json_encode($debug));

Indeed, the $row seems not to have a property widget_type (it has type. It doesn't have a property global_settings either, it has settings.

Before creating a patch, I will investigate (hopefully later this weekend) if these properties are present on D6 CCK fields, but at least they are not on my D7 site.

Cheers,
Markus

masipila’s picture

Status: Active » Closed (won't fix)

I installed vanilla D6 and D7 sites to investigate more.

Text processing settings in Drupal 6, 7 and 8

  • Drupal 6: text processing settings are defined on global settings. In other words, the text processing settings affects all content types where the field is used.
  • Drupal 7: text processing settings are defined on instance settings. In other words, the text processing settings can be plain text on one content type and filtered html on another.
  • Drupal 8: there are separate field types Text (plain) and Text (formatted). In other words, the plain vs. formatted selection is made on the storage settings (like in D6), not on the instance settings (like in D7).

About the D7 - D8 text field migrations

As I mentioned in the earlier comment, the field storage settings are migrated in TextField::getFieldType(Row $row)

I investigated the source properties of the $row and there is absolutely nothing that would indicate the D7 text processing settings. This is quite logical given the fact that the text processing settings in D7 are defined on field instance, not field storage level. And because the field can be used on two different content types so that the field is plain text in one of them and formatted text in the other one, it is conceptually impossible to write a migration logic that could determine which one to use.

Conclusion: I'm closing this as a WONTFIX. :(

Ugly, quick and dirty workaround

Since this migration is a one-time operation, I used a very ugly workaround and edited core/modules/text/src/Plugin/migrate/cckfield/TextField.php. I added a hard coded list of my D7 site's plain text fields to the beginning of getFieldType method.

diff --git a/core/modules/text/src/Plugin/migrate/cckfield/TextField.php b/core/modules/text/src/Plugin/migrate/cckfield/TextField.php
index 89475ff..2c8379f 100644
--- a/core/modules/text/src/Plugin/migrate/cckfield/TextField.php
+++ b/core/modules/text/src/Plugin/migrate/cckfield/TextField.php
@@ -99,6 +99,13 @@ public function processCckFieldValues(MigrationInterface $migration, $field_name
    * {@inheritdoc}
    */
   public function getFieldType(Row $row) {
+    // Check if current field is one of the hard coded plain text fields.
+    $field_name = $row->getSourceProperty('field_name');
+    $plain_fields = ['field_a', 'field_b', 'field_c'];
+    if (in_array($field_name, $plain_fields)) {
+      return 'string';
+    }
+
     $widget_type = $row->getSourceProperty('widget_type');
 
     if ($widget_type == 'text_textfield') {

Cheers,
Markus

markabur’s picture

Status: Closed (won't fix) » Active

Hm, so when D8 migrates any single-line text field, it will always come in as text (formatted) rather than text (plain)? Surely that can't be the correct, expected behavior?

masipila’s picture

Hi,

Every field has two different settings: the field storage settings and the field instance settings.

Field storage settings in Drupal 8 include for example the field name and field type, e.g. Text (plain) or Text (formatted). If the field is used on multiple content types, the field storage settings apply to all content types. In other words, the field type (plain vs. formatted) is done on the field storage settings in Drupal 8.

In Drupal 7, the field storage setting have a field type, but it is just Text. In other words, the storage settings do NOT define anything about the formatting i.e. plain vs. formatted. In Drupal 7, this is defined in field instance settings.

1. Let's assume that our Drupal 7 site has two content types: cat and dog

2. Let's say we have a field_nickname, whichs is of type text in our Drupal 7 site.

3. Let's say that the field_nickname formatting setting on cat instance is plain text and on dog instanceformatted text.

4. Let's now consider the creation of field_nickname to our Drupal 8 site. Should it be of type Text (plain) or Text (formatted)?

Markus

markabur’s picture

The ideal would be for the migration script to create two separate fields, each with its own configuration.

If it can't do that then it should at least default to text (plain), because in the real world single-line text fields are pretty much always plain text. I've never seen formatted text used for a single-line text field.

catch’s picture

Priority: Normal » Major
Issue tags: +Migrate critical

Bumping to major and tagging - assuming this is valid it's a very hard situation to get out of.

masipila’s picture

Continuing what @markabur is suggesting in #10. One option would be as follows:

  • D7 Text would be migrated to D8 as Text (plain) or Text (plain, long) if in the D7 storage settings the field length exceeds 255 characters.
  • D7 Long text would be migrated to D8 as Text (formatted, long)
  • D7 Long text and summary would be migrated to D8 as Text (formatted, long, with summary)

I can create a patch for this approach if this is the direction we want to go.

Considerations

  • I agree with @markabur that one line text fields are most often plain text fields. I've never seen one-line textfields to have formatted text but of course there are sites and use cases which would have these.
  • Long text fields are a bit more trickier to judge. There are definitely use cases where these would contain formatted text, but it's very easy to imagine a use case where we would have plain text in textareas (i.e. D7 long text). However, I don't think it would be the end of the world if we would migrate these as formatted texts. It's then up to the site builder which text formats he/she wants to grant to different roles.
  • Long text with summary: I don't see a difference to Long text, these two should behave the same way.

Any comments for this approach?

Markus

markabur’s picture

I wonder if it is possible for the migration script to check the field instance settings somehow? Then it could go with plain if all instances use plain, and formatted if all instances use formatted.

The hard thing is what to do when there is one field with multiple instances with some configured as plain and some as formatted. Fortunately that is more of an edge case, even thinking about long-text fields. For some sites there are important business reasons to retain the split between formatted an plain field instances, so for them the new d8 site would need to have two separate fields. But for other sites it would be best to force one way or the other (maybe going forward, "field_nickname" really ought to allow formatting for dogs as well as cats).

masipila’s picture

At least I'm not able to think how we could access the field instances from here.

Splitting the field into two separate fields is definitely not possible since the different migrations are stand-alone. What I mean by this is that when you're later migrating nodes, the field content will go from D7 field_a to D8 field_a and that migration only expects that the fields are migrated before nodes can be migrated. Writing some black magic voodoo that would sometimes migrate the content to D8 field_a and sometimes to D8 field_b is definitely not the way we can go here.

Markus

Version: 8.3.x-dev » 8.4.x-dev

Drupal 8.3.0-alpha1 will be released the week of January 30, 2017, which means new developments and disruptive changes should now be targeted against the 8.4.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

masipila’s picture

Title: D7 Plain text fields incorrectly migrated to D8 as Text (formatted) » D7 Plain text fields incorrectly migrated to D8 as Text (formatted), D6 textarea formatting settings not respected

There's another bug in this same method, this time in D6 text area migrations.

Steps to reproduce:

  • In D6, create a text field, select textarea as widget type
  • In global field settings, set text processing as Plain text
  • Migrate the field to D8.
  • Expected result: Field is migrated as Text (plain, long)
  • Actual result: Field is migrated as Text (formatted, long)

I created a patch to the original issue, it also includes a fix for this. I'll still update the unit tests to both and submit the patch shortly.

Cheers,
Markus

masipila’s picture

Okay, here we go.

The attached patch includes following changes:

Drupal 7 text fields

  • All single-line text fields are now migrated to D8 as Text (plain) / Text (plain, long)
  • They used to be migrated as Text (formatted) / Text (formatted, long)
  • No changes how D7 Long text fields are migrated.
  • No changes how D7 Long text with summary fields are migrated.

Drupal 6 text fields

  • Long text fields: if the global text processing setting is Plain text, field is migrated to D8 as Text (plain, long).
  • Long text fields: if the global text processing setting is Filtered text, field is migrated to D8 as Text (formatted, long).
  • Previously all Long text fields were migrated to D8 as Text (formatted, long)
masipila’s picture

Status: Active » Needs review
jeffwpetersen’s picture

D7 migration works. All plain_text fields are migrated correctly. Text (plain) and Text (plain, long) are both correct.

D6 fields that are plain text with D6 field settings of "Size of textfield: #" should be migrated to Text (plain) are being migrated incorrectly to D8 as Text (plain, long).

D6 fields that should be migrated to Text (plain, long) have field settings in D6 of "Rows: #" as opposed to the Text (plain) which has the "Size of textfield: #" field setting.

catch’s picture

Priority: Major » Critical

Going to bump this to critical since it's data integrity and potentially security depending on what the source data is like.

masipila’s picture

@jeffwpetersen, could you please take a screenshot of the D6 field configuration that you were referring in #19 to avoid misunderstandings?

masipila’s picture

I found amother issue with D6 migrations. In D6 we can have on/off checkboxes with site-builder defined text values. For example, you could have allowed values of the textfield configured as

yes|Yes, I agree
no|No

The important thing here is that the value that will be saved to the database is text.

Now, the field storage migration here is migrating these on/off checkbox textfields as boolean checkbox fields. The field storage gets created but when we're later trying to migrate the content of this field, things boil down...

What I propose is that we create a summary of the different scenarios to the issue summary and then try to figure out the concept / approach for each scenario. Writing the actual code is trivial but the concept is difficult due to the different data models...

I'm writing this with my mobile, I can create the summary table tomorrow.

Cheers,
Markus

masipila’s picture

Status: Needs review » Needs work
jeffwpetersen’s picture

field #1 Sub Head is migrating to D8 incorrectly as: Text (Plain, Long).
field #2 is what the configuration for D6 Plain Text Long looks like in admin. AKA: Text area (multiple rows) in the D6 admin interface.
plain text error

masipila’s picture

Issue summary: View changes
masipila’s picture

Regarding #24: You need to check the global settings i.e. the section that says These settings apply to the Sub Head field in every content type in which it appears.. The content type specific settings can't be interpreted because this migration script doesn't know if the field is used on multiple content types.

Proposed next steps:
- Please share the D6 widget type and the Global settings
- What is the expected result
- What is the actual result

Cheers,
Markus

masipila’s picture

Issue summary: View changes
masipila’s picture

Issue summary: View changes
masipila’s picture

Issue summary: View changes
masipila’s picture

Issue summary: View changes
masipila’s picture

Issue summary: View changes
masipila’s picture

I have now listed all possible scenarios that I was able to think of to the issue summary.

Please refer to the scenario by the ID when commenting, e.g. D7/1 or D72.

@catch, could you please go through the scenarios. If you agree with the proposals that I wrote to the issue summary, I can then write a patch according to the agreed approach.

Cheers,
Markus

masipila’s picture

Issue summary: View changes

Two different D6 scenarios had same scenario ID, fixed now in the issue summary.

masipila’s picture

Issue summary: View changes

One d6 scenario was listed twice on the issue summary. Fixed that and re-ordered the d6 scenarios so that they now appear in logical order.

masipila’s picture

@jeffwpetersen, which scenario from the issue summary does your issue in #19 and #24 fall into?

Cheers,
Markus

jeffwpetersen’s picture

@masipila

D6/1 & D6/2 should behave as expected.

phenaproxima’s picture

I have what might be a cunning plan to solve this problem.

Drupal 8 has two long string field types -- string and text. String fields are plain text only and do not support filter formats at all (they don't provide a database column), whereas text fields do.

To me, the solution to the possible conflict(s) is to migrate the problematic fields as text fields, not string fields, and have them use the plain_text filter format (installed by the Filter module) by default. If the plain_text format doesn't exist -- deleted by a misguided site builder? -- Migrate will need to recreate it first by reading it out of Filter's default configuration.

masipila’s picture

@phenaproxima: D8 is currently migrating the D7 single line textfields (e.g. scenario D7/1 in the issue summary) as Text (formatted).

If the user has permission to other filter formats, the user will be shown a format select list after every single line text field. This is the whole starting point where this issue starts from.

At least on my sites, I need this:
1) I have tons of single-line textfields which will never have formatting
2) I have tons of textareas where I want to have a WYSIWYG editor for the user and I definitely don't want that the user would need to manually select a filtering format. Most of my users who are creating content don't even know what text formats are. Hence: the default formatter for these textareas need to be Restricted HTML.

Markus

masipila’s picture

@jeffwpetersen, as a reply to #36: Are you saying that D6/1 and D6/2 are working as currently defined in the issue summary or that you are expecting some other result from these scenarios?

Markus

masipila’s picture

Issue summary: View changes

I was investigating the scenario D6/11 more. This scenario is about the D6 on/off checkboxes. Currently these D6 text fields are migrated to D8 as boolean checkboxes. I had problems with that migration so I proposed here that we should migrate these D6 checkboxes as List (text) but the migration to D8 boolean is the more correct way to do this.

There are other issues for the D6 on/off checkbox migration to D8 boolean, such as #2588827: CCK migrations seem to have trouble with some checkbox types and #2405017: Cannot migrate boolean fields of type text. I need to test the on/off check box migrations more because I think it does not work properly at the moment. Having said this, if my further tests show that there is something wrong on how the content of the D6 on/off checkboxes is migrated to the D8 boolean field, we should cover that as another issue. So let's focus here on the D8 field storage types and boolean is the correct target type for the D6/11 scenario.

I've now updated the issue summary to reflect this.

masipila’s picture

Status: Needs work » Needs review

Based on #40, the patch from #17 should now reflect the scenarios of the issue summary. Changing this back to Needs Review.

masipila’s picture

Issue tags: +migrate-d6-d8
masipila’s picture

Issue summary: View changes

I updated the issue summary so that it highlights the changes that the patch #17 is applying with arguments why those changes are proposed.

Now I'll rest with this and will wait that somebody reviews the issue summary.

Cheers,
Markus

heddn’s picture

Title: D7 Plain text fields incorrectly migrated to D8 as Text (formatted), D6 textarea formatting settings not respected » D6/D7 Plain text fields incorrectly migrated to D8 as Text (formatted), D6 textarea formatting settings not respected
Status: Needs review » Needs work

re #38: the point made by @phenaproxima in #37 is that we have options for the critical data loss and security issues; those which bumped this to a critical. We should take a conservative stance and choose the most flexible but also most secure option available. This can be done with his proposal. That solution isn't the ideal in all cases (as you outlined for your use case) but it wouldn't result in either data loss or security vulnerabilities.

In your use case, nothing stops you from developing a custom migration that maps the fields more appropriately for your scenario. Can yo

With 15 scenarios in the IS, can we eliminate the things that are just noise? I see a lot of scenarios that aren't going to change. Then I see a few scenarios that require changes. Can we split each of these scenarios into sub-issues so the merits and pros/cons can be individually addressed?

Setting to NW to respond to the proposal of splitting things out and discussing solutions in child issues.

masipila’s picture

@heddn, thanks for your comments!

The reason I included all permutations to the issue summary was that I identified that they were not systematically considered before (e.g. the existing code for D7 migrations is falling back to the very last default because $widget_type is NULL in D7 migrations). Since it took a while to do that, I documented the results so that the community does not need to do that analysis again. I apologize if the issue summary became overwhelming because of that.

Based on your feedback, I will create separate child issues for the following clear bugs. I'll do these later this afternoon when my son is taking a nap.

  1. D7 text fields with long field length are migrated as short (Scenario D7/2)
  2. D6 text area formatting settings not respected (Scenario D6/7)

Cheers,
Markus

masipila’s picture

Issue summary: View changes

I created the two child issues for the clear bugs so that we can continue the discussion on the D7 Text field formatting approach on this original thread. The two child-issues are:
#2849861: D6 text area formatting settings not respected when migrating to D8
#2849864: D7 text fields with long field length incorrectly migrated to D8

I also updated the issue summary with the information that the clear two bugs are handled in their own child-issues.

masipila’s picture

Now back to the original topic i.e. should the D7 text fields (scenarios D7/1 in the issue summary) be migrated to D8 as Text (plain) / Text (plain, long) or should they be migrated to D8 as Text (formatted) / Text (formatted, long) or do we have some other alternatives.

@catch, heddn: you have both raised concerns that if we would migrate the D7 text fields as Text (plain), this might open an attack vector under some circumstances. Could you please elaborate this? Are you referring to a scenario where we would have the formatting set to PHP and that the field would contain PHP code or something else?

Considering other alternatives (what @markabur proposed in #13).

The input argument of the TextField::getFieldType method is \Drupal\migrate\Row that represents the Drupal 7 field storage. Because this Row is representing the field storage, it doesn't have any information about the field instance settings where the text formatting settings are defined.

If we would able to read the D7 field instance settings for this field storage, we could have the following logic:
1. Get a list of all D7 field instances of this field storage.
2. If the text formatting on all of the D7 field instances is Plain Text, then migrate this field storage to D8 as Text (plain) or Text (plain, long), depending on the field length of the storage.
3. Else: Migrate the field storage to D8 as Text (formatted) or (Text, formatted, long) depending on the field length of the storage.

Cheers,
Markus

masipila’s picture

Issue summary: View changes

Scenario D7/2 field length bug: This is not a valid bug. I investigated this more in the child issue and the field length is correctly migrated. See #2849864: D7 text fields with long field length incorrectly migrated to D8. Issue summary updated accordingly.

masipila’s picture

Issue summary: View changes
masipila’s picture

Continuing on the consideration if the D7 single-line text fields (D7/1) would be migrated to D8 as Text (plain).

I'm still wondering what is the possible security concern with this approach. I made a simple test in order to test XSS:

  • I created a single-line text field in D7 (scenario D7/1).
  • I entered value <script>alert("hello world");</script> to the text field in D7.
  • I migrated this field storage to D8 as Plain (text) and migrated the node
  • There is no XSS that I can see; the markup is shown as plain text in the text field (i.e. the user input is escaped to plain text)

To me, the question here is as follows:

  • Most of single-line text fields are plain text. However, most does not mean all.
  • If we migrate these D7 single-line text fields (scenario D7/1) as Text (formatted), then the D8 users are seeing the text format drop down list for each single-line textfield, which is not desired in most cases.
  • If we migrate these D7 singel-line text fields as Text (plain), then the D8 users are NOT bothered with the text fromat drop down. However, this means that in those use cases where the single-line text field actually has markup, then that mark-up is not rendered but the raw markup is shown as escaped plain text.
  • This is the choice that we have to make unless there is a way how we could somehow analyze the field instance settings when we are migrating field storage settings.
  • Now waiting for input from the others.

    Markus

heddn’s picture

@masipila are you in IRC in #drupal-migrate? We could chat more easily there.

From what I understand of the problem, if we have field_caption marked as plain text on the page content type and marked as formatted text on the article content type, we have to resolve the field to something in D8 that has the same field name. If we always map the same field to plain text, then the article field would be corrupted. But if we always map to a formatted text field, then the page content type with the XSS would introduce an XSS vulnerability.

masipila’s picture

Title: D6/D7 Plain text fields incorrectly migrated to D8 as Text (formatted), D6 textarea formatting settings not respected » D7 Plain text fields incorrectly migrated to D8 as Text (formatted), D6 textarea formatting settings not respected
Issue summary: View changes

I updated the issue summary to the standard issue summary template to avoid confusion and to clearly indicate the remaining tasks.

Markus

masipila’s picture

Issue summary: View changes
phenaproxima’s picture

Assigned: Unassigned » phenaproxima

I'll take a crack at this.

alexpott’s picture

Issue tags: +Triaged D8 critical

Discussed with @catch, @cilefen, @cottser, @laurii, and @xjm and we agreed that this was critical because of data integrity and security. The data integrity issue is because a D7 field is migrated to a single D8 field but sometimes the D7 field instance settings should actually map to two different D8 fields. The security issue is because some sites default formatter might be permissive so a plain text field on D7 might migrate to a field that allows markup that was not allowed before.

vasi’s picture

I'm thinking of giving this a shot, but I'm a bit confused after reading through the comments. Is the proposed resolution in the summary still the plan here?

If all instances of the field are filtered text, the D8 field storage should be of type Text (formatted)... In other cases, the D8 field storage should be Text (plain)

Or is the idea that we'll map a single D7 field to two D8 fields?

sometimes the D7 field instance settings should actually map to two different D8 fields

masipila’s picture

I think that the very first task that we need to do here is to figure out a way how we can check all the D7 field instance settings when we are running the field storage conversion. Once we can do this, then we are half way through already and can proceed to even the field-split direction @alexpott was referring to.

Markus

vasi’s picture

masipila, it's not very hard to get the field instance settings in the d7_field source. It's slightly annoying, but not too hard: You add fci.data field to the query (which contains the field settings), remove the distinct() and figure out yourself whether each row is distinct.

The hard part is creating two D8 field storages for a single D7 (entity_type, field_name). The problem is that a migration 'process' can munge a row, or skip a row—but they can't *add* a row. So we'd have to:

* Create the duplicate row in the d7_field source, not the process
* Deduplicate the field name
* In every other place that refers to a field_name, figure out which one you want

heddn’s picture

@vasi the recomendation in #37 by the migrate maintainers is to create a single test field and set its default formatter to plain_text.

vasi’s picture

Ok, here's a brand new patch according to that plan: if text_processing is false, we force the filter to plain_text. No interdiff, as it's basically unrelated to the previous patch.

Things that are good:

* It's super simple and obvious
* There's a test

Things that are sub-optimal:

* The test isn't very meaningful. It would be nice to have at least a simple integration test.
** Ideally, we'd also test the whole process: add a potentially-unsafe field value to the D7 migration fixture, run the migration, verify the value is rendered safely.
* Users may be confused by all their plain-text D7 fields having formatters in D8. This is more complex to fix and probably isn't critical, I suggest we move it to a follow-up issue.
* I don't check whether or not the plain_text filter exists in D8. Is it possible for it not to exist? I see no way of removing it in the UI.

vasi’s picture

Status: Needs work » Needs review
heddn’s picture

Status: Needs review » Needs work

Let's do the extra work of seeing if we *need* to mark the field as filtered text in D8. If it is the only field instance(s) of that type are *not* filtered text, then let's do the "right thing" and convert it to a Text (plain) or Text (plain, long) destination.

From the IS:

  • When the field storage type is being determined, we should somehow obtain a list of all D7 field instances of this field storage.
  • If all instances of the field are filtered text, the D8 field storage should be of type Text (formatted) or Text (formatted, long) depending on D7 field type: Text (formatted) if the D7 field type is Text and Text (formatted, long) if the D7 field type is Long text.
  • In other cases, the D8 field storage should be Text (plain) or Text (plain, long), again depending on the D7 field type as above.
vasi’s picture

I thought we'd try to just handle the critical part separately from the nice-to-have. I've played around a bit looking for clean ways to do the nice-to-have part, but I've only come up with ugly solutions so far.

xmacinfo’s picture

Would you mind to clearly define the critical part versus the nice-to-have part?

Would we benefit in creating a new issue for the nice-to-have part?

heddn’s picture

re #63: I don't think we should migrate to a filtered field if it really should be a plain text type, even as an interim step.

catch’s picture

There's no way to switch from filtered to plain text, so we'd leave migrated sites stuck with this.

For a temporary workaround could we detect this case and point out to people they should be doing a custom migration?

The only way to fix conclusively seems to be to split fields used both as formatted and plain text into two separate fields in 8.x.

masipila’s picture

+1 for what catch proposed in #66.

So the concept could be as follows:

When the field storage type is being determined, we should somehow obtain a list of all D7 field instances of this field storage.

If all instances of the field are filtered text, the D8 field storage should be of type Text (formatted) or Text (formatted, long) depending on D7 field type: Text (formatted) if the D7 field type is Text and Text (formatted, long) if the D7 field type is Long text.

If all instances of the field are plain text, the D8 field storage should be Text (plain) or Text (plain, long), again depending on the D7 field type as above.

If there are instances of both formatted and plain text, do not migrate but show a message to the user. If the mixed instances are not really needed, the site builder could go to D7 site and change the instance settings so that the formatting settings are the same for all instances. If the different formatting settings are really needed on different instances, then, the site builder needs to use custom migration.

How do you guys like this approach?

Markus

catch’s picture

Issue summary: View changes

I'd completely forgotten that instance settings can be changed on 7.x...

Given that #67 seems ideal long term not just as a temporary fix. Most sites shouldn't run into this, we give a route for ones that do to change the instance settings in the 7.x UI then run the upgrade (which has the same end effect as the previous solution here). Maybe write a documentation page on Drupal.org with instructions, linking to it from the message?

Sites that need the field to be split are stuck with a custom migration, but I don't see a way around that. Also we'd have had much worse trouble with this doing it with hook_update_N().

I've added this to the proposed resolution.

alexpott’s picture

@catch asked me to have a think about the plan in #67. It seems like a good pragmatic step to helping the majority get there migrations correct. We probably should open a follow up issue to address the edge case where a field has mixed instances in a generic way - but as long as we can inform the user and they mitigate via preparation steps on D7 then this followup shouldn't block migrate being stable.

heddn’s picture

Assigned: phenaproxima » Unassigned
Issue tags: +Baltimore2017

Discussed in the weekly migrate maintainers call and agreed with the general approach discussed between 62 and 69. Also tagging for sprinting at Baltimore.

Un-assigning from Adam so Vasi can proceed with that direction.

vasi’s picture

Discussed with heddn, phenaproxima:

* Common base for d7_field, d7_field_instance sources
* Custom handling for text fields in sources, to figure out if they should be text or string
* If there are instances of both text-y and string-y types, string-y instances fall back to text with plain_text filter
* 'entity_exists' process plugin (to check for plain_text filter)
* 'log' process plugin (to log errors if it doesn't exist; check if issue exists)

vasi’s picture

Status: Needs work » Needs review
FileSize
16.78 KB

Here's my work-in-progress:

* We have a common base for both d7_field and d7_field_instance sources, which takes care of figuring out whether we can convert to string
* New MigrateField String, if we can convert to string
* If there are incompatible instances, unformatted falls back to formatted with plain-text filter.
* New process plugins entity_exists and log. I'll throw these in the sub-issues too.
* A process that handles the warnings

Marked "Needs review" just for testing—will go to "Postponed" after, since we're waiting on sub-issues.

TODO:

* Sub issues for the new process plugins
* Lots more tests

Status: Needs review » Needs work

The last submitted patch, 72: 2842222-72.patch, failed testing.

joelpittet’s picture

Status: Needs work » Needs review

@vasi what is text_filter_warning:? I couldn't find that key anywhere in core.

vasi’s picture

@joelpittet: It's something I forgot to delete, while @heddn and I were looking into different ways of doing this. Thanks for catching that!

vasi’s picture

As I'm reviewing and fixing the test failures, I encountered this:

Drupal\Tests\field\Kernel\Migrate\d7\MigrateFieldTest::testFields
Failed asserting that "28" is identical to 28.

It turns out that Drupal\migrate\Plugin\migrate\source\SqlBase::count() returns a string, even though it's documented as returning an int! I guess I'll try casting to int, and see how many things break?

vasi’s picture

Status: Needs review » Needs work

Most of these I can fix:

* Removed the spurious 'text_filter_warning'
* Casted counts to ints
* Added 'filter' as a module in many migration tests
* Made d7_field_instance_widget_settings also be based on FieldBase. I'll have to do something similar with d7_field_formatter_settings, but a bit more complex.

However, I'm hitting a bit of a wall with processing 'options/type' in the widget and formatter settings migrations. These just use maps now, and rely on the MigrateField/MigrateCckField plugins to populate the map. But now it will have to be more complex, since 'text_textfield' as input can mean either 'text_textfield' or 'string_textfield' as output.

I could make it a two-level static_map on (field_type, formatter_type). But that would mean changing every existing MigrateField, and would break BC. Suggestions?

vasi’s picture

Um, I'm super confused at how d7_field_formatter_settings's options/type even works for MigrateFields. They modify the process by putting *nested* items in the map, but then the map has a single source. So we end up with something like this:

process:
  options/type:
    -
      plugin: static_map
      source: formatter_type
      map:
        date_default: datetime_default
        text:
          trimmed: text_trimmed
...

That can't possibly be meaningful, can it? These nested bits must just be for D6, right? So confusing!

vasi’s picture

After talking with @phenaproxima, we think we should have a custom process plugin that allows MigrateFieldPlugins to directly modify values like 'options/type'. It's both cleaner and more flexible than having them munge the process itself.

Maybe the process plugin can be type "migrate_field", and take a "method" parameter—then we won't have to have multiple process plugins for formatters and widgets.

joelpittet’s picture

Yeah it's a bit confusing, I think the first one date_default: datetime_default is short hand for

date_default: 
  date_default: datetime_default

But I could be wrong, that's how I read it though...

heddn’s picture

Title: D7 Plain text fields incorrectly migrated to D8 as Text (formatted), D6 textarea formatting settings not respected » PP-1: D7 Plain text fields incorrectly migrated to D8 as Text (formatted), D6 textarea formatting settings not respected
Status: Needs work » Postponed
vasi’s picture

Assigned: Unassigned » vasi
heddn’s picture

Issue tags: +Needs reroll
catch’s picture

Title: PP-1: D7 Plain text fields incorrectly migrated to D8 as Text (formatted), D6 textarea formatting settings not respected » D7 Plain text fields incorrectly migrated to D8 as Text (formatted), D6 textarea formatting settings not respected
Status: Postponed » Needs review

Just committed #2872812: Create process plugin to log a message, so this is unblocked.

Hadn't realised there's a patch here, will try to review in a bit.

Status: Needs review » Needs work

The last submitted patch, 84: 2842222-83.patch, failed testing.

catch’s picture

If there are instances of both text-y and string-y types, string-y instances fall back to text with plain_text filter

I don't think we can do this safely and it's not what #70 is talking about. If we can't decide between plain text and HTML, we shouldn't migrate and instead tell the user there's a problem - they can then go back and reconcile instance settings on the 7.x site, or write a custom migration.

rakesh.gectcr’s picture

Issue summary: View changes
Jo Fitzgerald’s picture

Assigned: vasi » Unassigned
Status: Needs work » Needs review
FileSize
6.79 KB
16.39 KB
  • Added filter module to tests that required it.
  • Edited expected results that should now be "string" rather than "text".
  • Cast d7_field processed count to integer.
  • Removed unused use statements.
jeffwpetersen’s picture

Status: Needs review » Reviewed & tested by the community

Working for me on D7 - D8. Although I can't attest to every scenario.

phenaproxima’s picture

Status: Reviewed & tested by the community » Needs review

I'm glad this works, but I looked at the patch and I have some architectural concerns. I would like to discuss this more with the other maintainers before we go to RTBC with this.

mikeryan’s picture

Title: D7 Plain text fields incorrectly migrated to D8 as Text (formatted), D6 textarea formatting settings not respected » D7 Plain text fields incorrectly migrated to D8 as Text (formatted)
Status: Needs review » Needs work
Issue tags: -migrate, -migrate-d6-d8
  1. +++ b/core/modules/field/migration_templates/d7_field_instance.yml
    @@ -8,6 +8,8 @@ source:
    +    text_filter_warning:
    

    There's no value here? And, for that matter, this constant doesn't appear to be used.

  2. +++ b/core/modules/field/migration_templates/d7_field_instance.yml
    @@ -29,6 +31,47 @@ process:
    +  warn_if_plain_text_filter_missing:
    

    Generally, YAML is simpler than PHP - but with 10 process plugins here trying to implement if/then/else logic, I think this is a case where one process plugin with straightforward PHP logic is the simpler solution.

    At the very least, if we keep this process pipeline it badly needs comments, it's very difficult to follow what's going on here.

  3. +++ b/core/modules/field/migration_templates/d7_field_instance.yml
    @@ -29,6 +31,47 @@ process:
    +      default_value: null
    +      map:
    +        text: true
    +        text_long: true
    

    Seems a little odd that the return values here are true or null - why not false instead of null?

  4. +++ b/core/modules/field/migration_templates/d7_field_instance.yml
    @@ -29,6 +31,47 @@ process:
    +      default_value: null
    +      map:
    +        0: true
    

    Ditto.

  5. +++ b/core/modules/field/migration_templates/d7_field_instance.yml
    @@ -29,6 +31,47 @@ process:
    +      message: >
    +        Field '@field_name' on '@entity_type' entities has both formatted
    +        text and unformatted text instances. The field on bundle '@bundle'
    

    As far as I can see here, we reach this point if we have a text or text_long field where instance_settings/text_processing for this instance is 0 - how do we know there are both kinds of instances?

    OK, getting farther down I see where getType() has converted those without text_processing to 'string'...

  6. +++ b/core/modules/field/migration_templates/d7_field_instance.yml
    @@ -29,6 +31,47 @@ process:
    +    - plugin: callback
    +      callable: intval
    

    Not clear on what this is doing - entity_exists is not returning an integer value?

  7. +++ b/core/modules/field/migration_templates/d7_field_instance.yml
    @@ -29,6 +31,47 @@ process:
    +      message: >
    

    Here and above - just wanted to mention I've never seen the ">" YAML syntax before...

  8. +++ b/core/modules/field/src/Plugin/migrate/source/d7/Field.php
    @@ -12,21 +11,14 @@
    +    $query->fields('fc');
    

    The base query already includes a couple of the 'fc' columns - is the redundancy a problem? Should this explicitly name the remaining columns it wants rather than pulling all the field_config columns?

  9. +++ b/core/modules/field/src/Plugin/migrate/source/d7/FieldBase.php
    @@ -0,0 +1,106 @@
    +  public function count($refresh = FALSE) {
    

    Actual counting should be done in doCount(), so the base count() implementation can manage caching.

  10. +++ b/core/modules/field/src/Plugin/migrate/source/d7/FieldInstance.php
    @@ -13,21 +12,15 @@
    +    $query->fields('fci')
    

    As with the previous 'fc' comment, should we be explicit about the 'fci' columns being added here?

mikeryan’s picture

Bump - Jo, do you have time to address my feedback?

Thanks.

Jo Fitzgerald’s picture

My input on this was only correcting the tests. I think we need to go back to @vasi who wrote the patch in #72 to address @mikeryan's review in #92.

maxocub’s picture

Assigned: Unassigned » maxocub

I'm working on addressing #92 and adding some tests.

maxocub’s picture

This is a bit of a revamp of the previous patch.
For what I understand from #67, #68 and #69, the chosen solution was to:

  • If the text or text_long field has only plain text instances, migrate to string or string_long field.
  • If the text or text_long field has only filtered text instances, migrate to text or text_long field.
  • If the text or text_long field has both plain text and filtered text instances, don't migrate and log a message indicating the required steps in order to fix this (fix the instances on the source site or use custom migration)

For the tests, I added to the fixtures 6 field bases and 12 field instances, covering every scenarios above.
I don't know why there's so much unrelated changes in the fixture, so I uploaded a patch without those changes for the review.

About the reviews from #92, most of them are not relevant anymore (1-7).
8 and 10 have been addressed.
About 9, the count() method from SqlBase, which is overridden by this one, doesn't deal with caching, so not sure what should be done here.

maxocub’s picture

Assigned: maxocub » Unassigned
Status: Needs work » Needs review

Forgot to NR.

The last submitted patch, 96: 2842222-96.patch, failed testing. View results

maxocub’s picture

maxocub’s picture

Issue summary: View changes
maxocub’s picture

Just moved a condition from FieldInstance to FieldBase.

masipila’s picture

Testing this now... I'll report my results once done.

masipila’s picture

Status: Needs review » Needs work

Here are the findings from my manual testing with different combinations.

Tests are with a fresh D8.4.x with git using Standard install profile, patch #101 applied.

Summary

  • The conflicting instance detection should also be applied to field type Long text with summary. I tested also this and noticed that this is always migrated as formatted even if there are conflicting instance settings in D7.
  • Was there supposed to be UI feedback / log message shown to the user if field storage (d7_field) / field instance (d7_field_instance)
    migration is skipping a row because it found conflicting D7 instances? I was migrating using Drush and did not see any messages.

Detailed test report

On D7, I have a content type with the following fields:

body

  • Field type: Long text and Summary
  • Instance settings: Filtered Text
  • Result: OK, migrated as Text (formatted, long, with summary)

field_plain_textfield

  • Field type: Text
  • Instance settings: Plain Text
  • Result: OK, migrated to D8 as Text (plain)

field_formatted_textfield

  • Field type: Text
  • Instance settings: Filtered Text
  • Result: OK, migrated to D8 as Text (formatted)

field_plain_textarea

  • Field type: Long text
  • Instance settings: Plain Text
  • Result: OK, migrated to D8 as Text (plain, long)

field_formatted_textarea

  • Field type: Long text
  • Instance settings: Filtered Text
  • Result: OK, migrated to D8 as Text (formatted, long)

field_shared_textfield

  • Field type: Text
  • Instance settings: Plain text on this content type, Filtered text on another content type
  • Result: OK, the d7_field (storage) and d7_field_instance (instance) migrations skipped these.
  • Remark: Was the supposed to be some UI feedback to the user that there are conflicting instance settings?

field_shared_textarea

  • Field type: Long Text
  • Instance settings: Plain text on this content type, Filtered text on another content type
  • Result: OK, the d7_field (storage) and d7_field_instance (instance) migrations skipped these.
  • Remark: Was the supposed to be some UI feedback to the user that there are conflicting instance settings?

field_shared_textarea_with_summary

  • Field type: Long text and summary
  • Instance settings: Plain text on this content type, Filtered text on another content type
  • Result: NOT OK, field was created as formatted even though there were conflicting instances on D7
masipila’s picture

Arrrrgh... Yet another conceptual nightmare, related to my previous comment.

In D8, we only have Text (formatted, long, with summary). There is no such thing as Text (plain, long, with summary).

However, it is possible to have a D7 field so that the field (storage) type is Long text and summary but on instance settings we have defined text processing to Plain text. I would assume this is not a very common combination, but it is possible.

There is no elegant migration path from D7 to D8 with this corner case. If the source field in D7 has unsafe content, we might still have an attack vector here.

Markus

maxocub’s picture

@masipila: Thanks for your testing and extended report!

About the Text with summary fields, I guess we should:
* If there's both plain and filtered instances, skip the row
* If there's only filtered instances, migrate as is
* If there's only plain instances, (find a solution)

We could skip the row too for that third scenario, explaining that there's no String with summary field on D8.
We could also check if the instances use the summary, and if not, migrate to string_long fields.

About the UI warning for skiped field, it's not very apparent, but it is in the migration log.
There's no way yet to change the message status to error or warning (so it would be more apparent after a migration), maybe we should open an issue to address that. I'll see if it's even possible.

masipila’s picture

About the Text with summary fields, I guess we should:
* If there's both plain and filtered instances, skip the row
* If there's only filtered instances, migrate as is
* If there's only plain instances, (find a solution)

We could skip the row too for that third scenario, explaining that there's no String with summary field on D8.
We could also check if the instances use the summary, and if not, migrate to string_long fields

Scenarios 1 and 2 are conceptually clear and this is the way to go.

For scenario 3. I think that the if-no-content-uses-summary-migrate-to-string-long is adding way too much complexity compared to the benefit. If there is a such a field and the site has (big amount of) content, this would be the case in vast majority of this sub-scenario and we would be anyway back to square one.

The simplest way to handle the plain long texts with summary would probably be be to a) skip these rows and b) log a message. The log message could say something like this:

The type of source field field_foo is Long text and summary and the text processing settings on field instances is set to Plain text. Drupal 8 only has field type Long text and summary (formatted). In order to migrate this field, carefully check that the content of this field does not contain dangerous input, change the source instance settings to Filtered text and try to migrate again.

maxocub’s picture

Status: Needs work » Postponed

This issue was discussed in yesterday's weekly Migrate hangout.
The Field & FieldInstance source plugins are badly not adapted for what needs to be done here and they could use some refactoring for saner maintainability.
Here is the proposed approach:

  1. Change the D7 Field source plugin so that it returns the field definition in as pure a form as possible. But it also needs to include a property called 'instances', which is an array of arrays, each of which is an instance of the field, as unmodified as possible. So basically, the D7 Field source plugin becomes a massive info-dump of all field and all of their instances. #2891932: Refactor D7 Field source plugin
  2. D7 Field Instance source plugin will extend Field to narrow its scope and return only the 'instances' array from each row. To each instance, it will add the base field definition as 'field_definition'. So basically it returns the same information as D7 Field, but in a different way. #2891935: Refactor D7 FieldInstance source plugin and other field instance related source plugins
  3. Write a new process plugin whose job is to invoke and call a method on a field plugin, then return the value from that call. This process plugin takes one value: the field type as known in the source database, which it uses to instantiate and call the appropriate field plugin.

    plugin: process_field
    source: field_type
    method: getFieldType

    #2893061: Create a ProcessField plugin to process the field types

  4. In sub-issues, recreate the current functionality by building dedicated field plugins for each field type we know how to handle, even the ones that are just doing static mapping at the moment.#2893072: Modify the DateField Field plugin so it's used for D6 & D7, #2893073: Create a EntityReference Field plugin, #2893075: Create a ListField Field plugin, #2893076: Create a NumberField Field plugin, #2893077: Create a PhoneField Field plugin
  5. Adapt d7_field and d7_field_instance migrations to use this new infrastructure. #2893061: Create a ProcessField plugin to process the field types
  6. Finally, circle back to the Text field plugin and have it implement the logic it needs in order to determine the proper field type to use (or throw a MigrateException if it cannot).

I'll create new issues (and reference them here) in the order above and as the work advance.

krystalcode’s picture

A couple of comments in the approach, I've looked into the D7 to D8 case.

1. Due to the complexity of the problem and the various cases that arise, I think that as a site maintainer I would like to be informed and confirm the action taken. It would be great if we would ask the user to confirm the choice made automatically, and choose another one from a list of eligible options. For example:

The field "body" of type "Long text" of the "node" entity, bundle "Page" will be automatically migrated to a field of type "Text (formatted, long)". Please press enter to proceed, or choose an option from the list.
1. Text (formatted, long)
2. Text (plain, long)
3. Skip this field, I will do a custom migration

If that is possible on a per-bundle case, it would also solve the issue where the same field has different settings for different bundles or entity types. Unless the way migrate proceeds does not allow for that.

2. As pointed out, in D7 whether a text is formatted or plain is stored as field instance setting. However, this means that it can be changed at any time, for whatever reason, without altering the data structure. Imagine the following scenario: I have a blog where users can post comments. I initially allow users to use a rich text editor for the comments to enter very basic HTML, such as bold and italic text. Later, I figure out that users actually don't use it, or there is a redesign of the site that removes the rich text editor, and I simplify comments by changing the comment body field instance setting to allow only plain text. It could be the other way around, from plain text to formatted.

Now, old comments may or may not display properly depending on their contents and the text formats, but that's for the site owner to fix. The issue with the migration is that there will be information loss: if we migrate the comment body that is currently in plain text from Long text to Text (plain, long), the information about the original format for old comments will be lost and it will be more difficult to fix the problem later.

In that case, I would at least like to be notified about the problem before migrating. If we cannot implement the interactive choice as described in (1), I might prefer to not do the migration automatically, solve the issue first, and try again or do the migration myself. In the scenario above, we could easily detect if there is such inconsistency by looking if there is at least 1 record in "field_data_***" table where the format column is not NULL, or is NULL in the reverse scenario.

I don't know if there would be many installations that could have this issue, but since this is a potential information loss I thought it should be discussed. Not sure if there is any security concern in the case of switching from plain_text to formatted, I guess if there is it would be existing in the D7 site in the first place.

3. Regarding the scenario where we have a text_with_summary field using plain text. Would it be an option to open an issue at the Field module to provide such a field by default (string, plain, with summary)? If not, would it be in scope for the Migrate module to provide one to cover this case?

masipila’s picture

I'm not the maintainer of the migration module, but a couple of comments to the remarks of #108. The module maintainers can of course comment on their behalf if they see things differently.

1. Due to the complexity of the problem and the various cases that arise, I think that as a site maintainer I would like to be informed and confirm the action taken. It would be great if we would ask the user to confirm the choice made automatically, and choose another one from a list of eligible options.

As far as I know, we don't have this kind of user interface capabilities in the migrations. I believe the intended usage is that if there are more complex migration scenarios where this kind of decision making logic would be needed, then the site builder needs to define those rules in a custom migration. (In this specific scenario, the easiest "custom" migration would be to create the text fields manually to D8 and then only migrate the content to those manually created text fields).

2. As pointed out, in D7 whether a text is formatted or plain is stored as field instance setting. However, this means that it can be changed at any time, for whatever reason, without altering the data structure. Imagine the following scenario: I have a blog where users can post comments. I initially allow users to use a rich text editor for the comments to enter very basic HTML, such as bold and italic text. Later, I figure out that users actually don't use it, or there is a redesign of the site that removes the rich text editor, and I simplify comments by changing the comment body field instance setting to allow only plain text. It could be the other way around, from plain text to formatted.

I believe that the issue you're describing would affect the original D7 site way before you would run into the same dilemma during the migration to D8. In my opinion, the migration of field storages / field instances from D7 to D8 can only take the current D7 field storage / instance settings into account.

3. Regarding the scenario where we have a text_with_summary field using plain text. Would it be an option to open an issue at the Field module to provide such a field by default (string, plain, with summary)? If not, would it be in scope for the Migrate module to provide one to cover this case?

I don't think that it is the scope of the Migrate module to start adding new field types to core. The D7 to D8 is only one migration path that this core module supports. When we're talking about migrations from different systems (even different major versions of Drupal), they will always have bigger or smaller conceptual differences.

---clips---
Just in case this helps someone else, here's how I worked myself around this and a couple of other issues in my D7 to D8 migration.

  • I wanted to have my single-line textfields in D8 as Text (plain). Currently, the Migrate modules are migrating these to D8 as Text (formatted).
  • Instead of migrating, I created my single-line textfields to D8 manually.
  • I wrote a small D8 module and implemented hook_migrate_prepare_row() as below:
<?php

/**
 * @file
 * Migration hook implementations.
 */

use Drupal\migrate\Row;
use Drupal\migrate\Plugin\MigrateSourceInterface;
use Drupal\migrate\Plugin\MigrationInterface;
use Drupal\migrate\MigrateSkipRowException;

/**
 * Implements hook_migrate_prepare_row.
 */
function MYMODULE_migrate_prepare_row(Row $row, MigrateSourceInterface $source, MigrationInterface $migration) {
  // Special handling for Field Storages and Field Instances.
  if (in_array($migration->id(), ['upgrade_d7_field', 'upgrade_d7_field_instance'])) {
    // Get the name of this field.
    $field_name = $row->getSourceProperty('field_name');

    if (in_array($field_name, ['field_foo', 'field_bar', 'field_baz'])) {
      $message = "Skipping field: " . $field_name;
      drush_print($message);
      throw new MigrateSkipRowException($message);
    }
}

This simple hook implemenation skips fields field_foo, field_bar, field_baz in both field storage and field instance migration. This is OK as I created these field manually to D8.

Cheers,
Markus

Version: 8.4.x-dev » 8.5.x-dev

Drupal 8.4.0-alpha1 will be released the week of July 31, 2017, which means new developments and disruptive changes should now be targeted against the 8.5.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

xjm’s picture

Thanks @masipila and @maxocub!

We generally shouldn't have a critical issue postponed on a non-critical issue. If an issue must be fixed in order to also fix a critical, then the blocking issue itself is critical too. The other possible scenario is that the critical issue is hotfixed and then we fix it the "right" way in non-critical followups. It depends on what the fastest way to fix the critical is.

For now, I have promoted #2891935: Refactor D7 FieldInstance source plugin and other field instance related source plugins, which was already RTBC.

catch’s picture

Status: Postponed » Needs work

Just committed the blocker so un-postponing.

maxocub’s picture

Sorry, but this is also blocked by another one: #2893061: Create a ProcessField plugin to process the field types, which was also postponed, so I unpostponed it and bumped it to critical too.

catch’s picture

Title: D7 Plain text fields incorrectly migrated to D8 as Text (formatted) » [PP-1] D7 Plain text fields incorrectly migrated to D8 as Text (formatted)