This is a sub-issue of #1910624: [META] Introduce and complete configuration schemas in all of core.

Problem/motivation

#1866610: Introduce Kwalify-inspired schema format for configuration introduced the idea of config schema and documented at http://drupal.org/node/1905070.

Proposed solution

  1. Break up migration_dependencies into migration_dependencies:required and migration_dependencies:optional. More config schema friendly, more user friendly. Win all over. #2263453: Split migration_dependencies into two keys
  2. Provide a schema per load, source and destination plugin. Having source schema per plugin allows to mark certain constants as translatable labels.migrate.destination.* needs to only contain
    plugin:
      type: string
    

    and then migrate.destination.entity:user and migrate.destination.entity:file will need a little more.

  3. Make process a type: property and document that if you have a translatable thing then don't put it in the YAML rather write a custom plugin and use t(). This will be insanely rare as constants are in the source configuration anyways which, as the previous point states, will have its own schema.
Support from Acquia helps fund testing for Drupal Acquia logo

Comments

chx’s picture

Issue summary: View changes
Gábor Hojtsy’s picture

Gábor Hojtsy’s picture

We could theoretically change the migration config structures so the process key would contain a mapping with type info, so we can provide schema. Would that then help for further levels of the migration structure?

chx’s picture

Erm what? The migration process is structured the way it is because we want to allow easy manual migration writing which I suspect will be common. That was our goal, the ability to provide a config schema was and still not. If the config schema can't accommodate what we are doing then ... tough luck, I guess.

Gábor Hojtsy’s picture

Erm, what? The config schema feature set is structured the way it is because we want to allow easy manual config schema writing which I suspect will be common.

Looks like two to be manually written format requirements collide here. If we are to cover migrations with config schemas, that is. Is there a translation use case or is there going to be a tool that needs to ensure no random data type changes happen? Both would point to needing to add a schema. Do we want to display migrations in English regardless of everything else translated because we want the migration be easy to write by hand? Is developer easiness a priority over user facing features? Does keeping varying typing in only migrations while no other part of config would use that at least in core actually easy for developers?

chx’s picture

There's nothing translatable in migration; Mike Ryan wants to add labels but those we can cover; it's the process key we can't (? or just don't know how) cover and that is sheer machine configuration.

Gábor Hojtsy’s picture

There's nothing translatable in migration; Mike Ryan wants to add labels but those we can cover

There is already a top level label at least, eg. "Drupal 6 blocks", "Drupal 6 comments", we are not displaying this on the UI or we are to display it in its English original at all times?

it's the process key we can't (? or just don't know how) cover and that is sheer machine configuration.

Why is it a config entity then? I suspected it would be so a possible contrib module can be created that would serve as a wizard to create/edit migrations. So long as the config entity is edited and saved back, if it lacks schema there is nothing that would ensure that values like false or 12 retain their types, most likely they would end up as strings as they come back from the form, so the file would change even at places where it is not changed resulting in deployment changes that are not actual changes. Which is the whole reason we are adding config schemas to all config in core which will be edited/deployed in any way.

If migrations are not to be edited/deployed then why are they config entities? If they are then why don't we care for types to retain themselves? Or is that hypothetical contrib module going to need to manually typecast in PHP for each key so it works well? Is that the plan?

chx’s picture

  1. Oh labels are already in? Great!
  2. They are configuration entities because we do want to deploy them. Write them on development , deploy to production.
  3. It is, in fact, impossible to type check process, or so I think? Even we amend plugins to give us config schemas, the default_value plugin won't be able to give you a definite answer cos the default value, well, that can be quite anything. StaticMap, well, the mapping often results in strings or integers but I can't rule out it might need arrays. What the heck does Views do?
Gábor Hojtsy’s picture

Views uses schema's dynamic typing. It has two ways to handle dynamic types.

1. Either the substructure is a mapping, and then it has a key to define its type. eg. the following is a sequence of unknown types, where each item has a dynamic type from the views.area.* type list. The specific type is then identified by the value of the plugin_id key in the data. So each "unknown type" is mandated to be mapping based and should have a "plugin_id" key which is a string value which is loaded to identify the type for the whole item.

       type: sequence
       label: 'No results behavior'
       sequence:
         - type: views.area.[plugin_id]

2. The other option is the container has type information for part of the data structure. Row plugins use this for example. Rows have common keys like type and provider but their options are of "unknown type". In this case the type of the data in options depends on the type of the container, eg. in the archive view that ships with D8 it is type: 'entity:node' so the options are typed views.row.entity:node which is defined by the node module of course.

    row:
      type: mapping
      label: 'Row'
      mapping:
        type:
          label: 'Row type'
        options:
          type: views.row.[%parent.type]
        provider:
          label: 'Provider'

So in summary either the parent container has some type info which is for use cases of common keys with some parts that are type specific. In this case the substructure can be of any type, it can be a string or anything else. Or the subtype needs to be a mapping and should have a common type identification key, that was the first example.

If your container has no info about the data stored then to satisfy the schema you would need to add a level of mapping container with a type key that can be used to tell the type. Eg. the following can be described with schema:

something:
  data_type: int 
  value: 12

something:
  data_type: values
  value:
    - 1
    - 5
    - 4

The gist of the schema would be:

something:
  type: mapping
  mapping:
    data_type:
      type: string
    value:
      type: something_type_[%parent.data_type]

something_type_int:
  type: integer

something_type_values:
  type: sequence
  sequence:
    type: integer

The following cannot be described with a schema because there is no indication of type that we can take. We can only take indication of dynamic type from a mapping key's value. If we would reverse-engineer the type from how the data *looks like* then we cannot really do data type enforcement because a boolean turning to be an integer is totally valid.

something: 12

something:
  - 1
  - 5
  - 4

Even shorter, it either needs a mapping based container with type info or itself needs to be a mapping with a type key where the dynamic data type can be derived from.

chx’s picture

So I talked to Gabor and he says that instead of

plugin: default_value
default_value: 15

we should have

plugin: default_value
default_value: 15
type: integer

Well then. I am unfollowing the issue.

Edit: before we make migrations not config entities, I'd like to note we want to add groupings to them and use entity query to find "Drupal 6 configuration and such". Just to make it more fun, the value of the group key will either be a string or a list of groups and I will finish #2248567: Config entity query doesn't work when matching an array property to make the queries work. Seriously, if config schema can't accommodate such freedoms then the config schema is broken and needs to be fixed.

chx’s picture

Issue summary: View changes
chx’s picture

Title: Provide configuration schema for Migration module » Make configuration schema flexible
Component: migration system » configuration system
Priority: Normal » Critical
Gábor Hojtsy’s picture

Status: Active » Needs review
FileSize
1.71 KB

Sincerely I did not expect adding a type key is so radical.

As I wrote above If we would reverse-engineer the type from how the data *looks like* then we cannot really do data type enforcement because a boolean turning to be an integer is totally valid.. Not sure how the config schema would be able to describe a data type that has no "external" information about its type but require introspection of the data itself for its type. That cannot be used to enforce types as the type of the data may change. Eg. boolean to string to int. Without specific info outside of the boolean/int/string, schema cannot tell if it should be a boolean/int/string by inspecting the value only.

I was working on a minimal schema attempt. I'm not sure the schema tests will not give up with the unknown type use, since it may be used to in fact check for missing schema, but since I did this locally I would not throw away the patch.

chx’s picture

Title: Make configuration schema flexible » Provide configuration schema for Migration module
Component: configuration system » migration system
Issue summary: View changes
Priority: Critical » Normal
FileSize
720 bytes
1.71 KB

I am fine with that.

However, I'd like to note that type: unknown does not appear in this issue as a recommendation -- which would have decreased tensions a ton -- nor https://drupal.org/node/1905070.#types here while the latter mentions type: undefined so either the patch is wrong or the documentation is. core.data_types.schema.yml uses undefined so I suspect that's the correct one. Running ag -G Typed undefined and ag -G Typed unknown pretty much nails it: it's undefined.

The last submitted patch, 13: migration-minimal-schema.patch, failed testing.

Status: Needs review » Needs work

The last submitted patch, 14: 2183957_14.patch, failed testing.

Gábor Hojtsy’s picture

Status: Needs work » Active

Yeah typing with unknown does not appear as a recommendation because it is not a recommendation. I wanted to see how the schema test is going to fail (or not) on this. The unknown type is used a a fallback type and apparently the schema iterates into the tree even if its unknown type and then fails even more. One could argue this is a feature because you get to see all fails at once instead of just seeing them level by level as you resolve them.

So defining the type as explicitly unknown does not really help with this. Obviously a theoretic option is to introduce an ignore type or something along those lines, but I doubt that gives the right idea to schema writers in general. I think the need to use an ignore type again underscores how strange it is for migrations to be config entities. You said this is for deployment but you also said you see no way a wizard/UI built for this ever and that these would not be loaded and saved back. For deployment, any other plugin yml is totally fine, local tasks, actions, etc. deploy the same way and they are hand written as well and don't need schema or exceptions in the schema format or the schema tests for them to be supported.

chx’s picture

So let me get this straight: we should write new

  1. discovery mechanism
  2. load mechanism
  3. (simple) query mechanism

just because the config schema can't skip keys?

chx’s picture

Edit: nevermind me, I will just unfollow this and every other migration issue.

chx’s picture

Status: Active » Closed (won't fix)

#2260525: Migrations should not be configuration entities is filed. I am closing this issue (and also closing the #drupal-migrate channel and altogether my work on migration).

Gábor Hojtsy’s picture

Re what used to be in #19 when you first posted it: Config schemas are not mandatory for config entities, we are implementing them as best practice. Why is that a best practice for config entities? Well, almost all of them (except migrations) are edited on a UI and are saved back. Also most of them (maybe except migration?) need translation support. When config is saved back, to avoid lots of manual code to typecast values, we use schemas to typecast them and enforce types which helps deployments include only necessary changes. Translation uses information in the schema to extract and make config available for translation.

Re #18: I guess there were good reasons local tasks, local actions, menu items, contextual links, etc. are not config entities, and they implement discovery mechanisms based on simple YAML discovery as well as their loading via the plugin system. They could have implemented themselves as config entities because why not and avoid writing some of that code. I don't claim to know all the right things, so there may not have been any good reason for them not to use config entities, but I think there were.

There are a few options here:

- Keep excluding *migrations only* in the core test that ensures all config has schema. Every single config in all their permutations will have schema except migrations. This means if there is ever going to be a tool to write schemas from code, it will need to do manual type casting for everything to ensure proper typing. You said such code is unlikely to happen anyway to write migrations so this may not be a problem. Migrations will not be translatable. If later on schemas are desired, the migration format cannot be changed due to it being an API change and any API changes needed for schemas to support the creative formats may not be possible either, so this firmly paints migrations in its pre-established corner it will stay in for Drupal 8.x.

- Keep excluding *migrations only* in the core test that ensures all config has schema. Don't expect anybody would ever write them. Add one-off custom translation support to their title only instead, ie. t() on display, potx to extract the top level title only. Again this assumes nothing ever needs to be translatabale in migrations, ie. no data needs to be included that may need to be created with localised strings. I think this is technically very simple and also very confusing, because you opt out of significant common features of configuration and do manual things instead.

- Change the migration format to conform to what we can write schema for. This would allow writing a wizard to edit/create migrations. I don't know if this is at all a realistic use case, and you said it is not. It would allow for translating the label and any possible future components that need to be translated. I don't know how realistic that use case is.

- Change config schema to support the format that migrations are using. The way I understand it, there is no data format info, so we would need to use introspection which I explained several times above is not what schemas are designed at all to do and cannot be used to enforce types. So I'm not sure this is at all possible but I only received vague examples of what might not fit with schema.

- Add ignore type to config schema and use that to ignore most of migrations for schema. This paints migrations in an even firmer corner, since once it has a schema released, it cannot have a different schema later on (API change). So it would be worse than having no schema IF there are parts that would get schemas later on. Also it sets an interesting precedent where schema authors get an easy type to pretend they have schema but they don't. I don't think that is something we want to encourage in general.

- Move migrations not be config entities. Yeah may need to write some code but it would be in line with other YAML discovery based plugins which it seems to be from how you explain it. Translate label manually, extract label for translation manually as with all other YAML discovery based plugins.

- More options I cannot think of because my mind is limited. Fill it in.

(Although looks like you already made your choice in #20. Since I spent so much time to summarise this, going to post it either way).

benjy’s picture

I guess there were good reasons local tasks, local actions, menu items, contextual links, etc. are not config entities, and they implement discovery mechanisms based on simple YAML discovery as well as their loading via the plugin system

I'd be interested to hear from someone who knows why local actions, local task etc weren't made into config entities. Maybe a comparison of that decision against migrate will make the decision to move away from config entities for us.

Gábor Hojtsy’s picture

Here is a summary as I see it. This is nowhere authoritative and please correct me where you see holes or misstatements. I think this should help make my position clearer.

* Annotation Static YAML files Configuration State
Examples Entity types, Plugin types, etc. Menu links, local tasks, contextual links, etc. Global configuration (site info), config entities (views, menus, contact categories, content types, etc.) Last cron runtime
Used to be Info hooks Info hooks Variables and custom DB tables ”Misused” variables
Target use Metainfo on something that needs custom logic, in the same file Metainfo on something that mostly does not need custom logic (although custom logic can be assigned) Site build configuration Environment specific values
Tools Annotation based discovery, alter hooks YAML based discovery, alter hooks Configuration diff/sync so changes in live/dev/staging can be synced None
What it applies to All deployments of the project the same way (except patches and local alter hooks) All deployments of the project the same way (except patches and local alter hooks) Although there is some default config, it may be deleted, edited, new ones may be created, etc. It is deployment specific. Environment specific, not synced between different environments of the same setup.
Who/what creates them Developer by hand Developer by hand, any dynamic ones are provided by PHP code (alter, dynamic providers) Defaults shipped with module, new ones saved by code, old ones changed and saved back by code From code
Translated As “software” since it uses well known “never-changing strings” As “software” since it uses well known “never-changing” strings Default config as software, since it is “never-changing”. Live config and changes to default config as itself, since it cannot be predetermined. Not applicable.

I think Migrations are targeted to be hand written, it may not be different in the dev vs. live environment by way of admin changes. They may be different by way of a code update coming. Once a module is installed, its default config is synced over to active store, and further updates to its config in its code is not synced to the active store, the site’s active config is then disconnected from what it originally was on install and lives its own life. So updates to migrations in modules would not be synced to active config either later on. Only updates made to the actual content of active config on the dev site would be synced to the live site on deployment. So migrations would need to build their own tools to resync those config entities from the install folder later on, no?

The more angles I collected for this table, the more and more confusing it became why migrations are config entities. Am I missing something very glaring? Please correct me.

Gábor Hojtsy’s picture

I also summarised what schema can currently describe to Mike Ryan over email. I think its worth recording it here as well, since its a totally general example. The key names a demonstrative for easier understanding and not to be used as-is in real life config/schemas IMHO :)

In short, schema cannot describe a data type it has no idea how to describe. If data can be either:

data: 12

or

data:
 - abc

Then schema cannot describe this. It needs some hint elsewhere or inside the data structure as to what is the type. Eg. the following works:

what_type_data_is: data_is_int
data: 12

or

what_type_data_is: data_is_list_of_strings
data:
  - abc

Then we can write schema that uses the value of 'what_type_data_is' as part of the data type name and that data type can inherit from int or list and define it as a list of strings in the later case. If there is no hint in the data, then we could only do introspection of the data, which makes it impossible to cast it to the right value, because then '0', false and 0 are resolving to string, bool and int respectively and we don't know what to cast it to, so it will not end up as number, bool or string, right?

So schema can take a hint from outside the type like that or it can take a hint from inside the type, but then it requires the type to be mapping/associative:

data:
  - value: 12
  - name: 'boo'
  - what_type_data_is: int_with_a_name

or

data:
  - date: '2014.05.08.'
  - value:
    - abc
  - what_type_data_is: list_of_strings_with_a_date

In this case the type of data is defined based on a hint *within* data. So we can reliably get that hint, it is required that a mapping/associative base type is used as a container. Other than that the container can have different keys, etc, no problem.

benjy’s picture

Thanks for the summaries.

So migrations would need to build their own tools to resync those config entities from the install folder later on, no?

Yes that is the case, I was using http://drupal.org/project/config_devel when developing real migrations to help with the syncing back but it wasn't ideal.

Gábor Hojtsy’s picture

Right, if you have manually written static YAML files that does not change on the live site and only changes with code deployment, then deployment can be easily made with the rest of the code. This is clearly mapping to the use case of YAML based things like local tasks, etc. as far as I can see. That works if migrations will not contain anything to be translated except maybe the top level title, because that can be hardcoded.

The following scenarios would be good reasons to use config entities:

- the installed version and the actual live version may be very different due to changes in the deployment
- the dev/staging/live environment copies (and instances) may be different due to admin UI changes, need syncing
- new ones are only added on install not added later on
- users are to create new ones on a UI and would export / optionally ship as part of a module later on
- translatable text may appear at various places in the tree

Given that active config is now in the database, not in files, and anyway, editing of active config is not to be done by hand, unless there is a migration wizard module to build migrations, I don't know how would migrations change through the UI. The same reason this would not happen on the UI on stage/dev either, so only code deployments would change migrations, right? No?

If that is the case, code deployments cannot change active config by design, you need that custom code to that. While code files (which YAML plugin files are) are updated with code deployments without a hitch and then with local caches updated new data is used right away. So looks like you are coding around one thing (updates with code deployments for config that it is not designed for) so you don't need to code around a different thing (grouping YAML plugin files that would need its own code)?

benjy’s picture

so only code deployments would change migrations, right? No?

Yes

If that is the case, code deployments cannot change active config by design, you need that custom code to that. While code files (which YAML plugin files are) are updated with code deployments without a hitch and then with local caches updated new data is used right away. So looks like you are coding around one thing (updates with code deployments for config that it is not designed for) so you don't need to code around a different thing (grouping YAML plugin files that would need its own code)?

Yes I agree, this makes sense to me.

I don't know how the decision was made to use config entities initially, i'm guessing because of the concept of a migration wizard. Personally, I think efforts could be better spent elsewhere, and i'm not sure how useful a wizard would be anyway.

chx’s picture

> I don't know how the decision was made to use config entities initially,

There were two things in core which bore some resemblance to the planned architecture which was a pile of plugins configured somehow -- views and blocks both of which are config entities.

It is my belief the single biggest problem of Drupal 8 is the cognitive overload, the number of patterns you need to learn and we are about to introduce a new one -- a new place to look for files, a new kind of CRUD API. I won't do that. I will let others do that which shouldn't be a problem cos noone else has a problem with that.

chx’s picture

Status: Closed (won't fix) » Active

Seems this is still under discussion.

chx’s picture

Status: Active » Closed (won't fix)

Or maybe not...

benjy’s picture

Status: Closed (won't fix) » Active

We discussed this in todays call: https://plus.google.com/u/0/events/cn5kahflq5n3kt8orp6n7hurp84 and the notes are here: https://groups.drupal.org/node/421738

Moving away from config entities would require us to re-implement quite a bit of code such as the loading/saving of migrations, querying for migration groups etc.

The biggest hinderances we discussed with staying with config entities is the development workflow which http://drupal.org/project/config_devel seems to solve quite well.

So, at this point we're brainstorming ways to provide a schema for migrations.

Gábor Hojtsy’s picture

Please update this issue as/if you have findings in that brainstorm. My main interest here was to figure out if we need to have schema API changes in core, which would then be beta blocking. If migrations are to adapt to schemas or if they don't want to have schemas or if they are not config entities, then schema API will not need API changes. Either way, any architecture decisions in migrations that may need API changes elsewhere would be beta blocking in core I think. So better figure it out sooner than later while it is still possible.

chx’s picture

FileSize
801 bytes

> My main interest here was to figure out if we need to have schema API changes in core, which would then be beta blocking.

Yes. See the attached Dictionary (hashmap, associative array, wth you want to call it). It's like a sequence but with string keys or a Mapping where I do not know the keys nor do I care for them, it's just an associative array and that's it. The Sequence is *almost* this but first() uses [0]. Also, I am not sure whether this is a bug or what but

  public function getItemDefinition() {
    return $this->definition['sequence'][0];
  }

should, in my opinion be simply

  public function getItemDefinition() {
    return $this->definition['sequence'];
  }

which then would be another API change.

Gábor Hojtsy’s picture

I think the getItemDefintion() code is a bug. I don't see anything using getItemDefintion() in config schema handling. It is used in content fields, but there is no overlap, Sequences are only used in config schema not in content at all. It uses a List type which was later introduced but did not convert Sequence to itself... Yeah. So I think this method is buggy and probably lacks tests. I don't think your use of config schema would expose this since this method does not seem to be invoked in schema handling in the first place.

As for taking the first item as a number key, I think that we use sequence type for all kinds of things where we don't care for keys and all items use the same type. I THINK we use it for key based lists where we don't care for keys. I don't have specific examples, @vijaycs85 may :) So why does that work if first() does not work? I would not be surprised that first() is not used either in config schema. A quick grep gives me the idea that we only implement that also to conform to some typed data interface but don't actually use it in schema parsing.

So looks like you found two bugs in unused code :)

chx’s picture

Issue summary: View changes
mikeryan’s picture

Sorry for my absence from this discussion - I'm currently on a semi-vacation with limited windows for work.

I'd like to step back and look at things from a higher level (not least because I don't have a full grasp of the low-level at this point;). Let's look at some of the use cases for the migration API:

  1. Upgrades from Drupal 6 and Drupal 7, which will ship with core. These migrations will be implemented in hand-written YAML files - core will not ship with a general CRUD UI for these. The upgrade UI that will be part of core (see #2200379: Upgrade UI) will prompt for credentials for the source database, and location of the source files (images/attachments/etc.). Since the file migration requires the source file location, right now the UI loads the file migration configuration entity, sets the source file location, and saves it to the active configuration.
  2. Contrib modules will be expected, in lieu of the traditional update hooks, to provide migration configuration to handle their data (plus custom plugins where necessary).
  3. A contrib module providing a general CRUD UI for migration, with at least the current functionality of Migrate 2.x's migrate_ui module on Drupal 7 - a dashboard for managing migrations and a field mapping editor - plus the ability to create migrations from scratch (such as by creating a view and specifying it as the query for a Sql source plugin). This would, among other things, allow customization of the core D6/D7 upgrade process - omitting vocabularies you don't want to keep, remapping content types and fields, etc. So, this module would be creating and deleting migration configurations, as well as modifying them from the defaults shipped in YAML.
  4. A contrib module or modules providing additional plugins (e.g., source plugins for XML and JSON).
  5. A contrib module providing upgrades/imports from D5 and D8.
  6. Contrib modules supporting imports from other systems. For example, wordpress_migrate would (as it does now) prompt for the source Wordpress site dump, offer some configuration options, and create migrations from that, which subsequently can be updated and deleted.
  7. Custom modules implementing specialized migrations (here's my bread-and-butter!). These would largely be hand-coded YAML files, with some custom source/destination plugins, but it will be handy to be able to quickly modify migrations through the UI to test things out.

So, I want to be sure the core migration API, even if it doesn't directly support all these use cases, at least provides a foundation that can be cleanly extended in contrib to support them. If configuration entities, without *too* much tweaking, will fit the bill, I'm all for that. If we need to do a custom thing here, I can deal with that too.

chx’s picture

Issue summary: View changes

Configuration entities sure support these; in fact if you ever want to save these things via a UI that's a strong case for not going custom.

About a UI, while a field mapping editor is certainly possible it doesn't even begin to scratch the capabilities of migrate; anything more useful is going to be very interesting to write a UI for; although Views UI might provide a clue to us. You'd need to pick a plugin for each field and configure said plugin which smells much like what Views does. Anyways, yes, configuration entities surely can do all that.

vijaycs85’s picture

Here is the initial approach that would work after few changes like:

  1. Introduce plugin for process
  2. Update dependencies structure (Added suggested structure as comment)

Still needs a lots of work to ask testbot to have a look. So no status change for now :)

chx’s picture

Thanks so much! I really love the new multilevel dependencies structure you suggest. One note, however:

+#    process:
+#      type: migrate.process.[plugin]
+#      label: 'Process'

Do not even try. Process is hopeless.

    process:
      type: sequence
      label: 'Process'
      sequence:
        - type: property

and be done.

benjy’s picture

I wrote a script to automatically create the new dependencies structure and updated MigrationStorage. Created a new issue here: #2263453: Split migration_dependencies into two keys

Gábor Hojtsy’s picture

Status: Active » Needs review

#2264179: Clarify use of property/undefined and add an ignore type in configuration schema was committed, you should be able to use the ignore type now.

vijaycs85’s picture

chx’s picture

thanks for the patch

core/modules/migrate/config/schema/migrate.process.schema.yml
and
+migrate_process:
+ type: sequence

are unnecessary

vijaycs85’s picture

Thanks @chx. Yep, we don't need them anymore.

chx’s picture

Not quite sure what's the next step is. Either remove the commented out parts or add them for real but we can't add commented out pieces to core.

tstoeckler’s picture

So I think some important parts have been left out of this discussion. I had discussed this problem-space rather briefly with @chx and @Gábor Hojtsy at Drupal Dev Days Szeged and came to a somewhat different resolution that has not been discussed above. Only now did I see this issue.

Contrary to what is stated above, migrations do in fact have a consistent data structure that we could write schema for. It is just that that structure is so verbose (to allow for an incredible amount of flexibility) that simple mappings i.e. from D6 variable name to D8 config name would be such a PITA to write that way that we allow a shortened, irregular structure as well. But at its heart the process part is an array of arrays of process plugins (or similar...). It is because we copy the configuration straight into the active store without processing that the irregular data structure ends up in the active store as well. There it would potentially be validated against schema (which is impossible, as mentioned above). But in fact the migrate code has to account for the multitude of possible array structures as well, so it could itself benefit from a normalized, consistent data structure.

So we could in fact solve both problems by adding some sort of "process" layer when importing configuration into the active store. Migration config entities could then normalize the process data structure into the verbose form. For that we (or at least @vijaycs85 :-)) could fairly easily write schemas and the migration code would be happy to not have to perform the normalization itself.

I'm not sure what exactly such a "process" layer would entail and if that would in fact be a can of worms, so I do not want to advertise this as the holy grail here, but it should be mentioned as a possibility at least and personally I think it would be worth exploring.

vijaycs85’s picture

Thanks for your comment @tstoeckler. We have had very looong discussion about 'process' and end up with 'ignore' as:

1. we might not have array of array all the time.
2. We don't really need granular schema for process (neither for translation nor for validation).

However, if we believe that ignore isn't smart approach and worth further investigation,IMHO, we should get this in as MVP (minimal viable product) and discuss and implement 'process'.Ideally, it will be few lines of code change and lots of new schema files/parts addition?

chx’s picture

While adding a normalizing method to the plugins are not totally impossible you will get into some trouble loading the migrations without a source connection specified. Don't forget that process is recursive thanks to the iterate plugin so some complexity is expected there. Also on export you'd end up losing the useful shorthands. And, you will win nothing cos the schema is useless in this case anyways.

Gábor Hojtsy’s picture

@tstoeckler: yeah the information we got from migrate people and @chx in particular is it is impossible to write schema for certain parts of the migration structure. Without further studying if that claim is true or not we worked from the information we had. Normalizing the migration structure in the DB just so it can apply schema sounds dangerous. Eg. schema could not be applied without the normalization process (eg. on localize.drupal.org). Also one of the main uses of schema would turn upside down. We use schema to cast values to the right types, so UI based changes do not end up with accidental type changes, so deployments only show actual changes. If we are to normalize the structure in active config, then the export would show a lot more (non-actual) changes, so the whole point of using the schema is defeated.

Looks like the only two options we have/had is to either enforce only one possible structure of migrations, which @chx claims would severely cripple the hand-writing DX of migrations or ignore parts of migrations for the schema, which is the road we have been on so far due to all the other things considered.

benjy’s picture

enforce only one possible structure of migrations, which @chx claims would severely cripple the hand-writing DX of migrations

This would surely cripple the hand-writing DX of migrations as well as meaning we'd need huge schema files to handle every single migration field mapping. It would be a maintenance nightmare IMO.

I'm all for getting this issue committed and talking about process layers and normalisation in another issue.

tstoeckler’s picture

I'm not convinced yet that we can't figure this out properly :-), but I agree that leaving process to a separate issue makes sense.

Regarding setting a bad precedent with ignore we could just add a comment in the YAML to denote that this is an edge case and do not try this at home.

Gábor Hojtsy’s picture

I'm not clear on the data model freezing of migrations given we don't consider migrations beta or RC or release blocker to begin with but for all other things at least since the schema is the data model description, changing schemas like type: ignore to a whole tree of types is IMHO considered a data model change. That is one of things I said above about committing to one schema approach for migrations, that may lock migrations in that data model/schema for a while. Once again I'm not sure when that lock may be effective on migrations given their (non-)relation to the release timeline. Just consider that when agreeing to interim schemas that may change later on. Already explained this in #21 though, so no news.

Gábor Hojtsy’s picture

BTW I did not state that to freeze discussions, I stated that to help you estimate consequences of the decision.

Gábor Hojtsy’s picture

@chx pinged me on IRC and shared this note:

[10:02am] chx: GaborHojtsy: i do not expect to find bugs so serious in migrate as to change the migrate schema
[10:02am] chx: GaborHojtsy: the architecture is very solid and havent changed in a very long time

That addresses my concern. So what about fleshing out the remaining pieces and get this done? :) Who can help?

chx’s picture

I will just very busy because there's a drupalcon the weekend and i have a client site launch on Saturday too. AAAAAAAA. benjy has a brand new v2.0 to take care of. We are a bit short on hands.

vijaycs85’s picture

FileSize
3.88 KB
2.71 KB

I guess only concern that need to be fixed is #46. Here is an update for it. We don't need those commented items as they don't have anything special and would be covered by migrate.source.*: & migrate.destination.*:

Gábor Hojtsy’s picture

Issue tags: +Needs tests

Let's add a ConfigSchemaTestBase based test then to ensure it works? :)

vijaycs85’s picture

vijaycs85’s picture

Issue tags: +LONDON_2014_MAY
Gábor Hojtsy’s picture

I would love to review this but have no idea what kind of values may be at what kind of places. All I could review is if there are typos in the type names and such. I think someone with basic know-how of the schema types and knowledge of the migration structure would be good to review. @penyaskito maybe? :)

penyaskito’s picture

Issue summary: View changes

Updated issue summary.

penyaskito’s picture

Status: Needs review » Needs work

I re-read the whole issue thread and updated some points on the issue summary.

The conflict discussion was only how to make schemas for process, which looks really painful, and probably not worth it (I think this is @chx vision too if I understood well his points). Also anyone else agreed with defering that to a follow-up if needed.

I reviewed the schemas, and from my limited knowledge on those, they make sense.
Tests are added in the patch and validate that migrations are conforming to the schema.

+++ b/core/modules/migrate/config/schema/migrate.schema.yml
@@ -0,0 +1,37 @@
+    migration_dependencies:
...
+        required:
...
+          sequence:
+            - type: strting
...
+          sequence:
+            - type: strting

There is a typo here. When fixed, I guess this is RTBC.

vijaycs85’s picture

Status: Needs work » Needs review
FileSize
10.3 KB
695 bytes

Thanks for the review @penyaskito. updated them.

penyaskito’s picture

Status: Needs review » Needs work

Oops, sorry for missing this... Shouldn't \Drupal\config\Tests\DefaultConfigTest::testDefaultConfig be modified to for removing the exception?

chx’s picture

process: ignore is correct and there's nothing else to be done, no followups, no changes, nothing.

benjy’s picture

How did the issue in #63 not cause a test to fail?

vijaycs85’s picture

Status: Needs work » Needs review
FileSize
11.11 KB
828 bytes

Oops, sorry for missing this... Shouldn't \Drupal\config\Tests\DefaultConfigTest::testDefaultConfig be modified to for removing the exception?

- Good point removing here.

How did the issue in #63 not cause a test to fail?

- Current code base, we don't strictly check and throw exception. only place is DefaultConfigTest which @penyaskito mentioned in #65.

Status: Needs review » Needs work

The last submitted patch, 68: 2183957-migration_config_schema-68.patch, failed testing.

vijaycs85’s picture

Status: Needs work » Needs review
FileSize
19.93 KB
10.95 KB

This patch has:

1. fix for all test fails
2. changed dependencies in core/modules/migrate_drupal/config/install/migrate.migration.d6_contact_settings.yml to migration_dependencies

Status: Needs review » Needs work

The last submitted patch, 70: 2183957-migration_config_schema-70.patch, failed testing.

vijaycs85’s picture

penyaskito’s picture

Status: Needs review » Reviewed & tested by the community
+++ b/core/modules/migrate_drupal/config/install/migrate.migration.d6_contact_settings.yml
@@ -15,5 +15,6 @@ process:
-dependencies:
-  - d6_contact_category
+migration_dependencies:
+  required:
+    - d6_contact_category

Detecting this is quite a good proof that the schemas are helpful.

RTBC if green.

webchick’s picture

Status: Reviewed & tested by the community » Fixed

Great work!

Committed and pushed to 8.x. w00t!

  • Commit d5bb653 on 8.x by webchick:
    Issue #2183957 by vijaycs85, chx, Gábor Hojtsy: Provide configuration...
Gábor Hojtsy’s picture

Yayyay! Now back at #2183983: Find hidden configuration schema issues to figure out if there are other systemic holes in schemas (or any minor ones for the matter :)

vijaycs85’s picture

Just for the record, my answer for @benjy question (at #67) is wrong in #68. The real reason/problem is raised by @Gábor Hojtsy at #2293063: Config schema mapping parsing is inconsistently forgiving, does not conform to the interface. When we get undefined type (in our case 'strting'), we just assume it as type 'undefined' and silently move forward.

vijaycs85’s picture

Gábor Hojtsy’s picture

You know a fantastic side effect of these schemas is once you test migrations with the schema it finds tests that are not testing the right values and migrations that are migrating to settings that don't exist, see #2293419: Add config schema test to all configuration test in migration, fix bugs (also opened by @vijaycs85).

andypost’s picture

+++ b/core/modules/migrate/config/schema/migrate.source.schema.yml
@@ -0,0 +1,388 @@
+migrate.source.d6_comment:
...
+migrate.source.d6_comment:

how does duplicated keys works?

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.