Follow-up to #2462233: Migrations should not use the configuration entity system (borrowing some of alexpott's commentary)
Problem/Motivation
The migrate_drupal module, when enabled, installs almost 100 configuration entities supporting migration from Drupal 6. When Drupal 7 support is committed, that will double. And eventual contrib support for migration from other Drupal versions, if it follows this model, will inflate the numbers further. This presents the following problems:
- As installed, these configuration entities are incomplete - they cannot be actually used without adding configuration for the source database.
- On any given site, only a subset of the config entities will ever be used. In a straight upgrade scenario, only those applicable to the source site's Drupal version are needed, and among those only the ones with applicable sources and destinations (e.g., the book-related config entities are only needed when the book module is enabled on both the source and destination sites).
- They pollute config export - putting them into your site config git repo - does that make sense
- Contrib tools for managing migrations will operate on fully-configured (runnable) migrations - if active configuration is full of what are really hypothetical migrations, they will need to deal with filtering them out somehow.
Proposed resolution
Implement a configuration template system - a means of managing default configurations which are not in the site's active config but can be discovered and used to build configuration entities that will be further configured and saved dynamically to active configuration. Use this template system in migrate_drupal.
The way I visualize this working with the upgrade UI is:
- User enters credentials for a Drupal database, and the site address (needed for finding public files), and submits the form.
- migrate_upgrade identifies the version (let's say Drupal 6) and queries the template system for relevant templates (currently, that means those with migration_tags containing "Drupal 6").
- (postponed) It creates a migration group containing the db credentials, and constructs the migration configuration entity for each template pointing it to the group.
- At least for now (pending either group support or a state-based means of saving db credentials), the db credentials are merged into each migration.
- The site address is set as the source_base_path for any file migrations.
- For each migration, if its requirements are met (e.g., for the d6_book migration, the book module is enabled in both the source D6 and destination D8 sites), it is saved to active configuration. Otherwise, it is discarded.
- The migration process is run for all migrations in active configuration assigned to the newly-constructed group.
- Once the migrated content is validated, migrate_upgrade and migrate_drupal can be uninstalled, uninstalling the associated configuration entities.
Remaining tasks
- Proposed solution is awaiting RTBC and commit.
User interface changes
N/A
API changes
Service migrate.template_storage added, with a findTemplatesByTag($tag) method which retrieves all templates matching the specified tag (having it under the 'migration_tags' key) indexed by template name with values consisting of the parsed template.
ConfigInstaller::validateDependencies() made public; parameter $additional_config added (names of templates being instantiated in the same operation, so config dependencies among them are resolved) - this allows consumers of the template service to validate prospective migration entities before attempting to create them.
Comment | File | Size | Author |
---|---|---|---|
#57 | interdiff-2463909-29-57.txt | 39.68 KB | phenaproxima |
#57 | 2463909-57.patch | 85.12 KB | phenaproxima |
#53 | interdiff-2463909-39-53.txt | 775 bytes | phenaproxima |
#53 | 2463909-53.patch | 51.12 KB | phenaproxima |
#39 | interdiff.txt | 3.52 KB | mikeryan |
Comments
Comment #1
dasjowould #2431275: Move optional configuration in config/install to config/optional be a solution for this?
i understand that we don't want to have all possible migrations pollute the configuration system. basically unless you decide you want to migrate something (being d6 config, d7 config, ...) the system shouldn't add active configuration.
Comment #2
mikeryan@dasjo: The optional directory doesn't really address this - it's not dynamic enough. For example, one use case which I didn't mention is instantiating multiple migrations from one template - e.g., in a custom Drupal-to-Drupal migration, instead of one d6_node migration that gets all the nodes across all the content types, you would want to instantiate an instance for each specific content type you want to migrate, customized with field mappings specific to that content type.
In the Migration class:
So the usage would be something like:
Comment #3
chx CreditAttribution: chx at MongoDB commentedThe database credentials ought to go into state and injected on load time instead of saving them into every migration. We already have a query system for config entities. Once D6/D7 is migrated, uninstall the module, your migrations are gone. I thought we already agreed on all this? Why do we have another issue?
Comment #4
mikeryanDatabase credentials would go into the group, not into every migration (what does go into every migration is a reference to the group).
Yes, we do have a query system for config entities which I'd love to leverage - without having to put every potential migration into active configuration.
https://www.drupal.org/node/2456261#comment-9790273 may provide better context. My main concern here is category #2 ("framework for a class of migration") - I'm more interested in migrate_drupal as a framework for general Drupal-to-Drupal migrations than in the one-click upgrade path. I want it to be easy to take the default configurations from migrate_drupal and instantiate them multiple times (when importing multiple sites, within site having multiple custom node migrations, etc.). Similarly for other frameworks - wordpress_migrate provides default mappings that can be configured for particular import scenarios.
Comment #5
mikeryan@benjy points out that @alexpott's patch at https://www.drupal.org/files/issues/2462233.6.patch should point the way towards implementing queryability of templates. Don't know if I'll get a chance to work on it myself soon, I definitely won't have time for D8 work next week, so if someone else wants to take a crack at it...
Comment #6
mikeryanGetting close on this... The approach I'm working on is based on the assumption that we don't really need full-featured queryability here, that for the migrate_drupal use case (and for use cases I can anticipate in the contrib/custom world) we only need to query on the presence of a single tag (e.g., 'Drupal 6' or 'WordPress'). So, the changes you can expect to see would be:
getAllTemplates()
method to pick up YAML config from module migration_template directoriespublic static function templateQuery($tag)
to the Migration class, calling that getAllTemplates and looking forin_array($tag, $template['migration_tags'])
That part seems to work fine on its own, working on making use of it in migrate_upgrade, got a bit of a chicken-and-egg issue dealing with dependencies that I probably won't get back to til Monday.
Comment #7
mikeryanHere's a patch I have working with migrate_upgrade now - the actual code isn't all that much. Modifying the existing tests to work with this will be a bit more...
Comment #8
mikeryanUpdating the tests wasn't nearly as hairy as I feared (i.e., I didn't have to edit all of them). Ready for review.
Comment #9
mikeryanA patch making use of this functionality for the UI and migrate-upgrade drush command will be posted at #2506501: Support migration templates.
Comment #10
phenaproximaShould this be on the Migration class? I feel like this maybe should go on the MigrationStorage handler, since it's about loading meta-migrations. Then again...Migration::load() exists, but that's to load a single migration, whereas this method has a wider reach.
I also feel like this should be renamed. I find templateQuery() kind of vague. Maybe searchTemplatesByTag() or similar?
Except for the config.storage service, all the arguments of this call are ExtensionInstallStorage::__construct's defaults. Why repeat them? The constructor only calls parent::__construct() anyway, so another option is to just delete ExtensionInstallStorage's constructor entirely.
If we keep this method for some reason, the doc comment should be {@inheritdoc}.
Comment #11
mikeryanWould it make sense to go in ExtensionInstallStorage? I have no strong opinion on where it should live, whatever the community thinks...
templateQuery() does seem more like a more full-featured query function, doesn't it? OK, maybe findTemplatesByTag() (I'd rather find than only search;)?
True, we can at least let the last two args default.
The fact that $directory in particular has a default value implies it may sometimes be omitted, and if we construct migrate's ExtensionInstallStorage without the argument we want to make sure it has the right directory default. I'm not sure if that is something that happens IRL though...?
Comment #12
ultimikeI chatted with mikeryan and phenaproxima on IRC about this patch, and I'm definitely impressed at the small size of it. Quite elegant. Great job, Mike - we're lucky to have such big brains working on this stuff.
I'm glad that this won't add any additional overhead (just slightly different overhead) to the use case of having to customize an existing migration config while also enabling future extensibility for custom migrations.
I much prefer
findTemplatesByTag()
overtemplateQuery()
.I'm +1 for this!
-mike
Comment #13
chx CreditAttribution: chx commentedWhat happened to groups? The issue summary mentions them quite a lot, #4 said:
> Database credentials would go into the group, not into every migration (what does go into every migration is a reference to the group).
but that's the last time any issue comment or patch contains the word "group". So either the patch is incomplete or the issue summary needs upgrade or the issue should simply be won't fixed (which I am not doing since I am not a migrate maintainer, I am just expressing a preference).
Comment #14
benjy CreditAttribution: benjy at CodeDrop commentedI agree with #10.1 we should move tempalteQuery() to MigrationStorage. From there I think we should instantiate the ExtenstionInstallStorage in the constructor and inject the rest of the dependencies like the config storage.
#10.2, The constructor is overridden so we can provider our own defaults, see the MIGRATION_TEMPLATE_DIRECTORY is the default now.
The last three params are all defaults.
I wonder if we should use 'migration_template' inline with core that has install, optional and schema.
Will this do discovery of themes, profiles and modules? Eg, could we make it faster with a cut down version?
This could be replace with:
list($provider, $name) = explode('.', $full_name, 2);
It would be nice to have a few more migration templates in the test module, and then test that the templateQuery() is actually doing some filtering. Right now it could just be returning everything and this would still pass.
Other than that it's looking pretty good, amazing how simple it turned out!
Comment #15
mikeryan@chx: About groups, #2302307: Support shared configuration between migrate groups was postponed and the group support moved to migrate_plus, it was tough to make the case without some concrete usage of groups. Once the templates are in core and migrate_upgrade is updated to use the templates, I'll make a follow-up patch for migrate_upgrade which will use groups and hopefully demonstrate the utility. Then we can revisit getting the groups into core.
@benjy:
I don't see how these would be applicable to templates. Templates work on a pull model - an application will request templates with a particular tag - so I don't see the install/optional distinction being meaningful. I don't see how schema would be used here - templates will contain a subset of the migration schema.
Yes, it appears it will scan themes and profiles as well as modules. It's not too hard to imagine providing migration templates as part of a profile, so I think that should kept. Given that the number of themes in a typical site is far fewer than the number of modules, and that this will typically be a one-time operation (when configuring a set of migrations, which then will be manipulated without rescanning for templates), I'm inclined to treat this as premature optimization and leave it to be addressed later if it does seem to be a problem.
I'll apply the rest of yours and phenaproxima's suggestions and have a new patch today, thanks!
Comment #16
mikeryanHere we go!
Comment #17
benjy CreditAttribution: benjy at CodeDrop commentedChanges look good to me.
Comment #18
phenaproximaLooks good and makes sense. Testbot approves, so *taps patch on the shoulders with a sword* I declare this RTBC.
Comment #19
alexpottLet's call this MigrationTemplateStorage
Lets move this to MigrationTemplateStorage and make that a service so this method can be used. No point mixing template storage with the real storage.
I'm still concerned that version is in a tag as I think version is something more fundamental about a migration than a tag. I.e. it is says what can this migrate from.
Comment #20
phenaproximaHere's a patch addressing #2-#4 :) The service is called migrate.template_storage.
Sorry about the gigantic-ness of the patch; for some reason my git did not pick up the renames, and instead seems to think that the contents of config/optional were deleted, and then re-created entirely in migration_templates.
Comment #21
phenaproximaAn easier-to-review version, after fixing my git config (thanks @benjy!)
Comment #24
phenaproximaLet's try that again...
Comment #25
benjy CreditAttribution: benjy at CodeDrop commentedLets not use ExtensionInstallStorage in the Migrate full test, we have our own service now.
Comment #26
phenaproximaFixed!
Comment #27
mikeryanWith the name change to MigrateTemplateStorage, we don't need the "as BaseExtensionInstallStorage" anymore. Otherwise, this patch looks good to me.
It doesn't break anyone doing custom migrations, if they've got their migrations defined in config/[install|optional] they'll still work just fine. Is anyone working on non-core Drupal migrations that depend on the automatic installation of the migrate_drupal migrations?
Version is specific to the migrate_drupal migrations, while the API being added here is meant to be generic. Maybe an alternative, without touching the API, would be for the migrate_drupal migrations to have a tag of 'Drupal' and source.version of 6, 7, etc., and do the version filtering after obtaining all the 'Drupal' templates?
Comment #28
phenaproximaI'd even go a step further. Migrations, as we know, aren't just Drupal to Drupal; they might also be WordPress to Drupal or Typo3 to Drupal. I think the version indicator should be called "api" or "platform" or something, and its value should be a meaningful constant like
drupal-6.x
,drupal-7.x
,wordpress-3.x
,typo3-2.x
or what-have-you. I envision this key as being similar to hook_views_api()'s "api" key, only broader.Example:
This isn't really a change in functionality -- we could filter on migration_tags -- but a change in semantics. I think it's a good idea for that reason alone.
Comment #29
phenaproximaFixed!
Comment #30
mikeryan@phenaproxima: Are you suggesting findTemplatesByTag() should be matching against this 'platform' key instead of 'migration_tags', or that the platform key would be something additional interpreted solely by the application (migrate_drupal, wordpress_migrate, etc.)? Although the applications we're thinking about now fit that pattern of migrating from a "platform", I think the core API should be as simple and generic as possible and not make this assumption about how templates will be used.
Comment #31
benjy CreditAttribution: benjy at CodeDrop commentedI do wonder if this and maybe even findTemplatesByTag() should cache the results on the class? They do YAML discovery after all which is probably expensive although this might not be called all that often.
@alexpott, what do you think? The rest is RTBC for me.
Comment #32
mikeryanYamlDiscovery caches the parsed templates itself, so we don't pay that price on every call to getAllTemplates().
Comment #33
mikeryanAlso, in the typical use case findTemplatesByTag() is called once, period, in like the lifetime of the site. E.g., in the upgrade scenario, the upgrade process calls it once to retrieve the templates, configure them, and save them as config entities, and from then on only deals with the entities.
Comment #34
phenaproximaThat being answered, it looks like this is RTBC as per #31.
Comment #35
mikeryanThe corresponding migrate_upgrade patch has been updated to reflect the latest core patch: https://www.drupal.org/node/2506501#comment-10054228
Comment #36
alexpottHmmm looking at this i think we should do this differently. I think we should be using the ConfigInstaller here and setting the source storage to be the
migrate.template_storage
. We should set something on the migrate.template_storage so that listAll() only returns the migrations which have the desired version. That way migration config dependencies are properly used when they are in the template.Comment #37
mikeryanCalling ConfigInstaller to directly install config from templates is a model that would only work in the tests - in real-world applications, we need to modify the config before installation. Alex suggests making ConfigInstaller::validateDependencies() public static so applications can, well, validate the dependencies - the problem I'm running into with that approach is that, while validateDependencies() itself doesn't reference $this, its $enabled_extensions and $all_config parameters are populated from protected ConfigInstaller methods (getEnabledExtensions, getActiveStorages), preventing us from simply calling ConfigInstaller::validateDependencies from outside the class. Making those public just so we can pass them to validateDependencies seems a bit much... Perhaps better would be a public wrapper to validateDependencies() which passes that info transparently.
Comment #38
mikeryanNaming things is hard :-(.
Here's a patch adding public validateDependenciesWrapper() to ConfigInstaller - please, offer a better name if you can.
Comment #39
mikeryanForget the wrapper, have validateDependencies() pick up those bits itself if they're not provided...
Comment #40
phenaproxima+1 for that. The patch looks good to me.
Comment #41
chx CreditAttribution: chx commentedFor one last time: this is wrong.
> As installed, these configuration entities are incomplete - they cannot be actually used without adding configuration for the source database.
Put the database credentials in state. This is a classic case of state: if you have a development site it is likely the credentials will point to a development version of the source site and the production site will have production credentials.
> They pollute config export - putting them into your site config git repo - does that make sense
How is it better to put the installed config entities complete with db credentials in config export when, as I pointed out above, production will likely need different credentials?
Comment #42
phenaproximaI agree with @chx about the DB credentials being run-time state information, but that's not a reason to hold up this patch. There is nothing in the patch, as written, which forces the DB credentials to be in the migration configuration. Yes, that was the original imagined use case, but a template system would be useful for other reasons (I'm interested in generating migration configurations ahead of time, based on the source data -- this would clear up a lot of complex, knotty code).
It's not wrong.
Comment #43
mikeryanDB credentials were never the only imagined use case, I just thought that was simplest example to make... In migrate_upgrade we're also using templates to merge source_base_path obtained from the UI/drush command into the file migrations. Down the road, the templates will make it possible to generate migrations d6_node:article, d6_node:blog, etc. from a single d6_node template, to have migrations from multiple sites generated from the same templates, ... All without having templates which haven't been used cluttering up active configuration, which was the original motivation for this issue.
Comment #44
mikeryanUpdated issue summary to reflect the current proposed implementation and use case.
Comment #45
phenaproximaTo me, that is also state info since it could vary from environment to environment, but that's another discussion.
Comment #46
phenaproximaSince @mikeryan wrote 95% of this patch, I have it on good authority (Moshe) that it's ethical for me to RTBC it. It looks good to me as of #39.
Comment #47
chx CreditAttribution: chx commentedThen you need to write a new IS because the current one is DB credential focused.
Comment #48
mikeryanI already rewrote the issue summary, which now has only a glancing reference to the credentials (until someone gets in a patch to put them into state, migrate_upgrade puts them in configuration). This patch was never about database credentials, that's simply an example I provided to illustrate merging configuration into templates that I did not realize would be controversial, and what to do with the credentials should not derail the template patch.
Comment #49
phenaproximaUpgrading to major. This is blocking work on the core migrate UI, as well as work on CCK/field handling for D6 and D7.
Comment #53
phenaproximaHEAD done broke our patch! This one regenerates the ConfigInstaller proxy, since the patch changes ConfigInstallerInterface.
Comment #54
phenaproximaSince the previous patch was just a boilerplate thing and didn't actually change any template-related code, I feel comfortable setting this back to RTBC.
Comment #55
alexpottI think I derailed this issue around #36. I incorrectly treated templates as whole configuration entities. Since looking at the changes and discussing further plans for templates in IRC with @mikeryan, @phenaproxima and @neclimdul I think we should take the approach in #29 with one change. I don't think that templates should contain config dependencies. Config dependencies are part of configuration and not the migration system.
Comment #56
alexpottDiscussed with @mikeryan and @phenaproxima in IRC and we're all happy with the proposal in #55.
Comment #57
phenaproximaFixed as per #55. None of the templates have the dependencies key anymore, and MigrateFullDrupalTestBase will no longer try to verify dependencies.
Let's get this thing in. :)
Comment #58
benjy CreditAttribution: benjy at CodeDrop commentedLooks good to me, nice to get rid of all those config dependencies.
Comment #59
alexpott@benjy - well we're not getting rid of the config dependencies - they are calculated from the templates when we convert them into Migration configuration entities.
Committed 74d72d1 and pushed to 8.0.x. Thanks!
I still feel a CR would be useful for people who have built migrations from their contrib or custom modules.
Comment #61
benjy CreditAttribution: benjy at CodeDrop commentedYeah I understands that, but, out of site out of mind :) nicer for those of us who will hand write migrations