Right now, migrations (i.e., classes derived from Migration) are static - a given class has a builtin source, and can only have a single instance, so there's no way to manage sources where the class implementation is identical but there may be multiple source feeds you want to import using that class (thinking in particular here of XML sources).
The Migrate module tools instantiate non-abstract classes derived from Migration, with no arguments - at the moment, there's no mechanism to have multiple migration instances for different XML files, or for the tools to pass the right arguments for each of them. Note that having instances doesn't just mean presenting multiple entries in drush ms, but that each instance will have its own map and message tables.
What I'm thinking is we need to have migrate_status entries for each "instance", as well as map and message tables for each. Right now this is all keyed by a machine name which is the migration class name minus Migration (e.g. class BeerNodeMigration has machine name BeerNode).
OK, so looking at the code, a derived Migration class can set its own machine name after calling the parent constructor, so derived migrations could manufacture it (better to have an overrideable deriveMachineName() method, though). I think there are some spots in the drush support that would need to be modified, but it might be just that simple. Probably not, though...
Anyway, the implementing module will most likely to need to maintain a table of its own, say from XML feed URLs to machine names, and to implement, say, hook_migrations_alter(&migration_list) (called from migrate_migrations()).
I'll be working on this in the coming days...
Comments
Comment #1
mikeryanIn progress... I've got the basic support in Migrate, haven't tried implementing dynamic classes on top of it yet though. But fair warning, once this change goes in existing migration code will need to be changed.
Basically, in any of these cases, to fix existing code remove Migration from the end of argument you're currently passing.
Comment #2
mikeryanCommitted for D7, to be backported to D6.
Comment #4
mikeryanOK, that commit is a bit messed up, please don't update to the latest dev release on D7 until futher notice...
Comment #5
mikeryanOK, the D7 dev branch should be in better shape. Having some trouble with the D6 backport, though...
Comment #6
mikeryanOK, Moshe points out this still isn't quite working right. Let me explain what the issues are...
Originally, we had a one-to-one correspondence between migrations (as listed on the dashboard or drush ms) and migration classes. We required migration class names to end with Migration, and set the machine name to the class name without the Migration. Hence, the machine name for BeerUserMigration is BeerUser. This issue was about permitting parameterized migration classes, where one class might have multiple instances (listed and run separately). What I tried to do is maintain the existing behavior, while adding the ability to define migrations where the machine name and class name aren't necessarily related (more specifically, any relation is specific to the migration implementation, the migrate module itself doesn't know it). This was a bit of a struggle, because too many areas of the code assumed the class_name=machine_name.'Migration' pattern, but I seemed to be close. The problem is, though, that essentially we're supporting two kinds of migrations - those that follow the pattern, and those that don't. And, given just a machine name, we can't easily tell which kind it is.
So, I'm thinking it's time to ditch the Migration suffix - no longer make it special. The default machine name will be the class name - if you need to do "dynamic" migrations, or want a different machine name, you can implement hook_migrations_alter() (implemented as part of this issue). I think in the long run that will be cleaner, but in the short run the transition may be a bit painful for those of us already using Migrate 2. I expect I can do it without break existing migration code, but you'll find that the Migration suffix now appears where it previously didn't, and you will have to include it in drush commands (i.e., drush mi BeerUserMigration instead of drush mi BeerUser).
Thoughts?
Comment #7
mikeryanOh, and again - the current dev D7 branch is broken, using specific migration names in drush commands doesn't work (i.e., you can do drush mi --all but not drush mi BeerUser). If you're on beta2, or pulled from CVS before my 12/10 commit, stick with what you've got until I've got this all worked out.
Comment #8
das-peter commentedSubscribe
Meanwhile I use following hack to work with named migrations. Not sure if this breaks something other e.g dynamic migrations.
Comment #9
mikeryanI've committed changes per comment #7 - Migration is no longer a special suffix, machine names match class names by default, etc. This seems to work fine at the command line (at least better than it did yesterday), but the tests are hosed in a very strange way (running the Migrate tests I get the message "Class 'ImageToolkitTestCase' not found" - I'm not running image tests!) - I will investigate tomorrow.
A reminder that I haven't touched the D6 branch yet - I will backport once D7 is all straightened out.
Comment #10
mikeryanGrrrr... More problems, I'm going to think out loud here...
So, the original (pre-dynamic) mechanism for finding out what migrations were available was to browse class names from Drupal's code registry for names ending in Migration, create a ReflectionClass for the name, and identify whether it's a non-abstract subclass of MigrationBase. By removing the Migration suffix restriction, we instantiate a ReflectionClass for every class in the registry. Problem: broken classes can appear in the registry (the registry just parses class names from declarations, it doesn't pick up any number of potential things PHP won't like), and even instantiating the reflection object for the class (not an actual instance of the class) can generate errors. When I first ran into this I setup an error handler to catch these, but the error handler doesn't catch fatal errors.
Now, these errors do reflect actual problems in classes elsewhere in the Drupal install. One approach may be to let it go - fix the broken modules. But that leaves the migration code held hostage to other modules - if this were rare, it might be practical, but just in this one environment there are two modules affected; who knows how many others might be out there? And, the approach is inherently a performance concern anyway - it was one thing to narrow down the field of search by naming conventions, but now it's scanning several hundred classes in the registry; not just for Migration classes, but in separate calls for handlers.
So, I'm now thinking we really need to replace auto-discovery of classes with an explicit registration process. Already dynamic migrations need to explicitly add themselves to the migrate_status table - we might as well extend that process for all migrations. So, when you implement a new migration, rather than simply declare the class and clear the cache, you would write an update function to register the migration (and de-register it in hook_uninstall). It is more work, but I think it is a cleaner way to manage the classes. It also will address an issue that's looming for me - being able to override a non-abstract migration class and control whether the overridden class can be used on its own, or be completely effaced by the override.
Or, rather than having a push model (register/de-register in update hooks), we could have a pull model - implement hook_migration and add your valid migrations there, plus hook_migration_alter for fancy stuff.
And keep in mind, we need to deal with this not just for the Migration classes, but the handler classes as well.
Thoughts? Speak now, because I need to deal with this today...
Thanks.
Comment #11
mikeryanOr, support both push and pull - pushing adds migrations to migrate_status, and migrate_migrations() returns everything in that table plus anything added via a hook...
Comment #12
mikeryanOK, I've discussed this with Moshe, who was concerned with the additional housekeeping burden on migration implementors. What we've come up with is per-module registration. The fundamental problem with discovering relevant classes is the limited information available in the Drupal code registry - basically, the name and the defining module. We want to narrow the number of classes reviewed for matching for two reasons - performance, and the fact that sometimes registered classes blow up the ReflectionClass constructor. The original means of doing this was by requiring suffixes - Migration for migration classes, Handler for general handlers, FieldHandler for field handlers. That's limiting for the migration implementor, and just plain ugly.
So, let's look at that module column in the registry. If we knew which modules implement migration classes, we could scan only registry entries for those modules. How do we know which modules those are? Well, the simplest way is a hook whereby a module reports, say, what API version of Migrate it's built against. And, it happens that we had such a thing in Migrate 1, which we haven't addressed yet for Migrate 2.
So, that's the current plan - modules implementing migrations will implement
Migrate will pick up the names of the modules implementing this hook (and reporting an API version of 2), and filter the registry query on that list.
Comment #13
mikeryanThis is close now - it appears to be working fine in my development environment, but I'd like to do some more testing before committing (probably tomorrow).
Key changes that will be seen in beta3 from this work:
Comment #14
fangel commentedMike, if you want to let me know when the changes arrive, I'll happily test them out against the migrations I'm working on. I might check out a bit of the code too, but I don't know the the code of Drupal 7 or Migrate 2 well enough to catch complicated errors..
-M
Comment #15
mikeryanThe code is committed to CVS, for both D6 and D7, thanks.
Comment #16
mikeryanCommitted this past weekend... No one's screamed yet.
Comment #17
fangel commentedYeah, got around to testing -dev today. Works fine.
I take it there is no UI for adding new dynamic migrations? So the way to use those atm, is add new rows to the migrate_status table, or what? (Also, if so, migrate_status isn't the best-fitting name is it? Shouldn't it rather be something more generic like migrate_migrations?)
Comment #18
mikeryanYes, the implementing class needs to add to the migrate_status table using MigrationBase::registerMigration(). For example, in the blog migration module I'm working on now, when an XML file is uploaded a migration machine name is generated from the blog title and used to register (and subsequently run) the migration.
On the table name, I've had enough refactoring for now:-). The table does track the status of migrations, so the name still does fit, but you're right that optimally migrate_migrations or migrate_registry might be best.
Comment #20
apotek commentedJust curious whether this change also moves the map tables to the SOURCE db rather than the destination db.
I have made the changes to my codebase that are detailed in comment #13, but I'm crashing out on my migrations now because I get SQL errors.
WD php: PDOException: SQLSTATE[42S02]: Base table or view not found: [error]
1146 Table 'sourcedb.migrate_map_fooms'
It prefixes the source DB name now rather than the destination. I'll have to look through and see if I'm modifying the map at any point but just wanted to log this issue here, since this might affect other users who are moving from a late 2010 version of migrate to 6.x-2.3
Okay, got it. Never mind:
Comment #21
mikeryanMap tables are created in the destination database by default, but if you pass the connection key (e.g., 'legacy') of your source database as the fourth argument to the MigrateSQLMap constructor you can have them created there. Also be sure that if the map tables in your destination table cannot automatically join to the source query, you set the 'map_joinable' => FALSE option in your source constructor.
Comment #22
apotek commentedThank you. This makes it very clear.