I'm looking at doing a D6->D7 migration with a web service component for export so we can move changes across continuously until the final switchover. The number of ids gets pretty crazy (there are 330,000+ nodes and 540,000+ users), so even just getting a listing of ids turns into a lot of data. What I'm probably going to do is subclass MigrateListJSON and override getIdList() to use a page parameter and keep making requests until it has all the ids.
It occurred to me that it might be more elegant to modify MigrateList's interface so that getIdList() could return an iterable object of ids, rather than just an array. Since an array is iterable too, it would be backwards compatible. This would allow the iterator to fetch a page of values at a time rather than all at once. Going a page at a time, even if the pages are large (say 10,000 ids), would still use less memory than fetching the entire list. I think the only real change would need to be in MigrateSourceList::next(), where it does:
while ($this->id = array_shift($this->idList))
It could go to something like
foreach ($this->idList as $id) {
  $this->id = $id;
  // ...
}
Thoughts?
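To make the paging idea concrete, here is a minimal sketch rather than anything from Migrate itself. The names `paged_ids()` and `$fetchPage` are hypothetical; `$fetchPage` stands in for the web service request and is assumed to return an empty array once the pages run out. It uses a generator, which needs PHP 5.5 or later, so treat it as an illustration of the shape rather than a drop-in for the PHP versions D6 sites typically run.

```php
<?php
/**
 * Lazily yields ids one page at a time.
 *
 * $fetchPage is a hypothetical callable: given a zero-based page number it
 * returns an array of ids, or an empty array when there are no more pages.
 */
function paged_ids(callable $fetchPage) {
  for ($page = 0; ; $page++) {
    $ids = $fetchPage($page);
    if (empty($ids)) {
      // No more pages: end the generator.
      return;
    }
    foreach ($ids as $id) {
      yield $id;
    }
  }
}

// Stand-in for the remote service: 25 ids served 10 per page.
$all = range(1, 25);
$fetchPage = function ($page) use ($all) {
  return array_slice($all, $page * 10, 10);
};

// The consumer only ever sees one id at a time, as in the foreach above.
$seen = array();
foreach (paged_ids($fetchPage) as $id) {
  $seen[] = $id;
}
```

Only one page of ids is held in memory at any point, which is the saving the proposal is after.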
Comment | File | Size | Author |
---|---|---|---|
#6 | 0002-Issue-1268070-by-drewish-Let-MigrateList-getIdList-r.patch | 4.97 KB | drewish |
#5 | migrate_1268070.patch | 4.97 KB | drewish |
#4 | migrate_1268070.patch | 4.82 KB | drewish |
#3 | migrate_1268070.patch | 2.37 KB | drewish |
Comments
Comment #1
drewish commented:
Ah, I just started thinking about the need to be resumable. I think we could probably just test the result of the getIdList() call and, if it's an array, call
new ArrayIterator($this->idList);
then in the loop use the Iterator interface.
Comment #2
mikeryan commented:
Sounds like a good idea. Too late to get into 2.2, which I plan on cutting very soon, but let's put it on the list for 2.3.
Comment #3
drewish commented:
Here's kind of what I'm thinking. It doesn't seem like a good idea to expose getIdList() since no one's calling it, so I killed that off.
One downside is that the iterator class couldn't make use of other parts of the MigrateList. In my case I was trying to extend MigrateListJSON and found I needed to be able to call getIDsFromJSON(). So I ended up having my list implement Iterator so I could do it in one place.
It's also kind of odd because we don't care about keys, so you end up implementing key() and ignoring it.
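As a rough illustration of that shape (this is not the class from the patch), a list that implements Iterator itself might look something like the sketch below. `PagedIdList`, `as_iterator()`, `take_ids()` and `$fetchPage` are all hypothetical names, and the return types assume a current PHP version. It also shows the array-to-ArrayIterator normalization, and one reading of "resumable": since foreach calls rewind() on an Iterator, the consumer drives valid()/current()/next() directly so the position carries over between calls.

```php
<?php
/**
 * Hypothetical id list that pages through a source and implements Iterator.
 */
class PagedIdList implements Iterator {
  protected $fetchPage; // callable: page number -> array of ids
  protected $page = 0;
  protected $ids = array();
  protected $offset = 0;

  public function __construct(callable $fetchPage) {
    $this->fetchPage = $fetchPage;
  }

  protected function load($page) {
    $this->page = $page;
    $this->offset = 0;
    $this->ids = array_values(call_user_func($this->fetchPage, $page));
  }

  public function rewind(): void {
    $this->load(0);
  }

  public function valid(): bool {
    return isset($this->ids[$this->offset]);
  }

  #[\ReturnTypeWillChange]
  public function current() {
    return $this->ids[$this->offset];
  }

  // Iterator requires key(), but nothing here ever looks at it.
  #[\ReturnTypeWillChange]
  public function key() {
    return NULL;
  }

  public function next(): void {
    $this->offset++;
    if (!isset($this->ids[$this->offset])) {
      // Current page exhausted: fetch the next one (empty means done).
      $this->load($this->page + 1);
    }
  }
}

// Accept either a plain array or an Iterator.
function as_iterator($idList) {
  return is_array($idList) ? new ArrayIterator($idList) : $idList;
}

// Pull up to $limit ids without rewinding, so a later call resumes.
function take_ids(Iterator $it, $limit) {
  $out = array();
  while ($limit-- > 0 && $it->valid()) {
    $out[] = $it->current();
    $it->next();
  }
  return $out;
}

// Stand-in for the remote service: 25 ids served 10 per page.
$all = range(1, 25);
$list = as_iterator(new PagedIdList(function ($page) use ($all) {
  return array_slice($all, $page * 10, 10);
}));
$list->rewind();
$first = take_ids($list, 12);  // crosses the first page boundary
$rest = take_ids($list, 100);  // picks up where the first call stopped

$plain = as_iterator(array(7, 8, 9));
$plain->rewind();
$fromArray = take_ids($plain, 2);
```

Keeping the paging inside the list class is what lets it reuse the list's own helpers, at the cost of the slightly awkward key() stub.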
I'm wondering if I should just be implementing MigrateSource myself... I wonder if there's a way we can make that process easier.
Here's my class:
Comment #4
drewish commented:
Added support to MigrateItems for working with an Iterator.
Comment #5
drewish commented:
Had some mixed capitalization of idList and idlist.
Comment #6
drewish commented:
Here's a re-roll.
Comment #7
mikeryan commented:
Committed for D6 and D7, thanks!