Problem/Motivation
I am attempting to migrate roughly 300,000 files from Drupal 7. As I run the migration import, I hit this:
Memory usage is 435.23 MB (85% of limit 512 MB), reclaiming memory. [warning]
Memory usage is now 439.99 MB (86% of limit 512 MB), not enough reclaimed, starting new batch [warning]
What's interesting is that on the first run I got about 70,000 files in one go before hitting the wall; then it halved, then halved again, and now I'm down to fewer than 1,000 per run before hitting the memory limit.
Proposed resolution
Figure out what's causing the memory usage to be so high.
Remaining tasks
- Figure out what the problem is
- Write a patch
User interface changes
N/A
API changes
N/A
Data model changes
N/A
Comments
Comment #2
davidwbarratt commented
Comment #3
mikeryan
The memory is almost certainly being sucked up by the entity cache, I would think. MigrateExecutable::attemptMemoryReclaim() does the following:
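The quoted code was not preserved in this thread; paraphrasing the core method of that era (details vary between versions), it does roughly this:

```php
protected function attemptMemoryReclaim() {
  // Reset Drupal's static storage; this frequently frees enough memory to
  // continue.
  drupal_static_reset();
  // Entity storage caches can balloon, so clear every entity type's cache.
  $manager = \Drupal::entityManager();
  foreach ($manager->getDefinitions() as $entity_type_id => $definition) {
    $manager->getStorage($entity_type_id)->resetCache();
  }
  return memory_get_usage();
}
```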
It sounds like this is somehow failing to actually reclaim the entity storage...
Comment #4
mikeryan
OK, I devel-generated a bunch of stuff in my local D6 site copy and ran the migration against it with a touch of instrumentation. I ended up with massive quantities of comments, and I'm seeing:
So entity cache reclamation in general seems to work fine; what you're seeing appears to be file-specific. I doubt the migration itself is leaking memory; I suspect the leak is elsewhere, but it needs a deeper dig with massive quantities of files (oh, where is drush generate-files?).
Comment #6
swentel commented:
I've seen this happening as well. The first 40k files go very fast, but after that it slows down to the point where it's simply not useful anymore. I haven't actually checked whether this is a problem with the entity cache; that might be a suspect too (*). For some reason I was focusing on the migrate map and message tables: the more records go in, the slower it becomes, and I suspect the source_ids_hash column, which also acts as an index, but I can't confirm that for now. I've started working on a patch for migrate upgrade in #2708723: Allow to run different background processes, which does two things:
- allow drush to run different background processes in the background using an offset
- use temporary tables: after each chunk, the data is moved to a temporary table, and moved back once the plugin has finished.
Using that technique I was able to import 400k files in a little over 70 minutes.
I will do a test that doesn't trigger separate background processes but instead stays in the same while loop drush is in, while still moving the data from the migrate tables to a temporary table.
(*) Although in my case the actual files aren't found (so far I'm just trying a pure data migration), so no files are being saved, only records in the migrate map and message tables noting that the file wasn't found.
Comment #7
catch
Re-titling this and bumping to major. Anecdotally, from IRC this does sound like it's specific to the file migration.
Comment #8
Anonymous (not verified) commented:
I have run into issues like this in Drupal 7 in the past, and they are tough to debug. While debugging I found some platform issues beyond Drupal (PHP bugs, I believe), but in the end it turned out to be an infrastructure issue (a firewall vs. replication servers out of sync). In any case, we want to pin down the cause so we can prevent it in the future.
@davidwbarratt and @swentel can we get details on the environment you are running the migrations in?
Specifically:
- what PHP version? and,
- what host OS?
Comment #9
swentel commented:
@Ryan Weal
I'm running the migration (for now) on my local machine, Ubuntu 15.10 - php 5.6.11-1ubuntu3.3
Comment #10
mikeryan
@swentel: were you seeing the memory leak reported in the original issue, or just the slowdown?
Comment #11
berdir
We've seen something similar when migrating a large number of files. It only happened for files, not for nodes.
Given that file migrations involve downloading *lots* of files and putting them in the local file system, I can totally imagine that this is outside of our control and something in PHP itself or even lower.
What helped for us was just processing a few hundred items and then doing that in a bash loop.
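For anyone who wants to replicate that workaround, a minimal sketch (the migration ID and chunk size are made up; the point is that each drush invocation is a fresh PHP process, so any leaked memory is returned to the OS when it exits):

```bash
#!/usr/bin/env bash
# Import in small chunks; every iteration starts a brand-new PHP process.
for i in $(seq 1 700); do
  drush migrate-import upgrade_d7_file --limit=500 || break
done
```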
Comment #12
swentel commented:
@mikeryan: particularly the slowdown; I haven't really checked the memory.
Comment #13
ultimike
I spent some time today (at the DrupalCon New Orleans migrate sprint) testing this to figure out whether it could be an issue with Migrate. Here's what I did:
In speaking with mikeryan, vasi, and others, the issue _could_ be that PHP's memory_limit is either set to "-1" or set to a value higher than the available memory on the machine. In either case, attemptMemoryReclaim() will **never** be run, and an out-of-memory condition can occur.
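To make that failure mode concrete, here is a paraphrased sketch of the threshold check in MigrateExecutable (exact code varies by core version). The constructor maps memory_limit = -1 to PHP_INT_MAX, so the percentage never crosses the threshold:

```php
protected function memoryExceeded() {
  $usage = memory_get_usage();
  // $this->memoryLimit comes from ini_get('memory_limit'); -1 is treated as
  // PHP_INT_MAX, so $pct stays near zero no matter how much is allocated.
  $pct = $usage / $this->memoryLimit;
  if (!$threshold = $this->memoryThreshold) {
    return FALSE;
  }
  // Only above the threshold (85% by default) is attemptMemoryReclaim()
  // invoked; with an effectively infinite limit this never happens.
  return $pct > $threshold;
}
```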
-mike
Comment #14
davidwbarratt commented:
So in our case, we actually did not want the files moved over; we decided we'd instead use Stage File Proxy to fetch them as needed.
To accomplish this, we overrode the entity:file destination plugin. We originally "fixed" this problem by creating a drushrc.php file with this content:
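The file's contents weren't preserved here; presumably it raised the memory limit for drush runs, along these lines (value hypothetical):

```php
<?php

// drushrc.php: raise PHP's memory limit for drush-invoked processes.
ini_set('memory_limit', '2G');
```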
The server has 7.5 GB of memory, so more than enough to handle it. We are attempting to migrate over 333,000 files.
Since we override the destination plugin so it does not move the files, the only thing I can think of is that there must be something in the file entity save itself.
For now we'll try increasing the memory limit to 2G, but this is a little ridiculous.
Comment #15
davidwbarratt commented
Comment #16
catch
I'd expect the same problem with other entity types, but any chance it's #2635440: Document what cache clearing from ContentEntityStorageBase::resetCache() actually clears clearing the persistent cache? It would be useful to know if switching the entity cache backend to null makes things better or worse while running the migration.
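For anyone wanting to try that experiment, the persistent entity cache bin can be pointed at core's null backend from settings.php (the 'entity' bin and the cache.backend.null service are standard core; this is only suitable for testing):

```php
// settings.php: route the persistent entity cache bin to the null backend.
$settings['cache']['bins']['entity'] = 'cache.backend.null';
```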
Comment #17
Anonymous (not verified) commented:
This looks like something an xhprof run would help with?
Comment #18
benjy commented:
@davidwbarratt, given the comment in #13, can you provide a setup that would allow someone here to reproduce your issue?
Comment #19
mikeryan
@davidwbarratt: A couple of other points of clarification:
@beejeebus: See ultimike's analysis in #13 above.
Comment #20
neclimdul
I don't have that many files, but I ran a similar test on a very large set of nodes and saw results similar to #13. Memory peaked pretty high during each run, so I had to set the limit at 512 MB, but once I did, it never hit the reclaim and ran smoothly.
Even so, I also ran the same test with a null entity cache in reference to #16. It was something like 5 minutes faster on an 8-hour migration. For a run that long, that's basically within the margin of error for network traffic and other randomness, so I don't know whether it really made a difference.
Comment #21
kevinwal commented:
Running into this as well with saving files in a migration. I'll follow up with details, but I'm still seeing the issue even with the patch from #16, #2635440: Remove persistent cache clearing from ContentEntityStorageBase::resetCache().
Comment #22
mikeryan
On our current project we've found node migration getting progressively slower and tracked it down to pathauto. Do the people seeing this problem on file migration (a) have pathauto enabled, and (b) have an alias pattern for files?
Comment #23
mikeryan
(Credit where credit is due: @geerlingguy tracked this down to pathauto.)
Comment #24
berdir
See #2765729: PathautoPattern->applies() exponentially slows down operations with large numbers of nodes for the pathauto issue.
I'm not sure the file case is also due to pathauto; we didn't have any file pathauto patterns (which you can only have with file_entity anyway).
However, it's not actually in pathauto itself but some cache context collection/merging that seems to get slower and slower, possibly due to a huge array of cache contexts somewhere. Could be something similar for files as well.
Comment #25
Anonymous (not verified) commented:
Re #19: the point of an xhprof dump would be to get the information that #13 doesn't provide.
Nothing I've seen in this issue shows what actually happens when the OP runs their migration.
Fixing the issue for the OP without the kind of information an xhprof dump provides is mostly an exercise in educated guessing.
Comment #26
mikeryan
Two questions:
If the answers are "no", my inclination is to close this as not reproducible.
Comment #27
kevinwal commented:
We are doing quite a few migrations that can include images and do have pathauto on the sites. I'll try to reproduce.
Comment #29
mpp commented:
@mikeryan: I can confirm there is a performance issue with migrations of large datasets.
When performing a test migration of 30,000 nodes, it takes 4 hours when I run it all at once:
../vendor/bin/drush mi migrate_researchers_en --limit=30000
When I run the same migration in steps of 5,000, it only takes 20 minutes:
I had pinpointed this to a pathauto issue (#2765729), but that patch has since landed and I'm on Drupal 8.2.
Comment #30
mikeryan
@mpp: What about memory? That's what this issue is about...
Comment #31
mikeryan
I will note that for running a large migration in one go, #2309695: Add query batching to SqlBase may be helpful. Can you try the patch there?
Comment #32
mpp commented:
@mikeryan: I get "Reclaiming memory" messages with a max_memory_limit of 1G.
Comment #33
mikeryan
@mpp: OK, could you try profiling memory usage with xhprof? (And could someone who has done memory profiling with xhprof chime in with hints? I haven't done it myself so far.) Alternatively, maybe you could try ultimike's instrumentation in https://www.drupal.org/node/2688297#comment-11189891 above.
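For reference, the xhprof extension can record per-function memory deltas; a minimal harness looks something like this (assumes the xhprof PECL extension is loaded; storing and viewing the output is left out):

```php
// Start profiling with memory (and CPU) tracking enabled.
xhprof_enable(XHPROF_FLAGS_MEMORY | XHPROF_FLAGS_CPU);

// ... run the migration import here ...

// Each entry in $data includes 'mu' (memory usage) and 'pmu' (peak memory).
$data = xhprof_disable();
```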
Comment #34
heddn
Reviewed this in the weekly migrate maintainers call. Based on the number of reports, we are going to downgrade this from a migrate critical. If this becomes more prevalent, we can always re-add the tag.
Comment #35
ohthehugemanatee commented:
Adding to the list of reports. :|
I have an SqlBase source with 63000 rows, used in two consecutive paragraph migrations (splitting the source row into two paragraphs). One of the paragraph migrations references previously-migrated files, the other just contains text content. I can run them both individually, but if I run them together I get the OOM issue described here.
The workaround so far is to tag migrations into groups, and write a bash script that runs them in sequence. Not optimal, but it gets me through the migration.
Comment #36
swilmes commented:
@mikeryan We are having the memory issue on our migration and have run xhprof, which led back to array_merge using massive amounts of memory. We still haven't figured out why, but I may be able to provide more details Wednesday when we revisit the issue.
Comment #37
berdir
Are you using pathauto? Try using the latest dev, not the beta version. I'll release a new beta soon.
Comment #38
swilmes commented:
@Berdir I am using pathauto. We ran xhprof with it enabled, and pathauto uses the most memory when it's in use. We also ran without pathauto, and that's when array_merge showed up as the largest consumer of memory.
EDIT: array_merge is not the issue when pathauto is disabled. I was mistaken; that was where pathauto was using memory. With pathauto disabled, the memory usage seems to come from database queries.
Comment #39
hussainweb
A possibly related issue: #2843595: Add indexes to migrate_map_* tables
Comment #41
mikeryan
A question for anyone seeing memory issues with migration: are you using search_api? I'm seeing some memory issues myself now on a project that has search_api enabled, and given that it has a history of memory issues (although none open on D8 at the moment), that seems a bit suspicious...
Comment #43
heddn
I'd be curious whether this is solved or helped by #2701335: Run garbage collection during migration memory reclamation.
Comment #44
mpp commented:
@mikeryan, indeed we're using search_api with search_api_solr.
Comment #45
neclimdul
I'll take a look; last I looked, this was still an issue. (No on search_api, by the way.)
Comment #48
luksak
I am running into memory issues when migrating from an SQL source to media entities. The file entities were already migrated earlier. Uninstalling search_api didn't help...
Comment #49
neclimdul
I've been struggling with this for years, as evidenced by this issue. At this point Migrate's memory reclaiming is "pretty good". There does seem to be some place in migrations itself that is leaking, though, and as a result the memory reclaim isn't working. I think I had tracked it down to the source plugin and prepareRow() shoving a lot of data onto rows, but glancing at the code again I can't put my finger on what it was.
It was complicated enough, and an immediate solution didn't seem likely, so I've had to work around it by running the migration repeatedly until it finishes. Far from ideal, but it seems to work: the migration is able to work back up to the high_water mark without leaking, then picks up the processing, where it leaks again until it stops or completes.
Comment #50
neclimdul
Another note: @webflo mentioned in chat at some point that he thought entity caching was broken by the move to a MemoryBackend instead of the property on the managers. I haven't been able to reproduce this, and the code looks fine, as there is compatibility code on the managers exposing the same API, but maybe it affects someone else.
Comment #51
luksak
I found out that I had configured my high_water property incorrectly. Now I am able to run the migration in batches.
Comment #52
benjifisher
Any migration source that derives from SqlBase supports the batch_size configuration.
I am not sure I was seeing the problem described here, but the symptom was pretty much the same as in the issue description. On a D7 -> D8 migration, I was running out of memory when migrating something like 100K users. I fixed it by adding batch_size to my migration config.
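The exact snippet wasn't preserved; for illustration, with a made-up value, the source section of the migration YAML would look something like:

```yaml
source:
  plugin: d7_user
  # Fetch source rows in chunks of this size instead of all at once.
  batch_size: 1000
```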
Comment #53
luksak
Huh, interesting... How does this play together with the limit of a migration import?
Comment #54
benjifisher
This was a while ago, so I may be misremembering, but I think drush mim --limit=5000 my_user_migration interfered with the batch_size setting. I consider that a bug.
Comment #56
akalam commented:
I think the problem comes from the design: Migrate loads all rows every time the migrate:import command is run, which eats more and more memory as the total number of rows grows.
Migrate has focused its effort on managing memory and trying to free it, instead of loading only the data that's needed. We think it would be better to load the full data only when a row actually needs to be imported. Imagine a periodic migration with a total count of 1,000,000 rows where maybe you only need to import 10 new rows.
Here is a code example of a source plugin extending ContentEntity.
I would like someone else to review that approach and say whether it would be worth generalizing and moving into the base source plugin somehow.
Comment #57
heddn
At the very least, this should be postponed on #3006750: Remove memory management from MigrateExecutable. That issue is the first step to making memory management more manageable.
After that, we need a better idea of what the issue is. It is really, really hard to fix something like this without a lot of profiling and debugging, because we don't really know what is causing the memory to get eaten up and not reclaimed.
But as a first step, let's externalize memory management.
Comment #59
wim leers
This is still an ongoing problem. We (@huzooka, @narendraR and I) are currently investigating it too, and using the infrastructure that #2309695: Add query batching to SqlBase introduced did not solve the problem (by the way: literally nothing in Drupal core uses it). The root cause seems to lie at a lower level than that. Expect news soon; @narendraR is digging deeper currently.
This can make a migration impossible to continue, unless you resort to drush and set the memory limit to "unlimited". But that is not a reasonable demand.
Because this is extremely disruptive for sites migrating, I'm bumping this to critical. The 46 followers of this issue prove that this problem has affected many migrations already.
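For reference, that workaround looks something like this (migration ID made up):

```bash
# Disable PHP's memory limit for this single drush process.
php -d memory_limit=-1 vendor/bin/drush migrate:import upgrade_d7_file
```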
Comment #60
heddn
We discussed this in the migrate maintainers call last night. Given that the definition of critical includes things like data loss and no workaround (neither of which is the case here), we suggested this should be dropped to major. Hopefully we'll have some more details shortly from your research. This has been a tough nut to crack.
Comment #61
bhanu951 commented:
Just to add: I was able to partially solve this issue by adding the batch_size key in the source and --feedback to the migration import command.
Comment #62
pasqualle
Comment #63
pasqualle
Comment #64
pasqualle
Comment #65
wim leers
@narendraR got stuck in his investigation, because using batch_size is incompatible with \Drupal\migrate\Plugin\migrate\source\SqlBase::mapJoinable():
… which is not at all mentioned in #2309695: Add query batching to SqlBase. It means this (using batch_size) will not scale: you cannot interrupt a migration and continue it later. It'll need to iterate over every source row until it finds the last one it actually migrated.
I don't see a clear solution. Anyone else? 🤞🤓
Comment #66
neclimdul
I mean, I don't know how anyone runs large migrations safely outside of drush, but that's an entirely different discussion. You are right, though, that the memory requirement is a non-solution, and I'm pretty sure it still runs into slowing down to basically a stop in the end.
I know how frustrating this is, but it's been years since I've looked at this, so I can only give you a sense of what I was looking at and how we got through our migrations; maybe that will give you some clue in your search.
1. I'm sure it goes without saying, but #3006750: Remove memory management from MigrateExecutable, for flushing caches and managing memory, was _key_. It looks nothing like what we used, but flushing caches is needed. I also just nulled out some caches on the container because they were 99.9999% misses with all the writes.
2. With memory management, bigger isn't always better. There's a sweet spot between thrashing the cache clears and the slowdown of PHP's allocation scheme, so choose a "reasonable" value for your memory limit. I believe I have migrate_manifest patched to pass an argument into the GC watcher so we could tune it for batches of migrations as well. Maybe worth investigating.
3. And most specific to this issue: I had the most trouble with certain process plugins, so there seemed to be some sort of leakiness there. I hinted at this in an earlier comment. This might be why the "batch" concept worked for some people; probably an internal iterator storing rows as they're generated buckles under its size, and maybe batching flushes it? Files seemed to be the one I just had to live with at the time, but there was a lot of tuning of how process plugins worked across the project to keep rows light and things running. Hopefully that's still relevant and a useful pointer.
Comment #67
wim leers
@neclimdul Thank you, those are all super valuable context and real-world experience anecdotes! 🙏
Comment #72
andypost
I did a re-roll of #3006750: Remove memory management from MigrateExecutable.
But is it still a blocker?
Comment #73
qzmenko
This is still a problem, but in our case with a node migration.
We need to migrate ~2 million nodes. At the beginning of the migration, ~10 nodes per second are imported. After 50k imported nodes, the speed drops to ~2 nodes per second.
I tried changing the batch_size in the migration, but it did not affect the migration speed at first glance.
Comment #74
fjgarlin commented:
I'm affected by this as well, in this case with a user migration. Memory keeps creeping up (around 2 million users).
I've tried different options with no luck. The last thing I am trying came from this article, which suggests a script that plays with the limit option in a loop for the migration.
This is currently running, so I don't know the result yet. It's still not ideal, because when using the --limit option, it still tries to do some gathering from the previous runs.
For example, if I run
drush migrate:import my_user_migration --limit 100, the first run's output shows items being imported. But then, on the second run of drush migrate:import my_user_migration --limit 100, the output shows:
Note the 0 inserted, 0 updated.
--
I even tried a postSave event subscriber where I'd clear some caches, but it still made no difference. This is what I tried:
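The snippet wasn't preserved; based on the description and berdir's reply below, it was presumably something along these lines (class name and wiring are hypothetical, and the services.yml registration with the event_subscriber tag is omitted):

```php
<?php

namespace Drupal\mymodule\EventSubscriber;

use Drupal\Core\Cache\MemoryCache\MemoryCacheInterface;
use Drupal\migrate\Event\MigrateEvents;
use Drupal\migrate\Event\MigratePostRowSaveEvent;
use Symfony\Component\EventDispatcher\EventSubscriberInterface;

/**
 * Flushes the entity memory cache after each migrated row is saved.
 */
class MigrateMemoryFlushSubscriber implements EventSubscriberInterface {

  public function __construct(protected MemoryCacheInterface $entityMemoryCache) {}

  public static function getSubscribedEvents(): array {
    return [MigrateEvents::POST_ROW_SAVE => 'onPostRowSave'];
  }

  public function onPostRowSave(MigratePostRowSaveEvent $event): void {
    // Drop every statically cached entity (the entity.memory_cache service).
    $this->entityMemoryCache->deleteAll();
  }

}
```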
Comment #75
berdir
There might be some other module that keeps things in memory due to post-processing.
resetCache() is a persistent cache clear, so it's fairly expensive and adds cost of its own. It will not add anything useful on top of the entity.memory_cache resetAll(), which you do as well.
However, that can only clear the usage of those objects within the entity storage; if anything else holds on to these objects, they will remain in memory. It's pretty much impossible to say what that would be in your case; it would probably require some kind of profiling with xhprof or Blackfire. If it is specific to users, you could look for user presave/insert/update hook implementations.
Comment #76
heddn
For a 2M user migration, I stripped down the user source plugin so it only pulls back uids, then moved the actual gathering of data into prepareRow(). It had an amazing effect on the speed and memory usage of the user migration. By default the user source does what is essentially a select * from users; what you want is something more like select uid from users.
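As an illustration of that approach, a hypothetical sketch of a SqlBase-derived source (plugin ID, module namespace, and the D7 users table layout are assumptions):

```php
<?php

namespace Drupal\mymodule\Plugin\migrate\source;

use Drupal\migrate\Row;
use Drupal\migrate_drupal\Plugin\migrate\source\DrupalSqlBase;

/**
 * ID-only user source: the main query stays tiny; the full record is
 * fetched one row at a time in prepareRow().
 *
 * @MigrateSource(
 *   id = "d7_user_id_only"
 * )
 */
class UserIdOnly extends DrupalSqlBase {

  public function query() {
    // Essentially "SELECT uid FROM users": only the primary key.
    return $this->select('users', 'u')->fields('u', ['uid']);
  }

  public function fields() {
    return ['uid' => $this->t('User ID')];
  }

  public function getIds() {
    return ['uid' => ['type' => 'integer']];
  }

  public function prepareRow(Row $row) {
    // Fetch the complete record for just this one row.
    $record = $this->select('users', 'u')
      ->fields('u')
      ->condition('u.uid', $row->getSourceProperty('uid'))
      ->execute()
      ->fetchAssoc();
    foreach ((array) $record as $field => $value) {
      $row->setSourceProperty($field, $value);
    }
    return parent::prepareRow($row);
  }

}
```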
Comment #77
fjgarlin commented:
@heddn, this is the migration and plugin that I am using:
- Migration: https://git.drupalcode.org/project/drupalorg_migrate/-/blob/1.0.x/migrat...
- User plugin: https://git.drupalcode.org/project/drupalorg_migrate/-/blob/1.0.x/src/Pl...
So, your suggestion would be to override the User::query method so it selects only the uid. Then in prepareRow(), do you:
- run a select * from users where uid = $uid, and
- call $row->setSourceProperty() for each property?
I am going to try the above locally, but wanted to ask about the approach to make sure I understood you correctly.
Comment #78
fjgarlin commented:
For what it's worth, I am not seeing any significant increase in speed after doing the above.
Before the change it was around 1,100 records per minute; after the change it seems to be around 1,150 records per minute. But this difference might just be noise in the migration's reported numbers, or me looking a second late or early.
The code I did:
Comment #79
heddn
Speed should be about the same, especially at the beginning of the migration. But by the time you get to the 1M-row mark, your memory usage should be in a much better place. That's where this alternative approach (which you outlined well) really starts to shine.
Comment #80
fjgarlin commented:
Great. Thanks for the info.
I went ahead and committed the above here: https://git.drupalcode.org/project/drupalorg_migrate/-/commit/044bdebd94... and I will trigger the full migration again and monitor things.
Comment #81
benjifisher
This issue has been around for almost 10 years. Although several reliable users report running into it, no one has been able to provide steps to reproduce (STR) the problem. In comment #13, @ultimike tried really hard to reproduce it just by creating and migrating 10K files and did not see any evidence of a problem.
I am setting the status to Postponed (maintainer needs more information). If someone can provide STR, then we can un-postpone this issue.
Often, we set a time limit on this status and close the issue if there is no response. In this case, I think we should leave the issue open indefinitely. I think there are many useful comments, and open issues (even if they are postponed) are much more discoverable than closed issues.