Hi,
I encountered an error and found a way around it.
Problem:
When running a migration that updates around 40k entities (users in this case), even a 4 GB PHP memory limit wasn't enough: the process failed with an out-of-memory error.
Cause:
It turns out that all migrated (updated or created) items are (re)indexed using a single `loadItemsMultiple()` call that loads ALL of these entities at once.
This happens in `modules/contrib/search_api/src/Utility/PostRequestIndexing.php`, in the method `public function destruct()`.
Solution that worked for me:
Call `loadItemsMultiple()` on chunks (smaller parts) of the full list of item IDs instead of on the whole list.
This allows memory to be recovered before the next group of items is loaded and indexed.
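```php
try {
  // Original code, which loads all items at once:
  // $items = $index->loadItemsMultiple($item_ids);
  // if ($items) {
  //   $index->indexSpecificItems($items);
  // }

  // TODO: Formula to set a smart default based on the PHP memory limit,
  // staying well within safe limits.
  $batch_of = 500;
  $item_ids_batches = array_chunk($item_ids, $batch_of);
  foreach ($item_ids_batches as $item_ids_batch) {
    // Load and index one chunk at a time, so only $batch_of entities are
    // held in $items at any given moment.
    $items = $index->loadItemsMultiple($item_ids_batch);
    if ($items) {
      $index->indexSpecificItems($items);
    }
  }
}
// ... (the existing catch block of destruct() follows unchanged)
```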
Suggestion:
It seems like performance would be better with bigger chunks; however, they need to be small enough not to cause out-of-memory problems.
Hence, it might be nice to have a (simple) formula that calculates a smart chunk size from the PHP memory limit.
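A minimal sketch of what such a formula could look like, in plain PHP. Everything here is hypothetical: the function names `memory_limit_to_bytes()` and `suggest_batch_size()`, the assumed memory cost of 200 KB per loaded item, the 25 % memory budget and the 50 to 1000 clamp are illustrative assumptions, not Search API settings. The per-item cost would have to be measured for the entity type actually being migrated.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: converts a php.ini shorthand value such as "4G",
 * "512M" or "128K" into bytes. Returns -1 for an unlimited ("-1") limit.
 */
function memory_limit_to_bytes(string $value): int {
    $value = trim($value);
    if ($value === '-1') {
        return -1;
    }
    $unit = strtolower(substr($value, -1));
    // An explicit cast keeps only the leading number, e.g. "4G" becomes 4.
    $number = (int) $value;
    switch ($unit) {
        case 'g':
            return $number * 1024 ** 3;
        case 'm':
            return $number * 1024 ** 2;
        case 'k':
            return $number * 1024;
        default:
            return $number;
    }
}

/**
 * Hypothetical formula: spend only a fraction of the memory limit on loaded
 * items, divide by an assumed per-item cost and clamp to [$min, $max].
 */
function suggest_batch_size(
    string $memory_limit,
    int $bytes_per_item,
    float $fraction = 0.25,
    int $min = 50,
    int $max = 1000
): int {
    if ($bytes_per_item <= 0) {
        throw new InvalidArgumentException('bytes_per_item must be positive.');
    }
    $limit = memory_limit_to_bytes($memory_limit);
    if ($limit < 0) {
        // No memory limit configured: just use the upper bound.
        return $max;
    }
    $size = intdiv((int) ($limit * $fraction), $bytes_per_item);
    return max($min, min($max, $size));
}

// Usage with the current PHP setting and an assumed 200 KB per item.
$batch_of = suggest_batch_size((string) ini_get('memory_limit'), 200 * 1024);
```

With a `4G` limit and the assumed 200 KB per item, the raw result (5242) is clamped to 1000; with `128M` it comes out at 163. The clamp keeps a mis-measured per-item cost from producing absurdly small or large batches.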
| Comment | File | Size | Author |
|---|---|---|---|
| #4 | 3014641-4--post_request_indexing_in_batches.patch | 1.11 KB | drunken monkey |
Comments
Comment #2
drunken monkey: The actual proper solution would probably be to disable indexing while migrating content. You can do that simply by calling `startBatchTracking()` on the index entity/entities before starting the migration. That way, changes will be tracked, but no immediate indexing will occur. Temporarily disabling the index's “Index items immediately” option would also be a solution (it has practically the same effect).

However, just indexing in chunks would still be an improvement in any case, I guess. It might not be enough in a lot of cases, though, i.e., whenever there is also an execution time limit. But as most such migrations will run via Drush, that's probably not a concern in most cases.
Please see whether the attached patch works for you!
(I don’t think we need any special logic for computing the batch size. Hm, or should we just re-use the index’s cron batch size setting?)
Comment #3
legolasbo: I think re-using the cron batch limit would be the way to go here. That way people can tune this to their own setup somewhat. It's also easily achieved by replacing this line with:
```php
$item_ids_batches = array_chunk($item_ids, $index->getOption('cron_limit'));
```
Comment #4
drunken monkey: Thanks for the review!
Not quite, as the cron limit can also be 0 or negative, and `array_chunk()` does not accept a chunk size below 1.
But not much more complicated, either – patch attached.
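A minimal sketch of chunked indexing with a fallback for non-positive cron limits, in plain PHP so it runs outside Drupal. The helper name `chunk_item_ids()` is hypothetical, and treating a limit of 0 or below as "index everything in one go" (the pre-patch behavior) is an assumption; the attached patch may handle this case differently. The two closures are stand-ins for `$index->loadItemsMultiple()` and `$index->indexSpecificItems()`.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: splits item IDs into batches of $cron_limit items.
 *
 * array_chunk() rejects a size below 1, so a cron limit of 0 or a negative
 * value falls back to a single batch containing all items (assumption).
 */
function chunk_item_ids(array $item_ids, int $cron_limit): array {
    if ($cron_limit > 0) {
        return array_chunk($item_ids, $cron_limit);
    }
    return $item_ids ? [array_values($item_ids)] : [];
}

// Stand-ins for $index->loadItemsMultiple() and $index->indexSpecificItems().
$load_items = fn(array $ids): array => array_fill_keys($ids, 'loaded item');
$indexed_counts = [];
$index_items = function (array $items) use (&$indexed_counts): void {
    $indexed_counts[] = count($items);
};

$item_ids = [
    'entity:user/1:en', 'entity:user/2:en', 'entity:user/3:en',
    'entity:user/4:en', 'entity:user/5:en',
];
foreach (chunk_item_ids($item_ids, 2) as $batch) {
    $items = $load_items($batch);
    if ($items) {
        $index_items($items);
    }
}
// $indexed_counts is now [2, 2, 1].
```

Checking `$cron_limit > 0` before calling `array_chunk()` is what keeps a 0 or -1 setting from breaking post-request indexing altogether.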
Comment #5
legolasbo: Yup, looks good to me :)
Comment #6
drunken monkey: Thanks, good to hear!
Fixed one remaining coding standards problem (spotted by the PHPCS pre-commit hook) and committed.