Problem/Motivation
When exporting large datasets to XLSX format using the batch method, DataExport::doProcessBatch() creates multiple PhpSpreadsheet\Spreadsheet objects per batch iteration but never calls disconnectWorksheets() to break their circular references.
PhpSpreadsheet objects contain circular references (Spreadsheet ↔ Worksheet ↔ Cell) that prevent PHP's refcount-based garbage collector from freeing them. Since Drupal's _batch_process() can run multiple batch operations within a single HTTP request (via its internal while-loop and timer), Spreadsheet objects from prior iterations remain in memory.
For a 1,100-row export with export_batch_size of 200 (6 iterations), each iteration loads two full Spreadsheet objects (the accumulated file and the current batch) plus creates a Writer — none of which are freed. This causes unbounded memory growth, ultimately crashing the PHP process. On PHP 8.3 with the built-in development server, this manifests as a segfault (General Protection Fault) or a silent OOM kill depending on the memory_limit setting.
The problem is in src/Plugin/views/display/DataExport.php, lines ~897–917 (the "Workaround for XLS/XLSX" block).
Steps to reproduce
- Create a content type and populate it with 500+ nodes.
- Create a Views Data Export display configured with:
- Format: XLSX
- Export method: batch
- Batch size: 100 (or any size that requires multiple iterations)
- Trigger the export.
- Monitor PHP memory usage during the batch. Memory grows linearly with each batch iteration and is never reclaimed.
- With a large enough dataset, the PHP process crashes (segfault on built-in server, or Allowed memory size exhausted on Apache/FPM).
A standalone PHPUnit test is attached that reproduces the memory growth pattern outside of Drupal's batch system.
Proposed resolution
Call disconnectWorksheets() and unset() on the $previousExcel, $currentExcel, and $objWriter variables after saving the merged result. This breaks the circular references and allows PHP's garbage collector to reclaim the memory between batch iterations.
<?php
$objWriter = IOFactory::createWriter($previousExcel, ucfirst($output_format));
$objWriter->save($vdeFileRealPath);
// Break circular references so PHP's GC can reclaim memory between
// batch iterations within a single HTTP request.
$currentExcel->disconnectWorksheets();
$previousExcel->disconnectWorksheets();
unset($currentExcel, $previousExcel, $objWriter);
This is the standard cleanup pattern recommended by the PhpSpreadsheet documentation.
Remaining tasks
Fix the code as per above suggestion...
Consider whether the same cleanup should be applied in Xls::encode() (the non-batch encoder path), which also creates Spreadsheet objects without calling disconnectWorksheets().
User interface changes
None.
API changes
None.
Data model changes
None.
| Comment | File | Size | Author |
|---|---|---|---|
| SpreadsheetBatchMemoryLeakTest.php_.txt | 6.63 KB | dpagini |
Issue fork views_data_export-3586574
Show commands
Start within a Git clone of the project using the version control instructions.
Or, if you do not have SSH keys set up on git.drupalcode.org:
Comments
Comment #4
steven jones commentedYeah, looks good and simple enough eh?
Comment #6
steven jones commentedThanks for the bug report. This will be in the next release.
Comment #9
danharper commentedI can see this is in latest dev version which I have installed but I'm still getting 502 errors on xlsx only. CSV works fine.
There's no errors in the log file and it only seems to happen on production servers not on local dev.
I'm using batch export too with it cut down to 25 per batch on 5000 records, the records are slightly complex in terms of relationships etc.
Any pointers would be appreciated.
Comment #10
steven jones commented@danharper I think getting some logs or visibility into what is going wrong is going to be the thing to do. Otherwise it could be almost anything!