Problem/Motivation

When exporting large datasets to XLSX format using the batch method, DataExport::doProcessBatch() creates multiple PhpSpreadsheet\Spreadsheet objects per batch iteration but never calls disconnectWorksheets() to break their circular references.

PhpSpreadsheet objects contain circular references (Spreadsheet ↔ Worksheet ↔ Cell) that prevent PHP's refcount-based garbage collector from freeing them. Since Drupal's _batch_process() can run multiple batch operations within a single HTTP request (via its internal while-loop and timer), Spreadsheet objects from prior iterations remain in memory.

For a 1,100-row export with export_batch_size of 200 (6 iterations), each iteration loads two full Spreadsheet objects (the accumulated file and the current batch) plus creates a Writer — none of which are freed. This causes unbounded memory growth, ultimately crashing the PHP process. On PHP 8.3 with the built-in development server, this manifests as a segfault (General Protection Fault) or a silent OOM kill depending on the memory_limit setting.

The problem is in src/Plugin/views/display/DataExport.php, lines ~897–917 (the "Workaround for XLS/XLSX" block).

Steps to reproduce

  1. Create a content type and populate it with 500+ nodes.
  2. Create a Views Data Export display configured with:
    • Format: XLSX
    • Export method: batch
    • Batch size: 100 (or any size that requires multiple iterations)
  3. Trigger the export.
  4. Monitor PHP memory usage during the batch. Memory grows linearly with each batch iteration and is never reclaimed.
  5. With a large enough dataset, the PHP process crashes (segfault on built-in server, or Allowed memory size exhausted on Apache/FPM).

A standalone PHPUnit test is attached that reproduces the memory growth pattern outside of Drupal's batch system.

Proposed resolution

Call disconnectWorksheets() and unset() on the $previousExcel, $currentExcel, and $objWriter variables after saving the merged result. This breaks the circular references and allows PHP's garbage collector to reclaim the memory between batch iterations.

<?php
$objWriter = IOFactory::createWriter($previousExcel, ucfirst($output_format));
$objWriter->save($vdeFileRealPath);

// Break circular references so PHP's GC can reclaim memory between
// batch iterations within a single HTTP request.
$currentExcel->disconnectWorksheets();
$previousExcel->disconnectWorksheets();
unset($currentExcel, $previousExcel, $objWriter);

This is the standard cleanup pattern recommended by the PhpSpreadsheet documentation.

Remaining tasks

Fix the code as per above suggestion...
Consider whether the same cleanup should be applied in Xls::encode() (the non-batch encoder path), which also creates Spreadsheet objects without calling disconnectWorksheets().

User interface changes

None.

API changes

None.

Data model changes

None.

CommentFileSizeAuthor
SpreadsheetBatchMemoryLeakTest.php_.txt6.63 KBdpagini
Command icon Show commands

Start within a Git clone of the project using the version control instructions.

Or, if you do not have SSH keys set up on git.drupalcode.org:

Comments

dpagini created an issue. See original summary.

steven jones made their first commit to this issue’s fork.

steven jones’s picture

Status: Active » Reviewed & tested by the community

Yeah, looks good and simple enough eh?

  • steven jones committed 33e4f712 on 8.x-1.x
    feat: #3586574 disconnect worksheets in XLS batches.
    
    By: dpagini
    By:...
steven jones’s picture

Status: Reviewed & tested by the community » Fixed

Thanks for the bug report. This will be in the next release.

Now that this issue is closed, review the contribution record.

As a contributor, attribute any organization that helped you, or if you volunteered your own time.

Maintainers, credit people who helped resolve this issue.

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.

danharper’s picture

I can see this is in latest dev version which I have installed but I'm still getting 502 errors on xlsx only. CSV works fine.

There's no errors in the log file and it only seems to happen on production servers not on local dev.

I'm using batch export too with it cut down to 25 per batch on 5000 records, the records are slightly complex in terms of relationships etc.

Any pointers would be appreciated.

steven jones’s picture

@danharper I think getting some logs or visibility into what is going wrong is going to be the thing to do. Otherwise it could be almost anything!