It appears that when a batched export is being generated, this module generates the PDF from the HTML output of the export on the transfer callback, instead of as part of executing the batch. To make matters worse, the PDF file overwrites the HTML file during this process, so there's no way for the module to differentiate one from the other.

This means that if the user attempts to download the export more than once, the second download is a very long PDF containing binary data, because the module converts the already-generated PDF as though it were the HTML output. Even if the user doesn't download the report a second time, they may wonder why the download takes so long to start after they click the download link. Neither of these outcomes is ideal.

The problem is further complicated if the report is large or contains a lot of images that are generated by the Image module; it's possible that the transfer callback will time out on a load balancer (e.g. on a host like Pantheon) the first time it is requested, causing the user to hit Refresh and try again. In many cases, the backend request is still running despite the load balancer timeout, so this results in multiple exports running concurrently. Each of these requests ties up server resources without being throttled by a front-end load balancer. If the user continues to hit Refresh, this can result in an unintentional Denial of Service on the host due to the number of exports running simultaneously.

Two things need to be done to remedy this:

  1. The PDF should only be generated once, during the batch. This ensures that downloading the results of the export in the future is a lightweight operation.
  2. A lock should be used to prevent multiple requests from generating the PDF simultaneously. This ensures that if the request times out while the PDF is being prepared, the user doesn't inadvertently take out the server by repeatedly hitting Refresh.
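
To make the two points above concrete, here is a minimal sketch in Python rather than the module's own code. Every name in it (`acquire`, `release`, `render_pdf`, `ensure_pdf`, the `export_pdf:` lock prefix) is hypothetical, the in-process lock table merely stands in for whatever shared lock backend the site provides, and `render_pdf` is a placeholder for the real HTML-to-PDF converter. It assumes the PDF is written next to the HTML under a different extension, so the two files can always be told apart.

```python
import os
import threading

# Hypothetical stand-in for a shared lock backend. A real site would need a
# lock that is visible to every web worker, not just this process.
_locks = set()
_locks_guard = threading.Lock()


def acquire(name):
    """Try to take the named lock without blocking; True on success."""
    with _locks_guard:
        if name in _locks:
            return False
        _locks.add(name)
        return True


def release(name):
    """Release the named lock if it is held."""
    with _locks_guard:
        _locks.discard(name)


def render_pdf(html_path, pdf_path):
    """Placeholder converter: writes a fake PDF to a temp file, then renames it."""
    with open(html_path, "rb") as src:
        html = src.read()
    tmp_path = pdf_path + ".tmp"
    with open(tmp_path, "wb") as dst:
        dst.write(b"%PDF-placeholder\n" + html)
    # The rename means a half-written PDF is never visible under pdf_path.
    os.replace(tmp_path, pdf_path)


def ensure_pdf(html_path):
    """Return the PDF path, generating it at most once.

    Returns None when another request already holds the lock, so the caller
    can ask the user to wait instead of starting a second conversion.
    """
    pdf_path = os.path.splitext(html_path)[0] + ".pdf"
    if os.path.exists(pdf_path):
        return pdf_path  # Already generated: downloading stays cheap.
    lock_name = "export_pdf:" + html_path
    if not acquire(lock_name):
        return None
    try:
        # Re-check in case another request finished just before we locked.
        if not os.path.exists(pdf_path):
            render_pdf(html_path, pdf_path)
    finally:
        release(lock_name)
    return pdf_path
```

In this sketch the batch's final step would call `ensure_pdf()` once, and the transfer callback would only stream the existing file. Because the HTML source is never overwritten, a repeated download cannot feed the PDF back into the converter.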

Comments

GuyPaddock created an issue. See original summary.

guypaddock’s picture

Assigned: Unassigned » guypaddock

Assigning to myself to work on.

guypaddock’s picture

Title: Downloading Results of a Batched Export Results in a PDF Containing Binary Data » Re-downloading the Output of a Batched Export Results in a PDF Containing Binary Data

Clarified issue title.

guypaddock’s picture

Assigned: guypaddock » Unassigned

guypaddock’s picture

Status: Needs review » Fixed

#3114202: Mega-patch - Re-factor of Views Data Export PDF to Fix Bugs and Improve UX, which contains a fix for this issue, has been committed to dev. Closing out.

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.