I'm sorry; I tried to make several smaller, atomic changes to this module to address outstanding issues, but so many of the fixes touched the same parts of the code, and the module needed such a large refactoring to work properly, that I ended up with this mega patch.
Apologies in advance to the maintainers, but I am hopeful that the changes will be to your liking.
This patch is based on #3112950: Nicer Preview Output, so you will want to apply that patch first.
Here's what changed in this patch:
New Features and Enhancements
PDF Page Headers and Footers
- View headers and footers now render as the header and footer of every PDF page. All you need to do is configure header and footer areas on the VDE PDF view display.
- It's now possible to include page numbers and other information (section name, time, date, URL, etc.) in PDF page headers or footers. This is made possible by a small JavaScript file, included in the header and footer HTML templates, that gets evaluated in the off-screen browser that wkhtmltopdf uses to render the PDF. This approach is briefly described in the "Footers And Headers" section of the wkhtmltopdf documentation (the JavaScript in this module is based on the example from that documentation). See the step-by-step instructions at the end of this post for how to use the new feature.
- Page header and footer HTML is now rendered to a temporary file that is passed to wkhtmltopdf, instead of being exposed through an insecure menu path. This also helps ensure that header and footer HTML is thread-safe (i.e., multiple PDF exports can run at the same time without affecting each other). This closes #3112949: Header and Footer Generation Doesn't work in Latest Dev.
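To illustrate the temporary-file approach, here is a minimal sketch in JavaScript (Node.js) rather than the module's PHP. It assumes each export writes its header and footer HTML into its own uniquely named temp directory; the helper names are hypothetical. Only `--header-html` and `--footer-html` are real wkhtmltopdf options; everything else is illustrative.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper (not the module's API): write rendered header/footer
// HTML into a fresh temp directory. mkdtempSync() appends random characters
// to the prefix, so two exports running at once never share files.
function writeHeaderFooterFiles(headerHtml, footerHtml) {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'vde-pdf-'));
  const headerPath = path.join(dir, 'header.html');
  const footerPath = path.join(dir, 'footer.html');
  fs.writeFileSync(headerPath, headerHtml, 'utf8');
  fs.writeFileSync(footerPath, footerHtml, 'utf8');
  return { dir, headerPath, footerPath };
}

// Build the wkhtmltopdf argument list. --header-html and --footer-html are
// real wkhtmltopdf options; the input/output paths are placeholders.
function buildWkhtmltopdfArgs(files, inputHtmlPath, outputPdfPath) {
  return [
    '--header-html', files.headerPath,
    '--footer-html', files.footerPath,
    inputHtmlPath,
    outputPdfPath,
  ];
}

// Delete the per-export temp directory once the PDF has been generated.
function cleanUp(files) {
  fs.rmSync(files.dir, { recursive: true, force: true });
}
```

The resulting array could be handed to `child_process.execFile('wkhtmltopdf', args, callback)`. Because nothing is served through a menu path, there is no public URL to secure, and each export only ever sees its own header and footer.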
Styling Improvements
- A default stylesheet for the PDF now ships with the module. This stylesheet is based on the table styling from the Seven theme in core and provides reasonable defaults that make tables look good out of the box for most use cases, so site builders aren't forced to always provide their own stylesheet. The default stylesheet can still be overridden with a custom one (just as before).
- The default stylesheet also applies to page headers and footers; it is automatically incorporated into the header and footer HTML.
More Consistent UX/Batch Operation
- For batched exports, the PDF is now generated exactly once, during the batch operation itself. Previously, the PDF was generated only when the user proceeded to download the result of the batched export. If the PDF took longer than 5 seconds to generate, this made for a disappointing user experience: the download appeared to take a long time to start, even though the user expected the PDF to have been generated during the batch export. The previous design also caused several issues if the user tried downloading the results a second time, or refreshed the page after a time-out.
- It is now possible to render the PDF within the browser window instead of always forcing it to be downloaded. If the "Provide as file" option is unchecked, then the PDF will be displayed immediately after export in the browser window. (For the best site builder UX, I recommend applying the patch to VDE from #3112930: UI/UX: "Provide as file" option is confusing so that this setting gets renamed to "Force export to be saved as download", which has a clearer intent).
- Rendering raw HTML for stylesheet development is now controlled by a new, separate option called "Bypass PDF rendering and export intermediate HTML instead (for development and styling purposes)". It is now distinct from the "Provide as file" option provided natively by Views Data Export.
- Various bits of the PDF display style settings form verbiage were tweaked for clarity. For example, the descriptions of the width and height settings now explain how those settings work in landscape mode. In addition, "User style sheet path" has become "Custom stylesheet path", since a default stylesheet is now included with the module.
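The separation between "Provide as file" and the new bypass option can be sketched as a mapping onto HTTP response headers. This is a hypothetical JavaScript helper, not the module's code: it assumes the response uses `Content-Disposition: inline` when "Provide as file" is unchecked and `attachment` when it is checked, which is the standard way to ask a browser to display a response rather than download it.

```javascript
// Hypothetical sketch: the two export options are independent.
// bypassPdf decides WHAT is sent; provideAsFile decides HOW the browser
// should handle it.
function buildResponseHeaders({ provideAsFile, bypassPdf, filename }) {
  const contentType = bypassPdf
    ? 'text/html; charset=utf-8' // intermediate HTML, for styling work
    : 'application/pdf';
  // "attachment" forces a download; "inline" lets the browser display it.
  const disposition = provideAsFile ? 'attachment' : 'inline';
  // Replace characters that would break the quoted filename parameter.
  const safeName = filename.replace(/["\\\r\n]/g, '_');
  return {
    'Content-Type': contentType,
    'Content-Disposition': `${disposition}; filename="${safeName}"`,
  };
}
```

With this split, unchecking "Provide as file" only changes the disposition; it no longer switches the output from PDF to HTML.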
Bug Fixes
- This module now indicates that it requires PHP 7.0+. The master branch had several code-style changes that use newer PHP syntax, such as short array syntax. That's fine, but the module needs to declare its minimum PHP version to avoid syntax errors appearing after installation on older versions of PHP.
- The table HTML output is now properly formed and complete even when 1) performing a batch export while 2) not using grouping. This fixes a regression, seen only in the 7.x dev branch of the module, that appears to have been introduced in 7b38884. Previously, during a batched export, the opening <table> tag was not being generated, because this behavior had been moved into template_preprocess_views_data_export_pdf(), which does not get invoked when rendering table bodies in a batch.
- The "Provide as file"/"Force export to be saved as download" option no longer needs to be set for PDF export to work properly. This is because the option to export raw HTML instead of PDF has been moved to a separate option (described above). This closes #3112923: UI/UX: HTML Output is Generated Instead of PDF Output if "Provide as file" is Unchecked.
- Downloading the output of a batched export no longer results in a PDF containing the raw binary output (garbage data) of the original PDF. This bug is partially fixed by moving the generation of the PDF from the file transfer callback into the batch operation itself (described above), and partially by guarding generation of the PDF with a lock (described below). This addresses issue 1 in #3112947: Re-downloading the Output of a Batched Export Results in a PDF Containing Binary Data.
- For batched exports, refreshing the page (at the end of the batch or after a timeout) no longer causes the same PDF to be generated by multiple concurrent requests.
The previous design did not guard against multiple requests trying to generate a PDF from the same HTML file. This was easy to trigger just by hitting refresh. Further, if the user hit refresh because the original request timed out, it was very possible that the original request was still running, and refreshing multiple times could crush the server under load.
A lock now prevents a second request from trying to generate a PDF for the same HTML file for up to 10 minutes. If the batch operation times out and the user refreshes the page to resume, the batch polls the status of the lock until it is released. The MIME type of the file tells the batch whether or not the file was successfully converted from "text/html" to "application/pdf". If you are using File Entity, you also need to patch File Entity using #8 in #2570377: File Entity overwrites filemime information prepared by other modules, or MIME types won't save properly.
This addresses issue 2 in #3112947: Re-downloading the Output of a Batched Export Results in a PDF Containing Binary Data.
- Failing to invoke wkhtmltopdf no longer results in the raw HTML output being sent back to the browser as if it were PDF output. Instead, the user is now presented with an error message (if possible) that asks them to contact the site administrator. The site administrator should be able to find information on the root cause of the issue in the site logs.
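The lock-and-poll behavior for batched exports can be sketched as follows. This is a hypothetical, in-memory JavaScript model rather than the patch's PHP: `LockStore`, `finalizeStep()`, and `pollStep()` are illustrative names, and the current time is passed in explicitly so the flow can be simulated. In Drupal 7, the lock itself would more naturally come from the core lock API (`lock_acquire()`/`lock_release()`); whether the patch uses exactly that is an assumption.

```javascript
const LOCK_TTL_MS = 10 * 60 * 1000; // 10 minutes, as described above

// Minimal expiring lock store (stand-in for a shared, persistent lock).
class LockStore {
  constructor() { this.expiries = new Map(); }
  isHeld(name, now) {
    const expiresAt = this.expiries.get(name);
    return expiresAt !== undefined && expiresAt > now;
  }
  acquire(name, now) {
    if (this.isHeld(name, now)) return false;
    this.expiries.set(name, now + LOCK_TTL_MS);
    return true;
  }
  release(name) { this.expiries.delete(name); }
}

function lockNameFor(file) { return `vde_pdf:${file.uri}`; }

// First attempt at the "Finalizing PDF file." step. `file` is
// { uri, mime }; `convert` runs wkhtmltopdf and throws on failure.
function finalizeStep(locks, file, convert, now) {
  // Already converted (e.g. user refreshed after success): don't redo it.
  if (file.mime === 'application/pdf') return 'done';
  const name = lockNameFor(file);
  // Another request is already converting this file: switch to polling.
  if (!locks.acquire(name, now)) return 'waiting';
  try {
    convert(file);
    file.mime = 'application/pdf'; // MIME type records the outcome
    return 'done';
  } catch (err) {
    return 'failed'; // file stays text/html; never send it as a PDF
  } finally {
    locks.release(name);
  }
}

// Later batch requests: wait for the lock, then read the outcome.
function pollStep(locks, file, now) {
  if (locks.isHeld(lockNameFor(file), now)) return 'waiting';
  return file.mime === 'application/pdf' ? 'done' : 'failed';
}
```

A `'failed'` result is where the module would show the "contact the site administrator" message instead of returning raw HTML. Note that if the original request dies without releasing the lock, polling ends when the lock expires and the still-HTML MIME type reports the failure.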
Remaining Known Issues
- The "Grouping field Nr.1" option still does not appear in settings for the PDF style plug-in. Although the documentation for this module indicates that it supports grouping, and there is a lot of code in the settings and theme functions of this module for that feature, the option does not appear in Views settings. I modified several parts of the theme functions that work with this feature, but wasn't able to test for regressions because the option is missing.
It appears that this bug was previously reported as #3027310: Group by function. Unfortunately, we don't have a need for this function in our project (yet), and I am not an expert on what's required to get grouping functionality to work in a display style plug-in, so it's not fixed by this patch.
If I had to guess, I would say that $this->definition['uses grouping'] needs to be TRUE for the PDF style plug-in and it's probably not set. I'm not sure, though; I haven't debugged this issue. This could also be an issue with VDE itself (see #1187712: Grouping?).
- Large Views datasets can fail to export due to an InnoDB redo log error. In order for headers and footers to contain accurate result set information when performing a batched export, this module buffers view results in the batch operation "sandbox". Unfortunately, this creates a lot of database activity during the export because the sandbox is constantly being updated with an ever-increasing blob of serialized view result information. This can result in an error such as "The size of BLOB/TEXT data inserted in one transaction is greater than 10% of redo log size."
A fix for this issue is pending and might just require saving the view results to a temporary file instead, but that feels like something outside the scope of this already-expansive issue.
This issue has been addressed in #8.
- Large Views datasets fail to generate a PDF (time out). Some hosts have a strict time limit (e.g. 90 seconds on Pantheon), which may be too tight when generating PDFs for large datasets. As a workaround for some exports, if the PDF generation step ("Finalizing PDF file.") of a batched export times out, the user can refresh the page to "resume" the batch. In most cases, the original request only timed out on the load balancer and is actually still running wkhtmltopdf on the backend; refreshing the page switches the export batch into a mode where it polls the status of the export lock (introduced in this patch) for up to 10 minutes. So, on hosts like Pantheon, this workaround may allow the export to succeed as long as it takes less than an additional 8.5 minutes to finish. A fix for this issue will likely involve integrating with Background Process, which is outside the scope of this already-expansive issue.
This has been addressed by #3114698: Add support for Background Process so that large exports don't timeout.
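For the InnoDB redo log issue, the idea of saving view results to a temporary file can be sketched as follows, again in JavaScript for illustration. The JSON Lines format and the helper names are assumptions, not the module's code; the point is that the batch sandbox only has to store a short file path, so each batch step no longer rewrites an ever-growing serialized blob in the database.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical: create an empty buffer file; only its path goes into the
// batch sandbox.
function createResultBuffer() {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'vde-pdf-results-'));
  const file = path.join(dir, 'results.jsonl');
  fs.writeFileSync(file, '');
  return file;
}

// Append one batch chunk of rows, one JSON document per line. Newlines
// inside values are escaped by JSON.stringify, so lines never split a row.
function appendRows(bufferPath, rows) {
  if (rows.length === 0) return;
  const lines = rows.map((row) => JSON.stringify(row)).join('\n') + '\n';
  fs.appendFileSync(bufferPath, lines, 'utf8');
}

// Read every buffered row back, e.g. to compute totals for headers and
// footers at the end of the batch.
function readAllRows(bufferPath) {
  return fs.readFileSync(bufferPath, 'utf8')
    .split('\n')
    .filter((line) => line !== '')
    .map((line) => JSON.parse(line));
}
```

Appending is cheap and constant-size per chunk, whereas re-serializing the whole result set into the sandbox grows with every step.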
Step-by-Step Instructions for New Features
Adding Page Numbers to a PDF
With this patch applied to your copy, here are steps for adding simple page numbering to the footer of every PDF page for an existing VDE PDF view display:
- Open the Views Data Export PDF display of the view for editing (if you haven't already).
- Add a new "Global: Text area" footer to the "Footer" region of the view.
- (Recommended) Change "For" at the top of the modal from "All displays" to "This YOUR DISPLAY (override)" (e.g. "This views_data_export_pdf (override)") to ensure that the page numbering only affects PDF exports.
- (Optional) Set the "Label" of the field to "Page number" to make it easier to find later when administering the view.
- (Recommended) Enable "Display even if view has no result" to ensure page numbers always appear even when there are no results. (For other region handlers, enabling this option can work around issues in Views like #1807624: Saving and rendering in empty region for 'Global: unfiltered' text does not work).
- (Important) Set the "Text format" to "Full HTML".
- Enter the following HTML markup in the textarea:
    <div style="float: right;">
      Page <span class="pdf-page"></span> of <span class="pdf-topage"></span>
    </div>
- Click the "Apply (this display)" button.
- Save changes to the view.
- Try the export.
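For the curious, here is a sketch of how the `pdf-page` and `pdf-topage` spans in that markup can get filled in. It is modeled on the example in the "Footers And Headers" section of the wkhtmltopdf documentation, where wkhtmltopdf passes variables such as `page` and `topage` to header and footer pages in the URL query string. The `pdf-` class prefix comes from the markup in the steps; the function names, and the assumption that the module's script handles exactly this set of variables, are illustrative. It is written in ES5 because wkhtmltopdf renders with an older QtWebKit engine.

```javascript
// Variables wkhtmltopdf passes to HTML headers/footers via the query string.
var WKHTMLTOPDF_VARS = [
  'page', 'frompage', 'topage', 'webpage', 'section', 'subsection',
  'subsubsection', 'date', 'isodate', 'time', 'title', 'doctitle',
  'sitepage', 'sitepages'
];

// Decode a query-string component, falling back to the raw text if the
// percent-encoding is malformed.
function safeDecode(text) {
  try { return decodeURIComponent(text); } catch (e) { return text; }
}

// Parse "?page=2&topage=5" into { page: '2', topage: '5' }.
function parseVars(search) {
  var vars = {};
  var pairs = search.replace(/^\?/, '').split('&');
  for (var i = 0; i < pairs.length; i++) {
    if (pairs[i] === '') { continue; }
    var eq = pairs[i].indexOf('=');
    var key = eq === -1 ? pairs[i] : pairs[i].slice(0, eq);
    var value = eq === -1 ? '' : pairs[i].slice(eq + 1);
    vars[safeDecode(key)] = safeDecode(value);
  }
  return vars;
}

// Put each variable into every element with the class "pdf-<name>".
function substitute(doc, vars) {
  for (var i = 0; i < WKHTMLTOPDF_VARS.length; i++) {
    var name = WKHTMLTOPDF_VARS[i];
    if (!Object.prototype.hasOwnProperty.call(vars, name)) { continue; }
    var elements = doc.getElementsByClassName('pdf-' + name);
    for (var j = 0; j < elements.length; j++) {
      elements[j].textContent = vars[name];
    }
  }
}

// Only runs inside a browser, i.e. inside wkhtmltopdf's header/footer page.
if (typeof window !== 'undefined' && typeof document !== 'undefined') {
  window.addEventListener('load', function () {
    substitute(document, parseVars(window.location.search));
  });
}
```

This is also why the text format must be "Full HTML": the `class` attributes on the spans have to survive filtering for the script to find them.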
Comments
Comment #5
guypaddock commented
The patch is attached.
Comment #6
guypaddock commented
To aid in review of the mega-patch, I've set up a mirror of the repository so that the diff is visible here:
https://gitlab.com/GuyPaddock/views_data_export_pdf/-/merge_requests/1/d...
Comment #8
guypaddock commented
A revised patch is attached that remedies the issue with large exports filling up the InnoDB redo log.
The new approach writes view results for batched exports to a temporary file instead of stashing the view results in the batch sandbox.
Comment #10
guypaddock commented
Caught a few additional, minor loose ends for clean-up.
Attached is a re-roll of #8 with the additional clean-up. Interdiff is also attached, and the PR on my GitLab fork has been updated to aid in review.
Comment #13
guypaddock commented
Now that I'm a co-maintainer on this module, the fix has been committed to dev. Closing out.