Problem/Motivation

The 7.x version of this module supported exporting data via the batch API for large datasets that could not be reliably exported in a single request.

Proposed resolution

Remaining tasks

CommentFileSizeAuthor
#308 views_data_export-2789531-308_for_beta3_only.patch93.94 KBYurkinPark
#303 2789531_interdiff_295-303.txt1.82 KBmalcolm_p
#303 views_data_export-2789531-303.patch76.21 KBmalcolm_p
#295 2789531-295.views_data_export.Support-for-batch-operations.patch76.2 KBayalon
#294 views-data-export-support-for-batch-operations-2789531-294.patch77.55 KBayalon
#292 views_data_export_batch_empty_fields.patch832 bytesayalon
#287 Screenshot 2019-11-05 at 11.53.26.png28.74 KBdarrenwh
#287 Screenshot 2019-11-05 at 11.52.02.png25.89 KBdarrenwh
#278 interdiff_269-277.txt361 bytesalex.skrypnyk
#277 views-data-export-support-for-batch-operations-2789531-277.patch73.17 KBalex.skrypnyk
#276 views-data-export-support-for-batch-operations-2789531-273.patch76.73 KBmikran
#276 interdiff.txt959 bytesmikran
#276 modified-patch-for-composer-1.0.0-beta3-do-not-test.patch75.75 KBmikran
#273 views-data-export-support-for-batch-operations-2789531-273.patch76.73 KBmikran
#269 views-data-export-support-for-batch-operations-2789531-269.patch76.69 KBspheresh
#264 2789531-264-interdiff.txt748 bytesjlscott
#264 views-data-export-support-for-batch-operations-2789531-264.patch76.6 KBjlscott
#259 interdiff_256-259.txt8.8 KBsimplyshipley
#259 views-data-export-support-for-batch-operations-2789531-259.patch76.42 KBsimplyshipley
#256 2789531-256.patch73.9 KBtamarpe
#256 2789531-256.interdiff.txt1.48 KBtamarpe
#255 2789531-255.patch73.6 KBtamarpe
#255 2789531-255.interdiff.txt878 bytestamarpe
#254 2789531-254.interdiff.txt1.07 KBclaudiu.cristea
#254 2789531-254.patch73.6 KBclaudiu.cristea
#251 interdiff-2789531-243-251.txt3.78 KBYurkinPark
#251 2789531-251.patch73.69 KBYurkinPark
#246 views_data_export-batch_support-2789531-246-D8.patch74.45 KBivelin.enchev
#243 interdiff-2789531-233-243.txt1.68 KBYurkinPark
#243 2789531-243.patch74.16 KBYurkinPark
#234 interdiff-2789531-222-233.txt30.98 KBYurkinPark
#234 2789531-233.patch74.08 KBYurkinPark
#230 interdiff-2789531-222-230.txt30.42 KBYurkinPark
#230 2789531-230.patch74.6 KBYurkinPark
#229 interdiff-2789531-222-229.txt30.05 KBYurkinPark
#229 2789531-229.patch74.19 KBYurkinPark
#228 interdiff-2789531-222-228.txt30.08 KBYurkinPark
#228 2789531-228.patch74.16 KBYurkinPark
#222 interdiff_219-222.txt699 bytessimplyshipley
#222 views-data-export-support-for-batch-operations-2789531-222.patch45.06 KBsimplyshipley
#219 interdiff_207-219.txt415 bytessimplyshipley
#219 views-data-export-support-for-batch-operations-2789531-219.patch44.95 KBsimplyshipley
#215 interdiff-2789531-214-215.txt4.84 KBYurkinPark
#215 2789531-215.patch46.57 KBYurkinPark
#214 2789531-214.patch43.08 KBandypost
#214 interdiff.txt7.17 KBandypost
#212 interdiff-203-212.txt699 bytesYurkinPark
#212 views-data-export-support-for-batch-operations-2789531-212.patch43.06 KBYurkinPark
#208 views-data-export-support-for-batch-operations-2789531-208.patch49.21 KBsimplyshipley
#207 views-data-export-support-for-batch-operations-2789531-207.patch44.92 KBsimplyshipley
#204 views-data-export-support-for-batch-operations-2789531-203.patch42.43 KBartematem
#202 views-data-export-support-for-batch-operations-2789531-202.patch42.17 KBartematem
#201 views-data-export-support-for-batch-operations-2789531-201.patch42.65 KBartematem
#200 views-data-export-support-for-batch-operations-2789531-200.patch59.9 KBartematem
#199 views-data-export-support-for-batch-operations-2789531-199.patch60.12 KBartematem
#198 views-data-export-support-for-batch-operations-2789531-198.patch59.99 KBartematem
#197 views-data-export-support-for-batch-operations-2789531-197.patch61.81 KBsimplyshipley
#196 views-data-export-support-for-batch-operations-2789531-196.patch61.71 KBsimplyshipley
#193 hacky-data-export-facets-workaround.patch1.1 KBdrunken monkey
#192 2789531-192.patch42.47 KBSteven Spasbo
#179 2789531-178.patch42.46 KBmhmd
#178 views_data_export_batch_xlsx_fix.patch1.62 KBmhmd
#178 2789531-178.patch42.46 KBmhmd
#172 lando-write-up.txt2.54 KBressa
#170 interdiff-169-170.txt703 bytesnicrodgers
#170 2789531-170.patch42.46 KBnicrodgers
#169 interdiff.txt802 bytesryan.gibson
#169 2789531-169.patch42.46 KBryan.gibson
#168 views_data_export_batch_xlsx_fix.patch1.62 KBuniquename
#149 2789531-149.patch42.33 KBjibran
#149 interdiff-d41d8c.txt924 bytesjibran
#148 2789531-148.patch41.43 KBjibran
#144 2789531-144-interdiff.txt1014 bytesJeffM2001
#144 2789531-144.patch41.4 KBJeffM2001
#143 2789531-142.patch41.44 KBYury N
#142 2789531-142.patch21.08 KBYury N
#139 interdiff-2789531-134-139.txt17.25 KBjohnchque
#139 2789531-139.patch40.34 KBjohnchque
#134 2789531-132.patch22.53 KBalex_ognev
#132 2789531-132.patch21.03 KBalex_ognev
#123 2789531-123.patch23.7 KBgrndlvl
#123 2789531-123-interdiff.txt1.58 KBgrndlvl
#122 2789531-122-interdiff.txt612 bytesgrndlvl
#122 2789531-122.patch23.62 KBgrndlvl
#121 2789531-120-interdiff.txt850 bytesgrndlvl
#120 2789531-120.patch23.62 KBgrndlvl
#116 2789531-116-interdiff.txt622 bytesBerdir
#116 2789531-116.patch23.51 KBBerdir
#106 interdiff-2789531-94-106.txt2.22 KBthtas
#106 2789531-106.patch23.38 KBthtas
#105 interdiff-2789531-94-105.txt2.19 KBthtas
#105 2789531-105.patch23.36 KBthtas
#94 interdiff.txt2.67 KByanniboi
#94 2789531-94.patch21.97 KByanniboi
#83 2789531-83.patch21.13 KBkilles@www.drop.org
#81 2789531-81.patch21.15 KBphjou
#80 2789531-80.patch21.12 KBphjou
#77 2789531-77.patch21.12 KBharsha012
#71 views_data_export-batch_support-2789531-71-D8.patch20.88 KBmartins.bertins
#71 interdiff-2789531-65-71.txt2.5 KBmartins.bertins
#65 views_data_export-batch_support-2789531-65-D8.patch20.71 KBharora
#65 interdiff-2789531-58-65.txt2.48 KBharora
#58 views_data_export-batch_support-2789531-58-D8.patch19.27 KBalan-ps
#58 interdiff-2789531-52-58.txt772 bytesalan-ps
#52 views_data_export-batch_support-2789531-52-D8.patch19.19 KB_Archy_
#52 interdiff-50-52.txt5.45 KB_Archy_
#42 interdiff-42-29.txt2.19 KB_Archy_
#42 views_data_export-batch_support-2789531-42-D8.patch13.25 KB_Archy_
#29 interdiff.txt859 bytesdawehner
#29 2789531-29.patch11.72 KBdawehner
#22 interdiff-2789531-21-22.txt691 bytesjames.williams
#22 views_data_export-add_batch_operations_to_file-2789531-22.patch11.76 KBjames.williams
#21 interdiff-2789531-15-21.txt1.13 KBjames.williams
#21 views_data_export-add_batch_operations_to_file-2789531-21.patch11.76 KBjames.williams
#20 views.view_.content.yml19.66 KBjhedstrom
#15 interdiff-2789531-14-15.txt973 bytesjames.williams
#15 views_data_export-add_batch_operations_to_file-2789531-15.patch11.77 KBjames.williams
#14 views_data_export-add_batch_operations_to_file-2789531-14-interdiff.txt1.51 KBBerdir
#14 views_data_export-add_batch_operations_to_file-2789531-14.patch11.8 KBBerdir
#12 views_data_export-add_batch_operations_to_file-2789531-12-interdiff.txt1.06 KBBerdir
#12 views_data_export-add_batch_operations_to_file-2789531-12.patch11.86 KBBerdir
#11 views_data_export-add_batch_operations_to_file-2789531-11-interdiff.txt3.95 KBBerdir
#11 views_data_export-add_batch_operations_to_file-2789531-11.patch11.8 KBBerdir
#10 views_data_export-add_batch_operations_to_file-2789531-10-interdiff.txt7 KBBerdir
#10 views_data_export-add_batch_operations_to_file-2789531-10.patch11.39 KBBerdir
#7 views_data_export-add_batch_operations_to_file-2789531-7.patch10.61 KBcaxy4
#5 views_data_export-add_batch_operations_to_file-2789531-5.patch10.54 KBcaxy4
#2 2789531_2.patch7.93 KBslashrsm
#47 interdiff-47-29.txt2.06 KB_Archy_
#47 views_data_export-batch_support-2789531-47-D8.patch12.98 KB_Archy_
#50 interdiff-47-50.txt13.46 KB_Archy_
#50 views_data_export-batch_support-2789531-50-D8.patch18.7 KB_Archy_
Support from Acquia helps fund testing for Drupal Acquia logo

Comments

jhedstrom created an issue. See original summary.

slashrsm’s picture

Status: Active » Needs review
FileSize
7.93 KB

I gave this a try. Attached patch is WIP and I am looking for feedback on approach, which is fairly simple and should work with any output format. There are TODOs in the code which show main things that are missing and/or I am not sure if should be implemented differently.

I tested this with CSV export of ~10k users and it worked fine.

Tried with 1000 items CSV export, which took~2.5s. Also tried with 5000 and 10000, which both exploded into my face with 30s timeout. One option, that might be faster, is to cache output of each iteration as a whole and then merge them. I am not sure if that would support all possible formats though.

jhedstrom’s picture

Thanks @slashrsm for picking this up! I haven't had time for a thorough review, but the general direction looks good.

As an aside, we will definitely need to make sure we don't re-create the security issue from #2798759: Ensure access control is performed on batched export downloads when we add this.

    Just a few observations/notes:

  1. +++ b/src/Plugin/views/display/DataExport.php
    @@ -28,30 +32,64 @@ use Drupal\views\ViewExecutable;
    +      $renderer = \Drupal::service('renderer');
    

    Anywhere the \Drupal::service() method is being used, the service should be instead injected to the class.

  2. +++ b/src/Plugin/views/style/DataExport.php
    @@ -272,4 +273,59 @@ class DataExport extends Serializer {
    +    // TODO Split load into batches in case number of items is huge?
    +    // TODO Enormous amounts of rows will result in substantial amount of cache
    +    // data. Consider some other storage? File?
    

    I don't remember offhand for certain, but I think the 7.x version used a file. For large enough data sets, it's potentially possible that the entire result set could not fit in memory, so even batch processing would need to just append to a file perhaps?

Strutsagget’s picture

I get this error

[sitename] redirected you too many times.

caxy4’s picture

Team, here's another approach that writes to file instead of cache.
It seems to avoid the timeout and/or memory limit issues on large datasets the patch in comment #2 is susceptible to.

Similar to @slashrsm's patch, I left a few TODOs in where I wasn't sure on my approach - feedback encouraged!

A few implementation notes - a few relative to sashrsm's approach so people can quickly assess the differences:

  1. Configuration of the batch size is done via the "items per page" setting in the pager. Similar to @slashrsm's patch this means mine won't work with the "all" pager.
  2. Instead of handling the batch processing loop (i.e. init, process batches, finish) in display/DataExport::buildResponse() I leveraged the batch API's "operations" and "finished" callbacks - see callback_batch_operation() and callback_batch_finished() implementations.
  3. Instead of doing the batching in style/DataExport::render(), I did it in an implementation callback_batch_operation(): display/DataExport::processBatch().
_Archy_’s picture

Hello @caxy4. Tested your patch and it seems to work fine so far. Only one problem with this is that it doesn't handle duplicate headings. So for example exporting in CSV on every batch a new heading will be written into the file, which is not wanted. For this case I've added the following, but I don't think this will work for other formats:

// If this is not the first batch remove the headings.
    if ($context['sandbox']['progress'] != 0) {
      $rendered_rows = preg_replace('/^[^\n]+/', '', $rendered_rows);
    }

It would be nice a standard way of doing this. Does anyone have an idea? As I've seen there is no option to disable rendering of the heading.

caxy4’s picture

Uploading a nearly identical version of my previous patch with a fix for a bug where 1 row was skipped per batch because I didn't account for zero indexing when updating the $context['sandbox']['progress'] variable.

@_Archy_ I didn't experience the issue with each batch outputting the heading - I've got my View configured to export fields (i.e. not rendered entities) in CSV format using a pager where "items per page" determines the batch size. If you see an obvious difference in your setup, post here - if not, PM me and we can try to track it down together and post the results here.

_Archy_’s picture

@caxy4 just tried your patch without confirming before that it was good. Now it exports duplicates.
Let's say we have a total of 50 items. If the pager is set to 50, then it will get 50 items out of 50, but because you subtract one when setting progress you'll get 49 and continue to second batch, but it was already completed.

Regarding the duplicate headers. I am also rendering with fields for CSV, I have aliases set for the fields, 1500 items per page, two different content types at the same moment => I get headings on every 1500th row.

Lowell’s picture

I just tried the latest patch with csv and it seems to work pretty good.
Did have an issue where I cannot limit the results, no matter what I set the Items to Display value, it processes all the available records.

Berdir’s picture

Started testing this as well.

One thing I noticed is that it needs better error handling around the file. It hardcodes private right now. I can see why we might want to enforce that, but then we need to make sure it is enabled and disallow creating an export if not. Right now it fatals if private is not enabled. We could also make it configurable.

I also have the problem with the repeated header and also noticed that a line break is missing on the last line, so the header is then on the same line as the last line. I don't see how we can actually make this work without some way for the encoded to know that it is an ongoing stream. For CSV, that means the header, And this header stuff will actually be more complicated with XML/JSON and similar export formats, as they need to do things separately on the first, inner and the last call. I assume we should introduce a standard context argument that we pass to the serializer that gives it more info on this. As a minimal version of that, we could introduce a csv specific setting for showing the headers or not, then we can force that an later runs.

About displaying, I think it would be nice if we'd just return to the original view and would return a message and a link from where you can download the file.

The attached patch adds supports for arguments (untested) and contextual filters (minimally tested) and improves the error handling a bit. I also included a version of the hack in #6. See above for better suggestion but that will also require changes in csv_serialization.

Berdir’s picture

This implements the suggestion around just displaying a message with the download link. I think this is acceptable UX when you have to use batch. But obviously makes it important to be able to turn batch off and fall back to the default behavior.

Berdir’s picture

_Archy_’s picture

Someone should also check whether I am right about the change in #7, my problem for it described in #8. Get a view with normal display and show summary for it, get a display of data export (makes sure of multiple batches) and see if you get the same amount of results.

Berdir’s picture

You are :)

Verified and fixed, that code/comment there was wrong.

Example with items_per_page = 5. You start with offset 0, get results 1, 2, 3, 4, 5. Next run needs to run with offset 5 (so you skip 5 results, not start with the 5th), so you need to add 5, not 4.

Fixed, also fixed the destination when there is no exposed input.

james.williams’s picture

The workaround for CSV headers was checking for the CSV option in the wrong place, I'm not sure how that slipped through? And it would replace the header line with a blank line, which wasn't right either. See attached patch & interdiff.

Berdir’s picture

In my testing, there was a newline missing on the line that added the header, that's why I add that.

And testing the display worked for me, will re-check that.

jhedstrom’s picture

Thanks for all the work on this! I finally got a bit of time to test this out. Running against a ~45k node export, I'm seeing this appear in the logs while the batch is running:

Notice: Array to string conversion in
                                    Drupal\views_data_export\Plugin\views\display\DataExport::processBatch() (line 309

Using CSV, so this is the header workaround line.

When the batch finishes, no file is downloaded, and there are no further error messages in the logs.

james.williams’s picture

@jhedstrom what are the error messages you're seeing in the logs? Please could you post up your view configuration export perhaps please?

Berdir’s picture

@jhedstrom: Do you have an export filename configured for your view? Not sure how it behaves without that (It's pretty well hidden in the path settings form)

james.williams’s picture

Ah, sorry! I'd totally thought I was logging notices so would have seen it, but turned out I wasn't. This patch should fix that notice. I haven't tested against your specific view jhedstrom, but I'd replicated the issue so I imagine this should be the solution now?

james.williams’s picture

Oh my. Literally just discovered another issue with the original patch's work, so here's yet another patch, to fix some bad HTML :-(

Haza’s picture

I was testing this patch with a SearchAPI (solr) views (support is there, we even have facets support thanks to #2827696: Facets support) and right now, with this patch, it crashed.

Error: Call to a member function countQuery() on null in Drupal\views_data_export\Plugin\views\display\DataExport::processBatch() (line 271 of views_data_export/src/Plugin/views/display/DataExport.php).

Even if this might not be top priority to make that working with a SearchAPI views, I wanted to let you know about that.

Berdir’s picture

Yeah, search API is not a priority for a first patch, but we should add an explicit check for the sql backend, to be able to display a proper error instead. Thanks for reporting.

Berdir’s picture

Ok, tested this again. Can confirm that all the fixes are valid and necessary, not sure what kind of drugs I was on when I tested my own patch :)

caxy4’s picture

Excellent contributions @berdir and @james.williams – I appreciate both the feedback and the action.

I'm using the latest patch from comment 22 in production and it's working great.

I did run into a hiccup though...running on a host with multiple application servers causes inconsistent data loss as described in this D7 issue: https://www.drupal.org/node/2352763.

I happen to be running on Pantheon so the issue only occurred on the "live" environment and will only occur for sites on Business and Elite plans; Professional plans run on a single app server.

It's possible we could work around this issue by storing results in the database as shown in the sandbox associated with the D7 issue linked above: http://cgit.drupalcode.org/sandbox-Jim-2352733/tree/

Berdir’s picture

That stream wrapper doesn't seem very scalable to me. At least with that implementation, you have to read the whole existing content from the DB, put it in a file, then append stuff and then write it all back. On every batch run. A better approach would be to to write separate rows and then read it all out again.

I see the problem, but having a non-reliable/slow shared file system is going to get you in trouble in other cases as well, for example with dropzonejs which also requires that you share the temporary file system.

But yes, we should definitely make the used directory/stream wrapper configurable and then you could use something like that if you can live with the consequences. Should be fairly easy to add another setting to the view, possibly in the same place as the filename.

james.williams’s picture

@caxy4 @berdir - is it worth spinning that off into a separate follow-up ticket too as that case could be considered outside the minimum viable product for this original issue? It's one that sounds solvable, but could easily derail the likelihood of getting the main solution done, which I don't think it needs to. Just my opinion :-)

dawehner’s picture

I gave the patch a try, ran into the same problem as haza in #23, and just fixed it. I believe the resulting code is actually even "nicer".

james.williams’s picture

Yes, that is nice!

jhedstrom’s picture

This is looking good. Thanks for all the work on this! Any chance somebody wants to write a test or two for it?

DuaelFr’s picture

+++ b/src/Plugin/views/display/DataExport.php
@@ -218,4 +223,158 @@ class DataExport extends RestExport {
+      // Initialize file we'll write our output results to.
+      // This file will be written to with each batch iteration until all
+      // batches have been processed.
+      // This is a private file because some use cases will want to restrict
+      // access to the file. The View display's permissions will govern access
+      // to the file.
+      $destination = 'private://' . $view->getDisplay()->options['filename'];

It looks like it could go wrong if more than one person ask for the same export (with different filters, or not) at the almost same time.
As we have no global token available in the file name, it seems currently impossible to have a random or time-based filename, what could be a workaround.


Would it be possible to add a warning near the filename in the UI if the private filesystem is not configured instead of waiting for the first try? Or even better, a hook_requirements implementation to warn users earlier.

dawehner’s picture

Would it be possible to add a warning near the filename in the UI if the private filesystem is not configured instead of waiting for the first try? Or even better, a hook_requirements implementation to warn users earlier.

That is a good idea! To be honest we could just use the public file system by default, unless a private file system is configured.

It looks like it could go wrong if more than one person ask for the same export (with different filters, or not) at the almost same time.

What about generating a UUID and use that for the filename?

james.williams’s picture

Using the public file system could expose data that should not be accessible to any user.

Did the potential for concurrent exports exist in the D7 version? If not, we should leave that for a follow-up issue really to keep this one on track.

DuaelFr’s picture

I just had a look. D7 version used a temporary file name then copied the file to its final destination at least in its drush integration.

@see _drush_views_data_export_batch_finished()

if (@drush_op('copy', $temp_file, $output_file)) {

I don't have much time to dig into the whole module to figure out if the UI version does the same or not.

dawehner’s picture

Using the public file system could expose data that should not be accessible to any user.

Fair. We should better be safe than sorry.

DuaelFr’s picture

I just tried #29 on Haza's case (same company here) and it's still failing on our large dataset.
With a step-by-step debug, I can see that the "count" query returns the entire dataset then tries to pass it though \Drupal\search_api\Plugin\views\query\SearchApiQuery::addResults().
addResults() fails on its first loop (too much $result->checkAccess() seem to exhaust server resources).

Edit: our "large" means 3800 indexed items, which is not so large imho.

dawehner’s picture

Edit: our "large" means 3800 indexed items, which is not so large imho.

Oh wow, yeah I just have like 500 items.

I guess we somehow need to be able to indeed just execute the count query :(

DuaelFr’s picture

FWIW I disabled the access check but had the same failure on the next foreach.
Anyhow, I agree with @dawehner: count queries should just deal with the total number of results, not the entire dataset.

My SolR result contains that total and I know ES almost does the same.
Request http://localhost:8983/solr/mycore/select?q=*%3A*&start=0&rows=0&wt=json&...

{
  "response": {
    "numFound": 8543,
    "start": 0,
    "docs": []
  }
}
james.williams’s picture

From what I can see in the D7 version, Search API had some special handling, added in #1258390: Add support for batched exports of Search API views. Perhaps the same sort of thing is going to be needed here.

james.williams’s picture

Status: Needs review » Needs work

I've just discovered a bug: in exporting to a CSV, where a pager value is set (which is used in the batching to set how many to export per batch iteration), the first row after that pager value is duplicated in the export.

For example, I've got my view set to do 15 rows 'per page', which means 15 are to be processed per batch iteration.
Then in my actual export, the first row is the header, followed by 15 correct rows. The next two rows are duplicates of each other. The rest of the export is fine, no more duplicates.

I also just want to remind us that there is still one TODO comment in the latest patch about making batched exports optional, and another around whether using the 'per page' setting for iteration limits is sensible or not. I reckon the latter is probably ok and the comment can just be removed / left in, but the former does need doing.

_Archy_’s picture

I've added an option to choose whether to show download link or instantly download export file. On the website I needed this patch drupal messages were hidden (clients ..) so this was a must.

_Archy_’s picture

Status: Needs work » Needs review

The last submitted patch, 42: views_data_export-batch_support-2789531-42-D8.patch, failed testing.

Berdir’s picture

Your patch is wrong, needs to be relative to the module root.

Also, this is implicity supported if you don't specify a filename I think?

david_garcia’s picture

Status: Needs review » Needs work

The overal approach of VBO in D8 of using core's serializers is just flawed.

Good for small and quick things, but unreliable for large exports. By definition, the serializers we are using (inherited from the symfony serializer component) are abstracted in such a way that serialization MUST be atomic.

Trying to make batched serialization on top of serializers that are atomic makes no sense.

IMHO the way to go is to completely drop the use of serializers (which is bad because it will not allow for code reuse of 3d party libraries) or go upstream and extend serializers to allow for batched serialization (some of them, because other are by definition atomic).

_Archy_’s picture

Status: Needs work » Needs review
FileSize
2.06 KB
12.98 KB

@Berdir I have checked and either way I do it when the batching finishes it redirects to front-page with the link in the drupal message. I believe that this part of code enforces it, and also the fact that no other redirect response is set in the finishBatch.

// The redirect destination is usually set with a destination, fall back
// to <front> to avoid a endless loop.
return batch_process(Url::fromRoute('<front>'));

Re-uploading the wrongly uploaded patch. I have added a checkbox on the view path dialog form to have the option for redirect download.

james.williams’s picture

@_Archy_ - the orange CSV button at the bottom of the view that the export is usually attached to would normally get a 'destination' query parameter on it, which is respected when the batch completes. So the destination parameter is what gets redirected to (which would normally just be the page display that the export display is attached to), and the empty string for the front page in that line is just a fallback for when no destination is supplied.

_Archy_’s picture

@james.williams Oh, I understand now. In my case I don't get any destination as I use the route for the export on menu link -> get redirected to front-page. It would be nice if the this instant download option would still allow the redirect to destination query (fallback front-page) but before that fire a JS window.open(fileUrl, "_blank"). But I don't quite see a clean way to do this.

_Archy_’s picture

Did some work on this.

  1. Removed the pager section from the view configuration. It doesn't make any sense to configure a pager for data export. But we still need to configure somehow the batch size and limit, so I've added a new section for this in the second column: "Export settings".
  2. Made batch export optional. Now there is an 'export method' on the 'Export settings' section that has: 'Standard' or 'Batch'. Might need some naming adjustments.
  3. Made the batch process method use the newly implemented options.
  4. Fixed some coding standard issues.

We cannot get totally get rid of the pager as it is a core part of the views, so I had to channel the new limit option through it. Also it even helps to make the standard export limited with the same option.

james.williams’s picture

Thanks for that work _Archy_! I have some points, but I don't think any major objections:

  1. +++ b/src/Plugin/views/display/DataExport.php
    @@ -30,22 +31,47 @@ use Symfony\Component\HttpKernel\Exception\ServiceUnavailableHttpException;
    +  protected $usesPager = FALSE;
    

    Is this needed? The RestExport class that is being extended sets usesPager to FALSE too.

  2. +++ b/src/Plugin/views/display/DataExport.php
    @@ -27,16 +31,72 @@ use Drupal\views\ViewExecutable;
    -    // Do not call the parent method, as it makes the response harder to alter.
    -    // @see https://www.drupal.org/node/2779807
    -    $build = static::buildBasicRenderable($view_id, $display_id, $args);
    

    Is this comment really no longer needed / true?

  3. +++ b/src/Plugin/views/display/DataExport.php
    @@ -27,16 +31,72 @@ use Drupal\views\ViewExecutable;
    -    // Setup an empty response, so for example, the Content-Disposition header
    -    // can be set.
    

    This comment looks like it ought to be kept?

  4. +++ b/src/Plugin/views/display/DataExport.php
    @@ -91,6 +151,18 @@ class DataExport extends RestExport {
    +    // We don't want to use pager as it doesn't make any sens. But it cannot
    +    // just be removed from a view as it is core functionality. Make sure
    +    // it doesn't interfere with us.
    

    There's a spelling mistake here ('sense' not 'sens').

    But more importantly, what does actually happen if the pager is left in place? The parent RestExport class sets its usesPager property to FALSE, but doesn't do anything else with pagers - why is our case any different? Given that that is core, I think we should follow their UX pattern, if it's applicable. But if there's a good reason to think we're different, then fine, so I'd be interested to hear back.

    (I can totally be persuaded that this is necessary, I'm just surprised to see this is different to RestExport displays.)

  5. +++ b/src/Plugin/views/display/DataExport.php
    @@ -100,6 +172,49 @@ class DataExport extends RestExport {
    +    // Add a category for data export settings in second column.
    

    I don't see this element rendered as a second column, just another row - am I missing something?

  6. +++ b/src/Plugin/views/display/DataExport.php
    @@ -145,6 +260,43 @@ class DataExport extends RestExport {
    +        $form['export_method'] = [
    +          '#type' => 'radios',
    +          '#title' => $this->t('Export method'),
    +          '#default_value' => $this->options['export_method'],
    +          '#options' => [
    +            'standard' => $this->t('Standard'),
    +            'batch' => $this->t('Batch'),
    +          ],
    +          '#required' => TRUE,
    +        ];
    

    Can we have some sort of description about what these options mean, in terms that are as simple as possible please?
    I imagine the D7 version of the module has some wording we could use.

This still leaves the question of how to resolve count queries for non-SQL backends open by the way. (Note this patch puts it back to getting the count from a count query, rather than setting the get_total_rows property and executing the view, which I'm not sure is what we want to do or not.)

_Archy_’s picture

Thanks for the review.

  1. I have added that during experimentations and forgot it there. Removed.
  2. I have looked into that issue and the way it works now, as it is fixed, is basically how it is here, with the only difference that we don't call the parent buildResponse because we already have the view loaded and can call it's buildRenderable method. So I guess we can drop the comment.
  3. Seems like I have removed it accidentally. Readded.
  4. Misspell fixed. Regarding the pager section: I personally see it as confusing for the user to have it for a data export display, as we won't have pages. We used it until this moment for a means to configure the batch size (items per page), but as I have added a new configuration "batch size" under new section "export settings", we don't need it anymore. The only thing the pager is used in the background is to limit the number of rows exported.
  5. That should appear in the second column of the display configuration interface under the "No results behavior" or "Language" sections.
  6. Added description for "batch" and "standard".
  7. About the way the total number of rows is got: I have replaced it with the query way because it wasn't working with my approach. Only now have I read the problem with it described above. So I decided to revert and move this part in the buildBatch, then pass it as an argument to the processBatch. The problem with the get_total_rows is that it also queries all the fields/entities and instantiates them, so I've added a limit(1) on the query, guess it helps.

About the non SQL. I have no experience with these. But shouldn't Drupal core provide means for count query, or the module that provides the driver for it, and then views use it so we can as-well?.. :D

The last submitted patch, 50: views_data_export-batch_support-2789531-50-D8.patch, failed testing.

james.williams’s picture

Issue tags: +Needs tests

Thanks for the explanations and modifications! :-) I only have a few things to reply with this time... lovely job though otherwise.

2. Given that the DataExport::BuildStandard() method is now basically the same as RestExport::buildResponse() then, is there actually any need for DataExport::BuildStandard()? Can't we just call parent::buildResponse() at that point, or am I missing something? Otherwise we're just unnecessarily duplicating code, from what I can see.

4. Yeah ok, I see why the 'pager' concept might not make sense. I also feel it is sufficient though, and basically does the same thing, just with a misleading name. I'd be interested to hear what anyone else thinks? Should we just reuse the views pager (perhaps relabelling it if that would help from a front-end point of view?), since it's doing the same job as these export settings, or is it better to have our own export settings for the job?

5. Oh, I see, sorry I didn't know about that - thanks!

7. We probably need to hear whether that limit(1) works for non-SQL backends. My concern is that it would limit the underlying query so the count would come back as '1', which is not correct.

Drupal/views does provide API to allow other search backends to implement count queries, but various backends may not actually support getting counts without executing full queries first. I'm not sure we could say the API is perfect :-) So VDE needs to do its best to allow for non-SQL backends where feasible, as it is dealing with executing the view at a low enough level that is needs to be aware -- or we could postpone that to a separate ticket, which I'd be in favour of (as the MVP for this ticket is already pretty large).

_Archy_’s picture

2. The only difference is this: we do

$build = $view->buildRenderable();

while the RestExport does

$build = static::buildBasicRenderable($view_id, $display_id, $args);

Our build has the view appended to it and already instantiated (which is good), while the other way around we have view information in the build, from which I guess the renderer will instantiate the view executable. We could use the RestExport::buildResponse(), but then we would be instantiating the same view twice.

4. I've also taught about relabeling the existing pager, but it will be messy. First off we need to remove some of the pager options, then we would also need to completely restructure the forms for the ones decided to keep. Remove the "more link" option, etc..

7. Well it seems to not affect the count query, but just the main query, still good point. Someone should test it.

About the tests: should we test for each export type? It should be tested whether the export method setting, the batch size and the row limiting works. Also do we stress test?

bember’s picture

When I set limit as 0 (i.e. unlimited) only one batch is processed. I suggest a fix as follows.

Instead of:

    $view->build();
    $view->get_total_rows = TRUE;
    // Don't load and instantiate so many entities.
    $view->query->setLimit(1);
    $view->execute();
    $total_rows = $view->total_rows;

Let's make sure we have the correct count value:

    $view->build();
    $count_query = clone $view->query;
    $total_rows = $count_query->query()->countQuery()->execute()->fetchField();
    // Don't load and instantiate so many entities.
    $view->query->setLimit(1);
    $view->execute();
_Archy_’s picture

The way of counting the number of results was flipped two times already. I agree that it should be done by getting the count query, but as pointed out in #23 it breaks for SearchApi. The issue you describe seems to be a bug or a views version functionality difference.

alan-ps’s picture

I added simple fix, in order to allow using replacement patterns for the filename (as it works for standard export).

mikejw’s picture

Just tested here for exporting some eform_submission entities using the entity (not fields) row plugin - works a treat. Appreciate your efforts!

ransomweaver’s picture

I find that with batch turned on,
a) the "download instantly" option results in the file being saved in the private file system, but no download initiated nor any message.
b) with "download instantly" off, a link is presented by drupal_set_message, but it doesn't honor the filename suggestion, comes named e.g. files.csv (or on firefox, just "files" )

Also the "download instantly" info line says "Otherwise you will be redirected to front-page containing the download link" but this is not the behavior. But It think that is not related to this patch.

fafnirical’s picture

With the latest patch (and I imagine with others), I'm having issues getting Facets to process the Views results during the batch process (it works fine with the non-batch data export). It looks like Facets are respected when Views Data Export determines the number of batch jobs to run, but not when it executes the View to get the actual results.

As an example, if the View would return 20 results when faceted, and the batch size is 100, the resulting data export is 101 rows (including header). If the same View would return 104 results when faceted, the resulting data export is 201 rows (including header).

As a workaround, I'm using hook_batch_alter() to add the View's exposed input (including selected facets) to the query string during the batch process.

james.williams’s picture

Just to avoid the more general resolution for this getting held up, I think that, as mentioned previously, this ticket needs to stick to a achieving a minimum viable product, that does not need to include full search API or facet support. Those could then be resolved in a follow up before being included in any stable release. I agree they're important to solve, but this ticket may never end if it's scope is not kept to something quite tight please.

@fafnirical I'm glad you found a workaround too though!

weseze’s picture

I tried the patch from #58 but it is not working as expected for me...

I'm exporting all my users with a ton of extra fields, but I only ever get 100 rows (+1 header row) and never the 1000+ users that are actually in Drupal.

Am I missing (or misunderstanding) some configuration settings? Or is there an actual problem?

skitten’s picture

@weseze the fix in #56 worked for me - otherwise I just got the first batch. Would be good to get that in the patch.

harora’s picture

Did following on this.

1. Added fix given in #56 to make the batch working.

2. I have added option for redirection path. Useful if user want to come back on a desired page with link otherwise fall back on front.

missvengerberg’s picture

Hi guys, I'm using Drupal 8.3.2 + VDE 8.x-1.0-alpha4+1-dev with #65's patch and I can't get it working. I keep having the issue of the batch size and limit rows. The output is always the same, which is the number of rows of the size batch and not the value of the limit field. I tried to change the limit rows from 0 to 5000 (I have right now like 3000 rows to export) to see if that worked that way, but no. It doesn't matter the number I put in limit rows, always export the number for size batch. Any clues?

Thanks

PS: I forgot to say I'm exporting XLS (Serialization Excel) module

joachim’s picture

Status: Needs review » Needs work
  1. +++ b/src/Plugin/views/display/DataExport.php
    @@ -212,10 +408,199 @@ class DataExport extends RestExport {
    +    // Load the View we're working with and set it's display ID so we get the
    

    Nitpick: 'its' without the apostrophe, here and in a few other places.

  2. +++ b/src/Plugin/views/display/DataExport.php
    @@ -212,10 +408,199 @@ class DataExport extends RestExport {
    +      $filename =  \Drupal::token()->replace($view->getDisplay()->options['filename'], array('view' => $view));
    

    What if multiple users are exporting this at the same time? The batch processes will try to write to the same file, won't they?

    What about using the user private tempstore instead? The size of that is LONGBLOB, so 4GB.

Berdir’s picture

You can't append to private store, it would mean loading possibly megabytes of data from the DB, appending something and then writing that again.

We're doing a token replace, if you're worried about conflicts between users, you can include global tokens like the uid or even a random string.

joachim’s picture

Ah ok. Thanks for the clarification!

In that case, yeah, add tokens for uid and timestamp (I can imagine power users running two exports simultaneously by opening up the same view in two tabs and applying different filters).

Berdir’s picture

current-user and current-date are global token types that always exist, so that should be OK, not sure if we can provide defaults, maybe the token browser could be added if not there yet.

martins.bertins’s picture

Fixed last batch size calculation if export limit is set.
With an export limit of 1200 items and a step of 1000 items, in the last batch it processed 800 items instead of 200.

Added workaround for xml export that removes duplicate xml declarations and ensures one root element.

martins.bertins’s picture

Status: Needs work » Needs review

Forgot to change status.

miiimooo’s picture

Latest patch works for me so far.

killes@www.drop.org’s picture

Status: Needs review » Needs work

Doesn't work for me.

I have a private file directory set up and the download file is created there, but the download does not work.

There is no file offered when I set the option to download immediately and when I don't, I get a 403 when I try to download the file from the link.

Shouldn't this patch provide some hook_file_download to check and provide access?

Or is the site maintainer expected to solve this in another way?

miiimooo’s picture

Status: Needs work » Needs review

@killes@www.drop.org that doesn't sound as if it was related to this module or batch. Check your file and drupal permissions and log. 403 indicates there is some sort of access restriction either on the file system level or in the drupal permissions (latter you can test by trying with user 1). Easiest way to check whether your private files directory is set up correctly that I can think of is adding a file field somewhere and setting it to private and trying uploading/downloading something. I really don't think your issue us related to this patch or even the views data export module.

joachim’s picture

Status: Needs review » Needs work
  1. +++ b/src/Plugin/views/display/DataExport.php
    @@ -152,6 +333,21 @@ class DataExport extends RestExport {
    +        $form['redirect_path'] = [
    +          '#type' => 'textfield',
    +          '#title' => $this->t('Redirect path'),
    +          '#default_value' => $this->options['redirect_path'],
    +          '#description' => $this->t('If you do not check Download instantly, you will be redirected to this path containing download link after export finished. Leave blank for <front>.'),
    +        ];
    

    This element should come after automatic_download, and be set to hide when that one is enabled.

  2. +++ b/src/Plugin/views/display/DataExport.php
    @@ -212,10 +408,213 @@ class DataExport extends RestExport {
    +      $filename =  \Drupal::token()->replace($view->getDisplay()->options['filename'], array('view' => $view));
    +      $destination = 'private://' . $filename;
    

    We shouldn't be using the filename set by the user to store the work in progress in the private files folder, as it may be something that doesn't guarantee uniqueness. That could mean that there would be a file clash when two users exporting from the same view, or one user exporting different filter values in two browser windows. Worse, if an admin configures two different export views with the same filename, there would be a security risk if the two views are not accessible to the same roles.

    So the filename that is used to save to the private files should be something the code controls, based on the view name + user ID + timestamp. That should then be read in and served to the user as the filename configured in the settings.

harsha012’s picture

Status: Needs work » Needs review
FileSize
21.12 KB

As per #76, applied the changes.

phjou’s picture

The last patch works on my website.

shawn_smiley’s picture

Status: Needs review » Needs work

The patch in #77 mostly worked for me. Though I did encounter 2 issues:

1. When selecting the "immediate download" option, I was never prompted to download the file after the batch operation completed.

2. When specifying a download URL, the link works, but it appended the number 5 to the filename (ex: broadcast_list_subscribers.csv5). So the end user would need to rename the file after downloading it. (the view is configured with the download filename suggestion of "broadcast_list_subscribers.csv")

phjou’s picture

I confirm the second problem, I corrected it yesterday. I have just changed the order in the name to keep the name configured in the view at the end.

phjou’s picture

I've just corrected the first problem about the direct download.

About the name file, I think keeping the user id is better because if another user export at the same time, you will have bigger problems than just a name. So reverse the order seems a better choice for me in order to keep the extension of the file.

killes@www.drop.org’s picture

@miiimooo you were quite correct, an outdated version of redirect.module caused these issues.

@phpjou your patch is using Drupal::time() which makes your patch 8.3 only.

killes@www.drop.org’s picture

Status: Needs work » Needs review
FileSize
21.13 KB

Updated the patch to use REQUEST_TIME

shawn_smiley’s picture

The patch in #83 does address the problem with the user ID being appended to the end of the filename. Thanks for posting that patch so quickly. :-)

A couple other observations of a less critical nature:

  • We should probably put the generated CSV files in a subfolder of the private directory so that we don't clutter up the root of the private file system. That would also facilitate disk cleanup for deleting old generated files.
  • The earlier comments mention appending/prepending a timestamp to the file name, but in my tests the files do not contain a timestamp in the name. Just the uid.
  • Probably an edge case, but it would be nice if the batch operation would redirect to the URL specified in the destination parameter rather than needing a hardcoded URL. In my case, I have two ways to get to the CSV export. One from an operation on a listing page of possible exports and the other on the actual report page. -- Ignore this comment. This functionality is working.

NOTE: I'm testing against the alpha4 release rather than the dev branch as it's a requirement on my current project that we not use dev releases. So that may account for some of the above variances.

killes@www.drop.org’s picture

Can anybody test whether they actually get a valid XLS file when they use the batch option? I've had reports it would be garbled in some way and at most one "page" of data would be visible.

jhedstrom’s picture

Now that #2862885: Batch: Convert system functional tests to phpunit is in, it should be possible to write a test to verify valid xls file download (the test can declare the xls_serialization module as a test dependency).

killes@www.drop.org’s picture

@jhedstrom I found the "todo" in the xls_serialization code. No batch support yet.

@all I find it rather inelegant to prepend the user ID to the actual downloaded file. Maybe put the ID between the file name and the dot before the extension?

skitten’s picture

Just tried patch 83. Immediate download from batched operation now works for me, but I can't get it to redirect anywhere afterwards - just sits on the 100% progress bar.

miiimooo’s picture

@killes@www.drop.org I can confirm using xls_serialization results in a garbled file

david_garcia’s picture

Status: Needs review » Needs work

Won't universally plug with the serializer layer because it's not designed to allow batches.

Martijn Houtman’s picture

Patch #83 works for me using CSV, redirect can be set using the destination query parameter.

Files do not appear to be removed after cron run, even though they are in the file_managed table with status 0. Is that normal?

Berdir’s picture

That depends on your configuration (see file system settings page), by default they are removed after 6h.

yanniboi’s picture

Just been looking at this for the first time and had a couple of notes:

+++ b/src/Plugin/views/display/DataExport.php
@@ -212,10 +408,219 @@ class DataExport extends RestExport {
+      $current_user = \Drupal::currentUser();
+      $user_ID = $current_user->id();
+      $request_time = REQUEST_TIME;
+      $timestamp = strtotime($request_time);
+      $view_name = \Drupal::token()->replace($view->getDisplay()->options['filename'], array('view' => $view));
+      $filename = $user_ID . $timestamp . $view_name;
  • REQUEST_TIME is deprecated and \Drupal::time()->getRequestTime() (properly injected ;) ) should be used now
  • strtotime($request_time) returns false as it is already a timestamp
yanniboi’s picture

3 quick changes:

Fixed timestamp

Timestamp is now correctly prepended to filename.

Export subdirectory

Putting files in views_data_export subdirectory of Private. This helps with knowing that a file was created by views_data_export (which is helpful for trimming the uid and timestamp off the top of the filename in hook_file_download).

Views schema

Added views schema for the new display options.

shawn_smiley’s picture

Note that \Drupal::time()->getRequestTime() is only available in Drupal 8.3 or newer. This change would make this module unusable for anyone still running older versions of D8. See comment #82 above.

yanniboi’s picture

Thanks @shawn_smiley, I missed that!

Will update the patch to fix.

andrewbelcher’s picture

#95 Drupal core < 8.3 is unsupported, so I would suggest the answer is to up the version requirement to 8.3 rather than add deprecated code in?

#94 I wonder if the implementation of hook_file_download @yanniboi is referring to should be part of this patch. I think the end user would expect the file that downloads to respect the filename given in the view's configuration.

The alternative, rather than prepend timestamp/user to the file name, is to use that information to adjust the directory we store it in. Then we don't need to change the file name as it will already be as expected.

ThomWilhelm’s picture

Gave this patch a test on Drupal 8, working really nicely for me, only one major bit of feedback against the D7 version below.

On the D7 version, when you create a view of say 10,000 nodes and create a data export display, the preview will always be limited to 20 items and display the message above "A maximum of 20 items will be shown here, all results will be shown on export.". This isn't implemented on the D8 version, so the view editor can hang for a long time when 10,000+ rows are output, this gets updated each time you change the view, which makes editing the view a real pain.

jhedstrom’s picture

Issue summary: View changes

Thanks @ThomWilhelm I've updated the IS to note that we should implement that limit to avoid that issue :)

ibullock’s picture

Was there any resolution to the issue of duplicate rows? I saw it called out a few times but never directly addressed in later comments.

I'm exporting a CSV for a few thousand nodes and getting random sets of duplicates each time on an Acquia environment, but not locally.

Contacting Acquia support I was directed to set my temp files directory to be environment specific and to some old documentation for D7, but that seems only to be a start to the solution as far as I can tell and I still get duplicates.

Edit: Should also mention there are occasional rows skipped as well, so overall this seems to be an issue of tracking progress in the batch.

josebc’s picture

@ibullock this can happen on a multi-server environment if the tmp directory is not in sync (glusterfs, nfs ..) since batch can write to a file on a different server each time, make sure your tmp directory in in sync between servers

bendev’s picture

tested patch #94 successfully locally.

On my production server, i got this error

"The data could not be saved because the destination is invalid. More information is available in the system log.
Export failed. Make sure the private file system is configured and check the error log."

needed to set the private file directory scorrectly in settings.php and make it writable + drush cr

joachim’s picture

> needed to set the private file directory scorrectly in settings.php and make it writable + drush cr

That's something that the Views plugin in this patch could check in its validate() method, and prevent you from adding itself to the view until that's been set up by the site admin.

khiminrm’s picture

Hi!

I've tried patch from #94. It works, but I had problems with file name. I created export for users and after using the same filters twice I received file with name 'csv' without file extension. After clearing site's cache all works again but only one time for each combination of filters.

thtas’s picture

@khiminrm funny I've just been working on fixing a very similar problem.
The way it manifested for me was that I wanted anonymous users to be able to do batch exports.

The problem is that the response to the route which this custom display creates is cached (for anonymous, not for authenticated), so if you redirect to /batch?id=1 then the next time you try to export you get sent to the old batch URL and the batch no longer exists (with a "no active batch" type error).

I also found that Anonymous users don't get access to download the exported data.

It seems the idea is to use the view permissions to determine access to the file:

// This is a private file because some use cases will want to restrict
// access to the file. The View display's permissions will govern access
// to the file.

But I think it should be sufficient to just allow download access to files created by the same user (the view access controls the creation of the file but after that you don't really have any good context to check access to the file anyway).

So this updated patch (based on #94) does 2 things:

1. Stop caching the route for this display if it's using batch mode
2. Explicitly allow access to download the batch file so that users don't get access denied after they've generated the file.

thtas’s picture

FileSize
23.38 KB
2.22 KB

Updated the patch from #105 to suppress some warnings on the file entity access check

khiminrm’s picture

Hi, @thtas!

Thanks for your patches. I've tried the last one and noticed another problem: file is exported with last selected filter. Reset button doesn't help. I tried to disable cache in export views and to remove option to remember last selection in exposed filters, but it didn't help (

khiminrm’s picture

Hi, @thtas!

I've tried export without patch and had the same problem with last selected filter applied to results in file on page without filters.

How to check if batch is working with patch?

Thanks!

khiminrm’s picture

Hi, @thtas!

I've tried your patch again, checked views settings and noticed that I've missed settings for batch in my previous testing. I've enabled batch and your patch works for me. Didn't notice any bugs for this moment.

Thank you!

skitten’s picture

I'm running the patch from #105 and seeing a weird thing:

If the logged in user is an admin it works great. If they're not then it hangs just displaying "Exporting data...".

The odd thing is that if I keep reloading that page I can see it adding [batch size] lines to the file in private each time, until the file is complete. But it won't redirect back after that, I still just see "Exporting data..."

I couldn't reproduce this on a clean drupal install so it may be that I have something else interfering, but I'm not really sure where to start looking, since I don't really know how the batch system works. So I'm wondering if "works for admin" is clue enough for someone to guess what might be going on.

skitten’s picture

Figured out my issue in #110 - it was some customizations I'd made in my theme.

StevenWill’s picture

Status: Needs work » Reviewed & tested by the community

Patch from #106 worked. The batch operation is a major feature for this module and basically can not use this module without it. With this patch I was successfully able to perform a export that took 15 minutes and several exports worked. Please add this patch to the module branch!

skitten’s picture

I am also successfully using using the patch for large exports.

Mile3’s picture

Tested #106 and working with 10,000+ records.

Can we get this patch into dev?

Thanks!

_Archy_’s picture

Status: Reviewed & tested by the community » Needs work

I think we still need tests for this.

Berdir’s picture

Status: Needs work » Needs review
FileSize
23.51 KB
622 bytes

The thing is that this module currently has basically no tests except a very basic kernel test that just does some config dependency checks.

I'll see if I can create something, no promises.

I noticed a problem with this on 8.5 due to the missing _format argument, while it doesn't solve all cases, at least the cases with an attached display are fairly easy to solve. See #2937942: REST views: regression in 8.5.x: view with display using 'csv' format now returns a 406 response, need to add ?_format=csv to URL to make it work.

What we can do is just add the selected format to the URL.

giorgio79’s picture

claudiu.cristea’s picture

giorgio79’s picture

Idea: How about using D8 core Migrate framework for exports instead of direct batch api? Steven Jones brainstormed about it here #2602650: [meta] Rethink Data Export from the ground up

grndlvl’s picture

So I am thinking that we don't want this logic here... however, I ran into an issue with using this with SearchApiQuery. To which I adjusted to add support for the count query. Note this requires #2958207: Implement query() as well.

grndlvl’s picture

FileSize
850 bytes

Whoops forgot about the interdiff.

grndlvl’s picture

FileSize
23.62 KB
612 bytes

Use static instead of self when calling static methods so that overrides are called.

grndlvl’s picture

FileSize
1.58 KB
23.7 KB

Fixing issue with hook_views_pre_view() not being called.

Spec0’s picture

Hi everyone, the patch works thanks, there are some notes that I would like to add to the documentation:
1. Make sure to add the correct MIME types(csv, xml) to document file entity type otherwise it will result in a file entity with broken bundle.
2. If you want authenticated or "other" user roles to be able to download the exports, permission "Document: Download own files" should be checked.

paulmckibben’s picture

Status: Needs review » Needs work

I tried the patch, but it only runs the first iteration of the batch. Only the number of rows I configure for the batch size actually get exported.

grndlvl’s picture

@paulmckibben I am using this one and it appears to be working as expected. Can you double check your limit?

paulmckibben’s picture

I did - I actually tried a few different limit sizes to see if they made a difference, and I always got the exact number of rows that matched the limit (first iteration of the batch only).

The only possibility I can think of as to why it works for you and not for me is that the site I'm trying this on is on 8.4.6. I will be updating it to 8.5.x soon and can try again then. Thanks for looking into this.

RobinTimman’s picture

Hi,
I am using this patch, but the limit is set to 0 and the batch size to 250, now will the issue be, that it only exports the size that the batch size is.
If I change the batch size to 500 it exports only 500 rows.

marklein’s picture

Hi, it happens to me, exactly, the same as the previous comment.

Spec0’s picture

Hi, this is because the changes in 2789531-123.patch breaks the total rows when limit is set to 0. Use 789531-116.patch

marklein’s picture

Thanks, it works perfectly!

alex_ognev’s picture

FileSize
21.03 KB

Hi, here is patch based on 2789531-123.patch , I have fixed total count calculation there , also was implemented https://www.drupal.org/files/issues/2945252-5.patch

alex_ognev’s picture

alex_ognev’s picture

johnchque’s picture

Status: Needs work » Needs review

Triggering test bot.

tobiberlin’s picture

Tested 2789531-132.patch - works great

matsbla’s picture

Patch in #134 works perfect!

johnchque’s picture

I've been testing this and I found some problems (not that big though):
- The redirect function does not work with the Standard export method. Not sure if it should anyway but if it doesn't we should clarify that.
- There is no validation for the redirect path, I entered a path with no / and it exploded with a WSOD.
- As stated in #76 it might be better to use states to show or not the redirect when the Download instantly checkbox is not checked.

johnchque’s picture

Added some test views so we can tests easier this.

gun_dose’s picture

Applied patch from #139 - with xls-format it produces broken file. MS Office does not open this. But it can be opened by Libre Office and it contains only items from last batch operation.

jienckebd’s picture

#132 worked perfectly -- thank you so much! Saved me a ton of time.

Yury N’s picture

I have faced same issue from #140. Looks like latest patches do not contain workaround for xls/xslx from #132. Added this to patch from #139.

Yury N’s picture

FileSize
41.44 KB

Sorry, messed with patch file

JeffM2001’s picture

Updated the patch with the suggestion in #97, moving the user id and timestamp to the directory path, so that the filename remains what is configured in the view settings.

stijndmd’s picture

With the final patch I'm getting an ajax error:

Notice: Undefined index: automatic_download in /mysite/web/modules/contrib/views_data_export/src/Plugin/views/display/DataExport.php on line 614
{"status":true,"percentage":"100","message":"100% complete. Time elapsed: 6 sec","label":""}

Edit: I also can no longer edit the "path" in the view settings. And I don't see the "automatic download" settings in the view configuration, I guess I should and I guess that's where it'd going wrong for me?

chipway’s picture

Status: Needs review » Needs work
matsbla’s picture

After I apply this patch and create a view for data export, and I start to get a warning with serial_launcher with message: No free threads available for launching jobs, and it happens everytime cron runs. However I don't get any more details in the warning.

jibran’s picture

Status: Needs work » Needs review
FileSize
41.43 KB

Here is the reroll.

jibran’s picture

FileSize
924 bytes
42.33 KB

Re-added

+<?php
+
+use Drupal\Core\Access\AccessResult;
+use Drupal\Core\Entity\EntityInterface;
+use Drupal\Core\Session\AccountInterface;
+use Drupal\file\FileInterface;
+
+/**
+ * Implements hook_ENTITY_TYPE_access().
+ */
+function views_data_export_file_access(EntityInterface $entity, $operation, AccountInterface $account) {
+  /* @var $entity FileInterface */
+  if ($operation == 'download') {
+    // Grant access to download the generate export if it was generated by the
+    // current user.
+    $pattern = '/\/\/views_data_export\/(?<uid>\d+)-\d+-.+\.csv$/';
+    preg_match($pattern, $entity->getFileUri(), $matches);
+    if (isset($matches['uid']) && $matches['uid'] == $account->id()) {
+      return AccessResult::allowed();
+    }
+  }
+}
+

which got removed in #132.

redsky’s picture

I tested patch #144 and it worked after I setup my private files. Only issue was the file didn't open, I had to rename it to .csv to have it open in excel like it does with the non-batch scenario. This was on Safari on a Mac.

Thank you for the work on this!

divya141’s picture

I have tried the patches #149 , #144 both with the current module for Data export as Xlsx file. Patch is not getting applied properly. Can anyone, Please attach the Applied Patched Module which is working for the Xlsx format data download using the Batch Process method. THANKS !!!

jibran’s picture

@divya141 you need to update the module to the latest dev version to apply this patch.

maxplus’s picture

Hi,
thanks for all the good work to getting the batch operation working in the D8 version!
The patch from #149 applies clean to the current dev version and I also get the batch settings in the views-UI.
The only problem I'm getting is that during export I get an ajax error with:

Error: Class 'PHPExcel_IOFactory' not found in Drupal\views_data_export\Plugin\views\display\DataExport::processBatch() (line 563

Before the patch and running 8.x-1.0-beta1 with xls_serialization it was working fine.
I also tried using the dev version of xls_serialization (including https://www.drupal.org/project/xls_serialization/issues/2905511) but that did not help.

When I additionally run composer require phpoffice/phpspreadsheet it does not change anything.

What am I'm doing wrong?

Thanks!

_gradient_’s picture

Hey guys, is it possible to make redirection with 'Download instantly' enabled?

ashish.mahajan’s picture

Hello Everyone,

The patch works for small amount of data. But i have around 1million data to export and seems like batch is also not working to export such a huge data. I am trying with batch and it is stuck and did not complete the operation everytime. Can anyone please suggest what should be the approach to export such huge data in csv or xls format. Thanks in advance.

SylvainM’s picture

Patch #149 works great, thanks

aniketto’s picture

Patch #149 works! Thanks @jibran!

There is a glitch though, the generated file does not have an extension of the format which it was exported in.
I tried with CSV and XML, but got a random file_name like fileEhyIsh or file046OEJ with no extension.
I manually had to add .csv in order to open the file correctly.

Could we have a fix for this along with what @_gradient_ mentioned in #154?

mangy.fox’s picture

Patch #149 is working beautifully for me!

divya141’s picture

@mangy.fox Are you Guys able to use the Batch Process for data more than 50 thousand in Excel (as xls or xlsx) , or for any type of large data ?? Please, reply

mangy.fox’s picture

Successfully exported ~32000 records (4Mb file) as a CSV. Not currently using XLS.

divya141’s picture

@ashish.mahajan Is your issue resolved ?? are you able to import the data ? If yes, what is the approach you took to import the data ??

Thanks !!!

anpolimus’s picture

Applied patch to my project. Export feature is working properly.
8 000 items were exported successfully.
Remaining item is to migrate this test to Drupal 8
https://cgit.drupalcode.org/views_data_export/tree/tests/base.test#n355

hanoii’s picture

Issue summary: View changes

#149 seems to working very good.

I addressed #98 by submitting #3008369: Limit live preview queries so that errors/or no pager doesn't crash it, I think it's a core issue.

divya141’s picture

Anyone is able to use the Batch Process for data more than 50 thousand in Excel (as xls or xlsx) , or for any type of large data ?? Please, reply, which is working for them. Please, reply

ryan.gibson’s picture

Status: Needs review » Reviewed & tested by the community

This works great for me and seems that it's worked great for a few others as well. I patched the latest dev version, changed the view export to a batch, configured the private file directory, cleared the cache, and ran the export without issue.

opdavies’s picture

I've applied the patch from #149 originally against 1.0-beta1, and found that whilst the the batch settings appeared in the Views UI, the changes weren't persisting.

I then updated the module to the latest dev version with composer req 'drupal/views_data_export:1.x-dev' and after rebuilding the cache again the settings persisted correctly.

I had the same permissions issues as #102, but now the export is working as expected.

andypost’s picture

Status: Reviewed & tested by the community » Needs work

It needs
- port/update tests - see #162
- upgrade path - update hook

uniquename’s picture

@maxplus I had the same issue. To me it seems, that the patch uses PHPExcel classes instead of PhpSpreadsheet classes that come with the xls_serialization module, because it requires "phpoffice/phpspreadsheet": "^1.0"

I adjusted the classes according to https://github.com/PHPOffice/PhpSpreadsheet/blob/develop/src/PhpSpreadsh...

Find a patch attached that works for me on top of #149.

ryan.gibson’s picture

FileSize
42.46 KB
802 bytes

The following patch aims to resolve the issue in which the generated file download doesn't have the extension, preventing applications from properly opening them.

nicrodgers’s picture

FileSize
42.46 KB
703 bytes

I've updated the patch from #169 to fix the regex pattern in views_data_export_file_access() so that it works with the location of saved files.

jlvelasco’s picture

i got this error when start the batch process,

Hubo un error HTTP AJAX.
Código de Resultado HTTP: 200
A continuación se detalla la información de depuración.
Ruta: /batch?id=7116&op=do_nojs&op=do
StatusText: OK
ResponseText: TypeError: Argument 1 passed to Drupal\views\Entity\Render\EntityFieldRenderer::getEntityTranslation() must implement interface Drupal\Core\Entity\EntityInterface, null given, called in /var/www/vivelus-prisma/docroot/core/modules/views/src/Plugin/views/field/EntityField.php on line 1046 in Drupal\views\Entity\Render\EntityFieldRenderer->getEntityTranslation() (line 69 of /var/www/vivelus-prisma/docroot/core/modules/views/src/Entity/Render/EntityTranslationRenderTrait.php).

core: 8.3
module: 1.x-dev
patch : #170

someone?

ressa’s picture

FileSize
2.54 KB

I have tested patches in #170 and #168 with a Search API Solr View and if I batch export as a CSV-file I get this error:

The website encountered an unexpected error. Please try again later.

In the log file:

Error: Call to a member function execute() on null in Drupal\views_data_export\Plugin\views\display\DataExport::buildBatch() (line 71 of /app/web/modules/contrib/views_data_export/src/Plugin/views/display/DataExport.php)

Drupal 8.6.3 with search_api_solr and latest views_data_export dev-release

composer require "drupal/search_api_solr"
composer require 'drupal/views_data_export:1.x-dev'

Patches in composer.json:

"patches": {
    "drupal/views_data_export": {
        "patch #170": "https://www.drupal.org/files/issues/2018-11-19/2789531-170.patch",
        "patch #168": "https://www.drupal.org/files/issues/2018-11-08/views_data_export_batch_xlsx_fix.patch"
    }
}

Note: I am adding my notes for a quick set up with Lando.

anpolimus’s picture

@jlvelasco I think, your issue is not connected with batch.
Can you test your view on small amount of data that don't require batch?

mhmd’s picture

#170 works fine with me on drupal 8.5.8

Big thanks to all contributors, I hope these patches views_data_export_batch_xlsx_fix.patch & 2789531-169.patch applied to dev version at least soon to be easy get that feature.

mahfuznz’s picture

Hi,

I am using Drupal 8.5.3 at the moment with views_data_export. I want to implement batch operation for bigger downloads of CSV and XLS file. Which patch should I apply first? views_data_export_batch_xlsx_fix.patch or 2789531-169.patch

I tried to apply 2789531-169.patch using the command from the root views_data_export module directory

patch -p1 < path/2789531-169.patch

Following were shown after executing the command:

patching file config/schema/views_data_export.views.schema.yml
patching file src/Plugin/views/display/DataExport.php
Hunk #8 FAILED at 419.
1 out of 8 hunks FAILED -- saving rejects to file src/Plugin/views/display/DataExport.php.rej
patching file src/Plugin/views/style/DataExport.php
Hunk #3 succeeded at 278 (offset -15 lines).
patching file tests/modules/views_data_export_test/test_views/views.view.views_data_test_1.yml
patching file tests/modules/views_data_export_test/test_views/views.view.views_data_test_2.yml
patching file tests/modules/views_data_export_test/test_views/views.view.views_data_test_3.yml
patching file views_data_export.module

Could someone please help me with that?

And after running the patch command and clearing cache do I have to run any other command to make the new patch effective so that I can select batch file download option in the back-end

Your help is much appreciated.

Many thanks,

Mahfuz

mahfuznz’s picture

Hi,

Just to add some extra information. I tried to apply the following command in the following sequence with --dry-run option to test that showed me the following results:

patch -p1 --dry-run < ../../patches/views_data_export/views_data_export_batch_xlsx_fix.patch

patching file src/Plugin/views/display/DataExport.php
Hunk #1 FAILED at 9.
Hunk #2 FAILED at 562.
Hunk #3 FAILED at 580.
3 out of 3 hunks FAILED -- saving rejects to file src/Plugin/views/display/DataExport.php.rej

patch -p1 --dry-run < ../../patches/views_data_export/2789531-170.patch

patching file config/schema/views_data_export.views.schema.yml
patching file src/Plugin/views/display/DataExport.php
Hunk #8 FAILED at 419.
1 out of 8 hunks FAILED -- saving rejects to file src/Plugin/views/display/DataExport.php.rej
patching file src/Plugin/views/style/DataExport.php
Hunk #3 succeeded at 278 (offset -15 lines).
patching file tests/modules/views_data_export_test/test_views/views.view.views_data_test_1.yml
patching file tests/modules/views_data_export_test/test_views/views.view.views_data_test_2.yml
patching file tests/modules/views_data_export_test/test_views/views.view.views_data_test_3.yml
patching file views_data_export.module

Any help or direction is much appreciated.

Thanks,

Mahfuz

mhmd’s picture

Download the dev version of the module first then apply patches in #170

mhmd’s picture

FileSize
42.46 KB
1.62 KB

The batch "2789531-170.patch" has bug writing cells in wrong order I fixed as below diff

--- a/modules/contrib/views_data_export/src/Plugin/views/display/DataExport.php
+++ b/modules/contrib/views_data_export/src/Plugin/views/display/DataExport.php
@@ -578,7 +578,7 @@ public static function processBatch($view_id, $display_id, array $args, array $e
         $rowIndex++;
         $colIndex = 0;
         foreach ($row->getCellIterator() as $cell) {
-          $previousExcel->getActiveSheet()->setCellValueByColumnAndRow($colIndex++, $rowIndex, $cell->getValue());
+          $previousExcel->getActiveSheet()->setCellValueByColumnAndRow(++$colIndex, $rowIndex, $cell->getValue());
         }
       }

Translate complete string

--- a/modules/contrib/views_data_export/src/Plugin/views/display/DataExport.php
+++ b/modules/contrib/views_data_export/src/Plugin/views/display/DataExport.php
@@ -104,7 +104,7 @@ protected static function buildBatch(ViewExecutable $view, $args) {
       ],
       'title' => t('Exporting data...'),
       'progressive' => TRUE,
-      'progress_message' => '@percentage% complete. Time elapsed: @elapsed',
+      'progress_message' => t('@percentage% complete. Time elapsed: @elapsed'),
       'finished' => [static::class, 'finishBatch'],
     ];
     batch_set($batch_definition);

Uploaded patch files worked with me on dev version.

I hope dev release published soon to help fixing issues better.

mhmd’s picture

mahfuznz’s picture

Thanks @mhmd. I replaced my beta1 with dev version and applied patch #170 and it worked. Then I cleared the cache but I can't see any settings for batch operation under Format: Data export(csv) | Settings

Could anyone let me know where to look for it? Or do I have run any config update or something?

Another thing, do I need to apply the other patch views_data_export_batch_xlsx_fix.patch because I will also need download batch operation for xls format?

Thanks again.

codesmith’s picture

I tried the 2789531-178.patch in #179. In the Path Settings -> Path: dialog I set the path and filename and checked the "Download instantly" box. "Redirect Path" was left empty.

In Export Settings (@mahfuznz it's in the middle column of the View settings under No Results Behavior) I set the Batch size: 1000 and Limit: no limit.

Results:

1) Batch export worked for exporting 5800 rows.
2) I ended up getting the results displayed in the browser - no file was downloaded.
3) Error log: Notice: Undefined index: redirect_path in Drupal\views_data_export\Plugin\views\display\DataExport->buildOptionsForm() (line 358
4) Error log: Notice: Undefined index: automatic_download in Drupal\views_data_export\Plugin\views\display\DataExport->buildOptionsForm() (line 352
5) In the Path Settings -> Path dialog the help for the "Redirect Path" says "If you do not check Download instantly, you will be redirected to this path containing download link after export finished. Leave blank for ."

Thanks for the efforts everyone!

mahfuznz’s picture

Thanks @codesmith I found the Export settings already but your comment gave me information about some other options like Instand Download and Redirect Path. The download(both instant and link on the redirected page) is working if I am logged in as Admin. But when I try it with some other user role, the download is not working. It's just redirecting to the page but no link is shown nor it's downloaded instantly. Any idea what could be the reason?

Many thanks.

mahfuznz’s picture

I have found the reason for not downloading the file instantly or not showing the download link. I created private folder as the batch operation was not running. I had to give "View private files" to the roles to make it work.

So the CSV download is working fine with my Drupal 8.5.3 but when I try to download the Data export (xlsx) option, I get the following error in my log:

Notice: Undefined index: automatic_download in Drupal\views_data_export\Plugin\views\display\DataExport::processBatch() (line 616 of /Users/mrah406/dev/nihi/nutriweb/docroot/modules/contrib/views_data_export/src/Plugin/views/display/DataExport.php) #0 /Users/mrah406/dev/nihi/nutriweb/docroot/core/includes/bootstrap.inc(582): _drupal_error_handler_real(8, 'Undefined index...', '/Users/mrah406/...', 616, Array) #1 /Users/mrah406/dev/nihi/nutriweb/docroot/modules/contrib/views_data_export/src/Plugin/views/display/DataExport.php(616): _drupal_error_handler(8, 'Undefined index...', '/Users/mrah406/...', 616, Array) #2 /Users/mrah406/dev/nihi/nutriweb/docroot/core/includes/batch.inc(294): Drupal\views_data_export\Plugin\views\display\DataExport::processBatch('search_product', 'data_export_2', Array, Array, 5, Array) #3 /Users/mrah406/dev/nihi/nutriweb/docroot/core/includes/batch.inc(137): _batch_process() #4 /Users/mrah406/dev/nihi/nutriweb/docroot/core/includes/batch.inc(93): _batch_do() #5 /Users/mrah406/dev/nihi/nutriweb/docroot/core/modules/system/src/Controller/BatchController.php(55): _batch_page(Object(Symfony\Component\HttpFoundation\Request)) #6 [internal function]: Drupal\system\Controller\BatchController->batchPage(Object(Symfony\Component\HttpFoundation\Request)) #7 /Users/mrah406/dev/nihi/nutriweb/docroot/core/lib/Drupal/Core/EventSubscriber/EarlyRenderingControllerWrapperSubscriber.php(123): call_user_func_array(Array, Array) #8 /Users/mrah406/dev/nihi/nutriweb/docroot/core/lib/Drupal/Core/Render/Renderer.php(582): Drupal\Core\EventSubscriber\EarlyRenderingControllerWrapperSubscriber->Drupal\Core\EventSubscriber\{closure}() #9 /Users/mrah406/dev/nihi/nutriweb/docroot/core/lib/Drupal/Core/EventSubscriber/EarlyRenderingControllerWrapperSubscriber.php(124): Drupal\Core\Render\Renderer->executeInRenderContext(Object(Drupal\Core\Render\RenderContext), Object(Closure)) #10 /Users/mrah406/dev/nihi/nutriweb/docroot/core/lib/Drupal/Core/EventSubscriber/EarlyRenderingControllerWrapperSubscriber.php(97): Drupal\Core\EventSubscriber\EarlyRenderingControllerWrapperSubscriber->wrapControllerExecutionInRenderContext(Array, Array) #11 /Users/mrah406/dev/nihi/nutriweb/docroot/vendor/symfony/http-kernel/HttpKernel.php(151): Drupal\Core\EventSubscriber\EarlyRenderingControllerWrapperSubscriber->Drupal\Core\EventSubscriber\{closure}() #12 /Users/mrah406/dev/nihi/nutriweb/docroot/vendor/symfony/http-kernel/HttpKernel.php(68): Symfony\Component\HttpKernel\HttpKernel->handleRaw(Object(Symfony\Component\HttpFoundation\Request), 1) #13 /Users/mrah406/dev/nihi/nutriweb/docroot/core/lib/Drupal/Core/StackMiddleware/Session.php(57): Symfony\Component\HttpKernel\HttpKernel->handle(Object(Symfony\Component\HttpFoundation\Request), 1, true) #14 /Users/mrah406/dev/nihi/nutriweb/docroot/core/lib/Drupal/Core/StackMiddleware/KernelPreHandle.php(47): Drupal\Core\StackMiddleware\Session->handle(Object(Symfony\Component\HttpFoundation\Request), 1, true) #15 /Users/mrah406/dev/nihi/nutriweb/docroot/core/modules/page_cache/src/StackMiddleware/PageCache.php(99): Drupal\Core\StackMiddleware\KernelPreHandle->handle(Object(Symfony\Component\HttpFoundation\Request), 1, true) #16 /Users/mrah406/dev/nihi/nutriweb/docroot/core/modules/page_cache/src/StackMiddleware/PageCache.php(78): Drupal\page_cache\StackMiddleware\PageCache->pass(Object(Symfony\Component\HttpFoundation\Request), 1, true) #17 /Users/mrah406/dev/nihi/nutriweb/docroot/core/lib/Drupal/Core/StackMiddleware/ReverseProxyMiddleware.php(47): Drupal\page_cache\StackMiddleware\PageCache->handle(Object(Symfony\Component\HttpFoundation\Request), 1, true) #18 /Users/mrah406/dev/nihi/nutriweb/docroot/core/lib/Drupal/Core/StackMiddleware/NegotiationMiddleware.php(50): Drupal\Core\StackMiddleware\ReverseProxyMiddleware->handle(Object(Symfony\Component\HttpFoundation\Request), 1, true) #19 /Users/mrah406/dev/nihi/nutriweb/docroot/vendor/stack/builder/src/Stack/StackedHttpKernel.php(23): Drupal\Core\StackMiddleware\NegotiationMiddleware->handle(Object(Symfony\Component\HttpFoundation\Request), 1, true) #20 /Users/mrah406/dev/nihi/nutriweb/docroot/core/lib/Drupal/Core/DrupalKernel.php(664): Stack\StackedHttpKernel->handle(Object(Symfony\Component\HttpFoundation\Request), 1, true) #21 /Users/mrah406/dev/nihi/nutriweb/docroot/index.php(19): Drupal\Core\DrupalKernel->handle(Object(Symfony\Component\HttpFoundation\Request)) #22 /Users/mrah406/.composer/vendor/laravel/valet/server.php(133): require('/Users/mrah406/...') #23 {main}.

Any idea what could be the reason. Please note that I could not apply this patch because when I tried dry-run option it showed the following message:

patch -p1 --dry-run < ../../patches/views_data_export/views_data_export_batch_xlsx_fix.patch

patching file src/Plugin/views/display/DataExport.php
Hunk #1 FAILED at 9.
Hunk #2 FAILED at 562.
Hunk #3 FAILED at 580.
3 out of 3 hunks FAILED -- saving rejects to file src/Plugin/views/display/DataExport.php.rej

Your help will be greatly appreciated.

Thanks once again for all the support.

codesmith’s picture

Following up to my #181 post:

1) File *does* automatically download in Chrome and Firefox but not Safari (OSX 10.13.6).

2) Tried adding a "Redirect path" url. Got WSOD. In logs it said the url was invalid because there was no leading / - so that should be added to the help instructions. Also the help for the "Download instantly" box says "Otherwise you will be redirected to above Redirect path" -- the redirect path field is *below*, not above.

3) Adding the "Redirect path" just left me on the "Exporting data..." batch page with bar sitting at 100%. It wasn't until I unchecked the "Download instantly" box and ran it again that it redirected and showed me the download link as expected. The download link downloaded the file in Chrome, FF but not Safari (Just opened file in browser - had to ctrl-click to download.)

4) I would guess that the file not downloading issue is happening because of some Safari settings, but when I used the 8.x-1.0-beta1 of the plugin I *did* get an automatic file download in Safari so... hmmm... weird.

@mahfuznz re: applying the views_data_export_batch_xlsx_fix.patch patch. It doesn't look you're applying it correctly. The file should be in the root of the module directory (ie. modules/contrib/views_data_export ) and you should be executing the patch command from the same place

$ cd modules/contrib/views_data_export 
$ patch -p1 < views_data_export_batch_xlsx_fix.patch 
patching file src/Plugin/views/display/DataExport.php
Hunk #2 succeeded at 564 (offset 2 lines).
Hunk #3 succeeded at 582 (offset 2 lines).
mhmd’s picture

I applied the patches against dev version and configured private folder and every things works fine with drupal 8.5.8.

I downloaded flies more than 10K and 15K rows in xlsx format.

mahfuznz’s picture

Thanks everyone for your helpful comments. I have a new issue at the moment. The Batch CSV download works for around 600 records. But after importing 2600 records in Drupal Content when I was trying to download all of them, the page is showing Gateway Timeout. Please note that I have about 30 columns for the content and two fields are linked with another content types where there are multiple fields. I had to create relationships to show fields from that linked content and expose them for search. So when I try to download, there will be lots of processing going on. What should I do to prevent the initial timeout so that the batch operation gets started?

Is there any other approach to my scenario of searching and downloading contents as CSV and XLS? I will have about 85000 contents and they will also be linked with two other content types. Any alternative efficient approach suggestion will be much appreciated.

Many thanks.

rovo’s picture

I applied patch #179 2789531-178.patch , and was able to successfully export +55k record CSVs at 1k per batch.

Thank you!

drunken monkey’s picture

The problem regarding Search API views mentioned in #172 should be fixed by #3024753: The Views query plugin should override query(). Please see there.

ressa’s picture

I have tested the #3024753: The Views query plugin should override query() patch by @drunken monkey and can confirm that it makes Search API Solr support batch export.

Dev versions of these modules:

composer require 'drupal/search_api:1.x-dev'
composer require 'drupal/views_data_export:1.x-dev'

Patches in composer.json:

"patches": {
    "drupal/search_api": {
        "Fix batch export": "https://www.drupal.org/files/issues/2019-01-09/3024753-2--query_plugin_query_method.patch"
    },
    "drupal/views_data_export": {
        "patch #170": "https://www.drupal.org/files/issues/2018-11-19/2789531-170.patch",
        "patch #168": "https://www.drupal.org/files/issues/2018-11-08/views_data_export_batch_xlsx_fix.patch"
    }
}

In settings.php:
$settings['file_private_path'] = '../private';

Set "Default download method" to "Private local files served by Drupal." under admin/config/media/file-system, and clear cache.

flocondetoile’s picture

Using patch 178 in #179.

I can not edit the path settings in views because this errors

Notice: Undefined index: automatic_download in /srv/www/ceremaweb/web/modules/contrib/views_data_export/src/Plugin/views/display/DataExport.php

because

$form['automatic_download'] = [
          '#type' => 'checkbox',
          '#title' => $this->t("Download instantly"),
          '#description' => $this->t("Check this if you want to download the file instantly after being created. Otherwise you will be redirected to above Redirect path containing the download link."),
          '#default_value' => $this->options['automatic_download'],
        ];

Using $this->getOption('automatic_download')

instead of $this->options['automatic_download'] fix the issue

There is the same issue with the other option redirect_path

Anyone has got this issue ?

mhmd’s picture

Apply the patches against dev version of the module

Steven Spasbo’s picture

drunken monkey’s picture

We hit another problem regarding use with Search API, and unfortunately it’s a bit of a doozy.

The problem is that Search API-based views can not only be filtered using Views-internal data (which are correctly handled by this patch), but also using facets, which Views doesn’t know about. The data from those comes from an additional GET parameter (f by default) which won’t be present during the batch operation, resulting in the view missing the facet filter and thus exporting different results than shown. (One oddity here is that the initial batch generation will still have the proper GET parameters and, thus, facet filters, so the number of batch iterations gets set correctly for the filtered number of results. I.e., if the filtered view has 17 results, the unfiltered one more than 20 and the batch size is 10, you’ll get the first 20 results of the unfiltered view in the export.)

The attached patch (to be applied on top of #192) shows a very crude workaround that resolved this problem for me, but I don’t really have an idea how to resolve this properly. Does anyone else have a clue? Basically, we’d need to get the GET parameter from the initial page into the batch process callback.
One relatively clean way to implement this might be an additional setting for preserving certain GET parameters (say, a text field where you can add as many as you want) using something like the mechanism in my patch. Not very user-friendly, but any other solution would probably have to be facets-centric, and I don’t think that’s an option? (Providing a solution in the Facets module by altering the batch operation could of course work, but would be a lot trickier to implement, and also add code to a generic module for just a very specific use case – basically same as providing a facets-specific solution in this module, just the other way round, plus separating the code into two modules.)

Anyways, just wanted to put this out there in case someone can think of a solution, or if the general problem (not preserving GET parameters) might affect other use cases as well. If it’s just a problem of faceted Search API views then this should of course not block the issue, and I’d be happy to create a follow-up once this has been committed. But any feedback would be appreciated!

(Work paid for by https://jura.ku.dk/icourts/.)

kiwad’s picture

Took the patch from #126 but modified 2 things :

I'm using the module XLS serialization and PhpSpreadsheet Library so I got errors Class not found for
'PHPExcel_Writer_Excel2007'
and
'PHPExcel_IOFactory'

per PhpSpreadsheet,

'PHPExcel_Writer_Excel2007' => \PhpOffice\PhpSpreadsheet\Writer\Xlsx::class,
'PHPExcel_IOFactory' => \PhpOffice\PhpSpreadsheet\IOFactory::class,

so I added
use PhpOffice\PhpSpreadsheet\IOFactory;
use PhpOffice\PhpSpreadsheet\Writer\Xlsx;

and replaced PHPExcel classes

Also might worth noting that I' using XLS serialization with patch from #4
#2942178: Batch support implementation

ressa’s picture

@steven.spasbo's re-rolled patch in #192 works perfectly, enabling batch export in CSV-format. Is it time to commit the patch to dev, or do we need some more confirmations? Or perhaps tests and upgrade path still needs to be done, like @andypost mentions in #167?

I can also confirm that @drunken monkey's patch in #193 allows batch-based views data export to work with Search API-based views using facets.

simplyshipley’s picture

This is my first contribution back to Drupal. I hope I didn't do more harm than good, but I've been working on this issue for a few days. I started with patch #192 and made changes from there. This patch was created from the dev version of the module before #192 was applied to my last commit so we don't need to apply multiple patches. I'm not sure how most people like to work in the issue queue and I can recreate the patch if I need to, just let me know what I need to do. Now on to my updates.

I was able to get the Batch export functionality to work with csv and xlxs including filtering with facets and automatically downloading the file using javascript after being redirected to the chosen URL. I also added a new section on the views display configuration form to select the facet data source to use for filtering the exported data.

New configuration options in the view display:

Path Settings Form Updates

  • Allow the export file to be stored in the public files directory
  • Allow query string to be added to the redirect url
  • Allow user to choose redirect from a list of displays on the current view or add custom redirect path

New Facet Source Settings Section

  • Allow selecting facet source to filter data export.
simplyshipley’s picture

I rebuilt patch #196. It failed to apply because I used the --no-prefix flag in git when creating the patch. Nothing else changed between #196 and #197

artematem’s picture

artematem’s picture

artematem’s picture

maxdev’s picture

Hi @artematem
After appliying your latest patch #202 it creates unnecessary files in root of the drupal:

new file:   tests/modules/views_data_export_test/test_views/views.view.views_data_test_1.yml
new file:   tests/modules/views_data_export_test/test_views/views.view.views_data_test_2.yml
new file:   tests/modules/views_data_export_test/test_views/views.view.views_data_test_3.yml
maxdev’s picture

When trying to batch export in XLS/XLSX it throws error:

ResponseText: Error: Class 'PHPExcel_IOFactory' not found in Drupal\views_data_export\Plugin\views\display\DataExport::processBatch() (line 547 of /data/www/drupalpoc/project/modules/contrib/views_data_export/src/Plugin/views/display/DataExport.php).
ressa’s picture

Just a tip: If you want to pin a dev-version of a module on a specific commit, use something like this in your composer.json file:

"drupal/views_data_export": "1.x-dev#717234f"

717234f refers to the commit from 11 Jan 2019 at 19:45 listed on https://www.drupal.org/node/980666/commits which works with patches #192 and #193.

simplyshipley’s picture

Here's patch #197 updated to work with the new dev release on March 11 2019 commit #05bd75f

simplyshipley’s picture

I removed the tests since they are failing since I added a module dependency to xls_serialization. I've tried a couple times to update the tests, but it keeps failing the d.o test. The tests pass locally, but I'm not sure how to fix the issue. I will continue to work at fixing the tests and add them back when I figure out the problem. This is the patch #207 with the tests removed.

andypost’s picture

@simplyshipley when you add dependencies that should go to composer require dev and test_dependencies into module info file

andypost’s picture

@simplyshipley also please add interdiff - it's hard to follow changes

icjur’s picture

Thanks @simplyshipley, your patch in #208 works great with the latest dev-release, and I can export in CSV, XML and JSON-format. Lots of great additions, it's a great feature that there is an option to present the download link on the Views page itself, as well as having the option of preserving the original search, by keeping the query string.

It also works fine with Search API-based views, as mentioned in #193, which:

... can not only be filtered using Views-internal data (which are correctly handled by this patch), but also using facets, which Views doesn’t know about. The data from those comes from an additional GET parameter (f by default) which won’t be present during the batch operation, resulting in the view missing the facet filter and thus exporting different results than shown.

I have quickly tested the Facet settings > Facet source, which I couldn't get working, I kept getting ten results, when there were more ... I will do some more experimenting.

The CSS for the button seems to be missing, but it works if I add the original values in views_data_export.css:

.views-data-export-feed {
  display: inline;
}

.json-feed .feed-icon,.xml-feed .feed-icon,.csv-feed .feed-icon,.xls-feed .feed-icon,.xlsx-feed .feed-icon {
  background: no-repeat;
  overflow: hidden;
  text-indent: -9999px;
  display: block;
  width: 36px;
}

.json-feed .feed-icon {
  background-image: url(/modules/contrib/views_data_export/images/json.png);
}

.xml-feed .feed-icon {
  background-image: url(/modules/contrib/views_data_export/images/xml.png);
}

.csv-feed .feed-icon {
  background-image: url(/modules/contrib/views_data_export/images/csv.png);
}

.xls-feed .feed-icon {
  background-image: url(/modules/contrib/views_data_export/images/xls.png);
}

.xlsx-feed .feed-icon {
  background-image: url(/modules/contrib/views_data_export/images/xlsx.png);
  width: 43px;
}
YurkinPark’s picture

Tests are fixed, but i think i'll add more tests here

jibran’s picture

HERE are some review points.

  1. +++ b/src/Plugin/views/display/DataExport.php
    @@ -33,12 +36,104 @@ class DataExport extends RestExport {
    +    $count_query_results = $count_query->query(true)->execute();
    

    s/true/TRUE

  2. +++ b/src/Plugin/views/display/DataExport.php
    @@ -33,12 +36,104 @@ class DataExport extends RestExport {
    +    if ($count_query_results instanceof \Drupal\search_api\Query\ResultSetInterface) {
    

    We might need to check search api is installed.

  3. +++ b/src/Plugin/views/display/DataExport.php
    @@ -33,12 +36,104 @@ class DataExport extends RestExport {
    +      'title' => t('Exporting data...'),
    ...
    +      'progress_message' => t('@percentage% complete. Time elapsed: @elapsed'),
    

    $this->t

  4. +++ b/src/Plugin/views/display/DataExport.php
    @@ -197,12 +399,265 @@ class DataExport extends RestExport {
    +      $current_user = \Drupal::currentUser();
    ...
    +      $timestamp = \Drupal::time()->getRequestTime();
    +      $filename = \Drupal::token()->replace($view->getDisplay()->options['filename'], array('view' => $view));
    ...
    +      $vdeFileRealPath = \Drupal::service('file_system')->realpath($context['sandbox']['vde_file']);
    ...
    +      \Drupal::logger('views_data_export')->error($message);
    

    These services should be injected.

  5. +++ b/src/Plugin/views/display/DataExport.php
    @@ -197,12 +399,265 @@ class DataExport extends RestExport {
    +      $message = t('Could not write to temporary output file for result export (@file). Check permissions.', ['@file' => $context['sandbox']['vde_file']]);
    

    $this->t

  6. +++ b/src/Plugin/views/display/DataExport.php
    @@ -197,12 +399,265 @@ class DataExport extends RestExport {
    +      $headers = \Drupal::moduleHandler()->invokeAll('file_download', [$results['vde_file']]);
    ...
    +        drupal_set_message(t('Export complete. Download the file <a href=":download_url">here</a>.', [':download_url' => $url]));
    ...
    +      drupal_set_message(t('Export failed. Make sure the private file system is configured and check the error log.'), 'error');
    

    drupal_set_message is deperecated so let's inject the new service also module handler.

andypost’s picture

Status: Needs work » Needs review
FileSize
7.17 KB
43.08 KB

Agree that it needs more tests, specifically for workarounds with xls and so on

Fixed a code a bit for feedback of #213

1 done
2 no need since php 5.1 or 5.2 - autoloader will throw no error
3 fixed
4 batch in static method - no way to inject
5 fixed
6 fixed & see 4

Meanwhile only logger should provide detailed information about error - no reason to expose it to enduser, and here should be 500 error as core's rest doing

YurkinPark’s picture

FileSize
46.57 KB
4.84 KB

Fixed some part of code, added tests (for SQL views only for now), also, i didn't check how this works with search_api results.

Status: Needs review » Needs work

The last submitted patch, 215: 2789531-215.patch, failed testing. View results
- codesniffer_fixes.patch Interdiff of automated coding standards fixes only.

jansete’s picture

Patch should be use https://github.com/PHPOffice/PhpSpreadsheet instead https://github.com/PHPOffice/PHPExcel right?

In other hand, I think that you should be configure file destination path, I think that is very usefull can configure to temporary stream wrapper and avoid save permanently and have to configure private files.

simplyshipley’s picture

Is there a reason my patch #208 was skipped after icjur commented that it worked well in #211? I'm just curious if there's something I need to do differently or add to the patch. The patch adds functionality to export data filtered with facets, fixes issues with storing private files, the automatic download and cleaned up some of the coding standards.

Please let me know how I should go about getting some of these things added into the patch that others are using.

simplyshipley’s picture

jibran’s picture

There are a lot of out of scope changes going on in the issue. Now on top of that, we have two patches. @jhedstrom I'd suggest descoping the issue and addressing rest of it in followups.

jhedstrom’s picture

Issue tags: -Needs tests

@jibran agreed. Let's get an MVP in and then do follow-ups.

simplyshipley’s picture

andypost’s picture

Any reason to duplicate testing dependencies?

simplyshipley’s picture

@andypost This is the first time working with phpunit tests and I don't know how to configure the test correctly. I didn't build the tests, but it started failing after I added the xls_dependency to the module in my patch. I'm doing something wrong b/c the tests pass locally. Do you know how to make sure the dependencies are loaded for the tests or how I can replicate the test on drupal.org? This is really the first time I've committed anything to a module and I don't want to keep doing something wrong and cause people to get frustrated.

andypost’s picture

  1. +++ b/src/Plugin/views/display/DataExport.php
    @@ -7,8 +7,14 @@ use Drupal\Core\Cache\CacheableResponse;
    +use PhpOffice\PhpSpreadsheet\IOFactory;
    +use PhpOffice\PhpSpreadsheet\Writer\Xlsx;
    
    @@ -84,6 +232,50 @@ class DataExport extends RestExport {
    +      'category' => 'export_settings',
    
    @@ -211,4 +620,280 @@ class DataExport extends RestExport {
    +    // Workaround for XLS/XLSX.
    +    if ($context['sandbox']['progress'] != 0 && ($output_format == 'xls' || $output_format == 'xlsx')) {
    +      $vdeFileRealPath = \Drupal::service('file_system')->realpath($context['sandbox']['vde_file']);
    +      $previousExcel = IOFactory::load($vdeFileRealPath);
    

    this dependencies should be runtime, it means when no xls support enabled it should not be displayed and processed

  2. +++ b/tests/modules/views_data_export_test/views_data_export_test.info.yml
    @@ -2,7 +2,7 @@ name: 'Views data export test views'
     dependencies:
    +  - drupal:views
       - views_data_export:views_data_export
    

    the parent module already require views as dependency (added in this patch) so no reason to duplicate it here

  3. +++ b/views_data_export.info.yml
    @@ -5,4 +5,8 @@ package: 'Views'
    +  - xls_serialization:xls_serialization
    +test_dependencies:
    +  - xls_serialization:xls_serialization
    

    I mean that xls_serialization:xls_serialization should be only test_dependencies - no reason to require it

ressa’s picture

Thanks for persevering @simplyshipley, it can be complicated to make the test bot happy :-)

The patch from #208 works fine, but during drush config-import for both #219 and #222 patches, I get this error, possibly due to xls_serialization:xls_serialization being required?

[error]Drupal\Core\Config\ConfigImporterException: There were errors validating the config synchronization.
Unable to install the <em class="placeholder">Views Data Export</em> module since it requires the <em class="placeholder"></em> module. in Drupal\Core\Config\ConfigImporter->validate() (line 737 of /app/web/core/lib/Drupal/Core/Config/ConfigImporter.php). [0.17 sec, 34.17 MB]

In ConfigImportCommands.php line 259:
[Exception]
The import failed due to the following reasons:
Unable to install the <em class="placeholder">Views Data Export</em> module since it requires the <em class="placeholder"></em> module.
ressa’s picture

Thanks @andypost. I patched with #222, removed xls_serialization:xls_serialization as a dependency from views_data_export.info.yml, and the patch works, giving no errors during drush config-import.

Diff with #222 patch:

+++ b/views_data_export.info.yml
@@ -7,6 +7,5 @@ dependencies:
   - drupal:rest
   - drupal:views
   - csv_serialization:csv_serialization
-  - xls_serialization:xls_serialization
 test_dependencies:
   - xls_serialization:xls_serialization
YurkinPark’s picture

@simplyshipley i've skipped your patch because you have removed tests, you have not provided interdiff, i was thinking there was tests removing only. I've applied my changes to your patch with tests we had earlier.

YurkinPark’s picture

added search_api to tests

YurkinPark’s picture

added test dependencies to composer

Status: Needs review » Needs work

The last submitted patch, 230: 2789531-230.patch, failed testing. View results
- codesniffer_fixes.patch Interdiff of automated coding standards fixes only.

ressa’s picture

Status: Needs work » Needs review

Thanks for adding tests @YurkinPark, your latest patch in #230 works fine in terms of functionality. The download "CSV" button is still not showing though:

diff --git a/css/views_data_export.css b/css/views_data_export.css
index ce8d063..1c09434 100644
--- a/css/views_data_export.css
+++ b/css/views_data_export.css
@@ -8,19 +8,3 @@
   display: block;
   width: 36px;
 }
-.json-feed .feed-icon {
-  background-image: url(../images/json.png);
-}
-.xml-feed .feed-icon {
-  background-image: url(../images/xml.png);
-}
-.csv-feed .feed-icon {
-  background-image: url(../images/csv.png);
-}
-.xls-feed .feed-icon {
-  background-image: url(../images/xls.png);
-}
-.xlsx-feed .feed-icon {
-  background-image: url(../images/xlsx.png);
-  width: 43px;
-}
ressa’s picture

Status: Needs review » Needs work
YurkinPark’s picture

Status: Needs work » Needs review
FileSize
74.08 KB
30.98 KB

@ressa i've restired css in this patch. One more fix for composer.

Status: Needs review » Needs work

The last submitted patch, 234: 2789531-233.patch, failed testing. View results
- codesniffer_fixes.patch Interdiff of automated coding standards fixes only.

ressa’s picture

Thanks @YurkinPark, the download button is back :-)

anpolimus’s picture

Guys thank you for your efforts.
I've applied your last version of patch and see an error with PhpSpreadsheet dependency.
I think we have to add dependency to this library at composer.json

icjur’s picture

It seems like anonymous user can't export with the latest patch, even with these settings:

Data export: Access options
Role
x Anonymous user
x Authenticated user

If I try to export with admin it works fine, but with anonymous user I just get redirected to the front page. I see nothing in the logs, can I somehow use for example Devel to debug this?

YurkinPark’s picture

Anonymous issue can be related to "store_in_public_file_directory" setting. If file is stored in private directory it is not possible to let anonymous download it.

icjur’s picture

Thanks for the suggestion @YurkinPark, but I use public file directory, admin/config/media/file-system:

Default download method
x Public local files served by the webserver.

The Views Data Export Path settings is:
Access: Permission | View published content

The results (for admin) are saved in sites/default/files/views_data_export/content_index_data_export_1/1-1554716721 as expected.

Also, I don't even get the progress bar, it goes straight to the front page after I click the export as CSV-button as anonymous, which suggests to me that public/private file directory isn't the issue.

icjur’s picture

I just tried to create a really simple view, and the anonymous user still couldn't export. But if I change Export method from Batch to Standard it works fine, and the file is downloaded.

icjur’s picture

I tried to build a fresh version today, and everything seems to work fine again, so the anonymous user can export views data. Sorry for the noise.

YurkinPark’s picture

Status: Needs work » Needs review
FileSize
74.16 KB
1.68 KB

Fixed tests

icjur’s picture

I was able to reproduce the problem in #238 by downgrading Drupal core from 8.6.14 to 8.6.13. To verify, I upgraded back to 8.6.14, which fixed it. It might have been related to #3045349: Lazy started sessions with session data are not saved with symfony/http-foundation 3.4.24.

andypost’s picture

Status: Needs review » Needs work
  1. +++ b/src/Plugin/views/display/DataExport.php
    @@ -211,4 +619,280 @@ class DataExport extends RestExport {
    +    elseif (file_put_contents($context['sandbox']['vde_file'], $string, FILE_APPEND) === FALSE) {
    +      // Write to output file failed - log in logger and in ResponseText on
    +      // batch execution page user will end up on if write to file fails.
    +      $message = t('Could not write to temporary output file for result export (@file). Check permissions.', ['@file' => $context['sandbox']['vde_file']]);
    +      \Drupal::logger('views_data_export')->error($message);
    +      throw new ServiceUnavailableHttpException(NULL, $message);
    

    I see no test for this case, not clear how batch will consume 503
    Logger should get non-translatable message because it translates on displaying

    I think this case needs coverage cos "disk full" state needs better processing

  2. +++ b/tests/src/Functional/ViewsDataExportBatchTest.php
    @@ -0,0 +1,138 @@
    +use Drupal\dblog\Logger\DbLog;
    

    unused

  3. +++ b/tests/src/Functional/ViewsDataExportBatchTest.php
    @@ -0,0 +1,138 @@
    +    $this->assertSession()->pageTextContainsOnce('automatically downloaded');
    +//    $this->assertEquals(10, count($res2), 'Count of exported nodes is wrong.');
    

    unused code

  4. +++ b/tests/src/Kernel/Plugin/views/style/DataExportTest.php
    @@ -45,7 +46,7 @@ class DataExportTest extends ViewsKernelTestBase {
    -    /** @var \Drupal\views\ViewEntityInterface $view */
    +    /* @var \Drupal\views\ViewEntityInterface $view */
    

    wrong change

  5. +++ b/views_data_export.module
    @@ -18,3 +26,32 @@ function views_data_export_theme() {
    +function views_data_export_views_query_alter(ViewExecutable $view, QueryPluginBase $query) {
    +  // Get facet source id from view display and alter search query.
    +  $facet_source = $view->getDisplay()->getOption('facet_settings');
    +  if (isset($facet_source) && $facet_source !== 'None') {
    +    $search_query = $query->getSearchApiQuery();
    +    $facet_manager = \Drupal::service('facets.manager');
    +    $facet_manager->alterQuery($search_query, $facet_source);
    

    should explain *why* facets needs special alter

ivelin.enchev’s picture

I was getting a "Error: Call to a member function getRoutedDisplay() on null in Drupal\views\ViewExecutable->getUrl()"
when batch exporting an xlsx using the patch from #243
This one is the same as #243 but I've changed "$redirect_to_display !== 'none'" to "$redirect_to_display !== 'None'"

YurkinPark’s picture

Status: Needs work » Needs review

@icjur i've tested dev version of module with patch #243 and with drupal 8.6.13 - everything works fine.
@ivelin.enchev could you attach interdiff please?

Status: Needs review » Needs work

The last submitted patch, 246: views_data_export-batch_support-2789531-246-D8.patch, failed testing. View results
- codesniffer_fixes.patch Interdiff of automated coding standards fixes only.

YurkinPark’s picture

@ivelin.enchev i've checked string you have edited, possibly, xlsx issue you caught is not related to views_data_export module working (or this module), this is related to 'xls_serialization' module and this issue https://www.drupal.org/project/xls_serialization/issues/2911784 . Shortly, xlsx encode needs php-zip extension to be installed.

streger’s picture

@YurkinPark, @ivelin.enchev Thank you, I can confirm that after your patch #246 the export of xls/xlsx works and that it didn't work with #243.

YurkinPark’s picture

@ivelin.enchev i've checked code line you have modified but i preferred to change initial default setting (It was capitalised)
@andypost i've fixed all your points except first, still have no idea how to make this test

YurkinPark’s picture

Status: Needs work » Needs review
ressa’s picture

Thanks @YurkinPark, your patch in #251 works fine with the latest dev-release. Exporting in xml-format also work.

claudiu.cristea’s picture

FileSize
73.6 KB
1.07 KB
  1. +++ b/src/Plugin/views/display/DataExport.php
    @@ -75,6 +197,31 @@ class DataExport extends RestExport {
    +    if (\Drupal::service('module_handler')->moduleExists('facets')) {
    ...
         return $options;
    
    @@ -100,21 +291,40 @@ class DataExport extends RestExport {
    +    if (\Drupal::service('module_handler')->moduleExists('facets')) {
    
    @@ -157,6 +494,35 @@ class DataExport extends RestExport {
    +            if (\Drupal::service('module_handler')->moduleExists('facets')) {
    

    As we are checking for Facets module several times in the same class let's create a protected method isFacetsModuleEnabled() and statically cache the response in a protected property:

    protected $facetsModuleEnabled;
    ...
    protected function isFacetsModuleEnabled() {
      if (!isset($this->facetsModuleEnabled) {
        $this->facetsModuleEnabled = \Drupal::service('module_handler')->moduleExists('facets');
      }
      return $this->facetsModuleEnabled;
    }
    
  2. +++ b/src/Plugin/views/display/DataExport.php
    @@ -100,21 +291,40 @@ class DataExport extends RestExport {
    +      $categories['facet_settings'] = [
    ...
    +      $options['facet_settings'] = [
    
    @@ -157,6 +494,35 @@ class DataExport extends RestExport {
    +        $view = $form_state->getStorage()['view'];
    +        if (isset($view->get('storage')->getDependencies()['module'])) {
    +          $view_module_dependencies = $view->get('storage')->getDependencies()['module'];
    +          if (in_array('search_api', $view_module_dependencies)) {
    

    We should do this check also in DataExport::optionsSummary(). I have a view which doesn't depend on Search API but still show the "Facet settings" category. More, I think we should combine this check with this Facets module existence and move it in a protected method so that we can reuse it. I'm also questioning here the check itself. Just looking for search_api module dependency doesn't seem enough. Shouldn't we check for the provider of some plugin? I don't know exactly which plugin but I feel that the dependency could be added there for other reasons. I guess that we can check the view's query plugin if is search_api_query or if extends SearchApiQuery.

  3. +++ b/views_data_export.module
    @@ -18,3 +26,38 @@ function views_data_export_theme() {
    +  if (isset($facet_source) && $facet_source !== 'None') {
    

    Should be lowercased s/None/none. See DataExport::defineOptions() (this is already fixed in this patch). And probably we'll need to do the same check as in the plugin (the view is search_api and facets in enabled). I'm thinking that the check it can go in a service or at least a helper class.

tamarpe’s picture

Thanks! works great.
I had issues of :

Recoverable fatal error: Argument 5 passed to Drupal\views_data_export\Plugin\views\display\DataExport::processBatch() must be an instance of Drupal\views_data_export\Plugin\views\display\int, integer given in Drupal\views_data_export\Plugin\views\display\DataExport::processBatch() 
Recoverable fatal error: Argument 7 passed to Drupal\views_data_export\Plugin\views\display\DataExport::processBatch() must be an instance of Drupal\views_data_export\Plugin\views\display\string, string given in Drupal\views_data_export\Plugin\views\display\DataExport::processBatch()

Fixed

tamarpe’s picture

FileSize
1.48 KB
73.9 KB

On Patch #254 Also had:
Fatal error: Cannot use isset() on the result of a function call (you can use "null !== func()" instead) in views_data_export/src/Plugin/views/display/DataExport.php on line 502

Also removed unused variable $metadata

tim-diels’s picture

Working very good. To me this can be set to RTBC...

malcolm_p’s picture

1. I feel that we should not be storing these exports in the public file directory by default.

$options['store_in_public_file_directory']['default'] = TRUE;

I think a nudge in the direction of the most secure default option is much safer.

2. If the file directory isn't writeable I get the below error from finishBatch:

The website encountered an unexpected error. Please try again later.</br></br><em class="placeholder">InvalidArgumentException</em>: Cannot redirect to an empty URL. in <em class="placeholder">Symfony\Component\HttpFoundation\RedirectResponse-&gt;setTargetUrl()</em> (line <em class="placeholder">86</em> of <em class="placeholder">/var/www/asf8/vendor/symfony/http-foundation/RedirectResponse.php</em>). <pre class="backtrace">Symfony\Component\HttpFoundation\RedirectResponse-&gt;__construct(NULL) (Line: 850)
Drupal\views_data_export\Plugin\views\display\DataExport::finishBatch(1, Array, Array, &#039;6 sec&#039;) (Line: 454)
_batch_finished() (Line: 98)
_batch_page(Object) (Line: 55)
Drupal\system\Controller\BatchController-&gt;batchPage(Object)
call_user_func_array(Array, Array) (Line: 123)
Drupal\Core\EventSubscriber\EarlyRenderingControllerWrapperSubscriber-&gt;Drupal\Core\EventSubscriber\{closure}() (Line: 582)
Drupal\Core\Render\Renderer-&gt;executeInRenderContext(Object, Object) (Line: 124)
Drupal\Core\EventSubscriber\EarlyRenderingControllerWrapperSubscriber-&gt;wrapControllerExecutionInRenderContext(Array, Array) (Line: 97)
Drupal\Core\EventSubscriber\EarlyRenderingControllerWrapperSubscriber-&gt;Drupal\Core\EventSubscriber\{closure}() (Line: 151)
Symfony\Component\HttpKernel\HttpKernel-&gt;handleRaw(Object, 1) (Line: 68)
Symfony\Component\HttpKernel\HttpKernel-&gt;handle(Object, 1, 1) (Line: 57)
Drupal\Core\StackMiddleware\Session-&gt;handle(Object, 1, 1) (Line: 47)
Drupal\Core\StackMiddleware\KernelPreHandle-&gt;handle(Object, 1, 1) (Line: 106)
Drupal\page_cache\StackMiddleware\PageCache-&gt;pass(Object, 1, 1) (Line: 85)
Drupal\page_cache\StackMiddleware\PageCache-&gt;handle(Object, 1, 1) (Line: 47)
Drupal\Core\StackMiddleware\ReverseProxyMiddleware-&gt;handle(Object, 1, 1) (Line: 52)
Drupal\Core\StackMiddleware\NegotiationMiddleware-&gt;handle(Object, 1, 1) (Line: 23)
Stack\StackedHttpKernel-&gt;handle(Object, 1, 1) (Line: 693)
Drupal\Core\DrupalKernel-&gt;handle(Object) (Line: 19)
</pre>

Instead, an error message and logged exception would be preferable and the redirect_url should always be passed to finishBatch.

simplyshipley’s picture

I updated the processBatch method to add the redirect_url before attempting to create the file and added a try/catch around the code that writes the file. This should catch the error and redirect instead of crashing if the filesystem is not writeable.

I also worked on making the export more efficient by using microtime measurements placed throughout the processBatch method to locate bottlenecks.

I'm not sure if there are reasons we need to render the entire view or if there is a better way to render only the display, but rendering the entire view is a major bottleneck. My CSV data export was configured to batch 3000 items at 15 columns wide pulling from over 300k documents in Solr. Rendering the entire view would take around 5 seconds while rendering only the display would take less than 1 second per batch iteration.

Times were measured on each side of the method calls below. Measuring only the process time in each single line of code
(src/Plugin/views/display/DataExport.php)
Around 5 seconds per iteration
$view->render()

Around 1 second per iteration
$view->display_handler->render()

I also found another bottleneck in the data export render method (src/Plugin/views/style/DataExport.php). When using the Symfony serializer the data runs through a normalizer which increases processing time per iteration by approximately 300% (0.6 seconds vs 2.4 seconds) for this portion of the batch export. It seems that if you are using non-nested data and exporting a CSV we could skip the normalizer and just encode the data. I added a configuration option at the bottom of the CSV settings form that allows making this change to the views display through the Views UI.

(src/Plugin/views/style/DataExport.php)

    if ($format === 'csv' && $this->options['csv_settings']['use_serializer_encode_only'] == 1) {
      return $this->serializer->encode($rows, $format, ['views_style_plugin' => $this]);
    }
    else {
      return $this->serializer->serialize($rows, $format, ['views_style_plugin' => $this]);
    }

I'm not sure if this is the best way to implement these changes, but these bottlenecks do exist and make a dramatic difference when exporting large datasets. These changes allow me to export around 100k rows per minute to CSV format on my local machine.

I also found a major issue with how we process xls/xlsx file exports. Currently we juggle 2 xls/xlsx files during this type of export. One file is the previously stored data and the other is the current batch iteration. This works, but is not efficient and basically becomes unusable as the exported data count increases. The previously stored data file increases in size with each iteration which causes the load time to increase for this file. The PhpOffice\PhpSpreadsheet library doesn't allow appending data to a spreadsheet so on each iteration the previously stored data file must be loaded so we can move the cursor to the highest row and write the new rows (current batch iteration) in a loop. I tested the processing time and loading the previously stored data file takes a few seconds to load in the beginning and steadily increases with each batch iteration. At on point in my test it was taking over 20 seconds to load the previously stored data file on each batch iteration and i was testing with less than 100k rows.

I'm not sure the best way to fix the xls/xlsx issue, so if anyone has any ideas please share them and I'll try to test them when I have time. I was thinking I might try storing the data as a CSV and using the PhpOffice\PhpSpreadsheet library to convert the file to xls/xlsx. This would allow us to append the data to a temporary CSV file and do the heavy conversion to xls/xlsx on the last iteration of the batch. This should be much more efficient, but would be limited to non-nested data and not use the xls_serializer module. It would probably be a new configuration option in the xls/xlsx settings form to skip the xls_serialization module serializer.

Please share your thoughts and ideas.

simplyshipley’s picture

malcolm_p’s picture

#259 fixes the error from the filesystem issues in the batch I encountered.

JeroenT’s picture

This patch probably still needs an upgrade path as stated in #167.

maelcorm’s picture

Hi,
Thank you very much for this patch. It works and it's very useful for the project that I'm working on.
However, I'm concerned about the performances.
I have currently more than 8000 nodes to export to a csv format. Each node has 45 fields.
It takes approximately 4 minutes to get the CSV file.
The total number of nodes is going to increase.
I have concerns about the time to generate this file in the future.
Any thoughts about how I could improve the export process for a large amount of data?
Thanks

jlscott’s picture

The patch from #259 is working for me, but I wanted to have token replacement available in the custom redirect path, so i have added that feature. New patch file and interdiff attached.

scoff’s picture

Should views pre_render/post_render etc hooks work in batch mode? I'm rewriting some hard-coded XML code (i.e. to ) in post_render and it works in standard mode, but not in batch. Any other recommended solution maybe?

scoff’s picture

This code in src/Plugin/views/display/DataExport.php

    // Rendering the display_handler is much more efficient than rendering
    // the entire view.
     $rendered_rows = $view->display_handler->render();

Replacing $view->display_handler->render() with $view->render() solves problems with hook_views_post_render()/hook_views_pre_render()/theme_views_post_render() (see $view->render() here)

I'm not sure how less efficient it is but it's more consistent.
What do you think?

UPD: Exporting 20K nodes to XML takes ~2 minutes ±6 seconds regardless of $view->display_handler->render() or $view->render()

Yago Elias’s picture

Hi jlscott, do you had any problem with "automatic download" when addind a custom redirect to any page?

Yago Elias’s picture

Looks like that

'#attached' => [
        'library' => [
          'views_data_export/views_data_export',
        ],
      ],

On style/DataExport.php only attach js/views_data_export_auto_download.js on my display views page, but not on a custom page. Had to add this libr on my custom form, where I want to download specifically a form.

spheresh’s picture

Rebasing #264 patch to be able to apply to the latest module version.

darrenwh’s picture

I'm currently using the patch from #264 with this hash 05bd75f39d15f214728e40600aed6fbdffc8295e of the dev branch, I'm doing a batch user export and including the user role column.

The role is showing on the preview on the view config page but when the CSV is exported the column is missing.

I'm exporting about 9605 records and using Drupal 8.7.1 with php 7.1

Gnanasampandan Velmurgan’s picture

The patch #264 works for me, Also I got the token replacement feature available in custom redirect path

malcolm_p’s picture

#269 is also working for me as well after updating to beta3.

I do still feel that keeping exports in private files instead of public would be a better default.
$options['store_in_public_file_directory']['default'] = TRUE;

mikran’s picture

patch did not apply to latest version, rerolled again

Ben25’s picture

#273 doesn't seem to work for me.

  - Installing drupal/views_data_export (1.0.0-beta3): Loading from cache
  - Applying patches for drupal/views_data_export
    https://www.drupal.org/files/issues/2019-10-02/views-data-export-support-for-batch-operations-2789531-273.patch (Support for batch operations)
   Could not apply patch! Skipping. The error was: Cannot apply patch https://www.drupal.org/files/issues/2019-10-02/views-data-export-support-for-batch-operations-2789531-273.patch
mikran’s picture

Yeah, that is actually the problem I have too.

patch in comment 269 applies to 8.x-1.0-beta3 from git

my reroll in comment 273 applies to latest commit from 8.x-1.x

the problem with composer is that the drupal.org packaging modifies info.yml as does this patch and these changes conflict.

I do not know what is the correct way to solve this.

mikran’s picture

ok to get forward with the immediate problem I created a temporary patch that just is the patch from comment 269 but without any changes to .info.yml files.

I'm reposting the real patch too to ensure that it is on top of the list still.

alex.skrypnyk’s picture

Doing the same as #276 did, but for the 3.0-beta3 version: re-rolling #269 without changes to the .info.yml files. Also, reduced the number of lines in the diff as per recommendations in https://www.drupal.org/node/1054616 (otherwise - the patch would not apply due to changes in views_data_export.info.yml):

Are you trying to patch a file that's modified by the Drupal.org packaging script? (i.e. mymodule.info.yml)
You may run into an issue with the patch working fine on code direct from the repo, but failing "in real life" on module code on an actual site.
This can happen if there are contextual/line-number discrepancies due to the Drupal.org packaging script.
If you find yourself in this situation, you can...
Create the patch with fewer lines of context than the default 3 lines of context:
git diff -U1 origin/8.x-1.x > mypatchfile.patch

alex.skrypnyk’s picture

FileSize
361 bytes

Added interdiff

mstrelan’s picture

Status: Needs review » Needs work

I'd have to agree with malcolm_p in #272, defaulting to private file system. Particularly if you already have an export established on an older version of this module, then switch the method to batch without clicking in to the path settings, you won't see the option.

I wonder if it would actually make more sense for all the new settings to live under the Method section, since they only apply to the Batch method and aren't really related to the path.

If you choose to use the private file system but the private file system is not configured the batch process will fail. We should probably check this is configured before presenting the export button, and show a message somewhere.

It would also be nice to skip the batch screen if the result set was smaller than the batch size, but not sure how possible that is.

Nitpicking now but I think "Download immediately" reads better than "Download instantly". A code search on the project I'm working on found various uses of the word "immediately" but none of the word "instantly".

mstrelan’s picture

The other issue I have is that while the temporary files are deleted on cron runs, the unique uid-timestamp directories are not. Perhaps we should be using a FileDownloadController similar to the core config module. Then the files could be saved in the filesystem with unique filenames but delivered with the filename specified in the view, and thus no need for the extra directories that hang around. This would also allow files to be created in temporary://, although I know that is an issue for load-balanced apps.

mstrelan’s picture

Sorry, one more issue. The javascript to auto download the file is not attached if the redirect is to another page, or if the data export is not "attached", e.g. if you were to build your own link rather than use the feed icons. I guess it needs to be added on every page, unless there is a way to attach it to \Drupal::messenger()->addMessage(). I guess if the message was a render array?

ultrabob’s picture

I've applied patch #277, and have set it up to export in batches of 50. After about 11 minutes the page still said initializing. Finally, I got this message showing up:

1% complete. Time elapsed: 11 min 28 sec

I'm trying to export 6408 nodes with 23 fields. This test is happening on a local lando install, so of course it is slower than normal, but this still seems extremely slow. Does anyone have some advice as to what further information I could give to contribute to making this solution more universally useful?

As of as long as it took me to write this message later, I'm still at 1% with the same timestamp.

ultrabob’s picture

As I continue working on the site where I've needed to deploy #277, I find that in cases where the fields would normally be output by the site theme templates, putting the export into batch mode, switches the theme to the admin theme.

hitesh.koli’s picture

#277 works for me for the issue with patch not getting applied.

selwynpolit’s picture

#277 applies great and adds the options for batching. When I go to the view output, I see the batch happening and then the csv file is never downloaded.. Ah and I find the output file in sites/default/files
Nice.
One interesting problem. I am doing a CSV export of files from my site and included the "entity label" for the files. While they render fine on screen, they never appear in the csv (or json) file. I can't imagine why that should be related in any way to batching the csv export but I figured I'd report it in case someone could figure it out.

ultrabob’s picture

@selwynpolit #285 could be related to #283

darrenwh’s picture

Hi,
as referenced in https://www.drupal.org/project/views_data_export/issues/2789531#comment-...

I have done a vanilla build with a docksal boilerplate

Used the latest dev and patch.

I created a couple of new roles and associated some test users with these roles

I created a new view off the admin/structure/views/view/user_admin_people view, added a batch exporting to a csv file.

The exported view does not show the export roles column

On the preview it looks like this:

Using the stable version of this module the role field shows

darrenwh’s picture

darrenwh’s picture

Issue summary: View changes

Removed image added in error

darrenwh’s picture

Looking more into #287 this looks as if it may be something to do with permissions, I'm wondering if there is a specific way to build the view when in a batch?

darrenwh’s picture

#277

+++ b/src/Plugin/views/display/DataExport.php
@@ -34,9 +41,132 @@ class DataExport extends RestExport {
+  protected static function buildBatch(ViewExecutable &$view, array $args) {

Objects don't need to be passed by reference

ayalon’s picture

I had the same issue as darrenwh. All my search api fields are empty in batch mode.

I was able to fix it, its just a one-line change but it took me a whole day to figure out.

For performance reasons, in batch-mode the author tries to only render the display_handler. This does not work because several important hooks are not called if you do not use $views->render() and therefor a lot of fields are empty.

As soon as we use $view->render() the fields are no more empty.

--- src/Plugin/views/display/DataExport.php	(date 1573221349644)
+++ src/Plugin/views/display/DataExport.php	(date 1573221349644)
@@ -780,9 +780,9 @@
       return;
     }
 
-    // Rendering the display_handler is much more efficient than rendering
-    // the entire view.
-    $rendered_rows = $view->display_handler->render();
+    // We have to render the whole view to get all hooks executes.
+    // Only rendering the display handler would result in many empty fields.
+    $rendered_rows = $view->render();
     $string = (string) $rendered_rows['#markup'];
 
     // Workaround for CSV headers, remove the first line.
ayalon’s picture

ayalon’s picture

Here is an updated patch:

ayalon’s picture

And here is a patch, that is applied correctly.

YurkinPark’s picture

Status: Needs work » Needs review
jansete’s picture

Hi guys,

It looks very nice.

In next phases it cuold be interesting add in view config an option to redirect to preview page (page before batch).

darrenwh’s picture

Status: Needs review » Reviewed & tested by the community

I've been using this for the last few weeks without issue so happy to RTBC

mbatterton’s picture

The latest patch works well for me but needs to be added inline with https://www.drupal.org/project/views_data_export/issues/2887450

maartendeblock’s picture

I can confirm path #295 works on my installation.

darrenwh’s picture

Whats holding this back from being merged in? I see the note:

In next phases it could be interesting add in view config an option to redirect to preview page (page before batch).

Assume this will be added off the back of this?

tjmoyer’s picture

I implemented patch #295, which worked great. But then in the last couple of days I'm seeing "No active batch" when trying to export that view. It seems to be a caching issue as if I add something like "?x=1" to the end of the URL it exports without an issue.

Anyone else run into this issue? Any ideas how to handle this?

malcolm_p’s picture

#295 is working for me as well.

The attached patch makes minor changes suggested in #279 :

  • Sets default file storage to private instead of public for a more secure default.
  • Changes text from 'instantly' to 'immediately' for consistency.
prerit_mohan_bhatnagar’s picture

Patch #303 works great however Drupal is returning the old file every time post first time download. The exposed filters filter out the data correctly but when we download previously generated file is downloaded and not the new one. Any one facing this issue?

mstrelan’s picture

Do the files have a unique filename? Try appending a timestamp to the generated filename. Most likely cached by your web server and/or browser.

prerit_mohan_bhatnagar’s picture

Yeah they do have time stamp added to the filename. Also this is not happening on my DEV AWS EC2 setup. All other environments are returning previously generated file.

attheshow’s picture

Using patch #295, when trying to download the generated file Chrome says "csv.html Failed - Forbidden". The URL looks like this: /sites/default/files/views_data_export/intercept_room_reservations_report_data_export_1/6-1579289492/.csv

Any ideas what I might be doing wrong?

YurkinPark’s picture

Hi everyone, attaching patch for beta3 version (because patch candidate can not be applied to beta3)

  • jhedstrom committed 86d57bd on 8.x-1.x
    Issue #2789531 by YurkinPark, simplyshipley, artematem, Berdir, _Archy_...
jhedstrom’s picture

Status: Reviewed & tested by the community » Fixed

I've committed this and am tagging a new release. Thanks for all the help and feedback on this epic issue!

scanasgarzo’s picture

I'm using latest 8.x-1.0-beta4 to export 2000~ users using a batch export of 10 items and I face the same issue explained on #307.

The export creates a new directory inside sites/default/files/views_data_export/users_csv_data_export_1 but this directory remains empty. The CSV file is nowhere. I have also temporally set 777 permissions and the problem persists.

If I go back to Standard export method the export works fine. However, with this method I can only export 1500 users before it reaches PHP timelimit. This is the reason we need batch export.

P.D I hadn't set the private file directory of Drupal. Now I've set it but the problem persists.

Given this issue has already many comments and it has been fixed, I created a new one: #3111879: Batch operations doesn't create the export file on file system

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.

matthieu_collet’s picture

The patches don't work with 8.x-1.3
Does anyone has the same problem ?