Webform is a great module and I use it on several customer sites. One customer has a site with a form for registering for a competition. By the end of the registration period there were 74000 entries.

When I try to download all 74000 results at once I always get this PHP error message:
Fatal error: Allowed memory size of 1073741824 bytes exhausted (tried to allocate 72 bytes) in .../sites/all/modules/webform/includes/webform.submissions.inc on line 664

As you can see, I have already given PHP 1024M of memory, which should not be the normal value for a Drupal site.

As of Version 6.x-3.13 I can download a specific range of submissions, thank you very much for that ;-))

The form collects quite a lot of data: for 10000 entries the resulting text file is about 4MB. So 40000 results come to about 16MB in the text file.

As my customer blames me for this behaviour I should mark this as a bug report. But I think the module is working fine, so I am marking it as a feature request. I would really like to see a version optimized for very large numbers of results. Maybe it's also a good idea to zip the results before downloading them.

Comments

quicksketch’s picture

Title: Download many Results: Fatal error: Allowed memory size of » Use BatchAPI to Export very large data sets to CSV/Excel

I think you're correct that Webform should be able to export very large datasets, so I'm changing the title to reflect what would be needed to actually get this working. Most of the time when you have an expensive process (or really any process that can have an essentially limitless amount of data) Drupal will use the BatchAPI to split the expensive process across multiple requests. In order to do this, we should open a temporary file on the disk and write, say, 1000 submissions at a time to it per request.
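The chunked approach described above can be sketched in a language-neutral way. The following Python is only an illustration of the idea, not Drupal's Batch API and not Webform code: `fetch_submissions`, `export_step`, `run_export`, and the `state` dictionary are hypothetical names, and the data source is faked. Each call to `export_step` stands in for one request that appends up to 1000 submissions to a temporary file and records how far it got.

```python
import csv
import os
import tempfile

CHUNK_SIZE = 1000  # submissions written per request, as suggested above


def fetch_submissions(offset, limit):
    """Hypothetical data source standing in for a paged database query."""
    total = 2500  # pretend the form has 2,500 submissions
    end = min(offset + limit, total)
    return [(sid, f"entrant{sid}@example.com") for sid in range(offset + 1, end + 1)]


def export_step(state):
    """Process one chunk and update `state`; meant to run once per request."""
    rows = fetch_submissions(state["offset"], CHUNK_SIZE)
    # Append to the temporary file so memory use is bounded by one chunk.
    with open(state["path"], "a", newline="") as handle:
        csv.writer(handle).writerows(rows)
    state["offset"] += len(rows)
    # A short (or empty) chunk means there is nothing left to export.
    state["finished"] = len(rows) < CHUNK_SIZE
    return state


def run_export():
    """Drive the steps in a loop; a batch system would spread them over requests."""
    fd, path = tempfile.mkstemp(suffix=".csv")
    os.close(fd)
    state = {"path": path, "offset": 0, "finished": False}
    steps = 0
    while not state["finished"]:
        state = export_step(state)
        steps += 1
    return state, steps
```

Because all progress lives in `state` and in the file on disk, no single step needs to hold more than one chunk in memory, which is the property the full export was missing.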

chiefkong’s picture

Thanks for your immediate reply.

I think this is a great idea and I'm looking forward to seeing this in a new version in the future.

quicksketch’s picture

I'd like to incorporate some of #1276098: Add Drush command for bulk export into this patch, which takes the approach we want, other than it only works from the command line. But it does use batches and a temporary file, so it could be a start.

Owen Barton’s picture

Note that #1276098: Add Drush command for bulk export should have the guts of the refactoring needed to implement this (everything except the drush commandfile) - I would suggest committing this first, since even without adding bulk export it cleans up some very long functions. Once that is in, it should be easy enough to write a Batch API wrapper that sets up and then successively calls the export functions, in the same way the drush command does. Once this part is done we can replace the drush code with a Batch API implementation (which Drush has support for) that will do exactly the same thing in slightly fewer lines.

quicksketch’s picture

I would suggest committing this first, since even without adding bulk export it cleans up some very long functions.

Could you clarify what you meant by "this" here? You mean we should commit #1276098: Add Drush command for bulk export first? Or you mean we should do this issue's task of using BatchAPI for normal exports first? Neither issue contains any "cleanup of very large functions" (this issue obviously doesn't even have a patch), though I hope that will be a result of this issue.

Owen Barton’s picture

Sorry - I forgot I had already separated this out: the patch I was referring to (that reworks the export code so that it can be used in batch calls) is at #1275468: Improve webform.report.inc abstraction to support drush/batch processing.

I think the above patch (or something very similar) needs to go in before the Batch API can sensibly be worked on.

The Drush command could go in right away (and later be refactored to use Batch API), or it could be put on hold until Batch API is ready, and then refactored to use that.

davidseth’s picture

Hey @Owen Barton & @quicksketch,

Where do things stand on committing the drush inc file found at http://drupal.org/node/1276098#comment-4993764?

I have a client who has 170k records in one comp and we are having a *very* tough time getting the results out. So drush is a good first step, but Batch API is the way to go.

How can I contribute?

Cheers,

David

quicksketch’s picture

Hi @davidseth! Glad you're taking an interest in this issue. Unfortunately I haven't looked into this enough to provide an exact path to completion. The status of things is basically that #1275468: Improve webform.report.inc abstraction to support drush/batch processing needs to be updated and working on both the D6 and D7 versions. It should also be done in a way that lets this task (BatchAPI) use the same functions. Ideally, it'd be demonstrated that BatchAPI and Drush could use the same set of callback functions to accomplish similar goals, so we wouldn't need to update our implementation when adding each version. I know that's asking a lot, but I know neither Drush nor BatchAPI well enough to provide much guidance unless I were to undertake the problem myself. I will probably get to it eventually, but this isn't high on my priority list.

davidseth’s picture

@quicksketch, working on a re-roll for http://drupal.org/node/1275468. Will keep you up-to-date.

Owen Barton’s picture

@quicksketch Drush has Batch API support, so it would be trivial to have Drush use exactly the same implementation code. The existing Drush code still works on a batch basis (just not Batch API), so it would be a reasonable starting point for the Batch API implementation - the setup code could be used as is, and would just formulate the set of webform_results_download_rows calls needed in advance, rather than running the while loop itself.

davidseth’s picture

Patch posted in original issue: http://drupal.org/node/1275468#comment-5830280

Please have a look and let's get this into Webform. Thanks.

ahilts’s picture

Hi all,

I'm working on a different approach to this issue: https://github.com/andrewhilts/Webform-Stats. At my organization, we run a large contest system, and would like to generate reports on the number of entrants, uniques, and opt-ins to our mailing list over a given time interval. We would also like reports from different nodes to be comparable with one another.

One solution is to create "statistical groupings" of webforms. Similar reporting metrics will be applied to each node in the grouping, at the same interval. This will allow easy comparison across time.

To get around the php timeouts, the module I'm working on chunks the reports into smaller queries (one query per metric per node), and uses the drupal_queue system to generate them at a scheduled time (say, 2:00AM), on cron. Subsequent cron runs will continue to work through the queue until the time window has closed, or the queue has been completed.
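To make the queue-on-cron idea concrete, here is a rough Python sketch. It is not the Webform-Stats module and does not use drupal_queue; `build_queue`, `run_cron`, the metric names, and the injectable `clock` parameter are all assumptions made for illustration. It shows one item per metric per node, and a cron run that stops when either the queue is empty or its time budget is used up, leaving the remainder for the next run.

```python
import time
from collections import deque


def build_queue(node_ids, metrics):
    """One queue item per metric per node, mirroring the chunking described above."""
    return deque((nid, metric) for nid in node_ids for metric in metrics)


def run_cron(queue, results, budget_seconds, work, clock=time.monotonic):
    """Process items until the queue is empty or the time budget is spent."""
    deadline = clock() + budget_seconds
    processed = 0
    while queue and clock() < deadline:
        nid, metric = queue.popleft()
        # `work` stands in for the small per-metric, per-node query.
        results[(nid, metric)] = work(nid, metric)
        processed += 1
    return processed
```

Items left in the queue after one call are simply picked up by the next call, so no single run has to finish the whole report.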

There is still much work to be done on this module (Drupal 6 only right now, it's just me contributing atm, and see the issue queue), but I'd like anyone to have a look at its code and progress and let me know what you think of the approach. Of course, if you'd like to contribute, that's even better! I'm happy to respond to any questions or comments.

Thanks!

quicksketch’s picture

I committed #1275468: Improve webform.report.inc abstraction to support drush/batch processing earlier today. I'm going to take a stab at converting our submit handler to use the new APIs in a batch process.

quicksketch’s picture

Status: Active » Needs review
FileSize
24.93 KB

Here's a patch that makes the UI use batch processes and also converts our fledgling Drush support to use the same batch code. Overall I think this is a great enhancement. It immediately gets the user off the download page and into a batch operation, so there's less double-clicking/waiting while the export is generated. This patch makes it clear that the UI enhancements should have come before the drush integration (since we removed all the drush batch handling that was just added in #1275468: Improve webform.report.inc abstraction to support drush/batch processing). In any case, it's now done, and we've got batch processing in both Drush and the UI. Nice.

quicksketch’s picture

Version: 6.x-3.14 » 7.x-4.0-alpha8

This patch makes API changes, and it's a new feature, so it's going into 4.x only.

quicksketch’s picture

Status: Needs review » Fixed

Pushed up to 7.x-4.x.

quicksketch’s picture

I made a followup to clean up some minor issues with exporting at #2042045: Tidy up Batch Exports and Exporting UI.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

torotil’s picture

Version: 7.x-4.0-alpha8 » 7.x-3.x-dev
FileSize
5.28 KB

Here is a patch for webform-7.x-3.x.

torotil’s picture

This patch also solves output buffering issues with very large files (relative to the memory_limit) and adds a Content-Length HTTP header.
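As a general illustration of those two points, and not the code in the patch itself, the Python sketch below reads a finished export file in fixed-size chunks instead of loading it whole, and derives Content-Length from the file size on disk. The function names, the `text/csv` content type, and the 64 KB chunk size are assumptions chosen for the example.

```python
import os
import tempfile


def download_headers(path, filename):
    """Headers for a file download; Content-Length comes from the size on disk."""
    return {
        "Content-Type": "text/csv",
        "Content-Disposition": f'attachment; filename="{filename}"',
        "Content-Length": str(os.path.getsize(path)),
    }


def stream_file(path, chunk_size=64 * 1024):
    """Yield the file in fixed-size chunks so it is never held in memory whole."""
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            yield chunk
```

A caller would send the headers first and then write each yielded chunk to the response, so peak memory depends on the chunk size rather than on the size of the export.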

torotil’s picture

Here is a new patch for webform-7.x-3.x. It fixes some paging issues.

Nikita Petrov’s picture

Status: Closed (fixed) » Needs review

Why does this issue have the status 'fixed'? The problem still exists for all versions of the webform module, and the code from the latest patch is not in the repo. I applied the patch and it works for me. Please review it and commit it to the module. Thanks.

The last submitted patch, 14: webform_export_batch-1327186.patch, failed testing.

torotil’s picture

Status: Needs review » Closed (won't fix)

That's easily explained. While I'm currently still maintainer for webform 7.x-3.x, I decided not to do a full backport of this functionality from 7.x-4.x. While this patch worked well for me, it is still experimental.

At the moment I'm in the process of migrating all my remaining webform3 sites to 7.x-4.x. That means that, if no one else steps up, 7.x-3.x will be unmaintained again within the next 2 weeks, perhaps earlier than that.

So as far as I am concerned this is a won't fix for 7.x-3.x.

torotil’s picture

Version: 7.x-3.x-dev » 7.x-4.x-dev
Status: Closed (won't fix) » Closed (fixed)

… or rather it's fixed in 7.x-4.x.

Nikita Petrov’s picture

Ok, thank you for the explanation!