Hi,

I have a view set up with Views Data Export attached to export a CSV of the filtered rows. The number of rows in the export matches the view, but the export contains duplicates and some rows are missing.

I've turned batch export off and everything works ok.

Cheers Dan

Comments

danharper created an issue. See original summary.

Nishruu’s picture

I have the same problem, but turning off batch export isn't an option because we need to be able to export large datasets (~8000 rows).
The problem appears when I export more than a certain number of rows (it's difficult to know exactly how many, maybe 2000).
As for @danharper, the number of rows exported is correct, but some rows replace others.

I can reproduce the problem with xlsx, xls and csv.

Nishruu’s picture

We solved our problem: the only sort criterion for our view was the users' creation date.
This date was imported via Migrate and was identical for a lot of users, so each query generated by VDE retrieved the users in an order that was not reliable.
For example, batch no. 150, retrieving the list of users with an offset of 500 and sorted only by creation date, could very well return the same ID as batch no. 151.

We needed to add a second sort criterion on a unique ID (here the uid) to make the batched export work.
So I don't know about @danharper, but for us it was a MySQL technicality and not a bug.
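
To illustrate the mechanism described above, here is a minimal Python sketch. It is not VDE code: the `USERS` data, `run_query` and `batched_export` are hypothetical names, and the "database" is a deliberately adversarial model that flips the order of tied rows on every other query, which a real database is allowed to do but will not necessarily do.

```python
from typing import Callable, List, Tuple

Row = Tuple[int, str]  # (uid, created date)

# Hypothetical data: nine users imported with the same creation date.
USERS: List[Row] = [(uid, "2015-01-01") for uid in range(1, 10)]


def run_query(query_no: int, sort_key: Callable[[Row], tuple],
              limit: int, offset: int) -> List[Row]:
    """Model one batch query: ORDER BY sort_key LIMIT limit OFFSET offset.

    Rows that compare equal under sort_key may come back in any order, so
    this model deliberately flips the order of ties on every other query.
    """
    rows = USERS if query_no % 2 == 0 else list(reversed(USERS))
    ordered = sorted(rows, key=sort_key)  # stable sort keeps the tie order
    return ordered[offset:offset + limit]


def batched_export(sort_key: Callable[[Row], tuple],
                   batch_size: int = 5) -> List[int]:
    """Collect the uids of every batch, one query per batch."""
    exported: List[int] = []
    for query_no, offset in enumerate(range(0, len(USERS), batch_size)):
        batch = run_query(query_no, sort_key, batch_size, offset)
        exported.extend(uid for uid, _ in batch)
    return exported


# Sorted by creation date only: ties are resolved differently per query.
by_date_only = batched_export(lambda row: (row[1],))
# Sorted by creation date, then uid: the order is fully determined.
by_date_then_uid = batched_export(lambda row: (row[1], row[0]))

print(by_date_only)      # [1, 2, 3, 4, 5, 4, 3, 2, 1]
print(by_date_then_uid)  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Both exports contain nine rows, but the first one repeats uids 1 to 4 and never includes uids 6 to 9, which matches the symptom reported in this issue: the row count is right while some rows replace others.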

leontin’s picture

Indeed #3 helped me fix the view, thanks.

danharper’s picture

@Nishruu did you add the sort order to the VDE and the view it is attached to?

Cheers Dan

thomasmurphy’s picture

Duplication issues were also raised in the original batch support thread but were not addressed:
https://www.drupal.org/project/views_data_export/issues/2789531#comment-...

I'm also experiencing duplications (5-10%) which don't alter the total number of results, i.e. results are being overwritten by duplicates in the export. I tried switching the view sort order from a user created date to a UUID sort, which stopped the duplicates appearing in the export, but this is just a workaround for a bug which needs attention.

eloivaque’s picture

I have the same problem; I applied the solution from #3 and it works for me.

I think the problem is caused by the sort order of the view combined with the number of rows per batch iteration. In my case, I have one view of a content type that references 9 paragraphs. This generates 9 rows:

Content 1 Paragraph 1
Content 1 Paragraph 2
Content 1 Paragraph 3
Content 1 Paragraph 4
Content 1 Paragraph 5
Content 1 Paragraph 6
Content 1 Paragraph 7
Content 1 Paragraph 8
Content 1 Paragraph 9

The batch is set to 5 rows per iteration,
and the view is sorted by the creation date of the content.

I think that when the batch fetches 5 rows with its SQL query, it returns this:

Content 1 Paragraph 2
Content 1 Paragraph 4
Content 1 Paragraph 3
Content 1 Paragraph 8
Content 1 Paragraph 6

Then the next SQL query returns 5 more rows, ordered by the creation date of the content, and this generates duplicate paragraph rows and drops other paragraphs:

Content 1 Paragraph 2
Content 1 Paragraph 4
Content 1 Paragraph 3
Content 1 Paragraph 8
Content 1 Paragraph 6
Content 1 Paragraph 1
Content 1 Paragraph 2
Content 1 Paragraph 4
Content 1 Paragraph 9

I solved it by also sorting the view by the paragraph ID, which is a unique value. It works for me.

renatoheeb@gmail.com’s picture

Same story here: an additional sort criterion solved the issue, see #3.
The bug is nasty and hard to grasp: the export works and even the number of records matches. You have to drill into your data to see the problem.

This bug really does need more love. Also, I was wondering in the first place why we need to batch an export of only 10k rows anyway ...
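
Since a matching row count hides the problem, one way to drill into the data is to compare the IDs in the exported file against the IDs you expect. The following is a rough sketch only: `audit_export`, the `nid` column name and the sample data are all hypothetical, and it assumes the export includes a column holding a unique ID.

```python
import csv
import io
from collections import Counter
from typing import Iterable, List, Tuple


def audit_export(csv_text: str, id_column: str,
                 expected_ids: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (duplicated ids, missing ids) for an exported CSV."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter(row[id_column] for row in reader)
    duplicated = sorted(i for i, n in counts.items() if n > 1)
    missing = sorted(set(expected_ids) - set(counts))
    return duplicated, missing


# Hypothetical export: four rows as expected, but nid 2 appears twice
# and nid 3 is gone.
sample = "nid,title\n1,A\n2,B\n2,B\n4,D\n"
duplicated, missing = audit_export(sample, "nid", ["1", "2", "3", "4"])
print(duplicated)  # ['2']
print(missing)     # ['3']
```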

hktang’s picture

We have encountered the same issue, and #3 fixes our export.

We had to add two IDs, the Content ID and the Author ID, for the export to work correctly.

DamienMcKenna’s picture

We were seeing this problem with an export too, only our export was losing hundreds of records. The table was sorted by creation date, and the problem was that migrated data from a number of years back shared the same creation date, so the pagination query did not return a consistent result set from one page to the next. I added an extra sort field for the node ID, placed after the primary sort (the previously mentioned creation date), and now it works properly.

It might be worth adding some official documentation about this, maybe even add something in the display plugin's form, as I suspect there might be lots of sites having problems with this without even knowing it.

DamienMcKenna’s picture

Title: Duplicates with batch export csv. » Inconsistent results when doing batch exports

Retitling the issue as it's a general problem of inconsistent data, rather than just duplicate records.

vishal.kadam’s picture

I've identified the source of the batch export limit problem. It has to do with the sequence of query results.

To keep the sequence of query results from changing between batches, add a unique field to the sort.
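
As a sketch of what that looks like at the query level, the example below pages through a table with `LIMIT`/`OFFSET` and sorts by the creation date followed by a unique ID. It uses Python's built-in SQLite purely as a stand-in; the `users` table, its columns and `export_in_batches` are hypothetical and are not the queries VDE actually generates. SQLite may well return tied rows in a consistent order even without the extra sort column, so this shows the shape of the fix rather than reproducing the failure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (uid INTEGER PRIMARY KEY, created TEXT)")
# Hypothetical data: twenty users that all share one creation date.
conn.executemany("INSERT INTO users (uid, created) VALUES (?, ?)",
                 [(uid, "2015-01-01") for uid in range(1, 21)])


def export_in_batches(order_by: str, batch_size: int = 5) -> list:
    """Fetch all uids one batch at a time, as a batched export would.

    order_by is interpolated into the SQL, so it must be a trusted,
    hard-coded string in this sketch.
    """
    total = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
    uids = []
    for offset in range(0, total, batch_size):
        rows = conn.execute(
            f"SELECT uid FROM users ORDER BY {order_by} LIMIT ? OFFSET ?",
            (batch_size, offset)).fetchall()
        uids.extend(uid for (uid,) in rows)
    return uids


# The unique uid breaks ties on created, so every batch sees the same order.
exported = export_in_batches("created, uid")
print(exported == list(range(1, 21)))  # True
```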