Problem/Motivation

Field labels in exports and the configuration screen are missing characters as basic as ä, ö and å.
This is because the labels are run through filter_var(), which, according to the code comment, is supposed to strip characters not allowed in keys of associative arrays. However, I have not found any place in the code that actually uses the label values as array keys.

Additionally, I traced the addition of the filter_var() call back to https://www.drupal.org/node/2853480 where it was not given any attention in the review process, so unfortunately I cannot tell if it really is a necessary step. However, the filtering makes the export functionality less intuitive as e.g. Finnish speakers need to decipher words such as "Ik" (Ikä, age) or "mraika" (määräaika, time target).
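
The stripping can be reproduced in isolation. The following is a minimal sketch, assuming the labels are UTF-8 encoded. It uses FILTER_UNSAFE_RAW as a stand-in filter, because this summary names only the flags and not the filter the module passes, and the helper name strip_label() is hypothetical. FILTER_FLAG_NO_ENCODE_QUOTES is left out since it has no bearing on the letters being dropped.

```php
<?php

/**
 * Hypothetical helper mimicking the label filtering described above.
 *
 * Assumption: FILTER_UNSAFE_RAW stands in for whatever filter the module uses.
 */
function strip_label(string $label): string {
  // FILTER_FLAG_STRIP_HIGH drops every byte above 127, which removes both
  // bytes of a UTF-8 encoded "ä". FILTER_FLAG_STRIP_LOW drops bytes below 32.
  return (string) filter_var(
    $label,
    FILTER_UNSAFE_RAW,
    FILTER_FLAG_STRIP_HIGH | FILTER_FLAG_STRIP_LOW
  );
}

// "\u{E4}" is "ä"; the escape keeps the example independent of file encoding.
$labels = ["Ik\u{E4}", "m\u{E4}\u{E4}r\u{E4}aika"];

foreach ($labels as $label) {
  echo $label, ' => ', strip_label($label), PHP_EOL;
}
```

Under these assumptions the two labels come out as "Ik" and "mraika", matching the examples above.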

Proposed resolution

Remove some or all of the filtering. At least remove the filters (currently FILTER_FLAG_STRIP_HIGH | FILTER_FLAG_STRIP_LOW | FILTER_FLAG_NO_ENCODE_QUOTES) which strip away the "special" letters, to make the export configuration screen and the export itself more useful for non-English speakers.

Remaining tasks

  1. Discuss the necessity of running the labels through these filters in the first place.
  2. Decide on filters to remove, if not all.
  3. Write tests.
  4. Write the patch.
  5. Review the patch.

User interface changes

The labels will hopefully look the same on the export screen as well as in the export CSV as they do in the field settings.

API changes

Have not detected any as of yet. The Drupal\contact_storage_export\ContactStorageExportService::getLabels method will return different values for the field labels but that should not affect any code at least in this module.

Data model changes

None.

Release notes snippet

TBD.

Comments

kekkis created an issue. See original summary.

scott_euser’s picture

Title: Consider removing filtering of field labes values » Consider removing filtering of field labels values
Assigned: Unassigned » scott_euser
Related issues: +#2854080: Provide an option to suppress CSV headers, +#2854087: Option for CSV Encoder to skip preparing and outputting the header.

Thanks for your message. This is definitely something we should sort out; as your examples show, it's clear this is not ideal. I will start with the reasoning for why this is the case, but the solution may be a bit more complicated and may need some related issues in the csv_serialization module.

So in csv_serialization, the header is extracted from the keys of the first row:

  protected function extractHeaders($data, array $context = array()) {
    $headers = [];
    if (!empty($data)) {
      $first_row = $data[0];
      $allowed_headers = array_keys($first_row);

That code is here in GitLab: https://git.drupalcode.org/project/csv_serialization/blob/8.x-1.x/src/Encoder/CsvEncoder.php#L182.
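
To illustrate why the label text ends up in the header row, here is a simplified sketch of that idea. It is not the csv_serialization implementation: the function name sketch_extract_headers() and the sample rows are hypothetical, and the rows are assumed to be keyed by the already filtered field labels.

```php
<?php

/**
 * Simplified stand-in for the header extraction quoted above.
 *
 * Hypothetical sketch only; the real logic lives in CsvEncoder.
 */
function sketch_extract_headers(array $data): array {
  if (empty($data)) {
    return [];
  }
  // The keys of the first row become the CSV header row.
  $first_row = reset($data);
  return array_keys($first_row);
}

// Assumed shape: each row is keyed by the (filtered) field label.
$rows = [
  ['Name' => 'Anna', 'Ik' => 34],
  ['Name' => 'Mikko', 'Ik' => 41],
];

$headers = sketch_extract_headers($rows);
echo implode(',', $headers), PHP_EOL;
```

With rows shaped like this, whatever string is used as the row key is exactly what appears as the column header.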

This answer on Stack Overflow explains the restrictions PHP puts on array keys:
https://stackoverflow.com/a/10696097

So in order to use a wider range of characters, we would need to find a way to pass our own headers to CSV Serialization.

A related issue aims to suppress the CSV headers, and the underlying problem is essentially the same (i.e. the limitation in CSV Serialization):
https://www.drupal.org/node/2854080

At a glance, it seems I have looked into this in the past and it appears CSV serialization has progressed:
https://www.drupal.org/project/csv_serialization/issues/2854087

The problem is that it is not in the default release, only in their dev branch and the 2x beta release. I would be happy to add some sort of check to see if the option is available, or to provide it as a patch for the case where you are sure you are using 1x-dev or 2x-beta.

Patch to come shortly with a first shot at this.

scott_euser’s picture

Title: Consider removing filtering of field labels values » Suppress default CSV Serialization Headers and instead use of rich UTF8 Headers
Assigned: scott_euser » Unassigned
Category: Bug report » Feature request
Status: Active » Needs review
Issue tags: +Needs tests

This will eventually need tests, but it is at least ready for an initial review now.

scott_euser’s picture

In order for this patch to work, you will need to be on CSV Serialization 1x-dev or 2x-beta or newer since the default release does not contain the ability to suppress the output of headers.

scott_euser’s picture

Updated with a check that setSettings in CSV Serialization's encoder service is public (in the default version it is protected).
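
For reviewers, a visibility check of this kind could look roughly like the sketch below. It is not taken from the patch: the two encoder classes are hypothetical stand-ins for the protected and public variants of setSettings, and can_set_settings() and the settings key are made-up names for illustration.

```php
<?php

// Hypothetical stand-in for an encoder whose setSettings() is protected.
class ProtectedSettingsEncoder {
  protected function setSettings(array $settings) {}
}

// Hypothetical stand-in for an encoder whose setSettings() is public.
class PublicSettingsEncoder {
  public $settings = [];

  public function setSettings(array $settings) {
    $this->settings = $settings;
  }
}

/**
 * Returns TRUE only if setSettings() exists and is publicly callable.
 */
function can_set_settings(object $encoder): bool {
  return method_exists($encoder, 'setSettings')
    && (new ReflectionMethod($encoder, 'setSettings'))->isPublic();
}

foreach ([new ProtectedSettingsEncoder(), new PublicSettingsEncoder()] as $encoder) {
  if (can_set_settings($encoder)) {
    // Placeholder key; the real option name is not given in this thread.
    $encoder->setSettings(['hypothetical_header_option' => FALSE]);
    echo get_class($encoder), ': custom headers possible', PHP_EOL;
  }
  else {
    echo get_class($encoder), ': falling back to default headers', PHP_EOL;
  }
}
```

Checking via reflection avoids a fatal error from calling a protected method, so the export can fall back to the existing behaviour on older CSV Serialization releases.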

scott_euser’s picture

Status: Needs review » Postponed

Pending CSV Serialization change from 2x beta to stable (or new release of >= 8.x-1.5)