When using a CSV file with a Feeds importer, what is the syntax for a field that allows multiple values, i.e. a 1-to-many relationship, like node references and user references?

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

alex_b’s picture

Title: For mulitple value fields (1-to-many relationships) like node reference, what is the syntax for CSV files? » CSV: Support one to many relationships
Version: 6.x-1.0-alpha12 » 6.x-1.x-dev
Component: Documentation » Code
Category: support » feature

That's not supported at the moment. What you're looking for is support for a nest operation that transforms the CSV file's flat format to nested arrays.

alex_b’s picture

One to many relationships in CSV are properly captured in first normal form:

name,phone_number
mark,2023151234
mark,2033334155
mark,3153152120

The CSV parser currently transforms this into an array that looks like this:

array(
  array(
    'name' => 'mark',
    'phone_number' => '2023151234',
  ),
  array(
    'name' => 'mark',
    'phone_number' => '2033334155',
  ),
  array(
    'name' => 'mark',
    'phone_number' => '3153152120',
  ),
);

What we need though is this:

array(
  array(
    'name' => 'mark',
    'phone_number' => array(
      '2023151234',
      '2033334155',
      '3153152120',
    ),
  ),
);

The way we can achieve this is to allow for specifying unique fields on the parser. The parser would then attempt to nest rows where the same field value occurs twice. This would have to work with multiple unique fields, consider a CSV file with three fields:

name,location,phone_number

Now we specify name and location as unique:

name,location

This should cause CSV parser to nest all entries (i. e. the phone_number entry) under location and all location entries under name.

I've not thought this all the way through. e. g. open question is how mappers would consume a multi level nested arrays.

Anonymous’s picture

Thanks for the feedback Alex.

One small problem I see with this though. How will the Parser (or Mapper?) know the difference between an new node and a multiple field entry. For example:

0:  name,phone_number
1:  mark,2023151234
2a: mark,2033334155
2b: mark,2033334127

or, if this is in a file:

0: name,phone_number
3: mark,2033335543

Line1: New node with some data
Line2a: Same node, new data set, please overwrite if it already exists
Line2b: Same node, additional entry for multiple value phone_number field
Line 3: New, *additional data*, for already existing node in DB. Please *add* the phone_number data to any existing node, do not delete any current phone_number relationships.

How do we distinguish between 2a and 2b and 3?

clemens.tolboom’s picture

I have solved this for multiple locations by 'defining' a multiple value separator.

Multiple locations.country codes AM\nRM\nRU\nUA

For this i only needed to patch the location mapper but it would be better to patch the CSV parser. This way it would be easy to define a multifield separator through the CSV parser settings.

Or maybe through the field mapper? That sounds not better but is an option.

What do you think about this?

alex_b’s picture

#3: by specifying one or more group-by fields (in our example that would be 'name'). Records would be grouped by the fields specified.

#4: I agree that we should support one-to-many relationships on the parser level. My current state of thinking about this problem is captured in #2. I'd like to avoid supporting sub-seperators that effectively make one field to multiple fields (I think this is what you suggest). However, splitting such fields is really simple and can be done easily on a per-use case basis.

clemens.tolboom’s picture

However, splitting such fields is really simple and can be done easily on a per-use case basis.

How easy would that be? My case now is having a csv with multiple terms and locations. taxonomy fails. Location i patched for now with explode("\n", $value)

Have the parser allow field splits could be a quick win. But risky too ... having to check for the special split value. So I guess grouping would work better.

alex_b’s picture

Have the parser allow field splits could be a quick win. But risky too ... having to check for the special split value. So I guess grouping would work better.

Agreed. I meant "easy" as in easy for a one-off like you did that for Location. Splitting out from within fields in a generalized way is actually hard to do.

clemens.tolboom’s picture

I'm still puzzling how to spit out a csv from say three tables node (2 nodes), taxonomy (3 terms per node), location (2 location per node).

Version 1: Doing a full join (easy from db perspective) result into 2 * 3 * 2 = 12 rows.

Importing Version 1 would mean to filter out duplicates of Terms and Location. A travel round trip (twice New York=duplicate location) is ruled out to get imported or we must make this configurable.

Version 2: Creating the csv by programming would result into 6 rows with 2 empty location values. I'm not sure my data provider is willing to program this.

n-1 t-1 l-1
n-1 t-2 l-1
n-1 t-3 ---
n-2 t-1 l-1
n-2 t-2 l-1
n-2 t-3 ---

Importing Version 2 would be easier I guess. Node is the repeating indicator. Terms and Locations are unique values to aggregate into an array.

infojunkie’s picture

subscribe

stella’s picture

subscribe

geek-merlin’s picture

this might benefit from #943344: Transformers

OddSim’s picture

My hack around this was to delimit multi-values with double pipes, then alter the CSV parser to explode fields containing the double pipes.

Example input:

    name,phone_number
    mark,2023151234||2033334155||3153152120

Change to ParserCSV.inc

 -  $row[$columnName] = isset($field) ? $field : '';
 +  $row[$columnName] = isset($field)
 +      ? (strpos($field, '||') !== false)
 +          ? explode('||', $field)
 +          : $field
 +      : '';
alex_b’s picture

I did something similar here: https://gist.github.com/658222

Nephele’s picture

Status: Active » Needs review
FileSize
9.83 KB

I find the approach in #12/#13 more natural than the approach in #2. So I took the liberty of expanding the github snippet into a full parser plugin that could hopefully be added to the Feeds module. It seems to me that if the approach in #2 is also wanted, it could be implemented as a separate parser, and users could then choose which parser they prefer.

The attached patch makes a new Multivalue CSV Parser available when defining a feed. The new parser includes options to specify the multivalue separator, and whether or not to strip spaces around the separator. With the default options, the string '|| one || two || three' would be split into three values: 'one', 'two', and 'three'. (Note, though, that the patch was written for 7.x rather than 6.x, since I'm working in Drupal 7).

The patch also includes a couple minor bug fixes that I noticed while testing this code (although they're not strictly part of the new parser):
* In 'FeedsSource.inc', fixed a bug that appears if you change the parser (e.g., from CSV Parser to Multivalue CSV Parser) for a feed definition that has previously been used to import a file -- the bug is that when you go to import another file, any options for the new parser are ignored.
* In 'FeedsCSVParser.inc', fixed the template so that it uses the specified delimiter instead of always using ','

jlab’s picture

I tried patching with feeds-6.x-1.x-dev

patch -p0 < feeds-multivaluecsvparser-759966-14.patch

But I'm getting the following output:

patching file feeds.plugins.inc
patching file includes/FeedsSource.inc
Hunk #1 FAILED at 504.
1 out of 1 hunk FAILED -- saving rejects to file includes/FeedsSource.inc.rej
patching file plugins/FeedsCSVParser.inc
Hunk #1 FAILED at 195.
1 out of 1 hunk FAILED -- saving rejects to file plugins/FeedsCSVParser.inc.rej
patching file plugins/FeedsMultiValueCSVParser.inc
Nephele’s picture

The patch I provided in #14 is for feeds-7.x-2.0-alpha2, which is presumably why you're getting errors. It probably wouldn't be too much work to create a 6.x patch, too -- I just wasn't sure how much interest there'd be, plus it's harder for me to test 6.x at this point.

serg.remote’s picture

#12 is small and elegant - works perfectly for me

videographics’s picture

The patch in #14 looks good. From an interface point of view, why not present the user with a choice of delimiters for multi-value fields just like a choice of field delimiters is offered?

Ayesh’s picture

For most comments here, look at http://drupal.org/project/feeds_tamper

rodrigogarcia’s picture

Hello,
I took #14 patch and made a plugin module for Drupal 6, in case somebody finds it useful.

johnv’s picture

@Ayesh, #19, thank you, I solved the multi-value case without problems and without patches with Feeds Tamper (on D7).

Ayesh’s picture

#12 worked for me too.
I think single pipe is even better because most standard applications' outputs are delimited by single pipes(or semi colons).
Thank you oddsim!

rfay’s picture

subscribe

dkroneb’s picture

Version: 6.x-1.x-dev » 7.x-2.x-dev

Thank-you Nephele, I did the #14 attached patch on "feeds-7.x-2.x-dev". I did the code paste by hand, slight change on line numbers for the dev version.
Just what I needed to import for "unlimited" multiply items. I highly recommend bringing the "Multivalue CSV parser" into feeds versions.

chrisjlee’s picture

@dkroneb So is there documented process for importing multiple values into a single field yet?

Or has this patch been committed into feed versions yet?

rfay’s picture

The real answer here remains the Feeds Tamper module.

verres’s picture

@rfray,

how can the feeds tamper module create the multi-value cck field? is there a plugin for this already? if so - it's not listed on the plugin list. I could easily create a 'pipeline' separated import field - explode it into an array...but then how would I take each array element and create the multiple values for the cck node reference?

for example - if I imported a field such as: "222|333|444" -- and I want each value to be treated as a separate value in an unlimited value node-reference, where/how would I do that? thanks

UPDATE.... I didn't notice you had it posted here: http://drupal.org/node/1122422 -- worked like a charm!

rfay’s picture

@verres, sorry! I thought I'd mentioned how to do it, not just feeds tamper. The last screencast at http://www.drupalcommerce.org/node/467 demonstrates how to do multivalued fields using feeds tamper. In this case it's multivalued product reference fields.

carn1x’s picture

Unfortunately the Feeds Tamper method requires that the creator of the CSV is technically competent. I have clients who I would not trust to merge fields into a "CSV cell", yet the alternative is that I need to manually convert each time the client updates their product list.

The node_import module from D6 made multi-value field mapping very simple however it did so by analyzing the CSV first and then mapping before the actual import. For multi-value fields, it provided multi-select lists where you could select multiple columns. It would be great it somebody could recreate similar functionality in Feeds to allow for great automation.

plamenD’s picture

The approach in #2 it seems easer to implement, so if is of any help hear is me solution which works for me so far.
in feeds/libraries/ParserCSV.inc after line 300

+     $instancesof = array_count_values($this->columnNames);
        foreach ($this->columnNames as $columnName) {
          $field = array_shift($fields);
-         $row[$columnName] = isset($field) ? $field : '';
+        if ($instancesof[$columnName] > 1) {
+         $row[$columnName][] = isset($field) ? $field : '';
+       }
+       else {
+          $row[$columnName] = isset($field) ? $field : '';
+       }
        }

And CSV format have to be:

guid,title,image,image,keys,keys,keys
1,"Title","[link to image]","[link to image]","first cloud link","Second cloud link","Third cloud link"

Please Excuse my english is not my native.

arkjoseph’s picture

Is there an a solution to the original question?

Our client data csv file will be formatted in a way that mutli-value fields will need to be on a per-row basis. We need the csv file to grow vertically and '||' separated values will become cumbersome and require too much interaction with the csv.
The attached file is a sample of how the data will be coming to us.

Am I asking to much for this feature to be integrated into feeds? Is there a possibility that feeds tamper can help out with this import?

Suggestions or tips ?

zeezhao’s picture

See the patch for field collections to see if can help:
drupal.org/node/1063434

arkjoseph’s picture

Thanks zeezhao,
I will investigate the patch. It doesn't look like its for 6.x but im sure it will get some attention soon enough.
Also, I am al little confused as to how Field Collection plays a role in Feeds?

Thanks again!

clemens.tolboom’s picture

arski’s picture

hmm, should the notation in first normal form work for entity references too? e.g. I have data in the form like:

1;foo
1;bar

where foo and bar are entity labels.. however, when I import the CSV, only one of them ends up being referenced :(

twistor’s picture

The first normal form described by Alex has never been implemented.

windmaomao’s picture

I see same thing as #35,

#14 patch doesn't work with entityreference :nid :label mapping either. Wonder how to make it work ?

jephree’s picture

Status: Needs work » Needs review

I tried solving this problem with this module...
https://github.com/jephree/drupal_aggregate_feed

Status: Needs review » Needs work

The last submitted patch, feeds-multivaluecsvparser-759966-14.patch, failed testing.

Status: Needs review » Needs work
breeze_man’s picture

If use feeds_tamper, you need to setup rule for every multi value field. Why not let CSV parser detect and split multi value automatically.
I made a patch which is the same as #12, please feel free to use it.