Hello,
as per the subject, the delimiting double quotes are kept in the titles etc. for me.

Any ideas?

Comments

giorgio79’s picture

I was wondering if escaped commas are taken into account.

Here is a short writeup of the CSV format spec as I have known it so far:

The CSV Format:

1. Each record is one line - The line separator may be LF (0x0A) or CRLF (0x0D0A). A line separator may also be embedded in the data, which makes a record span more than one line but is still acceptable.

2. Fields are separated with commas.

3. Leading and trailing whitespace is ignored - Unless the field is delimited with double-quotes, in which case the whitespace is preserved.

4. Embedded commas - Field must be delimited with double-quotes. A single cell with the text apples, carrots, and oranges becomes "apples, carrots, and oranges".

5. Embedded double-quotes - In fields containing a double quote, the double quote must be escaped by replacing the single double quote with two double quotes.

6. Embedded line-breaks - Fields must be surrounded by double-quotes.

7. Always Delimiting - Fields may always be delimited with double quotes; the delimiters will be parsed and discarded by the reading application.

Here are some examples that demonstrate the rules above. Each sample describes the data and how the reading application should interpret it.

Standard lines:

 AAA, BBB, CCC 
 111, 222, 333
 444, 555, 888

Leading and trailing whitespace and embedded commas:

 "  AAA ", "B,B,B", "   CCC,,," 

Double-quotes: If your input data consists of the three fields
He said "Great!"
"with quotes"
no quotes
the corresponding line in the CSV input file looks like:

 "He said ""Great!""", """with quotes""", "no quotes" 

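Rules 4 to 6 above can be sketched from the writing side. This is only an illustration: the function name sketch_csv_escape_field() is made up for this example and is not part of any module discussed here. It quotes a field only when it has to (rule 7 would also allow quoting every field).

```php
<?php
/**
 * Hypothetical helper, for illustration only.
 * Quotes a field when it contains a comma, a double quote, a line break,
 * or leading/trailing whitespace, and doubles any embedded double quotes.
 */
function sketch_csv_escape_field($value) {
  $needs_quotes = strpbrk($value, "\",\r\n") !== FALSE || $value !== trim($value);
  if ($needs_quotes) {
    return '"' . str_replace('"', '""', $value) . '"';
  }
  return $value;
}

$fields = array('He said "Great!"', '"with quotes"', 'apples, carrots, and oranges', 'no quotes');
echo implode(',', array_map('sketch_csv_escape_field', $fields)), "\n";
```

The first two fields come out as "He said ""Great!""" and """with quotes""", matching the sample line above.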

Note: In the USA and some other countries, for example the United Kingdom, Windows uses a dot (.) as decimal delimiter. For example, 0.5 = 1/2. In these countries, the data entries in a CSV file are separated by commas (,).

In continental Europe and some countries elsewhere Windows uses a comma (,) as decimal delimiter. For example, 0,5 = 1/2. In these countries, the data entries in a CSV file are separated by semi-colons ( ; ).

You can switch between ; and , by going to Start > Control Panel > Regional and Language Options > Standards and formats drop-down and changing the country there. For , as the CSV separator select (for example) USA, and for ; as the separator select (for example) Germany.

parrottvision’s picture

I am not sure if we share the same issue here. Having delimited fields with a comma in some description fields (e.g. I was walking down the path, the moon was up) results in the import breaking?

alex_b’s picture

In alpha1 only single quotes are supported (I think). Either way, 6.x HEAD (dev version) has a new and more powerful parser, so I don't plan any fixes on the alpha1 parser.

capellic’s picture

4. Embedded commas - Field must be delimited with double-quotes. A single cell with the text apples, carrots, and oranges becomes "apples, carrots, and oranges".

The problem is that the parser doesn't properly handle a comma in a field.

This was the test CSV file I used:

FIELD1,FIELD2,FIELD3
"Row1,F1","Row1,F2","Row1,F3"
"Row2,F1","Row2,F2","Row2,F3"
"Row3,F1","Row3,F2","Row3,F3"

I had mapped FIELD1 to Title and the result was that the title for the first row was:

"Row1

The parser got to the comma and thought it was the field delimiter. But according to the specification above, a field delimited by double-quotes can have a comma in it.
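The symptom can be reproduced outside the module with a naive split. The sketch below is not the module's code; it only contrasts a plain explode() with PHP's built-in str_getcsv() (available from PHP 5.3), which treats a quoted comma as data.

```php
<?php
$line = '"Row1,F1","Row1,F2","Row1,F3"';

// Naive split: every comma is treated as a field boundary, so the first
// piece is the fragment "Row1 with its opening quote still attached.
$naive = explode(',', $line);
print_r($naive);

// Quote-aware split: commas inside double quotes stay part of the field,
// and the delimiting quotes are discarded.
$fields = str_getcsv($line, ',', '"', '\\');
print_r($fields);
```

The naive version yields six pieces starting with "Row1, while the quote-aware version yields the three intended fields.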

I am going to try the DEV version to see if that clears up the issue.

Thanks for this module ---- it is extremely helpful.

capellic’s picture

Version: 6.x-1.0-alpha1 » 6.x-1.x-dev

I have confirmed that the issue still exists in DEV as of January 13.

capellic’s picture

Category: support » bug
Status: Active » Needs review
Attached file (new): 3.87 KB

I've written CSV import utilities in PHP before, so I decided I'd take a look at the code. I'm happy to say that I've submitted a patch! I haven't really put it through the wringer with all sorts of funny characters and whatnot, but it's working on my test dataset (below) and another one that I can't share with the public.

This dataset now works:

FIELD1,FIELD2,FIELD3
"Row1,F1","Row1,F2","Row1,F3"
"Row2,F1","Row2,F2","Row2,F3"
"Row3,F1","Row3,F2","Row3,F3"

All my work is contained in the parser_csv_parse() function of parser_csv_parser.inc. I noted that the code was manually parsing to look for delimiters; I have never had to do that much work to get my own stuff to work. I knew there was a PHP function that created an array from a file, but here we needed to create one from a string. Turns out there is a PHP function in development here:

http://us3.php.net/manual/en/function.str-getcsv.php

In the comments, "Rob" provided the basis of the code I was looking for. I pulled a lot of it out because it was opening a file and looping through its lines. All I needed was the bit that threw the line into an array.

Note that I updated Rob's code with the suggestion from Anonymous right above it which provided a better expression.

This did the trick. I can now use double-quotes to encapsulate my string and use commas within that field.

Use at your own risk until the maintainer has had the time to review and apply to DEV as I may have made some bad decisions with regard to the integration due to me not knowing the ins and outs of his code.

alex_b’s picture

#6: I read the code and this is looking good in principle. I haven't tested.

Are you running a patched version? Any reports from the field?

capellic’s picture

@alex_b: Yes, I am running the patched version without any problem. However, I am only importing one feed that hasn't changed yet. But, I've deleted imported nodes and reimported several times and all seems well.

karens’s picture

+1 from me on this. I am doing some work on the feed element mapper for date fields to see if I can make it work correctly with non-ical feeds as well as ical feeds and am exporting calendar data from Yahoo calendar. That creates a double-quote delimited csv file, so I need the parser to handle double-quote delimiting correctly. I believe that Outlook creates similar output in its calendar export, so I believe if I can get this working, an Outlook export would also work in the mapper. At any rate, having the parser handle double-quote delimiters is critical to this application. I'll test the patch and report back.

karens’s picture

Attached file (new): 4.85 KB

This worked great for me as soon as I fixed another bug that was uncovered by http://cvs.drupal.org/viewvc.py/drupal/contributions/modules/parser_csv/.... It was sending a semi-colon as the delimiter instead of a comma because of the following code on lines 102-104 of parser.cvs.inc:

  $it = new ParserCSVIterator($filepath);
  $delimiter = empty($feed->settings['parsers']['parser_csv']['delimiter']) ? ',' : ';';
  $rows = parser_csv_parse($it, $delimiter);

Before the above fix it was always sending a comma, which happened to work right. After that fix it was trying to use ';' as a delimiter even though I asked for a comma as the delimiter, which broke the rest of the parser.

So here's a patch with that fixed, too.
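For readers without the patch at hand, here is a minimal sketch of the kind of fix described. It is not the attached patch: the helper name sketch_csv_delimiter() is made up, and it assumes the delimiter setting stores the delimiter character itself. The point is that the ternary quoted above returns ';' whenever any delimiter is configured, whatever its value.

```php
<?php
/**
 * Hypothetical helper, for illustration only.
 * Assumes $settings['parsers']['parser_csv']['delimiter'] holds the
 * delimiter character itself (an assumption, not confirmed by the module).
 */
function sketch_csv_delimiter($settings) {
  $configured = isset($settings['parsers']['parser_csv']['delimiter'])
    ? $settings['parsers']['parser_csv']['delimiter']
    : '';
  // Fall back to a comma only when nothing is configured; otherwise
  // honor the configured value instead of hard-coding ';'.
  return $configured === '' ? ',' : $configured;
}

$settings = array('parsers' => array('parser_csv' => array('delimiter' => ',')));
echo sketch_csv_delimiter($settings), "\n";
```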

alex_b’s picture

#10: that's great. Looks like we're getting close to RTBC?

karens’s picture

It's RTBC as far as I'm concerned, but since I made another change maybe capellic should confirm that his use case still works.

alex_b’s picture

Good call. capellic: once you confirm, I'll commit.

patchak’s picture

I tested the patch by KarenS and I can confirm it fixed the issue about the double quotes as field delimiters.
+1

Patchak

jpetso’s picture

Status: Needs review » Needs work

Regression: does not cope with linebreaks enclosed within quotes. I'll take a look at it afterwards, but I'm still skeptical whether regexps will do the job in a specification-conformant manner.

alex_b’s picture

@jpetso: I see. Can you confirm an error? If so, could you post a sample CSV file?

jpetso’s picture

Here's an example CSV file (also as an attachment, for easy download):

"Testing
  those linebreaks."

Expected result: array(array("Testing\n  those linebreaks."))
Result after applying the above patch: array(array('"Testing'), array('those linebreaks"'))
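To show why a line-by-line approach splits this record, here is a rough sketch of a character-by-character parser that tracks whether it is inside quotes. It is not the module's parser or the patch; sketch_parse_csv() is a made-up name, and the sketch ignores whitespace trimming.

```php
<?php
/**
 * Hypothetical helper, for illustration only.
 * Walks the text one character at a time so that delimiters and line
 * breaks inside double quotes are kept as field data.
 */
function sketch_parse_csv($text, $delimiter = ',') {
  $rows = array();
  $row = array();
  $field = '';
  $in_quotes = FALSE;
  $length = strlen($text);

  for ($i = 0; $i < $length; $i++) {
    $char = $text[$i];
    if ($in_quotes) {
      if ($char !== '"') {
        // Anything inside quotes, including commas and line breaks, is data.
        $field .= $char;
      }
      elseif ($i + 1 < $length && $text[$i + 1] === '"') {
        // A doubled quote stands for one literal quote.
        $field .= '"';
        $i++;
      }
      else {
        $in_quotes = FALSE;
      }
    }
    elseif ($char === '"') {
      $in_quotes = TRUE;
    }
    elseif ($char === $delimiter) {
      $row[] = $field;
      $field = '';
    }
    elseif ($char === "\n" || $char === "\r") {
      // Treat CRLF as a single record separator.
      if ($char === "\r" && $i + 1 < $length && $text[$i + 1] === "\n") {
        $i++;
      }
      $row[] = $field;
      $rows[] = $row;
      $row = array();
      $field = '';
    }
    else {
      $field .= $char;
    }
  }
  // Flush the last record when the text does not end with a line break.
  if ($field !== '' || !empty($row)) {
    $row[] = $field;
    $rows[] = $row;
  }
  return $rows;
}

print_r(sketch_parse_csv("\"Testing\n  those linebreaks.\"\n"));
```

With the sample file above, the whole quoted text ends up as a single field in a single record.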

jpetso’s picture

Attached file (new): 30 bytes

d.o did not like the CSV file with .csv extension, attaching it as .txt instead.

jpetso’s picture

In return, could someone provide me with a test case for the topic issue in here? The one mentioned above:

FIELD1,FIELD2,FIELD3
"Row1,F1","Row1,F2","Row1,F3"
"Row2,F1","Row2,F2","Row2,F3"
"Row3,F1","Row3,F2","Row3,F3"

works perfectly for me - with the current parser, I get an array that looks like this:

array(
  array("FIELD1", "FIELD2", "FIELD3"),
  array("Row1,F1", "Row1,F2", "Row1,F3"),
  array("Row2,F1", "Row2,F2", "Row2,F3"),
  array("Row3,F1", "Row3,F2", "Row3,F3"),
)

So what exactly is going wrong?

karens’s picture

#19: Yes, I can get that result, but the double quotes are included in the field values and I need them removed. For instance, if the value is 02/15/2009, the element value becomes "02/15/2009", with the double quotes prepended and appended to the field value. The alternate method of parsing didn't do that.

karens’s picture

I can't get FeedAPI working at all at the moment so I can't double test anything, but that's what I was originally seeing that I was trying to solve.

jpetso’s picture

The quotes in the above array demo were only there to conform to PHP syntax. They were not part of the field values.
So, I'm sorry, but I simply cannot reproduce this. No additionally added double quotes for me. Let me try to list something here that can be reproduced:

Step 1. Store the above sample input file as quotes.csv:

FIELD1,FIELD2,FIELD3
"Row1,F1","Row1,F2","Row1,F3"
"Row2,F1","Row2,F2","Row2,F3"
"Row3,F1","Row3,F2","Row3,F3"

Step 2. Enable Devel and have the PHP code execution block appear below your pages.

Step 3. Insert the following code into the code execution block:

$path = '/home/jakob/quotes.csv'; // replace directory with the one where you stored the CSV file
module_load_include('inc', 'parser_csv', 'parser_csv_parser');
$lineIterator = new ParserCSVIterator($path);
dpm(print_r(parser_csv_parse($lineIterator), TRUE));

Apart from the PHP notices (I'm fixing those with #376263: Make parser_csv_parse() into a parser class), the following output gets displayed on my site:

Array
(
    [0] => Array
        (
            [0] => FIELD1
            [1] => FIELD2
            [2] => FIELD3
        )

    [1] => Array
        (
            [0] => Row1,F1
            [1] => Row1,F2
            [2] => Row1,F3
        )

    [2] => Array
        (
            [0] => Row2,F1
            [1] => Row2,F2
            [2] => Row2,F3
        )

    [3] => Array
        (
            [0] => Row3,F1
            [1] => Row3,F2
            [2] => Row3,F3
        )

)

For the love of god I can't make out any superfluous double quotes in there. Unless you give me an example with a different input file and output array, I cannot confirm this issue (and will protest against the patch being applied).

jpetso’s picture

Note: the above was tested with current dev version (DRUPAL-6--1 branch, parser_csv_parser.inc v1.1.2.3).

karens’s picture

Status: Needs work » Fixed

There have been numerous fixes to FeedAPI since I first tried this. I have just tried everything again and found one bug in this parser that kept my test from working: #365562: dev version is not separating field values (it was the first item in the above patch, which I just posted as a simple patch to the other issue). With that fixed, I agree that the current code works correctly and that I didn't need this patch.

Sorry for the confusion but I could not get this working before without it and now I can.

FWIW, I added some test files to the Date module to test ical and csv imports of date values with FeedAPI. The csv test is in date/tests/Yahoo.csv, which is a file in the format of a csv export from Yahoo calendar or Microsoft Outlook.

alex_b’s picture

Category: bug » task
Status: Fixed » Needs review
Attached file (new): 5.85 KB

I can't reproduce a problem with double quotes either. I've started a simpletest for capturing these kinds of issues. I'd like to make it a good practice to add a test for every parsing problem we fix.

Patch adds

* Simpletest for CSV Parser with basic test that parses a zipped csv file
* Test that parses a double quoted CSV file
* Test that parses a double quoted CSV file with line breaks

Tests are all passing fine.
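For anyone who wants to check their own files, here is a rough standalone sketch of what such checks look like. It is not the Simpletest class from the patch: it uses PHP's own fgetcsv() as the parser, and the helper name sketch_parse_with_fgetcsv() is made up for this example.

```php
<?php
/**
 * Hypothetical helper, for illustration only.
 * Parses CSV text with fgetcsv() through an in-memory stream, so quoted
 * line breaks stay inside one field.
 */
function sketch_parse_with_fgetcsv($csv) {
  $handle = fopen('php://memory', 'w+');
  fwrite($handle, $csv);
  rewind($handle);
  $rows = array();
  while (($row = fgetcsv($handle, 0, ',', '"', '\\')) !== FALSE) {
    $rows[] = $row;
  }
  fclose($handle);
  return $rows;
}

// Double-quoted fields, one of them containing a line break.
$csv = "FIELD1,FIELD2\n\"Row1,F1\",\"line one\nline two\"\n";
$rows = sketch_parse_with_fgetcsv($csv);

// The delimiting quotes must be gone and the line break must be kept.
var_dump($rows[1][0] === 'Row1,F1');
var_dump($rows[1][1] === "line one\nline two");
```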

alex_b’s picture

Status: Needs review » Fixed

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

capellic’s picture

Thanks all for making this happen.