Closed (fixed)
Project:
CSV Parser
Version:
6.x-1.x-dev
Component:
Code
Priority:
Normal
Category:
Task
Assigned:
Unassigned
Reporter:
Created:
23 Dec 2008 at 07:11 UTC
Updated:
25 Mar 2009 at 23:33 UTC
Hello,
As per the subject, the delimiting double quotes are kept in the titles etc. for me.
Any ideas?
| Comment | File | Size | Author |
|---|---|---|---|
| #25 | 350405_simpletests.patch | 5.85 KB | alex_b |
| #18 | linebreak.txt | 30 bytes | jpetso |
| #10 | parser_csv.patch | 4.85 KB | karens |
| #6 | double_quote_dilimeter_fix.patch | 3.87 KB | capellic |
Comments
Comment #1
giorgio79 commented
I was wondering if escaped commas are taken into account.
Here is a short writeup on the CSV format spec I was familiar with so far:
Comment #2
parrottvision commented
I am not sure if we share the same issue here. Having delimited fields with a comma in some description fields (e.g. I was walking down the path, the moon was up) results in the import breaking?
Comment #3
alex_b commented
In alpha1 only single quotes are supported (I think). Either way, 6.x HEAD (dev version) has a new and more powerful parser, so I don't plan any fixes on the alpha1 parser.
Comment #4
capellic
The problem is that the parser doesn't properly handle a comma in a field.
This was the test CSV file I used:
I had mapped FIELD1 to Title and the result was that the title for the first row was:
"Row1
The parser got to the comma and thought it was the field delimiter. But according to the specification above, a field delimited by double-quotes can have a comma in it.
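To illustrate the rule described above, here is a minimal sketch using PHP's built-in str_getcsv() (available since PHP 5.3). The sample row is hypothetical, since the original test file is not reproduced here, and this is not the module's own parser code.

```php
<?php
// Hypothetical sample row: the first field is wrapped in double quotes
// and contains a comma that must not be treated as a field delimiter.
$line = '"Row1, with a comma",Field2';

// Delimiter, enclosure and escape character are passed explicitly.
$fields = str_getcsv($line, ',', '"', '\\');

var_export($fields);
```

A conforming parser should return two fields here, with the enclosing double quotes removed from the first one.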
I am going to try the DEV version to see if that clears up the issue.
Thanks for this module ---- it is extremely helpful.
Comment #5
capellic
I have confirmed that the issue still exists in DEV as of January 13.
Comment #6
capellic
I've written CSV import utilities in PHP before, so I decided I'd take a look at the code. I'm happy to say that I've submitted a patch! I haven't really put it through the wringer with all sorts of funny characters and whatnot, but it's working on my test dataset (below) and another one that I can't share with the public.
This dataset now works:
All my work is contained in the parser_csv_parse() function of parser_csv_parser.inc. I noticed that the code parses manually to look for delimiters; I have never had to do that much work to get my own importers working. I knew there was a PHP function that creates an array from a file, but here we needed to create one from a string. Turns out there is a PHP function in development here:
http://us3.php.net/manual/en/function.str-getcsv.php
In the comments, "Rob" provided the basis of the code I was looking for. I pulled a lot of it out because it was opening a file and looping through its lines. All I needed was the bit that threw the line into an array.
Note that I updated Rob's code with the suggestion from Anonymous right above it which provided a better expression.
This did the trick. I can now use double-quotes to encapsulate my string and use commas within that field.
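The exact code from the php.net comments is not reproduced here. As a rough sketch of the general idea (a regular expression that splits only on delimiters outside double quotes), something like the following could work. The function name is hypothetical and the pattern is an assumption, not the expression used in the patch.

```php
<?php
// Hypothetical helper for illustration only, not the code from the patch.
function example_split_csv_line($line, $delimiter = ',') {
  // Split on a delimiter only when it is followed by an even number of
  // double quotes, i.e. when it sits outside any quoted field.
  $pattern = '/' . preg_quote($delimiter, '/') . '(?=(?:[^"]*"[^"]*")*[^"]*$)/';
  $fields = preg_split($pattern, $line);
  foreach ($fields as $i => $field) {
    // Strip one pair of enclosing quotes and unescape doubled quotes.
    if (strlen($field) >= 2 && $field[0] === '"' && substr($field, -1) === '"') {
      $field = str_replace('""', '"', substr($field, 1, -1));
    }
    $fields[$i] = $field;
  }
  return $fields;
}

var_export(example_split_csv_line('"Row1, with a comma",Field2'));
```

Note that a per-line approach like this assumes each record sits on a single line of input.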
Use at your own risk until the maintainer has had the time to review and apply to DEV as I may have made some bad decisions with regard to the integration due to me not knowing the ins and outs of his code.
Comment #7
alex_b commented
#6: I read the code and this is looking good in principle. I haven't tested.
Are you running a patched version? Any reports from the field?
Comment #8
capellic
@alex_b: Yes, I am running the patched version without any problem. However, I am only importing one feed that hasn't changed yet. But, I've deleted imported nodes and reimported several times and all seems well.
Comment #9
karens commented
+1 from me on this. I am doing some work on the feed element mapper for date fields to see if I can make it work correctly with non-ical feeds as well as ical feeds and am exporting calendar data from Yahoo calendar. That creates a double-quote delimited csv file, so I need the parser to handle double-quote delimiting correctly. I believe that Outlook creates similar output in its calendar export, so I believe if I can get this working, an Outlook export would also work in the mapper. At any rate, having the parser handle double-quote delimiters is critical to this application. I'll test the patch and report back.
Comment #10
karens commented
This worked great for me as soon as I fixed another bug that was uncovered by http://cvs.drupal.org/viewvc.py/drupal/contributions/modules/parser_csv/.... It was sending a semi-colon as the delimiter instead of a comma because of the following code on lines 102-104 of parser.cvs.inc:
Before the above fix it was always sending a comma, which happened to work right. After that fix it was trying to use ';' as a delimiter even though I asked for a comma as the delimiter, which broke the rest of the parser.
So here's a patch with that fixed, too.
Comment #11
alex_b commented
#10: that's great. Looks like we're getting close to RTBC?
Comment #12
karens commented
It's RTBC as far as I'm concerned, but since I made another change maybe capellic should confirm that his use case still works.
Comment #13
alex_b commented
Good call. capellic: once you confirm, I'll commit.
Comment #14
patchak commented
I tested the patch by KarenS and I can confirm it fixed the issue about the double quotes as field delimiters.
+1
Patchak
Comment #15
jpetso commented
Regression: does not cope with linebreaks enclosed within quotes. I'll take a look at it afterwards, though I'm still skeptical whether regexps can do the job in a specification-conformant manner.
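To make the linebreak case concrete, here is a minimal sketch, assuming the input is first split on newlines and each line is then parsed on its own (an assumption about how a line-based parser behaves, not the patch's actual code). It is contrasted with PHP's built-in fgetcsv(), which reads a quoted field across linebreaks. The sample data is hypothetical.

```php
<?php
// One record with a single field that contains a linebreak inside quotes.
$csv = "\"Testing\nthose linebreaks\"\n";

// Naive approach: split into lines first, then parse each line separately.
$naive = array();
foreach (explode("\n", trim($csv)) as $line) {
  $naive[] = str_getcsv($line, ',', '"', '\\');
}
// $naive now holds two rows, although the input contains one record.

// Stream approach: fgetcsv() keeps reading until the closing quote.
$handle = fopen('php://memory', 'r+');
fwrite($handle, $csv);
rewind($handle);
$rows = array();
while (($row = fgetcsv($handle, 0, ',', '"', '\\')) !== FALSE) {
  $rows[] = $row;
}
fclose($handle);

var_export($rows);
```

The point of the contrast is that record boundaries cannot be decided before quoting has been taken into account.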
Comment #16
alex_b commented
@jpetso: I see. Can you confirm an error? If so, could you post a sample CSV file?
Comment #17
jpetso commented
Here's an example CSV file (also as an attachment, for easy download):
Expected result:
array(array("Testing\n those linebreaks"))
Result after applying the above patch:
array(array('"Testing'), array('those linebreaks"'))
Comment #18
jpetso commented
d.o did not like the CSV file with .csv extension, attaching it as .txt instead.
Comment #19
jpetso commented
In return, could someone provide me with a test case for the topic issue in here? The one mentioned above:
works perfectly for me - with the current parser, I get an array that looks like this:
So what exactly is going wrong?
Comment #20
karens commented
#19, Yes I can get that result, but the double quotes are included in the field values and I need them removed. For instance, if the value is 02/15/2009, the element value becomes "02/15/2009", with the double quotes prepended and appended to the field value. The alternate method of parsing didn't do that.
Comment #21
karens commented
I can't get FeedAPI working at all at the moment so I can't double test anything, but that's what I was originally seeing that I was trying to solve.
Comment #22
jpetso commented
The quotes in the above array demo were only there to conform to PHP syntax. They were not part of the field values.
So, I'm sorry, but I simply cannot reproduce this. No additionally added double quotes for me. Let me try to list something here that can be reproduced:
Step 1. Store the above sample input file as quotes.csv:
Step 2. Enable Devel and have the PHP code execution block appear below your pages.
Step 3. Insert the following code into the code execution block:
Apart from the PHP notices (I'm fixing those with #376263: Make parser_csv_parse() into a parser class), the following output gets displayed on my site:
For the love of god I can't make out any superfluous double quotes in there. Unless you give me an example with a different input file and output array, I cannot confirm this issue (and will protest against the patch being applied).
Comment #23
jpetso commented
Note: the above was tested with current dev version (DRUPAL-6--1 branch, parser_csv_parser.inc v1.1.2.3).
Comment #24
karens commented
There have been numerous fixes to FeedAPI since I first tried this. I have just tried everything again and found one bug in this parser that kept my test from working: #365562: dev version is not separating field values (it was the first item in the above patch, which I just posted as a simple patch to the other issue). With that fixed, I agree that the current code works correctly and I didn't need this patch.
Sorry for the confusion but I could not get this working before without it and now I can.
FWIW, I added some test files to the Date module to test ical and csv imports of date values with FeedAPI. The csv test is in date/tests/Yahoo.csv, which is a file in the format of a csv export from Yahoo calendar or Microsoft Outlook.
Comment #25
alex_b commented
I can't reproduce a problem with double quotes either. I've started a simpletest for capturing this kind of issue. I'd like to make it a good practice to add a test for every parsing problem we fix.
Patch adds
* Simpletest for CSV Parser with basic test that parses a zipped csv file
* Test that parses a double quoted CSV file
* Test that parses a double quoted CSV file with line breaks
Tests are all passing fine.
Comment #26
alex_b commented
Committed #25 in preparation of #376263: Make parser_csv_parse() into a parser class
Comment #28
capellic
Thanks all for making this happen.