Multi-line imports fail. If I use the multiline file in test/node.csv and try to import it into a "page" content type, the import fails to detect that the body is multi-line; the import takes each line as the start of a new record to import.
Using the data from test/node.csv, the mapping test page shows for the _title_ of the first five records:
* instructions (node.inc test)
* 2. there is no user called 'jacques'
* 3. there is a user with uid = '2'
* 4. there is no user with uid = '3'."
* node test 1
As you can see, the import treats each line of the body as if it were the start of a new record.
I've also had this problem with CSV files that I created myself specifically for my content type.
As a work-around, can I somehow transform the carriage return in the multiline file into some other character that will be translated, such as "\n"? I know this doesn't actually work, but perhaps there's some other fix that will.
| Comment | File | Size | Author |
|---|---|---|---|
| #5 | node2.csv_.txt | 1.42 KB | myudkowsky |
| #5 | node3.csv_.txt | 1.43 KB | myudkowsky |
| #3 | field.definition.txt | 1.51 KB | myudkowsky |
| #3 | input.notworking.csv_.txt | 3.33 KB | myudkowsky |
| #3 | input.works_.csv_.txt | 3.34 KB | myudkowsky |
Comments
Comment #1
myudkowsky commented
Upgraded to critical to match the status of #273208: csv file with newlines.
Comment #2
myudkowsky commented
Fixed title of bug.
Comment #3
myudkowsky commented
Given #351084: Error in test file node.csv, the failure of tests/node.csv to import is not a good example of this bug.
Going back and trying some more, I see that I can in fact do an import -- this is frankly an utter mystery to me -- but I seem to now have trouble with just the last item on a CSV line. That is, of all the items on the line, the very last one seems to give me trouble.
There's a line-ending issue. The field that refuses to load has defined values of "Dry," "Semi-Dry," etc., but the system refuses to import CSV that includes the word "Dry."
I've set "auto_detect_line_endings = On" in /etc/php5/apache2/php.ini, but that doesn't seem to help. Besides, the file is the output of a Postgres database, created on Linux and parsed on Linux, so mixed line endings seem unlikely.
In order to solve this, I added quotes around the very last input item on the line. This allows the import to proceed.
I will hazard a guess that this is a trim()-related error of some sort, which happens at the end of lines in multi-line CSV input lines, and is only apparent when the CCK field has pre-defined values.
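To make the guess above concrete, here is a minimal Python sketch of how an untrimmed line ending on the last column could produce a "not an allowed value" error. It is an illustration of the hypothesis only, not node_import's code: the sample rows, the `ALLOWED` set and both helper functions are made up for this example, and the thread does not establish that this is what actually happens.

```python
# Hypothetical allowed values, taken from the two named in the thread.
ALLOWED = {"Dry", "Semi-Dry"}

def split_naive(data):
    # Splits on "\n" only and never trims, so a "\r" stays glued
    # to the last field of every line in a CRLF file.
    return [line.split(",") for line in data.split("\n") if line]

def split_trimmed(data):
    # splitlines() understands "\r\n", and strip() removes stray whitespace.
    return [[field.strip() for field in line.split(",")]
            for line in data.splitlines() if line]

# Made-up sample with CRLF line endings.
data = "Wine A,Dry\r\nWine B,Semi-Dry\r\n"

naive = split_naive(data)
trimmed = split_trimmed(data)

# Check the last column of each row against the allowed values.
naive_ok = [row[-1] in ALLOWED for row in naive]
trimmed_ok = [row[-1] in ALLOWED for row in trimmed]

print(repr(naive[0][-1]), naive_ok)
print(repr(trimmed[0][-1]), trimmed_ok)
```

Only the last column is affected in the naive version, which would match the symptom that quoting the final item makes the import succeed.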
Status:
| Item | Value |
|---|---|
| Drupal | 5.14 |
| Configuration file | Protected |
| Cron maintenance tasks | Last run 3 days 10 hours ago |
| Database schema | Up to date |
| File system | Writable (public download method) |
| GD library | 2.0 or higher |
| MySQL database | 5.0.75 |
| PHP | 5.2.6-0.1+b1 |
| PHP register globals | Disabled |
| Unicode library | PHP Mbstring Extension |
| Web server | Apache/2.2.11 (Debian) PHP/5.2.6-0.1+b1 with Suhosin-Patch mod_ruby/1.2.6 Ruby/1.8.7(2008-08-11) |

CCK Version: 5.x-1.10
I've attached a text file with the config of the content-type I'm attempting to import, and some sample content that works and another that fails. I've also attached a copy/paste of the CCK definition of the field I'm trying to import.
The error message I get is "Dry is not an allowed value for Dryness."
Comment #4
myudkowsky commented
Another oddity: some of the multi-line content ("comments", which becomes "body" in my content type) will correctly import the line breaks. Other content will not import the line breaks correctly.
This can be seen on the input test file I supplied. The first CSV input line includes:
accounting.
Abarbanel's
And this is imported as:
accounting.Abarbanel's
The very next input line has a CSV input line with:
wines.
Abarbanel's
And this is rendered correctly. I cannot find any difference between these two input lines that would explain why one works and the other does not -- but I *suspect* that the problem might be related to the number of CRs in the first (1 CR) vs. the second (2 CRs).
I have tried various tricks, including changing the value of quote and putting the multi-line text at the end of a line instead of the middle of a line, but this does not help. I've also tried quotes around all strings in the line, not just the multi-line strings, but this does not work either.
Comment #5
myudkowsky commented
The problem with multi-line import is as follows:
If I include a multi-line input item, the item must start with a CR -- otherwise the first CR in the item is not recognized.
I include as examples variants on the module's tests/node.csv. node2.csv fails to import the CR correctly at the end of the line. node3.csv will import correctly only on the second CR in the input item.
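For comparison, this is what a standard CSV parser does with quoted multi-line values, sketched with Python's `csv` module. The sample text is invented for the illustration and is not the content of node2.csv or node3.csv; it only shows that a line break should survive whether or not the value starts with one.

```python
import csv
import io

# Made-up sample: the second record has a break in the middle of the body,
# the third record has a body that starts with a break.
sample = (
    'title,body\r\n'
    '"node test 1","first line\nsecond line"\r\n'
    '"node test 2","\nstarts with a break"\r\n'
)

# newline="" keeps embedded line breaks exactly as they appear in the data.
rows = list(csv.reader(io.StringIO(sample, newline="")))

for row in rows:
    print(row)
```

Three records come back, and both bodies keep their embedded line break.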
Comment #6
myudkowsky commented
If I take the replacement code for fgetcsv() out of node_import.module, and use the standard PHP fgetcsv(), the problem goes away.
I'm staring hard at the replacement code but I don't see the problem.
Comment #7
Robrecht Jacques commented
Thanks for looking into this. I'll try to resolve the issue tomorrow (away from home right now).
The reason why 5.x doesn't use fgetcsv() is because of some issues with UTF8 characters.
It seems to me that if a value with one CR fails to import correctly and a value with two CRs works, there is some problem with a boolean getting unset or reset incorrectly - "forgetting" the fact that we are parsing a multi-line column value.
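The kind of state tracking described here can be sketched as follows. This is a simplified Python illustration, not the replacement fgetcsv() code in node_import.module: the function name `read_records` and the sample input are hypothetical, and it assumes double quotes are the only quoting character.

```python
def read_records(lines):
    """Group physical lines into logical CSV records (illustrative only)."""
    records = []
    pending = []        # physical lines collected for the current record
    in_quotes = False   # True while a quoted value is still open

    for line in lines:
        # An odd number of quotes flips the state. An escaped quote ("")
        # contributes two quotes and therefore cancels itself out.
        if line.count('"') % 2 == 1:
            in_quotes = not in_quotes
        pending.append(line)
        if not in_quotes:
            # The record is complete: restore the breaks between its lines.
            records.append("\n".join(pending))
            pending = []

    if pending:
        # Input ended inside an open quote; keep what was collected.
        records.append("\n".join(pending))
    return records

# Made-up input: the first record spans two physical lines.
text = '1,"first\nsecond",x\n2,"only",y'
records = read_records(text.splitlines())
print(records)
```

If `in_quotes` were cleared or ignored between lines, every physical line would be emitted as its own record, which is the behaviour described in the original report.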