Problem/Motivation

The core migrate system needs a way to import things from CSV files.

Immediate use case is for the Umami Demo profile now in core in 8.6.x. The imported content is in CSV files and we have code to read it and create the relevant entries. Having migrate_source_csv in core would allow us to remove that code and use migrate instead.

See #2809635-37: Create experimental installation profile for where @larowlan suggested bring migrate_source_csv into core.

Other good reasons to have CSV as a migration source in core:

  • "I use this constantly. I've seen it forked into the Contenta distro, and a few of my own projects (I just want the source plugin, not another module.) It'll also be vital to Commerce integrations as CSV is a common way to import data from an ERP." @mglaman (#4)
  • "CSV format is the first format used by non technical professionnals who need to import data and to generate/edit those datas thanks to software they knew (MS Excel without mention it). It's also the easiest way to export data from many ERP and big solutions with datas." @GoZ (#5)
  • "CSV is simply a really common used format and it would make migrate more usable by default for many usecases. One could say: Oh it is easy to parse CSV, but it turns out it is not. On top of that the module also doesn't load the entire CSV file upfront, but rather reads line by line. I think this makes it a good candidate also for bigger migrations out there." @dawehner (#14)
  • "This would help the Out of the Box Initiative. Let's do it! Consider this my maintainer +1 :)" @phenaproxima (#18)

Proposed resolution

The migrate_source_csv module has a stable release (2.0), has been downloaded almost 100k times, and comes complete with tests. The contenta CMS has been using this for their migrations, too.

Rename the files/namespaces, and move it in, as-is, to the core migrate module.

Since this is a straight port from contrib, discussions about fetchers, etc. are postponed to follow-ups. See #16, #36 and #37 for some of the highlights of the discussion.

Follow-up: #2962091: Adopt fetchers/parsers logic for use by source csv plugin

Remaining tasks

  1. Patch that renames files/namespaces and moves the contrib code into core Done.
  2. Tests Done.
  3. Word-smithing PHPDocs Done.
  4. Decide if we're okay with a straight port (done), or if we need to hash out all the fetcher/parser stuff first, figure out the design, class names, avoid conflicts, possible BC hell, etc. Done. See comments #80 through #83. Punted to #2962091: Adopt fetchers/parsers logic for use by source csv plugin.
  5. Finish cleaning up the docs and tests.

None.

User interface changes

None.

API changes

Adds a new migrate source plugin called "csv" in migration yml definition files. The class providing the plugin (\Drupal\migrate\Plugin\migrate\source\CSV) is heavily commented with PHPDoc.

Adds a \Drupal\migrate\CSVFileObject class (which extends \SplFileObject).

Existing users of this plugin from contrib are unharmed. The name is the same, but we haven't changed anything about how it works. Contrib users simply disable migrate_source_csv module before upgrading to core 8.X.Y and none of their migration files have to be changed to continue working.

Data model changes

None.

Original report by @smaz

In #2809635: Create experimental installation profile, we are using CSV files to provide default content for a demo installation profile of Drupal.

In a review, @larowlan suggested bringing the CSV migration sources into core:
https://www.drupal.org/project/drupal/issues/2809635#comment-12381619

I agree that a CSV migrate source would be a very useful feature for core.

The migrate_source_csv module has a stable release (2.0), has been downloaded almost 100k times (not sure if that includes composer stats?), and comes complete with tests. The contenta CMS has been using this for their migrations, too.

So would it be ok to look at moving this into Core's migrate module? If so, I'm happy to try and turn the module into a patch.

CommentFileSizeAuthor
#150 interdiff_147-150.txt19.41 KBheddn
#150 2931739-150.patch33.71 KBheddn
#147 2931739-147.patch35.36 KBheddn
#147 interdiff_145-147.txt9.3 KBheddn
#145 2931739-145.patch36.58 KBheddn
#145 interdiff_144-145.txt20.06 KBheddn
#144 2931739-144.patch44.02 KBheddn
#144 interdiff_129-144.txt9.48 KBheddn
#129 interdiff.2931739.120-129.txt18.31 KBmikelutz
#129 2931739-129.drupal.Bring-migratesourcecsv-to-core.patch45.29 KBmikelutz
#121 interdiff.2931739.120-121.txt4.64 KBlongwave
#121 2931739-121.drupal.Bring-migratesourcecsv-to-core.patch46.29 KBlongwave
#120 interdiff.2931739.115-120.txt11.75 KBlongwave
#120 2931739-120.drupal.Bring-migratesourcecsv-to-core.patch47.47 KBlongwave
#116 2931739-116.patch50.24 KBeiriksm
#115 2931739-115.txt50.24 KBeiriksm
#115 interdiff.txt2.41 KBeiriksm
#112 interdiff-2931739.txt555 byteseiriksm
#112 2931739-112.patch50.1 KBeiriksm
#107 2931739-107.patch50.1 KBheddn
#107 interdiff_96-107.txt8.59 KBheddn
#96 2931739-96.patch50.44 KBheddn
#96 interdiff_95-96.txt3.34 KBheddn
#95 2931739-94.patch50.73 KBjofitz
#92 interdiff-70-92.txt10.24 KBjofitz
#92 2931739-92.patch50.73 KBjofitz
#70 interdiff_66-70.txt4.03 KBheddn
#70 2931739-70.patch50.89 KBheddn
#66 interdiff_65-66.txt3.01 KBheddn
#66 2931739-66.patch50.93 KBheddn
#65 2931739-interdiff-59_65.txt1.12 KBdww
#65 2931739-65.patch50.89 KBdww
#50 2931739-49.migrate_source_csv.patch49.1 KBdww
#50 2931739-49.interdiff.txt5.2 KBdww
#48 2931739-48.patch48.75 KBheddn
#48 intrdiff_47-48.txt5.85 KBheddn
#47 interdiff_46-47.txt5.49 KBheddn
#47 2931739-47.patch48.41 KBheddn
#46 interdiff-43-46.txt3.22 KBjofitz
#46 2931739-46.patch48.13 KBjofitz
#43 2931739-43.interdiff.txt4.27 KBdww
#43 2931739-43.migrate_source_csv.patch48.17 KBdww
#41 interdiff_40-41.txt551 bytesheddn
#41 2931739-41.patch47.9 KBheddn
#40 interdiff_38-40.txt10.16 KBheddn
#40 2931739-40.patch47.89 KBheddn
#38 interdiff_35-38.txt33.63 KBheddn
#38 2931739-38.patch44.8 KBheddn
#35 interdiff-2931739-33-35.txt471 bytesrakesh.gectcr
#35 2931739-35.patch24.78 KBrakesh.gectcr
#32 2931739-33.patch24.78 KBrakesh.gectcr
#29 interdiff-2931739-27-29.txt1.71 KBrakesh.gectcr
#29 2931739-29.patch24.75 KBrakesh.gectcr
#27 interdiff-2931739-23-27.txt536 bytesyogeshmpawar
#27 2931739-27.patch24.81 KByogeshmpawar
#23 2931739-23.patch24.8 KBmglaman
#20 2931739-20.patch24.79 KBmglaman
#56 interdiff_50-55.txt12.51 KBheddn
#56 2931739-55.patch49.71 KBheddn
#59 interdiff_55-59.txt1.45 KBheddn
#59 2931739-59.patch50.76 KBheddn
#62 2931739-interdiff-59_62.txt2.84 KBdww
#62 2931739-62.patch50.9 KBdww

Comments

smaz created an issue. See original summary.

Version: 8.5.x-dev » 8.6.x-dev

Drupal 8.5.0-alpha1 will be released the week of January 17, 2018, which means new developments and disruptive changes should now be targeted against the 8.6.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

heddn’s picture

+1 on this idea.

mglaman’s picture

I use this constantly. I've seen it forked into the Contenta distro, and a few of my own projects (I just want the source plugin, not another module.)

It'll also be vital to Commerce integrations as CSV is a common way to import data from an ERP.

Not just +1, but +1,000

goz’s picture

Totally agreed.

CSV format is the first format used by non technical professionnals who need to import data and to generate/edit those datas thanks to software they knew (MS Excel without mention it). It's also the easiest way to export data from many ERP and big solutions with datas.

+1,000 for me too

olafkarsten’s picture

Agreed. +1

eiriksm’s picture

+999

heddn’s picture

Rather than just +1s, let's also explain what use case in Core we need it. Just because something is popular doesn't mean it should go into core. And for migrate, we've tended to keep things in contrib that Core itself doesn't need.

I know I didn't do that myself, but don't follow my bad example ;)

eli-t’s picture

Immediate use case is for the Umami Demo profile now in core in 8.6.x. The imported content is in csvs and we have code to read it and create the relevant entries. Having the migrate_source_csv would allow us to bin the code and use migrate instead.

dercheffe’s picture

IMHO this makes sense very much. Especially if you want to create different content types with different fields (maybe even new modules/successors of legacy modules, but with the same data/content).

I would also vote for these modules in core (they support both CSV also):

https://www.drupal.org/project/views_data_export

and

https://www.drupal.org/project/feeds

heddn’s picture

To get a more practical discussion going, can someone take it upon them self to roll a first patch of converting the existing code base from contrib into something added to migrate.module in core? We'd need the source plugin, the file object and the tests all moved/renamed. We also need to handle BC, so a possible rename of the plugin id could take place. A suggestion is csv_file. Perhaps someone else could come up with a better suggestion.

mglaman’s picture

A suggestion is csv_file. Perhaps someone else could come up with a better suggestion.

I think it should be `file_csv`. The plugin is for a file source which is in the CSV format. This opens up a naming convention which better handles other file formats. For instance, this Allows for file_json or file_xml.

heddn’s picture

I know it is a bit bikeshedding, but JSON and XML tends to not be a "file" as much as a URL data source. While csv is almost always a file. It really should be named 'csv', but that was already used. I wonder if there is any way to hack things so we support BC and still use the same name. We could try it at least.

dawehner’s picture

I think this really make sense. CSV is simply a really common used format and it would make migrate more usable by default for many usecases.
One could say: Oh it is easy to parse CSV, but it turns out it is not. On top of that the module also doesn't load the entire CSV file upfront, but rather reads line by line. I think this makes it a good candidate also for bigger migrations out there.

mglaman’s picture

XML tends to not be a "file" as much as a URL data source

Maybe so for JSON. But many times I have had to deal with systems which FTP files to a server and they are read and processed in either CSV or XML.

mikeryan’s picture

On the last point - the last big project I did involved JSON files (provided as attachments to Jira issues, if you must know).

A larger point, though, is that the retrieval of file/stream-oriented sources would best be separated from parsing of the content retrieved. migrate_plus provides a source plugin which encapsulates separate fetcher and parser plugins (the fetcher further making use of authentication plugins), and uses this to support XML and JSON. CSV sources would also fit this model naturally, and there is an issue to do this (#2867090: Merge migrate_source_csv into migrate_plus as a parser plugin). It would make sense to me, anticipating XML and JSON sources will eventually follow CSV into core, to do this from the beginning.

cosmicdreams’s picture

That makes sense from a classic ETL (Extract Transform Load) perspective. Which is sometimes described as ETTL (Extract Transfer Transform Load).

phenaproxima’s picture

This would help the Out of the Box Initiative. Let's do it! Consider this my maintainer +1 :)

mglaman’s picture

Assigned: Unassigned » mglaman

I'm going to roll a patch just to get things kicked off. If it gets rejected because we want fetchers and parsers per #16 then so be it! But a patch will help kick this along :)

mglaman’s picture

Assigned: mglaman » Unassigned
Status: Active » Needs review
StatusFileSize
new24.79 KB

Here we go!

. We also need to handle BC, so a possible rename of the plugin id could take place. A suggestion is csv_file. Perhaps someone else could come up with a better suggestion.

I didn't do this. I kept as CSV.

  1. We didn't agree on a name, so a more straight forward patch
  2. If someone has `migrate_source_csv` installed it technically will provide plugins after core's `migrate`, therefore it overrides the core `csv` plugin with the contrib version.

So we could rename it. Or keep it the same and allow contrib to override the plugin definition. I have a feeling that will not fly over well, but it is a possibility.

Status: Needs review » Needs work

The last submitted patch, 20: 2931739-20.patch, failed testing. View results
- codesniffer_fixes.patch Interdiff of automated coding standards fixes only.

mglaman’s picture

+++ b/core/modules/migrate/tests/src/Kernel/Plugin/migrate/source/CSVKernelTest.php
@@ -0,0 +1,31 @@
+namespace Drupal\Tests\migrate\Unit\Plugin\migrate\source;

+++ b/core/modules/migrate/tests/src/Unit/CSVUnitBase.php
@@ -0,0 +1,82 @@
+ * @group migrate

Test quirks that didn't fail in contrib but blew up here.

mglaman’s picture

Status: Needs work » Needs review
StatusFileSize
new24.8 KB

Fix on those two issues.

Status: Needs review » Needs work

The last submitted patch, 23: 2931739-23.patch, failed testing. View results
- codesniffer_fixes.patch Interdiff of automated coding standards fixes only.

mglaman’s picture

+++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
--- /dev/null
+++ b/core/modules/migrate/tests/src/Kernel/Plugin/migrate/source/CSVKernelTest.php

+++ b/core/modules/migrate/tests/src/Kernel/Plugin/migrate/source/CSVKernelTest.php
@@ -0,0 +1,31 @@
+class CSVTest extends KernelTestBase {

I missed this. Class name does not match the filename.

yogeshmpawar’s picture

Assigned: Unassigned » yogeshmpawar
yogeshmpawar’s picture

Assigned: yogeshmpawar » Unassigned
Status: Needs work » Needs review
StatusFileSize
new24.81 KB
new536 bytes

Added Class name which matches with the filename & also added an interdiff.

Status: Needs review » Needs work

The last submitted patch, 27: 2931739-27.patch, failed testing. View results
- codesniffer_fixes.patch Interdiff of automated coding standards fixes only.

rakesh.gectcr’s picture

Status: Needs work » Needs review
StatusFileSize
new24.75 KB
new1.71 KB

Changed the Place of test to Drupal\Tests\migrate\Kernel\source, lets see how the test goes

mglaman’s picture

--- b/core/modules/migrate/tests/src/Kernel/Plugin/migrate/source/CSVKernelTest.php
...
+++ b/core/modules/migrate/tests/src/Kernel/source/CSVKernelTest.php
...
+class CSVKernalTest extends KernelTestBase {

CSVKernel.

This will fail.

Status: Needs review » Needs work

The last submitted patch, 29: 2931739-29.patch, failed testing. View results
- codesniffer_fixes.patch Interdiff of automated coding standards fixes only.

rakesh.gectcr’s picture

Status: Needs work » Needs review
StatusFileSize
new24.78 KB

Lets hit again... :)

Status: Needs review » Needs work

The last submitted patch, 32: 2931739-33.patch, failed testing. View results
- codesniffer_fixes.patch Interdiff of automated coding standards fixes only.

mglaman’s picture

+++ b/core/modules/migrate/config/schema/migrate.source.schema.yml
@@ -0,0 +1,31 @@
+    path:

+++ b/core/modules/migrate/tests/src/Kernel/Plugin/source/CSVKernelTest.php
@@ -0,0 +1,31 @@
+class CSVKernalTest extends KernelTestBase {

CSVKernelTest

Not Kernal

rakesh.gectcr’s picture

Status: Needs work » Needs review
StatusFileSize
new24.78 KB
new471 bytes

:) Thanks Matt....

heddn’s picture

I'm not sure how important to block things on #16.

I did some quick design work on what we'd need to do to support the fetch/parser method this morning and it is significantly different than what is being done in migrate_source_csv today.

So, do we need to be able to support fetching the CSV from a non-local endpoint, say an HTTP request? Do need to be able to support multiple CSV URIs? If the answer is no, then we don't need fetchers/parsers. But I feel like we should have those things.

With that in mind, here's some design:

A new URI source plugin.
The parser can define a default fetcher. For CSVs that should be something akin to file_copy since file objects require a local file.
The current CSV class code becomes the parser.
Other possible fetchers are http and file (file doesn't copy, it just uses php file streams).
The parser would mask the presence of the fetcher as much as possible. But would pass along config options to the fetcher.
The fetcher interface in migrate_plus is pretty messy right now and assumes an HTTP resource. I think it should be less restrictive and return a \SplFileObject of some sort. Then if the parser needs to read in a file file into a json or xml parser, use $file->fread($file->getSize())

heddn’s picture

Here's a summary of a conversation in Slack w/ @mglaman and @phenaproxima:

phenaproxima: we should have fetchers but not add any extra overhead or complexity from fetchers.
mglaman: having local files is enough for commerce, we used something outside of drupal to fetch the files local
heddn: leave it to something outside of Drupal to retrieve the files locally. Or have contrib build that feature.

So, I'm going to suggest that having fetchers, while nice if easy... isn't critical. It isn't super easy. So let's implement without them and leave it to contrib or follow-ups in core if we find they are super critical. My $.02: fetchers won't become critical, because source csv in contrib became a top 100 module without it and it wasn't a common support request in the issue queue.

heddn’s picture

StatusFileSize
new44.8 KB
new33.63 KB

So, in the last few weeks a fair number of patches landed upstream that added test coverage and various fixes. Here's an update for all them.

Note, I removed the schema yml, because that was only there for migrate plus integration.

Next steps is to review the doxygen and see if things fail on the test bot. Doxygen, because I think its a little lacking.

Status: Needs review » Needs work

The last submitted patch, 38: 2931739-38.patch, failed testing. View results
- codesniffer_fixes.patch Interdiff of automated coding standards fixes only.

heddn’s picture

Status: Needs work » Needs review
StatusFileSize
new47.89 KB
new10.16 KB

OK, doxygen added. PHPCS stuff fixed and hopefully test failures too?

Except for the 80 char column wrapping on the doxygen. I didn't do that yet while we are discussing the language. I find its really hard to wordsmith when I'm constantly fixing the 80 chars on comments. Let's land on good docs, then do the line wrap once.

heddn’s picture

Status: Needs review » Needs work
StatusFileSize
new47.9 KB
new551 bytes

Fixing a copy/paste.

dww’s picture

I love migrate_source_csv. I don't mind installing it from contrib, but if we want to move it into core for core's benefit, that's fine with me.

@heddn asked for comments/wordsmithing on the phpdoc comment for the CSV source class. Here goes:

+ * - header_row_count: (optional) Signifies the number of rows of data in a file until the header row.

... "Defaults to 0." ? Don't we usually say what happens when you don't define something?

Also, "until the header row" is wrong. Maybe this is clearer:

" - header_row_count: (optional) The number of rows of header information in the CSV file before the first data row."

+ * - keys: (optional) 

Uhh, is that really optional? I remember migrations blowing up spectacularly if you don't define that correctly.

+ * - column_names: (optional) Numeric key 0-based index of the columns in source CSV file.

Ditto about "optional". If you don't define this, you have no way to access any of the source values, right? Seems this is required.
Sorry, I didn't realize defining a header_row gives you this automatically! Learn something new every day. We should probably say so right here, not just if you closely read the examples.

Furthermore, the intended input for it is something like this:

  column_names:
    -
      id: Identifier
    -
      first_name: 'First Name'
    -
      last_name: 'Last Name'
    -
      email: 'Email Address'
    -
      ip_address: 'IP Address'
    -
      date_of_birth: 'Date of Birth'

So it's not just a "Numeric key 0-based index of the columns". It's an array with numeric keys starting at 0 where the values are column machine name => column human-readable name associations. Not sure the most concise/elegant way to write that, but I it seems crucial to somehow explain it's expecting a nested array. We see it in the examples a little further down, but we should be clear up here, too.

Furthermore, Do the human readable names *ever* get used anywhere? I haven't see it, but maybe I haven't triggered the appropriate error condition where those would be visible.

Anyway, how about this?

" - column_names: (optional) Index of the columns in the source CSV file. If the file defines a header row, the values in the header row are used as the machine name of each column. If defined explicitly in the migration configuration, each entry in this array must have a numeric key, starting at 0, and the value should be a mapping of machine name to human-readable name for that column.

Or something. :/ Sorry. This is a bit confusing to document clearly.

+ * - delimiter: (optional) The field delimiter (one character only).

Defaults to a comma ','.

Side note: Duh! ;)

Okay, you get the idea. Ditto all the other optional values...

+ * Example:
+ *
+ * @code
+ * source:
+ *   plugin: csv
+ *   path: /tmp/my_file.csv
+ *   header_row_count: 1
+ *   keys:
+ *     - id
+ *
+ * # my_file.csv
+ * id,country
+ * 1,United States
+ * 2,Nicaragua
+ * 3,Spain
+ * @endcode
+ *
+ * This example configures the source plugin with minimal options.
+ *
+ * Description of options:
+ *   path: The file path to the CSV file.
+ *   header_row_count: This is the number of lines into the CSV to find the row that is the "header". If a header row is configured, then setting column_names is not required as the header names become the property names.
+ *   keys: The source property(ies) that uniquely identify the source. It defaults to a string/varchar database column.

Took me a second to realize all the text at the bottom was related to this example, not the next one.

Also: "It defaults to a string/varchar database column." huh?
Oh wow, I've never seen this, either: "Here we define the database schema to be a huge text column." We should fix the docs for "keys" above to talk about this schema stuff, not buried in these examples. I didn't realize that plumbing existed.

Maybe something like this:

+ * Simple example with minimal options:
+ *
+ * @code
+ * source:
+ *   plugin: csv
+ *   path: /tmp/my_file.csv
+ *   header_row_count: 1
+ *   keys:
+ *     - id
+ *
+ * # my_file.csv
+ * id,country
+ * 1,United States
+ * 2,Nicaragua
+ * 3,Spain
+ * @endcode
+ *
+ * Description of options:
+ *   path: The file path to the CSV file.
+ *   header_row_count: Set to 1 to indicate a header row in the CSV file that names the columns. Since there's a header row, setting column_names is not required as the header names become the property names.
+ *   keys: The source property that uniquely identifies each row.

Here's a proposal for the next example block:

+ * Advanced example with most options configured:
+ *
+ * @code
+ * source:
+ *   plugin: csv
+ *   path: /tmp/my_file.csv
+ *   keys:
+ *     id:
+ *       type: text
+ *       size: big
+ *   column_names:
+ *     -
+ *       id: Identifier
+ *     -
+ *       country: 'Country'
+ *   file_class: 'Drupal\example\MyCSVFileObject'
+ *   delimiter: '|'
+ *   enclosure: '%'
+ *   escape: '`'
+ *   file_flags: 14
+ *
+ * # my_file.csv
+ * %one%|%United States%
+ * %two%|%Nicaragua%
+ * %three%|%Spain%
+ * @endcode
+ *
+ * Description of options:
+ *   keys: The source property(ies) that uniquely identify the source. Here we define the database schema to be a huge text column.
+ *   column_names: These become the names of the source properties. It is needed because the CSV file has no header information and we therefore could not define header_row_count.
+ *   file_class: The file object is overridable.
+ *   delimiter, enclosure, escape  are all standard arguments to the SplFileObject to override default CSV reading functionality. @see http://php.net/manual/en/splfileobject.setcsvcontrol.php.
+ *   file_flags is a bitmask. In this example it does not break each row at a new line because it does not contain \SplFileObject::DROP_NEW_LINE. @see http://php.net/manual/en/splfileobject.setflags.php
+ *

I removed the bogus header row from the example my_file.csv here, since that would break the example migration, and it's just confusing.

Also, it seems valuable to have the advanced example show something with multiple columns as keys, since that'd be a more realistic example.

Meta: I should have been editing all this in emacs and posting a new patch, not trying to write it here. :/

Okay, I'm going to stop here for now and check back in with @heddn if this is actually helpful...

dww’s picture

StatusFileSize
new48.17 KB
new4.27 KB

Sorry, here's #42 as a patch and an interdiff. Much better.

jofitz’s picture

Status: Needs work » Needs review

Setting to Needs Review to run the testbot.

Status: Needs review » Needs work

The last submitted patch, 43: 2931739-43.migrate_source_csv.patch, failed testing. View results

jofitz’s picture

Status: Needs work » Needs review
StatusFileSize
new48.13 KB
new3.22 KB

Corrected coding standards errors.
Corrected typo that was causing CSVSourceYieldTest to fail.
Unable to replicate the other 2 test failures - hoping the other changes might work some magic.

heddn’s picture

StatusFileSize
new48.41 KB
new5.49 KB

Improvements on docs.

heddn’s picture

StatusFileSize
new5.85 KB
new48.75 KB

And a few more improvements. We're close enough, I formatted them to 80 chars.

The last submitted patch, 46: 2931739-46.patch, failed testing. View results

dww’s picture

StatusFileSize
new49.1 KB
new5.2 KB

Even more better PHPDocs. ;)

Note: I added a @todo about file_class:

*   @todo We should have an interface that this class must implement.

Not sure if we should do that right now as part of this patch, or handle it in a follow-up. But it seems that if we let people declare their own class, we should enforce an interface their class must implement if it's going to work for this.

The last submitted patch, 48: 2931739-48.patch, failed testing. View results

The last submitted patch, 47: 2931739-47.patch, failed testing. View results

The last submitted patch, 50: 2931739-49.migrate_source_csv.patch, failed testing. View results

heddn’s picture

  1. +++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
    @@ -16,27 +16,31 @@
    + *   the machine names of each column. If defined explicitly in the migration
    

    Any values explicitly defined in column_names will override...

  2. +++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
    @@ -16,27 +16,31 @@
    + *   alternative CSV file reader. This class should extend \SplFile.
    

    For now, it really should extend CSVFileObject.

  3. +++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
    @@ -16,27 +16,31 @@
    + * - delimiter: (optional) The field delimiter (one character only). Defaults
    + *   to a comma (,).
    

    Why this change?

dww’s picture

1. Sure.

2. Sounds good (for now). Thoughts on if we want to provide (and enforce) an interface as part of this patch, or move that to a followup?

3. Because of 80 character wrapping.

Cheers,
-Derek

heddn’s picture

StatusFileSize
new12.51 KB
new49.71 KB

Hopefully this fixes up the last few tests. More docs tweaks.

Status: Needs review » Needs work

The last submitted patch, 56: 2931739-55.patch, failed testing. View results

cosmicdreams’s picture

OK, to do a code standards pass? Couple things I'm seeing.

heddn’s picture

Status: Needs work » Needs review
StatusFileSize
new1.45 KB
new50.76 KB

Oops, let's add the interface too. This should be ready for any types of review.

dww’s picture

Status: Needs review » Needs work

Great, thanks for adding the interface!

Now we need:

a) CSVFileObject should say it implements it.

b) The docs for file_class in CSV.php should reference it.

c) [optional] Something (?) should enforce that whatever you put in file_class implements it. There was talk of the CSV constructor throwing an exception.

dww’s picture

Assigned: Unassigned » dww

I'm going to work on my list from #60, and maybe fix the test failures, too. Stay tuned.

dww’s picture

Assigned: dww » Unassigned
Status: Needs work » Needs review
StatusFileSize
new50.9 KB
new2.84 KB

Whoops, somehow I missed that A and C were already done when I skimmed the interdiff. Sorry about that. ;) Edit: Oh, because those changes were not included in the interdiff.

So here's a patch for B.

I briefly looked at the tests, and it's not immediately obvious why they're failing. :/ My local setup isn't currently running tests, so I'll leave that for someone else to sort out.

@heddn: Can you check your editor wrapping settings? I'm working in an emacs window exactly 80 chars wide, and your stuff keeps extending off the screen. That's why the interdiff is larger than it needs to be, since I keep re-fixing the 80 char wrapping. :/ Please either fix your editor's wrapping, or turn it off so it preserves what I'm uploading.

Thanks,
-Derek

dww’s picture

p.s. @cosmicdreams: Sorry I missed your comment. I'm done with this for now. I'll assign to myself if I'm working on it again. Meanwhile, feel free to assign this to yourself if you want to work on something. Thanks.

The last submitted patch, 62: 2931739-62.patch, failed testing. View results
- codesniffer_fixes.patch Interdiff of automated coding standards fixes only.

dww’s picture

StatusFileSize
new50.89 KB
new1.12 KB

Ugh, sorry for the noise. In Slack @heddn convinced me my setup was wonky. I double-checked, and lo, my fill-column was actually set to 78.

Here's the fixes from #62, wrapped @ 80, not 78.

Still not sure WTF is causing those tests to fail. I haven't looked. No sense having the bot re-test this.

heddn’s picture

StatusFileSize
new50.93 KB
new3.01 KB

Thank goodness for tests. This should fix the final failures. Any chance we an upgrade our doctrine annotation parsing any time soon to get rid of this ugly work around?

quietone’s picture

  1. +++ b/core/modules/migrate/src/CSVFileObject.php
    @@ -0,0 +1,103 @@
    + * - assume CSV format
    + * - skip header rows on rewind()
    + * - address columns by header row name instead of index
    

    Pretty sure these are sentences, and need capitalization and a full stop.

  2. +++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
    @@ -0,0 +1,321 @@
    + *   the file defines a header row, the values in the header row are used as
    

    s/defines/includes/ ?

  3. +++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
    @@ -0,0 +1,321 @@
    + * Simple example with minimal options:
    

    No need for Simple and Advanced in the next example. Just 'Example with minimal options' says it all.

  4. +++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
    @@ -0,0 +1,321 @@
    + *   - path: The file path to the CSV file.
    

    The description of path is verbatim from above and not adding value, in my opinion. I'd prefer a paragraph format explaining what is happening here.

  5. +++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
    @@ -0,0 +1,321 @@
    + * Advanced example with most options configured:
    

    Advanced can be removed.

  6. +++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
    @@ -0,0 +1,321 @@
    + * Description of options:
    

    As before, an explanation versus description would be better, I think.

rakesh.gectcr’s picture

Status: Needs review » Needs work
heddn’s picture

Assigned: Unassigned » heddn
heddn’s picture

Assigned: heddn » Unassigned
Status: Needs work » Needs review
StatusFileSize
new50.89 KB
new4.03 KB
jofitz’s picture

Status: Needs review » Needs work

The patch in #70 does not address #67.2 (this may or may not be intentional).

dww’s picture

Status: Needs work » Needs review

67.2 was a ? comment review concern, and I don't think it helps improve the meaning of the code comment. A file has to define values in its header row for it to matter and be useful. It's not like you simply include "#header row" in your file, you have to define what it is by naming all your columns. I dunno, we're seriously nitpicking and bike-shedding now. ;)

Generally, I think the paragraph version makes these comments a bit less useful, since it forces people (with short attention spans) to read an entire block of text if all they want to know is why a specific key is defined. I agree we can skip comments where there's nothing added over the original doc comments, but I'd personally rather see these still in the style of #66, not #70.

That said, I don't want to post another patch if it's going to be a pushing match. ;) If everyone else thinks a paragraph is better, I'll let it go.

Cheers,
-Derek

[edited for clarity]

quietone’s picture

Status: Needs review » Needs work

There was quite a bit of effort in getting the source, process and destinations plugins in core commented with examples and explanations and to be in a consistent style. Since this is going into core, it too should use the same style. Setting to NW because of that.

Personally, I do prefer lists and I often think in lists. And yet, I do find that having the plain language explanation can be very helpful at times to clarify a point that for some reason I didn't get in the details of the plugin options.

dww’s picture

Status: Needs work » Needs review

@quietone: Please look at the patch and/or interdiff. It's already in the style you asked for.

The only reason this was "needs work" was a (IMHO unhelpful) change for s/defines/includes/. This needs review, not work.

quietone’s picture

@dww, sorry my last comment does seem out of sequence. Anyway, my concerns from #67 are addressed, including 67.2. So that is done.

What is next here? I don't have time to do a full review but I suggest that the discussion of using a fetcher be put in the IS because someone is going to ask about it.

mglaman’s picture

I suggest that the discussion of using a fetcher be put in the IS because someone is going to ask about it.

When I read Fetcher I think of a remote source. Which seems to blow up the scope of this plugin source. The current source plugins do not use a fetcher, correct? Or would the fetcher have a "Local" option which just lets sourcing the file from the local disk? A lot of the comments here have seemed to echo the thought: Keep it simple so we can source local CSVs via core alone as quickly as possible.

dww’s picture

Assigned: Unassigned » dww
Issue tags: +Needs issue summary update

I'm working on a major IS update. Stay tuned.

dww’s picture

Assigned: dww » Unassigned
Issue summary: View changes
Status: Needs review » Needs work

First pass. Summarizing the fetcher/parser debate is still todo.
Anyone else want to take a turn with that?

heddn’s picture

Issue summary: View changes
Status: Needs work » Needs review

IS updated to discuss fetchers and follow-up (#2962091: Adopt fetchers/parsers logic for use by source csv plugin) opened to continue that discussion.

Back to NR.

dww’s picture

Issue summary: View changes
Issue tags: -Needs issue summary update

Cool, thanks for opening the follow-up. I did a few other minor edits to the summary, and added this as a remaining task:

Decide if we're okay with a straight port (done), or if we need to hash out all the fetcher/parser stuff first, figure out the design, class names, avoid conflicts, possible BC hell, etc.

Generally, I'm fine with getting this in sooner and leaving #2962091: Adopt fetchers/parsers logic for use by source csv plugin postponed for now. However, if we foresee doing the whole fetcher/parser system, "csv" makes a lot more sense as a parser plugin name than it does as a source plugin. Whenever the dust settles on #2962091, are we going to be stuck with some BC hell while we try to rename classes? Or is the vision that the parser plugin namespace will be distinct, and it won't matter, and folks who have "source: plugin: csv" in their migration will continue to be supported somehow? Yes, I understand plans can change once you actually start working on something, but I'd love clarity on why committing this patch, right now, as-is, isn't going to make our lives harder in the future.

Echoing @mglaman in comment #12, perhaps the plugin_id for this should be called "file_csv" for now, instead of parking on "csv" itself?

Regardless, removing 'Needs issue summary update' tag.

heddn’s picture

re #80: What @phenaproxima and I had discussed was something that wouldn't be BC hell. By sticking with the same source plugin name; and a straight port, folks would just need to disable contrib source csv module. Because the plugin in core has the same one-for-one features, even the same name.

Renaming and would cause more issues.

dww’s picture

Sorry, you misunderstand my concern in #80. I already summarized #81 in the edit to the summary I did in #78:

API Changes

...

Existing users of this plugin from contrib are unharmed. The name is the same, but we haven't changed anything about how it works. So, contrib users simply disable migrate_source_csv module before upgrading to core 8.X.Y and none of their migration files have to be changed to continue working.

What I'm talking about in #80 is that N months from now, once this is in core, people like me that are relying on CSV migrations will have happily seen our contrib-driven CSV migrations working with CSV in core without modification. However, at some point, #2962091: Adopt fetchers/parsers logic for use by source csv plugin is going to introduce a slew of new classes, plugin ids, etc. Someone is going to say "The 'csv' id is all wrong for a source plugin! That's really a parser. Elegance++" People like me are going to say: "screw you, my migrations are already working, please don't re-re-break them!" Fight ensues. ;)

That's the situation I'm trying to mitigate by treading a bit more carefully now...

Maybe we can publicly agree, right here, that we're not going to do that in #2962091, commit "ourselves" to working around the issue that 'csv' is already taken as a source plugin id, and whatever happens in there is going to jump through a few more hoops to keep it working. That's all I'm saying.

Are we cool with that? If so, I'm going to copy this text into #2962091 to make it clear whatever refactoring happens there has to keep this working.

If we're not cool with that, please say so now.

Thanks,
-Derek

heddn’s picture

Migrate module is stable. Therefore, once this lands it would have to abide by BC policy according to https://www.drupal.org/core/d8-bc-policy. So, when the rubber meets the road and we want fetchers, can we do it in a BC manner and keep everyone happy? I'm pretty sure we can.

The fetcher would just need to know what parser to use. And this existing CSV parser would just start /also/ being a parser. The fetcher would retrieve the file and put it in the location that the csv plugin asks for it to be. The code that makes all this magic happen might be ugly, but csv and users of these new fetchers wouldn't need to see the ugliness.

dww’s picture

Issue summary: View changes

Great. You're now on record saying so, both here and in #2962091: Adopt fetchers/parsers logic for use by source csv plugin. ;) Updated both summaries accordingly.

No more remaining tasks, other than probably a thorough review and RTBC from someone who hasn't yet put their hands all over this code...

Thanks!
-Derek

heddn’s picture

Priority: Normal » Major

Seeing this is on the roadmap for 8.6 for both migrate and OOB, let's bump to a major.

quietone’s picture

Thanks for the updates to the IS, very thorough. I haven't done a full review, mostly because I can't RTBC this anyway since I worked on it in migrate_source_csv.

phenaproxima’s picture

Status: Needs review » Needs work

It's getting there!

  1. +++ b/core/modules/migrate/src/CSVFileInterface.php
    @@ -0,0 +1,42 @@
    +/**
    + * Defines a CSV file interface.
    + */
    

    Can we expand this doc comment to explain what constitutes a "CSV file", as defined by this interface? (i.e., it has header row(s) and named columns) If not now, then we need a follow-up to document this better.

  2. +++ b/core/modules/migrate/src/CSVFileInterface.php
    @@ -0,0 +1,42 @@
    +   * Number of header rows.
    

    Should be "Returns the number of header rows".

  3. +++ b/core/modules/migrate/src/CSVFileInterface.php
    @@ -0,0 +1,42 @@
    +   *   Get the number of header rows, zero if no header row.
    

    "...zero if there are no header rows."

  4. +++ b/core/modules/migrate/src/CSVFileInterface.php
    @@ -0,0 +1,42 @@
    +  /**
    +   * Number of header rows.
    +   *
    +   * @param int $header_row_count
    +   *   Set the number of header rows, zero if no header row.
    +   */
    +  public function setHeaderRowCount($header_row_count);
    

    Let's rephrase these similarly.

  5. +++ b/core/modules/migrate/src/CSVFileInterface.php
    @@ -0,0 +1,42 @@
    +  /**
    +   * CSV column names.
    +   *
    +   * @return array
    +   *   Get CSV column names.
    +   */
    +  public function getColumnNames();
    +
    +  /**
    +   * CSV column names.
    +   *
    +   * @param array $column_names
    +   *   Set CSV column names.
    +   */
    +  public function setColumnNames(array $column_names);
    

    The doc comments here could also use some fine-tuning and better type hints (string[] rather than array).

  6. +++ b/core/modules/migrate/src/CSVFileObject.php
    @@ -0,0 +1,102 @@
    +use Drupal\migrate\CSVFileInterface;
    

    I don't think we need this use statement, since CSVFileObject and CSVFileInterface are in the same namespace. Also...should these not be in the Drupal\migrate\Plugin\migrate\source namespace? They are source-specific, after all.

  7. +++ b/core/modules/migrate/src/CSVFileObject.php
    @@ -0,0 +1,102 @@
    +/**
    + * Defines a CSV file object.
    

    This doxygen isn't very helpful. Can we expand what a "CSV file object" is? Maybe "a wrapper around CSV data in a file"?

  8. +++ b/core/modules/migrate/src/CSVFileObject.php
    @@ -0,0 +1,102 @@
    + * - Address columns by header row name instead of index.
    

    numeric index

  9. +++ b/core/modules/migrate/src/CSVFileObject.php
    @@ -0,0 +1,102 @@
    +  /**
    +   * The number of rows in the CSV file before the data starts.
    +   *
    +   * @var integer
    +   */
    +  protected $headerRowCount = 0;
    

    I wonder if 1 might not be a more sane default here. I imagine that most CSV files have a header row (correct me if I'm wrong), so it's a reasonably safe default assumption.

  10. +++ b/core/modules/migrate/src/CSVFileObject.php
    @@ -0,0 +1,102 @@
    +  /**
    +   * The human-readable column headers, keyed by column index in the CSV.
    +   *
    +   * @var array
    +   */
    

    The type hint should be string[].

  11. +++ b/core/modules/migrate/src/CSVFileObject.php
    @@ -0,0 +1,102 @@
    +    // Necessary to use this approach because SplFileObject doesn't like NULL
    +    // arguments passed to it.
    +    call_user_func_array(['parent', '__construct'], func_get_args());
    

    How will this prevent NULL arguments from going to SplFileObject?

  12. +++ b/core/modules/migrate/src/CSVFileObject.php
    @@ -0,0 +1,102 @@
    +      // Set meaningful keys for the columns mentioned in $this->csvColumns.
    

    $this->csvColumns is not a thing.

  13. +++ b/core/modules/migrate/src/CSVFileObject.php
    @@ -0,0 +1,102 @@
    +  public function setHeaderRowCount($header_row_count) {
    

    Man, I wish we could use scalar type hints.

  14. +++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
    @@ -0,0 +1,318 @@
    + * If the CSV file contains non-ASCII characters, make sure it includes a UTF
    + * BOM (Byte Order Marker) so they are interpreted correctly.
    

    Could we cite a link here that explains what a BOM is, and how one might be sure that it is set correctly?

  15. +++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
    @@ -0,0 +1,318 @@
    + * - path: Path to the source CSV file available from local storage.
    

    Is this an absolute path, or a relative one?

  16. +++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
    @@ -0,0 +1,318 @@
    + * - keys: Array of column name(s) which represent the columns uniquely
    + *   identifying each record. If declared as a nested array, the keys must be
    

    The first sentence is kinda obtuse. Can we just say: "An array of column names in the CSV file. Can be an array of strings, or an array of arrays. If declared as an array of arrays..."

  17. +++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
    @@ -0,0 +1,318 @@
    + * - column_names: (optional) Names of the columns in the source CSV file. If
    + *   the file defines a header row, the values in the header row are used as
    + *   the machine names of each column. Any values explicitly defined in
    + *   column_names will override the values from the header row. Each entry in
    + *   the array must have a numeric key. Numbering starts at 0. Each value should
    + *   be a mapping of machine name to human-readable name.
    

    I think we need to rephrase this, because it's over-explaining the functionality in a very confusing way. If I'm understanding it correctly, column_names is a simple key-value map where the values are the column name in header row, and the keys are the corresponding source row property names. A quick example (no need for code) would also help.

  18. +++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
    @@ -0,0 +1,318 @@
    + * - file_class: (optional) Full class name that includes namespace for an
    

    Can we rephrase? Maybe something like: "Fully qualified (including the full namespace) name of a class that can read the CSV file."

  19. +++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
    @@ -0,0 +1,318 @@
    + * - file_flags: (optional) Bitmask flags for the SplFileObject. Defaults to
    + *   bitmask value of 15.
    

    And what does a bitmask value of 15 mean? We should probably link to external documentation for \SplFileObject to explain further.

  20. +++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
    @@ -0,0 +1,318 @@
    + * 1,United States
    

    USA #1? =P

  21. +++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
    @@ -0,0 +1,318 @@
    + * The path should be a full path to a file in local storage. Remote file
    

    "in local storage" is weird. How about just "a local file"? Also, let's move this to the explanation of the 'path' configuration key.

  22. +++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
    @@ -0,0 +1,318 @@
    + * streams are not supported. The header_row_count indicates the row in the CSV
    + * file that names the columns. Since there is a header_row_count, setting
    + * column_names is redundant and not done. The header defines the source
    + * property names. The keys configuration designates the source property that
    + * uniquely identifies each row.
    

    I think these sentences are both confusing and redundant -- we pretty much explained all of this already.

  23. +++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
    @@ -0,0 +1,318 @@
    + *   column_names:
    + *     -
    + *       id: Identifier
    + *     -
    + *       country: Country
    

    Why is this a nested array?

  24. +++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
    @@ -0,0 +1,318 @@
    + * In this example of keys we override the default database schema to be a large
    + * text column. The column_names define the names of the source properties. It
    

    "Example of keys" is weird. How about: "In this example, we override the default database schema so that the 'id' column will be a large text column."

  25. +++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
    @@ -0,0 +1,318 @@
    +  /**
    +   * List of available source fields.
    +   *
    +   * Keys are the field machine names as used in field mappings, values are
    +   * descriptions.
    +   *
    +   * @var array
    +   */
    +  protected $fields = [];
    

    Are we using this anywhere?

  26. +++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
    @@ -0,0 +1,318 @@
    +  /**
    +   * List of key fields, as indexes.
    +   *
    +   * @var array
    +   */
    +  protected $keys = [];
    

    Or this?

  27. +++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
    @@ -0,0 +1,318 @@
    +    if (empty($this->getConfiguration()['path'])) {
    +      throw new MigrateException('You must declare the "path" to the source CSV file in your source settings.');
    +    }
    +
    +    // Key field(s) are required.
    +    if (empty($this->getConfiguration()['keys'])) {
    +      throw new MigrateException('You must declare "keys" as a unique array of fields in your source settings.');
    +    }
    +
    +    $this->fileClass = $this->getConfiguration()['file_class'];
    

    Let's just call getConfiguration() once, after setConfiguration(), and keep re-using its result.

  28. +++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
    @@ -0,0 +1,318 @@
    +    if (!file_exists($this->getConfiguration()['path'])) {
    +      throw new InvalidPluginDefinitionException($this->getPluginId(), sprintf('File path (%s) does not exist.', $this->getConfiguration()['path']));
    +    }
    +    // File handler using header-rows-respecting extension of SPLFileObject.
    +    $this->file = new $this->fileClass($this->getConfiguration()['path']);
    

    Let's get the path once so we don't have to keep calling getConfiguration(). It'll be easier to read.

  29. +++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
    @@ -0,0 +1,318 @@
    +    $delimiter = $this->getConfiguration()['delimiter'];
    +    $enclosure = $this->getConfiguration()['enclosure'];
    +    $escape = $this->getConfiguration()['escape'];
    

    Same here. Calling getConfiguration() repeatedly makes the code harder to read.

  30. +++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
    @@ -0,0 +1,318 @@
    +  public function getIDs() {
    

    This needs to be getIds(), not getIDs().

dww’s picture

@phenaproxima: Thanks for the excellent and thorough review! I don't have time right now to re-roll this, but a few quick replies for whomever does:

9. I wonder if 1 might not be a more sane default here. I imagine that most CSV files have a header row (correct me if I'm wrong), so it's a reasonably safe default assumption.

Please no to this change:
a) I use/see CSV files all the time without a header row.
b) It defaults to 0 in contrib.
c) If we silently change it and get this wrong for existing migrations, we introduce data loss.

15. Is this an absolute path, or a relative one?

Yes. :( Not sure if this is the right spot, but I wanted to know and see people constantly asking about how to make this a relative path to the directory of the module providing a given CSV migration. From what I can gather, here are all the ways people can/do use this:
a) URI like public://import/csv/example.csv and put the CSV file in your public files dir.
b) Absolute path.
c) Relative path from the site root.
d) https://www.drupal.org/project/system_stream_wrapper or #1308152: Add stream wrappers to access .json files in extensions
e) Implement hook_migration_plugins_alter() along the lines of:

function migrate_source_csv_test_migration_plugins_alter(&$definitions) {
  $definitions['migrate_csv']['source']['path'] = drupal_get_path('module', 'migrate_source_csv_test') . $definitions['migrate_csv']['source']['path'];
}
16. The first sentence is kinda obtuse. Can we just say: "An array of column names in the CSV file. Can be an array of strings, or an array of arrays. If declared as an array of arrays..."

No, because that's not what this knob is doing. It's specifying which of your columns define the unique key for the row. Maybe a single column name, or an array of column names that together define the unique key.

17. I think we need to rephrase this, because it's over-explaining the functionality in a very confusing way. If I'm understanding it correctly, column_names is a simple key-value map where the values are the column name in header row, and the keys are the corresponding source row property names. A quick example (no need for code) would also help.

Yeah, I don't love this comment or how this knob works. But no, you're not exactly understanding how it works (further proof it all needs work). The keys of this knob are always a numeric index to specify which column in the CSV you're talking about. The values are either a string (the property name to refer to column #X with in the rest of your migration) or a nested array, where the key is the property name and the value is the (never used anywhere?) "human-readable names" of each column.

This knob is required if you don't have a header row, since it's the only way to map column numbers to property names.

If you have a header row, you probably don't want to define this, but if you do, whatever you put in here will overwrite whatever was declared in the header row.

Unless the "human-readable" name thingy is A Thing(tm) somewhere I'm not aware of, I'd love to kill that aspect of the code and enforce that this knob always has the same structure. Flat array of integer keys and string property name values.

#20 Yes, let's definitely fix that. It's alphabetically (and morally) last, so it should go to the end of the file. ;)

#23: Because of the confusing and fubar nature of #17.

#25: Not that I know of.

#26: Definitely. It's how we know the migration row key.

Otherwise, yes to everything you said.

Thanks again!
-Derek

phenaproxima’s picture

Hi @dww, thanks for the equally clear and thorough response! I really like the way you explained some of the configuration options, so for whoever works on this patch next...use the comment in #88 to rework some of the documentation :)

dww’s picture

Re: #89: Thanks. :)

Re: #87.17 + #88.17

If we change the structure of keys to always be a flat array, what happens to existing migrations that are (pointlessly) defining human-readable descriptions? I'd rather this source plugin didn't barf on them with exceptions, forcing them to rewrite any of their working migrations. So maybe we have to leave support for the complex nested thing, but somehow flag it as deprecated? If I knew it was going to be caught and nicely handled by default, maybe there's some kind of StopUsingDeprecatedMigrationSourceSettingsException we could throw? ;) Or something?

Short of any of that, I'd be happy for the public-facing docs to pretend this thing must be a flat array, even if the code actually supports both formats.

mikeryan’s picture

Unless the "human-readable" name thingy is A Thing(tm) somewhere I'm not aware of, I'd love to kill that aspect of the code and enforce that this knob always has the same structure.

Yes, it is - the descriptions are returned as the values of the source plugin's fields() method (machine names are the keys) and are displayed in migrate_tools by the drush migrate-fields-source command as well as displayed in the UI, allowing you to document fields with something a little more helpful than "id".

jofitz’s picture

Status: Needs work » Needs review
StatusFileSize
new50.73 KB
new10.24 KB

Addressed all of @phenaproxima's comment's in #87 except for:
9. Based upon @dww's response in #88.
11.

How will this prevent NULL arguments from going to SplFileObject?

16. Based upon @dww's response in #88.
17. Based upon @dww's response in #88.
23. Based upon @dww's response in #88.

Status: Needs review » Needs work

The last submitted patch, 92: 2931739-92.patch, failed testing. View results

dww’s picture

Re: #87.25: As @mikeryan said in #91, that's A Thing(tm). Behold:

interface MigrateSourceInterface extends \Countable, \Iterator, PluginInspectionInterface {

  /**
   * Returns available fields on the source.
   *
   * @return array
   *   Available fields in the source, keys are the field machine names as used
   *   in field mappings, values are descriptions.
   */
  public function fields();

So yeah, #92's patch and interdiff is incorrectly removing that and we need to keep it.

Test failures look like disagreement over the location of the CSVFileObject in the namespace. I agree it could live in \Drupal\migrate\Plugin\migrate\source instead of directly in \Drupal\migrate, but we have to get all parts of the code and tests to agree on that. ;)

Cheers,
-Derek

jofitz’s picture

Status: Needs work » Needs review
StatusFileSize
new50.73 KB

Fixed the test failures.

heddn’s picture

StatusFileSize
new3.34 KB
new50.44 KB

All feedback address now from #87. Between the awesome work from dww and Jo Fitzgerald. Along with a small nit I have with keeping consistent variable naming. This is ready again for another round of reviews.

For #87.11, this was done when originally we were setting the flags in the file constructor. That was removed and the constructor just left. I've cleaned that up in this latest patch.

heddn’s picture

Just noticed this:

+++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
@@ -0,0 +1,305 @@
+ * 1,Nicaragua
+ * 2,Spain
+ * 3,United States
...
+ * %really long string that makes this unique%|%United States%
+ * %even longer really long string that makes this unique%|%Nicaragua%
+ * %even more longer really long string that makes this unique%|%Spain%

Nit: let's order these both by country alpha. Leaving at NR for other feedback.

smaz’s picture

diff --git a/core/modules/migrate/tests/modules/migrate_source_csv_test/artifacts/people.csv b/core/modules/migrate/tests/modules/migrate_source_csv_test/artifacts/people.csv
new file mode 100644

Do we need a .htaccess here to prevent direct web access to any CSV files used for testing? We have one in the Umami profile for the default content:
https://cgit.drupalcode.org/drupal/tree/core/profiles/demo_umami/modules...

mglaman’s picture

RE: #98. What if the server is nginx? It won't make a difference at that point. I don't see the benefit, unless there's some history in the thread which added it for Umami.

phenaproxima’s picture

Preventing access to the CSV file(s) is not in scope for this issue. It's worth looking into and talking about, at least, but let's open a follow-up issue for that if it's needed.

eli-t’s picture

quietone’s picture

Added a follow up #2978827: Add a .htaccess file to prevent downloading of csv test files to decide on adding a .htaccess file.

phenaproxima’s picture

Status: Needs review » Needs work

Looks good and pretty logical, but I found a number of relatively minor things. Mostly documentation stuff.

  1. +++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
    @@ -0,0 +1,305 @@
    + *   the CSV file before the first data row. If set, the values in this row are
    + *   used as the source property names for each column in the file. If there are
    

    Shouldn't this be "...source property names for each data row in the file"?

  2. +++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
    @@ -0,0 +1,305 @@
    + *   multiple header rows, only the final row before the data is used for column
    

    Can we rephrase this: "If there are multiple header rows, the column names are taken from the final one."

  3. +++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
    @@ -0,0 +1,305 @@
    + *   the file defines a header row, the values in the header row are used as
    + *   the machine names of each column. Any values explicitly defined in
    

    There has been no previous mention of machine names. We need to explain what a machine name is in this context, and how it's used.

  4. +++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
    @@ -0,0 +1,305 @@
    + *   namespace) that can read the CSV file. The class must extend \SplFileObject
    + *   and implement \Drupal\migrate\CSVFileInterface.
    

    I wonder -- given the way this works, do we actually need CsvFileInterface? Why not just create a new class which extends \SplFileObject with the appropriate methods, and enforce that file_class is an instance/descendant of that?

  5. +++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
    @@ -0,0 +1,305 @@
    + * - file_flags: (optional) Bitmask flags for the SplFileObject. Defaults to
    + *   bitmask value of 15.
    

    '15' tells me nothing. Are there constants we could use here? Like

    FLAG_ONE | FLAG_TWO<code>?
    </li>
    
    <li>
    <code>
    +++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
    @@ -0,0 +1,305 @@
    + *   keys:
    + *     - id
    

    Let's add a comment explaining that this will create a single-column text key, using the value from the 'id' column of the CSV file.

  6. +++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
    @@ -0,0 +1,305 @@
    + *   enclosure: '%'
    

    Using % as an enclosure is really jarring, and I can't imagine it'd be very common. Let's just use a single quote instead.

  7. +++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
    @@ -0,0 +1,305 @@
    + *   escape: '`'
    

    Can we add a usage of this character to the example data, so as to illustrate what it's for?

  8. +++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
    @@ -0,0 +1,305 @@
    + *   file_flags: 14
    

    We need a comment explaining what this flag will do.

  9. +++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
    @@ -0,0 +1,305 @@
    + * reading functionality. Configuration option file_flags is a bitmask. In this
    

    We already explained that file_flags is a bitmask, so we can remove this sentence.

  10. +++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
    @@ -0,0 +1,305 @@
    +    if (empty($plugin_configuration['path'])) {
    +      throw new MigrateException('You must declare the "path" to the source CSV file in your source settings.');
    +    }
    

    Why not use file_exists() here?

  11. +++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
    @@ -0,0 +1,305 @@
    +    if (empty($plugin_configuration['keys'])) {
    +      throw new MigrateException('You must declare "keys" as a unique array of fields in your source settings.');
    +    }
    

    This should check if $plugin_configuration['keys'] is an empty array.

  12. +++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
    @@ -0,0 +1,305 @@
    +    }
    +
    +  }
    

    Nit: There's an extra blank line here.

  13. +++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
    @@ -0,0 +1,305 @@
    +    $path = $this->getConfiguration()['path'];
    

    Since we do this at least twice, how about we instantiate the file reader class in __construct(), then call a method on it to get the path instead?

  14. +++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
    @@ -0,0 +1,305 @@
    +  /**
    +   * @return \SplFileObject
    +   */
    +  protected function setupFile() {
    

    Doc comment needs to be expanded. Alternatively, if we move the logic from initializeIterator() into the constructor, we can just move all of this method's logic into initializeIterator(), where it makes more sense (IMHO).

  15. +++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
    @@ -0,0 +1,305 @@
    +      else {
    +        $ids[$value]['type'] = 'string';
    +      }
    

    The doc comment says that this will default to text, not string. Let's make it consistent.

  16. +++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
    @@ -0,0 +1,305 @@
    +    if (!$this->file) {
    +      $this->initializeIterator();
    +    }
    +    foreach ($this->file->getColumnNames() as $column) {
    

    We can reduce this to foreach ($this->initializeIterator()->getColumnNames() as $column).

  17. +++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
    @@ -0,0 +1,305 @@
    +    $fields = $this->getConfiguration()['fields'] + $fields;
    +
    +    return $fields;
    

    We don't need to reassign $fields, just return $this->getConfiguration()['fields'] + $fields.

  18. +++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
    @@ -0,0 +1,305 @@
    +  /**
    +   * {@inheritdoc}
    +   */
    +  public function getConfiguration() {
    +    return $this->configuration;
    +  }
    

    Do we not inherit this from SourcePluginBase() already?

  19. +++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
    @@ -0,0 +1,305 @@
    +  /**
    +   * Gets the file object.
    +   *
    +   * @return \SplFileObject
    +   *   The file object.
    +   */
    +  public function getFile() {
    +    return $this->file;
    +  }
    

    I'm not sure we need this method. Why can't calling code simply use this plugin as an iterator, like all other source plugins? If it's to maintain BC, we should deprecate this right now.

  20. +++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
    @@ -0,0 +1,305 @@
    +  /**
    +   * {@inheritdoc}
    +   */
    +  public function setConfiguration(array $configuration) {
    +    // We must preserve integer keys for column_name mapping.
    +    $this->configuration = NestedArray::mergeDeepArray([$this->defaultConfiguration(), $configuration], TRUE);
    +  }
    

    Correct me if I'm wrong, but can't we use ConfigurablePluginTrait for this?

  21. +++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
    @@ -0,0 +1,305 @@
    +      'file_class' => 'Drupal\migrate\Plugin\migrate\source\CSVFileObject',
    

    We can use the ::class form here.

  22. +++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
    @@ -0,0 +1,305 @@
    +  /**
    +   * {@inheritdoc}
    +   */
    +  final public function calculateDependencies() {
    +    return [];
    +  }
    

    Making this final is a good idea!

heddn’s picture

I'll see when I can respond. But a quick thought is that the second half of this review should result in better code comments. Not just blindly changing the code. I'm not able to go into more detail.

smaz’s picture

@heddn ping on your above comment - people are working on migrate + Umami at the moment at devdays so we might be able to get some help on this.

heddn’s picture

Try to answer what you can. But 103.14, 19, 21 (as examples) are there for a reason and we need to just add comments why. I wish my life weren't so busy right now or I'd go into more detail.

heddn’s picture

Status: Needs work » Needs review
StatusFileSize
new8.59 KB
new50.1 KB

I think I hit most of the above points of feedback. Exceptions are as follows.

103.8: There is already a comment about what this does.
103.18: no, it doesn't implement ConfigurablePluginInterface. That is issue #2937177: Migrate plugin base classes should implement ConfigurablePluginInterface.
103.19: because of #2954413: Add getter for file.
103.20: ConfigurablePluginTrait doesn't exist; maybe you are thinking of the above ^ issue?

Status: Needs review » Needs work

The last submitted patch, 107: 2931739-107.patch, failed testing. View results

Version: 8.6.x-dev » 8.7.x-dev

Drupal 8.6.0-alpha1 will be released the week of July 16, 2018, which means new developments and disruptive changes should now be targeted against the 8.7.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

phenaproxima’s picture

  1. +++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
    @@ -0,0 +1,309 @@
    + * - keys: Array of column name(s) which represent the columns uniquely
    

    "name(s)" should be "names". If there's one item in that array, so be it; but the (s) is jarring here.

  2. +++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
    @@ -0,0 +1,309 @@
    + *   column names and the values should be sub-arrays that define the database
    + *   column schema to store values for that key. Otherwise, defaults to a schema
    

    We start talking about a database schema -- but that only applies to ID mapping with the Sql map plugin. Which is the default, of course; but if you don't know that, then this tickles the WTF reflex. So we should probably link to the Sql class documentation for reference.

    Also, instead of "defaults to a schema of 'string'", can we just say "By default, each key is stored as a string"?

  3. +++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
    @@ -0,0 +1,309 @@
    + *   the source property names names of each row. Any values explicitly defined
    

    "names" is duplicated.

  4. +++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
    @@ -0,0 +1,309 @@
    + *   the array must have a numeric key. Numbering starts at 0. Each value should
    

    "...a numeric key, starting at 0". Also, why the hell would this be an array of arrays? It seems like column_names should be a simple hash (key => value); having it be a nested array is really confusing. So, if we can avoid that, so much the better.

  5. +++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
    @@ -0,0 +1,309 @@
    + * - file_class: (optional) Fully qualified name of a class (including the full
    + *   namespace) that can read the CSV file. The class must extend \SplFileObject
    + *   and implement \Drupal\migrate\CSVFileInterface.
    

    I'm thinking we should remove this option entirely. I can't honestly see too many people needing to extend the file class, or change it. (But it's worth grepping contrib to prove me either right or wrong on that point.) If they do need to extend it, they can also extend this source and implement their own custom logic around that. Also, CSVFileInterface is gone from this patch (and rightfully so), so we should remove all mention of it.

  6. +++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
    @@ -0,0 +1,309 @@
    + * @see http://php.net/manual/en/class.splfileobject.php.
    

    Let's move this @see up to where the other one is.

  7. +++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
    @@ -0,0 +1,309 @@
    + * @see http://php.net/manual/en/splfileobject.setcsvcontrol.php.
    + * @see http://php.net/manual/en/splfileobject.setflags.php.
    

    Ditto.

  8. +++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
    @@ -0,0 +1,309 @@
    +    // Path is required. The file might not exist at this path at plugin
    +    // discovery, so we do not do a file_exists check here.
    

    The constructor isn't called during discovery, so I'm not sure why this is relevant.

  9. +++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
    @@ -0,0 +1,309 @@
    +    if ($this->file) {
    +      return $this->file;
    +    }
    

    Why do we need to cache the file object? Isn't it the whole point of initializeIterator() that we return a fully prepared iterator, and let the calling code deal with it?

  10. +++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
    @@ -0,0 +1,309 @@
    +      throw new InvalidPluginDefinitionException($this->getPluginId(), sprintf('File path (%s) does not exist.', $path));
    

    This is not a problem with the plugin definition; it's a problem with configuration. I think this should be InvalidArgumentException.

  11. +++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
    @@ -0,0 +1,309 @@
    +    $this->file = new $this->fileClass($path);
    

    I think that the creation of the file object should be done in a protected method, so that ninjas can override it if needed. This would also allow us to remove the arcane file_class option.

  12. +++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
    @@ -0,0 +1,309 @@
    +  protected function setupFile() {
    

    I'm not sure why this method exists; it seems like its contents should be part of initializeIterator().

  13. +++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
    @@ -0,0 +1,309 @@
    +    $delimiter = $plugin_configuration['delimiter'];
    +    $enclosure = $plugin_configuration['enclosure'];
    +    $escape = $plugin_configuration['escape'];
    +    $this->file->setCsvControl($delimiter, $enclosure, $escape);
    

    I think this can all be one line.

  14. +++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
    @@ -0,0 +1,309 @@
    +      if (is_array($value)) {
    +        $ids[$delta] = $value;
    +      }
    +      else {
    +        $ids[$value]['type'] = 'string';
    +      }
    

    It's really weird to me that we're setting $ids[$delta] or $ids['value'] depending on what $value is. Shouldn't $ids always be keyed by the name of the key?

  15. +++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
    @@ -0,0 +1,309 @@
    +    foreach ($this->initializeIterator()->getColumnNames() as $column) {
    

    We shouldn't be calling initializeIterator() just to get the column names; it forces us to have $this->file and make initializeIterator() mutate its state, which is awkward. Let's just add a new protected method to derive the column names without needing to touch the iterator.

  16. +++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
    @@ -0,0 +1,309 @@
    +    // Any caller-specified fields with the same names as extracted fields will
    +    // override them; any others will be added.
    +    return $this->getConfiguration()['fields'] + $fields;
    

    Wat? We're relying on undocumented configuration properties here ('fields' isn't mentioned in the doc comment).

  17. +++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
    @@ -0,0 +1,309 @@
    +  /**
    +   * Gets the file object.
    +   *
    +   * @return \SplFileObject
    +   *   The file object.
    +   */
    +  public function getFile() {
    +    return $this->file;
    +  }
    

    What is the purpose of this method? I don't see it being used for anything. We should remove it unless there's a good reason to keep it around.

  18. +++ b/core/modules/migrate/src/Plugin/migrate/source/CSVFileObject.php
    @@ -0,0 +1,103 @@
    +  /**
    +   * Sets the CSV column names.
    +   *
    +   * @param string[] $column_names
    +   *   Sets the CSV column names.
    +   */
    +  public function setColumnNames(array $column_names) {
    

    There's more to say here. Isn't $column_names supposed to be a key => value hash?

  19. +++ b/core/modules/migrate/tests/modules/csv_source_yield_test/src/Plugin/migrate/source/YieldRows.php
    @@ -0,0 +1,45 @@
    +/**
    + * Yields each image and sku.
    + *
    + * @MigrateSource(
    + *   id = "yield_rows"
    + * )
    + */
    +class YieldRows extends CSV {
    

    This is neat, but why do we have it? I don't understand what we're testing here.

  20. +++ b/core/modules/migrate/tests/modules/migrate_source_csv_test/artifacts/people.csv
    @@ -0,0 +1,16 @@
    +id,first_name,last_name,email,ip_address,date_of_birth
    +1,Justin,Dean,jdean0@example.com,60.242.130.40,01/05/1955
    +2,Joan,Jordan,jjordan1@example.com,137.230.209.171,10/14/1958
    +3,William,Ray,wray2@example.com,4.75.251.71,08/13/1962
    +4,Jack,Collins,jcollins3@example.com,118.241.243.64,08/16/1977
    +5,Jean,Moreno,jmoreno4@example.com,12.24.215.20,10/18/1940
    +6,Dennis,Mitchell,dmitchell5@example.com,185.24.131.116,08/21/1999
    +7,Harry,West,hwest6@example.com,101.74.110.171,10/05/1987
    +8,Rebecca,Hunt,rhunt7@example.com,253.107.6.23,04/21/1922
    +9,Rose,Rogers,rrogers8@example.com,21.2.126.228,08/14/2005
    +10,Juan,Walker,jwalker9@example.com,192.118.77.225,03/09/1958
    +11,Lois,Price,lpricea@example.com,231.185.100.19,04/08/1944
    +12,Patricia,Bell,pbellb@example.com,226.2.254.94,02/26/1950
    +13,Gerald,Kelly,gkellyc@example.com,31.204.2.163,09/25/1948
    +14,Kimberly,Jackson,kjacksond@example.com,19.187.65.116,06/21/1926
    +15,Jason,Mason,jmasone@example.com,225.129.68.203,10/01/1950
    

    Ideally, this file would include data that would exercise the delimiter, enclosure, and escape characters.

  21. +++ b/core/modules/migrate/tests/modules/migrate_source_csv_test/config/install/field.storage.node.field_last_name.yml
    --- /dev/null
    +++ b/core/modules/migrate/tests/modules/migrate_source_csv_test/config/install/node.type.people.yml
    
    +++ b/core/modules/migrate/tests/modules/migrate_source_csv_test/config/install/node.type.people.yml
    +++ b/core/modules/migrate/tests/modules/migrate_source_csv_test/config/install/node.type.people.yml
    @@ -0,0 +1,11 @@
    
    @@ -0,0 +1,11 @@
    +langcode: en
    +status: true
    +dependencies: {}
    +third_party_settings: {}
    +name: People
    +type: people
    +description: ''
    +help: ''
    +new_revision: true
    +preview_mode: 1
    +display_submitted: true
    

    Why not import the data as user accounts instead? Given the data we're testing with, that's a more realistic scenario.

  22. +++ b/core/modules/migrate/tests/src/Kernel/Plugin/MigrationProvidersExistTest.php
    @@ -36,6 +36,10 @@ public function testProvidersExist() {
    +      // CSV source intentionally does not have a source module.
    +      if ($id == 'csv') {
    +        continue;
    +      }
    

    Rather than make a bizarre exception here, why not just set the source module to "migrate"?

  23. +++ b/core/modules/migrate/tests/src/Kernel/Plugin/migrate/source/CSVTest.php
    @@ -0,0 +1,59 @@
    +  public function testCreate() {
    +    /** @var \Drupal\migrate\Plugin\MigratePluginManagerInterface $migrationSourceManager */
    +    $migrationSourceManager = $this->container->get('plugin.manager.migrate.source');
    +    $this->assertTrue($migrationSourceManager->hasDefinition('csv'));
    +  }
    

    We can remove this method; I don't think it covers anything useful (it also does not cover __construct() at all). If the 'csv' plugin does not exist, we *will* run into an exception elsewhere in the test.

  24. +++ b/core/modules/migrate/tests/src/Kernel/Plugin/migrate/source/CSVTest.php
    @@ -0,0 +1,59 @@
    +    $node = Node::load(1);
    +    $this->assertEquals($node->label(), 'Justin Dean');
    +    $this->assertEquals($node->get('field_first_name')->value, 'Justin');
    +    $this->assertEquals($node->get('field_last_name')->value, 'Dean');
    +    $this->assertEquals($node->get('field_email')->value, 'jdean0@example.com');
    +    $this->assertEquals($node->get('field_ip_address')->value, '60.242.130.40');
    +    $this->assertEquals($node->get('field_dob')->value, '1955-01-05');
    +  }
    

    We should be asserting all of the imported data, not just one record.

  25. +++ b/core/modules/migrate/tests/src/Unit/CSVFileObjectTest.php
    @@ -0,0 +1,113 @@
    +class CSVFileObjectTest extends CSVUnitTestCase {
    

    Do we really need a base class for this test? It seems like it would be acceptable to merge the two.

  26. +++ b/core/modules/migrate/tests/src/Unit/CSVFileObjectTest.php
    @@ -0,0 +1,113 @@
    +    $this->csvFileObject->setFlags(\SplFileObject::READ_CSV | \SplFileObject::READ_AHEAD | \SplFileObject::DROP_NEW_LINE | \SplFileObject::SKIP_EMPTY);
    

    According to the source plugin's doc comment, these are the defaults. We should just make them the defaults in CSVFileObject, and remove this line.

  27. +++ b/core/modules/migrate/tests/src/Unit/CSVFileObjectTest.php
    @@ -0,0 +1,113 @@
    +  /**
    +   * Tests that the construction appropriately creates a CSVFileObject.
    +   *
    +   * @covers ::__construct
    +   */
    +  public function testCreate() {
    +    $this->assertInstanceOf(CSVFileObject::class, $this->csvFileObject);
    +  }
    

    We can remove this; it's not really testing anything.

  28. +++ b/core/modules/migrate/tests/src/Unit/CSVFileObjectTest.php
    @@ -0,0 +1,113 @@
    +    $this->assertEquals($expected, $actual);
    

    This should be assertSame().

  29. +++ b/core/modules/migrate/tests/src/Unit/CSVFileObjectTest.php
    @@ -0,0 +1,113 @@
    +    $actual = $this->csvFileObject->count();
    +
    +    $this->assertEquals($expected, $actual);
    

    We should use assertCount() here.

  30. +++ b/core/modules/migrate/tests/src/Unit/CSVUnitTestCase.php
    @@ -0,0 +1,104 @@
    +  /**
    +   * The happy path file url.
    +   *
    +   * @var string
    +   */
    +  protected $happyPath;
    +
    +  /**
    +   * The un-happy path file url.
    +   *
    +   * @var string
    +   */
    +  protected $sad;
    

    This is adorable, but can we choose better property names? These aren't very descriptive.

eiriksm’s picture

Assigned: Unassigned » eiriksm
Issue tags: +DrupalEurope

Working on this

eiriksm’s picture

Status: Needs work » Needs review
StatusFileSize
new50.1 KB
new555 bytes

WOW!

So, this was a bit of a rabbit hole. I started by looking into the test failures. And for some reason the plugin was not picked up. So digging deeper in the rabbit hole, I found out the annotation parser did not pick it up. So why did the annotation parser not pick it up? Well because the function token_get_all, which the Doctrine TokenParser uses, tries to set the doc comment by using the last known doc comment before the token T_CLASS is found. So it initially gets found, but then, since we changed this:

-      'file_class' => 'Drupal\migrate\Plugin\migrate\source\CSVFileObject',
+      'file_class' => CSVFileObject::class,

there is another instance of the T_CLASS token, but without the doc comment (of course).

Not sure if we should reports it as a bug to doctrine/annotations as well, but for now: It's not possible to have a plugin discovered by annotation if you also use ::class in the same file :o

So for starters, here my try to fix the test failures from #107. I will look at the comments after I can verify this fixes the failures.

heddn’s picture

That is already fixed upstream. We just cannot use that style of syntax until we can upgrade to the newer version of doctrine.

Status: Needs review » Needs work

The last submitted patch, 112: 2931739-112.patch, failed testing. View results

eiriksm’s picture

Status: Needs work » Needs review
StatusFileSize
new2.41 KB
new50.24 KB

OK, cool.

Here is an updated patch that should fix the test failures for starters. Interdiff from #107, not my own.

eiriksm’s picture

StatusFileSize
new50.24 KB

Oh snap, was a bit fast at typing. Here is the patch in .patch format

dww’s picture

Assigned: eiriksm » Unassigned
Status: Needs review » Needs work

Nice work, thanks!

However, there's still a long list of (mostly spot-on) changes from the review in #110 that need to be addressed.

Re: #110.1, this whole first sentence:

* - keys: Array of column name(s) which represent the columns uniquely
*   identifying each record. If declared as a nested array, the keys must be

could be even simpler:

* - keys: Array of column names that uniquely identify each record. 

Re: #110.4: "Also, why the hell would this be an array of arrays?"
And #110.18: "Isn't $column_names supposed to be a key => value hash?"

We went over this at #87.17, #88, and sadly at #91 @mikeryan confirmed It's A Thing(tm).

Yes, $column_names *should* be a simple hash of numeric keys (index or delta) and a string values (the names). There's apparently a "need" for an optional 3rd layer of "description". Perhaps that should be punted to another config knob if it really needs to exist, so column_names doesn't have to be weirdly overloaded like this. column_names vs. column_descriptions or something. Both are simple integer key to string value hashes. Both are optional so long as your CSV provides a header row and column_names are defined inline. If there's no header row, column_names is required (but simple to understand, unlike now).

+1 to everything else from #110.

Thanks,
-Derek

p.s. Also unassigning, since I don't assume @eiriksm wants to re-roll for all of this. If anyone wants to actively work on this, please assign yourself. Thanks again!

eiriksm’s picture

Hi, sorry about that. My intention was to work through the list after fixing the failures but I got swamped by other things. I will assign myself if I get the time to pick it up in the coming days

mikelutz’s picture

longwave’s picture

Status: Needs work » Needs review
StatusFileSize
new47.47 KB
new11.75 KB

Tried to move this along a bit from the reviews in #110 and #117.

#110.14: This is the result of 'keys' being either a plain string or a field definition; I tried to add comments to help explain what is going on.
#110.16: Not sure what this undocumented property was useful for, so I removed it.
#110.19/20/21/24/25/26/30: Left these alone for now.

Everything else noted in the reviews should be fixed.

+++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
@@ -0,0 +1,294 @@
+ * - keys: Array of column names that uniquely identify each record, if keys
+ *   should be stored as strings. Otherwise, the keys must be column names and
+ *   the values should be field structure definitions as used by

I realise I'm overloading the word "keys" here to mean two different things, but this structure is hard to explain succinctly!

longwave’s picture

Whoops, I broke some of the tests entirely, this should be a bit better.

The last submitted patch, 120: 2931739-120.drupal.Bring-migratesourcecsv-to-core.patch, failed testing. View results
- codesniffer_fixes.patch Interdiff of automated coding standards fixes only.

dww’s picture

Status: Needs review » Needs work

That looks great, thanks!

#110.14: This is the result of 'keys' being either a plain string or a field definition; I tried to add comments to help explain what is going on.

The comments partly help. But the deeper issue of how weird the code is remains. Maybe it's too much to ask that we actually fix this stuff, given that it all already works in contrib. It does seem like the sort of thing that has to happen to modules as they go into core. If they have any bizarre code, they need to be simplified and cleaned up before being committed.

#110.19/20/21/24/25/26/30: Left these alone for now.

Those are all test cleanups. Can/should they be moved to a follow-up issue? If so, let's open it as a child issue. If not, we need to fix the tests before this is committed.

Either way, this still needs work. But it's getting very close!

Thanks again,
-Derek

longwave’s picture

I agree the tests still need to be fixed here, I just ran out of time.

Dwelling on this overnight I wonder if the complication around 'keys' is YAGNI for core. CSV is inherently a file made of strings so what is the use case for the key being something other than a string? Should we simplify this entirely for core and leave it to subclasses if they need anything more complex, much like the removal of the 'file_class' and 'fields' options?

phenaproxima’s picture

Speaking as a Migrate maintainer for core, I can confidently say that, as long as we use a different plugin ID, we are absolutely free to disregard any kind of backwards compatibility with contrib.

As far as core is concerned, this is a new feature. Therefore, our priority should be to build a well thought-out and easy to use CSV plugin, not maintain compatibility with contrib. The old plugin can continue to exist in contrib, with all its idiosyncrasies, if people need it.

mikelutz’s picture

dww’s picture

Re: #125:

...as long as we use a different plugin ID, we are absolutely free to disregard any kind of backwards compatibility with contrib.

Then we would have to change the plugin ID for this, since the patch is currently using the one from contrib, 'csv'. :/

Re: #124:

CSV is inherently a file made of strings so what is the use case for the key being something other than a string?

The string can hold different kinds of values. Some "strings" are actually integers that would be much more happily stored and indexed as ints in the schema, not strings. Some strings are fixed length and relatively short. Some strings are huge. Some are floats (the kind of number, not CSS). ;) Etc.

I think part of the confusion is that the config setting is called "keys" but really it's the means by which this plugin can implement MigrateSourceInterface::getIds(). If you read the PHP doc for that, it starts to make more sense why we need to allow field structures to be defined. It's simply a config shortcut that you can use a flat array of strings, in which case we use those values as the machine name index for the ids structure, and we give you a default subarray to declare that id as a string. testGetIdsComplex() is showing it in action.

We could potentially improve both test and doc coverage of cases where numeric keys are in use and could be specified as such. I'm glad we're testing "paragraph" sized big text columns, but ints (and floats) seem very common in various migration cases, and worth both documenting and testing as part of this. In fact, all of the test CSV sources that use an 'id' are actually ints, and we should probably define/test them that way. Or at least have 2 tests showing that int ids (e.g. nids, uids, etc) work when declared as either ints or they use the default string schema.

dww’s picture

Issue summary: View changes

Cleaned up the API changes section of the summary, and updated the remaining tasks.

Also, this:

+ * - keys: Array of column names that uniquely identify each record, if keys
+ *   should be stored as strings. Otherwise, the keys must be column names and
+ *   the values should be field structure definitions as used by
+ *   MigrateSourceInterface::getIds().

might be more clear as:

* - keys: Array of column names that uniquely identify each record, if each key
*   should be stored as a string. Otherwise, the array must be indexed by column
*   name and the values should be field structure definitions as used by
*   MigrateSourceInterface::getIds().

(sorry if that wrapping isn't right, I'm doing this comment via browser, not emacs).

mikelutz’s picture

StatusFileSize
new45.29 KB
new18.31 KB

This still needs work on tests, I need to go through the rest of #110. This is my attempt to add some sanity to the whole column names thing. I'm content to change variable and config key names around, but code wise, this makes more sense to me.

The human readable descriptive names are ONLY for the fields method in the source. There is absolutely no reason why the file object needs to know about them. Now we have two config items, column_names and column_descriptions, both are simple indexed arrays to map the 0 indexed column to a field key and description. If not present, header row is used for either/both, and lacking everything, you basically get no fields, so I want a little more error handling around that. I rather like requiring either a header row or column names, but if we decide to allow the whole thing to be accessed using just the column index, then we need a little more work around the fields method to support it.

I also reduced the ids array to a simple array that makes everything a string. If there's strong reason to allow other id types, we can put it back.

dww’s picture

Disclaimer: I haven't looked at the patch nor interdiff, only going on your comment. About to head into a meeting, so I don't have time for more right now.

Thanks for column_names vs. column_descriptions!

However, -1 for "I also reduced the ids array to a simple array that makes everything a string. If there's strong reason to allow other id types, we can put it back.". See #127 for why.

I'm a little worried that #129 is moving too many things in too many different directions before we have agreement on what we're trying to accomplish.

Thanks,
-Derek

mikelutz’s picture

The string can hold different kinds of values. Some "strings" are actually integers that would be much more happily stored and indexed as ints in the schema, not strings. Some strings are fixed length and relatively short. Some strings are huge. Some are floats (the kind of number, not CSS). ;) Etc.

While that may be true, the purpose of this typing is specifically for the idmap. While the default is certainly a SQL IdMap and a mySQL backend, it's not guaranteed. At the end of the day, the migration executable writes the map using the source values from the row. Unless we are going to typecast based on those key types, or have a lot more error checking around them, at the moment that happens, the executable is sending in a string value because the row is an array of strings. So despite the fact that it may seem like a simple incrementing integer value in a spreadsheet, and it may seem reasonable that it should be stored and indexed in the database as an integer, you are trusting an unknown idmap object can handle getting a string when you've told it it's getting an integer. The truth is, it probably can, but I'm not a fan of sending a sting to do an integer's job. Just because php is loosely typed doesn't mean we need to write loosely typed code.

As far as supporting longer strings, I think there's a better case for that, but I'm also not sure it's reasonable. if you need a paragraph sized identifier, you are probably doing something wrong.

I was thinking about potentially supporting using the csv line number as a migration key if none is specified. I won't think it's an ideal situation, but i could see some use cases where someone might want it.

dww’s picture

All of #131 is probably true, but that seems like a fight to have with MigrateSourceInterface::getIds(), not this patch. If that API expects/allows all this, that's the place to have the tests to see what happens with faulty input, not in this particular implementation. We're just making sure the source config allows us to uphold the API we've been given to work with. Contrib has used this particular way, the 'keys' config knob, a large number of existing migrations are written to work with that, and it'd be pretty annoying for us to both squat on the 'csv' namespace *and* break all those migrations by changing it at this point. It's wonky, but we almost fully understand it now and nearly have it documented properly. Most people will never care. If they care, and they go out of their way to define a column as an int, it's their problem if they have dirty source data.

mikelutz’s picture

I realize up around #81 @heddn talked about a straight port and keeping the same namespace, but the more I dig into this, the more I agree with #125. These configuration knobs are wonky, and once they are committed, it's going to be very difficult to fix. If we change the plugin name we can use a more intuitive set of configurations, and anybody currently using contrib can just keep doing so until they are ready to update their migrations.

I still vote against allowing non string ids when we only provide string fields, but I'm far more bothered by the column_names knob. If you aren't using a header row, you should be able to supply a list of column names as a simple array, and you shouldn't have to provide descriptions beyond the field names if you don't want to. Just making that change would break existing migrations and would warrant a name change, but I think it makes the plugin more useable going forward, and it's far better to do it now than try to do it later.

mikelutz’s picture

Rather than make a bizarre exception here, why not just set the source module to "migrate"?

More interestingly why is there a test in the migrate module which extends a test class from the migrate_drupal module and enables a test module from the migrate_drupal_ui module to test that all source plugins have an annotation that they aren't required to have?
The module dependencies for tests should go the other way. Source plugins aren't required to have a source module annotation. Migrations tagged with 'Drupal 6' or 'Drupal 7' are only allowed to use source plugins which have a source module annotation. We shouldn't have to add dummy annotations to source plugins that don't apply to Drupal migrations just to pass tests.
Migrate has been built primarily for d6 and d7 migrations, but it's purpose is greater, as this issue demonstrates. If there isn't a source module, there isn't a source module.

heddn’s picture

I don't know if this helps with some of the confusion. Around the timeframe of #110, there was a conversation in slack that never made it into this issue. Or so it would appear. In this conversation, @phenaproxima and @heddn agreed to not do a straight port as put forth in #81. So we /will/ have a new name. And this lets us make the API "right" and "sane".

As far as the column names thing, we don't need to expose that to folks in the API. For keys, I want to vote for using the approach we see in migrate_plus and core and call the thing ids and use the same sql style declaration of int, string, etc.

phenaproxima’s picture

Migrate has been built primarily for d6 and d7 migrations, but it's purpose is greater, as this issue demonstrates. If there isn't a source module, there isn't a source module.

Migrate can freak out and throw exceptions during plugin discovery if source_module isn't set. In retrospect, that behavior should have been entirely confined to Migrate Drupal, but at this point it's too late to go back. In Drupal 9, we can and should scale back the invasiveness of the source_module and destination_module stuff. But for now, we should just set the source_module to 'migrate' and call it a day.

mikelutz’s picture

Migrate can freak out and throw exceptions during plugin discovery if source_module isn't set. In retrospect, that behavior should have been entirely confined to Migrate Drupal, but at this point it's too late to go back.

I just did another quick grep, The only place I see the exception raised is in \Drupal\migrate_drupal\MigrationPluginManager::processDefinition() which throws an exception if migrate_drupal is enabled and there is a migration tagged with Drupal 6 or Drupal 7 that uses a source plugin with no source module defined. The exception declares that the source plugin must define the source module property, but it's really an exception on the migration, which shouldn't be using a source with no module defined. Actually, that should allow the migration to define a source module in it's configuration if it wants to, to mimic SourcePluginBase::getSourceModule.

Regardless, It should be punted to another issue, I suppose.

dww’s picture

Re: #135:

As far as the column names thing, we don't need to expose that to folks in the API.

I don't know what that means. Of course we have to let people define their column names, or they can't write the rest of their migration. Are you proposing we *require* a header row? I'd be very strongly opposed to that. So we have to have a way to declare the column names via the API of how you configure your source plugin.

Maybe you mean we don't have to expose column descriptions to folks? Great. But then shouldn't we have an issue somewhere to deprecate that part of the migrate source plugin API? See comment #91 from @mikeryan.

Regardless, if the party line is that core is going to change namespaces and break existing migrations for this, so be it. In that case, the summary is out of date, and perhaps some of the migrate maintainers who've been discussing this offline could update things to reflect the current "agreement". Namely, the API changes section can no longer claim BC and ease-of-use for existing users of the contrib module, and it should start to spell out the ways that we envision this working differently (even if it's primarily about changing the available config knobs).

Also agreed that #136 and #137 are out of scope for this issue.

Thanks,
-Derek

heddn’s picture

Yeah, we need to update the IS.

Here's what I expect the yml looks like:

    0: id
    1: first_name
    2: last_name
    3: email
    4: country
    5: ip_address
    6: date_of_birth

or alternatively:

    - id
    - first_name
    - last_name
    - email
    - country
    - ip_address
    - date_of_birth

This as opposed to the existing requirement:

    0:
      id: Identifier
    1:
      first_name: First Name
    2:
      last_name: Last Name
    3:
      email: Email Address
    4:
      country: Country
    5:
      ip_address: IP Address
    6:
      date_of_birth: Date of Birth

Then I expect the source module to array_combine or some such on id, first_name, etc. to make the required nested array that migrate needs.

So, no API exposed to site users about a double nested array thing, just a yaml array that the csv plugin converts into the right format.

dww’s picture

Right, so column names are required, either via header row or a (simple) array.

Column descriptions are optional, via a second (simple) array config knob. If not defined, we default to reusing the name as the description.

Or are you proposing we simply hide descriptions entirely and not provide any way to declare them? In that case, follow-up issue to deprecate them from the Migrate API entirely?

heddn’s picture

For a CSV source, I'm suggesting we hide their existence in the implementation details. It is confusing DX. That isn't to say it has no value. And for other sources, it might make sense. So, hide but don't deprecate is my suggestion. If someone needs the description, they should be able to extend the source plugin or keep using contrib.

dww’s picture

The nested array stuff is certainly confusing DX.
The flat arrays with separate knobs per #129 isn't bad.

If descriptions vs. machine names in general makes for a better UI/UX, and that's what the API itself supports so that various tools can benefit, it seems we should make it easier to give human readable names than "extend the plugin". I don't see what that has to do with a CSV source. Why does it matter if "first_name" vs. "First name" appears in my tool output, regardless of if the name came from a CSV file, a JSONAPI call, a DB query, etc?

I thought we were going to kill off the contrib version of this once in core. Why maintain two slightly different copies of the same thing?

andypost’s picture

Where the upgrade path from contrib will live? if structure will change then probably contrib needs to release 3.0 with 8.7 requirement and provide upgrade to disable itself at least

heddn’s picture

StatusFileSize
new9.48 KB
new44.02 KB

The tests will fail miserably so not changing status. But this rebases things using league/csv instead of a custom csv reader. I'm haven't had a chance to manually test things, but it should be a drop-in replacement for our previous file reader solution. Meaning, it should work but the tests need to be adjusted for the new solution.

heddn’s picture

Status: Needs work » Needs review
StatusFileSize
new20.06 KB
new36.58 KB

Let's see if this fixes up any of the tests.

heddn’s picture

Status: Needs review » Needs work

Some self review here:

  1. +++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
    @@ -0,0 +1,215 @@
    + * In this example, the migration source will use a single-column string key
    

    The string comment here is unneeded.

  2. +++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
    @@ -0,0 +1,215 @@
    +    if (empty($this->configuration['ids']) || !is_array($this->configuration['ids'])) {
    

    All this error checking here and below could use tests.

  3. +++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
    @@ -0,0 +1,215 @@
    +      'stream_filters' => [],
    

    Missing docs on this.

  4. +++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
    @@ -0,0 +1,215 @@
    +    if ($reader->supportsStreamFilter()) {
    

    This could use tests.

  5. +++ b/core/modules/migrate/tests/src/Unit/CSVUnitTestCase.php
    @@ -0,0 +1,104 @@
    +   * The multi line file url.
    +   *
    +   * @var string
    +   */
    +  protected $multiLine;
    ...
    +    $multiLineContent = <<<'EOD'
    +id,title,description
    +1,"Title 1","Description 1 Line 1
    +Description 1 Line 2
    +Description 1 Line 3"
    +2,"Title 2","Description 2 Line 1
    +Description 2 Line 2
    +Description 2 Line 3"
    +3,"Title 3","Description 3 Line 1
    +Description 3 Line 2
    +Description 3 Line 3"
    +EOD;
    ...
    +    $this->multiLine = vfsStream::newFile('multi-line.csv')
    +      ->at($root_dir)
    

    Let's get rid of this multi line scenario as we won't be supporting it in core.

  6. +++ b/core/modules/migrate/tests/src/Unit/CSVUnitTestCase.php
    @@ -0,0 +1,104 @@
    +    $happy = <<<'EOD'
    ...
    +    $sad = <<<'EOD'
    

    Let's rename these standard and non_standard.

heddn’s picture

Status: Needs work » Needs review
StatusFileSize
new9.3 KB
new35.36 KB

Status: Needs review » Needs work

The last submitted patch, 147: 2931739-147.patch, failed testing. View results
- codesniffer_fixes.patch Interdiff of automated coding standards fixes only.

dww’s picture

Hrm, I'm rather -1 to a new composer.json dependency for this. That means we have to:

a) Carefully audit the external dependency.
b) Have confidence it's well-maintained, will get security updates, that we can coordinate with those updates when needed, etc.
c) Have confidence it's going to survive as long as this code will want to use it.
d) Have confidence it's not going to have conflicting requirements from what we want (e.g. dropping support for PHP 7.0 before we do, whatever).

Looking at the interdiff, we're not saving many lines of code for all those potential costs. The entire CSVFileObject.php in contrib is 117 lines, with comments and whitespace. Most of that are trivial set/get methods. The only actual code of substance are these:

  public function rewind() {
    $this->seek($this->getHeaderRowCount());
  }

  public function current() {
    $row = parent::current();

    if ($row && !empty($this->columnNames)) {
      // Only use columns specified in the defined CSV columns.
      $row = array_intersect_key($row, $this->columnNames);
      // Set meaningful keys for the columns mentioned in $this->csvColumns.
      foreach ($this->columnNames as $key => $value) {
        // Copy value to more descriptive key and unset original.
        $value = key($value);
        $row[$value] = isset($row[$key]) ? $row[$key] : NULL;
        unset($row[$key]);
      }
    }

    return $row;
  }

I know we all love our #ProudlyFoundElsewhere tags, but it doesn't seem like an obvious win in this case.

Am I missing something?

Thanks,
-Derek

heddn’s picture

Status: Needs work » Needs review
StatusFileSize
new33.71 KB
new19.41 KB

This fixes tests locally.

heddn’s picture

Reasons to use it, it supports stream resources. The #1 requested feature for CSV files. The contrib module doesn't have that support. https://thephpleague.com/ is a well respected member of the PHP community.

The downside as you stated is lining up release dates. From https://csv.thephpleague.com/

Once a new major version is released, the previous stable release remains supported for six more months through patches and security fixes.

dww’s picture

Okay, duly noted. Thanks.

However, it seems like adding a new feature request on the way into core is the wrong approach. ;) Certainly the use-case-in-core for this (the Umami stuff) doesn't care about streams -- we'll have local files. Stream support seems to open up a lot of potential what-ifs that should be more carefully considered. IMHO, we should either:

a) Keep the core patch simple, small, easy, based on already-proven code, without introducing new dependencies. Add stream support at a later date in a follow-up issue.

or:

b) Try out the new dependency and stream support in contrib for a while and forget about trying to get this into 8.7.0.

Either way, I don't see much hope in adding a new external dependency at this late date in the 8.7.x release cycle. Maybe the framework and release managers will see it differently, but I doubt it.

andypost’s picture

I'm also see no real benefit of external dependency, meanwhile using this lib I recall issues with encodings and that cause to always override it to properly handle variable encodings

heddn’s picture

There is stream filter support for encoding conversion. I opted not to include it for the initial patch here. Since for core's use, we don't need to handle encoding conversion. It would be a later follow-up. Having an absolute path for a local file for CSV is painful.

The following is required to get the absolute path:

function hook_migration_plugins_alter(&$definitions) {
  $definitions['migrate_csv_test']['source']['path'] = drupal_get_path('module', 'my_Module') . $definitions['my_migration']['source']['path'];
}

Were-as we could use a stream once #1308152: Add stream wrappers to access .json files in extensions lands. And no plugin alters and cache clearing are needed.

I'm not particularity fixed on getting this into 8.7 if 8.8 works better. But it would be nice to use a library instead of parsing the CSV ourselves.

heddn’s picture

Status: Needs review » Closed (won't fix)

Yes we can and have written a nifty CSV solution in contrib. But if we bring it into core, it has several latent limitations and any enhancements to address these features would leave us writing our own solution. And we'd take that ~100 lines and quickly run into many 100s of lines. Migrate maintainers have many other priorities. Continuing to roll our own solution for reading CSVs doesn't seem like a good use of that time.

Known limitations:

  • no stream support
  • no BOM removal
  • no encoding/decoding at stream reading time (only supports utf-8 char sets)
  • hook_migration_plugins_alter is need for local file support in a lot of cases

Until we're more welcome to use 3rd party PHP libraries, I'm going to mark this closed, won't fix. I don't really want to implement a halfway CSV solution in core that has so many limitations.

mlncn’s picture

Too bad; this would make the migration support already in Drupal core really come alive. I thought Drupal had long since moved to embracing well-maintained 3rd-party dependencies!

heddn’s picture

I've opened #3042160: Utilize leage/csv for CSV reader to move this over into contrib.