Bring migrate_source_csv to core [#2931739]

Problem/Motivation

The core migrate system needs a way to import things from CSV files.

Immediate use case is for the Umami Demo profile now in core in 8.6.x. The imported content is in CSV files and we have code to read it and create the relevant entries. Having migrate_source_csv in core would allow us to remove that code and use migrate instead.

See #2809635-37: Create experimental installation profile for where @larowlan suggested bring migrate_source_csv into core.

Other good reasons to have CSV as a migration source in core:

"I use this constantly. I've seen it forked into the Contenta distro, and a few of my own projects (I just want the source plugin, not another module.) It'll also be vital to Commerce integrations as CSV is a common way to import data from an ERP." @mglaman (#4)
"CSV format is the first format used by non technical professionnals who need to import data and to generate/edit those datas thanks to software they knew (MS Excel without mention it). It's also the easiest way to export data from many ERP and big solutions with datas." @GoZ (#5)
"CSV is simply a really common used format and it would make migrate more usable by default for many usecases. One could say: Oh it is easy to parse CSV, but it turns out it is not. On top of that the module also doesn't load the entire CSV file upfront, but rather reads line by line. I think this makes it a good candidate also for bigger migrations out there." @dawehner (#14)
"This would help the Out of the Box Initiative. Let's do it! Consider this my maintainer +1 :)" @phenaproxima (#18)

Proposed resolution

The migrate_source_csv module has a stable release (2.0), has been downloaded almost 100k times, and comes complete with tests. The contenta CMS has been using this for their migrations, too.

Rename the files/namespaces, and move it in, as-is, to the core migrate module.

Since this is a straight port from contrib, discussions about fetchers, etc. are postponed to follow-ups. See #16, #36 and #37 for some of the highlights of the discussion.

Follow-up: #2962091: Adopt fetchers/parsers logic for use by source csv plugin

Remaining tasks

~~Patch that renames files/namespaces and moves the contrib code into core~~ Done.
~~Tests~~ Done.
~~Word-smithing PHPDocs~~ Done.
~~Decide if we're okay with a straight port (done), or if we need to hash out all the fetcher/parser stuff first, figure out the design, class names, avoid conflicts, possible BC hell, etc.~~ Done. See comments #80 through #83. Punted to #2962091: Adopt fetchers/parsers logic for use by source csv plugin.
Finish cleaning up the docs and tests.

None.

User interface changes

None.

API changes

Adds a new migrate source plugin called "csv" in migration yml definition files. The class providing the plugin (\Drupal\migrate\Plugin\migrate\source\CSV) is heavily commented with PHPDoc.

Adds a \Drupal\migrate\CSVFileObject class (which extends \SplFileObject).

Existing users of this plugin from contrib are unharmed. The name is the same, but we haven't changed anything about how it works. Contrib users simply disable migrate_source_csv module before upgrading to core 8.X.Y and none of their migration files have to be changed to continue working.

Data model changes

None.

Original report by @smaz

In #2809635: Create experimental installation profile, we are using CSV files to provide default content for a demo installation profile of Drupal.

In a review, @larowlan suggested bringing the CSV migration sources into core:
https://www.drupal.org/project/drupal/issues/2809635#comment-12381619

I agree that a CSV migrate source would be a very useful feature for core.

The migrate_source_csv module has a stable release (2.0), has been downloaded almost 100k times (not sure if that includes composer stats?), and comes complete with tests. The contenta CMS has been using this for their migrations, too.

So would it be ok to look at moving this into Core's migrate module? If so, I'm happy to try and turn the module into a patch.

Comment	File	Size	Author
#150	interdiff_147-150.txt	19.41 KB	heddn
#150	2931739-150.patch	33.71 KB	heddn
#147	2931739-147.patch	35.36 KB	heddn
#147	interdiff_145-147.txt	9.3 KB	heddn
#145	2931739-145.patch	36.58 KB	heddn
#145	interdiff_144-145.txt	20.06 KB	heddn
#144	2931739-144.patch	44.02 KB	heddn
#144	interdiff_129-144.txt	9.48 KB	heddn
#129	interdiff.2931739.120-129.txt	18.31 KB	mikelutz
#129	2931739-129.drupal.Bring-migratesourcecsv-to-core.patch	45.29 KB	mikelutz
#121	interdiff.2931739.120-121.txt	4.64 KB	longwave
#121	2931739-121.drupal.Bring-migratesourcecsv-to-core.patch	46.29 KB	longwave
#120	interdiff.2931739.115-120.txt	11.75 KB	longwave
#120	2931739-120.drupal.Bring-migratesourcecsv-to-core.patch	47.47 KB	longwave
#116	2931739-116.patch	50.24 KB	eiriksm
#115	2931739-115.txt	50.24 KB	eiriksm
#115	interdiff.txt	2.41 KB	eiriksm
#112	interdiff-2931739.txt	555 bytes	eiriksm
#112	2931739-112.patch	50.1 KB	eiriksm
#107	2931739-107.patch	50.1 KB	heddn
#107	interdiff_96-107.txt	8.59 KB	heddn
#96	2931739-96.patch	50.44 KB	heddn
#96	interdiff_95-96.txt	3.34 KB	heddn
#95	2931739-94.patch	50.73 KB	jofitz
#92	interdiff-70-92.txt	10.24 KB	jofitz
#92	2931739-92.patch	50.73 KB	jofitz
#70	interdiff_66-70.txt	4.03 KB	heddn
#70	2931739-70.patch	50.89 KB	heddn
#66	interdiff_65-66.txt	3.01 KB	heddn
#66	2931739-66.patch	50.93 KB	heddn
#65	2931739-interdiff-59_65.txt	1.12 KB	dww
#65	2931739-65.patch	50.89 KB	dww
#50	2931739-49.migrate_source_csv.patch	49.1 KB	dww
#50	2931739-49.interdiff.txt	5.2 KB	dww
#48	2931739-48.patch	48.75 KB	heddn
#48	intrdiff_47-48.txt	5.85 KB	heddn
#47	interdiff_46-47.txt	5.49 KB	heddn
#47	2931739-47.patch	48.41 KB	heddn
#46	interdiff-43-46.txt	3.22 KB	jofitz
#46	2931739-46.patch	48.13 KB	jofitz
#43	2931739-43.interdiff.txt	4.27 KB	dww
#43	2931739-43.migrate_source_csv.patch	48.17 KB	dww
#41	interdiff_40-41.txt	551 bytes	heddn
#41	2931739-41.patch	47.9 KB	heddn
#40	interdiff_38-40.txt	10.16 KB	heddn
#40	2931739-40.patch	47.89 KB	heddn
#38	interdiff_35-38.txt	33.63 KB	heddn
#38	2931739-38.patch	44.8 KB	heddn
#35	interdiff-2931739-33-35.txt	471 bytes	rakesh.gectcr
#35	2931739-35.patch	24.78 KB	rakesh.gectcr
#32	2931739-33.patch	24.78 KB	rakesh.gectcr
#29	interdiff-2931739-27-29.txt	1.71 KB	rakesh.gectcr
#29	2931739-29.patch	24.75 KB	rakesh.gectcr
#27	interdiff-2931739-23-27.txt	536 bytes	yogeshmpawar
#27	2931739-27.patch	24.81 KB	yogeshmpawar
#23	2931739-23.patch	24.8 KB	mglaman
#20	2931739-20.patch	24.79 KB	mglaman
#56	interdiff_50-55.txt	12.51 KB	heddn
#56	2931739-55.patch	49.71 KB	heddn
#59	interdiff_55-59.txt	1.45 KB	heddn
#59	2931739-59.patch	50.76 KB	heddn
#62	2931739-interdiff-59_62.txt	2.84 KB	dww
#62	2931739-62.patch	50.9 KB	dww

Comments

Comment #1

19 December 2017 at 12:27

smaz created an issue. See original summary.

Comment #2

19 December 2017 at 12:27

Version:

8.5.x-dev

» 8.6.x-dev

Drupal 8.5.0-alpha1 will be released the week of January 17, 2018, which means new developments and disruptive changes should now be targeted against the 8.6.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Comment #3

heddn

English

Nicaragua

commented 1 February 2018 at 16:33

+1 on this idea.

Comment #4

mglaman

English

WI, USA

commented 1 February 2018 at 16:48

I use this constantly. I've seen it forked into the Contenta distro, and a few of my own projects (I just want the source plugin, not another module.)

It'll also be vital to Commerce integrations as CSV is a common way to import data from an ERP.

Not just +1, but +1,000

Comment #5

goz commented 1 February 2018 at 16:58

Totally agreed.

CSV format is the first format used by non technical professionnals who need to import data and to generate/edit those datas thanks to software they knew (MS Excel without mention it). It's also the easiest way to export data from many ERP and big solutions with datas.

+1,000 for me too

Comment #6

olafkarsten commented 1 February 2018 at 17:17

Agreed. +1

Comment #7

eiriksm

he/him

Norwegian Bokmål

Norway

commented 1 February 2018 at 18:48

+999

Comment #8

heddn

English

Nicaragua

commented 1 February 2018 at 18:56

Rather than just +1s, let's also explain what use case in Core we need it. Just because something is popular doesn't mean it should go into core. And for migrate, we've tended to keep things in contrib that Core itself doesn't need.

I know I didn't do that myself, but don't follow my bad example ;)

Comment #9

eli-t

he/him

English

Manchester

commented 2 February 2018 at 00:30

Immediate use case is for the Umami Demo profile now in core in 8.6.x. The imported content is in csvs and we have code to read it and create the relevant entries. Having the migrate_source_csv would allow us to bin the code and use migrate instead.

Comment #10

dercheffe

he/him

German

Sersheim, Germany

commented 2 February 2018 at 13:24

IMHO this makes sense very much. Especially if you want to create different content types with different fields (maybe even new modules/successors of legacy modules, but with the same data/content).

I would also vote for these modules in core (they support both CSV also):

https://www.drupal.org/project/views_data_export

and

https://www.drupal.org/project/feeds

Comment #11

heddn

English

Nicaragua

commented 2 February 2018 at 14:29

To get a more practical discussion going, can someone take it upon them self to roll a first patch of converting the existing code base from contrib into something added to migrate.module in core? We'd need the source plugin, the file object and the tests all moved/renamed. We also need to handle BC, so a possible rename of the plugin id could take place. A suggestion is csv_file. Perhaps someone else could come up with a better suggestion.

Comment #12

mglaman

English

WI, USA

commented 2 February 2018 at 17:47

A suggestion is csv_file. Perhaps someone else could come up with a better suggestion.

I think it should be `file_csv`. The plugin is for a file source which is in the CSV format. This opens up a naming convention which better handles other file formats. For instance, this Allows for file_json or file_xml.

Comment #13

heddn

English

Nicaragua

commented 2 February 2018 at 18:44

I know it is a bit bikeshedding, but JSON and XML tends to not be a "file" as much as a URL data source. While csv is almost always a file. It really should be named 'csv', but that was already used. I wonder if there is any way to hack things so we support BC and still use the same name. We could try it at least.

Comment #14

dawehner

German

commented 2 February 2018 at 19:02

I think this really make sense. CSV is simply a really common used format and it would make migrate more usable by default for many usecases.
One could say: Oh it is easy to parse CSV, but it turns out it is not. On top of that the module also doesn't load the entire CSV file upfront, but rather reads line by line. I think this makes it a good candidate also for bigger migrations out there.

Comment #15

mglaman

English

WI, USA

commented 2 February 2018 at 20:00

XML tends to not be a "file" as much as a URL data source

Maybe so for JSON. But many times I have had to deal with systems which FTP files to a server and they are read and processed in either CSV or XML.

Comment #16

mikeryan

he/him

English

Pittsfield, MA, USA

commented 2 February 2018 at 20:04

On the last point - the last big project I did involved JSON files (provided as attachments to Jira issues, if you must know).

A larger point, though, is that the retrieval of file/stream-oriented sources would best be separated from parsing of the content retrieved. migrate_plus provides a source plugin which encapsulates separate fetcher and parser plugins (the fetcher further making use of authentication plugins), and uses this to support XML and JSON. CSV sources would also fit this model naturally, and there is an issue to do this (#2867090: Merge migrate_source_csv into migrate_plus as a parser plugin). It would make sense to me, anticipating XML and JSON sources will eventually follow CSV into core, to do this from the beginning.

Comment #17

cosmicdreams commented 12 February 2018 at 19:15

That makes sense from a classic ETL (Extract Transform Load) perspective. Which is sometimes described as ETTL (Extract Transfer Transform Load).

Comment #18

phenaproxima

he/him

English

Massachusetts

commented 22 February 2018 at 16:32

Issue tags:

+Out of the Box Initiative

This would help the Out of the Box Initiative. Let's do it! Consider this my maintainer +1 :)

Comment #19

mglaman

English

WI, USA

commented 24 February 2018 at 03:53

Assigned:

Unassigned

» mglaman

I'm going to roll a patch just to get things kicked off. If it gets rejected because we want fetchers and parsers per #16 then so be it! But a patch will help kick this along :)

Comment #20

mglaman

English

WI, USA

commented 24 February 2018 at 04:21

Assigned:	mglaman	» Unassigned
Status:	Active	» Needs review

Status	File	Size
new	2931739-20.patch	24.79 KB

Here we go!

. We also need to handle BC, so a possible rename of the plugin id could take place. A suggestion is csv_file. Perhaps someone else could come up with a better suggestion.

I didn't do this. I kept as CSV.

We didn't agree on a name, so a more straight forward patch
If someone has `migrate_source_csv` installed it technically will provide plugins after core's `migrate`, therefore it overrides the core `csv` plugin with the contrib version.

So we could rename it. Or keep it the same and allow contrib to override the plugin definition. I have a feeling that will not fly over well, but it is a possibility.

Comment #21

24 February 2018 at 05:15

Status:

Needs review

» Needs work

The last submitted patch, 20: 2931739-20.patch, failed testing. View results
- codesniffer_fixes.patch Interdiff of automated coding standards fixes only.

Comment #22

mglaman

English

WI, USA

commented 24 February 2018 at 05:46

+++ b/core/modules/migrate/tests/src/Kernel/Plugin/migrate/source/CSVKernelTest.php
@@ -0,0 +1,31 @@
+namespace Drupal\Tests\migrate\Unit\Plugin\migrate\source;

+++ b/core/modules/migrate/tests/src/Unit/CSVUnitBase.php
@@ -0,0 +1,82 @@
+ * @group migrate

Test quirks that didn't fail in contrib but blew up here.

Comment #23

mglaman

English

WI, USA

commented 24 February 2018 at 05:49

Status:

Needs work

» Needs review

Status	File	Size
new	2931739-23.patch	24.8 KB

Fix on those two issues.

Comment #24

24 February 2018 at 06:36

Status:

Needs review

» Needs work

The last submitted patch, 23: 2931739-23.patch, failed testing. View results
- codesniffer_fixes.patch Interdiff of automated coding standards fixes only.

Comment #25

mglaman

English

WI, USA

commented 24 February 2018 at 06:39

+++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
--- /dev/null
+++ b/core/modules/migrate/tests/src/Kernel/Plugin/migrate/source/CSVKernelTest.php

+++ b/core/modules/migrate/tests/src/Kernel/Plugin/migrate/source/CSVKernelTest.php
@@ -0,0 +1,31 @@
+class CSVTest extends KernelTestBase {

I missed this. Class name does not match the filename.

Comment #26

yogeshmpawar

He/Him/His

English

Pune(MH), INDIA 🇮🇳

commented 26 February 2018 at 12:02

Assigned:

Unassigned

» yogeshmpawar

Comment #27

yogeshmpawar

He/Him/His

English

Pune(MH), INDIA 🇮🇳

commented 26 February 2018 at 12:06

Assigned:	yogeshmpawar	» Unassigned
Status:	Needs work	» Needs review

Status	File	Size
new	2931739-27.patch	24.81 KB
new	interdiff-2931739-23-27.txt	536 bytes

Added Class name which matches with the filename & also added an interdiff.

Comment #28

26 February 2018 at 12:54

Status:

Needs review

» Needs work

The last submitted patch, 27: 2931739-27.patch, failed testing. View results
- codesniffer_fixes.patch Interdiff of automated coding standards fixes only.

Comment #29

rakesh.gectcr

he/him

English

Manchester

commented 21 March 2018 at 11:51

Status:

Needs work

» Needs review

Status	File	Size
new	2931739-29.patch	24.75 KB
new	interdiff-2931739-27-29.txt	1.71 KB

Changed the Place of test to Drupal\Tests\migrate\Kernel\source, lets see how the test goes

Comment #30

mglaman

English

WI, USA

commented 21 March 2018 at 12:45

--- b/core/modules/migrate/tests/src/Kernel/Plugin/migrate/source/CSVKernelTest.php
...
+++ b/core/modules/migrate/tests/src/Kernel/source/CSVKernelTest.php
...
+class CSVKernalTest extends KernelTestBase {

CSVKernel.

This will fail.

Comment #31

21 March 2018 at 12:49

Status:

Needs review

» Needs work

The last submitted patch, 29: 2931739-29.patch, failed testing. View results
- codesniffer_fixes.patch Interdiff of automated coding standards fixes only.

Comment #32

rakesh.gectcr

he/him

English

Manchester

commented 21 March 2018 at 13:48

Status:

Needs work

» Needs review

Status	File	Size
new	2931739-33.patch	24.78 KB

Lets hit again... :)

Comment #33

21 March 2018 at 14:44

Status:

Needs review

» Needs work

The last submitted patch, 32: 2931739-33.patch, failed testing. View results
- codesniffer_fixes.patch Interdiff of automated coding standards fixes only.

Comment #34

mglaman

English

WI, USA

commented 21 March 2018 at 14:47

+++ b/core/modules/migrate/config/schema/migrate.source.schema.yml
@@ -0,0 +1,31 @@
+    path:

+++ b/core/modules/migrate/tests/src/Kernel/Plugin/source/CSVKernelTest.php
@@ -0,0 +1,31 @@
+class CSVKernalTest extends KernelTestBase {

CSVKernelTest

Not Kernal

Comment #35

rakesh.gectcr

he/him

English

Manchester

commented 21 March 2018 at 15:04

Status:

Needs work

» Needs review

Status	File	Size
new	2931739-35.patch	24.78 KB
new	interdiff-2931739-33-35.txt	471 bytes

:) Thanks Matt....

Comment #36

heddn

English

Nicaragua

commented 5 April 2018 at 17:09

I'm not sure how important to block things on #16.

I did some quick design work on what we'd need to do to support the fetch/parser method this morning and it is significantly different than what is being done in migrate_source_csv today.

So, do we need to be able to support fetching the CSV from a non-local endpoint, say an HTTP request? Do need to be able to support multiple CSV URIs? If the answer is no, then we don't need fetchers/parsers. But I feel like we should have those things.

With that in mind, here's some design:

A new URI source plugin.
The parser can define a default fetcher. For CSVs that should be something akin to file_copy since file objects require a local file.
The current CSV class code becomes the parser.
Other possible fetchers are http and file (file doesn't copy, it just uses php file streams).
The parser would mask the presence of the fetcher as much as possible. But would pass along config options to the fetcher.
The fetcher interface in migrate_plus is pretty messy right now and assumes an HTTP resource. I think it should be less restrictive and return a \SplFileObject of some sort. Then if the parser needs to read in a file file into a json or xml parser, use $file->fread($file->getSize())

Comment #37

heddn

English

Nicaragua

commented 5 April 2018 at 17:44

Here's a summary of a conversation in Slack w/ @mglaman and @phenaproxima:

phenaproxima: we should have fetchers but not add any extra overhead or complexity from fetchers.
mglaman: having local files is enough for commerce, we used something outside of drupal to fetch the files local
heddn: leave it to something outside of Drupal to retrieve the files locally. Or have contrib build that feature.

So, I'm going to suggest that having fetchers, while nice if easy... isn't critical. It isn't super easy. So let's implement without them and leave it to contrib or follow-ups in core if we find they are super critical. My $.02: fetchers won't become critical, because source csv in contrib became a top 100 module without it and it wasn't a common support request in the issue queue.

Comment #38

heddn

English

Nicaragua

commented 5 April 2018 at 18:29

Status	File	Size
new	2931739-38.patch	44.8 KB
new	interdiff_35-38.txt	33.63 KB

9 files were hidden/shown/deleted

Status	File	Size
hidden	2931739-20.patch	24.79 KB
hidden	2931739-23.patch	24.8 KB
hidden	2931739-27.patch	24.81 KB
hidden	interdiff-2931739-23-27.txt	536 bytes
hidden	2931739-29.patch	24.75 KB
hidden	interdiff-2931739-27-29.txt	1.71 KB
hidden	2931739-33.patch	24.78 KB
hidden	2931739-35.patch	24.78 KB
hidden	interdiff-2931739-33-35.txt	471 bytes

So, in the last few weeks a fair number of patches landed upstream that added test coverage and various fixes. Here's an update for all them.

Note, I removed the schema yml, because that was only there for migrate plus integration.

Next steps is to review the doxygen and see if things fail on the test bot. Doxygen, because I think its a little lacking.

Comment #39

5 April 2018 at 19:39

Status:

Needs review

» Needs work

The last submitted patch, 38: 2931739-38.patch, failed testing. View results
- codesniffer_fixes.patch Interdiff of automated coding standards fixes only.

Comment #40

heddn

English

Nicaragua

commented 5 April 2018 at 20:58

Status:

Needs work

» Needs review

Status	File	Size
new	2931739-40.patch	47.89 KB
new	interdiff_38-40.txt	10.16 KB

2 files were hidden/shown/deleted

Status	File	Size
hidden	2931739-38.patch	44.8 KB
hidden	interdiff_35-38.txt	33.63 KB

OK, doxygen added. PHPCS stuff fixed and hopefully test failures too?

Except for the 80 char column wrapping on the doxygen. I didn't do that yet while we are discussing the language. I find its really hard to wordsmith when I'm constantly fixing the 80 chars on comments. Let's land on good docs, then do the line wrap once.

Comment #41

heddn

English

Nicaragua

commented 5 April 2018 at 21:06

Status:

Needs review

» Needs work

Status	File	Size
new	2931739-41.patch	47.9 KB
new	interdiff_40-41.txt	551 bytes

2 files were hidden/shown/deleted

Status	File	Size
hidden	2931739-40.patch	47.89 KB
hidden	interdiff_38-40.txt	10.16 KB

Fixing a copy/paste.

Comment #42

dww

we/he/they

commented 5 April 2018 at 22:08

I love migrate_source_csv. I don't mind installing it from contrib, but if we want to move it into core for core's benefit, that's fine with me.

@heddn asked for comments/wordsmithing on the phpdoc comment for the CSV source class. Here goes:

+ * - header_row_count: (optional) Signifies the number of rows of data in a file until the header row.

... "Defaults to 0." ? Don't we usually say what happens when you don't define something?

Also, "until the header row" is wrong. Maybe this is clearer:

" - header_row_count: (optional) The number of rows of header information in the CSV file before the first data row."

+ * - keys: (optional)

Uhh, is that really optional? I remember migrations blowing up spectacularly if you don't define that correctly.

+ * - column_names: (optional) Numeric key 0-based index of the columns in source CSV file.

~~Ditto about "optional". If you don't define this, you have no way to access any of the source values, right? Seems this is required.~~
Sorry, I didn't realize defining a header_row gives you this automatically! Learn something new every day. We should probably say so right here, not just if you closely read the examples.

Furthermore, the intended input for it is something like this:

  column_names:
    -
      id: Identifier
    -
      first_name: 'First Name'
    -
      last_name: 'Last Name'
    -
      email: 'Email Address'
    -
      ip_address: 'IP Address'
    -
      date_of_birth: 'Date of Birth'

So it's not just a "Numeric key 0-based index of the columns". It's an array with numeric keys starting at 0 where the values are column machine name => column human-readable name associations. Not sure the most concise/elegant way to write that, but I it seems crucial to somehow explain it's expecting a nested array. We see it in the examples a little further down, but we should be clear up here, too.

Furthermore, Do the human readable names *ever* get used anywhere? I haven't see it, but maybe I haven't triggered the appropriate error condition where those would be visible.

Anyway, how about this?

" - column_names: (optional) Index of the columns in the source CSV file. If the file defines a header row, the values in the header row are used as the machine name of each column. If defined explicitly in the migration configuration, each entry in this array must have a numeric key, starting at 0, and the value should be a mapping of machine name to human-readable name for that column.

Or something. :/ Sorry. This is a bit confusing to document clearly.

+ * - delimiter: (optional) The field delimiter (one character only).

Defaults to a comma ','.

Side note: Duh! ;)

Okay, you get the idea. Ditto all the other optional values...

+ * Example:
+ *
+ * @code
+ * source:
+ *   plugin: csv
+ *   path: /tmp/my_file.csv
+ *   header_row_count: 1
+ *   keys:
+ *     - id
+ *
+ * # my_file.csv
+ * id,country
+ * 1,United States
+ * 2,Nicaragua
+ * 3,Spain
+ * @endcode
+ *
+ * This example configures the source plugin with minimal options.
+ *
+ * Description of options:
+ *   path: The file path to the CSV file.
+ *   header_row_count: This is the number of lines into the CSV to find the row that is the "header". If a header row is configured, then setting column_names is not required as the header names become the property names.
+ *   keys: The source property(ies) that uniquely identify the source. It defaults to a string/varchar database column.

Took me a second to realize all the text at the bottom was related to this example, not the next one.

Also: "It defaults to a string/varchar database column." huh?
Oh wow, I've never seen this, either: "Here we define the database schema to be a huge text column." We should fix the docs for "keys" above to talk about this schema stuff, not buried in these examples. I didn't realize that plumbing existed.

Maybe something like this:

+ * Simple example with minimal options:
+ *
+ * @code
+ * source:
+ *   plugin: csv
+ *   path: /tmp/my_file.csv
+ *   header_row_count: 1
+ *   keys:
+ *     - id
+ *
+ * # my_file.csv
+ * id,country
+ * 1,United States
+ * 2,Nicaragua
+ * 3,Spain
+ * @endcode
+ *
+ * Description of options:
+ *   path: The file path to the CSV file.
+ *   header_row_count: Set to 1 to indicate a header row in the CSV file that names the columns. Since there's a header row, setting column_names is not required as the header names become the property names.
+ *   keys: The source property that uniquely identifies each row.

Here's a proposal for the next example block:

+ * Advanced example with most options configured:
+ *
+ * @code
+ * source:
+ *   plugin: csv
+ *   path: /tmp/my_file.csv
+ *   keys:
+ *     id:
+ *       type: text
+ *       size: big
+ *   column_names:
+ *     -
+ *       id: Identifier
+ *     -
+ *       country: 'Country'
+ *   file_class: 'Drupal\example\MyCSVFileObject'
+ *   delimiter: '|'
+ *   enclosure: '%'
+ *   escape: '`'
+ *   file_flags: 14
+ *
+ * # my_file.csv
+ * %one%|%United States%
+ * %two%|%Nicaragua%
+ * %three%|%Spain%
+ * @endcode
+ *
+ * Description of options:
+ *   keys: The source property(ies) that uniquely identify the source. Here we define the database schema to be a huge text column.
+ *   column_names: These become the names of the source properties. It is needed because the CSV file has no header information and we therefore could not define header_row_count.
+ *   file_class: The file object is overridable.
+ *   delimiter, enclosure, escape  are all standard arguments to the SplFileObject to override default CSV reading functionality. @see http://php.net/manual/en/splfileobject.setcsvcontrol.php.
+ *   file_flags is a bitmask. In this example it does not break each row at a new line because it does not contain \SplFileObject::DROP_NEW_LINE. @see http://php.net/manual/en/splfileobject.setflags.php
+ *

I removed the bogus header row from the example my_file.csv here, since that would break the example migration, and it's just confusing.

Also, it seems valuable to have the advanced example show something with multiple columns as keys, since that'd be a more realistic example.

Meta: I should have been editing all this in emacs and posting a new patch, not trying to write it here. :/

Okay, I'm going to stop here for now and check back in with @heddn if this is actually helpful...

Comment #43

dww

we/he/they

commented 5 April 2018 at 22:22

Status	File	Size
new	2931739-43.migrate_source_csv.patch	48.17 KB
new	2931739-43.interdiff.txt	4.27 KB

Sorry, here's #42 as a patch and an interdiff. Much better.

Comment #44

jofitz

he/him

commented 6 April 2018 at 11:04

Status:

Needs work

» Needs review

Setting to Needs Review to run the testbot.

Comment #45

6 April 2018 at 11:35

Status:

Needs review

» Needs work

The last submitted patch, 43: 2931739-43.migrate_source_csv.patch, failed testing. View results

Comment #46

jofitz

he/him

commented 6 April 2018 at 14:51

Status:

Needs work

» Needs review

Status	File	Size
new	2931739-46.patch	48.13 KB
new	interdiff-43-46.txt	3.22 KB

Corrected coding standards errors.
Corrected typo that was causing CSVSourceYieldTest to fail.
Unable to replicate the other 2 test failures - hoping the other changes might work some magic.

Comment #47

heddn

English

Nicaragua

commented 6 April 2018 at 15:39

Status	File	Size
new	2931739-47.patch	48.41 KB
new	interdiff_46-47.txt	5.49 KB

6 files were hidden/shown/deleted

Status	File	Size
hidden	2931739-41.patch	47.9 KB
hidden	interdiff_40-41.txt	551 bytes
hidden	2931739-43.migrate_source_csv.patch	48.17 KB
hidden	2931739-43.interdiff.txt	4.27 KB
hidden	2931739-46.patch	48.13 KB
hidden	interdiff-43-46.txt	3.22 KB

Improvements on docs.

Comment #48

heddn

English

Nicaragua

commented 6 April 2018 at 15:55

Status	File	Size
new	intrdiff_47-48.txt	5.85 KB
new	2931739-48.patch	48.75 KB

2 files were hidden/shown/deleted

Status	File	Size
hidden	2931739-47.patch	48.41 KB
hidden	interdiff_46-47.txt	5.49 KB

And a few more improvements. We're close enough, I formatted them to 80 chars.

Comment #49

6 April 2018 at 16:27

The last submitted patch, 46: 2931739-46.patch, failed testing. View results

Comment #50

dww

we/he/they

commented 6 April 2018 at 16:28

Status	File	Size
new	2931739-49.migrate_source_csv.patch	49.1 KB
new	2931739-49.interdiff.txt	5.2 KB

2 files were hidden/shown/deleted

Status	File	Size
hidden	intrdiff_47-48.txt	5.85 KB
hidden	2931739-48.patch	48.75 KB

Even more better PHPDocs. ;)

Note: I added a @todo about file_class:

*   @todo We should have an interface that this class must implement.

Not sure if we should do that right now as part of this patch, or handle it in a follow-up. But it seems that if we let people declare their own class, we should enforce an interface their class must implement if it's going to work for this.

Comment #51

6 April 2018 at 17:30

The last submitted patch, 48: 2931739-48.patch, failed testing. View results

Comment #52

6 April 2018 at 17:31

The last submitted patch, 47: 2931739-47.patch, failed testing. View results

Comment #53

6 April 2018 at 17:31

The last submitted patch, 50: 2931739-49.migrate_source_csv.patch, failed testing. View results

Comment #54

heddn

English

Nicaragua

commented 6 April 2018 at 17:32

+++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
@@ -16,27 +16,31 @@
+ *   the machine names of each column. If defined explicitly in the migration

Any values explicitly defined in column_names will override...

+++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
@@ -16,27 +16,31 @@
+ *   alternative CSV file reader. This class should extend \SplFile.

For now, it really should extend CSVFileObject.

+++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
@@ -16,27 +16,31 @@
+ * - delimiter: (optional) The field delimiter (one character only). Defaults
+ *   to a comma (,).

Why this change?

Comment #55

dww

we/he/they

commented 6 April 2018 at 19:06

1. Sure.

2. Sounds good (for now). Thoughts on if we want to provide (and enforce) an interface as part of this patch, or move that to a followup?

3. Because of 80 character wrapping.

Cheers,
-Derek

Comment #56

heddn

English

Nicaragua

commented 6 April 2018 at 19:19

Status	File	Size
new	interdiff_50-55.txt	12.51 KB
new	2931739-55.patch	49.71 KB

2 files were hidden/shown/deleted

Status	File	Size
hidden	2931739-49.migrate_source_csv.patch	49.1 KB
hidden	2931739-49.interdiff.txt	5.2 KB

Hopefully this fixes up the last few tests. More docs tweaks.

Comment #57

6 April 2018 at 20:11

Status:

Needs review

» Needs work

The last submitted patch, 56: 2931739-55.patch, failed testing. View results

Comment #58

cosmicdreams commented 6 April 2018 at 20:21

OK, to do a code standards pass? Couple things I'm seeing.

Comment #59

heddn

English

Nicaragua

commented 6 April 2018 at 22:25

Status:

Needs work

» Needs review

Status	File	Size
new	interdiff_55-59.txt	1.45 KB
new	2931739-59.patch	50.76 KB

2 files were hidden/shown/deleted

Status	File	Size
hidden	interdiff_50-55.txt	12.51 KB
hidden	2931739-55.patch	49.71 KB

Oops, let's add the interface too. This should be ready for any types of review.

Comment #60

dww

we/he/they

commented 6 April 2018 at 22:54

Status:

Needs review

» Needs work

Great, thanks for adding the interface!

Now we need:

a) CSVFileObject should say it implements it.

b) The docs for file_class in CSV.php should reference it.

c) [optional] Something (?) should enforce that whatever you put in file_class implements it. There was talk of the CSV constructor throwing an exception.

Comment #61

dww

we/he/they

commented 6 April 2018 at 23:41

Assigned:

Unassigned

» dww

I'm going to work on my list from #60, and maybe fix the test failures, too. Stay tuned.

Comment #62

dww

we/he/they

commented 7 April 2018 at 00:01

Assigned:	dww	» Unassigned
Status:	Needs work	» Needs review

Status	File	Size
new	2931739-62.patch	50.9 KB
new	2931739-interdiff-59_62.txt	2.84 KB

2 files were hidden/shown/deleted

Status	File	Size
hidden	interdiff_55-59.txt	1.45 KB
hidden	2931739-59.patch	50.76 KB

Whoops, somehow I missed that A and C were already done when I skimmed the interdiff. Sorry about that. ;) Edit: Oh, because those changes were not included in the interdiff.

So here's a patch for B.

I briefly looked at the tests, and it's not immediately obvious why they're failing. :/ My local setup isn't currently running tests, so I'll leave that for someone else to sort out.

@heddn: Can you check your editor wrapping settings? I'm working in an emacs window exactly 80 chars wide, and your stuff keeps extending off the screen. That's why the interdiff is larger than it needs to be, since I keep re-fixing the 80 char wrapping. :/ Please either fix your editor's wrapping, or turn it off so it preserves what I'm uploading.

Thanks,
-Derek

Comment #63

dww

we/he/they

commented 7 April 2018 at 00:10

p.s. @cosmicdreams: Sorry I missed your comment. I'm done with this for now. I'll assign to myself if I'm working on it again. Meanwhile, feel free to assign this to yourself if you want to work on something. Thanks.

Comment #64

7 April 2018 at 01:50

The last submitted patch, 62: 2931739-62.patch, failed testing. View results
- codesniffer_fixes.patch Interdiff of automated coding standards fixes only.

Comment #65

dww

we/he/they

commented 7 April 2018 at 02:09

Status	File	Size
new	2931739-65.patch	50.89 KB
new	2931739-interdiff-59_65.txt	1.12 KB

Ugh, sorry for the noise. In Slack @heddn convinced me my setup was wonky. I double-checked, and lo, my fill-column was actually set to 78.

Here's the fixes from #62, wrapped @ 80, not 78.

Still not sure WTF is causing those tests to fail. I haven't looked. No sense having the bot re-test this.

Comment #66

heddn

English

Nicaragua

commented 7 April 2018 at 02:36

Status	File	Size
new	2931739-66.patch	50.93 KB
new	interdiff_65-66.txt	3.01 KB

Thank goodness for tests. This should fix the final failures. Any chance we an upgrade our doctrine annotation parsing any time soon to get rid of this ugly work around?

Comment #67

quietone commented 9 April 2018 at 11:10

+++ b/core/modules/migrate/src/CSVFileObject.php
@@ -0,0 +1,103 @@
+ * - assume CSV format
+ * - skip header rows on rewind()
+ * - address columns by header row name instead of index

Pretty sure these are sentences, and need capitalization and a full stop.

+++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
@@ -0,0 +1,321 @@
+ *   the file defines a header row, the values in the header row are used as

s/defines/includes/ ?

+++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
@@ -0,0 +1,321 @@
+ * Simple example with minimal options:

No need for Simple and Advanced in the next example. Just 'Example with minimal options' says it all.

```
+++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
@@ -0,0 +1,321 @@
+ *   - path: The file path to the CSV file.
```
The description of path is verbatim from above and not adding value, in my opinion. I'd prefer a paragraph format explaining what is happening here.

+++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
@@ -0,0 +1,321 @@
+ * Advanced example with most options configured:

Advanced can be removed.

+++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
@@ -0,0 +1,321 @@
+ * Description of options:

As before, an explanation versus description would be better, I think.

Comment #68

rakesh.gectcr

he/him

English

Manchester

commented 9 April 2018 at 12:14

Status:

Needs review

» Needs work

Comment #69

heddn

English

Nicaragua

commented 9 April 2018 at 13:59

Assigned:

Unassigned

» heddn

Comment #70

heddn

English

Nicaragua

commented 9 April 2018 at 14:18

Assigned:	heddn	» Unassigned
Status:	Needs work	» Needs review

Status	File	Size
new	2931739-70.patch	50.89 KB
new	interdiff_66-70.txt	4.03 KB

6 files were hidden/shown/deleted

Status	File	Size
hidden	2931739-62.patch	50.9 KB
hidden	2931739-interdiff-59_62.txt	2.84 KB
hidden	2931739-65.patch	50.89 KB
hidden	2931739-interdiff-59_65.txt	1.12 KB
hidden	2931739-66.patch	50.93 KB
hidden	interdiff_65-66.txt	3.01 KB

Comment #71

jofitz

he/him

commented 10 April 2018 at 10:48

Status:

Needs review

» Needs work

The patch in #70 does not address #67.2 (this may or may not be intentional).

Comment #72

dww

we/he/they

commented 10 April 2018 at 16:16

Status:

Needs work

» Needs review

67.2 was a ? ~~comment~~ review concern, and I don't think it helps improve the meaning of the code comment. A file has to define values in its header row for it to matter and be useful. It's not like you simply include "#header row" in your file, you have to define what it is by naming all your columns. I dunno, we're seriously nitpicking and bike-shedding now. ;)

Generally, I think the paragraph version makes these comments a bit less useful, since it forces people (with short attention spans) to read an entire block of text if all they want to know is why a specific key is defined. I agree we can skip comments where there's nothing added over the original doc comments, but I'd personally rather see these still in the style of #66, not #70.

That said, I don't want to post another patch if it's going to be a pushing match. ;) If everyone else thinks a paragraph is better, I'll let it go.

Cheers,
-Derek

[edited for clarity]

Comment #73

quietone commented 10 April 2018 at 21:26

Status:

Needs review

» Needs work

There was quite a bit of effort in getting the source, process and destinations plugins in core commented with examples and explanations and to be in a consistent style. Since this is going into core, it too should use the same style. Setting to NW because of that.

Personally, I do prefer lists and I often think in lists. And yet, I do find that having the plain language explanation can be very helpful at times to clarify a point that for some reason I didn't get in the details of the plugin options.

Comment #74

dww

we/he/they

commented 10 April 2018 at 21:57

Status:

Needs work

» Needs review

@quietone: Please look at the patch and/or interdiff. It's already in the style you asked for.

The only reason this was "needs work" was a (IMHO unhelpful) change for s/defines/includes/. This needs review, not work.

Comment #75

quietone commented 17 April 2018 at 09:47

@dww, sorry my last comment does seem out of sequence. Anyway, my concerns from #67 are addressed, including 67.2. So that is done.

What is next here? I don't have time to do a full review but I suggest that the discussion of using a fetcher be put in the IS because someone is going to ask about it.

Comment #76

mglaman

English

WI, USA

commented 17 April 2018 at 14:12

I suggest that the discussion of using a fetcher be put in the IS because someone is going to ask about it.

When I read Fetcher I think of a remote source. Which seems to blow up the scope of this plugin source. The current source plugins do not use a fetcher, correct? Or would the fetcher have a "Local" option which just lets sourcing the file from the local disk? A lot of the comments here have seemed to echo the thought: Keep it simple so we can source local CSVs via core alone as quickly as possible.

Comment #77

dww

we/he/they

commented 17 April 2018 at 15:47

Assigned:	Unassigned	» dww
Issue tags:		+Needs issue summary update

I'm working on a major IS update. Stay tuned.

Comment #78

dww

we/he/they

commented 17 April 2018 at 16:01

Assigned:	dww	» Unassigned
Issue summary:	View changes
Status:	Needs review	» Needs work

First pass. Summarizing the fetcher/parser debate is still todo.
Anyone else want to take a turn with that?

Comment #79

heddn

English

Nicaragua

commented 18 April 2018 at 14:17

Issue summary:	View changes
Status:	Needs work	» Needs review

IS updated to discuss fetchers and follow-up (#2962091: Adopt fetchers/parsers logic for use by source csv plugin) opened to continue that discussion.

Back to NR.

Comment #80

dww

we/he/they

commented 18 April 2018 at 15:41

Issue summary:	View changes
Issue tags:	-Needs issue summary update

Cool, thanks for opening the follow-up. I did a few other minor edits to the summary, and added this as a remaining task:

Decide if we're okay with a straight port (done), or if we need to hash out all the fetcher/parser stuff first, figure out the design, class names, avoid conflicts, possible BC hell, etc.

Generally, I'm fine with getting this in sooner and leaving #2962091: Adopt fetchers/parsers logic for use by source csv plugin postponed for now. However, if we foresee doing the whole fetcher/parser system, "csv" makes a lot more sense as a parser plugin name than it does as a source plugin. Whenever the dust settles on #2962091, are we going to be stuck with some BC hell while we try to rename classes? Or is the vision that the parser plugin namespace will be distinct, and it won't matter, and folks who have "source: plugin: csv" in their migration will continue to be supported somehow? Yes, I understand plans can change once you actually start working on something, but I'd love clarity on why committing this patch, right now, as-is, isn't going to make our lives harder in the future.

Echoing @mglaman in comment #12, perhaps the plugin_id for this should be called "file_csv" for now, instead of parking on "csv" itself?

Regardless, removing 'Needs issue summary update' tag.

Comment #81

heddn

English

Nicaragua

commented 18 April 2018 at 15:56

re #80: What @phenaproxima and I had discussed was something that wouldn't be BC hell. By sticking with the same source plugin name; and a straight port, folks would just need to disable contrib source csv module. Because the plugin in core has the same one-for-one features, even the same name.

Renaming and would cause more issues.

Comment #82

dww

we/he/they

commented 18 April 2018 at 17:12

Sorry, you misunderstand my concern in #80. I already summarized #81 in the edit to the summary I did in #78:

API Changes

...

Existing users of this plugin from contrib are unharmed. The name is the same, but we haven't changed anything about how it works. So, contrib users simply disable migrate_source_csv module before upgrading to core 8.X.Y and none of their migration files have to be changed to continue working.

What I'm talking about in #80 is that N months from now, once this is in core, people like me that are relying on CSV migrations will have happily seen our contrib-driven CSV migrations working with CSV in core without modification. However, at some point, #2962091: Adopt fetchers/parsers logic for use by source csv plugin is going to introduce a slew of new classes, plugin ids, etc. Someone is going to say "The 'csv' id is all wrong for a source plugin! That's really a parser. Elegance++" People like me are going to say: "screw you, my migrations are already working, please don't re-re-break them!" Fight ensues. ;)

That's the situation I'm trying to mitigate by treading a bit more carefully now...

Maybe we can publicly agree, right here, that we're not going to do that in #2962091, commit "ourselves" to working around the issue that 'csv' is already taken as a source plugin id, and whatever happens in there is going to jump through a few more hoops to keep it working. That's all I'm saying.

Are we cool with that? If so, I'm going to copy this text into #2962091 to make it clear whatever refactoring happens there has to keep this working.

If we're not cool with that, please say so now.

Thanks,
-Derek

Comment #83

heddn

English

Nicaragua

commented 18 April 2018 at 17:31

Migrate module is stable. Therefore, once this lands it would have to abide by BC policy according to https://www.drupal.org/core/d8-bc-policy. So, when the rubber meets the road and we want fetchers, can we do it in a BC manner and keep everyone happy? I'm pretty sure we can.

The fetcher would just need to know what parser to use. And this existing CSV parser would just start /also/ being a parser. The fetcher would retrieve the file and put it in the location that the csv plugin asks for it to be. The code that makes all this magic happen might be ugly, but csv and users of these new fetchers wouldn't need to see the ugliness.

Comment #84

dww

we/he/they

commented 18 April 2018 at 20:47

Issue summary:

View changes

Great. You're now on record saying so, both here and in #2962091: Adopt fetchers/parsers logic for use by source csv plugin. ;) Updated both summaries accordingly.

No more remaining tasks, other than probably a thorough review and RTBC from someone who hasn't yet put their hands all over this code...

Thanks!
-Derek

Comment #85

heddn

English

Nicaragua

commented 19 April 2018 at 14:08

Priority:

Normal

» Major

Seeing this is on the roadmap for 8.6 for both migrate and OOB, let's bump to a major.

Comment #86

quietone commented 26 April 2018 at 07:41

Thanks for the updates to the IS, very thorough. I haven't done a full review, mostly because I can't RTBC this anyway since I worked on it in migrate_source_csv.

Comment #87

phenaproxima

he/him

English

Massachusetts

commented 3 May 2018 at 14:56

Status:

Needs review

» Needs work

It's getting there!

```
+++ b/core/modules/migrate/src/CSVFileInterface.php
@@ -0,0 +1,42 @@
+/**
+ * Defines a CSV file interface.
+ */
```
Can we expand this doc comment to explain what constitutes a "CSV file", as defined by this interface? (i.e., it has header row(s) and named columns) If not now, then we need a follow-up to document this better.

+++ b/core/modules/migrate/src/CSVFileInterface.php
@@ -0,0 +1,42 @@
+   * Number of header rows.

Should be "Returns the number of header rows".

+++ b/core/modules/migrate/src/CSVFileInterface.php
@@ -0,0 +1,42 @@
+   *   Get the number of header rows, zero if no header row.

"...zero if there are no header rows."

+++ b/core/modules/migrate/src/CSVFileInterface.php
@@ -0,0 +1,42 @@
+  /**
+   * Number of header rows.
+   *
+   * @param int $header_row_count
+   *   Set the number of header rows, zero if no header row.
+   */
+  public function setHeaderRowCount($header_row_count);

Let's rephrase these similarly.

+++ b/core/modules/migrate/src/CSVFileInterface.php
@@ -0,0 +1,42 @@
+  /**
+   * CSV column names.
+   *
+   * @return array
+   *   Get CSV column names.
+   */
+  public function getColumnNames();
+
+  /**
+   * CSV column names.
+   *
+   * @param array $column_names
+   *   Set CSV column names.
+   */
+  public function setColumnNames(array $column_names);

The doc comments here could also use some fine-tuning and better type hints (string[] rather than array).

```
+++ b/core/modules/migrate/src/CSVFileObject.php
@@ -0,0 +1,102 @@
+use Drupal\migrate\CSVFileInterface;
```
I don't think we need this use statement, since CSVFileObject and CSVFileInterface are in the same namespace. Also...should these not be in the Drupal\migrate\Plugin\migrate\source namespace? They are source-specific, after all.
```
+++ b/core/modules/migrate/src/CSVFileObject.php
@@ -0,0 +1,102 @@
+/**
+ * Defines a CSV file object.
```
This doxygen isn't very helpful. Can we expand what a "CSV file object" is? Maybe "a wrapper around CSV data in a file"?

+++ b/core/modules/migrate/src/CSVFileObject.php
@@ -0,0 +1,102 @@
+ * - Address columns by header row name instead of index.

numeric index

+++ b/core/modules/migrate/src/CSVFileObject.php
@@ -0,0 +1,102 @@
+  /**
+   * The number of rows in the CSV file before the data starts.
+   *
+   * @var integer
+   */
+  protected $headerRowCount = 0;

I wonder if 1 might not be a more sane default here. I imagine that most CSV files have a header row (correct me if I'm wrong), so it's a reasonably safe default assumption.

+++ b/core/modules/migrate/src/CSVFileObject.php
@@ -0,0 +1,102 @@
+  /**
+   * The human-readable column headers, keyed by column index in the CSV.
+   *
+   * @var array
+   */

The type hint should be string[].

+++ b/core/modules/migrate/src/CSVFileObject.php
@@ -0,0 +1,102 @@
+    // Necessary to use this approach because SplFileObject doesn't like NULL
+    // arguments passed to it.
+    call_user_func_array(['parent', '__construct'], func_get_args());

How will this prevent NULL arguments from going to SplFileObject?

+++ b/core/modules/migrate/src/CSVFileObject.php
@@ -0,0 +1,102 @@
+      // Set meaningful keys for the columns mentioned in $this->csvColumns.

$this->csvColumns is not a thing.

+++ b/core/modules/migrate/src/CSVFileObject.php
@@ -0,0 +1,102 @@
+  public function setHeaderRowCount($header_row_count) {

Man, I wish we could use scalar type hints.

+++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
@@ -0,0 +1,318 @@
+ * If the CSV file contains non-ASCII characters, make sure it includes a UTF
+ * BOM (Byte Order Marker) so they are interpreted correctly.

Could we cite a link here that explains what a BOM is, and how one might be sure that it is set correctly?

+++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
@@ -0,0 +1,318 @@
+ * - path: Path to the source CSV file available from local storage.

Is this an absolute path, or a relative one?

+++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
@@ -0,0 +1,318 @@
+ * - keys: Array of column name(s) which represent the columns uniquely
+ *   identifying each record. If declared as a nested array, the keys must be

The first sentence is kinda obtuse. Can we just say: "An array of column names in the CSV file. Can be an array of strings, or an array of arrays. If declared as an array of arrays..."

+++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
@@ -0,0 +1,318 @@
+ * - column_names: (optional) Names of the columns in the source CSV file. If
+ *   the file defines a header row, the values in the header row are used as
+ *   the machine names of each column. Any values explicitly defined in
+ *   column_names will override the values from the header row. Each entry in
+ *   the array must have a numeric key. Numbering starts at 0. Each value should
+ *   be a mapping of machine name to human-readable name.

I think we need to rephrase this, because it's over-explaining the functionality in a very confusing way. If I'm understanding it correctly, column_names is a simple key-value map where the values are the column name in header row, and the keys are the corresponding source row property names. A quick example (no need for code) would also help.

+++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
@@ -0,0 +1,318 @@
+ * - file_class: (optional) Full class name that includes namespace for an

Can we rephrase? Maybe something like: "Fully qualified (including the full namespace) name of a class that can read the CSV file."

+++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
@@ -0,0 +1,318 @@
+ * - file_flags: (optional) Bitmask flags for the SplFileObject. Defaults to
+ *   bitmask value of 15.

And what does a bitmask value of 15 mean? We should probably link to external documentation for \SplFileObject to explain further.

+++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
@@ -0,0 +1,318 @@
+ * 1,United States

USA #1? =P

+++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
@@ -0,0 +1,318 @@
+ * The path should be a full path to a file in local storage. Remote file

"in local storage" is weird. How about just "a local file"? Also, let's move this to the explanation of the 'path' configuration key.

+++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
@@ -0,0 +1,318 @@
+ * streams are not supported. The header_row_count indicates the row in the CSV
+ * file that names the columns. Since there is a header_row_count, setting
+ * column_names is redundant and not done. The header defines the source
+ * property names. The keys configuration designates the source property that
+ * uniquely identifies each row.

I think these sentences are both confusing and redundant -- we pretty much explained all of this already.

+++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
@@ -0,0 +1,318 @@
+ *   column_names:
+ *     -
+ *       id: Identifier
+ *     -
+ *       country: Country

Why is this a nested array?

+++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
@@ -0,0 +1,318 @@
+ * In this example of keys we override the default database schema to be a large
+ * text column. The column_names define the names of the source properties. It

"Example of keys" is weird. How about: "In this example, we override the default database schema so that the 'id' column will be a large text column."

+++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
@@ -0,0 +1,318 @@
+  /**
+   * List of available source fields.
+   *
+   * Keys are the field machine names as used in field mappings, values are
+   * descriptions.
+   *
+   * @var array
+   */
+  protected $fields = [];

Are we using this anywhere?

+++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
@@ -0,0 +1,318 @@
+  /**
+   * List of key fields, as indexes.
+   *
+   * @var array
+   */
+  protected $keys = [];

Or this?

+++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
@@ -0,0 +1,318 @@
+    if (empty($this->getConfiguration()['path'])) {
+      throw new MigrateException('You must declare the "path" to the source CSV file in your source settings.');
+    }
+
+    // Key field(s) are required.
+    if (empty($this->getConfiguration()['keys'])) {
+      throw new MigrateException('You must declare "keys" as a unique array of fields in your source settings.');
+    }
+
+    $this->fileClass = $this->getConfiguration()['file_class'];

Let's just call getConfiguration() once, after setConfiguration(), and keep re-using its result.

+++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
@@ -0,0 +1,318 @@
+    if (!file_exists($this->getConfiguration()['path'])) {
+      throw new InvalidPluginDefinitionException($this->getPluginId(), sprintf('File path (%s) does not exist.', $this->getConfiguration()['path']));
+    }
+    // File handler using header-rows-respecting extension of SPLFileObject.
+    $this->file = new $this->fileClass($this->getConfiguration()['path']);

Let's get the path once so we don't have to keep calling getConfiguration(). It'll be easier to read.

+++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
@@ -0,0 +1,318 @@
+    $delimiter = $this->getConfiguration()['delimiter'];
+    $enclosure = $this->getConfiguration()['enclosure'];
+    $escape = $this->getConfiguration()['escape'];

Same here. Calling getConfiguration() repeatedly makes the code harder to read.

+++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
@@ -0,0 +1,318 @@
+  public function getIDs() {

This needs to be getIds(), not getIDs().

Comment #88

dww

we/he/they

commented 3 May 2018 at 18:35

@phenaproxima: Thanks for the excellent and thorough review! I don't have time right now to re-roll this, but a few quick replies for whomever does:

9. I wonder if 1 might not be a more sane default here. I imagine that most CSV files have a header row (correct me if I'm wrong), so it's a reasonably safe default assumption.

Please no to this change:
a) I use/see CSV files all the time without a header row.
b) It defaults to 0 in contrib.
c) If we silently change it and get this wrong for existing migrations, we introduce data loss.

15. Is this an absolute path, or a relative one?

Yes. :( Not sure if this is the right spot, but I wanted to know and see people constantly asking about how to make this a relative path to the directory of the module providing a given CSV migration. From what I can gather, here are all the ways people can/do use this:
a) URI like public://import/csv/example.csv and put the CSV file in your public files dir.
b) Absolute path.
c) Relative path from the site root.
d) https://www.drupal.org/project/system_stream_wrapper or #1308152: Add stream wrappers to access .json files in extensions
e) Implement hook_migration_plugins_alter() along the lines of:

function migrate_source_csv_test_migration_plugins_alter(&$definitions) {
  $definitions['migrate_csv']['source']['path'] = drupal_get_path('module', 'migrate_source_csv_test') . $definitions['migrate_csv']['source']['path'];
}

16. The first sentence is kinda obtuse. Can we just say: "An array of column names in the CSV file. Can be an array of strings, or an array of arrays. If declared as an array of arrays..."

No, because that's not what this knob is doing. It's specifying which of your columns define the unique key for the row. Maybe a single column name, or an array of column names that together define the unique key.

17. I think we need to rephrase this, because it's over-explaining the functionality in a very confusing way. If I'm understanding it correctly, column_names is a simple key-value map where the values are the column name in header row, and the keys are the corresponding source row property names. A quick example (no need for code) would also help.

Yeah, I don't love this comment or how this knob works. But no, you're not exactly understanding how it works (further proof it all needs work). The keys of this knob are always a numeric index to specify which column in the CSV you're talking about. The values are either a string (the property name to refer to column #X with in the rest of your migration) or a nested array, where the key is the property name and the value is the (never used anywhere?) "human-readable names" of each column.

This knob is required if you don't have a header row, since it's the only way to map column numbers to property names.

If you have a header row, you probably don't want to define this, but if you do, whatever you put in here will overwrite whatever was declared in the header row.

Unless the "human-readable" name thingy is A Thing(tm) somewhere I'm not aware of, I'd love to kill that aspect of the code and enforce that this knob always has the same structure. Flat array of integer keys and string property name values.

#20 Yes, let's definitely fix that. It's alphabetically (and morally) last, so it should go to the end of the file. ;)

#23: Because of the confusing and fubar nature of #17.

#25: Not that I know of.

#26: Definitely. It's how we know the migration row key.

Otherwise, yes to everything you said.

Thanks again!
-Derek

Comment #89

phenaproxima

he/him

English

Massachusetts

commented 3 May 2018 at 19:27

Hi @dww, thanks for the equally clear and thorough response! I really like the way you explained some of the configuration options, so for whoever works on this patch next...use the comment in #88 to rework some of the documentation :)

Comment #90

dww

we/he/they

commented 3 May 2018 at 20:21

Re: #89: Thanks. :)

Re: #87.17 + #88.17

If we change the structure of keys to always be a flat array, what happens to existing migrations that are (pointlessly) defining human-readable descriptions? I'd rather this source plugin didn't barf on them with exceptions, forcing them to rewrite any of their working migrations. So maybe we have to leave support for the complex nested thing, but somehow flag it as deprecated? If I knew it was going to be caught and nicely handled by default, maybe there's some kind of StopUsingDeprecatedMigrationSourceSettingsException we could throw? ;) Or something?

Short of any of that, I'd be happy for the public-facing docs to pretend this thing must be a flat array, even if the code actually supports both formats.

Comment #91

mikeryan

he/him

English

Pittsfield, MA, USA

commented 3 May 2018 at 21:09

Unless the "human-readable" name thingy is A Thing(tm) somewhere I'm not aware of, I'd love to kill that aspect of the code and enforce that this knob always has the same structure.

Yes, it is - the descriptions are returned as the values of the source plugin's fields() method (machine names are the keys) and are displayed in migrate_tools by the drush migrate-fields-source command as well as displayed in the UI, allowing you to document fields with something a little more helpful than "id".

Comment #92

jofitz

he/him

commented 14 May 2018 at 12:05

Status:

Needs work

» Needs review

Status	File	Size
new	2931739-92.patch	50.73 KB
new	interdiff-70-92.txt	10.24 KB

Addressed all of @phenaproxima's comment's in #87 except for:
9. Based upon @dww's response in #88.
11.

How will this prevent NULL arguments from going to SplFileObject?

16. Based upon @dww's response in #88.
17. Based upon @dww's response in #88.
23. Based upon @dww's response in #88.

Comment #93

14 May 2018 at 13:07

Status:

Needs review

» Needs work

The last submitted patch, 92: 2931739-92.patch, failed testing. View results

Comment #94

dww

we/he/they

commented 14 May 2018 at 17:29

Re: #87.25: As @mikeryan said in #91, that's A Thing(tm). Behold:

interface MigrateSourceInterface extends \Countable, \Iterator, PluginInspectionInterface {

  /**
   * Returns available fields on the source.
   *
   * @return array
   *   Available fields in the source, keys are the field machine names as used
   *   in field mappings, values are descriptions.
   */
  public function fields();

So yeah, #92's patch and interdiff is incorrectly removing that and we need to keep it.

Test failures look like disagreement over the location of the CSVFileObject in the namespace. I agree it could live in \Drupal\migrate\Plugin\migrate\source instead of directly in \Drupal\migrate, but we have to get all parts of the code and tests to agree on that. ;)

Cheers,
-Derek

Comment #95

jofitz

he/him

commented 14 May 2018 at 19:49

Status:

Needs work

» Needs review

Status	File	Size
new	2931739-94.patch	50.73 KB

Fixed the test failures.

Comment #96

heddn

English

Nicaragua

commented 16 May 2018 at 17:05

Status	File	Size
new	interdiff_95-96.txt	3.34 KB
new	2931739-96.patch	50.44 KB

5 files were hidden/shown/deleted

Status	File	Size
hidden	2931739-70.patch	50.89 KB
hidden	interdiff_66-70.txt	4.03 KB
hidden	2931739-92.patch	50.73 KB
hidden	interdiff-70-92.txt	10.24 KB
hidden	2931739-94.patch	50.73 KB

All feedback address now from #87. Between the awesome work from dww and Jo Fitzgerald. Along with a small nit I have with keeping consistent variable naming. This is ready again for another round of reviews.

For #87.11, this was done when originally we were setting the flags in the file constructor. That was removed and the constructor just left. I've cleaned that up in this latest patch.

Comment #97

heddn

English

Nicaragua

commented 16 May 2018 at 18:47

Just noticed this:

+++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
@@ -0,0 +1,305 @@
+ * 1,Nicaragua
+ * 2,Spain
+ * 3,United States
...
+ * %really long string that makes this unique%|%United States%
+ * %even longer really long string that makes this unique%|%Nicaragua%
+ * %even more longer really long string that makes this unique%|%Spain%

Nit: let's order these both by country alpha. Leaving at NR for other feedback.

Comment #98

smaz

English

England, UK

commented 24 May 2018 at 15:22

diff --git a/core/modules/migrate/tests/modules/migrate_source_csv_test/artifacts/people.csv b/core/modules/migrate/tests/modules/migrate_source_csv_test/artifacts/people.csv
new file mode 100644

Do we need a .htaccess here to prevent direct web access to any CSV files used for testing? We have one in the Umami profile for the default content:
https://cgit.drupalcode.org/drupal/tree/core/profiles/demo_umami/modules...

Comment #99

mglaman

English

WI, USA

commented 24 May 2018 at 15:38

RE: #98. What if the server is nginx? It won't make a difference at that point. I don't see the benefit, unless there's some history in the thread which added it for Umami.

Comment #100

phenaproxima

he/him

English

Massachusetts

commented 24 May 2018 at 15:42

Preventing access to the CSV file(s) is not in scope for this issue. It's worth looking into and talking about, at least, but let's open a follow-up issue for that if it's needed.

Comment #101

eli-t

he/him

English

Manchester

commented 9 June 2018 at 08:23

Linking #2941488: Add a .htaccess file to core/profiles/demo_umami/modules/demo_umami_content/default_content/images to prevent direct downloading of images for context, where .htaccess was added for Umami images. Note this was a follow up issue to the issue that introduced the files being protected.

Comment #102

quietone commented 11 June 2018 at 12:36

Added a follow up #2978827: Add a .htaccess file to prevent downloading of csv test files to decide on adding a .htaccess file.

Comment #103

phenaproxima

he/him

English

Massachusetts

commented 15 June 2018 at 23:50

Status:

Needs review

» Needs work

Looks good and pretty logical, but I found a number of relatively minor things. Mostly documentation stuff.

+++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
@@ -0,0 +1,305 @@
+ *   the CSV file before the first data row. If set, the values in this row are
+ *   used as the source property names for each column in the file. If there are

Shouldn't this be "...source property names for each data row in the file"?

+++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
@@ -0,0 +1,305 @@
+ *   multiple header rows, only the final row before the data is used for column

Can we rephrase this: "If there are multiple header rows, the column names are taken from the final one."

+++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
@@ -0,0 +1,305 @@
+ *   the file defines a header row, the values in the header row are used as
+ *   the machine names of each column. Any values explicitly defined in

There has been no previous mention of machine names. We need to explain what a machine name is in this context, and how it's used.

```
+++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
@@ -0,0 +1,305 @@
+ *   namespace) that can read the CSV file. The class must extend \SplFileObject
+ *   and implement \Drupal\migrate\CSVFileInterface.
```
I wonder -- given the way this works, do we actually need CsvFileInterface? Why not just create a new class which extends \SplFileObject with the appropriate methods, and enforce that file_class is an instance/descendant of that?

+++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
@@ -0,0 +1,305 @@
+ * - file_flags: (optional) Bitmask flags for the SplFileObject. Defaults to
+ *   bitmask value of 15.

'15' tells me nothing. Are there constants we could use here? Like

FLAG_ONE | FLAG_TWO<code>?
</li>

<li>
<code>
+++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
@@ -0,0 +1,305 @@
+ *   keys:
+ *     - id

Let's add a comment explaining that this will create a single-column text key, using the value from the 'id' column of the CSV file.

```
+++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
@@ -0,0 +1,305 @@
+ *   enclosure: '%'
```
Using % as an enclosure is really jarring, and I can't imagine it'd be very common. Let's just use a single quote instead.

+++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
@@ -0,0 +1,305 @@
+ *   escape: '`'

Can we add a usage of this character to the example data, so as to illustrate what it's for?

+++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
@@ -0,0 +1,305 @@
+ *   file_flags: 14

We need a comment explaining what this flag will do.

+++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
@@ -0,0 +1,305 @@
+ * reading functionality. Configuration option file_flags is a bitmask. In this

We already explained that file_flags is a bitmask, so we can remove this sentence.

+++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
@@ -0,0 +1,305 @@
+    if (empty($plugin_configuration['path'])) {
+      throw new MigrateException('You must declare the "path" to the source CSV file in your source settings.');
+    }

Why not use file_exists() here?

+++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
@@ -0,0 +1,305 @@
+    if (empty($plugin_configuration['keys'])) {
+      throw new MigrateException('You must declare "keys" as a unique array of fields in your source settings.');
+    }

This should check if $plugin_configuration['keys'] is an empty array.

+++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
@@ -0,0 +1,305 @@
+    }
+
+  }

Nit: There's an extra blank line here.

```
+++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
@@ -0,0 +1,305 @@
+    $path = $this->getConfiguration()['path'];
```
Since we do this at least twice, how about we instantiate the file reader class in __construct(), then call a method on it to get the path instead?
```
+++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
@@ -0,0 +1,305 @@
+  /**
+   * @return \SplFileObject
+   */
+  protected function setupFile() {
```
Doc comment needs to be expanded. Alternatively, if we move the logic from initializeIterator() into the constructor, we can just move all of this method's logic into initializeIterator(), where it makes more sense (IMHO).

+++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
@@ -0,0 +1,305 @@
+      else {
+        $ids[$value]['type'] = 'string';
+      }

The doc comment says that this will default to text, not string. Let's make it consistent.

+++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
@@ -0,0 +1,305 @@
+    if (!$this->file) {
+      $this->initializeIterator();
+    }
+    foreach ($this->file->getColumnNames() as $column) {

We can reduce this to foreach ($this->initializeIterator()->getColumnNames() as $column).

+++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
@@ -0,0 +1,305 @@
+    $fields = $this->getConfiguration()['fields'] + $fields;
+
+    return $fields;

We don't need to reassign $fields, just return $this->getConfiguration()['fields'] + $fields.

+++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
@@ -0,0 +1,305 @@
+  /**
+   * {@inheritdoc}
+   */
+  public function getConfiguration() {
+    return $this->configuration;
+  }

Do we not inherit this from SourcePluginBase() already?

+++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
@@ -0,0 +1,305 @@
+  /**
+   * Gets the file object.
+   *
+   * @return \SplFileObject
+   *   The file object.
+   */
+  public function getFile() {
+    return $this->file;
+  }

I'm not sure we need this method. Why can't calling code simply use this plugin as an iterator, like all other source plugins? If it's to maintain BC, we should deprecate this right now.

+++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
@@ -0,0 +1,305 @@
+  /**
+   * {@inheritdoc}
+   */
+  public function setConfiguration(array $configuration) {
+    // We must preserve integer keys for column_name mapping.
+    $this->configuration = NestedArray::mergeDeepArray([$this->defaultConfiguration(), $configuration], TRUE);
+  }

Correct me if I'm wrong, but can't we use ConfigurablePluginTrait for this?

+++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
@@ -0,0 +1,305 @@
+      'file_class' => 'Drupal\migrate\Plugin\migrate\source\CSVFileObject',

We can use the ::class form here.

+++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
@@ -0,0 +1,305 @@
+  /**
+   * {@inheritdoc}
+   */
+  final public function calculateDependencies() {
+    return [];
+  }

Making this final is a good idea!

Comment #104

heddn

English

Nicaragua

commented 16 June 2018 at 00:38

I'll see when I can respond. But a quick thought is that the second half of this review should result in better code comments. Not just blindly changing the code. I'm not able to go into more detail.

Comment #105

smaz

English

England, UK

commented 4 July 2018 at 19:13

@heddn ping on your above comment - people are working on migrate + Umami at the moment at devdays so we might be able to get some help on this.

Comment #106

heddn

English

Nicaragua

commented 4 July 2018 at 20:00

Try to answer what you can. But 103.14, 19, 21 (as examples) are there for a reason and we need to just add comments why. I wish my life weren't so busy right now or I'd go into more detail.

Comment #107

heddn

English

Nicaragua

commented 17 July 2018 at 14:32

Status:

Needs work

» Needs review

Status	File	Size
new	interdiff_96-107.txt	8.59 KB
new	2931739-107.patch	50.1 KB

2 files were hidden/shown/deleted

Status	File	Size
hidden	interdiff_95-96.txt	3.34 KB
hidden	2931739-96.patch	50.44 KB

I think I hit most of the above points of feedback. Exceptions are as follows.

103.8: There is already a comment about what this does.
103.18: no, it doesn't implement ConfigurablePluginInterface. That is issue #2937177: Migrate plugin base classes should implement ConfigurablePluginInterface.
103.19: because of #2954413: Add getter for file.
103.20: ConfigurablePluginTrait doesn't exist; maybe you are thinking of the above ^ issue?

Comment #108

17 July 2018 at 15:19

Status:

Needs review

» Needs work

The last submitted patch, 107: 2931739-107.patch, failed testing. View results

Comment #109

17 July 2018 at 15:19

Version:

8.6.x-dev

» 8.7.x-dev

Drupal 8.6.0-alpha1 will be released the week of July 16, 2018, which means new developments and disruptive changes should now be targeted against the 8.7.x-dev branch. For more information see the Drupal 8 minor version schedule and the Allowed changes during the Drupal 8 release cycle.

Comment #110

phenaproxima

he/him

English

Massachusetts

commented 14 August 2018 at 19:33

+++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
@@ -0,0 +1,309 @@
+ * - keys: Array of column name(s) which represent the columns uniquely

"name(s)" should be "names". If there's one item in that array, so be it; but the (s) is jarring here.

```
+++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
@@ -0,0 +1,309 @@
+ *   column names and the values should be sub-arrays that define the database
+ *   column schema to store values for that key. Otherwise, defaults to a schema
```
We start talking about a database schema -- but that only applies to ID mapping with the Sql map plugin. Which is the default, of course; but if you don't know that, then this tickles the WTF reflex. So we should probably link to the Sql class documentation for reference.

Also, instead of "defaults to a schema of 'string'", can we just say "By default, each key is stored as a string"?

+++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
@@ -0,0 +1,309 @@
+ *   the source property names names of each row. Any values explicitly defined

"names" is duplicated.

```
+++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
@@ -0,0 +1,309 @@
+ *   the array must have a numeric key. Numbering starts at 0. Each value should
```
"...a numeric key, starting at 0". Also, why the hell would this be an array of arrays? It seems like column_names should be a simple hash (key => value); having it be a nested array is really confusing. So, if we can avoid that, so much the better.
```
+++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
@@ -0,0 +1,309 @@
+ * - file_class: (optional) Fully qualified name of a class (including the full
+ *   namespace) that can read the CSV file. The class must extend \SplFileObject
+ *   and implement \Drupal\migrate\CSVFileInterface.
```
I'm thinking we should remove this option entirely. I can't honestly see too many people needing to extend the file class, or change it. (But it's worth grepping contrib to prove me either right or wrong on that point.) If they do need to extend it, they can also extend this source and implement their own custom logic around that. Also, CSVFileInterface is gone from this patch (and rightfully so), so we should remove all mention of it.

+++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
@@ -0,0 +1,309 @@
+ * @see http://php.net/manual/en/class.splfileobject.php.

Let's move this @see up to where the other one is.

+++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
@@ -0,0 +1,309 @@
+ * @see http://php.net/manual/en/splfileobject.setcsvcontrol.php.
+ * @see http://php.net/manual/en/splfileobject.setflags.php.

Ditto.

+++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
@@ -0,0 +1,309 @@
+    // Path is required. The file might not exist at this path at plugin
+    // discovery, so we do not do a file_exists check here.

The constructor isn't called during discovery, so I'm not sure why this is relevant.

```
+++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
@@ -0,0 +1,309 @@
+    if ($this->file) {
+      return $this->file;
+    }
```
Why do we need to cache the file object? Isn't it the whole point of initializeIterator() that we return a fully prepared iterator, and let the calling code deal with it?

+++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
@@ -0,0 +1,309 @@
+      throw new InvalidPluginDefinitionException($this->getPluginId(), sprintf('File path (%s) does not exist.', $path));

This is not a problem with the plugin definition; it's a problem with configuration. I think this should be InvalidArgumentException.

```
+++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
@@ -0,0 +1,309 @@
+    $this->file = new $this->fileClass($path);
```
I think that the creation of the file object should be done in a protected method, so that ninjas can override it if needed. This would also allow us to remove the arcane file_class option.

+++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
@@ -0,0 +1,309 @@
+  protected function setupFile() {

I'm not sure why this method exists; it seems like its contents should be part of initializeIterator().

+++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
@@ -0,0 +1,309 @@
+    $delimiter = $plugin_configuration['delimiter'];
+    $enclosure = $plugin_configuration['enclosure'];
+    $escape = $plugin_configuration['escape'];
+    $this->file->setCsvControl($delimiter, $enclosure, $escape);

I think this can all be one line.

+++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
@@ -0,0 +1,309 @@
+      if (is_array($value)) {
+        $ids[$delta] = $value;
+      }
+      else {
+        $ids[$value]['type'] = 'string';
+      }

It's really weird to me that we're setting $ids[$delta] or $ids['value'] depending on what $value is. Shouldn't $ids always be keyed by the name of the key?

```
+++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
@@ -0,0 +1,309 @@
+    foreach ($this->initializeIterator()->getColumnNames() as $column) {
```
We shouldn't be calling initializeIterator() just to get the column names; it forces us to have $this->file and make initializeIterator() mutate its state, which is awkward. Let's just add a new protected method to derive the column names without needing to touch the iterator.

+++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
@@ -0,0 +1,309 @@
+    // Any caller-specified fields with the same names as extracted fields will
+    // override them; any others will be added.
+    return $this->getConfiguration()['fields'] + $fields;

Wat? We're relying on undocumented configuration properties here ('fields' isn't mentioned in the doc comment).

+++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
@@ -0,0 +1,309 @@
+  /**
+   * Gets the file object.
+   *
+   * @return \SplFileObject
+   *   The file object.
+   */
+  public function getFile() {
+    return $this->file;
+  }

What is the purpose of this method? I don't see it being used for anything. We should remove it unless there's a good reason to keep it around.

+++ b/core/modules/migrate/src/Plugin/migrate/source/CSVFileObject.php
@@ -0,0 +1,103 @@
+  /**
+   * Sets the CSV column names.
+   *
+   * @param string[] $column_names
+   *   Sets the CSV column names.
+   */
+  public function setColumnNames(array $column_names) {

There's more to say here. Isn't $column_names supposed to be a key => value hash?

+++ b/core/modules/migrate/tests/modules/csv_source_yield_test/src/Plugin/migrate/source/YieldRows.php
@@ -0,0 +1,45 @@
+/**
+ * Yields each image and sku.
+ *
+ * @MigrateSource(
+ *   id = "yield_rows"
+ * )
+ */
+class YieldRows extends CSV {

This is neat, but why do we have it? I don't understand what we're testing here.

+++ b/core/modules/migrate/tests/modules/migrate_source_csv_test/artifacts/people.csv
@@ -0,0 +1,16 @@
+id,first_name,last_name,email,ip_address,date_of_birth
+1,Justin,Dean,jdean0@example.com,60.242.130.40,01/05/1955
+2,Joan,Jordan,jjordan1@example.com,137.230.209.171,10/14/1958
+3,William,Ray,wray2@example.com,4.75.251.71,08/13/1962
+4,Jack,Collins,jcollins3@example.com,118.241.243.64,08/16/1977
+5,Jean,Moreno,jmoreno4@example.com,12.24.215.20,10/18/1940
+6,Dennis,Mitchell,dmitchell5@example.com,185.24.131.116,08/21/1999
+7,Harry,West,hwest6@example.com,101.74.110.171,10/05/1987
+8,Rebecca,Hunt,rhunt7@example.com,253.107.6.23,04/21/1922
+9,Rose,Rogers,rrogers8@example.com,21.2.126.228,08/14/2005
+10,Juan,Walker,jwalker9@example.com,192.118.77.225,03/09/1958
+11,Lois,Price,lpricea@example.com,231.185.100.19,04/08/1944
+12,Patricia,Bell,pbellb@example.com,226.2.254.94,02/26/1950
+13,Gerald,Kelly,gkellyc@example.com,31.204.2.163,09/25/1948
+14,Kimberly,Jackson,kjacksond@example.com,19.187.65.116,06/21/1926
+15,Jason,Mason,jmasone@example.com,225.129.68.203,10/01/1950

Ideally, this file would include data that would exercise the delimiter, enclosure, and escape characters.

+++ b/core/modules/migrate/tests/modules/migrate_source_csv_test/config/install/field.storage.node.field_last_name.yml
--- /dev/null
+++ b/core/modules/migrate/tests/modules/migrate_source_csv_test/config/install/node.type.people.yml

+++ b/core/modules/migrate/tests/modules/migrate_source_csv_test/config/install/node.type.people.yml
+++ b/core/modules/migrate/tests/modules/migrate_source_csv_test/config/install/node.type.people.yml
@@ -0,0 +1,11 @@

@@ -0,0 +1,11 @@
+langcode: en
+status: true
+dependencies: {}
+third_party_settings: {}
+name: People
+type: people
+description: ''
+help: ''
+new_revision: true
+preview_mode: 1
+display_submitted: true

Why not import the data as user accounts instead? Given the data we're testing with, that's a more realistic scenario.

+++ b/core/modules/migrate/tests/src/Kernel/Plugin/MigrationProvidersExistTest.php
@@ -36,6 +36,10 @@ public function testProvidersExist() {
+      // CSV source intentionally does not have a source module.
+      if ($id == 'csv') {
+        continue;
+      }

Rather than make a bizarre exception here, why not just set the source module to "migrate"?

+++ b/core/modules/migrate/tests/src/Kernel/Plugin/migrate/source/CSVTest.php
@@ -0,0 +1,59 @@
+  public function testCreate() {
+    /** @var \Drupal\migrate\Plugin\MigratePluginManagerInterface $migrationSourceManager */
+    $migrationSourceManager = $this->container->get('plugin.manager.migrate.source');
+    $this->assertTrue($migrationSourceManager->hasDefinition('csv'));
+  }

We can remove this method; I don't think it covers anything useful (it also does not cover __construct() at all). If the 'csv' plugin does not exist, we *will* run into an exception elsewhere in the test.

+++ b/core/modules/migrate/tests/src/Kernel/Plugin/migrate/source/CSVTest.php
@@ -0,0 +1,59 @@
+    $node = Node::load(1);
+    $this->assertEquals($node->label(), 'Justin Dean');
+    $this->assertEquals($node->get('field_first_name')->value, 'Justin');
+    $this->assertEquals($node->get('field_last_name')->value, 'Dean');
+    $this->assertEquals($node->get('field_email')->value, 'jdean0@example.com');
+    $this->assertEquals($node->get('field_ip_address')->value, '60.242.130.40');
+    $this->assertEquals($node->get('field_dob')->value, '1955-01-05');
+  }

We should be asserting all of the imported data, not just one record.

+++ b/core/modules/migrate/tests/src/Unit/CSVFileObjectTest.php
@@ -0,0 +1,113 @@
+class CSVFileObjectTest extends CSVUnitTestCase {

Do we really need a base class for this test? It seems like it would be acceptable to merge the two.

+++ b/core/modules/migrate/tests/src/Unit/CSVFileObjectTest.php
@@ -0,0 +1,113 @@
+    $this->csvFileObject->setFlags(\SplFileObject::READ_CSV | \SplFileObject::READ_AHEAD | \SplFileObject::DROP_NEW_LINE | \SplFileObject::SKIP_EMPTY);

According to the source plugin's doc comment, these are the defaults. We should just make them the defaults in CSVFileObject, and remove this line.

+++ b/core/modules/migrate/tests/src/Unit/CSVFileObjectTest.php
@@ -0,0 +1,113 @@
+  /**
+   * Tests that the construction appropriately creates a CSVFileObject.
+   *
+   * @covers ::__construct
+   */
+  public function testCreate() {
+    $this->assertInstanceOf(CSVFileObject::class, $this->csvFileObject);
+  }

We can remove this; it's not really testing anything.

+++ b/core/modules/migrate/tests/src/Unit/CSVFileObjectTest.php
@@ -0,0 +1,113 @@
+    $this->assertEquals($expected, $actual);

This should be assertSame().

+++ b/core/modules/migrate/tests/src/Unit/CSVFileObjectTest.php
@@ -0,0 +1,113 @@
+    $actual = $this->csvFileObject->count();
+
+    $this->assertEquals($expected, $actual);

We should use assertCount() here.

+++ b/core/modules/migrate/tests/src/Unit/CSVUnitTestCase.php
@@ -0,0 +1,104 @@
+  /**
+   * The happy path file url.
+   *
+   * @var string
+   */
+  protected $happyPath;
+
+  /**
+   * The un-happy path file url.
+   *
+   * @var string
+   */
+  protected $sad;

This is adorable, but can we choose better property names? These aren't very descriptive.

Comment #111

eiriksm

he/him

Norwegian Bokmål

Norway

commented 10 September 2018 at 14:26

Assigned:	Unassigned	» eiriksm
Issue tags:		+DrupalEurope

Working on this

Comment #112

eiriksm

he/him

Norwegian Bokmål

Norway

commented 10 September 2018 at 16:01

Status:

Needs work

» Needs review

Status	File	Size
new	2931739-112.patch	50.1 KB
new	interdiff-2931739.txt	555 bytes

WOW!

So, this was a bit of a rabbit hole. I started by looking into the test failures. And for some reason the plugin was not picked up. So digging deeper in the rabbit hole, I found out the annotation parser did not pick it up. So why did the annotation parser not pick it up? Well because the function token_get_all, which the Doctrine TokenParser uses, tries to set the doc comment by using the last known doc comment before the token T_CLASS is found. So it initially gets found, but then, since we changed this:

-      'file_class' => 'Drupal\migrate\Plugin\migrate\source\CSVFileObject',
+      'file_class' => CSVFileObject::class,

there is another instance of the T_CLASS token, but without the doc comment (of course).

Not sure if we should reports it as a bug to doctrine/annotations as well, but for now: It's not possible to have a plugin discovered by annotation if you also use ::class in the same file :o

So for starters, here my try to fix the test failures from #107. I will look at the comments after I can verify this fixes the failures.

Comment #113

heddn

English

Nicaragua

commented 10 September 2018 at 16:03

That is already fixed upstream. We just cannot use that style of syntax until we can upgrade to the newer version of doctrine.

Comment #114

10 September 2018 at 16:44

Status:

Needs review

» Needs work

The last submitted patch, 112: 2931739-112.patch, failed testing. View results

Comment #115

eiriksm

he/him

Norwegian Bokmål

Norway

commented 11 September 2018 at 07:25

Status:

Needs work

» Needs review

Status	File	Size
new	interdiff.txt	2.41 KB
new	2931739-115.txt	50.24 KB

OK, cool.

Here is an updated patch that should fix the test failures for starters. Interdiff from #107, not my own.

Comment #116

eiriksm

he/him

Norwegian Bokmål

Norway

commented 11 September 2018 at 08:03

Status	File	Size
new	2931739-116.patch	50.24 KB

Oh snap, was a bit fast at typing. Here is the patch in .patch format

Comment #117

dww

we/he/they

commented 12 September 2018 at 17:52

Assigned:	eiriksm	» Unassigned
Status:	Needs review	» Needs work

Nice work, thanks!

However, there's still a long list of (mostly spot-on) changes from the review in #110 that need to be addressed.

Re: #110.1, this whole first sentence:

* - keys: Array of column name(s) which represent the columns uniquely
*   identifying each record. If declared as a nested array, the keys must be

could be even simpler:

* - keys: Array of column names that uniquely identify each record.

Re: #110.4: "Also, why the hell would this be an array of arrays?"
And #110.18: "Isn't $column_names supposed to be a key => value hash?"

We went over this at #87.17, #88, and sadly at #91 @mikeryan confirmed It's A Thing(tm).

Yes, $column_names *should* be a simple hash of numeric keys (index or delta) and a string values (the names). There's apparently a "need" for an optional 3rd layer of "description". Perhaps that should be punted to another config knob if it really needs to exist, so column_names doesn't have to be weirdly overloaded like this. column_names vs. column_descriptions or something. Both are simple integer key to string value hashes. Both are optional so long as your CSV provides a header row and column_names are defined inline. If there's no header row, column_names is required (but simple to understand, unlike now).

+1 to everything else from #110.

Thanks,
-Derek

p.s. Also unassigning, since I don't assume @eiriksm wants to re-roll for all of this. If anyone wants to actively work on this, please assign yourself. Thanks again!

Comment #118

eiriksm

he/him

Norwegian Bokmål

Norway

commented 12 September 2018 at 19:40

Hi, sorry about that. My intention was to work through the list after fixing the failures but I got swamped by other things. I will assign myself if I get the time to pick it up in the coming days

Comment #119

mikelutz

He/Him

English

Michigan, USA

commented 17 September 2018 at 17:59

5 files were hidden/shown/deleted

Status	File	Size
hidden	interdiff_96-107.txt	8.59 KB
hidden	2931739-107.patch	50.1 KB
hidden	2931739-112.patch	50.1 KB
hidden	interdiff-2931739.txt	555 bytes
hidden	2931739-115.txt	50.24 KB

Comment #120

longwave

he/him

English

commented 17 September 2018 at 22:20

Status:

Needs work

» Needs review

Status	File	Size
new	2931739-120.drupal.Bring-migratesourcecsv-to-core.patch	47.47 KB
new	interdiff.2931739.115-120.txt	11.75 KB

2 files were hidden/shown/deleted

Status	File	Size
hidden	interdiff.txt	2.41 KB
hidden	2931739-116.patch	50.24 KB

Tried to move this along a bit from the reviews in #110 and #117.

#110.14: This is the result of 'keys' being either a plain string or a field definition; I tried to add comments to help explain what is going on.
#110.16: Not sure what this undocumented property was useful for, so I removed it.
#110.19/20/21/24/25/26/30: Left these alone for now.

Everything else noted in the reviews should be fixed.

+++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
@@ -0,0 +1,294 @@
+ * - keys: Array of column names that uniquely identify each record, if keys
+ *   should be stored as strings. Otherwise, the keys must be column names and
+ *   the values should be field structure definitions as used by

I realise I'm overloading the word "keys" here to mean two different things, but this structure is hard to explain succinctly!

Comment #121

longwave

he/him

English

commented 17 September 2018 at 22:28

Status	File	Size
new	2931739-121.drupal.Bring-migratesourcecsv-to-core.patch	46.29 KB
new	interdiff.2931739.120-121.txt	4.64 KB

Whoops, I broke some of the tests entirely, this should be a bit better.

Comment #122

17 September 2018 at 22:56

The last submitted patch, 120: 2931739-120.drupal.Bring-migratesourcecsv-to-core.patch, failed testing. View results
- codesniffer_fixes.patch Interdiff of automated coding standards fixes only.

Comment #123

dww

we/he/they

commented 18 September 2018 at 02:07

Status:

Needs review

» Needs work

That looks great, thanks!

#110.14: This is the result of 'keys' being either a plain string or a field definition; I tried to add comments to help explain what is going on.

The comments partly help. But the deeper issue of how weird the code is remains. Maybe it's too much to ask that we actually fix this stuff, given that it all already works in contrib. It does seem like the sort of thing that has to happen to modules as they go into core. If they have any bizarre code, they need to be simplified and cleaned up before being committed.

#110.19/20/21/24/25/26/30: Left these alone for now.

Those are all test cleanups. Can/should they be moved to a follow-up issue? If so, let's open it as a child issue. If not, we need to fix the tests before this is committed.

Either way, this still needs work. But it's getting very close!

Thanks again,
-Derek

Comment #124

longwave

he/him

English

commented 18 September 2018 at 06:23

I agree the tests still need to be fixed here, I just ran out of time.

Dwelling on this overnight I wonder if the complication around 'keys' is YAGNI for core. CSV is inherently a file made of strings so what is the use case for the key being something other than a string? Should we simplify this entirely for core and leave it to subclasses if they need anything more complex, much like the removal of the 'file_class' and 'fields' options?

Comment #125

phenaproxima

he/him

English

Massachusetts

commented 18 September 2018 at 06:44

Speaking as a Migrate maintainer for core, I can confidently say that, as long as we use a different plugin ID, we are absolutely free to disregard any kind of backwards compatibility with contrib.

As far as core is concerned, this is a new feature. Therefore, our priority should be to build a well thought-out and easy to use CSV plugin, not maintain compatibility with contrib. The old plugin can continue to exist in contrib, with all its idiosyncrasies, if people need it.

Comment #126

mikelutz

He/Him

English

Michigan, USA

commented 18 September 2018 at 16:32

1 file was hidden/shown/deleted

Status	File	Size
hidden	2931739-120.drupal.Bring-migratesourcecsv-to-core.patch	47.47 KB

Comment #127

dww

we/he/they

commented 18 September 2018 at 16:52

Re: #125:

...as long as we use a different plugin ID, we are absolutely free to disregard any kind of backwards compatibility with contrib.

Then we would have to change the plugin ID for this, since the patch is currently using the one from contrib, 'csv'. :/

Re: #124:

CSV is inherently a file made of strings so what is the use case for the key being something other than a string?

The string can hold different kinds of values. Some "strings" are actually integers that would be much more happily stored and indexed as ints in the schema, not strings. Some strings are fixed length and relatively short. Some strings are huge. Some are floats (the kind of number, not CSS). ;) Etc.

I think part of the confusion is that the config setting is called "keys" but really it's the means by which this plugin can implement MigrateSourceInterface::getIds(). If you read the PHP doc for that, it starts to make more sense why we need to allow field structures to be defined. It's simply a config shortcut that you can use a flat array of strings, in which case we use those values as the machine name index for the ids structure, and we give you a default subarray to declare that id as a string. testGetIdsComplex() is showing it in action.

We could potentially improve both test and doc coverage of cases where numeric keys are in use and could be specified as such. I'm glad we're testing "paragraph" sized big text columns, but ints (and floats) seem very common in various migration cases, and worth both documenting and testing as part of this. In fact, all of the test CSV sources that use an 'id' are actually ints, and we should probably define/test them that way. Or at least have 2 tests showing that int ids (e.g. nids, uids, etc) work when declared as either ints or they use the default string schema.

Comment #128

dww

we/he/they

commented 18 September 2018 at 17:07

Issue summary:

View changes

Cleaned up the API changes section of the summary, and updated the remaining tasks.

Also, this:

+ * - keys: Array of column names that uniquely identify each record, if keys
+ *   should be stored as strings. Otherwise, the keys must be column names and
+ *   the values should be field structure definitions as used by
+ *   MigrateSourceInterface::getIds().

might be more clear as:

* - keys: Array of column names that uniquely identify each record, if each key
*   should be stored as a string. Otherwise, the array must be indexed by column
*   name and the values should be field structure definitions as used by
*   MigrateSourceInterface::getIds().

(sorry if that wrapping isn't right, I'm doing this comment via browser, not emacs).

Comment #129

mikelutz

He/Him

English

Michigan, USA

commented 18 September 2018 at 18:56

Status	File	Size
new	2931739-129.drupal.Bring-migratesourcecsv-to-core.patch	45.29 KB
new	interdiff.2931739.120-129.txt	18.31 KB

3 files were hidden/shown/deleted

Status	File	Size
hidden	interdiff.2931739.115-120.txt	11.75 KB
hidden	2931739-121.drupal.Bring-migratesourcecsv-to-core.patch	46.29 KB
hidden	interdiff.2931739.120-121.txt	4.64 KB

This still needs work on tests, I need to go through the rest of #110. This is my attempt to add some sanity to the whole column names thing. I'm content to change variable and config key names around, but code wise, this makes more sense to me.

The human readable descriptive names are ONLY for the fields method in the source. There is absolutely no reason why the file object needs to know about them. Now we have two config items, column_names and column_descriptions, both are simple indexed arrays to map the 0 indexed column to a field key and description. If not present, header row is used for either/both, and lacking everything, you basically get no fields, so I want a little more error handling around that. I rather like requiring either a header row or column names, but if we decide to allow the whole thing to be accessed using just the column index, then we need a little more work around the fields method to support it.

I also reduced the ids array to a simple array that makes everything a string. If there's strong reason to allow other id types, we can put it back.

Comment #130

dww

we/he/they

commented 18 September 2018 at 19:43

Disclaimer: I haven't looked at the patch nor interdiff, only going on your comment. About to head into a meeting, so I don't have time for more right now.

Thanks for column_names vs. column_descriptions!

However, -1 for "I also reduced the ids array to a simple array that makes everything a string. If there's strong reason to allow other id types, we can put it back.". See #127 for why.

I'm a little worried that #129 is moving too many things in too many different directions before we have agreement on what we're trying to accomplish.

Thanks,
-Derek

Comment #131

mikelutz

He/Him

English

Michigan, USA

commented 19 September 2018 at 01:07

The string can hold different kinds of values. Some "strings" are actually integers that would be much more happily stored and indexed as ints in the schema, not strings. Some strings are fixed length and relatively short. Some strings are huge. Some are floats (the kind of number, not CSS). ;) Etc.

While that may be true, the purpose of this typing is specifically for the idmap. While the default is certainly a SQL IdMap and a mySQL backend, it's not guaranteed. At the end of the day, the migration executable writes the map using the source values from the row. Unless we are going to typecast based on those key types, or have a lot more error checking around them, at the moment that happens, the executable is sending in a string value because the row is an array of strings. So despite the fact that it may seem like a simple incrementing integer value in a spreadsheet, and it may seem reasonable that it should be stored and indexed in the database as an integer, you are trusting an unknown idmap object can handle getting a string when you've told it it's getting an integer. The truth is, it probably can, but I'm not a fan of sending a sting to do an integer's job. Just because php is loosely typed doesn't mean we need to write loosely typed code.

As far as supporting longer strings, I think there's a better case for that, but I'm also not sure it's reasonable. if you need a paragraph sized identifier, you are probably doing something wrong.

I was thinking about potentially supporting using the csv line number as a migration key if none is specified. I won't think it's an ideal situation, but i could see some use cases where someone might want it.

Comment #132

dww

we/he/they

commented 19 September 2018 at 01:45

All of #131 is probably true, but that seems like a fight to have with MigrateSourceInterface::getIds(), not this patch. If that API expects/allows all this, that's the place to have the tests to see what happens with faulty input, not in this particular implementation. We're just making sure the source config allows us to uphold the API we've been given to work with. Contrib has used this particular way, the 'keys' config knob, a large number of existing migrations are written to work with that, and it'd be pretty annoying for us to both squat on the 'csv' namespace *and* break all those migrations by changing it at this point. It's wonky, but we almost fully understand it now and nearly have it documented properly. Most people will never care. If they care, and they go out of their way to define a column as an int, it's their problem if they have dirty source data.

Comment #133

mikelutz

He/Him

English

Michigan, USA

commented 19 September 2018 at 02:40

I realize up around #81 @heddn talked about a straight port and keeping the same namespace, but the more I dig into this, the more I agree with #125. These configuration knobs are wonky, and once they are committed, it's going to be very difficult to fix. If we change the plugin name we can use a more intuitive set of configurations, and anybody currently using contrib can just keep doing so until they are ready to update their migrations.

I still vote against allowing non string ids when we only provide string fields, but I'm far more bothered by the column_names knob. If you aren't using a header row, you should be able to supply a list of column names as a simple array, and you shouldn't have to provide descriptions beyond the field names if you don't want to. Just making that change would break existing migrations and would warrant a name change, but I think it makes the plugin more useable going forward, and it's far better to do it now than try to do it later.

Comment #134

mikelutz

He/Him

English

Michigan, USA

commented 19 September 2018 at 02:50

Rather than make a bizarre exception here, why not just set the source module to "migrate"?

More interestingly why is there a test in the migrate module which extends a test class from the migrate_drupal module and enables a test module from the migrate_drupal_ui module to test that all source plugins have an annotation that they aren't required to have?
The module dependencies for tests should go the other way. Source plugins aren't required to have a source module annotation. Migrations tagged with 'Drupal 6' or 'Drupal 7' are only allowed to use source plugins which have a source module annotation. We shouldn't have to add dummy annotations to source plugins that don't apply to Drupal migrations just to pass tests.
Migrate has been built primarily for d6 and d7 migrations, but it's purpose is greater, as this issue demonstrates. If there isn't a source module, there isn't a source module.

Comment #135

heddn

English

Nicaragua

commented 19 September 2018 at 13:07

I don't know if this helps with some of the confusion. Around the timeframe of #110, there was a conversation in slack that never made it into this issue. Or so it would appear. In this conversation, @phenaproxima and @heddn agreed to not do a straight port as put forth in #81. So we /will/ have a new name. And this lets us make the API "right" and "sane".

As far as the column names thing, we don't need to expose that to folks in the API. For keys, I want to vote for using the approach we see in migrate_plus and core and call the thing ids and use the same sql style declaration of int, string, etc.

Comment #136

phenaproxima

he/him

English

Massachusetts

commented 19 September 2018 at 13:30

Migrate has been built primarily for d6 and d7 migrations, but it's purpose is greater, as this issue demonstrates. If there isn't a source module, there isn't a source module.

Migrate can freak out and throw exceptions during plugin discovery if source_module isn't set. In retrospect, that behavior should have been entirely confined to Migrate Drupal, but at this point it's too late to go back. In Drupal 9, we can and should scale back the invasiveness of the source_module and destination_module stuff. But for now, we should just set the source_module to 'migrate' and call it a day.

Comment #137

mikelutz

He/Him

English

Michigan, USA

commented 19 September 2018 at 14:18

Migrate can freak out and throw exceptions during plugin discovery if source_module isn't set. In retrospect, that behavior should have been entirely confined to Migrate Drupal, but at this point it's too late to go back.

I just did another quick grep, The only place I see the exception raised is in \Drupal\migrate_drupal\MigrationPluginManager::processDefinition() which throws an exception if migrate_drupal is enabled and there is a migration tagged with Drupal 6 or Drupal 7 that uses a source plugin with no source module defined. The exception declares that the source plugin must define the source module property, but it's really an exception on the migration, which shouldn't be using a source with no module defined. Actually, that should allow the migration to define a source module in it's configuration if it wants to, to mimic SourcePluginBase::getSourceModule.

Regardless, It should be punted to another issue, I suppose.

Comment #138

dww

we/he/they

commented 19 September 2018 at 19:29

Issue tags:

+Needs issue summary update

Re: #135:

As far as the column names thing, we don't need to expose that to folks in the API.

I don't know what that means. Of course we have to let people define their column names, or they can't write the rest of their migration. Are you proposing we *require* a header row? I'd be very strongly opposed to that. So we have to have a way to declare the column names via the API of how you configure your source plugin.

Maybe you mean we don't have to expose column descriptions to folks? Great. But then shouldn't we have an issue somewhere to deprecate that part of the migrate source plugin API? See comment #91 from @mikeryan.

Regardless, if the party line is that core is going to change namespaces and break existing migrations for this, so be it. In that case, the summary is out of date, and perhaps some of the migrate maintainers who've been discussing this offline could update things to reflect the current "agreement". Namely, the API changes section can no longer claim BC and ease-of-use for existing users of the contrib module, and it should start to spell out the ways that we envision this working differently (even if it's primarily about changing the available config knobs).

Also agreed that #136 and #137 are out of scope for this issue.

Thanks,
-Derek

Comment #139

heddn

English

Nicaragua

commented 19 September 2018 at 19:40

Yeah, we need to update the IS.

Here's what I expect the yml looks like:

    0: id
    1: first_name
    2: last_name
    3: email
    4: country
    5: ip_address
    6: date_of_birth

or alternatively:

    - id
    - first_name
    - last_name
    - email
    - country
    - ip_address
    - date_of_birth

This as opposed to the existing requirement:

    0:
      id: Identifier
    1:
      first_name: First Name
    2:
      last_name: Last Name
    3:
      email: Email Address
    4:
      country: Country
    5:
      ip_address: IP Address
    6:
      date_of_birth: Date of Birth

Then I expect the source module to array_combine or some such on id, first_name, etc. to make the required nested array that migrate needs.

So, no API exposed to site users about a double nested array thing, just a yaml array that the csv plugin converts into the right format.

Comment #140

dww

we/he/they

commented 19 September 2018 at 19:52

Right, so column names are required, either via header row or a (simple) array.

Column descriptions are optional, via a second (simple) array config knob. If not defined, we default to reusing the name as the description.

Or are you proposing we simply hide descriptions entirely and not provide any way to declare them? In that case, follow-up issue to deprecate them from the Migrate API entirely?

Comment #141

heddn

English

Nicaragua

commented 19 September 2018 at 20:05

For a CSV source, I'm suggesting we hide their existence in the implementation details. It is confusing DX. That isn't to say it has no value. And for other sources, it might make sense. So, hide but don't deprecate is my suggestion. If someone needs the description, they should be able to extend the source plugin or keep using contrib.

Comment #142

dww

we/he/they

commented 19 September 2018 at 20:17

The nested array stuff is certainly confusing DX.
The flat arrays with separate knobs per #129 isn't bad.

If descriptions vs. machine names in general makes for a better UI/UX, and that's what the API itself supports so that various tools can benefit, it seems we should make it easier to give human readable names than "extend the plugin". I don't see what that has to do with a CSV source. Why does it matter if "first_name" vs. "First name" appears in my tool output, regardless of if the name came from a CSV file, a JSONAPI call, a DB query, etc?

I thought we were going to kill off the contrib version of this once in core. Why maintain two slightly different copies of the same thing?

Comment #143

andypost

he/him

Russian

commented 1 November 2018 at 00:39

Where the upgrade path from contrib will live? if structure will change then probably contrib needs to release 3.0 with 8.7 requirement and provide upgrade to disable itself at least

Comment #144

heddn

English

Nicaragua

commented 27 February 2019 at 18:36

Status	File	Size
new	interdiff_129-144.txt	9.48 KB
new	2931739-144.patch	44.02 KB

2 files were hidden/shown/deleted

Status	File	Size
hidden	2931739-129.drupal.Bring-migratesourcecsv-to-core.patch	45.29 KB
hidden	interdiff.2931739.120-129.txt	18.31 KB

The tests will fail miserably so not changing status. But this rebases things using league/csv instead of a custom csv reader. I'm haven't had a chance to manually test things, but it should be a drop-in replacement for our previous file reader solution. Meaning, it should work but the tests need to be adjusted for the new solution.

Comment #145

heddn

English

Nicaragua

commented 27 February 2019 at 21:26

Status:

Needs work

» Needs review

Status	File	Size
new	interdiff_144-145.txt	20.06 KB
new	2931739-145.patch	36.58 KB

2 files were hidden/shown/deleted

Status	File	Size
hidden	interdiff_129-144.txt	9.48 KB
hidden	2931739-144.patch	44.02 KB

Let's see if this fixes up any of the tests.

Comment #146

heddn

English

Nicaragua

commented 27 February 2019 at 21:34

Status:

Needs review

» Needs work

Some self review here:

+++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
@@ -0,0 +1,215 @@
+ * In this example, the migration source will use a single-column string key

The string comment here is unneeded.

+++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
@@ -0,0 +1,215 @@
+    if (empty($this->configuration['ids']) || !is_array($this->configuration['ids'])) {

All this error checking here and below could use tests.

+++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
@@ -0,0 +1,215 @@
+      'stream_filters' => [],

Missing docs on this.

+++ b/core/modules/migrate/src/Plugin/migrate/source/CSV.php
@@ -0,0 +1,215 @@
+    if ($reader->supportsStreamFilter()) {

This could use tests.

+++ b/core/modules/migrate/tests/src/Unit/CSVUnitTestCase.php
@@ -0,0 +1,104 @@
+   * The multi line file url.
+   *
+   * @var string
+   */
+  protected $multiLine;
...
+    $multiLineContent = <<<'EOD'
+id,title,description
+1,"Title 1","Description 1 Line 1
+Description 1 Line 2
+Description 1 Line 3"
+2,"Title 2","Description 2 Line 1
+Description 2 Line 2
+Description 2 Line 3"
+3,"Title 3","Description 3 Line 1
+Description 3 Line 2
+Description 3 Line 3"
+EOD;
...
+    $this->multiLine = vfsStream::newFile('multi-line.csv')
+      ->at($root_dir)

Let's get rid of this multi line scenario as we won't be supporting it in core.

+++ b/core/modules/migrate/tests/src/Unit/CSVUnitTestCase.php
@@ -0,0 +1,104 @@
+    $happy = <<<'EOD'
...
+    $sad = <<<'EOD'

Let's rename these standard and non_standard.

Comment #147

heddn

English

Nicaragua

commented 28 February 2019 at 02:30

Status:

Needs work

» Needs review

Status	File	Size
new	interdiff_145-147.txt	9.3 KB
new	2931739-147.patch	35.36 KB

2 files were hidden/shown/deleted

Status	File	Size
hidden	interdiff_144-145.txt	20.06 KB
hidden	2931739-145.patch	36.58 KB

Comment #148

28 February 2019 at 03:36

Status:

Needs review

» Needs work

The last submitted patch, 147: 2931739-147.patch, failed testing. View results
- codesniffer_fixes.patch Interdiff of automated coding standards fixes only.

Comment #149

dww

we/he/they

commented 28 February 2019 at 16:43

Hrm, I'm rather -1 to a new composer.json dependency for this. That means we have to:

a) Carefully audit the external dependency.
b) Have confidence it's well-maintained, will get security updates, that we can coordinate with those updates when needed, etc.
c) Have confidence it's going to survive as long as this code will want to use it.
d) Have confidence it's not going to have conflicting requirements from what we want (e.g. dropping support for PHP 7.0 before we do, whatever).

Looking at the interdiff, we're not saving many lines of code for all those potential costs. The entire CSVFileObject.php in contrib is 117 lines, with comments and whitespace. Most of that are trivial set/get methods. The only actual code of substance are these:

  public function rewind() {
    $this->seek($this->getHeaderRowCount());
  }

  public function current() {
    $row = parent::current();

    if ($row && !empty($this->columnNames)) {
      // Only use columns specified in the defined CSV columns.
      $row = array_intersect_key($row, $this->columnNames);
      // Set meaningful keys for the columns mentioned in $this->csvColumns.
      foreach ($this->columnNames as $key => $value) {
        // Copy value to more descriptive key and unset original.
        $value = key($value);
        $row[$value] = isset($row[$key]) ? $row[$key] : NULL;
        unset($row[$key]);
      }
    }

    return $row;
  }

I know we all love our #ProudlyFoundElsewhere tags, but it doesn't seem like an obvious win in this case.

Am I missing something?

Thanks,
-Derek

Comment #150

heddn

English

Nicaragua

commented 28 February 2019 at 17:02

Status:

Needs work

» Needs review

Status	File	Size
new	2931739-150.patch	33.71 KB
new	interdiff_147-150.txt	19.41 KB

2 files were hidden/shown/deleted

Status	File	Size
hidden	interdiff_145-147.txt	9.3 KB
hidden	2931739-147.patch	35.36 KB

This fixes tests locally.

Comment #151

heddn

English

Nicaragua

commented 28 February 2019 at 17:08

Reasons to use it, it supports stream resources. The #1 requested feature for CSV files. The contrib module doesn't have that support. https://thephpleague.com/ is a well respected member of the PHP community.

The downside as you stated is lining up release dates. From https://csv.thephpleague.com/

Once a new major version is released, the previous stable release remains supported for six more months through patches and security fixes.

Comment #152

dww

we/he/they

commented 28 February 2019 at 17:27

Okay, duly noted. Thanks.

However, it seems like adding a new feature request on the way into core is the wrong approach. ;) Certainly the use-case-in-core for this (the Umami stuff) doesn't care about streams -- we'll have local files. Stream support seems to open up a lot of potential what-ifs that should be more carefully considered. IMHO, we should either:

a) Keep the core patch simple, small, easy, based on already-proven code, without introducing new dependencies. Add stream support at a later date in a follow-up issue.

or:

b) Try out the new dependency and stream support in contrib for a while and forget about trying to get this into 8.7.0.

Either way, I don't see much hope in adding a new external dependency at this late date in the 8.7.x release cycle. Maybe the framework and release managers will see it differently, but I doubt it.

Comment #153

andypost

he/him

Russian

commented 28 February 2019 at 18:04

I'm also see no real benefit of external dependency, meanwhile using this lib I recall issues with encodings and that cause to always override it to properly handle variable encodings

Comment #154

heddn

English

Nicaragua

commented 28 February 2019 at 19:30

There is stream filter support for encoding conversion. I opted not to include it for the initial patch here. Since for core's use, we don't need to handle encoding conversion. It would be a later follow-up. Having an absolute path for a local file for CSV is painful.

The following is required to get the absolute path:

function hook_migration_plugins_alter(&$definitions) {
  $definitions['migrate_csv_test']['source']['path'] = drupal_get_path('module', 'my_Module') . $definitions['my_migration']['source']['path'];
}

Were-as we could use a stream once #1308152: Add stream wrappers to access .json files in extensions lands. And no plugin alters and cache clearing are needed.

I'm not particularity fixed on getting this into 8.7 if 8.8 works better. But it would be nice to use a library instead of parsing the CSV ourselves.

Comment #155

heddn

English

Nicaragua

commented 28 February 2019 at 22:22

Status:

Needs review

» Closed (won't fix)

Yes we can and have written a nifty CSV solution in contrib. But if we bring it into core, it has several latent limitations and any enhancements to address these features would leave us writing our own solution. And we'd take that ~100 lines and quickly run into many 100s of lines. Migrate maintainers have many other priorities. Continuing to roll our own solution for reading CSVs doesn't seem like a good use of that time.

Known limitations:

no stream support
no BOM removal
no encoding/decoding at stream reading time (only supports utf-8 char sets)
hook_migration_plugins_alter is need for local file support in a lot of cases

Until we're more welcome to use 3rd party PHP libraries, I'm going to mark this closed, won't fix. I don't really want to implement a halfway CSV solution in core that has so many limitations.

Comment #156

mlncn commented 20 March 2019 at 17:51

Too bad; this would make the migration support already in Drupal core really come alive. I thought Drupal had long since moved to embracing well-maintained 3rd-party dependencies!

Comment #157

heddn

English

Nicaragua

commented 21 March 2019 at 16:43

I've opened #3042160: Utilize leage/csv for CSV reader to move this over into contrib.