Refactor fetcher/parser classes to make fetcher the iterator [#2640508]

Follow-up to #2623012: Implement interfaces and base classes for URL-based sources

Per https://www.drupal.org/node/2623012#comment-10691800, the current breakdown of responsibilities between the fetcher and parser classes does not permit using the fetcher for scenarios requiring iterative fetching (i.e., the fetcher must fetch everything at once). Most immediately, the existing XML parser does not use the fetcher, because the fetcher won't work with the XMLReader API. So, let's make the fetcher the main iterator for the Url source plugin, and either have it call the parser for each item before returning the results to the Url plugin, or have the Url plugin call the fetcher and then the parser.
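
For illustration only, here is a rough sketch of the second option above (the Url plugin calls the fetcher, then the parser). It is written in Python rather than PHP, it is not the migrate_plus API, and every name in it (fetch_items, parse_item, iterate_source) is hypothetical. It also assumes a line-delimited JSON source, which is the easy case where item boundaries can be found without understanding the format.

```python
import io
import json
from typing import Iterator, TextIO


def fetch_items(stream: TextIO) -> Iterator[str]:
    """Hypothetical fetcher: yield one raw chunk per migratable item.

    Assumes one item per line, so no knowledge of the format is needed
    to find item boundaries.
    """
    for line in stream:
        line = line.strip()
        if line:  # skip blank lines
            yield line


def parse_item(raw: str) -> dict:
    """Hypothetical parser: turn one raw chunk into a dict of fields."""
    return json.loads(raw)


def iterate_source(stream: TextIO) -> Iterator[dict]:
    """Stand-in for the Url plugin: call the fetcher, then the parser."""
    for raw in fetch_items(stream):
        yield parse_item(raw)


# Usage: an in-memory stream stands in for a remote URL.
source = io.StringIO('{"id": 1, "title": "First"}\n{"id": 2, "title": "Second"}\n')
rows = list(iterate_source(source))
```

Only one item is held in memory at a time, which is the property the iterative approach is after.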

Comments

Comment #1

24 December 2015 at 22:23

mikeryan created an issue. See original summary.

Comment #2

mikeryan

CreditAttribution: mikeryan at Acquia commented 2 February 2016 at 02:10

Priority:	Major	» Normal
Status:	Active	» Postponed

My vision was to make the fetcher a format-agnostic iterator that produces a chunk of raw data corresponding to one migratable data item at a time, which the parser would then parse into individual fields. Once I express it this clearly, however, a problem becomes immediately apparent - for most data formats (like, say, XML or JSON) you have to know how to parse the format to identify the chunks to be further parsed, making the fetcher no longer format-agnostic. We might have a mechanism for the format-agnostic fetcher to call the format-savvy parser to do the work, but this ties the fetcher and parser more closely together. At any rate, for the XMLReader source, which is our current practical use case, it is most convenient for it to do both kinds of parsing as it goes - separating the migratable items, and parsing out the specific fields from each item.
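
To make the XMLReader point concrete, here is a minimal sketch of a streaming parser that does both jobs in one pass. It uses Python's xml.etree.ElementTree.iterparse as a stand-in for PHP's XMLReader; the function name iterate_items, the item tag, and the field names are all assumptions made up for the example, not anything from the module.

```python
import io
import xml.etree.ElementTree as ET
from typing import BinaryIO, Iterator, Sequence


def iterate_items(
    stream: BinaryIO, item_tag: str, fields: Sequence[str]
) -> Iterator[dict]:
    """Hypothetical streaming parser: read and parse in a single pass.

    Finding where each item ends already requires parsing the XML, so
    there is no separate format-agnostic fetch step here.
    """
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == item_tag:
            # The item's children are complete once its end tag is seen.
            yield {name: elem.findtext(name) for name in fields}
            elem.clear()  # drop the item's children to limit memory use


# Usage: an in-memory byte stream stands in for a remote XML document.
xml_bytes = (
    b"<items>"
    b"<item><id>1</id><title>First</title></item>"
    b"<item><id>2</id><title>Second</title></item>"
    b"</items>"
)
rows = list(iterate_items(io.BytesIO(xml_bytes), "item", ["id", "title"]))
```

Splitting this into a fetcher and a parser would mean either parsing each item twice or teaching the fetcher about XML, which is the coupling described above.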

So, at this point, after periodically puzzling over how to cleanly support both grab-everything-and-give-it-to-the-parser and take-a-piece-at-a-time, I'm ready to move on. In particular, the XMLReader approach inherently mixes parsing and fetching, so they're difficult to cleanly separate. So, I'm postponing this rather than closing it in case someone has a good idea to generalize it, but for now I think we should stick with what we have, where you can have a parser that doesn't need an external fetcher (i.e., XMLReader).