File DataFetcher plugin should avoid reading of entire file into a string until the method getResponseContent is called.

The source file for migration might be pretty large (for example: ~1Gb xml/csv file) and in this case it make sense to use stream readers like XMLReader in data_parser plugins (see Drupal\migrate_plus\Plugin\migrate_plus\data_parser\XML ) instead of reading the whole file into memory. Otherwise it can produce unstable performance and fatal errors due to memory limits.

Please see proposed patch with fix. It's left for DataParsers to decide whether to call getResponseContent() or not.
The patch also contains small fixes to match File DataFetcher plugin to DataFetcherPluginInterface declaration.

Note, the File DataFetcher plugin implements DataFetcherPluginInterface with following method.

/**
   * Return content.
   *
   * @param $url
   *   URL to retrieve from.
   *
   * @return string
   *   Content at the given url.
   */
  public function getResponseContent($url);

  /**
   * Return Http Response object for a given url.
   *
   * @param $url
   *   URL to retrieve from.
   *
   * @return \Psr\Http\Message\ResponseInterface
   */
  public function getResponse($url);
Support from Acquia helps fund testing for Drupal Acquia logo

Comments

itsekhmistro created an issue. See original summary.

itsekhmistro’s picture

Status: Active » Needs review
FileSize
1.97 KB

Please see the proposed patch with fix.

afi13’s picture

Status: Needs review » Reviewed & tested by the community

OK for me.

heddn’s picture

I'll fix these small nits on commit.

  1. +++ b/src/Plugin/migrate_plus/data_fetcher/File.php
    @@ -34,10 +34,13 @@ class File extends DataFetcherPluginBase {
    +    // Expected to be @return \Psr\Http\Message\ResponseInterface.
    +    $response = new Response('File is readable.', 200);
    

    Can we just return this without assigning to a variable?

  2. +++ b/src/Plugin/migrate_plus/data_fetcher/File.php
    @@ -45,8 +48,32 @@ class File extends DataFetcherPluginBase {
    +    return $content;
    

    Just return $this->getFileContent($url);. No need for the variable.

  3. +++ b/src/Plugin/migrate_plus/data_fetcher/File.php
    @@ -45,8 +48,32 @@ class File extends DataFetcherPluginBase {
    +  private function getFileContent($url, $offset = 0, $maxlen = NULL) {
    

    Brilliant idea.

heddn’s picture

Status: Reviewed & tested by the community » Needs work
Issue tags: +Needs tests
FileSize
1.75 KB
1.52 KB

Actually, this is back to NW. I was looking at the Response object returned by File and we don't have an import for it. And we don't even have a composer dependency for it either. So, I'll leave fixing that to a follow-up. This is just the basics.

And we should add tests. Because its the right thing to do and we really need to beef up testing coverage. It would have caught the non-use of a Response. Or if I'd committed this code, the test would have caught the missing import.

Matroskeen’s picture

Status: Needs work » Needs review
Issue tags: -Needs tests
FileSize
1.91 KB

I started looking at xml-related stuff and noticed this issue. I like the idea, but I didn't find the use-cases when DataFetcherPluginInterface::getResponse() is being called without DataFetcherPluginInterface::getResponseContent().

I'm attaching a re-roll just to see how it goes, but I don't see a real value. I might miss something though.

Speaking of tests, we have this covered already. Example is SimpleXmlTest, which is using file data fetcher.

Status: Needs review » Needs work

The last submitted patch, 6: 2854085-6.patch, failed testing. View results
- codesniffer_fixes.patch Interdiff of automated coding standards fixes only.

Matroskeen’s picture

Status: Needs work » Closed (won't fix)

I asked other people in #migrate channel hoping they'll clarify to me the proposed solution, but they couldn't confirm either.

@heddn said:

that was opened ages ago. things might have changed since it was opened. safe enough to close won't fix and ask for someone to re-open if they can think of a way around the obvious issues you uncovered

I agree, so I'm marking this as won't fix.