Right now, the structure of the Url source plugin configuration is:

source:
  plugin: url
  data_fetcher_plugin: http
  data_parser_plugin: json
  urls: /migrate_example_advanced_position?_format=json
  authentication: # See #2761489: Provide authentication plugins to HTTP fetcher
  item_selector: 1
  fields:
    ...
  ids:
    ...

Everything, even options specific to the fetcher or the parser, sits at the top level of the source plugin configuration. Ideally, I'd like the options to be split cleanly between them, something like:

source:
  plugin: url
  fetcher:
    plugin: http
    authentication: # See #2761489: Provide authentication plugins to HTTP fetcher
  parser:
    plugin: json
    # One might expect this to be a fetcher option, but the parser controls the fetcher, determining
    # when to fetch the next URL in the list, and the fetcher is designed to handle one URL at a time.
    urls: /migrate_example_advanced_position?_format=json
    item_selector: 1
  fields:
    ...
  ids:
    ...
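The comment about the parser controlling the fetcher can be made concrete with a minimal Python sketch (not migrate_plus code): a hypothetical `JsonParser` walks the `urls` list itself and hands one URL at a time to a fetch callable. Treating `item_selector` as a slash-separated path into the decoded JSON is an assumption made for illustration; it is not necessarily what `item_selector: 1` means to the real JSON parser.

```python
import json
from typing import Any, Callable, Iterator, List, Union


def select_items(data: Any, item_selector: Union[str, int]) -> Any:
    """Walk into decoded JSON along a slash-separated path (hypothetical semantics)."""
    current = data
    for part in str(item_selector).strip("/").split("/"):
        if part == "":
            continue  # An empty selector means "use the whole document".
        # Lists are indexed by integer position, dicts by key.
        current = current[int(part)] if isinstance(current, list) else current[part]
    return current


class JsonParser:
    """Hypothetical parser that drives a fetcher which only knows about one URL."""

    def __init__(self, parser_config: dict, fetch: Callable[[str], str]) -> None:
        urls = parser_config["urls"]
        # Accept a single URL (as in the config above) or a list of URLs.
        self.urls: List[str] = [urls] if isinstance(urls, str) else list(urls)
        self.item_selector = parser_config.get("item_selector", "")
        self.fetch = fetch

    def __iter__(self) -> Iterator[dict]:
        # The parser, not the fetcher, decides when to move on to the next URL.
        for url in self.urls:
            body = self.fetch(url)
            yield from select_items(json.loads(body), self.item_selector)


# Usage with a fake fetcher: a dict lookup standing in for HTTP requests.
pages = {
    "/a": '{"rows": [{"id": 1}, {"id": 2}]}',
    "/b": '{"rows": [{"id": 3}]}',
}
parser = JsonParser({"urls": ["/a", "/b"], "item_selector": "rows"}, pages.__getitem__)
print([row["id"] for row in parser])  # [1, 2, 3]
```

Because the fetch callable only ever sees a single URL, swapping the dict lookup for a real HTTP client would not change the parser's iteration logic.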

Two issues with that structure:

  1. We already have two parser plugins that subsume the fetching functionality (xml and soap), so the fetcher isn't actually used. That said, now that I think about it, if the parser plugin has access to the whole source configuration, it can peek at the fetcher config even when the fetcher plugin itself isn't used.
  2. This is obviously disruptive for anyone already using this plugin. However, I wonder whether the source plugin could recognize the "old" structure (by seeing 'urls' at the top level) and rearrange it into the "new" structure, which would then be used internally.
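One way the backward-compatibility idea in point 2 could work, sketched in Python rather than as actual plugin code: detect the old layout by a top-level `urls` key and move the fetcher- and parser-specific keys into `fetcher` and `parser` sections. The key mapping and the helper names are hypothetical. The `fetcher_config` helper also illustrates point 1: a parser that does its own fetching (like xml or soap) can still read the fetcher section from the whole source configuration.

```python
import copy

# Hypothetical mapping from old top-level keys to (section, new key).
LEGACY_KEYS = {
    "data_fetcher_plugin": ("fetcher", "plugin"),
    "authentication": ("fetcher", "authentication"),
    "data_parser_plugin": ("parser", "plugin"),
    "urls": ("parser", "urls"),
    "item_selector": ("parser", "item_selector"),
}


def normalize_source_config(config: dict) -> dict:
    """Return the config in the nested fetcher/parser layout."""
    # A top-level 'urls' key marks the old layout; new-style config is returned as is.
    if "urls" not in config:
        return config
    # Work on a copy so the caller's configuration is left untouched.
    normalized = copy.deepcopy(config)
    for old_key, (section, new_key) in LEGACY_KEYS.items():
        if old_key in normalized:
            normalized.setdefault(section, {})[new_key] = normalized.pop(old_key)
    return normalized


def fetcher_config(config: dict) -> dict:
    """Let a parser that does its own fetching (xml, soap) read the fetcher section."""
    return normalize_source_config(config).get("fetcher", {})


old = {
    "plugin": "url",
    "data_fetcher_plugin": "http",
    "data_parser_plugin": "json",
    "urls": "/migrate_example_advanced_position?_format=json",
    "item_selector": 1,
}
new = normalize_source_config(old)
print(new["fetcher"])  # {'plugin': 'http'}
print(new["parser"])   # {'plugin': 'json', 'urls': '/migrate_example_advanced_position?_format=json', 'item_selector': 1}
```

Running the normalization once, when the source plugin is constructed, would let everything downstream assume only the new layout.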

Thoughts?

Comments

mikeryan created an issue.

kriboogh

I've been working with migrations for a while now, trying to figure out how everything fits together. Sometimes simpler is better: I think we could get a long way with just 'source' classes and 'parsers'. In my view, source classes are the fetchers; parsers just handle the responses from those and return a unified format that a destination can use (an array of objects).

# SQL source

source:
  plugin: sql
  database:
  username:
  password:
  # ... other database connection parameters
  query: sql statement
  # no parser used

# A remote source (URL):

source:
  plugin: url
  url: http://...
  authentication:
    type: basic
    username:
    password:
  parser:
    plugin: xml
    xpath: //xpath/selection

source:
  plugin: url
  url: http://...
  parser:
    plugin: json
    selector: 1
    ...

# A SOAP source

source:
  plugin: soap
  wsdl: http://soap.service/wsdl
  function: executeMeMethod
  parser:
    plugin: xml
    ...

# A local file source:

source:
  plugin: file
  path: DRUPAL_ROOT/sites/modules/module/data/data.xml
  parser:
    plugin: xml
    xpath: //xpath/selection
    ...

source:
  plugin: file
  path: DRUPAL_ROOT/sites/modules/module/data/data.json
  parser:
    plugin: json
    selector: 1
    ...

source:
  plugin: file
  path: DRUPAL_ROOT/sites/modules/module/data/data.xls
  parser:
    plugin: xls
    delimiter: \t
    ...

# An FTP source

source:
  plugin: ftp
  server:
  port:
  ...
  parser:
    plugin: xls
    delimiter: \t

So we have sources:
- ftp
- soap
- url
- file
- sql

Parsers:
- xls
- json
- xml

Again, this is just how I would implement it.
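A minimal Python sketch of this source/parser split, under stated assumptions: a source only returns raw text, and a parser (looked up by a `plugin` key inside a `parser` mapping) turns that text into a list of records. The function names, the registry and the `selector` semantics (a top-level JSON key) are hypothetical. Python's `xml.etree.ElementTree` supports only a limited XPath subset, so an absolute path like `//xpath/selection` would have to be written relative to the root element (for example `.//selection`).

```python
import json
import os
import tempfile
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List

# A parser turns raw text plus its options into a unified list of records.
Parser = Callable[[str, dict], List[dict]]


def parse_json(raw: str, options: dict) -> List[dict]:
    data = json.loads(raw)
    # Hypothetical 'selector' option: the top-level key that holds the records.
    selector = options.get("selector")
    return data[selector] if selector is not None else data


def parse_xml(raw: str, options: dict) -> List[dict]:
    root = ET.fromstring(raw)
    # Each matched element becomes a dict of its child tags to their text.
    return [{child.tag: child.text for child in el} for el in root.findall(options["xpath"])]


PARSERS: Dict[str, Parser] = {"json": parse_json, "xml": parse_xml}


def file_source(config: dict) -> str:
    """A 'source' only fetches raw text; it knows nothing about the format."""
    with open(config["path"], encoding="utf-8") as handle:
        return handle.read()


def run(config: dict, fetch: Callable[[dict], str]) -> List[dict]:
    """Fetch with the given source, then hand the raw text to the configured parser."""
    parser = config["parser"]
    return PARSERS[parser["plugin"]](fetch(config), parser)


# Usage: a local XML file read by the file source and parsed by the xml parser.
with tempfile.NamedTemporaryFile("w", suffix=".xml", delete=False, encoding="utf-8") as tmp:
    tmp.write("<items><item><id>1</id></item><item><id>2</id></item></items>")
config = {"plugin": "file", "path": tmp.name, "parser": {"plugin": "xml", "xpath": "./item"}}
rows = run(config, file_source)
os.remove(tmp.name)
print(rows)  # [{'id': '1'}, {'id': '2'}]
```

Adding a url, ftp or soap source would mean writing another function that returns raw text; the parsers stay unchanged.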

PunamShelke

Hello,

I have an issue here.
I am working with an Alfresco web script. Each file is served from a link protected by basic authentication, so I have to download the file by requesting that URL.

Is this possible with a migration?

This is my YAML file:

source:
  plugin: url
  data_fetcher_plugin: http
  data_parser_plugin: json
  urls: 'path to folder/sites/default/files/import/forms/forms.data.json'
  item_selector: forms
  # Unique ID.
  ids:
    file_resource:
      type: string

  # Source field definitions.
  fields:
    -
      name: file_resource
      label: 'file_resource'
      selector: file_resource
  constants:
    file_source_uri: http:alfersco path/alfresco/s/custom/bdccontent?ContentId=29684005(file_resource)
    file_dest_uri: 'public://migration'
# Destination.
destination:
  # We will be creating entities of type "file" this time.
  plugin: 'entity:file'
  urlencode: true
# Mappings.
process:
  file_source:
    plugin: urlencode
    source: constants/file_source_uri
  file_dest:
    -
      plugin: concat
      delimiter: /
      source:
        - constants/file_dest_uri
        - file_resource
    # Make sure we don't have any url-unfriendly characters.
    -
      plugin: urlencode
  filename: file_resource
  uri:
    -
      plugin: http
      authentication:
        # Recognized types are basic and digest.
        type: basic
        parameters:
          username: user
          password: pass
    -
      plugin: download
      source:
        - '@file_source'
        - '@file_dest'
# Dependencies.
dependencies:
  enforced:
    module:
      - forms_migrate

Is this correct?

heddn

Re #4: please open a support request for your question. I'd like to keep this issue focused on the question in the issue summary.

One thing I do notice in your post, though, is that your HTTP fetching configuration is in the process section, not the source. It should move to the source.
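The point about moving credentials into the source can be illustrated with a minimal Python sketch of what a fetcher might do with a source-level authentication block. It assumes the `type`/`parameters` keys from the YAML in #4, nested under a `fetcher` section as in the issue summary's proposal; that layout and the `auth_headers` helper are hypothetical, not migrate_plus's actual API. Only the basic scheme is handled; per RFC 7617, the header carries the base64 encoding of `username:password`.

```python
import base64


def auth_headers(fetcher_config: dict) -> dict:
    """Build request headers from a source-level authentication block (hypothetical keys)."""
    auth = fetcher_config.get("authentication")
    if not auth:
        return {}
    if auth.get("type") != "basic":
        # Digest auth needs a challenge/response round trip, which this sketch omits.
        raise ValueError(f"unsupported authentication type: {auth.get('type')!r}")
    params = auth.get("parameters", {})
    # RFC 7617: base64 of "username:password", prefixed with the scheme name.
    credentials = f"{params['username']}:{params['password']}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return {"Authorization": f"Basic {token}"}


source = {
    "plugin": "url",
    "fetcher": {
        "plugin": "http",
        "authentication": {
            "type": "basic",
            "parameters": {"username": "user", "password": "pass"},
        },
    },
}
print(auth_headers(source["fetcher"]))  # {'Authorization': 'Basic dXNlcjpwYXNz'}
```

Because the header is attached when each URL is fetched, no process plugin needs to know about the credentials.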