JSON-based source classes

Last updated on
30 April 2025

Using MigrateListJSON and MigrateItemJSON

Migration from data encoded in JSON can be accomplished using the MigrateListJSON and MigrateItemJSON classes, based on the MigrateList pattern.

Consider this scenario:

  $list_url = 'http://www.example.com/list.json';
  $item_url = 'http://www.example.com/item.json?id=:id';
  $fields = array(
    'id' => 'Unique ID',
    'title' => 'Title of the object',
    'content' => 'Text of the object'
  );
  $this->source = new MigrateSourceList(
    new MigrateListJSON($list_url), 
    new MigrateItemJSON($item_url, array()), 
    $fields
  );

In the above example, $list_url is a JSON endpoint that contains a list of document id's that should be migrated. $item_url represent the endpoint at the source that contains a json representation of the source object. The $item_url contains the ':id' element that will be replaced by a source identifier.

$fields contains an array with field names available in the source objects.

Using list pattern with a single JSON file

Let's consider a different scenario, where the JSON input consists of a single file with an array of objects to be imported:

[
  {
    "id":"3595062:Page:24924",
    "title":"This is our second page",
    "description":"<p>This is the second page body.</p>",
    "createdDate":"2012-05-11T23:39:27.433Z",
  },
  {
    "id":"3595062:Page:24863",
    "title":"The first page",
    "description":"<p>Hello World!</p>",
    "createdDate":"2012-04-05T21:43:35.894Z",
  }
]

In this case, the same file serves as both the ID listing and the data source, and the assumptions made by the JSON classes don't work here - we need to override them.

class ExampleListJSON extends MigrateListJSON {
  /**
   * The default implementation assumes the IDs are top-level array elements, 
   * but the array elements are the data items - we need to look inside them
   * for the IDs.
   */
  protected function getIDsFromJSON(array $data) {
    $ids = array();
    foreach ($data as $item) {
      $ids[] = $item['id'];
    }
    return $ids;
  }
}

class ExampleItemJSON extends MigrateItemJSON {
  protected $data = array();
  /**
   * Two problems with the default getItem() - it reads and parses the JSON on
   * each call, which we don't want to do for multiple items in the file; and,
   * we need to retrieve a given item from the file according to its 'id' element.
   */
  public function getItem($id) {
    // We cache the parsed JSON at $this->data.
    if (empty($this->data)) {
      $data = $this->loadJSONUrl($this->itemUrl);
      if ($data) {
        // Let's index the array by the ID for easy retrieval.
        foreach ($data as $item) {
          $this->data[$item->id] = $item;
        }
      }
      else {
        // Error-handling here....
      }
    }
    // Return the requested item
    if (isset($this->data[$id])) {
      return $this->data[$id];
    }
    else {
      return NULL;
    }
  }
}

Then, we create the migration source using our extended classes:

    $fields = array(
      'id' => 'Unique page ID',
      'title' => 'Page title',
      'description' => 'Page body',
      'createdDate' => 'Page creation date',
    );
    $json_file = '/data/sources/example-pages.json';
    $this->source = new MigrateSourceList(new ExampleListJSON($json_file),
      new ExampleItemJSON($json_file, array()), $fields);

Importable data not at the top level

When JSON is returned from an API, the desired data may not be at the top level but rather wrapped inside a response. Often an API will return this:

{
  "success":true,
  "results":[
  {
    "id":"3595062:Page:24924",
    "title":"This is our second page",
    "description":"<p>This is the second page body.</p>",
    "createdDate":"2012-05-11T23:39:27.433Z",
  },
  {
    "id":"3595062:Page:24863",
    "title":"The first page",
    "description":"<p>Hello World!</p>",
    "createdDate":"2012-04-05T21:43:35.894Z",
  }
]
}

In this use case the above classes will not work as written and will need to be modified.

class MultiListJSON extends MigrateListJSON {
  /**
   * Originally this class assumed that the results would be all that was in the JSON
   * object. This class has been modified to look inside the JSON object and find the
   * expected data within the ['results'] member. 
   */
  protected function getIDsFromJSON(array $data) {
    $ids = array();
    foreach ($data['results'] as $item) {
      $ids[] = $item['id'];
    }
    return $ids;
  }
  
  /**
   * If this is the case then the count will be off as well and we will have to overriden as well.
   */
  public function computeCount() {
    $count = 0;
    if (empty($this->httpOptions)) {
      $json = file_get_contents($this->listUrl);
    }
    else {
      $response = drupal_http_request($this->listUrl, $this->httpOptions);
      $json = $response->data;
    }
    if ($json) {
      $data = drupal_json_decode($json);
      if ($data) {
        $count = count($data['results']);
      }
    }
    return $count;
  }
}

Using MigrateSourceJSON

The above classes use the drupal_json_decode() function (which wraps PHP's json_decode). While this works fine in many cases, it's important to understand that it takes the entire JSON file and decodes it into an in-memory structure. Thus, if you have a very large JSON file, as in Ning exports, you're likely to hit the PHP memory limit. The MigrateSourceJSON source class provides an alternative which incrementally parses the JSON file with an analog to the PHP XMLReader class (which addresses this issue for XML). To be able to incrementally parse the file does require some assumptions about how the JSON is organized - specifically, out-of-the-box it assumes the JSON contains an array of objects to be imported - but by providing an override to the MigrateJSONReader class you can handle different structures.

Consider the following example data, an array of simple objects:

[
  {"id":"53","field1":"value1"},
  {"id":"54","field1":"value2"}
]

This is delivered from the URL http://example.com/services/data.json. So, to migrate from this file, your source setup in your migration constructor will look like:

// Document the source fields
$fields = array(
  'id' => 'Unique ID for each source data row',
  'field1' => 'The actual source data',
);
// Second parameter is the unique ID from the source file
$this->source = new MigrateSourceJSON('http://example.com/services/data.json', 'id', $fields);
...
$this->addFieldMapping('field_example_data', 'field1');
...

Quite simple, if your data fits the default structure. What if your data is not a simple array of objects, with the data fields at the top level of each object? Consider this example:

[{
  "Release": {
    "id": "3263",
    "name": "Debut",
    "release_date": "2008-06-10",
    "artist_id": "1080",
  },
  "Artist": {
    "id": "1080",
    "name": "Jane Roe"
  },
},
{
  "Release": {
    "id": "3368",
    "name": "Sophomore Slump",
    "release_date": "2009-03-24",
    "artist_id": "1080",
  },
  "Artist": {
    "id": "1080",
    "name": "Jane Roe"
  }
}]

We're trying to import the releases, but the data for the release is a layer deeper - the "fields" that would be available for mapping by default would be Release and Artist rather than id, name, etc. We can address this by overriding the underlying class that does the actual parsing and building of the data rows, MigrateJSONReader:

class ExampleJSONReader extends MigrateJSONReader {
  /**
   * Implements MigrateJSONReader::next().
   */
  public function next() {
    // Do the default fetch.
    parent::next();
    // At this point, $this->currentElement (which will eventually become our $row object
    // in the migration class) contains Release and Artist objects. We want to move the
    // Release fields up to the top level.
    if (is_object($this->currentElement)) {
      foreach ($this->currentElement->Release as $field_name => $field_value) {
        $this->currentElement->$field_name = $field_value;
      }
      // Get rid of the Release object, we won't be referencing it. Leave the Artist object - it
      // will be available in prepareRow(), prepare(), etc. should we want to reference it.
      unset($this->currentElement->Release);
    }
  }
}

Then, we tell the source class to substitute our version of the JSON reader class:

$this->source = new MigrateSourceJSON('http://example.com/services/data.json', 'id', $fields, array('reader_class' => 'ExampleJSONReader'));
...
$this->addFieldMapping('title', 'name');
$this->addFieldMapping('field_release_date', 'release_date');
...

Help improve this page

Page status: Not set

You can: