Content Export Support [#2898704]

Problem/Motivation

Creating and maintaining content manually can be a painstaking task. Adding support for exporting content could greatly reduce the effort required to create content files, as well as open up additional workflows for use of the module.

Proposed resolution

Core export commands and services (#2943907: Content Export Support: Entity Export Services)
- Unit tests along the way
Exposure and workflow testing through CLI interfaces (#2943912: Content Export Support: CLI Interfaces)
UI design and implementation (#2943914: Content Export Support: Web Interfaces)

Further discussion encouraged.

Remaining tasks

User interface changes

Provide an interface to support exporting of selected content items.

The overall workflow for exporting content should be explored and refined first using CLI interfaces (#2943912: Content Export Support: CLI Interfaces) followed by design and implementation of administrative UIs (#2943914: Content Export Support: Web Interfaces).

API changes

TBD.

More detail on this is being outlined and discussed within #2943907: Content Export Support: Entity Export Services.

Data model changes

More detail on this is being outlined and discussed within #2943907: Content Export Support: Entity Export Services.

TBD

Comments

Comment #1

31 July 2017 at 14:53

slucero created an issue. See original summary.

Comment #2

slucero

CreditAttribution: slucero at Mediacurrent commented 31 July 2017 at 14:58

Cross-posting some of the relevant conversation from #2894715: Demo content module:

One thing I'm interested in is using the core Serialization module as an API for processing field data on import/export (handling entity references, files, etc). Mentioned that in my issue in the default_content issue queue: #2896971: Question about architecture.

You've already seen that you need such a thing (hence your events in 2.x), so the question is just whether relying on core would make that easier. I don't know how easy the implementation would end up being, especially since we want to simplify the final output (by removing the list arrays for single-cardinality fields, and the "value" level for field values with a single column).

-- bojanz #2894715-5: Demo content module
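As a rough illustration of the output simplification described in the quote above, here is a minimal sketch. It is written in Python purely for illustration (the module itself is PHP), and the function names, the sample field data, and the convention of `-1` for unlimited cardinality are assumptions, not part of the Serialization API.

```python
def simplify_field(items, cardinality):
    """Collapse a normalized field into its simplest equivalent form."""
    # Drop the "value" level when a field item has only that single column.
    flattened = [
        item["value"] if set(item) == {"value"} else item
        for item in items
    ]
    # Drop the list wrapper for single-cardinality fields.
    if cardinality == 1:
        return flattened[0] if flattened else None
    return flattened


def simplify_entity(normalized, cardinalities):
    """Simplify every field; unknown fields are treated as multi-value."""
    return {
        name: simplify_field(items, cardinalities.get(name, -1))
        for name, items in normalized.items()
    }


# Hypothetical normalized output for a single node.
normalized = {
    "title": [{"value": "About us"}],
    "body": [{"value": "<p>Hello</p>", "format": "basic_html"}],
    "field_tags": [{"target_id": 3}, {"target_id": 7}],
}
cardinalities = {"title": 1, "body": 1, "field_tags": -1}

simplified = simplify_entity(normalized, cardinalities)
```

Here `title` collapses to a plain string, `body` keeps its two columns but loses the list wrapper, and `field_tags` stays a list because it is multi-value.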

@bojanz, your suggestion for using the Serialization API is very interesting since the overall concept and goal line up very closely with the functionality I'm aiming to achieve here. One of my biggest concerns with it, however, is the fact that it is the main driving force chosen in the default_content module, and I wonder how much of that is the cause of my original difficulties with that module that led to writing this one.

Especially when it comes to using interrelated content, that approach resulted in a tightly interlinked collection of exported content dependent on ID references, whereas I sought to make something a bit more flexibly linked. Even if not going fully down the route of using the Serialization API for the export all the way to files, the normalization/denormalization functionality could prove very beneficial.

The other approach I have given a lot of consideration to for the import/export operations was somewhat following the form of migration templates to define exactly how content should be exported. This approach could support more varied use of content processors within exported and imported content to produce a wider variation of content if desired. An example of this would be to configure page nodes to export with the body field value replaced with a sample content processor. Using this, the imported content could then be used multiple times to produce a structured set of demo content with varied values.

-- slucero #2894715-6: Demo content module

Comment #3

johnwebdev CreditAttribution: johnwebdev commented 4 October 2017 at 08:19

So, we are evaluating this module for use in a larger project, and I'm definitely interested in contributing to get this feature built.

The Default Content module exports content keeping its ID, which I believe is a problem because it requires things to be imported in a specific order, and exporting becomes harder if you use a clean database. Instead, the export should rely on something else, like UUID.
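To make the UUID suggestion concrete, here is a minimal sketch in Python (used only for illustration; the module itself is PHP). The entity structure, the `target_uuid` key, and the function name are all hypothetical. The idea is just that local numeric IDs are dropped on export and references are rewritten to UUIDs.

```python
def export_with_uuid_references(entity, id_to_uuid):
    """Return an export-ready copy of `entity` with no local IDs in it."""
    # The entity's own numeric ID is deliberately left out of the export.
    exported = {"uuid": entity["uuid"], "type": entity["type"]}
    for field, value in entity["fields"].items():
        if isinstance(value, dict) and "target_id" in value:
            # Swap the site-specific ID for the portable UUID.
            exported[field] = {"target_uuid": id_to_uuid[value["target_id"]]}
        else:
            exported[field] = value
    return exported


# Hypothetical lookup from local file ID to UUID on the source site.
id_to_uuid = {12: "b7c2a1e0-0000-4000-8000-000000000012"}

node = {
    "id": 5,
    "uuid": "0f1e2d3c-0000-4000-8000-000000000005",
    "type": "node",
    "fields": {"title": "Home", "field_image": {"target_id": 12}},
}

exported = export_with_uuid_references(node, id_to_uuid)
```

Because nothing in `exported` depends on the source site's ID sequence, the import order and the state of the target database matter much less.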

Secondly, imagine we export a node that has a file (image), and then export another node using the same file. What should we expect the export files to look like? I believe we must assume that the child entity already exists and, if not, create it on the spot. This is awkward if we use multiple export files, since we would actually store the same child entity in multiple files. Any thoughts on that? Perhaps it's nothing to worry about anyway.

Should each entity have its own export file along with its child entities, or do we keep them separated?

Thinking of Drush commands, I believe an export command that takes an entity ID as an argument seems like a good way to get started?

Comment #4

slucero

CreditAttribution: slucero at Mediacurrent commented 4 October 2017 at 15:17

To lead here, I don't have a clear roadmap at this point for the feature, so I don't have solid answers for much of this. This is both helpful and not since the implementation plan is still flexible.

The Default Content module exports content keeping its ID, which I believe is a problem because it requires things to be imported in a specific order, and exporting becomes harder if you use a clean database.

This was a problem for all of my use cases and one of the largest reasons this module was initially written.

Should each entity have its own export file along with its child entities, or do we keep them separated?

This is one of the major questions that's up in the air at this point. Keeping the content together makes the most sense in the current workflow since the manual maintenance is easier that way, but I would assume that once export functionality is available the workflow may change entirely to be managed through the UI. If that becomes the case, then the main benefit of keeping things grouped is moot, and the functional benefits of exporting everything to independent files become more valuable.

If items are all exported to independent files it becomes easier to recognize reused content and avoid things being duplicated throughout aggregated files. This would of course be dependent on things being consistently identified as well, such as via UUID as you previously mentioned.
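A minimal sketch of that idea, in Python for illustration only: each entity lands in exactly one file, so a file shared by two nodes is written once. The `references` key, the file naming pattern, and the short stand-in identifiers are assumptions made for the example.

```python
def collect_export_files(root_uuids, entities_by_uuid):
    """Map one file name to each entity reachable from the given roots."""
    files = {}
    pending = list(root_uuids)
    while pending:
        entity_uuid = pending.pop()
        entity = entities_by_uuid[entity_uuid]
        # Hypothetical naming pattern: <entity type>.<uuid>.yml
        filename = f"{entity['type']}.{entity_uuid}.yml"
        if filename in files:
            continue  # Already exported, so reused content is not duplicated.
        files[filename] = entity
        pending.extend(entity.get("references", []))
    return files


# Two nodes sharing one image file (short stand-ins for real UUIDs).
entities_by_uuid = {
    "aaaa": {"type": "node", "title": "First", "references": ["cccc"]},
    "bbbb": {"type": "node", "title": "Second", "references": ["cccc"]},
    "cccc": {"type": "file", "uri": "public://logo.png"},
}

files = collect_export_files(["aaaa", "bbbb"], entities_by_uuid)
```

Exporting both nodes yields three files rather than four, since the shared image is recognized by its identifier the second time it is encountered.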

Overall Workflow

One of the overall workflows I had envisioned was an option for creating "templates" for the export process of various content/entity types. This could follow a similar pattern as (and possibly use components of) migration jobs. Following this concept an export template could be created for a content type that maps content to relevant fields to be included in the export. Taking this a step further, a more generalized template could be created to bulk create test content by defining, say a lorem ipsum plugin to be used for the body field so that for each exported node the content of the body field is generated using different lorem ipsum snippets. Similarly image fields could use a dummy image plugin to produce placeholder images for use in exported content.
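The template idea could look roughly like the following sketch. It is Python used only to illustrate the shape of the mapping; the processor names, the template structure, and the sample snippets are all hypothetical and not an existing API of this module or of Migrate.

```python
LOREM_SNIPPETS = [
    "Lorem ipsum dolor sit amet.",
    "Consectetur adipiscing elit.",
    "Sed do eiusmod tempor incididunt.",
]


def copy_value(entity, field, index):
    """Export the real field value unchanged."""
    return entity[field]


def lorem_ipsum(entity, field, index):
    """Replace the field value with a rotating placeholder snippet."""
    return LOREM_SNIPPETS[index % len(LOREM_SNIPPETS)]


# A template maps each exported field to the processor that produces it.
PAGE_TEMPLATE = {"title": copy_value, "body": lorem_ipsum}


def export_with_template(entities, template):
    """Apply the template to every entity, passing its position as index."""
    return [
        {
            field: processor(entity, field, index)
            for field, processor in template.items()
        }
        for index, entity in enumerate(entities)
    ]


pages = [
    {"title": "About", "body": "Real about text", "internal_notes": "skip me"},
    {"title": "Contact", "body": "Real contact text", "internal_notes": "skip me"},
]

exported_pages = export_with_template(pages, PAGE_TEMPLATE)
```

Fields missing from the template are simply not exported, and a dummy image processor could be plugged in for image fields in the same way.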

Comment #5

johnwebdev CreditAttribution: johnwebdev commented 4 October 2017 at 20:13

but I would assume once an export functionality is available the workflow may change entirely to be managed through the UI.

I think there are valid use cases for managing this through both the UI and the terminal. Perhaps one way to keep the files separated is to have a naming convention for the files, so that a main entity's export file can reference the others with ease.

One of the overall workflows I had envisioned was an option for creating "templates" for the export process of various content/entity types. This could follow a similar pattern as (and possibly use components of) migration jobs. Following this concept an export template could be created for a content type that maps content to relevant fields to be included in the export. Taking this a step further, a more generalized template could be created to bulk create test content by defining, say a lorem ipsum plugin to be used for the body field so that for each exported node the content of the body field is generated using different lorem ipsum snippets. Similarly image fields could use a dummy image plugin to produce placeholder images for use in exported content.

This is actually a pretty cool idea. Being able to export actual content but also being able to generate dummy content within the same template. Neat.

Comment #6

websiteworkspace CreditAttribution: websiteworkspace commented 5 November 2017 at 14:46

Thank you for this effort.
Looking forward to a dev version of export support for yaml_content.
Will be happy to perform testing once a dev version is available.

This module, with both export and import, is very much needed for use cases like Commerce 2.x (DC2x), so that base DC2x configuration and base commerce content entities (commerce product attributes, commerce product variations, commerce products) can be created once, then exported, to be available for automated import using drush and shell scripts. With such a capability it would be possible to prefabricate complex DC2x store setups with base content that can be installed by default with the profile of a use-case-specific DC2x distribution.

Comment #7

johnwebdev CreditAttribution: johnwebdev commented 5 February 2018 at 15:06

Since there is no clear road map, I'm just dumping down a lot of ideas and concepts I've worked on. By all means they are just suggestions.

Endpoints = How does a user import/export content?

Potential endpoints

Drush
User interface in Drupal

User interface in Drupal

For instance, like Configuration synchronisation you could download the exported content as a .zip, as well as import using .zip.
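As a sketch of the archive round trip, the following Python example builds a .zip in memory and reads it back. It is illustrative only: the function names and file names are hypothetical, and JSON is used instead of YAML solely so the example needs nothing beyond the standard library.

```python
import io
import json
import zipfile


def build_export_archive(files):
    """Pack a {file name: entity data} mapping into zip bytes for download."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, data in sorted(files.items()):
            archive.writestr(name, json.dumps(data, indent=2, sort_keys=True))
    return buffer.getvalue()


def read_export_archive(payload):
    """Unpack uploaded zip bytes back into a {file name: entity data} mapping."""
    with zipfile.ZipFile(io.BytesIO(payload)) as archive:
        return {
            name: json.loads(archive.read(name))
            for name in archive.namelist()
        }


files = {
    "node/about.json": {"type": "node", "title": "About"},
    "file/logo.json": {"type": "file", "uri": "public://logo.png"},
}

payload = build_export_archive(files)
restored = read_export_archive(payload)
```

The same bytes could be served as a download from a UI or written to disk by a CLI command, which keeps the archive format independent of the endpoint.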

You could also have something like Deploy

I was also thinking about saving exported content in a folder, like the config/sync though that means you'll have to move the contents as part of a deployment, which I'm not sure of.

The main point here is that the endpoints should be decoupled so that we can build new ones.

Importing and exporting should be unidirectional. This means that regardless of how content is imported and exported, the module does not keep any metadata or state about the content being exported and imported.
So given we'd use a user interface like uploading a .zip there won't be any concerns regarding integrity, but simply creating or updating the exported content.
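A minimal sketch of such a stateless import, in Python for illustration only: the importer keeps no record of earlier runs and decides between create and update purely from what is already in storage. Matching on UUID and the dictionary-based storage are assumptions made for the example.

```python
def import_entities(exported, storage):
    """Create or update each item; no import history is stored anywhere."""
    results = {"created": [], "updated": []}
    for item in exported:
        key = item["uuid"]
        if key in storage:
            # A match exists, so update it in place.
            storage[key].update(item)
            results["updated"].append(key)
        else:
            storage[key] = dict(item)
            results["created"].append(key)
    return results


# Hypothetical existing site content, keyed by UUID.
storage = {"aaaa": {"uuid": "aaaa", "title": "Old title"}}

exported = [
    {"uuid": "aaaa", "title": "New title"},
    {"uuid": "bbbb", "title": "Brand new"},
]

first_run = import_entities(exported, storage)
second_run = import_entities(exported, storage)
```

Running the same import twice is safe: the second run only reports updates, and the stored content ends up identical.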

How is content exported?

I made a small diagram of how I thought an export could work technically, based on the idea of using Serialisation and inspired by other modules.

https://i.imgur.com/mpaW8NQ.png

The process relies on processor plugins (which I believe you mentioned earlier as well), which allow developers to opt in and add additional data, etc.

Also some questions (perhaps part of the user interface)

Do you want to preserve IDs? (Error-prone)
Do you want to include entity references children? (Most likely)
Do you want to move physical assets? (Probably)
Do you want to include all languages? (Maybe)

Things like URL aliases, menu links, etc. would be handled by processor plugins. I looked at Facets, which has multiple processor stages and is really flexible for developers wanting to opt in.
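A staged processor system along those lines might be sketched as follows. This is Python for illustration only; the stage names, the class, and the processor signatures are made up for the example and are not the Facets API.

```python
# Hypothetical stage names, run in this order for every exported entity.
STAGES = ("pre_export", "export", "post_export")


class ProcessorPipeline:
    """Runs registered processors stage by stage over one entity."""

    def __init__(self):
        self._processors = {stage: [] for stage in STAGES}

    def register(self, stage, processor):
        if stage not in self._processors:
            raise ValueError(f"Unknown stage: {stage}")
        self._processors[stage].append(processor)

    def run(self, entity):
        data = {}
        for stage in STAGES:
            for processor in self._processors[stage]:
                data = processor(entity, data)
        return data


def base_fields(entity, data):
    """Core export step: copy the entity's own values."""
    return {**data, "title": entity["title"]}


def path_alias(entity, data):
    """Opt-in processor: add the URL alias when the entity has one."""
    if entity.get("alias"):
        return {**data, "path": {"alias": entity["alias"]}}
    return data


pipeline = ProcessorPipeline()
pipeline.register("export", base_fields)
pipeline.register("post_export", path_alias)

result = pipeline.run({"title": "About", "alias": "/about"})
```

A menu link processor, or a site-specific one, would be registered the same way without touching the core export step.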

Since an entity may contain many entities, I figured it would be better to actually split up an export into multiple files (which would also make debugging easier).

I'd be happy to discuss further :)

Comment #8

slucero

CreditAttribution: slucero as a volunteer commented 11 February 2018 at 16:14

@johndevman, thanks for all the thought you've put into this! I appreciate the ideas greatly, especially since I haven't gotten to swing back around to this to put some more thought into it.

Endpoints = How does a user import/export content?

Potential endpoints

Drush

User interface in Drupal

Both of these are features I'd like to target, with the additional consideration of Drush and/or Drupal Console. As long as the core commands are appropriately compartmentalized into services, exposure into these different interfaces, as well as adequate testability, should be easier to achieve. The core focus will first be the core functionality in appropriate services, and then they can be exposed into lower-level commands from Drush/Console for quicker implementation and testing to work out the kinks before a larger investment in designing a UI around the process.

So based on that, the prioritization would look like:

Core export commands and services (#2943907: Content Export Support: Entity Export Services)
- Unit tests along the way
Exposure and workflow testing through CLI interfaces (#2943912: Content Export Support: CLI Interfaces)
UI design and implementation (#2943914: Content Export Support: Web Interfaces)

The main point here is that the endpoints should be decoupled so that we can build new ones.

This is critical and one of the main motivations for the separation of tasks I've outlined for this implementation.
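The separation could be sketched like this, in Python for illustration only: one service holds the export logic, and the CLI and UI layers are thin wrappers around it. The class, the function names, and the `content-export` command name are hypothetical, not actual Drush or Drupal Console commands.

```python
import argparse
import json


class ExportService:
    """Core export logic with no knowledge of how it is invoked."""

    def __init__(self, storage):
        self._storage = storage

    def export(self, entity_id):
        return dict(self._storage[entity_id])


def cli_export(service, argv):
    """CLI endpoint: takes an entity ID argument and prints-ready JSON."""
    parser = argparse.ArgumentParser(prog="content-export")
    parser.add_argument("entity_id")
    args = parser.parse_args(argv)
    return json.dumps(service.export(args.entity_id), sort_keys=True)


def ui_export(service, selected_ids):
    """UI endpoint: exports whatever items were selected in a form."""
    return [service.export(entity_id) for entity_id in selected_ids]


service = ExportService({
    "5": {"type": "node", "title": "Home"},
    "6": {"type": "node", "title": "About"},
})

cli_output = cli_export(service, ["5"])
ui_output = ui_export(service, ["5", "6"])
```

Because both wrappers call the same service, the export logic can be unit tested on its own, and new endpoints can be added without changing it.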

Importing and exporting should be unidirectional. This means that regardless of how content is imported and exported, the module does not keep any metadata or state about the content being exported and imported.
So given we'd use a user interface like uploading a .zip there won't be any concerns regarding integrity, but simply creating or updating the exported content.

I think this is a great requirement to build around. It would keep the system simpler and more maintainable all around. The only exception we address currently for this is updating existing content if it's matched, but I think this is a core element of the module for most use cases that will need to be maintained.

User interface in Drupal

I've cross-posted this section for reference and follow-up in the new export UI ticket: #2943914-2: Content Export Support: Web Interfaces

For instance, like Configuration synchronisation you could download the exported content as a .zip, as well as import using .zip.

I like this idea since that will support more environments and usage scenarios where someone may not be as comfortable with file system access or may not even have ready access to it.

You could also have something like Deploy

I haven't actually used the Deploy module, so I'll have to look into this before I can weigh in on that. I expect there could be some definite parallels, though, given the similarity of use cases.

I was also thinking about saving exported content in a folder, like the config/sync though that means you'll have to move the contents as part of a deployment, which I'm not sure of.

This gets into some of the larger workflow questions I'd still like to explore further. In my mind this relates to a couple of the largest questions that have been brought up previously and still need further exploration:

How is exported content distributed within files (aggregated vs one per file vs other options)?
How does exported content relate to the existing manual creation process?

Export Process

I'm cross-posting this section to the new ticket for the core services for this process for reference and follow-up: #2943907-2: Content Export Support: Entity Export Services

I made a small diagram of how I thought an export could work technically, based on the idea of using Serialisation and inspired by other modules.

https://i.imgur.com/mpaW8NQ.png

I think this is a great model to start with. The normalization process was brought up previously and shows a great deal of promise. My concerns from my initial digging into the existing APIs at that point are that the existing core APIs for this may not suit our purposes.

The process relies on processor plugins (which I believe you mentioned earlier as well), which allow developers to opt in and add additional data, etc.

This and the event system currently in use are both critical features in my mind, since to this point these have been the primary methods of not only site-specific customizations, but also adding in support for more difficult or non-standard features like node menu links (#2879468: Nodes Cannot Create Menu Links Automatically), user loading (#2876203: Support load user entity by name process callback), and, soon, path aliases (#2883434: Nodes Cannot Create Path Aliases).

Also some questions (perhaps part of the user interface)

Do you want to preserve IDs? (Error-prone)

Do you want to include entity references children? (Most likely)

Do you want to move physical assets? (Probably)

Do you want to include all languages? (Maybe)

Since this post is already getting very long, let's follow-up and discuss these decisions in more detail in #2943907: Content Export Support: Entity Export Services.

Things like URL aliases, Menu links etc would be handled by processor plugins. I looked at Facets which has multiple processor stages which is really flexible for developers wanting to opt-in.

I hadn't thought of referencing Facets and its multi-staged system. That's a great idea and could potentially solve some of the problems I've encountered so far with the event-based system.

Since an entity may contain many entities, I figured it would be better to actually split up an export into multiple files (which would also make debugging easier).

I think there are a lot of side effects that will cascade down from any decisions we make on this front. I tend to agree that the content will need to be split across multiple files, especially to support content reuse and cross-referencing, but the extreme I would like to avoid as much as possible is making the automated exports too difficult to manually interpret and maintain where needed. One example of how this could come about would be if all of the exported content were keyed by UUID, which is notoriously unfriendly.
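One possible middle ground, sketched here in Python purely as an illustration, is a file name that leads with a readable slug and keeps only a short UUID prefix for uniqueness. The naming pattern, the eight-character prefix, and the function name are assumptions for the example, not a decided convention.

```python
import re


def export_filename(entity):
    """Build a readable file name that still stays unique per entity."""
    # Lowercase the label and collapse anything non-alphanumeric to "-".
    slug = re.sub(r"[^a-z0-9]+", "-", entity["label"].lower()).strip("-")
    # A short UUID prefix disambiguates entities with the same label.
    return f"{entity['type']}/{slug}--{entity['uuid'][:8]}.yml"


filename = export_filename({
    "type": "node",
    "label": "About Us!",
    "uuid": "1b4e28ba-2fa1-11d2-883f-0016d3cca427",
})
```

This produces `node/about-us--1b4e28ba.yml`, which is easier to scan in a directory listing than a bare UUID while still being traceable to a single entity.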

I'd be happy to discuss further :)

Thanks again for all the thought and insight you've put into this!

Comment #9

johnwebdev CreditAttribution: johnwebdev commented 15 February 2018 at 08:36

The core focus will first be the core functionality in appropriate services, and then they can be exposed into lower-level commands from Drush/Console for quicker implementation and testing to work out the kinks before a larger investment in designing a UI around the process.

Sounds like a sensible approach.

I think this is a great requirement to build around. It would keep the system simpler and more maintainable all around. The only exception we address currently for this is updating existing content if it's matched, but I think this is a core element of the module for most use cases that will need to be maintained.

I agree. There are also some edge cases with deletion, for instance when using Paragraphs, but that should not be handled by the core of the module but rather by a processor plugin.

I like this idea since that will support more environments and usage scenarios where someone may not be as comfortable with file system access or may not even have ready access to it.

Yes, it's a bit more work for the user than using something like Deploy, but still a viable approach. It was inspired by how Episerver works with exporting and importing data.

This gets into some of the larger workflow questions I'd still like to explore further. In my mind this relates to a couple of the largest questions that have been brought up previously and still need further exploration:

The problem with this approach is that it is for developers or experienced site builders. I am not sure whether those are the people responsible for moving content in general.

I think there are a lot of side effects that will cascade down from any decisions we make on this front. I tend to agree that the content will need to be split across multiple files, especially to support content reuse and cross-referencing, but the extreme I would like to avoid as much as possible is making the automated exports too difficult to manually interpret and maintain where needed. One example of how this could come about would be if all of the exported content were keyed by UUID, which is notoriously unfriendly.

Agreed. Perhaps a follow-up issue to discuss it more thoroughly?

Comment #10

slucero

CreditAttribution: slucero as a volunteer and at Mediacurrent commented 3 March 2018 at 15:02

@johndevman, I've created #2949611: Content Export Support: File Organization Stragegy to follow-up on our discussion above regarding file organization for the exports.

I think there are a lot of side effects that will cascade down from any decisions we make on this front. I tend to agree that the content will need to be split across multiple files, especially to support content reuse and cross-referencing, but the extreme I would like to avoid as much as possible is making the automated exports too difficult to manually interpret and maintain where needed. One example of how this could come about would be if all of the exported content were keyed by UUID, which is notoriously unfriendly.

Agreed. Perhaps a follow-up issue to discuss it more thoroughly?