Problem/Motivation
Recipes -- and especially site templates -- can include content. Core can import the format generated by the venerable Default Content contrib module.
However, exporting content is a gap. You need the Default Content module to do that. Default Content has some problems, namely that it is not extensible for export -- new types of fields and data structures can't be handled by it without patching the module.
Besides, it's not really feasible to require recipe authors, and especially site template creators, to use Default Content to put content into their recipes. It needs to work, and it needs to be able to handle all core field types, and it needs to be able to handle exotic contrib field types (Entity Reference Revisions, Smart Date, Experience Builder's stuff, and so on).
Proposed resolution
I propose we add a new content:export command to the core/scripts/drupal script.
Initially, to keep things simple, it should support exporting entities one at a time, with no handling of dependencies, and only in YAML format. For example:
$ php core/scripts/drupal content:export node 42
... YAML DUMP HERE ...
To generate the export, it should use the Serialization module's normalization API to normalize the entity and all of its fields. This means the command will have to exit with an error if Serialization is not installed -- but that's probably okay for the time being. This is a developer-facing command anyway, and we can lift that restriction when and if Serialization is turned into a core subsystem (which was discussed in #2296029: Move Serialization module back into a core/lib component).
The exported content should be, pretty much, exactly what you'd get out of the Default Content module. We don't need to handle normalization for all core field types right away; that can happen in follow-ups, as long as the normalization is pluggable.
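Pluggable normalization can be pictured as a registry keyed by field type, with a fallback to the raw item values. What follows is a minimal Python sketch, not Drupal's PHP API. SketchExporter and its methods are hypothetical. The output shape approximates the Default Content format and is not a specification of it: a _meta block (version, entity type, UUID, bundle, default langcode) plus a default key holding each field's list of items.

```python
from typing import Any, Callable, Dict, List

# Hypothetical normalizer signature: one field item (property -> value) in,
# the exported item out.
ItemNormalizer = Callable[[Dict[str, Any]], Dict[str, Any]]


class SketchExporter:
    """Illustrative only; not the API of core or the Default Content module."""

    def __init__(self) -> None:
        self._normalizers: Dict[str, ItemNormalizer] = {}

    def register(self, field_type: str, normalizer: ItemNormalizer) -> None:
        # Pluggability: a module could register a normalizer for its own type.
        self._normalizers[field_type] = normalizer

    def export(self, entity: Dict[str, Any]) -> Dict[str, Any]:
        default: Dict[str, List[Dict[str, Any]]] = {}
        for name, field in entity["fields"].items():
            # Unregistered field types fall back to a shallow copy of each item.
            normalize = self._normalizers.get(field["type"], dict)
            default[name] = [normalize(item) for item in field["items"]]
        return {
            "_meta": {
                "version": "1.0",
                "entity_type": entity["entity_type"],
                "uuid": entity["uuid"],
                "bundle": entity["bundle"],
                "default_langcode": entity["langcode"],
            },
            "default": default,
        }
```

With this design, a contrib field type needs no change to the exporter itself, only a registered normalizer.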
Indeed, this command will not, initially, be as robust as Default Content is -- both because Default Content is more battle-tested, and crucially, because it has a lot of hard-coded handling of various special cases and field types (both core and contrib), much of which will need to be ported into core piecemeal after this first issue is committed. But we can start here.
In a follow-up issue, we should add support for exporting an entity and its dependencies into a folder structure. We should also support doing the export as a specific user (maybe a --user=N option), rather than merely "the first one with an administrative role".
User interface changes
None, but there will be a new content:export command for the drupal script.
Introduced terminology
None.
API changes
No API changes as such, but a slew of additions to the experimental default content API (including a new event) and a change to the Serialization module's normalizers in order to support passing callback functions to the normalizers' $context parameter, which necessitates a new interface for those callbacks. Fields and data types will be able to specify a setting that lets them opt into, or out of, being exported.
None of these changes have BC implications.
Data model changes
None.
Release notes snippet
TBD
Issue fork drupal-3532694
Comments
Comment #2
larowlan
+1 for this. It will likely require a dependency on the Serialization module, but I think that is fine.
Comment #3
phenaproxima
Comment #4
phenaproxima
Just to give a little context, @larowlan linked me to https://www.previousnext.com.au/blog/we-could-add-default-content-drupal..., which is from almost a decade ago. It outlines four tricky problems that prevent the addition of default content capabilities to core. I want to quickly shoot these down.
Core's default content import is done by recipes, and works well. The concern about shipping content with modules is moot. Recipes' job is to put all necessary configuration in place before any content is created, and the recipe system's strong, straightforward configuration handling means that default content can come in easily. This concern is definitely no longer applicable.
This was true for v1 of the Default Content module. We would still need to have Serialization enabled for export, that's true...but there are no special modules required for import, which is the end user-facing case. Recipe authors are unlikely to be bothered by the need to install Serialization before exporting content.
This is probably legitimate, although the situation is likely significantly better now than it was at the time the blog post was written, thanks to the advent of JSON:API and the core improvements that it brought us. But yes, normalization is the meat and potatoes of doing default content export correctly.
We already put default content in recipes, which puts profile support on the back burner (where it belongs). This is not a problem anymore.
So there you go. If you ask me, the time to do this is now!
Comment #5
thejimbirch commented
Moving to the Default content system.
Comment #7
nicxvan commented
Slightly different use case, but Tome can export content to JSON for keeping it in Git.
Comment #8
phenaproxima
Comment #9
phenaproxima
Updating API changes based on my current progress.
Comment #10
phenaproxima
This is now reviewable and has a passing test.
My deep journey into the Serialization module (which I've not used before) has shown me several things:
ExportMetadata) which makes it easy for normalizers to flag other entities as dependencies.
So... onward! Let's get this crucially important feature shipshape and merged in.
Comment #11
nicxvan commented
Great work! And this went so quickly.
I like how you can set individual properties as not exportable.
It's kind of sad that Layout Builder is left out again, but I think fixing that issue is out of scope, as you mentioned. I assume this will not work with Experience Builder either, then?
I know I've objected to final and private on several issues now, but a content exporter feels explicitly like the kind of thing you want to extend, and this precludes that.
That method of testing is pretty clever!
Haven't deeply reviewed this yet.
Comment #12
phenaproxima
Not out of the box, but by hooking into the serialization system, it gives XB a way to become exportable: all it needs to do is implement a normalizer that can normalize its various data structures. That's the single biggest advantage of this approach over Default Content's -- modules can handle exporting their own data.
The exporter should not be extensible. If you want to change how it operates, you should implement a normalizer. To me, that feels like the correct amount of API surface here; what would the use case be for directly extending the exporter itself?
Comment #13
nicxvan commented
Everything is final, though, not just the exporter.
We shouldn't limit contrib based on my lack of creativity. My point is that, just as with everything else marked final, you have no recourse beyond un-finalizing it or using reflection.
It's marked final, so you can't extend it or decorate it. If someone wants to experiment with the complex data normalizer, the solution is to copy everything and fork it.
Comment #14
phenaproxima
(emphasis added)
That's not true. You can decorate anything with an interface, and the export normalizer has an interface (NormalizerInterface). It is a decorator itself. Decoration is the correct way to add more things to final classes.
I am not going to die on the final/private hill in this issue; if a committer tells me to mark it non-final and make the private members protected, I'll do that. But I will insist that the class be marked internal with a clearly-worded warning, because it is part of an experimental subsystem. If someone extends an internal class and it breaks them, they deserve what they get. 😈
Comment #15
nicxvan commented
I will explore that further; I may have missed something in my testing. It's been a bit.
I 100% agree.
Also agreed, it's caveat emptor.
My point of contention is that we should not block extending it, but rather warn people not to extend it.
It's why I want to begin using @final. It's an even stronger warning.
I'm just wary of final after running into blockers with Rector and Symfony.
Comment #16
phenaproxima
Comment #17
murz
As an alternative, until this feature is in core, we can use this module: https://www.drupal.org/project/single_content_sync. It can export Layout Builder too, and it also bundles referenced entities like menu and path_alias into a single YAML file together with the node.
Comment #18
phenaproxima
A few follow-ups have been suggested to me privately by interested parties, plus a couple of ideas of my own:
ExportMetadata to facilitate this.
Finder class) and it would be trivial to make the export command write JSON instead of YAML, maybe with a --format=json option.
--as-user=N option to the export command for that.
demo_umami_content to use the default content system! That would be a really strong test of its capabilities and would exercise more nooks and crannies than just our comparatively sad little test fixture. :)
Comment #19
nicxvan commented
One thing that I'm not sure how to flag for infrastructure: if this includes media, then the Git repos for recipes using these exports may get enormous.
I've been using Tome to manage https://nlighteneddevelopment.com for a couple of years.
I don't have a lot of content or images, and I deploy once or twice a year, yet that repo is currently a gigabyte.
Compare that with a Drupal 11 site that I have deployed hundreds of times, whose repo is about 50 megabytes.
I've been considering whether there is a way to set up Git LFS for my site.
I don't think this is a blocker by any means, but infrastructure should prepare.
Comment #20
phenaproxima
Infra has its own issue queue: https://www.drupal.org/project/issues/infrastructure?categories=All
Comment #21
nicxvan commented
Thanks!
Comment #22
phenaproxima
Filed the follow-ups from #18.
Comment #23
phenaproxima
Tagging as a contributed project soft blocker because without this, we can't easily build site templates.
Comment #24
phenaproxima
Opened #3532961: [PP-1] Add a normalizer for component tree items to take advantage of this in Experience Builder.
Comment #25
phenaproxima
Adding a related issue that will absolutely impact this one -- or will be impacted by this one -- depending on which gets committed first.
Comment #26
phenaproxima
Adding #3533005: Allow fields to be marked as non-exportable as related, which would implement field-level export access control in core.
Comment #27
larowlan
Left some comments on the MR, nice work!
Comment #28
phenaproxima
Comment #29
thejimbirch commented
I reviewed and made a minor suggestion. Leaving as Needs Review for someone more technical than I am to review.
Comment #30
phenaproxima
Change record drafted: https://www.drupal.org/node/3533854
Comment #31
phenaproxima
Comment #33
phenaproxima
Adjusting credit.
Comment #34
larowlan
Gave this a manual test against Umami; it works well.
Tested it with user 3, and I think we need to look into how password hashing works.
Because pre_hashed is set to FALSE, when the user is imported their password will get re-hashed (see \Drupal\Core\Field\Plugin\Field\FieldType\PasswordItem::preSave), and they won't be able to log in.
I think we probably want to fix that and add test coverage.
Other than that, I think this is looking good to go
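The following sketch shows in miniature why re-hashing breaks login. The exported value is already a hash. If the importer treats it as plain text, it hashes the hash, and the stored value no longer matches what the user's password produces. This is a Python sketch only: hashlib's PBKDF2 with a fixed salt stands in for Drupal's password service, and the PasswordItem class is only loosely modeled on the pre_hashed behavior described in the comment above. It is not core's implementation.

```python
import hashlib

# Fixed salt so the sketch is deterministic; real hashing uses per-hash salts.
SALT = b"fixed-salt-for-illustration"


def hash_password(plain: str) -> str:
    # Stand-in for Drupal's password hashing service.
    return hashlib.pbkdf2_hmac("sha256", plain.encode(), SALT, 1000).hex()


class PasswordItem:
    """Loosely modeled on the pre_hashed flag; not core's implementation."""

    def __init__(self, value: str, pre_hashed: bool = False) -> None:
        self.value = value
        self.pre_hashed = pre_hashed

    def pre_save(self) -> None:
        if self.pre_hashed:
            # The value is already a hash: store it untouched.
            self.pre_hashed = False
        else:
            # Treated as plain text, so it gets hashed (again).
            self.value = hash_password(self.value)


def import_user(exported_hash: str, pre_hashed: bool) -> str:
    item = PasswordItem(exported_hash, pre_hashed)
    item.pre_save()
    return item.value
```

Importing with pre_hashed set to FALSE stores hash(hash(password)), so logging in with the original password fails. Flagging the exported value as pre-hashed preserves it.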
Comment #35
phenaproxima
Ooooh, great catch. Fixed with a test (there's no Christmas-ey CI run for this one since it's not fixing a pre-existing bug in HEAD).
Comment #36
berdir
Started this on the MR, where I added some more comments, but I'm moving this to an issue comment.
I'm not really sold on the serializer + callbacks structure.
When I created default_content v2, I explicitly avoided the Serialization module and the Symfony Serializer component. The existing importer in core avoids it too. \Drupal\Core\DefaultContent\Importer::setFieldValues expects values that it can set as-is, with a few specific known exceptions.
You mention this is explicit in regards to entity fields, but this relies on an existing arbitrary normalization format ($format is NULL), so we have no idea what we get back from that normalization process.
You work around this by adding several callbacks that explicitly undo the specific normalizers in core, such as timestamp and entity references. What about field types you're missing? What if those normalizers have been customized?
What exactly does using serialization provide if we undo half of it?
95% of the normalization in default_content is two methods, normalizeTranslation() (30 LOC) and getValueFromProperty() (66 LOC). I wrote it 5 years ago with no changes since then (there are a bunch of open issues to add support for some additional field types, but it can handle _a lot_ out of the box). It's specifically built to match the import logic, builds on the content entity and typed data APIs, and handles entity types generically. I think we can find some extension points for this (possibly either those callbacks, or maybe tagged services, which I think would be more direct).
Comment #37
phenaproximaIt does not avoid Serialization because of any specific problem in Serialization; it avoids it because it is a module, and the core importer is a subsystem (which it needs to be, since recipes can be applied even with no other modules installed).
We aren't undoing as much as you think. Serialization, as it exists in core, gets us 95% of the way there. Most of the reason we need the callbacks is to match the stuff that Default Content puts out -- which, again, is the short-term goal of this MR. With a coherent import and export system in core, we can begin to evolve the "format" a little bit and remove some of these workarounds.
That's fair. We could send it an actual value (I had previously been using raw, or raw:1.0) so that at least a format is defined and we can build on that. Happy to restore that if you want; it won't hurt anything, and it would certainly be prudent to make the desired output format explicit.
But the whole point of the callback system (which was @alexpott's idea) is so that the normalizers themselves don't need to know anything special about output format, and just focus on downcasting our data structures to simple arrays and primitives.
Indeed, in previous versions of the diff, I did change the normalizers to know about the specific export data format, and act accordingly -- this way is much cleaner and far less prone to getting stuck with edge cases that cannot be worked around in contrib (if Default Content doesn't know how to handle a specific field type correctly, you're screwed; with the callback system, you can do something about it). Apart from the new setting on data definitions, export logic is confined to export-related code.
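Here is a sketch of the callback idea described above. A generic normalizer downcasts values for API clients; in this sketch it formats a timestamp as an ISO 8601 date. The exporter, meanwhile, passes per-type callbacks through the context so that it gets the raw value back. This is Python, not the Serialization module's PHP API. The "callbacks" context key and the normalize() function are hypothetical.

```python
from datetime import datetime, timezone
from typing import Any, Callable, Dict, Optional


def normalize(data_type: str, value: Any, context: Dict[str, Any]) -> Any:
    """Hypothetical generic normalizer with an optional callback hook."""
    callbacks: Dict[str, Callable[[Any], Any]] = context.get("callbacks", {})
    callback: Optional[Callable[[Any], Any]] = callbacks.get(data_type)
    if callback is not None:
        # The caller (e.g. an exporter) decides the output for this type.
        return callback(value)
    if data_type == "timestamp":
        # Client-facing default: a formatted date rather than a UNIX timestamp.
        return datetime.fromtimestamp(value, tz=timezone.utc).isoformat()
    return value
```

Here the normalizer stays ignorant of the export format, but for a type like timestamp the callback exists mainly to undo the generic formatting. That is the trade-off questioned in the next comment.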
Comment #38
berdir
I said "I", not "It", as in: when I wrote default_content 2.x. The exporter here specifically doesn't avoid it; it absolutely depends on it.
What gets you there is the specific implementations of the ContentEntity, List and Field normalizers. We know how content entities are built, they are containers of lists of field items with properties. Serialization/Normalization is a super generic API capable of dealing with arbitrary data structures, which we are not working with.
I don't see how it makes sense to use a generic normalize API when we know that we do not support generic denormalization.
The workarounds are because serialization is used. default_content doesn't need any special handling for timestamp fields, or date fields, which this doesn't handle yet. It just exports the raw values. The serialization normalizers were added to remove drupalisms from our data structures and allow arbitrary clients to consume our data. They want formatted, standardized dates (for example), not UNIX timestamps.
default content export and import was purpose-built for a compact, simple and stable export/import format of default content in Drupal. 1.x used hal_json and it was pretty annoying to work with. The reason hal_json was used is that it deals with dependencies, which is useful for us and why I specifically added that as well. The default normalizer doesn't do that, so you have to add that back.
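Dependency tracking of the kind hal_json provided can be added back at the property level. Whenever a property holds an entity reference, the exporter writes the target's UUID and records it as a dependency. This works regardless of whether the field is a file, image or entity reference field. The Python sketch assumes a Default Content-like convention: a target_uuid key in the item, and UUID-to-entity-type pairs collected for the _meta depends list. EntityRef and normalize_item() are hypothetical names.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class EntityRef:
    """Hypothetical stand-in for a resolved entity reference property."""
    entity_type: str
    uuid: str


def normalize_item(item: Dict[str, Any], depends: Dict[str, str]) -> Dict[str, Any]:
    out: Dict[str, Any] = {}
    for prop, value in item.items():
        if isinstance(value, EntityRef):
            # Decided by property type, not field type, so any field whose
            # items carry an entity reference property takes this path.
            out["target_uuid"] = value.uuid
            depends[value.uuid] = value.entity_type
        else:
            out[prop] = value
    return out
```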
Field definitions don't really have a way to distinguish serial identifiers from non-serial ones; our storage basically just assumes that integers are, while strings are not (\Drupal\Core\Entity\Sql\SqlContentEntityStorageSchema::processIdentifierSchema). We can easily add a check for that in \Drupal\default_content\Normalizer\ContentEntityNormalizer::getFieldsToNormalize().
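The integer-versus-string heuristic described above could be applied when deciding which fields to export: integer ID and revision keys are assumed to be serial and are dropped, while string IDs are kept. This is a Python sketch. FieldDefinition and fields_to_export() are hypothetical and are not the code of getFieldsToNormalize().

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class FieldDefinition:
    name: str
    type: str  # e.g. "integer", "string", "uuid"


def fields_to_export(definitions: List[FieldDefinition], id_key: str,
                     revision_key: Optional[str] = None) -> List[str]:
    """Drop ID and revision keys only when they are integers (assumed serial)."""
    keys = {id_key, revision_key}
    return [
        d.name for d in definitions
        if not (d.name in keys and d.type == "integer")
    ]
```

An entity type with a string ID keeps that ID in the export, which is what prevents conflicts only for the serial case.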
... with the callback system that was specifically invented for this. There is absolutely no reason why we couldn't add something similar to the default_content logic. As mentioned, it could be built on tagged services, so we wouldn't need an event listener to register callbacks that we then pass around through a magic array key:
We can register them directly on the exporter, check if we have something matching the type we have and call it. And we can add something to support import as well.
Right I forgot about that. The existing default_content logic does not, because it was specifically designed to export and import raw data and not worry about access and users. So this is another workaround that's needed because you use serialization.
Comment #39
phenaproxima
Here's the thing: I personally do not, at the end of the day, actually care whether this uses Serialization or not.
The goal of this MR is to do whatever it takes to get core to export content in the format it knows how to import (which was lifted from Default Content). Whether it does that with a dedicated normalizer, or a tagged service collector, or straight-up magic, is not very important to me. I have two needs here that I'm trying to fulfill:
Context also matters: the default content API is experimental and we have a great deal of latitude to change it. There is plenty of time between now and 11.3.0 to work out the architecture.
This feature is strategically necessary. I am doing whatever needs to be done to get it in at all, and this has already been refactored too many times.
My vote: merge it more or less as-is, and then open follow-ups to make whatever architectural changes we want (tagged services? an officially supported "exportable" setting for fields? etc.) before we call the core default content API stable. If that means we don't need to bother with Serialization, great -- so be it. Until 11.3.0 reaches beta, we can do whatever we like.
I'm certainly open to some of what you propose!
But site templates need this feature, and they need it now.
Comment #40
phenaproxima
@berdir, I did a little bit of experimenting and indeed, not using Serialization definitely does show the potential to simplify a number of things quite a bit. Registering additional field types could be done with, say, a PreExportEvent subscriber that does something like:
As I've said, I'm mostly agnostic to how it works. But, with that being said, this has been refactored four times and if I'm going to make further architectural changes, I'd really like there to be alignment between those who feel strongly about that architecture, so that the next refactor is the last one.
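A PreExportEvent subscriber that registers callbacks for the field types it owns might look roughly like this. The exporter dispatches the event once, then applies whatever callbacks were registered. This is a Python sketch: the PreExportEvent class, set_callback() and the boolean subscriber are hypothetical stand-ins, not the signatures of any core class.

```python
from typing import Any, Callable, Dict, List


class PreExportEvent:
    """Hypothetical stand-in; not the signature of any real core class."""

    def __init__(self) -> None:
        self.callbacks: Dict[str, Callable[[Any], Any]] = {}

    def set_callback(self, field_type: str, callback: Callable[[Any], Any]) -> None:
        self.callbacks[field_type] = callback


def boolean_subscriber(event: PreExportEvent) -> None:
    # Hypothetical subscriber: export 0/1 storage values as real booleans.
    event.set_callback("boolean", bool)


def export_values(values: Dict[str, Any], field_types: Dict[str, str],
                  subscribers: List[Callable[[PreExportEvent], None]]) -> Dict[str, Any]:
    event = PreExportEvent()
    for subscriber in subscribers:
        # Subscribers register callbacks only for the field types they own.
        subscriber(event)
    exported = {}
    for name, value in values.items():
        callback = event.callbacks.get(field_types[name])
        exported[name] = callback(value) if callback is not None else value
    return exported
```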
Comment #41
berdir
Noted on the time constraints.
I don't have as much time to keep up with this (I wrote my previous comment at 1am), but I've been thinking about how to add extension points to the current default content export code. I'll try to create an MR to show my ideas ASAP, and then we could discuss in Slack or on a call with the others?
On stability: also noted, but you also change stable APIs with the field settings and the callbacks, where stability gets more complicated. IMHO, an approach that works more out of the box and requires fewer adjustments in core and contrib will make your life easier. Your site templates will need to play nice with contrib entity types too.
Comment #42
alexpott
@berdir, thanks for the reviews.
Discussed with @phenaproxima - we agreed to remove the dependency on Serialization as suggested by @berdir. @phenaproxima and I still think there is value in allowing fields to have some say in how they are exported via the setSetting() capability. Re 11.3.x vs. previous releases: I think hardcoding a string is fine in this situation; we can create an enum from the string when we use it and error if it is wrong. Also, @phenaproxima has a test showing that these settings do not end up in base field override configuration, so I don't think we need to worry about configuration schema. @berdir, maybe you have another suggestion that we could leverage for a field to give the exporter extra information. We're trying to avoid the exporter making too many assumptions about fields and give some control to the fields.
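The approach above (hardcode a string, convert it to an enum at the point of use, and error if it is wrong) can be sketched as follows. Python stands in for PHP here. The ExportMode cases and the "export" settings key are hypothetical, not the values of the actual setting.

```python
from enum import Enum
from typing import Any, Dict


class ExportMode(Enum):
    """Hypothetical modes; the real setting's values are not shown here."""
    EXPORT = "export"
    IGNORE = "ignore"


def export_mode(settings: Dict[str, Any]) -> ExportMode:
    # The field stores a plain string; it only becomes an enum when used.
    raw = settings.get("export", ExportMode.EXPORT.value)
    try:
        return ExportMode(raw)
    except ValueError:
        # Fail loudly on an unknown value instead of guessing.
        raise ValueError(f"Invalid export setting: {raw!r}") from None
```

Storing a plain string keeps the stored setting free of a class dependency. The conversion step is where validation happens.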
Comment #43
phenaproxima
Comment #44
berdir
Thanks for considering my feedback. I really like the direction; this is way more isolated to the component and requires fewer overrides and customizations.
I do have "few" more thoughts, we can decide what of that we can look into in follow-ups or if there are things we want to change before we get in. (Warning, still a long comment, because I'm me and like writing many words).
* I'm still not too fond of the settings approach, but I can live with it. What I'd suggest is that we explore allowing this but also start off with sane defaults. Basically, do what \Drupal\default_content\Normalizer\ContentEntityNormalizer::getFieldsToNormalize() does, as a fallback, if nothing is explicitly specified. This could also be used for the changed field instead of a callback. It's kind of what Exportable::ignore() does, but in the default case we'd check those for the entity keys and so on. The current API might not have enough context to access that, though. What I would have done in default_content is add a specific event to the mentioned method. There will be plenty of contrib and custom entity types that do not use \Drupal\Core\Entity\ContentEntityBase::baseFieldDefinitions() (as that was added after 8.0, IIRC), and those will be broken. Their IDs will be exported, resulting in conflicts and so on. There are also still a bunch of "useless" fields being exported now, such as revision affected (calculated on save) and content translation metadata fields (less clear, but IMHO not useful for default content).
* For the event, now that we have control over it, what I had thought about exploring in default_content is making it "active": basically, just pass in the field (or even property) and metadata and call it for all of them, instead of using it to register callbacks based on the field type. It would be slower, but events are pretty fast once initialized, and performance isn't really a concern here. Just an idea; I didn't fully think this through. Advantages would be that multiple subscribers can possibly deal with fields of the same type, and they're not limited to acting on the type. Pathauto could do something about its weird flag, Scheduler could act on all the base fields it adds, and so on. On tagged services, I definitely don't feel strongly about that, especially now that we need far fewer of those and many are provided by default.
* In default_content, I specifically pushed a lot of the customization to the property level, because it allows field types to be handled more generically. default_content doesn't need any special handling for file, image or dynamic entity reference fields, for example, because they all use an EntityReference property. It doesn't always work (there are a bunch of issues about Layout Paragraphs, for example), but it does work nicely for those.
* Files: there are no changes to the file YAML files, but I assume you did verify by recreating them. However, if you delete the fixture folder you'll notice that the actual files won't be exported. This is currently missing, and implementing it isn't really compatible with the current stream wrapper approach. default_content handles this in \Drupal\default_content\ContentFileStorage::writeEntity. This, besides references, is why I recommend extracting the output part.
* A feature that was kept in the importer is the ability to have nested entities such as paragraphs (in ERR, this is called composite entities). It is clear that the decision to do this embedding would live in the ERR module. But for it to work, ERR needs an API to do the normalization into an array. There would be workarounds I suppose (with the current API, let it export to YAML into memory, then parse that again), but it's pretty awkward. That's one reason why in default_content, the normalization is a separate API/service.
* On UserInterface vs AccountInterface. The distinction is vague and I'm not sure we even should have it, but UserInterface is an entity. AccountInterface is a more abstract concept, it is in theory possible that there could be another entity type implementing that that isn't users, and this logic might not apply to that. That's why UserInterface for me is the correct interface. Not a big deal because it's rather theoretical.
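If the exporter returns a plain array instead of writing YAML itself, nested (composite) entities such as paragraphs become workable. A module can call the exporter recursively and embed the child's array, and only the command turns the final result into text. This is a Python sketch with some substitutions and assumptions. It uses json as a standard-library stand-in for YAML. The is_entity() check and the "entity" key used for embedding are assumptions, not a confirmed format.

```python
import json
from typing import Any, Dict


def is_entity(value: Any) -> bool:
    # Assumption for this sketch: embedded entities carry a uuid and fields.
    return isinstance(value, dict) and "uuid" in value and "fields" in value


def export(entity: Dict[str, Any]) -> Dict[str, Any]:
    """Return a plain nested dict; no serialization happens here."""
    default: Dict[str, Any] = {}
    for name, items in entity["fields"].items():
        # Composite children are exported recursively and embedded in place.
        default[name] = [
            {"entity": export(item)} if is_entity(item) else item
            for item in items
        ]
    return {"_meta": {"uuid": entity["uuid"]}, "default": default}


def export_command(entity: Dict[str, Any]) -> str:
    # Only the command turns the array into text (JSON here, YAML in the command).
    return json.dumps(export(entity), indent=2, sort_keys=True)
```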
Comment #45
phenaproxima
I experimented with having the event be the thing that carries a list of what to export and what to skip, and was very quickly convinced that it's better than the setting. Having this be something the event can decide is much more flexible, allows sane defaults, and can be easily overridden by modules that need to do something different. It also doesn't introduce a new setting with an ambiguous relationship to config. An added bonus is that doing this allowed me to remove the Exportable enum, further reducing API surface.
Overriding exportability for an individual property can still be done (and it is) -- you just need to write a custom export callback for it (the Path module's DefaultContentSubscriber is an example). I think this is a reasonable balance.
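An event that carries the export/skip decision might look roughly like this. The exporter seeds the skip list with sane defaults, such as serial keys and fields recomputed on save, and any subscriber can skip or re-include a field by name. This is a Python sketch. ExportSelectionEvent and its methods are hypothetical names, not core's PreExportEvent API.

```python
from typing import Iterable, List, Set


class ExportSelectionEvent:
    """Hypothetical event carrying which fields to export and which to skip."""

    def __init__(self, field_names: Iterable[str], default_skip: Set[str]) -> None:
        self.field_names: List[str] = list(field_names)
        # Sane defaults first; subscribers can override them either way.
        self.skipped: Set[str] = default_skip & set(self.field_names)

    def skip(self, name: str) -> None:
        self.skipped.add(name)

    def include(self, name: str) -> None:
        self.skipped.discard(name)

    def exportable(self) -> List[str]:
        return [name for name in self.field_names if name not in self.skipped]
```

The same mechanism would let a site drop the created timestamp, which comes up later in the thread, without touching the exporter.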
The fixtures were originally created with Default Content. :) There are some minor differences between them now and how Default Content generates them, but those differences are due to subtle shifts in how field values are exported. Functionally, they should not affect how content is imported.
I agree that the difference is largely academic, so here's an equally academic reason for keeping AccountInterface: this way, we aren't having a core subsystem depend on a module (even though it's a required one). A minor point of cleanliness, but a solid one.
Thought about this for a bit and decided it makes sense to support this. Exporter::export() will just return the array so that you can export recursively if needed; the final serialization can take place in the command.
Comment #46
mstrelan commented
I think it's possible we might be trying to access ->uuid() on null in the link event subscriber. Other than that, most of the other comments I made are nits.
Comment #47
phenaproxima
All outstanding feedback is resolved.
Comment #48
phenaproxima
Comment #49
phenaproxima
Crediting @mstrelan for his review.
Comment #50
berdir
I suspected I was too verbose with this. The important bit is in the second part of my paragraph; the first was just an intro, a preemptive reply to "I verified file entities by re-exporting them". I know they were created with default_content. My point is that what this is missing is the logic to export the actual files, not the content of the file entities. That needs special handling. Try deleting the whole folder and then re-exporting them, not just overwriting the existing files. And it requires hardcoding file entities in the Drush command or wherever the logic for dealing with the output will be. I'm fine with a follow-up for this, as it will require that we write the files directly to a folder, but I think it would be good to have those follow-ups ready for the next steps (such as references as well).
Comment #51
phenaproxima
Oh! Gotcha. We do have that follow-up; file export will be handled as part of adding dependency export capabilities.
Comment #52
mstrelan commented
I had some additional questions and suggestions for improving the docs; otherwise this is looking really good.
Comment #53
mstrelan commented
Thanks for addressing that feedback. I've only just now actually tested it, rather than just reading the code, and it works well. One thing that stands out to me that could be addressed in a follow-up is whether it makes sense to export the created timestamp. It would be a bit weird to have a node or user that was created before the site was created.
Comment #54
berdir
I didn't review every detail, but I don't have any objections anymore to this being RTBC.
On created, I have mixed thoughts. Default content can get old, but it can also look weird if it's all the same, especially on articles, which are sorted by date, and it can also introduce a random factor in tests. I'd keep it; it's easy to remove through an event or by hand.
What I'd appreciate is if someone creates issues for Pathauto and ERR, and maybe even tries to implement the provided events to replicate the current logic in default_content. That would also help verify that this is extensible enough.
Comment #55
phenaproxima
Done.
Comment #56
alexpott
Committed 80fd4ba and pushed to 11.x. Thanks!
We also need to open an issue to make an 11.3.x version and up of default content that uses all the core stuff.
Comment #59
gábor hojtsy