Problem/Motivation
Follow-up to #2725533: Add experimental content_moderation module.
When content_moderation creates new ContentModerationState entities, there should only ever be one of these entities to represent the state of a single revision of an item of content. We should enforce that at the data level, no item of content will ever have two corresponding content_moderation_state entities.
Proposed resolution
To solve this, we should add some unique keys to the content_moderation_state tables. To identify a unique entity, the following fields are used:
- content_entity_type_id
- content_entity_id
- content_entity_revision_id
- workflow
- langcode
Remaining tasks
Review.
User interface changes
None
API changes
None.
Data model changes
New unique keys on the content_moderation_state_field_data and content_moderation_state_field_revision tables.
Comments
Comment #2
jibranComment #4
sam152 commentedI think this is a case of an issue that really doesn't need a test. The UniqueField constraint is already tested, so this is overkill and could be tested by simply asserting the field definition has that constraint applied, but maybe it can be the start of tests pertaining to any other constraints that are introduced.
Comment #6
timmillwoodSeems to do the trick.
Comment #7
alexpottI don't think this is that simple :(
The constraint needs to enforce that content_entity_type_id, content_entity_id and content_entity_revision_id is a unique combination. So the constraint has to be on the entity type level.
See the Comment entity type for an example of an entity level constraint.
Comment #8
sam152 commentedReally good catch. Ran into a similar issue in the views integration ticket where moderation two entity types at the same time (which the tests are light on) caused a conflict.
Comment #9
sam152 commentedAttached is a fresh approach which would address this properly and resolve the @todo.
Comment #11
sam152 commentedCoding standards failing the bot is cool.
Comment #12
sam152 commentedMaking it a count query, because we don't actually need the results.
Comment #15
sam152 commentedWrong KTB.
Comment #16
timmillwoodDoes it matter if it has the same ID? isn't this validated somewhere else? don't we just care about the content_entity_type_id, content_entity_id, and content_entity_revision_id in this issue?
Nice! I hadn't considered the possibility of OR here.
Comment #17
sam152 commentedThe id check in the query ensures entities don't consider themselves duplicates when being updated.
Comment #18
timmillwoodah ha, yes!
Comment #19
alexpottI'm pondering whether we should do extra lookup - there's a cost associated with this. We could just change
\Drupal\content_moderation\ContentModerationStateStorageSchema::getEntitySchema()to be:The change is
indexestounique key.Regardless of whether we enforce via a constraint on not I definitely think we should enforce on the data level. Also interestingly we need to add the index to the revision table too - since atm we've added the index to the data table but we're not querying that in \Drupal\content_moderation\Plugin\Field\ModerationStateFieldItemList::getModerationState()...
So we should add:
Comment #20
catchThis should have an accessCheck(FALSE) even if we never expect anyone to implement access. But I also agree with @alexpott I'd rather see this at the data level if we can.
Comment #21
alexpottI also suspect that we have an issue with langcodes on the latest patch because in order to make my suggestions of unique keys work I needed to do:
Comment #22
alexpottAdd even that is not perfect because \Drupal\Tests\content_moderation\Kernel\ContentModerationStateTest::testMultilingualModeration() fails.
Comment #23
alexpottOh and lol! This is not coping with workflow - it is possible that an entity is in more that one workflow. Not recommended for sanity but we shouldn't prevent this on the data level.
Comment #24
sam152 commentedWait, so not enforcing this with the schema? Does that mean the primary feedback is the access check?Edit: Oh, I get it, lets not prevent multiple workflows on the data level, but still the other constraints.
Comment #25
sam152 commentedWhen the suggestion in #22 is implemented we get a revision table that looks like:
Under the hood we're creating a new revision and adding a language. I'm not totally sure, but I think this could be a valid state. The the revision ID is 2 for the french node, we would need a corresponding revision for the other languages, I think.
Comment #26
sam152 commentedComment #27
sam152 commentedOn the ways to make the translation test pass was to only create a new ContentModerationState revision if the entity under moderation was also a new revision. This passes that specific test, but we'll see what the testbot says about the rest of them.
The tests were quite handy to see how this behaved compared to the constraint. This is one of the test cases that is no longer valid by using the unique keys.
Previously I was validating that you could have no ContentModerationState entity that shared either an ID or revision ID, but now it's the combination of both.
Probably a fair trade off given the simplicity of this approach + perf win?
Comment #29
sam152 commentedTest fixes, phpcs and removing another left over test case from the constraint.
Comment #30
alexpottStill missing workflow as a key.
Neat. So this will result in the content and content_moderation_state revisions tracking each other. I like.
Comment #31
sam152 commentedGood catch re: adding the workflow key.
Comment #32
alexpottI think we should still have reference to this method since this lookup is the reason this key exists and has to be against the revision table too.
This needs to be the other way around. When we query we query by content_entity_type_id, content_entity_id, content_entity_id and workflow. SQL is optimised to use the left hand side of indexes. Load all of the languages at this point. But we need langcode to be part of the index because of uniqueness.
violation is constraint language... we should probably update this.
Comment #33
sam152 commentedTIL re: indexes. Working with the database schema directly is a bit of a blind spot for me, so appreciate the feedback. Curse the golden handcuffs of abstraction! :)
Comment #34
alexpottWell we are really testing that the mapping to from content entity to content state entity is unique per workflow.
More validation... - sorry should have noticed before.
Comment #35
alexpottAlso I think given the performance implications of not having an index on the revisions table we're in major bug territory. On a large site there are likely to be thousands of content moderation state entities so lookup by the related entity needs to be quick.
Comment #36
sam152 commentedYeah, maybe the test should be named after the class it's testing. Removes artefacts from the constraint.
Comment #37
timmillwoodIt doesn't look like the test covers langcode. Other than that looks good.
Comment #38
sam152 commentedGood catch.
Comment #39
timmillwoodNice one!
Comment #40
wim leersAs a maintainer of the REST module and fixing lots and lots of Entity + REST support bugs lately, I've seen my share of entity validation constraints. Which is why I took a look at this issue.
So this issue fixes this todo.
Except this is the only constraint-related changes. We're setting a max length. And we're changing the storage schema to have a uniqueness requirement.
But why not have an actual entity validation constraint?
Like for example
\Drupal\aggregator\Plugin\Validation\Constraint\FeedUrlConstraint?This proves what I thought the problem was: we're relying on storage exceptions rather than doing actual uniqueness constraint checking.
Apparently the patch in #15 did this. But @alexpott wrote this in #19:
I don't disagree with that. It's fine to also enforce this on the data level.
But leaving this entirely to the storage level like this is doing right now means that you'll only ever get "storage exception" as feedback, rather than explicit validation errors. That's going too far.
If #2843753: Prevent ContentModerationState from being exposed by REST's EntityResource would already have been done, then I could easily add the extra REST test coverage to show that this would result in very unhelpful error responses via REST. If you'd have a validation constraint, then you'd get very helpful error responses.
Comment #41
dawehnerI agree, these exceptions don't provide any additional semantics while entity validations do.
Comment #42
sam152 commentedI'm happy to roll the constraint into the patch as well, it's already written. As pointed out it just add some extra cost to saving the field data, but justified if there's a good reason (like better REST support).
Comment #43
wim leersYay!
Comment #44
sam152 commentedAdded the entity validations back into the patch.
Comment #45
sam152 commentedComment #46
wim leersIdeally, #2843753: Prevent ContentModerationState from being exposed by REST's EntityResource would already have been fixed, then this issue could be expanding the test coverage introduced in that issue. But it's not reasonable to block this on that , so … back to RTBC per #40 :)
Comment #47
wim leersComment #48
dawehnerAt least for me the primary test coverage for entity constrains should not be on the REST level but rather on the pure entity api level,
because when you put it onto the rest level you are one level too high in the abstraction chain and will forget cases of the validation.
Comment #49
wim leers#48: absolutely!
Comment #50
alexpottI think we some rules about validators and storage exceptions. Things like this have a real cost. If you are bulking importing content entities under moderation this causes an extra query per validation. This especially is a concern for content moderation where we aim to be installable on a site with thousands of existing nodes. At some point a batch process is going to have to run and place all the existing content under moderation.
I'm not sure what the right answer is here.
Comment #51
alexpottOne reason we might consider the validation is OTT is that the entity here is normally created as a result of another entity being created. However that is going to skip validation.
Comment #52
sam152 commentedThe question of whether this entity type should be interacted with directly, not via the computed field has also come up recently in #2860889: ContentModerationStateInterface additions.
Comment #53
wim leers#51: what does "OTT" mean?
When doing bulk importing, the code doing that could just skip validation (assuming it was very thoroughly tested).
IOW:
I think that strikes the desired balance?
Comment #54
catchOTT = Over The Top.
I'm not sure on the validation + exception vs. exception-only question - part of me thinks we should commit the exception and open a follow-up for the validation.
Comment #55
wim leersHistory suggests that follow-up is unlikely to be fixed.
Comment #56
wim leersComment #57
timmillwood+1 to #54.
Re: #55 - Maybe it doesn't get fixed for a reason? like it's not needed or there isn't a good solution.
Half of this patch is better than no patch at all.
Comment #58
wim leersSuggesting this is a possible explanation is dishonest. Obviously there is a need: REST/JSON API/GraphQL/RELAXed Web Services need this.
There is a green patch that supports validation.
Hence the ambivalence in @alexpott's comment in #50.
Comment #59
alexpott@Wim Leers - but does REST/JSON API/GraphQL need this? I'm not completely sure that anything should be interacting with these entities directly.
Comment #60
wim leersI don't know. Why should anybody be interacting with
FilterFormatconfig entities orItemcontent entities directly?My intent is not to hold up this issue.
My intent is to force us to think about the consequences. To consider the integration with the rest of the system. Rather than just postponing that to a follow-up. If that follow-up must be resolved before
content_moderationcan become stable, then sure. But otherwise we're just creating more technical debt, because we have not actually solved the integration question.A contrib module that gets moved into core must integrate with all relevant other core modules. If we don't, we just increase core's technical debt. (And in this case, also the amount of work for the API-first initiative, yes.)
Updating issue title to reflect current scope.
Comment #61
alexpott@Wim Leers but the only reason these are separate from the content entities themselves is to make the module installable/uninstallable they have no meaning outside the context of of the related content entity for which it is being saved. Sometimes less surface area is a good thing - right?
See the hoops we're jumping through in \Drupal\content_moderation\Entity\ContentModerationState::save() and \Drupal\content_moderation\Entity\ContentModerationState::realSave()... we don't even save it without saving the related content entity!
Comment #62
wim leers#61 makes it sound like there is never a reason to create these
ContentModerationStateentities via a HTTP API (REST, GraphQL or otherwise) or via code. If that's the case: okay, then there's no need for validation. But then we somehow need to be able to indicate that entities of this type can never be created/updated/deleted/viewed. Which we can do, by implementing an access control handler that forbids access for all operations.Comment #63
timmillwoodThis issue is still RTBC, which kinda means the community agrees on #44 being the way forward, although @alexpott makes a good point about the overhead of entity query in a validation.
Therefore, could we get the patch from #38 committed, then open a follow up to discuss, work on the interdiff in #44?
Comment #64
sam152 commentedI think what everyone agrees on right now is that the ContentModerationState entity is a kind of internal implementation detail that we don't really want expose to users that heavily. For that reason, I think the access control handler is a really good idea.
I've opened #2873287: Dispatch events for changing content moderation states in order to try to squash another use case which brings these entities into the foreground, with the view that we could almost mark the whole thing as @internal.
Opened #2873291: Create an access control handler to deny CRUD access to the ContentModerationState entity for this.
Comment #65
wim leersI think #2873291: Create an access control handler to deny CRUD access to the ContentModerationState entity should be done in the scope of this issue, if we indeed think that this entity type is internal. I'll provide the implementation:
Then at least this issue/patch will leave us in a consistent state, rather than an inconsistent state.
Comment #66
sam152 commentedI like it, but keen to hear what @alexpott thinks.
Comment #67
timmillwoodWhy wouldn't we just do this in the follow up?
but as the code is written, lets just get it in.
Comment #68
alexpott@timmillwood because we don't want to add the validator to remove it. @Wim Leers's access control handler idea is great. Let's do it.
Comment #69
wim leersUpdating issue title.
Comment #70
sam152 commentedAdded the code from #65 and a quick test. Updated the issue summary as well.
Comment #71
wim leersComment #72
alexpottI think this should have a comment as to why we are forbidding all access.
Comment #73
timmillwoodAdded comment.
Comment #74
jibranThanks! @timmillwood.
Comment #76
tacituseu commentedUnrelated failure.
Comment #77
alexpottLet's be a bit more helpful.
Discussed this issue with @dawehner and he remarked that we should probably mark the entity class as internal. Created #2876419: Review content_moderation module and mark code with @internal where necessary
Comment #78
alexpottCommitted and pushed 8bedf88 to 8.4.x and b9d6aa5 to 8.3.x. Thanks!
Backported to 8.3.x because content_moderation is experimental.
Credited @catch, @Wim Leers, @dawehner and myself for review comments that influenced the direction of the patch.
Comment #81
sam152 commentedNice one! Thanks for the all the guidance on this one @alexpott and @Wim Leers.
Comment #82
wim leersYay! Sorry for insisting, but I am convinced this will help our collective future sanity :)
Comment #84
sam152 commentedThis needs to be added to the upgrade path because of the change to max_length.