Problem/Motivation
Right now there is nothing that tracks what the latest revision ID is for an entity. You can figure this out by querying for and joining the revision table against max(revision_id), but this is far from ideal and doesn't translate into a very useable solution for other subsystems such as content_moderation, views and entity queries.
With the workflow initiative in full swing, the latest revision is becoming a more important concept and as such entity API should become more aware of it.
Proposed resolution
- Add the latest revision ID to the entity data table to enable further work in this area for other subsystems.
- Ensure latest_revision is tracked on the data table to possibly support different languages having different latest revisions in the future.
Follow ups:
#2864995: Allow entity query to query the latest revision
#2865579: Rewrite the 'Latest revision' views filter and remove the revision_tracker table
Remaining tasks
Agree, patch, test, etc.
User interface changes
None.
API changes
Additions only.
Data model changes
Additions only, an additional column on entity tables for revisionable entity types.
| Comment | File | Size | Author |
|---|---|---|---|
| #140 | 2784201-140.patch | 5.91 KB | yakoub |
| #138 | 2784201-138.patch | 15.35 KB | yakoub |
| #116 | 2784201-116.patch | 10.63 KB | sam152 |
| #116 | interdiff.txt | 1.43 KB | sam152 |
| #103 | 2784201-103.patch | 10.68 KB | sam152 |
Comments
Comment #2
joachim commentedAt the moment, content moderation tracks the latest revision of an entity in the 'content_revision_tracker' table.
However:
- this can't be used by entity queries
- the table isn't defined in hook_schema(), but created on the fly by the content_moderation.revision_tracker service when first needed, so queries in other code can't really rely on it.
Comment #3
dawehnerIdeally this would be a feature of the entity system.
Comment #4
joachim commentedUse case: a scheduling module needs to send an email for any node whose fields have a certain value.
If a node has never been published, then an entity query using current revision works fine. If a node has been published, and then further drafts created, then the most recent draft cannot be queried for.
Comment #5
timmillwoodIn content moderation module the following method is used to get the latest revision:
It's not pretty, but works. Maybe we need a new
latestRevision()entity query method?Comment #6
joachim commentedgetLatestRevisionId() works, but only when you are concerned with one single entity. It can't work when you want to query across a set of entities.
The only way to do this that I see is having the revision ID of the latest revision somewhere in the database. At the moment, it's on the 'content_revision_tracker' table, but EQ isn't aware of that.
Comment #7
timmillwoodIs this too messy?
Full node revision table:
Latest revisions:
Comment #8
timmillwoodLooking into the feasibility of a query like the one in #7. I think the biggest sticking point would be the
innerJoin()method not accepting another query.Comment #9
joachim commentedThere's a more efficient way to do this in a query: see http://stackoverflow.com/a/23285814/187581
I tried to use the self-join technique in the View representative relationship handler years ago, but never managed to get it working once there were other joins involved.
Either way though, it's not going to scale well...
Comment #10
sam152 commentedProducing a query to get the right revision is one thing, but keep in mind this query will need to be represented in a views data definition as well. I'm not too savvy with relationship plugins, maybe that's an option? In any case as pointed out, maintaining an index is both fast and conceptually easy to translate into views.
As for storage, maybe this could live on the base table and become a generic feature of entity API instead of content_moderation?
Comment #12
bkosborneLooks like the current option is to just construct a manual query and bypass the entity query service?
I also have a need for this. Using Workbench Moderation, I have a moderation state "delayed publish". I want to find all nodes who's latest revision indicates that states, and then publish them on a specific date.
Comment #13
sam152 commentedWhen using Workbench Moderation, you have access to a table which keeps an index of this information for you. This issue is an effort to try and make that index unnecessary, so your use case should be quite straightforward.
Comment #14
bkosborneYes, I can use that table, but I can't use an entity query as discussed above. I was just throwing in my two cents about a use case for this.
For anyone looking for a manual query you can run to get a list of the latest revision IDs when using Workbench / Content moderation:
Comment #15
sam152 commentedAh, my misunderstanding. Makes sense.
Comment #16
benjy commentedUse could also use an EQ and then the ModerationInformation service.
I should probably point out, loading all entities like this could get slow on bigger sites.
Comment #17
timmillwoodIf we can find a way to get the latest revision using Entity Query and Views it means we can remove the Content Moderation revision_tracker table.
Comment #18
sam152 commentedI've made a start on this. I think it would be best to approach this as a generic entity API feature and start to introduce usage of the API into other modules as needed.
This is an early concept. Further work required is:
Comment #20
sam152 commentedFix some tests.
Comment #22
sam152 commentedSeeing what breaks apart from the pending entity definition updates.
Comment #24
sam152 commentedComment #26
dawehnerI'd use something like
$entity_types_with_revisionshere, and then$entity_typewhen you iterate over them.Comment #27
timmillwoodFrom my experience with @catch adding new fields and new queries on save rings alarm bells.
Also I think we need a multilingual test. Content translations + revisions seems to be biting us a bit, so want to make it obvious it works, and make sure a regression won't cause issues.
Comment #28
sam152 commentedStill very much a WIP, but adding an upgrade path test.
#26.1, done.
#27.1, yeah will be good to interrogate the specifics of this. It's modelled on how the base table gets the revision ID updated, but as you pointed out on IRC, this might be able to happen in one query if this ID is on the base table.
#27.2 Good idea, will more test cases.
I think it makes sense to do this in stages, with the ID being added as a first pass then the follow ups such as removing the CM tracker table and adding entity query capabilities as follow ups. Reflecting that in this issue and creating a follow up for what this issue was intended to be.
Comment #29
timmillwoodAs I work on #2860097: Ensure that content translations can be moderated independently I think the latest_revision field might be a better or complementary solution to the default_revision field.
Comment #31
sam152 commentedComment #32
sam152 commentedComment #33
sam152 commentedCreated #2864995: Allow entity query to query the latest revision as a follow up.
Comment #34
sam152 commentedComment #35
sam152 commentedSome more testing and some core test fixes. Batching the upgrade path up next.
Comment #38
sam152 commentedUploading the patch used to generate the database dump. I've been using that and:
drush si minimal && ./core/scripts/db-tools.php dump-database-d8-mysql | gzip > latest-revision-upgrade-path-dump.php.gzComment #39
timmillwoodUsing the patch from #35 with content moderation seems to work well.
I need to do some more tests, but I have a feeling this won't work how I want it to for multilingual revisions. Will get back to you on that.
Comment #40
sam152 commentedGlad it seems to work with CM. Next step on that front will be to see what else breaks once the tracker table is removed.
Re: multilingual revisions, happy to work on it as a follow up if we need to detach multilingual revisions from eachother (if that's the plan), but no idea what else would break if we were to do that.
Replacing the DB dump with a few more entity types, with some multilingual test content. Once the upgrade path passes and is batched I'll look at extracting the parts from the DB dump that are meaningful to the test and applying them on top of drupal-8.2.0.bare.standard_with_entity_test_revlog_enabled, if that suits our purposes.
Comment #42
sam152 commentedI couldn't find a way to get a dump on a stable version of drupal (to match the existing dump) with entity_test from 8.4.x installed, given entity_test doesn't have an upgrade path and there are features in entity_test that 8.2.0 doesn't have. Might be worth creating an 8.4.x dump with entity_test installed, but it might need to be updated as HEAD changes. For now just going to roll with the dedicated dump. The other option would be to use node for this instead, given you can use the 8.0 dump and updb all the way to HEAD.
Fixes tests, batching still outstanding.
Comment #43
timmillwoodI've been looking at per language latest revisions. It's pretty cool, I then setup Content Moderation to use this to determine the "latest revision" and an edit is always done on the latest revision. However this causes issues when the latest revision for language A is a lower revision ID than the latest revision for language B. This is because \Drupal\Core\Entity\Plugin\Validation\Constraint\EntityChangedConstraintValidator looks at the changes across all translations because there might be untranslatable shared fields.
I need to give some more thought about if or how this might work.
Comment #45
sam152 commentedFound a way to build on top of the existing dumps if I tested with Node and BlockContent. Makes the added dump only stuff that is relevant to testing this feature. Leaving uncompressed for reviewability.
Also added batching to the update hook.
Comment #46
sam152 commentedAdding script used to prepare the DB fixture in case it's needed later.
Comment #47
sam152 commentedComment #48
sam152 commentedI think this is ready for some feedback. All @todos resolved.
Comment #49
sam152 commentedAnother nice side effect of this would be being able to do #2865616: Add an option for EntityConverter to load the latest entity revision and fix all entity forms to use this option..
Comment #50
timmillwoodI'm still unsure about to handle the languages. I think for now we need the latest_revision to be the same for every language, because otherwise things blow up. Therefore should this be in the base table rather than the data table?
Does latest_revision need to be an entity_key?
Comment #51
dawehnerWell, isn't that what revision_translation_affected is for on top? Ideally we would have a latest_affected_revision on top.
Comment #52
sam152 commentedSeeing what testbot says to having this on the base table.
Comment #54
sam152 commentedHaving thought about this for a while, I think keeping this on the data table allows us to solve decoupling revisions for different languages using this as a basis. Need to find some time to articulate my thoughts around it, in the meantime reuploading #45 for clarity.
Comment #55
sam152 commentedI think we can actually turn "latest_revision" into what you're describing with "latest_affected_revision". Given how this behaves, I don't think latest_revision is ever actually useful if doesn't really represent the revision that becomes the basis of the next revision.
Comment #56
sam152 commentedBased on this new understanding I think the best course of action is:
Either that or we can keep the existing latest_revision field and add a new one as suggested.
Comment #57
sam152 commentedAs per the hangout #56 is going ahead but field name is going to be more explicit to accurately communicate what is going on with the languages. That plus some more tests to assert how it behaves with multiple languages is going to be required.
Comment #58
sam152 commentedAdded support for latest_revision_translation_affected. I realised we'll need to keep latest_revision if we want to get rid of the tracker table for all entity types, so we have both.
I left a @todo in there to move revision_translation_affected to content entity base. The original issue (#2453153: Node revisions cannot be reverted per translation) says:
Not sure why we wouldn't do that for them if it's part of a core interface.
Otherwise, patch is working well. Just have to make it green, update the upgrade path + test. Getting on to that shortly.
Comment #59
sam152 commentedComment #60
sam152 commentedComment #62
hchonovFrom the architecture point of view I don't understand how good it is to have columns for the latest revision and the latest revision translation affected flag in the tables of the default revision. These are two completely different concepts - default and forward/draft revisions and mixing them could create a lot of troubles.
This means each time a forward revision is saved then a resave of the default revision has to be triggered or like the current approach a db update statement has to be executed, but in this case you have to reset both the persistent and the static entity cache for that entity - I didn't see this happening in the patch.
Also what happens if user B saves a new forward revision while user A is working on the default revision and saves it without creating a new revision? Will user A reset the update made on the data table because of having loaded the entity before the new forward revision has been saved by user B?
Comment #63
sam152 commentedGood feedback re: static caches, will need to add these to the test cases.
I don't think there is a way to work on the default revision in core if there is a forward revision?
Regarding placement of these columns, they are on the data table so there can be a different value for each language. In my opinion, that's going to be key in fixing the critical issues around data loss, revisions and content moderation. Running the query to update the data table uses a similar pattern to updating the default revision in the base table:
Comment #64
hchonovUpdating the default revision in the base table is something completely different, because the base table is intended to keep track of the default revision. The data table contains information only about the default revision.
So if you want to properly separate the concepts from one another then you'll have to keep track of the last draft revision in the base table and put the drafts in a new table e.g. drafts_data.
Tracking the last draft revision in the data table doesn't make sense because what happens if there are two different users working on two separate drafts? To which one of them will the data table point? I don't know if content moderation allows this currently, but even if not then it might be supported later and then we'll have to deal with this again on the storage level. I think it is a completely realistic use case having two users working on two separate drafts.
This also adds two new base fields and the complexity for tracking what you need in order to spare yourself only a single entity query? I am not convinced that it is worth it, at least not based on the current issue summary.
If however you are still convinced this is the way to go for content moderation then you could implement this tracking outside of the storage.
Comment #65
timmillwoodI'm not sure it's possible for two users to work on two separate drafts? or ever will be, until we introduce Workspaces.
Comment #66
sam152 commentedI've put some thoughts against a few of the points you've raised, all valid things to discuss. I think the issue is going to be part of the puzzle for solving an associated critical issue, so going to continue on with it in the hopes we can agree on an implementation of this.
Not sure I follow, this isn't possible anywhere in core or CM right now? Also, not sure how this patch also would prohibit such a feature, given I have no idea what implementing this would look like.
This sounds way out of scope for what we're trying to do, forward revisions are stored in the revision table.
Complexity wise, I think it's far simpler than the alternative content_moderation is shoehorning into the process right now.
I also feel like being able to point to a canonically tracked latest revision is going to be pretty essential for the workflow initiative. First of all the query to do this is really ugly and doesn't work in the context of any other drupal query related concepts, views, entity query (as raised in the original issue). Right now the query also assumes the highest revision ID is the latest, which is an assumption I think will be challenged. Performance is also a major consideration, hence work-arounds in CM to address this.
We already have a seperate revision tracker table for this, which is what we're aiming to fold in as a generic entity api feature, given forward revisions are going to be a concept that needs more support in core in general as the WI continues.
Comment #67
sam152 commentedAdded another follow up #2866972: Provide content_moderation agnostic support for views and forward revisions.. I feel like there are enough wins in bringing this to core to put some effort behind it, but perhaps getting some validation from the entity api maintainers at this stage.
Comment #68
hchonovIf there is already a solution for this then why do you want to make it a generic entity API feature? Could you point some uses cases beside the ones from content moderation from which the need of generalization arises?
By adding all this complexity only content moderation will gain profit and there are no other use cases that I could think of, at least not at the moment.
Do I understand it correctly, that multiple drafts will never be supported and feature requests regarding this will be rejected?
Comment #69
timmillwood@hchonov - I think multiple drafts based off the same parent revision will be supported, and workspace module will provide the UI for doing this, just like Content Moderation provides the UI for creating draft revisions.
Comment #70
sam152 commentedBy adding all this complexity only content moderation will gain profit and there are no other use cases that I could think of, at least not at the moment.What is specifically jumping out as complex, anything I can address?
content_moderation is the only core module currently dealing with the domain of forward revisions, but there are more on the way. Forward revisions are a feature of the API, so I can see these feature additions being useful to anyone using the API, core, content_moderation or otherwise.
Comment #71
hchonovIf you consider big nested entity forms e.g. 100 entities with 10 translations, then it will result on save in the initialization of 100x10x2=2000 field item list object instances + 2000 field item object instances being initialized and checked for changes. And this is going to happen even if you are not making use of forward revisions.
I just don't see the point of having the default revision keeping track of the latest forward revision and doing this so invasive in the data table. It just gets complex when you have multiple drafts by multiple users - in this case you don't have only one last draft revision, but multiple.
Comment #72
timmillwoodI'm not sure we can fully commit to a direction on this issue until we decide on a direction for #2860097: Ensure that content translations can be moderated independently and have a proof of concept in place.
If this issue doesn't end up giving anything to the solution for #2860097: Ensure that content translations can be moderated independently then I'm not sure we need two fields. Also as latest_revision is always the same for every language and translation I think it should be fine to be on the base table, and updated in the same query as revision_id.
Comment #73
sam152 commentedAgree, lets keep an eye on the other few revision related issues until we seriously start looking at this. In the current iteration, latest_revision can definitely go on the base table.
Comment #74
sam152 commentedCompleted upgrade path for the translation affected field, so there is a green patch to test against if needed. Back into experimental territory with the added @todos but if it turns out the latest_revision_translation_affected field isn't needed, it wont matter.
Comment #75
sam152 commentedComment #76
sam152 commentedHelps to upload the right patch.
Comment #77
sam152 commentedComment #78
catch@hchonov
This concerns me as well about the current approach. Having the information is useful, but having to manipulate the base table and caches like this points to it not being the right place for it. If we add support for multiple drafts, we'd have to work against the schema added here.
I'm still confused in issues like #2860097: Ensure that content translations can be moderated independently about exactly what our desired behaviour is - i.e. questions like 'should reverting a translation revert the shared/non-translatable fields or just the translation?' and similar. Do we maybe need a new issue that's just documenting scenarios and use cases?
Comment #79
hchonov@catch
Exactly. This is why I prefer that we stick to the approach of having a separate storage for those needs e.g. the current service \Drupal\content_moderation\RevisionTracker.
I think @mkalkbrenner and @plach both have discussed this topic a lot. My personal preference is that when non-translatable fields are present and a translation is being reverted to a previous revision then the user should be made aware and the decision should be made by the user for each non-translatable field. I think this would be pretty easy solvable as soon as we have conflict management :). But yes, we could open a new issue and discuss this again.
Comment #80
timmillwoodI think ultimately we need to decouple translated revisions from each other, but this is a big undertaking. Especially has we need to maintain an upgrade path even in to D9.
Comment #81
hchonov@timmillwood
I am totally for this! However this is not going to be easy and we'll have to deal with different concepts and use cases in D8 when having both one revision ID for all translations and one revision ID per translation. I think this is going to be a lot of overhead that it makes me even think if could we do it in D8?
Comment #82
timmillwood@hchonov - If we do it, it'll have to be in D8, so we can then deprecate the old way and remove it in D9.
Comment #83
sam152 commentedMy impression is a tighter integration with entity api would have allowed for better integration of forward revisions with the rest of core. Tracking this in core allows us to move all of the generic entity api features introduced by content_moderation for forward revisions into core, for the benefit of future modules. The seperate tracker table seems like a bolted on approach to me, but I guess core could maintain this?
Comment #84
sam152 commentedAdding another related issue. The latest_version route is another feature CM has to add and support. We could bring this core as another generic feature of forward revisions/entity api if there was the appropriate tracking for it.
Comment #85
timmillwoodAfter talking to @hchonov about #2766957: Forward revisions + translation UI can result in forked draft revisions it brought me back to this issue where we were also looking at ways of storing more data about a revision.
We are now considering a "revision_type" field which will store if a revision is a default or a draft revision. This will not, however, provide us with a quick way to get the latest revisions. As stated in #6 we are looking for a way to get the latest revisions across a series of entities. For example, a View of the latest revisions of all nodes.
One idea @hchonov and I discussed was adding a new table, the 'revision_draft_table' maybe. This would give us a table such as 'node_draft' which stores data similar to the node table, but for draft revisions rather than default revisions.
Comment #86
catchWould an is_default column in the revision table be enough for some of these use-cases? The advantage of that is the default-ness of a revision should never change.
Comment #87
sam152 commentedRe #85, kind of like a seperate base table for the latest revision instead of adding the latest revision ID to the existing base table? Sounds like a viable option.
Re #86, I don't think this would solve this problem entirely. You can have multiple default/non-default revisions for a single entity, so getting the canonical latest revision one would still be tricky under this model.
Comment #88
hchonovWe need the revision_type column in both the entity and entity revision table. In the entity table we need it to know which is the latest draft revision which might be an older revision than the current default revision. In the entity revision table we need it for having a proper revision history. Having said this we will have to refer to draft revisions only with the term draft revisions instead also with the term forward revisions as draft revisions will not anymore be necessary only forward revisions, actually this is the case currently as well regarding draft revisions behind the default revision.
I think we will need additionally one more column previous_revision/base_revision/origin_revision in the entity revision table because at the point when we merge a draft into the current default revision we have to estimate what are the actual changes in the draft otherwise we might revert changes made to the default revision while working on the draft. Consider the following use case : draft revision 8 originates from the default revision 7 and meanwhile a default revision 9 is created. The translatable title field of the entity remains the same between revision 7 and 8 but is changed in revision 9. Now when we want to merge the draft revision 8 with the default revision 9 we need a way to check if the title field has been changed in the draft or in the default revision in order to decide the value of which revision we'll take for the new merge revision. @timmillwood told me that currently it is impossible through the FAPI to create a new default revision from the latest default revision if there is an existing draft because in this case on entity edit only the draft revision will be loaded. However there are two other cases because of which we have to support this : 1.) An entity might be changed through the entity API e.g. through some update or 2.) having multiple translations and having the same case with the title field but in this case it being not translatable which allows that a default revision is being altered even if there is an existing draft revision.
Comment #89
hchonovIgnore this comment as it was a duplicate of the previous one.
Comment #90
timmillwoodThe 'is_default' (suggested by Catch) and 'revision_type' (suggested by @hchonov) look to achieve the same thing, which I think it what we need for #2766957: Forward revisions + translation UI can result in forked draft revisions, although I'm not sure it helps with this issue. Unless we do what I think @hchonov is suggesting and list both the latest and default revision in the entity tale.
The previous_revision/base_revision/origin_revision field @hchonov suggests is something we actually do in the Multiversion contrib module with a parent_revision field, and something we're looking to introduce into core at some point to assist with Workspaces.
Comment #91
hchonovFor the current use case they both will achieve the same thing, yes! The only difference is that having the column "revision_type" gives contrib and custom the possibility of creating revisions which are neither default nor draft revisions.
Like I've mentioned in my previous post:
But yes I would put the latest draft revision in the current entity table or create a new entity_draft table only for draft revisions, but similar to the entity table. I think one table for all revision types should do fine or do you have anything against this?
Comment #93
amateescu commented@joachim, thanks for the great suggestion in #9, I've used it successfully for both views and entity query in #2865579: Rewrite the 'Latest revision' views filter and remove the revision_tracker table and #2864995: Allow entity query to query the latest revision.
Comment #94
sam152 commentedIs views and queries aren't a case for altering the way forward revisions are tracked and stored, maybe the rest of the follow-ups can be addressed without any underlying changes.
Add some methods along the lines of:
isLatestRevisionloadLatestRevisionMight help with issues like:
#2865616: Add an option for EntityConverter to load the latest entity revision and fix all entity forms to use this option.
#2879377: Provide a latest-version route in core for content entities with forward revisions.
Comment #95
sam152 commentedQuick patch to explore #94. Code is lifted from #2903524: Don't add quickedit to displays where entities have a forward revision. which would also benefit from this.
Comment #96
samuel.mortensonUpdated #95 to not throw the "Only variables should be passed by reference" notice in PHP < 7.
Comment #97
amateescu commentedThe
isLatestRevision()case can also be accomplished with something like this.The EXPLAIN of the new query is:
While the current query in HEAD is like this:
I'm not sure of the performance implications of the addition SUBQUERY but I thought it was something interesting to try out.
Comment #99
joachim commented> I'm not sure of the performance implications of the addition SUBQUERY but I thought it was something interesting to try out.
More expensive than the self-join technique I mentioned in #9 -- much more! For *each row* of your outer query, you run an additional query.
Could we get this issue back on track to its stated solution, which is:
> Add the latest revision ID to the entity data table to enable further work in this area for other subsystems.
This is not doable with just queries in a way that scales.
Comment #100
sam152 commented@plach had some ideas around how this might work, might be worth pinging to see if he has capacity to review the issue.
Another issue which could address this problem space would be:
These lookups could just as easily live in a service somewhere instead of
RevisionableInterface.Comment #101
dawehnerComing from #2865616: Add an option for EntityConverter to load the latest entity revision and fix all entity forms to use this option. I was suggesting to add at least a method to
EntityRepositoryto fetch the latest revision of an entity. One could argue that it might be also a good idea to add it to the storage directly, but its something you don't need directly on there.This doesn't solve the usecase of views though.
Comment #102
sam152 commentedOne thing we encountered in that issue as well was wanting to load the latest revision without having first loaded the default revision, so I think I'm liking the idea of putting some of these methods on
EntityRepository. We could also add anisLatestRevisionto the entities themselves, which use the entity repository to lookup the latest revision ID, then optimise it later if we can come up with an approach that is faster.Comment #103
sam152 commentedPulling in @amateescu's code from #2918569: Simplify ModerationInformation::getLatestRevisionId().
This approach adds a new method to EntityRepository and one to RevisionableInterface. I believe this satisfies the use-cases in most of the issues out there related to pending revisions.
I think the important thing is to agree on where these things should live, with the view that we should be able to optimise them further in the future if we gain that capability. The other thing is the nuances around how non-revisionable and new entities should be treated in the context of
isLatestRevision. The test cases show the current behavior is to default toTRUEfor both cases.I think this approach is far away from the original IS that it might be a new issue, otherwise if we're confident moving the latest revision ID into the storage tables is the wrong approach (which seems like a closed case), then we can rejig the IS for this issue.
Comment #104
timmillwoodWhy do we need this?
Comment #105
sam152 commentedIn #2337191: Split up EntityManager into many services, methods were moved from
EntityManagerinto a bunch of different services. As a resultEntityManagernow implementsEntityRepositoryInterface, so I'm not sure how we'd add something to the repository without creating the same deprecated function inEntityManager.Comment #106
timmillwoodAh yes, that kinda sucks. The workarounds would be to add a new interface or not add the method to an interface, both bad ideas.
Comment #107
joachim commented> otherwise if we're confident moving the latest revision ID into the storage tables is the wrong approach (which seems like a closed case),
Why are we saying that it's the wrong approach? I still think it's the right approach! Anything else isn't scalable for Views and queries in general.
Comment #108
phenaproximaNit: $storage is never reused, so there's o need to keep it in its own variable.
Why not simply reuse $definition here?
What happens if there is no result?
Comment #109
phenaproximaAlthough I think that storing the latest revision ID in a table is hacky and potentially brittle, I have no choice but to agree with this sentiment. Without a table upon which to join, I don't see how Views can possibly create a relationship to the latest revision. Which I can see being useful for all kinds of things.
Comment #110
amateescu commented@phenaproxima and @joachim, Views already provides a generic "latest revision" filter since #2865579: Rewrite the 'Latest revision' views filter and remove the revision_tracker table, is that not enough?
Comment #111
phenaproxima@amateescu, it is not. Because what if we want to show fields from the latest revision? We would have to create a relationship to it. With this architecture, I don't see how we will be able to do that.
Comment #112
amateescu commentedI'm sorry but I don't understand what you mean. When you use the 'latest revision' filter, all the fields displayed by the view will be those of the latest revision.
Comment #113
phenaproximaBut what if I want to show some fields from the default revision, and some fields from the latest, in the same view? As far as I know, there is no way to do this without creating a relationship.
Comment #114
amateescu commentedCan't you create a regular relationship to the revision tables and make the 'latest revision' filter apply only to that relationship?
Comment #115
phenaproximaI'll try that! :)
Comment #116
sam152 commentedThat's how the default revision ID is tracked, so in my mind it was simply giving latest revisions the same treatment and making them more of a first-class citizen, however this implementation hasn't been a blocker for any of the proposed follow-ups, so I'm inclined to agree that it wasn't necessary. For our use cases now, the latest revision is quite special but I suppose this could be less true in the future and leaving the storage tables alone is probably preferable unless there are some very compelling reasons to change them.
Feedback from #108:
1. Fixed.
2. Fixed, good catch.
3. NULL is returned, as covered in
EntityRepositoryTest::testNoMatchingEntity.Comment #117
joachim commented> Can't you create a regular relationship to the revision tables and make the 'latest revision' filter apply only to that relationship?
What 'latest revision' filter? There is no filter for that AFAICT, because there's no direct data for that in the database.
> That's how the default revision ID is tracked, so in my mind it was simply giving latest revisions the same treatment and making them more of a first-class citizen
Exactly. I don't see how it's brittle. The API takes care of setting it when you save the entity, just as it takes care of handling the default revision. If you write to the entity tables bypassing the API then you've already invalidated your warranty.
So can we *please* stop posting this patch that merely adds some methods and doesn't fix this issue in the way described by the title and the summary?
Comment #118
sam152 commentedI did propose a new issue, so this could continue to be explored. Perhaps we should split it out? In fairness the current IS evolved from something else.
I also think we should agree on the interfaces used for accessing/loading/checking for the latest revisions, even if the implementation for tracking them evolves. In that sense, it'd be good to scrutinise #116 a bit more. It'd be an important/quite visible part of the public API, so it's a touch more important than some mere methods imo :).
Doing so unblocks a bunch of other improvements in core, so an implementation in now is valuable imo. Especially given changes of this nature are likely to take a long time to work out/build consensus for.
@amateescu had some wizardry for that in #2865579: Rewrite the 'Latest revision' views filter and remove the revision_tracker table. It is available when the base table for the view is the revisions table. Ie, you are displaying a list of all revisions and then choosing the latest one for each entity.
Comment #119
sam152 commentedThe split proposed in #118 is happening in #2926483: Add API methods for determining whether an entity object is the latest (translation-affecting) revision, so this issue can indeed focus on the original IS and if doing something like this is a good idea.
Comment #120
plachI think we should try to do this: the latest (affected) revision ID is going to be needed more and more, as we adapt the core (and contrib) space to use pending revisions, so this would be both a DX and a performance win.
Comment #121
xjmShould this be postponed on #2926483: Add API methods for determining whether an entity object is the latest (translation-affecting) revision?
Comment #122
plachDefinitely :)
We should at very least reroll any previous work on top of #2926483: Add API methods for determining whether an entity object is the latest (translation-affecting) revision.
Comment #123
clairedesbois@gmail.comComment #124
sam152 commentedBlocker is resolved, so I think the only question left here is do we actually need this?
Comment #126
wim leersLooks like the current direction of the issue is to not add a new field? If so, the title and IS need an update.
Comment #127
joachim commented> Looks like the current direction of the issue is to not add a new field? If so, the title and IS need an update.
Adding a new field is what this issue is *about*.
It's either that we add it, or we mark as wontfix.
> I think the only question left here is do we actually need this?
But do we really have to go round this again? Scroll up -- plenty of discussions about performance and use cases.
Comment #128
sam152 commentedBoth @catch and @hchonov disagreed we needed this or at least disagreed on the current approach. So yes, plenty more discussion to take place if consensus is to be reached.
Comment #129
joachim commented> Both @catch and @hchonov disagreed we needed this or at least disagreed on the current approach. So yes, plenty more discussion to take place if consensus is to be reached.
@catch disagrees on the approach:
> Having the information is useful, but having to manipulate the base table and caches like this points to it not being the right place for it. If we add support for multiple drafts, we'd have to work against the schema added here.
By 'multiple drafts', do we mean the possibility of having two separate lines of revisions, like 2 git branches come off the published revision? That indeed suggests the entity base table is the wrong place for this.
@hchonov's objections are along similar lines -- that the base table is the wrong place.
I hadn't realized that #2865579: Rewrite the 'Latest revision' views filter and remove the revision_tracker table has since landed. I was just about to get stuck into trying to make the self-join approach work in Views, when I spotted the filter plugin that's now in core! :)
I would say in light of that, the question this issue should resolve first is this:
Is the 'latest revision' Views filter suitable for very last numbers of entities? Eg, if we have an admin view of latest drafts of nodes, on a site with 500k nodes, how does the self-join affect performance?
Comment #134
mrclay commented@joachim I have a site with 260K revisions, and the Latest Revisions self-join has become extremely slow (~40s locally). We're looking at excising old revisions or moving the view to Solr.
I can see from https://www.drupal.org/project/drupal/issues/2865579 people were happy to lose the table, but I'm not sure what other benefits were had.
@timmillwood Can you help explain what you mean by "maintaining the tracker table was not sustainable"? Was it blocking features added since? A big source of bugs?
Comment #136
yakoub commentedunless the `content_moderation` module is enabled, there is no difference between current revision and latest revision .
otherwise if any other site builder/developer disrupts the consistency of current revision without installing `content_moderation` then it is their responsibility to handle latest revision problem on their own .
but `content_moderation` doesn't need this latestRevision method added to Query since the maintain track over revisions in the state entity `ContentModerationState` field `content_entity_revision_id`
the problem with this field is that it is not `entity_reference` and that is why you can't join against it from other Query objects .
the reason this field is not `entity_reference` is that it references various entity types .
the correct solution is to turn `ContentModerationState` entity into a field which is attached to entity types and this way the ambiguity is eliminated .
the field would have same attributes/table columns as the `ContentModerationState` entity properties .
given this field the Query objects will work flawlessly and not only latest revision but any delta parameters revision could be attained in simple field condition of format
Comment #137
sam152 commentedHi @yakoub, what you are describing is being tracked here: #3057946: [PP-1] Move content moderation information to a base field of the entity being moderated
Comment #138
yakoub commentedas i wrote i agree that ideal solution to implement content_moderation as field instead of entity .
but i still implemented another approach as proof of concept which is to define specialized Query for content_moderation .
the specialized query supports this code snippet for joining the referenced moderated entity :
$query->condition('refer:node.uid.entity:user.name', 'admin');the `refer:node` will join entity table content_moderation to node entity .
i don't know how useful you will find this and how much you think it answers the requirement .
the patch doesn't include test code but instead i include following code for testing which can be run using `console snippet --file=`
note that QueryAggregation still doesn't work .
also `Tables.addField` is same as parent class except for call to new method `referModeratedEntity` and `resetModeratedEntity`
Comment #139
yakoub commentedcorrection : getKey should get `revision` and not `id` .
Comment #140
yakoub commentedchanged parent addField method to allow encapsulating it .
Comment #141
catch'latest revision' also diverges when workspaces module is installed, it's not just content_moderation.
Comment #142
yakoub commentedi don't know about workspace ... yet,
but i think same as content_moderation the solution is either using field table instead of entity table or providing a specialized Query object .
but trying to solve this using latestRevision method is wrong approach as we have seen the bad performance in issue#2950869 .
moreover latestRevision lacks any context or meaning without content_moderation and workspace, so implementing it independently from those modules will result in inconsistency .
Comment #143
catchThe API for creating draft/forward revisions is in core as of 8.x (in 7.x it wasn't), so the concept does have meaning without either module installed at the API level, even if you'd never actually have any forward revisions without using them (or a custom/contrib equivalent).
Comment #144
yakoub commentedwell then it needs to be deprecated .
Comment #145
yakoub commentedthe issue doesn't describe which api in core are involved in manipulating revisions and what was the intent/usecase of introducing it .
can someone please describe this ?
Comment #146
sam152 commented@yakoub: #2940575: Document the scope and purpose of pending revisions