Problem/Motivation
Some issues such as
#3007752: Entity usage list does not scale
#3015287: Allow usage records to be registered in background
#3056026: Remove orphan handling of paragraphs when parent exists
#2985265: Cache viewable results when generating usage page
#3002332: Track composite (e.g. paragraph) entities directly and only as their host
#2971131: Improve label handling on usage page
#2949952: Make it easier to retrieve full relationship chain
are examples that show how our current approach is not scalable, and also hacky in some scenarios.
Storing all 1-to-1 relationships in DB is great, but not using the same rules to display them on the UI is challenging. On the other hand, arguably most users of this module are not really interested in that flexibility, and only interested in "in what node is this piece of media being used (regardless of all paragraphs/blocks in between them)".
This issue is to explore a new architecture of the module to simplify things, and hopefully allow better scaling for large sites.
Note: This will obviously be done in a new 3.x branch, since important API-level aspects would change.
Proposed resolution
- We introduce the concepts of top/middle/bottom level for entities
- Only top-level entities are tracked as sources (by default only nodes; sites can override this)
- Middle-level entities are disregarded entirely, but traversed when looking for targets
- Bottom-level entities are all entities configured to have the "Usage" local task (tab) on their page. These are the only "target entities" we store information for in the DB
- We no longer store the field name, relationship method (plugin ID), or count in the DB. All a DB row now tells us is: "a (source, top-level) entity of type A, with ID B, in revision C and language D, points to a (target, bottom-level) entity of type E with ID F". This "reference" from source to target may be direct or go through any number of intermediate (non-top, non-bottom) entities.
- All tracking calculation is deferred to a (usually) background process using DestructableInterface.
- We can now build the usage page on the UI using direct paged queries. The page will show usages grouped by source entity type (so we can easily join the entity table on the default revision) and will only show records that point to the default revision.
- We no longer need to provide any specific views integration
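To make the top/middle/bottom rules concrete, here is a minimal, language-neutral sketch in Python (the module itself is PHP). `Ref`, `Entity`, `collect_targets()` and `usage_rows()` are hypothetical names, not the module's API; the sketch assumes the tracking plugins have already extracted each entity's outgoing references and that referenced entities can be loaded from an in-memory `storage` dict. Each resulting tuple corresponds to one DB row as described above: source type, ID, revision and language, plus target type and ID.

```python
from dataclasses import dataclass, field

# Hypothetical configuration mirroring the proposal: which entity types are
# tracked as sources (top level) and which have the "Usage" tab (bottom level).
TOP_LEVEL_TYPES = {"node"}
BOTTOM_LEVEL_TYPES = {"node", "media"}


@dataclass(frozen=True)
class Ref:
    entity_type: str
    entity_id: int


@dataclass
class Entity:
    ref: Ref
    revision_id: int = 1
    langcode: str = "en"
    references: list = field(default_factory=list)  # outgoing Refs, any plugin


def collect_targets(entity, storage, seen):
    """Return the bottom-level Refs reachable from `entity`.

    Middle-level entities (neither top nor bottom) are walked through but
    never recorded; bottom-level targets are recorded and not walked further.
    """
    found = set()
    for ref in entity.references:
        if ref in seen:
            continue  # already handled (also protects against reference cycles)
        seen.add(ref)
        if ref.entity_type in BOTTOM_LEVEL_TYPES:
            found.add(ref)
        elif ref.entity_type not in TOP_LEVEL_TYPES:
            found |= collect_targets(storage[ref], storage, seen)
    return found


def usage_rows(source, storage):
    """One row per (source revision/language, target): no field, method or count."""
    targets = collect_targets(source, storage, seen={source.ref})
    return {
        (source.ref.entity_type, source.ref.entity_id, source.revision_id,
         source.langcode, t.entity_type, t.entity_id)
        for t in targets
    }
```

For a chain such as node -> paragraph -> paragraph -> media, only the node/media pair ends up as a row; the paragraphs are traversed but never stored.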
Remaining tasks
Should be fixed here:
- We need new implementations of hook_entity_delete() and similar hooks to: a) remove records from the DB when target (bottom-level) entities are deleted, and b) set the "needs regeneration" flag/warning when middle-level entities are deleted
Can be fixed in follow-ups:
- Having dropped the views integration, we might consider a custom field handler that would either show "Not being used", or a link to "Check usages"
- When a field / field storage is deleted, and also when any middle-level entity is deleted, we may end up with stale info in the DB. There are several possible approaches to this; it's probably best to discuss / test the pros and cons of each in a follow-up.
- When an entity is used only in past revisions of a top-level entity, those "hidden" usages will not show up in the usage list. We need a mechanism to let users discover the past usages through the UI, if needed.
- Currently, to expose new entity types to be tracked as sources, sites need to write custom code and override \Drupal\entity_usage\EntityUsageSourceLevel::TOP_LEVEL_TYPES. It would be better to make this configurable in the UI.
User interface changes and modifications on default behavior
- The usage page (when visitors click on the "Usage" tab) now only displays rows for top-level source entities.
- The usage page no longer displays columns for "Field Name", or "Used in".
- The usage page will now group source entities by their type, with a common pager for all groups. For example, if the "Number of items per group" (defined in the settings form) is set to 10, and Nodes and Users are configured to be tracked as top-level entities, then the first page will display a group of 10 rows for node sources and another group of 10 rows for user sources. The next page would fetch the next 10 rows for both groups, and so on.
- When the module is first installed, node and media entities (if they exist) will have the "Usage" tab enabled by default. This means they are automatically tracked as targets (bottom-level) by default. Note: This does not apply to existing sites.
- By default, only node entities are tracked as sources.
- In some situations (for example after a field has been deleted), a warning will be displayed on the status report page, informing users that usage regeneration is needed. To regenerate usage and remove the message, users need to go to the Batch Update form and trigger a batch update of usage statistics.
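The shared pager described above can be sketched as follows. In the module each group would presumably be a per-type paged query joined on the default revision; here plain lists stand in for query results. `page_of_groups()` and `total_pages()` are hypothetical names, and deriving the page count from the largest group is an assumption of this sketch, not something the summary specifies.

```python
import math


def page_of_groups(rows_by_source_type, items_per_group, page):
    """Rows shown on a 0-based `page`: the same slice is taken from every group."""
    start = page * items_per_group
    return {source_type: rows[start:start + items_per_group]
            for source_type, rows in rows_by_source_type.items()}


def total_pages(rows_by_source_type, items_per_group):
    """The shared pager needs as many pages as the largest group requires."""
    largest = max((len(rows) for rows in rows_by_source_type.values()), default=0)
    return math.ceil(largest / items_per_group)
```

With 25 node sources, 3 user sources and 10 items per group, the first page shows 10 nodes and 3 users, the second page shows the next 10 nodes and no users, and the pager has 3 pages.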
API changes
No change is needed in tracking plugins, as long as they extend the \Drupal\entity_usage\EntityUsageTrackBase base class.
The changes below might affect custom or contrib code interacting with this module:
- The setting option usage_controller_items_per_page is now called usage_controller_items_per_group.
- The hook hook_entity_usage_block_tracking() no longer receives method, field_name, or count as parameters.
- The entity_usage DB table no longer has columns for method, field_name, or count.
- The system now uses a state flag, entity_usage_needs_regeneration, to display a warning on the status report page when we detect that stale data might exist.
- The Drupal\entity_usage\EntityUpdateManager service now implements DestructableInterface, and during CRUD hooks we only register the operations that happened during the current request. All real usage tracking is deferred to the end of the request (normally in the background), inside the \Drupal\entity_usage\EntityUpdateManager::destruct() method.
- The module no longer provides specific views integration. In other words, we no longer implement hook_views_data() or hook_views_data_alter().
- The methods ::trackUpdateOnCreation(), ::trackUpdateOnEdition(), and ::trackUpdateOnDeletion() from EntityUpdateManager are now protected instead of public.
- The method \Drupal\entity_usage\EntityUsageInterface::registerUsage() is now only intended for adding new records (instead of adding/updating existing ones). Its signature has changed, since it now receives fewer arguments.
- A new \Drupal\entity_usage\EntityUsageInterface::deleteUsage() method allows deleting a specific record from the DB.
- The method \Drupal\entity_usage\EntityUsageInterface::deleteByField() is removed, since we no longer have field information in the DB.
- All events dispatched by this module have been adjusted, since we no longer pass information about field_name, method, etc. Also, a new event was added for when a specific DB record is deleted.
- The \Drupal\entity_usage\EntityUsageInterface::listSources() return value no longer includes information about the field name, method, or count for the retrieved records.
- The \Drupal\entity_usage\EntityUsageInterface::listTargets() return value is now a simple associative array where keys are target entity types and values are indexed arrays of target entity IDs.
- The (already) deprecated methods \Drupal\entity_usage\EntityUsageInterface::listUsage() and \Drupal\entity_usage\EntityUsageInterface::listReferencedEntities() were removed.
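The deferral pattern in the EntityUpdateManager item can be sketched as below. In Drupal, services implementing \Drupal\Core\DestructableInterface and tagged `needs_destruction` get their destruct() method called when the kernel terminates, typically after the response has been sent; the Python class only mimics that shape. `DeferredUsageTracker`, `register()` and the "last operation per entity wins" rule are assumptions of this sketch, not the module's actual logic.

```python
class DeferredUsageTracker:
    """Records entity operations during a request and tracks them in destruct()."""

    def __init__(self, track):
        # `track` stands in for the expensive recalculation of usage rows.
        self._track = track
        # (entity_type, entity_id) -> last operation; dicts keep insertion order.
        self._pending = {}

    def register(self, entity_type, entity_id, operation):
        """Cheap enough to call from every CRUD hook: only remembers the operation."""
        self._pending[(entity_type, entity_id)] = operation

    def destruct(self):
        """Runs once at the end of the request and does the real tracking work."""
        pending, self._pending = self._pending, {}
        for (entity_type, entity_id), operation in pending.items():
            self._track(entity_type, entity_id, operation)
```

Saving the same node several times in one request therefore costs one recalculation instead of one per save.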
| Comment | File | Size | Author |
|---|---|---|---|
| #26 | entity_usage-3060802-refactor-26-D9.patch | 198.44 KB | omarlopesino |
Issue fork entity_usage-3060802
Comments
Comment #2
marcoscano: First shot as a POC; it seems to work in manual testing (checking records in the DB).
Several things still pending:
- All changes to the controller
- Update hooks
- Update the tests
- Test / adjust batch update code
- Figure out what to do when a field or a middle-entity is deleted
- Test / adjust the views integration
Comment #3
marcoscano: Still to do:
- Pager on the controller
- New controller for the "More info" page (past revisions)
- Update hooks
- Update the tests
- Figure out what to do when a field or a middle-entity is deleted
- Test / adjust the views integration
Comment #4
marcoscano: About the "New controller for the More info page":
I'm less convinced that a page listing past revisions is useful when there is a usage in the default revision. Knowing that previous revisions also used the entity adds no value if it is still being used in the current revision. So I'm actually removing that column from the table.
What would be useful, though, is to have a mechanism to let users know that there is a usage ONLY in a past revision of a node. But that's tricky, since that node isn't in the main usage page because... well... there's no usage in the default revision! I still need to think about the best way to get around this.
Views integration:
Since we no longer have the "count" column in the DB, I feel we shouldn't keep the views relationships, since there is no point in including them. Site builders can just add a link such as "Check usage" to each row of their view, pointing to the usage path for that entity.
If anything, we can provide a custom views field plugin that, instead of always showing the link, displays "No usages" when there are no usages recorded, but that can be a follow-up I think.
Field deletion / Middle-entity deletion:
This patch includes a very basic approach: whenever a field or field storage is deleted, if they were in a top or middle-level entity, we set a flag that will display an error in the status report, informing that bulk regeneration is needed.
This system can likely be improved to be smarter.
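The basic field-deletion approach can be sketched like this. A plain dict stands in for Drupal's State API; the state key comes from the API changes list, while the function names and the warning text are hypothetical. "Middle level" is taken to mean neither top nor bottom level, as in the proposed resolution.

```python
# Hypothetical level configuration, as in the proposal (nodes are both
# top level and bottom level by default; media is bottom level only).
TOP_LEVEL_TYPES = {"node"}
BOTTOM_LEVEL_TYPES = {"node", "media"}
NEEDS_REGENERATION = "entity_usage_needs_regeneration"  # state key from the summary


def on_field_deleted(state, entity_type):
    """Flag possibly stale data when a field on a top- or middle-level type goes away."""
    is_top = entity_type in TOP_LEVEL_TYPES
    is_middle = not is_top and entity_type not in BOTTOM_LEVEL_TYPES
    if is_top or is_middle:
        state[NEEDS_REGENERATION] = True


def status_report_warnings(state):
    """Messages the status report page would show."""
    if state.get(NEEDS_REGENERATION):
        return ["Entity usage data may be stale: run the batch update to regenerate it."]
    return []


def after_batch_update(state):
    """A completed batch update rebuilds all rows, so the flag can be cleared."""
    state[NEEDS_REGENERATION] = False
```

Deleting a field that only exists on a bottom-level type (media here) cannot invalidate any source-to-target path, so it leaves the flag untouched.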
Still pending:
- Decide how to best display the "hidden" (past revisions-only) usages on the UI
- Update the tests
Comment #5
marcoscano: Setting to NR to hopefully get opinions on the new approach, even though the tests haven't been updated yet.
I've added a message to the usage page that for now only informs users if "there are hidden usages", even though we aren't providing a way of telling the user what those hidden usages are.
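A "hidden" usage, as described in #4, is a source that references the target only in non-default revisions. The sketch below finds those sources from rows shaped like the new table (simplified: language omitted, target given as a single string). `hidden_usage_sources()` and its inputs are hypothetical, not part of the patch.

```python
def hidden_usage_sources(rows, default_revision, target):
    """Sources that reference `target` only in non-default revisions.

    rows: iterable of (source_type, source_id, revision_id, target) tuples.
    default_revision: maps (source_type, source_id) to its default revision ID.
    """
    revisions_by_source = {}
    for source_type, source_id, revision_id, row_target in rows:
        if row_target == target:
            revisions_by_source.setdefault((source_type, source_id), set()).add(revision_id)
    return {source for source, revisions in revisions_by_source.items()
            if default_revision[source] not in revisions}
```

A non-empty result is exactly the case where the usage page would need to say "there are hidden usages".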
Comment #6
marcoscano: Now with tests updated, and some bugs caught along the way.
tests++
Comment #7
marcoscano
Comment #8
marcoscano
Comment #9
marcoscano
Comment #10
brooke_heaton commented: Update hook 8301 fails for me with:
[error] The 'target_id_string' field specification does not define 'not null' as TRUE.
Comment #11
marcoscano: Thanks @brooke_heaton! It seems I was only testing this on new installs, and I'm not sure how that change got in there :). In any case, this patch fixes it (also re-rolled, hence no interdiff).
Updb seems to work for me now:
Comment #12
marcoscano: I opened a 3.x branch of the module with #11, just to make it easier to keep track of changes.
Comment #13
rp7 commented: Great stuff being done here, nice work.
Are there reasons we are not using Views for rendering the usage page? This would allow the usage page to be more customizable.
Currently on a project where the client doesn't want usages to be listed on a separate page, but rather as a block in the sidebar of the canonical page (/node/123). If there was views integration, this would be as simple as adding a block display.
Are there any hard-blockers for this? Willing to put effort into this.
Comment #14
rp7 commented: Another question: will the 2.x branch still be supported? If so, for how long?
Comment #15
marcoscano: @rp7 thanks for chiming in!
That's an interesting question :)
In the 2.x branch, that was impossible since that page needed some "massaging" of the results before building the page. It's true, though, that with this simplified approach in 3.x we may very well simplify it even further and use a view for that... Apart from introducing a hard dependency on views, I can't think of any other major drawback of this idea right now, so it could be worth exploring.
Another good question. This 3.x branch is very much experimental for now, there's still a lot to figure out and edges to polish for it to be even usable. If everything works well, though, and we are happy with the improvements, my idea was to focus new features on 3.x only. On the other hand, I don't plan to force anybody to upgrade, since there are disruptive changes. So even if 3.x is successful, I envision 2.x still receiving major bug fixes / security fixes for the foreseeable future.
Comment #16
geek-merlin
Comment #17
frob: Really liking this module. Is there any hope of splitting this module up as part of the experimental 3.x redesign? It seems like it could become a useful API module containing the plugin definitions and services, plus sub-modules for things like the views integration and the UI.
Comment #18
frob: I noticed that #3090018: Proposal to refactor entity_usage mentions using the D7 Tree module approach to simplify nested references. Another approach that might be easier is the nested-set approach used by the entity_hierarchy module.
Comment #21
omarlopesino: Hi. In a project I need to see where media items are being used, and the 3.x approach fits, since we want to ignore the entities referenced between the node and the media.
I've created an MR from the dev branch, rerolling patch #11.
I found a problem: infinite recursion occurred in these circumstances:
- There is a content type with a paragraph that allows references to other nodes.
- A node is created with this paragraph.
- The node referenced in this paragraph also references the first node.
- The entity usage of that node is recalculated
Under those conditions, the entity reference track plugin would fall into infinite recursion, because it finds a reference pointing back to the node that is currently being recalculated.
The problem is fixed in this commit: https://git.drupalcode.org/issue/entity_usage-3060802/-/commit/663bcaaf3... , by not descending into entities that are currently being calculated.
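The guard described in the fix can be illustrated with a small, hypothetical sketch (it is not the code from that commit): keep the set of entities whose calculation is still in progress, and stop descending when a reference points back into that set. `find_references()` and the string IDs are illustrative; the graph reproduces the scenario above, where node 1 has paragraph 7, which references node 2, which references node 1 again.

```python
def find_references(entity, graph, in_progress=None):
    """Collect everything reachable from `entity`, without re-entering an entity
    whose calculation is still in progress higher up the call stack."""
    if in_progress is None:
        in_progress = set()
    if entity in in_progress:
        return set()  # already being calculated: do not go deeper
    in_progress.add(entity)
    try:
        found = set()
        for child in graph.get(entity, []):
            found.add(child)
            found |= find_references(child, graph, in_progress)
        return found
    finally:
        in_progress.discard(entity)  # calculation for this entity has finished
```

Without the `in_progress` check this walk would never terminate. Because entities are removed from the set once their calculation finishes, an entity reached through two separate branches is still explored each time; only true cycles are cut.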
The module looks functional now but the tests need to be reviewed after the reroll, I will look into it. Any help is appreciated.
Comment #22
omarlopesino: The tests are now fixed, along with some corrections that were missing after the re-roll. Please review, thanks!
Comment #23
flyke commented: I can't apply patch #11 to the latest dev.
Running composer require 'drupal/entity_usage:2.x-dev@dev' with this in the patches section of my composer.json:
fails with a "failed to apply patch" error.
If I try to apply the patch manually (after downloading it locally):
git apply -v --directory=web/modules/contrib/entity_usage 3060802-11.patch then I can see that some parts of the patch can be applied, but a lot can't:
Comment #24
omarlopesino: Those errors are fixed in the MR. I should have hidden the patches that no longer apply, my bad.
@flyke the merge request attached to the task solves all those problems; could you apply a patch from that MR and check again?
Comment #25
brooke_heaton commented: The MR patch is not working for me against "drupal/entity_usage": "2.x-dev@dev". The patch fails:
https://git.drupalcode.org/project/entity_usage/-/merge_requests/14.patch
Comment #26
omarlopesino: I don't know why, but the plain diff does not fail against the 8.x-2.x branch, so please use that instead.
In any case, I've attached the patch complementing the MR.
Comment #27
omarlopesino
Comment #28
anybody: As this module is used by 26,105 installations and by several other modules, and even seems like a helpful addition to Drupal core (entity_reference), could this perhaps be pushed forward by a community initiative or something similar?
@marcoscano this is still assigned to you, are you working on this actively?
Comment #29
marcoscano: I am not currently working on this; I forgot to un-assign the issue.
I still think, though, that there are challenges on both sides.
On one hand, the 2.x (currently supported branch) has scalability issues as pointed out in the issues in the issue summary. On the other hand, the idea with the refactoring here changes significantly the approach, still has a few edges that need polishing, and is much more opinionated, which means it might not be a good fit for all projects.
Personally, I would love to pursue this 3.x approach in the long run, but before diving deeper into it, I would love to hear that it's not only a "theoretical" solution, but actually meets the needs of real projects out there.
Comment #30
brooke_heaton commented: @marcoscano - I'm late to the party here, but I have a site with an immense number of entity reference fields, and this module seemed rather crucial. Can you elaborate on what you mean by 'opinionated', and in what way? What types of projects would suffer from a different approach, and what types would benefit?
Comment #31
marcoscano: @brooke_heaton the issue description goes into more detail, but the most important part is that we would disregard all intermediate entities in a relationship chain and only store in the DB information about the origin and the final target. For instance, in a relationship chain such as node -> paragraph -> paragraph -> media, only the relationship between the node and the media item is stored. In a paragraphs scenario this isn't a big deal, since paragraphs have no standalone representation in Drupal, but who knows what sites might be doing: so far the module has not special-cased any entity types, and we just blindly stored all 1-to-1 relationships in the DB. With the new approach we start looking at the entities that are part of a relationship and decide which ones we store and which ones we don't; that's why it's more opinionated.
Comment #32
acbramley commented: The main issue I see with the proposed solution is that some entity types may be top, middle, or bottom level depending on what the entity itself is being used for.
Let's take a scenario (which is in use in a current client project that heavily uses entity_usage) with Node, Managed link (from linky module) and Block content entity types.
Nodes can reference Linky entities via entity reference fields and embedded wysiwyg links (via linkit).
Nodes can reference blocks via layout builder, or embedded in wysiwyg content (via entity_embed)
Blocks can reference Linky entities via entity reference fields and embedded wysiwyg links.
The chain in this scenario seems to be Node = Top, Block = Middle, Linky = Bottom which will work for the majority of use cases.
However, what about blocks that aren't embedded via layout builder? Such as blocks embedded via entity_embed or blocks that just appear on the site in other locations. Those need to be treated as Top level entities. How would this new architecture denote when a block is a middle or a top level?
We could say reusable blocks are always top level, and inline blocks are always mid level, however what about when a reusable block is embedded via entity_embed? You would want that treated as mid (or maybe top, or maybe both?). This is where it could become too complex/opinionated.
This is just one example using common entity types, but there are endless possibilities.
Comment #33
berdir: I can't answer for @marcoscano, but I think #3002332: Track composite (e.g. paragraph) entities directly and only as their host from me did influence the plan here. Also note that I haven't actually looked at the patch yet, sadly.
I don't think your use cases are that complicated. Yes, the module will be "opinionated", but its opinions are simple, and a "middle entity" is only such in explicit, clear cases:
* Entity Reference Revisions has an entity type flag for what it calls composite entities, which mostly means paragraphs (there might be others, not sure). No questions there: paragraphs are designed as embedded, *single-use* entities.
* Content blocks in core have a reusable flag. If it is not set (as with inline blocks), the block is only used in a single place in a single entity, so it can be considered a middle entity. Any other block embed or reference is not.
The decision will be with the plugins, and maybe a hook to customize it.
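The two "clear cases" above can be expressed as a simple classification rule. In Drupal core, non-reusable block_content entities are the inline blocks that Layout Builder creates for a single host. The sketch flattens the relevant flags into two booleans; `EntityInfo` and `is_middle_level()` are hypothetical, and how the real plugins would read these flags is not specified in this thread.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EntityInfo:
    entity_type: str
    is_composite: bool = False       # e.g. Entity Reference Revisions' composite flag
    reusable: Optional[bool] = None  # only meaningful for block_content


def is_middle_level(info):
    """Treat only clearly single-use entities as middle level."""
    if info.is_composite:
        return True  # paragraphs and similar embedded, single-use entities
    if info.entity_type == "block_content" and info.reusable is False:
        return True  # inline (non-reusable) blocks belong to exactly one host
    return False  # reusable blocks, nodes, media, ... are not middle level
```

A reusable block embedded via entity_embed therefore stays a regular (top- or bottom-level) entity, which addresses the mixed case raised in #32.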
TMGMT has similar logic for example for composite entities (does not support layout builder yet I think), which works very well for us.
Right now, entity_usage with paragraphs is quite complicated to use, as you need to join/loop through an undefined amount of layers and implement this composite logic at query time. This would massively simplify that. For example, TMGMT right now also doesn't support looking up suggested/related entities to translate through paragraphs, and with this, we could optionally support entity_usage to cover that and all the other things that entity_usage has plugins for. We also have a custom integration that shows embedded media entities so users can translate them, that would get so much simpler and support more use cases.
Comment #34
acbramley commented: @Berdir keen to see how it turns out :)
Comment #35
partdigital commented: One approach that we've been using on our project to handle entity usage:
We created a service that accepts a top entity along with a specification. It would then traverse the tree and only store the results that we needed based on that specification. As we traversed the tree we would also store the location of each item, so that once the usage was captured we could easily traverse that set with methods like getParent(), getChild(), getSibling(), etc.
For example, our API looks like this:
We define a specification. It's basically just an array, but it could be made into a plugin/config entity and given a name, so that you could define meaningful traversal specifications for your project. You can also generate a "default" specification by inspecting the fields and entity types on the site, though I've usually found it more useful to be explicit.
We then pass that specification into a method.
Now we can do things like this:
This is very fast because we store the entity id and its location in the set (basically an index). See the example below. The key is the location and the value is the entity id.
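As a rough illustration of a location-keyed set (keys are locations, values are entity IDs), the sketch below uses materialized paths, tuples of child positions starting at the top entity, as the location keys. The actual key format used in #35 is not shown, and `UsageTree` with its methods are hypothetical names modelled on getParent()/getChild()/getSibling().

```python
class UsageTree:
    """Flattened usage tree: location path -> entity ID."""

    def __init__(self, locations):
        # A location is a tuple of child positions from the top entity, e.g.
        # (0,) is the top entity and (0, 1) is its second referenced entity.
        self.locations = dict(locations)

    def get_parent(self, location):
        # A parent's path is the location minus its last step: a dict lookup.
        return self.locations.get(location[:-1]) if len(location) > 1 else None

    def get_children(self, location):
        depth = len(location) + 1
        return [entity_id for loc, entity_id in sorted(self.locations.items())
                if len(loc) == depth and loc[:-1] == location]

    def get_siblings(self, location):
        return [entity_id for loc, entity_id in sorted(self.locations.items())
                if len(loc) == len(location) and loc[:-1] == location[:-1]
                and loc != location]
```

Parent lookups are constant-time here; children and siblings use a linear scan for brevity, whereas a real store would likely index locations by path prefix.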
To get this working with the broader entity usage, you could:
The api might look like this:
Just food for thought as you're working on this :)
Comment #36
acbramley commented: Is the plan still to go ahead with this 3.x branch? I see there's now a 4.x branch using entity_track. Surely we should consolidate efforts on a single new architecture?
We use this module pretty heavily on one of our client projects, and they've recently asked for features such as filtering the Usage list by current/previous revision, so I'm happy to help the efforts in order to unlock some of those more complex features.
Comment #37
marcoscano: Thanks to all who have been providing feedback and ideas in this issue. Apologies for not replying earlier 🙏
@acbramley thanks! I will take any help available :)
Currently I would say that both the 3.x and 4.x branches are very much experimental and shouldn't be used in production. Development on 3.x stalled at some point because I didn't feel good being the only one moving this idea forward, given that this is such a disruptive architectural change. Then at some point @seanb and @askibinski came up with the idea of splitting the API into a generic layer to "track things", and then making Entity Usage just a consumer of that API. That makes sense to me, but we didn't fully make the switch to this new 4.x branch, and development kind of stalled.
Yes, I think at this point it makes sense to envision the refactoring mentioned here on top of the 4.x branch. In order for that to happen, I would say that a rough roadmap could be:
ET = Entity Track
EU = Entity Usage
0- [NEEDS WORK] Fix tests in D10 / Switch to GitlabCI #3408387: Fix tests in HEAD for D10
1- [ALMOST DONE ?] review the current code / update the branches with latest commits on EU (entity_usage) 2.x and ensure we have feature parity between ET 1.x + EU 4.x and EU 2.x
- This was kind of OK as of Dec 2022 with #3324787: Update 4.x branch and #3324797: Update with entity_usage 2.x changes but we'd need to review latest bug-fixes since then.
2- [NEEDS REVIEW] ensure that the test coverage of ET 1.x and EU 4.x combined is equivalent of what we have in EU 2.x
- This probably happened as part of the above issues as well, but we'd need to double-check we are not losing test coverage in the switch
3- [NEEDS WORK] ensure we have an upgrade path for existing users on EU 2.x #3326110: Create an upgrade path for EU 2.x -> ET 1.x + EU 4.x
4- [DONE ?] have some real world experience / feedback of ET 1.x + EU 4.x
- I know of one reasonably-sized project that is using ET+EU on prod for a couple years now, but it would be great to get more alpha testers out there if we can.
After this, I believe we could tag an EU 4.0.0-beta1 and mark it as the recommended branch instead of 2.x.
Then, it would likely make sense to revisit the refactor from this issue and simplify everything in a 5.x branch probably?
I am OK going forward with this plan and welcome everyone that is able/willing to participate.
Thanks!
Comment #38
claudiu.cristea: As there's been no movement here, and because I badly needed something like the envisioned refactoring, I had to create a new module that is more or less based on this idea. Enter Track Usages.
Here are some key differences:
Posted this comment for those who might be interested.
PS: I needed this kind of functionality to achieve the scope of File Visibility module
Comment #39
eelkeblok: As a new and interesting twist, the project page for Entity Track says:
However, I am unsure what changes those are. Looking for a way to speed up the indexing process.
Comment #40
vasike: I tried @claudiu.cristea's solution
and added some MRs for some issues:
https://www.drupal.org/project/issues/track_usage
Maybe they could help with some solutions on some projects ... for some people "here"
Also, this doesn't look ready for review; it's still "Needs work", IMHO.