Problem/Motivation
One of the main goals of the D8 Entity API is making entities storage-agnostic.
Proposed resolution
#1497374: Switch from Field-based storage to Entity-based storage is going to allow us to make entity storage swappable, as the Entity Storage controller will have full responsibility over field data. The big missing piece is entity querying: currently entity-defining modules use hardcoded SQL queries to retrieve sets of entities and often retrieve partial data. This does not make sense as this way the Entity API is completely bypassed. Instead we need to convert all entity SQL queries to the Entity Query API and fully load entities whenever we need their data.
This will probably have a performance impact, but we can leverage entity caching and lazy loading as mitigating strategies, once only the Entity API is used to access entity data. Morover these can be implemented as internal refactorings without breaking the public API.
As a side note, there is consensus that once the Entity Storage API is complete and core code has adopted it, we can still change the underlying SQL schema without affecting contrib modules relying on the API to access entity data. This will allow for further refactorings, such as improving multilingual support for entity types still missing it.
Remaining sub-tasks
None
Comments
Comment #1
plachI marked this as critical as I think we cannot ship without it, but feel free to change the priority if this is not actually true :)
I am now creating the actual issues.
Comment #1.0
plachAdded actual issue links.
Comment #1.1
plachUpdated issue summary.
Comment #1.2
plachUpdated issue summary.
Comment #2
chx CreditAttribution: chx commentedI think we can skip the search extender -- it'd be rather complicated to convert that one and everyone who uses nonsql uses solr or sphinx or searchapi or something.
Comment #3
chx CreditAttribution: chx commentedAlso note this is probably postponed on #2068655: Entity fields do not support case sensitive queries
Comment #4
catchThis feels major in the sense I'd happily backport dbtng -> entity query conversions to point releases. On the other hand it'd be great to have 8.x-1.0 releases of alternative storage backends that actually work for all core entity queries - but downgrading until this blocks one of them.
Comment #5
Gábor HojtsyI think so long there are direct queries to entities in core like that, we cannot say the storage is swappable.
Comment #5.0
Gábor HojtsyUpdated issue summary.
Comment #6
slashrsm CreditAttribution: slashrsm commentedWhile working on comments conversion some questions appeared, which I think need to be discussed:
1. How do we decide which storage controller does something belong to. There are definitely many cases where this is not a question at all (most recent comments, most recent nodes, ...), but there are some "mixed" cases where I was not completely sure (example: get N most commented nodes). This kind of operation will return nodes, while it will definitely need to query against comment tables. It is a bit strange to return nodes from comment storage controllers and is also not very nice to query comment tables from node storage controller. I assume there will be some more cases like this and I think we should define general guidelines for this cases to prevent confusion and inconsistency in API.
2. Code that will use storage controllers to get some data sometimes expects entity ids and sometimes entirely loaded entities. We should probably define how we solve that, as it could also bring confusion. One option is to define naming conventions for storage controller functions that will reflect type of data we're returning. Another way to go would be to support both ways, which would be controlled with an argument.
Comment #7
slashrsm CreditAttribution: slashrsm commentedWhile working on comments conversion some questions appeared, which I think need to be discussed:
1. How do we decide which storage controller does something belong to. There are definitely many cases where this is not a question at all (most recent comments, most recent nodes, ...), but there are some "mixed" cases where I was not completely sure (example: get N most commented nodes). This kind of operation will return nodes, while it will definitely need to query against comment tables. It is a bit strange to return nodes from comment storage controllers and is also not very nice to query comment tables from node storage controller. I assume there will be some more cases like this and I think we should define general guidelines for this cases to prevent confusion and inconsistency in API.
2. Code that will use storage controllers to get some data sometimes expects entity ids and sometimes entirely loaded entities. We should probably define how we solve that, as it could also bring confusion. One option is to define naming conventions for storage controller functions that will reflect type of data we're returning. Another way to go would be to support both ways, which would be controlled with an argument.
Comment #8
chx CreditAttribution: chx commented> How do we decide which storage controller does something belong to. There are definitely many cases where this is not a question at all (most recent comments, most recent nodes, ...), but there are some "mixed" cases where I was not completely sure (example: get N most commented nodes).
That's definitely tricky but again it's clearly comment:
SELECT nid FROM comments GROUP BY nid ORDER BY COUNT(*) DESC
-- it does return nodes but it's a comment query.> Code that will use storage controllers to get some data sometimes expects entity ids and sometimes entirely loaded entities. We should probably define how we solve that, as it could also bring confusion.
Yes, there is some confusions around this. Given that we can always extract $ids from $entities with array_keys; I think we should standardize + fix those that pass $ids.
Comment #8.0
chx CreditAttribution: chx commentedAdded one more aggregator issue
Comment #8.1
plachUpdated issue summary.
Comment #8.2
plachUpdated issue summary.
Comment #8.3
plachUpdated issue summary.
Comment #9
plachComment #10
plachComment #11
plachComment #12
vijaycs85Updating issue summary with sub-tasks.
Comment #13
plachComment #14
sunComment #15
larowlanAdded #2205185: Split up ForumManager to create ForumIndexStorage at request of @andypost and @chx
This will move forum SQL queries from forum.module into ForumRepository
Comment #16
slashrsm CreditAttribution: slashrsm commentedNot really an entity, but somehow related: #2209145: Move all path alias SQL queries to a single storage controller.
Comment #17
fgmComment #18
plachComment #19
plachComment #20
slashrsm CreditAttribution: slashrsm commentedComment #21
slashrsm CreditAttribution: slashrsm commentedComment #22
plachComment #23
plachComment #24
plachComment #25
plachComment #26
naveenvalechaUpdating issue summary.Moved node #2068333: Convert node SQL queries to the Entity Query API and forum #2205185: Split up ForumManager to create ForumIndexStorage under completed sub tasks list.
Comment #27
plachComment #28
andypostlooks all child issues are fixed, can we close this one
Comment #29
chx CreditAttribution: chx commentedSeems #2228733: Remove getFeedDuplicates - its unused and untested is open.
Comment #30
BerdirIt is, but that's actually only related, not really a sub-issue. That is just moving a query that already is an entity query out of the storage class, because there is no reason for it to be there, as it is fully storage-agnostic.
So +1 to closing this.
Comment #31
plachSounds good, thanks everybody!
Comment #33
Gábor Hojtsy