Problem/Motivation
Postponed on #3492391: Make the event dispatcher available before container full bootstrap
Menu tree storage, config and cache use the same pattern: execute a query, catch an exception, try to create the table and, if that succeeds, rerun the query. This is already wasteful, and there is already demand for more of it: #2338747: Move {router} out of system.install and create the table lazy for it elsewhere.
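The pattern in question can be sketched outside of Drupal's database layer. This is an illustrative stand-in using plain PDO/SQLite (the table, column, and function names are made up for the example), not core's actual code:

```php
<?php
// Illustrative sketch of the "catch, create, rerun" pattern described
// above, using PDO/SQLite instead of Drupal's database layer. Table
// and column names are invented for this example.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

function registerEvent(PDO $pdo, string $event): void {
  $sql = 'INSERT INTO flood (event) VALUES (:event)';
  try {
    $pdo->prepare($sql)->execute([':event' => $event]);
  }
  catch (PDOException $e) {
    // The query failed; assume the table is missing, try to create it,
    // and if that succeeds, rerun the original query.
    $pdo->exec('CREATE TABLE flood (event TEXT NOT NULL)');
    $pdo->prepare($sql)->execute([':event' => $event]);
  }
}

registerEvent($pdo, 'user.failed_login_ip');
echo (int) $pdo->query('SELECT COUNT(*) FROM flood')->fetchColumn(); // prints 1
```

Every service repeats this try/catch/create/retry dance with its own table, which is what the proposed trait factors out.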
Proposed resolution
Create a trait for lazy table creation and use that trait in the existing backend services. Later do the same with services that now do not make use of lazy table creation.
Remaining tasks
User interface changes
API changes
The new trait Drupal\Core\Database\LazyTableCreationTrait is created.
The trait has two methods: Drupal\Core\Database\LazyTableCreationTrait::catchException() and Drupal\Core\Database\LazyTableCreationTrait::ensureTableExists(). The first suppresses the exception thrown when the table does not exist; the second tries to create the table. There is also an abstract method, Drupal\Core\Database\LazyTableCreationTrait::schemaDefinition(), that every service using the trait must implement; it should return the schema for the table.
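A self-contained sketch of the trait's shape, backed by PDO/SQLite instead of Drupal's Schema API: the method names follow the summary above, but the bodies and the DatabaseQueue-style consumer are illustrative stand-ins, not core's actual code (core's schemaDefinition() returns a Schema API array, not a DDL string).

```php
<?php
// Hedged sketch of the trait described above. PDO/SQLite stand-in.
trait LazyTableCreationTrait {

  // Every consumer must describe its own table (cf. hook_schema()).
  // A raw DDL string keeps this sketch self-contained.
  abstract protected function schemaDefinition(): string;

  // Try to create the table; TRUE if it now exists.
  protected function ensureTableExists(): bool {
    try {
      $this->connection->exec($this->schemaDefinition());
      return TRUE;
    }
    catch (PDOException $e) {
      // Creation failed, most likely because the table already exists,
      // meaning the original failure was not a missing table.
      return FALSE;
    }
  }

  // Swallow a "table missing" failure by creating the table; re-throw
  // anything that creating the table cannot fix.
  protected function catchException(PDOException $e): void {
    if (!$this->ensureTableExists()) {
      throw $e;
    }
  }

}

class DatabaseQueue {
  use LazyTableCreationTrait;

  public function __construct(private PDO $connection) {}

  protected function schemaDefinition(): string {
    return 'CREATE TABLE queue (item TEXT NOT NULL)';
  }

  public function createItem(string $item): void {
    $sql = 'INSERT INTO queue (item) VALUES (:item)';
    try {
      $this->connection->prepare($sql)->execute([':item' => $item]);
    }
    catch (PDOException $e) {
      $this->catchException($e);
      // The table exists now, so rerun the query once.
      $this->connection->prepare($sql)->execute([':item' => $item]);
    }
  }

}

$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$queue = new DatabaseQueue($pdo);
$queue->createItem('send_mail');
echo (int) $pdo->query('SELECT COUNT(*) FROM queue')->fetchColumn(); // prints 1
```

The first createItem() call hits a missing table, catchException() creates it, and the insert is rerun; later calls skip the catch path entirely.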
A number of class constants have been deprecated and will be removed before Drupal 12.0.0:
Drupal\Core\Batch\BatchStorage::TABLE_NAME
Drupal\Core\Flood\DatabaseBackend::TABLE_NAME
Drupal\Core\Lock\DatabaseLockBackend::TABLE_NAME
Drupal\Core\Queue\DatabaseQueue::TABLE_NAME
A number of protected class variables have been deprecated and will be removed before Drupal 12.0.0:
Drupal\Core\Lock\DatabaseLockBackend::database
Drupal\Core\Routing\MatcherDumper::tableName
Comments
Comment #2
chx commented
Comment #3
chx commented
Bumped ensureTableExists to Connection; now we should install at least. Writing tests.
Comment #5
chx commented
Comment #6
chx commented
Comment #8
chx commented
Comment #9
chx commented
dawehner asked for a rename.
Comment #10
chx commented
Bumped ensureTableExists to Schema; there's one usage of it outside of the database layer, so it doesn't really matter, and it's really a schema-ish operation. Added a test for this one too. The patch is still net negative (54 LoC) despite adding a new interface and tests, and it of course enables doing this conversion for a lot of other services very cheaply.
Comment #11
chx commented
Missing interface doxygen pointed out by dawehner. Added the results of some discussions with dawehner into the summary.
Comment #12
chx commented
I realized that running a select on an empty table is pointless. Genius, that's my middle name, isn't it? Took me a bit.
Comment #13
dawehner commented
- It would be nice to explain why this schema is separated out...
- Given that it's more than just a string, we should document it if possible. Alternatively, getTable() could always return the table string.
- I'm curious... did you consider adding the table name to the SchemaProviderInterface?
- Maybe put a @see hook_schema() on here.
- Ha, this is not rudimentary anymore.
- Let's rename it to testSimpleSelectWithEnsuringTable.
Comment #14
dawehner commented
Comment #16
chx commented
All done except 4), because the query already has the table name.
Comment #17
dawehner commented
This change is pretty convenient, and in the future it could even make some of the tests faster as well as avoid the manual creation of tables in DrupalKernel tests, which would be pretty handy.
Comment #18
Crell commented
Overall this makes good sense, and I like how it's broken up into pieces within the DB API. That said, there are some improvements we can still make:
If the table doesn't exist, why do we need to create the table and then delete? Our desired post-condition is already true (there are no records with those $cids in the database), so why bother creating the table? We only need do that on insert/update.
Same comment here. If a truncate fails, our post-condition is known-true so why bother creating the table?
Not introduced by this patch so not really a blocker here, but can someone explain to me why there's a drupal_static() call inside a class? Wouldn't an object property accomplish the same thing with less hackery?
Off topic, but a reasonable improvement.
This could be read as whether or not the table existed before, not whether or not we're now sure the object exists. New wording: "TRUE if the table already existed or now exists, FALSE if there was an error and it does not."
Since that's a tests-only table, can we not move it into this class entirely rather than calling out to hook_schema?
Nitpick: static, not self.
Discussing with dawehner at BADCamp I suggested that maybe we should have a table object, rather than a "schema provider object" (since in all relevant cases we will be ensuring only a single table). However, reading through the patch I can see the benefit of having the schema defined directly on the object that is the swappable service. So I'm OK with this design.
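Crell's first point in #18 (delete need not create the table) can be sketched as follows. deleteMultiple() is a hypothetical helper over PDO/SQLite, not core's code: when deleting, a missing table already satisfies the post-condition, so the exception can simply be swallowed.

```php
<?php
// Hedged sketch of the first point in #18: on delete, a missing table
// means the desired post-condition ("no rows with these $cids") already
// holds, so there is nothing to create. deleteMultiple() is a
// hypothetical helper for illustration only.
function deleteMultiple(PDO $pdo, array $cids): void {
  if (!$cids) {
    return;
  }
  $placeholders = implode(',', array_fill(0, count($cids), '?'));
  try {
    $pdo->prepare("DELETE FROM cache WHERE cid IN ($placeholders)")
      ->execute(array_values($cids));
  }
  catch (PDOException $e) {
    // Swallow "table missing": there are no such rows anyway.
  }
}

$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
deleteMultiple($pdo, ['route:front', 'route:admin']); // no error, no table created
var_dump($pdo->query("SELECT name FROM sqlite_master WHERE name = 'cache'")->fetchColumn()); // bool(false)
```

As Crell notes, real code should swallow only a table-not-found error; the blanket catch here is a simplification of that point, not a recommendation.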
Comment #19
chx commented
1, 2: reverted (and removed the bug fix, not necessary now).
3: off topic as noted.
5, 7: fixed.
6: the schema is in the module and is used by the installSchema() call in setUp(), so why would I copy it?
Comment #20
dawehner commented
The feedback from @crell was addressed in #20.
Not copying the schema array is a valid point.
Comment #21
catch commented
I've just committed #1426804: Allow field storages to be persisted when they have no fields. - this added some test code that we should be able to remove again here. Leaving RTBC since that can happen in a separate issue, but also not committing yet because I've not been able to give this a proper look yet (I like the idea a lot, though).
Comment #22
yched commented
This indeed looks like it would let us remove some edgy test code from #1426804: Allow field storages to be persisted when they have no fields., but that would require adding a SchemaProviderInterface class for the Entity & Field table schemas, which is probably outside the scope of this issue.
On that aspect though :
1) SchemaProviderInterface::getSchema() [no param] means there has to be one SchemaProvider object per table, which doesn't look applicable for dynamic needs like, well, entities & fields. Could that method receive the $table_name as a param to allow the same object to return schemas for different tables ? Then single-table implementations like the ones in this patch can always ditch/overlook the param.
2) Naming fun...
In the "Entity/field SQL storage" space, SqlContentEntityStorageSchema is in charge of assembling entity & field tables schema and creating the tables (currently does so in the various onEntityTypeCreate() / onFieldStorageCreate() / ... methods from EntityTypeListenerInterface & FieldStorageDefinitionListenerInterface)
So it would most likely be the object to also implement the SchemaProviderInterface added here.
But then: SqlContentEntityStorageSchema implements SchemaProviderInterface?
I.e. "a Schema is a SchemaProvider"? That sounds quite recursive :-/
That's a case of two different-but-related systems using the same name for slightly different things. Naming things in the "SQL entity storage schema" issue was quite debated IIRC. In the end, the "Schema" objects in the "SQL entity storage schema" land are more "handlers for some schema logic".
If this issue adds its notion of SchemaProviderInterface (which I think is a perfectly reasonable name), and Entity/field SQL storage has to use it, then maybe it would make sense for SqlContentEntityStorageSchema and its existing interfaces to move to being "SchemaProviders" too, instead of just "Schemas"...
Anyway - item 2) is probably more for discussion with the Entity Storage folks, probably in the followup issue where we move entity/field tables to the on-demand-creation mechanism added here.
Comment #23
chx commented
> Could that method receive the $table_name as a param to allow the same object to return schemas for different tables? Then single-table implementations like the ones in this patch can always ditch/overlook the param.
Very nice idea! I didn't want the schema to return an array indexed by table names because the single-table use is common, but your suggestion makes a lot of sense. So setting to CNW to move {cachetags} in.
Comment #24
plach commented
@yched:
In the early Entity Storage patches, SqlContentEntityStorageSchema was called ContentEntitySchemaHandler or something like that. I guess SqlContentEntityStorageSchemaProvider would definitely make more sense, although "handler" is the right word IMO, as it does more than just return a schema definition. I think we removed the handler suffix for consistency with the other entity handlers, but I've never been a big fan of that choice. Anyway, I guess such a rename would need to happen in a BC way and, given the new policies, I'm not even sure that would be enough to let it be approved.
Comment #25
yched commented
@chx, @plach:
Thinking a bit more about passing $table_name to SchemaProviderInterface::getSchema(), I don't think that would be enough for SqlContentEntityStorageSchema(). It means getSchema($table_name) needs to reverse-engineer the table name to figure out which table it is (the base table, the revision table, the data table, some single-field table...) - that is, do the opposite of TableMappingInterface::getXxxTableName(). That would be some brittle regexp logic - not even sure that's doable since we hash some parts to avoid long table names.
Would be more helpful if SchemaProviderInterface::getSchema() received the "internal code for the table" : 'base_table', 'revision_table', 'data_table', 'field_table for field foobar'...
Meaning, SchemaProviderInterface::getSchema() should receive the table name (good enough for other, non-entity cases) + an arbitrary $context that was passed to $query->executeEnsuringTable($schemaProvider, $context) ?
(the code that does the query knows which table it is talking about, and is able to pass the right $context to describe that table)
Comment #26
andypost commented
@chx++
Having this in core will help history and comment_statistics have their own storage on a per-bundle basis.
The tracker could get some facelift too.
Comment #27
yched commented
Wondering how the patch would work for queries JOINing several tables.
- One single $query->executeEnsuringTable[s?]($schemaProvider), with the same $schemaProvider used for all the tables in the query? (Not really compatible with the per-table $context param suggested in #25.)
- One $query->executeEnsuringTable($schemaProvider, $table_name) per table present in the query?
Comment #28
yched commented
FTR, regarding #25 and #27: I'd hate to block this patch on requests for the (somewhat complex...) Entity/Field case. Having on-demand creation for Entity/Field tables would definitely make tests faster and simpler to write, but moving *all* runtime Entity/Field requests (SqlStorageController + EFQ + Views...) to "beware, the tables might not exist yet, use $query->executeEnsuringTable() correctly" is not likely to be completely trivial, and having on-demand table creation for other systems is totally worthwhile in itself.
Just trying to see if we can shape the initial feature in a way that keeps the door open for entities without too much API changes. I know, that's a fine line to scope creep :-/
Comment #29
chx commented
More and more I tend to think that while adding a table name serves this issue well, anything else should be a followup. Any query doing a JOIN is not fit for this. It's possible the field table case is too complicated to be covered by this simple mechanism. It is useful for so much else, though.
Comment #30
catch commented
This might be an awful idea, but what if we just put this logic directly into execute()?
The try/catch has no real cost.
In the catch case, if there's no schema definition we'd just rethrow the exception; that will have a bit of overhead, but PDO exceptions are rare enough that it shouldn't matter.
Also I'm wondering what happens if the lazy table creation is implemented, but the first query to that table is in a query where it's not the base table. That feels like a race condition but also possible - could we check other tables in the query if the base table is OK, and only rethrow the exception after doing that?
Comment #31
chx commented
Great idea with an optional schema_provider argument on execute() itself. But I'd rather answer your multi-table question the same way I answered yched's: if there is more than one table, simply re-throw. This solution is for simple cases, I would say.
Comment #32
chx commented
By the way, this is why I still love working on Drupal core: I thought this was ready, but since then we got two really great ideas! Please bring them on :)
Comment #33
Crell commented
I think it's important to target the 80-90% use case here, not all use cases. To that end:
1) If you're a use case where you're not working with the base table, I guess this doesn't work for you. That's a case where you'd still need to handle a variant of this logic manually (just as the current use cases are doing now). If even that isn't sufficient, you're back to hook_schema() (i.e., what you need to do already). Fields may be such a use case where following this logic manually is necessary, or at least a task for a second pass.
2) I would prefer to not fold this logic into execute(). That complects executing the query with the (rare) case of lazy schema generation. If you want to mess with lazy schema generation you should know what you're doing. Keep each method single-purpose. (SRP and all that.)
Comment #34
chx commented
Crell's argument for 2) sounds compelling, but I will wait for catch's answer.
Comment #35
catch commented
I'm OK with not handling entities and fields here. I do think we should open a follow-up to explore that more, though, and try to leave that open in this patch as much as possible; pretty much what yched said in #28.
@Crell this is another case where 'complection' doesn't really stand up as a magic/dirty word when looking at the overall context of the change.
Having the separate method only works if every read from the table is from exactly the same system that's handling the write. In the case of entities/fields there's at least three systems that interact with the storage (entity CRUD, entity field query and Views). If we don't use this for fields (which clearly is the most advanced and difficult to handle use case), there are other places where lazy table creation would be useful, that also have Views integration.
The obvious example in core is dblog. The Logger/DbLog class is currently in dblog module, and there's a hook_schema(). However it seems quite likely that we'll want to move the actual logger to core/lib at some point and drop the hook_schema(). This would allow the logger to be used independently of the UI at least, and remove the implicit dependency of the class on hook_schema() and module installation which was my original motivation for opening #1167144: Make cache backends responsible for their own storage.
However dblog module would still provide views integration, services.yml and extra UI bits like displaying individual log messages.
With the logic in execute(), dblog only needs to do the hook_schema() -> SchemaProviderInterface change, which is pure implementation detail and no API change (with the partial exception of the schema no longer showing up in drupal_get_schema() but that's never actually used any more and we're close to dropping it at this point).
With the logic in executeEnsuringTable(), Views would need to be able to distinguish between queries that need executeEnsuringTable() and queries that don't. If the former, it would also need an addition to the hook_views_data() API to allow modules to communicate whether a table is lazy-created or not.
(Side note, but any module can define Views integration for anything, I could see 'cache inspector' and 'queue inspector' modules showing up in contrib).
So while having two separate public methods keeps the concepts separate in the database layer, it will slowly have the effect of bleeding knowledge about lazy table creation - both the general concept and whether particular tables are lazy-created or not, around different subsystems in core, creating considerably more coupling overall.
Given that, this looks like an example of SRP taken too far to me, found this discussion which looks relevant too:
From http://programmers.stackexchange.com/questions/150760/single-responsibil...
Apart from that: I'd expect that by Drupal 9 we move almost entirely away from hook_schema(), and more or less all tables are created via this mechanism, because a major trend since Drupal 5 or so has been to decouple various subsystems from specific database implementations (cache, lock, queue, entities). Most of what we have left in core hook_schema() definitions is things that simply have not been gotten to yet; @andypost gave three good examples. So the use cases for querying a table that's not lazy-created have already become increasingly narrow, and this will tend towards the default (let alone that hand-writing database queries is becoming increasingly rare anyway with Views and EFQ).
Comment #36
yched commented
Interesting points in #35.
Not sure I get how putting the logic in execute() is that different from having a dedicated executeEnsuringTable() method though. Whatever the method, a SchemaProviderInterface object needs to be passed to the query anyway, right ?
So the code that writes the query has to know 1) that the table is lazy-created and 2) what the right SchemaProvider for the table is. execute() doesn't bring much over executeEnsuringTable() regarding "who needs to know what"?
(except executeEnsuringTable() is easier to add in query_alter() ?)
Comment #37
plach commented
Adding to #35: I think doing this for entities/fields would be tricky because atm the events that trigger schema creation/changes are completely separated from actual querying. Hence I'm not sure it would be easy to find a point in the various execution flows where it makes sense to call an explicit executeEnsuringTable() method (and pass a schema provider) and still keep things storage-agnostic. I have no actual concern, just the feeling that this would not play well with the current architecture... (Obviously this is not meant to be a blocker at all :)
Comment #38
catch commented
Thinking through it, that's making me think that having to provide the SchemaProviderInterface object at all is potentially brittle. It's not more brittle than what we do now, but if we apply the pattern to places like dblog then it gets trickier.
The only way to handle that would be to add either something declarative or an event so that when the exception is caught, it then figures out whether the table has a matching SchemaProviderInterface class somewhere - but that adds a fair bit of dependencies.
Comment #39
yched commented
Which is a bit like hook_schema() with an indirection: mapping table names to "the name of a class that can provide the schema" instead of the schema array directly...
Yeah, sounds a bit complicated in terms of instantiating the classes - they are likely to be business objects too, rather than just SchemaProviderInterface objects, and thus have dependencies...
What this patch gives us is a mechanism for on-demand creation of tables, where the impact is that all queries on those tables need to account for "the table might not exist yet" and provide the SchemaProvider for the table.
That is good enough for self-enclosed systems where all queries are generated by using a defined API - which is a general trend anyway since we tend to abstract from hardcoded SQL implementations. --> log, cache, history, comment_statistics...
Then, as @catch points out, there's Views, which virtually allows exposing any SQL table, bypassing the APIs above.
But then could Views simply try / catch its query and consider that "Exception, table does not exist" means "empty result set" ?
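The idea in the question above can be sketched with a hypothetical query runner (PDO/SQLite stand-in, not Views' actual code): a lazily created table that was never written to holds no rows, so "missing table" and "empty result set" are equivalent for display purposes.

```php
<?php
// Hedged sketch of #39's suggestion: treat "table does not exist" as an
// empty result set instead of failing. executeViewQuery() is a
// hypothetical helper; Views' real execution path differs.
function executeViewQuery(PDO $pdo, string $sql): array {
  try {
    return $pdo->query($sql)->fetchAll(PDO::FETCH_ASSOC);
  }
  catch (PDOException $e) {
    // A lazily created table that was never written to has no rows,
    // so report an empty result rather than an error.
    return [];
  }
}

$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
var_dump(executeViewQuery($pdo, 'SELECT * FROM watchdog'));
```

A real implementation would want to swallow only the missing-table error (e.g. via an isTableMissingException()-style check), since this blanket catch also hides genuine query failures.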
Comment #40
chx commented
Had a massive discussion with catch; summary updated accordingly. I believe this works even for the field case, and it definitely works for Views. Now, can we get some PostgreSQL folks to implement isTableMissingException() quickly?
Comment #41
amateescu commented
Here you go :)
Comment #42
plach commented
Why not just return $e->getCode() === '42P01'; ?
Comment #43
amateescu commented
Because I'm dumb? :P
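For context on plach's one-liner: PDO exposes the SQLSTATE via getCode(), and PostgreSQL reports a missing relation as SQLSTATE 42P01 ("undefined_table"). A sketch of the driver-level check (the method name isTableMissingException comes from the patch under discussion; here it is shown as a bare function for illustration):

```php
<?php
// Sketch of the check suggested in #42. PDOException::getCode() yields
// the SQLSTATE string for driver errors; PostgreSQL uses 42P01
// ("undefined_table") for a missing relation.
function isTableMissingException(PDOException $e): bool {
  return $e->getCode() === '42P01';
}

// A generic exception (default code 0) is not a missing-table error;
// the strict string comparison rejects it.
var_dump(isTableMissingException(new PDOException('some other failure'))); // bool(false)
```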
Comment #44
yched commented
@chx @catch: great to hear about the new plan!
What I don't get though is:
Which SchemaProviders do we call exactly? How do we find them?
Comment #45
chx commented
There is no interdiff here; the change is too big, so treat it as a completely new patch. amateescu, thanks for the pgsql patch.
The execute method debate is decided by patching query in the connection class instead.
Edit: re #44 we find them in the backtrace.
Comment #47
chx commented
I didn't use the options carefully prepared in config and caught the wrong exception twice. Also, two minor leftovers removed. Drupal standard now installs in peace. Whether the tests pass, I do not know :)
Comment #49
dawehner commented
- We talked about making that a little bit more explicit: 'create_missing_table'.
- We should explain why returning early is okay: this basically optimizes for SELECT queries, right?
- Let's get rid of cuf.
- What about using the += pattern here as well? ... Oh, and by the way, on_table_exception seems to be an older name for on_table_missing?
- Is there a reason to have this method even though it is not used at all? (Neither storm nor myself could find a use of it.)
- ... old documentation.
- Let's document it ... in general it is confusing for it to be NULL by default; what about an empty string? It should document that most of the time you can pretty much ignore the table name, unless you need to support multiple tables.
- This is a bit confusing: what about documenting that just setting cache entries does not save cache tags; that only happens if someone invalidates them.
Comment #50
chx commented
Comment #52
chx commented
Ah yes, MenuTreeStorageTest asserts that reading creates the table. That no longer happens, so I refactored the test slightly to save instead.
Comment #53
yched commented
Hate to be the downer here, but I can't say I'm in love with the "fetch missing info from wherever we can find it higher in the call stack" thing.
- It feels like uncommon/surprising black magic.
- It feels brittle too: many things can be in the call stack, including objects that happen to be SchemaProviders for unrelated systems. Since it seems a lot of simple systems won't check the table name param, you can end up creating the wrong table?
- De facto, it moves from (previous approach) "only code that knows the right SchemaProvider can query a lazy table" to (current approach) "only the SchemaProvider can query a lazy table". I don't see the gain; it's just more constraining? The previous approach (explicitly pass the SchemaProvider for the query) already had that "only the subsystem can query its tables" aspect, but more flexibly and without black magic.
- If joins are still not supported, we haven't gained in terms of use cases?
Won't fight this if everyone else likes the idea, but I'm not too yay myself :-)
Comment #54
amateescu commented
Edit: Stepped into a different bug.
Comment #55
dawehner commented
@amateescu
Are you sure you don't run into the following bug? #2349441: Regression: 'Front page route not found' Exception when installing Drupal against an existing database
Comment #56
amateescu commented
@dawehner, yup, that was the problem. Manual installation works fine on MySQL but fails on Postgres:
Edit: I can try looking into this a bit later today if no one beats me to it.
Comment #57
chx commented
@yched we can fix that. We were very close to fixing that, in fact: the database passes in the table name already, and the caller checks whether there is a schema returned for that table name. Cache already returns a schema if it is something it is interested in. All we need to do is change the interface to make the table name required and make the implementations check it. In fact I had this coded, and this is why I switched from $bt['class'] to $bt['object']: this way getSchema() can look at $this->table / $this->bin. But I backed out because I saw MenuTreeStorage calling $this->getSchema() and I thought that we could have our cake and eat it too, but no. You are right, this is brittle, so MenuTreeStorage will need to pass in the table name. That's really not a big deal, because the win is gigantic: query strings work without bothering to pass the SchemaProvider in to query/queryRange/queryTemporary.
Comment #58
chx commented
Edit: nevermind.
Comment #59
chx commented
Edit: nevermind. Double post, somehow.
Comment #61
chx commented
@amateescu, is this enough to fix pgsql? No doubt this is necessary, because pgsql overrides the query() function without calling the parent (oopsie); I am just not sure whether it's enough.
Comment #62
amateescu commented
@chx, nope, doesn't seem to be enough. I also tried to debug it for a few hours but didn't get anywhere... I'll try again tomorrow.
Comment #63
amateescu commented
Almost there :)
There are multiple problems with Postgres here:
1) In Drupal\Core\Database\Driver\pgsql\Insert::execute(), we set the return option to Database::RETURN_NULL, which makes the subsequent handling fail because $handled will be NULL on an insert into a cache table. Added a hacky workaround to return an array instead and just use its first value.
2) Since we've removed various pieces of code that ensured tables were created before running INSERT queries, we now have to extend the fixes/workarounds from #2181291: Prevent a query from aborting the entire transaction in pgsql to Drupal\Core\Database\Driver\pgsql\Insert::execute() as well as handle it inside Drupal\Core\Database\Driver\pgsql\Connection::query().
3) I'm now getting an error which appears to be a general problem with inserting serialized PHP data into a bytea column, so I wonder how the installation even works without this patch.
Edit: to answer myself - it works by magic, and I see that cache_discovery gets populated with serialized data just fine :/
To be continued tomorrow...
Comment #64
chx commented
Discussed with @amateescu yesterday after he realized that the PostgreSQL insert relies on the schema. Here's the plan:
Comment #65
amateescu commented
Except that the info from 1. (a $schema array) is not what's necessary for insert :) But that's no biggie, we can just query information_schema again.
Anyway, the plan above allowed me to make some progress. A lot more tables are created now but some are still not, like cache_data, cache_menu, cache_render and menu_tree.
This interdiff is against #61 if anyone wants to play with it during the weekend.
Comment #66
Crell commented
I have to say, trying to crawl up the call stack to get to an object with a given interface strikes me as a "holy crap, why would you do such a thing?" action. That's relying on all sorts of implicit behavior and statefulness that feels like it's going to blow up in our faces, or more likely someone else's face.
To whatever extent a subsystem maintainer can veto something, I do veto this approach.
The original approach seemed far more contained and reasonable to me. The only caveat is Views, which to me seems like a non-issue. If the table isn't there, there's nothing to show anyway so who cares? Let's solve the 80% case, not over-complicate it with language black magic (which has performance concerns, too; generating the call stack is not a fast operation) to try and cover every possible edge case.
Comment #67
chx commented
Supporting query strings is not an edge case, and there you have problems doing this any other way. Performance concerns are likely invalid, since this is a very rare occurrence coupled with one of the slowest possible database operations: creating a table.
Comment #68
chx commented
But I have said everything I had to say multiple times now. I have carried this issue this far; if people don't like it, that's not really a problem of mine. Without any regrets, I am unfollowing this issue. MongoDB now carries a little DB driver anyway to facilitate an easy install, and it has a Schema class that black-holes every call to it anyway. If people want simpler and speedier tests they can continue with either the latest or earlier patches. I am out.
Comment #69
dawehner commented
As a compromise, can we replace the auto-detection of the used class and pass along the schema provider as part of the $options? All of $connection->query(), $connection->select(), $connection->update() etc. provide it.
Comment #70
catch commented
If we did it as part of $options, that gives us a potential route to supporting more than one table/schema provider (or the table not being the base table) later on, as well as static queries. I'm thinking about an eventual situation where Views handles locating the schema provider, if one exists, for each table in a query and adds it to $options.
Views isn't the only issue here; EntityQuery is almost as problematic (except that it's at least linked to the API dealing with the schema, whereas Views isn't at all, so EntityQuery probably has a closer route to discovering that information).
Using the backtrace isn't a performance concern for the reasons chx mentioned (we're already doing a DDL operation if we get to that point). I agree it's weird, but so is an extra execute() method; $options does seem like a decent compromise between the two.
Comment #71
Crell commented
Using the options array makes more sense, I agree. It's not quite what it was originally intended for, but such expansion is why it's an array rather than just a primary/replica boolean. Let's see if we can make that work.
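The compromise from #69-#71 can be sketched as follows. The option name 'schema_provider' and the SchemaProviderInterface name come from the discussion above; everything else (the Connection wrapper, the table_name option, the PDO/SQLite backing) is an illustrative assumption, not the patch's actual code:

```php
<?php
// Hedged sketch of passing the schema provider via $options: the
// connection consults it only when a query fails, and only if the
// caller opted in.
interface SchemaProviderInterface {
  // Returns the DDL for $table_name, or NULL if the table is not ours.
  public function getSchema(string $table_name): ?string;
}

class Connection {
  public function __construct(private PDO $pdo) {}

  public function query(string $sql, array $args = [], array $options = []): PDOStatement {
    try {
      $statement = $this->pdo->prepare($sql);
      $statement->execute($args);
      return $statement;
    }
    catch (PDOException $e) {
      // Only consult a provider when one was passed in $options.
      $provider = $options['schema_provider'] ?? NULL;
      $ddl = $provider?->getSchema($options['table_name'] ?? '');
      if ($ddl === NULL) {
        throw $e;
      }
      $this->pdo->exec($ddl);
      $statement = $this->pdo->prepare($sql);
      $statement->execute($args);
      return $statement;
    }
  }
}

// Usage: a cache-bin-like provider that knows a single table.
$provider = new class implements SchemaProviderInterface {
  public function getSchema(string $table_name): ?string {
    return $table_name === 'cache_render'
      ? 'CREATE TABLE cache_render (cid TEXT PRIMARY KEY, data BLOB)'
      : NULL;
  }
};

$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$connection = new Connection($pdo);
$connection->query(
  'INSERT INTO cache_render (cid, data) VALUES (?, ?)',
  ['route:front', 'serialized-data'],
  ['schema_provider' => $provider, 'table_name' => 'cache_render']
);
echo (int) $connection->query('SELECT COUNT(*) FROM cache_render')->fetchColumn(); // prints 1
```

Queries that pass no provider keep today's behavior (the exception propagates), which is why this reads as a compromise between backtrace magic and a separate executeEnsuringTable() method.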
Comment #72
tstoeckler commented
Spot on. Entity Query already has the storage injected, so it can get the schema quite easily.
Comment #73
dawehner commented
@amateescu
Great work!! Did you manage to make a little bit more progress in the meantime?
Here is a first try, based upon the last patch in #61, to implement the idea mentioned in #69.
After fixing a couple of calls to $connection->select() in MenuTreeStorage, this could actually pass.
Comment #74
amateescu commented
@dawehner, nope. And now I'll wait until the new approach/patch is RTBC before doing any other Postgres fixes.
Comment #75
dawehner commented
Cool that you agree; having a conflict here would be bad.
It does? Afaik this should be part of \Drupal\Core\Entity\Query\Sql\Query, but I'm not really sure whether I see the entity storage at the moment; can you enlighten me?
On the other hand, I'm trying to understand what the point of that is. We could just specify $options['create_missing_table'] = TRUE; which would return empty results in both Views and entity query and be done. There is no real requirement to create the tables for that.
Comment #77
dawehner commented
It's green again.
Updated the issue summary to include the new idea.
dawehner passes the ball to @amateescu.
Comment #78
amateescu commented
Thanks, but, as I said earlier, I'll resume working on this only when it's RTBC or at least fully reviewed and agreed upon by the db maintainers.
Comment #79
yched commented
The IS still contains "climb the backtrace and call the SchemaProvider interfaces..."
Also, it's not clear what the status of selects with JOINs is in the latest approach? (Asking for entity/fields in EFQ/Views.)
Comment #80
dawehner commented
To be honest, I'm personally okay with not supporting every possible use case.
Updated the IS a bit more.
Comment #81
yched commented
Sure, as I said earlier, I absolutely do not intend to block this on supporting every use case. Just trying to have a clear view of what's supported and what's not, because that's the difference between "this can be used by self-enclosed systems like cache or history" and "this can be leveraged by entity storage".
Comment #82
dawehner commented
So @yched, do you agree with the patch as it is? We need an RTBC (which is odd, because that is the wrong state) so that @amateescu picks it up and tries to fix the PGSQL problems.
Comment #83
amateescu commented
Well, let me clarify that a bit. I'm waiting for an 'unofficial' RTBC (as in, not the issue status) from people who want this patch to go in a specific direction or have strong opinions about it, like @catch, @Crell or @yched.
I hope no one gets this the wrong way, it's just that there are a lot of other patches to work on instead of learning Postgres (which I installed and used for the first time just a couple of weeks ago to help @chx and test that the error code from #41 works as expected).
Comment #84
Crell commented
I think I noted this earlier, but may not have been clear about it... we don't need to create the table lazily on delete, since we don't need it to be there for the post condition to be satisfied. However, we DO need to catch a table-not-found case and continue gracefully without letting the exception bubble up. I'd suggest just catching the exception outside the foreach and swallowing it.
I think the same applies to a few other cases below.
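Crell's suggestion — skip lazy creation on delete and just swallow the missing-table exception outside the loop — could be sketched roughly like this. This is an illustrative sketch, not code from the patch; the function name and chunking are made up, and it assumes Drupal's DatabaseExceptionWrapper is what bubbles up.

```php
<?php

use Drupal\Core\Database\Connection;
use Drupal\Core\Database\DatabaseExceptionWrapper;

/**
 * Deletes rows without lazily creating the table (hypothetical sketch).
 *
 * If the table was never created there is nothing to delete, so the
 * post condition is already satisfied and the exception is swallowed.
 */
function delete_expired(Connection $connection, string $table, array $ids): void {
  try {
    foreach (array_chunk($ids, 100) as $chunk) {
      $connection->delete($table)
        ->condition('id', $chunk, 'IN')
        ->execute();
    }
  }
  catch (DatabaseExceptionWrapper $e) {
    // Table does not exist: continue gracefully, per #84. Note the catch
    // sits outside the foreach, so one missing table aborts the whole loop
    // silently instead of retrying per chunk.
  }
}
```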
This doesn't say where the table definition comes from. I assume it's from schema API but that's not at all clear. But isn't the idea to allow the schema to be passed in, not pulled from a static hook_schema definition?
Also, I do not understand the TRUE meaning. If a query fails an exception gets thrown, so the return value is irrelevant. Don't load too much meaning into a single property.
Also, schema_provider is not documented?
$handled should never be null. From the code here I don't see how it should ever be anything but a boolean. The code below says it can also be a statement, but that's still an object. Just don't let handleMissingTable ever return null, period.
$options['create_missing_table'] should never be empty. defaultOptions() should ensure that. If this is a boolean check just do that.
I really don't get the point of create_missing_table at all. If the schema provider defines one table, we're done. If it defines multiple... wouldn't we need all of them? This feels seriously over-complex for what should be a simple task.
Why is this not a method on Connection instead, where it could be overridden per driver if needed? Database should just be the connection management and utility facades; this is not just a utility facade; it's internal behavior of a database driver.
As above.
There's a lot of these lines. Does that mean this code was all that buggy before...?
womp womp.
I agree with the overall direction, but the internals feel very over-complicated at present.
I would see this as simply needing a schema_provider option. In case of table-not-found, install everything the schema_provider gives you and try again. If it still doesn't work or one wasn't provided, just exception as normal.
I can only assume that the double and undocumented keys here are for complex cases like Field API, but I believe the setup here puts the complexity in the wrong place. There should be no parameters on ProviderInterface::getSchema(). It should return all tables to be created. In a more complex case like Field API where the tables might be conditionally named, well, then you don't make the calling object itself a provider. Just instantiate one with appropriate constructor parameters. Something like (mostly pseudo-code):
FieldProvider can then provide all of the necessary tables for that given field (or whatever) based on $field_name (or whatever other setup happens other than getSchema()). That's a totally legit thing to do, and cleaner than optimizing for the degenerate case of Foo being its own provider. (If it can be, that's cool, but we shouldn't count on that always being the case.)
That lets us separate out the schema definition logic to its own object, keep the Foo class or whatever it represents clean, and keeps the handling code inside the DB API shorter and with fewer overloaded properties. The presence of schema_provider itself triggers the "on failure do this and try again" logic, and its absence means... do exactly what happens now.
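Crell's "mostly pseudo-code" did not survive extraction, but the proposal in the surrounding text can be sketched as follows. All names here (SchemaProviderInterface, FieldProvider, the table-name pattern) are hypothetical illustrations of the idea, not the actual patch code.

```php
<?php

// Hypothetical sketch of #84: getSchema() takes no parameters and returns
// ALL tables the provider is responsible for.
interface SchemaProviderInterface {

  /**
   * Returns all table definitions this provider is responsible for.
   *
   * @return array
   *   An array of schema definitions, keyed by table name.
   */
  public function getSchema(): array;

}

// Complex case (Field API-style): the table names depend on runtime state,
// so the provider is configured through its constructor rather than via
// parameters on getSchema().
class FieldProvider implements SchemaProviderInterface {

  public function __construct(protected string $fieldName) {}

  public function getSchema(): array {
    return [
      'field_data_' . $this->fieldName => [/* column definitions */],
      'field_revision_' . $this->fieldName => [/* column definitions */],
    ];
  }

}

// The query then just carries the provider along; its presence triggers the
// "on failure, install everything and try again" logic:
// $options['schema_provider'] = new FieldProvider($field_name);
```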
Comment #85
Crell commented
Comment #86
dawehner commented
Thank you @Crell for your review!
Note: given that we have both create_missing_table and schema_provider, we don't create tables on delete calls.
Worked a bit on the documentation here.
At least in my version of the code, in case you have a different exception ... you will return NULL;
Yeah, you are right here.
Fixed it.
Well, we want to at least support the cache backends ... which have a dynamic table name.
Comment #87
amateescu commented@Crell is right, $handled should never be null. that's a problem that we need to fix anyway for pgsql, see why in #63.
Also:
Wouldn't that be fixed by something like this?
Comment #89
Crell commented
Dynamic table name, sure. Then getSchema() can return a table definition with the right name. If it's a method on the backend itself, it has access to $this->bin. If it's a separate object, pass $this->bin to that object's constructor. Problem solved.
Really, this should be a lot easier than the last patch was making it out to be. :-)
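Applied to the cache backend's dynamic table name from #86/#89, that could be as small as the following. This is a hypothetical sketch: the interface name, the simplified schema array, and the `cache_` prefix are illustrative, not taken from the patch.

```php
<?php

// Sketch of #89: a cache backend acting as its own schema provider for a
// dynamically named table. getSchema() needs no parameters because the
// backend already holds the bin name.
class DatabaseBackend /* implements SchemaProviderInterface */ {

  public function __construct(protected string $bin) {}

  public function getSchema(): array {
    return [
      // The dynamic table name is derived from constructor state.
      'cache_' . $this->bin => [
        'fields' => [
          'cid' => ['type' => 'varchar_ascii', 'length' => 255, 'not null' => TRUE],
          'data' => ['type' => 'blob', 'size' => 'big'],
          'expire' => ['type' => 'int', 'not null' => TRUE, 'default' => 0],
        ],
        'primary key' => ['cid'],
      ],
    ];
  }

}
```

The schema array above is a deliberately trimmed-down stand-in for the real cache bin schema.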
Comment #90
yched commented
Not opposed to that, but just trying to clarify what it would mean when applied to entities/fields.
This means that the first request on an entity type, regardless of the actual tables being requested, creates all the base tables and all the per-field tables for the configurable fields that exist at that point in time.
1) In practice it probably means little actual perf gain for tests, since we create all field tables anyway.
If not for test perf, there's IMO little interest in bothering to move the entity/field storage to the lazy model?
2) Say some new fields are created in the UI after that.
- Their tables are not created immediately, since in that scenario we moved to the "lazy" model.
- They are created by the next first request involving one of those fields, which triggers a SchemaProvider call
- The SchemaProvider returns all the previous tables + the new ones for the fields that were created since the last request that had to call the SchemaProvider.
- This means the "lazy create" supporting code needs to check which tables already exist, so that we don't create them again.
(We also have spent a bit more time re-computing schemas we don't need for tables that already exist - no biggie, since that is a rare runtime occurrence, but that still feels like a clustered, non-cached version of hook_schema())
So, as compared to "SchemaProvider::getSchema() receives the missing table name and only returns the schema for that one":
- some complexity moves from one place to the other, but can't be fully avoided?
- "tables appearing in bursts depending on the sequence of field creations and requests being executed" feels a little more confusing/unpredictable than "a table appears when it is requested".
- it's much less interesting for test perf.
Then again, it's still not clear to me whether the system designed here intends to support missing tables in JOINs. If it doesn't, then entities/fields are out of scope to begin with, and the above is irrelevant.
Comment #91
catch commented
Would an array of schema providers in $options help the fields case? It feels like that might be enough to support multiple tables. Of course entity/field then needs to figure out what that array is supposed to be, but on the exception it could iterate each schema provider to check. That would be fewer tables at a time than every single field on an entity, for the case yched brings up.
Comment #92
Crell commented
Perhaps I'm not understanding Field API's usage patterns properly. Wouldn't it be querying Field-at-a-time, and so only creating the tables for a given field? (Remember, we only need to bother creating tables on write, not on read, since for a SELECT we'd still end up with empty data anyway in the end so no sense mucking with the schema.)
Comment #93
yched commented
@Crell
Not good enough, unfortunately, because a *LEFT* JOIN on a table with no data does not mean the result set is empty.
Imagine the following case:
- You have a View that lists nodes. Let's say the current set of nodes is such that the View actually returns results.
- You add a new field to a node type, and change the view so that it filters on "nodes WHERE that new field is empty".
- With current HEAD, the new field table is created immediately. It is empty, but the Views query LEFT JOINs to it, so the View still returns the same result set as before. That is correct: there *are* nodes, and the new field is "empty" on them.
- With the current proposed approach, the View is suddenly empty. Until someone actually adds one field value in one arbitrary node (doesn't even have to be one of the nodes selected by the View), which creates the table, and the View is back to returning all its previous nodes. That would feel very broken :-)
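The scenario above can be made concrete with a query sketch. The table and field names below are illustrative, not from core; the point is that if `node__field_new` does not exist, the whole SELECT throws, so "treat the missing table as empty" cannot simply return the base-table rows.

```php
<?php

use Drupal\Core\Database\Connection;

// Sketch of the Views-style query from the scenario: list nodes WHERE the
// new field is empty. A LEFT JOIN to an empty table still returns all
// base rows; a LEFT JOIN to a *missing* table throws instead.
function nodes_with_empty_field(Connection $connection) {
  $query = $connection->select('node_field_data', 'n');
  $query->leftJoin('node__field_new', 'f', 'n.nid = f.entity_id');
  $query->fields('n', ['nid', 'title'])
    ->isNull('f.field_new_value');
  return $query->execute();
}
```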
An approach where:
- the SchemaProvider is responsible for a set of tables, and receives the name of the table that needs to be created,
- and the query can specify an array of SchemaProviders (e.g for our case one per entity type involved in the query); when a table is deemed missing, they are called in turn until one of them says "yep, that's mine" and actually returns a schema.
would seem to solve that.
That also means the "create on table not exists" logic would need to account for all the tables that are part of the query, and possibly behave differently based on the type of JOIN ?
[edit: hm, not completely sure that "behave differently depending on the join" part would actually be needed]
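The resolution loop yched describes — hand each provider the missing table name until one claims it — might be sketched as follows. This is hypothetical: the provider objects and their `getSchema(string $table): ?array` signature are assumptions for illustration, not the actual patch.

```php
<?php

use Drupal\Core\Database\Connection;

/**
 * Asks each schema provider in turn to supply the missing table (sketch).
 *
 * @return bool
 *   TRUE if a provider claimed the table and it was created, so the caller
 *   can retry the query; FALSE if the original exception should bubble up.
 */
function ensure_missing_table(Connection $connection, string $table, array $providers): bool {
  foreach ($providers as $provider) {
    $schema = $provider->getSchema($table);
    if ($schema !== NULL) {
      // "Yep, that's mine": create it and let the caller rerun the query.
      $connection->schema()->createTable($table, $schema);
      return TRUE;
    }
  }
  return FALSE;
}
```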
Comment #94
yched commented
Restating this to be extra clear: I'm just trying to outline what (I think) it would take to be able to leverage that lazy-create feature for entities/fields because, in other issues, @catch and @alexpott seemed to expect that it will solve a couple of race conditions on entity/field table creation that currently require weird workarounds in some tests. And also, yes, it's likely that it would make our tests run faster.
I don't intend to block a simpler version from getting in, and I'm perfectly fine with a decision of "yeah, too complex, screw entities/fields, let's just have a nice lazy-creation mechanism that works great for simpler APIs".
Or "let's get a simple version in first, and possibly enrich it to account for more complex uses in a later issue" (which would then imply BC break concerns, of course).
In other words: not my call :-)
Comment #95
kgoel commented
Need to address #84 and #87. Just a re-roll right now, since the patch didn't apply.
Comment #96
daffie commented
For the testbot.
Comment #98
kgoel commented
My bad for not adding the "do not test" suffix, as the patch was going to fail anyway. Working on addressing #84 and #87.
Comment #99
kgoel commented
Probably this is not useful, but posting the patch anyway.
Comment #100
dawehner commented
Let's always send it to the testbot.
Comment #101
yched commented
Given that the discussion has stalled since then, I'd vote for doing this :-/
Comment #102
bzrudi71 commented
Seems that this patch would help us fix some PostgreSQL-related issues while in a transaction. Didn't read the whole story, but +1 for this approach here! Adding the PostgreSQL issue as related.
Comment #103
amateescu commented
@bzrudi71, you might also be interested in comments #62, #63 and #65, where I was trying to make this patch work for PostgreSQL, before the patch started to drift in a different direction.
Comment #104
dawehner commented
Tagging; it's not trivial, btw.
Comment #105
almaudoh commented
Rerolled. Took three steps and an hour. Small changes due to some changes in SQL queries in HEAD. The patch is smaller due to removals related to caching system changes.
Comment #107
almaudoh commented
Not really read the code yet, but shouldn't this be $schema[$table_name]?
Comment #108
almaudoh commented
The main cause of the failures: an inconsistency between the implementations of 'create_missing_table' and 'missing_table_name'.
Comment #109
almaudoh commented
Fixed #107 and #108.
Comment #113
basic commented
This patch is causing testbot issues and not succeeding on the bots. Please review/re-roll the patch before submitting again.
Comment #115
almaudoh commented
I wonder why testbot keeps re-queuing it :(
Comment #116
almaudoh commented
Re-rolled and fixed a stray debug...
Comment #119
jthorson commented
Comment #120
almaudoh commented
Reroll.
Comment #121
almaudoh commented
Comment #125
dawehner commented
The more I think about this particular issue, the more I think it is wrong to do this in this subsystem. The query parts are separated from the schema at the moment, and this patch would break that.
Comment #126
Crell commented
I'm pretty sure this won't happen until 8.1 anyway...
dawehner: Can you clarify? You access the schema through the connection object already anyway.
Comment #127
dawehner commented
Fair, but I still think that it connects two subsystems which ideally would not know about each other.
Comment #128
Crell commented
You mean the DB connection and the schema management? How would those be fully decoupled? The schema needs a connection in order to do anything...
Comment #129
fgm commented
I /think/ the idea is that the schema needs a storage (hence a connection) to implement its operations, but the schema mechanism itself should not need it.
Comment #137
andypost commented
Looks like the issue brings some confusion. In #2664830-72: Add search capability to help topics
it is not clear which schema management to use: hook_schema vs. a definition from a service's method.
Comment #142
catch commented
Comment #143
catch commented
Comment #147
andypost commented
Comment #151
bhanu951 commented
Tried to re-roll the patch from #120 to the 11.x branch.
Comment #152
bhanu951 commented
Comment #153
smustgrave commented
The reroll seems to have test failures.
Comment #155
daffie commented
I have updated the IS and created a CR.
Comment #157
daffie commented
The MR is ready for review.
Comment #158
nicxvan commented
Oh, I've had my eye on this; it will make the hook_schema deprecation much nicer.
Comment #159
andypost commented
Added a suggestion.
Comment #160
daffie commented
@andypost: Thank you for the review.
Changing the class variable to a readonly variable and setting its value is not allowed in PHP. From the PHP documentation:
See: https://www.php.net/manual/en/language.oop5.properties.php
I have changed the variables to be of the string type.
Comment #161
catch commented
Out of interest, why could we not use constants for this case? The classes should be able to override those.
Comment #162
daffie commented
@catch: In some cases it can be a constant; in other cases it cannot, like with Drupal\Core\Menu\MenuTreeStorage, Drupal\Core\Routing\MatcherDumper and Drupal\Core\KeyValueStore\DatabaseStorage. In those cases the table name is set in the class constructor.
Comment #163
mondrake commented
A bit late to the party... but why not use events instead of a trait? #3410480: [META] Use events in Database Schema operations
For example (conceptual, not tested):
from
to
OR
from
to
EDIT: ... or, else, how about a SchemaGenerator service with methods similar to the two events above?
Comment #165
mondrake commented
A PoC for #163 in MR!10397.
Comment #166
mondrake commented
Hm... looking at the MR now, I'm starting to figure out that we could simplify further and let the dynamic query layer handle this:
Comment #167
catch commented
#163-#166 are incredibly concise, so if we can make those work, that is very tempting.
Comment #168
mondrake commented
#166 would have BC concerns (db drivers would have to match the behavior of the onFailureEnsureSchema() method on all classes extending from Query), so maybe for now we can settle on a proxy of that.
Comment #171
mondrake commented
MR!10403 is for review. It includes the framework and some conversions, but not all of them. Once/if the framework is agreed upon, we can easily complete the remaining ones.
An interesting next step would be to pass the schema as a value object instead of an array, but there are other issues for that.
Comment #172
daffie commented
Comment #174
mondrake commented
Thanks for the review, @daffie! Let's continue with the new approach then. Before completing the conversions: add tests for Connection::executeEnsuringSchemaOnFailure().
Comment #175
mondrake commented
MR!10403 is now doing #163, deprecates/converts all uses of ::ensureTableExists() and ::catchException(), and is green on all databases. A review at this stage would be helpful.
#166 is just a piece of cake away with this underlying layer, so I will push that forward.
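For orientation, a call site converted away from the ensureTableExists()/catchException() pattern might look roughly like this. Connection::executeEnsuringSchemaOnFailure() is named in comment #174, but the signature and usage below are assumptions for illustration, not taken from the MR.

```php
<?php

// Before (the old pattern): try the insert, catch the exception, call
// ensureTableExists(), then rerun the insert.
//
// After (sketch): the connection wraps the operation and, on a missing-table
// failure, dispatches a schema request built from the supplied definition,
// then retries the callable.
$result = $this->connection->executeEnsuringSchemaOnFailure(
  fn () => $this->connection->insert('flood')
    ->fields($fields)
    ->execute(),
  ['flood' => $this->schemaDefinition('flood')],
);
```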
Comment #176
mondrake commented
Now Flood\DatabaseBackend implements #166, and if this is OK we need to convert the rest where possible (i.e. calls to dynamic queries extending Query).
Pausing now to let reviewing happen.
Comment #177
mondrake commented
Filed #3492391: Make the event dispatcher available before container full bootstrap as a possible follow-up.
Comment #178
daffie commented
@mondrake: It looks like a beautiful solution. My problem is that the solution will not work with MongoDB. The solution relies on the database throwing an exception when the table does not exist. However, that is not what MongoDB does. MongoDB just creates a table when one does not exist on an insert. A table in MongoDB is called a collection. It will be a table with no validation added to make sure that it only allows data in the right form to be added. See: https://www.mongodb.com/docs/manual/reference/method/db.collection.inser....
The reason I started to work on this issue was to add a hook to allow the database driver to change the table schema. The database driver for MongoDB stores timestamps as MongoDB\BSON\UTCDateTime instead of an integer value, as is done by the core-supported databases.
If we could make it work for MongoDB, that would be great. I will need some time to think about how I can make it work with MongoDB. If you have some ideas or suggestions, then please speak up.
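One conceivable direction for the MongoDB case (the thread later discusses exactly this): the MongoDB driver registers its own higher-priority subscriber for the schema-ensuring event, does its own schema translation, and stops propagation so core never processes the request. The event class name comes from this thread, but its namespace and API here are assumptions; this is a rough, unverified sketch.

```php
<?php

use Symfony\Component\EventDispatcher\EventSubscriberInterface;
// The namespace below is an assumption; only the class name appears in
// the thread.
use Drupal\Core\Database\Event\ExecuteMethodEnsuringSchemaEvent;

class MongodbSchemaRequestSubscriber implements EventSubscriberInterface {

  public static function getSubscribedEvents(): array {
    // Register with a higher priority than core's SchemaRequestSubscriber
    // so this listener runs first.
    return [
      ExecuteMethodEnsuringSchemaEvent::class => ['onSchemaRequest', 100],
    ];
  }

  public function onSchemaRequest(ExecuteMethodEnsuringSchemaEvent $event): void {
    // MongoDB creates collections implicitly on insert, so instead of
    // creating a table, translate the requested schema as needed (e.g.
    // integer timestamps to MongoDB\BSON\UTCDateTime).
    // ...
    // Stop propagation so core's subscriber does not also process it.
    $event->stopPropagation();
  }

}
```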
Comment #179
daffie commented
Wrongly assigned the issue to @acbramley.
Comment #180
mondrake commented
@daffie, well, that's why I suggest using events; that gives us the possibility to extend/override how they are processed. At least conceptually, the MongoDB driver could implement a subscriber/listener for the ExecuteMethodEnsuringSchemaEvent, make it so that it processes the event before core's SchemaRequestSubscriber does, do its alternative processing of the schema request, and stop propagation of the event so that core no longer processes it afterwards. Of course, this needs to be proven in practice.
Comment #181
daffie commented
Comment #182
mondrake commented
Postponed on #3492391: Make the event dispatcher available before container full bootstrap
Comment #184
catch commented
Adding the issue this is postponed on to the issue summary.