Once a DB table is created though db_create_table() I would expect that the previously cached DB schema (see drupal_get_schema) would be invalidated.
Instead every time db_create_table() is executed after that the DB schema is already cached the newly create table schema won't be inserted in the cached schema resulting in the fact that subsequents calls to drupal_write_record will fail.
Currently if one needs to insert something into the table after a db_create_table() has to call drupal_get_schema('table', TRUE) before doing any insertion. I think this is pretty illogical as IMHO one would just expect that the newly created table to be in the DB schema.
This easily leads to bugs for example in .install files and update functions.
For example: http://drupal.org/node/716506#comment-2612184
I would suggest adding a call to call drupal_get_schema('table', TRUE) in db_create_table() so that the cached schema will be refreshed..
Another approach would be making drupal_get_schema intelligent enough to understand that it's cached schema is out of date but this probably would have some problems with performance..
Hope this helps,
I'm interested in hearing your comments.
Fabio Varesano
Comment | File | Size | Author |
---|---|---|---|
#22 | drupal.forbid-drupal-write-record.patch | 1.06 KB | sun |
#8 | drupal.update-schema.8.patch | 790 bytes | sun |
Comments
Comment #1
sunsubscribing
Comment #2
Crell CreditAttribution: Crell commentedI am not sure that adding the line to db_create_table() itself is a good idea. For one thing, it's just a procedural wrapper around the OO DB layer, and direct function calls within the DB layer classes is in most cases a bug. For another, tables are generally created en masse, like on install. Clearing and rebuilding the cache after every single alteration to the schema is needless thrashing. Plus, the schema cache is based on hook_schema() anyway, not the actual tables themselves. :-)
Rather, I think we should be making sure that after any operation likely to affect the schema we clear the cache. Eg, after each update hook, or after a series of update hooks gets called. In fact, don't we do that already since the cache is cleared after an update to begin with?
Comment #3
sunSimilar thoughts crossed my mind when reading the other, original issue. My thought was that a module update, which adds a table, will have the new table already defined in hook_schema().
I quickly tried to look up whether we clear the schema cache before executing updates, and probably we're not - I think that's more likely to be the cause here.
Comment #4
Crell CreditAttribution: Crell commentedfax8: I don't quite follow the use case where the current setup would be causing a problem. Can you give an example? I couldn't quite follow how the Mollom issue you linked is related.
Comment #5
fax8 CreditAttribution: fax8 commentedBasically the linked mollom bug was affected by this "inconsistence" because they had a series of serialized drupal variables (mollom_form_*) which they were trying to move to a dedicated table during an update function.
The logical approach they used was to create a new table with db_create_table() then populate the new table with subsequent calls to drupal_write_record() .. unfortunately the db schema isn't modified after a db_create_table() so that this check
failed resulting in loosing the data to be inserted.
So, my point is that each time that a table is created/modified/deleted I would expect that the changes to be available in subsequent calls to drupal_get_schema() .. but, as the schema get cached, this doesn't happen.
Comment #6
Dave ReidMore clearly you can't do this in an upgrade currently:
I think we have a similar issue with #620298: Schema not available in hook_install() but this is for updates, not installs.
Comment #7
Crell CreditAttribution: Crell commentedRight, that's a long-standing issue with Schema API. You cannot rely on the schema being valid during an update hook because it's "in flux" at that point, and the caches won't be cleared until after the update process is done. I am not really sure how to solve that.
Comment #8
sunSomehow, I thought it would be as simply as this.
Thoughts?
Comment #9
Crell CreditAttribution: Crell commentedCan you include a unit test to verify that it works? If it really is that simple that would make a lot of people very happy... :-)
Comment #10
Dave ReidI'm not sure how #8 fixes the situation at hand.
Comment #11
sunLet me try to explain why it should fix this bug - using your example snippet:
I'm not sure how to properly test this though.
Comment #12
nonsiesubscribing
Comment #13
Dave ReidYeah unfortunately any call to drupal_write_record would then cache the schema, and any other update in another module that tries to do the same thing is screwed.
Comment #14
sun@Dave: Sorry, I'm unable to follow you. Could you elaborate a bit more, please?
drupal_write_record() does not cache the schema, it tries to use it. It currently cannot use it, because the schema is outdated.
This patch is essentially doing exactly the same as the fix that we applied to Mollom's module update. Just "once for all" instead.
Comment #15
fax8 CreditAttribution: fax8 commented@Dave not if all the table schema update/create functions care to invalidate the cached schema or modify with the new info the existing cached one.
Comment #16
Crell CreditAttribution: Crell commentedfax8: The schema update routines affect the DATABASE. The schema definitions that drupal_get_schema() uses come from the hook_schema() implementations. Drupal does nothing at present to keep those in sync for you. That is an entirely manual process, and will remain so (unfortunately) for Drupal 7. The database-modification functions have no reason to be messing with the manual declarations one way or another.
Better solution: Just don't use drupal_write_record() in an update hook. Honestly I don't really understand what purpose it serves anymore with the insert, update, and merge query builders.
Comment #17
David_Rothstein CreditAttribution: David_Rothstein commenteddrupal_write_record() is very useful still because it allows you to pass (potentially unknown) objects rather than having to specify every single database column. In theory it's not quite as critical in an update hook - but the fact is that many API functions rely on drupal_write_record(), so if that doesn't work it means you can't use your module's API functions either. That was the issue that made #620298: Schema not available in hook_install() so annoying, and we really should fix this one too.
I think @sun's patch is correct as is, and clearing the cache only that once is all that is needed.... Here is how I understand it:
Anyway, if we can't test this (or even if we can) we should at least try to expand the code comment a bit to be more specific about how this works; it's a very tricky and confusing situation.
Comment #18
Crell CreditAttribution: Crell commentedIf a cache flush right before we start the update process works, I'm good with it. Although if it were that easy, why didn't we ever do it before? :-) It may be worth asking bjaspan for his input here, since he wrote the schema system originally.
Comment #19
bjaspan CreditAttribution: bjaspan commentedSadly, I do not think the simple patch will work.
hook_schema() reflects what the db will look like after all updates have been run. The current problem is that the cached schema structure is not updated until after updates are run, so that means that in the window between a call to db_create_table() and when all updates are done, the schema structure is wrong. Changing Drupal so the schema structure is updated before updates are run just changes the window when the schema structure is wrong to time between before the updates are run to when all the necessary db_create_table() statements are executed. Either way, there is a still a window when the schema structure does not accurately reflect the database. Also consider that it isn't just db_create_table() that can trigger this situation, but any db alteration.
Here's a scenario showing the problem with updating the schema before updates:
1. Module Foo 1.0 and Bar 1.0 are installed
2. Module Bar 1.1 is installed which requires module Foo 1.1, so both are installed at once.
3. The update to Foo 1.1 changes Foo's tables.
4. Prior to updates being run, the schema structure is wiped. Now the schema structure reflects Foo's 1.1 table structure, but the db still has the Foo 1.0 table structure.
5. Bar's updates run first. bar_update_1() calls drupal_write_record('foo', ...).
6. Boom.
Here's another scenario:
1. Module Foo 1.0 is installed.
2. Module Foo 1.1, 1.2, and 1.3 are released with db updates, but a site owner chooses not to install them.
3. The site owner installs Foo 1.4.
4. Before updates are run, the schema structure is wiped. Now the schema structure reflects Foo's 1.4 table structure, but the db still has the Foo 1.0 table structure.
5. foo_update_1_1() calls db alteration functions to bring the db up to the 1.1 schema, then calls drupal_write_record('foo', ...).
6. Boom.
The problem here is that the Schema API is not keeping the cached data structure in sync with the db tables, but db_write_record() assumes that it does.
The only completely correct solution I can think of at the moment is for the Schema API functions update the cached schema structure for the change they are making, so it always reflects the actual state of the database in real time. Yes, this means that for the duration of update functions, the *correct* schema structure is *not* the sum of what is returned by hook_schema(), because hook_schema() assumes all updates have been run. Once all updates are run, the schema cache can be wiped and replaced by the results of hook_schema(), though unless there are bugs there will be no difference between that and the dynamically updated schema structure.
To put it another way, this is all just a slightly more complex case of an API not maintaining consistency due to caching. Callers should never have to know that an API performs internal caching. See http://jaspan.com/dx-files-static-caching-drupal.
Comment #20
sunThanks for reviewing, Barry!
However, you will get the exact same or similar errors if you remove the "flush-schema-cache-prior-running-updates" steps from both of your examples, because the examples are focusing on an entirely different topic: update function dependencies.
The only difference this patch will lead to is that when module B depends on A and the updates ran in proper order, then both module B and A will be able to use API functions that may use drupal_write_record() to insert/update data.
In D7, both scenarios could actually be resolved with hook_update_dependencies(), and perhaps the use-cases for this hook are much broader in contrib than we thought.
Technically, I'd agree that a "perfect" solution would update the updated parts of the schema immediately after performing the update. However, I'm not sure whether that is doable within the current system.
Thoughts?
Comment #21
David_Rothstein CreditAttribution: David_Rothstein commentedRight, Barry's first example sounds like a case of module Bar not properly implementing hook_update_dependencies() when it should have.
But the second example, I don't think is? That one looks like it might be a problem...
Comment #22
sun1. Module Foo 1.0 is installed.
2. Module Foo 1.1, 1.2, and 1.3 are released, all with DB updates, but not installed.
3. Module Foo 1.4 is installed.
4. Updates are run, without flushing schema cache before.
5. foo_update_1_1() alters the database, invokes drupal_write_record().
6. Boom.
No difference. Actually, that's like it is now.
However, I can absolutely see the additional problem of changing schemas across multiple updates + API and helper functions failing due to this. But then again, that's a different issue:
1. Foo 1.0 installed.
2. Foo 1.1 - 1.4 released, schema changes everywhere, but only Foo 1.4 is installed.
3. Prior running updates, flush all caches.
4. Boom.
But yeah, I see now that it's totally error prone to use this function during updates. So, can we prevent this from happening please?
Comment #23
bjaspan CreditAttribution: bjaspan commentedre #22, yes my exact point is that regenerating the schema cache before *or* after all updates are run has the same problem. That's why the patch in #8 won't work.
The patch in #22 correctly reflects the fact that it isn't save to use drupal_write_record() during updates, so it seems like an improvement over the current situation.
Well... it's an improvement in terms of total correctness, but then it also prevents the function from being used in situations where a module authors knows it will work fine, such as an update function for a module that has NEVER changed its tables and just wants to create some new rows. However, the limitations on when doing that is truly safe are very subtle (e.g. if the module has ever updated a table in a prior update, it isn't safe), so it is probably best to outlaw it entirely.
Comment #24
sunExactly... the module author may know that now, but if all of a sudden the schema of the affected tables is changed or needs to be changed, the module author would have to rewrite all of his updates. Hence, better don't allow it upfront.
Just double-checked core .install files and fortunately, we're not using drupal_write_record() anywhere. ;)
Comment #25
Crell CreditAttribution: Crell commentedThere are contrib modules for D7 that are starting to use drupal_write_record() to save entities. Actually I think even core does that. #22 would effectively prevent $entity_save() from ever being possible inside an update hook, which is a non-starter to me.
Comment #26
bjaspan CreditAttribution: bjaspan commentedOptions that I am aware of:
1. Leave things the way they are, remaining vulnerable to subtle bugs under unpredictable circumstances.
2. Fix the Schema API functions so that drupal_get_schema() always tells the truth; the "right" solution.
3. Forbid any operation during update operations that might trigger the subtle and unpredictable bugs.
I point out that #2 is a not an API change, it is a bug fix that makes the APIs work as advertised.
Comment #27
sun@Crell: Are you sure? :)
http://api.drupal.org/api/function/file_save/7
http://api.drupal.org/api/function/locale_date_format_save/7
http://api.drupal.org/api/function/path_save/7
errrrrr, technically, one could say that drupal_write_record() is dead in core already ;)
Comment #28
Crell CreditAttribution: Crell commented@bjaspan: Yes. #2 is also by far the most difficult. :-)
@sun: I don't follow. How is it dead if it's being used in all of those functions?
Comment #29
sunIf we make those 3 functions use db_merge() like everywhere else, then drupal_write_record() is not used throughout core. Hence, those changes along with the patch in #22 could be committed.
Comment #30
Crell CreditAttribution: Crell commentedHm. Using drupal_write_record gets a +1 in my book in general, so I'm OK with that. :-) We'd need to make sure it was well-documented that certain functions are blacklisted in update hooks, and make sure we know what all of them are.
Comment #31
bjaspan CreditAttribution: bjaspan commentedHow difficult would it be for Schema API functions to update the cached schema structure? For create or alter table or field or index, they have the new structure to add/replace in to the schema, and for drop they know which tables or fields or indexes to remove from the schema.
Comment #32
David_Rothstein CreditAttribution: David_Rothstein commentedHm, what am I missing here? When I go to http://api.drupal.org/api/function/drupal_write_record/7 it says "26 functions call drupal_write_record()"....
and a number of those definitely get used in update hooks.
This sounds like it would be tricky because you'd have to make sure that cache never got deleted while there are still outstanding updates to run, right? Because if it did get deleted, you'd never really be able to get it back to where it was supposed to be. For example, if an update function was aborted, you might not come back to run the remaining updates until later, and you'd need the cached data to still be there or otherwise the second scenario from #19 would once again occur.
Here's a thought: What parts of http://drupal.org/project/schema are we willing to put into core in order to solve this problem? Actually, as far as I can tell, some of the schema introspection stuff looks like it's already in the new database API, isn't it?
Because then, you could primarily just do something like this:
Comment #33
Dave ReidI would much rather go for a solution like 32 and always invalidate caching the schema while in update mode.
Comment #34
David_Rothstein CreditAttribution: David_Rothstein commentedHere's another fun gotcha that can occur when trying to use an API function inside an update, even if that API function doesn't use drupal_write_record: #734710: API functions that invoke hooks don't work as expected when a disabled module is being uninstalled
Maybe we should just tell people to stop using API functions altogether :)
Comment #35
Crell CreditAttribution: Crell commentedIdeally, we should make the schema objects entirely able to self-analyze their own tables and be done with it, and then allow an over-ride via the schema hooks that's Drupal specific. The problem is that is a major change to the Schema system, and while I want to do it I'm fairly certain we can't get away with it in alpha2 state. Also, last I checked schema module was MySQL-specific. Remember that we have 3 databases we need to make work.
(This is one of the reasons I've always hated schema-dependent code.)
Comment #36
bjaspan CreditAttribution: bjaspan commentedre #32: David makes a great point for why having the Schema API functions dynamically update the cached schema is harder than I realized.
re #35: schema.module's inspection code supports mysql and postgres, or at least it did when I last touched it. I've given the module to Mike Ryan.
Comment #37
derhasi CreditAttribution: derhasi commentedIt seems drupal_get_schema() is not using drupal_static at the moment. By doing so, we could alter the static cached $schema:
I guess building a wrapper function for this method would be a good idea. So someone could use
What do you think of this approach?
Comment #38
sunIn #922996: Upgrade path is broken since update.php does not enforce empty module list, we want to eliminate the entire access to module APIs and information provided by hooks during the update.php process, since that is the only right thing to do.
As mentioned earlier in this issue, I still think it would be a good idea to get rid of
drupal_write_record()
altogether. It is a pretty lame data-schema binding/conversion attempt, which doesn't take the actual entity properties into account and performs a "dumb" transformation of object properties into values that might be suitable for the storage schema columns.So let me go out on a limb and call this issue won't fix, and provide a new issue for the proposed direction:
#1774104: Replace all trivial drupal_write_record() calls with db_merge()