While I think it's too late for this to see widespread use, I think this is a piece of code that should be highly encouraged to be used. It may be worth a followup patch to actually use this in node.module, where a GREAT benefit could be had.

The benefit? If the schema is modified, NO code changes are necessary to change what data is loaded/saved from that object. The other benefit, really, is that it is guaranteed to be type safe and some of the queries that write data into tables can be quite long if there are a lot of fields; this reduces that code quite a bit since we now have a list of all fields available, and their type.

It also introduces a flag to schema API: "serialize". Simply, it makes the api smart enough to serialize fields prior to writing / after reading.

Comments

yched’s picture

StatusFileSize
new3.87 KB

Added missing '%b' for 'blob' columns, as noted by dmitrig01 on IRC.

CCK could (and will no matter what :-) ) make a great use of that. So will Views, obviously, and I guess a good deal of other contrib modules. So having this in core clearly avoids every contrib coming with their own copy :-)

eaton’s picture

Priority:Normal» Critical

Holy crap.

Seriously, holy *crap*. This is just the sort of thing that SchemaAPI is an enabler for. Right off the bat, no fewer than 6 of my modules can ditch 40-60 lines of CRUD code each. The process of writing the heavy-lifting code for a module just got easier.

The only complained I have is that I'd like to see it named something like drupal_write() or drupal_load(), but i can totally live with the long function name. Who do I talk to to convince this is a bug that needs fixing, rather than a feature?

I just spent some time testing it and it is freakin' awesome.

eaton’s picture

An addendum... the only real issue that I've found is that the result returned is a boolean rather than the primary key of the record that was inserted. That WOULD be pretty cool, although a quick call to db_last_insert_id() would accomplish the same and not force those who don't need it to incur the slowdown.

Seriously, I just whipped up an altered version of a module I'm working on, and it made the CRUD work essentially *nonexiestant*.

merlinofchaos’s picture

Note that db_last_insert_id is already called, so the object is filled in with any new IDs necessary. This is why $object is passed into the write function as a reference. Theoretically db_last_insert_id supports multiple serial types, though I think that's database dependent.

So if you write a node, look at $node->nid to see the new nid.

eaton’s picture

Status:Needs review» Reviewed & tested by the community

Wow. I completely missed that, I was reflexively dsm()ing the return value; that's even cooler.

With that in mind, I'm going to go out on a limb and RTBC this. I've run it through both functions and it's a huge boost. The only question, IMO, is whether we're willing to get it into D6. I hope we do.

webchick’s picture

sub

webchick’s picture

I agree with Eaton that these functions should be called drupal_write and drupal_read or drupal_get/drupal_set or something like that. It is non-obvious that drupal_unpack_something() is the opposite of drupal_write_something().

Beyond that, could one of you guys post some sample code so we can see this in action and test it? I think it might help make the argument.

merlinofchaos’s picture

Note that they are not exactly the opposites of each other, which is why they're named that way.

One of them actually forms a query and writes to the database; the other reads data but the query isn't part of it. The reason for this is that you write objects one at a time (generally) but you can read many from a single query.

That said, I'm ok with something like:

drupal_write_row() or drupal_save_row() and drupal_read_row() or drupal_load_row() or somesuch.

The actual names of the functions can easily be fixed when this patch is applied, so I'm not going to bother with a reroll just to rename them. That's pretty arbitrary and easy to change just by editing the patch.

eaton’s picture

Beyond that, could one of you guys post some sample code so we can see this in action and test it? I think it might help make the argument.

Here's a sample snippet:

<?php
$vocabulary
->name = 'Test vocab';
$vocabulary->description = 'This is a programmatically submitted vocabulary.';
drupal_write_schema_record('vocabulary', $vocabulary);
dsm($vocabulary);

$vocabulary->name = 'Whoops, messed that up';
drupal_write_schema_record('vocabulary', $vocabulary, TRUE);
dsm($vocabulary);
?>

While it's not terribly impressive at first glance, here's what IS cool. First, it discovers the type safety information from schemaapi, constructing the query properly. Second, it automatically populates the object with schemaAPI's default values. Third, it populates the $vocabulary object with the primary key once it's persisted. Fourth, the second call -- the one that has TRUE appended to the arguments -- constructs a properly formed UPDATE query without any additional code.

Finally, the 'under the hood' feature that's added is the ability to flag certain fields as 'serialized'. If they are, these two save and load functions will handle the serialization of complex bits on the object automatically. When i was building voting_actions, handling these kinds of structures cleanly (I was maintaining three tables full of these things) took a LOT of code. This is a huge improvement, and focusing on getting people to use it in their modules whenever possible brings us closer and closer to clean use of SchemaAPI to get to the promised land of clean descriptive metadata driving our structures. ;)

Am I excited? Yes. Yes, I am.

webchick’s picture

oh. my. god.

So basically, this is THE missing link to making installation profiles creatable by mere mortals??!

eaton’s picture

So basically, this is THE missing link to making installation profiles creatable by mere mortals??!

I wouldn't say that it's THE missing link. It still only acts as a substitute for what you would create using db_query() and hand-rolled SQL yourself. The difference is that *for objects whose properties map cleanly to their table structure* -- the majority of our objects, really -- this function can handle rolling the actual insert and update SQL for you based on the .schema file that defines the table structure.

Modules that build complex, convoluted table structures that are tough to keep updated will still require lots of manual labor, but this makes keeping things in sync loads and loads easier as schemas are evolving, etc.

Crell’s picture

This (the write query, anyway) actually sounds a lot like the db_insert()/db_update()/db_delete() functions I tried to get into core last year. Dries rejected them at the time, so they ended up in Ber's ill-maintained helpers module instead. Essentially the same functionality is already being baked into the D7 database API, since PDO is type-agnostic at the PHP level. It should also help with cross-database portability. And in fact checking the schema like this is exactly how we'd handle LOBs for Oracle/DB2 in modification queries.

It's good that so many people see the value in more abstraction. :-)

Gábor Hojtsy’s picture

Status:Reviewed & tested by the community» Needs work

I wonder why are people submitting features still for Drupal 6. We should stabilize the code, not add more stuff, and more stuff and more stuff. However cool they are, after all we'd like to have a release sometime, right?

Anyway, being pair of functions, similar naming would be much better IMHO. Still, I am passing the decision off to Dries, as I am looking for bugfixes for Drupal 6, not features.

merlinofchaos’s picture

Goba: Note that I don't really consider this a new feature; I think this is something that was seriously missing from the schema API and its lack is very much a design flaw, and one that needs to be corrected.

Because schema API hasn't been in very long, and we've only just started to use it and see where it's going, it's dangerous to label this stuff as merely 'new features'. Putting out new APIs without the features that make the new APIs really useful is not effective.

Gábor Hojtsy’s picture

I'd like to actually put something out, so I am aiming for a beta release soon. I have my own list of shame on which is not included in Drupal 6 (I am doing the D6 i18n talk tomorrow here at FrOSCon/Sankt Augustin), without which I *think* users will not value our (huge number of) achievements that much, but I am sure others have their lists too. We can continue on "we should not release Drupal 6 without this design flaw fixed" for ages.

Offtopic rant: I am eagerly waiting for the bug fixes. The RTBC queue is now four patches, and that is it. Most of them are "design flaws". If we are in such a good shape with Drupal 6, then we could just as well throw it on the world in the form of a beta release. (Unfortunately I don't think we are in that a good shape, but we might just as well try and see).

merlinofchaos’s picture

Offtopic rant: I am eagerly waiting for the bug fixes. The RTBC queue is now four patches, and that is it. Most of them are "design flaws". If we are in such a good shape with Drupal 6, then we could just as well throw it on the world in the form of a beta release. (Unfortunately I don't think we are in that a good shape, but we might just as well try and see).

You know what? All of us here are busy people. This 'feature request' came as a direct result of trying to actually get some work done for Drupal. I'm really, I mean seriously, sick and tired of people trying to suggest that their priorities have to be my priorities.

Your lame attempt at applying guilt only pisses me off. How can I be working on this feature request? Hey, I'm only trying to get a very important module to work with Drupal 6, and it really, really bothers me when you insinuate that what I'm spending my time on isn't just as important. I don't understand why this attitude is so prevalent throughout the Drupal hierarchy, but it turns volunteers away. Stop it. Guilt is NOT an appropriate motivating force when you're getting work from volunteers.

Gábor Hojtsy’s picture

Feature freeze changes priorities. I looked up the wikipedia entry for the freeze term now, but it does not seem like I misinterpreted it before. if a feature freeze discourages people from submitting features for the current version, then I don't think it makes for a problem. Your work is important, but I don't think it fits in this stage of the development. Mine is not the final say, I also sent a note to Dries to get him look at this.

(Poor Dries I send all feature requests to him these days. He specifically asked me to be more strict about what goes in, so he is the one who makes the exceptions.)

merlinofchaos’s picture

(Poor Dries I send all feature requests to him these days. He specifically asked me to be more strict about what goes in, so he is the one who makes the exceptions.)

That's fine. Commit it or reject it. It's the offtopic rant I find completely objectionable. I'm far, far more upset about that than I will be if this is rejected.

Dries’s picture

I'll have a closer look at it the next couple of days.

At first glance, the third parameter "$object" of drupal_unpack_schema_record() looks a bit odd. It seems cleaner to merge the objects later on.

The obvious question: can we take advantage of this somewhere in core? If so, it should be part of this patch, IMO.

moshe weitzman’s picture

Status:Needs work» Reviewed & tested by the community

set to RTBC so the committers can decide. i don't know of any work that this needs.

these new functions cannot possibly break anything. and they can remove a lot of code from contrib and even reduce security errors. i think they merit inclusion in D6.

@merlinofchaos - just to present the other side: bug fixing and testing a release is work that almost noone wants to do. we have to use guilt and many other techniques to get anyone to pay attention to it. without asking people all the time and even guilting them (and horse trading and blackmailing and ...) a release would never get ready. lets discuss more elsewhere.

merlinofchaos’s picture

Moshe: Guilt someone else. Using guilt on me makes me stop working, not work harder. Using guilt reminds me that I'm spread really thin; that I have HUGE projects that are important (you, in fact, reminded me how important last release) and that since Drupal core has an army of people working on it, and Views has...uh, me, and a few people who are helping out, it's really clear where my time is best spent. My time is, currently, not best spent on core.

Dries: The 3rd parameter is important to views, which is not using objects of stdClass(). If we don't send an object through, this function will create an object of stdClass, copy everything from the object it loads the data from and then Views will have to copy all of those keys again; that's a loop that we could easily avoid. Especially when you figure that a good sized view might be loading 10 or 15 different records pretty easily. Yes, we can get rid of that third parameter, but I would rather not.

I think we could put this to good use in node_save() and node_load() pretty quickly. It will save a bit of code there.

Dries’s picture

Status:Reviewed & tested by the community» Needs work

If we can put it to good use in core, we need to include that in this patch please. Thanks. :)

Gábor Hojtsy’s picture

I set it to "needs work" originally, because as people said, consistency in the naming would be better. I don't think so general names like drupal_save() and drupal_load() work, but it would be better to bring some consistency here.

profix898’s picture

(subscribing)

pwolanin’s picture

subscribe - a killer feature that we all want, but I'm worried about timing.

yched’s picture

database.inc now has _db_type_placeholder(), which could probably be used here ?

chx’s picture

Version:6.x-dev» 7.x-dev
Status:Needs work» Needs review
webchick’s picture

Status:Needs review» Needs work

It's still CNW because core needs to make use of it somewhere (see #22).

moshe weitzman’s picture

StatusFileSize
new8.68 KB

I went ahead and converted node_save() to use this code. It makes good sense for writes, and less so for reads. Code is more readable now IMO.

The $node-log handling isn't right yet. I will get to that in next 24 hours.

moshe weitzman’s picture

StatusFileSize
new8.93 KB

Here is a working node_save(). As you can see, it got much cleaner.

I changed earl's write function a bit. When you were updating a record, it populated missing f$object fields with defaults. Thats not what you want for updates - you want existing values to be left alone. So thats what it now does.

My next patch will include a conversion of node_load(). I was wrong before - thats extremely desirable. Stand by for the patch and some more explanation.

eaton’s picture

After discussing this with moshe, I think it's a very valuable addition. Obviously, any addition this late is controversial. i'll attempt to explain why it is still desirable, and the core committers can decide whether it's compelling. :-)

1) Most important, it DOES NOT break any APIs or change any existing APIs. It is NOT a required change, just an additional set of two helper functions that will simplify life for developers by leveraging D6's schema api. Why is that important?

2) Right now, SchemaAPI is just being used as a replacement for hook_install. This new feature makes the benefits of converting to SchemaAPI much more compelling: it eliminates the need for hand-rolled update/insert/select functions of almost all simple table structures. That's pretty huge.

3) Reading and writing to simple tables can be done by populating an object and calling a function, rather than writing discrete CRUD functions. And keeping them in sync while developing -- a major annoyance for developers working on evolving contrib modules -- is as simple as updating your .schema file -- your CRUD code is 'updated' for free.

4) Eventually, modules that use this method will be able to take advantage of SchemaAPI's advanced features -- hook_schema_alter(), for example, would allow modules to tag efficient 'extra columns' onto existing tables. No queries would need to change for the records to load and save properly.

4) To reiterate, it breaks no existing code and no one is required to use it. It just makes things cleaner, and eliminates the need to redundantly specify table/column data every time we touch the DB. That's happy.

chx’s picture

Status:Needs work» Needs review

Holy Batman, that's a lot of code cleanup! I asked Dries to visit this issue and decide whether it's six or seven.

moshe weitzman’s picture

StatusFileSize
new10.79 KB

OK, here is a patch for review. This one has node_load() consulting Schema for the fields that it should retrieve. Earl's read function was not useful for this task and in fact not useful for anywhere in core that I could see so I'm being conservative and just removed it from the patch.

Thanks to Eaton for the nice summary. I'll add one more big benefit. High volume sites can now choose to hang custom fields right off of the node table instead of performing a JOIN to a custom table. They just need to add a small hook_schema_alter(). They get load and save of their custom field for free. So I'd consider this patch now a "scalability improvement" in addition to the advantages already listed. Drupal grey beards will recognize this as a glorious return of nodeapi('fields') functionality which we lost during the node revisions patch years ago.

eaton’s picture

OK, here is a patch for review. This one has node_load() consulting Schema for the fields that it should retrieve. Earl's read function was not useful for this task and in fact not useful for anywhere in core that I could see so I'm being conservative and just removed it from the patch.

Earl's function was useful because it handled unserializing serialized fields automatically. That's definitely a useful function. drupal_read_record() would be a good corollary addition to this, even if it only accepted primary keys and returned all fields, just because of that.

I'll have to look closer at the new functions to see how they differ from Earl's, but the direction this is going is very, very useful.

Wim Leers’s picture

Subscribing.

yched’s picture

Latest patch redifines drupal_schema_get_placeholder() ?
There's already a _db_type_placeholder() in database.inc, which seems more thorough.

While I'm still very much in favor of this, it should be noted that calling drupal_write_record() (or functions that rely on it, which could be trickier to detect) in update functions will cause problems, because the function relies on the latest state of the schema, while updates are precisely executed in a context where the current state of the schema is *not* the one returned by drupal_get_schema.

merlinofchaos’s picture

I believe drupal_get_schema is supposed to give the current state of the schema, and its state should be reflected by the alter_schema hook when something external changes it.

Gábor Hojtsy’s picture

That is why you can't use it in update functions.

merlinofchaos’s picture

Goba, from my reading, you just said that you can't use it during an update function because it will be accurate.

eaton’s picture

Well, the purpose of update functions is generally to read data in from an OLD version of the schema, make changes, then write them to a new version of the schema (or something along those lines). As such, using functions that reflect EITHER the old or the new state is unlikely to be perfect. Update functions will always (IMO at least) be a place where custom SQL is called for and 'canned' functions are tricky.

merlinofchaos’s picture

Oh. Terminology glitch:

Because the function has $update as a parameter, that's the update I read.

moshe weitzman@drupal.org’s picture

Version:7.x-dev» 6.x-dev

ahem. this issue in 'code needs review', so please review it.

@yched - i say your comments about similarity between the db placeholder function. our two versions are slightly different and i haven't yet figured out why. earl wrote the one belonging to this patch. help wanted.

merlinofchaos’s picture

They are slightly different because it has special handling for the serial type, where it puts in a NULL to ensure that a new id is generated on insert.

bjaspan’s picture

Subscribing. I just learned about this issue but I've been working on this idea for a while (and I've written about it elsewhere though it did not get nearly this much attention).

moshe weitzman’s picture

StatusFileSize
new10.29 KB

i removed that near duplicate placeholders function and am using the one from database.inc now. working fine for node inserts and updates.

yched’s picture

Those lines towards the end of the patch looks strange :

drupal_write_record('node', $node, 'nid');
(...)
drupal_write_record('node_revisions', $node, 'vid');

The 3rd argument for drupal_write_record is supposed to be (bool) $update, right ?

in function drupal_schema_fields_sql(), I guess the $columns array should be initialized.

Also, it could certainly be debated, but I think having this function return an array of prefixed fields (instead of a comma delimited list) would make it more flexible, and make this part more elegant :

+  $fields = drupal_schema_fields_sql('node', 'n');
+  $fields .= ', '. drupal_schema_fields_sql('node_revisions', 'r');
+  $fields .= ', u.name, u.data';

Aside from those 'patch reading' remarks, it's currently uneasy for me to actually run and test, sorry :-)

moshe weitzman’s picture

I am now returning an array from that sql fields function. I also initialized $columns as suggested even though the prior code did not generate a notice and is valid PHP.

$update will be FALSE in the case of INSERT and the name of the primary key field in the case of an update.

moshe weitzman’s picture

StatusFileSize
new10.46 KB

oops - now the file.

yched’s picture

$update will be FALSE in the case of INSERT and the name of the primary key field in the case of an update.
Ah right, got it :-)

bjaspan’s picture

I've have now had a chance to read through this issue and quickly read through the code. I'm really pleased to see Schema API being used for exactly the kind of thing it was intended. Good job!

Comments:

#14: Earl, leaving this functionality out was not an oversight or a design flaw but a decision to keep the initial Schema API simple.

My original implementation of the API in schema.module supported not just per-table record loads but per-object loads based on join information encoded in the schema. This allows, for example, loading a node, all its terms, comments, votes, etc., all in one place, in node_load(), so that almost every module's nodeapi('load') becomes obsolete. Additionally, all one-to-one and a single one-to-many node-related tables can be loaded in a single SELECT query, substantially reducing the number of queries necessary to load a node. See the original Schema API issue (e.g. http://drupal.org/node/136171#comment-229414). The code has bit-rotted since I last touched it in April (and the method for encoding join info is wrong) but it won't be hard to bring back.

We decided to leave join info out of the D6 version of Schema API, so the per-object load/save function went with it. I did not occur to be that a single-table per-record load function would be useful/popular. I'm pleased to see that it is and that you wrote it. :-)

#48 code comment: Tables can have primary keys involving more than one column. drupal_write_record()'s $update parameter assumes the pkey is a single field. We can restrict drupal_write_record() to work with single-column pkey tables for simplicity but should explicitly acknowledge and document that fact. However, it also does not seem like it would be much more complicated to allow $update to be an array.

I hope to take a more careful look at this code before it gets committed. More later.

moshe weitzman’s picture

StatusFileSize
new10.86 KB

I have enhanced drupal_write_record() so that $update can be an array of fields. This makes the function useful to many more tables. I also added more docs for that function.

nedjo’s picture

This is the small, elegant piece that makes all that Schema work come alive.

Although this is an API addition, it's completely optional--no one will have to change their code if they don't wish to. It is indeed late and excluding it on that basis would be a perfectly valid decision, but the risk of putting it in seems relatively low.

For D7, http://drupal.org/node/140860, Consistent table names to facilitate CRUD operations, will make these methods much more useful, as will mapping foreign key relationships in Schema API and thus enabling us to load nested arrays (resulting in e.g. $node['#user'] = array('uid' => 3, 'name' => '...', ...);). But this small first step already shows its value in the much cleaner node loading code.

I suppose in the patch we should be adding the 'serialize' flag as needed in the schema definitions, e.g., data column in the users table.

moshe weitzman’s picture

Status:Needs review» Reviewed & tested by the community

@nedjo - thanks. i did not add serialize flag to users schema because we aren't touching user load and user save, the logical places where it would be used. we'll get that for the next release.

this was rtbc before, and we've now had several positive reviews. i humbly promote to rtbc again. justification for including the patch in D6 is #31.

chx’s picture

I am so totally supporting this to get into Drupal 6, look at those node functions, isn't this the most Drupalish thing ever, so much less code but more extensible?

bjaspan’s picture

Status:Reviewed & tested by the community» Needs review

Static code review, just looking at the patch for now:

- In drupal_schema_fields_sql(), no need to initialize $columns if $prefix is not set.

For drupal_write_record():

- Tables with single-column pkeys are going to be common. Perhaps $update should support being a string or array; if (is_string($update)) { $update = array($update); }.

- For an update, *every* unset $object->$field is initialized to its default value or the empty string. For unset columns with no default value, the empty string is not necessarily type-correct. We should probably just leave the field unset, not UPDATE it, and return the object with it still unset.

- The code is written so that every UPDATE statement will always SET every column in the table even when not all of them are provided in the input object (the other columns are set to default or '', see above). Is that what we want? Wouldn't it be better simply not to SET columns whose field is unset in the provided object?

For node_save():

- The old code node_load()s the existing node and initializes every field in $node from it if not provided in the new node to save; later, it updates every field in the table. Looking at the patch, it seems that the new code doesn't do that, so any field not specified in the input argument will be set to its default value or the empty string (see above). Am I missing something here?

I'm not looking at the full patched code, just the patch, and do not have time at the moment to test this (shortly off to the airport) but I'm concerned enough that I'm marking it CNR so I can review it more carefully later, unless someone points out that I'm a bonehead. :-)

eaton’s picture

- The code is written so that every UPDATE statement will always SET every column in the table even when not all of them are provided in the input object (the other columns are set to default or '', see above). Is that what we want? Wouldn't it be better simply not to SET columns whose field is unset in the provided object?

This is a big one -- its use as an update function would be hampered if unspecified values always got flushed back to defaults.

moshe weitzman’s picture

I fixed the issues that Barry found. It was just a mistake I made when allowing for $update to be an array. We certainly don't wan't to overwrite existing data to defaults during an UPDATE.

Please folks - test this. You just need to:
- create a new node. then go to its edit form. Assure that everyhting was saved.
- change some values and submit the edit. view the node and assure that your edits were saved.

You have now tested node load and node save which is all that we change in this patch. The interfaces for each remain the same; this is just an internal change.

moshe weitzman’s picture

StatusFileSize
new10.91 KB

oops - now the patch.

flk’s picture

I haven't reviewed the code, but with a clean head both load and save work perfectly fine.

moshe weitzman’s picture

Status:Needs review» Reviewed & tested by the community

we now have a positive re-test and all known code issues are addressed. rtbc again.

nedjo’s picture

Status:Reviewed & tested by the community» Needs review

Thanks for the new version Moshe.

Few of us have done more than give a general thumbs up to this patch. Barry (our resident Schema API guru) gave it a going over and identified some issues. Barry, you indicated you might "review it more carefully later". Are you satisfied the issues were successfully addressed? Any other points? Is this RTBC?

i did not add serialize flag to users schema because we aren't touching user load and user save

I assume that, in principle, it should be possible to use our new methods with any object (or, for that matter, any table mapped by our schema), even if there are other methods available. Since we're introducing this feature, we should add it as needed in the schema. The fields I find needing it are:

* actions.parameters
* comments.users
* system.info
* users.data
* watchdog.variables

A difficult case is profile_values.values, which is sometimes serialized and sometimes not (see _profile_field_serialize()).

moshe weitzman’s picture

Status:Needs review» Reviewed & tested by the community

lets resist the urge to add scope. adding a new property to the schema api requires discussion. that can happen in a separate patch. if modules want to they can add the serialize flag with schema alter. At this point, we really need some indication from the committers about the worthiness of this patch for D6 (see justification in #31).

im sure barry will chime in when he has time and interest. meanwhile, the patch has been tested and code reviewed.

eaton’s picture

I'd agree that serialization is a valuable feature that has a LOT of potential for use in core. However, justifying it in this patch would require converting more of core to use this function -- and that would take more work and more time. It's a trade-off, and considering how late we are in the cycle I lean towards Moshe's response.

If it's that critical, let's corner Dries and Gabor in Barcelona and see if we can make a case for the serialization AFTER this essential underlying patch goes in. ;)

nedjo’s picture

Status:Reviewed & tested by the community» Needs work

It makes good sense to postpone the serialization question, but in that case we should remove the $info['serialize'] handling from the patch. What doesn't fit with our general approach is to code in support for a possible future addition to the Schema API, particularly one that has an obvious missing implementation in core.

I'd still be happier seeing a last signoff from Barry.

eaton’s picture

Status:Needs work» Reviewed & tested by the community

Serialization handling is already in the patch, is simple, and is a useful piece of metadata useful in many of the contrib modules that would be using the functionality. The use of these functions in core is proof-of-usability. Their use in core is the real payoff. If a core committer asks for its removal, that's perfectly fine and it can be taken out. For the moment, though, the patch as it stands is RTBC and constitutes a small but significant boost for SchemaAPI in the coming release.

merlinofchaos’s picture

If you remove 'serialize' I will CNW this and strongly object to it being committed. Every feature I included in this is one I need. If you remove features I need, then this feature becomes less useful to me and everyone else.

bjaspan’s picture

Status:Reviewed & tested by the community» Needs work

I am still concerned about this:

<?php
+    if (!isset($object->$field)  && !count($update)) {
+     
$object->$field = isset($info['default']) ? $info['default'] : '';
+    }
?>

This used to run on UPDATE but that was a simple test-backwards bug which Moshe fixed. Now, it says: "When inserting a record, for EVERY unset field (from the schema) in the object, set it to that field's default *or the empty string if there is no default*." Problems: (1) The empty string is not a type-compatible valid default value for every field type (e.g. what does '' mean as a timestamp?). (2) It is not semantically valid to provide any default for a schema field that does not specify one itself. If someone calls db_write_record and provides no value for a field with no default that is NOT NULL, the INSERT query should fail for that reason.

I think all that needs to happen is changing that code to:

<?php
+    if (!isset($object->$field)  && !count($update) && isset($info['default'])) {
+     
$object->$field = isset($info['default']);
+    }
?>

The rest of the function is written to handle passing the correct set of fields and values in the INSERT and UPDATE cases.

Note that this also means db_write_reocrd() will no longer fill in the passed object reference with the empty string for non-provided fields with no default value. I'd call that a feature.

I'd like to see a test case that passes an incomplete object to this function and verifies that the right thing happens for provided fields, missing fields with defaults, and missing fields that are nullable. I wish we had a core test suite to add it to, but we don't. I'll try to write the test later today, though.

bjaspan’s picture

Sorry, that code block should be:

<?php
+    if (!isset($object->$field)  && !count($update) && isset($info['default'])) {
+     
$object->$field = $info['default'];
+    }
?>
moshe weitzman’s picture

Status:Needs work» Reviewed & tested by the community
StatusFileSize
new11.13 KB

Barry is right again. Here is a patch incorporating his change. I also had to move the tracking of $serials outside of an if() condition since that condition was now false for key fields like nid, vid, ...

Gábor Hojtsy’s picture

Status:Reviewed & tested by the community» Needs review

Let's get the new modifications tested.

moshe weitzman’s picture

i volunteer to write a simplest unit test for drupal_write_record if this gets in. it can likely be part of our current node creation test.

eaton’s picture

Status:Needs review» Reviewed & tested by the community

Just created and edited nodes, emptied fields then updated, etc. All seems to be working well.

bjaspan’s picture

Status:Reviewed & tested by the community» Needs work

This function is proving to be more subtle than we expected, I think. I feel bad that I keep causing new versions to be rolled but so far I think my concerns have all been valid.

What happens if you do:

<?php
$node
->nid = 10;
// fill in other $node fields here
drupal_write_record('node', $node);
?>

$node->nid is the pkey and is type serial, but I'm providing a value. This is NOT an update because I did not pass the $update arg; ostensibly I'm trying to create node 10 specifically. In a previous version of the patch, the value I passed was ignored, NULL was used instead, and the newly generated value was retrieved with db_last_insert_id(). In the current version, the value I pass is used and I have no idea at all what db_last_insert_id() will return. However, I'm pretty sure that mysql and pgsql will both accept an explicit value for the 'serial' column. They will reject it only if the column is part of a unique key and the value already exists. But then what happens the next time an auto-incr value is required for that field? I have no idea, but I suspect the answer is different for mysql and pgsql. Yuck (I am liking type 'serial' less and less).

I think the answer is "don't do that"---we outlaw providing values for type serial fields. We can do it by ignoring passed-in values for serial fields or we can just specify that behavior is unspecified if you pass in a value for a serial field. I vote for the former, which means another patch.

To summarize: I think that

<?php
+    if ($info['type'] == 'serial') {
+     
$serials[] = $field;
+    }
?>

needs to be

<?php
+      if ($info['type'] == 'serial') {
+       
$object->$field = 'NULL';
+       
$serials[] = $field;
+      }
?>

like it used to be. This does not constitute undoing any of the previous changes I or anyone else has suggested; we are not see-sawing.

I am happy to roll the next version of this patch (there are some other minor improvements I'd like to make) but am too tired now to do it well; soonest I can do it is tomorrow evening.

Dries’s picture

1. I can see how this function is super-useful. At the same time, it worries me that by committing this now, we're only scratching the surface, and that we'll fail to rewrite more parts of core to take advantage of this.

2. It would be great if we could further extend the code comments based on the various problems we spotted and fixed. A lot of important knowledge risks getting lost in this issue.

3. build correlations -- can we clarify that in the code?

4. populate them in the after the query. sounds incorrect.

5. It looks like _db_type_placeholder() should become db_type_placeholder(). It's no longer a private function.

6. Can we write $conditions instead of $cond?

7. We're loading more data than we used to load and what once was one query are now two queries -- this needs to be performance tested IMO.

nedjo’s picture

I looked at rewriting more of core using these functions and came across the following issues.

1. drupal_schema_fields_sql() doesn't seem to be frequently useful. Generally we select only certain fields in a query rather than all available ones, so the ability to load an array of all field names in a table doesn't have frequent application that I can see. In the cases we do want all fields, we don't usually need to know the field names as we can select them as *.

(Probably I'm missing something obvious, but in the node module example that was converted, I'm not sure looking at the original code why we list all the fields in the first place. Why not simply select n.* and r.*?)

2. drupal_write_record() assumes an object as an argument. We do save some items that are handled as objects, but much more often we're dealing with arrays. To make consistent use of this, we would need to convert many functions to use objects.

moshe weitzman’s picture

@nedjo - SELECT * is slower than listing fields because the DBMS has to lookup in its internal tables what fields are defined. I don't know how much slower, but Souvent22 submitted several patches to remove SELECT * and they were discussed and committed. I would like objects to start loading fields based on schema so that developers can move fields from table to table to reduce joins.

We have an ugly 50/50 mix IMO between arrays and objects. Lets just use object here since we are principally talking about node and user and term (IMO). Thats where dev needs flexibility. In some other issue, lets standardize drupal on arrays (or make use of OO instead of stdClass).

Thanks for working on additional parts of core.

nedjo’s picture

Thanks Moshe for the clarifications, that makes more sense.

We could consider something like:

<?php
// Convert to an object if needed.
if (is_array($object) {
 
$object = (object) $object;
 
$array = TRUE;
}
else {
 
$array = FALSE;
}

....
// If we began with an array, convert back.
if ($array) {
 
$object = (array) $object;
}
?>

to make drupal_write_record() flexible enough to handle either objects or arrays.

It would be nice to get this function to return our SAVED_NEW and SAVED_UPDATED constants instead of TRUE on success.

The $update argument is not really update-specific but rather provides the primary key data that we don't yet have in the schema. These data are indeed temporarily needed, but when we've added them to the schema instead we'll still need a way to distinguish between update and insert requests. Possibly we should pass in an $action argument with values e.g. TRANSACTION_INSERT and TRANSACTION_UPDATE and call the current $update argument $primary_keys, to avoid having to introduce a new argument when we remove the need for passing primary keys.

In special cases - when we have a single serial primary key field - it may be possible to programmatically determine the operation type, e.g.,

<?php
$action
= isset($object->$primary_key) ? <code>TRANSACTION_UPDATE</code> : <code>TRANSACTION_INSERT</code>;
?>

but this isn't always the case, e.g., when we're inserting into tables with multi-field primary keys.

moshe weitzman’s picture

please, i beg us to stop engineering this. these are all good ideas but we have no time, nor taste for additional risk. comments that don't begin with "i installed and reviewed this patch" are not particularly helpful. we will do more with this, but not now.

- casting to an object can be done by the caller with no additional lines.
- we don't need to return those constants because the caller is currently specifying to an update or an insert.
- you can conceivably have $update fields that aren't primary keys. lets leave this for D7

dries posted a todo list - thats what needs doing. please please.

nedjo’s picture

StatusFileSize
new907 bytes

Almost all our core objects I looked at (users, blocks, system items, comments, taxonomy terms, vocabularies, etc.) are saved and updated in array format, so there doesn't appear to be much in core that can benefit from the patch in its current form. The two object types I find handled as objects for purposes of saving are files and nodes.

Given the Form API's use of arrays, I suspect most data transactions in contrib also use arrays rather than objects.

Attached is an untested conversion of files.inc.

We could also convert the insert and update transactions in hook_nodeapi() implementations, as they use objects.

hswong3i’s picture

subscribe :)

bjaspan’s picture

I hope to look at this on the plane to Barcelona tonight, unless I succeed in sleeping. :-)

moshe weitzman’s picture

Status:Needs work» Needs review
StatusFileSize
new19.89 KB

I have reread the thread and incorporated everyone's feedback. Here is a new version which:
- expands drupal_write_record() usage. we now use it for terms, vocabs, files, and nodes. i can expand the usage further - let me know if thats desired.
- ignores serial values during INSERT, as proposed by Barry
- all of Dries' suggestions in #74, including expanded docs.
- drupal_write_record() now accepts $object as array or object and suggested by Nedjo. This is pretty much required since so many queries work with form array data.
- we return SAVED NEW and SAVED UPDATED as suggested by Nedjo.
- I had to fix a bug and a notice in taxonomy.admin.inc that prevented free tag vocabs from listing their terms.

Benchmarks are coming later today.

TESTERS
- you should create and edit a node, a vocabulary, a term, and a file upload. assure that all data was written as expected. you can tell that just by editing the object again after a submission.

moshe weitzman’s picture

I disabled all drupal and mysql cache and then ran apache bench (ab -c1 -n200) against a drupal home page (10 visible nodes). As you can see, there is no performance degradation from this patch. Requests per second dropped from 17.33 to 17.32 which is insignificant. This tests the node_load part of the patch. It is harder to benchmark the write parts of this patch like node_save() but those were already individual queries into node and node_revision (for exampel) so from a DB perspective nothing has changed there. I thin we can safely say that database performance is not affected by this patch.

** PATCHED **
Server Software:        Apache/2.0.59
Server Hostname:        mm
Server Port:            80

Document Path:          /pr/index.php
Document Length:        4910 bytes

Concurrency Level:      1
Time taken for tests:   11.547601 seconds
Complete requests:      200
Failed requests:        0
Write errors:           0
Total transferred:      1085000 bytes
HTML transferred:       982000 bytes
Requests per second:    17.32 [#/sec] (mean)
Time per request:       57.738 [ms] (mean)
Time per request:       57.738 [ms] (mean, across all concurrent requests)
Transfer rate:          91.71 [Kbytes/sec] received
** UNPATCHED **
Server Software:        Apache/2.0.59
Server Hostname:        mm
Server Port:            80

Document Path:          /pr/index.php
Document Length:        4910 bytes

Concurrency Level:      1
Time taken for tests:   11.541449 seconds
Complete requests:      200
Failed requests:        0
Write errors:           0
Total transferred:      1085000 bytes
HTML transferred:       982000 bytes
Requests per second:    17.33 [#/sec] (mean)
Time per request:       57.707 [ms] (mean)
Time per request:       57.707 [ms] (mean, across all concurrent requests)
Transfer rate:          91.76 [Kbytes/sec] received
moshe weitzman’s picture

I disabled all drupal and mysql cache and then ran apache bench (ab -c1 -n200) against a drupal home page (10 visible nodes). As you can see, there is no performance degradation from this patch. Requests per second dropped from 17.33 to 17.32 which is insignificant. This tests the node_load part of the patch. It is harder to benchmark the write parts of this patch like node_save() but those were already individual queries into node and node_revision (for exampel) so from a DB perspective nothing has changed there. I thin we can safely say that database performance is not affected by this patch.

** PATCHED **
Server Software:        Apache/2.0.59
Server Hostname:        mm
Server Port:            80

Document Path:          /pr/index.php
Document Length:        4910 bytes

Concurrency Level:      1
Time taken for tests:   11.547601 seconds
Complete requests:      200
Failed requests:        0
Write errors:           0
Total transferred:      1085000 bytes
HTML transferred:       982000 bytes
Requests per second:    17.32 [#/sec] (mean)
Time per request:       57.738 [ms] (mean)
Time per request:       57.738 [ms] (mean, across all concurrent requests)
Transfer rate:          91.71 [Kbytes/sec] received
** UNPATCHED **
Server Software:        Apache/2.0.59
Server Hostname:        mm
Server Port:            80

Document Path:          /pr/index.php
Document Length:        4910 bytes

Concurrency Level:      1
Time taken for tests:   11.541449 seconds
Complete requests:      200
Failed requests:        0
Write errors:           0
Total transferred:      1085000 bytes
HTML transferred:       982000 bytes
Requests per second:    17.33 [#/sec] (mean)
Time per request:       57.707 [ms] (mean)
Time per request:       57.707 [ms] (mean, across all concurrent requests)
Transfer rate:          91.76 [Kbytes/sec] received
eaton’s picture

StatusFileSize
new19.93 KB

All operations appear to be working nicely, though the patch didn't apply correctly first-time-around. I re-rolled it against the latest HEAD and things seem to be going smoothly.

I did get a series of notices from taxonomy module when previewing a node, but everything saved correctly. I'm looking closer to see if that's a problem introduce by this patch.

eaton’s picture

Status:Needs review» Reviewed & tested by the community

Just confirmed that the notice-spam when previewing comes from the current code in taxonomy module, NOT this patch. Should be fixed in a separate issue. Based on the testing and the previous progress, I'm RTBC'ing.

pwolanin’s picture

Status:Reviewed & tested by the community» Needs work

there is a function, db_placeholders(), in this just-committed patch for HEAD: http://drupal.org/node/167284 that uses _db_type_placeholder($type);

Can you please re-roll changing the name to db_type_placeholder($type); there too? (sorry not set up for patch rolling ATM).

nedjo’s picture

Thanks Moshe for your fine work here. The patch is now much more broadly useful.

I've reviewed the code again (but not tested). There's only one small detail I'm unsure of: the special handling for log entries in node updates.

<?php
+    // When updating a node, avoid overwriting an existing log entry when we don't save a new revision.
+    unset($node->log);
?>

This previously was:

<?php
-  if (!empty($node->log) || $node->is_new || (isset($node->revision) && $node->revision)) {
-   
// Only store the log message if there's something to store; this prevents
-    // existing log messages from being unintentionally overwritten by a blank
-    // message. A new revision will have an empty log message (or $node->log).
-    $revisions_table_values['log'] = !empty($node->log) ? $node->log : '';
-   
$revisions_table_types['log'] = "'%s'";
-  }
?>

It looks to me like we'll get changed functionality.

We have special handling for logs because, unlike regular fields, they want to be new for each revision. When we present a node editing form, the log field is always blank. rather than containing the previous value.

Currently the patch unsets the log on updates where we're not creating a new revision. This presumably will prevent users from adding a log entry or changing an existing entry when editing a node (revision).

So likely to keep our existing functionality this should be:

<?php
   
// When updating a node, avoid overwriting an existing log entry with a blank one when we
    // don't save a new revision.
   
if (empty($node->log)) {
      unset
$node->log;
    }
?>

Other than that, looks great.

eaton’s picture

Status:Needs work» Needs review
StatusFileSize
new20.3 KB

The attached patch changes the reference to the placeholder function, as requested in #87. Any thoughts from others about the log message handling?

moshe weitzman’s picture

Status:Needs review» Reviewed & tested by the community

thanks eaton. yours is now the authoritative patch here. i've tested it and it still rocks :).

"This presumably will prevent users from adding a log entry or changing an existing entry when editing a node (revision)." yes it does, and i think this matches current behavior. how do you think current behavior differs? my code is simply more elegant :)

nedjo’s picture

@moshe

I agree your code is more elegant, but I believe it's a change in functionality.

<?php
$revisions_table_values
['log'] = !empty($node->log) ? $node->log : '';
?>

The existing implementation will never discard a non-empty log entry.

<?php
unset($node->log);
?>

Yours discards it whether it's empty or not. So, on updates not saved as revisions where there is an entry made, the entry won't be saved. That's the difference.

To test this:

* Before patching, update a node. Give it a log entry but don't save as a new revision. Expected result: the new log entry will go into the node revisions table for the current revision.

* After patching, repeat the same. Expected result: the new log entry will not be saved, so the current revision will retain whatever log entry it had before (empty or not).

Hence my suggested extra test for empty.

webchick’s picture

Status:Reviewed & tested by the community» Needs work

I can confirm that the old behaviour nedjo describes is the desired behaviour. A patch that changes this needs to be fixed.

See http://drupal.org/node/87474#comment-156504 for the obscure bug this line of code is preventing against.

moshe weitzman@drupal.org’s picture

Status:Needs work» Needs review
StatusFileSize
new20.38 KB

I've updated that empty($node->log) check as suggested by nedjo and webchick. thats needed to match existing behavior.

webchick’s picture

One last bug I found... you get the following notices when reverting a revision (these aren't there with the patch reverted):

* notice: Undefined property: stdClass::$revision_timestamp in /Users/webchick/Sites/head/modules/node/node.pages.inc on line 544.
* notice: Undefined property: stdClass::$revision_timestamp in /Users/webchick/Sites/head/modules/node/node.pages.inc on line 551.

Probably an easy fix, but I'm on my way out the door.

moshe weitzman@drupal.org’s picture

StatusFileSize
new20.38 KB

good catch, webchick. patch updated.

Dries’s picture

This looks incorrect:

+  // rename timestamp field for clarity.
+  $fields = str_replace('r.timestamp', 'r.attimestamp AS revision_timestamp', $fields);
moshe weitzman@drupal.org’s picture

Assigned:Unassigned» moshe weitzman@drupal.org
StatusFileSize
new20.38 KB

@Dries - yes, that was a last second type. Fixed now.

eaton’s picture

Status:Needs review» Reviewed & tested by the community

Just took this through its paces again, testing nodes, node edits, revisions, taxonomy terms, and so on. Everything appears to be working fine (including timestamps on individual revisions). RTBC.

Zothos’s picture

Hmm, just looked through the code.

<?php
// Build the SQL.
$query = '';
+  if (!
count($update)) {
?>

the count seems wrong. Shouldnt it be isset or something?
Althought it works as you did it.

moshe weitzman’s picture

@zothos. i believe that count is proper there. that variable is always set so isset() is always true.

Zothos’s picture

you are right im sry. The only thing which is also possible here is is_null(), but then we have to be sure the var is set. I should read the docs before posting... XD

KarenS’s picture

Subscribing. Yched added this to the D6 version of CCK in case it doesn't make it into core, so I need to know if it does get into core so I can remove it from CCK.

Obviously, I'd rather see it in core :)

Dries’s picture

Status:Reviewed & tested by the community» Fixed

Tested and committed to CVS HEAD. Thanks for all the hard work, glad this made it in. Now onto fixing critical bugs please ...

KarenS’s picture

CCK just got another haircut. Thanks everyone!

coupet’s picture

subscribing

hswong3i’s picture

Category:task» feature
Priority:Critical» Normal
Status:Fixed» Active

a minor request: should we also have something like drupal_drop_record(), so we will able to delete record based upon schema? it should just similar as drupal_write_record() syntax. i guess it should be a very simple implementation, which can help our developers and complete our implementation :)

on the other hand, should we move the building of INSERT/UPDATE query (or also DELETE, if we will have drupal_drop_record()) into database abstraction layer (something similar as http://drupal.org/node/53488 but more simple), which handle by wrapper function (e.g. db_query_insert(), db_query_update() and db_query_delete())? there is a lot of benefits as listed in http://buytaert.net/scaling-with-mysql-replication#comment-2206.

moreover, with provide enough information, Oracle's LOBs handling (http://drupal.org/node/147947) can work as transparency: no query phaser and rebuild are required, no additional db_update_blob() nor db_update_clob() are required. maybe it can't help about MySQL database replication directly (http://drupal.org/node/147160#comment-315160), but it should also be a good starting point for this purpose.

these suggestion may be a bit late within D6 development cycle, but it shouldn't be too difficult for implement, and able to enrich our API into a better status :)

Regards,
Edison

Gábor Hojtsy’s picture

Version:6.x-dev» 7.x-dev

Features go to Drupal 7.

moshe weitzman’s picture

Status:Active» Fixed

please open new feature requests or expand upon othr ones. this issue is done.

Gábor Hojtsy’s picture

Version:7.x-dev» 6.x-dev
Anonymous’s picture

Status:Fixed» Closed (fixed)
dww’s picture

Category:feature» bug
Status:Closed (fixed)» Active

This change was never documented at Converting 5.x modules to 6.x. Furthermore, it introduced a somewhat obscure hook_nodeapi('update') API change that I describe here: http://drupal.org/node/114774#comment-1087468 (which also isn't documented, other than that comment).

moshe weitzman’s picture

Status:Active» Closed (fixed)

That nodeapi(update) change could have been documented. Thanks for adding the comment. As for the introduction of drupal_write_record(), one could argue that this is simply new functionality and thus not required on the update page. I think this can go back to closed but feel free to reopen.

dww’s picture

Status:Closed (fixed)» Active

Many of the things listed on the update page are pointing out new additions to the API that module maintainers might want to make use of while upgrading, not required things they must change about their module for it to work at all. For example:

http://drupal.org/node/114774#drupal-alter
http://drupal.org/node/114774#module_load_include
http://drupal.org/node/114774#db-column-exists (one I added to the API, and had this exact discussion about when it landed)
http://drupal.org/node/114774#user-mail-tokens (another one I added to the API)
http://drupal.org/node/114774#placeholder_helper
http://drupal.org/node/114774#drupal_match_path

This page is the *only* place that we document the human-readable, high-signal diff of the API between versions. Unless you think reading through the entire contents of api.drupal.org or the CVS commit log history is a good place to find out what changed (optional or required) from one version to the next, those of us making changes to the core API better keep those pages as accurate and complete as possible. Otherwise, incredibly useful features like this one (that cost lots of blood, sweat and tears on the even of the D6 release) will remain virtually unknown to the bulk of the people writing D6 code.

I'm one of the more active core contributors (hell, I'm even listed in MAINTAINERS.txt) and the only reason I know about this issue is because I found it while doing CVS archeology to figure out why $node was no longer fully loaded in hook_nodeapi()... :(

Wim Leers’s picture

Status:Active» Closed (fixed)

Since we're going from D6->D7 now, I think we can close this issue …