We should make our database abstraction layer more robust and ensure that module authors can use it without string manipulations inside the query. Several queries use implode() to get their arguments into the query. This is undesirable as we rely on the module author to check the keys and values of such arrays for exploitation attempts.
I have created the attached patch, which should allow us to stop using implode() for this.
A minor problem is that all inserted values will be treated as strings. This might be a problem with PostgreSQL at least. However, the same strategy is already used in Drupal core without any complaints I know of.
Summary: This patch will allow us to simplify some code in node.module, user.module, taxonomy.module and probably others.
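To make the concern concrete, here is a minimal sketch of the implode() pattern described above. The table name, the column names and the `$edit` array are invented for illustration and are not taken from any Drupal module. No database is touched; the query string is only assembled and printed.

```php
<?php
// Hypothetical form values; the last one carries an injection attempt.
$edit = array(
  'title' => 'Hello',
  'body'  => "x'); DELETE FROM example_table; --",
);

// The fragile pattern: keys and values are glued straight into the SQL with
// implode(), so the quote inside 'body' ends the string literal early.
$unsafe_query = "INSERT INTO example_table (" . implode(', ', array_keys($edit)) . ") "
  . "VALUES ('" . implode("', '", $edit) . "')";

print $unsafe_query . "\n";
```

The printed statement contains the attacker's DELETE as live SQL, which is why every module that builds queries this way has to check its arrays by hand.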
| Comment | File | Size | Author |
|---|---|---|---|
| #36 | db_query-test.php | 2.97 KB | Cvbge |
| #34 | db_query_5.patch | 2.82 KB | killes@www.drop.org |
| #33 | db_query_4.patch | 2.82 KB | killes@www.drop.org |
| #29 | db_query_3.patch | 2.85 KB | killes@www.drop.org |
| #27 | db_query_2.patch | 2.72 KB | killes@www.drop.org |
Comments
Comment #1
killes@www.drop.org commented
It's a patch.
Comment #2
killes@www.drop.org commented
Squeezed out two lines of code after consultation with Karoly. Adds only 10 loc (plus some docs).
Comment #3
chx commented
Do I need to say +1?
Comment #4
killes@www.drop.org commented
After some discussion with Adrian at Drupal Con we found out that we do not know why node_save currently works with pgsql. It currently assumes that all db columns are strings. It seems to work but we should not rely on it.
Here is a patch that checks for the type of field that is inserted.
It needs testing.
Comment #5
drumm commented
+1 for making this into an API. I've seen too many hacked together query builders in Drupal and Contrib. I have not tested.
Comment #6
Bèr Kessels commented
Untested. A big +1 for the feature.
Comment #7
killes@www.drop.org commented
The patch still applies. The new patch here updates node_save to use it. Untested.
Comment #8
killes@www.drop.org commented
The patch still applies. The new patch here updates node_save to use it. Untested.
Comment #9
Cvbge commented
Well, it does not work. Real-life examples of it not working include flexinode (my experience) and forum module (as someone reported). Probably also others.
The bug occurs when a normal user (but with the necessary rights) adds a node and has no controls (the 'moderated', 'sticky', 'published' etc). Then, in node_load(), $node->sticky, $node->moderated (and maybe others) are set to FALSE (or TRUE, but in that case it works).
When doing printf("%s", FALSE), the FALSE is changed to an empty string. The sticky and moderated db fields are numeric, and postgresql does not accept '' (empty string) as a value of integer type.
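The conversion can be reproduced without a database. The snippet below is only an illustration: the table `example_node` and the query template are hypothetical, and the only behaviour it relies on is how sprintf() renders FALSE under %s.

```php
<?php
// A node saved by a user who has no 'sticky' checkbox: the flag is FALSE.
$sticky = FALSE;

// Hypothetical query template that treats every value as a quoted string.
$query = sprintf("UPDATE example_node SET sticky = '%s' WHERE nid = %d", $sticky, 1);

// Prints: UPDATE example_node SET sticky = '' WHERE nid = 1
print $query . "\n";
```

The empty string literal is what reaches the integer column, which is the value PostgreSQL rejects.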
The result is, for example, an error like this:
Comment #10
Cvbge commented
Here's a quick fix for 4.6
Comment #11
killes@www.drop.org commented
The last patch wasn't good.
Comment #12
killes@www.drop.org commented
Here is an additional patch for taxonomy.module.
It removes the two custom functions that are used for inserts and updates.
Needs testing.
@Cvbge: should we have a better test than just is_numeric in _db_argument_type to fix the pgsql problems? We could also try to pass $value by reference and cast it to int. This function is only used for updates/inserts, so we can afford an additional check.
Comment #13
killes@www.drop.org commented
Here's a revised version of the original patch. _db_argument_type now gets the $value argument by reference and we check for bools too. Numerics and bools are cast to int.
Comment #14
Bèr Kessels commented
I cannot say anything about performance, because I do not know how to benchmark this. But I tried it and rewrote some queries in a custom module to use it, and I like it a lot.
The code becomes cleaner and more readable. But above all, I see this idea going in a very interesting and good direction: that of 'more query builders in core'.
a +1!
Comment #15
Cvbge commented
I'll be talking about postgres db only.
The patch won't work as expected in some cases.
Let's say we have a text/char/other character column and we want to insert a text that looks like a number. _db_argument_type() will recognize the value as numeric and will use the `$key = %d` format. Even leaving printf()'s numeric conversion aside, this can lead to an incorrect value being inserted into the db.
Examples of such problematic texts are `01` (leading 0), `2e2` (scientific notation), and probably also text with leading/trailing white space. When such data is inserted into the db without quotes, postgres will think it is a number and will change 010 to 10 and 2e2 to 200. Here is an example:
I think it's safe to use '%s' everywhere. I could not find it written directly in the postgres documentation (although I remember reading it somewhere some time ago). Postgres will convert the string to the correct integer/numeric/bool value.
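The PHP half of this problem can be checked in isolation. The following sketch is not taken from the patch; it only shows that is_numeric() accepts such strings and compares what an unquoted %d and a quoted '%s' placeholder do to them. What the database then does with the result is not covered here.

```php
<?php
// Strings that are meant as text but look numeric to is_numeric().
$candidates = array('010', '2e2');

foreach ($candidates as $text) {
  $as_number = sprintf('%d', $text);    // unquoted integer placeholder
  $as_string = sprintf("'%s'", $text);  // quoted string placeholder
  printf("%-5s is_numeric=%s  %%d gives %s  '%%s' gives %s\n",
    $text, var_export(is_numeric($text), TRUE), $as_number, $as_string);
}
```

For '010' the %d form yields 10, so the leading zero is already gone before the query is sent. The quoted form passes both strings through unchanged.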
Comment #16
killes@www.drop.org commented
Ok, the patch now only distinguishes between integers and the rest. This is needed to decide whether we need quotes.
Comment #17
Cvbge commented
I'm sorry, but I see no reason for this integer check. Can you explain why there is a need to differentiate between integers and other types?
The only problem I see is with converting bool(FALSE). This value is converted to an empty string ('') and when you try to insert it into a numeric, float, bool or date column (probably any column that is not a text type) you get an error.
One solution would be to check for bool(FALSE) and convert it to 0. This would work for numeric-like and bool fields. But it would not work for DATA columns (would produce error). Also, entering '0' into text column might not be what the author wanted.
But this is more a problem with the coder - if you have integer/date etc field, insert integer/date/etc data type, not bool!
Side note: the integer-version, the one without '', will be used only for integers (i.e. ..., -2, -1, 0, 1, 2, ... see http://php.net/manual/en/language.types.integer.php).
Comment #18
Cvbge commented
"DATA" should be "DATE"
Comment #19
killes@www.drop.org commented
Ok, another attempt. There is indeed no real reason to use %d for integers: they work fine as strings and are converted to strings by sprintf regardless of whether %d or %s is used. But it is the Right Thing(tm), so I kept it.
Booleans, however, need the %d formatter in order to be converted to 0 and 1. TRUE to 1 always works, but FALSE to 0 only works with %d.
All tests on php 4.3.x.
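The rule described in this comment can be sketched as follows. This is not the code from the patch: both function names are hypothetical, the expansion only approximates what the thread calls %a, and values are not escaped, which real code would have to do before they reach the query.

```php
<?php
/**
 * Hypothetical helper, not the function from the patch: choose a placeholder
 * for one value. Booleans and integers get %d, everything else a quoted %s.
 */
function example_placeholder($value) {
  return (is_bool($value) || is_int($value)) ? '%d' : "'%s'";
}

/**
 * Hypothetical helper: turn array('col' => value) into "col = placeholder"
 * pairs. Column names are assumed to be trusted.
 */
function example_expand_assignments($fields) {
  $pairs = array();
  foreach ($fields as $column => $value) {
    $pairs[] = $column . ' = ' . example_placeholder($value);
  }
  return implode(', ', $pairs);
}

$fields = array('title' => '010', 'sticky' => FALSE, 'uid' => 3);
$template = 'UPDATE example_node SET ' . example_expand_assignments($fields) . ' WHERE nid = %d';
$query = vsprintf($template, array_merge(array_values($fields), array(7)));

// Prints: UPDATE example_node SET title = '010', sticky = 0, uid = 3 WHERE nid = 7
print $query . "\n";
```

Checking the PHP type rather than using is_numeric() keeps the string '010' quoted and intact, while FALSE becomes 0 instead of an empty string.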
Comment #20
Cvbge commented
The part I was objecting to previously looks ok now. I haven't tested the code though.
I still think it'd be nice to change php NULL to real SQL NULLs, but I don't know how to do it (at least not with the current approach). The best would be if there was a (s)printf flag that would just ignore its argument...
Comment #21
Thomas Ilsche commented
+1, tested
Runs nicely but maybe NULL and floats might need special treatment.
I have attached an INCOMPLETE patch for node_save using the new feature that also fixes the following issue.
Comment #22
killes@www.drop.org commented
Here is an updated patch that takes a lot of the code I added to node_save out again (approx 20 loc). It also fixes a docs glitch in that function. Needs testing and is to be used with the patch from http://drupal.org/node/17656#comment-39222
Comment #23
Cvbge commented
A remark about NULL values:
in this patch they are treated as integers and thus always changed to '0'.
In existing code, '' was INSERTed into text-like columns (%s) if the code didn't check whether the variable was set (or did not care).
If authors now use %a and still do not check for NULLs, '0' will be inserted, which might or might not create problems (it won't matter for if()s, but maybe for other uses?)
Comment #24
killes@www.drop.org commented
Cvbge: AFAIK we don't use NULL values inside the DB in Drupal core.
Comment #25
Cvbge commented
You're probably right, then
1. not checking if a variable is set is a bug (and I've already filed a bug report for one such bug, can't find it - sticky, moderated etc. fields were not set at all when not selected when submitting a post)
2. There's already a lot of DEFAULT NULL in the database schema...
Comment #26
Cvbge commented
Hello, NULLs again.
Previously I wrote that NULLs are treated as integers and passed with %d and converted to 0.
This is of course not true [was I sick when writing it?]. They are treated as strings and converted to '' (empty string).
This will create similar problems as FALSE. Maybe NULL should be treated as %d (and converted to 0) after all?
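The two outcomes for NULL discussed here can be compared directly. This is a small illustration of sprintf() behaviour only; the variable names are made up and nothing here reflects how the patch itself handles NULL.

```php
<?php
$value = NULL;

$as_string  = sprintf("'%s'", $value);  // quoted string placeholder gives ''
$as_integer = sprintf('%d', $value);    // integer placeholder gives 0

var_dump($as_string, $as_integer);
```

Neither form produces an SQL NULL, so the choice is only between an empty string and 0.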
Comment #27
killes@www.drop.org commented
Ok, I've included is_null() in the condition.
Comment #28
Cvbge commented
ok, last thing (really ;)): maybe add some information to the docs saying that null is converted to '0' (not to '')? :)
Comment #29
killes@www.drop.org commented
Kjartan prefers this version. I also added a comment.
Comment #30
killes@www.drop.org commented
After some more discussion, the consensus was that we should not care too much about NULL values, since they are not used in Drupal core, and that we should use the second patch (http://drupal.org/files/issues/db_query_2.patch).
Comment #31
chx commented
I tested said patch today. It applies and it's an absolute must for security purposes. If it had been in, two out of the three flexinode vulnerabilities would not have happened.
Comment #32
chx commented
I meant: if it had been available in the distant past -- it was not a lament "why was this not committed earlier". Also I was totally wrong because db_query array parameter usage would have been enough for flexinode :( .
Note to self: look around for db_escape_string usage and get those SELECTs reworked.
Comment #33
killes@www.drop.org commented
I've changed a fugly strpos to a less fugly substr and added a code comment for Cvbge.
Comment #34
killes@www.drop.org commented
Silly typo in patch.
killes@helios:/home/killes/drupal-cvs$ php -l includes/database.inc
No syntax errors detected in includes/database.inc
:p
Comment #35
Cvbge commented
I'm trying to implement support for %b, which is needed for storing binary data in the database (see http://drupal.org/node/10407#comment-47173).
I have a working implementation, but it's slower than killes' for queries not involving %a. Below are some test results from a Celeron 850 without any load. Only db_query() was benchmarked: total and average of 10000 runs. The difference can be seen mainly in tests 2 and 5.
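A measurement of this kind (total and average over 10000 runs) can be outlined as below. This is not the attached db_query-test.php: the helper names are hypothetical, the function being timed is a stand-in that only formats a query string, and microtime(TRUE) assumes PHP 5 or later.

```php
<?php
/**
 * Hypothetical timing helper: call $callback $runs times and return the
 * total and average wall-clock time in milliseconds.
 */
function example_benchmark($callback, $runs = 10000) {
  $start = microtime(TRUE);
  for ($i = 0; $i < $runs; $i++) {
    call_user_func($callback);
  }
  $total_ms = (microtime(TRUE) - $start) * 1000;
  return array('total_ms' => $total_ms, 'average_ms' => $total_ms / $runs);
}

// Stand-in for the query-parsing step; no database is involved.
function example_parse() {
  return sprintf("SELECT * FROM example_node WHERE nid = %d AND type = '%s'", 5, 'story');
}

$result = example_benchmark('example_parse');
printf("total: %.2f ms, average: %.5f ms\n", $result['total_ms'], $result['average_ms']);
```

Timing only the string handling, as here, keeps query execution time out of the comparison between implementations.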
Comment #36
Cvbge commented
ok, the last tests were not done correctly, this time they are ;)
Attaching script for testing.
Again tests were done using 10000 runs.
Results seem to suggest that the longer the query, the slower my db_query() gets. I should test it more; I'll probably do so later.
But I'd like to note that the query parsing time is a) much smaller than the query execution time and b) very small anyway.
Let's take the highest query parsing time from the test, 0.6ms. Even if there were 1000 such queries that would make total time of 600ms, that is 0.6 second. Such time is almost marginal.
But I'd like to see some real queries, from email I sent to drupal-devel:
Results:
Original db_query(): (%a queries are not meaningful because the original function does not support them)
My version:
Killes version:
Comment #37
Cvbge commented
Fixing version.
Comment #38
Cvbge commented
I've been busy for the last couple of days, but finally got some time and used devel.module on a copy of a real site. I've also enabled all modules and all blocks.
For admin/access: "Executed 40 queries in 182.94 microseconds." for the first time (no cache), and "32 queries in 107.16 microseconds" for the second time (cached) (hmm for uid=1?)
The longest-running query was `INSERT INTO {cache}` - 60ms. Parsing time was 2.3ms.
The fastest one was `SELECT last_comment_timestamp, last_comment_name, comment_count FROM {node_comment_statistics} WHERE nid = %d`, which took 1.6ms. Parsing time was 0.13ms.
`SELECT DISTINCT(uid), MAX(timestamp) AS max_timestamp FROM {sessions} WHERE timestamp >= %d AND uid != 0 GROUP BY uid ORDER BY max_timestamp DESC` took 13ms and parsing time was 0.14ms.
For main page: "Executed 72 queries in 243.69 microseconds."
Again the longest queries were `INSERT INTO {cache}` (53ms), `UPDATE {cache}` (10ms), `SELECT * FROM menu ORDER BY mid ASC` (18ms).
After being cached, the longest queries were `SELECT COUNT(*) FROM node_access WHERE nid = 0 AND CONCAT(realm, gid) IN ('all0') AND grant_view = 1` (6.5ms), `SELECT t.* FROM term_data t, term_node r WHERE r.tid = t.tid AND r.nid = 23 ORDER BY weight, name` (6.75ms), and two flexinode queries, `SELECT flexinode_1.textual_data AS flexinode_1, flexin ...`, that took 7 and 5ms.
I believe this proves that query parsing times need not be taken into account.
Comment #39
Cvbge commented
BTW, "cache support" was disabled, so how come {cache} was used?
Comment #40
Cvbge commented
Well, it's not going to get in, so I'm changing the status to remove it from search results for RTBC issues ;)
Comment #41
moshe weitzman commented
Well, what's the status here? Seems like most think this is a good approach. I agree, but haven't tested yet.
Comment #42
killes@www.drop.org commented
The problem is that Dries doesn't like it...
Comment #43
svenax commented
Sorry for bumping an old issue, but is there a reason that the %a syntax hasn't been included? I think it would be absolutely terrific to be able to construct complex INSERT and UPDATE queries in this way. It sure would make life easier.
Comment #44
killes@www.drop.org commented
I think the main reason is "Dries doesn't like it" :p
I still think it is an excellent idea.
Comment #45
Crell commented
Dries shot down using a separate function, too. Similar functionality is now in helpers.module. See: http://drupal.org/node/53488
Comment #46
Crell commented
The functionality this patch offers has been reimplemented as part of the Database TNG patch: http://drupal.org/node/225450