As discussed during the "database mini-summit" inside Drupalcon (really a lunch), we need to whitelist queries that goes thru db_query(): only simple SELECT queries should use that interface.

As discussed, we should have an "unsafe" mode, in which queries go unchecked. For adventurous module writers.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

Damien Tournoud’s picture

Status: Active » Needs review
FileSize
26.95 KB

This is pretty much the approach that we agreed upon, let's see if hell breaks loose.

Status: Needs review » Needs work

The last submitted patch, 783814-whitelist-queries.patch, failed testing.

Damien Tournoud’s picture

Status: Needs work » Needs review
FileSize
26.95 KB

Fixed a syntax error in pgsql/schema.inc. (oups)

Heine’s picture

Why should we have an unchecked mode? I'm afraid this will be the easy way out for module authors. Just flip the flag to TRUE and you're done updating.

cha0s’s picture

Just a couple of questions... would appreciate some insight.

Why is this critical?
Is 'whitelist' the proper term for what's happening here?
What are the performance implications of adding the regex?
Why is this necessary?

Status: Needs review » Needs work

The last submitted patch, 783814-whitelist-queries.patch, failed testing.

catch’s picture

Priority: Critical » Normal

This looks like:

1. Babysitting broken code.
2. Introducing the equivalent of 'this function is deprecated' BC handling.

Neither are critical. Both are what we have coder module for.

Heine’s picture

@catch, if we don't enfore SELECT only db_queries, we may end up with lots of modules that do not work across the different database drivers.

catch’s picture

Yes, that would be a bug in the module, which we would file bug reports for in the respective issue queue.

I've seen modules use mysql_query before too, but we don't regexp module files for mysql_query every page request and throw exceptions just in case either.

edit: also,

One of the main design goals of DBTNG is to avoid string parsing of queries whenever possible.

http://drupal.org/node/314464#comment-1078529

Damien Tournoud’s picture

Priority: Normal » Critical

This doesn't fall in the "Babysitting broken code" category. If you issue an INSERT query using db_query() it might not fail under MySQL, but might fail under PostgreSQL or SQLite. We are making something that doesn't fail (but should) fail.

This is a critical *task*, not a bug.

Adding a regexp here should not incur any performance hit (but this will have to proven).

Damien Tournoud’s picture

Status: Needs work » Needs review
FileSize
27.41 KB

Installer queries need to be marked as unsafe, too. This should make Drupal install again.

nevets’s picture

Come on, this is over the top. You are trying to enforce your view of the world on other (non-core) modules.

Status: Needs review » Needs work

The last submitted patch, 783814-whitelist-queries.patch, failed testing.

Damien Tournoud’s picture

Status: Needs work » Needs review
FileSize
28.26 KB

I can't believe we still had a direct UPDATE query in core!

catch’s picture

If you issue an INSERT query using db_query() it might not fail under MySQL, but might fail under PostgreSQL or SQLite. We are making something that doesn't fail (but should) fail.

Once again, that's a bug, it can be fixed in a bug report for the module, it doesn't need cruft adding to core to protect against it. Also you can write SELECT queries with db_query() which fail on Postgres and SQL too.

cha0s’s picture

Yeah, how could a regexp for every single db_query() possibly have a performance penalty?

ksenzee’s picture

The fact that we still had a db_query("UPDATE...") in core says a lot. That query should have failed. We might as well not bother with multiple db drivers if we're going to let db_query() cheat. Could we do this with a simple strpos() though? It would be a lot faster.

Damien Tournoud’s picture

@ksenzee: I believe preg_match() will be faster in every case. Regexps are fast and compiled, and generally faster then dumb string functions as soon as you are doing something more then once.

catch’s picture

I spoke to Damien in irc but still firmly disagree with this. I would not object to adding a (temporary, removable in D8) regexp (or simple strpos for common patterns) in hook_requirements() and/or hook_modules_enabled() - that could still throw an exception and prevent the module being used until it's cleaned up.

jpmckinney’s picture

Status: Needs review » Needs work
+        // Whitelist queries. Only simple SELECT queries should be passed to this
+        // function (and the db_query() wrapper). The queries should start with
+        // SELECT and should not contain any quotes.
+        if (empty($options['unsafe'])) {
+          if (!preg_match('/\s*SELECT\s+[^\']+/', $query)) {

For the regex to match the comment, shouldn't the regex have anchors:

          if (!preg_match('/^\s*SELECT\s+[^\']+$/', $query)) {

Also, why are only single quotes evil? Why not double quotes?

Crell’s picture

Priority: Critical » Normal

We discussed this in SF as a way to ensure that module developers that haven't heard us ranting about not putting INSERT/UPDATE/DELETE queries into db_query() for the past two years or don't read the documentation don't end up writing code that is impossible to run on non-MySQL. (Any inserts/updates involving LOB or BLOB fields, and on Oracle anything with an empty string because Oracle can't tell the difference between empty string and NULL.) I am OK with it on condition that there is no measurable performance hit.

I am however going to downgrade from critical, though. Critical means "Drupal doesn't work without this". Wrong. Module developers can be too lazy without realizing it without this. That's not critical.

"unsafe" is the wrong key to be using, though. That reads as "hey Drupal, this query isn't safe." Which is not really what we mean here. "allow_full_query"? "allow_non_select"? "relax" => TRUE? Not sure here.

Heine’s picture

What would prevent anyone encountering a problem from just flipping the flag to TRUE? Can't we make a non API function (eg _install_query) that allows this? (Then again, some people would use that without thinking twice).

nevets’s picture

CMS that make blockades to development end up with people simply working around the blockade. There are other spots that Drupal does not protect developers from themselves, why such a strong stance on forcing people to make their code cross database compatible?

jpmckinney’s picture

I'm against babysitting developers who don't know what they're doing and throwing roadblocks in front of developers who do.

David_Rothstein’s picture

I agree with @catch and others - this check would be really great for Coder module, but not here.

There are a million ways in Drupal to write code that works on some systems but breaks on others. I can use core PHP functions that were only introduced in PHP 5.3, use functions from PHP libraries that not everyone has installed, make direct system() calls that run programs that not everyone has, etc... Why is the database layer special? These are all bugs.

Also, there can be legitimate reasons to use db_query() for these queries sometimes. Maybe I'm writing a simple, compatible UPDATE query (and I'm a lazy developer), or more to the point, maybe I'm writing custom code that will only ever run on a single server and I want to use some MySQL-specific hack. Do we really want to make all those people set the 'yes_drupal_I_really_mean_it' => TRUE flag every time they do this?

I think as an alternative the documentation for db_query() could maybe go more in-depth about when and why it's a bad idea to use it? - right now it just states it pretty matter-of-factly without explaining why.

***

Oh, also, here is a more complete list of places in core that are theoretically using the API incorrectly (although a couple have notes suggesting that the database API doesn't support their query yet - don't know if that's still true):

./modules/comment/comment.install:199:  db_query('UPDATE {comment} SET created = changed');
./modules/system/system.install:1732:    db_query("UPDATE {poll_votes} SET chid = (SELECT chid FROM {poll_choices} c WHERE {poll_votes}.chorder = c.chorder AND {poll_votes}.nid = c.nid)");
./modules/system/system.test:1683:    db_query("UPDATE {users} SET pass = :pass WHERE uid = :uid", array(':pass' => $user1->pass, ':uid' => $user1->uid));
./modules/forum/forum.install:280:  db_query('INSERT INTO {forum_index} (SELECT n.nid, n.title, f.tid, n.sticky, n.created, ncs.last_comment_timestamp, ncs.comment_count FROM {node} n INNER JOIN {forum} f on n.vid = f.vid INNER JOIN {node_comment_statistics} ncs ON n.nid = ncs.nid)');
./modules/simpletest/drupal_web_test_case.php:1232:    db_query('INSERT INTO {registry} SELECT * FROM ' . $this->originalPrefix . 'registry');
./modules/simpletest/drupal_web_test_case.php:1233:    db_query('INSERT INTO {registry_file} SELECT * FROM ' . $this->originalPrefix . 'registry_file');
./modules/book/book.install:22:  db_query("DELETE FROM {menu_links} WHERE module = 'book'");
./includes/database/mysql/database.inc:107:    db_query('DELETE FROM {sequences} WHERE value < :value', array(':value' => $max_id));

I'm not sure any of these actually break anything, but for the sake of setting a good example it sounds like it makes sense to convert as many as possible.

ksenzee’s picture

@nevets, I'm a little curious why you consider this a blockade to development. The new db layer is quite usable - easier than doing db_query() right, if you ask me. This is more like putting a "Road Closed" sign on the smaller, bumpier road, so people look up and notice the freshly paved highway right in front of them.

There are a million ways in Drupal to write code that works on some systems but breaks on others. ... Why is the database layer special?

This is a special case because it's way too easy to do this wrong. To do it right, contrib authors would have to be even more aware of the coding standards than core contributors, apparently, since we currently have a spot in core where it's done wrong. And if either core or contrib uses the DB layer wrong, we're putting huge barriers in front of anyone who wants to use Drupal on anything but MySQL. Those people's needs are important, and we've made a public commitment to them.

In hindsight, it might have been useful to rename db_query() to something else - maybe db_simple_select(), although that's a dorky name - to make it blindingly obvious that you're only supposed to use it for simple select queries. We didn't do that, and so what we have is a deceptive API. This is a simple change that will help correct that.

lostchord’s picture

@All

If you are going to provide a function that is intended to be used exclusively for SELECT queries then call it db_select_query! No confusion about its function then and no arguments about screening queries for SELECT. The overall API then makes more sense... use db_query if you want to break things, use db_select_query for SELECT, etc.

@crell (#21)

... to ensure that module developers that haven't heard us ranting about not putting INSERT/UPDATE/DELETE queries into db_query() for the past two years or don't read the documentation ...

There is nothing in the D6 documentation for db_query that mentions this although there is an informal comment added about 6 months ago that refers to INSERT. If you had amended the D5/D6 documentation 2 years ago to inform developers working with D5 and D6 that there was a non-backward compatible change coming in D7 and advising how they should code to avoid problems in the future then you would have grounds to complain. As it stands I think it is the developers who will be complaining.

@David_Rothstein (#25)

maybe I'm writing custom code that will only ever run on a single server and I want to use some MySQL-specific hack. Do we really want to make all those people set the 'yes_drupal_I_really_mean_it' => TRUE flag every time they do this?

Agree completely. If you force developers trying to use Drupal for purely in-house applications to jump through hoops to support multi-database capablities they have no interest in then they may well go somewhere else.

cheers

jpmckinney’s picture

If you force developers trying to use Drupal for purely in-house applications to jump through hoops to support multi-database capablities they have no interest in then they may well go somewhere else.

Or they will write:

function backdoor_db_query($query, array $args = array(), array $options = array()) {
  if (empty($options['target'])) {
    $options['target'] = 'default';
  }
  $options['unsafe'] = TRUE;

  return Database::getConnection($options['target'])->query($query, $args, $options);
}

So, really, what is the point? Developers who don't want to heed advice won't, all of a sudden, read the documentation if you throw a DatabaseInvalidQueryException; they will just look for the quickest, simplest work-around.

ksenzee’s picture

@jpmckinney: This isn't aimed at developers who are hell-bent on using the APIs wrong. It's an attempt to help people who are trying to do it right. Why wouldn't your theoretical dev just skip the API entirely and use mysql_query()?

jpmckinney’s picture

Valid point about people trying to do it right. Still, I'm not convinced throwing exceptions is the best way to reach these developers.

Crell’s picture

@lostchord: The entire D7 DB layer is a BC break, on every query. Drupal does that in major versions. We have the BC argument every version but we're still in alpha state for D7, so it's too early for the scheduled flame war. Let's wait a few weeks and then we can have it. :-)

@jpmckinney: I am open to suggestions on alternate ways to ensure that developers do things "right". We have an increasing number of BLOB fields in core (all serialized fields should be BLOBs, there's an open issue for that which is in progress), which impacts Postgres. Oracle and SQLServer both need help on both BLOB and LOB, I believe. We need to get people off of db_query() for insert and update queries, and schema queries are just very unstandard to begin with. If you have a better suggestion for how to do that for the 6000+ contrib modules out there, I am open to other ideas.

catch’s picture

@Crell - what's wrong with the hook_requirements() + hook_modules_enabled() (or equivalent) from #19.

Crell’s picture

String parsing source code is icky business and not fool-proof. I could easily see unnacceptable false positives there. Also, that wouldn't allow for "Yes I know I'm not supposed to but I really do have a good reason on this site" cases.

catch’s picture

OK let's not do it at all then, I think this is won't fix, and a coder rule.

Damien Tournoud’s picture

If this is a won't fix, then we need to remove all the non MySQL drivers from core. Relying on *users* of those databases to fix the bugs is nothing more then pure wishful thinking (see: the history of Drupal < 7 on PostgreSQL).

Making sure that your modules are using the API correctly and that you issue portable queries is the job of *every single* module maintainers. It is not the job of the maintainers of the "third-party" drivers, it is not the job of the users of those engines.

Obviously, this simple regexp will not ensure that every SELECT query is portable. But it takes us a long way in making sure that module maintainers are aware of those issues and don't write queries the wrong way unknowingly.

nevets’s picture

Its not Drupals job either to make sure people from database portable code, it's job is to facilitate portable database code.

ksenzee’s picture

Facilitating good code is exactly what this does. If there's an "I know what I'm doing" option, then nobody's getting forced to do anything. Just add the option and you're done.

I would 100% agree that this is a coder module feature, not something for core, *if* we had renamed db_query(). As it is, 99.9% of contrib modules are going to screw this up somewhere. Damien is right: If we don't do something here, we might as well not pretend we're supporting anything but MySQL.

cha0s’s picture

EDIT: I spoke too soon @ksenzee. You're correct that it does at least offer an option as a way out of the exception.

@Damien, I'm not sure if you spoke incorrectly, but your assertion illustrates why this patch doesn't belong in core, IMO.

Making sure that your modules are using the API correctly and that you issue portable queries is the job of *every single* module maintainers [sic].

This should be documented and put in coder.module. That's how we've solved "making sure that module maintainers are aware of those issues and don't write queries the wrong way unknowingly" this far, as I can see.

Crell’s picture

Can we reliably write a coder module check that flags non-SELECTs in db_query() and says "hey dude, stop that!"?

Josh Waihi’s picture

Status: Needs work » Needs review
FileSize
342.31 KB

I'm surprised that no one in this thread has mentioned the simpletest framework, which, BTW tests every patch on d.o. Funny enough, d.o is that same place we host contrib modules. So why not use the test bots to test every recommended release of a contrib module and provide on the project page a box with crosses or ticks next to each database driver indicating driver compliance for that release. This will:

  • Encourage contrib maintainers to write tests for there modules. (Yay!)
  • Encourage contrib maintainers to write compatible code for there modules, or show that there module is pretty poor and anti-compliant. Same on them

The coder module is not a solution as there are many developers who don't use it. The thing about open source is that we cannot control how people use or develop Drupal solutions. But since most people go to the project page to download a module. We can give them guidance on whether to use that module or not. We already do this with usage stats.

So I don't agree with checking for illegal use - we don't do that in parts of the drivers. But I think it would be a good idea to rename the db_query function so that module maintainers do need to re-code there modules which will force them to implement DBTNG correctly, or rename the db_query function if they are lazy.

I propose db_static_query, as it is used for more than just SELECT queries and with good reason, as the drivers don't cover 100% of SQL requirements.

Status: Needs review » Needs work

The last submitted patch, 783814-rename-db_query.patch, failed testing.

catch’s picture

The placeholders change from D6 to D7 ought to ensure that queries don't work without some kind of tweak anyway, unless they have no placeholders at all, so I'm not sure that a change which breaks all existing D7 modules, and breaks D6 modules a little bit more, just for the sake of breaking them, is much help.

I'm also not clear why multi-database type test bots aren't mentioned on here either, I remember that being discussed in irc and the argument, irrc, was that it was going to be too resource-intensive - I don't see how we can get away without that eventually, so I'd rather have the resources being expended in test slaves than in runtime code on every Drupal 7 site.

Crell’s picture

We discussed multi-DB testbots in SF with the reps from MS and Oracle. The basic takeaway was that we wanted to have nightly "run everything and report somewhere outside of the queue" setup that MS and Oracle could run themselves on their infrastructure. Presumably we'd then host a similar setup for SQLite and PostgreSQL on the d.o infrastructure. Issues found that way would not be considered critical or issue blockers, but we would know very quickly if something breaks so we can go fix it.

That was only discussed in the context of core, however. I don't know what the resource needs would be to extend it to contrib. This issue is more about contrib. I'm open to the test-server approach instead, if that can be made to work.

Renaming db_query() is not, I think, necessary at this point, as catch said. That's also a huge API break to happen in alpha5, and not one that I would support. :-)

Damien Tournoud’s picture

As an example of all the messing we are getting ourselves into, see Randy's surprise that double quotes in static queries don't work consistently in #813310: db_query(): Static query with double quotes is not properly parsed.

Randy is far from being the random module author. I stand that we definitely need to automatically fail those queries, or our cross-database support will remain a pipe dream.

Damien Tournoud’s picture

Status: Needs work » Needs review

Rerolled my earlier patch from #14, that I'm pretty much convinced we need. With the following improvements:

  • properly anchor the regexp (thanks jpmckinney#20)
  • add the double quote to the regexp (thanks rfay)
  • rename 'unsafe' to 'relax_checks', hoping to make Crell happy
Damien Tournoud’s picture

The patch.

rfay’s picture

Subscribing... Not completely able to parse this issue.

I certainly think that:

  • If quotes aren't allowed they shouldn't be allowed (single or double)
  • This should be mentioned somewhere in the static queries docs page: http://drupal.org/node/310072
Damien Tournoud’s picture

Wrong patch, sorry.

Damien Tournoud’s picture

Rfay noted that we need to anchor the regexp on the right.

rfay’s picture

I tested my case and this works for me. Consistent, predictable. Needs to be documented still, of course.

Status: Needs review » Needs work

The last submitted patch, 783814-whitelist-queries.patch, failed testing.

Damien Tournoud’s picture

I benchmarked this, and there is (as expected) no significant difference made by that additional check.

I tested the following:

  • full benchmark + 50 static queries per run
  • 100 runs total

The difference is below the noise level.

Damien Tournoud’s picture

Status: Needs work » Needs review
FileSize
75.94 KB

I don't believe how painful this is. At least this one should install, and I hope I haven't missed a broken query.

Status: Needs review » Needs work

The last submitted patch, 783814-whitelist-queries.patch, failed testing.

Damien Tournoud’s picture

Status: Needs work » Needs review
FileSize
83.04 KB

Ok, so I missed some of those. This should bring us closer.

Damien Tournoud’s picture

And this one should fix the "Exception thrown without a stack frame" (triggered from DatabaseConnection_mysql::nextIdDelete() which is registered as a shutdown function).

Status: Needs review » Needs work

The last submitted patch, 783814-whitelist-queries.patch, failed testing.

Damien Tournoud’s picture

Status: Needs work » Needs review
FileSize
86.15 KB

Fixing the remaining issues (temporary tables queries should be relaxed, one missing query in menu tests, and one typo in the action tests).

apaderno’s picture

Status: Needs review » Reviewed & tested by the community
Crell’s picture

Status: Reviewed & tested by the community » Needs review

What's going on in DatabasTasks_pgsql? I don't know Postgres, so what is this patch doing there with the LANGUAGE='sql'?

The exception thrown by the new regex check say a bad query was passed to db_query(). That's technically not true, since we do support calling $connection->query() as a first-class API; that's the better way to handle DB switching now.

Of course, that then immediately raises the question in my mind, would we be better off putting this check in db_query() and allowing $connection->query() as the bypass? That would dramatically reduce the amount of code we have to change within the DB drivers themselves to set the bypass flag all over the place. (Some of them are kinda ugly if we don't have any placeholders.)

catch’s picture

Can we see the overhead added to an individual db_query() rather than a verbal summary of ab runs please? xhprof output showing the relative cost of the preg compared to the rest of db_query() functions, for say 300 queries, along with microtime on 10,000 runs before and after ought to do.

And on the assumption this ends up going in (which I very strongly feel it shouldn't for the same reasons given repeatedly above), I still want to see a @todo to remove this from D8.

Crell’s picture

I would still be open to a coder.module rule instead if we can show it will be sufficiently effective at weening people off of doing things in a non-portable way.

rfay’s picture

To wean people, we need to have clearly stated rules (and hopefully consistent behavior when the rules are violated). I personally think that as far as software reliability over the long term this approach is good. It means that errors are treated as errors. I do understand the (potential) conflict between reliability and performance, but I always go back to reliability as the starting place.

dhthwy’s picture

FileSize
227.05 KB
222.85 KB

jesus, 10k db_query()'s resulted in 350k function calls.

Although I'd tend to agree with catch, adding the preg_match seems to have a miniscule impact on performance in the grand scheme of things. I don't see it hurting given these benchmarks. I hereby release jpegs into the world.

catch’s picture

Status: Needs review » Needs work

@dhthwy - could you post the same thing including the cpu time? I doubt it's that much different but nice to know.

If those numbers are right this in in the region of 1.5% of each call to db_query() which is indeed fairly negligible, although I still wince at adding any overhead at all to the critical path especially when in this case there are non-runtime solutions.

Other issues with the patch. That was only from a cursory glance (I still don't consider the arguments made here for doing this at all to be convincing given alternatives which don't require critical path API changes, and while I might be in a minority, it's not just me, so didn't do a full review yet), however if you want to ignore all that, please don't mark it rtbc without fixing them:

1. Where $relax_checks is used, it should be documented, particularly the nextId() hunks stood out.

2. The patch has to rewrite valid db_query() calls where '' is used as a string literal in the query due to adding those to the regexp (presumably to account for mistakes like ':placeholders'). We should not break valid code using the API correctly just because it might be a false positive for cases when it's not. That's moving from enforcing an existing API to changing it. Those changes should just be reverted, or moved to a separate issue. The separate issue, if one is created, should also correct that fact that those query strings overall are still double quoted, which doesn't match coding standards - in fact if we decided to make such a change for consistency (with or without the regexp) one advantage would be that any normal calls to db_query() would use single quoted strings per coding-standards.

Damien Tournoud’s picture

A first pass at answering concerns here:

@Crell: What's going on in DatabasTasks_pgsql? I don't know Postgres, so what is this patch doing there with the LANGUAGE='sql'?

We are not doing anything here, just adding the relax check.

@Crell: Of course, that then immediately raises the question in my mind, would we be better off putting this check in db_query() and allowing $connection->query() as the bypass? That would dramatically reduce the amount of code we have to change within the DB drivers themselves to set the bypass flag all over the place. (Some of them are kinda ugly if we don't have any placeholders.)

I actually floated the idea in DCSF, and you rejected it on the basis of API consistency. And I agree with the you from DCSF.

@Crell: I would still be open to a coder.module rule instead if we can show it will be sufficiently effective at weening people off of doing things in a non-portable way.

For all those that think that social means (like the coder module, which might, one day, be deployed on the drupal.org test infrastructure) will solve this issue, think again. The support of PostgreSQL on Drupal core and modules has been a joke for years.

This patch aims at encouraging module developpers to do the right thing from the start. Once something is broken, it's generally very hard to unbreak. Of course, this complements other tools, like the Coder module.

Even if I'm maintaining the PostgreSQL and SQLite drivers in core, *I will not make sure that every module on Drupal.org works on PostgreSQL and SQLite*.

@catch: The patch has to rewrite valid db_query() calls where '' is used as a string literal in the query due to adding those to the regexp (presumably to account for mistakes like ':placeholders').

Those queries are using string literals. As a consequence, they are not valid.

The separate issue, if one is created, should also correct that fact that those query strings overall are still double quoted, which doesn't match coding standards - in fact if we decided to make such a change for consistency (with or without the regexp) one advantage would be that any normal calls to db_query() would use single quoted strings per coding-standards.

Indeed. The fact that those queries uses double quotes is a symptom that they need to be fixed. I considered doing that but rulled against it, as it would significantly increase the size of this patch.

dhthwy’s picture

catch’s picture

I don't remember string literals being invalid in db_query(), was unable to find any discussion prior to this issue, or docs either. Double quoted strings, or placing quotes around placeholders yes, but not the examples which are removed in this patch (which is probably why there's so many of them). So if they're invalid we need better documentation than "use the appropriate placeholders to pass string values.".

On this:

Even if I'm maintaining the PostgreSQL and SQLite drivers in core, *I will not make sure that every module on Drupal.org works on PostgreSQL and SQLite*.

We can equally change that sentence to:

I will not make sure that every module on Drupal.org works

And it would be even more reason not to add safety checks to core functions which run on every request.

This is getting repetitive on both sides, but to reiterate, I can't think of a single example where contrib modules being broken has resulted in the addition of a lot of runtime code to core, with the exception of security issues which are their own category (but even when core is extremely helpful the responsibility is on contrib modules to use the API correctly).

@dhthwy: thanks, that looks like 2.2% cpu usage added to db_query(), since db_query() cpu usage itself is only 2-5% of most requests in terms of CPU that confirms a negligible performance difference, so it's really down to whether the change is desirable in itself or not.

rfay’s picture

I have to admit I am still baffled why a string literal would be a compatibility problem, assuming that SQL is the base for static queries at all. But I'm fine with it, as long as it's consistently handled.

Crell’s picture

I suspect the quote issue is due to the way the DB itself treats " vs. ' in query strings. It's not something that the Drupal code cares about either way at the moment.

rfay’s picture

@Crell, not wanting to get sidetracked, but in SQL (I think every variant) single and double quotes are equivalent and are appropriate for string literals. And the failing queries in #813310: db_query(): Static query with double quotes is not properly parsed work fine when issued directly to mysql.

But the question remains: Why are string literals considered "bad" in PDO/DBTNG? Because they can be used by stupid developers to introduce SQL injection? They are not "bad" in the SQL sense - they're a standard and expected part of the language. And we do support static queries.

Crell’s picture

I believe string literals in queries were considered "bad" starting with D6, on the grounds that they made cross-DB support harder. I cannot find a reference to that in the D6 update guide, though. I thought it was there: http://drupal.org/update/modules/5/6

In practice I believe it's only an issue for BLOB and LOB fields, since those require special handling, including secondary queries, on some databases (Postgres and Oracle in particular). varchars and ints should be safe in practice, but as always it's easier to document "just don't do it" than "just don't do it on these types of fields in these cases under these conditions".

rfay’s picture

String literals remain fundamental to D6. You have to use them with string-type placeholders in order to have a valid query: db_query('update {node} set ... where title = "%s"', $newtitle ); So I think this is a D7 thing.

Damien Tournoud’s picture

Single quotes and double quotes have never been the same in SQL (some discussions about this):

That said, there are several benefits in removing all string literals from queries, which is the point of this issue:

  • Without string literals, all queries are guaranteed to contain only identifiers, so third party drivers can quote reserved identifiers (that even Drupal core still uses) using (more) simple regexps
  • Without string literals, drivers can massage the strings, as needed for Oracle to workaround the empty string == NULL bug
  • Without string literals, we don't risk SQL injections anymore, #win

Ensuring that only SELECT queries are passed to db_query() and that all DML queries (INSERT, UPDATE, DELETE, TRUNCATE, MERGE) are using our query builders also has the following additional benefits:

  • Module writers don't write queries they think are valid, but really aren't,
  • BLOB / LOB fields can be properly massaged (see InsertQuery_pgsql, UpdateQuery_pgsql),
  • The database engine can workaround bugs or semantic differences in those queries (see UpdateQuery_sqlite, DeleteQuery_sqlite),
  • The database engine can implement constructs that are not supported by some database engines (see TruncateQuery_sqlite),

We have a nice database abstraction layer, which will make true database portability a reality, assuming module authors uses the API properly. Relying on social pressure to do that will fail, as we fail to support PostgreSQL for years.

rfay’s picture

Thanks, DamZ. I'm +1 on clarifying this and enforcing it in the code, all the way.

catch’s picture

Issue tags: +API change

I'm neutral on removing single quoted literals from a code style perspective. However it's valid ANSI SQL and removing it is a very late API change which judging by the patch will affect a reasonable number of modules. Absolutely protecting against SQL injection might make it worth doing regardless (although we don't have a history of lots of SQL injection issues compare to XSS or access bypass), but adding the API change tag here either way so it gets documented properly if this goes in.

Once again on the social pressure argument, I still don't see what we get here that's not covered by automated testing, which feels like the proper place to be doing these kind of checks rather than in the critical code execution path.

cha0s’s picture

So you're telling me that once people find out "All I have to do is relax_checks = TRUE", that they'll somehow be 'socially pressured' to do the right thing in this case? lol

In other news, hey everybody, just use 'relax_checks' => TRUE,

ksenzee’s picture

cha0s, I think you're misinterpreting Damien's comments. This patch is not intended to be social pressure. It is intended to make people say, "Whoa, I'm doing something wrong, guess I'll check the docs." See my road analogy in #26. What we have *now* is social pressure, and it's basically useless.

tstoeckler’s picture

Version: 7.x-dev » 8.x-dev

It seems from the comments above that the more or less decided on way to go here includes a rather intrusive API change to the DB abstraction layer.
Moving to Drupal 8.
If you feel, this must be fixed in Drupal 7, please move back and elevate to critical.
Or if instead this issue should be fixed in Drupal 7 through documentation, hence without an API change, please move back and remove the API change tag.

jhedstrom’s picture

Version: 8.0.x-dev » 8.1.x-dev
Issue summary: View changes
Issue tags: +Needs issue summary update

This saw no attention in 8.0.x cycle, bumping to 8.1.x.

Crell’s picture

Status: Needs work » Closed (won't fix)

Given that

1) By now, everyone should know how to use DBTNG.

2) In Drupal 8, you should rarely if ever be writing your own queries anyway

3) In Drupal 8, with swappable services even if you do need to write your own queries I endorse an alternate service over carefully crafted multi-DB queries

I'm going to mark this won't-fix.