Convert db_placeholders() to DBTNG [#314464]

Comment	File	Size	Author
#95	database.inc-314464-95.patch	1.4 KB	markus_petrux
#94	database.inc-314464-79.patch	998 bytes	markus_petrux
#89	placeholders.patch	12.41 KB	catch
#79	database.inc-314464-79.patch	998 bytes	markus_petrux
#61	db_314464.patch	4.19 KB	drewish
#60	placeholder_array.patch	3.67 KB	Crell
#57	placeholder_array.patch	2.97 KB	Crell
#44	placeholder-benchmarks.txt	2.14 KB	Dave Reid
#38	placeholder-db_placeholders.benchmark.txt	1.3 KB	Dave Reid
#38	placeholder-at-is_array.benchmark.txt	1.32 KB	Dave Reid
#38	placeholer-is_array.benchmark.txt	1.32 KB	Dave Reid
#38	placeholder-at.benchmark.txt	1.31 KB	Dave Reid
#34	placeholder-is_array.patch	4.12 KB	Crell
#34	placeholder-at.patch	4.12 KB	Crell
#34	placeholder-at-is_array.patch	4.14 KB	Crell

Comment #1

Damien Tournoud CreditAttribution: Damien Tournoud commented 28 September 2008 at 10:55

This probably should be a dynamic query.

Comment #2

Crell CreditAttribution: Crell commented 28 September 2008 at 17:58

db_placeholders() is the correct way to handle D6 IN statements. That's not been upgraded to the new API yet, though, and frankly I'm not entirely certain how we want to do so. For now, a dynamic select I know works properly and is very tight and readable. :-)

Comment #3

Crell CreditAttribution: Crell commented 6 October 2008 at 02:23

Title:	Problem with static SELECTs and WHERE IN-clause	» Convert db_placeholders() to DBTNG

OK, so I looked at db_placeholders(). We can convert it to the new API and just have it return incrementing placeholders fairly easily. However, we cannot do that without also updating everywhere it's used, as the API would be changing. So it's best to convert db_placeholders() in one shot across all of core. Renaming the issue accordingly.

The new API should, I think, take a sequential array and return a two-element array that can be read using list(), containing a string of placeholders and an array of values to substitute for them. So something like this:

$values = array(1, 2, 3, 4, 5);
list($placeholders, $args) = db_placeholders($values);
$result = db_query("SELECT * FROM {foo} WHERE fid IN (" . $placeholders . ")", $args);

Sound like a plan?
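
For illustration only, here is a minimal sketch of how a function with the return shape described above might be written. The name db_placeholders_sketch() and the :db_placeholder_ prefix are assumptions made up for this example; nothing here comes from an actual patch.

```php
<?php
/**
 * Hypothetical sketch: turn a sequential array of values into a
 * placeholder string plus a matching argument array.
 */
function db_placeholders_sketch(array $values, $prefix = ':db_placeholder_') {
  $args = array();
  // Re-index so the generated placeholder names are always 0, 1, 2, ...
  foreach (array_values($values) as $i => $value) {
    $args[$prefix . $i] = $value;
  }
  // First element: "name_0, name_1, ..."; second element: name => value map.
  return array(implode(', ', array_keys($args)), $args);
}

list($placeholders, $args) = db_placeholders_sketch(array(1, 2, 3));
$sql = "SELECT * FROM {foo} WHERE fid IN (" . $placeholders . ")";
```

Note that the placeholder string is still concatenated into the query by the caller; only the values travel through $args.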

Comment #4

CorniI CreditAttribution: CorniI commented 6 October 2008 at 14:34

I'd just agree with #1 that the new standard for IN-Queries is the dynamic query builder, much easier :D

Comment #5

Crell CreditAttribution: Crell commented 6 October 2008 at 14:52

Looking at the patch in http://drupal.org/node/302207#comment-1044710, I'm not sure I'd call that easier. :-)

Comment #6

catch CreditAttribution: catch commented 26 October 2008 at 10:43

It's very unintuitive not being able to get IN to work in static queries - I just spent a couple of hours trying to work out why the multiple version of comment_nodeapi_load() was only working on the first node in my $nids array. I've used a dynamic query for now, but unless we're going to say 'use a static query for static queries, unless you want to use a dynamic query instead' as our guidelines, it's likely to get confusing.

Comment #7

Crell CreditAttribution: Crell commented 26 October 2008 at 18:26

@catch: I agree. Now what do you think of the solution proposed in #3? I don't want to move forward with it until I have some idea that it will be accepted.

Comment #8

catch CreditAttribution: catch commented 26 October 2008 at 18:43

Crell: I'd really like to be able to do array(':nids' => implode(', ', $nids)) - but I have a feeling you'd have suggested that if there was an easy way to do it. Also what if I want to do IN(1,2) AND :something - seems like $args would preclude that.

Comment #9

Damien Tournoud CreditAttribution: Damien Tournoud commented 26 October 2008 at 19:16

What I would *love* to do is:

db_query("SELECT * FROM {foo} WHERE fid IN (:fids) AND type = :type", array(':fids' => array(1, 2, 3), ':type' => 'my_type'));

That would require going through $args and checking the type of each argument, which we may or may not want to do, based on performance considerations.

Another approach, perhaps with a lesser performance impact:

db_query("SELECT * FROM {foo} WHERE fid IN (:fids) AND type = :type", array(':fids' => array(1, 2, 3), ':type' => 'my_type'), WITH_PLACEHOLDERS);

My reasoning is that the $query part of db_query() should remain a static string as much as possible.

Comment #10

catch CreditAttribution: catch commented 26 October 2008 at 19:28

Damien - that's even better.

Comment #11

Crell CreditAttribution: Crell commented 26 October 2008 at 23:39

Well WITH_PLACEHOLDERS is impossible as the third parameter is already the $options array. If anything it would be an additional options key.

Unfortunately either of those syntaxes would require us to string parse the query to find :fids and replace it with :fids_1, :fids_2, etc. One of the main design goals of DBTNG is to avoid string parsing of queries whenever possible. Right now the only place we do so is for the table prefixing, and if I could get rid of that, too, I would. :-)

(That said, I'm happy to see people putting out ideas for this issue as it is an important one!)

Comment #12

markus_petrux CreditAttribution: markus_petrux commented 27 October 2008 at 00:16

A mix between #3 and #9:

function db_placeholders($arguments, $type = 'int') {
  $placeholders = implode(',', array_fill(0, count($arguments), db_type_placeholder($type)));
  return array($arguments, $placeholders);
}

$values = array(1, 2, 3, 4, 5);
$result = db_query("SELECT * FROM {foo} WHERE fid IN (:fids)", array(':fids' => db_placeholders($values)));

db_query could react accordingly when a query argument is an array and take what db_placeholders() did.

Comment #13

Crell CreditAttribution: Crell commented 27 October 2008 at 01:13

db_type_placeholder() is going away, as the new query syntax is type-agnostic by design. #12 still requires string parsing the query to find :fids and convert it, just like #9 does. That would require us to have a preprocessor on all queries, which would have to be done before we create the prepared statement object. I'm really not wild about that approach.

Comment #14

catch CreditAttribution: catch commented 27 October 2008 at 01:22

Let's not add string processing back in, nice as it looks on the surface it'd be ugly inside. I'm surprised PDO doesn't have a way to deal with this internally to be honest.

If we're stuck with IN (" . $placeholders . "), could we perhaps have array(':placeholders' => $args) to allow for IN queries with other conditions?

Comment #15

markus_petrux CreditAttribution: markus_petrux commented 27 October 2008 at 19:22

To evaluate the performance impact of option 1 in #9:

function tmp_query($query, $args = array(), $options = array()) {
  if (!is_array($args)) {
    $args = func_get_args();
    array_shift($args);
  }

  // Expand placeholders for IN() operators.
  foreach ($args as $key => $data) {
    if (is_array($data) && $key[0] == ':') {
      $keys = array();
      foreach ($data as $i => $value) {
        $args[$key . '_' . $i] = $value;
        $keys[] = $key . '_' . $i;
      }
      unset($args[$key]);
      $query = str_replace($key, implode(', ', $keys), $query);
    }
  }
//  list($query, $args, $options) = _db_query_process_args($query, $args, $options);

//  return Database::getActiveConnection($options['target'])->query($query, $args, $options);
}
timer_start('tmp_query');
for ($i=0; $i < 10000; $i++) {
  tmp_query("SELECT * FROM {foo} WHERE fid IN (:fids) AND type = :type", array(':fids' => array(1, 2, 3), ':type' => 'my_type'));
}
print timer_read('tmp_query');

The loop took around 222 ms on one system, and 382 ms on another.

HTH

Comment #16

Crell CreditAttribution: Crell commented 27 October 2008 at 20:10

222 ms as compared to...?

Performance is part of the issue. The performance impact on queries that DON'T need an array is also part of it. And there's just general code cleanliness. String parsing of a serialized data structure (eg, SQL) inherently means you serialized too early.

Comment #17

markus_petrux CreditAttribution: markus_petrux commented 27 October 2008 at 21:48

222 ms as compared to...?

Ok, I added a check for isset($options['placeholders']) to bypass the foreach loop above. It takes 34 ms when FALSE, and 230 ms when TRUE. The for loop runs 10000 times in both cases, so that's 0.0034 ms vs. 0.023 ms per single call to tmp_query() in the example above.
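
A rough sketch of that opt-in check, for readers who want to see the shape of it. The helper name tmp_expand_placeholders() is hypothetical, and the 'placeholders' options key is taken from the description above rather than from any committed API; the expansion loop is the one from #15.

```php
<?php
/**
 * Hypothetical sketch: only expand array arguments when the caller asks for
 * it through $options['placeholders'].
 */
function tmp_expand_placeholders($query, array $args, array $options = array()) {
  // Fast path: queries that did not opt in skip the loop entirely.
  if (empty($options['placeholders'])) {
    return array($query, $args);
  }
  foreach ($args as $key => $data) {
    if (is_array($data) && $key[0] == ':') {
      $keys = array();
      foreach ($data as $i => $value) {
        // Each value gets its own placeholder, e.g. :fids_0, :fids_1, ...
        $args[$key . '_' . $i] = $value;
        $keys[] = $key . '_' . $i;
      }
      unset($args[$key]);
      $query = str_replace($key, implode(', ', $keys), $query);
    }
  }
  return array($query, $args);
}

$sql = "SELECT * FROM {foo} WHERE fid IN (:fids) AND type = :type";
$in = array(':fids' => array(1, 2, 3), ':type' => 'my_type');

list($plain_query, $plain_args) = tmp_expand_placeholders($sql, $in);
list($expanded_query, $expanded_args) = tmp_expand_placeholders($sql, $in, array('placeholders' => TRUE));
```

Without the option the query and arguments come back untouched, which is the cheap case measured above.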

If there are more ideas on how to deal with this, then that could be compared with this example.

String parsing of a serialized data structure (eg, SQL) inherently means you serialized too early.

Sure, but you have to write code that is not so easy to understand / maintain, maybe.

Comment #18

webchick CreditAttribution: webchick commented 8 November 2008 at 03:46

Ok, I'm going to have to put my foot down and say #3 is unacceptable. We put db_placeholders() in core for a reason, and that reason is because it's way too easy to introduce SQL injection attacks from not properly escaping your arguments. Removing that would be a huge step backwards in terms of security.

That leaves us with two options:

1. Damien's suggestion about ' ... IN(:fids) ...', array(':fids' => array(1, 2, 3)), which I personally find easiest to read and most logical. Would require an is_array() check on all placeholders on all queries.

2. Some sort of "magical" placeholder. Perhaps @placeholder to signify an array of values, or :db_placeholder to try and pick something that's not going to be a database column. Would require string parsing all placeholders on all queries.

I'd like to see benchmarks of both approaches to know which way is best. I can't tell from #15 / #17 if that's what's being tested... sorry, been a long week. :(

Comment #19

catch CreditAttribution: catch commented 8 November 2008 at 14:32

OK, so my suggestion in irc, which was very late at night so might have a fundamental flaw, was this:

db_query("SELECT * FROM node WHERE nid IN(:placeholders) AND published = :published", array(':placeholders' => array(1, 2, 3, 4, 5), ':published' => 1));

function db_query($query, $args) {
  if (isset($args[':placeholders'])) {
    // Expand the array into the query here (str_replace etc.), then drop it.
    unset($args[':placeholders']);
  }
  // [.. continue as normal ..]
}

If that's viable, then it's a lot less expensive than parsing for the magic placeholder query - we're just doing an isset, then only do the str_replace etc. after that.

The issue here is you can only use one IN() per query. Not pretty, but we can do the same thing for $args[':placeholders2'] etc. if we really have to.

Assuming no major holes, I reckon this is the least worst solution - it allows us to keep the current syntax with no serious performance implications.

Comment #20

webchick CreditAttribution: webchick commented 8 November 2008 at 16:21

I would really love to NOT have to do that. :( But we might be forced to if the other way proves too expensive.

We'd have to decide a depth to go to, for example 3 (placeholders, placeholders2, placeholders3) that we deemed "No query is possibly going to be crazy enough to need THREE IN() clauses!" But, if someone /was/ to write some bizarro query that required it, their only recourse would be to hack core.

There's also a DX problem in that people are not going to expect to need a "fancy" placeholder to do something that's standard SQL. I predict "Unknown column 'Array' in 'where clause' " being one of our top 10 Troubleshooting FAQs for D7 when your example ends up getting translated to "SELECT * FROM node WHERE nid IN(Array) AND published = 1" (or whatever it does).

Comment #21

pwolanin CreditAttribution: pwolanin commented 8 November 2008 at 17:18

What happened to the suggestion from last night of using @ or some other special char to indicate an array?

something like:

function db_query($query, $args = array(), $options = array()) {
  if (!is_array($args)) {
    $args = func_get_args();
    array_shift($args);
  }
  list($query, $args, $options) = _db_query_process_args($query, $args, $options);

  foreach ($args as $key => $data) {
    // is_array() is slow, so do an initial char-based check.
    if ($key[0] == '@' && is_array($data)) {
      $new_keys = array();
      $base = ':' . substr($key, 1);
      foreach ($data as $i => $value) {
        $p = $base . '_' . $i;
        while (isset($args[$p])) {
          $p .= mt_rand();
        }
        $new_keys[$p] = $value;
      }
      $query = str_replace($key, implode(', ', $new_keys), $query);
      unset($args[$key]);
    }
  }

  return Database::getActiveConnection($options['target'])->query($query, $args, $options);
}

Comment #22

markus_petrux CreditAttribution: markus_petrux commented 8 November 2008 at 21:01

IMO, that's a great idea :)

- Suggestion 1:


$key = '@abc';

timer_start('timer-1');
for ($i=0; $i < 10000; $i++) {
  $base = ':' . substr($key, 1);
}
print timer_read('timer-1') ."<br />\n";

timer_start('timer-2');
for ($i=0; $i < 10000; $i++) {
  $base = $key;
  $base[0] = ':';
}
print timer_read('timer-2') ."<br />\n";

The first method took 7.64 ms on my system; the second took 3.63 ms. So it looks like $base = $key; $base[0] = ':'; is significantly faster than $base = ':' . substr($key, 1);.

- Suggestion 2: if using @ prefix, maybe it could be assumed that $data is an array, so checking for is_array($data) could be removed. Alternative solution: document that @ can only be used when argument is an array.

- Suggestion 3: maybe the while loop/mt_rand could also be removed. Just document that these placeholders need to use a unique prefix.

Comment #23

pwolanin CreditAttribution: pwolanin commented 8 November 2008 at 20:59

A quick benchmark suggests that having the check on each arg isn't too bad:

elapsed with stub code: 0.585354089737
elapsed w/ no @: 0.983224153519
elapsed w/ 1 @: 2.87513208389
elapsed w/ 2 @: 5.26019191742

function db_query($query, $args = array(), $options = array()) {
  if (!is_array($args)) {
    $args = func_get_args();
    array_shift($args);
  }

  foreach ($args as $key => $data) {
    // is_array() is slow, so do an initial char-based check.
    if ($key[0] == '@' && is_array($data)) {
      $new_keys = array();
      $base = ':' . substr($key, 1);
      foreach ($data as $i => $value) {
        $p = $base . '_' . $i;
        while (isset($args[$p])) {
          $p .= mt_rand();
        }
        $new_keys[$p] = $value;
      }
      $query = str_replace($key, implode(', ', $new_keys), $query);
      unset($args[$key]);
    }
  }
//  list($query, $args, $options) = _db_query_process_args($query, $args, $options);
//  return Database::getActiveConnection($options['target'])->query($query, $args, $options);
}

function dummy_query($query, $args = array(), $options = array()) {
  if (!is_array($args)) {
    $args = func_get_args();
    array_shift($args);
  }

//  list($query, $args, $options) = _db_query_process_args($query, $args, $options);
//  return Database::getActiveConnection($options['target'])->query($query, $args, $options);
}

$start = microtime(TRUE);
for ($i=0; $i < 100000; $i++) {
  dummy_query("SELECT * FROM {foo} WHERE fid = :fid AND type = :type AND status = :status", array(':fid' => 1, ':type' => 'my_type', ':status' => 1));
}
$end = microtime(TRUE);
echo "elapsed with stub code: ". ($end - $start) ."\n";

$start = microtime(TRUE);
for ($i=0; $i < 100000; $i++) {
  db_query("SELECT * FROM {foo} WHERE fid = :fid AND type = :type AND status = :status", array(':fid' => 1, ':type' => 'my_type', ':status' => 1));
}
$end = microtime(TRUE);
echo "elapsed w/ no @: ". ($end - $start) ."\n";

$start = microtime(TRUE);
for ($i=0; $i < 100000; $i++) {
  db_query("SELECT * FROM {foo} WHERE fid IN (@fids) AND type = :type AND status = :status", array('@fids' => array(1, 2, 3), ':type' => 'my_type', ':status' => 1));
}
$end = microtime(TRUE);
echo "elapsed w/ 1 @: ". ($end - $start) ."\n";

$start = microtime(TRUE);
for ($i=0; $i < 100000; $i++) {
  db_query("SELECT * FROM {foo} WHERE fid IN (@fids) AND nid in (@nids) AND type = :type AND status = :status", array('@fids' => array(1, 2, 3), '@nids' => array(1, 2, 3, 4, 5, 6), ':type' => 'my_type', ':status' => 1));
}
$end = microtime(TRUE);
echo "elapsed w/ 2 @: ". ($end - $start) ."\n";

Comment #24

pwolanin CreditAttribution: pwolanin commented 8 November 2008 at 21:05

If we don't do the check on the 1st char, but just do:

    if (is_array($data)) {

the results are like:

elapsed w/ no @: 1.32884788513
elapsed w/ 1 @: 3.0096218586
elapsed w/ 2 @: 5.33900809288

Comment #25

markus_petrux CreditAttribution: markus_petrux commented 8 November 2008 at 21:26

I meant just checking for if ($key[0] == '@') {. :-|

Comment #26

markus_petrux CreditAttribution: markus_petrux commented 8 November 2008 at 21:48

hmm... just noticed that your code could only be used for numeric values. It would fail for string_col IN (:values).

The following includes my previous suggestions and fixes that (adding dynamically generated arguments to $args array).

function tmp_query($query, $args = array(), $options = array()) {
  if (!is_array($args)) {
    $args = func_get_args();
    array_shift($args);
  }

  foreach ($args as $key => $data) {
    if ($key[0] == '@') {
      $new_args = array();
      $base = $key;
      $base[0] = ':';
      foreach ($data as $i => $value) {
        $new_args[$base . '_' . $i] = $value;
      }
      $query = str_replace($key, implode(', ', array_keys($new_args)), $query);
      unset($args[$key]);
      $args += $new_args;
    }
  }
//  list($query, $args, $options) = _db_query_process_args($query, $args, $options);
//  return Database::getActiveConnection($options['target'])->query($query, $args, $options);
}

Comment #27

markus_petrux CreditAttribution: markus_petrux commented 8 November 2008 at 21:54

Benchmarks...

Code in #23 took on my system:

elapsed with stub code: 0.278710126877
elapsed w/ no @: 0.61524105072
elapsed w/ 1 @: 2.38443303108
elapsed w/ 2 @: 4.19608592987

Code in #26 took on my system:

elapsed with stub code: 0.280394077301
elapsed w/ no @: 0.611225128174
elapsed w/ 1 @: 2.18950986862
elapsed w/ 2 @: 4.24191999435

Comment #28

Crell CreditAttribution: Crell commented 9 November 2008 at 21:27

OK, so if I'm following correctly, the @ solution doubles the time for a query but we're dealing with a tiny number to start with. It then has an incrementally larger impact on queries that actually DO have an array in them, by a linear amount.

Are any of the above micro-benchmarks for just the is_array() method? I agree with webchick that I'd rather magically detect arrays than magically detect @placeholder, if at all possible. Personally I'd veto :magic_name unless there's absolutely no alternative.

Whatever mechanism we go with I think it's logical to say that the overall cost for when there IS an array will be about equal, since it's essentially the same loop either way. So the main question is what the performance impact is on the 90% of queries that do not need the array handling.

Can someone try making these modifications to core as a patch and then benchmarking a normal page load? That will do a better job of telling us what the real world cost is than micro-benchmarks.

Comment #29

Damien Tournoud CreditAttribution: Damien Tournoud commented 9 November 2008 at 21:36

/me bet that the full page benchmark impact will be below the margin of error.

Comment #30

pwolanin CreditAttribution: pwolanin commented 9 November 2008 at 22:22

whoops - yes - apparently I forgot the line to add the new args to the existing args.

Probably Damien is right - that the rest of the overhead involved with executing a query will cause any of the proposed changes to have a nearly unmeasurable effect on overall page serving time.

Comment #31

markus_petrux CreditAttribution: markus_petrux commented 9 November 2008 at 22:32

Crell, please note that the benchmarks above omit a detail that may narrow the gap between the proposal and the current method: the "elapsed w/ X @" samples ought to be compared with db_placeholders(), whereas the "elapsed with stub code" sample does not use IN () conditions at all.

Also, these tests loop 100,000 times. When 100,000 iterations take 2 seconds, that means one single step takes 0.00002 seconds.
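
As a worked version of that arithmetic, the snippet below converts a loop total into a per-call cost. The first figure is the round 2-second example above; the second uses the "stub" and "no @" totals reported in #23. The variable names are only for this illustration.

```php
<?php
$iterations = 100000;

// Round example from the comment above: 2 seconds for the whole loop.
$per_call_seconds = 2.0 / $iterations;

// Totals reported in #23, in seconds for 100,000 calls.
$stub_total = 0.585354089737;
$no_at_total = 0.983224153519;

// Extra cost per query of running the check when no array is present.
$overhead_microseconds = ($no_at_total - $stub_total) / $iterations * 1000000;

printf("per call: %.5f s\n", $per_call_seconds);
printf("check overhead per call: %.2f microseconds\n", $overhead_microseconds);
```

On those numbers the added check costs on the order of a few microseconds per query.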

Comment #32

Crell CreditAttribution: Crell commented 9 November 2008 at 22:50

@#31: As I said, let's see benchmarks on a full page load on HEAD to see what the real impact is. We shouldn't need to do any query conversion for that, just modify the conditional, since whatever we do the cost when we DO need to process an array should be a constant between the mechanisms being considered. We just want to test the cost of the extra checking.

Comment #33

markus_petrux CreditAttribution: markus_petrux commented 9 November 2008 at 23:01

I don't have the time to test with a real page on D7 right now, but this may give an idea of how much overhead would be added by the foreach loop that checks for @ in db_query().

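/**
 * This function contains the code that would be added to db_query().
 *
 * $query defaults to an empty string so that str_replace() below has a
 * defined subject; the benchmark itself never reaches that branch.
 */
function dummy($args, $query = '') {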
  foreach ($args as $key => $data) {
    if ($key[0] == '@') {
      $new_args = array();
      $base = $key;
      $base[0] = ':';
      foreach ($data as $i => $value) {
        $new_args[$base . '_' . $i] = $value;
      }
      $query = str_replace($key, implode(', ', array_keys($new_args)), $query);
      unset($args[$key]);
      $args += $new_args;
    }
  }
}

/**
 * Loop 100,000 times with an array of arguments with no @.
 */
$start = microtime(TRUE);
for ($i=0; $i < 100000; $i++) {
  dummy(array(':fid' => 1, ':type' => 'my_type', ':status' => 1));
}
$end = microtime(TRUE);
echo "elapsed with no @: ". ($end - $start) ."\n";

elapsed with no @: 0.451392889023

That's 451 ms to loop 100,000 times.

HTH :)

Comment #34

Crell CreditAttribution: Crell commented 10 December 2008 at 05:13

Assigned:	Unassigned	» Crell
Status:	Active	» Needs review

File	Size
placeholder-at-is_array.patch	4.14 KB

placeholder-at.patch	4.12 KB

placeholder-is_array.patch	4.12 KB

OK, getting back to this...

Attached are three patches. All use @, although for one of them it wouldn't actually matter. All are based on the code from #21 above, and there is a unit test included to confirm that it works.

One patch checks only based on the use of @.

One patch checks only based on whether the data is an array. (If we went this route we would not use the @ sign, but for now it provides a more direct speed comparison since it's just the if statement we're really concerned about.)

One patch checks both for an @ and for whether the data is an array.

I'm uploading all three to see if the bot can handle all of them at once. :-) But we can now benchmark the total effect on a Drupal page load for normal queries. Once we know which approach we want to take we can micro-optimize the effectively constant cost for processing array-based placeholders and decide whether we want the @ sign or not, etc.

Comment #35

Dave Reid CreditAttribution: Dave Reid commented 10 December 2008 at 06:57

Yay testing bot! Subscribing to help benchmark in the morning.

Comment #36

markus_petrux CreditAttribution: markus_petrux commented 10 December 2008 at 08:28

If @ is required, then checking for ($key[0] == '@') is faster, and checking for is_array() might not be needed if the execute method fails when it finds a remaining array in any $args item?

Another potential advantage when @ is required is that it may help identify this kind of queries when scanning code for whatever purpose.

Comment #37

Crell CreditAttribution: Crell commented 10 December 2008 at 08:39

@ #36: That's all dependent on what the benchmarks tell us. If the difference between the two methods is huge, then that answers it for us. If it's small, then we can pick our mechanism based on the DX factors of the resulting syntax rather than on performance. Let's wait for the numbers and see.

Comment #38

Dave Reid CreditAttribution: Dave Reid commented 10 December 2008 at 18:04

File	Size
placeholder-at.benchmark.txt	1.31 KB
placeholer-is_array.benchmark.txt	1.32 KB
placeholder-at-is_array.benchmark.txt	1.32 KB
placeholder-db_placeholders.benchmark.txt	1.3 KB

Here's my preliminary benchmarking results.

I used the following code (adjusted a little for each patch's syntax and detailed in each benchmarking result) and ran with ab -c 1 -n 1000:

define('DRUPAL_ROOT', dirname(realpath(__FILE__)));
require_once DRUPAL_ROOT . '/includes/bootstrap.inc';
drupal_bootstrap(DRUPAL_BOOTSTRAP_FULL);

$nids = db_query("SELECT nid FROM {node} ORDER BY created DESC LIMIT 10")->fetchAll(PDO::FETCH_COLUMN);
$nodes = db_query("SELECT * FROM {node} WHERE nid IN (@nids)", array('@nids' => $nids))->fetchAll();
$uids = db_query("SELECT uid FROM {users} ORDER BY name DESC LIMIT 25")->fetchAll(PDO::FETCH_COLUMN);
$users = db_query("SELECT * FROM {users} WHERE uid IN (@uids)", array('@uids' => $uids))->fetchAll();

exit();

From fastest to slowest:
122.481 ms: $key[0] == '@'
123.259 ms: $key[0] == '@' && is_array($data)
123.692 ms: is_array($data)
127.999 ms: db_placeholders()

Comment #39

Crell CreditAttribution: Crell commented 10 December 2008 at 18:41

Thanks, Dave. What we really need, though, is a benchmark of a normal Drupal page load with each of these patches applied as compared to an unpatched HEAD using ab (and indicating what the settings were). That way we can gauge the "real world" impact of each method.

Interestingly it looks like any of these is an improvement over db_placeholders(), which is nice.

Comment #40

Dave Reid CreditAttribution: Dave Reid commented 10 December 2008 at 18:45

I was hesitant to run the benchmarks on a normal Drupal page since it wouldn't show the true comparison benchmarks until the queries that use db_placeholders() are replaced with '@'. BTW, I ran the previous tests with ab -c 1 -n 1000 http://mysql.drupalhead.local/test.php.

Comment #41

Crell CreditAttribution: Crell commented 10 December 2008 at 19:16

Well at the moment all we're looking for is the impact of the extra if() check. That's going to run for every query either way, so that will at least give us a comparison between the different mechanisms. You're right that a full benchmark against core would require a full conversion, which is a PITA, but from the benchmarks you already did it looks like we'll probably come out ahead either way.

Comment #42

catch CreditAttribution: catch commented 10 December 2008 at 19:19

The three patches are mixed up with some query object caching stuff it seems.

Comment #43

Crell CreditAttribution: Crell commented 10 December 2008 at 19:27

That's deliberate. Trying to cache the statement object for a processed query would be very difficult, as we can't guarantee the same placeholders are generated each time. So instead I just disabled caching for those queries. The odds of them being run multiple times on the same page request is fairly low anyway.
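
To make the idea concrete, here is a deliberately simplified sketch of a statement cache that can be bypassed per query. StatementCacheSketch, its $cacheable parameter, and the stdClass stand-in for a prepared statement are all assumptions for this example; the real connection class wraps PDO and is not reproduced here.

```php
<?php
/**
 * Hypothetical, simplified stand-in for a connection's statement cache.
 */
class StatementCacheSketch {
  protected $cache = array();
  public $prepareCount = 0;

  public function prepare($query, $cacheable = TRUE) {
    // Reuse a cached statement only when the caller allows caching.
    if ($cacheable && isset($this->cache[$query])) {
      return $this->cache[$query];
    }
    $this->prepareCount++;
    // Stand-in for the real prepared statement object.
    $statement = (object) array('queryString' => $query);
    if ($cacheable) {
      $this->cache[$query] = $statement;
    }
    return $statement;
  }
}

$connection = new StatementCacheSketch();
// A normal static query: prepared once, then served from the cache.
$connection->prepare("SELECT * FROM {node} WHERE nid = :nid");
$connection->prepare("SELECT * FROM {node} WHERE nid = :nid");
// A preprocessed query: prepared every time, never stored.
$connection->prepare("SELECT * FROM {node} WHERE nid IN (:nids_0, :nids_1)", FALSE);
$connection->prepare("SELECT * FROM {node} WHERE nid IN (:nids_0, :nids_1)", FALSE);
```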

Comment #44

Dave Reid CreditAttribution: Dave Reid commented 16 December 2008 at 23:18

File	Size
placeholder-benchmarks.txt	2.14 KB

Here are my benchmarks for current HEAD with 50 users, 50 nodes (10 on front page), lots of taxonomy terms, nearly all core modules enabled, and a few blocks on the front page (recent blog posts, who's new, who's online). Run with ab -c 1 -n 500 -C MY_USER1_SESSION_COOKIE http://mysql.drupalhead.local/, so logged in as user 1. I restarted apache before each test.

Comment #45

Crell CreditAttribution: Crell commented 17 December 2008 at 06:43

Hm. If I'm reading that right, the answer is "they're all virtually identical and have nearly no performance impact after all". I find that somewhat surprising, but I'd be very happy if it's true. :-)

If that's the case, then do we want to use @ flagging or the presence of an array? I think I would marginally prefer the array check, as it reduces the funky characters in the query string. OTOH, the @ makes it more obvious what you're expecting. Hm...

Comment #46

markus_petrux CreditAttribution: markus_petrux commented 17 December 2008 at 07:32

Benchmarking against a normal page may mask the fact that one method is faster than the other. The difference may not be significant on a normal page, but it may be for some cron runs, importing/exporting nodes, processing a lot of data, certain views, etc.

The first benchmarks just compared the cost of using @ and/or is_array, etc. so I would take that into account as well.

Comment #47

Crell CreditAttribution: Crell commented 17 December 2008 at 08:16

Well the cost should be a constant multiplied by the number of queries executed, so any difference should show up there. Even if we take the benchmarks in #38, there's roughly a 1% difference between @ and is_array(), and both are faster than db_placeholders(). So I think the verdict is "the performance difference isn't big enough for us to care".
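
For reference, the relative differences can be recomputed from the per-request times listed in #38. The helper percent_slower() is just a name chosen for this calculation.

```php
<?php
// Mean time per request from #38, in milliseconds.
$at_only = 122.481;          // $key[0] == '@'
$is_array_only = 123.692;    // is_array($data)
$db_placeholders = 127.999;  // db_placeholders()

// How much slower $other is than $base, as a percentage of $base.
function percent_slower($base, $other) {
  return ($other - $base) / $base * 100;
}

$is_array_vs_at = percent_slower($at_only, $is_array_only);
$db_placeholders_vs_at = percent_slower($at_only, $db_placeholders);

printf("is_array() vs @: %.2f%%\n", $is_array_vs_at);
printf("db_placeholders() vs @: %.2f%%\n", $db_placeholders_vs_at);
```

On those figures is_array() comes out about 1% slower than the @ check, and db_placeholders() about 4.5% slower.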

Comment #48

catch CreditAttribution: catch commented 17 December 2008 at 11:23

Seems like the string comparison vs. is_array won't have any practical impact, so all other things being equal, my vote is for

db_query("SELECT name FROM {test} WHERE age IN (:ages) ORDER BY age", array(':ages' => array(25, 26, 27)))->fetchAll();

Since it's one less thing to remember, and in general very, very nifty.
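
A minimal sketch of how an is_array()-only expansion could turn that call into a query PDO can prepare. This is not the committed code: the function name expand_array_arguments() is made up, and the word-boundary regex (used so that :ages does not also match a longer name such as :ages_max) is a choice made for this example.

```php
<?php
/**
 * Hypothetical sketch: expand any array-valued argument into numbered
 * placeholders, keyed on is_array() alone.
 */
function expand_array_arguments($query, array $args) {
  foreach ($args as $key => $data) {
    if (is_array($data)) {
      $new_args = array();
      // Re-index so names are always :key_0, :key_1, ...
      foreach (array_values($data) as $i => $value) {
        $new_args[$key . '_' . $i] = $value;
      }
      // Replace the original placeholder, but not longer names that merely
      // start with it.
      $pattern = '/' . preg_quote($key, '/') . '\b/';
      $query = preg_replace($pattern, implode(', ', array_keys($new_args)), $query);
      unset($args[$key]);
      $args += $new_args;
    }
  }
  return array($query, $args);
}

list($query, $args) = expand_array_arguments(
  "SELECT name FROM {test} WHERE age IN (:ages) ORDER BY age",
  array(':ages' => array(25, 26, 27))
);
```

Scalar arguments pass through untouched, so ordinary queries only pay for the is_array() check.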

Comment #49

markus_petrux CreditAttribution: markus_petrux commented 17 December 2008 at 11:43

With no significant reason to do otherwise, #48 looks great, TBH.

Comment #50

19 December 2008 at 08:50

Status:	Needs review	» Needs work

The last submitted patch failed testing.

Comment #51

Dave Reid CreditAttribution: Dave Reid commented 19 December 2008 at 18:22

Status:

Needs work

» Needs review

Testing slave #8 failure.

Comment #52

Dries CreditAttribution: Dries commented 24 December 2008 at 09:53

Status:

Needs review

» Fixed

Committed to CVS HEAD. Thanks all.

Comment #53

catch CreditAttribution: catch commented 24 December 2008 at 14:06

Status:

Fixed

» Needs work

The tests included with the patch which was committed don't match the :placeholder syntax which (I think) was agreed on.

Comment #54

Dave Reid CreditAttribution: Dave Reid commented 24 December 2008 at 14:06

Status:

Needs work

» Active

Let's make sure we get a followup to correct this doc change:

Index: includes/database/database.inc
@@ -349,10 +349,14 @@ abstract class DatabaseConnection extend
    * @param $query
    *   The query string as SQL, with curly-braces surrounding the
    *   table names.
+   * @param $query
+   *   Whether or not to cache the prepared statement for later reuse in this
+   *   same request.  Usually we want to, but queries that require preprocessing
+   *   cannot be safely cached.

:)

Comment #55

Crell CreditAttribution: Crell commented 24 December 2008 at 16:43

Wait, Dries, what did you commit? The patches posted earlier were for benchmarking, not for applying. They all needed more tweaking before they were commit ready.

Are you agreed on the is-array-only check then? I'm fine with that, but there was other code in that patch that is totally irrelevant if we go that route. (It was retained for simplicity in testing and benchmarking.)

Comment #56

Crell CreditAttribution: Crell commented 25 December 2008 at 05:24

Status:

Active

» Needs review

Attached patch is a follow-up cleanup.

1) Fix lots of documentation bugs, including #54.

2) Fix the unit test to use :, not @, which is what we're going to use.

3) Remove the rand-based duplicate key checking from the expandArguments() method. It's not really needed unless the caller is doing something extremely stupid with his placeholders, which therefore falls into the "don't babysit broken code" category. A big block o' comments was added to explain that fact. Net result, this is probably the most efficient we can reasonably make that algorithm.
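
For readers following along, the array expansion that item 3 refers to can be sketched roughly as follows. This is a minimal illustration and not the committed code: the function name sketch_expand_arguments() is hypothetical, and the :key_N naming for the generated placeholders is assumed from the benchmark snippets posted in this thread.

```php
<?php
/**
 * Hypothetical sketch of array-argument expansion (not the committed code).
 * Each array value gets its own placeholder: ':ages' => ':ages_0', ':ages_1'...
 */
function sketch_expand_arguments(&$query, array $args) {
  $expanded = array();
  foreach ($args as $key => $value) {
    if (!is_array($value)) {
      // Scalar arguments pass through untouched.
      $expanded[$key] = $value;
      continue;
    }
    $new_keys = array();
    foreach (array_values($value) as $i => $item) {
      $new_keys[$key . '_' . $i] = $item;
    }
    // No random suffix and no duplicate-key check: callers are assumed to
    // use sane, non-colliding placeholder names.
    $query = str_replace($key, implode(', ', array_keys($new_keys)), $query);
    $expanded += $new_keys;
  }
  return $expanded;
}

$query = 'SELECT name FROM {test} WHERE age IN (:ages) ORDER BY age';
$args = sketch_expand_arguments($query, array(':ages' => array(25, 26, 27)));
print $query . "\n";
```

The query string is rewritten in place and the returned array is what would be bound to the prepared statement.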

Comment #57

Crell CreditAttribution: Crell commented 25 December 2008 at 05:24

File	Size
placeholder_array.patch	2.97 KB

Sigh.

Comment #58

drewish CreditAttribution: drewish commented 25 December 2008 at 08:13

Status:

Needs review

» Needs work

Fails under PostgreSQL:

Array to string conversion	Notice	database.inc	1513	DatabaseStatementBase->execute()	
SELECT name FROM {test} WHERE age IN (:ages) ORDER BY age - Array ( [:ages] => Array ( [0] => 25 [1] => 26 [2] => 27 ) ) SQLSTATE[22P02]: Invalid text representation: 7 ERROR: invalid input syntax for integer: "Array"	Uncaught exception	database.inc	70	DatabaseConnection_pgsql->query()

Comment #59

David Strauss CreditAttribution: David Strauss commented 25 December 2008 at 08:27

Subscribing.

Comment #60

Crell CreditAttribution: Crell commented 25 December 2008 at 09:05

Status:

Needs work

» Needs review

File	Size
placeholder_array.patch	3.67 KB

Stupid PostgreSQL... Try this.

Comment #61

drewish CreditAttribution: drewish commented 25 December 2008 at 09:38

File	Size
db_314464.patch	4.19 KB

Crell suggested changing $new_keys to array_keys($new_keys) in this line of expandArguments(): $query = str_replace($key, implode(', ', $new_keys), $query);

That got the test to pass in both MySQL and PostgreSQL.
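
To make the difference concrete, here is a small sketch of what each variant produces. The contents of $new_keys are assumed for illustration, borrowing the :ages_N naming used in the benchmarks in this thread.

```php
<?php
// Assumed expansion result for ':ages' => array(25, 26, 27).
$new_keys = array(':ages_0' => 25, ':ages_1' => 26, ':ages_2' => 27);

// Imploding the array itself joins the values, so the literal numbers would
// land in the SQL string instead of placeholders.
$with_values = implode(', ', $new_keys);

// Imploding the keys keeps the query parameterized.
$with_keys = implode(', ', array_keys($new_keys));

print $with_values . "\n";  // 25, 26, 27
print $with_keys . "\n";    // :ages_0, :ages_1, :ages_2
```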

Comment #62

markus_petrux CreditAttribution: markus_petrux commented 25 December 2008 at 10:27

One potential problem with using :placeholder here is that the str_replace may affect other placeholders. Please consider this:

db_query("SELECT name FROM {test} WHERE foo IN (:foo) AND bar = :foobar", array(
  ':foo' => array(25, 26, 27),
  ':foobar' => 28,
));

The following str_replace will affect both placeholders:

$query = str_replace($key, implode(', ', array_keys($new_keys)), $query);

If we're using @, then the str_replace would be safer in this case because @ would only be used for this kind of argument. The above statement using @ would work:

db_query("SELECT name FROM {test} WHERE foo IN (@foo) AND bar = :foobar", array(
  '@foo' => array(25, 26, 27),
  ':foobar' => 28,
));

Though, it would still fail on queries like this:

db_query("SELECT name FROM {test} WHERE foo IN (@foo) AND bar = @foobar", array(
  '@foo' => array(25, 26, 27),
  '@foobar' => array(28, 29),
));

Another possibility would be to use preg_replace instead of str_replace, but that would be slower.

Maybe doxygen could document these potential problems?

Comment #63

David Strauss CreditAttribution: David Strauss commented 25 December 2008 at 17:16

It needs to be documented or totally fixed, not just with an "@" prefix for the placeholders. Using "@" for the placeholders only mitigates and masks a very real problem.

Comment #64

Dave Reid CreditAttribution: Dave Reid commented 25 December 2008 at 18:14

I'm pretty sure that we solved this problem in the SQLite driver with preg_replace.

Comment #65

Crell CreditAttribution: Crell commented 25 December 2008 at 21:18

I'm inclined to just document that issue. The latest patch already removes the rand-key-generator, on the grounds that we expect the caller to use sane placeholders. I don't see why we can't do the same here. Trying to bullet-proof that code is going to slow it down.

Comment #66

markus_petrux CreditAttribution: markus_petrux commented 25 December 2008 at 21:22

Quick benchmark to compare str_replace with a possible way of using preg_replace:

$query = 'SELECT name FROM {test} WHERE age IN (:ages) ORDER BY age';
$key = ':ages';
$new_keys = array(
  ':ages_0' => 25,
  ':ages_1' => 26,
  ':ages_2' => 27,
);
$times = 10000;

timer_start('timer-1');
for ($i=0; $i < $times; $i++) {
  $dummy = str_replace($key, implode(', ', array_keys($new_keys)), $query);
}
$ms = timer_read('timer-1');
print 'Using str_replace: '. $ms .'ms for '. $times .' times, '. round($ms / $times, 6) ."ms average.<br />\n";

timer_start('timer-2');
for ($i=0; $i < $times; $i++) {
  $dummy = preg_replace('#'. $key .'(\W|$)#', implode(', ', array_keys($new_keys)) .'\1', $query);
}
$ms = timer_read('timer-2');
print 'Using preg_replace: '. $ms .'ms for '. $times .' times, '. round($ms / $times, 6) ."ms average.<br />\n";

Result:

Using str_replace: 34.86ms for 10000 times, 0.003486ms average.
Using preg_replace: 47.61ms for 10000 times, 0.004761ms average.

Edited to fix a bug, last timer_read was using 'timer-1', oops. Results fixed as well. Sorry.

Comment #67

drewish CreditAttribution: drewish commented 26 December 2008 at 04:40

Wouldn't strtr() do the trick?

Comment #68

markus_petrux CreditAttribution: markus_petrux commented 26 December 2008 at 07:13

Wouldn't strtr() do the trick?

Nope, it produces the same result as str_replace.

$query = 'SELECT name FROM {test} WHERE foo IN (:foo) AND bar = :foobar';
$key = ':foo';
$new_keys = array(
  ':foo_0' => 25,
  ':foo_1' => 26,
  ':foo_2' => 27,
);
$result = str_replace($key, implode(', ', array_keys($new_keys)), $query);
print 'Using str_replace: '. $result ."\n";
$result = strtr($query, array($key => implode(', ', array_keys($new_keys))));
print 'Using strtr: '. $result ."\n";
$result = preg_replace('#'. $key .'(\W|$)#', implode(', ', array_keys($new_keys)) .'\1', $query);
print 'Using preg_replace: '. $result ."\n";

Results:

Using str_replace: SELECT name FROM {test} WHERE foo IN (:foo_0, :foo_1, :foo_2) AND bar = :foo_0, :foo_1, :foo_2bar
Using strtr: SELECT name FROM {test} WHERE foo IN (:foo_0, :foo_1, :foo_2) AND bar = :foo_0, :foo_1, :foo_2bar
Using preg_replace: SELECT name FROM {test} WHERE foo IN (:foo_0, :foo_1, :foo_2) AND bar = :foobar

Comment #69

catch CreditAttribution: catch commented 26 December 2008 at 11:02

I think it's reasonable to just document this; queries are going to blow up when this bug gets hit, and the comments should make it clear why.

Comment #70

markus_petrux CreditAttribution: markus_petrux commented 26 December 2008 at 11:19

Using preg_replace means adding just under 0.005 milliseconds per query that uses an array argument. Not fixing this, even if documented, may potentially cause annoying bugs. It's a matter of DX versus a small overhead penalty.
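
As a rough worked example using the totals from the benchmark in #66 (47.61 ms versus 34.86 ms for 10000 iterations), the net difference between the two functions can be computed like this. The variable names and the figure of 1000 array queries are only illustrative.

```php
<?php
// Totals reported in #66 for 10000 iterations, in milliseconds.
$str_replace_total = 34.86;
$preg_replace_total = 47.61;
$iterations = 10000;

// Extra time per replacement when switching to preg_replace.
$extra_per_call = ($preg_replace_total - $str_replace_total) / $iterations;

// Projected overhead for a hypothetical request running 1000 array queries.
$extra_per_1000 = $extra_per_call * 1000;

print round($extra_per_call, 6) . " ms extra per call\n";
print round($extra_per_1000, 3) . " ms extra per 1000 calls\n";
```

On those numbers the net overhead is on the order of a thousandth of a millisecond per expanded placeholder, and it only applies to queries that pass an array.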

Comment #71

Dries CreditAttribution: Dries commented 26 December 2008 at 11:49

Sorry for pulling the trigger too fast on this. I committed Crell's and drewish' patch in #61. We can continue to refine as necessary, so I'm not marking this 'fixed' yet. Thanks for cleaning up behind me. ;-)

Comment #72

Crell CreditAttribution: Crell commented 26 December 2008 at 21:50

Thanks, Dries.

So the only remaining question, I think, is do we:

1) Document that using placeholders that are substrings of each other is a bad idea because they'll collide and expect module developers to not be dumb.

2) Switch from str_replace() to preg_replace() to have a less error-prone string replacement mechanism that won't create collisions (as much?) at the cost of a bit more processing time (but only when doing array expansion in queries).

3) All of the above.

Input?

Comment #73

David Strauss CreditAttribution: David Strauss commented 26 December 2008 at 22:07

I prefer correctness over documenting pitfalls. The performance impact is pretty negligible.

Comment #74

markus_petrux CreditAttribution: markus_petrux commented 27 December 2008 at 10:21

won't create collisions (as much?)

As long as $key consists of ":" followed by any number of letters, numbers or underscores, it won't. In the regular expression, \W and $ should catch anything that is not a letter, number or underscore, as well as the end of the subject.
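
A quick illustration of that claim, as a sketch: the trailing group (\W|$) only lets the match succeed when the key is followed by a non-word character or by the end of the string. The sample queries and the :foo_N replacement string are made up for the demonstration.

```php
<?php
$key = ':foo';
$replacement = ':foo_0, :foo_1';
$pattern = '#' . $key . '(\W|$)#';

// ':foo' is followed by ')', a non-word character, so it is expanded;
// ':foobar' is followed by 'b', so it is left alone.
$a = preg_replace($pattern, $replacement . '\1', 'WHERE foo IN (:foo) AND bar = :foobar');

// ':foo' at the very end of the query is caught by the '$' alternative.
$b = preg_replace($pattern, $replacement . '\1', 'WHERE foo = :foo');

print $a . "\n";
print $b . "\n";
```

The back-reference \1 in the replacement puts back the character that the group consumed.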

Maybe one possible optimization would be using [^_a-zA-Z0-9] instead of \W. Here's a quick benchmark:

$query = 'SELECT name FROM {test} WHERE age IN (:ages) ORDER BY age';
$key = ':ages';
$new_keys = array(
  ':ages_0' => 25,
  ':ages_1' => 26,
  ':ages_2' => 27,
);
$times = 10000;

timer_start('timer-1');
for ($i=0; $i < $times; $i++) {
  $dummy = str_replace($key, implode(', ', array_keys($new_keys)), $query);
}
$ms = timer_read('timer-1');
print 'Using str_replace: '. $ms .' ms for '. $times .' times, '. round($ms / $times, 6) ." ms average.<br />\n";

timer_start('timer-2');
for ($i=0; $i < $times; $i++) {
  $dummy = preg_replace('#'. $key .'(\W|$)#', implode(', ', array_keys($new_keys)) .'\1', $query);
}
$ms = timer_read('timer-2');
print 'Using preg_replace v1: '. $ms .' ms for '. $times .' times, '. round($ms / $times, 6) ." ms average.<br />\n";

timer_start('timer-3');
for ($i=0; $i < $times; $i++) {
  $dummy = preg_replace('#'. $key .'([^_a-zA-Z0-9]|$)#', implode(', ', array_keys($new_keys)) .'\1', $query);
}
$ms = timer_read('timer-3');
print 'Using preg_replace v2: '. $ms .' ms for '. $times .' times, '. round($ms / $times, 6) ." ms average.<br />\n";

Results:

Using str_replace: 34.81 ms for 10000 times, 0.003481 ms average.
Using preg_replace v1: 49.17 ms for 10000 times, 0.004917 ms average.
Using preg_replace v2: 48 ms for 10000 times, 0.0048 ms average.

It seems preg_replace v2 tends to cost a bit less than v1.

Testing results:

$query = 'SELECT name FROM {test} WHERE foo IN (:foo) AND bar = :foobar';
$key = ':foo';
$new_keys = array(
  ':foo_0' => 25,
  ':foo_1' => 26,
  ':foo_2' => 27,
);
$result = str_replace($key, implode(', ', array_keys($new_keys)), $query);
print 'Using str_replace: '. $result ."<br />\n";
$result = strtr($query, array($key => implode(', ', array_keys($new_keys))));
print 'Using strtr: '. $result ."<br />\n";
$result = preg_replace('#'. $key .'(\W|$)#', implode(', ', array_keys($new_keys)) .'\1', $query);
print 'Using preg_replace v1: '. $result ."<br />\n";
$result = preg_replace('#'. $key .'([^_a-zA-Z0-9]|$)#', implode(', ', array_keys($new_keys)) .'\1', $query);
print 'Using preg_replace v2: '. $result ."<br />\n";

Using str_replace: SELECT name FROM {test} WHERE foo IN (:foo_0, :foo_1, :foo_2) AND bar = :foo_0, :foo_1, :foo_2bar
Using strtr: SELECT name FROM {test} WHERE foo IN (:foo_0, :foo_1, :foo_2) AND bar = :foo_0, :foo_1, :foo_2bar
Using preg_replace v1: SELECT name FROM {test} WHERE foo IN (:foo_0, :foo_1, :foo_2) AND bar = :foobar
Using preg_replace v2: SELECT name FROM {test} WHERE foo IN (:foo_0, :foo_1, :foo_2) AND bar = :foobar

preg_replace v1 and v2 are equivalent.

Comment #75

27 December 2008 at 11:10

Status:

Needs review

» Needs work

The last submitted patch failed testing.

Comment #76

catch CreditAttribution: catch commented 27 December 2008 at 13:04

Looking at those benchmarks, and since we only need to do that tiny extra bit of work on queries using the array anyway, I agree we should just go ahead and fix this.

Comment #77

John Morahan CreditAttribution: John Morahan commented 27 December 2008 at 13:53

If we use preg_replace with \b (as we did with SQLite), it is a bit faster than the other preg_replace versions, although still not quite as fast as str_replace:

$query = 'SELECT name FROM {test} WHERE age IN (:ages) ORDER BY age';
$key = ':ages';
$new_keys = array(
  ':ages_0' => 25,
  ':ages_1' => 26,
  ':ages_2' => 27,
);
$times = 1000000;

timer_start('timer-1');
for ($i=0; $i < $times; $i++) {
  $dummy = str_replace($key, implode(', ', array_keys($new_keys)), $query);
}
$ms = timer_read('timer-1');
print 'Using str_replace: '. $ms .' ms for '. $times .' times, '. round($ms / $times, 6) ." ms average.<br />\n";

timer_start('timer-2');
for ($i=0; $i < $times; $i++) {
  $dummy = preg_replace('#'. $key .'(\W|$)#', implode(', ', array_keys($new_keys)) .'\1', $query);
}
$ms = timer_read('timer-2');
print 'Using preg_replace v1: '. $ms .' ms for '. $times .' times, '. round($ms / $times, 6) ." ms average.<br />\n";

timer_start('timer-3');
for ($i=0; $i < $times; $i++) {
  $dummy = preg_replace('#'. $key .'([^_a-zA-Z0-9]|$)#', implode(', ', array_keys($new_keys)) .'\1', $query);
}
$ms = timer_read('timer-3');
print 'Using preg_replace v2: '. $ms .' ms for '. $times .' times, '. round($ms / $times, 6) ." ms average.<br />\n";

timer_start('timer-4');
for ($i=0; $i < $times; $i++) {
  $dummy = preg_replace('#' . $key . '\b#', implode(',', array_keys($new_keys)), $query);
}
$ms = timer_read('timer-4');
print 'Using preg_replace v3: '. $ms .' ms for '. $times .' times, '. round($ms / $times, 6) ." ms average.<br />\n";

Using str_replace: 4251.93 ms for 1000000 times, 0.004252 ms average.
Using preg_replace v1: 5701.79 ms for 1000000 times, 0.005702 ms average.
Using preg_replace v2: 5795.97 ms for 1000000 times, 0.005796 ms average.
Using preg_replace v3: 5215.83 ms for 1000000 times, 0.005216 ms average.

Comment #78

Crell CreditAttribution: Crell commented 27 December 2008 at 19:51

OK, it looks like there's a consensus for using preg, and it looks like the \b of SQLite is the fastest by a slim margin. Can someone roll the appropriate patch and update the inline docs as necessary?
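
A sketch of the \b variant applied to the problem query from #62. Because \b is zero-width, the replacement no longer needs a back-reference to restore the character after the key. The $new_keys array is written out by hand here for illustration.

```php
<?php
$query = 'SELECT name FROM {test} WHERE foo IN (:foo) AND bar = :foobar';
$key = ':foo';
$new_keys = array(':foo_0' => 25, ':foo_1' => 26, ':foo_2' => 27);

// \b asserts a word boundary after the key without consuming a character,
// so ':foo' matches before ')' but not inside ':foobar' or ':foo_bar'.
$result = preg_replace('#' . $key . '\b#', implode(', ', array_keys($new_keys)), $query);

print $result . "\n";
```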

Comment #79

markus_petrux CreditAttribution: markus_petrux commented 28 December 2008 at 02:09

File	Size
database.inc-314464-79.patch	998 bytes

John, that's a good one. \b is perfect here.

I tried to roll a patch that contains an explanation of the issue, but the wording could probably be made clearer. Anyway, HTH.

Comment #80

markus_petrux CreditAttribution: markus_petrux commented 28 December 2008 at 02:11

Status:

Needs work

» Needs review

Oops, I forgot to change the status. Sorry.

Comment #81

Dave Reid CreditAttribution: Dave Reid commented 28 December 2008 at 02:23

IMHO, we should probably be using the following, since we don't have anything specifically documented that says "You cannot use the '#' character in a placeholder."
$query = preg_replace('/' . preg_quote($key) . '\b/', implode(', ', array_keys($new_keys)), $query);
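
One detail worth noting, as a sketch: preg_quote() only escapes the pattern delimiter when it is passed as the second argument, so with '/' as the delimiter it is safer to call preg_quote($key, '/'). The helper name below is hypothetical, and the key ':a/b' is a deliberately odd placeholder used only to show the escaping; it is not something the API is documented to accept.

```php
<?php
// Hypothetical helper; wraps the quoted key in a '/'-delimited pattern.
function sketch_placeholder_pattern($key) {
  // Passing '/' makes preg_quote() escape the delimiter too.
  return '/' . preg_quote($key, '/') . '\b/';
}

$new_keys = array(':foo_0' => 25, ':foo_1' => 26);
$query = preg_replace(
  sketch_placeholder_pattern(':foo'),
  implode(', ', array_keys($new_keys)),
  'SELECT name FROM {test} WHERE foo IN (:foo) AND bar = :foobar'
);
print $query . "\n";

// The delimiter inside the key is escaped, so the pattern stays valid.
print sketch_placeholder_pattern(':a/b') . "\n";
```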

Comment #82

markus_petrux CreditAttribution: markus_petrux commented 28 December 2008 at 02:50

Hmm... maybe it should be documented that placeholders should only contain letters, numbers and underscores?

If placeholders could contain other characters, then \b could find another placeholder that should not be affected.

Comment #83

Dries CreditAttribution: Dries commented 28 December 2008 at 08:25

I think it is fine not to accept # as part of a placeholder.

Comment #84

David Strauss CreditAttribution: David Strauss commented 28 December 2008 at 20:06

Can we throw an exception if the placeholder contains illegal values? I know we're not supposed to "babysit broken code" and all that, but I've long said that's not my personal priority/agenda.

Comment #85

Crell CreditAttribution: Crell commented 28 December 2008 at 21:22

I think PDO will already throw an exception if you pass it garbage as a placeholder. No need for us to do so as well.

Comment #86

David Strauss CreditAttribution: David Strauss commented 28 December 2008 at 21:57

@Crell Can we get unit tests to check for such exceptions with garbage placeholders?

Comment #87

catch CreditAttribution: catch commented 28 December 2008 at 22:47

Status:

Needs review

» Reviewed & tested by the community

Can we do that in a separate issue? We don't try to lock down t() placeholders to stop people using @!#@^^ instead of @foo and I don't see that this is much different.

Comment #88

David Strauss CreditAttribution: David Strauss commented 28 December 2008 at 23:19

Status:

Reviewed & tested by the community

» Needs work

@catch The task of adding tests should never be relegated to a separate issue, and our lack of enforcement and tests for invalid placeholders in t() is not an argument that we should lack the same tests here.

I just don't find "I *think* PDO throws an exception" to be an acceptable answer. We need to at least know what should happen with invalid placeholders, even if that handling is embedded in PDO, and I'd prefer that we test for that expected outcome properly propagating through DB-TNG.

Comment #89

catch CreditAttribution: catch commented 29 December 2008 at 00:34

File	Size
placeholders.patch	12.41 KB

Discussed this with David in irc, and I pointed out that afaik we still allow ? placeholders in code if not by convention, despite them breaking devel query logging, so I stand by moving that discussion to a separate issue.

On another note, db_placeholders() still lives. Attached patch removes it and all calls to it except taxonomy_select_nodes, which is a bit of a pig and needs a full conversion to db_select() before we can do it properly - but we'll need to do it before we can close this so leaving at CNW. It's getting late so I'm not taking that on tonight. Some of the other queries need converting to db_select for other reasons too, but this issue is long enough I think and they ought to be caught by the general dbtng conversions.

Comment #90

catch CreditAttribution: catch commented 29 December 2008 at 12:45

I've moved the conversion of static queries with db_placeholders to #352054: Convert calls to db_placeholders() in static queries.

Comment #91

Dries CreditAttribution: Dries commented 29 December 2008 at 16:05

I committed #352054: Convert calls to db_placeholders() in static queries so we should be back on track now. Thanks catch! :)

Comment #92

markus_petrux CreditAttribution: markus_petrux commented 29 December 2008 at 18:39

I did a small test with the SQLite driver for PDO and it seems it does not like # as part of a placeholder.

try {
  $dbh = new PDO('sqlite:/tmp/foo.db');
  $dbh->query('CREATE TABLE foo (x int);');
  $dbh->prepare('SELECT * FROM foo WHERE x = :foo#');
  var_dump($dbh->errorInfo());
} catch (PDOException $e) {
  echo 'Connection failed: ' . $e->getMessage();
}

Result:

array(3) { [0]=>  string(5) "HY000" [1]=>  int(1) [2]=>  string(23) "unrecognized token: "#"" }

When trying with :foo#bar, it reported a syntax error.

However, it was possible to use placeholders like :fòó (note the accents). If someone uses non-word characters, the preg_replace statement will stop at the first non-word character, probably resulting in an SQL syntax error, most likely detected during development, but who knows...

If the syntax for placeholders is checked, then it will penalize runtime performance. Maybe this kind of check could be delegated to the coder module?

Comment #93

Crell CreditAttribution: Crell commented 29 December 2008 at 22:21

I am totally cool with documenting "ASCII alphanumerics and underscores only" and adding that to the coder module as well. Otherwise we run into "babysitting" territory.
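
If such a rule were documented, a check along these lines (whether in coder or in a test) would express it. This is only a sketch; the function name is hypothetical and nothing in this issue commits to adding runtime validation.

```php
<?php
// Hypothetical checker: ':' followed by ASCII letters, digits or underscores.
function sketch_is_valid_placeholder($key) {
  // The D modifier stops '$' from matching before a trailing newline.
  return preg_match('/^:[A-Za-z0-9_]+$/D', $key) === 1;
}

var_dump(sketch_is_valid_placeholder(':ages'));  // bool(true)
var_dump(sketch_is_valid_placeholder(':foo#'));  // bool(false)
var_dump(sketch_is_valid_placeholder(':fòó'));   // bool(false)
```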

Comment #94

markus_petrux CreditAttribution: markus_petrux commented 30 December 2008 at 15:14

Status:

Needs work

» Needs review

File	Size
database.inc-314464-79.patch	998 bytes

Here's the patch in #79 updated with an explanation on how named placeholders need to be formatted.

Comment #95

markus_petrux CreditAttribution: markus_petrux commented 30 December 2008 at 15:16

File	Size
database.inc-314464-95.patch	1.4 KB

Sorry, I attached the wrong one.

Comment #96

Dries CreditAttribution: Dries commented 30 December 2008 at 20:32

Status:

Needs review

» Fixed

I've committed the patch in comment #95. If necessary, we can follow-up with additional patches. I've marked it 'fixed' but feel free to re-open with a patch.