We received the following report from an optimization consultant:

The drupal_queue module has what could arguably called a bug. The following statement: (N is an integer date, seconds since epoch)

update queue set expired = 0 where expired < N

This should really be:

update queue set expired = 0 where expired != 0 and expired < N

The difference is the first update targets 230,000 rows to update, the 2nd targets about 11. And obviously updating the expired to 0 when its already 0 doesn't make much sense. This was being fired off via cron every 5 mins during the run.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

alex_b’s picture

Title: Performance optimization on query » Queue: Performance optimization on query
Project: Drupal Queue » Drupal core
Version: 6.x-1.1 » 7.x-dev
Component: Code » system.module
Status: Active » Needs review
FileSize
729 bytes

Great find. This affects Drupal 7 and Drupal Queue module:

// in system_cron():
db_update('queue')
  ->fields(array(
    'expire' => 0,
  ))
  ->condition('expire', REQUEST_TIME, '<')
  ->execute();

should be:

// in system_cron():
db_update('queue')
  ->fields(array(
    'expire' => 0,
  ))
  ->condition('expire', 0, '<>')
  ->condition('expire', REQUEST_TIME, '<')
  ->execute();
moshe weitzman’s picture

Um, couldn't you say this same thing about any update query? It would get boring reading so many queries like SET city=Boston WHERE city!=Boston AND skill=high. Seems like the MySQL compiler could just do this internally (but maybe it doesn't and thats the whole point).

alex_b’s picture

moshe: not quite. The first query updates all records that are smaller than REQUEST_TIME (230000 rows in the example above), while the patched query updates only those that aren't 0 (11 rows in the example above). The potential scale of the queue table makes the difference here.

pwolanin’s picture

@alex_b - are you sure about that? I think mysql does it already. For example, on a test site w/ 4 nodes:

mysql> update node set status = 1;
Query OK, 0 rows affected (0.00 sec)
Rows matched: 4  Changed: 0  Warnings: 0
pillarsdotnet’s picture

The following is a NOP on every SQL implemenation I've used:

update foo set bar = baz where bar = baz
pwolanin’s picture

Unless we can show some actual performance difference, I think this should be closed.

pillarsdotnet’s picture

Status: Needs review » Closed (works as designed)
Damien Tournoud’s picture

Status: Closed (works as designed) » Active

Hm? I don't see MySQL doing that, or at least it is not documented.

Damien Tournoud’s picture

Ok, doesn't seem documented, but seems to happen anyway on MySQL:

mysql> UPDATE comment SET status = 1;
Query OK, 12143 rows affected (0.58 sec)
Rows matched: 12143  Changed: 12143  Warnings: 0

mysql> UPDATE comment SET status = 1;
Query OK, 0 rows affected (0.04 sec)
Rows matched: 12143  Changed: 0  Warnings: 0

Could we get some testing done on PostgreSQL and SQLite?

Damien Tournoud’s picture

The SQLite Driver does that out of the box (because we manually transform the queries), can anyone confirm that PostgreSQL does that too?

pillarsdotnet’s picture

FileSize
945 bytes

Output of mysql.test:

Creating test database and seeding with 1024 rows of random data
Dumping test database for repeatable results
Testing 100 iterations of query: "update queue set expire = 0 where expire < 2147483648"
Run 1: real 2.61 user 0.67 sys 0.34
Run 2: real 1.24 user 0.50 sys 0.21
Run 3: real 1.19 user 0.43 sys 0.21
Run 4: real 1.25 user 0.49 sys 0.23
Run 5: real 1.27 user 0.48 sys 0.24
Testing 100 iterations of query: "update queue set expire = 0 where expire <> 0 AND expire < 2147483648"
Run 1: real 1.18 user 0.45 sys 0.21
Run 2: real 1.23 user 0.47 sys 0.24
Run 3: real 1.26 user 0.48 sys 0.23
Run 4: real 1.89 user 0.61 sys 0.30
Run 5: real 1.47 user 0.58 sys 0.25

Downloading and installing postgres now, even though I think it's pointless...

pillarsdotnet’s picture

Okay, redid the scripts because Postgres doesn't have an "unsigned" datatype, and because the numbers were too small to be meaningful.

mysql.test results:

Creating test database and seeding with 65536 rows of random data
Dumping test database for repeatable results
Testing 100 iterations of query: "update queue set expire = 0 where expire < 1"
Run 1: real 1.47 user 0.53 sys 0.26
Run 2: real 1.35 user 0.49 sys 0.22
Run 3: real 1.16 user 0.42 sys 0.20
Run 4: real 1.24 user 0.47 sys 0.19
Run 5: real 1.46 user 0.52 sys 0.26
Testing 100 iterations of query: "update queue set expire = 0 where expire <> 0 AND expire < 1"
Run 1: real 1.25 user 0.41 sys 0.20
Run 2: real 1.32 user 0.50 sys 0.27
Run 3: real 1.64 user 0.51 sys 0.26
Run 4: real 1.26 user 0.47 sys 0.21
Run 5: real 1.25 user 0.51 sys 0.21

postgresql.test results:

Creating test database and seeding with 16,536 rows of random data
DROP DATABASE
CREATE DATABASE
NOTICE:  CREATE TABLE will create implicit sequence "queue_qid_seq" for serial column "queue.qid"
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "queue_pkey" for table "queue"
CREATE INDEX
Dumping test database for repeatable results
Testing 100 iterations of query: "update queue set expire = 0 where expire < 1"
Run 1: real 179.28 user 6.00 sys 1.04
Run 2: real 177.24 user 6.26 sys 1.09
Run 3: real 177.75 user 6.34 sys 1.14
Run 4: real 177.29 user 6.23 sys 1.12
Run 5: real 175.35 user 6.33 sys 1.14
Testing 100 iterations of query: "update queue set expire = 0 where expire <> 0 AND expire < 1"
Run 1: real 8.55 user 3.42 sys 0.71
Run 2: real 8.73 user 3.37 sys 0.68
Run 3: real 9.21 user 3.37 sys 0.70
Run 4: real 8.37 user 3.35 sys 0.70
Run 5: real 8.41 user 3.40 sys 0.67

Well I'll be... I guess it does make a difference...

EDIT: removed line-noise from cut-and-paste above.

PPS: The mysql output should also say 16,536 rows... (65536 / 4).

pillarsdotnet’s picture

FileSize
1.01 KB
919 bytes

bah. files didn't attach...

neclimdul’s picture

Postgres probably is doing the "right" thing from a academic standpoint. Don't see why we wouldn't play along. We really don't want to update where expire = 0 so we might as well not rely on the db to figure that out for us.

pillarsdotnet’s picture

Category: bug » task
Issue tags: +Performance

Tagging.

This isn't really a "Bug report", as nothing is broken; it just runs sub-optimally under Postgres. Since no new features are introduced, it isn't a "Feature Request", so I guess it's a "Task".

pillarsdotnet’s picture

Issue tags: +PostgreSQL

tag.

pyrollo’s picture

The following is a NOP on every SQL implemenation I've used

From a data point of view, it could be. But if the database implements triggers they should be triggered on every row (affected or not). That's how it works on Oracle databases and I guess on PostgreSQL.

This can explain why PostgreSQL performance are so different for the two requests.

neclimdul’s picture

There's your academic(actually reasonable real world) explanation. Either way, we're not trying to re-update the old ones but expire new items so I'm for the addition. It makes even more sense to avoid in the event triggers would be run on the old items.

pyrollo’s picture

I agree with you : unless you actually want to trigger something you should only update what is needed.

I did not agree with the idea that an update leaving unchanged data is always equivalent to no operation.

Dave Reid’s picture

Issue tags: +Queue API

Adding tag since this change would need to be backported to http://drupal.org/project/drupal_queue when fixed in core.

pillarsdotnet’s picture

Title: Queue: Performance optimization on query » Do not set expired=0 where expired is already 0.
Version: 7.x-dev » 8.x-dev
Category: task » bug
Status: Active » Needs review
Issue tags: +Needs backport to D7
FileSize
796 bytes

Re-roll for d8, better title, and tags.

pwolanin’s picture

Style question - do we use '<>' or '!=' for core? I thought the latter.

Damien Tournoud’s picture

We use <>, that is the ANSI operator.

neclimdul’s picture

I don't know if its documented anywhere but I'd prefer the ANSI operator as well.

pwolanin’s picture

Maybe we need to start a companion page for Drupal 7? This looks a bit stale: http://drupal.org/node/2497

pillarsdotnet’s picture

@#22, #23, #24, #25 -- Does any of this require a change to the patch in #21? Should it change to "needs work" or "reviewed & tested"?

pillarsdotnet’s picture

#21: system_cron-879076-21.patch queued for re-testing.

pillarsdotnet’s picture

Status: Needs review » Reviewed & tested by the community

Any reason this shouldn't be RTBC ?

webchick’s picture

Status: Reviewed & tested by the community » Needs review

Because no one else has reviewed it yet. :)

pillarsdotnet’s picture

pillarsdotnet’s picture

neclimdul’s picture

Status: Needs review » Reviewed & tested by the community

I've been running this in production for a couple weeks with the drupal_queue module without issue(mysql). Being as this isn't adding any odd functionality just a small modification to make ACID compliant DB's faster and the only semi negative reviews have been code standard related and addressed I'll mark this RTBC.

webchick’s picture

Status: Reviewed & tested by the community » Fixed

Cool, thanks neclimdul!

Committed and pushed to 8.x and 7.x.

Dave Reid’s picture

This has been fixed in drupal_queue.module for Drupal 6 as well.
http://drupalcode.org/project/drupal_queue.git/commit/07374f2

Status: Fixed » Closed (fixed)
Issue tags: -Performance, -PostgreSQL, -Needs backport to D7, -Queue API

Automatically closed -- issue fixed for 2 weeks with no activity.