I've been looking at a large Drupal 6 site running notifications, comment posting is very slow due to the following query:

INSERT INTO notifications_queue (uid, destination, sid, module, eid, send_interval, send_method, cron, created, conditions) SELECT DISTINCT s.uid, s.destination, s.sid, s.module, 3467363, s.send_interval, s.send_method, s.cron, 1298441289, s.conditions FROM notifications s LEFT JOIN notifications_fields f ON s.sid = f.sid WHERE (s.status = 1) AND (s.event_type = 'node') AND (s.send_interval >= 0) AND (s.uid <> 1) AND ((f.field = 'tid' AND f.intval IN (16651,16661,16671,16681,4431,1761,711,2161,16651,16661,4431,16681,16671,1761,711,2161,16651,16661,4431,16681,16671,1761,711,2161,16651,16661,4431,16681,16671,1761,711,2161,16651,16661,4431,16681,16671,1761,711,2161,16651,16661,4431,16681,16671,1761,711,2161,1761,711,16651,16661,4431,2161,16681,16671,16651,16661,4431,16681,16671,1761,711,2161,16651,16661,4431,16681,16671,1761,711,2161,16651,16661,4431,16681,16671,1761,711,2161,16651,16661,4431,16681,16671,1761,711,2161,16651,16661,4431,16681,16671,1761,711,2161,16651,16661,4431,16681,16671,1761,711,2161,16651,16661,4431,16681,16671,1761,711,2161,16651,16661,4431,16681,16671,1761,711,2161)) OR (f.field = 'nid' AND f.intval = 1496241) OR (f.field = 'type' AND f.value = 'blog') OR (f.field = 'author' AND f.intval = 268241)) GROUP BY s.uid, s.destination, s.sid, s.module, s.send_interval, s.send_method, s.cron, s.conditions HAVING s.conditions = count(f.sid)

EXPLAIN (of the same query except just the SELECT):

+----+-------------+-------+------+-----------------------------+---------+---------+---------------+--------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+-----------------------------+---------+---------+---------------+--------+----------------------------------------------+
| 1 | SIMPLE | s | ALL | PRIMARY,uid_type_conditions | NULL | NULL | NULL | 206910 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | f | ref | PRIMARY | PRIMARY | 4 | connect.s.sid | 1 | Using where |
+----+-------------+-------+-----

This takes around 1.5 seconds on average.

Can half the number of rows examined by added an additional index to {notifications}. Late here so no patch - will sort that tomorrow, but:

CREATE INDEX queue ON notifications (status, event_type, send_interval, uid);

Now the explain looks like:

MariaDB [connect]> EXPLAIN SELECT DISTINCT s.uid, s.destination, s.sid, s.module, 3467363, s.send_interval, s.send_method, s.cron, 1298441289, s.conditions FROM notifications s LEFT JOIN notifications_fields f ON s.sid = f.sid WHERE (s.status = 1) AND (s.event_type = 'node') AND (s.send_interval >= 0) AND (s.uid <> 1) AND ((f.field = 'tid' AND f.intval IN (16651,16661,16671,16681,4431,1761,711,2161,16651,16661,4431,16681,16671,1761,711,2161,16651,16661,4431,16681,16671,1761,711,2161,16651,16661,4431,16681,16671,1761,711,2161,16651,16661,4431,16681,16671,1761,711,2161,16651,16661,4431,16681,16671,1761,711,2161,1761,711,16651,16661,4431,2161,16681,16671,16651,16661,4431,16681,16671,1761,711,2161,16651,16661,4431,16681,16671,1761,711,2161,16651,16661,4431,16681,16671,1761,711,2161,16651,16661,4431,16681,16671,1761,711,2161,16651,16661,4431,16681,16671,1761,711,2161,16651,16661,4431,16681,16671,1761,711,2161,16651,16661,4431,16681,16671,1761,711,2161,16651,16661,4431,16681,16671,1761,711,2161)) OR (f.field = 'nid' AND f.intval = 1496241) OR (f.field = 'type' AND f.value = 'blog') OR (f.field = 'author' AND f.intval = 268241)) GROUP BY s.uid, s.destination, s.sid, s.module, s.send_interval, s.send_method, s.cron, s.conditions HAVING s.conditions = count(f.sid);
+----+-------------+-------+------+-----------------------------------+---------+---------+---------------+--------+----------------------------------------------+
| id | select_type | table | type | possible_keys                     | key     | key_len | ref           | rows   | Extra                                        |
+----+-------------+-------+------+-----------------------------------+---------+---------+---------------+--------+----------------------------------------------+
|  1 | SIMPLE      | s     | ref  | PRIMARY,uid_type_conditions,queue | queue   | 772     | const,const   | 107625 | Using where; Using temporary; Using filesort |
|  1 | SIMPLE      | f     | ref  | PRIMARY                           | PRIMARY | 4       | connect.s.sid |      1 | Using where                                  |
+----+-------------+-------+------+-----------------------------------+---------+---------+---------------+--------+----------------------------------------------+
2 rows in set (0.00 sec)

Counts from the two tables:

MariaDB [connect]> SELECT COUNT(*) FROM notifications;
+----------+
| COUNT(*) |
+----------+
|   214620 |
+----------+
1 row in set (0.05 sec)

MariaDB [connect]> SELECT COUNT(*) FROM notifications_fields;
+----------+
| COUNT(*) |
+----------+
|   398802 |
+----------+
1 row in set (0.09 sec)
Support from Acquia helps fund testing for Drupal Acquia logo

Comments

catch’s picture

Version: 7.x-1.x-dev » 6.x-2.x-dev
Status: Active » Needs review
FileSize
573 bytes

Here's the patch. I'm not sure what the status of the different versions is, so against 2.x-dev for now.

catch’s picture

Had an idea to optimize the hook_node_*() of this further (not the actual query, just take the hard work out of the node/comment posting) - posted an issue at #1075862: Notifications queuing adds a lot of time to node/comment inserts, doesn't affect the patch here.

mstef’s picture

Status: Needs review » Needs work

Isn't this patch forgetting to alter hook_schema() as well?

mstef’s picture

Status: Needs work » Needs review
FileSize
1.35 KB

Attached patch added 3 indicies to improve performance. Also added the changes to hook_schema().

rjbrown99’s picture

This seems to impact the 6.x-4.x branch as well. I have similar queries showing up.

INSERT INTO notifications_queue (uid, mdid, send_method, sid, module, eid, send_interval, language, cron, created, conditions) SELECT DISTINCT s.uid, s.mdid, s.send_method, s.sid, s.module, ?, s.send_interval, s.language, s.cron, ?, s.conditions FROM notifications s LEFT JOIN notifications_fields f ON s.sid = f.sid WHERE (s.status = ?) AND (s.event_type = '?') AND (s.send_interval >= ?) AND (s.uid <> ?) AND ((f.field = '?' AND f.intval IN (?)) OR (f.field = '?' AND f.intval = ?) OR (f.field = '?' AND f.value = '?') OR (f.field = '?' AND f.intval = ?)) GROUP BY s.uid, s.mdid, s.send_method, s.sid, s.module, s.send_interval, s.cron, s.conditions, s.language HAVING s.conditions = count(f.sid)
rjbrown99’s picture

FWIW, the patch in #4 still needs to add the indexes to the schema. IE, something like

'indexes' => array('uid_type_conditions' => array('uid', 'type', 'conditions')),
mstef’s picture

Oh, I think I screwed that up..

pdrake’s picture

I fixed the schema issues in #4 and added one additional index that reduces the number of rows examined in the query below by more than 99.9%, and reduces the execution time from ~4.06s to ~0.00s:

SELECT n.sid FROM notifications n INNER JOIN notifications_fields nf? ON n.sid = nf?.sid WHERE (n.event_type = ?) AND (nf?.field = ?) AND (nf?.intval = ?)

pdrake’s picture

Here's a followup that adds another beneficial index. Specifically, when inserting items into the notifications_queue table, a select is performed which uses both notifications_fields.value and notifications_fields.intval. This index reduces query execution time from ~4.7s to ~0.3s for queries like:

INSERT INTO notifications_queue (uid, destination, sid, module, eid, send_interval, send_method, cron, created, conditions) SELECT DISTINCT s.uid, s.destination, s.sid, s.module, ?, s.send_interval, s.send_method, s.cron, ?, s.conditions FROM notifications s LEFT JOIN notifications_fields f USE INDEX (field_intval) ON s.sid = f.sid WHERE (s.status = 1) AND (s.event_type = 'node') AND (s.send_interval >= 0) AND ((f.field = 'tid' AND f.intval IN (?,?)) OR (f.field = 'nid' AND f.intval = ?) OR (f.field = 'type' AND f.value = '?') OR (f.field = 'author' AND f.intval = ?)) GROUP BY s.uid, s.destination, s.sid, s.module, s.send_interval, s.send_method, s.cron, s.conditions HAVING s.conditions = count(f.sid);

rjbrown99’s picture

catch’s picture

I tried #9 on a site that's having problems with this query, but didn't see an improvement, here's the EXPLAIN:

mysql> EXPLAIN SELECT DISTINCT s.uid, s.destination, s.sid, s.module, 8373081, s.send_interval, s.send_method, s.cron, 1352627514, s.conditions FROM notifications s LEFT JOIN notifications_fields f ON s.sid = f.sid WHERE (s.status = 1) AND (s.event_type = 'node') AND (s.send_interval >= 0) AND (s.uid <> 981841) AND ((f.field = 'tid' AND f.intval IN (711,1383,2341)) OR (f.field = 'nid' AND f.intval = 2567041) OR (f.field = 'type' AND f.value = 'sc_forum') OR (f.field = 'author' AND f.intval = 2725901)) GROUP BY s.uid, s.destination, s.sid, s.module, s.send_interval, s.send_method, s.cron, s.conditions HAVING s.conditions = count(f.sid);
+----+-------------+-------+------+-----------------------------------------+---------+---------+--------------------+--------+----------------------------------------------+
| id | select_type | table | type | possible_keys                           | key     | key_len | ref                | rows   | Extra                                        |
+----+-------------+-------+------+-----------------------------------------+---------+---------+--------------------+--------+----------------------------------------------+
|  1 | SIMPLE      | s     | ref  | PRIMARY,uid_type_conditions,queue       | queue   | 772     | const,const        | 225780 | Using where; Using temporary; Using filesort | 
|  1 | SIMPLE      | f     | ref  | PRIMARY,field_intval,field_value_intval | PRIMARY | 4       | connect_prod.s.sid |      1 | Using where                                  | 
+----+-------------+-------+------+-----------------------------------------+---------+---------+--------------------+--------+----------------------------------------------+
2 rows in set (0.00 sec)

And forcing the index field_intval or field_value_intval to be used actually increases the number of rows evaluated:

mysql> EXPLAIN SELECT DISTINCT s.uid, s.destination, s.sid, s.module, 8373081, s.send_interval, s.send_method, s.cron, 1352627514, s.conditions FROM notifications s LEFT JOIN notifications_fields f USE INDEX (field_intval) ON s.sid = f.sid WHERE (s.status = 1) AND (s.event_type = 'node') AND (s.send_interval >= 0) AND (s.uid <> 981841) AND ((f.field = 'tid' AND f.intval IN (711,1383,2341)) OR (f.field = 'nid' AND f.intval = 2567041) OR (f.field = 'type' AND f.value = 'sc_forum') OR (f.field = 'author' AND f.intval = 2725901)) GROUP BY s.uid, s.destination, s.sid, s.module, s.send_interval, s.send_method, s.cron, s.conditions HAVING s.conditions = count(f.sid);
+----+-------------+-------+--------+-----------------------------------+---------+---------+--------------------+--------+----------------------------------------------+
| id | select_type | table | type   | possible_keys                     | key     | key_len | ref                | rows   | Extra                                        |
+----+-------------+-------+--------+-----------------------------------+---------+---------+--------------------+--------+----------------------------------------------+
|  1 | SIMPLE      | f     | ALL    | field_intval                      | NULL    | NULL    | NULL               | 749568 | Using where; Using temporary; Using filesort | 
|  1 | SIMPLE      | s     | eq_ref | PRIMARY,uid_type_conditions,queue | PRIMARY | 4       | connect_prod.f.sid |      1 | Using where                                  | 
+----+-------------+-------+--------+-----------------------------------+---------+---------+--------------------+--------+----------------------------------------------+
2 rows in set (0.00 sec)

I don't have any good ideas to improve this yet, just updating the issue for now.

pdrake’s picture

@catch note that the query addressed by the index in #9 is a bit different than the original query and the query in #11 (specifically, the original query has a WHERE clause like "AND (s.uid <> ?)."

catch’s picture

Hmm OK it looks like that's due to this variable:

    // Add one more condition if we don't send notifications on own posts
    if (!variable_get('notifications_sendself', 0) && !empty($event->uid)) {
      $query['where'][] = 's.uid <> %d';
      $query['where args'][] = $event->uid;
    }

However removing that condition the EXPLAIN still doesn't look improved:

mysql> EXPLAIN SELECT s.uid, s.destination, s.sid, s.module, 8373081, s.send_interval, s.send_method, s.cron, 1352627514, s.conditions FROM notifications s LEFT JOIN notifications_fields f USE INDEX (field_intval) ON s.sid = f.sid WHERE (s.status = 1) AND (s.event_type = 'node') AND (s.send_interval >= 0) AND ((f.field = 'tid' AND f.intval IN (711,1383,2341)) OR (f.field = 'nid' AND f.intval = 2567041) OR (f.field = 'type' AND f.value = 'sc_forum') OR (f.field = 'author' AND f.intval = 2725901)) GROUP BY s.uid, s.destination, s.sid, s.module, s.send_interval, s.send_method, s.cron, s.conditions HAVING s.conditions = count(f.sid);
+----+-------------+-------+--------+---------------+---------+---------+--------------------+--------+----------------------------------------------+
| id | select_type | table | type   | possible_keys | key     | key_len | ref                | rows   | Extra                                        |
+----+-------------+-------+--------+---------------+---------+---------+--------------------+--------+----------------------------------------------+
|  1 | SIMPLE      | f     | ALL    | field_intval  | NULL    | NULL    | NULL               | 749881 | Using where; Using temporary; Using filesort | 
|  1 | SIMPLE      | s     | eq_ref | PRIMARY,queue | PRIMARY | 4       | connect_prod.f.sid |      1 | Using where                                  | 
+----+-------------+-------+--------+---------------+---------+---------+--------------------+--------+----------------------------------------------+
2 rows in set (0.00 sec)

mysql> EXPLAIN SELECT DISTINCT s.uid, s.destination, s.sid, s.module, 8373081, s.send_interval, s.send_method, s.cron, 1352627514, s.conditions FROM notifications s LEFT JOIN notifications_fields f ON s.sid = f.sid WHERE (s.status = 1) AND (s.event_type = 'node') AND (s.send_interval >= 0) AND (s.uid <> 981841) AND ((f.field = 'tid' AND f.intval IN (711,1383,2341)) OR (f.field = 'nid' AND f.intval = 2567041) OR (f.field = 'type' AND f.value = 'sc_forum') OR (f.field = 'author' AND f.intval = 2725901)) GROUP BY s.uid, s.destination, s.sid, s.module, s.send_interval, s.send_method, s.cron, s.conditions HAVING s.conditions = count(f.sid);