Dear Friends,

I am migrating a large PhpBB forum with +500.000 posts to Drupal. I am using PostgreSQL, which has very neat logging, query analysis and statics features. I log any query which executes in more than 30 ms. It seems that some queries are executing to slowly :

2009-03-13 17:23:47 CET LOG:  durée : 53.388 ms, instruction : SELECT COUNT(n.nid) FROM (SELECT DISTINCT ON (nid) * FROM node) n INNER JOIN term_node tn ON n.vid = tn.vid AND tn.tid = 31 LEFT JOIN history h ON n.nid = h.nid AND h.uid = 1 WHERE n.status = 1 AND n.created > 1234369419 AND h.nid IS NULL
2009-03-13 17:23:47 CET LOG:  durée : 83.108 ms, instruction : SELECT COUNT(n.nid) FROM (SELECT DISTINCT ON (nid) * FROM node) n INNER JOIN term_node tn ON n.vid = tn.vid AND tn.tid = 50 LEFT JOIN history h ON n.nid = h.nid AND h.uid = 1 WHERE n.status = 1 AND n.created > 1234369419 AND h.nid IS NULL
2009-03-13 17:23:47 CET LOG:  durée : 52.884 ms, instruction : SELECT COUNT(n.nid) FROM (SELECT DISTINCT ON (nid) * FROM node) n INNER JOIN term_node tn ON n.vid = tn.vid AND tn.tid = 4 LEFT JOIN history h ON n.nid = h.nid AND h.uid = 1 WHERE n.status = 1 AND n.created > 1234369419 AND h.nid IS NULL
2009-03-13 17:23:47 CET LOG:  durée : 54.215 ms, instruction : SELECT COUNT(n.nid) FROM (SELECT DISTINCT ON (nid) * FROM node) n INNER JOIN term_node tn ON n.vid = tn.vid AND tn.tid = 28 LEFT JOIN history h ON n.nid = h.nid AND h.uid = 1 WHERE n.status = 1 AND n.created > 1234369419 AND h.nid IS NULL
2009-03-13 17:23:47 CET LOG:  durée : 54.393 ms, instruction : SELECT COUNT(n.nid) FROM (SELECT DISTINCT ON (nid) * FROM node) n INNER JOIN term_node tn ON n.vid = tn.vid AND tn.tid = 29 LEFT JOIN history h ON n.nid = h.nid AND h.uid = 1 WHERE n.status = 1 AND n.created > 1234369419 AND h.nid IS NULL
2009-03-13 17:23:47 CET LOG:  durée : 54.316 ms, instruction : SELECT COUNT(n.nid) FROM (SELECT DISTINCT ON (nid) * FROM node) n INNER JOIN term_node tn ON n.vid = tn.vid AND tn.tid = 26 LEFT JOIN history h ON n.nid = h.nid AND h.uid = 1 WHERE n.status = 1 AND n.created > 1234369419 AND h.nid IS NULL
2009-03-13 17:23:47 CET LOG:  durée : 59.272 ms, instruction : SELECT COUNT(n.nid) FROM (SELECT DISTINCT ON (nid) * FROM node) n INNER JOIN term_node tn ON n.vid = tn.vid AND tn.tid = 25 LEFT JOIN history h ON n.nid = h.nid AND h.uid = 1 WHERE n.status = 1 AND n.created > 1234369419 AND h.nid IS NULL
2009-03-13 17:23:47 CET LOG:  durée : 58.622 ms, instruction : SELECT COUNT(n.nid) FROM (SELECT DISTINCT ON (nid) * FROM node) n INNER JOIN term_node tn ON n.vid = tn.vid AND tn.tid = 27 LEFT JOIN history h ON n.nid = h.nid AND h.uid = 1 WHERE n.status = 1 AND n.created > 1234369419 AND h.nid IS NULL
2009-03-13 17:23:47 CET LOG:  durée : 54.749 ms, instruction : SELECT COUNT(n.nid) FROM (SELECT DISTINCT ON (nid) * FROM node) n INNER JOIN term_node tn ON n.vid = tn.vid AND tn.tid = 23 LEFT JOIN history h ON n.nid = h.nid AND h.uid = 1 WHERE n.status = 1 AND n.created > 1234369419 AND h.nid IS NULL

Running EXPLAIN ANALYSE:

"Aggregate  (cost=4418.38..4418.39 rows=1 width=4) (actual time=74.428..74.429 rows=1 loops=1)"
"  ->  Nested Loop  (cost=12.98..4418.37 rows=1 width=4) (actual time=74.421..74.421 rows=0 loops=1)"
"        ->  Merge Left Join  (cost=12.98..4033.42 rows=56 width=8) (actual time=70.333..71.815 rows=1019 loops=1)"
"              Merge Cond: (n.nid = h.nid)"
"              Filter: (h.nid IS NULL)"
"              ->  Subquery Scan n  (cost=0.00..4020.11 rows=113 width=8) (actual time=70.258..71.366 rows=1025 loops=1)"
"                    Filter: ((n.created > 1234369419) AND (n.status = 1))"
"                    ->  Unique  (cost=0.00..3006.18 rows=67595 width=96) (actual time=0.019..55.687 rows=67484 loops=1)"
"                          ->  Index Scan using node_pkey on node  (cost=0.00..2837.19 rows=67595 width=96) (actual time=0.018..35.562 rows=67484 loops=1)"
"              ->  Sort  (cost=12.98..13.00 rows=10 width=4) (actual time=0.070..0.073 rows=11 loops=1)"
"                    Sort Key: h.nid"
"                    Sort Method:  quicksort  Memory: 25kB"
"                    ->  Bitmap Heap Scan on history h  (cost=4.33..12.81 rows=10 width=4) (actual time=0.035..0.039 rows=11 loops=1)"
"                          Recheck Cond: (uid = 1)"
"                          ->  Bitmap Index Scan on history_pkey  (cost=0.00..4.33 rows=10 width=0) (actual time=0.023..0.023 rows=11 loops=1)"
"                                Index Cond: (uid = 1)"
"        ->  Index Scan using term_node_vid_idx on term_node tn  (cost=0.00..6.86 rows=1 width=4) (actual time=0.002..0.002 rows=0 loops=1019)"
"              Index Cond: ((tn.vid)::integer = (n.vid)::integer)"
"              Filter: ((tn.tid)::integer = 31)"
"Total runtime: 74.522 ms"

Displaying 20 topics demands 1,5 second. If 500 users are connected, any database can die.
This may even be considered like a security issue, as anyone running queries on a forum can kill a database.

(only on large forums). This issue does not exist on small forums.

This seems to be this part of code in unread topics:

function _forum_topics_unread($term, $uid) {
  $sql = "SELECT COUNT(n.nid) FROM {node} n INNER JOIN {term_node} tn ON n.vid = tn.vid AND tn.tid = %d LEFT JOIN {history} h ON n.nid = h.nid AND h.uid = %d WHERE n.status = 1 AND n.created > %d AND h.nid IS NULL";
  $sql = db_rewrite_sql($sql);
  return db_result(db_query($sql, $term, $uid, NODE_NEW_LIMIT));
}

I don't have a solution yet.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

grub3’s picture

The DISTINCT CLAUSE is not needed as nid is the primary key from nodes.

SELECT COUNT(n.nid) 
FROM node n 
INNER JOIN term_node tn ON n.vid = tn.vid
LEFT JOIN history h ON n.nid = h.nid 
WHERE n.status = 1 AND n.created > 1234369419 AND h.nid IS NULL AND tn.tid = 31 AND h.uid = 1

This does not solve this issue.

grub3’s picture

A suitable solution would be to hide topic NEW answers on request.

In large forums, there is X topics (> 50.000) and Y (>500.000) posts.
If 1.000 simultaneous users are connecting, it make impossible to drive down queries with COUNT.

There can be virtually an infinity of SQL queries involving history.
It makes possible for attackers to run simultaneous queries.

The only solution would be to use MATERIALIZED views in PostgreSQL.
It is a wizer solution to disable NEW post count on large forums, based on user settings.

grub3’s picture

I think we should only display if NEW messages have been added. I looked at PhpBB code and this how they solve this problem. Counting new messages is very dangerous and should not be done. The solution is to look for queries which use "COUNT" and transform them into queries which only test if new messages exist.

I am not *** very sure *** but I think any wize attacker could drive down Drupal forum reading this message.

grub3’s picture

Priority: Normal » Critical

Priority: critical?

Damien Tournoud’s picture

Priority: Critical » Normal

This is exactly why we have Materialized views.

catch’s picture

Version: 6.9 » 7.x-dev
Priority: Normal » Critical

@Damien, yes it is, but since there's currently no code for materialized view API in CVS, nor a core issue to get it in, we should keep the actual core performance issues it's supposed to solve critical to avoid losing track of them.

I was running a forum on Drupal with tens of thousands of nodes and hundreds of thousands of comments on shared hosting and then a very small VPS for ages - it's not only sites with lots of resources which run up against these queries.

sun.core’s picture

Priority: Critical » Normal
Status: Active » Postponed (maintainer needs more info)

Still an issue with revised taxonomy index?

catch’s picture

mysql> EXPLAIN SELECT COUNT(n.nid) AS count FROM node n INNER JOIN forum f ON n.vid = f.vid AND f.tid = 7 INNER JOIN history h ON n.nid = h.nid AND h.uid = 1 WHERE (status = 1) AND (n.created > 123456789) AND (h.nid IS NULL );
+----+-------------+-------+------+---------------+------+---------+------+------+------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra            |
+----+-------------+-------+------+---------------+------+---------+------+------+------------------+
|  1 | SIMPLE      | NULL  | NULL | NULL          | NULL | NULL    | NULL | NULL | Impossible WHERE | 
+----+-------------+-------+------+---------------+------+---------+------+------+------------------+
1 row in set (0.00 sec)

hmm..

andypost’s picture

Status: Postponed (maintainer needs more info) » Needs review

Current query was wrong and fixed in #877020: _forum_topics_unread() broken

mysql> EXPLAIN SELECT COUNT(n.nid) AS count FROM node n INNER JOIN forum f ON n.vid = f.vid AND f.tid = 35 LEFT JOIN history h ON n.nid = h.nid AND h.uid = 1 WHERE (status = 1) AND (n.created > 123456789) AND (h.nid IS NULL );
+----+-------------+-------+--------+-----------------------------------+---------+---------+----------------+------+--------------------------------------+
| id | select_type | table | type   | possible_keys                     | key     | key_len | ref            | rows | Extra                                |
+----+-------------+-------+--------+-----------------------------------+---------+---------+----------------+------+--------------------------------------+
|  1 | SIMPLE      | f     | ref    | PRIMARY,tid                       | tid     | 4       | const          |    2 | Using index                          |
|  1 | SIMPLE      | n     | eq_ref | vid,node_created,node_status_type | vid     | 4       | d7.f.vid       |    1 | Using where                          |
|  1 | SIMPLE      | h     | eq_ref | PRIMARY,nid                       | PRIMARY | 8       | const,d7.n.nid |    1 | Using where; Using index; Not exists |
+----+-------------+-------+--------+-----------------------------------+---------+---------+----------------+------+--------------------------------------+
3 rows in set (0.00 sec)
grub3’s picture

To run an actual EXPLAIN ANALYSE, you need a real database.
The query planner will choose different options depending on table size.
On small tables, the tables will ALWAYS be cached in memory.

I can only suggest to use PostgreSQL + pgAdmin3 and 500.000+ rows for your testings.

You may be interested to read my developer guide here:

Guidelines for SQL:
http://drupal.org/node/559302

The most interesting part if :
Run EXPLAIN on slow queries to fin sequential scans
http://drupal.org/node/559720

Typically, in a Forum, we need to avoid any kind of sequential scan.

Dries’s picture

Status: Needs review » Needs work

No patch yet.

andypost’s picture

@jmpoure I have no forum with big number of posts to explain

Elin Yordanov’s picture

Is that issue a duplicate of #145353: Forums performance improvements?

This is a big performance issue on content heavy sites. On my site, it takes about 340 ms to calculate unread topics of all forums.
See the attached screenshot from XHProf report.