Closed (won't fix)
Project:
Drupal core
Version:
7.x-dev
Component:
database system
Priority:
Normal
Category:
Task
Assigned:
Unassigned
Reporter:
Created:
20 Nov 2005 at 14:03 UTC
Updated:
20 Oct 2014 at 12:21 UTC
Comments
Comment #1
chx commented:
Won't fix. Reason: postgresql.
Comment #2
killes@www.drop.org commented:
Wouldn't it be feasible to move the relevant (parts of) functions to the db-specific include files?
Comment #3
Wesley Tanaka commented:
My mistake about the printf-style arguments not getting passed. The same arguments get passed to the count query as to the main query. But this would still be nice to have when the query (and the needed count query) is an expensive join.
Comment #4
magico commented
Comment #5
FiReaNGeL commented:
Been looking at that issue and I just stumbled upon this (old) thread.
@killes: yes, it would be doable (and would be a Very Good Thing (tm)). However, it would require a rewrite of pager_query. Right now, it's doing the queries backward. To use SQL_CALC_FOUND_ROWS, you must do the count 'query' (actually, just retrieve the precalculated row count) after the actual query. Right now, pager_query is doing the COUNT(*) query, then using the result to determine the limit of the subsequent query. On large databases, it gets ugly very fast, as COUNT(*) is inefficient. I get COUNT queries of more than 2 seconds on my test box with more than 500,000 nodes.
The right way of doing it would be to run the query using SQL_CALC_FOUND_ROWS (on MySQL; pgsql would still use the old way) to determine the total number of rows. But you'd need to know in advance which limit to use on the original query, so you'd need to know in advance which page you're on (we have $page for that, and if it doesn't exist, we can assume page = 1). That's where I'm at right now, trying to figure out what
$pager_page_array[$element] = max(0, min((int)$pager_page_array[$element], ((int)$pager_total[$element]) - 1));
is for.
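For reference, that line just clamps the requested page number into the valid range 0 .. total - 1. A rough Python equivalent (function name is hypothetical, for illustration only):

```python
def clamp_page(requested_page, total_pages):
    """Clamp a requested pager page into [0, total_pages - 1], mirroring
    max(0, min((int) $pager_page_array[$element], (int) $pager_total[$element] - 1))."""
    return max(0, min(int(requested_page), int(total_pages) - 1))

print(clamp_page(7, 5))   # page past the end -> last page: 4
print(clamp_page(-3, 5))  # negative page -> first page: 0
print(clamp_page(2, 5))   # valid page is left alone: 2
```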
It would also be nice to detect if results == 0 and then redo the query for page 1, in case someone bookmarks a page which doesn't exist anymore.
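The two orderings described above can be sketched side by side. This is a toy model, not Drupal's actual pager code: the table, column names, and page size are made up, and SQLite stands in for MySQL (SQLite has no SQL_CALC_FOUND_ROWS, so that variant is shown only as commented SQL):

```python
import sqlite3

# Toy table standing in for {node}; schema and data are illustrative only.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE node (nid INTEGER PRIMARY KEY, changed INTEGER)")
db.executemany("INSERT INTO node (changed) VALUES (?)", [(i,) for i in range(95)])

PER_PAGE = 10

def pager_query_count_first(page):
    """Current pager_query order: COUNT(*) first, then the real query.
    Knowing the total lets us clamp the page before issuing LIMIT/OFFSET."""
    total = db.execute("SELECT COUNT(*) FROM node").fetchone()[0]
    pages = -(-total // PER_PAGE)            # ceiling division
    page = max(0, min(page, pages - 1))      # clamp into the valid range
    rows = db.execute(
        "SELECT nid FROM node ORDER BY changed DESC LIMIT ? OFFSET ?",
        (PER_PAGE, page * PER_PAGE),
    ).fetchall()
    return total, page, rows

# MySQL-only alternative: run the real query first with SQL_CALC_FOUND_ROWS,
# then fetch the precalculated total cheaply afterwards:
#   SELECT SQL_CALC_FOUND_ROWS nid FROM node ORDER BY changed DESC LIMIT 10;
#   SELECT FOUND_ROWS();
# Here the page must be guessed up front, and clamping can only happen after.

total, page, rows = pager_query_count_first(page=42)  # far past the end
print(total, page, len(rows))  # 95 rows -> clamped to last page 9, 5 rows left
```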
Comment #6
chx commented:
The current pager_query makes an effort to ensure the pager being displayed is somewhat correct. If someone deletes a couple of nodes from the view you are currently on, however, you can still click onto a page which does not exist. My patch, therefore, does not sweat over something that's futile anyway. The number of pages will be updated after the query runs, not before. But it runs one less query for MySQL. The price we pay is a small increase in the chance of returning an empty page, but I am emphasizing again: this chance was never zero, and it is still very small.
Comment #7
killes@www.drop.org commented:
The code looks very nice. It gets rid of one of the cranky regexps that used to give us trouble once in a while and replaces it with a much easier one. No test done.
Comment #8
dries commented:
Why does pager.inc call a private function (i.e. _db_paged_query())?
Comment #9
FiReaNGeL commented:
Now this is strange. I benchmarked this on my desktop (MySQL 5.0.11). I created a bunch of nodes (70,000) using devel's generate-content, then benchmarked the 'administer -> content management -> post' page.
MySQL query cache OFF (more on that later)
Before the patch:
SELECT n.*, u.name, u.uid FROM node n INNER JOIN users u ON n.uid = u.uid ORDER BY n.changed DESC LIMIT D runs in 1346.906 ms on average (stdev 158.538)
SELECT COUNT(*) FROM node n INNER JOIN users u ON n.uid = u.uid runs in 27 ms on average
After the patch:
SELECT SQL_CALC_FOUND_ROWS n.*, u.name, u.uid FROM node n INNER JOIN users u ON n.uid = u.uid ORDER BY n.changed DESC LIMIT D runs in 6300 ms?
The FOUND_ROWS query runs very fast as expected.
This is a very strange result. I'll do more testing on my own, but additional benchmarkers would be required. It is unexpected, I think; I'll bench other pages too to see if it's query-dependent (maybe the MySQL query optimizer just choked on that one with SQL_CALC_FOUND_ROWS?).
Now, more problems. It seems that the patch actually broke the pager, as pager links don't appear anymore. Only the links aren't displayed; I can access page 2 just fine with ?page=2.
To make things worse, it appears that in some cases SQL_CALC_FOUND_ROWS queries aren't cached by the MySQL query cache, resulting in heavily degraded performance. With the old implementation, the COUNT query would get cached and be very fast past the first user / page. Now, with the cache turned on, the SQL_CALC_FOUND_ROWS query always takes 6000 ms on that page; it doesn't get cached. I don't get this problem with the front-page pager_query, where the SQL_CALC_FOUND_ROWS query gets cached correctly. I don't have an explanation for this behavior, as I didn't set a limit on the size of queries that can be cached.
One thing that is consistent, though, is that even if it caches the first page of results, on the second page it will have to run the query again (it can't be query-cached, because the query isn't 100% identical: the limit has changed). This problem didn't exist with the old implementation; the COUNT query was constant and was cached correctly, resulting in fast page loads.
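The cache-key point can be made concrete: MySQL's query cache keys on the exact query text, so any byte difference is a miss. A small simulation (the cache here is a plain dict standing in for MySQL's behavior; the SQL strings are the ones from this thread, with a hypothetical page size of 50):

```python
# Simulate MySQL's query cache: a hit requires the exact same query text.
cache = {}

def cached(sql):
    """Return True if this exact SQL text was seen before, then remember it."""
    hit = sql in cache
    cache[sql] = True
    return hit

count_sql = "SELECT COUNT(*) FROM node n INNER JOIN users u ON n.uid = u.uid"
main_sql = ("SELECT SQL_CALC_FOUND_ROWS n.* FROM node n "
            "INNER JOIN users u ON n.uid = u.uid "
            "ORDER BY n.changed DESC LIMIT {offset}, 50")

# Old approach: the COUNT text is identical on every page -> cached after page 1.
hits_old = [cached(count_sql) for page in range(5)]
print(hits_old)   # [False, True, True, True, True]

# New approach: the LIMIT clause changes per page, so the text never repeats.
hits_new = [cached(main_sql.format(offset=page * 50)) for page in range(5)]
print(hits_new)   # [False, False, False, False, False]
```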
I really thought that this patch would help performance, but as of now it seems to make things worse for multiple reasons. Right now, -1 from me, sadly :(
Comment #10
FiReaNGeL commented:
After discussion with chx on IRC, we decided that the current system is superior to the SQL_CALC_FOUND_ROWS approach, if only because the MySQL query cache yields superior performance with the old COUNT approach.
Comment #11
hickory commented:
I haven't tried the patch yet, but a test of SQL_CALC_FOUND_ROWS on a query I just tried yielded slightly different results from those described above (PHP 5, MySQL 5).
A query that normally took about 2 seconds (about 200,000 nodes), still took the same amount of time when SQL_CALC_FOUND_ROWS was added, and with MySQL's query cache turned on the result of the query using SQL_CALC_FOUND_ROWS was cached. This means that at least for some queries you could eliminate the extra time spent running a COUNT() query by using SQL_CALC_FOUND_ROWS.
The problem remains that queries with different LIMIT values aren't cached, so you don't get the benefit of a cached COUNT() query when paging through the results. It's debatable, but I'd say that the speed benefit for loading the first page outweighs the slowdown for subsequent pages, which aren't going to be accessed as often.
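That tradeoff can be put into a back-of-the-envelope model. The timings below are taken loosely from the two benchmarks reported in this thread, and the 1 ms cost assumed for a cache-hit COUNT is made up; the access pattern (ten distinct page views) is also purely illustrative:

```python
def total_old_ms(main, count_first, count_cached, views):
    """Old approach: COUNT(*) + main query per view. The COUNT text never
    changes, so it is query-cached after the first view."""
    return main * views + count_first + count_cached * (views - 1)

def total_calc_ms(calc, views):
    """SQL_CALC_FOUND_ROWS: one combined query per view. Each page has a
    different LIMIT, so the cache never helps across pages."""
    return calc * views

# FiReaNGeL's numbers: the combined query was ~5x slower than the plain one.
print(total_old_ms(1347, 27, 1, 10), total_calc_ms(6300, 10))   # 13506 63000

# hickory's numbers: the combined query cost about the same as the plain one.
print(total_old_ms(2000, 2000, 1, 10), total_calc_ms(2000, 10)) # 22009 20000
```

Under hickory's timings the combined query stays slightly ahead even over many pages; under FiReaNGeL's it loses badly, which matches the disagreement in the thread.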
Comment #12
R.Muilwijk commented:
Subscribing...
Comment #13
owahab commented:
Subscribing.
Comment #14
sinasalek commented:
We all know that the MySQL query cache is not smart. On modern interactive websites that are constantly updating, the query cache can even slow down the whole system.
On enterprise websites with complex queries, the query cache is useless. Views, for example, generates lots of joins across many different tables, and MySQL is simply not capable of caching these sorts of queries effectively.
Comment #15
sinasalek commented:
Work on this issue is continued in #778050: Add support for database hints and make PagerDefault properly pluggeable. Please join in and help out if you're interested.