Closed (won't fix)
Project:
Drupal core
Version:
7.x-dev
Component:
database system
Priority:
Normal
Category:
Task
Assigned:
Unassigned
Reporter:
Created:
20 Nov 2005 at 14:03 UTC
Updated:
20 Oct 2014 at 12:21 UTC
Comments
Comment #1
chx commented:
Won't fix. Reason: postgresql.
Comment #2
killes@www.drop.org commented:
Wouldn't it be feasible to move the relevant (parts of) functions to the db-specific include files?
Comment #3
Wesley Tanaka commented:
My mistake about the printf-style arguments not getting passed. The same arguments get passed to the count query as to the main query. But this would still be nice to have when the query (and the needed count query) is an expensive join.
Comment #4
magico commented
Comment #5
FiReaNGeL commented:
Been looking at that issue and I just stumbled upon this (old) thread.
@killes: yes, it would be doable (and would be a Very Good Thing (tm)). However, it would require a rewrite of pager_query. Right now, it's doing the queries backward. To use SQL_CALC_FOUND_ROWS, you must do the count 'query' (actually, just retrieve the precalculated row count) after the actual query. Right now, pager_query is doing the COUNT(*) query, then using the result to determine the limit of the subsequent query. On large databases, it gets ugly very fast, as COUNT(*) is inefficient. I get COUNT queries of more than 2 seconds on my test box with more than 500,000 nodes.
The right way of doing it would be to run the query using SQL_CALC_FOUND_ROWS (on MySQL; pgsql would still use the old way) to determine the total number of rows. But you'd need to know in advance which limit to use on the original query, so you'd need to know in advance which page you're on (we have $page for that, and if it doesn't exist, we can assume page = 1). That's where I'm at right now, trying to figure out what
$pager_page_array[$element] = max(0, min((int)$pager_page_array[$element], ((int)$pager_total[$element]) - 1));
is for.
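For reference, that line just clamps the requested page number into the valid range 0 .. total - 1. A rough Python equivalent (function name is hypothetical, for illustration only):

```python
def clamp_page(requested_page, total_pages):
    """Clamp a requested pager page into [0, total_pages - 1], mirroring
    max(0, min((int) $pager_page_array[$element], (int) $pager_total[$element] - 1))."""
    return max(0, min(int(requested_page), int(total_pages) - 1))

print(clamp_page(7, 5))   # page past the end -> last page: 4
print(clamp_page(-3, 5))  # negative page -> first page: 0
print(clamp_page(2, 5))   # valid page is left alone: 2
```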
It would also be nice to detect if results == 0 and then redo the query for page 1, in case someone bookmarks a page which doesn't exist anymore.
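The two orderings described above can be sketched side by side. This is a toy model, not Drupal's actual pager code: the table, column names, and page size are made up, and SQLite stands in for MySQL (SQLite has no SQL_CALC_FOUND_ROWS, so that variant is shown only as commented SQL):

```python
import sqlite3

# Toy table standing in for {node}; schema and data are illustrative only.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE node (nid INTEGER PRIMARY KEY, changed INTEGER)")
db.executemany("INSERT INTO node (changed) VALUES (?)", [(i,) for i in range(95)])

PER_PAGE = 10

def pager_query_count_first(page):
    """Current pager_query order: COUNT(*) first, then the real query.
    Knowing the total lets us clamp the page before issuing LIMIT/OFFSET."""
    total = db.execute("SELECT COUNT(*) FROM node").fetchone()[0]
    pages = -(-total // PER_PAGE)            # ceiling division
    page = max(0, min(page, pages - 1))      # clamp into the valid range
    rows = db.execute(
        "SELECT nid FROM node ORDER BY changed DESC LIMIT ? OFFSET ?",
        (PER_PAGE, page * PER_PAGE),
    ).fetchall()
    return total, page, rows

# MySQL-only alternative: run the real query first with SQL_CALC_FOUND_ROWS,
# then fetch the precalculated total cheaply afterwards:
#   SELECT SQL_CALC_FOUND_ROWS nid FROM node ORDER BY changed DESC LIMIT 10;
#   SELECT FOUND_ROWS();
# Here the page must be guessed up front, and clamping can only happen after.

total, page, rows = pager_query_count_first(page=42)  # far past the end
print(total, page, len(rows))  # 95 rows -> clamped to last page 9, 5 rows left
```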
Comment #6
chx commented:
The current pager_query makes an effort to ensure the pager being displayed is somewhat correct. If someone deletes a couple of nodes from the view you are currently on, however, you can still click onto a page which does not exist. My patch, therefore, does not sweat over something that's futile anyway. The number of pages will be updated after the query runs, not before. But it runs one less query for MySQL. The price we pay is a small increase in the chance of returning an empty page, but I am emphasizing again: this chance was never zero, and it is still very small.
Comment #7
killes@www.drop.org commented:
The code looks very nice. It gets rid of one of the cranky regexps that used to give us trouble once in a while and replaces it with a much easier one. No test done.
Comment #8
dries commented:
Why does pager.inc call a private function (i.e. _db_paged_query())?
Comment #9
FiReaNGeL commented:
Now this is strange. I benchmarked this on my desktop (MySQL 5.0.11). I created a bunch of nodes (70,000) using devel's generate-content, then benchmarked the 'administer -> content management -> post' page.
MySQL query cache OFF (more on that later)
Before the patch:
SELECT n.*, u.name, u.uid FROM node n INNER JOIN users u ON n.uid = u.uid ORDER BY n.changed DESC LIMIT D runs in 1346.906 ms on average (stdev 158.538)
SELECT COUNT(*) FROM node n INNER JOIN users u ON n.uid = u.uid runs in 27 ms on average
After the patch:
SELECT SQL_CALC_FOUND_ROWS n.*, u.name, u.uid FROM node n INNER JOIN users u ON n.uid = u.uid ORDER BY n.changed DESC LIMIT D runs in 6300 ms?
The FOUND_ROWS query runs very fast as expected.
This is a very strange result. I'll do more testing on my own, but additional benchmarkers would be required. It is unexpected, I think; I'll bench other pages too to see if it's query-dependent (maybe the MySQL query optimizer just choked on that one with SQL_CALC_FOUND_ROWS?).
Now, more problems. It seems that the patch actually broke the pager, as pager links don't appear anymore. Only the links aren't displayed; I can access page 2 just fine with ?page=2.
To make things worse, it appears that in some cases SQL_CALC_FOUND_ROWS queries aren't cached by the MySQL query cache, resulting in heavily degraded performance. With the old implementation, the COUNT query would get cached and be very fast past the first user / page. Now, with the cache turned on, the SQL_CALC_FOUND_ROWS query always takes 6000 ms on that page; it doesn't get cached. I don't get this problem with the front-page pager_query, where the SQL_CALC_FOUND_ROWS query gets cached correctly. I don't have an explanation for this behavior, as I didn't set a limit on the size of queries that can be cached.
One thing that is consistent, though, is that even if it caches the first page of results, on the second page it will have to run the query again (it can't be query-cached, because the query isn't 100% identical: the limit has changed). This problem didn't exist with the old implementation; the COUNT query was constant and was cached correctly, resulting in fast page loads.
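The cache-key point can be made concrete: MySQL's query cache keys on the exact query text, so any byte difference is a miss. A small simulation (the cache here is a plain dict standing in for MySQL's behavior; the SQL strings are the ones from this thread, with a hypothetical page size of 50):

```python
# Simulate MySQL's query cache: a hit requires the exact same query text.
cache = {}

def cached(sql):
    """Return True if this exact SQL text was seen before, then remember it."""
    hit = sql in cache
    cache[sql] = True
    return hit

count_sql = "SELECT COUNT(*) FROM node n INNER JOIN users u ON n.uid = u.uid"
main_sql = ("SELECT SQL_CALC_FOUND_ROWS n.* FROM node n "
            "INNER JOIN users u ON n.uid = u.uid "
            "ORDER BY n.changed DESC LIMIT {offset}, 50")

# Old approach: the COUNT text is identical on every page -> cached after page 1.
hits_old = [cached(count_sql) for page in range(5)]
print(hits_old)   # [False, True, True, True, True]

# New approach: the LIMIT clause changes per page, so the text never repeats.
hits_new = [cached(main_sql.format(offset=page * 50)) for page in range(5)]
print(hits_new)   # [False, False, False, False, False]
```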
I really thought that this patch would help performance, but as of now it seems to make things worse for multiple reasons. Right now, -1 from me, sadly :(
Comment #10
FiReaNGeL commented:
After discussion with chx on IRC, we decided that the current system is superior to the SQL_CALC_FOUND_ROWS approach, if only because the MySQL query cache yields superior performance with the old COUNT approach.
Comment #11
hickory commented:
I haven't tried the patch yet, but a test of SQL_CALC_FOUND_ROWS on a query I just tried yielded slightly different results from those described above (PHP 5, MySQL 5).
A query that normally took about 2 seconds (about 200,000 nodes), still took the same amount of time when SQL_CALC_FOUND_ROWS was added, and with MySQL's query cache turned on the result of the query using SQL_CALC_FOUND_ROWS was cached. This means that at least for some queries you could eliminate the extra time spent running a COUNT() query by using SQL_CALC_FOUND_ROWS.
The problem remains that queries with different LIMIT values aren't cached, so you don't get the benefit of a cached COUNT() query when paging through the results. It's debatable, but I'd say that the speed benefit for loading the first page outweighs the slowdown for subsequent pages, which aren't going to be accessed as often.
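That tradeoff can be put into a back-of-the-envelope model. The timings below are taken loosely from the two benchmarks reported in this thread, and the 1 ms cost assumed for a cache-hit COUNT is made up; the access pattern (ten distinct page views) is also purely illustrative:

```python
def total_old_ms(main, count_first, count_cached, views):
    """Old approach: COUNT(*) + main query per view. The COUNT text never
    changes, so it is query-cached after the first view."""
    return main * views + count_first + count_cached * (views - 1)

def total_calc_ms(calc, views):
    """SQL_CALC_FOUND_ROWS: one combined query per view. Each page has a
    different LIMIT, so the cache never helps across pages."""
    return calc * views

# FiReaNGeL's numbers: the combined query was ~5x slower than the plain one.
print(total_old_ms(1347, 27, 1, 10), total_calc_ms(6300, 10))   # 13506 63000

# hickory's numbers: the combined query cost about the same as the plain one.
print(total_old_ms(2000, 2000, 1, 10), total_calc_ms(2000, 10)) # 22009 20000
```

Under hickory's timings the combined query stays slightly ahead even over many pages; under FiReaNGeL's it loses badly, which matches the disagreement in the thread.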
Comment #12
R.Muilwijk commented:
Subscribing...
Comment #13
owahab commented:
Subscribing.
Comment #14
sinasalek commented:
We all know that the MySQL query cache is not smart. On modern interactive websites that are constantly updating, the query cache can even slow down the whole system.
On enterprise websites with complex queries, the query cache is useless. Views, for example, generates lots of joins across many different tables, and MySQL is simply not capable of caching these sorts of queries effectively.
Comment #15
sinasalek commented:
Work on this issue is continued in #778050: Add support for database hints and make PagerDefault properly pluggeable. Please join in and help out if you're interested.