Displaying statistics pages can be slow. The attached patch removes one of the "GROUP BY" columns to increase the speed of generating the "Top pages in the past n days" page.

(Grouping by 'title' does not work, as there are many different paths that can have the same title. Instead, 'path' is unique for each page, so it is a more logical column to group by. I tested this on my active webpage to verify that the resulting page was what I expected.)

CommentFileSizeAuthor
#6 statistics_9.patch1.16 KBJeremy
statistics.module-cvs_0.patch615 bytesJeremy
Support from Acquia helps fund testing for Drupal Acquia logo

Comments

drumm’s picture

How about a key on the path column?

Dries’s picture

Status: Needs review » Fixed

Committed to HEAD. Marking this active until we clarifie the path index thing.

Dries’s picture

Status: Fixed » Active
Jeremy’s picture

In which case several of the columns should have keys. I was going to add it to my earlier patch, but haven't had time.

Cvbge’s picture

SQL code as in the patch won't work with postgresql:

dt=> SELECT COUNT(path) AS hits, path, title, AVG(timer) AS average_time, SUM(timer) AS total_time FROM accesslog GROUP BY path;
ERROR:  column "accesslog.title" must appear in the GROUP BY clause or be used in an aggregate function
Jeremy’s picture

FileSize
1.16 KB

The attached adds three keys that I confirmed are used. The keys are on path, url and uid.

Before adding a key for "path":

EXPLAIN SELECT a.aid, a.timestamp, a.url, a.uid, u.name FROM accesslog a LEFT JOIN users u ON a.uid = u.uid WHERE a.path LIKE 'node/1%%'

table	type	possible_keys	key	key_len	ref	rows	Extra
a	ALL					75	Using where
u	eq_ref	PRIMARY	PRIMARY	4	a.uid	1	

After:

table	type	possible_keys	key	key_len	ref	rows	Extra
a	range	path	path	256		3	Using where
u	eq_ref	PRIMARY	PRIMARY	4	a.uid	1	

And another query that gains from the "path" key:

EXPLAIN SELECT COUNT(DISTINCT(path)) FROM accesslog;

table   	type	possible_keys	key	key_len	ref	rows	Extra
accesslog	index		path	256		75	Using index

Before adding the key for "url":

EXPLAIN SELECT COUNT(DISTINCT(url)) FROM accesslog WHERE url <> '' AND url NOT LIKE '%%node%%';

le      	type	possible_keys	key	key_len	ref	rows	Extra
accesslog	ALL					75	Using where

And after:

table   	type	possible_keys	key	key_len	ref	rows	Extra
accesslog	index		url	256		75	Using where; Using index

Before adding the key for "uid":

EXPLAIN SELECT COUNT(DISTINCT(uid)) FROM accesslog;

Result
table   	type	possible_keys	key	key_len	ref	rows	Extra
accesslog	ALL					75	

After:

table   	type	possible_keys	key	key_len	ref	rows	Extra
accesslog	index		uid	5		75	Using index
Jeremy’s picture

Status: Active » Needs review
moshe weitzman’s picture

sure, these indices speed up the admin pages. but remember that every index needsb to be maintained for every insert. since accesslog is inserted into on every view, this is potentially harmful to a lot more people than admins. i'm not sure how to measure this tradeoff.

one approach is to copy the access log table to a read only table and do admin pages off of the copy. that means we have a copy dedicated to reading a different one dedicated to writing.

drumm’s picture

MySQL's insert delayed extension would be perfect for this. Unfortunately it is not ANSI SQL.

The inserts are done in the exit hook so the extra time is not usually passed on to the user (when drupal_goto() is used the exit hook execution happens before the redirect is sent). Although I'm guessing the total index maintenance time may be larger than the savings on the statistics pages.

killes@www.drop.org’s picture

Status: Needs review » Needs work

postgres part is missing.

Cvbge’s picture

I think this needs to be benchmarked. I think adding 3 indexes that need to be updated for every page access just for the sake of 1 admin-visible page is dubious.

Jaza’s picture

Version: x.y.z » 4.7.x-dev
Status: Needs work » Closed (fixed)

-1 from me. The accesslog table has a much heavier INSERT than SELECT rate on higher-traffic sites, and as such, we should have as few keys as possible on this table, in order to optimise it for fast INSERTing, not for fast SELECTing.

Closing issue (due to my personal -1, and due to extended inactivity).