This issue was created to consolidate comments and discussions about a possible solution for D6 users experiencing duplicate nodes in views and other listings. The discussion of this issue has been taking place in several issue threads, and this issue is a place to consolidate any discussion about the specific problem indicated below.

The problem

In node.module, node_db_rewrite_sql() adds a JOIN between node and node_access to implement node access restrictions. As the number of node_access entries for a single node grows, multiple node_access entries may match the JOIN conditions, thus resulting in duplicate rows. When multiple modules participate (such as og, domain_access, etc.), the problem can result in duplicates appearing in views, blocks, and other node listings where module implementers have not considered the possibility of such duplicates.

Solution

There is debate about the proper solution to this, and attempts have been made at the module level to solve this problem by introducing DISTINCT clauses to eliminate duplicates. However, my original belief was the problem should be solved at the core level by changing node_db_rewrite_sql() so that it uses subselects within an introduced WHERE clause instead of a JOIN. This was a pragmatic change for us to our own site, and solved the duplicate problem entirely.

Here is the simple replacement code we have been using:

/**
* Implementation of hook_db_rewrite_sql
*/
function node_db_rewrite_sql($query, $primary_table, $primary_field) {
  if ($primary_field == 'nid' && !node_access_view_all_nodes()) {
    $wsql = _node_access_where_sql();
    if ($wsql) {
      $return['where'] = '((SELECT COUNT(*) FROM node_access na WHERE '.$primary_table.
      '.nid = na.nid AND ('. $wsql .')) > 0)';
    }
    return $return;
  }
}

Attached is a complete patch for the node module in 6.15

Status and history

I am a relative newcomer to such changes. While working well for us, I believe such a patch is a dramatic change, and further input from more experience Drupal developers is required.

  • The issue was first introduced #580838: Duplicate nodes in views - how to eliminate them. The original problem had to do with an identical query problem in the domain_taxonomy module. As development for us progressed, we realized that the same problem existed in the node module and we decided to patch our system there.
  • @nonzero introduced the change as a possible solution in #284392: db_rewrite_sql causing issues with DISTINCT. However, that thread was long and complex and was really attempting to solve a problem with db_distinct_fields() which manifested itself in similar ways.
  • This issue was opened as a place to discuss the node.module patch above, whether it is suitable and a reasonable solution to the problem for D6 users.

If you believe the solution I proposed warrants further consideration, please indicate as such. I am especially interested in whether there are D7 migration issues introduced by such a change.

CommentFileSizeAuthor
#194 681760-node-access-d6-194.patch776 bytespuddyglum
#185 681760-multi-realm-access-d7-185-tests.patch4.33 KBNiklas Fiekas
#185 681760-multi-realm-access-d7-185.patch7.57 KBNiklas Fiekas
#183 681760-multi-realm-access-d7-183-tests.patch4.24 KBNiklas Fiekas
#183 681760-multi-realm-access-d7-183.patch7.49 KBNiklas Fiekas
#178 681760-multi-realm-access-178-tests.patch4.28 KBNiklas Fiekas
#178 681760-multi-realm-access-178.patch7.54 KBNiklas Fiekas
#174 node-access-before-and-after.png14.26 KBpuddyglum
#172 681760-d7.patch7.45 KBxjm
#164 681760-comment-pager-test-only-2.patch4.24 KBNiklas Fiekas
#164 681760-comment-pager-test-combined-2.patch7.51 KBNiklas Fiekas
#163 681760-comment-pager-test-only.patch3.08 KBxjm
#163 681760-comment-pager-test-combined.patch6.34 KBxjm
#162 681760-post-apocalypse.patch3.26 KBxjm
#156 681760-node-access-d8-new.patch3.24 KBagentrickard
#153 681760-node-access-working-D7.patch3.24 KBagentrickard
#152 681760-node-access.patch3.3 KBagentrickard
#149 681760-node-query.patch4.47 KBagentrickard
#148 681760-node-access-working-D7.patch3.24 KBagentrickard
#142 681760-node-access-working-D7.patch3.22 KBagentrickard
#140 681760-node-access-working-D7.patch2.9 KBagentrickard
#139 681760-node-access.patch2.88 KBagentrickard
#138 681760-node-query-D8.patch4.38 KBagentrickard
#127 681760_node_access-127-D6.patch742 bytespwolanin
#125 681760_node_access_d6_1.patch742 bytesJosh Waihi
#119 681760_node_access_d6.patch750 bytescatch
#117 681760_node_access_d6.patch737 bytescatch
#114 681760_node_access_d6.patch729 bytescatch
#112 beforepatch.png46.06 KBpuddyglum
#112 withpatch.png46.32 KBpuddyglum
#101 Screen shot 2010-12-31 at 12.46.56 PM.png30.51 KBagentrickard
#99 681760-node-query_1.patch5.37 KBagentrickard
#98 Screen shot 2010-12-31 at 12.46.56 PM.png30.51 KBagentrickard
#97 Screen shot 2010-12-31 at 12.36.17 PM.png32.21 KBagentrickard
#92 Picture 1.png45.6 KBagentrickard
#83 681760-node-query_1.patch5.34 KBagentrickard
#82 681760-node-query_1.patch6.26 KBagentrickard
#81 681760-node-query_1.patch5.28 KBagentrickard
#75 681760-node-query.patch4.31 KBchx
#74 681760-node-query.patch4.51 KBagentrickard
#72 681760-node-query.patch3.38 KBagentrickard
#70 681760-nosq-query.patch3.28 KBagentrickard
#66 681760-node_access_subselects_0_0.patch662 bytesagentrickard
#61 681760-node_access_subselects.patch657 bytesNaX
#37 681760-comment-3149138-node_access_subselects.patch728 bytesJosh Waihi
#21 681760-node_access_subselects.patch1.54 KBJosh Waihi
#7 681760-node_access_subselects.patch1.54 KBJosh Waihi
node_access_subselects_0615.patch1.54 KBgarywiz
Support from Acquia helps fund testing for Drupal Acquia logo

Comments

izmeez’s picture

subscribing

crea’s picture

Subscribing

t.lan’s picture

subsribing

jide’s picture

subscribe

joeredhat-at-yahoo.com’s picture

Subscribing

ibis’s picture

subscribing

Josh Waihi’s picture

Version: 6.15 » 6.x-dev
Status: Active » Needs review
FileSize
1.54 KB

I like this implementation. I've tested and found it to work so far. But we should get more reviews since this is an alteration that is very much in the core of Drupal. It has major effects implications if such a fix were to be wrong.

I've attached the same patch with a better patch diff to make it easier to apply.

ptorrsmith’s picture

Subscribing... will look to utilise on a project which has that issue currently.

ptorrsmith’s picture

it worked a treat!

Saved Josh Waihi's patch and ran

patch -p0 < 681760-node_access_subselects.patch

OUTPUT...
patching file modules/node/node.module
Hunk #1 succeeded at 2068 (offset 9 lines).
Hunk #2 succeeded at 2159 (offset 9 lines).

and sure enough my duplicates disappear :-)

Of course not 100% sure it doesn't affect other parts of the site, so will be keeping an eye out for that.

Thanks.

Peter

smzur22’s picture

subscribing

webservant316’s picture

Category: bug » support

subscribing

It would be nice to see some comments from Drupal core developers about this patch.
Patching core does seem to be a serious matter.

Does the Drupal core team know about this thread? Is a fix underway for 6.15+ or 7.0?

Josh Waihi’s picture

I work with Drupal Core (mainly Drupal 7). Drupal 7 doesn't allow you to rewrite sql with regexes, it has a proper abstraction layer. Would be a good idea to get more core people to look at this though

webservant316’s picture

Josh - Thanks for responding. Thanks also if you could recruit comment from more core people.

I must chose either this patch or likely the patch #397 at http://drupal.org/node/284392 in order to solve this duplicate node problem in my views. Unfortunately I don't have the time to dive in and thoroughly understand all dimensions of this issue personally. From as much as I have read the patch proposed on this thread seems the better, if only because his argument is that the use of DISTINCT to solve the duplication error seems like a kludge and just the wrong way to solve the problem. Somehow the original query must have the sensibility to know what it is after in the first place. In my case it is clear that the duplication is caused by a join of the node table with tables in the TAC module and Role module, because the number of duplicates match the number of roles existing for whatever user I am logged in as. I don't know where the error is, whether at the core or the module level, but the original query must be smart enough to know that one positive test for permission should give me one node record, not a record for each role I am in that tests positive for permission.

In the end I just need to figure out which patch the Drupal team agrees is the best. I don't want to chose a patch that complicates or even ruins my upgrade path to Drupal 7.0 or upgrade paths needed for Views or TAC.

So again this patch seems to be the better with my limited understanding, but what does the Drupal team say?

webservant316’s picture

Category: bug » support
Priority: Critical » Normal

A summary of my research into the duplicate nodes in Views bug.

1. When I saw this bug I started a post at http://drupal.org/node/715074 titled 'Node view filtered by taxonomy term results in duplicates by user roles!?'. Someone directed me to another post...

2. So I joined the mega post at http://drupal.org/node/284392 titled 'db_rewrite_sql causing issues with DISTINCT'. Various patches were suggested with the leader seeming to be #397. I was ready to install the patch when another person directed me to yet another thread.

3. So I also joined this post here, http://drupal.org/node/681760, titled 'Eliminating duplicate nodes caused by node_access table joins'. Frankly this guys patch seemed to make more sense to me using the sub-select instead of the DISTINCT to get rid of duplicates which shouldn't have been there in the first place.

4. While meditating over this issue with extremely limited understanding it seemed to me that duplicates are being created in the Views module because joins of various nature are included in the query which the views module does not seem to be aware of. In my case I figured out that my node table was joined with tables from TAC and roles because the number of duplicates matched the number of roles that my current user was linked to. The error here, again, is that the test for permission to access the node should result in one node returned, not one node for each role I am a member of.

5. And while waiting to hear back from my posts on this issue, I had to continue with my work and while creating yet another View I noticed that the Views interface provides the option of 'distinct' yes/no. Curious. The note by this option says 'This will make the view display only distinct items. If there are multiple identical items, each will be displayed only once. You can use this to try and remove duplicates from a view, though it does not always work. Note that this can slow queries down, so use it with caution.' It seems to me that the author of the Views module recognized that sometimes duplicates would appear because of mysterious joins or whatever beyond their control and so they simply added the distinct flag to deal with the issue. This appears to be a brute force method to deal with an underlying flaw, and their comment is unsetttling, maybe it will work and maybe it will not. In the end I simply used this flag, inspite of the performance warnings, and all the duplicates disappeared in my Views without any patch. This seemed safer for now rather than patching core. As for performance issues, I can understanding the warning because in my case my Views are reading NODE X USER-ROLES instead of just NODE number of records. Whereever the fix for this is applied it should not use DISTINCT to remove duplicates caused by a join of the node table with a permissions access table because this will result in asking the database to crunch multiple rows of the same node every single time I access a node. Sure a database might be fast enough to conceal the fact, but the query needs to be designed more thoughtfully to test for permission and then ask for one copy of the node. This raises the interesting dilema that though this fix should be in the core, at least in my case the problem is caused by TAC which is not core. I don't understand Drupal well enough to know where the demarkations of abstractions are, but this bug seems to crash a boundary between core and the contributed TAC module. After testing many access modules I settled on TAC and love it. Perhaps it should be added to core.

6. I am thinking that using this distinct flag within the View interface itself might be about effectively the same as using #397 anyway, so I plan to just use the flag instead of the patch unless someone explains something differently. Yet, the sub-select solution here at http://drupal.org/node/681760 seems to be headed in the direction of a solution for integration into the Drupal core.

7. Last question...I only observed the duplicates in my use of the Views module and so using the 'distinct' flag in the Views module cured my problem. However, do the patches in the threads above deal with possible duplication in others areas as well or is this problem limited to the Views module? This question was answered. Yes the bug effects all locations a node is accessed, not just Views. In my case Views was the only place I noticed the problem, so the Views flag solves my problem for now without a core patch and so I will stick with that until core developers work through the all the issues and propose the solution with their better understanding.

So my Views problem is solved for now, but am I hoping that a solution is integrated into Drupal core quickly to restore performance back to the optimum.

Josh Waihi’s picture

Category: support » bug
Priority: Normal » Critical

bumping this up to critical to get some attention from Drupal 6 core guys. I justify this as a bug and critical since it is core that is breaking core functionality identified by contrib.

advseb’s picture

subscribing

Skorpjon’s picture

subscribing.

The replacement in node.module works for me.

benoitg’s picture

subscribing

xjm’s picture

Tracking.

taka’s picture

subscribing

Josh Waihi’s picture

fixed db_prefixing issue.

gengel’s picture

Category: support » bug
Priority: Normal » Critical

Subscribing

webservant316’s picture

Little surprised that we are not getting comments from the Drupal 6 guys or am I missing them.

Anticosti’s picture

Subscribing

martti’s picture

i tried this and still working..

pwolanin’s picture

Drupal 6 requires MySQL 4.1 or higher, so this seems like a reasonable fix and the logic makes sense.

izmeez’s picture

tested patch in #21 on Drupal 6.15 with test bed exhibiting duplicates and it solves the duplicates there.

I am just a novice but this fix seems to make sense as possibly a better solution than the fix in /includes/database.* files which I believe was committed to Drupal 6.16

It would be nice to hear what the experts think of this.

Thanks,

Izzy

Josh Waihi’s picture

Status: Needs review » Reviewed & tested by the community

@izmeez can you point me in the direction of the issue that made the commit?

izmeez’s picture

Josh,

According to 6.16 release notes the following was added:
http://drupal.org/node/284392, "db_rewrite_sql causing issues with DISTINCT"

EDIT: I have been using the "distinct" patch since the problem became evident in September 2009, but wanted to test the "subselects" patch in this issue for myself which I reported on in #27 and it works for me. Now, I look to see what the database gurus figure is the best. Meanwhile, I will just use Drupal 6.16 as is.

Izzy

ducdebreme’s picture

subscribing

EurekaLiz’s picture

This worked perfectly, thank you.

dabro’s picture

subscribing

dtoshack’s picture

subscribing

tobey_p’s picture

I still had some issues with dupes after upgrading to drupal 6.16 (Domain access active).
I needed Views with the module "Views Group By" to enable the SQL Aggregation Function Count.
Then turning on Distinct broke the whole query.

I read through a lot of related posts and was finaly directed here. The patch from 22 fixed the issues for me (so it seems after some checking). Additionally i have the feeling that some complex Views are rendering faster now.

I've to check through my whole Site and maybe do some benchmarking for my views.

Thanks alot for the patch and cheers,

tobey

sethcohn’s picture

Another +1 for the patch in #21.

Upgraded a site to Drupal 6.16 from a version much earlier (6.6), using workflow modules, (everything works prior to upgrades) and anonymous user trying to hit a page suddenly caused 360 seconds timeouts, (while user 1 and node admins surfed it just fine, due to never triggering a node access check) which I tracked down to the node_access sql call (aka the infamous Distinct issue), and despite the recent patches to core up thru 6.16, it's clear this patch is also needed. Removing workflow to help debug this, showed that it was clearly the node access checking thru multiple levels that causes the problem.

Installing the above patch restored the site to speedy goodness and no timeouts.

Gábor Hojtsy’s picture

Status: Reviewed & tested by the community » Needs work

Looking at the latest code posted:

1. This removes a private looking function. It is only invoked from one place in core but is clearly written in a way to be reused (ie. it has arguments not passed from its current invocation in core). Don't think we should break our API here.
2. The relation of this patch to the other DISTINCT node access issue was not explained (or I did not find it).
3. The patch has coding style issues with whitespace and concatenation.

Josh Waihi’s picture

Status: Needs work » Needs review
FileSize
728 bytes

1. Yup, thats what functions are for, to be re-used. Nevertheless, it is intended to be a private function. As in not used by other modules. While I do think the function should be moved. It only creates unused space not being there.
2. I guess this patch need re-evaluating considering the issue as apparently been taken care of.
3. attached is the patch with betting coding style.

pwolanin’s picture

Well, current core may work in most cases, I think this is still a better approach.

Gábor Hojtsy’s picture

@Josh Waihi: You are saying its not used by other modules. What methodology did you use to check that none of the thousands of Drupal 6 modules use this function in any of their branches?

pwolanin’s picture

@Gabor - the last patch keeps that function, just in case it's used elsewhere.

Josh Waihi’s picture

Status: Needs review » Reviewed & tested by the community

@Gabor, I said it is intended, I didn't say it wasn't. IMO, if a module uses that function, it shouldn't because it was suppose to be private. OO Drupal++

eltrufa’s picture

subscribe

xjm’s picture

KarenS’s picture

I ran into some problems that were fixed by this patch and not fixed by the DISTINCT patch that already went into core. In my case it was the combination of an aggregate query of nodes protected by Organic Groups. Regular views were fine, but views queries that did COUNT() or SUM() were all botched if viewed by regular users.

The patch fixed the problems I was seeing and did not cause any new problems anywhere that I could see.

Damien Tournoud’s picture

Version: 6.x-dev » 7.x-dev
Status: Reviewed & tested by the community » Patch (to be ported)

This definitely sounds like a good idea.

*But*, we fix bugs in the development version of Drupal, before backporting them. Because the query is essentially the same in Drupal 7, we should first work it out there, test the hell out of it and *after that* consider a backport to Drupal 6.

Randomly applying patches to Drupal 6 is what put us in this distinct mess in the first place :)

KarenS’s picture

Um, there is no 'node_db_rewrite_sql' in D7. This is not a problem in D7, at least it is handled in a completely different way in D7, so it only applies to D6 and earlier.

However I agree that there should be more than one report that it works. I have moved the patch to a production site where I was having trouble and so far it works and nothing else has emerged as a problem.

I didn't change the version back in case I am missing something, but I think there is no D7 applicability here.

Berdir’s picture

D7 might work differently, but the end result is the same. It does use DISTINCT as well...

http://api.drupal.org/api/function/node_query_node_access_alter/7 is D7's node_db_rewrite_sql...

izmeez’s picture

@KarenS Did you have an opportunity to test without the Distinct patch (Drupal 6.13 to 6.15) and with the Join patch to determine if the Join patch by itself is sufficient and possibly better?

KarenS’s picture

I haven't had time to look into what would happen without the other patch.

georgh’s picture

I have applied a modified patch #21 (see below) in a production site, it works fine. I think it would make more sense to use exists instead of counting the returned elements, as it states more clearly what is done. Although it is irrelevant in this case, using exists over count(*)>0 may perform better (it is like having an 'exit loop' instruction when finding an array element in a loop, http://sqlblog.com/blogs/andrew_kelly/archive/2007/12/15/exists-vs-count..., http://www.sqlmag.com/article/tsql3/exists-vs-count-.aspx).

This patch (#21) is the right thing to do, because instead of eliminating duplicates in the distinct (in the original function), which is an assumption that is not true for some queries, it does not generate them in the first place. Instead, the query only selects the correct nodes once, which is the intended purpose of the query rewrite.

My suggestion is to change

$return['where'] = '((SELECT COUNT(*) FROM node_access na WHERE ' . $primary_table . '.nid = na.nid AND ('. $wsql .')) > 0)';

to

$return['where'] = 'EXISTS (SELECT * FROM {node_access} na WHERE ' . $primary_table . '.nid = na.nid AND (' . $wsql . '))';

(I'm sorry for not attaching a proper patch, but I don't know how to use CVS)

xjm’s picture

#50: You don't need to use CVS. Use diff.
http://drupal.org/patch/create

lsiden’s picture

Friends, getting rid of "DISTINCT" has another benefit: it makes the query run much faster. I wrote this post: http://drupal.org/node/924660, before I found this thread. I'm not sure why, but including DISTINCT in a query stops the mySQL engine from using the index on 'nid', the node primary key. As I mentioned in my post, I've become responsible for a site with 19M nodes, and running this query can take hours when the index on nid is not used. (When the index on 'nid' is used, it take milliseconds.)

Doing a bit more research I see that on the D6 version of my site, the node_access table is empty, while on the D5 version, it has over 20M rows. So something got lost when I upgraded. I'm wondering whether dumping 20M nodes will fix this, or whether I should just enter a single row (nid=0, gid=0, realm=all, grant_view=1, grant_update=0, grant_delete=0).

srcsantos’s picture

I had a problem with taxonomy term count: it would double the count for any user other than user1.
At first, i changed the taxonomy module query to use "DISTINCT", but after reading this thread i reverted the modification and applied a patch made from #21 and #50 to my 6.19 Drupal, and everything seems fine now. I'm not a MySQL expert, but with my programming experience #50 seems to make sense.
Since i'm not so good with diff, here's the modification i made to the node module:

function node_db_rewrite_sql($query, $primary_table, $primary_field) {
  if ($primary_field == 'nid' && !node_access_view_all_nodes()) {
    $return['join'] = _node_access_join_sql($primary_table);
    $return['where'] = _node_access_where_sql();
    $return['distinct'] = 1;
    return $return;
  }
}

became

function node_db_rewrite_sql($query, $primary_table, $primary_field) {
  if ($primary_field == 'nid' && !node_access_view_all_nodes()) {
    $wsql = _node_access_where_sql();
    if ($wsql) {
      $return['where'] = 'EXISTS (SELECT * FROM {node_access} na WHERE '.$primary_table.
      '.nid = na.nid AND ('. $wsql .'))';
    }
    return $return;
  }
}

I know hacking core is not such a good idea, but this was the best solution i could find, and both my head and the wall were starting to resent the banging :).

webchick’s picture

Status: Patch (to be ported) » Needs review

Weird. Why is this listed as Patch (to be ported)?

webchick’s picture

Status: Needs review » Needs work

oh. because it is.

Weird, this doesn't show up in the critical issue block at that status, so moving to "needs work"

chx’s picture

Priority: Critical » Major

http://drupal.org/node/45111 says "When a bug renders a system unusable (not being able to create content or upgrade between versions, blocks not displaying…), and security vulnerabilities." we have already established, many many times that any bug present in previous Drupal versions can not be critical. Well, my friends this is with us for six years since 4.5 (six years and five days if you want to be precise) so how on earth is this critical?

KarenS’s picture

This was created as an issue to fix a problem in D6. Once a possible solution was found, instead of getting it applied to D6 it got changed to a D7 issue. It really has always been a critical issue that just never got solved and has just gotten bumped from version to version. It still exists in D6 because it has not yet been fixed in D7, so it's not really fair to say it cannot be critical in D7 because it exists in D6.

I don't know if it's 'critical' or 'major' in D7, but it's a nasty issue in D6 and earlier versions that has bitten me again and again. It's not critical at all if you don't use node access with table joins. If you do, it is.

reuel’s picture

subscribing

bitsgrecco’s picture

Version: 7.x-dev » 6.9

Subscribing

cdale’s picture

Version: 6.9 » 7.x-dev

This should be 7.x as per #45.

NaX’s picture

Status: Needs work » Needs review
FileSize
657 bytes

I came across this issue while using Views Calc on a 6.16 site. Patch #21 worked for me and I also implemented the suggestion in #50.

From what I can tell the EXISTS function is well supported across other database systems. I don't use SQLite or PGSQL, if other users can confirm its support I vote for using the EXISTS method.

The attached patch is a against D6 dev. I don't know about D7 somebody part of the D7 development effort will need help here.

I saw there were some discussions on the removal of _node_access_join_sql(). This patch does not remove the private function _node_access_join_sql().

Status: Needs review » Needs work

The last submitted patch, 681760-node_access_subselects.patch, failed testing.

NToronto’s picture

Status: Needs work » Needs review

node_access_subselects_0615.patch queued for re-testing.

agentrickard’s picture

And.....tracking with no useful comment. Sorry.

pwolanin’s picture

Seems like EXISTS may work for mysql and postgres, and for sqlite >= 3.1 http://www.sqlite.org/cvstrac/wiki?p=UnsupportedSql.

agentrickard’s picture

Revised for D6. Working on D7.

We don't want SELECT *. SELECT na.nid is preferable.

agentrickard’s picture

For reference on the D6 version. Note the very ugly temporary table creation we can remove. Also fixed incorrect pagination results.

Unpatched core.

mysql> EXPLAIN SELECT DISTINCT n.nid, n.sticky, n.created FROM node n INNER JOIN node_access na ON na.nid = n.nid WHERE (na.grant_view >= 1 AND ((na.gid = 0 AND na.realm = 'all') OR (na.gid = 0 AND na.realm = 'domain_site') OR (na.gid = 0 AND na.realm = 'domain_id'))) AND ( n.promote = 1 AND n.status = 1 )ORDER BY n.sticky DESC, n.created DESC LIMIT 0, 10
    -> ;
+----+-------------+-------+------+-----------------------------------------+----------------+---------+------------------------+------+-----------------------------------------------------------+
| id | select_type | table | type | possible_keys                           | key            | key_len | ref                    | rows | Extra                                                     |
+----+-------------+-------+------+-----------------------------------------+----------------+---------+------------------------+------+-----------------------------------------------------------+
|  1 | SIMPLE      | n     | ref  | PRIMARY,node_frontpage,node_status_type | node_frontpage | 8       | const,const            |    1 | Using where; Using index; Using temporary; Using filesort |
|  1 | SIMPLE      | na    | ref  | PRIMARY                                 | PRIMARY        | 8       | drupal_cvs.n.nid,const |    1 | Using where; Distinct                                     |
+----+-------------+-------+------+-----------------------------------------+----------------+---------+------------------------+------+-----------------------------------------------------------+
2 rows in set (0.03 sec)

Patched.

mysql> EXPLAIN SELECT n.nid, n.sticky, n.created FROM node n WHERE (EXISTS (SELECT na.nid FROM node_access na WHERE n.nid = na.nid AND (na.grant_view >= 1 AND ((na.gid = 0 AND na.realm = 'all') OR (na.gid = 0 AND na.realm = 'domain_site') OR (na.gid = 0 AND na.realm = 'domain_id'))))) AND ( n.promote = 1 AND n.status = 1 )ORDER BY n.sticky DESC, n.created DESC LIMIT 0, 10;
+----+--------------------+-------+------+--------------------------------------+---------------------+---------+-----------------------+------+-----------------------------+
| id | select_type        | table | type | possible_keys                        | key                 | key_len | ref                   | rows | Extra                       |
+----+--------------------+-------+------+--------------------------------------+---------------------+---------+-----------------------+------+-----------------------------+
|  1 | PRIMARY            | n     | ref  | node_promote_status,node_status_type | node_promote_status | 8       | const,const           |   94 | Using where; Using filesort |
|  2 | DEPENDENT SUBQUERY | na    | ref  | PRIMARY                              | PRIMARY             | 8       | drupal619.n.nid,const |    1 | Using where                 |
+----+--------------------+-------+------+--------------------------------------+---------------------+---------+-----------------------+------+-----------------------------+
2 rows in set (0.00 sec)

Status: Needs review » Needs work

The last submitted patch, 681760-node_access_subselects_0_0.patch, failed testing.

agentrickard’s picture

It doesn't appear the DBTNG supports the EXISTS syntax. But it handles IN.

See http://drupal.org/node/310086

Subselects are generally useful only in two cases: Where the subselect results in only a single row and value returned and the operator is =, <, >, <=, or >=; or when the subselect returns a single column of information and the operator is IN. Most other combination would result in a syntax error.
agentrickard’s picture

Status: Needs work » Needs review
FileSize
3.28 KB

Drupal 7 version, using IN instead of EXISTS.

This likely still needs work, as I don't understand what the entity parts of the existing code are supposed to do.

Status: Needs review » Needs work

The last submitted patch, 681760-nosq-query.patch, failed testing.

agentrickard’s picture

Status: Needs work » Needs review
FileSize
3.38 KB

This patch works for everything except the Entity query test, which I do not understand.

Status: Needs review » Needs work

The last submitted patch, 681760-node-query.patch, failed testing.

agentrickard’s picture

Status: Needs work » Needs review
FileSize
4.51 KB

After some encouragement and coaching from chx in IRC, I get green tests locally for this one.

chx’s picture

FileSize
4.31 KB

The whole // Attach conditions to the subquery for nodes. part is basically duplicate code.

chx’s picture

Status: Needs review » Reviewed & tested by the community

And although it's my patch that's last, it basically agentrickard's I just rearranged it a little to avoid code duplication so I feel OK with RTBC :) especially that i have no idea who else would touch this without a hazmat suit :P

agentrickard’s picture

Status: Reviewed & tested by the community » Needs review

Well, before we run off RTBC-ing this, I do have some questions, which I think I know the answers to:

1) Does it fix the duplicate nodes problem? Yes.
2) Does it actually make the query faster by preventing temporary tables? Yes, I believe so....
3) Do we need to alter the pager_query logic, which currently will run two subselects. That might not be needed any more with this approach. I don't know.
4) Does it handle the multiple node table condition noted in the comments when we foreach() through the table list for the query? I don't know that the tests cover that condition.

In my brief tests with Domain Access, this works just fine, and it passes existing tests, but we've been down this road before on this issue....

Can we get a little review on these issues? @pwolanin, I'm looking at you.

We may also need to split the EXISTS / DBTNG issue into a new one.

pwolanin’s picture

Status: Needs review » Needs work

DIscussing in IRC - we really do not want an IN query.

We should ideally fix DBTNG so it supports 'EXISTS' as a condition.

If not, perhaps the subquery can be like (SELECT 1 FROM node_access na WHERE ... LIMIT 0,1)?

agentrickard’s picture

Status: Needs work » Needs review
agentrickard’s picture

Status: Needs review » Needs work

A quick check of the two queries in D7:

With patch:

mysql> EXPLAIN SELECT n.nid AS nid, n.sticky AS sticky, n.created AS created FROM node n WHERE (promote = '1') AND (status = '1') AND (n.nid IN (SELECT na.nid AS nid FROM node_access na WHERE (( (na.gid = '0') AND (na.realm = 'all') )OR( (na.gid = '0') AND (na.realm = 'domain_site') )OR( (na.gid = '1') AND (na.realm = 'domain_id') ))AND (na.grant_view >= '1') )) ORDER BY sticky DESC, created DESC LIMIT 10 OFFSET 0;
+----+--------------------+-------+----------------+---------------------------------+----------------+---------+-------------+------+--------------------------+
| id | select_type        | table | type           | possible_keys                   | key            | key_len | ref         | rows | Extra                    |
+----+--------------------+-------+----------------+---------------------------------+----------------+---------+-------------+------+--------------------------+
|  1 | PRIMARY            | n     | ref            | node_frontpage,node_status_type | node_frontpage | 8       | const,const |    8 | Using where; Using index |
|  2 | DEPENDENT SUBQUERY | na    | index_subquery | PRIMARY                         | PRIMARY        | 4       | func        |    1 | Using where              |
+----+--------------------+-------+----------------+---------------------------------+----------------+---------+-------------+------+--------------------------+
2 rows in set (0.00 sec)

Without patch:

mysql> EXPLAIN SELECT DISTINCT n.nid AS nid, n.sticky AS sticky, n.created AS created FROM node n INNER JOIN node_access na ON na.nid = n.nid WHERE (promote = '1') AND (status = '1') AND(( (na.gid = '0') AND (na.realm = 'all') )OR( (na.gid = '0') AND (na.realm = 'domain_site') )OR( (na.gid = '1') AND (na.realm = 'domain_id') ))AND (na.grant_view >= '1') ORDER BY sticky DESC, created DESC LIMIT 10 OFFSET 0;
+----+-------------+-------+------+-----------------------------------------+----------------+---------+------------------+------+-----------------------------------------------------------+
| id | select_type | table | type | possible_keys                           | key            | key_len | ref              | rows | Extra                                                     |
+----+-------------+-------+------+-----------------------------------------+----------------+---------+------------------+------+-----------------------------------------------------------+
|  1 | SIMPLE      | n     | ref  | PRIMARY,node_frontpage,node_status_type | node_frontpage | 8       | const,const      |    8 | Using where; Using index; Using temporary; Using filesort |
|  1 | SIMPLE      | na    | ref  | PRIMARY                                 | PRIMARY        | 4       | drupal_cvs.n.nid |    1 | Using where; Distinct                                     |
+----+-------------+-------+------+-----------------------------------------+----------------+---------+------------------+------+-----------------------------------------------------------+
2 rows in set (0.00 sec)

So the patch, even non-optimized, is much better in this initial test, removing a filesort and a temporary table.

agentrickard’s picture

Status: Needs work » Needs review
FileSize
5.28 KB

After banging on this in IRC, chx gives us a way to handle EXISTS cleanly.

This may also solve #1001242: DBTNG support for EXISTS conditions.

With patch:

mysql> EXPLAIN SELECT n.nid AS nid, n.sticky AS sticky, n.created AS created FROM node n WHERE (promote = '1') AND (status = '1') AND ( EXISTS (SELECT na.nid AS nid FROM node_access na WHERE (( (na.gid = '0') AND (na.realm = 'all') )OR( (na.gid = '0') AND (na.realm = 'domain_site') )OR( (na.gid = '1') AND (na.realm = 'domain_id') ))AND (na.grant_view >= '1') ) ) ORDER BY sticky DESC, created DESC LIMIT 10 OFFSET 0;
+----+-------------+-------+------+---------------------------------+----------------+---------+-------------+------+--------------------------+
| id | select_type | table | type | possible_keys                   | key            | key_len | ref         | rows | Extra                    |
+----+-------------+-------+------+---------------------------------+----------------+---------+-------------+------+--------------------------+
|  1 | PRIMARY     | n     | ref  | node_frontpage,node_status_type | node_frontpage | 8       | const,const |    8 | Using where; Using index |
|  2 | SUBQUERY    | na    | ALL  | NULL                            | NULL           | NULL    | NULL        |   23 | Using where              |
+----+-------------+-------+------+---------------------------------+----------------+---------+-------------+------+--------------------------+
2 rows in set (0.00 sec)

agentrickard’s picture

FileSize
6.26 KB

Forgot the NOT EXISTS from chx's pastebin. Fixes comment, too.

agentrickard’s picture

FileSize
5.34 KB

Goofed that one by not saving the new comments. Sigh.

agentrickard’s picture

So here are some EXPLAIN queries, which I believe suggest the IN is better than EXISTS here.

EXISTING QUERY:

ysql> EXPLAIN SELECT DISTINCT n.nid AS nid, n.sticky AS sticky, n.created AS created FROM node n INNER JOIN node_access na ON na.nid = n.nid WHERE (promote = '1') AND (status = '1') AND(( (na.gid = '0') AND (na.realm = 'all') )OR( (na.gid = '0') AND (na.realm = 'domain_site') )OR( (na.gid = '1') AND (na.realm = 'domain_id') ))AND (na.grant_view >= '1') ORDER BY sticky DESC, created DESC LIMIT 10 OFFSET 0
    -> ;
+----+-------------+-------+------+-----------------------------------------+----------------+---------+------------------+------+-----------------------------------------------------------+
| id | select_type | table | type | possible_keys                           | key            | key_len | ref              | rows | Extra                                                     |
+----+-------------+-------+------+-----------------------------------------+----------------+---------+------------------+------+-----------------------------------------------------------+
|  1 | SIMPLE      | n     | ref  | PRIMARY,node_frontpage,node_status_type | node_frontpage | 8       | const,const      | 4727 | Using where; Using index; Using temporary; Using filesort |
|  1 | SIMPLE      | na    | ref  | PRIMARY                                 | PRIMARY        | 4       | drupal_cvs.n.nid |    1 | Using where; Distinct                                     |
+----+-------------+-------+------+-----------------------------------------+----------------+---------+------------------+------+-----------------------------------------------------------+
2 rows in set (0.00 sec)

USING EXISTS:

mysql> EXPLAIN SELECT n.nid AS nid, n.sticky AS sticky, n.created AS created FROM node n WHERE (promote = '1') AND (status = '1') AND ( EXISTS (SELECT na.nid AS nid FROM node_access na WHERE (( (na.gid = '0') AND (na.realm = 'all') )OR( (na.gid = '0') AND (na.realm = 'domain_site') )OR( (na.gid = '1') AND (na.realm = 'domain_id') ))AND (na.grant_view >= '1') )) ORDER BY sticky DESC, created DESC LIMIT 10 OFFSET 0;
+----+-------------+-------+------+---------------------------------+----------------+---------+-------------+-------+--------------------------+
| id | select_type | table | type | possible_keys                   | key            | key_len | ref         | rows  | Extra                    |
+----+-------------+-------+------+---------------------------------+----------------+---------+-------------+-------+--------------------------+
|  1 | PRIMARY     | n     | ref  | node_frontpage,node_status_type | node_frontpage | 8       | const,const |  4727 | Using where; Using index |
|  2 | SUBQUERY    | na    | ALL  | NULL                            | NULL           | NULL    | NULL        | 15037 | Using where              |
+----+-------------+-------+------+---------------------------------+----------------+---------+-------------+-------+--------------------------+
2 rows in set (0.00 sec)

USING IN:

mysql> EXPLAIN SELECT n.nid AS nid, n.sticky AS sticky, n.created AS created FROM node n WHERE (promote = '1') AND (status = '1') AND (n.nid IN (SELECT na.nid AS nid FROM node_access na WHERE (( (na.gid = '0') AND (na.realm = 'all') )OR( (na.gid = '0') AND (na.realm = 'domain_site') )OR( (na.gid = '1') AND (na.realm = 'domain_id') ))AND (na.grant_view >= '1') )) ORDER BY sticky DESC, created DESC LIMIT 10 OFFSET 0;
+----+--------------------+-------+----------------+---------------------------------+----------------+---------+-------------+------+--------------------------+
| id | select_type        | table | type           | possible_keys                   | key            | key_len | ref         | rows | Extra                    |
+----+--------------------+-------+----------------+---------------------------------+----------------+---------+-------------+------+--------------------------+
|  1 | PRIMARY            | n     | ref            | node_frontpage,node_status_type | node_frontpage | 8       | const,const | 4727 | Using where; Using index |
|  2 | DEPENDENT SUBQUERY | na    | index_subquery | PRIMARY                         | PRIMARY        | 4       | func        |    1 | Using where              |
+----+--------------------+-------+----------------+---------------------------------+----------------+---------+-------------+------+--------------------------+
2 rows in set (0.00 sec)

Note that this is stock MySQL on locahost, with query caching. Testing on Domain Access with 10,021 nodes and 15,311 records in {node_access}.

agentrickard’s picture

In my tests, the pager query itself is running in < .01 ms, the problem is the pager COUNT query, which is 40+ ms.

pwolanin’s picture

@agentrickard - that's for the original query or the new query? If find it very hard to believe that the original one with the temp table could ran that fast.

agentrickard’s picture

I think the above tests all use query caching.

agentrickard’s picture

Here they are without cache:

ORIGINAL

mysql> EXPLAIN SELECT SQL_NO_CACHE DISTINCT n.nid AS nid, n.sticky AS sticky, n.created AS created FROM node n INNER JOIN node_access na ON na.nid = n.nid WHERE (promote = '1') AND (status = '1') AND(( (na.gid = '0') AND (na.realm = 'all') )OR( (na.gid = '0') AND (na.realm = 'domain_site') )OR( (na.gid = '1') AND (na.realm = 'domain_id') ))AND (na.grant_view >= '1') ORDER BY sticky DESC, created DESC LIMIT 10 OFFSET 0;
+----+-------------+-------+------+-----------------------------------------+----------------+---------+------------------+------+-----------------------------------------------------------+
| id | select_type | table | type | possible_keys                           | key            | key_len | ref              | rows | Extra                                                     |
+----+-------------+-------+------+-----------------------------------------+----------------+---------+------------------+------+-----------------------------------------------------------+
|  1 | SIMPLE      | n     | ref  | PRIMARY,node_frontpage,node_status_type | node_frontpage | 8       | const,const      | 8379 | Using where; Using index; Using temporary; Using filesort |
|  1 | SIMPLE      | na    | ref  | PRIMARY                                 | PRIMARY        | 4       | drupal_cvs.n.nid |    1 | Using where; Distinct                                     |
+----+-------------+-------+------+-----------------------------------------+----------------+---------+------------------+------+-----------------------------------------------------------+
2 rows in set (0.05 sec)

USING EXISTS

mysql> EXPLAIN SELECT SQL_NO_CACHE n.nid AS nid, n.sticky AS sticky, n.created AS created FROM node n WHERE (promote = '1') AND (status = '1') AND ( EXISTS (SELECT na.nid AS nid FROM node_access na WHERE (( (na.gid = '0') AND (na.realm = 'all') )OR( (na.gid = '0') AND (na.realm = 'domain_site') )OR( (na.gid = '1') AND (na.realm = 'domain_id') ))AND (na.grant_view >= '1') )) ORDER BY sticky DESC, created DESC LIMIT 10 OFFSET 0;
+----+-------------+-------+------+---------------------------------+----------------+---------+-------------+-------+--------------------------+
| id | select_type | table | type | possible_keys                   | key            | key_len | ref         | rows  | Extra                    |
+----+-------------+-------+------+---------------------------------+----------------+---------+-------------+-------+--------------------------+
|  1 | PRIMARY     | n     | ref  | node_frontpage,node_status_type | node_frontpage | 8       | const,const |  8379 | Using where; Using index |
|  2 | SUBQUERY    | na    | ALL  | NULL                            | NULL           | NULL    | NULL        | 33420 | Using where              |
+----+-------------+-------+------+---------------------------------+----------------+---------+-------------+-------+--------------------------+
2 rows in set (0.00 sec)

USING IN

mysql> EXPLAIN SELECT SQL_NO_CACHE n.nid AS nid, n.sticky AS sticky, n.created AS created FROM node n WHERE (promote = '1') AND (status = '1') AND (n.nid IN (SELECT na.nid AS nid FROM node_access na WHERE (( (na.gid = '0') AND (na.realm = 'all') )OR( (na.gid = '0') AND (na.realm = 'domain_site') )OR( (na.gid = '1') AND (na.realm = 'domain_id') ))AND (na.grant_view >= '1') )) ORDER BY sticky DESC, created DESC LIMIT 10 OFFSET 0;
+----+--------------------+-------+----------------+---------------------------------+----------------+---------+-------------+------+--------------------------+
| id | select_type        | table | type           | possible_keys                   | key            | key_len | ref         | rows | Extra                    |
+----+--------------------+-------+----------------+---------------------------------+----------------+---------+-------------+------+--------------------------+
|  1 | PRIMARY            | n     | ref            | node_frontpage,node_status_type | node_frontpage | 8       | const,const | 8379 | Using where; Using index |
|  2 | DEPENDENT SUBQUERY | na    | index_subquery | PRIMARY                         | PRIMARY        | 4       | func        |    1 | Using where              |
+----+--------------------+-------+----------------+---------------------------------+----------------+---------+-------------+------+--------------------------+
2 rows in set (0.00 sec)

nnewton’s picture

subscribe

Danny Englander’s picture

Subscribing

agentrickard’s picture

@nnewton

Here's the query that worries me now. This is the COUNT for the pager, using EXISTS.

mysql> EXPLAIN SELECT SQL_NO_CACHE COUNT(*) AS expression FROM (SELECT 1 AS expression FROM node n WHERE (promote = '1') AND (status = '1') AND ( EXISTS (SELECT na.nid AS nid FROM node_access na WHERE (( (na.gid = '0') AND (na.realm = 'all') )OR( (na.gid = '0') AND (na.realm = 'domain_site') )OR( (na.gid = '1') AND (na.realm = 'domain_id') ))AND (na.grant_view >= '1') )) ) subquery;
+----+-------------+-------+------+---------------------------------+----------------+---------+------+-------+------------------------------+
| id | select_type | table | type | possible_keys                   | key            | key_len | ref  | rows  | Extra                        |
+----+-------------+-------+------+---------------------------------+----------------+---------+------+-------+------------------------------+
|  1 | PRIMARY     | NULL  | NULL | NULL                            | NULL           | NULL    | NULL |  NULL | Select tables optimized away |
|  2 | DERIVED     | n     | ref  | node_frontpage,node_status_type | node_frontpage | 8       |      |  8355 | Using index                  |
|  3 | SUBQUERY    | na    | ALL  | NULL                            | NULL           | NULL    | NULL | 33442 | Using where                  |
+----+-------------+-------+------+---------------------------------+----------------+---------+------+-------+------------------------------+
3 rows in set (0.01 sec)
agentrickard’s picture

FileSize
45.6 KB

And a Devel capture, to illustrate the point, this is with query cache for ANON user.

pwolanin’s picture

The query cache won't change the EXPLAIN result

agentrickard’s picture

;-p. Well, it still pegs at ~5.4 ms in Devel on every page load.

pwolanin’s picture

I think that's seconds not ms?

Why does the count query have an extra sub-select? I see you mention that in #77 - can we eliminate that?

agentrickard’s picture

I believe it's because the query count method itself adds it, to try to...wait for it...prevent duplicate query counts.

Alas, I did not get far digging into the OO that builds the COUNT query,

agentrickard’s picture

Query capture with mysql> SET GLOBAL query_cache_type = OFF; set.

On the command line, these queries run fine:

mysql> SELECT COUNT(*) AS expression FROM (SELECT 1 AS expression FROM node n WHERE (promote = :db_condition_placeholder_0) AND (status = :db_condition_placeholder_1) AND ( EXISTS (SELECT na.nid AS nid FROM node_access na WHERE (( (na.gid = :db_condition_placeholder_2) AND (na.realm = :db_condition_placeholder_3) )OR( (na.gid = :db_condition_placeholder_4) AND (na.realm = :db_condition_placeholder_5) )OR( (na.gid = :db_condition_placeholder_6) AND (na.realm = :db_condition_placeholder_7) ))AND (na.grant_view >= :db_condition_placeholder_8) )) ) subquery;
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ':db_condition_placeholder_0) AND (status = :db_condition_placeholder_1) AND ( EX' at line 1
mysql> 
mysql> SELECT COUNT(*) AS expression FROM (SELECT 1 AS expression FROM node n WHERE (promote = '1') AND (status = '1') AND ( EXISTS (SELECT na.nid AS nid FROM node_access na WHERE (( (na.gid = '0') AND (na.realm = 'all') )OR( (na.gid = '0') AND (na.realm = 'domain_site') )OR( (na.gid = '1') AND (na.realm = 'domain_id') ))AND (na.grant_view >= '1') )) ) subquery;
+------------+
| expression |
+------------+
|       8282 |
+------------+
1 row in set (0.01 sec)
mysql> SELECT n.nid AS nid, n.sticky AS sticky, n.created AS created FROM node n WHERE (promote = '1') AND (status = '1') AND ( EXISTS (SELECT na.nid AS nid FROM node_access na WHERE (( (na.gid = '0') AND (na.realm = 'all') )OR( (na.gid = '0') AND (na.realm = 'domain_site') )OR( (na.gid = '1') AND (na.realm = 'domain_id') ))AND (na.grant_view >= '1') )) ORDER BY sticky DESC, created DESC LIMIT 10 OFFSET 0
    -> ;
+-------+--------+------------+
| nid   | sticky | created    |
+-------+--------+------------+
| 15952 |      0 | 1292777603 |
| 14746 |      0 | 1292777323 |
| 14241 |      0 | 1292777281 |
| 14426 |      0 | 1292777108 |
| 15355 |      0 | 1292776458 |
| 12577 |      0 | 1292776091 |
| 15267 |      0 | 1292775720 |
| 13541 |      0 | 1292775665 |
| 15588 |      0 | 1292775544 |
| 10136 |      0 | 1292775317 |
+-------+--------+------------+
10 rows in set (0.00 sec)

What is Devel seeing that I am missing?

agentrickard’s picture

OK, I'm arguing over nothing here. This is the query currently, when query cache is off.

I'm going to re-roll using #1001242: DBTNG support for EXISTS conditions and then we should RTBC.

agentrickard’s picture

FileSize
5.37 KB

Patch optimized to use the new ->exists method. Should be ready to go, then we can further optimize. But #1001242: DBTNG support for EXISTS conditions has to go in first.

Note that chx and I tried changing the countQuery method to remove the additional subselect. In base testing, it dropped the query execution time 8-10%, but it seems that a large dataset will always produce an ugly query. Testing with 16,600 rows in {node} and 48,700 in {node_access}.

chx’s picture

Status: Needs review » Reviewed & tested by the community

We did everything we could. EXISTS is clearly better than an IN. Within the confines of our pager implementation (one that relies on counting every item) there is only so much we can do. This is an astonishing speedup from http://drupal.org/files/issues/Screen%20shot%202010-12-31%20at%2012.46.5... to http://drupal.org/files/issues/Picture%201_0_279.png

agentrickard’s picture

I just checked, and if we remove the subselect from countQuery with query caching turned on, then the performance is identical for both queries.

chx’s picture

That's a different issue please.

agentrickard’s picture

chx’s picture

Please disregard 101-104, I am fairly close to just delete all of them from here. The issue is hard enough.

webchick’s picture

Status: Reviewed & tested by the community » Needs review

This patch is very exciting in terms of its performance benefits, but we could really use some reviews from other folks. Moshe, Damien..?

puddyglum’s picture

This patch fixes a huge problem for our (D6) site. We have an ocean of custom node grants (6k users, 3k nodes presently) and with a variant of this patch one of our views had a performance increase of ~4200% (210 seconds down to 5 seconds). But the next time we have a Drupal security upgrade it's going to be a bother to have to patch quickly.

Does anybody have problems with it?

izmeez’s picture

@jmonkfish can you give some more details?

When you patch are you removing the distinct solution that was committed in I believe 6.15
and adding the join patch?

What is the variant of this patch that you needed to use?
Is it specific to a particular use case or some custom code?

Thanks.

Josh Waihi’s picture

@jmonkfish, you can try using Drush Make. It will allow you to apply patches to code after an update.

izmeez’s picture

@josh Can you please explain or provide link to how to use drush make for a core patch after update? Thanks.

dawehner’s picture

izmeez’s picture

@dereine Thanks for the reminder to check the drush make readme. I will check that project issue queue for any further questions on how to do that.

puddyglum’s picture

FileSize
46.32 KB
46.06 KB

Patch I've used for D6 /modules/node/node.module

function node_db_rewrite_sql($query, $primary_table, $primary_field) {
  if ($primary_field == 'nid' && !node_access_view_all_nodes()) {
    //$return['join'] = _node_access_join_sql($primary_table);
    //$return['where'] = _node_access_where_sql();
    //$return['distinct'] = 1;
    $wsql = _node_access_where_sql();
    if ($wsql) {
      $return['where'] = 'EXISTS (SELECT na.nid FROM {node_access} na WHERE ' . $primary_table . '.nid = na.nid AND (' . $wsql . '))';
    }

    return $return;
  }
}

76.20006 sec before applying patch
4.80006 sec after applying patch

15.87 x faster

What I was doing before was simply bypassing the node_db_rewrite_sql in views/includes/view.inc

    $query = $this->build_info['query'];
    $count_query = $this->build_info['count_query'];
    //$query = db_rewrite_sql($this->build_info['query'], $this->base_table, $this->base_field, array('view' => &$this));
    //$count_query = db_rewrite_sql($this->build_info['count_query'], $this->base_table, $this->base_field, array('view' => &$this));

But that bypasses the node's view access check completely. For me, that just happens to be what I want. And that is what gave me the 42x improvement at first. Now it's down to about 15x improvement, which is still great.

matt2000’s picture

subscribe

catch’s picture

FileSize
729 bytes

I rolled the change from #112 into a patch for Drupal 6 and I'm testing this on a large client site (don't have any D7 sites with node_access yet). Confirmed it's removing temporary tables, more data soon hopefully.

pkiff’s picture

Title: Greatly improve performace and eliminate duplicates caused by node_access table joins » Eliminating duplicate nodes caused by node_access table joins

Hey catch,

Isn't your patch for Drupal 6 missing a closing bracket for the IF statement?

I've modified it as indicated, and am now trying it out on a site with a fairly complex string of interacting TAC Lite permissions along with some problematic Views employing views_calc and three different vocabularies of taxonomy....

Added later: Well, there would seem to be an incompatibility somewhere with TAC Lite (Not unexpected, I suppose, given the complexity of the permissions settings involved with that module). I'm unable to test further, but wanted to flag that in case someone else ends up here with the same idea that I had. Reducing the numbers of terms to a max of one for each node seems to solve the problem that I had (though I needed to create new fields to hold the classification data that I would otherwise have placed in a taxonomy).

Added yet again later: The patch I was testing was from #114. It did not include the fix below to set $wsql. The "incompatibility" with TAC Lite that I claimed was inferred from the fact that the site would return blank/error page on any request. Looking below, it is quite possible that this was due not to an incompatibility with TAC Lite, but simply due to a problem with the node.module that the #114 patch introduced. When I have a chance I will apply a later version of this patch to verify this theory.

Phil.

pwolanin’s picture

Title: Eliminating duplicate nodes caused by node_access table joins » Greatly improve performace and eliminate duplicates caused by node_access table joins

refocus title to the current patch's effect.

catch’s picture

FileSize
737 bytes

Here's a D6 patch with no syntax error.

This was rolled to production today on a client site, initial results are encouraging - haven't yet been able to find a query taking more than a second, usually there would be a lot of eight second or so queries showing up immediately after a roll. This isn't conclusive yet but all positive so far at lesat.

pwolanin’s picture

I don't see that $wsql is ever set in the patch - should be $where?

catch’s picture

FileSize
750 bytes

Grr, I was porting the pressflow patch we're actually using (that doesn't apply cleanly) back to core, but clearly wasn't paying attention.

This one should be better (again).

Heine’s picture

Issue summary copied from various responses above:

Patches to test:

Drupal 7 #681760-99: Try to improve performance and eliminate duplicates caused by node_access table joins
Drupal 6 #681760-119: Try to improve performance and eliminate duplicates caused by node_access table joins

Problem

In node.module, node_db_rewrite_sql() adds a JOIN between node and node_access to implement node access restrictions. As the number of node_access entries for a single node grows, multiple node_access entries may match the JOIN conditions, thus resulting in duplicate rows. When multiple modules participate (such as og, domain_access, etc.), the problem can result in duplicates appearing in views, blocks, and other node listings where module implementers have not considered the possibility of such duplicates.

Also, adding DISTINCT to the query to remedy this issue causes MySQL to stop using indexes making for long, long query run times.

Solution

A subquery on node_access in a conditional clause that tests whether the subquery returns rows at all.

Notes

Additional notes about Drupal 6: pwolanin: "Drupal 6 requires MySQL 4.1 or higher, so this seems like a reasonable fix."

EXISTS is supported by MySQL, SQL server, PostgreSQL, SQLite 3.1 (needs testing and would require us to up requirements, as this would be the first change that would expose the use of DBTNG's EXISTS support).

As it is vital to get this into D7 ASAP, please try to not detract from the way to get this into core.

izmeez’s picture

@Heine Thanks for the summary.

Also for Drupal 6 will this patch obviate the need for the distinct patch that was committed to 6.16?
See comments: 27, 29, 48 and 107.

catch’s picture

Just a note this has been running on a client site for over a week. I haven't been able to detect an obvious improvement apart from improving some EXPLAIN output on certain queries, but equally there has been no regression in either performance or functionality (and these normally get picked up very quickly). One thing to bear in mind is this site is already well optimized - using solr in a lot of cases and with loads of caching, so it is not being bitten as hard by these queries as another site might be.

SQLite - we already have some user space functions in SQLite, it might be possible to add another one here? I'd also be fine with upping requirements though.

agentrickard’s picture

Title: Eliminating duplicate nodes caused by node_access table joins » Greatly improve performace and eliminate duplicates caused by node_access table joins
Version: 7.x-dev » 8.x-dev

And this now needs to get into Drupal 8 first.

pkiff’s picture

Following up on my comment in #115 above. Running multi-site install of drupal 6.20 on IIS 7. Various sites on the server employ node access modules, some use a different combination than others, including: Content Access, Module Grants, TAC Lite, and Taxonomy Term Permissions. And a variety of Views add-on modules are also enabled in various combinations.

I installed catch's patch in #119 last night. One or two sites complained initially with errors, but these disappeared after all caches were emptied. Everything has been running smoothly since then.

However, on one site, I needed to modify some of my Views to remove duplicates that were not appearing previously. Adding the "Distinct: Yes" option to those views removed the duplicates. All my work on these views is done using the Views interface: none of it is a result of unique custom code. Whether this means that my initial View malformed and was somehow taking advantage of a flaw related to the node access stuff in this thread....well, I don't know. Possibly there are some remainder duplicate entries somewhere in some temporary tables or something that I didn't clear, again I don't know. And I've also not done proper performance testing.

What I thought was important to note was that the view seems to have produced a slightly different result using this patch, so theoretically, it could affect other complex view constructions in drupal 6 if folks implement it without verifying their view results match.

Phil.

Josh Waihi’s picture

FileSize
742 bytes

Resubmitting catch's D6 patch without the git bits'n'pieces so drush_make plays nice.

Status: Needs review » Needs work

The last submitted patch, 681760_node_access_d6_1.patch, failed testing.

pwolanin’s picture

Status: Needs work » Needs review
FileSize
742 bytes

@fiasco - you need -D6 last in the patch name or the testbot gets confused.

Aniara.io’s picture

#127 works for me.

catch’s picture

It is now six weeks in production on a large Drupal 6 site with no reported issues.

webchick’s picture

I asked catch what node access module was used on said site, and he said it was OG. That's great, since OG is a pretty major contrib module.

I would still love to see verification of this from a user or two using a different node access module (Domain Access, Content Access, etc.) as well as a review or two from one of the node access gurus (Moshe, Earl, etc.) like I asked for back in January.

Heine’s picture

Webchick, how do you feel about the need to up SQLite requirements?

pwolanin’s picture

@Heine - I'm seeing for sqlite:

  /**
   * Minimum engine version.
   *
   * @todo: consider upping to 3.6.8 in Drupal 8 to get SAVEPOINT support.
   */
  public function minimumVersion() {
    return '3.3.7';
  }

So we already require a version that supports EXISTS unless I'm missing something.

crac’s picture

The patch above also fixed http://drupal.org/node/1157454 (taxonomy_term view broken when using node_access). Thanks a lot!

ChrisLaFrancis’s picture

Subscribe.

skylord’s picture

Patch from #127 fixes many bugs with Views for me... There are Content Access and own written access modules on site.

Fabianx’s picture

Subscribe.

Leeteq’s picture

Title: Greatly improve performace and eliminate duplicates caused by node_access table joins » Greatly improve performance and eliminate duplicates caused by node_access table joins

(Subscribing.)

agentrickard’s picture

FileSize
4.38 KB

Updated patch for d8, that we suspect will fail.

agentrickard’s picture

FileSize
2.88 KB

This should be a proper patch against 7.x.

agentrickard’s picture

Corrected for the fact that EXISTS will fail here if one matching record is found. We have to use a subquery IN for this to return correct results.

Here's a sample query (using Domain Access):

SELECT DISTINCT n.nid AS nid, n.sticky AS sticky, n.created AS created FROM node n WHERE (promote = '1') AND (status = '1') AND (n.nid IN (SELECT na.nid AS nid FROM node_access na WHERE (( (na.gid = '0') AND (na.realm = 'all') )OR( (na.gid = '0') AND (na.realm = 'domain_site') )OR( (na.gid = '1') AND (na.realm = 'domain_id') ))AND (na.grant_view >= '1') )) ORDER BY sticky DESC, created DESC LIMIT 10 OFFSET 0

Tests for both core (Drupal 7.4) and Domain Access (7.x.3) pass with this patch.

Note that the patch cannot be applied to Drupal 8 yet, since the NodeAccessBastTable security patch has not landed there yet.

chx’s picture

If the resulting query has DISTINCT then what what did we win???? Why it's there?

agentrickard’s picture

It's been so damn long since I even looked at this code.

Re-reroll against 7.x.

Query formatted and superflous () removed:

SELECT n.nid AS nid, n.sticky AS sticky, n.created AS created 
FROM node n
WHERE promote = '1' AND status = '1' AND n.nid IN (
  SELECT na.nid AS nid
  FROM node_access na
  WHERE  na.grant_view >= '1'  AND  (
    (na.gid = '0' AND na.realm = 'all') OR
    (na.gid = '0' AND na.realm = 'domain_site') OR
    (na.gid = '1' AND na.realm = 'domain_id')
  )
)
ORDER BY sticky DESC, created DESC
LIMIT 10 OFFSET 0
pwolanin’s picture

So we can't use EXISTS? I'm not sure I understand what's changed?

Is the target SQL now different for 6 and 7?

agentrickard’s picture

EXISTS will fail. as I understand it, because it only returns a boolean TRUE/FALSE that at least one record matched the query.

That is, if one record in the subquery is valid, all are marked valid.

Here's the (incorrect) query that EXISTS generates.

SELECT n.nid AS nid, n.sticky AS sticky, n.created AS created FROM node n WHERE (promote = '1') AND (status = '1') AND ( EXISTS (SELECT na.nid AS nid FROM node_access na WHERE (( (na.gid = '0') AND (na.realm = 'all') )OR( (na.gid = '0') AND (na.realm = 'domain_site') )OR( (na.gid = '1') AND (na.realm = 'domain_id') ))AND (na.grant_view >= '1') )) ORDER BY sticky DESC, created DESC LIMIT 10 OFFSET 0

The subquery here returns TRUE, and thus node access rules are not enforced.

See http://dev.mysql.com/doc/refman/5.0/en/exists-and-not-exists-subqueries.....

SELECT column1 FROM t1 WHERE EXISTS (SELECT * FROM t2);
Traditionally, an EXISTS subquery starts with SELECT *, but it could begin with SELECT 5 or SELECT column1 or anything at all. MySQL ignores the SELECT list in such a subquery, so it makes no difference.

For the preceding example, if t2 contains any rows, even rows with nothing but NULL values, the EXISTS condition is TRUE. 

This is something @chx caught when reviewing the approach a while back, and I suspect it affects the D6 version as well.

chx’s picture

To neatly sum up, the subquery is not a dependent subquery.. If you have access to one node, you have access to all.

Edit: D6 version above is not affected: 'EXISTS (SELECT na.nid FROM {node_access} na WHERE ' . $primary_table . '.nid = na.nid AND (' . $where_sql . '))'; See, it binds the subquery to primary_table.nid.

I think the D7 version should do the same just to make sure we do not try to list every node you have access to.

chx’s picture

agentrickard’s picture

Trying the approach defined in #145.

agentrickard’s picture

@chx is correct. The following patch tests cleanly on Drupal 7.4 and Domain Access 7.x.3.

agentrickard’s picture

FileSize
4.47 KB

And this patch applies cleanly to D8.

Status: Needs review » Needs work

The last submitted patch, 681760-node-query.patch, failed testing.

chx’s picture

+        $field = 'entity_id';
       }
+      $subquery->where("$nalias.nid = na.nid");

I think you wanted $nalias.$field here.

agentrickard’s picture

Status: Needs work » Needs review
FileSize
3.3 KB

Ah.

agentrickard’s picture

And for D7.

agentrickard’s picture

Adding 'needs benchmarks' tag.

webchick’s picture

Ok, #1204658: Always use query metadata to specify the node access base table was committed, so 8.x is finally in sync with 7.x. So the 8.x version of this will need a re-roll. Sorry about the delay, #1136130: Regression: Reinstate WATCHDOG_* constants and document why they are necessary. created a hot mess.

agentrickard’s picture

Not a problem.

pdrake’s picture

Patch from #127 on a D6 site has the following effect for me (with query cache off):

Before Patch:
Executed 1281 queries in 3491.97

After Patch:
Executed 1281 queries in 4485.52

That's a roughly 28% performance degradation (on my particular set of queries, which I realize is fairly unique) when using the patch from #127. The node_access module in use here is OG. That said, it does resolve views-related issues for me, apparently without introducing any new bugs.

caschbre’s picture

I was noticing this issue due to views_calc and the og modules. I applied #127 and so far it looks good.

I haven't verified any performance changes however.

catch’s picture

Issue tags: +Needs backport to D7

Tagging for backport.

killes has benchmarked this and found that it makes some queries slower, keeps some the same, and makes others faster. So this is not straightforwardly better but it might be...

I'll nag him to post the actual results here.

catch’s picture

Priority: Major » Normal

I need to nag killes some more however even more benchmarks confirmed the initial results in #159. At the very least I'd want to see counter-benchmarks before we consider this major.

xjm’s picture

xjm’s picture

Hmm, so locally, #156 throws some errors with our test for #1390046: Having multiple node access grants within the same realm breaks comment pager query. Or I just applied the wrong patch. :P I confirmed the patch here fixes the issue in #1390046: Having multiple node access grants within the same realm breaks comment pager query and makes this test pass.

Here's a reroll of #156 for testbot.

xjm’s picture

Marked #1390046: Having multiple node access grants within the same realm breaks comment pager query as duplicate of this issue.

Here's the patch rolled with our test from that other issue. We'll probably modify the test to add a separate grant realm for this case. We should add coverage for other functional bugs related to this as well, e.g. #1237380: Forum count incorrect when using access control modules, now for D7.

Reference from chx regarding performance optimization: http://www.xaprb.com/blog/2006/04/30/how-to-optimize-subqueries-and-join...

Niklas Fiekas’s picture

Added a method to the testcase, that verifies, that the problem also exists for node pagination (looking at a forum term page), not only comments. Merging in the old #668268: Weird behavior of pager for non-admin users and the newer #1237380: Forum count incorrect when using access control modules, now for D7 issue accordingly.

What else do we need to test?

How do we account for the performance?
Edit: Did 500 requests on 10 threads both taking (randomly, +/-5%) about the same time. However that's barely a serious test, on my local, non optimized machine, with the specific case of having a node with taxonomy access control and 200 comments, MySQL.

As for the patch files: We're expecting test-only to fail with two fails and test-combined to pass. Adding the testForumPager() method I changed the class name to a more general one, so the registry might need to be rebuilt, when trying locally.

catch’s picture

Title: Greatly improve performance and eliminate duplicates caused by node_access table joins » Try to improve performance and eliminate duplicates caused by node_access table joins

Re-titling since we're not sure this 'greatly improves' performance.

pvanerk’s picture

Will the patch of #127 be included in Drupal 6 Core ?
The patch fixed the #1163078: Views Calc not summing values correctly issue for me and I need this on production sites. However I don't like to patch the drupal core myself but prefer to use stable releases.

Thanks!

xjm’s picture

Issue tags: +Needs backport to D6

@pvanerk -- Before a patch is considered for D6, it needs to go into D8 and then D7. So, what we need here is (apparently) benchmarks for the performance implications of the patch. Then it can be considered to go into core.

rjbrown99’s picture

If we are considering a D6 backport of #127, it needs $return = array(); at the top. In the current iteration of the patch in #127, $return will not be set in the case where the if statement fails. In the original function $return is always set.

Niklas Fiekas’s picture

Assigned: catch » Unassigned
Status: Reviewed & tested by the community » Needs review
Issue tags: +Needs tests

Forum term page

Old query:

EXPLAIN
SELECT DISTINCT f.* FROM forum_index f INNER JOIN node_access na ON na.nid = f.nid WHERE (f.tid = 1) AND(( (na.gid = 8888) AND (na.realm = 'node_access_test') )OR( (na.gid = 8889) AND (na.realm = 'node_access_test') ))AND (na.grant_view >= 1) ORDER BY f.sticky DESC, f.last_comment_timestamp DESC LIMIT 25 OFFSET 0

forum-query-old.png

New query:

EXPLAIN
SELECT f.* FROM forum_index f WHERE (f.tid = 1) AND ( EXISTS (SELECT na.nid AS nid FROM node_access na WHERE (( (na.gid = 8888) AND (na.realm = 'node_access_test') )OR( (na.gid = 8889) AND (na.realm = 'node_access_test') ))AND (na.grant_view >= 1) AND (f.nid = na.nid) )) ORDER BY f.sticky DESC, f.last_comment_timestamp DESC LIMIT 25 OFFSET 0

forum-query-new.png

Front page

Old query:

EXPLAIN
SELECT DISTINCT n.nid AS nid, n.sticky AS sticky, n.created AS created FROM node n INNER JOIN node_access na ON na.nid = n.nid WHERE (promote = 1) AND (status = 1) AND(( (na.gid = 8888) AND (na.realm = 'node_access_test') )OR( (na.gid = 8889) AND (na.realm = 'node_access_test') ))AND (na.grant_view >= 1) ORDER BY sticky DESC, created DESC LIMIT 10 OFFSET 0

node-query-old.png

New query:

EXPLAIN
SELECT n.nid AS nid, n.sticky AS sticky, n.created AS created FROM node n WHERE (promote = 1) AND (status = 1) AND ( EXISTS (SELECT na.nid AS nid FROM node_access na WHERE (( (na.gid = 8888) AND (na.realm = 'node_access_test') )OR( (na.gid = 8889) AND (na.realm = 'node_access_test') ))AND (na.grant_view >= 1) AND (n.nid = na.nid) )) ORDER BY sticky DESC, created DESC LIMIT 10 OFFSET 0

node-query-new.png

530 nodes, 519 nodes forum index, 30016 comments

xjm’s picture

Issue tags: -Needs tests

Awesome awesome. Thanks @Niklas Fiekas, and thanks @msonnabaum for helping us figure out what to do.

msonnabaum pointed out that benchmarking the time spent on the query is probably too variable to be reliable, but the EXPLAINS are pretty clear, I think. In the forum case, we get rid of the distinct, but otherwise it's unchanged; and the front page query is actually improved.

msonnabaum said he'd post results for a newer version of MySQL as well if he has the chance, to compare. If those turn out to be neutral or improved as well, then I think we can safely say this patch can go in. :)

xjm’s picture

Status: Needs review » Reviewed & tested by the community

@msonnabaum also confirmed no regressions on a more recent version of MySQL.

The patch has test coverage, has been reviewed umpteem times, and fixes a number of functional bugs with node access and node listings. (I still sort of think this should be major). I'd say RTBC. :)

xjm’s picture

FileSize
7.45 KB

D7 version.

webchick’s picture

Assigned: Unassigned » catch

I trust catch to be able to make more sense of that EXPLAIN output than me. :) Assigning to him.

puddyglum’s picture

Assigned: Unassigned » catch
Status: Needs review » Reviewed & tested by the community
Issue tags: -Needs tests
FileSize
14.26 KB

Applying #172's patch had little impact to one of our site's performance. Tested a View with about 20,000 nodes in the pool.
Attached a Firebug screenshot showing before and after. Pretty sporadic, but no significant difference.

Niklas Fiekas’s picture

Awesome, thanks for testing!

webchick’s picture

Status: Reviewed & tested by the community » Needs work
Issue tags: +Performance, +Needs backport to D6, +Needs backport to D7

The last submitted patch, 681760-d7.patch, failed testing.

Niklas Fiekas’s picture

Status: Needs work » Needs review
FileSize
7.54 KB
4.28 KB

Oh ... some API changes. Rebased. D7 should be unaffected, so @xjm's patch still applies.

xjm’s picture

Probably for D7 we should wait and reroll once #1064954: _node_revision_access() static usage goes in there as well.

xjm’s picture

Status: Needs review » Reviewed & tested by the community

Though the D8 reroll is fine. Should have said that earlier. :)

catch’s picture

Version: 8.x-dev » 7.x-dev
Status: Reviewed & tested by the community » Patch (to be ported)
Issue tags: -Performance

Alright since the performance testing is conclusively inconclusive, and this fixes a functional bug, I've committed/pushed this to 8.x.

catch’s picture

Assigned: catch » Unassigned
Niklas Fiekas’s picture

Status: Patch (to be ported) » Needs review
FileSize
7.49 KB
4.24 KB

Rebased @xjm's backport from #172.

Status: Needs review » Needs work

The last submitted patch, 681760-multi-realm-access-d7-183.patch, failed testing.

Niklas Fiekas’s picture

Status: Needs work » Needs review
FileSize
7.57 KB
4.33 KB

Ah ... we can't use entity_create() in D7, of course.

xjm’s picture

Status: Needs review » Reviewed & tested by the community

Go go go!

izmeez’s picture

Now this is moving into core through Drupal 8 and possibly filtering down because it fixes a "functional bug" I am wondering if this fix also addresses the question in comment #29 above, http://drupal.org/node/681760#comment-2782302

That is, is the previous commit to solve issues with DISTINCT still needed?

According to 6.16 release notes the following was added:
http://drupal.org/node/284392, "db_rewrite_sql causing issues with DISTINCT"

Thanks.

Niklas Fiekas’s picture

Well ... at least we no longer have any DISTINCT. I didn't look at that issue yet, though, so I can't tell if there is any other problem.

xjm’s picture

@izmeez: We'll probably want to create and test an updated D6 patch after this issue is fixed in D7.

webchick’s picture

Status: Reviewed & tested by the community » Fixed
Issue tags: +7.13 release notes

Excellent. Thanks SO much for your work on this, folks!!

Committed and pushed to 7.x. Tagging for the 7.13 release notes.

xjm’s picture

Version: 7.x-dev » 6.x-dev
Status: Fixed » Patch (to be ported)

I think this needs to go to D6 now.

puddyglum’s picture

I believe patch from #127 should still be the right one for D6. Trying to get some benchmarks.

izmeez’s picture

For D6 patch also see comment #168, http://drupal.org/node/681760#comment-5654310

puddyglum’s picture

#127 with $return = array() from #168

puddyglum’s picture

Status: Patch (to be ported) » Needs review
lightsurge’s picture

The distinct that the D6 port of this patch removes means duplication in lists, for people who've built their queries around Drupal core distinguishing nodes (in Views for example).

rjbrown99’s picture

For what it's worth, I have been using the patch from #194 for months now in production, no issues at all.

hefox’s picture

Adding tag "Performance," removed the needs backport to d7, should the needs backport to 6x be removed?

(Haven't tested, just encountered this when looking into something else).

salvis’s picture

Issue tags: -Performance

It's still broken in D6.

salvis’s picture

Issue tags: +Performance

(don't know why this happened...)

Status: Needs review » Closed (outdated)

Automatically closed because Drupal 6 is no longer supported. If the issue verifiably applies to later versions, please reopen with details and update the version.

jvieille’s picture

I tried to apply #194

catastropic impact on performance. A complex query with many joins that took 100 ms ran instead in 100 seconds