_node_query_node_access_alter() is (at least in some cases) horribly inefficient, causing severe performance issues.

It's not clear to me whether this is a real fixable problem, or if I'm just unaware of some complications which require it to be this way, despite the effect it has.

To paraphrase the sorts of queries I'm seeing, it's doing this:

SELECT node.nid
  FROM node
 WHERE EXISTS (SELECT na.nid AS nid
                 FROM node_access na
                WHERE na.gid = '0'
                  AND na.realm = 'all'
                  AND na.grant_view >= '1'
                  AND node.nid = na.nid);

instead of this:

SELECT node.nid
  FROM node
       INNER JOIN node_access na
                  ON na.gid = '0'
                 AND na.realm = 'all'
                 AND na.grant_view >= '1'
                 AND node.nid = na.nid;

The first version is doing a sub-select per row!

The second version just joins once and it's done.

I have a View which creates a query where the node table appears twice, so I then get two independent per-row sub-selects happening, and the effect on performance is pretty appalling (taking approximately 17 times as long, compared to a pair of joins). I've now disabled the access checks for that View, and am simply relying upon the nodes being published.

Is this fixable? Or does the JOIN approach break down in some cases?

There are no comments in the code explaining why it's using such an inefficient approach, so if it needs to be this way then at the very least the code should explain why?

Comments

jweowu’s picture

jweowu’s picture

Issue summary: View changes