_node_query_node_access_alter()
is (at least in some cases) horribly inefficient, causing severe performance issues.
It's not clear to me whether this is a real fixable problem, or if I'm just unaware of some complications which require it to be this way, despite the effect it has.
To paraphrase the sorts of queries I'm seeing, it's doing this:
SELECT node.nid
FROM node
WHERE EXISTS (SELECT na.nid AS nid
FROM node_access na
WHERE na.gid = '0'
AND na.realm = 'all'
AND na.grant_view >= '1'
AND node.nid = na.nid);
instead of this:
SELECT node.nid
FROM node
INNER JOIN node_access na
ON na.gid = '0'
AND na.realm = 'all'
AND na.grant_view >= '1'
AND node.nid = na.nid;
The first version is doing a sub-select per row!
The second version just joins once and it's done.
I have a View which creates a query where the node table appears twice, so I then get two independent per-row sub-selects happening, and the effect on performance is pretty appalling (taking approximately 17 times as long, compared to a pair of joins). I've now disabled the access checks for that View, and am simply relying upon the nodes being published.
Is this fixable? Or does the JOIN approach break down in some cases?
There are no comments in the code explaining why it's using such an inefficient approach, so if it needs to be this way then at the very least the code should explain why?
Comments
Comment #1
jweowu CreditAttribution: jweowu commentedWell I guess this was a purposeful change, originating here: #681760: Try to improve performance and eliminate duplicates caused by node_access table joins
Commit was 09ecc5ba834bda67d5208216e6e19d9eafdca933.
Comment #2
jweowu CreditAttribution: jweowu commented