Drupal Association members fund grants that make connections all over the world.
I became manager/developer of a Drupal site that has 19M nodes. I am now upgrading it to Drupal 6. When I request the page /admin/content/node, the function node_admin_nodes() (node.admin.inc) generates the query "SELECT DISTINCT n.*, ... FROM node n, user u ..." to generate the paged list of nodes. This query takes hours to run. "EXPLAIN " shows that the reason it takes so long is because mySQL does not use any of the indexes for the node table and therefore scans all 19M nodes. A DBA support guy at my hosting company explained to me that this is because of the "DISTINCT". Without the "DISTINCT" keyword, this query takes milliseconds to return the first 50 nodes for the display.
Further investigation on my part shows that the "DISTINCT" keyword gets added by node_db_rewrite_sql, the node module's over-ride of http://api.drupal.org/api/function/db_rewrite_sql/6. It does so because node_access_view_all_nodes() returns FALSE. This happens because my node_access table is empty.
My question is what do I need to do so that node_access_view_all_nodes() will return true when I or anyone with the 'Administrator' role (the role chosen in the admin_role module to act as administrator so that admins can log in as themselves
) can view the /admin/content/node page without locking up the site.
I have two observations:
1) A user with the admin role (can do anything) should implicitly have permission to view all nodes.
2) It is not clear to me why adding the DISTINCT keyword satisfies security restraints if the user does not have permission to view all nodes. In other words, if the user does not have permission to view all nodes, he/she should not even be able to access this page and the query should not run at all!
Can someone explain?