Please note the following about the SQL below created by the node reference module:

- On a DB of about 100k nodes it took 3 secs.
- Two "OR n.type = '1'"
- The "n.title LIKE '%%'" statement has no string in it.

The reason it is slow in MySQL (5.0) is that the ORDER BY clause can't use indexes due to differences with the WHERE clause (see http://dev.mysql.com/doc/refman/5.0/en/order-by-optimization.html).

Showing rows 0 - 29 (30 total, Query took 3.0325 sec)

SELECT DISTINCT (
n.nid
), n.title AS node_title, n.type AS node_type
FROM node n
WHERE (
n.type = 'product'
OR n.type = '1'
OR n.type = '1'
)
AND n.title LIKE '%%'
ORDER BY n.title, n.type

Here is the explain:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE n ALL node_type NULL NULL NULL 101444 Using where; Using temporary; Using filesort

Comments

Reg’s picture

Here's a restructured query. This version doesn't have the errors fixed in the SQL but it does solve the speed problem. What I don't know is whether it is legitimate to put a LIMIT clause in the statement but my goal at this point was just to address the speed issue. I'm hoping the owner of the module will take it from here for the speed issue. I'll have a look and see if I can find a solution for the SQL malformation.

Showing rows 0 - 29 (5,000 total, Query took 0.1335 sec)
SQL query:
(
SELECT DISTINCT (
n.nid
), n.title AS node_title, n.type AS node_type
FROM node n
WHERE (
n.type = 'product'
OR n.type = '1'
OR n.type = '1'
)
AND n.title LIKE '%%'
LIMIT 5000
)
ORDER BY node_title, node_type
Reg’s picture

I think I know where some of the error in the SQL is coming from. In the module there is this function:

function _nodereference_potential_references($field, $return_full_nodes = FALSE, $string = '', $exact_string = false) {

Then in the function there is this line:

    if (isset($string)) {

However, there is no check to see if the string is empty.

Reg’s picture

Title: Creates malformed SQL and does not scale » Creates malformed SQL which is also slow on large data

I made two changes to make the SQL statement correct:

This is around line 433 in function _nodereference_potential_references to remove the duplication

# if ($related_type) { # changed to below
   if ($related_type && !in_array($related_type, $args)) {

This is around line 447 in function _nodereference_potential_references to remove the LIKE clause when string is empty

# if (isset($string)) { #changed to below
   if (!empty($string)) {
Reg’s picture

Title: Creates malformed SQL which is also slow on large data » Creates malformed SQL and does not scale

Changed title to make more sense.

Reg’s picture

Title: Creates malformed SQL which is also slow on large data » Creates malformed SQL and does not scale

Speed help. If you follow some of the issues regarding the node tables primary key then you can choose to set the primary key to just the nid field. If you do this then you can also remove the DISTINCT from the query except that it is introduced by node_db_rewrite_sql($query, $primary_table, $primary_field) and bypassing it from what I have read means not playing nice with node security so it doesn't look like an option.

yched’s picture

I committed the part about empty string (D5 and D6 branches).

About

WHERE (
n.type = 'product'
OR n.type = '1'
OR n.type = '1'
)

Well, this looks strange - can you resave your settings for the noderef field and see if the query still has this afterwards ?

KarenS’s picture

Status: Active » Closed (won't fix)

The D5 version is no longer being supported. Sorry.