Hi

Updating the index does not work when a node is updated on my site. After debugging, I traced the issue to the line below:

$result = db_query_range("SELECT asn.nid, asn.changed FROM {apachesolr_search_node} asn ". $join_sql ."WHERE (asn.changed > %d OR (asn.changed = %d AND asn.nid > %d)) AND asn.status = 1 ". $exclude_sql ."ORDER BY asn.changed ASC, asn.nid ASC", $args, 0, $limit);

There is a check on nid. For example, when I update a single node or multiple nodes, the values I get in $args are (time(), time(), node_id). The query checks for asn.nid > node_id, so the updated node is missed. The nid check is fine when we delete the index and re-index, because all three parameters are then 0.

I also checked apachesolr_search_node, the table that is queried when an update or a new index run happens. The table is updated on node save, and if I save nid 46 first and nid 36 next, I see 36 last with 46 above it. So when the query above runs, it checks for nid > 46, and both 46 and 36 are missed because it returns no rows. I don't see sorting being done anywhere.

Why can't the nid be removed from the condition, since we already have the timestamp check?

Please correct me if I am wrong or have misunderstood the condition used there.

Comments

pwolanin

Version: 6.x-1.0 » 6.x-1.x-dev
Status: Active » Postponed (maintainer needs more info)

Since this code was revised recently, it's possible there is a logic error. However, your comments above don't make sense to me: the parameters to the query come from the saved variable.

asak

Subscribing. I think we're having the same issue.
Will look into it as well.

jusfeel

subscribe

pwolanin

Note: the nid cannot be removed from the query, since multiple nodes may have been updated in the same second.
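To illustrate why, here is a minimal sketch in PHP 7+ that models the table as an in-memory array. The functions next_batch and index_all are hypothetical, not part of the module. It pages through three nodes saved in the same second, two per run, once with the (changed, nid) cursor used by the query above and once with a timestamp-only cursor.

```php
<?php
// Minimal sketch (PHP 7+). The table is a hypothetical in-memory array of
// ['nid' => ..., 'changed' => ...] rows, not the real apachesolr_search_node.

// Return the next batch after the cursor, ordered by changed, then nid.
function next_batch(array $rows, $last_changed, $last_nid, $limit, $use_nid) {
  $candidates = array_filter($rows, function (array $r) use ($last_changed, $last_nid, $use_nid) {
    if ($use_nid) {
      // Compound cursor, as in the WHERE clause of the original query.
      return $r['changed'] > $last_changed
        || ($r['changed'] === $last_changed && $r['nid'] > $last_nid);
    }
    // Timestamp-only cursor: the proposed simplification.
    return $r['changed'] > $last_changed;
  });
  // ORDER BY changed ASC, nid ASC
  usort($candidates, function (array $a, array $b) {
    return [$a['changed'], $a['nid']] <=> [$b['changed'], $b['nid']];
  });
  return array_slice($candidates, 0, $limit);
}

// Index everything starting from an empty cursor, $limit nodes per run.
function index_all(array $rows, $limit, $use_nid) {
  $last_changed = 0;
  $last_nid = 0;
  $indexed = [];
  while ($batch = next_batch($rows, $last_changed, $last_nid, $limit, $use_nid)) {
    foreach ($batch as $r) {
      $indexed[] = $r['nid'];
    }
    // Advance the cursor to the last node of the batch.
    $last = end($batch);
    $last_changed = $last['changed'];
    $last_nid = $last['nid'];
  }
  return $indexed;
}

// Three nodes saved in the same second, indexed two per run.
$rows = [
  ['nid' => 46, 'changed' => 1000],
  ['nid' => 36, 'changed' => 1000],
  ['nid' => 50, 'changed' => 1000],
];
echo implode(', ', index_all($rows, 2, true)), "\n";  // 36, 46, 50
echo implode(', ', index_all($rows, 2, false)), "\n"; // 36, 46 (nid 50 is never indexed)
```

With a timestamp-only cursor, the second run asks for changed > 1000 and finds nothing, so any same-second node that did not fit in the first batch is skipped. With a large enough limit both variants index all three nodes; the nid tie-breaker only matters when a batch boundary falls inside a group of nodes that share a timestamp.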

jpmckinney

Status: Postponed (maintainer needs more info) » Fixed

Here's my analysis:

Both apachesolr_batch_index_nodes and apachesolr_search_cron call apachesolr_get_nodes_to_index and pass the returned nodes to apachesolr_index_nodes.

Q: How do we get the nodes to index? A: apachesolr_get_nodes_to_index calls apachesolr_exclude_types, which calls apachesolr_get_last_index, which extracts the last_changed and last_nid variables from the apachesolr_index_last variable. It performs a query against the apachesolr_search_node table, ordering results by the changed and nid columns.

Q: How is the apachesolr_index_last variable set? A: That variable is set in only two functions. The first, apachesolr_clear_last_index, resets apachesolr_index_last, effectively setting last_changed and last_nid to 0. In that case, apachesolr_get_nodes_to_index would get all the nodes, up to a given limit. The second, apachesolr_index_nodes, sets last_changed and last_nid to the values of the changed and nid columns, respectively, of the last indexed node (taken from the apachesolr_search_node table).

As such, nodes are processed in order of their last change, oldest first. If many nodes are updated in the same second, they are processed in order of nid. Thus, I don't think there is a bug within apachesolr_batch_index_nodes.
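The cycle described above can be modelled as a rough sketch (PHP 7.4+). The function names echo the module's, but these are hypothetical, simplified stand-ins: the table and the apachesolr_index_last variable are plain arrays, and "indexing" just returns the nids. The sketch also replays the scenario from the original report, where nid 46 and then nid 36 are saved in the same second after a full index.

```php
<?php
// Rough model of the indexing cycle (PHP 7.4+). The names echo the Apache Solr
// module's functions, but these are hypothetical simplified stand-ins.

// Stand-in for apachesolr_clear_last_index: reset the cursor to zero.
function clear_last_index(array &$index_last) {
  $index_last = ['last_changed' => 0, 'last_nid' => 0];
}

// Stand-in for apachesolr_get_nodes_to_index. $search_node maps nid => changed.
function get_nodes_to_index(array $search_node, array $index_last, $limit) {
  $rows = [];
  foreach ($search_node as $nid => $changed) {
    if ($changed > $index_last['last_changed']
        || ($changed === $index_last['last_changed'] && $nid > $index_last['last_nid'])) {
      $rows[] = ['nid' => $nid, 'changed' => $changed];
    }
  }
  // ORDER BY changed ASC, nid ASC, then apply the limit.
  usort($rows, fn(array $a, array $b) => [$a['changed'], $a['nid']] <=> [$b['changed'], $b['nid']]);
  return array_slice($rows, 0, $limit);
}

// Stand-in for apachesolr_index_nodes: "index" the rows, then store the
// changed and nid of the last one as the new cursor.
function index_nodes(array $rows, array &$index_last) {
  if (!$rows) {
    return [];
  }
  $last = end($rows);
  $index_last = ['last_changed' => $last['changed'], 'last_nid' => $last['nid']];
  return array_column($rows, 'nid');
}

$index_last = [];
clear_last_index($index_last);

// Full index: three nodes last changed at t=1000.
$search_node = [36 => 1000, 46 => 1000, 50 => 1000];
$batch = get_nodes_to_index($search_node, $index_last, 10);
echo implode(', ', index_nodes($batch, $index_last)), "\n"; // 36, 46, 50

// The cursor is now (1000, 50). Save nid 46 and then nid 36 in the same later second.
$search_node[46] = 2000;
$search_node[36] = 2000;
$batch = get_nodes_to_index($search_node, $index_last, 10);
echo implode(', ', index_nodes($batch, $index_last)), "\n"; // 36, 46
```

In this model the cursor holds the changed and nid of the last indexed node, not of the node that was just saved, so any node whose timestamp moves past the cursor is picked up regardless of its nid.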

ramprassad

Hi,

Not sure this will help you, but I had a similar problem. I was calling apachesolr_index_nodes($rows, 'apachesolr_search') directly to index a subscription value for a business node. The $rows of nids did not have a timestamp, so this corrupted 'apachesolr_index_last' and caused a big batch of nodes to be reindexed. If you are doing this anywhere, just update the apachesolr_search_node table for the node with the current timestamp. The fix I applied took two steps:

1. Edited a node and saved it.
2. Took the node's changed timestamp from the apachesolr_search_node table and stored it in apachesolr_index_last.

After that, subsequent node updates sent only the updated nodes for indexing.
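A minimal model of the failure mode and fix described above, under one stated assumption: that a row passed to the indexer without a changed value ends up stored in the cursor as 0. The helpers get_nodes_to_index and cursor_after here are hypothetical, not the module's functions.

```php
<?php
// Hypothetical model (PHP 7+), not the module's code.
// Assumption: a row passed to the indexer without a 'changed' value leaves 0
// as the timestamp in the stored cursor.

// Which nids would be queued for indexing? $search_node maps nid => changed.
function get_nodes_to_index(array $search_node, array $cursor) {
  $nids = [];
  foreach ($search_node as $nid => $changed) {
    if ($changed > $cursor['last_changed']
        || ($changed === $cursor['last_changed'] && $nid > $cursor['last_nid'])) {
      $nids[] = $nid;
    }
  }
  return $nids; // ordering omitted; only which nodes are selected matters here
}

// Build the cursor from the last indexed row, defaulting a missing timestamp to 0.
function cursor_after(array $last_row) {
  return ['last_changed' => $last_row['changed'] ?? 0, 'last_nid' => $last_row['nid']];
}

// A fully indexed site: nid => changed.
$search_node = [10 => 1000, 11 => 1001, 12 => 1002, 99 => 1003];

// Indexing nid 99 directly with a row that carries no timestamp.
$cursor = cursor_after(['nid' => 99]);
echo count(get_nodes_to_index($search_node, $cursor)), "\n"; // 4: every node is queued again

// Fix, step 1: edit and save the node, which bumps its changed timestamp.
$search_node[99] = 1010;
// Fix, step 2: copy that timestamp into the cursor.
$cursor = cursor_after(['nid' => 99, 'changed' => $search_node[99]]);
echo count(get_nodes_to_index($search_node, $cursor)), "\n"; // 0

// A later update to nid 11 queues only that node.
$search_node[11] = 1020;
echo implode(', ', get_nodes_to_index($search_node, $cursor)), "\n"; // 11
```

With a zero timestamp in the cursor, every node satisfies changed > 0, so the whole site is queued again; restoring a real timestamp brings the queue back to only the nodes updated after it.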

pwolanin

@James: that's a helpful, coherent writeup of the algorithm. Maybe we should add it to the README?

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.
