When deleting a node, it is not being removed from the Solr index. I am using Solr4 along with the latest dev module (ported to 6.x-3.x) and do not currently use Solr3 to test against. Deleting a node calls apachesolr_remove_entity, which ends up using deleteByQuery.

If I use deleteById, the entity is removed. If I change the query method in deleteByQuery to use the following, it also works:

    $rawPost = '<delete>';
    $rawPost .= '<id>' . htmlspecialchars($document_id, ENT_NOQUOTES, 'UTF-8') . '</id>';
    $rawPost .= '<query>' . htmlspecialchars("sm_parent_document_id:$document_id", ENT_NOQUOTES, 'UTF-8') . '</query>';
    $rawPost .= '</delete>';

In what case would sm_parent_document_id even exist, and is it necessary to have? I'm happy to create a patch (or update the existing Solr 4 patch) with a fix, depending on which method is preferred.

Comments

j0rd’s picture

I'm noticing the same problem. None of my nodes are getting removed.

j0rd’s picture

Priority: Normal » Major

Changing priority to major. Pretty big bug.

j0rd’s picture

Here's a hack patch to resolve the issue.

If you need to fix your install, you might want to look at my module to prune nonexistent nodes from a Solr index (without having to delete the whole thing).

#885484: "Delete Orphaned Nodes from Index" - a batch API for large sites

Nick_vh’s picture

I'll look at this as soon as possible. We might need to add some tests so we can actually figure this out. Too bad the testbot does not have a Solr instance running ;-)

kriboogh’s picture

We are running this one against a solr4 server and can confirm that the patch in #3 works for deleting nodes.

Nick_vh’s picture

Is this only a problem with solr 4?

Nick_vh’s picture

FileSize
5.44 KB

Solving this another way until we've figured it out. Omitting the delete by query is not an option afaik.

Committed the following patch and will do a follow-up asap

Nick_vh’s picture

Status: Active » Needs review
FileSize
6.15 KB
Nick_vh’s picture

This patch actually solves multiple Solr 4 problems that we currently encounter. For some reason it is way stricter when sending documents so we should figure that out.

Status: Needs review » Needs work

The last submitted patch, 1874420-9.patch, failed testing.

Nick_vh’s picture

Status: Needs work » Needs review
FileSize
6.23 KB

Oops, my bad

Status: Needs review » Needs work

The last submitted patch, 1874420-11.patch, failed testing.

Nick_vh’s picture

Status: Needs work » Needs review

#12: 1874420-11.patch queued for re-testing.

Status: Needs review » Needs work

The last submitted patch, 1874420-11.patch, failed testing.

Nick_vh’s picture

Status: Needs work » Needs review
FileSize
8.28 KB

DummySolr did not have the getSolrVersion method

Status: Needs review » Needs work

The last submitted patch, 1874420-16.patch, failed testing.

Nick_vh’s picture

Status: Needs work » Needs review
FileSize
8.21 KB

Late-night coding makes for a whole lot of errors, and my simpletest is broken, so I'm relying on drupal.org...

Nick_vh’s picture

Looking for the root cause of this problem:

https://issues.apache.org/jira/browse/SOLR-3432

Nick_vh’s picture

FileSize
8.15 KB

Ok, so it seems that the slashes in the document ID are treated as regular-expression delimiters by the Solr 4.0 query parser, and that is why the query did not work. This patch should solve it without working around it.
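
To illustrate the slash problem: the Solr 4.0 query parser reads a forward slash as the start of a regular expression, so an ID containing slashes has to be escaped (or quoted) before it goes into a delete-by-query. The following is only a minimal sketch, not the committed patch. The helper name `example_escape_solr_term()` and the ID `abc123/node/42` are made up for illustration.

```php
<?php
/**
 * Hypothetical helper (not part of the apachesolr module): backslash-escape
 * characters that the Lucene/Solr 4 query parser treats as syntax,
 * including the forward slash.
 */
function example_escape_solr_term($value) {
  // Character class of query-syntax characters: + - & | ! ( ) { } [ ] ^ " ~ * ? : \ /
  $pattern = '#([+\-&|!(){}\[\]^"~*?:\\\\/])#';
  // Prefix each matched character with a single backslash.
  return preg_replace($pattern, '\\\\$1', $value);
}

// Made-up document ID in the "hash/entity_type/entity_id" style.
$document_id = 'abc123/node/42';
$query = 'sm_parent_document_id:' . example_escape_solr_term($document_id);

echo $query, "\n";
// Prints: sm_parent_document_id:abc123\/node\/42
```

With the slashes escaped, the parser sees the ID as a plain term instead of an unterminated regular expression.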

Nick_vh’s picture

Version: 7.x-1.x-dev » 6.x-3.x-dev
Status: Needs review » Patch (to be ported)

Committed to 7.x-1.x.

charos’s picture

Has the patch been backported to the latest 6.x-3.x-dev (a straight backport of the 7.x-1.x branch) as well?

AntiNSA’s picture

waiting for 6.3 support

AntiNSA’s picture

Priority: Major » Critical
Nick_vh’s picture

+++ b/apachesolr.api.php
@@ -232,13 +232,13 @@ function hook_apachesolr_delete_by_query_alter($query) {
+function hook_apachesolr_exclude($entity_id, $entity_type, $row, $env_id) {

+++ b/apachesolr.index.inc
@@ -972,7 +972,7 @@ function apachesolr_term_reference_indexing_callback($node, $field_name, $index_
@@ -987,10 +987,14 @@ function apachesolr_term_reference_indexing_callback($node, $field_name, $index_
         // regardless of whether the facet is set to show as a hierarchy or not.
         // We would need a separate field if we were to index terms without any
         // hierarchy at all.
-        $fields[] = array(
-          'key' => $index_key,
-          'value' => $ancestor->tid,
-        );
+        // If the term is singular, then we cannot add another value to the
+        // document as the field is single
+        if ($field_info['multiple'] == true) {
+          $fields[] = array(
+            'key' => $index_key,
+            'value' => $ancestor->tid,
+          );

It seems this slipped into the patch. Some explanation: if the index_key is single-valued and the field is single-valued, there is no point in adding multiple values to the field, since that causes Solr to break. Because it slipped into this patch, it is worth mentioning what was done here.

        // Index parent term against the field. Note that this happens
        // regardless of whether the facet is set to show as a hierarchy or not.
        // We would need a separate field if we were to index terms without any
        // hierarchy at all.
        // If the term is singular, then we cannot add another value to the
        // document as the field is single
        if ($field_info['multiple'] == true) {
          $fields[] = array(
            'key' => $index_key,
            'value' => $ancestor->tid,
          );
        }
David_Rothstein’s picture

The code that slipped in actually caused big problems: #1984664: Single-valued taxonomy field facets disappear after upgrading from 7.x-1.1

That's because the $ancestors variable doesn't just contain the term's parents; it contains the term itself also. So single-valued taxonomy fields are no longer being indexed.

I have a patch to fix it in the above issue.
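
A rough, self-contained sketch of why the check drops everything for single-valued fields. It does not use Drupal. `example_collect_term_fields()` is a hypothetical stand-in for the patched loop, and the index key `im_field_tags` and the term IDs are invented for the example.

```php
<?php
/**
 * Hypothetical stand-in for the patched loop in
 * apachesolr_term_reference_indexing_callback().
 */
function example_collect_term_fields(array $ancestors, $index_key, $multiple) {
  $fields = array();
  foreach ($ancestors as $ancestor) {
    // The check added by the patch: nothing is added for single-valued fields.
    if ($multiple == TRUE) {
      $fields[] = array(
        'key' => $index_key,
        'value' => $ancestor->tid,
      );
    }
  }
  return $fields;
}

// Assumed shape: the list holds the term itself (tid 7) and its parent (tid 3).
$ancestors = array(
  (object) array('tid' => 7),
  (object) array('tid' => 3),
);

$multi = example_collect_term_fields($ancestors, 'im_field_tags', TRUE);
$single = example_collect_term_fields($ancestors, 'im_field_tags', FALSE);

echo count($multi), "\n";  // Prints: 2
echo count($single), "\n"; // Prints: 0
```

Because the term itself is part of the list, skipping the whole loop body for a single-valued field means not even the term's own ID reaches the document.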

David_Rothstein’s picture

Actually, even beyond that problem, the $field_info['multiple'] == true check doesn't really make sense to me here for other reasons; see my comment at #1984664-12: Single-valued taxonomy field facets disappear after upgrading from 7.x-1.1 for why.