Problem/motivation

On complex sites it is sometimes necessary to understand why specific pieces of content are not making it into Solr. The apachesolr module does not currently provide an easy way to test, end to end, pushing a specific piece of content into Solr with immediate, useful debug messages.

Solution

Provide a drush command which takes a specific entity id as its argument and then runs through the entire process of building a solr document and posting it to solr. This should use all the internal apachesolr functions to do the job and should provide helpful error messages at the right point if the module thinks the content should not go to solr.

Comments

johnennew’s picture

Status: Active » Needs review
FileSize
3.4 KB

Please find a patch suggestion for review attached.

This adds a drush command that takes an entity id. For example, to index the node with nid 5:

  drush solr-index-entity 5

wonder95’s picture

I applied this patch and tested and it works perfectly. I have wanted to have this functionality for a long time, precisely for cases where I'm troubleshooting why a particular entity isn't being indexed. If I had more than two thumbs, I would use them all to give a multiple thumbs up.

There is another issue for a similar feature request (see #1908094: Add the ability to index a specific entity), and I voted for it then, too. However, @Nick_vh suggested adding it to solr-index, and in an IRC conversation I had with @pwolanin, he suggested the same thing. There is a very valid reason for not adding it to solr-index, though, and it has to do with the Batch API. As I've documented here, when trying to debug my indexing hook with XDebug in a custom module using solr-index, the debugger dies somewhere in the batch processing and never gets to my code. With this command, that isn't a problem, since it doesn't run through the Batch API. The only other way to target a specific node (that I can figure out) is to clear out the index_entity table so it includes just the one node you want to index, and then run the indexing from the UI. That's a lot of work. This patch alleviates that problem and allows for easy debugging of custom indexing code.

johnennew’s picture

Thanks @wonder95 for the review. I regularly use this function on many sites and still think it would be a useful addition to this module.

I didn't mention when I submitted the patch but it is also possible to index any entity by passing in an entity_type parameter, for example:

   drush solr-index-entity --entity_type=my_custom_entity 5

If omitted, the default entity_type is node.

RoySegall’s picture

This needs some shaping up, and I'll explain:
1. As I see it, the functionality needs to be more of an API, since the re-index function can be used outside of a drush context. For example, when a node is updated to unpublished and you then run a search on the site, that node will still appear in the results. What I need to do is call the index function for that node, and it will be gone from the search.
2. In Search API there is an option to index content right after it's created. We can use this to add that option from the UI (can be done in another patch).
3. This is the most important: testing. I can't see a way to write a test for this functionality when it happens in a drush context. Not something I'm aware of.

RoySegall’s picture

I tested this one and it did not update a specific node for me. Maybe I'm missing something. I'm uploading a patch with my progress.

Status: Needs review » Needs work

The last submitted patch, 6: apachesolr-drush_index_specific_entity-2197185-6.patch, failed testing.

pvhee’s picture

That's a very useful extension for debugging single nodes indeed. For me it always returned "Solr didn't like that document", even though the Solr documents were submitted fine.

amontero’s picture

The patches in #1 and #6 both have the same performance issue that user escuriola found in #2623562: Avoid clearing all node cache when entity_load.

Attached is a reroll of #6, applying the same solution as described in the above linked issue.

The patch interdiff:

133c122
< +  $entities = entity_load_unchanged($entity_type, array($entity_id));
---
> +  $entities = entity_load($entity_type, array($entity_id), array(), TRUE);
206,208d194

Also, note that the alternative solution in #1816462: Possible to instantly index an entity / node? does take this into account.

wonder95’s picture

Attached is a re-rolled patch that fixes two problems with the previous patch:

  1. Fixes call to entity_load_unchanged() to pass $entity_id by itself, not in an array.
  2. Gets rid of reset() call on returned entity object, since entity_load_unchanged() does that on its last line.

amontero’s picture

Status: Needs review » Needs work

Good call. I didn't notice that entity_load_unchanged() works on a single entity.
I also like the environment check. However, shouldn't we check for Solr readiness before sending the document?

amontero’s picture

@RoySegall: Reviewing the code, I see that the apachesolr_entity_index() function you've added is not used anywhere by the Drush command. I'm 100% in favor of such a function, since I would benefit from it in my own code. However, I think it should be addressed separately. I suggest you open a new issue to address it without the Drush specifics. Later, we could update the Drush command in the present issue to use the new API function. This would also make review easier for the module maintainers. In the meantime, we can keep working on the code in this issue to get something useful sooner. That work will still be useful later for extracting a separate API function as you propose. Does that make sense?

@wonder95: Checking the SO thread you linked in #2, I see there is a recent answer. Did it solve the XDebug issue? Does the comment in #2 still stand?

I'm still figuring out the module's way of sending entities to Solr so I can compare it with the current patch and make sure we don't leave anything out. We're already using code from this issue in production and nothing has broken, but I would like to have it polished, reviewed and committed.