Has anyone integrated Radioactivity with Apache Solr? I'm using the apachesolr module for search and would like to add the ability to sort by 'hotness'.

Comments

cpliakas’s picture

I haven't done this yet, but I will be prototyping something soon. The challenge here is that documents in Solr are atomic, so in order to index the radioactivity data for a node or other entity in a meaningful way, you have to reindex the entire document. That means that for up-to-date radioactivity data to be indexed, it is likely that the entire dataset will have to be reindexed often.

There is a technique that Solr provides which allows you to update only the radioactivity data without having to reindex the document it is associated with, but Apache Solr Search Integration doesn't support it yet. If you think this would be a worthwhile feature, throw your support behind #1586320: Add support for the ExternalFileField field type.

Thanks,
Chris
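
For readers unfamiliar with ExternalFileField: Solr reads the values from a plain text file named external_<fieldname> in the index data directory, one key=value pair per line, where the key is the document's unique key and the value is a float. A minimal sketch of generating that file's contents follows. The function name is hypothetical, and the 'node/7' style keys are only placeholders: the real keys must match whatever document IDs the Solr index actually uses.

```php
<?php
/**
 * Builds the contents of a Solr external file from document ID => energy pairs.
 *
 * Hypothetical helper for illustration only; it is not part of Radioactivity
 * or Apache Solr Search Integration.
 */
function example_radioactivity_external_file_lines(array $energies) {
  // Sort by key (as strings) so repeated runs produce the same output.
  ksort($energies, SORT_STRING);
  $lines = array();
  foreach ($energies as $doc_id => $energy) {
    // One "key=value" pair per line; %F formats the float without locale
    // specific decimal separators.
    $lines[] = $doc_id . '=' . sprintf('%.4F', $energy);
  }
  return implode("\n", $lines) . "\n";
}
```

The returned string could then be written with file_put_contents() to the external_<fieldname> file; the path to the Solr data directory depends on the installation and is left out here.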

greggles’s picture

Issue tags: +commonslove

Great point, Chris.

It may be necessary to add only some radioactivity profiles to Solr: the ones whose values are relatively stable, meaning profiles with a long half-life.

Another potentially useful article is https://www.acquia.com/blog/delivering-right-search-results

Also tagging, as this is likely to be used in Commons.

greggles’s picture

Title: Integration with Solr » Integrate radioactivity with Solr for sorting and content bias
Version: 6.x-1.3-rc2 » 7.x-2.x-dev
Category: support » feature

More specific title and more accurate metadata.

patrick.thurmond@gmail.com’s picture

Assigned: Unassigned » patrick.thurmond@gmail.com

I will take a look into doing this.

cpliakas’s picture

Issue posted against Apache Solr Config Generator might be related, so cross-posting.

#1741750: Add support for external file fields

ezra-g’s picture

Assigned: patrick.thurmond@gmail.com » Unassigned

pthurmond is not actually working on this.

cpliakas’s picture

So some additional thoughts ...

Depending on the size of the site and how frequently the radioactivity values are generated, the "crawl" approach could be to simply index the field and boost on the value. For larger sites where radioactivity changes frequently across a really large dataset of tens of thousands of pieces of content, the external file field approach is advantageous because it avoids re-indexing a metric crap ton of content every N minutes.

I think what would help here is to understand the frequency of change in radioactivity values across the dataset. Are the values for the entire dataset changed during cron runs? Can you separate out the generation of the radioactivity calculations from cron via a drush command or something? A better understanding of the calculations will help determine the best way to proceed.

Chris
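
To make the "frequency of change" question concrete, here is a rough sketch of the arithmetic, assuming the energy follows plain exponential half-life decay (the thread mentions half-life, but the exact formula Radioactivity uses is an assumption here). Both function names are hypothetical.

```php
<?php
/**
 * Energy remaining after $elapsed seconds, assuming exponential decay
 * with the given half-life in seconds. Hypothetical helper.
 */
function example_radioactivity_decay($energy, $elapsed, $half_life) {
  return $energy * pow(2, -$elapsed / $half_life);
}

/**
 * Seconds until $energy decays to $cutoff under the same assumption.
 * Returns 0 if the energy is already at or below the cutoff.
 */
function example_radioactivity_seconds_to_cutoff($energy, $cutoff, $half_life) {
  if ($energy <= $cutoff) {
    return 0;
  }
  // Solve energy * 2^(-t / half_life) = cutoff for t.
  return $half_life * log($energy / $cutoff, 2);
}
```

Under this assumption every entity with non-zero energy has a slightly different value on each evaluation, so the number of documents whose indexed value goes stale depends on how many entities are still above the cutoff, not only on how many received new hits.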

pwolanin’s picture

I agree with Chris - please define clearly the expected volume of content, the update frequency of the radioactivity values, and the expected timeframe in which changes are supposed to be reflected in search scores.

Only with this can we even outline a POC integration, and the assumptions and working ranges need to be very, very explicit and clearly documented.

h3rj4n’s picture

I implemented the Apache Solr field_mappings hook so that the Radioactivity field can be indexed by Solr. This will be the first step towards supporting Solr, I guess. I added a patch with the code to do so.

The next part is when and how often the entity should be updated. I added the following lines to the _radioactivity_update_energy function:

<?php
  // Update Solr!
  if ($entity_type == 'node') {
    // If we haven't seen this entity before it may not be there, so merge
    // instead of update.
    db_merge('apachesolr_index_entities_node')
      ->key(array(
        'entity_type' => $entity_type,
        'entity_id' => $entity_id,
      ))
      ->fields(array(
        'bundle' => $bundle,
        'status' => 1,
        // Bumping 'changed' is what queues the node for reindexing.
        'changed' => REQUEST_TIME,
      ))
      ->execute();
  }
?>

Each time the radioactivity changes, the node is queued for reindexing. I added Ultimate Cron to let the Apache Solr cron job run every minute and process all the entities that need to be indexed.

The code above only works for nodes at this point. I tried to create a more generic way of adding the entity to the Apache Solr queue, but that wasn't possible because the function isn't executed in a full Drupal bootstrap, so you don't have access to all the Drupal functions. I didn't add the above part to the patch because I don't think it's the right solution for the challenge.

Edit:

By the way, Radioactivity writes the value straight to the database, so it doesn't need a cron run to process the updated values. That's why I'm able to add the node straight to the Solr index queue.

pinkonomy’s picture

Hi, thanks for the code.
Is it also possible to integrate this with Search API?

fox_01’s picture

Search API integration would be nice too, so that the index doesn't have to be rebuilt manually.

mpp’s picture

Removed the whitespace errors from this patch.

heshanlk’s picture

Status: Active » Needs review
FileSize
1.69 KB

This patch is not related to Apache Solr. It fixes the issue of the timestamp field not being displayed among the Search API fields, and makes it available for sorting and filtering. There will also be a duplicate version of the energy field; maybe we can make use of it for a different purpose, such as ranges. So here is the patch to review.

gremy’s picture

I think it would be very useful to have a hook_radioactivity_post_update() hook so that third-party modules can implement any kind of integration. I created this feature request to address the problem: https://www.drupal.org/node/2503287