I have a Search API index on some external entities that are backed by an external database. Using Search API is great because it makes the data available to Views. I am using the database backend.

https://www.drupal.org/project/external_entities

I realized that when a new row is created in the external database, the total number of items is not updated for the Search API index, because Drupal is not notified at all.

Clearing the index does not recount the entities available for indexing, so the new items are left out of the index.

There are a couple of options in the interface that force the recount and a reindex:

- Destroy the index and create it again from configuration
- Disable and enable the index again

Programmatically, I am doing:

$index = Index::load('external_records');
// Clear it.
$index->clear();

IndexBatchHelper::setStringTranslation($this->getStringTranslation());
IndexBatchHelper::create($index);

Recently I changed it to:

$index = Index::load('external_records');

// Disable and re-enable the index to force the tracking data to be rebuilt.
$index->setStatus(FALSE);
$index->save();
$index->setStatus(TRUE);
$index->save();

IndexBatchHelper::setStringTranslation($this->getStringTranslation());
IndexBatchHelper::create($index);

I would like to have a way to do this from the interface in fewer steps. It would also make it more obvious to people in situations similar to mine how to really start the index from scratch.


Comments

rodrigoaguilera created an issue. See original summary.

rodrigoaguilera’s picture

Issue summary: View changes
drunken monkey’s picture

Status: Active » Postponed (maintainer needs more info)

That's a good idea, thanks! Something like this has often been requested, so we really should make this easier. (Even though, under ideal circumstances, this should never be necessary.)

Does this make more sense as a third button in the "Index status form" (i.e., next to "Queue all items for reindexing" and "Clear all indexed data") or as an additional checkbox for both these buttons' confirm forms? Or how would you have seen the UI for this?
(In any case, I guess we'd segue right into the "Track items" batch after submitting the confirm form.)

However, still a note specifically about your case: as explained in the doc block of DatasourceInterface, your custom datasource (or, rather, the module by which it is provided) is responsible for keeping track of new/updated/deleted items. So, it's to be expected that you'll need some custom code for that – if you can't detect CRUD operations on the datasource, then it's expected that you have to provide code that rebuilds the tracking table regularly (or however else you want to handle it). And, by the way, I think this would be the best way to do that (and how I'd implement it in the module):

      $index_task_manager = \Drupal::service('search_api.index_task_manager');
      $index_task_manager->stopTracking($index);
      $index_task_manager->startTracking($index);

That way, you avoid any undesired side effects (and the additional overhead) of disabling the index.
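
A minimal sketch of the "rebuild the tracking table regularly" approach described above, assuming a hypothetical module named mymodule and the index ID external_records from the issue summary. It only wraps the two calls from the snippet in a hook_cron() implementation; whether cron is the right trigger, and how often it should run, are assumptions rather than part of the suggestion:

```php
<?php

use Drupal\search_api\Entity\Index;

/**
 * Implements hook_cron().
 *
 * "mymodule" is a hypothetical module name; "external_records" is the index
 * ID used in the issue summary.
 */
function mymodule_cron() {
  $index = Index::load('external_records');
  // Nothing to do if the index is missing or disabled.
  if (!$index || !$index->status()) {
    return;
  }

  // Discard the current tracking data and set up a fresh tracking run,
  // using the two calls suggested in the snippet above.
  $index_task_manager = \Drupal::service('search_api.index_task_manager');
  $index_task_manager->stopTracking($index);
  $index_task_manager->startTracking($index);
}
```

Since this throws away the existing tracking data on every run, it presumably causes all items to be tracked and indexed again each time, so it is probably only practical for small indexes or infrequent runs.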

drunken monkey’s picture

Title: Add ability to rebuild an index in a way that also rebuild the indexed items » Add UI for rebuilding the tracking table for an index
rodrigoaguilera’s picture

Status: Postponed (maintainer needs more info) » Needs work

Thanks for looking into this. I will improve my code with that snippet.

I agree to adding that third button as something like "Clear all indexed data and rebuild tracked items", with a confirmation explaining that it is the slowest and most drastic action you can take to solve problems with an index.

In my situation I have no way to hook into the CRUD operations of the datasource, so rebuilding it on demand is my fastest workaround.

drunken monkey’s picture

Status: Needs work » Postponed (maintainer needs more info)

I agree to adding that third button as something like "Clear all indexed data and rebuild tracked items", with a confirmation explaining that it is the slowest and most drastic action you can take to solve problems with an index.

I don't think we want to force people to also delete all indexed data while doing this. There are situations where you know there are only items missing, not surplus items indexed, and then throwing away all the indexed data seems like a waste.
If it's an extra button, I'd just add a "Rebuild tracking information" (or whatever, label TBD) button, and people would have to do "Clear all indexed data" in addition to that if that's what they want.
That's why I also think just having additional checkboxes for the two existing actions might be a good alternative.

Since others were interested in this functionality, too, I'll leave this in "Postponed" for another week or two to get additional input on the best UI.
(Also, "Needs work" is only for when there's already a patch, but that needs work. "Active" would have been the correct status.)

Renrhaf’s picture

Hey there, for a custom need I developed a Datasource fetching data from an external API and indexing it into an ElasticSearch backend.
I needed to track items too before being able to index them via the Search API indexing batch, and I used a custom batch to insert all entries into the tracking table to do so. I'm also interested in this kind of feature.

drunken monkey’s picture

Thanks, good to know! But do you have any opinion on the UI, as discussed in the last few comments (#3 ff.)?

patrickfweston’s picture

I'm running into a similar issue as Renrhaf above. We have an external API that we're indexing into a Solr backend. This API is updated and we detect updates. We were using code similar to what rodrigoaguilera originally posted to reset the tracking for the index. I've updated it to your snippet in #3.

As far as the UI goes, I think it makes sense to add a "Rebuild tracking information" link similar to the "Queue all items for reindexing" and "Clear all indexed data" links. I think grouping it here makes a little more sense because these are all actions to take after an index has been built out, i.e., they all involve updating or refreshing the current index.

You mention checkboxes in #6, but I'm not quite sure what you have in mind for those?

Renrhaf’s picture

I'm also in favor of an additional button in the UI that triggers a batch to rebuild the whole tracking table.
Maybe some documentation should be added here to explain what this is for, because the tracking system works under the hood and is not known to all users.

drunken monkey’s picture

Status: Postponed (maintainer needs more info) » Needs review
FileSize
10.57 KB

OK then, how about this?

You mention checkboxes in #6, but I'm not quite sure what you have in mind for those?

I meant having an "Also rebuild tracking information" checkbox on the "Reindex" and "Clear" confirm forms, instead of a separate form for rebuilding the tracker.

drunken monkey’s picture

Anyone want to test/review?

borisson_’s picture

Status: Needs review » Needs work

I'm here with nits!

  1. +++ b/src/Entity/Index.php
    @@ -1127,6 +1128,22 @@ public function clear() {
    +  public function rebuildTracker() {
    +    if ($this->status()) {
    

    This can be improved by reversing the if:

    if (!$this->status()) {
      return;
    }
    ...
    

    This is just for readability so feel free to ignore.

  2. +++ b/src/Form/IndexRebuildTrackerConfirmForm.php
    @@ -0,0 +1,46 @@
    +    return $this->t("The complete information about existing and indexed item for this index will be deleted and will have to be rebuilt. This should usually not be necessary, but can help if some existing items aren't contained in the index's tracking data for whatever reason (in other words, when the total number of items to be indexed is less than it should be). This action cannot be undone.");
    

    s/item/items/.

    This is really unwieldy to read, both in the patch and when applied, but I don't think we can easily improve that. The description does get very long in the UI as well. Do you think it'd be better to change this into HTML with breaks/paragraphs in between the sentences?

  3. +++ b/src/IndexInterface.php
    @@ -695,6 +695,14 @@ public function reindex();
    +  /**
    +   * Starts a rebuild of the index's tracking information.
    +   *
    +   * @see \Drupal\search_api\Task\IndexTaskManagerInterface::stopTracking()
    +   * @see \Drupal\search_api\Task\IndexTaskManagerInterface::startTracking()
    +   */
    +  public function rebuildTracker();
    

    This is technically an API break. I think that means we should write a change record for this issue?

  4. +++ b/tests/src/Functional/IntegrationTest.php
    @@ -1444,6 +1458,51 @@ protected function checkIndexing() {
    +    $count = \Drupal::entityQuery('node')->count()->execute() - 1;
    

    Can we change this $count to $maniuplated_number_items? Or something else; I don't think $count sufficiently conveys the meaning here.

drunken monkey’s picture

Status: Needs work » Needs review
FileSize
6.08 KB
11.92 KB

Thanks, I agree with all of those!
Here is the change record.

For 1., it's just a bit weird, since the other methods around there follow the other pattern (though your proposed one is of course preferable). I have now also changed it for clear(); I think that makes enough sense to do here even though it's out of scope.

borisson_’s picture

Status: Needs review » Reviewed & tested by the community

Great work, Thomas!

  • drunken monkey committed 20307f5 on 8.x-1.x
    Issue #2930720 by drunken monkey, borisson_: Added a UI for rebuilding...
drunken monkey’s picture

Status: Reviewed & tested by the community » Fixed

OK then, thanks a lot for your feedback and help!
Committed.

rodrigoaguilera’s picture

Great feature!

Thank you Thomas :)

Renrhaf’s picture

Thanks!

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.