Prelude from freblasty:

I’ve committed a change to the tracking logic. Indexed items are marked using changed timestamp of “0”. Also inserted items use REQUEST_TIME. This will ensure all items are processed in the order that they got tracked. The usage of getLastIndexed() and setLastIndexed() are commented out and left there until the now tracker flow is reviewed.

Because of this commit the index batch process is operational now. On a side note I’m wondering whether Index::clear() shouldn’t take the index status and read-only value into account?

Current implementation:

  /**
   * {@inheritdoc}
   */
  public function clear() {
    $this->getServer()->deleteAllItems($this);
    $this->reindex();
    return TRUE;
  }

Proposed implementation:

  /**
   * {@inheritdoc}
   */
  public function clear() {
    if ($this->hasValidServer() && $this->reindex()) {
      $this->getServer()->deleteAllItems($this);
      return TRUE;
    }
    return FALSE;
  }

The proposed implementation allows the clear function to reuse the same conditions used by reindex. This way the IndexClearConfirmForm could reuse clear() instead of defining its own logic.

Support from Acquia helps fund testing for Drupal Acquia logo

Comments

Nick_vh’s picture

I'm a little scared you reverted a design decision we made during dev days. The goal is to reduce the writes so whenever an item is tracked it should not update that row in the table. Indexing should be a read-only thing and using the variable we should keep track of where we were in the process.

From what I remember, the getRemainingItems() function does not incorporate the "new" items yet but it was still an item in the todo list.

I'll review it but I think we might need to revert it and talk more before we make those kind of changes.

Nick_vh’s picture

Also, we agreed we would not add functionality without tests so I'm going to revert your change and perhaps we can meet during the day to discuss it?
I'll be on IRC until 5:30PM.

Nick_vh’s picture

FileSize
6.87 KB
Nick_vh’s picture

Status: Active » Needs review

Uploaded patch works but is far from perfect. But it does show that we do not need the reset to 0 after indexing. It just does not make sense in a large scale site to update millions of rows during an indexing phase if all you want is to keep track of where you were.

aspilicious’s picture

I agree with nick on this one, such changes should be handled by patches in stead of a push, in order to maintain a clear overview of the end product.
And how the end result should work frontend and backend wise.

freblasty’s picture

FileSize
7.09 KB

The patch in #3 ensure all items are processed in chronological order and the index simply keeps track of the last processed item. However new items are being tracked using changed '0' which breaks the chronological order. The patch attached is based on #3 with two minor changes:
- inserting an item for tracking uses REQUEST_TIME;
- as changed with value '0' is no longer possible, i removed it from the getRemainingItemsQuery().

I've tested the workflow by hand for now and everything seems to work. Only thing I noticed is that when using the batch process it reports items not being indexed while everything indexed. But i'll fix that problem once this issue is fixed.

PS: Sorry for reverting a design decision that was made during dev days.

freblasty’s picture

Notes from brief hangout between me and Nick

It has come to my attention that newly tracked items were set to timestamp '0' deliberately. This to ensure new items get processed before changed items. However this causes a problem in how items are being processed using the new design.

In the new design tracking indexed items should never perform any database writes. To keep track of what was indexed and what is pending we use a state per index. As all items are sorted chronologically we only need to save the last indexed item from this chronological list to keep track of the progress (FIFO).

As a result adding tracked items with timestamp '0' causes the chronological list to enter a corrupt state. In a normal sens everything up to the last indexed state have been processed but when an item with timestamp '0' is added the meaning of processed breaks and is no longer guaranteed.

Action

Before we create some more advanced queue to ensure new items get processed before changed items, we're wondering if processing new items before changed is really that important? It seems more logical that everything gets processed in the same order as they were add/modified.

@Thomas: Could you add some feedback?

freblasty’s picture

Assigned: Unassigned » drunken monkey
drunken monkey’s picture

Assigned: drunken monkey » Unassigned

It seems more logical that everything gets processed in the same order as they were add/modified.

It was a design decision already in Drupal 7 that it makes sense to index newly created items as soon as possible, since they otherwise won't show up at all in searches, while changed items' indexed state will usually be already very similar to the new one. Implementing this with the new system is trivial, just using WHERE changed = 0 OR changed > :last_time OR (changed = :last_time AND item_id > :last_id). It would of course necessitate a second write per item in its total lifecycle (a third, if you count an eventual delete), but I don't think that's such a big problem.
I'm prepared to let myself be overruled here, though, if most people feel indexing items in the order of the changes makes more sense than indexing new items first.

So, what's everyone else's take on this?

Nick_vh’s picture

What we are building here is a priority tracking table and I'm sure there are better ways for building such systems then to "fake" the created timestamp in the tracking table.

As I see it we have some options here:

1) Use a first in first out approach and clean up the special case for the new items. This would make it simpler to understand what is going on in the tracking table + you avoid all writes during the indexing phase

2) Use a first in first out approach but add prioritization to this system. It should be a proper prioritization since other systems should be able to leverage this. An idea would be an extra column with priority and integer values when adding it. Similar to "boosts". We can sort on those values so that the order of changed is always the same. It does make it much more difficult to then get the next set of values because now we need to store the prioirty and the item id to get a grip on where we were last time but the priority could have changed. I don't think this system is stable enough to be reliable for large scale deployments.

3) Option 1 + an option to index new items directly on creation. This would be similar to the index items immediately but only for new items. This allows the user to choose if new items should be indexed instantly

I'm a big fan of option 3, as it does not remove the priority system but it leverages different systems.

alarcombe’s picture

Agree with nick about faking timestamps. Does the job, but whilst we've got the space to Do It Right, then we probably should. Also there will be different requirements based on use cases. I'm a big fan of 'let the user decide but provide sensible defaults for the 90%' - we shouldn't enforce one way of doing things at the exclusion of others which may be more appropriate.

So I'm in favour of number 3 :)

freblasty’s picture

Assigned: Unassigned » freblasty
freblasty’s picture

FileSize
22.87 KB

First attempt at implementing option 3. This patch will allow users to select three types of direct indexing: "Disabled", "New items" and "New and changed items". Depending on the direct indexing type the datasource plugin will react differently when DatasourceInterface::trackInsert() or DatasourceInterface::trackUpdated() is invoked.

Once an index has queued items it will register itself for processing. Processing itself is done by using a shutdown function. All queues and list are implemented using Drupal::state().

Currently there is still one problem I could find with this implementation. When last indexed is set to its initial state "id:0 and changed:0" then adding items to the queue fails and nothing is being processed or added to the tracking table. Should we update last indexed in this edge case?

Edit: This patch is based on the previous patches.

aspilicious’s picture

When creating yml files it's important that they are human readable.
I think we should switch to "text based" options for the indexing options in stead of numbers.
Makes it very easy for site builders and search api maintainers to investigate the config.

Nick_vh’s picture

  1. +++ b/lib/Drupal/search_api/Datasource/DatasourcePluginBase.php
    @@ -207,6 +218,17 @@ abstract class DatasourcePluginBase extends IndexPluginBase implements Datasourc
    +        if ($index->getOption('index_directly') === SEARCH_API_INDEX_DIRECTLY_ALL) {
    

    If we index all directly. Is there a need to keep track of the items in the table? Might be a silly question, but we would only need to keep track of this in case we want to switch back to a tracking system? I guess I'm a little confused on the purpose of index all directly and its impact on the tracking table & performance

  2. +++ b/lib/Drupal/search_api/Datasource/DatasourcePluginBase.php
    @@ -207,6 +218,17 @@ abstract class DatasourcePluginBase extends IndexPluginBase implements Datasourc
    +          // Set changed to just before the last indexed timestamp.
    

    Might need more explanations so that people understand why it goes to -1. This changes the timestamp to 1 second before the last updated date of the index pointer. This means that we will trick the indexer into thinking it already indexed it by putting a timestamp in the table that is not 0 but also not one of the values that will be chosen next.

  3. +++ b/lib/Drupal/search_api/Entity/Index.php
    @@ -700,26 +700,113 @@ class Index extends ConfigEntityBase implements IndexInterface {
    +      $ids = array_keys(($limit > -1 ? array_slice($queue, 0, $limit, TRUE) : $queue));
    

    one liners are neat but unreadable. Please make it more readable

  4. +++ b/lib/Drupal/search_api/Entity/Index.php
    @@ -700,26 +700,113 @@ class Index extends ConfigEntityBase implements IndexInterface {
    +      $this->dequeueItems(array_intersect($ids, $ids_indexed));
    

    Actually there is no queue anymore, it's a tracking table. Can we rename this?

    The queue is only used for tasks, not really for items. I guess this addition of the queue might not even belong to this patch

So in short, the queue that you added here really is another issue and we're making this too complex. In D7 Search API it was decided to not use queue's anymore for those purposes OR I'm confused why you are using the queue here. Queue's are inflexible for tracking items. Saving an entity twice adds it twice to the queue as you cannot inspect and iterate over a queue.

Nick_vh’s picture

Status: Needs review » Needs work
freblasty’s picture

The queue is being used to hold items that need processing. Once processed they get removed from the queue. Note that queue might not be the correct in this context. It is simply a storage not a real Drupal Queue.

In D7 Search API a static cache was used to hold the items which need direct processing.

Nick_vh’s picture

Hmm, so this "queue" will never live out of the page request because we're using the subscriber to execute the processQueueItems I see. So if we have a drush script that updates 1000 items this will add all those items to the list/queue and process them at the end of the update process?

The whole logic makes sense but I'd be careful calling it queue. as it is not in line with a "drupal" queue. It's a page-request-static-cache using the Drupal 8 state system.

let's rename it so it's clear. itemsToIndexList, staticQueue?

Nick_vh’s picture

Another suggestion for the name:

ephemeralTracking
addToEphemeralTracking
removeFromEphemeralTracking
processEphemeralTracking
...

ephemeral is often used to show a static state that does not live outside of the current scope. Such as cloud storage, .... See http://en.wikipedia.org/wiki/Ephemeral

Nick_vh’s picture

FileSize
23.73 KB

Making it work with both the test services

Nick_vh’s picture

FileSize
23.73 KB

Reroll so it applies again after the options of the index were fixed.

drunken monkey’s picture

  1. +++ b/lib/Drupal/search_api/Datasource/DatasourceInterface.php
    @@ -94,7 +94,7 @@ interface DatasourceInterface extends IndexPluginInterface {
        * @param array $ids
    -   *   An array of item IDs.
    +   *   An array of item IDs with the timestamp of their changed dates.
    

    What is that supposed to mean? Also, I don't see any change in the code relating to this.

  2. +++ b/lib/Drupal/search_api/Datasource/DatasourcePluginBase.php
    @@ -162,6 +162,17 @@ abstract class DatasourcePluginBase extends IndexPluginBase implements Datasourc
    +        // Check if new items need to be processed directly.
    +        if ($index->getOption('index_directly') !== SEARCH_API_INDEX_DIRECTLY_DISABLED) {
    +          // Get the last indexed state.
    +          $last_indexed = $index->getLastIndexed();
    +          // Set changed to just before the last indexed timestamp.
    +          $changed = ($last_indexed['changed'] - 1);
    +        }
    +        else {
    +          // Use the current request time.
    +          $changed = REQUEST_TIME;
    +        }
    

    This code doesn't belong into the datasource controller. This logic is completely generic for the index, the datasource should only need to know about new/changed/indexed/deleted items.

    If you need to use such complicated logic here, and even set a wrong timestamp after all, then this might show the limit of the new tracking system. (See also below.)

  3. +++ b/lib/Drupal/search_api/Datasource/DatasourcePluginBase.php
    @@ -251,18 +273,13 @@ abstract class DatasourcePluginBase extends IndexPluginBase implements Datasourc
    +        // Move to end of the array
    +        $last_key = array_slice($ids, -1, 1, TRUE);
    +        // Get the item id
    +        $item_id = key($last_key);
    +        // Get the changed value
    +        $changed = reset($last_key);
    +        $index->setLastIndexed($changed, $item_id);
    

    This also makes me worry a lot about the capabilities of the new tracking system. You don't check at all which items were successfully indexed, or whether indexing failed for some of them – you only remember a single ID (implicitly assuming the IDs are in the same order they were returned in from getRemainingItems(), which is also not guaranteed in any way).

    While I understand that reducing writes is a good goal, the system has to be correct and error-proof foremost. If this isn't possible or too complicated with the write-optimized system, we should probably switch back to the old one. (Minus the magic "1" for new items.)

  4. +++ b/lib/Drupal/search_api/Entity/Index.php
    @@ -700,26 +700,113 @@ class Index extends ConfigEntityBase implements IndexInterface {
    +      // Get the queue items.
    +      $queue = \Drupal::state()->get("search_api.index.{$this->id()}.item_queue", array());
    

    I don't think the usage of the State API is needed here. If you only save the items for the request, you can just use a property on the entity object, or a static property on the utility class, or something similar.

    I'm also not sure that the index entity class needs to know about this functionality at all. I think reacting to CUD of items should be handled in the utility class.

    It also doesn't make sense, in my opinion, to add a $limit parameter to this method.

  5. +++ b/lib/Drupal/search_api/Entity/Index.php
    @@ -700,26 +700,113 @@ class Index extends ConfigEntityBase implements IndexInterface {
       public function reindex() {
    -    return $this->getDatasource()->trackUpdate();
    +    // Remove all items from the queue as they will be mark for reindexing.
    +    $this->dequeueItems();
    +    // Get the current index directly state.
    +    $index_directly = $this->getOption('index_directly');
    +    // Disable the directly indexing of items until track update is finished.
    +    $this->setOption('index_directly', SEARCH_API_INDEX_DIRECTLY_DISABLED);
    +    // Mark all tracked items for reindexing.
    +    $success = $this->getDatasource()->trackUpdate();
    +    // Restore index directly option.
    +    $this->setOption('index_directly', $index_directly);
    +
    +    return $success;
       }
    

    This code more or less proves that the current code in this patch is way too tightly coupled. Changing an option just for one method call is usually a strong sign for that.
    Dequeueing items should also not be necessary – it will only be relevant if someone programmatically does both an "Index now" and a "Queue all items for reindexing" in the same page request; and if they do, it shouldn't be our problem.

  6. +++ b/lib/Drupal/search_api/Entity/Index.php
    @@ -803,11 +890,19 @@ class Index extends ConfigEntityBase implements IndexInterface {
    +        "search_api.index.{$this->id()}.last_indexed",
    +        "search_api.index.{$this->id()}.item_queue",
    

    $this is not defined here (but, as said, I don't think this code is needed anyways).

  7. +++ b/lib/Drupal/search_api/EventSubscriber/SearchApiEventSubscriber.php
    @@ -0,0 +1,36 @@
    +class SearchApiEventSubscriber implements EventSubscriberInterface {
    

    Does it need the "SearchApi" prefix?

    Also, what does it do? It should definitely have a class documentation explaining exactly that (because I really have no clue).

  8. +++ b/lib/Drupal/search_api/EventSubscriber/SearchApiEventSubscriber.php
    @@ -0,0 +1,36 @@
    +  /**
    +   *
    +   * @param \Symfony\Component\HttpKernel\Event\GetResponseEvent $event
    +   */
    

    Obviously missing documentation here, at least a {@inheritdoc}.

  9. +++ b/lib/Drupal/search_api/Index/IndexInterface.php
    @@ -262,6 +262,35 @@ interface IndexInterface extends ConfigEntityInterface {
    +   * Dequeue items from the index queue.
    +   * ¶
    +   * @param array|NULL $ids
    

    Trailing whitespace.
    Also, it's "null" in doc comments, all lower case.

  10. +++ b/search_api.module
    @@ -16,6 +16,21 @@ use Drupal\search_api\Utility\Utility;
    +/**
    + * Constant which indicates items are not processed directly.
    + */
    +define('SEARCH_API_INDEX_DIRECTLY_DISABLED', 0);
    +
    +/**
    + * Constant which indicates only new items are processed directly.
    + */
    +define('SEARCH_API_INDEX_DIRECTLY_NEW', 1);
    +
    +/**
    + * Constant which indicates new and changed items are processed directly.
    + */
    +define('SEARCH_API_INDEX_DIRECTLY_ALL', 2);
    +
    

    I think these should now use const instead of define().
    Also, if we are keeping integers instead of names (though I'm also more in favor of names), I would make them bitmasks. So, create an additional "*_CHANGED" constant with 2, and set "*_ALL" to 3 (i.e., ALL | CHANGED).

  11. +++ b/search_api.module
    @@ -48,6 +63,10 @@ function search_api_entity_insert(EntityInterface $entity) {
    +          $index->AddToEphemeralTracker(array($entity->id()));
    

    This method doesn't exist in the patch.

    Also, I don't like that name at all, "ephemeral" is a much too complicated word here. I also don't think I've ever seen it used in code.

    In D7 this was called "index_delayed", maybe that's good enough? Otherwise, "static queue" might be a better name. I also was very confused by the use of "queue" here.
    rememberForIndexing() or indexAtEndOfRequest()? All pretty horrible, unfortunately …

Nick_vh’s picture

RE: 2)

This code doesn't belong into the datasource controller. This logic is completely generic for the index, the datasource should only need to know about new/changed/indexed/deleted items.

I'm not sure if the datasource should even know about new versus changed. Keep in mind that we're trying to mimic the same behavior as Drupal 7 but by being more generic. Adding the "Index new items directly" adds this -1 complexity. But perhaps there is even a better way. Maybe the index should then not even send the item to the tracking table but only add it when we're doing an update. This keeps the tracking table from additional complexity while maintaining the "index new items directly".

FWIW I'm not even sure if new items should be indexed directly and perhaps that could even be a separate patch. Based on my experience this is not something that people really care much about. It's either all directly or all queued.

3)

This also makes me worry a lot about the capabilities of the new tracking system. You don't check at all which items were successfully indexed, or whether indexing failed for some of them – you only remember a single ID (implicitly assuming the IDs are in the same order they were returned in from getRemainingItems(), which is also not guaranteed in any way). While I understand that reducing writes is a good goal, the system has to be correct and error-proof foremost. If this isn't possible or too complicated with the write-optimized system, we should probably switch back to the old one. (Minus the magic "1" for new items.)

If an item failed, the indexing should probably stop. If we decide to redo the item we need to re-add the item to the tracking table meaning that the changed value needs to be updated to the current REQUEST_TIME. Having the pointer be accurate is a critical part of the system indeed but it is very possible to make that fool proof. A failed item does not mean that the tracking table is incorrect. It all depends on the behavior on failure what to do with that item. getRemainingItems should guarantee that the items are returned in the same order and gives you back the next set to where we need to point the pointer to.

5)

This code more or less proves that the current code in this patch is way too tightly coupled. Changing an option just for one method call is usually a strong sign for that.
Dequeueing items should also not be necessary – it will only be relevant if someone programmatically does both an "Index now" and a "Queue all items for reindexing" in the same page request; and if they do, it shouldn't be our problem.

Or we are trying to do too many things at once. If we can start with a good tracking system, we can add the index directly option in a follow-up once the tracking is stabilized? If not, we have some iterations to do :)

Regarding the name of this queue mechanism, ephemeral is perhaps complex but it does say what it is. A short-lived storage. It's used in servers and fwiw I also use it in professional contexts.

freblasty’s picture

FileSize
29.42 KB

Second attempt at option 3. I've tried to decouple the code for indexing more and added a seperate utility class "EphemeralTracking". All defines are replaced with constants.

Current implementation for direct indexing will follow normal indexing flow. That is an item will be added or updated in the tracking table. Then added to the ephemeral tracking. Because of this when an error would occur during direct index processing the items will still get picked up by the cron.

Note: patch is based on changes from #6 and also includes changes made by Nick in #22 and #23.

Nick_vh’s picture

Also, before this can go in we need to add tests so we know this code is robust and is working.

drunken monkey’s picture

If an item failed, the indexing should probably stop.

So one faulty/problematic item could stop, or at least seriously slow down, the entire indexing process? That doesn't seem reasonable. Marking the failed items like they were newly changed, like you suggest afterwards, makes a lot more sense to me. This would at least preserve that capability of the old system. However, that should then be included in this patch. (See the D7 tests, where I also test an item which fails to be indexed. This should work in D8, too, or we'd have to redefine that part of the API.)

However, I don't think that your system would (cleanly) support indexing individual items either, right? You could only mark them as changed and then wait until they are indexed based on the normal order of indexing. (Or just index them without informing the datasource and risking to index it a second time in the same state.)
This would also pose a problem for any "Index immediately" functionality: you would either have to insert/update the item with a wrong "changed" value, or risk losing updates to still unindexed items. (It's not guaranteed that just because "Index immediately" is enabled, no items will be "dirty" in the tracking system – not even if we assume that items will always be indexed without fail.) Or just index it without "telling" the tracking system, ending up with a wrong "changed" value after all and risking an inconsistent/wrong state if indexing fails.

getRemainingItems should guarantee that the items are returned in the same order and gives you back the next set to where we need to point the pointer to.

As said, getRemainingItems() is not the problem – that's in the same class, we can let it guarantee anything we want.
However, all the rest of the indexing pipeline does not guarantee in any way that the order of items would be the same, and we'd have to add quite a bit of either code or method contracts to change that.

Or we are trying to do too many things at once. If we can start with a good tracking system, we can add the index directly option in a follow-up once the tracking is stabilized? If not, we have some iterations to do :)

I think the problem is rather a case of "premature optimization". The D7 system worked perfectly fine, has now been tested for years (and refined, with your and Acquia's help) and was pretty self-explanatory. To improve the latter, we could even add a "state" column, as Frédéric initially did in the D8 port, making the whole system really simple. (Dropping the "new items: changed = 1" special case would be fine for me, too.) With the current new approach, it seems that we trade a very minor performance improvement against either reduced functionality and correctness or largely increased complexity (and thus, error-proneness).

Furthermore, since the D7 system is very much decoupled, it would even be possible to just override this system if the few extra DB writes really are a problem for someone. We could even try to take Frédéric's initial approach a bit further and decouple tracking (more or less) completely from the datasource (which might make much sense in any case for #2230931: Supporting multiple item types per index). That way, overriding this part would become even easier.
In contrast, at least in the current state or from what I could come up with, the new system would be more tightly coupled, making it harder to completely replace with a custom solution. The less we assume of any part of the system, the better for flexibility.

So, I agree that we can very well add and/or refine "Index items immediately" later. But a "good tracking system" as the basis would be one which we don't have to adapt when adding such features that don't/shouldn't impact the tracking directly.

And, something else entirely:

+++ b/lib/Drupal/search_api/Datasource/DatasourcePluginBase.php
@@ -207,20 +211,23 @@ abstract class DatasourcePluginBase extends IndexPluginBase implements Datasourc
+            $statement = $this->createUpdateStatement()
+                ->fields(array('changed' => REQUEST_TIME));
+            // Check whether IDs need to be added as condition.
+            if ($ids_chunk) {
+              $statement->condition('item_id', $ids_chunk);
+            }
+            // Execute the update statement.
+            $statement->execute();

This will also change the "changed" column of items that would already be re-indexed, thus just delaying their being indexed. While unlikely, this might/will even lead to some cases where a frequently updated item never gets indexed.
Since we don't store explicitly which items still need to be indexed, we'd probably have to add the same logic here as for getRemainingItems() to avoid this scenario.

Nick_vh’s picture

So, I agree that we can very well add and/or refine "Index items immediately" later. But a "good tracking system" as the basis would be one which we don't have to adapt when adding such features that don't/shouldn't impact the tracking directly.

Totally agree. I'm all for flexibility and perhaps we went quite far here in trying to optimize - blame engineering-minds right?

I would suggest we look at http://en.wikipedia.org/wiki/Priority_queue and try to get the most basic implementation in as a generic system so that they can be overridden with priority queues of all sorts when that might become a necessity later on.

My thoughts after reading all of the above

  1. Initially a FIFO approach (new and changed are all the same)
  2. a system that allows for priority queues/tracking. Eg, adding a priority column allows for all sorts of algorithms and flexibilities. This system is not yet used in this implementation.
  3. a way to keep track where the system left of. This might prove the most difficult with priorities and setting the changed to a predefined number was one way to know exactly where you've been but I think we can do better. Maybe I/we should reconsider the writes and we should keep track of a status column where we filter by status and sort on changed and item_id and priority for the next set. The other alternative is deleting the item after we indexed it from the tracking table. The latter is an appealing idea for massive sites but as mentioned it could be over-optimization also. Keeping track of where you were using a pointer as suggested at first is impossible with a priority status as items can switch places.
  4. a generic tracking system that is decoupled from the datasource (sorry frederick, we removed it inititially) if someone wants to make optimizations to this system

I'm not fully aware of your thoughts on the tracking system in regards to the multiple entity types per index but I'd love to hear/read them.

Nick_vh’s picture

Issue tags: +sprint
drunken monkey’s picture

Excellent, great to hear. Your plan looks very sound, too.

2. a system that allows for priority queues/tracking. Eg, adding a priority column allows for all sorts of algorithms and flexibilities. This system is not yet used in this implementation.

The good thing is that this wouldn't even be necessary. We only define the interface, implementations can track the items any way they want. We wouldn't need a priority column for our basic implementation, but if someone wants to use one, they can just use a different table with that additional column. (Same if we later want to use a priority queue ourselves after all – we just add a new column to the table, but nothing about the interface changes.)

Re 3: Deleting items is, I think, out of the question simply because we then couldn't show the total amount of items (and the number of indexed ones) anymore. A small feature, sure, but still an important one, I think. Probably it would be possible to keep track of the total or indexed number in some other way, but altogether I think these are also optimizations we should leave to others, if they want them. (What would be the reasoning behind it, anyways? Just the smaller table, or are deletes less expensive than updates?)
Regarding the "state"/"status" (TBD) column: what would be the advantage compared to the current system of using "changed" as a dual-purpose column? Just increased clarity, or is there another advantage, too? In any case I wouldn't be opposed to that change, probably that decision was a premature optimization, too.

a generic tracking system that is decoupled from the datasource (sorry frederick, we removed it inititially) if someone wants to make optimizations to this system

Well, Frédéric's system wasn't really decoupled, it was kind of half-way. What I'd envision now would be that these two systems become completely separate, and that the index just retrieves both a datasource (or several, (hopefully) soon) and a tracker to use for their respective purposes. The tracker would then probably be globally pluggable in some way – maybe by using the utility class to create it and then allowing it there to be overridden in some way. Or maybe we should make it a service? But we can't inject that into the index anyways, right? It might still make sense as a service, though. That's probably a question for Bram or Andrei.

In any case, I don't see a use case for making that overridable on a per-index base, right? (If you really want, it would still be possible, by either overriding the entity class or by creating a proxy tracker class that uses a different implementation for each index.)

One problem with this would be that the tracker would then be unaware of any datasource settings restricting the indexed items. That means, the logic for only tracking items of the selected bundles (in our case) would have to live in search_api_entity_(insert/update/delete)(), before calling the respective tracking methods.
It's no conceptual problem, really, as that code is completely specific to our content entity datasource and not at all part of the framework itself. But it would still be nice to have as little such code in the module, and as much in the datasource class, as possible.
However, as I think hooks are currently the only way to react to entity CRUD events, there's hardly a (clean) way around it.

I'm not fully aware of your thoughts on the tracking system in regards to the multiple entity types per index but I'd love to hear/read them.

With a decoupled tracker, this would probably even be way easier than before. With datasource also responsible for tracking, an index's status would have been tracked across multiple datasources, so getting the index status would have necessitated looping over all of them. This way, we just introduce an additional column, "item_type", and pass the type to the tracker as well when tracking.

freblasty’s picture

My original idea behind the seperate tracker was to decouple it from the datasource. However as Thomas mentioned it was only half way done due to lack of time and needed more collaboration/modeling.

In any case, I don't see a use case for making that overridable on a per-index base, right? (If you really want, it would still be possible, by either overriding the entity class or by creating a proxy tracker class that uses a different implementation for each index.)

Personally I would allow a per index override. That way the Search API becomes a robust and flexible module and would in theory cover all use cases for tracking. Each tracker implementation can be provided as plugin and each index defaulting to the search api tracker.

drunken monkey’s picture

OK, we settled on having it as a plugin, configurable (in advanced options) per index.

freblasty’s picture

FileSize
84.69 KB

Attached patch contains the following:

  • Made both test services work
  • Tracker system is implemented as plugin
  • Tracker can be configured per index
  • Default tracker uses FIFO (no optimalizations)
  • Datasources report to the index for CUD operations instead of directly to the tracker
  • Tracker has knowledge about the items being tracked including the datasource which triggered it. However because of #2230931: Supporting multiple item types per index the datasource is ignored and serves as a skeleton to ease development of that issue (see @todo in patch)
  • Index batch process through the UI has been fixed
  • Added default_tracker setting to search api which is set on every new index entity. That way users only have to change the tracker if needed
  • Tracker plugins also support configuration (future proof)

What needs to happen with the "index immediately" option on an index? I think it should be removed as a more advanced tracker plugin which provides priority to items could replace this option. Even the different modes could be provided as a configuration setting of the more advanced tracker plugin ("disabled", "only new items", "new and updated items").

amateescu’s picture

  1. +++ b/config/search_api.settings.yml
    @@ -1,2 +1,3 @@
    +default_tracker: 'search_api_default_tracker'
    

    Plugin ids don't usually have a prefix that contains the module name and a suffix with the plugin type, so we should rename this one to just 'default'.

  2. +++ b/lib/Drupal/search_api/Entity/Index.php
    @@ -310,6 +337,41 @@ class Index extends ConfigEntityBase implements IndexInterface {
    +    // Get the tracker plugin manager.
    +    $tracker_plugin_definition = \Drupal::service('search_api.tracker.plugin.manager')->getDefinition($this->getTrackerId());
    +    // Determine whether the tracker is valid.
    +    return !empty($tracker_plugin_definition);
    

    I see this pattern a lot in the patch, where a local variable is created even when it's only used once after that. I think the code would be a lot more readable if we inline the call (and drop the unnecessary code comments) :)

  3. +++ b/lib/Drupal/search_api/Entity/Index.php
    @@ -310,6 +337,41 @@ class Index extends ConfigEntityBase implements IndexInterface {
    +      // Get the ID of the tracker plugin.
    +      $tracker_plugin_id = $this->getTrackerId();
    +      // Get the tracker plugin manager.
    +      $tracker_plugin_manager = \Drupal::service('search_api.tracker.plugin.manager');
    +      // Get the plugin configuration for the tracker.
    +      $tracker_plugin_configuration = array('index' => $this) + $this->trackerPluginConfig;
    +      // Create a tracker plugin instance.
    +      $this->trackerPluginInstance = $tracker_plugin_manager->createInstance($tracker_plugin_id, $tracker_plugin_configuration);
    

    Same here, $tracker_plugin_id and $tracker_plugin_manager are only used once, yet they have their own comment and separate line of code.

  4. +++ b/lib/Drupal/search_api/Entity/Index.php
    @@ -701,27 +763,86 @@ class Index extends ConfigEntityBase implements IndexInterface {
    +        // Get the datasource for the current item type. @todo: Should be
    +        // reworked once multiple datasources are supported.
    

    Should we add a reference here to https://drupal.org/node/2230931 ?

  5. +++ b/lib/Drupal/search_api/Entity/Index.php
    @@ -701,27 +763,86 @@ class Index extends ConfigEntityBase implements IndexInterface {
    +      // Get the datasource.
    +      $datasource = $this->getDatasource();
    +      // Mark all items for processing for the current datasource.
    +      return $this->getTracker()->trackUpdated($datasource);
    

    Same here, unneeded comment and local variable initialisation for $datasource :)

  6. +++ b/lib/Drupal/search_api/Entity/Index.php
    @@ -701,27 +763,86 @@ class Index extends ConfigEntityBase implements IndexInterface {
    +    // Check whether the index has a valid server and can be reindex.
    +    if ($this->reindex() && $this->hasValidServer()) {
    

    It's not clear from the method name reindex() that it returns a boolean saying if the items can be reindexed. It would be cleaner if we split the new checks added to reindex() into a new canBeReindexed() method.

  7. +++ b/lib/Drupal/search_api/Form/IndexForm.php
    @@ -91,6 +115,24 @@ class IndexForm extends EntityFormController {
    +  protected function getTrackerPluginDefinitionOptions() {
    

    This method should be moved to the plugin manager, with the simpler name of getOptions(). At least that's what core usually does :)

  8. +++ b/lib/Drupal/search_api/Form/IndexForm.php
    @@ -372,6 +466,43 @@ class IndexForm extends EntityFormController {
    +    // Do not notify the user about a missing datasource plugin if a new index
    ...
    +      // Notify the user about the missing datasource plugin.
    

    .. missing tracker plugin ...

  9. +++ b/lib/Drupal/search_api/Form/IndexForm.php
    @@ -458,8 +639,22 @@ class IndexForm extends EntityFormController {
    +      // Get the tracker from the entity.
    +      $tracker = $entity->getTracker();
    ...
    +      $tracker->submitConfigurationForm($form['trackerPluginConfig'], $tracker_form_state);
    

    $tracker is only used once..

  10. +++ b/lib/Drupal/search_api/Form/IndexStatusForm.php
    @@ -29,10 +29,10 @@ class IndexStatusForm extends FormBase {
    +      // Get the tracker used by the search index.
    +      $tracker = $index->getTracker();
    
    @@ -43,7 +43,7 @@ class IndexStatusForm extends FormBase {
    +      $has_remaining_items = ($tracker->getRemainingItemsCount() > 0);
    

    Same here.

  11. +++ b/lib/Drupal/search_api/Index/IndexInterface.php
    @@ -107,6 +108,30 @@ interface IndexInterface extends ConfigEntityInterface {
    +   * ¶
    
    @@ -270,6 +295,45 @@ interface IndexInterface extends ConfigEntityInterface {
    +   * ¶
    ...
    +   * ¶
    
    +++ b/lib/Drupal/search_api/Plugin/SearchApi/Tracker/DefaultTracker.php
    @@ -0,0 +1,421 @@
    +   * ¶
    
    +++ b/lib/Drupal/search_api/Tracker/TrackerInterface.php
    @@ -0,0 +1,139 @@
    +   * ¶
    ...
    +   * ¶
    ...
    +   * ¶
    ...
    +   * ¶
    

    Extra spaces.

  12. +++ b/lib/Drupal/search_api/Index/IndexInterface.php
    @@ -107,6 +108,30 @@ interface IndexInterface extends ConfigEntityInterface {
    +   * @return boolean
    
    @@ -270,6 +295,45 @@ interface IndexInterface extends ConfigEntityInterface {
    +   * @return boolean
    ...
    +   * @return boolean
    ...
    +   * @return boolean
    

    The doc standard is 'bool' :)

  13. +++ b/lib/Drupal/search_api/Plugin/SearchApi/Datasource/ContentEntityDatasource.php
    @@ -108,17 +96,22 @@ class ContentEntityDatasource extends DatasourcePluginBase implements ContainerF
    +    return ($items ? reset($items) : NULL);
    

    Shouldn't this be return ($items) ? reset($items) : NULL; ?

    Also, the code comments here are really not necessary, this load() pattern that calls loadMultiple() is used everywhere.

  14. +++ b/search_api.services.yml
    @@ -10,3 +10,7 @@ services:
    +  search_api.tracker.plugin.manager:
    

    The standard (core) naming convention for plugin manager's service is plugin.manager.[module_name].[plugin_type]. In our case, it would be plugin.manager.search_api.tracker, but maybe we can leave this as a followup issue because all our services have to be renamed.

In a manual test of this patch, I get these notices when saving the 'default_node_index' index form:

    Notice: Undefined index: trackerPluginConfig in Drupal\search_api\Form\IndexForm->validate() (line 606 of modules/search_api/lib/Drupal/search_api/Form/IndexForm.php).
    Notice: Undefined index: trackerPluginConfig in Drupal\search_api\Form\IndexForm->submit() (line 648 of modules/search_api/lib/Drupal/search_api/Form/IndexForm.php).

which means either that we need to update the default config file with the new configuration options and default values or that there's a bug in the form related code from the patch.

Overall, this looks quite good to me. The number of methods from DatasourceInterface was really scary, it was doing waaay too many things, so splitting out the tracking functionality is a great thing to do :)

Edit: fixed some typos.

Berdir’s picture

1. Hm. default is a special case but I've discussed with @alexpott before that our not-namespacing of plugins could result in problems. And we do more or less require for various plugin types, like entity_types, menu local tasks/actions and so on. The suffix is certainly unnecessary and in the case of the default and this being in the module where the plugin type is defined, I guess leaving it out is OK too. That said, when implementing plugins for other modules, I would recommend to prefix it with the module.

amateescu’s picture

drunken monkey’s picture

So, are you working on this now, Andrei? If you don't have other important tasks, I think this one has priority right now, and since Frédéric doesn't have time during the day, you can probably feel free to try and fix these issues.

As far as the architectural design goes, the patch looks great to me, more or less exactly what I had imagined. Great job, Frédéric, thanks!

freblasty’s picture

1. Hm. default is a special case but I've discussed with @alexpott before that our not-namespacing of plugins could result in problems. And we do more or less require for various plugin types, like entity_types, menu local tasks/actions and so on. The suffix is certainly unnecessary and in the case of the default and this being in the module where the plugin type is defined, I guess leaving it out is OK too. That said, when implementing plugins for other modules, I would recommend to prefix it with the module.

Should the namespace of the default tracker plugin stay as "search_api_default_tracker" or be changed to "default_tracker"?

Nick_vh’s picture

  1. +++ b/lib/Drupal/search_api/Batch/IndexBatchHelper.php
    @@ -105,17 +105,21 @@ class IndexBatchHelper {
    +      // Calculate the remaining amount of items that can be indexed. Note
    +      // a minimum is taking between the allowed number of items and the
    

    Please write "Note:" or "Note that".

    The minimum is for a status during the batch process I assume?

    Is this only to make sure that if the limit is higher than the remaining it does not exceed the amount it still needs to index? I assume it is. I was a little confused reading this so probably this could be improved.

  2. +++ b/lib/Drupal/search_api/Entity/Index.php
    @@ -701,27 +763,86 @@ class Index extends ConfigEntityBase implements IndexInterface {
    +        // Get the datasource for the current item type. @todo: Should be
    

    If you put @todo's in a separate line it is easier to pick them up in an IDE I think

  3. +++ b/lib/Drupal/search_api/Plugin/SearchApi/Tracker/DefaultTracker.php
    @@ -0,0 +1,421 @@
    +   */
    +  protected function createRemainingItemsStatement() {
    +    // Build the select statement. @todo: Should filter on the datasource once
    +    // multiple datasources are supported.
    +    $statement = $this->createSelectStatement();
    +    // Only the item ID is needed.
    +    $statement->fields('sai', array('item_id'));
    +    // Exclude items marked as indexed.
    +    $statement->condition('sai.changed', 0, '>');
    +    // Sort items by changed timestamp.
    +    $statement->orderBy('sai.changed', 'ASC');
    

    How do you get the remaining now? We are no longer using the 0 for unindexed items so we still need that tracking pointer to go from there.

Aside of that it is looking good. Are the tests that aspilicious wrote working still?

freblasty’s picture

FileSize
90.93 KB

Attached patch includes remarks from #33 and #38.

Overview of test results:

  • [OK] Drupal\search_api\Tests\SearchApiIndexStorageUnitTest 5 passes
  • [NOK] Drupal\search_api\Tests\SearchApiLocalActionsWebTest 18 passes 6 fails 4 exceptions 5 messages

    Fatal error: Call to a member function getDatasource() on a non-object in /var/www/mydev8/modules/sandbox/search_api/lib/Drupal/search_api/Tests/SearchApiIntegrationTest.php on line 328
    FATAL Drupal\search_api\Tests\SearchApiIntegrationTest: test runner returned a non-zero error code (255).

  • [OK] Drupal\search_api\Tests\SearchApiNodeStatusProcessorTestCase 12 passes
  • [NOK] Drupal\search_api\Tests\SearchApiOverviewPageTest 70 passes 33 fails 6 exceptions 18 messages
  • [OK] Drupal\search_api\Tests\SearchApiServerStorageUnitTest 5 passes
  • [OK] Drupal\search_api\Tests\SearchApiTransliterationProcessorTest 8 passes
  • [OK] Drupal\search_api\Tests\Menu\SearchApiLocalActionsTest 1 passes
  • [OK] Drupal\search_api\Tests\Menu\SearchApiLocalTasksTest 0 passes 6 fails
  • [OK] Drupal\search_api\Tests\Plugin\Processor\SearchApiHighlightTest 2 passes
  • [OK] Drupal\search_api\Tests\Plugin\Processor\HtmlFilterTest 20 passes
  • [OK] Drupal\search_api\Tests\Plugin\Processor\SearchApiIgnorecaseTest 1 passes
  • [OK] Drupal\search_api\Tests\Plugin\Processor\SearchApiStopwordsTest 2 passes

Edit: @Nick_vh the current default tracker implementation uses changed value 0 as mark for indexed items. Currently the tracker uses no optimalization.
Edit: Updated test results.

Nick_vh’s picture

Well, we don't want to keep that 0 thing. Let's make it conform with everything. Since we have a FIFO now and you are tracking the new items with their real changed value we do need the pointer don't we? Or we have to start writing 0's again for each item we index. I don't like that approach :)

amateescu’s picture

Assigned: freblasty » amateescu
FileSize
27.14 KB

Here's an interdiff between #32 and #39 for who wants to see exactly what changed.

Also, I'm starting to look at those test fails.

freblasty’s picture

@Nick_vh: See #2242321: Optimize the tracker plugin (or add a new one) for a followup on the default tracker so that this patch can get in and work for other depending issues can start.

amateescu’s picture

Assigned: amateescu » Unassigned
Priority: Normal » Critical
Status: Needs work » Reviewed & tested by the community
FileSize
92.07 KB
5.29 KB

The thing is.. I don't get those failures locally and I bet travis will pass as well. It must be something related to your local environment (and Thomas' too).

I could only find a few other minor things to fix, and given the green light we got for the architecture, I think you should feel free to commit it :)

amateescu’s picture

Status: Reviewed & tested by the community » Fixed
Issue tags: -sprint

@freblasty committed this in http://drupalcode.org/sandbox/daeron/2091893.git/commit/1764eaf6aaa9d787... , we can move forward now :)

amateescu’s picture

SearchApiIntegrationTest was marked as broken in #39 but I skipped it when reporting test passes in #43 :( Committed a small followup to fix it: http://drupalcode.org/sandbox/daeron/2091893.git/commitdiff/b3dc093a8168...

drunken monkey’s picture

Great work, and thanks for committing! You are right, discussion about details really shouldn't stop this, since it's blocking other issues.
So, continuing the discussion in #2242321: Optimize the tracker plugin (or add a new one).

  • Commit 1764eaf on master, 2230931--multiple_datasources, search_api_db, views, local-tasks by freblasty:
    #2232253: Make the indexing batch process work.
    

  • Commit 1764eaf on master, 2230931--multiple_datasources, search_api_db, views, local-tasks, 2253237-search-result-class by freblasty:
    #2232253: Make the indexing batch process work.
    

  • Commit 1764eaf on master, 2230931--multiple_datasources, search_api_db, views, local-tasks, 2253237-search-result-class, 2241429--language_support by freblasty:
    #2232253: Make the indexing batch process work.
    

  • Commit 1764eaf on master, views, 2253237-search-result-class, 2241429--language_support, htmlfilter by freblasty:
    #2232253: Make the indexing batch process work.
    
drunken monkey’s picture

Component: Backend » Framework
Status: Fixed » Closed (fixed)

  • Commit 1764eaf on master, views, 2241429--language_support, htmlfilter, 2235381-fix-config-schemas by freblasty:
    #2232253: Make the indexing batch process work.
    

  • Commit 1764eaf on master, views, 2241429--language_support, htmlfilter, 2235381-fix-config-schemas, move-integration-tests-subfolder by freblasty:
    #2232253: Make the indexing batch process work.
    

  • Commit 1764eaf on master, views, 2241429--language_support, htmlfilter, 2235381-fix-config-schemas, move-integration-tests-subfolder, 2230881-test-all-processors by freblasty:
    #2232253: Make the indexing batch process work.
    

  • Commit 1764eaf on master, views, 2241429--language_support, htmlfilter, 2235381-fix-config-schemas, move-integration-tests-subfolder, issue-2273743 by freblasty:
    #2232253: Make the indexing batch process work.
    

  • Commit 1764eaf on master, views, 2241429--language_support, htmlfilter, 2235381-fix-config-schemas, move-integration-tests-subfolder, issue-2273743, 2268885-fix-validation by freblasty:
    #2232253: Make the indexing batch process work.
    

  • Commit 1764eaf on master, views, 2241429--language_support, htmlfilter, 2235381-fix-config-schemas, move-integration-tests-subfolder, issue-2273743, 2268885-fix-validation, synonyms-processor, 2230925-server-tasks by freblasty:
    #2232253: Make the indexing batch process work.
    

  • Commit 1764eaf on master, views, 2241429--language_support, htmlfilter, 2235381-fix-config-schemas, move-integration-tests-subfolder, issue-2273743, 2268885-fix-validation, synonyms-processor, 2230925-server-tasks, 2235381-fix-config-schemas-2 by freblasty:
    #2232253: Make the indexing batch process work.
    

  • Commit 1764eaf on master, views, 2241429--language_support, htmlfilter, move-integration-tests-subfolder, issue-2273743, 2268885-fix-validation, synonyms-processor, 2230925-server-tasks, 2235381-fix-config-schemas-2, 2257113-Index-Processors-Fields by freblasty:
    #2232253: Make the indexing batch process work.
    

  • Commit 1764eaf on master, views, 2241429--language_support, htmlfilter, move-integration-tests-subfolder, issue-2273743, 2268885-fix-validation, synonyms-processor, 2230925-server-tasks, 2235381-fix-config-schemas-2, 2257113-Index-Processors-Fields by freblasty:
    #2232253: Make the indexing batch process work.
    

  • Commit 1764eaf on master, views, 2241429--language_support, htmlfilter, move-integration-tests-subfolder, issue-2273743, 2268885-fix-validation, synonyms-processor, 2230925-server-tasks, 2235381-fix-config-schemas-2, 2257113-Index-Processors-Fields, config-schema by freblasty:
    #2232253: Make the indexing batch process work.
    

  • Commit 1764eaf on master, views, 2241429--language_support, htmlfilter, move-integration-tests-subfolder, issue-2273743, 2268885-fix-validation, synonyms-processor, 2230925-server-tasks, 2235381-fix-config-schemas-2, 2257113-Index-Processors-Fields, config-schema, transliteration by freblasty:
    #2232253: Make the indexing batch process work.
    

  • Commit 1764eaf on master, views, 2241429--language_support, htmlfilter, move-integration-tests-subfolder, issue-2273743, 2268885-fix-validation, synonyms-processor, 2230925-server-tasks, 2235381-fix-config-schemas-2, 2257113-Index-Processors-Fields, config-schema, transliteration, ignore-case-test-fix by freblasty:
    #2232253: Make the indexing batch process work.
    

  • Commit 1764eaf on master, views, 2241429--language_support, htmlfilter, move-integration-tests-subfolder, issue-2273743, 2268885-fix-validation, synonyms-processor, 2230925-server-tasks, 2235381-fix-config-schemas-2, 2257113-Index-Processors-Fields, config-schema, transliteration, ignore-case-test-fix, 2281233-fix-stopwords by freblasty:
    #2232253: Make the indexing batch process work.
    

  • Commit 1764eaf on master, views, 2241429--language_support, htmlfilter, move-integration-tests-subfolder, issue-2273743, 2268885-fix-validation, synonyms-processor, 2230925-server-tasks, 2235381-fix-config-schemas-2, 2257113-Index-Processors-Fields, config-schema, transliteration, ignore-case-test-fix, 2281233-fix-stopwords, ignorecase-nick, renamefields by freblasty:
    #2232253: Make the indexing batch process work.
    

  • Commit 1764eaf on master, views, 2241429--language_support, htmlfilter, move-integration-tests-subfolder, issue-2273743, 2268885-fix-validation, synonyms-processor, 2230925-server-tasks, 2235381-fix-config-schemas-2, 2257113-Index-Processors-Fields, config-schema, transliteration, ignore-case-test-fix, 2281233-fix-stopwords, ignorecase-nick, renamefields by freblasty:
    #2232253: Make the indexing batch process work.
    

  • Commit 1764eaf on master, views, htmlfilter, issue-2273743, 2268885-fix-validation, synonyms-processor, 2235381-fix-config-schemas-2, config-schema, transliteration, ignore-case-test-fix, ignorecase-nick, renamefields, highlight by freblasty:
    #2232253: Make the indexing batch process work.
    

  • Commit 1764eaf on master, views, htmlfilter, issue-2273743, 2268885-fix-validation, synonyms-processor, 2235381-fix-config-schemas-2, config-schema, transliteration, ignore-case-test-fix, ignorecase-nick, renamefields, highlight, add-aggregation-nick by freblasty:
    #2232253: Make the indexing batch process work.
    

  • Commit 1764eaf on master, views, htmlfilter, issue-2273743, 2268885-fix-validation, synonyms-processor, 2235381-fix-config-schemas-2, config-schema, transliteration, ignore-case-test-fix, ignorecase-nick, renamefields, highlight, add-aggregation-nick, 2247923-Test_the_RenderedItem_processor by freblasty:
    #2232253: Make the indexing batch process work.
    

  • Commit 1764eaf on master, views, htmlfilter, issue-2273743, 2268885-fix-validation, synonyms-processor, 2235381-fix-config-schemas-2, config-schema, transliteration, ignore-case-test-fix, ignorecase-nick, renamefields, highlight, add-aggregation-nick, 2247923-Test_the_RenderedItem_processor, 2286813-fields-processor-plugin-base-test by freblasty:
    #2232253: Make the indexing batch process work.