Frequently asked questions

Last updated on 29 January 2018

This page tries to answer questions frequently asked by new users, either in the issue queue or elsewhere.

Why don't processors reduce the number of items to index?

For example, suppose you have an index configured with the "Role filter" processor so that only users with the role "Editor" are indexed. On the index's status page, the "Progress" note still shows the total number of users (of all roles) as the total number of items to index, and possibly even as the number of items already indexed. (The same applies to all other processors that filter the items to be indexed, like "Entity status".) You might therefore wonder whether the processor is working properly or not filtering the indexed items at all.

However, this is not a bug: the processor is (most likely) working correctly! It is a regrettable but hard-to-remove restriction in the design of the Search API that processors don't change the listed total number of items, and that items filtered out from indexing still count toward the number of items indexed. If you want to know the number of items that are really indexed, look at the "Server index status" row further down on the page.
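The effect can be illustrated with a small toy model. This is only a sketch in Python, not the Search API's actual code: the function index_batch, the keep callback and the sample user data are all made up for illustration.

```python
# Toy model only: every name here is hypothetical, not part of the Search API.
def index_batch(items, keep):
    """Simulate one full indexing run.

    Returns (tracked_total, marked_indexed, sent_to_server).
    """
    tracked_total = len(items)
    # A filtering processor drops items before they are sent to the server.
    sent_to_server = [item for item in items if keep(item)]
    # Every processed item is still marked as "indexed", even if it was dropped.
    marked_indexed = len(items)
    return tracked_total, marked_indexed, len(sent_to_server)


users = [
    {"name": "alice", "roles": ["Editor"]},
    {"name": "bob", "roles": ["Subscriber"]},
    {"name": "carol", "roles": ["Editor", "Administrator"]},
]

total, indexed, on_server = index_batch(users, lambda u: "Editor" in u["roles"])
print(total, indexed, on_server)  # 3 3 2
```

In this model, the first two numbers correspond to what the "Progress" note reports, while the third corresponds to the "Server index status".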

Why can't I create a search index for entity type Y?

Since config entities use a radically different system for their schema/metadata, the Search API currently only supports indexing of content entities by default.

Why don't Search API searches find partial matches/substrings?

E.g., when searching for "break", why aren't items containing "breakpoint" (or "unbreakable") found?

This is not actually a problem of the Search API module itself. The Search API only supplies the framework and passes the search string on to the backend (database, Solr, Elasticsearch, etc.) for the actual searching. The backend plugin therefore determines whether partial matches will be returned or not. For this reason, you should refer to the backend's documentation or ask in its issue queue (if the backend isn't covered below).
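To make the difference concrete, here is a minimal sketch in Python of the two matching strategies a backend might choose between. It does not reflect how any particular backend is implemented; the function names and the simple tokenizer are assumptions made for illustration.

```python
import re


def tokenize(text):
    """Split text into lowercase word tokens (a deliberately simple tokenizer)."""
    return re.findall(r"\w+", text.lower())


def matches_whole_word(query, text):
    """True only if the query equals one of the words in the text."""
    return query.lower() in tokenize(text)


def matches_partial(query, text):
    """True if the query occurs anywhere inside one of the words in the text."""
    needle = query.lower()
    return any(needle in token for token in tokenize(text))


print(matches_whole_word("break", "Set a breakpoint here"))  # False
print(matches_partial("break", "Set a breakpoint here"))     # True
print(matches_partial("break", "An unbreakable seal"))       # True
```

Whole-word matching can usually be answered with an exact lookup of each word, whereas substring matching has to look inside every word, which is one reason partial matching tends to be slower.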

For the "Database search" module/backend included in the Search API project, this functionality is already available: enable the "Search on parts of a word" option in the backend settings at /admin/config/search/search-api/server/local_database_server/edit. The option is off by default and can make searches much slower on large sites.

For information regarding Solr search, see here.

What is the "Server index status" for a search index?

One piece of information on an index's "View" page that might need some explanation is the "Server index status". This is the result count which the server returns for a completely unfiltered query for this index, so it represents the total number of items indexed on the server (for this index).

So, how can you use that to diagnose problems? This depends a lot on your current setup. Warning messages that initially tried to detect problems automatically using this metric were soon removed again, since the detection mechanisms didn't apply to all setups.

In general, the item count on the server should be between the number of indexed items and the total number of items displayed for the index. However, when using processors like the "Role filter" or "Entity status", not all items marked as "indexed" internally are really sent to the server. In these cases, you have to determine yourself how many items should be indexed in total, and check whether the "Server index status" reaches that number (once the index displays "All items have been indexed.").
A second complication is Solr's commit behavior, which means that (in some setups) the "Server index status" won't reflect the real number of items indexed on the server until about two minutes after indexing.
Finally, if you are indexing data (most likely on a Solr server) with an external program, bypassing Drupal and the Search API for the indexing part, the Search API will of course not know about your externally indexed items and will show a total item count that is too low (0, when indexing happens exclusively externally). You would then have to check your external source to see whether the "Server index status" you are seeing is correct and as expected.
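
The basic rule of thumb above can be written down as a small check. This is only a sketch in Python under the assumptions stated in the comments: the function name and parameters are hypothetical, and it deliberately ignores delayed Solr commits and external indexing, both of which can make a healthy setup fail the check.

```python
def server_count_plausible(indexed, total, server_count, filtering_processors=False):
    """Rough sanity check for the "Server index status" (hypothetical helper).

    indexed:      number of items the index reports as indexed
    total:        total number of items the index reports
    server_count: the "Server index status" value
    """
    if filtering_processors:
        # Filtered items are marked as indexed but never sent to the server,
        # so only the upper bound can be checked automatically.
        return 0 <= server_count <= total
    return indexed <= server_count <= total


print(server_count_plausible(indexed=80, total=100, server_count=85))  # True
print(server_count_plausible(indexed=80, total=100, server_count=40))  # False
print(server_count_plausible(80, 100, 40, filtering_processors=True))  # True
```

A result of False is a hint to investigate further, not proof of a problem.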

What is an index's “tracking information”? How does tracking work?

When an index is created (or enabled), it first has to determine the total set of items that it should index. For example: all comments, all user profiles and all content except for the "Basic page" content type. This usually happens right away in a batch started automatically when the index form is submitted. In some cases, where this isn't possible, this operation will be executed over time during cron runs, or can also (until it is finished) be started manually with a batch via the “Track items for index” form on the index's “View” tab.

How this information is actually stored (and, later, processed) depends on the index's so-called “Tracker” plugin. The default tracker which the Search API provides stores all this information in the search_api_item table in the database – but other modules could provide other tracker plugins, using different mechanisms. The information stored will (probably independently of the tracker plugin in question) contain at least the item IDs and, for each item, whether it was already indexed in its latest state or not. When first inserting this information for a new (or newly enabled) index, all the items will, of course, be listed as “not yet indexed”.

Then, as long as the index remains enabled (or isn't deleted completely), several things can happen that will change the tracking information:

  • Items should be indexed: The tracker uses its stored information to retrieve some items that still need to be indexed.
  • Items are indexed: The tracker will change the status of all indexed items to “has been indexed”.
  • Items are created, edited or deleted: The tracker will insert the information for the item into its data store (by default, a new row will be added to search_api_item); or will change the item's entry to “needs to be indexed”; or will remove the item's entry.
  • An admin clicks “Queue all items for reindexing” or “Clear all indexed data”: The tracker will set all its entries back to “needs to be indexed”.
  • An admin clicks “Rebuild tracking information”: The tracker will throw away the entries for all items and determine again which items exist for the index, inserting all of them into its data store. (Again, this can happen immediately via a batch, or during cron runs, or by manually clicking “Track now”.)
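
The state changes listed above can be summarized in a toy in-memory tracker. This is only a sketch in Python: the class ToyTracker and all of its method names are hypothetical and do not correspond to the Search API's Tracker plugin interface or to the structure of the search_api_item table.

```python
class ToyTracker:
    """In-memory stand-in for a tracker (hypothetical, for illustration only)."""

    def __init__(self, item_ids):
        # True means "needs to be indexed"; a new index starts with everything pending.
        self.needs_indexing = {item_id: True for item_id in item_ids}

    def remaining(self, limit):
        """Items should be indexed: return up to `limit` pending item IDs."""
        pending = [i for i, needs in self.needs_indexing.items() if needs]
        return pending[:limit]

    def mark_indexed(self, item_ids):
        """Items are indexed: flag them as "has been indexed"."""
        for item_id in item_ids:
            if item_id in self.needs_indexing:
                self.needs_indexing[item_id] = False

    def item_saved(self, item_id):
        """Item created or edited: add the entry or reset it to pending."""
        self.needs_indexing[item_id] = True

    def item_deleted(self, item_id):
        """Item deleted: remove its entry."""
        self.needs_indexing.pop(item_id, None)

    def queue_all(self):
        """Reindex or clear: set every entry back to "needs to be indexed"."""
        for item_id in self.needs_indexing:
            self.needs_indexing[item_id] = True

    def rebuild(self, item_ids):
        """Rebuild tracking information: discard all entries and start over."""
        self.needs_indexing = {item_id: True for item_id in item_ids}


tracker = ToyTracker(["node/1", "node/2", "node/3"])
batch = tracker.remaining(2)      # ['node/1', 'node/2']
tracker.mark_indexed(batch)
tracker.item_saved("node/1")      # edited after indexing, so pending again
tracker.item_deleted("node/2")
print(tracker.remaining(10))      # ['node/1', 'node/3']
```

A real tracker would persist this state (by default in the database) and would typically process rebuilds in batches rather than in one step.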