Common pitfalls

Last updated on
5 February 2022

This page lists the pitfalls most commonly encountered by new users, in the hope that fewer people will fall into them in the future.

While Drupal core provides its own "Search" module, the "Search API" suite of modules is completely independent of that. Therefore, if you are using the Search API for searches on your site, you should uninstall all unrelated search modules (unless you're sure you need them), especially Drupal core's "Search" module.

Keeping the "Search" module enabled will harm performance, since indexing for that module will still occur even though you are not using it anymore. (Its inferior search pages might also remain accessible to users on your site.)

When you want to use Views to create a search page with a fulltext search, only use the “Search: Fulltext search” filter (or contextual filter) for the keywords input!

“Search keywords” are special in the Search API, compared to normal filters, in that they are parsed into separate words (unless you are using the “Single term” parse mode) that will all be searched separately. Normal filters, even on fulltext fields (that is, fields indexed as type “Fulltext”), will search for entered phrases as a whole, as if the keywords were put in quotes. Furthermore, only proper keywords will influence the relevance (or “score”) of results, if you are using this mechanism for sorting – filters won't do that.

So, even if you only want fulltext searches on a single field – if you want “normal” fulltext search behavior, use the “Search: Fulltext search” filter!
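
The same distinction applies when building a query in code. The following is a minimal sketch, not a definitive implementation: it assumes an index with the placeholder machine name YOUR_INDEX_ID and a fulltext field with the hypothetical identifier "title", and it uses the "terms" ("Multiple words") parse mode. Replace these with the values from your own setup.

```php
use Drupal\search_api\Entity\Index;

$index = Index::load('YOUR_INDEX_ID');

// Variant 1: proper search keywords. These are parsed into separate words
// by the parse mode and influence the relevance score of the results.
$query = $index->query();
$parse_mode = \Drupal::service('plugin.manager.search_api.parse_mode')
  ->createInstance('terms');
$query->setParseMode($parse_mode);
$query->keys('red bicycle');
// Restrict the keyword search to a single fulltext field ("title" is an
// assumed field identifier).
$query->setFulltextFields(['title']);
$keyword_results = $query->execute();

// Variant 2: a normal filter on the same fulltext field. The value is
// searched as a whole phrase and does not influence the score.
$query = $index->query();
$query->addCondition('title', 'red bicycle');
$filter_results = $query->execute();
```

With the first variant, items containing both "red" and "bicycle" anywhere in the field should match. With the second, only items containing the phrase "red bicycle" should match.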

Creating a search block with Views

Setting up a search block that redirects users to your search view is not simple if you're not a Views expert, especially if you don't want all exposed filters present in this block. However, unless you want some preprocessing done on the form (most notably, adding autocompletion), you can easily circumvent this by creating a custom block and putting the HTML for the form in there (make sure to set the text format to "Full HTML" or something similar):

<form action="/PATH/TO/VIEW" method="get">
<label for="search-keywords">Search</label>
<input id="search-keywords" maxlength="128" name="search_api_fulltext" size="20" type="text" value="" />
<input class="form-submit" name="" type="submit" value="Search" />
</form>

(If you've changed the “Filter identifier” setting of the “Search: Fulltext search” filter, change "search_api_fulltext" accordingly. For instance, for the “Search content” view provided by the “Database Search Defaults” module, this should be "keys" instead.)

Avoid showing all results when no keywords are entered

Just check "Required" in the settings of the "Search: Fulltext search" exposed filter and the view will be blank before keywords are entered. Alternatively, set the "Exposed form style" to "Input required".

Changes in related entities don't lead to re-indexing

Note: When using version 8.x-18 or later of this module, this problem should be fixed, since #2007692: Changes in related entities don't lead to proper re-indexing has been committed. If you still experience problems, please report this as a bug (or search for an existing issue).

It is possible to index the fields of entities (or other structures) related to your indexed items. For example, you could index the names of taxonomy terms contained in a node's taxonomy reference field. Or – for instance, for access control – the user roles of the node's author.

However, when you now change the name of a taxonomy term (or the roles of a user), you'll notice that the nodes that reference that term (or user) aren't marked as "dirty" and, subsequently, re-indexed. As a result, the fields containing related data become stale. Unfortunately this is very hard to solve in the Search API, so a solution to this problem could still take a while.

There are a few custom workarounds available which you can use for your site:

  • Probably the easiest and most convenient option is to use the Rules integration of the Search API to automatically re-index (or mark as "dirty") items when their related entities change. (The rules to create for this of course depend on your specific setup.)
  • If you are (or employ) a developer: Use custom code to do the same. In hook_ENTITY_TYPE_update(), just call $index->trackItemsUpdated() with the appropriate index, datasource and IDs. (See below for an example.)
  • If such changes occur only very rarely, and if the site is rather small and only maintained by you, you can also just manually re-save all affected items if such a change occurs. (In our example, save all affected nodes after updating a user's roles or a term's name – or just reindex all data on the index.)

An example of doing this in custom code (when you have the names of related taxonomy terms indexed for a node index) follows:

/**
 * Implements hook_ENTITY_TYPE_update() for type "taxonomy_term".
 */
function MODULE_taxonomy_term_update($term) {
  // Only react if the term's name actually changed.
  if ($term->label() !== $term->original->label()) {
    // Find all nodes referencing this term, regardless of access.
    $nids = \Drupal::entityQuery('node')
      ->condition('YOUR_TERM_FIELD', $term->id())
      ->accessCheck(FALSE)
      ->execute();
    if ($nids) {
      $nodes = \Drupal\node\Entity\Node::loadMultiple($nids);
      // Search API item IDs for entities have the form "ENTITY_ID:LANGCODE",
      // so add one ID per translation of each node.
      $item_ids = [];
      foreach ($nodes as $nid => $node) {
        foreach ($node->getTranslationLanguages() as $language) {
          $item_ids[] = "$nid:" . $language->getId();
        }
      }
      // Mark the items as "dirty" so they are re-indexed.
      \Drupal\search_api\Entity\Index::load('YOUR_INDEX_ID')->trackItemsUpdated('entity:node', $item_ids);
    }
  }
}

Placing the above into a module file (and replacing all uppercase placeholders) will automatically mark nodes which reference a term as "dirty" when the term's name changes.
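
For the other example mentioned above (indexing the roles of a node's author), a similar hook could be written for the "user" entity type. This is only a sketch under the same assumptions as the term example: MODULE and YOUR_INDEX_ID are placeholders, and it assumes the index's datasource is "entity:node" and that authorship is stored in the node's "uid" field.

```php
/**
 * Implements hook_ENTITY_TYPE_update() for type "user".
 */
function MODULE_user_update($account) {
  // Compare the role lists independently of their order.
  $new_roles = $account->getRoles();
  $old_roles = $account->original->getRoles();
  sort($new_roles);
  sort($old_roles);
  if ($new_roles !== $old_roles) {
    // Find all nodes authored by this user, regardless of access.
    $nids = \Drupal::entityQuery('node')
      ->condition('uid', $account->id())
      ->accessCheck(FALSE)
      ->execute();
    if ($nids) {
      $nodes = \Drupal\node\Entity\Node::loadMultiple($nids);
      // Build one Search API item ID per translation of each node.
      $item_ids = [];
      foreach ($nodes as $nid => $node) {
        foreach ($node->getTranslationLanguages() as $language) {
          $item_ids[] = "$nid:" . $language->getId();
        }
      }
      // Mark the items as "dirty" so they are re-indexed.
      \Drupal\search_api\Entity\Index::load('YOUR_INDEX_ID')
        ->trackItemsUpdated('entity:node', $item_ids);
    }
  }
}
```

Note that loading all nodes of a prolific author at once can be expensive, so on larger sites it may be preferable to process the IDs in batches.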

Having "Index items immediately" disabled can lead to leaks of confidential data

The "Node access" data alteration, which automatically filters out node results that the current user shouldn't be able to access, works with the indexed state of the entity. The same is true for manually set filters (e.g., in Views on the "Published" field) and most other access control mechanisms.
However, if the index's "Index items immediately" setting is disabled, changed items will (usually) not be indexed until the next cron run, which means the data in the index will be outdated until then. Since the data of the results shown to the user usually comes from the database, not from the search index, data which the user shouldn't see might be displayed to them in search results. However, this will be the case only for very specific setups:

  • The item must have been accessible previously and only later become inaccessible.
  • When the item becomes inaccessible, some data must be added that end users shouldn't see. (Otherwise, only data they could see before anyway will be shown to them.)
  • The "secret" data must be in a field that will be displayed in the search results (or could end up in an excerpt shown with the results).

If this setup applies to your site, it is strongly recommended that you enable the "Index items immediately" option for the index in question. (Using Rules to immediately index items only if such a change occurs is also possible, if the load on the server would otherwise be too high. However, keep in mind that Solr's commit behavior might prevent this from working as expected.)
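
The option is normally enabled on the index's edit form, but it can also be set in code, for example in an update hook. The following is a minimal sketch that assumes the setting is stored under the index option key "index_directly" and uses the placeholder index ID YOUR_INDEX_ID. Verify the key against your own exported index configuration before relying on it.

```php
use Drupal\search_api\Entity\Index;

$index = Index::load('YOUR_INDEX_ID');
// Only save the index if it exists and the option is not yet enabled.
if ($index && !$index->getOption('index_directly')) {
  // "index_directly" is assumed to be the key behind the
  // "Index items immediately" checkbox.
  $index->setOption('index_directly', TRUE);
  $index->save();
}
```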

If you are using Solr, enabling the server's "Retrieve result data from Solr" option might also be a way to prevent this from happening, since the search will then show the old data until the new data is indexed, rather than the version with the confidential content added. However, it's tricky to set this up correctly so that really only data from Solr will be used, so please only opt for this variant if you are an advanced user.

Unpublished content showing in Search API results

By default, the Search API doesn't place any access restrictions on search results it returns (though its Views integration adds explicit access checks for all results). If you have search results being displayed to unauthorized users, you could use the Content access or Entity status processors to change this (if applicable), or try to add appropriate Views filters that exclude these results some other way.

Indexing of broken references

Due to Core bug #2723323: [META] Clean up references to deleted entities, references to deleted entities are not cleaned up: for example, references to taxonomy terms aren't removed from nodes when the referenced term is deleted. As a result, the Search API will sometimes index term references pointing to a nonexistent taxonomy term. This means, for example, that facets listing those terms will just display the term ID, not the term label.

To resolve this, either help fix the Core bug or use custom code for fixing the problem.
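
One possible shape for such custom code is to remove the dangling references yourself when a term is deleted. This is a rough sketch, not a tested solution: MODULE and YOUR_TERM_FIELD are placeholders, only the default translation of each node is cleaned up, and references that were already broken before the code was installed are not touched.

```php
/**
 * Implements hook_ENTITY_TYPE_delete() for type "taxonomy_term".
 */
function MODULE_taxonomy_term_delete($term) {
  $tid = $term->id();
  // Find all nodes that still reference the deleted term.
  $nids = \Drupal::entityQuery('node')
    ->condition('YOUR_TERM_FIELD', $tid)
    ->accessCheck(FALSE)
    ->execute();
  foreach (\Drupal\node\Entity\Node::loadMultiple($nids) as $node) {
    // Keep only the field items that do not point to the deleted term.
    $node->get('YOUR_TERM_FIELD')->filter(function ($item) use ($tid) {
      return $item->target_id != $tid;
    });
    // Saving the node should also cause it to be tracked for re-indexing.
    $node->save();
  }
}
```

On sites where many nodes reference a single term, saving them all within one request may be too slow, so a queue or batch process would be a more robust choice.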

Different processor settings for fulltext fields

The Search API currently doesn't support separate processing for fulltext search keywords based on the searched field. Therefore, enabling processors like "Ignore case" or "Stopwords" for only some fulltext fields will usually not work as intended: while only the values of the selected fields are processed during indexing, the search keywords for all fields are processed by the processor when searching.

There is currently no proper solution for this problem, so it is advised that you always enable or disable such processors for all fulltext fields at once.

The “Content access” processor doesn't work for some custom access mechanisms

While Drupal provides a pretty flexible node access system out-of-the-box, it is unfortunately not completely generic, especially when using a completely separate implementation (in this case: in the Search API). Therefore, though the “Content access” processor tries its best to account for all grants and access records in the node access system, this is unfortunately not enough to support all custom node access solutions/modules.
One popular module that we cannot support, for instance, is the view_unpublished module.

We have unfortunately not found any generic solution for this problem yet, if one even exists. If you want correct node access with a custom node access module for which the “Content access” processor doesn't work, you'll need to write your own variant of that processor.
For a more detailed discussion, see #2948707: ContentAccess processor fails to account for all grant records and the issues linked from there.

However, when the processor doesn't completely work, it should currently always err on the side of caution. We haven't received any complaints so far about users seeing content that they shouldn't have access to.
