Data alterations and processors

Last updated on
24 March 2017

Drupal 7 will no longer be supported after January 5, 2025. Learn more and find resources for Drupal 7 sites

This page lists and briefly explains all data alterations and processors currently available. Unless otherwise noted, they are part of the core Search API module.

Note that not every data alteration or processor is available for every index. Availability usually depends on the index's item type. For instance, the Bundle filter data alteration isn't available for indexes on item types that define no bundles or only a single one.

Data alterations

Bundle filter
Lets you prevent entities from being indexed based on their bundle (content type for nodes, vocabulary for taxonomy terms, etc.). This way you can, for instance, create an index solely for news.
Language control
Allows you to control the language of items stored in the index. It provides two features:
  • Normally, the content of the Item language property (which is automatically added by the Search API for all indexed items) is determined by the item's language property, if available, and otherwise set to undefined. With this data alteration, you can select any other property as an alternative source for the item language, which will then be used instead. Note that the selected field has to contain a single valid ISO language code for each item for this to work, though.
  • You can then also select the languages items in this index may have. Items with any other language (defined by the Item language property) will be rejected during indexing.
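The two features can be pictured with a minimal Python sketch. This is only a conceptual illustration, not the module's PHP code: the function names and the dictionary-based items are assumptions made for the example, and "und" is Drupal 7's code for an undefined language.

```python
from typing import Iterable, List, Optional, Set

UNDEFINED = "und"  # Drupal 7's language code for "undefined"


def resolve_item_language(item: dict, source_property: Optional[str] = None) -> str:
    """Return the language code used for an item.

    By default the item's own "language" property is read. If an alternative
    source property is configured, that property is read instead. A missing
    or empty value falls back to the undefined language.
    """
    key = source_property or "language"
    return item.get(key) or UNDEFINED


def filter_by_language(items: Iterable[dict], allowed: Set[str],
                       source_property: Optional[str] = None) -> List[dict]:
    """Keep only items whose resolved language is one of the allowed codes."""
    return [
        item for item in items
        if resolve_item_language(item, source_property) in allowed
    ]
```

With such a setup, an index restricted to `{"en"}` would reject every item whose resolved language is anything else, including items with no language at all.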
Node access
Adds node access checks to searches on this index. This is done by adding a new field, Node access information, that stores the relevant access data. When the Node access information, Author, and Status fields are present and indexed, appropriate filters are automatically added to all searches so that they only return results the current user is allowed to view. Some searches (e.g., search views) provide the option to override this behaviour on a per-search basis, though. Check the corresponding module's documentation for details.

In any case, keep in mind that these access checks are based solely on the indexed data. If a node is edited in a way that changes its accessibility (e.g., by being unpublished), this change only takes effect once the node is indexed in its latest state. There can therefore be a gap between changing the node and the access checks being updated in search results. During that gap, depending on the data displayed for search results, users could see data that should not be accessible to them. If you need to avoid that, use the index's Index items immediately option.

Also note that access to individual fields is never checked, so don't include fields in the display if they contain sensitive data. Refer to hook_node_access_records() and hook_node_grants() for details on implementing node access checks. The node access data stored in the index is based on the node_access table, which is affected by hook_node_access_records().
The data alteration is only available for node indexes.

Search views do not filter based on node access by default. There is a simple option in the query settings called "Additional access checks on result entities" that performs an access check after the actual query is run, but this option should only be used as a last resort. Search result counts and facets will not reflect the further restriction applied by views.

The proper way to do node access checks in views is to add a filter on Indexed Node: Node access information. This can be complicated, because you need to know which values the field holds, and that information cannot be output through fields in the view itself. You have to look at the data stored on the search server instead. With Solr, you can do this by examining the sm_search_api_access_node field in the schema browser. A sample value for one configuration of taxonomy access control was node_access_taxonomy_access_role:2. You could then, for example, build one search views display for authenticated users that includes all results, and another display for anonymous users that checks that the Node access information value is not equal to node_access_taxonomy_access_role:2. A brief example can be found in this Drupal Answers answer.
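To make the idea of filtering on indexed access data more concrete, here is a small Python sketch. It is a conceptual illustration only, not the module's PHP implementation. The token format `node_access_<realm>:<gid>` is inferred from the sample value above, and the function names and result dictionaries are hypothetical. The sketch also ignores the Author field and any special cases the real filters may handle.

```python
from typing import Dict, Iterable, List, Set


def access_tokens(grants: Dict[str, Iterable[int]]) -> Set[str]:
    """Build tokens such as 'node_access_taxonomy_access_role:2'.

    `grants` maps a realm name to the grant IDs the current user holds
    (assumed shape, loosely modelled on what hook_node_grants() returns).
    """
    return {
        f"node_access_{realm}:{gid}"
        for realm, gids in grants.items()
        for gid in gids
    }


def visible_results(results: Iterable[dict],
                    grants: Dict[str, Iterable[int]]) -> List[dict]:
    """Keep published results sharing at least one access token with the user.

    The decision uses only the indexed "status" and "access" values, which is
    why a stale index can briefly expose results that should be hidden.
    """
    tokens = access_tokens(grants)
    return [
        result for result in results
        if result["status"] and tokens & set(result["access"])
    ]
```

Because the check never touches the live node, reindexing promptly after access changes is what keeps such a filter accurate.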

URL field
Adds a field containing the URL at which the entity can be displayed. For some item types, like nodes, this URL is already available, but this data alteration can be used to add it for other types as well.
Aggregated fields
Offers the ability to add additional fields to the entity, containing the data from one or more other fields. Use this, e.g., to have a single field containing all data that should be searchable, or to make the text from a string field, like a taxonomy term, also fulltext-searchable.
The type of aggregation can be selected from a set of values: you can, e.g., collect the text data of all contained fields, or add them up, count their values, etc.
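The aggregation types mentioned above can be sketched in a few lines of Python. This is a conceptual illustration under stated assumptions, not the module's code: the function name, the `kind` labels, and the blank-line separator used when joining text are all choices made for the example.

```python
from typing import Any, List, Sequence


def aggregate(item: dict, fields: Sequence[str], kind: str = "fulltext") -> Any:
    """Combine the values of several fields into one aggregated value.

    Multi-valued fields (lists) contribute each of their values; missing
    fields are skipped.
    """
    values: List[Any] = []
    for name in fields:
        value = item.get(name)
        if value is None:
            continue
        values.extend(value if isinstance(value, list) else [value])

    if kind == "fulltext":
        # Collect the text of all contained fields into one searchable string.
        return "\n\n".join(str(v) for v in values)
    if kind == "sum":
        # Add up numeric values.
        return sum(values)
    if kind == "count":
        # Count how many values the fields hold in total.
        return len(values)
    raise ValueError(f"Unknown aggregation type: {kind}")
```

For example, aggregating a title and a list of tag names as fulltext yields a single text value that a fulltext search can match against.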
Complete entity view
Adds a field containing the whole HTML content of the entity as it is viewed on the site. The view mode used can be selected. This allows you to index exactly "what the user sees", which is often what is expected, but might differ from just indexing the contents of other fields.
Note that this might not work for items of all types. All core entity types except files are supported, though.
Index hierarchy
Allows you to index hierarchical fields along with all their parents. Most importantly, this can be used to index taxonomy term references along with all parent terms. This way, when an item, e.g., has the term New York, it will also be matched when filtering for USA or North America.
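The New York example can be sketched as follows. This is a conceptual Python illustration, not the module's code: the `PARENTS` mapping is hypothetical data, and it assumes each term has at most one parent, which is a simplification.

```python
from typing import Dict, List

# Hypothetical hierarchy: child term -> parent term.
PARENTS: Dict[str, str] = {
    "New York": "USA",
    "USA": "North America",
}


def with_ancestors(term: str, parents: Dict[str, str]) -> List[str]:
    """Return the term followed by all of its ancestors, nearest first."""
    result = [term]
    seen = {term}
    while term in parents:
        term = parents[term]
        if term in seen:
            # Guard against cycles in malformed hierarchies.
            break
        seen.add(term)
        result.append(term)
    return result
```

Indexing `with_ancestors("New York", PARENTS)` instead of just the term itself is what lets a filter on USA or North America match the item.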

Processors

Ignore case
Makes searches on selected fields case-insensitive. Some servers do this automatically. For all others, this should probably always be activated, at least for fulltext fields.
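A minimal Python sketch of the idea: the same normalization is applied to the indexed text and to the search keys, so that they compare equal regardless of case. The function names and the whitespace-based splitting are assumptions for illustration, not the module's implementation.

```python
from typing import Set


def ignore_case(value: str) -> str:
    """Normalize a value so that comparisons become case-insensitive."""
    return value.lower()


def index_tokens(text: str) -> Set[str]:
    """Normalize text at indexing time and return its set of tokens."""
    return {ignore_case(token) for token in text.split()}


def search(index: Set[str], query: str) -> bool:
    """Normalize the query the same way and require every key to be indexed."""
    return all(ignore_case(key) in index for key in query.split())
```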
HTML filter
Strips HTML tags from selected fields and decodes HTML entities. If you are indexing HTML content (like node bodies) and the search server doesn't handle HTML on its own, this should be activated to avoid indexing HTML tags. It can also give text inside certain tags, e.g. terms appearing in a heading, a higher boost.
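The three effects described above (dropping tags, decoding entities, boosting text inside certain tags) can be sketched with Python's standard `html.parser` module. This is a conceptual illustration only: the boost values in `TAG_BOOSTS`, the class name, and the (word, boost) output shape are hypothetical and do not reflect the module's actual configuration or data structures.

```python
from html.parser import HTMLParser
from typing import Dict, List, Tuple

# Hypothetical boosts per tag; text outside these tags gets a boost of 1.0.
TAG_BOOSTS: Dict[str, float] = {"h1": 5.0, "h2": 3.0, "strong": 2.0}


class _TextExtractor(HTMLParser):
    """Collects words with the highest boost among their enclosing tags."""

    def __init__(self, boosts: Dict[str, float]) -> None:
        # convert_charrefs=True makes the parser decode entities for us.
        super().__init__(convert_charrefs=True)
        self.boosts = boosts
        self.stack: List[str] = []
        self.tokens: List[Tuple[str, float]] = []

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the most recent matching open tag, if there is one.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        boost = max([self.boosts.get(t, 1.0) for t in self.stack] + [1.0])
        for word in data.split():
            self.tokens.append((word, boost))


def strip_html(html: str, boosts: Dict[str, float] = TAG_BOOSTS) -> List[Tuple[str, float]]:
    """Return (word, boost) pairs with tags removed and entities decoded."""
    parser = _TextExtractor(boosts)
    parser.feed(html)
    parser.close()
    return parser.tokens
```

Calling `strip_html("<h1>Big News</h1><p>Caf&eacute;</p>")` would yield the heading words with the `h1` boost and the decoded paragraph word with the default boost.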
Tokenizer
This processor allows you to specify how indexed fulltext content is split into separate tokens: which characters are ignored and which are treated as white-space that separates words.
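The two settings can be sketched with regular expressions in Python. This is a conceptual illustration, not the module's code: the function name and the default patterns shown here (apostrophes ignored, every non-word character treated as white-space) are assumptions chosen for the example.

```python
import re
from typing import List


def tokenize(text: str, ignorable: str = r"[']", spaces: str = r"[^\w]") -> List[str]:
    """Split text into tokens.

    Characters matching `ignorable` are removed without splitting the word;
    characters matching `spaces` are treated as white-space between words.
    """
    text = re.sub(ignorable, "", text)
    text = re.sub(spaces, " ", text)
    return text.split()
```

With these patterns, "Don't" becomes the single token "Dont", while "stop-believing" is split into two tokens because the hyphen counts as white-space.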
Stopwords
Enables the admin to specify a stopwords file. The words contained in that file will be filtered out of the indexed text data. This can be used to exclude overly common words from indexing on servers that don't support this natively.
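As a conceptual Python sketch, stopword filtering amounts to loading a word list and dropping matching tokens. The one-word-per-line file format, the case-insensitive comparison, and the function names are assumptions for illustration, not a description of the module's implementation.

```python
from typing import Iterable, List, Set


def load_stopwords(lines: Iterable[str]) -> Set[str]:
    """Read stopwords from an iterable of lines, e.g. an open text file.

    Assumes one word per line; blank lines are skipped.
    """
    return {line.strip().lower() for line in lines if line.strip()}


def remove_stopwords(tokens: Iterable[str], stopwords: Set[str]) -> List[str]:
    """Drop every token that appears in the stopword set."""
    return [token for token in tokens if token.lower() not in stopwords]
```

A file object can be passed directly, for instance `load_stopwords(open("stopwords.txt", encoding="utf-8"))`, where the file name is hypothetical.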
Highlighting
Adds highlighting of search terms to the search results.
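A minimal Python sketch of what highlighting involves: each occurrence of a search key in the result text is wrapped in a prefix and suffix. The `<strong>` markup, the whole-word and case-insensitive matching, and the function name are assumptions for the example. The input is assumed to be plain text, so no HTML escaping is handled here.

```python
import re
from typing import Iterable


def highlight(text: str, keys: Iterable[str],
              prefix: str = "<strong>", suffix: str = "</strong>") -> str:
    """Wrap whole-word, case-insensitive matches of the keys in markup."""
    words = [re.escape(key) for key in keys if key]
    if not words:
        return text
    pattern = re.compile(r"\b(" + "|".join(words) + r")\b", re.IGNORECASE)
    # A function replacement keeps the matched text's original casing.
    return pattern.sub(lambda m: f"{prefix}{m.group(1)}{suffix}", text)
```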
