Lucene indexes are composed of atomic documents. Each document is divided into named fields, which hold either content that can be searched or data that can be retrieved. Before storing data in a field, it is important to determine which field type best fits the nature of the content being indexed. It is also best practice to avoid retrieving large amounts of data from the index itself. The Search Lucene Content module follows this practice by indexing most data using the UnStored field type. When a search matches content in these fields, only the matching node ID is retrieved from the index, and the search results are populated with data from the database. This allows us to make use of Drupal's APIs without having to re-implement them on the Lucene layer. All available field types are listed below:

  • UnStored fields are tokenized and indexed, but not stored in the index. Large amounts of text are best indexed using this type of field. Storing data creates a larger index on disk, so if you need to search the data but not redisplay it, use an UnStored field. UnStored fields are practical when a Lucene index is used in combination with a relational database: index large data fields as UnStored for searching, and retrieve them from the database by using a separate stored field as an identifier. The content in the node body is a good candidate for an UnStored field.
  • Keyword fields are stored and indexed, meaning that they can be searched as well as displayed in search results. They are not split up into separate words by tokenization, so the whole value is indexed as a single term. Enumerated database fields usually translate well to Keyword fields in Search Lucene API. Items like node IDs are best stored in Keyword fields.
  • UnIndexed fields are not searchable, but they are returned with search hits. Database timestamps, primary keys, file system paths, and other external identifiers are good candidates for UnIndexed fields.
  • Text fields are stored, indexed, and tokenized. Text fields are appropriate for storing information like subjects and titles that need to be searchable as well as returned with search results.
  • Binary fields are not tokenized or indexed, but are stored for retrieval with search hits. They can be used to store any data encoded as a binary string, such as an image icon.
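
The field types above differ only in whether a value is stored, indexed, and tokenized. The following is a minimal Python sketch of that idea, not the Search Lucene API or Lucene itself: the names FieldType, FIELD_TYPES, ToyIndex, and the sample database dictionary are hypothetical and exist only for illustration, and tokenization is assumed to be simple lowercasing and whitespace splitting. It shows the pattern described earlier, where the body is indexed as UnStored, the node ID is kept as a Keyword field, and the full content is fetched from the database after a match.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldType:
    stored: bool     # value is returned with search hits
    indexed: bool    # value can be searched
    tokenized: bool  # value is split into separate terms before indexing


# Flags taken from the field type descriptions in the list above.
FIELD_TYPES = {
    "UnStored":  FieldType(stored=False, indexed=True,  tokenized=True),
    "Keyword":   FieldType(stored=True,  indexed=True,  tokenized=False),
    "UnIndexed": FieldType(stored=True,  indexed=False, tokenized=False),
    "Text":      FieldType(stored=True,  indexed=True,  tokenized=True),
    "Binary":    FieldType(stored=True,  indexed=False, tokenized=False),
}


class ToyIndex:
    """Hypothetical in-memory stand-in for an index (illustration only)."""

    def __init__(self):
        self.postings = {}  # (field name, term) -> set of document numbers
        self.stored = {}    # document number -> {field name: stored value}

    def add(self, doc_num, fields):
        # fields maps a field name to a (field type name, value) pair.
        for name, (type_name, value) in fields.items():
            field_type = FIELD_TYPES[type_name]
            if field_type.indexed:
                # Assumed tokenizer: lowercase, then split on whitespace.
                terms = value.lower().split() if field_type.tokenized else [value]
                for term in terms:
                    self.postings.setdefault((name, term), set()).add(doc_num)
            if field_type.stored:
                self.stored.setdefault(doc_num, {})[name] = value

    def search(self, field, term):
        # Hits carry only the stored fields of each matching document.
        doc_nums = sorted(self.postings.get((field, term), set()))
        return [self.stored.get(doc_num, {}) for doc_num in doc_nums]


# Hypothetical stand-in for the relational database, keyed by node ID.
database = {"42": {"title": "About Lucene", "body": "Lucene indexes are composed of atomic documents."}}

index = ToyIndex()
index.add(1, {
    "nid": ("Keyword", "42"),                    # stored identifier
    "body": ("UnStored", database["42"]["body"]),  # searchable, not stored
})

hits = index.search("body", "lucene")
results = [database[hit["nid"]] for hit in hits]  # populate results from the database
```

In this sketch the hit for the body search contains only the nid field, because the body was never stored in the index; the title and body shown to the user come from the database lookup.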