Using Sarnia to interact with external Solr data
This document refers to the 1.1 release (in beta) of the Sarnia module. Sarnia allows a Drupal site to interact with and display external data from Solr, mainly by building views of data from Solr. This is useful for large external datasets that either aren't practical to store in Drupal or that are already indexed in Solr.
Sarnia is also the name of a town in Ontario, Canada, home of the largest photovoltaic power plant in Canada.
Table of contents
- Generating a Solr core for testing
- Configuring Search API
- Creating Views of Solr data
- Advanced Solr
- Advanced Entities
- Features integration
Sarnia depends on Search API, Search API Solr, and Search API Views, and requires the latest 1.x releases of Search API and Search API Solr. The included drush makefile, sarnia.make.example, may help with downloading all of the dependencies.
After downloading the required modules, installing Sarnia will enable its dependencies. Enabling the "Views UI" module (included with Views) is also recommended.
Generating a Solr core for testing
In order to use Sarnia, you need a populated Solr core to work with. Sarnia does not care what sort of data is in the core, as long as the Solr schema specifies that some fields are stored as well as indexed. You may want to use a separate Drupal site with the ApacheSolr module and content generated using Devel Generate (a module that accompanies the Devel module) to populate a Solr core for basic testing. QA testing against your own data will better reveal any issues that relate to searching and displaying your particular data set.
For generating sample Solr data, ApacheSolr is preferred over Search API. Solr can be configured to index data without storing it, and the two modules differ on this point: Search API indexes most data in Solr but does not store it (that is, does not make it retrievable from Solr), while ApacheSolr stores all of the data that it indexes. In short, a Solr core generated using Search API will contain very little retrievable data, while a core generated using ApacheSolr will allow you to retrieve all properties from the core, which is the use case that Sarnia was built to address.
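One way to see which fields of a core are retrievable is to look at the stored attribute in the core's schema.xml. The following is a minimal Python sketch under stated assumptions: the schema fragment and its field names are made up for illustration, and a real schema can also set stored through field types and dynamic fields, which this sketch ignores.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema fragment; the field names are illustrative only.
SCHEMA_XML = """
<schema name="example" version="1.5">
  <fields>
    <field name="id" type="string" indexed="true" stored="true"/>
    <field name="label" type="string" indexed="true" stored="true"/>
    <field name="content" type="text" indexed="true" stored="false"/>
    <field name="spell" type="textSpell" indexed="true" stored="false"/>
  </fields>
</schema>
"""


def stored_fields(schema_xml):
    """Return the names of <field> elements explicitly marked stored="true"."""
    root = ET.fromstring(schema_xml)
    return [
        field.get("name")
        for field in root.iter("field")
        if field.get("stored", "").lower() == "true"
    ]


print(stored_fields(SCHEMA_XML))
```

Only the fields this reports as stored can be retrieved from Solr, and so displayed by Sarnia.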
Configuring Search API
To connect your Solr core to Drupal, create a Search API server configuration.
Visit the Search API configuration section:
Admin > Configuration > Search and metadata > Search API (path: admin/config/search/search_api)
This page lists the configured Search API servers and indexes. Normally, servers and indexes are independent, but Sarnia's purpose is to use a Search API server as a data source. Instead of the normal process of creating an index and linking it up to a server through configuration, we will create a server and then let Sarnia create and manage the index.
Search API servers correspond with Solr cores, not Solr servers. If you want to use multiple Solr cores, you will create multiple "Search API servers", even though you may have a single multi-core Solr server set up.
Add a Search API server by following the "Add server" link, and give the server a name.
Then select the "Sarnia Solr service" service class and fill out your Solr connection information.
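Before entering the connection information, it can help to confirm that the core answers a simple query. The sketch below only builds the URL for Solr's select handler from connection details; the host, port, and core path are hypothetical placeholders, and the exact form fields may differ between Search API Solr versions.

```python
from urllib.parse import urlencode


def solr_select_url(host, port, path, params, scheme="http"):
    """Build a URL for the select handler of one Solr core."""
    base = f"{scheme}://{host}:{port}/{path.strip('/')}"
    return f"{base}/select?{urlencode(params)}"


# Hypothetical connection details; substitute those of your own core.
url = solr_select_url(
    "localhost", 8983, "/solr/core0",
    {"q": "*:*", "rows": 1, "wt": "json"},
)
print(url)
```

Opening the printed URL in a browser should return one document from the core if the connection details are right.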
Clicking "Create server" will finalize your configuration, and you will be taken to an overview of your settings.
At this point, if you were to visit the Search API overview page again, you would see your new server listed.
Instead of going back to the overview page, visit the "Sarnia" tab. This page allows you to create a new entity type based on your server.
The "ID field" select box contains a list of all the Solr fields that may be suitable for use as an entity id.
The field you choose should contain unique integer values; however, Sarnia has no way to determine which fields have unique values, so this choice requires some knowledge of your Solr core. This cannot be changed after creating the entity type. If you are only reading from the core and not creating data or links based on Sarnia entities, it is not destructive to delete and re-enable the Sarnia entity for a particular server. Clicking "Enable" will save your configuration.
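Since Sarnia cannot check uniqueness for you, a quick check on a sample of documents may help when choosing the ID field. This is a minimal Python sketch, assuming the documents have already been fetched from Solr as dictionaries with single-valued fields; the sample data and field names are invented.

```python
from collections import Counter


def duplicate_values(docs, field):
    """Return the values of `field` that occur in more than one document."""
    counts = Counter(doc[field] for doc in docs if field in doc)
    return sorted(value for value, n in counts.items() if n > 1)


def is_candidate_id(docs, field):
    """True if every document has `field` and no value is repeated."""
    return all(field in doc for doc in docs) and not duplicate_values(docs, field)


# Invented sample documents for illustration.
SAMPLE_DOCS = [
    {"id": 1, "nid": 10, "type": "article"},
    {"id": 2, "nid": 11, "type": "page"},
    {"id": 3, "nid": 11, "type": "article"},
]

print(is_candidate_id(SAMPLE_DOCS, "id"))
print(is_candidate_id(SAMPLE_DOCS, "nid"))
```

A sample can only rule a field out, not prove it unique. On a real core, a facet query on the candidate field with facet.mincount=2 and rows=0 is one way to surface values that occur more than once.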
When you save your configuration, Sarnia will create a Search API index for you. You can see this index when you visit the Search API overview page.
At this point, your Drupal site is connected to Solr and can retrieve Solr data.
Creating Views of Solr data
Visit the Views UI:
Admin > Structure > Views (path: admin/structure/views)
Create a new View using the "Add new view" link.
In the "Show" section, select the name of the index that Sarnia created; it will be titled "[your server name] (Sarnia Index)". In the "Create a page" section, the View's "Display format" will be "Unformatted list". Make sure that "Fields" is selected following the "of" (i.e. Unformatted list of Fields). The form will refresh, and you can click "Continue & edit".
In the edit page for the view you have just created, if you have not already selected "Fields", do so now.
All of the Solr data is available through a single field, named "[your server name] (Sarnia Index): Data". At the time that Sarnia was designed, the Views UI lacked the ability to filter fields, and long lists of poorly labeled fields are not usable. The Sarnia field bundles all Solr fields together into a single field with a combobox select element.
Find the "Data" field by clicking "add" in the Fields section and selecting "[your server name] (Sarnia Index): Data".
Data Views fields have a "Formatter" option.
This can be used to provide basic formatting options for a property. Most text fields will benefit from using the "Filtered text" formatter with the "Plain text" option, which will translate plain text line breaks into HTML breaks and URLs into links.
If you add filters, sorts, or contextual filters (formerly known as "arguments"), you will again see "[your server name] (Sarnia Index): Data" as an option. When you select it, you can choose the Solr property to filter, sort, or use as context.
You may add multiple instances of the field, filter, sort, or contextual filter, which will let you combine and arrange your data according to various Solr properties.
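At the Solr level, a View built this way roughly amounts to a query with filter (fq), sort, and field list (fl) parameters. The sketch below shows that mapping in Python. It is an illustration of the idea only, not the query that Search API Solr actually generates, and the property names are hypothetical.

```python
from urllib.parse import urlencode


def solr_params(filters, sorts, fields, rows=10):
    """Translate simple filter, sort, and field choices into Solr parameters."""
    params = [("q", "*:*"), ("rows", rows)]
    # One fq parameter per filtered property.
    params += [("fq", f'{name}:"{value}"') for name, value in filters]
    if sorts:
        params.append(("sort", ",".join(f"{name} {direction}" for name, direction in sorts)))
    if fields:
        params.append(("fl", ",".join(fields)))
    return params


# Hypothetical Solr properties chosen through the Data field.
params = solr_params(
    filters=[("ss_type", "article")],
    sorts=[("sort_title", "asc")],
    fields=["id", "ss_title"],
)
print(urlencode(params))
```

Each additional instance of the Data filter or sort would add another fq entry or another sort clause.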
Advanced Solr
Often in Solr, the same piece of data will be indexed multiple times for different purposes; some fields will not be suitable for search or display. Sarnia provides some "Solr Schema" configuration to manage these behaviors.
Naming conventions for these behaviors are not standard across Solr schemas, and fields aren't described in a way that is intelligible to Sarnia (i.e., nothing in the schema.xml explicitly declares the relationship between ss_* fields and sort_* fields, even though they are generally different indexes of the same data), so Sarnia assumes certain conventions when applying schema rules. For example:
- Content is often aggregated into a single content field for use in fulltext search, so the content field is not available for display.
- Content is often aggregated and heavily tokenized in the spell field for spelling suggestions or corrections, so the spell field is not available for display.
- The dynamic base sort_* is used for fields that are processed as a single token for sorting. There may be a duplicate version of this field for search, so sort_* fields are not available for fulltext search.
- Solr fields containing more than one token are not suitable for sorting, since they are essentially multi-value, so sorting is disabled on them. sort_* fields that correspond with ss_* fields are used instead of the ss_* fields when sorting; this allows click sorting on display fields in Views.
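The conventions listed above can be pictured as a small set of name-pattern rules. The following Python sketch is an illustration only: the rule table, the function names, and the assumption that a sort_* field matches an ss_* field by sharing the same suffix are hypothetical, and none of this is Sarnia's actual rule storage or code.

```python
from fnmatch import fnmatchcase

# Illustrative rules that mirror the listed conventions.
RULES = [
    ("content", {"display": False}),
    ("spell", {"display": False}),
    ("sort_*", {"fulltext": False}),
]


def behaviors(field_name):
    """Return which behaviors remain enabled for a Solr field name."""
    result = {"display": True, "fulltext": True}
    for pattern, overrides in RULES:
        if fnmatchcase(field_name, pattern):
            result.update(overrides)
    return result


def sort_field_for(field_name, available_fields):
    """Prefer a matching sort_* field over an ss_* field when sorting."""
    if field_name.startswith("ss_"):
        candidate = "sort_" + field_name[len("ss_"):]
        if candidate in available_fields:
            return candidate
    return field_name


print(behaviors("spell"))
print(sort_field_for("ss_title", {"ss_title", "sort_title"}))
```

If your schema follows different naming conventions, the corresponding rules would need to be adjusted on the "Solr Schema" tab rather than in code.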
If you crafted your Solr schema yourself, you may want to check out the "Solr Schema" tab on your Sarnia Search API server configuration; otherwise, you probably want to stay far, far away :)
Advanced Entities
In the Search API server configuration for Sarnia servers/entities, you can "manage fields" on Sarnia entities. It is possible to add fields here, but there is no corresponding interface for editing field content; saving content has not been tested, even programmatically. Sarnia's relationship with Solr is read-only, so even if an editing interface were built out, it would not be possible to edit data stored in Solr.
Features integration
It's possible to add Sarnia's Search API server and index to a feature. However, you also need to save the corresponding entity type alongside them. Support for this exists in the VCS version, but if you have problems with it you might want to add the entity type manually using sarnia_entity_type_save().
A few pointers on already-experienced problems.
- In Views, I'm not seeing the Data field, only separate fields for each value. Should I use those?
- No. For Sarnia to work properly you need to use the Data field and select the required value from its settings. If the Data field doesn't work, it might be that the solr_document field hasn't been added; that happens, for example, when you insert the Sarnia entity type directly into the database without running sarnia_entity_type_save().
- Should I use only the Views base table added by Search API, or should Sarnia define a separate one?
- Use the one added by Search API; the Data field should appear there. See the previous question if it doesn't.