File Extractor: new computed field available in Search API index
File Extractor: new computed field available in JSON:API

Synopsis

This module adds a new computed field to the File entity: "File extractor: extracted file".

This field gives access to the extracted content of the file:

  • in web services such as JSON:API
  • in a field formatter for file fields
  • in Search API
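As a rough illustration of the JSON:API case, the sketch below builds the standard Drupal JSON:API URL for a single file entity and reads one attribute from the decoded response. It assumes JSON:API's default "/jsonapi" path prefix. The machine name of the computed field is not given on this page, so `$fieldName` is a hypothetical parameter; check the actual name on your site.

```php
<?php
declare(strict_types=1);

/**
 * Builds the JSON:API URL for one file entity (resource type file--file).
 * Assumes the default "/jsonapi" path prefix.
 */
function file_jsonapi_url(string $baseUrl, string $uuid): string
{
    return rtrim($baseUrl, '/') . '/jsonapi/file/file/' . rawurlencode($uuid);
}

/**
 * Reads a single attribute from a JSON:API document, or null if absent.
 * $fieldName is hypothetical: use the machine name the computed field
 * actually has on your site.
 */
function jsonapi_attribute(string $json, string $fieldName): mixed
{
    $document = json_decode($json, true, 512, JSON_THROW_ON_ERROR);
    return $document['data']['attributes'][$fieldName] ?? null;
}
```

The response body can be fetched with any HTTP client (for example `file_get_contents()` on the URL) and passed to `jsonapi_attribute()`.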

The module provides the following extraction methods:

  • Docconv binary
  • Pdftotext binary
  • Python Pdf2txt binary
  • Solr built-in extractor (Search API Solr)
  • Tika App JAR
  • Tika Server JAR
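The binary-based methods essentially come down to running an external tool on the file and capturing its output. A minimal sketch for pdftotext follows, assuming the binary path is taken from the extraction settings. It relies on pdftotext's convention that "-" as the output file sends the text to stdout. This is an illustration, not the module's own plugin code.

```php
<?php
declare(strict_types=1);

/**
 * Builds a pdftotext command line that writes the extracted text to stdout
 * ("-" as the output file). Both arguments are shell-escaped.
 */
function pdftotext_command(string $binary, string $pdfPath): string
{
    return escapeshellarg($binary) . ' ' . escapeshellarg($pdfPath) . ' -';
}

/**
 * Runs pdftotext and returns the extracted text, or null if the command
 * fails (missing binary, unreadable file, non-zero exit status).
 */
function pdftotext_extract(string $binary, string $pdfPath): ?string
{
    $output = [];
    $status = 0;
    // stderr is discarded; only stdout lines are collected into $output.
    exec(pdftotext_command($binary, $pdfPath) . ' 2>/dev/null', $output, $status);
    return $status === 0 ? implode("\n", $output) : null;
}
```

Note that `exec()` strips trailing whitespace from each output line; `proc_open()` would be needed to capture the output byte for byte.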

History

This project is a fork of Search API Attachments. More information about the module's origins is available in issue #3126845: Version 2.0.0.

Requirements

Each extractor plugin may require different modules or libraries. If a plugin's requirements are not satisfied, the plugin does not appear in the settings.
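Hiding unavailable plugins amounts to filtering the plugin list on a set of requirement checks. In the sketch below, the definition array shape and the callable-based checks are hypothetical and chosen only to illustrate the idea.

```php
<?php
declare(strict_types=1);

/**
 * Keeps only the extractor definitions whose requirements are all met.
 * Each definition is a hypothetical array of the form
 * ['label' => string, 'requirements' => list of callables returning bool].
 * Keys of the input array are preserved.
 */
function available_extractors(array $definitions): array
{
    return array_filter(
        $definitions,
        static function (array $definition): bool {
            foreach ($definition['requirements'] as $isMet) {
                if (!$isMet()) {
                    return false; // One unmet requirement hides the plugin.
                }
            }
            return true;
        }
    );
}
```

A definition with an empty requirements list is always kept, which matches plugins that need nothing beyond the module itself.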

Each extractor plugin may also require a specific binary on your server. When you configure the extraction, a test is run to check that the extraction works. The module documentation also provides installation instructions for the extractor plugins.
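The exact check performed at configuration time is not described here. One plausible shape, assumed for illustration, is to extract a sample file with known content and look for a known string in the result. The function name and parameters below are hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Runs an extractor on a sample file and reports whether the expected text
 * appears in the result. $extract is any callable that takes a file path
 * and returns the extracted text, or null on failure.
 */
function extraction_works(callable $extract, string $samplePath, string $expectedText): bool
{
    $text = $extract($samplePath);
    return is_string($text) && str_contains($text, $expectedText);
}
```

Checking for a substring rather than an exact match tolerates the differences in whitespace and line breaks that different extractors tend to produce.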

Configuration

  • Enable the File Extractor module on your site.
  • Go to the configuration page (/admin/config/media/file-extractor) and configure the extraction settings.

The module provides its own cache bin, 'file_extractor', so you can override the cache backend for this bin in your settings.php file. For example, to use the File Cache module:

$settings['cache']['bins']['file_extractor'] = 'cache.backend.file_system';
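To show why a dedicated bin is useful, here is a cache-aside sketch: extracted text is returned from the cache when present, and the extractor runs only on a miss. In Drupal code the bin itself would be reached with `\Drupal::cache('file_extractor')`; here a plain in-memory array stands in for it so the example runs on its own. The cache ID scheme (method name plus SHA-256 of the file content) is hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Cache-aside wrapper around an extractor. The in-memory array stands in
 * for the 'file_extractor' cache bin; the cache ID scheme is hypothetical.
 */
final class ExtractionCache
{
    /** @var array<string, string> */
    private array $bin = [];

    /**
     * $extract takes a file path and returns the text, or null on failure.
     * Failures are not cached, so they are retried on the next call.
     */
    public function get(string $method, string $filePath, callable $extract): ?string
    {
        $hash = hash_file('sha256', $filePath);
        if ($hash === false) {
            return null;
        }
        $cid = $method . ':' . $hash;
        if (array_key_exists($cid, $this->bin)) {
            return $this->bin[$cid];
        }
        $text = $extract($filePath);
        if (is_string($text)) {
            $this->bin[$cid] = $text;
        }
        return $text;
    }
}
```

Keying on the content hash rather than a file ID means a file whose content changes is extracted again instead of serving stale text.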
