What is it?
PDF Formatter provides two formatters for dealing with PDF files. These formatters rely on the pdftotext and pdftohtml utilities. On Ubuntu, both are provided by the poppler-utils package. Although PDF Formatter was developed under Linux, it should also work under Windows as long as these utilities are installed on the system.
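Since both formatters depend on external utilities, it can be useful to check that they are reachable before enabling the module. The following is a minimal sketch in Python, not part of PDF Formatter itself; the helper name missing_tools is hypothetical, and it only assumes that the utilities are expected to be on the PATH.

```python
import shutil

# The two poppler utilities named above.
REQUIRED_TOOLS = ("pdftotext", "pdftohtml")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the names of the tools that cannot be found on the PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing utilities: " + ", ".join(missing))
    else:
        print("pdftotext and pdftohtml are both available.")
```

If a utility is reported missing on Ubuntu, installing the poppler-utils package should provide it.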
How does it work?
Under Drupal 7, a formatter can be assigned to each field for each display mode (Structure→Content Types→Manage display).
The available formatters are:
- Convert PDF to text
- Convert PDF to HTML
When the Search module indexes your content, it asks for a “Search index” display mode. On a standard installation, this falls back to the default display mode, which is the one used when displaying the content to a user. With one of these formatters assigned to the “Search index” display, Drupal generates a version of the content that is more complete, though less pleasant to read (which doesn’t matter here), and therefore well suited to indexing.
Applying the same formatter to the “Search result” display lets the search results highlight the matched terms.
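To give an idea of what such a conversion involves, here is a minimal sketch in Python of how the two utilities can be called to get text or HTML on standard output. It is an illustration only, not the module's actual implementation; the function names build_command and convert_pdf and the chosen options are assumptions.

```python
import subprocess


def build_command(pdf_path, mode="text"):
    """Build the argument list for converting pdf_path to text or HTML.

    The option choices are assumptions made for this sketch.
    """
    if mode == "text":
        # A trailing "-" tells pdftotext to write to standard output.
        return ["pdftotext", "-enc", "UTF-8", pdf_path, "-"]
    if mode == "html":
        # -stdout: write to standard output; -noframes: single document;
        # -i: ignore images, which are of no use to a search index.
        return ["pdftohtml", "-stdout", "-noframes", "-i", pdf_path]
    raise ValueError("mode must be 'text' or 'html'")


def convert_pdf(pdf_path, mode="text"):
    """Run the conversion and return the output as a string."""
    result = subprocess.run(
        build_command(pdf_path, mode),
        capture_output=True,
        check=True,  # raise CalledProcessError if the utility fails
    )
    return result.stdout.decode("utf-8", errors="replace")
```

Calling convert_pdf("report.pdf") with a hypothetical file name would return the extracted text, provided pdftotext is installed; passing mode="html" would use pdftohtml instead.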
Why use PDF Formatter?
If you search for ways to index PDF files in Drupal, you will most likely come across the following solutions:
- Tika / Solr: requires installing a full Java runtime (JRE) as well as Solr and Tika, which are not straightforward to set up