There is no documentation on how to get this up and running; some help would be good.



curagea’s picture

Seconded. My helpers are installed, but I can't seem to get Search Files working with them. Some documentation would be greatly appreciated.

mgifford’s picture

Yes, I'd really like a README.txt file too. However, this is as much as I've hammered out:


To Install from Debian/Ubuntu:

# apt-get install xpdf
# apt-get install catdoc
# apt-get install unrtf
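
After installing, it may help to confirm that each helper can actually be found on the PATH. This is a minimal sketch assuming a POSIX shell; the function name missing_helpers is made up for illustration and is not part of the module:

```shell
# Print the name of every listed helper that cannot be found on PATH.
# missing_helpers is a hypothetical name, not part of Search Files.
missing_helpers() {
  for helper in "$@"; do
    if ! command -v "$helper" >/dev/null 2>&1; then
      printf '%s\n' "$helper"
    fi
  done
}

# The three helpers named in this thread; no output means all were found.
missing_helpers pdftotext catdoc unrtf
```

Note that the web server may run as a different user with a different PATH, so a helper found in your own shell is not guaranteed to be found by PHP.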

Help Options available:

$ /usr/bin/env pdftotext
pdftotext version 3.01
Copyright 1996-2005 Glyph & Cog, LLC
Usage: pdftotext [options] <PDF-file> [<text-file>]
-f <int> : first page to convert
-l <int> : last page to convert
-layout : maintain original physical layout
-raw : keep strings in content stream order
-htmlmeta : generate a simple HTML file, including the meta information
-enc <string> : output text encoding name
-eol <string> : output end-of-line convention (unix, dos, or mac)
-nopgbrk : don't insert page breaks between pages
-opw <string> : owner password (for encrypted files)
-upw <string> : user password (for encrypted files)
-q : don't print any messages or errors
-cfg <string> : configuration file to use in place of .xpdfrc
-v : print copyright and version info
-h : print usage information
-help : print usage information
--help : print usage information
-? : print usage information

$ catdoc
catdoc [-vu8btawxlV] [-m number] [-s charset] [-d charset] [ -f format] files

$ unrtf
Usage: unrtf [--version] [--help] [--nopict|-n] [--html] [--text] [--vt] [--latex] [--ps] [--wpml] [-t html|text|vt|latex|ps|wpml] <filename>


Set the Helper Files & extensions - admin/settings/search_files/helpers/
Word & Excel Files
HELPER NAME: Microsoft Word
HELPER PATH: /usr/bin/env catdoc %file%

HELPER NAME: Microsoft Excel
HELPER PATH: /usr/bin/env catdoc %file%

HELPER PATH: /usr/bin/env unrtf %file%

Set the Valid Directories -- admin/settings/search_files/directories

mmirza’s picture

Hi, I am having some real problems setting up the Search Files module. I've followed all the steps from the post and still get nothing. Can someone please help?

--David--’s picture

Me too... please help?

zaarkov’s picture

I had trouble too.
Now I'm using 6.x-2.0-beta4, which does the job at a very basic level.

airliner’s picture

Just download and extract the helper, then change into its directory.
Run ./configure -C "your_path", then make, then make install, and let 6.x-2.0-beta4 auto-detect it.

On Debian everything works.
It only searches attachments, not directories, but that's OK IMHO.

SocialNicheGuru’s picture

This should go in the README.

apatrinos’s picture

On MacOS, pdftotext requires a '-' as the last argument in order to write its results to standard output, and from there into a PHP variable via the shell_exec call in the function search_files_attachments_get_file_contents in the file search-files_attachments.module. If this applies generally, it should probably be incorporated into the documentation. Unfortunately, this is not mentioned in pdftotext's help output, but it is the usual convention for Unix tools.
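
To illustrate where the trailing '-' ends up, here is a rough sketch of how a helper line with a %file% placeholder could expand into a command. The function expand_helper and the path /tmp/report.pdf are assumptions for illustration only; this is not the module's own substitution code, and it assumes the template contains exactly one %file%:

```shell
# Hypothetical helper: fill the %file% placeholder in a helper line and
# print the resulting command. Assumes exactly one %file% in the template.
expand_helper() {
  template=$1
  file=$2
  before=${template%%"%file%"*}   # text before the placeholder
  after=${template#*"%file%"}     # text after the placeholder
  printf '%s%s%s\n' "$before" "$file" "$after"
}

expand_helper '/usr/bin/env pdftotext %file% -' '/tmp/report.pdf'
# prints: /usr/bin/env pdftotext /tmp/report.pdf -
```

With the '-' in place, pdftotext writes the extracted text to standard output, where shell_exec can capture it.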

terryallan’s picture

Thanks for all the comments above, but the instructions are not clear enough for me yet.

I have installed the extracted catdoc app in a directory called helpers in the search_files module directory, i.e. search_files/helpers/catdoc

In Admin/Site Configuration/Search Files no helper apps are listed and so no configuration is possible.

Can anyone advise me please?


stodge’s picture

I'm having the same problem. I have it all configured and the helpers installed. I attached a .txt file and a .pdf file to new content. I re-indexed the search, but I don't get any hits when searching. Any suggestions appreciated. Thanks

mdallmeyer’s picture

Hi, I just wanted to stick my head in and say that I got my search files module to work great using .jar files I wrote using the Apache POI Project.
Here is a link to the jar file I wrote, which will extract text from .doc, .ppt, .xls. Alternatively, here is a wrapper .exe file, although I could not get this one to work; it had trouble finding the JRE.

It seems Apache Tika has released a jar that does this for all MS Office files, including docx, xlsx, pptx, etc. (More links in case the mirror dies.)

I had to copy the JRE from the JDK to a directory on the server and then for the helper app line I wrote

"E://folder/folder/folder/jre7/bin/java -jar E://folder/folder/folder/MSOfficeToText.jar %file%"

selvaraj123’s picture

(Three screenshots attached: 130.48 KB, 134.5 KB, 142.93 KB)

Re #2: I have followed all the steps, but it is not showing any search file results.

ge’s picture

In /admin/settings/search_files/helpers/edit/1 your screenshot shows the 'Helper path' setting as:
/usr/bin/env pdftotext %file% -
This would not be a valid path. It has a space in the path.
If pdftotext is in /usr/bin (like it is on my server), the setting would be:
/usr/bin/pdftotext %file% -
If pdftotext on your server is in /usr/bin/env (doubtful), then the setting would be:
/usr/bin/env/pdftotext %file% -
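
If you are not sure where a helper lives on your server, something like the following sketch can print a candidate 'Helper path' value from the program's absolute location. The function name helper_path_for is hypothetical, and the sketch assumes a POSIX shell in which command -v prints the full path of an external program:

```shell
# Hypothetical helper: print a 'Helper path' value built from the absolute
# location of a program, or return non-zero if it is not on PATH.
# $1 = program name, $2 = optional trailing argument (e.g. - for pdftotext)
helper_path_for() {
  location=$(command -v "$1") || return 1
  printf '%s %%file%%%s\n' "$location" "${2:+ $2}"
}

helper_path_for pdftotext - || echo "pdftotext not found on PATH"
```

On a server where pdftotext is in /usr/bin, this should print the same setting as suggested above.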


prabakaran’s picture

The module is working well, but new Microsoft Office files (for example .docx, .pptx and .xlsx) are not being indexed.

Please help me.