There's no documentation on how to get this up and running; some help would be good.
Seconded. My helpers are installed, but I can't seem to get Search Files working with them. Some documentation would be greatly appreciated.
Yes, I'd really like a README.txt file too. However, this is as much as I've hammered out:
ON SERVER WITH THE COMMAND LINE
To install on Debian/Ubuntu:
# apt-get install xpdf
# apt-get install catdoc
# apt-get install unrtf
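Before configuring anything in Drupal, it can help to confirm that the three tools actually ended up on the PATH. A minimal sketch in POSIX sh, assuming the binaries keep the names used in the help output below (pdftotext, catdoc, unrtf); the function name check_helpers is just for illustration:

```shell
# Report which of the text-extraction helpers can be found on the PATH.
check_helpers() {
    missing=0
    for tool in pdftotext catdoc unrtf; do
        # command -v prints the full path of an external command, or fails.
        if found_at=$(command -v "$tool"); then
            echo "found   $tool -> $found_at"
        else
            echo "missing $tool"
            missing=$((missing + 1))
        fi
    done
    # Exit status is the number of helpers that were not found (0 = all present).
    return "$missing"
}

check_helpers || echo "Install the missing helpers before configuring Search Files."
```

The web server may run with a shorter PATH than your login shell, so a tool that shows up here can still be invisible to PHP.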
Help Options available:
$ /usr/bin/env pdftotext
pdftotext version 3.01
Copyright 1996-2005 Glyph & Cog, LLC
Usage: pdftotext [options] <PDF-file> [<text-file>]
  -f <int>          : first page to convert
  -l <int>          : last page to convert
  -layout           : maintain original physical layout
  -raw              : keep strings in content stream order
  -htmlmeta         : generate a simple HTML file, including the meta information
  -enc <string>     : output text encoding name
  -eol <string>     : output end-of-line convention (unix, dos, or mac)
  -nopgbrk          : don't insert page breaks between pages
  -opw <string>     : owner password (for encrypted files)
  -upw <string>     : user password (for encrypted files)
  -q                : don't print any messages or errors
  -cfg <string>     : configuration file to use in place of .xpdfrc
  -v                : print copyright and version info
  -h                : print usage information
  -help             : print usage information
  --help            : print usage information
  -?                : print usage information
catdoc [-vu8btawxlV] [-m number] [-s charset] [-d charset] [ -f format] files
Usage: unrtf [--version] [--help] [--nopict|-n] [--html] [--text] [--vt] [--latex] [--ps] [--wpml] [-t html|text|vt|latex|ps|wpml]
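A quick way to try each helper by hand on a sample file, picking the tool by file extension. This is a sketch: the function name extract_text is hypothetical, and the flags are my choices rather than what the module passes (-q and a trailing - so pdftotext prints to stdout, -d utf-8 for UTF-8 output from catdoc, --text so unrtf emits plain text instead of its default HTML):

```shell
# Run the matching helper for one file and print its text to stdout.
extract_text() {
    case $1 in
        *.pdf|*.PDF) pdftotext -q "$1" - ;;      # "-" = write text to stdout
        *.doc|*.DOC) catdoc -d utf-8 "$1" ;;     # -d = destination charset
        *.rtf|*.RTF) unrtf --text "$1" ;;        # plain text, not HTML
        *) echo "no helper for $1" >&2; return 1 ;;
    esac
}

# Usage: extract_text /path/to/sample.doc | head
```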
Set the Helper Files & extensions - admin/settings/search_files/helpers/
Word & Excel Files
HELPER NAME: Microsoft Word
HELPER PATH: /usr/bin/env catdoc %file%
HELPER NAME: Microsoft Excel
HELPER PATH: /usr/bin/env xls2csv %file%
HELPER NAME: RTF Files
HELPER PATH: /usr/bin/env unrtf %file%
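How %file% gets filled in is up to the module, and its code isn't shown in this thread. Assuming it substitutes the attachment path into the helper path and hands the result to a shell (a later comment mentions shell_exec), file names containing spaces or quotes need quoting, or the helper receives broken arguments. A sketch of such a substitution in POSIX sh; shell_quote and expand_helper are hypothetical names:

```shell
# Quote a string for a POSIX shell command line: wrap it in single quotes
# and turn every embedded ' into '\'' (close, escaped quote, reopen).
shell_quote() {
    printf "'%s'" "$(printf '%s' "$1" | sed "s/'/'\\\\''/g")"
}

# Replace the first %file% in a helper path with the quoted file name.
expand_helper() {
    template=$1
    placeholder='%file%'
    case $template in
        *"$placeholder"*) ;;
        *) echo "no %file% in: $template" >&2; return 1 ;;
    esac
    quoted=$(shell_quote "$2")
    before=${template%%"$placeholder"*}   # everything before %file%
    after=${template#*"$placeholder"}     # everything after %file%
    printf '%s%s%s\n' "$before" "$quoted" "$after"
}
```

For example, expand_helper '/usr/bin/env catdoc %file%' "/files/Q3 report.doc" prints /usr/bin/env catdoc '/files/Q3 report.doc'. Single quotes are used because nothing inside them is special to the shell except the single quote itself.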
Set the Valid Directories - admin/settings/search_files/directories
Hi, I am having real problems setting up the Search Files module. I've followed all the steps from the post above and still get nothing. Can someone please help?
Me too... please help?
I had trouble too.
Now I'm using 6.x-2.0-beta4, which does the job in a very basic way.
Just download the helper, extract it, and change into its directory.
Run ./configure --prefix="your_path", then make, then make install, and let 6.x-2.0-beta4 auto-detect it.
On Debian everything works fine.
It only searches attachments, not directories, but that's OK imho.
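The build steps described above, as a small function. A sketch assuming an autoconf-style source tree and that "your_path" is meant as the install prefix (autoconf's --prefix option); build_helper is a hypothetical name, and MAKE can be overridden as with most autotools builds:

```shell
# Configure, build and install one helper from its extracted source directory.
#   $1 = source directory, $2 = install prefix (e.g. /usr/local)
build_helper() {
    (   # subshell, so the cd does not leak into the caller's shell
        cd "$1" &&
        ./configure --prefix="$2" &&
        "${MAKE:-make}" &&
        "${MAKE:-make}" install
    )
}

# Usage: build_helper ./catdoc-src /usr/local   (installing there may need root)
```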
This should go in the README.
On MacOS, pdftotext requires '-' as the last argument in order to write its results to standard output, and consequently to a PHP variable via the shell_exec call in function search_files_attachments_get_file_contents of file search-files_attachments.module. This is not specific to the Mac: when no text-file argument is given, pdftotext writes file.txt next to file.pdf instead of printing anything, so this should be incorporated into the documentation. Unfortunately, '-' is not mentioned in pdftotext's help output, although it is the usual convention for Unix tools.
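The same call wrapped in a function, showing where the '-' goes. A minimal sketch; pdf_text is a hypothetical name, and -q only silences pdftotext's own messages:

```shell
# Print the text of one PDF on stdout.
# The trailing "-" is the text-file argument: it sends the text to stdout.
# Without it, pdftotext writes report.txt beside report.pdf and prints nothing,
# so anything capturing stdout (like shell_exec) gets an empty string.
pdf_text() {
    pdftotext -q "$1" -
}

# Usage: text=$(pdf_text /path/to/report.pdf)
```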
Thanks for all the comments above, but the instructions are not clear enough for me yet.
I have installed the extracted catdoc app in a directory called helpers inside the search_files module directory, i.e. search_files/helpers/catdoc.
In Admin/Site Configuration/Search Files no helper apps are listed, so no configuration is possible.
Can anyone advise me please?
I'm having the same problem. I have everything configured and the helpers installed. I attached a .txt file and a .pdf file to new content and re-indexed the search, but I don't get any hits when searching. Any suggestions are appreciated. Thanks
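One way to narrow this down is to run the helper command by hand and check that it prints text at all; if it does, the problem is more likely on the indexing side. A sketch; check_extraction is a hypothetical name:

```shell
# Run one helper command (passed as the arguments) and report whether it
# exited cleanly and printed any text on stdout.
#   e.g. check_extraction pdftotext -q /path/to/file.pdf -
check_extraction() {
    out=$("$@")
    status=$?
    if [ "$status" -ne 0 ]; then
        echo "FAIL: exit status $status"
        return 1
    fi
    if [ -z "$out" ]; then
        echo "EMPTY: ran but printed nothing"
        return 1
    fi
    echo "OK: $(printf '%s\n' "$out" | head -n 1)"
}
```

Running the plain helper command through sudo -u www-data (assuming that is your web server's user, as on Debian/Ubuntu) also shows whether that user can find the tool and read the file.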
Hi, I just wanted to stick my head in and say that I got my Search Files module working great using .jar files I wrote with the Apache POI project.
Here is a link to the jar file I wrote, which extracts text from .doc, .ppt and .xls files. Alternatively, here is a wrapper .exe file, although I could not get that one to work; it had trouble finding the JRE.
It seems Apache Tika has released a jar that does this for all MS Office files, including .docx, .xlsx, .pptx, etc. (more links in case the mirror dies)
I had to copy the JRE from the JDK to a directory on the server, and then for the helper app line I wrote:
"E://folder/folder/folder/jre7/bin/java -jar E://folder/folder/folder/MSOfficeToText.jar %file%"
#2: I have followed all the steps, but no search file results are showing.
In /admin/settings/search_files/helpers/edit/1 your screenshot shows the 'Helper path' setting as:
/usr/bin/env pdftotext %file% -
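Actually, that is a valid setting despite the spaces: /usr/bin/env is a program, not a directory, and it runs pdftotext by looking it up on the PATH. The PATH the web server sees may not include the directory where pdftotext is installed, though, so an absolute path is more reliable.
If pdftotext is in /usr/bin (like it is on my server), the setting would be:
/usr/bin/pdftotext %file% -
Run which pdftotext on the server to find where it is installed.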
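To get the absolute-path form of the setting without guessing, look the tool up on the server. A sketch; helper_path is a hypothetical name, and since it uses the PATH of whoever runs it, run it as the web server's user for a trustworthy answer:

```shell
# Print a helper path setting using the tool's absolute location.
#   $1 = tool name, $2 = the rest of the setting (e.g. '%file% -')
# Prints /usr/bin/pdftotext %file% - if pdftotext lives in /usr/bin.
helper_path() {
    tool_path=$(command -v "$1") || { echo "$1 not found on PATH" >&2; return 1; }
    printf '%s %s\n' "$tool_path" "$2"
}

# Usage: helper_path pdftotext '%file% -'
```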