There are already issues about this question in the queue, but none of them has a clear answer, only a suggestion that does not work.

Does this module support Windows or not? If so, how do I configure the paths of the helper applications?

Thanks
Arda

Comments

bander2’s picture

Arda,

I am using Windows 2003 and IIS6, and I got it to work.

helper path: D:\xpdf\pdftotext.exe %file% -
Directory: D:/root_site_folder/files

I am running 5.x and I noticed in another issue someone suggested using escape characters like this: D:\\xpdf\\pdftotext.exe %file% - but I didn't have to do that. And you do need to give your internet guest account access to cmd.exe.

Also, the helper application is difficult to set up, so you should test it separately to make sure it works. I had trouble because pdftotext came with instructions for Linux/Apache.
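Testing the helper separately can be sketched roughly as below. This is only an illustration, not code from search_files: the function names build_helper_command() and test_helper() and the file sample.pdf are made up for the example, and it assumes the %file% token is replaced by a double-quoted path.

```php
<?php
// Hypothetical helper, not part of search_files: replaces the %file% token
// in a helper setting with the double-quoted path of the file to convert.
function build_helper_command($template, $file_path) {
  // Assumption: a double quote cannot be part of a Windows file name,
  // so any stray quote is dropped before wrapping the path in quotes.
  $quoted = '"' . str_replace('"', '', $file_path) . '"';
  return str_replace('%file%', $quoted, $template);
}

// Hypothetical check: runs the helper once and returns the command that was
// run together with whatever the helper wrote to standard output.
function test_helper($template, $file_path) {
  $command = build_helper_command($template, $file_path);
  $output = shell_exec($command);
  return array(
    'command' => $command,
    'output' => is_string($output) ? $output : '',
  );
}

// Show the command that would be run for the settings quoted above.
echo build_helper_command(
  'D:\xpdf\pdftotext.exe %file% -',
  'D:/root_site_folder/files/sample.pdf'
), "\n";
```

If test_helper() comes back with an empty 'output' for a PDF that clearly contains text, the problem is probably in the helper path or its permissions rather than in the module settings.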

Maybe 6.x is different from 5.x, but I think it will work.

Brendan

leszek.hanusz’s picture

Sorry if this is a stupid question, but how can I give the internet guest account access to cmd.exe?
I am using Apache on Windows.

leszek.hanusz’s picture

I finally got it to work on Apache on Windows (with version 6.x-1.6 of search_files).
My main problem was that shell_exec() didn't return anything for me, so I had to use WshShell->Exec instead.

I had to modify the hook_update_index() implementation as follows:

/**
 * Implementation of hook_update_index()
 * 
 * lists all the files in the director(y/ies) and puts the files 
 * into the "search_files_files" table
 *
 * then indexes X(configurable) number of these files
 */
function search_files_update_index() {
  $helpers = search_files_get_helpers();
  // Normally the list of files in the directories is updated only once per day;
  // the original check is commented out here, so the list is rebuilt on every run.
  if (TRUE) { // if (variable_get('search_files_last_index', 0) < (time() - 86400)) {
    variable_set('search_files_last_index', time());
    $result = db_query('SELECT * FROM {search_files_directories}');
    while ($directory = db_fetch_object($result)) {
      search_files_list_directory($directory->directory, $directory->id);
    }
  }
  
  $index_number = (int)variable_get('search_cron_limit', 100);
  $sql = "
    SELECT 
      *
    FROM 
      {search_files_files}
    LEFT JOIN 
    (
      SELECT 
        *
      FROM 
        {search_dataset}
      WHERE 
        `type` = 'search_files'
    ) AS `dataset` ON {search_files_files}.`id` = `dataset`.`sid` 
    WHERE
    (
      `dataset`.`reindex` IS NULL OR
      `dataset`.`reindex` != 0
    ) AND {search_files_files}.`index_attempts` <= 5
    LIMIT %s
  ";

  $WshShell = new COM("WScript.Shell");

  $result = db_query($sql, $index_number);
  
  while ($file = db_fetch_object($result)) {
    $full_path = $file->full_path;
    $file_name = explode('/', $full_path);
    $file_name = $file_name[count($file_name)-1];
    $file_extension = explode('.', $file_name);
    $file_extension = $file_extension[count($file_extension)-1];

    if (in_array($file_extension, array_keys($helpers))) {
      // record that we are attempting to index the file in case it hangs
      $increment_sql = "
        UPDATE
          {search_files_files}
        SET
          `index_attempts` = `index_attempts` + 1
        WHERE
          `id` = '%s'
      ";
      $increment_result = db_query($increment_sql, $file->id);
      if ($file->index_attempts >= 5) {
        // indexing failed too many times, record this to the log and continue
        watchdog('Search Files', 'failed to index %full_path after %attempts attempts', array('%full_path' => $file->full_path, '%attempts' => $file->index_attempts), WATCHDOG_ERROR);
        continue;
      }
      // %file% is a token that is placed in the helper's parameter list to represent the file path to the attachment.
      // We need to put the filename in quotes in case it contains spaces.
      $quoted_file_path = '"'. escapeshellcmd($full_path) .'"';
      $helper_command = preg_replace('/%file%/', $quoted_file_path, $helpers[$file_extension]);
      
      // shell_exec() returned nothing on this Windows/Apache setup:
      //$file_text = shell_exec($helper_command);
      
      // so use WshShell->Exec() and read the helper's standard output instead
      try {
        $file_text = $WshShell->Exec($helper_command)->StdOut->ReadAll();
      } catch (Exception $e) {
        watchdog('Search Files', 'Exception caught during the indexing of %full_path : %exception_string', array('%full_path' => $file->full_path, '%exception_string' => $e->getMessage()), WATCHDOG_ERROR);
        continue;
      }
      
      $file_text = search_files_convert_to_utf8($file_text);
      
      //echo "<p>$file_text</p>";
      
      search_index($file->id, 'search_files', $file_text); 
    }
    else{
      search_index($file->id, 'search_files', ''); 
    }
  }
}

Once you have replaced this function in the search_files.module file, you have to:

  1. First, be sure to start clean by reinstalling the search_files module
  2. Add the directory you want to index using "/" (e.g. C:/xampp/htdocs/drupal/sites/default/files)
  3. Delete the TXT helper
  4. Edit the PDF helper to have this path: c:\\windows\\system32\\cmd.exe /c c:\\pdftotext.exe %file% -
  5. Move the pdftotext.exe file to c:\
  6. Go to Site configuration/search settings and click on reindex site
  7. Run cron manually enough times to index your files
  8. You can now try a search :)

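The shell_exec()/WshShell fallback from the modified function above can be isolated into one small function. This is only a sketch: run_helper_command() is a made-up name, and it assumes the PHP COM extension is available on the Windows side.

```php
<?php
// Hypothetical wrapper, not part of search_files: tries shell_exec() first
// and falls back to WScript.Shell when shell_exec() returns nothing.
function run_helper_command($command) {
  $output = function_exists('shell_exec') ? shell_exec($command) : NULL;
  if (is_string($output) && $output !== '') {
    return $output;
  }
  // The COM class only exists on Windows with the COM extension loaded.
  if (class_exists('COM')) {
    try {
      $shell = new COM('WScript.Shell');
      return (string) $shell->Exec($command)->StdOut->ReadAll();
    }
    catch (Exception $e) {
      // Treat a failed COM call as "no text extracted".
      return '';
    }
  }
  return '';
}
```

With a wrapper like this, the indexing loop would call run_helper_command($helper_command) instead of choosing one of the two mechanisms directly.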
peter85’s picture

Hi, I want to use the search_files module with Drupal 6.9 on Windows.

I installed it, and searching PDF attachments works.
But I want to search .doc and .xls files, too.
This doesn't work.
The helpers for .doc and .xls are not for Windows.
Is there a solution for Windows?

Dinis’s picture

There are a number of helpers for Windows/DOS, have you found the ones you need?

yuva_mind’s picture

Can anyone please tell me how to install the helpers (pdf, doc, xls, ppt, rtf) on a Windows machine? Please give a detailed link or document. Thanks in advance.

Regards
Yuva

dgarciad’s picture

Hi

This is my configuration:

Directory list (absolute path):

D:/xampp/htdocs/observatorios/sites/default/files

Helpers:

PDF:
D:\xampp\htdocs\helpapps\pdftotext.exe %file% -

DOC:
D:\xampp\htdocs\helpapps\catdoc.exe %file%

XLS:
D:\xampp\htdocs\helpapps\xls2csv.exe %file%

PPT:
D:\xampp\htdocs\helpapps\catppt.exe %file%

In the case of the PPT, DOC and XLS helper apps, be very careful to avoid directory names that contain spaces or are longer than 8 characters (e.g. use ../helpapps/.. instead of ../helperapps/..), because they cause trouble.
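The rule about directory names can be checked with a few lines of PHP before saving a helper path. This is just a sketch of the rule as described above; find_risky_directories() is a made-up name and the 8-character limit is taken from this comment, not from the catdoc documentation.

```php
<?php
// Hypothetical check, not part of search_files: returns the directory names
// in a helper path that contain a space or are longer than 8 characters.
function find_risky_directories($helper_path) {
  // Split on both backslashes and forward slashes.
  $parts = preg_split('#[\\\\/]+#', $helper_path, -1, PREG_SPLIT_NO_EMPTY);
  // The last part is the executable itself, only directories are checked.
  array_pop($parts);
  $risky = array();
  foreach ($parts as $part) {
    if (preg_match('/^[A-Za-z]:$/', $part)) {
      continue; // skip the drive letter
    }
    if (strlen($part) > 8 || strpos($part, ' ') !== FALSE) {
      $risky[] = $part;
    }
  }
  return $risky;
}

print_r(find_risky_directories('D:\xampp\htdocs\helperapps\catdoc.exe'));
print_r(find_risky_directories('D:\xampp\htdocs\helpapps\catdoc.exe'));
```

The first call reports helperapps and the second reports nothing, which matches the two directory names compared above.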

Regards

hackia’s picture

I'm confused. So I can't just put the helper files in my website directory under "/drupal/default/files/"helperfile"" and have it find words from the doc and pdf files in search?

Keep in mind it would not be on the localhost at all but hosted with Dreamhost.

conradio’s picture

Hi, I wonder if you can help me, please. I installed the search files module under Windows. What I need is to be able to search a directory of txt files for a person's name or other information in the documents and return the matching files for the person to open. My txt helper is c:\\windows\\system32\\cmd.exe type %file% and my directory is C:\xampp\htdocs\dot\sites\all\files. The problem is that it is not returning any results.
Once it returns results, we will have a report viewer that opens with an interpreter and displays the reports as PDF.

Please help.

dgarciad’s picture

Hi

I understand the settings should be something like this:

Directory list (absolute path):

C:\xampp\htdocs\dot\sites\all\files

Helper apps:

Why are you using "c:\\windows\\system32\\cmd.exe %file%"? Shouldn't you use catdoc if you want to search inside txt files?

Instead, I would use the following TXT helper app (using your actual path):
D:\xampp\htdocs\helpapps\catdoc.exe %file%

The first thing to check is if your helperapp is working. To do so, install catdoc, open a cmd window and type the following command to check if you get back some results (using your actual path).

e.g.
D:\xampp\htdocs\helpapps\catdoc.exe C:\xampp\htdocs\dot\sites\all\files\example.txt

Hope it helps

PS. I am no longer using Search files. I have switched to solr + solr attachment.

dgarciad’s picture

Hi

I wasn't able to make catdoc/catppt/catxls work until I changed the path to use directory names of at most eight characters. pdftotext is a more recent application and causes no such problems.

Regards

livingegg’s picture

subscribing

thl’s picture

Status: Active » Fixed

ayyurek,
please try 6.x-2.x-beta4, it has been tested on WAMP, see #559414: search_files compatibility with Windows AMP
Also the helper configuration described at #340013: configuring helpers on Windows environment comment #7 works well.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

poobalu’s picture

I installed the search files module and also installed the pdftotext helper application for windows in

C:\xampp\htdocs\helpapps\pdftotext.exe %file%

and the Directory path

C:/xampp/htdocs/websites/epicdev/sites/default/files

but nothing is working for me..
Could anybody please help me to solve this problem. I don't know where I went wrong.

dgarciad’s picture

Try adding a '-' after your pdftotext path:

C:\xampp\htdocs\helpapps\pdftotext.exe %file% -

Regards

Lazzo’s picture

Version: 6.x-6.x-dev » 6.x-1.6
Priority: Critical » Normal
Status: Closed (fixed) » Active

Thanks for the information concerning how to configure windows helpers here.

I've followed your instructions and installed the pdftotext helper on my Windows 2003 server. I have also confirmed that it works from the Windows CMD prompt: the content of any PDF is shown in the CMD window when I enter the pdftotext path and the path to a PDF file.

I've also installed WAMP 2.0, Drupal 6.14 and the search_files 6.x-1.6 module on the server. The module is activated and has been given user rights (the option "server files" is shown when you search on the site).

I also ran the search_files indexing by starting cron manually in the Drupal admin. The problem is that I get no hits for any content in the PDF files when searching...

To isolate the error I checked the database and could see that there is file data in the table search_files_files, with full_path = "C: wamp www sites default files/Anstallningsbekr_arb_0.pdf" and index_attempts=1. However, the column for searchable content (search_dataset.data) is empty for all records of type file_search.

Configuration
pdf path
C:\wamp\www\helpapps\pdftotext.exe %file% -

Directory path (same as used by attached items)
C:\wamp\www\sites\default\files

Any advice? Is it possible to activate some kind of error log for the search_files index process? It seems like something goes wrong here, and the result is empty rows in the search_dataset.data column.
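The stored full_path above shows spaces where the backslashes of the configured directory would be, while the setups reported as working earlier in this thread enter the directory with forward slashes. A minimal sketch of converting the directory setting before it is saved, assuming (this is not confirmed for search_files) that the backslashes are what gets lost; normalize_directory_path() is a made-up name:

```php
<?php
// Hypothetical helper, not part of search_files: converts a Windows
// directory setting to the forward-slash form used elsewhere in this thread.
function normalize_directory_path($path) {
  return str_replace('\\', '/', trim($path));
}

echo normalize_directory_path('C:\wamp\www\sites\default\files'), "\n";
```

Entering the directory by hand as C:/wamp/www/sites/default/files has the same effect and needs no code change.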

hannanxp’s picture

Finally, I found an error in this code:

function search_files_update_index() {
  ....
  $sql = "
    SELECT
      *
    FROM
      {search_files_files}
    LEFT JOIN
    (
      SELECT
        *
      FROM
        {search_dataset}
      WHERE
        `type` = 'search_files'
    ) AS `dataset` ON {search_files_files}.`id` = `dataset`.`sid`
    WHERE
    (
      `dataset`.`reindex` IS NULL OR
      `dataset`.`reindex` != 0 
    ) AND {search_files_files}.`index_attempts` <= 5
    LIMIT %s
  ";

it should be like this:

function search_files_update_index() {
  ....
  $sql = "
    SELECT
      *
    FROM
      {search_files_files}
    LEFT JOIN
    (
      SELECT
        *
      FROM
        {search_dataset}
      WHERE
        `type` = 'search_files'
    ) AS `dataset` ON {search_files_files}.`id` = `dataset`.`sid`
    WHERE
    (
      `dataset`.`reindex` IS NULL OR
      `dataset`.`reindex` = 0 
    ) AND {search_files_files}.`index_attempts` <= 5
    LIMIT %s
  ";

Focus on this part of the code: `dataset`.`reindex` = 0
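To see which rows each version of the WHERE clause picks up, the two conditions can be written out in PHP. This only mirrors the SQL quoted above (ignoring the index_attempts part); the function names and the sample values are made up for the illustration, and NULL stands for a file that has no row in search_dataset yet.

```php
<?php
// Mirrors the first query: no dataset row yet (NULL) or reindex != 0.
function matches_original($reindex) {
  return $reindex === NULL || $reindex != 0;
}

// Mirrors the second query: no dataset row yet (NULL) or reindex = 0.
function matches_modified($reindex) {
  return $reindex === NULL || $reindex == 0;
}

// Hypothetical sample rows covering the three possible cases.
$rows = array(
  'no dataset row' => NULL,
  'reindex is 0' => 0,
  'reindex is nonzero' => 1262304000,
);
foreach ($rows as $label => $reindex) {
  printf("%-20s original: %-8s modified: %s\n",
    $label,
    matches_original($reindex) ? 'selected' : 'skipped',
    matches_modified($reindex) ? 'selected' : 'skipped');
}
```

Both versions select files that have no dataset row; they differ only for rows that already exist, where the first selects a nonzero reindex value and the second selects a reindex value of 0.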

I hope this helps answer your question.

Thanks.

bahalabs’s picture

Wow!! Good job. It's a very good solution. The problem is solved now. Thank you...

T-MaK’s picture

Hi there!

I've tried to configure search_module by following your instructions but no files are indexed.
I really don't understand why...

First I installed search_module 6.x-1.6.
Then I downloaded and unpacked xpdf-3.02pl4-win32.
I configured the module with:

pdf path
C:\wamp\www\website\sites\default\files\pdftotext.exe %file% -

Directory path (same as used by attached items)
C:\wamp\www\website\sites\default\files

Finally I cleared cache & ran cron.

Did I miss something?

hannanxp’s picture

Nope, just try another helper for Windows. I found that catdoc works well for both .doc and .txt.

Here are my configs:
- .doc
D:/xampp/htdocs/helpapps/catdoc.exe %file%
- .txt
D:/xampp/htdocs/helpapps/catdoc.exe %file%

Also note that the Windows helpers do not cope with file names more than 8 characters long.

If it doesn't work (I forget exactly where the other bugs are), you can try my alternative new module at http://drupal.org/node/800664#comment-2980354

Finally, let me know if my module works for you.

thank you.