Closed (fixed)
Project:
Apache Solr Attachments
Version:
7.x-1.0
Component:
Code
Priority:
Critical
Category:
Support request
Assigned:
Unassigned
Reporter:
Created:
5 Aug 2009 at 17:19 UTC
Updated:
21 Jan 2017 at 01:10 UTC
Comments
Comment #1
pwolanin commented
It should work - please try it and we can update the README.
Comment #2
timatlee commented
Can't seem to get it going. When trying to run it from the command line, I get:
Still using tika-0.3 from svn, revision 756979
Comment #3
timatlee commented
Apparently I'm a newbie, and should have RTFM'd.
Checking out Tika from SVN at http://svn.apache.org/repos/asf/lucene/solr/trunk/contrib/extraction/lib will get you 0.4, except without tika-app-0.4.jar, which I guess is the java app that needs to be called.
I couldn't find a download that had tika-app-0.4.jar, so I wound up having to build it from source.
If you don't mind a few downloads, it's very easy to do. I have no idea what license restrictions apply to distributing a compiled jar, but I think if you're in this deep, you're probably OK having to go elsewhere to download some additional tools.
Oh, to note: I am doing this on Windows XP and 2003.
I'm sure this would be a fraction of the difficulty if I were using a real OS, but I'm not so fortunate for that where I work.
Hope this helps someone else down the road.
Comment #4
pwolanin commented
Hmm, perhaps tika 0.4 added a new master application jar. Silly that Solr is not shipping it - but likely it's not needed for content extraction.
Comment #5
timatlee commented
Hum, does that mean that, in the long run, Tika will not be required?
Comment #6
pwolanin commented
Well, the long-run plan was to use tika via Solr (or at least have that as an option), but it would be nice to continue to have a local tika extraction option.
Comment #7
timatlee commented
Hmm, using Tika via Solr would be exceptional. I had a considerable amount of grief getting Tika to run properly when using IIS... a lot of problems with quoting paths and such.
Any information out there on using tika via solr that I could experiment with?
Thanks,
Comment #8
pwolanin commented
http://wiki.apache.org/solr/ExtractingRequestHandler
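For anyone wanting to experiment, a minimal sketch of posting a file to that handler with curl might look like the following. It assumes a Solr instance at http://localhost:8983/solr with the /update/extract handler enabled in solrconfig.xml. The host, the document id doc1, and the file path are placeholders, and solr_extract_url is a hypothetical helper name, not something from this module.

```shell
#!/bin/sh
# Sketch only: builds the request URL for Solr's ExtractingRequestHandler.
# The base URL and document id are assumptions; adjust them to your install.
solr_extract_url() {
    base="$1"   # e.g. http://localhost:8983/solr (assumed)
    id="$2"     # value for the unique key field (assumed to be "id")
    # literal.id sets the document's id field; commit=true commits right away.
    printf '%s/update/extract?literal.id=%s&commit=true\n' "$base" "$id"
}

url=$(solr_extract_url "http://localhost:8983/solr" "doc1")
# Print the curl command instead of running it, so nothing is sent by accident.
# -F uploads the file as multipart form data.
echo "curl '$url' -F 'myfile=@/path/to/file.pdf'"
```

Adding extractOnly=true to the query string should return the extracted text without indexing it, which can be handy for a first test.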
Comment #9
pwolanin commented
I had the same experience with tika 0.4 - needed to build from source to get the app jar. PITA.
Comment #10
pwolanin commented
Here's an update to the README
Comment #11
pwolanin commented
Committed that patch + one more line
Comment #13
very_random_man commented
I've followed these instructions but unfortunately I'm still having a bit of related Tika trouble.
I've built Tika with Maven and it works fine from the command line (Mac OS X). However, when running cron, the attachments module runs the shell_exec command to extract the text and gets no response. This error turns up in the Apache error log:
The problem appears to be that there aren't sufficient permissions to spawn the GUI. I notice that even when running via the command line, the GUI is still appearing very briefly.
Is there a way to suppress this GUI thing? Also, is this likely to only be a problem on my Mac environment? Would a linux server deal with this differently? Presumably anyone who has got this working must have worked around this.
Any hints will be most appreciated! :-)
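Note for later readers: one thing that may be worth trying in this situation is Java's headless mode, which tells the JVM not to initialise a display. Whether that is the actual cause here is a guess, not something confirmed in this thread. A minimal sketch, where tika_headless_cmd is a hypothetical helper and both paths are placeholders:

```shell
#!/bin/sh
# Sketch only: prints a Tika command line with AWT headless mode switched on.
# The jar and file paths passed in are placeholders, not values from this thread.
tika_headless_cmd() {
    jar="$1"    # path to the tika-app jar
    file="$2"   # document to extract text from
    # -Djava.awt.headless=true is a standard JVM system property.
    # --text asks tika-app for plain-text output (assumed available in your
    # version; check the usage message of your jar).
    printf 'java -Djava.awt.headless=true -jar "%s" --text "%s"\n' "$jar" "$file"
}

# Print the command so it can be inspected before being run by hand.
tika_headless_cmd /path/to/tika-app-0.5.jar /path/to/test.pdf
```

If the printed command produces text when run as the web server user, the same property would need to be present in the command the module passes to shell_exec.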
Comment #14
very_random_man commented
Comment #15
pwolanin commented
Perhaps you are using the wrong jar?
I've used tika 0.4 with no problem on Mac OS 10.5
Comment #16
very_random_man commented
I was using tika-app-0.5.jar. I've just downloaded and installed 0.4 and it works fine.
Also, when I run the 0.4 command line I don't see any programs pop up on my dock, so my hunch is that 0.5 is running some kind of GUI thing by mistake, which shell_exec doesn't have permission to do.
Thanks for the module btw. Works like a charm now!
Comment #17
pwolanin commented
hmm, 0.5 works ok for me to index some files with this module - built from a checkout of http://svn.apache.org/repos/asf/lucene/tika/tags/0.5
so, not sure what the issue is for you, but marking fixed in the absence of more info.
Comment #19
lukus commented
@timatlee:
Your walkthrough worked for me, thanks.
But, I need to download the source using
Comment #20
Yaron Tal commented
On ubuntu:
The last command failed at first (and second run), but the third time it finished and it seems to work.
In Drupal I set the Tika root dir (the checkout dir) as the Tika directory path and tika-app/target/tika-app-0.5.jar as the Tika jar file.
Searching in Drupal now gives me content from within PDF files.
Comment #21
johnennew commented
Thanks Yaron,
Just a note that I managed to get tika version 0.9 working using these instructions in Drupal 7. I had some PDFs which earlier versions of Tika were not extracting contents from.
Download the 0.9 src code from the Apache tika website: http://tika.apache.org/download.html
On ubuntu you can then:
In the attachments setting screen I set the directory path as /usr/local/share/tika and the Tika jar file as tika-app-0.9.jar
After manually running cron and waiting 5 minutes, the pdf contents appeared in the search results
Comment #22
akshita commented
Hi John
I followed exactly what you did but no luck. Please do the needful.
cd /var
unzip apache-tika-0.9-src.zip
cd apache-tika-0.9
sudo apt-get install maven2
sudo mvn install
sudo mkdir /usr/local/share/tika
sudo cp tika-app/target/tika-app-0.9.jar /usr/local/share/tika
root@xxxxx:/var# cp tika-app/target/tika-app-0.9.jar /usr/local/share/tika
cp: cannot stat `tika-app/target/tika-app-0.9.jar': No such file or directory
Thanks
Revathi
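Note for later readers: the prompt on the failing line shows cp being run from /var, while the relative path tika-app/target/tika-app-0.9.jar would only resolve inside the unpacked source directory, and only if the Maven build finished. A minimal sketch of a copy step that checks for the jar first, where copy_tika_jar is a hypothetical helper and the directories are taken from the commands above:

```shell
#!/bin/sh
# Sketch only: copy the built tika-app jar, failing with a clear message if it
# is not where the Maven build should have put it.
copy_tika_jar() {
    src_dir="$1"    # unpacked Tika source directory
    dest_dir="$2"   # directory the module will be pointed at
    jar="$src_dir/tika-app/target/tika-app-0.9.jar"
    if [ ! -f "$jar" ]; then
        echo "not found: $jar (did 'mvn install' finish, and is the source dir right?)" >&2
        return 1
    fi
    mkdir -p "$dest_dir" && cp "$jar" "$dest_dir/"
}

# Example call, using the paths from this comment (run with sufficient rights):
#   copy_tika_jar /var/apache-tika-0.9 /usr/local/share/tika
```

Passing the source directory as an absolute path avoids depending on whichever directory the shell happens to be in.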
Comment #23
nick_vh commented
Closing because it's an old issue with no response. Also, most people have gotten it to work, so it should work.
Comment #24
pwolanin commented
Comment #25
dahousecat commented
I had to add these 2 lines to settings.php to get it to work:
I don't see how these variables were meant to be set, and there is nothing in the instructions about it?!?
Comment #26
glenshewchuck commented
I don't know about earlier versions of this module, but as of version 7.x-1.4 the $conf settings can be found at /admin/config/search/apachesolr/attachments