Closed (won't fix)
Project:
Solr Nutch
Component:
Documentation
Priority:
Minor
Category:
Support request
Assigned:
Unassigned
Reporter:
Created:
7 Feb 2013 at 16:40 UTC
Updated:
29 Apr 2013 at 01:27 UTC
Before I forget: I just rebuilt the Solr Nutch set-up and stumbled on a discrepancy between the README and the Nutch Tutorial.
The Nutch Tutorial has a step on integrating Nutch and Solr by copying an XML file from Nutch into Solr.
Correct me if I am wrong, but I assume that step is not needed.
The Apache Solr Integration Drupal module has config files that are copied into Solr, and those files provide the needed Nutch-compliant schema.
I will clarify this post later when I am back in front of my dev machine.
Comments
Comment #1
sgurlt commented
I am actually trying to rebuild the setup from the README and got stuck at the point of crawling my first URL:
Indexing 1 documents
SolrIndexer: finished at 2013-03-04 10:28:31, elapsed: 00:00:21
SolrDeleteDuplicates: starting at 2013-03-04 10:28:31
SolrDeleteDuplicates: Solr url: http://localhost:8983/solr/
Exception in thread "main" java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1265)
at org.apache.nutch.indexer.solr.SolrDeleteDuplicates.dedup(SolrDeleteDuplicates.java:373)
at org.apache.nutch.indexer.solr.SolrDeleteDuplicates.dedup(SolrDeleteDuplicates.java:353)
at org.apache.nutch.crawl.Crawl.run(Crawl.java:153)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.nutch.crawl.Crawl.main(Crawl.java:55)
Could it be that this has something to do with the config files?
I am using the files from the git archive for Nutch and the files from the Solr module for Solr, which should be correct, I think?
Comment #2
cilefen commented
Sorry for the long delay. You also need to look at the Solr server log file.
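For anyone hitting the same "Job failed!" message: the Nutch stack trace above does not say what Solr itself rejected, so the Solr log is where the real cause usually shows up. A minimal sketch for pulling the interesting lines out of that log, assuming the log marks problems with the words SEVERE or Exception (the function name and the pattern are illustrative, not part of Nutch or Solr):

```python
import re
from typing import Iterable, List

# Assumption: error lines in the Solr log contain "SEVERE" or "Exception".
# Adjust the pattern to whatever your logging set-up actually writes.
PROBLEM = re.compile(r"SEVERE|Exception")


def find_problem_lines(lines: Iterable[str], context: int = 2) -> List[str]:
    """Return every line matching PROBLEM plus `context` lines after it."""
    hits: List[str] = []
    remaining = 0  # how many more lines to keep after the last match
    for line in lines:
        if PROBLEM.search(line):
            remaining = context + 1
        if remaining > 0:
            hits.append(line.rstrip("\n"))
            remaining -= 1
    return hits
```

To use it, open the log file (its location depends on how Solr was started) and pass the file object in, for example `find_problem_lines(open(path), context=5)`, then print the returned lines.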
Comment #3
niccolox commented
There is a dedup bug in Nutch 1.6.
Comment #4
cilefen commented
niccolo, thank you for the info. I am closing this issue.