Looking through the docs and thinking about direct indexing with Tika, etc., I see a couple of tweaks we might want to make.
To use the fast vector highlighter, the text fields being highlighted need to have:
termVectors="true" termPositions="true" termOffsets="true"
see: http://wiki.apache.org/solr/SchemaXml#Expert_field_options
The storage of Lucene term vectors can be triggered using the following field options:
termVectors=true|false
termPositions=true|false
termOffsets=true|false
These options can be used to accelerate highlighting and other ancillary functionality, but impose a substantial cost in terms of index size. They are not necessary for typical uses of Solr (phrase queries, etc., do not require these settings to be present).
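To check which highlighted fields still lack these three options, you can scan schema.xml. Below is a minimal Python sketch. It assumes the fields are declared as plain field elements and only looks at attributes written directly on them, so it ignores any defaults set on a fieldType and any dynamicField declarations. The function name, the sample schema, and the field names content and label are hypothetical.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Field options the fast vector highlighter needs on each highlighted field.
REQUIRED_OPTIONS = ("termVectors", "termPositions", "termOffsets")


def fields_missing_term_vectors(schema_xml: str, highlighted: set[str]) -> dict[str, list[str]]:
    """Return {field name: [required options not set to "true"]} for highlighted fields.

    Only attributes written directly on <field> elements are checked; defaults
    coming from a <fieldType> and <dynamicField> declarations are ignored.
    """
    root = ET.fromstring(schema_xml)
    missing: dict[str, list[str]] = {}
    # iter() searches the whole tree, so fields nested under <fields> are found.
    for field in root.iter("field"):
        name = field.get("name")
        if name not in highlighted:
            continue
        absent = [opt for opt in REQUIRED_OPTIONS if field.get(opt) != "true"]
        if absent:
            missing[name] = absent
    return missing


# Hypothetical two-field schema: "content" is set up for the fast vector
# highlighter, "label" is not.
SAMPLE_SCHEMA = """<schema name="example" version="1.4">
  <fields>
    <field name="content" type="text" indexed="true" stored="true"
           termVectors="true" termPositions="true" termOffsets="true"/>
    <field name="label" type="text" indexed="true" stored="true"/>
  </fields>
</schema>"""

print(fields_missing_term_vectors(SAMPLE_SCHEMA, {"content", "label"}))
```

An empty result means every highlighted field in the file declares all three options explicitly.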
An important question is the extent to which this increases the index size.
Comment | File | Size | Author |
---|---|---|---|
#1 | 15766161.patch | 3.17 KB | pwolanin |
Comments
Comment #1
pwolanin: First pass.
Comment #2
cpliakas: Added another schema.xml tweak that I think warrants its own issue, although it is related to this topic:
#1586320: Add support for the ExternalFileField field type
Comment #3
Nick_vh: We also need to patch the Drupal code.
http://wiki.apache.org/solr/HighlightingParameters says that hl.useFastVectorHighlighter must be enabled in order to use the fast vector highlighter.
Info: http://wiki.apache.org/solr/HighlightingParameters#hl.useFastVectorHighl...
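On the query side, this comes down to sending the highlighting parameters with the search request. Below is a minimal Python sketch of such a request URL, using the standard hl, hl.fl, and hl.useFastVectorHighlighter parameters. The endpoint URL (host, port, and the core name d7-core) and the function name are hypothetical and only illustrate the parameters. The actual module change would be made in the module's own code.

```python
from __future__ import annotations

from urllib.parse import urlencode

# Hypothetical Solr endpoint; adjust host, port, and core name to your setup.
SOLR_SELECT_URL = "http://localhost:8983/solr/d7-core/select"


def highlight_query_url(query: str, fields: list[str]) -> str:
    """Build a select URL that asks Solr to highlight `fields` with the fast vector highlighter."""
    params = {
        "q": query,
        "hl": "true",                           # turn highlighting on
        "hl.fl": ",".join(fields),              # comma-separated fields to highlight
        "hl.useFastVectorHighlighter": "true",  # use the fast vector highlighter
    }
    # urlencode percent-encodes reserved characters, e.g. the comma in hl.fl.
    return SOLR_SELECT_URL + "?" + urlencode(params)


print(highlight_query_url("drupal", ["content"]))
```

The highlighted fields must still carry termVectors, termPositions, and termOffsets in the schema. Setting the request parameter alone is not enough.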
Comment #4
pwolanin: Well, yes, we will need to patch the module too, but that can come after the RC if needed.
I want this issue to be just about the schema (and solrconfig if needed) changes.
Comment #5
pwolanin: With 5000 nodes indexed and the index optimized, using the current schema:
$ du -ch multicore/d7-core/data/index/
71M multicore/d7-core/data/index/
71M total
with the patch:
$ du -ch multicore/d7-core/data/index/
89M multicore/d7-core/data/index/
89M total
So it's a meaningful increase in index size of about 25% (71M to 89M).
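To repeat this comparison on other data sets, here is a minimal Python sketch that totals an index directory and computes the relative growth. The helper names are hypothetical. Note that it sums apparent file sizes, while du reports allocated disk blocks, so the byte totals can differ slightly from du's figures.

```python
import os


def index_size_bytes(path: str) -> int:
    """Sum the apparent size of every file under `path`.

    du reports allocated disk blocks, so its figures can differ slightly
    from this apparent-size total.
    """
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            total += os.path.getsize(os.path.join(dirpath, name))
    return total


def percent_increase(before: float, after: float) -> float:
    """Relative growth from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Figures from the du output above (MB): current schema vs. patched schema.
print(f"{percent_increase(71, 89):.1f}%")  # prints 25.4%
```

For a fresh measurement, call index_size_bytes("multicore/d7-core/data/index/") after optimizing with each schema and pass the two totals to percent_increase.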
Comment #6
Nick_vh: pwolanin, do you want this in the module? Should we accept the larger indexes? Marking as an RC blocker.
Comment #7
Nick_vh: Going to roll out an RC now; this could very well go into RC2. Not critical.
Comment #8
pwolanin: Given that we don't have any experience with this or any comparative benchmarks, this seems more like a research project, and possibly a documentation issue, for now.
Comment #9
pwolanin
Comment #10
Nick_vh: Moving this to the common schema project.