Looking through the docs, and thinking about direct indexing with Tika, etc., I see a couple of tweaks we might want to make.

To use the fast vector highlighter, the text fields being highlighted need to have:

termVectors="true" termPositions="true" termOffsets="true"

see: http://wiki.apache.org/solr/SchemaXml#Expert_field_options

The storage of Lucene term vectors can be triggered using the following field options:

termVectors=true|false
termPositions=true|false
termOffsets=true|false
These options can be used to accelerate highlighting and other ancillary functionality, but impose a substantial cost in terms of index size. They are not necessary for typical uses of Solr (phrase queries, etc., do not require these settings to be present).
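To check which fields in a schema.xml would actually be usable by the fast vector highlighter, a small script can look for the three attributes above. This is only a sketch: the schema fragment and the field names `content` and `label` are hypothetical, not taken from the Drupal schema, and it assumes fields are declared as `<field>` or `<dynamicField>` elements.

```python
import xml.etree.ElementTree as ET

# The three term vector options the fast vector highlighter needs on a highlighted field.
FVH_ATTRS = ("termVectors", "termPositions", "termOffsets")

# Hypothetical schema.xml fragment for illustration only.
SAMPLE_SCHEMA = """
<schema name="example" version="1.3">
  <fields>
    <field name="content" type="text" indexed="true" stored="true"
           termVectors="true" termPositions="true" termOffsets="true"/>
    <field name="label" type="text" indexed="true" stored="true"/>
  </fields>
</schema>
"""

def fvh_ready_fields(schema_xml):
    """Return {field_name: bool}: True only if all three term vector options are "true"."""
    root = ET.fromstring(schema_xml.strip())
    # Check both plain and dynamic field declarations.
    fields = list(root.iter("field")) + list(root.iter("dynamicField"))
    return {
        f.get("name"): all(f.get(attr, "false").lower() == "true" for attr in FVH_ATTRS)
        for f in fields
    }

if __name__ == "__main__":
    for name, ready in fvh_ready_fields(SAMPLE_SCHEMA).items():
        print(f"{name}: {'ok' if ready else 'missing term vector options'}")
```

Missing attributes are treated as "false" here, on the assumption that the field type does not turn them on by default.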

An important question is the extent to which this increases the index size.

Comment | File | Size | Author
#1 | 15766161.patch | 3.17 KB | pwolanin

Comments

pwolanin

File | Size
15766161.patch | 3.17 KB

first pass

cpliakas

Added another schema.xml tweak that I think warrants its own issue, although it is related to this topic:

#1586320: Add support for the ExternalFileField field type

Nick_vh

Status: Active » Needs work

We also need a patch to the Drupal module code.
http://wiki.apache.org/solr/HighlightingParameters says that hl.useFastVectorHighlighter must be enabled in order to use this.

Info: http://wiki.apache.org/solr/HighlightingParameters#hl.useFastVectorHighl...
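On the module side, the change would mostly mean sending the extra highlighting parameter with each search request. The sketch below only builds the request URL to show which parameters are involved; the host, port, `/solr/d7-core/select` path and the highlighted field name `content` are assumptions for illustration, and the real module would set these parameters in its PHP query code.

```python
from urllib.parse import urlencode

# Assumed local Solr multicore endpoint; "d7-core" mirrors the core directory
# used for the index-size test in this thread, but host/port/path are assumptions.
SOLR_SELECT = "http://localhost:8983/solr/d7-core/select"

def highlight_query_url(query, fields, use_fvh=True):
    """Build a select URL requesting highlighting, optionally via the FastVectorHighlighter."""
    params = {
        "q": query,
        "hl": "true",                  # turn highlighting on
        "hl.fl": ",".join(fields),     # fields to highlight (need term vectors for FVH)
        "hl.useFastVectorHighlighter": "true" if use_fvh else "false",
        "wt": "json",
    }
    return SOLR_SELECT + "?" + urlencode(params)

print(highlight_query_url("drupal", ["content"]))
```

Keeping `use_fvh` as a flag makes it easy to compare both highlighters against the same core when benchmarking.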

pwolanin

Well, yes, we will need to patch the module too, but that can wait until after the RC if needed.

I want this issue to be just about the schema (and solrconfig if needed) changes.

pwolanin

With 5000 nodes indexed and optimized, using the current schema:

$ du -ch multicore/d7-core/data/index/
71M multicore/d7-core/data/index/
71M total

with the patch:

$ du -ch multicore/d7-core/data/index/
89M multicore/d7-core/data/index/
89M total

So it's a meaningful ~25% increase in index size.
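The ~25% figure follows directly from the two `du` results. A quick check, using only the numbers reported above:

```python
def percent_increase(before_mb, after_mb):
    """Relative growth of the index, as a percentage of the original size."""
    return (after_mb - before_mb) / before_mb * 100

# Index sizes reported by `du -ch` above for 5000 optimized nodes, in MB.
before, after = 71, 89
print(f"{percent_increase(before, after):.1f}% larger")  # prints "25.4% larger"
```

Since `du -h` rounds to whole megabytes, the true increase could be off by a point or so either way.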

Nick_vh

Priority: Normal » Major
Issue tags: +RC blocker

pwolanin, do you want this in the module? Are we OK with the larger indexes? Marking as an RC blocker.

Nick_vh

Issue tags: -RC blocker +RC2 blocker

Going to roll out an RC now; this could very well go into RC2. Not critical.

pwolanin

Given that we don't have any experience with this or any comparative benchmarks, this seems more like a research project, and possibly a documentation issue, right now.

pwolanin

Status: Needs work » Postponed

Nick_vh

Project: Apache Solr Search » Apache Solr Common Configurations
Version: 7.x-1.x-dev »
Status: Postponed » Active

Moving this to the common schema project.