It makes no sense to index the same value multiple times in a multi-value field.
Here's a patch that avoids duplicates, no matter how many times contrib modules happen to add the same value.

Files:
Comment | File | Size | Author
#1 | 1917400_avoid_multiple_identical_values.patch | 522 bytes | mkalkbrenner
PASSED: [[SimpleTest]]: [MySQL] 513 pass(es).

Comments

mkalkbrenner’s picture

Status: Active » Needs review
Nick_vh’s picture

Status: Needs review » Reviewed & tested by the community

Makes sense

Nick_vh’s picture

Version: 7.x-1.x-dev » 6.x-3.x-dev
Status: Reviewed & tested by the community » Patch (to be ported)

Committed to 7.x-1.x; needs a backport to 6.x-3.x.

pwolanin’s picture

Version: 6.x-3.x-dev » 7.x-1.x-dev
Status: Patch (to be ported) » Needs work

I think we should revert - it's not the module's job to clean up your data or guess what you meant.

Nick_vh’s picture

Reverted in code. We should figure out whether this makes a difference for Solr (e.g. boosting). For faceting it certainly does not make a big difference.

pwolanin’s picture

Status: Needs work » Closed (won't fix)

The number of times the value appears affects scoring, so I don't think we should be trying to guess the intent at this level.

mkalkbrenner’s picture

After our conversation I agree with Peter and will solve the duplicate entry issue within Apache Solr Multilingual.

heacu’s picture

Note that this can also be done effectively in solrconfig.xml using UniqFieldsUpdateProcessorFactory
(http://lucene.apache.org/solr/4_1_0/solr-core/org/apache/solr/update/pro...).

You'll need Solr 4.0 for this, though.
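
For anyone who wants to try this, here is a rough sketch of what such a chain might look like in solrconfig.xml, based on the Solr 4.x form of the processor's configuration. The chain name and the field name `sm_field_tags` are only placeholders I made up for illustration; substitute the multi-valued fields from your own schema.

```xml
<!-- Sketch only: chain name and field name are hypothetical placeholders. -->
<updateRequestProcessorChain name="uniq-fields">
  <!-- Removes duplicate values from the listed multi-valued fields
       before the document is indexed. -->
  <processor class="solr.UniqFieldsUpdateProcessorFactory">
    <lst name="fields">
      <str>sm_field_tags</str>
    </lst>
  </processor>
  <!-- Must come last so the update is actually executed. -->
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
```

The chain only takes effect if update requests actually use it, for example by passing `update.chain=uniq-fields` with the request or by configuring it on the update handler. Keep in mind the point made above: removing repeated values changes how often a term occurs in the field, so it can change scoring.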