Fulltext fields use the total content of the field for the sort_ index field [#2852606]

Caused by: java.lang.IllegalArgumentException: Document contains at least one immense term in field="sort_rendered_item" (whose UTF8 encoding is longer than the max length 32766)

So I'm indexing some pretty big rendered content. The fulltext field, like all fields from search_api, is sortable. As drunken_monkey put it in IRC: "To avoid the confusion of D7, we just declare all fields sortable now, and it's the backend's problem how to do that."

The exception is thrown in SearchApiSolrBackend::indexItems(), ultimately caused by the error above. I guess this is because the whole fulltext is being used for the sort_ index field. I understand that search_api_db just uses the first 30 characters.
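For illustration: the limit quoted in the exception message counts UTF-8 bytes, not characters, so non-ASCII content reaches it sooner than a character count suggests. The following is a minimal Python sketch of that check. The function name `exceeds_term_limit` is hypothetical and is not part of search_api_solr or Solr; only the figure 32766 comes from the error message.

```python
# Limit quoted in the exception message (bytes of UTF-8, not characters).
MAX_TERM_BYTES = 32766


def exceeds_term_limit(value: str, limit: int = MAX_TERM_BYTES) -> bool:
    """Return True if the UTF-8 encoding of value is longer than limit bytes.

    Hypothetical helper for illustration only.
    """
    return len(value.encode("utf-8")) > limit


# ASCII characters take one byte each, so 32766 of them still fit.
print(exceeds_term_limit("a" * 32766))  # False
# One more byte crosses the limit.
print(exceeds_term_limit("a" * 32767))  # True
# "é" takes two bytes in UTF-8, so 20000 characters are 40000 bytes.
print(exceeds_term_limit("é" * 20000))  # True
```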

Comment	File	Size	Author
#6	2852606.patch	3.24 KB	mkalkbrenner
#4	2852606-04.search_api_solr_fulltext_field_sort_size.patch	1.11 KB	ekes


Comments

Comment #1

14 February 2017 at 17:22

ekes created an issue. See original summary.

Comment #2

ekes CreditAttribution: ekes as a volunteer commented 14 February 2017 at 17:24

Updated the issue summary.

Comment #3

mkalkbrenner CreditAttribution: mkalkbrenner at bio.logis Genetic Information Management GmbH commented 14 February 2017 at 17:30

Comment #4

ekes CreditAttribution: ekes as a volunteer commented 14 February 2017 at 18:09

File	Size
2852606-04.search_api_solr_fulltext_field_sort_size.patch	1.11 KB

So yes, if I drop that figure (well, I dropped it to 512 ;-)), it indexes.

I'm not sure what a sensible figure is, but does a sort really need 30000 bytes or so? Seems like a pretty edge case.

Comment #5

arafalov CreditAttribution: arafalov as a volunteer commented 15 February 2017 at 02:13

The root issue here is that, whatever the definition of sort_rendered_item is, the content it receives is too large. For text fields, this usually means you are feeding (for example) Chinese into an English tokenizer, which tries to split on spaces and ends up with the whole text as one "word".

But as this is a sort field, I am assuming the field is actually defined as a string instead. So the question is what the expectations are. As you said, it does not make sense to sort by the whole long string. So, just sort by a prefix.

You can get to a prefix in three ways:

1. Do it on the client, if you are sending content to that field directly and you really don't care about anything beyond X characters. In that case you probably don't want to store it either, just index it.
2. Do it in Solr, if that field is populated via copyField. There is not much you can do for a string field, but you could make it a text field with the Keyword tokenizer and TruncateTokenFilterFactory.
3. Truncate it in the Update Request Processors in solrconfig.xml, if you want to be hipster about it. That works with strings as well and is roughly equivalent to truncating on the client, except for the wasted network transfer. On the other hand, it centralizes the logic in Solr, in case there are multiple clients or other sources of logic.
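The client-side option can be sketched roughly as follows. This is a minimal Python illustration, not the code from the patches attached to this issue: the helper names `sort_prefix` and `sort_prefix_bytes` are hypothetical, and the default of 32 characters is simply the figure suggested later in this thread. The byte-based variant is an assumption about how one might stay under a byte limit without cutting a multi-byte character in half.

```python
def sort_prefix(value: str, max_chars: int = 32) -> str:
    """Keep only the first max_chars characters for the sort field.

    Hypothetical helper; 32 is an illustrative default.
    """
    return value[:max_chars]


def sort_prefix_bytes(value: str, max_bytes: int) -> str:
    """Keep the longest prefix whose UTF-8 encoding fits in max_bytes.

    Slicing the encoded bytes may cut a multi-byte character in half;
    decoding with errors="ignore" drops that incomplete trailing sequence.
    """
    encoded = value.encode("utf-8")[:max_bytes]
    return encoded.decode("utf-8", errors="ignore")


rendered = "x" * 100
print(len(sort_prefix(rendered)))      # 32
print(sort_prefix("short"))            # short
print(sort_prefix_bytes("ééé", 5))     # éé
```

Truncating by characters is the simpler choice when the prefix is short, since even four bytes per character keeps 32 characters far below the limit in the error message. The byte-based variant only matters if the chosen prefix is long enough to approach that limit.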

Comment #6

mkalkbrenner CreditAttribution: mkalkbrenner at bio.logis Genetic Information Management GmbH commented 24 April 2017 at 16:49

Status: Active » Needs review

File	Size
2852606.patch	3.24 KB

32 chars should be enough :-)

Comment #7

25 April 2017 at 07:53

mkalkbrenner committed 092f244 on 8.x-1.x authored by ekes

Issue #2852606 by ekes, mkalkbrenner: Fulltext fields use the total...

Comment #8

mkalkbrenner CreditAttribution: mkalkbrenner at bio.logis Genetic Information Management GmbH commented 25 April 2017 at 07:57

Status: Needs review » Fixed

Comment #9

9 May 2017 at 07:59

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.
