Caused by: java.lang.IllegalArgumentException: Document contains at least one immense term in field="sort_rendered_item" (whose UTF8 encoding is longer than the max length 32766)

So I'm indexing some pretty big rendered content. The fulltext field, like all fields from search_api, is sortable. As drunken_monkey put it in IRC: "To avoid the confusion of D7, we just declare all fields sortable now, and it's the backend's problem how to do that."

The exception is thrown in SearchApiSolrBackend::indexItems() and is ultimately caused by the error above. I guess it is because the whole fulltext value is being used for the index. I understand that search_api_db just uses the first 30 characters.
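
To make the limit in the error message concrete, here is a minimal sketch (not code from search_api_solr) of the check that fails. It assumes only what the exception states: the limit applies to the UTF-8 encoded length of the value, and the maximum is 32766 bytes. The function name is hypothetical.

```python
# Limit quoted in the exception message: max UTF-8 length of a single term.
MAX_TERM_BYTES = 32766


def exceeds_term_limit(value: str, limit: int = MAX_TERM_BYTES) -> bool:
    """Return True if the UTF-8 encoding of `value` is longer than `limit` bytes.

    Hypothetical helper for illustration; the real check happens inside Lucene.
    """
    return len(value.encode("utf-8")) > limit
```

Note that the limit is counted in bytes, not characters, so text with many multi-byte characters hits it sooner than plain ASCII does.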


Comments

ekes created an issue. See original summary.

ekes’s picture

So yes, if I drop that figure, it indexes. Well, I dropped it to 512 ;-)

I'm not sure what a sensible figure is, but does a sort really need 30000 bytes or so? Seems like a pretty edge case.

arafalov’s picture

The root issue here is that, whatever the definition of sort_rendered_item is, the content it gets is too large. Usually for text fields, this means you are feeding (for example) Chinese into an English tokenizer and it tries to split on spaces, ending up with the whole text as one "word".

But as it is a sort field, I am assuming the field is actually defined as a string instead. So then the question is what the expectations are. It does not make sense to sort by the whole long string, as you said. So, just sort by a prefix.

And you can get a prefix in three ways.

  1. Do it on the client if you are sending content to that field directly and you really don't care about anything beyond X characters. Then you probably don't want to store it either, just index it.
  2. Do it in Solr if that field is populated via copyField. There is not much you can do for a string field, but you could make it a text field with the Keyword tokenizer and TruncateTokenFilterFactory.
  3. Truncate it in the Update Request Processors in solrconfig.xml, if you want to be hipster about it. That would work with strings as well and is roughly equivalent to truncating it in the client, except for the wasted network transfer. On the other hand, it would centralize the logic in Solr, in case there are multiple clients or other sources of logic.
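
As an illustration of the first option (truncating on the client), here is a minimal sketch. It is not the code from the committed patch; the function names are hypothetical, and the default of 32 characters is simply the figure suggested later in this thread. The second helper shows a byte-based cut, since the limit in the exception is measured in UTF-8 bytes rather than characters.

```python
def sort_prefix(value: str, max_chars: int = 32) -> str:
    """Return the first `max_chars` characters of `value` for use as a sort key.

    Hypothetical helper; 32 is an assumed default, not a Solr requirement.
    """
    return value[:max_chars]


def truncate_utf8(value: str, max_bytes: int) -> str:
    """Cut `value` so that its UTF-8 encoding is at most `max_bytes` long."""
    encoded = value.encode("utf-8")
    if len(encoded) <= max_bytes:
        return value
    # Slicing bytes may split a multi-byte character; errors="ignore"
    # drops the incomplete trailing sequence instead of raising.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")
```

A character-based prefix is usually enough for sorting; the byte-based variant is only needed if the cutoff is chosen close to the 32766-byte limit.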
mkalkbrenner’s picture

Status: Active » Needs review
FileSize
3.24 KB

32 chars should be enough :-)

  • mkalkbrenner committed 092f244 on 8.x-1.x authored by ekes
    Issue #2852606 by ekes, mkalkbrenner: Fulltext fields use the total...
mkalkbrenner’s picture

Status: Needs review » Fixed

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.