An error occurs while indexing non-English content such as Chinese, Arabic, etc.

The error is:
[error] Drupal\search_api_solr\SearchApiSolrException while indexing item entity:node/31529:zh-hans: Solr endpoint http://solr:8983/ internal Solr server error (500). {"responseHeader":{"status":500,"QTime":2},"error":{"msg":"[com.ctc.wstx.exc.WstxLazyException] Invalid UTF-8 middle byte 0x26 (at char #3286, byte #127)","trace":"[com.ctc.wstx.exc.WstxLazyException] com.ctc.wstx.exc.WstxIOException: Invalid UTF-8 middle byte 0x26 (at char #3286, byte #127)\n\tat com.ctc.wstx.exc.WstxLazyException.throwLazily(WstxLazyException.java:45)\n\tat.
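For context: in UTF-8, every byte after the lead byte of a multi-byte character (the "middle" or continuation byte the message refers to) must match the bit pattern 10xxxxxx, i.e. fall between 0x80 and 0xBF. 0x26 is the ASCII character "&", so the XML parser rejects the input as soon as it finds that byte where a continuation byte was expected. The Python sketch below only illustrates that rule. Drupal and Solr are not written in Python, and the helper names are hypothetical.

```python
# Sketch (illustration only): why a UTF-8 decoder reports an "invalid middle byte".
# Every byte after the lead byte of a multi-byte UTF-8 character must match
# the bit pattern 10xxxxxx, i.e. fall in the range 0x80-0xBF.


def is_continuation_byte(b: int) -> bool:
    """Return True if b is a valid UTF-8 continuation ("middle") byte."""
    return b & 0b1100_0000 == 0b1000_0000


def decodes_cleanly(data: bytes) -> bool:
    """Return True if data is well-formed UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# "中" (U+4E2D) is encoded as a lead byte followed by two continuation bytes.
zhong = "中".encode("utf-8")
print(zhong.hex())  # e4b8ad

# Keep only the lead byte and let an ASCII "&" (0x26) follow it: the decoder
# expects a continuation byte but finds 0x26, so the input is rejected.
broken = zhong[:1] + b"&"
print(is_continuation_byte(broken[1]))  # False
print(decodes_cleanly(broken))  # False
```

The same rule covers the 0xe3 variant reported later in the thread: 0xE3 is itself the lead byte of a three-byte sequence, so it is not a valid continuation byte either.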

Comments

keshav.k created an issue.

keshavv

Priority: Normal » Critical
mkalkbrenner

Priority: Critical » Normal

Can you provide a test that triggers that error?

gaurav.kapoor

@mkalkbrenner This error occurs when indexing non-English content, such as Spanish or Chinese translations of nodes.

mkalkbrenner

Category: Bug report » Support request
Status: Needs work » Postponed (maintainer needs more info)

There are installations that use Spanish or Chinese. I assume that the issue is in your specific content; I don't think this is a general issue for Search API or Search API Solr.
But again, if you could isolate the string sequence and provide a test case, we can take a closer look.
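One way to isolate the offending sequence, sketched in Python under the assumption that you can capture the raw bytes of the field value (or of the update request sent to Solr) before they are sent: decode them strictly and report the first offset that is not valid UTF-8, together with a few surrounding bytes. The function name find_invalid_utf8 and the sample payload are hypothetical.

```python
from typing import Optional, Tuple


def find_invalid_utf8(data: bytes, context: int = 16) -> Optional[Tuple[int, bytes]]:
    """Return (offset, nearby bytes) for the first invalid UTF-8 sequence, or None."""
    try:
        data.decode("utf-8")  # strict decoding raises on the first bad sequence
    except UnicodeDecodeError as exc:
        lo = max(exc.start - context, 0)
        return exc.start, data[lo : exc.end + context]
    return None


# Hypothetical payload: valid Chinese text followed by a character that was cut
# after its lead byte, so the next byte ("&") is not a valid continuation byte.
text = "搜索".encode("utf-8")
payload = b'<field name="body">' + text + text[:1] + b"&amp;</field>"

result = find_invalid_utf8(payload)
if result is not None:
    offset, snippet = result
    print(f"first invalid byte at offset {offset}: {snippet!r}")
```

The offset is relative to whatever bytes you pass in, so it will not necessarily match the "byte #" position in Solr's error message.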

mkalkbrenner

Status: Postponed (maintainer needs more info) » Closed (cannot reproduce)
sershevchyk

Version: 4.1.11 » 4.2.7

I saw the same error:

[error]  Drupal\search_api_solr\SearchApiSolrException while indexing item entity:media/1067:en: Solr endpoint http://localhost:8983/ internal Solr server error (code: 500, body: {
>   "responseHeader":{
>     "status":500,
>     "QTime":0},
>   "error":{
>     "msg":"[com.ctc.wstx.exc.WstxLazyException] Invalid UTF-8 middle byte 0xe3 (at char #1068, byte #127)",
>     "trace":"[com.ctc.wstx.exc.WstxLazyException] com.ctc.wstx.exc.WstxIOException: Invalid UTF-8 middle byte 0xe3 (at char #1068, byte #127)\n\tat com.ctc.wstx.exc.WstxLazyException.throwLazily(WstxLazyException.java:40)\n\tat com.ctc.wstx.sr.StreamScanner.throwLazyError(StreamScanner.java:737)\n\tat com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3764)\n\tat com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:894)\n\tat org.apache.solr.handler.loader.XMLLoader.readDoc(XMLLoader.java:422)\n\tat org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:262)\n\tat org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:190)\n\tat org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:97)\n\tat org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:82)\n\tat org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:216)\n\tat org.apache.solr.core.SolrCore.execute(SolrCore.java:2637)\n\tat org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:794)\n\tat org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:567)\n\tat org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:427)\n\tat org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:357)\n\tat org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:201)\n\tat org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1601)\n\tat org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:548)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\n\tat org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:600)\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)\n\tat 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1624)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)\n\tat org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1434)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)\n\tat org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:501)\n\tat org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1594)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)\n\tat org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1349)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\n\tat org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:191)\n\tat org.eclipse.jetty.server.handler.InetAccessHandler.handle(InetAccessHandler.java:177)\n\tat org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:146)\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:322)\n\tat org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:763)\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat org.eclipse.jetty.server.Server.handle(Server.java:516)\n\tat org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:400)\n\tat org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:645)\n\tat org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:392)\n\tat org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:277)\n\tat org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)\n\tat org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)\n\tat 
org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)\n\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:338)\n\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:315)\n\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:173)\n\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131)\n\tat org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:409)\n\tat org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:883)\n\tat org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034)\n\tat java.lang.Thread.run(Thread.java:750)\nCaused by: com.ctc.wstx.exc.WstxIOException: Invalid UTF-8 middle byte 0xe3 (at char #1068, byte #127)\n\tat com.ctc.wstx.sr.StreamScanner.constructFromIOE(StreamScanner.java:653)\n\tat com.ctc.wstx.sr.StreamScanner.loadMore(StreamScanner.java:1017)\n\tat com.ctc.wstx.sr.BasicStreamReader.readTextSecondary(BasicStreamReader.java:4730)\n\tat com.ctc.wstx.sr.BasicStreamReader.finishToken(BasicStreamReader.java:3808)\n\tat com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3762)\n\t... 51 more\nCaused by: java.io.CharConversionException: Invalid UTF-8 middle byte 0xe3 (at char #1068, byte #127)\n\tat com.ctc.wstx.io.UTF8Reader.reportInvalidOther(UTF8Reader.java:316)\n\tat com.ctc.wstx.io.UTF8Reader.read(UTF8Reader.java:206)\n\tat com.ctc.wstx.io.MergedReader.read(MergedReader.java:104)\n\tat com.ctc.wstx.io.ReaderSource.readInto(ReaderSource.java:88)\n\tat com.ctc.wstx.io.BranchingReaderSource.readInto(BranchingReaderSource.java:57)\n\tat com.ctc.wstx.sr.StreamScanner.loadMore(StreamScanner.java:1011)\n\t... 54 more\n",
rcodina

I reproduced the same error while indexing Spanish and Catalan content. The problem was in a custom Search API plugin where we used substr() to truncate a large text field. After replacing substr() with mb_substr(), the error disappeared.
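Why this fix works: PHP's substr() counts bytes, while mb_substr() counts characters. In UTF-8, accented Spanish and Catalan letters take two bytes and most Chinese characters take three, so a byte-based cut can stop in the middle of a character. If more markup follows the cut, its first byte (for example "<" or "&") is read where a continuation byte was expected, which is consistent with the "Invalid UTF-8 middle byte" errors above. Below is a Python analogue for illustration only; the function names are hypothetical and it is not the plugin's actual code.

```python
# Python analogue (illustration only) of the substr() vs mb_substr() difference.
# PHP's substr() counts bytes; mb_substr() counts characters. Slicing UTF-8
# bytes mimics the former, slicing a str mimics the latter.


def truncate_bytes(text: str, limit: int) -> bytes:
    """Byte-based cut, like substr(): may split a multi-byte character."""
    return text.encode("utf-8")[:limit]


def truncate_chars(text: str, limit: int) -> bytes:
    """Character-based cut, like mb_substr(): always yields valid UTF-8."""
    return text[:limit].encode("utf-8")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data is well-formed UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


body = "Informació sobre l'índex"  # Catalan: "ó" and "í" take two bytes each

print(truncate_bytes(body, 10))  # b'Informaci\xc3'
print(is_valid_utf8(truncate_bytes(body, 10)))  # False
print(truncate_chars(body, 10).decode("utf-8"))  # Informació
```

The byte-based cut keeps only the first byte of "ó" (0xC3, 0xB3), leaving an incomplete sequence; the character-based cut stops after the whole character.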