Problem/Motivation

When attempting to index media entities with file attachments, we are running into a few different Solarium exceptions during text extraction. One of them is described at https://www.drupal.org/node/2859565. Whichever exception is thrown, indexing halts outright. If we restart indexing, it fails on the same item. If we unpublish that item or remove it from the indexing queue, indexing proceeds until it hits another item that throws an exception, and so on.

In the interim, I've added some proof-of-concept code to search_api_attachments to handle the exceptions (https://www.drupal.org/node/2884453). The goal is to gather data on how often these exceptions are thrown and whether the items that trigger them share a pattern. The two exceptions I'd like to report here are both thrown by acquia_search in SearchSubscriber.php. All of the files I have examined that trigger them are PDFs.


Solarium\Exception\HttpException: Solr HTTP error: Authentication of search content failed url: http://example.com:80/solr/core/extract/tika?omitHeader=true&wt=json&json.nl=flat&extractOnly=true&resource.name=example.pdf&request_id=someId&x-request-id=someId in Drupal\acquia_search\EventSubscriber\SearchSubscriber->authenticateResponse() (line 131 of /path/to/codebase/modules/contrib/acquia_connector/acquia_search/src/EventSubscriber/SearchSubscriber.php).


Solarium\Exception\HttpException: Solr HTTP error: Internal Server Error in Drupal\acquia_search\EventSubscriber\SearchSubscriber->postExecuteRequest() (line 105 of /path/to/codebase/modules/contrib/acquia_connector/acquia_search/src/EventSubscriber/SearchSubscriber.php).
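Either of these exceptions currently stops the whole indexing run. The general idea behind handling them per item, so that one bad PDF is recorded and skipped rather than blocking the queue, can be sketched in a few lines of Python. This is an illustration of the pattern only, not the search_api_attachments patch: `index_items`, `IndexReport` and the `extract` callback are hypothetical names, and a generic `Exception` stands in for Solarium's `HttpException`.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List


@dataclass
class IndexReport:
    """Outcome of one indexing run (hypothetical structure)."""
    indexed: List[str] = field(default_factory=list)
    failed: Dict[str, str] = field(default_factory=dict)


def index_items(items: Iterable[str], extract: Callable[[str], str]) -> IndexReport:
    """Run extraction for each item, recording failures instead of aborting.

    `extract` is a hypothetical callback standing in for the Solr/Tika
    extraction request; it only has to return text or raise.
    """
    report = IndexReport()
    for item_id in items:
        try:
            extract(item_id)
        except Exception as exc:  # a real handler would catch the specific HTTP exception type
            # Remember which item failed and why, then move on to the next item.
            report.failed[item_id] = str(exc)
            continue
        # In a real run the extracted text would be added to the index here.
        report.indexed.append(item_id)
    return report


if __name__ == "__main__":
    def fake_extract(item_id: str) -> str:
        # Simulates one problematic PDF among otherwise healthy items.
        if item_id == "bad.pdf":
            raise RuntimeError("Solr HTTP error: Internal Server Error")
        return "extracted text"

    result = index_items(["a.pdf", "bad.pdf", "c.pdf"], fake_extract)
    print(result.indexed)  # ['a.pdf', 'c.pdf']
    print(result.failed)   # {'bad.pdf': 'Solr HTTP error: Internal Server Error'}
```

Keeping the failures in a map keyed by item makes it easy to retry just those items later or to inspect the offending files.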

Proposed resolution

I have yet to identify a pattern in the files that throw these errors (or the one in https://www.drupal.org/node/2859565), and I'm seeking insight into the root cause. Suggestions on how to debug these items and move forward would also be appreciated. Finally, if this turns out to be a data issue specific to certain PDFs, suggestions on how to address it on Acquia's platform are welcome.
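One way to start looking for a pattern is to tally the logged errors by message and by file name. The following is a minimal Python sketch, assuming the errors can be exported as plain-text lines shaped like the two quoted earlier in this issue; `parse_line`, `summarize` and the regular expression are hypothetical helpers written for illustration, not part of any module. Note that only the authentication failure includes the extract URL (and therefore its `resource.name` parameter); the Internal Server Error line does not, so those failures cannot be tied to a file from the log message alone.

```python
import re
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple
from urllib.parse import parse_qs, urlsplit

# Modelled on the two log lines quoted in this issue; other messages may
# need the pattern adjusted.
LOG_PATTERN = re.compile(
    r"(?P<exception>[\w\\]+): Solr HTTP error: (?P<message>.+?)"
    r"(?: url: (?P<url>\S+))?"
    r" in (?P<method>\S+)\(\) \(line (?P<line>\d+) of (?P<path>[^)]+)\)"
)


def parse_line(line: str) -> Optional[Dict[str, Optional[str]]]:
    """Split one logged Solarium error into its parts, or return None."""
    match = LOG_PATTERN.search(line)
    if match is None:
        return None
    entry = match.groupdict()
    # The extract URL, when present, names the file in its resource.name parameter.
    entry["resource"] = None
    if entry["url"]:
        query = parse_qs(urlsplit(entry["url"]).query)
        entry["resource"] = query.get("resource.name", [None])[0]
    return entry


def summarize(lines: Iterable[str]) -> Tuple[Counter, Counter]:
    """Count errors per message and per file name across many log lines."""
    entries = [e for e in map(parse_line, lines) if e is not None]
    by_message = Counter(e["message"] for e in entries)
    by_resource = Counter(e["resource"] for e in entries if e["resource"])
    return by_message, by_resource
```

Run over a full export, the per-file counter shows whether the same PDFs fail repeatedly, which can then be checked for shared traits such as size or PDF version.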

Core and module versions:
Drupal - 8.3.2
Search API - 1.0.0
Search API Solr - 1.0.0-beta3
Search API Attachments - 1.0-beta2
Acquia Connector/Acquia Search - 1.12

Comments

malik.kotob created an issue.

Dane Powell

Status: Active » Postponed (maintainer needs more info)

Hi Malik, were you ever able to make progress on this issue?

Dane Powell

Status: Postponed (maintainer needs more info) » Closed (outdated)

Closing after a week of inactivity, per the contribution guidelines. Feel free to reopen if you can provide the requested information. Cheers.