Notice: iconv(): Detected an illegal character in input string in SearchApiAttachmentsAlterSettings->extract_simple() (line 146 of /ssd/www/drupal/sites/all/modules/search_api_attachments/includes/callback_attachments_settings.inc). [#2226525]

I have been trying to get the Search API / SOLR to use TIKA for attachments.

I am finding the following error in my watchdog logs:

Notice: iconv(): Detected an illegal character in input string in SearchApiAttachmentsAlterSettings->extract_simple() (line 146 of /ssd/www/drupal/sites/all/modules/search_api_attachments/includes/callback_attachments_settings.inc).

I am running the latest of:
Drupal 7.26
Search API
Search API SOLR
Search API Attachments with Fields module
TIKA 1.5

I have a FILE field (multiple) on a bundle.

I have tried both the remote and local implementation of Search API / SOLR / TIKA.

If I create a VERY SMALL 3-line text file, it works. However, if I upload a PDF or some other "document", it fails.

Hopefully someone else has already solved this problem. Thank you,

Respectfully,

Patrick O'Leary.

Comments

Comment #1

izus commented 26 March 2014 at 16:25

Comment #2

izus commented 8 June 2014 at 23:50

Status:	Active	» Closed (cannot reproduce)

hi,
tested it after the issue mentioned in #1 was merged but can't reproduce it.
Please feel free to reopen it if there are more details on how to reproduce it with the latest code base.

Comment #3

spadxiii commented 7 July 2015 at 10:04

Version:	7.x-1.3	» 7.x-1.6
Priority:	Major	» Normal
Status:	Closed (cannot reproduce)	» Active

I just ran into this issue myself as well. If I upload a txt file which is not encoded in UTF-8, the notice is thrown.

This is because the method 'extract_simple' tries to convert the txt file from UTF-8 to UTF-8//IGNORE. But the file is not UTF-8 to start with. In my case it was ISO-8859-14.

protected function extract_simple($file) {
  // ...
  $text = mb_convert_encoding($text, "UTF-8");
  $text = iconv("UTF-8", "UTF-8//IGNORE", $text);
  //..
}
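
To illustrate the mismatch described above, here is a minimal sketch showing that text re-encoded as ISO-8859-14 is no longer valid UTF-8 once it contains a non-ASCII character, while plain ASCII text passes the check in either encoding. The helper name example_is_valid_utf8() is hypothetical and not part of search_api_attachments; the sketch assumes the mbstring extension is available.

```php
<?php
// Hypothetical helper for illustration; not part of search_api_attachments.
function example_is_valid_utf8($bytes) {
  // TRUE only if the byte sequence is well-formed UTF-8.
  return mb_check_encoding($bytes, 'UTF-8');
}

// "café" in UTF-8, then re-encoded as ISO-8859-14, where the accented
// letter becomes the single byte 0xE9.
$utf8 = "caf\xC3\xA9";
$iso = mb_convert_encoding($utf8, 'ISO-8859-14', 'UTF-8');

var_dump(example_is_valid_utf8($utf8));          // bool(true)
var_dump(example_is_valid_utf8($iso));           // bool(false)
var_dump(example_is_valid_utf8('plain ASCII'));  // bool(true)
```

A byte string that fails this check is the kind of input for which declaring "UTF-8" as the source encoding in iconv() would be wrong.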

Comment #4

izus commented 7 July 2015 at 16:00

so to handle different types we probably need to detect file encoding and decide what to do with it

mb_check_encoding

can you please provide a patch if you fixed this locally, or can we discuss another solution for this?

thanks

Comment #5

spadxiii commented 9 July 2015 at 13:52

I haven't fixed it locally yet. The uploaded files were too large to parse anyway. :)

The little bit of code I put in my previous comment was how I temporarily 'fixed' the error. Changing the encoding of the text twice doesn't seem like a good idea though. It might be better to check the encoding and only switch once.
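
The "check the encoding and only switch once" idea could be sketched roughly as follows. This is an illustration under stated assumptions, not the module's code: example_to_utf8() and its $fallback parameter are hypothetical names, and the fallback encoding has to be chosen by the site, because single-byte encodings such as ISO-8859-1 and ISO-8859-14 cannot be told apart reliably from the bytes alone.

```php
<?php
/**
 * Hypothetical helper: returns $text as UTF-8, converting at most once.
 *
 * Assumption: text that is not valid UTF-8 is in $fallback, a single-byte
 * encoding picked by the caller.
 */
function example_to_utf8($text, $fallback = 'ISO-8859-1') {
  // Already valid UTF-8: leave the bytes untouched.
  if (mb_check_encoding($text, 'UTF-8')) {
    return $text;
  }
  // Otherwise convert exactly once from the assumed source encoding.
  return mb_convert_encoding($text, 'UTF-8', $fallback);
}

// Usage: the ISO-8859-14 byte 0xE9 becomes the two UTF-8 bytes C3 A9.
echo bin2hex(example_to_utf8("caf\xE9", 'ISO-8859-14')), "\n";  // 636166c3a9
```

With this shape, text that is already UTF-8 is returned as-is, so the same string is never converted twice.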

Comment #6

grimreaper commented 29 August 2015 at 16:40

Status:	Active	» Postponed (maintainer needs more info)

Hello,

Could you upload a lightweight version of your encoded file please?

I used the following command to build a file encoded like yours, and I didn't get the notice.

iconv -f UTF-8 -t ISO-8859-14 README.txt > README_ISO.txt

Comment #7

izus commented 27 September 2015 at 19:10

Status:	Postponed (maintainer needs more info)	» Closed (cannot reproduce)

hi,
i couldn't reproduce the issue here.
Also, what was said in #3 is not correct, as we are not assuming the original string to be UTF-8:

  $text = mb_convert_encoding($text, "UTF-8");

converts the string to UTF-8.

http://php.net/manual/fr/function.mb-convert-encoding.php

closing the issue for the moment as nobody could reproduce it, and no one uploaded a test file that is causing the issue.

Please feel free to reopen it if the issue persists for you. We will try to look at it again.

Comment #8

spadxiii commented 30 September 2015 at 13:03

@izus: the method extract_simple does not call mb_convert_encoding.

protected function extract_simple($file) {
  $text = file_get_contents($this->get_realpath($file));
  $text = iconv("UTF-8", "UTF-8//IGNORE", $text);
  // ...
}

I'll have to dig some to get the file that was throwing errors here, but when I find it, I'll re-open this issue if it's still not working then.
