In _linkchecker_status_handling(), error messages and response codes are assumed to always be ISO-8859-1. I've run into problems with error messages that contain multibyte characters (specifically Arabic text): the message is output as a string of nonsense characters.
The source encoding should be automatically detected, if possible, using mb_detect_encoding().
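A minimal sketch of the detect-then-fall-back idea described above. This is not the attached patch: the helper name example_error_to_utf8() and the candidate list are assumptions made for illustration. Detection is a heuristic, so an ambiguous byte string can still be misidentified.

```php
<?php
/**
 * Hypothetical helper (not part of linkchecker): converts a message to UTF-8.
 *
 * Tries mb_detect_encoding() when the mbstring extension is loaded and
 * otherwise assumes the fallback encoding, as the module currently does.
 */
function example_error_to_utf8($message, $fallback = 'ISO-8859-1') {
  $source = $fallback;

  if (function_exists('mb_detect_encoding')) {
    // Strict mode rejects candidates in which the bytes are invalid.
    // The candidate list is an assumption; extend it for your environment.
    $detected = mb_detect_encoding($message, array('UTF-8', 'ISO-8859-1'), TRUE);
    if ($detected !== FALSE) {
      $source = $detected;
    }
  }

  if ($source === 'UTF-8') {
    // Already UTF-8, nothing to convert.
    return $message;
  }
  if (function_exists('mb_convert_encoding')) {
    return mb_convert_encoding($message, 'UTF-8', $source);
  }
  if (function_exists('iconv')) {
    $converted = iconv($source, 'UTF-8', $message);
    if ($converted !== FALSE) {
      return $converted;
    }
  }
  // No conversion library available: return the bytes unchanged.
  return $message;
}
```

The function_exists() guards keep the code from failing on PHP builds without mbstring; in that case the behaviour is the same as always assuming the fallback encoding.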
Comment | File | Size | Author |
---|---|---|---|
#1 | mb_error_encoding-2261795-1.patch | 1.08 KB | ben.kyriakou |
Comments
Comment #1
ben.kyriakou commented:
Added a patch to fix this issue. If available, mb_detect_encoding() is used first to detect the source encoding. If it is not available, the code falls back to ISO-8859-1.
Comment #2
ben.kyriakou commented
Comment #3
hass commented:
Have you seen how https://api.drupal.org/api/drupal/includes%21unicode.inc/function/drupal... works? Your solution may work when the mb functions are available, but otherwise it will fail because these functions do not exist... :-(((
Comment #4
Richard Damon commented:
Looking at the HTTP headers for the returned page (in particular the Content-Type: header) should tell you the encoding for the page. (If it is omitted, the encoding can be assumed to be ISO-8859-1, but if it is different, it should be specified.) There is no need to "guess" the encoding. I suppose adding a guess when the page doesn't define it might make sense for some "broken" sites.
Comment #5
hass commented:
It's not about a header. It's your local server's error message that has an unknown encoding, and the MySQL update does not save the error message. I have not found any way to detect the encoding of this string yet. :-((( In my case the system was German, and this means it was not ISO for sure. I had a core issue about this too, without any result at all.
It may be easier to open a PHP bug report in the hope of finding the root cause.
Comment #6
Richard Damon commented:
I am sure the database error occurs because the data is being sent to a text field marked as UTF-8 while the data is not encoded as UTF-8 and thus contains illegal byte sequences. Converting from ISO-8859-1 "works" because any byte stream is a valid ISO-8859-1 character stream, so the result will always be valid UTF-8.
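To illustrate why converting from ISO-8859-1 always "works" and can still produce nonsense, here is a small sketch that assumes the mbstring extension is loaded. The sample bytes are made up for the example and are not taken from the issue.

```php
<?php
// A lone 0xE4 byte ("ä" in ISO-8859-1) is not valid UTF-8 on its own,
// which is the kind of data a UTF-8 text column rejects.
$latin1 = "\xE4";
$latin1_is_valid_utf8 = mb_check_encoding($latin1, 'UTF-8'); // FALSE

// "\xD8\xB9" is the UTF-8 encoding of the Arabic letter ain (U+0639).
$arabic_utf8 = "\xD8\xB9";

// Treating those bytes as ISO-8859-1 never fails: each byte maps to one
// character (0xD8 to U+00D8, 0xB9 to U+00B9), and each is re-encoded.
$converted = mb_convert_encoding($arabic_utf8, 'UTF-8', 'ISO-8859-1');

// The result is valid UTF-8, so the database accepts it...
$converted_is_valid_utf8 = mb_check_encoding($converted, 'UTF-8'); // TRUE

// ...but it is no longer the original text: the two bytes became four.
$converted_hex = bin2hex($converted); // "c398c2b9"
```

This matches the symptom in the original report: text that was already UTF-8 is stored without a database error but displays as a string of unrelated Latin characters.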
The real solution is to look at the response headers from the HTTP request, which should contain a Content-Type: header (at least if the page is not encoded as ISO-8859-1) telling you the encoding of the data on the page. If the header is missing, the page cannot contain any character outside ISO-8859-1. (This is a standard header used in many internet protocols to define encodings.)
Generally, the basic-level transfer routines do NOT check this header and convert the data, as they leave that to the final client (you may WANT the original data for some reason). I often add a wrapper around the basic routines that parses some of the basic headers and normalizes the data (converting all data into UTF-8, for instance, regardless of the original character set).
In the response packet, these headers will be placed in the header member as an associative array, so $result->header['content-type'] will hold the header, which will normally include a field with the character encoding. You should be able to use this encoding as the source encoding for your conversion (instead of just always using ISO-8859-1).
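A rough sketch of reading the charset field from a Content-Type header value, as suggested above. The function name example_charset_from_content_type() is hypothetical, and the regular expression handles only the common "; charset=..." form rather than every parameter syntax HTTP allows.

```php
<?php
/**
 * Hypothetical helper: extracts the charset from a Content-Type value.
 *
 * Returns the default when the header is missing or names no charset.
 */
function example_charset_from_content_type($content_type, $default = 'ISO-8859-1') {
  if (is_string($content_type)
      && preg_match('/;\s*charset\s*=\s*"?([A-Za-z0-9._:+-]+)"?/i', $content_type, $matches)) {
    // Encoding names are case-insensitive; normalize to upper case.
    return strtoupper($matches[1]);
  }
  return $default;
}
```

The returned name could then be passed as the source encoding to the conversion routine, with the header value taken from the response object described in the comment. As the next comment points out, this only helps when a remote response was actually received.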
Comment #7
hass commented:
It looks like you have not understood the root cause. The message comes from your LOCAL PHP machine, as it has NOT received any answer from a remote host. As far as I know, there is no header with the data type.