Problem/Motivation

Non-ASCII characters are rendered as ????? or other garbled characters.
Persian text is replaced by ???????????????? or shows dropping between letters in a word.

Example text:
Do not apply if air and surface temperatures are below 5ºC or above 35ºC.

Rendered as:
(screenshot: invalid PDF rendering)

Steps to reproduce

Add a non-ASCII character to the content and generate a PDF.
Example: Do not apply if air and surface temperatures are below 5ºC or above 35ºC.

Proposed resolution

Declare UTF-8 as the character encoding of the HTML used to generate the PDF.

Remaining tasks

Provide a patch.

User interface changes

none

API changes

none

Data model changes

none


Comments

h.parsi created an issue. See original summary.

matt b’s picture

Same issue here. Any updates or support for this?

matt b’s picture

smustgrave’s picture

Status: Active » Postponed (maintainer needs more info)
Issue tags: +Needs Review Queue Initiative

Could more steps be added? Thanks to @Matt B in #3, it seems this might be an issue with Dompdf; maybe an issue should be logged there instead?

avpaderno’s picture

Reading the Dompdf wiki page "About Fonts and Character Encoding", I gather it could be an issue with this module, which should reference the correct font in the CSS stylesheet it uses.

avpaderno’s picture

Version: 8.x-2.1 » 8.x-2.x-dev
Assigned: h.parsi » Unassigned

The attached screenshot does not show the described bug. It just shows the screenshot of part of a node, where the PDF link is present.

smustgrave’s picture

Status: Postponed (maintainer needs more info) » Active

Thanks for taking a look @apaderno

avpaderno’s picture

Also, what does dropping between letters exactly mean?

smustgrave’s picture

I'm not familiar with other languages, but I took that to mean the letter appears slightly lower than expected, not aligned with the rest of the word. That's just how I read it, though.

avpaderno’s picture

Status: Active » Postponed (maintainer needs more info)

That would be dropping letters.
Given the screenshot does not show exactly what the bug is, and the description is not clear, this needs more information from the OP.

matt b’s picture

I cannot comment for the OP, but I'm still struggling to get output in Farsi / Persian.

I've set

* {
    font-family: 'DejaVu Sans', sans-serif, Courier;
}

in the CSS. It now produces characters instead of lots of ??? (apart from one or two specific issues in my header, which I'll look at separately), but when I copy the text and translate it back to English using Google Translate, it is clearly not the original text, and when I compare it to the original text in Drupal it is different. Something is happening in the PDF production process.

This text (from both the node content and .../debug) looks correct (it probably reverts to LTR here):

در اشعيا باب ۵۵ آيه ٣ خدا از ما ميخواهد که به نزد او بياييم، به او گوش دهيم و زندگی واقعی پيدا کنيم .
خدا نمی خواهد ما هلاک شويم. او می خواهد که ما در پادشاهی او باشيم و به پسر عزيزش عيسی اين امکان را داده است تا او اين امر را فراهم کند.

But is displayed in the PDF as

هب ،مييايب وا دزن هب هک دهاوخيم ام زا ادخ ٣هيآ ۵۵باب ايعشا رد
. مينک اديپ یعقاو یگدنز و ميهد شوگ وا
وا یهاشداپ رد ام هک دهاوخ یم وا .ميوش کاله ام دهاوخ یمن ادخ
وا ات تسا هداد ار ناکما نيا یسيع شزيزع رسپ هب و ميشاب
. دنک مهارف ار رما نيا

matt b’s picture

I think the following is relevant. This is probably a support request rather than a bug report, since it is a feature DomPDF does not support?

https://github.com/dompdf/dompdf/issues/2619
https://github.com/dompdf/dompdf/pull/2107

Also, from googling around, I got the hint that this may not be an issue with wkhtmltopdf, so I might give that a try (something for another day!)

matt b’s picture

I've switched to using the Entity PDF module, which uses mpdf as the engine. It is rendering Arabic characters fine.

hdahoud’s picture

You can use phpwkhtmltopdf library https://github.com/mikehaertl/phpwkhtmltopdf

jurgenhaas’s picture

I'm having the same issue with simple German umlauts: they get printed as a 2-byte character combination, as if Dompdf were unable to deal with UTF-8. Switching to the wkhtmltopdf engine solves the issue, though.

peri22’s picture

Status: Postponed (maintainer needs more info) » Needs review

Hello, I had the same problem with some characters. I have set the font-family and created a patch to fix the character encoding.

avpaderno’s picture

Status: Needs review » Needs work
Issue tags: +Needs issue summary update

johanvdr’s picture

I can confirm that the patch in #16 fixes the encoding issue, but it throws a deprecation error on PHP 8.2.x:
Deprecated function: mb_convert_encoding(): Handling HTML entities via mbstring is deprecated;

 $this->html = mb_convert_encoding($this->html, 'HTML-ENTITIES', 'UTF-8');


Handling HTML entities via mbstring is deprecated in PHP 8.2.

To fix the deprecation error:
$this->html = mb_encode_numericentity($this->html, [0x80, 0x10ffff, 0, 0xffffff], 'UTF-8');
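For anyone applying this approach, here is a minimal sketch of what that replacement call does: every code point from U+0080 to U+10FFFF is written as a decimal numeric character reference, so the markup handed to the PDF engine is plain ASCII and no longer depends on the engine guessing the charset. The wrapper name encode_non_ascii_as_entities is hypothetical, and the sketch assumes the mbstring extension is installed.

```php
<?php
// Hypothetical wrapper around the PHP 8.2-safe call quoted above.
// Requires ext-mbstring.
function encode_non_ascii_as_entities(string $html): string
{
    // Conversion map: [start, end, offset, mask]. Code points U+0080..U+10FFFF
    // become decimal &#NNN; references; ASCII (including markup) is untouched.
    $convmap = [0x80, 0x10ffff, 0, 0xffffff];
    return mb_encode_numericentity($html, $convmap, 'UTF-8');
}

echo encode_non_ascii_as_entities('below 5ºC'), PHP_EOL; // below 5&#186;C
echo encode_non_ascii_as_entities('Moo’s'), PHP_EOL;     // Moo&#8217;s
```

This only changes how the characters are spelled in the markup; it does not add glyphs, so a font that covers them (such as the DejaVu Sans setting mentioned earlier) is still needed.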

johanvdr’s picture

Here is an updated patch which resolves that deprecation error.

johanvdr’s picture

I have updated the patch from #19. There was a small issue with the conversion map array.

johanvdr’s picture

Actually, there is an even simpler fix that worked for me without any patch: set the meta tag in the head of entity-print.html.twig.

<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>

kufliievskyi’s picture

I confirm that the template fix from the #21 comment works for me also. Looks like the most simple solution.

introfini made their first commit to this issue’s fork.

introfini’s picture

Status: Needs work » Needs review

#21 provides a straightforward fix. In a fresh installation with only that fix applied, everything works correctly. The Dompdf documentation also uses that meta tag: https://github.com/dompdf/dompdf/wiki/About-Fonts-and-Character-Encoding

<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
</head>
<body></body>
</html>

I've created a merge request with the simple fix from #21. Please review...

jannakha’s picture

Title: not correct export character » UTF-8 encoding is required to display non-ASCII characters
Issue summary: View changes

jannakha’s picture

Status: Needs review » Reviewed & tested by the community

Both MR70 and the patch from #20 work and fix the issue.

MR70 will require developers to update any custom Twig templates.

Although the latest browsers treat <meta charset="utf-8"> the same as <meta http-equiv="content-type" content="text/html; charset=utf-8">, DomPDF needs the specific attributes in the meta tag.
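Because the fix lives in entity-print.html.twig, a custom template override that lacks the tag can bring the problem back. As a minimal sketch, assuming you can inspect the rendered HTML before it reaches the engine, a check like the hypothetical has_http_equiv_utf8_meta() below flags output that carries only <meta charset="utf-8"> or no charset at all. The pattern is an assumption: it expects http-equiv to come before content, as in the tag from #21.

```php
<?php
// Hypothetical helper: does the HTML contain the http-equiv form of the
// charset declaration from #21? Case-insensitive, accepts either quote style,
// but expects http-equiv before content (as written in #21).
function has_http_equiv_utf8_meta(string $html): bool
{
    $pattern = '/<meta\s+http-equiv=["\']Content-Type["\']\s+content=["\'][^"\']*charset=utf-8[^"\']*["\']/i';
    return preg_match($pattern, $html) === 1;
}

$ok = '<head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"/></head>';
$short = '<head><meta charset="utf-8"></head>';

var_dump(has_http_equiv_utf8_meta($ok));    // bool(true)
var_dump(has_http_equiv_utf8_meta($short)); // bool(false)
```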

m.stenta’s picture


I can confirm that the suggestion in #21 (which is now implemented in MR70) fixes the issue for me.

In my case, the string "Moo’s Dairy Farm" was appearing in the PDF as "Mooâ??s Dairy Farm".

With the change from #21, it now appears as "Moo’s Dairy Farm".
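The "Mooâ??s" output is consistent with the UTF-8 bytes being read as a single-byte charset such as ISO-8859-1 when no charset is declared. The sketch below reproduces that misreading by hand; the function name misread_as_latin1 is hypothetical, and it illustrates the failure mode rather than Dompdf's actual code path.

```php
<?php
// Hypothetical illustration: treat each byte of a UTF-8 string as one
// ISO-8859-1 character (the misreading), then re-encode as UTF-8 for printing.
function misread_as_latin1(string $utf8): string
{
    return mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');
}

// "’" (U+2019) is stored as the bytes E2 80 99. Read as ISO-8859-1, E2 is "â",
// while 80 and 99 are C1 control characters with no visible glyph, which fits
// the "??" seen in the PDF.
$moo = misread_as_latin1('Moo’s');
echo bin2hex($moo), PHP_EOL; // 4d6f6fc3a2c280c29973

// The German umlauts reported above break the same way: "ü" is C3 BC,
// which becomes "Ã¼" (a 2-byte character combination).
echo misread_as_latin1('ü'), PHP_EOL; // Ã¼
```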

The easiest way for me to fix this in my production deployments (while we wait for this issue) is to apply a patch via cweagans/composer-patches. Attached is a patch for this purpose (which is safer to use than an MR patch, which may be changed by anyone with push access).

Thanks for finding this solution @johanvdr!

abelpzl’s picture

Patch #28 fixed my issue. But in my case, I preferred to override the entity-print.html.twig template in my custom theme and set the meta tag header there, so I wouldn't have to apply the patch.

jsacksick made their first commit to this issue’s fork.

  • jsacksick committed 023b017d on 8.x-2.x authored by introfini
    Issue #3028545: UTF-8 encoding is required to display non-ASCII...
jsacksick’s picture

jsacksick’s picture

Status: Reviewed & tested by the community » Fixed

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.