Problem/Motivation

The method we are using to build the PDF file output for the User Guide is called "FOP", which is an Apache library.

Unfortunately, it does not support right-to-left (RTL) languages, such as Farsi.

So, we either need to:
- Not build the PDF output for the User Guide for RTL languages.
- Figure out another way to build the PDF output.

Proposed resolution

As a first step, modify the scripts so that we do not build the PDF output for Farsi, which is currently our only right-to-left language.

Then leave this issue open, and eventually maybe see if we can figure out a different way to build the User Guide PDF output that will work for all languages.

Remaining tasks

Figure out a new way to build PDF files that works for all languages.


Comments

jhodgdon created an issue. See original summary.

  • jhodgdon committed 125534a on 8.x-3.x
    Issue #2887064 by jhodgdon: Cannot build PDF files for right-to-left...
jhodgdon’s picture

Commit made to modify the scripts to skip Farsi. Sorry ...

jhodgdon’s picture

It's possible that if I can use a later version of the FOP library, that will help. Some web pages I found said something about RTL language support being available in later versions. But... not sure...

jhodgdon’s picture

Assigned: Unassigned » novid
Status: Active » Needs review
FileSize
6.76 MB

I got this working! It turns out I needed to tell the XSLT to use FOP extensions, and tell FOP to not use "complex scripts" to get around a bug in the version of FOP I have installed (1.1). And I had to use GNU Unifont, because the Noto font doesn't have the Farsi characters. It's not a gorgeous font but at least it works.

So... as far as I can tell the PDF is OK (I cannot read Farsi so ... ???). At least it builds. I am going to attach the PDF here and ask our Farsi coordinator novid to review the file and confirm it is working OK...

Meanwhile I will commit the changes to the scripts.

@novid -- please let me know whether the PDF is looking OK or not?

  • jhodgdon committed bb17c52 on 8.x-3.x
    Issue #2887064 by jhodgdon: Cannot build PDF files for right-to-left...
jhodgdon’s picture

FileSize
6.47 MB

I already see that there are some font problems in the PDF for Farsi. Do you know of a font with good Farsi coverage that we could use instead of Unifont, which, besides being ugly, apparently does not have full coverage?

I'm also attaching the EPUB version of the Guide in Farsi. I am not sure if it is OK either. It doesn't look very much like the PDF, when I compare a section that is translated, and it doesn't look like it is right-to-left... Is Farsi even a right-to-left language?

jhodgdon’s picture

So, Farsi is definitely a RTL language... It doesn't look like the epub file is aware of that, and I am not sure about the PDF. If you could tell me whether either of them is close to OK, that would help! Thanks!

jhodgdon’s picture

FileSize
236.75 KB

OK, I think I fixed the RTL problem in the ePub output. Check out this version...

jhodgdon’s picture

This article
https://advocacyassembly.org/en/news/53/
suggests that the Noto Arabic Naskh font is good for Persian/Farsi language. I also saw on https://en.wikipedia.org/wiki/Noto_fonts that there are CJK fonts in the Noto family, which would most likely be vastly better than the Unifont fonts we are currently using.

So... I will have to try downloading some of these fonts and see if we can make nicer PDFs for Chinese, Japanese, and Farsi. But I'd still like to know whether the Farsi PDF and ePub output is even remotely close to OK...

jhodgdon’s picture

FileSize
7.29 MB

OK... I tried to use the font "Noto Naskh Arabic" for Farsi, with a fallback of Noto Sans for characters that do not appear in the first font. Here is a PDF generated with that... The problem is that sometimes there is a mix of Farsi and Latin characters within the same word, and the PDF generator cannot handle switching fonts within the same word. Take a look... this is probably the best I can do right now, unless you know of a font that has both Farsi and Latin characters in it that would work better?

jhodgdon’s picture

I was not able to get the Noto CJK fonts to work. The problem is they are opentype instead of truetype fonts, and the PDF maker doesn't seem to support them. I tried converting them using Fontforge, but they didn't work either. So for the time being, I have gone back to Unifont for Chinese and Japanese. Maybe someone can suggest a better font that is free, easy to download, and TrueType... until then I'll stick with Unifont even though it is somewhat ugly, at least it works.

  • jhodgdon committed 0b0a7f7 on 8.x-3.x
    Issue #2887064 by jhodgdon: Cannot build PDF files for right-to-left...
novid’s picture

Sorry for my late answer; I didn't have access to my development machine. Both of the PDFs you mentioned here have multiple problems with character glyphs and text direction. The second EPUB file was close to the final format, which the PDF should follow.

There are some good open Persian fonts available, like Vazir or others from the same family, which have Roboto font characters for Latin words.

Unfortunately, during the next week I am away from any computer-related activities. But when I return I will have a look at FOP and find a way to support RTL languages in this version of the project.

jhodgdon’s picture

Hi! Can you look at the PDFs and ePubs that are currently committed to the Git repository, 8.x-3.x branch, in the "ebooks" directory? I think they are OK but of course I cannot read Persian. Thanks!

novid’s picture

The PDF still has the previous problems with glyph rendering and text direction, and the EPUB is the same as before. The MOBI is in between, with the right glyphs but the wrong direction.

novid’s picture

There are two debian font packages for Farsi (Persian):

  • fonts-farsiweb
  • fonts-freefarsi

Both of them are TrueType families including Homa, Nazli, Nazli Bold, Titr and FreeFarsi.

jhodgdon’s picture

Oh, you're right! Sorry, I thought I had the PDF building in the correct text direction and better font coverage, but it is definitely wrong... I will try one of the fonts you have suggested and see if I can build a better PDF.

Regarding Mobi, it looks like this bug:
https://bugs.launchpad.net/calibre/+bug/1073414
So it says I should convert to a different format. I'll give that a try...

Anyway I will report back here. Thanks again for reviewing!

jhodgdon’s picture

Regarding the PDF generation... I found some links/information:

a) the Apache FOP page (this is the PDF converter we are using) about "complex scripts"...
https://xmlgraphics.apache.org/fop/2.0/complexscripts.html

This says that "complex scripts" support can be enabled or disabled. The scripts/fop-conf.xml file we currently have in Git has this line that disables this support:

  <complex-scripts disabled="true"/>

But when I remove this line, the PDF build for Persian/Farsi dies with a Java error:

Exception
org.apache.fop.apps.FOPException
java.lang.NullPointerException

so I cannot even get the FOP converter to run without this line in the config file.

b) Regarding that, I found this bug report:
https://issues.apache.org/jira/browse/FOP-2262

This is a bug in FOP version 1.1 (which is what I am currently running), and has been fixed in FOP 2.2.

c) So... My conclusion is that to make successful Farsi PDF files, besides finding a better font, I need to be able to enable "complex scripts", and in order to do that, I need to update to FOP 2.2. I will try that and report back.

jhodgdon’s picture

Title: Cannot build PDF files for right-to-left languages » Cannot build PDF/mobi files for right-to-left languages

OK, after updating to FOP 2.1, the Farsi PDF is looking better! I first tried the FreeFarsi font. When I was building the PDF, I had a few missing character warnings, for things like emdash, nbspace, and rightquote, and this resulted in some # characters in things like section headers. Not too good, but definitely better than the Noto fonts.

So then I installed the Farsiweb fonts, and looked at the list. Only the "Nazli" font seems to have both bold and regular; none of them have Italics. Titr is only bold, and Homa is only regular. So I tried "Nazli". That one had pretty much the same missing characters, so I think FreeFarsi is probably better, since it includes regular, bold, and italic.

So, I guess I will go ahead and commit these changes. Can you take a look at the new Farsi PDF in the Git repo and see what you think?

Note: I haven't fixed Mobi yet... still looking into that...

  • jhodgdon committed 87e1504 on 8.x-3.x
    Issue #2887064 by jhodgdon, novid: Use FOP 2.1 to build ebooks and new...
jhodgdon’s picture

Regarding the Mobi format... I attempted to build an AZW3 format for Farsi, as suggested in the link in comment #18. However, it seems to still be left-to-right instead of right-to-left... This is possibly due to looking at it using an application and not an actual Kindle? But anyway it did not fix the problem. So... I think right now the best thing to do is add a README to the ebooks directory stating that .mobi is not supported for right-to-left languages.

Anyway, sometime let me know what you think about the newest PDF file output.

  • jhodgdon committed 17719e1 on 8.x-3.x
    Issue #2887064 by jhodgdon: Add README to ebooks directory
    
novid’s picture

The problem with text direction is now solved, but glyph rendering still has issues. I don't know why this happens with FOP in different versions. The correct output of Persian language should be similar to this page. Unlike English, where the letters in a word are always separated from each other, Persian has both joined and separated letters. The PDF you mentioned has only separated letters, which makes it impossible to read.
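
To illustrate the joined-versus-separated point: Unicode text stores each Arabic-script letter once, and the renderer is expected to pick the isolated, initial, medial, or final shape depending on the neighboring letters. The Python sketch below is only an illustration and is not part of the User Guide scripts. It uses the standard unicodedata module, with the letter BEH chosen as an arbitrary example.

```python
import unicodedata

# ARABIC LETTER BEH as it is stored in source text (DocBook, FO, ePub).
BEH = "\u0628"

# Unicode "presentation forms" of the same letter: the shapes a renderer
# chooses between, depending on where the letter sits in a word.
BEH_FORMS = {
    "isolated": "\ufe8f",
    "final": "\ufe90",
    "initial": "\ufe91",
    "medial": "\ufe92",
}


def base_letter(glyph: str) -> str:
    """Map a presentation form back to the letter stored in text (NFKC)."""
    return unicodedata.normalize("NFKC", glyph)


if __name__ == "__main__":
    for position, glyph in BEH_FORMS.items():
        # Every shape normalizes back to the single stored letter.
        print(position, unicodedata.name(glyph), base_letter(glyph) == BEH)
```

A PDF in which every letter appears in its isolated shape is consistent with this shape-selection step being skipped during rendering.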

jhodgdon’s picture

That is not good. I think someone needs to file an issue with the Apache FOP project to tell them their FOP software is not working, in that case... Well, just to be sure... Is the current ePub output readable? I think you said it was earlier, but could you check that again?

The reason I ask about that is that in generating the PDF, the first step is to generate a "docbook" output, which is then passed through two different processors -- one makes ePub and the other makes PDF. So if the ePub is OK and the PDF is not OK, then it is definitely the fault of the FOP processing.

Thanks again for your advice/reviews on this issue....

jhodgdon’s picture

If we need to file an issue with FOP...

a) Here is the page about how to file issues in that project:
https://xmlgraphics.apache.org/fop/bugs.html

b) I did a search today and found zero issues with keywords "farsi" or "persian" in the FOP project.

c) It looks like we need to provide a "FO" file as input, not docbook. The transformation from docbook to PDF goes in two steps: (1) XSLT is used to get from DocBook XML to "FO" format, and (2) FOP is used to get from "FO" format to PDF. So I will need to see if I can produce the FO file, which normally we do not see because the tool we use to transform docbook to PDF does not give you this file. But it should be possible to get it.

Also probably a smaller example rather than a 200-page PDF would be good, so I will see if I can make a small example and we can go from there.

novid’s picture

The EPUB format is finally correct and readable. So the problem with the PDF definitely comes from FOP. I will have a look at the FOP documentation about processing complex scripts and, if this problem has not been resolved there, file an issue in their project according to your guidance.

jhodgdon’s picture

FileSize
229.63 KB

Well, the problem could come from the XSLT process that takes the docbook (also used to build ePub so we know it is OK) and builds the FO intermediate file, or it could come from the FOP process that takes the FO file and builds a PDF.

To look into this, I added a --noclean option to the build script, so that it would save the intermediate FO file. Also, I took the huge docbook file and made a smaller one that just has one chapter in it.
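
The two steps can also be run separately, which is another way to get at the intermediate FO file. Below is a minimal Python sketch, assuming xmlto and fop are on the PATH. The function name build_commands and the file names are hypothetical. The real build script, which uses xmlto's --with-fop and --noclean options plus custom stylesheets, is not shown in this issue.

```python
import shlex
from pathlib import Path


def build_commands(docbook: Path, fop_conf: Path) -> list[list[str]]:
    """Return the two command lines: DocBook -> FO, then FO -> PDF.

    Hypothetical helper: it only builds the argument lists and runs nothing.
    """
    out_dir = docbook.parent
    fo_file = docbook.with_suffix(".fo")
    pdf_file = docbook.with_suffix(".pdf")
    return [
        # Step 1: XSLT transform from DocBook XML to an FO file.
        ["xmlto", "-o", str(out_dir), "fo", str(docbook)],
        # Step 2: FOP renders the FO file to PDF, using the given config file.
        ["fop", "-c", str(fop_conf), "-fo", str(fo_file), "-pdf", str(pdf_file)],
    ]


if __name__ == "__main__":
    for cmd in build_commands(Path("small.docbook"), Path("scripts/fop-conf.xml")):
        print(shlex.join(cmd))
```

Keeping the steps apart leaves the FO file on disk, so it can be inspected or attached to a bug report.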

So. I am attaching a tgz file containing:
- small.docbook -- the one-chapter docbook file
- small.fo -- the FO intermediate file
- small.pdf -- the PDF output

Please forgive me -- aside from the # signs that are in the PDF where the font was missing characters, I cannot really see the difference between the text in the 3 files. Can you tell which file(s) have unreadable Farsi/Persian script in them?

novid’s picture

DocBook and FO are correct, PDF is not.

jhodgdon’s picture

OK then. I think we should follow the procedure on
https://xmlgraphics.apache.org/fop/bugs.html
and file a bug, attaching the above zip file of "small" files.

I would be happy to create the issue, but I don't understand the language issues well enough to explain why the PDF is unreadable.

So maybe either you could write an explanation for the issue report, or else I can write the other part... What I think we should say is something like this:

----
We are having trouble making Farsi/Persian language output with FOP.

We are using the xmlto script with the --with-fop option to convert a DocBook file to PDF using FOP, which uses XSLT to first make a FO file, and then FOP to convert to PDF. We used the --noclean option to capture the intermediate FO XML file.

We have verified that the Farsi characters are readable in the DocBook and FO files. However, when we generate the PDF, the output is not readable. _______(explain the language issue here)_________.

Attached files: small DocBook sample file, FO intermediate output, and PDF output.

Note that there are some font issues in the PDF output as well -- missing glyphs -- those show as # characters, and are in the chapter/section headings. However, the main body of the text has no # characters, and it still has this problem.
----

jhodgdon’s picture

Oh, whoever creates the issue, let's put a link here so people following this Drupal User Guide issue can follow the Apache issue if desired.

And the version of FOP we are using is 2.1.

jhodgdon’s picture

I think it might also be useful to attach to the issue an ePub output of the same small file, because in the docbook file, the letters are left-to-right within that file (at least when I look at it).

When I look at the ePub vs. PDF, I can now clearly see what you are talking about, with the letters joined together in the ePub and not in the PDF... but I think it is harder to tell in the docbook or FO file what it should look like due to the right-to-left vs. left-to-right letters.

Anyway, here is a new zip file that includes the ePub output as well. And new suggested wording for the issue -- can you see if it is correct, and then one of us can create the issue?

----
We are having trouble making Farsi/Persian language output with FOP.

We are using the xmlto script with the --with-fop option to convert a DocBook file to PDF using FOP version 2.1 (this process uses XSLT to first make a FO file, and then FOP to convert to PDF). We used the --noclean option on the xmlto script to capture the intermediate FO XML file.

We have verified that the Farsi is readable (although left-to-right) in the DocBook and FO files, as well as in an ePub output created from the same DocBook file (which is right-to-left as expected). However, when we generate the PDF, the output is not readable. The problem is that there are both joined and separated letters in Persian/Farsi. However, the PDF does not join letters that should be joined together (they are all separated), making it impossible to read. Note that there are some font issues in the PDF output as well -- missing glyphs in the font we are using. Those show as # characters in the text, mainly in the chapter/section headings. However, the main body of the text has no # characters, and it still has this problem of the characters being separated instead of connected together.

Attached files: relatively small DocBook sample file that shows this problem, FO intermediate output from XSLT, and PDF output from FOP. Also ePub output for comparison, which is what the PDF output should look like.
----

novid’s picture

I will file the issue based on your last comment, but I can't find the ZIP file you mentioned.

jhodgdon’s picture

FileSize
233.7 KB

Sorry, I forgot to attach the new zip file...

novid’s picture

Here is the issue in ASF JIRA system regarding FOP project. I will synchronize both issues for solving this problem.

jhodgdon’s picture

Thanks! I'm following that issue and have commented. Maybe I should make a build with the FarsiWeb fonts and we can see if the PDF is still screwed up, since they are saying maybe it is a font problem. I'll do that later today...

jhodgdon’s picture

Assigned: novid » jhodgdon

OK, on the Apache JIRA issue, the Apache developer Glenn Adams helped us figure out that it's a known bug in the FOP processing system that makes the PDFs.

We can get around it by modifying the XSLT process that takes DocBook and makes it into an FO XML file -- for Farsi, at least, it needs to omit the language attribute on the fo:root element in the FO XML output. See https://issues.apache.org/jira/browse/FOP-2728?focusedCommentId=16089182 and comments around this one for details if interested...

I will modify the scripts to do this and test it out, sometime in the next couple of days.

jhodgdon’s picture

Well. It turns out this is **far** from simple. The language attribute is put into the FO file in the fo:root element, as noted above. Removing that was difficult, but possible ... but when I did that, I found that the PDF output still was the same as before, with unreadable non-connected Farsi characters. So, I looked at the intermediate FO output this time, and found that there was also a language attribute on many many sub-elements within the document. When I edited the FO file manually to remove all the language attributes, I got a readable PDF finally.

Unfortunately, there is no central template that I can override to remove all of the language attributes. The way the XSLT stylesheets are written, this attribute is just added all over the place. So... it's not really possible to simply override one template and have the PDF generation work for us.

So. If we continue to use FOP to build PDF files, Farsi and other "complex" languages are going to continue to not work, until that FOP bug is fixed.

What I need to investigate now is whether we can use some other method besides FOP to build PDFs. I initially used FOP because this seemed to be recommended on forums etc., but I think it is possible to convert DocBook to PDF without using FOP. I just need to see if it works... which may take some time -- I've already invested a lot of time customizing PDF stylesheets for FOP so that the PDF looks OK... will need to do the same for the other method if it even works. I'll try it out sometime but it might be a week or two...

jhodgdon’s picture

It looks like the xmlto command supports three options for building PDFs:

a) option --with-fop -- what we are using now. Uses XSLT to transform DocBook to FO, then FOP to transform FO to PDF.

b) option --with-latex -- uses the dblatex command to transform directly from DocBook to PDF. Requires dblatex command.

c) [neither option] -- Uses XSLT to transform DocBook to FO, then "passive tex" to transform FO to PDF. Requires "passivetex" package and pdfxmltex command.

So I can test out (b) and (c) and see if either of them works better than FOP for us. Probably (c) would be the place to start, since it would use the FO template customizations I have already been using to improve formatting, and the font customizations too.

novid’s picture

Due to lack of strong support for complex scripts in FOP, this problem remains the same.

In the documentation page for complex scripts, there is no support for Persian (Farsi) in standard script codes but closest one is Arabic with arab code. Also in language property section it is said:

Certain fonts that support complex script features can make use of language information in order for language specific processing rules to be applied. For example, a font designed for the Arabic script may support typographic variations according to whether the written language is Arabic, Farsi (Persian), Sindhi, Urdu, or another language written with the Arabic script.

novid’s picture

About other methods for converting DocBook to PDF, I know that pandoc can convert many formats to others without an intermediate processing step like the one used in this project. The pandoc documentation lists DocBook as a supported input format and PDF as a supported output, but I haven't tested it yet. I have, however, used this tool in another project for converting Markdown into HTML and PDF, in both English and Persian (Farsi).

jhodgdon’s picture

FileSize
103.96 KB

Pandoc looks interesting. If the xmlto options don't work out, that will be the next thing I try. Thanks for the link!

If you have pandoc installed already and want to test it, I have attached here the docbook file for Farsi that is the output of the AsciiDoc build... well it's from a build on June 29 so not the most current (I've been doing the ebook builds on my other machine, because it has a later version of FOP), but it should be sufficient for testing anyway.

novid’s picture

OK, I spent some time examining Pandoc's PDF generation via LaTeX, and it led me into a hell of LaTeX package dependencies. I think we should focus on the main tools that are inside the project and continue to work on #39 until the problem is solved.

There is always time for testing other solutions. I know it was my suggestion, but it is not a good idea at this stage, sorry.

jhodgdon’s picture

Thanks very much for testing that out! I'll focus on the other tools for now, as you suggest. I may have some time today or tomorrow to work on this... If not, it will be a couple of weeks, sorry!

jhodgdon’s picture

I had some time to do some testing today.

The --with-latex option -- I tried it with a couple of languages:
- English -- it is less desirable than the --with-fop option, in my opinion, but could probably be tweaked.
- Catalan -- total failure. It seems that some "babel" TeX thing (presumably something within TeX that takes care of language support) doesn't support Catalan language.
- Farsi -- I got a ton of "missing character" and "Undefined control sequence" messages, and then finally it failed with a "TeX capacity exceeded" error.

So, that is not going to work out.

The no-option version of the xmlto script also didn't work for me, as it said I needed to install the "passivetex" package, but I couldn't find this package to install (apt-get said it doesn't exist).

So that is not going to work out well either.

If Pandoc uses LaTeX then it will most likely have the same language problems as the --with-latex option of xmlto...

So. Not sure if there are other options we can find but at this point it doesn't look too good. :((((

jhodgdon’s picture

OK, I have another idea that I will try out in about a week.

I think we can make a successful PDF using FOP if we do the following:

a) Use the xmlto script to make a FO output file as a first step.
b) Use some sed or PHP script to process the output a bit. Things to do:
- Remove all language="xx" attributes to get around the FOP bug we have been discussing for the past several comments here.
- Convert NBSP characters to regular spaces, because I think this will get rid of some of the missing-glyph font problems.
- Possibly some other conversions/processing to get rid of additional font issues.
c) Run the FOP command to convert the processed FO file to PDF

Regarding (b), second item, there are still some # characters in the PDF output, and mostly these are happening when there is a string like "Section 1.2 Title Of Section" (translated into Farsi for example), with the spaces actually being NBSP characters. This is not working well because it is a mixture of Farsi and Latin characters in one "word", and the Farsi font doesn't have the Latin characters. The FOP processing for deciding what is a "word" is a bit ... well you would think a NBSP character would break a "word" but it doesn't seem to. Anyway, I think if we change them to regular spaces, and maybe insert some spaces here and there if necessary, we can break it up into all-Farsi and all-Latin words instead of mixed ones, and we will avoid having those # characters for missing glyphs.
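
Step (b) could be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the script that was eventually committed. The function name preprocess_fo is hypothetical, the regular expression assumes the attribute is written as language="..." with quotes, and it would also strip matching text that happened to appear in body content.

```python
import re

NBSP = "\u00a0"

# A whitespace-preceded language="..." (or language='...') attribute.
# Requiring leading whitespace avoids touching names like xml:lang.
LANGUAGE_ATTR = re.compile(r"""\s+language=(?:"[^"]*"|'[^']*')""")

# NBSP written as a numeric character reference (&#160; or &#xA0;).
NBSP_REF = re.compile(r"&#(?:0*160|[xX]0*[aA]0);")


def preprocess_fo(fo_text: str) -> str:
    """Strip language attributes and turn NBSPs into plain spaces."""
    text = LANGUAGE_ATTR.sub("", fo_text)
    text = NBSP_REF.sub(" ", text)
    return text.replace(NBSP, " ")


if __name__ == "__main__":
    sample = (
        '<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format" language="fa">'
        '<fo:block language="fa">Section\u00a01.2</fo:block></fo:root>'
    )
    print(preprocess_fo(sample))
```

The processed text would then be written back to disk and passed to the FOP command in step (c).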

Anyway... I think this will work but I am traveling and will not be able to try it out for a week or maybe two, sorry! But at least this is I think hopeful... better than saying "Farsi cannot have a readable PDF file" anyway!

jhodgdon’s picture

Well, I am back to this issue again.

I have upgraded the O/S on my computer, and also apparently updated the FOP program. I am having problems now even building the failed Farsi output that we had before, because FOP is not recognizing the TrueType fonts now. I cannot even get it to reliably recognize OpenType fonts that I have installed... So I am not sure what to do.

For instance, I tried installing the FreeFont open-type fonts, and it couldn't find those. English is now only building if I specify Helvetica as the font, which actually works OK for English, and some other languages like Catalan. But ... not sure what to do now for Farsi, since FreeFarsi is definitely not being recognized, nor FreeFont, which I think has some Persian/Farsi coverage.

jhodgdon’s picture

Status: Helvetica is working fine for English, Catalan, French, Spanish. It doesn't work for Hungarian or Ukrainian -- too many missing characters.
I haven't found a font now that works for Farsi, Japanese, or Chinese that FOP will recognize at all.
Hard to say for Bahasa Indonesia because it is not translated much.

This is horrible... I will try to troubleshoot another day but right now I would say the font situation is very bad, and I don't know why FOP is not recognizing my installed fonts at all. On my laptop, last time I checked, it was at least recognizing the installed fonts, but on my desktop machine, it isn't. It's a different version of FOP though I think. I'll need to look into it more.

jhodgdon’s picture

Title: Cannot build PDF/mobi files for right-to-left languages » Cannot build PDF files for right-to-left languages
Related issues: +#2895328: Alternative Japanese fonts for PDF file generation.

Doh! My brain wasn't working apparently. I figured out the font problem! In modifying the scripts to do the preprocessing to get around the FOP bugs, I dropped part of the command so the config file saying "auto-detect fonts" was not being used. So, now it is working mostly...

New status of PDF files:
- Noto Sans font is working fine for English, German, French, Catalan, Spanish, Hungarian, Indonesian, Ukrainian. Some are not very much translated, but I think this font will continue to be OK for these languages.
- FreeFarsi font is working for Farsi -- FOR THE FIRST TIME!! It finally looks like the ePub, with the letters joined together. Hooray!!
[Note: Mobi is still not working for Farsi. See notes above in comment #18 and #22]
- I tried to use the OpenType font Noto Sans CJK JA for Japanese and the SC one for Chinese, but it wasn't recognized. I will for the moment go back to Unifont, but see also #2895328: Alternative Japanese fonts for PDF file generation.

So, this is looking good. I'll make a commit, and I think mark this as Fixed for the PDFs, and open a separate issue to explore Mobi.

jhodgdon’s picture

Status: Needs review » Fixed

  • jhodgdon committed 7b20fd9 on 8.x-3.x
    Issue #2887064 by jhodgdon: Cannot build PDF files for right-to-left...
novid’s picture

Congratulations! So it was a problem of using the proper fonts and the way FOP recognizes them. I also remember another tool for managing different outputs, publican, which is well structured. Just keep it in mind for future projects.

Thank you Jennifer for all the efforts on this issue.

jhodgdon’s picture

Yes, the last problem yesterday, once I had modified the scripts so that we avoided that bug we discussed on the Apache FOP bug tracker, was that I didn't have FOP set to auto-detect system fonts. Anyway, I am very glad that the PDF for Farsi is working!

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.

jhodgdon’s picture

Status: Closed (fixed) » Needs work

I'm reopening this issue, because on #2904523-22: Create Persian/Farsi Screenshots, @novid says, regarding the PDF for Farsi:

If you use Nazli font as described in #2887064 most of the font rendering problems will be solved.
There are some issues with the PDF such as:

code examples that are rtl instead of ltr
using the initial translation instead of new one that i completed a week ago
the minor formatting problems that described in Bluecheese issues

The output is more acceptable than what we used to get during last 2 months. Thank you.

jhodgdon’s picture

Regarding the translation, I just checked and when I do a git pull on branch 8.x-3.x, no new files came in, so I did build the PDF using the latest translations of the User Guide that are in that branch. Can you check on your end and make sure your latest changes were committed/pushed to that branch?

Regarding the Nazli font, I will see if I can build a PDF with that and we can see if it is better.

jhodgdon’s picture

Status: Needs work » Needs review
FileSize
2.16 KB

OK, here is the patch to switch to using Nazli fonts (after doing apt-get install fonts-farsiweb to install the font). And the Farsi PDF output. Please let me know what you think. It still had a lot of missing glyph warnings, pretty much the same as with the FreeFarsi font. So, let me know if you think this is better than the FreeFarsi output (which is currently committed to the Git repository). We can use either one -- your choice @novid. Thanks!

Oh dang. It wouldn't let me upload the PDF file -- says it is too large... let me try again...

  • jhodgdon committed 411417d on 8.x-3.x
    Issue #2887064: temporary FA PDF file with Nazli font
    
jhodgdon’s picture

Weird. It says files have to be less than 50 MB and it is 14 MB so it should work. ?!?

Since I could not upload the PDF here, I temporarily committed it to the Git repository for the User Guide. It is in the ebooks folder, with file name guide-fa-tmp-nazli.pdf.

Please do a git pull, take a look at guide-fa.pdf (made with FreeFarsi fonts yesterday) and guide-fa-tmp-nazli.pdf (made with Nazli fonts today) -- both from the same Farsi source files and screenshots -- and see which one you prefer. Thanks!

novid’s picture

FileSize
3.09 MB

OK, I thought that using the standard Persian (Farsi) fonts in the Debian repository would solve the problem, but it does not.
Problems

  • The TOC does not contain anything at all in either file. I think this is a problem for RTL languages, but I can't test it without another RTL language available.
  • guide-fa.pdf: using FreeFarsi, the overall readability of the generated file decreases, and the file is huge (21 MB).
  • guide-fa-tmp-nazli.pdf: using Nazli, none of the section references in the guide are generated; there are # signs instead of section titles. The file size is more acceptable (14 MB).

Proposed Solution

  • Using a custom font like Vazir, which I have attached to this issue, would solve these problems. At least we can try this last font, and if the results are bad, I will choose the best of the outputs we have.

Note: I use all of these fonts in LibreOffice, and there are no problems like these when I generate PDFs from docx and odt files.

jhodgdon’s picture

Yes, the problem seems to be an interaction between the FOP generator for PDF and the fonts. I will have to investigate whether (a) I can install the Vazir font so that FOP recognizes it and (b) whether it is better... Will try later this week.

jhodgdon’s picture

FileSize
92.77 KB

Hm. Today I installed the Vazir TTF font files on my machine, and attempted to make a PDF.

As I am sure you know by now, I am definitely not an expert on Farsi script, but it looks to me as though the characters are separated instead of stuck together (a problem we had earlier on). Also there were a lot of "missing glyph" errors (those are the ##### characters). See screenshot...
Screen shot of PDF made with Vazir font

jhodgdon’s picture

By the way, with the Vazir font, there was a table of contents in the PDF file. It was somewhat screwed up but it was there.

novid’s picture

Ok, i have some free time today and can test various fonts with trial and error in order to get better results.

jhodgdon’s picture

Thanks, that would be wonderful! I wish we had a different process available for building the PDF files... the one we are using works well for LTR languages, even those with non-Latin scripts (Japanese), but obviously it has some problems with Farsi. :(

jhodgdon’s picture

Hi again! I did some web searches about Arabic fonts myself today, and found another Ubuntu package that seems to work fairly well, except for the table of contents. It's package fonts-hosny-amiri, with the Amiri font.

I am going to try to fix the table of contents and then I'll commit a change to use this font, because aside from one section on page 157 of the output, it seems like there are no errors with missing glyphs and I think the text is at least readable... I hope? Anyway I'll spend some time to figure out the table of contents (hopefully) ...

jhodgdon’s picture

OK! I think I have a good PDF file now! I fixed the table of contents (in the XSL templates), and switched to this Amiri font, and it worked pretty well. The PDF file is much smaller now too.

I have committed those changes to the project... also did a git pull to get the latest FA and other language source, and rebuilt all the ebooks... take a look at the latest guide-fa.pdf and see what you think?

  • jhodgdon committed 8ce2fdb on 8.x-3.x
    Issue #2887064 by jhodgdon: Cannot build PDF files for right-to-left...
novid’s picture

Great! This is the first time that everything is correct and readable. So there is no need for more trial and error with Persian (Farsi) fonts for this project. Although the Amiri font is Arabic, and Persian (Farsi) is just one of the languages written in the Unicode Arabic script, it's fine and solves our problem.

jhodgdon’s picture

Status: Needs review » Fixed

Excellent! Let's mark this Fixed then. Thanks again for all of your attention and patience about the PDF files!

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.