Status: Active
Project: PDF To Text
Version: 6.x-2.x-dev
Component: Miscellaneous
Priority: Normal
Category: Support request
Assigned:
Reporter:
Created: 3 Mar 2011 at 10:25 UTC
Updated: 3 Mar 2011 at 11:32 UTC
Cool module, by the way.
After uploading German-language PDF files containing umlauts ("Umlaute"), the text is shown but every umlaut disappears. How can I fix that?
Comments
Comment #1
Saubhagya commented
First, you need to know which encoding the document uses. If it is PDFDocEncoding (the encoding the module currently supports), you can add whatever characters you like as described here; the following steps work only in that case.
Go through http://www.adobe.com/content/dam/Adobe/en/devnet/pdf/pdfs/PDF32000_2008.pdf; I think the encoding for the umlaut characters is on page 663.
Add the octal and Unicode equivalents of the desired characters to the array $_pdfDocToUni at line 18 of the file initialize.pdf2text.inc (remember that the octal values need to be three digits, as in the other entries of the array).
Then go to line 335 of pdf2text.module and add your characters in the same format as the existing ones (I can't do this myself because my keyboard doesn't support these characters).
Let me know how it works for you.
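To make it easier to find the values mentioned above, here is a small stand-alone Python sketch (not part of the module, which is PHP) that prints the three-digit octal code and the Unicode code point for the German special characters. It assumes that, for these characters, the PDFDocEncoding code is the same as the Latin-1 byte value; please verify the output against the encoding table in the PDF specification before relying on it. The function name pdfdoc_entry is hypothetical, and the exact entry format expected by $_pdfDocToUni is not shown in this thread, so copy the format from the existing entries in the array.

```python
# German characters to look up; extend this string as needed.
GERMAN_CHARS = "ÄÖÜäöüß"


def pdfdoc_entry(ch):
    """Return (three-digit octal code, Unicode code point) for one character.

    Assumption: for these characters the PDFDocEncoding code equals the
    Latin-1 byte value, so encoding to latin-1 yields the code.
    """
    code = ch.encode("latin-1")[0]          # single byte value, e.g. 228
    octal = format(code, "03o")             # zero-padded to three octal digits
    unicode_point = "U+{:04X}".format(ord(ch))
    return octal, unicode_point


if __name__ == "__main__":
    for ch in GERMAN_CHARS:
        octal, unicode_point = pdfdoc_entry(ch)
        print(ch, octal, unicode_point)
```

Each printed line gives the character, its octal code, and its code point; these are the two values to transcribe into the array in whatever notation its other entries use.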