Since professional PDF to Word converters are extremely complex and costly, we decided to build one ourselves from open-source tools.
Don't expect a 1:1 copy of your PDF in Word format, but it will give you everything you need to make one with a little bit of work.
There are two options available:
- Text & image extraction
- OCR (text recognition)
Normally you will use text & image extraction. It tries to maintain the text layout, but the fonts and font sizes are not extracted.
So you will have to do a little bit of work to rebuild the document from the extracted text and images.
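As an illustration of the extraction option, here is a minimal sketch of how the two Xpdf tools credited below might be called. It is not the site's actual code: the function names, the output file naming, and the choice of the `-layout` and `-j` flags are assumptions made for this example.

```python
import subprocess
from pathlib import Path


def build_extraction_commands(pdf_path, out_dir):
    """Return the pdftotext and pdfimages command lines for one PDF.

    Hypothetical helper: names and output layout are assumptions.
    """
    pdf = Path(pdf_path)
    out = Path(out_dir)
    text_file = out / (pdf.stem + ".txt")
    image_root = out / pdf.stem  # pdfimages appends -000, -001, ... to this root
    return [
        # -layout asks pdftotext to keep the physical layout of the text
        ["pdftotext", "-layout", str(pdf), str(text_file)],
        # -j saves embedded JPEG (DCT) images as .jpg; other images become PPM/PBM
        ["pdfimages", "-j", str(pdf), str(image_root)],
    ]


def extract(pdf_path, out_dir):
    """Run both tools; requires the Xpdf command-line tools on the PATH."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for cmd in build_extraction_commands(pdf_path, out_dir):
        subprocess.run(cmd, check=True)
```

Calling `extract("report.pdf", "out")` would leave a text file and the embedded images in `out`. Fonts and font sizes are not part of that output, which is why some manual rebuilding remains.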
OCR is a nice tool to have when your PDF is a scanned document, since a scan typically contains page images rather than extractable text.
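The OCR option can be sketched in a similar way: render a page to an image, then pass that image to Tesseract. The site uses the PDF to Image module for the first step; calling ImageMagick's `convert` directly, the 300 dpi density, the first-page-only selection, and all function names here are assumptions made for this example.

```python
import subprocess
from pathlib import Path


def build_ocr_commands(pdf_path, out_dir, lang="eng", density=300):
    """Return the ImageMagick and Tesseract command lines for page 1 of a PDF.

    Hypothetical helper: names, density and page choice are assumptions.
    """
    pdf = Path(pdf_path)
    out = Path(out_dir)
    page_image = out / (pdf.stem + ".png")
    text_base = out / pdf.stem  # tesseract appends .txt to this base name
    return [
        # Rasterize the first page ([0]); ImageMagick needs Ghostscript for PDFs.
        # On ImageMagick 7 the command is "magick" rather than "convert".
        ["convert", "-density", str(density), f"{pdf}[0]", str(page_image)],
        # -l selects the installed Tesseract language pack, e.g. "eng" or "deu"
        ["tesseract", str(page_image), str(text_base), "-l", lang],
    ]


def ocr_first_page(pdf_path, out_dir, lang="eng"):
    """Run both steps; requires ImageMagick and Tesseract on the PATH."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for cmd in build_ocr_commands(pdf_path, out_dir, lang=lang):
        subprocess.run(cmd, check=True)
```

A higher density usually improves recognition at the cost of larger images and slower processing, so the value is worth tuning per document.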
Credits go to:
- Drupal 7 for the excellent, extremely modular platform.
- Conditional fields module for a really nice node form interface experience.
- PDF to Image module for delivering the images to the OCR processor.
- Tesseract OCR including some language packages, maintained by Google.
- pdftotext, from the Xpdf package, for text extraction
- pdfimages, also from the Xpdf package, for image extraction
- ImageMagick for image conversion and scaling
- JODConverter, the Java OpenDocument converter, for .txt to .doc conversion
- Drupal Business theme for a nice and tidy front-end
With all those projects we were able to build a free online tool to create Word documents out of PDF files!