Doc to HTML — Usage and Configuration Guide (2.x)
The Doc to HTML module allows editors to upload documents directly from
a node edit form, convert them to HTML with LibreOffice, review the
generated markup in CKEditor 5, and save it into a
text_long or text_with_summary field.
The primary supported formats are DOC and DOCX.
Site builders can also enable additional LibreOffice-supported formats such as
ODT, RTF, and PPTX from the module
configuration.
Requirements
The module relies on LibreOffice being available on the same environment
where Drupal PHP runs, such as the web server, PHP-FPM container, or DDEV web container.
Drupal executes LibreOffice from the command line to perform the document-to-HTML
conversion.
- Install LibreOffice on the same environment where your Drupal PHP runtime
  is executed.
- Configure the LibreOffice executable directory in the Base path for LibreOffice
  setting, for example /usr/bin or
  /Applications/LibreOffice.app/Contents/MacOS.
- Configure only the executable name in the Command setting, for example
  soffice, libreoffice, or soffice.exe.
  Do not include path separators in this field.
- From the command line, you should be able to run a simple test such as
  libreoffice --version or soffice --version without errors.
- PHP 8.1 or later is required. The proc_open function is recommended so the
  module can enforce the configured conversion timeout.
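The split between the Base path and Command settings can be sketched in Python as follows. This is only an illustration of the rule described above, not the module's PHP code: the function names build_executable and libreoffice_version and the 10-second check timeout are assumptions made for this example.

```python
import os
import subprocess


def build_executable(base_path: str, command: str) -> str:
    """Join the base path and the command name into one executable path."""
    # The Command setting must be a bare name such as "soffice",
    # so anything containing a path separator is rejected.
    if "/" in command or "\\" in command:
        raise ValueError("Command must not contain path separators")
    return os.path.join(base_path, command)


def libreoffice_version(base_path: str, command: str) -> str:
    """Run `<executable> --version` and return what it prints."""
    result = subprocess.run(
        [build_executable(base_path, command), "--version"],
        capture_output=True,
        text=True,
        check=True,   # raise CalledProcessError on a non-zero exit code
        timeout=10,   # hypothetical limit for this quick check
    )
    return result.stdout.strip()
```

Calling libreoffice_version("/usr/bin", "soffice") on a host where LibreOffice is installed should return the version line. A FileNotFoundError usually means the base path or command name is wrong.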
Example DDEV configuration (config.yaml)
On local environments that use DDEV, you can install LibreOffice inside the
web container by adding an extra package to your config.yaml:
    name: doc-to-html
    type: drupal11
    docroot: web
    php_version: "8.3"
    webserver_type: nginx-fpm
    webimage_extra_packages:
      - libreoffice
With this configuration, DDEV installs libreoffice inside the web container,
so that the Doc to HTML module can execute LibreOffice from Drupal.
Global Module Configuration
Before using the widget on content types, configure the global settings used by the
Doc to HTML module. These settings provide the default behavior for conversion,
file handling, and HTML cleanup.
Typical global configuration steps:
- Configure LibreOffice settings
  Go to /admin/config/content/doc_to_html/libreoffice-settings and configure:
  - The base path where the LibreOffice executable is located.
  - The executable command name, such as soffice or libreoffice.
  - The conversion timeout in seconds. The default is 60 seconds. Use 0 to disable the timeout.
- Configure basic conversion settings
  Go to /admin/config/content/doc_to_html/basic-settings and configure:
  - The public files subfolder used for temporary converted HTML files.
  - The enabled upload formats: DOC, DOCX, ODT, RTF, and/or PPTX.
  - Whether converted output should be normalized to UTF-8.
  - The optional body extraction regex used to extract content from the generated HTML.
  - The body regex match index, where 0 means the full match and 1 or higher selects a capture group.
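The effect of the body regex match index can be illustrated with a small Python sketch. The pattern, the regex flags, and the fallback of returning the input unchanged when nothing matches are assumptions for this example. The module itself runs in PHP and may behave differently.

```python
import re


def extract_body(html: str, pattern: str, match_index: int = 0) -> str:
    """Return the selected regex match, or the input unchanged if nothing matches."""
    match = re.search(pattern, html, flags=re.DOTALL | re.IGNORECASE)
    if match is None:
        return html  # assumed fallback, not confirmed by the module
    # Index 0 is the full match; 1 or higher selects a capture group.
    return match.group(match_index)


raw = "<html><head><title>t</title></head><body><p>Hello</p></body></html>"
pattern = r"<body[^>]*>(.*?)</body>"

print(extract_body(raw, pattern, 0))  # full match, including the <body> tags
print(extract_body(raw, pattern, 1))  # first capture group, inner markup only
```

With index 0 the surrounding body tags are kept, while index 1 keeps only the markup between them, which is usually what a text field needs.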
Testing the Conversion with TestWizard
The module provides a TestWizard tool to validate your LibreOffice
configuration and preview the conversion pipeline before enabling the widget on content
fields.
Using the TestWizard typically involves:
- Navigate to /admin/config/content/doc_to_html/test-wizard.
- Upload a sample document using one of the file types enabled in Basic Settings.
- Run the conversion to review:
  - The raw HTML generated by LibreOffice.
  - The extracted <body> segment after applying the body regex.
  - The final preview after DOM regex cleanup is applied.
- Save the tested regex settings if the output matches the expected result.
Once the test behaves as expected, you can enable the widget on content fields.
Enabling the Widget on Content Fields
After configuring the global settings and validating them with the TestWizard,
enable the Doc to HTML widget on specific fields in your content types.
The widget is designed to work with:
- text_long fields
- text_with_summary fields
To enable the widget for a given field:
- Go to Structure → Content types and select the content type
  you want to configure.
- Open the Manage form display tab.
- Locate the text_long or text_with_summary field that should
  receive the converted HTML.
- In the widget selector for that field, choose DOC to HTML.
- Configure the widget settings and save the form display.
Widget Settings
Each widget instance can override or extend the global behavior.
- Apply body extraction regex: applies the body extraction regex configured
  in Basic Settings to the conversion output.
- DOM regex override: overrides the global DOM regex for this widget instance.
- Maximum file size: limits the uploaded document size. Use 0 for no widget-level limit.
- Source file field: optionally saves the original uploaded document into
  a file or image field on the same entity.
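How a widget instance might combine its own settings with the global ones can be sketched as below. The WidgetSettings class, its field names, and the use of bytes as the size unit are hypothetical and chosen for this example only. The module's real configuration keys are not shown in this guide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WidgetSettings:
    """Hypothetical per-widget settings mirroring the list above."""
    apply_body_regex: bool = True
    dom_regex_override: Optional[str] = None  # None or "" means: use the global value
    max_file_size: int = 0                    # 0 means: no widget-level limit
    source_file_field: Optional[str] = None   # None means: do not keep the original


def effective_dom_regex(widget: WidgetSettings, global_dom_regex: str) -> str:
    """Prefer the widget override and fall back to the global DOM regex."""
    return widget.dom_regex_override or global_dom_regex


def within_size_limit(size_bytes: int, max_bytes: int) -> bool:
    """Return True when the upload passes the widget-level size check."""
    return max_bytes == 0 or size_bytes <= max_bytes
```

Treating 0 as "no limit" only removes the widget-level check. Limits set elsewhere, such as PHP upload limits, would still apply.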
How the Doc to HTML Widget Works
Once the Doc to HTML widget is enabled for a field, the node edit form includes an
additional DOC to HTML file section.
- The widget adds a virtual managed upload element used to upload the source document.
  This upload element is part of the widget UI and is not itself a separate content field.
- The editor uploads a document and clicks Convert.
- The module:
  - Receives the uploaded document through the widget upload element.
  - Calls LibreOffice using the configured base path, command, and timeout.
  - Converts the document to HTML.
  - Optionally extracts the configured <body> match.
  - Optionally applies DOM regex cleanup or custom post-processing.
  - Injects the final HTML into the target text_long or text_with_summary field.
- The converted HTML is injected into the form state and editor field. The entity is
  not saved automatically when Convert is clicked.
- The editor can review or edit the generated HTML in CKEditor, then click
  Save to persist the content.
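The LibreOffice call in the steps above can be approximated with the following Python sketch. It uses the standard soffice options --headless, --convert-to, and --outdir, but whether the module passes exactly these arguments is an assumption, and the function names are made up for this example.

```python
import subprocess
from pathlib import Path
from typing import List, Optional


def conversion_command(executable: str, source: str, out_dir: str) -> List[str]:
    """Build the argument list for a headless document-to-HTML conversion."""
    return [executable, "--headless", "--convert-to", "html",
            "--outdir", out_dir, source]


def effective_timeout(seconds: int) -> Optional[int]:
    """Map the configured timeout to subprocess semantics: 0 disables it."""
    return seconds if seconds > 0 else None


def convert(executable: str, source: str, out_dir: str,
            timeout_seconds: int = 60) -> str:
    """Run the conversion and return the generated HTML as text."""
    subprocess.run(
        conversion_command(executable, source, out_dir),
        check=True,
        capture_output=True,
        timeout=effective_timeout(timeout_seconds),
    )
    # LibreOffice names the output after the source file, with an .html suffix.
    html_file = Path(out_dir) / (Path(source).stem + ".html")
    return html_file.read_text(encoding="utf-8", errors="replace")
```

The returned string corresponds to the raw HTML stage. Body extraction and DOM regex cleanup would run on it before the result is placed into the field.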
Saving the Source Document
By default, the uploaded source document is treated as an intermediate conversion input.
If the widget is configured with a Source file field, the original document
is promoted to a permanent file and saved into that file or image field when the entity
is saved.
This is useful when the site needs to keep the original DOC/DOCX file available for
download or archival purposes. The selected source field should be a normal file or
image field on the same content type.
Extensibility
Other modules can customize the conversion workflow with hooks or Symfony events.
- hook_doc_to_html_pre_convert() can modify conversion options or cancel the conversion.
- hook_doc_to_html_post_convert() can inspect or replace the final HTML.
- Symfony event subscribers can listen to doc_to_html.pre_convert and
  doc_to_html.post_convert.
Drush Commands
The module also provides Drush commands for debugging and maintenance.
- drush dth:version: show the detected LibreOffice version.
- drush dth:clean: clean the conversion temporary folder.
- drush dth:convert /path/to/document.docx: convert a document and print the generated HTML.
Typical Workflow Summary
- Install LibreOffice on the server or container where Drupal PHP runs.
- Configure the LibreOffice base path, command name, and timeout.
- Configure supported file types, UTF-8 handling, and optional body extraction settings.
- Use the TestWizard to validate LibreOffice and preview the generated HTML.
- Enable the DOC to HTML widget on a text_long or text_with_summary field.
- Optionally configure a source file field if the original document should be kept.
- Editors upload a document, click Convert, review the generated HTML in CKEditor, and click Save.
With this setup, the Doc to HTML module provides a configurable workflow for converting
office documents into clean Drupal text field content, using LibreOffice and the Drupal
Form API.
Project information
- Project categories: Developer tools
- 3 sites report using this module
- Created by sjpagan
Stable releases for this project are covered by the security advisory policy.
Releases
Development version: 2.0.x-dev updated 2 May 2026 at 07:44 UTC
Development version: 8.x-1.x-dev updated 25 Jan 2019 at 15:28 UTC








