This project is not covered by Drupal’s security advisory policy.
The OCR module gives Drupal a text-extraction service that reads image and document files and returns their text content. It uses a plugin-based provider system so the underlying OCR engine can be swapped without changing any of the code that calls it. A Tesseract provider is included out of the box.
Features
- Service API: a single ocr service (OcrServiceInterface) with two entry points: recognizePath() for absolute filesystem paths and recognizeUri() for Drupal stream-wrapper URIs (public://, private://, etc.).
- Swappable provider plugins: any module can register its own OCR back-end by dropping a class into Plugin/OcrProvider/ and decorating it with #[OcrProvider]. The active provider is selected via simple config, no code change required.
- Tesseract provider: uses the local tesseract binary. Supports any language pack Tesseract has installed and is configurable per-provider via ocr.settings.
- AI Automators integration (optional submodule ocr_ai_automators): adds two automator rules (OCR: Image / File to Text and OCR: Image / File to Plain Text) that automatically populate text_long and string_long fields from an attached image or file field whenever a node is saved. No LLM or API key required.
Use this module whenever you need to make the text inside scanned documents, receipts, photographs of signs, or any other image-based content searchable, indexable, or editable within Drupal.
Post-Installation
After enabling the ocr module there is nothing mandatory to configure — it works immediately using Tesseract with English as the default language.
- Verify the binary: confirm tesseract is on the server's $PATH (see Additional Requirements below). You can check with drush ev "var_dump(\Drupal::service('ocr')->getProvider()->isAvailable());".
- Adjust settings (optional): edit ocr.settings to point at a non-default binary path or to change the Tesseract language, for example with drush cset ocr.settings provider_settings.tesseract.language fra+eng (French plus English).
- AI Automators integration (optional): enable the ocr_ai_automators submodule, then go to Structure → Content types → [your type] → Manage fields, open a text_long or string_long field, and add an AI Automator rule. Choose OCR: Image / File to Text and set the base field to the image or file field you want to read from.
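The binary check in the first step can also be done from a plain shell before involving Drupal. The following is a minimal sketch, assuming a POSIX shell; find_binary is a hypothetical helper name used only for illustration and is not something the module provides.

```shell
#!/bin/sh
# Hypothetical helper (not part of the OCR module): succeeds when the
# named binary can be resolved through $PATH.
find_binary() {
  command -v "$1" >/dev/null 2>&1
}

if find_binary tesseract; then
  echo "tesseract found at $(command -v tesseract)"
else
  echo "tesseract is not on PATH; see Additional Requirements" >&2
fi
```

This only shows what the current shell user can see. The drush check asks the provider itself, so it is the better test of what the module will actually use.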
To use the service in your own code, inject \Drupal\ocr\OcrServiceInterface or call \Drupal::service(\Drupal\ocr\OcrServiceInterface::class).
Additional Requirements
- Tesseract OCR (required for the default provider): a system binary, not a Composer package.
  - Debian/Ubuntu: apt-get install tesseract-ocr tesseract-ocr-eng
  - macOS: brew install tesseract
  - DDEV: add a .ddev/web-build/Dockerfile; see the repository for the ready-made example.
- Additional Tesseract language packs follow the pattern tesseract-ocr-[lang] (e.g. tesseract-ocr-fra for French).
- ocr_ai_automators submodule only: requires the AI module with ai_automators enabled.
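The package naming pattern can be applied mechanically to a Tesseract language string such as fra+eng. The sketch below assumes a POSIX shell on Debian/Ubuntu; langs_to_packages is a hypothetical helper, not part of the module. It also assumes that underscores in a language code (for example chi_sim) become hyphens in the package name, since Debian package names cannot contain underscores.

```shell
#!/bin/sh
# Hypothetical helper (not part of the OCR module): turn a Tesseract
# language string such as "fra+eng" into Debian/Ubuntu package names.
# Assumption: underscores in a code (chi_sim) become hyphens (chi-sim).
langs_to_packages() {
  printf '%s\n' "$1" | tr '+' '\n' | tr '_' '-' | while IFS= read -r lang; do
    if [ -n "$lang" ]; then
      printf 'tesseract-ocr-%s\n' "$lang"
    fi
  done
}

# Prints one package name per line for the given language string.
langs_to_packages "fra+eng"
```

The output could be passed to the package manager, for example apt-get install $(langs_to_packages "fra+eng"). Afterwards, tesseract --list-langs shows which languages the binary can actually load.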
Recommended modules/libraries
- AI module: enables the ocr_ai_automators submodule, which lets you automate field population without writing any custom code.
- Search API: index the OCR-extracted text fields to make image content full-text searchable.
Similar projects
- Tesseract OCR — ties OCR directly to specific field widgets and file handling. The OCR module differs by exposing a general-purpose service and plugin system rather than a fixed field integration, making it easier to call from other modules or Rules/ECA actions.
Supporting this Module
This is a custom project module. No public funding page is available.
Community Documentation
No external documentation yet. The README.md in the repository root covers installation, the service API, provider configuration, and how to write a custom provider plugin.
Project information
- Minimally maintained: maintainers monitor issues, but fast responses are not guaranteed.
- Project categories: Integrations
- Created by d0t15t
Use at your own risk: the project may have publicly disclosed vulnerabilities.
