This project is not covered by Drupal’s security advisory policy.

The OCR module gives Drupal a text-extraction service that reads image and document files and returns their text content. It uses a plugin-based provider system so the underlying OCR engine can be swapped without changing any of the code that calls it. A Tesseract provider is included out of the box.

Features

  • Service API — a single ocr service (OcrServiceInterface) with two entry points: recognizePath() for absolute filesystem paths and recognizeUri() for Drupal stream-wrapper URIs (public://, private://, etc.).
  • Swappable provider plugins — any module can register its own OCR back-end by dropping a class into Plugin/OcrProvider/ and decorating it with #[OcrProvider]. The active provider is selected via simple config, no code change required.
  • Tesseract provider — uses the local tesseract binary. Supports any language pack Tesseract has installed and is configurable per-provider via ocr.settings.
  • AI Automators integration (optional submodule ocr_ai_automators) — adds two automator rules (OCR: Image / File to Text and OCR: Image / File to Plain Text) that automatically populate text_long and string_long fields from an attached image or file field whenever a node is saved. No LLM or API key required.

Use this module whenever you need to make the text inside scanned documents, receipts, photographs of signs, or any other image-based content searchable, indexable, or editable within Drupal.

Post-Installation

After enabling the ocr module, no configuration is mandatory. Provided the tesseract binary is installed on the server, the module works immediately using Tesseract with English as the default language.

  1. Verify the binary — confirm tesseract is on the server's $PATH (see Additional Requirements below). You can check with drush ev "var_dump(\Drupal::service('ocr')->getProvider()->isAvailable());".
  2. Adjust settings (optional) — edit ocr.settings to point at a non-default binary path or change the Tesseract language: drush cset ocr.settings provider_settings.tesseract.language fra+eng.
  3. AI Automators integration (optional) — enable the ocr_ai_automators submodule, then go to Structure → Content types → [your type] → Manage fields, open a text_long or string_long field, and add an AI Automator rule. Choose OCR: Image / File to Text and set the base field to the image or file field you want to read from.
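
The binary check in step 1 can also be done from the shell before touching Drupal. The following is a minimal sketch, assuming a POSIX shell; the function name ocr_binary_status is hypothetical and not part of the module, while --list-langs is Tesseract's own flag for listing installed language packs.

```shell
# Report whether a binary is reachable on $PATH.
# Hypothetical helper for illustration; not shipped with the OCR module.
# Defaults to "tesseract"; pass another name or an absolute path to check that instead.
ocr_binary_status() {
  if command -v "${1:-tesseract}" >/dev/null 2>&1; then
    echo "available"
  else
    echo "missing"
  fi
}

# List the installed language packs only when the binary was found.
if [ "$(ocr_binary_status)" = "available" ]; then
  tesseract --list-langs || echo "tesseract is installed but could not list languages" >&2
else
  echo "tesseract not found on PATH" >&2
fi
```

This only inspects the $PATH of the user running the shell, which may differ from the web server's environment, so the drush check in step 1 remains the more reliable test of what Drupal itself can see.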

To use the service in your own code, inject \Drupal\ocr\OcrServiceInterface or call \Drupal::service(\Drupal\ocr\OcrServiceInterface::class).

Additional Requirements

  • Tesseract OCR (required for the default provider) — a system binary, not a Composer package.
    • Debian/Ubuntu: apt-get install tesseract-ocr tesseract-ocr-eng
    • macOS: brew install tesseract
    • DDEV: add a .ddev/web-build/Dockerfile — see the repository for the ready-made example.
  • Additional Tesseract language packs follow the pattern tesseract-ocr-[lang] (e.g. tesseract-ocr-fra for French).
  • AI module with ai_automators enabled (required only for the ocr_ai_automators submodule): lets you automate field population without writing any custom code.
  • Search API (optional): index the OCR-extracted text fields to make image content full-text searchable.
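
The tesseract-ocr-[lang] naming pattern can be combined with a language value such as fra+eng (the format used in ocr.settings) to work out which Debian/Ubuntu packages to install. This is a minimal sketch, assuming a POSIX shell; ocr_lang_packages is a hypothetical helper, not something the module ships. Underscores are mapped to hyphens because Debian package names cannot contain underscores (for example, chi_sim is packaged as tesseract-ocr-chi-sim).

```shell
# Turn a Tesseract language string such as "fra+eng" into Debian/Ubuntu package names.
# Hypothetical helper for illustration; not part of the OCR module.
ocr_lang_packages() {
  # 1. Put one language code per line.
  # 2. Replace underscores with hyphens (chi_sim becomes chi-sim).
  # 3. Apply the tesseract-ocr-[lang] pattern.
  # 4. Join the lines back into one space-separated list.
  printf '%s\n' "$1" |
    tr '+' '\n' |
    tr '_' '-' |
    sed 's/^/tesseract-ocr-/' |
    paste -sd ' ' -
}

ocr_lang_packages "fra+eng"
```

The result can be passed straight to the package manager, for example apt-get install $(ocr_lang_packages fra+eng).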

Similar projects

  • Tesseract OCR — ties OCR directly to specific field widgets and file handling. The OCR module differs by exposing a general-purpose service and plugin system rather than a fixed field integration, making it easier to call from other modules or Rules/ECA actions.

Supporting this Module

This is a custom project module. No public funding page is available.

Community Documentation

No external documentation yet. The README.md in the repository root covers installation, the service API, provider configuration, and how to write a custom provider plugin.

Project information

  • Minimally maintained
    Maintainers monitor issues, but fast responses are not guaranteed.
  • Project categories: Integrations
  • Created by d0t15t
  • This project is not covered by the security advisory policy.
    Use at your own risk! It may have publicly disclosed vulnerabilities.
