Problem/Motivation

Drupal's existing encryption ecosystem (Encrypt API, Key, Field Encrypt) provides strong at-rest confidentiality but renders encrypted fields unusable for common CRM operations: equality search, deduplication, Views filters, and indexed lookups. Many real-world applications (CRMs, membership systems, healthcare) must both protect PII and allow efficient lookup by name, email, phone, or other identifiers. There is no contrib module that implements the hybrid pattern used by mature CRMs: encrypted canonical values + non-reversible, indexable search keys. Without a standard approach and reusable implementation, every Drupal project handling PII must reimplement the same brittle logic, risking security mistakes and inconsistent behavior.

Steps to reproduce

  1. Create an entity (or use a Node/Contact) with a Field Encrypt–enabled text field for aliases.
  2. Save several entities with distinct aliases (e.g., "John Smith", "Jane Doe").
  3. Attempt to run an entityQuery or a Views filter for aliases = 'John Smith'.
  4. Observe that the query returns no results (or sorts by ciphertext), because the database only contains ciphertext and cannot compare plaintext values.
  5. Attempt to build a dedupe routine that looks up an email or phone: it cannot perform efficient SQL equality checks against encrypted columns.

This reproduces the core limitation: encrypted field values cannot be used for WHERE / ORDER BY / JOIN / INDEX operations.

Proposed resolution

Introduce a contrib module implementing a Searchable Encrypted Fields pattern (working name: encrypted_search_fields). The module's goal is to provide a reusable, secure, and configurable hybrid field/storage pattern that:

  • Stores the canonical PII value encrypted at rest (using existing Encrypt API + Key module).
  • Maintains one or more non-reversible index/search columns derived from the plaintext for lookup, deduplication, and indexing.
  • Provides normalization/phonetic options (lowercase, strip accents, tokenization, soundex/metaphone) for names and addresses.
  • Integrates with Views, EntityQuery, Search API and offers a secure Views filter plugin that hashes input before comparing to the indexed hash.
  • Supports configurable hashing strategy (HMAC with Key-managed secret; choice of algorithm), deterministic encryption options, and optional prefix/token indexing for limited “starts with” support.
  • Provides key-rotation helpers and guidance on reindexing strategies.

Security model

  • The encrypted column uses AES-GCM or equivalent via Encrypt API, with keys stored and managed via Key module.
  • Search indexes use HMAC (a keyed hash with a server-side secret) so index values are not vulnerable to rainbow-table attacks.
  • Index values are non-reversible, but they are deterministic: equal plaintexts produce equal index values, so an index reveals which records share a value. The module documents threat models and key-management recommendations.
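
The keyed-hash idea above can be sketched in a few lines. This is a minimal illustration in Python rather than Drupal PHP; the function names and the two example keys are hypothetical, and a real site would load the secret from the Key module instead of hard-coding it.

```python
import hashlib
import hmac


def blind_index(secret_key: bytes, normalized_value: str) -> str:
    """Return the hex HMAC-SHA-256 of an already-normalized value."""
    return hmac.new(
        secret_key, normalized_value.encode("utf-8"), hashlib.sha256
    ).hexdigest()


def matches(secret_key: bytes, candidate: str, stored_index: str) -> bool:
    """Check a candidate against a stored index value in constant time."""
    return hmac.compare_digest(blind_index(secret_key, candidate), stored_index)


# Hypothetical keys, for illustration only.
key_a = b"example-hmac-key-a"
key_b = b"example-hmac-key-b"

stored = blind_index(key_a, "jane@example.com")
print(len(stored))                                  # 64
print(matches(key_a, "jane@example.com", stored))   # True
print(matches(key_b, "jane@example.com", stored))   # False
```

Without the key an attacker cannot precompute a lookup table, which is the property the second bullet relies on. The same determinism that makes equality search possible is also what leaks which rows share a value.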

High-level workflow

  1. Admin marks fields as “encryptable + searchable”.
  2. On entity save, plaintext is encrypted into field_x; simultaneously a normalized form and one or more HMAC-based hashes are computed and written to index columns (e.g., field_x_hash, field_x_norm, field_x_phonetic_hash).
  3. Search/UI: When a user searches via a provided Views filter or EntityQuery helper, the module normalizes the input the same way and computes the same hash/HMAC, then queries the hash column(s). Returned entity IDs are loaded and the encrypted values are decrypted for authorized viewers.
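
The save and search steps above can be sketched end to end with an in-memory SQLite table. This is an illustration in Python, not Drupal code: the table layout, function names, and key are hypothetical, and `placeholder_encrypt` is a reversible stand-in (base64) used only so the sketch runs with the standard library. It is not encryption; the module would call Encrypt API here.

```python
import base64
import hashlib
import hmac
import sqlite3

HMAC_KEY = b"example-hmac-key"  # hypothetical; would come from the Key module


def placeholder_encrypt(plaintext: str) -> bytes:
    # NOT encryption: a reversible stand-in for the Encrypt API call.
    return base64.b64encode(plaintext.encode("utf-8"))


def placeholder_decrypt(ciphertext: bytes) -> str:
    return base64.b64decode(ciphertext).decode("utf-8")


def normalize(value: str) -> str:
    # Lowercase and collapse whitespace; real rules would be configurable.
    return " ".join(value.lower().split())


def index_hash(value: str) -> str:
    digest = hmac.new(HMAC_KEY, normalize(value).encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


def save_contact(db: sqlite3.Connection, entity_id: int, alias: str) -> None:
    # One INSERT inside one transaction, so both columns are written together.
    with db:
        db.execute(
            "INSERT INTO contact (id, field_x, field_x_hash) VALUES (?, ?, ?)",
            (entity_id, placeholder_encrypt(alias), index_hash(alias)),
        )


def find_by_alias(db: sqlite3.Connection, user_input: str):
    # Hash the input the same way, then compare against the indexed column.
    rows = db.execute(
        "SELECT id, field_x FROM contact WHERE field_x_hash = ? ORDER BY id",
        (index_hash(user_input),),
    ).fetchall()
    return [(entity_id, placeholder_decrypt(blob)) for entity_id, blob in rows]


db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE contact (id INTEGER PRIMARY KEY, field_x BLOB, field_x_hash TEXT)"
)
db.execute("CREATE INDEX contact_field_x_hash ON contact (field_x_hash)")
save_contact(db, 1, "John Smith")
save_contact(db, 2, "Jane Doe")

print(find_by_alias(db, "  john   SMITH "))  # [(1, 'John Smith')]
print(find_by_alias(db, "John"))             # []
```

The second lookup returns nothing because the hash column supports whole-value equality only; partial matches need the extra index material described under the data model.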

Remaining tasks

  • Design & approve API and configuration surface (field settings, site-wide defaults, per-field overrides).
  • Define concrete DB schema patterns (column naming, types, indexes). Create migrations for existing fields if possible.
  • Implement field type wrappers that store an encrypted value plus its hash columns, for both configurable and base fields.
  • Implement the hashing service (HMAC) with Key integration for secret management and configuration UI to choose algorithm and salts/keys.
  • Implement normalization utilities (lowercase, unicode normalization, punctuation stripping, tokenization) and optional phonetic processors (Soundex/Metaphone).
  • Create a preSave trait / entity event subscriber to compute and persist index columns atomically with encrypted value.
  • Provide Views integration: filter plugin(s) that convert input to index values, a field handler to display decrypted values, and admin UI to hide index columns from display.
  • Add EntityQuery helper functions or extensions so programmatic queries can use the hash columns easily.
  • Search API integration: adapters for indexing hash columns and for allowing faceting/deduping while keeping PII encrypted in the DB.
  • Write key-rotation tooling: re-encrypting canonical values when keys rotate, and re-hashing/re-indexing if hashing key changes (or provide strategy to avoid changing HMAC key frequently).
  • Test coverage: unit, kernel, and integration tests for correctness, indexing, Views behavior, and key rotation.
  • Security review and documentation (threat model, guidance, recommended key rotation schedule, operational considerations for backups, multi-environment keys).
  • Documentation and upgrade path guidance for maintainers (how to enable on existing fields, migration steps, performance implications).
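
As a rough illustration of the normalization and phonetic tasks in the list above, the following Python sketch shows one possible set of rules. The function names are hypothetical and the exact rules (which characters count as punctuation, which phonetic algorithm to use) are assumptions that the module's configuration would decide. The phonetic step is a plain American Soundex.

```python
import re
import unicodedata

# Soundex digit for each consonant; vowels, y, h and w have no digit.
_SOUNDEX_CODES = {
    letter: digit
    for digit, letters in {
        "1": "bfpv", "2": "cgjkqsxz", "3": "dt", "4": "l", "5": "mn", "6": "r",
    }.items()
    for letter in letters
}


def normalize(value: str) -> str:
    """Strip accents, case-fold, drop punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", value)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    no_punctuation = re.sub(r"[^\w\s]", "", without_accents.casefold())
    return " ".join(no_punctuation.split())


def tokenize(value: str) -> list:
    """Split a normalized value into whitespace-separated tokens."""
    return normalize(value).split()


def soundex(word: str) -> str:
    """Return the four-character American Soundex code of a single word."""
    letters = [c for c in word.lower() if c.isascii() and c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    previous = _SOUNDEX_CODES.get(letters[0], "")
    for c in letters[1:]:
        if c in "hw":
            continue  # h and w do not separate letters with the same code
        code = _SOUNDEX_CODES.get(c, "")
        if code and code != previous:
            result += code
        previous = code  # a vowel resets the previous code
    return (result + "000")[:4]


print(normalize("  José  O'Connor "))   # jose oconnor
print(tokenize("Núñez, María"))          # ['nunez', 'maria']
print(soundex("Robert"), soundex("Rupert"))  # R163 R163
```

Whatever rules are chosen must be applied identically at save time and at query time, otherwise the resulting hashes will not match.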

User interface changes

  • Field settings UI (per field):
    • Checkbox: “Encrypt this field at rest”.
    • Checkbox: “Create searchable index for this field”.
    • Index options: exact match (HMAC), normalized match (lowercase/strip accents), phonetic (soundex/metaphone), prefix tokens (length configurable).
    • Choose encryption profile (dropdown of available Encryption Profiles from Encrypt module).
    • Choose hashing profile / HMAC key (select from Key module keys; show recommended algorithms).
    • Help text: explain limitations (no SQL sorting on encrypted column, what index supports, reindex guidance).
  • Site configuration UI:
    • Global defaults for hashing/encryption algorithms and normalization rules.
    • Key management links (integration with Key module UI).
    • Reindex control page: list fields with searchable indexes and allow triggering background reindex (with progress) after key changes or normalization rule changes.
  • Views integration:
    • New Views filter plugin: “Encrypted Search Filter” that takes plaintext user input, normalizes and hashes it, and applies a condition on the hash column.
    • Admin-facing explanation on the filter configuration indicating supported match types and performance characteristics.
  • Field display:
    • Encrypted columns are shown decrypted to authorized roles; index/hash columns are hidden from display and access unless the user has special debug permissions.

API changes

  • Services
    • encrypted_search.fields_processor — service to compute normalized values, phonetic tokens, and HMACs given a plaintext and a field configuration.
    • encrypted_search.key_manager — wrapper to read HMAC/encryption keys from Key module, enforce access restrictions, and provide rotation helpers.
    • encrypted_search.reindexer — background worker service to reindex existing entities for chosen fields.
  • Traits / Event subscribers
    • A trait (or an entity event subscriber) that modules can include to compute and set hash/index fields in preSave() or via an entity save event.
    • Hook or service events: hook_encrypted_search_pre_index() and hook_encrypted_search_post_index() for extensibility (or PSR-14 events).
  • Field type/field widget APIs
    • Field type metadata to mark a field as supporting encryption/searchable index (settings schema additions).
    • Field storage wrappers that ensure atomic write of encrypted + index columns where possible.
  • Views / EntityQuery integration
    • Views filter handlers that accept plaintext input and use the hashing service to produce SQL-safe conditions against the hash column.
    • EntityQuery helper functions or plugins that expose a ->conditionEncrypted($field_name, $plaintext) convenience method, which computes the hash and adds a condition against the hash column.
  • Search API
    • Indexing plugins to allow Search API backends to index hash/normalized values (not raw encrypted values), plus a provider for mapping queries from plain input to hashed query.
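
To make the query-helper idea concrete, here is a hedged sketch of what a conditionEncrypted-style helper does internally, written in Python for illustration. The function name `condition_encrypted`, the `_hash` column suffix, and the field-name pattern are assumptions, not an existing Drupal API. The helper returns a parameterized SQL fragment so the plaintext never reaches the query.

```python
import hashlib
import hmac
import re
from typing import Tuple

# Conservative pattern for machine names; hypothetical validation rule.
_FIELD_NAME = re.compile(r"[a-z][a-z0-9_]*")


def condition_encrypted(
    field_name: str, plaintext: str, hmac_key: bytes
) -> Tuple[str, Tuple[str, ...]]:
    """Build an equality condition against the field's hash column."""
    if not _FIELD_NAME.fullmatch(field_name):
        raise ValueError(f"invalid field name: {field_name!r}")
    normalized = " ".join(plaintext.lower().split())
    digest = hmac.new(
        hmac_key, normalized.encode("utf-8"), hashlib.sha256
    ).hexdigest()
    # Only the digest is bound as a parameter; the plaintext is not sent to SQL.
    return f"{field_name}_hash = ?", (digest,)


KEY = b"example-hmac-key"  # hypothetical; would come from the Key module
sql, params = condition_encrypted("field_email", "Jane@Example.com", KEY)
print(sql)             # field_email_hash = ?
print(len(params[0]))  # 64
```

A Views filter handler would follow the same shape: accept plaintext from the exposed form, hash it through the shared service, and add the condition to the hash column.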

Data model changes

For each configured “encryptable + searchable” field the module will add a small set of columns. Implementation choices include extending the field storage table, adding companion index tables, or using base fields. Below are recommended schema patterns and rationale.

  1. Encrypted canonical column
    field_x: text (ciphertext). Stored and managed by Encrypt/Field Encrypt. This column is never used for SQL comparisons.
  2. Primary hash column (equality search)
    field_x_hash: fixed-length string (e.g., 64 hex characters for HMAC-SHA-256 or 128 for HMAC-SHA-512). Value = HMAC(secret_key, normalization(plaintext)). Indexed (B-tree). Used for equality lookups and joins for dedupe.
  3. Normalized column (optional)
    field_x_norm: short string or text column containing a canonicalized form (lowercase, accent-stripped, punctuation removed). Depending on policy this is stored hashed or as-is; hashing is preferred because a plaintext normalized value exposes nearly as much as the original. Used to implement case-insensitive or tokenized matching.
  4. Phonetic/auxiliary index (optional)
    field_x_phonetic_hash: value = HMAC(secret_key, phonetic(normalized)). Enables fuzzy dedupe using phonetic algorithms. Indexable.
  5. Prefix/token index (optional, tradeoffs)
    If “starts-with” search is required, compute hashes for a configurable set of prefixes (e.g., first 2–4 characters) or n-grams and store as separate index rows. This increases storage/index size and reduces anonymity; enable only when required.
  6. Index table pattern (alternative)
    Instead of adding many columns to the primary field table, use a companion index table:
     {entity_field_search_index}
       • entity_type
       • entity_id
       • field_name
       • index_type (hash, norm, phonetic)
       • index_value

    This allows flexible indexing and easier addition/removal of index types without schema migrations on the primary entity table.

  7. Indexes
    Create B-tree indexes on the hash columns (and multi-column indexes when appropriate) for fast equality lookup. Avoid indexing encrypted canonical columns.
  8. Access control & visibility
    Hash/index columns are considered operational metadata: hide them from UI views and API responses by default. Provide an admin/debug permission to view them if necessary.
  9. Key rotation and reindexing
    Rotating a key requires reprocessing stored data: canonical values must be re-encrypted when the encryption key changes, and hash/index values must be recomputed when the HMAC key changes. Provide an atomic reindex job and document rollout strategies (e.g., dual-key acceptance windows).
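
One way to read the dual-key acceptance window in item 9 is sketched below in Python. While a reindex is in progress, lookups compute the hash under both the new and the old HMAC key, so rows that have not yet been rehashed are still found. The function names, the in-memory `index` dictionary, and the keys are hypothetical stand-ins for the real hash column and Key module entries.

```python
import hashlib
import hmac
from typing import Dict, List, Sequence


def index_hash(key: bytes, normalized_value: str) -> str:
    return hmac.new(key, normalized_value.encode("utf-8"), hashlib.sha256).hexdigest()


def lookup_hashes(active_keys: Sequence[bytes], normalized_value: str) -> List[str]:
    """Hash the search value under every key accepted during the window."""
    return [index_hash(key, normalized_value) for key in active_keys]


def find_ids(
    index: Dict[int, str], active_keys: Sequence[bytes], normalized_value: str
) -> List[int]:
    wanted = set(lookup_hashes(active_keys, normalized_value))
    return sorted(eid for eid, stored in index.items() if stored in wanted)


old_key, new_key = b"example-old-key", b"example-new-key"  # hypothetical keys

# Mid-rotation state: entity 1 still carries the old hash, entity 2 the new one.
index = {
    1: index_hash(old_key, "jane@example.com"),
    2: index_hash(new_key, "jane@example.com"),
}

print(find_ids(index, [new_key], "jane@example.com"))           # [2]
print(find_ids(index, [new_key, old_key], "jane@example.com"))  # [1, 2]

# The reindex job rehashes entity 1 (in practice from the decrypted value),
# after which the old key can be retired.
index[1] = index_hash(new_key, "jane@example.com")
print(find_ids(index, [new_key], "jane@example.com"))           # [1, 2]
```

In SQL the window translates to an IN condition over the candidate hashes, which keeps lookups index-friendly at the cost of one extra HMAC computation per accepted key.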

Notes on tradeoffs: exact-match search via HMAC is performant and relatively simple. Partial matching, fuzzy search, and full-text search require additional index material or external search systems (Search API + Solr/Elasticsearch) and careful attention to what derivative data is stored. All index values should be generated deterministically from plaintext using a site/Key-managed secret to avoid rainbow-table attacks.
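
The cost of partial matching can be seen in a small sketch of the prefix index, written in Python for illustration. Each stored value yields one keyed hash per prefix length, and a "starts with" query only works for query lengths inside the configured range. The function names, the default lengths, and the key are assumptions.

```python
import hashlib
import hmac
from typing import Set


def index_hash(key: bytes, value: str) -> str:
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def prefix_hashes(
    key: bytes, normalized: str, min_len: int = 2, max_len: int = 4
) -> Set[str]:
    """Return one keyed hash for each prefix length in [min_len, max_len]."""
    upper = min(max_len, len(normalized))
    return {index_hash(key, normalized[:n]) for n in range(min_len, upper + 1)}


def starts_with(
    key: bytes, stored: Set[str], query: str, min_len: int = 2, max_len: int = 4
) -> bool:
    """Prefix match; only query lengths covered by the index can succeed."""
    if not (min_len <= len(query) <= max_len):
        return False
    return index_hash(key, query) in stored


KEY = b"example-hmac-key"  # hypothetical; would come from the Key module
stored = prefix_hashes(KEY, "john smith")

print(len(stored))                        # 3
print(starts_with(KEY, stored, "joh"))    # True
print(starts_with(KEY, stored, "jan"))    # False
print(starts_with(KEY, stored, "john s")) # False
```

The last call fails even though the value does start with "john s", because prefixes longer than the configured maximum are never indexed. Short prefixes also have few possible values, so their hashes are easier to guess by enumeration than a full-value hash, which is the anonymity tradeoff noted above.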

Comments

bluegeek9 created an issue. See original summary.
