Allow Entity Mesh to analyze content using configurable user roles

Problem/Motivation

Currently, the Entity Mesh module only analyzes links that are visible to anonymous users.
However, some sites we want to analyze are not accessible to anonymous users, so we cannot use the module in this context.

We need to extend Entity Mesh so that site administrators can configure which user role(s) should be used when crawling and analyzing links.

By default, the module should continue to behave as it does now (use the anonymous role), but it should allow selecting other roles through configuration.

This would make the module usable for a wider range of scenarios, especially for websites where access control is based on authenticated roles.

Steps to reproduce

  • Install and enable the Entity Mesh module.
  • Attempt to analyze a site where most or all content is restricted to authenticated users.

Observe that:

  • Entity Mesh only analyzes links visible to anonymous users.
  • Links restricted to authenticated or custom roles are skipped.

There is currently no configuration option to analyze the site using a different role.

Proposed resolution

Add a module configuration setting to choose which user role(s) Entity Mesh should use when analyzing links. By default, use the anonymous role for backward compatibility.

Reuse the functionality already proposed in issue #3535302, which implements the ability to generate a fake account object and assign roles dynamically.

At this point, there are two options:

  • either continue with the approach used in this issue, which involves changing the account globally in each processing step,
  • or implement a more refined solution, which would require at least the following two actions:
    • Replace new AnonymousUserSession() calls with the configurable fake account object.
    • Audit all access checks in the module to ensure they use the configured roles.
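
As a rough illustration of the second option, the sketch below models a fake account that carries the configured roles and is passed explicitly to every access check. It is written in Python rather than the module's PHP. The `FakeAccount` class, the `analysis_roles` setting key, the role-to-permission map and the permission names are all hypothetical and are not taken from the module or from issue #3535302.

```python
from dataclasses import dataclass

# Hypothetical role-to-permission map; a real site would read this from its
# role configuration instead of hard-coding it.
ROLE_PERMISSIONS = {
    "anonymous": {"access content"},
    "authenticated": {"access content", "view private pages"},
}


@dataclass(frozen=True)
class FakeAccount:
    """Stand-in account object that only knows which roles it holds."""

    roles: tuple = ("anonymous",)

    def has_permission(self, permission):
        # Granted as soon as any of the account's roles carries the permission.
        return any(permission in ROLE_PERMISSIONS.get(role, set()) for role in self.roles)


def account_from_config(config):
    """Build the account from settings; 'analysis_roles' is an assumed key."""
    roles = config.get("analysis_roles") or ["anonymous"]
    return FakeAccount(tuple(roles))


def visible_links(links, account):
    """Keep only the links whose required permission the account holds."""
    return [url for url, permission in links if account.has_permission(permission)]
```

With an empty configuration the sketch falls back to the anonymous role, which matches the backward-compatible default proposed above.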

Remaining tasks

Add a configuration form to select the role(s) used for crawling.

Extend the fake account functionality from issue #3535302 to support configured roles.

Add tests to confirm:

  • Default behavior remains anonymous-only.
  • Configured roles are respected during link analysis.

Entity Mesh Tracker System - Technical Overview

Purpose

The Tracker system provides a queue-based mechanism to manage and process entity
link analysis asynchronously, replacing the previous approach of truncating and
rebuilding the entire entity_mesh table during batch operations.

Database Schema

The entity_mesh_tracker table tracks entities pending analysis with
the following structure:

  • id: Primary key (auto-increment)
  • entity_type: Entity type identifier (e.g., 'node',
    'taxonomy_term')
  • entity_id: Entity identifier
  • operation: Operation type (1 = process/update, 2 =
    delete)
  • status: Processing status (1 = pending, 2 = processing, 3 =
    processed, 4 = failed)
  • timestamp: Unix timestamp of last update
  • retry_count: Number of failed processing attempts

Indexes: entity_lookup (entity_type, entity_id), status, timestamp

Unique constraint: entity_type + entity_id combination
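
For readers who prefer to see the structure as DDL, here is a minimal sketch of the table using SQLite through Python. The column types, defaults and index names are assumptions; the module defines the real schema through Drupal's schema API. In this sketch the unique constraint also serves as the entity_lookup index.

```python
import sqlite3

# Assumed SQLite rendering of the entity_mesh_tracker table described above.
# Types and defaults are guesses; only the column names come from the overview.
SCHEMA = """
CREATE TABLE entity_mesh_tracker (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    entity_type TEXT    NOT NULL,
    entity_id   TEXT    NOT NULL,
    operation   INTEGER NOT NULL DEFAULT 1,  -- 1 = process/update, 2 = delete
    status      INTEGER NOT NULL DEFAULT 1,  -- 1 = pending ... 4 = failed
    timestamp   INTEGER NOT NULL,
    retry_count INTEGER NOT NULL DEFAULT 0,
    UNIQUE (entity_type, entity_id)
);
CREATE INDEX tracker_status ON entity_mesh_tracker (status);
CREATE INDEX tracker_timestamp ON entity_mesh_tracker (timestamp);
"""


def create_tracker_table(conn):
    """Create the tracker table and its indexes on the given connection."""
    conn.executescript(SCHEMA)
```

Because of the unique constraint, a second insert for the same entity_type and entity_id fails unless it is written as an upsert.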

Core Components

TrackerInterface

Defines the service contract, including constants for operation types and statuses.

Tracker Service (entity_mesh.tracker)

Implements tracking functionality:

  • addEntity(): Adds/updates entity in tracker (uses MERGE for
    upsert behavior)
  • addMultipleEntities(): Batch adds entities within
    transaction
  • getPendingEntities(): Retrieves entities awaiting processing
    (ordered by timestamp)
  • getFailedEntities(): Retrieves failed entities for retry
    logic
  • markAsProcessed(): Updates status to processed
  • markAsFailed(): Updates status to failed and increments
    retry_count
  • deleteEntity(): Removes entity from tracker
  • deleteProcessedRecords(): Cleanup of old processed records
  • getPendingCount()/getTotalCount(): Statistics
    methods
  • truncate(): Clears entire tracker table
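
The upsert, the timestamp ordering and the retry counter can be sketched as follows. This is a toy model on SQLite in Python, not the service's PHP code: the method names mirror the list above, but the SQL, the constant names and the choice to keep retry_count when an entity is re-added are assumptions.

```python
import sqlite3
import time

OP_PROCESS, OP_DELETE = 1, 2
PENDING, PROCESSING, PROCESSED, FAILED = 1, 2, 3, 4


class Tracker:
    """Toy tracker; only a subset of the service methods is modelled."""

    def __init__(self, conn):
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS entity_mesh_tracker ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, entity_type TEXT, entity_id TEXT, "
            "operation INTEGER, status INTEGER, timestamp INTEGER, "
            "retry_count INTEGER DEFAULT 0, UNIQUE (entity_type, entity_id))"
        )

    def add_entity(self, entity_type, entity_id, operation=OP_PROCESS, now=None):
        # Upsert: an existing row for the same entity is reset to pending.
        now = int(time.time()) if now is None else now
        self.conn.execute(
            "INSERT INTO entity_mesh_tracker "
            "(entity_type, entity_id, operation, status, timestamp) "
            "VALUES (?, ?, ?, ?, ?) "
            "ON CONFLICT (entity_type, entity_id) DO UPDATE SET "
            "operation = excluded.operation, status = excluded.status, "
            "timestamp = excluded.timestamp",
            (entity_type, entity_id, operation, PENDING, now),
        )

    def get_pending_entities(self, limit=50):
        # Oldest entries first, so long-waiting entities are handled first.
        return self.conn.execute(
            "SELECT entity_type, entity_id, operation FROM entity_mesh_tracker "
            "WHERE status = ? ORDER BY timestamp ASC, id ASC LIMIT ?",
            (PENDING, limit),
        ).fetchall()

    def mark_as_failed(self, entity_type, entity_id):
        self.conn.execute(
            "UPDATE entity_mesh_tracker SET status = ?, retry_count = retry_count + 1 "
            "WHERE entity_type = ? AND entity_id = ?",
            (FAILED, entity_type, entity_id),
        )

    def get_total_count(self):
        return self.conn.execute("SELECT COUNT(*) FROM entity_mesh_tracker").fetchone()[0]
```

Adding the same entity twice leaves a single row whose operation and timestamp reflect the latest call, which is the behavior the MERGE-based addEntity() is described as providing.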

Integration Points

Entity Hooks: Entity CRUD operations (insert/update/delete)
automatically add entries to tracker via entity hooks

Batch Processing: Refactored to populate tracker instead of
directly processing all entities

Cron Processing: Configurable cron job processes pending
entities with limit control (default: 50 per run, configurable via
entity_mesh.settings.cron_limit)

Drush Commands: New commands for manual tracker management and
processing

Processing Flow

  1. Entity operation (create/update/delete) triggers tracker entry
  2. Entry status = PENDING (1)
  3. Cron or manual processing picks up pending entries
  4. Status changes to PROCESSING (2) during analysis
  5. On success: status = PROCESSED (3), on failure: status = FAILED (4) +
    retry_count incremented
  6. Failed entities can be reprocessed based on retry limits
  7. Processed records older than the configured number of days are automatically purged
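
The status transitions in the steps above can be sketched as a small in-memory loop. This Python model is illustrative only: the `Entry` class, the `analyze` callback and the `max_retries` default of 3 are hypothetical, and the module's actual retry limit is not stated in this overview.

```python
from dataclasses import dataclass

PENDING, PROCESSING, PROCESSED, FAILED = 1, 2, 3, 4


@dataclass
class Entry:
    entity_type: str
    entity_id: str
    status: int = PENDING
    retry_count: int = 0


def process_pending(entries, analyze, limit=50):
    """Process up to `limit` pending entries and return how many were handled."""
    batch = [entry for entry in entries if entry.status == PENDING][:limit]
    for entry in batch:
        entry.status = PROCESSING
        try:
            analyze(entry)
        except Exception:
            # A failure is recorded on the entry; the rest of the batch continues.
            entry.status = FAILED
            entry.retry_count += 1
        else:
            entry.status = PROCESSED
    return len(batch)


def requeue_failed(entries, max_retries=3):
    """Return failed entries to pending while they are under the retry limit."""
    requeued = 0
    for entry in entries:
        if entry.status == FAILED and entry.retry_count < max_retries:
            entry.status = PENDING
            requeued += 1
    return requeued
```

The `limit` argument plays the role of cron_limit: each run handles a bounded batch and leaves the remaining pending entries for the next run.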

Configuration

  • cron_enabled: Enable/disable automatic cron processing (default:
    TRUE)
  • cron_limit: Maximum entities to process per cron run (default:
    50)
  • processing_mode: Controls synchronous vs asynchronous processing
    behavior
  • synchronous_limit: Threshold for immediate vs queued
    processing

Benefits

  • Incremental processing: Only changed entities are
    analyzed
  • Performance: Avoids full table truncation and rebuild
  • Reliability: Retry mechanism for failed processing
  • Flexibility: Manual and automatic processing options
  • Scalability: Configurable limits prevent timeout issues on
    large sites

Comments

lpeidro created an issue. See original summary.

lpeidro commented:

Issue summary: View changes
lpeidro commented:

Title: Allow Entity Mesh to analyze links using configurable user roles instead of anonymous-only » Allow Entity Mesh to analyze content using configurable user roles instead of anonymous-only
lpeidro commented:

Issue summary: View changes

lpeidro commented:

Status: Active » Needs review

The option to configure the user profile under which content processing is executed has been implemented.

The configuration now includes three mutually exclusive options:

  • Anonymous users
  • Authenticated users with no role or with additional roles
  • A specific user already registered in the database

This is useful for intranet environments or systems where authenticated users access the website.

Relevant functional tests have been added to ensure that access permissions to content are properly validated.

Ready for testing and suggestions for improvements.

lpeidro commented:

Assigned: lpeidro » Unassigned
lpeidro commented:

Title: Allow Entity Mesh to analyze content using configurable user roles instead of anonymous-only » Allow Entity Mesh to analyze content using configurable user roles and implement Tracker system
Issue summary: View changes
Status: Needs review » Reviewed & tested by the community

To improve performance and the way content is processed, we have also implemented a tracking system together with a module-specific cron job, which makes processing more stable. I have also updated the issue title and description to reflect this.

  • lpeidro committed ae180f3e on 1.x
    Issue #3544912: It is needed to clear clearMeshAccountCache during the...

  • lpeidro committed 343c9830 on 1.x
    Issue #3544912: Include webform as dependency for test env
    

  • lpeidro committed 98417c8a on 1.x
    Issue #3544912: The webform access was not calculated properly
    

  • lpeidro committed ccfdb2e0 on 1.x
    Issue #3544912: Fix test issues due to merge the branch 1.x
    

  • lpeidro committed 96330cc5 on 1.x
    Issue #3544912: Added Hook Update to set the cron values
    

  • lpeidro committed e7385518 on 1.x
    Issue #3544912: Fix issues detected during testing
    

  • lpeidro committed 00017ffe on 1.x
    Issue #3544912: The queue worker is not used anymore
    

  • lpeidro committed 47bc8162 on 1.x
    Issue #3544912: Implement the tracker in the hook alters
    

  • lpeidro committed 7f2d6077 on 1.x
    Issue #3544912: Implement the Tracker system in the Entity Class
    

  • lpeidro committed aa557795 on 1.x
    Issue #3544912: Added new methods needed for tracking
    

  • lpeidro committed 1429d71d on 1.x
    Issue #3544912: Created form for cron configuration
    

  • lpeidro committed 2b93fdb6 on 1.x
    Issue #3544912: When the tracker, the NodeBatch does not need to...

  • lpeidro committed 48b5a25e on 1.x
    Issue #3544912: Refactor drush commands and ad a new one for track...

  • lpeidro committed 9283a8d3 on 1.x
    Issue #3544912: Remove batch process that is not needed
    

  • lpeidro committed b7c1a5f6 on 1.x
    Issue #3544912: We add button for process tracker entities.
    

  • lpeidro committed 7724977d on 1.x
    Issue #3544912: Ony truncate the tracker table.
    

  • lpeidro committed 1af83816 on 1.x
    Issue #3544912: We refactor the node batch to use the tracker
    

  • lpeidro committed c6a97ef4 on 1.x
    Issue #3544912: We restructure the form of batch process with the option...

  • lpeidro committed 28073de9 on 1.x
    Issue #3544912: We create the batch process for entities tracking
    

  • lpeidro committed 15ceb96f on 1.x
    Issue #3544912: Only we need a field of timestamp to execute in a proper...

  • lpeidro committed 39b46b11 on 1.x
    Issue #3544912: Created service tracker manager to add or remove...

  • lpeidro committed e3e4d85b on 1.x
    Issue #3544912: Create service for tracker actions on the database
    

  • lpeidro committed 743030a5 on 1.x
    Issue #3544912: Create data base table for tracker
    

  • lpeidro committed 0989de57 on 1.x
    Issue #3544912: Created test for the system tracker
    

  • lpeidro committed 255eec29 on 1.x
    Issue #3544912: Overrite the haspermission method in DummyAccount...

  • lpeidro committed 83cc0e34 on 1.x
    Issue #3544912: Add form validation in the backend for configuration...

  • lpeidro committed 41118f60 on 1.x
    Issue #3544912: The account is not set or properly configured during...

  • lpeidro committed fefa9f00 on 1.x
    Issue #3544912: Fix functional and kernel test II
    

  • lpeidro committed 038d766e on 1.x
    Issue #3544912: Implement kernel test to check that check properly the...

  • lpeidro committed a19c1593 on 1.x
    Issue #3544912: Refactor the DummyAccount and the method check access
    

  • lpeidro committed 2e02d039 on 1.x
    Issue #3544912: Rename and update the method checkAccessEntity to be...

  • lpeidro committed bbf8bc9d on 1.x
    Issue #3544912: Improve the role selection and add the possibility to...

  • lpeidro committed 431cc932 on 1.x
    Issue #3544912: Improve explanation in form
    

  • lpeidro committed a817f240 on 1.x
    Issue #3544912: Added test for the configuration roles funcionality
    

  • lpeidro committed 37369e63 on 1.x
    Issue #3544912: Added funcionality to configure different roles.
    
lpeidro commented:

Issue summary: View changes
Status: Reviewed & tested by the community » Fixed

Functionality merged.

Status: Fixed » Closed (fixed)

Automatically closed - issue fixed for 2 weeks with no activity.