Problem/Motivation

  • When running a bulk update to generate AI-based alt text for media or nodes, the system fails with an HTTP 500 error on sites with a large number of nodes.
  • This happens because the current implementation tries to load and process too many entities at once, exhausting memory or exceeding execution time limits.
  • As a result, users with large content databases cannot use the bulk update functionality reliably.

Steps to reproduce

  • Enable the module and configure AI alt text generation.
  • Go to the bulk update page.
  • On a site with a large number of nodes, the page cannot be accessed and returns an HTTP 500 error.

Proposed resolution

  • Replace loading all entities at once with a paged entity query that fetches results in smaller batches.
  • Process entities iteratively instead of holding them all in memory.
  • Reduce memory usage and execution time by avoiding Entity::loadMultiple() on a very large dataset.
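The paging idea above can be sketched in Python, independent of Drupal. This is an illustrative sketch, not the patch's code: `fetch_ids_after()` is a hypothetical stand-in for a database query, and the sketch uses keyset paging (fetch IDs greater than the last one seen) rather than a numeric offset. In Drupal, a similar query could be built with an entity query using `condition()` on the ID with the `'>'` operator, `sort()` on the ID, and `range(0, $page_size)`. Keyset paging matters here because the query filters on empty alt text: once a page is processed, those entities drop out of the result set, so advancing an offset would skip entities that were never processed.

```python
from typing import Callable, Iterator, List

# Hypothetical query: returns up to `limit` IDs greater than `last_id`,
# sorted ascending, for entities whose image alt text is still empty.
FetchIdsAfter = Callable[[int, int], List[int]]


def iter_id_pages(fetch_ids_after: FetchIdsAfter, page_size: int) -> Iterator[List[int]]:
    """Yield one small page of IDs at a time instead of loading everything."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    last_id = 0
    while True:
        ids = fetch_ids_after(last_id, page_size)
        if not ids:
            return
        yield ids  # the caller loads and processes only this page
        last_id = ids[-1]  # keyset: continue after the last ID seen
        if len(ids) < page_size:
            return  # a short page means nothing is left


# In-memory stand-in for "entities with empty alt text".
missing_alt = {3, 5, 8, 13, 21}


def fetch_ids_after(last_id: int, limit: int) -> List[int]:
    return sorted(i for i in missing_alt if i > last_id)[:limit]


processed: List[int] = []
for page in iter_id_pages(fetch_ids_after, page_size=2):
    for entity_id in page:
        missing_alt.discard(entity_id)  # simulate "alt text generated"
        processed.append(entity_id)

print(processed)  # [3, 5, 8, 13, 21]
```

Only one page of IDs is held at a time, and every matching entity is visited even though processing shrinks the filtered result set between pages.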
File attached in comment #2: ai-image-alt-text-3545687-2.patch (2.84 KB, by yogeshsevak)

Comments

yogeshsevak created an issue. See original summary.

yogeshsevak’s picture

File (new): 2.84 KB

The attached patch updates the bulk update process to prevent 500 errors on sites with a large number of nodes:

  • Replaces loading all entities at once with a paged entity query that fetches results in smaller batches.
  • Ensures that entities are processed iteratively instead of being held fully in memory.
  • Improves memory usage and execution time by avoiding Entity::loadMultiple() on a very large dataset.
  • Maintains existing functionality while making the bulk update page stable and scalable for large content sites.

With this change, bulk alt-text updates now complete successfully even on sites with tens of thousands of nodes, without triggering memory exhaustion or timeouts.

Please review.

yogeshsevak’s picture

Assigned: yogeshsevak » Unassigned
yogeshsevak’s picture

Status: Active » Needs review

anybody made their first commit to this issue’s fork.

anybody’s picture

Version: 1.0.1 » 1.0.x-dev
Status: Needs review » Needs work

@yogeshsevak, could you please provide this as a merge request (MR) instead?

rohit rana made their first commit to this issue’s fork.

rohit rana’s picture

Status: Needs work » Needs review

I reviewed Yogesh’s patch and implemented the same query-based approach to avoid loading all entities at once.

In addition, I refined the implementation by:

  • adding a check to ensure only entities with an attached image (target_id) are processed
  • handling the remaining limit safely to avoid unnecessary queries
  • adding defensive checks for field existence and empty values during processing

The updated code now uses an EntityQuery to fetch only entities that have image fields with empty alt text and processes them in limited batches. This prevents memory exhaustion and resolves the 500 error on sites with large content datasets while keeping the existing behavior unchanged.
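The "remaining limit" handling and the defensive checks described above could look roughly like this. It is a hedged Python sketch, not the MR's PHP code: each source is a hypothetical callable standing in for one limited entity query (for example, media images with empty alt text), and an entity is modelled as a plain dict keyed by field name. Treating whitespace-only alt text as empty is an assumption of this sketch.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical: (label, query) where query(n) returns at most n entity IDs.
Source = Tuple[str, Callable[[int], List[int]]]


def collect_candidates(sources: List[Source], limit: int) -> List[Tuple[str, int]]:
    """Gather up to `limit` IDs across sources, skipping queries once the limit is hit."""
    remaining = limit
    picked: List[Tuple[str, int]] = []
    for label, query in sources:
        if remaining <= 0:
            break  # limit used up: do not run the remaining queries at all
        ids = query(remaining)[:remaining]  # cap defensively even if a source over-returns
        picked.extend((label, entity_id) for entity_id in ids)
        remaining -= len(ids)
    return picked


def needs_alt_text(entity: Dict[str, Optional[dict]], field_name: str) -> bool:
    """Defensive checks before processing a single image field item."""
    item = entity.get(field_name)
    if not item:
        return False  # field missing on this bundle, or no value stored
    if not item.get("target_id"):
        return False  # no image file actually attached
    return not (item.get("alt") or "").strip()  # alt is empty or whitespace-only


calls: List[Tuple[str, int]] = []


def media_query(n: int) -> List[int]:
    calls.append(("media", n))
    return [10, 11, 12][:n]


def node_query(n: int) -> List[int]:
    calls.append(("node", n))
    return [1, 2][:n]


picked = collect_candidates([("media", media_query), ("node", node_query)], limit=3)
print(picked)  # [('media', 10), ('media', 11), ('media', 12)]
print(calls)   # [('media', 3)]; node_query was never called
```

Because the limit is checked before each query, a source that is not needed for the current run is never queried.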

Please review and let me know if anything should be adjusted further.

anybody’s picture

What about using the Batch API? https://www.drupal.org/docs/drupal-apis/batch-api/batch-api-overview Wouldn't that ultimately make more sense?

anybody’s picture

anybody’s picture

Issue tags: +Needs tests

As this is a larger change and test coverage would generally make a lot of sense for this module, I think tests should be added here.

More general tests can be added here: #3574780: Fix code style (phpcs, phpstan, stylelint, cspell, eslint, ...) and add tests for all functionalities

rohit rana’s picture

@anybody, thanks for the suggestion.

The goal of this patch is to prevent the memory exhaustion and 500 errors caused by loading a very large number of entities at once. The current approach limits the query and processes only a small batch of results per request, which significantly reduces memory usage.

Using the Batch API could certainly be useful for operations that intentionally process very large datasets across the entire site. In this case, however, the method only retrieves a limited number of entities (via the $limit parameter), so the operation remains lightweight within a normal request.

That said, if the feature evolves into a full-site scan or mass remediation process, adopting the Batch API would likely be a good improvement.
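For comparison, the Batch API approach splits work across several HTTP requests: an operation callback keeps its progress in `$context['sandbox']` and sets `$context['finished']` to a value below 1 until it is done, and Drupal calls it again on the next request. The Python sketch below only models that callback contract so the chunking logic is visible; it is not Drupal code, and `count_pending()`, `fetch_ids_after()` and `process()` are hypothetical stand-ins.

```python
from typing import Callable, Dict, List


def make_operation(count_pending: Callable[[], int],
                   fetch_ids_after: Callable[[int, int], List[int]],
                   process: Callable[[int], None],
                   chunk_size: int) -> Callable[[Dict], None]:
    """Build a batch-style operation that handles one chunk per call."""

    def operation(context: Dict) -> None:
        sandbox = context["sandbox"]
        if "progress" not in sandbox:  # first call: initialise persistent state
            sandbox["progress"] = 0
            sandbox["last_id"] = 0
            sandbox["max"] = count_pending()
        ids = fetch_ids_after(sandbox["last_id"], chunk_size)
        for entity_id in ids:
            process(entity_id)
            sandbox["progress"] += 1
            sandbox["last_id"] = entity_id
        if not ids or sandbox["progress"] >= sandbox["max"]:
            context["finished"] = 1.0  # nothing left: stop calling this operation
        else:
            context["finished"] = sandbox["progress"] / sandbox["max"]

    return operation


def run_batch(operation: Callable[[Dict], None]) -> int:
    """Model of the batch runner: one call per 'request' until finished >= 1."""
    context: Dict = {"sandbox": {}, "finished": 1.0}
    requests = 0
    while True:
        context["finished"] = 1.0  # default: finished unless the operation says otherwise
        operation(context)
        requests += 1
        if context["finished"] >= 1:
            return requests


pending = list(range(1, 8))  # seven entities with empty alt text
done: List[int] = []
op = make_operation(
    count_pending=lambda: len(pending),
    fetch_ids_after=lambda last, n: [i for i in pending if i > last][:n],
    process=done.append,
    chunk_size=3,
)
print(run_batch(op), done)  # 3 [1, 2, 3, 4, 5, 6, 7]
```

Each simulated request handles at most `chunk_size` entities, so the per-request cost stays flat no matter how large the site is; only the number of requests grows.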

kreatil’s picture

I hit the same problem on a Drupal 10 site with roughly 30,000 media:image items missing alt text. In our case the 500 error was caused by the bulk UI loading and scanning too much content in one request.

As a workaround, I stopped using ai_image_bulk_alt_text for mass processing. Instead, I built a custom Drush command that:

  • targets only media:image
  • processes only rows where the image field alt value is really empty
  • selects candidates directly from media__field_media_image via database queries
  • works in small chunks
  • stops each run after a short runtime limit
  • aborts after a configurable number of failures
  • can pause future cron runs via Drupal state after repeated provider errors
  • skips media marked with our exclusion flag

This avoided the memory issue completely because it does not build the bulk admin form and does not call loadMultiple() across the full dataset. For local environments with missing files (stage_file_proxy) I also skip missing source files instead of failing the whole run.
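The chunked, self-limiting run described above can be outlined as a loop with three exits: all candidates processed, runtime budget spent, or too many failures. This Python sketch illustrates that pattern under assumptions; it is not kreatil's Drush command. `generate_alt()`, `file_exists()` and `ProviderError` are hypothetical, `state` is a plain dict standing in for Drupal's State API, and the sketch merges "abort after N failures" and "pause future cron runs" into a single threshold, which the real command may keep separate.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Set


class ProviderError(Exception):
    """Hypothetical error raised when the AI provider call fails."""


@dataclass
class RunResult:
    processed: List[int] = field(default_factory=list)
    skipped: List[int] = field(default_factory=list)
    failures: int = 0
    stopped_reason: str = "done"


def run_once(candidates: Iterable[int],
             generate_alt: Callable[[int], None],
             file_exists: Callable[[int], bool],
             excluded: Set[int],
             state: Dict[str, bool],
             max_runtime_s: float = 30.0,
             max_failures: int = 5,
             clock: Callable[[], float] = time.monotonic) -> RunResult:
    result = RunResult()
    if state.get("paused"):
        result.stopped_reason = "paused"  # an earlier run tripped the failure limit
        return result
    deadline = clock() + max_runtime_s
    for media_id in candidates:
        if clock() >= deadline:
            result.stopped_reason = "runtime limit"  # the next run picks up the rest
            break
        if media_id in excluded or not file_exists(media_id):
            result.skipped.append(media_id)  # exclusion flag or missing source file
            continue
        try:
            generate_alt(media_id)
            result.processed.append(media_id)
        except ProviderError:
            result.failures += 1
            if result.failures >= max_failures:
                state["paused"] = True  # later runs do nothing until this is reset
                result.stopped_reason = "too many failures"
                break
    return result


def flaky_generate(media_id: int) -> None:
    if media_id >= 4:
        raise ProviderError("rate limited")


state: Dict[str, bool] = {}
first = run_once([1, 2, 3, 4, 5], flaky_generate, lambda m: m != 3, {2}, state,
                 max_failures=2, clock=lambda: 0.0)
print(first.processed, first.skipped, first.stopped_reason)  # [1] [2, 3] too many failures
print(run_once([6], flaky_generate, lambda m: True, set(), state).stopped_reason)  # paused
```

Missing files and excluded items are skipped rather than counted as failures, so a local environment without all source files does not trip the pause.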

In practice, moving this out of the bulk UI and into chunked server-side processing solved the issue reliably for us.