Problem/Motivation

Indexing content is slow because we are inserting records sequentially.

Proposed resolution

Update the module to allow batched requests to the Milvus database. This requires a companion patch to the AI module.

Comments

paulsheldrake created an issue.

paulsheldrake’s picture

Issue summary: View changes

a.dmitriiev made their first commit to this issue’s fork.

a.dmitriiev’s picture

Status: Active » Needs review
Issue tags: +AI Initiative Sprint, +AI Product Development

I have created a new MR because the original one also included some changes to the filtering. I think those should be handled in a separate issue, since this one is about batch indexing.

The new MR follows the updated logic from the parent AI Core issue. To avoid breaking changes, a new method, insertBatchIntoCollection, was added to the base class. The new MR implements this method.
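To illustrate why adding a separate method avoids breaking changes, here is a minimal sketch. It assumes the base class ships a default implementation that falls back to single inserts; that fallback is an assumption, not something confirmed in this thread. The class names and simplified signatures (ExampleVdbClientBase, LegacyProvider, BatchingProvider) are hypothetical and are not the AI module's actual API.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-in for the AI Core base class (simplified signatures).
abstract class ExampleVdbClientBase {

  abstract public function insertIntoCollection(string $collection_name, array $data): void;

  // Assumed default: fall back to one insert per record, so providers
  // that never override this method keep working unchanged.
  public function insertBatchIntoCollection(string $collection_name, array $records): void {
    foreach ($records as $record) {
      $this->insertIntoCollection($collection_name, $record);
    }
  }

}

// A provider written before batching existed: no changes needed.
final class LegacyProvider extends ExampleVdbClientBase {

  public int $requests = 0;

  public function insertIntoCollection(string $collection_name, array $data): void {
    $this->requests++;
  }

}

// A provider that opts in by overriding the batch method.
final class BatchingProvider extends ExampleVdbClientBase {

  public int $requests = 0;

  public function insertIntoCollection(string $collection_name, array $data): void {
    $this->requests++;
  }

  public function insertBatchIntoCollection(string $collection_name, array $records): void {
    // One request for the whole batch.
    $this->requests++;
  }

}

$records = [['content' => 'a'], ['content' => 'b']];

$legacy = new LegacyProvider();
$legacy->insertBatchIntoCollection('col', $records);

$batching = new BatchingProvider();
$batching->insertBatchIntoCollection('col', $records);

echo $legacy->requests, ' ', $batching->requests, PHP_EOL;
```

With a fallback like this, callers can always use the batch method, and only providers that override it send fewer requests.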

abarrio’s picture

Attached file: status new, size 2.76 KB

Adding a patch to be used in a project.

abarrio’s picture

Attached file: status new, size 2.76 KB

Adding it again with the .patch extension.

csakiistvan’s picture

Assigned: Unassigned » csakiistvan

csakiistvan’s picture

Status: Needs review » Reviewed & tested by the community

Testing report for #3568651 — Batch inserts for improved indexing performance

Environment:

  • Drupal 11.3.9, PHP 8.3, DDEV
  • ai module 1.2.7, ai_vdb_provider_milvus 1.x-dev
  • MR !44 patch applied manually (patch from batching-support-44.patch)

Step 1 — Apply patch and enable module

The patch from MR !44 was applied to web/modules/contrib/ai_vdb_provider_milvus. The module was not yet enabled; enabling it also required pulling in drupal/search_api (not present in the project):

ddev composer require drupal/search_api
ddev drush pm:enable ai_vdb_provider_milvus -y

Both ai_search and search_api were enabled as dependencies. ✅


Step 2 — Verify patch contents are present

Running patch --dry-run confirmed the changes are already applied:

patch --dry-run -p1 < batching-support-44.patch
# → Reversed (or previously applied) patch detected

Both modified files contain the expected changes:

  • src/MilvusV2.php: insertIntoCollection() detects batch vs. single record using isset($data[0]) && is_array($data[0]) and wraps accordingly.
  • src/Plugin/VdbProvider/MilvusProvider.php: new insertBatchIntoCollection() method with retry logic on error code 1100.
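The detection check described above can be sketched in isolation. The helper name normalize_insert_data() is hypothetical and exists only for illustration; only the isset($data[0]) && is_array($data[0]) condition comes from the patch.

```php
<?php

declare(strict_types=1);

/**
 * Normalizes insert data into a list of records.
 *
 * Hypothetical helper for illustration; not the module's actual code.
 */
function normalize_insert_data(array $data): array {
  // A batch is a list whose first element is itself an array (a record).
  $is_batch = isset($data[0]) && is_array($data[0]);
  // A single record is an associative array, so wrap it in a list.
  return $is_batch ? $data : [$data];
}

$single = ['drupal_long_id' => 'node:1:en:0', 'vector' => [0.1, 0.2], 'content' => 'hello'];
$batch = [
  $single,
  ['drupal_long_id' => 'node:2:en:0', 'vector' => [0.3, 0.4], 'content' => 'world'],
];

var_dump(count(normalize_insert_data($single))); // int(1)
var_dump(count(normalize_insert_data($batch)));  // int(2)
```

A check of this shape has two edge cases worth keeping in mind: an empty array is treated as a single record, and a record that happens to have a numeric key 0 holding an array would be treated as a batch.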

Step 3 — Unit-level tests via drush script

Since the project uses PHPUnit 12 while Drupal 11.3 core only ships PHPUnit 11 compatibility traits, the existing MilvusV2Test cannot currently be bootstrapped. Testing was done through a drush php:script with mocked Guzzle/MilvusV2 objects instead.

MilvusV2 batch detection logic (2 scenarios):

// Single record → must be wrapped: data sent as [$record]
$single = ['drupal_long_id' => 'node:1:en:0', 'vector' => [0.1, 0.2], 'content' => 'hello'];
$milvus->insertIntoCollection('col', $single);
// → $request['data'] has 1 element, $request['data'][0] === $single  ✅

// Batch → must NOT be double-wrapped: data sent as [$r1, $r2]
$batch = [['drupal_long_id' => '...', ...], ['drupal_long_id' => '...', ...]];
$milvus->insertIntoCollection('col', $batch);
// → $request['data'] has 2 elements, each element is the original record  ✅

MilvusProvider::insertBatchIntoCollection() (3 scenarios):

Scenario | Expected | Result
API returns code 0 | No exception, returns normally | ✅ Pass
API returns code 1100 (content too long), then 0 | Sanitizes content in each record, retries, succeeds | ✅ Pass
API returns unknown error code with message | Throws exception containing the message | ✅ Pass

All 10 assertions passed.
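The three scenarios can be reproduced with a small stand-alone sketch of the retry flow. This is not the code from MR !44: the function insert_batch_with_retry(), the $send transport callable, and the $sanitize callable are all hypothetical, and the real sanitizing rules live in the module. The success codes 0 and 200 and the retry on code 1100 are taken from this report.

```php
<?php

declare(strict_types=1);

/**
 * Sends a batch, retrying once with sanitized records on error code 1100.
 *
 * @param callable $send
 *   Hypothetical transport: receives the records and returns the decoded
 *   response, e.g. ['code' => 0] or ['code' => 1100, 'message' => '...'].
 * @param callable $sanitize
 *   Hypothetical per-record sanitizer; the module's own rules may differ.
 * @param array $records
 *   The batch of records to insert.
 */
function insert_batch_with_retry(callable $send, callable $sanitize, array $records): void {
  $response = $send($records);

  if (($response['code'] ?? NULL) === 1100) {
    // Sanitize every record in the batch, then retry exactly once.
    $records = array_map($sanitize, $records);
    $response = $send($records);
  }

  if (!in_array($response['code'] ?? NULL, [0, 200], TRUE)) {
    throw new RuntimeException($response['message'] ?? 'Unknown error');
  }
}

// Mocked transport: first call fails with 1100, second call succeeds.
$responses = [['code' => 1100, 'message' => 'content too long'], ['code' => 0]];
$calls = 0;
$send = function (array $records) use (&$responses, &$calls): array {
  $calls++;
  return array_shift($responses);
};

// Placeholder sanitizer for the example only.
$sanitize = static function (array $record): array {
  $record['content'] = trim((string) ($record['content'] ?? ''));
  return $record;
};

insert_batch_with_retry($send, $sanitize, [['content' => ' hello ']]);
echo $calls, PHP_EOL; // prints 2
```

Retrying only once keeps the failure mode simple: if the sanitized batch is still rejected, the second response falls through to the exception branch.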


Observation: AI Core indexItems does not yet call insertBatchIntoCollection

This MR correctly adds insertBatchIntoCollection() to the Milvus provider, and MilvusV2::insertIntoCollection() now handles batch payloads. However, AiVdbProviderClientBase::indexItems() in the ai module still calls insertIntoCollection() once per embedding in a loop:

// ai/src/Base/AiVdbProviderClientBase.php, inside foreach ($items)
$this->insertIntoCollection(
    collection_name: $configuration['database_settings']['collection'],
    data: $data,
    database: $configuration['database_settings']['database_name'],
);

Until the AI Core base class is also updated to collect embeddings into a batch and call insertBatchIntoCollection(), real-world indexing will still insert records one at a time. This MR lays the necessary groundwork on the Milvus side; the performance gain depends on a companion patch in the ai module.
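A rough sketch of what "collect and flush" could look like is shown here. It is not the AI module's code: index_in_batches(), the $insertBatch callable (standing in for insertBatchIntoCollection()), and the default batch size of 100 are assumptions made for illustration.

```php
<?php

declare(strict_types=1);

/**
 * Buffers records and flushes them in chunks instead of one call per record.
 *
 * @param iterable $records
 *   Records already prepared for insertion.
 * @param callable $insertBatch
 *   Hypothetical stand-in for insertBatchIntoCollection().
 * @param int $batchSize
 *   Assumed chunk size; the right value depends on the deployment.
 *
 * @return int
 *   The number of batch calls made.
 */
function index_in_batches(iterable $records, callable $insertBatch, int $batchSize = 100): int {
  if ($batchSize < 1) {
    throw new InvalidArgumentException('Batch size must be at least 1.');
  }

  $buffer = [];
  $flushes = 0;
  foreach ($records as $record) {
    $buffer[] = $record;
    if (count($buffer) >= $batchSize) {
      $insertBatch($buffer);
      $buffer = [];
      $flushes++;
    }
  }

  // Flush whatever is left after the loop.
  if ($buffer !== []) {
    $insertBatch($buffer);
    $flushes++;
  }

  return $flushes;
}

// Five records with a batch size of 2 give three calls: 2, 2, and 1 records.
$sizes = [];
$records = array_map(static fn(int $i): array => ['content' => "chunk $i"], range(1, 5));
$flushes = index_in_batches($records, function (array $batch) use (&$sizes): void {
  $sizes[] = count($batch);
}, 2);

echo $flushes, PHP_EOL; // prints 3
```

Flushing in fixed-size chunks, rather than sending everything at once, keeps each request bounded even when a single index run produces many embeddings.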


Summary

Test | Result
Single record wrapped correctly in API payload | ✅ Pass
Batch records not double-wrapped in API payload | ✅ Pass
insertBatchIntoCollection succeeds on code 0/200 | ✅ Pass
insertBatchIntoCollection retries after 1100 (content sanitize) | ✅ Pass
insertBatchIntoCollection throws on unknown error code | ✅ Pass
AiVdbProviderClientBase::indexItems calls batch method | ❌ Not yet (requires AI Core companion patch)

The MR code is correct and well-structured. To complete the performance improvement, a patch to AiVdbProviderClientBase::indexItems() in the ai module is also needed so that embeddings are collected and flushed as a batch rather than inserted individually.

csakiistvan’s picture

Assigned: csakiistvan » Unassigned