Problem/Motivation

Different models perform their tasks at different speeds. It's good to know how long the tests take and where the bottlenecks are.

Proposed resolution

  • We should time how long the tests take.
  • Even better, we could time how long each step of a test takes (for example, by using Extended Logger and saving the logs or a trace; this may be a follow-up).
  • We should roll that up into the overall results to show how long the whole run took.
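The three points above can be sketched roughly as follows. This is written in Python purely for illustration (the module itself is PHP), and `TestTiming`, `time_step` and `group_duration` are hypothetical names, not part of ai_agents_test: each step is timed with a monotonic clock, each test's duration is the sum of its steps, and the Test Group total is the sum of its tests.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field


@dataclass
class TestTiming:
    """Hypothetical container for one test's step timings, in seconds."""
    label: str
    steps: dict[str, float] = field(default_factory=dict)

    @property
    def duration(self) -> float:
        # A test's duration is the sum of its recorded steps.
        return sum(self.steps.values())


@contextmanager
def time_step(timing: TestTiming, step: str):
    """Record how long the wrapped block takes under the given step name."""
    start = time.perf_counter()  # monotonic, unaffected by system clock changes
    try:
        yield
    finally:
        # Recorded even if the step raises, so failed steps still show a time.
        timing.steps[step] = time.perf_counter() - start


def group_duration(results: list[TestTiming]) -> float:
    """Roll per-test durations up into one total for the Test Group."""
    return sum(r.duration for r in results)


if __name__ == "__main__":
    t = TestTiming("Is Basic Page content type published by default")
    with time_step(t, "agent_run"):
        time.sleep(0.05)  # stand-in for the agent call
    with time_step(t, "evaluation"):
        time.sleep(0.02)  # stand-in for the evaluation model call
    print(f"{t.label}: {t.duration:.3f}s, group total {group_duration([t]):.3f}s")
```

Summing only makes sense if tests run one after another; if they ever ran in parallel, the wall-clock span from the group's first start to its last end would be the more honest total.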

Comments

yautja_cetanu created an issue.

yautja_cetanu

Priority: Normal » Major
yautja_cetanu

Issue tags: +priority
blanca.esqueda

Assigned: Unassigned » blanca.esqueda

Assigning this task to me.
I'll give it a try and will reach out to Bisonbleu or yautja_cetanu with any questions.

blanca.esqueda

For reference:

Bisonbleu: while execution time is not stored, it is displayed when running a Test Group with Drush in this other issue (see screenshots):
https://www.drupal.org/project/ai_agents_test/issues/3538737#comment-162...

bisonbleu made their first commit to this issue’s fork.

blanca.esqueda

Assigned: blanca.esqueda » Unassigned

bisonbleu

Status: Active » Needs work

WIP: I made some progress implementing this using AiAgentStatusPollerService. The latest code in the MR stores a combined group_duration for the Test Group, which can be queried with Drush (it is not yet displayed in the UI):

% ddev drush sql-query "SELECT id,label,FROM_UNIXTIME(created),model,eval_model,group_duration FROM ai_agents_test_group_result ORDER BY changed DESC LIMIT 1"        
12	Basic Pages Test Group	2025-09-26 14:36:35	claude-3-5-haiku-20241022	gpt-4.1-mini	20.7097

I’m now running into some PHPStan errors on GitLab CI that I don’t see locally. They may be related to how PHPStan is set up in the repo…

bisonbleu

Issue summary: View changes
Status: Needs work » Needs review
New attachment: 352.75 KB

Currently, only the Total completion time is displayed, on the Test Group Result page (see screenshot). Please advise where else Test and Test Group completion times should be displayed.

Displaying Total completion time

How to test:

  • Install & Enable ai_agents_test;
  • Download test_basic_pages_test_group.yaml.txt and remove the trailing .txt;
  • Go to admin/content/ai-agents-test/group and import the .yaml file;
  • Go back to admin/content/ai-agents-test/group;
  • In the Actions column, select «Run Test Group»;
  • Configure with your favourite LLM combo;
  • Click «Start a new test run».
yautja_cetanu

Perhaps next to Status for each test.

Even better would be a Gantt-chart-style log where you can visually see the relative time each step takes! But that might be a bit advanced.

Something like: https://claude.ai/public/artifacts/58a06f78-2552-40fc-add8-0eb17f483a02
OR: https://claude.ai/public/artifacts/dd00223a-017f-47c6-9ceb-6c29e741d879

But I think that belongs in a follow-up issue. For now, showing the time each test took next to its Status would be good.
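For the follow-up, the core of a Gantt-style view is small: place each span on a shared timeline relative to the earliest start. The sketch below renders this as text in Python, only to show the idea (a real UI would likely be JavaScript/React). `text_gantt` is a hypothetical helper that takes `(label, start, end)` tuples, such as the epoch start_time/end_time values stored in tool_timings.

```python
def text_gantt(spans: list[tuple[str, float, float]], width: int = 40) -> str:
    """Render (label, start, end) spans as bars on one shared timeline."""
    if not spans:
        return ""
    t0 = min(start for _, start, _ in spans)
    total = max(end for _, _, end in spans) - t0
    scale = width / total if total > 0 else 0.0
    label_w = max(len(label) for label, _, _ in spans)
    lines = []
    for label, start, end in spans:
        # Clamp so every bar fits inside the chart and has at least one cell.
        offset = min(width - 1, round((start - t0) * scale))
        length = max(1, min(width - offset, round((end - start) * scale)))
        bar = " " * offset + "#" * length
        lines.append(f"{label.ljust(label_w)} |{bar.ljust(width)}| {end - start:.2f}s")
    return "\n".join(lines)


if __name__ == "__main__":
    # Illustrative numbers only, not measurements from this issue.
    print(text_gantt([("agent run", 0.0, 12.0), ("evaluation", 12.0, 15.5)]))
```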

bisonbleu

New attachment: 370.05 KB

@yautja_cetanu, the examples you provided are really nice… definitely where React shines!

For now, I went with the minimalist approach: I added a Time column to the Test Group table, as you suggested, and added the Execution Time to a test's details view.
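A Time column needs a compact, consistent format. As one possible convention, and not necessarily what the MR does, a hypothetical `format_duration` helper could show seconds with two decimals under a minute and switch to minutes plus seconds above that (Python, for illustration only):

```python
def format_duration(seconds: float) -> str:
    """Format a duration in seconds for a table cell, e.g. 20.7097 -> '20.71 s'."""
    # Round first so values like 59.999 roll over to "1 min" instead of "60.00 s".
    seconds = round(seconds, 2)
    if seconds < 60:
        return f"{seconds:.2f} s"
    minutes, secs = divmod(seconds, 60)
    return f"{int(minutes)} min {secs:.1f} s"


if __name__ == "__main__":
    print(format_duration(20.7097))  # the group_duration value queried earlier
```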

Test Group with Time

bisonbleu

After the latest commit, tool_timings data is properly returned:

% ddev drush sql-query --database=default "select id,label,tool_timings from ai_agents_test_result where tool_timings !='{  }' order by id DESC"

31	Is Basic Page content type published by default	-\n  tool_id: toolu_01WqiMS9NuMAtji8xgF2HiCP\n  tool_name: get_content_type_info\n  start_time: 1759417803.2968\n  end_time: 1759417803.2978\n  duration: 0.001\n
30	Is Basic Page content type promoted by default	-\n  tool_id: toolu_01B1mWfygYaT1CzyCKyY2Rr7\n  tool_name: get_content_type_info\n  start_time: 1759417794.2072\n  end_time: 1759417794.2106\n  duration: 0.003\n
29	Is Basic Page content type sticky by default	-\n  tool_id: toolu_011xwm93CDbmEh6VBNw4zRww\n  tool_name: get_content_type_info\n  start_time: 1759417778.0349\n  end_time: 1759417778.04\n  duration: 0.005\n
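Each tool_timings entry above has tool_id, tool_name, start_time, end_time and duration, and in all three rows duration equals end_time minus start_time rounded to three decimals. A minimal Python sketch that produces entries of the same shape, assuming epoch-second timestamps and millisecond rounding inferred from these rows (`record_tool_call` is a hypothetical name, not the module's code):

```python
import time
from contextlib import contextmanager


@contextmanager
def record_tool_call(timings: list, tool_id: str, tool_name: str):
    """Append one tool_timings-style entry around the wrapped tool call."""
    start = time.time()  # Unix epoch seconds, comparable across tests
    try:
        yield
    finally:
        end = time.time()
        timings.append({
            "tool_id": tool_id,
            "tool_name": tool_name,
            "start_time": start,
            "end_time": end,
            # Millisecond resolution, matching the stored sample values.
            "duration": round(end - start, 3),
        })
```

Epoch timestamps make it possible to line up calls from different tests on one timeline; the trade-off is that wall-clock time can jump if the system clock is adjusted mid-run.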

And in case you're wondering, the "u" in toolu_ is not a typo; it's part of Anthropic's tool use ID format.

When Claude (Anthropic's models) uses tools/functions, it generates unique identifiers for each tool call. The prefix toolu_ stands for "tool use" and is followed by a unique alphanumeric string. This is the standard format that Anthropic's API returns when a model decides to call a tool.
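If the tool_timings data ever needs to tell Anthropic tool calls apart from other providers', a simple pattern check is enough. The Python sketch below assumes the shape described above (the literal toolu_ prefix plus an alphanumeric suffix); that shape is observed, not a documented guarantee, so it should not be relied on for anything critical. `is_anthropic_tool_use_id` is a hypothetical helper.

```python
import re

# Assumed shape: literal "toolu_" prefix followed by an alphanumeric suffix.
TOOL_USE_ID = re.compile(r"toolu_[A-Za-z0-9]+")


def is_anthropic_tool_use_id(value: str) -> bool:
    """Return True if value looks like an Anthropic tool use ID."""
    return TOOL_USE_ID.fullmatch(value) is not None


if __name__ == "__main__":
    print(is_anthropic_tool_use_id("toolu_01WqiMS9NuMAtji8xgF2HiCP"))
```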