Overview

This issue reviews the current Canvas AI architecture, based on discussions at DrupalCon Chicago, and identifies ways to improve performance, cost, and user experience.

/files/issues/2026-04-10/Screenshot%20from%202026-04-10%2014-43-01.png

Current Architecture Limitations

In the current architecture, the orchestrator agent is the least informed part of the system. Its only responsibility is deciding which sub-agent to call for a task.

It does not know:

  • What components exist on the current page
  • What components are available in the system
  • The code of a JS component

Because of this, most of the actual work and understanding is pushed to sub-agents.

There is also inefficiency in how input is handled. If a user provides a large input (for example ~1000 tokens), the orchestrator has to repeat and pass the same content again when invoking a sub-agent. This increases token usage and cost.
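One possible way to avoid repeating the input, sketched under assumptions: store the user's text once and pass the sub-agent a short reference instead of the full content. `ContextStore`, `put`, `get`, `build_subagent_call`, and the `ctx:` handle format are hypothetical names for illustration, not part of Canvas AI.

```python
import hashlib


class ContextStore:
    """Hypothetical in-memory store for large inputs shared between agents."""

    def __init__(self):
        self._items = {}

    def put(self, text: str) -> str:
        # Content-addressed handle: the same text always maps to the same key.
        handle = "ctx:" + hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]
        self._items[handle] = text
        return handle

    def get(self, handle: str) -> str:
        return self._items[handle]


def build_subagent_call(store: ContextStore, user_input: str, task: str) -> dict:
    # The orchestrator emits only the short handle, not the full input.
    return {"task": task, "input_ref": store.put(user_input)}


store = ContextStore()
long_input = "Build a landing page with a hero and three cards. " * 100
call = build_subagent_call(store, long_input, "create_page")
# The sub-agent (or the tool layer in front of it) resolves the handle itself.
resolved = store.get(call["input_ref"])
```

With this shape the orchestrator's output contains a 16-character handle regardless of how long the user input is.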

The current flow also has UI issues. When a sub-agent completes a task, its response is shown in the chat, and then the orchestrator returns a similar response. Page updates happen only after the final orchestrator response. The result is duplicate messages and a confusing flow.

When building a page, the sub-agent generates a large YAML structure for components. If there is any mistake (invalid prop, missing required field), validation fails and the whole response is generated again. There is no caching or partial retry, which increases response time.
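A partial retry could look roughly like the sketch below: validate each component on its own and regenerate only the ones that fail. The schema in `REQUIRED_PROPS`, the component dictionaries, and the `regenerate` callback (a stand-in for an LLM call) are all assumptions made for illustration; the real Canvas validation rules are not shown here.

```python
from typing import Callable

# Hypothetical schema: component type -> set of required props.
REQUIRED_PROPS = {"hero": {"title"}, "card": {"title", "body"}}


def validate(component: dict) -> list[str]:
    """Return a list of error strings; empty means the component is valid."""
    required = REQUIRED_PROPS.get(component["type"])
    if required is None:
        return [f"unknown component type: {component['type']}"]
    missing = required - set(component.get("props", {}))
    return [f"missing required prop: {name}" for name in sorted(missing)]


def repair_invalid(components: list[dict],
                   regenerate: Callable[[dict, list[str]], dict],
                   max_attempts: int = 2) -> list[dict]:
    """Keep valid components as-is; regenerate only the ones that fail."""
    result = []
    for component in components:
        attempts = 0
        errors = validate(component)
        while errors and attempts < max_attempts:
            component = regenerate(component, errors)
            errors = validate(component)
            attempts += 1
        if errors:
            raise ValueError(f"could not repair {component['type']}: {errors}")
        result.append(component)
    return result


# Stand-in for an LLM call that fixes one component given its errors.
calls = []


def fake_regenerate(component: dict, errors: list[str]) -> dict:
    calls.append(component["type"])
    props = dict(component.get("props", {}))
    for error in errors:
        props[error.split(": ")[1]] = "placeholder"
    return {**component, "props": props}


page = [
    {"type": "hero", "props": {"title": "Welcome"}},
    {"type": "card", "props": {"title": "Fast"}},  # missing "body"
]
fixed = repair_invalid(page, fake_regenerate)
```

Here only the card is sent back for regeneration; the valid hero is reused untouched.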

Proposed Direction

The orchestrator should be able to see and understand the system better.

It should have access to:

  • The current state of the page
  • Available components
  • Basic structure and constraints of components
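As a rough illustration of what such access might look like, the sketch below renders page state and component constraints into a compact summary that could be placed in the orchestrator's prompt. `ComponentSpec`, `OrchestratorContext`, and `to_prompt` are hypothetical names, and the summary format is an assumption, not an existing Canvas AI structure.

```python
from dataclasses import dataclass, field


@dataclass
class ComponentSpec:
    name: str
    required_props: list[str]
    optional_props: list[str] = field(default_factory=list)


@dataclass
class OrchestratorContext:
    page_components: list[str]  # names of components on the page, in order
    available: list[ComponentSpec]

    def to_prompt(self) -> str:
        """Render a short text summary instead of full component source."""
        lines = ["Page: " + (", ".join(self.page_components) or "(empty)")]
        lines.append("Available components:")
        for spec in self.available:
            required = ", ".join(spec.required_props) or "none"
            lines.append(f"- {spec.name} (required: {required})")
        return "\n".join(lines)


context = OrchestratorContext(
    page_components=["hero"],
    available=[
        ComponentSpec("hero", ["title"]),
        ComponentSpec("card", ["title", "body"], ["image"]),
    ],
)
summary = context.to_prompt()
```

Only names and required props are included, which keeps the summary small while still letting the orchestrator reason about what exists and what is allowed.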

For complex tasks, such as creating a full page, the agent should present a plan to the end user and give real-time feedback on progress. Related issues:

  • #3547238: Follow up from 3531000 - Create a true plan for the end-user
  • #3546907: Implement Two-Step Agentic Flow with Planning Phase

Sub-agents should only focus on doing a specific task (for example creating or updating part of a page) without needing to handle everything.

Instead of generating the entire page in one go, the system should move towards smaller steps where parts of the page are created and shown as they are ready.
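A minimal sketch of the incremental idea, assuming the page plan is an ordered list of section names and that each section can be generated independently. `build_page_incrementally` and `fake_generate_section` are hypothetical; the latter stands in for a sub-agent call.

```python
from typing import Callable, Iterator


def build_page_incrementally(plan: list[str],
                             generate_section: Callable[[str], dict]) -> Iterator[dict]:
    """Yield each section as soon as it is generated, not after the whole page."""
    for step in plan:
        yield generate_section(step)


# Stand-in for a sub-agent call that produces one section.
def fake_generate_section(step: str) -> dict:
    return {"type": step, "props": {"title": step.title()}}


rendered = []
for section in build_page_incrementally(["hero", "features", "footer"],
                                        fake_generate_section):
    # In a real UI this is where the section would be added to the page preview.
    rendered.append(section["type"])
```

Because the function is a generator, the caller can show the hero before the features section has even been requested.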

Context passing should also be reduced so we are not repeating the same large inputs across steps.

Potential Improvements

  • Make the orchestrator aware of page state and available components
  • Reduce repeating large inputs when calling sub-agents
  • Do not show sub-agent responses directly in the UI
  • Avoid full regeneration when validation fails
  • Support smaller, incremental updates instead of full page generation

Proposed resolution

User interface changes

Issue fork canvas-3584087


Comments

akhil babu created an issue. See original summary.


shubham.prakash made their first commit to this issue’s fork.

akhil babu’s picture

Title: Canvas AI: Review the current architecture » [Discuss] Canvas AI: Review the current architecture
d34dman’s picture

Hi @akhil babu,

I ran into similar issues during early iterations while working on the agentic workflow builder for FlowDrop. Even though it's a different domain, the failure modes are quite close:

- hallucinated field names,
- half-applied state changes,
- prompt injection through user content (unintentional misguidance),
- runaway token costs as state grows,
- what I like to call the "what ID did you just create?" round-trip problem (a kind of catch-22 situation)

So a lot of engineering went into designing the system around the prompt.

What worked for us (it's not perfect, but we achieved orders-of-magnitude improvement over a purely agentic system):

- Choose the right LLM output format for the use case (a domain-specific language instead of JSON tool calling)
- Slim the context aggressively by supplying information on a need-to-know basis
- Defend against prompt injection in chat history
- Define predictable identifiers, so the LLM can plan multistep changes in a single response without round trips
- Make agent actions atomic and reversible
- "Manually encoding priority failures into the system prompt"

If there is enough interest in the solution, I am available to collaborate on this front.
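To make the points about predictable identifiers and atomic actions concrete, here is a minimal sketch. It is not FlowDrop's implementation; the operation format, the `hero-1` style ids, and `apply_atomically` are assumptions chosen for illustration. The plan names its own ids up front, and the batch is applied to a copy so a failure leaves the original state intact.

```python
import copy


def apply_atomically(page: dict, operations: list[dict]) -> dict:
    """Apply all operations to a copy; return it only if every one succeeds."""
    draft = copy.deepcopy(page)
    for op in operations:
        if op["op"] == "add":
            if op["id"] in draft:
                raise ValueError(f"duplicate id: {op['id']}")
            draft[op["id"]] = {"type": op["type"], "props": {}}
        elif op["op"] == "set_prop":
            if op["id"] not in draft:
                raise KeyError(f"unknown id: {op['id']}")
            draft[op["id"]]["props"][op["name"]] = op["value"]
        else:
            raise ValueError(f"unknown op: {op['op']}")
    return draft


page = {}
# The plan names "hero-1" up front, so the second step can refer to it
# without a round trip to learn a server-generated id.
plan = [
    {"op": "add", "id": "hero-1", "type": "hero"},
    {"op": "set_prop", "id": "hero-1", "name": "title", "value": "Welcome"},
]
new_page = apply_atomically(page, plan)

# A failing batch leaves its input untouched, so the previous state
# doubles as the undo point.
bad_plan = [
    {"op": "add", "id": "card-1", "type": "card"},
    {"op": "set_prop", "id": "card-2", "name": "title", "value": "Oops"},
]
try:
    apply_atomically(new_page, bad_plan)
    failed = False
except KeyError:
    failed = True
```

Keeping the prior state object around is the simplest form of reversibility; a production system would more likely record inverse operations or revisions.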