--- AI TRACKER METADATA ---
Update Summary: Iteration agent on created components
Check-in Date: MM/DD/YYYY (US format) [When we should see progress/get an update]
Due Date: MM/DD/YYYY (US format) [When the issue should be fully completed]
Blocked by: [#XXXXXX] (New issues on new lines)
Additional Collaborators: @username1, @username2
AI Tracker found here: https://www.drupalstarforge.ai/
--- END METADATA ---

Background

We have been experimenting manually with the Agent Explorer and the component creator agent to see whether a subagent that looks at the image of the original component and at the current state of the generated component produces better results. After three tests, it's obvious that it does.

One example:

We start with this image and ask the agent to generate it:

original component image

The first iteration as the agent works at the moment generates this:

First iteration

We then pass this screenshot and the original image to a subagent and ask it to describe what needs to change. It comes up with:

### Key Changes Needed for testcomponent

1. **Background**
- Add a blue abstract gradient with glowing/light overlay effects using custom CSS. Keep Tailwind's bg-black as a fallback.

2. **Typography & Layout**
- Split the large heading into two lines with stronger sizing and spacing.
- Add a glowing or outer shadow/text-shadow effect to the hero text for emphasis (custom CSS).
- Increase spacing between all major elements.

3. **Top Badge**
- Update the "NEW" badge with a more vivid blue and subtle shadow/glow (custom CSS).
- Increase badge and padding roundedness to look more pill-like.

4. **Button**
- Make the button more pill-shaped, larger, with a blue glow/inner shadow (custom CSS).
- Lighter button font color.

5. **Supporting Text**
- Subheadline should be slightly lighter, add subtle blue tint if possible via CSS.

We then feed this back into the agent together with the two images, tell it to make these changes, and it comes up with:

Second iteration

Add another step and it's at:

Third iteration

So it's clear that this improves the results.

Overview

We should build this in so the agent can automatically loop over its results until it's happy. There are some questions that need to be answered, though:

We can do this via the frontend, since the iframe that previews the component is on the same origin - a tool like html2canvas should work. This however means that the loop or instruction has to be started over or triggered via the frontend, instead of the normal agent loop.

One option is that the agent returns a value for how well the new component fits the screenshot; if it's under a threshold and the maximum number of loops hasn't been reached, it just sends it back for another iteration.

Since each iteration costs money, another option is that after you generate a component from a prompt or image, there is a button in the chatbot or on the preview that says something like "Improve quality" and runs another loop.

Tested and not a viable solution.

We can also do this via the backend, but then we need an endpoint where we can render the unsaved component. On top of that, we need headless Chrome or something similar on the backend that can take screenshots, or alternatively a service that can run a proxy so it can work against your local machine.

The advantage of backend generation is that you will be able to create components over MCP (and through it Claude Code, Roo Tools, Cursor etc.) and also create a starterkit module that can pre-generate 10-15 components for you when you start a new website.

Proposed resolution

  • Figure out which solution, if any, is feasible to deploy. We will also have a way for agents to get supplemental features via recipes in AI 1.3.0, so it doesn't have to ship with XB AI core.
  • ☑ Experiment with html2canvas and XB to verify that it works. Result: it does not work.

User interface changes

TBD

Comments

marcus_johansson created an issue. See original summary.

marcus_johansson’s picture

Attached files: new (21.21 KB), new (310.42 KB)

I tested html2canvas with a manual button against the Experience Builder iframe, and the results are not 1-to-1, so it's not a viable solution.

vs

It's also stated in the documentation:

Why doesn't CSS property X render correctly or only partially?
As each CSS property needs to be manually coded to render correctly, html2canvas will never have full CSS support. The library tries to support the most commonly used CSS properties to the extent that it can. If some CSS property is missing or incomplete and you feel that it should be part of the library, create test cases for it and a new issue for it.

I will build the headless Chrome tool, because that is needed for other reasons anyway. The question then is how it can access the component via localhost. I will try to figure this out.

marcus_johansson’s picture

Issue summary: View changes
marcus_johansson’s picture

Issue summary: View changes

yautja_cetanu’s picture

Follow up

  • Need to turn it into a proper agent now file upload hasn't happened.
  • Make it so we can swap out the back-end service for looking at the images.
  • Make it so we can configure the number of loops
  • UX work - What does the user see while it's doing its steps? (It has to rewrite the code for the component and rewrite the output. Will a user see it updating? Will it be greyed out so they can't make manual changes?)
yautja_cetanu’s picture

Issue tags: +AIInitiative, +Needs design
marcus_johansson’s picture

So, the current setup for anyone taking over:

There are three components to this agent:

1. The preview controller.
This is a controller that can preview the latest version of a JS component rendered within the frontend theme. It takes width and height parameters, in percent or pixels. The reasoning is that you can request the component rendered with the frontend theme at the exact same measurements, so the screenshot can be compared against the original.
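
A rough sketch of how width and height values in percent or pixels could be parsed and turned into a query string. This is TypeScript for illustration only (the real controller is PHP), and the function names, the `width`/`height` query parameter names, and the rule that a bare number means pixels are all assumptions, not the controller's actual API.

```typescript
// Hypothetical representation of one preview size parameter.
type Dimension = { value: number; unit: "px" | "%" };

// Parse strings such as "800px", "800" or "50%".
// Assumption for this sketch: a bare number is treated as pixels.
function parseDimension(raw: string): Dimension | null {
  const match = /^(\d+(?:\.\d+)?)(px|%)?$/.exec(raw.trim());
  if (!match) return null;
  const value = Number(match[1]);
  if (value <= 0) return null;
  const unit = match[2] === "%" ? "%" : "px";
  return { value, unit };
}

// Build the query string for a hypothetical preview route.
function buildPreviewQuery(width: string, height: string): string {
  const w = parseDimension(width);
  const h = parseDimension(height);
  if (!w || !h) throw new Error("Invalid width or height");
  const params = new URLSearchParams({
    width: `${w.value}${w.unit}`,
    height: `${h.value}${h.unit}`,
  });
  return params.toString();
}
```

Normalising the values up front means the screenshot tool and the controller agree on the exact viewport, which is the point of requesting "the exact same measurements".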

2. The Rate XB Components tool.
Currently it takes the file ID and the component ID, uses Chromium to visit the controller above and take a screenshot, then asks (inside the tool) for a rating from 1 to 10 of how similar the two images are and for what improvements can be made, and returns both.

Pre-requirements:
This solution needs Chromium installed, and chrome-php.

What is missing:
On the controller we should add support for one-off permissions. This means that the agent should be able to request and get an authentication code that can only be used once. Let me see if I can fix this.
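
To make the one-off permission idea concrete, here is a minimal sketch of a single-use token store. It is TypeScript for illustration only (the real thing would live in the PHP controller), and `OneTimeTokenStore`, its in-memory `Map`, and the 60 second lifetime are all hypothetical choices, not anything that exists in the module.

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical in-memory store for single-use preview tokens.
class OneTimeTokenStore {
  // token -> expiry timestamp in milliseconds
  private readonly tokens = new Map<string, number>();
  private readonly ttlMs: number;

  constructor(ttlMs: number = 60_000) {
    this.ttlMs = ttlMs;
  }

  // Issue a token the agent can attach to its preview request.
  issue(now: number = Date.now()): string {
    const token = randomUUID();
    this.tokens.set(token, now + this.ttlMs);
    return token;
  }

  // Returns true only the first time a known, unexpired token is presented.
  consume(token: string, now: number = Date.now()): boolean {
    const expiry = this.tokens.get(token);
    if (expiry === undefined) return false;
    // Single use: the token is removed whether or not it has expired.
    this.tokens.delete(token);
    return now <= expiry;
  }
}
```

The agent would call `issue()` before launching Chromium and the controller would call `consume()` on the incoming request, so a leaked preview URL cannot be replayed.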

Now that we can use images, it should be possible to turn this into a tool that a sub-agent uses instead.

The last part is to rerun the agent until you get a result rated over N within at most A loops. A rating of 6 worked quite well with the prompt above. I had that working in a long-timeout format, but with the iteration issue it should be possible to see the result of each pass.
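
The loop described here could look roughly like the sketch below. The `generate` and `review` callbacks are stand-ins for the component agent and the rating tool, and every name in the block is hypothetical. It also assumes "good enough" means a rating at or above the threshold, and it keeps the best-rated version in case a later pass scores lower.

```typescript
// Hypothetical result of one review pass: a 1-10 similarity rating plus notes.
interface Review {
  rating: number;
  improvements: string;
}

interface LoopResult<T> {
  component: T; // best-rated component seen so far
  rating: number; // its rating
  iterations: number; // number of review passes that ran
}

// Regenerate until the rating reaches `threshold` or `maxLoops` reviews have run.
async function iterateUntilGoodEnough<T>(
  generate: (feedback: string | null, previous: T | null) => Promise<T>,
  review: (component: T) => Promise<Review>,
  threshold: number = 6,
  maxLoops: number = 3,
): Promise<LoopResult<T>> {
  let component = await generate(null, null);
  let best = component;
  let bestRating = 0;
  let iterations = 0;

  while (iterations < maxLoops) {
    const { rating, improvements } = await review(component);
    iterations++;
    if (rating > bestRating) {
      best = component;
      bestRating = rating;
    }
    // Stop when the result is good enough or the loop budget is spent.
    if (rating >= threshold || iterations === maxLoops) break;
    component = await generate(improvements, component);
  }

  return { component: best, rating: bestRating, iterations };
}
```

Capping the loop matters because every pass costs an extra model call, so the threshold and `maxLoops` are the two knobs worth making configurable.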

kristen pol’s picture

Issue tags: -AIInitiative +AI Initiative

Switching to the correct tag

akhil babu’s picture

Assigned: Unassigned » akhil babu
akhil babu’s picture

Currently, the code component creation agent returns the component data (id, js, css etc) in JSON format and based on that, the actual component is created from the UI.
But to review the component on the fly before sending the final output, the component must exist. This means we should create it directly from the backend.

akhil babu’s picture

Version: 0.x-dev » 1.x-dev

akhil babu changed the visibility of the branch 1.x to hidden.

akhil babu’s picture

As mentioned in #12, component creation is currently happening from the UI. For the review to work, the component must be created from the backend. However, in that case, the CSS and JS will not be compiled properly, since the compilation happens in the frontend and the compiled CSS and JS are then sent to the backend. Because of this, the component will not be rendered in the preview controller.

Also, when Chromium is used, the component is rendered as seen by an anonymous user. XB does not apply styles for anonymous users. The Create/Edit XB page content permission is required for this. Even if the permission is granted, the issue described above still prevents the component from being previewed.

Also, currently the component agent doesn't actually 'see' the component image. The image is given only to the orchestrator agent, and the orchestrator 'describes' the image to the component agent. I think the output of the first iteration would be more similar to the uploaded image if the image were actually given to the component agent. A few options that come to mind are:

  • Create a tool for the component agent, get_image_content, that simply returns the output of file_get_contents($image_url).
  • Subscribe to Drupal\ai\Event\PreGenerateResponseEvent to check if the orchestrator has an image input → store the image ID somewhere → then attach the image when the component agent is triggered, using the same event subscriber.

I tried option 2. The Drupal\ai\OperationType\GenericType\ImageFile class currently only contains the filename, mime type, and file binary data. Perhaps an issue could be created for the AI module to also include the file id for images.

yautja_cetanu’s picture

For people not into AI: basically, we need to find a way to automatically generate an accurate screenshot of a component.

- The component is built in React by writing JavaScript, CSS, etc.
- We need some way to take the component we see rendered and automatically take a screenshot of it, without the user having to take the screenshot themselves.
- We need to send that screenshot to a server-side API for processing (in this case an AI agent looking at it) while thinking carefully about any security implications.

yautja_cetanu’s picture

Attached file: new (9.52 MB)

Andrew has created a video of the new htmltocanvas pro:

xb-ai-canvas.mp4

It shows a very simple way of this working, but it's not perfect: you can see the colour of "New Introducing Framer forms" isn't quite right.

andrewbelcher’s picture

Attached file: new (191.96 KB)

improved canvas vs component with foreignObjectRendering

This is with the following option:

{
  foreignObjectRendering: true
}

I'm not sure of the implications for browser support. This is the description from the docs:

Whether to use ForeignObject rendering if the browser supports it

andrewbelcher’s picture

Attached file: new (206.42 KB)

rendering via screen capture API

Here is another approach using the screen capture API.

There is an issue of blur in this. That may not affect the LLM's looping as it may not care about image resolution/quality. However, I believe it is to do with DPI of the source vs the target canvas, which I expect is resolvable (perhaps this article would help).

I think we'd also want to look at finding a way to resize the iframe to the "correct" size to avoid scrolling etc, which the canvas approach bypasses.

Again some rough and ready code:

// Rough sketch: capture the current tab and crop the frame to the preview iframe.
async function capturePreviewIframe(): Promise<void> {
  const iframe = document.querySelector('iframe[title="XB Code Editor Preview"]') as HTMLIFrameElement | null;
  if (!iframe) {
    console.error("Preview iframe not found");
    return;
  }

  let captureStream: MediaStream | null = null;

  try {
    // Prompts the user for permission to share the current tab.
    captureStream = await navigator.mediaDevices.getDisplayMedia({
      video: {
        displaySurface: "browser",
      },
      audio: false,
      // @ts-ignore
      preferCurrentTab: true,
      selfBrowserSurface: "include",
      systemAudio: "exclude",
      surfaceSwitching: "include",
      monitorTypeSurfaces: "include",
    });

    // Grab a single frame from the shared tab.
    const track = captureStream.getVideoTracks()[0];
    // @ts-ignore
    const imageCapture = new window.ImageCapture(track);
    const bitmap = await imageCapture.grabFrame();

    // Crop the frame to the iframe's bounding box (coordinates are CSS pixels).
    const canvas = document.createElement("canvas");
    canvas.width = iframe.offsetWidth;
    canvas.height = iframe.offsetHeight;
    const bounding = iframe.getBoundingClientRect();
    (canvas.getContext("2d") as CanvasRenderingContext2D).drawImage(bitmap, bounding.left, bounding.top, iframe.offsetWidth, iframe.offsetHeight, 0, 0, iframe.offsetWidth, iframe.offsetHeight);

    // Show the result on the page for visual comparison.
    const img = document.createElement('img');
    img.src = canvas.toDataURL('image/png');
    img.style.position = 'absolute';
    img.style.top = '10px';
    img.style.left = '10px';
    img.style.border = '2px solid green';
    document.body.appendChild(img);
  } catch (err) {
    console.error(`Error: ${err}`);
  } finally {
    // Stop sharing once the frame has been grabbed.
    captureStream?.getTracks().forEach((t) => t.stop());
  }
}
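
If the blur really does come from the source and target using different pixel densities, one possible fix is to scale the iframe's rectangle from CSS pixels into the captured frame's pixels before cropping. The helper below is a sketch of that idea only. `toBitmapRect` is a hypothetical name, it assumes the captured frame covers exactly the tab's viewport, and it has not been tried against XB.

```typescript
interface Rect {
  left: number;
  top: number;
  width: number;
  height: number;
}

interface Size {
  width: number;
  height: number;
}

// Map an element rectangle in CSS pixels to source coordinates in the captured frame.
// Assumption: the frame shows exactly the viewport, so the ratio of frame size to
// viewport size is the effective device pixel ratio.
function toBitmapRect(rect: Rect, viewport: Size, bitmap: Size): Rect {
  const scaleX = bitmap.width / viewport.width;
  const scaleY = bitmap.height / viewport.height;
  return {
    left: Math.round(rect.left * scaleX),
    top: Math.round(rect.top * scaleY),
    width: Math.round(rect.width * scaleX),
    height: Math.round(rect.height * scaleY),
  };
}
```

In the capture code, the result would be used as the source rectangle for `drawImage`, with the viewport taken from `window.innerWidth` and `window.innerHeight`, and the canvas sized to the scaled width and height so no resampling happens.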

andrewbelcher’s picture

Attached file: new (9.69 MB)

xb-ai-review-2.mp4 is a quick video of the change in merge request !1462.

At the very least it would need a better icon, and a review of my noddy React code. If we want to try and get this merged, it might want a feature flag for now. We were thinking that we might try and support a few different approaches, giving scope for server side headless browser where available as a more reliable and direct alternative. So a setting with options which allow us to control what we do, with a default of "disabled", might be a good approach.

lauriii’s picture

I'm wondering if we're approaching images and implementing designs the right way. Even with the latest changes, the design produced doesn't seem to match the original image closely enough.

I also don't think it makes sense that the image is visible in the chat. It's confusing to see a new image appear there that I didn't upload myself.

I'm also wondering if we should upload multiple images because some issues could happen on a specific breakpoint?

yautja_cetanu’s picture

I'll bring this up at the XB AI meetup, but I think we should create another module to experiment with this. It could be a sandbox module or something else, and we could maybe bring it back in later. I think we need to find out how to make something that physically works before figuring out the ideal UI.

There are three approaches:

  • Use the html2canvas approach from the example above - This is the simplest and provides high-resolution screenshots, but they are sometimes inaccurate (incorrect fonts, background colours, etc.).
  • Use the browser screenshot tool - This requires the user to click a button giving permission for the screen to be shared with XB AI, like when you share a screen in Google Meet. It provides accurate images, but they are low resolution and grainy (which I think will be fine; AI doesn't seem to do better with higher-res images, as they take up more tokens).
  • chrome-php or Playwright, server side with a headless browser - This is the method most likely to work, and it will be necessary if we use agents working in the background, so we want to do this eventually. But it's the hardest to host and make work. (Agents need to direct a browser.)

I think we need to make a module that allows us to switch between the 3 above so we can try prompt engineering to get it to work well. When testing a review agent using the ChatGPT playground, we got the design to look pretty good after one iteration. As you can see above, with the canvas approach it didn't look good after one iteration (but that might just be the prompt). When we have a better idea of what works, we can think about the different aspects of the UI we have to worry about:

  • Accuracy of the image - One reason we made it upload the screenshot into the chat is that the screenshot is sometimes inaccurate. This gives the end user the ability to see and understand why it's not quite working. (But obviously, if we can make the screenshots accurate, we don't need this.)
  • How long this takes - Review agents significantly increase the cost and time it takes to do anything. We may need some UI that allows people to choose (like ChatGPT has thinking mode vs not), or can we make it happen quickly?
  • Why are these review agents happening, and what prompts should fire them off or not? In the demo above we side-stepped this by giving the user a button. But using a Figma screenshot to make an exact duplicate is unlikely to be a major way the chatbot is used; if that is our goal, a dedicated screenshot-to-Drupal tool is probably better. It's more likely that users will upload images for inspiration, which would change things a little. (Maybe you don't want a review agent, as you want user feedback quickly.)

So my proposal is:

We create a module that lets you pick between the 3 approaches, so we can try prompt engineering to figure out which works best. We can then use that information to answer the UX questions above (e.g. breakpoints).
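
As a rough illustration of what "pick between 3 approaches" could mean in code, here is a sketch of a switchable screenshot service with a "disabled" default, as suggested earlier in the thread. `ScreenshotMode`, `ScreenshotService` and the provider signature are hypothetical names for this sketch, and the actual capture logic for each approach is left to whatever gets registered.

```typescript
// Hypothetical setting values; "disabled" is the default.
type ScreenshotMode = "disabled" | "html2canvas" | "screen_capture" | "headless_browser";

// A provider takes a component ID and resolves to a screenshot (e.g. a data URL).
type ScreenshotProvider = (componentId: string) => Promise<string>;

class ScreenshotService {
  private readonly providers = new Map<ScreenshotMode, ScreenshotProvider>();
  private mode: ScreenshotMode = "disabled";

  // Register the implementation for one approach.
  register(mode: ScreenshotMode, provider: ScreenshotProvider): void {
    this.providers.set(mode, provider);
  }

  // Switch the active approach, e.g. from a settings form.
  setMode(mode: ScreenshotMode): void {
    this.mode = mode;
  }

  // Returns null when reviews are disabled, otherwise delegates to the active provider.
  async capture(componentId: string): Promise<string | null> {
    if (this.mode === "disabled") return null;
    const provider = this.providers.get(this.mode);
    if (!provider) {
      throw new Error(`No provider registered for "${this.mode}"`);
    }
    return provider(componentId);
  }
}
```

Keeping the review prompt identical and swapping only the provider would make it easier to tell whether a bad result comes from the screenshot method or from the prompt.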

akhil babu’s picture

Assigned: akhil babu » Unassigned
catia_penas’s picture

Issue tags: +strategic evolution
catia_penas’s picture

Issue summary: View changes
yautja_cetanu’s picture

Title: Iteration agent on created components » Visual Review and Iteration agent on created components

Project: Experience Builder » Drupal Canvas

Experience Builder has been renamed to Drupal Canvas in preparation for its beta release. You can now track issues on the new project page.

rakhimandhania’s picture

Issue tags: +AI Page Generation