Test multiple AI LLMs [#3545198]

Problem/Motivation

We have mostly been testing against Gemini. We should also test the module against other LLMs, such as those from OpenAI and Anthropic.

Steps to reproduce

Test either the dev version or the alpha version with other LLMs. Configure those LLMs for 'Chat' and 'Chat with Structured Response'.

Remaining tasks

OpenAI

Install and enable the module
Install, enable, and configure the OpenAI provider module (https://www.drupal.org/project/ai_provider_openai)
Test the migration process (docs to be provided)

Anthropic

Install and enable the module
Install, enable, and configure the Anthropic provider module (https://www.drupal.org/project/ai_provider_anthropic)
Test the migration process (docs to be provided)

Comments

Comment #1

5 September 2025 at 20:49

dmundra created an issue. See original summary.

Comment #2

dmundra

he/him

English

commented 5 September 2025 at 20:49

Issue summary updated.

Comment #3

dmundra commented 5 September 2025 at 21:09

Assigned: Unassigned » dmundra

Comment #4

dmundra commented 10 September 2025 at 23:03

Issue summary updated.

Comment #5

dmundra commented 10 September 2025 at 23:04

Working on testing this with API keys set for both OpenAI and Anthropic.

Comment #6

dmundra commented 10 September 2025 at 23:51

OpenAI sort of worked, but the HTML was stripped out of one post and the other post was truncated. This was with the gpt-3.5-turbo model.

Here are the outputs:

AI migration response for url https://accessibility.civicactions.com/posts/delivering-digital-first-turning-21st-century-idea-into-action:
Array
(
    [title] => Delivering Digital First: Turning 21st Century IDEA into Action
    [created] => 1664534400
    [changed] => 1664534400
    [langcode] => Array
        (
            [value] => en-US
            [language] => English
        )

    [status] => 1
    [promote] => 1
    [sticky] => 
    [default_langcode] => 1
    [revision_default] => 
    [revision_translation_affected] => 
    [path] => Array
        (
            [alias] => /posts
            [pid] => 
            [langcode] => en
        )

    [field_author] => Array
        (
            [0] => Mike Gifford
            [1] => Emily Ryan
        )

    [field_post_content] => Array
        (
            [value] => What 21st Century IDEA Means to CivicActions Our team has embraced many of the key elements of The 21st Century Integrated Digital Experience Act (IDEA), which became a US law 5 years ago (December 2018). This was a bipartisan act to encourage government agencies to build a framework and requirements for a digital-first public experience, with an emphasis on accessibility, building mobile-friendly user experiences, digitizing services and forms, and improving the overall customer experience. The legislation was designed with specific recommendations for a variety of senior executive agency roles including Chief Information Officers. At a high level, the guidance was designed to push agencies and the overall administration to modernize the federal government. Included in the IDEA act is a handful of strategic goals calling on agencies to: Modernize their websites (which includes being mobile-friendly) Digitize services and forms Transition to standardized and centralized shared services Improve overall customer experience (CX) Additionally, the Office of Management and Budget (OMB) Memo M-23–22, Delivering a Digital-First Public Experience, came out this past fall (September 2023), which expanded on the 21st Century IDEA act. The OMB memo clarified what the 21st Century IDEA act meant by the concept of modernization. The OMB memo affords greater insight into what OMB and the federal government expects agencies to deliver to the public. This OMB Memo gives specific mechanisms by which agencies should be improving their CX. 
Specifically, agency sites should: Provide a feedback mechanism for users to report satisfaction or dissatisfaction with each web page or piece of web content Test online content with the intended target audience before and after publishing Examine websites and digital services to ensure content is written and implemented so that Limited English Proficiency (LEP) users can meaningfully access those services CivicActions has extensive experience working within these strategic goals found in the IDEA act and the OMB memo, and we have brought this expertise to agencies such as the US Department of Veterans Affairs (VA), Centers for Medicare and Medicaid Services (CMS), the National Science Foundation (NSF), and the Department of Education (ED). The work that we do is accessible, responsive, mobile performant, and frequently exceeds the requirements of Section 508, thereby ensuring the solutions create the best customer experience while adhering to the needs of the public. Modernizing websites We understand that a modern digital experience fully incorporates smartphones. Our team works with our clients to build a good mobile first experience. We know from the US Census data that millions of Americans do not have a desktop computer. Data from Analytics.USA.gov shows that more than half of the traffic to the sites monitored is coming from either iOS or Android users. To ensure we're building the best mobile-first experience, CivicActions works extensively with Drupal and the US Web Design System (USWDS). We maintain the Drupal USWDS Theme, and also implement it for our clients. This allows us to confidently build modern digital experiences for multiple devices, screen sizes and assistive technology, through a proven mobile-first strategy. By leveraging the USWDS federal design system and open source platforms such as Drupal, we improve the customer experience (CX) of our clients' sites. 
In addition, we bring content strategy, user research, and visual design into our processes to ensure they meet the needs of the public and the agencies we serve. Our practitioners are well-versed in using human-centered design (HCD) approaches and we balance this with addressing the needs of each agency and their goals. Finally, our focus on accessibility allows us to build sites that are future compatible, providing an experience that works across all of the public. Digitizing services and forms There is an ongoing challenge to digitize government services. Agencies are struggling to move beyond legacy PDF forms. At CMS and VA, we have done extensive work on building customized web forms. These forms follow best practices, and provide a user-friendly interface for citizens to provide information to the government. Such approaches also help to ensure information gathering is done in an accessible and usable manner, creating time-saving data collection methods for agencies and reducing mistakes and processing time. Transitioning to standardized and centralized shared services (including open source) CivicActions has extensive experience in open source communities, like the USWDS. We are connected into the USWDS community, and our teams contribute to improving the USWDS framework by actively engaging with the community through collaboration in GitHub and Slack as well as through annual conferences, trainings, newsletters, podcasts, webinars and other opportunities. Where we can, we bring our deep Drupal experience and solutions to our client's challenges back into the USWDS. This is a critical part of seeing that we continue to provide robust digital services in the future. This work with Drupal supports the goal of future transitions to centralized shared services. Building open source solutions with Drupal create structured content which lends itself to standalone and/or federated systems. 
Content can be published once, and blended into the pages of multiple sites, thus reducing the cost of maintaining repetitive content. Governments around the world are seeing the benefits of centralizing on mature open source technology like Drupal. Working with Drupal also provides us an advantage both in accessibility and mobile support. Drupal has been built to exceed the needs of Section 508 for both the public facing and author facing interfaces. With 25% of the population having a disability, we know that authors also need accessible interfaces. Because of our work in the Drupal community, we are able to engage with global experts on a range of topics. Improve overall customer experience (CX) As mentioned above, we have an experienced team with an understanding of user research, content design and service design work. CivicActions understands the need for user research and testing. Our human- centered and data-driven design team works to align with the USWDS and the specific design systems and style guides of our government clients. CivicActions' content design team is also working to establish plain language text that is easy to understand and more accessible to the average user. We know people come to government websites in order to find information and we work with our clients to build an experience that allows users to quickly find what they need. The OMB's policy guidance (M-23–22) embraces the idea that search is an important piece of a digital first public experience. It is important that government pages are optimized for search engines, but also that they have a meaningful internal search capacity. Our team has experience customizing the search experience in Drupal for our clients. There is a lot of customization available within Drupal, but we have also leveraged Search.gov services. We are working with our clients to find ways to embed dynamic experience for our users in their sites. 
We understand that government often requires users to complete complex tasks and we continue to utilize best practices that align to the IDEA act and the OMB's memo in order to simplify their workloads. Accessibility and Digital-First Public Experiences Finally, for the first time, per the OMB memo, agencies are directly encouraged to do the following: "Apply the most current Web Content Accessibility Guidelines (WCAG) published by the World Wide Web Consortium (W3C) to websites and web applications, where possible." "Include automated scanning, manual testing of websites, and usability testing with people with disabilities, as well as testing with users of adaptive technologies." "Incorporate the needs of individuals with disabilities into the design and development of websites and digital services, and should include individuals with disabilities in usability testing of new tools or features." The guidance from the OMB re-enforces the need for government agencies to focus more on accessibility. The Spring 2023 Section 508 Program Maturity Report, demonstrated accessibility is not being consistently well managed across government agencies. Citizens with disabilities should be able to expect that they can engage with government agencies. This has been understood as a right for citizens for over 30 years, yet it is still a challenge. The OMB Memo provides very specific guidance on how agencies should be addressing accessibility. Accessibility is at the heart of CivicActions mission; our team is taking a leadership role in Drupal's accessibility, and is engaged in several other open source projects that focus on accessibility as well. Our staff includes people with disabilities, and through our Champions Program and Accessibility Onboarding, we ensure that this value is understood by every team member. Where We Are Going Our team is continuing to innovate in digital government. 
Our work with the open source communities, like the USWDS & Drupal, allow us to regularly engage with other teams, projects, and agencies. By engaging with others, we learn and contribute to the development of best practices. We can then bring these approaches back to our clients, improving the experience for all. Reach out if you are interested in learning more about our approach — we'd love the opportunity to talk about what we can do together.
            [format] => text
        )

    [field_post_date] => Array
        (
            [value] => 2024-01-31
            [format] => date
        )

    [field_publishing_information] => Array
        (
            [value] => Delivering Digital First: Turning 21st Century IDEA into Action was originally published in CivicActions on Medium, where people are continuing the conversation by highlighting and responding to this story.
            [format] => text
        )

    [type] => simple_content_migration
)

AI migration response for url https://accessibility.civicactions.com/posts/prioritizing-accessibility-bugs-for-maximum-impact:
Array
(
    [langcode] => Array
        (
            [value] => en-US
            [language] => English
        )

    [status] => 1
    [title] => Prioritizing accessibility bugs for maximum impact
    [created] => 1659302400
    [changed] => 1659302400
    [promote] => 1
    [sticky] => 
    [default_langcode] => 1
    [field_author] => Array
        (
            [0] => Jack Haas
        )

    [field_post_content] => Array
        (
            [value] => Juggling competing priorities, tight deadlines, and endless iterations can make it easy to deprioritize accessibility considerations...
            [format] => text
        )

    [field_post_date] => Array
        (
            [value] => 2025-04-01
        )

    [field_publishing_information] => Array
        (
            [value] => Prioritizing accessibility bugs for maximum impact was originally published in CivicActions on Medium, where people are continuing the conversation by highlighting and responding to this story.
        )

    [type] => simple_content_migration
)
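
Both symptoms in the outputs above (no HTML tags left in one post, a body ending in "..." in the other) can be checked automatically when comparing models. The following is a minimal Python sketch, not part of the module; the function name `content_problems` and the labels it returns are hypothetical, and it assumes the value of field_post_content has already been extracted as a string.

```python
import re

# Matches an opening HTML tag such as <p> or <img src="...">.
TAG_RE = re.compile(r"<[a-zA-Z][^>]*>")


def content_problems(value: str) -> list[str]:
    """Return labels for suspicious migrated body content (hypothetical helper)."""
    problems = []
    # No tags at all suggests the model stripped the HTML.
    if not TAG_RE.search(value):
        problems.append("no_html")
    # A trailing ellipsis suggests the model summarized or cut the text short.
    if value.rstrip().endswith(("...", "\u2026")):
        problems.append("truncated")
    return problems


if __name__ == "__main__":
    sample = "Juggling competing priorities can make it easy to deprioritize accessibility considerations..."
    print(content_problems(sample))
```

A check like this only flags likely problems; a post that is legitimately plain text would also be reported as "no_html".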

Comment #7

dmundra commented 10 September 2025 at 23:57

gpt-4.0 looked better but generated these errors when trying to migrate the HTML:

TypeError: Drupal\ai_migration\Service\AiMigrationCacheBinProvider::setPromptResponse(): Argument #1 ($response) must be of type array, false given, called in /var/www/html/src/AiMigrator.php on line 212 in Drupal\ai_migration\Service\AiMigrationCacheBinProvider->setPromptResponse() (line 52 of /var/www/html/src/Service/AiMigrationCacheBinProvider.php)
#0 /var/www/html/src/AiMigrator.php(212): Drupal\ai_migration\Service\AiMigrationCacheBinProvider->setPromptResponse()
#1 /var/www/html/src/Plugin/migrate_plus/data_parser/Ai.php(116): Drupal\ai_migration\AiMigrator->convert()

Failed to decode cleaned AI JSON response: { "data": { "type": "node--simple_content_migration", "id": "prioritizing-accessibility-bugs-for-maximum-impact", "attributes": { "title": "Prioritizing accessibility bugs for maximum impact", "created": 1604321962, "changed": 1604321962, "field_author": ["Jack Haas"], "field_post_date": { "value": "2025-04-01" }, "field_post_content": { "value": "<p>By Jack Haas, Front End Engineer</p><p><img src='https://example.com/max/1024/1*Ss7GsqaiWJYjA_4fkagEVQ.jpeg' alt='Close-up of a computer screen displaying lines of color-coded programming code.' /></p> <p>Juggling competing priorities, tight deadlines, and endless iterations can make it easy to deprioritize accessibility considerations. In our work, we've seen how neglecting accessibility can lead to frustrating user experiences and other consequences. That's why we decided to <a href='https://github.com/readme/guides/fix-accessibility-bugs'>treat accessibility issues with the same urgency as any other bug.</a></p> <p>Enter the Accessibility Bug Classification Matrix, which we created to help us quickly prioritize issues based on their impact. 
It provides a clear framework for promptly identifying and addressing the most critical accessibility barriers.</p> <h3 id='features-of-our-bug-classification-matrix'>Features of our bug classification matrix</h3> <ul> <li>Adapted from industry best practices</li> <li>Covers a wide range of Web Content Accessibility Guidelines (WCAG) criteria</li> <li>Prioritizes issues based on user impact severity</li> <li>Provides clear action steps for each priority level</li></ul> <h3 id='overview'>Overview</h3> <p>As our team grows and projects become more complex, we've realized the value of having a centralized process and systemic approach to prioritizing accessibility bugs as they are discovered.</p> <p>Inspired by the impact classifications developed by accessibility experts at Deque, we adapted their framework to fit our workflows and priorities. The result is a helpful tool that helps our teams quickly assess and prioritize accessibility issues based on their true severity and impact on users.</p><p><strong>This matrix isn't just for accessibility experts.</strong> We designed it to be a resource for all team members, regardless of the accessibility knowledge. By providing clear definitions and examples for each priority level, we're empowering everyone on our team to make informed decisions about what needs to happen when a bug is discovered.</p><p>It's important to note that this isn't meant to be a rigid, exhaustive list of every possible accessibility issue. We recognize that the field of accessibility is constantly changing, and new challenges emerge.</p> <h3 id='accessibility-as-a-team-sport'>Accessibility as a team sport</h3> <p>We're fostering a culture of shared responsibility by making this resource available to the whole team. Accessibility shouldn't be limited to a few experts. 
Whether you're a project manager triaging a backlog of issues, a QA team member doing release testing, or a designer who has noticed something odd when reviewing how their work was implemented, our <a href='https://accessibility.civicactions.com/guide/defect-priority'>Accessibility Bug Classification Matrix</a> provides a common language and clear guidance.</p> <h3 id='assessing-failure-levels-and-issue-priorities'>Assessing failure levels and issue priorities</h3> <p>We've divided the matrix into five levels, each with its own set of criteria and corresponding remediation priority.</p> <h3 id='level-1-page-level-interference-single-a-wcag-criteria'>Level 1: Page-level interference (Single A WCAG criteria)</h3> <p>Think of this as the 'drop everything and fix it now' category. Issues at this level are the most severe, as they fundamentally block users with disabilities from accessing essential content or functionality.</p><p><img src='https://cdn-images-1.medium.com/max/696/0*lQ7GlQN8eGmycoN2' alt='Table of criteria showing page-level interference: the highest level of user impact which should be addressed immediately.' /> See the full table of criteria in the <a href='https://accessibility.civicactions.com/guide/defect-priority'>Accessibility Bug Classification Matrix</a>.</p> <p>When we encounter issues like these,
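
The trace above shows a non-array value reaching setPromptResponse() after the JSON response, which ends mid-sentence, failed to decode. The general guard is to decode first and skip caching when decoding fails. The Python sketch below illustrates that pattern only; it is not the module's PHP code, and `decode_ai_response` and `cache_if_valid` are hypothetical names.

```python
import json
from typing import Optional


def decode_ai_response(raw: str) -> Optional[dict]:
    """Decode a raw AI response, returning None when it is not a JSON object."""
    try:
        decoded = json.loads(raw)
    except json.JSONDecodeError:
        # Truncated or otherwise malformed JSON ends up here.
        return None
    return decoded if isinstance(decoded, dict) else None


def cache_if_valid(cache: dict, key: str, raw: str) -> bool:
    """Store the decoded response under key; report whether it was stored."""
    decoded = decode_ai_response(raw)
    if decoded is None:
        # Nothing is cached, so the caller can retry or log the failure.
        return False
    cache[key] = decoded
    return True


if __name__ == "__main__":
    cache = {}
    truncated = '{"data": {"value": "<p>When we encounter issues like these,'
    print(cache_if_valid(cache, "post-1", truncated), cache)
```

Returning a boolean instead of raising lets the caller decide whether a failed decode should abort the migration row or trigger a retry.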

Comment #8

dmundra commented 11 September 2025 at 00:02

Tried Anthropic with the Claude 3.7 Sonnet model and got curl errors and overloaded errors. Will try again tomorrow, maybe with another model.

Comment #9

dmundra commented 11 September 2025 at 16:30

OpenAI gpt-4.1 did better, but I noticed that it set the text format to plain_text for one post and to full_html for the other. I presume prompt adjustments will be needed per model.
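
One way to spot this kind of drift when comparing models is to collect the text formats each field received across a batch of responses and flag fields with more than one. This is a rough Python sketch under the assumption that each response is a dict shaped like the outputs earlier in this issue; `formats_by_field` and `inconsistent_fields` are hypothetical names.

```python
from collections import defaultdict


def formats_by_field(responses: list[dict]) -> dict[str, set]:
    """Map each field name to the set of 'format' values seen for it."""
    seen = defaultdict(set)
    for response in responses:
        for name, field in response.items():
            # Only fields shaped like {"value": ..., "format": ...} are counted.
            if isinstance(field, dict) and "format" in field:
                seen[name].add(field["format"])
    return dict(seen)


def inconsistent_fields(responses: list[dict]) -> list[str]:
    """Return the names of fields that received more than one text format."""
    by_field = formats_by_field(responses)
    return sorted(name for name, formats in by_field.items() if len(formats) > 1)


if __name__ == "__main__":
    batch = [
        {"title": "A", "field_post_content": {"value": "x", "format": "plain_text"}},
        {"title": "B", "field_post_content": {"value": "<p>y</p>", "format": "full_html"}},
    ]
    print(inconsistent_fields(batch))
```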

Going to try Anthropic one more time.

Comment #10

dmundra commented 11 September 2025 at 16:39

Status: Active » Needs review

Tried Anthropic and it failed on both migrations with timeouts, but it still created one page, with the same text format mapping issues.

Comment #11

majorrobot commented 11 September 2025 at 20:01

Thanks for testing @dmundra. I'll take a look, too.

I assume you were using stable versions of the OpenAI and Anthropic provider modules? I wonder if it would make any difference to use dev versions.

Comment #12

dmundra commented 11 September 2025 at 23:18

Yes, using the stable versions:

        "drupal/ai_provider_anthropic": "^1.1",
        "drupal/ai_provider_openai": "^1.1"

Comment #13

dmundra commented 12 September 2025 at 20:48

Ah, the issue with Anthropic is that max tokens is capped at 2048 when returning values; that was the reason for the failures.
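
A capped output like this can be detected from the raw API response rather than inferred from broken JSON: the Anthropic Messages API reports `stop_reason` as "max_tokens", and the OpenAI Chat Completions API reports a `finish_reason` of "length". The Python sketch below assumes the raw response body is available as a dict, which may not be the case through the Drupal provider modules; `hit_token_limit` is a hypothetical name.

```python
def hit_token_limit(response: dict) -> bool:
    """Report whether a raw API response was cut off by the output token cap."""
    # Anthropic Messages API: top-level stop_reason.
    if response.get("stop_reason") == "max_tokens":
        return True
    # OpenAI Chat Completions API: finish_reason on each choice.
    choices = response.get("choices") or []
    return any(choice.get("finish_reason") == "length" for choice in choices)


if __name__ == "__main__":
    print(hit_token_limit({"stop_reason": "max_tokens"}))
    print(hit_token_limit({"choices": [{"finish_reason": "stop"}]}))
```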

Comment #14

dmundra commented 12 September 2025 at 20:51

Created a separate issue for the text format problem.

Comment #15

dmundra commented 19 September 2025 at 20:21

Status: Needs review » Fixed

For testing, we decided to stick with these models:

Regular testing

OpenAI GPT-4.1-mini
OpenAI GPT-5-nano
Gemini 2.5 Flash

Extended testing

Claude 4 Sonnet
Gemini 2.5 Pro
OpenAI GPT-5

Comment #17

dmundra commented 19 September 2025 at 20:22

Status: Fixed » Closed (fixed)
