Problem/Motivation
So far we have tested mostly against Gemini. We should also test the module against other LLMs, such as those from OpenAI and Anthropic.
Steps to reproduce
Test either the dev version or the alpha version with other LLMs, configuring each one for 'Chat' and 'Chat with Structured Response'.
Remaining tasks
OpenAI
- Install and enable the module
- Install, enable, and configure the OpenAI provider module (https://www.drupal.org/project/ai_provider_openai)
- Test the migration process (docs to be provided)
Anthropic
- Install and enable the module
- Install, enable, and configure the Anthropic provider module (https://www.drupal.org/project/ai_provider_anthropic)
- Test the migration process (docs to be provided)
Comments
Comment #2
dmundra
Comment #3
dmundra
Comment #4
dmundra
Comment #5
dmundra
Working on testing this by setting API keys for both OpenAI and Anthropic.
Comment #6
dmundra
OpenAI sort of worked, but the HTML was stripped out of one post and the other post was truncated. This was with the gpt-3.5-turbo model.
Here are the outputs:
Comment #7
dmundra
gpt-4.0 looked better, but it generated these errors when trying to migrate the HTML:
Failed to decode cleaned AI JSON response: { "data": { "type": "node--simple_content_migration", "id": "prioritizing-accessibility-bugs-for-maximum-impact", "attributes": { "title": "Prioritizing accessibility bugs for maximum impact", "created": 1604321962, "changed": 1604321962, "field_author": ["Jack Haas"], "field_post_date": { "value": "2025-04-01" }, "field_post_content": { "value": "<p>By Jack Haas, Front End Engineer</p><p><img src='https://example.com/max/1024/1*Ss7GsqaiWJYjA_4fkagEVQ.jpeg' alt='Close-up of a computer screen displaying lines of color-coded programming code.' /></p> <p>Juggling competing priorities, tight deadlines, and endless iterations can make it easy to deprioritize accessibility considerations. In our work, we've seen how neglecting accessibility can lead to frustrating user experiences and other consequences. That's why we decided to <a href='https://github.com/readme/guides/fix-accessibility-bugs'>treat accessibility issues with the same urgency as any other bug.</a></p> <p>Enter the Accessibility Bug Classification Matrix, which we created to help us quickly prioritize issues based on their impact. 
It provides a clear framework for promptly identifying and addressing the most critical accessibility barriers.</p> <h3 id='features-of-our-bug-classification-matrix'>Features of our bug classification matrix</h3> <ul> <li>Adapted from industry best practices</li> <li>Covers a wide range of Web Content Accessibility Guidelines (WCAG) criteria</li> <li>Prioritizes issues based on user impact severity</li> <li>Provides clear action steps for each priority level</li></ul> <h3 id='overview'>Overview</h3> <p>As our team grows and projects become more complex, we've realized the value of having a centralized process and systemic approach to prioritizing accessibility bugs as they are discovered.</p> <p>Inspired by the impact classifications developed by accessibility experts at Deque, we adapted their framework to fit our workflows and priorities. The result is a helpful tool that helps our teams quickly assess and prioritize accessibility issues based on their true severity and impact on users.</p><p><strong>This matrix isn't just for accessibility experts.</strong> We designed it to be a resource for all team members, regardless of the accessibility knowledge. By providing clear definitions and examples for each priority level, we're empowering everyone on our team to make informed decisions about what needs to happen when a bug is discovered.</p><p>It's important to note that this isn't meant to be a rigid, exhaustive list of every possible accessibility issue. We recognize that the field of accessibility is constantly changing, and new challenges emerge.</p> <h3 id='accessibility-as-a-team-sport'>Accessibility as a team sport</h3> <p>We're fostering a culture of shared responsibility by making this resource available to the whole team. Accessibility shouldn't be limited to a few experts. 
Whether you're a project manager triaging a backlog of issues, a QA team member doing release testing, or a designer who has noticed something odd when reviewing how their work was implemented, our <a href='https://accessibility.civicactions.com/guide/defect-priority'>Accessibility Bug Classification Matrix</a> provides a common language and clear guidance.</p> <h3 id='assessing-failure-levels-and-issue-priorities'>Assessing failure levels and issue priorities</h3> <p>We've divided the matrix into five levels, each with its own set of criteria and corresponding remediation priority.</p> <h3 id='level-1-page-level-interference-single-a-wcag-criteria'>Level 1: Page-level interference (Single A WCAG criteria)</h3> <p>Think of this as the 'drop everything and fix it now' category. Issues at this level are the most severe, as they fundamentally block users with disabilities from accessing essential content or functionality.</p><p><img src='https://cdn-images-1.medium.com/max/696/0*lQ7GlQN8eGmycoN2' alt='Table of criteria showing page-level interference: the highest level of user impact which should be addressed immediately.' /> See the full table of criteria in the <a href='https://accessibility.civicactions.com/guide/defect-priority'>Accessibility Bug Classification Matrix</a>.</p> <p>When we encounter issues like these,
Comment #8
dmundra
Trying Anthropic with the Claude 3.7 Sonnet model produced curl errors and overloaded errors. Will try again tomorrow, possibly with another model.
Comment #9
dmundra
OpenAI gpt-4.1 did better, but I noticed that it set the text format to plain_text for one post and to full_html for the other. I presume prompt adjustments will be needed per model.
Going to try Anthropic one more time.
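One possible stopgap for the inconsistent text format, separate from per-model prompt tuning, is to normalize the format after the response has been decoded. The sketch below is in Python for illustration only. It assumes the decoded response has the JSON:API-like shape shown in comment #7 and that `field_post_content` carries a `format` key next to `value`. The helper name `normalize_text_format` and the tag check are hypothetical and not part of the module.

```python
import re

# Assumption: any opening or closing tag such as <p> or </a> means the value is HTML.
HTML_TAG = re.compile(r"</?[a-zA-Z][^>]*>")


def normalize_text_format(payload, field="field_post_content",
                          html_format="full_html", plain_format="plain_text"):
    """Return a copy of the payload with the text format chosen from the content.

    Hypothetical helper: the field name and format IDs mirror this thread,
    not a confirmed module API.
    """
    attributes = dict(payload["data"]["attributes"])
    content = dict(attributes.get(field, {}))
    value = content.get("value", "")
    # Pick the format from the content instead of trusting the model's choice.
    content["format"] = html_format if HTML_TAG.search(value) else plain_format
    attributes[field] = content
    return {"data": {**payload["data"], "attributes": attributes}}


post = {
    "data": {
        "type": "node--simple_content_migration",
        "attributes": {
            "title": "Example post",
            "field_post_content": {"value": "<p>Hello</p>", "format": "plain_text"},
        },
    }
}
fixed = normalize_text_format(post)
print(fixed["data"]["attributes"]["field_post_content"]["format"])
```

This only papers over the symptom; if the prompt can be made to return a consistent format for every model, that is likely the cleaner fix.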
Comment #10
dmundra
Tried Anthropic and it failed on both migrations with timeouts, but it still created one page, which has the same text format mapping issue.
Comment #11
majorrobot commented
Thanks for testing, @dmundra. I'll take a look, too.
I assume you were using the stable versions of the OpenAI and Anthropic provider modules? I wonder if it would make any difference to use the dev versions.
Comment #12
dmundra
Yes, using the stable versions:
Comment #13
dmundra
Ah, the issue with Anthropic is that max tokens is capped at 2048 for returned values. That was the reason for the failures.
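A capped output budget typically cuts a long JSON response off mid-string, so it no longer decodes. The sketch below, in Python for illustration, checks for that case before blaming the JSON itself. It assumes access to the raw provider response as a dict: the Anthropic Messages API reports `stop_reason` as `max_tokens`, and the OpenAI Chat Completions API reports `finish_reason` as `length`, when output is cut off. The names `was_truncated`, `decode_or_explain`, and `TruncatedResponseError` are hypothetical, and the Drupal provider modules may not expose these fields directly.

```python
import json


class TruncatedResponseError(Exception):
    """Raised when invalid JSON coincides with a response that hit its token limit."""


def was_truncated(response):
    """Report whether a raw provider response stopped at its output token limit."""
    # Anthropic Messages API: top-level "stop_reason" is "max_tokens".
    if response.get("stop_reason") == "max_tokens":
        return True
    # OpenAI Chat Completions API: a choice has "finish_reason" set to "length".
    return any(c.get("finish_reason") == "length" for c in response.get("choices", []))


def decode_or_explain(response, text):
    """Decode the model's JSON text, naming truncation as the likely cause on failure."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as err:
        if was_truncated(response):
            raise TruncatedResponseError(
                "Response hit the max token limit; raise the limit and retry"
            ) from err
        raise


# A response cut off mid-string, as can happen with a low max tokens cap.
anthropic_like = {"stop_reason": "max_tokens"}
try:
    decode_or_explain(anthropic_like, '{"data": {"attributes": {"title": "Prioritizing')
except TruncatedResponseError as err:
    print(err)
```

Separating "truncated" from "malformed" in the error message would make it easier to tell whether a model needs a higher token limit or a prompt adjustment.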
Comment #14
dmundra
Created a separate issue for the text format problem.
Comment #15
dmundra
For testing, we decided to stick with these models:
Regular testing
Extended testing
Comment #17
dmundra