Context: https://dri.es/ai-creates-asymmetric-pressure-on-open-source

We're collecting experiences from contributors experimenting with AI tools. We're interested in two categories:

  1. Tools that help contributors: writing better patches, catching mistakes before submitting, improving documentation, testing your own work
  2. Tools that help maintainers: code review, triage, finding bugs, automated quality checks

What is working? What isn't? Successes and failures are both valuable.

Share your experience in the comments, or link to a blog post if you've written about it. I'll report back on what we learn on my blog or at DrupalCon.

Attached file (comment #7): besian-ai-stats.zip, 36.15 KB, by spec0

Comments

dries created an issue. See original summary.

scott falconer’s picture

One thing that's working for me is using agent skills to guide the AI agents to work the way I want them to work. An example of this is https://github.com/scottfalconer/drupal-contribute-fix.

In my case, I was noticing that agents would often solve a problem by rolling a custom patch - it worked, but that was the end of it.

Once I had a skill that instructed an agent to 1) first check Drupal.org for any similar issues and 2) then notify me that it had found a candidate patch for contribution, it happily worked in a way that benefited both my project and the community.

There are layers to this solution though. As-is, I wouldn't feel comfortable having an agent submit a patch. That would easily overwhelm maintainers by putting the burden of review on them. That may be a solvable problem in the future, i.e. we could have something like "ready for agent review" wherein another dev's agent could test the patch when they themselves run into the same issue. Only after some sort of consensus is it elevated for human review.

breidert’s picture

We are still testing different ways to better integrate AI into our development workflows. Once we have agreed on results, I will post them here.

@Scott Falconer this is great!

ronaldtebrake’s picture

Yesterday I read about AGENTS.md outperforming skills. So we're now trying to find out where that balance should be, as we're using both.

I will report back when we have more information on that.

On top of that, we've been using CodeRabbit as a review tool, and even our AI skeptics are now very happy with it. The quality of the review keeps getting better, it has definitely caught some nasty bugs, and for us it's proven a great first line of review before a second peer reviewer comes into play.


Side note: we're actively working on a starter kit for Drupal development with AI, to get to a shared set of standards, skills, etc. that not only help Drupal development with AI but also increase the quality of the outcome.

Looking forward to learning from others how we can help with contribution and maintenance too; I feel there are great opportunities there.

jonathanshaw’s picture

Given how fast this space is moving, a Slack channel for AI tooling for Drupal developers and contributors would seem like a good place to share knowledge and ideas.

For example, on the main AI channel I shared a skill.md last week that tells the AI how to search the drupal.org GitLab for code examples across all contrib projects.

ghost of drupal past’s picture

Tools that help contributors / maintainers

And what about tools that destroy the environment, destroy democracies, are built and operated by exploiting the poor, and that not only encourage children to commit suicide but tell them how to tie the noose? Oh. Those are the same.

I maintain https://github.com/lobsters/lobsters/blob/main/AGENTS.md is the right thing to do.

spec0’s picture

Attached file: besian-ai-stats.zip, 36.15 KB (new)

Instead of a ban on AI, a control and predictability metric might be a better approach. For example, a Bayesian-framework-based analysis as a measurable quality/reliability signal. Something along these lines.
Ps. Just to clarify, this is an AI-generated rough example; it needs more research and further verification before real-world adoption. The idea is to take a similar approach as a git hook and/or pipeline build step in order to assess AI-generated solutions/MRs, ideally to remove some of the heavy lifting from reviewers and open source maintainers. If this turns out to have potential value, I'll turn it into a real hosted library fetched by Composer.
Ps2. This implementation needs persistent storage to update beliefs based on evidence; otherwise we always start with the same prior belief, which defeats the purpose.
https://en.wikipedia.org/wiki/Bayesian_inference
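To make the Beta-Binomial version of this idea concrete, here is a minimal sketch in Python, including the persistent storage mentioned in Ps2. The class and file names are made up for illustration; this is not the API of the attached besian-ai-stats code.

```python
import json
from pathlib import Path

class PatchQualityBelief:
    """Beta-Binomial belief about the rate at which AI-generated patches
    pass review. State is persisted to disk so evidence accumulates
    across runs; without persistence every run restarts from the same
    prior, which defeats the purpose."""

    def __init__(self, store: Path, prior_alpha: float = 1.0, prior_beta: float = 1.0):
        self.store = store
        if store.exists():
            state = json.loads(store.read_text())
            self.alpha, self.beta = state["alpha"], state["beta"]
        else:
            self.alpha, self.beta = prior_alpha, prior_beta

    def update(self, passed: int, failed: int) -> None:
        # Conjugate update: review passes add to alpha, failures to beta.
        self.alpha += passed
        self.beta += failed
        self.store.write_text(json.dumps({"alpha": self.alpha, "beta": self.beta}))

    def mean(self) -> float:
        # Posterior mean of the pass rate: alpha / (alpha + beta).
        return self.alpha / (self.alpha + self.beta)
```

A git hook or pipeline step could then compare `mean()` against a threshold to decide whether an AI-generated MR deserves human attention.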

cmlara’s picture

Linking #3565917: Proposed guidelines for AI contribution, which is discussing what the D.O. AI policy should be and what tooling use will even be allowed on D.O.

catch’s picture

Linking a previous discussion at #3562505: Prototype a pipeline step for automated AI review of MRs.

Without rehashing previous points made there, there are more automated things we could do on Drupal core and contrib MR pipelines (like generating MRs against the main MR to fix phpstan and phpcs issues and/or regenerating the baseline) that do not rely on LLMs, but the issue title/summary appears to exclude additional tools that are not based on LLMs.

nod_’s picture

Tried to use an LLM to help with credit attribution; not a good idea: https://tresbien.tech/blog/algorithmic-bias-against-drupal-community-val...

penyaskito’s picture

re: #9. Policies might need to change given https://www.drupal.org/drupalorg/docs/marketplace/abuse-of-the-contribut...

Use of automated tools to bulk-post to issues without authorization

depending on the definition of "bulk-post" and "without authorization".

catch’s picture

@nod_ thanks for writing the blog post.

There are two sentences which are very worth highlighting directly here:

Depending on the complexity, activity, and number of contributors involved it can take me anywhere between 30 seconds and 30 minutes to assign credit when I commit an issue to Drupal Core

This is true; assigning issue credit is quite time-consuming for any issue with more than about 5 contributors, and especially for an old issue with lots of comments. This is partly because there's no way to look at an issue or MR comment and assign issue credit directly from it. Instead you have to have two tabs open, one with the checkboxes and one with the issue, and this won't change with the move to GitLab issues.

But this is also equally true:

If successfully implemented we would spend time evaluating the LLM output, not thinking about people that helped improve Drupal.

Even if a human has previously sorted out issue credit on an issue, which sometimes happens, I still need to at least cursorily review it before commit, at minimum to check that it was done recently enough not to be outdated by newer contributions to the issue. If an LLM is able to assign issue credit with a 60% success rate, this helps me exactly 0%, because I would have no idea which is the 60% correct and which is the 40% incorrect. Any kind of 'reasoning' attached to that would just be even more to review on top of what already has to be reviewed.
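The "helps me exactly 0%" point can be put into a toy cost model (the model and all numbers are purely illustrative, not from the comment): when verifying the tool's credit assignment on an issue costs about as much as doing the assignment yourself, a 60%-accurate tool is strictly worse than no tool.

```python
def review_cost(n_issues, t_check, t_assign, accuracy=None):
    """Toy model of committer minutes spent on issue credit.

    Without a tool, every issue costs a full manual assignment. With a
    tool of unknown per-issue correctness, every issue still costs a
    full check (you cannot tell correct from incorrect output), plus a
    redo for the fraction the tool got wrong."""
    if accuracy is None:
        return n_issues * t_assign
    return n_issues * t_check + n_issues * (1 - accuracy) * t_assign

manual = review_cost(100, t_check=10, t_assign=10)                  # 1000 minutes
assisted = review_cost(100, t_check=10, t_assign=10, accuracy=0.6)  # 1400 minutes
```

Only when checking is much cheaper than assigning, i.e. when the output is trustworthy enough to skim, does the tool start to pay off.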

catch’s picture

@penyaskito I don't think we'd change the policy for a phpcs fixer (or rector one), rather we'd want explicit permission from the DA to do it within the existing policy - this is also how we added the 'needs review' bot that @nod_ runs. Even the bots that we've very carefully set up and discussed before implementing still occasionally wreak carnage on the issue queue.

Although we have a much better track record than Google does on GitHub.

penyaskito’s picture

@catch Totally agree, but we should be explicit about it depending on the output of this, as we are with "bots".

AFAIK bots are allowed in very specific, occasional scenarios, after community scrutiny, run by known and trusted community members, and never posting in their name.

As the IS mentions "issue triage", I think it's worth mentioning here (and btw I personally hope this policy _doesn't_ need to change and that AI agents are treated the same way we treat bots now).

catch’s picture

@penyaskito oh I only meant in the context of phpcs fixer or rector (i.e. existing non-LLM linting tools), not anything else discussed here; all of that definitely needs to be looked at in the context of the policy. For me, one of the most disheartening things on GitHub is refreshing an issue I opened a couple of hours earlier to find 5 comments that were all automated, often telling me things I could see in a pipeline result anyway. Drupal.org almost completely avoids this: while we have some automated messages, they only operate when issues are becoming stale.

andypost’s picture

GitHub is also looking at how to block shitty contributions: https://github.com/orgs/community/discussions/185387

jrockowitz’s picture

I have been thinking about this discussion for the past week, and I appreciate the possibility of using AI to help wrangle the issue queue and the contribution credit system. I can't help but come back to using AI to make it easier for people to contribute to Drupal and to help expand and sustain our community.

As a community, we have recently reimagined the site builder and content creator experience to leverage AI and make our product more inviting to prospective site owners and builders. We need to take an equivalent approach to welcoming prospective developers and contributors. Recent college graduates can't find junior developer positions due to AI. What if, as a community, we used AI to make contributing to Drupal easier and more inviting? For example, we could improve our documentation, examples, and code so that LLMs understand our code and design patterns, and generate high-quality, reviewable, and maintainable code.

We need to set a high bar for what we expect from AI-generated code without sacrificing our ability to be forward-thinking, while not leaving hardworking junior or senior developers behind.

I believe an initial small but significant first step, which originated this discussion, is to consider whether Drupal core should include an AGENTS.md file.

scott falconer’s picture

I think the core shift here is that Drupal now has multiple "actors" doing work, be it humans, bots, agents, or hybrids, and is scaling to new levels. As it does, both within Drupal sites and for the Drupal community in general, the fundamentals don’t change: work must be consistent, reliable, safe, and should provide a net benefit for the community.

What's nice is that the best fixes aren't AI-specific. They're improvements that help everyone, including people who never use AI, e.g.:

- Make implicit conventions explicit (AGENTS.md / clearer CONTRIBUTING / templates).
- Strengthen guardrails (CI, lint, static analysis, test expectations) so that low-quality (or well-intentioned but misaligned) contributions have minimal human impact.
- Ensure automated actors use the same permissions/audit/reversibility/expectations as humans.
- Track a few simple metrics to help us objectively judge outcomes and impact, both positive and negative. (i.e. nod_'s blog post above shows why this is critical.)

To that end, I'd like to propose an evaluation framework for work from any actor that applies in places we're feeling the scaling pressure:

- 1) community contribution workflows (issues/MRs/maintenance)
- 2) runtime site operations (config changes, deployments, site builder tasks, content generation).

If we define a small benchmark of representative tasks in each lane (e.g. Community: "fix this failing test", "resolve this error", "update this config", "credit this contribution"; Runtime: "create a content type / view", "apply an update", "perform an action with auditable/reversible state"), and measure outcomes reproducibly, we can move to evidence generation and track that our guardrails are reducing risk and maintainer/admin/user/agent load before work reaches humans.
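As a sketch of what "measure outcomes reproducibly" might mean in practice (all field names, task names, and numbers below are hypothetical), each benchmark run could be recorded as a small result record and aggregated per lane:

```python
from dataclasses import dataclass

@dataclass
class BenchmarkTask:
    lane: str             # "community" or "runtime"
    name: str             # e.g. "fix this failing test"
    passed: bool          # did the actor's output meet the task's bar?
    human_minutes: float  # human time spent reviewing or correcting it

def summarize(results):
    """Aggregate per lane: success rate plus the human load that the
    guardrails are supposed to be reducing."""
    lanes = {}
    for r in results:
        lane = lanes.setdefault(r.lane, {"n": 0, "passed": 0, "minutes": 0.0})
        lane["n"] += 1
        lane["passed"] += int(r.passed)
        lane["minutes"] += r.human_minutes
    return {k: {"pass_rate": v["passed"] / v["n"],
                "avg_human_minutes": v["minutes"] / v["n"]}
            for k, v in lanes.items()}
```

Tracking `avg_human_minutes` over time would show whether guardrails are actually reducing maintainer/admin load before work reaches humans.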

ronaldtebrake’s picture

+1 on #18

In addition, one thing I've seen recently is the https://agent-trace.dev/ specification, which we could turn into a tool that helps with visibility and traceability, and potentially becomes part of other workflows (e.g. auto-tagging labels).

spec0’s picture

Quick update on my previous comment: https://packagist.org/packages/specoto/besian-ai-stats (needs improvement still, maintainers hello).

https://agent-trace.dev/ looks interesting. However, in my opinion, it is best to treat ANY MR as "AI slop" regardless of disclosure by the author, as a defensive technique.

jonathanshaw’s picture

Interesting statement here https://github.com/openai/codex/discussions/9956 from Codex maintainers about AI and PRs:

... AI coding tools like Codex have made it cheap and fast to generate large volumes of code. While this has been a huge productivity win, it has also put real strain on traditional open-source contribution models.

...we’ve found that reviewing, correcting, and integrating external PRs often takes more effort than implementing the same fixes directly....
As a result, community PRs have increasingly become a bottleneck rather than a force multiplier.
...
We no longer accept unsolicited pull requests.
...
In an AI-accelerated world, code itself is no longer the scarce resource. Understanding the problem, identifying the right solution, and making good prioritization decisions are the hard parts. ...
Once the right solution is clear, the implementation is usually straightforward.
...
Community contributions remain incredibly important to Codex. The highest-leverage ways to help today include:

- High-quality bug reports ...
- Root-cause analysis ...
- Thoughtful feature requests ...
- Feedback ...
- Signals that help us prioritize ...

geek-merlin’s picture

#21: The analysis is thorough and highly relevant.

Let's face it: even before AI, our "contribution point game" was challenged by stochastic-guesswork strategies (in the sense of a lack of thoroughness). AI only scales those strategies.

+1e5 for taking this analysis as the most relevant building block.

jonathanshaw’s picture

Here's an idea:
a "steps to reproduce" tool: an independent, freestanding, web-based investigation tool.
You give it (a) a Drupal issue link and (b) a description of the circumstances in which you've encountered the bug.

It kicks off an agent that works away, trying to create a reproducible test case based on the circumstances you've described, drawing on other relevant existing tests. It systematically explores what circumstances do and don't cause a test failure, establishing the simplest possible reproduction.

It prepares both (a) new tests (at least one failing, possibly some passing too) and (b) a natural-language description of the "steps to reproduce". It emails the reporter when it's done working; they come back and can review what it says. If the reporter believes the agent's analysis, they can post a link to the investigation and the discovered steps to reproduce.
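The "establishing the simplest possible reproduction" step resembles delta debugging. A minimal sketch, assuming the agent can supply a predicate that runs a candidate test case and reports whether the bug still triggers (the predicate and circumstance names here are invented for illustration):

```python
def minimize(circumstances, triggers_bug):
    """Greedy one-minimal reduction: repeatedly drop any single
    circumstance whose removal still reproduces the failure.
    `triggers_bug` is a caller-supplied predicate, e.g. 'build a test
    case from these circumstances and check that it fails'."""
    current = list(circumstances)
    changed = True
    while changed:
        changed = False
        for item in list(current):
            candidate = [c for c in current if c != item]
            if triggers_bug(candidate):
                current = candidate
                changed = True
    return current
```

Each call to `triggers_bug` is a full test run, so the agent would want to cache results, but the loop captures the "systematically explores what circumstances do and don't cause a test failure" idea.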

scott falconer’s picture

@jonathanshaw I have a few agent skills that together start down this path:

https://github.com/scottfalconer/drupal-intent-testing/
It allows an AI agent to drive a real browser, performing the kind of semi-random, exploratory verification a human QA tester would do.

If you also have https://github.com/scottfalconer/drupal-issue-queue installed it'll pull down the specifics from the issue to create the test.

and then if you have https://github.com/scottfalconer/drupal-contribute-fix installed it'll attempt to package that as a fully tested MR / issue comment (while also detecting and triaging any other issues)

I think there are some high-leverage areas your Codex post mentions that we could incorporate.

My dream for these tools is: when an agent runs into an issue, it 1) figures out the root cause, 2) searches the issue queue for related issues, and 3) helps the human contribute effectively to either an existing issue or a new one it opens.

lolandese’s picture

AI-Assisted Decisions: Refactor vs. Rewrite in Contrib Modules

AI can significantly aid the critical decision between refactoring existing contrib code versus a ground-up rewrite. This is particularly relevant for complex modules with extensive legacy codebases that evolved through multiple Drupal versions and maintainers.

Consider modules that originated in Drupal 6, migrated through subsequent versions to Drupal 10+, and accumulated technical debt from numerous maintainers over the years. These modules often lack comprehensive test coverage, making maintenance challenging even with AI assistance. While AI excels at understanding and improving existing code, it can struggle with deeply entangled legacy patterns that resist incremental improvement.

A complete rewrite has become more feasible with AI. Whereas previously such efforts required months of work, AI can now dramatically accelerate the process while maintaining code quality. However, rewrites still face the significant challenge of providing migration paths when the underlying data architecture changes fundamentally.

Practical Example: Tablefield Module

The Tablefield module exemplifies this scenario perfectly. With 14,000+ reported installs on Drupal 8+ sites, it remains popular but presents maintenance challenges. Originally conceived as a Drupal 6 module, it has had 12 maintainers and hundreds of contributors across its lifecycle.

I used AI to generate a comprehensive code analysis. The complete chat session documenting this analysis process is also available for transparency.

Technical Note: When conducting this analysis, AI encountered network restrictions when attempting to access the git.drupalcode.org repository directly. Providing AI with the module tarball worked perfectly as an alternative approach.


nod_’s picture

If that is the case, please take care to disclose AI use when a comment is generated totally or partly with AI, to conform with the d.o contributing guidelines. The type of post above is wordy in a way LLMs typically are.

spec0’s picture

Let's focus less on AI-assistance disclosure and instead focus on the future of AI-saturated MRs pushed to maintainers. For example, MRs could transition from plain code suggestions to detailed spec-driven proposals: no spec, no viable proof of intent toward the community (a subjective author proposal is not sufficient). On top of that, a quality-gate check could make the risk-based decision about human intervention, and close the MR otherwise (sorry if something similar has already been suggested). Closing MRs that do not follow the requirements is easy, because we do not expect human users to open them manually; it is automated. The review process is the only human part, ideally a relaxed "quick check" review based on confidence. A serious risk with this romantic idea, however, is code smuggling and obfuscation. Is this preventable? Could review be expected from any community member instead of only from maintainers, mirroring peer review in academia?
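The triage flow described above could be sketched as a single routing function. The thresholds, field names, and scoring inputs are invented for illustration, not a recommendation:

```python
def triage_mr(mr):
    """Route an incoming MR before any human sees it.

    `mr` is a dict like {"has_spec": bool, "risk": float, "confidence": float},
    where risk/confidence would come from upstream automated checks.
    Returns one of "close", "quick_check", "full_review"."""
    if not mr.get("has_spec"):
        # Automated close is cheap precisely because submission is automated:
        # no human opened this MR, so no human is being rejected.
        return "close"
    if mr["risk"] < 0.2 and mr["confidence"] > 0.8:
        return "quick_check"   # relaxed human review based on confidence
    return "full_review"       # risk-based human intervention
```

The hard part, as noted, is that "risk" scoring must be robust against code smuggling and obfuscation, which a simple gate like this does nothing to address.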

catch’s picture

For example, MRs can transition from just code suggestion to detailed spec driven proposal,

Most MRs are just small bugfixes and don't need a 'detailed spec driven proposal' or anything like it. Requirements like this are an invitation to even more slop that humans eventually have to wade through.

scott falconer’s picture

re: 'detailed spec driven proposal'

There's likely a good middle ground here. One of the benefits of coding "by hand" is that it often forced you to think through problems, edge cases, trade-offs, etc. as you went. This step is very easy to skip now, but that doesn't lessen the need for it to happen.

I think the general expectation should be: before you send off something for review, ensure you've put some sort of cognitive effort into it. Otherwise it's just offloading the thinking burden to someone else.

spec0’s picture

Just to clarify, as this part seems to be misunderstood, @catch engineers are not dealing with "small bugfixes"; it is an automated self-improvement process, and the spec-driven part represents only the desired outcome, the rest is details.

@scott falconer, coding by hand is only valuable if a deep understanding of the system's details is critical to achieving the desired outcome and/or to making predictions about the future consequences of a current decision. There are countless practical reasons why MRs must follow standards and generally be done the "Drupal way"; we know this from painful experience. However, most of this knowledge can be compressed and abstracted away, which gives us a unique opportunity to shift the cognitive effort to places that really count in the long run.

For example, think about the time and effort the community spent on the Canvas module (respect to anyone involved). To put it into perspective, the lack of this feature actually gave many "CMS" systems that offered only canvas-type solutions the opportunity to step firmly into the market. Curiouser and curiouser, it seems the web browser's document layout representation layer will mainly be used by a core of users who are more comfortable with that specific representation (a "back in my days" situation).

Don't get me wrong, I'm not saying that LLMs would have magically prevented any of this if they had been around 10 years ago; definitely not with the current level of cognitive load required from maintainers of the project. My point is that the most efficient way to mitigate all of the above is to progressively abstract away the technical details of the implementation, so the role of the open source maintainer becomes ownership of the end result, hence the "detailed spec driven proposal". Easier said than done; in my opinion, this should be the main focus of the community at the moment.

catch’s picture

Just to clarify, as this part seems to be misunderstood, @catch engineers are not dealing with "small bugfixes"

Oh really? Please explain.

spec0’s picture

@catch, you are right, I was not referring to the current state. This is a reference to hypothetical, romantic, and practically within reach future.

dww’s picture

This is a reference to hypothetical, romantic, and practically within reach future.

You know, that "romantic, and practically within reach future" where the unquenchable demand for data center power and cooling has required such a massive increase in the burning of fossil fuels that the climate becomes basically unlivable for humanity, and where there's no longer enough potable water for human consumption or agriculture. In addition to wars being fought over access to all those fossil fuels, we'll also have wars over drinking water. That "romantic future" (basically already here) where there are massive layoffs of the affluent "white collar" workforce that has been propping up the global economy, ushering in a prolonged period of economic misery and recession as the effects ripple through to everyone else. That "romantic" future where all of human creativity has been scrapped and ground up into sawdust to be glued back together into plywood slop. No thanks!

Over the years, there have been a great many things I disagree with Larry Garfield about, but his recent blog post is completely spot-on:

https://www.garfieldtech.com/blog/selfish-ai

It should be required reading for all y'all "sloptimists" that are hyping this disastrous technology as if it's going to make everything so much easier and better for you personally. At what gargantuan cost? Are you actually willing to pay that price? Really?

I think the general expectation should be: before you send off something for review, ensure you've put some sort of cognitive effort into it. Otherwise it's just offloading the thinking burden to someone else.

That's a nice thought. I wish anyone hyping and using this technology gave 2 $shits about offloading the burden and costs on someone (everyone) else. It's built into the fabric of the technology itself. The entire thing is based on "externalizing" the costs. So yeah, I have 0 faith that the people using LLMs will all of a sudden find their morality and realize that their use of "AI" is costing and burdening other people. Instead of hoping that works, I'd rather do #3574093: Ban slop issue summaries and comments in the (probably vain) hope that Drupal will find a way to be relevant in a few years by boldly standing by our values as a community (that supposedly cares about each other) and declaring "proudly built and maintained by humans".

"The LLMs are getting 'smart' enough to figure out how to operate without destroying the planet and they will find a way to save the economy, too!" -- yeah, right. What could possibly go wrong? Let's burn down the basis for all life on Earth in a final mad dash to try to simulate a better world (or at least further enrich a tiny parasitic minority of already filthy rich tech oligarchs).

After over 3 decades as a computer scientist, I fear I'm about to have to make my exit and go back to being a musician, bodyworker, and farmer. This industry has already ruined itself, and is going to take the whole planet with it. If there's any time and hope left, please, dear reader, Just Say No(tm).

p.s. If you think I'm exaggerating the likely economic impacts:
https://www.theguardian.com/technology/2026/feb/24/feedback-loop-no-brak...

spec0’s picture

@dww well put; these are real problems that it would be ignorant to dismiss. I am not advocating for adopting the latest trend of centralized cloud solutions for LLMs with ridiculous energy requirements. The opposite: where I am going with this is using small models for specific types of tasks, models that can practically run on a single device (in theory) and are rigorously checked and guard-railed by deterministic tools. This is not a replacement for white-collar jobs; furthermore, that is not really under our control either. The "romantic" part here is not "LLMs can do my work so that I can watch cat videos"; it is a future where we benefit from the new technology in a responsible way without being left behind. Neither Drupal, nor any other CMS system can actually sell something like - "Well, it will take 20x more time and effort to build a feature/solution because we are saving the planet". There must be a middle ground here. Is the market going to need computer scientists 5 years from now? Probably not in the form that shapes today's business software engineering. In any case, if maintainers are freed from analyzing AI slop day and night, or from the similar quality processes that take most of their time at the moment, then the discussion about saving the planet can proceed, one step at a time. First we need to prove that the problems you are referring to can be mitigated effectively within a controlled community environment before challenging the whole system at a higher level.

Ps. The rigorous check of LLM output that I described at the beginning of this thread has progressed. Experiments show it allows a much cheaper audit process than just throwing more tokens at LLMs (or yet another cloud subscription), although there is still an LLM-judge step in the process; I cannot find a way to verify the result of the generated code semantically without it, hence the small models working on-premise.
https://gitlab.com/specoto/bayesian-ai-stats-python
Example of the quality gate: https://gitlab.com/specoto/bayesian-ai-stats-python/-/jobs/13445918867
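In the spirit of the quality gate linked above (this is a rough sketch, not the actual library's API), a pipeline step could fail the build when the posterior pass rate of the deterministic checks falls below a threshold:

```python
def gate(passed_checks, total_checks, prior=(1.0, 1.0), threshold=0.7):
    """Pipeline-style quality gate: compute the posterior mean pass
    rate under a Beta prior and return a process exit code, where
    nonzero fails the build. Prior and threshold are illustrative."""
    alpha = prior[0] + passed_checks
    beta = prior[1] + (total_checks - passed_checks)
    posterior_mean = alpha / (alpha + beta)
    return 0 if posterior_mean >= threshold else 1

# e.g. 9 of 10 deterministic checks passed on the generated MR:
exit_code = gate(9, 10)  # 0, so the build proceeds to the human "quick check"
```

A CI job would call this with the check counts from earlier pipeline stages and `sys.exit()` the result, so the LLM-judge step only runs on MRs that clear the cheap deterministic bar.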

cainaru’s picture

I think @nod_’s comment #31 on #3568936-31: Embrace the chaos, add a couple of AGENTS.md file to core is also worth reading and taking into consideration here.

spec0’s picture

As @cainaru pointed out, the code here is both LLM-generated and a mix of deterministically and LLM-curated, which is the whole point: a minimal review process, based on confidence.


spec0’s picture

From further thought on this discussion, I believe the real danger here is labeling any LLM-assisted improvement as dangerous because it is not the "natural" way of doing software engineering; keep in mind that there is nothing "natural" about software engineering in the first place. In this context, let me tell you a story...
Once upon a time, agriculture was not the natural way of providing for the community. It made hunters and gatherers obsolete; the gods didn't accept it, and the priests resisted it! Tribes were divided and went to war to protect the gods' ways.
As time passed and famine struck, the agriculture tribes survived. Hunters became soldiers, to protect the new gatherers, who also became scientists working from home.
The end.

ghost of drupal past’s picture

Neither Drupal, nor any other CMS system can actually sell something like - "Well, it will take 20x more time and effort to build a feature/solution because we are saving the planet

LLMs can, at best, help with code generation. And even that's a huge stretch, because they produce plausible but not correct code, which by its nature makes bugs harder to catch. But anyways: in my near 22 years of writing Drupal patches producing code was never a bottleneck. Not even once. Figuring out the what to code was. Always was. Is this the right solution? Is it documented in a way someone who never saw the functionality can understand it? The most helpful review is always "what if instead we did X". Not a chance an LLM can do that.

I have narrowed the sister issue to "Ban LLM code contributions" because these days even machine translation is mostly done by LLMs (alas) and I do not want to bar non-English speakers from using them to help communicate. No other reason for changing the title from "Ban LLM contributions" to "Ban LLM code contributions".

All the hyperbole does not help here. Beyond the obvious accuracy problems inherent in the technology we have serious ethical concerns and that's what needs to be addressed here. Small models running locally are a fig leaf: they let us pretend there is an ethical way to run these while not talking about all the problems in creating the models. And it's also just pretend, because they are so slow that anyone using them will inevitably move to faster cloud-based solutions eventually, guaranteed.

spec0’s picture

Beyond the obvious accuracy problems inherent in the technology we have serious ethical concerns and that's what needs to be addressed here.

But anyways: in my near 22 years of writing Drupal patches producing code was never a bottleneck. Not even once. Figuring out the what to code was.

@ghost of drupal past, I agree; the "code generation" problem is moving its way into the academic space, and your acknowledgement of this is a step forward. In my opinion, the whole point is to prevent the "move to the cloud" part while there is still time to prove it. This is the call to action, not hyperbole.
Ps. Do you have proof of the problems you are referring to?
For example, show me where the math is bullshit in my LLM-generated example.
Hint: I am not asking for a code review; rather, who is going to really catch this in the future by reading code? The real catch is edge-case testing. So writing/creating an edge test case that breaks the code is the best approach to figuring out what to code.
Ps2. BTW, the quote you made from my previous comment is out of context and suggests you might be aiming for an ad hominem argument, which is even less helpful than my hyperbole...

ghost of drupal past’s picture

Yeah, I do not have much respect for those who are hell-bent on destroying open source, poor communities, democracies, and our climate. Sorry about that. Have you read dww's link? https://www.garfieldtech.com/blog/selfish-ai

spec0’s picture

Yes, a few weeks ago. If I have to be the scapegoat taking all the punches, so be it. Keep in mind that if your doomsaying does not come to pass, the repercussions are equal. Show unambiguous proof (not mathematical, a practical one).

cmlara’s picture

Not Drupal-focused, but this feels relevant to this issue regarding how AI is being used by maintainers, especially with #3573275: Do a rewrite of the module as an associated issue.

https://arstechnica.com/ai/2026/03/ai-can-rewrite-open-source-code-but-c...

spec0’s picture

Ok, just to reiterate: @cmlara is correct regarding rewriting modules just because an LLM agent's review suggested it. Using an LLM review as the motivation to start from scratch is dangerous, because it implies the maintainers did not do their homework and introduced some fragile solution waiting to explode. That is not true; if it were, it would have already broken every practical deployment.

The reason for the "harsh" LLM judgement is the task it was given: review the code and suggest improvements, which is exactly what it did. So far so good. Then the real issue ensues: the subjective interpretation. Maintainers get startled into thinking they did not do their job correctly in the eyes of the community (not true; this is like a new colleague joining your organization and pointing out inefficiencies at first glance, without knowing the details behind the original decisions), and so they decide to rebuild from scratch in order to meet ephemeral standards they might have missed.

The bottom line: if it works, don't touch it. An LLM review is no more than a suggestion based on other, similar examples that most probably have very little to do with the outcome we are looking for; it matches semantic structure to intent. That is not the same thing as "the tablefield module is wrong". The community and real-world applications decide the right/wrong part, not an LLM review.
That being said, when do we use LLM review? It has value mostly when we generate code with LLMs in the first place: self-correction. In my experience, the whole process is an Alice-falling-down-the-rabbit-hole kind of situation, with control seemingly slipping away over time... unless we refuse to let go of it.

The fear that an individual engineer does not have the same level of knowledge as an LLM-driven agent may feel like "I need to step back and let go of control", but this is misleading. In fact, at this point the agent needs guidance more than the engineer does; this is exactly where the bad, useless code gets generated. My point is, all the compressed knowledge in any LLM is meaningless if engineers "step back". So, again, AI agents depend heavily on confirmation. For how long? Not sure, to be honest. At least until any action performed with agents, including physical-world interactions and experiments, can be qualitatively assessed and added permanently for future use by any system that implements the same interface (the holy grail of software engineering, or even of science: automated practical standards focused on discovery and innovation, without the endless committee discussions and biased opinions; a good thing, but only if used openly and responsibly).

The remaining question is: what are we (engineers, humans, and our children) going to do when the above process becomes practical?
Now comes the wild-speculation part (we are not talking about LLMs at this point); feel free to overthink it (suspiciously BS on my part).
Here we go. The real shift is in the paradigm of intellectual work. Humans who want to participate in the objective scientific discovery process without impeding it (like impatient children) will have to make a voluntary decision to "enhance" themselves in order to match the speed of discovery. The cost of this is changing, or losing, their personality; not in the sense of slavery, since humans are not a critical component of the process and cannot be forced, but as a choice and a side effect. The rest of humanity that chooses not to participate has no option but to live a simple human life: climb the highest mountain, fall in love, travel to Jupiter and watch Ganymede's auroras... The secrets of the cosmos, however, are beyond their grasp in the biological sense.
Now, back to the present: too many unknown unknowns. Which means this is not in the hands of the few players that control LLMs (oligarchs, who also become victims of their own vanity). Furthermore, large centralized organizations do not really make sense in the above scenario.
It seems this is all in our hands, and we cannot afford to abstain from the process; focusing on controlling and participating in it while we can is the right course of action, in my opinion, as an alternative to https://www.garfieldtech.com/blog/selfish-ai.
Cheers!

webchick’s picture

Going back to the OG intent of this issue, raising #3557316: PerimeterX blocks legitimate AI tools from accessing issues on Drupal.org as a pretty significant blocker to AI tools being more useful in the Drupal world; an AI tool can't be expected to have sufficient context if it's literally forbidden from accessing it.

As a "what's working," though, tonight I asked Claude to do one of my least favourite maintainer tasks: apply a stale patch (#3581687-6: Guidance on how to write excellent documentation in this case), and it worked! :D

Prompt: there is a file in this directory 3581687-documentation-guidance_0.patch which no longer applies. One reason is there has been changes to AGENTS.md since this patch was rolled, and another is because the how-to-write-documentation.md was moved to a subdirectory. Are you able to nevertheless resurrect this changeset and apply it?

Response (after inspecting some things and formulating a plan):

agents/AGENTS.md — Updated the Documentation entry from the old inline description format to the structured When to use: / Covers: multi-line format (matching the style of the other entries). Kept the current correct path (/SKILL.md). Replaced the em dash from the patch with a comma per project convention.

skills/how-to-write-documentation/SKILL.md — Replaced the placeholder body with the full documentation content from the patch: all 8 sections covering where Drupal docs live, API documentation guidance, core principles, common mistakes, a good example, and how to get help. The existing (richer) frontmatter was preserved as-is since it's more complete than what the patch had.

What's cool about this is it not only "re-rolled" the patch but also knew when to take the new file's conventions over the older one's. It also picked up on guidance from another issue to not use em dashes in its generated text.

What's NOT cool about this is it doesn't know to check the contribution record when generating the commit message, which is how I landed on that PerimeterX issue. :P
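As a point of reference for what the agent is being compared against: the conventional manual fallback for a stale patch is `git apply --3way`, which retries the patch as a three-way merge using the preimage blobs named in the patch's index lines (so it needs to run inside a repo that has those blobs). A minimal sketch in a hypothetical throwaway repo, where the file names and commit messages are made up for illustration; `--check` and `--3way` are real `git apply` flags:

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q
git config user.email you@example.com
git config user.name you

# Commit a baseline, then roll a patch that adds a line.
printf 'line one\n' > doc.md
git add doc.md && git commit -qm 'initial'
printf 'line one\nline two\n' > doc.md
git diff > stale.patch
git checkout -q -- doc.md

# Upstream drifts: the context line the patch relies on is rewritten.
printf 'line 1\n' > doc.md
git add doc.md && git commit -qm 'drift'

# --check reports applicability without touching the tree.
git apply --check stale.patch 2>/dev/null || echo 'patch no longer applies cleanly'

# --3way falls back to a three-way merge against the recorded preimage
# blob; on conflict it leaves markers for a human (or an agent) to resolve.
git apply --3way stale.patch 2>/dev/null || true
grep -q 'line two' doc.md && echo 'hunk recovered'
```

Unlike the agent, of course, `--3way` only recovers hunks mechanically; it has no idea that a target file moved to a subdirectory or that project conventions changed, which is where the AI step above earned its keep.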

webchick’s picture

Here are a couple of demos that have been given as part of the new Drupal AI Learners Club initiative that are relevant here. And both show different ways of ensuring you never submit code you don't understand.

Using AI to solve a Drupal.org issue (Scott Falconer)

https://www.youtube.com/watch?v=QFBCqnn_8C8

In this demo, Scott talks through how he points OpenAI Codex at a Drupal.org issue to solve it, including summarizing the issue, asking questions about the root cause, working back and forth on the proper fix, and having AI verify its own fix to ensure it works before creating a merge request.

Using AI for Visual Regression Testing (Mike Herchel)

https://www.youtube.com/watch?v=CDREM3bxL-0

In this demo, Mike talks through how he used Claude Code to write a visual regression testing framework for the new Drupal 12 admin theme (which went on to find a regression in a Drupal core issue as well), and how he used Claude Code to break a large changeset into reviewable parts.