Guidance for AI-assisted Drupal development using local models [#3581977]

Problem/Motivation

Most conversations about AI-assisted Drupal development (especially at DrupalCon this week) seemed to assume you're sending your code to a commercial API, such as OpenAI, Anthropic, Google.

For some developers and organizations, that's not acceptable. Perhaps this is due to sensitive client data, regulatory constraints, cost, longevity concerns, vendor lock-in, or an ethical commitment to software freedom.

Drupal itself has always been free and open source, but most discussions of AI coding tools seem to default to proprietary services. In most cases, local models are not nearly as good as the big names, but I think it would be great to determine and document the best known (or least bad) practical workflows for AI-assisted Drupal development using only open source and free software.

Steps to reproduce

Try setting up a complete AI-assisted Drupal development workflow using only free and open source software. As far as I can tell, there's no clear guide for how to put them together for Drupal work.

There are some good resources, such as:

Rich Lawson's talks on "Ollama: Using Open source AI for Drupal Development" (Drupal 4 Gov, New England Drupal Camp)
Drupal's Ollama Provider
Articles

Proposed resolution

Create documentation that walks Drupal developers through setting up and using open source AI coding tools with locally-run models. The guide should cover practical workflows, not just theory, and should be specific to Drupal development patterns (module development, theme work, configuration management, hook implementations, plugin creation, etc.).

Here are some tools and resources that might be worth documenting:

Open source AI coding agents:

OpenCode: An open source AI coding agent that works in the terminal, as a desktop app, or as an IDE extension. It supports local models through Ollama and connects to over 75 model providers. It includes Language Server Protocol integration for code intelligence, which helps with PHP and Drupal-specific patterns. MIT licensed.
Aider: An open source terminal-based AI pair programming tool with deep git integration. Every AI edit becomes a git commit with a descriptive message, which makes changes reviewable and easy to revert. Supports local models through Ollama. Apache 2.0 licensed.

Local model infrastructure:

Ollama: Run open source LLMs locally. Supports models like Llama, DeepSeek, CodeLlama, Mistral, and others. Works on macOS, Linux, and Windows.
LM Studio: A desktop application for running local models with a straightforward interface.

Model discovery:

models.dev: An open source database of AI model specifications, pricing, and features. Community-contributed and maintained by the creators of SST. Helpful for comparing models and finding ones that run well locally for coding tasks.

Editor-related stuff:

Drupal-specific considerations:

The guide should address how to configure these tools with Drupal coding standards, how to provide context about Drupal's plugin system, services, and hooks, and how to work with Drupal's directory structure. It should also cover the tradeoffs: local models produce lower quality output than the largest commercial models, and developers should understand where local models work well (boilerplate generation, simple refactors, code explanations) and where they struggle (complex architectural decisions, multi-file coordinated changes in large codebases).

Remaining tasks

Research and test which open source models work best for Drupal/PHP development when run locally
Write step-by-step setup guides for OpenCode + Ollama, or perhaps Aider + Ollama workflows
Document Drupal-specific configuration (coding standards files, context prompts, repository maps)
Test workflows against common Drupal development tasks: creating a custom module, writing tests, building a theme, debugging, etc.
Gather community feedback on additional open source tools that belong in the guide
Review and update as the tooling evolves (this space moves fast)

User interface changes

None. This issue concerns documentation only.

API changes

None.

Data model changes

None.

Comment	File	Size	Author
#26	local-ai-drupal-guide.md	5.57 KB	murrow
#20	local-ai-drupal-guide.md	5.37 KB	murrow
#13	gemma-4-response-time.png	242.94 KB	mtift

Comments

Comment #1

27 March 2026 at 19:10

mtift created an issue. See original summary.

Comment #2

webchick

she/they

English

Vancouver 🇨🇦

commented 27 March 2026 at 19:18

Priority:

Normal

» Major

Yes! Great call-out.

Comment #3

webchick

she/they

English

Vancouver 🇨🇦

commented 28 March 2026 at 06:45

Some thoughts:

The primary intention of this project is to onboard Drupal folks who are new to AI in a way that doesn't burn out maintainers, and to increase and proliferate the Drupal community's shared understanding of AI, as quickly as possible.
That means that rather than presenting the breadth of available options, we want to pick one, canonical approach, that is inline with defacto standards that folks "off the island" who are neck-deep in the AI space use every day (Why? Because we hope to one day attract those people to contribute to Drupal, and it will help a ton if our toolset is already familiar to them).
Right now, that de facto standard in the commercial AI world is Claude Code + Agent Skills. So that's the foundation we're building on in this repo.
However, many folks either can't or won't pay for Claude, and we do not want folks using open source tools to have a "second class" experience (as much as possible).
Therefore, we will evaluate which tools to recommend through the lens of what is the BEST set of open source tools that gets someone closest to a Claude Code + Agent Skills experience without having to spend money? (We can definitely allude to other options, but remember that this is the BEST practices project. :D)

If this is our lens, then based on some research + some back and forth with Claude tonight:

Open source AI coding agents: OpenCode looks like the winner. It is open source (MIT licensed) and supports the same agent skill format as Claude Code so our work here can be used by either tool. This gives both open source + free users and users willing/able to pay a similar experience.
Local model infrastructure: Ollama It's what the community defaults to, OpenCode has native support for it, and it's easy to pull a model and start using it in your workflow.
Local Model: Looks like Qwen3 32B, which can be run locally (albeit gets disappointing results on ).

I did not see an easy way on https://models.dev/ to filter on open / local models vs. not. (I am also VERY tired after DrupalCon so might have missed it. :D) https://www.swebench.com/ does have this info, and while it only scored 40% (womp-womp) Qwen3 32B still scored the highest of the open models.

Despite everything I said up there about canonical blah blah, we might also want to document for this audience an "almost-free" model alternative (DeepSeek or Kimi) that use cheap API calls.

Comment #4

mtift

he/him

English

Minnesota, USA

commented 28 March 2026 at 14:05

I think that is an excellent and worthy goal to pick “pick one, canonical approach.”

As of this week (month, whatever), I would agree that Drupal development with a Claude Pro ($20/month) description is notably better than Drupal development with the free tools. Not only that, but I can be even more productive with a Claude Max 5x ($100/month). And my colleagues report they can be even more productive with a Claude Max 20x ($200/month) plan. I know some Drupal developers who almost always have Claude running multiple agentic workflows, solving real problems. So should that be the “one, canonical approach”? I don’t know.

I talked to some folks from nonprofits this week at DrupalCon who are not allowed to use AI because of concerns about data sovereignty. I talked to others that were only allowed to use the $8/month nonprofit subscription, which they told me had limited features compared to Claude Pro. Other folks I know only have access to Gemini, because that is what comes with their organization’s Google subscription.

Practically speaking, saying “Claude Code” is the canonical approach seems “correct.” It reflects the reality of my experience in the Drupal community. Most Drupal agencies, and the clients they serve, probably use Google/Microsoft for email/calendars/meetings, GitHub for code hosting, Amazee/Acquia/Pantheon/etc. for hosting, Slack for chat, plus a suite of a whole lot of other proprietary services that have a cost associated with them. Adding a Claude Pro subscription probably won’t bother most folks, and is probably the answer they want to know.

FOSS stack

So with all of that said, what I’m talking about is the “one, canonical approach” for people committed to software freedom, and perhaps that belongs in another Drupal project. I’d be totally fine to create one of those elsewhere, such as ai_best_practices_foss, and I think that would be logical and avoid confusion.

What I would prefer, however, is if this project had a sub-module that could be optionally enabled or somehow maintain a separate stack, not just that is “free as in beer,” but that aligns with other ethical frameworks. Here are some of my specific use cases:

Coding in schools. When I am teaching kids to code in schools, I would love to be able to teach those kids Drupal AND to show them that they can use AI without also telling them they need a subscription to Claude Code. I’d like to be able to go into a Linux User Group for kids and not tell them they need to use proprietary software.

Small nonprofits. For the past 15 years, I’ve been running Drupal site on a free DreamHost site (with free domain registration and free email). It’s nice that’s free, but more importantly, it’s nice that the nonprofit can use Drupal and make choices that align with the ethical commitments of the organization. I like to add a How We Built This Site link that describes the choices that we make and how other people could do the same thing without compromising their values. Much like with AI, many of the choices we make proprietary alternatives that work “better,” such as Drupal’s Simplenews over “free” email services, Drupal webforms (rather than Google forms), Calendar module (rather than Google calendar), Drupal search (Google search), OpenStreetMap (rather than Google Maps), etc. I’d love to be able to add some open-source AI features that work “pretty good.”

What goes in this repo

I still run Debian stable, use Vim as my code editor, Thunderbird for mail, Firefox as my browser, etc. I’m far from a free software purist, but I do my best when the information I have. I’ve always appreciated how Drupal generally aligns with my values and how I can be confident in telling people that they are welcome in the Drupal community, that we are ready and eager to welcome you and teach you Drupal, and that you don’t need much more than a computer and a connection to the Internet (which, admittedly, excludes many people). Oh, and lots and lots of time, which excludes other people.

In some ways, this reminds me of when I started using Ubuntu on donated computers in 2004. Back then, it didn’t work nearly as well as Mac OS on my PowerBook G4, but it gradually got better and better. Likewise, in my view, the current top-tier Ollama models seem generally considered as good as, or better than, the original ChatGPT (GPT-3.5) from 2022-2023. So I think for many audiences, a “pretty good” open model that aligns with ethics and doesn’t support the big tech companies would be the “canonical approach.”

Whether or not the guide for how to do that belongs in this repo might be a topic of discussion, but I wanted to make my motivations as clear as possible.

Comment #5

webchick

she/they

English

Vancouver 🇨🇦

commented 28 March 2026 at 19:30

Sorry, I was totally agreeing with you, and the three specific projects I called out as part of the alternative AI stack are open source (or as "open source" as models get, that's a whole discussion unto itself :P).

All I'm saying is, when you (or whoever) create the guide, create it around one strong recommendation vs. exposing all of the possible various options that are available.

It was clear to me walking around and talking to folks at DrupalCon that there is a huge contingent of folks (70%?) who are "AI curious" and genuinely want to get started with AI but have absolutely no idea how to get started. And in the same way Drupal's >installation docs just say "use DDEV" we need similar opinionated guidance for AI.

Once you've gotten started, then it makes sense to explore alternate tools, but when you're brand new "pick from one of these 3457 options" is overwhelming and scary.

Comment #6

fathershawn

English

New York

commented 31 March 2026 at 17:52

I keep socializing this essay from Carson Gross not only because I find it generally insightful but in particular because I am drawn to the idea of re-aligning the helpful agent to be a learning agent. I realize on reflection that I am drawn to solutions that enhance our understanding rather than just produce. I think that's what we should encourage.

If you don’t use it as a code-generator but rather as a partner to help you understand concepts and techniques, it can provide a huge boost to your intellectual development.

Comment #7

mtift

he/him

English

Minnesota, USA

commented 31 March 2026 at 21:11

@fathershawn That's a great article from Carson Gross. Thank you for sharing (By some strange coincidence, I was just listening to the Talking Drupal episode with you and Carson today!)

Since you commented here, I'm curious if you saw a connection between that article and using open source and free software for AI-assisted Drupal development. For instance, are you seeing connection between the open models being more predictable like, say, a high level programming language?

Comment #8

mtift

he/him

English

Minnesota, USA

commented 1 April 2026 at 00:41

Here are a few more considerations of performance and freedom:

Performance

I was surprised to read (and see it echoed elsewhere) that open models now trail state-of-the-art proprietary models by only about three months on average.

I don't fully understand all the benchmarks -- and they are changing rapidly -- but it looks safe to say that last year's top models from Anthropic, OpenAI, and others don't match what's freely available in the open today. Worth noting: most of these models are freely available ("free as in beer"), but their licensing falls under "open weights," not traditional open source.

Software freedom

So depending on the targeted audience, this guidance could address "open source LLMs" or also "open weights." And it seems like folks wouldn't have to give up as much in performance as you did even a couple years ago to use open models.

The other big factor, it would seem, is computing power. Based on current 2026 standards for open-source models (like Llama 3 or Qwen 2.5), someone might likely need a fairly beefy computer to run Drupal on DDEV and open models. But a lot of "Power User" with 64GB of RAM and a fast CPU might do fine. For instance, on a computer with a dedicated NVIDIA GPU (RTX 3060 or better) or an Apple Silicon Mac (M1/M2/M3/M4), the LLM will run on the GPU/Unified Memory. Then again, if free software is a matter of liberty, not price, maybe we don't need to consider price.

Comment #9

fathershawn

English

New York

commented 1 April 2026 at 01:21

Since you commented here, I'm curious if you saw a connection between that article and using open source and free software for AI-assisted Drupal development.

Forgive me @mtift for not thinking enough about the IS and too much about the first half of the issue title. Using OSS locally is a great thing to promote and https://ollama.com where I would start as well. I suppose in this context I am suggesting configuring those tools to promote learning rather than to produce solutions.

Comment #10

mtift

he/him

English

Minnesota, USA

commented 1 April 2026 at 16:11

@fathershawn No problem!

Also popping in here to note some other sources that may be related to this issue:
- Drush integration: https://www.drupal.org/project/ai_drush_tools
- Documenting how to use MCP module(s) with Opencode: https://www.drupal.org/project/mcp/issues/3569504. The Model Context Protocol (MCP) is an open protocol that enables seamless integration between LLM applications and external data sources and tools.
- I think we might end up recommending a Drupal-specific opencode plugin (to work with an Ollama module) https://opencode.ai/docs/plugins. I poked around at the various lists of list and I haven't found one yet, but that might be a thing.

Comment #11

webchick

she/they

English

Vancouver 🇨🇦

commented 2 April 2026 at 02:11

New open weight model (Apache 2.0) that’s generating some buzz: https://x.com/arcee_ai/status/2039369121591120030?s=46&t=dynBgL_-HC_uqC2...

HuggingFace: https://huggingface.co/arcee-ai/Trinity-Large-Thinking

I think one thing that might be good to define in this issue is what we mean by “open source and free” when it comes to models.

Do we stick to the OSI definition? Do we care about country of origin? Etc.

Or do we sidestep these concerns by recommending FOSS tooling *around* the model, and let people pick whichever one they want?

Comment #12

webchick

she/they

English

Vancouver 🇨🇦

commented 3 April 2026 at 06:20

Oooooh. https://blog.google/innovation-and-ai/technology/developers-tools/gemma-4/

An open-source license
You gave us feedback, and we listened. Building the future of AI requires a collaborative approach, and we believe in empowering the developer ecosystem without restrictive barriers. That's why Gemma 4 is released under a commercially permissive Apache 2.0 license.

This open-source license provides a foundation for complete developer flexibility and digital sovereignty; granting you complete control over your data, infrastructure, and models. It allows you to build freely and deploy securely across any environment, whether on-premises or in the cloud.

Hugging Face: https://huggingface.co/collections/google/gemma-4

Comment #13

mtift

he/him

English

Minnesota, USA

commented 3 April 2026 at 12:50

Status	File	Size
new	gemma-4-response-time.png	242.94 KB

Who, that trinity Large Thinking is 797GB file (https://huggingface.co/arcee-ai/Trinity-Large-Thinking/tree/main) that probably requires 200+ GB of RAM to run. I have a 2TB drive and 64GB of memory, which feels like a lot, but I didn’t try that. Doesn’t seem practical for most folks. Or maybe there is a smaller version that might work better.

I’ve tried out Gemma 4 on both ollama:

# Get and run ollama run Gemma 4
ollama run gemma4:26b

And llama.cpp (which ollama uses under the hood and can be faster):

# Download Gemma 4
wget -O ~/models/gemma-4-26B-A4B-it-UD-Q5_K_XL.gguf \
  "https://huggingface.co/unsloth/gemma-4-26B-A4B-it-GGUF/resolve/main/gemma-4-26B-A4B-it-UD-Q5_K_XL.gguf"

llama-server -m ~/models/gemma-4-26B-A4B-it-UD-Q5_K_XL.gguf \
  --port 8484 \
  --jinja \
  -c 32768 \
  -fa on \
  --temp 1.0 \
  --top-k 64 \
  --top-p 0.95

I was surprised at how slow it was. I guess asking "what kind of tasks are you good at?" is a complex question.

gemma 4 test results

Trying out those 2 models makes me thing that FOSS isn’t the only consideration for AI-assisted Drupal development with FOSS tools. It seems like it wouldn’t be particularly helpful if most Drupal devs can’t run it on their laptops.

Comment #14

mtift

he/him

English

Minnesota, USA

commented 3 April 2026 at 13:48

I’ve been thinking “digital sovernity” (https://events.drupal.org/rotterdam2026/news/drupalcon-rotterdam-why-dig...) and “software sovereignty” (https://dri.es/the-software-sovereignty-scale).

Drupal is made possible by GPL code supported by 501(c)(3) organizations – which exist for the public good – such as the Drupal Association, DDEV Foundation, PHP Foundation, and the MariaDB Foundation (“aiming at US 501(c)(3) status”).

I thought maybe in addition to running sudo apt install php, mariadb-server, apache2, etc. we might be moving toward a world where AI tools could be like, suitable for the Debian repository, for instance.

In my view, the closest thing to that is the Allen Institute For Artificial Intelligence, which is building a “fully open AI system.” Their OLMo and Molmo is some of the most transparent and educational material you can find on model building

So, from what I can tell, the projects that seem most aligned with Drupal and FOSS values would like to a stack such as:

# Download Olmo 3.1
wget -O ~/models/Olmo-3.1-32B-Instruct-Q5_K_M.gguf \
  &quot;https://huggingface.co/unsloth/Olmo-3.1-32B-Instruct-GGUF/resolve/main/Olmo-3.1-32B-Instruct-UD-Q5_K_XL.gguf&quot;

# Run Olmo in llama.cpp
llama-server -m ~/models/Olmo-3.1-32B-Instruct-Q5_K_M.gguf \
  --port 8383 \
  --no-jinja \
  --chat-template chatml \
  -c 32768 \
  -fa on \
  --temp 0.6 \
  --top-k 50 \
  --top-p 0.95

And Olmo was painfully slow. It might be exactly what folk need who just want something to run in the background, but all I can say is it felt A LOT different from what I’m used to with Claude.

So it seems like this might be more like composer or symfony situation, without nonprofits or GPL licenses involved. At least for now.

The practical recommendation might be a lot like what @webchick suggested way back at the start of this issue, except there seems to be a trend in folks moving from ollama to llama.cpp for greater control and performance, such as:

Regarding open weights vs open source, I think we can trust using open weights (the model can’t do anything by itself) + open source tooling (can verify the code) + local inference (nothing leaves your network). That’s what we get with llama.cpp. And in my testing, Qwen3-Coder-30B has been pretty darn good. I figured out which one would be best for my particular laptop running llmfit.

So, while I am admitting still early on in my learning, I feel like we might suggest:

And if that feels too slow for whatever the dev was expecting, try llmfit.

I’m not sure if it would be better to discuss in a merge request or not.

Comment #15

webchick

she/they

English

Vancouver 🇨🇦

commented 3 April 2026 at 14:12

Cool, if you think that stack makes the most sense for this audience that’s fine!

I just want to point out though that it seems like we are conflating two concerns:

One is a free/libre open source alternative AI stack that gets you capabilities akin to Claude Code.

The other is an alternative AI stack you can run locally on a mid-range laptop at decent performance akin to Claude Code.

I’m not aware of anything that fits both of those criteria. If you want performance it almost assuredly means running the model on some cloud provider and making API calls to it, but then you’re no longer “free as in beer.”

Comment #16

webchick

she/they

English

Vancouver 🇨🇦

commented 3 April 2026 at 14:17

Haha that having been said… I took a look at https://github.com/AlexsJones/llmfit and that’s pretty cool! :) So maybe indeed recommending Qwen but linking out to that for possible alternatives is the way to go.

Comment #17

mtift

he/him

English

Minnesota, USA

commented 3 April 2026 at 14:47

Title:

Guidance on AI-assisted Drupal development using only open source and free software

» Guidance for AI-assisted Drupal development using local models

Yeah, I feel like it's tough to come up with free/libre recommendations without also considering capabilities, performance, open models vs open source, security, free software, sovereignty, etc.

Ultimately, yes, I feel like the key thing here is offering an alternative to Claude Code that doesn't send your code/data to a third party. Maybe the first step is just a practical setup guide for one workflow, something like OpenCode + llama.cpp + DDEV -- while using the same Drupal-specific configuration from this project and honest notes about where local models hold up and where they fall short.

The broader terminology and licensing questions deserve a brief section for context, but they shouldn't block the practical guide.

I'll poke around more in this repo and see if I can come up with a MR.

Comment #18

murrow commented 3 April 2026 at 20:53

I have a Mistral setup guide that covers Drupal custom module development with OpenCode + llama.cpp + DDEV. I'm bouncing between this and Claude ATM, and am still mostly living in Claude-land. But, I'd like not to be and to have more control on hand. Would you like my (Mistral-created) guide?

Comment #19

mtift

he/him

English

Minnesota, USA

commented 3 April 2026 at 20:57

@murrow Yes, I would love that!

I've been trying to get these to work with OpenCode + llama.cpp + DDEV:

gemma-4-26B-A4B-it-UD-Q5_K_XL.gguf
Olmo-3.1-32B-Instruct-Q5_K_M.gguf
Qwen3.5-27B-UD-IQ3_XXS.gguf
Qwen3-Coder-30B-A3B-Instruct-UD-Q5_K_XL.gguf

But everything seems really slow.

Comment #20

murrow commented 3 April 2026 at 22:10

Status	File	Size
new	local-ai-drupal-guide.md	5.37 KB

@mtift, I am running a Mac Studio M4 Max with 128Gb RAM. I'm attaching a setup document (markdown) that started with Mistral, but has been updated manually. I can say that it works for me :-) . Claude is faster and, probably because I've been using it much longer and have all manner of CLAUDE.md instructions in place for Drupal, etc., makes fewer errors. But, training AI is a longterm investment, IMO.

Comment #21

murrow commented 3 April 2026 at 23:00

Actually, a correction. That was my starting point. I have been using mistral-devstral-small-2-24b

Comment #22

murrow commented 4 April 2026 at 00:13

FYI, my current opencode.json:

{
  "$schema": "https://opencode.ai/config.json",
  "model": "mistral_custom/mistralai_Devstral-Small-2-24B-Instruct-2512-Q8_0.gguf",
  "instructions": [
    "CLAUDE.md",
    "~/.claude/stacks/drupal.md",
    "~/.claude/stacks/laravel.md",
    "~/.claude/stacks/react.md"
    
  ],
  "provider": {
    "mistral_custom": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "Mistral Devstral 24B with llama",
      "options": {
        "baseURL": "http://localhost:8080/v1",
        "apiKey": "local"
      },
      "models": {
        "istralai_Devstral-Small-2-24B-Instruct-2512-Q8_0.gguf": {
          "name": "Devstral Small 2 24B Q8"
        }
      }
    }
  }
}

I have been using CLAUDE.mds because I already had these in place and it has been working well.

Comment #23

webchick

she/they

English

Vancouver 🇨🇦

commented 4 April 2026 at 00:24

Status:

Active

» Needs review

Sweet! Thanks so much for sharing! Marking as needs review.

Comment #24

mtift

he/him

English

Minnesota, USA

commented 4 April 2026 at 00:58

Darn! Devstral 24B did eventually respond, but at ~6 tok/s for prompt processing and ~2 tok/s for generation on a Core Ultra 7 155U (CPU-only, 64GB RAM). A simple 'hello' took over 90 seconds before the first token appeared.

Qwen3-Coder-30B-A3B is dramatically faster on the same hardware thanks to its MoE architecture. It's pretty zippy in a browser, but I've just had problem getting the opencode configuration working. Dense 24B+ models are probably impractical without a GPU. The guide should recommend llmfit early so people don't spend time downloading models that won't run well on their hardware.

I tried some specific flags that helped such as --threads 10 (use more threads), -np 1, and smaller context windows, but they don't close the fundamental gap between CPU inference and GPU inference. The guide should probably include quant recommendations based on available RAM.

Comment #25

murrow commented 4 April 2026 at 01:13

@mtift, probably this is the llama-server. Try this:

./build/bin/llama-server  -m ./models/mistralai_Devstral-Small-2-24B-Instruct-2512-Q8_0.gguf -c 4096  --host 0.0.0.0 --port 8080 --n-gpu-layers 99 --chat-template chatml

Comment #26

murrow commented 4 April 2026 at 01:16

Status	File	Size
new	local-ai-drupal-guide.md	5.57 KB

Updated to cover mistralai_Devstral-Small-2-24B

Comment #27

mtift

he/him

English

Minnesota, USA

commented 4 April 2026 at 13:39

I have had no problem getting descent local models running in my browser using llama.cpp. I can chat with it at reasonable speeds. I try to connect these models to opencode and it becomes painfully slow. I'm sure the RAM differences are part of the issue.

However, for interactive coding agent work, it seems like CPU is the limiting factor, specifically memory bandwidth. A M4 Max has up to 546 GB/s of memory bandwidth, while my Intel Core Ultra 7 tops out around 90-120 GB/s. That's it's roughly a 4-5x difference in memory bandwidth, which translates almost directly to 4-5x faster token generation.

It seems like this limitation is pretty well documented:

Compute performance has hogged the spotlight for most of the PC’s history. Memory and storage also mattered, of course, but most of what your PC could or couldn’t handle was determined by the CPU and GPU inside.

AI is turning that on its head.

So I unless I go buy a new computer just to run local models, I am not sure how I can offer much in the way of advice here. Further, all of this seems really complicated, and if this project is about "best practices" offering advice comparable to "use DDEV" then this might not be the right place to get into running local models. At least not yet.

Comment #28

yautja_cetanu commented 14 April 2026 at 14:37

Some thoughts:

https://fireworks.ai/ is a good place to start as well for exploring opensource models. If we have an approach that works there we can try open source models out, especially if our eval framework starts working and figure out when we are actually at a place where we could recommend open source models to do things and they are good at. I know devs working on high end stuff for government where they have been using Claude Haiku for everything and its working. I think many opensource models will be Haiku level.
https://github.com/musistudio/claude-code-router - This could be used to get claude code and everything set up here working with open source models.
Getting it so you can set up locally is also something worthy of a guide but next level problem compared to the first. There are so many different machines we would need to support. By focusing on fireworks.ai (or huggingface) first we can try out the models themselves.
The main goal of this, as webchick said, is to get a good starting point for people who have NEVER done this before and want to get started. However I think for this to work, we want to also have a "Best practises" version built on top of something devs will actually be using in their day to day work to incentivise everyone to work on it and improve it. So I think making something simple that is customisable for Organisation specific use-cases for example is worth doing. It could be a sub-module, could be another repo, but having some installer for "Local LLM coding" that pulls in the AI best practises but customises things seems like a worthwhile thing for sustainability.