Problem/Motivation

drupal/ai_best_practices scaffolds AGENTS.md and ships reusable skills under vendor/, but it has no automated way to tell a coding agent what is in the project it is working on: which custom modules exist, which services they register, which hooks they implement, or what content types are configured in config/sync.

Without this information, every agent session starts from zero on project-specific facts. On isolated tests with no ambient context, Claude Sonnet answered only 7% of project-specific questions correctly (exact service IDs, hook ownership, taxonomy vocabulary machine names).

Proposed resolution

Add a context harvester to the existing Composer plugin (src/AiBestPractices/Composer/AiBestPracticesPlugin.php) that runs during post-install-cmd and post-update-cmd, after drupal-scaffold has placed AGENTS.md.

A new class src/Harvester/ContextHarvester.php scans web/modules/custom/ for:

  • *.info.yml - module name and description
  • *.services.yml - service IDs and their implementing classes
  • *.module - hook implementations (detected by the {module}_{hook} naming convention)
  • config/sync/*.yml - entity bundle definitions (node types, taxonomy vocabularies)
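
As a sketch of the naming-convention detection above (the function name and regex here are illustrative, not the MR's actual code):

```php
<?php
// Illustrative sketch of hook detection by the {module}_{hook} naming
// convention; the real ContextHarvester may differ. Matches function
// definitions like mymodule_cron() in a .module file's source.
function collect_hooks(string $moduleName, string $moduleSource): array {
  $hooks = [];
  // Find every function definition prefixed with the module name.
  $pattern = '/function\s+(' . preg_quote($moduleName, '/') . '_([a-z0-9_]+))\s*\(/';
  if (preg_match_all($pattern, $moduleSource, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $m) {
      // $m[1] is the full function name, $m[2] the candidate hook name.
      $hooks['hook_' . $m[2]] = $m[1];
    }
  }
  return $hooks;
}
```

For example, `collect_hooks('mymodule', $source)` maps `hook_cron` to `mymodule_cron` while ignoring functions that do not carry the module prefix.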

It writes the following files under docs/ai/:

  • PROJECT.md - stack, local URL, module count, links to the detail files below
  • PROJECT_MODULES.md - per-module name, description, and dependencies
  • PROJECT_SERVICES.md - service ID to class mapping
  • PROJECT_HOOKS.md - hook to implementing function and module mapping
  • PROJECT_ENTITIES.md - entity type to bundle list (from config/sync)
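
As a sketch, a PROJECT_SERVICES.md excerpt might look like this (module and service names are invented for illustration):

```markdown
# Project services

| Service ID                | Class                                   |
|---------------------------|-----------------------------------------|
| mymodule.importer         | Drupal\mymodule\Service\Importer        |
| mymodule.price_calculator | Drupal\mymodule\Service\PriceCalculator |
```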

It then injects a # This project section into AGENTS.md pointing at docs/ai/PROJECT.md, using HTML comment markers so repeated installs are safe. The detail files are only loaded on demand, keeping the always-loaded context small.
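
The injected block could look like the following (the marker strings are illustrative; the MR defines the actual ones). On a repeated install, everything between the markers is replaced rather than appended:

```markdown
<!-- ai-best-practices:project:begin -->
# This project
Project-specific context lives in docs/ai/PROJECT.md. Load it on demand when
you need exact service IDs, hook ownership, or entity bundle machine names.
<!-- ai-best-practices:project:end -->
```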

All code paths catch \Throwable and never exit non-zero. A misconfigured project cannot break composer install.
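
The fail-safe path can be sketched as a wrapper (names are hypothetical; the plugin's actual method names may differ):

```php
<?php
// Illustrative sketch of the fail-safe guarantee: any throwable raised
// during the harvest is downgraded to a warning and swallowed, so a
// misconfigured project can never break composer install.
function run_harvest_safely(callable $harvest, callable $warn): bool {
  try {
    $harvest();
    return true;
  }
  catch (\Throwable $e) {
    // Report and continue; never rethrow, never exit non-zero.
    $warn('ai_best_practices: context harvest skipped: ' . $e->getMessage());
    return false;
  }
}
```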

The harvester is also called from uninstall() to clean up generated files and remove the injected block when the package is removed.

Remaining tasks

  • [ ] PHPUnit tests for ContextHarvester (isDrupalProject, collectHooks, idempotency markers)
  • [ ] PHPUnit tests for harvester integration in AiBestPracticesPlugin
  • [ ] Decide: should docs/ai/ be committed (reviewable diffs on each composer install) or gitignored (always regenerated)?
  • [ ] Make the custom module path configurable via ai_best_practices.yaml
  • [ ] Smoke test against a real Drupal install with full bootstrap
  • [ ] Follow-up issue: contrib module inventory (installed packages from composer.json, injected service dependencies)

Eval results

Claude Sonnet, 3 isolated runs, --no-baseline --setting-sources '' --cwd /tmp (no CLAUDE.md, no MCP, no project filesystem access):

Case                                             No context   With context   Change
B01: exact service ID for YAML injection             0%           100%       +100 pp
B02: container->get() expression for a service       0%           100%       +100 pp
B03: which module implements hook_cron               0%           100%       +100 pp
B04: drush eval line using a project service         0%           100%       +100 pp
B05: taxonomy vocabulary machine name               33%           100%        +67 pp
Total                                                7%           100%        +93 pp
Cost per query                                   $0.039         $0.015         -62%
Avg response time                                  48 s            8 s         -84%

All eval prompts require facts that are specific to the project under test and cannot be guessed from training data. The eval suite is at evals/drupal-project-context/ in the branch.

Original report

George Kastanis (PointBlank), 2026-04-28. Branch: 3587321-add-a-composer at git.drupalcode.org/issue/ai_best_practices-3587321.

Related: Ronald de Brake's Surge work established the agents-generate command pattern; this feature complements it by automating context generation at install time.

Comments

zorz created an issue. See original summary.

mxr576’s picture

I understand the appeal of using a Composer plugin to build a static memory of a project. My concern is that this approach adds ongoing maintenance overhead. To keep that memory accurate, developers would need to run something like composer update --lock, composer context-harvest, or a similar command whenever the project changes. In the context of custom module development, that feels like a significant extra step and a likely source of stale data if people forget to run it.

What has looked more promising to me so far is instructing LLMs to use drush eval to discover the current state of the project directly. That can provide up-to-date information about existing services, hooks, entity types, and similar definitions at the moment it is needed, instead of relying on a precomputed snapshot. I could imagine a hybrid approach here, where curated static files provide reusable context and drush eval handles live discovery. To me, that seems more robust and stable than trying to solve this entirely through a Composer plugin.
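
For reference, the kind of live discovery meant here could look like the following (requires a bootstrapped Drupal site and Drush, so it is shown as illustration only):

```shell
# List every service ID registered in the container.
drush php:eval 'print implode(PHP_EOL, \Drupal::getContainer()->getServiceIds());'

# List entity type IDs known to the entity type manager.
drush php:eval 'print implode(PHP_EOL, array_keys(\Drupal::entityTypeManager()->getDefinitions()));'
```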

zorz’s picture

Thanks for the feedback. A few clarifications:

This feature is explicitly experimental so it is a starting point, not a production-hardened solution.

The staleness concern is real but something we can deal with. The harvester injects only a pointer into AGENTS.md; the actual content lives in docs/ai/*.md files that are read on-demand, not at session start. A pre or post-commit Git hook that re-runs the harvester would close the gap automatically. I could ship a setup script for this.
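
Such a hook could be as small as this (the context-harvest script name is hypothetical; the MR does not define it yet):

```shell
#!/bin/sh
# .git/hooks/post-commit (illustrative): refresh the harvested context
# after every commit, but never fail if the tooling is unavailable.
if command -v composer >/dev/null 2>&1; then
  composer --quiet run-script context-harvest || true
fi
```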

The drush eval approach also works but quite a lot of times I have seen it fail due to a missing argument. That seems like a context/token misuse.

A hybrid approach, with a static snapshot for structural context and drush eval for live runtime queries, seems like the better path to me.

webchick’s picture

Status: Active » Needs review

Love the idea of this! Marking needs review, since there's something to review.

mxr576’s picture

The drush eval approach also works but quite a lot of times I have seen it fail due to a missing argument.

Could you clarify what you mean by a missing argument in this context? Are you referring to PHP-level argument errors, or something like a misconfigured service definition (for example in services.yml) that only surfaces when the container is bootstrapped via Drush?

That seems like a context/token misuse.

I think that conclusion might be a bit too broad without more detail. Token or context issues can certainly cause failures, especially in cases where placeholders or contextual data are resolved dynamically, but similar symptoms can also come from invalid service wiring, bootstrap inconsistencies, or even how Drush initializes the container.

Could you expand on what specifically points you toward a token/context misuse here? A concrete example would help distinguish whether the root cause is really in token resolution or somewhere earlier in the execution chain.

zorz’s picture

Fair point, "context/token misuse" was too loose. The failures I had in mind, looking at a recent session, were not about token resolution. They were mostly bad API guesses inside the inline PHP: calls to methods that do not exist on the interface (applyUpdates on EntityDefinitionUpdateManager, getMinimalRequiredSolrVersion on SearchApiSolrBackend), or wrong field machine names. Each one produces a PHP fatal that the agent recovers from, but the failed eval stays in the conversation as historical context, which costs tokens and may anchor the next attempt on a similar wrong API.

A second class is autoloader gaps in eval's bootstrap. invokeAll('runtime_requirements') failed for me with "undefined function drupal_verify_install_file()" because D11's SystemRequirements reaches into core/includes/install.inc which eval does not autoload by default.

So a more defensible version of my earlier comment: drush eval is a fine tool for live discovery, but inline PHP authored by an agent has a real failure rate from those two categories, and the retries pollute context for later turns. That is why I see static and live as complementary. For facts that do not change at runtime, like service IDs, hook ownership, and entity bundle machine names from config/sync, a precomputed reference could skip the retry loop entirely. For things that are inherently runtime, drush eval or a typed Drush subcommand is still the right tool.

zorz’s picture

Open review decisions on this MR, for anyone landing here:

1. Delivery mechanism. Composer plugin keeps the install footprint smallest but ties the behavior to composer.json. A DDEV add-on (trebormc's pattern in #3584914 #13) keeps tooling out of production deps and is more isolated. A standalone CLI command is the most explicit but adds a step. The eval data does not depend on which we pick.

2. Harvest scope. Custom modules first is the safe default. Contrib and core inventory is a clear follow-up (already noted in remaining tasks). Worth deciding whether config/sync entity bundles stay in the v1 cut or move to follow-up, since they push the harvester closer to a config inspector.

3. Idempotency and removal. HTML comment markers in AGENTS.md cover repeated installs, and uninstall() cleans up. Worth a sanity check from anyone who has burned on Composer plugin lifecycle edge cases.

4. Where docs/ai/ lives in the repo. Committed (reviewable diffs per install) or gitignored (always regenerated). I leaned gitignored in the original write-up but do not feel strongly.

Happy to address any of these or split them into their own issues if the conversation gets long.

alex ua’s picture

Taking the four open decisions from #9 in order:

Delivery mechanism: A Composer plugin handles the initial harvest well since it runs at install/update time without extra tooling. A standalone CLI command (drush ai:harvest or similar) covers the re-run-after-changes case mxr576 raised in #4 without needing git hooks.

Harvest scope: Custom modules first is the right call. The evaluation numbers (7% to 100% accuracy, 62% cost reduction) already justify shipping with just that scope. I'd push for keeping the harvested output structured as a metadata index rather than prose. A machine-readable format (YAML or JSON) lets agents query specific facts ("what services does module X define?") without loading the full file.

The format choice matters more than it might seem. LAPIS (https://arxiv.org/abs/2602.18541, Feb 2026) found that converting API specs to a token-efficient format achieved 85.5% token reduction while preserving the semantic information agents need for reasoning. A March 2026 study covered by InfoQ (https://www.infoq.com/news/2026/03/agents-context-file-value-review/, based on https://arxiv.org/abs/2511.12884) found that auto-generated AGENTS.md files actually performed worse than having no file at all: task success dropped 0.5-2% and inference costs rose 20%+. Human-curated files helped about 4 percentage points. The harvester output needs to be structured and selective, not a dump of everything the scanner finds.

We maintain a metadata-only skill index across 98 skills. Each entry costs about 71 tokens. Full skill content averages about 5,000 tokens per entry. That's a 100:1 ratio. ITR (https://arxiv.org/abs/2602.17046, Feb 2026) independently validated this pattern: per-step retrieval of only relevant tools and system-prompt fragments achieved 95% context token reduction, 32% improvement in correct tool routing, and 70% cost reduction. Spring AI codified the same lifecycle in January 2026 (https://spring.io/blog/2026/01/13/spring-ai-generic-agent-skills/): load only name + description at startup, full instructions on relevance match, bundled resources on execution.

Repository placement: Committing docs/ai/ gives you reviewable diffs and makes the context truly local-first (per #3583214: Local-first architecture: reasoning context should travel with the codebase). Regenerated-only files create a bootstrapping problem: the context isn't available until after the first install, which is exactly when agents need it most.

Idempotency: HTML comment markers work. Our install script uses SHA-256 checksums (MANIFEST.sha256) to verify integrity on every install, catching partial writes and manual edits. That pattern would work here too as an extra safety layer beyond the markers.

Written with the help of an LLM.

mxr576’s picture

A machine-readable format (YAML or JSON)

Have you considered TOON? There is already a Drupal contrib module for it. However, it appears the maintainer only reserved the namespace and made an initial commit.

Delivery mechanism: A Composer plugin handles the initial harvest well since it runs at install/update time without extra tooling. A standalone CLI command (drush ai:harvest or similar) covers the re-run-after-changes case mxr576 raised in #4 without needing git hooks.

I have been thinking further about this in the context of our company’s rollout strategy.

There are two primary drivers:

1. Improving the quality of AI-generated output with the help of AGENTS.md. Agents can discover code, but they may make mistakes, skip important code paths non-deterministically, or miss critical context. They often struggle to determine what services, entity types, field types, and similar constructs are actually available in a project, because these can be defined in highly dynamic ways. File-based scanning approaches (for example, glob($customDir . '/*/*.services.yml')) are not sufficient and can create false confidence based on incomplete data. With AGENTS.md, the goal is to provide clear, authoritative guidance about what must be considered on every iteration. In this context, stale data can be more harmful than no data. While some of this information could be retrieved via tool calls, that would increase token usage and slow down execution. Therefore, an authoritative, static, but dynamically updated data store is preferable.

2. Developer experience (DX). The tooling should be almost invisible. Developers should not need to remember to run additional commands beyond what they already use regularly.

This last point is important. In local development, developers typically run commands such as composer install, composer update, drush cr, and drush updb (or drush deploy). A Composer plugin that triggers after package installation may seem like a good fit at first, but in practice there are limitations:

  • In the vast majority of Drupal projects, custom modules are not registered as Composer packages. As a result, developers do not run composer install or composer update after making changes to custom code.
  • If we rely on scanning module files to collect highly dynamic data (such as service definitions, hooks, or entity types), then when composer install or composer update runs, Drupal’s container, entity definitions, and hook registry may still be outdated until drush updb or drush cr has been executed.

Based on this, a more accurate approach may be to hook into the end of the cache rebuild process and trigger a "memory update" afterward. For example, this could be implemented as a KernelEvents::TERMINATE event subscriber, ensuring the update runs after the response is sent and does not impact request latency.

At that point, Drupal can rely on its internal APIs to perform introspection and generate consistently up-to-date, accurate data for LLM consumption. This process should be environment-aware, so production environments can opt out where this data is not required.
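
Wiring such a subscriber would be ordinary Drupal service registration; the module, class, and injected services below are hypothetical:

```yaml
# mymodule.services.yml (illustrative)
services:
  mymodule.context_harvest_subscriber:
    class: Drupal\mymodule\EventSubscriber\ContextHarvestSubscriber
    arguments: ['@entity_type.manager', '@module_handler']
    tags:
      - { name: event_subscriber }
```

The class would implement EventSubscriberInterface, subscribe to KernelEvents::TERMINATE, and bail out early unless a Settings flag enables harvesting in the current environment.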

Matt Glaman’s Lenient Composer plugin demonstrates that even a Composer plugin can push services into the Drupal container automatically:
https://github.com/mglaman/composer-drupal-lenient/commit/7bfe79bfcccf1b0f3dafee7e36468dccfc2e56bb#diff-984b8e3111197dd008[%E2%80%A6]8857434cc2898b5dca67873
(Related Slack thread: https://drupal.slack.com/archives/C2AAKNL13/p1775658325501299)

Given these considerations, I would strongly recommend running a proof of concept in this direction as well.

zorz’s picture

Loving how this is evolving. The substantive feedback from both of you has shaped the next iteration of this MR.

@mxr576 #11, fair point on the introspection layer. File-based scanning catches the static cases but misses programmatically registered services, OOP hooks, dynamic entity types, and runtime container alterations. Your kernel-event approach is the right call: the container is the source of truth, KernelEvents::TERMINATE after a rebuild is the right trigger, environment-gated to skip production. I am folding that into this MR rather than splitting it to a follow-up. The Composer plugin layer (stack info, environment detection, AGENTS.md injection), the eval suite, and the drush ai:harvest command stay; the regex-based service/hook/entity collection becomes a Drupal service that introspects via entity_type.manager, module_handler, hook registry, and the container, fired from a kernel-event subscriber.

On TOON: I have not dug into it myself, so this is from a quick LLM-assisted scan rather than firsthand experience. Token savings appear real for flat tabular data, but the published token-count benchmarks use GPT tokenizer (o200k_base) and Claude tokenizer is not measured. The Drupal contrib module looks like a 1.x-dev namespace reservation with no stable release. The PHP ecosystem has multiple independent ports without a canonical maintainer-blessed library, and no AI coding agent has TOON-specific tooling yet. For the shapes we emit, markdown tables and YAML are well-supported and broadly compact, though I have not measured them against TOON in this codebase. I would rather defer; if companion files grow too large for context, tiered loading (Spring AI-style) comes before format swaps. Happy to revisit if someone has measurements against Claude tokenizer in particular.

@alex ua #10, on the four open decisions:

  1. Delivery: Composer plugin + Drush command, with the kernel-event subscriber from #11 as a third trigger. Automatic refresh post-drush cr, no extra developer action.
  2. Scope: custom modules first. Container-based introspection makes contrib/core a clean follow-up. Output stays selective: project-specific identifiers, not restatements of composer.json or *.info.yml.
  3. Idempotency: HTML markers plus a description-only AGENTS.md entry (so PROJECT.md loads lazily on relevance, not at session start). SHA-256 manifest as a follow-up safety layer.
  4. Placement: committing docs/ai/. Bootstrap argument is sound, and the kernel-event refresh keeps diffs accurate.

Updated MR coming. Will run the eval suite against the new output and post numbers.

zorz’s picture

@mxr576, on second read your architectural point holds and I went too far in #12 when I said I would fold the introspection layer into this MR.

Re-reading the scope docs and the consensus from #3584914, this package is an opinionated installer of guidance: skills, AGENTS.md, evaluation. CONTRIBUTING.md is explicit that site-specific configuration sits outside the scope, and the project-vision discussion landed on no menu of parts and no power-user knobs. A Drupal module that ships an event subscriber and a Drush command into every consumer's runtime container is runtime code, not guidance, and that is a different kind of package.

The architectural answer you described in #11, container as source of truth, KernelEvents::TERMINATE after a rebuild, environment gated, is the right answer. The right home for it is a sibling package, not this one. The shape would be: introspection service over entity_type.manager, module_handler, the hook registry keyvalue, and service_container; subscriber on TERMINATE gated by a Settings flag and throttled to one harvest per container rebuild; drush ai:harvest for the explicit and CI cases. This package can suggest it once that work exists, whether someone in this thread picks it up or it lives as an open issue for a while.

For this MR, the in-scope work is smaller than what I put in #12:

  1. Drop the regex-based service, hook, and entity collectors. They give the false confidence you flagged. The Composer plugin layer keeps only the parts that do not need a bootstrapped Drupal: stack info from composer.json, environment detection from the filesystem, AGENTS.md tier-1 entry injection, and a PROJECT.md scaffold that points at where container-truthful detail files would live once a sibling package fills them in.
  2. Apply a Spring AI tier-1 fix to buildAgentsBlock() so PROJECT.md loads on relevance match rather than on every session start.
  3. Re-run the eval suite against the trimmed output. The original 7 to 100 percent and 62 percent cost numbers came from the harvested detail files; without them, this MR alone will not move the needle. The full numbers belong to whichever sibling-package MR materializes.
  4. Land this MR with the smaller scope. The sibling package idea sits as a follow-up note on the original issue rather than a self-imposed deliverable.

This is more honest about what the package is, and it leaves your architectural critique with a proper home rather than crammed into a package that says it ships guidance. If the project decides scope should broaden to cover runtime introspection, that is a conversation for #3584914 rather than this MR, and I am open to it.

Will post the eval numbers from the trimmed output once they are in.

mxr576’s picture

Just found the CTX module, which exposes Drupal internals to agents over MCP. Should we even try to build an introspection layer via Skills?

https://git.drupalcode.org/project/ctx/-/blob/1.x/src/Plugin/McpTool/Ser...

(I wonder if the module name was inspired by https://docs.ctxllm.com/ - which I have promoted over Drupal Slack and use in the Easy Encryption project https://git.drupalcode.org/project/easy_encryption/-/blob/1.0.x/context....).

ronaldtebrake’s picture

I'm also inclined to focus on an MCP resources approach; sorry I didn't see this issue before. On a similar level I've opened #3588230: Define a Drupal package documentation context convention for AGENTS.md, and we also seem to agree that it would make sense. This matches with #3585585#comment-16564365

We also need guidance on skills vs AGENTS.md vs MCP vs docs
I think this should become a docs page, not just a skill. We need a durable human-readable page that explains when to use each type of artifact:

  • Drupal.org docs / API docs / change records are the source of truth for human-facing documentation, API behavior, tutorials, change history, and canonical procedural docs.
  • AGENTS.md is for project-wide agent instructions: local setup, coding conventions, commands, testing expectations, repository structure, contribution expectations, and links to relevant skills or docs.
  • Agent Skills are for reusable, task-specific procedural guidance that agents should load only when relevant. They should encode decision ordering, checks, defaults, gotchas, and escalation cues.
  • MCP resources are for exposing data or context to agents, such as docs, code indexes, issue data, schemas, examples, or site-specific state.
  • MCP tools are for actions the agent can invoke, such as running a Drush command, querying Drupal, creating an issue, searching the issue queue, inspecting config, or calling an external API.
  • MCP prompts are for reusable user-invoked workflows or prompt templates, especially when a user intentionally chooses a workflow from a client UI.

With regards to the Agents Context module: https://drupal.slack.com/archives/CDL2YPBNX/p1776753269030049

Given the composability of what we’re trying to do I would love it if we could use drupal/mcp_server instead and make any necessary tools available on top of that, instead of shipping with “another” McpServer.
That’ll give us other options to add tools and more for future use cases.

webchick’s picture

Given the composability of what we’re trying to do I would love it if we could use drupal/mcp_server instead and make any necessary tools available on top of that, instead of shipping with “another” McpServer.
That’ll give us other options to add tools and more for future use cases

+1000 to consolidating on standard ecosystem projects vs going our own way.

scott falconer’s picture

I've been working on this idea over at: https://git.drupalcode.org/project/ai_context/-/work_items/3586150

My recommendation would be to break it into parts:

- Under Context Control Center for the gathering, governance, and management aspect.
- Into individual tools for how that info is expressed. i.e. it could be in agents.md, via mcp, or used in tooling like this: https://github.com/scottfalconer/drupal-ai-guidance-poc/