Problem/Motivation

As we're moving fast and hopefully not breaking too many things, let's get some agents' take on what is working well and what could be better.

This is intended to be an evergreen, running-log kind of issue.

Comments

webchick created an issue.

webchick’s picture

Claude Code, 4/6/26

Prompt: "Hi, claude! can you take a look at my repo and give me your take?"

⏺ This is a solid, well-conceived project. Here's my take:                                                                                  
                                                                                                                                            
  <strong>What's working well:</strong>
                                                                                                                                            
  - <strong>Architecture is clean</strong>. Single source of truth in AGENTS.md, agent-specific files as thin redirects, skills as on-demand loads. That's the right call for avoiding content drift (I've sketched the redirect pattern right after this list).
  - <strong>Task-oriented naming</strong> (writing-automated-tests, not tests) is genuinely better for agent task-matching.
  - <strong>Priority-ordered options</strong> in skill files are a subtle but important design choice -- agents defaulting to the wrong test type are a real failure mode, and the ordering nudges them correctly.
  - <strong>Eval framework</strong> is the most impressive part. Most prompt engineering projects skip validation entirely; this one has behavioral + static checks that can prove whether guidance actually changes model output.
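
  To make the redirect pattern concrete -- this is a paraphrase of the idea, not a verbatim copy of any file in the repo -- a thin redirect file can be as small as:

```markdown
<!-- CLAUDE.md (illustrative sketch; the repo's actual wording may differ) -->
All project guidance lives in AGENTS.md. Read that file and follow it.
This file exists only so Claude Code picks the guidance up automatically.
```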
                  
  <strong>Things I'd think about:</strong>

  - how-to-write-documentation feels thin -- if it's a stub, it might be worth either fleshing it out or clearly marking it as status: draft so agents don't pull it and get incomplete guidance.
  - The redirect files (CLAUDE.md, GEMINI.md) could become stale if new tools emerge. You might want a note in CONTRIBUTING.md about keeping SETUP.md updated when adding new agent tool support.
  - Eval coverage looks solid for coding-standards, but the writing-automated-tests structure exists -- are evals actually filled in for that one? If not, there's a gap between what's validated and what's shipped (I've sketched below what a minimal static check could look like).
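
  To make that last point concrete: a static check is conceptually tiny. The sketch below is illustrative Python, not the repo's actual harness, and the KernelTestBase pattern is just a hypothetical example of a convention worth asserting:

```python
# Illustrative sketch only -- not the repo's actual eval code.
# Idea: run the same prompt with and without the skill file loaded,
# collect the generated code, then apply a deterministic "static" check
# to each sample and compare pass rates.

import re
from typing import Iterable


def uses_kernel_test(generated_code: str) -> bool:
    """Hypothetical static check: did the agent reach for a kernel test
    (extends KernelTestBase) instead of defaulting to a heavier test type?"""
    return bool(re.search(r"extends\s+KernelTestBase", generated_code))


def pass_rate(samples: Iterable[str]) -> float:
    """Fraction of generated samples that satisfy the static check."""
    samples = list(samples)
    return sum(uses_kernel_test(s) for s in samples) / max(len(samples), 1)


# Behavioral comparison: if the guidance works, the pass rate with the
# skill loaded should be meaningfully higher than without it.
# with_rate = pass_rate(outputs_with_guidance)
# without_rate = pass_rate(outputs_without_guidance)
```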
                  
  Overall: this is one of the more thoughtful "AI context for a specific platform" projects I've seen. The eval-driven approach in particular is ahead of where most teams are.