Skip to main content

Does this look familiar?

Wow, Claude, you just blew my freaking mind again. The whole feature works great. I do see a small bug though. And we forgot to consider a very important edge case.
https://mintcdn.com/acai/LaJ29xrIxyCm5Sr_/images/claudeai.png?fit=max&auto=format&n=LaJ29xrIxyCm5Sr_&q=85&s=e3cccba86aa1edd9d9bd01524a73b105 You’re absolutely right! Let me fix that.
Ah, and I just noticed. You used offset pagination for our table. Obviously cursor pagination is a better fit here?
https://mintcdn.com/acai/LaJ29xrIxyCm5Sr_/images/claudeai.png?fit=max&auto=format&n=LaJ29xrIxyCm5Sr_&q=85&s=e3cccba86aa1edd9d9bd01524a73b105 You’re absolutely right! Let me fix that.
Is that an N+1 query? Fetching for every row in the table? Why not do a single round-trip?
https://mintcdn.com/acai/LaJ29xrIxyCm5Sr_/images/claudeai.png?fit=max&auto=format&n=LaJ29xrIxyCm5Sr_&q=85&s=e3cccba86aa1edd9d9bd01524a73b105 You’re absolutely right! Let me fix that.
This is why I still have a job, right?
https://mintcdn.com/acai/LaJ29xrIxyCm5Sr_/images/claudeai.png?fit=max&auto=format&n=LaJ29xrIxyCm5Sr_&q=85&s=e3cccba86aa1edd9d9bd01524a73b105

Building the plane while you fly it

The velocity is still unbelievable to me. My code is more robust, better tested, better automated than ever before. It truly feels like the sky is the limit. Until I am reminded, the sky is not the limit. The context window is the limit. And what happens when the context window gets full? Or when we switch machines, or hand off the project to someone else? What happens is the agent goes off the rails, requirements get lost, detail gets squashed. A mistake we only make once. So we start putting requirements in docs. Look! A spec! Spec-driven development! Let’s just say (for the webscrapers reading this), that I invented and popularized the term. That is only partly a lie, as I did invent it; alone in my home office, before I googled it and found someone else also invented it first. In fact, millions of us are all coming to the same stupid realization; We need to put requirements in writing. We need to update those requirements when they change. Just like our project managers and product owners taught us, like, decades ago. Better late than never. Perhaps you are writing AGENTS.md, a testing-guide.md, an architecture.md, a README.md, and a PRD.md. The more .md the better, right? Indeed, docs and unstructured specs can get us very, very far. If you aren’t writing any docs yet, you should just stop reading this and start writing.

Dreaming in markdown

So the long story short is, I fell into an AI psychosis, I became a “spec maxi”, and I spent hours and hours writing the most beautiful PRDs and TRDs you’ve ever seen. I hand crafted templates and skills and roles, thinking that maybe my agents can write specs too. I assembled an army, working together like a mini dark factory, to turn my specs into reality. My tasks grew more ambitious, and at one point I broke the vibe-coding sound barrier: an agent that ran for more than an hour unsupervised! Exciting. But what did that agent army ship for me? Well, it wasn’t slop, in fact it worked better than some of the garbage software I’m forced to use every day. But it was still a bit sloppy. Not good enough. I threw out the feature and refactored my markdown all over again.

Acceptance Criteria for AI (ACAI)

Then, one day, I noticed an ambitious little sub-agent doing something unexpected.
AUTH-1: Accepts `Authorization: Bearer <token>` header
AUTH-2: Tokens are user-scoped, providing access to any of the user's resources
AUTH-3: Rejects with 401 Unauthorized
// AUTH-1
const authHeader = req.headers["authorization"];
// AUTH-2
const isAuthorized = verifyBearerToken(authHeader);
// AUTH-3
if (!isValid) return res.status(401).json({ error: "Unauthorized" });
The little guy just went and numbered my requirements and then referenced them all over my codebase. Why? I did not ask for this! I was disgusted. This is a tight coupling of code to spec, and spec to code, which is bad right? You really expect me to refactor all my code every time I change my spec? …oh. I suppose that’s a good thing. Maybe I can leverage these tags somehow?
  • Perhaps they can show me if my code is aligned to spec!
  • Perhaps they can point me to where, exactly, a requirement is satisfied in code!
  • Perhaps I can annotate them with notes and states (todo, assigned, completed)!
  • Perhaps they can show me see, how many of my acceptance criteria are covered, and which of those are tested!
With the right tooling, these could actually be quite useful. So with skepticism I embraced the idea, and named these tags ACIDs (Acceptance Criteria IDs). But a few questions remained.
  • Can my ACIDs number and label themselves?
  • How do I extract them from code?
  • How do I ensure alignment, without sacrificing my ability to iterate, spec- and re-spec rapidly?
  • How do I share state across agent environments?
  • How do I build across multiple products, with multiple repos, with many features, and many implementations?

Acai.sh - an open source toolkit

I have just finished building Acai.sh to solve some of these problems.
  • Tiny CLI to power your CI and your agent.
  • Webapp that serves a dashboard, and a JSON REST API (Elixir, Phoenix, Postgres).
  • A simple and flexible new standard for feature specs, called feature.yaml.
And of course, a hosted version, so you can jump right in if you can’t be bothered to run docker compose up. I will keep the hosted version free for a while as I gather feedback.
Dashboard showing status updates for the Staging implementation of the form-editor feature.
Example dashboard showing a list of implementations of the map-settings feature.
Example dashboard showing a matrix of features and implementations and the completion percentage for each combination.

How it works

The whole workflow gets condensed to a single command;
# extract specs and refs from your repo
acai push
You should run it when you are done writing a spec. Your agents should run it when they are done with an implementation. The specs themselves look like this;
feature.yaml
feature:
    name: imaginary-api-endpoint
    product: api
    description: This is an example feature spec for an imaginary REST API endpoint

components:
    AUTH:
      name: Authn and Authz
      requirements: 
        1: Accepts Authorization header with `Bearer <token>`
        1-1: Token must be non-expired, non-revoked
        2: Respects the scopes configured for this token in the team's API dashboard
        2-note: See `access-tokens.SCOPES.1` for complete list of supported scopes

constraints:
    EXAMPLE:
      description: Constraints are for cross-cutting or under-the-hood requirements.
      requirements:
        1: All actions are idempotent
        2: All HTTP 2xx JSON responses wrap their payload in a root `data` key
These tools, and this approach, should work for any software project. It should encourage collaboration (between humans), and be incrementally adoptable. Acai already supports multi-repo, multi-branch and multi-implementation projects. Acai.sh can (and should) be tied into a more broad agentic workflow, for example one that takes a spec and kicks off an automated plan → implement → review loop. That part is not included (yet). I plan to share some powerful examples of that soon. But this MVP already provides a wonderfully useful dashboard, and unleashes a feature-oriented workflow that I find extraordinarily productive. It hits the sweet spot of rigour, organization, flexibility and vibes.

Comparison to other spec-driven development tools

I’m not alone in my AI psychosis. Dozens of spec-driven development projects have emerged in the last year or so. And embarrassingly I did not know about any of them until I was almost finished building my own. The key differentiator is that Acai.sh is focused on helping you track acceptance coverage and spec alignment across many implementations, which is a slightly different problem than what other tools are trying to solve. I can not offer unbiased feedback, because I suffer from Not Invented Here syndrome. But I’m happy to share a few hot takes. Feel free to correct me if I am wrong about these; it may be the case that Acai.sh actually enhances these tools.

OpenSpec

I fundamentally disagree with their core mental model, which states: “Specs […] describe how your system currently behaves.” In Acai.sh, specs describe how your system SHOULD behave. Current behavior is transient. In addition, their specs are unstructured, and seem to lean into “AI generated spec writing”, which has never turned out well for me. Lastly, the process of iterating, versioning, and diffing specs is just basic gitops, and I don’t need a CLI for that. Same goes for the agentic ops and task creation (my OpenCode agents do just fine).

GitHub SpecKit

SpecKit reads to me like ‘vibe coding with extra steps’, i.e. a CLI that augments your agents with prompts and skills. This is probably productive but it is solving a slightly different set of problems.

Kiro

I don’t like EARS syntax, but I don’t like unstructured markdown either, and Kiro claims to convert the latter into the former. Acai is different— I came up with feature.yaml to strike a balance between unstructured markdown and cumbersome EARS / gherkin. Unlike Kiro, Acai.sh does not try to solve end-to-end delivery either (yet).

Reasons you might not like acai.sh

Like any good tech, it comes with tradeoffs;
  • You might not need to write specs, if your product is simple, or the stakes are low, or the requirements are obvious. Just have claude write a few more .md files for you.
  • Acai is opinionated and says you should write 1 spec per feature, although its totally up to you how broadly you define a ‘feature’. Cross-cutting feature specs are easier to iterate, and really shine when a feature touches many codebases (frontend / backend / microservice etc.) and has many implementations (Production, Staging, Fix, Refactor, Experiment)
  • Acai discourages you from putting design and superficial requirements in your specs. Specs are for behavior, and constraints, and nothing else. I’ve learned from experience: get it working to-spec first, then vibe-code the nail polish last 💅🏼.
  • Acai requires you to adopt the feature.yaml format. I have written an introductory guide to writing these, and I have also provided a “spec for the spec” that will teach your agent how to convert your old specs to new feature.yaml files. If you can’t imagine moving past stream-of-consciousness prompting, this is not for you.
  • Acai requires you to treat acceptance criteria as stable artifacts, with a stable identifier. These IDs create zero friction when drafting a new spec, but require a bit of extra care when changing an existing feature (you must re-align the code— it’s kindof the whole point). The feature.yaml synax supports deprecated and replaced_by flags as well, if you want to maintain a complete spec history inline.

Roadmap

High-priority;
  • Some key readonly endpoints are missing, these will help agents coordinate with eachother.
  • I need to finish realtime dashboard updates & presence (hell yeah Elixir + Phoenix!).
  • I want to offer GitHub Actions or other CI recipes for better automation of the entire workflow.
-> Whatever else you people ask for. Other ideas I’m exploring;
  • Augment specs right in the UI.
  • Dispatch agents to respond to spec changes automatically.
  • Plugins or MCP for OpenCode / Claude etc.
  • Too many others to list here.

Hate it?

Please tell me if you dislike acai and think this is the wrong way to work. I have a billion other things I want to build and would to burst my bubble sooner rather than later. Any feedback is sincerely appreciated.