SET

Ship Exactly This.

Give it a spec. Get merged features.

Autonomous multi-agent orchestration for Claude Code. Greenfield or brownfield. Full app or single module. Every change planned through OpenSpec, verified by quality gates, merged automatically.

$ git clone https://git.setcode.dev/root/set-core.git && cd set-core && ./install.sh
// after install:
1. Open http://localhost:7400 — dashboard starts automatically
2. Open Claude Code in set-core dir, type: "run a micro-web E2E test"
3. Start sentinel from the dashboard — or tell Claude: /set:start
4. Watch the dashboard — agents decompose, implement, verify, merge
5. When done, tell Claude: "start the application that was just built"
6. Ready for your own project: set-project init --project-type web
// six pillars

The architecture behind autonomous orchestration

Six principles that make it work. Each was learned the hard way: through failures that cost hours of compute, corrupted branches, and wasted overnight runs.

SPECIFY: Structured input, not prompts
DECOMPOSE: Intelligent planning, not guessing
EXECUTE: Structured implementation, not free rein
SUPERVISE: Three-tier supervision, not babysitting
VERIFY: Deterministic quality, not vibes
LEARN: Every run improves the next

Structured input, not prompts

Output quality depends on input quality. 90% of agent failures stem from underspecification.

openspec_workflow

Structured artifacts: proposal → design → spec → tasks → code. Acceptance criteria (WHEN/THEN), requirement IDs (REQ-xxx), end-to-end traceability. Agents implement against the spec, not their imagination.
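To make "acceptance criteria (WHEN/THEN)" and "requirement IDs" concrete, here is a minimal Python sketch that turns scenario lines into structured records. The one-line `REQ-…: WHEN … THEN …` format and the `Scenario` / `parse_scenarios` names are hypothetical; OpenSpec's actual artifact format may differ.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class Scenario:
    req_id: str
    when: str
    then: str

# Hypothetical line format: "REQ-CART-001: WHEN <condition> THEN <expected outcome>"
LINE = re.compile(r"^(REQ-[A-Za-z0-9-]+):\s*WHEN\s+(.+?)\s+THEN\s+(.+)$")

def parse_scenarios(text: str) -> list[Scenario]:
    """Collect every WHEN/THEN scenario line; other lines are ignored."""
    scenarios = []
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match:
            scenarios.append(Scenario(*match.groups()))
    return scenarios

spec = """
REQ-CART-001: WHEN the cart is empty THEN show an empty-cart message
REQ-CART-002: WHEN a product is added THEN the total includes its price
Free-form notes like this line are not scenarios.
"""
for s in parse_scenarios(spec):
    print(s.req_id, "|", s.when, "|", s.then)
```

Each record carries its REQ-ID, so a test or a code change can later be traced back to the exact requirement it satisfies.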

design_bridge

Figma Make → set-design-sync → per-change design.md with scope-matched tokens. Each agent gets only the colors, fonts, and layouts for its own pages. Not "make it look nice": exact hex values, exact spacing. Part of the built-in web project type, the battle-tested default.

spec_flexibility

Full 30-page spec with Figma design. Or 3 requirements for your existing codebase. Or a single task description. The pipeline scales from a sentence to a specification. /set:write-spec generates structured specs interactively.

"We told agents 'build a cart feature.' Result: no price calculation, no empty-cart state, no persistence. With OpenSpec: 8 requirements, 8 implemented."

Waterfall took 8 months. This takes 8 hours.

The principle hasn't changed: output quality depends on input quality. A detailed spec used to mean months of upfront planning. Now it means hours of orchestrated agents building exactly what you described.

You are the product owner. The agents are the dev team. The spec is the sprint backlog. The better the spec, the better the result.

your_spec

Business requirements, acceptance criteria (WHEN/THEN), technical constraints, dependency listing, seed data conventions.

our_templates

Framework boilerplate, build config, test setup, linting rules, conventions. You say what. Templates handle how.

Intelligent planning, not guessing

One big task fails. Many small ones succeed. Max 6 requirements per change: above that, the failure rate spikes.

dependency_dag

Spec → requirements → dependency graph → phased execution. Changes that don't depend on each other run in parallel. Changes that do, wait. The planner does this automatically — no manual sprint planning.
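Phased execution over a dependency graph is essentially topological layering. A minimal sketch using Python's standard-library `graphlib`, assuming a plain dict that maps each change to the changes it depends on (the change names are hypothetical, and this is not SET's planner):

```python
from graphlib import TopologicalSorter

def plan_phases(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group changes into phases; every change in a phase depends only on earlier phases."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    phases = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable, readable output
        phases.append(ready)
        sorter.done(*ready)  # unblocks changes that were waiting on this phase
    return phases

# Hypothetical changes: cart needs catalog; checkout needs cart and auth.
deps = {
    "auth": set(),
    "catalog": set(),
    "cart": {"catalog"},
    "checkout": {"cart", "auth"},
}
print(plan_phases(deps))  # → [['auth', 'catalog'], ['cart'], ['checkout']]
```

Each inner list is one phase: its changes share no edges, so they can run in parallel worktrees while later phases wait.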

parallel_worktrees

Multiple Claude agents in isolated git worktrees. Real branches, real merges. No containers, no VMs — just git. Even with a single change, the worktree provides isolation — your main branch stays clean until gates pass. With multiple changes, they run in parallel without interference.

complexity_aware_allocation

S/M/L sizing per change. Token budgets (S: 2M, M: 5M, L: 10M). Model selection per complexity. These thresholds come from 100+ production runs — not guesswork.
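A budget check built on the S/M/L figures above might look like this. Only the 2M/5M/10M numbers come from this page; the `warn_ratio` threshold, the function name, and the three status strings are assumptions for illustration.

```python
# Token budgets per size class, from the S/M/L figures on this page.
TOKEN_BUDGETS = {"S": 2_000_000, "M": 5_000_000, "L": 10_000_000}

def budget_status(size: str, tokens_used: int, warn_ratio: float = 0.8) -> str:
    """Classify a change's token usage as 'ok', 'warn', or 'pause'.

    warn_ratio is a hypothetical early-warning threshold, not a SET setting."""
    budget = TOKEN_BUDGETS[size]
    if tokens_used >= budget:
        return "pause"  # over budget: stop the agent instead of burning more tokens
    if tokens_used >= warn_ratio * budget:
        return "warn"   # approaching the limit: worth surfacing on the dashboard
    return "ok"
```

The "pause" branch corresponds to the auto-pause when over budget that agents are described as having.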

"CraftBrew auth change hit 443K tokens (222% of context window) and failed. After splitting into smaller changes: each completed in 15-20 minutes."

Structured implementation, not free rein

Agents don't get a prompt and good luck. They get structured artifacts, project-type conventions, and iterative loops with progress tracking.

ralph_loop

Iterative agent development cycle: proposal → design → spec → tasks → code. Not a single-shot prompt — multiple iterations with stall detection, done criteria, and context pruning between turns. Single-shot gets you 70% done. The loop gets you to merge.
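The loop's control flow (iterate, track progress, stop on done or on stall) can be sketched as follows. `step` is a hypothetical stand-in for one agent turn that reports how many tasks are complete; the `max_iterations` and `stall_limit` defaults are assumptions, and context pruning between turns is not modeled.

```python
from typing import Callable

def run_loop(step: Callable[[int], int], total_tasks: int,
             max_iterations: int = 20, stall_limit: int = 3) -> str:
    """Iterate until all tasks are done, progress stalls, or iterations run out."""
    best = 0     # most tasks ever reported complete
    stalled = 0  # consecutive turns without new progress
    for i in range(max_iterations):
        done = step(i)
        if done >= total_tasks:
            return "done"            # done criterion met
        if done > best:
            best, stalled = done, 0  # real progress resets the stall counter
        else:
            stalled += 1
            if stalled >= stall_limit:
                return "stalled"     # escalate instead of looping forever
    return "max_iterations"
```

Tracking the best progress seen, rather than the last value, keeps a turn that undoes work from resetting the stall counter.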

project_types

The built-in web project type: Next.js, Playwright, Prisma templates. Agents work within an existing structure instead of starting from scratch. Convention enforcement, route groups, colocation rules. Build your own for any stack: fintech, healthcare, CLI, API.

agent_tooling

Every agent gets: scoped proposal, task list with REQ-IDs, design.md with exact tokens, MCP tools for memory and team sync. Plus: token budget awareness, progress-based trend detection, and auto-pause when stuck or over budget.

"Single-shot prompts get ~70% done and stop. The Ralph Loop iterates: write code, run tests, read errors, fix, repeat. That last 30% is where the value is."

Three-tier supervision, not babysitting

The goal is always the happy path. But when things break — and they will — recovery must be fast, thorough, and automatic.

sentinel_supervisor

3-tier decision model: sentinel → orchestrator → agents. Each tier handles its own failure mode. Agents handle code errors. Orchestrator handles workflow errors. Sentinel handles infrastructure — crashes, disk, deadlocks. 30s detection, auto-recovery.

watchdog_intelligence

Context-aware stall detection. pnpm install taking 90s with no stdout? Grace period. Prisma migration running? Extended timeout. Graduated escalation: warn → restart → rebuild → give up. Not "no output = dead."
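Context-aware, graduated escalation can be expressed as a grace-period lookup plus an escalation ladder. The command patterns and grace values below are illustrative assumptions (chosen so a 90-second silent `pnpm install` stays within its grace period), not SET's actual watchdog configuration.

```python
from typing import Optional

# Hypothetical grace periods: seconds of silent stdout tolerated per command pattern.
GRACE_SECONDS = {"pnpm install": 120, "prisma migrate": 300}
DEFAULT_GRACE = 60
ESCALATION = ["warn", "restart", "rebuild", "give_up"]

def grace_for(command: str) -> int:
    """Return the grace period of the first pattern found in the command line."""
    for pattern, seconds in GRACE_SECONDS.items():
        if pattern in command:
            return seconds
    return DEFAULT_GRACE

def watchdog_action(command: str, silent_for: float, strikes: int) -> Optional[str]:
    """Return None while silence is within the grace period, else the next step.

    strikes counts how many escalation steps were already taken for this command."""
    if silent_for < grace_for(command):
        return None
    return ESCALATION[min(strikes, len(ESCALATION) - 1)]
```

With strikes 0, 1, 2, 3 the result walks warn, restart, rebuild, give_up; a slow but expected step simply gets a longer grace period instead of being declared dead.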

team_sync

Multi-agent messaging. Broadcast status, avoid file conflicts, coordinate dependencies. The orchestrator sees what everyone is doing — and intervenes when needed.

web_dashboard

Real-time monitoring at localhost:7400. Step progress, gate results, token charts, agent terminal, sentinel decisions, learnings — every tab is live. Start orchestration from the browser. Not a CLI afterthought — a proper operations center.

"We aim for clean runs. But crashes happen — disk fills up, network drops, agent stalls. The sentinel detects in 30 seconds, diagnoses, restarts. Before it existed, we lost 3 overnight runs. Now: zero."

Deterministic quality, not vibes

Exit codes, not LLM judgment. You can't talk your way past a failing test.

quality_gates[7]

Test, build, E2E, lint, review, spec coverage, smoke. Sequential pipeline — fast gates first. If Jest fails in 8s, you don't wait 45s for Playwright. Exit codes decide pass/fail. BDD traceability binds REQ-IDs to tests.
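A sequential, fail-fast gate runner that trusts only exit codes takes a few lines. In this sketch each gate is a name plus an argv list; the demo uses `python -c` stand-ins so it runs anywhere, whereas a real project would plug in its own test, build, and Playwright commands. The function name is hypothetical.

```python
import subprocess
import sys

def run_gates(gates: list[tuple[str, list[str]]]) -> tuple[bool, list[tuple[str, int]]]:
    """Run gates in order and stop at the first non-zero exit code."""
    results: list[tuple[str, int]] = []
    for name, argv in gates:
        code = subprocess.run(argv).returncode  # the exit code is the only verdict
        results.append((name, code))
        if code != 0:
            return False, results  # slower gates later in the list never start
    return True, results

py = sys.executable
demo = [
    ("test", [py, "-c", "raise SystemExit(0)"]),
    ("build", [py, "-c", "raise SystemExit(1)"]),
    ("e2e", [py, "-c", "raise SystemExit(0)"]),  # skipped because build failed first
]
ok, results = run_gates(demo)
print(ok, results)  # → False [('test', 0), ('build', 1)]
```

Ordering the list from fastest to slowest is what turns "fail fast" into saved wall-clock time.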

self_healing

Gate fails → agent reads error → fixes → re-runs gate. Not "retry 3 times and give up." The agent diagnoses. MiniShop: 5 gate failures, 5 autonomous fixes — including IDOR vulnerabilities caught and patched without human review.

deterministic_output

3-layer templates + set-compare scoring. Run the same spec twice: 87% structural overlap on micro-web, 83% on minishop. Schema equivalence: 100%. Convention compliance: 100%. The remaining divergence is stylistic, not structural.
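One simple way to quantify "structural overlap" between two runs is Jaccard similarity over the sets of generated file paths. This is a hypothetical metric for illustration, not set-compare's actual scoring, and the paths are made up:

```python
def structural_overlap(run_a: set[str], run_b: set[str]) -> float:
    """Jaccard similarity of two runs' file-path sets: |A & B| / |A | B|."""
    if not run_a and not run_b:
        return 1.0  # two empty runs are trivially identical
    return len(run_a & run_b) / len(run_a | run_b)

# Hypothetical outputs of two runs of the same spec.
a = {"app/(shop)/cart/page.tsx", "app/(shop)/products/page.tsx",
     "prisma/schema.prisma", "lib/cart.ts"}
b = {"app/(shop)/cart/page.tsx", "app/(shop)/products/page.tsx",
     "prisma/schema.prisma", "lib/cart-utils.ts"}
print(f"{structural_overlap(a, b):.0%}")  # → 60%
```

A path-level metric like this only sees structure; stylistic differences inside files would need a separate, content-aware comparison.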

spec_coverage

"Tests pass" does not mean "spec is implemented." The verify gate checks that every REQ-ID has corresponding code. If 28/32 requirements are covered, auto-replan kicks in for the remaining 4. It doesn't stop until coverage reaches 100%.
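A naive version of such a coverage check scans code and test sources for REQ-ID mentions and reports which spec requirements are never referenced. Treating a textual REQ-ID reference as coverage is an assumption of this sketch; the regex and the `coverage` function are hypothetical.

```python
import re

# Hypothetical ID shape: "REQ-" followed by hyphen-separated alphanumeric parts.
REQ_ID = re.compile(r"REQ-[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*")

def coverage(spec_ids: set[str], sources: list[str]) -> tuple[float, set[str]]:
    """Return the covered fraction and the REQ-IDs no source text mentions."""
    if not spec_ids:
        return 1.0, set()
    referenced: set[str] = set()
    for text in sources:
        referenced.update(REQ_ID.findall(text))
    missing = spec_ids - referenced
    return (len(spec_ids) - len(missing)) / len(spec_ids), missing

# 32 requirements in the spec, tests that reference only the first 28.
spec = {f"REQ-{n:03d}" for n in range(1, 33)}
tests = [f"test('REQ-{n:03d} renders', () => {{}})" for n in range(1, 29)]
ratio, missing = coverage(spec, tests)
print(f"{ratio:.1%}", sorted(missing))  # → 87.5% ['REQ-029', 'REQ-030', 'REQ-031', 'REQ-032']
```

The `missing` set is exactly what an auto-replan step would turn into follow-up changes.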

"Early CraftBrew: LLM review let through a 'TODO: implement later' that broke checkout. After that — exit codes only. Deterministic gates can't be gamed."

Every run improves the next

The real value shows from run #2 onward. Every error occurs only once.

cross_run_learnings

Gate failures become planning rules. set-harvest extracts framework-level fixes from 100+ runs across 4 projects. Each run is smarter than the last — not by prompting better, but by codifying what went wrong into rules.

persistent_memory

Hook-driven cross-session recall. Agents learn from each other. Shared across worktrees. In 15+ sessions, agents made 0 voluntary memory saves. Zero. So we built a 5-layer hook infrastructure that captures everything automatically.

template_system

Telling 5 agents "create a Next.js project" produces 5 different directory structures. Templates produce one. 3-layer system: core → module → project. Reduced file structure divergence from 63% to 0%.
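Layered templates resolve like an ordered override: project files win over module files, which win over core files. A minimal sketch, with a hypothetical function name and made-up file names and contents:

```python
def resolve_templates(*layers: dict[str, str]) -> dict[str, str]:
    """Merge template layers in order; a later layer replaces any file an earlier one defined."""
    merged: dict[str, str] = {}
    for layer in layers:
        merged.update(layer)
    return merged

# Hypothetical layers: path -> which layer's version of the file wins.
core = {"tsconfig.json": "core", ".eslintrc.json": "core"}
module = {"tsconfig.json": "web", "playwright.config.ts": "web"}
project = {"tsconfig.json": "project"}
files = resolve_templates(core, module, project)
print(sorted(files.items()))
# → [('.eslintrc.json', 'core'), ('playwright.config.ts', 'web'), ('tsconfig.json', 'project')]
```

Because every agent resolves the same layers in the same order, five agents start from one directory structure instead of inventing five.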

"The first run always reveals problems. The point: every error occurs only once, because the system codifies the fix into rules, templates, or gates."
// the pipeline

From spec to merged code — fully autonomous

(screenshots: Markdown specification · Figma design)
input [SPECIFY]: markdown spec + Figma design. Use /set:write-spec for interactive spec generation and set-design-sync to extract Figma tokens.
(screenshot: Digest, showing domains, requirements, acceptance criteria)
digest [SPECIFY, DECOMPOSE]: spec parsed into structured requirements, domains, and a dependency graph.
(screenshot: Triage, ambiguity resolution)
triage [SPECIFY]: ambiguities flagged during digest get resolved, automatically by the planner or interactively by you. Nothing proceeds until every AMB has a decision.
(screenshots: Parallel phases · Token usage)
orchestrate [DECOMPOSE, EXECUTE, SUPERVISE]: independent changes run in parallel worktrees. Agents implement iteratively. Sentinel monitors everything.
(screenshots: Gate results · Sentinel log)
verify [VERIFY]: 7 quality gates per change (test, build, e2e, lint, review, spec coverage, smoke). Exit codes, not judgment.
(screenshots: Orchestration complete · Running application)
ship [VERIFY, LEARN]: verified code merges to main. Result: a running application built from your spec, with zero intervention.
1,400+ commits · 65K core LOC · 376 specs · 860+ change artifacts · 100+ E2E runs
// works everywhere

Not just "build me an app from scratch"

The pipeline scales from a single feature to a full application. Your existing codebase, your workflow.

greenfield

Full app from spec + design. 6-15 changes, parallel agents, zero intervention.

MiniShop: 6/6 merged, 1h 45m, 38 unit tests, 32 E2E tests, 0 human interventions.

brownfield

Your existing codebase. Add features, refactor modules, fix technical debt. The pipeline reads your code first.

set-core itself is built with SET: 376 specs, 1,400+ commits, every one through OpenSpec.

isolated_unit

One module, one feature, one fix. Single change, full gate pipeline. Same quality guarantees at any scale.

"Add 3 API endpoints with auth to my existing Next.js project." One spec, one run, done.
// see it run

A real agent session — spec to merged code

(video: Claude agent session debugging, testing, and fixing code autonomously)
// proof

We treat determinism as an engineering problem

100+ runs, 4 project types, set-compare scores every pair. These are measurements, not claims.

1,400+ commits
83-87% convergence
7 gates per change
100+ E2E runs
0 overnight failures (with sentinel)
challenge | approach | result
output divergence | 3-layer template system + set-compare | 87% micro-web · 83% minishop · 4 project types
convention compliance | route groups, colocation, naming rules | 100% across all runs
quality roulette | 7 programmatic gates (exit codes) | deterministic
hallucination | OpenSpec artifacts + acceptance criteria | spec-verified
spec drift | coverage tracking + auto-replan | 100% coverage
failure recovery | issue pipeline (detect → diagnose → fix) | auto-recovery
agent amnesia | hook-driven memory (infrastructure) | 100% capture

It doesn't retry. It investigates.

The sentinel doesn't blindly retry failed gates. It reads logs, traces root causes, and dispatches targeted fixes. Environment misconfigured? It reconfigures. Dependency conflict? It resolves. Bug in SET's own code? It patches set-core and commits the fix — so the same failure never happens twice.

Detect → investigate → fix → verify → learn. Permanent fixes, not temporary workarounds.

(screenshot: Sentinel Issues Dashboard, global issues view of the self-healing pipeline across 100+ orchestration runs)

Real issue tracker from 100+ orchestration runs. Every resolved issue was fixed autonomously.

// commands

40+ tools. One workflow.

Slash commands in Claude Code, CLI tools in your terminal. Everything composes.

use_set // your project
  set-project init      deploy to project
  /set:write-spec       interactive spec gen
  /set:decompose        spec → execution plan
  /set:start            start orchestration
  set-design-sync       Figma → tokens
  /set:audit            health check
extend_set // plugins
  modules/web/          built-in web type
  ProjectType           ABC base class
  entry_points          pip plugin system
  CoreProfile           inherit universal rules
  planning_rules.txt    domain patterns
  templates/            scaffold per stack
develop_set // contribute
  /opsx:new             structured change
  /opsx:apply           implement tasks
  /opsx:verify          check before merge
  set-harvest           adopt E2E fixes
  set-compare           measure divergence
  set-memory            persistent recall

Plus: set-new, set-work, set-merge, set-close (worktrees) · /set:status, /set:msg, /set:inbox (team sync) · /set:todo, /set:loop, /set:push (workflow)

// ecosystem

Build your own project type

SET ships with a web project type (Next.js, Playwright, Prisma) battle-tested across 100+ runs. That's the default — but the real power is building your own.

modules/web/

Next.js, Playwright, Prisma. 100+ runs across micro-web, minishop, craftbrew. Per-change E2E, BDD traceability, convention enforcement.

E2E runners

Scaffold → init → register → sentinel start. One script per project. run-micro-web.sh, run-minishop.sh, run-craftbrew.sh. Repeatable validation.

your_project_type/

IDOR checks for fintech. HIPAA for healthcare. Your gates, your conventions, your templates. pip-installable plugin inheriting CoreProfile.
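A custom project type shipped as a pip-installable plugin could be shaped like the sketch below. `ProjectType` here is a hypothetical stand-in (SET's real base class, and its relationship to CoreProfile, may expose a different interface), the gate names are illustrative, and the entry-point group `set.project_types` is invented; only `importlib.metadata.entry_points` is a real standard-library API.

```python
from abc import ABC, abstractmethod
from importlib.metadata import entry_points

class ProjectType(ABC):
    """Hypothetical stand-in for SET's ProjectType base class."""

    name: str = "base"

    @abstractmethod
    def gates(self) -> list[str]:
        """Return gate names in execution order (fast gates first)."""

class FintechProjectType(ProjectType):
    name = "fintech"

    def gates(self) -> list[str]:
        # Core gates plus a domain-specific IDOR audit (hypothetical gate name).
        return ["test", "build", "lint", "idor-audit", "e2e", "spec-coverage", "smoke"]

def discover(group: str = "set.project_types") -> dict[str, type]:
    """Load plugin classes registered under an entry-point group (group name is hypothetical)."""
    try:
        eps = entry_points(group=group)      # Python 3.10+ selection API
    except TypeError:
        eps = entry_points().get(group, [])  # Python 3.9: a dict keyed by group
    return {ep.name: ep.load() for ep in eps}
```

A plugin package would register its class in its own packaging metadata (for example under `[project.entry-points."set.project_types"]` in `pyproject.toml`, again with the hypothetical group name), so `discover()` finds it after `pip install` without any change to the core.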

// why now

Single-agent was the start. Orchestration is the present. Enterprise is preparing.

Systems like SET can do the work of a full development team — given the right spec and properly developed project types. This is the present, not the future.

Don't blame the model. 90% of agent failures stem from underspecification on our side. SET exists to enforce structure, verify output, and close those gaps.

Enterprise is next. On-premise models and secure multi-tenant deployments: the infrastructure is coming. Every organization should prepare now.

Model providers will build orchestration natively. We welcome that. But we're not waiting.

// work with us

Build With SET

Open-source and autonomous. Need something custom? We can help.

custom_project_type

We build a ProjectType plugin for your stack and domain. Your rules, your gates, your templates. Pip-installable, works with set-project init.

workshop

Hands-on spec-driven development training. Write specs that produce working apps. Run orchestration, understand gates, build memory. Remote or on-site.

managed_run

Send a spec, get a working app. We run the orchestration, you review the PRs. Quality gates guarantee the output. Ideal for MVPs and proof-of-concepts.

// one more thing

when orchestration gets intense, defend your changes.

Battle View

arrow keys + space. every change is a ship.