Six agents across three projects. Four calls need you, and one's been waiting 42 minutes.
You clear the calls that matter while the rest keeps running.
spacedock
experiment spacedock · degradation experiment
Validator rejected the experiment. Send it back, or close it?
recommend REJECTED next
- The env var didn't disable teams: disabled mode was never actually tested.
- Only the pilot path was tested, never a fresh session.
- The agent avoided creating teams because the task description said "degradation".
- retest in a fresh session
- a real mechanism to disable team tools
- a neutral task description
design spacedock · agent-reuse design
Design review passed 5 of 5 checks. Approve it and start the build?
recommend approve next
- ✓ problem grounded in codebase evidence
- ✓ approach: when to reuse an agent, when to start fresh
- ✓ edge cases: model mismatch, budget overrun
- ✓ 7 testable acceptance criteria
approve email_triage · work-inbox
37/37 categorized: approve the sweep? (1 flag)
recommend approve · resolve 1 flag next
- 6 delete: DMARC reports, one cold sales pitch
- 21 archive: receipts, newsletters, alerts, recaps
- 8 star: investor, partnership, legal, finance follow-ups
- 2 calendar: a recording session today, a fireside chat
- ⚑ a cold "let's chat" from an academic contact: archive or star?
evaluate spacedock_gtm · naming
4 naming metaphors scored by 5 personas: pick a direction.
recommend approve · archive results next
- Nautical · 4.4 comp · 2.8 appeal · clearest, 1 persona appeal=1
- Minimal · 4.2 comp · 3.2 appeal · best-liked
- Business · 4.2 comp · 2.8 appeal
- Restaurant · 4.2 comp · 2.8 appeal · "season" failed everyone
_results/; the naming call stays yours
Nothing is waiting on you. The agents keep working.
Agents produce work faster than you can verify it.
Verification breaks down in four predictable ways:
Interruptions you can't predict
You sandboxed the agent, so permissions aren't the problem. The problem is that you can't batch interruptions you can't predict.
Context switches between unrelated decisions
A design call, a one-line approval, and a ship-without-tests call arrive back to back. The switching is the cost.
Rubber-stamping
An agent reviewing its own work writes a press release; you stop reading closely.
You become the human messenger between agents
One agent's output is the next agent's input, and you carry it by hand: copy the findings, paste the context, explain it again.
The bottleneck is judgment, not generation.
You set the bar. We hold agents to it.
Define the bar.
Write down what done means for this work. The bar starts rough and sharpens every time you reject.
A separate agent reviews.
A fresh agent reviews the work against your bar and catches the cut corners and unsupported claims the maker would wave through.
Iterate until it passes.
Rejected work goes back with findings. The agents rework it until the evidence meets your standard, and only then does it reach you.
Every call is recorded in a plain file in your repo: what was decided, on what evidence, and why.
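To make the loop above concrete, here is a minimal sketch of a maker and a separate reviewer iterating until the work passes. It is an illustration under stated assumptions, not Spacedock's implementation: `Review`, `Decision`, `run_until_bar`, the `max_rounds` cutoff, and the toy maker and reviewer are all hypothetical names invented for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical types for illustration only; none of these names come from Spacedock.


@dataclass
class Review:
    passed: bool
    findings: List[str] = field(default_factory=list)


@dataclass
class Decision:
    verdict: str          # "ready_for_human" or "escalated"
    rounds: int           # how many make/review rounds were run
    evidence: List[str]   # findings collected from every rejected round


def run_until_bar(
    make: Callable[[List[str]], str],
    review: Callable[[str], Review],
    max_rounds: int = 3,
) -> Decision:
    """Loop maker -> reviewer until the reviewer passes the work or rounds run out."""
    findings: List[str] = []
    history: List[str] = []
    for round_no in range(1, max_rounds + 1):
        work = make(findings)      # maker reworks using the previous findings
        result = review(work)      # a separate reviewer checks against the bar
        if result.passed:
            return Decision("ready_for_human", round_no, history)
        findings = result.findings
        history.extend(f"round {round_no}: {f}" for f in findings)
    # Assumption: work that never passes is escalated rather than looping forever.
    return Decision("escalated", max_rounds, history)


# Toy stand-ins. The "bar" here is simply: the work must mention a fresh session.
def toy_maker(findings: List[str]) -> str:
    return "tested pilot path" + (" and fresh session" if findings else "")


def toy_reviewer(work: str) -> Review:
    if "fresh session" in work:
        return Review(passed=True)
    return Review(passed=False, findings=["never tested in a fresh session"])


decision = run_until_bar(toy_maker, toy_reviewer)
print(decision.verdict, decision.rounds, decision.evidence)
```

In this toy run the first attempt is rejected, the finding goes back to the maker, and the second attempt passes. The returned `Decision` carries the verdict and the findings behind it, which is the kind of record that could be written to a plain file.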
Built for the people who care about their craft when agents do the work.
You've used a coding agent hard enough to feel this. Spacedock meets your projects where they are: real code, drifting plans, a few dead ends. No re-scaffolding required.
You hand agents whole chunks of work
You build skills and are tired of running them by hand
You reach for agents beyond coding (email, GTM, content, research)
You live in Claude Code or Codex all day and feel the slide into rubber-stamping
Run the scan on your own sessions.
brew install spacedock-dev/tap/spacedock
spacedock claude '/spacedock:survey'
Or read the docs: spacedock.md/docs · github.com/spacedock-dev/spacedock
No spam. We read every reply.