New: Spacedock is #1 on the Berkeley Data Agent Benchmark

Six agents across three projects. Four calls need you, and one's been waiting 42 minutes.

You clear the calls that matter while the rest keeps running.

Open source · Read the docs →

BEFORE · your terminals today
claude · email_triage
● fetch inbox · 59 unread
categorize delete/archive/star/calendar …
✓ 37/37 work · 22/22 personal
⏸ ambiguous: cold "let's chat" · archive or star? (42m)
reviewer: thorough triage, recommend approve
drafting archive batch …
● apply labels …
claude · spacedock
● validator: graceful-degradation experiment
⎿ env var didn't disable teams
only the pilot run was tested, never a fresh session
⏸ validation REJECTED: send back or close? [r/c]
● design review: agent reuse
⎿ reviewer APPROVE · 5/5 · 7 ACs
⏸ approve the design? [y/N]
claude · gtm
scoring naming variants · 5 personas … 4/4
⎿ Nautical 4.4 comp · 2.8 appeal
⏸ approve naming eval? pick a variant
● research · guardrail run …
● infra · sync reconcile …
agent: excellent work, merging ✦
scrollback +2,140 lines …
⚠ missed · 42m ago
4 decisions buried · 1 missed · impact: who knows
pooled by spacedock
AFTER · surfaced by spacedock
4 decisions · pooled across 3 projects · real · ~/.agentsview
experiment · spacedock · degradation experiment
Validator rejected the experiment. Send it back, or close it?
recommend REJECTED
why it was rejected
  • The env var didn't disable teams: disabled mode was never actually tested.
  • Only the pilot path was tested, never a fresh session.
  • The agent avoided creating teams because the task description said "degradation".
validator recommends
  • retest in a fresh session
  • a real mechanism to disable team tools
  • a neutral task description
your call → send back with these findings, or close it?
design · spacedock · agent-reuse design
Design review passed 5 of 5 checks. Approve it and start the build?
recommend approve
reviewer: APPROVE · checklist 5/5
  • problem grounded in codebase evidence
  • approach: when to reuse an agent, when to start fresh
  • edge cases: model mismatch, budget overrun
  • 7 testable acceptance criteria
approve → opens implementation in a worktree
approve · email_triage · work-inbox
37/37 categorized: approve the sweep? (1 flag)
recommend approve · resolve 1 flag
proposed actions · 37 emails
  • 6 delete: DMARC reports, one cold sales pitch
  • 21 archive: receipts, newsletters, alerts, recaps
  • 8 star: investor, partnership, legal, finance follow-ups
  • 2 calendar: a recording session today, a fireside chat
flag for you
  • a cold "let's chat" from an academic contact: archive or star?
approve → applies 36 actions, leaves the 1 flagged email for your one-tap call · (public page uses synthetic inbox)
evaluate · spacedock_gtm · naming
4 naming metaphors scored by 5 personas: pick a direction.
recommend approve · archive results
cross-variant scores · comprehension / appeal
  • Nautical · 4.4 comp · 2.8 appeal · clearest, 1 persona appeal=1
  • Minimal · 4.2 comp · 3.2 appeal · best-liked
  • Business · 4.2 comp · 2.8 appeal
  • Restaurant · 4.2 comp · 2.8 appeal · "season" failed everyone
approve → archives all 4 results to _results/; the naming call stays yours

Agents produce work faster than you can verify it.

Review breaks down in four predictable ways:

Interruptions you can't predict

You sandboxed it, so permissions aren't the problem. You can't batch interruptions you can't predict.

Context switches between unrelated decisions

A design call, a one-line approval, and a ship-without-tests call arrive back to back. The switching is the cost.

Rubber-stamping

An agent reviewing its own work writes a press release; you stop reading closely.

You become the human messenger between agents

One agent's output is the next agent's input, and you carry it by hand: copy the findings, paste the context, explain it again.

The bottleneck is judgment, not generation.

You set the bar. We hold agents to it.

Define the bar.

Write down what done means for this work. The bar starts rough and sharpens every time you reject.

A separate agent reviews.

A fresh agent reviews the work against your bar and catches the cut corners and unsupported claims the maker would wave through.

Iterate until it passes.

Rejected work goes back with findings. The agents rework it until the evidence meets your standard, and only then does it reach you.

Every call is recorded in a plain file in your repo: what was decided, on what evidence, and why.
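
The loop above can be sketched in a few lines of Python. This is an illustration only, not Spacedock's implementation: `Review`, `review_loop`, and the demo maker and reviewer are hypothetical names, and the round limit is an assumption. The log lines stand in for whatever record ends up in your repo.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Review:
    """Hypothetical verdict from a fresh reviewer agent."""
    passed: bool
    findings: List[str] = field(default_factory=list)


def review_loop(
    make: Callable[[List[str]], str],
    review: Callable[[str], Review],
    max_rounds: int = 3,
) -> Tuple[Optional[str], List[str]]:
    """Rework until the reviewer passes the work or the rounds run out.

    Returns (work, log). work is None when nothing passed, so the caller
    can escalate to a human instead of shipping unreviewed output.
    """
    log: List[str] = []
    findings: List[str] = []
    for round_no in range(1, max_rounds + 1):
        work = make(findings)      # the maker sees the previous findings
        verdict = review(work)     # a separate reviewer checks against the bar
        if verdict.passed:
            log.append(f"round {round_no}: pass")
            return work, log
        log.append(f"round {round_no}: reject ({'; '.join(verdict.findings)})")
        findings = verdict.findings
    return None, log


# Toy maker and reviewer: the bar here is simply "the change has tests".
def demo_make(findings: List[str]) -> str:
    return "patch + tests" if findings else "patch"


def demo_review(work: str) -> Review:
    if "tests" in work:
        return Review(passed=True)
    return Review(passed=False, findings=["no tests cover the change"])


work, log = review_loop(demo_make, demo_review)
```

In this toy run the first draft is rejected, the findings go back to the maker, and the second draft passes. A maker that never addresses the findings exhausts the rounds and returns `None`, which is the case to surface to a person.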

Built for the people who care about their craft when agents do the work.

You've used a coding agent hard enough to feel this. Spacedock meets your projects where they are: real code, drifting plans, a few dead ends. No re-scaffolding required.

You hand agents whole chunks of work

You build skills and are tired of running them by hand

You reach for agents beyond coding (email, GTM, content, research)

You live in Claude Code or Codex all day and feel the slide into rubber-stamping

Run the scan on your own sessions.

brew install spacedock-dev/tap/spacedock
spacedock claude '/spacedock:survey'

Or read the docs: spacedock.md/docs · github.com/spacedock-dev/spacedock

No spam. We read every reply.