SHAPE · 03 · AGENTS
Agent systems.
Multi-step tool-use with human-in-the-loop checkpoints, guardrails that hold, and evals that catch drift before your users do. Built for workflows that are too important to hallucinate through.
- ship: 4–6 weeks
- best for: workflows, back-office, ops
- engine: Claude · tool-use loops · guardrails
- handoff: repo · keys · runbook · eval harness
the spec
What we actually ship.
01 · AGENTS
Bounded loops
Tool-use loops with budgets, timeouts, and stop conditions. No runaway agents.
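A minimal sketch of what "bounded" can mean in practice, assuming a loop driven by a single step function. The names `Budget`, `StepResult`, and `run_bounded` are hypothetical and only illustrate the idea: every exit from the loop is either a stop condition or an exhausted budget.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Budget:
    max_steps: int = 8          # hard cap on loop iterations
    max_seconds: float = 30.0   # wall-clock timeout for the whole run
    max_tokens: int = 20_000    # spend cap, in whatever unit the model reports


@dataclass
class StepResult:
    tokens_used: int
    done: bool                  # the step's own stop condition
    output: Optional[str] = None


def run_bounded(
    step: Callable[[int], StepResult],
    budget: Budget,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[str, Optional[str]]:
    """Run `step` until it reports done or a budget runs out.

    Returns (reason, output); reason is one of
    "done", "max_steps", "timeout", "token_budget".
    """
    started = clock()
    tokens = 0
    for i in range(budget.max_steps):
        # The timeout is checked before each step, so a slow step can overrun it.
        if clock() - started > budget.max_seconds:
            return "timeout", None
        result = step(i)
        tokens += result.tokens_used
        if result.done:
            return "done", result.output
        if tokens >= budget.max_tokens:
            return "token_budget", None
    return "max_steps", None
```

The loop always returns a reason alongside the output, so the caller can log why a run stopped rather than only that it did.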
02 · AGENTS
Human-in-the-loop
Checkpoints where it matters — approvals, escalations, irreversible actions. The agent knows what to escalate.
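One way to picture a checkpoint, as a sketch under stated assumptions: each action carries an `irreversible` flag, and anything flagged goes through an approver before the tool runs. `Action`, `execute`, and the `approve` callback are hypothetical names for illustration; in a real system the approver would be a human review step rather than a function call.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class Action:
    tool: str
    args: Dict[str, Any]
    irreversible: bool = False  # assumed to be set by the tool definition, not by the model


def execute(
    action: Action,
    tools: Dict[str, Callable[..., Any]],
    approve: Callable[[Action], bool],
) -> Dict[str, Any]:
    """Run a tool call, pausing for approval when the action is irreversible."""
    if action.irreversible and not approve(action):
        # Rejected actions never reach the tool.
        return {"status": "rejected", "tool": action.tool}
    result = tools[action.tool](**action.args)
    return {"status": "ok", "tool": action.tool, "result": result}
```

Reversible actions skip the approver entirely, which keeps the human in the loop only where it matters.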
03 · AGENTS
Guardrails that hold
Input and output filtering, schema enforcement, allow-listed tools. Not a layer of duct tape — designed in.
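A small sketch of allow-listing plus schema enforcement, using only the standard library. The tool names and the `ALLOWED_TOOLS` table are made up for illustration; the point is that a tool call is rejected unless the tool is on the list and its arguments match the declared shape exactly.

```python
from typing import Any, Dict

# Hypothetical allow-list: tool name -> required argument names and types.
ALLOWED_TOOLS: Dict[str, Dict[str, type]] = {
    "lookup_order": {"order_id": int},
    "send_email": {"to": str, "body": str},
}


class GuardrailError(ValueError):
    """Raised when a proposed tool call fails validation."""


def validate_tool_call(name: str, args: Dict[str, Any]) -> Dict[str, Any]:
    """Return args unchanged if the call is allowed, otherwise raise."""
    schema = ALLOWED_TOOLS.get(name)
    if schema is None:
        raise GuardrailError(f"tool not allow-listed: {name}")
    missing = set(schema) - set(args)
    unexpected = set(args) - set(schema)
    if missing or unexpected:
        raise GuardrailError(
            f"{name}: missing={sorted(missing)} unexpected={sorted(unexpected)}"
        )
    for key, expected in schema.items():
        # Exact type match, so "7" is not accepted where an int is required.
        if type(args[key]) is not expected:
            raise GuardrailError(f"{name}.{key}: expected {expected.__name__}")
    return args
```

Validation runs before the tool does, so a malformed or off-list call fails closed instead of being patched up downstream.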
04 · AGENTS
Drift detection
Evals that track behavior over time. When the agent drifts, you see it before the user does.
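A rough sketch of the simplest form drift detection can take: run a fixed set of cases, compute a pass rate, and compare it to a stored baseline. `pass_rate`, `has_drifted`, and the 0.05 tolerance are illustrative assumptions; a real harness would likely use richer scoring than exact string match.

```python
from typing import Callable, List, Tuple

Case = Tuple[str, str]  # (input, expected output)


def pass_rate(agent: Callable[[str], str], cases: List[Case]) -> float:
    """Fraction of cases where the agent's output equals the expected output."""
    if not cases:
        raise ValueError("need at least one eval case")
    passed = sum(1 for given, expected in cases if agent(given) == expected)
    return passed / len(cases)


def has_drifted(baseline: float, current: float, tolerance: float = 0.05) -> bool:
    """True when the current pass rate has dropped below baseline by more than tolerance."""
    return (baseline - current) > tolerance
```

Running the same cases on a schedule and alerting when `has_drifted` returns true is one way to surface a regression before it reaches a user.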
architecture
How it fits together.
The skeleton we reach for first. We bend it to your stack; we don't bend your stack to it.
[ trigger ] ──▶ ┌──────────────┐
                │ CLAUDE LOOP  │
                │ tool-use     │
                └──┬────┬────┬─┘
                   │    │    │
            ┌──────┘    │    └──────┐
            ▼           ▼           ▼
       [ tool A ]  [ tool B ] [ approval? ]
            │           │           │
            └─────┬─────┘           ▼
                  │            ┌────────┐
                  │            │ human  │ ──▶ ok / reject
                  │            └────────┘
                  ▼
            ┌────────────┐  ┌──────────────┐
            │ AUDIT LOG  │  │ DRIFT EVALS  │
            └────────────┘  └──────────────┘
fit
Who this is for.
FIT · YES
You are:
- Running a back-office workflow that's structured: inputs, tools, a checkable output.
- Able to define what 'done well' looks like, so we can eval against it.
- Set on an agent that escalates, rather than improvises, on anything irreversible.
FIT · NO
You are not:
- Looking for an open-ended 'do anything' agent. We don't build those.
- Running a workflow where the output can't be checked. If you can't eval it, we can't ship it.
- A team without the appetite to keep the eval harness running post-handoff.
→ open a line
Book a call.
30 minutes. We'll tell you if an agent system is the right shape, or if it isn't.