Skip to main content
How-To7 min read

Running Claude Code as a team: subagents and parallel work

A single Claude Code session is one pair of hands working through a task in order. That is fine for most things, but some work is wider than one pair of hands can hold. Subagents are how you run a team instead of a soloist. Each subagent spins up with its own clean context, takes one scoped job, does it, and reports back. Used well, they make hard work faster and the results more reliable. Used badly, they multiply cost and confusion. Here is how I draw that line.

Why isolated context is the real benefit

The headline reason people reach for subagents is speed, but the deeper reason is context hygiene. When a subagent goes off to read forty files looking for one answer, all of that reading happens in its context, not your main thread. What comes back is the conclusion, not the pile of files it waded through. Your working session stays lean and focused, which keeps the assistant sharp on the actual task. A long investigation that would have buried your main context in noise instead happens off to the side, and you get only the signal.

One job, clearly scoped

A subagent does not share your session history, so it only knows what you tell it. That makes the brief everything. A good subagent task names the exact scope, "review this one file" rather than "review the code," carries enough context to stand on its own, states what the agent may and may not change, and says what a finished result looks like. A vague brief produces a vague result, and because the agent is working out of your sight, you do not notice the drift until it hands something back. Tight scope in, useful work out.

Run independent work in parallel

The moment subagents pay for themselves is when you have several independent problems that do not depend on each other. Three failing tests with three different root causes. A security pass, a performance pass, and a type-checking pass over different parts of the codebase. None of these needs to wait for the others, so you dispatch them at once and let them run in parallel. What would have been a long sequential slog becomes a single round, and as a bonus, uncorrelated context windows tend to catch bugs that one tired linear thread walks right past. The test is simple: if two tasks need nothing from each other, they can run together.

Let reviewers report, not rewrite

A pattern I hold to firmly: agents whose job is to review or analyze should hand back findings, not silently rewrite the code. A reviewer that edits in place is making decisions you never got to see, and when several of them run at once you get a tangle of conflicting changes nobody can untangle. Keep the agents as the eyes and the main thread, with you behind it, as the hands that integrate. You read the findings, you decide what to act on, you keep one coherent view of the change. The separation between "who finds problems" and "who fixes them" is what keeps a parallel run from turning into a mess.

Tier the models so a fleet stays cheap

Running a team does not mean running ten copies of the most expensive model. Most of the jobs you fan out, searching, scanning, mechanical checks, run perfectly well on a small fast model, and only the genuinely hard reasoning needs a strong one. A sensible fleet is mostly cheap workers with a few heavyweight thinkers where the problem earns it. Pin the model per job rather than letting everything inherit the biggest one, and a parallel run that sounds extravagant turns out to cost less than you would guess, because the bulk of it is cheap.

Parallel sessions, for when one is not enough

Subagents share your one session. When you want genuinely separate streams of work running at the same time, the next step up is parallel sessions, each on its own isolated copy of the repository so their changes never collide. It is the same principle scaled out: independent work in independent contexts, recombined when each is done. You would not do this for a quick fix, but for several real features in flight at once it turns a queue into a set of lanes.

The honest limits

Fanning out is not free and not always right. Every agent costs tokens, coordination has overhead, and agents editing the same files in parallel will step on each other if you let them. For a small, linear task, a single focused session is faster and cheaper than the ceremony of dispatching a team. The skill is recognizing which shape of work is in front of you: genuinely parallel problems reward a team, while one tightly coupled thread of changes wants a single pair of hands. Reach for the fleet when the work is actually wide, not by reflex.

The bottom line

Subagents turn Claude Code from a soloist into a team you direct. The value is isolated context as much as raw parallelism: scoped jobs that keep your main thread clean, independent problems solved at once, reviewers that report so you stay the integrator, and model tiering that keeps the whole thing affordable. Use it when the work is wide and skip it when it is not. It sits on top of the base setup and its guardrails and pairs with skills the agents can run and a shared memory they draw on.

If you want AI agents wired into your business to do real work in parallel rather than one slow task at a time, send a briefand I'll scope it with you.

More questions?

Easier on a call than in a blog post.

Replying within 4 hours during working hours