How-To2026-06-218 min read

How we set up Claude Code to work like a careful engineer

Install Claude Code, open it in a fresh project, and what you get is a fast, eager junior engineer with no memory of your standards and no judgment about your codebase. It will happily refactor three files you did not ask it to touch, declare a half-finished job done, and forget every correction the moment the session ends. The model is not the problem. The setup is. The gap between an impressive demo and a tool I trust with real work is almost entirely configuration, and it is configuration you control. Here is the setup layer I run on every project.

A constitution that loads every session

Claude Code reads a file called CLAUDE.md at the start of every session, and that file is your one chance to set the rules before any work happens. I treat it as a constitution, not a manual. It holds a small set of working-posture rules, the non-negotiable ones, and almost nothing else. The temptation is to pour everything in. Resist it. This file is paid for in context tokens on every single request, so every line has to earn its place.

The posture rules that matter most

Read before you write. Before changing code, read the thing you are about to touch and its immediate callers. If you cannot explain why the existing code is shaped the way it is, you are not ready to change it.
Make surgical changes. Touch only what the task requires. No improving, reformatting, or refactoring the code next door "while you are in there." That habit is where quiet breakage comes from.
Fail loud. Before saying done, list every step that was skipped, mocked, or left unverified. If that list is not empty, the work is not done. "Tests pass" is a lie if any test was skipped.
Think before coding. State the assumptions out loud first. When two readings of the request are both plausible, ask instead of guessing. A wrong guess that compiles is more expensive than a question.
Verify before you claim it is done. Run the build, the tests, the lint. Read the actual output, including the exit code. Only then say it works. No "should work" or "probably passes."

None of these is clever. That is the point. They are the habits a good engineer already has and a fresh language model does not, and writing them down once is what makes the model behave like the former instead of the latter.

Separate posture from procedure

A constitution is not a procedures manual, and trying to make it both is how these files balloon to thousands of lines that quietly tax every session. I keep the always-loaded file to posture and safety, the rules that apply to everything, and push the detailed procedures, testing standards, git workflow, security checklist, debugging method, into separate files that load only when that kind of work comes up. The always-on file stays under a hard size cap. A rule about database migrations does not need to be sitting in context while I am writing CSS.

Match the model to the task

Claude comes in tiers, and not every task deserves the most powerful, most expensive one. I reach for the strongest model on architecture and gnarly multi-file reasoning, a mid tier for mainline coding where a cheaper model is plenty capable, and a small fast model for mechanical work like searching, scanning, and bulk edits. A flat "always use the biggest model" policy is the easiest way to burn budget on work that did not need it, and it makes simple tasks slower for no benefit. The skill is deciding which tier a task actually requires, and most of them require less than you would reflexively reach for.

Permissions, with rails

Left fully manual, the assistant stops to ask before every command and the friction kills the flow. Left fully open, it can run something destructive while you are looking away. The middle path is the right one: pre-approve a safe set of commands so it does not interrupt you to install a package or run the tests, block the genuinely dangerous operations outright, and never grant blanket auto-accept on exploratory work where you cannot predict what it will try. Relaxing the rails should be a deliberate, narrow choice for a trusted plan, never the default.

Give it a way to check its own work

The single highest-leverage thing in the whole setup is a verification loop. Tests, a build, a lint, or a hook that runs the suite automatically when a session ends. Without one, the model is guessing whether it succeeded and you are the only error detector in the loop. With one, it can catch its own mistakes before you ever see them, and it does. Most of the "the AI broke my code" stories I hear are really "the AI had no way to find out it broke my code." Close that gap and the failure rate drops sharply, because the assistant stops being optimistic and starts being evidence-driven.

The honest limits

This is not a recipe for hands-off perfection. Rules reduce variance, they do not remove it. The model still drifts on a long task, still occasionally calls something done when it is not, and still needs a human reading the diff before anything ships. What the setup buys is consistency: the work comes out close enough to your standard that review is fast instead of forensic. It turns a tool that is unpredictable into one that is merely imperfect, which is the difference that makes it usable on real projects.

The bottom line

A capable AI coding assistant is not the model alone, it is the model plus the harness you build around it: a lean constitution loaded every session, posture rules kept separate from detailed procedures, the right model tier for each task, permission rails that are neither too tight nor too loose, and above all a way for it to verify its own work. Get those right and Claude Code stops behaving like an eager intern and starts behaving like an engineer who already knows your standards. The next pieces of the same system are skills, which package the work you repeat, subagents, which let it work as a team, and a memory that survives every session.

If you want an AI assistant or agent set up around your own business, with the rules, guardrails, and verification that make it trustworthy rather than just impressive, send a briefand I'll scope it with you.

How we set up Claude Code to work like a careful engineer

A constitution that loads every session

The posture rules that matter most

Separate posture from procedure

Match the model to the task

Permissions, with rails

Give it a way to check its own work

The honest limits

The bottom line

More from the notes.

Running Claude Code as a team: subagents and parallel work

Claude Code skills: how we package repeatable work, and the ones worth building first

How we give Claude Code a memory that survives every session

More questions?