Plan mode is becoming the most overused button in agentic engineering.
People reach for it before every feature, every refactor, every test, every CSS tweak, and every tiny change that could have been described in one sentence. The result is predictable: the agent spends time planning a task that would have taken less time to simply complete.
That does not mean plan mode is bad. Plan mode is excellent when the work is unclear, risky, or spread across many files. The problem is using it as a substitute for written standards.
A plan is a temporary answer to this question:
What should we do for this task?
A skill is a durable answer to a better question:
How should this kind of work be done in this codebase?
Those are different problems.
Use plan mode when the task needs thinking before editing.
Use AGENTS.md or CLAUDE.md for rules that should apply to every task.
Use SKILL.md for repeatable workflows, feature-area knowledge, and domain rules that should load only when relevant.
If you are still learning how agents work under the hood, start with my earlier guide: Build a Production-Ready AI Agent in Python. This post assumes you are already using coding agents and want better habits.
The Real Problem Is Not Planning
The real problem is that teams are using conversation as storage.
They paste the same design rules into every frontend task. They explain the same auth constraints before every protected route. They remind the agent about the same migration pattern, the same logging format, the same testing rules, and the same deployment checklist.
That feels productive because the agent obeys in the moment.
But it is fragile.
The instruction lives only in the current conversation. It disappears when the task ends, gets buried after compaction, or gets skipped when someone else opens a new session. The next task starts with the same ritual again.
This is not a planning problem. It is an infrastructure problem.
Your standards should not depend on memory, mood, or a perfectly written prompt. If a rule matters more than once, put it in a file.
What Plan Mode Is Actually For
Plan mode is a safety and discovery tool. Use it when you do not yet know the shape of the change.
Good uses of plan mode:
| Situation | Why planning helps |
|---|---|
| A schema change | The blast radius may include migrations, services, tests, and rollback paths. |
| A security-sensitive feature | The agent should inspect auth, permissions, secrets, and failure modes before editing. |
| A large refactor | The order of changes matters, and partial edits can break many files. |
| An unfamiliar codebase | Exploration prevents the agent from inventing patterns that do not exist. |
| Ambiguous product behavior | The agent should ask clarifying questions before committing to an implementation. |
Bad uses of plan mode:
| Situation | Better move |
|---|---|
| Fix a typo | Just fix it. |
| Rename one variable | Just do it and run the narrow check. |
| Add one log line | Just implement it. |
| Apply an existing button style | Use the design skill or design rules. |
| Add another endpoint that follows an existing pattern | Use the relevant API skill. |
The practical test is simple:
If you can describe the diff in one sentence, skip the plan.
If you need the agent to explore before you can describe the diff, use plan mode.
The Right Place For Reusable Knowledge
Modern coding agents have several layers of instruction, and the layers are not interchangeable.
| Layer | Best for | Loaded when |
|---|---|---|
| Prompt | This specific task | Only now |
| Plan mode | Unclear or risky implementation strategy | Only this planning moment |
| `AGENTS.md` | Always-true Codex project guidance | At session start |
| `CLAUDE.md` | Always-true Claude Code project guidance | At session start |
| `SKILL.md` | Repeatable workflows and feature-area knowledge | On demand, when relevant |
| Hooks | Actions that must happen every time | Deterministically at configured events |
| Automations | Stable recurring jobs | On a schedule or trigger |
This is the split most teams should aim for:
- Put universal rules in `AGENTS.md` for Codex.
- Put universal rules in `CLAUDE.md` for Claude Code.
- Put feature-class rules in skills.
- Put mandatory checks in hooks.
- Put repeated schedules in automations.
- Use plan mode only when the next move is not obvious.
That gives you a system instead of a pile of sticky notes.
AGENTS.md And CLAUDE.md Are For Always-True Rules
Codex reads AGENTS.md files before doing work. OpenAI describes this as durable repository guidance that sets consistent expectations across tasks.
Claude Code uses CLAUDE.md for the same broad category of persistent instructions. Claude Code does not directly treat AGENTS.md as its primary memory file, but its docs recommend creating a CLAUDE.md that imports AGENTS.md if you want both tools to share the same base rules.
A useful root file is short and boring:
# Repository Expectations
## Commands
- Run `pnpm test` after changing application logic.
- Run `pnpm lint` before opening a pull request.
- Use `pnpm`, not `npm`, in this repository.
## Engineering Rules
- Do not add production dependencies without approval.
- Keep public APIs backward compatible unless the task explicitly asks for a breaking change.
- Add tests for bug fixes that affect runtime behavior.
## Done Means
- The change is implemented.
- Relevant tests or checks have run.
- Any skipped check is named in the final response.
For Claude Code, if your team already maintains AGENTS.md, use a small CLAUDE.md bridge:
@AGENTS.md
## Claude Code
- Use plan mode for changes under `src/billing/` and `src/auth/`.
- Prefer asking one clarifying question before editing security-sensitive flows.
That keeps shared instructions in one place while allowing tool-specific nuance.
What should not go into these files?
- Long tutorials.
- Full API documentation.
- Every table name in the database.
- Feature-specific business logic.
- A 40-step release procedure that matters once a month.
Those belong somewhere else.
Usually, that somewhere else is a skill.
What A Skill Actually Is
A skill is a folder with a SKILL.md file and optional supporting files.
The core shape is intentionally simple:
my-skill/
  SKILL.md
  scripts/
  references/
  assets/
The SKILL.md file contains frontmatter and instructions:
---
name: payments
description: Use when a task touches Stripe checkout, billing records, refunds, or payment webhooks.
---
## Rules
- Match purchases to users by stable `user_id`, never by email.
- Every webhook handler must be idempotent because providers retry events.
- Refunds go through the refunds service, not the payment SDK directly.
## Files To Read First
- `billing/webhooks.py`
- `billing/refunds.py`
- `billing/models.py`
That is enough to change behavior.
The important part is the description. The agent sees descriptions before it loads full skill bodies. If the description is vague, the skill may not trigger. If the description is specific, the agent knows when to pull it in.
Bad description:
description: Helps with backend stuff.
Better description:
description: Use when changing authentication, protected routes, session cookies, refresh tokens, or login/logout behavior.
A good description names the task class, the trigger words, and the boundary.
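One way to keep descriptions honest is a small check that runs in review. The sketch below is a rough heuristic, not part of Claude Code or Codex: the function names, the word-count threshold, and the list of filler words are all assumptions chosen for illustration, and the frontmatter parser handles only simple `key: value` lines.

```python
import re

# Hypothetical filler words that usually signal a vague description.
VAGUE_WORDS = {"stuff", "things", "misc", "various", "general"}


def parse_frontmatter(text: str) -> dict[str, str]:
    """Parse simple `key: value` lines between the first two `---` markers."""
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def description_problems(description: str) -> list[str]:
    """Return heuristic warnings for a skill description (empty list = looks fine)."""
    problems = []
    words = re.findall(r"[a-z0-9/-]+", description.lower())
    if len(words) < 8:  # assumed threshold: too few words to name real triggers
        problems.append("too short to name concrete triggers")
    if VAGUE_WORDS & set(words):
        problems.append("contains vague filler words")
    if not description.lower().startswith("use when"):
        problems.append("does not state when to use the skill")
    return problems
```

Run against the two descriptions above, the first one trips all three checks and the second one passes. Passing the check does not prove a description is good; it only catches the obvious failures.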
Why Skills Beat Repeated Planning
Plan mode asks the agent to reason from the current prompt and current context.
Skills give the agent reusable context before it reasons.
That difference matters.
Imagine you ask for a new protected endpoint. Without a skill, the agent has to discover or infer:
- how sessions are issued,
- where auth middleware lives,
- which routes are public,
- how permissions are checked,
- how errors are shaped,
- how tests are written,
- and which existing endpoint is the best pattern.
Plan mode can help it investigate those things, but the investigation repeats every time.
With an auth skill, the agent starts closer to the truth:
---
name: auth
description: Use when a task touches login, logout, sessions, protected routes, auth middleware, or permission checks.
---
## Rules
- Public routes must be listed in `auth/public-routes.ts`.
- Protected API handlers must call `requireSession(request)` before reading user-owned data.
- Never trust a user id from request body when a session user id is available.
- Session cookies are HTTP-only, secure in production, and same-site lax.
## Endpoint Pattern
- Read `api/projects/[id]/route.ts` before adding a new protected project route.
- Use `AuthError` for unauthenticated requests.
- Use `ForbiddenError` for authenticated users without access.
## Tests
- Include one unauthenticated test.
- Include one forbidden-user test.
- Include one allowed-user test.
Now the agent does not need to rediscover your standards. It can spend its reasoning budget on the actual change.
That is the whole point.
The Skill Loading Model
Both Claude Code and Codex use the same broad idea: skills stay cheap until they are needed.
At startup, the agent sees lightweight metadata such as name and description. The full SKILL.md body loads only when the task matches the skill or when you invoke it directly. Supporting files can stay out of context until they are referenced.
This is why skills are better than dumping everything into one giant rule file.
A huge always-loaded instruction file punishes every task. A targeted skill only costs context when it helps.
That changes how you should write.
A skill should be short, specific, and operational. State what to do. Avoid long philosophy. Once loaded, its body may stay in context for the rest of the task, so every line should earn its place.
A good skill says:
- when to use it,
- what files to inspect first,
- what rules must be followed,
- what commands verify the work,
- what mistakes to avoid.
It does not explain the history of the whole system unless that history changes the next decision.
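The loading model in this section can be sketched in a few lines. This is a toy illustration, not how Claude Code or Codex implement it: the `Skill` class, `trigger_words`, and `build_context` are hypothetical names, and the keyword overlap is a crude stand-in for the agent's own judgment about relevance.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Skill:
    name: str
    description: str
    load_body: Callable[[], str]  # e.g. reads SKILL.md from disk; called lazily
    _body: Optional[str] = None

    def body(self) -> str:
        # Load the full instructions once, the first time they are needed.
        if self._body is None:
            self._body = self.load_body()
        return self._body


def trigger_words(text: str) -> set[str]:
    # Assumed relevance signal: lowercase words of five or more letters.
    return {w for w in re.findall(r"[a-z]+", text.lower()) if len(w) >= 5}


def build_context(task: str, skills: list[Skill]) -> str:
    # Metadata for every skill is always present and cheap.
    parts = [f"{s.name}: {s.description}" for s in skills]
    # Full bodies are appended only for skills whose description matches the task.
    for s in skills:
        if trigger_words(s.description) & trigger_words(task):
            parts.append(s.body())
    return "\n".join(parts)
```

With an auth skill and a payments skill registered, a task that mentions a protected endpoint pulls in the auth body and leaves the payments body unread. The cost of an unused skill is one line of metadata.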
Where Skills Live
The exact folders differ by tool.
For Claude Code:
~/.claude/skills/<skill-name>/SKILL.md
.claude/skills/<skill-name>/SKILL.md
For Codex:
$HOME/.agents/skills/<skill-name>/SKILL.md
.agents/skills/<skill-name>/SKILL.md
Codex also scans .agents/skills up the repository tree from the current working directory, which is useful in monorepos. Claude Code supports personal, project, plugin, and nested project skill locations as well.
The practical rule is this:
| Scope | Put it here |
|---|---|
| Personal habit | User-level skills folder |
| Team convention | Repo-level skills folder |
| Package-specific workflow | Nested skills folder near that package |
| Broad reusable distribution | Plugin or shared package |
For a team, repo-level skills are usually the first serious step. They make skills reviewable in pull requests, versioned with the codebase, and available to every engineer who pulls the repo.
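To see which repo-level skills apply from a given directory in a monorepo, a short script can walk up the tree the way this section describes. This is a sketch that imitates the described behavior, not Codex's actual lookup code: `find_skill_files` is a hypothetical helper, and the nearest-first ordering is a choice made here for readability.

```python
from pathlib import Path


def find_skill_files(start: Path, repo_root: Path,
                     skills_dir: str = ".agents/skills") -> list[Path]:
    """Collect <skills_dir>/<name>/SKILL.md files from `start` up to `repo_root`.

    Results are ordered nearest directory first. If `start` is not inside
    `repo_root`, the walk stops at the filesystem root instead.
    """
    current = start.resolve()
    root = repo_root.resolve()
    found: list[Path] = []
    while True:
        candidate = current / skills_dir
        if candidate.is_dir():
            found.extend(sorted(candidate.glob("*/SKILL.md")))
        if current == root or current == current.parent:
            break
        current = current.parent
    return found
```

Calling it from a package's `src/` folder lists that package's nested skills first, then the repo-wide ones. Pass `skills_dir=".claude/skills"` to inspect the Claude Code project folder instead.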
Skills Worth Writing First
Do not start by writing 30 skills. Start with the five places where the agent repeatedly needs context.
auth/SKILL.md
Use this when the task touches sessions, permissions, tokens, public routes, or protected endpoints.
Include:
- session source of truth,
- public route list,
- authorization checks,
- cookie rules,
- common tests,
- example files to read first.
payments/SKILL.md
Use this when the task touches checkout, billing records, invoices, refunds, subscriptions, or webhooks.
Include:
- stable identifiers,
- webhook idempotency,
- retry behavior,
- refund flow,
- ledger rules,
- audit logging requirements.
migrations/SKILL.md
Use this when the task touches database schema, indexes, constraints, or backfills.
Include:
- naming convention,
- zero-downtime pattern,
- rollback expectations,
- backfill strategy,
- lock avoidance rules,
- migration test command.
tests/SKILL.md
Use this when the task asks for tests, changes test helpers, or modifies behavior that needs coverage.
Include:
- unit versus integration boundary,
- what gets mocked,
- what uses a real database,
- naming convention,
- fixture pattern,
- commands for narrow and full checks.
observability/SKILL.md
Use this when the task touches logs, metrics, tracing, alerts, or incidents.
Include:
- logger format,
- span naming,
- metric naming,
- required fields on error logs,
- what not to log,
- incident links or dashboards.
These are not glamorous. Good infrastructure rarely is. But they remove dozens of small corrections.
Same Logic For Design
Frontend work suffers from the same problem.
Teams paste the same instructions again and again:
- use our spacing scale,
- do not use default fonts,
- avoid generic cards,
- use this button shape,
- use this motion style,
- match the existing page rhythm,
- do not create yet another one-off component.
That belongs in a design file or design skill.
For small projects, a DESIGN.md can work. For larger teams, use a frontend design skill.
---
name: frontend-design
description: Use when creating or changing UI, layout, styling, components, responsive behavior, or interaction states.
---
## Visual Direction
- Use the project typography tokens. Do not use default system fonts unless the design system already does.
- Prefer existing components before creating new primitives.
- Use the spacing scale in `src/styles/tokens.css`.
- Motion must communicate state change, not decoration.
## Before Editing
- Read `src/components/ui/README.md`.
- Check whether the target page already has a local pattern.
- Reuse button, input, modal, and card primitives unless there is a clear reason not to.
## Never Do
- Do not add new colors outside the token file.
- Do not create one-off shadows.
- Do not use hover movement on dense cards.
That skill can be invoked across hundreds of UI tasks. The agent does not need a new design lecture every time a button gets added.
A Practical SKILL.md Template
Use this as a starting point:
---
name: skill-name
description: Use when the task touches [specific area], [specific workflow], or [specific files]. Do not use for [boundary].
---
## Purpose
Use this skill to keep [area] changes consistent with the current system.
## Read First
- `path/to/canonical-file-1`
- `path/to/canonical-file-2`
## Rules
- Do the specific thing.
- Avoid the specific mistake.
- Reuse the specific pattern.
## Verification
- Run `command for narrow check` after changing this area.
- Run `command for broader check` before a PR if behavior changed.
## Common Mistakes
- Do not infer X from Y.
- Do not use old helper Z.
- Do not skip the edge case where [specific condition].
The template is intentionally plain. Fancy prose makes skills worse.
The agent does not need persuasion inside a skill. It needs operating instructions.
How To Decide What Becomes A Skill
Use this decision table.
| If the instruction is... | Put it in... |
|---|---|
| Relevant to every task | AGENTS.md or CLAUDE.md |
| Relevant to one feature area | SKILL.md |
| Relevant only to one prompt | The prompt |
| A required action after every edit | Hook |
| A repeatable scheduled job | Automation |
| A big uncertain implementation | Plan mode |
| A product decision needing human judgment | Ask a question first |
Here is the smell test:
If you have pasted the same paragraph into an agent three times, it probably belongs in a file.
If the paragraph applies to every task, put it in AGENTS.md or CLAUDE.md.
If it applies only sometimes, put it in a skill.
If it must happen without exception, make it a hook.
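The decision table and smell test can be written down as a single function, which makes the precedence explicit. This is only a sketch: the `Instruction` fields and the order of the checks are assumptions made for illustration, since the table itself does not rank its rows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instruction:
    applies_to_every_task: bool = False
    applies_to_one_area: bool = False
    must_always_run: bool = False
    runs_on_schedule: bool = False
    needs_exploration: bool = False
    needs_human_decision: bool = False


def where_it_belongs(i: Instruction) -> str:
    """Map an instruction to a layer; the check order is an assumed precedence."""
    if i.needs_human_decision:
        return "ask a question first"
    if i.must_always_run:
        return "hook"
    if i.runs_on_schedule:
        return "automation"
    if i.needs_exploration:
        return "plan mode"
    if i.applies_to_every_task:
        return "AGENTS.md or CLAUDE.md"
    if i.applies_to_one_area:
        return "SKILL.md"
    return "the prompt"
```

For example, a formatting step that must run after every edit lands in a hook even if it also applies to every task, because enforcement is checked before advice.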
Keep Skills Small Enough To Obey
A bad skill is just a bloated prompt with a filename.
The most common mistakes:
- The description is too vague.
- The skill tries to cover an entire department.
- The body explains background instead of giving instructions.
- The skill contains stale file paths.
- The skill lists every possible exception.
- The skill has no verification command.
- The skill duplicates rules that already live in `AGENTS.md` or `CLAUDE.md`.
A good first skill is often under 100 lines.
Long reference material can live in references/. Scripts can live in scripts/. Templates can live in assets/. The SKILL.md file should be the map, not the whole library.
For example:
payments/
  SKILL.md
  references/
    webhook-event-flow.md
    refund-policy.md
  scripts/
    replay-webhook-fixture.ts
  assets/
    webhook-test-template.ts
Then SKILL.md can say:
## Extra Context
- Read `references/webhook-event-flow.md` only when changing webhook ordering.
- Read `references/refund-policy.md` only when changing refund behavior.
- Use `scripts/replay-webhook-fixture.ts` to reproduce webhook retry cases.
That keeps the context window small: the agent reads the long material only when the task calls for it.
It also makes the skill easier to maintain.
Skills Need Ownership
The strongest objection to skills is also the fairest one:
What if they go stale?
They will, unless someone owns them.
Treat a skill like a runbook. If the system changes, the skill changes in the same pull request. If a code review catches an agent violating a rule that should have been documented, update the skill. If the agent ignores a skill twice, rewrite the description or trim the body.
A simple maintenance rule works well:
- If the code changes the workflow, update the skill.
- If the skill points to a deleted file, fix it immediately.
- If the skill produces bad output twice, rewrite it.
- If the skill has not triggered in months, either improve its description or delete it.
Skills are not sacred documents. They are working tools.
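The "deleted file" rule is the easiest one to check mechanically. Here is a minimal sketch, assuming paths in a skill are written in backticks and contain a slash, as in the examples in this post. `stale_paths` is a hypothetical helper, and the regular expression will miss paths written any other way.

```python
import re
from pathlib import Path

# Backticked tokens with no whitespace and at least one slash, e.g. `billing/models.py`.
PATH_PATTERN = re.compile(r"`([^`\s]+/[^`\s]+)`")


def stale_paths(skill_text: str, bases: list[Path]) -> list[str]:
    """Return path-like references that exist under none of the base directories.

    Pass the repo root for code paths and the skill folder for references/ files.
    """
    missing = []
    for ref in PATH_PATTERN.findall(skill_text):
        relative = ref.lstrip("/")  # treat a leading slash as repo-relative
        if not any((base / relative).exists() for base in bases):
            missing.append(ref)
    return missing
```

Commands such as `pnpm test` and identifiers such as `user_id` are skipped because they contain whitespace or no slash. A check like this could run in CI over every `SKILL.md`, so a renamed file fails the same pull request that renamed it.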
What About Security?
Skills can include scripts and instructions that guide powerful tools, so treat them as code.
For team repositories:
- Review skill changes in pull requests.
- Avoid checking secrets into skills or reference files.
- Keep destructive workflows manually invoked.
- Be careful with tool allowlists or permission shortcuts.
- Do not install random third-party skills into sensitive environments without review.
The safest early version of a skill is instruction-only. Add scripts only when they clearly improve reliability.
A good progression is:
- Write an instruction-only skill.
- Use it on real tasks.
- Watch what the agent still gets wrong.
- Add a reference file only if needed.
- Add a script only when manual steps are repetitive and safe.
Do not over-engineer the first version.
A Better Workflow For Agentic Engineering
Here is the workflow I recommend.
Step 1: Start with a normal prompt.
Add a protected endpoint for exporting project members as CSV.
Step 2: If the agent asks questions answered by your repo standards, write them down.
# AGENTS.md
- Protected endpoints must call `requireSession(request)` before reading user data.
- CSV exports must stream rows and avoid loading more than 10,000 records into memory.
Step 3: If the same area keeps needing special context, create a skill.
.agents/skills/exports/SKILL.md
.claude/skills/exports/SKILL.md
Step 4: If the task is genuinely broad, use plan mode.
We need to redesign CSV exports for all reporting modules. Explore first, identify affected files, and propose a migration plan before editing.
Step 5: If the workflow becomes predictable, automate it.
Every Friday, summarize new errors from the export service and draft follow-up issues.
This is how you move from prompting to infrastructure.
The Real Split
Plan mode is a seatbelt. Use it when the road is uncertain.
AGENTS.md and CLAUDE.md are the driving rules. They apply all the time.
Skills are the toolkits in the trunk. You do not hold every tool in your hand while driving, but you want the right one nearby when the job appears.
That is the mental model.
Do not plan every small task. Do not paste the same standards into every prompt. Do not turn your chat history into a knowledge base by accident.
Write the rule once.
Name it clearly.
Put it where the agent can find it.
Then let the agent spend less time rediscovering your workflow and more time doing the work.
Copy-Paste Checklist
Use this checklist the next time you catch yourself writing a long prompt:
- Is this instruction needed for every task? Put it in `AGENTS.md` or `CLAUDE.md`.
- Is this instruction only relevant to one domain? Put it in `SKILL.md`.
- Does this workflow have side effects? Consider manual invocation.
- Does this rule need enforcement, not advice? Use a hook.
- Does this happen on a schedule? Use automation.
- Is the task unclear or risky? Use plan mode.
- Is the change obvious and small? Skip the plan and ship it.
That one checklist can save a surprising amount of agent time.
The best agentic teams will not be the ones with the longest prompts.
They will be the ones with the best reusable context.