If you search for "build an AI agent", most tutorials give you a framework and a short script. It works quickly, but your understanding stays shallow.
That becomes a problem when something breaks, cost spikes, or behavior gets weird.
This guide takes the opposite approach.
You will build a Claude-powered agent from first principles. You will understand how it thinks, how tools are called, how memory works, and how to keep it reliable in production.
No framework lock-in. No hidden abstractions. No guessing.
By the end of this post, you will be able to:
- Explain what an agent is in one sentence.
- Build a working agent loop in Python.
- Implement Claude tool calls correctly.
- Prevent runaway loops and expensive failures.
- Design a production-ready knowledge-base compiler agent.
Who This Is For
This is for developers who:
- know basic Python,
- can run scripts from terminal,
- and want to understand agents deeply before using frameworks.
You do not need advanced ML knowledge.
Prerequisites
| Requirement | Notes |
|---|---|
| Python 3.10+ | 3.11+ recommended |
| Anthropic API key | from console.anthropic.com |
| Basic terminal usage | create venv, install package, run script |
| Small budget | around $5 is enough to learn comfortably |
Setup:
```bash
mkdir claude-agent
cd claude-agent
python -m venv .venv
source .venv/bin/activate    # macOS/Linux
# .venv\Scripts\activate     # Windows PowerShell
pip install anthropic
```
Set API key:
```bash
export ANTHROPIC_API_KEY="sk-ant-..."       # macOS/Linux
# $env:ANTHROPIC_API_KEY="sk-ant-..."       # PowerShell
```
Part 1: What an Agent Actually Is
A normal LLM call is one input and one output.
You -> ask question -> Claude -> answer
An agent is Claude inside a loop:
1. Task arrives.
2. Claude decides the next action.
3. Your code executes that action.
4. Claude observes the result.
5. Claude decides the next action.
6. Repeat until done.
That is the full idea.
Think of it like a REPL (read-eval-print loop):
- Read
- Evaluate
- Print
- Loop

The agent cycle is similar:
- Read context
- Decide
- Act with a tool
- Observe the result
- Loop
Every agent has 3 parts:
- Brain: the model that decides what to do next.
- Tools: actions it can request.
- Loop: your orchestration code.
An AI agent is an LLM that can repeatedly decide and act through tools until a stopping condition is met.
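The cycle can be sketched without any API call at all. This toy loop uses a scripted `decide()` function in place of Claude so the mechanics stand alone (`decide`, `lookup`, and `run` are illustrative names, not part of any SDK):

```python
def decide(history):
    # Scripted "brain": use a tool first, then finish.
    # In a real agent, this is the Claude API call.
    if not any(h["role"] == "tool" for h in history):
        return {"action": "use_tool", "tool": "lookup", "input": "capital of France"}
    return {"action": "done", "answer": "Paris"}

def lookup(query):
    # Toy tool: a hard-coded fact table
    return {"capital of France": "Paris"}.get(query, "unknown")

def run(task, max_steps=5):
    history = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        step = decide(history)
        if step["action"] == "done":
            return step["answer"]
        result = lookup(step["input"])
        history.append({"role": "tool", "content": result})
    return "step limit reached"

print(run("What is the capital of France?"))  # Paris
```

Everything that follows in this guide is this loop, with `decide()` replaced by a real model call.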
Part 2: Your First Claude API Call
Start with a minimal call.
```python
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "What is the capital of Pakistan?"}
    ],
)

print(response.content[0].text)
```
Now inspect the response object shape.
Important fields:
- `content`: a list of blocks, not a plain string.
- `stop_reason`: tells you whether Claude is done or needs tools.
- `usage`: token usage for cost tracking.
Think of a "block" as one unit in the reply. One block might be normal text. Another block might be a tool request.
Common beginner mistake:
- Treating `content` as always one text string.
Reality:
- It can contain multiple blocks,
- and in agent mode it often contains `tool_use` blocks.
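Because `content` is a list, extracting the reply text safely means filtering for text blocks. A minimal sketch, using plain dicts to stand in for the SDK's block objects (the real blocks expose `.type` and `.text` attributes instead of keys):

```python
def extract_text(blocks):
    # Keep only text blocks; tool_use blocks carry no text
    return "".join(b["text"] for b in blocks if b["type"] == "text")

blocks = [
    {"type": "text", "text": "Reading the file now."},
    {"type": "tool_use", "name": "read_file", "input": {"path": "notes.md"}},
]
print(extract_text(blocks))  # Reading the file now.
```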
Part 3: Stop Reasons and Control Flow
Your loop should branch based on stop_reason.
| stop_reason | Meaning | What your code should do |
|---|---|---|
| `end_turn` | Claude is done for now | return or continue outer workflow |
| `tool_use` | Claude wants one or more tools | execute tool calls, append results, call Claude again |
| other/unexpected | edge case | log and fail safely |
This single field is your traffic signal.
If your agent feels unpredictable, inspect stop_reason first.
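The branching in the table above fits in a few lines. A sketch (`next_action` is an illustrative helper, not an SDK function):

```python
def next_action(stop_reason: str) -> str:
    # Translate the API's stop_reason into a decision for your loop
    if stop_reason == "end_turn":
        return "finish"
    if stop_reason == "tool_use":
        return "run_tools"
    # Anything else is an edge case: fail loudly instead of looping blindly
    raise RuntimeError(f"Unhandled stop_reason: {stop_reason}")
```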
Part 4: Statelessness and Memory
Claude does not persist memory across API calls.
It only sees what you send now.
That means your code owns memory.
Example:
```python
messages = [
    {"role": "user", "content": "My name is Ahmad. Remember it."},
    {"role": "assistant", "content": "Got it. Your name is Ahmad."},
    {"role": "user", "content": "What is my name?"},
]

response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=200,
    messages=messages,
)
```
Why this works:
- You included prior turns.
- Claude inferred continuity from provided history.
Mental model to keep forever:
- Model does reasoning.
- Your code manages state.
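Since your code owns state, it helps to wrap the history in a small holder that also bounds how much you resend. A hypothetical sketch (`Conversation` is not an SDK class):

```python
class Conversation:
    """Holds the message history your code must resend on every API call."""

    def __init__(self, max_messages: int = 40):
        self.max_messages = max_messages
        self.messages = []

    def add(self, role: str, content) -> None:
        self.messages.append({"role": role, "content": content})

    def window(self) -> list:
        # Send only the most recent messages to bound token cost
        return self.messages[-self.max_messages:]
```

You would pass `conversation.window()` as the `messages` argument instead of the raw list.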
Part 5: Giving Claude Tools
Tools are JSON descriptions.
You define:
- name,
- description,
- input schema.
Claude reads this contract and may emit a tool request.
In simple terms, the schema is a form definition. It tells Claude what fields are allowed and required before a tool can run.
```python
tools = [
    {
        "name": "read_file",
        "description": "Read a UTF-8 text file and return full contents.",
        "input_schema": {
            "type": "object",
            "properties": {
                "path": {
                    "type": "string",
                    "description": "Absolute or workspace-relative path"
                }
            },
            "required": ["path"]
        }
    }
]
```
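To see what the schema's `required` list enforces, here is a hand-rolled validation sketch (the API performs its own validation; `check_input` is purely illustrative):

```python
def check_input(schema: dict, payload: dict) -> list:
    # Return the names of required fields missing from the payload
    return [key for key in schema.get("required", []) if key not in payload]

schema = {
    "type": "object",
    "properties": {"path": {"type": "string"}},
    "required": ["path"],
}

print(check_input(schema, {}))                       # ['path']
print(check_input(schema, {"path": "raw/notes.md"}))  # []
```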
When Claude decides to use a tool, response content contains tool_use block(s).
Your responsibilities are strict:
- Append Claude tool request message to history.
- Execute tool(s) in your runtime.
- Append tool result message(s) with matching `tool_use_id`.
If you miss step 1 or 3, the loop loses context.
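The pairing rule can be sketched with plain dicts standing in for `tool_use` blocks (`build_tool_results` is an illustrative helper, not an SDK function):

```python
def build_tool_results(blocks, registry):
    # Produce exactly one tool_result per tool_use, matched by tool_use_id
    results = []
    for block in blocks:
        if block["type"] != "tool_use":
            continue
        output = registry[block["name"]](block["input"])
        results.append(
            {
                "type": "tool_result",
                "tool_use_id": block["id"],
                "content": output,
            }
        )
    return results
```

The `tool_use_id` is how Claude matches each result back to the request it made; mismatched or missing ids are a classic source of silent confusion.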
Part 6: The Minimal Agent Loop
This is the core engine.
```python
def run_agent(task: str, max_steps: int = 10):
    messages = [{"role": "user", "content": task}]
    for step in range(1, max_steps + 1):
        response = client.messages.create(
            model="claude-sonnet-4-5",
            max_tokens=4096,
            tools=tools,
            messages=messages,
        )
        if response.stop_reason == "end_turn":
            # content is a list of blocks; join the text blocks
            return "".join(b.text for b in response.content if b.type == "text")
        if response.stop_reason == "tool_use":
            messages.append({"role": "assistant", "content": response.content})
            tool_results = []
            for block in response.content:
                if block.type != "tool_use":
                    continue
                output = execute_tool(block.name, block.input)
                tool_results.append(
                    {
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": output,
                    }
                )
            messages.append({"role": "user", "content": tool_results})
            continue
        raise RuntimeError(f"Unhandled stop_reason: {response.stop_reason}")
    return f"Stopped after {max_steps} steps"
```
Always cap steps with `max_steps`.
Without a hard limit, a failing tool can trigger repeated retries and uncontrolled token spend.
Suggested `max_steps` defaults:
| Task Type | Suggested max_steps |
|---|---|
| answer from one file | 3 to 6 |
| summarize multiple files | 10 to 30 |
| full compile workflow | 30 to 100 |
Part 7: Production Hardening
A working loop is not a production loop.
You need resilience.
For beginners, resilience means this: your agent should keep behaving safely when the network is slow, a file is missing, or a tool returns bad output.
1) Structured logging
```python
import logging

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s | %(levelname)s | %(message)s",
)
logger = logging.getLogger(__name__)
```
Log these at minimum:
- step number,
- stop reason,
- tool name,
- token usage,
- exceptions.
2) Retry with exponential backoff
Retry only transient failures.
Transient means temporary. Examples: rate limits, short network interruptions, or brief server overload. Do not retry permanent problems such as bad input schema or unknown tool names.
```python
import time

import anthropic

MAX_RETRIES = 4
BASE_WAIT_SECONDS = 1

def call_claude_with_retry(client, **kwargs):
    last_error = None
    for attempt in range(1, MAX_RETRIES + 1):
        try:
            return client.messages.create(**kwargs)
        except anthropic.RateLimitError as err:
            last_error = err
        except anthropic.APIStatusError as err:
            # Retry only transient server-side statuses
            if err.status_code not in {429, 500, 502, 503, 529}:
                raise
            last_error = err
        except anthropic.APIConnectionError as err:
            last_error = err
        if attempt == MAX_RETRIES:
            break  # no point sleeping before the final raise
        wait = min(BASE_WAIT_SECONDS * (2 ** (attempt - 1)), 30)
        logger.warning("Retrying Claude call in %ss (attempt %s)", wait, attempt)
        time.sleep(wait)
    raise last_error
```
3) Safe tool execution
Never let tool exceptions crash full orchestration.
```python
def execute_tool_safely(tool_name, tool_input, registry):
    try:
        return registry[tool_name](tool_input)
    except FileNotFoundError as err:
        return f"ToolError: FileNotFoundError: {err}"
    except Exception as err:
        return f"ToolError: {type(err).__name__}: {err}"
```
4) Token accounting
Track token usage every step and aggregate per run.
```python
run_input_tokens = 0
run_output_tokens = 0

# After each response:
run_input_tokens += response.usage.input_tokens
run_output_tokens += response.usage.output_tokens
```
This gives you real cost visibility.
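Aggregated token counts convert directly into dollars. A sketch with prices passed in as parameters, since pricing changes; the numbers in the example below are placeholders, not actual Anthropic pricing:

```python
def estimate_cost_usd(input_tokens: int, output_tokens: int,
                      usd_per_m_input: float, usd_per_m_output: float) -> float:
    # API pricing is quoted per million tokens
    return (input_tokens / 1_000_000) * usd_per_m_input \
         + (output_tokens / 1_000_000) * usd_per_m_output

# Placeholder prices for illustration only
print(round(estimate_cost_usd(50_000, 10_000, 3.0, 15.0), 2))  # 0.3
```

Log this per run and you can answer "what did that agent run cost?" without spreadsheet archaeology.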
Part 8: Knowledge-Base Agent Design
Now apply the pattern to a real project.
Imagine a folder full of lecture notes. This agent reads each note, writes clean summaries, groups related concepts, and creates one index so a student can navigate quickly.
Target behavior:
- Read source files from `raw/`.
- Write one summary file per source into `wiki/summaries/`.
- Create concept pages in `wiki/concepts/`.
- Maintain a root index file, `wiki/_index.md`.
Tools:
- `list_files(directory)`
- `read_file(path)`
- `write_file(path, content)`
System prompt skeleton:
```python
SYSTEM_PROMPT = """
You are a knowledge base compiler.

Workflow:
1) Discover all files in raw/
2) Summarize each file into wiki/summaries/
3) Consolidate repeated ideas into wiki/concepts/
4) Build wiki/_index.md

Rules:
- Do not skip source files
- Use clear markdown headings
- Include source references
- Report what was generated at the end
"""
```
Beginner tip:
- Keep system prompt procedural.
- Use numbered workflow steps.
- Add explicit "do not skip" rules.
Part 9: Speed and Cost Optimization
Parallel processing for independent units
For many files, use a worker pool.
```python
from concurrent.futures import ThreadPoolExecutor

with ThreadPoolExecutor(max_workers=5) as pool:
    futures = [pool.submit(process_file, path) for path in files]
    results = [f.result() for f in futures]  # .result() re-raises worker exceptions
```
Use parallelism only when tasks are independent. If one task depends on another task's output, keep it sequential.
Prompt caching
If system prompt stays stable, prompt caching can reduce repeated prompt cost significantly.
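The Anthropic API exposes prompt caching through a `cache_control` marker on system blocks. A sketch of the request shape, reusing `SYSTEM_PROMPT`, `tools`, and `messages` from earlier (this requires an API key to run; verify the details against the current docs, as caching behavior and eligibility evolve):

```python
response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=4096,
    system=[
        {
            "type": "text",
            "text": SYSTEM_PROMPT,
            # Marks this block as cacheable across subsequent calls
            "cache_control": {"type": "ephemeral"},
        }
    ],
    tools=tools,
    messages=messages,
)
```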
Incremental rebuilds
Do not reprocess unchanged files.
Use timestamp plus hash strategy.
```python
import hashlib
import os

def file_hash(path: str) -> str:
    with open(path, "rb") as f:
        return hashlib.md5(f.read()).hexdigest()

def has_changed(path: str, state: dict) -> bool:
    existing = state.get(path)
    if not existing:
        return True
    current_mtime = os.path.getmtime(path)
    if current_mtime == existing["timestamp"]:
        return False
    current_hash = file_hash(path)
    if current_hash == existing["hash"]:
        # Content unchanged; refresh the stored mtime
        existing["timestamp"] = current_mtime
        return False
    return True
```
This gives speed without losing correctness.
Part 10: Common Beginner Mistakes
These cause most early failures:
- Forgetting to append Claude's tool request to history.
- Returning tool output without the correct `tool_use_id`.
- Running without `max_steps`.
- Treating all API errors as retryable.
- Ignoring token usage in logs.
- Writing vague tool descriptions.
- Packing too many unrelated rules in system prompt.
If your agent behaves strangely, check in this order:
- `stop_reason`
- message history completeness
- tool request and tool result pairing
- step limit behavior
- token and retry logs
Final Blueprint
If you want one compact architecture to keep, use this:
- Clear system prompt.
- Minimal tools with strict schemas.
- Stateful message history in code.
- Loop driven by `stop_reason`.
- `max_steps` hard safety cap.
- Retries for transient failures only.
- Safe tool error handling.
- Token accounting.
- Incremental processing.
Conclusion
Most developers first learn agent frameworks. A better long-term path is learning the loop itself.
Once you understand the loop, frameworks become optional tooling. You can adopt them strategically instead of depending on them blindly.
You now have a complete beginner-to-production mental model:
- how decisions happen,
- how actions are executed,
- how memory is maintained,
- and how reliability is enforced.
That is the foundation for building agents you can trust in real systems.