If you search for "build an AI agent", most tutorials give you a framework and a short script. It works quickly, but your understanding stays shallow.

That becomes a problem when something breaks, cost spikes, or behavior gets weird.

This guide takes the opposite approach.

You will build a Claude-powered agent from first principles. You will understand how it thinks, how tools are called, how memory works, and how to keep it reliable in production.

No framework lock-in. No hidden abstractions. No guessing.

What You Will Be Able to Do

By the end of this post, you will be able to:

  1. Explain what an agent is in one sentence.
  2. Build a working agent loop in Python.
  3. Implement Claude tool calls correctly.
  4. Prevent runaway loops and expensive failures.
  5. Design a production-ready knowledge-base compiler agent.

Who This Is For

This is for developers who:

  • know basic Python,
  • can run scripts from the terminal,
  • and want to understand agents deeply before using frameworks.

You do not need advanced ML knowledge.

Prerequisites

Requirement            Notes
Python 3.10+           3.11+ recommended
Anthropic API key      from console.anthropic.com
Basic terminal usage   create venv, install package, run script
Small budget           around $5 is enough to learn comfortably

Setup:

mkdir claude-agent
cd claude-agent
python -m venv .venv
source .venv/bin/activate        # macOS/Linux
# .venv\Scripts\activate        # Windows PowerShell
pip install anthropic

Set API key:

export ANTHROPIC_API_KEY="sk-ant-..."   # macOS/Linux
# $env:ANTHROPIC_API_KEY="sk-ant-..."   # PowerShell

Part 1: What an Agent Actually Is

A normal LLM call is one input and one output.

You -> ask question -> Claude -> answer

An agent is Claude inside a loop:

Task arrives
Claude decides next action
Your code executes action
Claude observes result
Claude decides next action
Repeat until done

That is the full idea.

Think of it like a REPL cycle:

  • Read
  • Evaluate
  • Print
  • Loop

The agent cycle is similar:

  • Read context
  • Decide
  • Act with tool
  • Observe result
  • Loop

Every agent has 3 parts:

  1. Brain: the model that decides what to do next.
  2. Tools: actions it can request.
  3. Loop: your orchestration code.

Part 2: Your First Claude API Call

Start with a minimal call.

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "What is the capital of Pakistan?"}
    ],
)

print(response.content[0].text)

Now inspect the response object shape.

Important fields:

  • content: list of blocks, not a plain string.
  • stop_reason: tells you whether Claude is done or needs tools.
  • usage: token usage for cost tracking.

Think of a "block" as one unit in the reply. One block might be normal text. Another block might be a tool request.

Common beginner mistake:

  • Treating content as always one text string.

Reality:

  • It can contain multiple blocks,
  • and in agent mode it often contains tool_use blocks.
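A small helper makes this concrete. This sketch (extract_text is a hypothetical name, not part of the SDK) joins only the text blocks and skips everything else:

```python
def extract_text(content_blocks) -> str:
    """Join the text of every text block, skipping tool_use and other blocks.

    Blocks are duck-typed here: anything with .type and .text attributes,
    matching the shape of the SDK's content blocks.
    """
    parts = []
    for block in content_blocks:
        if getattr(block, "type", None) == "text":
            parts.append(block.text)
    return "\n".join(parts)
```

This never assumes the first block is text, which is the exact mistake described above.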

Part 3: Stop Reasons and Control Flow

Your loop should branch based on stop_reason.

stop_reason        Meaning                          What your code should do
end_turn           Claude is done for now           return or continue outer workflow
tool_use           Claude wants one or more tools   execute tool calls, append results, call Claude again
other/unexpected   edge case                        log and fail safely

This single field is your traffic signal.

If your agent feels unpredictable, inspect stop_reason first.

Part 4: Statelessness and Memory

Claude does not persist memory across API calls.

It only sees what you send now.

That means your code owns memory.

Example:

messages = [
    {"role": "user", "content": "My name is Ahmad. Remember it."},
    {"role": "assistant", "content": "Got it. Your name is Ahmad."},
    {"role": "user", "content": "What is my name?"},
]

response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=200,
    messages=messages,
)

Why this works:

  • You included prior turns.
  • Claude inferred continuity from provided history.

Mental model to keep forever:

  • Model does reasoning.
  • Your code manages state.
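Because your code owns memory, it also owns the context budget. A minimal trimming sketch (trim_history is a hypothetical helper; real agents often summarize dropped turns instead of discarding them):

```python
def trim_history(messages: list, max_messages: int = 20) -> list:
    """Keep the original task plus the most recent messages.

    Dropping the middle keeps requests under the context limit while
    preserving the task statement and the latest tool results.
    """
    if len(messages) <= max_messages:
        return messages
    return [messages[0]] + messages[-(max_messages - 1):]
```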

Part 5: Giving Claude Tools

Tools are JSON descriptions.

You define:

  • name,
  • description,
  • input schema.

Claude reads this contract and may emit a tool request.

In simple terms, the schema is a form definition. It tells Claude what fields are allowed and required before a tool can run.

tools = [
    {
        "name": "read_file",
        "description": "Read a UTF-8 text file and return full contents.",
        "input_schema": {
            "type": "object",
            "properties": {
                "path": {
                    "type": "string",
                    "description": "Absolute or workspace-relative path"
                }
            },
            "required": ["path"]
        }
    }
]

When Claude decides to use a tool, the response content contains one or more tool_use blocks.

Your responsibilities are strict:

  1. Append Claude tool request message to history.
  2. Execute tool(s) in your runtime.
  3. Append tool result message(s) with matching tool_use_id.

If you miss step 1 or 3, the loop loses context.
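Step 2 can be a simple registry dispatch. A sketch (this version takes the registry explicitly; the loop in Part 6 assumes a module-level equivalent):

```python
def execute_tool(name: str, tool_input: dict, registry: dict) -> str:
    """Look up the handler by tool name and run it with Claude's arguments."""
    handler = registry.get(name)
    if handler is None:
        # Return the error as a string so Claude can see it and recover.
        return f"ToolError: unknown tool '{name}'"
    return str(handler(**tool_input))
```

Returning errors as strings instead of raising matters: Claude can only adjust course if the failure comes back as a tool result.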

Part 6: The Minimal Agent Loop

This is the core engine.

def run_agent(task: str, max_steps: int = 10):
    messages = [{"role": "user", "content": task}]

    for step in range(1, max_steps + 1):
        response = client.messages.create(
            model="claude-sonnet-4-5",
            max_tokens=4096,
            tools=tools,
            messages=messages,
        )

        if response.stop_reason == "end_turn":
            # content[0] is not guaranteed to be a text block; join all text.
            return "".join(
                block.text for block in response.content if block.type == "text"
            )

        if response.stop_reason == "tool_use":
            messages.append({"role": "assistant", "content": response.content})

            tool_results = []
            for block in response.content:
                if block.type != "tool_use":
                    continue
                output = execute_tool(block.name, block.input)
                tool_results.append(
                    {
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": output,
                    }
                )

            messages.append({"role": "user", "content": tool_results})
            continue

        raise RuntimeError(f"Unhandled stop_reason: {response.stop_reason}")

    return f"Stopped after {max_steps} steps"

Suggested max_steps defaults:

Task Type                  Suggested max_steps
answer from one file       3 to 6
summarize multiple files   10 to 30
full compile workflow      30 to 100

Part 7: Production Hardening

A working loop is not a production loop.

You need resilience.

For beginners, resilience means this: your agent should keep behaving safely when the network is slow, a file is missing, or a tool returns bad output.

1) Structured logging

import logging

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s | %(levelname)s | %(message)s",
)
logger = logging.getLogger(__name__)

Log these at minimum:

  • step number,
  • stop reason,
  • tool name,
  • token usage,
  • exceptions.
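One structured line per loop iteration covers most of that list. A sketch (log_step is a hypothetical helper; response is the SDK response object):

```python
def log_step(logger, step: int, response) -> None:
    """Emit one log line per agent step: enough to reconstruct a run later."""
    logger.info(
        "step=%d stop_reason=%s input_tokens=%d output_tokens=%d",
        step,
        response.stop_reason,
        response.usage.input_tokens,
        response.usage.output_tokens,
    )
```

Call it right after every messages.create so no step goes unrecorded.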

2) Retry with exponential backoff

Retry only transient failures.

Transient means temporary. Examples: rate limits, short network interruptions, or brief server overload. Do not retry permanent problems such as bad input schema or unknown tool names.

import time
import anthropic

MAX_RETRIES = 4
BASE_WAIT_SECONDS = 1


def call_claude_with_retry(client, **kwargs):
    last_error = None

    for attempt in range(1, MAX_RETRIES + 1):
        try:
            return client.messages.create(**kwargs)
        except anthropic.RateLimitError as err:
            last_error = err
        except anthropic.APIStatusError as err:
            if err.status_code not in {429, 500, 502, 503, 529}:
                raise
            last_error = err
        except anthropic.APIConnectionError as err:
            last_error = err

        if attempt == MAX_RETRIES:
            break  # do not sleep after the final failed attempt

        wait = min(BASE_WAIT_SECONDS * (2 ** (attempt - 1)), 30)
        logger.warning("Retrying Claude call in %ss (attempt %s)", wait, attempt)
        time.sleep(wait)

    raise last_error

3) Safe tool execution

Never let tool exceptions crash full orchestration.

def execute_tool_safely(tool_name, tool_input, registry):
    try:
        return registry[tool_name](tool_input)
    except FileNotFoundError as err:
        return f"ToolError: FileNotFoundError: {err}"
    except Exception as err:
        return f"ToolError: {type(err).__name__}: {err}"

4) Token accounting

Track token usage every step and aggregate per run.

run_input_tokens = 0
run_output_tokens = 0

# after each response
run_input_tokens += response.usage.input_tokens
run_output_tokens += response.usage.output_tokens

This gives you real cost visibility.
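The totals convert directly to dollars. A sketch with placeholder rates (the numbers below are assumptions; check Anthropic's current pricing page for your model):

```python
# Assumed rates in USD per million tokens; verify against current pricing.
INPUT_RATE_PER_MTOK = 3.00
OUTPUT_RATE_PER_MTOK = 15.00


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Convert aggregated token counts into an approximate dollar cost."""
    return (
        input_tokens * INPUT_RATE_PER_MTOK
        + output_tokens * OUTPUT_RATE_PER_MTOK
    ) / 1_000_000
```

Log the estimate per run and you will notice cost regressions the day they happen.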

Part 8: Knowledge-Base Agent Design

Now apply the pattern to a real project.

Imagine a folder full of lecture notes. This agent reads each note, writes clean summaries, groups related concepts, and creates one index so a student can navigate quickly.

Target behavior:

  1. Read source files from raw/.
  2. Write one summary file per source into wiki/summaries/.
  3. Create concept pages in wiki/concepts/.
  4. Maintain a root index file wiki/_index.md.

Tools:

  • list_files(directory)
  • read_file(path)
  • write_file(path, content)
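The three tools are thin wrappers over the filesystem. A minimal sketch, assuming the agent runs from the project root (a real deployment should also validate that resolved paths stay inside the workspace):

```python
from pathlib import Path

WORKSPACE = Path(".")  # assumed: the agent runs from the project root


def list_files(directory: str) -> str:
    """Return one entry name per line for a directory inside the workspace."""
    return "\n".join(sorted(p.name for p in (WORKSPACE / directory).iterdir()))


def read_file(path: str) -> str:
    """Return the full UTF-8 contents of one file."""
    return (WORKSPACE / path).read_text(encoding="utf-8")


def write_file(path: str, content: str) -> str:
    """Write content, creating parent directories as needed."""
    target = WORKSPACE / path
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(content, encoding="utf-8")
    return f"Wrote {len(content)} characters to {path}"
```

Each tool returns a string because tool results go back to Claude as text.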

System prompt skeleton:

SYSTEM_PROMPT = """
You are a knowledge base compiler.

Workflow:
1) Discover all files in raw/
2) Summarize each file into wiki/summaries/
3) Consolidate repeated ideas into wiki/concepts/
4) Build wiki/_index.md

Rules:
- Do not skip source files
- Use clear markdown headings
- Include source references
- Report what was generated at the end
"""

Beginner tip:

  • Keep system prompt procedural.
  • Use numbered workflow steps.
  • Add explicit "do not skip" rules.

Part 9: Speed and Cost Optimization

Parallel processing for independent units

For many files, use a worker pool.

from concurrent.futures import ThreadPoolExecutor

with ThreadPoolExecutor(max_workers=5) as pool:
    futures = [pool.submit(process_file, path) for path in files]
    results = [f.result() for f in futures]  # re-raises any worker exception

Use parallelism only when tasks are independent. If one task depends on another task's output, keep it sequential.

Prompt caching

If system prompt stays stable, prompt caching can reduce repeated prompt cost significantly.
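With the Anthropic API this is opt-in: you mark the stable prefix with cache_control. A sketch (verify the exact shape against the current prompt caching documentation; client, SYSTEM_PROMPT, tools, and messages are the objects defined earlier):

```python
response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=4096,
    system=[
        {
            "type": "text",
            "text": SYSTEM_PROMPT,
            # Marks this prefix as cacheable across calls.
            "cache_control": {"type": "ephemeral"},
        }
    ],
    tools=tools,
    messages=messages,
)
```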

Incremental rebuilds

Do not reprocess unchanged files.

Use a timestamp-plus-hash strategy.

import hashlib
import os


def file_hash(path: str) -> str:
    with open(path, "rb") as f:
        return hashlib.md5(f.read()).hexdigest()


def has_changed(path: str, state: dict) -> bool:
    existing = state.get(path)
    if not existing:
        return True

    current_mtime = os.path.getmtime(path)
    if current_mtime == existing["timestamp"]:
        return False

    current_hash = file_hash(path)
    if current_hash == existing["hash"]:
        existing["timestamp"] = current_mtime
        return False

    return True

This gives speed without losing correctness.
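The state dict has to survive between runs. A minimal persistence sketch (.agent_state.json is a hypothetical location):

```python
import json
from pathlib import Path

STATE_FILE = Path(".agent_state.json")  # hypothetical state location


def load_state() -> dict:
    """Load the path -> {timestamp, hash} map, or start fresh."""
    if STATE_FILE.exists():
        return json.loads(STATE_FILE.read_text(encoding="utf-8"))
    return {}


def save_state(state: dict) -> None:
    """Persist the map after a successful run."""
    STATE_FILE.write_text(json.dumps(state, indent=2), encoding="utf-8")
```

Save only after a run completes cleanly, so a crash mid-run forces a reprocess rather than silently skipping files.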

Part 10: Common Beginner Mistakes

These cause most early failures:

  1. Forgetting to append Claude tool request to history.
  2. Returning tool output without correct tool_use_id.
  3. Running without max_steps.
  4. Treating all API errors as retryable.
  5. Ignoring token usage in logs.
  6. Writing vague tool descriptions.
  7. Packing too many unrelated rules in system prompt.

Final Blueprint

If you want one compact architecture to keep, use this:

  1. Clear system prompt.
  2. Minimal tools with strict schemas.
  3. Stateful message history in code.
  4. Loop driven by stop_reason.
  5. max_steps hard safety cap.
  6. Retries for transient failures only.
  7. Safe tool error handling.
  8. Token accounting.
  9. Incremental processing.

Conclusion

Most developers first learn agent frameworks. A better long-term path is learning the loop itself.

Once you understand the loop, frameworks become optional tooling. You can adopt them strategically instead of depending on them blindly.

You now have a complete beginner-to-production mental model:

  • how decisions happen,
  • how actions are executed,
  • how memory is maintained,
  • and how reliability is enforced.

That is the foundation for building agents you can trust in real systems.