We moved from training AI to running it. Now we let it act.
01
Training
02
Inference
03
Automation
Reasoning › Execution
What an agent is
And what it is not.
Chatbot
Answers your questions.
Co-pilot
Helps you do the work.
Workflow
Runs a fixed script.
Agent
Decides and does the work itself, adapting as it goes.
brain · steps · tools
Context window
Context windows are finite. Real tasks are not.
Actual Context Window 1M
Tool Definitions 2.5M
System Prompt 2K
Task
Conversation 8K
Tool definitions alone exceed the entire window.
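The arithmetic behind that claim can be sketched in a few lines. The token figures are the ones on the slide, read as decimal thousands and millions (an assumption); "Task" has no figure on the slide, so it is left out rather than guessed. The names `BudgetItem` and `totalTokens` are illustrative only.

```typescript
// Token figures from the slide, read as decimal units (assumption).
const CONTEXT_WINDOW = 1000000;

// Illustrative type: one named consumer of context-window tokens.
interface BudgetItem {
  name: string;
  tokens: number;
}

// "Task" is omitted because the slide gives no figure for it.
const items: BudgetItem[] = [
  { name: "Tool definitions", tokens: 2500000 },
  { name: "System prompt", tokens: 2000 },
  { name: "Conversation", tokens: 8000 },
];

// Sum the tokens requested by every item.
function totalTokens(list: BudgetItem[]): number {
  return list.reduce((sum, item) => sum + item.tokens, 0);
}

const required = totalTokens(items);          // 2,510,000 tokens requested
const overflow = required - CONTEXT_WINDOW;   // 1,510,000 tokens over budget
const ratio = required / CONTEXT_WINDOW;      // about 2.51 times the window
```

Even before any task text is added, the request is roughly two and a half times what the window can hold.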
Tool calling
Post-mortem injection
Tool use is bolted on after the model is trained. It never actually learned these APIs, so it guesses.
Functional overlap
When tools expose similar schemas, the model blends them and invents parameters that do not exist.
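One way to picture the blending problem is a check that flags arguments a tool never declared. This is a minimal sketch under stated assumptions: `charge_invoice` appears later in the deck, but `refund_invoice`, every parameter name, and the `unknownParams` helper are hypothetical and not taken from any real API.

```typescript
// Hypothetical tool schemas: two tools whose parameters partly overlap.
interface ToolSchema {
  name: string;
  params: string[];
}

const chargeInvoice: ToolSchema = {
  name: "charge_invoice",
  params: ["invoice_id", "amount_cents"],
};

const refundInvoice: ToolSchema = {
  name: "refund_invoice",
  params: ["invoice_id", "reason"],
};

// Return the argument names that the schema does not declare.
function unknownParams(schema: ToolSchema, args: { [key: string]: unknown }): string[] {
  return Object.keys(args).filter((key) => schema.params.indexOf(key) === -1);
}

// A blended call: "reason" belongs to refund_invoice, not charge_invoice.
const blendedCall = { invoice_id: "inv-1", amount_cents: 1200, reason: "overdue" };
const invented = unknownParams(chargeInvoice, blendedCall); // ["reason"]

// A well-formed refund call has no undeclared arguments.
const cleanRefund = unknownParams(refundInvoice, { invoice_id: "inv-1", reason: "duplicate" }); // []
```

A check like this catches the invented parameter after the fact; it does not stop the model from producing it.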
But LLMs are very good at one thing: writing code.
The pivot
They are bad at calling tools. They are great at one thing: writing code.
Calling tools directly
model → charge_invoice({…})
✕ wrong parameter, the task breaks
Writing code
// charge every overdue invoice
const overdue = await db.query("due < now() AND !paid")
for (const inv of overdue) {
  const r = await billing.charge(inv)
  if (!r.ok) await retry(inv)
}
✓ runs, retries, completes
Deterministic: loops, conditions, retries, error handling.
One program does what dozens of brittle tool-calls could not.
The tools stay out of the context window.
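The loop-and-retry behaviour described above can be sketched as a self-contained program. Everything here is hypothetical: `Invoice`, `chargeWithRetry`, and the flaky `makeFlakyCharge` stub are stand-ins, not a real billing API. The stand-ins are synchronous to keep the sketch short; real billing calls would be awaited.

```typescript
// Hypothetical types: none of these names come from a real billing API.
interface Invoice {
  id: string;
  amountCents: number;
}

interface ChargeResult {
  ok: boolean;
  attempts: number;
}

// Try a charge up to maxAttempts times and stop at the first success.
function chargeWithRetry(
  inv: Invoice,
  charge: (inv: Invoice) => boolean,
  maxAttempts: number
): ChargeResult {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    if (charge(inv)) {
      return { ok: true, attempts: attempt };
    }
  }
  return { ok: false, attempts: maxAttempts };
}

// Stand-in for a flaky billing tool: each invoice fails a fixed
// number of times, then succeeds on the next call.
function makeFlakyCharge(failuresBeforeSuccess: number): (inv: Invoice) => boolean {
  const seen: { [id: string]: number } = {};
  return (inv: Invoice): boolean => {
    const calls = (seen[inv.id] || 0) + 1;
    seen[inv.id] = calls;
    return calls > failuresBeforeSuccess;
  };
}

const overdueInvoices: Invoice[] = [
  { id: "inv-1", amountCents: 1200 },
  { id: "inv-2", amountCents: 560 },
];

// Each invoice fails twice, so three attempts are enough to complete.
const flakyCharge = makeFlakyCharge(2);
const results = overdueInvoices.map((inv) => chargeWithRetry(inv, flakyCharge, 3));
```

The retry bound is ordinary control flow, so the program behaves the same way on every run instead of depending on the model issuing the right follow-up tool call.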
So where does all of this run, for hours or even days?
One layer for cost, caching, and compliance on every model call.
Beyond the demo
Each answer leads to the next.
Context windows overflow. Tool calls break. → Have the model write code.
Code must run somewhere. The agent must remember. → A durable execution environment: sandboxes and state.
Now it can reach your tools and internal systems. → Zero Trust and MCP governance.
“I know what hyperscalers will look like in 10 years: exactly the same as they do now. I'm looking to Cloudflare to define what the next generation cloud looks like.”