Everyone is asking the wrong question about AI.
The question everyone asks: “Which model should we use?”
The question that actually matters: “What environment does the AI operate in?”
This is not a subtle distinction. It is the difference between companies that get real value from AI and companies that burn budget on demos that never ship.
The Hole, Not the Drill
Nobody buys a drill because they want a drill. They buy a drill because they want a hole.
The same logic applies to AI agents. Nobody needs a smarter model. They need their business to run better: faster decisions, fewer dropped balls, work that happens while they sleep and holds up in the morning.
But the entire AI industry is selling drills. Bigger context windows. Better benchmarks. Faster inference. More parameters.
Here is what we have learned building AI systems that actually run businesses: the model accounts for maybe 20% of the value. The other 80% comes from the operating environment, the harness, the constraints, and the governance around the model.
We call this the Bounded Harness.

What Karpathy Proved (And What It Means for Your Business)
In March 2026, Andrej Karpathy released a project called Autoresearch. In one overnight session, his AI agent ran 126 experiments, kept 23 improvements, and meaningfully advanced the state of the art on a language model training task. No human intervention. The agent worked while Karpathy slept.
The interesting part is not that the agent was smart. The interesting part is that the environment was disciplined.
The agent could only edit one file. It had one metric to optimize. Every experiment got exactly five minutes. If the result improved, the change stayed. If it did not, the code reverted automatically. Everything was logged. Everything was reversible.
Tight constraints. Clear measurement. Cheap failure. That is what made it work.
A mediocre agent inside a strong harness will outperform a brilliant agent inside a messy one. Every time.
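The entire discipline fits in a few lines. What follows is an illustrative sketch, not Karpathy's actual code: `propose`, `evaluate`, `commit`, and `revert` are hypothetical hooks standing in for the file edit, the timed training run, and the Git commands in the real system.

```python
def harness_loop(propose, evaluate, commit, revert, rounds):
    """Karpathy-style overnight loop: propose a change to the one editable
    artifact, measure one metric, keep improvements, revert everything else.
    The four hooks are illustrative stand-ins, not a real API."""
    best = evaluate()            # baseline score before any change
    kept = 0
    for _ in range(rounds):
        propose()                # agent edits its single allowed file
        score = evaluate()       # one metric, same time box every run
        if score > best:
            commit()             # improvement becomes the new baseline
            best, kept = score, kept + 1
        else:
            revert()             # cheap failure: automatic rollback
    return best, kept
```

Notice where the intelligence lives: only in `propose`. Everything else is harness, and everything else is what makes the overnight run safe.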
Five Principles That Change How You Think About AI

Constraints Create Capability
The instinct is to give AI agents maximum freedom. “Let it figure it out.” This is exactly wrong.
Every successful autonomous system we have built or studied works because of what the agent cannot do. Restrictions on scope, time, and action space force the agent to be precise rather than exploratory. Exploration without boundaries is just expensive noise.
For your business: do not ask “what can AI do?” Ask “what is the smallest, most valuable task we can give it, with a clear way to know if it worked?”
The Instructions Are the Architecture
In Karpathy’s system, a Markdown file called program.md tells the agent how to behave: what to try, what to avoid, how to recover from failure, when to commit and when to revert.
This is not “prompting.” This is system design.
In our own operations, we maintain governance documents that define our AI agent’s decision boundaries, communication standards, escalation triggers, and quality thresholds. These files are version-controlled, audited, and updated as the business evolves. They are as much a part of the architecture as the code.
For your business: your AI’s operating manual is a strategic asset. If it lives in someone’s head or in a one-off prompt, you do not have an AI system. You have an expensive experiment.
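To make "governance as architecture" concrete, here is a minimal sketch. The field names and limits are invented for illustration; the point is that the rules are data, live in version control, and are checked before every action rather than hoped for in a prompt.

```python
# Hypothetical governance doc, version-controlled alongside the code.
# In Karpathy's system this role is played by program.md.
GOVERNANCE = {
    "allowed_actions": {"draft_email", "update_crm", "schedule_post"},
    "escalation_triggers": {"refund_over_usd": 500},
    "max_spend_usd": 100,
}

def check_action(action, spend_usd=0, refund_usd=0):
    """Gate every agent action against the governance doc before it runs."""
    if action not in GOVERNANCE["allowed_actions"]:
        return "blocked: outside decision boundary"
    if spend_usd > GOVERNANCE["max_spend_usd"]:
        return "blocked: over spend limit"
    if refund_usd > GOVERNANCE["escalation_triggers"]["refund_over_usd"]:
        return "escalate: human approval required"
    return "allowed"
```

Because the boundaries are data, changing them is a reviewed commit, not a quiet prompt edit, which is exactly what makes them auditable.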
Optimize the Harness, Not the Model
Most companies respond to poor AI output by switching models. “Let us try GPT-5.” “Maybe Claude is better for this.”
The real problem is almost never the model. It is the surrounding machinery: how tasks are defined, how results are measured, how failures are handled, how context is managed.
We spent months refining our operating harness: 25 automated workflows, a three-tier memory system, governance-enforced model assignments, automated auditing, and a security pipeline. The model matters. But the harness matters more.
For your business: before you switch models, audit your operating environment. How are tasks assigned? How are results verified? What happens when something fails? Fix the harness first.
Time Budgets Force Real Value
Karpathy gives each experiment exactly five minutes. This is brilliant because it forces every approach to justify itself in the same time box. A clever idea that takes too long to execute loses to a simple idea that ships fast.
We apply the same principle. Every automated workflow has a timeout. If it cannot produce value within the budget, the approach is wrong. This prevents the most common failure mode of AI systems: impressive demos that take forever and cost a fortune in production.
For your business: put time budgets on your AI tasks. If the agent cannot deliver value in 5 minutes (or 60, or 300, depending on the task), the task is poorly defined, not the agent.
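A time budget is a one-function wrapper. This is a thread-based sketch for illustration; a production harness would run the task in a separate process so an overdue worker can actually be killed.

```python
import concurrent.futures

def run_with_budget(task, budget_s, *args):
    """Run a workflow under a hard time budget. If it cannot deliver
    inside the box, report a timeout instead of waiting forever."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(task, *args)
        try:
            return ("ok", future.result(timeout=budget_s))
        except concurrent.futures.TimeoutError:
            future.cancel()          # threads cannot be force-killed;
            return ("timeout", None)  # a real harness would use a subprocess
```

The same time box applied to every approach is what makes results comparable: a clever idea that blows the budget scores exactly the same as a broken one.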
Make Failure Cheap and Visible
The best autonomous systems are designed so that bad outcomes are cheap to discard and every action is traceable. Karpathy uses Git commits as an evolutionary record. We use event logging, version-controlled workspace files, and automated auditing.
This is what makes the difference between an AI system you can trust and one you cannot. Not the model’s intelligence, but the environment’s observability.
For your business: can you trace every action your AI took? Can you revert a bad decision cheaply? If not, you are not ready for autonomy. Start there.
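A toy version of the pattern, with invented names: every action appends its reason and the previous value to a log, so any change can be traced ("why is this value what it is?") and cheaply reverted. Real systems would use Git commits or an event store for the same effect.

```python
import time

class Workspace:
    """Toy workspace where every agent action is logged and reversible.
    Illustrative stand-in for Git commits or an event log in a real harness."""
    def __init__(self):
        self.state, self.log = {}, []

    def apply(self, key, value, reason):
        """Record the previous value so the change is reversible."""
        self.log.append({"ts": time.time(), "key": key,
                         "prev": self.state.get(key),
                         "new": value, "reason": reason})
        self.state[key] = value

    def revert_last(self):
        """Undo the most recent action; failure stays cheap."""
        entry = self.log.pop()
        if entry["prev"] is None:
            del self.state[entry["key"]]
        else:
            self.state[entry["key"]] = entry["prev"]
        return entry

    def trace(self, key):
        """Answer 'why is this value what it is?' from the log."""
        return [e["reason"] for e in self.log if e["key"] == key]
```

Once every action carries its own reason and its own undo, autonomy stops being an act of faith.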
The Real Competitive Advantage
Here is the uncomfortable truth for the AI industry: models are commoditizing. The frontier model from six months ago is open-source today. The benchmarks that mattered last quarter are irrelevant this quarter.
What is not commoditizing is the operating environment. The governance frameworks. The memory architectures. The bounded harnesses that turn raw intelligence into reliable business value.
This is where the next wave of differentiation lives. Not in who has the smartest AI, but in who has the most disciplined environment for AI to operate in.
We know this because we live it. Our AI partner, Abbie, does not succeed because she runs on a frontier model (though she does). She succeeds because she operates inside a harness that was built for one purpose: making a business run better every day, with full accountability and zero ambiguity.
That is the thesis behind the 10-2-1 model: one elite human, two co-pilots, ten AI agents, all operating inside a bounded harness that makes the whole greater than the sum of its parts.
Start Here
If you are a business leader trying to figure out AI, stop asking which model to buy. Start asking these three questions:
Question 1
What is the smallest valuable task I can give an AI agent, with a clear success metric?
Question 2
What happens when the agent fails? Is the failure cheap and reversible?
Question 3
Can I trace every action the agent took and understand why?
If you can answer those three questions, you are ready to build something real. If you cannot, no model in the world will save you.
The model is the drill. The harness is what gets you the hole.
Michael Murray
Managing Partner at Abeba Co, where he builds AI operating environments for agencies and PE portfolio companies.
The AI Executive Partner framework is open-source at github.com/AbebaCo/ai-executive-partner-framework.
