
A lot has changed in the last six months. The models kept getting smarter, but we've come to believe the real breakthrough isn't intelligence — it is the architecture built around it. The unit of work moved from answering a question to finishing a task. In a new client piece, we unpack what changed, what early agentic adoption looks like, and what it means for how intelligence gets priced and consumed.
I. What Changed: Orchestration, Not Intelligence
“A lot of people experienced AI last year as a ChatGPT–adjacent thing – but you really have to look again as of December, because things have changed fundamentally, especially on this agentic, coherent workflow that really started to actually work.”
Andrej Karpathy, From Vibe Coding to Agentic Engineering, Sequoia AI Ascent 2026 (April 29, 2026)
Many observers thought 2025 was going to be the year of the agent, but by year-end it felt like we were still meaningfully short of that. Model intelligence was commonly assumed to be the bottleneck.
What we learned in January is that models have become smart enough, and that the missing piece was orchestration: the wiring that lets a language model take an action, observe the result, decide what to do next, retain what it learned, and resume the loop on a schedule.
Peter Steinberger - the Austrian developer behind PSPDFKit, who sold his company in 2021 - built the first real agentic "harness" as a weekend project in late 2025: the connective software that finally let these models operate like true agents rather than chatbots. He rebranded it to OpenClaw on GitHub in late January 2026.
OpenClaw went viral. Within weeks of launch, it surpassed Linux to become the most "starred" (a proxy for downloads) open-source project in GitHub history, ushering in the agentic era.
Marc Andreessen, on the Latent Space podcast in early April, summarized the architecture in a single line that has since become the canonical explanation: an LLM, plus a shell, plus a filesystem, plus markdown, plus a cron loop – and it turns out that’s an agent.
Put plainly: OpenClaw gave LLMs somewhere to store their work, tools to use, and the ability to keep iterating until the job is finished.
Every component there (except the LLM, of course) has existed since the early 1970s. What was new was the configuration.
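To make that configuration concrete, here is a minimal sketch of the loop in Python. It is our own illustration, not OpenClaw's code: `run_agent`, `scripted_model`, and the `NOTES.md` file name are hypothetical, and the "model" is a hard-coded stand-in for a real LLM call. The agent decides on a shell command, runs it, appends the result to a markdown file on disk, and repeats until the model says the task is done.

```python
import subprocess
from pathlib import Path
from typing import Callable, Optional

# Hypothetical stand-in for an LLM call: it reads the notes so far and returns
# the next shell command, or None once it judges the task finished.
Model = Callable[[str], Optional[str]]


def run_agent(model: Model, workdir: Path, max_steps: int = 5) -> str:
    """Run one decide/act/observe/retain pass and return the notes file contents."""
    notes = workdir / "NOTES.md"  # the markdown file is the agent's memory
    notes.touch()                 # creates it on first run, keeps it afterwards
    for _ in range(max_steps):
        command = model(notes.read_text())  # decide what to do next
        if command is None:
            break                           # the model considers the job done
        result = subprocess.run(            # act: use the shell as the tool
            command, shell=True, cwd=workdir, capture_output=True, text=True
        )
        with notes.open("a") as f:          # observe and retain the outcome
            f.write(f"- ran `{command}` -> {result.stdout.strip()}\n")
    return notes.read_text()


def scripted_model(notes: str) -> Optional[str]:
    """Toy policy: run a single command, then stop once the notes record it."""
    return None if "echo done" in notes else "echo done"
```

In a real deployment a scheduler such as cron would call `run_agent` on an interval. Because the notes live on the filesystem rather than in the model's context, each scheduled run picks up where the last one stopped.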
"The combination of {Pi and OpenClaw}, I think, is one of the 10 most important softwares… The great breakthroughs are obvious in retrospect - which is the best kind. They weren't obvious at the time or somebody else would've done them already.”
Marc Andreessen, Latent Space podcast, April 3, 2026
II. AI Has Moved from Queries to Tasks
Jensen Huang named the shift a few weeks earlier on the Lex Fridman podcast:
“The iPhone of tokens arrived. Agents in general... It (OpenClaw) is the fastest–growing application in history. It went straight up. There’s no question OpenClaw is the iPhone of tokens.”
Jensen Huang, Lex Fridman Podcast #494, March 23, 2026
As the models have moved from queries to tasks, they have started producing real economic utility for the user.
"In the last few months, there has been a step change function in the productivity of the AI toolkit. It is profoundly more powerful than it was just nine months ago."
Ken Griffin, Stanford Leadership Forum 2026, May 4, 2026
Karpathy described the same shift on the No Priors podcast in March.
“In December something flipped. I went from writing 80% of my code myself to 20%. I don’t think I’ve typed a line of code since December. Code isn’t even the right verb anymore – I have to express my will to my agents for sixteen hours a day.”
Andrej Karpathy, No Priors with Sarah Guo, March 20, 2026
The shift began in software engineering, where feedback loops are tight and success criteria are verifiable.
It is now clearly reaching into other knowledge work, where the user describes the deliverable, walks away, and returns to a nearly finished product.
The unit of interaction has inverted. For two years, working with an LLM meant typing a question and reading an answer: latency in seconds, cost in cents, output a paragraph.
The agentic architecture flips each parameter. Latency runs from minutes to hours, cost can run to tens of dollars, and the output is a finished deliverable: a ten-page deck, a multi-tab model, or a working group list.
III. What We Have Seen So Far with Felix
At Rogo, we saw where the OpenClaw revolution was heading and immediately mobilized the team to build our first agent on top of the new architecture.
From project kickoff to our Felix agent being in customers' hands took roughly six weeks. It's been live with several clients for a couple of months now, and the usage data is starting to roll in.
We ran a study of existing Enterprise users who already had Rogo Pro prior to Felix, comparing weekly query volume before and after Felix was turned on.
Average weekly queries per user doubled.
What makes this striking is that the unit of work itself got longer.
Felix can now spend twenty minutes or more working autonomously on a complex query, a pattern we're seeing across the agentic landscape - unlocking depth that wasn't possible with Rogo Pro's roughly five-minute ceiling.
With that added depth, users aren't just doing more of the same thing; they're doing things they couldn't do before.
IV. New Architecture Drives Utility But Consumes Meaningfully More Tokens
On April 24, a Stanford Digital Economy Lab team – Erik Brynjolfsson, Sandy Pentland, and Jiaxin Pei – published one of the first rigorous measurements of token consumption in agentic coding workflows.
“Agentic tasks are uniquely expensive, consuming 1,000x more tokens than code reasoning and code chat.”
Brynjolfsson, Pentland, Pei et al., How Do AI Agents Spend Your Money?, Stanford Digital Economy Lab, May 5, 2026
(Note: Felix does consume more tokens than Rogo Pro, but meaningfully less than the 1,000x cited in the study.)
As these models drive real utility, token consumption becomes a real cost. For some, it will also become a measure of productivity.
Jensen Huang has been making exactly that case internally at Nvidia. He put it this way on the All-In Podcast in March:
“If that $500,000 engineer did not consume at least $250,000 worth of tokens, I am going to be deeply alarmed. That $500,000 engineer at the end of the year – I’m going to ask them, how much did you spend in tokens? If that person said $5,000, I will go ape.”
Jensen Huang, All–In Podcast, March 20, 2026
To Jensen, an engineer who is not burning their fair share of tokens is not actually trying. He compared it directly to one of his most expensive employees doing high–end work with the wrong tools.
Translated into a banking context: an associate who isn't consuming a meaningful amount of tokens is the equivalent of one showing up today and insisting on doing their modeling with a pencil and calculator instead of Excel. It will be necessary to steer that consumption to work that is commensurately high-value for a firm.
And the economics compound along a second dimension most firms haven't started measuring: time. Agents don't forget.
The deal patterns, the precedents, the practices your best managing director built over years—all of that used to walk out the door when she did. With agents and institutional memory, it can stay. Your newest MD inherits it on day one.
Token spend is now a productivity multiplier on the most expensive labor your firm employs, and it is going to be a P&L line item visible to the CFO.
Institutions are now the ultimate consumers of tokens, and the objective has shifted from simply adopting AI to maximizing utility per token: arming workers with the most powerful tool possible relative to token spend.
V. The Industry Economics Are Changing
“We see a future where intelligence is a utility, like electricity or water, and people buy it from us on a meter.”
Sam Altman, BlackRock U.S. Infrastructure Summit, March 12, 2026
As we move toward agentic AI and per–task token consumption expands, the way the industry consumes and charges for that intelligence is going to change.
Foundation labs will increasingly look like utilities, selling metered intelligence at enormous scale.
The "iPhone of tokens" (agents) will allow the application layer to generate meaningful value beyond the foundation models themselves.
"I would like us to be an infrastructure provider. If we can provide a utility and people can build on top of that utility… I think that would be quite powerful."
Sam Altman, Stripe Sessions 2026, April 29, 2026
The new architecture empowers AI–native companies that lean into deep domain expertise to build substantial utility on top of foundation models, turning domain context, workflow understanding, and proprietary data into real value for the end user.
"The way I would think about it is that most of what we're building is a platform and we think that there's so many examples of where a platform can accrue a lot of value, but the customers who are building on that platform actually accrue even more value.”
Krishna Rao, Anthropic CFO, Invest Like the Best Podcast, May 13, 2026
As this paradigm shifts, an interesting chasm opens up between the frontier labs and the end consumers of their intelligence. The labs are optimizing for tokens consumed; the end user is optimizing for utility per token.
That divergence expands the application layer's role as a model orchestrator.
Routing the right task to the right model – a ten–page deck to a frontier model like Opus 4.7, a quick email cleanup to a model that’s 8-10x cheaper like Gemini 3 Flash – becomes another core responsibility of the application layer, aligned with the end user, not the meter.
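A routing layer of this kind can be sketched in a few lines of Python. Everything here is illustrative: the tier names, the task labels, and the `route` function are hypothetical, and the 9x cost figure is only a placeholder inside the 8-10x range mentioned above, not real pricing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    name: str
    relative_cost: float  # illustrative units, not actual vendor pricing


# Assumed tiers: a frontier model for long deliverables, a light model for
# quick edits. The 9x gap is a placeholder within the 8-10x range in the text.
FRONTIER = ModelTier("frontier-model", 9.0)
LIGHT = ModelTier("light-model", 1.0)

# Hypothetical task labels that justify frontier-model spend.
HEAVY_TASKS = frozenset({"deck", "financial_model"})


def route(task_kind: str) -> ModelTier:
    """Send heavy deliverables to the frontier tier and everything else to the light one."""
    return FRONTIER if task_kind in HEAVY_TASKS else LIGHT
```

With these assumptions, `route("deck")` returns the frontier tier and `route("email_cleanup")` returns the light one. A production router would likely classify tasks with something richer than a fixed label set, but the objective is the same: spend frontier tokens only where the deliverable warrants them.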
The technology has shifted quickly. The economics are scrambling to catch up.


