2024 was the year people became familiar with the idea of AI agents. Mind-blowing demos captured widespread attention.
2025 was the year agents transitioned from demo to production. We entered the year with the mantra "2025 is the year of AI agents." By the end of the year, several signals made it clear that a deeper transformation was underway, among them the formation of the Agentic AI Foundation by the Linux Foundation, marking a move toward a more open ecosystem, and JPMorgan rolling out agents adopted by over 30k employees, suggesting the compounding value of systems of AI agents in real production environments.
What comes next in 2026? The decisive shift in AI will not be smarter models, but operational systems of agents that can be coordinated, verified, and improved over time.
Over the past three years, I've been building agentic systems at the forefront of research, open source, and real-world production, dating back to the inception of AutoGen in early 2023. In this article, I reflect on those experiences and share my outlook on where agentic AI is headed in 2026.
1. AI Agents From Demos to Production at Scale
This trend is already so obvious that calling it a prediction almost feels like cheating. The technical stack is far from stabilized, but production deployments are already happening because of the economic value AI agents can provide.
Despite this momentum, there is one common trap many teams fall into: misunderstanding where the real ROI actually lives.
For any organization with even a modest level of operational and organizational sophistication, the highest value does not come from a simple chatbot, a generalist agent, or any isolated agent system. At scale, uncoordinated agents quickly devolve into massive context silos that are impossible to manage. The real leverage comes from orchestration. The highest returns emerge from a well-coordinated system of agents: one that understands the organization, structures and routes work, and coordinates multiple specialized personas to achieve outcomes that no single agent could deliver on its own.
Some resources from top business leaders on why orchestration is such an essential layer in the enterprise setting:
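To make the orchestration idea concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not how any particular framework implements routing: the `Specialist` and `Orchestrator` classes are hypothetical, and keyword overlap stands in for whatever routing policy a real system would use.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Specialist:
    """A hypothetical specialized agent: a name, the topics it covers, a handler."""
    name: str
    keywords: List[str]
    handle: Callable[[str], str]


@dataclass
class Orchestrator:
    """Routes each request to one specialist and keeps a single shared log."""
    specialists: List[Specialist]
    shared_log: List[Dict[str, str]] = field(default_factory=list)

    def route(self, request: str) -> Specialist:
        # Keyword overlap is an assumption standing in for a real routing policy.
        text = request.lower()
        scores = [(sum(k in text for k in s.keywords), s) for s in self.specialists]
        best_score, best = max(scores, key=lambda pair: pair[0])
        if best_score == 0:
            raise LookupError(f"no specialist matches request: {request!r}")
        return best

    def run(self, request: str) -> str:
        agent = self.route(request)
        result = agent.handle(request)
        # One log for every hand-off, so context does not fragment per agent.
        self.shared_log.append({"request": request, "agent": agent.name, "result": result})
        return result


billing = Specialist("billing", ["invoice", "refund"], lambda r: "billing handled: " + r)
support = Specialist("support", ["error", "crash"], lambda r: "support handled: " + r)
orchestrator = Orchestrator([billing, support])

answer = orchestrator.run("Customer asks for a refund on invoice 1042")
print(answer)
```

The point of the sketch is the shared log: every specialist's work is recorded in one place, which is the opposite of the context silos described above.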
- Big Ideas 2026: The Enterprise Orchestration Layer by A16Z, Dec 25, 2025
- AI's trillion-dollar opportunity: Context graphs by Foundation Capital, Dec 22, 2025
2. The Early Shape of AGI? AI Agents Tapping Into the Workforce
In October 2025, OpenAI released GDPval, an evaluation that measures model performance on economically valuable, real-world tasks across 44 occupations. It may appear less flashy than new model launches, but it points to something far more consequential: even with today's model capabilities, it is already possible to build agents that can perform economically valuable work at scale, grounded in tasks drawn from real job functions across the U.S. workforce. This begins to resemble AGI, with agents passing both the Turing and employment tests.
Latest research and studies about the progress of AI tapping into the workforce:
- GDPval: Evaluating AI Model Performance on Real-World Economically Valuable Tasks, by OpenAI, Oct 2025
- Future of Work with AI Agents, Stanford Social and Language Technologies Lab, Oct 2025
- AI Could Affect 90% of Occupations, Morgan Stanley, Sep 2025
3. The Self-Improving Loop and Verification Fabric
AI agents are not meant to be static. They are meant to adapt. To be effective in the real world, agents must be able to observe their own behavior, learn from outcomes, and improve over time. Without this capability, even the most potent agents inevitably degrade as environments change, assumptions break, and edge cases accumulate.
Adaptation is not an optional enhancement but a foundational requirement for agents expected to operate continuously, autonomously, and at scale. One of our latest research projects, Absolute Zero Reasoner (a NeurIPS 2025 spotlight paper), shows the effectiveness of Reinforced Self-play Reasoning on coding tasks from almost zero data. At the core of this new paradigm is a verifiable environment coupled with a task generator that allows agents to stress-test and systematically improve their reasoning capabilities through self-play.
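The shape of such a loop (a task generator, a solver, and a verifiable environment that turns outcomes into reward) can be sketched with a toy example. This is not the Absolute Zero Reasoner method, only an illustration of the loop; the arithmetic tasks, the `Solver` class, and the weight update are all assumptions made up for this sketch, and the task generator here is fixed rather than learned.

```python
import operator
import random

# Hypothetical candidate "skills" the solver can try for each operator symbol.
CANDIDATES = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}
CORRECT = {"+": "add", "-": "sub", "*": "mul"}  # known only to the environment


def propose_task(rng):
    """Task generator. Operands start at 3 so only the right skill can verify."""
    return rng.randint(3, 12), rng.choice("+-*"), rng.randint(3, 12)


def verify(task, answer):
    """Verifiable environment: recompute the ground truth and compare."""
    a, sym, b = task
    return answer == CANDIDATES[CORRECT[sym]](a, b)


class Solver:
    def __init__(self, rng):
        self.rng = rng
        # One preference weight per (operator symbol, candidate skill) pair.
        self.weights = {sym: {name: 1.0 for name in CANDIDATES} for sym in "+-*"}

    def act(self, task):
        a, sym, b = task
        names = list(self.weights[sym])
        probs = [self.weights[sym][n] for n in names]
        choice = self.rng.choices(names, weights=probs)[0]
        return choice, CANDIDATES[choice](a, b)

    def learn(self, sym, choice, reward):
        # Reinforce only what the environment verified as correct.
        self.weights[sym][choice] += reward


def self_play(rounds, seed=0):
    rng = random.Random(seed)
    solver = Solver(rng)
    for _ in range(rounds):
        task = propose_task(rng)
        choice, answer = solver.act(task)
        reward = 1.0 if verify(task, answer) else 0.0
        solver.learn(task[1], choice, reward)
    return solver


solver = self_play(rounds=600)
learned = {sym: max(w, key=w.get) for sym, w in solver.weights.items()}
print(learned)
```

The solver is never told the right answer. It only receives a pass or fail signal from the environment, which is why the verifier, rather than labeled data, is the critical component.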
Extending this paradigm beyond coding tasks directly leads to a more general and fundamental challenge in verification. For many real-world tasks, there is no scalable verification mechanism. An infrastructure around verification must be constructed: a structured system of task and problem decomposition, intermediate checks, feedback signals, and even human annotation and teaching that together transform open-ended objectives into verifiable environments. Once such a fabric exists, self-play and reinforcement can be applied far beyond coding—to reasoning, planning, data analysis, decision-making, and multi-agent coordination.
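As a rough sketch of what one piece of such a fabric might look like, the Python below breaks an open-ended deliverable into intermediate checks and aggregates them into a score, feedback, and a list of items that still need a human. The `Check` and `Verdict` types, the sample report, and the scoring rule are hypothetical choices for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Check:
    name: str
    # Returns True or False when decidable, None when a human must judge.
    run: Callable[[dict], Optional[bool]]


@dataclass
class Verdict:
    score: float            # fraction of automatically decided checks that passed
    feedback: List[str]     # signals an agent could learn from
    needs_human: List[str]  # checks routed to human annotation


def verify_output(output: dict, checks: List[Check]) -> Verdict:
    passed, decided = 0, 0
    feedback: List[str] = []
    needs_human: List[str] = []
    for check in checks:
        result = check.run(output)
        if result is None:
            needs_human.append(check.name)
            continue
        decided += 1
        if result:
            passed += 1
        else:
            feedback.append(f"failed: {check.name}")
    score = passed / decided if decided else 0.0
    return Verdict(score, feedback, needs_human)


# A made-up data-analysis deliverable with one inconsistency in its totals.
report = {"rows_in": 100, "rows_out": 98, "total": 250.0, "parts": [100.0, 140.0],
          "narrative": "Revenue grew in both regions."}

checks = [
    Check("no rows invented", lambda o: o["rows_out"] <= o["rows_in"]),
    Check("total matches parts", lambda o: abs(o["total"] - sum(o["parts"])) < 1e-9),
    Check("narrative is faithful", lambda o: None),  # not automatable in this sketch
]

verdict = verify_output(report, checks)
print(verdict)
```

Here the score can serve as a reward signal, the feedback strings as corrective signals, and the `needs_human` list as the hand-off point to human annotation and teaching.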
This reframes the core challenge and opportunity of agentic AI: progress can be driven by systems that make outcomes observable, comparable, and improvable over time.
4. Welcome to Software 4.0: Agentware
Andrej Karpathy has a well-known thesis about the evolution of software from 1.0 to 3.0.
- Software 1.0: We are writing computer code.
- Software 2.0: We are programming neural nets.
- Software 3.0: Prompts become the new program.
The last time Andrej publicly talked about this evolution was six months ago, in "Software is changing (again)". But even as he spoke, we were already crossing the threshold into the era of Software 4.0, where agency and coordination become programmable.
In Software 4.0, both the notion of programming and the role of the developer fundamentally change. Developers are no longer defined primarily by their ability to write code or craft prompts, but by their ability to design, orchestrate, and operate AI agents. Programming returns to its original essence: formulating problems and steering computation, while machines take on more of the execution. What's new is a fundamentally different form of computation—agentic computation—where systems are adaptive, goal-directed, capable of taking action, and able to improve through feedback rather than simply executing fixed code paths.
Software evolves into Agentware. Agents become the primary unit of application: encapsulating reasoning, memory, tools/skills, orchestration, and learning loops within a single operational system. How exciting for builders!!
5. The Rise of the Internet of Agents
2025 witnessed the early formation and consolidation of open protocols. Notable ones include MCP (Model Context Protocol) for agent tool calling; A2A (Agent2Agent) for agent-to-agent communication; and agentic payment protocols such as ACP (Agentic Commerce Protocol), AP2 (Agent Payments Protocol), and x402.
We've been actively experimenting with these open protocols and systematically integrating many of them into AutoGen/AG2. Through these early explorations, I'm beginning to see the outlines of an early "internet of agents"—a world in which autonomous agents can discover one another, communicate, invoke tools, transact value, and coordinate work across organizational and platform boundaries.
Much like the early internet, this layer remains rough and incomplete for full open operation. However, within controlled boundaries, we are already seeing network effects emerge. The implications are profound: agents cease to be isolated automations and become participants in a shared global substrate. Capability becomes composable. Intelligence becomes distributable. And coordination—long the most complex problem in AI systems—begins to scale.
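The discover-then-communicate pattern described above can be illustrated with a small in-memory sketch in Python. It does not follow the schema or API of MCP, A2A, or any payment protocol; `AgentCard`, `Registry`, and the message fields are hypothetical names, and a plain function call stands in for a network request.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class AgentCard:
    """Hypothetical self-description an agent publishes so others can find it."""
    name: str
    skills: List[str]
    endpoint: Callable[[dict], dict]  # stands in for a network address


class Registry:
    """In-memory directory; a real deployment would be a shared network service."""

    def __init__(self) -> None:
        self._cards: Dict[str, AgentCard] = {}

    def publish(self, card: AgentCard) -> None:
        self._cards[card.name] = card

    def discover(self, skill: str) -> List[AgentCard]:
        return [c for c in self._cards.values() if skill in c.skills]


def quote_agent(message: dict) -> dict:
    # Replies with a price quote; the flat unit price of 5 is made up.
    return {"type": "quote", "item": message["item"], "price": 5 * message["quantity"]}


registry = Registry()
registry.publish(AgentCard("quoter", ["pricing"], quote_agent))
registry.publish(AgentCard("shipper", ["logistics"], lambda m: {"type": "eta", "days": 3}))

# A buyer agent finds a provider by skill, then sends it a structured message.
provider = registry.discover("pricing")[0]
reply = provider.endpoint({"type": "request", "item": "widget", "quantity": 4})
print(reply)
```

The buyer never hard-codes which agent it talks to. It asks for a capability and composes with whatever provider is published, which is the sense in which capability becomes composable.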
The AG2 Journey
At AG2, I've been working with an amazing team and an open-source community at the forefront of the topics above. The journey began with the creation of AutoGen, which has grown into one of the most widely adopted multi-agent frameworks worldwide. In 2025, we transitioned from AutoGen to AG2, continuing our mission to build agentic AI in the open. I'm deeply proud of what we've built together.
Adoption: The world's leading teams are already deploying systems of AI agents on mission-critical workloads with AG2. Several notable examples include:
- Novo Nordisk: clinical data analysis
- Walmart: dynamic product recommendations and large-scale product description generation on Walmart.com
- NVIDIA: advanced chip design workflows and even self-improving agents
- Parker Hannifin: agentic customer support spanning a catalog of over 50 million products and documents exceeding 100,000 PDF pages
Award-Winning Frontier Research: We contributed multiple best papers and spotlight papers to the research community at the top AI/ML conferences, including:
- AutoGen (Best paper award at ICLR LLM Agent workshop 2024): the pioneering multi-agent framework that helped set off the wave of agentic AI.
- Agent Failure Attribution (Spotlight paper at ICML 2025): a novel benchmark for automated failure attribution in agentic systems.
- Absolute Zero Reasoner (Spotlight paper at NeurIPS 2025 and #1 paper of the day on Hugging Face): a groundbreaking reinforced self-play reasoning paradigm enabling self-improving agents.
Cutting-Edge Agent Performance: We developed agentic systems that ranked among the top performers on the most challenging benchmarks, such as GAIA and SWE-Bench, and recently achieved the #1 market return on Prophet Arena across multiple consecutive weeks, outperforming the market baseline by ~40%.
The Path Forward
We are now entering a new phase of Agentic AI, not defined by bigger models or flashier demos, but by practical systems that are operated, interconnected, trusted, and improved over time.
Over the coming weeks, I'll share a series of deep dives into the ideas behind this shift, along with concrete examples, production use cases, and the real-world impact they deliver. If you're passionate about the ideas and challenges outlined above, we invite you to join us in building this future together.
Stay in the loop with the latest developments and releases from AG2 by following our LinkedIn and subscribing to updates here.
