Pega's fix for runaway AI costs: stop the agents from thinking at runtime

The news

At its PegaWorld conference in Las Vegas on June 8, 2026, Pegasystems announced Pega Infinity 26, which it says will be available in Q3 2026. The principal change is commercial: Pega is moving away from per-token pricing for its AI agents toward a flat charge per completed "case," which it defines as a task carried out from start to finish, such as a customer changing an order, a loan approval, or a claim. Pega frames the move as removing what it calls the "AI token tax".

The pricing change rests on an architecture Pega calls Predictable AI. Reasoning-heavy AI work is concentrated at design time, when workflows are authored in Pega Blueprint and the new Infinity Studio. At runtime, a lighter-weight model identifies the user's intent, selects a pre-approved workflow, and executes it step by step; where an individual step requires a language model, for example to parse a document or summarize a prior interaction, that step is given bounded instructions rather than open-ended latitude. Pega gives two reasons: more consistent outcomes, because agents follow approved workflows rather than re-reasoning each request, and more predictable cost, because the heavier processing happens only once during design rather than on every transaction.
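The runtime pattern described above can be sketched in a few lines. This is an illustration of the general idea only, not Pega's implementation: every name here (`Step`, `APPROVED_WORKFLOWS`, `classify_intent`, `handle`) is hypothetical, the keyword match stands in for the lighter-weight intent model, and the truncation in `summarize_bounded` stands in for a language-model step with bounded instructions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical illustration: none of these names come from Pega's products.


@dataclass(frozen=True)
class Step:
    name: str
    run: Callable[[dict], dict]  # takes case data, returns updated case data


def summarize_bounded(case: dict) -> dict:
    # Stand-in for a bounded model step: one fixed task, one fixed output field.
    return {**case, "summary": case.get("notes", "")[:80]}


def record_change(case: dict) -> dict:
    # Plain deterministic step with no model involved.
    return {**case, "status": "order_changed"}


# Workflows are authored and approved at design time; runtime only looks them up.
APPROVED_WORKFLOWS: Dict[str, List[Step]] = {
    "change_order": [
        Step("summarize", summarize_bounded),
        Step("record", record_change),
    ],
}

# Assumed phrases for the stand-in classifier.
KEYWORDS = {"change_order": ("change my order", "modify my order")}


def classify_intent(message: str) -> Optional[str]:
    # Stand-in for the lightweight runtime model: a keyword match, nothing more.
    lowered = message.lower()
    for intent, phrases in KEYWORDS.items():
        if any(phrase in lowered for phrase in phrases):
            return intent
    return None


def handle(message: str, case: dict) -> dict:
    intent = classify_intent(message)
    if intent is None:
        # What happens here is a policy decision the sketch leaves open.
        return {**case, "status": "no_matching_workflow"}
    for step in APPROVED_WORKFLOWS[intent]:
        case = step.run(case)
    return case
```

The point of the shape is that nothing in `handle` plans or re-reasons: the only decision made at runtime is which approved list of steps to run.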

The architecture is not new to this release. Pega introduced Predictable AI Agents in May 2025 and integrated them into Pega Infinity '25, which reached general availability in December 2025. Infinity 26 primarily adds the outcomes-based pricing model, alongside a companion announcement that exposes Pega processes as Model Context Protocol (MCP) servers, allowing third-party agents from Anthropic, OpenAI, Google, and AWS to call them under Pega's governance controls. The release cites no named customer; its only outside voice is analyst Liz Miller of Constellation Research. The "more than 20x" savings figure comes from Pega's AI Token Cost Calculator and is qualified as applying "depending on workflow complexity and scale".

The bigger picture

Two industry currents explain the timing of this announcement.

The first is pricing. The customer-service software market has spent the past year and a half moving away from per-seat and per-token models toward charging for outcomes. Intercom Fin charges $0.99 per resolution. HubSpot cut its customer agent to $0.50 per resolved conversation in April. Zendesk runs around $1.50 per automated resolution on committed volume and has been selling outcome-based pricing since 2024. Salesforce launched Agentforce at $2.00 per conversation, a unit so loose that only roughly 8,000 of its 150,000-plus customers adopted it, which forced a pivot to per-action Flex Credits and Agentic Work Units. Sierra, Decagon, and Ada all sell per-outcome on custom enterprise contracts. Gartner, in a March 2026 forecast, projects that the cost of running inference on a trillion-parameter model will fall more than 90% by 2030, while noting that those provider-side savings will not fully reach customers and that agentic models consume between 5 and 30 times more tokens per task than a standard chatbot. The unit price of thinking is falling while the number of units per task climbs, which is the squeeze every vendor in this market is now pricing against. Pega's per-"case" charge belongs to this trend, with its unit defined differently from a customer-service "resolution": a case spans a back-office task such as a loan approval or an insurance claim run end to end, rather than a single support interaction.
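The squeeze is easy to see with back-of-envelope arithmetic. The sketch below takes the two Gartner figures quoted above at face value and combines them naively; that combination is my simplification, not Gartner's, since the 90% figure is a 2030 projection for one class of model and the 5x to 30x range compares agentic workloads with a standard chatbot. The function name is hypothetical.

```python
def relative_task_cost(price_drop: float, token_multiplier: float) -> float:
    """Cost of one agentic task relative to one chatbot task at today's prices.

    price_drop: fractional fall in cost per token (0.90 means 90% cheaper).
    token_multiplier: how many times more tokens the agentic task consumes.
    """
    return (1.0 - price_drop) * token_multiplier


# Low end, break-even, and high end of the quoted 5x to 30x range.
for multiplier in (5, 10, 30):
    print(multiplier, round(relative_task_cost(0.90, multiplier), 2))
```

Under these assumptions a task that uses 5 times the tokens ends up at half the old cost, 10 times the tokens breaks even, and 30 times the tokens costs three times as much despite the 90% price cut. That spread is why a vendor cannot simply pass token pricing through and call it predictable.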

The second current is a deep disagreement across the industry about how much freedom an AI agent should have at runtime. One camp ships prompt-based tooling and lets agents reason and plan at each step, treating flexibility as the whole point. Another constrains agents to pre-approved workflows and treats unbounded runtime reasoning as a liability, especially in regulated processes. Pega sits firmly in the second camp, and its CEO has said publicly that competitors asking users to write prompts are setting themselves up for trouble. The context underneath the argument is not trivial. A widely cited 2025 study from MIT's NANDA initiative found that roughly 95% of enterprise generative AI pilots produced no measurable return on the profit line, which the authors attributed less to model quality than to a "learning gap" in how organizations integrated the tools. This is the line the market is arguing about right now, and the vendors have started to pick sides.

My point of view and analysis

Start with the part Pega frames as leadership. On price, Pega is not leading; it is catching up. The per-"case" charge is the same outcome-based move the customer-service vendors made first, just dressed for a different room. Credit where it is due, however, because the chosen unit is better than most: a completed back-office case is harder to game than a support "resolution" and maps to work a CFO already values. That is a real distinction. It is also a modest one, and it is not a first.

On the architecture, Pega's CEO is not entirely wrong about the risk he is arguing against. Letting a model improvise its way through a regulated claims process is asking for trouble, and the graveyard of failed genAI pilots is full of companies that could not audit what their agents did. The trouble is that the cure and the original promise of agentic AI pull in opposite directions.

Here is the question I cannot get my head around. There is real value in customer interactions that follow a rote path, and a great deal of work is exactly that, so Pega serving the rote case cheaply and consistently is a good thing, period. But the value of an agentic system was supposed to be the other case: the request that does not fit the workflow as designed, the genuinely novel situation. Pega's architecture is built to do the opposite of reasoning through those at runtime. So how does the system know it can safely run the rote workflow if it never reasons through the case at the outset? Pega's answer is the lightweight intent model that does the routing, which means the only runtime intelligence in the loop is intent classification, and classification is itself probabilistic and perfectly able to misroute. A request that matches no workflow then has three exits: forced onto the nearest approved path, escalated to a human, or handed to Blueprint to generate a workflow on the fly. That third option, however, is the one Pega spends the whole pitch warning against, because runtime generation in a regulated process is precisely what it calls dangerous. You cannot headline determinism and keep on-the-fly generation as the safety valve without owning the contradiction.
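To make the three exits concrete, here is a minimal sketch of a router that sits between the intent classifier and the workflow engine. It assumes the classifier returns a confidence score per intent and that a threshold separates "matched" from "unmatched"; both are my assumptions, as are all the names (`Exit`, `route`), and none of it describes how Pega actually routes.

```python
from enum import Enum
from typing import Dict, Optional, Tuple


class Exit(Enum):
    RUN_WORKFLOW = "run_workflow"        # confident match, normal path
    FORCE_NEAREST = "force_nearest"      # run the closest approved workflow anyway
    ESCALATE = "escalate_to_human"       # hand the request to a person
    GENERATE = "generate_workflow"       # build a new workflow on the fly


def route(
    scores: Dict[str, float], threshold: float, fallback: Exit
) -> Tuple[Exit, Optional[str]]:
    """Pick an exit for one request.

    scores: hypothetical classifier confidences per approved intent (non-empty).
    threshold: assumed minimum confidence for treating a match as safe.
    fallback: which of the three exits applies when nothing clears the threshold.
    """
    if fallback is Exit.RUN_WORKFLOW:
        raise ValueError("fallback must be one of the three unmatched exits")
    best_intent, best_score = max(scores.items(), key=lambda item: item[1])
    if best_score >= threshold:
        return Exit.RUN_WORKFLOW, best_intent
    if fallback is Exit.FORCE_NEAREST:
        return Exit.FORCE_NEAREST, best_intent
    # Escalation and generation both leave the approved workflow set.
    return fallback, None
```

For example, `route({"loan_approval": 0.55, "claim": 0.30}, 0.8, Exit.ESCALATE)` returns the escalation exit with no workflow attached. Note what the threshold does not catch: a classifier that is confidently wrong clears it and runs the wrong approved workflow, which is the misrouting risk described above.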

There is a distinction underneath all of this. Deterministic guardrails wrapped around a probabilistic system set the boundaries of acceptable action without collapsing the space inside them. The agent still reasons; it simply cannot climb the fence. Pega is doing something else. At runtime, the approved space is the entire space. There is no reasoning inside the fence, because the fence is the answer. That is not an agent operating within guardrails. It is a workflow engine with a probabilistic front desk. For loan approvals and claims that may well be the right trade, and it should simply be named as one. The industry spent two years insisting agents would handle the unscripted long tail, and Pega's bet is that the long tail is where you get hurt, so it designed the long tail out. They may be right about the risk while conceding the promise without saying so. This is BPM, Pega's home turf since 1983, with an AI intake layer on the front. Calling it agentic is generous.

So here is what I would do before believing the deck. Ask Pega for one named production customer, on the record, who has run this at scale and watched the cost curve flatten, because a calculator output is not a reference you can phone. Then get the definition of a billable "case" in writing, including what happens when the workflow misroutes, fails, or escalates to a human, because "resolution" was always a vendor-defined word and "case" is no different, and that ambiguity surfaces on the invoice rather than in the contract. Finally, ask the uncomfortable one: what share of your real request volume does not map cleanly to a pre-approved workflow today, and what does Pega do with that slice? If the answer is "a human takes it" or "Blueprint writes a new one live," you are buying a very capable workflow engine, which may be exactly what you need, as long as you buy it with your eyes open.
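Why the written definition matters can be shown with made-up numbers. The sketch below assumes a flat price per case and asks what a buyer effectively pays per case that actually completes, depending on whether misrouted, failed, or escalated cases are billed. The function name and every figure are hypothetical; Pega has not published billing rules that I can point to.

```python
def effective_cost_per_completed_case(
    price_per_case: float,
    total_cases: int,
    completed_share: float,
    bill_incomplete: bool,
) -> float:
    """Effective price paid per case that finished end to end.

    completed_share: fraction of cases that completed without misrouting,
        failing, or escalating (must be greater than zero).
    bill_incomplete: True if the contract also bills cases that did not complete.
    """
    if not 0.0 < completed_share <= 1.0:
        raise ValueError("completed_share must be in (0, 1]")
    completed = total_cases * completed_share
    billed = total_cases if bill_incomplete else completed
    return price_per_case * billed / completed


# Illustrative only: a 1.00 list price, 1,000 cases, 80% completing cleanly.
print(effective_cost_per_completed_case(1.00, 1000, 0.80, bill_incomplete=True))
print(effective_cost_per_completed_case(1.00, 1000, 0.80, bill_incomplete=False))
```

With those assumed inputs the effective price is 1.25 per completed case if incomplete cases are billed and 1.00 if they are not, a 25% difference that comes entirely from one clause in the definition.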

The token critique landed because it is true, and the architecture is sensible for the work Pega is aiming at. I am just not convinced the market asked for agents that are forbidden from thinking the moment a request gets interesting, and I would like to know whether buyers are actually asking for this or whether the industry has decided the long tail was a bad idea all along.

aheadCRM
