Pega's fix for runaway AI costs: stop the agents from thinking at runtime

Does Pega kill the token tax? Image by TW with some help from ChatGPT

The news

At its PegaWorld conference in Las Vegas on June 8, 2026, Pegasystems announced Pega Infinity 26, which it says will be available in Q3 2026. The principal change is commercial: Pega is moving away from per-token pricing for its AI agents toward a flat charge per completed "case," which it defines as a task carried out from start to finish, such as a customer changing an order, a loan approval, or a claim. Pega frames the move as removing what it calls the "AI token tax".

The pricing change rests on an architecture Pega calls Predictable AI. Reasoning-heavy AI work is concentrated at design time, when workflows are authored in Pega Blueprint and the new Infinity Studio. At runtime, a lighter-weight model identifies the user's intent, selects a pre-approved workflow, and executes it step by step; where an individual step requires a language model, for example to parse a document or summarize a prior interaction, that step is given bounded instructions rather than open-ended latitude. Pega gives two reasons: more consistent outcomes, because agents follow approved workflows rather than re-reasoning each request, and more predictable cost, because the heavier processing happens only once during design rather than on every transaction.
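The runtime half of that description can be sketched in a few lines. This is a minimal illustration and not Pega's API: `Workflow`, `classify_intent`, `APPROVED`, `run_case`, and the two step functions are hypothetical names, and simple keyword matching stands in for the lighter-weight model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

# All names here are hypothetical illustrations, not Pega's API.
Step = Callable[[dict], dict]


@dataclass(frozen=True)
class Workflow:
    """A workflow authored and approved at design time."""
    name: str
    steps: Tuple[Step, ...]


def update_order(case: dict) -> dict:
    return {**case, "order_updated": True}


def notify_customer(case: dict) -> dict:
    return {**case, "customer_notified": True}


# Design time: the set of approved workflows is fixed before any request arrives.
APPROVED: Dict[str, Workflow] = {
    "change_order": Workflow("change_order", (update_order, notify_customer)),
}


def classify_intent(request: str) -> str:
    """Stand-in for the lightweight runtime model (assumption: keyword matching)."""
    text = request.lower()
    if "order" in text:
        return "change_order"
    if "claim" in text:
        return "file_claim"
    return "unknown"


def run_case(request: str) -> dict:
    """Runtime: identify intent, select an approved workflow, execute it step by step."""
    workflow = APPROVED.get(classify_intent(request))
    if workflow is None:
        # No approved workflow matches; this sketch simply hands off to a person.
        return {"request": request, "status": "escalated_to_human"}
    case = {"request": request, "workflow": workflow.name}
    for step in workflow.steps:
        case = step(case)
    return {**case, "status": "completed"}


print(run_case("Please change my order")["status"])
print(run_case("I want to file a claim")["status"])
```

The point of the sketch is where the decisions live: everything in `APPROVED` was settled at design time, and the only judgment made per request is the intent label.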

The architecture is not new to this release. Pega introduced Predictable AI Agents in May 2025 and integrated them into Pega Infinity '25, which reached general availability in December 2025. Infinity 26 primarily adds the outcomes-based pricing model, alongside a companion announcement that exposes Pega processes as Model Context Protocol (MCP) servers, allowing third-party agents from Anthropic, OpenAI, Google, and AWS to call them under Pega's governance controls. The release cites no named customer; the only outside voice it quotes is analyst Liz Miller of Constellation Research. The "more than 20x" savings figure comes from Pega's AI Token Cost Calculator and is qualified as applying "depending on workflow complexity and scale".

The bigger picture

Two industry currents explain the timing of this announcement.

The first is pricing. The customer-service software market has spent the past year and a half moving away from per-seat and per-token models toward charging for outcomes. Intercom Fin charges $0.99 per resolution. HubSpot cut its customer agent to $0.50 per resolved conversation in April. Zendesk runs around $1.50 per automated resolution on committed volume and has been selling outcome-based pricing since 2024. Salesforce launched Agentforce at $2.00 per conversation, a unit so loose that only roughly 8,000 of its 150,000-plus customers adopted it, which forced a pivot to per-action Flex Credits and Agentic Work Units. Sierra, Decagon, and Ada all sell per-outcome on custom enterprise contracts. Gartner, in a March 2026 forecast, projects that the cost of running inference on a trillion-parameter model will fall more than 90% by 2030, while noting that those provider-side savings will not fully reach customers and that agentic models consume between 5 and 30 times more tokens per task than a standard chatbot. The unit price of thinking is falling while the number of units per task climbs, which is the squeeze every vendor in this market is now pricing against. Pega's per-"case" charge belongs to this trend, with its unit defined differently from a customer-service "resolution": a case spans a back-office task such as a loan approval or an insurance claim run end to end, rather than a single support interaction.
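The squeeze is easy to put in numbers. The sketch below is back-of-the-envelope arithmetic under loud assumptions: it treats the 90% cost decline as if it were passed through to buyers in full (which Gartner says it will not be), applies it to the 5x and 30x token multipliers, and normalizes a standard chatbot task to a cost of 1.0. The function name `relative_cost_per_task` is mine.

```python
def relative_cost_per_task(price_decline: float, token_multiplier: float) -> float:
    """Cost per agentic task relative to a chatbot baseline of 1.0.

    price_decline: fractional drop in price per token (0.90 means 90% cheaper).
    token_multiplier: tokens per agentic task relative to a chatbot task.
    """
    return (1.0 - price_decline) * token_multiplier


# Assumption: the full 90% decline reaches the buyer, which is the best case.
for multiplier in (5, 30):
    ratio = relative_cost_per_task(0.90, multiplier)
    print(f"{multiplier}x tokens at 90% cheaper inference: {ratio:.2f}x baseline cost")
```

Even in that best case, a task that burns 5 times the tokens costs half the baseline, while one that burns 30 times the tokens costs three times the baseline. The break-even sits at 10 times the tokens, and any pass-through short of 100% moves it lower.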

The second current is a deep disagreement across the industry about how much freedom an AI agent should have at runtime. One camp ships prompt-based tooling and lets agents reason and plan at each step, treating flexibility as the whole point. Another constrains agents to pre-approved workflows and treats unbounded runtime reasoning as a liability, especially in regulated processes. Pega sits firmly in the second camp, and its CEO has said publicly that competitors asking users to write prompts are setting themselves up for trouble. The context underneath the argument is not trivial. A widely cited 2025 study from MIT's NANDA initiative found that roughly 95% of enterprise generative AI pilots produced no measurable return on the profit line, which the authors attributed less to model quality than to a "learning gap" in how organizations integrated the tools. This is the line the market is arguing about right now, and the vendors have started to pick sides.

My point of view and analysis

Start with the part Pega frames as leadership. On price, Pega is not leading; it is catching up. The per-"case" charge is the same outcome-based move the customer-service vendors made first, just dressed for a different room. Credit where it is due, however, because the chosen unit is better than most: a completed back-office case is harder to game than a support "resolution" and maps to work a CFO already values. That is a real distinction. It is also a modest one, and it is not a first.

On the architecture, Pega's CEO is not entirely wrong about the risk he is arguing against. Letting a model improvise its way through a regulated claims process is asking for trouble, and the graveyard of failed genAI pilots is full of companies that could not audit what their agents did. The trouble is that the cure and the original promise of agentic AI pull in opposite directions.

Here is the question I cannot get my head around. There is real value in customer interactions that follow a rote path, and a great deal of work is exactly that; so Pega serving the rote case cheaply and consistently is a good thing, period. But the value of an agentic system was supposed to be the other case: the request that does not fit the workflow as designed, the genuinely novel situation. Pega's architecture is built to do the opposite of reasoning through those at runtime. So how does the system know it can safely run the rote workflow if it never reasons through the case at the outset? Pega's answer is the lightweight intent query that does the routing, which means the only runtime intelligence in the loop is intent classification, and classification is itself probabilistic and perfectly able to misroute. A request that matches no workflow then has three exits: forced onto the nearest approved path, escalated to a human, or handed to Blueprint to generate a workflow on the fly. However, that third option is the one Pega spends the whole pitch warning against, because runtime generation in a regulated process is precisely what it calls dangerous. You cannot headline determinism and keep on-the-fly generation as the safety valve without owning the contradiction.
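To make the three exits concrete, here is a minimal sketch of the routing decision as I read it. Everything in it is an assumption for illustration: the `Exit` enum, the `choose_exit` function, the 0.8 confidence threshold, and the idea that the classifier returns a score per approved workflow. None of it is taken from Pega's documentation.

```python
from enum import Enum
from typing import Dict, Optional, Tuple


class Exit(Enum):
    RUN_MATCHED = "run the matched workflow"
    FORCE_NEAREST = "force onto the nearest approved path"
    ESCALATE = "escalate to a human"
    GENERATE = "generate a workflow on the fly"


def choose_exit(
    scores: Dict[str, float],
    threshold: float = 0.8,  # hypothetical cutoff, not a Pega setting
    fallback: Exit = Exit.ESCALATE,
) -> Tuple[Exit, Optional[str]]:
    """Pick an exit from classifier scores (assumed non-empty, one per workflow)."""
    best = max(scores, key=lambda name: scores[name])
    if scores[best] >= threshold:
        # A confident label is not necessarily a correct one: a misroute
        # with a high score passes straight through this branch.
        return Exit.RUN_MATCHED, best
    if fallback is Exit.FORCE_NEAREST:
        return Exit.FORCE_NEAREST, best
    # ESCALATE and GENERATE both leave the approved workflow set.
    return fallback, None


ambiguous = {"change_order": 0.55, "file_claim": 0.40}
print(choose_exit(ambiguous))
print(choose_exit(ambiguous, fallback=Exit.FORCE_NEAREST))
print(choose_exit({"change_order": 0.93, "file_claim": 0.05}))
```

Two things stand out once it is written down. The threshold only catches requests the classifier is unsure about, so a confidently wrong label still runs the wrong workflow. And the choice of `fallback` is a policy decision somebody has to make, which is exactly the part the pitch leaves out.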

There is a distinction underneath all of this. Deterministic guardrails wrapped around a probabilistic system set the boundaries of acceptable action without collapsing the space inside them. The agent still reasons; it simply cannot climb the fence. Pega is doing something else. At runtime, the approved space is the entire space. There is no reasoning inside the fence, because the fence is the answer. That is not an agent operating within guardrails. It is a workflow engine with a probabilistic front desk. For loan approvals and claims that may well be the right trade, and it should simply be named as one. The industry spent two years insisting agents would handle the unscripted long tail, and Pega's bet is that the long tail is where you get hurt, so it designed the long tail out. Pega may be right about the risk while conceding the promise without saying so. This is BPM, Pega's home turf since 1983, with an AI intake layer on the front. Calling it agentic is generous.

So here is what I would do before believing the deck. Ask Pega for one named production customer, on the record, who has run this at scale and watched the cost curve flatten, because a calculator output is not a reference you can phone. Then get the definition of a billable "case" in writing, including what happens when the workflow misroutes, fails, or escalates to a human, because "resolution" was always a vendor-defined word and "case" is no different, and that ambiguity surfaces on the invoice rather than in the contract. Finally, ask the uncomfortable one: what share of your real request volume does not map cleanly to a pre-approved workflow today, and what does Pega do with that slice? If the answer is "a human takes it" or "Blueprint writes a new one live," you are buying a very capable workflow engine, which may be exactly what you need, as long as you buy it with your eyes open.

The token critique landed because it is true, and the architecture is sensible for the work Pega is aiming at. I am just not convinced the market asked for agents that are forbidden from thinking the moment a request gets interesting, and I would like to know whether buyers are actually asking for this or whether the industry has decided the long tail was a bad idea all along. 
