
SAP's Double Acquisition: How Dremio and Prior Labs Complete a Data Strategy the Competition Can't Easily Match

SAP acquires Dremio and Prior Labs; image by TW with some help from ChatGPT
On May 4, 2026, SAP announced two acquisitions in the same breath:
Dremio, an Apache Iceberg-native agentic data lakehouse, and Prior Labs, a pioneer of Tabular Foundation Models. Neither acquisition is exotic. Together, they add up to the most coherent enterprise AI platform strategy any major vendor has shown this year.

Let me unravel what each company actually brings, why the combination matters, what it means for the competitive field, and — most importantly — what buyers and SAP customers should be doing right now.

The Problem SAP Is Solving

Before diving into the deals, let's state the problem being addressed. SAP's CTO Philipp Herzig said it clearly: "Enterprise AI doesn't stall because the models aren't good enough; it stalls because the data isn't ready for AI agents."

That is not a marketing line. It describes a pattern analysts and practitioners see constantly: AI pilots perform in a sandbox and fail when they hit production. The reasons are familiar: data is locked in proprietary formats across a dozen systems, there's no consistent business context, ETL pipelines take months to build, and governance gaps make audit-ready AI decisions nearly impossible.

SAP has also faced an additional problem. The narrative about SAP has always been that it works brilliantly if everything lives inside SAP and requires considerable engineering if you want to connect it to anything else. In an enterprise world where the average organization uses dozens of SaaS applications, that story is a liability.

Both acquisitions address these problems directly from different angles.

Acquisition One: Dremio and the Data Layer

Dremio is an open data lakehouse built on Apache Iceberg. That description undersells it. Dremio co-created Apache Polaris, the open catalog standard for Iceberg multi-engine interoperability, and Apache Arrow, the in-memory columnar format that is the plumbing of modern analytics. These are foundational open-source contributions. Dremio's customers include Shell, TD Bank, and Michelin; these are enterprises with complex, multi-environment data estates that need exactly what Dremio offers: federated query across any data source, without copying data or running ETL.


The specific integration SAP is executing is significant. SAP Business Data Cloud will become an Apache Iceberg-native enterprise lakehouse. That means SAP and non-SAP data can coexist on the same open foundation without format conversion or data movement. A universal catalog built on Apache Polaris means every system — SAP and otherwise — can read and write using the same standards. The Dremio AI Semantic Layer adds consistent business context across all sources, so that an AI agent querying HR data from SuccessFactors and financial data from a non-SAP system is working from the same definitions, not two conflicting ones.
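Federation, in this context, means one query spanning systems without copying data first. As a toy, stdlib-only illustration of the idea (sqlite3's ATTACH stands in for Dremio's cross-source engine, and the table names are invented for the example):

```python
# Toy illustration of federated query: one SQL statement across two
# sources, no ETL copy step. This is the concept only -- Dremio
# federates across warehouses and lakehouses, not sqlite files.
import sqlite3

# First "system": an in-memory database holding finance data
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE invoices (supplier TEXT, amount REAL)")
con.executemany("INSERT INTO invoices VALUES (?, ?)",
                [("Acme", 1200.0), ("Globex", 300.0)])

# Second "system": attached into the same session, queried in place
con.execute("ATTACH DATABASE ':memory:' AS crm")
con.execute("CREATE TABLE crm.suppliers (name TEXT, risk_score REAL)")
con.executemany("INSERT INTO crm.suppliers VALUES (?, ?)",
                [("Acme", 0.2), ("Globex", 0.8)])

# One query joins across both "systems" without moving the data
rows = con.execute("""
    SELECT i.supplier, i.amount, s.risk_score
    FROM invoices i JOIN crm.suppliers s ON s.name = i.supplier
    ORDER BY s.risk_score DESC
""").fetchall()
# rows -> [('Globex', 300.0, 0.8), ('Acme', 1200.0, 0.2)]
```

The semantic layer's job, by analogy, is to make sure "supplier" means the same thing in both schemas before any such join runs.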

For SAP's agentic AI play, this is a big improvement. Joule agents need data to act on. If that data is fragmented, ungoverned, or requires human pre-processing before an agent can consume it, agentic AI becomes expensive proof-of-concept work rather than production value. Dremio's self-managing platform (automated clustering, compaction, and query optimization) reduces the operational burden of keeping an AI-ready data estate running. The MCP integration Dremio already ships means any LLM or AI agent framework can access enterprise data without custom integration work.
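MCP (Model Context Protocol) is JSON-RPC 2.0 under the hood: a client invokes a server-advertised tool with a "tools/call" request. A minimal sketch of that envelope follows; the tool name "run_sql" and its arguments are hypothetical, since every MCP server publishes its own tools via "tools/list".

```python
# Minimal sketch of the JSON-RPC 2.0 envelope for an MCP "tools/call"
# request. The tool name and arguments are hypothetical examples.
import json

def mcp_tool_call(request_id, tool_name, arguments):
    """Build an MCP tools/call request (JSON-RPC 2.0 message)."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }

# An agent asking a (hypothetical) data tool for a governed query result
req = mcp_tool_call(
    1,
    "run_sql",
    {"query": "SELECT region, SUM(amount) FROM sales GROUP BY region"},
)
payload = json.dumps(req)  # what actually goes over the wire
```

The point is the absence of custom glue: any agent framework that speaks this protocol can call the same tool, which is why shipping an MCP server matters for an "AI-ready" data platform.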

What is production-ready versus pending: The transaction closes Q3 2026, subject to regulatory approval. The Dremio platform itself is proven at enterprise scale, so this is not an early-stage bet. The integration work to embed Dremio fully into SAP Business Data Cloud will take time after close. Buyers should expect a meaningful roadmap presentation at SAP TechEd 2026 rather than a shipping product today.

Acquisition Two: Prior Labs and the Intelligence Layer

Prior Labs is a different kind of acquisition. This is not a product acquisition. It is a research acquisition with a specific thesis: Large Language Models are the wrong tool for structured business data prediction.

The argument is technical but intuitive. LLMs are trained on text. They have a rudimentary understanding of tables and statistics. When you ask an LLM to predict payment delays, supplier default risk, or customer churn from tabular ERP data, you are using a tool designed for language on a problem that is fundamentally about numbers, distributions, and correlations within structured datasets. Tabular Foundation Models (TFMs) are purpose-built for exactly this domain.

Prior Labs' flagship model series, TabPFN, has been academically validated: published in Nature, top-ranked on TabArena, the leading benchmark for tabular models, and downloaded more than three million times as an open-source tool. The most recent TabPFN-2.6 matches, instantly and in a single model, the accuracy of a four-hour automated machine learning pipeline. For enterprise use cases like predicting payment delays, supplier risks, upsell opportunity scoring, and churn probability, that is a meaningful capability gap versus what LLMs can offer today.
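TabPFN ships as an open-source Python package with a scikit-learn-style interface; a pretrained transformer does in-context learning at fit time, so there is no per-dataset training run. A sketch of that usage on synthetic churn data; the feature set here is invented for illustration, and the TabPFN call is guarded in case the package is not installed.

```python
# Hedged sketch of TabPFN's published scikit-learn-style interface.
# The churn features below are synthetic stand-ins, not SAP data.
import random

def make_churn_data(n=200, seed=7):
    """Generate a toy tabular churn dataset: numeric feature rows
    plus a 0/1 label loosely tied to tenure and support tickets."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        tenure = rng.uniform(0, 60)    # months as a customer
        tickets = rng.randint(0, 20)   # support tickets filed
        spend = rng.uniform(10, 500)   # monthly spend
        X.append([tenure, tickets, spend])
        y.append(1 if tenure < 12 and tickets > 5 else 0)
    return X, y

X, y = make_churn_data()
X_train, y_train = X[:150], y[:150]
X_test, y_test = X[150:], y[150:]

try:
    # Real package: `pip install tabpfn`. No AutoML pipeline, no
    # per-dataset training: the pretrained model adapts on `fit`.
    from tabpfn import TabPFNClassifier
    clf = TabPFNClassifier()
    clf.fit(X_train, y_train)
    churn_prob = clf.predict_proba(X_test)
except Exception:
    churn_prob = None  # tabpfn (or its weights) unavailable; sketch only
```

That fit-and-predict-in-seconds workflow, rather than a tuned pipeline per prediction problem, is what makes the comparison to a four-hour AutoML run meaningful.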

SAP will commit more than €1 billion over four years to scale Prior Labs into what it is calling a globally leading frontier AI lab for structured data, based in Europe. Prior Labs will operate as an independent entity and will retain its open-source strategy. The scientific advisory board will include Yann LeCun (ACM Turing Award winner, Advanced Machine Intelligence) and Bernhard Schoelkopf (Max Planck Institute for Intelligent Systems, ELLIS president). That is not a list assembled for press release credibility. Both are significant contributors to the field.

The European angle matters beyond headlines and marketing. Regulatory pressures on AI data handling, EU AI Act requirements for explainability, and data sovereignty concerns make a European-based frontier AI lab genuinely useful for SAP's core customer base. An AI model built and governed in Europe, on structured business data, with academic validation and an open-source foundation, addresses a set of enterprise objections that American hyperscaler AI labs cannot easily resolve.

What is production-ready versus pending: Prior Labs' existing models are available and proven at the research level. Their integration into SAP's commercial product stack (embedding TabPFN predictions into S/4HANA, CX, or SuccessFactors workflows) requires post-close engineering. The €1B investment is a four-year commitment, not a capability that appears in the next release cycle.

Why the Combination Matters: A Three-Layer Stack

Looking at these acquisitions together, SAP is assembling a three-layer stack for enterprise agentic AI:

Layer 1: Data access and governance: Dremio + SAP Business Data Cloud. Federated access to SAP and non-SAP data, open Iceberg-native foundation, AI-ready governance, no data movement required.

Layer 2: Business intelligence from structured data: Prior Labs' TFMs. Accurate predictions on tabular business data that LLMs cannot match; payment delays, supplier risk, churn, upsell scoring.

Layer 3: Agent orchestration: Joule + Business Technology Platform. SAP's existing AI agent and process automation layer, now able to draw on a governed data foundation and specialized prediction capabilities.

The coherence here is unusual. Most enterprise AI announcements from large vendors are additive: another LLM integration, another AI assistant feature, another partnership with an LLM provider. This one is different. SAP is making a structural argument about what enterprise AI requires (not just a better model, but the right kind of model running on clean, federated, business-context-aware data) and is acquiring the capabilities to back it up.

The Partnership Question: Databricks and Snowflake

This is where the Dremio acquisition gets complicated, and SAP customers with existing data platform investments need to pay close attention. In the 15 months before today's announcement, SAP built two high-profile partnerships around exactly the problem Dremio solves. Both are now in a different position than they were yesterday.

The Databricks situation

In February 2025, SAP launched SAP Databricks as a first-party service natively embedded in SAP Business Data Cloud. This is not a partner integration, but a component of BDC itself, paid via BDC Capacity Units. Databricks committed $250 million to support customer and SI success on the joint platform. It was positioned as the flagship data engineering and AI layer for BDC.

Dremio competes directly with this story. Where SAP Databricks brought data engineering, ML workloads, and Unity Catalog-governed analytics into BDC, Dremio brings an Iceberg-native lakehouse, federated query across SAP and non-SAP data, and an AI semantic layer. These capabilities overlap substantially.

The underlying table format tension makes this sharper. Databricks' platform is built primarily on Delta Lake, with Iceberg support added later. Dremio co-created Apache Iceberg's catalog standard, Apache Polaris, and is Iceberg-native by design. SAP's stated direction is to make Business Data Cloud Iceberg-native. If that commitment holds, SAP's own first-party data architecture moves toward the format Dremio built, and away from the format Databricks built on.

SAP has not addressed what happens to SAP Databricks after the Dremio close. The most plausible near-term outcome is that Databricks retains a role as the ML and AI model development workbench within BDC (Unity Catalog, model training, MLflow, Mosaic AI), while Dremio becomes the lakehouse storage and federated query layer. That is a narrower role than Databricks was sold as in February 2025, and it represents a meaningful change in the commercial argument for SAP Databricks. Customers who made BDC decisions partly on the strength of that integration should ask explicitly, in writing, what the integration roadmap looks like after the Dremio transaction closes. Remember Hybris? Or SAP Marketing Cloud?

The Snowflake situation

Snowflake's position was structurally weaker even before today. When SAP and Snowflake announced their partnership in November 2025, the difference in status was noted immediately: SAP Databricks was a first-party service inside BDC, while SAP Snowflake was a solution extension, an add-on. SAP's data and analytics leadership confirmed the distinction publicly. Snowflake was the third major data platform partnership SAP announced that year, after Databricks and BigQuery.

The Dremio acquisition makes Snowflake's position harder to defend commercially. Snowflake has bet heavily on Apache Iceberg adoption; it is one of the company's core strategic moves to remain relevant as the industry converges on open table formats. This Iceberg compatibility means SAP and Snowflake can still interoperate through an Iceberg-native BDC. But compatibility is not the same as commercial necessity.

The core proposition of the SAP-Snowflake partnership was: bring your existing Snowflake deployment, connect it to SAP Business Data Cloud via zero-copy BDC Connect, and get federated access to semantically rich SAP data without moving it. Dremio offers the same federation story natively, from inside BDC, without a separate Snowflake contract. For customers now deciding between SAP BDC + Snowflake and SAP BDC + Dremio as their non-SAP data federation layer, the math has changed.


There is also a timing question. The SAP-Snowflake BDC Connect integration was planned for H1 2026 general availability. Now that SAP has announced a native Dremio-based federation layer, watch whether that H1 2026 timeline holds or slips, and how the roadmap changes. I expect SAP's engineering prioritization to change.

What buyers with existing partner investments should do

Three hot questions for anyone already using or evaluating these partnerships:

Will Databricks' role inside BDC narrow from "first-party lakehouse service" to, e.g., "ML and model workbench"? If so, the ROI case for SAP Databricks changes, and customers who bought it as a data platform should model what Dremio covers versus what Databricks still delivers uniquely. Get that scoping conversation started now, before the Dremio transaction closes and SAP has to commit to a position.

Does the SAP-Snowflake BDC Connect rollout and ongoing development proceed on the original timeline? If SAP's native federation story via Dremio reduces internal urgency to complete the Snowflake integration, customers who built roadmaps around the Snowflake partnership will need to know. Ask immediately.

For both Databricks and Snowflake, the more important question is not whether the partnerships survive — they almost certainly will in some shape or form — but whether they survive with the same scope. SAP has a track record of gradually narrowing partner integrations once it builds equivalent native capability. Both Databricks and Snowflake should be scenario-planning for that conversation. And SAP customers should avoid being caught in the middle of it.

How This Changes the Competitive Map

None of SAP's major competitors is positioned identically. Each has a different set of strengths and gaps.

Microsoft has Microsoft Fabric as its unified data platform and Copilot as its AI layer. Fabric is a credible, well-funded platform, but it is Azure-native and more proprietary than an Iceberg-first architecture. Microsoft does not have a tabular foundation model research capability. The company's AI investments are primarily in OpenAI-based large language models, which face the limitations on structured data prediction that SAP's CTO named.

Salesforce has Data Cloud as its data unification layer and Einstein as its AI layer. Data Cloud is impressive for CRM-centric use cases but requires other data to be ingested into Salesforce's ecosystem. It is not a federated architecture. Salesforce's AI investments are focused on its customer data estate, not the broader structured data prediction problem. Compared to SAP's core industrial, manufacturing, and financial services customer base, that is a narrower scope.

Oracle is actually the most interesting comparison. Oracle has full-stack control (SaaS, PaaS, IaaS) and deep integration across Fusion applications and OCI. Like SAP, Oracle has a large base of structured operational data. But Oracle has not made a comparable move on the data layer (no Iceberg-native federated architecture commitment) or on the model layer (no TFM research acquisition). Oracle's agentic AI story is about embedding agents into Fusion workflows, not about predicting outcomes from structured data with specialized models.

Snowflake is worth watching as an indirect competitor. Dremio and Snowflake serve overlapping markets. Both are open-data lakehouse platforms. Snowflake has its own Iceberg support and Cortex AI layer. The SAP acquisition gives Dremio the enterprise ERP context and customer base that Snowflake competes for separately. Combined with SAP's process knowledge, the Dremio + SAP Business Data Cloud stack could become a strong alternative for Snowflake's enterprise analytics customers, particularly those running SAP.

The Open-Source Angle

Both acquisitions come with an explicit open-source commitment from SAP. That is not accidental and it is worth taking seriously.

Prior Labs' TabPFN has more than three million downloads and academic validation. SAP's stated intention to preserve the open-source strategy means the research community continues to develop and validate these models, which benefits SAP's commercial implementation. Dremio co-created Apache Polaris, Iceberg's open catalog standard, and Apache Arrow, the in-memory columnar format. SAP's commitment to continue contributing to these projects matters for interoperability. It is what makes the "no vendor lock-in" claim credible.

For enterprise buyers, the open-standards story addresses a real concern: if you rebuild your data architecture around a vendor's AI platform, what happens in five years if you need to change vendors? An Iceberg-native, Apache Polaris-cataloged data estate is portable in a way that a proprietary format is not. SAP is betting that customers who build on open standards will stay, not that they have to stay.

What Buyers and SAP Customers Should Do

If you are an existing SAP customer: The most near-term action item is understanding what the Dremio acquisition means for your data estate. Specifically: how much of your business data lives outside SAP today? If the answer is "quite a lot", which it is for nearly every organization, then the Iceberg-native Business Data Cloud architecture becomes highly relevant to your AI readiness. Get on the roadmap conversation with your SAP account team now, before these capabilities are generally available.

If you are evaluating ERP vendors: SAP's TFM bet is a meaningful differentiator in the 2027-2028 timeframe, not immediately. But the evaluation question is important today: ask any ERP vendor you are considering what their structured data prediction story is beyond LLM-based AI. The question will either reveal a thoughtful answer or reveal that they haven't thought about it.

If you are not an SAP customer: Do not dismiss Dremio's trajectory because it is now inside SAP. The open standards commitment means Dremio-based data architectures remain viable in non-SAP contexts. Watch whether SAP's integration approach over the next 18 months preserves that independence or quietly narrows the platform to favor SAP data sources.

If you are a data platform decision-maker: The Dremio acquisition raises the strategic importance of your Iceberg adoption timeline. Organizations that have already standardized on Apache Iceberg as their table format will find the integration into Business Data Cloud straightforward. Organizations still on proprietary formats (Delta Lake, Hive, legacy warehouses) should factor this development into their modernization roadmap.

On timing: Both transactions are pending regulatory approval and are expected to close in Q3 2026. Neither capability is available in production today. Do not let this announcement accelerate or delay operational decisions that need to be made in the next quarter. Use the time to build internal alignment on data readiness strategy.

MyPoV

SAP has spent years getting credit for being the system of record for the world's largest enterprises, and criticism for being difficult to integrate with everything outside its own walls. These two acquisitions are a direct response to that criticism.

The Dremio play says: we will be the data layer for your entire estate, not just your SAP estate. The Prior Labs play says: we will build the AI that actually understands the data that runs your business, not just the AI that talks about it.

Neither of these is guaranteed to work. Post-acquisition integration is where most ambitious platform strategies stall. The open-source commitments are credible today but have to be proven under commercial pressure. The regulatory timelines are a fact of life. And the €1 billion Prior Labs investment is a four-year commitment in an AI research environment that moves faster than any multi-year plan can anticipate.

But directionally, this is the right problem statement and a good approach to solving it. The data readiness problem is real. The LLM-for-everything assumption is flawed for structured business data. And the open-standards position is defensible in a way that proprietary data platforms increasingly are not.

The question SAP customers should be asking isn't whether these acquisitions make sense. They do. The question is how fast SAP can execute the integration without losing what makes both companies valuable: the engineering credibility of an independent data platform and the academic rigor of a research-first AI lab.

That execution question will be answered over the next 24 months. Start watching now. 
