Databricks Data + AI Summit ‘25: what to expect?
Key innovations unveiled: our expert takeaways from the Databricks Data + AI Summit 2025
The Databricks Data + AI Summit 2025 was nothing short of revolutionary for the Data & AI landscape. With over 22 000 attendees in San Francisco, Databricks is no longer just about the Lakehouse; it is positioning itself as the world’s most complete Data Intelligence Platform. They kicked things off with a bold opening move: launching the Databricks Free Edition. This forever-free tier invites individual learners, students, startups, and small teams to explore its powerful tools.
Last year, we wrote a summary of the Data + AI Summit ‘24. Let’s take a look at this year’s announcements and, maybe more importantly, how they can benefit your organization.
Lakebase: OLTP Meets the Lakehouse
Credits: Databricks
Traditionally, organizations have relied on transactional databases like PostgreSQL, MySQL, or SQL Server to power real-time applications. These databases are optimized for fast, small, row-level operations that support business-critical applications. Think about order processing, inventory management, or customer interaction platforms.
Organizations focusing on big data applications use analytical engines such as data warehouses or lakehouses. They are ideal for crunching massive datasets, running business intelligence, or training machine learning models.
Until now, combining both types of workloads meant maintaining separate infrastructure and complex ETL pipelines to move data between systems. That’s where Lakebase changes the game. Lakebase brings OLTP capabilities directly into Databricks, unifying the traditionally separate worlds of transactional and analytical data. This means that you can build applications, like internal tools, customer dashboards, or even AI agents, that write to and read from the same governed, performant platform as your analytics and AI models.
Why is this significant? Because it removes delays, simplifies architecture, and ensures your operational data is always analytics-ready. No more duplicated data, no more stale dashboards, and no more worrying about syncing between systems. And because it’s Postgres-based, developers can start using it with the tools and queries they already know, lowering the barrier for adoption across engineering teams.
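Because Lakebase speaks the Postgres wire protocol, a standard Postgres client is all you need to get started. Below is a minimal sketch in Python; the hostname, database name, table, and credentials are placeholders for whatever your Lakebase instance exposes.

```python
import psycopg2  # standard Postgres driver; Lakebase is Postgres-compatible

# Hypothetical connection details; replace with your Lakebase instance's
# endpoint and a token or password from your Databricks workspace.
conn = psycopg2.connect(
    host="your-lakebase-instance.cloud.databricks.com",
    port=5432,
    dbname="orders_db",
    user="your_user",
    password="your_token",
    sslmode="require",
)

with conn, conn.cursor() as cur:
    # Transactional write: record a new order the moment it happens.
    cur.execute(
        "INSERT INTO orders (customer_id, amount) VALUES (%s, %s)",
        (1042, 99.50),
    )
    # Row-level read: fetch that customer's most recent orders.
    cur.execute(
        "SELECT id, amount, created_at FROM orders "
        "WHERE customer_id = %s ORDER BY created_at DESC LIMIT 5",
        (1042,),
    )
    for row in cur.fetchall():
        print(row)
```

The same table is immediately available for analytics on the platform, which is exactly the point: no export step, no sync job.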
Databricks Apps + Lakebase: a dream combo?
If you’re building data-powered applications, the new pairing of Databricks Apps with Lakebase might just be the match made in AI heaven. With the general availability of Databricks Apps, organizations can now build fully fledged data applications directly within the Databricks platform. These apps are no longer just visualizations or static reports; they are dynamic, interactive, and securely integrated with your data, governance, and user permissions.
The game-changer is how these apps work in tandem with Lakebase, the service we discussed previously. While Lakebase provides the transactional backbone, Databricks Apps become the interface layer where users interact with that data. Imagine building an internal supply chain portal, an AI-driven customer support interface, or a real-time inventory dashboard, all within the same governed platform where your analytics and machine learning also live. This combo solves a long-standing enterprise problem: the disconnect between operational systems and analytical insights.
With Databricks Apps + Lakebase, it is possible to create responsive, AI-enhanced tools that read from and write to the same source of truth, without needing to export data, manage separate security layers, or maintain brittle integration pipelines. This is how real-time, intelligent business tools are built.
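As a minimal sketch, assuming a Streamlit-based Databricks App and a hypothetical `inventory` table living in Lakebase (connection details are placeholders), the interface layer and the transactional backbone can meet in a few dozen lines:

```python
# app.py - a minimal Streamlit sketch of a Databricks App backed by Lakebase.
# Table name, columns, and connection settings are hypothetical.
import psycopg2
import streamlit as st

def get_connection():
    # In a real app you would pull these from app resources or secrets.
    return psycopg2.connect(
        host="your-lakebase-instance.cloud.databricks.com",
        dbname="inventory_db",
        user="app_service_principal",
        password="your_token",
        sslmode="require",
    )

st.title("Inventory portal")

with get_connection() as conn, conn.cursor() as cur:
    # Read: current stock levels straight from the transactional store.
    cur.execute("SELECT sku, name, stock FROM inventory ORDER BY stock ASC LIMIT 20")
    st.table(cur.fetchall())

    # Write: let users adjust stock from the same governed platform.
    sku = st.text_input("SKU to restock")
    qty = st.number_input("Quantity", min_value=1, step=1)
    if st.button("Restock") and sku:
        cur.execute("UPDATE inventory SET stock = stock + %s WHERE sku = %s", (qty, sku))
        st.success(f"Added {qty} units to {sku}")
```

Because the app and the analytics both sit on the same platform, the same Unity Catalog permissions govern who can see and change what.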
Lakeflow Designer: ETL, Visually Reimagined
Last year, Lakeflow was introduced to ingest data with Databricks-native components. This year, Databricks pushed it even further: Lakeflow is now generally available, and Lakeflow Designer was announced in preview, a visual, no-code interface for building data pipelines with drag-and-drop and natural language.
It’s designed for business users and data analysts who want to wrangle data without needing to write Spark code. By empowering more people to manage their own data flows, organizations can accelerate their projects and reduce dependence on central data engineering teams.
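Under the hood, the Designer builds on the same Lakeflow pipelines that engineers can still express in code. Below is a minimal sketch using the declarative pipelines Python API (the `dlt` module, formerly Delta Live Tables); the storage path and table names are hypothetical, and `spark` is the session provided by the pipeline runtime.

```python
import dlt  # Lakeflow declarative pipelines API (formerly Delta Live Tables)
from pyspark.sql import functions as F

@dlt.table(comment="Raw orders ingested from cloud storage (hypothetical path).")
def raw_orders():
    return (
        spark.readStream.format("cloudFiles")      # Auto Loader ingestion
        .option("cloudFiles.format", "json")
        .load("/Volumes/main/sales/raw_orders/")
    )

@dlt.table(comment="Cleaned orders with a basic data quality expectation.")
@dlt.expect_or_drop("valid_amount", "amount > 0")  # drop rows that fail the check
def clean_orders():
    return (
        dlt.read_stream("raw_orders")
        .withColumn("order_date", F.to_date("order_ts"))
        .select("order_id", "customer_id", "amount", "order_date")
    )
```

Whether a pipeline starts life in the Designer or in code, it runs on the same engine and is governed the same way.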
Smarter AI Agents with Agent Bricks
Another exciting launch this year was Agent Bricks, a new tool that enables organizations to create fully functional AI agents by simply describing their needs in natural language. No more stitching together APIs and prompt logic from scratch. Just plug in your data and let Databricks do the heavy lifting.
What’s more, Agent Bricks works hand-in-hand with AI/BI Genie, which has now reached general availability. Genie lets business users ask questions in natural language and get instant answers from curated datasets within your business. Used inside Agent Bricks, Genie becomes even more valuable. You start by asking Genie a question in natural language; once the relevant data has been retrieved, another agent can reason on top of Genie’s results. In essence, you simply describe an analytical task, and Agent Bricks generates an optimized workflow with one or more agents tailored for that purpose. These agents can contextualize, interpret, or even recommend next steps based on the output of a previous agent within the workflow.
What makes this truly impactful is how naturally the workflow unfolds. Business users don’t need to write code or toggle between platforms. They start with a question, receive structured data, and immediately follow up with deeper insights. Because both Genie and Agent Bricks are built atop the same foundational architecture, the entire process is tracked, versioned, and governed via MLflow and Unity Catalog.
This layered approach transforms the role of AI in business from simply answering queries to providing contextual intelligence. With the growing catalog of available agent templates, teams can mix and match capabilities to orchestrate multi-agent workflows that solve more complex challenges.
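Agent Bricks itself is configured through the UI in natural language, but the resulting agents are served like other Databricks Model Serving endpoints, so applications can call them programmatically. Below is a hedged sketch of such a call over REST; the workspace URL, endpoint name, and payload schema are assumptions that depend on how the agent was built and deployed.

```python
import os
import requests

# Hypothetical workspace URL and endpoint name; adapt to your deployment.
WORKSPACE_URL = "https://your-workspace.cloud.databricks.com"
ENDPOINT = "sales-insights-agent"

response = requests.post(
    f"{WORKSPACE_URL}/serving-endpoints/{ENDPOINT}/invocations",
    headers={"Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}"},
    json={
        # Chat-style payload; the exact schema depends on the agent's definition.
        "messages": [
            {
                "role": "user",
                "content": "Which region had the largest drop in sales last quarter, and why?",
            }
        ]
    },
    timeout=60,
)
response.raise_for_status()
print(response.json())
```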
MLflow 3.0: A New Era for AI Lifecycle Management
MLflow has long been the go-to tool for managing machine learning workflows. With version 3.0, it fully embraces the generative AI era.
One of the most notable enhancements is agent observability. MLflow 3.0 lets you track prompt versions, agent actions, outputs, and user interactions, which is critical for debugging and refining complex, dynamic systems. When your AI assistant generates unexpected results, you can trace back exactly what happened, and why.
In addition, MLflow 3.0 introduces a Prompt Registry, a new feature that allows teams to version and manage prompt templates just like code or models. This is essential for companies deploying large language models, where small tweaks in phrasing can produce markedly different outcomes. With prompt tracking, you can safely test and promote different agent behaviors in a controlled way.
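As a minimal illustration of what tracing looks like in code, the sketch below wraps MLflow’s tracing decorator around a stand-in retrieval step and answer step; the logic itself is made up.

```python
import mlflow

@mlflow.trace  # records inputs, outputs, and latency for this call as a trace span
def retrieve_context(question: str) -> list[str]:
    # Stand-in for a real retrieval step (vector search, SQL, ...).
    return ["Q2 revenue grew 12% in EMEA", "APAC was flat quarter over quarter"]

@mlflow.trace
def answer_question(question: str) -> str:
    docs = retrieve_context(question)
    # Stand-in for an LLM call; in practice the model is invoked here.
    return f"Based on {len(docs)} documents: EMEA drove most of the growth."

# Every call now produces a trace you can inspect in the MLflow UI,
# including the nested retrieve_context span.
print(answer_question("What drove revenue growth last quarter?"))
```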
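As a hedged illustration, registering and loading a versioned prompt could look like the sketch below; the exact entry points (shown here as mlflow.genai.register_prompt and load_prompt) and their arguments are assumptions that may vary between MLflow releases.

```python
import mlflow

# Register a prompt template under a version-controlled name.
# NOTE: the module path and argument names are assumed from the prompt
# registry documentation and may differ in your MLflow version.
mlflow.genai.register_prompt(
    name="support_summary",
    template=(
        "Summarize the following customer ticket in two sentences, "
        "then classify its urgency as low, medium, or high:\n\n{{ticket_text}}"
    ),
    commit_message="Initial version of the support summarization prompt",
)

# Later (or in another job), load a pinned version so behavior is reproducible.
prompt = mlflow.genai.load_prompt("prompts:/support_summary/1")
rendered = prompt.format(ticket_text="My invoice was charged twice this month.")
print(rendered)
```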
MLflow 3.0 also supports evaluation workflows for generative AI, including tools to automatically compare outputs across prompt or model changes using metrics like relevance, helpfulness, or even hallucination detection. These are not just add-ons but vital features for companies that want to scale AI without sacrificing reliability or safety.
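A minimal sketch of such an evaluation over a tiny, made-up dataset, assuming the mlflow.evaluate entry point with question-answering defaults; judged metrics like relevance or faithfulness would be added on top of this.

```python
import mlflow
import pandas as pd

# A tiny, made-up evaluation set: questions, reference answers, and the
# answers produced by the current prompt/model combination.
eval_df = pd.DataFrame(
    {
        "inputs": ["What was EMEA revenue growth in Q2?"],
        "ground_truth": ["EMEA revenue grew 12% in Q2."],
        "answer": ["Revenue in EMEA grew by 12% during Q2."],
    }
)

with mlflow.start_run(run_name="prompt_v2_eval"):
    # Evaluate the static predictions; question-answering defaults include
    # metrics such as exact match. LLM-judged metrics (relevance,
    # faithfulness, ...) can be supplied via extra_metrics.
    results = mlflow.evaluate(
        data=eval_df,
        targets="ground_truth",
        predictions="answer",
        model_type="question-answering",
    )
    print(results.metrics)
```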
Crucially, all of these components are deeply integrated into the Databricks ecosystem. That means every agent built with Agent Bricks, every model trained on the lakehouse, and every prompt variant tested is tracked, versioned, and governed.
SQL Gets an AI Superpower
With the launch of Apache Spark 4.0, Databricks has re-engineered its analytics engine for a new era, one where performance, flexibility, and AI-powered insights are seamlessly integrated into the everyday analyst toolkit.
At the heart of Spark 4.0 is a series of enhancements that boost both speed and usability. Among the most anticipated is support for Real-Time Mode, enabling sub-second query responses for streaming data, which drastically shortens the gap between data ingestion and insights. There’s also the introduction of the Variant data type, which improves how semi-structured data, like nested JSON, can be stored and queried without requiring rigid schema enforcement. Alongside these, Spark 4.0 adopts ANSI SQL compliance by default, providing greater consistency and reliability in how queries are parsed and executed across environments.
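As a small illustration of the Variant type, here is a sketch assuming a Spark 4.0 session; the JSON payloads are made up, and in a Databricks notebook `spark` is already available.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("variant-demo").getOrCreate()

# Store semi-structured events as VARIANT instead of forcing a rigid schema.
spark.sql("""
    CREATE OR REPLACE TEMP VIEW raw_events AS
    SELECT parse_json(payload) AS event
    FROM VALUES
        ('{"type": "click", "user": {"id": 42, "country": "BE"}}'),
        ('{"type": "purchase", "user": {"id": 7}, "amount": 19.99}')
        AS t(payload)
""")

# Query nested fields directly; try_variant_get returns NULL when a field
# (like "amount" on click events) is absent instead of failing the query.
spark.sql("""
    SELECT
        variant_get(event, '$.type', 'string')        AS event_type,
        variant_get(event, '$.user.id', 'int')        AS user_id,
        try_variant_get(event, '$.amount', 'double')  AS amount
    FROM raw_events
""").show()
```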
But perhaps the most transformative addition is the expanded support for user-defined functions (UDFs) and SQL pipe syntax. Analysts can now chain complex logic together more naturally in SQL, while data engineers can write modular, reusable logic that behaves predictably at scale.
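SQL pipe syntax is easiest to appreciate in an example. The sketch below assumes a Spark 4.0 session (as in the previous snippet) and a hypothetical `orders` table; each `|>` step operates on the result of the previous one, so the query reads top to bottom instead of inside out.

```python
# `spark` is the active SparkSession (predefined in Databricks notebooks).
spark.sql("""
    FROM orders
    |> WHERE order_date >= DATE'2025-01-01'
    |> AGGREGATE SUM(amount) AS total_amount, COUNT(*) AS order_count
       GROUP BY region
    |> WHERE order_count >= 100          -- acts like HAVING after the aggregate
    |> ORDER BY total_amount DESC
    |> LIMIT 10
""").show()
```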
In addition, Databricks introduced AI Functions in SQL: a set of pre-built capabilities that allow users to invoke large language models and computer vision tasks directly within their queries, all with simple SQL statements.
Credits: Databricks
This means a single SQL query can now do the work of an entire machine learning pipeline, making it easier for analysts and business users to access AI without needing a background in data science. Imagine extracting data from PDFs or running sentiment analysis on customer reviews, all directly from your warehouse queries.
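For example, here is a hedged sketch of sentiment analysis and summarization over a hypothetical `customer_reviews` table, using the ai_analyze_sentiment and ai_summarize functions; it assumes compute where AI Functions are enabled.

```python
# `spark` is the active SparkSession; the table and columns are hypothetical.
spark.sql("""
    SELECT
        review_id,
        review_text,
        ai_analyze_sentiment(review_text) AS sentiment,
        ai_summarize(review_text, 20)     AS short_summary
    FROM customer_reviews
    LIMIT 100
""").show(truncate=False)
```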
Unity Catalog on Steroids
Last year, Databricks made Unity Catalog open source, signaling its ambition to become the industry standard for data governance. This year, they rolled out several new capabilities that push Unity Catalog further toward being the most complete catalog in the industry.
One of the standout additions is Unity Catalog Discover, a curated internal data marketplace that helps business users easily navigate the vast world of enterprise data. By organizing datasets by business domain, surfacing data quality signals, and integrating steward insights, Discover turns the often-overwhelming task of finding the right data into a guided, trust-based experience.
Another major feature introduced is Metric Views. These offer a centralized, governed way to define, manage, and reuse business-critical metrics across reports, dashboards, and apps. They ensure that everyone in the organization is aligned around consistent definitions and calculations, solving the classic “single source of truth” problem.
But perhaps the most strategic advancement is Unity Catalog’s native support for Apache Iceberg. Iceberg is rapidly becoming the format of choice for modern data lakes, thanks to its support for large-scale, high-concurrency, and ACID-compliant workloads. By integrating Iceberg directly into Unity Catalog, Databricks enables organizations to manage their multi-cloud, multi-engine data ecosystems from one central catalog. In practice, this means that teams can now catalog, query, and govern Iceberg tables not just within Databricks, but also across other engines like Snowflake, Trino, Presto, or even AWS Athena.
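As a sketch of what this interoperability can look like in practice, an external engine or script can read a Unity Catalog-managed Iceberg table through the Iceberg REST catalog interface. In the snippet below, the workspace URL, catalog, and table names are placeholders, and the exact endpoint path should be checked against your workspace’s documentation.

```python
# Reading a Unity Catalog-managed Iceberg table from outside Databricks
# via the Iceberg REST catalog API, using the pyiceberg client.
from pyiceberg.catalog import load_catalog

catalog = load_catalog(
    "unity",
    **{
        "type": "rest",
        "uri": "https://your-workspace.cloud.databricks.com/api/2.1/unity-catalog/iceberg",
        "token": "your-databricks-token",
        "warehouse": "main",  # the Unity Catalog catalog name
    },
)

table = catalog.load_table("sales.orders")   # schema.table within that catalog
print(table.scan(limit=10).to_pandas())      # same governed data, different engine
```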
For organizations struggling with data silos, this makes high-value data easier to find, understand, and use responsibly. Unity Catalog becomes the go-to place where business users can dig into data without depending on a team of data specialists.
Credits: Databricks
Conversational Interfaces with Databricks One
Finally, Databricks One and AI/BI Genie are bringing the power of natural language to dashboards and reports. Instead of writing queries, users can simply ask questions, and the system will generate answers, visualizations, or actions.
This opens data insights to a much broader audience inside organizations, helping drive decisions without bottlenecks or training.
Wrap-up
The 2025 Summit marked a major turning point in Databricks’ journey: from Data Lakehouse provider to full AI-native platform. Each new feature, whether it’s no-code ETL, serverless GPU compute, or smarter AI agents, is designed to reduce complexity, increase access, and speed up innovation.
For businesses, this means shorter development cycles, lower costs, and more reliable AI systems.
Let us know what features you’re most excited about, and how you’re planning to use them in your data and AI journey.
Author: Lander Meeuws
Lander is a data scientist with a strong foundation in AI, cloud, and data engineering. With experience in both consulting and startups, he thrives on building end-to-end data solutions that connect infrastructure with intelligence. Outside of work, you’ll find him in the mountains or on a bouldering wall, always seeking his next challenge.
Curious what Plainsight could mean for you?
Read more “About us”
Consider “your career at Plainsight 🚀”
Any questions? “Contact us”