Data Engineering Podcast

This show goes behind the scenes of the tools, techniques, and difficulties associated with the discipline of data engineering. Databases, workflows, automation, and data manipulation are just some of the topics that you will find here.

Episodes

Enabling Agents In The Enterprise With A Platform Approach

June 29, 2025 • 54 mins

Summary
In this episode of the Data Engineering Podcast, Arun Joseph talks about developing and implementing agent platforms to empower businesses with agentic capabilities. From leading AI engineering at Deutsche Telekom to his current entrepreneurial venture focused on multi-agent systems, Arun shares insights on building agentic systems at organizational scale, highlighting the importance of robust models, data connectivity, an...

Dagster's New Era: Modularizing Data Transformation in the Age of AI

June 17, 2025 • 61 mins

Summary
In this episode of the Data Engineering Podcast, we welcome back Nick Schrock, CTO and founder of Dagster Labs, to discuss the evolving landscape of data engineering in the age of AI. As AI begins to impact data platforms and the role of data engineers, Nick shares his insights on how it will ultimately enhance productivity and expand software engineering's scope. He delves into the current state of AI adoption, the importanc...

AI and the Lakehouse: How Starburst is Pioneering New Workflows

June 10, 2025 • 44 mins

Summary
In this episode of the Data Engineering Podcast, Alex Albu, tech lead for AI initiatives at Starburst, talks about integrating AI workloads with the lakehouse architecture. From his software engineering roots to leading data engineering efforts, Alex shares insights on enhancing Starburst's platform to support AI applications, including an AI agent for data exploration and using AI for metadata enrichment and workload optimiz...

Amazon S3: The Backbone of Modern Data Systems

June 2, 2025 • 61 mins

Summary
In this episode of the Data Engineering Podcast, Mai-Lan Tomsen Bukovec, Vice President of Technology at AWS, talks about the evolution of Amazon S3 and its profound impact on data architecture. From her work on compute systems to leading the development and operations of S3, Mai-Lan shares insights on how S3 has become a foundational element in modern data systems, enabling scalable and cost-effective data lakes since its laun...

Scaling Data Operations With Platform Engineering

May 28, 2025 • 42 mins

Summary
In this episode of the Data Engineering Podcast, Chakravarthy Kotaru talks about scaling data operations through standardized platform offerings. From his roots as an Oracle developer to leading the data platform at a major online travel company, Chakravarthy shares insights on managing diverse database technologies and providing databases as a service to streamline operations. He explains how his team has transitioned from D...

From Data Discovery to AI: The Evolution of Semantic Layers

May 21, 2025 • 49 mins

Summary
In this episode of the Data Engineering Podcast, host Tobias Macey welcomes back Shinji Kim to discuss the evolving role of semantic layers in the era of AI. As they explore the challenges of managing vast data ecosystems and providing context to data users, they delve into the significance of semantic layers for AI applications. They also discuss the nuances of semantic modeling, the impact of AI on data accessibility, and the ...

Balancing Off-the-Shelf and Custom Solutions in Data Engineering

May 13, 2025 • 46 mins

Summary
In this episode of the Data Engineering Podcast, Tulika Bhatt, a senior software engineer at Netflix, talks about her experiences with large-scale data processing and the future of data engineering technologies. Tulika shares her journey into the data engineering field, discussing her work at BlackRock and Verizon before joining Netflix, and explains the challenges and innovations involved in managing Netflix's impression dat...

StarRocks: Bridging Lakehouse and OLAP for High-Performance Analytics

May 4, 2025 • 59 mins

Summary
In this episode of the Data Engineering Podcast, Sida Shen, product manager at CelerData, talks about StarRocks, a high-performance analytical database. Sida discusses the inception of StarRocks, which was forked from Apache Doris in 2020 and evolved into a high-performance lakehouse query engine. He explains the architectural design of StarRocks, highlighting its capabilities in handling high concurrency and low latency quer...

Exploring NATS: A Multi-Paradigm Connectivity Layer for Distributed Applications

April 27, 2025 • 72 mins

Summary
In this episode of the Data Engineering Podcast, Derek Collison, creator of NATS and CEO of Synadia, talks about the evolution and capabilities of NATS as a multi-paradigm connectivity layer for distributed applications. Derek discusses the challenges and solutions in building distributed systems, and highlights the unique features of NATS that differentiate it from other messaging systems. He delves into the architectural de...

Advanced Lakehouse Management With The LakeKeeper Iceberg REST Catalog

April 20, 2025 • 57 mins

Summary
In this episode of the Data Engineering Podcast, Viktor Kessler, co-founder of Vakamo, talks about the architectural patterns in the lakehouse enabled by a fast and feature-rich Iceberg catalog. Viktor shares his journey from data warehouses to developing the open-source project Lakekeeper, an Apache Iceberg REST catalog written in Rust that facilitates building lakehouses with essential components like storage, compute, an...

Simplifying Data Pipelines with Durable Execution

April 12, 2025 • 39 mins

Summary
In this episode of the Data Engineering Podcast, Jeremy Edberg, CEO of DBOS, talks about durable execution and its impact on designing and implementing business logic for data systems. Jeremy explains how DBOS's serverless platform and orchestrator provide local resilience and reduce operational overhead, ensuring exactly-once execution in distributed systems through the use of the Transact library. He discusses the importance of v...

Overcoming Redis Limitations: The Dragonfly DB Approach

March 30, 2025 • 43 mins

Summary
In this episode of the Data Engineering Podcast, Roman Gershman, CTO and founder of Dragonfly DB, explores the development and impact of high-speed in-memory databases. Roman shares his experience creating a more efficient alternative to Redis, focusing on performance gains, scalability, and cost efficiency, while addressing Redis's limitations in high-throughput, low-latency scenarios. He explains how Dragonfly DB solves ope...

Bringing AI Into The Inner Loop of Data Engineering With Ascend

March 23, 2025 • 52 mins

Summary
In this episode of the Data Engineering Podcast, Sean Knapp, CEO of Ascend.io, explores the intersection of AI and data engineering. He discusses the evolution of data engineering and the role of AI in automating processes, alleviating burdens on data engineers, and enabling them to focus on complex tasks and innovation. The conversation covers the challenges and opportunities presented by AI, including the need for intellige...

Astronomer's Role in the Airflow Ecosystem: A Deep Dive with Pete DeJoy

March 16, 2025 • 51 mins

Summary
In this episode of the Data Engineering Podcast, Pete DeJoy, co-founder and product lead at Astronomer, talks about building and managing Airflow pipelines on Astronomer and the upcoming improvements in Airflow 3. Pete shares his journey into data engineering, discusses Astronomer's contributions to the Airflow project, and highlights the critical role of Airflow in powering operational data products. He covers the evolution ...

Accelerated Computing in Modern Data Centers With Datapelago

March 8, 2025 • 55 mins

Summary
In this episode of the Data Engineering Podcast, Rajan Goyal, CEO and co-founder of Datapelago, talks about improving efficiencies in data processing by reimagining system architecture. Rajan explains the shift from hyperconverged to disaggregated and composable infrastructure, highlighting the importance of accelerated computing in modern data centers. He discusses the evolution from proprietary to open, composable stacks, e...

The Future of Data Engineering: AI, LLMs, and Automation

February 25, 2025 • 59 mins

Summary
In this episode of the Data Engineering Podcast, Gleb Mezhanskiy, CEO and co-founder of Datafold, talks about the intersection of AI and data engineering. He discusses the challenges and opportunities of integrating AI into data engineering, particularly using large language models (LLMs) to enhance productivity and reduce manual toil. The conversation covers the potential of AI to transform data engineering tasks, such as te...

Evolving Responsibilities in AI Data Management

February 16, 2025 • 38 mins

Summary
In this episode of the Data Engineering Podcast, Bartosz Mikulski talks about preparing data for AI applications. Bartosz shares his journey from data engineering to MLOps and emphasizes the importance of data testing over software development in AI contexts. He discusses the types of data assets required for AI applications, including extensive test datasets, especially in generative AI, and explains the differences in data ...

CSVs Will Never Die And OneSchema Is Counting On It

January 12, 2025 • 54 mins

Summary
In this episode of the Data Engineering Podcast, Andrew Luo, CEO of OneSchema, talks about handling CSV data in business operations. Andrew shares his background in data engineering and CRM migration, which led to the creation of OneSchema, a platform designed to automate CSV imports and improve data validation processes. He discusses the challenges of working with CSVs, including inconsistent type representation, lack of sch...

Breaking Down Data Silos: AI and ML in Master Data Management

January 2, 2025 • 57 mins

Summary
In this episode of the Data Engineering Podcast, Dan Bruckner, co-founder and CTO of Tamr, talks about the application of machine learning (ML) and artificial intelligence (AI) in master data management (MDM). Dan shares his journey from working at CERN to becoming a data expert and discusses the challenges of reconciling large-scale organizational data. He explains how data silos arise from independent teams and highlights t...

Building a Data Vision Board: A Guide to Strategic Planning

December 22, 2024 • 49 mins

Summary
In this episode of the Data Engineering Podcast, Lior Barak shares his insights on developing a three-year strategic vision for data management. He discusses the importance of having a strategic plan for data, highlighting the need for data teams to focus on impact rather than just enablement. He introduces the concept of a "data vision board" and explains how it can help organizations outline their strategic vision by consid...
