Snacks Weekly on Data Science

This podcast is about making data science and machine learning knowledge accessible and less intimidating. Every week, I pick one industry tech blog and break it down. We discuss key data science concepts and machine learning algorithms, and how they are applied in real-world settings. Subscribe to the channel and enjoy Snacks Weekly on Data Science!

Episodes

Global Feature Importance with Collective Wisdom [Meta]

October 6, 2025 • 8 mins

In this episode, we look at how Meta addressed the challenge of feature selection at scale through Global Feature Importance—a system that aggregates insights across models to surface the most valuable features. This approach not only streamlines model development but also enables machine learning engineers to iterate more effectively and build models that deliver stronger business impact.
For more details, check out Meta’s publ...

Evaluating Retrieval Capabilities of Language Models [Microsoft]

September 29, 2025 • 10 mins

In this episode, we explore how to evaluate the retrieval-augmented generation (RAG) capabilities of small language models. On the business side, we discuss why RAG, long context windows, and small language models are critical for building scalable and reliable AI systems. On the technical side, we walk through the Needle-in-a-Haystack methodology and discuss key findings about retrieval performance across different models.
For ...

Personalized Recommendation with Foundation Models [Netflix]

September 22, 2025 • 11 mins

In this episode, we explore how Netflix enhanced recommendation personalization using foundation models. These models can process massive user histories through tokenization and attention mechanisms, while also addressing the cold-start problem with hybrid embeddings. The work highlights how principles from large language models can be adapted to build more effective recommendation systems at scale.

For more details, you can refer t...

A/B Testing vs. Multi-Armed Bandits: A Simulated Study [Vanguard]

September 15, 2025 • 10 mins

In this episode, we explore how Vanguard evaluated standard A/B testing against multi-armed bandits for digital experimentation. Their simulated study showed that A/B testing is often the better choice when dealing with a small number of variations, while bandit strategies, such as Thompson Sampling, become more effective as the number of variations increases. The broader lesson is that experimentation design should always be conte...
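
As a rough illustration of the bandit strategy mentioned above, here is a minimal Python sketch of Thompson Sampling with Beta priors on binary (convert / no convert) rewards. This is not Vanguard's simulation code: the function name, the conversion rates, and the number of rounds are all made up for illustration.

```python
import random


def thompson_sampling(true_rates, n_rounds, seed=0):
    """Simulate Thompson Sampling on Bernoulli arms.

    true_rates: hypothetical conversion rate of each variation (assumed, not from the blog).
    Returns how many times each variation was shown.
    """
    rng = random.Random(seed)
    n_arms = len(true_rates)
    successes = [0] * n_arms
    failures = [0] * n_arms
    pulls = [0] * n_arms

    for _ in range(n_rounds):
        # Draw one plausible conversion rate per arm from its Beta posterior
        # (Beta(1, 1) prior, i.e. uniform, before any data is seen).
        samples = [
            rng.betavariate(successes[i] + 1, failures[i] + 1)
            for i in range(n_arms)
        ]
        # Show the variation whose sampled rate is highest.
        arm = max(range(n_arms), key=samples.__getitem__)
        pulls[arm] += 1

        # Simulate the visitor's response and update that arm's posterior.
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1

    return pulls


if __name__ == "__main__":
    print(thompson_sampling([0.05, 0.10, 0.50], n_rounds=2000))
```

Unlike a fixed-split A/B test, the allocation shifts toward the better-performing variation as evidence accumulates, which is where the trade-off between the two designs comes from.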

Catalog Attribute Extraction with Multi-Modal LLMs [Instacart]

September 8, 2025 • 10 mins

In this episode, we explore how Instacart tackled the challenge of extracting accurate product attributes at scale. We discuss different solutions—starting with SQL rules, moving to text-based ML models, and finally, Instacart’s multi-modal LLM platform, PARSE. By blending text and image data and enabling rapid configuration, PARSE demonstrates how modern AI tools can streamline data pipelines, reduce engineering overhead, and deli...

Segmenting Supply with a Data-Driven Methodology [Airbnb]

September 1, 2025 • 8 mins

In this episode, we explore how Airbnb developed a structured framework that combines unsupervised clustering and supervised modeling to classify listings into meaningful supply personas based on availability patterns. This data-driven approach helps Airbnb enhance personalization, improve experimentation, and gain deeper insights into its global supply base.
For more details, you can refer to their published tech blog, linked here ...

Causal Inference with Bayesian Structural Time Series Model [Walmart]

August 25, 2025 • 8 mins

In this episode, we explore the Bayesian Structural Time Series model as a causal inference methodology and walk through a real-world example of how Walmart leveraged it to measure the impact of a simple yet meaningful product taxonomy change.

For more details, you can refer to their published tech blog, linked here for your reference: https://medium.com/walmartglobaltech/decoding-causal-incrementality-in-e-commerce-leveraging-bayes...

Advancements in Embedding-Based Retrieval [Pinterest]

August 18, 2025 • 10 mins

In this episode, we delve into how Pinterest has enhanced its embedding-based retrieval system to provide a more personalized, relevant, and dynamic Homefeed experience. By scaling their models with richer feature interactions, refreshing the content corpus with trending Pins, and leveraging cutting-edge machine learning techniques, Pinterest is able to serve better content—faster and more accurately—to hundreds of millions of user...

How Data Scientists Lead and Drive Impact [Meta]

August 11, 2025 • 10 mins

In this episode, we dive into what it’s like to be a data scientist at Meta. Grounded in product leadership, data scientists at Meta apply deep analytical expertise to drive measurement, navigate complex product ecosystems, and shape key decisions—ultimately delivering meaningful impact on product outcomes.
For more details, you can refer to their published tech blog, linked here for your reference: https://medium.com/@Analyt...

Building Scalable Risk Management Platform [Revolut]

August 4, 2025 • 10 mins

In this episode, we explore how Revolut is reimagining risk management. By developing a modular, scientifically grounded, and explainable platform, the team has enabled faster, more accurate, and more transparent risk decisions—spanning diverse products and global markets.
For more details, you can refer to their published tech blog, linked here for your reference: https://medium.com/revolut/reinventing-risk-at-revolut-77e63c552...

Tackling Interference Bias with Marketplace Marginal Values [Lyft]

July 28, 2025 • 9 mins

In this episode, we explore how Lyft tackles interference bias in marketplace experiments using Marketplace Marginal Values (MMVs). We break down why interference is a natural challenge in two-sided platforms like Lyft, and how their team uses optimization, simulation, and advanced metrics to measure causal effects more reliably.

For more details, check out the original tech blog linked here: https://eng.lyft.com/using-marketplac...

Causal Inference with Double Machine Learning [Microsoft]

July 21, 2025 • 8 mins

In this episode, we explore how causal inference helps companies like Microsoft answer high‑stakes product and business questions when A/B testing isn’t possible. We dive into Double Machine Learning—a technique that leverages ML models to control for confounding variables and isolate true causal effects. The result is a flexible, rigorous framework that every data scientist should have in their toolkit.

For more details, you can re...

Scalable and Blendable Feed Construction [Whatnot]

July 14, 2025 • 8 mins

In this episode, we explore how Whatnot tackled the challenge of scaling feed recommendation systems across a rapidly growing platform. We dive into WhataMix—a DAG-based framework that enables teams to build, test, and deploy feed logic using reusable, modular components. It’s a great example of how thoughtful system design can accelerate development while maintaining high standards in machine learning infrastructure.

For more detai...

Using Generative and Traditional AI to Enhance Travel Experience [Expedia]

July 7, 2025 • 9 mins

In this episode, we explore how Expedia is integrating both generative and traditional AI to enhance the travel experience. The company’s approach leverages generative models for open-ended, natural language tasks, and relies on traditional models for structured, mission-critical problems. By playing to the strengths of each, Expedia is able to build smarter, more adaptable AI systems without overcomplicating things or compromising...

Ensuring Data Quality at Petabyte Scale [Glassdoor]

June 30, 2025 • 11 mins

In this episode, we dive into how Glassdoor addresses the challenge of maintaining data quality at petabyte scale. By treating data as a product, the engineering team built a centralized, scalable platform that enables proactive validation, continuous monitoring, and cross-team collaboration. From data contracts and static code analysis to LLM-based logic checks and anomaly detection, we unpack the key practices behind their appr...

Building a Travel Assistant with LLMs [Agoda]

June 23, 2025 • 8 mins

In this episode, we explore how Agoda used large language models (LLMs) to improve the user experience by building a conversational AI product. By focusing on prompt engineering, grounding data, and smart evaluation, the team built a scalable assistant that adds real value to the user journey.
For more details, you can refer to their published tech blog, linked here for your reference: https://medium.com/agoda-engineering/how-w...

Setting Goals at Scale with the Goal Map [Meta]

June 16, 2025 • 8 mins

In this episode, we explore how Meta tackles the complex challenge of setting aligned, measurable, and high-impact goals across a vast organization. Whether you’re in data science, analytics, or product leadership, this episode offers practical insights into building a more effective goal-setting system.
For more details, you can refer to their published tech blog, linked here for your reference: https://medium.com/@Analytics...

Predicting user actions with transformer-based models [Hike]

June 9, 2025 • 6 mins

In this episode, we will explore how Hike applied transformer-based models to predict user behavior in their Rush Gaming Universe. We will look at the business motivation and break down the technical solution, from input features to prediction and evaluation. This case is a good example of how modern deep learning techniques can drive real impact in improving user experience.
For more details, you can refer to their published te...

Quantization Techniques for Language Model [EsperantoTech]

June 2, 2025 • 10 mins

In this episode, we will explore quantization techniques for language models. We will look at the business motivation—making large language models more efficient—and unpack the technical solutions that make this possible.

For more details, you can refer to their published tech blog, linked here for your reference: https://medium.com/@EsperantoTech/quantization-and-mixed-mode-techniques-for-small-language-models-b3366dbad554
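
To give a feel for what quantization means in practice, here is a minimal Python sketch of symmetric, per-tensor int8 quantization of a list of weights. This is a generic textbook scheme chosen for illustration, not necessarily the method described in the EsperantoTech blog; the function names and the sample weights are assumptions.

```python
def quantize_int8(weights):
    """Map float weights to integers in [-127, 127] using a single scale.

    Symmetric per-tensor scheme (an illustrative assumption): the largest
    absolute weight is mapped to 127, and zero stays exactly zero.
    """
    max_abs = max(abs(w) for w in weights)
    # Avoid dividing by zero when every weight is 0.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the integers and the scale."""
    return [q * scale for q in quantized]


if __name__ == "__main__":
    weights = [0.5, -1.27, 0.0, 1.0]  # made-up example values
    q, scale = quantize_int8(weights)
    print(q, scale)
    print(dequantize(q, scale))
```

Each weight is now stored in 8 bits instead of 32, at the cost of a rounding error of at most half the scale per weight.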

Key Ingredients for a Secure Agentic AI Future [Intuit]

May 26, 2025 • 9 mins

In this episode, we’ll explore the unique security challenges posed by agentic AI systems and why embedding trust and safety into these systems from the ground up is critical. We’ll review a few key ingredients for building a secure agentic AI future.

For more details, you can refer to the blog, linked here for your reference: https://medium.com/intuit-engineering/owasp-dishes-out-key-ingredients-for-a-secure-agentic-ai-future-be862...
