Snacks Weekly on Data Science

This podcast makes data science and machine learning knowledge accessible and less intimidating. Every week, I handpick one industry tech blog post and break it down. We discuss key data science concepts and machine learning algorithms, and how they are applied in real-world systems. Subscribe to the channel and enjoy Snacks Weekly on Data Science!

Episodes

May 25, 2026 8 mins

In this episode, we look at how companies deal with large volumes of unstructured text and why traditional clustering methods often fall short at scale. We explore two LLM-powered approaches shared by data scientists from Microsoft: a bottom-up pipeline that builds structure from data using embeddings and clustering, and a top-down pipeline that starts with LLM-generated categories and refines them recursively into a hierarchy.
...

In this episode, we discuss a classic scaling problem in fraud and risk operations: too much manual review, inconsistent judgments, and growing complexity. We explore the team’s solution, Bumblebee, a multi-agent AI architecture that separates planning, evidence gathering, and analysis into specialized roles, enabling a robust and scalable system to solve the problem.
For more details, you can refer to their published tech blog,...

In this episode, we explore how OLX improved discovery by combining keyword search and vector search instead of forcing a choice between the two. Keyword systems remain excellent for precision, while vector systems add semantic understanding. Together, they create a smarter and more user-friendly marketplace experience.
For more details, you can refer to their published tech blog, linked here for your reference: https://tech....
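As a rough illustration of combining the two result lists, here is a minimal sketch using reciprocal rank fusion. This is an assumption for illustration only: the episode summary does not say how OLX merges keyword and vector results, and the function name, the constant `k=60`, and the ad ids are all hypothetical.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of item ids into one ranking.

    Each item earns 1 / (k + rank) from every list it appears in,
    so items ranked well by both retrievers rise to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] = scores.get(item, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by item id for determinism.
    return sorted(scores, key=lambda item: (-scores[item], item))


# Hypothetical results from a keyword engine and a vector index.
keyword_hits = ["ad_3", "ad_1", "ad_7"]
vector_hits = ["ad_1", "ad_9", "ad_3"]

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)
```

Items found by both retrievers (`ad_1`, `ad_3`) end up ahead of items found by only one, which is the behavior a hybrid setup is usually after.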

In this episode, we explore how Udemy built a multilingual AI platform to bring its generative AI features to learners around the world. The team approached localization across three levels: a translation-first approach for broad and fast coverage, a fully native multilingual system for markets where fluency and cultural precision are essential, and a hybrid solution in between that intelligently routes between the two depending on...

In this episode, we explore how Meta uses the “Ladder of Evidence” framework to evaluate the effectiveness of new product features. Instead of relying on a single analytical method, this framework helps teams choose the right type of evidence based on real-world constraints, leading to better and more informed product decisions.

For more details, you can refer to their published tech blog, linked here for your reference: https://...

In this episode, we explore how Vimeo built a customized AI system for subtitle translation—one that goes beyond basic text translation to tackle the much more challenging problem of synchronizing language with timing. We discuss how the team designed a split-brain architecture to separate translation quality from timing constraints, and how they implemented fallback mechanisms to ensure the system remains reliable in real-world sc...

In this episode, we explore how the New York Times engineering team used AI agents to scale unit test coverage across their News site. They accomplished this by building a custom coverage measurement tool, designing a two-loop human–AI workflow, and investing heavily in prompt engineering, including strict guardrails to prevent the agent from cheating or drifting. The key takeaway is that AI works best when it is tightly constraine...

In this episode, we explore how Shopify evolved its product classification system across three major stages: from a traditional logistic regression model with TF-IDF features, to a multi-modal approach combining text and images, and finally to Vision Language Models built on top of a standardized and evolving product taxonomy. We also look at how architectural design and inference optimization are just as important as model accurac...

In this episode, we explore how Faire built its ads system from scratch. On the business side, we discuss why ads matter for a growing marketplace: enabling brand discovery, creating a new revenue stream, and strengthening the overall ecosystem. On the technical side, we break down the three core components—Ads Delivery, Ads Manager, and Ads Foundation—and examine key considerations such as optimizing for long-term brand–retailer r...

In this episode, we explore how Agoda tackled a costly engineering bottleneck by integrating GPT into their CI/CD pipeline to analyze failing SQL stored procedures automatically and suggest optimizations — complete with rewritten queries, index recommendations, and side-by-side performance comparisons. The result is a human-in-the-loop system where AI handles the heavy lifting and engineers make the final call, leading to significa...

March 16, 2026 8 mins

In this episode, we explore how LinkedIn is reimagining job search with AI and large language models — evolving from rigid, keyword-based systems to flexible, intent-aware experiences that feel more conversational and personalized.

For more details, you can refer to their published tech blog, linked here for your reference: https://www.linkedin.com/blog/engineering/ai/building-the-next-generation-of-job-search-at-linkedin

In this episode, we explore how Uber tackled the challenge of personalizing CRM communications at scale through contextual bandit strategies enhanced with generative AI embeddings, lightweight and powerful models like LinUCB and XGBoost, and smart decision augmentation with SquareCB. This work shows how data science can take a core business need—delivering relevant user communications—and build systems that adapt in near real time ...

In this episode, we explore how seemingly perfect-looking SQL generated by AI agents can be “lying” when essential logic is missing. The Thomson Reuters Labs team highlights the need for deeper evaluation beyond simple syntax checks, and shows how tools like TruLens and AgentBench help expose hidden errors and better align agent outputs with real business intent.

For more details, you can refer to their published tech blog, linked h...

February 23, 2026 10 mins

In this episode, we explore how Airbnb measures Listing Lifetime Value by separating it into baseline LTV, incremental LTV, and marketing-induced incremental LTV, and how this framework helps address challenges like measuring true incrementality and handling uncertainty about the future.
For more details, you can refer to their published tech blog, linked here for your reference: https://medium.com/airbnb-engineering/how-air...

In this episode, we explore how OLX evolved its ranking algorithms—from the pairwise logic of RankNet to the metric-optimized power of LambdaRank—to ensure users find exactly what they’re looking for. We discuss how moving from simple classification to "Learning to Rank" helps businesses prioritize user attention where it matters most.
For more details, you can refer to their published tech blog, linked here for your r...
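To make the pairwise idea concrete, here is a minimal sketch of the standard RankNet loss for a single pair of items. It covers only the RankNet part and is not taken from OLX's implementation; the function name and the example scores are hypothetical.

```python
import math


def ranknet_pair_loss(score_preferred, score_other):
    """Cross-entropy loss for one pair where the first item should rank higher.

    RankNet models P(preferred above other) = sigmoid(score_preferred - score_other),
    so the loss is -log(sigmoid(diff)) = log(1 + exp(-diff)).
    """
    diff = score_preferred - score_other
    # Numerically stable form of log(1 + exp(-diff)).
    return max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))


# Correctly ordered pair: small loss.
print(ranknet_pair_loss(2.0, 0.0))
# Wrongly ordered pair: large loss.
print(ranknet_pair_loss(0.0, 2.0))
```

The loss shrinks as the model scores the preferred item further above the other one, and it equals log 2 when the two scores are tied.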

In this episode, we explore how Udemy tackled the tricky challenge of understanding learner intent in their AI Assistant — from a simple similarity-based embedding model, through experiments with larger models and fine-tuning, to a hybrid system that intelligently leverages both embeddings and large language model classification. This evolution demonstrates how real-world ML systems often require balancing accuracy, cost, latency, ...

In this episode, we explore how Meta’s data scientists approach product strategy using a structured framework that adapts to different data and problem scenarios. We walk through the distinct analytical approaches used across different problem spaces, defined by whether data availability is high or low and whether problem clarity is broad or concrete. Each scenario requires a different mix of thinking, collaboration, and analytics ...

In this episode, we explore how PayPal estimates incremental lift in customer value using synthetic control methods. This causal inference–based approach provides a principled way to construct a counterfactual and isolate causal effects when traditional experiments aren’t sufficient, helping teams measure true impact in a complex, noisy, real-world environment and make more informed decisions.

For more details, you can refer to thei...

In this episode, we explore how Netflix tackles the challenge of predicting user session intent by extending the capabilities of its foundation model with a hierarchical multi-task learning architecture. This approach helps Netflix better understand what users want in the moment and personalize the experience in real time, ultimately improving its recommendation system at scale.

For more details, you can refer to their published tec...

In this episode, we explore how CVS Health builds its product recommendation system to deliver relevant, timely suggestions across millions of customers and thousands of products. We look at the business motivation behind personalization at CVS, and then walk through how the team uses Word2Vec, Euclidean distance, LLM-generated product summaries, and iterative refinement to improve the system step by step.

For more details, you can ...
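As a small illustration of the distance step, here is a sketch that finds the closest products by Euclidean distance over embedding vectors. The three-dimensional vectors and product names are made up for this example; in practice the vectors would come from a trained Word2Vec model, and nothing here reflects the CVS team's actual code.

```python
import math

# Hypothetical product embeddings (real Word2Vec vectors are much longer).
embeddings = {
    "vitamin_c": [0.9, 0.1, 0.0],
    "vitamin_d": [0.8, 0.2, 0.1],
    "shampoo": [0.1, 0.9, 0.3],
    "conditioner": [0.2, 0.8, 0.4],
}


def nearest_products(query, embeddings, top_n=2):
    """Return the top_n products closest to `query` by Euclidean distance."""
    query_vec = embeddings[query]
    distances = [
        (math.dist(query_vec, vec), name)
        for name, vec in embeddings.items()
        if name != query  # do not recommend the product itself
    ]
    distances.sort()  # smallest distance first
    return [name for _, name in distances[:top_n]]


print(nearest_products("shampoo", embeddings))
```

A smaller distance means the products appear in similar contexts, so the nearest neighbors become recommendation candidates.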

