Apple Tasting: Reinforcement learning for quality control

All Episodes

February 22, 2021 • 35 mins

Have you ever come home from the supermarket to discover one of the apples you bought is rotten? It's likely your trust for that grocer was diminished, or you might stop buying that particular brand of apples altogether.

In this episode, we discuss how the quality controls in a production line need to use smart sampling methods in order to avoid sending bad products to the customer, which could ruin the reputation of both the brand and seller.

To do this we describe a thought experiment called Apple Tasting. This allows us to demonstrate the concepts of regret and reward in a sampling process, giving rise to the use of Contextual Bandit Algorithms. Contextual Bandits come from the field of Reinforcement Learning which is a form of Machine Learning where an agent performs an action and tries to maximise the cumulative reward from its environment over time. Standard bandit algorithms simply choose between a number of actions and measure the reward in order to determine the average reward of each action. But a Contextual Bandit also uses information from its environment to inform both the likely reward and regret of subsequent actions. This is particularly useful in personalised product recommendation engines where the bandit algorithm is given some contextual information about the user.

Back to Apple Tasting and product quality control. The contextual bandit in this scenario, consumes a signal from a benign test that is indicative, but not conclusive, of there being a fault and then makes the decision to perform a more in-depth test or not. So the answer for when you should discard or test your product depends on the relative costs of making the right decision (reward) or wrong decision (regret) and how your experience of the environment affected these in the past.

We speak with Prof. David Leslie about how this logic can be applied to any manufacturing pipeline where there is a downside risk of not quality checking the product but a cost in a false positive detection of a bad product.

Other areas of application include:

Anomalous behaviour in a jet engine e.g. low fuel efficiency, which could be nothing or could be serious, so it might be worth taking the plane in for repair.
Changepoints in network data time series - does it mean there’s a fault on the line or does it mean the next series of The Queen’s Gambit has just been released? Should we send an engineer out?

With interview guest David Leslie, Professor of Statistical Learning in the Department of Mathematics and Statistics at Lancaster University.

Further Reading

Publication list for Prof. David Leslie (http://bitly.ws/bQ4a via Lancaster University)
Paper on "Selecting Multiple Web Adverts - a Contextual Multi-armed Bandit with State Uncertainty" in Journal of the ORS (http://bitly.ws/bQ3X via Lancaster University)
Paper on "Apple tasting" (http://bitly.ws/bQeW via ScienceDirect)
Paper by Google Inc. on "AutoML for Contextual Bandits" (https://arxiv.org/abs/1909.03212 via arXiv)

Send us a text

Thanks for joining us in the DataCafé. You can follow us on twitter @DataCafePodcast and feel free to contact us about anything you've heard here or think wou

Mark as Played

Advertise With Us

Popular Podcasts

Stuff You Should Know

If you've ever wanted to know about champagne, satanism, the Stonewall Uprising, chaos theory, LSD, El Nino, true crime and Rosa Parks, then look no further. Josh and Chuck have you covered.

Dateline NBC

Current and classic episodes, featuring compelling true-crime mysteries, powerful documentaries and in-depth investigations. Follow now to get the latest episodes of Dateline NBC completely free, or subscribe to Dateline Premium for ad-free listening and exclusive bonus content: DatelinePremium.com

The Breakfast Club

The World's Most Dangerous Morning Show, The Breakfast Club, With DJ Envy, Jess Hilarious, And Charlamagne Tha God!

.css-15opob5{left:0;position:absolute;top:0.8rem;} All Episodes

.css-14f5ked{margin:0;word-break:break-word;display:-webkit-box;-webkit-box-orient:vertical;box-orient:vertical;-webkit-line-clamp:2;overflow:hidden;}Apple Tasting: Reinforcement learning for quality control