April 9, 2021 7 mins

In this episode of Beneficial Intelligence, I discuss biased data. Machine learning depends on large data sets, and unless you take care, ML algorithms will perpetuate any bias in the data they learn from.

The famous ImageNet database contains 14 million labeled images. However, 6% of these have the wrong label. The labels are provided by humans who are paid very little per image, so they work very fast. Unfortunately, as Nobel Prize winner Daniel Kahneman has shown, when humans work fast, they depend on their fast System 1 thinking, which is very prone to bias. Thus, a woman in hospital scrubs is likely to be classified as "nurse" and a man in the same clothes is likely to be classified as "doctor."

Google Translate showed its bias when translating from Hungarian. Hungarian has only a gender-neutral pronoun, but the English translations were given gendered pronouns. The original gender-neutral phrases became "she does the dishes" and "he reads" in English.

As CIO or CTO, you need to make sure somebody ensures the quality of the data you use to train your machine learning algorithms. If you don't have a Chief Data Officer, maybe you have a Data Protection Officer who could reasonably be given this purview. But you cannot foist this responsibility on individual development teams under deadline pressure. It is your responsibility to ensure that any machine learning system is learning from clean, unbiased data. 

Beneficial Intelligence is a weekly podcast with stories and pragmatic advice for CIOs, CTOs, and other IT leaders. To get in touch, please contact me at sten@vesterli.com
