In this episode of the Data Show, I spoke with Jeff Jonas, CEO, founder and chief scientist of Senzing, a startup focused on making real-time entity resolution technologies broadly accessible. He was previously a fellow and chief scientist of context computing at IBM. Entity resolution (ER) refers to techniques and tools for identifying and linking manifestations of the same entity/object/individual. Ironically, ER itself has many different names (e.g., record linkage, duplicate detection, object consolidation/reconciliation, etc.).
ER is an essential first step in many domains, including marketing (cleaning up databases), law enforcement (background checks and counterterrorism), and financial services and investing. Knowing exactly who your customers are is an important task for security, fraud detection, marketing, and personalization. The proliferation of data sources and services has made ER very challenging in the internet age. In addition, many applications now increasingly require near real-time entity resolution.
We had a great conversation spanning many topics including:
Here are a few highlights from our conversation:
In the early ’90s, I worked on a much more advanced version of entity resolution for the casinos in Las Vegas and created software called NORA, non-obvious relationship awareness. Its purpose was to help casinos better understand who they were doing business with. We would ingest data from the loyalty club, everybody making hotel reservations, people showing up without reservations, everybody applying for jobs, people terminated, vendors, and 18 different lists of different kinds of bad people, some of them card counters (which aren’t that bad), some cheaters. And they wanted to figure out across all these identities when somebody was the same, and then when people were related. Some people were using 32 different names and a bunch of different social security numbers.
… Ultimately, IBM bought my company and this technology became what is known now at IBM as “identity insight.” Identity insight is a real-time entity resolution engine that gets used to solve many kinds of problems. MoneyGram implemented it and their fraud complaints dropped 72%. They saved a few hundred million just in their first few years.
… But while at IBM, I had a grand vision about a new type of entity resolution engine that would have been unlike anything that’s ever existed. It’s almost like a Swiss Army knife for ER.
The Senzing entity resolution engine works really well on two records from a domain that you’ve never even seen before. Say you’ve never done entity resolution on restaurants from Singapore. The first two records you feed it, it’s really, really already smart. And then as you feed it more data, it gets smarter and smarter.
… So, there are two things that we’ve intertwined. One is common sense. One type of common sense is the names—Dick, Dickie, Richie, Rick, Ricardo are all part of the same name family. Why should it have to study millions and millions of records to learn that again?
… Next to c
Stuff You Should Know
If you've ever wanted to know about champagne, satanism, the Stonewall Uprising, chaos theory, LSD, El Nino, true crime and Rosa Parks, then look no further. Josh and Chuck have you covered.
Dateline NBC
Current and classic episodes, featuring compelling true-crime mysteries, powerful documentaries and in-depth investigations. Follow now to get the latest episodes of Dateline NBC completely free, or subscribe to Dateline Premium for ad-free listening and exclusive bonus content: DatelinePremium.com
Las Culturistas with Matt Rogers and Bowen Yang
Ding dong! Join your culture consultants, Matt Rogers and Bowen Yang, on an unforgettable journey into the beating heart of CULTURE. Alongside sizzling special guests, they GET INTO the hottest pop-culture moments of the day and the formative cultural experiences that turned them into Culturistas. Produced by the Big Money Players Network and iHeartRadio.