
October 28, 2009 27 mins

Shazam and Midomi are both types of music recognition software. Tune in as the TechStuff guys compare and contrast Shazam and Midomi and explain how they both work.


Episode Transcript

Available transcripts are automatically generated. Complete accuracy is not guaranteed.
Speaker 1 (00:00):
Brought to you by the reinvented 2012 Camry.
It's ready. Are you? Get in touch with technology with
TechStuff from HowStuffWorks.com. You've heard
the rumors before, perhaps in whispers written between the lines
of the textbooks. Conspiracies, paranormal events, all those things that

(00:24):
disappear from the official explanations. Tune in and learn more
of the stuff they don't want you to know in
this video podcast from HowStuffWorks.com. Hello, everybody,
welcome to TechStuff. My name is Chris Pollette, and

(00:45):
I'm an editor here at HowStuffWorks.com.
Sitting across from me, as usual, is senior writer Jonathan Strickland.
Hey there. Do you think people can identify our voices
pretty easily? Well, I would imagine so. We don't sound
exactly alike, or anything at all alike, really. I know,
but I mean, do you think people could listen to us,

(01:06):
people who have listened to the podcast before and say,
I know that's Jonathan and that's Chris. Possibly, although I
have heard at least one person claim after we did
a phone interview that I was the only person who
sounded the way I did on the podcast. But you
know what makes you look taller. You were sitting further
away from the phone, and it

(01:26):
was on speakerphone at the time, so that may
have played a part. But, um, what we're getting to here
is kind of working our way slowly around to the
topic we're gonna discuss today, which actually comes courtesy
of a little listener mail. This listener mail comes from

(01:46):
Mason in Iowa, and he says: "Hey there, love the podcast.
How about an episode on how Shazam and Midomi work?
How does the program segment, compare against a database, and
return results so darn quickly, especially when I'm the
one doing the singing, in Midomi's case?" Well, Mason, first

(02:07):
of all, I would
be remiss if I did not point out that we
have a sister podcast called Stuff from the B-Side,
and they actually did an episode about this kind of software. However,
we're gonna tackle it ourselves, because we actually tend to
cover the same topics now and then. We've
both done the electric guitar, so I see no reason

(02:29):
for us not to tackle this one. Sure, and,
you know, we can add a little stuff
that they didn't add. Sure, yeah, like puns. And
some updates about what's going on. Oh, yeah, we
can do that too. So which one
do you want to start with, Shazam or Midomi? Um, well,
Shazam has been around longer. That's true. Shazam
has been around since the early 2000s. Like, 2002

(02:52):
was when the company first started,
and 2003 was really when the service started
to get some attention. Actually, I had it down as
Shazam starting in 2000. Really? Wow, okay, so I
stand corrected. Yeah, the earliest version of Shazam was
an interesting version. You would hold your cell

(03:12):
phone up to the source of music and you would
send that to the Shazam service, and you would
get a text message back identifying the song. And I
can see why this could be considered some sort
of weird magic. I mean, how could a service figure
that out so quickly? Because it was usually just a
matter of a few seconds between when you held the

(03:33):
phone up to the music source and when you got the reply.
Oh, it's magic, you know. Right. Uh, you have to
believe we are magic. We can do this. You appreciate
that Xanadu reference? Well, yeah, you're welcome.

(03:53):
I was gonna go with Lovin' Spoonful, but I think even with Xanadu
we already lost everybody. Lovin' Spoonful? I don't know. There's
probably three of you out there who even know who
that is. But yeah, I think you're right. And
it's, um, stuff like this and voice recognition. I
think it's one of those things that just sort of
surprises people, because you don't think that the computer has

(04:13):
enough, I don't know, intelligence to figure it out. Wait,
wait a minute, that's a machine, and the machine figured
out what I was doing there? Let's talk about how
Shazam does this. Now, what Shazam does is break down
any recording into, um, just some very simple data. They
call it fingerprinting. They fingerprint songs. And if you were

(04:36):
to try and chart out a song from
start to finish, with all the different elements that go
into that, like, you know, you're essentially assigning data
points to every single frequency in that song, you would
end up with a fairly substantial amount of data. Yeah,
and Shazam has something like one point seven, one point

(04:57):
eight million songs in the database, probably even more than
that now, because that was the most recent data I
saw too, but I think that data was... yeah. Yeah.
So what Shazam does is take the peaks and
the troughs, the highest points in the frequency and the
lowest points of the frequency, to map out patterns and

(05:18):
fingerprint songs in that way. So they're cutting out
a lot of the information in the middle, and
that actually saves lots of space. It also saves time
when you're trying to match one song, or one
clip, to a database full of songs. So when you
hold your phone up to a set of speakers or,
you know, whatever is making the music, it

(05:39):
sends that clip to the Shazam service,
which then immediately analyzes it, looks at those peaks
and troughs, and tries to find a match in the database. Now,
the reason why it's so fast is that Shazam keeps
all of this in the computer's memory. It's not stored
on a hard drive. The database doesn't have to,

(06:02):
you know, the processor doesn't have to search the
hard drive to find a match. Everything's in memory, which
means they have to have lots and lots of machines
to be able to hold all those songs
within computer memory, or at least the fingerprints of those songs.
And so once it finds a match, it then sends
that data back to you, and usually it's fairly accurate.
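
As an illustration of the idea described here (reduce each song to a handful of peaks, keep the resulting fingerprints in memory, and look clips up against them), here is a minimal Python sketch. It is not Shazam's actual code. The peak lists are made-up (time, frequency) pairs rather than output from real audio, and the names fingerprint, build_index, identify, and FAN_OUT are hypothetical. Pairing nearby peaks and voting on a consistent time offset is one common way to let a clip taken from the middle of a song still match.

```python
from collections import Counter, defaultdict

FAN_OUT = 3  # hypothetical setting: how many later peaks each peak is paired with


def fingerprint(peaks):
    """Turn a list of (time, frequency) peaks into (hash, time) pairs.

    Each hash combines two peak frequencies and the time gap between them,
    so it does not depend on where in the song the clip starts.
    """
    peaks = sorted(peaks)
    hashes = []
    for i, (t1, f1) in enumerate(peaks):
        for t2, f2 in peaks[i + 1 : i + 1 + FAN_OUT]:
            hashes.append(((f1, f2, t2 - t1), t1))
    return hashes


def build_index(songs):
    """Build an in-memory dict: hash -> list of (song title, time in song)."""
    index = defaultdict(list)
    for title, peaks in songs.items():
        for h, t in fingerprint(peaks):
            index[h].append((title, t))
    return index


def identify(index, clip_peaks):
    """Vote for (song, time offset) pairs; the most consistent offset wins."""
    votes = Counter()
    for h, clip_t in fingerprint(clip_peaks):
        for title, song_t in index.get(h, ()):
            votes[(title, song_t - clip_t)] += 1
    if not votes:
        return None  # nothing in the database shares a hash with the clip
    (title, _offset), _count = votes.most_common(1)[0]
    return title


# Made-up peak data for two songs (illustrative only).
songs = {
    "Song A": [(0, 40), (1, 55), (2, 30), (3, 70), (4, 45), (5, 60), (6, 35)],
    "Song B": [(0, 20), (1, 80), (2, 25), (3, 65), (4, 50), (5, 15), (6, 75)],
}
index = build_index(songs)

# A clip from the middle of Song A: same peaks, but the clip's clock starts at 0.
clip = [(0, 30), (1, 70), (2, 45), (3, 60)]
print(identify(index, clip))  # prints "Song A"
```

Because the index is an ordinary in-memory dictionary, each lookup is a hash-table probe rather than a disk search, which is the speed advantage the hosts describe. A clip whose hashes match nothing returns None, the toy equivalent of a song that is not in the database.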

(06:23):
It's especially accurate for more modern music, the stuff that's
been out for the last couple of years. If
you start going back further, you may start encountering songs
that just aren't in the database, and so you
won't get a legitimate match. And there are cases where
if you have a song that is very similar to
another song, it may come back with an incorrect match.

(06:45):
But that doesn't happen that often. I suppose if you
were to, you know, try and identify a Creed or
Nickelback song, it might come back wrong, because all
those damn songs sound the same to me. I mean, seriously, kids,
find some better music. All right, so get off his lawn? Yes,
get off my lawn, while I listen to, you know,
music that they made in the good old days, like

(07:05):
the late 1970s in London. Um, like the Sex Pistols.
So not that modern stuff that's just junk.
Send an email to... Anyway, there are actually
papers online that go into very great detail about
the algorithms that Shazam uses to identify, to

(07:27):
match up songs, and to send the data back
to you. Yeah, they go into quite a bit of
depth, and actually, as our sister podcast pointed out,
as, uh, John and Mark talked about, they use a
three-dimensional graph to do this. Yeah, it's kind of cool. Yeah,
it's called, well, you can call it a
couple different things: a time-frequency graph or a spectrogram. Um, sorry,

(07:51):
spectrogram, and basically it's three-dimensional in
that it shows you over time how the,
you know, peaks and valleys of the song frequencies change.
And that's kind of interesting, that
it's able to do that in the first place. But
that's how it's, you know, that's how it's looking at it,

(08:12):
and having the element of time in there is
crucial, because otherwise, um, you know, it wouldn't be able to
look at that. Yeah. The cool thing about
this service is that you can hold your phone up to
any point of the song, and as long as
you're able to get about fifteen seconds' worth
of content, then that's enough for Shazam to work with

(08:36):
to find a match. So it doesn't have to be
the beginning of the song, it doesn't have to be
the end. It can be at any point, and as
long as it is able to map out those peaks
and valleys accurately, then you should be able to get
a pretty good result. There are some things that can
cause some problems, besides the fact that some songs do
sound alike. I mean, I was kind of making

(08:56):
a joke there, but that could happen. If
you're holding up a phone to a part where
it was sampling another song, and it was
long enough, you could conceivably get the wrong result. But
also if there's ambient sound that's interfering in the area,
you may not be able to get a result because
you know, it can't identify the song because it's getting

(09:17):
all these other sound inputs that are mixing it up. Yeah,
I mean, that's one of the things they sort of advertise
with Shazam, that you could take your phone around
the mall, for example, or, you know, on the radio
in your car, and hold it up to the
speaker, and it'll tell you what song that is. Well,
I mean, if you're at the mall, you're dealing with
all those people at the mall. You're dealing with perhaps

(09:39):
conflicting songs coming from different places, weird acoustics, the distance
from you to the speaker. I mean, there are all
kinds of things that can interfere with that. Not to mention
your cell phone frequency, you know, cutting out suddenly.
You're... Yes, so I've, uh, yeah, thankfully there was

(10:00):
a visual cue there that you guys don't get to see
because this is an audio podcast. Eyebrows did go up
about a foot. That was nice. I was like, I know,
I'm not supposed to talk now. Um, no, I've used
this application several times, and
I've had varying degrees of success. There's a theater
I go to on a fairly regular basis, a

(10:22):
stage play theater, not a cinema. Theatah. I go to
the theatah fairly regularly, actually, and they
have a pretty cool mix of pre-show and
post-show music that they play on the sound system.
And there have been a couple of times where there've
been songs that they played that I just didn't recognize,

(10:43):
and Shazam is pretty good at identifying those. The hard
part is being able to get a long enough section
where people aren't talking so loudly that it's interfering
with the sample. Right. So there have been times where
I've stood next to the speaker and held my phone
up to it. But now there's problems with that too,

(11:04):
because if the volume is too loud, then you get
some distortion and then it doesn't really work in that
way either. But also there's this incredibly obscure Japanese song
that they play that Shazam just doesn't seem to recognize.
So I'm gonna have to break down and ask someone
at the theater what the heck it is. Yeah, and
then there's the the other factor, you know, the part

(11:25):
where you're standing there holding your cell phone up to
a speaker and people are starting to wonder what the
heck you're doing. But I mean, if I start worrying
about what people think of me, now, I mean, come on,
I've made it this far, why should I worry now?
So let's kind of shift over a little bit
and talk about Midomi, which is a little different from Shazam.

(11:45):
There are actually a few interesting and fairly major differences,
one of which is that the database that Midomi
uses is not collected by the company, necessarily;
it's user-generated. Users of Midomi have sung songs
or hummed songs or whistled songs into their phones and identified,

(12:09):
you know, tagged the file as being a certain song,
and the user community tends to comment on these, vote
them up or down, submit their own version. Yeah, yeah.
What, are you kidding? That doesn't sound like that. Yeah,
here's my version, because good lord, the one that's in
the database is terrible. And you could sing

(12:30):
your own version and upload it to Midomi, and then
people can vote on which one is the more representative
version of the song. The wiki model. Right, and the
idea behind Midomi is that, you know, with Shazam, you
really need the original source, or you need
to be able to hold the phone up to something
that's playing the actual song. You can't necessarily sing it yourself,

(12:53):
or even if there's a band playing it live. That
doesn't necessarily mean it's going to be able to identify
the song because it's looking for a specific pattern, and
depending on the way you sing it or the way
the band sings it or whatever, it may not match
up to anything in their database, even though the song
itself may be in the database, because of this fingerprint technology.

(13:15):
Midomi is different. You know, it may be able to
track the way you sing the song and find
a match within its user-generated database. Now, the way
Midomi explains how this works is a little,
um, vague, I guess you could say. I'd say. Okay,

(13:35):
So in general, what happens is you submit the song
in whatever format you've chosen, whether it's whistling, singing, humming, whatever.
The software they use converts this into a special, like,
computer language they call Crystal language. Um, actually, computer
language is probably the wrong term, but it's their own

(13:56):
proprietary format. Format, exactly. And so then it looks into
the database and sees if there are any files that
have a similar style to what you just
sang or hummed or whistled or whatever, and then it
gives you a selection of songs that most likely are

(14:16):
going to fit what you sang. I say most likely
because, um... yes, I've actually played with Midomi.
Now, as far as I know, Midomi doesn't
have an Android application. They may have it now, but
at the time when I first heard of it, they
did not have an Android application, so I don't have
it on my phone. But you can use the service

(14:37):
on the web. You can use it with a computer,
so as long as your computer has a microphone, you
can give it a try. And I decided to do this,
and the first couple of songs I tried it was
surprisingly accurate. You know, I am not a good singer.
As I'm sure many of you can imagine, my singing
ability is very poor. Um, I can do character

(14:59):
voices for certain musicals and that's it. But I decided
to give it a shot, and the first couple of
songs I tried came out pretty accurate. I think the
first one I did was, uh, Sedated by the Ramones,
I Wanna Be Sedated, and it came
back right away. But the more I tried, the more

(15:20):
I was encountering um anomalies, or maybe maybe the correct
one was the anomaly for me. But I was having
issues where I would sing something and I would get
a result that was totally not what I was singing.
The most egregious of these would be Blue Öyster Cult's
classic hit Don't Fear the Reaper. Uh, my version, apparently,

(15:42):
to Midomi, sounds an awful lot like The Girl from Ipanema.
There wasn't nearly enough cowbell. No, I clearly was
lacking the cowbell, and therefore Midomi was unable to understand
what it was. But, uh,
think of it this way: if I had recorded Don't

(16:03):
Fear the Reaper and submitted it to Midomi and then
tried to do it again, it would have identified my
version of Don't Fear the Reaper as being correct. There
you go, so that's not, you know, difficult at all.
Just record all your own songs, so that
you can go, oh, what's the name of
that song that I always... Yeah, exactly. Well, that
would mostly be to impress or distress

(16:27):
your friends, because you could sing a song that sounds
nothing like the original, but it would come back and
know what it was you were singing, because you were
the one who provided the template for that song.
See, that seems problematic to me. Well, because
it seems like you could intentionally go in and sing
a whole bunch of stuff and tag it and just
drive people nuts. Well that's that's why you know, it's

(16:49):
it's user user police, so you have to it's one
of those services that depends heavily upon the community of users.
If the community is being very uh you know, honest
and forthright, then they are going to police the different
submissions and make sure that only the ones that actually

(17:10):
and accurately represent the songs are the ones that make
it into the database. Otherwise, uh, you know, they essentially say, well,
this is someone who's just goofing around and trying to
cause problems, and they'll they'll nix it, right. So, um,
but yeah, I mean, anytime there's a user, anytime there's
a service that depends upon the community of users to

(17:32):
to keep it going, it makes me a little nervous
because you know, you never know when a group of
people is just gonna get a little capricious and decide
that, you know, every song needs to be
sung to the tune of The Yellow Rose of Texas.
And why not? Yeah. So, uh, as for the new information, well,

(17:52):
I mean, I didn't know if we were... That's pretty
much the way that those work. And again, it's referring
to a database. It's sending you the results pretty quickly.
But Midomi doesn't share as much of
its back-end operations as Shazam does, so

(18:12):
we can't say for sure that they use the
same sort of setup where, you know, everything's stored
in memory as opposed to hard drive space or whatever.
We just don't know because we aren't privy to that information.
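
Since Midomi does not publish how its matching works, the following Python sketch is only a generic illustration of how a sung or hummed query could be ranked against user-submitted recordings to produce a short list of likely songs. It assumes each recording has already been reduced to a list of note pitches (the data and the names contour, edit_distance, and rank are hypothetical). It compares melodic contours (up, down, repeat, sometimes called Parsons code) using edit distance, which tolerates singing in the wrong key and the odd wrong note. None of this is claimed to be Midomi's method.

```python
def contour(pitches):
    """Reduce a pitch sequence to Up/Down/Repeat steps (Parsons code)."""
    steps = []
    for prev, cur in zip(pitches, pitches[1:]):
        steps.append("U" if cur > prev else "D" if cur < prev else "R")
    return "".join(steps)


def edit_distance(a, b):
    """Levenshtein distance between two strings, one row at a time."""
    row = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev_diag, row[0] = row[0], i
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            prev_diag, row[j] = row[j], min(
                row[j] + 1,        # delete a step from the query
                row[j - 1] + 1,    # insert a step into the query
                prev_diag + cost,  # match or substitute
            )
    return row[-1]


def rank(query_pitches, database, top=3):
    """Return the titles whose contours are closest to the query, best first."""
    q = contour(query_pitches)
    scored = sorted(
        (edit_distance(q, contour(pitches)), title)
        for title, pitches in database.items()
    )
    return [title for _distance, title in scored[:top]]


# Hypothetical user-submitted recordings, as MIDI-style note numbers.
database = {
    "Tune A": [60, 62, 64, 62, 60, 60, 67],
    "Tune B": [60, 59, 57, 59, 60, 62, 62],
    "Tune C": [60, 60, 60, 64, 62, 60, 59],
}

# A rendition of Tune A sung in a different key with one wrong note.
query = [55, 58, 61, 58, 57, 58, 63]
print(rank(query, database))  # "Tune A" is ranked first
```

Returning a ranked list rather than a single answer mirrors the behavior described in the episode, where the service offers a selection of songs that most likely fit what was sung.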
But, um, yeah, that's all I have on just
the basic operations. So what's this new info you've... Well, um,
you know, in general, Shazam basically offers the

(18:37):
application for free for, you know, iPhones and Nokia phones.
But actually, as of the, I don't know,
late-middle part of October 2009, when we're
recording this, Shazam actually just got a new round
of funding from Kleiner Perkins Caufield & Byers. They
actually have their own iPhone application fund called the

(19:03):
iFund, this hundred million dollars, and, you know,
it's specifically geared towards iPhone developers. Well, Shazam got some
of that money, and they're going to continue
to offer their services for free for a while. Well,
they actually charge for the BlackBerry version, but the iPhone
and Nokia phones version... I don't think they charge for Android,
or if they have, I am totally unaware of it

(19:24):
because I have Shazam for the Android phone and and
if they're charging me for it, I need to pay
better attention to my bills. Well, you need to pay
attention to your bills by the end of the year
in 2009, because, um, then you will start
to get five free song identifications a month, and then
$4.99 a month after that if you want unlimited
usage and all the extra goodies. And they're talking about

(19:46):
selling applications, or I'm sorry, selling items, you know,
as part of the application too, like, you know, band
gear, and, you know, possibly selling video. I'm pretty sure...
you know what, I can't be certain now, but
I seem to recall the last time that Shazam correctly

(20:06):
identified a song for me. It gave me a link
to where I could purchase the album on Amazon. But
maybe I've got that mistaken now, because it has
been a while since I've used it and had it
actually work. Because it turns out that most music that
I really like, I already know what the song is. Yeah,
it's just the really really obscure stuff that tends to

(20:28):
like I'm like, wow, that's so cool. I've never heard
that before, and apparently neither has Shazam, so I haven't
had much call to use it recently. I don't have
a Nokia phone, and I don't have an iPhone or
an Android or BlackBerry, um, but I do have the
iPod, and I could download Shazam. But I have the
first-generation iPod Touch, and so I could stand there

(20:48):
all day holding my microphone-less iPod Touch up,
and Shazam's just gonna go, anytime. You just... I'll
be waiting. Actually, I could use a microphone that plugs in,
but, you know, that costs money. Right, and then you
have to carry around an extra piece
of gear, and you have to be someplace where there's

(21:09):
WiFi for you to be able to actually... Indeed, you
know, that could be problematic. That's a lot of qualifiers
for a single application. So maybe I'll invest in a
smartphone eventually. Yeah, that would um, that would be my recommendation.
You know, there's a there's this awesome new phone called
the Motorola Droid. Have you heard about it? I do

(21:29):
think I have. It's kind of hard not to, um
so at any rate. Yeah, that's that's interesting stuff. Uh,
I mean, I think these these services are really cool ideas,
especially for people who happen to be out and about
a lot and are, you know,
encountering a lot of new music. Um, it's a
really interesting way to educate yourself about stuff

(21:52):
that you like, that, you know, you encounter, but
you're not really familiar with. Um,
I mean, there are a lot of other options too. Like,
there are a lot of like HD radio stations which
will identify the song that you are listening to. And
there's some HD radio applications where you can even get
access to buying the song off of something like Amazon

(22:12):
or iTunes. So, uh, yeah. Of course, satellite radio also
identifies those, and, um, you know, there are others,
and this really isn't all that especially new. You know,
Gracenote, alias CDDB, you know, has been
identifying songs mathematically, uh, you know, based on identification numbers,
and at least, you know, it's a little bit different

(22:34):
from the way that Shazam and Medomi do. And then
there's my buddy John, who seems to know every song
ever recorded. So I'll be like, hey, John, what's that
song that goes...? He'll be like, oh, that's such
and such by the so and so's and so. Yeah.
If you if you don't have access to Shazam or Medomi,
you should totally call my buddy John because he probably

(22:54):
knows the song. Yeah, you know, and I bet that somewhere, uh,
in our brains there is a mathematical algorithm going on
that converts those bits of information you have, to be
able to recognize something. Well, yeah, if you think about it, the
brain is pretty remarkable, because you can recognize a song
even if someone is mangling it. Yeah, you know, it
may not sound anything like the original song,

(23:17):
but because we're able to recognize that, we can say, hey,
that song is, you know, such and such. Like,
or you can always do my favorite thing, which is: Hey,
who sings that song? Oh, that would be the Beatles. Yeah,
let them do it. Nice. Um, no, I'm a
kind guy. But yeah, I mean, you know,

(23:37):
they say that kids who play a musical instrument are
a little better at math. You know, it's sort of
there's a connection between the two. So I don't know,
maybe there's a thing. I'm a pretty mean hand
with the slide whistle. No, not with
the slide rule. [unintelligible]

(24:01):
Oh, there you go. See, those are the kinds of
lame jokes that we bring. That's what we bring to
the table that you just aren't going to get from
Stuff from the B-Side. Yeah, you will get some
very valuable information about music from Stuff from the B-Side,
and instruments. You won't get really horrible jokes. So
thank them for that. Yeah, yeah, definitely. Um, so yeah,

(24:22):
if you want to learn more about it, I
do recommend listening to that podcast. You know, you'll
have to dig back in the archives a little bit
to find it. It's from February. Yeah, it's been quite a bit,
but shorter podcasts. And yeah, that
was back when we would do ten-minute podcasts. Yeah,
but, um, yeah, if you want to learn more about it,

(24:42):
I would recommend listening to that podcast, and if you
haven't already, you should subscribe to it because it's a
good show. Yeah, definitely. They also covered some
great classical music, um, topics that I had never really
explored before. So I learned quite a bit from that show.
Like classical, like, all the way back to the sixties? Yeah,

(25:04):
the 1760s. Snap. All right, so let's
wind this down a little bit. Let's
take this down a notch. Let's end with a
little listener mail. This listener mail comes from my good
friend Ry Adney. She wrote to me and she said,

(25:26):
"I'm hearing your Touch Screens podcast now, and the HTC
Hero for Sprint supports multi-touch. I've been crushing on
this phone for a while, but can't bring myself to
ditch my Palm Centro just yet." So yeah, we were
talking about, you know, multi-touch and how Apple
had patented the multi-touch technology. Um, well, as

(25:48):
long as Google and HTC are creating a
multi-touch system that doesn't infringe on that patent, they're fine,
if they found a different way of doing it. Otherwise,
they may see a letter from Apple's lawyers.
It's not that Apple is known to be litigious or anything.
Not at all. And on that happy note, let's wind up. Guys,

(26:11):
thank you for listening. Remember, we have TechStuff Live.
It's a show that's live every Tuesday, 1 p.m. Eastern.
You can find that at the HowStuffWorks.com
blogs. Just go to the HowStuffWorks.com
home page. Look on the right side. You'll
find links to the blogs there, and you'll just look
for the one that says Watch TechStuff Live, or

(26:31):
some variation thereof, because that's where you can see our show.
And uh, we're starting to get some really good feedback
on it, but we need more more viewers. So I
know that one pm you might be working or in class,
you know, just find a quiet corner that you can
hide in and watch, and you'll love it.
It's the best use of twenty-two minutes of your
time right there. Yeah. Also, remember that we both contribute

(26:55):
to the blogs at HowStuffWorks.com. I
write articles occasionally for the website, thus the senior
writer title. There you go. So visit
HowStuffWorks.com, and Chris and I will talk to
you again really soon. For more on this and thousands
of other topics, visit HowStuffWorks.com, and
be sure to check out the new TechStuff blog

(27:17):
now on the HowStuffWorks homepage. Brought to you
by the reinvented 2012 Camry. It's ready. Are
you?
