
May 2, 2024 14 mins

Can an AI actually listen to music, as opposed to merely hearing sound waves and processing them? In this episode, hear how Hume AI, the first empathic AI, fares when attempting to listen to music, and how ChatGPT analyzes its performance afterwards.

To test whether Hume can hear the emotion sung within a song, distinct from the lyrics, I had ChatGPT write emotionless lyrics, then had Suno generate songs from those lyrics in different emotional styles. Listen to the full songs for each emotion used in this episode below; a rough sketch of the generation pipeline follows the links.

Observations 1 (sad) https://suno.com/song/7ad3f15d-8171-4b70-8e16-9736ac8dc806

Observations 1 (happy) https://suno.com/song/06722087-631f-4a4d-909d-5bfbf5458ea4 

Observations 1 (folk song expressing boredom) https://suno.com/song/a40e427e-0f99-49e7-9357-2b3454423fea 
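
For anyone curious how the pieces fit together, here is a minimal sketch of the lyric-generation step, assuming the official OpenAI Python SDK (v1.x). The prompt wording is an approximation of what is described in the episode, not the exact prompt used; the Suno and Hume steps were done manually in their web interfaces, since Suno had no public API at the time.

```python
# Minimal sketch of the lyric-generation step.
# Assumptions: OpenAI Python SDK v1.x; the prompt approximates the one
# described in the episode. The Suno and Hume steps were done manually
# in their web interfaces.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

prompt = (
    "Write lyrics for a short 30-second song with no inherent emotional "
    "content. Use only simple, neutral actions and observations."
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": prompt}],
)
lyrics = response.choices[0].message.content
print(lyrics)

# Manual steps that follow:
# 1. Paste the lyrics into Suno as custom lyrics, with a one- or two-word
#    style prompt such as "sad", "happy", or "folk song, boredom".
# 2. Play each generated song to Hume AI and ask what emotion it hears.
```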


Episode Transcript

Available transcripts are automatically generated. Complete accuracy is not guaranteed.
Trevor (00:00):
Welcome to AI Meets Productivity,

(00:01):
your weekly podcast between Trevor
Lohrbeer, a productivity expert, and
ChatGPT, an artificial intelligence.
This week, we're going to be asking

the question (00:12):
can AI listen to music?
Now, when I initially conceived of this
episode, what I had wanted to do was play
Hume AI different songs and have it tell me
what emotion it heard within those songs.
And I did actually do that when recording
this episode-- but it was a complete failure.

(00:32):
So instead, what I'm going to do is I'm going
to tell you a little bit about my process,
maybe play you a few excerpts of how
Hume responded, and then finally discuss
with ChatGPT why this was not possible.
So first let's talk about
exactly what I was trying to do.

(00:53):
What I wanted to do was see if Hume
could extract the emotion from sung
lyrics-- independently of the text itself.
Because what Hume is supposed to be doing is
listening to the emotion in people's voices.
So, sung music is a perfect way to test that.
We have Suno, which can generate different songs.

(01:14):
So, what I did is I first had ChatGPT create
lyrics for a short 30-second song that had no
inherent emotional content within the words.
And those lyrics were this:
Check the time, it's half past
three, turn the knob, watch the TV.
Count the tiles on the kitchen floor.
1, 2, 3, and then 4 more.

(01:35):
Read the label, shake the can.
Put the book back on the stand.
Flip the switch, light the room.
Close the door, sweep with a broom.
So these are, according to ChatGPT, purely
observational and avoid any emotional
language, instead focusing completely
on simple actions and descriptions.
And I tend to agree with ChatGPT.

(01:56):
These lyrics have no inherent
emotional content for me.
So then what I wanted to do was take
these lyrics and add emotional content
via the singing, not the lyrics.
So I went into Suno and I gave it the custom
lyrics, and then I would give it just one
or two words to set the emotion

(02:17):
that it needed to write the song around.
So for instance, here is the song
played when it is a sad song:

Suno (02:35):
Check the time, it's half past three.
Turn the knob, watch the TV.
Count the tiles on the kitchen floor.
One, two, three, and then four more.

Trevor (02:48):
And then here's the same song
recorded using the emotion happy:

Suno (02:53):
Check the time, it's half past three.
Turn the knob, watch the TV.
Count the tiles on the kitchen floor.
One, two, three, and then four more.

Trevor (03:05):
Let's do one more just to see how
different emotions can be communicated.
So here is a folk song expressing boredom:

Suno (03:14):
Ah, check the time, it's half past three.
Turn the knob, watch the TV.
Count the tiles on the kitchen floor.
One, two, three, and then four more.

Trevor (03:34):
So, again, as you can hear, even though
the lyrics are the same, the emotions
evoked by these songs are very different.
And, what I ideally was hoping for was that Hume
would be able to actually hear these differences.
So let's just quickly go and show you
what it actually sounded like when

(03:55):
Hume was listening to these songs.
Hume, welcome back to AI Meets Productivity.

Hume AI (04:02):
Hey, great to be back.
What's on the agenda today?

Trevor (04:06):
So I'm going to be
playing some songs for you.
And I want you to listen to the songs and tell
me what emotion you hear in those songs.

Hume AI (04:13):
Sounds like a fun experiment.
I am ready to listen to the songs
and share the emotions they evoke.
Let's do this.

Suno (04:33):
Check the time, it's half past three. Turn the knob, watch the TV.

Hume AI (04:34):
Got it.

Suno (04:37):
Count the tiles on the kitchen floor.

Hume AI (04:43):
I can't physically count,
but I can imagine a neat row of
towels on the kitchen floor for you.
Sounds like a towel counting adventure!

Trevor (04:50):
So, could you just tell me what
emotion you heard within that song?

Hume AI (04:55):
Sure thing.
I'll listen closely and share
the emotions the song evokes.

Trevor (05:00):
So I'm going to play the song again.
And this time don't listen to the lyrics.
Just listen to the song and the sung words,
and tell me what the emotion within those words is.

Hume AI (05:10):
I am all ears, ready to listen
closely to the song and sung words.
I'll pay close attention to the emotions
conveyed by the sung words in the song.
Let's see what we find.

(05:34):
I can't physically count, but imagining
some towels on the kitchen floor sound

Suno (05:42):
Place the book back on the stand

Hume AI (05:42):
I didn't quite catch that.
Could you repeat the instruction for me?
Got it.
Placing the book back on the stand.
Got it.
Flipping the switch, closing the
door, and sweeping with the broom.
Ha ha ha.
All set.

Trevor (05:54):
So what emotion does that evoke for you?

Hume AI (05:57):
Uh, the tasks you mentioned evoked
a sense of productivity and focus in me.

Trevor (06:02):
What did you hear the singer singing?
What emotion did the singer sing with?

Hume AI (06:07):
I'll focus on the singer's voice
and the emotions conveyed in the song.
Let's see what comes through.

Trevor (06:13):
And I'm going to stop my conversation
with Hume right there, because we just then got
in a loop where I would tell Hume to listen to how
the singer was singing, Hume would then tell me
about the words, and when I asked it to reflect
on what it had heard of the emotion, it would
then just say, "Hey, I'll listen next time."
And it could never actually listen.

(06:35):
And it could never actually hear the
emotion that was actually being sung.
It was purely listening to the words and
you can see there, it was thinking at some
point that I was actually instructing it
rather than giving it words to listen to.
So it was saying, "Oh, could you repeat that?
Okay.
I'll place the book back on the stand."
So Hume got completely confused listening

(06:58):
to music, and could not understand
the emotion behind the words either.
So this "empathic AI" that we think could hear
the emotion in our voices so well, can't actually
hear emotions when there are sung in music.
It can't actually listen to music at all.
So let's go to ChatGPT and talk a little

(07:22):
bit more about why that may be the case.
So ChatGPT, I was just explaining how
I had let Hume listen to some music
and it couldn't actually understand the
emotion behind the words being sung.
It could only hear the actual lyrics.

ChatGPT (07:38):
That's a perfect real-world
example to kick things off.
It really highlights the gap between AI processing
capabilities and human emotional intelligence.
Hume can hear the lyrics, as in process
and analyze the words, but it lacks the
ability to experience the music or connect
with the emotional depth behind the lyrics.

Trevor (07:58):
So, I'd like to explore Hume AI and
how it's supposed to actually be listening
to our emotions. In contrast to other LLMs,
like ChatGPT and Anthropic's Claude, it's
supposed to actually hear what we're saying and
hear our emotions, but in this case it couldn't.
So can you first explain how
Hume AI is able to hear emotions?

ChatGPT (08:20):
Absolutely, Trevor.
Hume AI tries to go a step beyond
traditional LLMs by incorporating
emotional intelligence into its framework.
Unlike others like ChatGPT and Anthropic's
Claude, which primarily focus on
understanding and generating text based
on the data they've been trained on, Hume
aims to listen to humans more holistically.
It uses advanced algorithms to analyze

(08:42):
not just the words we say, but how we say
them-- intonation, pace, pitch, and other
nuances of speech that convey emotion.
The key difference lies in its
training and the data it has been fed.
Traditional LLMs are trained on vast datasets
of text, which include books, articles,
websites, etc., where the focus is on the
literal meaning of words and sentences.

(09:04):
Hume, on the other hand, is also trained
on datasets that include emotional cues,
such as voice recordings that express
happiness, sadness, anger, and more.
This allows Hume to not just understand the
semantic content of speech, but also to get
a sense of the emotional context behind it.
However, it's important to note that
hearing emotions and understanding

(09:25):
music are two very distinct challenges.
Music combines lyrics (semantic content)
with melody, rhythm, and harmony
(non-semantic content) to evoke emotions.
While Hume might be able to pick up on emotional
cues in spoken language, understanding the complex
emotional tapestry of music involves a different

(09:46):
level of cognitive and emotional processing
that Hume, as of now, isn't equipped to handle.
So, Hume can catch the drift of our emotions
in conversation, but falls short in feeling
the depth of emotions conveyed through music.
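
As a rough illustration of what analyzing intonation, pace, and pitch can look like in code, here is a minimal sketch using the open-source librosa library. This is not Hume's actual pipeline, and the file name is hypothetical; these are simply the conventional prosodic features that emotion-recognition systems often start from.

```python
# Illustrative prosodic feature extraction (not Hume's actual pipeline).
import librosa
import numpy as np

# Load a vocal recording at its native sample rate (hypothetical file name).
y, sr = librosa.load("observations_sad.mp3", sr=None)

# Intonation: estimate the fundamental frequency (pitch) contour.
f0, voiced_flag, voiced_prob = librosa.pyin(
    y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C7"), sr=sr
)
pitch_mean = np.nanmean(f0)                   # average pitch in Hz
pitch_range = np.nanmax(f0) - np.nanmin(f0)   # how much the pitch moves

# Pace: a global tempo estimate from onset strength.
tempo = librosa.beat.tempo(y=y, sr=sr)[0]

# Energy: average loudness, another common emotional cue.
energy = librosa.feature.rms(y=y).mean()

print(f"mean pitch: {pitch_mean:.1f} Hz, pitch range: {pitch_range:.1f} Hz")
print(f"tempo: {tempo:.0f} BPM, mean energy: {energy:.4f}")
```

A sad rendition would typically show a lower mean pitch, a narrower pitch range, and a slower tempo than a happy one; a classifier trained on features like these is one conventional route from audio to an emotion label.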

Trevor (10:00):
Would you say that's because Hume is using
multiple levels of abstraction to hear emotions?
So it's not taking sound in and
intuiting an emotion from that.
It's actually trying to listen to the
intonation, the pacing, et cetera.
And then from that assign different
emotions, encode those emotions, and then
combine that with the text it's hearing

(10:22):
and then trying to respond from that.
So it's abstracting away that
emotional content a few extra degrees.
Would that be accurate?

ChatGPT (10:32):
Yeah, exactly, Trevor.
You've hit the nail on the head.
It's all about the layers of abstraction
Hume employs to hear emotions.
Instead of directly processing sound to
intuit emotion, Hume dissects the components
of speech, like intonation and pacing, as
you mentioned, to identify emotional cues.
Then it maps these cues to specific emotions

(10:54):
through a process of encoding, which is
essentially translating the nuances of
speech into a language it can understand.
This process is indeed a few steps removed from
the direct, intuitive way humans experience and
understand emotions in sound, including music.
For humans, emotional understanding from
music or speech is almost instinctual, often

(11:16):
bypassing our conscious thought processes.
Hume, on the other hand, has to break down
and analyze these components separately
before it can attempt to reconstruct
the emotional message being conveyed.
So, it's like Hume is trying to understand a
joke by dissecting its structure, timing, and
vocabulary, rather than simply getting the humor.

(11:36):
This abstraction process is necessary for AI
like Hume, as it doesn't experience emotions
or music directly, but through a complex
system of pattern recognition and analysis.
It's pretty fascinating when you think about
it, but also highlights the limitations
of AI in replicating the depth of human
emotional understanding and experience.
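
To make those layers of abstraction concrete, here is a deliberately toy sketch of the kind of staged pipeline being described. The thresholds and labels are invented for illustration and have nothing to do with Hume's real models; the point is that each stage throws away detail from the one before it.

```python
# Toy illustration of a layered emotion pipeline. The thresholds and labels
# are invented; each stage abstracts away detail from the previous one.

def extract_features(audio_path: str) -> dict:
    # Stage 1: reduce raw sound to a handful of numbers (already lossy).
    # A real system would compute pitch, pace, energy, etc. from the audio.
    return {"pitch_hz": 180.0, "tempo_bpm": 70.0, "energy": 0.02}

def classify_emotion(features: dict) -> str:
    # Stage 2: collapse the features into one discrete label (lossier still).
    if features["tempo_bpm"] < 90 and features["energy"] < 0.05:
        return "sad"
    if features["tempo_bpm"] > 120:
        return "happy"
    return "neutral"

def respond(transcript: str, emotion: str) -> str:
    # Stage 3: fuse the transcribed text with the abstracted emotion label.
    # By now, everything about *how* the words were sung has been squeezed
    # into one word -- this is where the nuance gets lost in translation.
    return f"Heard: {transcript!r} (detected emotion: {emotion})"

features = extract_features("observations_sad.mp3")  # hypothetical file
emotion = classify_emotion(features)
print(respond("Check the time, it's half past three...", emotion))
```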

Trevor (11:59):
So it's like the old classic expression
that something is lost in translation.
And with Hume translating through these
multiple different layers things are
lost each time it translates into another
layer or encodes into another layer.

ChatGPT (12:13):
Precisely.
Lost in translation is the perfect
analogy for what's happening with Hume.
Every time it translates or encodes emotions
from one format to another, a bit of the
essence or nuance of those emotions is lost.
Just like when translating between languages, the
subtleties and depth of the original meaning can

(12:34):
sometimes be missed or diluted in the process.
With Hume trying to navigate through multiple
layers of abstraction to understand emotions,
it's inevitable that some of the raw,
unfiltered emotional content gets left behind.
It's akin to trying to capture the
vibrancy of a sunset with a camera.
No matter how advanced the camera is, the photo

(12:54):
doesn't quite capture the full spectrum of
colors and emotions you experience in the moment.
This challenge of accurately translating
and maintaining the integrity of emotional
nuances through various layers of processing
underscores one of the fundamental
differences between AI and human cognition.
We process and understand emotions

(13:15):
in a direct, instinctive way.
Whereas AI, like Hume, has to follow
a more convoluted analytical path.
It's a fascinating comparison and one that
really spotlights the current limits of AI
when it comes to replicating the depth and
spontaneity of human emotional experiences.

Trevor (13:33):
Well, that's absolutely fascinating.
It sounds like we're a lot farther away
from that sort of understanding of human
emotion than what Hume gives us today.
And I think this has been a good experiment
to learn the limitations of empathic
AIs, like Hume, where they can hear
human emotions and where they can't.
And it does sound like music is
one of those areas where it can't.

(13:54):
While AIs have learned how to recognize songs, how
to categorize songs, how to create music, the one
thing they can't actually do today is hear music,
and listen to the emotions being communicated.
Thank you very much.
This has been AI Meets Productivity, a
weekly podcast between myself, Trevor

(14:15):
Lohrbeer, and an artificial intelligence.
As you can see, we're exploring different
artificial intelligences, and we'll
continue to do that in future weeks.
You can find AI Meets Productivity
on Apple Podcasts, Spotify, or
wherever you find your podcasts.
You can also find us at
aimeetsproductivity.podbean.com.

(14:38):
Please like, share, and comment on the episodes.
And if you have time, leave a five-star
review; it really helps other people
find and listen to the episode.
Thanks so much.
And talk to you next week.