
October 7, 2020 50 mins

Apple introduced Siri to the public on October 4, 2011, but the story of Siri goes further back and beyond Apple. We learn about how Siri evolved from a US Department of Defense project in the early 2000s.



Episode Transcript

Available transcripts are automatically generated. Complete accuracy is not guaranteed.
Speaker 1 (00:04):
Welcome to TechStuff, a production from iHeartRadio.
Hey there, and welcome to TechStuff. I'm your host,
Jonathan Strickland. I'm an executive producer with iHeartRadio,
and I love all things tech. And uh, you know
what, today's episode I was gonna

(00:26):
make it a one-parter, but it turns out there's
just way too much stuff, not just about the topic
at hand, but the various components that make up this
topic that require me to do more than one. So
this is gonna likely be a two-parter. But today
I thought we could look back at the development and

(00:47):
evolution of a famous AI personality. This virtual assistant celebrated
an anniversary recently, and I must apologize for being a
couple of days late with this, but this particular servant
debuted on October fourth, two thousand eleven, technically for the
second time, but the history of the actual technology dates

(01:11):
back much further. And of course, I'm talking about Siri,
Apple's virtual assistant that can interpret voice commands and return
results based on them. This is not just some dull
history lesson, however. Siri really has an incredible backstory, ranging
from a science fiction vision of the future to a

(01:34):
secret project intended to augment the decision making capabilities of
the United States military. Yeah, Siri had a pretty tough background.
The story of Siri is complicated, and not just because
of the internal history of developing the technology, but also

(01:55):
because the tool relies on a lot of converging technological trends. There are elements of voice recognition, uh, speech to text,
natural language interpretation, and other technologies that fall under the
very broad umbrella of artificial intelligence. So get settled, it's
time to talk about Siri. Also, if you're listening to

(02:17):
this near Apple devices, I apologize because there's a good
chance those devices might start talking back at me. But
I refuse to do an episode where I just refer
to the subject as you know who. You could argue
that the origins of Siri can be found in a

(02:37):
promotional video that Apple produced back in nineteen eighty-seven to
show off a concept of an artificially intelligent smart assistant.
Now that alone is interesting, but what really is amazing
is that the arbitrary date they chose as the setting
for this video was two thousand eleven, probably September. We

(03:01):
know that because there is a part within the video
where a character asks for information that had been published
five years previously, and the published information had a publication
date of two thousand six. Now this means that the
actual debut of Siri as an Apple product was just

(03:22):
one month after the fictional events in that video from nineteen eighty-seven.
That's just a coincidence, but it's a cool one. The
Knowledge Navigator video shows a man walking into a study,
really nice one, and unfolding a tablet style computer device.
Then he walks off to stare at stuff as

(03:45):
a virtual assistant reads off his messages and meetings on
his calendar. The virtual assistant appears as a video and
a little window on the screen of the tablet, and
it's, you know, like shot from the shoulders up, kind of the bust of a young man, and the
video takes up that one little corner of the tablet device.

(04:06):
So in this visualization, the virtual assistant isn't just a
disembodied voice. It also has a face. Also, everyone in
this video is extremely white, which I guess is kind
of a given for the time period and the people involved,
but it just comes across as so white. I mean,

(04:29):
we're doing this with the benefit of the glasses of twenty twenty. I just wanted to throw that out there anyway.
The video goes on to have the real life man
who is a professor in this video, ask his virtual
assistant to pull up lecture notes uh and unread articles
that relate back to the lecture. He's asking for

(04:50):
the lecture notes of a lecture he gave a year ago.
He's giving essentially the same lecture now, but he wants
to update it with the latest information, and he even
asks the virtual assistant to summarize those unread articles that
had been published in the year since his last lecture.
The virtual assistant is thus aggregating information, analyzing that information

(05:12):
for context, and then delivering summaries, which is a pretty sophisticated set of artificially intelligent tasks. The professor also uses the device and virtual assistant to call and
collaborate with a peer in real time. Now, this was
not the only video that Apple would produce to showcase

(05:33):
this kind of general idea, however, arguably it is the
most famous of those videos. Now, as I said, Knowledge
Navigator came out of Apple, and Steve Jobs would later
play a pivotal role in how the company would introduce Siri,
but this was not a Steve Jobs project, because Jobs

(05:56):
had been ousted from the company Apple, or he had
quit in disgust, depending upon which version of the story
you're listening to. Anyway, he had left a couple of
years before this video was produced. The Knowledge Navigator was
something that Apple CEO John Sculley had described in a
book titled Odyssey. Now, of course, in science fiction stories

(06:19):
we have no shortage of instances where a human is
interacting with a computer or otherwise artificially intelligent device like
a robot, but the Knowledge Navigator seemed to lay down
the foundations toward future products like Siri and the iPad,
not to mention the potential uses of the Internet, which
in nineteen eighty-seven was definitely a thing. It existed, but most of

(06:44):
the mainstream public remained unaware of it because the World Wide Web wouldn't even come along for another few years. However,
while you can look at this video and say, ah,
this must be where Apple got that idea, they probably
got to work right away on Siri, well you'd be
wrong because the early work, in fact, the vast bulk

(07:07):
of the work on Siri to bring it to life,
didn't start at Apple at all. It didn't involve the company.
So our story now turns to a very different organization,
the Defense Advanced Research Projects Agency, better known as DARPA.
Now this is part of the United States Department of Defense.

(07:30):
Back in nineteen fifty eight, the then President of the
United States, Dwight D. Eisenhower, authorized the foundation of this agency,
though at the time it was called the Advanced Research
Projects Agency, or ARPA. Defense would be added later. This
agency would play a critical role in the evolution of
technologies in the United States, and the mission of DARPA

(07:54):
and ARPA before it is, quote, to make pivotal investments in breakthrough technologies for national security, end quote, and
that wording is really precise. It's easy to imagine DARPA
as being housed in some enormous underground bunker filled with
scientists who are building out crazy devices like robo scorpions

(08:16):
or a blender that can also teleport or something. But
in reality, DARPA is more about funding research than conducting research. Now,
don't get me wrong, the agency relies heavily on experts
to evaluate proposals and consider to whom the agency should
send money. But the purpose of DARPA is to enable

(08:37):
others to do important work. DARPA has played a huge
role in countless technological breakthroughs this way. Much of the technology that would go on to power the Internet started with ARPANET, a kind of precursor network to the
Internet and one that was funded by ARPA. Thus the

(08:57):
name. The DARPA Grand Challenge helped get self-driving
cars into gear. You know, pun intended. They also created
difficult scenarios for humanoid robots to go through. That was
a few years ago and was really cool. The competitions
DARPA hosts have specific goals and metrics, and that guides

(09:17):
the designers and engineers who are working on them as
they build out technologies. It's good to define your goal.
It really gives you focus when you're trying to develop
the technology to meet that goal. Winning a challenge is
a big deal, though the cash prize may not even
cover the amount of money participants have spent through the

(09:37):
development of those technologies, and there are entire businesses, or
at least divisions within businesses that can be borne out
of these challenges. The Grand Challenges are just one way
DARPA encourages technological development. Often, the agency will create a
specific goal such as the design of a robotic exoskeleton

(09:59):
that can help you know, US soldiers carry heavy loads
while they are on foot for longer distances, and then
they'll send out an RFP, which is a request for proposal.
The agency considers the proposals that it receives from this
RFP and then decides which, if any, they will accept
and then fund. Then, after a given amount of time,

(10:22):
you know, it's dependent upon the specific project, we find
out if anything comes out of it. Sometimes nothing does,
as some technological problems may prove more challenging than others
and may require more time to evolve the various technologies
to make it possible. So it might push the field,
but you might not have a finished product at the

(10:42):
end of it. Other times you do get a finished
product anyway. In two thousand three, a decade and a
half after the Knowledge Navigator videos came out of Apple,
DARPA identified a new opportunity, and this was one that
was borne out of necessity. The challenge was that we
have access to way more information today than we did

(11:04):
in the past. So decades ago, military commanders had to
make decisions based on limited information. They'd rely a great
deal on their own expertise and experience in order to
make up for the fact that they only had part
of the picture. And while a great commander has a
better chance of making the right call than an inexperienced

(11:26):
commander would, the limited amount of information could still contribute
to disaster. You might be the greatest commander of all time,
but if you're lacking a key part of information, you
might make a decision that is terrible. So flash forward
to two thousand three, and now the story had kind
of flip flopped. Now military commanders would receive more information

(11:48):
than they could reasonably handle. The challenge now wasn't to
use intuition to make up for blind spots, but rather,
how do you synthesize all this information so that you
can make the right decision. Too much information was proving
to be kind of as big a problem as too
little information, at least in some cases, and so DARPA

(12:11):
wished to fund the development of a smart system that
could help commanders make sense of all the data coming
in from day to day. Now, DARPA projects tend to
be labyrinthian, with lots of bits and pieces and a
lot of different companies and research labs and more organizations
might tackle all or part of one of these projects.

(12:34):
The cognitive computing section of DARPA had a program called
Personalized Assistant that Learns, or PAL, which seems nice. It
was this part of the program that would fund the
development of a virtual cognitive assistant. The amount of funding
was twenty two million dollars. What a great PAL. The

(12:57):
organization that landed this deal was SRI International,
itself an incredibly influential organization. It's a nonprofit scientific research institution.
Originally it was called the Stanford Research Institute because it
was established by the trustees of Stanford University back in

(13:20):
nineteen forty six, though the organization would separate from the
university formally in the nineteen seventies and become a standalone,
nonprofit scientific research lab. The organization has played a role
in advancing materials science, developing liquid crystal displays, or LCDs, creating telesurgery implementations, and more. And now

(13:43):
it was going to tackle DARPA's request for a cognitive
computer assistant. SRI International created a project called the Cognitive Assistant that Learns and Organizes, or CALO if you prefer. And this appears to be another
case where they landed upon that acronym first and then

(14:05):
worked backward, as CALO seems to come from the Latin word calonis, which means soldier's servant, and I probably mispronounced
that because even though I was a medievalist, it's almost
criminal I never took Latin. The concept, however, hearkens back
to some of what we would see in that Knowledge

(14:26):
Navigator video from nineteen eighty-seven: a system that would be able to
receive and interpret information, presumably from multiple sources, and provide
a meaningful presentation or even interpretation of that data to humans,
which is a pretty tall order, and let's break down
a bit of what an assistant would need to do

(14:47):
in order to accomplish this. We'll leave off the voice
activation parts for now, as that would not be absolutely
critical to make this work. You know, you might have
a system that gives daily briefings on its own, or
you might have one that you activate through text commands
or some other user interface. It wouldn't necessarily have to
be voice activated. But on the back end, what has

(15:08):
to happen for this to work well? Presumably such a
system would need to pull in data from a number
of disparate sources, so the assistant wouldn't just be reciting
facts and figures that were coming from a centralized data server. Instead,
it might be assimilating data from numerous sources into a
cohesive presentation. On top of that, the data might be

(15:31):
in different formats, meaning the system would need to be
able to analyze the information inside different types of files.
This isn't an easy thing to do. There's a reason
we have a lot of specialized programs for working with
specific types of files. When I put together these podcasts,
I use a word processor for my notes, and I

(15:53):
use a piece of audio editing software to record and
edit the podcasts. Now I need both of those programs
because neither of them can do the job that the
other one does. I don't have like a all purpose
program that does everything. Accessing different file formats, even in
the same general family of applications is tricky. Beyond that,

(16:16):
the way information can be presented within each file could
be very different. It's very possible for us to open
up multiple spreadsheets and even using the same basic spreadsheet
program, let's just say Excel. It's possible for us to
open up half a dozen Excel spreadsheets that are all
presenting the same information but doing so in different ways,

(16:38):
and that might not be obvious at casual glance. You
might look at one and the other and not immediately realize, oh,
these are both saying the same thing. Just think about
how information could be presented as a table or a
graph or a chart. The AI assistant would ideally be
able to access information no matter what format it was in.
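
As a small illustration of what pulling the same kind of information out of different formats involves, here's a sketch that reads equivalent records from a CSV source and a JSON source into one common structure so they can be analyzed together. The data and field names are made up for the example.

```python
import csv
import io
import json

def records_from_csv(text):
    # Each CSV row becomes a dict keyed by the header names.
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]

def records_from_json(text):
    # Assume a list of objects carrying the same fields as the CSV.
    return json.loads(text)

# Two hypothetical sources describing the same kind of report in different formats.
csv_source = "date,summary\n2003-05-01,Supply convoy delayed\n2003-05-03,Bridge repaired\n"
json_source = '[{"date": "2003-05-02", "summary": "Weather grounded flights"}]'

# Normalize both into one common structure so they can be analyzed together.
records = records_from_csv(csv_source) + records_from_json(json_source)
latest = max(records, key=lambda r: r["date"])
print(latest["summary"])  # "Bridge repaired" -- the most recent report across both sources
```
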

(16:59):
No matter what version of that format it was in, the assistant would need to be able to interpret it and then be able to deliver a meaningful analysis to the user. Now, as data
sets grow, this becomes increasingly difficult, which I should point
out is the whole reason DARPA wanted to fund research
into this in the first place. Military commanders were faced

(17:19):
with a growing mountain of information that was increasingly difficult
to parse. The analysis might also need to incorporate natural
language recognition features. And I've talked about natural language a
lot in previous episodes, but if we boil it down,
it's the language that we humans use to communicate with
one another. It's our natural way of expressing our thoughts.

(17:43):
But the way we humans process and communicate information is
different from how machines do it. We can be subtle.
We can use stuff like metaphors and allegories and just
different phrasing. Computers are, you know, a lot more literal. Hey,
if you break it down to the most basic unit
of machine information, you know, the bit. You see how

(18:06):
literal computers are. A bit is either a zero or
a one, or if you prefer, it's either off and
on or no and yes. But using lots of bits,
we can describe information in a way that provides more
subtlety than just no or yes. But my point is that
computers don't naturally process information the way we do, and

(18:28):
so an entire branch of artificial intelligence called natural language
processing evolved to create ways for computers to interpret what
we mean when we express things within natural language. Making
this more complicated is that, of course, there's no one
way to say any given thing. We've got lots of

(18:49):
ways to express the same general thought. And added to that,
we have lots of different languages. There are around seven
thousand different languages spoken in the world today,
though you could probably get away with a couple of
dozen and cover the vast majority of the world's population
that way. But these languages have their own vocabularies, their

(19:11):
own syntaxes, their own expressions. So not only do we
have multiple ways of saying things within one language, we
have all these different languages to worry about. If you
were to send ten people into a room with an
AI assistant, and those ten people have a task they're
supposed to perform with the help of this AI assistant,

(19:33):
odds are no two people are going to go about
it exactly the same way. And yet a working virtual
assistant needs to be able to interpret and respond to
every case and do so reliably. On the back end, an AI system needs to be able to interpret data
coming from different sources that may have very different ways
of expressing similar ideas. This is an enormous task. Now,

(19:58):
when we come back, I'll talk more about what SRI was doing and how the military project would ultimately evolve into Apple's personal assistant. But first let's take
a quick break. Now I've only scratched the surface of

(20:19):
what the creation of an AI assistant capable of accessing information from numerous sources and making that information useful really requires. Let's talk a bit about the parameters of
this project itself. So if you remember I said that
the deal was initially for twenty two million dollars, and
that would end up funding the creation of a five

(20:42):
hundred person project, and the project spanned five years initially
to investigate the possibility of building out such an AI system.
Over time, more money would end up going into the
research system, and it totaled around a hundred fifty million
dollars by the end of the project. The lab
where it all went down would receive the charming nickname

(21:05):
nerd City. A large part of the project focused on
creating a program that could learn a user's behaviors. So
not only could this personal assistant respond to what you
were asking, it would gradually learn the way you behaved
and it would adapt to you to work more effectively.

(21:26):
Now this comes into the arena of pattern recognition. We
humans are pretty darn good at recognizing patterns. In fact,
we're so good that sometimes we will quote unquote recognize
a pattern even when there isn't a pattern there. In
some cases, this can come across as charming, such as

(21:48):
when we see a face in a cloud, right, that's
not really a pattern there. We're recognizing a pattern where
none really exists. It's all based on our perspective and our imaginations. Now, in other cases, it's not so charming.
It can actually lead to faulty reasoning. So I'm going
to give you a very basic example that I hear
all the time, particularly now that we're in October and

(22:11):
there's some full moon weirdness going on. So there's a
fairly widespread belief that there's a connection between full moons
and an increase in the number of medical emergencies that happen.
Generally speaking, that people act irresponsibly during a full moon,
and that often results in injury, which means greater activity

(22:33):
at hospitals. Now, this belief is most likely due to
confirmation bias. That is, we already have a belief in place,
and the belief is that full moons lead to more
accidents because of people acting irresponsibly. That is what we believe.
It doesn't have evidence yet, and then when things do

(22:55):
get busy at a hospital and there happens to be
a full moon, we register that as evidence for our belief. Aha,
says the mistaken person. The full moon explains it. However,
on nights when it is busy but there is no
full moon, there's no hit. No one takes notice of how odd it is, you know, it's crazy busy, but

(23:17):
there's no full moon tonight. We don't do that. Likewise,
if it happens to not be busy but there's a
full moon, you're also not likely to notice. You're not
likely to say, like hunt, it's not very busy tonight,
but there's a full moon out. So it's only when
you have the full moon and the busy hospital where
the evidence appears to support your belief and confirm your bias.

(23:42):
But in truth, when you take a step back and
you do an objective study and you look at the
times when a hospital is busy, and you look at
when there was a full moon, and you look to
see if there's any correlation, it falls apart. Now I
got a little off track there, But the point I
wanted to make is that we humans are biologically attuned

(24:03):
to recognizing patterns. It's very likely that pattern recognition is
one of the traits that really helped us survive thousands
of years ago, which is why it's so intrinsic in
the human experience. But building programs, computer systems that are
capable of identifying patterns and separating out what is signal

(24:24):
versus what is noise is its own really big challenge.
S r I was hoping to create a program that
could look for patterns and user behavior in order to
respond with greater precision and accuracy to user requests and
ultimately to anticipate future requests. Now we see the sort
of pattern recognition and response in lots of technology today.

(24:48):
There are several smart thermostats on the market right now,
for example, that can track when you tend to raise
or lower the temperature in your home, and after a while,
the thermostat learns that, hey, maybe you like it nice
and chilly at night, but you want it to be
warm and toasty in the morning, and so the thermostat
begins to adjust itself in preparation for that based on

(25:10):
your previous behaviors. Now that is a very simple example.
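
To make that concrete, here's a minimal sketch of the kind of schedule-learning a thermostat like that might do. The hour-by-hour averaging, the class name, and the numbers are all illustrative assumptions, not any particular product's algorithm.

```python
from collections import defaultdict

class LearningThermostat:
    """Toy sketch: learn a per-hour setpoint from past manual adjustments."""

    def __init__(self, default_temp=70.0):
        self.default_temp = default_temp
        self.history = defaultdict(list)  # hour of day -> setpoints the user chose

    def record_adjustment(self, hour, setpoint):
        # Each time the user changes the temperature, remember when and to what.
        self.history[hour].append(setpoint)

    def suggested_setpoint(self, hour):
        # Anticipate the user: average what they usually pick at this hour,
        # falling back to the default when there's no pattern yet.
        past = self.history[hour]
        return sum(past) / len(past) if past else self.default_temp

thermostat = LearningThermostat()
thermostat.record_adjustment(hour=23, setpoint=65)  # likes it chilly at night
thermostat.record_adjustment(hour=7, setpoint=72)   # warm and toasty in the morning
print(thermostat.suggested_setpoint(7))    # 72.0
print(thermostat.suggested_setpoint(15))   # 70.0, no pattern recorded yet
```
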
Extrapolate that out and you begin to imagine a technology
that is anticipating what you need or want, perhaps before
you're even aware of it yourself, which can get kind
of creepy but also sort of magical. But in truth,
it's because this system is detecting patterns that we aren't

(25:34):
even able to recognize ourselves. The danger there, of course,
is that the systems can sometimes mistakenly identify a pattern
when in fact there's not really a pattern there. Very
similar to the case I was explaining about with the
full moon and the busy hospital. Even computer systems can
make those sort of mistakes, and depending upon the implementation,

(25:56):
that can be a real problem. But that's an issue for a different podcast. Now, when it comes
to humans, pattern recognition is so ingrained in most of
us that it can actually be kind of hard to explain.
You notice when something happens, and if that same thing
happens later with the same general results as the first time,

(26:17):
it reinforces your first perception of that thing, and if
it happens over and over, your brain essentially comes to understand that when I see X happen, I can expect Y to follow, and from that you might eventually realize
that there are other correlating factors that may or may
not be present when this goes on. With computers, the

(26:39):
goal is to create systems that can analyze input, whether
that input is an image file or typed text or
spoken words or whatever, and it first has to interpret
that input, has to identify it and figure out the
defining features and attributes of that input, then compare that
against known patterns to see if the input matches or

(27:02):
doesn't match those patterns. And in a way, you can
think of this as a computer system receiving input and
asking the question have I seen this before? And if so,
what is the correct response? If the input matches no pattern,
the system then has to have the correct response for that.
So a very simple example might just be a failed state,

(27:25):
in which case the virtual assistant might reply with something
like I'm sorry, I don't know how to do that yet,
or something along those lines. Now, remember earlier I mentioned
that we humans have a lot of different ways to
say the same general thing. For example, with my smart speaker,
I might ask it to turn the lights on full,

(27:45):
meaning I want them to be all the way up.
I might say make the lights. I might just say
make it brighter. And the system has to take this input,
analyze it, and make a statistical determination to guess at
what it is that I actually want to have happen. I
say guess because in each case we're really looking at

(28:06):
a system that has multiple options when it comes to
a response, and each option gets a probability assigned to
it based on how closely that option matches with the input,
So I might say, make it brighter, and the underlying system recognizes that there's a good chance I mean, increase the brightness of the lights in the room I'm in,

(28:29):
and the system has determined that that's the most probable answer. Right, it's probably correct, so it goes with that, but still kind of a guess.
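
To give a rough sense of that idea, here's a toy sketch that scores a few candidate intents against a request and falls back to an apology when nothing matches well. The intents, the word-overlap scoring, and the threshold are all invented for illustration; no real assistant works exactly this way.

```python
# Toy intent matcher: score a few candidate intents by word overlap, pick the
# most probable one, and fall back when no option is a confident match.
INTENTS = {
    "lights_brighter": {"make", "lights", "brighter", "bright", "on", "full"},
    "lights_off": {"turn", "lights", "off", "dark"},
    "play_music": {"play", "music", "song"},
}

def interpret(request, threshold=0.3):
    words = set(request.lower().split())
    scores = {}
    for intent, keywords in INTENTS.items():
        # Crude stand-in for a probability: what share of the request's words
        # line up with this intent's vocabulary?
        scores[intent] = len(words & keywords) / len(words)
    best_intent, best_score = max(scores.items(), key=lambda kv: kv[1])
    if best_score < threshold:
        return "I'm sorry, I don't know how to do that yet."
    return f"{best_intent} (score {best_score:.2f})"

print(interpret("make it brighter"))         # lights_brighter, a fairly strong match
print(interpret("turn the lights on full"))  # lights_brighter again, different phrasing
print(interpret("order me a pizza"))         # no good match, so it falls back
```
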
Now, there are a lot of different ways to go about doing this, but the one you hear about a lot would be artificial neural networks. I've talked a lot about these in recent episodes, so

(28:51):
we'll just give kind of the quick overview. So you've
got a computer system that has artificial neurons. These are called nodes,
and the job of a node is to accept incoming
input from two or more sources. The node is then
to perform an operation on those inputs, and then it
generates an output, which it then passes on to other

(29:13):
nodes further in the system. You can think of the
nodes as existing in a series of levels, with the
top level being where input comes in and the bottom
level being where the ultimate output comes out. So the
nodes a level down accept incoming inputs, then perform other operations on them and pass it further down the chain

(29:34):
and so on until ultimately you get an output or response.
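
Here's a tiny sketch of that layered arrangement, just to show the shape of it: each node combines the outputs of the layer above using weights and passes its own output down. The network size, weights, and sigmoid activation are arbitrary choices for the example.

```python
import math

def node(inputs, weights, bias=0.0):
    """One artificial neuron: a weighted sum of inputs squashed by an activation."""
    total = sum(i * w for i, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-total))  # sigmoid activation

def forward(inputs, layers):
    """Push inputs through a list of layers; each layer is a list of (weights, bias)."""
    activations = inputs
    for layer in layers:
        activations = [node(activations, weights, bias) for weights, bias in layer]
    return activations

# Two inputs feed a hidden layer of two nodes, which feeds one output node.
layers = [
    [([0.5, -0.2], 0.1), ([0.8, 0.3], -0.4)],  # hidden layer
    [([1.2, -0.7], 0.0)],                      # output layer
]
print(forward([1.0, 0.0], layers))  # one number between 0 and 1: the network's output
```
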
Now that's a gross oversimplification of what's going on, but
generally you get the idea of the process. Now, let's
complicate things a little bit to get these sort of
neural networks to generate the results you want. One thing
you can do is mess with how each node values

(29:56):
or weighs each of the inputs coming into that node.
So I'm going to use some names, human names, for
nodes here just to make things easier to understand. Let's
say we've got a node named Billy. Billy is on
the second layer of nodes, so it's one layer down
from where direct input comes into the system. So there

(30:19):
are nodes above Billy that are sending information to Billy.
We'll say that the two nodes that give Billy information
are named Sue and Jim Bob. Sue and Jim Bob
send Billy information, and it's Billy's job to determine what
further information to send down the pipeline. Like I need
to do an operation based on these

(30:42):
bits of information that are coming to me, and then
I have to come up with a result. Only Billy
has been told that Sue's information tends to be a little more important than Jim Bob's information is, and so if there's a question as to what to do, it's better to lean more on Sue's information than on Jim Bob's information.

(31:03):
We would call this weighting, as in W-E-I-G-H-T-I-N-G. Computer scientists weight the inputs going into nodes in order to train a system to generate the results the scientists want. One way
to do this is through a process called back propagation.
Back propagation is when you know what result you want

(31:27):
the system to arrive at. So let's use the classic
example of identifying pictures that have cats in them. As
a human, you can quickly determine if a photo has
a cat in it or not. You'll spot it right away.
So you feed a picture through this system and you
wait for the system to tell you if yes, there's
a kitty cat in the picture or no, the image is

(31:50):
cat free. And let's say that the picture you fed
to the system in fact does have a cat in it.
You can see it, but when you feed it through
the system, the system fails to find the cat
and says nope, there's no cat here. Well, you know
that the system got it wrong. So what you might
do as a computer scientist is you look at that

(32:11):
final level of nodes right at the output level to
see which factors led those nodes to come to the
conclusion that there was no cat in the photo. You
then look at the inputs that are coming into those
nodes and you see how they are weighted, and you
change the weights of those inputs in order to force

(32:31):
that last level of nodes to say, oh, no, there
definitely is a cat here. And so on. You move
up from the output level and you go up level
by level, tweaking the weightings of incoming data so that the system is tweaked to more accurately determine if a
photo has a cat in it. Now, this takes a

(32:51):
lot of work, and it also means using huge data sets.
You know, you're feeding hundreds of thousands or millions of images,
some of them with cats, some of them without, and
training the system over and over again to train it
before you start feeding it brand new images to see
if it still works. And this can be a laborious
process to train a machine learning system, but the result

(33:14):
is that you end up with a system that hopefully is pretty accurate at doing whatever it was you were training it to do, you know, like recognizing cats. But that's just one approach to machine learning. There are others.
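
To give a flavor of what that weight-tweaking looks like in code, here's a heavily simplified supervised-training sketch: a single weighted node is nudged toward the right yes/no answer on a handful of labeled examples. It captures the spirit of adjusting weights based on errors, but it's a single-node stand-in (a perceptron-style update), not real backpropagation through many layers, and certainly not how a production cat detector is built.

```python
# Toy supervised training: one weighted "node" learns to label examples
# (features -> is it a cat?) correctly. Every mistake nudges the weights a
# little, which is the spirit of error-driven weight adjustment, boiled way down.
examples = [
    # (has_whiskers, has_pointy_ears, has_wheels) -> 1 = cat, 0 = not a cat
    ((1.0, 1.0, 0.0), 1),
    ((1.0, 0.0, 0.0), 1),
    ((0.0, 0.0, 1.0), 0),
    ((0.0, 1.0, 1.0), 0),
]

weights = [0.0, 0.0, 0.0]
bias = 0.0
learning_rate = 0.1

def predict(features):
    total = sum(f * w for f, w in zip(features, weights)) + bias
    return 1 if total > 0 else 0

for _ in range(20):                        # many passes over the labeled data
    for features, label in examples:
        error = label - predict(features)  # -1, 0, or +1
        for i, f in enumerate(features):
            weights[i] += learning_rate * error * f  # nudge toward the right answer
        bias += learning_rate * error

print(predict((1.0, 1.0, 0.0)))  # 1: whiskers and pointy ears look like a cat
print(predict((0.0, 0.0, 1.0)))  # 0: wheels do not
```
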
Some, like the version I just described, fall into a broad category called supervised learning. Others fall under unsupervised learning.

(33:37):
In fact, CALO was largely built through unsupervised learning, meaning
the machine had to train itself as it performed tasks
using inputs that hadn't been curated specifically for training purposes.
It's just an enormous amount of information coming in that
the system has to process. So, in other words, for CALO,

(33:57):
the system wasn't dealing with, like, a stack of a million photos, seventy percent of which had cats and the rest of which didn't. CALO was working with real-world information and attempting to
suss out what to do with it in real time. Now,
to go into how unsupervised machine learning works would require
a full episode on its own, but it is a

(34:19):
fascinating and complicated subject, so I probably will tackle it
at some point. I'm just gonna spare you guys for
right now. The real point I'm making is that SRI International spent years building out systems that could
do a wide range of tasks based on inputs. Pattern
recognition was actually just one relatively small piece of that.

(34:40):
Creating an ability to pull data from different sources in
a meaningful way is its own incredibly challenging problem, as
I alluded to earlier, particularly as the number of sources
you're pulling from and the variety of formats the data
is in begins to increase, it becomes easier for the
system to make mistakes as you throw more variety at it,

(35:01):
and it requires a lot of refinement. Frankly, it's actually
a task that's so big I have trouble grasping it.
The CALO project became the largest AI program in history
up to that point. It was an incredible achievement. It
brought together different disciplines of artificial intelligence into a cohesive

(35:22):
project with a solid goal. By the two thousands, artificial
intelligence was a sprawling collection of computer science disciplines, each
with incredible depth to them. So you might find an
expert in one field of AI who would have little
to no experience with another branch under the same general
discipline of artificial intelligence. There was a prevailing feeling that

(35:45):
the various branches of AI had each become so complex
they would never work together. The CALO project proved that wrong.
When we come back, I'll explain how part of this
military project would break away to become the virtual assistant,
ultimately finding its way onto iOS devices. But first let's
take another quick break. Adam Cheyer, whose name I'm likely mispronouncing,

(36:19):
and I apologize, but he was an engineer at SRI working on CALO, and he worked with a team that had the daunting task of assimilating the work that was being done by twenty-seven different engineering teams
into a cohesive virtual assistant. So, as I mentioned just
before the break, the disciplines of AI had each gotten

(36:40):
very deep, very broad, and required a lot of specialization.
So you have these different engineering teams working within various disciplines,
and it was Cheyer's team that needed to bring all these together and make it into a working, coherent whole.
The results were really phenomenal. Now I'll give you a
hypothetical use for CALO. Let's say that you've got a

(37:04):
project team and there are ten people on your team,
including you, and let's say there's a meeting that's on
the books for tomorrow morning at a particular conference room,
and it's supposed to be a status update meeting for
the project. It turns out that two out of the
ten people on your team are no longer able to

(37:25):
make the meeting due to last minute high priority conflicts,
so they've had to cancel out of the meeting. CALO
would be able to detect the change in status of
those two people and say, all right, these two are
no longer going to the meeting. Then CALO could determine
how important those two people were to the overall team,

(37:46):
essentially saying, what are their roles? What role are
they performing within the context of this team, and is
it a critical role for this meeting. It can also
look at the importance of the meeting itself, like, oh, well,
this is a status update, so it's really just to
keep the team, you know, informed of what's going on.
It's not a mission critical type of meeting. It could

(38:08):
take all that into account. Then CALO can make a
determination on its own whether or not it should keep
the meeting in place and go ahead just without those
two people and maybe just send updates to those two people,
or to cancel the meeting entirely notifying all the participants
about it. Then look at the different calendars of those participants,

(38:28):
book a new meeting, including securing a space for that
meeting and sending out new invites. It would even be
able to look at the purpose of the meeting and
flag information that's relevant to that meeting, essentially creating a
sort of meeting dossier on demand. So it's really, you know,
incredibly sophisticated stuff.
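
Just to make the shape of that reasoning concrete, here's a small hypothetical sketch of the meeting decision described above. It is not CALO's actual code or design, only an illustration of keep-versus-reschedule logic under some invented rules.

```python
from dataclasses import dataclass

@dataclass
class Attendee:
    name: str
    role: str        # e.g. "project lead" or "engineer"
    attending: bool

def decide_meeting(meeting_type, attendees, critical_roles=("project lead",)):
    """Illustrative keep-or-reschedule decision for a single meeting."""
    dropped = [a for a in attendees if not a.attending]
    critical_dropped = [a for a in dropped if a.role in critical_roles]

    # A routine status update can go ahead without a couple of non-critical people;
    # losing a critical role, or most of the room, forces a reschedule instead.
    if meeting_type == "status_update" and not critical_dropped and len(dropped) < len(attendees) / 2:
        return "keep_meeting", [a.name for a in dropped]         # just send these folks the notes
    return "cancel_and_reschedule", [a.name for a in attendees]  # notify everyone, rebook later

team = [Attendee("Lead", "project lead", True),
        Attendee("Eng 1", "engineer", False),
        Attendee("Eng 2", "engineer", False)]
team += [Attendee(f"Eng {i}", "engineer", True) for i in range(3, 10)]  # ten people total

decision, notify = decide_meeting("status_update", team)
print(decision, notify)  # keep_meeting ['Eng 1', 'Eng 2'] -- send the two absentees an update
```
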

(38:53):
Now, that was the fully fledged CALO, but an offshoot of this project, or maybe it's better to say a smaller sister project that existed at the same time, launched in two thousand three along with CALO. This other one was called Vanguard, at least within SRI, and it was taking a more scaled-down approach of building out an assistant and

(39:15):
looking at how it could be useful on mobile devices. Now, again,
this was in two thousand three, before smartphones would really
become a mainstream product because Apple wouldn't even introduce the
iPhone until two thousand seven. But SRI was
working on implementations of a more limited virtual assistant and
then showing it off to companies like Motorola. One person

(39:37):
at Motorola who was really impressed with this work was
a guy named Dag Kittlaus. Kittlaus attempted to convince his superiors at Motorola that Vanguard was a really important piece
of work, but he didn't find any real interest over
at Motorola, so he did something fairly brazen. In two

(39:57):
thousand seven, he quit his job at Motorola and he
joined SRI International with the intent of exploring ways to
spin off a new business that would develop an implementation
of the CALO/Vanguard virtual assistant, but for the consumer market. The result would be a new company called Siri,

(40:19):
S-I-R-I, which is kind of the way you would say SRI if you were trying to pronounce it as if it were an acronym as opposed to an initialism. Adam Cheyer, after some convincing from Kittlaus,
joined the venture as the vice president of engineering. Kittlaus would be the CEO. Tom Gruber, who had studied

(40:40):
computer science at Stanford and then pioneered work in various
fields of artificial intelligence, would become the chief technology officer
for the company. Interestingly, the Siri team didn't initially call their own virtual assistant project Siri. Instead, the new spinoff company,

(41:00):
Siri would call their virtual assistant HAL, H-A-L, after the AI system in the book and film two thousand one. They did take an extra step to reassure people that this time HAL would behave itself. So, if
you're not familiar with the story of two thousand one,
the artificially intelligent computer system HAL begins to malfunction and

(41:24):
begins to interpret its mission in such a way that
it compels it to start killing off the crew inside
a spacecraft, kind of a worst case scenario with AI.
While Siri began to get off the ground, it was licensing technologies from SRI to power the virtual assistant,
and it also began to hire the talent needed to

(41:44):
bring this idea to life. At the same time, Apple
was pushing the smartphone industry into the limelight with the
introduction of the first iPhone. This was all happening in
two thousand seven. It was clear that the push for
a virtual assistant was coming at just the right time,
as Apple's implementation of smartphone technology was a grand slam

(42:06):
home run. To use a sports analogy, it soon became
obvious that the future of computing was going to be,
at least in large part mobile That in turn opened
up opportunities to create new ways to interact with mobile
devices in order to do the stuff we needed to
do now. It's obvious to say this, but mobile devices

(42:28):
have a very different user interface from your typical computer.
Interacting with a handheld computer by tapping on a screen
or talking to it creates different opportunities for crafting experiences
than someone sitting down to a computer with a keyboard
and mouse. There's a potential need for a voice activated

(42:48):
personal assistant that could help you carry out your tasks,
particularly ones that might need multiple steps. Siri the company came along just as the need for Siri the app
was beginning to take shape, so it was the right
place at the right time. In two thousand seven, Apple
had not yet opened up the opportunity for independent app

(43:11):
developers to submit apps for the iPhone. That wouldn't actually
happen until July tenth, two thousand eight, essentially a year after the iPhone had debuted. The Siri team was still
hard at work building out the virtual assistant app they
had in mind in two thousand and eight, while they
were licensing technology from SRI International, you know,

(43:32):
from the Vanguard and CALO projects, they still
had to build out the systems that would actually power
Siri on the back end. Generally speaking, their approach was to create an app where a person could ask Siri a
question and the app would record that request as a
little audio file, send that audio file to a server

(43:53):
and a data center, and the first step then would
be to transcribe the audio file into text, so we're
talking about speech to text here. Then the system would
need to parse the request. What is actually being asked here?
What is the command or request saying? Now, in some systems,
a computer will break down a sentence into its various components,

(44:15):
you know, a subject, verb, and object, and then try
to figure out what is actually being said. Adam Cheyer
took a different approach with his team. They taught their
system the meaning of real world objects. So, rather than
trying to parse out what a sentence meant by first
figuring out what's the subject, what's the verb, and what's

(44:38):
the object that the subject is acting upon, Siri started
off by looking at real world concepts within the request.
Siri would then map the request against a list of
possible responses and then employ that statistical probability model that
I mentioned earlier. What are the odds that someone was

(44:59):
asking for directions to an Italian restaurant versus asking
Siri to provide a recipe for an Italian dish, for example.
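
Here's a rough sketch of that general flow on the back end, picking up after the audio has already been turned into text: pull out real-world concepts, rank a few possible interpretations with rough probabilities, and fold in context like location and learned habits. Every function, data structure, and number here is a hypothetical stand-in, not Siri's actual design.

```python
# Hypothetical back-end flow for one request, starting from already-transcribed text.
KNOWN_CONCEPTS = {
    "linguini": "italian_food",
    "pasta": "italian_food",
    "directions": "navigation",
    "recipe": "cooking",
}

def extract_concepts(text):
    """Map words in the request onto real-world concepts the system knows about."""
    return {KNOWN_CONCEPTS[w] for w in text.lower().split() if w in KNOWN_CONCEPTS}

def rank_interpretations(concepts, user_context):
    """Assign a rough probability to each possible interpretation."""
    candidates = []
    if "italian_food" in concepts:
        # Context (like a user who never cooks) shifts the odds between options.
        wants_delivery = 0.8 if user_context.get("rarely_cooks") else 0.4
        candidates.append(("find_italian_restaurants_nearby", wants_delivery))
        candidates.append(("find_linguini_recipe", 1.0 - wants_delivery))
    if "navigation" in concepts:
        candidates.append(("get_directions", 0.9))
    return sorted(candidates, key=lambda c: c[1], reverse=True)

def handle_request(text, user_context):
    concepts = extract_concepts(text)
    ranked = rank_interpretations(concepts, user_context)
    if not ranked:
        return "Sorry, I don't know how to help with that yet."
    action, probability = ranked[0]
    # The chosen action would then pull data from restaurant listings, maps,
    # reviews, and so on, filtered by the phone's location.
    return f"{action} near {user_context['city']} (p={probability:.2f})"

print(handle_request("I want linguini", {"rarely_cooks": True, "city": "Atlanta"}))
```
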
So if I activate my virtual assistant and say I
want linguini, that's a pretty broad thing to say, right.
The app has to guess at whether I mean I
want to go someplace that serves linguini or I want

(45:21):
to make it myself. Now, my personal app would have
learned from my behaviors that I am very lazy and
would realize that I am actually asking for someone to
bring me linguini. So there's no doubt Siri would return
results of Italian restaurants that deliver as a result from
my request. And keep in mind, Siri was intended to

(45:42):
learn from user behaviors and attune itself to those
behaviors over time. Beyond that, Siri would pull information from
multiple sources to provide results. So if I asked about
a restaurant, Siri would provide all sorts of data about
the restaurant, from user reviews, to directions to the restaurant,
to menu items to what price range I might expect

(46:05):
at that place. Siri could also tap into other stuff
like the phone's location, and thus give relevant answers based
on my location, so I wouldn't have to worry about
getting irrelevant search results if I happened to be far
from home, right? Siri wouldn't suggest that I go and
get food from a place that's right down the street

(46:25):
from my house in Atlanta while I happen to be
in New York City, for example. The team also gave
Siri a bit of an attitude. Siri could be sassy
and had a bit of a potty mouth. In fact,
Siri would occasionally drop an F bomb here or there.
According to Kittlaus, the goal was eventually to offer extensions

(46:45):
to Siri so that end users could kind of pick
the apps personality. Maybe you want a no nonsense virtual
assistant that just provides the information you need and that's it.
Maybe you wanted more of a goofy sidekick, or
maybe you wanted a virtual assistant who could give you
some serious attitude on occasion. The goal down the line

(47:08):
was to create options for people to kind of shape
their experience, but that would end up on the cutting
room floor due to a very big reason. The Siri app made its debut in the iPhone App Store. In January, three weeks after it debuted, Kittlaus received a phone
call from an unlisted number, a call that he almost

(47:32):
didn't even answer, but when he did answer, the person
on the other end of the call happened to be
Steve Jobs, the CEO of Apple. Jobs was over the
moon about Siri and wanted to meet with Kittlaus to discuss some pretty enormous options, the biggest one being that Apple itself would acquire Siri. Now, at the time, Siri,

(47:53):
the company was working on developing a version of the
app for Android phones, having reached a deal with Verizon to create a version of Siri that could be
the default app on all Verizon Android phones moving forward.
The Apple deal would ultimately derail that agreement, as Jobs
was insistent that Siri be an Apple exclusive. In fact,

(48:16):
when Apple would introduce Siri on October fourth, two thousand eleven,
it seemed like it was being presented as a purely
Apple product, that it didn't have a life outside of
Apple at all. It came across as it just being
Apple all along. And of course, the day after Apple

(48:40):
would introduce Siri to the public, Steve Jobs himself passed away, October fifth, two thousand eleven. But that part of the
story will have to wait for part two because, as
I said, this is going longer than I anticipated. So
in our next episode we'll pick up probably a little earlier than where I'm leaving off here, actually, because

(49:02):
there's still some other details we should talk about as
far as how Siri works and the actual arrangement of
Apple's acquisition, and then we'll talk about how the app
has evolved and changed under Apple's ownership, and we'll also explore, you know, a little bit about Siri's distant cousins like
Alexa and Google Assistant and others, because all of these

(49:26):
work in similar ways, though they have their own specific
processes to handle requests, and so if you do an
Apples to Apples comparison, it does break down ultimately once
you start getting down to how things are working in
detail on the back end. So I won't go into

(49:47):
full detail on those because it would require multiple episodes
on that. But we will talk more about Siri and
what has happened in the years since its acquisition in
our next episode. If you guys have suggestions for future
topics I should tackle on TechStuff, let me know. The best way to do that is to reach out on Twitter. The handle we use is TechStuff

(50:09):
HSW, and I'll talk to you again really soon. TechStuff is an iHeartRadio production. For more podcasts from iHeartRadio, visit the iHeartRadio app, Apple Podcasts, or wherever you listen to your favorite shows.
