Episode Transcript
Available transcripts are automatically generated. Complete accuracy is not guaranteed.
Speaker 1 (00:05):
Hey everybody, fascinating chat today as we dive into the world of enterprise AI through open source, with a true innovator and expert in the field at Anaconda AI. Peter, how are you?
Speaker 2 (00:17):
I'm great, great. Glad to be here.
Speaker 1 (00:20):
Well, thanks for being here. Really intrigued by the work you and your team are doing. Before that, maybe some introductions: your background and mission, and how did you end up at Anaconda? What drew you to this very exciting space?
Speaker 2 (00:36):
Yeah, it's a very interesting story. I didn't set out this way. I was a consultant using Python and its early-stage scientific tools back in the aughts, and we started seeing these tools getting used more and more in industry, beyond just science. My co-founder, Travis Oliphant, and I had the bright idea that maybe these scientific tools were ready to cross over to the mainstream of business data analytics, and so we
(00:58):
started Anaconda at the same time that we started a general community movement pushing the use of Python for data science and the open data science movement around Python. We really were seminal in creating and leading that in the 2012 timeframe and, of course, as the years have rolled forward, we've seen that our hunch was correct. Many people do enjoy using Python for machine learning, for
(01:20):
data analysis, and now, of course, it's become the language for AI, and we're very, very happy to see its massive success.
Speaker 1 (01:27):
Well, let's talk about that. Take a little walk down memory lane. Python has been around forever, it seems, but it's now powering the AI boom. Did you see that coming when Anaconda got started? How did that evolve over time?
Speaker 2 (01:41):
It's funny because the answer to that question is both emphatically yes and, like, totally no. We absolutely had a very, very deep conviction that, coming off the big data sort of boom right in the early 2010s, big data and cloud computing, a lot more people are going to want to do a lot more things with their data besides just a SQL database
(02:03):
query, and they needed tools to do that. You know, Python was a great tool for doing that.
R was also around, but Python, I think, had some advantages, and so we utterly believed that Python could be the language for data science, data analysis, machine learning at scale. So on that front, we were... yeah, I think we were right, and we
(02:23):
definitely saw that coming.
But, on the other hand, to be able to see that diffusion models and transformers and all these things would lead to this, like, massive portal to God knows what, right, that we did not see coming. I don't think anyone at that time could have seen it coming, necessarily, because AI was still in kind of the AI winter. Deep learning was starting to show some early traction, and it wasn't until the late 2010s that it was obvious that the AI
(02:45):
revolution was here and Python was the language preferred by the researchers and by practitioners.
Speaker 1 (02:52):
Got it. And fast forward: you just launched the Anaconda AI platform, so congrats on that.
Speaker 2 (02:57):
Thank you.
Speaker 1 (02:57):
So for the viewers, what is it exactly? What makes it different from other AI or data platforms out there in the market?
Speaker 2 (03:04):
Yeah, we've always been out meeting users where they are, right? And so I think there's a lot of energy around this space, a lot of people trying to do a lot of various kinds of things. For us, it's all about, okay, what are people actually doing, right? A lot of the data exploration, a lot of data transformation, all that stuff is still a precursor to doing AI at scale, and so you still need the classic, I would say, Python
(03:28):
data, ML engineering, data engineering kinds of tools. But in addition to that, you also need a few other things, right? And that includes things like model management. That includes things like, hey, how do I govern all these open source models that people are releasing, quantizations, fine-tunes? I want to pull all of that into an enterprise-ready platform,
(03:48):
and so the practitioners have an easy place to collaborate, to share their work, to sort of party on the data and party on the models. But enterprises have real concerns about governance, compliance, reproducibility, all of these things, and the Anaconda AI platform is a place to bring both of these things together. So, essentially as an outgrowth and extension of what we've
(04:09):
always done with our data science platform, we make it easy for practitioners to continue using the tools they want to use. We're not super opinionated about it: do you want to use Jupyter Notebook? Do you want to use VS Code? Whatever front ends, and whatever cloud: we connect to all the clouds, we work on-prem and in all these kinds of hard security and regulated environments. But at the same
(04:29):
time, when you start bringing in your AI models and you start building workloads around them, easily deploying them, and giving administrators and IT folks a way to see in what ways do we have a security vulnerability, in what ways might we have some exposure here and there... those are the kinds of things that we've wrapped up into a single pane of glass, so to speak.
Speaker 1 (04:49):
Well done.
You mentioned security.
Security always comes up in open source discussions. Why do you think it makes some companies or enterprises nervous, and how are you and the rest of the industry tackling that?
Speaker 2 (05:02):
Yeah, the topic of open source security in just traditional, straight-up software development is becoming more and more a center of focus, right, because of the success of adoption of open source, I would say. But we've also seen some really new kinds of, like, very audacious attacks: deep, two-year, three-year, sleeper-cell
(05:23):
kind of attacks on the supply chain, which is incredible. The xz Utils attack, for instance: it came to light last year, but I guess it had been underway for years before that. That was a very deep, state-level actor attacking the very nature and the fabric of what makes open source work. The trust model between open source collaborators was under attack. So we know that enterprise is built on this stuff.
(05:45):
We know that there are red-team, black-hat adversaries looking to weaponize that adoption. So as an industry we have to get more serious about this, and it's not just because we want to. There are regulatory things coming down from Europe, and also executive orders and guidance from NIST here in the United States. People have got to get serious about the software supply chain
(06:06):
for open source software security. Now, that is one thing, and when you look at AI, that's a whole additional set of complications on top of that, because all this AI stuff is built on top of, and to some extent uses, this open source software stuff, but it introduces its own new set of complexities and security
(06:28):
vulnerabilities. A lot of it, right now... when we look at the real world, like, no kidding, who's really doing AI stuff for real? For all the talk of agents and all the hype around the VC space and this stuff, what are people really doing? When you look at the actual data, people are struggling to get these things working in production for real, not only because of the challenges of the technology
(06:51):
itself, hallucinations and just efficacy and general reproducibility. So figuring that out is hard. But then, additionally, we're seeing people, again black hat, red team, attacking those kinds of open models and people using tools. They're trying to jailbreak the system prompts. These things are vulnerable sort of from the get-go.
(07:12):
And so if you're an enterprise that is excited about the opportunity, as you should be, because I think it's massive, you also have to understand that security goes part and parcel with the development of this. This is not like open source development, where it's oh, we'd like to secure our Python or Node.js packages, wouldn't it be nice if we were checking all the boxes. With
(07:33):
the AI stuff, it is not optional. The guardrails are just not optional, given the scale and scope and complexity of what is happening there and what that tooling actually represents.
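To make the supply-chain point concrete, here is a minimal sketch, not from the episode, of the kind of hash-pinned artifact verification that supply-chain tooling performs. The lockfile format, file names, and paths are illustrative assumptions, not any particular tool's API:

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Stream a file through SHA-256 and return the hex digest."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_artifacts(lockfile: Path, artifact_dir: Path) -> bool:
    """Compare each artifact on disk against the digest pinned in the lockfile.

    Assumes a JSON lockfile of the form {"pkg-1.0.tar.gz": "<sha256 hex>", ...},
    an illustrative format rather than any specific tool's.
    """
    pinned = json.loads(lockfile.read_text())
    ok = True
    for name, expected in pinned.items():
        actual = sha256_of(artifact_dir / name)
        if actual != expected:
            print(f"MISMATCH {name}: expected {expected}, got {actual}")
            ok = False
    return ok


if __name__ == "__main__":
    if not verify_artifacts(Path("locks.json"), Path("artifacts")):
        raise SystemExit("supply-chain check failed: refusing to proceed")
```

Production equivalents of this check exist in, for example, pip's --require-hashes mode and conda's package verification; the point of the sketch is simply that every artifact gets compared against a digest pinned ahead of time.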
Speaker 1 (07:42):
Well said. Things are certainly moving fast, month to month, week to week. Are you seeing any major shifts in how companies are building AI solutions today, their approaches? I mean, what's working, what's still broken?
Speaker 2 (07:57):
Yeah, you're right, it changes week to week sometimes, but mostly it's still a month-to-month cadence. Right, we had for a while a lot of people just trying to get bigger and bigger contexts so they could one-shot everything. Then they realized that kind of falls apart; it's kind of, you know, hit or miss. And so then, coupling these systems up with retrieval, RAG became all the rage, right. And then now it's agents and agentic workflows, and now it's
(08:19):
all these things combined together, right? Can they use tools? Can we do chain of thought? So we're doing inference-time scaling. All of these things are now coming together. So that's my view of, like, whatever the last 18 months in, like, 30 seconds. But I think what's nice is that it feels to me the vibe seems to
(08:41):
be that people are starting to ask the hard questions. Like, if we want to use this as a reproducible engineering discipline, where we build things that work, we know they work, we turn them on tomorrow and they work again tomorrow... if we want to build that on top of these stochastic and probabilistic sort of components, these squishy, soft components, how do
(09:03):
we actually do that? And so in more and more of the conversations I feel like I'm having with people, they are looking at that problem without wishful thinking, not just being like, okay, well, some new paper next week is going to solve it all for us. I think people have kind of given up on that a little bit. People are starting to understand that to actually deploy LLM AI technology at an enterprise-grade level, you have
(09:26):
to be extremely thoughtful about each piece of it. You have to do the evaluations. All the magic is actually in the evals, and whether you build your own framework or use one of the existing ones that are out there, there is no silver bullet. I think the evaporation of the silver bullet might be the biggest vibe shift over the last year: that it's not just RAG, it's not just agents, it's not just chain of thought. It's a lot of these things being put together thoughtfully,
(09:49):
and then a thoughtful, enterprise-specific, domain- and problem-specific set of evaluations. There's not a shortcut to it. So that's kind of where I'm glad to see the industry conversation maturing, because that, I think, is the high-integrity thing to do.
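To ground the evals point, here is a minimal sketch, not from the episode, of what a domain-specific eval harness can look like. The model stub and the pass/fail checks are placeholders for whatever system and domain judgments an enterprise actually cares about:

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    prompt: str
    check: Callable[[str], bool]  # domain-specific pass/fail judgment


def run_evals(model: Callable[[str], str], cases: list[EvalCase], trials: int = 5) -> float:
    """Run every case several times and return the overall pass rate."""
    total = passed = 0
    for case in cases:
        wins = sum(case.check(model(case.prompt)) for _ in range(trials))
        print(f"{case.prompt[:40]!r}: {wins}/{trials} passed")
        total += trials
        passed += wins
    return passed / total


if __name__ == "__main__":
    # Stand-in "model" so the sketch runs; swap in a real client call.
    fake_model = lambda prompt: "4"
    cases = [
        EvalCase("What is 2 + 2? Answer with the number only.",
                 lambda out: out.strip() == "4"),
    ]
    print(f"overall pass rate: {run_evals(fake_model, cases):.0%}")
```

The repeated trials per case are the design point: a stochastic component that passes once is not the same as one that passes reliably, which is exactly the reproducibility concern raised above.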
Speaker 1 (10:07):
Oh, such a great insight. And as you look for opportunities for improvement in the AI developer workflow right now, obviously you're focused on tooling, but where do we need improvement? If you were to do a SWOT analysis, is it tooling, of course, but also process, mindset, training, other things?
Speaker 2 (10:24):
Yes to all of them. Sorry, that's sort of a cop-out answer, but that is it: it's all of these. I guess the thing I would maybe offer as a metaphor, right, is if we're moving from, like, the 1920s and 30s, where you've got cars and we know how to drive cars and there's some safety that you need to put around cars, that's maybe traditional software development. And then we move to
(10:47):
jet airplanes that go, like, Mach 0.9, 0.95: you have to do all the things, but better, right? Your manufacturing tolerances have to get tighter, your pilot training has to get better. The infrastructure you build, the runways, have to be much, much smoother than just some crappy little country road, right? The tires have to be high-quality rubber or they explode when you land, all these things. It's, unfortunately, yes to all of it. But if you get all that right, wow,
(11:08):
you can hurl 300 people through the air at the speed of sound, right? So there is a benefit to it. But it's not just going to be, like, turn the crank and now it's easy. I don't think there's an easy mode. There's an easy mode to deceiving yourself that you're doing something interesting; to do something actually correct is not going to have an easy mode. It requires upgrading all of these things, and actually the hardest thing isn't the technology, the tooling. It is,
(11:30):
I think, the mindsets. I think it's setting executive and other stakeholder expectations, and that's something on us as an industry to sort of level-set, right. And that's where conversations like this, I think, hopefully can have some kind of impact, where people can say, okay, everyone else is also struggling.
Speaker 1 (11:52):
It's not just me; we don't just suck uniquely. Everyone's struggling with this, right? Fantastic thought there. So there's a lot of discussion, controversy occasionally, around AI models built behind closed doors. Do you think the closed-source, if you call it that, AI approach is a threat to innovation, or is this just the way business gets done?
Speaker 2 (12:10):
Do I have permission to speak freely here?
Speaker 1 (12:12):
It's up to you and your team. I'm happy.
Speaker 2 (12:14):
Yes... no, look, I think it's going to follow the same cycle of technology innovation and adoption as anything else, right? Certainly, for people to garner the investment and to sustain investment enthusiasm, they have to tell a story about some proprietary advantage.
(12:35):
And when you look at AI, well, it's either the data, or it's the model, or it's your people who know how to train the data into a model. But the algorithms are all being published in papers that are widely available. The hardware you're doing it on, it's the same GPUs that Jensen is selling to everybody else, so it's not like you've got a lock on the architecture of the hardware.
(12:56):
Now, Google does have their own hardware, and there are a lot of people making custom hardware, but for the most part those are cost benefits. Those are not, like, a quantum leap in capability of what an LLM or an ML model does. So the question is, if you didn't say that I have something closed source that's opaque and that's of special value, you'd
(13:17):
have some hard conversations from the investor side of the world. So I think there is some motivated reasoning there. But if you think about this from the point of view of just a machine learning expert, or someone who's thinking about these things at a technical level, there's really no magic or secret sauce in this. There are techniques, of course, to get the most efficacy when you're training, to get it to converge, loss curves, I get
(13:40):
it. There's definitely real skill there. But is it a hundred billion dollars of market cap worth of skill? I don't think so, right? Because we've seen many teams, frontier teams in the world, are always trading off the pole position and the leading position for their models, which means that the smart people that OpenAI has, well, Anthropic's got smart people too, as does Gemini, as
(14:01):
does Baidu. Everyone's got smart people. So the thing is, okay, in that case, even if we say the closed source stuff has some special sauce, it looks like lots of people have the special sauce. And when we think about, okay, where does the data come from? Well, a lot of it is scraped off the open internet. A lot of it is public data. A lot of it is books and other things, some of them in the public domain.
(14:26):
So I think, in the long term, a huge part of the value in an LLM is based on data that is in the commons or publicly available to everyone. So the baseline that should be publicly and generally freely available is going to be quite high. I don't think it's like you either get ChatGPT or Llama or nothing. I think that what's publicly available in the open
(14:47):
is pretty high, which means the commoditization pressure is going to be quite steep. It's going to be quite, quite high. And in the long run, I think the industry will have to trend towards transparency. Otherwise, I mean, you see the latest controversy with what's happening with Grok over at xAI, and you know that's not tenable. We cannot have a customer support chatbot for medical care
(15:08):
all of a sudden spewing anti-Semitic stuff because a patient's name is, like, Feinberg. You cannot have that. That's not the world we want to live in, right? So we have to have transparency, we have to demand accountability in how this technology gets built.
Speaker 1 (15:23):
Amazing. Well, that's quite a mic-drop moment for this discussion, but I do have one more question. I mean, looking ahead, you're such an innovator: where do you see the open source AI movement headed over the next couple of years, and what's your role in it?
Speaker 2 (15:38):
Yeah.
So I think it will become more apparent that open source and transparent, public and commons AI can be done, and can be done competitively. I'm personally leading some efforts around that, and hopefully we'll see more of the announcements around some of that stuff here in the fall.
(16:02):
the UN, a lot of people who wantto see this as a technology
that belongs to everyone becauseit's based on the works of
everyone.
It's like it's.
Why would it belong to everyone?
(16:24):
So that's something we're going to see: the conversation will have to start really focusing on the supply chain, the data supply chain for these models, and we're going to have to have much more focus on evaluations, on how to engineer safety around these kinds of probabilistic systems, and on how we're going to ensure, when we put these systems into drones, into
(16:46):
autonomous vehicles, or household robots and humanoid robots. Those things cannot be opaque. There has to be a liability chain. There has to be a place where societies and governments and regulators come in and say these are acceptable, these are not acceptable, and drive real accountability around that. So I think the role that open source has to play in this is that we can show you can do these things in an open and
(17:07):
transparent and accountable way, and in that way really, you know, set the conversation so that there's not a default expectation that these have to be black boxes.
Speaker 1 (17:18):
Such a great insight. So you're in Austin, your team is everywhere, that's right. It's a little bit of a quiet period here in the summer, but where can people meet you, virtually or in person? Any travel or events or meetups or otherwise in the next few weeks or months?
Speaker 2 (17:34):
Yeah, we've gone through the summer spate of conferences. I'll be at the Ai4 conference in Vegas at the beginning or the middle of August. The Anaconda folks are around at a number of different kinds of conferences, and coming up in the fall, we plan to be at all the major industry AI conferences. So just look for the Anaconda booth, come by, talk to us. We'd love to talk to people about our AI platform, how we help
(17:56):
enterprises do AI in a responsible, governed way, and I'm also happy to chat with anyone who wants to talk more about open source and open source AI.
Speaker 1 (18:06):
A wonderful mission. Congratulations on the success; onwards and upwards. Thanks, Peter. Thank you, Evan. And thanks, everyone, for listening and watching. And be sure to check out our new TV show, TechImpact TV, now on Fox Business and Bloomberg. Take care.