Episode Transcript
Available transcripts are automatically generated. Complete accuracy is not guaranteed.
Jerod (00:04):
Welcome to Practical AI, the podcast that makes artificial intelligence practical, productive, and accessible to all. If you like this show, you will love The Changelog. It's news on Mondays, deep technical interviews on Wednesdays, and on Fridays, an awesome talk show for your weekend enjoyment. Find us by searching for The Changelog
(00:24):
wherever you get your podcasts. Thanks to our partners at fly.io. Launch your AI apps in five minutes or less. Learn how at fly.io.
Daniel (00:44):
Welcome to another episode of the Practical AI podcast. This is Daniel Whitenack. I'm CEO at Prediction Guard, and I'm joined as always by my cohost, Chris Benson, who is a principal AI research engineer at Lockheed Martin. How are you doing, Chris?
Chris (01:00):
Doing well today, Daniel.
How's it going?
Daniel (01:02):
It's going great on the road this week, talking about AI and security in various places, which is always fun. And often, you know, things come up in those sorts of talks. One of the things I've gotten a lot of questions about this week, actually, is the
(01:23):
impact of AI on coding workflows and vibe coding and all of those sorts of things. And I'm really happy to have some really amazing guests with us today to help us talk through some of those subjects and also share what they're doing on both tooling and models. We've got Robert Brennan, who is
(01:44):
co-founder and CEO at All Hands AI. And then we have Graham Neubig, who is co-founder and chief scientist at All Hands AI and associate professor at Carnegie Mellon. How's it going, guys?
Robert (01:56):
It's going good. Very
good. Thanks for having us.
Daniel (01:58):
Yeah. Thanks for joining. Maybe one of you or both of you could give your thoughts generally. Like I say, even this week at conferences, it seems like half of the questions that I'm getting are around how AI is impacting developer workflows. You know, how many people are really using vibe coding tools?
(02:19):
If you're using vibe coding tools, you know, what impact is that having on code quality? All of these sorts of things. So I'm wondering, from the perspective of All Hands and the work that you all are doing, what does the environment around these code assistant, vibe coding tools look like from your perspective right now?
(02:41):
Kinda what does that ecosystem look like? And then maybe setting All Hands in the context of that would be helpful.
Robert (02:48):
For sure. Yeah. So there's a huge variety of tooling out there right now for code generation, so it's a very hard space to navigate. There are two ways I like to bifurcate the space. One is, on the one hand, you have a lot of tools that are really meant for, like, rapid prototyping. There's some really cool stuff happening there, stuff like Lovable, Bolt.new, v0.dev, things like that.
(03:09):
They tend to be very visual. You're getting, like, quick prototypes of games or websites, things like that. Some really fun stuff happening there.
And that stuff is enabling, like, a whole new set of people to experiment with software development, people who are maybe, you know, designers or product managers, who don't really have coding experience, or maybe have very little coding experience.
(03:29):
They can now build whole apps, which is super cool. And then on the other end of the spectrum, you have stuff that is much more oriented towards, like, senior developers who are shipping production code. They're working on a code base that's gonna go and serve millions of users, where you have to be a little bit more careful about what's going on. Also really cool stuff happening on that end of
(03:51):
the spectrum. And then the other way I like to bifurcate this space is that you have some tools that are very tactical. So stuff like GitHub Copilot, where it's, you know, inside your IDE, it's suggesting code, like, exactly where your cursor is inside the code base. You're zeroed in on a task, and the AI is just, like, helping you move faster through that task. And then on the other end, you have these tools that are much more agentic. Right? They're able to
(04:13):
just take a quick human description of a problem, and then go off and work for five, ten, fifteen minutes while you go get a cup of coffee or work on a different problem or catch up on email, and then it comes back to you later with the solution.
And OpenHands sits basically on the right end of both of those spectrums. Right? We are really oriented towards senior
(04:34):
engineers who are working on production-grade code bases, and we're really oriented towards these more agentic workflows, where you're giving an agent something to work on, and it can iterate forward on its own without you having to babysit it, without you having to be, you know, squinting at your computer screen trying to figure out where you should be editing.
Daniel (04:50):
Yeah. That's super helpful. I'm wondering, and this might be an interesting question, but I know, Graham, we've run into each other in the past related to human language work in another context. I'm wondering, from your perspective as kind of chief scientist, but also a
(05:11):
researcher, as you've dug into this All Hands project and work and product, what has been surprising in terms of challenges, and maybe what was surprisingly easier than what you might have thought? Any thoughts there?
Graham (05:32):
Yeah, it's a great question. Thinking back in hindsight, it's sometimes kind of hard to come up with surprising things, because the things that were formerly surprising now seem kind of obvious. But one of the things that I actually wrote a blog post about before was, right when the OpenHands project started out, we were kind of on this
(05:52):
bandwagon of trying to create a big agentic framework that you could use to define lots of different agents. You could have your debugging agent, you could have your software architect agent, you could have your browsing agent, and all of these things like this. And we actually implemented a framework where you could have one agent delegate to another agent, and
(06:12):
then that agent would go off and do this task, and things like this.
One somewhat surprising thing is how ineffective this paradigm ended up being, from two perspectives. And this is specifically for the case of software engineering; there might be other cases where
(06:35):
this would be useful. But the first is, in terms of effectiveness, we found that having a single agent that just has all of the necessary context, and has the ability to write code, use a web browser to gather information, and execute code, ends up being able to do a pretty large swath of tasks
(06:56):
without a lot of specific tooling and structuring around the problems.
And then the other thing is that building many, many different agents is a relatively large maintenance burden if they're not very easy to define. So we've basically gone all in on having a single agent that can do many, many different
(07:18):
things. But in order to do that, it has to have the ability to pull in whatever information it needs. So we have a framework called microagents, where basically you can pull in a new prompt or a new tool or something like that for a particular task, but the underlying overall agent is a single agent that can do many different things.
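To make that concrete, here is a minimal sketch of the microagent idea: a single agent whose context gets extra prompts or tools pulled in when a task matches a trigger. It is illustrative only, assuming a simple keyword-based trigger; it is not the actual OpenHands implementation, and all names are hypothetical.

```python
# Illustrative sketch: keyword-triggered "microagents" that add prompts/tools
# to one underlying agent. All names here are hypothetical.
from dataclasses import dataclass, field

@dataclass
class Microagent:
    name: str
    triggers: list[str]                     # keywords that activate this microagent
    extra_prompt: str                       # guidance appended to the system prompt
    extra_tools: list[str] = field(default_factory=list)

MICROAGENTS = [
    Microagent("github", ["github", "pull request"],
               "Use the gh CLI for repository operations; never force-push."),
    Microagent("docker", ["docker", "container"],
               "Prefer docker compose; clean up any containers you start."),
]

def build_context(task: str, base_prompt: str) -> tuple[str, list[str]]:
    """Assemble the system prompt and tool list for this task."""
    prompt, tools = base_prompt, ["editor", "terminal", "browser"]
    for ma in MICROAGENTS:
        if any(trigger in task.lower() for trigger in ma.triggers):
            prompt += f"\n\n# {ma.name}\n{ma.extra_prompt}"
            tools += ma.extra_tools
    return prompt, tools
```

The point is that the agent itself stays singular; only its context changes per task.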
Chris (07:37):
Quick follow-up on that. Just for listeners who aren't really familiar with agentic workflows and stuff, could you talk just a moment about what that means? You know, if you've been developing for a number of years in the more traditional workflows that we all, you know, kind of started out in, and now we're hitting this world of
(07:58):
agentic possibilities. Could you talk a little bit about what's different for the user, from where they came from in kind of traditional development environments, to what agentic development workflows are like?
Robert (08:11):
Yeah, so, you know, I think the sort of step one of integrating AI into your development process was Copilot, right, where it's really just plugging into autocomplete. Right? We're all familiar with autocomplete. We've been using it for decades. It just got a thousand times better all of a sudden.
Instead of just completing a class name, now it's writing, you know, several lines of code. So that was a huge boost to my productivity when I adopted Copilot. I was like, yeah,
(08:34):
this is amazing. And then for bigger chunks of code, I was still, you know, going to ChatGPT, and I was like, hey, can you write a SQL statement that'll do x, y, and z? Or, you know, things like that.
And often, I found myself doing this workflow where I would ask ChatGPT or Claude to, like, generate some code. I'd paste that into my IDE, run it, get an error message back, paste the
(08:54):
error message back into Claude or ChatGPT, and I'd just do this loop. And at some point, I was like, well, this is dumb. Like, I'm just shuffling text between one app and another. And that was actually when I built my first, like, agent, basically, where I built a little CLI that would just do that loop with Anthropic in the
(09:14):
background.
And that's kind of the core of what an agent is. It's doing a full loop where you basically give a problem to the agent and say, okay, I wanna write a SQL statement that does x, or I wanna modify my app to add this new feature. The agent writes some code. It runs the code.
It sees what happens. It gets some kind of output from the real world, whether that's the output of a command or
(09:35):
maybe, you know, the contents of a web page or the contents of a file, puts that back into the LLM's context, and then it can take one step closer to its goal. And then, you know, as you get better and better and more accurate at taking one step closer to your goal, you can take on longer and longer range tasks. So I would say in the beginning, agents were really good for things that
(09:55):
would take like 10 steps, you know, something really simple, like implement a new test and then make sure it passes. And now they can implement, you know, things that take hundreds of steps, which is really cool.
I mean, that's the change that we've seen over the last, you know, six to twelve months: they're able to take on these huge tasks. So I can say, implement feature x, you know, front end, back end, and add testing. And today's
(10:19):
agents are able to just continue executing, stepping forward on that, until it comes back with a full PR where all the tests are passing and it's just kind of packaged up and ready to go.
Daniel (10:28):
Selfishly, maybe I'll pass on a question. I was sitting around with a number of people at the conference I'm at last night, and there were some opinions. This gets to some of what you were just talking about. I mean, some of what you talked about at the beginning, about this being geared towards more senior developers working in existing code bases or something like that, but also what you
(10:50):
were just talking about, that kind of workflow. It was kind of the opinion around the group that I was with last night that, hey, a lot of these tools might be well suited to senior engineers, because you can iterate like that and actually have a sort of smell test for what's going right and what's going wrong, but not really for less experienced developers or
(11:13):
new developers who really don't have that ability. I'm curious to understand your perspective on that, and maybe who's using this and who's using it successfully, I guess, is the question.
And what does that persona look like?
Robert (11:30):
Yeah. I think it's important to realize that you still need to keep all the same code quality controls in place that you did before the age of AI, if not more code quality controls. Right? Everything needs to go through code review. You need somebody who's familiar with the code base to look at the changes that are happening.
I would say one of the kind of failure patterns I see with the technology is that a lot of times a junior engineer, or
(11:54):
somebody who doesn't really know how to code, you know, vibe codes their way to, like, a pretty good MVP, because these agents are especially good at, like, greenfield stuff. Right? They can build a, you know, a to-do list app all day. And then as you layer on more features over the course of, like, weeks or months, the code base just starts to rot a bit. Like, the agent adds a bunch of stuff; maybe it, like, duplicates a whole
(12:15):
function because it couldn't find the original function, or it just keeps expanding a single function so that it's, like, thousands of lines of code and has all these forking paths.
You need to have somebody looking at the changes that are being proposed, critiquing them, and telling the agent, hey, you added this new function, but we have an existing function that does that, or, you know, this function's getting too big, please refactor it. If you're
(12:37):
not looking over its shoulder and critiquing its work, the code base will just grow into this monster, and you'll have to throw it all away because it's just beyond repair.
Daniel (12:46):
Well, I do wanna get into some of the unique elements of All Hands and the perspectives that you all are taking. One of the things, of course, that strikes me right away, and it's even at the top of the web page when you go there, is
(13:07):
your open source approach to this kind of tool for developers. I'm wondering, and both of you could speak to this, but maybe, Graham, you could start: obviously you've built various projects over time, done research, and been plugged into the research community. Why, from your perspective, is it important that
(13:30):
at least some key portions of what you're building here are open source, and what do you think that might mean for these kinds of tools, including All Hands, moving forward?
Graham (13:43):
Yeah, so there are a number of reasons why we decided to do this open source. The first reason is I think everybody in our community believes that this is going to be very transformative technology. And it may drastically change the way we do software development going
(14:06):
forward. And we have two options. We have an option where software development is drastically changed for us by other people, or there's the option where we do it ourselves together.
And we believe in the latter approach, basically. We believe that if this is going to have a big effect on software
(14:27):
development, software developers should be able to participate in that. That's kind of the ideological point of view. The other point of view is we also believe that from a research perspective, open source, especially from the point of view of agent frameworks, not necessarily the underlying
(14:47):
foundation models, but from the point of view of agent frameworks, open source is not ever really going to be behind the closed options. The reason why is because academia and all of these people really love this topic.
They really love working on it. If we have an open framework and we can provide a platform, both from the point of view of having
(15:10):
a good code base that's easy to experiment with and providing resources to people who want to do experimentation on these topics, then the open source community together will be just as good as any company that is working on this in a closed manner. And so instead of reinventing the wheel, we can all invent it together and come up with something really good
(15:32):
that's good for developers, interesting for the academic community, and other stuff like that.
Chris (15:36):
Could you talk a little bit about how you bring developers into this process? Since that's kind of foundational to how you're operating, could you talk a little bit about what you're looking for, how you bring people into your community, and kind of ramp them up on that?
Graham (15:51):
Yeah. So it's kind of interesting. Our software is a little bit complex, because there's necessary complexity in order to do things like make a very strong agent, give it all the tools it needs, allow it to run in a safe manner, and things like this. One thing that we try to do, if people are
(16:15):
interested, is point them in the direction of issues they could start working on. We have a unique problem, which is that a lot of the easy issues that would be good for developers to learn more about the code base are just solved by the agent.
We're still working through the best way to fix that. But especially for front-end stuff, where we have a new front-end capability
(16:36):
that we'd like to have, we've had a lot of people join successfully through that. Then we've had longer-term research projects where we collaborate together with people in universities, and we've been pretty successful at doing some interesting things there, I think.
Daniel (16:49):
Cool. Yeah. I'm wondering, Robert, and obviously sometimes this is hard to do on an audio podcast, but if you could just give a sense. I just logged into All Hands not that long ago online, so I see some visuals. But if you could maybe paint the picture: there's the
(17:11):
open source side of things, which I'm assuming means people could maybe host All Hands themselves, which might be interesting for some, but you also have kind of a hosted version of that.
Could you just talk us through those options, how people can access this, what they'll see, how they integrate or how they connect their code into All
(17:36):
Hands to get started, that kind of getting-started picture?
Robert (17:39):
Cool. Yeah. So for the open source, everything runs inside of Docker. That includes the application itself. You just run docker run, and you'll see, you know, a web interface running at localhost:3000, and you can just drop in a prompt to the agent. You can also connect it to GitHub by generating a token inside of your GitHub settings, plugging that into the UI, and then you can start to pull and push to
(18:01):
your repositories. It's a little bit tricky running things locally, because not only do we run the application in Docker, but when you start a new conversation with the agent, we want to make sure the agent's work is done in a nicely sandboxed way, so the agent gets its own Docker container to work inside of. So there's a little bit of trickiness there; we have to deal with a lot of troubleshooting, you know, why-isn't-Docker-behaving-properly kind of stuff.
(18:22):
So it's a little bit of a difficult application to run locally.
So we actually created app.all-hands.dev, where you can use OpenHands in the cloud. And this is pretty much, you know, one-for-one in terms of functionality with the open source. But there are a bunch of convenience features, because, you know, a, we have this
(18:44):
persistent server in the cloud, and b, we can take care of all the infrastructure for running these sandboxes for the agent. So, for instance, when you start up a conversation in the cloud, the sandbox comes up within, like, one or two seconds, rather than having to wait, like, thirty seconds or so for it to start up on your local machine. And we can also connect into GitHub a little bit more seamlessly, because we can have an OAuth application where you
(19:06):
just, like, one-click log in and, you know, where we can access everything.
And then the cloud feature that I love more than anything is that if you leave a comment in, like, a pull request, say the tests are failing, you can just say, "@OpenHands, please fix the tests." And because we have this long-lived server in the cloud, that can just kick off a conversation automatically, and OpenHands will just commit back
(19:28):
to your pull request. Those are actually the interactions I love the most, where I don't have to go into the OpenHands UI and, like, fiddle around. I just, inside of GitHub or soon inside of Slack, you know, summon the agent, and it just does the work for me, and I get to, you know, reap the fruits at the end there.
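As a sketch of how a comment-triggered kickoff like that can work, here is a hypothetical handler for a GitHub issue_comment webhook. The `queue_conversation` helper and the mention string are assumptions for illustration; the payload fields shown are standard GitHub webhook fields.

```python
# Illustrative only: kick off an agent run when the bot is mentioned in a
# PR/issue comment. queue_conversation is a hypothetical job-queue helper.
MENTION = "@openhands"

def handle_comment_event(payload: dict, queue_conversation) -> bool:
    """Return True if a new agent conversation was queued."""
    body = payload.get("comment", {}).get("body", "")
    if MENTION not in body.lower():
        return False
    task = {
        "repo": payload["repository"]["full_name"],   # e.g. "org/repo"
        "issue_number": payload["issue"]["number"],   # PR comments arrive as issue comments
        "instructions": body,                         # e.g. "@openhands please fix the tests"
    }
    queue_conversation(task)  # the long-lived server picks this up and starts the agent
    return True
```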
Graham (19:46):
My favorite is programming from my phone. Yeah. So you log into the app and then just tell it what to do. I do that while I'm walking to work, and by the time I get to work, I have a pull request to review.
It opens up a lot of possibilities if you don't have to run it locally.
Daniel (20:00):
Yeah. Yeah. I could imagine also, just in the spur of the moment, thinking of some great feature to add. And a lot of those things are lost, right? So if you have the ability to just, I know for some people it's when they're running on a treadmill, or they're coming out of the shower or something, they can just pop in and give a prompt and have some work be
(20:21):
done, and then finish getting ready and get into work.
I love that idea.
Robert (20:24):
Yeah. It's funny. I feel like I'm still getting a lot of coding done despite being the CEO of the company and being in meetings all the time. Because as I'm going into a meeting, I'll just quickly be like, hey, do x, y, z, then go into the meeting. And once I'm done, the code is just there waiting for me.
Chris (20:38):
It's funny, you guys are actually already leaping ahead and answering the question I was about to ask, because I was thinking about your first answer a moment ago, Robert. It's dramatically changing not only the workflow, but, you know, how and where you're coding and stuff like that.
(20:59):
What I'm kind of curious about is, you know, if you've been developing for a long time, this feels a little bit magical. And as you've had users come into this new workflow, what are the mindset shifts that are either challenges or, conversely, most welcome, the ones that get people productive and
(21:22):
recognizing the utility of this and benefiting from it? Because there's a little bit of a leap from kinda where they grew up into the bold new world of this.
What's that mind shift like, and how do you get people through that?
Robert (21:36):
Yeah, it's a great question. It's actually very similar to when I started managing folks. For one, you just have to get good at thinking, like, oh, no, I should delegate this. You have to kind of have that switch flip. Your instinct is, like, fire up VS Code and just start working, and you have to have that moment of, like, oh, no,
(21:57):
this is actually a good thing for the agent to work on, or for my employee to work on. And there's also, like, a little bit of a trust thing. Right? Like, when I first started managing folks, I wanted to micromanage them. I wanted to, like, tell them exactly how to do everything, and it ended up being just more work for both of us and frustrating for them.
Once I learned to trust my employees and accept that they might not do it exactly like I would do it, but they're gonna do a good job. They might need some
(22:18):
coaching and some direction, but building that trust over time is really important, and it's the same thing with the agent. You know, the agent isn't always right. You do need to, you know, I like to say trust but verify. Right?
You need to read its code and, like, understand what it's trying to do and where it might have misunderstood something, and maybe iterate a few times through either, like, a code review in GitHub or by just chatting with it inside of the
(22:40):
application itself. But, yeah, it's very, very similar to that management experience of learning to kinda take your hands off the keyboard and being really clear with somebody else about communicating, these are the requirements, and here's how you can improve, and things like that.
Daniel (22:54):
Yeah. Graham, I have maybe a question that I also get a lot. You know, Chris just asked one question that I get a lot, which is the workflow-related stuff. But the other question that I get a lot related to these types of tools is, hey, I've seen people create a lot of cool demos with these sorts of tools, small projects
(23:15):
that you can kind of just regenerate if it doesn't work. But if I'm working in a large existing code base, to the points that were brought up earlier, that's where most development happens.
What are the technical pieces that have to be in place for you to have an agent work in a kind of larger code base or an existing project, and actually have the context that's needed
(23:40):
to do things that fit, have the context of other things that exist in the code base, but also potentially the context of maybe a company style or other things like that?
Graham (23:56):
Yeah, it's a good question. For reference, the OpenHands agent is the largest committer to our code base, and our code base is rather large and complex. I just checked now, and it had 209 commits over the past three months, and the next closest contributor had 142. So it's doing pretty well.
(24:18):
But there are a bunch of technical pieces that need to go together to make that work. The underlying language model is really important. Fortunately, a lot of the core language model providers are focusing on this. We're also training language models ourselves. But the underlying language model needs to have a lot of abilities.
(24:39):
One kind of boring but extremely important one is the ability to edit files. About six months ago, this was a major problem for most language models. They were not able to successfully generate a diff between what a portion of the file used to look like and what the new portion of the file would look like, or
(25:01):
they would add an extra line or duplicate things or stuff like this. So this was a major problem. Claude is very good at this right now.
A lot of the other language models are kind of catching up to be good at doing this. Another thing that's especially a big problem for large code bases is
(25:21):
identifying which files to modify. And this is somewhat less of a big problem than I originally thought it would be. I was imagining that this would be a really huge problem, but actually language models are pretty good. Even if you give them no tools to specifically search a code base or something
(25:43):
like that, they use find and grep and all the other tools that a normal programmer might use, and are able to navigate their way around the code base.
But I do think that code search and other things like this can help. We have some preliminary results that demonstrate that it doesn't necessarily improve your
(26:04):
resolution accuracy, but it definitely improves your speed of resolution. And so that's another thing. Then there's being able to run tests and iterate on tests, being able to write appropriate tests that check whether a new piece of functionality we're adding is actually working as expected or not, and, on the language model side, being able to try lots of different
(26:26):
possibilities. So for example, one big failure case of a lot of language models is they try the same thing over and over and over again, get in loops, and never get out of that.
Models like Claude are good at not doing this, whereas a lot of other models fall into this failure mode and don't do as well. So the list goes on. I could talk about this for much
(26:50):
longer, but those are some of the most important parts, I think.
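To make the file-editing piece concrete, here is a minimal sketch of the kind of exact-match string-replacement tool an agent can be given, so the model only has to reproduce an old snippet and a new snippet rather than a well-formed unified diff. This illustrates the general technique; it is not the OpenHands editor.

```python
from pathlib import Path

def edit_file(path: str, old_str: str, new_str: str) -> str:
    """Replace exactly one occurrence of old_str with new_str.
    Rejecting missing or ambiguous matches gives the model clear feedback
    instead of silently corrupting the file."""
    text = Path(path).read_text()
    count = text.count(old_str)
    if count == 0:
        return f"ERROR: old_str not found in {path}"
    if count > 1:
        return f"ERROR: old_str matches {count} times in {path}; include more context"
    Path(path).write_text(text.replace(old_str, new_str, 1))
    return f"OK: edited {path}"
```

The error strings matter: they are what goes back into the model's context so it can retry with a better-anchored edit.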
Daniel (26:54):
Well, Graham, you're already getting into this, which is another thing that I wanted to ask about. Maybe you could comment from your angle on the technical and research side, and Robert, I'd be curious about the business and product side, about generally why you also got into building models. And for
(27:18):
those that want to take a look, there are some really great models that All Hands has released. It's just all-hands on Hugging Face; you can read a little bit more there, and we can talk about the details here.
But yeah, maybe first, just, why was that a step that you all felt was important, and/or wanted to be part of
(27:39):
your contribution to the space?
Graham (27:42):
Yeah. So there are two reasons. The first reason is we are an open source company, and we kind of philosophically believe in open source and openness. If you're relying entirely on a closed API-based model, then you can never fully achieve
(28:03):
that goal. Another thing is that, practically, there are issues with customizability and cost for closed models.
The best closed models are somewhat expensive. There's a non-trivial cost involved with using them to do agentic tasks,
(28:26):
especially because you need to query them over and over and over again. And so having another option that's more cost-effective, that we can either just use as is, or possibly switch over to for easier portions of a task while using a more expensive model for the less easy portions, is something that would be useful. And then
(28:47):
there's customizability. We have a lot of our enterprise customers or design partners asking for some variety of customizability, be it to their code base or to a programming language that they're interested in working with, and other things like this. And if we don't have a model that we can fine-tune, we are
(29:08):
limited in the scope of things that we can customize.
So looking forward, that's something that we would like to do. And we're not done yet. We just released v0.1, so we'll definitely continue being interested in this in the future.
Daniel (29:24):
Awesome. Yeah. I guess from the product perspective, Robert, in terms also of the hosted version that you're running, the one that people can log into, are the models that you've built integrated to one degree or another in that kind of live product, or what's the
(29:45):
kind of road map there?
Robert (29:47):
So yeah. Right now, it's all Claude 3.7 under the hood. There are some really cool ways we can build our models into the process. One is if we can route certain parts of the agentic loop, or certain queries, to a cheaper model, rather than putting everything through the most expensive model out there, without sacrificing
(30:09):
accuracy. That's really great for our users, because we can pass those savings on to them.
So that's one really interesting path. Another path is that we have a model that is specifically trained, basically, to recognize whether OpenHands is on the right track to solving a problem or if it's, like, going off the rails. Right? We built this model specifically based on the
(30:31):
dataset that we've gathered. And that's a really cool product feature, because on the one hand, you can just recognize, did we solve the task or did we not, and report back to the user appropriately. We can stop the agent if it's going off the rails, and we can say, hey, this is what's going wrong, please reroute, you know, using this new strategy. We can also launch
(30:53):
several different trajectories towards solving a problem and then, you know, maybe pick one out of the three that we launched and say, okay, this one looks like it's going in the best direction.
Keep following this one and kill the other two. So there's lots of really cool stuff we can do there by having a model that specifically knows the inputs and outputs of what OpenHands is doing.
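As a rough sketch of the routing and trajectory-selection ideas Robert describes (the model names, the heuristic, and the critic here are purely illustrative assumptions, not All Hands' actual system), it could look something like this:

```python
# Illustrative only: send routine steps to a cheaper model, and use a small
# critic to keep the most promising of several candidate trajectories.
CHEAP_MODEL = "open-weights-coder"      # hypothetical fine-tuned model
STRONG_MODEL = "frontier-model"         # hypothetical expensive model

def pick_model(step_prompt: str, recent_failures: int) -> str:
    """Naive routing heuristic: short, routine steps with no recent failures
    go to the cheap model; everything else goes to the strong one."""
    routine = any(k in step_prompt.lower()
                  for k in ("read file", "run tests", "list files"))
    if routine and recent_failures == 0 and len(step_prompt) < 2000:
        return CHEAP_MODEL
    return STRONG_MODEL

def best_trajectory(trajectories: list[list[str]], critic_score) -> list[str]:
    """Given several candidate trajectories and a critic that scores how
    on-track each one is (higher is better), keep the most promising."""
    return max(trajectories, key=critic_score)
```

In practice the critic would itself be a trained model scoring the agent's actions and observations, which is what makes stopping or rerouting a derailed run possible.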
Chris (31:12):
Well, while you're talking about that, I'm wondering, could you talk a little bit about the models? We've kind of talked about the frameworks and stuff being open. Are you looking at the models that you're creating being open, or do those stay as part of a proprietary offering? How are you envisioning that, in terms of what the models are, what they're
(31:32):
addressing, whether they're larger or smaller models, what licenses apply, that kind of thing? Could you speak a little bit about your philosophy and strategy toward that?
Robert (31:41):
Yeah. I mean, so far, we're opening everything up. Right? We've taken the position that we basically want OpenHands to be as useful as possible to an individual developer running it on their workstation. Right?
You know, we are a company. We do wanna make money. And so we are building some closed source features specifically for, like, large teams who are using OpenHands together. But so far,
(32:03):
we've taken the position that basically all the research we do and all the, like, know-how for how the agents can do as good a job as possible at solving software tasks, that should be open source. That should be available to every developer.
And it's stuff like collaboration features, things like multi-tenancy, things like auditing and compliance, stuff that, like, big enterprises need that your average developer
(32:24):
working on an open source project doesn't need. That's what we're gonna hold back and say, okay, this is closed source, and we're going to enable big enterprises to do this stuff, you know, the way that big enterprises like to do things.
Chris (32:34):
And one other follow-up, just because I happen to work in an industry where security and privacy are really paramount. Going off to one of the large foundation models via the cloud often runs into challenges for enterprises that have security concerns. In particular, any thoughts on,
(32:58):
or something that you can offer for, when it all needs to be closely held, when data cannot go out onto a cloud connection, that kind of thing? What are you thinking about that, either for the present or for the future?
Robert (33:10):
Yeah. So we basically have three offerings. We've got the open source, which anybody can run and use for free. A lot of security-conscious companies do start with the open source, because they can hook it up to Bedrock or, you know, a local model; basically, they can plug into the existing models that the company has approved. We have the cloud offering, which all runs through Anthropic, all runs
(33:31):
through our servers, which is a great convenience for a lot of people, but kinda scares off some companies that are very security conscious.
But then we can also take basically all the infrastructure we've built for our cloud offering and ship it into somebody else's cloud. So you can run it all inside your AWS environment. You can connect it to Bedrock. So it's basically all configured to stay within your walls.
Daniel (33:52):
I'm wondering, just thinking about current functionality, you know, Graham, you mentioned all of these commits from All Hands in your own repo, and some of those easy, good first issues that developers could solve, maybe those are taken care of. How do
(34:14):
you see the level of performance now? How are you all measuring that, red teaming that, testing that over time, and thinking about improving that over time? How do you even consider something like that, given that there are so many different types of projects out there? Obviously, there are academic benchmarks.
(34:36):
I think you have SWE-bench and those sorts of things. But as a product, as an offering, how do you think about and measure that performance over time? What right now is performing very well, and maybe where are those areas of improvement?
Graham (34:54):
Yeah, it's a great question. There's a lot to that question, but just on how we are doing benchmarking: up until recently, we were doing a lot on SWE-bench, but we have a very large evaluation harness that actually already has 20 benchmarks incorporated into it by our academic partners. And one thing that we're thinking about doing going forward, and
(35:16):
are actually kind of in the process of doing, is we have identified the common use cases, the ways that people typically use OpenHands, and tried to identify benchmarks that reflect these use cases, and then do a more balanced benchmarking strategy across these. So we have some pretty exciting
(35:37):
results about things like web navigation and web information gathering, which is really, really important if you want to function in an environment where you have lots of docs, or learn about a new library, or do data processing and data science related tasks.
And then we're also doing things like making sure that you can fix broken commits. So you have a pull request that has failing
(36:01):
tests and merge conflicts, and can you merge that in? And this is something developers hate to do but need to do all the time.
So this is something we're putting a lot of effort into making sure we're good at, and we have some good results about that that we hope to release soon. And then there are other things like test generation, version updates, things like this.
(36:24):
The academic world is large, so it turns out there are benchmarks for almost all of these that have already been created by some institution somewhere in the world. And so very often we talk to these institutions and say, hey, do you want to contribute this into our evaluation harness? Often the answer is yes, because they did their work for a reason.
They want it to be used. So we're using that as a way to
(36:46):
expand our vision of benchmarking to cover the actual use cases that the users are most interested in.
Chris (36:53):
Well, as we start to wrap up here, one of the things that we really like to do is get a sense of the future going forward. And with both of you here, I'd like to ask the same question of each of you and get each of your takes, for a little bit of diversity on how you're seeing things.
(37:13):
You've kind of introduced us to this new way of thinking about development going forward, and what's possible. For old guys like me, it's definitely changing how I think about development. And this is moving really, really fast right now.
And, you know, it's accelerating. I'd love to understand how each
(37:36):
of you sees the future, both in the space itself, in terms of, you know, changing the world of developer workflows, and in your role in that process, as an organization and as an open source community, and how you see those going forward. I'll let you guys decide who wants to go at it first, but I would love
(37:57):
to hear each of your perspectives.
Robert (37:59):
Yeah, I think the thing that's really exciting for me is the idea of bringing the next, you know, billion developers into the fold. You know, when I first started learning to code, I felt like a wizard. Like, I could just all of a sudden make my computer do anything, and I could build all sorts of different applications. And I was, you know, a baby engineer. I was building all sorts of nonsense.
(38:19):
But I just felt so powerful and so excited. And then that fades over time and, you know, it becomes a job. And then I would say over the last, you know, year or two, I've got that excitement again. I feel like a wizard again. I can get so much done, you know, using large language models and using agents.
(38:40):
And so I'm really excited to bring that feeling to, like, a whole new tranche of people who have maybe had ideas for software that they want, for, you know, workflows that they want, for applications that they'd like to have, and just haven't been able to bring them to life. And I think it's really exciting that they'll be able to do that. I think there are a lot of questions as to how we enable them.
(39:00):
Like, you know, my mom definitely has some really cool ideas, but she has no business, like, monitoring a production database. And so I think we're gonna need to rethink how infrastructure works and how we ship applications and things like that.
I think there's a lot of thought that is gonna need to go into that. And I'm really excited to see kind of what shakes out.
Graham (39:20):
Yeah. I love Robert's answer. From a completely different angle, one of the things I have in my introductory slides for a presentation I give about coding agents is a look at the Nobel Prize winners from last year in physics and chemistry. The Nobel Prize winners in physics were people
(39:42):
like Geoff Hinton, and the ones in chemistry were people like Demis Hassabis. And these are obviously the top awards in areas other than computing.
And I'm building agents to create software. But the reason why I'm building agents to create software is not because
(40:04):
software is the end. It's because software is a means to an end. And I think AI has a huge possibility to improve the human condition and things like this. But I think the way it's going to do that is through software,
(40:26):
basically.
And so if we can make it very easy to effectively create software and make it very accessible to the people who want to use it, we'll be able to make great strides forward. So that's what I'm most excited about.
Daniel (40:44):
Awesome. Well, we're definitely excited to see what you all are doing. It's amazing work, and I'm really just encouraged to hear your perspective on the project and the way in which you're building. I encourage all of our listeners to go and check out all-hands.dev. Check it out.
Try it out. And, yeah, thank you both for joining and taking
(41:06):
the time. It's been great.
Graham (41:07):
Thanks for having us.
Thanks so much.
Jerod (41:16):
All right. That is our show for this week. If you haven't checked out our Changelog newsletter, head to changelog.com/news. There you'll find 29 reasons. Yes,
29 reasons why you should subscribe. I'll tell you reason number 17: you might actually start looking forward to Mondays.
Graham (41:36):
Sounds like somebody's
got a case of the Mondays.
Jerod (41:40):
28 more reasons are waiting for you at changelog.com/news. Thanks again to our partners at fly.io, to Breakmaster Cylinder for the beats, and to you for listening. That is all for now, but we'll talk to you again next time.