
December 24, 2025 · 46 mins
AI is writing more of our code than ever before, but should we actually trust it? In this episode of JavaScript Jabber, I sat down with Itamar Friedman from Qodo (formerly CodiumAI) to dig into one of the biggest questions developers are wrestling with right now: What happens when AI is generating code, reviewing code, and shaping how we ship software?

We explore where AI fits into modern code review, whether developers should be worried about job security, and how human responsibility still plays a critical role—even in an AI-powered workflow. From guardrails and quality standards to the future of agent-driven development, this conversation goes beyond hype and gets into what’s actually working today (and what still needs a human in the loop).

AI isn’t replacing developers—it’s changing how we build, review, and take ownership of software. If you enjoyed this conversation, make sure to rate, follow, share, and review JavaScript Jabber. It really helps the show, and it helps more developers join the conversation. Thanks for listening—and we’ll see you next time!

Become a supporter of this podcast: https://www.spreaker.com/podcast/javascript-jabber--6102064/support.

Episode Transcript

Available transcripts are automatically generated. Complete accuracy is not guaranteed.
Speaker 1 (00:05):
Hey, folks, welcome back to another episode of JavaScript Jabber.
This week, I'm your host, Charles Max Wood, and I'm
here with Itamar Friedman.

Speaker 2 (00:14):
Itamar.

Speaker 1 (00:14):
Do you want to let people know who you are
and what you do? I see your shirt says Qodo,
so you want to talk about what they do? And yeah,
then we'll dive in and talk about whether or
not to trust your AI-generated code or code review.

Speaker 3 (00:30):
Really happy to talk about that topic and to be here,
Charles. Like, really a pleasure. The developer community is really awesome,
especially as they're more.

Speaker 4 (00:38):
Specific, as you get into the details. Qodo, what it stands for, you mentioned.

Speaker 3 (00:42):
The name Qodo stands for Quality of Development, and the
main focus of our platform is around AI code review.
In general, we deal with quality, different AI quality workflows,
et cetera. Basically, helping enterprise, professional dev teams to
standardize their quality via the review process, or.

Speaker 4 (01:01):
Shift left, code review and testing.

Speaker 3 (01:03):
It's okay, yeah. We're serving, for example, thousands around the world.
But I think what's really, really exciting, for example, is just
to talk about: is code review important at all, right now
and in the future?

Speaker 4 (01:15):
Forget about Qodo. That's, I think, like, a cool topic.

Speaker 1 (01:18):
Yeah, absolutely. Well, you know, you kind of hit on
something that I think a lot of people are talking
about with the AI stuff, you know, whether it's a
code review or AI-generated code. Maybe it's AI-generated
code that's code reviewed by AI.

Speaker 2 (01:32):
I mean, I don't know.

Speaker 1 (01:33):
It seems like there's a lot of concern as well,
as far as: okay, well, if I've got, you know,
an AI, an LLM, generating code, am I even going
to have a job? Or, you know, maybe they just
downsize most of my team, and so I have to
be the most elite of the awesome elite at my company.
And so, yeah, there is a lot of concern there.

(01:55):
And then the other piece of it is: okay,
I've got this powerful tool, am I using it right?
And so let's talk about the code reviews first, since
that's where you're kind of living these days. And I
don't think we've really gotten into the AI code reviews,
to be perfectly honest. I saw that GitHub does some,
like you can turn this on for some of your

(02:15):
repos, and I haven't even tried it, because it's just,
I don't know. I mean, at work, you have
to have a human code review your stuff anyway.

Speaker 2 (02:25):
And then on my personal stuff.

Speaker 1 (02:26):
It's like... I guess I did turn it on
on one project and I was not that impressed. So
tell me where this fits in, and maybe where
I'm not using it to its full potential.

Speaker 3 (02:39):
Yeah, these are good pointers. So you touch on interesting points.
They're related and different. So let's tackle them, like,
first about losing your job; let me get to the gut of
that point really quickly. I think, first of all, about
the next few years. I'm not talking about fifteen years;
that's hard to predict, especially the future. So let's
focus on five years, and we can go

(02:59):
further out from there. I do have my opinion, a
strong opinion, about the next five years. Forget about it.

Speaker 4 (03:05):
Sorry.

Speaker 3 (03:05):
Dario Amodei from Anthropic and Sam Altman claiming, I don't know,
half a year ago, a year ago, that in twenty
twenty-five you don't need more developers, ninety percent of
code generated by AI. Yeah, SWE-bench, if you know
this benchmark, the software engineering benchmark, going to be ninety-nine
point nine by the end of the year.

Speaker 4 (03:20):
We're far from all of that.

Speaker 3 (03:22):
And now, I claim that while they've decreased their predictions
for twenty twenty-six lower than for twenty twenty-five, I
think even there they're wrong. But that doesn't
mean that the future is not going to be strong.
Right now it already is, and it's going to
be more and more AI-empowered. And yes, at some
point there is going to be an inflection point where
we're actually going to see end-to-end automation of

(03:44):
software, certain aspects of software development. But we can talk
about it via code review specifically, about, you know, Copilot,
et cetera. We are actually good friends, like, we have
multiple clients where we are partnering together. They focus more
on code generation and agentic workflows, on pushing,
like, a GitHub issue into a PR much quicker. We're

(04:06):
more focused on: how do you know that you
can trust that line of code? And I'll elaborate on that now.
Sometimes there are some, you know, features, like, think about
the cloud. The cloud has observability tools, but still
you have the Datadogs of the world. And that's
the difference between a few features that the cloud
has around observability and a full-fledged platform that gives

(04:27):
you the confidence. So I won't go deeper on that
competition, because it's actually quite symbiotic: the more
people use Copilot, the more you need Qodo in order
to trust the code. So it's actually good.

Speaker 4 (04:39):
But code review specifically, I think, like, some might.

Speaker 3 (04:43):
Think of it as a synonym for where you deal with software quality.
But actually it's different; it is part of it. But
let's think about code review's purpose. It has meaningful purposes:
first, owning the code, and learning. Two different, very
closely related buckets. Like, as a team: first, the developer

(05:04):
mostly single-handedly, like, develops a feature or a sub-feature.
And then there's that moment where the team takes responsibility:
if that person is going to be on vacation, who's
going to do root cause analysis if something happens, right?
So it's a moment where you learn each
other's code, learn the software, and own it together,

(05:24):
even if an AI generated it completely, one hundred percent,
from one prompt to a PR, which is rare. But
let's assume, and maybe we'll talk about spec-driven development,
et cetera. Let's say you did write your spec
completely, like, perfectly,
and you've got a feature. Even then, a human needs
to take over, at least for the next few years,
and take responsibility. So that's one thing about code

(05:45):
review. And the second thing is that, like, until now,
and it's going to be so for the next few years,
even if you do the best spec and the best,
like, work, you're still going to have
code that needs to be reviewed. And there's a difference
between a tool that is meant to help you review
and standardize quality and one that is trying to

(06:07):
help you generate code. The same as how there's a difference
between cloud observability features and Datadog. So, TL;DR:
hey, don't worry, in my opinion, about your job.
I would even recommend, differently than all others, I

Speaker 4 (06:20):
Think, ninety-five percent of the industry:

Speaker 3 (06:22):
Do go learn, you know, computer science, but do
work with AI as one of your main tools
throughout the SDLC: code generation, code review, root
cause analysis, generating your spec, everything. And you will just
elevate together with the profession. And that's, I think,

(06:42):
my point about that.

Speaker 1 (06:44):
Yeah, I agree with a lot of what you said there,
and I'm going to back up through some of it,
and then I've got a couple of questions as we
go through. But one is you know, because yeah, you
kind of wove in the am I going to lose
my job?

Speaker 2 (06:56):
Or where do I go with my job? Along
with, hey, where

Speaker 1 (06:58):
Does the code review stuff fit in? And that's
where I have more questions. But I just want to
reiterate a couple of things. One is that we're
not getting away from people having to be involved in
the process; that's what I heard you say. And so,
to a certain degree, yeah, you may have tools that
take on certain aspects of the job, or, you know,

(07:19):
do some of the things that are involved in, you know,
understanding and figuring out what's going on. But at the
end of the day, yeah, you need a human that's,
you know, going to take responsibility for this stuff
and shepherd it through. And if you're looking to
enhance your career, you need to understand and
be able to use these tools, because they do make

(07:42):
people more productive, and so the adoption of this stuff
is inevitable, basically. And so those are the transitions
in the role, right? The role of a
software developer right now versus in five years is completely different.
I love the word that you used, shepherd. Basically,
you're going to deal a lot with writing specs, writing
your rules, writing your best practices, like, navigating

(08:06):
your army of agents, specializing in how to deal with
specific problems that might.

Speaker 3 (08:12):
Come around with that. How everything I said is going
to be evolved, et cetera. So it's going to be
completely different, like higher level, more architecture, guardrails stuff. And
it's a process that happened throughout the last twenty thirty years.

Speaker 4 (08:26):
Right.

Speaker 3 (08:26):
We used to punch cards and write in assembly,
et cetera. So it evolved. So it's going to be different.
But there's a lot of things for a human to
take responsibility and ownership.

Speaker 1 (08:38):
Yeah, and wherever we end up, right, because I think
it's optimistic of anybody to say, well, this is really
where I think we're going to be even in two
or three years, because things are moving so quickly, right,
And you pointed out, you know, Sam Altman and some
of these other folks, they thought they knew where this
was going, and they just you know, it's impossible to
really predict it. But if you're on top of what's

(09:01):
going on today, then it's a whole lot easier to
adapt to whatever comes later. And then as far as
the like the AI run code reviews and things like that.
So you talked a little bit about somebody's going to
have to understand and maintain and take ownership of the code,

(09:23):
and a lot of that knowledge is passed through a
code review, right? So are you advocating that
you have a human reviewer and an AI reviewer, or... So, yeah,
how does this fit into the life cycle of my
features and things like that.

Speaker 3 (09:40):
Yeah. Let's talk a bit about nostalgia first.
Like, you remember those days? I think me and
you are, sorry for, I mean, sounding old, old enough
to remember the days when we actually used books.

Speaker 4 (09:53):
Do you remember? Before

Speaker 3 (09:54):
We went to Google, or whatever the other websites were,
we were actually using books to learn about things, et cetera.

Speaker 4 (10:00):
And I still use books.

Speaker 1 (10:03):
That's kind of my, maybe I'm going to sound old,
but... mostly, yeah, if it's technology, I do spend
a lot of time on the internet kind of picking
up the new things, right? Because the books get out
of date kind of fast, depending on how quickly technology moves.
But yeah, I prefer picking up the books and just
getting kind of the classic ideas that don't change.

Speaker 2 (10:26):
That everything runs on.

Speaker 4 (10:27):
So we are nostalgic about it.

Speaker 3 (10:29):
But if I have to force you to choose one,
I think I know what you're going to choose. And
think about how much we delegate to Google or other
technologies on the Internet and intranet, that we're not
going to validate right now to the deepest part
of it. So I think it's right now already, like,

(10:49):
a very human-digital, call it AI or not,
technology-based development already, as it is today.
And I think the same thing is going to
happen with code review, where, actually, if you have a
hundred, and we are seeing customers that have more
than that, like, a set of guardrails, rules, standards, and

(11:10):
I can give you an example: having a human
reviewer doing all of that at once, in the
limited time they have, is actually problematic. And if you can
free up that time to focus on where, you know, a human
could be the most useful for that code review, it's
actually going to produce better learning, especially if the
AI tools you're using care about learning. So even

(11:32):
if it caught something, it's not just like, oh, I fixed it.
It's rather: hey, this is the reasoning, and that's what
we learned from it, and, by the way, this is
how we changed our second brain. And so
let me give you an example. It could be
that someone in the company really cares about, you know,
feature flags, and they didn't manage to tell everyone, not
in a five-hundred- or a-thousand-developer organization, that that's

(11:55):
what you care about. You can configure that in the
system, an AI system. And now, if you developed a
new feature, you get an alert: hey, you didn't put
in a feature flag. This is also
learning, like: hey, oh, the company cares about feature
flags. I have a hundred examples like this, but just
one, you know, very, I

(12:17):
think, famous, funny, or scary, not-funny one. I'm
a developer. I wake up in the morning; I have
a slot of forty minutes to review a PR, a
pull request. If you give me five lines of code
in that PR, I'll give you fifty comments. If you
give me fifty lines, I'll give you five comments. If
it's five hundred lines: looks

(12:37):
good to me, right? Like, in forty minutes, how can
I review five hundred lines of code? That's the reality. And
by the way, AI could even help us with classifying
which review, which PR,

Speaker 4 (12:49):
you can actually do in your forty minutes.

Speaker 3 (12:51):
That's how doctors also work: they get an AI
tool that helps them prioritize, not necessarily tell them if
that person has whatever diagnosis, but just surfacing things, right?
So the opportunities are awesome. Actually, the problem is real
and already exists around learning and catching issues. Tell me if
you didn't have a major issue; which developer did not
have a major issue?

Speaker 4 (13:12):
In production?

Speaker 3 (13:12):
I can tell a few horror stories myself. And so
the opportunity is not reducing the learning or increasing bugs;
it's actually trying to, you know, make this better.
But we do need the right UX/UI and the
right mindset on the team that's building the dev tool.

Speaker 4 (13:28):
So obviously like.

Speaker 3 (13:29):
Sorry, shameless plug about Qodo, but it's not only us, okay?
We're not going to be the only ones who
care about it.

Speaker 2 (13:35):
Yeah, I think... how do I put it?

Speaker 4 (13:38):
So?

Speaker 1 (13:39):
Programmatic checking of your code has been around for a
long time, right? So you have the static analysis tools,
you've got the linters and stuff like that. The limitation
with those is that typically, at least the ones that
I've seen, they break your code down into an abstract
syntax tree and then they look for patterns.

Speaker 2 (13:58):
Is basically what they do.

Speaker 1 (13:59):
And the AI systems, they kind of do that. But, yeah,
you can be, I guess, a lot
less prescriptive as far as what the patterns are, right?
And so you can train it on how to look
at code and what to look for, but you can
be a lot more broad, and it'll figure out how
to do some of the stuff that you have to

(14:21):
explicitly tell the static analysis tools and the
linters how to do, right? And so you can say,
like your feature flag example, right? You know, I mean,
depending on the training of the
system and, you know, how good the data set is
and things like that, and how good your prompt is.

Speaker 4 (14:37):
But you can teach it.

Speaker 1 (14:39):
Essentially: this is how you identify a new feature, right?
And then you have to make sure that there's a
feature flag around it. And it can figure out the
other steps; it can infer what to do. And so,
on a lot of those things where you're saying
you set up a rule, effectively what you're saying is:
my code has to conform to these ideas, and

(15:00):
then it can, with its latent space, you know, the
stuff that it's been trained on, anything else that you
add to the mix, with its context and things like that,
it can then go in and intelligently figure a lot
of that stuff out. Whereas trying to figure out how
to explicitly program it, to say, here's how you find
something like this in the abstract syntax tree, or here's

(15:21):
how you break down this idea. Especially on, you mentioned,
like, the five-hundred-line PR, right? It's like, across
all of these files and all of these changes, that's tricky.
But the LLM can consume it and figure it out
much, much more easily than you can figure out how to
explicitly program it for every case you're going to run into.

(15:42):
And it's not going to be perfect, but it can
do a lot better job, and it can do it
a lot more quickly and a lot more thoroughly.

Speaker 2 (15:49):
Than I can.
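The contrast Chuck is drawing, an explicit AST pattern versus a broader natural-language rule, can be sketched with Python's built-in ast module. This is a minimal illustration, not any tool's actual implementation; the handle_* naming convention and the feature_flag decorator are hypothetical stand-ins for a team rule like the feature-flag example above.

```python
import ast

# Sketch of the AST-based approach: the rule must be spelled out as an
# explicit structural pattern. Hypothetical convention: every function
# named handle_* must carry a @feature_flag(...) decorator.
SOURCE = '''
@feature_flag("new-checkout")
def handle_checkout(req):
    return req

def handle_refund(req):
    return req
'''

def missing_feature_flag(source: str) -> list:
    """Return names of handle_* functions without a feature_flag decorator."""
    offenders = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef) and node.name.startswith("handle_"):
            has_flag = any(
                isinstance(dec, ast.Call)
                and isinstance(dec.func, ast.Name)
                and dec.func.id == "feature_flag"
                for dec in node.decorator_list
            )
            if not has_flag:
                offenders.append(node.name)
    return offenders

print(missing_feature_flag(SOURCE))  # ['handle_refund']
```

Every variation of the rule (a decorator spelled differently, a flag checked inside an if statement) needs another explicit pattern here, which is exactly the brittleness an LLM reviewer can sidestep.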

Speaker 1 (15:49):
Yeah, just kind of browsing through it and, you know,
hoping my pattern-matching brain goes, yeah, that's a problem,
and that's also a problem. And, hey, there's so much
context in here that I'm just trying to understand, you
know? I'm going to pick out all these little things.
But then, when I go in as an experienced programmer
to do the code review, you know, I can see

(16:11):
what it caught and I can say, yeah, ninety nine
percent of this is great. You don't have to worry
about these couple of things that you know, it may
not be quite right on that or it's close, but
you should do this instead of that. And then the
other thing that I can see with it is that
there may be things that are just kind of aesthetic
or organizational or other things that we do that we

(16:36):
haven't codified into the rules, where it's like, no, we
do things this way, not that way. And then you
can also go back and you can retrain, you know,
you can rewrite your prompt to include those on the
next one. But, yeah, I can see where the LLM
looking at the code could do a much broader and
more nuanced analysis than you can get out of some

(16:58):
of these other tools and be more thorough than a
programmer doing it.

Speaker 3 (17:02):
I'd love to relate to the, you know, traditional, quote
unquote, old-world static analysis versus the semantic AI. But
just before that, a bit of philosophy around how
would you actually exploit LLMs, AI, to work
with rules or standards that are written down.

(17:25):
So, for those who are watching the video: I
actually have here what looks like a black shirt, but
really I wear purple, and I'll
explain why, because I think the world roughly divides
into two, and I'm relating to how you use rules
and standards, in Markdown, et cetera. I'm going
to explain; it's connected, I swear. So there's the blue

(17:48):
team and the red team. As a principle, at Qodo we
see ourselves more in the red, with a mix; that's
one of the reasons we chose purple. So I'll
explain the blue team, which is, like, the Windsurfs,
the Cursors of the world, et cetera, right? Claude
Code, Copilot, everyone. Basically, they take those rules, you can

(18:08):
write them in different formats, and they mostly put them in as
part of the context, which in many cases it's actually
listening to.

Speaker 4 (18:16):
But there's two problems. One I said many cases.

Speaker 3 (18:19):
The second is that maybe it took it as part
of the context, but, as usual, it doesn't work:
you don't finish a feature, at least not a
meaningful feature, from prompt to code, even if it's an agent
that runs for ten minutes, usually, or one hour. And
sometimes you usually, like, get to prompt it over and over, like.

Speaker 4 (18:38):
Like navigate it.

Speaker 3 (18:39):
Then at your tenth, you know, prompt, it actually very,
very probably missed something from your spec or
your rules or even your own prompt. And the
second thing: that's the blue way of thinking. What
they prioritize, their KPI, is, like, UX/UI. We

(19:03):
admire on them on that and speed from prompt to code,
et cetera. The red team is how we take every
fact that is there as an intent they're functional or
non functional, and verify it. That's a totally different process.
So you take the list of your one hundred rules,
whether it's ten or or one thousand by the way,
and you and you check them with and write lam

(19:27):
to really increase, I can't say to one hundred percent,
but really to ninety-nine-whatever, the chance
that they're actually checked. I just want you to, you
know, know that it's important to differentiate between the different philosophies, and
it actually leads to a different UX/UI, et cetera. And
that relates to the first topic, of ASTs. I think
that they're actually very powerful tools, like Sonar, for example.

(19:50):
I would definitely... I have, like, high expectations of that company.
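The "red team" process Itamar outlines, checking each written-down rule against a change independently rather than stuffing them all into one generation prompt, can be sketched in a few lines. This is a toy illustration, not Qodo's actual mechanism; RULES and ask_model are hypothetical, with ask_model standing in for a per-rule LLM judgment and stubbed here with trivial string checks.

```python
# Toy sketch of the "red team" philosophy: every rule is verified against
# the diff separately, instead of hoping one big context window catches
# everything. RULES and ask_model are hypothetical stand-ins.

RULES = [
    "New features must be wrapped in a feature flag",
    "Public functions must have docstrings",
]

def ask_model(rule: str, diff: str) -> bool:
    """Stub for an LLM call: does this diff satisfy the rule?"""
    if "feature flag" in rule:
        return "feature_flag(" in diff
    if "docstrings" in rule:
        return '"""' in diff
    return True

def review(diff: str) -> list:
    """Check every rule independently; return the rules the diff violates."""
    return [rule for rule in RULES if not ask_model(rule, diff)]

bare_diff = "def handle_refund(req):\n    return req\n"
print(review(bare_diff))  # both rules come back as violations
```

The design point is the loop itself: one verification pass per rule keeps any single rule from being diluted in a long context, which is the failure mode described for the "blue team" approach.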

Speaker 1 (19:53):
And, yeah, they do a great job within the
set of things that they're, you know, capable of doing.

Speaker 3 (19:57):
They do, too. And I think, like, eventually, if you
want to exploit AI, really exploit it, move fast with confidence,
you want that mix of, you know, LLM-, AI-empowered
code review, code quality, with the static analysis. And it would
be best if you can actually mix them together,
and you will see integrations; we have integrations with

(20:20):
those tools, because they might catch, in many cases,
the same thing, and you don't want that annoying
double-reviewing thing. But at the basic level
of the technology, you want these technologies to
work together, because otherwise you don't have the confidence to
really, like, okay, take Claude Code. In my

(20:40):
experience, often, I wouldn't say daily, but weekly,
with one prompt I can get one thousand
lines of code changed in five minutes.
Go review that, right? And now, like,
you want the maximum help of AI, or, sorry,
any technology, to help you navigate to getting confidence about

(21:02):
that code: quality, correctness, maintainability, et cetera. So I have,
like, high expectations of these tools working together,
being that counterpart, the red versus the blue, so
you can get to purple as a dev organization.

Speaker 1 (21:19):
So, I mean, going into this, right? And, I mean,
we may change the title when we publish, but I
kind of tongue-in-cheek put the title as Can
I Trust AI-Generated Code? And so now you're
talking about this other end of things, right, where it's
like you've got the Copilots or Claude Codes or Google
Gemini or ChatGPT.

Speaker 2 (21:39):
Or whatever, right or whatever?

Speaker 1 (21:42):
Right. Yeah, and so, you know, they're using these models
to generate code, right? And then, yeah, Kiro. We did
an episode with Eric Hanchett where, you know, in Kiro
it helps you generate the entire spec, and then basically
it kind of iterates through step by step and does it.
It was funny, because we did that episode, and I'm sitting
It was funny because we did that episode. I'm sitting

(22:03):
there going, well, I've been using Cursor and vs Code
with Copilot and neither of them do it, and literally
a week later is showed up in Cursor. And so
so I've used that feature a handful of times and
it's really nice. The other thing that I'm going to
point out because you're like, yeah, you know you have
claud code generate you a thousand lines of code or whatever.

(22:23):
I was having a conversation yesterday with Obie Fernandez. He's
a Ruby developer, but I've seen the same thing from
JavaScript friends, where, you know, they use Claude Code
and they have it generate a bunch of the code,

Speaker 2 (22:37):
Right, and then they kind of review it on their own.

Speaker 1 (22:39):
And the thing he basically said, because he used
to run a consultancy, one of the bigger consultancies out
there in the Ruby space, he was saying: I
built this app in a month. It would have
taken us six months and probably five hundred thousand dollars
of developer salary to build this, and I've done it

(23:01):
by myself in a month with AI help. And so
I'm looking at it and going, well, so to a
certain level, I guess the answer to can I trust
AI-generated code is yeah, I mean, to a point, because
it works, right? But, yeah, so if you have the
AI generating the code, I mean, where do you run
into problems there? And then, you know, especially with what

(23:24):
you're seeing, where you have this system that then reviews
the code. So if you have the LLM generate the
code and the LLM review the code, right, is that
kind of a circular dependency that does or doesn't work?
I mean, this is where I'm starting to get into. Okay,
you know, I've got these tools that are supposed to
empower me, but do they actually work nicely together to

(23:46):
give me that red team blue team work out?

Speaker 4 (23:49):
Yeah?

Speaker 1 (23:49):
And I guess the other concern is, you know,
am I actually moving faster, or am I moving faster
with a gun pointed at my foot

Speaker 2 (23:56):
The whole time?

Speaker 4 (23:58):
Yeah, we hear that last point a lot.

Speaker 3 (24:00):
I think, like, in short, what I mostly want to relate
to is the main question, the topic of today:
should you trust AI-generated code? I do have an answer,
but I'll just say, I do think the future
is already happening now, like, a lot of clients
connecting Qodo with Cursor, Qodo with Copilot, or whatever,
and that's more.

Speaker 4 (24:21):
Yeah. I just want to, okay, yeah, I
just don't want to spend too much time on that.
with that.

Speaker 3 (24:25):
I guess there's a lot of commercial today, so it's
possible it's going in that direction.

Speaker 4 (24:30):
More and more.

Speaker 3 (24:31):
I do want to answer the question, but with a
metaphor before. If I ask, do you trust
human code? Wait, wait, don't answer. Let me give
you a second; just think about it for a second.
Do you trust human-written code? Like, let's
think about it a little bit. Sorry, no,

(24:53):
because, you know, it's a good point.

Speaker 1 (24:56):
The flip side, though, is that, because, I don't know
that it's necessarily... do you trust human-generated code? A
lot of it depends on the human, right? And, you know,
it's like, okay, how much experience do they have? Have
they done this before?

Speaker 4 (25:08):
You know?

Speaker 1 (25:09):
Are they taking security and scaling and all the other
things into account?

Speaker 2 (25:14):
Right?

Speaker 1 (25:14):
But the other thing is that I don't think
that that's the baseline that we have to look at
AI from, because it's not whether or not I trust
human code versus whether or not I trust AI code,
because at this point, the human-generated code is the baseline.

Speaker 2 (25:31):
We've been doing that for years and years and years
and years.

Speaker 1 (25:33):
Is the AI code better? Or, the other way you
look at it, from a business standpoint: is it close enough,
given what I had to do to generate it,
which is usually a prompt and not nearly as much time? Right,
so maybe it's not as good as human code in
a number of ways, but it's good enough, and it's

Speaker 2 (25:55):
A lot cheaper, and so it may be worth it anyway.

Speaker 4 (25:59):
I'll use that.

Speaker 3 (26:00):
But still, bear with me on the metaphor. If
I ask, and I'll borrow something that you said
to make my point, if you ask,
do you trust human-developed, generated, written code,
then I think that the immediate
answer is, yes, that's what we do. But then
you think about it a bit more, like you just did,

(26:21):
and you're thinking: do I trust that person, even if
it's a senior developer, to write the code in Notepad++
on the airplane and then push it to production?
I'm not saying I never did that, but to
push it to production? I think your answer is no.
And then what you are saying is: a senior
developer that is going through the processes that are required,

(26:44):
checking also for security, reviewing for the standards and
all that, and going through the review process, and checking
the CI results, et cetera: that generated code you do trust.
That's my answer. Should you trust AI-generated code as
it's spit out? You wrote a prompt, you got code;
trust it? No, I suggest you don't. The same as for the most

(27:06):
senior developer: you have a process. The process could be quick,
and I think that's what

Speaker 4 (27:10):
We're going to see.

Speaker 3 (27:11):
Also for the AI-generated code. You're going to see
AI being used to automate more and more parts,
and then AI-generated code is going to look more and
more like AI-generated software development, AI
software development, where AI is going to do a

(27:32):
proper process. Humans, by the way, at a certain percentage,
which in the beginning will be very large,
are going to be involved throughout the process, and over
time the process becomes more like, you know, verifying
that the pipeline works, you know, like we do
in manufacturing, in a lab, right?

(27:54):
It started with human labor, and slowly it became more
and more automated; humans are still involved there,
et cetera. It took like fifty years to do that.
It will also take like fifteen, twenty years here. We've talked about

Speaker 4 (28:07):
Predicting when it's totally machines.

Speaker 3 (28:11):
Meanwhile, I just invented a number, like fifteen years,
like twenty forty-one, just a guess; the four and
the one look like "AI," so I guess it stuck with me. But
until then, we're just going to see more and
more portions being automated, and maybe some portions
automated end to end. And at that point you will
trust it, because it's not just statistically spitting out
code, and even if it's

(28:32):
trained really, really well, it's also going to validate itself,
verify itself, go through the process, et cetera.

Speaker 4 (28:38):
And that's the future. That's the future we believe in at

Speaker 3 (28:40):
Qodo, right. That's why we focus on code quality,
code verification, et cetera.

Speaker 2 (28:44):
Right.

Speaker 1 (28:45):
I think the point is well taken, because a
lot of people conflate where things are with where
we're going to end up. And what you're saying
is that, yeah, we're going to get better and better
tools to do more things, and they're going to
manage more and more pieces of the process, right? And

(29:06):
and yeah, I think that that is absolutely true. It's
funny because for several years, you know, I've been using
rock or chat, GPT, or Claude. I've used all of
them for different things.

Speaker 2 (29:21):
Right.

Speaker 1 (29:21):
It's like, hey, I'm trying to explore this thing, right,
and it'll give me all kinds of feedback on,
you know, health or whatever. And so I kind
of use it as a coach, or at least, you know,
sometimes I'll go fact check it, or verify
this or that, or refine whatever it gave me.
But, you know, a lot of times it shortcuts a

(29:42):
whole bunch of research that I would have to do,
and so then I can just verify the pieces where
I'm like, this doesn't seem quite right. But it's
gotten more and more correct the longer we go,
because the models get better, and the data
that's in that latent space gets better.

Speaker 2 (29:58):
And so that's what I definitely see with software.

Speaker 1 (30:02):
Yeah. But as far as conflating where we are with
where we're going: a few years ago, I think it
was just last year actually, I was having a conversation
with my father in law. Now, granted, he's a general contractor,
and by general contractor, I mean he fixes crap
in people's houses. And he just heard about the goof ups in the
news, where it was like, well, I heard somebody ask

(30:25):
ChatGPT this thing and it told him this bogus thing.
And I'm like, yeah, Dad, but we've moved like four
models ahead since then and it doesn't do that anymore.
And he's like, yeah, well, you just can't trust it
for anything. And again, I'm looking at
him and going, well, actually, I use it all the
time for this other stuff, because you can trust it, right?

(30:47):
But yeah, I go and fact check stuff. It's like,
you know, I think you have a bias here, so
I'm going to go fact check these pieces, because
I think the data you were trained
on isn't one hundred percent in line with my worldview
or the way I think things are. But at the
same time, yeah, it's gotten way, way, way more accurate,
especially when it gets into a lot of the, you know,

(31:08):
easier stuff, like meal planning, and hey,
I've got to modify this workout, or this or that,
and it's terrific because it's got all
that data in it. And I think that's where we
go with the software. My point is, even
where we see it kind of fall short: is it
much better, much more accurate, much more thorough

(31:30):
than I can be as a code reviewer? And
it looks like the answer to that question is undoubtedly.

Speaker 4 (31:36):
Yes.

Speaker 1 (31:37):
It may give me some false positives and some false negatives,
but it's going to be more thorough and much faster,
and I can pick through that, and it's still going
to save me a whole bunch of time and effort,
and it's only going to get better, right? So where
we end up in a few years may be
completely different, but it's almost certainly

Speaker 2 (31:54):
Going to have better data to run on and make
the process better.

Speaker 3 (31:57):
I think, like, I truly believe there is meaningful improvement
in the LLMs. Some people claim that over time
it's diminishing, the velocity of improvement, right, because it consumes
what it's been putting out. I've heard that, for that
reason, for example, et cetera. I think there are meaningful improvements.
We have internal benchmarks around quality of code, et cetera.

(32:20):
It's going up. And having said that, I
think there are other reasons the extracted value is bigger. First,
we use the LLMs in more areas.
But specifically, I want to relate to
what you said: I think it's also that we're learning how
to use it.

Speaker 4 (32:35):
So, you know, like, as a developer, you learned how
to Google, or you learned how

Speaker 3 (32:39):
to use Stack Overflow. You learned how to do a
good Google search; that's still very
relevant. And I think we are learning how to,
you know, prompt and use it. That could be one of
the differences between you and your father in law,
et cetera. And that actually relates, for example, to what
we talked about with spec driven development,

(32:59):
that we're learning that the information should be concise
and accurate, but the more
information we provide as part of the prompt, if we're
talking about software development, like maybe a spec, in most cases, the

Speaker 4 (33:12):
Better job it will do.

Speaker 3 (33:13):
By the way, once in a while
I give a disclaimer, or try to be careful, because
there's a lot of research done, for example by
Anthropic, who are really good at it, showing that if you
think you can push as much context
and instruction as you want and expect it to really work well,
they're actually seeing diminishing returns, or even worse, if
you give it a spec like a full book,

(33:35):
even if the context window is bigger. But putting that aside,
it is a really good idea to

Speaker 4 (33:40):
Learn how to use these tools.

Speaker 3 (33:41):
And I think we're actually, consciously and unconsciously,
doing that, for software development specifically.
And I think even in JavaScript, where, you know, the language
is maybe not that descriptive, et cetera, having a
proper spec is a good idea. Although I have
to say, I'm not a big believer that spec driven

(34:02):
development is going to be the last thing that survives,
or the biggest thing that actually makes
the difference. It's an important concept, but it's not going
to be the thing that solves everything.
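The "concise but information-rich spec" idea Itamar describes can be as simple as assembling the prompt from a small structured spec instead of a vague one-liner. A minimal sketch follows; the field names are hypothetical, not taken from any particular spec-driven tool.

```javascript
// Hypothetical spec shape: goal, constraints, and acceptance criteria.
// Nothing here is a real tool's format; it just illustrates handing the
// model concise, structured context rather than a one-line ask.
function buildPrompt(spec) {
  return [
    `Goal: ${spec.goal}`,
    "Constraints:",
    ...spec.constraints.map((c) => `- ${c}`),
    "Acceptance criteria:",
    ...spec.acceptance.map((a) => `- ${a}`),
  ].join("\n");
}

const prompt = buildPrompt({
  goal: "Add retry logic to the HTTP client",
  constraints: ["No new dependencies", "Keep the public API unchanged"],
  acceptance: [
    "Retries three times with exponential backoff",
    "Unit tests cover the failure path",
  ],
});

console.log(prompt);
```

A short, accurate spec like this also respects the caveat about context length: structured and concise, not a book-length dump.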

Speaker 2 (34:13):
No, I think.

Speaker 1 (34:14):
And I'm just going to piggyback on what you're saying,
because I think you're correct in one way: how we
use the tool, right? And so, you know, spec driven
development is one way, you know, this is a new way to use

Speaker 2 (34:26):
the LLM, and, you know, maybe have a wider

Speaker 1 (34:30):
context on what it's doing, and give it a step
by step cohesive plan. But yeah, I don't think that's
where we end up. I mean, we're going to invent
other ways of using these tools, and this may be
a stepping stone to something else. The other thing, though,
is that it's not just, hey, we're getting
better at using these tools, but also, as it takes
things off of our plate, we're able to refine in

(34:52):
other areas, and I think those get better, because the
next versions of the models pick up some of those
changes to the way we do things outside of how
we use the LLM, and make it better that way too.
And so at some point, you know,
are we getting smaller increments of value?

Speaker 2 (35:12):
Maybe?

Speaker 1 (35:13):
But again, I just see the ingenuity of people,
as we go, continuing to be really cool and awesome,
and so for the time being, we just see these
astronomical leaps every time we get a major version update
on these LLMs.

Speaker 3 (35:30):
Yeah. By the way, my background is in
machine learning since two thousand and six,
neural networks in twenty ten, so I allow myself to talk
about history. I think once upon a time, until
roughly speaking GPT three, three point five, every time you

(35:54):
trained a model, in like ninety nine percent of
the cases, you tried to be better than the
others in a specific niche, right, even if it's a
big niche. Still, with GPT three point five, I think we
had like a year or two or more where we
were under the assumption that the general model matters, not the niche market. Wow, like,

(36:15):
look at this GPT three point five winning every benchmark,
even human benchmarks, if you remember those graphs, amazing graphs
from OpenAI, with axes showing different professions, from
lawyer stuff to history, and the y axis showing the percentile
on their official tests. And GPT three point five
crossed every model on all of these, like, fifteen

(36:37):
different professions, and then GPT four the same, et cetera.
At this point, it's not the case anymore. I think
since, roughly speaking, Sonnet three point five, if you're
familiar, from Anthropic, et cetera. That model was suddenly
better, some claim much better, than
GPTs on coding, but probably not in all.

Speaker 4 (36:58):
Quite a few cases.

Speaker 3 (37:00):
Now, there was a moment where people thought
only OpenAI, Anthropic, or whatever, Google, were going
to generate foundation models, but I'm seeing dedicated
foundation models in health, medicine, customer success, et cetera.

Speaker 4 (37:14):
And, like, we do see GPT

Speaker 3 (37:17):
five, for example, and maybe there are new versions
that are coming that are better on specific
aspects of software development, right?
So there is evidence that it's incremental,
and I think that actually somewhat means that we're now maturing.

(37:37):
And I'm not sure if that's what people thought, but I'm
going to say we're maturing. Okay, let's
get to those edge cases. Okay, we
probably need a specific LLM and a specific agent for
a specific quality measure that we want to track or help with,
et cetera. So I think it doesn't mean
that the whole solution doesn't keep evolving

(38:00):
upwards, right, at the same or even bigger speed, you know,
like the same

Speaker 4 (38:05):
As uh uh.

Speaker 3 (38:07):
as in the book we talked about, Ray Kurzweil's The
Singularity Is Near: there's this notion that we're seeing
exponential growth in technology, but if
you're zooming in, then you're seeing S curves, and
the thing is, the time difference
between each S curve gets smaller and smaller, right,

(38:28):
and if you zoom out, it looks exponential. So, like, yeah,
maybe for that specific GPT, sorry, specific LLM attention architecture and
specific training, et cetera, the low hanging
fruits are over, but we are
seeing, in the agentic world and other technologies,

(38:50):
more breakthroughs, and I can mention more, and overall,
we're going to see AI keep
going upwards. So I wouldn't, like, say,
you know, incremental LLM. The solution we're getting
is going to get better and better, and we should adopt it.
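The stacked S-curves picture from Kurzweil that Itamar invokes can be sketched numerically. Each wave below is a logistic curve with made-up midpoints and heights, purely for illustration: any single wave saturates, but the running sum keeps climbing, which is the zoomed-out "exponential."

```javascript
// One technology wave modeled as a logistic S-curve (parameters invented).
const logistic = (t, midpoint, height) =>
  height / (1 + Math.exp(-(t - midpoint)));

// Successive waves: each one starts rising as the previous flattens out.
const waves = [
  { midpoint: 0, height: 1 },
  { midpoint: 5, height: 2 },
  { midpoint: 10, height: 4 },
];

// Total progress is the sum of all the waves at time t.
const progress = (t) =>
  waves.reduce((sum, w) => sum + logistic(t, w.midpoint, w.height), 0);

// A single wave has essentially saturated by t = 20...
console.log(logistic(20, 0, 1).toFixed(3));
// ...while the stacked total is still growing.
console.log(progress(0) < progress(5) && progress(5) < progress(10));
```

Zoomed in, each curve flattens; zoomed out, the sum of curves keeps rising, which is the distinction being made about any one LLM architecture versus AI overall.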

Speaker 1 (39:07):
Yeah. All right. Well, we're getting toward the end of
our time. I hate to cut this short, because
I could sit here and talk about this forever. But yeah,
we're going to just roll into our picks. I've got
like five minutes before my work meeting,
so I'm going to jump in and move to picks.

Speaker 2 (39:29):
Now.

Speaker 1 (39:29):
Picks are just shout outs about whatever it is that
we've been up to and enjoying lately. So the first
pick that I have is: on Friday, I'm going
to be teaching board games. I do this periodically. Hang on.
So, yeah, I'm teaching board games at a board
game conference, and I've picked most of the games we're teaching.

(39:53):
The one game I haven't picked, well, there are
two of them. One of them I'm learning tonight, and
the other one I learned last week. And this one's called
Far Away. It has a board game weight on
BoardGameGeek of one point nine one, which means
that it's fairly approachable for the average board game player.

(40:14):
And so what it is, how do I explain it?
It's mostly cards. So you're playing cards in front
of you, and you have like eight slots. So
you play your first slot, and then your second slot,
and then your third slot. But when you score it,
you score it back the other way. And so the
last card you put down is the first one you
score, and it's available for scoring against

(40:39):
all the other cards that you played. And then you
flip over the next to last card that you played,
and you score it against the two cards that you
have down.

Speaker 2 (40:49):
If you play cards. So let's say you play.

Speaker 1 (40:51):
the twelve, and then you play the fifteen after
the twelve. Then you also get some other cards,
I can't remember what they're called, but those ones count through
the whole scoring process. And anyway, you just kind
of build up this deck, and then you score it
back up the other way. It was really fun.
I think it took us like a half hour. There

(41:12):
were four of us playing. It says that you can
play it with ages ten plus. You probably can.
If you're going to play competitively, as far as like, hey,
you know, I'm stacking all these cards up so that
all the resources on the earlier cards play nicely with
the later cards, a ten year old might struggle with,

(41:33):
you know, planning ahead that far, figuring out what to do,
but they could definitely play the game, and enough of
it is common sense, enough to where they can probably
at least wrangle their way through a lot of it.
So yeah, I'm going to pick that. It's
called Far Away, on the board game

Speaker 2 (41:50):
Pick, and then let's see other picks.

Speaker 1 (41:54):
So I think I might have
picked this last time, but I'm going to just
pick it again. There's a movie that came out
called Truth and Treason, by Angel Studios. My wife and
I are members of Angel Guild, so we pay every
month to be part of that. We get to vote
on the movies that they make, and it's also part
of our subscription to the Angel app, where,

(42:18):
you know, you can watch videos and you can say,
I don't want any of this kind of profanity or
any of this kind of content, right? So it'll cut
all the sex scenes out of your movies and stuff
like that. So we wind up getting tickets, as
part of our Angel Guild membership, to all of these
movies when they come out in theaters. So this one
is a World War II film. It's the story of
three young men who become disaffected with the Nazi regime during

(42:43):
World War Two, after their Jewish friend gets disappeared by
the SS. And so they start distributing leaflets, putting
them in mailboxes and on cars and things
around Hamburg, and they get caught. And the movie's about
them and, you know, what happened to them. And so anyway,

(43:04):
it was really, really good. One of the
things I like about these kinds of movies: I mean,
it's a sad story, you know, in the end,
you know, the way that it all goes
for them. But it's like, look, you know, how
willing are you to stand up for what's right and
what's true?

Speaker 2 (43:19):
And I think in.

Speaker 1 (43:20):
Today's world, in certain parts of the world, yeah, you
may be risking your life, you know where I live. Yeah,
I guess they do kill people for that because it
killed Charlie Kirk. But I don't feel like somebody's going
to kill me for standing up. But I've had people
come after, you know, my reputation and things for things
that I've said. But again it's down to how, you know,

(43:43):
are you willing to stand up for truth?

Speaker 2 (43:45):
Are you willing to.

Speaker 1 (43:48):
You know, do the right thing even if it costs you.
So anyway, it's called Truth and Treason. I don't know
if it's still in theaters or not. I think it
still is, probably until like Thanksgiving. So yeah, definitely worth seeing. Terrific,
terrific film. And then my wife and I
are still playing Jaws of the Lion, which is one
of the Gloomhaven board game setups, and it's

(44:11):
basically, without a Dungeon Master, a self directed D and D
kind of game, the
difference being that it's not as free form. You
actually have cards that give you your abilities, and so
you play the cards. Anyway, very fun. So I'm
going to pick that as well. Itamar,

Speaker 4 (44:32):
What are yours? Okay?

Speaker 3 (44:34):
Like, I just saw the movie Good Fortune, and, uh,
it was the first time I've seen a movie in a
theater in New York. I just relocated here, and
I felt that there was a somewhat funny mix of, uh,
you know, today's tech bro, you know, thing,

(44:59):
this discussion, together with all
the apps that we're using daily, and how, like, these
apps that are designed by supposedly tech
bros influence our day to day. Keanu is playing
there as, like, an angel. I don't give it, like, ten

(45:21):
out of ten, but coming from the tech industry, et cetera,
I thought it's an interesting movie, also to see
a little bit how other people
see our industry, et cetera.

Speaker 4 (45:33):
So I definitely liked it.

Speaker 2 (45:38):
Very cool.

Speaker 1 (45:39):
All right. Well, one last thing, and then I have
to jump off for a work meeting I'm already late
for. If people want to check in and see what you're
working on,

Speaker 2 (45:48):
check out Qodo. Where do people go for any of
that stuff?

Speaker 4 (45:51):
Yeah, totally. So first of all, qodo.ai.

Speaker 3 (45:54):
From there, we have, like, everything. We are on social media
obviously as well. Personally, itamar underscore mar is
my handle on X, Twitter. And we have multiple open source projects.
We're actually going to contribute some of them to one
of the open source foundations.

Speaker 4 (46:11):
Still learning which one is the best one.

Speaker 3 (46:14):
So for example, we have like a pull request code
review agent, so you can find it.

Speaker 4 (46:19):
It's called PR-Agent.

Speaker 3 (46:21):
It's very different from our main product, by the way,
very, very different, but it is part of our
collaboration with the community. So those are a bunch of
ways to reach out, and we love hearing from the
community. Code review is subjective, quality is subjective,
and we do need to standardize that; people think about it differently.
So we're hearing everyone. Please reach out about anything, and we'll be

(46:43):
in touch.

Speaker 2 (46:44):
All right, cool, Well, thanks for coming. This was fun.

Speaker 4 (46:47):
Yeah, same here. I really loved it.

Speaker 2 (46:49):
All right, folks, we'll wrap it here till next time.

Speaker 4 (46:52):
Max Out