All Episodes

October 29, 2025 64 mins
How do you build quality software with LLMs? Carl and Richard talk to Den Delimarsky about the GitHub Spec Kit, which uses specifications to help LLMs generate code for you. Den discusses the iterative process of refining specifications to produce better code, and then being able to add your own code without disrupting the process. The conversation delves into this new style of software development, utilizing specifications to break down tasks sufficiently for LLMs to be successful, and explores the limitations that exist today.
Mark as Played
Transcript

Episode Transcript

Available transcripts are automatically generated. Complete accuracy is not guaranteed.
Speaker 1 (00:01):
How'd you like to listen to dot net rocks with
no ads? Easy? Become a patron For just five dollars
a month, you get access to a private RSS feed
where all the shows have no ads. Twenty dollars a
month will get you that and a special dot net
Rocks patron mug. Sign up now at Patreon dot dot
NetRocks dot com. Hey, welcome back to dot net rocks.

(00:36):
I'm Carl Franklin and A'mirchrid Campbell, and we're recording this
right before we leave for dev Intersection in Orlando. But
you are not home, sir.

Speaker 2 (00:44):
No, No, It'll all have been over by then. It'll
almost be, you know, Halloween, so right, I'll be in
the Netherlands when this publishes, so something.

Speaker 1 (00:53):
But right now, you're not at home, are you?

Speaker 2 (00:55):
I am, Yeah, I'm leaving. Oh you are. Literally we're
recording a show and then I'm packing up to head
to the mainland tomorrow. Okay, see you in Orlando, dude.

Speaker 1 (01:02):
All right, Okay, I thought you were on a trip before.

Speaker 2 (01:06):
Maybe you were.

Speaker 1 (01:06):
I don't know. I can't keep track of you. I
don't even want to.

Speaker 2 (01:09):
Yeah, me neither. I don't even know where I am,
Wait a minute, are you?

Speaker 1 (01:13):
Who are you? What'd you do with Richard Mind?

Speaker 2 (01:15):
I'm just another AI.

Speaker 1 (01:16):
Yeah right, uh well, let's you know, since this is
episode nineteen seventy four, let's talk about what happened that year,
shall we before we do anything else.

Speaker 2 (01:25):
Oh, it's an exciting year. It's an exciting year. Nixon.
Bye bye, Where do you want to start?

Speaker 1 (01:30):
Bye bye Nixon. Yeah, he's not a crook, but he
is resigning.

Speaker 2 (01:33):
Are you sure? I don't know, I sound so sure?

Speaker 1 (01:35):
Well he said so, yeah, okay, And you know, after
that day, we didn't have Dick Nixon to kick around anymore.
So I think he did for a while longer, but yeah,
oh yeah, yeah, as long as I miss was alive,
he was getting kicked on a daily basis.

Speaker 2 (01:50):
Yeah.

Speaker 1 (01:51):
The Turkish military invaded Cyprus following a coup, leading to
a division of the island and increased international tensions.

Speaker 2 (02:00):
Which still persists this day. That division in Cyprus.

Speaker 1 (02:03):
The Carnation Revolution occurred in Portugal. That's a coup resulting
in the overthrow of the authoritarian regime. So the FIFA
World Cup took place in West Germany, where the host
nation defeated the Netherlands two to one to win the championship.
Just in case you were wondering, the boxing match known
as the Rumble in the Jungle featured Muhammad Ali defeating

(02:26):
George Foreman in Zaire. Ali would go forward to have
a great after career life in George Foreman sold grills.
He did a lot more than that. He had seven kids,
all named George. The Swedish pop group Abba won the
Eurovision Song Contest with their song Waterloo, making the start

(02:49):
of their successful career. Have you seen the Eurovision Song
Contest recently, Richard?

Speaker 2 (02:54):
Not recently, No, but I'm aware of it. You don't
want to, okay.

Speaker 1 (02:59):
Some significant the film's release include The Godfather Part two, Chinatown,
and Blazing Saddles, all movies that probably couldn't be made
today in my opinion. I'm going to talk about the
Terracotta Warriors. The first of those were discovering Jiang, China, unbelievable,
leading to significant archaeological finds. What do you know about

(03:20):
the terracotta warriors, Richard.

Speaker 2 (03:22):
I've been there, but just you know, I know that
once they were exposed, they started to decay, like a
lot of the paint was still on them. When they
first were found, but.

Speaker 1 (03:30):
They were made to with the with the emperor, I
suppose in the grave to protect him, right.

Speaker 2 (03:37):
Yeah, that's the that's the idea. Like nobody really knows
for sure, but the Yeah, you'd have to see this thing.
It's unbelievably vast. Yeah, and the crazy part is, like
archaeology is not a huge thing in China, They've only
done a little. There's so much more to be found there.
And this is a culture that's been around literally for millennia. Yeah,
you're talking these things are what two thousand or so

(03:59):
years old, but there's so much more.

Speaker 1 (04:01):
I think the crazy part is that people actually built
thousands of terracotta warriors and stuffed them in a grave. Yeah,
that's kind of crazy.

Speaker 2 (04:07):
It speaks to the wealth of that country at that time,
right that they could afford to do such just like
building pyramids. Like, clearly you have too much money, right
if you can build these things. But that, yeah, it's
another example of that.

Speaker 1 (04:20):
So what happened in science and Tech and seventy four Richard.

Speaker 2 (04:24):
Not a big space year. Mariner ten made it to Mercury,
the first probe to ever do that, actually do two
flybys of Mercury, and that would be the only visit
to Mercury in the twentieth century. That would be the
Messenger spacecraft in the twenty first century. And that's about it.
There's a couple of soused flights to sal you there's
the plans for the Apollo Soy use missions, but they

(04:44):
won't happen until the next year, so not a big
year in space.

Speaker 1 (04:48):
And like Venus, Mercury is not a very hospitable planet.

Speaker 2 (04:51):
Well, it's awfully close to the Sun. It's very very hot,
no atmosphere at all. Yeah, but still interesting, you know,
and certainly something so we're studying further tricky to get
to takes a lot of energy to get there.

Speaker 1 (05:04):
Do we actually get data from it before it burned up?
Oh yeah, No, it's not that close. Was able to
orbit around fly bys Mercury. You know, going into orbit
around Mercury is very challenging. We won't do that for
quite some time.

Speaker 2 (05:16):
The first reporting on the destruction of the ozone layer
nineteen seventy four and will lead in a few years
to the thing called the Montreal Accords. Well, we'll band
CFC's and ultimate allow the ozone layer to repair itself
a successful story and turns out it worked. It worked.

Speaker 1 (05:32):
Yeah, I'm giving away the ending. We can do those
kinds of things, people, We can. We absolutely can. And
I use that as an example when we talk about the
current crisses. The first edition of Dungeons and Dragons is published.
Gary Gygax and Dave Arnison from TSR derived from an
earlier game played with miniatures and went.

Speaker 2 (05:53):
On to make geeks even geek here for many, many years.
Oh yeah, but and I would say this is by
far from in our part of the world. The most
important thing happened in seventy four. Right at the end
of nineteen seventy four, the mits altear eighty eight hundred.
So this was not the only personal computer at the time.
There were kits you could buy that would just be

(06:14):
give you and you'd have to put together the parts
and so forth. This was the first assembled machine based
on the Intel eighty eighty. The design would largely lead
to the S one hundred bus machine. In general, you
could only had to set at LEDs on the front end.
You would program it with switches, or you could add
a serial card to it to have a teletype interface.
Otherwise just not display no keyboard under that stuff. Yeah,

(06:34):
and it was popular electronics. Is this magazine that was
actually driving One of their rival magazines had published this
kit on their cover, and they wanted to make sure
they had something as well. So they were pushing and
my admits to get this thing done in time. And
so nominally it was on the cover of the January
nineteen seventy five edition, but it was actually put on

(06:56):
new stands in December seventy four, so I included the
seventy four And why is it important? This is Microsoft's
founding product. That cover is the thing that got Gates
to drop out of Harvard and moved to Albuquerque and
write a version of Basic would later be known as
the Altar Basic, which was Microsoft's very first.

Speaker 1 (07:16):
Product, and he wrote it with the Altair.

Speaker 2 (07:18):
He used the alt He actually wrote it on a
different machine because he didn't have one. He wrote it
on a machine at Harvard, a lot of it by
hand and by hand figured it all out because they
had to run in four K, and then brought it
down there and coated it in to test it. But
the first time he already had it written before the
first time he actually laid hand on an altar because
there wasn't any of them, and the name Altar actually

(07:39):
comes from the star Alta, although there's an apocryphal or
adjacent story of the fact that around that same time,
in the late seventies, the television show Star Trek had
an episode on a mock time where they go to
Altair six, and since it was a star of a technology,
they ought to have a cool name, not just a number,
so they called it.

Speaker 1 (07:59):
So I've seen pick of the Altear and you're right.
It has like switches where you can turns the zeros
into the ones on and then press the button to
write it into memory. It kind of reminds me of
a pressure cooker, you know, like, well, it's just a box, right,
it's a issue here is it's turning over. It's before
RAM exists. Yeah, so you had to hand key the bootloader. Wow,

(08:21):
it was like one hundred and seventeen bytes something like that.
So you literally so booed up was a process. Was
literally once you got that thing running, you didn't want
to turn it off. Power goes out.

Speaker 2 (08:31):
No, Yeah, and this is they weren't even using cassette
tapes at this point. The first versions of Basic were
on this paper tape with holes punched in it that
they that the thing could read in to load up
the version of Basic. It was a very small version
of Basic because there's only four k RAM and you
still have to run your program too, right, So.

Speaker 1 (08:50):
They really only had mainframes before that, right, main four
or less.

Speaker 2 (08:53):
And many the PDPs are around, right, the mini computers,
but they're all.

Speaker 1 (08:57):
Using cards, so it makes sense that they would be
using that technology.

Speaker 2 (09:02):
He uses paper tape. And there's no copy of the
original ALTA Basic. I think it exists anywhere. There is
a version of the advanced version of that Basic in
the Computer History Museum in the Silicon Valley, signed by Gates,
and I think it was Gates's actual copy of it.

Speaker 1 (09:20):
Wow.

Speaker 2 (09:20):
And so yeah, yeah, the original original it starts the
whole company.

Speaker 1 (09:25):
I think it's gone. That's pretty crazy. Yeah, so the
history of dot net starts in nineteen seventy four, doesn't.

Speaker 2 (09:30):
Theoretically, I mean it's beginning of Microsoft. But yeah, yeah,
all right in Albuquerque. Wow, good one.

Speaker 1 (09:38):
So let's start then next with better no framework?

Speaker 2 (09:41):
Awesome?

Speaker 1 (09:49):
Al what you got? Do you use bitwarden? So do?
I I love Bitwarden? Did you know that you can
run it on prem Yeah.

Speaker 2 (09:56):
Yeah, the open source product, right as it should be.

Speaker 1 (09:58):
Yeah, I didn't know this, and then I found the
Bitwarden server get repository and apparently you know, so, like
I said, I didn't know it. I thought it was
a product that you subscribe to, and it is. But
if you want to run it yourself in.

Speaker 2 (10:12):
There, you can use it for free or you can
pay them like.

Speaker 1 (10:15):
You and so what's really cool about this is that
you know, hey, maybe we don't even want to trust
bitwarden dot com with our passwords. Maybe we want to
you know, keep all our data behind the firewall and
wherever we want to put it, and you know, to
be more secure. I mean, whether it is or not,

(10:35):
I don't know, but at least that's an option. So
you can download the server. It has all the APIs
and everything that you need to run it. You can
even run it on a Windows machine. You can run
it in Docker containers on a Windows machine if you
want to. Of course, Linux and Mac it's all there.
It's really really cool and.

Speaker 2 (10:54):
If we have made it clear, it's a password manager. Yes,
you know, you can buy commercial products like one passes
and so forth. But This is also one I've fallen
on too. I had to leave last pass when last
pass went insane. Me too, Yeah, which, by the way,
I found migrating off the last pass less painful than
I thought. Yep, I was dreading it. But the only

(11:16):
real problem with a migrating from between password managers is
at some point you're going to generate a CSV file
of every single password in your life, and that will
be upsetting. Well, the real way to do it is
to just find the list of domains that you have
passwords for in last past and go to each one
of them and say, hey, I forgot my password, and
go through the whole process of generation. Yeah. I mean,

(11:38):
it's a terrible thing to do if you've got hundreds
of them, but you know, in the reality is you
can migrate your passwords over the other one, but you
should probably change me after that and do yourself a
favor delete that file when you're done. Absolutely, that file
shouldn't exist. That files bad. What file? Yeah, and I'm
super keen to get to pass keys. I just think
passkey's still a half baked so.

Speaker 1 (11:58):
Yeah, they're not quite there yet. I like one global
authenticator app would be really good.

Speaker 2 (12:03):
For me. Yeah, I don't know.

Speaker 1 (12:04):
Not nine authenticator apps, not ten?

Speaker 2 (12:07):
Good luck with that.

Speaker 1 (12:08):
Just one is all I want.

Speaker 2 (12:09):
I think there's a I's pretty sure there's an SKCD
cartoon about this. It's like, we need a standard for everything.
Now there's ten. Yay, Yeah, that's right. Remember scuzzy SCSI. Yeah,
wasn't one of the SA's standing for standard. Yes, that's yeah.
You know, how would that work out? No, there are
no standards. Everybody fights against them.

Speaker 1 (12:30):
All right, Well that's what I got. Richard, who's talking to.

Speaker 2 (12:32):
Us today, jumped into the wayback machine because we haven't
talked about stuff like specification driven development in a long time,
but we have talked about it. And I grabbed a
comment off a show nine to twenty one. Wow, this
is when we talked about the product Cucumber working with
behavior driven development Matt Wins show back in twenty thirteen.

(12:52):
We've got a ton of comments on that show. Not surprising. Yeah,
and gap has a Utz had this commed. He said, well,
I had no real world experience with to driven development
until now I've to use it on a new, relatively
small project and we're now working on I'm really I'm
trying to get to the place is what he's talking about,
where you have an executical specification. And I just thought

(13:15):
this comment was really relevant because we think about what
we're doing with prompting with lllms these days, as could
almost be an executable specification rather than having a separate document.
It's supposed to document what the system does, but it's
always out of sync, like what's the what do we
make rules around all of this? But it would be
cool to actually get on these large projects that we

(13:37):
have a set of documentation that in this game were
talking about. Cucumber generated the test to say does the
software does what it's supposed to do? But I, you know, now,
thinking ten years on looking at gatus as common, I'm like, wow,
look where we are right now with the next generation
of tools and another way of thinking about how we

(13:58):
write a specification about an application and then how that
specification turns into executable code.

Speaker 1 (14:04):
Cucumber was like all use cases, right, yeah, Like as
a user, I want to be able to blay blah
blah blah, and yeah, yeah, very cool.

Speaker 2 (14:14):
Yeah, it was acceptance testing, right, You're supposed to be
able to write it in English, Like I don't know.
I'll include a link to cucumber. Folks want to take
a look at it. It would take a minute for
me to go back and look at this now, just
thinking about what lms have done to this whole space. Sure, anyway, Gattis,
thank you so much for your comment, and a copy
of music code By is on its way to you.
And if you'd like a copy of music cobe I,

(14:34):
write a comment on the website at dot NetRocks dot
com or on the Facebook. We publish every show there,
and if you comment there and I read in the show,
I'll send you copy of music cobe.

Speaker 1 (14:41):
I was just using music to code By to prepare
for my talks nice next week, and it's still going strong.
Twenty two tracks, twenty five minutes long, designed for the
Pomodoro technique, but not necessarily. You don't have to have
a clock.

Speaker 2 (14:58):
You know.

Speaker 1 (14:58):
They're neither boring or too exciting. They're just in that
middle space. They're just too right. Get you into a
state of flow and keep you there, okay. Den de
la Marski is our guest. He is a principal product
engineer currently working at Microsoft, where he helps build developer
tools in AI powered experiences that make engineers more productive.

(15:22):
Den started his engineering journey all the way back in
the nineties.

Speaker 2 (15:26):
Oh god, all the way back, all the way back,
a long time ago.

Speaker 1 (15:30):
All the way back in the nineties.

Speaker 3 (15:33):
I did I tell you that I listened to your
podcast when I was in middle school?

Speaker 2 (15:38):
Oh my god.

Speaker 1 (15:39):
Yeah, excuse me, take some geritol right now, Hang on
a second.

Speaker 2 (15:45):
Yeah, yeah, all right.

Speaker 1 (15:46):
So in the nineties with the three eighty six box
that was verily enough to run Doss games. He's upgraded
since then. He has spent plenty of time writing code
and visual basics six oh, but for more than two
decades now he's writing see sharp and Python and there's more. Dan, welcome,
thank you, so happy to be here. I'm such a

(16:06):
fan of Dunn at Rocks.

Speaker 2 (16:08):
That's great. Wow.

Speaker 1 (16:09):
Well, after reading your bio, I'm a fan of yours
TB six. I mean, I was there, we were there.
I missed those days.

Speaker 3 (16:16):
I missed the days of just me like popping an ID,
using the forum, designers, dropping some progress bars on clicking
the buttons and things are.

Speaker 1 (16:24):
Just working well and not having to sign into an app.

Speaker 2 (16:28):
Nice and not having to sign in Ye, that was
a product that was made fun of. You know, it
wasn't real programming back in the day, and we were
crying all the way to the bank as we knocked
out the things that our customers actually needed in less time.

Speaker 1 (16:45):
I was there at BB one. Oh, I was there
and quick Basic com pts Before.

Speaker 2 (16:51):
That, Yeah, yeah, no, I spent I did my time
in the in MFC crashing windows repeatedly and wondering what
I was doing wrong. VB showed out like this is better.

Speaker 1 (17:03):
What you do is you find yourself an MFCC plus
plus programmer and then you just partner with them to
do the hard stuff.

Speaker 2 (17:10):
Yeah.

Speaker 3 (17:11):
I gotta say, I just don't know that there was
not that much stuff that was that hard. Like I
love VB six, but you know, the one thing that
just kind of annoyed me about it is when Windows
XP came out and it had all these super nice
styles and everything, and just VV six was constantly stuck
in the Windows ninety five and I was like, oh,
come on, man, Like, can I just get the nice
progress bar, the nice button with the shadow.

Speaker 1 (17:31):
And you could if you went third party. They all
adapted their stuff. Yeah, they had read through everything, but.

Speaker 2 (17:36):
I was broke and I had no money, I couldn't
get that party. Well, and we didn't know that Microsoft
was changing gears, right, like, yeah, that there wasn't going
to be a VB seven right, Yeah.

Speaker 1 (17:47):
Those were the days days of drama.

Speaker 2 (17:49):
Yeah. So but that's talk to me about spec driven development,
den What have you been up to here? What is
this all? Right?

Speaker 3 (17:56):
So folks might have seen that about a month ago,
we work with our friends at get hub to release
this thing called spec Kit. So if you go to
get up dot com slash, get up slash spec dash Kit.
Spec Kit is nothing super magical. It's basically a set
of templates, prompts, and scripts. And the whole purpose for

(18:16):
that is that when we talk to folks that try
to build software llms, like, there's a large swath of
folks that are just very excited about the fact that, oh,
I don't have to write boilerplate code, I don't have
to write my tests from scratch because I can just
you know, use quad code to do that stuff. So
as we talk to more of these customers, especially customers

(18:38):
and like bigger companies, customers that want to build something
that is going beyond what we refer to as vibe coding. Oh,
I'm building a podcast landing page, like, oh, yeah, you
can vibe code that in the day. If you're building
a i don't know, like a CRM or like a
plugin for a CRM, all of a sudden you're in
the world of hurt because you're getting into this state
of now I need to go and just constantly prompt

(19:02):
for changes, and then things went off the rails and
I need to go back somehow and then redo it.
But now I don't have any history. So for that
we thought, okay, we have an idea for an experiment.
What if all of this stuff can be encoded into specs?
And this is idea as old as time, as kind

(19:23):
of Richard called out like spectrum development and the idea
of writing specs as the underlying kind of the layer
of software is not exactly new like it existed for
some time. This is more of a it's a new
incarnation of it and in this context, and by the way,
a lot of this was also driven by one of
my colleagues, John Lamb, who was inspired.

Speaker 1 (19:44):
Yeah.

Speaker 3 (19:45):
John Lamb is great from the Iron Python days. For
folks that don't know, Yeah.

Speaker 1 (19:51):
From a developmentor before that, I think yes.

Speaker 2 (19:54):
Yeah.

Speaker 3 (19:55):
So John was somewhat frustrated by the fact that Claude's Sonnet,
the model from Anthropic, was very over eager.

Speaker 2 (20:04):
When you'd give it coding test.

Speaker 3 (20:05):
You'd be like, oh, can you like I'm building this
new components, like oh, let me write this component, let
me write all to the framework for this component and
everything around it.

Speaker 1 (20:13):
Yeah.

Speaker 3 (20:13):
So the idea was like, well, can we put guardrails
around it? So make sure that, like if I ask
you to do this, you do just that. So this
is what essentially bootstrapped spec Kit. This was the foundational pieces.
It's like, how do we make the model a little
bit more contained if you will to do the things
that we want. And as we started kind of exploring this.

Speaker 1 (20:34):
Couldn't you write a system prompt for Claude just to say, hey,
state your lane and don't do anything I don't tell
you to do.

Speaker 3 (20:39):
You could, but then you're just always in this like
cat and mouse game, right, and you're writing system prompts.
So and by the way, it's also like if you
switch models all of a sudden.

Speaker 2 (20:49):
Like what what then? Like do I write a system prompt? So?

Speaker 1 (20:52):
Okay?

Speaker 3 (20:53):
Yeah, And also we live in a world where developer
switch models, like every week there is I'm sure there's
a model came out. Well, we're recording this that we
did not know about this, this is somehow better or different
models for different applications exactly. So Speckett was born out
of that entire need, and as we started kind of
evolving it, things pop top from customer conversations like, oh, well,

(21:15):
if I starting coding requirements, how do I make sure
that my technical requirements are correctly updated if I decided
to change my tech stack. And that actually is one
of the core value props of Speckett and the spectrum
development here is that we're detaching the spect the requirements
from code and from the tech stack, so that if

(21:36):
you're a developer and previously you're like, Okay, I want
to build a web app for my enterprise that does
something like accounting, and I'm not entirely sure which framework
is the most performed, is it next JS? Should I
use Astro? Should I be framework less entirely? Should I
write this in just tailwin CSS for UI, shad CN.

Speaker 2 (21:56):
And like, what would you have to do?

Speaker 3 (21:59):
You'd have to go and write this in whatever stack
and then maybe do some benchmarks and figure out like
what's the performance. Now, because you have a spec you
can just go and ask it to write you like
three versions out of the same requirements. So you basically
you start speed running the iteration process and saying, oh,
create me like three options that I can choose from.

(22:22):
And that's kind of nice because now you have the requirements,
they're separate from your technical requirements, and the LM can
just build it for you and then you can assess it.

Speaker 2 (22:30):
Cool.

Speaker 1 (22:31):
So Cucumber plus AI, you could probably think.

Speaker 3 (22:34):
It's basically simplify yes, instead of as a user I
can is like as a large language model, I can.

Speaker 1 (22:44):
Wow.

Speaker 2 (22:44):
But I think you're also skipping over some hard bits here,
which is you really have to granulize the specification. Yes,
you want to take small enough bites at the LM
has a shot at building this thing.

Speaker 3 (22:53):
Success absolutely, And I think one of the biggest problems
that we see with people trying to approach spectro of
development is actually, like, first of all, the problem is
under specification because if you are not yet comfortable writing
specs or understanding.

Speaker 2 (23:07):
What you're building and why, you're.

Speaker 3 (23:09):
Gonna fall into the pit of under specification and what
happens when you underspecify things. The AI starts guessing and
it's like, oh, sure, like you did not tell me
that you do not want the header to be green.
I'll make it like whatever color I choose, right, And
I think this is like it's a it's a mental shift, right.
You previously have to get into the mode of like, oh,

(23:31):
I'll just go and start writing things and see how
it happens. Now you're like, no, no, no, I need to
spend upfront time thinking about this, outlining this. And by
the way, we also support this capability. We added this
prompt inside spec kid that is called clarified and that
is essentially you're using the language model to question your
spec right, So if you're not if you have writer's block,

(23:52):
you're you're not inspired, Uh, you can just ask the
LM to say can you can you ask me quite,
are there any blind spots that I might have? Are
there the things that I don't know?

Speaker 2 (24:02):
I don't know? Right?

Speaker 3 (24:04):
Right, just keep keep prodding at this and so and
absolutely you're right Richard, like there is a degreeable. So great, Well,
we're going to have this massive specification. It's going to
be massive, like it's going to be massive context windows
for folks that know, like any large language model has
basically a limited set of tokens again process at any

(24:24):
given time before it starts losing track of what you
asked it to do. So the context windows are relatively limited.
So if you start with a giant, massive spec you
can call it a context window. I'm going to call
it a buffer overflush. Sure that's really what's happening. Absolutely, yes,
that that's a very good way of putting it. Like
you're going to get into the world of the buffer
overflow where a lot of your instructions and guardrails are
just going to be tossed out the windows. Like it's

(24:44):
not going to be a plot.

Speaker 1 (24:45):
Yeah I heard Sun at four point five has more space.

Speaker 3 (24:48):
Yes, yes, modern models have like generally are improving in
terms of like the the buffer overflail problem.

Speaker 2 (24:55):
Yeah.

Speaker 3 (24:56):
So what we also do here is not only do
we ask you to create the specification, but we'll also
break this down into chunks. So once SUSPEC is ready,
once the technical plan is outlined, we have the capability
to basically split this into consumable tasks. And then this
is where I showed in my videos. I showed in
a blog where it's like you can use slash tasks

(25:17):
to then create a list of tasks that are based
on the spec based on the technical plan, and the
LM can do them in chunks. It's like, Okay, first
of all, you need to bootstrap this project next to yes, okay,
let me go do that. Now you need to create
the CSS, Okay, let me go do that. So you're
going through these controlled stages of creating things and get
to an end result. Now, some of these tasks also

(25:40):
going to be parallelized, and this is something that we
see with coding agents like claud code that support sub agents,
where the agent itself can spin up sub agents with
their own context windows, and you can say, oh, this
agent's going to be writing tests for this component, this
agent's going to be running tests for this other component.
And then you get into this world of kind of
things are happening in parallel but also manageable chunks.

Speaker 2 (26:01):
Yeah no, and you sort of see your role as
a shepherd there to push down the specifications in enough
detail that the tools have a chance segregating into portions
with headers that describe this is the context you're going
to be living in. But it's shorter than the overall context.
Now I need you to do this party, and I

(26:21):
everywhere I see this being this approach be successful with
unbelievable levels of testing too, right, I'm always a stun
to llms. Often when given an instruction for a set
of code to generate, get about halfway through and just
kind of stop. It's like, you can finish this. It's
your buffer overflow. I think it's a buffer overflow. And
so so that test approach where I want you to

(26:43):
generate the set of tests and then I want you
to pass all the tests before you stop. It's kind
of a way to again compress that context further right
into these test loops to try and actually get complete
sets of code from it, like the tools take the wrestling.

Speaker 1 (26:57):
When that happens. When that happens, though, it's it's really
unnerving because it's in a state where you can't really
debug it because not all the code is there that
it was, yes, you know, spitting out and you ultimately
have to roll stuff back and try again. And I
hate that about code generation in general, is that you know,

(27:18):
now this is intelligent code generation. By the way, John
Lamb's been into code generation since like ninety one.

Speaker 2 (27:23):
Before it was cool.

Speaker 1 (27:25):
Yeah, but you know the problem of course is that
if it creates something that you don't understand and can't
explain in a meeting, for example, you're you're done for
in my opinion.

Speaker 2 (27:39):
Oh absolutely absolutely.

Speaker 3 (27:41):
And I think this is an interesting tangent here to
the role of expertise, Like I've heard somebody talk about
kind of the role of the developer moving forward. If
if LM has become really good at cogeneration and producing
just just boatloads of code, what is your role as
a developer on IT team? And I think there's a

(28:03):
hypoth this is that that role is going to shift
more to almost like a Richards. You go out like
the Shepherd reviewer, right, it was like okay, but like
even for you to be able to review things, you
have to have expertise to know what you're looking at, right,
because I've seen this happen. But like Carl, you're talking
about like test or and development, Like to me, this
aspect of like, okay, well, there's a bunch of tests,

(28:25):
and I looked at some of the tests that some
of the models generate, and sometimes like for me to
test for the test to pass.

Speaker 2 (28:31):
Oh, let me make a change and just return true. Yeah, right,
and test is passing now, Hey solved it.

Speaker 1 (28:36):
Look at it.

Speaker 2 (28:36):
It's all green.

Speaker 1 (28:37):
Now.

Speaker 3 (28:37):
Nice, But you need to have the expertise to actually
look at it and be like, yeah, this ain't it?

Speaker 2 (28:43):
No, like I need like I.

Speaker 1 (28:44):
Kind of I kind of think of a roll moving
forward as sort of like a general contractor building a house. Right,
the general contractors in charge of all the subcontractors that
have their sub specialties. But any general contractor worth their
weight knows enough about those things to be able to
take over if the electrician is out sick, or to uh,
you know, remove the termite damage from these boards, if

(29:07):
the carpenter doesn't you know, isn't there that day? That
kind of stuff. So so you need to be able
to understand everything that your subs are doing and be
able to jump in when you can, but not necessarily
build the house by yourself.

Speaker 3 (29:19):
Right, right, And we see this especially critical with things
like security and awe, oh my goodness, Like I see
so much stuff that is just tossing to get our
bripo that clearly has been just like the AI told
me that this is secure, Like oh no, you're returning
to the two FA code and the adjacent response is
you're expecting it like that's that's that's not going to

(29:41):
be good. So yeah, so the these changes right, like
there's a good degree of that expertise that needs to
be embedded into the process and see that the the
what the LM is generating, whether it's tests, the components,
the structure is actually right and scope through the project
you're building them because LM's go thrills all the time,

(30:01):
right all the time, and they do wild things that
if you don't know, like on a surface, it might
seem like, oh it works. I was recently like rebuilding
some of the stuff would alsope with the help of
spec get on my blog and it's like, oh, I
couldn't understand where your CSS is. I'm just going to
recreate a bunch of CSS files and so I'd have
like a bunch of duplication all of a sudden. Yeah

(30:22):
it works though, but if I didn't know, if I'm
somebody that has to maintain, they'll be like, wait, why
just recreate these two folders for this exact same content.

Speaker 1 (30:32):
We're coming up with more and more of these examples
where having the expertise would prevent working code. That's dumb.

Speaker 2 (30:39):
Yeah, yeah, yeah. The concept is what's worked, what's actually
working right, and the software clearly has a different perception
of working than you. Yeah right, get good enough guidelines. Hey,
we got to take a break for these very important messages.

Speaker 1 (30:55):
Do you have a complex dot net monolith you'd like
to refactor to a micro services architecture? The micro Service
Extractor for dot Net tool visualizes your app and helps
progressively extract code into micro services. Learn more at aws
dot Amazon dot com, slash modernize.

Speaker 2 (31:16):
And we're back. It's don A. Rock's Emirateard Campbell. Let's
Carl Franklin. You talking to our friend dem about the
spec kit and this idea of I think essentially formalizing
some practices around harnessing LMS for co generation. You're talking
about uml right, yeah, yeah, that's what I want in

(31:36):
my life.

Speaker 1 (31:37):
You're still way back in the nineties.

Speaker 2 (31:39):
Yeah, well, you know, it's linguist. It'd be interesting to
get to a point where it was rendering, taking images
and breaking them down into specifications as well. But I
I do you know, now we've got the I'm starting
to see the term work slop, just like AI slop,
but it's hey, you you responded to a mediocre corporate

(32:03):
email with even with generated results of even more mediocre.
And the evil part about this is it takes time
to read that and realize it's garbage, and so you're
literally propagating garbage downward and making everything slower. Like there's
a real risk here about not creating quality, but there
is a way to create quality, like I have seen

(32:24):
these things be successful. So part of this is really
building you know, we talked about what's our job as
the human in the loop here, it's actually having an
eye to quality results that just setting it to true
is non as acceptable that these tests are comprehensive. You know,
you've got it, and but I also think you're going
to be used the tool a lot for this. I

(32:45):
like the idea of co generated from Claude being evaluated,
you know, by open AI and how does it what
its assessment looks like, and going passing back and forth
between them so that you do have other assessments of
everything being made.

Speaker 1 (33:01):
Have you guys ever hired somebody who talked to good
talk and then when they sent their code in you
could tell they had no idea what they were doing
and you're just frustrated. You could lose your mind like that,
Like if you're constantly retelling an LLM, No don't you
don't downgrade to dot eight? Was dot nine? Did I

(33:21):
tell you the down grade?

Speaker 3 (33:23):
No?

Speaker 1 (33:23):
I did not, So why are you doing? You could
actually lose your mind if you know, after a while,
if that's your job and no wonder people want to
write code because at least they're in control of it
and they got no one to blame but themselves.

Speaker 3 (33:37):
Yeah, so for that, by the way, like you call
out of a very important point, which is, how do
you set up these kind of requirements that the LM
does not go off the rails in terms of your
technical requirements and would spec it introduced this concept of
the constitution. It sounds very fancy or like the you know,
call it the charter if you will, or the Karda
if you're British. You know, basically outline, it's the fundamental

(34:00):
guys exactly, like the non negotiable things that will absolutely
apply to your project. And this is where we've seen
a lot of successful people coming in and saying I
will write my constitution and say you always use.

Speaker 2 (34:11):
Dot net nine.

Speaker 3 (34:13):
You never downgrade anything else like, And the Constitution is
grounded in every part of the spec process so that
when it goes through like the technical plan and task breakdown,
it always has to go and check like, what does
the Constitution say about us?

Speaker 1 (34:24):
Oh?

Speaker 3 (34:24):
Dot net nine? Oops, I put dot net eight. Okay,
can't do that. That's a constitutional violation. I'm gonna go
ahead and redo this.

Speaker 1 (34:30):
Get up copilot agent. I think the engine was or
still is in written in dot net eight, and so
it didn't have a dot net nine compiling environment, and
so it would always go downgrade and think it was
doing you a favor.

Speaker 2 (34:46):
Right, So theory that's a long training thatta has a
lot of impact on this. That's the long term support version, right,
So there's a case but still but still, but you know,
the only have this great race of context, which is
you know you're describing the Constitution is a prefix prompt, right,
It's like before you do anything, these are the rules.

(35:07):
So a bunch of our context is now consumed by
the prefix yes and no. Then we have the sort
of state of being of the project, and then the
request for what to create I'll say, we do it
a little bit even trickier than that. So the Constitution
is embedded in other prompts where you can essentially say,
if you're building a technical plan, consult the Constitution, and

(35:28):
within the technical plan, create a checklist of constitutional checks
you have to go through before you move forward.

Speaker 3 (35:35):
So instead of us embedding the Constitution, the Constitution itself
can say like, oh, okay, this has to be a Net
nine application, this has to be using this library to
connect to Microsoft SQL server. And then when you get
to the implementation stage, before you do an implementation, we
also do a checklist of essentially saying can you cross
check once more the Constitution against the plan, against the task,

(35:58):
against the spec and see if there's any violations. So
you're essentially creating almost unit tests for.

Speaker 2 (36:03):
English like that.

Speaker 3 (36:05):
That's kind of the framing seris Okay, let me, let
me go and just cross check to make sure that
nothing went off the rails, because too often people fall
into this kind of habit of oh, I've done the spec,
I've done the planet done and task great, go build
this now. But that's like hold on, you still need
to check that what it created actually matches the expectation requirements.
So there's a bit of a process, right, And I

(36:26):
think this is where I was talking about, Like this
is the mental shift that you now have to think
about those things that you did not before, because before
you would just start writing code and be like, oh sure, yeah,
let me bootstrap a dot A nine project and visual
studio at my NuGet packages and be on my merryway.
Here it's like now I need to basically like spell
it out for it. For like, you have an over
eager intern that is very very happy to help you

(36:49):
but has no knowledge of what you're trying to do
and just go like if you don't tell me, I'll
just guess. And this is where you're trying to kind
of like put guardrails and say like can I guide
you to the right thing, to the exactly what I want?

Speaker 1 (37:00):
Reminds me of Scooter from Mondays. Mister Miller, shop it
on your pencils, I'll shove you beginning. I didn't find
him with pencils up with cranons in shopping to miss
the Miller.

Speaker 2 (37:11):
It's exactly that. It's exactly that. Uh, Now, I gotta
think there's some people listening. Is this even worth it? Then?
Like all of this work to try and get this
code generation to behave properly.

Speaker 3 (37:25):
I think it depends. I think I'll say like it
depends right, like it's it's a I I. I'd like
to think the analogy like if you know, if you
have a hammer, everything looks like a nail. So it's
like I don't want to say like you have to
use this for everything, but there's certainly value in getting
from zero to a new feature, to a new component,
to a new entirely new product with this. Even if

(37:50):
you use this essentially scaffold this, just just get the
bulk of the things, get the four pieces of vail,
and then you with your expertise can come in. It's
like you want to go and rewrite this component, go
ahead and do this, you want to add like a
moon and go ahead to do this. But it saves
like a lot of this tediousness. So also a lot

(38:12):
of these models are getting better, Like Sonnet four to
five got released last week, Like GPT five got released
like it was like last month, and like GPT five
compared to GPT four was like night and day when
it comes to coding, right, which like who saw that coming?

Speaker 2 (38:28):
Right? But like they're getting better and better.

Speaker 3 (38:30):
And I totally see a world where a lot of
these guardrails were just kind of become leaner and it's
not going to be as heavyweight because.

Speaker 2 (38:38):
You're not going to bang into them as often.

Speaker 3 (38:39):
And yeah, exactly because a lot of the constraints that
we talked about like they wear. The reason they're there
is that they were inspired by Sonnet four being over eager.

Speaker 2 (38:48):
Right, Yeah, I know every law is written in blood, right,
like right right, what we're seeing.

Speaker 3 (38:53):
But as they become better, a lot of these constraints
will become more lax because like you can just tell
it like it's always dotten at nine is like cool,
I remember it's always done at nine, I'm never touching
anything else.

Speaker 2 (39:03):
Yeah, it's just you don't bang against it as much. Yeah,
I just think there's an interesting balancing act for you know,
what really you're saying them is there's a case for waiting.

Speaker 3 (39:12):
There's a case for waiting and experimenting.

Speaker 2 (39:14):
If I wait, the tools will get better.

Speaker 3 (39:16):
And experimenting and seeing what works in a wood doesn't
then because Spekeett itself is an experiment, right, like we are.
We got into the world with the intent of can
you help us verify where this works and where it doesn't. Yeah,
and like we're not making any statements of saying like, oh, yeah,
this is going to absolutely destroy the software engineering career
track is the yeah I got to write the code
for It's like, no, like I want you to test

(39:38):
this and tell me. Yeah, I've tried adding a feature
to my enterprise app and it just completely went off
the rails and it was completely useless.

Speaker 2 (39:46):
Like that to me is like super helpful. You back,
I'd love to know that. No. And I saw this
in the early GitHub copilot days where people's behavior changed
in the repository, where there was far more reverts yep. Yeah,
like they would use the tool to generate some core,
they push it up, it would blow up, they would revert. Yeah,
And so you know, we I could definitely tell you

(40:07):
were using the tool because you were using the toolimate
the code and then you weren't really editing much, or
you weren't doing a whole lot. You were just trying.
You were stuffing into the test suite. See what happened,
and what happened was bl you know what, I'm going.

Speaker 1 (40:18):
To close this issue and open it again.

Speaker 2 (40:22):
Yeah, you know we we can just make that branch
go away again.

Speaker 1 (40:27):
And everybody else how we screwed this up.

Speaker 3 (40:30):
This was a forcing function for me to actually get
much closer to get interesting, because once I started going
through like the spectrum processes like okay, let me go
create the spec, let me create the plan, let me
break it down and task. Okay, this looks reasonable. Let
me check the SIN. Okay, so now I have the check.
Now let's start generating the code. It's like, okay, I
don't even push. The code I just generated was like, okay,

(40:51):
this component, Okay, now it looks right. Checkpoint check the SIN.
Now continue generating. Oh now it's screwed up. Just blow
it all away. Let's do it again.

Speaker 2 (41:00):
The code that worked in the previous build does work
this next build. Yeah, exactly.

Speaker 3 (41:06):
But the geit process is super helpful in this context
because it allows me to checkpoint things, and it's like,
all of a sudden, get went from this like, oh yeah,
I just use this toss things into gethub into an
actual like oh, I can checkpoint things, and I can
create multiple branches and multiple implementations across different branches and
then test them separately. Oh I'm curious, what if this

(41:28):
button would instead be this kind of component. Let me
create a new branch and do this and test this right, So,
like you, you go through the process of using get
as an.

Speaker 2 (41:37):
Actual helper here, Yeah, no kidding, surprised. Can I go
to the checkpoint code and point the LM at dot
and said, this code works? Why is this changed? Or
you're going to make changes? Make sure it passes the
same test, so you can use your validation skills against
the various pieces of code with these checkpoints and force

(41:58):
the tool down about right, let.

Speaker 1 (42:00):
Me change GET testing tests?

Speaker 2 (42:03):
Do I just change the tests?

Speaker 3 (42:06):
This is like see the lesson for future software engineers
listening to this learn GET.

Speaker 2 (42:10):
Yeah, that's like that's gonna be your your your savior.

Speaker 1 (42:13):
GET.

Speaker 2 (42:14):
This is a tool that will help you. Yeah, but
you know and now you're you're helping me shape my
mind on the how different our work looks like and
the tools that are actually going to help us to
get there. And again with always designed to am I
better off? Is as better results? I've definitely talked to
folks here, So I'm being successful as a tool, but
I'm not having as much fun, right, Yeah, yeah, I can.

Speaker 3 (42:36):
I can see this and I think there's a something
to be said about the fact that coding and self
engineering in and of itself is a creative process, right,
Like you're you're sitting down, like, let me think through
this problem, let me figure out how to this. But
also there's no denying that there's like, at least for me,
I know for a fact, that there's a lot of
tedious tasks like oh my goodness, like I just do

(42:56):
not want to deal with this, like writing us like okay,
you need to write one hundred unit test for all
these components, Like oh, like, can I can just somebody
write these for me so I can actually focus on
the components.

Speaker 2 (43:07):
One hundred percent coverage is so much more approachable when
you got a tool generating the test solely. Man, Yeah,
what'd you call me? So?

Speaker 1 (43:17):
Yeah, it's almost like you know these people who get
their dopamine rush from just a clean compile, right, you know,
you know, I just want to write this feature compiles.
All right, it's good, but I get my dopamine rush
on those weekly meetings where I show the boss what
I've done and it works and everything's good. Yes, that's
I'm holding off for the for my endorphins. To kick

(43:39):
in at that moment. It's delayed gratification.

Speaker 2 (43:41):
I think that's that's there.

Speaker 3 (43:42):
There's value in that in the sense that we often
can forget what this is in service of, right, Like
you're like looking at me, like the stuff that the
code that I write is in service I.

Speaker 2 (43:55):
Want to solve a specific customer problem. Yeah, this is
our whole VB conversation at the of it. Yeah, no,
it wasn't as cool as C plus plus, but boy
is sure did bring solutions to customers faster. Exactly if
this can bring solutions to customers faster, Okay, and how
many of your customers did care that, Like, oh you
didn't write this in C plus plus ooh, well, as

(44:16):
long as it doesn't look like AI slop, right, this
is come down to it. If it's triggering people the
way that AI slop triggers people, then you're not succeeding.
Just say no to AI slope.

Speaker 1 (44:27):
There you go.

Speaker 2 (44:28):
Yeah, demand quality from these tools. But I'm also you
said this a while ago, and I should have grabbed
onto this earlier down if this thing gets me eighty
percent of the way by dealing with all the scot
work so that I now have to just work on
the twenty, which is arguably the coolest code and the
most important in the project. You did me a favor,
There's no two ways about that. Yeah. You know, if
we could just Prato's law this and get that eighty

(44:50):
out so that we can work on the hard stuff,
we've definitely saved time provided value.

Speaker 3 (44:55):
I think that's the goal. Hard stuff is more fun anyway, Yeah,
that's the more fun.

Speaker 2 (44:58):
Yeah. Yeah, well it's plus you're going to fight with
the spec and the tokens to get those last few
points so hard it's probably not worth it. Like there's
going to be a point of diminishing returns on this, yeah,
at some point. And I'll say, like, it's also the
mental shift of you thinking ahead of these problems and
having an actual artifact and being more crisp with your

(45:22):
thoughts and more crisp with how the rubber dying effect
here is huge.

Speaker 3 (45:25):
Yes, exactly is going to help you down the line
to just create better stuffware, just but just by proxy
of you being able to think about it more and
understand how does this actually work?

Speaker 2 (45:37):
Why is it the way that it is?

Speaker 3 (45:39):
What are the things that I should be thinking of
that I'm not thinking about You know, we saw this
happen in offshoring when when it got easier to offshore
to low cost markets and you didn't get good results,
and you learned to write better and better specifications and
better and better project plans so that you were successful
with the offshore developers.

Speaker 2 (45:57):
It turned out if you took those things on shore,
you got more success to Yeah, shocker, right, Like.

Speaker 3 (46:05):
If you're better and specifying your requirements, you get better outputs.

Speaker 2 (46:08):
Wow, there is a reality check to this whole thing.
But I really appreciate the idea of don't try. You're
getting one hundred pcent is probably not worth it that
getting And once you get into the eighties, you've had
a win and now you can go do the rest.
Here's the other thing, you know, because my instinct, I'm
going back to code generation responses in the language you've used. Now,
I feel like I'm dealing with the code generator, which

(46:30):
clearly I am. As soon as I touch that code,
I can't touch the code generator again. But that's not
true than l L. No, You're you're collaborating again. You
have an intern in your hands. M hmm. You're working
together to write the code. Yeah, I'm you're going to
resist the anthropomorphic terms there dem like I have a
piece of software that's fair, that's fair.

Speaker 1 (46:50):
Yeah, but it sooner or later you're going to have
to agree that the world wants to anthropomorphize these things.

Speaker 2 (46:56):
And I know the world wants it. It's just wrong.
It's absolutely fair.

Speaker 3 (47:02):
Actually, Like I I can get totally behind that idea,
like we should not entermorphize the AI because like, it's
just an algorithm, it's a computer background.

Speaker 1 (47:09):
But it makes it easier for us to understand it
when we put it in human terms, right if when
we I.

Speaker 2 (47:16):
Think it makes it easier if a story execuse it inappropriately.

Speaker 3 (47:19):
Yes, yes, that I think it's like it gives it
gives us the way of saying like, oh, it's like
it's just it's just like a humans, Like, no, it's
not like a human.

Speaker 2 (47:26):
It's not at all like a human.

Speaker 1 (47:27):
Right, But if I was going to say what was
it thinking rather than well, what algorithmic sequences brought.

Speaker 2 (47:34):
It to the result that it did? Now true?

Speaker 1 (47:37):
Which would you rather let me ask you? Right, No, it's.

Speaker 2 (47:40):
True, it's true.

Speaker 3 (47:41):
I think there's there's a kind of shades of grade
of this it's like where it lands. But I generally
feel like, yeah, may're they're they're not humans.

Speaker 1 (47:50):
So they're not they're not. We do have to remember that,
and so they don't have responsibility. They don't you.

Speaker 3 (47:57):
Know, they're not accountable right right, or like if it
produces your like right, like we as software engineers, we
all know things like the the RAQ twenty five, right,
like the incident about the X ray machine. Remind us,
So there's an incident about with an X ray machine
where because of a software bug and no hardware interlock,
it would actually, uh for cancer patients would have basically

(48:22):
like deadly doses of radiation in under very certain conditions.
I think this happened like in the eighties. It's litical.
I think it's like the RAC twenty five. And so
the idea there is like, right, like, if if you
have an LM right software that does something dangerous like this,
who's accountable for that?

Speaker 2 (48:40):
Yeah? Right right? It's not the AI, like you can't
it isn't. It isn't the equipment yea, yeah.

Speaker 3 (48:45):
It's like, it's not the equipment, it's whoever pushed that software.
It's the software developer, just like with any other scenario right,
So to me, it's like, I think it's AI. Is
that it's it's INTELLISENTI and steroids. Was that Is that
a good frame to put it here? Like, yeah, that's
that's good.

Speaker 1 (49:01):
I mean, because we don't answer promorphize and tell a sense,
do we?

Speaker 2 (49:04):
You try not to? Well, well, you could, says I
mean the number of times I'm cursing a visual studio.
What is it thinking?

Speaker 1 (49:14):
What were you thinking? Right?

Speaker 3 (49:18):
But yeah, so I think, like, long story longer,
to what we talked about, that's definitely
a mental shift. It's definitely one of those things where
you have to treat it essentially as an assistant
program that allows you to write code faster, but it's
still up to you to make sure that you

(49:39):
review this and you have input, and you can look
and say, this is garbage, that's not what I asked,
or this test just returns true.

Speaker 2 (49:48):
That's not how you pass a.

Speaker 3 (49:49):
test. That's not how tests are supposed to work, right? So
your expertise is still valuable. And I think that's
what also gets lost in the conversation, because folks get
so excited.

Speaker 2 (49:57):
I'm just, I'm trying to distill this down to this
idea that I have a re-entrant code generator. Yeah, that makes me very happy,
because that's something we haven't been able to do before. It
used to be, when you used code generators, once you
touched that code, you never ran the generator again. Yeah. Now,
the idea that I could point this code generator back
at the code that it contributed to
and I modified, and do another iteration, that's a big deal.

(50:19):
Like this is it's pretty cool.

Speaker 3 (50:21):
Yeah, yeah, exactly, because I can come in for a
specific component and say, okay, this chunk here doesn't quite
look right. And what I've seen people do, and
I think this is a super valuable thing, is once
you start going through the motions of, okay, I saw
where it failed, I saw that it did not create
this component the way that I wanted, based on the
conversation that you had, encode that back into the

(50:44):
spec, right, and say, okay, now that we know this,
can you put that back in the spec and the technical plan?
So that way, again, if I come in down the
line and say, hey, we're going to take on a
rewrite of the system, can I recreate the behavior that
I had based on the learnings that I had and
the iteration that I had in this new format? And
this is where the spec comes in, because then you
can say, oh, yeah, totally, I've encoded

Speaker 2 (51:05):
All the learnings. This is also things that we hated
doing as developers, right? It's like, yo, the spec is wrong,
but nobody's actually gonna go fix the spec.

Speaker 3 (51:12):
But now... oh no, zero people. Now you have a
tool that you can demand fix the spec. Yeah, exactly.
Just do yourself a favor and make sure you check
it in, because that will
save you when it mangles your spec.
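
A minimal sketch of what feeding learnings back into the spec can look like in practice; the file path and the "Learnings" section name are assumptions for illustration, not part of Spec Kit itself.

```python
# Hypothetical sketch: append a reviewed finding to the spec so the next
# generation pass sees it. The path and section name are assumptions.
from datetime import date
from pathlib import Path

SPEC = Path("specs/001-checkout/spec.md")  # assumed location of your spec

def record_learning(note: str) -> None:
    """Append a dated bullet at the end of the spec, adding a 'Learnings'
    heading there first if the spec doesn't have one yet."""
    text = SPEC.read_text(encoding="utf-8") if SPEC.exists() else "# Spec\n"
    if "## Learnings" not in text:
        text += "\n## Learnings\n"
    text += f"- {date.today().isoformat()}: {note}\n"
    SPEC.write_text(text, encoding="utf-8")

record_learning("Cart totals must be recomputed server-side; the generated "
                "client-side total failed review.")
```

Then commit the updated spec before the next iteration, so a mangled rewrite is just a diff away from being reverted.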

Speaker 2 (51:24):
Mm hmm, mm hmm. Yeah.

Speaker 3 (51:26):
I mean, and you're absolutely right, in the sense, like,
when I started my career as
a product manager at Microsoft, you'd write a spec, you'd
review it in a bunch of meetings, and then once
a dev started, the moment the coding began...

Speaker 2 (51:40):
Nobody ever touches that doc again.

Speaker 3 (51:42):
It's like, and then even if you realize, like, oh yeah,
we forgot about mentioning this. Hey dev, can you
go fix this thing? Because we had this idea.
And they're like, oh sure, yeah, let's go change it. But
nobody goes back to the doc, and then six months later,
when you look at the doc, you're like, the product
doesn't work this way.

Speaker 2 (51:57):
Well, this is how, you know, new people come into the
project, as they start with the document.

Speaker 3 (52:01):
It's wrong, exactly, and all those documents are
wrong and they're all out of date.

Speaker 2 (52:06):
We're trying to avoid that. All right, let's get weirder.
Let's say, after we finish making the code, then
you demand another tool generate a specification from the code,
so you can see and compare against it.

Speaker 3 (52:16):
That's actually a thing, you know. And you know
where this is super helpful? What we refer to as
brownfield projects, because where a lot of the spec-driven
development stuff kind of falls over in a lot of scenarios
is that all of this is optimized for net new. Like, I'm
starting from zero to one, I have nothing, go get
me something. But what if I already have a code

(52:36):
base, which, let's face it, mostly we do,
tens of thousands of... Yeah. It's like, can you do this?
And this is where that reverse process is super helpful.
Like, I want you to write me a spec based
on how my authorization library is working, and it's like, okay,
let me go and look at the library and create this.
And then it's like, okay, now that we have this context,
can we now add a feature that changes how my

(52:57):
auth flow works. Context, it's all about context.
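
A rough sketch of that reverse direction: gather the existing library's source and ask the agent to draft a spec from it. The folder name, file glob, and prompt wording here are assumptions for illustration, not a prescribed Spec Kit flow.

```python
# Hypothetical sketch: build a "write me a spec from this code" prompt for an
# existing (brownfield) library. Paths, glob, and wording are assumptions.
from pathlib import Path

SRC = Path("src/MyApp.Authorization")        # assumed existing auth library
OUT = Path("brownfield-auth-prompt.txt")

parts = [
    "Draft a specification for this authorization library: its observable "
    "behaviors, token lifetimes, error cases, and extension points. "
    "Cite the file each behavior comes from.\n"
]
for source_file in sorted(SRC.rglob("*.cs")):
    parts.append(f"\n--- {source_file.as_posix()} ---\n")
    parts.append(source_file.read_text(encoding="utf-8"))

OUT.write_text("".join(parts), encoding="utf-8")
print(f"Wrote {OUT}; hand it to the agent of your choice.")
```

Whatever the agent drafts still needs a human pass before you build a plan on top of it, which is exactly the validation point raised next.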

Speaker 2 (53:00):
Sure. No, there's a whole other
show here about brownfield with LLMs. Mm hmm. Just, you
know, because it's a great problem space, right?
Let's say we have a ton of applications inside
of organizations where the original dev team's long gone. Oddly enough,
the spec is incorrect, but we need new features. Yeah,
and you have a new team that's trying to take

(53:21):
that code on. So the idea that you could point
an LLM at it first to generate a correct specification,
in theory... like, you need to validate that. So I'd
probably want to iterate on that a few times and
then start to build a plan around how you would
make the change you want.

Speaker 1 (53:35):
Well, you could certainly build real documentation of your product
by just siccing Claude, Copilot agent, or whatever on it,
and it'll do that. I've done it with Jeff Fritz
on Coded with AI episode one, and it created a pretty
good user manual. So I know that's not a spec,
that's not an internal spec, but it certainly is of

(53:58):
that same idea, that you take something that already
exists and then generate some documentation around it.

Speaker 3 (54:04):
Have you guys heard of this project called DeepWiki?
DeepWiki, DeepWiki... DeepWiki, that sounds familiar to me.

Speaker 1 (54:11):
So has it been around a long time?

Speaker 3 (54:13):
No, it hasn't, but it's from, I believe,
some of the same people that wrote Devin, the
coding agent. So DeepWiki is this place where you
can actually come in. It's a documentation generator. Basically, it's
an AI-based documentation generator. You can come in and
you can look at different repositories that have already been documented,
or maybe even add your own, but you can just

(54:34):
plug in the URL and then it's going to generate
the docs. And to me, the fascinating part about this
is just how freaking good it is at looking at
the code base. Like, oh, VS Code? Sure, let
me look at the architecture, and here's the renderer process,
here's the main process, here's the extension host process, and how
they interact with each other. Like, none of this stuff
is actually encoded in some VS Code

(54:58):
architecture doc, but it infers it from the code,
it points you to the lines of code, it generates
Mermaid charts. It's a fascinating project. It's at deepwiki
dot com.

Speaker 1 (55:06):
Does it use Playwright to get screen captures of the
application running?

Speaker 3 (55:12):
You can plug it in with any MCP servers, including Playwright.

Speaker 1 (55:17):
So that's what we did with Visual Studio Code
and the agent. We told it to use the Playwright
MCP to run the app and actually take screenshots and
put them in the documentation, which is very, very helpful.
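
If you want that screenshot step outside the agent loop, the same idea is a few lines with Playwright's Python API; the base URL and the route list below are assumptions about the app being documented.

```python
# Minimal sketch: capture full-page screenshots of a locally running app so
# they can be dropped into generated docs. URL and routes are assumptions.
from pathlib import Path
from playwright.sync_api import sync_playwright

BASE_URL = "http://localhost:5000"          # assumed local dev address
ROUTES = ["/", "/orders", "/settings"]      # assumed pages worth documenting

Path("docs/images").mkdir(parents=True, exist_ok=True)

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page(viewport={"width": 1280, "height": 800})
    for route in ROUTES:
        page.goto(f"{BASE_URL}{route}")
        name = route.strip("/") or "home"
        page.screenshot(path=f"docs/images/{name}.png", full_page=True)
    browser.close()
```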

Speaker 2 (55:29):
Yeah.

Speaker 3 (55:30):
Yeah, Playwright MCP is one of my favorite tools hands down.

Speaker 2 (55:33):
Yeah.

Speaker 3 (55:33):
What's unbelievable is just how good it is at both testing and
providing screenshots. And actually, one of the validation steps that
I've experimented with for spec-driven, by the way, is
when it creates the UI, it uses Playwright MCP to
go and say, are there any components that overlap with
each other, or is the UI funky? And then, if it's not right,
go and fix it.
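
That overlap check can also be scripted directly; here is a crude sketch using Playwright bounding boxes, where the selector list and URL are assumptions about what counts as a component in your markup.

```python
# Sketch: flag elements whose bounding boxes overlap, as a rough layout check.
# The selector list and URL are assumptions about the app under test.
from itertools import combinations
from playwright.sync_api import sync_playwright

def intersects(a, b):
    """True if two bounding boxes (dicts with x, y, width, height) overlap."""
    return not (a["x"] + a["width"] <= b["x"] or b["x"] + b["width"] <= a["x"] or
                a["y"] + a["height"] <= b["y"] or b["y"] + b["height"] <= a["y"])

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto("http://localhost:5000")
    boxes = [box for el in page.query_selector_all("button, input, nav, .card")
             if (box := el.bounding_box())]
    browser.close()

for a, b in combinations(boxes, 2):
    if intersects(a, b):
        print("Possible overlap:", a, b)
```

In the agent flow, the same result comes back as tool output the model can act on; here it's just printed for a human.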

Speaker 1 (55:54):
That's so cool.

Speaker 2 (55:55):
Wow.

Speaker 3 (55:55):
Right. So you're wiring these things together, and we're back
to the whole idea of, like, what's the role of the
developer in this? And this is, like, you are now
the orchestrator of this fleet of software agents that do
things for you. How do you make sure that app...

Speaker 2 (56:10):
Yeah. And you see, like, I think about the large
scale problems inside of big organizations right now. It's like
they've got a ton of ASP dot net Web Forms apps
out there. Nobody wants to work on them anymore. Yeah,
of course. Everybody wants a mobile client for a
bunch of them, which is almost impossible, and rewriting them
takes time and money. Yeah. And so the idea that
I could fire up these LLMs, use stuff like Playwright

(56:33):
to screenshot the whole app, and then fire it through
a generator to make another version of the app. Now me,
my twisted mind, what's my automatic reaction? It's like,
let's make photocopies of photocopies. So I'm going to take
this app, I'm going to generate a spec. Then I'm
going to generate an app from the spec. Then I'm
going to generate a spec from that version of the app.
I want to see where we get to, because I bet
you're going to hit a floor, yeah, of functionality at

(56:54):
some point, right? Because the compiler always has a say,
because the app ultimately has a set of core functionality.
Like, I wonder if you'd end up with the most
distilled version of an app if you did a bunch
of iterations like that. Probably, probably, right?

Speaker 3 (57:08):
But also, like, we know that there's more to
building software than just a carbon copy of things. Yes, right.
And this is where, like, I see people posting on
social media, like, oh, with the help of an LLM,
I recreated DocuSign in three days. It's like, well, sure,
pretty sure you didn't. Like, yeah, but you've
got to realize that there's more to DocuSign than just

(57:29):
the UI of DocuSign, right? And that's part of it,
is, like, we're really starting to distill what
are the important bits of software and what's the thing
that should have our primary attention.

Speaker 1 (57:39):
Didn't some guy just recently tell, uh, ChatGPT
or whatever to rebuild and recreate Kubernetes? Anthropic. It was Anthropic.
He said, rebuild me Kubernetes in Rust or something like that.
And it spent so many tokens that it cost Anthropic

(58:02):
four hundred thousand dollars, and he only paid his two
hundred dollars for the month.

Speaker 3 (58:07):
Yeah, but the question is, did it actually recreate Kubernetes?

Speaker 1 (58:12):
Well, it spent four hundred thousand dollars worth of tokens.

Speaker 2 (58:14):
So it did something, maybe nothing. In the end,
what did that four hundred thousand dollars do but heat
up a bunch of rocks? Yeah, right. Yeah, the sand
is working hard. Yes, made the rocks warm today.

Speaker 3 (58:31):
But I was genuinely curious, like, Carl, you were so
excited about it, and I was like, but did it
actually work? It's like that meme about, like,
I rewrote my app in one day, this massive
enterprise-scale thing. Like, none of it worked, but it
sure did look pretty.

Speaker 2 (58:45):
Yeah, it was just another version of AI slop. But
you know, now you're pressing on the next issue here,
which is, is this cost effective in the end? Like,
when we eventually have to pay the real costs, because
we know we're not, so far. I mean, we actually
have to pay the real cost for the compute, for
the tokens and so forth. Like, now we're gonna have
another layer on running this tool where it's going to

(59:06):
now assess the tokens: let's do an estimate before we
run this. You know, no different than what I do when
I start provisioning apps into Azure. We understand how much
infrastructure we need to run, and I can throw it at
the CFO: it's going to be this much a month. Yeah,
you know, to operate this. Those are the
trade-offs we're going to make, for sure, and it
will get more efficient, but to a degree these are

(59:27):
going to be the trades, yeah, for sure.
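
That "estimate before we run this" step can be approximated offline; here is a sketch using tiktoken for counting, where the prices, the expected output size, and the spec path are placeholder assumptions rather than real rates.

```python
# Sketch: rough cost estimate for a prompt before sending it to a model.
# The per-1K-token prices and output guess below are made-up placeholders.
import tiktoken

PRICE_PER_1K_INPUT = 0.003      # assumed; check your provider's price sheet
PRICE_PER_1K_OUTPUT = 0.015     # assumed
EXPECTED_OUTPUT_TOKENS = 2_000  # assumed ceiling for the response

def estimate_cost(prompt: str) -> float:
    enc = tiktoken.get_encoding("cl100k_base")   # approximation of the tokenizer
    input_tokens = len(enc.encode(prompt))
    return (input_tokens / 1_000) * PRICE_PER_1K_INPUT + \
           (EXPECTED_OUTPUT_TOKENS / 1_000) * PRICE_PER_1K_OUTPUT

spec_text = open("specs/001-checkout/spec.md", encoding="utf-8").read()  # assumed path
print(f"Estimated cost for one pass: ~${estimate_cost(spec_text):.2f}")
```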

Speaker 3 (59:29):
And I see a world, too, where we can use them,
maybe even a combination of things: what is actually
net new stuff that needs to be delegated to a
powerful generalized model versus something that can run locally on
my machine? Right? It's like, oh, this is a
model that's trained specifically on ASP dot net core. It
is excellent for ASP dot net core and nothing else.

(59:51):
Can I potentially use that to go and write me
the ASP dot net core code?

Speaker 2 (59:54):
Sure, right. I also wonder about training a model specific to
an application. Like, you've got a mature piece of software,
that's an awful lot of knowledge about what the software does,
and so if I can have that run in a
local model and just, you know, let's not go out
of scope here, I just want to add these features
to it. So, yeah, the idea that these monster
models are the only way to do things is simply not true, right? Right.

Speaker 3 (01:00:16):
And John and I were actually talking, John Lamb again,
our good friend, about this concept of a
prompt compiler, and the idea here is that with a
lot of the context that we're passing to models, there's
this context overload. There's just too much. Like, Richard used
the concept of, like, the buffer overflow; that's what
we're trying to prevent. And it's like, if I'm

(01:00:36):
building an ASP dot net application with a specific,
like, API structure, how do I encode just the relevant
context from the spec to it? So I don't just give
it, like, oh, here's all the front end pieces and
everything else that you need to consider, but, by the way,
just build the API. Like, right, can I just select
the pieces that it needs for my task?
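
A toy version of that prompt-compiler idea: split the spec into sections and keep only the ones that look relevant to the task, within a budget. Keyword-overlap scoring and the "## " heading convention are deliberate simplifications, not how an actual prompt compiler would have to work.

```python
# Sketch: select only the spec sections relevant to a task, within a budget.
# Splitting on "## " headings and keyword scoring are simplifying assumptions.
import re

def compile_context(spec_md: str, task: str, budget_chars: int = 8_000) -> str:
    sections = re.split(r"(?m)^## ", spec_md)[1:]          # drop the preamble
    task_words = set(re.findall(r"\w+", task.lower()))

    def score(section: str) -> int:
        return len(set(re.findall(r"\w+", section.lower())) & task_words)

    picked, used = [], 0
    for section in sorted(sections, key=score, reverse=True):
        if score(section) == 0 or used + len(section) > budget_chars:
            continue
        picked.append("## " + section)
        used += len(section)
    return "\n".join(picked)

# Usage: send compile_context(spec, "add pagination to the orders API")
# to the model instead of the whole spec plus every front-end section.
```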

Speaker 2 (01:00:56):
Yeah. And maybe there is, again, either a
local model or something else that preemptively can select
these things and say, okay, you only need this to
execute this task and nothing else, and that drives some
of the efficiency. So I think definitely we're right
now in the space where all of this feels like
free money, right? Oh, four hundred thousand
dollars of Kubernetes tokens for two hundred dollars a month? Like, yeah, sure,

(01:01:20):
we're in the bubble of, you know, the first wave
of this stuff, right, right. This bubble will end, but,
you know, no different than the dot com boom
and the end of that, you still have an Internet afterwards.
Like, these tools will still exist, but we'll
start to rationalize it. I'll be real happy to be talking
about the post-bubble rationalizations.

Speaker 1 (01:01:41):
Yeah.

Speaker 3 (01:01:41):
Absolutely, yeah. I think this is where the whole conversation
about the bubble misses the point: the utility of
this is undeniable. Yes. Like, I'm looking at the models
that we have today, like from Anthropic, from OpenAI,
and think about it, like, if the
evolution of those models stops today, like if after we
record this show nothing else ever gets released,

(01:02:03):
there is still a ton of value behind them in
building software and being able to, like, reduce the toil. Right?
Nothing changes, it does not improve at all, and there's
still a lot of value in this.

Speaker 2 (01:02:13):
Now.

Speaker 3 (01:02:14):
Is there a lot of hype for things that are,
like, ChatGPT wrappers out in the wild, like all
these random startups? Absolutely, yeah. But the utility
of the core underlying models and the capabilities that they
offer for software development is, again, to me, undeniable.

Speaker 2 (01:02:28):
Yeah, I'm with you, Den. And I'm just, you know,
we can still get better, but yeah, it's certainly worthwhile,
and I appreciate the things you're building, these kinds of tools
to help us rein in the sort of randomness and
start to think more coherently about what modern dev practices
with these code generators are going to look like going forward.

Speaker 1 (01:02:47):
Absolutely. Well, Den, thank you very much. It's been great
talking to you. Wow, blowing our minds over here.

Speaker 2 (01:02:53):
Thank you for having me. Yeah, my dream came true.
I was in dot net rocks.

Speaker 1 (01:02:56):
That's great.

Speaker 2 (01:02:57):
There you go. And we went for a ride too, friend.
Going to call that meaty goodness.

Speaker 1 (01:03:01):
Meaty goodness, absolutely. All right, thanks again to Den Delimarsky,
and we'll talk to you next time, dear listener, on
dot net rocks. Dot net Rocks is brought to you

(01:03:32):
by Franklins dot Net and produced by PWOP Studios, a full
service audio, video and post production facility located physically in
New London, Connecticut, and of course in the cloud online
at pwop dot com.

Speaker 2 (01:03:47):
Visit our website at D

Speaker 1 (01:03:48):
O T N E T R O C K S
dot com for RSS feeds, downloads, mobile apps, comments, and
access to the full archives going back to show number one,
recorded September two thousand and two. And make sure you
check out our sponsors. They keep us in business. Now,
go write some code. See you next time. You got

(01:04:09):
your middle vans. This is home, then my Texes