Episode Transcript
Available transcripts are automatically generated. Complete accuracy is not guaranteed.
(00:05):
You are listening to the BreaktimeTech Talks podcast, a bite-sized tech
podcast for busy developers where we'll briefly cover technical topics, news
snippets, and more in short time blocks.
I'm your host, Jennifer Reif, an avid developer and problem solver
with special interest in data, learning, and all things technology.
This week unexpectedly brought me into MCP territory.
(00:28):
I'd planned on investigating this path at some point, but I hadn't
planned on doing it just yet.
However, preparation for some upcoming events and other things going on ended
up putting me on that path this week.
I feel like I'm just scratching the surface, but it's a start.
I also have a couple other tidbits from this week as well, so let
me share what I've learned.
To start off, a non-MCP-related tidbit is that I found the holy
(00:53):
grail of large language model uses, and that's flat file cleanup.
So I ended up playing around with Claude a bit this week.
I was hacking at a dataset.
I'd pulled down a JSON file, had been able to convert it over to a CSV
file, and was trying to clean up some of the text properties where there
(01:17):
was a backslash inside the text.
And it's notorious that you use something like a regex, so something
like sed or awk or whatever, to do this.
And I've played around with some of these tools before, and I always just
feel like I am throwing things at the wall and trying to see what works.
And I end up either having to go back to a colleague or just hash it out for hours
(01:40):
until I finally figure the solution out.
But this round, I thought, you know what?
Why can't a large language model attempt this?
And let's just see how well it does for something like this.
So I showed Claude the file.
I pulled up Claude Code inside my folder with the flat file in it
and said, Hey, look at this file.
(02:01):
See if you can escape the backslash characters that are inside this
file to make sure that it will read properly. And it worked brilliantly.
It escaped the backslash characters within the text field, and the file
loaded into Neo4j on the first attempt after being corrected by Claude.
This was so much easier than searching around on the internet for the
(02:24):
proper regex, sed versus awk versus something else, and testing on small
subsets and slowly hashing through individual lines of a huge CSV file.
In the past, that has often taken me hours of going through it, and pinging
and bugging colleagues about it.
I was able to do this pretty quickly and easily using Claude, as an example.
But you could probably use lots of other large language models as
(02:47):
well, and it was so much easier.
So I do recommend it.
Regex, which is a very consistent, rule-based type of pattern matching, worked
really well with flat files for that.
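For reference, the fix the model produced was the sort of thing you'd otherwise reach for sed to do. Here's a minimal sketch of that classic approach; the file name and sample data are made up for illustration, and the real regex in my case was more involved:

```shell
# Create a tiny sample file with an unescaped backslash in a text field
# (file name and contents are illustrative, not my actual dataset).
printf '%s\n' 'id,text' '1,C:\temp\notes' > data.csv

# Double every backslash so a strict importer reads the field correctly.
sed 's/\\/\\\\/g' data.csv > data_escaped.csv

# data_escaped.csv now contains the line: 1,C:\\temp\\notes
cat data_escaped.csv
```

Getting that quoting right by hand is exactly the throw-things-at-the-wall part an LLM short-circuits.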
Now, jumping into the MCP exploration this week.
I was catching up on some YouTube videos that I hadn't seen, trying
(03:08):
to dig into MCP, looking at demos for upcoming events and so on, and
trying to figure out what I should do and what I should look into next.
And I came across Jason Koo's YouTube video showing Claude Desktop and
the Neo4j Cypher MCP integration.
And that was a really good introductory video to get me started on this path.
(03:29):
And then I thought, you know what?
Why can't I do that as part of a demo for working with Neo4j and Cypher,
and showcasing graphs a little bit?
I started playing around with this and was able to get it spun up reasonably quickly.
There were a couple of things that tripped me up.
The first thing was that, just looking at the GitHub documentation and some
(03:50):
of the blog posts and videos out there, it wasn't super clear that I
first needed to install uv or something similar, so I pulled that down.
Once I pulled that in, then all of a sudden the configuration
worked and I was able to connect from Claude just fine.
Now, I will also say that, depending on the chat model, the chat
(04:12):
interface, or the application that you're using, the configuration might
need to be slightly different.
It was pretty easy for me because I was using Claude and the example I
was looking at was also using Claude, so that was pretty straightforward.
But you might need to alter your configuration, because different large
language models and chat interfaces will take slightly different formats
for the config JSON.
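For illustration, the Claude Desktop config I ended up with looked roughly like the snippet below. Treat this as a sketch: the server key, package name, and credential field names here are assumptions from memory, so check the Neo4j MCP repository's README for the current format.

```json
{
  "mcpServers": {
    "neo4j-cypher": {
      "command": "uvx",
      "args": ["mcp-neo4j-cypher"],
      "env": {
        "NEO4J_URI": "neo4j://localhost:7687",
        "NEO4J_USERNAME": "neo4j",
        "NEO4J_PASSWORD": "<your-password>"
      }
    }
  }
}
```

Note that the `uvx` command in a file like this quietly fails if uv isn't installed, which is what tripped me up.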
(04:34):
Now that I have this introductory, basic configuration
between Claude Desktop and the Neo4j Cypher MCP server, my next step would
be to integrate the MCP using Spring AI.
I'm starting down that road a little bit just now, but I haven't gotten very
far, so I'll keep you updated there.
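For anyone else starting down that road, my rough understanding so far is that it's a Boot starter plus a pointer at an MCP server config. The artifact and property names below are assumptions to verify against the Spring AI reference docs, not something I've gotten working yet:

```properties
# Hypothetical sketch only — verify the starter artifact and property names
# against the current Spring AI documentation.
# Build dependency (Maven coordinates, illustrative):
#   org.springframework.ai:spring-ai-starter-mcp-client
spring.ai.mcp.client.stdio.servers-configuration=classpath:mcp-servers.json
```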
Now, in this exploration with MCP and playing with Claude
(04:58):
for flat file updating and so on, I also came across one instance where
the large language model was wrong, and it dug in its heels and was
somewhat passive-aggressive about being wrong and about admitting that I was right.
Here's the story.
So I asked Claude to pull the most recent three reviews for some businesses,
(05:21):
and it wrote Cypher syntax that used a list range of zero to two, which to
most programmers' eyes looks right.
Starting at an index of zero, ending on an index of two, that's three indexes, right?
Zero, one, and two.
However, Cypher is actually exclusive of the end index, meaning that
(05:42):
it pulls up to that index but not including it, so it would
pull zero and one and not two.
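To make that concrete, here's a sketch of the query shape in Cypher. The labels, relationship type, and date property are illustrative assumptions, not my actual dataset's schema:

```cypher
// Illustrative schema: (:Business)-[:HAS_REVIEW]->(:Review {date}).
MATCH (b:Business)-[:HAS_REVIEW]->(r:Review)
WITH b, r ORDER BY r.date DESC
WITH b, collect(r) AS reviews
// List slices are end-exclusive in Cypher:
//   reviews[0..2] returns indices 0 and 1 — only two reviews.
//   reviews[0..3] returns indices 0, 1, and 2 — the three most recent.
RETURN b.name AS business, reviews[0..3] AS latestThree
```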
Claude justified the response by saying that some businesses had fewer than
three reviews, and that was why we were only seeing two reviews in
the output results from that query.
I thought, okay, no big deal.
That looks right, but every business was pulling two, and I did a quick count on the
(06:06):
reviews per business and saw that there were tons more reviews than just two.
I went out to the Cypher documentation and found that, oh, the list
ranges actually are exclusive.
And so I mentioned that to Claude.
And Claude fixed the syntax, and the text response that it added to that
syntax started by acknowledging I was correct and that it had corrected the Cypher to match
(06:28):
my expectations and what would actually pull the results I was looking for.
But then the following paragraph claimed that in Cypher the zero-to-two range
returns indices 0, 1, 2, three elements,
and zero-to-three returns indices 0, 1, 2, 3, four elements.
(06:49):
Which is not correct, because, remember, it's exclusive, so it
won't pull that last element.
It also claimed that the previous query had been working correctly and that some
businesses simply had fewer than three reviews.
So it dug in its heels.
It was being rather passive-aggressive.
The nerve, right, of a large language model.
This is a good reminder that these AI systems are incredibly powerful.
(07:10):
As I talked about with editing the flat file and going through and escaping
those characters, and then doing some of these Cypher queries by just prompting
it with natural language, and then it uses the MCP server, creates these Cypher
queries, and runs them against the database.
That's really awesome, but it's also incredibly persuasive when it's wrong.
If I hadn't known my Cypher and cross-checked the documentation to back
(07:34):
it up, I would've assumed that Claude was right and I was in the wrong.
When people do these things, there's usually an emotional
consequence that follows.
When you say something wrong, and then somebody
executes and does something based on that information, there's usually
some emotional consequence that goes along with being wrong
and having to tell somebody else that.
(07:55):
There's some element of shame or guilt or admitting fault or legal
consequences or something else.
But the AI doesn't have this component.
That's just the human element of when something goes wrong
or when something goes bad.
There's usually a responsibility or accountability piece.
But how do you make a large language model, a system, accountable
(08:16):
for these types of things?
So this is an interesting conundrum that I think will continue to develop over time.
And this was just the first instance that I have had of kind of this
blatant dichotomy of: you're right, but you're also wrong, and I'm right,
and I'm gonna dig in my heels on this.
Now, the other thing this week that I have kind of worked through is that my colleague and
(08:37):
I talked about the difficulty of finding and cleaning data sets to use for demos or
applications or sample graphs or whatever.
You end up with data in a variety of different formats,
cleanliness levels, architectures, model structures, and licensing states.
So sometimes you'll end up with a data set that covers a lot of ground and
(08:58):
does pretty well, but there's, like, an array or something in one of the
fields and you need to kind of section that out, or the model that the
files are in doesn't quite align to the data model that you're looking for.
So you have to adjust that and refactor it some, and so on.
Now, my colleague shared that a large language model can do small
data sets, basically creating dummy data sets from the ground up, with a
(09:23):
lot of step-by-step instruction and building up of the related components.
It would be nice to have much more capability in this area, to have a large
language model generate dummy data at scale, but we're just not quite there yet,
unless somebody knows of a tool that we haven't heard of yet or that may be coming.
But as a developer advocate and as someone who
(09:45):
works on databases and backend systems, this is something that is super
vital to be able to showcase the technology, and yet it's probably
the longest pole in the tent when we're pulling together applications
or sample integrations and so on.
It is just finding a good, clean data set that works well.
The content piece that I want to highlight this week is about the
(10:08):
Neo4j data modeling MCP server.
So yes, yet again, a little bit more MCP.
I had come across this article actually several weeks ago, and
finally, when I'm in the MCP world this week, took a few minutes of
time and sat down and got through it.
Alex Gilmore, who's from Neo4j, wrote this article as a way
to outline kind of what the
(10:29):
relatively new data modeling MCP server does and how it works.
It's a way to interactively define a data model using a chat large language
model interface, in natural language.
So say, Hey, I have this data set.
I wanna construct something along these lines for an e-commerce data model or a
(10:50):
cybersecurity data model, and the large language model can work with the MCP
server and interactively put together something that is reasonably sensible.
Now, of course, you can make some adjustments, and you can also export
that model to a JSON file from the LLM interface and load that into a tool like
(11:12):
Neo4j Arrows, for instance, where you can load in JSON and it loads the
model into a visual format, and then you can tweak that model and send it back.
You can also then take the model that the Neo4j data modeling MCP server generates
and use it in conjunction
with the Neo4j Cypher MCP server to
(11:32):
actually load the data into the database.
So the data modeling MCP server doesn't actually write to a database, but
you can use it with one of our other MCP servers in order to do that step.
The end of the article links to another one of Jason Koo's videos
that interactively walks through these steps and shows you how it works.
It's pretty cool, just the interaction between the chat
(11:52):
interface and the data modeling side.
It'll generate a visualization, and you can work with it back and
forth to edit and tweak things.
I know there is still some caution around MCP and some of the flaws that it
brings to the table, but it is cool to see what the technology makes possible
and the things that we can do with it.
After starting the week with a few small tasks, several paths converged
(12:15):
and put me into learning mode with MCP, or Model Context Protocol.
I explored integrating some MCP servers with Claude for asking questions
about data, and also caught up on an article about the recent release of
the Neo4j data modeling MCP server.
I'll keep you posted on where the journey takes me next.
Thanks for listening and happy coding.