Episode Transcript
Available transcripts are automatically generated. Complete accuracy is not guaranteed.
(00:05):
You are listening to the Breaktime Tech Talks podcast, a bite-sized tech
podcast for busy developers, where we'll briefly cover technical topics, news
snippets, and more in short time blocks.
I'm your host, Jennifer Reif, an avid developer and problem solver
with a special interest in data, learning, and all things technology.
The last several weeks were filled with travel to tech events and giving
(00:28):
several presentations and workshops.
It was amazing to connect with so many interesting technical audiences and chat
about solving problems with technology.
But I am also happy to plant my feet at home for a few weeks, dig in
on some new things, and produce some other types of technical content again.
The first thing that I've come across recently is someone asked me how Neo4j
(00:49):
handles sparse vectors. I thought I knew what dense versus sparse
vectors were, but after some research and digging, it turned out to be
a little different from what I was initially thinking.
First thing: what are sparse versus dense vectors?
They're different data representations.
(01:10):
Dense vectors are probably what most of us are familiar with from generative AI.
They're long arrays of floating-point numbers representing the semantic
meaning of the data being encoded.
Sparse vectors, on the other hand, contain numbers that represent
whether or not a word exists in the data, and then fill in the remaining
(01:32):
numbers of the array with zeros.
The arrays themselves are sparse, then, rather than the actual data being sparse.
Sparse vectors are a bit more storage efficient.
They can be used for some keyword-matching scenarios, and they represent
the syntax of the data rather than its semantic meaning.
(01:52):
You might imagine use cases where you want to understand how language is
structured: what words might come next, or how sentences, paragraphs,
or even documents are constructed.
That means looking at the very technical syntax elements of that construction
rather than at how we comprehend or understand the words that are
(02:13):
being written for the documents, paragraphs, sentences, what have you.
So: a slightly different use case, and a slightly different way of encoding data.
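A tiny sketch can make the distinction concrete. Everything here (the vocabulary, the numbers, the sizes) is invented for illustration; real dense embeddings come from a model and usually have hundreds or thousands of dimensions:

```python
# Toy illustration of dense vs. sparse vectors. All values here are
# made up for the example, not produced by any real embedding model.

# A dense vector: every position holds a meaningful floating-point
# value, together encoding the semantic meaning of the text.
dense = [0.12, -0.48, 0.93, 0.05, -0.77, 0.31]

# A sparse vector over a small vocabulary: each position answers
# "does this word appear in the text?", so most entries are zero.
vocabulary = ["graph", "vector", "database", "cypher", "spring", "java"]
text = "a vector database"

sparse = [1 if word in text.split() else 0 for word in vocabulary]
print(sparse)  # -> [0, 1, 1, 0, 0, 0]: the *array* is sparse

# Because most entries are zero, a sparse vector can be stored more
# compactly as just its non-zero positions.
compact = {i: v for i, v in enumerate(sparse) if v != 0}
print(compact)  # -> {1: 1, 2: 1}
```

The compact form is where the storage efficiency comes from: only the non-zero positions need to be kept.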
The next thing I have played around with recently is integrating Spring AI
MCP with one of Neo4j's MCP servers.
I tried this for the very first time, and I did get a sample working.
(02:36):
I hope to put together a blog post that will detail more about this:
what I did, what went wrong, and the things that I struggled with.
But the hardest part was looking at the Neo4j MCP Cypher
readme file in the GitHub repository, which serves as the documentation
for the library right now, and then looking at the Spring AI MCP client
(02:58):
documentation page and trying to figure out how to assemble those two things
and fit the puzzle pieces together in order to get a working solution.
There were a few little things I learned along the way, and it was fun
exploring and playing around with this.
Not only was I able to get the integration working, but I could also
work with the text2cypher translation in the Neo4j MCP Cypher server.
(03:22):
That was the one I picked to connect to, and then I was able to run some
text2cypher queries and play around with that a bit.
It was a lot of fun, and I learned a lot in the process.
Again, I hope to share more with you in a blog post.
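To give a feel for what text2cypher means conceptually, here is a deliberately tiny, template-based toy: turning a natural-language question into a Cypher query. The regex, the graph schema, and the question shape are all invented for this sketch; the actual Neo4j MCP Cypher server uses a language model for the translation, not patterns like this:

```python
import re

def text_to_cypher(question: str) -> tuple[str, dict]:
    """Toy translation of a natural-language question into Cypher."""
    # Hypothetical question shape: "which books did <author> write"
    match = re.search(r"which books did (.+?) write", question, re.IGNORECASE)
    if match:
        author = match.group(1).strip()
        # Parameterized Cypher against an invented Author/Book schema.
        query = (
            "MATCH (a:Author {name: $name})-[:WROTE]->(b:Book) "
            "RETURN b.title"
        )
        return query, {"name": author}
    raise ValueError("question shape not recognized by this toy example")

query, params = text_to_cypher("Which books did Jane Austen write?")
print(query)
print(params)  # -> {'name': 'Jane Austen'}
```

The real value of an LLM-backed translator is handling arbitrary phrasings, which is exactly what a pattern-based approach like this cannot do.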
Another thing that I dealt with over the last several weeks: a few weeks
ago, I was playing around with Pinecone, the vector database, and the Neo4j graph
(03:46):
database, trying to do vector RAG and graph RAG and looking at how
the two were similar and different.
I wanted to do vector RAG with Pinecone and graph RAG with
Neo4j, and I was looking for the map-to-record method in the Spring AI
vector store, but I couldn't find it.
They have a map-to-record and then a map-to-document method, as well, to map entities
(04:07):
that come back from the vector similarity search in the database
to your Java application entities.
This was really nice.
So I thought they had just removed it in one of their most recent releases.
But it turns out, when I came back several days or weeks later and dug
around, I found those two methods in the Neo4j vector store.
(04:30):
So this is only implemented in the Neo4j vector store, and not in several other
vector stores (or at least not in Pinecone) or in the general vector store class.
Now, I'm not sure why the other vector stores don't implement a way to map
back to your Java objects. Maybe they do and Pinecone doesn't, or I
just haven't found the way to do it.
But I found this rather interesting.
(04:52):
It is a bit unique to Neo4j in that we return nodes and
relationships from the database, rather than traditional records or
documents or whatever the entity is.
A lot of times we will add additional mapping layers to our integrations,
the libraries, and anything we might work with, just to make using
(05:14):
the relationships, especially, and mapping them back, a little bit easier.
However, I did find it unusual that a lot of vector stores
maybe don't provide a way to do that, because then you're left doing
it custom and doing it manually.
Again, maybe I've missed something, but I couldn't find it available in
Pinecone, which meant I would have to map it manually. That seemed like a
(05:37):
lot of effort for something that maybe could be included in just the plain
vanilla vector store class in Spring AI.
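To show what that manual mapping looks like, here is a small sketch, in Python rather than Java, with an invented Book entity and an invented raw-result shape. This is not Pinecone's or Spring AI's actual API, just the general pattern of mapping generic search results back to typed application objects:

```python
from dataclasses import dataclass

# A hypothetical application entity.
@dataclass
class Book:
    book_id: str
    title: str
    description: str

def to_books(search_results: list[dict]) -> list[Book]:
    """Map generic similarity-search results back to typed entities.

    Each raw result is assumed to look like:
    {"id": ..., "metadata": {"title": ..., "text": ...}}
    """
    return [
        Book(
            book_id=r["id"],
            title=r["metadata"]["title"],
            description=r["metadata"]["text"],
        )
        for r in search_results
    ]

raw = [{"id": "b1", "metadata": {"title": "Dune", "text": "Desert planet epic"}}]
print(to_books(raw))
```

Writing this by hand once is easy; writing and maintaining it for every entity type is the repetitive effort that a built-in map-to-record method saves you.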
Also, over the last couple of weeks, I presented a workshop that was
put together by some colleagues of mine. It focuses on the different types of
RAG retrievers and how you should design the way you're pulling data from your
(06:02):
external data source for the differenttypes of questions you might be asking.
For instance, a vector search will pull semantic similarity
or general knowledge: what is semantically or contextually similar.
And then you can extend those results using additional retrieval
queries on the end of those:
pulling additional metadata or context from whatever data sources
(06:24):
you're looking at, and then adding that to the vector similarity search result.
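That pattern can be sketched roughly like this, with stubbed-out search and lookup functions and invented data; it is not any particular library's API, just the shape of "similarity search first, then enrich each hit":

```python
def vector_search(question: str) -> list[dict]:
    # Stand-in for a real vector similarity search; data is invented.
    return [{"id": "doc1", "text": "Intro to graph databases", "score": 0.91}]

def fetch_related_context(doc_id: str) -> dict:
    # Stand-in for a follow-up retrieval query (e.g. a graph traversal)
    # that gathers metadata connected to the matched document.
    related = {"doc1": {"author": "J. Reif", "topics": ["graphs", "RAG"]}}
    return related.get(doc_id, {})

def retrieve(question: str) -> list[dict]:
    hits = vector_search(question)
    for hit in hits:
        # Extend each similarity hit with additional context.
        hit["context"] = fetch_related_context(hit["id"])
    return hits

print(retrieve("what is a graph database?"))
```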
But your questions then need to look for broader or more
general information, right?
Things that lend themselves well to semantic similarity searches.
If your question is about a specific entity, or you're trying to do some sort of
keyword-type search, at least initially, then that's not where vectors shine.
(06:48):
For instance, looking at a particular entity and traversing out from there.
We actually talk about this in the content piece I'm going to
highlight here in just a second.
So bear with me for a minute if you're still not sure where this is going.
One more thing I explored and kind of found out over time (and it seems a
little bit obvious now that I'm looking back on it): I was playing around with
my Goodreads demo application, the Spring AI Goodreads repository on my GitHub.
(07:13):
It's a book recommendation system that I integrate with generative AI.
I tend to search using a keyword search approach, like I would
in a library or topic search.
When I was a kid growing up, I would walk into the library, go over to
the computer, and type in some sort of keyword, topic, or genre I was
interested in, and it would retrieve the results: anything that might be tagged
(07:36):
or included in the book description with that search phrase or search word.
But I have to remind myself that that is a very keyword-search,
deterministic type of lookup.
Now we're dealing with something that's much more natural-language,
more non-deterministic, more conversational, and those keyword search
(08:00):
phrases maybe won't be as powerful.
Doing a keyword search for something like my book recommendation system is basically
doing a fancy synonym finder, right?
That's not really the power of large language models.
So instead of searching "magical" or "dragons" as my search phrase when looking
for books, I started using prompts like "I want to read something to ease
(08:25):
my stress and relax during a mountain vacation" or "I want to challenge my
mind and read something about technology".
I found that a more conversational, contextual, complicated set of questions
produced better, more interesting results.
Pulling all these different terms together, based on my mood, the topic I
(08:45):
might be interested in, and my current interests at that specific time, can produce a
more complicated picture: an AI can combine all these different search
vectors and find the most likely combination that includes all those things,
versus being given one keyword search.
A single keyword is really kind of vague and not enough information for it to pull
something that's really highly relevant.
(09:07):
Adding a little bit more context actually gives it more data to go on and could
help produce much better results.
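Here's a toy example of why the richer query helps, using plain word overlap as a crude stand-in for what embedding similarity does with far more nuance. The books, descriptions, and scoring are all invented for illustration:

```python
# Invented catalog of books with short descriptions.
BOOKS = {
    "Mountain Calm": "a relaxing memoir about slow hikes and easing stress",
    "Dragon Keep": "a fantasy adventure with dragons and magic",
    "Code & Circuits": "an accessible tour of modern technology ideas",
}

def score(query: str, description: str) -> int:
    # Crude relevance: count the words the query and description share.
    return len(set(query.lower().split()) & set(description.lower().split()))

def best_match(query: str) -> str:
    return max(BOOKS, key=lambda title: score(query, BOOKS[title]))

# One keyword gives very little signal to work with...
print(best_match("dragons"))  # -> "Dragon Keep"

# ...while a contextual prompt combines several signals at once:
# mood ("relaxing", "ease my stress") plus topic ("hikes").
print(best_match("something relaxing about hikes to ease my stress"))
# -> "Mountain Calm"
```

A real embedding search matches on meaning rather than literal words, but the principle is the same: more context in the query means more signals to combine.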
The content piece I wanna look at today is "Your RAG Pipeline is Lying with
Confidence - Here's How I Gave It a Brain with Neo4j", and it ties back
to some of the concepts we've already explored earlier in this podcast.
I really thought that the author did a great job walking through
(09:30):
this: structuring it, working through it methodically, and
explaining things along the way.
The big term that the author used was that traditional RAG systems have
"semantic drift", where each step in the RAG pipeline dilutes the meaning
or loses some context along the way.
And the author links this to a game of telephone that you might play when you're
(09:53):
kids, where one person whispers a term, a sentence, or some phrase
into the next person's ear (or a pipe, or whatever you're using as a transmitter
device), and then that person hears it and passes the message on down.
By the time it gets to the end, whatever the starting message was
is usually totally different and way off base.
(10:17):
The author then takes this example and says: your RAG
pipeline is doing the same thing.
There's some semantic drift that occurs over time, based on what
the RAG pipeline drops or misses.
There were a couple of things that the author did to try to combat this.
The first was going from basic to smart chunking.
The next was dealing with semantically blind retrieval.
(10:40):
As we talked about earlier on the podcast, sometimes high similarity
doesn't mean high relevance.
As an example, the author was dealing with different types of employee leave
policies. If someone searched for something on parental leave, vector
search would pull anything related to parental leave, and it was
actually pulling some termination leave policies in with that because they were
(11:03):
semantically similar in the chunk.
However, you don't want to return termination leave policies to
someone looking for parental leave policies, right?
So the question became: okay, how do I fix this and improve this?
The author dealt with that semantically blind retrieval and
went about solving it.
Then the last thing that the author talks about is that the system
(11:23):
prompts can't fight bad retrieval.
Taking whatever was retrieved and then telling the
LLM in the system prompt to do this, not do that, avoid this,
structure it this way, and clean this up
was still turning out bad results.
It's kind of like handing somebody a basket of junk and saying, "make gold out of this."
The large language model is in the same boat, basically.
(11:45):
You're trying to tell it: hey, here are some really crummy results;
try to make something good out of them, tweak this, and clean this up.
The LLM just isn't very good at that.
So the first way the author combated this was to do smarter chunking
with text overlap; that was going from basic to smart chunking.
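Chunking with overlap is easy to sketch. The character-based splitting below is a simplification, since real pipelines usually chunk by tokens or sentences, but it shows the key idea: consecutive chunks share some text, so meaning isn't cut off at a boundary:

```python
def chunk_with_overlap(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Split text into fixed-size chunks where each chunk repeats the
    last `overlap` characters of the previous one."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

text = "abcdefghij"
print(chunk_with_overlap(text, chunk_size=4, overlap=2))
# -> ['abcd', 'cdef', 'efgh', 'ghij', 'ij']
```

Notice how "cd", "ef", and "gh" each appear in two chunks: that shared context is what keeps a sentence (or idea) from being split cleanly in half and lost.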
Then came retrieval re-ranking: traversing the policy type in the
(12:06):
graph to narrow down to that keyword topic first, rather than the
semantically related topic first.
For instance, that leave policy problem that I mentioned just a minute ago.
Narrowing down and saying, okay, I want to focus on parental leave policies,
then traversing out from there to pull the related context based on that, and not
incorporating semantically similar things that might include termination leave.
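That narrow-first, rank-second idea can be sketched like this, with invented policy data, precomputed similarity scores standing in for a real vector search, and a simple list filter standing in for the graph traversal:

```python
# Invented policy chunks with made-up similarity scores for the
# question "how much parental leave do I get?".
POLICIES = [
    {"type": "parental", "text": "parental leave: 12 weeks paid", "sim": 0.88},
    {"type": "termination", "text": "termination leave procedures", "sim": 0.84},
    {"type": "parental", "text": "extending parental leave unpaid", "sim": 0.79},
]

def retrieve(policy_type: str) -> list[dict]:
    # 1) Narrow to the known policy type first (in a graph, this could
    #    be a traversal from a policy-type node to its chunks).
    candidates = [p for p in POLICIES if p["type"] == policy_type]
    # 2) Only then rank by semantic similarity.
    return sorted(candidates, key=lambda p: p["sim"], reverse=True)

for p in retrieve("parental"):
    print(p["text"])  # termination leave never appears
```

Without the narrowing step, the termination chunk's 0.84 similarity would have put it ahead of a legitimately relevant parental chunk; filtering first means it never competes at all.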
(12:29):
Then the third thing is to use retrieval that adds to the prompt, and
not try to use the prompt to negate or bribe the context into a good answer.
This week, from upgrading my skills on Spring AI to reassessing and updating
my general knowledge, I felt like there was a lot of food for thought.
We talked about how sparse versus dense vectors are different, integrating
(12:50):
Spring AI with MCP, and then discussed an article on how being thoughtful about
solution design produces better results.
As always, thanks for listening, and happy coding.