Episode Transcript
Available transcripts are automatically generated. Complete accuracy is not guaranteed.
(00:05):
You are listening to the Breaktime Tech Talks podcast, a bite-sized tech
podcast for busy developers, where we'll briefly cover technical topics, new
snippets, and more in short time blocks.
I'm your host, Jennifer Reif, an avid developer and problem solver
with special interest in data, learning, and all things technology.
I'm continuing my journey with vector databases from last week.
(00:28):
I did make some more progress, but still not quite where I
want things to be just yet.
I also got an application with Spring AI and both Pinecone and Neo4j databases
spun up and in the works, as well.
Still making progress there as well, but it's a work in progress.
Then I found an article on contributing to open source projects that
(00:48):
discusses the benefits and provides some ideas if you're curious.
So I did make some progress on the vector database project that I was playing
around with and talked about last week.
I had some wins that pushed me ahead and on to the next things, but then I
faced some further hurdles, of course.
Vector store architecture, I found out, is a little bit different from what I'm used
(01:11):
to with relational or graph databases.
For instance, graph databases start with data, and then the
index will sit on top of that data.
It's used as the starting point of the query, or, for relational databases, it's
used to do a lookup for the tables and such, the rows that you're looking for.
(01:31):
So you can either load data or create the indexes first in a graph database.
It doesn't really matter as long as you don't have conflicts between
the index and the data that you're trying to add that index to.
Whereas the vector store has an index that actually encases the data, which
(01:52):
allows you to partition the database
by different indexes.
So you could have multiple indexes in the same vector database and store
different vectors, different documents, different whatever you're putting in
there, and separate those out by index.
So you're going to load data into an index, and then removing
that index also removes the data.
(02:14):
This was an architecture thing that I just don't think really sunk in
until I saw some of the architecture diagrams and started playing with
importing and exporting data in and out of the vector database.
And then I realized, oh, this is different from what I was expecting.
I need that index in order to put data into it.
And in order for me to query the data, I have to reference and
(02:37):
use that index in order to find the data that's connected to it.
If you're playing with vector databases or you're new to that, that is going to be a
shift from maybe other types of databases.
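To make that index-first idea concrete, here is a minimal toy sketch in plain Java. It is not Pinecone's or Spring AI's API; the class ToyVectorStore and its methods are hypothetical names used only to illustrate that data lives inside an index, can't be written without one, and disappears when the index is removed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy in-memory model (hypothetical, not a real client): each index owns its vectors. */
final class ToyVectorStore {
    // index name -> vectors stored inside that index
    private final Map<String, List<float[]>> indexes = new HashMap<>();

    void createIndex(String name) {
        indexes.putIfAbsent(name, new ArrayList<>());
    }

    // Data can only be written into an index that already exists.
    void upsert(String indexName, float[] vector) {
        requireIndex(indexName).add(vector.clone());
    }

    // Reads also go through the index; here we only report how many vectors it holds.
    int count(String indexName) {
        return requireIndex(indexName).size();
    }

    // Removing the index removes everything stored in it.
    void deleteIndex(String name) {
        indexes.remove(name);
    }

    boolean hasIndex(String name) {
        return indexes.containsKey(name);
    }

    private List<float[]> requireIndex(String indexName) {
        List<float[]> vectors = indexes.get(indexName);
        if (vectors == null) {
            throw new IllegalStateException("No such index: " + indexName);
        }
        return vectors;
    }
}
```

In this sketch, calling upsert before createIndex throws, and deleteIndex takes the stored vectors with it, which mirrors the behavior described above.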
And then I also ran up against loading the data into the database.
I did some further work there, and there were some things that were out of sync
that made it a little bit more of a trial and error process than I would have liked.
(02:59):
First off is that Pinecone has changed some of their API stuff,
and their documentation doesn't always reflect the latest thing.
It's not fully up to date.
Some pieces are, and that works fine.
But then other things seem to be out of sync or don't quite explain fully
what the new steps and process are.
Then also in Spring AI, some of the documentation there is a little bit
(03:21):
out of sync because they recently upgraded to a new milestone release.
Some things have changed there.
Most of the documentation aligns and matches that M6
release, but not all of it completely aligns, especially when you
look at different vector stores.
That was the thing that I noticed.
I was having trouble with one vector store, so I just dropped that
dependency and pulled in another vector store and hit similar issues there.
(03:44):
Even though the API for the Spring AI VectorStore looks the same,
underneath, if the database has changed, the integrations don't always sync up with the
latest version of the database.
So hopefully there will be some changes there, and they'll update some
of the libraries and such that are working with the backend databases
to reflect the latest API changes from those vector stores, but not
(04:06):
everything is in sync just yet.
And again, there are lots of changes going on, with things changing at different
times, and that's hard to keep up with.
But I went back to Spring AI to read and parse the JSON that I was talking about
last week, and it was a little bit tricky because I realized, and it fully sunk in,
that I'm dealing with JSON Lines format, which is different than a lot of, I guess,
(04:29):
standard or typical JSON formatting.
It's just wired a little bit differently.
Instead of throwing multiple objects into an array separated by commas, it just
lists one object per line.
So they're separated by lines, basically.
This was a little bit tricky when I was trying to read and parse.
A lot of the Java libraries work with the regular JSON array formats, but
(04:54):
don't necessarily work with JSON Lines, so I had to do some kind of reworking
or some additional research in order to figure out how to read JSON Lines
format and parse that and operate on it.
But once I figured it out, I think now I have a decent starting place
where maybe, hopefully, I won't have those same problems going forward.
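As a rough illustration of the difference, here is a minimal sketch using only the Java standard library. It isn't the approach from the episode (the transcript doesn't say which one was used), and the class name JsonLines is hypothetical. It treats every non-blank line as its own JSON document, and it can wrap those lines into a regular JSON array so that array-oriented parsers can read the result.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper: splits JSON Lines text into per-line JSON documents. */
final class JsonLines {
    // One JSON document per non-blank line; each string can be handed to any JSON parser.
    static List<String> records(String jsonl) {
        return jsonl.lines()
                .map(String::strip)
                .filter(line -> !line.isEmpty())
                .collect(Collectors.toList());
    }

    // Join the records with commas and wrap them in brackets to get a regular JSON array.
    static String toJsonArray(String jsonl) {
        return records(jsonl).stream().collect(Collectors.joining(",", "[", "]"));
    }
}
```

For very large files, you would probably stream the lines one at a time rather than build one big array string, but the per-line idea stays the same.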
I do plan to put together a project and probably some content, blog
(05:17):
posts and so on, detailing
the differences and the things that I worked with and things I ran up
against, but that's not out yet, and hopefully I'll get to it sometime soon.
The next thing was that the required Pinecone configuration
has changed in Spring AI and the UI has changed in Pinecone itself.
So I spun up a free tier instance on Pinecone on their cloud.
(05:41):
First of all, their free tier doesn't use the namespace qualifier, which is
fine, it's not required, but it is a little bit confusing sometimes looking
at their documentation. It throws that namespace thing in there all the time,
and sometimes it's hard to know, do I need to have that, or do I not?
Then the environment format has changed.
It used to be something like cloud provider plus the tier that you
(06:02):
were on as the environment, but now it's actually the cloud region
where that database is hosted.
Again, it's not super clear in the documentation how this looks.
Also, project ID is crazy hard to find, but I finally figured it out
that it's inside the host name itself.
So when you look at your free tier instance of Pinecone, it shows
(06:22):
the full host name, the full URI,
and the project ID is actually within that value, but you do
have to know where to look for it.
So it took a little bit of trial and error, but I figured out that the
host name combines index name dash project ID dot svc (for service) dot
environment dot pinecone dot io.
(06:44):
So in case you're wondering, it's index name, dash, project ID,
dot svc, dot environment, dot pinecone dot io.
Super confusing to find out, but once I know the format that they're looking
for, it's actually much, much easier, and I was able to get Spring AI to connect
with no problem once I figured out all the pieces and assembled them correctly.
That was really tricky. Took me way longer than what I wanted it to, but
(07:08):
again, some things were out of sync between Pinecone and Spring AI.
The next thing is I spun up a Spring AI application using Pinecone and Neo4j.
The tricky part I found here is that I needed to have multiple vector databases
configured in the same application,
which meant that if I tried to just let Spring do its auto configuration,
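The host name layout described here can be sketched in a few lines of plain Java. The class and method names (PineconeHost, build, projectIdOf) are hypothetical, and the sample values in the note below are made up. Pulling the project ID back out assumes the ID itself contains no dashes, which the episode doesn't confirm, so treat that part as a guess.

```java
/** Hypothetical helper for the host name layout: index-projectId.svc.environment.pinecone.io */
final class PineconeHost {
    // Assemble the host name from its three variable parts.
    static String build(String indexName, String projectId, String environment) {
        return indexName + "-" + projectId + ".svc." + environment + ".pinecone.io";
    }

    // Assumption: the project ID has no dashes, so it is whatever follows the
    // last dash before ".svc." (index names themselves may contain dashes).
    static String projectIdOf(String host) {
        int svc = host.indexOf(".svc.");
        if (svc < 0) {
            throw new IllegalArgumentException("Missing .svc. segment: " + host);
        }
        String firstLabel = host.substring(0, svc);
        int dash = firstLabel.lastIndexOf('-');
        if (dash < 0) {
            throw new IllegalArgumentException("Missing index-projectId segment: " + host);
        }
        return firstLabel.substring(dash + 1);
    }
}
```

With made-up values, build("my-index", "abc123", "example-region") gives my-index-abc123.svc.example-region.pinecone.io, and projectIdOf on that string gives abc123.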
(07:31):
the vector store beans would conflict because a generic vector store bean would
try to pick up Pinecone and it would try to pick up Neo4j and it'd be like,
which one do you want me to pick up?
I don't know.
I actually needed to create two separate beans and set those values separately.
But then, I was setting the properties for the index name and label and project
(07:51):
ID and so on inside the application.properties
file, and not in my environment itself, so I needed to wire in the
environment inside my application to set those values from properties files.
I'll probably try to rework this and clean this up a little
bit, but it is working for now.
I do have the beans working, and I'm able to connect to
either database as I so choose.
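Here is a framework-free sketch of the "two separate sets of values from one properties file" idea. It is not Spring or Spring AI code, and the property keys (pinecone.index-name, neo4j.label) and the class name StoreSettings are hypothetical. In a Spring application, the analogous step would be defining two separately named beans and reading each one's values from the environment.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

/** Hypothetical helper: keeps each store's settings separate by key prefix. */
final class StoreSettings {
    // Parse properties-file text (the same syntax as an application.properties file).
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Collect every property under "<prefix>." into its own map, with the prefix stripped.
    static Map<String, String> forPrefix(Properties props, String prefix) {
        Map<String, String> settings = new TreeMap<>();
        String dotted = prefix + ".";
        for (String key : props.stringPropertyNames()) {
            if (key.startsWith(dotted)) {
                settings.put(key.substring(dotted.length()), props.getProperty(key));
            }
        }
        return settings;
    }
}
```

Each map can then be used to build its own client, so nothing has to guess which store a generic setting belongs to.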
(08:12):
The last piece I want to talk about is a piece of content that I came
across. And actually, I came across this a few weeks ago and just
haven't had a chance to cover it yet.
So I'm excited to do that today.
And the article is called Why and How to Participate in
Open-Source Projects in 2025.
I've actually had a few different questions on this and I've just
been thinking about this because I feel like just recently I've
(08:33):
started to feel pretty comfortable
about finding gaps or holes or things that are missing and being willing and
able to contribute back to a project. And when I originally graduated school,
everyone was telling me, oh, you need to build your portfolio and work
on projects and contribute back to projects and so on. And I always found
(08:53):
this really intimidating and hard to do because I felt like, well,
I don't know what I can contribute.
I'm not sure where to start.
And really, I've just found that, with time, comfort, and building
more things, I'm starting to see the gaps or where things could be
improved or where I might be able to help and contribute something back.
So I would give it some time.
(09:15):
Give it some practice.
Just keep building things as you can, and it's easier said than done.
I know that. But this article talks about contributing to open source
projects as a way to accelerate your career, enhance your skills, and
expand your professional network.
It's going to improve both your hard and your soft skills.
First of all, on the more technical side, there's a wide range of technical
(09:36):
contributors on a project that have all kinds of technical knowledge.
And from my perspective, think about where a lot of open source projects start.
Right?
They're going to start from gaps, or from people who want to complement
existing tools or technologies or help make things easier when
working with those tools, or those who just want to explore and learn.
So they're going to be some pretty bright people who want to be there and
(09:59):
want to contribute and build better things and make development easier.
And then on the
other side, you have communication and group decisions and building
your credibility and your voice within the community.
Those are really valuable skills to have.
It's going to improve your technical standing as well as just your
community standing and networking inside the community itself.
Then the article goes into how to actually get started contributing
(10:22):
to open source projects.
The first thing is to pick a project.
Try to align with your interests and your career goals.
Remember that open source contributions often start as a hobby.
So again, find something that interests you.
I always say to find something you want to know about, you want to learn,
something to enhance your career, your skills, and just look for those
(10:43):
projects that seem fun and interesting.
Then the article says to join the community communication channels
that help you understand the tech and how the contributors work.
So just really get a feel for how the process works, how these
contributors operate and handle conflicts and discussions and so on.
Then the next thing is to look at the documentation, and I would
(11:05):
add content to this as well.
And the article actually does talk about content alongside the documentation.
But it might be slightly separate from, like, official reference documentation.
Over time, I think this is the easiest place to find gaps and contribute because
there are often things that are overlooked or that are missed or that are out of
sync because the code changes or features have been added and documentation is
(11:30):
often an afterthought, or things get missed, or maybe some people
understand things a little bit differently, and so might need a
few other descriptions or examples to help comprehend what's going on.
That's kind of an on-ramp if you want to start somewhere.
I think documentation is a great place.
Then the article also says to start small when you're ready to start contributing.
(11:52):
Look at things like tests, documentation, refactoring, tasks that are really
valuable but are often overlooked. So again, it's those small hidden things that may be
frustrating when you're just starting out, when you might not have a lot
of the background experience that long-term contributors have.
And so that's what you're going to bring, that outside perspective, so
(12:14):
that you can find those gaps andmaybe help fill them much better.
Then, as you build your credibility,then you can start adding to
enhancements and new features.
The article points out that if youjust drop into a community and say,
this needs added, or I think thiswould be good to add, you're not
really going to have a good community
rapport built up there.
(12:34):
And so it's not going to be taken maybe quite as seriously or prioritized
as highly as if you've already been involved in the community,
trying to make the project better and to help out with what's already there.
And then the article closes with some example open source project ideas.
But also, whatever stack you might be working on probably has some
libraries or some things that are missing or out of sync that could
use some contributions too.
(12:56):
So I would encourage you to look within whatever stack you might
be dealing with at the time.
This episode, I talked through my latest progress in playing with vector
databases, getting data loaded, and then exploring it with Spring AI.
Then I discussed an article on the why and how of contributing
to open source projects.
As always, thanks for listening and happy coding.