Episode Transcript
Available transcripts are automatically generated. Complete accuracy is not guaranteed.
(00:11):
Hello, and welcome to the Data Engineering Podcast, the show about modern data management.
Data migrations are brutal. They drag on for months, sometimes years, burning through resources and crushing team morale.
DataFold's AI powered migration agent changes all that.
Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches.
(00:35):
And they're so confident in their solution, they'll actually guarantee your timeline in writing.
Ready to turn your year long migration into weeks? Visit dataengineeringpodcast.com/datafolds
today for the details.
Your host is Tobias Macey, and today I'm interviewing Bartosz Mikulski about how to prepare data for use in AI applications. So Bartosz, can you start by introducing yourself?
(00:57):
So I'm Bartosz. I'm an MLOps engineer. I've been working as a data engineer for some time, then I switched to MLOps.
And along the way, I realized that the real problem is not really the software that you write. Maybe the data is more important. Not even the data that you process, but the way you test it.
(01:20):
And this kind of applies to AI also.
So this was the way it went from data engineering to AI.
And do you remember how you first got started working in the space of data and AI and ML?
Kind of by accident. I mean, we had some
data engineering work,
(01:42):
to do in the back end project, and I liked it. So then I stayed in data engineering.
And then, along the way, I got interested in machine learning, but I was never very good at training those models.
But somehow, I was better at deploying them and keeping them running, so I got involved in MLOps,
(02:05):
and I just stayed in this area.
And so now that
a lot of the industry has started moving into the space of generative AI and building AI applications,
obviously, there are a lot of data requirements that go with that. But I'm wondering if you could just start by outlining some of the main categories of the types of data assets that are needed specifically for AI applications
(02:30):
and some of the ways that maybe differs from the, I guess, traditional, I'll say, data assets that engineering teams are used to working with.
Okay. So
first of all, this is machine learning, so it doesn't really differ that much.
You need some test dataset.
In the case of generative AI, we call it evaluation dataset because, you know, we need fancy naming.
(02:54):
But this is your test dataset, and you will use it to verify whether this thing, whatever you are building, works correctly.
You don't need the training data set that much though unless you are going to fine tune something.
So
it should be easier,
to get the required data.
(03:15):
Although,
then quickly you realize that if you are building something that involves multiple steps,
like, you do one
call and you retrieve some data, you do another call to your AI model,
you need the testing datasets
for all of the steps separately because you will be
(03:36):
testing them separately
and to figure out what doesn't work, because somehow, something always doesn't work well, and you have to figure out what it is. Yeah.
So
you will have a lot of test data, and this is the data asset that you would need to gather somehow. You can
(03:58):
well, you can generate it to some extent,
but at some point,
you will have to start getting the real data.
And in this space of generative AI applications, there are a few different
styles that have been emerging.
RAG was kind of the first one once we moved past the initial phase of just prompt engineering.
(04:21):
And now we've moved into these agentic architectures,
and there are a few different styles of AI apps that have been coming up, another one being graph RAG, where you incorporate knowledge graphs.
And I'm wondering
how the particular type of application
changes the requirements around the types of data assets that are available to those AI applications for feeding into the models or for storing some of the outputs or
(04:48):
metrics around the model generation itself.
Okay. So besides the data in the database, obviously,
what you need is, as we discussed, the evaluation dataset. So in all of those cases, it looks a little bit different. That's because for RAG, you have the user question,
and then the AI is calling some
(05:11):
service, let's say, database
with some query. So it needs a test set that
consists of at least two things, the input from the user and the query you want to send or multiple queries.
And then you
check if this really happened or if the query that was sent is similar enough to what you expect.
(05:33):
Then you get the response, and you have to generate the answer. And this needs its own separate dataset for testing, because this is another step, and it can fail too.
And then, of course, you will test it as an entire workflow.
So you have to
(05:54):
be prepared for this also.
And this was one interaction, really. Yeah. You receive something and you generate an answer. And if you have multiple steps, you will have to multiply your datasets.
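To make the step-level evaluation idea concrete, here is a minimal sketch of what a test set for the retrieval step of a RAG app might look like, assuming a placeholder query-generation function and a deliberately fuzzy similarity check (the field names, examples, and threshold are illustrative, not from the episode):

```python
from difflib import SequenceMatcher

# One evaluation set per pipeline step; this one covers only the retrieval step.
retrieval_eval_set = [
    {"user_question": "How do I reset my password?",
     "expected_query": "reset my password"},
    # ... more hand-written examples
]

def generate_search_query(user_question: str) -> str:
    """Placeholder for the real LLM call that turns a question into a search query."""
    return user_question.lower().removeprefix("how do i ").rstrip("?")

def similar_enough(actual: str, expected: str, threshold: float = 0.7) -> bool:
    # Exact equality is too strict for LLM output, so compare with a cheap ratio.
    return SequenceMatcher(None, actual, expected).ratio() >= threshold

passed = sum(
    similar_enough(generate_search_query(ex["user_question"]), ex["expected_query"])
    for ex in retrieval_eval_set
)
print(f"retrieval step: {passed}/{len(retrieval_eval_set)} examples passed")
```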
And for the agents, it gets even funnier, because you have no control over the process.
Well, you have some control over the process, but the agent can
(06:17):
choose a tool and choose the parameters for the tool.
And then your dataset has to contain
the queries and the tools that you want to use and the parameters you would expect to see when this query is sent.
So, basically,
keep multiplying the test datasets.
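For the agent case, a sketch of what one evaluation record could look like, with the expected tool and parameters pinned down; the tool name, fields, and the planner stub are assumptions for illustration:

```python
# Each example pins down which tool the agent should pick and with which parameters.
agent_eval_set = [
    {"query": "What was revenue in March 2024?",
     "expected_tool": "run_sql",
     "expected_params": {"table": "revenue", "month": "2024-03"}},
]

def run_agent_planner(query: str) -> dict:
    """Placeholder for the real agent call that returns the chosen tool and arguments."""
    return {"tool": "run_sql", "params": {"table": "revenue", "month": "2024-03"}}

for example in agent_eval_set:
    decision = run_agent_planner(example["query"])
    print(example["query"],
          "tool OK" if decision["tool"] == example["expected_tool"] else "WRONG TOOL",
          "params OK" if decision["params"] == example["expected_params"] else "WRONG PARAMS")
```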
(06:37):
In terms of the areas of responsibility
for what the
role looks like and who is responsible for what pieces of the life cycle of the application and the different
data that gets fed into or retrieved from those different stages, I'm wondering how you're seeing that
breakdown in terms of different organizations
(06:58):
and how that maybe is influenced
by either the size and scale of the organization or the type of application or use case that they're powering.
K. So as a freelance AI engineer, I can say that everything that is even remotely
related to AI is
the responsibility of the AI engineer,
(07:20):
but it doesn't have to be this way. So
we already had some setup, yeah, because you probably had some ML models. So it can stay this way. You have the data engineering team gathering the data and maybe cleaning it. You have the data scientists.
(07:40):
In this case, they will write the prompts and do the experimentation on the prompts.
You might have the MLOps team deploying it. In this case, deployment is really just changing the prompt, unless you
use an open source model, then you have to redeploy something.
But on the other hand,
the step when you are getting the
(08:02):
production data is more work intensive, because you have those intermediate calls to the model.
So in this case, the MLOps team is still required.
So I think, today, it doesn't change that much.
It does require maybe
(08:23):
something that we are not used to. We're now working with text on both ends of the model, so not just feeding it text, but also getting text from it.
And for data engineering teams in particular who are used to working with more structured datasets, doing something along the lines of data warehousing, business intelligence, or maybe even
(08:47):
feeding some of those curated datasets back into application contexts.
What are some of the types of skills that transfer well to this world of unstructured data and preparing it for
AI applications, particularly working with things like vector databases?
And what are some of the skills that need to be acquired for people in that situation so that they can more effectively
(09:11):
work with and support the MLOps and AI engineer teams?
Okay. So if you
outline the process
in detail, you will always find something that you already know how to do. So if you do calls to the database, you probably know the query language for the database, whatever it is, whether it's SQL or any other thing,
(09:32):
you might know it already.
So this is a skill that
you can just use.
In other areas, okay, maybe vector databases might be kind of new for you.
So this is something the data engineering team might need to learn, because
(09:52):
it is like a normal database when you are inserting data into it, so that part is maybe not that relevant. But when you are retrieving, it's a little bit surprising at first. So, the matching of documents.
The other things that you can transfer,
(10:12):
I think, are the entire machine learning process: the deployment process, rollouts, A/B testing, testing in general, experimentation. This doesn't change. The tools change, but the process stays the same.
So
people already know a lot of what they need to know when they use generative AI. Maybe they just don't
(10:34):
realize it yet.
On that
vector database side,
they take a number of different forms where you have document oriented vector databases
in the shape of things like Qdrant.
You have pure vector databases,
sort of like Pinecone,
and then you have vector
add ons for relational databases, like pgvector in Postgres,
(10:58):
as well as a whole slew of other
formulations of vector storage in various contexts.
And I'm wondering how
the inclusion of vectors
as a data type and as a core asset that is consumed
and produced by these AI applications
changes some of the ways that
(11:19):
teams need to think about data modeling,
in particular around things like chunking strategies,
metadata management.
What are the pieces of information that you want to strip out before you run it through an embedding model? What are some of the pieces of information that are actually useful for putting into the embedding model? I know that, for instance, HTML, there have been conversations about whether to keep or strip out the tags, whether they're helpful or harmful, and just some of those types of,
(11:45):
you know, tactical elements of building these data assets that teams need to be thinking about and trained up on.
Okay. So chunking is definitely something new for data engineers,
and you just have to get used to this.
So, starting with the strategies: basically, you have to remember that
(12:05):
you will probably have to chunk the documents
because they will not fit in the context window of the model. Even if they do, that might be too expensive to use it this way.
So even though the best approach might possibly be to always send the entire document to the model, we would have to test this.
(12:28):
But in reality, you will chunk it. So you have several ways to do it. You can just decide that there's a fixed size of the chunk. So let's say, for the sake of the example, 500 characters, and then you just cut the text every 500 characters. Maybe that's too small a number, I think it's too small, but it's just for the example.
(12:50):
And then you start to build on top of this idea, because you probably don't want to cut it in the middle of a word. So you might have a chunking strategy that says: okay, it's 500 characters, but if that falls in the middle of a word, we cut a little bit earlier. And then you realize, okay, it's not in the middle of a word anymore, but it's still in the middle of a sentence.
(13:11):
And then you go back again. Yeah, it's not in the middle of a sentence, but maybe it's inside a paragraph. And so, basically, you just invented the recursive chunking strategy. You are cutting the text where it makes sense, to preserve as big a chunk as you can. But if you can't, then you just fall back
(13:33):
to cutting in the middle of a word.
But,
still, it might not be enough because,
possibly, you can just be unlucky.
Yeah. So,
the sentence that you need might end up in the other chunk.
So then we added overlapping,
(13:53):
chunking strategies, where you take some part of another chunk. You don't even consider it to be part of the chunk that you want, but you just overlap it with the neighboring chunk. And you have duplicates, but it's supposed to help you find the relevant information.
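As a rough sketch of the fixed-size, recursive, and overlapping ideas described here (the character limits, separators, and overlap size are arbitrary choices for illustration, not recommendations):

```python
def recursive_split(text, chunk_size=500, separators=("\n\n", ". ", " ")):
    """Cut text into chunks of at most chunk_size characters, preferring paragraph,
    then sentence, then word boundaries, and hard-cutting only as a last resort."""
    if len(text) <= chunk_size:
        return [text]
    for sep in separators:
        cut = text.rfind(sep, 0, chunk_size)   # last boundary that still fits
        if cut > 0:
            return [text[:cut + len(sep)]] + recursive_split(
                text[cut + len(sep):], chunk_size, separators)
    # no boundary found: cut in the middle of a word
    return [text[:chunk_size]] + recursive_split(text[chunk_size:], chunk_size, separators)

def add_overlap(chunks, overlap=30):
    """Prepend the tail of the previous chunk, so a sentence split across two
    chunks can still be matched in either of them (at the cost of duplication)."""
    return [(chunks[i - 1][-overlap:] if i else "") + c for i, c in enumerate(chunks)]

sample = "A few paragraphs describing the problem.\n\nThen a few paragraphs describing the solution."
print(add_overlap(recursive_split(sample, chunk_size=60)))
```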
But still,
(14:14):
it may not be enough, because sometimes when you write a document, you first have the description of the problem, like a few paragraphs, and then you start writing the description of the solution. And your chunking strategy might perfectly allow you to find the description of the problem, but you are not interested in the problem. You already know what the problem is. You have it. You want the solution.
(14:40):
And somehow it's in another chunk that was not matched. So then you can use something called a parent document retriever, where you match by chunks, but you get the entire document.
And
you can still build on top of those ideas, because sometimes the topic changes in the middle of the text. And you can use something called semantic chunking, where you use a generative AI model to tell you where a chunk ends. So from the basic idea of cutting the text,
(15:12):
at some point, you can build a lot of more advanced techniques.
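A minimal in-memory sketch of the parent document idea: match on small chunks, but hand the whole document back. The embed function is a stand-in for a real embedding model, and the similarity math is an ordinary cosine score; the document content is made up:

```python
def embed(text: str) -> list[float]:
    """Placeholder embedding: swap in a real embedding model call."""
    return [float(len(w)) for w in text.split()[:16]] + [0.0] * 16

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = sum(x * x for x in a) ** 0.5 * sum(y * y for y in b) ** 0.5
    return dot / norm if norm else 0.0

documents = {
    "doc-1": "Problem: exports time out on large accounts.\n\nSolution: paginate the export job.",
}

chunk_index = []                       # (chunk_vector, parent_doc_id)
for doc_id, text in documents.items():
    for chunk in text.split("\n\n"):
        chunk_index.append((embed(chunk), doc_id))

def retrieve_parent(question: str) -> str:
    query_vec = embed(question)
    _, best_doc = max(chunk_index, key=lambda item: cosine(query_vec, item[0]))
    return documents[best_doc]         # return the full document, not just the chunk

print(retrieve_parent("How do I fix export timeouts?"))
```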
And then you realize that if you have a document, the chunk that you want to match, and you match it with the query from the user, you are not really matching the same kinds of things. You have an answer and a question, and those are supposed to be similar.
(15:34):
But maybe you should be looking for an answer that is similar to some other answer. So you just invented hypothetical document embeddings, where you are generating a fake, well, not exactly fake, answer to the user question. And you hope that the vocabulary in this fake answer is similar to the actual answer.
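A sketch of that hypothetical document embeddings flow, with every model call stubbed out; the helper names and sample text are made up, and only the shape of the flow, embedding a generated answer instead of the raw question, is the point:

```python
def embed(text: str) -> list[float]:
    """Placeholder for a real embedding model call."""
    return [float(len(w)) for w in text.split()]

def generate_hypothetical_answer(question: str) -> str:
    """Placeholder for an LLM call that drafts a plausible (possibly wrong) answer."""
    return "To rotate an API key, open the account settings, revoke the old key, and create a new one."

def search_chunks(query_vector: list[float]) -> list[str]:
    """Placeholder for the vector store lookup."""
    return ["API keys are rotated under Settings > Security; old keys stay valid for 24 hours."]

question = "How do I rotate my API key?"
# Embed the generated answer instead of the raw question: the vocabulary of a
# fake answer is usually closer to the stored answers than the question is.
hits = search_chunks(embed(generate_hypothetical_answer(question)))
print(hits)
```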
(15:56):
So you can keep adding new things, and then you have metadata that you can use to narrow the space of vectors that you have to search. But this is not something you can retrofit into the pipeline, because you have to store those metadata fields.
(16:16):
So if you start thinking of metadata, you have to go back to data engineering
and just add them.
And this might require redoing a lot of work that you have done already. But
this is what it is. Yeah.
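A tiny sketch of metadata narrowing the search space, using an in-memory list in place of a real vector database (most vector stores expose the same idea as payload or metadata filters); the fields and vectors are made up:

```python
# Each stored chunk carries metadata next to its vector; filtering on metadata
# first shrinks the set of vectors that similarity scoring has to rank.
chunks = [
    {"vector": [0.1, 0.9], "text": "Refund policy ...", "product": "billing", "language": "en"},
    {"vector": [0.8, 0.2], "text": "SSO setup ...",     "product": "auth",    "language": "en"},
]

def search(query_vector, *, product=None, top_k=5):
    candidates = [c for c in chunks if product is None or c["product"] == product]
    score = lambda c: sum(q * v for q, v in zip(query_vector, c["vector"]))
    return sorted(candidates, key=score, reverse=True)[:top_k]

print(search([0.2, 0.8], product="billing"))
```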
Another
divergence from what data engineers are typically used to in the context of these embeddings and vector databases
(16:43):
is that
there's not a lot of opportunity for
being able to do sort of a backfill or an incremental reload,
at least in the case where you need to change your chunking strategy or change your embedding model. You need to effectively
rerun all of the data every time whenever you make a change of that nature
(17:03):
versus just I need to add a new document to the database using all of the same parameters.
Whereas in
more structured data
contexts, you can
either
mutate the data in place or,
you know, append to it without necessarily having to do as drastic of a rebuild.
(17:25):
And
given the fact that you might be dealing with large volumes of data,
it likely brings in requirements of more
complex or more sophisticated
parallel processing. And I'm wondering how you're seeing some of those requirements
change the
tool sets or
platform capabilities that
(17:45):
engineering teams need to incorporate and invest in to be able to support this embedding experimentation, and to evolve embeddings over time as new embedding models come up or as they need to change their chunking strategies, etcetera?
Okay. I think this is not solved yet. At least, I'm not aware of any solution to this
(18:11):
as of now.
So
for now, what I have been doing is just creating new collections of data with the different chunking strategies, and using those. And, of course, you have to ingest the data again, and it takes time, and you have to process it all.
(18:35):
And if you use some
SaaS embedding model, you pay for the embeddings every time you do it. So this is the problematic part, and I'm not aware of any solutions. Definitely someone's working on them, and I would love to hear about it. But I don't know of it yet.
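One way to keep that workable in practice is to treat each embedding model and chunking strategy combination as its own versioned collection and rebuild from scratch when either changes; a sketch with a plain dict standing in for the vector database and an arbitrary naming convention:

```python
stores: dict[str, list[dict]] = {}      # stand-in for a real vector database

def reingest(documents: dict[str, str], embedding_model: str, chunking: str,
             chunker, embedder) -> str:
    """Rebuild a fresh collection for this model/chunking combination."""
    collection = f"docs__{embedding_model}__{chunking}"
    stores[collection] = []
    for doc_id, text in documents.items():
        for i, chunk in enumerate(chunker(text)):
            stores[collection].append({
                "id": f"{doc_id}-{i}",
                "vector": embedder(chunk),
                "payload": {"doc_id": doc_id, "text": chunk},
            })
    return collection

# Rebuild with a new chunking strategy; keep the old collection around until the
# evaluation set says the new one is at least as good, then switch the app over.
active = reingest({"doc-1": "some text\n\nmore text"}, "embedding-v2",
                  "recursive-500-overlap-50",
                  chunker=lambda t: t.split("\n\n"),
                  embedder=lambda c: [float(len(c))])
print(active, len(stores[active]))
```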
(18:57):
Yeah. In particular, I imagine that teams who are
doing sort of the traditional
extract transform and load or extract load and transform workflows for filling their data warehouse,
whatever
batch or even streaming tools they're using to do that likely aren't going to be able to,
provide the
timeliness or scalability that they would need for doing massive reprocessing
(19:22):
of all of the documents for regenerating embeddings, which likely pushes them into adopting something like Spark or Ray where maybe they didn't already have that as part of their infrastructure.
Yep.
And then you have to explain to the engineers that they have done something, and it was perfect, but we need something else,
(19:46):
which is probably not the thing you want to say to people very often.
Yeah. But this is what it is.
It will be great to have a solution, but I think we don't have it yet.
And beyond
the embeddings,
(20:06):
as you move into some of the
more sophisticated
AI applications
where maybe you need to incorporate something like long and short term memory for
a chatbot or an agent style application.
You also have the management of conversational history and responses
(20:27):
and maybe also,
additional data collection to support fine tuning of some of those models.
How does that introduce new
requirements and new workload capabilities for data engineering and MLOps teams to be able to support those types of applications?
Okay. So I'm not that familiar with agent memory, so let's maybe focus on fine tuning.
(20:48):
So
first of all,
like in classical machine learning, you need the training dataset, and it consists of the input and the output. This is pretty obvious.
But,
gathering the quality outputs gets tricky, because
you can
get the data from the chat. For example, if you're building a chatbot,
(21:13):
from the chat,
and assume that if nobody is complaining about it, then it's probably correct.
But, this is not the case because people might just stop using this tool if they are not satisfied.
It doesn't mean that, just because someone didn't bother to click the button saying they don't like it,
(21:36):
they liked the output and you can use it.
What's even worse, how are you going to get the corrections?
You
got something wrong.
The person who is using your app is not satisfied. They click the feedback button saying they don't like it. And now you show them a message that says: okay, so write down what you wanted to get instead. So you wanted to have a helpful tool, and it already disappointed you, and now you also got homework.
(22:06):
So this is not going to work this way.
So,
sadly, in all of the cases, you might get away with getting the input data from the users, but you will need some data labelers. So, someone who can just write them down. The inputs you can get from the actual users, but for the output that you expect, you need someone who knows what the output should be
(22:32):
and who can write them down. And here is the sad part.
In most cases, that person might be you. So
I will be writing this, because no one else is going to do it. Yeah. But you need the data, and it's not going to appear magically from nowhere.
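In practice that labeling step can be as unglamorous as a small script that walks through logged user inputs, asks a human (often you) to type the reference output, and writes a prompt/completion-style JSONL file; a minimal sketch, with the record layout and file name as assumptions:

```python
import json

# Inputs come from real usage; the reference outputs have to be written or at
# least reviewed by someone who knows what the correct answer is.
logged_inputs = [
    "How do I export my invoices as CSV?",
    "Can I change the currency on an existing invoice?",
]

with open("finetune_dataset.jsonl", "w") as f:
    for user_input in logged_inputs:
        reference_output = input(f"Correct answer for {user_input!r}:\n> ")
        f.write(json.dumps({"input": user_input, "output": reference_output}) + "\n")
```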
(22:52):
Considering the overall space of generative AI applications and the different ways that these large language models are being incorporated
into
different application architectures,
thinking in terms again of things like agents
versus straight workflows versus just a back and forth chatbot and even just going from single turn to multi turn.
(23:16):
How has that
evolution of capabilities and use cases
changed
the types of work and responsibilities
for data engineers and ML ops engineers over that period? And maybe what are some of the ways that you are forecasting those changes to continue as we go forward?
(23:37):
K.
Generative AI by itself was a big evolution in the AI space,
but,
I think it wasn't the biggest,
because
for me, the biggest trend was the
coding tools, like Cursor, and before that it was, of course, GitHub Copilot.
(23:58):
But it could finish the current line, and that was about it. Okay, it was useful, but in comparison to Cursor, it's
almost nothing.
And I think this is the biggest trend in responsibilities,
because
now I, as a data engineer, can do front end. It may not be the prettiest front end, but I can do it. Yeah, I can make it work. So,
(24:24):
with these tools, you can have teams who are really almost like full stack in everything. You might still specialize in data engineering, but you can do other work.
And,
this really
makes a lot of things possible, and you don't need
(24:44):
to involve someone from another team when you are just building something. Maybe not even internally, like, for the real long term, but just for building something, maybe a prototype that is good enough to show to other people, and you don't need to involve a person from another team, like a front end engineer, who probably is not on your data engineering team because you don't normally need that kind of skill.
(25:12):
And you can still do it. So for me, this is the biggest trend. Yeah? Like, the tools you can use to generate code. And, of course, it's not pretty code, it might have some bugs, it might be inefficient, but that doesn't really matter if it allows you to do something that was not possible for you to do before.
(25:34):
So
this was the biggest change
and, the biggest shift in the responsibilities
because now you have more responsibilities
in a sense,
because you can do everything.
Okay. Almost.
But you can still do it. Yeah. It's not that you got a responsibility and you're not capable of doing it. You can get it to work.
(25:57):
As far as the skills and capabilities
for these engineers who are tasked with supporting
AI applications,
working with some of these vector databases, document embeddings,
getting involved in data collection for fine tuning datasets,
all of the various pieces that come into
supporting these applications.
(26:18):
What are some of the common skills gaps that you see or that teams should be aware of and watching out for and identifying opportunities for training on? So there is a huge, huge gap between copy pasting some code from some tutorial to get it to work, which will give you your first version of a chatbot,
(26:40):
and making it work in production and not be ashamed of the result.
And it's not even so much about the engineering work as about realizing that, mostly like with every other piece of software, your software is only going to be as good as your tests. And so if
(27:01):
you cannot test it and you cannot prove that it works, then it probably doesn't. And in the case of generative AI, this testing might be very extensive, because you have the entire workflow. You have the steps.
You have
the examples that you don't really want to see in production, when someone is trying to abuse your tool, and they will try to abuse your tool.
(27:22):
So you have to handle this too.
So this is the skill gap. You know? You can get it to work pretty easily using some online tutorials
and then
spend months to get it to the
quality level that is required for production.
In your experience of working in this space and working with teams who are building these different AI applications,
(27:49):
building the data feeds that support them, what are some of the most interesting or innovative or unexpected ways that you've seen those teams address these evolving needs of AI systems and be able to support them as they evolve and scale?
Okay. Maybe not even the needs of the system, but the way you build it.
(28:09):
What was most unexpected for me as a data engineer is that you can make the biggest difference with the user experience, with the UI. I mean, people expect to see a chat, maybe a summarize with AI button, and you don't need this. You can just hide it in
(28:32):
the back and show them the final result. Yeah? In one of the projects, we have built a reporting tool that was just a page. Yeah. And you could get data extracted from some online reviews on this page, and it didn't scream this is AI based. It was AI based, but you don't have to tell everyone. Yeah.
(28:54):
Because right now, it seems to be the way to market things: now this is the thing, it's with AI. But people don't care, and a lot of people actually don't like it. So maybe you don't need to show that this is AI based. You just use it, you show them the results, and they don't have to know.
(29:15):
I think another interesting
evolution,
particularly for data engineers in terms of the scope of their work, is going back to that
chunking and embedding generation.
The inclusion of ML and AI in the data pipeline itself, I think, is another notable shift from
maybe five or ten years ago where it was largely just deterministic processing and transformation
(29:41):
using
coded logic,
where now you're relying on these different AI models
being able
to generate those embeddings, process that content on your behalf,
particularly if you're doing something like generating,
semantic chunks where you actually feed the text through an LLM to summarize before it gets embedded. So I'm wondering how you're seeing some of that
(30:06):
shift in terms of toolset impact data teams who maybe don't have that history already.
Well,
not sure if I have seen a team like this, because I've worked with teams who worked with natural language processing before.
So they're very used to,
(30:26):
using something for embeddings and doing text processing on those embeddings. So this was not something shocking for those teams.
But, yeah, I can imagine that it might be something new, because, for example, I have seen people
(30:47):
replacing OCR software with models that can recognize data from images. We have multimodal models now, and you can just replace something that used to be hard, like character recognition from menus, with a model, and it turns out to be cheaper.
(31:07):
So
there is something new there for some teams, but I don't think it is as much of a shock as it used to be, like, two years ago, when people suddenly realized that this thing exists. And it was there for some time already,
but
(31:29):
a lot of people discovered it suddenly. Yeah.
And in your experience
of working in this space,
learning about some of these newer and evolving techniques for building and supporting these AI applications,
what are some of the most interesting or unexpected or challenging lessons that you've learned personally?
Okay. People will get very, very creative at trying to break the filter you build.
(31:54):
As soon as they realize this is AI, they will just have to break it. Someone will come and try to use it in some way that it's not supposed to be used. Right?
But in the best case, well, not really, it's a very bad best case, they will use it as a free proxy
(32:15):
to ChatGPT, just because you can process their requests. And you have to be prepared for this.
And, really, if you have a chatbot, then maybe
switching it off when you detect that it's getting abused is not a bad idea.
You might laugh at this approach, but
(32:37):
this is something you might consider, because otherwise, okay, maybe it's not the best case, but it's not such a bad case when you become the topic of memes on the Internet: there's a screenshot from your chatbot, and people are laughing at it. And it's bad, but it could be worse.
(32:58):
And people will try to break it. That's just not something that you have seen before with any other app. There are people who break apps for fun, but there are way more of them when you start using AI.
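A crude sketch of that kill switch idea: count requests flagged as abusive in a sliding window and disable the chatbot when the count crosses a threshold. The detector is a placeholder and the thresholds are arbitrary:

```python
import time
from collections import deque

recent_flags: deque[float] = deque(maxlen=1000)   # timestamps of flagged requests
DISABLE_THRESHOLD = 20
WINDOW_SECONDS = 300

def looks_abusive(message: str) -> bool:
    """Placeholder for a real moderation model or classifier."""
    return "ignore previous instructions" in message.lower()

def chatbot_enabled() -> bool:
    cutoff = time.time() - WINDOW_SECONDS
    return sum(1 for t in recent_flags if t > cutoff) < DISABLE_THRESHOLD

def handle(message: str) -> str:
    if looks_abusive(message):
        recent_flags.append(time.time())
    if not chatbot_enabled():
        return "The assistant is temporarily unavailable."
    return "...answer from the model..."

print(handle("How do I export invoices?"))
```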
And as you continue to
invest in your own knowledge and work with the teams that you're involved with and just try to stay abreast of what's happening in the industry.
(33:27):
What are some of the emerging trends that you are paying particular attention to and investing your own learning efforts into?
I have just discovered
prompt compression. Apparently, it is possible to use
a model, a generative AI model, to transform the prompt that is meant for another model, to make it shorter,
(33:51):
with fewer tokens, but still get similar or the same performance. And I really got interested in that. I cannot
say much about it yet because I have not learned enough,
but
I didn't know it was possible, and I discovered it, like, a week or two ago.
Yeah. That's,
(34:13):
really something I want to spend some time working on, because it looks cool. Yeah. It makes the calls cheaper, first of all.
Then
maybe you can feed more data into your prompt, so you have a bigger context.
And that just
(34:34):
sounds cool. Yeah. You just convert your prompt, compress it, and it works the same.
So just for those three reasons, this is the thing that I want to take a look at. And when I learn enough about it, I will probably write a blog post, as usual. So maybe
(34:54):
you can find it later.
And so far, what I have found is this LLMLingua library from Microsoft. I think I got the name right.
So yeah.
And this is the
maybe not a trend, but
some area
of interest.
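The general shape of the idea, with both model calls stubbed out (this is not the LLMLingua API, just the flow: one model shortens the prompt, the expensive model answers, and the evaluation set decides whether quality held up; names and text are made up):

```python
def compress_prompt(prompt: str, target_ratio: float = 0.5) -> str:
    """Placeholder: in a real setup, a smaller model drops low-information tokens
    from the context while keeping the instruction and question intact."""
    words = prompt.split()
    return " ".join(words[: max(1, int(len(words) * target_ratio))])

def answer(prompt: str) -> str:
    """Placeholder for the expensive model call."""
    return f"(answer generated from a {len(prompt.split())}-word prompt)"

original = "Long context with many supporting details ... followed by the actual question."
compressed = compress_prompt(original)
print(answer(original))
print(answer(compressed))   # should score about the same on your evaluation set
```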
And are there any other aspects of
(35:16):
data engineering requirements around AI applications
and just supporting these applications and the data that they consume and produce that we didn't discuss yet that you'd like to cover before we close out the show?
Maybe one thing: you don't have to support every input. You can just choose what the tool is supposed to do and maybe
(35:39):
cluster the data, do some topic modeling, decide if people are actually asking the questions you expect to see. And if they aren't, then maybe just filter those out. I mean, it doesn't have to do everything. If it's not a general purpose app, then
just decide if this is the thing you want to support.
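A quick way to get that picture, assuming scikit-learn is available: vectorize the logged questions and cluster them to see which topics actually show up (the cluster count and sample questions are arbitrary):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

logged_questions = [
    "How do I reset my password?",
    "I forgot my password, help",
    "Why was my card charged twice?",
    "Can I export invoices to CSV?",
]

vectors = TfidfVectorizer().fit_transform(logged_questions)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)

for label, question in sorted(zip(labels, logged_questions)):
    print(label, question)
```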
(36:00):
Alright. Well, for anybody who wants to get in touch with you and follow along with the work that you're doing, I'll have you add your preferred contact information to the show notes. And as the final question, I'd like to get your perspective on what you see as being the biggest gap in the tooling,
or technology that's available for data management today, maybe in particular as it relates to supporting AI apps.
(36:21):
Okay. We have a lot of tools for evaluation, like monitoring or just doing evaluation testing.
And,
it's not really a gap in the tooling, because I think we already have too many.
But they're kind of trying to do everything,
(36:41):
and,
I
think
we need some consolidation.
I would like to have one tool for this.
Like, it will do everything, or at least do it in some way the creators of the tool chose to do it. Because right now, the tools try to do everything, but they really don't, and you need several of them.
(37:07):
The documentation is usually, to put it politely, lagging. Most likely it's not even there.
So,
yeah,
I would love to see a tool that just gets the job done. It may have some
(37:29):
opinions
about how to do it. I might need to adjust my code to fit it. That's fine. I just don't need three tools for everything.
So this is the gap that I see right now.
Alright. Well, thank you very much for taking the time today to join me and share your thoughts and experience
(37:51):
of the
data requirements around these AI applications and some of the ways that it's shifting the responsibilities
and the tooling and the work required for data engineers and MLOps engineers. So appreciate the time and energy you're putting into that, and I hope you enjoy the rest of your day. Thank you. Bye bye.
(38:16):
Thank you for listening, and don't forget to check out our other shows. Podcast.__init__
covers the Python language, its community, and the innovative ways it is being used, and the AI Engineering Podcast is your guide to the fast moving world of building AI systems.
Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. And if you've learned something or tried out a project from the show, then tell us about it. Email host@dataengineeringpodcast.com
(38:41):
with your story. And to help other people find the show, please leave a review on Apple Podcasts and tell your friends and coworkers.