
June 4, 2025 45 mins

Chong Shen from Flower Labs joins us to discuss what it really takes to build production-ready federated learning systems that work across data silos. We talk about the Flower framework and its architecture (supernodes, superlinks, etc.), and what makes it both "friendly" and ready for real enterprise environments. We also explore how the generative AI boom is reshaping Flower’s roadmap.


Check out upcoming webinars!

Sponsors:

  • NordLayer is toggle-ready network security built for modern businesses—combining VPN, access control, and threat protection in one platform that deploys in under 10 minutes with no hardware required. It's built on Zero Trust architecture with granular access controls, so only the right people access the right resources, and it scales effortlessly as your team grows. Get up to 32% off yearly plans with code practically-10 at nordlayer.com/practicalai - 14-day money-back guarantee included.

Episode Transcript

Available transcripts are automatically generated. Complete accuracy is not guaranteed.
Jerod (00:04):
Welcome to Practical AI, the podcast that makes
artificial intelligence practical, productive, and
accessible to all. If you like this show, you will love The
Changelog. It's news on Mondays, deep technical interviews on
Wednesdays, and on Fridays, an awesome talk show for your
weekend enjoyment. Find us by searching for The Changelog

(00:24):
wherever you get your podcasts. Thanks to our partners at
fly.io.
Launch your AI apps in five minutes or less. Learn how at
fly.io.

Daniel (00:44):
Welcome to another episode of the Practical AI
Podcast. This is Daniel Whitenack. I am CEO at Prediction
Guard, and I'm joined as always by my co-host, Chris Benson, who
is a principal AI research engineer at Lockheed Martin. How
are you doing, Chris?

Chris (01:01):
Doing great today, Daniel. It's a beautiful spring
day here in Atlanta, Georgia, and I gotta say the flowers
are coming out. It's a... yes, a nice day to talk.

Daniel (01:11):
They're probably distributed all over the various
lawns across... everywhere. Federated, even. Yes. Yes. Well,
Chris, this reminds me, last week we had a kind of part one intro to
federated learning and some details about that with Patrick

(01:32):
from Intel.
He mentioned recently that he was at the Flower Labs
conference, and the Flower framework around federated
learning, he mentioned quite a few times. Well, we're
privileged to carry on the conversation around federated
learning into a part two on the subject, because we've got Chong

(01:55):
Shen with us, who is a research engineer at Flower Labs. Welcome,
Chong. How are you doing?

Chong Shen (02:00):
Hi. I'm doing very well. Thanks for having me.

Daniel (02:04):
Yeah. Actually, we were talking before the show. This is
the second time that we've got to chat about Flower on the
podcast: back in 2021, so even before AI was invented with ChatGPT
apparently, we were having conversations about AI, and one

(02:26):
of those was with Daniel from Flower. That's episode 160, titled
Friendly Federated Learning. It took me a second to say that
one.
But I'm sure a lot has changed and updated and advanced in that
time, of course. Maybe just to start things out, Chong, could

(02:47):
you give us a little bit of context of your background and
how you got introduced to this idea of federated learning and
eventually ended up working with Flower?

Chong Shen (02:59):
Yeah, absolutely. Thanks again for having me. My
background is in computational physics, so I spent many years
working, doing research in the computational physics field, through both my
PhD and a postdoc. So I worked a lot on parallel computing, on
supercomputer clusters. I was also very interested in machine

(03:22):
learning and deep learning in general.
So when I pivoted away from academia to go into what I call
industry, there was this space where you have distributed
learning. So that was in 2021. So when I started my career back
then, it started as a data science consulting business, but

(03:44):
specializing in federated learning. And I saw lots of
projects that were very interested in adopting federated
learning or this distributed learning approach to solve some
specific problems that they have. But I also came across the
Flower framework.
And open source development is a big passion of mine. So being

(04:05):
able to develop a framework that is used effectively, with a very
permissive license, I think it's a pretty cool thing to do.
So that's why I decided to join Flower Labs and become a core
contributor to the framework itself.

Daniel (04:23):
Yeah. Yeah. And I feel already connected with you,
because my background is in physics as well. It's always
good to have other physicists on the show that have somehow
migrated into the AI world. I'm wondering, in that transition,

(04:44):
you mentioned this transition, kind of academic to industry.
You were getting into even consulting around federated
learning. Was that idea of federation or distributed
computing, or however you thought about that, was that a key piece
of what you were doing in academia, which led you into

(05:06):
that interest? Was it something else that sparked the desire to
really dig in there as you were kind of going into, quote,
industry, as you mentioned?

Chong Shen (05:16):
Yeah, it wasn't something I came across in
academia, surprisingly. But somehow when I stepped into the
data science world, I came across people who were looking
into it, and that became an approach that back then we
adopted to try and solve some problems. So we saw that
federated learning could be a way to solve it, and then it's

(05:38):
very coincidental. Okay, it's distributed learning, it's
distributed computing. So it resonated with me quite
strongly.

Daniel (05:44):
Yeah. And was that related to working with
sensitive data, or in regulated industries, or something like
that, in those consulting projects? Or were you just interested in that
progression?

Chong Shen (05:58):
Yeah, actually there are, I would say, two broad
categories. One where the data is incredibly sensitive, and we
usually refer to them as really siloed data, data that
absolutely should not leave the boundaries of where it was
generated. And then the second group or second cluster is the

(06:22):
problems where the data sources are so massive. The point at
which the data is generated generates so much data every
second of the day that they just can't do any useful or
meaningful analysis on this kind of raw data, and they have to
fall back on downsampling. So they try to look into pushing computation
to the edge, and try to see if they could apply some sort of
machine learning approach or deep learning approaches on this
sort of massively generated data without needing to downsample
them.

Daniel (06:52):
That makes sense. And yeah, I guess I should
explicitly mention as well thatin the part one of this two
parter with Patrick, Patrick didprovide a detailed introduction
to the idea of federatedlearning, and we discussed that
at length. If people want to goback to the previous episode and

(07:15):
listen through that, that may provide some context. But it
probably would be worthwhile just to give your thirty-second
or couple-minute view on federated learning and how you
would describe it at a high level, and then maybe we can
jump into some other things.

Chong Shen (07:34):
Sure, absolutely. The easiest way to think about
it is looking at your classical machine learning approach.
Classically, you need to bring all the data into a single
location. Think of a database or on disk, and then you train your
model on that data. But sometimes, it's not so easy to

(07:56):
actually bring all the data into one location, just because of the
privacy reasons about moving your data, some geopolitical
considerations surrounding it, and also the data volume that's
been generated.
So instead of bringing all the data to one spot and training a
machine learning model on that, what you do is you move the
machine learning models to the point at which the data is

(08:19):
generated, and then you train these local machine learning
models at these sources. Then instead of moving the data
across, you move the model weights to a central server,
which is much, much smaller. You can then aggregate the model
weights to learn from these various data sources. Then over

(08:41):
time, as you repeat many, many rounds of this, you end up with
a globally aggregated model that has learned from this variety of
data sources, without needing to move the data sources across. That's
the essence of federated learning.
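To make the aggregation step concrete, here is a minimal sketch of the federated averaging (FedAvg) idea Chong describes, written in plain Python with NumPy rather than Flower itself. The linear "model", the single gradient step per round, and the toy data are all assumptions chosen only to illustrate that raw data stays put while weights move and get averaged.

```python
import numpy as np

def local_training(weights, local_data):
    # Stand-in for a client's local training: one gradient step of
    # linear regression on the client's own data. A real client would
    # run a full training loop on a real model.
    X, y = local_data
    grad = X.T @ (X @ weights - y) / len(y)
    return weights - 0.1 * grad

def fedavg_round(global_weights, clients):
    # One federated round: every client trains locally, then the server
    # averages the returned weights, weighted by local dataset size.
    # Only weights travel; the raw data never leaves the clients.
    updates, sizes = [], []
    for local_data in clients:
        updates.append(local_training(global_weights.copy(), local_data))
        sizes.append(len(local_data[1]))
    sizes = np.array(sizes, dtype=float)
    return np.average(np.stack(updates), axis=0, weights=sizes / sizes.sum())

# Toy federation: three "silos" holding different amounts of data.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
clients = []
for n in (50, 80, 120):
    X = rng.normal(size=(n, 2))
    clients.append((X, X @ true_w + 0.1 * rng.normal(size=n)))

w = np.zeros(2)
for _ in range(20):
    w = fedavg_round(w, clients)
print("globally aggregated weights:", w)  # moves toward [2, -1]
```

Flower packages this same pattern behind its strategy abstraction (FedAvg being the classic one), so in practice a user mostly writes the client-side training code and lets the framework handle the rounds and aggregation.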

Chris (08:54):
I'm curious, as you guys have worked on the
framework and, you know, you have new users coming into it,
what usually prompts a typical user, from your
perspective, to move into federated learning? Like,
you know, before they're really, like, fully into it, and
they understand the benefits and they're sold on it, if you

(09:16):
will. What's usually, in your experience, the impetus that
kind of gets them into that mindset or kind of
drives them in that direction initially? What causes the
change in the way they're thinking, so they go, I
definitely need to get into federated learning and go use
Flower specifically?

Chong Shen (09:32):
Yeah, absolutely. I think from my experience, the
biggest driver is when they realize they can't move their
data. Right? But when they speak to all the parties involved,
they say that, Oh, I have this dataset. Oh, you have this
dataset, but I don't really want to share them.
And then, Okay, this is where, you know, federated learning or
FL comes into the picture, and they decide, Okay, we really need to

(09:54):
do this. This is one aspect of it. And the other aspect is when
there's this big company who has, let's just say, many data
sources. They say, okay, it's super difficult to coordinate
all our databases together so that we can have a cohesive way
to train the machine learning model. This is also when they try to
look for distributed machine learning systems, and then they
come across federated learning.
There are these two vectors that drive the typical use cases.

Chris (10:22):
Curious if I can follow up on that, because I
have a personal curiosity. I happen to work for one of these
big companies that has data in lots of different places. And
in addition to that, and we kind of, in the preview last
week when we were talking, we talked a little bit about some
of the privacy issues as well. I'm curious what you think about
this. Like, in our case, and we're not the only one, lots of

(10:45):
that data is stored at different levels of security and
privacy.
There are different enclaves, if you will... Mhmm. ...where you're
trying to do that. And how does that ramp up the challenge of
federated learning when you have different, you know, different
security concerns around the different data enclaves that
you're trying to bring together through federated learning? How

(11:06):
does one go... like, instead of saying all different
locations for distributed data are equal, when you're dealing
with different security concerns, do you have any
ways of starting to think about that? Because as I come into
it as a newbie on this, that seems like quite a challenge for
me.
Do you have any advice or guidance on how to think about
it?

Chong Shen (11:26):
Yeah. Yeah. I think, from my experience, the complexity of the
solution scales with the number of
data stakeholders involved. And when you mentioned the
different levels of the enclaves, that to me signals

(11:47):
that there are many data owners who manage their data a bit
differently. So the key to solving that is to harmonise the
data standards first, to be able to get onto a federation.
And then from then onward, the implementation becomes much,

(12:09):
much easier. Maybe that's one of the key things that I've seen.

Daniel (12:13):
And we've kind of talked about your background, the
introduction to federated learning, some of those
motivations. Maybe before we get into Flower specifically and
some of the more production use... From your perspective, as being a

(12:34):
central place within the ecosystem of federated learning,
I guess... just very honestly, because we had that last episode
in 2021: from 2021 till now, how is the state of adoption of
federated learning in industry different maybe now than before?

(13:00):
How has that grown, or how has that matured as a kind of
ecosystem, I guess?

Chong Shen (13:08):
Yeah. It's a very good question. If I were to put
a number to it, and this is really arbitrary, I think
there's a 100x difference from 2021, when the Flower framework
first existed, to now. And one of the key changes in the usage of

(13:29):
federated learning is the ability to train foundational models and
large language models. And this has been a significant change
and driving force.
Previously, when we talked about using the Flower framework, you
may be confined to models that are not super large, small by

(13:50):
today's standards, on the order of millions of model parameters.
But these days, when we are talking about making use of text
data, image data for these foundational models, you are
thinking about models on the order of billions of parameters.
And there is a fundamental change in also how we have

(14:12):
structured the architecture of our framework, and also to
increase the ability to stream large model weights. So all of
these things are happening right now as we speak, and there's
some exciting new progress. Hopefully, we release a new
version in a couple of weeks.
For the users, the usage is identical. Nothing has changed.

(14:34):
But what has been unlocked is the ability to then train very
large models. So all of this really increases the appeal of
using federated learning, or the Flower framework, for a larger
variety of use cases.

Sponsor (15:03):
You know what's beautiful about good code? It
just works. No fuss, no five-hour debugging sessions at 2AM.
That's exactly what NordLayer brings to business security.
While you're busy shipping features and fixing bugs,
NordLayer handles your network security in the background.
It's like having a senior DevOps engineer who never sleeps, never

(15:23):
takes vacation, and never accidentally deletes the
production database. Zero trust architecture? Check. A VPN that
doesn't make your team want to work from the coffee shop
instead? Double check. Deploy in under ten minutes with no
hardware to rack and stack? Triple check. Built on the same
foundation as NordVPN, but designed for teams who need

(15:45):
granular access control and compliance reporting, because
apparently "it works on my machine" is not sufficient for
the auditors.
The good news is our friends get up to 22% off plans, plus an
additional 10% with the code practically-10. That's
practically, dash, 10. That's less than your monthly GitHub Copilot
subscription, but infinitely more useful when the security

(16:08):
team comes knocking. Check it out at
nordlayer.com/practicalai. Again,
nordlayer.com/practicalai.

Chris (16:20):
Well, Chong, as we've kind of dived into the show,
we've already started making reference to Flower quite
a bit, but we haven't actually really described specifically
what Flower is in detail as a framework, and what it brings and
such as that. Could you take a moment and, we
probably should have done this before, maybe kind of

(16:43):
express, you know, exactly what Flower is, what the components
are, and how it kind of helps the user begin to federate
their data in terms of their, you know, what their workflow is.
Could you talk a little about kind of the basics of it?

Chong Shen (16:59):
Yeah, absolutely. So the Flower framework is our
flagship open source code that's built on the Apache 2.0
license. And this framework allows users, any data
practitioners, to build a federated learning solution. So

(17:20):
with the framework, what this means is they are able to, I
guess in code terms, install a basic Python distribution of
Flower, and to build the different apps that allow you
to construct the fundamental federated architecture. So what
this means is to be able to spin up your server, which aggregates

(17:42):
the model parameters, and to write the code to also do your
training on the clients.
The structure that we provide within the framework allows
users to follow the same reproducible way to perform
their federated learning. So I think, at the essence, this is
what it is. What I also wanted to say is that, and one of the

(18:02):
appeals of Flower for me personally, is that we really
emphasise the user experience. And this is why we always say
Flower is the friendly federated learning framework. We
prioritize the experience of all our users.
We support them on Slack. We also have a Discourse channel

(18:24):
called Flower Discuss, where we actively answer any of the
questions from users. And we also have a fantastic community
that has contributed a lot of code improvements to the core
framework as well. So we are completely open. We build
transparently and are really accountable for every single

(18:44):
line of code that we commit, to the highest standards.
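As a rough picture of the app structure Chong describes, here is a minimal sketch in the style of Flower's Python API: a client that trains on its local data and returns only weights, plus a server-side FedAvg strategy that aggregates them. Treat it as illustrative rather than authoritative: the toy linear model, the get_weights/set_weights helpers, and the local data are assumptions, and exact module paths and launch code vary across Flower versions, so the templates generated by flwr new are the better starting point.

```python
import flwr as fl
import numpy as np

# Placeholder "model": a bare weight vector. In a real app these helpers
# would serialize a PyTorch/TensorFlow/JAX model instead.
def get_weights(model):
    return [model]

def set_weights(model, weights):
    model[:] = weights[0]
    return model

def train_locally(model, data):
    X, y = data
    grad = X.T @ (X @ model - y) / len(y)
    return model - 0.1 * grad

class SiloClient(fl.client.NumPyClient):
    """Runs next to one data silo: trains locally, ships weights only."""

    def __init__(self, model, data):
        self.model, self.data = model, data

    def get_parameters(self, config):
        return get_weights(self.model)

    def fit(self, parameters, config):
        self.model = set_weights(self.model, parameters)
        self.model = train_locally(self.model, self.data)
        # Return new weights, the local sample count (used for weighted
        # averaging), and an empty metrics dict.
        return get_weights(self.model), len(self.data[1]), {}

    def evaluate(self, parameters, config):
        X, y = self.data
        loss = float(np.mean((X @ parameters[0] - y) ** 2))
        return loss, len(y), {}

# Server side: the strategy decides how client updates are combined.
strategy = fl.server.strategy.FedAvg()

# In recent Flower releases, a client and strategy like these get wrapped
# into a ClientApp and a ServerApp and launched with `flwr run`; the
# generated project templates show the exact wiring for your version.
```

In a real project the same shape holds: the client app wraps your actual training loop, and the server-side strategy decides how the returned weights are combined.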

Daniel (18:47):
Yeah, and I can testify personally. At Prediction Guard,
we work with a number of students over time at Purdue
University. They have capstone projects. We're in the same
town, so it's natural that we would work with some of those
students. We've done that a couple times now.

(19:10):
One of those student groups that we had, I believe it was last
year, actually did this sort of capstone project related to
federated learning and training language models, translation
models, trying various things. They evaluated a bunch of
different things, but I think they ended up using Flower for the

(19:32):
reasons that you mentioned. They were newbies in this
world of federated learning. Obviously, smart students, no
doubt there.
But they definitely gravitated to the user
experience with Flower, because they had programmed in Python
and it just came naturally to them. So yeah, I'm sure that's a

(19:58):
common experience that maybe you all hear from others, that sort
of natural Pythonic way to kind of approach these topics.

Chong Shen (20:08):
Yeah, yeah, we do, absolutely. I'm very happy that
you shared that experience. It's always good to hear feedback
from your community. But yes, Python is really the driving
language behind machine learning models and deep learning models
right now. So it's a really natural way to provide a Python

(20:29):
SDK.
We've supported it from day one, and we will continue to support it
for a long time.

Chris (20:33):
I'm curious, kind of extending that just a little
bit. Beyond being in the language, I like the notion of
the friendly language. The word friendly appeals to me in terms
of that user experience. Could you talk a little bit more about
kind of why you're branding around friendly and what that

(20:56):
means from a user experience standpoint? You know, what other
aspects of it make it friendly?
There are so many things out there that are not friendly,
so that definitely grabs my attention.

Chong Shen (21:08):
Yeah. Absolutely. I think what would be
nice to explain is, for the past 10 releases, we have
dramatically improved the friendliness of our framework,
hopefully? I hope that's the experience that people will get
out of this. The main point is to reduce the cognitive load of

(21:30):
any developers who want to use our framework.
So I'll give one concrete example. We introduced the Flower
CLI a couple of releases ago, I think probably late last year.
And what this does is, with a simple flwr new command,
a user is able to navigate options through the command line

(21:53):
and immediately have a templated project to work with for
federated learning. And it runs out of the box. After flwr
new, the user goes through this, just follows the steps, and then
you do flwr run, and it runs out of the box.
And we have the core templates that are necessary for users to

(22:13):
build on. We have the PyTorch, TensorFlow, the typical ones,
and the more exotic ones have JAX. And those who want it, they
can use NumPy as well. All these provide the boilerplate code for
users to get started with, and it reduces so much startup time.
Then with that, once a user has built all their applications,

(22:33):
the user can also really monitor their runs.
We also introduced commands like flwr ls. It's really like ls in
your terminal, to just see what runs, what Flower runs are
running at the moment. And also others like flwr log to see the
logs of your code. So all of these really simple CLI tools

(22:58):
really help a user navigate and work with running code much more
easily. Previously, I would say 2021, 2022, early 2022,
the Flower framework was in a different place.
How it worked back then was still friendly, but the way that
a user would need to start the federation would be to start

(23:23):
three Python scripts. And this is not as intuitive or natural
if you want to scale up or put into production. So with the
introduction of the Flower CLI and a different way of deploying
the architecture which drives the federation, it really makes
it so much easier for users to start building and then deploy

(23:45):
the code.
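The CLI workflow described above looks roughly like the following shell session. The commands flwr new, flwr run, flwr ls, and flwr log are the ones named in the conversation; the project name is made up, the pip install step is the usual way to install a freshly generated Python project, and exact prompts, arguments, and output depend on your Flower version.

```shell
# Scaffold a templated project; the CLI prompts you to pick a framework
# (PyTorch, TensorFlow, JAX, NumPy, ...). "my-fl-app" is a made-up name.
flwr new my-fl-app
cd my-fl-app
pip install -e .   # install the generated app and its dependencies

# Run the federation defined by the project, out of the box.
flwr run .

# Monitor what is happening.
flwr ls            # list runs, much like `ls` in your terminal
flwr log           # view the logs of a run (it may ask for a run ID)
```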

Daniel (23:47):
Well, you were kind of leading into maybe what was
going to be my next question. You mentioned kind of taking
things into production. So some people might hear kind of
friendly framework, which is a good thing, as Chris mentioned,
but they might associate that with, you know, prototyping and

(24:07):
learning and that sort of thing, not necessarily production
usage. So I'd love if you could kind of help us understand:
if I'm implementing a federated learning system with
Flower, what does a production federated learning system look
like? I'm sure there's different sorts of ways that could

(24:31):
manifest, but certainly you've seen a lot of use cases.
Maybe you could just highlight some examples for us. What does
that production federated learning system look like? And
what are some of the considerations that you have to
think about, going from a toy prototype of "this might work" to
a full-scale production rollout?

Chong Shen (24:53):
Yeah, absolutely. I think it is a nice segue between
the friendliness aspect and moving to production, because what I
also want to mention here is that I walked through a very
simplified workflow of how a user would build out an FL
solution. With the Flower framework, you could build and

(25:15):
write the apps that you need for your server aggregation,
and also for the clients, which actually train the models at the
data sources. In the first iteration, a user might actually
run it in what we call the simulation runtime. So without
worrying about the actual data sources, or to work out the data

(25:37):
engineering aspect of it, you could test the implementation of
the basic architecture in the simulation runtime, using
datasets that are obtained from Hugging Face, for example, or
from datasets that you could just create artificially, just
for testing purposes.
With the same code that you use to train the models and the

(25:58):
clients and to aggregate, you can then point the code to a
different runtime and then execute it in what we call the
deployment runtime. And this brings us one step closer to
production. So once you have this mode of execution, the
clients would then be tapped into the data sources, and you can

(26:18):
then start training your actual federated model. So what does it
take to deploy a production system? Firstly, there is a nice
acronym that I like to use from the TinyML community.
It's BLERP. I'm not sure if you've come across that before.
Have you come across that before? Just out of curiosity.
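The simulation-to-deployment step just described can be summarized as pointing the same app at a different runtime. The federation names below are illustrative assumptions; in a generated project they are defined in the app's configuration, and the exact configuration keys differ between Flower versions.

```shell
# 1. Iterate in the simulation runtime: simulated clients, test datasets
#    (e.g. pulled from Hugging Face or generated artificially), no real
#    data sources or supernodes required.
flwr run . local-simulation

# 2. Later, run the unchanged client and server apps against a deployed
#    federation, where long-running supernodes sit next to the real data.
flwr run . production
```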

Daniel (26:39):
But go ahead and explain.

Chong Shen (26:42):
Yeah. Yeah. So the TinyML community talks about
bandwidth, latency, efficiency, reliability, and privacy, if I'm
not mistaken. I could be wrong with the last one. But in a
production-grade system, what you really want is the reliability
of the deployed solution to do the full computation.
It doesn't have to be federated learning, but systems in

(27:03):
general. So with the current version of the Flower framework, we have
separated what we call the application layer, where users
will build their apps, and these are the ones that users will
modify. And then we also have the infrastructure layer, which
underpins this system. This infrastructure layer is

(27:25):
responsible for receiving the Flower commands from a user, and
then distributing all the necessary code to the clients, for
the clients to actually perform the training. So, in Flower
parlance, you'll come across it: we call this the superlink, to

(27:45):
actually host the server.
And the supernodes are the long-running services
which basically orchestrate the clients. So these two components
are long-running. With these two components, because they are
long-running, the users can then execute multiple federations

(28:07):
across all the systems without worrying about any of these
components failing. So this is where the reliability comes into
the picture. Because the connections are also
established, we also handle the bandwidth and the connection, so
we try to reduce the latencies between the supernodes and the

(28:27):
superlink as well.
So the infrastructure is something that is deployed
once and that will persist for the lifetime of the project. And
this makes it much easier for the users to continue to work
with the production-grade system. So it's always there waiting for
you. Every time a user wants to go in and execute a run and look
at the results, it's always there, without worrying about any

(28:49):
component failing and stopping the run.
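A hedged sketch of what the long-running infrastructure layer looks like operationally. The flower-superlink and flower-supernode commands are Flower's deployment-runtime services; the flags, address, and port shown here are assumptions that differ between versions and between secure and insecure setups, so check each command's --help output or Flower's deployment docs before copying this.

```shell
# Central, trusted host (or a third party such as Flower Labs):
# start the superlink once; it keeps running for the life of the project.
flower-superlink --insecure        # --insecure is for local testing only

# Each data silo (e.g. each hospital) starts a supernode that connects
# out to the superlink and orchestrates local training next to the data.
flower-supernode --insecure \
    --superlink superlink.example.org:9092   # address and port are placeholders

# With the infrastructure up, users submit runs from their own machines:
flwr run . production              # "production" is a hypothetical federation name
```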

Daniel (29:09):
Chris and I are so happy that you are joining us each
week to hear from amazing guests and listen to some of our own
thoughts about what's happening in the AI world. But things
are just moving so quickly, and we recognize and want to see you
all participate in the conversation around these
topics. That's why I'm really happy to share that we're gonna

(29:33):
start posting some webinar events on our website, where you
can join virtually and actually participate, ask questions, get
involved in the discussion, and really deep dive into various
topics related to either various verticals or technologies or
tools or models. The first or next of these that's coming up

(29:57):
is June 11 at 11AM Eastern time. It's gonna happen virtually via
Zoom, and you can find out information about that at
practicalai.fm/webinars.
This is gonna be a discussion around on-prem and air-gapped
AI, in particular as how that relates to manufacturing and

(30:19):
advanced logistics. I've seen personally, as we work with
customers in this area, just the transformative power of this
technology there, from monitoring machine-generated events to
providing supply chain advice and so much more. But there's a
lot of struggle in terms of deployment of this technology in

(30:41):
those air-gapped or in those on-prem environments. So that's
what we're gonna talk through on June 11. I really hope to see
you all there.
For this discussion, again: June 11, 11AM Eastern time. It's
gonna be virtual via Zoom, and you can find out more
information at practicalai.fm/webinars. Chong,

(31:27):
I love this idea of the sort of supernodes and superlinks. And
my thought is, I'm trying to work out in my head kind of... if I
was, let's say I'm working in the healthcare space, and my nodes
are maybe different hospitals or different facilities in a
network or something like that, and I have a central place where

(31:50):
I have my superlink and I'm doing the aggregation. Just from
a practical standpoint, as I think Chris mentioned before,
you have these different facilities, you have different
maybe stakeholders with different data.
What do I need to do, as, let's say, I'm the person that's in
charge of running the experiment, training the model?

(32:13):
What do I need to do on the setup side to sort of connect in
these supernodes, or wherever the clients are? What needs to
exist there? How do I register them in that setup process to
really get going, before I'm able to go in, like you say, and from

(32:35):
a user perspective, run experiments or perform training
runs and that sort of thing?

Chong Shen (32:42):
There are many ways to go about it, but I think the
cleanest way is to think about two groups of roles. One is the
administrator role, and they are responsible for deploying the
supernodes in each of these, let's say, health care
facilities, health care centres. They are responsible for making

(33:07):
sure the correct user is registered onto the superlink or
the federation, and also to coordinate and, basically,
monitor the usage of the superlink itself. So that's
the administrative role. And then there is the user role, where
the data practitioners, data scientists, would then write

(33:29):
their apps, their server apps and their client apps, and then
run these apps on the superlink, on the federation that the
administrator has deployed.
So I think this clear distinction would be an easy way
to think about it. So as a start, an administrator would

(33:49):
say there are five hospitals who want to form a federation. An
administrator or administrators can go in and deploy the
supernodes with the template. For example, if you're using
Kubernetes or Docker containers, you can have Helm charts that
can deploy the supernodes in each of these five hospitals.

(34:12):
The superlink can be hosted by a trusted third-party server, or
it can also be hosted by Flower Labs, for example, who can host
a superlink for you, because it's just a simple service.
And then the users would register or be authenticated on
the superlink. So they need to both be authenticated and have
the authorization to run the Flower commands on the superlink.

(34:35):
And that way, you can get a production system up and
running in a cross-silo setting.

Chris (34:42):
I'm curious, as we're kind of talking through it, and I'm
learning a lot from you as you're describing it.
And you've kind of made reference to admin roles and
client and server apps and, you know, superlinks and
supernodes and stuff, which, you know, kind of in the context
of federated, there's networking and stuff like that. So I guess

(35:03):
I have a generalized question around that. And that is, is
there any set of knowledge or skills that a user can kind of
ramp up into, or needs to know, to use Flower effectively? Like,
in particular, for instance, you know, maybe
they're coming from more of a kind of data science or

(35:24):
kind of, you know, deep learning role. And maybe they haven't
done a lot of networking and stuff like that.
Are there skills that they need to be able to
ramp up into to be most effective at using Flower that
you would recommend? Or, you know, what would the
expectation on the user be in that capacity?

Chong Shen (35:44):
Yeah. That's a good question, actually. It's a fair
question as well. In my opinion, what we're trying to convey is
that users do not need to think about the communication
aspect of it at all; everything is handled by the
infrastructure. Of course, there are cases where a user runs into more:

(36:12):
when the federated learning solution becomes a bit more
complicated and runs through very special cases, this is where
some understanding of the communication protocols, and how
these are set up, could help as well. And I think for users who are stepping
more into a sort of administrative role and want to deploy the
more into sort of administrativerole and want to deploy the

(36:33):
supernodes or work with the infrastructure, basically the
superlink and supernodes, there are questions of
infrastructure and DevOps. You have to have some familiarity with
deploying this in containers or working with pods, things like
that. But fundamentally, when you first start to work with the

(36:54):
framework, you can get started with a vanilla production system
without worrying too much about the communication or needing to
know too much about it. And then as you get your feet wetter,
you can learn more along the way.

Daniel (37:09):
Well, yeah, that line of thought, along with something
that you said earlier about how large language models, generative AI,
have pushed the boundaries of how you communicate data and
weights back and forth, how you can handle larger models with

(37:29):
the more recent versions of Flower, and you're releasing the
new version in a couple weeks, even with more... I'm wondering
generally how... certainly that's one aspect of how this sort of
boom in generative AI has probably influenced your roadmap
and how you're thinking about things, what people are wanting

(37:49):
to do with Flower. I imagine there may be a variety of ways
that that's impacting Flower. I was even thinking, while you were
talking about that, I was like, Wow, it would be cool if there
was an MCP server or something, or helpers on top of Flower, where I
could just type in natural language. That would be a

(38:12):
friendly interface to set up my experiments and that sort of
thing.
Yeah. As one of the core folks working on the framework, how
have you seen this boom in interest around generative AI
influence the roadmap and what you're thinking about at Flower,

(38:33):
what you maybe envision for the future of the framework, that
sort of thing?

Chong Shen (38:40):
Well, when you brought up the Model Context
Protocol... there have definitely been some interesting
conversations recently as well.
We and the team are looking into that. Yeah. About the impact of
generative models, or large language models slash multimodal
models: it's been one of the driving forces for the

(39:05):
Flower framework as well. We really believe that these state-of-the-art
LLMs, as we speak, are running out of data to
train on.
Back in December, Ilya, a co-founder of OpenAI, was
saying that data is running out, or data has run out, to train
these LLMs. And yes, that's exactly the sentiment that we

(39:26):
feel as well. It's the tip of the iceberg. There are tonnes of
data locked in silos that could benefit from having large
language models either pre-trained or fine-tuned on them, in
order to be useful, to be made useful. And the way to achieve
it is through federated learning.

(39:48):
I think this is one of the key technologies that is driving the
framework.

Chris (39:54):
I'm curious kind of to extend that notion a little bit.
We've been so into kind of the generative AI hype
cycle for the last couple of years and stuff. And now
that's kind of moving into combining models in
different ways, and agentic, you know, focus, and

(40:16):
ultimately, physical, you know, models going out there in terms
of interaction. And so, I know what I'm seeing out
there involves, instead of just having one model, you know,
people are now putting lots of different combinations of models
together to get jobs done. Does that in any way change kind of

(40:36):
how you should think about using federated learning?
Is, like, every model that you might have in a solution
just its own one-off Flower implementation, or are there any
ways that you guys are thinking about combining models together
if they're all using data from, you know, different resources
and stuff like that? Like, as we're moving into my solution

(40:58):
has many models in it, does that change in any way how users
should think about using Flower or architecting a Flower-based
solution?

Chong Shen (41:07):
It's a very deep question. I feel that there are
a couple of possible futures here. There is a future where
these agentic workflows, where you have models that are chained
together to achieve a certain task, could also be used

(41:29):
eventually in concert with federated learning. So I see a
future where there is a possibility of that as well.
But there need to be some intermediary steps there.
And the reason is because these models, when you use them for
agentic workflows, they need to be really optimised for the

(41:51):
agentic workflows. They need to be trained on a certain type of
structure and also be optimised for it. There need to be some
proper evaluations for that. So, sort of the missing piece: I see a
future where, if these two sort of pathways of agentic workflows
and federated learning come together, it would be that

(42:13):
people should think about having strong evals for these kinds of
workflows, and then, knowing that there is a limit to them once
you're able to quantify them, to look for ways you can
improve it through distributed learning, such as federated
learning. And this is how you rationalize an improvement over
agentic workflows.

Daniel (42:33):
Well, Chong, it's been fascinating to hear some of your
perspective on, especially, production use of federated
learning and Flower. As we kind of draw to a close here, I
imagine we'll have Flower back on the podcast here in another
couple of years, or before. Hopefully this becomes a

(42:54):
recurring one. But as you look to this next season of either
what you're working on or just the ecosystem more broadly,
what's exciting for you, interesting for you, that is
always top of mind or is most there when you're going back from
work in the evening? What's on your mind as you look forward?

Chong Shen (43:24):
Yeah, absolutely. I think I'm very keen to think
about this foundation LLM that is purely trained on FL, on
federated learning, and has been shown to be both
privacy-preserving and also state of the art. I think if the viewers,
and also yourselves, if you check it out, we are collaborating with

(43:46):
Vana as well, in the US. They are looking into data DAOs, and we
are very much working on that. So I'm really looking forward to
seeing the first LLM in the world that is trained with FL,
to SOTA standards.

Daniel (44:04):
Awesome. Well, yeah, we look forward to that as well.
Certainly come on the show and give us your comments on it
when it happens. But thank you so much for taking
time, Chong, to talk with us. Really appreciate your
perspectives.
And please pass along our thanks to the Flower team and
their continued work, you know, as a team, on a great addition

(44:26):
to the ecosystem.

Chong Shen (44:27):
I will. Thank you, Daniel and Chris. Thanks for
having me on the podcast.

Jerod (44:38):
All right. That is our show for this week. If you
haven't checked out our Changelog newsletter, head to
changelog.com/news. There you'll find 29 reasons. Yes,
29 reasons why you should subscribe. I'll tell you reason
number 17: you might actually start looking forward to
Mondays.

Sponsor (44:58):
Sounds like somebody's got a case of the Mondays.

Jerod (45:01):
28 more reasons are waiting for you at
changelog.com/news. Thanks again to our partners at Fly.io, to
Breakmaster Cylinder for the beats, and to you for listening.
That is all for now. We'll talk to you again next time.