
June 9, 2025 (17 mins)

U.S. National Science Foundation-supported researchers are developing a multimodal system that combines image analysis and natural language processing to help manufacturers detect problems, suggest improvements and communicate with machines in real time. Bingbing Li, a professor at California State University, Northridge, discusses his group's work with vision language models for use in smart manufacturing.

 


Episode Transcript

Available transcripts are automatically generated. Complete accuracy is not guaranteed.
(00:03):
This is the Discovery Files podcast from the U.S. National Science Foundation.
Industry is using advanced technologies to optimize the production environment, with the goal of creating more adaptive manufacturing systems that can better support one of the most important sectors of the U.S. economy.
We're joined by Bingbing Li, a professor in the Department of Manufacturing, Systems

(00:25):
Engineering and Management at California State University, Northridge. His group is working to make manufacturing in the United States smarter, safer, and more competitive by developing a new kind of artificial intelligence assistant designed specifically for the factory floor.
Professor Li, thank you so much for joining me today.
Thank you so much for inviting me. I'm super excited.

(00:45):
You're doing some exciting work. I think the audience is going to be excited to hear about it as we go here. The crux of it is a system you're calling MaViLa. What is the MaViLa system?
Yeah. MaViLa itself is an advanced vision language model; we call this a VLM. It's specifically designed for the smart manufacturing domain. Because we have been working

(01:05):
in the smart manufacturing domain for almost ten years, and we always had, you know, these issues. And then, you know, large language models came out and they got really advanced. We thought that would be a really good, you know, opportunity for us to solve the issues that we had before. So we developed this specialized, advanced VLM system.

(01:29):
We call this MaViLa, built for autonomous planning and execution in dynamic, you know, real-world scenarios.
So, backtracking a little bit to get into some of the concepts you mentioned there: I think a lot of people have heard of large language models at this point, but what is the difference with vision language models?
A large language model itself is primarily,

(01:51):
you know, designed to process and generate textual content. So it's text generation, text summarization. And they can answer questions based on the text. So that's pretty basic, because it's focused on the realm of text, processing and generating human-like language, with remarkable precision right now.

(02:12):
And there are a lot of new and amazing tools, and startup companies as well. They can generate short videos, right? Or images, right? You can just type a text prompt, and then they can generate a video or images. So we call this multi-modality, all these data. And the VLM itself is a vision language model; they are multimodal AI systems.

(02:37):
So they can understand and process both visual and textual data. They combine a large language model with a vision encoder. That gives the VLM the ability to see, so they can do visual question answering. For example, you give it one picture, right? And then the VLM can understand it, read the picture, and tell you

(02:59):
what the picture describes or illustrates. And they can also, using text or a prompt, generate an image as well. So the VLM itself can generate a text description based on that image, what it looks like, and describe it. And they can also create images from a textual description, like a prompt.

(03:22):
They can also do visual summarization. And they can also do image-to-text retrieval. So VLMs are more complex to train and deploy due to their multimodal nature, because they process multiple modalities of data. And of course they also require more computational resources

(03:43):
than LLMs, because an LLM itself only processes text.
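As a rough illustration of the fusion Li describes, a vision encoder feeding image tokens into a language model's input sequence, here is a toy Python sketch. Every component is a simplified stand-in invented for illustration; nothing here is part of MaViLa or any real model.

```python
# Toy sketch of how a vision language model (VLM) fuses modalities:
# a vision encoder turns an image into patch features, a projection
# maps those features into the language model's embedding space, and
# the model then attends over [image tokens] + [text tokens] as one
# sequence. All numbers and functions are stand-ins for illustration.

EMBED_DIM = 4  # embedding width shared by both modalities after projection

def vision_encoder(image):
    """Stand-in encoder: one 2-dim feature vector per image patch."""
    return [[sum(patch) / len(patch), float(max(patch))] for patch in image]

def project_to_text_space(patch_features):
    """Linear projection from 2-dim vision features to EMBED_DIM."""
    W = [[0.5, -0.5, 1.0, 0.0],   # fixed toy weight matrix (2 x EMBED_DIM)
         [1.0, 0.25, 0.0, -1.0]]
    return [[sum(f[i] * W[i][j] for i in range(2)) for j in range(EMBED_DIM)]
            for f in patch_features]

def embed_text(tokens):
    """Stand-in token embeddings: hash each token into EMBED_DIM values."""
    return [[((hash(tok) >> (8 * j)) % 7) / 7.0 for j in range(EMBED_DIM)]
            for tok in tokens]

def build_vlm_input(image, prompt_tokens):
    """Concatenate projected image tokens with text tokens, as a VLM does."""
    image_tokens = project_to_text_space(vision_encoder(image))
    return image_tokens + embed_text(prompt_tokens)

image = [[0, 1, 2], [3, 4, 5]]  # two toy "patches"
seq = build_vlm_input(image, ["what", "machine", "is", "this", "?"])
print(len(seq))  # 2 image tokens + 5 text tokens = 7
```

The point is only the shape of the pipeline: once image patches are projected into the same embedding space as text tokens, the language model can treat "seeing" as just more input sequence.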
Okay.
I know one of the data groups you used in this specific project is learning specifically from manufacturing sources. What are the benefits of having manufacturing-specific sources in the data?
I think, for the LLM itself, initially, like,

(04:03):
three years ago, when it became super, you know... Right, ChatGPT is everywhere. You ask ChatGPT all these questions and then it can answer all the different questions. For us, for manufacturing, the most important information scenario is, for example, we'll give you one manufacturing process. We want to know what will be the optimal cutting speed.
(04:27):
This is a specific technical question. The LLM itself can process all these really good and nice questions, but it's more generic. And whatever they answer is also based on the limitation of the database they have, right? So, like, part of the database is based on, like, Wikipedia, right? Another 40% is text.

(04:48):
You know, textbooks, you know, all the different resources. But for us, whenever you use ChatGPT, or you use, you know, a computer vision system, they are always missing the information specific to one domain. We consider this domain knowledge. Same thing.

(05:08):
I think this is the big challenge for a lot of other domains as well. Like maybe, you know, the biological domain, you know, chemical engineering. Yeah, you know, aerospace, right. So they are always looking for the specific technical answer. Oh, right now I'm assigned to operate this machine, for example, or this 3D printer.

(05:31):
What will be the optimal, you know, process parameters? They cannot give you the answer, because they lack this domain knowledge, and also they don't have the database, you know, to give you the answer. Right. So that's why, for us, as a domain application, we have to come up with our own data, like domain knowledge and data sets.

(05:53):
And then there's the pipeline to build and to fine-tune the large language model and the vision language model. And then we can build our own tool for our specific manufacturing domains. So in this case they can answer the question: oh, you are working on this specific cutting or milling or 3D printing, right? And then you are more curious about what the cutting speed would be.

(06:19):
Right. What would be the material composition you plan to prepare? And then, same thing for metal 3D printing. You also want to know what the in-situ monitoring would be, right? You analyze all these images you have, and then they can recognize them. Same thing for this paper that we published: we collected a lot of this data.

(06:40):
So we have a benchmark database of images of different machines, like cutting machines, milling machines, wire EDM, CNC machines, 3D printers. With 3D printers, we also have the different, you know, FDM, SLA, right? All the different 3D printing technologies. Right. And then they have the picture.

(07:00):
So our hope is, whenever it sees that picture, it can tell, oh, what is the name of that machine and what are the specifications for that machine, not only the name but also the specifications. So whatever. As a user, right, as an engineer, whenever you enter a manufacturing facility, you know, because

(07:22):
we are working on digital twin and AR, you know, systems as well. So whenever you go through that facility, the AI system or the AI agent tool can tell you exactly what that machine is, not only the name. You can tell the name from this information, but how about the model? How about the serial number? How about the

(07:44):
optimal capacity or the optimal process parameters? Right. And then all these different detailed pieces of technical information.
Well beyond whatever might be in a manual, like, it can have access to everything in real time.
Oh, yeah.
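A minimal sketch of the "name plus specifications" behavior described above, assuming a nearest-prototype image classifier joined to a spec lookup table. The machine labels, embedding values, and spec fields are all hypothetical placeholders, not data from the project.

```python
# Hypothetical sketch: classify a machine image by nearest prototype
# embedding, then join the predicted label against a small spec
# database, so the answer carries the name AND the specifications.
import math

PROTOTYPES = {           # toy class prototype embeddings (invented)
    "CNC mill":    [0.9, 0.1, 0.0],
    "wire EDM":    [0.1, 0.9, 0.1],
    "FDM printer": [0.0, 0.2, 0.9],
}

SPECS = {                # hypothetical specification records
    "CNC mill":    {"model": "example-VM2", "spindle_rpm_max": 8000},
    "wire EDM":    {"model": "example-W350", "wire_dia_mm": 0.25},
    "FDM printer": {"model": "example-F220", "nozzle_temp_max_c": 300},
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def identify_machine(image_embedding):
    """Return the best-matching machine name and its spec record."""
    name = max(PROTOTYPES, key=lambda k: cosine(image_embedding, PROTOTYPES[k]))
    return name, SPECS[name]

name, spec = identify_machine([0.05, 0.15, 0.95])
print(name, spec)  # nearest prototype is "FDM printer"
```

A real system would replace the prototype table with a trained vision encoder and the spec dictionary with the curated machine database Li describes, but the join of "recognized label" to "specification record" is the same idea.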
So thinking about the smartmanufacturing process and introducing
AI into it, is that database, thatdomain knowledge the biggest challenge

(08:09):
with getting AI systems in placein a manufacturing setting?
You know, AI has been booming, right, in our era these days. But a lot of people, and also a lot of, you know, small and medium size companies, startup companies, are starting to work on the applications. Right? But beyond the application, you know, the core of the AI industry,

(08:32):
I believe, will be the AI algorithm, the model you have, right. Either a foundation model or the foundational algorithm, like the transformer algorithm developed by Google, right. That became the foundation for the current large language models. Another big core will be the data: who has the data.

(08:54):
And then they can train on all these different data and knowledge, right, in their specific domain or multiple domains. That will be the most valuable one. Of course, a lot of people right now are working on applications, you know, AI applications for different fields. But if they don't have the data, or they don't have this foundation, you know, the database or the benchmark database, they cannot go further.

(09:17):
Right? Maybe after a couple of years, a lot of these companies will disappear. Right, but only the companies who own the AI model, the foundation model or the algorithm, or the companies who own the data, these two will survive and then move forward. So, same thing for manufacturing, right. So the data collection includes all these, like, user...

(09:41):
You mentioned, right: user manuals. So they can guide the engineers, right, and the new, you know, trainees to learn from this documentation. This documentation includes text, images, videos and more, and a lot of the professional, you know, specific imaging, like SEM, right.

(10:02):
The tensile testing data, all these, you know, TEM data and then scanning data. Right. For medical applications there will be a lot of CT and MRI, you know, databases as well. So for manufacturing, this is super, super critical for us, to build our own data sets and benchmark data sets so we can train our foundation model.

(10:24):
So in that case, whenever you have a new specific application, they can generate either synthetic data, or we collect real-world data. That would be super helpful for us as well.
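One way to picture the data-set building Li mentions, mixing collected and synthetic examples for a manufacturing visual-question-answering benchmark, is a simple record format like the following. The file name and field names are assumptions for illustration, not the project's actual schema.

```python
# Sketch of assembling a domain VQA data set: each record pairs an
# image with a manufacturing question and answer, and notes whether
# the example is real-world or synthetic. Paths/fields are invented.
import json

def make_record(image_path, question, answer, source):
    """Build one training record; source marks real vs. synthetic data."""
    assert source in ("real", "synthetic")
    return {"image": image_path, "question": question,
            "answer": answer, "source": source}

dataset = [
    make_record("images/cnc_01.jpg",
                "What machine is shown, and what is a typical cutting speed?",
                "A CNC milling machine; cutting speed depends on tool and material.",
                "real"),
    make_record("images/fdm_render_01.png",
                "Which 3D printing technology is this?",
                "Fused deposition modeling (FDM).",
                "synthetic"),
]

# JSON Lines is a common on-disk format for fine-tuning data sets.
with open("manufacturing_vqa.jsonl", "w") as f:
    for rec in dataset:
        f.write(json.dumps(rec) + "\n")

print(len(dataset))  # 2 records written
```

Records like these are what a fine-tuning pipeline would feed to a base VLM to add the domain knowledge the generic models lack.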
Thinking about that real world applicationa little bit, do
you see a lot of challenges

(10:44):
in potentially getting this into industry use?
One of the big challenges is data access. Like, some companies, right, they may have confidential information, right? They cannot release all this data, you know, to build your system, like a VLM system. And then some of them, they have concerns.

(11:05):
Trust is an issue, right? I think whether they can trust your system to help them achieve their goal.
So one of the resources used in this project was the National Research Platform, Nautilus. How did you use high performance computing systems in this project?
NRP is a national resource, the National Research Platform.

(11:27):
And it's led by, I believe, you know, UC San Diego, the University of Nebraska, right, and also, you know, UIUC. A lot of these universities built this platform, and then they use a connecting, you know, portal. And then it is free and open to a lot of students, especially at, like, institutions with limited resources.
And then they helped us, you know, to train, because initially when we got the grant, we also got another NASA grant, together like 50K in AWS credits, you know, for us to start working on the AI training. But we realized that this 50,

(12:09):
thousand in AWS credits ran out pretty fast, within three months. And then we would still need, you know, the NSF and NASA grants to pay for it. So it's super expensive; we cannot afford it. And then we just reached out to the San Diego Supercomputer Center, and they said, oh, we have a wonderful Nautilus platform.

(12:29):
And it's open to, you know, the CSU system. You can just set up one namespace, right. And then as faculty, you can work as the administrator, right, for that namespace, and then invite all your students who are working on the project to get access. So yeah, it works really well. And they have access to all the research, you know, the papers and the projects

(12:54):
we have been working on in the past five years. You know, we always use Nautilus. Of course, the NRP Nautilus itself was initially funded by NSF. And NSF has also continuously provided funding, you know, to support the Nautilus platform and also the UC San Diego Supercomputer Center.

(13:16):
They keep updating and upgrading their CPUs, GPUs, right. So with this tremendous, you know, funding support from NSF, we were able to work on this cutting-edge research, especially in AI, because if we purchased a service from Google, Microsoft or Oracle, right, it would easily be 20K per month.

(13:39):
I think without NSF and the funding support, we could not make it happen.
So for my last question, I want to circle back to the MaViLa system and think about the future. Do you see fully autonomous manufacturing becoming the standard?
I don't know how long it will take, because manufacturing is always, like, slow.

(14:00):
They have their own pace, you know, to adopt new technology, the smart manufacturing, like IoT, Industry 4.0, all the IoT sensors. Some advanced manufacturing facilities, right, like the semiconductor industry, they have been adopting them, and, you know, they're pretty quick. But in the meanwhile, a lot of custom manufacturing or traditional manufacturing is still using

(14:23):
manual, you know, machines. And then they are trying to upgrade to automation, but it will take another few years, maybe a decade, before manufacturing becomes autonomous, you know, that scenario. So there will be a lot of challenges, you know, for these laborers as well.
Right?
Of course, you have to upskill the existing employees, the workforce,

(14:47):
and then you have to prepare really well before you adopt autonomous manufacturing. Of course, autonomous manufacturing itself also has a lot of technical challenges right now. We have not achieved real autonomous manufacturing yet; there's still a way to go on the technical, you know, challenges. But in the meanwhile, from a social perspective,

(15:10):
Right. I think how you can prepare these existing workforce employees, upskill them, and in the meanwhile train the next-generation workforce, and then get to the new era of the autonomous manufacturing scenario, there is still a way to go. And then I think all the stakeholders have to work together,

(15:32):
including the governments, industry and academia. Right. And then we hope, you know, that we minimize the impact, especially for humans, right, and then they can eventually... But also, I mean, in the meanwhile, it turns out, so right now, Industry 5.0 emphasizes the human-centric principle,

(15:52):
because with Industry 4.0, a lot of people said, oh, we can just rely on the smart manufacturing and cloud computing, IoT devices, right, to achieve a really good, you know, scenario. But it turns out we still need people in the loop. We still need a human in the loop, human-centric manufacturing, to make it a better manufacturing community.

(16:14):
So, same thing for us, like the VLM, as I mentioned, right? We still need human experts and domain experts to improve the system and also to keep updating our database as well. So we need these people, the next generation of the workforce, right, to make the manufacturing community better.
Special thanks to Bingbing Li.

(16:35):
For the Discovery Files, I'm Nate Pottker. You can watch video versions of these conversations on our YouTube channel by searching @NSFscience. Please subscribe wherever you get podcasts, and if you like our program, share it with a friend and consider leaving a review. Discover how the U.S. National Science Foundation is advancing research at NSF.gov.