
July 30, 2021 46 mins
New Episode of the LA PIPA & ReMotive Podcast: Data, AI, and Climate Action with Vince Petaccio

At LA PIPA Studios & ReMotive, our team is dedicated to exploring the cutting edge of Data Science & AI, sharing scientific knowledge and technical expertise through insightful conversations with industry leaders.

In this episode of our Data Stand-Up! podcast, Jesús talks to Vince Petaccio, a Data Scientist at Amazon Web Services (AWS), where he develops machine learning tools to identify and mitigate fraud activity. Previously, he worked at Untapt, where he created deep-learning natural language tools designed to match job candidates with fulfilling careers while reducing systemic bias. Before that, Vince focused on real-time neural data analysis in brain and spine surgery across New York City and Philadelphia. He holds degrees in Biomedical Engineering (Neural Engineering) from Drexel University and Computer Science from Georgia Tech.

AI & Data Science for Climate Action

Beyond his career in AI and data science, Vince is a passionate advocate for climate action. Since 2016, he has been actively involved with Citizens' Climate Lobby, working to advance politically neutral climate policies and build momentum for climate action. He also applies his data science expertise to projects that seek innovative solutions to the climate crisis.

In His Own Words:

"I am passionate about leveraging emerging technologies for the benefit of others. I believe that human ingenuity, with a bit of imagination and a lot of cooperation, can yield practical solutions to challenging problems. My time outside of work is spent passionately pursuing nonpartisan solutions to the global challenge of climate change, feeding my insatiable curiosity, and deepening my knowledge."

Thanks for sharing your knowledge, Vince! Join us as we explore the role of AI, data science, and innovation in tackling climate challenges.

Episode Transcript

Available transcripts are automatically generated. Complete accuracy is not guaranteed.
Speaker 1 (00:06):
Bedrock is an independent data science and AI firm specialized in
data-driven business change. In this podcast, our guests help
us share knowledge and experience with our listeners.

Speaker 2 (00:28):
Good morning, Vince. How are you doing today?

Speaker 3 (00:32):
Hey, how's it going? I am doing well, enjoying
the beautiful New York weather. How are you doing?

Speaker 2 (00:37):
I'm fine. Is that ironic, or are you being serious?

Speaker 3 (00:43):
Well, I was being serious, although it's funny to say
that. It's been extremely rainy the last few weeks here,
but that's a bit of an anomaly, and today is our
usual lovely summer weather.

Speaker 2 (00:53):
Glad to hear. Well, for those listening,

Speaker 4 (00:56):
I listened to one of the Super Data Science episodes
and I was fascinated by the work that Vince is
currently doing. I have to say, Vince, you sounded
quite passionate about using emerging technologies and specifically data science
for the benefit of others. I believe that you work

(01:18):
on a very challenging problem these days, which is global
climate change, and that really caught my attention. We'll get
into that later, because you work at AWS, which, as
many of you will know, stands for Amazon Web Services,
and we need to know who you are and how
you ended up where you are. So would you mind

(01:40):
briefly introducing yourself?

Speaker 2 (01:42):
Where are you from?

Speaker 4 (01:43):
Years in the data science industry, educational background, hobbies if you want.

Speaker 3 (01:48):
Absolutely, yeah, I'd be happy to. Yeah. So my name
is Vince Petaccio. I grew up in the United States,
right outside of Philadelphia, in the suburbs there. You know,
for my educational background, I'll start there and kind of go chronologically.
I originally studied biomedical engineering, kind of focusing on

(02:09):
neural engineering. And I would say that in general, the
theme that overarches my educational background and my career history
is sort of working at the intersection of technology and humanity.
You know, where does humanity fit in with the rest
of the world, and how does technology sort of fit
in with that relationship and that negotiation in many ways.

(02:33):
So in my educational background, I worked on building brain-computer interfaces,
and we were working on human trials of a device
that would allow people who had certain neurological disorders that
prevented them from communicating with their loved ones to communicate
and control a computer with their brain waves. So we

(02:55):
record brain activity and then use that to control
a computer. After that, that kind of led into a
career working in brain and spine surgery, where I would
apply real time data analysis on the nervous system and
would interpret that in real time in the operating room
to help surgeons kind of guide their procedures and improve
outcomes and keep patients safe. And that work kind of

(03:19):
led me sort of drove a passion for understanding exactly
how signals are being processed, sort of how we could
understand data and then use that to make decisions. And
what was at the time a hobby, which was learning
Python and digging into artificial intelligence and machine learning and

(03:40):
pop science books became a career, and I went back
to school and did another master's degree, in computer science and machine learning,
and after that transitioned into private industry as a
data scientist. And that was in 2018, so I've
only been in the industry for about three years now.
And at the time I worked at a company called Untapt,

(04:01):
where Jon Krohn, the host of Super Data Science, was
my manager, and we built algorithms that would try
to match resumes for those seeking jobs with job descriptions
in a way that would avoid a variety of different
sorts of systemic bias. And after a while there I

(04:24):
moved over to Amazon Web Services, which is where I
am now as a data scientist, where I try to
understand human behavior in the context of fraud for folks
who are effectively trying to steal compute resources, and try
to understand their behaviors and their motivations and then seek
to mitigate that to protect legitimate customers at AWS. So

(04:47):
I hope that I know it wasn't as brief as
you may have wanted, but that's how I got where
I am today.

Speaker 2 (04:52):
No, but that was very clear.

Speaker 4 (04:54):
So what initially caught your attention, from your
intro, was knowing how to make sense of the
signals that you were capturing with those sensors in the
brain or the body of a person, right?

Speaker 3 (05:11):
Correct. Yeah, absolutely. And that's,
yeah, to your original question, you know, that theme
is sort of echoed in a lot of my work
today. You know, my AWS job is my
nine to five, so to speak. But my five to
nine is working on finding ways to apply these really

(05:34):
powerful tools and techniques to the existential crisis that is
climate change. And so that sort of integration of sensors
and tangible technology with biology and computation is, like
I said, a theme that is echoed in that work
as well. So it seems like very different domains,

(06:00):
but they are surprisingly related.

Speaker 2 (06:03):
No, I kind of agree.

Speaker 4 (06:05):
I completely agree, mainly because many of the data sources
that you may use in one of these projects that
relate to sustainability, I don't know, crops, farms, whatever, it's
about gathering data from those decentralized devices, IoT sensors and
all of that that then go somewhere to be analyzed

(06:26):
through a model, or maybe the model is run on
that decentralized device.

Speaker 2 (06:30):
But there is a huge.

Speaker 4 (06:33):
Room for growth and opportunity, and probably you can tell
us more about it later because you have a few
projects run by yourself. I don't typically ask about
specific data tools, but given your role at AWS
and your experience working with solutions from many other providers,

(06:53):
I guess we may be getting a bit techy discussing,
you know, pros and cons of specific platforms and technology.

Speaker 2 (07:01):
I hope that's fine. Okay, so let's dive in.

Speaker 4 (07:06):
You've told me how you got started in the industry,
meaning the data science and AI practice, and why you
became attracted to it from a neural engineering degree. But which kind
of project did you start with first? Did you start
with more data engineering tasks or did you suddenly start

(07:28):
to get immersed in things related to building and deploying
deep machine learning models and algorithms?

Speaker 2 (07:36):
How did you get started in the field?

Speaker 3 (07:38):
Yeah, So my foray into this industry started with my education. So,
like I mentioned earlier, I went back to school and
did another master's degree in computer science. And while I
was doing that, I was still working in the operating room.
And my thinking at the time was, if I really

(07:59):
want to make this a career and not just a hobby,
then I need to fully immerse my brain in this
sort of information as much as I possibly can. So
in addition to doing the master's, which was very theoretical,
as education from a formal educational institution tends to be,
I also was doing some sort of like job training

(08:24):
oriented online courses as well, and so from that I
had sort of an exposure to practical, modern trendy applications
and also to sort of theoretical classical methods. And because

(08:45):
of that, when I sort of started in the industry,
my thinking was not necessarily on how can I, you know,
build the biggest model or have the most parameters possible
to solve this problem. It was more driven by the
theoretical foundations that I had learned in my educational background,
which was, how can I distill this problem into a

(09:07):
correctly framed machine learning problem and use the simplest approach
that I can think of that is also effective. So
I think a lot of my work has sort of
boiled down to rather than throwing the fanciest or most
in vogue model at a problem, it has been more
about solving the problem in the simplest way that I can,

(09:29):
which is ultimately the more efficient and also leads to
more explainability in many cases. So with that said, the
specific problems that I was first solving in industry were
natural language problems, you know. Like I mentioned at the top,
we were working on trying to match employers who were
trying to fill vacancies with folks who are trying to

(09:50):
find fulfilling work. And in that situation, you're
trying to look at this hyper compressed representation of a
person's entire professional history, by which I mean their CV
or their resume, and then you're looking at this similarly
hyper compressed representation of some piece of work at some organization,

(10:13):
which can be a few paragraphs describing, you know, where
you'll spend half of your life while you're employed there,
and trying to both explode that representation out to capture
more information that's sort of latently represented by the language there,
and then determine how much or how well those two
things work with each other and how compatible they are.

(10:36):
So that sort of led me down a path of
exploring a lot of natural language work and trying
to find new ways to apply classical methods to novel
natural language problems. Yeah, I hope that answered your question.
But basically, you know, I was working more on things
like classical dimensionality reduction techniques before I delved into deep learning,

(11:00):
which is where it eventually led.
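
[Editor's sketch, not from the episode: the resume-to-job matching described above can be illustrated with a classical bag-of-words baseline. The snippet below is a minimal example assuming plain TF-IDF weighting and cosine similarity; it is not the method used at Untapt, and the function names and sample texts are hypothetical. A dimensionality reduction step, such as a truncated SVD over these vectors, would come on top of this and is not shown.]

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace, stripping basic punctuation."""
    words = (w.strip(".,;:()").lower() for w in text.split())
    return [w for w in words if w]


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (a dict of term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF, so a term present in every document keeps a nonzero weight.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in doc_freq.items()}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append({t: (c / len(toks)) * idf[t] for t, c in counts.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_jobs(resume, jobs):
    """Return job indices ordered from most to least similar to the resume."""
    vecs = tfidf_vectors([resume] + jobs)
    scores = [cosine(vecs[0], v) for v in vecs[1:]]
    return sorted(range(len(jobs)), key=lambda i: scores[i], reverse=True)


# Hypothetical sample texts, for illustration only.
resume = "Python machine learning engineer with natural language processing experience"
jobs = [
    "Pastry chef for a busy bakery",
    "Machine learning engineer working on natural language processing",
]
ranking = rank_jobs(resume, jobs)
```

[A baseline like this only sees exact word overlap, which is one reason the "exploding out" of latent meaning described above needs richer representations.]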

Speaker 2 (11:03):
If I may ask, and I know you may be a bit
biased because of where you are now, with AWS, and I
know that technologies shift and move very quickly: which
infrastructure providers were you working with back then? Because I
know there are now conversations around which cloud platform, and

(11:24):
AWS losing ground against Microsoft Azure.

Speaker 4 (11:27):
But how did you get started? Which infrastructure did you start
to use? And thoughts on it, and learnings from it?

Speaker 3 (11:37):
Yeah. So the first cloud infrastructure that I worked with
was GCP Google Cloud Platform, and I really enjoyed working
with it. You know, there are a lot of different
tools available there. They have a fantastic free tier which
persists indefinitely, so if you have a small scale project,
you can run it for free forever. Although if you continue

(12:00):
to grow, then Google is happy to continue to provide
services for a fee, which I think is kind of the
idea there. I think it's a great way of running it,
and for that reason I actually do, for some of
my personal work, still use GCP for that Always Free tier.
I like some of the products they have as well,
some of the ways that they're merging scale to zero

(12:23):
computing with containerization, with things like Google Cloud Run where
you can you know, you can think of it like
an AWS Lambda function that has no time limit and
runs your custom Docker container and then immediately scales to
zero when it's done. So by contrast, at AWS there's

(12:43):
no competing product to that. We have Lambda functions, which
run for up to fifteen minutes, and only late last
year did they start supporting custom Docker containers, and
then there's sort of nothing in between that and like
a custom you know, Kubernetes cluster that you have to
manually configure to scale up and scale down. So I

(13:06):
really do appreciate that about Google, and you know, I
think ultimately there's no in my opinion, better platform per se.
It's just a matter of you know, which one fits
your needs exactly.

Speaker 2 (13:19):
Yeah, that's right.

Speaker 4 (13:20):
You know, there are times where everyone is showing some
hype about, I don't know, Kubernetes, and then everything needs
to be deployed there, and then it isn't the thing,
and then it's moved.

Speaker 2 (13:34):
People are trying to build something differently.

Speaker 4 (13:37):
But I guess it's about having a vision and, as
you were saying, the tech stack that each one
of these solutions provides, and then adapting that or making
sure that it fits what you need for the product,
because at the end of the day, everything that we build,
develop and implement has to be managed

(13:58):
and understood as a product. If that fits with the
technological stack, I guess that's the right move.

Speaker 2 (14:05):
And in comparison to this or in alignment with this, I.

Speaker 4 (14:09):
Know, I know that there's been a lot of I
wouldn't say turmoil, but basswords being used, and one of
these ones is big data.

Speaker 2 (14:19):
What does big data into you in which context?

Speaker 4 (14:23):
And did you use for work with any technologies I
don't know, maybe a Spark, CAFCA or hadoop for this
for the purpose of, you know, managing big data projects.

Speaker 3 (14:35):
Yeah, yeah, it's an interesting question. I think, as you say,
big data is one of those terms that gets abused
quite a bit, depending on you know, whether it's a
marketing person using it or a data engineer using it. In
my mind, the definition is clearly context dependent, and
I'm not aware of any specific, you know, number of

(14:57):
petabytes or terabytes or gigabytes of data required for it
to be big. But I can tell you that I have
personally worked with, you know, petabyte-scale data sets. With
that said, I have never actually used any of the
tools like Spark or Kafka or Hadoop that are explicitly
designed for working with those sets of data. And I

(15:19):
think the reason is that I'm going to go a
little off the rails here for a second, but I
promise I'll bring it back. One of my hobbies is
making electronic music, and again I promise it'll come back
on topic. Electronic music is fascinating to me because there
is no right way to make electronic music. And in

(15:42):
contrast to electronic music, with something like playing the drums, as
I did as a child, there's a right technique, and
the technical process is sort of in service to the
musical process. But with electronic music, your workflow and your
set of tools defines the creative process itself and therefore

(16:04):
is an additional dimension of expression that is unlocked by
the fluid nature of the tools. And so I sort
of approach data science the same way. In my mind.
There are better ways of doing things and worse ways,
but there's not necessarily a best way of doing something, and

(16:24):
so the tools that you choose to use for something
in the context of the product you're working on, can
sort of combine to create an optimal solution or creative
solution to help you solve a problem. And so that's
sort of how I think about these things. It's not
so much a one to one mapping of data set

(16:45):
size to tooling, and so yeah, I've never worked with
these particular tools simply because my personal workflow has
never made the investment of time and energy to
learn to use them necessary or worthwhile.

Speaker 4 (17:01):
Yeah, I mean, I was simply curious because
when I listened to your conversation with
Jon and we initially discussed the script for having
a call like this, I was, you know, thinking of
sensors that could be gathering data from, I don't know,

(17:21):
the ground for crops, or from a brain, okay, and
I would be imagining that they
would be capturing data in real time, or live, in
a huge volume of information. That's why I initially thought, okay,
maybe for one of these projects, they or you used

(17:45):
one of these tools and, you know, streamlined or
streamed data in that sense.

Speaker 3 (17:53):
Yeah, yeah, I have not, although yeah, it sounds like
a fantastic application for it. Yeah, for sure.

Speaker 4 (18:01):
And I know you've described very briefly your role at
AWS. I didn't get that part about someone trying
to capture more resources than he or she should.

Speaker 2 (18:17):
Can you go into it in a bit more detail, please? Yeah.

Speaker 3 (18:21):
Absolutely.

Speaker 2 (18:21):
So.

Speaker 3 (18:22):
You know, fraud exists on you know, pretty much any
platform that has a significant amount of usage on the Internet.
But basically what happens on cloud computing is you have
this resource which is time on a compute device, and

(18:42):
there will be folks who will try to basically steal
that time and not pay for it. And so you
can imagine somebody seeing the price of bitcoin and saying, Wow,
I would love to mine some bitcoin, but I either
don't have the capital to invest in mining equipment, or
I have a lot of skills using the Internet and

(19:02):
cloud computing, and they can devise ways of using cloud
compute infrastructure to mine cryptocurrency without paying for it. And
so those are the types of things where we're trying
to stop them from doing that because you can imagine
somebody could, you know, use all available resources and not

(19:22):
leave any leftover for legitimate users who are trying to
run their businesses or fight climate change, or work on
their PhD projects stuff like that. So that's kind of
what we fight, is theft of the resources and time.

Speaker 2 (19:37):
I was going to say, how can you do that?

Speaker 4 (19:39):
But the thing is, you cannot respond to that
question if, you know, you don't want people to
learn how to do this. But I'm guessing that people
just create accounts and they try to, I don't know,
deploy a model in a certain way so that it consumes a
lot of computing power, or how does it work?

Speaker 3 (19:58):
Yeah, I don't want to go into how they
do it too much here and give folks any ideas,
But what I'll say is that by far, the most
skilled and creative users of AWS that I've encountered are
the people who are finding really creative ways to exploit

(20:18):
our services to make illegitimate money. And sometimes they aren't
even using us as the end product. They're using our
systems to provide some free service to themselves by getting
very creative with what our systems do, you know, like
just off the top of my head, like figuring out

(20:39):
if a stolen list, or a list of email addresses that
they've scraped are valid by you know, automating an account
sign up process and then looking to see if they
receive an email to verify their account something like that,
where they don't even need the AWS account, they're just
using our systems as a way to test whether email
credentials are valid. Things like that. Yeah, they get very,

(21:03):
very creative with exploiting our systems for their gain. And
you know, one of the things that makes it particularly
challenging is that this is a global service. It's available
in, you know, most places on Earth, and that creates
some interesting economic incentives where folks who might live somewhere

(21:24):
where there's a very low cost of living, the incentive
structure is very different for them, you know, them making
a couple of dollars a day, spending all day managing
or deploying or building some fraudulent script. It might be
worth it for them to do it there for a
few dollars a day, where for us here in the United States,

(21:46):
it just would not be economically feasible, you know, to
do that. So there are a lot of those sorts
of issues that we deal with as well.

Speaker 4 (21:53):
Okay, how does this all relate to the work that
you do in your five to nine? I mean,
taking into account the degree
you come from, I know that you then did a master's,
and you completed a master's in computer science and

(22:14):
all of that. When did you decide to put that
knowledge of, you know, advanced analytics techniques to work
in regards to sustainability and climate change? And why do
you think data can be a good asset to fight
climate change?

Speaker 3 (22:35):
Yeah, that's a great question. You know, like I said,
I'm still fairly new to this world. And so I
view my forays in private industry that is not immediately
related to fighting climate change as additional education to learn
the skills and tools. And then once it's five
and I switch from my nine to five to my

(22:57):
five to nine, that's confusing, I am able to immediately
apply those new learnings every single day to the work
that I do. So, for example, you know, trying to
understand how humans interact with one another and understand their
behavior in interacting with the technology platform is really helpful
for helping me think about ways to design experiments for

(23:21):
vertical farming. And you know, and I know probably we'll
get to this a little bit later, but briefly, most
of my work today, you know, hands on keyboard dealing
with climate change is in the vertical farming space. My
basement here in New York City is full of tubes
and hoses and wires and LEDs and you know, hydroponic

(23:44):
and aeroponic plant setups. I'm developing a platform that optimizes
the experimentation process for building machine learning models to control
agricultural devices in vertical farms. So the overlap there is
basically learning techniques and you know, staying on top of

(24:05):
research that's used in industry that will give me a
tool set that can be leveraged in this fight. And
so yeah, that's sort of how that overlaps,
and how I got interested in applying this in the
first place is actually my career change was motivated by
a deep concern about climate change. You know, I, like

(24:29):
I mentioned earlier, I was working in surgery, and in
my downtime between procedures, I would be reading these sort
of pop science books about AI and machine learning. And
the climate and the natural world and animal life and ecosystems
have always been something deeply important to me throughout my
whole life. And immediately as I was learning these things,

(24:50):
I started thinking, you know, this is like a general
purpose tool that can be applied to any well framed problem,
and I want to learn all about this until
I get to the point where I'm able to figure
out how to frame this existential crisis correctly, to make

(25:12):
it so that we can use this general purpose tool
to start chipping away at the problems themselves. And so, you know,
for me, the second part of your question was
how can data be a good asset to fight climate change?
In my mind, we're not going to write like a

(25:34):
deep learning model that will suck carbon out of the air.
We're not gonna, you know, write a transformer that can
you know, make the electrical grid clean instantly. It will
come from using these tools and technologies in order to
accelerate progress that subject matter experts are making in their

(25:59):
own fields. So, you know, data scientists and computer scientists.
One of the things that we often deal with in
our industry is this idea of a certain sort
of exceptionalism. And I don't mean that in an insulting
way to our own industry, but it's easy to think, hey, like,
we have these really powerful tools, let's just shove a

(26:19):
bunch of data into them. Then we'll be able to
solve problems that people have been grappling with for decades
or centuries. But often when you speak to the actual
experts in those fields, we realize that we're missing a
lot of context and nuance. And so I think, really
the power doesn't come from the data and the models.
It comes from the partnership between those experts in those

(26:40):
fields and our tools and the data. So that combination,
I think is more than the sum of its parts
and allows us to compound progress as a civilization to
the point where we can really accelerate our case the

(27:00):
beds so that we can solve problems faster.

Speaker 2 (27:03):
That's been really well explained.

Speaker 4 (27:05):
I guess that the hardest part here is to make
sure that you link that knowledge that comes from the
business domain, which in this case is climate, which encompasses a
lot of areas. You know, farms, crops and all of that
is one of them.

Speaker 2 (27:22):
But there are so many.

Speaker 4 (27:26):
It may be related to emissions, but it may be related to
many, many fields. And it's about having that specific
knowledge and linking it to the data science techniques that
you can put to use, and then data becomes a
good asset, but it isn't a good asset by itself.

Speaker 2 (27:41):
I agree.

Speaker 4 (27:42):
I know you do vertical farming by yourself, but I'd
like to know how many projects you are currently working
on where you can utilize your data science skills or analytical
skills that relate to sustainability.

Speaker 2 (28:00):
Tell me a case here.

Speaker 3 (28:02):
Yeah, so I'm happy to talk about the ones that
I personally work on, and I can, you know, just
briefly go over a list of some of the others.
You know, I'm always kind of exploring additional projects, and
as I'm sure you know, like, everybody's time is finite, and
so there's always a bit of an optimization process there
in terms of making the most impact with the finite
amount of time. But the two avenues I'm working on

(28:23):
now are, like I mentioned earlier, developing a platform to
allow us to sort of accelerate the process of building
and designing experimentation for vertical farming. So basically, you know,
how can we rapidly train models to control vertical farming

(28:44):
technology so that we can produce better food with fewer resources.
How can we grow new types of food without starting
the whole process all over again, Like how can we
transfer some of the learning from our previous work to
new species and new types of crop. So that's sort
of the main focus there. In addition, I also work

(29:04):
with a group here in the United States,
which also does some work abroad but is mostly in the United
States, called Citizens' Climate Lobby. And we organize politically to
advance non partisan, sort of politically neutral climate policy that's
sort of economically driven. And so there's some work there

(29:27):
that involves using natural language processing to sort of drive
our efforts and maximize engagement of our members, our volunteers,
and to basically be most efficient with our volunteer resources
to increase the amount of impact we can have. And
that's an example where like I didn't, I certainly did

(29:50):
not found Citizens' Climate Lobby. It's been around for many,
many years. I'm simply trying to find ways to help
optimize the process there, using data science and the tools
I learned at Untapt doing natural language processing to
catalyze faster progress there so that we can make a
bigger impact using those sort of political activism techniques that

(30:15):
I've learned from the subject matter experts. So that's sort
of where my time and energy is focused. There is,
like you said, a huge list of applications, I mean,
everything from mitigation of climate change, like optimizing electricity systems,
redesigning buildings and cities, optimizing industry, forestry, and land use,

(30:39):
all the way to adaptation, you know, looking at
things like biodiversity and solar geoengineering, even things like climate
modeling and analysis. I think though, my favorite project, one
that I'm really excited about, is one called the Earth
Species Project. Have you heard of this one?

Speaker 2 (30:58):
Nope?

Speaker 3 (31:00):
Okay, so this one, it's founded by Aza Raskin, who is
from the Center for Humane Technology. He invented infinite scrolling,
when you go down a timeline or something. And Britt
Selvitelle, who was on the original founding team of Twitter.
And this project is basically looking to... The way I

(31:20):
think about this is to use word embeddings to translate
animal sounds into human language. So basically trying to build
a sort of AI-powered Rosetta Stone that crosses species boundaries.
And the goal here is not just to be able

(31:42):
to talk to your dog, it's to be able to,
in a certain sense, make humans care a little more
about non human animals. The first one that they're focusing on,
I believe is the sperm whale and trying to translate
their songs into something that we can understand. There have
been some recent studies about how dolphins name each other,

(32:04):
and so we know that they have some concept
of language that they use. But the goal here is
to give humans a connection, a deeply personal connection with
some of the non human life on Earth. And that's
one of the things that stresses me out a lot
or gives me anxiety, is the way that we as

(32:25):
humans have a certain tendency to view the natural world
as a resource for extraction rather than a system of
which we are one component and which is highly interdependent.
So if we can sort of humanize other animals by

(32:45):
giving them a voice, a literal voice, then perhaps, maybe
and hopefully it will allow us to kind of think
differently about how we interact with the natural world beyond
humanity and allow us to make broad social and behavioral
changes that will kind of bring us back from the brink.
So I'm really excited about that project. That's just one

(33:08):
of many.

Speaker 4 (33:10):
I mean, it sounds really exciting, and your reasoning makes
complete sense. And now I realize that the kind of
projects that I was discussing with my colleagues here seem
easier compared to that one. I mean, using data to
make sure that we understand how animals communicate, what they feel

(33:30):
and think, that is much harder than, I don't know,
trying to predict if it's going to be sunny and
then, based on that, knowing how much energy you can
produce so that you make sure that it gets
fed back to the network, you know, which is
one of the projects that we are working on ourselves. Okay,

(33:52):
I know that in some of these projects, using sensors
to gather data, you know, what we've mentioned before, IoT
technologies, Internet of Things technologies,

Speaker 2 (34:04):
is one of the key areas.

Speaker 4 (34:06):
I don't know if you've encountered any challenges or, in
your point of view, what's the hardest thing when it
comes to bringing in data from IoT technologies, in any
sort of data format or way that you think is possible.

Speaker 3 (34:20):
Yeah, you know, I work a lot with sort of
IoT devices, Raspberry Pis and Arduinos, down in my basement.
In my experience, the biggest challenge is more on the
hardware front: getting reliable data out of the hardware
and validating that the information you're getting is correct.

(34:44):
And that's sort of where a lot of my work
on the data side has come in, is monitoring data
quality in some real time way and ensuring that I'm
getting reliable, accurate data over time. And so that has
sort of started. My initial traipsing into that started with

(35:07):
basically just building costly redundant systems with sort of reference
systems that I could compare them to, and then using
that to sort of gather data and understand sort of
the distribution of values that would come out of a
sensor, like learning what a sensor failure looks like
from a data perspective, and then building some, like, real

(35:30):
simple, like, heuristic or simple regression models to understand data quality.
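
As a rough illustration of the kind of simple heuristic described here, the sketch below flags suspect readings from a sensor by comparing it against a redundant reference sensor. The function name, thresholds, and pH values are all hypothetical assumptions for illustration and are not taken from the speaker's own setup.

```python
def flag_suspect_readings(readings, reference, max_abs_diff=0.3,
                          valid_range=(0.0, 14.0), stuck_window=5):
    """Return one list of reason strings per reading (empty list = looks OK).

    Hypothetical heuristic checks, assuming a pH probe paired with a
    reference probe sampled at the same times:
      - out_of_range: value is outside the physically plausible range
      - disagrees_with_reference: value differs too much from the reference
      - stuck: the last `stuck_window` readings are all identical
    """
    low, high = valid_range
    flags = []
    for i, (value, ref) in enumerate(zip(readings, reference)):
        reasons = []
        if not (low <= value <= high):
            reasons.append("out_of_range")
        if abs(value - ref) > max_abs_diff:
            reasons.append("disagrees_with_reference")
        # Look back over the most recent readings, including this one.
        window = readings[max(0, i - stuck_window + 1): i + 1]
        if len(window) == stuck_window and max(window) == min(window):
            reasons.append("stuck")
        flags.append(reasons)
    return flags


# Example: one spike at index 3, then a probe that stops changing.
readings = [6.1, 6.2, 6.2, 14.9, 6.0, 6.0, 6.0, 6.0, 6.0]
reference = [6.0, 6.1, 6.2, 6.1, 6.0, 6.1, 6.0, 6.1, 6.0]
flags = flag_suspect_readings(readings, reference)
```

Thresholds like these would normally be tuned from the observed distribution of values for each sensor, which is the role the redundant reference systems play in the description above.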
The other thing that was a surprising challenge for me
in this domain is dimensionality. You know, you get this
enormous volume of data, and like, what if I just
want to log onto my dashboard and like, look at

(35:53):
what the pH of my nutrient reservoir has been over
the last twenty four hours. You know, if I have
a data point for every you know, six seconds on
pH because I'm constantly polling that, do I really want
to wait around for you know, thousands or tens of
thousands of data points to like be pulled from an
API and then plot it on like a little tiny

(36:14):
chart on my dashboard, or do I need to, like,
maintain some lower dimensional representation of that in a separate
file that updates regularly, or you know, do I do
that on the fly in the front end? So those are the
sort of questions that I've been grappling with with IoT
devices: taming the huge amount of data to

(36:34):
make it actually useful for human consumption, so that you
can actually drive decision making.
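
One way to picture the lower-resolution representation mentioned above is to average readings into fixed time buckets before plotting. The sketch below is a minimal example under assumed conditions: the function name, the five-minute bucket size, and the (timestamp, value) data layout are hypothetical, not the speaker's actual design.

```python
def downsample_mean(points, bucket_seconds):
    """Average (timestamp_seconds, value) pairs into fixed-width time buckets.

    Returns a list of (bucket_start_seconds, mean_value), sorted by time.
    """
    buckets = {}
    for ts, value in points:
        start = ts - (ts % bucket_seconds)  # start of the bucket this point falls in
        total, count = buckets.get(start, (0.0, 0))
        buckets[start] = (total + value, count + 1)
    return [(start, total / count)
            for start, (total, count) in sorted(buckets.items())]


# One pH reading every 6 seconds for 24 hours is 14,400 points.
raw = [(t, 6.0) for t in range(0, 24 * 60 * 60, 6)]
# Five-minute buckets reduce that to 288 points for a small dashboard chart.
summary = downsample_mean(raw, bucket_seconds=300)
```

Whether this runs in a scheduled job that writes a separate summary file or on the fly in the front end is the trade-off raised in the conversation; the aggregation itself is the same either way.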

Speaker 4 (36:40):
I see, and that's one of the hardest I believe. Okay,
you have.

Speaker 2 (36:46):
To be honest here.

Speaker 4 (36:47):
Have you considered launching your own business or doing freelancing
only related to projects that are linked to, I don't know,
sustainability, renewable production of energy, sustainable cropping, any of
these business models where, you know, I think there is

(37:08):
a huge room for growth and I think there's going
to be more in the upcoming years.

Speaker 2 (37:12):
Have you considered that seriously?

Speaker 3 (37:15):
You know, I have, and I think that that is
in my long term roadmap. Like I said, right now,
I'm at AWS and I'm learning a lot and have
no immediate plans of changing that because of the value
that I get from learning accelerates my own research into this.
But it is my hope that eventually this will go

(37:37):
beyond my basement and have a much broader impact. And frankly,
whether that happens through me starting my own firm or
joining another firm, or just even you know, contributing original research.
The details of how it's implemented are immaterial to me.
What really matters is, like, the impact, you know,

(38:00):
ensuring that the largest impact possible comes from the work.
With that said, yeah, I'm definitely interested in entrepreneurship more broadly,
and I do have other projects related to all
these that I'm always kind of working on as well.
So, you know, I hope I didn't weasel my way
out of that one too much.

Speaker 2 (38:19):
But that's fine.

Speaker 4 (38:21):
I mean, you've been a little bit political because you
have to be, but [inaudible]. I'm fine. Last question,
and I guess a complex one. What do you think
our industry, as in, you know, data science, advanced analytics, and
all of that, is lacking? I feel that there's
a sense of reality missing here because there is a

(38:43):
gap between what's technically possible. You know, there's a lot
of deep data tech advancements, a lot of papers being
published, maybe related to deep learning modeling, whatever it is,
and then what the business needs. And something that you
were mentioning before is that now that we have a hammer,
which is data science, we try to apply that to

(39:03):
everything, when sometimes the answer to a challenge is much
simpler than that. Right, if you can do something
without data science, do it without data science, and it
may be an easier solution to the business problem. But what
do you think this industry is lacking right now?

Speaker 3 (39:22):
I think it's a great question, and I think you
make an excellent point that sometimes like data science isn't
the answer. And I think that in general, and this
is my opinion and nothing more than that, a bridge
between other domains and our domain would be hugely advantageous

(39:43):
for both of us. It would
allow us to compound learnings from each other, so it
would help us to actually drive impact and take our
work beyond toy models and proofs of concept to actually
making a real difference in the world. It would also
allow us to understand sort of the nature of the
complexity of the real world better. You know, like, I

(40:07):
can train as many models as I want on MNIST
handwritten digit data sets, but until I'm in the real
world and dealing with the true complexity of like what
faces look like from people of different skin colors, like,
I'm going to have a hard time making models that
have genuine social value. And I think that bridge there,

(40:28):
connecting people who are experts in what we do with
experts in other fields and allowing for the sharing of
knowledge to compound the impact of what we do is
a challenging problem to solve, and almost like a discipline
in and of itself, but something that will I think

(40:50):
be critical for scaling the impact of AI and machine
learning moving forward.

Speaker 2 (40:55):
Great answer.

Speaker 4 (40:56):
Actually, this is a feeling that we share ourselves as
a consultancy. Sometimes we encounter business challenges where we need
to learn from the business model of our clients. We
need to mix that and their knowledge and their inheritance,
you know, in terms of knowledge again with what can

(41:17):
be done using data, whether that's simple statistics or data science. Right,
And we do have a few service design processes
that allow this to happen, that allow us to take
into consideration everyone's opinion and everyone's point of view, just
to make sure that whatever is designed and prototyped as

(41:40):
a data product, right, with that product layer and that
product management understanding, is shared and created by the knowledge
of both sides, of the business domain experts and the
data science experts. I guess that the answer to your concern,
which we do share, is using design thinking as

(42:03):
one of the techniques when it comes to
ideating these solutions to business problems, but other service design
methodologies may work too.

Speaker 2 (42:16):
Yeah, that's how I feel about it.

Speaker 4 (42:18):
Anyhow, I know we're getting close to the hour
and it's been a wonderful conversation. But before we close
things off, I'd like to say I appreciate the time
that we've spent here. But also I'll ask for one
book recommendation and one name of a future

Speaker 2 (42:36):
guest that we should be bringing to Data Stand-Up!
by Vetro.

Speaker 3 (42:41):
Yeah. Absolutely so. Yeah. First of all, let me just
say thanks for having me. This has been an absolute pleasure,
and I really enjoyed the conversation and I'm really glad
to hear that you all are working on the actual
problem of connecting domain experts with data science experts. So awesome.
My book recommendation, it's a bit of an optimistic read.
I will say, it's sort of a vision document for

(43:04):
a particular subset of solutions that I am passionate about personally.
It's called The Vertical Farm: Feeding the World in the
21st Century by Dr. Dickson Despommier. And this is
a book that's sort of a high level overview
of vertical farming and the power it can have and

(43:25):
the impact it can have on sort of transforming the
world and decoupling land use from agriculture and feeding humans,
and it goes beyond that a little bit and talks
about like designing better cities and our interactions with ecosystem
and the natural world more broadly. So definitely would recommend
that one.

Speaker 2 (43:45):
So interesting.

Speaker 4 (43:46):
Yeah, before you give me the guest, I have to
say that there is one restaurant in Madrid. I probably
shouldn't give the name while we record this, but I
will share it with you through direct message or something.

Speaker 2 (43:57):
They do vertical farming.

Speaker 4 (43:59):
And actually it's one of the most famous restaurants in
Spain because everything that you taste and that you have there,
it tastes wonderful.

Speaker 2 (44:07):
I mean it's really flavory, if that makes sense.

Speaker 4 (44:10):
You can even have a tomato and eat it like
it was an apple, and it's amazing.

Speaker 2 (44:14):
And it's vertical farming. So I have to
share this restaurant with you in case you ever visit
Spain.

Speaker 3 (44:18):
Amazing.

Speaker 2 (44:20):
Yeah, now I have to.

Speaker 3 (44:23):
Yeah, I would say that for a recommendation of a
next guest, it would be the fantastic Priya Donti. She
is a PhD student at Carnegie Mellon and she is
the second author of the seminal paper Tackling Climate Change
with Machine Learning that you may have seen come out
recently with a lot of big names, and it's sort

(44:44):
of the gold standard of addressing climate change with
machine learning. Wow. Yeah. And she's also one of the
co-founders of Climate Change AI, which is a fantastic
resource for any of your listeners who would love to
learn more and find ways to get involved. Their website is
just climatechange.ai. And if you're interested in

(45:06):
learning more about the intersection between data science and climate change,
she would be a fantastic person to speak to.

Speaker 2 (45:12):
Yep. Okay, awesome. That is a huge intersection.

Speaker 4 (45:15):
So I'm not sure a podcast will do, but we will try.

Speaker 3 (45:20):
Absolutely.

Speaker 4 (45:22):
Thank you again. We [inaudible], we appreciate
you taking the time sharing, you know, this knowledge about
data science, and, yeah, in regards to climate change.

Speaker 2 (45:33):
You're telling us.

Speaker 4 (45:34):
about your work at AWS, very briefly but very insightful,
and we hope to speak soon.

Speaker 3 (45:41):
Absolutely, and thank you again for giving time to
this really critical issue.

Speaker 2 (45:46):
Thank you, Vince. Take care.