March 21, 2022 20 mins

Maarit Widmann is a data scientist at KNIME (www.knime.com).
In this episode, we discuss:

  • What KNIME is
  • Why the visual programming paradigm (workflows, rather than code) is appealing to both technical and non-technical professionals
  • The most challenging aspects of the data science lifecycle, and what auditors should be aware of
  • The most important aspects of the data science lifecycle 


About this podcast
The podcast for performance auditors and internal auditors that use (or want to use) data.
Hosted by Conor McGarrity and Yusuf Moolla.
Produced by Risk Insights (riskinsights.com.au).

Episode Transcript

Available transcripts are automatically generated. Complete accuracy is not guaranteed.
Narrator (00:07):
You're listening to The Assurance Show.
The podcast for performance auditors and internal auditors
that focuses on data and risk.
Your hosts are Conor McGarrity and Yusuf Moolla.

Yusuf (00:20):
Today we're really excited to have Maarit
Widmann on the show.
Maarit is a data scientist at KNIME.
I'll let Maarit explain what KNIME exactly is and what
they do; we've spoken about it on the podcast before.
But Maarit is a data scientist there and looking forward
to an interesting chat.
Maarit, welcome to the show.

Maarit (00:37):
Thank you very much.

Yusuf (00:38):
Okay.
So we'll kick straight into it.
The first thing we wanted to understand is what your
professional background is.
If you're able to give us a little story about where
you've come from academically and what you do now.

Maarit (00:48):
My professional background is my
history at KNIME.
So I came directly from university to KNIME as a
very junior data scientist, and here I have been growing
together with the company and as a data scientist.
So my background in studies is in data science, and then
I also have a bit of sociology and social sciences there,
and that fits the teacher role and the education role

(01:10):
that I have at KNIME now.
Our software that the company develops is based on this
visual programming paradigm.
And the idea is that you don't need to code.
You can do complex and easy and any kind of data science
operations without coding.
You just build your workflows in a very intuitive way, as

(01:32):
blocks of operations that you can see in the user interface.
And this is the tool, and our audiences are then
data scientists, they are business analysts, they
are several kinds of users.
And my role and the role of my colleagues in the education
team is to bring the tool to these different audiences.
It's a lot of writing, it's giving courses, it's teaching.

(01:54):
It is also giving presentations of use cases.

Conor (01:58):
What attracted you to data science in the beginning?

Maarit (02:02):
As a student, data science was very innovative.
And the learning is so huge; in everything that you do
there is always some learning, and that is very rewarding.
And the other thing is the results: they are so concrete.
You build an application for churn prediction, you apply
it somewhere and then you see immediately, okay, this saves

(02:23):
us this and this much money.

Yusuf (02:25):
With the work that you do at KNIME, what is it that
keeps you going and keeps you interested and excited
about the data science field?

Maarit (02:33):
Firstly, I like very much working at KNIME with
people with different skills.
Data science is the common thing that we have:
good people who can develop the software or can use the software,
who come with great ideas about
what the applications could be, and this kind of environment
is something very exciting.
On a personal level it's also the variety in the

(02:54):
tasks that you can do.
So it's a lot of writing.
It's a lot of teaching.
It's also a lot of thinking on your own and
coming up with good logic.

Conor (03:02):
Can you give us a little bit of the history
about KNIME itself as a company - how it began and a
little bit about its journey?

Maarit (03:09):
Actually, at the moment, KNIME and the
size and the reach of the software and the company
is much more than what it was at the time when I
started, for example, at KNIME.
So it has been growing quite fast, and the beginning
was about 15 years ago.
And it was at the university.

(03:30):
So that is maybe where the innovative background comes from that
we still live at KNIME.

Yusuf (03:36):
What would your use of...

Maarit:
I use KNIME daily.
I start from an application and I already have an idea how the
application would look.
I know what kind of data I have, I know what I want
to have as a result, I know what my audience is.
And then the focus is on, okay, what am I going to do? I'm going
to build a visual workflow.

(03:56):
I am going to do something that is easy to
present to the audience.
I'm going to build something that meets the
expectations of the audience.
There is also a lot of this creative part, and looking at
what kind of other solutions there are and how I can make
it maybe easier with the visual workflow, or how I can
communicate it besides the workflow, and there are a lot

(04:19):
of other aspects included than just building the little
pieces in the analysis process.
That is maybe the main way I use KNIME.
But then there is also: what do we want to analyze internally?
If I want to know how popular my blog post was
in the last year, then I do an analysis with KNIME.
So there is also this experimental way of
building the workflows and seeing what comes out.

(04:41):
So lots of time spent with little yellow boxes.

Maarit (04:43):
Exactly, exactly, especially the yellow ones.
These are for the things that you never present,
but t...

(04:49):
filtering rows, filtering columns, making aggregations.

Conor (04:54):
Can you tell us some of the benefits of using a workflow
approach, as opposed to say a more traditional approach
that's heavily based on code?

Maarit (05:03):
The workflow based approach.
Firstly, what I like very much is this communication part:
that you can bring together so many different experts
who are all looking at the same workflow and they just
see, they can understand it.
They understand different things about it, but it's
all relevant to all of them.
If they look at a block of code, there is the data

(05:25):
scientist, there's the coding expert who sees
what is happening in there.
And it's not super useful for the other people who
are interested in the bigger picture of the process.
This is what I like the most.
And the learning curve - it's so easy to learn.
I also have internally some people who use KNIME
in the finance and the marketing departments.

(05:46):
I really like that if they need help with their
workflows, I can go there and say, okay, here is the node.
You need to match these columns, or you need to change here
the string to date and time.
I go there once and I don't need to go there as a coding
expert and say, ask again when you need this another time.
I show them once.
Then they recognize which yellow node it is.

(06:07):
They use the yellow node a second time and it's
already in their toolkit.
They can use it.
The people outside the data science field can learn it
so quickly and intuitively.

Yusuf (06:18):
Excellent.
So those are internal users focused on corporate
administrative functions that are using KNIME
themselves, not just the data scientists or data experts.

Maarit (06:27):
Yeah, definitely.

Conor (06:29):
Over the years, Maarit, you must have worked
on many interesting and fascinating projects.
Is there one particular project that you see
as the most interesting that you've worked on?

Maarit (06:40):
The most recent one that I'm working
on is very interesting.
So I'm writing a bookabout time series analysis
with my colleagues.
This is about codeless time series analysis, and it combines
so many interesting aspects.
The first thing is "codeless". Time series
analysis is super relevant in different industries.

(07:01):
You want to predict sales, or you want to predict disease
spread or something like that.
But how do you do that?
Many algorithms are in Python or are in R, and then you need the
data science expert to build it.
But at KNIME, we have been working on components to
make these algorithms, ARIMA and so on, for time

(07:22):
series analysis, codeless.
These have existed now for a while, and now we
are promoting them more in a book.
This is something that has been a very long journey, but also
a very nice project to work on.

Yusuf (07:35):
Time series analysis can become quite complex when it
comes to forecasting, et cetera.
But one of the things that we often see within the audit,
internal audit and performance audit realm is where we have
data for some years or some periods, but we don't have
data for others, or where there potentially is missing data
for, say, certain days or certain weeks or certain months.

(07:57):
The techniques that you're developing, can you use them to
predict what those numbers might be and then fill in those gaps?

Maarit (08:05):
These real-life problems are something that fascinates me.
In time series analysis you need to know your data.
Sometimes maybe with the machine learning models, you
can just have some bunch of data, you give it to a model
and you get some results.
It's on you to think if they make sense or not, but in
time series analysis, the pre-processing part, the data
access part, is such a big part of the complete analysis that

(08:30):
you need to think, what do I do?
Okay, I'm missing one week of data.
I have missing values, and to make that model
actually work it's
the analyst's task to say, okay, how do I replace
the missing values?
Where do I get more data so that it works?
Or if I don't have data, maybe I change the model.
And there is also where the business analysts or

(08:52):
the people giving you the data come together to make
the whole project or the analysis work the best.
In the book that we are writing and in the time series
materials, we have put a lot of emphasis also on this.
Okay.
How do you start?
What do you see from the data just by data exploration,
and can you already use that for forecasting?

(09:12):
Then at the end, we've built a model, but let's see if the
model is actually bringing anything on top of that,
or if it's just the regular things that we find by
data exploration that do the work already.
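
One way to picture the check Maarit describes, whether a model brings anything on top of the regular patterns found by data exploration, is to compare it with a simple seasonal baseline. The following is only a minimal Python sketch, not something from the conversation or from KNIME: the function names, the seasonal-naive baseline, and the choice of mean absolute error are all assumptions made for illustration.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between two equally long sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def seasonal_naive(history, season_length, horizon):
    """Baseline forecast: repeat the last full season of `history`."""
    last_season = history[-season_length:]
    return [last_season[i % season_length] for i in range(horizon)]


def model_adds_value(actual, model_forecast, history, season_length):
    """True if the model's error is lower than the seasonal-naive baseline's."""
    baseline = seasonal_naive(history, season_length, len(actual))
    model_error = mean_absolute_error(actual, model_forecast)
    baseline_error = mean_absolute_error(actual, baseline)
    return model_error < baseline_error


# Hypothetical quarterly figures: two past years, then one year to forecast.
history = [10, 20, 30, 40, 12, 22, 32, 42]
actual = [13, 23, 33, 43]
model_forecast = [13, 23, 33, 44]

print(model_adds_value(actual, model_forecast, history, season_length=4))
```

If the model cannot beat the repeated last season, the seasonal pattern alone is already doing the work.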

Yusuf (09:23):
So you're talking about things like, if we're missing
one week in a year, we can't necessarily look at the week
before and the week after, because there might be seasonality, which
means we need to bring in the same week from the previous year
and try to adjust it that way.

Maarit (09:36):
Exactly.
And this is where it comes to knowing the data.
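
The gap-filling idea Yusuf describes (take the same week from the previous year and adjust it) can be sketched in a few lines. This is a minimal Python illustration under stated assumptions, not a method given in the conversation: the data layout (a dictionary keyed by year and week), the function name `fill_from_previous_year`, and the adjustment by the ratio of neighbouring weeks are all hypothetical choices.

```python
from statistics import mean


def fill_from_previous_year(series, year, week, window=2):
    """Estimate a missing weekly value from the same week one year earlier.

    `series` maps (year, week) -> value and must contain (year - 1, week).
    The previous year's value is scaled by the ratio of this year's level to
    last year's level, measured over up to `window` weeks on each side of the
    gap that are present in both years. Weeks across a year boundary are
    ignored to keep the sketch short.
    """
    last_year_value = series[(year - 1, week)]
    neighbours = [
        w for w in range(week - window, week + window + 1)
        if w != week and (year, w) in series and (year - 1, w) in series
    ]
    if not neighbours:
        return last_year_value  # nothing to adjust with
    this_level = mean(series[(year, w)] for w in neighbours)
    last_level = mean(series[(year - 1, w)] for w in neighbours)
    if last_level == 0:
        return last_year_value  # avoid dividing by zero
    return last_year_value * this_level / last_level


# Hypothetical data: week 12 has a seasonal peak and is missing in 2021.
weekly = {
    (2020, 10): 100, (2020, 11): 110, (2020, 12): 200,
    (2020, 13): 120, (2020, 14): 100,
    (2021, 10): 110, (2021, 11): 121,
    (2021, 13): 132, (2021, 14): 110,
}
estimate = fill_from_previous_year(weekly, 2021, 12)
print(round(estimate, 2))
```

Averaging the weeks before and after the gap would miss the peak entirely, which is why the sketch starts from last year's value for the same week and only uses the neighbours to rescale it.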

Conor (09:39):
And then the audience for your book, Maarit,
will that be aimed at beginners or intermediate
level practitioners?

Maarit (09:48):
Yes.
Firstly, people who want to learn about time series
analysis; we start there with some theory: what
is seasonality, what is stationarity. Or they can also
be advanced data scientists, just without the knowledge
of time series analysis.
Then for the audience who are going to use time series
analysis in practice, we have kept it on a very basic level.

(10:09):
So you don't need to be a KNIME expert to be able to start
with the workflows in the book.
So if you can open a workflow and build the first bar
chart, it should be enough to open the workflows in the
time series analysis book and say, okay, let's learn this.

Yusuf (10:23):
So you mentioned earlier that when undertaking a time
series analysis exercise, one of the things that
you need to make sure you look at properly is understanding
your data, exploring the data.
More broadly, what do you see as the most important aspect
of the data science lifecycle?

Maarit (10:39):
Based on the experience that I have, and what I just
described, it is this communicating and bringing the people together
to work together on the whole process, as it is
described by the life cycle. Maybe a quick review of
the life cycle, what it is:
it's a diagram of the dynamics in the data analysis process.

(11:02):
So it starts from data access, then goes to data
pre-processing, these yellow nodes.
Then it goes to modeling.
Then it goes also to the deployment, because we don't
build the model just for having a nice model, but we want to
make some use of it. And how these different pieces of
the process follow each other and interact with each other,
it's often not super clear to single people working,

(11:25):
for example, on the modeling or accessing the data, that
these pieces interact.
If I have a super good model, I cannot concentrate
only on tuning it and taking it from 98% accuracy to 99.
If my data is not representative of the population, for
example, maybe I need to go back to the data access

(11:47):
phase to make it actually work in practice. Or if
I have a model that is good
now, maybe I need to check it next month, next year.
Is it still good?
And then go back to the beginning of the process.

Conor (12:01):
So some of our listeners will be starting out on
their data science journey as auditors. Are there any sort
of common pitfalls or common traps, or any tips you might
have for them about areas to focus on in that data lifecycle?

Maarit (12:17):
I would say that the challenges for auditors
are at the beginning of the life cycle. There is maybe not
so much in the modeling part, but there is this data access
and the data pre-processing.
One of the challenges is to get into the life cycle and
let the workflow do things for you, and not maybe say, okay, I

(12:38):
have been doing this manually.
Let me copy-paste this Excel sheet or let me just
write the functions myself.
Or if I need something more complicated, I will ask someone
else and forget about it.
Just to see one's own work as part of this process and say,
okay, I have my yellow nodes.
I let them do the work that I have been doing manually so far.
And if I need something more complicated, I go

(13:01):
to the modeling experts.
I let them help me and I let them complement my workflow.
Then it's still under my control, the workflow,
and I can build on that.

Yusuf (13:11):
So you mentioned various aspects that are really
important in the life cycle.
But based on your experience across several projects, I
imagine, and having worked with many people, especially when
you're training lots of people at various levels of
capability and understanding:
what have you found to be, across all of those
people and across all of your projects, the most
challenging part of delivering a data science project?

Maarit (13:34):
The start is often quite difficult because people
have preferences in terms of what kind of tools they use.
And you need to make them speak the same language. I cannot say,
I have an idea:
how do we make this model 99% accurate?
if the language that the business analyst speaks is, I
want to save five hours of my people's working time.

(13:56):
So to find the common goals, this is maybe
the first challenge.
And we say, okay, how are we going to reach the
goals with this workflow?
And just to have a bit of patience there and see, okay,
you are now using this tool.
I may be using Python.
Now I could do this in two minutes.
But let's build a workflow together.

(14:17):
It takes maybe, instead of five minutes,
15 minutes, and now we have something that is
understandable to everybody.
And it's the common place of communication.

Yusuf (14:28):
So that's interesting.
The actual technical aspects of the project are probably
the easier part, once you decide you're going to
be doing this and then actually going and executing
it, pulling all the boxes together and linking them up.

Maarit (14:37):
That's maybe also related to my role as a
teacher and what kind of tasks I look
for myself, but yeah, that's.

Conor (14:47):
I think your example was perfect.
The example you gave of, we want to reduce
the time spent by five hours.
But somebody may have a different goal and it's
trying to bring those two goals together
so that you're working on the same objective, albeit
from different perspectives.

Maarit (15:03):
Definitely. And that's also how, yeah,
if I write a blog post, if the blog post says this is the
workflow, then there always needs to be a story around it so
that the audience understands, okay, this workflow is actually
exactly the same thing that could also work for my use case.
And then the story needs to be something practical, like fraud

(15:26):
detection, or churn prediction.

Conor (15:28):
Do you get many examples
of users in the real world coming back to KNIME and saying,
oh, we use KNIME for this particular problem, or to help
us with this business issue,
and it even surprises you guys at KNIME?
Oh, we hadn't thought about how the software could be
used for that use case.
Do you see that much?

Maarit (15:47):
Yes.
Because we have quite an active community. We have,
on the KNIME Hub, workflows shared by the
community, and it happens more often than people think that
I look for something:
okay, do we have something about marketing analytics, or do we
have something about some specific technique?
Then I look for it in KNIME and I
see, okay, there is a user active in the community who has

(16:11):
created example workflows.
Maybe these example workflows come from a request on
LinkedIn or on the KNIME forum.
So there is a lot of interaction going on, where
the communication is a workflow, also between the
members in the community.

Yusuf (16:26):
Yeah, that's interesting.
I remember at one of the summits there was an individual that
worked for a fire regulation authority that was using
KNIME to translate documents into multiple languages
so that they could share that with other countries.
And that's not something that you would naturally think about in
your normal day-to-day work.

Maarit (16:43):
Yeah.
Especially these use cases thatI never thought of, I could
build myself as an example.
There are so many interesting things circulating.

Yusuf (16:51):
It definitely goes beyond the "identify cat
faces" example, which is interesting, but it's a bit overdone.
In terms of people out there that would like to learn how
to use KNIME themselves, how would they get started? Like,
if I've never seen it myself before, maybe I've heard of
it, but don't really know it, what's the best place to start?

Maarit (17:09):
If you want to know what KNIME is and what
use cases there are:
we regularly have events and data talks, and maybe
a good starting point is to go
to our website and see when the next event is happening,
so that you get to know the people working here, get to
know the customers and their stories, how they use KNIME.
This is on the very high level, what the company is, but if

(17:31):
you say, I want to use the software, I want to use the
analytics platform right now and start with visual programming,
then in the same place, from our website KNIME.com, you
can land on a learning page.
And depending on the type of learning you prefer,
there are online courses.
You can find me, for example, or one of my colleagues in the
education team, and join a course where we go through the

(17:54):
steps in learning the basics of data science or the basics
of KNIME Analytics Platform.
Or if you say, I can do it myself, there are also
self-paced courses where you just watch videos and complete
the same steps on your own.
And we have webinars, we have books, so a good
starting point for that is the learning page on our website.

(18:18):
The software is open source and, from the web page
KNIME.com, you can find on the first page the download button,
and it's two or three steps to download it onto your machine.

Conor (18:29):
Fantastic tool.
And one of the things, from a non-data scientist, certainly
not even close to being one: the workflow certainly gives
a degree of transparency and understandability around the
analysis, and you can see exactly what's going on in
the logic chain, which is fantastic, I think, and would
be really helpful to auditors.

Yusuf (18:50):
For sure.
Yeah.
We've been using it since 2016 now, and it's so easy.
You create a workflow and I can show it to Conor
and he can review it.
Whereas if I had to give him a thousand lines of code, he'd
point a gun at me very quickly.

Maarit (19:03):
It becomes beautiful
if you see your process that started with the mind map that
is so chaotic on your paper, and then you extract from there the
different pieces and you can put them in the analysis itself,
so that it looks as logical as your thinking is at the end.

Yusuf (19:17):
Maarit, how can people read more about the
work that you do, read some of your blog posts, or
get in touch with you and the rest of the KNIME team?

Maarit (19:24):
When you download KNIME, there is also the
option to register for regular updates regarding
blogs, upcoming courses.
This is one good way of keeping in touch with what is
happening, organized by KNIME, and what is happening with the
software and the people here.
Every time you open the analytics platform, you will
also always see a welcome page.

(19:44):
And there we show you news.
What is the blog post coming out?
What are some new features, for example?
So we will keep you updated if you just keep using the
software, but also we have a Medium blog for low code data
science, and LinkedIn and Twitter; we update there regularly
and the people working at KNIME are very active there.

Conor (20:07):
Hopefully a lot of our listeners will jump on the
KNIME website and get stuck into it.
Maarit, great conversation.
Thanks for your time.

Maarit (20:13):
Thank you very much.
I hope to see more people joining them and hope to teach
you at some point as well.

Narrator (20:19):
If you enjoyed this podcast, please share
with a friend and rate us in your podcast app.
For immediate notification of new episodes, you can subscribe
at assuranceshow.com - the link is in the show notes.