The Future of Data Transformation: Inside the Development of Babel - EMx 253 - Elixir Mix

Episode Transcript
Available transcripts are automatically generated. Complete accuracy is not guaranteed.
Speaker 1 (00:06):
Hello, everybody, welcome back to another episode of Elixir Mix.

Speaker 2 (00:09):
I am leading the discussion today.

Speaker 1 (00:10):
This is Allen Wyma, and today on the panel we
have Adi Iyengar, and we have a special guest returning
after a long, I don't know, overdue time to return.

Speaker 2 (00:20):
I'm not sure how to say it. We have Alex Wolf. Alex,
how are you? Hey, I'm great. Thank you for asking. Yes,
I mean, some people might recognize the voice, right?
I've been on Elixir Mix as a panelist for, honestly,
I don't know how long, quite a while, but I
kind of went on a hiatus at the
end of last year for personal reasons I'm not going
to go into in depth. But yeah, today I'm just
here as a guest, and that will also stay that way.

(00:42):
I currently have no plans to return as a panelist,
but it's nice to be here again. Thank you, Allen. Yeah,
you're sadly missed. We always miss having your point of view,
so you're definitely welcome back whenever you would like to
come back. Also, for the people who have been missing
my swearing.

Speaker 1 (00:56):
Yeah, it's become more of a PG-13 show rather
than an R-rated show. Potential left, I would say, but
still good.

Speaker 2 (01:03):
Yeah.

Speaker 1 (01:03):
So now that you're back, I guess we can
get onto the main topic, right? You've come to
talk about this new library you're working on, right? Yes.

Speaker 2 (01:11):
I mean, there is a bunch of name confusion
around it, which I didn't realize when I published it,
but it makes sense. The library is called Babel, and
it has absolutely nothing to do with the JavaScript library,
the JavaScript transpiler, or the product of the same name.
It's a data transformation library. And we've been using
a pre-release version of that at work for over

(01:34):
a year now, and I just recently took the time
to kind of polish it up and package it into
something that is nice and presentable. I'm still working on
it in my free time to make it
a bit more approachable, improve documentation and shit like that,
set up a Livebook example, things like these. But
it is on 1.0 and it is out there.
It's production ready, I mean, we've been using it
for over a year in production.

Speaker 1 (01:55):
Maybe you can go more into specifics. It's a
data pipeline transformation library?

Speaker 3 (01:59):
Is there?

Speaker 2 (02:00):
That's right, right. Yeah, pretty much. The idea is, and
we've been using it mainly to integrate with external APIs,
because at least from how we are building applications,
we have internal models, right? In our case,
those are just structs. We tend to use the TypedStruct
library for that, and then we transform whatever
we get from external APIs into these structs, and there

(02:20):
is some tooling around that in the ecosystem. I mean,
the first thing that comes to mind is something like
Ecto, or maybe embedded Ecto schemas, those things. I'm
not sure how many external API integrations you've personally done,
but usually the API responses don't quite map onto
what you would like in the ideal case, right?
And then you either do one of two things.

(02:41):
You kind of adjust your models to be closer to
the API responses just for convenience, or you have to
write a bunch of boilerplate code. And when I say
a bunch of boilerplate code, I mean things like, okay,
I fetched this nested thing here, but what if
that nested thing is actually not set, right? Then
I want to have an error. What if the thing over
there is a different type, and so on and so forth,

(03:03):
and that is kind of how Babel was born, because
I had to write a bunch of those integrations,
like we integrate with an external GraphQL API,
and I was getting sick of it, honestly. So I
started to write a few abstractions where I
could then say, okay, please fetch that nested key
over here, or please transform that into this structure

(03:23):
over there, and then evaluate the keys over there
by, again, fetching stuff from there, and so on
and so forth. It kind of accumulated into a
collection of tools that was just very useful, because what
you end up doing is you specify what you want
to do, right, the happy path, basically: fetch here,
transform there, put it into this map, and then the

(03:44):
library itself takes care of the nitty-gritty, the ugly bits.
And that is also the part I've been
polishing recently, where you get very elaborate
and very specific reporting if something goes wrong, because honestly,
that is the number one pain point. When you do
have these non-trivial data transformations and then something breaks

(04:05):
somewhere in there, somewhere nested, you just sit
there and go like, ah, fuck, where does it
go wrong now, right? What exact mistake did I make?
Do I have a typo somewhere? And depending on how
deeply nested it is and how non-trivial it is,
this can be a pain in the butt to debug.
That's exactly the kind of experience we had,
and all of the work I've put into
the library is exactly around that. So you specify

(04:28):
the happy path and it takes care of the ugly
bits and tells you exactly why something went wrong and
where it went wrong.
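
To make the kind of boilerplate described here concrete, the following is a minimal Elixir sketch of fetching a nested value while reporting exactly where the lookup failed. It deliberately does not use Babel; the module FetchSketch, the function fetch_path/2 and the error tuples are hypothetical names invented for this illustration.

```elixir
defmodule FetchSketch do
  # Hypothetical helper, not Babel's API: walks `path` through nested maps
  # and reports the exact position at which the lookup failed.
  def fetch_path(data, path), do: do_fetch(data, path, [])

  # Nothing left to fetch: the happy path ends here.
  defp do_fetch(data, [], _seen), do: {:ok, data}

  defp do_fetch(data, [key | rest], seen) when is_map(data) do
    case Map.fetch(data, key) do
      {:ok, value} -> do_fetch(value, rest, [key | seen])
      :error -> {:error, {:not_found, Enum.reverse([key | seen])}}
    end
  end

  # The "thing over there is a different type" case.
  defp do_fetch(data, [key | _rest], seen) do
    {:error, {:not_a_map, Enum.reverse([key | seen]), data}}
  end
end

response = %{"data" => %{"article" => %{"title" => "Hello"}}}

found = FetchSketch.fetch_path(response, ["data", "article", "title"])
missing = FetchSketch.fetch_path(response, ["data", "author", "name"])

IO.inspect(found)    # {:ok, "Hello"}
IO.inspect(missing)  # {:error, {:not_found, ["data", "author"]}}
```

Writing this by hand for every integration is the repetitive part; the point made in the conversation is that a library can own this bookkeeping so that only the happy path has to be spelled out.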

Speaker 1 (04:34):
Is it like, once it reaches an error, it
stops continuing, kind of like Plug does, or
does it keep trying to do more?

Speaker 2 (04:41):
Basically, yes. It's a composition: I can
compose all of the different things into pipelines, and
steps can basically have substeps, and as soon as
any of the steps returns an error, processing stops.
A pipeline in theory can have an on-error handler,
so you can recover from an error if you want to.
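
The stop-at-first-error behaviour with an optional recovery handler can be sketched in a few lines of plain Elixir. This is an illustration under stated assumptions, not Babel's implementation: PipelineSketch, run/3 and the idea that a step is a one-argument function returning {:ok, value} or {:error, reason} are all made up for the example.

```elixir
defmodule PipelineSketch do
  # Hypothetical sketch, not Babel's implementation. Each step is a function
  # that returns {:ok, value} or {:error, reason}.
  def run(data, steps, on_error \\ nil) do
    result =
      Enum.reduce_while(steps, {:ok, data}, fn step, {:ok, current} ->
        case step.(current) do
          {:ok, next} -> {:cont, {:ok, next}}
          # The first error halts the pipeline; later steps never run.
          {:error, reason} -> {:halt, {:error, reason}}
        end
      end)

    case {result, on_error} do
      # An error handler, if given, turns the failure into a value.
      {{:error, reason}, handler} when is_function(handler, 1) -> {:ok, handler.(reason)}
      {other, _} -> other
    end
  end
end

fetch_count = fn data ->
  case Map.fetch(data, "count") do
    {:ok, value} -> {:ok, value}
    :error -> {:error, :missing_count}
  end
end

double = fn n -> {:ok, n * 2} end

IO.inspect(PipelineSketch.run(%{"count" => 21}, [fetch_count, double]))
# {:ok, 42}
IO.inspect(PipelineSketch.run(%{}, [fetch_count, double]))
# {:error, :missing_count}
IO.inspect(PipelineSketch.run(%{}, [fetch_count, double], fn _reason -> 0 end))
# {:ok, 0}
```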

Speaker 3 (04:57):
But that is just a video, so I check the
multi data process basically.

Speaker 2 (05:01):
Yeah, kind of. Yeah, that's quite interesting, because I think
I did have some stuff quite similar to what you're
actually trying to do, so it's maybe something I'm actually
going to look into. Yeah. And I mean, maybe the
elephant in the room is: why not use Ecto
directly, for example? And honestly, you probably could do that.
I mean, there's this thing called field
mapper in Ecto where you can then also transform things.
(05:23):
The thing I like about the approach we
take here, and why I prefer it
at the end of the day, is that it's very expressive
in the data structure you have, right? You
can say, okay, I want to transform this
thing into that thing over there, and honestly it doesn't
have to be a struct. You could transform it into
a map. You could transform it into a keyword list,
you can transform it into whatever, into a

(05:44):
bunch of tuples. It's completely agnostic in that way.
I mean, we are mainly using it to transform
things into structs, but you get kind of a
self-descriptive thing that tells you, okay, this
thing, that's just those fields over there, this thing, that's
just those fields over there. And I presume you could
do something similar with Ecto. But yeah, at the end of
the day, I mean, more libraries kind of

(06:05):
in a similar space, I don't see anything
bad in that. One little advantage
Babel has over Ecto in this case: in Ecto,
the whole transformation is coupled to the model itself.
I mean, Babel doesn't really care about the model.
At the end of the day, it transforms data from
one shape into another shape. So you could have different
API integrations with different Babel specs transform everything into the

(06:26):
same model, while the model itself doesn't know shit about that,
you know. So that is something that you could
do with Babel. We are not doing that at the
moment because, I mean, we're only integrating with
Contentful, a headless CMS thing. But we could very easily
integrate with a legacy system and transform it
into the same models, and the rest of the system
wouldn't need to know that that came from a different place.

Speaker 1 (06:48):
Yeah, I was trying to see. It
seems like in your example you very much have to
pass through all the steps. Do you have anything in there where
it's like, if this thing is there, do this, otherwise
don't, and just kind of keep going?

Speaker 2 (06:57):
Just yeah, yeah, yeah. I mean like the kind of
how like bubble came to be, right, Like in the beginning,
I just created a few steps basically like we use
those internally. I moved it into a guitar propository relatively quickly,
and we just depended on the GIAB propository. And over time,
like while through usage, we discovered some patterns that were useful, right,

(07:18):
for example, being able to make a choice.
There's a Babel step, it's called choice, and
what you do is you give it a function,
and then you can match on whatever data is
coming in and return a different
Babel step. So you can decide, okay,
for example, if this is nil, then do something like that,
or if this is a list, then do something like this,
or if it's a map, then do something different, right,

(07:38):
so there are different components you can use to
compose things together and decide, while you
go through the data, what exactly the next step ought
to be. I mean, something that turned out to
be incredibly useful, and I didn't expect it to be
in the beginning, is that we have a constant step,
so a constant, that really just does what
the name suggests. It always returns the same value, it

(07:59):
always returns the value that you give it. It's
actually very useful in these choice scenarios when you say, okay,
if it's actually a list, then maybe transform it by mapping.
But if it's something else, just return an empty list
by default, right? Then you have constant and things
like that. So I mean, in
this world it's kind of monad-y in some places, but

(08:20):
it's basically a bunch of opinionated monads, very specifically
built to transform data and compose together. I actually considered
getting a little bit closer
to a monadic interface, but I consciously decided against
it, because this thing is specifically meant to make data
transformations easy and nothing else.
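
The choice-plus-constant pattern described above can be sketched with plain functions. The sketch assumes a "step" is just a one-argument function returning {:ok, value} or {:error, reason}; ChoiceSketch and its const/1, map/1 and choice/1 are hypothetical stand-ins written for this example and are not Babel's actual functions or signatures.

```elixir
defmodule ChoiceSketch do
  # Hypothetical stand-ins, not Babel's API.
  # A "step" here is a one-argument function returning {:ok, _} or {:error, _}.

  # Always returns the given value, ignoring the input.
  def const(value), do: fn _data -> {:ok, value} end

  # Applies `step` to every element and stops at the first error.
  def map(step) do
    fn list ->
      collected =
        Enum.reduce_while(list, {:ok, []}, fn element, {:ok, acc} ->
          case step.(element) do
            {:ok, value} -> {:cont, {:ok, [value | acc]}}
            {:error, reason} -> {:halt, {:error, reason}}
          end
        end)

      case collected do
        {:ok, values} -> {:ok, Enum.reverse(values)}
        error -> error
      end
    end
  end

  # Looks at the incoming data, picks the next step, then applies it.
  def choice(chooser), do: fn data -> chooser.(data).(data) end
end

upcase = fn string -> {:ok, String.upcase(string)} end

tags_step =
  ChoiceSketch.choice(fn
    list when is_list(list) -> ChoiceSketch.map(upcase)
    _other -> ChoiceSketch.const([])
  end)

IO.inspect(tags_step.(["a", "b"]))  # {:ok, ["A", "B"]}
IO.inspect(tags_step.(nil))         # {:ok, []}
```

The constant step looks trivial on its own, but it is what lets the fallback branch of a choice have the same shape as every other branch.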

Speaker 1 (08:40):
Now you have a couple of different structs here, basically
pieces to it. You already have Babel.Context, Babel.Step, Babel.Test,
quite a few other Babel pieces, right? Do you mind
kind of walking a little bit through the different pieces
and what they're actually used for?

Speaker 2 (08:51):
Yes, yes, yes. Basically, at the entry level you
have a top-level module which is called Babel,
no surprise there, right, and that also has
all of the functions you would want to compose things together.
So the whole API is kind of built to be pipeable.
You can basically pipe each of these steps
into another step, and then that automatically composes into a
Babel pipeline, and a pipeline can be named or unnamed.

(09:11):
The Babel pipeline can have an error handler or not.
That is kind of the bread and butter, if
you want, right? And then beyond that you have something
which is called Babel.Step. That's a behaviour, but
it's also a module you can use, and it's actually what's also
used to build all of the built-in steps. So
under the hood, what that does is it requires you to
implement an apply function, and that gets just the
(09:34):
struct itself. Well, for example, if you use it
in another module, it's expecting
that module to be a struct, right? Then
the first value you get is that struct itself. The second value is
the data that you want to transform. And
again, all of the built-in steps in Babel,
like the map and the fetching and the getting and
so on, all of those are built on top
(09:54):
of that step, so you can quite straightforwardly create your
own steps if you want. So that is kind of
the extension point to go to. Another thing it does
under the hood is that it also derives the Babel.Applicable
protocol implementation, because that is what
Babel is using to apply and compose everything together.
So in theory you could also use this thing as
a behaviour and then define all of these implementations yourself.

(10:17):
It's more of a convenience thing. Then Babel.Test:
that is something you probably won't need to use
in your day-to-day if you just consume the API,
but it has some nice helpers if you want
to build your own steps. So it has some shortcuts,
basically, to make testing custom steps easier. It's also what
Babel uses internally to test the steps. And then maybe
the most interesting one is Babel.Trace. So that is

(10:40):
kind of where I landed after iterations on iterations on iterations,
because that is where the bulk of the
work went in, to capture what has happened before, right,
what has been happening, what was the error, what
led up to that error? And I went through so
many iterations on that while this thing was
in internal development. And Babel.Trace is kind

(11:02):
of, you can really think of it like
in a tracing scenario, right? Each evaluation step
creates, and has to return, a Babel.Trace.
The trace always contains the input data, it contains
the output data, so for example an ok or an error,
it contains the actual step, and it contains nested traces,
and that is what Babel then uses internally

(11:24):
also to give you information like: hey, while I evaluated
your one-hundred-step pipeline, this step over there,
three levels deep nested, that thing failed. And that
is also where a
lot of the work from my side went. And there's,
for example, on the Babel.Trace
module itself, there's a find function and

(11:45):
you can give it a trace and then say, hey,
find me all fetch steps. Then it returns you basically
a list of all fetch steps. You can also tell it, okay,
find me all fetch steps that are part of a map,
and so on and so forth. You can give it
just a function to say, okay, find me all
traces that match this function, and so on and so forth.
So that is where you can, if you
actually do have a complex pipeline and the out of

(12:06):
the box reporting isn't enough, that thing can
really help you dig down into what exactly went wrong,
because every trace contains the input data, it contains the
output data, and the step that actually
was evaluated, so you can really see basically what happened. Yeah,
and then beyond that, the bulk of the work went
into, honestly, developer experience. I mean, there's a Babel

(12:28):
error, no surprise there, right? It
has a nice error message. And what I did
a whole lot for this, and it was the first time I
did that, was building custom inspect implementations to
make the different Babel steps and the traces inspectable.
So if you, for example, I mean, I would invite
listeners to try that out. If you, for example, like

(12:50):
try this thing out, maybe in an IEx shell, do
a Mix.install, and then you build a
Babel pipeline, right, it inspects into the actual code
that you executed. So basically you don't really get a
bunch of structs, but you get a pipeline composition
that you could copy-paste, and it's executable code. So
that is to make it easy
and straightforward to inspect, okay, what was

(13:12):
the pipeline, and I'm using that under the hood in
the Babel.Trace custom inspect to then say, okay, each of
these steps was executed here. That is the step, that
was the input data, that was the output data, those
were the nested traces, and so on and so forth. So
you kind of get a stack-trace-ish thing. And
in the case of an error, that is something I'm
currently working on, omitting all of
the successful steps in between, because otherwise it can get

(13:34):
very noisy, because at least in our case these
pipelines can be dozens of steps long, right, so you
can get a very, very long stack trace. But yeah,
I'm currently working on making that even a little bit
more approachable, to only print the things
that went wrong instead of printing everything. And that is
where, again, the bulk of the work went,
so that you, as the consumer of this library,

(13:55):
don't have to care about that shit, right? You
only specify: okay, I want to fetch that, I
want to map it into whatever, I
want to put it into a map, and then please
take care of it for me, Babel. I don't care
how you do it, and if it goes wrong, you
get all of this information readily at your fingertips.
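
The trace described in this answer (input, output, the evaluated step, and nested traces, plus a find function that searches all of them) could be modelled roughly as follows. This is a sketch based only on that verbal description: TraceSketch, its fields, new/1, ok?/1 and find/2 are hypothetical, and the real Babel.Trace may look quite different.

```elixir
defmodule TraceSketch do
  # Hypothetical shape based on the description above, not Babel.Trace itself.
  defstruct [:step, :input, :output, nested: []]

  def new(fields), do: struct!(__MODULE__, fields)

  def ok?(%__MODULE__{output: {:ok, _}}), do: true
  def ok?(%__MODULE__{}), do: false

  # Depth-first search over the trace and everything nested inside it,
  # returning every trace for which `fun` is truthy.
  def find(%__MODULE__{} = trace, fun) when is_function(fun, 1) do
    here = if fun.(trace), do: [trace], else: []
    here ++ Enum.flat_map(trace.nested, &find(&1, fun))
  end
end

trace =
  TraceSketch.new(
    step: :pipeline,
    input: %{"user" => %{}},
    output: {:error, :not_found},
    nested: [
      TraceSketch.new(step: {:fetch, "user"}, input: %{"user" => %{}}, output: {:ok, %{}}),
      TraceSketch.new(step: {:fetch, "name"}, input: %{}, output: {:error, :not_found})
    ]
  )

# "Only show me what went wrong": keep the failing traces, drop the rest.
failed = TraceSketch.find(trace, fn t -> not TraceSketch.ok?(t) end)
failed_steps = Enum.map(failed, & &1.step)

IO.inspect(failed_steps)  # [:pipeline, {:fetch, "name"}]
```

Because every node carries its own input and output, filtering for failures like this is enough to point at the exact nested step that broke.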

Speaker 1 (14:11):
I was just thinking about the tracing part, if you
wanted to trace something in production, though. The way you
set up the tracing is not just a simple configuration
that you can flip on and off, kind of like
with Logger, right, where you can change the debug level
during runtime. So for this one, you actually
have to put it into a trace and then add
some more configuration to it.

Speaker 2 (14:28):
If I read it right. It's not, actually. It's something that
is always built under the hood. So basically, if
you use the top-level API, it has
Babel.apply, it's called, and that returns either an ok or an
error tuple. What that thing does under the hood: there's
a Babel.trace function that basically returns
the trace struct. That thing is always constructed for every

(14:48):
evaluation, and then it just checks: is that trace an
ok result or is that an error result? And then
it returns an ok or an error tuple. That is basically
what Babel.apply does. So
in an error case, it gives you a Babel.Error
struct, and that contains a message, what was it
now, it contains the result, and it contains the trace.
That is basically what the Babel.Error contains, right? So
that is something you always have at your fingertips, like
(15:10):
something, it was kind of nice how things came
together there, a little story time here, because something
I wanted to have, and I wasn't quite sure how
to do it, was to have a step, and it's
now there, it's called root, to access the initial
data you passed in. Because sometimes you have situations, we
had that, where you say, okay, I
want to create this map or this struct over
(15:30):
there with these fields, but one piece of data I
actually need is maybe a few levels higher up, right?
And there was no easy way to make that happen.
You had to maybe do it with
some functions and carry that whole data thing along.
Now there is Babel.root, it's called, and what that does,
it's basically because each of the evaluation steps actually gets

(15:52):
the Babel.Context. That is basically the struct that contains
all of the previous steps, and it just goes through
that list, goes to the very first step, takes the
input data and returns that.

Speaker 3 (16:03):
Right.

Speaker 2 (16:03):
So in theory, you could even build time
traveling steps where you say, okay, go back two previous
steps and take the data from that. And I experimented
with that for a little while, but it gets confusing
very quickly, because at that point you have
to keep so much implicit state in your mind. But
if you wanted to kind of build a

(16:24):
time traveling debugger or time traveling tooling on top of this,
you could. You could basically say, hey, apply these three
steps and then take the result from two steps
ago and then do something different with it. The
data is all there. And
it was nice how building this root
step became super trivial. As soon as I
nailed the error handling down, the pieces fell together. It

(16:46):
was a very nice experience.
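
The idea that a "root" step falls out almost for free once each step receives a context holding the history of earlier inputs can be sketched as below. This is an illustration only: ContextSketch, run/2, root/0 and back/1 are hypothetical names, errors are ignored for brevity, and none of this is Babel.Context's real structure.

```elixir
defmodule ContextSketch do
  # Hypothetical sketch, not Babel.Context: the context carries the current
  # data plus the inputs of all previous steps, newest first.
  defstruct [:data, history: []]

  # Each step receives the whole context and returns the next data.
  def run(input, steps) do
    final =
      Enum.reduce(steps, %__MODULE__{data: input}, fn step, ctx ->
        %__MODULE__{data: step.(ctx), history: [ctx.data | ctx.history]}
      end)

    final.data
  end

  # "root": ignore the current data and return the very first input.
  def root, do: fn ctx -> List.last([ctx.data | ctx.history]) end

  # "time travel": return the data as it was `n` steps ago.
  def back(n), do: fn ctx -> Enum.at([ctx.data | ctx.history], n) end
end

get = fn key -> fn ctx -> Map.fetch!(ctx.data, key) end end

steps = [
  get.("order"),
  get.("item"),
  # Two levels down, reach back up to the original input for the currency.
  fn ctx ->
    root_input = ContextSketch.root().(ctx)
    %{item: ctx.data, currency: root_input["currency"]}
  end
]

result = ContextSketch.run(%{"currency" => "EUR", "order" => %{"item" => "book"}}, steps)

IO.inspect(result)  # %{item: "book", currency: "EUR"}
```

The back/1 helper shows why the time traveling variant gets confusing quickly: the reader has to count steps in their head to know which value comes back.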

Speaker 1 (16:47):
Yeah, okay, you've kept it straight Elixir. I
don't see any dependencies other than like Credo.

Speaker 2 (16:54):
Honestly, that is kind of a design philosophy
I follow with all of my libraries. I mean,
of course, for local development I
have Credo and everything. But I don't think any of
my libraries, maybe I have a plug, like a
small ETag plug, I mean, that obviously
depends on Plug, right? But apart from that, all
of my libraries are zero dependency. That's something I honestly,

(17:15):
I always like zero dependencies. Maybe it's just
the way my brain is wired. But I do
like it when you don't have a bunch of shit that
is installed along when you want to do something simple.

Speaker 1 (17:26):
I just looked at this enforced version ex file. There are
some versions of Elixir and Erlang that you don't support, if
I read this correctly.

Speaker 2 (17:33):
Yeah, but I mean, those are also in
the mix.exs file. Yeah,
there's a file in there. It's just a local convenience
to kind of switch, it's
just a bunch of conveniences to say, okay,
if this is an older Elixir version than this, I
have to compile a different thing. I mean, some
of the functions, I honestly don't remember how

(17:53):
or why exactly I'm using it. But there are a few
small cases: when I got things production ready and
tested against various combinations of Elixir and OTP,
it turned up errors or warnings. So I
have a few small places where I kind of have
to switch: okay, if this is an older Elixir version,
I have to do something different.

Speaker 1 (18:09):
Your Elixir version, like 1.8 I think,
is your lowest version.

Speaker 2 (18:12):
It's relatively low, yeah. I had an even
lower version in the beginning. But then it turns
out that the Inspect.Opts struct kind of changed at some
point, and I could have supported that, but I was like,
you know what, I mean, I'm already, yeah. I
think initially I had support down
to 1.6 for Elixir, and then I

(18:33):
think somewhere there something in the Inspect.Opts changed,
so I bumped it to 1.9, but it
works down to 1.9, and my CI
setup for my libraries also
tests various combinations of Elixir and OTP versions.

Speaker 1 (18:47):
Yeah, I was looking for that, but it looks like
you're developing with 1.15 and OTP 26,
too.

Speaker 2 (18:52):
Yeah, that is my default setup locally. But why
did you choose these versions? Just curious. It's just
the version we're also currently using
in our production. That's the only reason. It's pretty, somewhat old.
It's very old, technically, in software age. Like I said,

(19:13):
it kind of came from building tooling
we used at work in production. So when
I copied that over, right, that kind of came along.
There's no particular reason for doing that. So honestly, I
don't think any person out there will have any versioning
problems with this library. Honestly, I would be surprised.
You have to use a really fucking old Elixir version to

(19:34):
run into problems. We're talking years here. Did you
plan that in your design, or did it just kind
of happen? I tend to aim for, when
I do library designs, I tend to aim
for a low version, and then I develop
on a higher version, and then I have my CI
setup and then I see, okay, for example,
why does something break? And usually it's, honestly, it's convenience.
Usually when something breaks, it's because maybe Kernel.then
(19:56):
wasn't around then, stuff like that, right? And honestly, there's
no big deal in working around that. But sometimes it gets
a bit more complicated, like with the Inspect.Opts, when
I said, okay, yeah, I could support that, but honestly,
at this point I'm already talking about versions that are, I
don't know, three, four, five, six years old, so
let's bump it a little bit. But that is kind
of what I'm usually aiming for when building

(20:17):
these libraries.

Speaker 3 (20:18):
Yeah, that's really cool. I think one thing I
can see being a very useful addition could be
just a concurrency addition to this, like, yeah, the map, right,
like a concurrent map if you have a list of keys.
That's something I run into quite often when doing a transformation,
at least at the scale that we do, so
that'd be quite useful.

Speaker 2 (20:37):
Actually, yeah. Something Babel ships with, I mean, I
have a few examples. For example, I have what's just called
a cast step, and it supports booleans, it supports floats, and
it supports integers out of the box and nothing else.
And I thought about supporting more, but what I instead
added is a call step. When you do a
call, you give it a module, you give it a
function name, and potentially some extra arguments.

(21:00):
So if you wanted to, you could wrap that into
an asynchronous processing thing, right? But I guess, I mean,
I see where you're coming from, because actually,
it sounds trivial, but one of the
complex parts of Babel is the mapping. Exactly. Yeah,
that is, that was one of the, honestly, when I

(21:21):
built the very first version, that was where I
hammered my head the longest against the table,
how to kind of collect: if everything is okay, then
just collect it. It became a whole lot easier as
soon as I wrapped everything in traces, because at that point,
I mean, yeah, I need to check here,
are any of the nested steps failing, potentially. But
it's not that complex anymore. But yeah, I guess that

(21:43):
makes sense. It could be a reasonable addition.
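
As a rough sketch of what the suggested concurrent map might look like, the following uses Elixir's standard Task.async_stream/3 and keeps the "collect everything if all elements are ok, otherwise return the first error" rule discussed above. It is not part of Babel; ConcurrentMapSketch and map/2 are hypothetical names, and a step is again assumed to be a function returning {:ok, value} or {:error, reason}.

```elixir
defmodule ConcurrentMapSketch do
  # Hypothetical sketch, not a Babel feature: apply `step` to every element
  # in separate tasks, then collect the results in their original order.
  def map(list, step) when is_function(step, 1) do
    collected =
      list
      |> Task.async_stream(step, ordered: true)
      |> Enum.reduce_while({:ok, []}, fn
        {:ok, {:ok, value}}, {:ok, acc} -> {:cont, {:ok, [value | acc]}}
        # A failing element stops the collection with that element's error.
        {:ok, {:error, reason}}, _acc -> {:halt, {:error, reason}}
        # A task that exited (for example, killed on timeout) is also an error.
        {:exit, reason}, _acc -> {:halt, {:error, {:exit, reason}}}
      end)

    case collected do
      {:ok, values} -> {:ok, Enum.reverse(values)}
      error -> error
    end
  end
end

doubled = ConcurrentMapSketch.map([1, 2, 3], fn n -> {:ok, n * 2} end)

with_failure =
  ConcurrentMapSketch.map([1, 2, 3], fn
    2 -> {:error, :bad_element}
    n -> {:ok, n}
  end)

IO.inspect(doubled)       # {:ok, [2, 4, 6]}
IO.inspect(with_failure)  # {:error, :bad_element}
```

Whether the extra processes pay off depends on how expensive each element's transformation is; for cheap, purely in-memory steps the sequential version is usually enough.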

Speaker 3 (21:44):
But it's really cool. I think we had a guest
at some point, like a year ago or so,
and they mentioned that everyone should try to build a
data pipeline. Like, sure, a DSL or
whatever you want to call it, a data pipeline, like
a struct that gets changed in a programmatic way and
returns a deterministic, you know, set of outputs. Like
(22:05):
you have this error, and you have this
idea of a trace when you execute a pipeline.
Doing something like this really helps you rethink how powerful
Elixir is, and, you know, how you can leverage
that pluggable pipeline ideology and, you know,
wrap the entire process in a struct, you know,

(22:26):
it's very, yeah, it does kind of change the
way you think a little bit. So it's really cool
that you did this.

Speaker 2 (22:31):
Yeah, and there are a few angles I also want to explore
to make things a little bit more convenient. For example,
theoretically, if you construct these pipelines, you only need
to construct those once, right? I mean, generally speaking,
those are not dynamic. So I'm also
considering exploring a little bit, to maybe
cache those things at compile time, something like that, right,
where it kind of compiles into a
(22:54):
representation of the pipeline that can then be used to
construct those things. And you don't have to, but I mean,
honestly, we have this thing, again, in production,
and the pipeline construction part is also something that
is traced, because that whole thing in production
is properly traced through OTel and everything, and
we do have a subtrace for the actual
(23:15):
API transformation, the data transformation, and that part
is not the slow part, you know. So yeah, that
is very, very low on my list of priorities.
What I might end up doing, because that
is relatively straightforward: you could do something like
persistent_term and basically say, you do
a def, I don't know, defbabel, then you specify
(23:38):
the thing and, lo and behold, it evaluates it once and
then puts it in persistent_term, end of story.
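
The build-once-and-cache idea floated here can be sketched with Erlang's :persistent_term module, which is part of OTP. Everything else in this example is an assumption made for illustration: CachedPipelineSketch, its build/0 and the list-of-functions "pipeline" are hypothetical, and this is not an existing Babel feature or macro.

```elixir
defmodule CachedPipelineSketch do
  # Hypothetical sketch, not a Babel feature: build the pipeline once and keep
  # it in :persistent_term so later calls skip the construction cost.
  @key {__MODULE__, :pipeline}

  def pipeline do
    case :persistent_term.get(@key, nil) do
      nil ->
        built = build()
        :persistent_term.put(@key, built)
        built

      cached ->
        cached
    end
  end

  # Stand-in for an expensive pipeline construction.
  defp build do
    [&String.trim/1, &String.upcase/1]
  end

  def apply_to(data) do
    Enum.reduce(pipeline(), data, fn step, acc -> step.(acc) end)
  end
end

IO.inspect(CachedPipelineSketch.apply_to("  hello "))  # "HELLO"
```

Reads from :persistent_term are cheap, while writes are comparatively expensive, which suits a value that is built once and then only read.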

Speaker 3 (23:42):
But yeah, you do make the I think you do
add like that. I mean idea of state. Obviously that
kind of like breaks from the whole like the trace
and the pipeline structure is not as like transparent, but
but yeah, it makes sense. I guess like makes sense
if if you do it within the context of the
pipe line, then it might be still like functional.

Speaker 2 (24:03):
Yeah, honestly, I'm excited to see what people maybe
do with this, and I'm honestly also excited to see
what kind of feedback comes in about things
that are missing. Honestly, I was very happy with
how I went about building this, this time around,
how we kind of used it in production for
months, and I collected feedback like that.
And I mean, that is also where I realized that
the error reporting part, that is the thing that

(24:25):
people get the most value out of, because
in the beginning it didn't really have good error reporting,
and then when things broke in production, it was
always not super easy to figure out, why did
this break, right? And so, I mean, maybe that
is also a learning I want to share with
any listeners who want to build a
library: maybe build something and then use it in production

(24:45):
yourself for a while before you put it out on
Hex, or maybe, I mean, publish it as a
0.1 version, right, but get that production knowledge, right,
actually dogfood your shit, because then
you will very quickly see how these things kind of
hit a limit. And I mean, there are a
few angles I would like to explore to make it

(25:07):
a little bit more convenient, but the biggest pain
points we had, for example, figuring out what
that pipeline currently looks like, because in
the beginning you just had a bunch of structs that
got inspected. If you were looking at those, it was
super hard to read, right? And now,
with these custom inspect implementations, you basically get the code
that you need to copy-paste to build that pipeline,

(25:29):
and all of that, all of these little improvements
already made it so much nicer and easier to use.
And that is just some pain I myself had
when using that, and that was a very valuable experience
in building this.

Speaker 1 (25:41):
I was starting to wonder too, because this
is all single process, right?

Speaker 2 (25:47):
Yeah, there's absolutely zero processes in there.

Speaker 1 (25:51):
Is that something you think could be useful,
or do you think that's maybe overkill for something like this?

Speaker 2 (25:55):
I mean, like Adi already just said, right,
concurrent transformations, that could be useful. Like where you say, okay,
maybe you want to map over a very big collection
and maybe you want to stream it, right? That could
be something I could imagine being useful.
I guess you could. I'm going to be honest there, I would
need to really think about how to build a
streaming interface, because there's no guarantee what
kind of data you get, right?

Speaker 3 (26:17):
Yeah, it might be tricky, but just with the
way that you structured it, you'll have to add some
flexibility to this construct of a pipeline.

Speaker 2 (26:27):
Yeah, there are definitely angles there I could explore
to make it easier. You could already do it if
you wanted to. You could basically say that
any of your collections, maybe inside of a map,
is actually not just a collection, but you
build a stream, and then that under the hood basically
does Babel shit. There's nothing that would stop you
from doing that. But of course it's not supported
(26:49):
out of the box. Something I did while I
built that, like when I built this whole error
reporting thing, is I already kind of presumed that
people might do funny things like that. For example, there's
a Babel.then step, and it's kind of like
Kernel.then, right, like you just give it
a function and then you get the data and then
you do whatever you want, right? Like you can do
whatever you want. And I already assumed that people maybe

(27:11):
potentially call Babel inside of a Babel.then step, right,
so kind of like basically have nested calls, which
isn't really how composition is meant to work, but
I could very much see somebody do this without thinking
much about it. And what Babel already does under
the hood, basically it checks, if an error
results, whether it's a Babel error, right, and then it's like, okay,

(27:31):
oh okay, probably somebody did like Babel.apply inside of
a Babel.then, so I'm just going to take that
Babel error and take the trace out of it and
treat it as if it was one big pipeline. So
there are already optimizations in the library like that. So
I would presume it would be relatively straightforward to build
something like a stream. Maybe. I'm presuming I
get like the, yeah, I get the high level right,

(27:52):
But I mean, currently we have like a map
and a flat map, which is actually a little bit confusing.
I'm considering renaming those, because when you
do a map, you give it a
Babel pipeline or a Babel step to apply on each
of the elements. In a flat map, you give it
a function that returns a Babel step. So
it's kind of like a map with a choice inside.

(28:13):
It makes sense from the monadic point of view. It
doesn't necessarily make sense from the Elixir out-of-the-box
experience point of view. But what I
could imagine is doing something like maybe a stream map, right,
or having something like that. What is this justfile
thing? I haven't seen that before. Okay, now we're
leaving Elixir completely. Yeah, it's bugging me. Just is like

(28:36):
it's like Make but without all of the cruft. So
it's a task file thing. We're also using that internally.
Honestly, I like it. I'm at the
point in my career, and I think it's just like
a philosophy I like as a software engineer, like
I like tools and libraries and things that do one
thing and do one thing well, and you can
kind of see that in my libraries, like all of
those are focused on one specific thing and that's it,

(28:59):
and Just is kind of like Make without all of
the C cruft that comes along with Make, right? So yeah,
I mean, I guess if we were at the
end of the episode, it could be one of my
picks because I'm very fond of Just. Yeah, I mean,
I don't really have a lot more questions. You went
through a lot of it, and it seems pretty straightforward
and I can already see some usefulness to it. The

(29:20):
only thing that I did not see: you kind of
talked about the validations and stuff, right? But can
you talk more about that, because I didn't see that
in the docs. It didn't kind of poke out to
me right away. Is there some kind of thing where
you can say these keys need to be here and
this one needs to be over here, other than to say,
if this key is not here and I want to
move it from here to here, then I want to fail,
then that would fail validation? I would kind of reverse

(29:42):
the question, like what would be a use case, right?
Babel is not a validation library, right? Like,
it's not meant to kind of give it a bunch
of data and basically then say, okay, it should conform
to a schema. I guess you could shoehorn it into
it a little bit, but that's not the idea, right? So,
like each of the built-in steps, and I

(30:02):
mean like the one people would use the most
in pipelines is the fetch step, they kind of encapsulate
assumptions about the data, right? I mean, for example,
the cast step, when you say cast this to a boolean, right,
it's assuming that this is either
a string or, is it an integer? I think I
don't cast integers, but like, is it a string or
is it a boolean already, right? I mean it's

(30:24):
lowercasing the strings, for example, it checks for true,
it checks for yes and no, because those are
valid truthy values in YAML for whatever reason. So
things like that, right? But that is kind of
encapsulated in each of the steps, and if
any of these assumptions is violated, then you get an error.
But yeah, there's no, I mean, if you
wanted to do validation, you could do that. You could

(30:45):
like, Babel.then is kind of the escape hatch
for everything, because you give it just a function, right?
Babel.then gets a function. That function gets the data,
whatever the data is at that point in your pipeline.
And that is something I also do in production
a few times, where we then maybe check it against
a static list of values, right, and say if it's
not part of a static list of values, then actually

(31:06):
return an error here. So that is kind
of the point where you would reach for it to do
whatever arbitrary validations.

Speaker 3 (31:12):
I would do it like post Babel.apply, because I mean,
if you start doing these custom things, you
probably want to return the trace error for
the... Yeah.

Speaker 2 (31:22):
But that is, Babel.then kind of
takes care of that for you under the hood, right?
That is like the optimizations I mentioned earlier.
If you just return a value, it wraps
it in an ok tuple. If you return an
error tuple, then it checks the case that maybe
you actually returned a Babel error with a trace,
and it does

(31:42):
all of these little optimizations under the hood.

Speaker 3 (31:45):
Right, you return like an ok and an invalid changeset,
if you choose to do an Ecto operation.

Speaker 2 (31:50):
Or whatever, yeah you could do that.

Speaker 3 (31:52):
Yeah, I would just use
this for like a transformation, and after the
pipeline application I would do it, unless there are
features like, you know, laziness or concurrency, then the
value of doing it within the pipeline, you know, is
a lot more. Okay, how long did it take you
to write? And I know you said there's a lot

(32:13):
of iterations, but like just the first iteration where, you
know, you're like, okay, this is functional, running in production.

Speaker 2 (32:19):
How long did you... Yeah, first, like very rough version,
maybe a week. But I mean at that point
I didn't have proper error reporting, nothing, right? But it just
was already better than having to write all of these, okay,
nested fetch here, what if this isn't, what if this
is missing?

Speaker 3 (32:35):
Right?

Speaker 2 (32:35):
Like I already at that point had like something that
told me okay, I failed because that key in this
data was missing, right that that's something like the first version.

Speaker 3 (32:45):
Yeah, a week, that's pretty good. I'm curious because, like you said,
you didn't have the error reporting initially, and you know, it
wasn't as inspectable or traceable as now. And I've seen like,
you know, two types of developers in this context.
Like one sort is very excited, like, oh, hey,
you know, we're going to have this centralized DSL,

(33:07):
centralized, standardized way of doing data pipelines, and the other ones
are like, yet another DSL for me to learn,
you know, like why can't we just, you know,
keep it simple, keep using Elixir. And I
think the current iteration of Babel that we are looking
at, I think it's pretty simple, you know,
it's easy to read through. There are aspects of
it which are complicated, like how the trace eventually

(33:30):
runs and gathers the data, but I think overall,
I want to say, for a pipeline library
it's still simple. But I'm curious, did you get
pushback from people, like, hey, this is
something I'll have to learn, you know, why
can't I just do my data operations the normal way?

Speaker 2 (33:46):
Yeah yeah, actually I did. But that's a funny
little story there, because I mean, kind of what
I think you're also getting at there, Addie, and
that is kind of the design philosophy that went
into it: Babel does not make any assumptions about how
you want to structure your code, right? It's not
a framework. It doesn't tell you, okay, you need
to plug into these places to use it. Like it's
not like Commanded or Phoenix where there are certain extensions

(34:09):
for it, or also not Ecto. I mean, Ecto basically
is a very powerful tool, but it makes assumptions about
how you want to build your application, right, like you
need to do these Ecto schemas, things like that. That
is kind of why I built Babel, to scratch
my own itch, because that thing doesn't make
any assumptions about how you structure your code, right? It
basically gives you things you can plug together to build
bigger things, and then it tells you, okay, like data in,

(34:32):
data out, do what you want, right? So when I
first published this, like in a thread on the Elixir
Forum, I've gotten, I wouldn't
say a lot, but I've got some pushback like, hey,
why can't I just use Ecto? And I mean my
answer is, well, if you want to use Ecto, then
use Ecto, right? Like I don't like using Ecto for this.

(34:53):
If you like using Ecto for this, cool. Like there's
no good answer for this. I mean, the only
use case I can really pull out of my nose
here is if you have multiple data sources and need
to transform your data in very different forms from these
data sources, but you want to have one common output,
then Babel will make it a little bit easier for
you than Ecto. But I guess you could kind of

(35:13):
do it with like in-between Ecto schemas if you
wanted to. I don't know. Like, nothing in Babel
is special in the way that it's
impossible to do in other already existing tooling. And I've
gotten some pushback from people in that regard.
And also there was some valid
criticism about the verbosity of a full-blown Babel trace,

(35:37):
because if you do have a non-trivial pipeline, these
things get long, because what every trace then contains is:
this is the actual step I'm running, right, and
this is the input data, right, and then potentially these
are all the nested traces, and that is the output data.
If something breaks, that information is really fucking valuable, but

(35:59):
it gets long. That is why I'm currently working
on an optimization where basically for errors, like if
you do have an error message, then it only
prints the root causes. That is something that is already there,
so the traces that have no nested traces that are
erroring. So Babel's making the assumption that this is kind
of where shit broke, because usually if you do have
nested traces, that means it's like a higher order step, right,

(36:21):
like something that composes other steps together. And
for the actual full trace, it omits all of
the successful steps in between. It's still long then, but
not quite as long. And if you do then actually
want to access the full trace with all of
the successful steps in between, I mean that thing is
still there in the data, right, it's just not printed.
So yeah, I got pushback. So to kind

(36:43):
of summarize what I got pushback on: it can be
summarized as, I like the way things
are right now, why should I use this? To which
I just can say, then don't. I'm perfectly fine if
you don't use that. And the second pushback is, okay,
this is too verbose, I don't get it, and
I think that's valid criticism that makes sense.

Speaker 3 (37:00):
A quick follow-up: did you get pushback within
your company? Because, you know, if you're
making a decision to add this to your code, were there
engineers who were not too happy about it, like,
oh, another thing I'm going to have to learn,
no matter how lightweight, right?

Speaker 2 (37:17):
Yeah, actually not at all. Like everybody at work who
used it was like, honestly,
this is nice. That's great, because I mean, that
is like what I said in the beginning, I scratched
my own itch. I wrote data transformation libraries
in plain Elixir so often, it's boring code to write,

(37:38):
like it's not exciting. It's a bunch of withs, it's
a bunch of cases, it's a bunch of error and
ok tuples, and yeah, maybe you could then say let
it crash. That was also one of the
questions I got. Yeah, if
it's fine for your case, then do that, right?
But if you don't want to make it crash and
you do want to kind of be a bit more
careful about, okay, then this, then return that, and it's

(38:00):
boring code to write, and it's
a lot of code to write.

Speaker 3 (38:04):
Yeah, that's very cool. So the problem was
obvious and apparent enough that your entire
developer team was happy about it. That's a great place

Speaker 2 (38:15):
To be and especially since the latest versions where I
updated the error reporting and people were Honestly, we have
one colleague who joined quite recently and he had also
do like an external API integration and like he did
never use that thing obviously right, and I never heard
about it. And he picked that up. I mean like
he has been euroking in alex a few years just

(38:37):
picked it up in an afternoon, like, looked at the docs, just
used it, and was like, this was nice, like I
didn't have to care about all of these different error states.
I could just specify fetch that, fetch that, do a map,
end of story. Nice. Yeah, and if any of your listeners
are now curious, I mean, the thing is published,
it's on one dot zero dot one currently, I mean
it probably will be one dot one dot oh, I guess,

(38:59):
at the time when this is published here, because
like I said, I'm doing a few optimizations at the
moment, and I'm currently working on, and I would hope
that this is done when you listen to this,
a Livebook example, and also a few more
examples, because that is some feedback I've also gotten. I
think that is valid feedback, to just showcase
how you can use this, right, and also showcase that you
don't just have to use it for structs. You could

(39:20):
use it for transforming something into maps, into, I mean,
you could use this to transform something into Erlang,
how are these things called, Erlang records, right? Which
are just fancy tuples, because yeah, nothing in Babel
is making any assumptions about the output data, right?
Honestly, if you want
to read some boilerplate code, then go

(39:42):
into the Babel GitHub repository and
go into the Intoable protocol, because there's a Babel.into
step, and then you just give it a data
shape and basically it recursively evaluates all of the values
in that data shape and checks if those are Babel
steps and evaluates those. That is how you get
that nice descriptive API. But for example, for optimization purposes,

(40:06):
I do have an implementation for tuples, and I have
a handwritten implementation for tuples up to, I think, five
value tuples, or was it four value tuples, and then
you have like four values and you have to check
for all possible combinations of like ok, error. That
was a lot of boilerplate code, because
for any tuples that have more than, I think, I

(40:27):
think up to four, I think I went up to four.
For any tuples that have more than four elements, it just
does like Tuple.to_list, then basically falls back
on the list implementation, and then does List.to_tuple.
But that's obviously slower than doing pattern matching. So I
hand wrote implementations for up to four value tuples, which
was not fun. That was the kind of boilerplate

(40:48):
I wanted to avoid, and now I had to write
more of that for that library, but at least I
only have to write it once. But yeah, coming back
to what I actually want to say: try this
out, open issues, let me know what you think. I mean,
I'm going to give my contact details at the end,
but already here, you can reach me at mail
at alexocode dot dev, and I really would be curious

(41:09):
to hear what you think about this and how you
use it and if there's any rough edges I kind
of missed.

Speaker 1 (41:14):
Okay, yeah, I don't have more questions. Addie, just double
checking.

Speaker 3 (41:18):
I'm good.

Speaker 2 (41:18):
Yeah.

Speaker 1 (41:19):
Otherwise, again, thanks for coming on. I think we can
transition over to picks.

Speaker 2 (41:23):
Addie. I think you said you had something that you
want to say.

Speaker 3 (41:27):
I didn't say that, but I guess there's something
I'm excited about, a video game that's coming out, I think,
in a couple of months or so. It's called Kingdom
Come: Deliverance II. The first one is a gem. I
highly recommend anyone who hasn't played it to play it, such
a great game. And yeah, the second one's coming out. It
was announced like a month or so ago, super
stoked about it. Yeah, that's it for me. Alex,

Speaker 2 (41:47):
How about you? Yes, sure. I mean, honestly, I'm going
to pick Just here. I mean, it wasn't my plan initially,
but it's nice. It's like one of these tools, as
I mentioned earlier, it does one thing, and it does one thing well.
If you want to have something like Make without
all of the cruft, right, I mean, like if
you do write a Makefile, you always have to
do this phony thing because otherwise it assumes it's actually

(42:08):
a tag, blah blah blah, right? And Just is like
a modern re-envisioning of what Make could be
if you don't presume it's a C thing. Yeah, I'm
really fond of it myself. Check it out. Beyond that,
what else can I I mean, this is like deeply personal,
but I'm going to do it anyway. So, like, if
you are in a situation where you go through like

(42:29):
a breakup, or maybe like in a marriage situation with
a breakup, and you don't want to end up
as this hateful divorced couple, I've recently been reading a
book called Conscious Uncoupling, which is basically about finding
an alternative solution to separation where people don't end up
hating each other, which is especially, I think, helpful when
kids are in the picture. So Conscious Uncoupling by Katherine

(42:51):
Woodward Thomas is worth a read. And yeah, other
than that, if you want to reach me, you can,
like as I mentioned earlier, by mail, but you
can also find me on Mastodon, it's
at hachyderm dot io, at wolf4earth, or you
just check out my website, which is alexocode dot dev.
I mean like everything is linked there, so curious to

(43:12):
hear what you think, folks.

Speaker 1 (43:13):
And then, of course, last and most importantly, there's me,
my pick. So I don't know what's going on, but
there's this huge push for being more green in Hong
Kong over the past half a year or so. I'm all
for that. But the thing that drives me nuts,
I don't know about you guys, I'm guessing, Alex, maybe
you have something like this in Germany, is where they're pushing
to use those paper straws, which I'm okay with,

(43:34):
but the problem is, they just don't work, at least the ones
that we have. You get like two or three good
sips and then it just falls apart. I'm not sure
if you guys have this, probably better ones than we do.

Speaker 2 (43:42):
Yeah, I mean, plastic straws are forbidden in Germany. So
how are your paper straws? Are they actually usable?
That's my question. Okay, it definitely depends on the restaurant and
how cheap they are.

Speaker 3 (43:53):
I carry my own bamboo straws.

Speaker 1 (43:56):
That is what my pick is, right. So I
recently went on Amazon. I picked out a bunch
of stuff I wanted to buy, and I bought a
bunch of metal straws just for this occasion. And I
wondered if they have this, and I searched, and yeah,
they have it. And I'm a little bit excited because
I'm not going to carry around a bunch of full
sized straws all the time. But they have travel metal
straws that go on your keychain. So I'm a little

(44:17):
bit excited for that coming, and it should be coming in tomorrow.
Because you know, sometimes you just want to get a
drink on the side of the road, and they give you the paper straw.

Speaker 2 (44:24):
Yeah, I'm done with that one. They just fall
apart too much. And these are big enough for the
boba tea. That's why I had to specifically get
that one, because I love to get. I hate those
with a passion. I don't know, to each their own,
but yeah, they probably haven't.

Speaker 3 (44:41):
You probably haven't had a good one, Alex.

Speaker 2 (44:43):
That might be, but honestly, I'm also considering
if it's just my neurodiversity. It's just like, the consistency,
just like, nope, my brain just goes nope.

Speaker 1 (44:52):
They make all kinds. I mean, there's one. I think
you're into avocados for now.

Speaker 2 (44:55):
You mean, isn't every vegan into avocados? There's like an
avocado one I had near my house that's pretty good,
made with real avocado and milk.

Speaker 1 (45:03):
It's pretty decent. Yeah, definitely, Like you know, if you're
health conscious, check it out. I haven't tried it out yet,
so I can't totally recommend it. But I'm just excited
because of the stupid push over here.

Speaker 2 (45:12):
Alex, you want to say something? You just unmuted yourself. No,
just want to thank you for having me.

Speaker 1 (45:17):
Yeah, well, I want to thank you for coming on.
It's good to see you again, and I'm happy you're
doing well, so hopefully we'll see you again in the future.

Speaker 2 (45:24):
It's like one last thing potentially. I'm currently working on
a little project of my own. It's not quite ready
to be announced yet, but if you're interested to hear
more of me in the long term, like podcast wise,
then maybe do follow me on social media. So just
on the handles you mentioned earlier, right? Yeah, Mastodon,
at hachyderm dot io, at wolf4earth. But again,

(45:44):
you can also visit my website, alexocode dot dev.

Speaker 1 (45:47):
All right, again, thanks for coming on, and like I said,
hopefully see you again in the future, if you come back
on again in the future with this new project now
I hope.

Speaker 2 (45:53):
So
