Client-Side Data Storage: Keeping It Local

Episode Transcript
Available transcripts are automatically generated. Complete accuracy is not guaranteed.
Speaker 1 (00:00):
So imagine you have a friend, a friend that you
visit every single day. Okay, sounds nice, right? But you
walk up to their house, you knock on the door,
they open it, and they just stare at you completely blankly.

Speaker 2 (00:10):
Oh wow, yeah.

Speaker 1 (00:11):
Totally blank. You have to tell them your name, your favorite food,
where you work, and exactly why you're standing on their
porch every single day.

Speaker 2 (00:21):
That sounds incredibly exhausting and, you know, highly inefficient.

Speaker 1 (00:26):
Exactly, but that is exactly how the traditional internet works.
Every time you load a web page, the browser acts
like it's meeting the server for the very first time.
Complete amnesia.

Speaker 2 (00:35):
Yeah, it's a fundamental flaw in the original design.

Speaker 1 (00:39):
So welcome to the Deep Dive. I'm your host, and
today our mission is to cure the browser's amnesia.

Speaker 2 (00:45):
And as the resident expert today, I'm excited to get
into this.

Speaker 1 (00:48):
Yeah. We are pulling insights today from Raymond Camden's book,
Client-Side Data Storage: Keeping It Local, published
by O'Reilly. We are exploring how to finally give the
browser a permanent memory.

Speaker 2 (01:00):
Which, I mean, fundamentally changes how we interact with
the web, because traditionally, you know, browsers just ask for
documents and servers send them back, right? That simple transaction
is the entire relationship. But by storing data locally, like
right on your hard drive, we flip that dynamic. We
turn the browser into its own standalone engine.

Speaker 1 (01:19):
And the performance gains alone make this totally worth unpacking
for you all listening, because when data is local, you
get immediate access. You aren't waiting for a signal
to, you know, travel across the country to some database
and then come all the way back.

Speaker 2 (01:32):
Yeah. And you also drastically reduce the strain on the
back end infrastructure. I mean, if a server isn't constantly
fetching the exact same user preferences thousands of times a minute.

Speaker 1 (01:42):
It can actually do real work, right.

Speaker 2 (01:44):
It can devote its compute power to complex business logic.
And, well, the holy grail of all this local data
storage is the fully offline web application.

Speaker 1 (01:54):
Of course. The offline app, the dream.

Speaker 2 (01:56):
Exactly. You can build apps that just keep working even
when you drive into a tunnel and drop your cell signal.

Speaker 1 (02:03):
But, you know, I always look for the catch when
a technology promises the holy grail. And based on the sources,
it seems like the biggest hurdle is that none of
these core technologies have built-in synchronization with the cloud.

Speaker 2 (02:18):
That is the massive catch. Handling conflict resolution is entirely
on you, as the developer.

Speaker 1 (02:22):
So it's not magic.

Speaker 2 (02:25):
Not at all. If a user changes data in their browser while offline
and someone else modifies that same record on the server,
you have to write the logic to merge those changes
or, you know, pick a winner.

Speaker 1 (02:37):
That sounds messy.

Speaker 2 (02:39):
It can be. And beyond that, the storage limits across different browsers are
just frustratingly fuzzy.

Speaker 1 (02:43):
Fuzzy how? Like, they don't tell you the limit?

Speaker 2 (02:44):
Pretty much. You rarely get a hard cap. Yeah,
the browser simply evicts old data when the drive gets
too full, and it bases that on its own opaque algorithms.

Speaker 1 (02:53):
Yikes.

Speaker 2 (02:54):
Yeah. And finally, you have to remember these tools are
not a replacement for a heavy-duty database like Oracle.

Speaker 1 (03:01):
Right, so you still need to use the right tool
for the right job. I like to frame this evolution
kind of like upgrading an office. So we're going to
start by looking at the browser's original memory system, which
is basically a tiny, highly inefficient sticky note.

Speaker 2 (03:14):
A very apt analogy, yeah.

Speaker 1 (03:16):
And by the end we are going to build a massive,
highly organized asynchronous filing cabinet.

Speaker 2 (03:22):
I love that. So let's start with that tiny sticky
note that represents the oldest technology in the book, which
is cookies.

Speaker 1 (03:28):
Cookies.

Speaker 2 (03:28):
Yeah, they date all the way back to a Netscape
beta in nineteen ninety-four.

Speaker 1 (03:34):
Wow, nineteen ninety-four.

Speaker 2 (03:35):
Right, so they carry the immense baggage of how the
web used to work. They function essentially as just small
text values sent back and forth via HTTP headers.

Speaker 1 (03:44):
And here's where I have to really push back
on the entire premise of using cookies for storage, because
if the whole goal of client-side storage is to
keep data off the network to save bandwidth, cookies defeat
the entire purpose. The sources explicitly
state they get sent along with every single HTTP request.

Speaker 2 (04:03):
Every single one. Whether you are asking for a massive
HTML document or just, you know, pinging the server for
a tiny favicon image, it doesn't matter.

Speaker 1 (04:10):
Right. The browser diligently packages up all your
cookies for that domain and attaches them to the header
of the request.
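To put a rough number on that overhead, here is a small TypeScript sketch. It builds the `Cookie` request header the way a browser joins cookies (`name=value` pairs separated by `"; "`) and multiplies its length by a request count. The cookie values and the count of 50 requests are hypothetical, and treating string length as the byte count assumes plain ASCII names and values.

```typescript
// Build the Cookie request header a browser would attach for these cookies.
function cookieHeader(cookies: Record<string, string>): string {
  const pairs = Object.keys(cookies).map((name) => `${name}=${cookies[name]}`);
  return `Cookie: ${pairs.join("; ")}`;
}

// Rough upload overhead: the same header repeated on every request.
// Assumes ASCII-only names and values, so one character is one byte.
function cookieOverheadBytes(cookies: Record<string, string>, requests: number): number {
  return cookieHeader(cookies).length * requests;
}

// Hypothetical page: two small cookies, 50 requests (HTML, scripts, images, favicon...).
const cookies = { name: "Raymond", age: "43" };
console.log(cookieHeader(cookies)); // Cookie: name=Raymond; age=43
console.log(cookieOverheadBytes(cookies, 50)); // 1400
```

If the cookies for a domain add up to around 4 KB instead, the same arithmetic gives roughly 200 KB of upload across those 50 requests, which is the brick in the analogy.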

Speaker 2 (04:19):
It is the equivalent of writing your personal preferences on
a heavy brick and then mailing that brick back and
forth to a store every time you want to ask
them a simple question.

Speaker 1 (04:28):
That's a great way to put it.

Speaker 2 (04:29):
Like, you ask, "Hey, do you have this shirt in blue?"
and you mail the heavy brick. They reply, "Yes, we do,"
and they mail the exact same brick back to you.
The network drag is just absurd.

Speaker 1 (04:38):
It is, and developers have absolutely abused that mechanism over
the decades. So browsers eventually had to impose limits.

Speaker 2 (04:46):
Well, kind of limits.

Speaker 1 (04:47):
Well, you're generally restricted to about fifty cookies per domain
and a size limit of just four kilobytes per cookie.

Speaker 2 (04:56):
Four kilobytes.

Speaker 1 (04:57):
That's nothing. It's tiny.

Speaker 2 (04:59):
But the author actually decided to stress-test those limits,
which I love. The results were hilarious.

Speaker 1 (05:03):
Oh, this is a great story. Yeah.

Speaker 2 (05:05):
He pushed over four hundred cookies into the Google Chrome
browser just to see what would break first, and Chrome
handled the massive volume perfectly fine. It didn't stutter at all.
The browser was fine. The web server he was talking to, however,
completely crashed.

Speaker 1 (05:20):
Wait, really? It took down the server?

Speaker 2 (05:22):
Yeah, because Apache and Nginx web servers have limits on request header size.
It's a security measure to prevent denial-of-service attacks.

Speaker 1 (05:31):
Oh, that makes sense.

Speaker 2 (05:32):
So by bloating the requests with hundreds of cookies, the
client effectively launched a denial-of-service attack on its own server.

Speaker 1 (05:38):
That is wild.

Speaker 2 (05:39):
The server just couldn't parse the sheer volume of header data,
so it failed.

Speaker 1 (05:43):
So we have this tiny four-kilobyte capacity and massive
network bloat. And to make matters even worse, the API
for writing a cookie is incredibly clunky.

Speaker 2 (05:53):
Oh, it's terrible.

Speaker 1 (05:54):
Like, to set a cookie, you assign a string to
document.cookie, so you write something like
document.cookie = "name=Raymond". Simple enough, right? Sure. But then to set
a second cookie, you just run the exact same assignment:
document.cookie = "age=43".

Speaker 2 (06:07):
Which is wild because in almost any other programming paradigm,
assigning a new value to an existing variable overwrites the
old one.

Speaker 1 (06:15):
Right. If x equals one and then x equals two,
x is two.

Speaker 2 (06:20):
Exactly. But the document.cookie API magically appends it instead.
It violates basic programming intuition.
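The append-instead-of-overwrite behavior is easier to see in a model. The class below is a hypothetical in-memory stand-in for `document.cookie`, not a browser API: assigning to `cookie` adds or replaces one cookie by name, and reading it joins every stored cookie into one string. Attributes such as `path` or `expires` are simply dropped to keep the sketch short.

```typescript
// Hypothetical in-memory model of document.cookie's setter/getter pair.
class CookieJarModel {
  // A Map keeps insertion order, and updating an existing key keeps its position.
  private readonly jar = new Map<string, string>();

  // Each assignment adds or updates ONE cookie; the others are left alone.
  set cookie(assignment: string) {
    const pair = assignment.split(";")[0]; // drop attributes like "; path=/"
    const eq = pair.indexOf("=");
    if (eq === -1) return; // this model ignores assignments without "="
    this.jar.set(pair.slice(0, eq).trim(), pair.slice(eq + 1).trim());
  }

  // Reading returns every cookie as a single "; "-separated string.
  get cookie(): string {
    return Array.from(this.jar.entries(), ([name, value]) => `${name}=${value}`).join("; ");
  }
}

const doc = new CookieJarModel();
doc.cookie = "name=Raymond";
doc.cookie = "age=43"; // looks like an overwrite, but it adds a second cookie
console.log(doc.cookie); // name=Raymond; age=43
doc.cookie = "age=44"; // same name: only this cookie is replaced
console.log(doc.cookie); // name=Raymond; age=44
```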

Speaker 1 (06:26):
And reading them is a total nightmare of string parsing.
You ask for document.cookie and the browser just
spits back one giant, messy string of text with all
your cookies separated by semicolons.

Speaker 2 (06:38):
Yeah, you are basically forced to write your own regular
expressions just to chop that string up and find the
specific piece of data you actually want.
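A sketch of that parsing step, splitting on semicolons rather than using a regular expression. It takes the raw string that `document.cookie` returns and builds a `Map`. Decoding values with `decodeURIComponent` assumes they were written with `encodeURIComponent`, which is a common convention, not a requirement of the API.

```typescript
// Turn the single string from document.cookie, e.g. "name=Raymond; age=43",
// into a Map of name -> value.
function parseCookieString(raw: string): Map<string, string> {
  const cookies = new Map<string, string>();
  for (const part of raw.split(";")) {
    const eq = part.indexOf("=");
    if (eq === -1) continue; // skip fragments with no name=value shape
    const name = part.slice(0, eq).trim();
    const value = part.slice(eq + 1).trim();
    // Keep the first occurrence: browsers list cookies with longer paths first.
    if (!cookies.has(name)) cookies.set(name, safeDecode(value));
  }
  return cookies;
}

// decodeURIComponent throws on malformed escapes; fall back to the raw text.
function safeDecode(value: string): string {
  try {
    return decodeURIComponent(value);
  } catch {
    return value;
  }
}

const parsed = parseCookieString("name=Raymond; age=43; greeting=hello%20there");
console.log(parsed.get("age")); // 43
console.log(parsed.get("greeting")); // hello there
```

In a browser the call would be `parseCookieString(document.cookie)`.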

Speaker 1 (06:46):
Deleting them requires an absurd workaround. You can't simply invoke
a delete command.

Speaker 2 (06:51):
Yeah, that would be too easy.

Speaker 1 (06:52):
You literally must reset the cookie with an expiration date
in the past. Developers traditionally used January first, nineteen seventy,
the Unix epoch.
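A minimal sketch of that workaround: a helper, named here purely for illustration, that builds the assignment string with an `expires` date at the Unix epoch. The path has to match the one the cookie was set with, otherwise the browser treats it as a different cookie. Browsers also accept `max-age=0` for the same effect.

```typescript
// Build the string to assign to document.cookie in order to delete a cookie.
// new Date(0) is 1 January 1970 00:00:00 UTC, the Unix epoch.
function expiredCookie(name: string, path = "/"): string {
  const epoch = new Date(0).toUTCString(); // "Thu, 01 Jan 1970 00:00:00 GMT"
  return `${name}=; expires=${epoch}; path=${path}`;
}

console.log(expiredCookie("age"));
// age=; expires=Thu, 01 Jan 1970 00:00:00 GMT; path=/
```

In a page you would then run `document.cookie = expiredCookie("age")`.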

Speaker 2 (07:01):
Yeah, you are essentially forced to time travel just to
kill a text string.

Speaker 1 (07:04):
It's so hacky. So if cookies are basically a network
tax that we want to avoid, where are developers actually
putting this data today? I mean, I know I'm not
mailing a brick every time I use an offline text
editor in my browser. We clearly threw away the sticky
note and bought a real local locker.

Speaker 2 (07:19):
We did. The industry shifted to web storage, which is
almost universally known as local storage.

Speaker 1 (07:25):
Okay, the upgrade, right?

Speaker 2 (07:27):
It gives you roughly five to ten megabytes of space,
depending on the browser vendor. And crucially, this data never
touches the network automatically.

Speaker 1 (07:36):
That's the key.

Speaker 2 (07:36):
Yeah, it stays quietly on the local machine until your
JavaScript explicitly asks for it.

Speaker 1 (07:42):
And it provides two distinct flavors, right? Local storage persists
forever until you intentionally clear it, and session storage dies
the second you close the browser tab.

Speaker 2 (07:51):
Exactly, and the mechanics are beautifully simple compared to the
cookie nightmare. You only have four clear commands:
setItem to save data, getItem to
read it, removeItem to delete a specific entry, and
clear to wipe the whole locker.
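Here are the four calls in use. So the sketch runs outside a browser, it defines a small in-memory class with the same method names as the Web Storage API; in a page you would call them on the built-in `localStorage` or `sessionStorage` objects instead. Like the real API, this stand-in converts every value to a string and returns `null` for a missing key.

```typescript
// In-memory stand-in mirroring the four Web Storage methods.
class MemoryStorage {
  private readonly items = new Map<string, string>();

  setItem(key: string, value: unknown): void {
    this.items.set(key, String(value)); // real Web Storage also keeps strings only
  }

  getItem(key: string): string | null {
    const value = this.items.get(key);
    return value === undefined ? null : value;
  }

  removeItem(key: string): void {
    this.items.delete(key);
  }

  clear(): void {
    this.items.clear();
  }
}

const store = new MemoryStorage(); // in a browser: localStorage
store.setItem("theme", "dark");
store.setItem("fontSize", 16); // saved as the string "16"
console.log(store.getItem("fontSize") === "16"); // true
store.removeItem("theme");
console.log(store.getItem("theme")); // null
store.clear();
```

Passing an object to `setItem` stores the text "[object Object]", which is where the string trap discussed next comes from.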

Speaker 1 (08:06):
It is so much cleaner.

Speaker 2 (08:08):
It is. Security is also baked in. Local storage is strictly scoped
to the origin, essentially the domain, so if you save a preference on
foo dot com, a script running on goo dot com
cannot peek into that storage locker.

Speaker 1 (08:18):
Makes total sense, but the sources do warn about
a specific limitation here, which they call the string trap.

Speaker 2 (08:24):
Ah, yes, the string trap.

Speaker 1 (08:27):
Because web storage only holds strings. I know the
standard workaround is to run objects through JSON.stringify
before saving them, but the text implies there's a much
bigger issue here than just, you know, typing a few
extra characters to encode an array.

Speaker 2 (08:42):
You're right. The hidden tax here is CPU cycles and thread blocking.

Speaker 1 (08:45):
Thread blocking?

Speaker 2 (08:47):
Yeah. If you have a massive, complex JavaScript object, maybe a
deeply nested user profile with thousands of relational data points,
and you try to save it to local storage, the
browser has to serialize the entire object into a text string,
and JSON.stringify is a synchronous operation. It blocks the
main thread.

Speaker 1 (09:06):
Oh, wow. So while the browser is busy converting,
say, five megabytes of application state into a massive text string,
the user's interface completely freezes.

Speaker 2 (09:16):
Completely. They can't scroll, they can't click buttons, animations stutter and stop.

Speaker 1 (09:19):
That's terrible user experience.

Speaker 2 (09:21):
It's really bad. And the same penalty applies on the
way out. When you pull that five-megabyte string back
out of local storage, JSON.parse has to reconstruct the
object synchronously.

Speaker 1 (09:30):
So it freezes again.

Speaker 2 (09:31):
Exactly. So, for simple preferences, the string trap is trivial,
but for state-heavy web applications it creates a massive
performance bottleneck.
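For small values, a common way to live with the string trap is a pair of helpers that wrap JSON.stringify and JSON.parse. The sketch below accepts anything with `getItem` and `setItem`; the `StringStore` interface and the helper names are assumptions made so it runs outside a browser, and `localStorage` satisfies that interface. Both JSON calls still run synchronously on the main thread, so this suits preferences, not megabytes of application state.

```typescript
// Anything with getItem/setItem, e.g. window.localStorage.
interface StringStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Serialize before saving: Web Storage only holds strings.
function saveJSON(store: StringStore, key: string, value: unknown): void {
  store.setItem(key, JSON.stringify(value)); // synchronous; cost grows with size
}

// Parse on the way out; fall back if the key is missing or holds invalid JSON.
function loadJSON<T>(store: StringStore, key: string, fallback: T): T {
  const raw = store.getItem(key);
  if (raw === null) return fallback;
  try {
    return JSON.parse(raw) as T; // also synchronous
  } catch {
    return fallback;
  }
}
```

For example, `saveJSON(localStorage, "prefs", { theme: "dark" })` followed by `loadJSON(localStorage, "prefs", { theme: "light" })` would return the saved object. JSON only round-trips plain data: a `Date`, for instance, comes back as a string.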

Speaker 1 (09:40):
That makes perfect sense. Okay, before we move on from
local storage, we have to talk about the storage event.

Speaker 2 (09:45):
Yes, this is a fun one.

Speaker 1 (09:47):
Because you can attach an event listener to the window
object that fires whenever the data in local storage changes.
But the twist absolutely baffled me at first glance.

Speaker 2 (09:57):
Which part?

Speaker 1 (09:58):
If I have a tab open and I type something
into a form that updates local storage, the event ignores
the tab that made the change. It just doesn't fire there.

Speaker 2 (10:06):
I know, it seems totally counterintuitive until you
look at the architectural purpose of the event.

Speaker 1 (10:11):
Okay, lay it on me.

Speaker 2 (10:12):
The tab that initiated the change already knows the data
was updated because it executed the code itself. The storage
event is strictly designed for cross-tab communication.

Speaker 1 (10:22):
Wait, if I have two tabs open, aren't they
aggressively sandboxed from each other for security? How does tab
B even know tab A touched local storage? Is
the browser itself acting like a local pub/sub message broker?

Speaker 2 (10:36):
That is the absolute perfect way to visualize it. The
browser acts as the central message broker.

Speaker 1 (10:42):
Wow.

Speaker 2 (10:43):
Imagine you have two tabs open to the exact same
shopping application. In tab A, you click Add to Cart,
and the application saves the new cart state to local storage.
Because the storage for that domain was modified, the browser's
internal pub/sub system broadcasts the storage event to tab B.

Speaker 1 (11:02):
And the event payload even includes the old value and the new
value, so tab B can inspect the payload, see the
new item, and instantly update the little cart icon in
its own navigation bar.

Speaker 2 (11:14):
Yep. You have synchronized two completely separate sandboxed tabs in
real time without sending a single byte of data to
a server.
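A sketch of tab B's side. The real event is a `StorageEvent` with `key`, `oldValue`, and `newValue` properties; the handler below depends only on those three fields, so it can run outside a browser. The `"cart"` key and the assumption that the cart is stored as a JSON array are illustrative, not taken from the book.

```typescript
// The subset of StorageEvent that this handler reads.
interface StorageChange {
  key: string | null; // null when the other tab called clear()
  oldValue: string | null;
  newValue: string | null;
}

// Returns the new cart badge count, or null if the change isn't about the cart.
function cartCountFromStorageEvent(event: StorageChange): number | null {
  if (event.key === null) return 0; // all storage was cleared, so the cart is empty
  if (event.key !== "cart") return null; // hypothetical key name
  if (event.newValue === null) return 0; // the cart entry was removed
  try {
    const items: unknown = JSON.parse(event.newValue);
    return Array.isArray(items) ? items.length : null;
  } catch {
    return null; // not valid JSON: leave the badge alone
  }
}
```

In tab B this would be wired up with `window.addEventListener("storage", ...)`, updating the badge whenever the function returns a number. Tab A, which made the change, never receives the event.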

Speaker 1 (11:21):
That is incredibly cool. It makes the web feel alive
and responsive.

Speaker 2 (11:25):
It really does. But, you know, as we discussed with the string trap,
local storage hits a wall when you need to store
complex or massive data sets.

Speaker 1 (11:31):
Right, because of the UI freezing.

Speaker 2 (11:32):
Yeah, the ten-megabyte capacity and the synchronous, main-thread-blocking
nature of the API. We eventually need
a heavy-duty solution.

Speaker 1 (11:39):
We need the massive filing cabinet, which brings us to
IndexedDB. This is the deep end of the pool
for client-side storage.

Speaker 2 (11:46):
It really is.

Speaker 1 (11:47):
It handles massive data sets and operates entirely asynchronously, meaning
it does all its heavy lifting on background threads without
freezing the user interface.

Speaker 2 (11:57):
Which is exactly what we want. It is incredibly powerful,
but it earned a reputation for having a notoriously unfriendly API.

Speaker 1 (12:05):
Unfriendly is an understatement. The sources mention that historically, support
for IndexedDB on iOS 8 was so fundamentally broken
that developers literally wrote regular expressions to sniff out
Apple mobile devices.

Speaker 2 (12:17):
Oh, the dark days of web development.

Speaker 1 (12:20):
Like, if the regex matched an iPhone, the code just
pretended IndexedDB didn't exist at all and fell back to
older tech. What was actually breaking under the hood to
force that kind of extreme workaround?

Speaker 2 (12:30):
So WebKit's early implementation of IndexedDB on iOS was tied
to a very clumsy SQLite backing store, and the
translation between the asynchronous, object-oriented nature of IndexedDB
and the rigid, synchronous nature of the underlying SQL database
was deeply flawed.

Speaker 1 (12:45):
How bad was it?

Speaker 2 (12:46):
Terrible. Developers experienced catastrophic data loss, cursors skipping records, and
just totally unpredictable crashes.

Speaker 1 (12:53):
So they just bypassed it entirely.

Speaker 2 (12:55):
I mean, I would too. Yeah, exactly. Thankfully, the browser
engines have matured a lot, and modern support is highly reliable,
but the architecture still requires a huge mental shift, especially
if you are coming from traditional relational databases like MySQL.

Speaker 1 (13:10):
Right, because you do not have SQL tables with rigid predefined columns.

Speaker 2 (13:15):
Instead, IndexedDB uses databases at the top level, and
inside those you create object stores.

Speaker 1 (13:21):
So the database is the filing cabinet. The object store
is a specific drawer. You might have one drawer labeled
users and another labeled invoices. But inside that drawer
it is a schemaless free-for-all, because it stores
raw JavaScript objects natively, avoiding that JSON.stringify
tax we talked about earlier. You can throw
wildly different data shapes into the exact same object store.

Speaker 2 (13:44):
Yeah. The author gives a brilliant example of this flexibility. You
could insert a user record where the age property is
an integer like the number forty-two, and in the
very next record the age property could be the literal
text string "too old to matter," and IndexedDB accepts both without
throwing a single schema validation error.

Speaker 1 (14:03):
But the obvious question is, just because the database lets
you mix data shapes natively, does that mean you should
actually design your applications that way?

Speaker 2 (14:10):
Oh, absolutely not. Structure is still paramount for maintainability, even
without rigid columns. Every single record you place into an
object store must be identifiable by a primary key.
The database needs a unique identifier to fetch or update
that specific record later.

Speaker 1 (14:27):
And the API gives you two distinct ways to handle that, right?
A key path or a key generator.

Speaker 2 (14:32):
That's right.

Speaker 1 (14:32):
So if I am storing user profiles that inherently contain
unique email addresses or, say, Social Security numbers, I assume
I would use a key path.

Speaker 2 (14:41):
Exactly. You instruct the database to look inside the data payload itself.
You define the key path as the email property, and IndexedDB
automatically extracts that value to use as the record's primary
key.

Speaker 1 (14:53):
Got it. But if I'm just saving random offline drafts
of blog posts, the data doesn't really have a natural
unique ID. I need the database to handle it for me.

Speaker 2 (15:02):
And that is the precise use case for a key generator.
You configure the object store to auto-increment a
numerical ID.

Speaker 1 (15:09):
Oh, nice.

Speaker 2 (15:10):
Yeah. Every time you hand it a new draft, it assigns it
ID one, then ID two, maintaining uniqueness automatically behind the scenes.
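To make the two strategies concrete, here is a toy in-memory model. It is not the IndexedDB API, only an illustration of how a store picks a primary key: with a key path it reads the key out of the record, and with a key generator it hands out 1, 2, 3 and so on. In real IndexedDB the choice is made when the store is created, for example `db.createObjectStore("users", { keyPath: "email" })` versus `db.createObjectStore("drafts", { autoIncrement: true })`; the store names and records here are made up.

```typescript
// Toy model of primary-key assignment in an object store (not IndexedDB itself).
interface KeyOptions {
  keyPath?: string; // read the key from this property of each record
  autoIncrement?: boolean; // or generate 1, 2, 3, ...
}

class ToyObjectStore {
  private readonly records = new Map<unknown, object>();
  private readonly options: KeyOptions;
  private nextKey = 1; // IndexedDB key generators also start at 1

  constructor(options: KeyOptions) {
    this.options = options;
  }

  // Like IndexedDB's add(): refuses to overwrite an existing key.
  add(record: Record<string, unknown>): unknown {
    const key = this.keyFor(record);
    if (this.records.has(key)) throw new Error(`Key already exists: ${String(key)}`);
    this.records.set(key, record);
    return key;
  }

  get(key: unknown): object | undefined {
    return this.records.get(key);
  }

  private keyFor(record: Record<string, unknown>): unknown {
    const { keyPath, autoIncrement } = this.options;
    if (keyPath !== undefined) {
      const key = record[keyPath];
      if (key === undefined) throw new Error(`Record has no "${keyPath}" property`);
      return key;
    }
    if (autoIncrement) return this.nextKey++;
    // Real IndexedDB would expect an explicit key argument here instead.
    throw new Error("This model needs keyPath or autoIncrement");
  }
}

// Only the key is checked, so the "age" values can have different types.
const users = new ToyObjectStore({ keyPath: "email" });
users.add({ email: "ray@example.com", age: 42 });
users.add({ email: "sam@example.com", age: "too old to matter" });

const drafts = new ToyObjectStore({ autoIncrement: true });
console.log(drafts.add({ title: "First draft" })); // 1
console.log(drafts.add({ title: "Second draft" })); // 2
```

IndexedDB also allows combining a key path with a key generator, which this sketch does not model.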

Speaker 1 (15:17):
And finding those records later requires creating indexes so you
don't have to scan the entire drawer sequentially. You can
index by date, by author, by status.

Speaker 2 (15:26):
Yep. It makes searching incredibly fast.

Speaker 1 (15:29):
But the sources highlight a very strict security and structural rule here.
You cannot just inject new object stores or create new
indexes on the fly, whenever your application code feels like it.

Speaker 2 (15:40):
No, you are restricted from modifying the schema loosely. You
can only alter the structure of your database during a
highly specific browser event called upgradeneeded.

Speaker 1 (15:49):
So if I build an application today, and three months
from now I deploy an update that requires a new
object store for user-uploaded images, what is stopping my
JavaScript from just creating that drawer instantly?

Speaker 2 (16:04):
Concurrency conflicts and data corruption.

Speaker 1 (16:07):
Ah.

Speaker 2 (16:07):
Imagine a scenario where the user has two tabs open.
Tab A is currently writing massive amounts of data into
an object store.

Speaker 1 (16:15):
Okay.

Speaker 2 (16:16):
Meanwhile, the code in tab B decides it wants to
delete that entire object store and replace it.

Speaker 1 (16:21):
That sounds like a disaster.

Speaker 2 (16:23):
It is. The database would be left in a completely
corrupted state.

Speaker 1 (16:26):
So the browser forces version control.

Speaker 2 (16:28):
Precisely. When you bump the version number of
the database, say from version one to version two, the
browser intercepts the open request and fires the upgradeneeded event.

Speaker 1 (16:37):
And what does that do?

Speaker 2 (16:38):
It acts as an exclusive lock. Other open connections, in other
tabs for example, are notified and must close before the browser
grants you the one and only window to safely remodel
the filing cabinet. And once the upgradeneeded
event finishes, the new structure is locked into place.
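A sketch of what the upgrade code can look like. The function holds the schema steps and is meant to be called from the open request's `upgradeneeded` handler, with the database and the event's `oldVersion`. Each `if (oldVersion < n)` block runs only for users who haven't reached version n yet, so someone jumping from version 0 straight to 2 gets both steps in order. The store and index names are hypothetical, and the two small interfaces exist only so the sketch type-checks without the DOM library; real `IDBDatabase` and `IDBObjectStore` objects fit them.

```typescript
// Minimal shapes of IDBObjectStore / IDBDatabase used below.
interface IndexTarget {
  createIndex(name: string, keyPath: string): unknown;
}
interface SchemaTarget {
  createObjectStore(name: string, options?: { keyPath?: string; autoIncrement?: boolean }): IndexTarget;
}

// Apply every schema step newer than the database's current version.
function upgradeSchema(db: SchemaTarget, oldVersion: number): void {
  if (oldVersion < 1) {
    // Version 1: users keyed by email, drafts with generated keys.
    db.createObjectStore("users", { keyPath: "email" });
    const drafts = db.createObjectStore("drafts", { autoIncrement: true });
    drafts.createIndex("by_date", "date"); // indexes can also only be created here
  }
  if (oldVersion < 2) {
    // Version 2: the new drawer for user-uploaded images.
    db.createObjectStore("images", { autoIncrement: true });
  }
}
```

In a browser the wiring would be roughly `const request = indexedDB.open("app", 2)` followed by `request.onupgradeneeded = (event) => upgradeSchema(request.result, event.oldVersion)`. If another tab keeps its connection open after receiving `versionchange`, the request fires `blocked` and waits.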

Speaker 1 (16:52):
That forces incredible intentionality on the developer. You really have
to plan ahead.

Speaker 2 (16:57):
You do. And speaking of safety mechanisms, all the actual interactions
with the data (creating, reading, updating, deleting) must happen inside
a transaction.

Speaker 1 (17:06):
And the transaction acts as an atomic safety wrapper around
your database operations. Like, if you attempt to update five
related records and a script error occurs, or the device
loses power, right as the fourth record is being modified,
the transaction fails entirely.

Speaker 2 (17:23):
Exactly. The database doesn't just leave the first three changes sitting
there in a half-finished state. It automatically rolls back
all the operations to ensure the data remains pristine and consistent.
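The all-or-nothing idea can be modelled in a few lines. This is a toy, not how browsers implement it: writes are staged on a copy of an in-memory `Map` and copied back only if every step succeeds. In IndexedDB itself the guarantee comes from a `readwrite` transaction created with `db.transaction(storeNames, "readwrite")`: if it aborts, through `transaction.abort()` or an unhandled request error, none of its writes are kept.

```typescript
// Toy all-or-nothing update: stage writes on a copy, keep them only on success.
function runAtomically<K, V>(store: Map<K, V>, work: (draft: Map<K, V>) => void): boolean {
  const draft = new Map(store); // every write lands on this copy first
  try {
    work(draft);
  } catch {
    return false; // "roll back": the original map was never modified
  }
  // "commit": copy the staged state back into the real store
  store.clear();
  draft.forEach((value, key) => store.set(key, value));
  return true;
}

// Hypothetical batch: five records, with a failure on the fourth.
const records = new Map([["r1", 0], ["r2", 0], ["r3", 0], ["r4", 0], ["r5", 0]]);
const committed = runAtomically(records, (draft) => {
  for (const id of ["r1", "r2", "r3", "r4", "r5"]) {
    if (id === "r4") throw new Error("simulated failure on the fourth record");
    draft.set(id, (draft.get(id) ?? 0) + 1);
  }
});
console.log(committed, records.get("r1")); // false 0
```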

Speaker 1 (17:33):
It's an all or nothing approach.

Speaker 2 (17:34):
Total data integrity. And moving from writing data to reading
it introduces another fascinating mechanism: the beaver. When
you need to fetch massive amounts of data, say you
want to retrieve all one hundred thousand historical temperature readings
stored in the database, you do not just run a
query and dump all one hundred thousand records into an array.

Speaker 1 (17:53):
Right, because doing so could overflow the browser's
available memory and crash the tab. We have to
stream it. And the API uses a tool called a cursor,
and the author uses an incredible analogy to explain how
it operates. A cursor is not a blinking line on
a screen. It is a happy little beaver.

Speaker 2 (18:14):
It visualizes the iteration process so perfectly. Instead of loading
the entire data set into memory, the cursor fetches one
record at a time.

Speaker 1 (18:23):
The happy little beaver runs into the object store, grabs
the very first temperature reading, scurries back out, drops it
at your JavaScript's feet, and pauses.

Speaker 2 (18:31):
Good beaver.

Speaker 1 (18:32):
Right. It waits for you to process the data, render
it to the screen, or calculate an average. Once you are done,
you tell the cursor to continue, and the beaver runs
back in to fetch the second record.

Speaker 2 (18:41):
It streams the data sequentially. You can instruct the cursor
to iterate backward, or to only fetch records that fall
within a specific date range on an index, ensuring you process large
data sets with minimal memory overhead.
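The beaver's one-record-at-a-time behavior maps naturally onto a generator. This sketch walks an array already sorted by date, the way an index keeps its entries ordered, yields only readings inside a date range, and can walk backward. It is a model of the idea, not the browser API: there, the rough equivalent is `index.openCursor(IDBKeyRange.bound(lower, upper), "prev")`, with `cursor.continue()` sending the beaver back for the next record. Unlike a real index, this model scans the whole array, and the `Reading` shape and dates are invented for the example.

```typescript
interface Reading {
  date: string; // ISO dates such as "2024-01-02" sort correctly as plain strings
  celsius: number;
}

// Yield readings whose date is within [lower, upper], one at a time.
// "next" walks forward and "prev" walks backward, mirroring cursor directions.
function* readingsInRange(
  sortedByDate: readonly Reading[],
  lower: string,
  upper: string,
  direction: "next" | "prev" = "next",
): Generator<Reading, void, undefined> {
  const step = direction === "next" ? 1 : -1;
  const start = direction === "next" ? 0 : sortedByDate.length - 1;
  for (let i = start; i >= 0 && i < sortedByDate.length; i += step) {
    const reading = sortedByDate[i];
    if (reading.date >= lower && reading.date <= upper) {
      yield reading; // pause until the caller asks for the next record
    }
  }
}

const readings: Reading[] = [
  { date: "2024-01-01", celsius: 3 },
  { date: "2024-01-02", celsius: 5 },
  { date: "2024-01-03", celsius: 4 },
];

// Each next() call is one trip of the beaver: one record per step.
const cursor = readingsInRange(readings, "2024-01-02", "2024-01-03");
let total = 0;
let count = 0;
while (true) {
  const result = cursor.next();
  if (result.done) break;
  total += result.value.celsius;
  count += 1;
}
console.log(total / count); // 4.5
```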

Speaker 1 (18:53):
We have covered a massive architectural shift today, from tiny
text strings to asynchronous beavers managing gigabytes of local storage.
Synthesizing this toolkit really requires understanding when to deploy each
technology.

Speaker 2 (19:07):
Exactly. So to break it down, cookies should be relegated almost
entirely to legacy authentication tokens or data the server explicitly
needs to see on every single network request.

Speaker 1 (19:18):
Basically, leave the heavy brick at home to save bandwidth.

Speaker 2 (19:22):
Please do. Step up to local storage for quick, synchronous interface
state, basic JSON preferences, and that brilliant cross-tab messaging system.

Speaker 1 (19:31):
And for the heavy lifting.

Speaker 2 (19:32):
Finally, when the architecture demands robust, offline-capable applications processing
massive data sets without freezing the user interface, you invest
the time to learn IndexedDB.

Speaker 1 (19:42):
Mastering these tools is the dividing line between a web
page that feels clunky and fragile and a modern web
application that feels lightning fast and actually respects the user's
data plan. We finally have the medicine to cure the
browser's amnesia.

Speaker 2 (19:55):
We do, and, you know, it prompts a rather profound
reevaluation of the modern technology stack. Browsers have evolved
from simple document viewers that forget everything the moment you
close the window into massive localized databases capable of executing
complex business logic entirely offline.

Speaker 1 (20:12):
Which fundamentally challenges the dominance of the cloud.

Speaker 2 (20:16):
It really does. If the application logic, the interface rendering,
and the primary data storage all live entirely on your
laptop or your phone, the server is no longer the
main character in the transaction. We are approaching a
threshold where your local device becomes the ultimate source of
truth and the Internet is reduced to nothing more than
a synchronization cable to a cloud backup.

Speaker 1 (20:38):
That is wild to think about. Imagine going back to
that friend's house tomorrow. You knock on the door, they
open it, and not only do they know your name instantly,
but they have your favorite coffee already poured, your favorite
music playing, and a meticulously organized filing cabinet containing every
conversation you've ever had.

Speaker 2 (20:53):
A profound shift from amnesia to instantaneous local recall.

Speaker 1 (20:59):
The browser remembers now. Go build something that doesn't forget.
