Episode Transcript
Available transcripts are automatically generated. Complete accuracy is not guaranteed.
Speaker 1 (00:01):
All right, welcome everybody to the SNIA Experts on Data podcast. My name is Eric Wright. I'm the Chief Content Officer at GTM Delta and the host of the SNIA Experts on Data podcast. Super excited, because today I'm welcoming Rick Kutcipal, who is somebody we've been talking about. A lot of the work that has been happening in the industry
(00:21):
around innovation, and we've got work that's happening with the STA community. We've got collaboration around the work that's happening within SNIA as well. A great opportunity to have a great discussion. But before I get too far, for folks that are brand new to you, Rick, if you want to do a quick bio and introduction, and then
(00:42):
we're going to jump in.
Speaker 2 (00:44):
Sure. Thanks, Eric. So my name is Rick Kutcipal. I'm a product planner within Broadcom's data center solutions group, and I'm also a member of the board of directors of the SCSI Trade Association.
Speaker 1 (00:57):
Now this is... it's always fun, because I'm seeing more and more where, when we started... while this is the SNIA podcast, we talk with people from, you know, IEEE, the SCSI Trade Association and many of the other standards
(01:17):
bodies and communities, but folks may not know the difference or what the origins are. If you want to talk about STA, and we mentioned T10 as well in some of the chat we had prior to this, let's unpack the acronyms for folks, Rick, and we'll talk about what's going on, right?
Speaker 2 (01:33):
So I guess let me start with STA. We're the SCSI Trade Association, so we're part of SNIA, and consider us the marketing arm for T10. T10 is an INCITS organization, and that's where the real technical work gets done. They develop the specs, and then STA helps guide some
(01:54):
of that and then promotes it within the industry. And it's not just SAS, really, it's all things SCSI, but we are branded as the SCSI Trade Association.
Speaker 1 (02:09):
It's interesting, because everybody nowadays, when we talk about what's coming and what's new and the exciting industry trends, the first thing everybody does, of course, their AI starts dripping off their tongues. And it's all about all these new, fantastic, amazing things that we think we're doing but we're actually not necessarily
(02:30):
doing at the level we think. You know, I think there's an over-rotation to this really far-reaching stuff that's happening today, and there are interesting use cases, but we forget that there's still a ton of innovation going on in tried-and-true technologies that have been around for a while. So how does this come in? When we say SAS has got something new, people may be
(02:51):
going, really? So what is going on in the STA community?
Speaker 2 (02:58):
Yeah, so that's a good question. You know, the SAS spec has gotten to 24 gig. Everybody's aware of the 24 gig revision of the spec. But what people don't understand is that the SCSI stack itself is very layered, and the very bottom of that, in this case, would be the SAS-4 physical
(03:21):
layer, and on top of that there are many different areas, and that's where the innovation is coming: to support features focused on the needs of the hyperscalers, with capacity and rotational media, as well as the performance needs of Flash.
Speaker 1 (03:50):
And it's funny, because we think of the scale at which we're working, and quite often that's where people think everything is going to be sort of bigger, better, faster, more.
(03:55):
But it's also about, you know, consistent programmability. It's about consistent APIs. It's about new ways to do management. You know, a lot of the work that happens in the SNIA community around stuff with Swordfish, relating to what's going on with Redfish. So it's not just day-to-day operations of the metal, but how
(04:15):
do we manage, optimize, protect? So, when we have work that's going on in the technical working groups and within STA, what are some of those sort of features and factors that are being managed and innovated on?
Speaker 2 (04:35):
Well, you know, so first let me talk about the legacy of SAS and of SCSI, right? SCSI has been around for a very long time. SAS since, you know, the beginning of the century, actually. And it's a very tried-and-true platform,
(04:56):
you know, very reliable. It's been enterprise tested and true. So when it comes to reliability, manageability, serviceability, a lot of that is actually built into the specification. Moving forward, the real innovation that's occurring now
(05:18):
comes in a couple of different areas. Like I mentioned, one is focusing on the needs of the CSPs, or the hyperscalers, with their insatiable requirements for capacity and performance, and we'll talk about it in a minute. And, you know, performance isn't just fast data, fast reads and writes.
(05:38):
It can get into latency and other metrics, right? And then, with regard to Flash, right, the incredible read capabilities of Flash and how SAS is dealing with that, and there have been some innovations around that area as well.
Speaker 1 (06:01):
And even just, you know, there are so many things that we take for granted. You know, I'm an older fella, so when I came into computing in the enterprise, there were very low capacity drives. You know, even just the idea of having different RAID patterns
(06:21):
was sort of new. Like, what was the reason why we'd use RAID 0 versus RAID 1 versus RAID 5? And then RAID 6 was just coming, and it was like we were sort of wrapped around the axle on how and when to use this stuff, because the use cases weren't really varied. But what we're seeing now is that the pattern of workloads is
(06:43):
so fundamentally different than it was when I was coming through the industry in the enterprise days. So what does this mean now, you know, for SAS as an architecture, with these really diverse and high-scale workloads?
Speaker 2 (07:01):
So, good question. One example is the service level agreements that the hyperscalers have. So these big data centers will have very specific service agreements that they have with their customers, and so certain metrics need to be met. And one
(07:24):
thing that T10 has done is they've implemented what they call command duration limits, and this is to help with the tail latencies of the drives, of the HDDs, of these very, very large HDDs, right? So for the most part, the average latencies of an HDD are, on average,
(07:47):
very predictable. But in some cases the drive could be doing, you know, some garbage collection, doing another task, or have to do a couple of seeks, and then what they call the tail latency becomes very large. And so T10 has
(08:09):
implemented CDLs, or command duration limits, in order to handle this. And what it does is it gracefully fails a particular command if that latency gets too long, and there are configurable policies in place, so the user or the CSP can put those in place to be able to, you know, throttle or
(08:30):
control that. And this originated from an OCP effort called OCP Fast Fail, and there's actually a good publication. It's called Cloud HDD Fast Fail Read. It was published in 2018.
(08:51):
And that was really the genesis: T10 took it, and it was the genesis of CDLs, which were published in SPC-6 and are starting to be deployed in today's data centers specifically to control the tail latencies. And there's quite a bit of information online if you go and search.
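To make that concrete, here is a minimal host-side sketch of the command duration limit idea; the policy class, field names and latency numbers are illustrative assumptions, not the actual T10 CDL descriptors, which are configured on the drive itself.

```python
import random

# Illustrative only: a host-side model of a CDL-style policy. Real CDL lives
# in the drive and is configured through T10-defined mode pages; the names
# and numbers below are made up for the example.

class DurationLimitPolicy:
    def __init__(self, limit_us):
        self.limit_us = limit_us  # longest a command is allowed to take

def issue_read(lba, policy):
    # Most reads are fast; a rare few hit garbage collection or extra seeks
    # and land far out in the latency tail.
    latency_us = random.choice([250] * 98 + [15_000, 40_000])
    if latency_us > policy.limit_us:
        # The command is failed fast and gracefully instead of blowing the SLA;
        # the host can retry it or reconstruct the data from redundancy.
        return ("FAILED_FAST", latency_us)
    return ("OK", latency_us)

policy = DurationLimitPolicy(limit_us=5_000)
results = [issue_read(lba, policy) for lba in range(10_000)]
fast_fails = sum(1 for status, _ in results if status == "FAILED_FAST")
print(f"{fast_fails} of {len(results)} reads hit the duration limit")
```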
Speaker 1 (09:10):
Yeah, and interestingly, this morning I was literally listening to somebody talking about SQLite and one of the challenges around rewriting it in Rust, and one of the things they were struggling with was this idea of partial writes, and handling partial writes because of tail latency in those I/O queues. It's so apropos that here we are, you know, realizing this is
(09:33):
being solved at a few different layers, but at the very bottom layer, the SAS, down-to-the-metal layer that people need to worry about, or be confident in, I guess, is where it's been going on, and we're seeing those innovations in SAS.
Speaker 2 (09:51):
Yeah, and another one is about just servicing the capacity needs of the hyperscalers, right? And so everybody's probably familiar with, or at least heard of, SMR, or shingled magnetic recording. In fact, over 50% of the enterprise rotational bits
(10:14):
shipped in 2024 have been SMR-enabled, right? So that's pretty big. So SMR has become very large, and in general it's there to improve the areal densities of the drives, right, to get more capacity in the same footprint. And in T10
(10:38):
we have this all the way back in ZBC-1, and it's been published for, I don't know, maybe eight or so years now, so it's well established. The follow-on to that is what T10 calls format with presets. It's a very obscure name. A lot of times it will be referred to as hybrid SMR or
(11:01):
next-generation SMR, and what it does is it gives the user or the CSP the ability to format the drive either as a CMR drive or an SMR drive, or even do it dynamically, so part of the drive is CMR and part is SMR. And it's
(11:22):
mainly being used today as a SKU reduction. So the hyperscaler, the CSP, can buy a specific drive and then, depending on the actual need: do they need CMR, do they need SMR? Do they need more capacity but then be limited to sequential
(11:47):
operations, or do they need CMR drives? And so it's a very interesting one, and that's being rolled out today as we speak as well.
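A minimal sketch of the trade-off behind format with presets, or hybrid SMR; the capacity figures and the assumed SMR density gain are illustrative, not numbers from the specification or from any shipping drive.

```python
# Illustrative only: one physical SKU, formatted three different ways.
CMR_REGION_TB = 0.010        # assumed capacity of a region formatted as CMR
SMR_DENSITY_GAIN = 1.20      # assume SMR packs ~20% more bits into the same media

def provision(total_regions, smr_fraction):
    """Split the drive into CMR and SMR regions at format time."""
    smr_regions = int(total_regions * smr_fraction)
    cmr_regions = total_regions - smr_regions
    return {
        "cmr_tb": round(cmr_regions * CMR_REGION_TB, 2),                     # random-write friendly
        "smr_tb": round(smr_regions * CMR_REGION_TB * SMR_DENSITY_GAIN, 2),  # sequential-write zones
        "total_tb": round(cmr_regions * CMR_REGION_TB
                          + smr_regions * CMR_REGION_TB * SMR_DENSITY_GAIN, 2),
    }

for fraction in (0.0, 0.5, 1.0):   # all-CMR, half-and-half, all-SMR
    print(f"SMR fraction {fraction}:", provision(total_regions=2000, smr_fraction=fraction))
```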
Speaker 1 (11:53):
When you think of it, like, one of the things, and I'm talking to the hyperscalers myself: the biggest challenge is capacity planning and management, because really they don't know what workloads are coming, so there's a lot of thumb-in-the-air, guessing-the-wind-direction stuff. But giving this flexibility now means that diverse workload
(12:15):
patterns can apply to the same gear, and now we get the advantages at the lowest layer. That ultimately ends up being better SLAs, better performance profiles and, most likely, durability as well, because that means the longevity of the hardware is going to be
(12:40):
ultimately stretching out the cost. There's wins all around, I would say.
Speaker 2 (12:44):
Right, no, agreed, agreed.
Speaker 1 (12:50):
Now, one thing I was curious about: you know, sustainability is coming up more and more, and I've been lucky, I've chatted with Jonmichael Hands, who has been on the podcast a couple of times, and we've talked a lot about that sort of secondary factor. We don't necessarily say, hey, you know, the stuff you just mentioned actually does have a really strong sustainability angle, but not as a primary purpose. It's
(13:11):
sort of a bonus fries to it. But this is the advantage: we used to get a piece of hardware and it could do one thing, you know, and that was it. But now this is adding more flexibility, and I would say the environmental impact is probably much better, because we're likely driving better power utilization with the use of
(13:43):
those large-cap drives, so that we can again, you know, stretch the need of the workload as far as possible without having to put more impact on the power, air conditioning and just the actual gear itself.
Speaker 2 (13:54):
Yeah, and that's interesting. I haven't really thought of it in that context, but that is, you know, I would call it a secondary value proposition of this type of technology.
Speaker 1 (14:07):
And then, as the classic goes, everything sounds great when it's going great, until it's not. So stuff like RAID rebuild assist and other things around the recovery and the recouping: take us through, you know, what does this mean for developments happening in that area?
Speaker 2 (14:28):
Yeah, you know, so, like I said in the beginning, there's been innovation on numerous fronts, and we just touched on some of the more HDD, hyperscale-centric ones, but also with regard to the innovations just in performance with Flash, especially the read capabilities of Flash.
(14:51):
One problem that keeps getting discussed is rebuild times. The drives are getting so big, and rebuilds are a pretty compute-intensive process, and not only does it take a long time, and you have the vulnerability of having one drive out of your array,
(15:14):
but also the performance of the system is typically impacted. And so what T10 has done is they've implemented what they call rebuild assist, and this gives the drive the ability to communicate via SAS, you know, to the RAID engine about the health of the LBAs, specifically marking
(15:39):
the failed LBAs, so that when it comes to a rebuild, right, the RAID engine can use that information and only recreate those, because recreating the data is one of the big-ticket items, and if they only have to rebuild the specific things that are bad instead of everything, the rebuild process can be much
(16:03):
more efficient. So that's one that's rolling out today as well.
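A minimal sketch of why per-LBA health information shrinks a rebuild; the function names and the failed ranges are hypothetical, not real SCSI commands or log pages.

```python
# Illustrative only: classic full rebuild vs. a rebuild-assist style rebuild
# where the drive has told the RAID engine exactly which LBAs are bad.

DRIVE_LBAS = 4_000_000_000                     # ~2 TB of 512-byte blocks

def full_rebuild(drive_lbas):
    # Without assist: regenerate every LBA from the surviving drives/parity.
    return drive_lbas

def assisted_rebuild(failed_ranges):
    # With assist: only the LBA ranges the drive marked as failed are
    # reconstructed; everything else can be read straight off the drive.
    return sum(end - start for start, end in failed_ranges)

failed_ranges = [(1_000_000, 1_000_512), (2_500_000_000, 2_500_004_096)]
print("LBAs to reconstruct, full rebuild:    ", full_rebuild(DRIVE_LBAS))
print("LBAs to reconstruct, assisted rebuild:", assisted_rebuild(failed_ranges))
```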
Speaker 1 (16:11):
On another... oh, go ahead. So I was going to say, you know, it's funny talking about sort of percentage of utilization, not that we're looking at market specs and market implementations, but again, we kind of over-rotate to the new, exciting, shiny stuff, I would say. But, like, what is the percentage of SAS
(16:35):
storage that's sitting out in these hyperscalers today? It's probably a significant portion of a lot of their data layer.
Speaker 2 (16:43):
So, you know, and I haven't seen the most recent data, but at the beginning of this year a report that I saw was estimating about 10% of the total capacity in these data centers is not on a SAS infrastructure. So 90% is behind a SAS infrastructure.
(17:05):
And remember that that's not only SAS drives, because the SAS drives themselves surely don't make up that much. But all the ATA drives, right, sit behind a SAS infrastructure, and that's part of the SAS spec: the SAT layer, which translates between SCSI and ATA.
(17:30):
And so all those ATA drives, all those nearline high-cap drives, are all part of the SCSI infrastructure.
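A simplified sketch of the idea behind that SCSI/ATA translation (SAT) layer; the field layouts and helper names are illustrative stand-ins, not the actual CDB or FIS encodings defined in the standard.

```python
# Illustrative only: a SCSI initiator issues a READ(16), and a SAT-style
# layer turns it into an ATA-style read for the SATA drive sitting behind
# the SAS infrastructure.

def scsi_read16(lba, blocks):
    """A SCSI READ(16) request as the host sees it (0x88 is the READ(16) opcode)."""
    return {"opcode": 0x88, "lba": lba, "transfer_length": blocks}

def sat_translate(scsi_cmd):
    """Map the SCSI command onto an ATA command the drive understands."""
    if scsi_cmd["opcode"] == 0x88:                        # READ(16)
        return {"ata_command": "READ DMA EXT",            # 48-bit LBA read
                "lba": scsi_cmd["lba"],
                "sector_count": scsi_cmd["transfer_length"]}
    raise NotImplementedError("only READ(16) is sketched here")

print(sat_translate(scsi_read16(lba=0x1000, blocks=8)))
```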
Speaker 1 (17:39):
Yes, as they say, my favorite line is always: you call it legacy, I call it production. All this time we talk about the past, but the past is actually pretty happily working in the present, and it has a long future ahead.
Speaker 2 (17:53):
You know, and AI is the buzzword, right? Everybody's eyes are spinning with AI, and it is a very important technology and a very important transition within the ecosystem. But remember, it takes a lot of data to drive these models, and while SAS may not be, like, at the heart of the
(18:18):
AI machine, right, it's very important in containing all this data that then goes to create these models.
Speaker 1 (18:28):
Well, and I would say this is where we will begin to see, in fact, we're already starting to see this, and I always mispronounce it, the Jevons paradox: the idea that we make something so efficient that it becomes popular, you know, and that it rises in utilization. We think it's just going to be stable and age out, but in fact it becomes this new spot, and we're seeing a lot
(18:50):
of, you know, the hope, at least at the hyperscale level, that they can use things like S3, you know, for storing training data. And then we begin to see the work of stuff like CXL and SDXI. But at the underlay, what is the thing that we're still doing? The artifacts are still there at these hardware layers, and
(19:15):
we're getting optimizations in how we manage them. Like, just the fact of RAID rebuild assist, knowing that you can be aware of what the performance impact of stuff is and potentially, you know, run production workloads alongside these things, where we're very laser-specific on
(19:35):
what the rebuilds are versus before. At these scales, it would be impossible to manage data centers if we didn't have the innovation that's gone on in that SAS layer.
Speaker 2 (19:47):
Agreed, agreed, and it's been around for a very long time, and it will be around for a very long time.
Speaker 1 (19:54):
So, looking ahead, Rick, what are the kind of next things that we're going to see coming through in the STA community, and, in fact, you know, what do you see as some of the other collaborations, even at the human and physical layer?
Speaker 2 (20:16):
There's a lot of work being done in T10. T10 is working vigorously on a number of these things, and there are a bunch, and a lot of these specs take time to come out and to get done, and done
(20:37):
correctly. So some of the things we've been talking about are complete, some are still being worked on. But among the things we're looking at in the future:
Security is a big topic, right, and so is there anything SAS can do in terms of security? Some ideas around key per I/O, right?
(20:59):
So having a key for every piece of data going across, that's one that's being tossed around. Attestation, support for attestation, is another one, you know. SAS being the infrastructure that all the media we've been talking about
(21:22):
is sitting behind, it's a good place for that type of thing. So, you know, there are a number of things that are being looked at.
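A minimal sketch of the key-per-I/O concept; real proposals carry key tags at the protocol and drive level, so the host-side keystore, the function names and the use of the Fernet cipher here are illustrative assumptions (and assume the third-party `cryptography` package is installed).

```python
# Illustrative only: every I/O names its own key, so data from different
# tenants never shares a key, and erasing a tenant is just deleting its key.
from cryptography.fernet import Fernet

keystore = {
    "tenant-a": Fernet.generate_key(),
    "tenant-b": Fernet.generate_key(),
}

def write_io(lba, data, key_tag):
    # The key tag travels with the command; only ciphertext reaches the media.
    return {"lba": lba, "key_tag": key_tag,
            "payload": Fernet(keystore[key_tag]).encrypt(data)}

def read_io(stored):
    # The same tag selects the key needed to decrypt on the way back.
    return Fernet(keystore[stored["key_tag"]]).decrypt(stored["payload"])

record = write_io(lba=42, data=b"customer record", key_tag="tenant-a")
print(read_io(record))
```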
Speaker 1 (21:34):
Yeah, and you know, I just picked on security because security is a hot topic these days, and so it should be. Well, I always sort of laugh, because at one point somebody talked about DevOps when it was the hot new thing. Not that it was really a new thing, it had been around since RAD, you know, rapid application
(21:57):
development. We just renamed it, we called it DevOps. And then all of a sudden you have this idea of DevOps where we called it DevSecOps, and people got a little irked because they're like, why do you have to say DevSecOps, like it's implied? I said, whoa, no, no, it was hoped, it was not implied. It was like, no one did actually invite the security team.
(22:18):
And that's another thing with what's going on with the SCSI Trade Association: inside STA, you're surrounded by people that are cross-group. So you have security pros, you've got the CSPs, MSPs and the hyperscalers that are participating. You've got individual hardware vendors, you've got memory
(22:39):
vendors, GPU vendors, everybody's in this room. Whereas we used to kind of go away and develop our own individual pieces and our pillars, and then they would come together at the end with maybe APIs or SDKs, if you're lucky. Well, now you're literally having conversations with these security people, so that, as you say, the idea of doing key
(22:59):
generation and dynamic, you know, key per I/O, that's very much a thing that requires both sides of that conversation to be present, along with, you know, our integration into SNIA.
Speaker 2 (23:23):
Something that I, you know, I guess I didn't appreciate is, you know, that community around us giving us access to the experts in all these different fields. Right, that's something that I think a lot of us didn't really foresee, but it's very beneficial to us. And, you know, we are using SNIA, the infrastructure of
(23:46):
SNIA, in that case.
Speaker 1 (23:47):
Well, there's definitely a lot of excitement ahead, and, like I said, for anybody that thinks, you know, SAS is settled science, that is not the case, and there are still innovations with LTO. There are still innovations in all these other different storage layers. As much as I'd like to say that tape is dead... you know, SAS
(24:10):
isn't dead, it just smells funny. That was my old... it's a Frank Zappa line about jazz. But there is so much cool stuff that's ahead, and so I'm excited. And first of all, Rick, thank you very much for taking the time to share today. And for folks that do want to get connected with you, what's the best way they can get together with you and talk with the folks in
(24:34):
the STA community and get connected with T10 as well?
Speaker 2 (24:39):
Yeah, I think the best way is via the SCSI Trade Association, via SNIA, and I believe we'll put the link to our website, which contains our roadmap, where you can see some of those innovations. We'll include that link in
(25:01):
the proceedings of the podcast.
Speaker 1 (25:02):
Yeah, that's fantastic, and actually that was a really good point. We want to make sure that folks do check it out, because being able to see those roadmaps gives you a sense of what's coming and also where they see the opportunity to collaborate. Because quite often, you know, people think that they're heading towards some uncharted territory, and you find out that there's a bunch of people already waiting for you at the bottom of the hill.
(25:24):
You're like, okay, maybe we can save a lot of time. And I think the innovation pace that happens in these communities is so much greater, in a quiet way, because of the work that's happening across companies who are technically competitors in the field. And yet this is a safe innovation collaboration space
(25:47):
where we can meet amazing humans and learn about amazing technology capabilities. So I'm a fan. I'm a fan. Well, Rick, thank you so much, and, yeah, happy new year, I guess, is about the time this is rolling out. People may be, hopefully, spending New Year's with a glass of
(26:08):
bubbly in their hand and a SNIA podcast on their iPod while they're listening. No better way to start your year than with great conversations like this. So, Rick, thank you very much. And for folks, of course, do check out this and other great conversations on the SNIA Experts on Data podcast. We've got the audio version, and this is also going up
(26:29):
on YouTube, so we can see these beautiful smiles happening. And thank you very much for taking the time with us today, Rick.
Speaker 2 (26:36):
Thank you, Eric. All right.