
March 3, 2025 16 mins

Easily access and unify your data for analytics and AI—no matter where it lives. With OneLake in Microsoft Fabric, you can connect to data across multiple clouds, databases, and formats without duplication. Use the OneLake catalog to quickly find and interact with your data, and let Copilot in Fabric help you transform and analyze it effortlessly.

Eliminate barriers to working with your data using Shortcuts to virtualize external sources and Mirroring to keep databases and warehouses in sync—all without ETL. For deeper integration, leverage Data Factory’s 180+ connectors to bring in structured, unstructured, and real-time streaming data at scale.

Maraki Ketema from the Microsoft Fabric team shows how to combine these methods, ensuring fast, reliable access to quality data for analytics and AI workloads.

► QUICK LINKS:

00:00 - Access data wherever it lives

00:42 - Microsoft Fabric background

01:17 - Manage data with Microsoft Fabric

03:04 - Low latency

03:34 - How Shortcuts work 

06:41 - Mirroring

08:10 - Open mirroring

08:40 - Low friction ways to bring data in

09:32 - Data Factory in Microsoft Fabric

10:52 - Build out your data flow

11:49 - Use built-in AI to ask questions of data

12:56 - OneLake catalog

13:36 - Data security & compliance

15:10 - Additional options to bring data in

15:42 - Wrap up

► Link References

Watch our show on Real-Time Intelligence at https://aka.ms/MechanicsRTI

Check out Open Mirroring at https://aka.ms/FabricOpenMirroring

► Unfamiliar with Microsoft Mechanics?

Microsoft Mechanics is Microsoft's official video series for IT. You can watch and share valuable content and demos of current and upcoming tech from the people who build it at Microsoft.

• Subscribe to our YouTube: https://www.youtube.com/c/MicrosoftMechanicsSeries

• Talk with other IT Pros, join us on the Microsoft Tech Community: https://techcommunity.microsoft.com/t5/microsoft-mechanics-blog/bg-p/MicrosoftMechanicsBlog

• Watch or listen from anywhere, subscribe to our podcast: https://microsoftmechanics.libsyn.com/podcast

► Keep getting this insider knowledge, join us on social:

• Follow us on Twitter: https://twitter.com/MSFTMechanics

• Share knowledge on LinkedIn: https://www.linkedin.com/company/microsoft-mechanics/

• Enjoy us on Instagram: https://www.instagram.com/msftmechanics/

• Loosen up with us on TikTok: https://www.tiktok.com/@msftmechanics

 


Episode Transcript

(00:02):
If you've struggled with accessing data for your analytics and AI workloads because it's spread across different clouds or databases and in different formats, today we'll look at the options available to you for connecting to and accessing data wherever it lives with the unified data lake, OneLake, part of the cloud data analytics and AI platform, Microsoft Fabric.

(00:22):
And importantly, we'll show you how easy it is for your team members to find the data that you've brought in with the new OneLake catalog, and how you can use Copilot in Fabric as you work to interact with your data wherever it lives from OneLake. And joining me today from the Microsoft Fabric product team is Maraki Ketema. Welcome to the show.
- Thanks for having me.
- And thanks so much for joining us today. But before we get into this, why don't we set a bit

(00:44):
of context for anyone who's new to Microsoft Fabric. So, Microsoft Fabric is a pre-integrated, optimized SaaS environment which provides a comprehensive data analytics and AI platform with built-in capabilities for data integration, data engineering, data science, data warehousing, real-time intelligence, data visualization, and overall data management.

(01:06):
Underpinning Fabric is its multi-cloud data lake, OneLake, which gives you a central point for data to be discovered and accessed wherever it resides across your data estate, at scale. Now, we've covered Microsoft Fabric in a lot of past shows, but today we really want to demystify how it can help you get a better handle on your data.

(01:27):
- Well, it helps on a number of levels. You've already mentioned scalability, and with all of the integrated capabilities for data teams to collaborate on building clean, quality data, it can be done at scale for any use case. And OneLake really is the key to getting a handle on your data by making it accessible

(01:48):
with support for open formats like Delta Parquet and Iceberg. This helps eliminate traditional barriers to working with your data, and we give you a variety of methods to bring your data into OneLake, like Shortcuts, where you can virtualize data from where it's stored, which creates a pointer to any structured, open, file-based tabular data

(02:09):
or unstructured files, even images and multimedia. All of this happens without duplicating the data. Or there are options for Mirroring, where you can create an always up-to-date replica of the source in Fabric. And this is great for databases and data warehouses with proprietary formats where your business-critical data may be stored.

(02:31):
Now, both of these options can be used like any other native data in OneLake, and they require no ETL. Then for all of your other sources that require data transformation or read or write capabilities, you can use the hundreds of connectors provided by Data Factory in Microsoft Fabric

(02:51):
to make your data natively available in OneLake. And to bring in streaming data, you'll use Real-Time Intelligence in Microsoft Fabric. You'll likely use these techniques to different extents depending on your data and AI needs, and whichever method you use to connect data, we make it available with minimal latency.

(03:11):
This is super important, for example, for real-time or gen AI tasks, because they're less predictable: as a user or agent interacts, on the backend this can quickly create a series of requests to retrieve data, which need to happen fast to ground the AI so that responses aren't delayed. Fabric takes care of all of this for you at scale

(03:33):
and at low latency.
- So quality data then becomes super accessible whenever you need it and wherever it lives. Why don't we show them a few examples of this?
- Sure. So, today I'm going to walk you through an e-commerce system, and it's for a Supermart with a grocery department where we need to quickly understand demand versus supply, as well as market competition over prices

(03:54):
and get a 360 view of operations and customer experiences. Now, different teams, including marketing, analytics, and IT, are collaborating together in a single Fabric workspace. Here the marketing team creates promotions daily, and they work with different vendors who are using different systems to store data

(04:17):
and there's no standard file type. The good news is that we can connect to all of these different systems using Shortcuts. Let me show you how that works. Here under Get Data, I can see my options to bring data in. I'll choose a new shortcut. You'll see that I have both Microsoft and non-Microsoft locations.

(04:37):
In this case, I want to connect to Amazon S3 for unstructured data. From here, if I don't already have a connected data source, I can create a new connection using credentials for the service. But to save time, I'll use an existing connection. I'll choose the second option here. I can explore the data available to me, and I can choose the specific folders I want.

(05:00):
I'll pick a few for Contoso and confirm. Now the data's in OneLake, and I can expand the folders and look at different data, like these markdown files with text, which contain customer feedback, and I have a nice preview of the data to understand what's in it.
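Once the shortcut is in place, the same feedback files can also be read programmatically, for example from a Fabric notebook over the OneLake path. Here's a minimal sketch; the workspace, lakehouse, and folder names are placeholders, not the names used in the demo.

```python
# Sketch: read shortcut-backed files from a Fabric notebook over the OneLake path.
# Workspace, lakehouse, and folder names below are placeholders.
# In a Fabric notebook, `spark` is already available as the active session.

feedback_path = (
    "abfss://SupermartWorkspace@onelake.dfs.fabric.microsoft.com/"
    "SupermartLakehouse.Lakehouse/Files/customer-feedback/*.md"
)

# wholetext=True keeps each markdown file as a single row of text.
feedback_df = spark.read.text(feedback_path, wholetext=True)
feedback_df.show(truncate=80)
```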
Additionally, I have some image data on my local drive

(05:21):
that I want to share with others on my team, as we're trying to figure out the best placement for in-store promotions. The good news is that I can also shortcut to all of this data in OneLake directly from my desktop. Let's take a look. Here I am in Windows File Explorer, and I'm connected to OneLake, and I can interact with these files

(05:41):
and sync them right from here. In fact, here I'm adding an image file from our grocery department, and from the status I can see that it's already synced. Now if I move back over to Fabric, you'll see that it's just synced into my lakehouse view. From here, I can preview the image right away.

(06:02):
So now I have the information I need to start analyzing customer sentiment and where we can place point-of-sale promotions. Again, in both examples the file data still remains at the source; just like shortcuts on your desktop, the data doesn't actually live in OneLake but always stays in sync. Shortcuts in Microsoft Fabric are supported

(06:23):
for storage locations like Microsoft Dataverse, Azure Data Lake Storage, Google Cloud Storage, Databricks, Amazon S3, and any S3-compatible stores, and more. And you can also use Shortcuts for on-premises data sources using the Fabric on-premises data gateway.
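Everything above was done through the UI, but shortcuts can also be scripted. The sketch below assumes the OneLake Shortcuts REST API; the IDs, names, and payload shape are illustrative placeholders, so verify them against the current Fabric REST reference before relying on this.

```python
# Sketch: create an Amazon S3 shortcut in a lakehouse via the Fabric REST API.
# All IDs, names, and the payload shape are placeholders for illustration;
# check the current OneLake Shortcuts API documentation for exact fields.
import requests

token = "<AAD access token for https://api.fabric.microsoft.com>"
workspace_id = "<workspace-guid>"
lakehouse_id = "<lakehouse-item-guid>"

url = (
    f"https://api.fabric.microsoft.com/v1/workspaces/{workspace_id}"
    f"/items/{lakehouse_id}/shortcuts"
)

payload = {
    "path": "Files",              # where the shortcut appears in the lakehouse
    "name": "vendor-promotions",  # shortcut folder name
    "target": {
        "amazonS3": {
            "location": "https://contoso-bucket.s3.us-west-2.amazonaws.com",
            "subpath": "/promotions",
            "connectionId": "<existing-s3-connection-guid>",
        }
    },
}

resp = requests.post(url, json=payload,
                     headers={"Authorization": f"Bearer {token}"})
resp.raise_for_status()
print(resp.json())
```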
- And beyond your file data, your Supermart is probably

(06:44):
dependent on operational data that's sitting in databases and warehouses, all of which might have their own proprietary formats. So what's the path of least effort, then, to bring that data in?
- So this is where Mirroring in Microsoft Fabric comes into play. It makes it easy to replicate data into OneLake, and storage is included as part of your existing Microsoft Fabric capacity.

(07:06):
Let's jump in. Here, you can see my sales dashboard, which is broken down by category and location, and it even has some forecasting built in. And on the back end, I already have various sources mirrored into my Fabric workspace in OneLake that are feeding into this particular view. I'm going to use Mirroring and create a new item

(07:27):
to connect to Azure SQL DB and bring in data from the Supermarts in the same region. I'll filter by mirror and then select the Azure SQL Database option. From here, I'll add my connection details. I'll type the database name, and the rest securely auto-completes. After I connect, it takes seconds to show the tables in the database.

(07:49):
And from there, it's just one more click to create the mirrored database, and now it's ready to use in OneLake. Just like Shortcuts, all of this works without ETL or moving the source data. And now if we go back to our Get data page, you'll notice that most of the Azure databases are directly supported for Mirroring, as well as Snowflake.

(08:10):
That said, you aren't limited to using Mirroring for just these sources. You'll notice that I have two sources here, Salesforce and our legacy on-prem SQL database. These were brought into OneLake using open mirroring. Open mirroring is an API which lets users and data providers bring data in from any source

(08:32):
while keeping them in sync. You can learn more about open mirroring at aka.ms/FabricOpenMirroring.
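However the data arrives, database mirroring or open mirroring, the replicated tables land in OneLake in Delta format, so they can be queried like any other table. Here's a minimal sketch, assuming the mirrored sales tables have been surfaced to a Fabric notebook (for example, via a shortcut into its default lakehouse); the table and column names are made up for illustration.

```python
# Sketch: query mirrored tables from a Fabric notebook with Spark SQL.
# Assumes the mirrored database tables are visible to the notebook;
# table and column names are placeholders.

regional_sales = spark.sql("""
    SELECT store_id,
           category,
           SUM(sale_amount) AS total_sales
    FROM   supermart_mirror.dbo_sales          -- placeholder mirrored table
    WHERE  sale_date >= date_sub(current_date(), 30)
    GROUP  BY store_id, category
    ORDER  BY total_sales DESC
""")

regional_sales.show(20)
```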
- So Mirroring has great potential, then, in terms of being a frictionless way to bring your data in. But how real-time is the synchronization?
- It's near real time. Once you've created the mirrored database and brought your data in, you don't need to do anything else

(08:54):
to keep the data fresh. On the backend, Fabric is continuously listening for changes and making updates to the data in OneLake. So I'll go ahead and refresh my sales dashboard, and you can see the updates flow in. Our sales just quadrupled in seconds with this new database. That's actually because we've added a lot more stores

(09:15):
with their sales data.
- This is really a game changer, then, in terms of time to insights, and that you have these low-friction ways to bring your data in. That said, though, there are lots of cases where you might want to transform your data and need to do more data integration work before you bring it in.
- Right. And that's where Data Factory in Microsoft Fabric comes in. It's a powerful engine that can bring in your data

(09:37):
at petabyte scale, with everything you need to prep and transform the data, too. Let's take a look. As you begin to create pipelines to bring your data in, you'll see that we now have more than 180 connectors to the most common data types. And these span both Microsoft and non-Microsoft options. And connecting to one is like we showed

(09:59):
before with Shortcuts. If I click on Snowflake, for example, I just need to add connection settings and valid credentials to add the data source to my pipeline. And from here, let me go deeper on the pipeline experience itself. Here is one that I've already started. It takes our Supermart data through the bronze and silver layers before landing the curated data in the gold layer.

(10:21):
To gain a deeper understanding, we can actually use Copilot to generate a summary of what the pipeline is doing, and in seconds, as Copilot explains here, data is loaded before data is curated, and we have schema validation, which picks up on file mismatches and places them in a separate folder after sending an alert.

(10:41):
The pipeline provides a visual view of all of these steps. Then if I move over to my notebook, you'll see that it applies transformations on the data before it's loaded into our gold layer.
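The notebook itself isn't shown line by line, but a typical silver-to-gold step in this kind of medallion layout looks roughly like the sketch below. The table and column names are placeholders for the Supermart example, not the actual notebook code.

```python
# Sketch: a typical silver-to-gold transformation in a Fabric notebook.
# Table and column names are placeholders for the Supermart example.
from pyspark.sql import functions as F

silver = spark.read.table("silver_grocery_transactions")

gold = (
    silver
    .filter(F.col("quantity") > 0)                         # drop bad rows
    .withColumn("revenue", F.col("quantity") * F.col("unit_price"))
    .groupBy("store_id", "product_id",
             F.to_date("sold_at").alias("sale_date"))
    .agg(F.sum("revenue").alias("daily_revenue"),
         F.sum("quantity").alias("daily_units"))
)

# Write the curated result to the gold layer as a Delta table.
gold.write.mode("overwrite").format("delta").saveAsTable("gold_daily_sales")
```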
Now, once my data's in OneLake, I can also start building out my own dataflows. Here's a table that I just pulled in from Excel that looks at grocery transactions over the past quarter.

(11:04):
This table is currently super wide, making analysis very difficult. Here's where the power of Copilot comes in. I don't need to know the right buttons or terms or words. Sometimes it can be as simple as describing how I want my tables to look, and I'll submit this prompt, and almost instantly the table is transformed

(11:25):
and more optimized for analysis.
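For readers who want to see what that kind of reshaping amounts to in code, unpivoting a wide table into a tall one takes only a few lines. This is a sketch with invented column names, not the actual dataflow step Copilot generated in the demo.

```python
# Sketch: unpivot a wide grocery-transactions table into a tall one.
# Column names are invented; this shows the general shape of the reshaping,
# not the actual transformation Copilot produced.
import pandas as pd

wide = pd.DataFrame({
    "store_id": [101, 102],
    "week_01_sales": [2400, 1800],
    "week_02_sales": [2650, 1925],
    "week_03_sales": [2310, 2040],
})

tall = wide.melt(
    id_vars="store_id",
    var_name="week",
    value_name="sales",
)

print(tall.head())
```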
While I'm at it, I can also use Copilot to do a simple operation like renaming a column. Pay attention to the middle column: the name was just changed. But what if someone inherits this dataflow? Copilot can also provide descriptions of what your query is doing to help save time. It's described the query

(11:46):
and it's easy to understand for anyone. And here's the real power of everything we've done today. As you can see in our lineage, we now have all our connected data sources from Shortcuts, Mirroring, and now Data Factory. Not only can I now see everything connected in my dashboard, but I can also use natural language with built-in AI

(12:08):
to ask questions of my data. In this case, I want to get ahead of wastage issues in our grocery department. My dashboard doesn't quite help me here. This is where we can use the built-in AI to ask questions of the data. So I'll go ahead and prompt it with which products are at risk of spoilage

(12:28):
and require discounting. It'll take a sec, and once that completes, I'll get a top-level view of the products at risk, with details about their expiration dates. Under that, I can see the breakdown of its reasoning, with a detailed table of each item with quantity per store. And there's even the raw SQL query

(12:50):
the agent used to derive these insights.
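The generated SQL isn't reproduced in the transcript, but a query answering that kind of question would look something like the sketch below. The table and column names are made up for illustration; this is not the agent's actual output.

```python
# Sketch: the kind of query behind "which products are at risk of spoilage
# and require discounting?" Table and column names are placeholders.

at_risk = spark.sql("""
    SELECT p.product_name,
           i.store_id,
           i.quantity_on_hand,
           i.expiration_date
    FROM   gold_inventory AS i
    JOIN   gold_products  AS p ON p.product_id = i.product_id
    WHERE  i.expiration_date <= date_add(current_date(), 7)
      AND  i.quantity_on_hand > 0
    ORDER  BY i.expiration_date, i.quantity_on_hand DESC
""")

at_risk.show(50, truncate=False)
```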
- And that was a really powerful example of what you can do once your data is in OneLake. But what if I'm not as close to the data and I want to be able to discover data that I have access to?
- OneLake has the OneLake catalog, which is a central place for data users to discover the data they need and manage the data they own. Let's take a look. From the OneLake catalog,

(13:12):
I can see everything I have access to. On the left, I can filter the views by my items, items endorsed by others on my team, favorites, and individual workspaces. At the top, I can also filter by different types of data artifacts, insights, and processes. Let's take a look at the Ask Questions AI experience I just showed,

(13:33):
and here I can see the lineage for how the data's coming in. That said, with all this ease of data discovery, it's super important to control and manage access to the data that's exposed through OneLake. And what's great is that data compliance controls from Microsoft Purview are built in. I can see the sensitivity labels for any data asset,

(13:54):
and from a lineage perspective, these labels are automatically inherited from upstream parent data sources. Permissions are also fully manageable, and if there's a direct link to this artifact, I'll be able to see it here. Under the direct access tab, I can see who and which groups have access to this data already.

(14:14):
And as a data admin, I can also add users to grant access to specific resources. In fact, I'll go ahead and add you to this one, Jeremy, and I can determine if you're allowed to share it with others, edit, or even view the data itself.
- Okay, so now if we move over to my screen, I can see that the Ask Queue item has been shared with me,

(14:35):
and it's available right here. Now, to show you the process to discover and request something, I'll first filter data in my catalog view by semantic models just to narrow the list down a bit. And for items that you can see but not access, you'll see this icon here, and there's a button to request access, like with this operations model here. And when I use that, I can add a message

(14:56):
for why I'm requesting and send it to the admin for that data to get their approval.
- And beyond access management, the integrations with Microsoft Purview for data security and compliance keep getting deeper. Also, there's another option for bringing data into OneLake that we haven't demonstrated, and that's real-time streaming data.

(15:17):
That's because there's an entire show on how to do that using Real-Time Intelligence, which you can check out at aka.ms/MechanicsRTI.
- It's really great to see all the ways that you can bring quality data into OneLake for analytics and to ground your AI workloads. In fact, you can bring data in from OneLake for use with your gen AI apps

(15:37):
and agents using Azure AI Foundry, which we'll cover more in an upcoming show. So, Maraki, what do you recommend for all the people watching right now to learn more?
- It's simple: you can try everything I showed today, and everything else Fabric has to offer, by signing up for a generous 60-day free trial. We don't even require a credit card to get started.

(15:58):
- So now you have lots of options to bring data in and to start working with it. Thanks so much for joining us today, Maraki, and thank you for joining us to learn more about all the updates. If you haven't yet, be sure to subscribe to Microsoft Mechanics, and we'll see you again soon.