When I started looking into using the data warehouses feature in Fabric, I did see that there were limitations on Primary Key columns.
Below is my blog post on how I still use keys in my data warehouse, instead of using GUID’s which to me are long and hard to use.
In my example I am going to create a simple data warehouse which is going to consist of two-dimension tables (Date and Country) and a fact table with the Sales amounts.
I loaded my data into the lakehouse as shown below.
Creating my Warehouse
The first step is I needed to create a warehouse where I could create my tables.
To do this I changed the experience to Data Warehouse.
I then clicked on Warehouse and gave it the name of DW-FourMoo and clicked on Create.
Now in order to access the data I then created my warehouse table by importing the data from my lakehouse into the warehouse (Yes, I know it can be a bit confusing with all these similar names).
I then needed to load my source data into my warehouse table.
To do this I went into my Warehouse, clicked on Get Data and then selected “New Dataflow Gen2”
I then loaded my data via the Dataflow Gen2 into my warehouse table (I am not going to put in how I did it as it is not part of this blog post and I have covered this before in my other blog post “”).
Once loaded I could then my table in my warehouse as shown below.
I could now start creating my dimension and fact tables with the required keys.
Creating the dimension tables
Next, to create my dimension tables I used the following TSQL code below.
The code below is for my Country Dimension
Create table DW_FourMoo.dbo.tb_DimCountry AS Select distinct country_code, -- This is the column that I wanted to create the Key on ROW_NUMBER() OVER(ORDER BY country_code) as CountrySK from DW_FourMoo.dbo.staging_sales
I could then see that the query ran successfully.
And I could see the table and the actual Key values in my tb_DimCountry table.
I then used the following TSQL code to create the Date dimension.
Create table DW_FourMoo.dbo.tb_DimDate AS Select distinct CAST(CAST(time_ref as VARCHAR(10)) as DATE) as Date, -- This is the column that I wanted to create the Key on ROW_NUMBER() OVER(ORDER BY time_ref) as DateSK from DW_FourMoo.dbo.staging_sales
Once completed I could then see the table and values as shown below.
I now had my Date and Country dimensions.
Create the Fact Table
I am now at the point that I can create my fact table.
To do this I used the following TSQL code below.
Create table DW_FourMoo.dbo.tb_FactSales AS SELECT Country.CountrySK ,SUM([value]) as SalesAmount ,[status] ,D.Date FROM [dbo].[staging_sales] as S inner join dbo.tb_DimCountry as Country on S.country_code = Country.country_code Inner join dbo.tb_DimDate as D on S.time_ref = D.Date Group by Country.CountrySK ,[status] ,D.Date
I could then see my fact table as shown below.
The next challenge is what to do when I want to add or update rows in my dimension table. I want to keep my existing keys and only add new keys.
Add rows to my Dimension Table
In the code below I will show you how I added new keys to my dimension table.
Declare @MaxCountrySK as BigInt = (Select max(CountrySK) from DW.dbo.tb_DimCountry) Insert into DW.dbo.tb_DimCustomer SELECT distinct [country_code] ,ROW_NUMBER() OVER(ORDER BY country_code) + @MaxCountrySK as CountrySK FROM [DW_FourMoo].[dbo].[staging_sales] where country_code not in ( Select country_code from DW.dbo.tb_DimCountry )
If I wanted to add rows to the fact table, I could do this with traditional data warehousing where I only load the new data based on the date.
I hope that you have found this useful. If you have any questions or comments, please let me know.
Be the first to comment