Scenario:
Suppose we have an excel Dataset, with Millions of Rows, for which we want to apply the Limit while load at the Source step stage, using a Parameter, say Year.
Also, we want to Load only first N rows for the selected Year parameter, to see the sample data.
These parameters we can adjust in Power BI Service, to enable the Full Load.
This will be the most Powerful and useful technique when we deal with the large Excel datasets.
1) First create the required Parameters for Datasource, Dataset, Load Year, First N Rows:
xl_Datasource = E:\Tech_Lab\Power BI\DataSets\ds_Sample_Data.xlsx
xl_Dataset = tbl_OrderDetails
loadYear = 2014
xl_LoadFirstN_Rows = 5
xl_Datasource : It is a required Parameter of the type Text, which holds the path of source file.
xl_Dataset : It is a required Parameter of the type Text, which holds the sheet/table of source dataset.
loadYear : It is a required Parameter of the type Decimal Number, which holds the Year value, which will used to limit the Data Load at the Source step of the Power Query.
xl_LoadFirstN_Rows : It is an optional Parameter of the type Decimal Number, which holds a number, which will used to limit the no. of rows to Load at the Source step after the Year filter of Power Query.
2) Next pass the loadYear Parameter in the next step to Source:
In this step, Dataset headers will be Promoted and then it will be Filtered by the [Order_Year] field based onthe parameter loadYear.
Limit_Load_byYear
= Table.SelectRows(Table.PromoteHeaders(Source{[Item=xl_Dataset,Kind=”Sheet”]}[Data],[PromoteAllScalars=true]),each [Order_Year] >= loadYear)
We can leave blank or pass zero for this optional parameter to load all rows.
Filter_FirstN_Rows
= if (xl_LoadFirstN_Rows=0 or xl_LoadFirstN_Rows is null) then (Limit_Load_byYear) else Table.FirstN(Limit_Load_byYear,xl_LoadFirstN_Rows)
4) Finally, do the required transformations.
The complete power query used in this scenario is as follows..
let
Source = Excel.Workbook(File.Contents(xl_DataSource), null, true),
/* To Load TopN Rows */
Limit_Load_byYear = Table.SelectRows(Table.PromoteHeaders(Source{[Item=xl_Dataset,Kind=”Sheet”]}[Data],[PromoteAllScalars=true]),each [Order_Year] >= loadYear),
Filter_FirstN_Rows = if (xl_LoadFirstN_Rows=0 or xl_LoadFirstN_Rows is null) then (Limit_Load_byYear) else Table.FirstN(Limit_Load_byYear,xl_LoadFirstN_Rows),
#”Changed Type” = Table.TransformColumnTypes(Filter_FirstN_Rows,{{“Order_Id”, Int64.Type}, {“Order_Date”, type date}, {“Order_Year”, Int64.Type}, {“SalesCountry_Id”, Int64.Type}, {“Country_Name”, type text}, {“Cust_Id”, Int64.Type}, {“Cust_Name”, type text}, {“Prod_Id”, Int64.Type}, {“Prod_Name”, type text}, {“Gross_Sales”, Int64.Type}})
in
#”Changed Type”
Result:
In case, if we want to limit data load by a DateTime field, we can create and pass parameter as follows..
= Table.SelectRows(Table.PromoteHeaders(Source{[Item=xl_Dataset,Kind=”Sheet”]}[Data],[PromoteAllScalars=true]),each [Order_Date] >= loadDate)
——————————————————————————————————–
Thanks, TAMATAM ; Business Intelligence & Analytics Professional
——————————————————————————————————–
Be the first to comment