Power Query to remove unwanted Columns dynamically from Excel Dataset before load to Power BI | Power BI Blog


How to load only required Columns dynamically from Excel Dataset into Power BI using Power Query
Scenario:

Suppose, we have an excel dataset with some required and the unwanted columns like Colum2, Column 4, Prod_Id_2, Country_Id2, Cust_Id – CopyCust_Id – Copy.1, Country_Id3..etc

These kind of unwanted Columns can be removed as per and load only the required Columns as explained below:

1) First Load the Dataset(not Data) and keep only Source Step:

Source = Excel.Workbook(File.Contents(xl_DataSource), null, true)

Note: I have used the Parameters for DataSource and Dataset :

xl_DataSource = E:\Tech_Lab\Power BI\DataSets\ds_Sample_Orders.xlsx

xl_Dataset = tbl_OrderDetails

2) Next get the list of Column Names from the Dataset:

= Table.ColumnNames(Table.PromoteHeaders(Source{[Item=xl_Dataset,Kind=”Sheet”]}[Data], [PromoteAllScalars=true]))

3) Next convert the list of Column Names to a Table and then create a Flag column to identify the Columns to Remove from the Dataset:

ConvertToTable = Table.RenameColumns(Table.FromList(SelectColNames, Splitter.SplitByNothing(), null, null, ExtraValues.Ignore),{{“Column1”, “ColumnNames”}})

FlagCols2Remove = Table.AddColumn(ConvertToTable, “RemoveCols_Flag”, each if ([ColumnNames] is null or [ColumnNames] = “null” or Text.Length(Text.Select([ColumnNames],{“0″..”9″}))>0) then 1 else if Text.Contains([ColumnNames],”- Copy”) then 1 else 0)

Note :

The “RemoveCols_Flag” retruns 1 for the Columns to Remove and 0 to Keep.

4) Next generate a final list of ColumnNames to Remove where RemoveCols_Flag=1

ColsList2Remove= (Table.SelectColumns(Table.SelectRows(FlagCols2Remove, each ([RemoveCols_Flag] = 1)),”ColumnNames”))[ColumnNames]

5) Finally, pass the above list of Columns to Source Query, to remove them and Keep only the required columns to load

SelectReqCols= Table.RemoveColumns(Table.PromoteHeaders(Source{[Item=xl_Dataset,Kind=”Sheet”]}[Data], [PromoteAllScalars=true]),ColsList2Remove)

Result :

The complete Power Query used in this Scenario is as follows:

let

    Source = Excel.Workbook(File.Contents(xl_DataSource), null, true),

    SelectColNames = Table.ColumnNames(Table.PromoteHeaders(Source{[Item=xl_Dataset,Kind=”Sheet”]}[Data], [PromoteAllScalars=true])),

    ConvertToTable = Table.RenameColumns(Table.FromList(SelectColNames, Splitter.SplitByNothing(), null, null, ExtraValues.Ignore),{{“Column1”, “ColumnNames”}}),

    FlagCols2Remove = Table.AddColumn(ConvertToTable, “RemoveCols_Flag”, each if ([ColumnNames] is null or [ColumnNames] = “null” or Text.Length(Text.Select([ColumnNames],{“0”..”9″}))>0) then 1 else if Text.Contains([ColumnNames],”- Copy”) then 1 else 0),

    ColsList2Remove = (Table.SelectColumns(Table.SelectRows(FlagCols2Remove, each ([RemoveCols_Flag] = 1)),”ColumnNames”))[ColumnNames],

    SelectReqCols = Table.RemoveColumns(Table.PromoteHeaders(Source{[Item=xl_Dataset,Kind=”Sheet”]}[Data], [PromoteAllScalars=true]),ColsList2Remove),

    #”Changed Type” = Table.TransformColumnTypes(SelectReqCols,{{“Order_Id”, Int64.Type}, {“Order_Date”, type datetime}, {“Prod_Id”, Int64.Type}, {“Country_Id”, Int64.Type}, {“Cust_Id”, Int64.Type}, {“Gross_Sales”, Int64.Type}})

in

    #”Changed Type”

——————————————————————————————————–

Thanks, TAMATAM ; Business Intelligence & Analytics Professional

——————————————————————————————————–



Source link

Be the first to comment

Leave a Reply

Your email address will not be published.


*