
Use Case – There are many scenarios where you need to refer to previous row such as need to compute month on month growth in Sales. In this article, I talk about 3 techniques to refer to previous row and decide which one is the best.
Solution – Below are 3 methods to refer to previous row. To demonstrate, I have created a table which is named Master. I have created 3 queries which reference to this Master to demonstrate 3 methods. In below method, we are referring to previous row for Units Sold column.
Method 1 (Index Method) – In this method, we insert a 0 based index and use that to refer to a previous row. Below is the code.
let
Source = Master,
#"Added Index" = Table.AddIndexColumn(Source, "Index", 0, 1, Int64.Type),
#"Added Custom" = Table.AddColumn(#"Added Index", "Previous", each if [Index]>0 then #"Added Index"[Units Sold]{[Index]-1} else null)
in
#"Added Custom"
Method 2 (Merge Method) – In this method, 2 indexes are inserted – One is 0 based and one is 1 based. Then we use these 2 indexes to self join the table. In this method, first record goes down to end of table, hence we need to perform a sort to maintain the original sort order. Below is the code.
let
Source = Master,
#"Added Index" = Table.AddIndexColumn(Source, "Index", 0, 1, Int64.Type),
#"Added Index1" = Table.AddIndexColumn(#"Added Index", "Index.1", 1, 1, Int64.Type),
#"Merged Queries" = Table.NestedJoin(#"Added Index1", {"Index"}, #"Added Index1", {"Index.1"}, "Added Index1", JoinKind.LeftOuter),
#"Expanded Added Index1" = Table.ExpandTableColumn(#"Merged Queries", "Added Index1", {"Units Sold"}, {"Previous"}),
#"Sorted Rows" = Table.Sort(#"Expanded Added Index1",{{"Index", Order.Ascending}}),
#"Removed Columns" = Table.RemoveColumns(#"Sorted Rows",{"Index", "Index.1"})
in
#"Removed Columns"
Method 3 (List Method) – In this method, we use Table.FromColumns function to refer to previous row. Below is the code.
let
Source = Master,
Custom1 = Table.FromColumns(Table.ToColumns(Source) & {{null} & List.RemoveLastN(Source[Units Sold],1)},Table.ColumnNames(Source)&{"Previous"})
in
Custom1
A sample pbix file can be downloaded from here illustrating the above
https://1drv.ms/u/s!Akd5y6ruJhvhugbERzhaB96FVlLQ?e=SmjWXg
Methodology to measure time taken by the queries – Since Power BI Desktop doesn’t provide a very reliable method to measure time, the data has been loaded to Excel and Excel VBA code has been used to measure the time taken by the query.
Below VBA code has been used to measure time performance of queries. The code is credited to Charles Williams of FastExcel and has been taken from here – https://docs.microsoft.com/en-us/previous-versions/office/developer/office-2010/ff700515(v=office.14…
#If VBA7 Then
Private Declare PtrSafe Function getFrequency Lib "kernel32" _
Alias "QueryPerformanceFrequency" (cyFrequency As Currency) As Long
Private Declare PtrSafe Function getTickCount Lib "kernel32" _
Alias "QueryPerformanceCounter" (cyTickCount As Currency) As Long
#Else
Private Declare Function getFrequency Lib "kernel32" _
Alias "QueryPerformanceFrequency" (cyFrequency As Currency) As Long
Private Declare Function getTickCount Lib "kernel32" _
Alias "QueryPerformanceCounter" (cyTickCount As Currency) As Long
#End If
Function MicroTimer() As Double
' Returns seconds.
Dim cyTicks1 As Currency
Static cyFrequency As Currency
MicroTimer = 0
' Get frequency.
If cyFrequency = 0 Then getFrequency cyFrequency
' Get ticks.
getTickCount cyTicks1
' Seconds
If cyFrequency Then MicroTimer = cyTicks1 / cyFrequency
End Function
I am invoking below function and I will be using a Sub to call this function with required Query name
Function GetTime(QueryName)
Dim Timer As Double
Dim PreviousRefreshStatus As Boolean
Timer = MicroTimer()
With ThisWorkbook.Connections("Query - " & QueryName).OLEDBConnection
PreviousRefreshStatus = .BackgroundQuery
.BackgroundQuery = False
.Refresh
.BackgroundQuery = PreviousRefreshStatus
End With
GetTime = Format(MicroTimer() - Timer, "0.000")
End Function
This is the Sub used to populate the results of 3 approaches in one column of Result worksheet.
Sub Result()
Dim Ws As Worksheet
Dim i As Long, LastRow As Long
Dim Rng As Range, Cell As Range
Application.ScreenUpdating = False
Set Ws = Worksheets("Result")
LastRow = Ws.Range("A" & Rows.Count).End(xlUp).Row
Set Rng = Ws.Range("A2:A" & LastRow)
For Each Cell In Rng
LastColumn = Ws.Cells(Cell.Row, Columns.Count).End(xlToLeft).Column
Cell.Offset(0, LastColumn) = GetTime(Cell)
Next Cell
Application.ScreenUpdating = True
End Sub
Once I run the macro, it populates the result for all approaches. Hence, I ran the query 3 times to measure time in 3 rounds and took average of 3 rounds to arrive at the final time taken by a query.
NOTE – Below time results should not be taken as absolute but rather should be used to compare performance of methods. Timing depends upon computer configuration, processes being run and various other factors. Also the queries run at different times, report different timings because of this. Lower the time is, variation is more. Higher the time is, variation is lesser.
Results – Below are the results for a dataset containing 500, 5000, 50000, half a million and 1 million records. Since Index method was clearly losing out and took 304 second just for 5000 records, hence for >5000 records, this method was discarded.
Note – List method is referred to as ListGenerate in below pictures
Conclusion – Yes, Merge method is something which is the the fastest. As number of records increase, List.Generate has been slowing catching up. This is partly due to the fact that we are forced to sort in Merge method to maintain original sorting order. But if we can live up without sorting, then Merge method’s performance will be all the more better.
All Excel files can be downloaded from below
— End of Article —
Be the first to comment