Part 3: Reflections on Excel to DAX Translation


Introduction

So, yes, I feel much better now after Part 2: Reflections on Excel to DAX Translation. In this post I would like to cover some of the challenges and triumphs encountered thus far, talk about my favorite solutions and ask your opinions!

Other Things I Learned

First, real quickly, here are a couple of other notable things that I learned thus far:

Favorites

So, of course you are going to develop favorite solutions, ones that you are proud of or just make you laugh. I have a couple of these.

IMSQRT

IMSQRT is definitely one of my favorites just because I laugh every time I say it. I M sqROOT! And yes I always say it with “the voice”! Other than that, it is one of the many complex number functions and returns the square root of an imaginary number. 

imsqrt.jpg

ARABIC

I just consider the solution for ARABIC elegant, that’s all. No other reason. I love the method of converting a string into a table. I have used the technique quite a bit in this project. And I enjoy the way it looks “ahead” and “behind” within the table representation of the string in order to arrive at its final calculation.

Image-of-Arabic-Nigths-by-Erdali.jpg

TRIMMEAN

TRIMMEAN is probably the function that I am most proud of. This one took some effort to puzzle through and, frankly, it accomplishes something that I was not sure DAX was even capable of doing. In fact, I nearly gave up on this function two or three times. To make it work, I had to implement a pretty novel idea I had and I consider the end result rather elegant overall. In fact, I like it so much I am going to put forth the extra effort in explaining exactly how it works like the explanations I provide in my new book, DAX Cookbook: Over 120 recipes to enhance your business with analytics, reporting, and business inte….trimmean3.jpg

Shameless self-promotion out of the way…the purpose of the TRIMMEAN function is to essentially remove outliers and then caculate the average. The Excel function documentation describes it this way:

 

“Returns the mean of the interior of a data set. TRIMMEAN calculates the mean taken by excluding a percentage of data points from the top and bottom tails of a data set. You can use this function when you wish to exclude outlying data from your analysis.”

 

And here is the code for TRIMMEAN:

 

TRIMMEAN = 
    VAR __Table = 
        ADDCOLUMNS(
            'Table',
            "Rank",RANKX('Table',[Value])
        )
    VAR __Percent = .2
    VAR __Count = COUNTROWS(__Table)
    VAR __Trim = MROUND(__Count * __Percent,2) / 2
    VAR __RanksTable =
        ADDCOLUMNS(
            ADDCOLUMNS(
                GROUPBY(
                    __Table,
                    [Rank],
                    "Count",COUNTX(CURRENTGROUP(),[Value]),
                    "Value",MAXX(CURRENTGROUP(),[Value])
                ),
                "CumulativeBottomCount",COUNTROWS(FILTER(__Table,[Rank] >= EARLIER([Rank]))),
                "CumulativeTopCount",COUNTROWS(FILTER(__Table,[Rank] =__MinBottom),
            "Product",IF([BottomWhile]>=0,[Count]*[Value],([Count] + [BottomWhile]) * [Value])
        )
    VAR __FinalTopRankTable =
        ADDCOLUMNS(
            FILTER(__RanksTable,[TopWhile]>=__MinTop),
            "Product",IF([TopWhile]>=0,[Count]*[Value],([Count] + [TopWhile]) * [Value])
        )
    VAR __Bottom = SUMX(__FinalBottomRankTable,[Product])
    VAR __Top = SUMX(__FinalTopRankTable,[Product])
RETURN
    DIVIDE(
        SUMX(__Table,[Value]) - __Bottom - __Top,
        __Count - 2 * __Trim
    )

 

Stepping through the code, we start by simply taking our table under analysis and adding a simple Rank column using RANKX. We store this as the variable __Table. The __Percent variable is the percentage of rows to trim off of the top and the bottom. In the code above, that is 20% (.2). To calculate how many rows need to be trimmed off of the top and bottom, we caculate __Count using COUNTROWS and the __Trim. Thus, __Trim is the number of rows that should be trimmed off the top and the bottom of the ranked table. In other words we want to trim off the __Trim lowest and the __Trim highest values.

 

Now, if we didn’t want to account for ties in ranks, we’d be pretty close to done, we could just get our highest rank and lowest rank, subtract/add our value for __Trim from these values and filter our table, AVERAGEX, boom, done. However if we want to account for ties so that our DAX TRIMMEAN function operates exactly like the Excel version, things are rather more complicated. We start by creating the __RanksTable. To do this we start by using GROUPBY to group __Table by Rank. and we add columns for how many at each rank we have, Count, as well as the value of the data at that Rank, Value

 

To accomplish what we want we essentially have to implement a double, concurrent while loop. Of course DAX does not have a while loop, but we can essentially emulate one, or in this case two. In order to do this, we implement two running totals, the columns, CumulativeBottomCount and CumulativeTopCount by using ADDCOLUMS. These essentially provide a cumulative total coming from opposite ends of our rankings. CumulativeBottomCount counts up the number of rows starting from the bottom (lowest values) while CumulativeTopCount counts up the number of rows starting from the top (highest values).

 

We now need to implement our while loop counters. These are BottomWhile and TopWhile. We do this by again using ADDCOLUMNS. We must do this as a separate ADDCOLUMNS because we need our CumulativeBottomCount and CumulativeTopCount columns. The calculations for both of these columns are almost identical, we simply subtract either our CumulativeBottomCount and CumulativeTopCount columns from __Trim. Thus, these columns count down starting at __Trim until they reach __Trim minus the cumulative count of columns in our dataset.

 

Our next step is to identify where we cross the threshold of interest. In other words where we would “break out” of our while loop in a more tranditional programming language. In this case, our threshold of interest is the row where our counter goes negative or equals zero. This is because this is where we need to stop “trimming” rows/values. These threshold values are stored in the variables __MinBottom and __MinTop.

 

We can now calculate two tables that only contain ranks whose values we wish to trim out of the final dataset. These are the variables __FinalBottomRankTable and __FinalTopRankTable and they operate identically. We start by using FILTER to filter out only the rows that have a BottomWhile or TopWhile value that is greater than or equal to our correspoding __MinBottom or __MinTop variables. We add a Product column using ADDCOLUMNS and this is where the magic happens. If our loop counter in a row is greater than or equal to zero, we simply return the Count column multiplied by the Value column as we will want to deduct the entire value of all rows in our original dataset at this rank. However, if our while loop counter is less than 0, then we add our while loop counter (which is negative) to our Count column and then multiply by the Value column. Thus, we will only be deducting a portion of the values at this particular rank.

 

The rest of the formula is very straight forward, we sum up the total amount of values that we wish to subtract from our overall total, __Bottom and __Top. We then return our average or mean by summing up all of the the table’s Value column while subtracting __Bottom and __Top. We then divide this sum by our __Count minus 2 multiplied by the number of items to trim, __Trim.

 

If you think it is overly complex, I completely understand. If you have a better way of doing it, I’d love to hear it! I went down a number of rabbit holes on this one and this is how I finally solved it. But, there may be a better way!

 

Least Favorite

Without a doubt, COUPON. That’s all I’ll say, coding this one was awful for a variety of reasons.

 

Most Useful

OK, a little different than favorite, here are the functions I created that I think are the most useful and should be considered for inclusion in Power BI Desktop as new Quick Measures:

  • BASE – This forms the basis for allowing DAX to handle bases other than base 10 as well as opens up the entire bitwise operator library of functions
  • DECIMAL – Obviously, the complement to BASE
  • COMPLEX – Similar to BASE, opens up the entire complex math functions to DAX
  • To **bleep** with MOD – You know, because it fixes the MOD function

Honorable Mentions

I really think that the techniques used in TRIMMEAN and ARABIC are highly useful techniques but do not rise to the general utility of a true Quick Measure. Maybe TRIMMEAN. It does have fairly sophisticated DAX that casual users would probably not be able to code and could be parameterized easily, Table[Column] and % to trim.

 

Some of the distribution functions (BINOM*, NEGBINOM.DIST, LOGNORM*) could easily be parameterized and made into Quick Measures.

 

Oh, definitely @OwenAuger ‘s ERF replacement!

 

Conclusion

This project has been fantastic in terms of a learning experience thus far but I would love to hear from the community. What do you think of this project? What functions are your favorites? Are there any that you think should be included as default Quick Measures in Power BI Desktop?



Source link

Be the first to comment

Leave a Reply

Your email address will not be published.


*