Best practices can be wonderful things when the rational behind them and their nuances are fully understood. However, applying best practices blindly can sometimes get you into trouble. For starters, best practices are most often general in nature and do not apply to every particular situation. In addition, best practices can sometimes conflict with one another. In this article, I explore two “best practices” that are often quoted within the Power BI community and demonstrate their nuances. Specifically, Roche’s Maxim and, in particular, using SUMMARIZE to create calculated columns.
Roche’s Maxim
Simply stated, Roche’s Maxim states:
“Data should be transformed as far upstream as possible, and as far downstream as necessary.”
Now, as best practices go, this is a pretty good one. However, there are still nuances. First, it’s distinctly possible that a report developer might not have access to the source system to make changes. Getting the changes made at the source might take weeks or even months depending of the size of the organization. While it is certainly possible to get the transformations made at the source, in the interests of speed it might make more sense to make the data transformations in Power Query then rather than hold up the process.
But, even Power Query might not be the best answer either in all cases. For example, perhaps there is a median calculation that needs to occur. Median calculations tend to break query folding. So, one can imagine a scenario where adding the calculation in Power Query breaks query folding and actually makes scheduled refreshes less performant than if the calculation was just done in DAX, assuming that fewer ills are encountered when doing so.
Then there is the subject of competing best practices. Another best practice is that for maintainability and supportability reasons it’s generally a best practice to write all code in as few languages as possible. If there are already going to be lots of DAX calculations within the report anyway and this is the only calculation that would occur in Power Query, then maybe a better path is to keep all of the calculations in a single coding language, DAX.
As I said, nuances. But, hopefully this illustrates some of the reasons why one cannot just go around applying best practices blindly without knowing the specific situation. Let’s now turn our attention to adding columns in DAX SUMMARIZE expressions.
Adding Columns in SUMMARIZE
So here is one that I see all the times in the forums. Myself or someone else will answer a question where SUMMARIZE is used and within that SUMMARIZE expression is the addition of a column, like so:
Measure =
SUMX(
SUMMARIZE(
'Sales',
'Product'[Color],
"Sales", SUM('Sales'[SalesAmount])
),
[Sales]
)
Quite frequently, some individual will come along and quote a “best practice” from this article that says that adding columns within a SUMMARIZE is always bad because it can lead to unexpected results. Not wrong results mind you, but unexpected.
But let’s dig into this “best practice”. First, unlike Roche’s Maxim, which I would feel confident in saying is a “generally accepted best practice”. This “best practice” with SUMMARIZE is only something I’ve seen espoused by an extremely limited number of people. It’s certainly not part of any official documentation on SUMMARIZE and I’ve seen nothing from Microsoft that states this as a best practice.
Futhermore, the proposed alternatives either don’t really work or are cumbersome and kind of nonsensical in nature. For example, SUMMARIZECOLUMNS pretty much doesn’t work in anything outside of a Card visual. Create a table or matrix and add any column and the thing up and dies with the error: ‘SummarizeColumns() and AddMissingItems() may not be used in this context.’ Not useful.
The other alternative results in a kind of bizarre construct. Essentially to write the measure using “best practice” would be:
Measure =
SUMX(
ADDCOLUMNS(
SUMMARIZE(
'Sales',
'Product'[Color]
)
"Sales", CALCULATE(SUM('Sales'[SalesAmount]))
),
[Sales]
)
Yep, a seemingly random CALCULATE with no filter clause. And yes, we all know that this is because of how CALCULATE translates row context to filter context, etc. But intuitive? Not in the slightest. There’s an even more unintuitive version using GENERATE and no ADDCOLUMNS but I think the point has been made.
But here’s the thing. If you actually read the entire article cited above regarding this “best practice”, you actually realize that the unexpected results only occur when using CALCULATE within the added column within the SUMMARIZE. This is because of the way that CALCULATE’s context switching and the clusters created by SUMMARIZE interact with one another.
The solution? Simple. Don’t use CALCULATE when adding columns within a SUMMARIZE expression. Duh. Why isn’t that the “best practice”? Unless someone can come to me with a concrete example where not using CALCULATE when adding a column within a SUMMARIZE expression returns a wonky, wrong or unexpected result, I just don’t see the point of throwing the baby out with the bath water.
Conclusion
Knowing about best practices is important. Knowing the nuances, rational and when to apply best practices is more important. Knowing the difference between a best practice and an opinion is perhaps the most important.
Be the first to comment