Why You Shouldn’t Avoid Calculated Columns in Powe…


Recently Ed Hansberry posted a fantastic article, Why You Should Avoid Calculated Columns in Power BI — ehansalytics. It’s a great article in which Ed clearly articulates the standard Power BI best practice/mantra which can be summarized as:

  • If you have to use calculated columns, create those in Power Query or the source instead of DAX
  • Even better, use DAX measures instead of calculated columns

Now, I won’t reiterate the reasons here, it has to do with model size, refresh performance, etc. Read the article, I highly recommend it. However, as is too often the case with most best practices, a lot of nuance is generally glossed over. This is certainly true for the mantra in question, the use of DAX calculated columns. So, if you have read the best practices around this and are concerned about your use of DAX calculated columns, let me assure you that it is not the end of the world. Here’s why.

The best practices around DAX calculated columns are really geared toward data modeling professionals, not business users. And let’s face it, professional data modelers probably only represent a small fraction, maybe 10%-20% of the Power BI user base. Professional data modelers are highly technical and are likely versed in a variety of programming languages such as SQL, Power Query (M), as well as DAX, R, Python, you name it.

Let’s contrast this with a typical business user of Power BI. Most business users likely gravitate initially towards DAX because it has a familiar feel and many similar functions to Excel. Being somewhat less technical than professional data modelers, this may end up being the only language they ever really use.

It is also quite likely that these business users do not have access to source systems and probably wouldn’t even know where to begin even if they had access. And, Power Query (M) is an entirely different sort of programming language than something like DAX or Excel functions.

None of this is to insult the average business user but they are a business user and not a professional data modeler for a reason. They know more about the business and less about the technical stuff.

The point is, that requiring business users to learn something other than DAX really amps up the learning curve for Power BI and that’s not really what anyone wants. We want more people to adopt and use Power BI, not less.

Worrying about how much a calculated column is compressed, its cardinality and how much storage it takes up in the model really only comes into play at scale, like in the 200 million row table from Ed’s example. And even then, the calculated column still only takes up less than 1% of the total storage for the model.

So, again, for professional data modelers dealing with enterprise scale data, these sorts of things might matter. But for the typical business user dealing with tables with rows that number in the hundreds, thousands, tens of thousands or even a few million, the additional storage consumed and perhaps performance are barely noticeable, if at all. At the end of the day, it simply does not matter.

Another aspect of this discussion is overall maintainability of the solution, a subject that is too often overlooked. In most software solutions, writing a system in three different languages is generally considered a bad idea. So if you implement a software system partly in PHP, some C# and some Java, that’s a lot of extra technical debt to incur versus writing the system in a single language. It takes additional expertise to work on the system and make changes, fix bugs, etc. because you have to know all three different languages. So, there’s something to be said for just using a single language when implementing software.

Now, consider this with respect to the guidance that DAX calculated columns are bad. Well, if you are already going to have DAX measures and you can also write your calculated columns in DAX, that’s pretty good maintainability. Single language, single place to make changes, etc. Contrast that with the case where some of your custom columns are in SQL, some are in Power Query and some are in DAX. That’s pretty poor overall maintainability.

It is generally argued that creating columns in Power Query is better than DAX because data refreshes perform better. Well, not always. If your calculated column in Power Query breaks query folding, then it’s quite likely that you just decreased performance rather than increased performance over a DAX calculated column. So, absolutes are dangerous.

This is another “best practice” that needs some nuance. Look, measures are great, but they have some problems. First, they require a bit more technical expertise than calculated columns because you are dealing with filter context (in other words “filtering”). Second, they can often suffer from the dreaded “measure totals problem” whereas calculated columns never suffer such issues. So, again, measures tend to increase the technical complexity involved in a solution and this can be burdensome to business users just trying to do some basic analysis of their data.

Again, I want to be clear, Ed’s article is a great article and there is nothing “wrong” with the advice and guidance provided. It is absolutely the generally accepted “best practice” when it comes to professional data modeling. The point here is that as with all such “best practices”, the application of them is seldom, if ever, universal. So, if you are worried that your DAX calculated columns are going to bring about the apocalypse, don’t. If you are part of the 80%-90% of Power BI’s target market, business users, you almost certainly don’t have to worry about the issue in the slightest.



Source link

Be the first to comment

Leave a Reply

Your email address will not be published.


*