Looking for a great introduction to DAX?


First, an intro from Matt.  I have only met Daniel Duque via LinkedIn.  Daniel contacted me last month with the kind message below.  He wanted to write a book review and was looking for suggestions on where to do this.  I suggested he write it here on my blog, hence the article that follows below.  In addition to the fact that I appreciate the positive feedback, I also really enjoyed Daniel’s writing style.  Not everyone is good at clear, concise communication and I think Daniel has it.  Read his review below and let me know if you agree.

Daniel

5 Things I Learned from Reading Supercharge Power BI (and why it’s a great introduction to DAX)

To start things off I want to first state why I chose to read Supercharge Power BI in the first place. I did so because I had a simple goal in mind – to improve my Power BI skills. There are a few different routes I could have gone with this. Perhaps take a class on a website like Udemy? Or attain a Power BI certification via Microsoft?  These are both solid options, but I wanted to start my journey with something that felt more ‘fundamental’, so I decided to focus on learning DAX specifically. A quick google search on books for learning DAX led me to the book ‘Supercharge Power BI’ by Matt Allington.

Now some may be wondering “what even is DAX?”. Well, here’s the definition directly from the Microsoft website –

Data Analysis Expressions (DAX) is a programming language that is used throughout Microsoft Power BI for creating calculated columns, measures, and custom tables.

For those that have ever used data visualization software, or even some other type of data manipulation software, it may now make sense why I wanted to start with DAX. If it’s not clear, essentially what I wanted was to learn how to prepare my data properly inside Power BI. Doing this ‘data prep’ would then allow me to create more insightful visualizations, or perhaps in some instance, just help me create ‘prettier’ charts.

In briefly sharing these five things I learned from this book, I hope to inspire the reader to dig deeper into the capabilities of DAX and Power BI. So, let’s get to it. Here’s 5 things I learned from reading Super Charge Power BI:

1) Data Modelling Schemas for Power BI

In the business of data, how you choose to model your data can be the difference between creating incredible insights for your company or creating chaos and confusion in your company (not to mention angry supervisors). That’s why I found this topic particularly important. Some common modelling schemas for Power BI include the snowflake schema and the star schema. Both shown here below:

Star Schema

Star Schema

Snowflake Schema

Snowflake Schema

Though the snowflake schema is popular when using traditional transactional databases, it’s not the best way to logically organize your data in Power BI. For those wondering why, well, there’s many reasons. But here’s two: first, the additional relationships in a snowflake schema can potentially cause negative performance impacts on your reports; it’s never fun when you’re presenting a report and a visualization won’t load. Second, a star schema is more ‘scalable’, simply meaning that new dimensions can be easily added.

Now, there’s an additional nugget of information I want to include here, and that is the visual star schema layout that Matt Allington refers to as the “Collie Layout Methodology” (It was invented by a fella named Rob Collie). Star schemas do not actually have to be laid out in a “star” shape. This layout is an arrangement of the classic star schema but laid out in a way that the dimensional tables are at the top, and the data tables are at the bottom. The intention is that this layout allows the user to visualize filter propagation from dimension to fact tables more easily. Shown here below:

Collie Layout Methodology

Collie Layout Methodology

2) How and when to use Measures

A measure is a way of defining a calculation in a DAX model. This is useful when you need to aggregate values from multiple rows in a table. This is similar to a formula cell in Excel if you’d like to think of it that way. The difference being a measure can operate over an entire data model as opposed to just a few cells in a spreadsheet. Some simple examples of how a measure could be used would be to concatenate the values of two columns, or to subtract the value difference between two columns. Here’s an example of what a measure could look like:

Measure

I want to preface the next topic (calculated columns) by saying here that you could achieve many of the same results of a measure by using a calculated column. One of the main differences between these two is that calculated columns will take up more space in your data model, as they are adding more columns to your table(s). Measure formulas are stored in Power BI in tables, but unlike calculated columns, there is no aggregated data that is added to the tables. That’s why it’s typically best practice to use measures vs calculated columns when possible. This will help keep your reports more efficient.

There are some additional differences between measures and calculated columns. Some of those differences deal with how DAX and Power BI filter data. I won’t get into that specifically here though, as it’s somewhat complex to explain. Matt Allington does a great job of explaining this concept in his book for those who want to understand these filtering differences better. For now, let’s move on to the next topic.

3) How and when to use Calculated Columns

PowerBI makes it easy for users to create new columns for tables that exist in the data model via calculated columns. A calculated column is a column for which the contents are defined by a DAX formula and evaluated row by row in a table according to the formula. These columns can be used in any part of a report. The way Power BI will interact with these columns is no different than how it interacts with any other column.

In the measures section I stated that it is typically best practice to use measures instead of calculated columns whenever possible. So here I want to give you two instances of when it may actually be better to use a calculated column. Again, the reason as to why this is has to do with the different ways measures vs calculated columns filter data within Power BI. You’ll have to dig beyond this article to truly understand these topics, but I at least want to give you these examples to let you know what is possible.

Two instances when you might want to use calculated columns are for static segmentation and ABC classification. What static segmentation does is classify numerical values into ranges. A classic example of this is the analysis of sales on types of products by price range, as opposed to listing item prices individually; essentially simplifying the analysis by grouping item prices within a few segmentations. The second instance is ABC classification, what is done here is you’re grouping values together that make up a part of a percentage total. This would help if you’re trying to determine what products make up the core business of your company. You could make a tier a of your best-selling products, a tier of well performing products, and a tier for all other products. These products could make up 70%, 20%, and 10% of total sales, respectively. The three segmentations are what would make the “ABC” pattern.

4) Using Variables in DAX

This concept might be the easiest to understand for most readers, as presumably most people reading this have probably taken an algebra class at some point in their lives. If you understand variables in algebra, you should be able to understand what you can potentially do with variables in DAX, especially for those who have done any programming previously. A variable is defined in DAX by typing VAR, followed by the variable name and expression, and then concluded with RETURN. Here’s an example of what using variables in DAX may look like:

Var Formula

The above DAX could theoretically be written without variables, however without variables the formula would be much harder to interpret. This leads me into some of the reasons why DAX variables are useful. As shown above, they improve the readability of the code since you can assign names to expressions which makes the components of the formula more descriptive. Variables also allow you to breakdown the overall DAX expression, which make it easier to debug as you can run each variable individually. The last thing I’ll say about variables is that they also improve the performance of your DAX, as the value of a variable is computed only once. Alternatively, to achieve the same results from a DAX expression without variables you would need to repeat the same sub-expression throughout your DAX formula which would mean multiple executions of the sub-expression.

5) Time Intelligence Functions

We’ll finish off on a topic that is a paramount ingredient to almost any business report: Time Intelligence. These types of functions are important because they allow the user to create data calculations and visualizations broken down by year to date, quarter to date, month to date, and so on.  Within Power BI, there are 35 built in Time Intelligence functions for user convenience. However, the fun thing about DAX is that it allows you to create your own custom time intelligence functions. This comes in handy when working with time periods that may not be common throughout the general business world but may be common for your specific business or domain.

Conclusion

To truly understand the above topics, you will require hands-on experience with Power BI, so if you were confused by any of the above topics that would be expected as I only covered them briefly. At this point it’s expected the reader will have more questions about DAX than they did before they read this article. The main goal here is to encourage readers to dig deeper into DAX and Power BI. So, I encourage you to do so by either picking up a copy of Matt Allington’s book Supercharge Power BI or by exploring the deep depths of the World Wide Web.





Source link

Be the first to comment

Leave a Reply

Your email address will not be published.


*