Why reduce steps of our power query?
The first thing we are going to talk about before seeing the tips is to understand why this action is necessary. The answer is focused on two benefits with equal importance.
- Performance: although it may be absurd to improve speed because at the apply a single script is generated and at the end of the day it would execute the same functions, the fact of not having them consolidated in the same steps can generate worse performance. Like any other query language e.g. SQL, the response time depends on how the query logic generated. We can measure it or watch it in the following video: https://www.youtube.com/watch?v=2bzJFrJPg_c
- Maintenance: an issue that is left aside a lot and is brutally important is maintenance. Generating a minimum of concrete steps exponentially helps the ease of reading an ETL. Imagine that having to read 15 “Change Data Type” to find the error in a column that no longer exists would be more complex than if we only had a single step “Change Data Type”. Not only to execute actions but also to understand what the engine is doing at a glance.
Now that we know the importance of reducing steps of the query editor, let’s look at some examples. It should be admitted that 80% of cases of excessive steps occur for two reasons. On the one hand the “rename” of our columns and on the other the “change type of data” of our columns. Those two are usually the ones that introduce the most garbage in the editor. One way to avoid this is to think about the logic of the query first and then execute steps instead of clicking on everything we can think of.
IMPORTANT: The key to getting into a column reduction habit is to have the formula bar ALWAYS visible in the query editor.
Change data type
This step should only apply if our engine does not have differentiated type and if possible only once. There are engines, such as databases, that will import the data already differentiated avoiding the step. But there are others that will not and it is important that each column is with its data type. Let’s start by avoiding the autogeneration of this step by PowerBi so that it does not generate the step automatically because otherwise it executes it at the beginning and after actions such as “Split Column”. This can be disabled in the desktop configuration as shown in the following image:
One of the classic needs of converting data types is after generating an add custom column. After adding a custom column, the engine usually assigns the data type “Any ABC123”. The important thing is to pay attention to what our function does to find these tips. Let’s see the example from a column date we want a columns with the previous month:
Notice that when interpreting the code the description of our function indicates that there is a parameter that we do not use. The last parameter specifies “columnType as nullable”. When we create a custom column we can adjust the data type. We could specify that it is type date as follows
Rename columns
The need to maintain a good semantic layer understandable to our users usually leads us to rename our columns a lot. Like the rate change I will say that this step should ideally be executed once or none.
Next the previous logic we will pay attention to the code of the functions to find that the functions usually demand the name of the column after a transformation which would prevent us from generating new steps of “Rename Columns”. Let’s look at some examples:
By adding a custom column I can change the name, but if I forget I can see in the code where I change it
Similar to the previous one of a new custom column, most actions under the “Add Column” tab usually have a way to change their name. An example of auto-generated steps such as a two-column “Subtraction”:
As we can see there we could quietly replace in the code the text “Substraction” that was generated automatically by the desired name instead of generating a new step.
A classic example to rename is splitting columns by delimiter. for example if you want to split my name column that is in the format “last name,first name” into two columns the step will look like this
In the code we see again that at the end it indicates the name that the columns will have after dividing the original “Name”. There you could replace Name.1 with Last Name and Name.2 with First Name.
As there are many more, I close the “rename” with another classic that would be “Expand records”. When we link tables, we read APIs, etc. It often happens that we have to expand records. The expansion step has two important details. On the one hand choose if we want to keep the name of the table / record as a prefix and as always at the end of the Power Query code we will see the names that would be given to a column if we want to change them.
Already beforehand ask us for the prefix. If we did not realize we could also change it as in the “split example”:
Reorder columns
There are completely unnecessary steps that do not add value to our final query but do take away performance. This is the case with reordering columns. This action is merely aesthetic and only valid for work in the query editor since once applied and closed the Power Bi editor will put them in alphabetical order in our view of dragging fields. Unless we have operations that are very difficult to understand if we don’t have the fields close to each other, this operation ends up being a performance cost for the query.
Use first row header
Reading the structure of the functions we use can help in many ways as we could see in the change of data type. Another option that can help us is to use the first row of data as a header (Promote Headers). When we connect to Excel it usually happens that the engine adds a row to recognize the headers. If the connection is to an xlsx file we can avoid that step since the Excel.Workbook function has a parameter to read them. How did we know? reading the function. Let’s see the following image:
We could change that null to true so that the result of clicking the desired Sheet or Table already incorporates the first row as Headers without having to create another step.
Regroup same type of steps
Before starting to click by going column by column, it is necessary to analyze the entire table and think about the logic we want to implement. That way we concentrate on performing a common action for the entire table and not repeating steps column by column. It may seem the same for someone who is not used to going column by column renaming and changing the type, but it would be best to do the same type of step for all columns once since it would cause them to accumulate in the code instead of generating many steps in order to move from the image on the left to the right.
If we first (or last) made all the rename, the code would be like this:
We could even directly edit the code if we wanted to. Same case for the data type the code would look like this
If we do not want to enter the code, we already perform the renaming and after working for a while we remember that we forgot a column rename. We can avoid the generation of ANOTHER step if we go to that renamed step with one click. If we execute the same step as the selected step, the power query engine will ask if we want to create a new step or generate the code within it.
This could be applied to different types of steps such as “Filter rows”. We could filter rows by different column conditions in the same step instead of having several steps that filter rows.
Other possibilities
Along with these tips I’m sure there are many more. The more we know the M language, the more tips we will know and we could reach more complex solutions. For example, if we want to extract the name of the month from a column “Period” with the format “2022-05” it would be enough to create a new column converting that period into a date and extracting the expected format:
Date.ToText(
Date.From([Period]), "MMMM"
)
That way it was not necessary to duplicate period to convert it to date to be able to extract the name of the month.
NOTE: Remember that if we have the possibility to execute a native query to the data source, we could reduce our query. For example, filtering rows with a SQL WHERE instead of generating several steps.
I hope this helps you reduce the steps of your querys so that they are maintainable in the future. What about you? What tips do you have to reduce steps in the query editor?
Original post in spanish from ladataweb: https://blog.ladataweb.com.ar/post/689587368847671296/powerquery-tips-para-reducir-pasos-en-editor-d…
Be the first to comment