REFRAME in Power BI Direct Lake


Power BI offers a new type of connection to a Microsoft Fabric Lakehouse or Warehouse, called Direct Lake. Like DirectQuery, a Direct Lake connection does not need the data to be refreshed. However, the Power BI semantic model still has a Refresh setting that can be turned on or off. In this article and video, you will learn what the Refresh setting means for a Power BI semantic model that uses a Direct Lake connection, and why the process behind it is called Reframe.

Direct Lake

Microsoft Fabric introduces a new connection mode for Power BI: Direct Lake. In Direct Lake, the Power BI engine (also called the VertiPaq engine) reads data directly from the Parquet files of the tables in the Lakehouse or Warehouse, instead of copying that data into its own proprietary file format.

The Direct Lake connection behaves like DirectQuery in that it does not copy the data. However, the data inside the Parquet files is stored in a columnar format, so its performance is close to that of Import Data mode.

To learn more about Direct Lake, read my article and watch my video about it here:

REFRAME or Refresh

To understand the term REFRAME, let’s look at how the Power BI engine behaves when it connects to a table in a Lakehouse or Warehouse.

The tables in the Lakehouse or Warehouse are stored in Delta Lake format. The Delta Lake format keeps the data in Parquet files, and new Parquet files are generated as the data changes over time. JSON files in the table’s transaction log (the _delta_log folder) keep track of which Parquet files make up each version of the table.
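To make the role of those JSON files concrete, here is a minimal Python sketch of how a reader can replay a Delta table’s transaction log to work out which Parquet files belong to a given version. It is a simplification under stated assumptions: the file names are hypothetical, and only `add` and `remove` actions are handled. Real logs also contain `metaData`, `protocol` and `commitInfo` entries, as well as periodic Parquet checkpoints, and the sketch ignores all of these.

```python
import json

# Each string stands in for one commit file in a table's _delta_log folder
# (00000000000000000000.json, 00000000000000000001.json, ...). Real commit files
# are newline-delimited JSON with more fields; this toy keeps only add/remove paths.
COMMITS = [
    # version 0: the initial load writes two Parquet files
    '{"add": {"path": "part-0001.parquet"}}\n{"add": {"path": "part-0002.parquet"}}',
    # version 1: an update rewrites part-0002 into part-0003
    '{"remove": {"path": "part-0002.parquet"}}\n{"add": {"path": "part-0003.parquet"}}',
    # version 2: an append adds a new file
    '{"add": {"path": "part-0004.parquet"}}',
]


def active_files(commits: list[str], version: int) -> set[str]:
    """Replay commits 0..version and return the Parquet files that make up that version."""
    files: set[str] = set()
    for commit in commits[: version + 1]:
        for line in commit.splitlines():
            if not line.strip():
                continue  # skip blank lines
            action = json.loads(line)
            if "add" in action:
                files.add(action["add"]["path"])
            elif "remove" in action:
                files.discard(action["remove"]["path"])
    return files


for v in range(len(COMMITS)):
    print(v, sorted(active_files(COMMITS, v)))
```

Version 1 no longer includes part-0002.parquet. It is still on disk, but the log has marked it as removed.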

When Power BI reads a table, it uses the table’s JSON log files to find out which Parquet files hold the most up-to-date data. If the data is later updated and new Parquet files are generated, Power BI has to switch to the new set of Parquet files. This switch is called REFRAME: Power BI reframes the table onto the new set of Parquet files.
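The idea can be sketched in Python as a table that stays pinned ("framed") on one version of the log until it is explicitly reframed. This is a conceptual model under stated assumptions, not how the VertiPaq engine is implemented. The `DirectLakeTable` class, its `reframe` method and the file names are all hypothetical.

```python
import json

# Hypothetical commit log in a toy Delta format: one list item per commit,
# one JSON action per line.
LOG: list[str] = [
    '{"add": {"path": "part-0001.parquet"}}',
    '{"add": {"path": "part-0002.parquet"}}',
]


def files_at(log: list[str], version: int) -> frozenset[str]:
    """Replay add/remove actions up to and including `version`."""
    files: set[str] = set()
    for commit in log[: version + 1]:
        for line in commit.splitlines():
            action = json.loads(line)
            if "add" in action:
                files.add(action["add"]["path"])
            elif "remove" in action:
                files.discard(action["remove"]["path"])
    return frozenset(files)


class DirectLakeTable:
    """Toy stand-in for a semantic model table that is framed on one table version."""

    def __init__(self, log: list[str]) -> None:
        self.log = log  # shared reference: new commits appended to the log are visible here
        self.version = len(log) - 1
        self.files = files_at(log, self.version)

    def reframe(self) -> bool:
        """Point the table at the latest version; return True if the frame changed."""
        latest = len(self.log) - 1
        if latest == self.version:
            return False
        self.version = latest
        self.files = files_at(self.log, latest)
        return True


table = DirectLakeTable(LOG)
# An update lands in the Lakehouse: part-0002 is replaced by part-0003.
LOG.append('{"remove": {"path": "part-0002.parquet"}}\n{"add": {"path": "part-0003.parquet"}}')
print(table.version, sorted(table.files))  # still framed on version 1
table.reframe()
print(table.version, sorted(table.files))  # now framed on version 2
```

Until `reframe()` runs, the table keeps reading the old set of files, even though a newer version already exists in the log.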

REFRAME is different from Refresh. In a Refresh (for example, in Import Data mode), Power BI creates another copy of the data from the source. This can take a long time, depending on the volume of the data and the transformations applied. A reframe does not copy the data; Power BI simply starts reading from the new set of Parquet files. This process is usually very fast.
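A toy Python comparison shows why the two differ in cost. Under these hypothetical assumptions, a refresh has to touch every row, while a reframe only replaces the list of file references. The class names and the 200,000-row table are invented for illustration. The sketch also ignores the work Direct Lake does afterwards to load column data from the new files into memory when queries need it.

```python
from dataclasses import dataclass, field

# Hypothetical Delta table: two Parquet files with 100,000 rows each (rows are just ints here).
SOURCE: dict[str, list[int]] = {
    "part-0001.parquet": list(range(100_000)),
    "part-0003.parquet": list(range(100_000, 200_000)),
}


@dataclass
class ImportStyleModel:
    """Refresh: copy every row from the source into the model's own storage."""
    rows: list[int] = field(default_factory=list)

    def refresh(self, source: dict[str, list[int]]) -> int:
        self.rows = [row for file_rows in source.values() for row in file_rows]
        return len(self.rows)  # work grows with the number of rows copied


@dataclass
class DirectLakeStyleModel:
    """Reframe: only replace the list of files the model reads from; rows stay in place."""
    files: list[str] = field(default_factory=list)

    def reframe(self, source: dict[str, list[int]]) -> int:
        self.files = sorted(source)
        return len(self.files)  # work grows with the number of files, not rows


print("rows copied by refresh:", ImportStyleModel().refresh(SOURCE))                  # 200000
print("file references updated by reframe:", DirectLakeStyleModel().reframe(SOURCE))  # 2
```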

To learn more about the Delta Lake table structure, Parquet files, and the JSON files related to that, read my article and video here:

Refresh/Reframe settings in the Semantic Model

Although the process is called REFRAME, the corresponding setting of a semantic model that uses a Power BI Direct Lake connection is still named Refresh. Remember that on a Direct Lake connection, turning this setting on means Power BI will reframe: it will switch to the new Parquet files when they become available with the changes in the data.

When NOT to Reframe?

In most cases, it makes sense to keep REFRAME happening, which means leaving the Refresh setting ON. This keeps the Power BI semantic model up to date when the data is updated in the Lakehouse or Warehouse.

However, there are also situations where you may not want the REFRAME to happen as soon as the data gets updated. Here is one of those scenarios:

Imagine that you have a large-scale ETL process, including multiple Dataflows and Data Pipelines, that might take an hour or so to complete. In that process, the data may first land in a staging Lakehouse. It is then loaded into dimension tables and, after some transformations, into fact tables.

If the Power BI semantic model keeps reframing in the middle of such a process and reads the freshest data in each table, the tables might not reconcile with each other. Some of them (your dimension tables) may already be fully loaded while others (your fact tables) are not. A report built on such a semantic model will most likely show incorrect values.
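A toy Python example shows how a report measure goes wrong when the model reframes between the dimension load and the fact load. The tables, values and the "average sales per product" measure are hypothetical. The mid-ETL result matches neither the state before the load nor the state after it.

```python
# Hypothetical snapshots of two tables at different points of one ETL run.
# Today's load adds product 3 ("Gloves") to the dimension and its sales to the fact table.
DIM_PRODUCT_OLD = {1: "Bikes", 2: "Helmets"}
DIM_PRODUCT_NEW = {1: "Bikes", 2: "Helmets", 3: "Gloves"}
FACT_SALES_OLD = [(1, 300.0), (2, 60.0)]
FACT_SALES_NEW = [(1, 300.0), (2, 60.0), (3, 90.0)]


def avg_sales_per_product(dim: dict[int, str], fact: list[tuple[int, float]]) -> float:
    """A typical report measure that combines a dimension table and a fact table."""
    total = sum(amount for _, amount in fact)
    return total / len(dim)


# No reframe yet: both tables at their old, mutually consistent versions.
before_etl = avg_sales_per_product(DIM_PRODUCT_OLD, FACT_SALES_OLD)
# Reframe mid-ETL: the dimension is at its new version, the fact table is not.
mid_etl = avg_sales_per_product(DIM_PRODUCT_NEW, FACT_SALES_OLD)
# Reframe once, after the whole ETL process: both tables at their new versions.
after_etl = avg_sales_per_product(DIM_PRODUCT_NEW, FACT_SALES_NEW)

print(f"before ETL: {before_etl:.2f}")  # 180.00
print(f"mid-ETL:    {mid_etl:.2f}")     # 120.00 (an inconsistent mix of versions)
print(f"after ETL:  {after_etl:.2f}")   # 150.00
```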

For such scenarios, it is more reliable to turn off reframing (the Refresh setting of the semantic model). Instead, trigger the semantic model refresh (which, in this case, is a reframe) through an activity at the end of the Data Pipeline, after the whole ETL process is done.
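The approach described here uses the Semantic Model refresh activity inside the Data Pipeline. If the last step of your ETL runs somewhere else instead, such as a notebook or an external scheduler, one alternative is the Power BI REST API "Refresh Dataset In Group" endpoint. It requests a refresh of a semantic model, which for a Direct Lake model is a reframe. This is a minimal sketch that only builds the request with Python's standard library. The workspace ID, semantic model ID and access token are placeholders, and acquiring a Microsoft Entra ID token is not shown.

```python
import json
import urllib.request

POWER_BI_API = "https://api.powerbi.com/v1.0/myorg"


def build_refresh_request(workspace_id: str, dataset_id: str, access_token: str) -> urllib.request.Request:
    """Build (but do not send) a POST to the Power BI 'Refresh Dataset In Group' endpoint.

    For a Direct Lake semantic model, this refresh is a reframe: no data is copied.
    """
    url = f"{POWER_BI_API}/groups/{workspace_id}/datasets/{dataset_id}/refreshes"
    body = json.dumps({"notifyOption": "NoNotification"}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
    )


# Placeholder IDs and token (hypothetical values).
req = build_refresh_request("<workspace-id>", "<semantic-model-id>", "<access-token>")
print(req.get_method(), req.full_url)
# To actually trigger the reframe once the ETL has finished:
# with urllib.request.urlopen(req) as resp:
#     print(resp.status)  # the service answers 202 Accepted when the refresh is queued
```

Call this only after the last fact table has loaded. Calling it any earlier reintroduces the mid-ETL inconsistency.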

Summary

The Power BI Direct Lake connection keeps the data up to date using the REFRAME process. This setting is ON by default, whether the semantic model is the default semantic model or a custom one. However, in some scenarios you may want to turn it off and trigger the refresh through an activity in a data pipeline or with a scheduled refresh.


Reza Rad

Trainer, Consultant, Mentor

Reza Rad is a Microsoft Regional Director, an Author, Trainer, Speaker and Consultant. He has a BSc in Computer Engineering and more than 20 years’ experience in data analysis, BI, databases, programming, and development, mostly on Microsoft technologies. He has been a Microsoft Data Platform MVP for 12 consecutive years (since 2011) for his dedication to Microsoft BI. Reza is an active blogger and co-founder of RADACAD. He is also co-founder and co-organizer of the Difinity conference in New Zealand, Power BI Summit, and Data Insight Summit.

Reza is the author of more than 14 books on Microsoft Business Intelligence, most of them published in the Power BI category. Among them are Power BI DAX Simplified, Pro Power BI Architecture, Power BI from Rookie to Rock Star, the Power Query book series, and Row-Level Security in Power BI.

He is an international speaker at Microsoft Ignite, Microsoft Business Applications Summit, Data Insight Summit, PASS Summit, SQL Saturday and SQL user groups, and he is a Microsoft Certified Trainer.

Reza’s passion is to help you find the best data solution; he is a data enthusiast.

His articles on different aspects of technologies, especially on MS BI, can be found on his blog: https://radacad.com/blog.
