Why Did Microsoft Rename Datasets to Semantic Models?

In this article I’m going to talk about the recent change that Microsoft has made in renaming the underlying database used in Power BI from the term “dataset” to “semantic model” (or as I would call it, “semantic data model”).


The Word “Semantic”

Let me start by talking about the word “semantic”. This is probably a word you’ve heard before in a different context.  Often when people use different words to say the same thing, the difference in language is often referred to as “semantics”.  “We’re saying the same thing, it’s only semantics”.

Semantic Def

OK, so how does that relate to Power BI, I hear you ask. Well to explain that, I need to first talk about the difference between model based reporting tools and report based reporting tools.

Power BI is a Model Based Tool

Unlike many other BI tools on the market, Power BI is a “model based” tool. If you compare Power BI with SQL Server Reporting Services (SSRS) for example, SSRS first requires you to load the data needed for your specific SSRS report and then build the report off that subset of loaded data; a dataset if you will.  Dataset is a great name for the data used in an SSRS report because SSRS is a report based tool.  Power BI on the other hand encourages you to first build a model and then create one or many reports on top of that model. You are encouraged to load a superset of data you need and then build reusable assets and logic into the tool, such as business relationships between your business entities (e.g. Customers, Products, Sales), write reusable formulas (Measures) to calculate business metrics, and anything else that you will need to build a report for today but also for reports in the future. With Power BI, you build a reusable data model; build once use many times. For this reason, the term dataset was never a very good name for the underlying database in Power BI.

Power BI is not just any bog standard database; it is a tabular data base, and tabular databases are not the same as traditional databases such as SQL Server.

Tabular Databases Are Complex


Tabular databases are a relatively new class of database that have only been around since about 2006 or so. The Microsoft, in-memory tabular database is called Vertipaq.  When Vertipaq is deployed as part of Microsoft Excel, it is referred to as Power Pivot for Excel.  When it’s used in SQL Server Analysis Services or Power BI, it’s often referred to as Vertipaq, xVelocity, or sometimes simply SSAS Tabular.  You can read a bit of the early history of project Gemini here.  There is also a great podcast featuring Amir Netz (talking to Kasper de Jonge) if you would like to watch that for a fascinating back story.

Unlike a traditional database (eg SQL Server), if you were to peek inside the Vertipaq database to try and look at the stored data, you would not see anything that made any sense to a human. The Vertipaq tabular database has all sorts of storage tricks to improve data compression, data retrieval speed as well as other database and reporting optimisations. One of the consequences of the way data is stored in a tabular database is that mortal humans like you and I cannot possibly know how to load and interact with the data, let alone understand how it works.  Microsoft realised it was not going to be possible for end users to interact with the underlying database and hence it developed a semantic data modelling layer for Power BI from the very beginning.

What is a Semantic Data Layer?

A good analogy for a semantic data layer can be found in Power Query.  When a user goes into Power Query, the user is able to leverage a graphical user interface (GUI) via the inbuilt menu system to complete most, if not all data transformation tasks required.  The end user doesn’t need any understanding of the M programming language to be able to create Power Query queries because the GUI/menu system separates the complexities of the M Language from the people that need to use it.

In a similar sense, the semantic data layer is a method of allowing end users to “model” data loaded into Power BI without the need to understand the technicalities of the underlying Vertipaq database.  The semantic data model ‘hides’ the technical details allowing the user to add business ‘semantic’ concepts to build the model using a simple modelling UI (the model view).  Sales are related to Customers, and Products, and Time; Sales are calculated by adding the total of the [Sales Amount] column, etc.  The Power BI modeller can use the semantic data layer to build the Vertipaq database using familiar (semantic) concepts such as tables, relationships and formulas, and yet at the same time end up with a highly performant tabular database that would otherwise be out of reach for all but the most technically competent user.

Semantic Layer

The Semantic (Data) Model is a Great Name

I think the semantic model is a great name for the underlying database that exists in Power BI Desktop (and also online, once it is loaded into the Power BI service). It creates awareness that this is not just a simple set of data (dataset).   I also believe that this better name will encourage a new age of data analysts to dig more deeply into the technology and learn at a level that they otherwise may not have been motivated to pursue.

Microsoft has stated that it has renamed the dataset to the semantic model as part of is build out of Microsoft Fabric.


Of course, this is true, but for the reasons I have covered above, I am sure the semantic model, or semantic data model, is a much better name, anyhow.

About Matt

Matt Allington has more than 10 years’ experience working with Power BI and component technologies.  In July 2023, Matt’s business was acquired by Agile Analytics.  Matt and the team are available to help you with your consulting projects and training needs, no matter how big or small.  Feel free to make contact if you need some expert help.

Source link

Be the first to comment

Leave a Reply

Your email address will not be published.