How to get Sempy (Semantic-link) to run when being triggered from a data pipeline which runs a Notebook in Fabric – FourMoo | Power BI


When I tried to run a notebook via a data pipeline, it failed.

Below are the steps I took to get it working.

This was the error message I got:

Notebook execution failed at Notebook service with http status code – ‘200’, please check the Run logs on Notebook, additional details – ‘Error name – MagicUsageError, Error value – %pip magic command is disabled.’ :

I first had to make sure that, in Fabric (the Power BI Service), the persona was set to Data Engineering.

I then clicked on Environment to create a new environment.

I then gave my new Environment a name and clicked Create.

I then needed to add Semantic-Link (Sempy) to it.

  1. I first clicked on Public Libraries.
  2. I then clicked on “Add from PyPI”
  3. Finally, in the library box I typed in “semantic-link”, which automatically selected the latest version.

I then clicked on Save and Publish.

It first saved and then asked me to confirm the pending changes before publishing, so I clicked on Publish all.

I then clicked on Publish in the next screen prompt.

I then clicked on View Progress to view the progress of the publish.

NOTE: This does take some time to complete so please be patient!

Once it completed, I could see my Environment in my Fabric Workspace.

I then went into my Notebook and once it opened I clicked on Environment and changed it to my Environment “FourMoo_Sempy” as shown below.

I then got confirmation of the environment change.
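Because %pip is disabled when the notebook runs from a pipeline, the package has to come from the attached Environment. A minimal sketch of how one might confirm that from inside the notebook is shown here. The helper names are hypothetical, and it assumes the import name is "sempy" while the PyPI distribution name is "semantic-link", as typed into the Environment above.

```python
import importlib.util
from importlib import metadata


def is_importable(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


def installed_version(distribution_name: str):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution_name)
    except metadata.PackageNotFoundError:
        return None


# Assumption: "sempy" is the import name, "semantic-link" the PyPI name.
if is_importable("sempy"):
    print("sempy is available, version:", installed_version("semantic-link"))
else:
    print("sempy is missing: attach the published Environment to this notebook")
```

Running this as the first cell gives a clear message in the run logs if the notebook was started without the Environment, rather than a failure further down.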

In the first part of the code I then needed to load sempy using the code below.

# Load the Semantic Link (sempy) extension; the package itself is installed by the Environment
%load_ext sempy

In my Notebook I am querying data from a semantic model and outputting it to a table called “Sales_Extract”
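The query further down passes its group-by column as a DAX column reference, 'Date'[Yr-Mth]. As a rough sketch, a small helper can build such references so the quoting stays consistent. The function name dax_column is hypothetical, and the escaping follows general DAX rules (doubled single quote in table names, doubled closing bracket in column names); whether sempy needs the escaped form for unusual names is an assumption I have not verified.

```python
def dax_column(table: str, column: str) -> str:
    """Build a fully qualified DAX column reference such as 'Date'[Yr-Mth]."""
    # Inside a quoted table name, a single quote is escaped by doubling it.
    safe_table = table.replace("'", "''")
    # Inside square brackets, a closing bracket is escaped by doubling it.
    safe_column = column.replace("]", "]]")
    return f"'{safe_table}'[{safe_column}]"


# Same column reference as the one used in the query in this post.
groupby_columns = [dax_column("Date", "Yr-Mth")]
print(groupby_columns)
```

The resulting list can be passed as the groupby_columns argument in place of the hand-written string.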

# Import the Semantic Link (sempy) fabric module used below
import sempy.fabric as fabric

# Get the Power BI Workspace and Dataset

# Workspace name
ws = "PPU Space Testing"
# Dataset name
ds = "WWI Sales - Azure SQL Source - PPU - 4 Years - 2 Days"

# Reference: https://learn.microsoft.com/en-us/python/api/semantic-link-sempy/sempy.fabric?view=semantic-link-python#sempy-fabric-evaluate-measure
# Evaluate the "Sales" measure grouped by year-month
df = fabric.evaluate_measure(
    workspace=ws,
    dataset=ds,
    groupby_columns=["'Date'[Yr-Mth]"],
    measure="Sales",
)

# Convert to Spark DataFrame
sparkDF = spark.createDataFrame(df)
sparkDF.show()

# Table name
table_name = "Sales_Extract"

# Write to table (append mode adds rows on every run)
sparkDF.write.mode("append").format("delta").save("Tables/" + table_name)
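One thing that may trip up the write step with other semantic models: column names coming back from a model often contain spaces or brackets, and Delta rejects the characters space, comma, semicolon, braces, parentheses, newline, tab and equals sign in column names unless column mapping is enabled. The sketch below is one possible way to clean names before writing. The function name delta_safe and the sample column names are hypothetical and not taken from my model.

```python
import re

# Characters Delta rejects in column names unless column mapping is enabled.
_INVALID = re.compile(r"[ ,;{}()\n\t=]")


def delta_safe(name: str) -> str:
    """Replace each character that Delta rejects in column names with an underscore."""
    return _INVALID.sub("_", name)


# Hypothetical column names, as a semantic model might return them.
columns = ["Yr-Mth", "Sales", "Calendar Year", "Sales (LY)"]
print([delta_safe(c) for c in columns])
```

In the notebook this could be applied with df.columns = [delta_safe(c) for c in df.columns] before the DataFrame is converted to Spark.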

Here is the table shown when testing to make sure that the notebook has run successfully.

In my data pipeline I used the Notebook activity and configured it to use the notebook I created in the previous steps.

I then tested running my data pipeline and it ran successfully as shown below.

I then confirmed this in my Lakehouse table as shown below.

One additional item to show: if I wanted this Environment to be the default in my App Workspace, I would do it by going into my Workspace settings.

I then did the following to change the default Environment.

  1. I expanded “Data Engineering/Science”
  2. I then clicked on “Spark settings”
  3. Next, I clicked on “Environment”
  4. The next step was to enable the option to set the default environment.
  5. Finally, I selected my Environment, “FourMoo_Sempy”.

In this blog post I have shown how I created an Environment that allows me to run the Sempy (Semantic-link) Python package when the notebook is run from a data pipeline.

I hope you found this useful. Any comments or suggestions are most welcome.


