Renaming multiple Column Names in a single step using a PySpark Notebook – FourMoo | Power BI


Following on from my previous blog post this blog post I’m going to demonstrate how to bulk rename column names in a single step instead of having to rename them individually.

The reason this came about is because I had a set of data where the column names had the square brackets which I wanted to remove.

As shown below I have highlighted 2 column names with the square brackets.

I found that it was easier to replace the column names one at a time, instead of trying to do it all at once. This also allowed me to troubleshoot to make sure I was getting the expected results.

Below is the code I used to replace the square backet with a blank value.

# Bulk Rename column Names replacing "]" with ""
for name in df.schema.names:
  df = df.withColumnRenamed(name, name.replace(']', ''))

#See Results
display(df)

If you want to replace your characters you just replace the ‘]’ with what you are looking to replace.

As shown below you can now see that the right square bracket has been removed.

I then replaced the left square bracket, but I also included the prefix in my data to also be replaced so that the final column name would be more meaningful.

Code for renaming the left square bracket

# Bulk Rename column Names replacing "Table[" with ""
for name in df.schema.names:
  df = df.withColumnRenamed(name, name.replace('Table[', ''))

#See Results
display(df)

Here is the result with the column renamed where it does not have the left or right square brackets.

In this blog post, I have shown a quick and effective way to rename multiple columns in my fabric notebook using the PySpark.

I hope you found this useful, it sure did save me a lot of time and effort.

Thanks for reading!



Source link

Be the first to comment

Leave a Reply

Your email address will not be published.


*