Tips and Tricks to Organize Jupyter Notebook Visualizations | by Matthew Andres Moreno | Jan, 2024


Optimize your data science workflow by automating matplotlib output — with 1 line of code. Here’s how.

Naming things is hard. After a long enough day, we’ve all ended up with the highly-descriptive likes of “graph7(1)_FINAL(2).png” and “output.pdf.” Look familiar?

We can do better — and quite easily, actually.

When we use data-oriented “seaborn-esque” plotting mechanisms, the ingredients for a descriptive filename are all there. A typical call looks like this:

sns.scatterplot(data=tips, x="total_bill", y="tip", hue="time")

Right there we know we’ve got “total_bill” on the x axis, “time” color coded, etc. So what if we used the plotting function name and those semantic column keys to organize the output for us?

Here’s what that workflow looks like, using the teeplot tool.

import seaborn as sns; from teeplot import teeplot as tp
tp.save = {".eps": True, ".pdf": True} # set custom output behavior
tp.tee(sns.scatterplot,
    data=sns.load_dataset("tips"), x="total_bill", y="tip", hue="time")

teeplots/hue=time+viz=scatterplot+x=total-bill+y=tip+ext=.eps
teeplots/hue=time+viz=scatterplot+x=total-bill+y=tip+ext=.pdf

We’ve actually done three things in this example: 1) rendered the plot in the notebook, 2) saved the visualization to file with a meaningful filename, and 3) hooked the visualization into a framework where notebook outputs can be managed at a global level (in this case, enabling eps/pdf output).
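To make the naming idea concrete, here’s a minimal sketch of how filenames like the ones above could be derived from a plotting call. This is an illustration, not teeplot’s actual implementation: the helper name sketch_filename and its rules (keep only string-valued keyword arguments, sort the keys, swap underscores for hyphens) are assumptions inferred from the example output.

```python
def sketch_filename(plotter, ext, **kwargs):
    """Build a descriptive filename from a plotter and its keyword arguments.

    Hypothetical helper for illustration only; not part of teeplot.
    """
    # Keep only string-valued arguments (column names such as "total_bill");
    # objects like DataFrames or tuples have no useful short text form.
    attrs = {k: v for k, v in kwargs.items() if isinstance(v, str)}
    attrs["viz"] = plotter.__name__  # record which plotting function was used
    # Sort keys for a stable ordering and replace "_" with "-" in values.
    parts = [f"{k}={v.replace('_', '-')}" for k, v in sorted(attrs.items())]
    return "+".join(parts) + f"+ext={ext}"


def scatterplot(**kwargs):
    """Stand-in for sns.scatterplot so the sketch has no dependencies."""


print(sketch_filename(
    scatterplot, ".pdf", data=object(), x="total_bill", y="tip", hue="time"
))
# hue=time+viz=scatterplot+x=total-bill+y=tip+ext=.pdf
```

Sorting the keys means the same call always maps to the same filename, no matter what order the arguments were written in.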

This article will explain how to harness the teeplot Python package to get better organized and free up your mental workload to focus on more interesting things.

I am the primary author and maintainer of the project, which I have used in my own workflow for several years and found useful enough to package and share more widely with the community. teeplot is open source under the MIT license.

teeplot is designed to simplify work with data visualizations created with libraries like matplotlib, seaborn, and pandas. It acts as a wrapper around your plotting calls to handle output management for you.

Figure: teeplot workflow in action.

Here’s how to use teeplot in 3 steps:

  1. Choose Your Plotting Function: Start by selecting your preferred plotting function, whether it’s from matplotlib, seaborn, pandas, etc. or one you wrote yourself.
  2. Add Your Plotting Arguments: Pass your plotting function as the first argument to tee, followed by the arguments you want to use for your visualization.
  3. Automatic Plotting and Saving: teeplot captures your plotting function and its arguments, executes the plot, and then takes care of wrangling the plot outputs for you.

That’s it!
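For the curious, the three steps above can be sketched as a small wrapper around matplotlib. This is a rough approximation under stated assumptions, not teeplot’s real code: sketch_tee, SAVE_FORMATS, and the outdir parameter are hypothetical names, and the filename rule is inferred from the example output shown earlier.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt

# Hypothetical global switchboard of output formats (an assumption).
SAVE_FORMATS = {".pdf": True, ".png": True}


def sketch_tee(plotter, *args, outdir="teeplots", **kwargs):
    """Call plotter, then save the current figure once per enabled format."""
    result = plotter(*args, **kwargs)  # execute the plot with forwarded args
    fig = plt.gcf()  # the figure the plotter just drew on

    # Derive a descriptive stem from the string-valued keyword arguments.
    attrs = {k: v for k, v in kwargs.items() if isinstance(v, str)}
    attrs["viz"] = plotter.__name__
    stem = "+".join(
        f"{k}={v.replace('_', '-')}" for k, v in sorted(attrs.items())
    )

    os.makedirs(outdir, exist_ok=True)
    paths = []
    for ext, enabled in SAVE_FORMATS.items():
        if enabled:
            path = os.path.join(outdir, f"{stem}+ext={ext}")
            fig.savefig(path, format=ext.lstrip("."))
            paths.append(path)
    return result, paths


def lineplot(xs, ys, label="demo"):
    """A tiny custom plotting function to pass through the wrapper."""
    plt.plot(xs, ys, label=label)


_, saved = sketch_tee(
    lineplot, [0, 1, 2], [0, 1, 4], label="squares", outdir=tempfile.mkdtemp()
)
print([os.path.basename(p) for p in saved])
# ['label=squares+viz=lineplot+ext=.pdf', 'label=squares+viz=lineplot+ext=.png']
```

Because the wrapper only needs a callable and its arguments, the same pattern works for library plotters and for functions you wrote yourself.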

Next, let’s look at 3 brief examples that demonstrate: a) basic use, b) custom post-processing, and c) custom plotting functions.

In this example, we pass a DataFrame df’s member function df.plot.box as our plotter and two semantic keys: “age” and “gender.” teeplot takes care of the rest.

# adapted from pandas.pydata.org/docs/reference/api/pandas.DataFrame.plot.box.html
import pandas as pd; from teeplot import teeplot as tp

age_list = [8, 10, 12, 14, 72, 74, 76, 78, 20, 25, 30, 35, 60, 85]
df = pd.DataFrame({"gender": list("MMMMMMMMFFFFFF"), "age": age_list})

tp.tee(df.plot.box, # plotter...
    column="age", by="gender", figsize=(4, 3)) # ...forwarded kwargs

teeplots/by=gender+column=age+viz=box+ext=.pdf
teeplots/by=gender+column=age+viz=box+ext=.png


