Efficient Coding in Data Science: Easy Debugging of Pandas Chained Operations | by Marcin Kozak | Nov, 2023


How to inspect Pandas data frames in chained operations without breaking the chain into separate statements

Debugging chained Pandas operations without breaking the chain is possible. Photo by Miltiadis Fragkidis on Unsplash

Debugging lies at the heart of programming, as I have argued in an earlier article.

This statement is quite general and language- and framework-independent. When you use Python for data analysis, you need to debug code irrespective of whether you’re conducting complex data analysis, writing an ML software product, or creating a Streamlit or Django app.

This article discusses debugging Pandas code, or rather one specific scenario: debugging operations that are chained into a pipe. Such debugging can be challenging. When you don’t know how to do it, chained Pandas operations seem far more difficult to debug than regular Pandas code, that is, individual Pandas operations using typical assignment with square brackets.

To debug regular Pandas code using typical assignment with square brackets, it’s enough to add a Python breakpoint and use the pdb interactive debugger, for example:

>>> import pandas as pd
>>> d = pd.DataFrame(dict(
...     x=[1, 2, 2, 3, 4],
...     y=[.2, .34, 2.3, .11, .101],
...     group=["a", "a", "b", "b", "b"]
... ))
>>> d["xy"] = d.x + d.y
>>> breakpoint()
>>> d = d[d.group == "a"]

Unfortunately, you can’t do that when the code consists of chained operations, like here:

>>> d = d.assign(xy=lambda df: df.x + df.y).query("group == 'a'")

or, depending on your preference, here:

>>> d = d.assign(xy=d.x + d.y).query("group == 'a'")

In this case, there is no place to stop and inspect the intermediate data frame: you can only do so before or after the chain. Thus, one of the solutions is to break the main chain into two sub-chains (two pipes) in a…
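
As a minimal sketch (not necessarily the author’s own code), the snippet below shows two ways to look inside the chain from the earlier examples. The first splits the chain into two sub-chains, leaving room for a breakpoint between them. The second keeps the chain intact and passes the intermediate data frame through a helper via `DataFrame.pipe`. The helper name `peek`, its `label` parameter, and the printed summary are assumptions made for illustration only.

```python
import pandas as pd

d = pd.DataFrame(dict(
    x=[1, 2, 2, 3, 4],
    y=[.2, .34, 2.3, .11, .101],
    group=["a", "a", "b", "b", "b"],
))

# Option 1: split the chain into two sub-chains and inspect in between.
first_part = d.assign(xy=lambda df: df.x + df.y)
# breakpoint()  # uncomment to inspect first_part in pdb
split_result = first_part.query("group == 'a'")


def peek(df, label=""):
    """Hypothetical helper: print a short summary, return the frame unchanged."""
    print(f"[{label}] shape={df.shape}, columns={list(df.columns)}")
    return df


# Option 2: keep the chain intact and inspect through DataFrame.pipe.
piped_result = (
    d.assign(xy=lambda df: df.x + df.y)
    .pipe(peek, label="after assign")
    .query("group == 'a'")
)
```

Because `peek` returns the data frame it receives, removing the `.pipe(...)` line leaves the chain’s result unchanged. Replacing the `print` call with `breakpoint()` would presumably drop you into pdb at that step instead.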
