Data Derivatives in Power BI: A Python-based Inter…


Screenshot 2023-06-30 at 19.31.16.png

This article focuses on exploring the use of interpolation and approximation of data sets in Power BI utilizing Python.

Given a data set, as depicted in the following figure:

 

x = np.array([15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75])
y1 = np.array([243, 265, 287, 313, 343, 373, 401, 435, 475, 519, 571, 630, 699])
y2 = np.array([244, 269, 291, 320, 350, 379, 406, 442, 482, 526, 578, 633, 703])
y3 = np.array([371, 379, 395, 411, 427, 445, 463, 485, 509, 533, 555, 583, 615])
y4 = np.array([374, 382, 399, 416, 433, 457, 482, 509, 533, 561, 587, 615, 645])
y5 = np.array([370, 382, 390, 398, 408, 420, 432, 446, 460, 476, 494, 514, 538])
y6 = np.array([370, 388, 398, 410, 424, 438, 452, 466, 482, 500, 518, 540, 564])
y7 = np.array([348, 378, 408, 454, 508, 566, 638, 720, 0  , 0  , 0  , 0  , 0  ])

 

 

An important aspect to be considered is that a portion of our y7 data contains zeroes, which will be crucial to bear in mind during further calculations.

The aim is to construct derivatives from the data for all values of x and y, inclusive of y2, y3, etc. To facilitate this, we commence by building an interpolation based on these data, from which we will then derive.

 

 

import numpy as np
import matplotlib.pyplot as plt
from scipy import interpolate


x = np.array([15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75])
y1 = np.array([243, 265, 287, 313, 343, 373, 401, 435, 475, 519, 571, 630, 699])
y2 = np.array([244, 269, 291, 320, 350, 379, 406, 442, 482, 526, 578, 633, 703])
y3 = np.array([371, 379, 395, 411, 427, 445, 463, 485, 509, 533, 555, 583, 615])
y4 = np.array([374, 382, 399, 416, 433, 457, 482, 509, 533, 561, 587, 615, 645])
y5 = np.array([370, 382, 390, 398, 408, 420, 432, 446, 460, 476, 494, 514, 538])
y6 = np.array([370, 388, 398, 410, 424, 438, 452, 466, 482, 500, 518, 540, 564])
y7 = np.array([348, 378, 408, 454, 508, 566, 638, 720, 0  , 0  , 0  , 0  , 0  ])
y8 = np.array([351, 388, 430, 478, 528, 588, 660, 740, 0  , 0  , 0  , 0  , 0  ])
y9 = np.array([358, 378, 400, 424, 463, 502, 538, 584, 628, 680, 0  , 0  , 0  ])
y10 = np.array([363, 383, 402, 430, 469, 507, 545, 590, 633, 685, 0  , 0  , 0  ])
y11 = np.array([243, 269, 295, 320, 350, 383, 416, 453, 497, 545, 593, 662, 722])
y12 = np.array([243, 264, 287, 313, 339, 372, 405, 445, 486, 534, 582, 649, 711])

y_arr = np.concatenate((y1,y2,y3,y4,y5,y6,y7,y8,y9,y10,y11,y12), axis = 0)
y_arr = np.reshape(y_arr, (12, 13))

i = 1

for y_list in y_arr:
    x = np.array([15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75])
    y_list = y_list[y_list > 0]
    
    x = np.resize(x, len(y_list))
    
    minx = np.min(x)
    maxx = np.max(x)
        
    
    tck = interpolate.splrep(x, y_list, s=0)

    xnew = np.arange(minx, maxx, 0.5)
    ynew = interpolate.splev(xnew, tck, der=0)
    yder2 = interpolate.splev(xnew, tck, der=1)
    
    fig, ax1 = plt.subplots()

    ax2 = ax1.twinx()
  
    ax1.plot(x, y_list, 'o', xnew, ynew, x, y_list, 'b')
    plt.title('Cubic-spline interpolation for ' + str(i) + ' line')

    ax2.plot(xnew, yder2)

    plt.show()
    i = i+1

 

 

 

Following the implementation of the aforementioned method, we are presented with the subsequent graph. This illustrates the original data points along with the derivative values:

technolog_3-1688143298483.png

An observation of the derivative reveals a considerable irregularity, which is primarily due to the non-uniform arrangement of data points. This irregularity may impose challenges for any conclusions to be drawn from these data in the future.

Hence, we aim to improve the initial code to perform data approximation instead of interpolation, and then compute the derivative from the resulting function.

 

 

import scipy as sp
import numpy as np
import matplotlib.pyplot as plt
from scipy import interpolate

x = np.array([15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75])
y = np.array([243, 265, 287, 313, 343, 373, 401, 435, 475, 519, 571, 630, 699])
y2 = np.array([244, 269, 291, 320, 350, 379, 406, 442, 482, 526, 578, 633, 703])
y3 = np.array([371, 379, 395, 411, 427, 445, 463, 485, 509, 533, 555, 583, 615])
y4 = np.array([374, 382, 399, 416, 433, 457, 482, 509, 533, 561, 587, 615, 645])
y5 = np.array([370, 382, 390, 398, 408, 420, 432, 446, 460, 476, 494, 514, 538])
y6 = np.array([370, 388, 398, 410, 424, 438, 452, 466, 482, 500, 518, 540, 564])
y7 = np.array([348, 378, 408, 454, 508, 566, 638, 720, 0  , 0  , 0  , 0  , 0  ])
y8 = np.array([351, 388, 430, 478, 528, 588, 660, 740, 0  , 0  , 0  , 0  , 0  ])
y9 = np.array([358, 378, 400, 424, 463, 502, 538, 584, 628, 680, 0  , 0  , 0  ])
y10 = np.array([363, 383, 402, 430, 469, 507, 545, 590, 633, 685, 0  , 0  , 0  ])
y11 = np.array([243, 269, 295, 320, 350, 383, 416, 453, 497, 545, 593, 662, 722])
y12 = np.array([243, 264, 287, 313, 339, 372, 405, 445, 486, 534, 582, 649, 711])

y_arr = np.concatenate((y,y2,y3,y4,y5,y6,y7,y8,y9,y10,y11,y12), axis = 0)
y_arr = np.reshape(y_arr, (12, 13))

i = 1

for y_list in y_arr:
    x = np.array([15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75])
    y = y_list
    
    y = y[y > 0]
    
    x = np.resize(x, len(y))
    
    
    minx = np.min(x)
    maxx = np.max(x)
    
    d=3 # степень полинома
    fp, residuals, rank, sv, rcond = np.polyfit(x, y, d, full=True) # Модель
    f = sp.poly1d(fp) # аппроксимирующая функция
    y1=[fp[0]*x[i]**2+fp[1]*x[i]+fp[2] for i in range(0,len(x))] # значения функции a*x**2+b*x+c
    so=round(sum([abs(y[i]-y1[i]) for i in range(0,len(x))])/(len(x)*sum(y))*100,4) # средняя ошибка
    fx = np.linspace(x[0], x[-1] + 1, len(x)) # можно установить вместо len(x) большее число для интерполяции

    y_list = f(fx)

    tck = interpolate.splrep(x, y_list, s=0)

    xnew = np.arange(minx, maxx, 0.5)
    ynew = interpolate.splev(xnew, tck, der=0)
    yder2 = interpolate.splev(xnew, tck, der=1)

    fig, ax1 = plt.subplots()

    ax2 = ax1.twinx()

    ax1.plot(x, y_list, 'o', xnew, ynew, x, y_list, 'b')
    plt.title('Approximation for ' + str(i) + ' line')
    ax2.plot(xnew, yder2)

    plt.show()
    i = i + 1

 

 

The result is displayed below:

technolog_0-1688159925403.png

As demonstrated, employing an approximation function results in a significantly smoother derivative of the function.

We proceed by examining the implementation for y8:

technolog_1-1688159958631.png

Upon inspection, we notice that excessive data along the x-axis is eliminated, with no attempt to transition to zero.

Subsequently, to continue working with the code and set up the necessary visualizations, we transfer the scripts into Power BI, as displayed below:

technolog_2-1688159992781.png

The contrast between the two approaches – interpolation and approximation – can be vividly discerned here.

 



Source link

Be the first to comment

Leave a Reply

Your email address will not be published.


*