Optimization of Nonlinear Functions via Piecewise Linearization | by Tim Forster

Let us consider a nonlinear function f(x), where x is a continuous variable. We would like to find the minimum value of this function f(x) by changing our decision variable x. The optimization problem can mathematically be formulated in the following manner:

In most cases, the user comes across some constraints that need to be considered. That might for example be a minimum required temperature in a room, a constraint on how much pressure is put on a material, or the minimum amount of time you want/need to wait until you drink your next coffee.

These constraints come either in the form of equalities or inequalities. For the sake of simplicity, we will assume that we have only inequality constraints, described by g(x)≤0. We can therefore add this constraint to the optimization problem as as follows:

If the functions (f(x) and g(x), or only one of them) are nonlinear, we would need to use nonlinear solvers. We try to avoid this, which is where a linearization step comes into play. So let’s pause our discussion about optimization and first look at a technique that allows us to approximate a nonlinear function.

To visualize and better understand this part, consider a logarithmic function f(x)=log(x) in the range of x=[1,6]:

# Get some packages
import numpy as np
import matplotlib.pyplot as plt# Create nonlinear function
def f(x):
return np.log(x)
# Define lower and upper values and an x-range
xlb = 1
xub = 6
numx = 100
x = np.linspace(xlb, xub, numx)
y = f(x)
# Plot the function
plt.figure()
plt.plot(x, y, color='blue', linestyle='--', label='Nonlinear f(x)')
plt.xlabel('x')
plt.ylabel('f(x)')
plt.legend(frameon=False, loc='lower right')
plt.show()

With this piece of code, we can produce the following plot:

**Figure 1.** Logarithmic function as a nonlinear example.

Now, we start with the definition of a linear function, which has the following general form (with slope a and an intersection b):

Functions are here to describe something. That could for example be a phyical behaviour or system. This system can probably be sampled, so we can choose our x and can observe what our f(x) is (independent of the fact if the system is linear or nonlinear). Example. If we cook Spaghetti in a pot of water, we could wait 1 minute (x1) and observe how well our Spaghetti are cooked (f(x1)). We can wait another 5 minutes (x2) and check again how well the Spaghetti are done (f(x2)). I guess you know what I mean. 😄 🍝

If we have an environment where we get some sample points out of it, we can use them to calculate the corresponding slope a and the intersection b of the line that connects those two points. Let’s say we have 2 points, f(x1) and f(x2). This means, since the linear model should hold in both points, we know that these two equations hold in both points as well:

We have two unknowns (a and b) and two equations, so we can solve for a and b:

Using these expressions, we can try to approximate our considered function f(x)=log(x) in the range of x=[1,6].

Let’s do that in Python. First, we take the lower and upper bounds as our two points to define the segment (remember, Python starts indexing at zero, so don’t be confused if it says “segment 0”, which is just our first considered segment):

num_segments = 1
x_segments = np.linspace(xlb, xub, num_segments+1)
for i in range(num_segments):
print(f'[+] Segment {i} goes from x={x_segments[i]:.2f} to x={x_segments[i+1]:.2f}.')

[+] Segment 0 goes from x=1.00 to x=6.00.

Then, we write a function that returns the slope and the intersection given two supporting points (sometimes also called “nodes”), namely x=[x1,x2] and y=[f(x1),f(x2)]:

def calc_slope_and_intersection(x, y):
slope = (y[1:] - y[:-1])/(x[1:] - x[:-1])
intersection = y[:-1] - slope*x[:-1]
return float(slope), float(intersection)

We then calculate the slopes and intersections and store the values in lists:

slope, intersection = [], []
for i in range(num_segments):
x_segment = x_segments[i:i+2]   # Get the x values for the segment
y_segment = f(x_segment)        # Sample from nonlinear environment
slope_, intersection_ = calc_slope_and_intersection(x_segment, y_segment)
slope.append(slope_)
intersection.append(intersection_)
print(f'[+] Linear function of segment {i}: x*{slope[i]:.2f}+({intersection[i]:.2f}).')

[+] Linear function of segment 0: x*0.36+(-0.36).

The slope is calculated to be a=0.36, where the intersection has the same value of b=0.36. This brings us to the following linear equation as approximation:

We can plot this together with the original logarithmic function:

# Function that creates a linear function from slope and intersection
def get_linear(slope, intersection, xlb, xub):
x_ = np.array([xlb, xub])
return slope*x_ + intersection# Plot the linear functions
plt.figure()
plt.plot(x, y, color='blue', linestyle='--', label='Nonlinear f(x)') # Nonlinear function
for i in range(num_segments):
x_segment = x_segments[i:i+2]
y_segment = get_linear(slope[i], intersection[i], x_segment[0], x_segment[1])
plt.plot(x_segment, y_segment, color='orange', linestyle='-', label='Linear segment' if i==0 else '__nolabel__')    # Linear segments
plt.plot(x_segment, y_segment, color='black', linestyle='', marker='x', label='Nodes' if i==0 else '__nolabel__')   # Nodes
plt.xlabel('x')
plt.ylabel('f(x)')
plt.legend(frameon=False, loc='lower right')
plt.show()

**Figure 2.** Linear approximation (solid orange line) of a log-function.

Well. That’s not very accurate. But it’s linear, and it goes through the two nodes we used. So in essence, it does what we want. Now we can sample the function a bit more often (use more of these nodes), which brings us to the piecewise linearization technique.

Let us split the entire x-range into n segments (i=1,2,…,n), and perform the above-shown linearization of the function f(x) in each segment. In the equations below, the notation N describes the set of these segments, meaning N={1,2,…,n}. For each linear function, we need its individual slope and intersection:

Since we can define a number of desired values for x and then sample the corresponding values for f(x) from our (unknown) function, we can again calculate the slopes and intersections. Let us therefore define some more x-values for the different segments (the nodes). Say we use n=3.

The same code snippets from above can be used, just adjust the variable num_segments to 3. The x-ranges of the segments and their equations are given as follows:

[+] Segment 0 goes from x=1.00 to x=2.67.
[+] Segment 1 goes from x=2.67 to x=4.33.
[+] Segment 2 goes from x=4.33 to x=6.00.[+] Linear function of segment 0: x*0.59+(-0.59).
[+] Linear function of segment 1: x*0.29+(0.20).
[+] Linear function of segment 2: x*0.20+(0.62).

**Figure 3.** Linear approximation (solid orange line) of a log-function (dashed blue line) using 3 segments.

Looks a bit better. We could further increase the number of segments, but living at limits is not always what we want, so let’s stick to n=3 for the time being. 😎

With this, we identified a possible approximation of the nonlinear function f(x). Let’s go back and replace this nonlinearity in our original optimization problem.

Again, the goal is to find the minimum of the function f(x), considering that it has to be above a given specification K. We can formulate this as shown above, where our inequality constraint g(x)≤0 is basically K-f(x)≤0:

Since we have linearized our function in segments, all segments the optimization problem have to be considered. This is possible by creating another variable z for each segment. This variable should be binary (either 0 or 1). It is 0 if the segment is inactive, and 1 if the segment is active (meaning our minimum solution we are looking for is located in this segment).

We can now sum up all the linear functions of the segments and multiply them with the corresponding z variable.

We also consider that our optimal solution can only be in one segment at the time. In other words, only one segment can be active, meaning the sum of all z variables has to be 1:

Or in a slightly different form:

Well, it looks like we did not a very good job. This problem formulation is now nonlinear (due to the multiplication of x with z in the objective function and the constraint) and even has integer variables (z).

However, there is a simple trick that allows us rewriting such a “pseudo bilinear term” (which is a nonlinearity). First, we define an auxiliary variable x-tilde, which is the multiplication of the continuous variable (x) with the binary variable (z):

Next, we need to define the domain in which x-tilde can be valid for the cases when z=0 (inactive segment) and z=1 (active segment). If the segment is inactive, our auxiliary variable needs to be zero. We achieve this by the following inequalities:

Where xmin and xmax are the lower- and upper “nodes” which we calculated above. You can easily verify that if z is zero, x-tilde is forced to be zero as well.

In addition, if z=1, the auxilary variable x-tilde should be bounded by the “true” continuous variable x:

It is worth to mention here that xub is the upper bound of the continuous variable x (not the ones of the intermediate nodes). With this, we can rewrite our optimization problem from above: