You can use numpy.piecewise() to create the piecewise function and then fit it with curve_fit(). Here is the code:
from scipy import optimize
import matplotlib.pyplot as plt
import numpy as np
# In a Jupyter notebook, also run: %matplotlib inline
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15], dtype=float)
y = np.array([5, 7, 9, 11, 13, 15, 28.92, 42.81, 56.7, 70.59, 84.47, 98.36, 112.25, 126.14, 140.03])
def piecewise_linear(x, x0, y0, k1, k2):
    # Two lines with slopes k1 and k2 that meet at the point (x0, y0)
    return np.piecewise(x, [x < x0], [lambda x: k1*x + y0 - k1*x0, lambda x: k2*x + y0 - k2*x0])
p, e = optimize.curve_fit(piecewise_linear, x, y)
xd = np.linspace(0, 15, 100)
plt.plot(x, y, "o")
plt.plot(xd, piecewise_linear(xd, *p))
The output is a plot of the data points with the fitted two-segment line.

For a fit with N segments, see segments_fit.ipynb.
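To read the breakpoint and the two slopes off the fit, the parameter vector can be unpacked directly. The following is a minimal sketch rather than part of the original answer: the starting guess p0 is an assumption read off the data by eye, and the one-sigma uncertainties are taken from the diagonal of the covariance matrix that curve_fit returns.

```python
import numpy as np
from scipy import optimize

x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15], dtype=float)
y = np.array([5, 7, 9, 11, 13, 15, 28.92, 42.81, 56.7, 70.59,
              84.47, 98.36, 112.25, 126.14, 140.03])

def piecewise_linear(x, x0, y0, k1, k2):
    # Two lines with slopes k1 and k2 that meet at (x0, y0)
    return np.piecewise(x, [x < x0],
                        [lambda x: k1*x + y0 - k1*x0,
                         lambda x: k2*x + y0 - k2*x0])

# Assumed starting guess: break near x = 6, slopes of roughly 2 and 14
p0 = [6.0, 15.0, 2.0, 14.0]
p, cov = optimize.curve_fit(piecewise_linear, x, y, p0=p0)

x0, y0, k1, k2 = p
perr = np.sqrt(np.diag(cov))  # one-sigma uncertainties of x0, y0, k1, k2
print("breakpoint:", x0, "+/-", perr[0])
print("slopes:", k1, k2)
```

Supplying p0 is optional for this data set, but a rough guess tends to make the fit less sensitive to where the optimizer starts.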
Answer from HYRY on Stack Overflow
You can use pwlf to perform continuous piecewise linear regression in Python. This library can be installed using pip.
There are two approaches in pwlf to perform your fit:
- You can fit for a specified number of line segments.
- You can specify the x locations where the continuous piecewise lines should terminate.
Let's go with approach 1, since it is easier and will find the 'gradient change point' that you are interested in.
The data show two distinct regions, so it makes sense to look for the best continuous piecewise line made of two segments.
import numpy as np
import matplotlib.pyplot as plt
import pwlf
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15])
y = np.array([5, 7, 9, 11, 13, 15, 28.92, 42.81, 56.7, 70.59,
              84.47, 98.36, 112.25, 126.14, 140.03])
my_pwlf = pwlf.PiecewiseLinFit(x, y)
breaks = my_pwlf.fit(2)
print(breaks)
[ 1. 5.99819559 15. ]
The first line segment spans [1., 5.99819559] and the second spans [5.99819559, 15.]. The gradient change point you asked for is therefore 5.99819559.
We can plot these results using the predict function.
x_hat = np.linspace(x.min(), x.max(), 100)
y_hat = my_pwlf.predict(x_hat)
plt.figure()
plt.plot(x, y, 'o')
plt.plot(x_hat, y_hat, '-')
plt.show()
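For the second approach listed above (breakpoint locations fixed in advance), the fit reduces to ordinary linear least squares. The sketch below illustrates that idea with numpy only; it is not pwlf's API, and fit_fixed_breaks is a hypothetical helper name chosen for this example.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15], dtype=float)
y = np.array([5, 7, 9, 11, 13, 15, 28.92, 42.81, 56.7, 70.59,
              84.47, 98.36, 112.25, 126.14, 140.03])

def fit_fixed_breaks(x, y, breaks):
    """Continuous piecewise-linear least squares with fixed interior breakpoints."""
    # Design matrix columns: intercept, x, and one hinge max(x - b, 0) per breakpoint
    hinges = [np.maximum(x - b, 0.0) for b in breaks]
    A = np.column_stack([np.ones_like(x), x] + hinges)
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    # Each hinge coefficient is the change in slope at its breakpoint
    slopes = coef[1] + np.concatenate([[0.0], np.cumsum(coef[2:])])
    return coef, slopes

coef, slopes = fit_fixed_breaks(x, y, breaks=[6.0])
print(slopes)  # slope of each segment, left to right
```

Because the hinge terms are zero to the left of their breakpoint, the fitted line is continuous by construction.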

numpy.piecewise can do this.
piecewise(x, condlist, funclist, *args, **kw)
Evaluate a piecewise-defined function.
Given a set of conditions and corresponding functions, evaluate each function on the input data wherever its condition is true.
An example is given on SO here. For completeness, here is an example:
from scipy import optimize
import matplotlib.pyplot as plt
import numpy as np
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15], dtype=float)
y = np.array([5, 7, 9, 11, 13, 15, 28.92, 42.81, 56.7, 70.59, 84.47, 98.36, 112.25, 126.14, 140.03])
def piecewise_linear(x, x0, y0, k1, k2):
    return np.piecewise(x, [x < x0, x >= x0], [lambda x: k1*x + y0 - k1*x0, lambda x: k2*x + y0 - k2*x0])
p, e = optimize.curve_fit(piecewise_linear, x, y)
xd = np.linspace(0, 15, 100)
plt.plot(x, y, "o")
plt.plot(xd, piecewise_linear(xd, *p))
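To see what np.piecewise does on its own, independent of the fitting, here is a small sketch with made-up conditions and functions: each function is applied only to the elements where its condition is true.

```python
import numpy as np

x = np.linspace(-2.0, 2.0, 5)  # [-2., -1., 0., 1., 2.]

# Where x < 0 use -v; where x >= 0 use 2*v
y = np.piecewise(x, [x < 0, x >= 0], [lambda v: -v, lambda v: 2.0 * v])
print(y)  # [2. 1. 0. 2. 4.]

# With one more function than conditions, the last function is the default
y_default = np.piecewise(x, [x < 0], [lambda v: -v, lambda v: 2.0 * v])
print(np.array_equal(y, y_default))  # True
```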
The method proposed by Vito M. R. Muggeo [1] is relatively simple and efficient. It works for a specified number of segments and for a continuous function. The breakpoint positions are estimated iteratively: each iteration performs a segmented linear regression that allows jumps at the breakpoints, and the next breakpoint positions are deduced from the sizes of those jumps, until no discontinuities (jumps) remain.
"the process is iterated until possible convergence, which is not, in general, guaranteed"
In particular, the convergence or the result may depend on the initial estimate of the breakpoints.
This is the method used in the R Segmented package.
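Written out with the sign conventions of the Python implementation in this answer (which may differ from the notation in the paper), each iteration fits a model with a slope change c_k and a jump d_k at every current breakpoint x_k, and then moves each breakpoint:

```latex
y \approx a + b\,x + \sum_{k} c_k\,(x - x_k)_{+} + \sum_{k} d_k\,H(x - x_k),
\qquad
x_k \leftarrow x_k - \frac{d_k}{c_k}
```

Here (u)_+ = max(u, 0) is the ramp and H(u) is the step, equal to 1 for u > 0 and 0 otherwise. The update follows from asking where the two lines on either side of x_k actually intersect: they differ by c_k (x - x_k) + d_k, which vanishes at x = x_k - d_k / c_k.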
Here is an implementation in python:
import numpy as np
from numpy.linalg import lstsq
ramp = lambda u: np.maximum( u, 0 )
step = lambda u: ( u > 0 ).astype(float)
def SegmentedLinearReg( X, Y, breakpoints ):
    nIterationMax = 10
    breakpoints = np.sort( np.array(breakpoints) )
    dt = np.min( np.diff(X) )
    ones = np.ones_like(X)
    for i in range( nIterationMax ):
        # Linear regression: solve A*p = Y
        Rk = [ramp( X - xk ) for xk in breakpoints ]
        Sk = [step( X - xk ) for xk in breakpoints ]
        A = np.array([ ones, X ] + Rk + Sk )
        p = lstsq(A.transpose(), Y, rcond=None)[0]
        # Parameters identification:
        a, b = p[0:2]
        ck = p[ 2:2+len(breakpoints) ]
        dk = p[ 2+len(breakpoints): ]
        # Estimation of the next break-points:
        newBreakpoints = breakpoints - dk/ck
        # Stop condition
        if np.max(np.abs(newBreakpoints - breakpoints)) < dt/5:
            break
        breakpoints = newBreakpoints
    else:
        print( 'maximum iteration reached' )
    # Compute the final segmented fit:
    Xsolution = np.insert( np.append( breakpoints, max(X) ), 0, min(X) )
    ones = np.ones_like(Xsolution)
    Rk = [ c*ramp( Xsolution - x0 ) for x0, c in zip(breakpoints, ck) ]
    Ysolution = a*ones + b*Xsolution + np.sum( Rk, axis=0 )
    return Xsolution, Ysolution
Example:
import matplotlib.pyplot as plt
X = np.linspace( 0, 10, 27 )
Y = 0.2*X - 0.3* ramp(X-2) + 0.3*ramp(X-6) + 0.05*np.random.randn(len(X))
plt.plot( X, Y, 'ok' );
initialBreakpoints = [1, 7]
plt.plot( *SegmentedLinearReg( X, Y, initialBreakpoints ), '-r' );
plt.xlabel('X'); plt.ylabel('Y');

[1]: Muggeo, V. M. (2003). Estimating regression models with unknown breakpoints. Statistics in medicine, 22(19), 3055-3071.
You could directly copy the segments_fit implementation
import numpy as np
from scipy import optimize
def segments_fit(X, Y, count):
    xmin = X.min()
    xmax = X.max()
    # Initial guess: `count` segments of equal width
    seg = np.full(count - 1, (xmax - xmin) / count)
    px_init = np.r_[np.r_[xmin, seg].cumsum(), xmax]
    py_init = np.array([Y[np.abs(X - x) < (xmax - xmin) * 0.01].mean() for x in px_init])
    def func(p):
        # Map the parameter vector to the corner coordinates (px, py)
        seg = p[:count - 1]
        py = p[count - 1:]
        px = np.r_[np.r_[xmin, seg].cumsum(), xmax]
        return px, py
    def err(p):
        # Mean squared error between the data and the interpolated corners
        px, py = func(p)
        Y2 = np.interp(X, px, py)
        return np.mean((Y - Y2)**2)
    r = optimize.minimize(err, x0=np.r_[seg, py_init], method='Nelder-Mead')
    return func(r.x)
Then you apply it as follows
import numpy as np
# mimic your data
x = np.linspace(0, 50)
y = 50 - np.clip(x, 10, 40)
# apply the segment fit
fx, fy = segments_fit(x, y, 3)
This gives you (fx, fy), the corners of your piecewise fit. Let's plot them:
import matplotlib.pyplot as plt
# show the results
plt.figure(figsize=(8, 3))
plt.plot(fx, fy, 'o-')
plt.plot(x, y, '.')
plt.legend(['fitted line', 'given points'])
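The way segments_fit parametrizes the corners may not be obvious at first: the optimizer does not work with absolute x positions but with segment widths, which are accumulated from xmin. The sketch below reproduces just that step with the same expressions, using xmin = 0, xmax = 50 and count = 3 as illustrative values.

```python
import numpy as np

xmin, xmax, count = 0.0, 50.0, 3

# count - 1 free widths; the last corner is always pinned at xmax
seg = np.full(count - 1, (xmax - xmin) / count)

# Cumulative sum turns widths into corner positions
px = np.r_[np.r_[xmin, seg].cumsum(), xmax]
print(px)  # approximately [0, 16.67, 33.33, 50]

# Total number of optimized parameters: (count - 1) widths + (count + 1) heights
n_params = len(seg) + len(px)
print(n_params)  # 6
```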

EDIT: Introducing constant segments
As mentioned in the comments, the above example doesn't guarantee that the output will be constant in the end segments.
Based on this implementation, the easiest way I can think of is to restrict func(p). A simple way to ensure that a segment is constant is to set y[i+1] == y[i]. Thus I added xanchors and yanchors. If you pass an array with repeated indices, you can bind multiple points to the same value.
import numpy as np
from scipy import optimize
def segments_fit(X, Y, count, xanchors=slice(None), yanchors=slice(None)):
    xmin = X.min()
    xmax = X.max()
    # Initial guess: `count` segments of equal width
    seg = np.full(count - 1, (xmax - xmin) / count)
    px_init = np.r_[np.r_[xmin, seg].cumsum(), xmax]
    py_init = np.array([Y[np.abs(X - x) < (xmax - xmin) * 0.01].mean() for x in px_init])
    def func(p):
        seg = p[:count - 1]
        py = p[count - 1:]
        px = np.r_[np.r_[xmin, seg].cumsum(), xmax]
        # Re-index the corners so that anchored points share a value
        py = py[yanchors]
        px = px[xanchors]
        return px, py
    def err(p):
        px, py = func(p)
        Y2 = np.interp(X, px, py)
        return np.mean((Y - Y2)**2)
    r = optimize.minimize(err, x0=np.r_[seg, py_init], method='Nelder-Mead')
    return func(r.x)
I modified the data generation slightly to make the effect of the change clearer:
import matplotlib.pyplot as plt
import numpy as np
# mimic your data
x = np.linspace(0, 50)
y = 50 - np.clip(x, 10, 40) + np.random.randn(len(x)) + 0.25 * x
# apply the segment fit
fx, fy = segments_fit(x, y, 3)
plt.plot(fx, fy, 'o-')
plt.plot(x, y, '.k')
# apply the segment fit with some consecutive points having the
# same anchor
fx, fy = segments_fit(x, y, 3, yanchors=[1,1,2,2])
plt.plot(fx, fy, 'o--r')
plt.legend(['fitted line', 'given points', 'with const segments'])
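The anchors work through plain numpy index arrays. As a small illustration with made-up corner heights, yanchors=[1, 1, 2, 2] makes corners 0 and 1 share one height and corners 2 and 3 share another, which is what forces the two end segments to be flat.

```python
import numpy as np

py = np.array([10.0, 40.0, 22.0, 30.0])  # hypothetical optimized heights

# Default anchor: slice(None) leaves the array unchanged
print(py[slice(None)])  # [10. 40. 22. 30.]

# Repeated indices bind several corners to the same parameter
anchored = py[[1, 1, 2, 2]]
print(anchored)  # [40. 40. 22. 22.]
```

Note that with this choice py[0] and py[3] no longer influence the fitted curve, although they remain in the parameter vector passed to the optimizer.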

You can get a one-line solution (not counting the import) using univariate splines of degree one, like this:
from scipy.interpolate import UnivariateSpline
f = UnivariateSpline(x,y,k=1,s=0)
Here k=1 means we interpolate using polynomials of degree one, i.e. lines. s is the smoothing parameter: it decides how much you are willing to compromise on the fit to avoid using too many segments. Setting it to zero means no compromise, i.e. the line has to go through all points. See the documentation.
Then
plt.plot(x, y, "o", label='original data')
plt.plot(x, f(x), label='linear interpolation')
plt.legend()
plt.savefig("out.png", dpi=300)
This produces a plot of the original data points joined by straight line segments.
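It is worth checking what s=0 implies: the spline passes through every data point, so it behaves like np.interp and does not by itself locate a single gradient change point. A short sketch, reusing the data from the question as an assumption about what x and y are:

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15], dtype=float)
y = np.array([5, 7, 9, 11, 13, 15, 28.92, 42.81, 56.7, 70.59,
              84.47, 98.36, 112.25, 126.14, 140.03])

f = UnivariateSpline(x, y, k=1, s=0)

# The spline reproduces the data exactly at the sample points ...
hits_points = np.allclose(f(x), y)

# ... and matches straight-line interpolation in between
mid = (x[:-1] + x[1:]) / 2
matches_interp = np.allclose(f(mid), np.interp(mid, x, y))
print(hits_points, matches_interp)
```

Raising s above zero lets the spline drop knots, which brings it closer to a fit with only a few segments.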

This is a problem with types. Change the following line so that x is given as floats:
x = np.array([-3, -2, -1, 0, 1, 2, 3]).astype(float)
Otherwise np.piecewise returns an array with the integer dtype of x, so the values computed by piecewise_linear may be truncated to integers.
To be on the safe side, you could also make the initial parameters floats:
popt_piecewise, pcov = curve_fit(piecewise_linear, x, y, p0=[0.1, 0.1, -5., 5.])
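The dtype issue is easy to reproduce outside of curve_fit, because np.piecewise allocates its output with the same dtype as its input. The function below is a made-up example for illustration, not the piecewise_linear from the question.

```python
import numpy as np

def demo(x):
    # Slope 0.5 below x = 2, slope 1.5 from x = 2 onwards
    return np.piecewise(x, [x < 2, x >= 2],
                        [lambda v: 0.5 * v, lambda v: 1.5 * v])

x_int = np.array([1, 2, 3])

out_int = demo(x_int)                  # integer input: results are truncated
out_float = demo(x_int.astype(float))  # float input: results are kept

print(out_int)    # [0 3 4]
print(out_float)  # [0.5 3.  4.5]
```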
For completeness, I'll point out that fitting a piecewise linear function does not require np.piecewise: any such function can be constructed out of absolute values, using a multiple of np.abs(x-x0) for each bend. The following produces a good fit to the data:
def pl(x, x0, a, b, c):
    y = a*np.abs(x-x0) + b*x + c
    return y
popt_pl, pcov = curve_fit(pl, x, y, p0=[0, 0, 0, 0])
print(pl(x, *popt_pl))
Output is close to original y-values:
[ 8.90899998 5.828 2.74700002 -0.33399996 2.03499998 5.32
8.60500002]
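The parameters of the absolute-value form map directly onto slopes: to the left of x0 the slope is b - a, and to the right it is b + a. The sketch below checks this numerically against an explicit two-line function; two_lines and the parameter values are hypothetical and chosen only for the demonstration.

```python
import numpy as np

def pl(x, x0, a, b, c):
    return a * np.abs(x - x0) + b * x + c

def two_lines(x, x0, y0, k1, k2):
    # Slope k1 left of x0, slope k2 right of it, both passing through (x0, y0)
    return np.where(x < x0, y0 + k1 * (x - x0), y0 + k2 * (x - x0))

# Hypothetical bend at (0.5, -0.3) with slopes -3.0 and 3.3
x0, y0, k1, k2 = 0.5, -0.3, -3.0, 3.3

# Convert slopes to the absolute-value parameters
a = (k2 - k1) / 2
b = (k1 + k2) / 2
c = y0 - b * x0

x = np.linspace(-3, 3, 13)
same = np.allclose(pl(x, x0, a, b, c), two_lines(x, x0, y0, k1, k2))
print(same)  # True
```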
To finish this up here, I'll share my own final solution to the problem. In order to stay close to my original question, you just have to define the vectorized function yourself and not use np.vectorize.
import scipy.optimize as so
import numpy as np
def fitfunc(x, p):
    if x > p:
        return x - p
    else:
        return -(x - p)
fitfunc_vec = np.vectorize(fitfunc)  # np.vectorize version, kept for comparison
def fitfunc_vec_self(x, p):
    # Hand-written vectorization: apply fitfunc to each element in turn
    y = np.zeros(x.shape)
    for i in range(len(y)):
        y[i] = fitfunc(x[i], p)
    return y
x = np.arange(1, 10)
y = fitfunc_vec_self(x, 6) + 0.1*np.random.randn(len(x))
popt, pcov = so.curve_fit(fitfunc_vec_self, x, y)  # fit with the hand-vectorized function
print(popt)
print(pcov)
Output:
[ 6.03608994]
[[ 0.00124934]]
Couldn't you simply replace fitfunc with
def fitfunc2(x, p):
    return np.abs(x-p)
which then produces something like
>>> x = np.arange(1,10)
>>> y = fitfunc2(x,6) + 0.1*np.random.randn(len(x))
>>>
>>> so.curve_fit(fitfunc2, x, y)
(array([ 5.98273313]), array([[ 0.00101859]]))
Using a switch function and/or building blocks like np.where to replace branches, this should scale up to more complicated expressions without needing to call np.vectorize.
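As a sketch of that idea, the if/else branch of fitfunc can be written with np.where so that it operates on whole arrays. The name fitfunc_where, the fixed random seed and the starting guess p0 are assumptions made for this example.

```python
import numpy as np
from scipy.optimize import curve_fit

def fitfunc_where(x, p):
    # Element-wise branch: x - p where x > p, otherwise -(x - p)
    return np.where(x > p, x - p, -(x - p))

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
x = np.arange(1, 10, dtype=float)
y = np.abs(x - 6) + 0.1 * rng.standard_normal(len(x))

popt, pcov = curve_fit(fitfunc_where, x, y, p0=[5.0])
print(popt)  # estimate of p, expected to be close to 6
```

Both branches are evaluated for every element and np.where then selects between them, so this pattern suits cheap expressions that are valid over the whole input.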
[PS: the errfunc in your least squares example doesn't need to be a lambda. You could write
def errfunc(p, x, y):
    return array_fitfunc(p, x) - y
instead, if you liked.]
