Non-negative polynomial approximation
Chebyshev interpolation is a well-studied technique for approximating functions or data with a polynomial. Efficient algorithms for computing these approximations, as well as corresponding error bounds, are abundantly available in the literature. Usually, no restrictions are imposed on the approximating polynomial. However, there are cases in which we need this polynomial to be nonnegative – whether because a negative approximation would be unphysical, because we need to take its square root, or due to some other circumstance. We present a small collection of approaches for computing such nonnegative approximations.
Based on a recent collaboration with Daniel Kressner, Haoze He, and Hei Yin Lam.
Motivating example from linear algebra
Suppose we want to approximate a matrix function $f(\boldsymbol{A})$ of a symmetric matrix $\boldsymbol{A} \in \mathbb{R}^{n \times n}$ with a polynomial $p(\boldsymbol{A})$, which approximates $f(\boldsymbol{A})$ well whenever $p$ approximates $f$ well on an interval containing the eigenvalues of $\boldsymbol{A}$.
Positive semidefiniteness
A desirable property for a matrix is positive semidefiniteness, i.e. all its eigenvalues being nonnegative. If $f$ is nonnegative, then $f(\boldsymbol{A})$ is positive semidefinite – but $p(\boldsymbol{A})$ only inherits this property if the polynomial $p$ itself is nonnegative on an interval containing the eigenvalues of $\boldsymbol{A}$.
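To see why this matters, consider a minimal sketch (the matrix and polynomial below are our own illustrative choices, not part of the method): the polynomial $p(x) = x^2 - 1/2$ is negative on part of $[-1, 1]$, and applying it to a symmetric matrix with an eigenvalue in that region yields an indefinite matrix.
# Illustration: a polynomial that is negative somewhere on [-1, 1]
# produces a matrix p(A) that is not positive semidefinite
import numpy as np

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))  # random orthogonal matrix
A = Q @ np.diag(np.linspace(-1, 1, 5)) @ Q.T      # symmetric, eigenvalues in [-1, 1]

p_of_A = A @ A - 0.5 * np.eye(5)                  # p(A) for p(x) = x^2 - 1/2
print(np.linalg.eigvalsh(p_of_A).min())           # approximately -0.5 => not PSD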
Chebyshev interpolation as the backbone
The goal of this section is to introduce Chebyshev interpolation of a general function $f: [-1, 1] \to \mathbb{R}$.
Chebyshev polynomials
The Chebyshev polynomials are defined as
$$T_j(x) = \cos(j \arccos(x)), \qquad j = 0, 1, 2, \dots,$$
where $x \in [-1, 1]$.
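That this trigonometric expression indeed defines a polynomial of degree $j$ follows from the three-term recurrence $T_{j+1}(x) = 2x\,T_j(x) - T_{j-1}(x)$. As a quick numerical sanity check of the definition (a small sketch of ours, not part of the original exposition):
# Verify the defining identity T_j(cos(theta)) = cos(j * theta) for j = 5
import numpy as np

theta = np.linspace(0, np.pi, 13)
T_5 = np.polynomial.chebyshev.Chebyshev([0] * 5 + [1])  # basis polynomial T_5
print(np.allclose(T_5(np.cos(theta)), np.cos(5 * theta)))  # True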
Polynomial interpolation
Any polynomial of degree $m$ can be written in the basis of Chebyshev polynomials, $p(x) = \sum_{j=0}^{m} \mu_j T_j(x)$. The coefficients $\mu_0, \dots, \mu_m$ of the Chebyshev interpolant of degree $m$ are determined by imposing the interpolation conditions $p(x_i) = f(x_i)$ at $m + 1$ distinct nodes $x_0, \dots, x_m$.
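In principle, these interpolation conditions can be enforced directly by solving a linear system with the Chebyshev Vandermonde matrix $V_{ij} = T_j(x_i)$; a minimal sketch (the helper name is ours) before we turn to the much faster approach below:
# Naive interpolation (illustration only): solve V @ mu = f(x) for the
# coefficients, where V[i, j] = T_j(x_i); this costs O(m^3) operations
import numpy as np

def chebyshev_interpolation_naive(f, m, nodes):
    V = np.polynomial.chebyshev.chebvander(nodes, m)  # (m+1) x (m+1) for m+1 nodes
    return np.linalg.solve(V, f(nodes))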
Chebyshev nodes
Intuitively, one would probably spread the nodes more or less evenly over the interval $[-1, 1]$. However, the Chebyshev nodes
$$x_i = \cos\left(\frac{\pi i}{m}\right), \qquad i = 0, \dots, m,$$
have some particularly desirable properties: By the definition of the Chebyshev polynomials,
$$p(x_i) = \sum_{j=0}^{m} \mu_j \cos\left(\frac{\pi i j}{m}\right).$$
This exactly matches the definition of a discrete cosine transform (DCT) [Plonka et al., 2018], which can be efficiently computed with a fast Fourier transform algorithm. Hence, we can map between the function evaluations $f(x_0), \dots, f(x_m)$ and the coefficients $\mu_0, \dots, \mu_m$ of the Chebyshev interpolant with an (inverse) DCT.
Numerically computing the Chebyshev interpolant
We will now have a look at how Chebyshev interpolants can be computed numerically in Python.
# Package imports
import numpy as np
import scipy as sp
import matplotlib.pyplot as plt
The scientific computing package NumPy provides convenient implementations of Chebyshev polynomials in the numpy.polynomial.chebyshev
module. The interpolate
function takes as input a Python function handle and a degree, and returns the corresponding Chebyshev interpolant, from which we extract the coefficients.
# Chebyshev interpolation coefficients with NumPy
def chebyshev_interpolation_numpy(f, m):
    chebyshev_polynomial = np.polynomial.chebyshev.Chebyshev.interpolate(f, m)
    mu = chebyshev_polynomial.coef
    return mu
The coefficients of the Chebyshev interpolant can also be computed “by hand”, using the inverse of the discrete cosine transform (DCT) as we have seen in the previous section. We use the implementation idct
of the inverse DCT from the fft
module in the advanced scientific computing package SciPy.
# Chebyshev interpolation coefficients with DCT
def chebyshev_interpolation_dct(f, m):
    chebyshev_nodes = np.cos(np.pi * np.arange(m + 1) / m)
    mu = sp.fft.idct(f(chebyshev_nodes), type=1)
    mu[1:-1] *= 2  # rescale coefficients due to type-1 DCT convention in SciPy
    return mu
Let us now test the two implementations on a simple example: the Chebyshev interpolant of degree $m = 10$ of the bell curve $f(x) = e^{-(x/\sigma)^2}$ for $\sigma = 0.1$.
# Compute and visualize Chebyshev interpolants
def evaluate_interpolant(coefficients, evaluation_points):
    return np.polynomial.chebyshev.Chebyshev(coefficients)(evaluation_points)
# Chebyshev interpolation of a bell-curve
sigma = 0.1
f = lambda x: np.exp(- (x / sigma) ** 2)
m = 10
mu_numpy = chebyshev_interpolation_numpy(f, m)
mu_dct = chebyshev_interpolation_dct(f, m)
# Visualize the Chebyshev interpolants by evaluating them in 1000 points
x_lin = np.linspace(-1, 1, 1000)
plt.plot(x_lin, evaluate_interpolant(mu_numpy, x_lin), label="NumPy interpolant")
plt.plot(x_lin, evaluate_interpolant(mu_dct, x_lin), label="DCT interpolant")
plt.plot(x_lin, f(x_lin), linestyle="--", label="exact function")
plt.legend()
plt.show()
Wait… Should they not give the same result? No. While looking into the source code of the numpy.polynomial.chebyshev
module, I found that unlike in the standard reference on Chebyshev interpolation, the NumPy implementation interpolates the function at the Chebyshev nodes of the first kind, $x_i = \cos\big(\pi (2i + 1) / (2m + 2)\big)$, rather than at the Chebyshev nodes $x_i = \cos(\pi i / m)$ used above.
To quickly confirm this observation, we can also implement the DCT-based Chebyshev interpolation for the Chebyshev nodes of the first kind.
# Compute and visualize Chebyshev interpolants with nodes of the first kind
def chebyshev_interpolation_dct_firstkind(f, m):
    chebyshev_nodes = np.cos(np.pi / 2 * (2 * np.arange(m + 1) + 1) / (m + 1))
    mu = sp.fft.idct(f(chebyshev_nodes), type=3)
    mu[1:] *= 2  # rescale coefficients due to type-3 DCT convention in SciPy
    return mu
mu_dct_firstkind = chebyshev_interpolation_dct_firstkind(f, m)
# Visualize Chebyshev interpolant
plt.plot(x_lin, evaluate_interpolant(mu_numpy, x_lin), label="NumPy interpolant")
plt.plot(x_lin, evaluate_interpolant(mu_dct_firstkind, x_lin), label="DCT interpolant (1st kind)")
plt.plot(x_lin, f(x_lin), linestyle="--", label="exact function")
plt.legend()
plt.show()
Great! Our implementation now gives the exact same result as NumPy. But for simplicity and for the theoretical analysis, we will switch back to the usual Chebyshev nodes. Let us now have a look at how our implementation compares to NumPy in terms of run-time. To do so, we interpolate the same bell curve with a Chebyshev interpolant of degree $m = 1000$.
m = 1000
%timeit chebyshev_interpolation_numpy(f, m)
%timeit chebyshev_interpolation_dct(f, m)
5.64 ms ± 166 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
23.8 μs ± 205 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
What? It turns out that our implementation with the DCT is more than 100 times faster than the NumPy implementation. This is because the DCT of a vector of length $m + 1$ can be computed in only $\mathcal{O}(m \log m)$ operations with an FFT-based algorithm, whereas NumPy determines the coefficients from a Chebyshev Vandermonde matrix, which takes $\mathcal{O}(m^2)$ operations.
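To make the different scalings visible, we can run a rough timing sweep over the degree (a sketch of ours; exact numbers will of course vary by machine):
# Rough timing sweep over the degree m
import time

for m_test in [100, 1000, 4000]:
    t0 = time.perf_counter()
    chebyshev_interpolation_numpy(f, m_test)
    t1 = time.perf_counter()
    chebyshev_interpolation_dct(f, m_test)
    t2 = time.perf_counter()
    print(f"m = {m_test:5d}: NumPy {t1 - t0:.2e} s, DCT {t2 - t1:.2e} s")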
Approximation error
The convergence of the interpolation error of Chebyshev interpolants with increasing degree $m$ is characterized by the following classical theorem.
Theorem [Trefethen, 2019]
Let $p_m$ be the Chebyshev interpolant of degree $m$ of a function $f$ that is analytic in the Bernstein ellipse $E_\rho$ for some $\rho > 1$. Then
$$\max_{x \in [-1, 1]} |f(x) - p_m(x)| \le \frac{4 M \rho^{-m}}{\rho - 1}, \qquad M = \max_{z \in E_\rho} |f(z)|.$$
The Bernstein ellipse $E_\rho$ is the ellipse with foci $\pm 1$ whose semi-axes sum up to $\rho$. Hence, if we set $\rho = 1 + \varepsilon$ and bound the magnitude of the bell curve $f(x) = e^{-(x/\sigma)^2}$ on $E_{1+\varepsilon}$, we obtain the following corollary.
Corollary
Let $f(x) = e^{-(x/\sigma)^2}$ for some $\sigma > 0$ and $p_m$ its Chebyshev interpolant of degree $m$. Then for all $\varepsilon > 0$ and all $x \in [-1, 1]$ it holds
$$|f(x) - p_m(x)| \le \frac{4 (1 + \varepsilon)^{-m}}{\varepsilon} e^{(\varepsilon/\sigma)^2}.$$
So there is a trade-off: a small $\varepsilon$ keeps the constant $e^{(\varepsilon/\sigma)^2}$ small but yields a slow convergence rate $(1 + \varepsilon)^{-m}$, whereas a large $\varepsilon$ gives a fast rate at the price of a huge constant.
Let us now see how our corollary performs numerically. In particular, since we are free to use any $\varepsilon > 0$, we compare the bounds for a few different choices of $\varepsilon$ with the actual interpolation error.
# Plot of interpolation error convergence
def theoretical_bound(m, epsilon, sigma):
    return 4 * (1. + epsilon) ** (-m) * np.exp((epsilon / sigma) ** 2) / epsilon

m_list = 10 * np.arange(1, 40)
errors, bounds_1, bounds_2, bounds_3 = [], [], [], []
f_exact = f(x_lin)
for i, m in enumerate(m_list):
    mu = chebyshev_interpolation_dct(f, m)
    f_interp = evaluate_interpolant(mu, x_lin)
    errors.append(np.max(np.abs(f_interp - f_exact)))
    bounds_1.append(theoretical_bound(m, 0.1, sigma))
    bounds_2.append(theoretical_bound(m, 0.2, sigma))
    bounds_3.append(theoretical_bound(m, 0.5, sigma))
plt.semilogy(m_list, errors, marker="o", label=r"interpolation error")
plt.semilogy(m_list, bounds_1, label=r"bound for $\varepsilon = 0.1$")
plt.semilogy(m_list, bounds_2, color="#F98125", linestyle="--", label=r"bound for $\varepsilon = 0.2$")
plt.semilogy(m_list, bounds_3, color="#F98125", linestyle=":", label=r"bound for $\varepsilon = 0.5$")
plt.ylim(1e-16, 1e2)
plt.xlabel(r"degree $m$")
plt.ylabel(r"error")
plt.legend()
plt.show()
As we can see, the interpolation error is indeed bounded from above by the bounds from the corollary. It turns out that smaller values of $\varepsilon$ give tighter bounds for small degrees $m$, while larger values of $\varepsilon$ better capture the fast decay of the error for larger degrees.
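Since the corollary holds for every $\varepsilon > 0$ simultaneously, we can even minimize the bound over $\varepsilon$ for each degree separately. A small sketch using scipy.optimize.minimize_scalar (the search interval $(10^{-6}, 1.5)$ is an arbitrary choice of ours):
# Minimize the theoretical bound over epsilon for each degree
def optimized_bound(m, sigma):
    res = sp.optimize.minimize_scalar(lambda eps: theoretical_bound(m, eps, sigma),
                                      bounds=(1e-6, 1.5), method="bounded")
    return res.fun

bounds_opt = [optimized_bound(m, sigma) for m in m_list]
plt.semilogy(m_list, errors, marker="o", label="interpolation error")
plt.semilogy(m_list, bounds_opt, linestyle="--", label=r"optimized bound")
plt.xlabel(r"degree $m$")
plt.ylabel(r"error")
plt.legend()
plt.show()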
Nonnegative Chebyshev approximation
Now that we have discussed Chebyshev interpolation both theoretically and in practice, we will shift our focus to the main topic of this post: nonnegative Chebyshev approximation. To this end, we will present four methods for nonnegative Chebyshev approximation, of which we find one particularly suitable for this task.
Method 1: Jackson damping
There is quite an old strategy for obtaining a nonnegative Chebyshev approximation. It is called Jackson damping, and as the name suggests, we apply some damping to the coefficients of the Chebyshev interpolant [Braverman et al., 2021]. Given a Chebyshev interpolant of a nonnegative function, we can enforce nonnegativity by multiplying its coefficients $\mu_j$ with the so-called Jackson damping coefficients $\gamma_j$ to get the damped Chebyshev interpolation coefficients $\tilde{\mu}_j = \gamma_j \mu_j$.
Despite the complicated-looking definition of the Jackson coefficients, it turns out that computing all of them is essentially a one-liner in Python:
# Apply Jackson damping to coefficients
import functools
def damping_coefficients(m):
    assert m % 4 == 0
    # Four-fold convolution of a constant vector yields the Jackson coefficients
    gamma = functools.reduce(np.convolve, 4 * [np.ones(m // 2 + 1)])[m:]
    return gamma / gamma[0]
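Before comparing errors, let us quickly verify that the damping removes the negative excursions of the interpolant on our bell curve (a small check of ours):
# Check: the plain interpolant dips below zero, the damped one does not
m = 16
mu = chebyshev_interpolation_dct(f, m)
print(evaluate_interpolant(mu, x_lin).min())                            # negative
print(evaluate_interpolant(damping_coefficients(m) * mu, x_lin).min())  # nonnegative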
Let us quickly test Jackson damping numerically, and compare the approximation error of the damped with the standard Chebyshev interpolation.
# Comparison of Jackson damping with Chebyshev interpolation
errors, errors_nonneg = [], []
m_list = 8 * np.arange(1, 25)
for m in m_list:
    mu = chebyshev_interpolation_dct(f, m)
    f_interp = evaluate_interpolant(mu, x_lin)
    errors.append(np.max(np.abs(f_exact - f_interp)))
    mu_nonneg = damping_coefficients(m) * mu
    f_interp_nonneg = evaluate_interpolant(mu_nonneg, x_lin)
    errors_nonneg.append(np.max(np.abs(f_exact - f_interp_nonneg)))
plt.semilogy(m_list, errors, marker="o", label=r"Chebyshev interpolation")
plt.semilogy(m_list, errors_nonneg, marker="s", label=r"Jackson damping")
plt.ylabel(r"error")
plt.xlabel(r"degree $m$")
plt.legend()
plt.show()
This does not look promising; through the damping, we trade the exponential convergence for a much slower algebraic convergence.
Method 2: Fejér theorem
A second approach we consider on our journey to find a good nonnegative interpolant comes from a characterization of nonnegative polynomials given by the following theorem.
Theorem [Fejér, 1916]
Any polynomial $p$ of even degree $m$ that is nonnegative on $[-1, 1]$ can be represented as
$$p(\cos\theta) = \left| \sum_{j=-m/2}^{m/2} c_j e^{\mathrm{i} j \theta} \right|^2$$
for some complex coefficients $c_j$.
Conveniently, this nonnegative polynomial can quite easily be converted to an expression in terms of Chebyshev polynomials. To this end, we first expand the squared modulus of $g(\theta) = \sum_{j} c_j e^{\mathrm{i} j \theta}$,
$$|g(\theta)|^2 = \sum_{j} \sum_{l} c_j \overline{c_l}\, e^{\mathrm{i} (j - l) \theta}.$$
Since $p(\cos\theta) = |g(\theta)|^2$ is even in $\theta$, the sine contributions cancel, and the terms for each pair of index differences $\pm k = \pm(j - l)$ combine into a cosine. Splitting the sums according to this difference yields
$$|g(\theta)|^2 = \sum_{j} |c_j|^2 + \sum_{k \ge 1} \Big( \sum_{j} \big( c_j \overline{c_{j+k}} + \overline{c_j}\, c_{j+k} \big) \Big) \cos(k \theta).$$
Using the definition of the Chebyshev polynomials, $\cos(k\theta) = T_k(\cos\theta)$, this is exactly a Chebyshev expansion in $x = \cos\theta$.
We can summarize this derivation in the following lemma.
Lemma
The Fejér representation $p(\cos\theta) = \big| \sum_{j=-m/2}^{m/2} c_j e^{\mathrm{i} j \theta} \big|^2$ and the coefficients $\nu_0, \dots, \nu_m$ of the corresponding Chebyshev representation $p = \sum_{k=0}^{m} \nu_k T_k$ are related through
$$\nu_0 = \sum_{j} |c_j|^2, \qquad \nu_k = \sum_{j} \big( c_j \overline{c_{j+k}} + \overline{c_j}\, c_{j+k} \big), \quad k \ge 1.$$
This is quite convenient. So we just need to find good coefficients $c_j$ in the Fejér representation, and we can cheaply convert them to the Chebyshev coefficients of a guaranteed-nonnegative polynomial.
# Conversion from Fejér coefficients to Chebyshev coefficients
def convert_to_chebyshev_coefficients(c):
    c_cheb = np.zeros_like(c)
    c_cheb[0] = np.sum(np.abs(c) ** 2)
    for j in range(1, len(c_cheb)):
        c_cheb[j] = np.sum(c[:-j] * np.conj(c[j:]) + np.conj(c[:-j]) * c[j:])
    return np.real(c_cheb)
Here it gets tricky. Obtaining coefficients $c_j$ whose Fejér representation approximates $f$ well amounts to a nonconvex optimization problem. Given some coefficients $c_j$, the following helpers evaluate the Fejér approximation and its maximum deviation from $f$ on a grid of test points.
# Evaluation of approximation and approximation error from coefficients
def fejer_approximation(x, c, complex=False):
    if complex:
        # With complex coefficients, last half of 'c' go to imaginary part
        c = c[:len(c) // 2] + 1j * c[len(c) // 2:]
    m = (len(c) - 1) // 2
    g = np.sum(c * np.exp(1j * np.outer(x, np.arange(-m, m + 1))), axis=1)
    return np.abs(g) ** 2

def fejer_approximation_error(c, fun, a=0, b=2 * np.pi, n_test=100, complex=False):
    diff = lambda x: np.abs(fun(np.cos(x)) - fejer_approximation(x, c, complex=complex))
    return np.max(diff(np.linspace(a, b, n_test)))
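Before optimizing anything, we can sanity-check the lemma and our conversion routine: for real coefficients $c_j$ the sine terms vanish, so evaluating the converted Chebyshev expansion at $\cos\theta$ must reproduce $|g(\theta)|^2$ exactly.
# Sanity check: |g(theta)|^2 equals the converted Chebyshev expansion at cos(theta)
c_test = np.random.randn(7)  # real Fejér coefficients
theta_test = np.linspace(0, np.pi, 50)
lhs = fejer_approximation(theta_test, c_test)
rhs = evaluate_interpolant(convert_to_chebyshev_coefficients(c_test), np.cos(theta_test))
print(np.allclose(lhs, rhs))  # True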
Trying to find a minimum of this approximation error through numerical optimization may work for small degrees $m$, but since the problem is nonconvex, we restart the optimization from a few random initial guesses and keep track of the spread of the results.
# Direct solution of optimization problem
def chebyshev_coefficients_fejer(fun, m, c0=None, complex=False):
    if c0 is None:
        c0 = np.random.randn(2 * m if complex else m)
    target = lambda c: fejer_approximation_error(c, fun, complex=complex)
    c_opt = sp.optimize.minimize(target, c0, method="BFGS").x
    if complex:
        # With complex coefficients, last half of 'c' go to imaginary part
        c_opt = c_opt[:len(c_opt) // 2] + 1j * c_opt[len(c_opt) // 2:]
    c_cheb = convert_to_chebyshev_coefficients(c_opt)
    return c_cheb
m_list = 2 * np.arange(20) + 3
errors = []
num_retries = 3
errors_fejer = np.empty((len(m_list), num_retries))
for i, m in enumerate(m_list):
    mu = chebyshev_interpolation_dct(f, m)
    f_interp = evaluate_interpolant(mu, x_lin)
    errors.append(np.max(np.abs(f_exact - f_interp)))
    for j in range(num_retries):
        mu_fejer = chebyshev_coefficients_fejer(f, m=m, complex=True)
        f_interp_fejer = evaluate_interpolant(mu_fejer, x_lin)
        errors_fejer[i, j] = np.max(np.abs(f_exact - f_interp_fejer))
plt.semilogy(m_list, errors, marker="o", label="Chebyshev interpolation")
plt.semilogy(m_list, np.mean(errors_fejer, axis=1), marker="s", label="Fejér approximation")
plt.fill_between(m_list, np.min(errors_fejer, axis=1), np.max(errors_fejer, axis=1), color="#F98125", alpha=0.2)
plt.xlabel(r"degree $m$")
plt.ylabel(r"error")
plt.legend()
plt.show()
Surprisingly, minimizing the approximation error in the Fejér representation works even better than Chebyshev interpolation for small degrees. However, once we increase the degree, the optimizer gets stuck in bad local minima and the results deteriorate quickly.
Method 3: Squaring the square root
There is one convenient trick which leads to a nonnegative approximation: First, compute the Chebyshev interpolant of the square root of the function at half the degree,
$$q_{m/2}(x) = \sum_{j=0}^{m/2} \mu_j T_j(x) \approx \sqrt{f(x)}.$$
Then, square this Chebyshev interpolant, which can again be represented in terms of Chebyshev polynomials,
$$p_m(x) = q_{m/2}(x)^2 = \sum_{j=0}^{m} \nu_j T_j(x).$$
Because this function is the square of another function, it is by definition nonnegative. And what is best: due to the DCT-duality between the evaluations of a Chebyshev interpolant at the Chebyshev nodes and its coefficients, we can (exactly) square a Chebyshev interpolant in just $\mathcal{O}(m \log m)$ operations via
$$\boldsymbol{\nu} = \mathrm{DCT}^{-1}\big( \mathrm{DCT}(\boldsymbol{\mu})^2 \big),$$
where the square is understood elementwise and $\boldsymbol{\mu}$ is zero-padded to length $m + 1$. This then leads us to the following algorithm for nonnegative Chebyshev approximation.
Algorithm (nonnegative Chebyshev approximation) [Matti et al., 2025]
Input: Nonnegative function $f$, even degree $m$.
Output: Coefficients $\nu_0, \dots, \nu_m$ such that $p_m = \sum_{j=0}^{m} \nu_j T_j \ge 0$ interpolates $f$ in the nodes $x_i = \cos(2\pi i / m)$, $i = 0, \dots, m/2$.
1: Compute the coefficients $\mu_0, \dots, \mu_{m/2}$ of the Chebyshev interpolant of $\sqrt{f}$ of degree $m/2$.
2: Zero-pad the coefficients: $\boldsymbol{\mu} = (\mu_0, \dots, \mu_{m/2}, 0, \dots, 0) \in \mathbb{R}^{m+1}$.
3: Evaluate the Chebyshev interpolant at the $m + 1$ Chebyshev nodes with a DCT: $\boldsymbol{v} = \mathrm{DCT}(\boldsymbol{\mu})$.
4: Compute the coefficients $\boldsymbol{\nu} = \mathrm{DCT}^{-1}(\boldsymbol{v}^2)$ with an inverse DCT.
The Python implementation is quite simple.
# Implementation of nonnegative Chebyshev approximation
def square_chebyshev_interpolant(mu):
    mu = np.append(mu, np.zeros(len(mu) - 1))  # Zero-padding
    mu[1:-1] /= 2  # Rescale coefficients due to type-1 DCT convention in SciPy
    nu = sp.fft.idct(sp.fft.dct(mu, type=1) ** 2, type=1)
    nu[1:-1] *= 2  # Rescale coefficients due to type-1 DCT convention in SciPy
    return nu

def chebyshev_approximation_nonneg(f, m):
    f_sqrt = lambda x: np.sqrt(f(x))
    mu_sqrt = chebyshev_interpolation_dct(f_sqrt, m // 2)
    mu = square_chebyshev_interpolant(mu_sqrt)
    return mu
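A quick check (of ours) confirms that the result is indeed nonnegative up to rounding errors, and that it interpolates $f$ at the $m/2 + 1$ Chebyshev nodes of degree $m/2$:
# Check nonnegativity and the interpolation property
m = 20
mu_nonneg = chebyshev_approximation_nonneg(f, m)
nodes = np.cos(np.pi * np.arange(m // 2 + 1) / (m // 2))
print(evaluate_interpolant(mu_nonneg, x_lin).min())                   # >= 0 up to rounding
print(np.allclose(evaluate_interpolant(mu_nonneg, nodes), f(nodes)))  # True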
What’s more, this nonnegative approximation satisfies a bound similar to that of the standard Chebyshev interpolation. For the example of the bell curve, we can derive the following bound.
Proposition
Let $f(x) = e^{-(x/\sigma)^2}$ for some $\sigma > 0$ and let $p_m = q_{m/2}^2$ be its nonnegative Chebyshev approximation of even degree $m$. For all $\varepsilon > 0$ and all $x \in [-1, 1]$ it holds
$$|f(x) - p_m(x)| \le (2 + \delta)\, \delta, \qquad \delta = \frac{4 (1 + \varepsilon)^{-m/2}}{\varepsilon}\, e^{\varepsilon^2 / (2\sigma^2)}.$$
This can be proved by first applying the corollary to the interpolation of the square root $\sqrt{f(x)} = e^{-(x/(\sqrt{2}\sigma))^2}$, which gives $\|\sqrt{f} - q_{m/2}\|_\infty \le \delta$, and then realizing that
$$|f - q_{m/2}^2| = |\sqrt{f} - q_{m/2}| \cdot |\sqrt{f} + q_{m/2}| \le \delta\, \big( 2 \|\sqrt{f}\|_\infty + \delta \big) \le (2 + \delta)\, \delta.$$
Let us quickly see how our method numerically compares against the standard Chebyshev interpolation.
m_list = 8 * np.arange(1, 25)
errors, errors_nonneg = [], []
for m in m_list:
    mu = chebyshev_interpolation_dct(f, m)
    f_interp = evaluate_interpolant(mu, x_lin)
    errors.append(np.max(np.abs(f_exact - f_interp)))
    mu_nonneg = chebyshev_approximation_nonneg(f, m)
    f_interp_nonneg = evaluate_interpolant(mu_nonneg, x_lin)
    errors_nonneg.append(np.max(np.abs(f_exact - f_interp_nonneg)))
plt.semilogy(m_list, errors, marker="o", label=r"Chebyshev interpolation")
plt.semilogy(m_list, errors_nonneg, marker="s", label=r"nonnegative interpolation")
plt.ylabel(r"error")
plt.xlabel(r"degree $m$")
plt.legend()
plt.show()
Nice! We are not far off from the standard Chebyshev interpolation in terms of approximation error. And in terms of computational time, we are only around 2 times slower, because we need to compute a few DCTs instead of one.
m = 1000
%timeit chebyshev_interpolation_dct(f, m)
%timeit chebyshev_approximation_nonneg(f, m)
23.3 μs ± 185 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
41 μs ± 123 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
Method 4: Fejér representation on steroids
Before we definitely discard the Fejér representation approach, let us briefly explore an interesting connection between this formulation and the Chebyshev representation. We observe that for the square-root interpolant $q(\cos\theta) = \sum_{j=0}^{m'} \mu_j \cos(j\theta)$ from method 3, we can rewrite $\cos(j\theta) = (e^{\mathrm{i} j \theta} + e^{-\mathrm{i} j \theta}) / 2$. So with the coefficients
$$c_0 = \mu_0, \qquad c_{\pm j} = \frac{\mu_j}{2}, \quad j = 1, \dots, m',$$
we have
$$\left| \sum_{j=-m'}^{m'} c_j e^{\mathrm{i} j \theta} \right|^2 = q(\cos\theta)^2,$$
i.e. the Fejér representation with these coefficients exactly reproduces the squared Chebyshev interpolant of $\sqrt{f}$ from method 3. Hence, we can use them as a good initial guess for the optimization in the Fejér representation, hoping that the optimizer improves on it.
m_list = 2 * np.arange(40) + 3
errors_nonneg, errors_fejer = [], []
for i, m in enumerate(m_list):
    mu_nonneg = chebyshev_approximation_nonneg(f, m - 1)
    f_interp_nonneg = evaluate_interpolant(mu_nonneg, x_lin)
    errors_nonneg.append(np.max(np.abs(f_exact - f_interp_nonneg)))
    # Generate good initial guess for optimization in Fejér representation
    f_sqrt = lambda x: np.sqrt(f(x))
    c_sqrt = chebyshev_interpolation_dct(f_sqrt, (m - 1) // 2)
    c0 = np.zeros(m)
    c0[(m - 1) // 2 + 1:] = c_sqrt[1:] / 2
    c0[:(m - 1) // 2] = c_sqrt[::-1][:-1] / 2
    c0[(m - 1) // 2] = c_sqrt[0]
    mu_fejer = chebyshev_coefficients_fejer(f, m=m, c0=c0, complex=False)
    f_interp_fejer = evaluate_interpolant(mu_fejer, x_lin)
    errors_fejer.append(np.max(np.abs(f_exact - f_interp_fejer)))
plt.semilogy(m_list, errors_nonneg, marker="o", label="nonnegative interpolation")
plt.semilogy(m_list, errors_fejer, marker="s", label="Fejér on steroids")
plt.legend()
plt.xlabel(r"degree $m$")
plt.ylabel(r"error")
plt.show()
Interesting. By using good initial guesses, we can beat the nonnegative Chebyshev interpolation. However, the improvement is only minor and hardly worth the much higher run-time of the optimization.
Conclusion
With the simple trick of interpolating the square root of a function and then squaring the resulting interpolant, the nonnegative Chebyshev approximation is a fast and accurate way of computing polynomial approximations of functions.
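For reference, using the method condenses to just a couple of lines (reusing the functions defined above):
# End-to-end usage of the nonnegative Chebyshev approximation
mu = chebyshev_approximation_nonneg(lambda x: np.exp(-(x / 0.1) ** 2), 100)
p = np.polynomial.chebyshev.Chebyshev(mu)
print(p(np.linspace(-1, 1, 1000)).min())  # nonnegative up to rounding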