Non-negative Chebyshev expansion
Chebyshev interpolation is a well-studied technique for approximating functions or data with a polynomial. Efficient techniques for computing these approximations, as well as corresponding error bounds, are abundantly available in the literature. Usually, no restrictions are imposed on the polynomial. However, there are cases in which we need the polynomial to be non-negative – whether because the resulting approximation would otherwise be unphysical, because it needs to be processed further by e.g. taking its square root, or due to some other circumstance. We present a small collection of approaches for computing such a non-negative interpolant.
Based on a collaboration with Daniel Kressner and Haoze He.
Motivating example from linear algebra
Suppose we want to approximate a matrix function $f(\boldsymbol{A})$ by a matrix polynomial. This can be done by first computing a degree $m \in \mathbb{N}$ polynomial
\[p(x) = a_0 + a_1 x + a_2 x^2 + \dots + a_m x^m \approx f(x)\]which approximates $f$ well (e.g. a Chebyshev expansion), and by subsequently using the matrix $\boldsymbol{A}$ as its argument
\[p(\boldsymbol{A}) = a_0 + a_1 \boldsymbol{A} + a_2 \boldsymbol{A}^2 + \dots + a_m \boldsymbol{A}^m \approx f(\boldsymbol{A}).\]A desirable property for a matrix is positive semi-definiteness, i.e. all its eigenvalues $\lambda$ being non-negative. Since the eigenvalues of a matrix function are $f(\lambda)$, the result of applying a non-negative function $f \geq 0$ to a matrix $\boldsymbol{A}$ is always a positive semi-definite matrix $f(\boldsymbol{A}) \succeq 0$. However, a polynomial approximation $p$ to a non-negative function $f$ is not guaranteed to preserve its non-negativity: it could be that even if $f(\boldsymbol{A})$ is positive semi-definite, $p(\boldsymbol{A})$ is indefinite. This can be avoided by ensuring that the approximating polynomial $p$ is non-negative.
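As a quick illustration (a hypothetical sketch; the names f_demo, p_demo and the diagonal test matrix are our own choices): for a diagonal matrix, the eigenvalues of $p(\boldsymbol{A})$ are simply $p$ evaluated at the diagonal entries, and a low-degree Chebyshev approximation of a sharply peaked non-negative function typically dips below zero there.
# Illustrative sketch: for a diagonal A, the eigenvalues of p(A) are p(diag(A))
import numpy as np
f_demo = lambda x: np.exp(-x**2 / 0.1**2)   # non-negative, sharply peaked function
diag_A = np.linspace(-1, 1, 50)             # spectrum of a diagonal test matrix A
p_demo = np.polynomial.chebyshev.Chebyshev.interpolate(f_demo, 8)  # low-degree approximation
print(f_demo(diag_A).min())  # eigenvalues of f(A): all non-negative
print(p_demo(diag_A).min())  # eigenvalues of p(A): can be negative, i.e. p(A) indefinite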
Chebyshev interpolation
The goal of this section is to introduce Chebyshev interpolation of a general function $f:[-1, 1] \to \mathbb{R}$.
Chebyshev polynomials
The Chebyshev polynomials are defined as
\[T_l(x) = \cos(l \cdot \cos^{-1}(x)),~l=0, 1, \dots\]where $l$ is also the degree of the polynomial. They form a basis of the vector space of all polynomials. Therefore, any polynomial $p$ of degree $m$ can be expressed in terms of Chebyshev polynomials
\[a_0 + a_1 x + \cdots + a_m x^m = \mu_0 T_0(x) + \cdots + \mu_m T_m(x).\]
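As a one-off illustration (not needed later), NumPy can convert between the monomial and Chebyshev bases directly:
# Quick illustration: converting monomial coefficients to Chebyshev coefficients
import numpy as np
print(np.polynomial.chebyshev.poly2cheb([0, 0, 0, 1]))  # x^3 = 0.75*T_1(x) + 0.25*T_3(x)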
Polynomial interpolation
The coefficients of any degree $m$ polynomial are uniquely determined by the values of this polynomial at $m + 1$ distinct points $x_0, x_1, \dots, x_m \in [-1, 1]$. Hence, to compute a Chebyshev expansion of a function $f$, that is, a polynomial
\[f^{(m)}(x) = \mu_0 T_0(x) + \cdots + \mu_m T_m(x) = \sum_{l=0}^{m} \mu_l T_l(x)\]of degree $m$ which agrees with $f$ at the points $x_0, x_1, \dots, x_m$, we can simply evaluate the function $f$ at these points, also called nodes, and find the coefficients $\mu_0, \mu_1, \dots, \mu_m$ of the polynomial which takes the same values there. But how should these points be chosen?
Chebyshev nodes
Intuitively, the nodes should be spread more or less evenly over the interval $[-1, 1]$ to gather as much information about the function $f$ as possible. However, it turns out that, for stability reasons, equidistant nodes in $[-1, 1]$ are not the best choice. Instead, the so-called Chebyshev nodes
\[x_i = \cos\left(\frac{\pi i}{m}\right),~i=0, 1, \dots, m\]have some particularly desirable properties: By the definition of the Chebyshev polynomials $T_0, T_1, \dots, T_m$, we see that
\[f(x_i) = f^{(m)}(x_i) = \sum_{l=0}^{m} \mu_l T_l(x_i) = \sum_{l=0}^{m} \mu_l \cos\left(\frac{\pi i l}{m}\right)\]This exactly matches the definition of a discrete cosine transform (DCT), which can be efficiently computed with a fast Fourier transform algorithm. Hence, we can determine the function evaluations $f(x_0), f(x_1), \dots, f(x_m)$ by computing the DCT of the coefficients $\mu_0, \mu_1, \dots, \mu_m$, and – more remarkably – the coefficients $\mu_0, \mu_1, \dots, \mu_m$ of the Chebyshev interpolant from the function values $f(x_0), f(x_1), \dots, f(x_m)$ with the inverse DCT.
Numerically computing the Chebyshev interpolant
We will now have a look at how Chebyshev interpolants can be computed numerically in Python.
The scientific computing package NumPy provides convenient implementations of Chebyshev polynomials in the numpy.polynomial.chebyshev module. Its interpolate function takes a Python function handle and a degree as input, and returns the corresponding Chebyshev interpolant, from which we can read off the coefficients.
# Chebyshev expansion coefficients with NumPy
import numpy as np
def chebyshev_expansion_numpy(f, m):
    chebyshev_polynomial = np.polynomial.chebyshev.Chebyshev.interpolate(f, m)
    mu = chebyshev_polynomial.coef
    return mu
The coefficients of the Chebyshev expansion can also be computed “by hand”, using the inverse of the discrete cosine transform (DCT) as we have seen in the previous section. We use the implementation of the inverse DCT, idct, from the fft module of the advanced scientific computing package SciPy.
# Chebyshev expansion coefficients with DCT
import scipy as sp
def chebyshev_expansion_dct(f, m):
    chebyshev_nodes = np.cos(np.pi * np.arange(m + 1) / m)
    mu = sp.fft.idct(f(chebyshev_nodes), type=1)
    mu[1:-1] *= 2  # rescale coefficients due to type-1 DCT convention in SciPy
    return mu
Let us now test the two implementations on a simple example: the Chebyshev interpolants of degree $m = 10$ of a bell curve
\[f(x) = \exp(- x^2 / \sigma^2)\]for $\sigma = 0.1$.
# Compute and visualize Chebyshev interpolants
import matplotlib.pyplot as plt
def evaluate_interpolant(coefficients, evaluation_points):
    return np.polynomial.chebyshev.Chebyshev(coefficients)(evaluation_points)
# Chebyshev expansion of a bell-curve
sigma = 0.1
f = lambda x: np.exp(- (x / sigma) ** 2)
m = 10
mu_numpy = chebyshev_expansion_numpy(f, m)
mu_dct = chebyshev_expansion_dct(f, m)
# Visualize Chebyshev interpolants by evaluating them at 1000 points
x_lin = np.linspace(-1, 1, 1000)
plt.figure(figsize=(4, 3))
plt.plot(x_lin, f(x_lin), color="black", linestyle="--", label="exact function")
plt.plot(x_lin, evaluate_interpolant(mu_numpy, x_lin), color="#2F455C", label="NumPy interpolant")
plt.plot(x_lin, evaluate_interpolant(mu_dct, x_lin), color="#F98125", label="DCT interpolant")
plt.legend()
plt.show()
Wait… Should they not give the same result? No. While looking into the source code of the numpy.polynomial.chebyshev module, I found that, unlike the standard reference on Chebyshev interpolation, the NumPy implementation interpolates the function at the Chebyshev nodes of the first kind.
To quickly confirm this observation, we can also implement the DCT-based Chebyshev expansion for the Chebyshev nodes of the first kind.
# Compute and visualize Chebyshev interpolants with nodes of the first kind
def chebyshev_expansion_dct_firstkind(f, m):
    chebyshev_nodes = np.cos(np.pi / 2 * (2 * np.arange(m + 1) + 1) / (m + 1))
    mu = sp.fft.idct(f(chebyshev_nodes), type=3)
    mu[1:] *= 2  # rescale coefficients due to type-3 DCT convention in SciPy
    return mu
mu_dct_firstkind = chebyshev_expansion_dct_firstkind(f, m)
# Visualize Chebyshev interpolant
plt.figure(figsize=(4, 3))
plt.plot(x_lin, f(x_lin), color="black", linestyle="--", label="exact function")
plt.plot(x_lin, evaluate_interpolant(mu_numpy, x_lin), color="#2F455C", label="NumPy interpolant")
plt.plot(x_lin, evaluate_interpolant(mu_dct_firstkind, x_lin), color="#F98125", label="DCT interpolant (1st kind)")
plt.legend()
plt.show()
Great! Our implementation now gives exactly the same result as NumPy. But for simplicity, and for the theoretical analysis, we will switch back to the usual Chebyshev nodes. Let us now have a look at how our implementation compares to NumPy in terms of run-time. To do so, we interpolate the same bell curve with a Chebyshev interpolant of degree $m = 1000$.
m = 1000
%timeit chebyshev_expansion_numpy(f, m)
%timeit chebyshev_expansion_dct(f, m)
6.27 ms ± 1.41 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
37 μs ± 1.3 μs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
What? It turns out that our implementation with the DCT is more than 100 times faster than the NumPy implementation. This is due to the fact that the DCT of a length-$m$ array can be computed in $\mathcal{O}(m \log(m))$ floating-point operations with a simplified variant of the fast Fourier transform (FFT) algorithm. NumPy’s implementation, however, explicitly assembles the Vandermonde matrix, which makes the algorithm run in $\mathcal{O}(m^2)$ operations. Ouch! Eager to find out why no one has tried to improve this implementation, I raised the issue on the NumPy discussion mailing list. Turns out that NumPy’s polynomial module was made “for prototyping and educational purposes only”. Let’s keep our hands off it then…
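To illustrate the difference, here is a rough sketch of a Vandermonde-based computation (an illustration of the general approach only, not NumPy's actual source code). It uses the Chebyshev nodes of the first kind, as NumPy does, and already the matrix-vector product costs $\mathcal{O}(m^2)$ operations.
# Sketch of a Vandermonde-based computation of Chebyshev coefficients (O(m^2))
def chebyshev_expansion_vandermonde(f, m):
    # Chebyshev nodes of the first kind and discrete orthogonality of T_0, ..., T_m
    nodes = np.cos(np.pi / 2 * (2 * np.arange(m + 1) + 1) / (m + 1))
    V = np.polynomial.chebyshev.chebvander(nodes, m)  # (m+1) x (m+1) matrix of T_l(nodes)
    mu = 2.0 / (m + 1) * V.T @ f(nodes)               # O(m^2) matrix-vector product
    mu[0] /= 2
    return mu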
Approximation error
The convergence of the interpolation error of Chebyshev interpolants with the degree $m$ is captured by the following theorem.
Theorem [Trefethen, 2019]
Let $f^{(m)}$ be the Chebyshev expansion of degree $m \in \mathbb{N}$ of a function $f : [-1, 1] \to \mathbb{R}$ which extends analytically to the Bernstein ellipse $\mathcal{E}_{\chi}$ for some $\chi > 1$. Then
\[\max_{x \in [-1, 1]} |f(x) - f^{(m)}(x) | \leq \frac{2}{\chi^m(\chi - 1)} \sup_{z \in \mathcal{E}_{\chi}} |f(z)|.\]
The Bernstein ellipse $\mathcal{E}_{\chi}$ is the ellipse with foci $\pm 1$ whose semi-axes sum to $\chi$. Some simple computations show that any (complex) point $z = x + i y \in \mathcal{E}_{\chi}$ has imaginary part $|y| \leq (\chi - \chi^{-1}) / 2$, such that for the bell curve we have
\[|e^{-z^2 / \sigma^2}| = |e^{-(x + iy)^2 / \sigma^2}| = \underbrace{e^{-x^2 / \sigma^2}}_{\leq 1} e^{y^2 / \sigma^2} \leq e^{y^2 / \sigma^2} \leq e^{(\frac{\chi - \chi^{-1}}{2\sigma})^2}\]Hence, if we let $\chi = 1 + \varepsilon$ for some $\varepsilon > 0$, then $\chi - \chi^{-1} \leq 2 \varepsilon$ and consequently, we end up with the following specialization of the theorem.
Corollary
Let $f(x) = e^{-x^2 / \sigma^2}$ for some $\sigma > 0$ and $f^{(m)}$ its Chebyshev interpolant of degree $m$. Then for all $m \in \mathbb{N}$ and all $\varepsilon > 0$ it holds
\[\max_{x \in [-1, 1]} |f(x) - f^{(m)}(x) | \leq (1 + \varepsilon)^{-m} \frac{e^{\varepsilon^2 / \sigma^2}}{\varepsilon}.\]
So, the smaller $\sigma$, i.e. the pointier the bell curve, the less accurate the interpolation is going to be. This makes sense, since it is in general quite hard to approximate rapidly varying functions with polynomials.
Let us now try to see how our corollary performs numerically. In particular, since we are free to use any $\varepsilon > 0$ in the bound, let us observe how the choice of this parameter influences the quality of the bound.
# Plot of interpolation error convergence
def theoretical_bound(m, epsilon, sigma):
    return (1. + epsilon) ** (-m) * np.exp((epsilon / sigma)**2) / epsilon
m_list = 10 * np.arange(40) + 1
errors = np.empty(len(m_list))
bounds_1 = np.empty(len(m_list))
bounds_2 = np.empty(len(m_list))
bounds_3 = np.empty(len(m_list))
f_exact = f(x_lin)
for i, m in enumerate(m_list):
    mu = chebyshev_expansion_dct(f, m)
    f_interp = evaluate_interpolant(mu, x_lin)
    errors[i] = np.max(np.abs(f_interp - f_exact))
    bounds_1[i] = theoretical_bound(m, 0.1, sigma)
    bounds_2[i] = theoretical_bound(m, 0.2, sigma)
    bounds_3[i] = theoretical_bound(m, 0.5, sigma)
plt.figure(figsize=(4, 3))
plt.plot(m_list, bounds_1, color="#F98125", linestyle="-", label=r"bound for $\varepsilon = 0.1$")
plt.plot(m_list, bounds_2, color="#F98125", linestyle="--", label=r"bound for $\varepsilon = 0.2$")
plt.plot(m_list, bounds_3, color="#F98125", linestyle=":", label=r"bound for $\varepsilon = 0.5$")
plt.plot(m_list, errors, color="#2F455C", marker="o", label=r"interpolation error")
plt.ylim(1e-16, 1e2)
plt.yscale("log")
plt.xlabel(r"interpolation degree $m$")
plt.legend()
plt.show()
As we can see, the interpolation error is indeed bounded from above by the bounds from the corollary. It turns out that smaller $\varepsilon$ give tighter bounds for smaller degrees $m$, while larger $\varepsilon$ give tighter bounds for larger degrees $m$. Hence, depending on whether we are interested in a low-degree or high-degree approximation, we can choose an appropriate $\varepsilon$ for a good indication of the interpolation error.
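If we want a single number rather than a family of curves, we can also minimize the bound over $\varepsilon$ numerically for each degree. A small sketch (the helper name optimized_bound is our own; it works on the logarithm of the bound to avoid overflow) could look like this:
# Sketch: numerically minimize the bound from the corollary over epsilon
def optimized_bound(m, sigma):
    # work with the logarithm of the bound to avoid overflow for large epsilon
    log_bound = lambda eps: -m * np.log1p(eps) + (eps / sigma) ** 2 - np.log(eps)
    result = sp.optimize.minimize_scalar(log_bound, bounds=(1e-6, 10.0), method="bounded")
    return np.exp(result.fun), result.x  # tightest bound and the epsilon achieving it
print(optimized_bound(100, sigma))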
Non-negative Chebyshev interpolation
Now that we have discussed Chebyshev interpolation both theoretically and in practice, we will shift our focus to the main topic of this post: non-negative Chebyshev expansions. To this end, we will present four methods for non-negative Chebyshev interpolation, of which we find one particularly suitable for the task.
Method 1: Jackson damping
There is quite an old strategy for obtaining a non-negative Chebyshev expansion. It is called Jackson damping, and as the name suggests, we apply some damping to the coefficients of the Chebyshev interpolation [Braverman et al., 2021]. Given a Chebyshev interpolation of a non-negative function, we can enforce non-negativity of the Chebyshev expansion by multiplying its coefficients $\mu_0, \mu_1, \dots, \mu_m$ by some Jackson coefficients
\[\gamma_k = \sum_{j = -\frac{m}{2} - 1}^{\frac{m}{2} + 1 - k} \left( \frac{m}{2} - |j| + 1 \right) \left( \frac{m}{2} - |j + k| + 1 \right).\]to get the damped Chebyshev expansion
\[\widetilde{\mu}_l = \frac{\gamma_l}{\gamma_0} \mu_l,~l=0,1,\dots,m.\]Despite this complicated-looking definition of the Jackson coefficients, it turns out that computing all of them is essentially a one-liner in Python:
# Apply Jackson damping to coefficients
import functools
def damping_coefficients(m):
    assert(m % 4 == 0)
    gamma = functools.reduce(np.convolve, 4*[np.ones(m // 2 + 1)])[m:]
    return gamma / gamma[0]
Let us quickly test Jackson damping numerically, and compare the approximation error of the damped with the standard Chebyshev interpolation.
# Comparison of Jackson damping with Chebyshev interpolation
errors = []
errors_nonneg = []
m_list = 8 * np.arange(1, 25)
for m in m_list:
    mu = chebyshev_expansion_dct(f, m)
    f_interp = evaluate_interpolant(mu, x_lin)
    errors.append(np.max(np.abs(f_exact - f_interp)))
    mu_nonneg = damping_coefficients(m) * mu
    f_interp_nonneg = evaluate_interpolant(mu_nonneg, x_lin)
    errors_nonneg.append(np.max(np.abs(f_exact - f_interp_nonneg)))
plt.figure(figsize=(4, 3))
plt.plot(m_list, errors, color="#2F455C", marker="o", label=r"Chebyshev interpolation")
plt.plot(m_list, errors_nonneg, color="#F98125", marker="s", label=r"Jackson damping")
plt.yscale("log")
plt.ylabel(r"interpolation error")
plt.xlabel(r"interpolation degree $m$")
plt.ylim(1e-16, 1e1)
plt.legend()
plt.show()
This does not look promising; through the damping, we trade in the exponential convergence for reciprocal convergence.
Method 2: Fejér theorem
A second approach we consider on our journey to find a good non-negative interpolant comes from a characterization of non-negative polynomials given by the following theorem.
Theorem [Fejér, 1916]
Any non-negative polynomial $p : [-1, 1] \to \mathbb{R}^{+}$ of degree $2m$ can be represented as
\[p(\cos(y)) = \left| \sum_{l=-m}^{m} c_l e^{ily} \right|^2.\]
Conveniently, this non-negative polynomial can quite easily be converted to an expansion in terms of Chebyshev polynomials. To this end, we first rewrite
\[p(\cos(y)) = \left| \sum_{l=-m}^{m} c_l e^{ily} \right|^2 = \sum_{l=-m}^{m} \sum_{k=-m}^{m} c_l c_k^{\ast} e^{i(l-k)y}.\]Since $\cos(y) = \cos(2\pi-y)$ we get
\[\begin{split} p(\cos(y)) &= \frac{p(\cos(y)) + p(\cos(2\pi-y))}{2} \\ &= \sum_{l=-m}^{m} \sum_{k=-m}^{m} c_l c_k^{\ast} \frac{e^{i(l-k)y} + \overbrace{e^{i2\pi(l-k)}}^{=1}e^{-i(l-k)y}}{2} \\ &= \sum_{l=-m}^{m} \sum_{k=-m}^{m} c_l c_k^{\ast} \cos((l-k)y). \end{split}\]Splitting the sums yields
\[p(\cos(y)) = \sum_{l = -m}^{m} |c_l|^2 \cos(0y) + \sum_{-m \leq k < l \leq m} (c_l c_k^{\ast} + c_k c_l^{\ast}) \cos((l-k)y).\]Using the definition of the Chebyshev polynomials $T_l(\cos(y)) = \cos(l y)$ for $l \geq 0$, we obtain
\[\begin{split} p(\cos(y)) &= \sum_{l = -m}^{m} |c_l|^2 T_0(\cos(y)) + \sum_{-m \leq k < l \leq m} (c_l c_k^{\ast} + c_k c_l^{\ast}) T_{l-k}(\cos(y)) \\ &= \sum_{j=0}^{2m} \mu_j T_j(\cos(y)). \end{split}\]We can summarize this derivation in the following lemma.
Lemma
The Fejér representation $| \sum_{l=-m}^{m} c_l e^{ily} |^2$ and the coefficients of the corresponding Chebyshev expansion $\sum_{j=0}^{2m} \mu_j T_j(\cos(y))$ are related through
\[\mu_j = \begin{cases} \sum_{l = -m}^{m} |c_l|^2 & j = 0 \\ \sum_{l = -m}^{m - j} (c_l c_{l + j}^{\ast} + c_l^{\ast} c_{l + j}) & 1 \leq j \leq 2m \\ \end{cases}\]
This is quite convenient. So we just need to find good coefficients $c_l$ which linearly combine some exponential functions, and use the formula from the lemma to give us a non-negative Chebyshev expansion. A Python function which implements this formula is given below.
# Conversion from Fejér coefficients to Chebyshev coefficients
def convert_to_chebyshev_coefficients(c):
    c_cheb = np.zeros_like(c)
    c_cheb[0] = np.sum(np.abs(c)**2)
    for j in range(1, len(c_cheb)):
        c_cheb[j] = np.sum(c[:-j]*np.conj(c[j:]) + np.conj(c[:-j])*c[j:])
    return np.real(c_cheb)
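As a quick sanity check (a hand-picked example of our own, not taken from the derivation above): the coefficients $c_{-1} = c_1 = 1/2$, $c_0 = 1$ give $|\tfrac{1}{2}e^{-iy} + 1 + \tfrac{1}{2}e^{iy}|^2 = (1 + \cos(y))^2$, i.e. $p(x) = (1+x)^2 = \tfrac{3}{2}T_0(x) + 2T_1(x) + \tfrac{1}{2}T_2(x)$.
# Sanity check of the conversion formula on a hand-picked example
c_example = np.array([0.5, 1.0, 0.5], dtype=complex)   # (c_{-1}, c_0, c_1)
mu_example = convert_to_chebyshev_coefficients(c_example)
print(mu_example)                                       # expected: [1.5, 2.0, 0.5]
x_test = np.linspace(-1.0, 1.0, 5)
print(np.allclose(np.polynomial.chebyshev.chebval(x_test, mu_example), (1.0 + x_test) ** 2))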
Here it gets tricky. Obtaining coefficients $c_{-m}, \dots, c_m$ for a non-negative Chebyshev expansion comes down to solving the non-convex optimization problem
\[\min_{c_{-m}, \dots, c_{m} \in \mathbb{C}} \left\lVert f(\cos(\cdot)) - \left| \sum_{l=-m}^{m} c_l e^{il \cdot} \right|^2 \right\rVert_{\infty}.\]Given some coefficients $c_{-m}, \dots, c_m$ of such an expansion, this error can be approximated numerically by evaluating the difference at, say, $100$ points $y \in [0, 2\pi]$ and taking the maximum absolute deviation between the true function and its expansion.
# Evaluation of expansion and expansion error from coefficients
def expansion(x, c, complex=False):
    if complex:
        # With complex coefficients, last half of 'c' go to imaginary part
        c = c[:len(c) // 2] + 1j * c[len(c) // 2:]
    m = (len(c) - 1) // 2
    g = np.sum(c * np.exp(1j * np.outer(x, np.arange(-m, m + 1))), axis=1)
    return np.abs(g) ** 2
def expansion_error(c, fun, a=0, b=2*np.pi, n_test=100, complex=False):
    diff = lambda x: np.abs(fun(np.cos(x)) - expansion(x, c, complex=complex))
    return np.max(diff(np.linspace(a, b, n_test)))
Trying to find a minimum of this expansion error through optimization may work for small degrees $m$, but it is completely hopeless for larger degrees due to the much larger parameter space. We can see this best when using a general-purpose local optimizer (e.g. BFGS) to compute optimal coefficients for increasing degrees $m$.
# Direct solution of optimization problem
def chebyshev_coefficients_fejer(fun, m, c0=None, complex=False):
    if c0 is None:
        c0 = np.random.randn(2*m if complex else m)
    target = lambda c: expansion_error(c, fun, complex=complex)
    c_opt = sp.optimize.minimize(target, c0, method="BFGS").x
    if complex:
        # With complex coefficients, last half of 'c' go to imaginary part
        c_opt = c_opt[:len(c_opt) // 2] + 1j * c_opt[len(c_opt) // 2:]
    c_cheb = convert_to_chebyshev_coefficients(c_opt)
    return c_cheb
m_list = 2 * np.arange(20) + 3
num_retries = 3
errors = np.empty(len(m_list))
errors_fejer = np.empty((len(m_list), num_retries))
for i, m in enumerate(m_list):
    mu = chebyshev_expansion_dct(f, m)
    f_interp = evaluate_interpolant(mu, x_lin)
    errors[i] = np.max(np.abs(f_exact - f_interp))
    for j in range(num_retries):
        mu_fejer = chebyshev_coefficients_fejer(f, m=m, complex=True)
        f_interp_fejer = evaluate_interpolant(mu_fejer, x_lin)
        errors_fejer[i, j] = np.max(np.abs(f_exact - f_interp_fejer))
plt.figure(figsize=(4, 3))
plt.plot(m_list, errors, color="#2F455C", marker="o", label="Chebyshev interpolation")
plt.plot(m_list, np.mean(errors_fejer, axis=1), color="#F98125", marker="s", label="Fejér expansion")
plt.fill_between(m_list, np.min(errors_fejer, axis=1), np.max(errors_fejer, axis=1), color="#F98125", alpha=0.2)
plt.yscale("log")
plt.legend()
plt.xlabel(r"expansion degree $m$")
plt.ylabel(r"expansion error")
plt.show()
Surprisingly, minimizing the approximation error in the Fejér representation works even better than Chebyshev interpolation for small degrees. However, once we increase the expansion degree, the results are quite bad.
Method 3: Squaring the square root
There is one convenient trick which leads to a non-negative interpolant: first, compute the degree-$m/2$ Chebyshev interpolant of the square root of the function,
\[\sqrt{f}^{(m/2)}(x) = \sum_{l=0}^{m/2} \mu_l T_l(x).\]Then, square this Chebyshev interpolant, which can again be represented in terms of Chebyshev polynomials
\[f_{\mathrm{nn}}^{(m)}(x) = (\sqrt{f}^{(m/2)})^2(x) = \sum_{l=0}^{m} \nu_l T_l(x).\]Because this function is the square of another function, it is non-negative by definition. And what's best: due to the DCT duality between the evaluations of a Chebyshev expansion and its coefficients, we can (exactly) square a Chebyshev expansion in just $\mathcal{O}(m \log(m))$ operations:
\[[\nu_0, \dots, \nu_m] = \operatorname{DCT}^{-1}\{\operatorname{DCT}\{[\mu_0, \dots, \mu_{m/2}, 0, \dots, 0]\}^2\}\]where the square is understood elementwise. This then leads us to the following algorithm for non-negative Chebyshev expansion.
Algorithm (non-negative Chebyshev expansion)
Input: Function $f : [-1, 1] \to \mathbb{R}$, degree $m \in 2\mathbb{N}$
Output: Coefficients $\nu$ such that $\sum_{l=0}^{m} \nu_l T_l(x)$ interpolates $f$
1: Compute $[\mu_0, \dots, \mu_{m/2}]$ for the Chebyshev interpolation $\sum_{l=0}^{m/2} \mu_l T_l(x)$ of $\sqrt{f}$
2: Zero-pad the coefficients $\boldsymbol{\mu} = [\mu_0, \dots, \mu_{m/2}, 0, \dots, 0]$
3: Evaluate the Chebyshev expansion with $\boldsymbol{f} = \operatorname{DCT}\{\boldsymbol{\mu}\}$
4: Compute the coefficients $\boldsymbol{\nu} = [\nu_0, \dots, \nu_m]$ with $\boldsymbol{\nu} = \operatorname{DCT}^{-1}\{\boldsymbol{f}^2\}$
The Python implementation is quite simple.
# Implementation of non-negative Chebyshev interpolation
def square_chebyshev_expansion(mu):
    mu = np.append(mu, np.zeros(len(mu) - 1))  # Zero-padding
    mu[1:-1] /= 2  # Rescale coefficients due to type-1 DCT convention in SciPy
    nu = sp.fft.idct(sp.fft.dct(mu, type=1) ** 2, type=1)
    nu[1:-1] *= 2  # Rescale coefficients due to type-1 DCT convention in SciPy
    return nu
def chebyshev_expansion_nonneg(f, m):
    f_sqrt = lambda x: np.sqrt(f(x))
    mu_sqrt = chebyshev_expansion_dct(f_sqrt, m // 2)
    mu = square_chebyshev_expansion(mu_sqrt)
    return mu
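As a quick check (a small sketch on top of the functions above), the DCT-based squaring should agree with pointwise squaring of the evaluated half-degree interpolant up to round-off:
# Check: DCT-based squaring matches pointwise squaring of the half-degree interpolant
mu_sqrt = chebyshev_expansion_dct(lambda x: np.sqrt(f(x)), 10)
nu = square_chebyshev_expansion(mu_sqrt)
deviation = np.max(np.abs(evaluate_interpolant(nu, x_lin) - evaluate_interpolant(mu_sqrt, x_lin) ** 2))
print(deviation)  # should be at round-off level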
What’s more, this non-negative expansion satisfies a similar bound as the standard Chebyshev expansion. For the example of the bell curve, we can derive the following bound.
Proposition
Let $f(x) = e^{-x^2 / \sigma^2}$ for some $\sigma > 0$ and $f^{(m)}$ its Chebyshev interpolant of degree $m$. For all $m \in 2\mathbb{N}$ and all $\varepsilon > 0$ it holds
\[\max_{x \in [-1, 1]} |f(x) - f_{\mathrm{nn}}^{(m)}(x) | \leq (1 + \varepsilon)^{-m/2} \frac{e^{\varepsilon^2/2\sigma^2}}{\varepsilon} (2 + (1 + \varepsilon)^{-m/2} \frac{e^{\varepsilon^2/2\sigma^2}}{\varepsilon}).\]
This can be proved by first applying the theorem to the expansion of the square root
\[| \sqrt{f}(x) - \sqrt{f}^{(m/2)}(x) | \leq \frac{2}{\chi^{m/2}(\chi - 1)} \sqrt{\max_{z \in \mathcal{E}_{\chi}} | f(z) |},\]and then realizing that $f = (\sqrt{f})^2$ and $f_{\mathrm{nn}}^{(m)} = (\sqrt{f}^{(m/2)})^2$, which implies
\[\begin{split} | f - f_{\mathrm{nn}}^{(m)} | &= | (\sqrt{f})^2 - (\sqrt{f}^{(m/2)})^2 | \\ &= | \sqrt{f} - \sqrt{f}^{(m/2)} | \, | \sqrt{f} + \sqrt{f}^{(m/2)} | \\ &\leq | \sqrt{f} - \sqrt{f}^{(m/2)} | \, \big(2 \max \sqrt{f} + | \sqrt{f} - \sqrt{f}^{(m/2)} |\big). \end{split}\]Let us quickly see how our method compares numerically against the standard Chebyshev interpolation.
errors = []
errors_nonneg = []
m_list = 8 * np.arange(1, 25)
for m in m_list:
    mu = chebyshev_expansion_dct(f, m)
    f_interp = evaluate_interpolant(mu, x_lin)
    errors.append(np.max(np.abs(f_exact - f_interp)))
    mu_nonneg = chebyshev_expansion_nonneg(f, m)
    f_interp_nonneg = evaluate_interpolant(mu_nonneg, x_lin)
    errors_nonneg.append(np.max(np.abs(f_exact - f_interp_nonneg)))
plt.figure(figsize=(4, 3))
plt.plot(m_list, errors, color="#2F455C", marker="o", label=r"Chebyshev interpolation")
plt.plot(m_list, errors_nonneg, color="#F98125", marker="s", label=r"non-negative expansion")
plt.yscale("log")
plt.ylabel(r"expansion error")
plt.xlabel(r"expansion degree $m$")
plt.ylim(1e-16, 1e1)
plt.legend()
plt.show()
Nice! We are not far off from the standard Chebyshev interpolation in terms of the approximation error. And in terms of computational time, we are less than twice as slow, because we need to compute the DCT a few more times.
m = 1000
%timeit chebyshev_expansion_dct(f, m)
%timeit chebyshev_expansion_nonneg(f, m)
36.5 μs ± 3.44 μs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
54.2 μs ± 2 μs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
Method 4: Fejér representation on steroids
Before we definitively discard the Fejér representation approach, let us briefly explore an interesting connection between this formulation and the Chebyshev representation. We observe that for the representation $p(\cos(y)) = | \sum_{l=-m}^{m} c_l e^{ily} |^2$, when $c_l \in \mathbb{R}$ and $c_l = c_{-l}$, we have
\[\sum_{l=-m}^{m} c_l e^{ily} = c_0 + \sum_{l=1}^{m} (c_l e^{ily} + c_{-l} e^{-ily}) = c_0 + 2 \sum_{l=1}^{m} c_l \cos(ly)\]So with the coefficients
\[\mu_l = \begin{cases} c_0 & \text{if $l=0$} \\ 2c_l & \text{if $l\geq 1$} \end{cases}\]we have
\[p(x) = \Big( \sum_{l=0}^{m} \mu_l T_l(x) \Big)^2,\]i.e. the Chebyshev expansion of $\sqrt{p(x)}$ gives us coefficients which can be used as an initial guess in the optimization problem. Now this is quite cool, because we can build a two-phase algorithm to obtain a non-negative Chebyshev expansion: 1) compute the Chebyshev expansion of $\sqrt{f}$ and convert its coefficients into Fejér coefficients using the relation above. 2) Use these coefficients as an initial guess in the non-convex optimization algorithm with the same objective as before. Let’s see if that actually works in practice.
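The index bookkeeping needed to turn the Chebyshev coefficients of $\sqrt{f}$ into a symmetric Fejér coefficient vector is done inline in the experiment below; written out as a hypothetical helper (our own name, not used later), it amounts to the following sketch:
# Hypothetical helper: Chebyshev coefficients of sqrt(f) -> symmetric Fejér initial guess
def chebyshev_to_fejer_initial_guess(mu_sqrt):
    k = len(mu_sqrt) - 1            # degree of the sqrt expansion
    c = np.zeros(2 * k + 1)         # coefficients (c_{-k}, ..., c_k)
    c[k] = mu_sqrt[0]               # c_0 = mu_0
    c[k + 1:] = mu_sqrt[1:] / 2     # c_l = mu_l / 2 for l >= 1
    c[:k] = mu_sqrt[:0:-1] / 2      # c_{-l} = c_l (mirror)
    return c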
errors_nonneg = []
errors_fejer = []
m_list = 2 * np.arange(40) + 3
for i, m in enumerate(m_list):
    mu_nonneg = chebyshev_expansion_nonneg(f, m - 1)
    f_interp_nonneg = evaluate_interpolant(mu_nonneg, x_lin)
    errors_nonneg.append(np.max(np.abs(f_exact - f_interp_nonneg)))
    # Generate good initial guess for optimization in Fejér representation
    f_sqrt = lambda x: np.sqrt(f(x))
    c_sqrt = chebyshev_expansion_dct(f_sqrt, (m - 1) // 2)
    c0 = np.zeros(m)
    c0[(m-1) // 2 + 1:] = c_sqrt[1:] / 2
    c0[:(m-1) // 2] = c_sqrt[::-1][:-1] / 2
    c0[(m-1) // 2] = c_sqrt[0]
    mu_fejer = chebyshev_coefficients_fejer(f, m=m, c0=c0, complex=False)
    f_interp_fejer = evaluate_interpolant(mu_fejer, x_lin)
    errors_fejer.append(np.max(np.abs(f_exact - f_interp_fejer)))
plt.figure(figsize=(4, 3))
plt.plot(m_list, errors_nonneg, color="#2F455C", marker="o", label="non-negative expansion")
plt.plot(m_list, errors_fejer, color="#F98125", marker="s", label="Fejér on steroids")
plt.yscale("log")
plt.legend()
plt.xlabel(r"expansion degree $m$")
plt.ylabel(r"expansion error")
plt.show()
Interesting. By using good initial guesses, we can beat the non-negative Chebyshev expansion. However, the improvement is only minor, and not really worth considering in view of the much higher run-time of the optimization.
Conclusion
With the simple trick of interpolating the square root of a function and then squaring the resulting interpolant, the non-negative Chebyshev expansion is a fast and accurate way of computing non-negative polynomial approximations of functions.