Non-negative polynomial approximation

Chebyshev interpolation is a well-studied technique for approximating functions or data with a polynomial. Efficient techniques for computing these approximations, as well as corresponding error bounds, are abundantly available in the literature. Usually, no restrictions are imposed on the approximating polynomial. However, there are cases in which we need this polynomial to be nonnegative – whether because a negative approximation would be unphysical, because we need to take its square root, or for some other reason. We present a small collection of approaches for computing such nonnegative approximations.

Based on a recent collaboration with Daniel Kressner, Haoze He, and Hei Yin Lam.

Motivating example from linear algebra

Suppose we want to approximate a matrix function $f(\boldsymbol{A})$ by a matrix polynomial. This can be done by first computing a degree $m \in \mathbb{N}$ polynomial

\[f^{(m)}(x) = a_0 + a_1 x + a_2 x^2 + \dots + a_m x^m \approx f(x)\]

which approximates $f$ well (e.g. a Chebyshev interpolant), and by subsequently using the matrix $\boldsymbol{A}$ as its argument

\[f^{(m)}(\boldsymbol{A}) = a_0 + a_1 \boldsymbol{A} + a_2 \boldsymbol{A}^2 + \dots + a_m \boldsymbol{A}^m \approx f(\boldsymbol{A}).\]
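
As a minimal sketch (the coefficients and the symmetric test matrix below are placeholders, not taken from any particular approximation), the matrix polynomial can be evaluated with a Horner-type scheme that only requires matrix-matrix products with $\boldsymbol{A}$:

# Minimal sketch: evaluate a polynomial at a matrix argument with a Horner-type scheme
import numpy as np

def evaluate_matrix_polynomial(a, A):
    # Computes a[0] * I + a[1] * A + ... + a[m] * A^m
    F = a[-1] * np.eye(A.shape[0])
    for coefficient in reversed(a[:-1]):
        F = F @ A + coefficient * np.eye(A.shape[0])
    return F

# Placeholder coefficients and a random symmetric test matrix
a = np.array([1.0, 0.5, 0.25])
A = np.random.default_rng(0).standard_normal((4, 4))
A = (A + A.T) / 2
F = evaluate_matrix_polynomial(a, A)  # approximates f(A) when a contains good coefficients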

Positive semidefiniteness

A desirable property for a symmetric matrix is positive semidefiniteness, i.e. all its eigenvalues $\lambda$ being nonnegative. Since the eigenvalues of a matrix function are $f(\lambda)$, applying a nonnegative function $f \geq 0$ to a symmetric matrix $\boldsymbol{A}$ always yields a positive semidefinite matrix $f(\boldsymbol{A}) \succeq 0$. However, a polynomial approximation $f^{(m)}$ of a nonnegative function $f$ is usually not guaranteed to preserve this non-negativity: it can happen that $f(\boldsymbol{A})$ is positive semidefinite while $f^{(m)}(\boldsymbol{A})$ is indefinite. This can be avoided by ensuring that the approximating polynomial $f^{(m)}$ itself stays nonnegative.
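
To see this happen, here is a small self-contained sketch; the bell curve and the degree are arbitrary choices for illustration, and we use NumPy's Chebyshev interpolation routine, which is discussed further below. Since the eigenvalues of $f^{(m)}(\boldsymbol{A})$ are $f^{(m)}(\lambda)$ for symmetric $\boldsymbol{A}$, it suffices to check whether the interpolant itself dips below zero.

# Sketch: a plain Chebyshev interpolant of a nonnegative function can become negative
import numpy as np

f = lambda x: np.exp(-(x / 0.1) ** 2)  # nonnegative bell curve (illustrative choice)
p = np.polynomial.chebyshev.Chebyshev.interpolate(f, 10)  # degree-10 interpolant

x = np.linspace(-1, 1, 1001)
print(p(x).min())  # typically negative, so f^(10)(A) can be indefinite even though f(A) is not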

Chebyshev interpolation as the backbone

The goal of this section is to introduce Chebyshev interpolation of a general function $f:[-1, 1] \to \mathbb{R}$.

Chebyshev polynomials

The Chebyshev polynomials are defined as

\[T_l(x) = \cos(l \cdot \cos^{-1}(x)),~l=0, 1, \dots\]

where $l$ is also the degree of the polynomial. They form a basis of the vector space of all polynomials. Therefore, any polynomial $p$ of degree $m$ can be expressed in terms of Chebyshev polynomials

\[a_0 + a_1 x + \cdots + a_m x^m = \mu_0 T_0(x) + \cdots + \mu_m T_m(x).\]
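
For instance, NumPy can perform this change of basis for us; the monomial coefficients below are arbitrary and only serve as an illustration.

# Change of basis: monomial coefficients to Chebyshev coefficients (illustrative values)
import numpy as np

a = np.array([1.0, -2.0, 0.5, 3.0])        # a_0 + a_1 x + a_2 x^2 + a_3 x^3
mu = np.polynomial.chebyshev.poly2cheb(a)  # mu_0 T_0 + mu_1 T_1 + mu_2 T_2 + mu_3 T_3
x = np.linspace(-1, 1, 5)
assert np.allclose(np.polynomial.polynomial.polyval(x, a),
                   np.polynomial.chebyshev.chebval(x, mu))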

Polynomial interpolation

The coefficients of any degree $m$ polynomial are uniquely determined by the values of this polynomial at $m + 1$ distinct points $x_0, x_1, \dots, x_m \in [-1, 1]$. Hence, to compute a Chebyshev interpolation of a function $f$, that is, a polynomial

\[f^{(m)}(x) = \mu_0 T_0(x) + \cdots + \mu_m T_m(x) = \sum_{l=0}^{m} \mu_l T_l(x)\]

of degree $m$ which agrees with $f$ in the points $x_0, x_1, \dots, x_m$, we can just evaluate the function $f$ in these points, also called nodes, and find the coefficients $\mu_0, \mu_1, \dots, \mu_m$ of the polynomial which takes identical values at these points. But how should these points be chosen?
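
Before answering that question, here is a naive sketch of this procedure: we assemble the $(m + 1) \times (m + 1)$ matrix with entries $T_l(x_i)$ and solve a linear system for the coefficients. The equidistant nodes below are placeholders; a better choice follows in the next section.

# Naive sketch: recover Chebyshev coefficients by solving a linear system (placeholder nodes)
import numpy as np

def interpolate_naive(f, nodes):
    m = len(nodes) - 1
    # Matrix with entries T_l(x_i), built from the cosine definition of the Chebyshev polynomials
    T = np.cos(np.outer(np.arccos(nodes), np.arange(m + 1)))
    return np.linalg.solve(T, f(nodes))

f = lambda x: np.exp(-(x / 0.1) ** 2)
nodes = np.linspace(-1, 1, 11)  # equidistant placeholder nodes
mu = interpolate_naive(f, nodes)
assert np.allclose(np.polynomial.chebyshev.chebval(nodes, mu), f(nodes))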

Chebyshev nodes

Intuitively, one would probably spread the nodes more or less evenly over the interval $[-1, 1]$ to gather as much information about the function $f$ as possible. However, it turns out that, for stability reasons, equidistant nodes in $[-1, 1]$ are not the best choice. Instead, the so-called Chebyshev nodes

\[x_i = \cos\left(\frac{\pi i}{m}\right),~i=0, 1, \dots, m\]

have some particularly desirable properties: By the definition of the Chebyshev polynomials $T_0, T_1, \dots, T_m$, we see that

\[f(x_i) = f^{(m)}(x_i) = \sum_{l=0}^{m} \mu_l T_l(x_i) = \sum_{l=0}^{m} \mu_l \cos\left(\frac{\pi i l}{m}\right)\]

This exactly matches the definition of a discrete cosine transform (DCT) [Plonka et al., 2018], which can be efficiently computed with a fast Fourier transform algorithm. Hence, we can determine the function evaluations $f(x_0), f(x_1), \dots, f(x_m)$ by computing the DCT of the coefficients $\mu_0, \mu_1, \dots, \mu_m$, and – more remarkably – the coefficients $\mu_0, \mu_1, \dots, \mu_m$ of the Chebyshev interpolant from the function values $f(x_0), f(x_1), \dots, f(x_m)$ with the inverse DCT.

Numerically computing the Chebyshev interpolant

We will now have a look at how Chebyshev interpolants can be computed numerically in Python.

# Package imports
import numpy as np
import scipy as sp
import matplotlib.pyplot as plt

The scientific computing package NumPy provides convenient implementations of Chebyshev polynomials in the numpy.polynomial.chebyshev module. The interpolate function takes as input a Python function handle and a degree, and returns the coefficients of the corresponding Chebyshev interpolant.

# Chebyshev interpolation coefficients with NumPy 
def chebyshev_interpolation_numpy(f, m):
    chebyshev_polynomial = np.polynomial.chebyshev.Chebyshev.interpolate(f, m)
    mu = chebyshev_polynomial.coef
    return mu

The coefficients of the Chebyshev interpolation can also be computed “by hand”, using the inverse of the discrete cosine transform (DCT), as we have seen in the previous section. We use the implementation of the inverse DCT, idct, from the fft module of the scientific computing package SciPy.

# Chebyshev interpolation coefficients with DCT
def chebyshev_interpolation_dct(f, m):
    chebyshev_nodes = np.cos(np.pi * np.arange(m + 1) / m)
    mu = sp.fft.idct(f(chebyshev_nodes), type=1)
    mu[1:-1] *= 2  # rescale coefficients due to type-1 DCT convention in SciPy
    return mu
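
To demystify the factor-of-two rescaling, here is a quick sanity check (with arbitrary test coefficients) of SciPy's type-1 DCT convention: halving the interior entries of the coefficient vector before applying the DCT reproduces the plain sum $\sum_{l=0}^{m} \mu_l \cos(\pi i l / m)$; the rescaling in the function above simply undoes this in the inverse direction.

# Sanity check of the type-1 DCT convention (arbitrary test coefficients)
m = 8
mu = np.random.default_rng(0).standard_normal(m + 1)
# Direct evaluation of sum_l mu_l * cos(pi * i * l / m) for i = 0, ..., m
direct = np.array([np.sum(mu * np.cos(np.pi * i * np.arange(m + 1) / m)) for i in range(m + 1)])
# SciPy's DCT-I computes x_0 + (-1)^i x_m + 2 * sum_{l=1}^{m-1} x_l * cos(pi * i * l / m)
mu_halved = mu.copy()
mu_halved[1:-1] /= 2
assert np.allclose(direct, sp.fft.dct(mu_halved, type=1))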

Let us now test the two implementations on a simple example: the Chebyshev interpolants of degree $m = 10$ of a bell curve

\[f(x) = \exp(- x^2 / \sigma^2)\]

for $\sigma = 0.1$.

# Compute and visualize Chebyshev interpolants
def evaluate_interpolant(coefficients, evaluation_points):
    return np.polynomial.chebyshev.Chebyshev(coefficients)(evaluation_points)

# Chebyshev interpolation of a bell-curve
sigma = 0.1
f = lambda x: np.exp(- (x / sigma) ** 2)
m = 10
mu_numpy = chebyshev_interpolation_numpy(f, m)
mu_dct = chebyshev_interpolation_dct(f, m)

# Visualize Chebyshev interpolants by evaluating them at 1000 points
x_lin = np.linspace(-1, 1, 1000)
plt.plot(x_lin, evaluate_interpolant(mu_numpy, x_lin), label="NumPy interpolant")
plt.plot(x_lin, evaluate_interpolant(mu_dct, x_lin), label="DCT interpolant")
plt.plot(x_lin, f(x_lin), linestyle="--", label="exact function")
plt.legend()
plt.show()

(Figure: degree-10 NumPy and DCT interpolants of the bell curve, together with the exact function.)

Wait… Should they not give the same result? No. While looking into the source code of the numpy.polynomial.chebyshev module, I found that, unlike the standard references on Chebyshev interpolation, the NumPy implementation interpolates the function at the Chebyshev nodes of the first kind

\[x_i = \cos\left(\frac{\pi (2 i + 1)}{2 (m + 1)}\right),~i=0, 1, \dots, m.\]

To quickly confirm this observation, we can also implement the DCT-based Chebyshev interpolation for the Chebyshev nodes of the first kind.

# Compute and visualize Chebyshev interpolants with nodes of the first kind
def chebyshev_interpolation_dct_firstkind(f, m):
    chebyshev_nodes = np.cos(np.pi / 2 * (2 * np.arange(m + 1) + 1) / (m + 1))
    mu = sp.fft.idct(f(chebyshev_nodes), type=3)
    mu[1:] *= 2  # rescale coefficients due to type-3 DCT convention in SciPy
    return mu

mu_dct_firstkind = chebyshev_interpolation_dct_firstkind(f, m)

# Visualize Chebyshev interpolant
plt.plot(x_lin, evaluate_interpolant(mu_numpy, x_lin), label="NumPy interpolant")
plt.plot(x_lin, evaluate_interpolant(mu_dct_firstkind, x_lin), label="DCT interpolant (1st kind)")
plt.plot(x_lin, f(x_lin), linestyle="--", label="exact function")
plt.legend()
plt.show()

(Figure: NumPy interpolant, first-kind DCT interpolant, and the exact bell curve.)

Great! Our implementation now gives exactly the same result as NumPy. For simplicity and for the theoretical analysis, however, we will switch back to the usual Chebyshev nodes. Let us now have a look at how our implementation compares to NumPy in terms of run-time. To do so, we interpolate the same bell curve with a Chebyshev interpolant of degree $m = 1000$.

m = 1000
%timeit chebyshev_interpolation_numpy(f, m)
%timeit chebyshev_interpolation_dct(f, m)
5.64 ms ± 166 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
23.8 μs ± 205 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

What? It turns out that our implementation with the DCT is more than 100 times faster than the NumPy implementation. This is due to the fact that the DCT of a length $m$ array can be computed in $\mathcal{O}(m \log(m))$ floating-point operations with a simplified variant of the fast Fourier transform (FFT) algorithm. NumPy’s implementation, however, explicitly assembles the Vandermonde matrix, which makes the algorithm run in $\mathcal{O}(m^2)$ operations. Ouch! Eager to find out why no one has tried to improve this implementation, I raised the issue on the NumPy discussion mailing list. Turns out that NumPy’s polynomial module was made “for prototyping and educational purposes only”. Let’s keep our hands off it then…

Approximation error

The convergence of the interpolation error of the Chebyshev interpolant with the degree $m$ is captured by the following theorem.

Theorem [Trefethen, 2019]

Let $f : [-1, 1] \to \mathbb{R}$ be a function that extends analytically to the Bernstein ellipse $\mathcal{E}_{\chi}$ for some $\chi > 1$, and let $f^{(m)}$ be its Chebyshev interpolant of degree $m \in \mathbb{N}$. Then

\[\max_{x \in [-1, 1]} |f(x) - f^{(m)}(x) | \leq \frac{4}{\chi^m(\chi - 1)} \sup_{z \in \mathcal{E}_{\chi}} |f(z)|.\]

The Bernstein ellipse $\mathcal{E}_{\chi}$ is the ellipse with foci $\pm 1$ whose semi-axes sum to $\chi$; it can be parametrized as $z = (\chi e^{i\theta} + \chi^{-1} e^{-i\theta})/2$ with $\theta \in [0, 2\pi)$. From this parametrization, any (complex) point $z = x + i y \in \mathcal{E}_{\chi}$ has imaginary part $|y| \leq (\chi - \chi^{-1}) / 2$, such that for the bell curve we have

\[|e^{-z^2 / \sigma^2}| = |e^{-(x + iy)^2 / \sigma^2}| = \underbrace{e^{-x^2 / \sigma^2}}_{\leq 1} e^{y^2 / \sigma^2} \leq e^{y^2 / \sigma^2} \leq e^{(\frac{\chi - \chi^{-1}}{2\sigma})^2}\]

Hence, if we let $\chi = 1 + \varepsilon$ for some $\varepsilon > 0$, then $\chi - \chi^{-1} \leq 2 \varepsilon$ and consequently, we end up with the following specialization of the theorem.

Corollary

Let $f(x) = e^{-x^2 / \sigma^2}$ for some $\sigma > 0$ and $f^{(m)}$ its Chebyshev interpolant of degree $m$. Then for all $m \in \mathbb{N}$ and all $\varepsilon > 0$ it holds

\[\max_{x \in [-1, 1]} |f(x) - f^{(m)}(x) | \leq 4 (1 + \varepsilon)^{-m} \frac{e^{\varepsilon^2 / \sigma^2}}{\varepsilon}.\]

So, the smaller $\sigma$, i.e. the pointier the bell curve, the less accurate the interpolation is going to be. This makes sense, since it is in general quite hard to approximate rapidly varying functions with polynomials.

Let us now try to see how our corollary performs numerically. In particular, since we are free to use any $\varepsilon > 0$ in the bound, let us observe how the choice of this parameter influences the quality of the bound.

# Plot of interpolation error convergence
def theoretical_bound(m, epsilon, sigma):
    return 4 * (1. + epsilon) ** (-m) * np.exp((epsilon / sigma) ** 2) / epsilon

m_list = 10 * np.arange(1, 40)
errors, bounds_1, bounds_2, bounds_3 = [], [], [], []
f_exact = f(x_lin)

for i, m in enumerate(m_list):
    mu = chebyshev_interpolation_dct(f, m)
    f_interp = evaluate_interpolant(mu, x_lin)

    errors.append(np.max(np.abs(f_interp - f_exact)))
    bounds_1.append(theoretical_bound(m, 0.1, sigma))
    bounds_2.append(theoretical_bound(m, 0.2, sigma))
    bounds_3.append(theoretical_bound(m, 0.5, sigma))

plt.semilogy(m_list, errors, marker="o", label=r"interpolation error")
plt.semilogy(m_list, bounds_1, label=r"bound for $\varepsilon = 0.1$")
plt.semilogy(m_list, bounds_2, color="#F98125", linestyle="--", label=r"bound for $\varepsilon = 0.2$")
plt.semilogy(m_list, bounds_3, color="#F98125", linestyle=":", label=r"bound for $\varepsilon = 0.5$")
plt.ylim(1e-16, 1e2)
plt.xlabel(r"degree $m$")
plt.ylabel(r"error")
plt.legend()
plt.show()

(Figure: interpolation error and the corollary's bounds for $\varepsilon = 0.1, 0.2, 0.5$ as functions of the degree $m$.)

As we can see, the interpolation error is indeed bounded from above by the bounds from the corollary. It turns out that smaller $\varepsilon$ give tighter bounds for smaller degrees $m$, while larger $\varepsilon$ give tighter bounds for larger degrees $m$. Hence, depending on whether we are interested in a low-degree or high-degree approximation, we can choose an appropriate $\varepsilon$ for a good indication of the interpolation error.

Nonnegative Chebyshev approximation

Now that we have discussed Chebyshev interpolation both theoretically and in practice, we will shift our focus to the main topic of this post: nonnegative Chebyshev approximation. To this end, we will present four methods for nonnegative Chebyshev approximation, of which we find one particularly suitable for this task.

Method 1: Jackson damping

There is quite an old strategy for obtaining a nonnegative Chebyshev approximation. It is called Jackson damping, and as the name suggests, we apply some damping to the coefficients of the Chebyshev interpolant [Braverman et al., 2021]. Given a Chebyshev interpolant of a nonnegative function, we can enforce non-negativity of the Chebyshev interpolation by multiplying its coefficients $\mu_0, \mu_1, \dots, \mu_m$ by some Jackson coefficients

\[\gamma_k = \sum_{j = -\frac{m}{2} - 1}^{\frac{m}{2} + 1 - k} \left( \frac{m}{2} - |j| + 1 \right) \left( \frac{m}{2} - |j + k| + 1 \right).\]

to get the damped Chebyshev interpolation coefficients

\[\widetilde{\mu}_l = \frac{\gamma_l}{\gamma_0} \mu_l,~l=0,1,\dots,m.\]

Despite this complicated looking definition of the Jackson coefficients, it turns out that computing all of them is essentially a one-liner in Python:

# Apply Jackson damping to coefficients
import functools

def damping_coefficients(m):
    # Jackson coefficients gamma_0, ..., gamma_m: convolving four boxcar vectors
    # reproduces the sum formula above; the result is normalized by gamma_0
    assert(m % 4 == 0)
    gamma = functools.reduce(np.convolve, 4*[np.ones(m // 2 + 1)])[m:]
    return gamma / gamma[0]

Let us quickly test Jackson damping numerically, and compare the approximation error of the damped with the standard Chebyshev interpolation.

# Comparison of Jackson damping with Chebyshev interpolation
errors, errors_nonneg = [], []
m_list = 8 * np.arange(1, 25)

for m in m_list:
    mu = chebyshev_interpolation_dct(f, m)
    f_interp = evaluate_interpolant(mu, x_lin)
    errors.append(np.max(np.abs(f_exact - f_interp)))

    mu_nonneg = damping_coefficients(m) * mu
    f_interp_nonneg = evaluate_interpolant(mu_nonneg, x_lin)
    errors_nonneg.append(np.max(np.abs(f_exact - f_interp_nonneg)))

plt.semilogy(m_list, errors, marker="o", label=r"Chebyshev interpolation")
plt.semilogy(m_list, errors_nonneg, marker="s", label=r"Jackson damping")

plt.ylabel(r"error")
plt.xlabel(r"degree $m$")
plt.legend()
plt.show()

(Figure: approximation error of Chebyshev interpolation and Jackson damping as functions of the degree $m$.)

This does not look promising; through the damping, we trade the exponential convergence for much slower, roughly reciprocal convergence.

Method 2: Fejér theorem

A second approach we consider on our journey to find a good nonnegative interpolant comes from a characterization of nonnegative polynomials given by the following theorem.

Theorem [Fejér, 1916]

Any nonnegative polynomial $p : [-1, 1] \to \mathbb{R}^{+}$ of degree $2m$ can be represented as

\[p(\cos(y)) = \left| \sum_{l=-m}^{m} c_l e^{ily} \right|^2.\]

Conveniently, this nonnegative polynomial can quite easily be converted to an expression in terms of Chebyshev polynomials. To this end, we first rewrite

\[p(\cos(y)) = \left| \sum_{l=-m}^{m} c_l e^{ily} \right|^2 = \sum_{l=-m}^{m} \sum_{k=-m}^{m} c_l c_k^{\ast} e^{i(l-k)y}.\]

Since $\cos(y) = \cos(2\pi-y)$ we get

\[\begin{split} p(\cos(y)) &= \frac{p(\cos(y)) + p(\cos(2\pi-y))}{2} \\ &= \sum_{l=-m}^{m} \sum_{k=-m}^{m} c_l c_k^{\ast} \frac{e^{i(l-k)y} + \overbrace{e^{i2\pi(l-k)}}^{=1}e^{-i(l-k)y}}{2} \\ &= \sum_{l=-m}^{m} \sum_{k=-m}^{m} c_l c_k^{\ast} \cos((l-k)y). \end{split}\]

Splitting the sums yields

\[p(\cos(y)) = \sum_{l = -m}^{m} |c_l|^2 \cos(0y) + \sum_{-m \leq k < l \leq m} (c_l c_k^{\ast} + c_k c_l^{\ast}) \cos((l-k)y).\]

Using the definition of the Chebyshev polynomials $T_l(\cos(y)) = \cos(l y)$ for $l \geq 0$, we obtain

\[\begin{split} p(\cos(y)) &= \sum_{l = -m}^{m} |c_l|^2 T_0(\cos(y)) + \sum_{-m \leq k < l \leq m} (c_l c_k^{\ast} + c_k c_l^{\ast}) T_{l-k}(\cos(y)) \\ &= \sum_{j=0}^{2m} \mu_j T_j(\cos(y)). \end{split}\]

We can summarize this derivation in the following lemma.

Lemma

The Fejér representation $| \sum_{l=-m}^{m} c_l e^{ily} |^2$ and the coefficients of the corresponding Chebyshev representation $\sum_{j=0}^{2m} \mu_j T_j(\cos(y))$ are related through

\[\mu_j = \begin{cases} \sum_{l = -m}^{m} |c_l|^2 & j = 0 \\ \sum_{l = -m}^{m - j} (c_l c_{l + j}^{\ast} + c_l^{\ast} c_{l + j}) & 1 \leq j \leq 2m \\ \end{cases}\]

This is quite convenient. So we just need to find good coefficients $c_l$ which linearly combine some exponential functions, and use the formula from the lemma to give us a nonnegative Chebyshev approximation. A Python function which implements this formula is given below.

# Conversion from Fejér coefficients to Chebyshev coefficients
def convert_to_chebyshev_coefficients(c):
    c_cheb = np.zeros_like(c)
    c_cheb[0] = np.sum(np.abs(c) ** 2)
    for j in range(1, len(c_cheb)):
        c_cheb[j] = np.sum(c[:-j] * np.conj(c[j:]) + np.conj(c[:-j]) * c[j:])
    return np.real(c_cheb)

Here it gets tricky. Obtaining coefficients $c_{-m}, \dots, c_m$ for a nonnegative Chebyshev approximation comes down to solving the non-convex optimization problem

\[\min_{c_{-m}, \dots, c_{m} \in \mathbb{C}} \left\lVert f(\cos(\cdot)) - \left| \sum_{l=-m}^{m} c_l e^{il \cdot} \right|^2 \right\rVert_{\infty}.\]

Given some coefficients $c_{-m}, \dots, c_m$ of such an approximation, the above error can numerically be estimated by evaluating the difference at, say, $100$ points $y \in [0, 2\pi]$ (so that $x = \cos(y)$ covers $[-1, 1]$), and taking the maximum absolute deviation between the true function and its approximation.

# Evaluation of approximation and approximation error from coefficients
def fejer_approximation(x, c, complex=False):
    if complex:
        # With complex coefficients, last half of 'c' go to imaginary part
        c = c[:len(c) // 2] + 1j * c[len(c) // 2:]
    m = (len(c) - 1) // 2
    g = np.sum(c * np.exp(1j * np.outer(x, np.arange(-m, m + 1))), axis=1)
    return np.abs(g) ** 2

def fejer_approximation_error(c, fun, a=0, b=2 * np.pi, n_test=100, complex=False):
    diff = lambda x: np.abs(fun(np.cos(x)) - fejer_approximation(x, c, complex=complex))
    return np.max(diff(np.linspace(a, b, n_test)))

Trying to find a minimum of this approximation error through optimization may work for small degrees $m$, but is completely hopeless for larger-degree approximations due to the much larger parameter space. We can best see this when we use a non-convex optimizer (e.g. BFGS) to compute optimal coefficients for increasing degrees $m$.

# Direct solution of optimization problem
def chebyshev_coefficients_fejer(fun, m, c0=None, complex=False):
    if c0 is None:
        c0 = np.random.randn(2 * m if complex else m)
    target = lambda c: fejer_approximation_error(c, fun, complex=complex)
    c_opt = sp.optimize.minimize(target, c0, method="BFGS").x
    if complex:
        # With complex coefficients, last half of 'c' go to imaginary part
        c_opt = c_opt[:len(c_opt) // 2] + 1j * c_opt[len(c_opt) // 2:]
    c_cheb = convert_to_chebyshev_coefficients(c_opt)
    return c_cheb

m_list = 2 * np.arange(20) + 3
errors = []
num_retries = 3
errors_fejer = np.empty((len(m_list), num_retries))

for i, m in enumerate(m_list):
    mu = chebyshev_interpolation_dct(f, m)
    f_interp = evaluate_interpolant(mu, x_lin)
    errors.append(np.max(np.abs(f_exact - f_interp)))
    for j in range(num_retries):
        mu_fejer = chebyshev_coefficients_fejer(f, m=m, complex=True)
        f_interp_fejer = evaluate_interpolant(mu_fejer, x_lin)
        errors_fejer[i, j] = np.max(np.abs(f_exact - f_interp_fejer))

plt.semilogy(m_list, errors, marker="o", label="Chebyshev interpolation")
plt.semilogy(m_list, np.mean(errors_fejer, axis=1), marker="s", label="Fejér approximation")
plt.fill_between(m_list, np.min(errors_fejer, axis=1), np.max(errors_fejer, axis=1), color="#F98125", alpha=0.2)

plt.xlabel(r"degree $m$")
plt.ylabel(r"error")
plt.legend()
plt.show()

(Figure: approximation error of Chebyshev interpolation and the Fejér approximation as functions of the degree $m$.)

Surprisingly, minimizing the approximation error in the Fejér representation works even better than Chebyshev interpolation for small degrees. However, once we increase the degree, the results are quite bad.

Method 3: Squaring the square root

There is one convenient trick which leads to a nonnegative approximation: first, compute the Chebyshev interpolant of the square root of the function, of half the degree $m/2$,

\[\sqrt{f}^{(m/2)}(x) = \sum_{l=0}^{m/2} \mu_l T_l(x).\]

Then, square this Chebyshev interpolant, which can again be represented in terms of Chebyshev polynomials

\[f_{\mathrm{nn}}^{(m)}(x) = (\sqrt{f}^{(m/2)})^2(x) = \sum_{l=0}^{m} \nu_l T_l(x).\]

Because this function is the square of another function, it is nonnegative by construction. Best of all, due to the DCT duality between the evaluations of a Chebyshev interpolant and its coefficients, we can (exactly) square a Chebyshev interpolant in just $\mathcal{O}(m \log(m))$ operations:

\[[\nu_0, \dots, \nu_m] = \operatorname{DCT}^{-1}\{\operatorname{DCT}\{[\mu_0, \dots, \mu_{m/2}, 0, \dots, 0]\}^2\}\]

where the square is understood elementwise. This then leads us to the following algorithm for nonnegative Chebyshev approximation.

Algorithm (nonnegative Chebyshev approximation) [Matti et al., 2025]

Input: Function $f : [-1, 1] \to \mathbb{R}$, even degree $m \in \mathbb{N}$

Output: Coefficients $\nu$ such that $\sum_{l=0}^{m} \nu_l T_l(x) \geq 0$ approximates $f$

1: Compute $[\mu_0, \dots, \mu_{m/2}]$ for the Chebyshev interpolant $\sum_{l=0}^{m/2} \mu_l T_l(x)$ of $\sqrt{f}$

2: Zero-pad the coefficients $\boldsymbol{\mu} = [\mu_0, \dots, \mu_{m/2}, 0, \dots, 0] \in \mathbb{R}^{m + 1}$

3: Evaluate the Chebyshev interpolant at the $m + 1$ Chebyshev nodes with $\boldsymbol{f} = \operatorname{DCT}\{\boldsymbol{\mu}\}$

4: Compute the coefficients $\boldsymbol{\nu} = [\nu_0, \dots, \nu_m]$ with $\boldsymbol{\nu} = \operatorname{DCT}^{-1}\{\boldsymbol{f}^2\}$

The Python implementation is quite simple.

# Implementation of nonnegative Chebyshev approximation
def square_chebyshev_interpolant(mu):
    mu = np.append(mu, np.zeros(len(mu) - 1))  # Zero-padding
    mu[1:-1] /= 2  # Rescale coefficients due to type-1 DCT convention in SciPy
    nu = sp.fft.idct(sp.fft.dct(mu, type=1) ** 2, type=1)
    nu[1:-1] *= 2  # Rescale coefficients due to type-1 DCT convention in SciPy
    return nu

def chebyshev_approximation_nonneg(f, m):
    f_sqrt = lambda x: np.sqrt(f(x))
    mu_sqrt = chebyshev_interpolation_dct(f_sqrt, m // 2)
    mu = square_chebyshev_interpolant(mu_sqrt)
    return mu
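
As a quick check (the degree $200$ is an arbitrary choice), we can confirm that the resulting approximation indeed stays nonnegative up to floating-point rounding, while still approximating $f$ well.

# Quick check: the squared interpolant stays nonnegative (up to rounding) and approximates f
mu_nonneg = chebyshev_approximation_nonneg(f, 200)
values = evaluate_interpolant(mu_nonneg, x_lin)
print(values.min())                        # nonnegative up to rounding errors of order 1e-16
print(np.max(np.abs(values - f(x_lin))))   # small approximation error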

What’s more, this nonnegative approximation satisfies a similar error bound as the standard Chebyshev interpolation. For the example of the bell curve, we can derive the following bound.

Proposition

Let $f(x) = e^{-x^2 / \sigma^2}$ for some $\sigma > 0$ and let $f_{\mathrm{nn}}^{(m)}$ be its nonnegative Chebyshev approximation of degree $m$, obtained by squaring the degree-$m/2$ Chebyshev interpolant of $\sqrt{f}$. Then for all $m \in 2\mathbb{N}$ and all $\varepsilon > 0$ it holds

\[\max_{x \in [-1, 1]} |f(x) - f_{\mathrm{nn}}^{(m)}(x) | \leq (1 + \varepsilon)^{-m/2} \frac{e^{\varepsilon^2/2\sigma^2}}{\varepsilon} (2 + (1 + \varepsilon)^{-m/2} \frac{e^{\varepsilon^2/2\sigma^2}}{\varepsilon}).\]

This can be proved by first applying the theorem to the interpolant of the square root $\sqrt{f}(x) = e^{-x^2 / (2\sigma^2)}$, which is again a bell curve with $\sigma$ replaced by $\sqrt{2}\,\sigma$,

\[| \sqrt{f}(x) - \sqrt{f}^{(m/2)}(x) | \leq \frac{4}{\chi^{m/2}(\chi - 1)} \sqrt{\max_{z \in \mathcal{E}_{\chi}} | f(z) |},\]

and then using $f = \sqrt{f}^2$ and $f_{\mathrm{nn}}^{(m)} = (\sqrt{f}^{(m/2)})^2$, which implies

\[\begin{split} | f - f_{\mathrm{nn}}^{(m)} | &= | \sqrt{f}^2 - (\sqrt{f}^{(m/2)})^2 | \\ &= | \sqrt{f} - \sqrt{f}^{(m/2)} | \cdot | \sqrt{f} + \sqrt{f}^{(m/2)} | \\ &\leq | \sqrt{f} - \sqrt{f}^{(m/2)} | \, \big( 2 \max \sqrt{f} + | \sqrt{f} - \sqrt{f}^{(m/2)} | \big). \end{split}\]

Let us quickly see how our method numerically compares against the standard Chebyshev interpolation.

m_list = 8 * np.arange(1, 25)
errors, errors_nonneg = [], []

for m in m_list:
    mu = chebyshev_interpolation_dct(f, m)
    f_interp = evaluate_interpolant(mu, x_lin)
    errors.append(np.max(np.abs(f_exact - f_interp)))

    mu_nonneg = chebyshev_approximation_nonneg(f, m)
    f_interp_nonneg = evaluate_interpolant(mu_nonneg, x_lin)
    errors_nonneg.append(np.max(np.abs(f_exact - f_interp_nonneg)))

plt.semilogy(m_list, errors, marker="o", label=r"Chebyshev interpolation")
plt.semilogy(m_list, errors_nonneg, marker="s", label=r"nonnegative interpolation")

plt.ylabel(r"error")
plt.xlabel(r"degree $m$")
plt.legend()
plt.show()

(Figure: approximation error of Chebyshev interpolation and the nonnegative interpolation as functions of the degree $m$.)

Nice! We are not far off from the standard Chebyshev interpolation in terms of the approximation error. In terms of computational time, we are only around 2 times slower, because we need to compute a few DCTs instead of just one.

m = 1000
%timeit chebyshev_interpolation_dct(f, m)
%timeit chebyshev_approximation_nonneg(f, m)
23.3 μs ± 185 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
41 μs ± 123 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

Method 4: Fejér representation on steroids

Before we definitively discard the Fejér representation approach, let us briefly explore an interesting connection between this formulation and the Chebyshev representation. We observe that for the representation $p(\cos(y)) = | \sum_{l=-m}^{m} c_l e^{ily} |^2$, when $c_l \in \mathbb{R}$ and $c_l = c_{-l}$, we have

\[\sum_{l=-m}^{m} c_l e^{ily} = c_0 + \sum_{l=1}^{m} (c_l e^{ily} + c_{-l} e^{-ily}) = c_0 + 2 \sum_{l=1}^{m} c_l \cos(ly)\]

So with the coefficients

\[\mu_l = \begin{cases} c_0 & \text{if $l=0$} \\ 2c_l & \text{if $l\geq 1$} \end{cases}\]

we have

\[p(x) = ( \sum_{l=0}^{m} \mu_l T_l(x) )^2\]

i.e. the Chebyshev interpolation of $\sqrt{f}$ gives us coefficients which can be used as an initial guess in the optimization problem. Now this is quite cool, because we can build a two-phase algorithm to obtain a nonnegative Chebyshev approximation: 1) compute the Chebyshev interpolant of $\sqrt{f}$ and convert its coefficients to Fejér coefficients using the relation above; 2) use these coefficients as an initial guess in the non-convex optimization with the same objective as above. Let’s see if that actually works in practice.

m_list = 2 * np.arange(40) + 3
errors_nonneg, errors_fejer = [], []

for i, m in enumerate(m_list):
    mu_nonneg = chebyshev_approximation_nonneg(f, m - 1)
    f_interp_nonneg = evaluate_interpolant(mu_nonneg, x_lin)
    errors_nonneg.append(np.max(np.abs(f_exact - f_interp_nonneg)))

    # Generate good initial guess for optimization in Fejér representation
    f_sqrt = lambda x: np.sqrt(f(x))
    c_sqrt = chebyshev_interpolation_dct(f_sqrt, (m - 1) // 2)
    c0 = np.zeros(m)
    c0[(m-1) // 2 + 1:] = c_sqrt[1:] / 2
    c0[:(m-1) // 2] = c_sqrt[::-1][:-1] / 2
    c0[(m-1) // 2] = c_sqrt[0]

    mu_fejer = chebyshev_coefficients_fejer(f, m=m, c0=c0, complex=False)
    f_interp_fejer = evaluate_interpolant(mu_fejer, x_lin)
    errors_fejer.append(np.max(np.abs(f_exact - f_interp_fejer)))

plt.semilogy(m_list, errors_nonneg, marker="o", label="nonnegative interpolation")
plt.semilogy(m_list, errors_fejer, marker="s", label="Fejér on steroids")

plt.legend()
plt.xlabel(r"degree $m$")
plt.ylabel(r"error")
plt.show()

(Figure: approximation error of the nonnegative interpolation and the “Fejér on steroids” approach as functions of the degree $m$.)

Interesting. By using good initial guesses, we can beat the nonnegative Chebyshev interpolation. However, the improvement is only minor, and not really worth considering in view of the much higher run-time of the optimization.

Conclusion

With the simple trick of interpolating the square root of a function and then squaring the resulting interpolant, nonnegative Chebyshev approximation becomes a fast and accurate way of computing nonnegative polynomial approximations of functions.