# Using the Accelerate library to implement Chebyshev Functions

Chebyshev approximations are a way of representing any bound, continuous function, with a polynomial. Having a polynomial representation means that further function manipulations, such as derivation, integration, and root-finding, become relatively trivial computations.

The first step for this project, as a proof of concept, was to write it in vanilla haskell, using lists as the primary representation of Chebyshev Polynomials. The introduction of the project can be found in an earlier post here.

One of the Haskell libraries that have been developed is Accelerate, an embedded language that allows for parallel array computations on computers with more than one core.

Since the Chebyshev polynomials in this project are, in fact, represented by arrays or lists of coefficients, I wanted to make full use of the parallel computing capabilities of the language. I spent the past few weeks converting my code to the Accelerate language.

The primary function of accelerate is to automatically generate optimized code. There are two primary types of values in Accelerate: Scalars and Arrays.

Using accelerate required a different way of thinking. I had to write functions that allow everything to be computed all at once. As an example, The chebyshev approximation is, by definition, $$f\left( x \right) \cong \ \sum_{i = 0}^{\infty}{c_{i}T_{i}(x)}\$$.

I could simply generate the list of coefficients, generate the list of polynomials (as a list of lists), and map the coefficient with the respective polynomial. The last step is simply to sum up all the polynomials.

chebf f n =
let coeffs   = chebCoeff f n n
chebPols = chebPol n
zipped   = zip coeffs chebPols
mapped   = map ($$x, y) -> map (*x) y) zipped in foldl (\x y -> sumVectors x y) [] mapped For example, if I had \(c_{i} = \left\lbrack 1,\ 2,\ 3 \right\rbrack,$$ pols = $$\lbrack\left\lbrack 1,\ 1 \right\rbrack,\ \left\lbrack 2,\ 0,\ 3 \right\rbrack,\ \left\lbrack 5,\ 3,\ 4,\ 1 \right\rbrack\rbrack$$ , the first step would be to zip the coefficients with the polynomials. That is, $zipped = \lbrack\left( 1,\ \left\lbrack 1,\ 1 \right\rbrack \right),\ \left( 2,\ \left\lbrack 2,\ 0,\ 3 \right\rbrack \right),\ \left( 3,\ \left\lbrack 5,\ 3,\ 4,\ 1 \right\rbrack \right)\rbrack$ Next, the coefficient will be multiplied through to the respective polynomial. $mapped = \lbrack\left\lbrack 1,\ 1 \right\rbrack,\ \left\lbrack 4,\ 0,\ 6 \right\rbrack,\ \left\lbrack 15,\ 9,\ 12,\ 3 \right\rbrack\rbrack$ Finally, all the polynomials need to be summed up.

In the polynomial representation, $\left\lbrack 1,\ 1 \right\rbrack + \left\lbrack 4,\ 0,\ 6 \right\rbrack + \left\lbrack 15,\ 9,\ 12,\ 3 \right\rbrack = \lbrack 20,\ 10,\ 18,\ 3\rbrack$ Note that each list is allowed to be a different length. In Accelerate, it is much more simple to pre-allocate arrays. Oftentimes matrices need to be padded with zeros to allow for uniform list lengths.

Since arrays in Accelerate need to be pre-allocated, we need a very different parallel approach.

{- Given a function f, and degree n, calculates chebyshev approximation. Get list of coeffs and chebyshev polynomials. Want to zip each coeff w/ respective polynomial and multiply. Finally, fold over all polynomials. -}

chebf :: (Exp Double -> Exp Double) -> Int -> Acc (Vector Double)
chebf f n =
let n'   = constant n
chebPols = genChebMatrix n
coeffs   = chebCoeff' f n'
coeffsM   = A.replicate (lift (Z :. All :. (n'+1))) coeffs
in
A.sum $A.transpose$ A.zipWith (*) coeffsM (use chebPols)

Here, we need still to generate two matrices – one for the coefficients, and one for the polynomials. However, they need to be the same size in order for us to properly map the correct coefficient to the polynomial. In this approach, I generate a matrix of polynomials, padded with zeroes where appropriate to fill it to the appropriate size. If we take the previous example, where the polynomials were $$\lbrack\left\lbrack 1,\ 1 \right\rbrack,\ \left\lbrack 2,\ 0,\ 3 \right\rbrack,\ \left\lbrack 5,\ 3,\ 4,\ 1 \right\rbrack\rbrack$$, we end up with this matrix:

$\begin{bmatrix} 1 & 1 & 0 & 0 \\ 2 & 0 & 3 & 0 \\ 5 & 3 & 4 & 1 \\ \end{bmatrix}$

Next, I generate a matrix of coefficients, where each row is the same value, so that the elements of the coefficient and the polynomial matrix can be matched pair-wise. As a recap, the coefficients are $$c_{i} = \left\lbrack 1,\ 2,\ 3 \right\rbrack$$.

Next, we zip and multiply the two matrices together in an element-wise fashion.

$\begin{bmatrix} (1,\ 1) & (1,\ 1) & (0,\ 1) & (0,\ 1) \\ (2,\ 2) & (0,\ 2) & (3,\ 2) & (0,\ 2) \\ (5,\ 3) & (3,\ 3) & (4,\ 3) & (1,\ 3) \\ \end{bmatrix}$

$A = \ \begin{bmatrix} 1 & 1 & 0 & 0 \\ 4 & 0 & 6 & 0 \\ 15 & 9 & 12 & 3 \\ \end{bmatrix}$

Finally, we get the final polynomial representation by taking the transpose and summing the product horizontally, collapsing the matrix into a vector representation of the Chebyshev approximation.

$A^{T} = \ \begin{bmatrix} 1 & 4 & 15 \\ 1 & 0 & 9 \\ 0 & 6 & 12 \\ 0 & 0 & 3 \\ \end{bmatrix}$

$result = \ \begin{bmatrix} 20 & 10 & 18 & 3 \\ \end{bmatrix}$

Both approaches lead to the same result. However, the Accelerate approach has the advantage of giving us the freedom of generating optimized code that allows for parallel computing.