HW2: MAT 321 - Numerical methods.

Author: Marc Aurèle Gilles

Disclaimer: this homework has a lot of text but most of it is background. There is also a lot of code, but most of it is provided for you. Each function you have to complete is indicated by

ADD YOUR CODE HERE

You should try to understand what the rest of the code does, but you may assume it does not need to be changed. See precept 2 for the syntax of matrix operations if you need it.

Problem 1: Poisson's equation

The latter part of the class focuses on differential equations, but we already have everything we need to design our first numerical ODE solver. In this problem, we derive a numerical solution of Poisson's equation in 1D:

$$ - \frac{d^2}{dx^2} u(x) = f(x) \qquad u(0) = u(1) = 0 $$

Here, $f(x)$ is given, and we are seeking $u(x)$. In precept 2 we showed that we can estimate the second derivative of a function using the central difference formula:

$$ D^2_h u(x) := \frac{u(x + h) - 2u(x) + u(x- h)}{h^2} = u''(x) + \mathcal{O}(h^2) $$
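
For intuition, here is a minimal sketch (not part of the assignment) that checks the $\mathcal{O}(h^2)$ claim numerically, using $u(x) = \sin(x)$ as an arbitrary smooth test function:

```python
import numpy as np

# Minimal check of the O(h^2) error of the central difference formula,
# using u(x) = sin(x), so that u''(x) = -sin(x).
x = 0.3
for h in [1e-1, 1e-2, 1e-3]:
    d2h = (np.sin(x + h) - 2 * np.sin(x) + np.sin(x - h)) / h**2
    print(h, abs(d2h - (-np.sin(x))))   # error decays roughly like h^2
```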

We will discretize the interval $[0,1]$ with the $n+2$ points $x_j = jh$ for $j = 0, \dots, n+1$, where $h = 1/(n+1)$, and write the function values $u(x_j)$ for $j = 1, \dots, n$ in a vector $v$:

$$ v = \begin{bmatrix} u(x_1) \\ u(x_2) \\ \vdots \\ u(x_n) \end{bmatrix} $$

1.1:

Before we try to solve the ODE, let us consider a simpler problem: given the vector $v$ above, and assuming $ u(0) = u(1) = 0 $, find a matrix $T$ that computes the vector:

$$ T v = \begin{bmatrix} - D^2_h u(x_1) \\ - D^2_h u(x_2) \\ \vdots \\ - D^2_h u(x_n) \end{bmatrix} $$

Your task is to implement the function that constructs the matrix $T$ in the code below (and make sure it passes the tests below).

(Hint: if we look at $-D^2_h u(x)$ evaluated at $x=x_i$, we get row $i$ of the vector $Tv$ for $1 < i < n$, which reads: $$ [Tv]_i = - D^2_h u(x_i) = - \frac{u(x_i + h) - 2u(x_i) + u(x_i- h)}{h^2} $$ Since $x_i = ih$, we have that $x_i + h = ih + h = (i+1)h = x_{i+1}$ and $x_i - h = x_{i-1}$, so $$[Tv]_i = - D^2_h u(x_i) = - \frac{u(x_{i+1}) - 2u(x_i) + u(x_{i-1})}{h^2} = -\frac{v_{i+1} - 2 v_i + v_{i-1}}{h^2} $$ where $[Tv]_i$ denotes entry $i$ of the vector $Tv$. For $i = 1$ and $i = n$, the missing neighbors $u(x_0) = u(0)$ and $u(x_{n+1}) = u(1)$ are zero by the boundary conditions, so those terms drop out. Also note that the $x_i$ are fixed at this point; e.g., if we take $n = 3$, then $h = 1/(3+1) = 1/4$ and the vector is:

$$ v = \begin{bmatrix} u(x_1) \\ u(x_2) \\ u(x_3) \end{bmatrix} = \begin{bmatrix} u(1/4) \\ u(1/2) \\ u(3/4) \end{bmatrix} $$

There are no more $x$'s! )
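
To make the hint concrete, here is a rough sketch of how such a tridiagonal matrix could be assembled with NumPy; the function name and signature in your notebook's template will likely differ, so adapt it accordingly:

```python
import numpy as np

# Rough sketch only (the notebook's template name, signature, and tests may differ):
# T is tridiagonal with 2/h^2 on the diagonal and -1/h^2 on the off-diagonals.
def build_T(n):
    h = 1.0 / (n + 1)
    return (2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2

print(build_T(3))   # 3x3 example with h = 1/4
```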

1.2

If, instead, we are given a vector of values of $-u''(x)$ (call this function $f(x)$), and are seeking approximate values of $u(x)$, we can solve the linear system $Tv_h = b$, where

$$ b = \begin{bmatrix} f(x_1)\\ f(x_2)\\ \vdots\\ f(x_n)\\ \end{bmatrix} $$

to get an approximate solution $v_h$ of the ODE, with $u(x_i) \approx [v_h]_i$. Note that the boundary conditions are implicitly imposed by the way we built the matrix. This is called the finite-difference method.

In the previous case, we were computing $-u''(x)$ approximately using samples of $u(x)$, and the Taylor remainder theorem guarantees that the error in the $\infty$-norm will decay as $\mathcal{O}(h^2)$. This does not directly imply that the inverse problem has the same property, but in fact it does. We won't prove this (yet), but we want to demonstrate it numerically.

Your task is to solve the ODE using the finite-difference method, and make a convergence plot similar to the one above by computing the $\infty$-norm of the error (that is, $e_h = \max_i |(v_h)_i - u(x_i)|$) and showing that it decreases as $\mathcal{O}(h^2)$. To solve the linear system, you may call np.linalg.solve(). (In the coming weeks, we will see that there are much faster algorithms for solving this system than the general-purpose one implemented in np.linalg.solve().) We test this with the simple right-hand side $f(x) = -9 \pi^2 \sin(3 \pi x)$ defined above so that we can easily monitor the error, but the method applies to any sufficiently smooth function $f(x)$.
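
A minimal sketch of the workflow (printed errors instead of the required plot), assuming the tridiagonal $T$ from 1.1 and using the fact that $u(x) = -\sin(3\pi x)$ satisfies $-u'' = -9\pi^2\sin(3\pi x)$ with $u(0) = u(1) = 0$:

```python
import numpy as np

# Sketch of the convergence check; exact solution u(x) = -sin(3*pi*x) is used for the error.
for n in [20, 40, 80, 160]:
    h = 1.0 / (n + 1)
    x = np.arange(1, n + 1) * h                              # interior points x_1, ..., x_n
    T = (2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2
    b = -9 * np.pi**2 * np.sin(3 * np.pi * x)                # samples of f(x_i)
    v_h = np.linalg.solve(T, b)
    e_h = np.max(np.abs(v_h - (-np.sin(3 * np.pi * x))))     # infinity-norm error
    print(h, e_h)                                            # shrinks ~4x each time n doubles
```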

Problem 2: Fast matvecs

We showed in class that matrix-vector multiplication with a general $n \times n$ matrix has complexity $\mathcal{O}(n^2)$. However, we can do better in many cases; e.g., if $A = u v^T$ is an outer product, there is an $\mathcal{O}(n)$ algorithm. We explore other such situations here.
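
For example, here is a minimal sketch of the outer-product case mentioned above, which never forms $A = uv^T$ explicitly:

```python
import numpy as np

# (u v^T) x = u * (v^T x): one dot product and one scaling, O(n) total,
# versus forming the n x n matrix explicitly, which costs O(n^2).
n = 1000
u, v, x = np.random.randn(n), np.random.randn(n), np.random.randn(n)
fast = u * (v @ x)
slow = np.outer(u, v) @ x
print(np.max(np.abs(fast - slow)))   # agree up to rounding error
```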

2.1: truncated SVD.

Suppose $A_k$ is the rank-$k$ truncated SVD of $A \in \mathbb{R}^{n \times n}$. Write code that computes $A_k v$ for a vector $v$ in $\mathcal{O}(kn)$ operations.
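
Here is a rough sketch of the idea, assuming $A_k$ is handed to you through its SVD factors (an $n \times k$ matrix of left singular vectors, the $k$ singular values, and a $k \times n$ matrix of right singular vectors); your notebook's interface may differ:

```python
import numpy as np

# Sketch only: multiply right-to-left so each step is O(kn), never forming A_k.
def truncated_svd_matvec(U_k, s_k, Vt_k, v):
    return U_k @ (s_k * (Vt_k @ v))   # (k x n) matvec, k scalings, (n x k) matvec

# toy check against the explicitly formed A_k
n, k = 50, 3
U, s, Vt = np.linalg.svd(np.random.randn(n, n))
A_k = (U[:, :k] * s[:k]) @ Vt[:k, :]
v = np.random.randn(n)
print(np.max(np.abs(truncated_svd_matvec(U[:, :k], s[:k], Vt[:k, :], v) - A_k @ v)))
```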

2.2: Circulant matrices

The Discrete Fourier Transform (DFT) of a vector $x \in \mathbb{C}^n$ is the vector $y\in \mathbb{C}^n$ defined by:

$$ y_k = \sum_{j=0}^{n-1} x_j \exp(-i2\pi k j / n ) $$

(Here, vectors are 0-indexed.) In other words, it is the result of the matrix-vector product $Fx$ between the matrix

$$ F = \begin{bmatrix} 1 & 1 & 1 & 1 & \cdots & 1 \\ 1 & \omega & \omega^2 & \omega^3 & \cdots & \omega^{n-1} \\ 1 & \omega^2 & \omega^4 & \omega^6 & \cdots & \omega^{2(n-1)} \\ 1 & \omega^3 & \omega^6 & \omega^9 & \cdots & \omega^{3(n-1)} \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & \omega^{n-1} & \omega^{2(n-1)} & \omega^{3(n-1)} & \cdots & \omega^{(n-1)(n-1)} \\ \end{bmatrix} $$

where $\omega = \exp(-i 2 \pi / n) $, $i^2 = -1$, and the vector

$$ x = \begin{bmatrix} x_0 \\ x_1 \\ \vdots \\ x_{n-1} \end{bmatrix} \ . $$

The fast Fourier transform (FFT) [Cooley–Tukey, 1965] is an algorithm that computes the vector $Fx$ in $\mathcal{O}(n \log n)$ operations. The FFT has been described (by someone a little older than you) as "the most important numerical algorithm of our lifetime" due to its wide-ranging applications in science and engineering. Time permitting, we will derive the FFT towards the end of the class, but for now we will treat it as a black box that computes $Fx$. In Python, it can be computed by np.fft.fft(x). Similarly, the inverse FFT that computes $F^{-1}x$ can be computed by np.fft.ifft(x).
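
As a sanity check of the definitions (not required for the assignment), one can build $F$ explicitly for a small $n$ and compare against np.fft.fft:

```python
import numpy as np

# Build the DFT matrix F for a small n and check that np.fft.fft(x) equals F @ x.
n = 8
k = np.arange(n)
F = np.exp(-2j * np.pi / n) ** np.outer(k, k)          # F[k, j] = omega^(k j)
x = np.random.randn(n) + 1j * np.random.randn(n)
print(np.max(np.abs(F @ x - np.fft.fft(x))))           # ~ 1e-15
print(np.max(np.abs(np.fft.ifft(np.fft.fft(x)) - x)))  # ifft undoes fft
```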

Circulant matrices are square matrices in which all row vectors are composed of the same elements and each row vector is rotated one element to the right relative to the preceding row vector. For example, a $5 \times 5$ circulant matrix takes the form:

$$ C = \qquad\begin{bmatrix} c_0 & c_4 & c_3 & c_2 & c_1 \\ c_1 & c_0 & c_4 & c_3 & c_2 \\ c_2 & c_1 & c_0 & c_4 & c_3 \\ c_3 & c_2 & c_1 & c_0 & c_4 \\ c_4 & c_3 & c_2 & c_1 & c_0 \end{bmatrix} $$

The first column of $C$, sometimes called the signature, $c = [ c_0, c_1, \dots c_{n -1}]$, completely describes the matrix. Circulant matrices have an amazing property: they are diagonalized by the DFT, that is:

$$ C = F^{-1} D F $$

where $D$ is diagonal, and its diagonal entries are given by $\text{diag}(D) = Fc$ (this notation means $d_{ii} = [Fc]_i$: the $i$-th diagonal entry of $D$ is the $i$-th component of the vector $Fc$). Using this fact and the FFT, write an algorithm that computes $Cx$ in $\mathcal{O}(n \log n)$ operations.
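
Here is a quick numerical check of the diagonalization property itself (using scipy.linalg.circulant to build $C$ from its signature); the actual task is to exploit this property, together with the FFT, for the fast matvec:

```python
import numpy as np
from scipy.linalg import circulant

# Verify C = F^{-1} D F with diag(D) = F c on a small random example.
n = 6
c = np.random.randn(n)
C = circulant(c)                                  # circulant matrix with first column c
k = np.arange(n)
F = np.exp(-2j * np.pi / n) ** np.outer(k, k)     # DFT matrix
D = np.diag(np.fft.fft(c))                        # diag(D) = F c
print(np.max(np.abs(np.linalg.inv(F) @ D @ F - C)))   # ~ 1e-14
```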

2.3: Toeplitz matrices

Toeplitz matrices are matrices whose entries are constant along diagonals. E.g., the following $5 \times 5$ matrix is Toeplitz:

$$ T = \qquad\begin{bmatrix} c_0 & r_1 & r_2 & r_3 & r_4 \\ c_1 & c_0 & r_1 & r_2 & r_3 \\ c_2 & c_1 & c_0 & r_1 & r_2 \\ c_3 & c_2 & c_1 & c_0 & r_1 \\ c_4 & c_3 & c_2 & c_1 & c_0 \end{bmatrix} $$

Note that circulant matrices are Toeplitz, but Toeplitz matrices are not circulant in general! However, one can always embed an $n \times n$ Toeplitz matrix into a larger $2n \times 2n$ circulant matrix. For example, the top-left $5 \times 5$ block of the circulant matrix below is the matrix $T$:

$$ A = \left[ \begin{array}{ccccc|ccccc} c_0 & r_1 & r_2 & r_3 & r_4 & 0 & c_4 & c_3 & c_2 & c_1 \\ c_1 & c_0 & r_1 & r_2 & r_3 & r_4 & 0 & c_4 & c_3 & c_2 \\ c_2 & c_1 & c_0 & r_1 & r_2 & r_3 & r_4 & 0 & c_4 & c_3 \\ c_3 & c_2 & c_1 & c_0 & r_1 & r_2 & r_3 & r_4 & 0 & c_4 \\ c_4 & c_3 & c_2 & c_1 & c_0 & r_1 & r_2 & r_3 & r_4 & 0 \\ \hline 0 & c_4 & c_3 & c_2 & c_1 & c_0 & r_1 & r_2 & r_3 & r_4 \\ r_4 & 0 & c_4 & c_3 & c_2 & c_1 & c_0 & r_1 & r_2 & r_3 \\ r_3 & r_4 & 0 & c_4 & c_3 & c_2 & c_1 & c_0 & r_1 & r_2 \\ r_2 & r_3 & r_4 & 0 & c_4 & c_3 & c_2 & c_1 & c_0 & r_1 \\ r_1 & r_2 & r_3 & r_4 & 0 & c_4 & c_3 & c_2 & c_1 & c_0 \\ \end{array} \right] $$

(the lines have been added for emphasis). Thus, we can write any Toeplitz matrix $T \in \mathbb{R}^{n \times n} $ as: $T = \begin{bmatrix} I_{n,n} & 0_{n,n} \end{bmatrix} A \begin{bmatrix} I_{n,n} \\ 0_{n,n} \end{bmatrix} $ where $A\in \mathbb{R}^{2n \times 2n}$ is the circulant embedding, $I_{n,n}$ is the $n\times n$ identity matrix, and $0_{n,n}$ is the $n\times n$ zero matrix. Using this fact, design and implement an $\mathcal{O}(n \log n)$ algorithm for matrix vector multiplication with a Toeplitz matrix $T$.
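
The embedding above can be checked numerically; below is a small sketch using scipy.linalg.toeplitz and scipy.linalg.circulant (the fast algorithm itself only needs the signature of $A$, never $A$ formed explicitly):

```python
import numpy as np
from scipy.linalg import toeplitz, circulant

# Build a random 5 x 5 Toeplitz matrix T and its 10 x 10 circulant embedding A,
# then check that the top-left block of A equals T.
n = 5
c = np.random.randn(n)                                   # first column of T
r = np.concatenate(([c[0]], np.random.randn(n - 1)))     # first row of T (shares c[0])
T = toeplitz(c, r)
signature = np.concatenate((c, [0.0], r[1:][::-1]))      # first column of A, length 2n
A = circulant(signature)
print(np.max(np.abs(A[:n, :n] - T)))                     # 0: the embedding works
```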

Problem 3: Always a BLAS

In this problem we will explore the limits of our model for reasoning about the speed of algorithms. Recall that we reason about the speed of an algorithm through its complexity, by counting the number of floating-point operations (flops).

BLAS (Basic Linear Algebra Subprograms) are the low-level optimized implementations of the basic linear algebra operations. They are separated into levels:

  1. Level 1: vector-vector operations (e.g., dot products and scaled vector additions), roughly $\mathcal{O}(n)$ flops.
  2. Level 2: matrix-vector operations (e.g., matrix-vector multiplication), roughly $\mathcal{O}(n^2)$ flops.
  3. Level 3: matrix-matrix operations (e.g., matrix-matrix multiplication), roughly $\mathcal{O}(n^3)$ flops.

One can write one matrix-matrix multiply of two $n \times n$ matrices (a level 3 BLAS operation) as $n$ matrix-vector multiplies ($n$ level 2 BLAS operations), so one might think that they have comparable running times. This is, however, not what we see in practice. The reason is that BLAS level 3 operations are highly optimized and benefit from a low ratio of data transfer to computation. In practice, this means that one should always use the highest level of BLAS possible; e.g., if you do many BLAS level 1 operations, try to replace them with a BLAS level 2 operation.
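
As a quick standalone illustration of this point (timings will depend on your machine and BLAS library), the two computations below perform the same number of flops but take very different amounts of time:

```python
import time
import numpy as np

n = 2000
A, B = np.random.randn(n, n), np.random.randn(n, n)

t0 = time.perf_counter()
C3 = A @ B                                              # one level 3 operation
t1 = time.perf_counter()
C2 = np.column_stack([A @ B[:, j] for j in range(n)])   # n level 2 operations
t2 = time.perf_counter()
print("level 3:", t1 - t0, "s,  n x level 2:", t2 - t1, "s")
print(np.max(np.abs(C3 - C2)))                          # same result up to rounding
```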

We will illustrate this by looking at a specific computation: computing the pairwise distances between two sets of vectors. Suppose we have two sets of points in $m$ dimensions, $\{ x_i\}_{i=1}^{n}$ and $\{ y_j \}_{j=1}^{l}$ with $x_i, y_j \in \mathbb{R}^m$, and we would like to compute the pairwise squared distances $d(x_i, y_j) = \| x_i - y_j \|^2$. This operation is fundamental in data science applications, e.g., for clustering.

The naive implementation of this operation with two nested for-loops (provided below) uses $n \times l$ level 1 BLAS operations, and thus has complexity $\mathcal{O}(nlm)$. Our goal is to rewrite it with higher level BLAS operations, but the complexity will remain the same.
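
For reference, the provided naive implementation looks roughly like the sketch below (names are illustrative and may differ from the code in your notebook):

```python
import numpy as np

# Naive O(n*l*m) version: one level 1 operation (a dot product) per pair (i, j).
def pairwise_sq_dists_naive(X, Y):
    n, l = X.shape[1], Y.shape[1]        # X is m x n, Y is m x l (points as columns)
    H = np.zeros((n, l))
    for i in range(n):
        for j in range(l):
            diff = X[:, i] - Y[:, j]
            H[i, j] = diff @ diff        # ||x_i - y_j||^2
    return H
```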

3.1:

Your first task is to express the matrix with entries:

$$ [H]_{i,j} = \| x_i - y_j \|^2 $$

as the sum of three matrices: two rank-1 matrices and the matrix $-2X^T Y$, where $X \in \mathbb{R}^{m \times n}$ is the matrix with columns $\{x_i\}$ and $Y \in \mathbb{R}^{m \times l}$ is the matrix with columns $\{y_j\}$. That is, find $u_1, v_1$ and $u_2, v_2$ such that

$$ H = u_1v_1^T + u_2 v_2^T - 2X^TY $$

Write your answer to part 3.1 in this cell (use math and words, NOT code!)

3.2:

Your second task is to implement this formula with one level 3 BLAS operation, a few (say 4) level 2 BLAS operations, and at most $\mathcal{O}(n)$ level 1 BLAS operations, and to check the difference in running time. You don't have to justify your operation count, but make sure it is right!

3.3: Fast image alignment

A related problem is image alignment, where one seeks to align two images as shown below.

image alignment

(if the image does not appear, see the image_alignment.png in your folder)

To make our life simpler, we will consider 1D signals that are periodic, but the same ideas apply to images. A natural idea to align two signals is to find the translation of one signal that results in the smallest difference between the two signals (this is illustrated in the figure). That is, we seek the translation $t$ that minimizes

$$d( S(y_1,t), y_2) = \| S(y_1,t) - y_2\|^2$$

where $S(y_1,t)$ is the operator that circularly shifts the signal by $t$ pixels. This task can quickly become prohibitively expensive for large datasets of images, so it is often important to do this computation as efficiently as possible.

Your task is to design and implement an algorithm to compute all $m$ entries of the vector

$$ d = \begin{bmatrix} d( S(y_1,0), y_2) \\ d( S(y_1,1), y_2) \\ \vdots\\ d( S(y_1,m-1), y_2) \end{bmatrix} $$

in $\mathcal{O}(m \log m)$ operations.
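
As a correctness reference (not the required algorithm), the naive $\mathcal{O}(m^2)$ sketch below just spells out the definition, taking $S(y_1, t)$ to be np.roll(y1, t); your fast version should reproduce its output:

```python
import numpy as np

# Naive O(m^2) reference: one circularly shifted difference per candidate translation t.
def all_shift_distances_naive(y1, y2):
    m = y1.shape[0]
    return np.array([np.sum((np.roll(y1, t) - y2) ** 2) for t in range(m)])
```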

Problem 4: Conditioning

Solve problem 2.9 from Süli and Mayers:

  1. Prove that, for any nonsingular matrix $A \in \mathbb{R}^{n \times n}$,
$$ \kappa_2(A) = \left( \frac{\lambda_n}{\lambda_1} \right)^{1/2}, $$

where $\lambda_1$ is the smallest and $\lambda_n$ is the largest eigenvalue of the matrix $A^\top A$.

  2. Show that the condition number $\kappa_2(Q)$ of an orthogonal matrix $Q$ is equal to $1$.

  3. Conversely, if $\kappa_2(A) = 1$ for a matrix $A$, show that all the eigenvalues of $A^\top A$ are equal; deduce that $A$ is a scalar multiple of an orthogonal matrix.

Recall that the condition number $\kappa_2(A)$ is defined by $\kappa_2(A) = \sigma_\text{max}/\sigma_\text{min}$ where $\sigma_\text{max}$ and $\sigma_\text{min}$ are the largest and smallest singular values of $A$.
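
A numerical sanity check (not a substitute for the proof) of the relation in part 1:

```python
import numpy as np

# kappa_2(A) = sigma_max / sigma_min should equal (lambda_max / lambda_min)^(1/2),
# where the lambdas are the eigenvalues of A^T A.
A = np.random.randn(5, 5)
sigma = np.linalg.svd(A, compute_uv=False)    # singular values, descending
lam = np.linalg.eigvalsh(A.T @ A)             # eigenvalues of A^T A, ascending
print(sigma[0] / sigma[-1])                   # kappa_2(A)
print(np.sqrt(lam[-1] / lam[0]))              # same value
print(np.linalg.cond(A, 2))                   # NumPy's built-in 2-norm condition number
```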

Write your answer for problem 4 in this cell. You may also turn in hand-written answers for this problem.