## 13 February, 2016

### Linear Algebra

While linear algebra is, strictly speaking, a subject of algebra and not of analysis or
diﬀerential equations (which are, along with probability theory, the basic mathematical
tools used for studying quantitative ﬁnance), the theory of vector spaces is so fundamen-
tal to the modern formulation of either, that it is necessary to study it before proceeding
too far. Indeed, the theory has proven itself to be perhaps the most inﬂuential in unifying
signiﬁcant branches of modern mathematics

## 07 December, 2015

### Numerical Analysis Review Part 1a: Solutions of Equations in One Variable (Bisection Method)

- Introduction -

This post begins a sequence of posts reviewing topics from numerical analysis that are important in quantitative finance. In this first post we will cover root solving techniques for single equations, discussing aspects from the theory and exhibiting several computational examples for different types of equations.

By subtracting from both sides, any system of $m$ equations of $n$ variables can be written in the form
$$(1)\;\;\;\;F(x_{1},\ldots,x_{n})=0$$
where $F:\mathbb{R}^{n}\to\mathbb{R}^{m}$ is assumed to be a sufficiently smooth $\mathbb{R}^{m}$-valued function of $x\in\mathbb{R}^{n}$, where the term sufficient depends on the particular application or numerical method being applied.  Such a system is said to be in explicit form.  If the form (1) is not possible, then the system is said to be in implicit form and must be written as
$$(2)\;\;\;\;F(x_{1},\ldots,x_{n},y_{1},\ldots,y_{m})=0$$
where $x\in\mathbb{R}^{n}$, $y\in\mathbb{R}^{m}$ and $F:\mathbb{R}^{m+n}\to\mathbb{R^{m}}$.  We will assume that every system we encounter is explicit unless otherwise stated.  For example,
$$F(x)=x^{2}-3=0$$
is in explicit form and
$$F(x,y)=\int_{0}^{y}e^{-y'^{2}}dy'-\sqrt{\log|xy|}+\tan^{-1}\left(\frac{y}{x}\right)=0$$
is in implicit form and can not be put into explicit form by using elementary functions and operations alone.  Note that we do not limit ourselves to purely algebraic equations and allow for analytic constructs such as definite integrals and special functions defined by power series; however, we do not allow for equations that involve integrals and derivatives of $y$ (in particular, we are solving for variables or numbers, not functions, the topic of later posts covering integral and differential equations).  In any case, it is easy to find examples of purely algebraic equations that cannot be put into explicit form.

In general, for there to be any hope of a solution to (1) it is required that $m\leq n$ (if $m>n$ then $F$ is not surjective and hence there may be no $x\in\mathbb{R}^{n}$ such that $F(x)=0$).  If $m=n$ then we can expect any solution (if one exists) to be unique.  If $m<n$, then there could be many solutions and the system is said to be overparameterized (in such a case, we can often introduce $n-m$ additional equations called constraint conditions so that we obtain a new system $G(x)=0$ where $G\in\mathbb{R}^{m+(n-m)}=\mathbb{R}^{n}$).  In the special case that $F$ is linear (i.e., $F$ can be represented by matrix multiplication), then these dimensional heuristics become actual theorems and much more can be definitively stated.  In any case, we shall come to these topics in future posts on systems of both linear and nonlinear equations.

For the remainder of this post, we shall assume both $m,n=1$ (we will not deal with the case $n>1$ here because this necessarily leads us to consider constraints, which in turn leads us to consider systems of equations where $m>1$).

- Single Equations in Explicit Form -

Let us consider an explicit equation in the form of (1) for $m,n=1$.  We shall consider several numerical methods for this problem:
1. Bisection
2. Fixed Point
3. Newton-Raphson
4. Secant Line
Each of these methods belongs to the class called iterative methods, as opposed to direct methods which are really only applicable to systems of linear equations where systematic direct procedures are possible because of certain theorems from linear algebra (indeed, if we could solve (1) directly, we would not need to resort to numerical methods, and there is no general procedure for solving systems of nonlinear equations in a systematic and direct fashion).

There are some general considerations for iterative methods that need to be made, which are applicable in all situations, among them include:
1. What are the requisite requirements (including regularity conditions) the problem must first satisfy?
2. How fast does the method converge?
3. What situations can lead to numerical instability for the method?
4. How sensitive is the method to choice of initial approximation?
We shall address these considerations when reviewing each of the methods listed above.

- Bisection Method -

Let $F:\mathbb{R}\to\mathbb{R}$ be continuous on $[a,b]$.  The intermediate value theorem from elementary point-set topology states that for such an $F$, if $F(a)\leq0$ and $F(b)\geq0$, then there exists some $x'\in[a,b]$ such that $F(x')=0$.  Although the method we are about to describe will converge if $x'$ is not unique, we assume that it is (this can be often be justified from considerations of the problem).
Assuming that $F(a)$ and $F(b)$ have opposite signs and are non-zero (if either is equal to $0$ then we are done), we sample a point $x_{0}\in(a,b)$ and compute $F(x_{0})F(a)$ and $F(x_{0})F(b)$.  By the intermediate value theorem, $F(x_{0})$ has the opposite sign from either $F(a)$ or $F(b)$, and we define $I_{1}=(x_{0},b)$ or $I_{1}=(a,x_{0})$ according to whether
$$F(b)F(x_{0})<0$$
or
$$F(a)F(x_{0})<0$$
holds.  We then iterate this procedure, obtaining a sequence of intervals $\{I_{n}\}_{n\geq0}$ with $x'\in I_{n}$ for every $n$ and $|I_{n}|\to0$ as $n\to\infty$.  In principle we would stop once $F(x_{n})=0$, although we usually use a condition like
$$|F(x_{n})|<\epsilon,$$
$$|x_{n+1}-x_{n}|<\epsilon,\;\text{or},$$
$$\frac{|x_{n+1}-x_{n}|}{|x_{n}|}<\epsilon\;(x_{n}\neq0).$$
We have thus established the main requirements for our method (Consideration #1): continuity of $F$ (so that the intermediate value theorem holds) and determination of an interval $I_{0}=[a,b]$ where $x'\in(a,b)$ and $F(a)F(b)<0$.  Furthermore, in order for the method to converge to the root applicable to the problem in mind (in the event that $F$ has multiple roots in its domain), we have the additional requirement that $I_{0}$ be chosen so that $F$ has only one root in $I_{0}$ (it is clear that the method will converge to a root as long as one exists in $I_{0}$).

To address Considerion #2, we present the following theorem.
Theorem 1Suppose that $F\in\mathcal{C}[a,b]$, that $F(a)F(b)<0$, and that $x_{0}=\frac{b+a}{2}$ (the midpoint of $[a,b]$).  Then the bisection method generates a sequence of points $\{x_{n}\}_{n=0}^{\infty}\subset[a,b]$ such that $x_{n}\to x'$ where $F(x')=0$.  Moreover, we have the quantitative bound $$|x_{n}-x'|\leq\frac{b-a}{2^{n}}$$ for every $n\geq1$.
Proof.  We assume that the midpoint is taken to advance each step of the iteration, i.e. $x_{n}=\frac{1}{2}(b_{n}+a_{n})$ for all $n\geq1$ is the mid-point of $I_{n}=[a_{n},b_{n}]$.  In that case, since for all $n\geq1$
$$b_{n}-a_{n}=\frac{b-a}{2^{n-1}}$$
and $x'\in(a_{n},b_{n})$, we have
$$|x_{n}-x'|\leq\frac{1}{2}(b_{n}-a_{n})=\frac{b-a}{2^{n}}\to0\;\text{as}\;n\to\infty.$$
In particular,
$$x_{n}=x'+O\left(2^{-n}\right).$$
It is instructive to solve for $n$ for a given error tolerance $\epsilon$.  We have
$$2^{n}\leq\epsilon^{-1}(b-a),$$
thus taking logarithms (which is valid since $\log x$ is an increasing function) we obtain
$$n\leq C\log\left(\epsilon^{-1}(b-a)\right)$$
where $C=(\log(2))^{-1}$.  If $\epsilon=10^{-6}$, then $$N(\epsilon)\approx20+C\log(b-a),$$ thus if $(b-a)=1$ then at least 20 iterations are required before $|x_{n}-x'|<10^{-6}$ can be guaranteed.  In practice, the number of required iterations will usually be substantially lower, but even so, as we shall see there are methods which are guaranteed to converge much faster provided additional regularity conditions on $F$ are satisfied.

As for Considerations #3 and #4, it should be evident that the bisection method will always converge as long as $F$ is continuous.  Moreover, the approximating sequence is guaranteed to stay bracketed within in the initial interval $[a,b]$ for all $n\geq 1$, and the convergence is not affected asymptotically by the initial choice $x_{0}$.  This is one of the great virtues of the bisection method and one of the reasons that it is frequently used to start some of the more sensitive iteration schemes that we will describe below.

To illustrate the bisection method and its convergence properties, we will implement it in C++ to solve some equations that we will also solve with other methods to be discussed below.  There are by now many ways in C++ to represent functions for numerical methods.  Clasically, this was done (in C) using function pointers and (in C++) or functors (objects with the () operator overloaded) or generically via templates.  C++11 and prior to C++11 the Boost library) offers a number of convenient alternatives including functional programming constructs like variadic templates and lambda expressions, as well as objected oriented constructs like function objects and argument binding.  Throughout this series, we will apply all of these old and new methodologies.  We shall also use MATLAB to prototype and/or check our methods in C++ (and in some cases use the MATLAB API in our C++ programs).

bisection_method.h
#pragma once

#include <functional>
#include <numeric>

double bisection_method(std::function<double(double x)>& F, int& error, double a = 0, double b = 1, double eps = 0.0000001);
int sgn(double x);

double bisection_method(std::function<double(double x)>& F, int& error, double a, double b, double eps)
{
if (!F) {
error = 1;
return 1;
}
if (sgn(F(a)) == sgn(F(b))) {
error = 2;
return 1;
}
if (b < a)
std::swap(a, b);

int N = round(log((b - a) / eps) / log(2)) + 1;
double x_n = (a + b) / 2;
double x = x_n + 1;
int n = 0;
while ((n <= N) && (abs(x_n - x) > eps)) {
x = x_n;
if (sgn(F(a))*sgn(F(x)) < 0) { // update b_n
b = x;
x_n = (a + x) / 2;
} else { // update a_n
a = x;
x_n = (x + b) / 2;
}
++n;
}
if (n == N)
error = 3;
else
error = 0;
return x;
}

int sgn(double x)
{
return x < 0 ? -1 : 1;
}

main.h
#include <iostream>
#include <cstdlib>
#include <functional>
#include <cmath>
#include "bisection_method.h"

int main(void)
{
int error = 0;
std::function<double(double)> F = [](double x) { return 1 - exp(x); };
double x = bisection_method(F, error, -1, 3);
std::cout << "Root estimated to be x' = " << x << " (Error Code: " << error << ")" << std::endl;
system("PAUSE");
return 0;
}


In this code we have asked our solver to find the root of $$F(x)=1-e^{x}$$
which it determines within $\epsilon=10^{-6}$ to be $x'=0$.  Note that the return is with error code $0$, indicating that the bisection method succeeded before the theoretical upper-bound on the number of iterations $N(\epsilon)$. The following figure illustrates the situation - observe that the root $x'=0$ is a fixed point of $F$. We shall discuss this situation further when we get to fixed-point iteration.

## 12 July, 2015

### Gaussian Form of Geometric Brownian Motion

In numerical analysis, it can in some cases be more efficient to evaluate a function when it is expressed in a certain form.  For example, it is numerically more efficient to compute a second degree polynomial
$$P_{2}(x)=a_{2}x^{2}+a_{1}x+a_{0}$$
when it is in the form
$$P(x)=x(a_{2}x+a_{1})+a_{0}.$$
That is because evaluating the former requires three multiplication operations and two addition operations, whereas the latter requires only two multiplication operations and two addition operations.  This way of evaluating polynomials numerically is known as Horner's Method, which describes the general formulation for a polynomial of degree $n$ and coefficients $\{a_{j}\}_{0\leq j\leq n}.$  A simple implementation of the algorithm in C can be found here.  In general, evaluation of a $n$ degree polynomial $P(x)$ will require $2n-1$ multiplication operations and $n$ addition operations using the naïve method, whereas it will require just $n$ multiplication operations and $n$ addition operations using Horner's Method.  Although from the point of view of algorithmic analysis both methods are equivalent (they both require $O(n)$ operations to compute $P(x)$), on a practical/implementation level, Horner's Method requires only $2n$ operations whereas the naïve approach requires $3n-1$ operations.  On the other hand, since the savings come from only the multiplication operations, the performance gain (unless each $a_{j}$ and $x$ are all integers) is somewhat tempered by the fact that for most computing hardware floating point multiplication is much faster than floating point addition (the opposite is generally true for integral arithmetic).  Note also that if $x^{2}$ is expressed explicitly as $x*x$, then a modern compiler will very likely optimize the naïve expression into something similar to Horner's Method; however, for high degree polynomials, expanding the powers with the multiplication operator like this is not feasible without producing very ugly, static, and difficult to maintain code - a power function of some sort is likely to be used and it is then significantly less likely the compiler will be able (or if it can, choose) to optimize the resulting code).

Now if the function in question is evaluated repeatedly, as in the case of finite difference methods and especially Monte Carlo simulations, then it is sometimes possible to achieve significant performance gains.  In financial applications, the formula for the exact solution to the SDE
$$dS_{t}=rSdt+\sigma SdW_{t},$$
where $W_{t}$ is a standard Brownian motion process, frequently needs to be evaluated repeatedly in the context Monte Carlo simulations.  The formula is given by
$$S_{t}=S_{0}\exp\left\{\left(r-\frac{1}{2}\sigma^{2}\right)t+\sigma W_{t}\right\}.$$
Frequently, certain parameters in the formula are known and can be synthesized as constants before entering into the main control loop, which could be on the order of millions or even billions of iterations.  In the case of a very simple Monte Carlo simulation for the pricing of a European vanilla call option, as discussed in this post, all the parameters except for the samples $W_{t}$ are known ($t$ is fixed in this situation).  Thus we can rewrite $S_{t}$ as
$$S_{t}=Ae^{bN},$$
where
$$A=S_{0}\exp\left\{\left(r-\frac{1}{2}\sigma^{2}\right)t\right\},\;\;\;\;b=\sigma\sqrt{t},$$
and $N$ is a random variable with a standard normal distribution that is re-sampled at each iteration.  $A$ and $b$ are both computed before entering into the Monte Carlo simulation loop, thereby realizing a significant gain in computational efficiency when the number of iterations (trials) is large.

In other situations, different functional forms of $S_{t}$ can be useful.  A number of optimized numerical routines are available for Gaussian functions, which are functions of the form
$$G(\alpha,\beta,\gamma;x)=\alpha\exp\left\{-\frac{(x-\beta)^{2}}{2\gamma^{2}}\right\}.$$
If we match $S_{t}$ with $G(x)$, then the obvious candidate for identification with the argument $x$ is $\sigma$.  If we complete the square in the argument of the exponential function in $S_{t}$, we get
$$-\frac{t}{2}\left(\sigma^{2}-\frac{2W_{t}}{t}\sigma\right)=-\frac{t}{2}\left(\sigma-\frac{W_{t}}{t}\right)^{2}+\frac{W_{t}^{2}}{2t}.$$
Therefore,
$$S_{t}=S_{0}\exp\left\{rt+\frac{W_{t}^{2}}{2t}\right\}\exp\left\{-\frac{\left(\sigma-\frac{W_{t}}{t}\right)^{2}}{2/t}\right\}.$$
Matching the triple of parameters $(\alpha,\beta,\gamma)$ in $G(\sigma)$, we find
$$S_{t}(\sigma)=G(\sigma)$$
where
$$\alpha=S_{0}\exp\left\{rt+\frac{W_{t}^{2}}{2t}\right\}, \;\;\;\; \beta=\frac{W_{t}}{t}, \;\;\;\; \gamma=\frac{1}{\sqrt{t}}.$$
If we map $W_{t}\mapsto\sqrt{t}{N}$ where $N$ is a standard normal random variable, then
$$\alpha=S_{0}\exp\left\{rt+\frac{N^{2}}{2}\right\}, \;\;\;\; \beta=\frac{N}{\sqrt{t}}, \;\;\;\; \gamma=\frac{1}{\sqrt{t}}.$$

This is the Gaussian form of $S_{t}$, with independent variable $\sigma$.  This representation is particularly useful for numerical pricing routines when a stochastic/local volatility model is being used.

## 23 March, 2015

### A Primer in Harmonic Analysis

I picked these problems from Modern Fourier Analysis Vol I - I think that they serve as a good primer for the basic techniques and theorems in harmonic analysis (a subject that I have recently started looking back into in order to deal with some of the techniques used when Levy processes in mathematical finance).
Problem I.  Fix $d\geq1$ and suppose $\psi:(0,\infty)\mapsto[0,\infty)$ is $C^{1}$, non-increasing, and $\int_{\mathbb{R}^{d}}\psi(|x|)\;dx\leq A<\infty.$  Define
$$[M_{\psi}f](x):=\sup_{0<r<\infty}\frac{1}{r^{d}}\int_{\mathbb{R}^{d}}|f(x-y)|\psi\left(\frac{|y|}{r}\right)\;dy$$
and show that $$[M_{\psi}f](x)\leq A[Mf](x)$$
where $M$ is the usual Hardy-Littelwood maximal function.
Solution.  We first observe that the translation invariance of the indicated estimate implies that it is sufficient to prove the case $x=0$ (this can be seen more explicitly by replacing $f$ by $\tau_{x}f$, where $\tau_{x}$ is the translation by $x$ operator, and applying the present case to be proven to see then that the estimate holds for all $x$).   For convenience let us define $\psi_{r}(|y|)=r^{-d}\psi(|y|/r)$.  The radial properties of the terms in the estimate suggest polar coordinates will be useful in dealing with the resultant integrals.  Let us recall that the polar coordinate formula implies as a consequence of itself that
$$\frac{d}{ds}\int_{B(0,s)}f(y)\;dy=\frac{d}{ds}\int_{0}^{s}dt\int_{\partial B(0,t)}f(\omega)\;dS(\omega)=\int_{\partial B(0,s)}f(\omega)\;dS(\omega)=s^{d-1}\int_{S^{d-1}}f(s\omega)\;dS(\omega).$$
In the last term we have used a change of variables in order to place the integration over the the unit sphere (and in particular, to keep the domain fixed).  In order to apply this formula in an integration by parts without causing notational chaos, let us define
$$\alpha(s)=\int_{S^{d-1}}|f(s\omega)|\;dS(\omega)$$
and
$$\beta(s)=\int_{0}^{s}\alpha(t)t^{d-1}\;dt.$$
Note that $\beta(s)$ is majorized by $\omega(d)s^{d}[Mf](0)$ where $\omega(d)$ is the measure of the unit ball in $\mathbb{R}^{d}$.
Let us make a further assumption that $\psi$ is compactly supported on a ball of radius $\delta$ so that $\psi_{r}$ is also compactly supported (and thus bounded) on a ball of radius $r\delta$.  Invoking polar coordinates in the left hand side, setting $x=0$, $\psi(|y|)=\psi(|-y|)$ (in particular, the associativity of convolution), and integration by parts along with the fact that $\beta(0)=0=\psi_{r}(r\delta)$, we estimate at last

\begin{align*} \int_{\mathbb{R}^{d}}|f(-y)|\psi_{r}(|y|)\;dy&=\int_{\mathbb{R}^{d}}|f(y)|\psi_{r}(|y|)\;dy\\ &=\int_{0}^{\infty}\psi_{r}(s)s^{d-1}\;ds\int_{\mathcal{S}^{d-1}}|f(s\omega)|\;dS(\omega)\\ &=\int_{0}^{r\delta}\psi_{r}(s)s^{d-1}\int_{\mathcal{S}^{d-1}}|f(s\omega)|\;dS(\omega)\\ &=\int_{0}^{r\delta}\psi_{r}(s)s^{d-1}\alpha(s)\;ds\\ &=\beta(r\delta)\psi_{r}(r\delta)-\beta(0)\psi_{r}(0)-\int_{0}^{r\delta}\beta(s)d\psi_{r}(s)\\ &=\int_{0}^{r\delta}\beta(s)d(-\psi_{r}(s))\\ &\leq[Mf](0)\int_{0}^{\infty}\omega(d)s^{d}d(-\psi_{r}(s))\\ &=[Mf](0)\int_{0}^{\infty}d\omega(d)s^{d-1}\psi_{r}(s)\;ds\\ &=[Mf](0)\int_{0}^{\infty}\psi_{r}(s)\;ds\int_{\partial B(0,s)}\;dS(\omega)\\ &=[Mf](0)\int_{\mathbb{R}^{d}}\frac{1}{r^{d}}\psi\left(\frac{|y|}{r}\right)\;dy\\ &=[Mf](0)\int_{\mathbb{R}^{d}}\psi(|y|)\;dy\\ &=A[Mf](0), \end{align*}
as desired.  To complete the proof, simply take an increasing sequence $\psi_{n}\to\psi$ of compactly supported $C^{1}$ functions.  Since the estimate holds for each $\psi_{n}$, it holds for the limit function $\psi$.

Problem II.  Consider the heat kernel $$G(x,t)=\frac{1}{(4\pi t)^{\frac{d}{2}}}e^{-\frac{|x|^{2}}{4t}}.$$