07 December, 2015

Numerical Analysis Review Part 1a: Solutions of Equations in One Variable (Bisection Method)

- Introduction -

This post begins a sequence of posts reviewing topics from numerical analysis that are important in quantitative finance. In this first post we will cover root solving techniques for single equations, discussing aspects from the theory and exhibiting several computational examples for different types of equations.

By subtracting from both sides, any system of $m$ equations of $n$ variables can be written in the form
where $F:\mathbb{R}^{n}\to\mathbb{R}^{m}$ is assumed to be a sufficiently smooth $\mathbb{R}^{m}$-valued function of $x\in\mathbb{R}^{n}$, where the term sufficient depends on the particular application or numerical method being applied.  Such a system is said to be in explicit form.  If the form (1) is not possible, then the system is said to be in implicit form and must be written as
where $x\in\mathbb{R}^{n}$, $y\in\mathbb{R}^{m}$ and $F:\mathbb{R}^{m+n}\to\mathbb{R^{m}}$.  We will assume that every system we encounter is explicit unless otherwise stated.  For example,
is in explicit form and
is in implicit form and can not be put into explicit form by using elementary functions and operations alone.  Note that we do not limit ourselves to purely algebraic equations and allow for analytic constructs such as definite integrals and special functions defined by power series; however, we do not allow for equations that involve integrals and derivatives of $y$ (in particular, we are solving for variables or numbers, not functions, the topic of later posts covering integral and differential equations).  In any case, it is easy to find examples of purely algebraic equations that cannot be put into explicit form.

In general, for there to be any hope of a solution to (1) it is required that $m\leq n$ (if $m>n$ then $F$ is not surjective and hence there may be no $x\in\mathbb{R}^{n}$ such that $F(x)=0$).  If $m=n$ then we can expect any solution (if one exists) to be unique.  If $m<n$, then there could be many solutions and the system is said to be overparameterized (in such a case, we can often introduce $n-m$ additional equations called constraint conditions so that we obtain a new system $G(x)=0$ where $G\in\mathbb{R}^{m+(n-m)}=\mathbb{R}^{n}$).  In the special case that $F$ is linear (i.e., $F$ can be represented by matrix multiplication), then these dimensional heuristics become actual theorems and much more can be definitively stated.  In any case, we shall come to these topics in future posts on systems of both linear and nonlinear equations.

For the remainder of this post, we shall assume both $m,n=1$ (we will not deal with the case $n>1$ here because this necessarily leads us to consider constraints, which in turn leads us to consider systems of equations where $m>1$).

- Single Equations in Explicit Form -

Let us consider an explicit equation in the form of (1) for $m,n=1$.  We shall consider several numerical methods for this problem:
  1. Bisection
  2. Fixed Point
  3. Newton-Raphson
  4. Secant Line
Each of these methods belongs to the class called iterative methods, as opposed to direct methods which are really only applicable to systems of linear equations where systematic direct procedures are possible because of certain theorems from linear algebra (indeed, if we could solve (1) directly, we would not need to resort to numerical methods, and there is no general procedure for solving systems of nonlinear equations in a systematic and direct fashion).

There are some general considerations for iterative methods that need to be made, which are applicable in all situations, among them include:
  1. What are the requisite requirements (including regularity conditions) the problem must first satisfy?
  2. How fast does the method converge?
  3. What situations can lead to numerical instability for the method?
  4. How sensitive is the method to choice of initial approximation?
We shall address these considerations when reviewing each of the methods listed above.

- Bisection Method -

Let $F:\mathbb{R}\to\mathbb{R}$ be continuous on $[a,b]$.  The intermediate value theorem from elementary point-set topology states that for such an $F$, if $F(a)\leq0$ and $F(b)\geq0$, then there exists some $x'\in[a,b]$ such that $F(x')=0$.  Although the method we are about to describe will converge if $x'$ is not unique, we assume that it is (this can be often be justified from considerations of the problem).
Assuming that $F(a)$ and $F(b)$ have opposite signs and are non-zero (if either is equal to $0$ then we are done), we sample a point $x_{0}\in(a,b)$ and compute $F(x_{0})F(a)$ and $F(x_{0})F(b)$.  By the intermediate value theorem, $F(x_{0})$ has the opposite sign from either $F(a)$ or $F(b)$, and we define $I_{1}=(x_{0},b)$ or $I_{1}=(a,x_{0})$ according to whether
holds.  We then iterate this procedure, obtaining a sequence of intervals $\{I_{n}\}_{n\geq0}$ with $x'\in I_{n}$ for every $n$ and $|I_{n}|\to0$ as $n\to\infty$.  In principle we would stop once $F(x_{n})=0$, although we usually use a condition like
We have thus established the main requirements for our method (Consideration #1): continuity of $F$ (so that the intermediate value theorem holds) and determination of an interval $I_{0}=[a,b]$ where $x'\in(a,b)$ and $F(a)F(b)<0$.  Furthermore, in order for the method to converge to the root applicable to the problem in mind (in the event that $F$ has multiple roots in its domain), we have the additional requirement that $I_{0}$ be chosen so that $F$ has only one root in $I_{0}$ (it is clear that the method will converge to a root as long as one exists in $I_{0}$).

To address Considerion #2, we present the following theorem.
Theorem 1Suppose that $F\in\mathcal{C}[a,b]$, that $F(a)F(b)<0$, and that $x_{0}=\frac{b+a}{2}$ (the midpoint of $[a,b]$).  Then the bisection method generates a sequence of points $\{x_{n}\}_{n=0}^{\infty}\subset[a,b]$ such that $x_{n}\to x'$ where $F(x')=0$.  Moreover, we have the quantitative bound $$|x_{n}-x'|\leq\frac{b-a}{2^{n}}$$ for every $n\geq1$.
Proof.  We assume that the midpoint is taken to advance each step of the iteration, i.e. $x_{n}=\frac{1}{2}(b_{n}+a_{n})$ for all $n\geq1$ is the mid-point of $I_{n}=[a_{n},b_{n}]$.  In that case, since for all $n\geq1$
and $x'\in(a_{n},b_{n})$, we have
In particular,
It is instructive to solve for $n$ for a given error tolerance $\epsilon$.  We have
thus taking logarithms (which is valid since $\log x$ is an increasing function) we obtain
$$n\leq C\log\left(\epsilon^{-1}(b-a)\right)$$
where $C=(\log(2))^{-1}$.  If $\epsilon=10^{-6}$, then $$N(\epsilon)\approx20+C\log(b-a),$$ thus if $(b-a)=1$ then at least 20 iterations are required before $|x_{n}-x'|<10^{-6}$ can be guaranteed.  In practice, the number of required iterations will usually be substantially lower, but even so, as we shall see there are methods which are guaranteed to converge much faster provided additional regularity conditions on $F$ are satisfied.

As for Considerations #3 and #4, it should be evident that the bisection method will always converge as long as $F$ is continuous.  Moreover, the approximating sequence is guaranteed to stay bracketed within in the initial interval $[a,b]$ for all $n\geq 1$, and the convergence is not affected asymptotically by the initial choice $x_{0}$.  This is one of the great virtues of the bisection method and one of the reasons that it is frequently used to start some of the more sensitive iteration schemes that we will describe below.

To illustrate the bisection method and its convergence properties, we will implement it in C++ to solve some equations that we will also solve with other methods to be discussed below.  There are by now many ways in C++ to represent functions for numerical methods.  Clasically, this was done (in C) using function pointers and (in C++) or functors (objects with the () operator overloaded) or generically via templates.  C++11 and prior to C++11 the Boost library) offers a number of convenient alternatives including functional programming constructs like variadic templates and lambda expressions, as well as objected oriented constructs like function objects and argument binding.  Throughout this series, we will apply all of these old and new methodologies.  We shall also use MATLAB to prototype and/or check our methods in C++ (and in some cases use the MATLAB API in our C++ programs).

#pragma once

#include <functional>
#include <numeric>

double bisection_method(std::function<double(double x)>& F, int& error, double a = 0, double b = 1, double eps = 0.0000001);
int sgn(double x);

double bisection_method(std::function<double(double x)>& F, int& error, double a, double b, double eps)
 if (!F) {
  error = 1;
  return 1;
 if (sgn(F(a)) == sgn(F(b))) {
  error = 2;
  return 1;
 if (b < a) 
  std::swap(a, b);
 int N = round(log((b - a) / eps) / log(2)) + 1;
 double x_n = (a + b) / 2;
 double x = x_n + 1;
 int n = 0;
 while ((n <= N) && (abs(x_n - x) > eps)) {
  x = x_n;
  if (sgn(F(a))*sgn(F(x)) < 0) { // update b_n
   b = x;
   x_n = (a + x) / 2;
  } else { // update a_n
   a = x;
   x_n = (x + b) / 2;
 if (n == N)
  error = 3;
  error = 0;
 return x;

int sgn(double x)
 return x < 0 ? -1 : 1;

#include <iostream>
#include <cstdlib>
#include <functional>
#include <cmath>
#include "bisection_method.h"

int main(void)
 int error = 0;
 std::function<double(double)> F = [](double x) { return 1 - exp(x); };
 double x = bisection_method(F, error, -1, 3);
 std::cout << "Root estimated to be x' = " << x << " (Error Code: " << error << ")" << std::endl;
 return 0;

In this code we have asked our solver to find the root of $$F(x)=1-e^{x}$$
which it determines within $\epsilon=10^{-6}$ to be $x'=0$.  Note that the return is with error code $0$, indicating that the bisection method succeeded before the theoretical upper-bound on the number of iterations $N(\epsilon)$. The following figure illustrates the situation - observe that the root $x'=0$ is a fixed point of $F$. We shall discuss this situation further when we get to fixed-point iteration.