28 October, 2015

Measure Theoretic Approach to Linear Regression

In this post we discuss the basic theory of linear regression, mostly from the perspective of probability theory.  At the end we devote some time to statistical applications.  The reason for focusing on the probabilistic point of view is to make rigorous the actual use of linear regression in applied settings; furthermore, the approach clarifies what linear regression actually is (and isn't). Indeed, there seem to be wide gaps in understanding among practitioners, probably due to the ubiquitous use of the methodology and the need for less mathematically inclined people to use it.

Regardless of the perspective, we have a probability (sample) space $(\Omega,\mathcal{F},\mathbb{P})$ and real-valued $\mathcal{F}$-measurable random variables $X$ and $Y$ defined on $\Omega$.  What differentiates the probabilistic and statistical points of view is what we know about $X$ and $Y$.  In the former, we know their distributions, i.e. all of the statistical properties of $X$ and $Y$ on their (continuous) range of values.  In the latter case we have only samples $\{(x_{n}, y_{n})\}_{n\geq0}$ of $X$ and $Y$.  In regression theory, the goal is to express a relationship between $X$ and $Y$ and associated properties, a much easier task from the probabilistic point of view since we have parametric (or at least fully determined) distributions.  From the statistical point of view, due to the lack of a complete distribution, we have to make certain modeling, distributional, and even sampling assumptions in order to make useful inferences (without these assumptions, the inferences we would be able to make would be far too weak to serve any useful purpose).  We will discuss some of these assumptions that are typical in most regression applications later.  For now, we focus on $X$ and $Y$ with known distribution measures $\mu_{X}$ and $\mu_{Y}$.


24 October, 2015

Analyzing the Definition of Independence

One of the most fundamental concepts of probability theory is that of independence.  The concept is intuitive and captures the idea that two experiments are independent if the outcome of one does not affect the outcome of the other.  What we mean here by experiment is a measurable space $\Omega$ (called the sample space) consisting of points $\omega\in\Omega$ that represent all the possible outcomes of our experiment, and a $\sigma$-algebra $\mathcal{F}$ consisting of all possible combinations of outcomes (events) represented by subsets $A\subseteq\Omega$.  Additionally, there is a measure $\mathbb{P}$ that assigns $1$ to the entire sample space $\Omega$ and that is countably additive on $\mathcal{F}$ in the sense that whenever $\{A_{n}\}_{n\geq0}$ is an at-most countable set of disjoint events of $\mathcal{F}$ we have $\mathbb{P}(\cup_{n} A_{n})=\sum_{n}\mathbb{P}(A_{n})$.  We then have the following formal definition of independence:
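Presumably the definition intended is the standard one: two events $A,B\in\mathcal{F}$ are independent if
$$\mathbb{P}(A\cap B)=\mathbb{P}(A)\,\mathbb{P}(B),$$
and more generally a collection of events $\{A_{j}\}_{j\in J}$ is (mutually) independent if $\mathbb{P}(A_{j_{1}}\cap\cdots\cap A_{j_{k}})=\prod_{m=1}^{k}\mathbb{P}(A_{j_{m}})$ for every finite subcollection $\{j_{1},\ldots,j_{k}\}\subseteq J$; random variables are called independent when the $\sigma$-algebras they generate are.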


12 July, 2015

Gaussian Form of Geometric Brownian Motion

In numerical analysis, it can in some cases be more efficient to evaluate a function when it is expressed in a certain form.  For example, it is numerically more efficient to compute a second degree polynomial
$$P_{2}(x)=a_{2}x^{2}+a_{1}x+a_{0}$$
 when it is in the form
$$P(x)=x(a_{2}x+a_{1})+a_{0}.$$
That is because evaluating the former requires three multiplication operations and two addition operations, whereas the latter requires only two multiplication operations and two addition operations.  This way of evaluating polynomials numerically is known as Horner's Method, which describes the general formulation for a polynomial of degree $n$ and coefficients $\{a_{j}\}_{0\leq j\leq n}.$  A simple implementation of the algorithm in C can be found here.  In general, evaluation of an $n$-degree polynomial $P(x)$ will require $2n-1$ multiplication operations and $n$ addition operations using the naïve method, whereas it will require just $n$ multiplication operations and $n$ addition operations using Horner's Method.  Although from the point of view of algorithmic analysis both methods are equivalent (they both require $O(n)$ operations to compute $P(x)$), on a practical/implementation level, Horner's Method requires only $2n$ operations whereas the naïve approach requires $3n-1$ operations.  On the other hand, since the savings come from only the multiplication operations, the performance gain (unless each $a_{j}$ and $x$ are all integers) is somewhat tempered by the fact that for most computing hardware floating point multiplication is much faster than floating point addition (the opposite is generally true for integral arithmetic).  Note also that if $x^{2}$ is expressed explicitly as $x*x$, then a modern compiler will very likely optimize the naïve expression into something similar to Horner's Method.  However, for high degree polynomials, expanding the powers with the multiplication operator like this is not feasible without producing very ugly, static, and difficult to maintain code - a power function of some sort is likely to be used, and it is then significantly less likely that the compiler will be able (or, if it can, choose) to optimize the resulting code.
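To make the operation counts concrete, here is a minimal C++ sketch of Horner's Method next to the naïve evaluation (the function names and the test polynomial are just illustrative):

#include <cmath>
#include <cstdio>
#include <vector>

// Evaluate P(x) = a[0] + a[1] x + ... + a[n] x^n by Horner's Method:
// n multiplications and n additions.
double horner(const std::vector<double>& a, double x) {
    double result = 0.0;
    for (auto it = a.rbegin(); it != a.rend(); ++it)
        result = result*x + *it;   // peel off one coefficient per step
    return result;
}

// Naive evaluation for comparison (uses std::pow for the powers).
double naive(const std::vector<double>& a, double x) {
    double result = 0.0;
    for (std::size_t j = 0; j < a.size(); ++j)
        result += a[j]*std::pow(x, static_cast<double>(j));
    return result;
}

int main() {
    std::vector<double> a = {1.0, -3.0, 0.5, 2.0};   // 2x^3 + 0.5x^2 - 3x + 1
    std::printf("Horner: %f  Naive: %f\n", horner(a, 1.7), naive(a, 1.7));
    return 0;
}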


23 March, 2015

A Primer in Harmonic Analysis



I picked these problems from Modern Fourier Analysis Vol I - I think that they serve as a good primer for the basic techniques and theorems in harmonic analysis (a subject that I have recently started looking back into in order to deal with some of the techniques used when working with Levy processes in mathematical finance).
Problem I.  Fix $d\geq1$ and suppose $\psi:(0,\infty)\mapsto[0,\infty)$ is $C^{1}$, non-increasing, and $\int_{\mathbb{R}^{d}}\psi(|x|)\;dx\leq A<\infty.$  Define
$$[M_{\psi}f](x):=\sup_{0<r<\infty}\frac{1}{r^{d}}\int_{\mathbb{R}^{d}}|f(x-y)|\psi\left(\frac{|y|}{r}\right)\;dy$$
and show that $$[M_{\psi}f](x)\leq A[Mf](x)$$
where $M$ is the usual Hardy-Littlewood maximal function.
Solution.  We first observe that the translation invariance of the indicated estimate implies that it is sufficient to prove the case $x=0$ (this can be seen more explicitly by replacing $f$ by $\tau_{x}f$, where $\tau_{x}$ is the translation-by-$x$ operator, and applying the case $x=0$ to see that the estimate holds for all $x$).   For convenience let us define $\psi_{r}(|y|)=r^{-d}\psi(|y|/r)$.  The radial properties of the terms in the estimate suggest polar coordinates will be useful in dealing with the resultant integrals.  Let us recall the following consequence of the polar coordinates formula:
$$\frac{d}{ds}\int_{B(0,s)}f(y)\;dy=\frac{d}{ds}\int_{0}^{s}dt\int_{\partial B(0,t)}f(\omega)\;dS(\omega)=\int_{\partial B(0,s)}f(\omega)\;dS(\omega)=s^{d-1}\int_{S^{d-1}}f(s\omega)\;dS(\omega).$$


22 March, 2015

Divergence of Harmonic Series on a Sequence of Decreasing Sub-Domains of $\mathbb{N}$


The series $\sum_{n\in\mathbb{N}}n^{-p}$ diverges if $p\leq1$ and converges if $p>1$, and so it may seem plausible that (being a "bifurcation point" of this condition) the harmonic series $\sum_{n\in\mathbb{N}}n^{-1}$ could converge when summed over some proper subset $A\subset\mathbb{N}$. This is obvious if $A$ is finite. If $A$ is infinite, then a moment's thought reveals that there are many subsets on which the harmonic series converges, since among its terms the harmonic series contains the terms of any other series of reciprocals of positive integers. So for instance $$\sum_{n=1}^{\infty}n^{-2}=\frac{\pi^{2}}{6},$$ $$\sum_{n=1}^{\infty}\frac{1}{n!}=e-1,$$ $$\sum_{n=1}^{\infty}\frac{1}{2^{n}}=1,$$ and so on. Given that rather "large" subsets of $\mathbb{N}$ lead to convergence of the harmonic series, the following result was somewhat surprising to me when I was first asked to prove it.
Claim. Let $$A_\epsilon := \{a \in \mathbb{N} : 1 - \cos(a) < \epsilon\}.$$ Then $$\sum_{n\in A_\epsilon } \frac{1}{n}$$ diverges for all $0<\epsilon<1.$
Proof.  For $0<\epsilon< 1$, the inequality $1-\cos(a)< \epsilon$ has solutions for $$a\in(2k\pi-\theta,2k\pi+\theta)$$ where $\theta=\cos^{-1}(1-\epsilon)$ (note that $\theta\in(0,\frac{\pi}{2})$ and by using a Taylor expansion, it is easy to see $\theta=O(\epsilon^{\frac{1}{2}})$, although all that is important is $\theta\to0$ as $\epsilon\to0$). For there to be any positive integers $a:=a_{k}$ in such an interval, it is necessary and sufficient that
$$\frac{[2k\pi-\theta]}{[2k\pi+\theta]}<1,$$
where $[\cdot]$ is the "floor" function (round down, i.e. truncate the decimal part). Intuitively, this condition just says there is an integer in the $k$th solution interval (note that there could be multiple integral solutions in a $k$th interval, though this is not very important since we are mostly interested in the case of small $\epsilon$; furthermore, since $\theta=O(\epsilon^{\frac{1}{2}})$, once $\epsilon$ is sufficiently small (say $\epsilon<0.1$, so that $2\theta<1$), each solution interval contains at most one integer).

From the above observations and the fact that $2\pi<6.3$ (circumference of the unit circle), it is not difficult to ascertain that $\#A=\infty$, where $\#A$ denotes the cardinality of the set $A:=A_{\epsilon}$.  Therefore $A$ is countably infinite with its elements forming an "approximate" arithmetic sequence of integers in the sense that for
$$D:=\max_{a_{i}\in A}|a_{i+1}-a_{i}|<\infty,$$
on "average" the difference of two successive integers is approximately $D$ (having analytical results that are sharp is unnecessary in the present situation as we are only after qualitative facts like convergence).

We can now determine whether or not the sum converges. Define sequences $a_{j}:=\frac{1}{j}$ for $j\in A $, and $0$ otherwise, and $b_{j}:=\frac{1}{j}$ for all $j=1,2,\ldots.$ Then $c_{j}:=\frac{a_{j}}{b_{j}}=1$ for $j\in A$, and $0$ otherwise. Therefore, $c_{j}$ has a sum which looks like $$1+0+\ldots+0+1+0+\ldots+0+1+\ldots$$ Define one more sequence $d_{j}:=1$ if $j\in\{a_{1}+kD:k=0,1,2,\ldots\}$ ($a_{1}$ being the first integral solution to the original inequality) and $0$ otherwise (in other words, $d_{j}$ really is the indicator of an arithmetic sequence with common difference $D$). Recall from the theory of Cesaro summation that for a spacing of $D$ zeros between consecutive ones, $$\frac{1+0+\ldots+0+1+0+\ldots+0+\ldots+0+1_{n}}{n}\to\frac{1}{D+1}\;\text{as}\;n\to\infty$$ (note because Cesaro summation is an averaging process, the limit holds even if there is improper spacing for a finite number of terms). Consequently, $$\frac{d_{1}+\ldots+d_{n}}{n}\to\frac{1}{D+1}\;\text{as}\;n\to\infty$$ (see previous parenthetical remark). Consequently, \begin{align*} \lim\limits_{n\to\infty}\frac{1}{n}\sum\limits_{j=1}^{n}\frac{a_{j}}{b_{j}} &=\lim\limits_{n\to\infty}\frac{1}{n}\sum\limits_{j=1}^{n}c_{j}\\ &\geq\lim\limits_{n\to\infty}\frac{1}{n}\sum\limits_{j=1}^{n}d_{j}\\ &=\frac{1}{D+1}\\ &>0 \end{align*} for all $\epsilon>0$, no matter how small (note that $D$ behaves something like $O(\theta^{-1})$, and by extension something like $O(\epsilon^{-\frac{1}{2}})$). It follows that $$\sum\limits_{j=1}^{\infty}a_{j}=\infty,$$ i.e. the series diverges for every $\epsilon>0$ (if you don't see why, or don't recognize the convergence theorem used, just apply the summation by parts formula to $\sum a_{j}$ together with the established bound).
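Though of course not a substitute for the proof, the divergence is easy to glimpse numerically.  The following C++ sketch (the cutoff and the values of $\epsilon$ are arbitrary) accumulates the partial sums of $\sum_{n\in A_{\epsilon}}\frac{1}{n}$; the partial sums keep creeping upward, roughly logarithmically in the cutoff, as the positive-density argument above suggests.

#include <cmath>
#include <cstdio>

int main() {
    const double eps[] = {0.5, 0.1, 0.01};
    for (double e : eps) {
        double sum = 0.0;
        long count = 0;
        for (long n = 1; n <= 10000000L; ++n) {
            if (1.0 - std::cos(static_cast<double>(n)) < e) {  // n belongs to A_epsilon
                sum += 1.0/n;
                ++count;
            }
        }
        std::printf("epsilon=%.2f  #terms=%ld  partial sum=%.4f\n", e, count, sum);
    }
    return 0;
}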

21 March, 2015

Does the Trigonometric Harmonic Series Converge?



It is well known that the harmonic series $H(x)=\sum_{n=1}^{\infty} xn^{-1}$ diverges for every $x\neq0$, but what about the trigonometric harmonic series $T(x)=\sum_{n=1}^{\infty}e^{inx}n^{-1}$?  Obviously for $k=0,1,2,\ldots$ we have $T(2k\pi)=H(1)=+\infty$.  It is an interesting fact that the cancellation properties inherent in $T$ imply convergence for every other $x$.  This is relatively straight-forward to prove using a modification of Leibniz's alternating series test (or, as below, Dirichlet's test); note that the convergence is only conditional, since the terms have absolute value $\frac{1}{n}$.  More remarkable is that the sum can actually be computed in closed form in terms of elementary functions.

In order to investigate the convergence of
$$(1)\;\;\;\;\;T(x)=\sum_{n=1}^{\infty}\frac{e^{inx}}{n}<\infty,$$
first note that
$$|z^{n}|\to0\;\text{as}\;n\to\infty$$
for every $z\in\mathbb{C}$ with $|z|<1$.  Since
$$1\geq\frac{1}{n}>\frac{1}{n+1}>0$$ for all $n\geq1$, we find $\frac{1}{n}\searrow0$ (monotonically decreases to zero), and comparison with the geometric series shows that
$$(2)\;\;\;\;\;\sum\limits_{n=1}^{\infty}\frac{z^{n}}{n}<\infty,$$
the convergence being absolute for every $z$ with $|z|<1$.  To deal with the boundary $|z|=1$, note that if $|z|=1$ and $z\neq1$ (i.e. $z\neq1+0i$), then we have
$$\left|\sum_{n=1}^{N}z^{n}\right|=\left|\frac{z(1-z^{N})}{1-z}\right|\leq\frac{2}{|1-z|}<\infty.$$
The upper bound $M=\frac{2}{|1-z|}$ is independent of $N$, and since $\frac{1}{n}\searrow0$, Dirichlet's test implies that (2) holds for all $|z|\leq1$ except $z=1$, the convergence on the boundary being conditional rather than absolute.  Putting $z=e^{ix}$ shows that (1) converges for every $x\neq 2k\pi$ ($k\in\mathbb{Z}$).

To carry out the actual summation for $T(x)$ is a tedious exercise in complex analytic methods, and the resulting formulas are unworkable (although again rather remarkably, they contain only elementary functions).  Another approach is to recognize that $T(x)$ is the Fourier series of some periodic function with Fourier coefficients $\hat{f}(0)=0$ and, for $n\geq1$,
$$\hat{f}(n)=\frac{1}{n}.$$
Despite this, the computation is relatively straight-forward for certain values of $x$.  For example, take $x=1$ and note that
$$T(1)=\sum_{n=1}^{\infty}\frac{e^{in}}{n}.$$
Writing
$$\int\left(\underbrace{(e^{iz})^{1}+(e^{iz})^{2}+\ldots}_{\text{geometric series with ratio }r=e^{iz}}\right)dz=\int\frac{e^{iz}}{1-e^{iz}}\;dz,$$
we find that (with $u=1-e^{iz}$, so that $e^{iz}\,dz=i\,du$)
$$\sum_{n=1}^{\infty}\frac{1}{in}(e^{iz})^{n}=i\int\frac{du}{u}=i\ln(1-e^{iz}),$$
that is, $\sum_{n=1}^{\infty}\frac{(e^{iz})^{n}}{n}=-\ln(1-e^{iz})$.  Combining all of this together, we obtain
$$\begin{align*}
T(1)
&=\left(-\ln(1-e^{iz})\right)\Big|_{z=1}\\
&=-\ln\left(e^{i/2}\left(e^{-i/2}-e^{i/2}\right)\right)\\
&=-\ln\left(e^{i/2}\right)-\ln\left(-2i\sin\left(\frac{1}{2}\right)\right)\\
&=-\frac{i}{2}-\ln(-i)-\ln\left(2\sin\left(\frac{1}{2}\right)\right)\\
&=-\ln\left(2\sin\left(\frac{1}{2}\right)\right)+i\,\frac{\pi-1}{2}.
\end{align*}$$
Since
$$T(1)=\sum_{n=1}^{\infty}\left(\frac{\cos n}{n}+i\frac{\sin n}{n}\right)$$
and $2\sin\left(\frac{1}{2}\right)=\sqrt{2-2\cos(1)}$, taking real and imaginary parts yields
$$\sum_{n=1}^{\infty}\frac{\cos n}{n}=\frac{-\ln(2-2\cos(1))}{2}\;\;\text{and}\;\;\sum_{n=1}^{\infty}\frac{\sin n}{n}=\frac{\pi-1}{2},$$ i.e. $$T(1)=\frac{-\ln(2-2\cos(1))}{2}+i\frac{\pi-1}{2}.$$
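As a quick numerical sanity check of the closed form (not needed for the argument), one can compare partial sums of $T(1)$ against the value just obtained; a minimal C++ sketch:

#include <cmath>
#include <complex>
#include <cstdio>

int main() {
    const std::complex<double> I(0.0, 1.0);
    const long N = 1000000;                       // number of terms in the partial sum
    std::complex<double> partial(0.0, 0.0);
    for (long n = 1; n <= N; ++n)
        partial += std::exp(I*static_cast<double>(n))/static_cast<double>(n);

    // Closed form: T(1) = -ln(2 - 2cos(1))/2 + i(pi - 1)/2.
    const double pi = std::acos(-1.0);
    const std::complex<double> closed(-0.5*std::log(2.0 - 2.0*std::cos(1.0)), (pi - 1.0)/2.0);

    std::printf("partial sum: %.6f + %.6fi\n", partial.real(), partial.imag());
    std::printf("closed form: %.6f + %.6fi\n", closed.real(), closed.imag());
    return 0;
}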

The graphic at the beginning of the post shows the graph of $\sin n/n$ on the $(n,x)$ plane.

20 March, 2015

A Rigorous Proof of Ito's Lemma

In this post we state and prove Ito's lemma.  To get directly to the proof, go to II Proof of Ito's Lemma.

For all its importance, Ito's lemma is rarely proved in finance texts, where one often finds only a heuristic justification involving Taylor's series and the intuition of the "differential form" of the lemma.  There are various reasons for this.  Ito's lemma is really a statement about integration, not differentiation.  Indeed, differentiation is not even defined in the realm of stochastic processes due to the non-differentiability of Brownian paths.  Thus, in order to present a proof of Ito's lemma, one must first cover stochastic integrals and prior to that the basic properties of Brownian motion, topics which for reasons of scope/audience cannot always be covered.  However, even more mathematically inclined texts often provide only a sketch and skirt the technical details of convergence.  The purpose of this article is to remedy this situation, and we begin with

I. MOTIVATION AND A REVIEW OF ORDINARY CALCULUS

If $f$ is $k+1$ times differentiable then Taylor's theorem asserts
$$(1)\;\;\;\;f(t+h)-f(t)=hf'(t)+\frac{h^{2}}{2}f''(t)+\ldots+\frac{h^{k+1}}{(k+1)!}f^{(k+1)}(t^{*})$$
where $t^{*}\in[t,t+h]$ if $h>0$ and $t^{*}\in[t+h,t]$ if $h<0$.

Fix $T>0$ ($T$ not necessarily small) and consider the difference $f(T)-f(0)$.  This can be computed as a sum of non-overlapping differences, i.e. if $\Pi=\{t_{0}=0,t_{1},\ldots,t_{n}=T\}$ is a partition of $[0,T]$, then with the aid of (1) using $h=t_{i+1}-t_{i}$, we get

$$\begin{align*}
(2)\;\;\;\;f(T)-f(0)&=\sum_{i=0}^{n-1}f(t_{i+1})-f(t_{i})\\
&=\sum_{i=0}^{n-1}f'(t_{i})(t_{i+1}-t_{i})+\frac{1}{2}\sum_{i=0}^{n-1}f''(t_{i})(t_{i+1}-t_{i})^{2}+\sum_{i=0}^{n-1}o\left((t_{i+1}-t_{i})^{2}\right).\end{align*}$$

As $n\to\infty$ (or $||\Pi||\to0$, i.e. $\max_{i}(t_{i+1}-t_{i})\to0$), we get
$$\sum_{i=0}^{n-1}f'(t_{i})(t_{i+1}-t_{i})\to\int_{0}^{T}f'(s)\;ds$$
and for $k\geq2$
$$\left|\frac{1}{k!}\sum_{i=0}^{n-1}f^{(k)}(t_{i})(t_{i+1}-t_{i})^{k}\right|\leq||\Pi||^{k-1}\sum_{i=0}^{n-1}|f^{(k)}(t_{i})|(t_{i+1}-t_{i})\to0\cdot\int_{0}^{T}|f^{(k)}(s)|\;ds=0.$$

That is, $f(T)-f(0)=\int_{0}^{T}f'(s)\;ds$, which is the second fundamental theorem of calculus.  Now suppose $f$ and $g$ are smooth functions with $k+1$ derivatives and consider the composition $h=f\circ g$.  The familiar chain rule implies $h$ is differentiable and that
$$(3)\;\;\;\;h'(t)=f'(g(t))g'(t).$$

By substituting $h$ into (2) and computing $h^{(k)}$ iteratively according to (3), we get
$$(4)\;\;\;\;f(g(T))-f(g(0))=\int_{0}^{T}f'(g(x))g'(x)\;dx.$$

We shall now see what happens when $g$ is not differentiable.  In that case, $h$ is not differentiable, and (1) through (4) are no longer valid.  However, we can write (4) instead as
$$(5)\;\;\;\;f(g(T))-f(g(0))=\int_{0}^{T}f'(g(x))\;dg$$
where the integral is now taken as a Riemann-Stieltjes integral.  If $g$ is differentiable, then (5) reduces to (4), but (5) still makes sense even if $g$ is merely continuous (continuity is needed since $\int h\;dg$ is not well-defined if $h$ and $g$ share a common discontinuity, and $h=f(g(t))$ will in general be discontinuous wherever $g$ is).  Moreover, since $f$ is smooth, we may rewrite (2) as
$$\begin{align*}
(6)\;\;\;\;f(g(T))-f(g(0))&=\sum_{i=0}^{n-1}f(g(t_{i+1}))-f(g(t_{i}))\\
&=\sum_{i=0}^{n-1}f'(g(t_{i}))(g(t_{i+1})-g(t_{i})) + \frac{1}{2}\sum_{i=0}^{n-1}f''(g(t_{i}))(g(t_{i+1})-g(t_{i}))^{2}+\ldots\end{align*}$$

Despite $g$ being non-differentiable, if it is sufficiently "nice" then the terms converge to the same values as in (2) and we will recover (5).  A useful sufficient condition is that $g$ be continuous and of bounded variation.  This means
$$[g](T)=\sup_{\Pi}\sum_{i\in\Pi}|g(t_{i+1})-g(t_{i})|<\infty.$$
It is easy to prove that if $g$ is continuously differentiable, then it is of bounded variation, since an easy application of the above (or the mean-value theorem) gives (for a norm decreasing sequence of partitions $\Pi_{1},\Pi_{2},\ldots$)
$$[g](T)=\lim_{n\to\infty}\sum_{j\in\Pi_{n}}|g(t_{j+1})-g(t_{j})|=\int_{0}^{T}|g'(t)|\;dt<\infty.$$
For the existence of $\int f\;dg$, the most commonly used sufficient condition is that $g$ be of bounded variation and share no common discontinuities with $f$, though this is not strictly necessary.  When $g$ is not of bounded variation, $\int f\;dg$ may or may not exist, and its value may even depend on the particular sample points used in the approximating sums, as we shall see below.

Now, the Ito lemma deals with the special case $g(t)=W(t)$ where $W$ is a Brownian motion sample path.  It turns out that, for almost every $\omega$,
$$[W](T)=\infty,$$
and
$$[W,W](T):=\sup_{\Pi}\sum_{i\in\Pi}|W(t_{i+1})-W(t_{i})|^{2}=\infty.$$
The latter quantity is called the quadratic (or second) variation of $W$.  For continuous functions $g$ of bounded variation we have $[g,g](T)=0$ when the sum is taken along partitions whose mesh tends to zero (this follows from the estimate $\sum_{i}|g(t_{i+1})-g(t_{i})|^{2}\leq\max_{i}|g(t_{i+1})-g(t_{i})|\cdot[g](T)$, as in the treatment of the higher order terms in (2)).  Moreover, along such partitions,
$$[W]^{(3)}(T):=\lim_{||\Pi||\to0}\sum_{i\in\Pi}|W(t_{i+1})-W(t_{i})|^{3}=0.$$
In fact, we have $[W]^{(\alpha)}(T)=\infty$ for $\alpha\leq2$ (with the supremum definition) and $[W]^{(\alpha)}(T)=0$ for $\alpha>2$ (along partitions with vanishing mesh), a.s. $\omega$.  It would seem that the regularity on which integration theory depends so directly (i.e. variation of the integrator) is not tractable for $W$.  It turns out though, that we can obtain something useful by weakening the definition slightly.  Let $\Pi_{1},\Pi_{2},\ldots$ be a sequence of partitions with $||\Pi_{n}||\to0$ as $n\to\infty$.  Then we can redefine the quadratic variation as
$$[W,W](T):=\lim_{n\to\infty}\sum_{i\in\Pi_{n}}|W(t_{i+1})-W(t_{i})|^{2}.$$
Unfortunately, even this is not well-defined without further qualification.  The reason that the supremum definition of the quadratic variation is a.s. infinite is that, for any $C>0$, it is possible to find a sequence of partitions $\{\Pi^{C}_{n}\}_{n}$ so that the above limit is equal to $C$ for some fixed sample path $\omega$.  However, the limit converges to $T$ in $L^{2}(\Omega)$ (or in probability, if you prefer).  That is to say, it converges in the $L^{2}$ norm to a random variable $Q(\omega)$ with $Q(\omega)=T$ a.s. $\omega$ (recall that $L^{2}$ limits are defined only up to a set of measure $0$).  It turns out that if we make the further restriction that $\Pi_{1}\subset\Pi_{2}\subset\Pi_{3}\subset\cdots$ (each partition refining the previous one) and that $\sum_{n=1}^{\infty}||\Pi_{n}||<\infty$, then the limit also holds $\omega$ a.s. pointwise (Borel-Cantelli).  In the remainder of this post we will not distinguish between these modes of convergence and state freely that $[W,W](T)=T$, without further reference to these technicalities.

Since $[W](T)=\infty$ and $[W,W](T)=T$, we must take care in computing the various limits appearing in
$$\begin{align*}(7)\;\;\;\;f(W(T))-f(W(0))&=\sum_{i=0}^{n-1}f(W(t_{i+1}))-f(W(t_{i}))\\
&=\sum_{i=0}^{n-1}f'(W(t_{i}))(W(t_{i+1})-W(t_{i})) + \frac{1}{2}\sum_{i=0}^{n-1}f''(W(t_{i}))(W(t_{i+1})-W(t_{i}))^{2}+\ldots\end{align*}$$

Since $[W,W](T)=T<\infty$, it follows that $[W]^{(k)}(T)=0$ for all $k\geq3$ by a simple estimate, as has been done several times above.  Thus the $\ldots$ terms can safely be ignored.  And since $\sup_{t\in[0,T]}|f''(W(t))|<\infty$, the second sum converges.  We shall see that it converges to
$$\int_{0}^{T}f''(W(s))\;ds.$$
(Incidentally, this is where the commonly used, though mathematically meaningless, notation $dWdW=dt$ comes from).  The first term also converges, though this is not immediately obvious since the Riemann-Stieltjes theory does not apply to it as the integrator $W(t)$ is not of bounded variation.  It turns out that it converges to
$$\int_{0}^{T}f'(W(s))\;dW$$
where the integral is what is known as an Ito integral.  This integral is constructed exactly like a Riemann-Stieltjes integral, except that the sample point used in the approximating sums must always be the left-hand point of the interval.  Different approximation schemes (e.g. mid-point, right-point, etc.) lead to different limiting values.  If the mid-point is used, the resulting limit is referred to as the Stratonovich integral.  We shall not need this integral here.  The reason that the Ito integral is used (i.e. left-hand point approximation) is that $f(W(t_{i}))$ is interpreted as the position we take in a stock at time $t_{i}$ with the information available at time $t_{i}$, and the capital gain on the stock is then $f(W(t_{i}))(W(t_{i+1})-W(t_{i}))$ if we assume the stock price follows a Brownian motion (which strictly speaking it doesn't, but we shall ignore this fact here since it can be corrected by replacing $W$ with geometric Brownian motion $X$).  Summing the individual gains and taking the limit as $\max|t_{i+1}-t_{i}|\to0$ gives us the net capital gains on a portfolio resulting from taking positions $f(W(t))$ in continuous time.
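The following minimal C++ sketch (illustrative only; the seed and the number of steps are arbitrary) simulates a single Brownian path and computes the sampled quadratic variation together with the left-point and mid-point approximating sums for $\int_{0}^{T}W\,dW$.  The three quantities should come out close to $T$, $\frac{1}{2}(W(T)^{2}-T)$ and $\frac{1}{2}W(T)^{2}$ respectively, illustrating both that $[W,W](T)=T$ and that the choice of sample point genuinely changes the limit.

#include <cmath>
#include <cstdio>
#include <random>
#include <vector>

int main() {
    const double T = 1.0;
    const long N = 100000;               // number of partition intervals (illustrative)
    const double dt = T/(2.0*N);         // simulate on a grid twice as fine to get midpoints
    std::mt19937_64 gen(42);             // arbitrary seed
    std::normal_distribution<double> Z(0.0, 1.0);

    std::vector<double> W(2*N + 1, 0.0);
    for (long k = 1; k <= 2*N; ++k)
        W[k] = W[k - 1] + std::sqrt(dt)*Z(gen);

    double qv = 0.0, ito = 0.0, strat = 0.0;
    for (long i = 0; i < N; ++i) {
        const double dW = W[2*i + 2] - W[2*i];   // increment over [t_i, t_{i+1}]
        qv    += dW*dW;                          // quadratic variation sample
        ito   += W[2*i]*dW;                      // left-point (Ito) sum
        strat += W[2*i + 1]*dW;                  // mid-point (Stratonovich) sum
    }

    const double WT = W[2*N];
    std::printf("[W,W](T)   ~ %.4f (expect %.4f)\n", qv, T);
    std::printf("Ito sum    ~ %.4f (expect %.4f)\n", ito, 0.5*(WT*WT - T));
    std::printf("Strat. sum ~ %.4f (expect %.4f)\n", strat, 0.5*WT*WT);
    return 0;
}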

In light of the above, we conclude that
$$(8)\;\;\;\;f(W(T))-f(W(0))=\int_{0}^{T}f'(W(s))\;dW(s)+\frac{1}{2}\int_{0}^{T}f''(W(s))\;ds.$$
Compare this to (4), and we see that we obtain one, and only one, extra term $\frac{1}{2}\int_{0}^{T}f''(W(s))\;ds$, which can be traced back to the fact that $[W,W](T)=T$ and $[W]^{(k)}(T)=0$ for $k\geq3.$  This is often recast in differential notation (which again, is mathematically meaningless)
$$(9)\;\;\;\;df=f'dW+\frac{1}{2}f''dt.$$

The mathematically meaningful form is (8), though (9) is used more often for calculations since it is accompanied by what is known as a "box" calculus that facilitates computations.  This will be discussed in more detail below.


II. PROOF OF ITO'S LEMMA

Let $\{W(t)\}_{t\geq0}$ be a standard Brownian motion with the natural filtration $\{\mathcal{F}_{t}\}_{t\geq0}$, and $f(x,t)\in\mathcal{C}^{2}(\mathbb{R}\times[0,T])$ jointly in $(x,t)$.  We will consider the stochastic process $\Delta(t)=f(W(t),t)$, which is clearly adapted to $\{\mathcal{F}_{t}\}_{t\geq0}.$

We take the following preliminary facts for granted, and defer to previous blog posts covering Brownian motion and stochastic integration for proofs.
  1. Almost surely, we have the variation formulas $[W]^{1}(t)=+\infty,[W]^{2}(t)=t$ and $[W]^{k}(t)=0$ for $k\geq3$.
  2. We have the convergence of $\lim_{||\Pi_{[0,T]}||\to0}\sum_{i=0}^{n-1}\Delta(t_{i})(W(t_{i+1})-W(t_{i}))$ for any continuous and adapted process $\Delta(t)$.  We denote this limit by $\int_{0}^{T}\Delta(t)\;dW(t)$ and refer to it as the Ito integral of $\Delta$.  The limit is taken in $L^{2}(\Omega)$ (and holds almost surely along suitable sequences of partitions; see Section III).
Theorem (Ito's Lemma).  With the notation above, we have for all $T>0$ $$\begin{align*}f(W(T),T)-f(W(0),0)=\\\int_{0}^{T}f_{t}(W(t),t)\;dt+\int_{0}^{T}f_{x}(W(t),t)\;dW(t)+\frac{1}{2}\int_{0}^{T}f_{xx}(W(t),t)\;dt.\end{align*}$$  We sometimes write for $f=f(W(t),t)$ $$df=f_{t}dt+f_{x}dW+\frac{1}{2}f_{xx}dt.$$

Proof.  Fix $T>0$ and let $\Pi=\{t_{0}=0,t_{1},\ldots,t_{n}=T\}$ be a partition of $[0,T]$ and compute using Taylor's expansion
$$\begin{align*}
f(W(T),T)-f(W(0),0)&=\sum_{i=0}^{n-1}(f(W(t_{i+1}),t_{i+1})-f(W(t_{i}),t_{i}))\\
&=\sum_{i=0}^{n-1}f_{t}(W(t_{i}),t_{i})(t_{i+1}-t_{i})\\
&+\sum_{i=0}^{n-1}f_{x}(W(t_{i}),t_{i})(W(t_{i+1})-W(t_{i}))\\
&+\frac{1}{2}\sum_{i=0}^{n-1}f_{xx}(W(t_{i}),t_{i})(W(t_{i+1})-W(t_{i}))^{2}\\
&+\sum_{i=0}^{n-1}O((t_{i+1}-t_{i})(W(t_{i+1})-W(t_{i})))\\
&+\sum_{i=0}^{n-1}O((t_{i+1}-t_{i})^{2})\\
&+\sum_{i=0}^{n-1}O((W(t_{i+1})-W(t_{i}))^{3})\\
&:= A+B+C+D+E+F.\end{align*}$$

The left hand side is unaffected by taking limits as $||\Pi||\to0$, and so we may do so in computing the right hand side terms.  Without loss of generality we assume $\Pi$ is uniform, so we consider equivalently $n\to\infty.$

The regularity of $f$ implies that
$$A\to\int_{0}^{T}f_{t}(W(t),t)\;dt\;\text{as}\;n\to\infty,$$
the integral being an ordinary Lebesgue (Riemann) integral.  By item 2 above we have
$$B\to\int_{0}^{T}f_{x}(W(t),t)\;dW(t)\;\text{as}\;n\to\infty,$$
the integral being an Ito integral as discussed here.  To deal with $D$, $E$ and $F$ we estimate (writing $X\ll_{\beta}Y$ to mean $X\leq\beta Y$ for a constant $\beta$ depending only on the bounds of $f$ and its derivatives on the relevant compact set)
$$|D|\ll_{\beta}\sup_{0\leq i\leq n}|W(t_{i+1})-W(t_{i})|\sum_{i=0}^{n-1}(t_{i+1}-t_{i})\ll_{\beta}T\sup_{0\leq i\leq n}|W(t_{i+1})-W(t_{i})|,$$
$$|E|\ll_{\beta}\sup_{0\leq i\leq n}|t_{i+1}-t_{i}|\sum_{i=0}^{n-1}(t_{i+1}-t_{i})\ll_{\beta}T\sup_{0\leq i\leq n}|t_{i+1}-t_{i}|,$$
and
$$|F|\ll_{\beta}\sup_{0\leq i\leq n}|W(t_{i+1})-W(t_{i})|\sum_{i=0}^{n-1}(W(t_{i+1})-W(t_{i}))^{2}.$$
Appealing to item 1 above and the (uniform) continuity of the maps $t\mapsto t$ and $t\mapsto W(t)$ on $[0,T]$, we then conclude that
$$D,E,F\to0\;\text{as}\;n\to\infty.$$
It remains to establish the limit
$$C\to\frac{1}{2}\int_{0}^{T}f_{xx}(W(t),t)\;dt\;\text{as}\;n\to\infty.$$
Intuitively this should be true since $[W]^{2}(T)=T,$ a fact that we sometimes write as $dWdW=dt.$  However, a rigorous proof requires some effort, and this is precisely the point in the proof (assuming Brownian motion and stochastic integration are covered) that almost every mathematical finance text skips over.  (Note that the theorem has already been proved in the special case that $f=p(x,t)$ is a second degree polynomial; as an example, consider the special case $f(x,t)=\frac{1}{2}x^{2}$ in order to compute the Ito integral $\int_{0}^{T}W(t)\;dW(t)$).

Because this fact is of interest in and of itself, we isolate the proof that $C\to\frac{1}{2}\int_{0}^{T}f_{xx}(W(t),t)\;dt\;\text{as}\;n\to\infty$ in the following lemma.

Lemma.  Let $f$ be a bounded continuous function on $\mathbb{R}$ and $\{W(t)\}_{t \geq 0}$ a standard one-dimensional Brownian motion. Then, in $L^{2}(\Omega)$ (and in the almost sure sense discussed in Section III), $$\sum_{i=0}^{n-1} f(W(t_{i}))(W(t_{i+1})-W(t_{i}))^{2}\to\int_{0}^{T}f(W(t))\;dt\;\text{as}\;n\to\infty$$ where $n\to\infty$ means (WLOG) $\Pi=\{t_{0}=0,t_{1},\ldots,t_{n}=T\}$ is a uniform partition of $[0,T]$ and $|\Pi| := \max_j |t_j-t_{j-1}|\to0$.

Proof.  Since $t \mapsto f(W(t))$ is (almost surely) continuous, $$\sum_{i=0}^{n-1} f(W(t_{i}))(t_{i+1}-t_{i}) \to \int_0^T f(W(t))\;dt\;\text{as}\;n\to\infty.$$
Therefore, it suffices to show
$$I_n := \sum_{i=0}^{n-1} f(W(t_{i})) \bigg[ (W(t_{i+1})-W(t_{i}))^2 - (t_{i+1}-t_{i}) \bigg] \to 0\;\text{as}\;n\to\infty.$$

At this point it is convenient to define $\Delta t_{i} := t_{i+1}-t_{i}$ and $\Delta W_i := W(t_{i+1})-W(t_{i})$.  Recalling that $\{W(t)^2-t\}_{t \geq 0}$ is a martingale with respect to the canonical filtration $(\mathcal{F}_t)_{t \geq 0}$, we compute

$$\begin{align*} &\quad \mathbb{E} \bigg( f(W(t_{i})) f(W(t_{j}))[\Delta W_i^2 - \Delta t_i][\Delta W_j^2-\Delta t_j]\bigg)\qquad(j<i)\\ &= \mathbb{E} \bigg( \mathbb{E} \bigg( f(W(t_{i})) f(W(t_{j}))  [\Delta W_i^2 - \Delta t_i]  [\Delta W_j^2-\Delta t_j] \mid \mathcal{F}_{t_{i}} \bigg) \bigg) \\ &= \mathbb{E} \bigg( f(W(t_{i})) f(W(t_{j}))  [\Delta W_j^2-\Delta t_j]  \underbrace{\mathbb{E} \bigg( \Delta W_i^2 - \Delta t_i \mid \mathcal{F}_{t_{i}} \bigg)}_{=\,\mathbb{E}(\Delta W_i^2-\Delta t_i)\,=\,0} \bigg) = 0, \end{align*}$$

and thus

$$\mathbb{E}(I_n^2) = \mathbb{E}\left(\sum_{i=0}^{n-1} f(W(t_{i}))^2 (\Delta W_i^2-\Delta t_i)^2 \right).$$

(Observe that the cross-terms vanish.)  Using that $f$ is bounded and $W(t)-W(s) \sim W(t-s) \sim \sqrt{t-s} W(1)$ we find

$$\begin{align*} \mathbb{E}(I_n^2) &\leq \|f\|_{\infty}^2 \sum_{i=0}^{n-1} \mathbb{E}\bigg[(\Delta W_i^2-\Delta t_i)^2\bigg] \\ &= \|f\|_{\infty}^2 \sum_{i=0}^{n-1} \Delta t_i^2\,\mathbb{E}\big[(W(1)^2-1)^2\big] \\ &\leq C |\Pi| \sum_{i=0}^{n-1} \Delta t_i = C |\Pi| T \end{align*}$$


for $C := \|f\|_{\infty}^{2}\,\mathbb{E}\big[(W(1)^{2}-1)^{2}\big]=2\|f\|_{\infty}^{2}$ (using $\mathbb{E}(W(1)^{4})=3$). Letting $|\Pi| \to 0$, the claim follows.
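For what it is worth, the convergence in the lemma is also easy to observe numerically.  The sketch below (illustrative parameters and seed, with $f(x)=\cos(x)$ as an arbitrary bounded continuous test function) compares $\sum_{i}f(W(t_{i}))\,\Delta W_{i}^{2}$ with the ordinary Riemann sum $\sum_{i}f(W(t_{i}))\,\Delta t_{i}$ along a single simulated path.

#include <cmath>
#include <cstdio>
#include <random>

int main() {
    const double T = 1.0;
    const long n = 1000000;             // number of partition intervals (illustrative)
    const double dt = T/n;
    std::mt19937_64 gen(7);             // arbitrary seed
    std::normal_distribution<double> Z(0.0, 1.0);

    double W = 0.0, quadSum = 0.0, riemannSum = 0.0;
    for (long i = 0; i < n; ++i) {
        const double fW = std::cos(W);            // f(W(t_i)), bounded and continuous
        const double dW = std::sqrt(dt)*Z(gen);   // Brownian increment over [t_i, t_{i+1}]
        quadSum    += fW*dW*dW;                   // sum f(W(t_i)) (Delta W_i)^2
        riemannSum += fW*dt;                      // sum f(W(t_i)) Delta t_i
        W += dW;
    }
    std::printf("sum f(W)(dW)^2 = %.5f,  sum f(W) dt = %.5f\n", quadSum, riemannSum);
    return 0;
}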



III.  CLARIFICATION OF "ALMOST SURE" CONVERGENCE

We assume the reader is familiar with the various modes of convergence in real analysis: pointwise, uniform, almost uniform, in measure/probability, $L^{p}$, etc.  This short section is just to help clarify what is meant by almost sure convergence in the context of this and related topics.

Statements of convergence involving Brownian motion are almost always established in $L^{2}(\Omega,P)$, which in turn implies convergence in probability because Chebyshev's inequality states for a sequence of random variables $X_{n}$ and proposed limit $X$ that
$$P(|X_{n}-X|\geq\epsilon)\leq\frac{1}{\epsilon^{2}}\mathbb{E}\left[|X_{n}-X|^{2}\right]\to0\;\text{as}\;n\to\infty\;\text{for all}\;\epsilon>0\;\text{fixed}.$$

For example, in the proof of Ito's lemma we really proved that $$\lim_{n\to\infty}\sum_{i=0}^{n-1}f(W(t_{i}),t_{i})(W(t_{i+1})-W(t_{i}))^{2}=\int_{0}^{T}f(W(t),t)\;dt$$ in $L^{2}(\Omega)$, and by consequence in probability; almost sure convergence then holds along any fixed sequence of partitions with $\sum_{n}||\Pi_{n}||<\infty$, as discussed in Section I.  To clarify, almost sure convergence means that for almost every sample path, or outcome $\omega\in\Omega$, we have
$$\lim_{n\to\infty}X_{n}(\omega):=\lim_{n\to\infty}\sum_{i=0}^{n-1}f(W(t_{i}),t_{i})(W(t_{i+1})-W(t_{i}))^{2}=\int_{0}^{T}f(W(t),t)\;dt.$$

The situation is similar when proving statements like "almost surely $[W,W](t)=t$" and "almost surely $\int f(t)\;dW(t)$ exists in the Ito sense."


Bifurcating Lease Embedded FX Derivatives

Section I.  Overview

Suppose an entity enters into an agreement to lease property and make rental payments each month, but that the fixed notional underlying the lease payments is denominated in some other currency.  This introduces an exposure for the lessee (but not for the lessor) since now they must pay a domestic currency equivalent of some fixed amount in a foreign currency - in other words, they pay Lease Payment x Exchange Rate, whatever that might be at the time the payment becomes due.  If the entity is a corporate entity, accounting regulations require the entity to "bifurcate" the embedded derivative from the contract and account for it as though it was a legitimate derivative, per the rules of derivative accounting.  This introduces accounting complexities, but the problem must also of course be solved from a valuation point of view.


In this post we consider lease agreements as just described, as well as those with caps and floors on the exchange rate with strikes contractually written into the agreements.


Section II.  Valuation Methodology


For leases determined to have embedded derivatives (from the point of view of the domestic entity), we value the embedded derivative as a strip of component derivatives corresponding to each future cash flow. That is, each cash flow represents the notional (denominated in the foreign currency CUR1) for each component derivative, and the value of the lease embedded derivative is the aggregate value of these component derivatives.

Our FX convention is CUR1/CUR2 where this rate is the number of units of CUR2 per 1 unit of CUR1 - such a quantity has units [CUR2]/[CUR1]. We refer to CUR2 as the domestic, functional and settlement currency and CUR1 as the foreign, deal and notional currency.

Our valuation methodology is based on usual market-practice - in particular, no arbitrage and discounted cash flow principles. For options, we use the additional assumption of no-arbitrage for an asset price following a simple geometric Brownian motion (Black-Scholes-Merton model). Consider a present valuation date $t$, future maturity date $T>t$, future cash flow $N=N(T)$, corresponding strike rate $K=K(T)$, forward rate $F=F_{t}(T)$, discount rate $D=D_{t}(T)$ and volatility $\sigma=\sigma_{t}(K,T)$. Let $V=V_{t}(T,N,K,F,D,\sigma)$ denote the value of a derivative written on CUR1/CUR2 with the previous parameters. Then our previous assumptions lead us to the following valuation formulas: $$(1)\;\;\;\;V^{\text{fwd}}_{t}(T)=N(T)\cdot(F_{t}(T)-K(T))\cdot D_{t}(T)$$
$$(2)\;\;\;\;V^{\text{call}}_{t}(T)=N(T)\cdot(\Phi(d_{+})F_{t}(T)-\Phi(d_{-})K(T))\cdot D_{t}(T),$$
and
$$(3)\;\;\;\;V^{\text{put}}_{t}(T)=N(T)\cdot(\Phi(-d_{-})K(T)-\Phi(-d_{+})F_{t}(T))\cdot D_{t}(T).$$
In (2) and (3) we define
$$d_{\pm}=\frac{1}{\sigma_{t}(K,T)\sqrt{T-t}}\left[\log\left(\frac{F_{t}(T)}{K(T)}\right)\pm\frac{1}{2}\sigma_{t}(K,T)^{2}(T-t)\right]$$
and
$$\Phi(x)=(2\pi)^{-1/2}\int_{-\infty}^{x}e^{-y^{2}/2}\;dy,$$
the standard normal cumulative distribution function.
(Note the dependence of $\sigma_{t}$ on $(K,T)$ is due to the nature of FX option markets exhibiting term structure variation and ``smiles.'')
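A minimal C++ sketch of formulas (1)-(3) follows.  It is illustrative only: the function and parameter names are mine, the normal CDF is computed via std::erfc, and the market inputs (forward rate, discount factor, volatility) are assumed to be supplied externally.  The ranged-forward embedded derivative treated in Section III is then just a signed combination of these three component values, as in (10) below.

#include <cmath>
#include <cstdio>

// Standard normal cumulative distribution function via the complementary error function.
double Phi(double x) { return 0.5*std::erfc(-x/std::sqrt(2.0)); }

// (1) Value of a (long) forward on CUR1/CUR2: N * (F - K) * D.
double fwdValue(double N, double F, double K, double D) { return N*(F - K)*D; }

// (2)-(3) Values of a call/put on CUR1/CUR2 written on the forward rate.
double optValue(bool isCall, double N, double F, double K, double D, double sigma, double tau) {
    const double sd = sigma*std::sqrt(tau);
    const double dPlus  = (std::log(F/K) + 0.5*sd*sd)/sd;
    const double dMinus = dPlus - sd;
    if (isCall) return N*(Phi(dPlus)*F - Phi(dMinus)*K)*D;
    return N*(Phi(-dMinus)*K - Phi(-dPlus)*F)*D;
}

int main() {
    // Purely illustrative inputs: notional 1mm CUR1, forward 1.10, strike 1.05,
    // discount factor 0.98, volatility 12%, one year to the cash flow date.
    const double N = 1e6, F = 1.10, K = 1.05, D = 0.98, sigma = 0.12, tau = 1.0;
    std::printf("forward component: %.2f\n", fwdValue(N, F, K, D));
    std::printf("call component   : %.2f\n", optValue(true,  N, F, K, D, sigma, tau));
    std::printf("put component    : %.2f\n", optValue(false, N, F, K, D, sigma, tau));
    return 0;
}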

Section III.  Extraction Methodology

Section III(a).  Specifying the Strike

We extract the embedded derivative in accordance to the principle that the stated value of the cash flow at inception of the lease agreement should be such that the value of the embedded derivative at inception is $0$. We apply this principle to the forward component of the embedded derivative, and approximate it by assuming the cancellation between the cap and floor values (because one is a short position and the other is a long position - see below) would net $0$ if we assumed that they constitute a forward in combination. This is exactly true from put-call parity when the strikes are the same, but only approximately true if they are different (which they must be since otherwise the combination of the three instruments would net $0$ and there would be no embedded derivative). In particular, if the lease agreement has $i=1,2,3,\ldots,n$ future cash flows, a cap $\overline{S}=\overline{S}(T_{i})$ and a floor $\underline{S}=\underline{S}(T_{i})$, then for each corresponding component derivative we set (where again, $t=0$ is the inception date of the lease): $$(4)\;\;\;\;K^{\text{fwd}}(T_{i})=F_{0}(T_{i}),$$ $$(5)\;\;\;\;K^{\text{cap}}(T_{i})=\overline{S}(T_{i}),$$ and $$(6)\;\;\;\;K^{\text{flr}}(T_{i})=\underline{S}(T_{i}).$$ Accounting rules indicate that this is the proper approach from a valuation point of view.

Section III(b).  Specifying the Derivative - A Decomposition

In keeping with our notation, we let $L_{t}(T_{i})$ denote the present fair value at time $t$ of the future cash flow made at time $T_{i}$. This is always a negative quantity from the entity's point of view. The idea in order to obtain the embedded derivative is to separate the risky portion of this value from the non-risky portion. In particular, we decompose $L_{t}(T_{i})$ as
$$L_{t}(T_{i})=B_{t}(T_{i})+Z_{t}(T_{i}),$$
where $B_{t}(T_{i})$ depends on $t$ only through the discount factor $D_{t}(T_{i})$ (in particular, it is independent of market variables like $F_{t}(T_{i})$), and $Z_{t}(T_{i})$ is a function of all random market variables inherent in $L_{t}(T_{i})$. There are infinitely many ways to structure such a decomposition, but the accounting guidance discussed above is equivalent to certain initial and terminal conditions which allow us to uniquely solve for $B(T_{i})$ and $Z(T_{i})$.

Section III(c).  Specifying the Derivative - Forward Only Case

If the lease payment $L_{t}(T_{i})$ lacks any optional features, then its payoff is
$$L_{T_{i}}(T_{i})=-N\cdot S_{T_{i}}$$
and therefore its fair present value for $0<t<T_{i}$ is given by $$(7)\;\;\;\;L_{t}(T_{i})=-N(T_{i})\cdot F_{t}(T_{i})\cdot D_{t}(T_{i}).$$ Observe that this quantity has units of CUR2 and is the present value of what the entity has to pay at time $T_{i}$. Since it depends on the forward rate $F_{t}(T_{i})$, it has an exposure to CUR1/CUR2 and is therefore risky. The idea previously discussed involves decomposing $L_{t}(T_{i})$ into two parts $$L_{t}(T_{i})=B_{t}(T_{i})+Z_{t}(T_{i}),$$ where $B_{t}(T_{i})$ depends on $t$ only through $D_{t}(T_{i})$ (in particular, it is independent of $F_{t}(T_{i})$), and $Z_{t}(T_{i})$ is a function of $F_{t}(T_{i}).$ The initial condition $$Z_{0}(T_{i})=0$$
and
terminal payoff condition
$$L_{T_{i}}(T_{i})=B_{T_{i}}(T_{i})+Z_{T_{i}}(T_{i})=-N(T_{i})\cdot F_{T_{i}}(T_{i})\cdot D_{T_{i}}(T_{i})=-N(T_{i})\cdot S_{T_{i}}$$
allow us to uniquely solve for the payoffs of $B(T_{i})$ and $Z(T_{i})$; the principle of rational pricing and the fact that $B_{t}(T_{i})$ is constant in $t$ (up to discounting) then give us $L$, $B$, and $Z$ for all $0<t<T_{i}$. Indeed, from the terminal condition we have
$$Z_{T_{i}}(T_{i})=-N(T_{i})\cdot S_{T_{i}}-B_{T_{i}}(T_{i})$$
and from the initial condition
$$B_{0}(T_{i})=L_{0}(T_{i})=-N(T_{i})\cdot F_{0}(T_{i})\cdot D_{0}(T_{i})=-N(T_{i})\cdot K^{\text{fwd}}(T_{i})\cdot D_{0}(T_{i}).$$
Hence (dropping the ``fwd'' from $K$),
$$Z_{T_{i}}(T_{i})=N(T_{i})\cdot(K(T_{i})-S_{T_{i}}).$$
This shows that the payoff of $Z(T_{i})$ is that of a short position in a forward on CUR1/CUR2 with notional $N(T_{i})$. Applying the valuation formula (1) (with the sign reversed to reflect the short position), we discover that $Z_{t}(T_{i})$ is given for all $0<t<T_{i}$ by
$$(8)\;\;\;\;Z_{t}(T_{i})=N(T_{i})\cdot(K(T_{i})-F_{t}(T_{i}))\cdot D_{t}(T_{i})$$
and
$$(9)\;\;\;\;B_{t}(T_{i})=-N(T_{i})\cdot K(T_{i})\cdot D_{t}(T_{i}).$$

Section III(d).  Specifying the Derivative - Ranged Forward Case (Caps & Floors)

If $L_{t}(T_{i})$ has optional features, then the initial condition $Z_{0}(T_{i})=0$ is replaced: $Z_{0}(T_{i})$ is instead set to the value of these optional features, priced using the strike prices given by (5) and (6). For a ranged forward, we have a cap $\overline{S}(T_{i})$ and a floor $\underline{S}(T_{i})$. With our FX convention CUR1/CUR2, the terms ``cap'' and ``floor'' are really as such from the counter-party's perspective, or from the entity's perspective when considering the value of the overall lease $L_{t}(T_{i})$ (a cap and a floor on how much the entity has to pay). However, when considering the value of $Z_{t}(T_{i})$ from the entity's perspective, the cap $\overline{S}(T_{i})$ is an upper bound on how much CUR2 can weaken against CUR1, hence a floor ($=1/\overline{S}(T_{i})$) on the CUR2/CUR1 rate, which limits the entity's losses from its short position in the forward component of the embedded derivative. Conversely, the floor $\underline{S}(T_{i})$ is a lower bound on how much CUR2 can strengthen against CUR1, hence a cap ($=1/\underline{S}(T_{i})$) on the CUR2/CUR1 rate, which limits the entity's gains.

The previous paragraph shows that $Z_{t}(T_{i})$ is the sum of three distinct derivatives $\sum_{k=1}^{3}Z^{k}_{t}(T_{i})$ - a short position in a put option on CUR1/CUR2 struck at $\underline{S}(T_{i})$, a long position in a call option on CUR1/CUR2 struck at $\overline{S}(T_{i})$, and a short position in a forward on CUR1/CUR2 struck at $K(T_{i})=F_{0}(T_{i}).$ This can be proved as we did for the case of a forward, where the initial condition is taken to be $$Z_{0}(T_{i})=\sum_{k=1}^{3}Z^{k}_{0}(T_{i})=V^{\text{call}}_{0}(T_{i})-V^{\text{put}}_{0}(T_{i})-\underbrace{V^{\text{fwd}}_{0}(T_{i})}_{=0},$$
as given by (1), (2) and (3), respectively.
The terminal condition is
$$L_{T_{i}}(T_{i})=\left\{\begin{array}{ll}-N(T_{i})\cdot\overline{S}(T_{i}),&S_{T_{i}}>\overline{S}(T_{i})\\-N(T_{i})\cdot S_{T_{i}},&\underline{S}(T_{i})\leq S_{T_{i}}\leq\overline{S}(T_{i})\\-N(T_{i})\cdot \underline{S}(T_{i}),&S_{T_{i}}<\underline{S}(T_{i}).\end{array}\right.$$
It follows that
$$B_{0}(T_{i})=-N(T_{i})\cdot K(T_{i})\cdot D_{0}(T_{i})$$
and hence
$$B_{t}(T_{i})=-N(T_{i})\cdot K(T_{i})\cdot D_{t}(T_{i})$$
for all $0<t<T_{i}.$ Now,
$$Z_{T_{i}}(T_{i})=L_{T_{i}}(T_{i})-B_{T_{i}}(T_{i})=\left\{\begin{array}{ll}N(T_{i})\cdot(K(T_{i})-\overline{S}(T_{i})),&S_{T_{i}}>\overline{S}(T_{i})\\N(T_{i})\cdot(K(T_{i})-S_{T_{i}}),&\underline{S}(T_{i})\leq S_{T_{i}}\leq\overline{S}(T_{i})\\N(T_{i})\cdot(K(T_{i})-\underline{S}(T_{i})),&S_{T_{i}}<\underline{S}(T_{i}).\end{array}\right.$$
One verifies easily that this is equal to
$$Z_{T_{i}}(T_{i})=N(T_{i})\cdot\Big[-(S_{T_{i}}-K(T_{i}))+\max(S_{T_{i}}-\overline{S}(T_{i}),0)-\max(\underline{S}(T_{i})-S_{T_{i}},0)\Big],$$
which are the payoff functions of the indicated derivatives. Thus,
$$(10)\;\;\;\;Z_{t}(T_{i})=-A+B-C$$
for all $0<t<T_{i}$ where $A$ is given by (1), $B$ by (2) and $C$ by (3).


Section IV.  Lease Modifications - An Introduction

In a subsequent post I will elaborate on the bifurcation and valuation of modifications to lease agreements.  For now, let us keep the above in mind and consider a typical lease cash flow $L_{t}(T)$ with a notional of $N_{0}$.  At the time the lease is entered into, FASB requires bifurcation of any implied derivative $Z$. Suppose $Z$ is just an FX forward (short CUR1/CUR2).  At inception ($t=0$) the strike is $K_{0}=F_{0}(T)$, the forward rate corresponding to the future time $T$ as calculated at time $t=0$. The value of $Z$ at any time $0<t<T$ is
$$Z_{t}(N_{0},K_{0},T)=N_{0}\cdot(K_{0}-F_{t}(T))\cdot D_{t}(T),$$
where $D_{t}(T)$ is the discount factor for term $T$ at time $t$. This valuation methodology makes the embedded derivative $0$ at inception of the lease.

Suppose at some time $0<\tau<T$ we have the modification $N_{0}\mapsto N_{\tau}<N_{0}$ (the lease payment decreases). Then this is economically equivalent to maintaining the unmodified lease and entering into another lease with notional $\Delta N_{0,\tau}:=N_{0}-N_{\tau}$ as a lessor at the time of modification $\tau$. FASB would then require the entity to put the resulting embedded derivative on its balance sheet at that time.  The value of this derivative (since it is equivalent to a long position in CUR1/CUR2, or a short position in CUR2/CUR1) is
$$\tilde{Z_{t}}(\Delta N_{0,\tau},K_{\tau},T)=\Delta N_{0,\tau}\cdot(F_{t}(T)-K_{\tau})\cdot D_{t}(T).$$
Now, from an operational lease accounting point of view, the P/L at the cash flow date is just $N_{0}-\Delta N_{0,\tau}=N_{\tau}.$ Therefore, the net embedded derivative of this overall lease contract is $Z+\tilde{Z}$ (the ``+'' is actually a ``-'' since we modeled $\tilde{Z}$ as a long position). Thus, the derivative's value at all times $\tau<t<T$ is
$$\begin{align*}
Z^{\tau}_{t}(N_{\tau},K_{\tau},T)
&=Z_{t}(N_{0},K_{0},T)+\tilde{Z_{t}}(\Delta N_{0,\tau},K_{\tau},T)\\
&=N_{0}\cdot(K_{0}-F_{t}(T))\cdot D_{t}(T)+\Delta N_{0,\tau}\cdot(F_{t}(T)-K_{\tau})\cdot D_{t}(T)\\
&=D_{t}(T)\Big[N_{0}K_{0}-N_{0}F_{t}(T)+N_{0}F_{t}(T)-N_{0}K_{\tau}-N_{\tau}F_{t}(T)+N_{\tau}K_{\tau}\Big]\\
&=N_{0}\cdot(K_{0}-K_{\tau})\cdot D_{t}(T)+N_{\tau}\cdot(K_{\tau}-F_{t}(T))\cdot D_{t}(T)\\
&=N_{0}\Delta K_{0,\tau}D_{t}(T)+N_{\tau}\cdot(K_{\tau}-F_{t}(T))\cdot D_{t}(T)\\
&=N_{\tau}(K_{\tau}-F_{t}(T))\cdot D_{t}(T)+C
\end{align*}$$
where $C:=N_{0}\cdot(K_{0}-K_{\tau})\cdot D_{t}(T)$ is the value of a fixed settlement amount, determined at the modification time $\tau$ and payable at time $T$, discounted back to time $t$ ($\tau<t<T$).
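As a quick sanity check of the algebra (purely illustrative inputs), the netted expression on the last line can be compared numerically against $Z+\tilde{Z}$:

#include <cstdio>

int main() {
    // Arbitrary illustrative inputs.
    const double N0 = 1e6, Ntau = 6e5, K0 = 1.05, Ktau = 1.12, F = 1.08, D = 0.97;
    const double dN = N0 - Ntau;                                      // Delta N_{0,tau}
    const double Z      = N0*(K0 - F)*D;                              // original short forward
    const double Ztilde = dN*(F - Ktau)*D;                            // offsetting long forward from the modification
    const double netted = Ntau*(Ktau - F)*D + N0*(K0 - Ktau)*D;       // last line of the derivation above
    std::printf("Z + Ztilde = %.2f,  netted form = %.2f\n", Z + Ztilde, netted);
    return 0;
}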

17 March, 2015

Can a Derivative's Value Exceed the Underlying Notional Value?



On a recent project I valued some derivatives, the results of which the client balked at because the values exceeded the notional on which they were written.  So is it ever possible for a derivative's valuation to exceed its underlying notional?


The answer, of course, depends, and first we need to clarify what we mean by a derivative's value exceeding its notional.  A typical derivative (like an option or future) is written on some underlying asset with price $S_{t}$ and some quantity or notional $N$.  The term quantity is frequently used for assets like stocks and the term notional for assets like currencies - so in the latter case, if I have a USD/EUR call option, then I view the asset as the US dollar (that I want to buy a call option on) priced in the euro (that is what the USD/EUR exchange rate is - the cost of a US dollar in euros), with a notional (i.e. quantity) equal to (say) $\$100,000,000$ USD.

Notice that the quantity/notional has units in the underlying asset and that the spot price $S_{t}$ has units of value in the numeraire/settlement currency per 1 asset.  Hence, when we ask if a derivative's value can exceed its underlying notional, we are really asking whether the value at time $t$, denoted $V_{t}$, can exceed the quantity $NS_{t}$, which has units in the settlement currency (EUR in the above example).  In other words, we ask whether
$$V_{t}>NS_{t}$$
can hold without introducing an arbitrage.

Essentially, the purpose of the above discussion was to express precisely what we mean for the valuation to exceed the underlying notional and, moreover, to emphasize that a derivative's valuation cannot be directly compared to the notional in order to answer the question, since the units are not the same - the underlying notional $N$ needs to be multiplied by the spot price $S_{t}$ so that each has units in the valuation currency.

The classic counter-example to answering this question affirmatively in all instances comes from considering a call option written on $N$ units of some asset with price $S_{t}$.  If you value this option at time $t$, then it is clear that
$$V_{t}<NS_{t},$$
for otherwise one could short a covered option at no cost and an arbitrage would exist.  But this argument no longer holds for instruments with payoffs that are not artificially bounded by some optionality mechanism.  Indeed, the example from my experience involved an FX forward on CUR1/CUR2 (to be generic).  For such a forward, let $K$ be the strike, $N$ the notional (denominated in CUR1), $D$ the discount factor and $\alpha$ the CUR2/CUR1 exchange rate ($1/S_{t}$).  Then if the entity is in the short position we have
$$-\alpha N\cdot(F-K)\cdot D>N$$
$$(F-K)<-\frac{1}{\alpha D}$$
$$F<K-\frac{1}{\alpha D}.$$
Hence, if the forward rate is sufficiently small (i.e. price of CUR1 declined) with respect to the inception strike, then the value of the forward will exceed the notional as an asset. Conversely,
$$-\alpha N\cdot(F-K)\cdot D<-N$$
$$(F-K)>\frac{1}{\alpha D}$$
$$F>K+\frac{1}{\alpha D}$$
Hence, if the forward rate is sufficiently large (i.e. the price of CUR1 increased) with respect to the inception strike, then the value of the forward will exceed the notional as a liability.
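To make the bound concrete with some made-up numbers: suppose CUR1/CUR2 was sold forward at $K=2.00$, the forward rate has since fallen to $F=0.80$, the spot rate is $S_{t}=0.82$ (so $\alpha=1/S_{t}\approx1.22$), and $D=0.99$.  Then $K-\frac{1}{\alpha D}\approx2.00-0.83=1.17>F$, and indeed the short forward is worth $\alpha N\cdot(K-F)\cdot D\approx1.45\,N$ in CUR1 terms, comfortably exceeding the notional $N$.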

It is not difficult to come up with similar bounds for other basic instruments such as swaps either.

23 February, 2015

Why is Brownian Motion Almost Surely Continuous?



A user from the Quant StackExchange recently asked why the regularity condition of Brownian motion, namely almost sure continuity, is what it is: almost sure.  Why can't this be upgraded to Brownian motion being surely continuous?

The answer to the latter question is that, actually, it can and very often is.  The answer to the former question is that stipulating almost sure continuity is required in order to make the defining conditions of Brownian motion axiomatic, while still encompassing all of the methods of construction.

Indeed, the most common construction of Brownian motion (or at least the most direct) is through an application of Kolmogorov's extension theorem (the details of this approach can be found in Durrett).  But due to technical issues arising from measure theory (which are actually quite natural), the resulting construction leads to realizations of Brownian motions that are discontinuous.

On the other hand, the approach to constructing Brownian motion from the limit of scaled random walks actually leads to surely continuous realizations.  There are two routes one can take with this approach in mind: (a) construct Brownian motion paths directly (i.e. pointwise) from scaled random walks (one common way to do this is by appropriately specifying Brownian motion on the dyadic rationals, interpolating between them, and taking limits) or (b) construct Brownian motion by obtaining the Wiener measure on the space of continuous functions beginning at the origin from the induced measures on this space obtained from the scaled random walks on $\mathbb{Z}^{\infty}_{2},$ the space of sequences with values of $-1$ or $1$.

The user also asked whether an explicit example of a discontinuous Brownian motion path could be exhibited.  The following is my complete answer to this and the above questions.

____________________________

Exhibiting a counter-example is straight-forward enough.  For example, let $B_{t}(\omega)$ be a Brownian motion and $\mathcal{T}(\omega)$ a stopping time on $(\Omega,\mathbb{P})$ with a continuous distribution.
Then with
$$B'_{t}(\omega)=\left\{\begin{array}{ll}B_{t}(\omega),&t\neq\mathcal{T}(\omega)\\B_{t}(\omega)+1,&t=\mathcal{T}(\omega),\end{array}\right.$$
$B'_{t}(\omega)$ satisfies (1) and (2) below, but is discontinuous precisely when $t=\mathcal{T}(\omega)$.  Therefore, $B'_{t}(\omega)$ is a process with the finite-dimensional distributions of Brownian motion whose sample paths are not everywhere continuous.

There are lots of other ways to obtain a "bad" Brownian motion.  Another example is
$$B'_{t}(\omega)=B_{t}(\omega)\mathbb{1}_{\{B_{t}(\omega)\;\text{irrational}\}},$$
but this is less straight-forward to prove.

____________________________

The reason for stipulating almost sure continuity has to do with the way one constructs Brownian motion, and the issue can be completely dispensed with depending on one's approach.
The usual presentation in finance texts is the abstract one, namely given a probability space $(\Omega,\mathbb{P})$, one has a Brownian motion $B_{t}(\omega)$ on this space if
  1. For every set of times $0\leq t_{1}<t_{2}<\ldots<t_{n}$ the increments $B_{t_{1}},B_{t_{2}}-B_{t_{1}},\ldots,B_{t_{n}}-B_{t_{n-1}}$ form a mutually independent set of random variables on $(\Omega,\mathbb{P}).$
  2.  The increments above are normally distributed with mean $0$ and variance equal to the corresponding time increment, i.e. $B_{t_{i+1}}-B_{t_{i}}\sim N(0,t_{i+1}-t_{i})$.
  3. For almost every $\omega\in\Omega$ the path $t\mapsto B_{t}(\omega)$ is continuous.
Most texts also include a section that sketches a concrete realization of Brownian motion as the limit of scaled random walks.  If one does this rigorously, one sees that (3) upgrades to: for every $\omega\in\Omega$ the path $t\mapsto B_{t}(\omega)$ is continuous.

Indeed, if we start with $(\Omega,\mathbb{P})$ satisfying the above and let $\mathcal{P}$ denote the collection of continuous functions $[0,\infty)\to\mathbb{R}$ with $p(0)=0$, then we get from (3) the inclusion map
$$\iota:\Omega\to\mathcal{P},$$
defined on a set $\Omega'\subset\Omega$ of full measure, and the push-forward measure of $\mathbb{P}$ onto $\mathcal{P}$ under this inclusion map turns out to be equal to the Wiener measure $\mathbb{W}$ on $\mathcal{P}$, which is unique.

Conversely, one can construct $(\mathcal{P},\mathbb{W})$ directly by starting with the set $\mathcal{P}$ (where every element of this set is continuous a priori) and demonstrating that the measures $\mu_{N}$ on $\mathbb{Z}^{\infty}_{2}$ arising from the appropriately scaled random walks $S_{t}^{N}(\omega)$ ($\omega\in\mathbb{Z}^{\infty}_{2})$ induce a collection of tight measures on $\mathcal{P}$ which converge weakly to $\mathbb{W}$:
$$\mu_{N}\Longrightarrow\mathbb{W}\;\text{(weakly)}$$
One then defines
$$\tilde{B}_{t}(\omega):=p(t)\in\mathcal{P}$$
and readily shows that under $\mathbb{W}$, $\tilde{B}_{t}$ satisfies (1)-(3) and that therefore
$$\tilde{B}_{t}(\omega)=B_{t}(\omega),$$
but that now *every* Brownian motion is continuous.

The equivalence of the implications above shows that the existence of Brownian motion is essentially tantamount to the existence of the Wiener measure $\mathbb{W}$ on $\mathcal{P}$ arising from the sequence of measures induced naturally by the scaled random walks.  If one starts from the goal of obtaining this measure, one gets continuity for *every* Brownian motion $B_{t}(p)=p(t)$.

____________________________

Other constructions of Brownian motion require us to stipulate almost sure continuity due to technicalities arising from measure theory on product spaces.  The quickest construction of Brownian motion in this direction is by applying Kolmogorov's extension theorem to a suitable class of processes; details can be found in Durrett.

21 February, 2015

A Simple Monte Carlo Simulator for European Call Options

In this post I will present a procedural C++ implementation of a simple Monte Carlo simulator for the pricing of a European call option.  Subsequent articles will make significant improvements such as the pricing of puts and different types of options, improved sampling, incorporation of jump processes, etc.

Recall that we model a stock price $S$ as a stochastic process $\{S_{t}:0\leq t\leq T\}$ governed by the stochastic differential equation (geometric Brownian motion)
$$(1)\;\;\;\;dS_{t}=\mu S_{t}dt+\sigma S_{t}dW_{t}$$
where $\mu$ is the drift or expected return on the stock, $W_{t}$ is a Wiener process ($dW_{t}=N(0,1)\sqrt{dt}$, where $N(0,1)$ is the standard normal distribution), and $\sigma$ is the volatility of the stock (it scales the standard deviation of the noise, since $\sigma N(0,1)=N(0,\sigma^{2})$).  Using arbitrage arguments (Merton's method) we arrive at the (linear) Black-Scholes-Merton partial differential equation
$$(2)\;\;\;\;V_{t}+\frac{1}{2}\sigma^{2}S^{2}V_{ss}+rSV_{s}-rV=0.$$ The price $V$ of any derivative must satisfy this equation, and conversely, any solution to (2) gives the price of some derivative based on the boundary conditions used (for European call options, the boundary conditions at the expiry time $T$ are $V(\cdot,T)=\max(S_{T}-K,0)$ where $K$ is the strike price of the option and $S_{T}$ is the value of the (random) stock price at time $T$).  Pricing methods for derivatives based on (2) are known as PDE methods in mathematical finance, and usually consist of numerically solving (2) using finite difference methods (since the domain of definition is rectangular in the $(S,t)$ coordinate system).  This can be difficult, however, since the varied boundary conditions must be taken into account and issues of convergence, stability, etc. enter in.

A different (and more recently popular) approach is probabilistic and involves sampling the randomness in the geometric Brownian motion model (1) of the stock price multiple times and taking an average.  Indeed, an application of the Feynman-Kac formula shows that when discounted appropriately, solutions to (1) are martingales and this implies that $V$ is simply the expected value of the discounted payoff of the derivative.  For a European call option, the payoff is $f(S)=\max(S-K,0)$ and so we get
$$(3)\;\;\;\;V_{\text{EuroCall}}=e^{-rT}\mathcal{E}[f(S_{T})]$$
where $r$ is the riskless return, $\mathcal{E}$ is taken under the risk-neutral probability measure, and $S_{T}$ is the final value of the stochastic process $\{S_{t}\}$ at expiry of the option $T$.  (From now on, $V$ is for the price of a European call option.)  Thus, beginning with (1), passing to the logarithm, applying Ito's Lemma and then applying (3) we get
$$\begin{align*}
&dS_{t}=rS_{t}\;dt+\sigma S_{t}dW_{t}\\
\Longrightarrow&d\log S_{t}=(r-\frac{1}{2}\sigma^{2})dt+\sigma dW_{t}\\
\Longrightarrow&\log S_{t}=\log S_{0}+(r-\frac{1}{2}\sigma^{2})t+\sigma W_{t}\\
\Longrightarrow&S_{T}=S_{0}\exp\left\{\left(r-\frac{1}{2}\sigma^{2}\right)T+\sigma\sqrt{T}N(0,1)\right\}\\
\Longrightarrow&V=e^{-rT}\mathcal{E}\left[f\left(S_{0}\exp\left\{\left(r-\frac{1}{2}\sigma^{2}\right)T+\sigma\sqrt{T}N(0,1)\right\}\right)\right]\\
(4) \Longrightarrow&V=e^{-rT}\mathcal{E}\left[\max\left(S_{0}\exp\left\{\left(r-\frac{1}{2}\sigma^{2}\right)T+\sigma\sqrt{T}N(0,1)\right\}-K,0\right)\right]\\
\end{align*}$$

(Recall that $dW_{t}=N(0,1)\sqrt{dt}$ and so $W_{T}=N(0,1)\sqrt{T}$ in distribution).  The above computations are somewhat complicated when carried out in detail, and while the explanation for how (2) is derived from (1) is relatively straight-forward, the derivation of (3) from (1) (and thus the previous computations) is significantly more subtle but far more important in the theory of mathematical finance.  A full explanation of the concepts involved (properties of Wiener processes, stochastic integration and Ito's lemma, risk-neutral valuation and the risk-neutral measure, change of numeraire, etc.) is beyond the scope of this post but will be elaborated on in subsequent posts.  In any event, we only need to take it on faith that (4) is equivalent to (2) for pricing European call options.

The idea of Monte Carlo simulation is now evident.  We simulate $N$ independent terminal values $S^{n}_{T}$ by sampling from the standard normal distribution $N(0,1)$ and then compute $f(S^{n}_{T})$.  Since the samples are independent, the random variables $\{f(S_{T}^{n})\}_{n}$ are independent and identically distributed, and so the (strong) law of large numbers implies
$$\frac{1}{N}\sum_{n=1}^{N}f(S^{n}_{T})\to\mathcal{E}[f(S_{T})]\;\text{as}\;N\to\infty$$
almost surely (and hence in probability).  We then discount by $e^{-rT}$ and this is our estimate of the price $V$.
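As a side remark on accuracy (a standard fact about Monte Carlo estimators, added here for context and not part of the original derivation), the central limit theorem quantifies the rate of this convergence: writing $\sigma_{f}^{2}=\text{Var}(f(S_{T}))$, the error of the sample mean is of size $\sigma_{f}/\sqrt{N}$, in the sense that
$$\sqrt{N}\left(\frac{1}{N}\sum_{n=1}^{N}f(S^{n}_{T})-\mathcal{E}[f(S_{T})]\right)\to N(0,\sigma_{f}^{2})\;\text{in distribution as}\;N\to\infty.$$
In particular, halving the error requires roughly four times as many samples.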

The following implementation is coded procedurally in C++ and prices a European call option using the method explained above.  The only part which may require explanation is the method by which $N(0,1)$ is sampled.  While there are several ways to do this, a simple yet accurate and efficient method is the Box-Muller algorithm (used here in its polar, rejection-based form), implemented in the BoxMullerGaussian routine below.

MonteCarloMethod_Source.cpp
#include <iostream>
#include <cmath>
#include <cstdlib>   // rand, srand, RAND_MAX
#include <ctime>     // time, used to seed the generator

using namespace std;

double BoxMullerGaussian();
double MonteCarloSimulator(double Expiry, double Strike, double Spot, double Vol, double Riskless, unsigned long N);

int main(void) {
 double Expiry, Strike, Spot, Vol, Riskless;
 unsigned long N = 0;

 cout << "Enter: Expiry, Strike, Spot, Vol, Riskless, # Trials" << endl;
 cin >> Expiry >> Strike >> Spot >> Vol >> Riskless >> N;

 srand(static_cast<unsigned>(time(0))); // seed the pseudo-random number generator

 cout << "The price of the option is "
  << MonteCarloSimulator(Expiry, Strike, Spot, Vol, Riskless, N)
  << endl;

 double pause;
 cin >> pause; // keep the console window open until the user types something

 return 0;
}

// Samples N(0,1) via the polar (rejection) form of the Box-Muller algorithm.
double BoxMullerGaussian() {
 double x, y;
 double sizeSquared;
 do {
  // (x,y) uniform on [-1,1]^2; accept only points strictly inside the unit disk
  x = 2.0*rand()/(double)(RAND_MAX) - 1;
  y = 2.0*rand()/(double)(RAND_MAX) - 1;
  sizeSquared = x*x + y*y;
 } while (sizeSquared >= 1.0 || sizeSquared == 0.0);

 return x*sqrt(-2*log(sizeSquared)/sizeSquared);
}

// Monte Carlo estimate of the risk-neutral expectation (4).
double MonteCarloSimulator(double Expiry, double Strike, double Spot, double Vol, double Riskless, unsigned long N) {
 double var = Vol*Vol*Expiry;                                  // sigma^2 * T
 double stdDev = sqrt(var);                                    // sigma * sqrt(T)
 double itoCorrection = -0.5*var;                              // -(1/2) sigma^2 T from Ito's lemma
 double movedSpot = Spot*exp(Riskless*Expiry + itoCorrection); // S_0 exp{(r - sigma^2/2)T}
 double runningSum = 0;

 for (unsigned long i = 0; i < N; i++) {
  double gaussian = BoxMullerGaussian();
  double spotAtExpiry = movedSpot*exp(stdDev*gaussian);        // one sample of S_T
  double payoff = spotAtExpiry - Strike;
  payoff = payoff > 0 ? payoff : 0;                            // max(S_T - K, 0)
  runningSum += payoff;
 }

 return exp(-Riskless*Expiry)*(runningSum/N); // discounted sample mean
}
As an example, we consider Microsoft's stock.  Today it closed at 41.35, so $\text{Spot}=41.35$ (the spot price in option pricing is $S_{0}$, which is not random, whereas $S_{t}$ is random for $0<t\leq T=\text{Expiry}$).  We write the option with strike price $\text{Strike}=40.00$, so that the option is initially in the money (for the buyer).  If the time to expiry is 6 months, then $\text{Expiry}=0.5$.  The 6-month risk-free rate is $\text{Riskless}=0.33$ (taken to be the LIBOR rate quoted this week), and the implied 6-month volatility for Microsoft options is around $\text{Vol}=0.22$.  With these parameters and $N=100$ trials, we get $$V_{\text{MSFTEuroCall}}=8.98.$$ Note that with only $100$ trials the estimate carries substantial sampling error, so repeated runs will produce noticeably different prices.
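As a rough sketch of how one might quantify that sampling error (my own addition, not part of the original program; the function name MonteCarloWithError and its signature are hypothetical), one can accumulate the sum of squared payoffs alongside the running sum and report the standard error of the discounted mean.  The sketch assumes the BoxMullerGaussian sampler from the listing above is available in the same translation unit.

#include <cmath>

double BoxMullerGaussian(); // from the listing above

// Sketch: Monte Carlo price together with an estimated standard error.
double MonteCarloWithError(double Expiry, double Strike, double Spot, double Vol,
                           double Riskless, unsigned long N, double& stdError) {
 double var = Vol*Vol*Expiry;
 double stdDev = std::sqrt(var);
 double movedSpot = Spot*std::exp(Riskless*Expiry - 0.5*var); // S_0 exp{(r - sigma^2/2)T}
 double runningSum = 0, runningSumSq = 0;

 for (unsigned long i = 0; i < N; i++) {
  double spotAtExpiry = movedSpot*std::exp(stdDev*BoxMullerGaussian());  // one sample of S_T
  double payoff = spotAtExpiry > Strike ? spotAtExpiry - Strike : 0;     // max(S_T - K, 0)
  runningSum   += payoff;
  runningSumSq += payoff*payoff;
 }

 double discount = std::exp(-Riskless*Expiry);
 double mean = runningSum/N;
 double sampleVar = runningSumSq/N - mean*mean;  // sample variance of the payoff
 stdError = discount*std::sqrt(sampleVar/N);     // standard error of the discounted mean
 return discount*mean;
}

Reporting the estimate as price plus or minus two standard errors gives a rough 95% confidence interval; at $N=100$ trials this interval is typically wide, which is why production pricers use far larger $N$ or variance-reduction techniques.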

18 February, 2015

Put-Call Parity


Put-Call Parity for European Options.  Fix $t>0$ and let $T>t$ be a fixed future time.  Denote the continuously compounded risk-free interest rate of tenor $T-t$ at time $t$ by $r_{t}(t,T)$, and let $K$ be the strike price on some asset $S$ negotiated at time $t=0$, whose price at time $t\geq0$ is denoted by $S_{t}$.  If $c_{t}(K,T)$ and $p_{t}(K,T)$ are the respective prices of European call and put options on $S$ with strike $K$ and expiry $T$, then we have the following result: $$c_{t}(K,T)-p_{t}(K,T)=e^{-r_{t}(t,T)(T-t)}\left(F_{t}(T)-K\right),$$ where $F_{t}(T)$ is the price of a forward contract on $S$ expiring at time $T$, calculated at time $t$.  Theoretically, for an asset paying no income, $$F_{t}(T)=S_{t}e^{r_{t}(t,T)(T-t)}.$$ We are assuming throughout that the asset $S$ pays no income and has no further funding/carrying cost over the period $[t,T]$ beyond the risk-free rate $r_{t}(t,T)$.  The usual adjustments apply if there are dividends, storage costs, etc.: income lowers the forward price, while storage and other carrying costs raise it.
Proof.  The proof consists of a simple replication argument.  Let $V_{t}=c_{t}-p_{t}$ be the value at time $t$ of the portfolio consisting of a long position in the call and a short position in the put.  At time $T$ the payoff from our portfolio is
$$V_{T}=c_{T}-p_{T}=\max(S_{T}-K,0)-\max(K-S_{T},0)=S_{T}-K.$$
Therefore, the payoff from our portfolio is equal to that of a long forward position on $S$ with delivery price $K$.  (If we took the opposite positions, we would replicate a short forward position.)

Thus our portfolio replicates the payoff of a forward contract on $S$ with delivery price $K$.  By the principle of Rational Pricing, two positions with identical payoffs at time $T$ must have identical values at every earlier time $t$; otherwise one could buy the cheaper and sell the dearer for a riskless profit.

The time-$t$ value of a forward contract with delivery price $K$ is the discounted difference between the prevailing forward price $F_{t}(T)$ and $K$ (note that the payoff $S_{T}-K$ itself is not known at time $t$, but this forward value is).  It follows that
$$V_{t}=e^{-r_{t}(t,T)(T-t)}(F_{t}(T)-K)\;\;\;\;0\leq t\leq T.$$
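Substituting the theoretical forward price $F_{t}(T)=S_{t}e^{r_{t}(t,T)(T-t)}$ for a non-income-paying asset (a routine substitution, spelled out here for completeness) gives the more familiar form of put-call parity:
$$c_{t}(K,T)-p_{t}(K,T)=S_{t}-Ke^{-r_{t}(t,T)(T-t)},$$
so the call is worth more than the corresponding put precisely when the spot price exceeds the discounted strike.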

14 February, 2015

Overview of the Black-Scholes Model and PDE


In this post we take the PDE approach to pricing derivatives in the Black-Scholes universe.  In a subsequent post we will cover the risk-neutral valuation approach; the two are essentially equivalent by the Feynman-Kac formula.

I. ASSUMPTIONS

We will assume that our market consists solely of an equity (stock) $S$, a risk-free money market account (bond) $B$ with return $r\geq0$, and any number of derivatives with $S$ as the underlying.  We will make several non-technical assumptions about $S$ and our market, collectively termed the "Black-Scholes market."

1. Infinite liquidity.  Market participants can buy or sell $S$ at any time.
2. Infinite depth.  The buying and selling of $S$ does not affect the price of $S$, no matter the transaction size.
3. No friction.  It costs nothing to buy or sell $S$ (i.e. trading $S$ incurs no transaction costs).  In particular, the price paid by the buyer and received by the seller is the same (i.e. the bid-offer spread is $0$).
4. Constant risk-free rate. $r\equiv\text{const}.$
5. No arbitrage.  There do not exist portfolios of assets where the portfolio is riskless and earns more than the risk-free money account $B$.
6. Infinite divisibility. Market participants can buy $S$ in any amount $\Delta\in\mathbb{R}$.
7. Short selling is possible.  Market participants can short (borrow) $S$ at no cost.
8. No storage costs.  Market participants can hold $S$ at no cost.

It is possible to relax several of these assumptions in various ways.  Short selling is generally permitted in US markets (cf. the SEC up-tick rule), and there are of course no storage costs for equities, though one must pay any dividends on a stock one has sold short.

One can factor variable risk-free interest rates directly into the model by allowing $r$ to be a deterministic function of time, or even to follow a stochastic process.  There is a tremendous amount of ongoing research on dealing with transaction costs, but let us just say that most institutional investors in derivative contracts (e.g. market makers/banks) take huge positions, and therefore the effect of transaction costs is minimized (as we will see below, it is the frequency of trades in $S$ in a process termed dynamic hedging that is the source of transaction costs, not the quantity/volume of a single trade).  For similar reasons, the divisibility assumption is also relatively minor when large numbers are involved, since the fractional component of the quantity in a transaction is small relative to the whole-number quantity.

The assumption of infinite liquidity is generally not an issue for plain vanilla contracts that actively trade on exchanges, as there are always counter-parties available to take opposing positions.  The assumption becomes more dubious when one moves to the over-the-counter market where exotic contracts are traded; however, the existence of market makers willing to take positions in essentially any contract that can be hedged essentially validates the assumption.  The non-zero bid-offer spread of course then violates the no friction assumption.

The most questionable assumption is that of infinite depth, which runs against the law of supply and demand.  At the end of the day, however, our goal is to develop a model which is both simple and a good approximation to reality, and these assumptions lead to such a model, one which is generally good enough and which can be refined in various ways as needed.

II. ASSET PRICE MODEL - QUANTITATIVE ANALYSIS APPROACH

In order to price derivatives dependent on a stock $S$ (or any underlying, for that matter), it is necessary to first come up with a mathematical model for the price movements of $S$.  This is where the approaches of fundamental and quantitative analysis diverge markedly.  Fundamental analysis attempts to predict stock (asset) price movements through a careful analysis of a company's financial statements and other qualitative sources of information such as press releases, general market sentiment, new product launches, etc. (this process is often termed equity research and is the method used by traditional long-term value investors).  Quantitative analysis, by contrast, models stock (asset) price movements with a stochastic model, which we now discuss, and attempts to make predictions on future price movements through statistical and econometric analysis, which we will not discuss in any detail here.  Incidentally, a third approach, known as technical analysis, attempts to use past data and trends to predict future prices.  As we will see, the validity of such an approach directly contradicts the quantitative model we are about to develop, and we therefore assume it is invalid.  There is, in fact, little evidence to suggest that technical analysis leads to reliably predictable results of any degree.

We assume the existence of a probability space $(\Omega,\mathcal{F},\mathbb{P})$ and an associated filtration $\{\mathcal{F}_{t}\}_{t\geq0}$ that supports a Brownian motion $\{W(\omega,t)\}_{\omega\in\Omega,t\geq0}.$  We will assume that the reader is familiar with the construction and basic properties of $W(\omega,t)$.

Associated to any (adapted) process $\Delta(\omega,t)\in L^{2}(\Omega\times\mathbb{R}_{t})$ is an integral called the Ito integral
$$I(\omega,t)=\int_{0}^{t}\Delta(\omega,s)\;dW(\omega,s),$$
which we also assume the reader is familiar with.  (It is not defined pathwise as a Riemann-Stieltjes integral, since $W$ has unbounded variation; rather, it is the $L^{2}$ limit of approximating sums in which the sample point is always taken to be the left endpoint of each partition interval.)  In what follows, we will generally suppress reference to $\omega\in\Omega$ and sometimes use the more customary subscript notation $W_{t}, I_{t}, \Delta_{t},$ etc.
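For concreteness (this is the standard construction, recorded here only for reference), given a partition $0=t_{0}<t_{1}<\cdots<t_{m}=t$, the Ito integral is the limit in $L^{2}(\Omega)$ of the left-endpoint sums
$$I_{t}=\lim_{\max_{j}(t_{j+1}-t_{j})\to0}\sum_{j=0}^{m-1}\Delta_{t_{j}}\left(W_{t_{j+1}}-W_{t_{j}}\right).$$
The choice of the left endpoint is what makes $I_{t}$ a martingale; other choices of sample point (e.g. the midpoint, which yields the Stratonovich integral) produce genuinely different objects.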

An Ito process is a stochastic process $\{X_{t}\}_{t\geq0}$ defined by the stochastic integral equation (conventionally called an SDE, or stochastic differential equation)
$$X(t)=X(0)+\int_{0}^{t}a(X(s),s)\;ds+\int_{0}^{t}b(X(s),s)\;dW(s).$$
Note that the terminology stochastic refers to the fact that the equation involves an Ito integral driven by the Brownian motion $W$; the solution is itself a random process.  What complicates any proposed solution method is the fact that $W$ is almost surely nowhere differentiable, so the ordinary chain rule does not apply; its replacement is Ito's lemma, a correspondingly more complicated version of the chain rule.

For $\mu,\sigma\geq0$, if $a=\mu X$ and $b=\sigma X$, we get (with $S=X$)
$$S(t)=S(0)+\mu\int_{0}^{t}S(s)\;ds+\sigma\int_{0}^{t}S(s)\;dW(s).$$

This is our model for stock-price movements, and it is often called geometric Brownian motion.  It is often expressed locally in "differential" form as
$$dS=\mu Sdt+\sigma SdW$$
or
$$\frac{dS}{S}=\mu dt+\sigma dW.$$

There is no harm in doing this as long as one understands clearly that the differential form has no rigorous meaning attached to it, whereas the integral form is well-defined mathematically.  In fact, the entire topic of SDEs could just as well be phrased in terms of stochastic integral equations, since $W$ is nowhere differentiable and differential equations involving it make no literal sense.  Nonetheless, the convention pervades the subject, and in particular mathematical finance, and so one must get accustomed to it.

There are two reasons why the differential representation of geometric Brownian motion is used.  First, it provides better intuition for why the model is a good one for stock price movements, as we will explain shortly.  Second, as we will see through the remainder of this article, it facilitates the computations which come up in quantitative finance, in particular those involving Ito's lemma.

Why is geometric Brownian motion a good model for stock price movements?  The two parameters $\mu$ and $\sigma$ represent drift (expected return) and standard deviation per unit time (volatility), respectively.

Geometric Brownian motion captures the intuitive idea that asset prices should drift according to the expected return $\mu$; the riskier the asset, the higher the expected return demanded by investors, and therefore the greater the drift in the asset price, regardless of any random fluctuations.  The expected return on an asset is also independent of the asset's price level, i.e. investors will demand $\mu$ whether the asset trades at $50$ or $5$.  This is modeled by the $\mu\int_{0}^{t}S(s)\;ds$ term, or $\mu S\,dt.$  If there were no stochastic component to the model, then we would have
$$S(t)=S(0)+\mu\int_{0}^{t}S(s)\;ds,$$
or by the fundamental theorem of calculus
$$S'(t)=\mu S(t),$$
which has the solution
$$S(t)=S(0)e^{\mu t}.$$
This is how we would expect the price of a riskless asset with return $\mu$ to grow.

Of course, asset prices are anything but deterministic, and it is natural to assume the presence of an unbiased (i.e. centered at $0$) white noise weighted according to the perceived volatility of the asset (volatility is not a risk measure per se; it is, among other things, directly tied to the liquidity of the asset and how trading affects its price).  The asset price swings should also be directly proportional to the price of the asset itself.  For instance, a stock that trades around $1$ will have price swings markedly smaller than an asset that trades around $100.$  This, combined with the volatility of the asset, is modeled by the $\sigma\int_{0}^{t}S(s)\;dW(s)$ term, or $\sigma S\,dW.$

If $\mu=0$, then we would have
$$S(t)=S(0)+\sigma\int_{0}^{t}S(s)\;dW(s).$$
There is no corresponding fundamental theorem of calculus for the Ito integral; however, we will see how to solve this below by using Ito's lemma.  In any event, if we combine these two terms, we recover the geometric Brownian motion model
$$S(t)=S(0)+\mu\int_{0}^{t}S(s)\;ds+\sigma\int_{0}^{t}S(s)\;dW(s).$$
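As a concrete illustration of the model (a minimal sketch of my own, not part of the original exposition), one can simulate a discretized path of geometric Brownian motion using the exact lognormal transition over each step, $S_{t+\Delta t}=S_{t}\exp\{(\mu-\frac{1}{2}\sigma^{2})\Delta t+\sigma\sqrt{\Delta t}\,Z\}$ with $Z\sim N(0,1)$.  The sketch reuses the BoxMullerGaussian sampler from the Monte Carlo pricing post earlier in this blog; the function name SimulateGBMPath is my own.

#include <cmath>
#include <vector>

double BoxMullerGaussian(); // standard normal sampler, defined elsewhere

// Simulate one discretized path of geometric Brownian motion on [0, T]
// using the exact lognormal transition over each time step.
std::vector<double> SimulateGBMPath(double S0, double mu, double sigma,
                                    double T, unsigned long steps) {
 std::vector<double> path(steps + 1);
 path[0] = S0;
 double dt = T/steps;
 double drift = (mu - 0.5*sigma*sigma)*dt;  // Ito-corrected drift per step
 double diffusion = sigma*std::sqrt(dt);    // scale of the Gaussian increment

 for (unsigned long i = 1; i <= steps; i++) {
  path[i] = path[i-1]*std::exp(drift + diffusion*BoxMullerGaussian());
 }
 return path;
}

Because the transition density is exact, there is no discretization bias here; the step size only controls how finely the path is observed.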

III.  EXTENDING THE MODEL TO DERIVATIVES

The extension of the model to derivatives, that is, functions of the asset price $S$ and time $t$, involves a technical theorem known as Ito's lemma.  An expository article on its motivation and rigorous proof can be found in one of my previous posts, entitled a rigorous proof of Ito's lemma.

If $V=V(S(t),t)$ is the value of a contingent claim on $S$, then we have from Ito's lemma (appropriately extended to handle geometric Brownian motion) that $V$ follows the process
$$\begin{align*}
V(t)&=V(0)+\int_{0}^{t}V_{t}(S(s),s)\;ds+\int_{0}^{t}V_{S}(S(s),s)\;dS(s)+\frac{1}{2}\sigma^{2}\int_{0}^{t}S^{2}(s)V_{SS}(S(s),s)\;ds\\
&=V(0)+\mu\int_{0}^{t}S(s)V_{S}(S(s),s)\;ds+\sigma\int_{0}^{t}S(s)V_{S}(S(s),s)\;dW(s)+\int_{0}^{t}V_{t}(S(s),s)\;ds+\frac{1}{2}\sigma^{2}\int_{0}^{t}S^{2}(s)V_{SS}(S(s),s)\;ds\\
&=V(0)+\int_{0}^{t}\left(\mu S(s)V_{S}(S(s),s)+\frac{1}{2}\sigma^{2}S^{2}(s)V_{SS}(S(s),s)+V_{t}(S(s),s)\right)ds+\sigma\int_{0}^{t}S(s)V_{S}(S(s),s)\;dW(s).\end{align*}$$

In the more usual differential form, we have then
$$dV=\left(\mu SV_{S}+\frac{1}{2}\sigma^{2}S^{2}V_{SS}+V_{t}\right)dt+\sigma SV_{S}dW.$$

We again point out that the differential form is just a short-hand for the integral form above, which is the only mathematically meaningful expression.
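As a quick sanity check of the formula (my own worked example, not in the original), take $V(S,t)=\log S$, so that $V_{t}=0$, $V_{S}=1/S$, and $V_{SS}=-1/S^{2}$.  The differential form above gives
$$d\log S=\left(\mu S\cdot\frac{1}{S}-\frac{1}{2}\sigma^{2}S^{2}\cdot\frac{1}{S^{2}}\right)dt+\sigma S\cdot\frac{1}{S}\;dW=\left(\mu-\frac{1}{2}\sigma^{2}\right)dt+\sigma\;dW,$$
which integrates to $S(t)=S(0)\exp\{(\mu-\frac{1}{2}\sigma^{2})t+\sigma W(t)\}$, the explicit solution of geometric Brownian motion used in the Monte Carlo pricing post earlier in this blog (with $\mu$ replaced by $r$ there).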

IV. DERIVING A NO-ARBITRAGE CONDITION ON $V(S(t),t)$: THE BLACK-SCHOLES PDE

In this section $\Pi$ denotes the value of a portfolio consisting of $\Delta$ units of an asset $S$, whose value $\{S_{t}\}_{t\geq0}$ follows the geometric Brownian motion process discussed previously, and a short position in a derivative whose underlying is $S$.

As we will see, the exact nature of the derivative and its payoff function (the value of $V$ at expiry, i.e. $V(S(T),T)$) is unimportant; all that matters is that its value at time $0\leq t\leq T$ depends only on $(S(t),t)$.  In other words, $V$ is path independent.  The PDE $V$ must satisfy is the same whether $V$ is the value function for an option, a forward agreement, a future, or anything else.  In subsequent posts we will discuss briefly how one might extend the PDE to cover more exotic types of derivatives with path-dependent payoffs or other exotic features written into their contracts, like barriers and knock-outs.  We will also assume that $S$ provides no income over the life of the derivative.  In subsequent posts we will discuss how to incorporate income (i.e. dividends if $S$ is a stock); in any event the modification is trivial (just replace $r$ with $r-q$ below, $q$ being the continuous dividend yield).

The idea of the derivation is to determine the initial capital (cost) that must be put up to construct the portfolio $\Pi$ and hedge the position (risk) so as to ensure a riskless payoff.  That payoff must then grow like the risk-free money market account, $\Pi(0)e^{rt}$; otherwise there would exist simple arbitrage strategies.  This puts a condition on $\Delta$, the quantity of the underlying we must hold at time $t$ in order to hedge against a short position in $V$.  The strategy is known as continuous time delta hedging.  The hedging eliminates the risk of the portfolio in real time, and as mentioned, means the portfolio must earn the risk-free rate.  It is really quite phenomenal that this is possible, and the basic reason it is possible is that both $S$ and $V$ are driven by the same underlying source of uncertainty, namely the Brownian motion process $\{W_{t}\}_{t\geq0}$.  Using Ito's lemma and a suitable choice of $\Delta$, we can cancel the terms involving $W$ and therefore eliminate the uncertainty.  Indeed, this perfect cancellation generally fails for more complicated asset price models (an active area of research, since there is considerable evidence that the true process followed by an asset $S$ has heavier tails than Brownian motion, e.g. a Levy process).

From the way we have constructed our portfolio, and using the previous sections, at time $0\leq t\leq T$ we have
$$\begin{align*}
\Pi(t)&=\Delta(t)S(t)-V(S(t),t)\\
&=\Delta(t)\left\{S(0)+\int_{0}^{t}\mu S(s)\;ds+\int_{0}^{t}\sigma S(s)\;dW(s)\right\}\\
&\;\;\;\;-\left\{V(0)+\int_{0}^{t}\left(\mu S(s)V_{S}(S(s),s)+\frac{1}{2}\sigma^{2}S^{2}(s)V_{SS}(S(s),s)+V_{t}(S(s),s)\right)\;ds+\int_{0}^{t}\sigma S(s)V_{S}(S(s),s)\;dW(s)\right\}\\
&=\Delta(t)S(0)-V(0)\\
&\;\;\;\;+\int_{0}^{t}\left[\Delta(t)\mu S(s)-\left(\mu S(s)V_{S}(S(s),s)+\frac{1}{2}\sigma^{2}S^{2}(s)V_{SS}(S(s),s)+V_{t}(S(s),s)\right)\right]ds\\
&\;\;\;\;+\int_{0}^{t}\left[\Delta(t)\sigma S(s)-\sigma S(s)V_{S}(S(s),s)\right]dW(s).
\end{align*}$$

We seek to eliminate the uncertainty in $\Pi$, so we match the coefficients of the $dW(s)$ integrals by choosing, at each time,
$$\Delta(t)=V_{S}(S(t),t).$$
(Observe very carefully below how this substitution works!)

Thus, $\Pi$ now being riskless, in order to preclude arbitrage opportunities with this portfolio we must have
$$\begin{align*}
\Pi(t)&=V_{S}(S(0),0)S(0)-V(0)\\
&\;\;\;\;+\int_{0}^{t}\left[\left(\mu V_{S}(S(s),s)S(s)-\mu S(s)V_{S}(S(s),s)\right)-\left(\frac{1}{2}\sigma^{2}S^{2}(s)V_{SS}(S(s),s)+V_{t}(S(s),s)\right)\right]ds\\
&\;\;\;\;+\int_{0}^{t}\left[\sigma V_{S}(S(s),s)S(s)-\sigma S(s)V_{S}(S(s),s)\right]dW(s)\\
&=V_{S}(S(0),0)S(0)-V(0)-\int_{0}^{t}\left[\frac{1}{2}\sigma^{2}S^{2}(s)V_{SS}(S(s),s)+V_{t}(S(s),s)\right]ds\\
&=\Pi(0)e^{rt}\\
&=\text{value of the risk-free money market account at time}\;t\;\text{with initial investment}\;\Pi(0).
\end{align*}$$

Note that with the non-differentiable Brownian motion now gone, we can legitimately pass to the differential form of this expression by using the fundamental theorem of calculus.  Differentiating with respect to $t$ we have the localized version of $\Pi$ given by
$$\Pi_{t}=-\left(V_{t}+\frac{1}{2}\sigma^{2}S^{2}V_{SS}\right)=r\Pi(0)e^{rt}=r\Pi=r(SV_{S}-V).$$
Consequently,
$$V_{t}+\frac{1}{2}\sigma^{2}S^{2}V_{SS}+rSV_{S}=rV.$$
Note that we also have
$$V_{t}+\frac{1}{2}\sigma^{2}S^{2}V_{SS}=-r\Pi(0)e^{rt}.$$
Of course we don't know $\Pi(0)$ a priori, so we appeal to the previous equation, which is known as the Black-Scholes PDE.  Observe that despite $S$ being present, there is nothing random about the PDE, and $S$ can be regarded as a dummy variable.  It is the no-arbitrage requirement that leads to a deterministic outcome, and at each time/asset price pair $(S(t),t)$ the price of the derivative is given by the solution to the PDE, with remaining time to expiry being $T-t$.

The equation is backward parabolic, and so requires a terminal condition $f$ at time $t=T$: the payoff of the specific derivative under consideration.  Additional boundary conditions lead to prices for other, more exotic options, discussed in subsequent posts.
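To make the PDE point of view concrete (a minimal sketch of my own, not the author's code; the function name ExplicitFDCall and the grid parameters Smax, M, Nt are my choices), here is an explicit finite-difference scheme that marches the Black-Scholes PDE backward from the call payoff at $t=T$ to $t=0$.

#include <vector>
#include <cmath>
#include <algorithm>

// Explicit finite-difference solution of the Black-Scholes PDE for a
// European call, marching backward in time from the payoff at t = T.
double ExplicitFDCall(double Spot, double Strike, double Vol, double Riskless,
                      double Expiry, double Smax, int M /*price steps*/, int Nt /*time steps*/) {
 double dS = Smax/M;
 double dt = Expiry/Nt;
 std::vector<double> V(M + 1), Vnew(M + 1);

 // Terminal condition: payoff max(S - K, 0) on the grid S_i = i*dS.
 for (int i = 0; i <= M; i++) V[i] = std::max(i*dS - Strike, 0.0);

 for (int n = Nt - 1; n >= 0; n--) {
  double t = n*dt;
  for (int i = 1; i < M; i++) {
   double S = i*dS;
   double Vss = (V[i+1] - 2.0*V[i] + V[i-1])/(dS*dS); // central second difference
   double Vs  = (V[i+1] - V[i-1])/(2.0*dS);           // central first difference
   // Step backward: V(t - dt) = V(t) + dt*(0.5*sigma^2*S^2*V_SS + r*S*V_S - r*V)
   Vnew[i] = V[i] + dt*(0.5*Vol*Vol*S*S*Vss + Riskless*S*Vs - Riskless*V[i]);
  }
  // Boundary conditions for a call: worthless at S = 0, and
  // approximately S - K*exp(-r*(T - t)) for very large S.
  Vnew[0] = 0.0;
  Vnew[M] = Smax - Strike*std::exp(-Riskless*(Expiry - t));
  V = Vnew;
 }

 // Linear interpolation of the grid values at the desired spot price.
 int i = static_cast<int>(Spot/dS);
 if (i >= M) return V[M];
 double w = (Spot - i*dS)/dS;
 return (1.0 - w)*V[i] + w*V[i+1];
}

The explicit scheme is only conditionally stable: roughly, one needs $\Delta t\lesssim\Delta S^{2}/(\sigma^{2}S_{\max}^{2})$.  Implicit or Crank-Nicolson schemes remove this restriction at the cost of solving a tridiagonal system at each time step, and these are the schemes usually meant by "PDE methods" in practice.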


11 February, 2015

Analysis of the Black Scholes PDE

In this post we conduct a cursory analysis of the Black-Scholes (B-S) partial differential equation (PDE), including existence and uniqueness of solutions, well-posedness, and in certain special circumstances, analytical solutions.

The B-S PDE is defined on the positive quadrant $\mathbb{R}^{+}_{S}\times\mathbb{R}^{+}_{t}=\{(S,t): S\in(0,\infty),t\in(0,\infty)\}$ by the equation
$$(1)\;\;\;\;V_{t}+\frac{1}{2}\sigma^{2}S^{2}V_{SS}+rSV_{S}-rV=0.$$

For various pay-off functions $f=f(S,t)$ (where the $S$ in the argument is, in general, interpreted as the path $\{S_{t}\}_{t}$; in many cases the payoff depends only on $S_{T}$, the value of $S$ at expiration), boundary conditions may be imposed and the domain of definition becomes $(0,S_{\text{max}})\times(0,T)$.  We shall discuss boundary conditions in more detail later; for now we work on the full positive quadrant.

Since the coefficient of $V_{SS}$ satisfies
$$\frac{1}{2}\sigma^{2}S^{2}\xi^{2}\geq\theta\xi^{2}\;\;\;\;\text{for all}\;\xi\in\mathbb{R}$$
for some positive constant $\theta$ (depending only on $\sigma$, once $S$ is bounded away from $0$), and since the time derivative enters (1) with the opposite sign convention to the forward heat equation, (1) is uniformly backward parabolic on regions where $S$ is bounded away from $0$.

I.  Converting the B-S PDE to the heat equation (systematic procedure)

We begin our analysis by performing a sequence of changes of independent and dependent variables in order to reduce the equation to a simpler one.  We first deal with the variable coefficients, whose special structure suggests the change of variables $S=e^{x}$, or, using a new variable name, $x=\log S$.  The differential operators then become
$$\frac{\partial}{\partial S}=\frac{\partial}{\partial x}\frac{\partial{x}}{\partial S}=\frac{1}{S}\frac{\partial}{\partial x}$$
and
$$\frac{\partial^{2}}{\partial S^{2}}=\frac{\partial}{\partial S}\left[\frac{1}{S}\frac{\partial}{\partial x}\right]=\frac{1}{S^{2}}\left(\frac{\partial^{2}}{\partial x^{2}}-\frac{\partial}{\partial x}\right).$$
Substituting these into (1) yields
$$V_{t}+\frac{1}{2}\sigma^{2}(V_{xx}-V_{x})+rV_{x}-rV=0,$$
or
$$V_{t}+\frac{1}{2}\sigma^{2}V_{xx}+(r-\frac{1}{2}\sigma^{2})V_{x}-rV=0.$$

The reaction term $rV$ will lead to an exponential increase in the solution, and so we assume $V$ has the form $e^{rt}u$ for some to be determined function $u$.  This leads to the change of variables $V\mapsto e^{rt}u$ and the equation becomes
$$e^{rt}\left(ru+u_{t}+\frac{1}{2}\sigma^{2}u_{xx}+(r-\frac{1}{2}\sigma^{2})u_{x}-ru\right)=0,$$
or
$$u_{t}+\frac{1}{2}\sigma^{2}u_{xx}+(r-\frac{1}{2}\sigma^{2})u_{x}=0.$$

(If $r=r(t)$ is not constant, but a deterministic function of $t$ alone, then we can solve the ODE $w_{t}-r(t)w=0$ and make the change of variables $V=w(t)u$ in order to eliminate the source term; we will assume however that $r\equiv\text{const}$ for the remainder of this article.)

We now deal with the drift term $(r-\frac{1}{2}\sigma^{2})u_{x}$ by switching to the moving frame $x'=x-(r-\frac{1}{2}\sigma^{2})t$ (i.e. switching to the characteristic coordinate system).  We also at this stage change the structure of the equation to forward parabolic by changing the direction of evolution to forward time through the change of variables $t\mapsto -t$, or $t'=-t.$  We compute as before the new differential operators
$$\frac{\partial}{\partial x}=\frac{\partial}{\partial x'}\frac{\partial x'}{\partial x}+\frac{\partial}{\partial t'}\frac{\partial t'}{\partial x}=\frac{\partial}{\partial x'},$$
$$\frac{\partial^{2}}{\partial x^{2}}=\frac{\partial^{2}}{\partial x'^{2}},$$
$$\frac{\partial}{\partial t}=\frac{\partial}{\partial x'}\frac{\partial x'}{\partial t}+\frac{\partial}{\partial t'}\frac{\partial t'}{\partial t}=-(r-\frac{1}{2}\sigma^{2})\frac{\partial}{\partial x'}-\frac{\partial}{\partial t'}.$$

Substitution then yields
$$-(r-\frac{1}{2}\sigma^{2})u_{x'}-u_{t'}+\frac{1}{2}\sigma^{2}u_{x'x'}+(r-\frac{1}{2}\sigma^{2})u_{x'}=0$$
or
$$(2)\;\;\;\;u_{t'}-ku_{x'x'}=0.$$

This is the heat equation, with $k=\frac{1}{2}\sigma^{2}.$  It therefore suffices to study the properties of (1) by studying (2).
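For reference (this just collects the substitutions above into a single formula), the composite change of variables can be written as
$$V(S,t)=e^{rt}\,u\!\left(\log S-\left(r-\tfrac{1}{2}\sigma^{2}\right)t,\;-t\right),$$
where $u=u(x',t')$ solves the heat equation (2) with $k=\frac{1}{2}\sigma^{2}$.  Note that $t'=-t$ increases as the financial time $t$ decreases, so the terminal payoff at $t=T$ becomes the earliest-time (initial) data for the heat equation.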

II. Uniqueness and Stability

We shall restrict our attention to the domain $D=\{(x,t):0\leq t\leq T,0\leq x\leq \ell\}.$  This is consistent with the domain used in practice when solving (1).  We have the following theorem.
Maximum Principle.  If $u$ solves the heat equation $u_{t}=ku_{xx}$, then $u$ achieves its maximum on the boundary lines $t=0$, $x=0$, or $x=\ell$.  Moreover, if $u$ achieves its maximum in the interior of $D$ or on the line $t=T$, then $u$ is constant.
The first assertion is what is known as the weak maximum principle, and this is what we shall prove.  The second assertion is called the strong maximum principle since it implies the weak version; however, its proof requires tools analogous to those from the theory of harmonic functions (e.g. a mean-value property over heat balls) and some non-trivial analysis (cf. Evans).  Analogous statements hold for the minimum by considering $-u$.

Proof.  Recall from calculus that if a function attains its maximum at an interior point of a set $E$, then its first partial derivatives vanish and its pure second partial derivatives are $\leq0$ there.  Therefore, at an interior maximum of $u$ we would have $u_{t}=0$ and $u_{xx}\leq0$.  The idea is to perturb $u$ so that $u_{t}-ku_{xx}$ becomes strictly negative, which is incompatible with such a maximum, and thereby obtain a contradiction.

Let $\epsilon>0$, put $M=\sup_{(x,t)\in\partial D\setminus\{t=T\}}u$ (the supremum of $u$ over the parabolic boundary), and define $v:=u+\epsilon x^{2}.$  The definition of $v$ shows
$$(3)\;\;\;\;v_{t}-kv_{xx}=u_{t}-ku_{xx}-2\epsilon k=-2\epsilon k<0.$$
Applying the argument in the first paragraph shows that $v$ cannot attain its maximum in $D^{\circ}$ (at an interior maximum we would have $v_{t}=0$ and $v_{xx}\leq0$, so $v_{t}-kv_{xx}\geq0$, contradicting (3)).  Suppose now $v$ attains its maximum at a point $(x_{0},T)$ on the line $t=T$.  Then $v_{x}=0,v_{xx}\leq0$ as before, and
$$v_{t}(x_{0},T)=\lim_{\delta\to0^{+}}\frac{v(x_{0},T)-v(x_{0},T-\delta)}{\delta}\geq0,$$
so again $v_{t}-kv_{xx}\geq0$, contradicting (3).  Since $D$ is compact, the continuous function $v$ must therefore attain its maximum on $\partial D\setminus\{(x,t):t=T\}.$  Hence $u\leq v\leq M+\epsilon\ell^{2}$ on $D$, and sending $\epsilon\to0$ yields the weak maximum principle for $u$.

This theorem immediately settles the uniqueness of the boundary value problem for (2): suppose $u$ and $v$ are both solutions to (2) with identical boundary data.  Then $w:=u-v$ is also a solution by linearity and vanishes on the boundary $x=0$, $x=\ell$, $t=0$.  Thus $w\equiv0$ in all of $D$ by combining the maximum and minimum principles.  This implies $u\equiv v$ and the claim follows.

Another important route to establishing uniqueness is through the concept of conservation of energy.

Consider the "energy" integral
$$E(t)=\int_{0}^{\ell}|u(x,t)|^{2}\;dx$$
where $u$ vanishes on the spatial boundary $x=0$, $x=\ell$ and at the initial time $t=0$.  Differentiating with respect to $t$ yields
$$E'(t)=\frac{d}{dt}\int_{0}^{\ell}u^{2}\;dx=\int_{0}^{\ell}2uu_{t}\;dx.$$
where differentiation under the integral sign is valid since, as will be clear in the next section when we solve the heat equation, the mapping $x\mapsto 2u(x,t)u_{t}(x,t)$ is absolutely integrable on $[0,\ell]$, locally uniformly in $t$.  Substituting $u_{t}=ku_{xx}$ and integrating by parts yields
$$E'(t)=2k\int_{0}^{\ell}uu_{xx}\;dx=2kuu_{x}\Big|_{x=0}^{x=\ell}-2k\int_{0}^{\ell}u^{2}_{x}\;dx=-2k\int_{0}^{\ell}u_{x}^{2}\;dx\leq0.$$
Since $E\geq0$, $E(0)=0$, and $E'\leq0$, we conclude that $E\equiv0$ for all $t$.  Consequently $u\equiv0$, and by considering the difference of two solutions with identical initial and boundary data we recover uniqueness.  (Strictly speaking, $E\equiv0$ gives $u=0$ only almost everywhere in $x$ for each $t$, but since solutions to the heat equation are smooth, we recover pointwise uniqueness.)

We complete the picture of well-posedness by noting that the problem is also stable with respect to the initial and boundary data: applying the weak maximum and minimum principles to the difference of two solutions shows that the solutions are uniformly close whenever their data are.
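Concretely (a standard consequence of the maximum principle, stated here for completeness), if $u$ and $\tilde{u}$ solve (2) with data $g$ and $\tilde{g}$ prescribed on the parabolic boundary $\partial D\setminus\{t=T\}$, then applying the weak maximum and minimum principles to $u-\tilde{u}$ gives
$$\max_{D}|u-\tilde{u}|\leq\max_{\partial D\setminus\{t=T\}}|g-\tilde{g}|,$$
so small perturbations of the initial and boundary data produce correspondingly small perturbations of the solution.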

III. Existence and the fundamental solution

There are by now multiple approaches to solving the heat equation recorded in the literature.  One of the simplest makes use of the Fourier transform.  We begin by extending our spatial domain to all of $\mathbb{R}$ and enforcing a decay condition so that $u\in L^{2}(dx).$  Our convention for the (one-dimensional) Fourier transform is
$$\mathcal{F}(u)(\xi):=\hat{u}(\xi)=(2\pi)^{-1/2}\int_{-\infty}^{\infty}u(x)e^{-ix\xi}\;dx$$
so that
$$||\hat{u}||_{2}=||u||_{2}\;\;\;\;(\text{Plancherel})$$
along with the associated inversion formula
$$u(x)=(2\pi)^{-1/2}\int_{-\infty}^{\infty}\hat{u}(\xi)e^{ix\xi}\;d\xi.$$

The Fourier transform converts differentiation in the spatial variable $x$ into multiplication by $i\xi$ (and hence $\partial_{x}^{2}$ into multiplication by $-\xi^{2}$), so taking the transform in $x$ gives

$$\hat{u}_{t}=-k|\xi|^{2}\hat{u}.$$

Note that we have used the relation $\mathcal{F}(\partial_{t}u)=\partial_{t}\mathcal{F}u,$ which holds because of our condition on $u$.

Solving this ODE (in $t$, for each fixed $\xi$) yields
$$\hat{u}(\xi,t)=\hat{u}(\xi,0)e^{-k\xi^{2}t}.$$

A simple computation, substituting the above result into the inversion formula and applying the convolution theorem, then yields

$$u(x,t)=(4\pi kt)^{-\frac{1}{2}}\int_{-\infty}^{\infty}\phi(y)e^{-(x-y)^{2}/4kt}\;dy,$$
where $\phi(x)=u(x,0).$  This identifies the fundamental solution (heat kernel) as
$$\Phi(x,t)=(4\pi kt)^{-\frac{1}{2}}e^{-x^{2}/4kt},$$
so that $u(\cdot,t)=\Phi(\cdot,t)*\phi$.
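As a quick check (a routine computation added for completeness), one verifies directly that $\Phi$ satisfies the heat equation for $t>0$: differentiating,
$$\Phi_{t}=\Phi\cdot\left(\frac{x^{2}}{4kt^{2}}-\frac{1}{2t}\right),\;\;\;\;\Phi_{xx}=\Phi\cdot\left(\frac{x^{2}}{4k^{2}t^{2}}-\frac{1}{2kt}\right),$$
so that $\Phi_{t}=k\Phi_{xx}$.  Moreover $\Phi(\cdot,t)\to\delta_{0}$ as $t\to0^{+}$ in the sense of distributions, which is why the convolution $u(\cdot,t)=\Phi(\cdot,t)*\phi$ recovers the initial data $\phi$.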

IV. Boundary conditions and special solutions to the B-S PDE

Let us examine some typical boundary conditions in the Black-Scholes PDE.