20 March, 2015

A Rigorous Proof of Ito's Lemma

In this post we state and prove Ito's lemma.  To get directly to the proof, go to II Proof of Ito's Lemma.

For all its importance, Ito's lemma is rarely proved in finance texts, where one often finds only a heuristic justification involving Taylor's series and the intuition of the "differential form" of the lemma.  There are various reasons for this.  Ito's lemma is really a statement about integration, not differentiation.  Indeed, differentiation is not even defined in the realm of stochastic processes due to the non-differentiability of Brownian paths.  Thus, in order to present a proof of Ito's lemma, one must first cover stochastic integrals and, prior to that, the basic properties of Brownian motion, topics which for reasons of scope/audience cannot always be covered.  However, even more mathematically inclined texts only provide a sketch and gloss over the technical details of convergence.  The purpose of this article is to remedy this situation, and we begin with some motivation from ordinary calculus.


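If $f$ is $k+1$ times differentiable then Taylor's theorem asserts
$$f(t+h)=f(t)+f'(t)h+\frac{1}{2}f''(t)h^{2}+\ldots+\frac{1}{k!}f^{(k)}(t)h^{k}+\frac{1}{(k+1)!}f^{(k+1)}(t^{*})h^{k+1},\tag{1}$$
where $t^{*}\in[t,t+h]$ if $h>0$ and $t^{*}\in[t+h,t]$ if $h<0$.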

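Fix $T>0$ ($T$ not necessarily small) and consider the difference $f(T)-f(0)$.  This can be computed as a sum of non-overlapping differences, i.e. if $\Pi=\{t_{0}=0,t_{1},\ldots,t_{n}=T\}$ is a partition of $[0,T]$, then with the aid of (1) using $h=t_{i+1}-t_{i}$, we get
$$\begin{align*}f(T)-f(0)&=\sum_{i=0}^{n-1}\left(f(t_{i+1})-f(t_{i})\right)\\&=\sum_{i=0}^{n-1}f'(t_{i})(t_{i+1}-t_{i}) + \frac{1}{2}\sum_{i=0}^{n-1}f''(t_{i})(t_{i+1}-t_{i})^{2}+\ldots\tag{2}\end{align*}$$

As $n\to\infty$ (or $||\Pi||\to0$, i.e. $\max_{i}(t_{i+1}-t_{i})\to0$), we get
$$\sum_{i=0}^{n-1}f'(t_{i})(t_{i+1}-t_{i})\to\int_{0}^{T}f'(s)\;ds,$$
and for $k\geq2$
$$\left|\sum_{i=0}^{n-1}f^{(k)}(t_{i})(t_{i+1}-t_{i})^{k}\right|\leq T\;||\Pi||^{k-1}\sup_{s\in[0,T]}|f^{(k)}(s)|\to0.$$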

That is, $f(T)-f(0)=\int_{0}^{T}f'(s)\;ds$, which is the second fundamental theorem of calculus.  Now suppose $f$ and $g$ are smooth functions with $k+1$ derivatives and consider the composition $h=f\circ g$.  The familiar chain rule implies $h$ is differentiable and that
$$h'(t)=f'(g(t))g'(t).\tag{3}$$

By substituting $h$ into (2) and computing $h^{(k)}$ iteratively according to (3), we get
$$f(g(T))-f(g(0))=\int_{0}^{T}f'(g(s))g'(s)\;ds.\tag{4}$$

We shall now see what happens when $g$ is not differentiable.  In that case, $h$ is in general not differentiable, and (1) through (4) are no longer valid.  However, we can write (4) instead as
$$f(g(T))-f(g(0))=\int_{0}^{T}f'(g(s))\;dg(s),\tag{5}$$
where the integral is now taken as a Riemann-Stieltjes integral.  If $g$ is differentiable, then (5) reduces to (4), but (5) still makes sense even if $g$ is merely continuous (continuity is needed since $\int h\;dg$ is not well-defined if $h$ and $g$ share a common discontinuity, and $h=f(g(t))$ will in general be discontinuous wherever $g$ is).  Moreover, since $f$ is smooth, we may rewrite (2) as
$$\begin{align*}f(g(T))-f(g(0))&=\sum_{i=0}^{n-1}\left(f(g(t_{i+1}))-f(g(t_{i}))\right)\\&=\sum_{i=0}^{n-1}f'(g(t_{i}))(g(t_{i+1})-g(t_{i})) + \frac{1}{2}\sum_{i=0}^{n-1}f''(g(t_{i}))(g(t_{i+1})-g(t_{i}))^{2}+\ldots\end{align*}$$

Despite $g$ being non-differentiable, if it is sufficiently "nice" then the terms converge to the same values as in (2) and we will recover (5).  A useful sufficient condition is that $g$ be continuous and of bounded variation.  This means
$$[g](T):=\sup_{\Pi}\sum_{i=0}^{n-1}|g(t_{i+1})-g(t_{i})|<\infty,$$
the supremum being taken over all partitions $\Pi$ of $[0,T]$.
It is easy to prove that if $g$ is continuously differentiable, then it is of bounded variation, since then an easy application of the above (or the mean-value theorem) gives (for a norm decreasing sequence of partitions $\Pi_{1},\Pi_{2},\ldots$)
$$[g](T)=\lim_{n\to\infty}\sum_{t_{i}\in\Pi_{n}}|g(t_{i+1})-g(t_{i})|=\lim_{n\to\infty}\sum_{t_{i}\in\Pi_{n}}|g'(t_{i}^{*})|(t_{i+1}-t_{i})=\int_{0}^{T}|g'(s)|\;ds<\infty.$$
For $\int f\;dg$, $g$ being of bounded variation and not sharing common discontinuities with $f$ is usually the most general sufficient condition used when considering existence, though this is not strictly necessary.  When $g$ is not of bounded variation, then $\int f\;dg$ may or may not exist, and its value may even depend on the particular sample point used in the approximating sums, as we shall see below.

Now, the Ito lemma deals with the special case $g(t)=W(t)$ where $W$ is a Brownian motion sample path.  It turns out that for $\omega$ a.s. that
The latter quantity is called the quadratic (or second) variation of $W$.  For continuously differentiable functions $g$, $[g,g](T)\equiv0$ (this follows from estimating the higher order terms in (2)).  Moreover,
In fact, we have $[W]^{(\alpha)}(T)=\infty$ for $\alpha\leq2$ and $[W]^{(\alpha)}(T)=0$ for $\alpha>2$, a.s. $\omega$.  It would seem that the regularity on which integration theory depends so directly (i.e. variation of the integrator) is not tractable for $W$.  It turns out, though, that we can obtain something useful by weakening the definition slightly.  Let $\Pi_{1},\Pi_{2},\ldots$ be a sequence of partitions with $||\Pi_{n}||\to0$ as $n\to\infty$.  Then we can redefine the quadratic variation as
$$[W,W](T):=\lim_{n\to\infty}\sum_{t_{i}\in\Pi_{n}}(W(t_{i+1})-W(t_{i}))^{2}.$$
Unfortunately, even this is not well-defined without further qualification.  The supremum definition of the quadratic variation is a.s. infinite because, for a fixed sample path $\omega$ and any $C>0$, it is possible to find a sequence of partitions $\{\Pi^{C}_{n}\}_{n}$ so that the above definition is equal to $C$.  However, the limit converges to $T$ in $L^{2}(\Omega)$ (and hence in probability, if you prefer).  That is to say, it converges in the $L^{2}$ norm to some random variable $Q(\omega)$ so that $Q(\omega)=T$ a.s. $\omega$ (recall that $L^{2}$ limits are defined only up to a set of measure $0$).  It turns out that if we make the further restriction that the partitions are nested, $\Pi_{1}\subset \Pi_{2}\subset \Pi_{3}\subset\ldots$, and that $\sum_{n=1}^{\infty}||\Pi_{n}||<\infty$, then the limit also holds $\omega$ a.s. pointwise (Borel-Cantelli).  In the remainder of this post we will not distinguish between these modes of convergence and state freely that $[W,W](T)=T$, without further reference to any technicalities with this claim.
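The behaviour of the different variations can be illustrated numerically.  The following is only a sketch under stated assumptions: the path is simulated on a dyadic grid from independent Gaussian increments, the coarser partitions are obtained by subsampling that same path (so they are nested), and the names `brownian_path` and `variation_sum` are ours, not taken from any library.

```python
import math
import random


def brownian_path(n_steps, horizon, rng):
    """Sample W at n_steps + 1 equally spaced times on [0, horizon], W(0) = 0."""
    dt = horizon / n_steps
    path = [0.0]
    for _ in range(n_steps):
        # Increments over disjoint intervals are independent N(0, dt).
        path.append(path[-1] + rng.gauss(0.0, math.sqrt(dt)))
    return path


def variation_sum(path, power):
    """Sum of |W(t_{i+1}) - W(t_i)|**power along the points in `path`."""
    return sum(abs(b - a) ** power for a, b in zip(path, path[1:]))


rng = random.Random(2015)  # arbitrary seed, for reproducibility only
T = 2.0
finest = 16
fine_path = brownian_path(2 ** finest, T, rng)

for level in (4, 8, 12, 16):
    # Keep every 2**(finest - level)-th point: a partition with 2**level intervals.
    coarse = fine_path[:: 2 ** (finest - level)]
    print(level,
          variation_sum(coarse, 1),   # first variation: keeps growing
          variation_sum(coarse, 2),   # quadratic variation: settles near T
          variation_sum(coarse, 3))   # cubic variation: shrinks towards 0
```

As the partition is refined, the first column of sums should grow without bound, the second should stabilise near $T$, and the third should decay, in line with the values of $[W]^{(\alpha)}(T)$ discussed above.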

Since $[W](T)=\infty$ and $[W,W](T)=T$, we must take care in computing the various limits appearing in
$$\begin{align*}f(W(T))-f(W(0))&=\sum_{i=0}^{n-1}\left(f(W(t_{i+1}))-f(W(t_{i}))\right)\\&=\sum_{i=0}^{n-1}f'(W(t_{i}))(W(t_{i+1})-W(t_{i})) + \frac{1}{2}\sum_{i=0}^{n-1}f''(W(t_{i}))(W(t_{i+1})-W(t_{i}))^{2}+\ldots\end{align*}$$

Since $[W,W](T)=T<\infty$, it follows that $[W]^{(k)}(T)=0$ for all $k\geq3$ by a simple estimate, as has been done several times above.  Thus the $\ldots$ terms can safely be ignored.  And since $\sup_{t\in[0,T]}|f''(W(t))|<\infty$, the second sum converges.  We shall see that it converges to
$$\frac{1}{2}\int_{0}^{T}f''(W(s))\;ds.$$
(Incidentally, this is where the commonly used, though mathematically meaningless, notation $dWdW=dt$ comes from).  The first term also converges, though this is not immediately obvious since the Riemann-Stieltjes theory does not apply to it as the integrator $W(t)$ is not of bounded variation.  It turns out that it converges to
$$\int_{0}^{T}f'(W(s))\;dW(s),$$
where the integral is what is known as an Ito integral.  This integral is constructed exactly like a Riemann-Stieltjes integral, except that the sample point used in the approximating sums must always be the left-hand point of the interval.  Different approximation schemes (i.e. mid-point, right-point, etc.) lead to different limiting values.  If the mid-point is used, it is referred to as the Stratonovich integral.  We shall not need this integral here.  The reason that the Ito integral is used (i.e. left-hand point approximation) is that $f(W(t_{i}))$ is interpreted as the position we take in a stock at time $t_{i}$ with the information available at time $t_{i}$, and the capital gain on the stock is then $f(W(t_{i}))(W(t_{i+1})-W(t_{i}))$ if we assume the stock price follows a Brownian motion (which strictly speaking it doesn't, but we shall ignore this fact here since it can be corrected by replacing $W$ with geometric Brownian motion $X$).  Summing the individual gains and then taking the limit as $\max|t_{i+1}-t_{i}|\to0$ gives us the net capital gains on a portfolio resulting from taking positions $f(W(t))$ in continuous time.
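To see concretely that the choice of sample point matters, here is a minimal sketch for the integrand $f(W)=W$.  It is an illustration under assumptions, not a construction of either integral: the Brownian increments are simulated, and the Stratonovich-style sum uses the average of the two endpoint values as a stand-in for the mid-point rule.  The function name `endpoint_sums` is hypothetical.

```python
import math
import random


def endpoint_sums(n_steps, horizon, rng):
    """Approximate the integral of W dW on [0, horizon] with two sample-point rules."""
    dt = horizon / n_steps
    w = 0.0
    left_sum = 0.0      # Ito-style: integrand evaluated at the left endpoint
    average_sum = 0.0   # Stratonovich-style: average of the two endpoint values
    quad_var = 0.0      # running sum of squared increments
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))
        left_sum += w * dw
        average_sum += (w + 0.5 * dw) * dw
        quad_var += dw * dw
        w += dw
    return w, left_sum, average_sum, quad_var


rng = random.Random(41)  # arbitrary seed
T = 1.0
w_T, left_sum, average_sum, quad_var = endpoint_sums(10_000, T, rng)
print("left-point sum      :", left_sum)
print("(W(T)^2 - T) / 2    :", 0.5 * (w_T ** 2 - T))
print("endpoint-average sum:", average_sum)
print("W(T)^2 / 2          :", 0.5 * w_T ** 2)
```

The two sums differ by exactly half the sum of squared increments, so their gap tends to $T/2$ rather than to $0$ as the partition is refined.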

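In light of the above, we conclude that
$$f(W(T))-f(W(0))=\int_{0}^{T}f'(W(s))\;dW(s)+\frac{1}{2}\int_{0}^{T}f''(W(s))\;ds.\tag{8}$$
Compare this to (4), and we see that we obtain one, and only one, extra term $\frac{1}{2}\int_{0}^{T}f''(W(s))\;ds$, which can be traced back to the fact that $[W,W](T)=T$ and $[W]^{(k)}(T)=0$ for $k\geq3.$  This is often recast in differential notation (which again, is mathematically meaningless)
$$df(W(t))=f'(W(t))\;dW(t)+\frac{1}{2}f''(W(t))\;dt.\tag{9}$$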

The mathematically meaningful form is (8), though (9) is used more often for calculations since it is accompanied by what is known as a "box" calculus that facilitates computations.  This will be discussed in more detail below.


Let $\{W(t)\}_{t\geq0}$ be a standard Brownian motion with the natural filtration $\{\mathcal{F}_{t}\}_{t\geq0}$, and $f(x,t)\in\mathcal{C}^{2}(\mathbb{R}\times[0,T])$ jointly in $(x,t)$.  We will consider the stochastic process $\Delta(t)=f(W(t),t)$, which is clearly adapted to $\{\mathcal{F}_{t}\}_{t\geq0}$.

We take the following preliminary facts for granted, and defer to previous blog posts covering Brownian motion and stochastic integration for proofs.
  1. Almost surely, we have the variation formulas $[W]^{1}(t)=+\infty,[W]^{2}(t)=t$ and $[W]^{k}(t)=0$ for $k\geq3$.
  2. Almost surely, we have the convergence of $\lim_{||\Pi_{[0,T]}||\to0}\sum_{i=0}^{n-1}\Delta(t_{i})(W(t_{i+1})-W(t_{i}))$ for any continuous and adapted process $\Delta(t)$.  We denote this limit by $\int_{0}^{T}\Delta(t)\;dW(t)$ and refer to it as the Ito integral of $\Delta$.  The limit is taken in $L^{2}(\Omega).$
Theorem (Ito's Lemma).  With the notation above, we have for all $T>0$ $$\begin{align*}f(W(T),T)-f(W(0),0)=\\\int_{0}^{T}f_{t}(W(t),t)\;dt+\int_{0}^{T}f_{x}(W(t),t)\;dW(t)+\frac{1}{2}\int_{0}^{T}f_{xx}(W(t),t)\;dt.\end{align*}$$  We sometimes write for $f=f(W(t),t)$ $$df=f_{t}dt+f_{x}dW+\frac{1}{2}f_{xx}dt.$$
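Before turning to the proof, the statement can be sanity-checked on a simulated path.  The sketch below makes several assumptions of our own: the test function $f(x,t)=tx^{2}$ (so $f_{t}=x^{2}$, $f_{x}=2tx$, $f_{xx}=2t$), a uniform partition, and left-point sums in place of the three integrals.  The name `ito_check` is hypothetical.

```python
import math
import random


def ito_check(n_steps, horizon, rng):
    """Compare both sides of Ito's lemma for f(x, t) = t * x**2 on one path."""
    dt = horizon / n_steps
    w, t = 0.0, 0.0
    time_integral = 0.0   # sum of f_t(W, t) dt,       with f_t  = x**2
    ito_integral = 0.0    # sum of f_x(W, t) dW,       with f_x  = 2 t x
    correction = 0.0      # sum of 0.5 f_xx(W, t) dt,  with f_xx = 2 t
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))
        # All integrands are evaluated at the left endpoint t_i.
        time_integral += w * w * dt
        ito_integral += 2.0 * t * w * dw
        correction += 0.5 * 2.0 * t * dt
        w += dw
        t += dt
    lhs = horizon * w * w  # f(W(T), T) - f(W(0), 0), since f(0, 0) = 0
    rhs = time_integral + ito_integral + correction
    return lhs, rhs, correction


rng = random.Random(7)  # arbitrary seed
lhs, rhs, correction = ito_check(20_000, 1.0, rng)
print("f(W(T),T) - f(W(0),0):", lhs)
print("right-hand side      :", rhs)
print("without f_xx term    :", rhs - correction)
```

On a fine partition the first two printed numbers should nearly agree, while dropping the $\frac{1}{2}f_{xx}$ term leaves a gap of roughly $T^{2}/2$ for this choice of $f$.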

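Proof.  Fix $T>0$ and let $\Pi=\{t_{0}=0,t_{1},\ldots,t_{n}=T\}$ be a partition of $[0,T]$ and compute using Taylor's expansion
$$\begin{align*}f(W(T),T)-f(W(0),0)&=\sum_{i=0}^{n-1}\left(f(W(t_{i+1}),t_{i+1})-f(W(t_{i}),t_{i})\right)\\&=\sum_{i=0}^{n-1}f_{t}(W(t_{i}),t_{i})(t_{i+1}-t_{i})+\sum_{i=0}^{n-1}f_{x}(W(t_{i}),t_{i})(W(t_{i+1})-W(t_{i}))\\&\quad+\frac{1}{2}\sum_{i=0}^{n-1}f_{xx}(W(t_{i}),t_{i})(W(t_{i+1})-W(t_{i}))^{2}+\sum_{i=0}^{n-1}O\left((t_{i+1}-t_{i})|W(t_{i+1})-W(t_{i})|\right)\\&\quad+\sum_{i=0}^{n-1}O\left((t_{i+1}-t_{i})^{2}\right)+\sum_{i=0}^{n-1}O\left(|W(t_{i+1})-W(t_{i})|^{3}\right)\\&:= A+B+C+D+E+F.\end{align*}$$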

The left hand side is unaffected by taking limits as $||\Pi||\to0$, and so we may do so in computing the right hand side terms.  Without loss of generality we assume $\Pi$ is uniform, so we consider equivalently $n\to\infty.$

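The regularity of $f$ implies that
$$A\to\int_{0}^{T}f_{t}(W(t),t)\;dt\;\text{as}\;n\to\infty,$$
the integral being an ordinary Lebesgue (Riemann) integral.  By item 2 above we have
$$B\to\int_{0}^{T}f_{x}(W(t),t)\;dW(t)\;\text{as}\;n\to\infty,$$
the integral being an Ito integral as defined in item 2.  To deal with $D$, $E$ and $F$ we estimate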
$$|D|\ll_{\beta}\sup_{0\leq i\leq n-1}|W(t_{i+1})-W(t_{i})|\sum_{i=0}^{n-1}(t_{i+1}-t_{i})\ll_{\beta}T\sup_{0\leq i\leq n-1}|W(t_{i+1})-W(t_{i})|,$$
$$|E|\ll_{\beta}\sup_{0\leq i\leq n-1}|t_{i+1}-t_{i}|\sum_{i=0}^{n-1}(t_{i+1}-t_{i})\ll_{\beta}T\sup_{0\leq i\leq n-1}|t_{i+1}-t_{i}|,$$
$$|F|\ll_{\beta}\sup_{0\leq i\leq n-1}|W(t_{i+1})-W(t_{i})|\sum_{i=0}^{n-1}(W(t_{i+1})-W(t_{i}))^{2}.$$
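Appealing to item 1 above we then conclude (since the maps $t\mapsto t$ and $t\mapsto W(t)$ are continuous, hence uniformly continuous on $[0,T]$) that
$$D,E,F\to0\;\text{as}\;n\to\infty.$$
It remains to establish the limit
$$C\to\frac{1}{2}\int_{0}^{T}f_{xx}(W(t),t)\;dt\;\text{as}\;n\to\infty.$$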
Intuitively this should be true since $[W]^{2}(T)=T,$ a fact that we sometimes write as $dWdW=dt.$  However, a rigorous proof requires some effort, and this is precisely the point in the proof (assuming Brownian motion and stochastic integration are covered) that almost every mathematical finance text skips over.  (Note that the theorem has already been proved in the special case that $f=p(x,t)$, a second degree polynomial; as an example, consider the special case $f(x,t)=\frac{1}{2}x^{2}$ in order to compute the Ito integral $\int_{0}^{T}W(t)\;dW(t)$).
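
As a sketch of the special case just mentioned, take $f(x,t)=\frac{1}{2}x^{2}$, so that $f_{t}=0$, $f_{x}=x$ and $f_{xx}=1$.  Since $W(0)=0$, the theorem then reads
$$\frac{1}{2}W(T)^{2}=\int_{0}^{T}W(t)\;dW(t)+\frac{1}{2}T,\quad\text{i.e.}\quad\int_{0}^{T}W(t)\;dW(t)=\frac{1}{2}W(T)^{2}-\frac{1}{2}T.$$
The same value can be read off directly from the left-point sums, using the identity $a(b-a)=\frac{1}{2}(b^{2}-a^{2})-\frac{1}{2}(b-a)^{2}$ and telescoping:
$$\sum_{i=0}^{n-1}W(t_{i})(W(t_{i+1})-W(t_{i}))=\frac{1}{2}W(T)^{2}-\frac{1}{2}\sum_{i=0}^{n-1}(W(t_{i+1})-W(t_{i}))^{2}\to\frac{1}{2}W(T)^{2}-\frac{1}{2}T,$$
where the limit uses only $[W]^{2}(T)=T$ from item 1.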

Because this fact is of interest in and of itself, we isolate the proof that $C\to\frac{1}{2}\int_{0}^{T}f_{xx}(W(t),t)\;dt\;\text{as}\;n\to\infty$ in the following lemma.

Lemma.  Let $f$ be a bounded continuous function on $\mathbb{R}$ and $\{W(t)\}_{t \geq 0}$ a standard one-dimensional Brownian motion. Then almost surely $$\sum_{i=0}^{n-1} f(W(t_{i}))(W(t_{i+1})-W(t_{i}))^{2}\to\int_{0}^{T}f(W(t))\;dt\;\text{as}\;n\to\infty$$ where $n\to\infty$ means (WLOG) $\Pi=\{t_{0}=0,t_{1},\ldots,t_{n}=T\}$ is a uniform partition of $[0,T]$ and $|\Pi| := \max_j |t_j-t_{j-1}|\to0$.

Proof.  Since $t \mapsto f(W(t))$ is (almost surely) continuous, $$\sum_{i=0}^{n-1} f(W(t_{i}))(t_{i+1}-t_{i}) \to \int_0^T f(W(t))\;dt\;\text{as}\;n\to\infty.$$
Therefore, it suffices to show
$$I_n := \sum_{i=0}^{n-1} f(W(t_{i})) \bigg[ (W(t_{i+1})-W(t_{i}))^2 - (t_{i+1}-t_{i}) \bigg] \to 0\;\text{as}\;n\to\infty.$$

At this point it is convenient to define $\Delta_{i} := t_{i+1}-t_{i}$ and $\Delta W_i := W(t_{i+1})-W(t_{i})$.  Recalling that $\{W(t)^2-t\}_{t \geq 0}$ is a martingale with respect to the canonical filtration $(\mathcal{F}_t)_{t \geq 0}$, we compute for $j<i$ (so that $f(W(t_{i}))$, $f(W(t_{j}))$ and $\Delta W_j$ are all $\mathcal{F}_{t_{i}}$-measurable)

$$\begin{align*} &\quad \mathbb{E} \bigg( f(W(t_{i})) f(W(t_{j}))[\Delta W_j^2 - \Delta_j][\Delta W_i^2-\Delta_i]\bigg)\\ &= \mathbb{E} \bigg( \mathbb{E} \bigg( f(W(t_{i})) f(W(t_{j}))  [\Delta W_j^2 - \Delta_j]  [\Delta W_i^2-\Delta_i] \mid \mathcal{F}_{t_{i}} \bigg) \bigg) \\ &= \mathbb{E} \bigg( f(W(t_{i})) f(W(t_{j}))  [\Delta W_j^2-\Delta_j]  \underbrace{\mathbb{E} \bigg( \Delta W_i^2 - \Delta_i \mid \mathcal{F}_{t_{i}} \bigg)}_{=\mathbb{E}(\Delta W_i^2-\Delta_i)=0} \bigg) = 0, \end{align*}$$

and thus

$$\mathbb{E}(I_n^2) = \mathbb{E}\left(\sum_{i=0}^{n-1} f(W(t_{i}))^2 (\Delta W_i^2-\Delta_i)^2 \right).$$

(Observe that the cross-terms vanish.)  Using that $f$ is bounded and $W(t)-W(s) \sim W(t-s) \sim \sqrt{t-s} W(1)$ we find

$$\begin{align*} \mathbb{E}(I_n^2) &\leq \|f\|_{\infty}^2 \sum_{i=0}^{n-1} \mathbb{E}\bigg[(\Delta W_i^2-\Delta_i)^2\bigg] \\ &= \|f\|_{\infty}^2 \sum_{i=0}^{n-1} \Delta_i^2\;  \mathbb{E}\bigg[(W(1)^2-1)^2\bigg] \\ &\leq C |\Pi| \sum_{i=0}^{n-1} \Delta_i = C |\Pi| T \end{align*}$$

for $C := \|f\|_{\infty}^2\; \mathbb{E}[(W(1)^2-1)^2] = 2\|f\|_{\infty}^2$. Letting $|\Pi| \to 0$, we get $I_n\to0$ in $L^{2}(\Omega)$, and the claim follows.
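
The decay of $\mathbb{E}(I_n^2)$ can be checked by simulation.  This is a rough Monte Carlo sketch under our own assumptions: $f=\cos$ as the bounded continuous function (so $\|f\|_{\infty}=1$), a uniform partition of $[0,T]$ with $T=1$, and a few hundred simulated paths.  The helper names `sample_I_n` and `mean_square_I_n` are hypothetical.

```python
import math
import random


def sample_I_n(n_steps, horizon, rng, f=math.cos):
    """One sample of I_n = sum_i f(W(t_i)) * ((Delta W_i)**2 - Delta_i)."""
    dt = horizon / n_steps
    w = 0.0
    total = 0.0
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))
        total += f(w) * (dw * dw - dt)  # f is evaluated at the left endpoint
        w += dw
    return total


def mean_square_I_n(n_steps, horizon, n_paths, rng):
    """Monte Carlo estimate of E(I_n**2) from n_paths independent paths."""
    return sum(sample_I_n(n_steps, horizon, rng) ** 2
               for _ in range(n_paths)) / n_paths


rng = random.Random(91)  # arbitrary seed
T = 1.0
for n in (10, 100, 1000):
    estimate = mean_square_I_n(n, T, 200, rng)
    bound = 2.0 * T * T / n  # C * |Pi| * T with C = 2 * ||f||^2 and |Pi| = T / n
    print(n, estimate, bound)
```

The estimates should shrink roughly like $1/n$ and stay below the printed bound, up to Monte Carlo noise.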


We assume the reader is familiar with the various modes of convergence in real analysis: pointwise, uniform, almost uniform, in measure/probability, $L^{p}$, etc.  This short section is just to help clarify what is meant by almost sure convergence in the context of this and related topics.

Statements of convergence involving Brownian motion are almost always established in $L^{2}(\Omega,P)$, which in turn implies convergence in probability because Chebyshev's inequality states for a sequence of random variables $X_{n}$ and proposed limit $X$ that
$$P(|X_{n}-X|\geq\epsilon)\leq\frac{1}{\epsilon^{2}}\mathbb{E}\left[|X_{n}-X|^{2}\right]\to0\;\text{as}\;n\to\infty\;\text{for all}\;\epsilon>0\;\text{fixed}.$$

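For example, in the proof of Ito's lemma we really proved that $$\lim_{n\to\infty}\sum_{i=0}^{n-1}f(W(t_{i}),t_{i})(W(t_{i+1})-W(t_{i}))^{2}=\int_{0}^{T}f(W(t),t)\;dt$$ in $L^{2}(\Omega)$, and by consequence, almost surely along a suitable subsequence of partitions.  To clarify, this means that for almost every sample path, or outcome $\omega\in\Omega$, we have
$$\sum_{i=0}^{n-1}f(W(t_{i},\omega),t_{i})(W(t_{i+1},\omega)-W(t_{i},\omega))^{2}\to\int_{0}^{T}f(W(t,\omega),t)\;dt$$
as $n\to\infty$ along that subsequence.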

The case is similar to proving things like almost surely $[W,W](t)=t$ and almost surely $\int f(t)\;dW(t)$ exists in the Ito sense.
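
One way to pass from the $L^{2}$ estimate to an almost sure statement is sketched below, under the added assumption (not made in the lemma) that the partitions are dyadic, $\Pi_{n}=\{jT2^{-n}:0\leq j\leq2^{n}\}$, so that $|\Pi_{n}|=T2^{-n}$.  Writing $I_{n}$ for the sum from the proof of the lemma taken over $\Pi_{n}$, the bound $\mathbb{E}(I_{n}^{2})\leq C|\Pi_{n}|T$ and Chebyshev's inequality give, for each fixed $\epsilon>0$,
$$\sum_{n=1}^{\infty}P(|I_{n}|\geq\epsilon)\leq\frac{1}{\epsilon^{2}}\sum_{n=1}^{\infty}\mathbb{E}(I_{n}^{2})\leq\frac{CT^{2}}{\epsilon^{2}}\sum_{n=1}^{\infty}2^{-n}<\infty.$$
By the Borel-Cantelli lemma, almost surely $|I_{n}|\geq\epsilon$ for only finitely many $n$.  Applying this with $\epsilon=1/m$ for $m=1,2,\ldots$ and intersecting the resulting events of probability one gives $I_{n}\to0$ almost surely.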