11 January, 2019

Relating Random Variables and their Statistics to Inner Products

Let us consider a typical real-valued random variable $Y$ defined on a probability space $(\Omega,\mathcal{F},\mathbb{P})$.  Let $X$ be another such random variable, and assume $X,Y\in L^{2}(\mathcal{F})$.  We emphasize the $\sigma$-algebra $\mathcal{F}$ because this will be the structural element of our probability space subject to frequent change; in particular, the measure $\mathbb{P}$ and the sample space $\Omega$ will remain fixed (we ignore the issues that arise with this assumption when additional random variables are added to our universe, and implicitly assume $\Omega$ and $\mathbb{P}$ have been extended in a way that is consistent with all other random variables already in play).

The space $L^{2}(\mathcal{F})$ is a Hilbert space with inner product
$$(X,Y)=\mathbb{E}[XY]=\int_{\Omega}X(\omega)Y(\omega)\;d\mathbb{P}(\omega).$$ The integral defining this inner product can be calculated as a Lebesgue integral over the range of the random vector $(X,Y)$ (in particular, an integral over $\mathbb{R}^{2}$) in the usual way, by changing variables and using the push-forward (joint distribution) measure of $(X,Y)$ (and its density if the joint distribution is absolutely continuous with respect to Lebesgue measure).
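As a quick numerical illustration (a minimal Monte Carlo sketch in NumPy; the particular joint Gaussian below is an arbitrary stand-in for $(X,Y)$, not anything canonical), the inner product can be approximated by the empirical mean of $X(\omega)Y(\omega)$ over sampled outcomes:

```python
import numpy as np

rng = np.random.default_rng(0)

# An arbitrary correlated Gaussian pair standing in for (X, Y).
mean = [1.0, 2.0]
cov = [[1.0, 0.5],
       [0.5, 2.0]]
X, Y = rng.multivariate_normal(mean, cov, size=1_000_000).T

# Monte Carlo approximation of the inner product (X, Y) = E[XY].
# For this pair, E[XY] = Cov(X, Y) + E[X]E[Y] = 0.5 + 1.0 * 2.0 = 2.5.
print(np.mean(X * Y))  # ~2.5
```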

This inner product can be related to many of the common statistical measures of $X$ and $Y$.  Let us use the notations
$$\left\{\begin{array}{l}
\mu_{X}=\mathbb{E}(X),\quad\mu_{Y}=\mathbb{E}(Y)\\
\sigma_{XY}=\mathbb{Cov}(X,Y)=\mathbb{E}[(X-\mu_{X})(Y-\mu_{Y})]\\
\sigma^{2}_{X}=\mathbb{Var}(X)=\mathbb{Cov}(X,X)=\mathbb{E}[(X-\mu_{X})^{2}]\end{array}\right.$$  Then in terms of the inner product we have
$$\left\{\begin{array}{l}
\sigma_{X}^{2}=(X-\mu_{X},X-\mu_{X})=||X-\mu_{X}||_{2}^{2}\\
\sigma_{XY}=(X-\mu_{X},Y-\mu_{Y})
\end{array}\right.$$
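These identities are easy to check numerically.  A minimal NumPy sketch (the joint distribution is again an arbitrary choice) comparing the centered inner products against NumPy's built-in estimators:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000
X = rng.normal(1.0, 1.0, n)
Y = 0.5 * X + rng.normal(2.0, 1.0, n)  # correlated with X; Cov(X, Y) = 0.5

# sigma_X^2 = (X - mu_X, X - mu_X) and sigma_XY = (X - mu_X, Y - mu_Y),
# with the inner product estimated by an empirical mean.
var_inner = np.mean((X - X.mean()) ** 2)
cov_inner = np.mean((X - X.mean()) * (Y - Y.mean()))

print(var_inner, np.var(X))                   # both ~1.0
print(cov_inner, np.cov(X, Y, ddof=0)[0, 1])  # both ~0.5
```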
This correspondence suggests that covariance between random variables is akin to an orthogonal projection.  The only complication is the centering about the means $\mu$; if our random variables are centered (mean zero, as for variables symmetrically distributed about the origin), then there is no problem.  But in general this is not the case; moreover, we cannot simply subtract off the means and retain a single consistent inner product, since $\mu_{X}\neq\mu_{Y}$ in general (each variable is shifted by a different constant).  However, a very useful identity will partially resolve this, namely
$$(Y,X-\mu_{X})=(Y-\mu_{Y},X)=(Y-\mu_{Y},X-\mu_{X})=\mathbb{Cov}(X,Y).$$
Indeed, using the fact that $(1_{\Omega},X)=\mathbb{E}X=\mu_{X}$ and the linearity of the inner product in each argument, we have
$$(Y-\mu_{Y},X-\mu_{X})=(X,Y)-\mu_{X}(Y, 1_{\Omega})-\mu_{Y}(1_{\Omega},X)+\mu_{X}\mu_{Y}(1_{\Omega},1_{\Omega})=(X,Y)-\mu_{X}\mu_{Y},$$
yet
$$(Y,X-\mu_{X})=(X,Y)-\mu_{X}(Y,1_{\Omega})=(X,Y)-\mu_{X}\mu_{Y},$$ and the claim follows by symmetry.  Thus, we can define in the usual way the orthogonal projection of $Y$ in the direction of $X-\mu_{X}$ by
$$(\mathbb{proj}_{X-\mu_{X}}Y)(\omega)=\frac{(Y,X-\mu_{X})}{||X-\mu_{X}||_{2}^{2}}(X(\omega)-\mu_{X})=\frac{\sigma_{XY}}{\sigma_{X}^{2}}(X(\omega)-\mu_{X})=\frac{\mathbb{Cov}(X,Y)}{\mathbb{Var}(X)}(X(\omega)-\mu_{X}).$$ One might recognize the coefficient $\sigma_{XY}/\sigma_{X}^{2}$ as the minimum-variance hedge ratio of $Y$ against $X$ (i.e., the quantity $h$ such that the random variable (portfolio) $P:=Y-hX$ has minimal variance).  With the above in mind, we make some clarifying definitions:
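To see why this coefficient minimizes the portfolio variance, expand $\mathbb{Var}(Y-hX)$ as a quadratic in $h$ and set its derivative to zero:
$$\mathbb{Var}(Y-hX)=\sigma_{Y}^{2}-2h\sigma_{XY}+h^{2}\sigma_{X}^{2},\qquad\frac{d}{dh}\mathbb{Var}(Y-hX)=-2\sigma_{XY}+2h\sigma_{X}^{2}=0\iff h=\frac{\sigma_{XY}}{\sigma_{X}^{2}}.$$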


Definition.  Let $X_{1},\ldots,X_{n}$ be a collection of random variables in $L^{2}(\mathcal{F})$.  Then we say $\{X_{n}\}_{n}$ is:
  1. Linearly independent if $\alpha_{1}X_{1}(\omega)+\ldots+\alpha_{n}X_{n}(\omega)=0$ for all $\omega\in\Omega$ implies $\alpha_{j}=0$ for $1\leq j\leq n$. Otherwise, $\{X_{n}\}_{n}$ is linearly dependent.
  2. Pairwise orthogonal if $(X_{i},X_{j})=0$ for all $i\neq j$ and $1\leq i,j\leq n$.
  3. Pairwise uncorrelated if $(X_{i}-\mu_{i},X_{j}-\mu_{j})=0$ for $i\neq j$ and $1\leq i,j\leq n$. 
Note again that (3) says $\mathbb{Cov}(X_{i},X_{j})=0$, where $\mathbb{Cov}(X_{i},X_{j})=(X_{i},X_{j}-\mu_{j})=(X_{i}-\mu_{i},X_{j})=(X_{i},X_{j})-\mu_{i}\mu_{j}.$ Consequently, (2) and (3) are equivalent if either $\mu_{i}=0$ or $\mu_{j}=0$, and mutually exclusive if $\mu_{i}\neq0$ and $\mu_{j}\neq0$. Also note that (2) implies (1) (provided no $X_{i}$ is identically zero), but not conversely.
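For example, if $X_{1}$ and $X_{2}$ are independent with $\mu_{1}=1$ and $\mu_{2}=2$, then
$$\mathbb{Cov}(X_{1},X_{2})=0\quad\text{but}\quad(X_{1},X_{2})=\mu_{1}\mu_{2}=2\neq0,$$
so the pair is uncorrelated but not orthogonal.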

We are now ready for the main business of this post.  Let $Y\in L^{2}(\mathcal{F})$, let $\mathcal{G}\subset\mathcal{F}$ be a sub-$\sigma$-algebra, and let $\{X_{n}\}_{n}$ be a collection of $\mathcal{G}$-measurable random variables.  We adjoin the constant variable $X_{0}:=1_{\Omega}$ and, after removing any linearly dependent $X_{i}$, set $\mathcal{S}:=\mathbb{span}(X_{0},X_{1},\ldots,X_{n})$.  We wish to estimate $Y$ from the variables $\{X_{n}\}_{n\geq0}$.  This can be done optimally (in the sense of minimal $L^{2}$ error) by orthogonally projecting $Y$ onto $\mathcal{S}$.  In order to carry out this procedure, we must first orthogonalize the collection $\{X_{n}\}_{n}$.  This can be done using the Gram-Schmidt procedure, which generates a new family $\{\hat{X}_{n}\}_{n}$ that is pairwise orthogonal and satisfies $\hat{\mathcal{S}}:=\mathbb{span}(\hat{X}_{0},\ldots,\hat{X}_{n})=\mathcal{S}$.  The procedure is very simple: begin with $\hat{X}_{0}=X_{0}$.  Then define $\hat{X}_{1}$ to be the difference between $X_{1}$ and the orthogonal projection of $X_{1}$ onto $\hat{X}_{0}$.  In other words, $\hat{X}_{1}$ is the error in estimating $X_{1}$ from $\hat{X}_{0}$.  The procedure is then repeated for $X_{2}$, where $\hat{X}_{2}$ is set equal to the difference between $X_{2}$ and its projection onto (the space spanned by) $\hat{X}_{0}$ and $\hat{X}_{1}$, and so on.
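The same procedure can be sketched numerically by representing each random variable as a vector of Monte Carlo samples and taking the empirical mean of products as the inner product (the helper names and example distributions below are my own choices for illustration):

```python
import numpy as np

def inner(u, v):
    """Empirical L^2 inner product (u, v) = E[uv] estimated from samples."""
    return np.mean(u * v)

def gram_schmidt(xs):
    """Orthogonalize sample vectors under the empirical inner product,
    returning hat{X}_0, hat{X}_1, ... spanning the same space."""
    ortho = []
    for x in xs:
        x_hat = x.astype(float).copy()
        for e in ortho:
            # Subtract the projection onto each previously built hat{X}_k
            # (modified Gram-Schmidt, for numerical stability).
            x_hat -= (inner(x_hat, e) / inner(e, e)) * e
        ortho.append(x_hat)
    return ortho

rng = np.random.default_rng(2)
n = 1_000_000
X0 = np.ones(n)                       # X_0 = 1_Omega, the constant variable
X1 = rng.normal(1.0, 1.0, n)
X2 = 0.3 * X1 + rng.normal(2.0, 1.0, n)

H0, H1, H2 = gram_schmidt([X0, X1, X2])
print(inner(H0, H1), inner(H0, H2), inner(H1, H2))  # all ~0
```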

Let us carry out this procedure for the case $n=3$.  We note again that the projection of $Y$ in the direction of $X$ is defined as
$$\mathbb{proj}_{X}Y=\frac{(X,Y)}{(X,X)}X.$$
Note that this means
$$\mathbb{proj}_{X-\mu_{X}}Y=\frac{\mathbb{Cov}(X,Y)}{\mathbb{Var}(X)}(X-\mu_{X})=\frac{\sigma_{XY}}{\sigma^{2}_{X}}(X-\mu_{X}).$$

We have $$\hat{X}_{0}=1_{\Omega};$$
$$\begin{align*}
\hat{X}_{1}
&=X_{1}-\mathbb{proj}_{\hat{X}_{0}}X_{1}
\\&=X_{1}-\frac{\left(X_{1},1_{\Omega}\right)}{||1_{\Omega}||_{2}^{2}}1_{\Omega}
\\&=X_{1}-\mu_{1}
;\end{align*}$$
$$\begin{align*}
\hat{X}_{2}
&=X_{2}-\left(\mathbb{proj}_{\hat{X}_{0}}+\mathbb{proj}_{\hat{X}_{1}}\right)X_{2}
\\&=X_{2}-\frac{\left(X_{2}, 1_{\Omega}\right)}{||1_{\Omega}||_{2}^{2}}1_{\Omega}-\frac{\left(X_{2},X_{1}-\mu_{1}\right)}{||X_{1}-\mu_{1}||_{2}^{2}}\left(X_{1}-\mu_{1}\right)
\\&=\left(X_{2}-\mu_{2}\right)-\frac{\sigma_{12}}{\sigma^{2}_{1}}\left(X_{1}-\mu_{1}\right)
\\&=X_{2}-\frac{\sigma_{12}}{\sigma^{2}_{1}}X_{1}-\left(\mu_{2}-\frac{\sigma_{12}}{\sigma^{2}_{1}}\mu_{1}\right)
;\end{align*}$$

$$\begin{align*}
\hat{X}_{3}
&=X_{3}-\left(\mathbb{proj}_{\hat{X}_{0}}+\mathbb{proj}_{\hat{X}_{1}}+\mathbb{proj}_{\hat{X}_{2}}\right)X_{3}
\\&=X_{3}-\frac{\left(X_{3}, 1_{\Omega}\right)}{||1_{\Omega}||_{2}^{2}}1_{\Omega}-\frac{\left(X_{3},X_{1}-\mu_{1}\right)}{||X_{1}-\mu_{1}||_{2}^{2}}\left(X_{1}-\mu_{1}\right)-\frac{\left(X_{3},\left(X_{2}-\mu_{2}\right)-\frac{\sigma_{12}}{\sigma^{2}_{1}}\left(X_{1}-\mu_{1}\right)\right)}{||\left(X_{2}-\mu_{2}\right)-\frac{\sigma_{12}}{\sigma^{2}_{1}}\left(X_{1}-\mu_{1}\right)||^{2}_{2}}\left(\left(X_{2}-\mu_{2}\right)-\frac{\sigma_{12}}{\sigma^{2}_{1}}\left(X_{1}-\mu_{1}\right)\right)
\\&=\left(X_{3}-\mu_{3}\right)-\frac{\sigma_{13}}{\sigma^{2}_{1}}\left(X_{1}-\mu_{1}\right)-\frac{\sigma_{23}-\frac{\sigma_{12}}{\sigma_{1}^{2}}\sigma_{13}}{\sigma_{2}^{2}-2\frac{\sigma_{12}}{\sigma_{1}^{2}}\sigma_{12}+\frac{\sigma_{12}^{2}}{\sigma_{1}^{4}}\sigma_{1}^{2}}\left(\left(X_{2}-\mu_{2}\right)-\frac{\sigma_{12}}{\sigma^{2}_{1}}\left(X_{1}-\mu_{1}\right)\right)
\\&=\left(X_{3}-\mu_{3}\right)-\frac{\sigma_{13}}{\sigma^{2}_{1}}\left(X_{1}-\mu_{1}\right)-\frac{\sigma_{1}^{2}\sigma_{23}-\sigma_{12}\sigma_{13}}{\sigma_{1}^{2}\sigma_{2}^{2}-\sigma_{12}^{2}}\left(\left(X_{2}-\mu_{2}\right)-\frac{\sigma_{12}}{\sigma^{2}_{1}}\left(X_{1}-\mu_{1}\right)\right)
\\&=\left(X_{3}-\mu_{3}\right)-\left(\frac{\sigma_{13}}{\sigma^{2}_{1}}-\frac{\sigma_{12}}{\sigma_{1}^{2}}\frac{\sigma_{1}^{2}\sigma_{23}-\sigma_{12}\sigma_{13}}{\sigma_{1}^{2}\sigma_{2}^{2}-\sigma_{12}^{2}}\right)\left(X_{1}-\mu_{1}\right)-\left(\frac{\sigma_{1}^{2}\sigma_{23}-\sigma_{12}\sigma_{13}}{\sigma_{1}^{2}\sigma_{2}^{2}-\sigma_{12}^{2}}\right)\left(X_{2}-\mu_{2}\right)
\\&=X_{3}-\left(\frac{\sigma_{1}^{2}\sigma_{23}-\sigma_{12}\sigma_{13}}{\sigma_{1}^{2}\sigma_{2}^{2}-\sigma_{12}^{2}}\right)X_{2}-\left(\frac{\sigma_{13}}{\sigma^{2}_{1}}-\frac{\sigma_{12}}{\sigma_{1}^{2}}\frac{\sigma_{1}^{2}\sigma_{23}-\sigma_{12}\sigma_{13}}{\sigma_{1}^{2}\sigma_{2}^{2}-\sigma_{12}^{2}}\right)X_{1}-\left(\mu_{3}-\left(\frac{\sigma_{1}^{2}\sigma_{23}-\sigma_{12}\sigma_{13}}{\sigma_{1}^{2}\sigma_{2}^{2}-\sigma_{12}^{2}}\right)\mu_{2}-\left(\frac{\sigma_{13}}{\sigma^{2}_{1}}-\frac{\sigma_{12}}{\sigma_{1}^{2}}\frac{\sigma_{1}^{2}\sigma_{23}-\sigma_{12}\sigma_{13}}{\sigma_{1}^{2}\sigma_{2}^{2}-\sigma_{12}^{2}}\right)\mu_{1}\right)
;\end{align*}$$
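As a sanity check on the closed forms above (same empirical-inner-product convention as before; the example distributions are arbitrary), one can build $\hat{X}_{2}$ from sample statistics and confirm it is orthogonal to $\hat{X}_{0}=1_{\Omega}$ and $\hat{X}_{1}=X_{1}-\mu_{1}$:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1_000_000
X1 = rng.normal(1.0, 1.0, n)
X2 = 0.3 * X1 + rng.normal(2.0, 1.0, n)

m1, m2 = X1.mean(), X2.mean()
s11 = np.mean((X1 - m1) ** 2)          # sigma_1^2
s12 = np.mean((X1 - m1) * (X2 - m2))   # sigma_12

# Closed-form hat{X}_2 = (X_2 - mu_2) - (sigma_12 / sigma_1^2)(X_1 - mu_1).
X2_hat = (X2 - m2) - (s12 / s11) * (X1 - m1)

# Orthogonality to hat{X}_0 and hat{X}_1 (exact here up to floating point,
# since the empirical statistics are used in the construction).
print(np.mean(X2_hat))              # ~0
print(np.mean(X2_hat * (X1 - m1)))  # ~0
```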

Let $\mathcal{G}:=\sigma(X)\subset\mathcal{F}$ be the sub-$\sigma$-algebra of $\mathcal{F}$ generated by $X$ (i.e., the $\sigma$-algebra generated by $\{X^{-1}(B)\}_{B\in\mathcal{B}(\mathbb{R})}$).