
11 January, 2019

Relating Random Variables and their Statistics to Inner Products

Let us consider a typical real-valued random variable Y defined on a probability space (\Omega,\mathcal{F},\mathbb{P}).  Let X be another such random variable, and assume X,Y\in L^{2}(\mathcal{F}).  We emphasize the \sigma-algebra \mathcal{F} because this will be the structural element of our probability space subject to frequent change; in particular, the measure and sample space \Omega will remain constant (we ignore the issues that arise with this assumption when additional random variables are added to our universe, and implicitly assume \Omega and \mathbb{P} have been properly extended in a way that is consistent with all other previous random variables in play).

The space L^{2}(\mathcal{F}) is a Hilbert space with inner product
(X,Y)=\mathbb{E}[XY]=\int_{\Omega}X(\omega)Y(\omega)\;d\mathbb{P}(\omega).
The integral defining this inner product can be calculated as a Lebesgue integral on the range of the random vector (X,Y) (in particular, an integral over \mathbb{R}^{2}) in the usual way by changing variables and using the appropriate push-forward (distribution) measures of X and Y (and densities if the distributions are absolutely continuous with respect to Lebesgue measure).

This inner product can be related to many of the common statistical measures of X and Y.  Let us use the notations
\left\{\begin{array}{l} \mu_{X}=\mathbb{E}(X)\\ \sigma_{XY}=\mathbb{Cov}(X,Y)=\mathbb{E}[(X-\mu_{X})(Y-\mu_{Y})]\\ \sigma^{2}_{X}=\mathbb{Var}(X)=\mathbb{Cov}(X,X)=\mathbb{E}[(X-\mu_{X})^{2}]\end{array}\right.
 Then in terms of the inner product we have
\left\{\begin{array}{l} \sigma_{X}^{2}=(X-\mu_{X},X-\mu_{X})=||X-\mu_{X}||_{2}^{2}\\ \sigma_{XY}=(X-\mu_{X},Y-\mu_{Y}) \end{array}\right.
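As a quick numerical sanity check (a minimal sketch with simulated draws; the variable names, sample size, and distributions are illustrative and not from the post), we can approximate the inner product by a sample mean and confirm that \sigma_{XY}=(X-\mu_{X},Y-\mu_{Y})=(X,Y)-\mu_{X}\mu_{Y}:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Two correlated random variables with nonzero means (illustrative choices).
X = 2.0 + rng.normal(size=n)
Y = 1.0 + 0.5 * X + rng.normal(size=n)

def inner(U, V):
    """Sample-mean approximation of the inner product (U, V) = E[U V]."""
    return np.mean(U * V)

mu_X, mu_Y = X.mean(), Y.mean()

# Covariance as a centered inner product ...
cov_via_inner = inner(X - mu_X, Y - mu_Y)
# ... agrees (up to floating-point error) with (X, Y) - mu_X mu_Y and np.cov.
print(cov_via_inner, inner(X, Y) - mu_X * mu_Y, np.cov(X, Y, ddof=0)[0, 1])
```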

This correspondence suggests that covariance between random variables is akin to an orthogonal projection.  The only complication is the centering about the means \mu; if our random variables have mean zero (for instance, if they are distributed symmetrically about the origin), then there is no problem.  But in general this is not the case; moreover, we cannot simply subtract off the means and maintain a consistent definition for the inner product, since \mu_{X}\neq\mu_{Y} in general.  However, a very useful identity will partially resolve this, namely
(Y,X-\mu_{X})=(Y-\mu_{Y},X)=(Y-\mu_{Y},X-\mu_{X})=\mathbb{Cov}(X,Y).

Indeed, using the fact that (1_{\Omega},X)=\mathbb{E}X=\mu_{X} and the linearity of the inner product in each argument, we have
(Y-\mu_{Y},X-\mu_{X})=(X,Y)-\mu_{X}(Y, 1_{\Omega})-\mu_{Y}(1_{\Omega},X)+\mu_{X}\mu_{Y}(1_{\Omega},1_{\Omega})=(X,Y)-\mu_{X}\mu_{Y},

yet
(Y,X-\mu_{X})=(X,Y)-\mu_{X}(Y,1_{\Omega})=(X,Y)-\mu_{X}\mu_{Y},
and the claim follows by symmetry.  Thus, we can define in the usual way the orthogonal projection of Y in the direction of X-\mu_{X} by
(\mathbb{proj}_{X-\mu_{X}}Y)(\omega)=\frac{(Y,X-\mu_{X})}{||X-\mu_{X}||_{2}^{2}}(X(\omega)-\mu_{X})=\frac{\sigma_{XY}}{\sigma_{X}^{2}}(X(\omega)-\mu_{X})=\frac{\mathbb{Cov}(X,Y)}{\mathbb{Var}(X)}(X(\omega)-\mu_{X})
One might recognize the coefficient \sigma_{XY}/\sigma_{X}^{2} as the minimum variance hedge ratio of Y with respect to X (i.e., the quantity h such that the random variable (portfolio) P:=Y-hX has minimal variance).  With the above in mind, we make some clarifying definitions:
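To see the hedge-ratio interpretation concretely, here is a small sketch (simulated data; the coefficients and search grid are illustrative assumptions) comparing the projection coefficient \sigma_{XY}/\sigma_{X}^{2} with the h found by brute-force minimization of \mathbb{Var}(Y-hX):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000
X = rng.normal(2.0, 1.5, size=n)
Y = 0.8 * X + rng.normal(size=n)         # "true" hedge ratio is 0.8

sigma_X2 = X.var()                        # population (ddof=0) variance
sigma_XY = np.cov(X, Y, ddof=0)[0, 1]
h_proj = sigma_XY / sigma_X2              # projection / hedge-ratio coefficient

# Brute-force search over a grid for the h minimizing Var(Y - h X).
hs = np.linspace(0.0, 2.0, 401)
h_min = hs[np.argmin([np.var(Y - h * X) for h in hs])]

print(h_proj, h_min)                      # both close to 0.8
```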


Definition.  Let X_{1},\ldots,X_{n} be a collection of random variables in L^{2}(\mathcal{F}).  Then \{X_{n}\}_{n} are said to be:
  1. Linearly independent if \alpha_{1}X_{1}(\omega)+\ldots+\alpha_{n}X_{n}(\omega)=0 for all \omega\in\Omega implies \alpha_{j}=0 for 1\leq j\leq n. Otherwise, \{X_{n}\}_{n} are linearly dependent.
  2. Pairwise orthogonal if (X_{i},X_{j})=0 for all i\neq j and 1\leq i,j\leq n.
  3. Pairwise uncorrelated if (X_{i}-\mu_{i},X_{j}-\mu_{j})=0 for all i\neq j and 1\leq i,j\leq n.
Note again that (3) states that \mathbb{Cov}(X_{i},X_{j})=(X_{i},X_{j}-\mu_{j})=(X_{i}-\mu_{i},X_{j})=(X_{i},X_{j})-\mu_{i}\mu_{j}=0. Consequently, (2) and (3) are equivalent if either \mu_{i}=0 or \mu_{j}=0, and mutually exclusive if \mu_{i}\neq0 and \mu_{j}\neq0. Also note that (2) implies (1), but not conversely.
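For a concrete illustration of the distinction (a sketch using simulated independent variables with nonzero means; the numbers are illustrative), the pair below is uncorrelated in the sense of (3) but far from orthogonal in the sense of (2):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

# Independent (hence uncorrelated) variables with nonzero means mu_1 = 1, mu_2 = 3.
X1 = 1.0 + rng.normal(size=n)
X2 = 3.0 + rng.normal(size=n)

uncorrelated = np.cov(X1, X2, ddof=0)[0, 1]   # approx 0: condition (3) holds
orthogonal = np.mean(X1 * X2)                 # approx mu_1 * mu_2 = 3: condition (2) fails
print(uncorrelated, orthogonal)
```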

We are now ready for the main business of this post.  Let Y\in L^{2}(\mathcal{F}), let \mathcal{G}\subset\mathcal{F} be a sub-\sigma-algebra, and let \{X_{n}\}_{n} be a collection of \mathcal{G}-measurable random variables.  After removing any linearly dependent X_{i}, let \mathcal{S}:=\mathbb{span}(X_{0},X_{1},\ldots,X_{n}), where we take X_{0}(\omega)=1_{\Omega}(\omega) (i.e., the constant function).  We wish to estimate Y from the variables \{X_{n}\}_{n\geq0}.  This can be done optimally (in the sense of minimal L^{2} error) by orthogonally projecting Y onto \mathcal{S}.  In order to carry out this procedure, we must first orthogonalize the collection \{X_{n}\}_{n}.  This can be accomplished with the Gram-Schmidt procedure, which generates a new family \{\hat{X}_{n}\}_{n} that is pairwise orthogonal and spans the same subspace, \hat{\mathcal{S}}=\mathcal{S}.  The procedure is very simple: begin with \hat{X}_{0}=X_{0}.  Then define \hat{X}_{1} to be the difference between X_{1} and the orthogonal projection of X_{1} onto \hat{X}_{0}; in other words, \hat{X}_{1} is the error in estimating X_{1} from \hat{X}_{0}.  The procedure is then repeated for X_{2}, where \hat{X}_{2} is set equal to the difference between X_{2} and its projection onto (the space spanned by) \hat{X}_{0} and \hat{X}_{1}, and so on.
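As a minimal sketch of this procedure (simulated draws with illustrative names; the sample mean stands in for the expectation), we can treat each random variable as a vector of samples and orthogonalize directly:

```python
import numpy as np

def inner(U, V):
    """Sample-mean approximation of the inner product (U, V) = E[U V]."""
    return np.mean(U * V)

def gram_schmidt(variables):
    """Orthogonalize the sample vectors X_0, X_1, ... in the given order."""
    ortho = []
    for X in variables:
        X_hat = np.asarray(X, dtype=float)
        for E in ortho:
            X_hat = X_hat - (inner(X, E) / inner(E, E)) * E   # subtract projection onto E
        ortho.append(X_hat)
    return ortho

rng = np.random.default_rng(3)
n = 100_000
X0 = np.ones(n)                                    # the constant variable 1_Omega
X1 = 1.0 + rng.normal(size=n)
X2 = 2.0 + 0.7 * X1 + rng.normal(size=n)
X3 = -1.0 + 0.3 * X1 - 0.5 * X2 + rng.normal(size=n)

X0h, X1h, X2h, X3h = gram_schmidt([X0, X1, X2, X3])
# Pairwise inner products of distinct orthogonalized variables should be ~0.
print(inner(X1h, X2h), inner(X1h, X3h), inner(X2h, X3h))
```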

Let us carry out this procedure for the case n=3.  We note again that the projection of Y in the direction of X is defined as
\mathbb{proj}_{X}Y=\frac{(X,Y)}{(X,X)}X.

Note that this means
\mathbb{proj}_{X-\mu_{X}}Y=\frac{\mathbb{Cov}(X,Y)}{\mathbb{Var}(X)}(X-\mu_{X})=\frac{\sigma_{XY}}{\sigma^{2}_{X}}(X-\mu_{X}).


We have \hat{X}_{0}=1_{\Omega};

\begin{align*} \hat{X}_{1} &=X_{1}-\mathbb{proj}_{\hat{X}_{0}}X_{1} \\&=X_{1}-\frac{\left(X_{1},1_{\Omega}\right)}{||1_{\Omega}||_{2}^{2}}1_{\Omega} \\&=X_{1}-\mu_{1} ;\end{align*}

\begin{align*} \hat{X}_{2} &=X_{2}-\left(\mathbb{proj}_{\hat{X}_{0}}+\mathbb{proj}_{\hat{X}_{1}}\right)X_{2} \\&=X_{2}-\frac{\left(X_{2}, 1_{\Omega}\right)}{||1_{\Omega}||_{2}^{2}}1_{\Omega}-\frac{\left(X_{2},X_{1}-\mu_{1}\right)}{||X_{1}-\mu_{1}||_{2}^{2}}\left(X_{1}-\mu_{1}\right) \\&=\left(X_{2}-\mu_{2}\right)-\frac{\sigma_{12}}{\sigma^{2}_{1}}\left(X_{1}-\mu_{1}\right) \\&=X_{2}-\frac{\sigma_{12}}{\sigma^{2}_{1}}X_{1}-\left(\mu_{2}-\frac{\sigma_{12}}{\sigma^{2}_{1}}\mu_{1}\right) ;\end{align*}


\begin{align*} \hat{X}_{3} &=X_{3}-\left(\mathbb{proj}_{\hat{X}_{0}}+\mathbb{proj}_{\hat{X}_{1}}+\mathbb{proj}_{\hat{X}_{2}}\right)X_{3} \\&=X_{3}-\frac{\left(X_{3}, 1_{\Omega}\right)}{||1_{\Omega}||_{2}^{2}}1_{\Omega}-\frac{\left(X_{3},X_{1}-\mu_{1}\right)}{||X_{1}-\mu_{1}||_{2}^{2}}\left(X_{1}-\mu_{1}\right)-\frac{\left(X_{3},\left(X_{2}-\mu_{2}\right)-\frac{\sigma_{12}}{\sigma^{2}_{1}}\left(X_{1}-\mu_{1}\right)\right)}{||\left(X_{2}-\mu_{2}\right)-\frac{\sigma_{12}}{\sigma^{2}_{1}}\left(X_{1}-\mu_{1}\right)||^{2}_{2}}\left(\left(X_{2}-\mu_{2}\right)-\frac{\sigma_{12}}{\sigma^{2}_{1}}\left(X_{1}-\mu_{1}\right)\right) \\&=\left(X_{3}-\mu_{3}\right)-\frac{\sigma_{13}}{\sigma^{2}_{1}}\left(X_{1}-\mu_{1}\right)-\frac{\sigma_{23}-\frac{\sigma_{12}}{\sigma_{1}^{2}}\sigma_{13}}{\sigma_{2}^{2}-2\frac{\sigma_{12}}{\sigma_{1}^{2}}\sigma_{12}+\frac{\sigma_{12}^{2}}{\sigma_{1}^{4}}\sigma_{1}^{2}}\left(\left(X_{2}-\mu_{2}\right)-\frac{\sigma_{12}}{\sigma^{2}_{1}}\left(X_{1}-\mu_{1}\right)\right) \\&=\left(X_{3}-\mu_{3}\right)-\frac{\sigma_{13}}{\sigma^{2}_{1}}\left(X_{1}-\mu_{1}\right)-\frac{\sigma_{1}^{2}\sigma_{23}-\sigma_{12}\sigma_{13}}{\sigma_{1}^{2}\sigma_{2}^{2}-\sigma_{12}^{2}}\left(\left(X_{2}-\mu_{2}\right)-\frac{\sigma_{12}}{\sigma^{2}_{1}}\left(X_{1}-\mu_{1}\right)\right) \\&=\left(X_{3}-\mu_{3}\right)-\left(\frac{\sigma_{13}}{\sigma^{2}_{1}}-\frac{\sigma_{12}}{\sigma_{1}^{2}}\frac{\sigma_{1}^{2}\sigma_{23}-\sigma_{12}\sigma_{13}}{\sigma_{1}^{2}\sigma_{2}^{2}-\sigma_{12}^{2}}\right)\left(X_{1}-\mu_{1}\right)-\left(\frac{\sigma_{1}^{2}\sigma_{23}-\sigma_{12}\sigma_{13}}{\sigma_{1}^{2}\sigma_{2}^{2}-\sigma_{12}^{2}}\right)\left(X_{2}-\mu_{2}\right) \\&=X_{3}-\left(\frac{\sigma_{1}^{2}\sigma_{23}-\sigma_{12}\sigma_{13}}{\sigma_{1}^{2}\sigma_{2}^{2}-\sigma_{12}^{2}}\right)X_{2}-\left(\frac{\sigma_{13}}{\sigma^{2}_{1}}-\frac{\sigma_{12}}{\sigma_{1}^{2}}\frac{\sigma_{1}^{2}\sigma_{23}-\sigma_{12}\sigma_{13}}{\sigma_{1}^{2}\sigma_{2}^{2}-\sigma_{12}^{2}}\right)X_{1}-\left(\mu_{3}-\left(\frac{\sigma_{1}^{2}\sigma_{23}-\sigma_{12}\sigma_{13}}{\sigma_{1}^{2}\sigma_{2}^{2}-\sigma_{12}^{2}}\right)\mu_{2}-\left(\frac{\sigma_{13}}{\sigma^{2}_{1}}-\frac{\sigma_{12}}{\sigma_{1}^{2}}\frac{\sigma_{1}^{2}\sigma_{23}-\sigma_{12}\sigma_{13}}{\sigma_{1}^{2}\sigma_{2}^{2}-\sigma_{12}^{2}}\right)\mu_{1}\right) ;\end{align*}
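As a sanity check on the algebra (a sketch with simulated data and illustrative coefficients), we can plug the closed-form coefficients for \hat{X}_{2} and \hat{X}_{3} into sample data and verify numerically that the results are orthogonal to 1_{\Omega} and to the earlier centered variables:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200_000
X1 = 1.0 + rng.normal(size=n)
X2 = 2.0 + 0.7 * X1 + rng.normal(size=n)
X3 = -1.0 + 0.3 * X1 - 0.5 * X2 + rng.normal(size=n)

mu1, mu2, mu3 = X1.mean(), X2.mean(), X3.mean()
C = np.cov(np.vstack([X1, X2, X3]), ddof=0)        # C[i, j] = sigma_{i+1, j+1}
s11, s22 = C[0, 0], C[1, 1]
s12, s13, s23 = C[0, 1], C[0, 2], C[1, 2]

# Closed-form hat variables from the derivations above.
X2_hat = (X2 - mu2) - (s12 / s11) * (X1 - mu1)
b2 = (s11 * s23 - s12 * s13) / (s11 * s22 - s12 ** 2)
b1 = s13 / s11 - (s12 / s11) * b2
X3_hat = (X3 - mu3) - b1 * (X1 - mu1) - b2 * (X2 - mu2)

# Each hat variable should be (numerically) orthogonal to 1_Omega and to the
# previously orthogonalized, centered variables.
print(X2_hat.mean(), np.mean(X2_hat * (X1 - mu1)))
print(X3_hat.mean(), np.mean(X3_hat * (X1 - mu1)), np.mean(X3_hat * (X2 - mu2)))
```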


Let \mathcal{G}:=\sigma(X)\subset\mathcal{F} be the sub-\sigma-algebra of \mathcal{F} generated by X (i.e., the \sigma-algebra generated by \{X^{-1}(B)\}_{B\in\mathcal{B}(\mathbb{R})}).