08 January, 2015

Some Exercises in Real Analysis


Problem #1.  


  1. See Exam Handout.

Solution.
  1. By the definition of the weak topology $\tau'$ on $H$, each $\Lambda\in H^{\star}$ is continuous, and $\tau'$ is the weakest topology which ensures this by declaring $V(\Lambda,U):=\Lambda^{-1}(U)$ to be a subbasic open set for every open $U\subset\mathbb{R}$. Since we are working in a topological vector space, we can significantly refine this in order to make the weak topology more tractable. First note the easy fact that a mapping $f$ from a topological space $X$ into a metric space $Y_{d}$ is continuous at a point if and only if, for every $\epsilon>0$, there is an open neighborhood $U_{\epsilon}$ of that point such that $d_{Y}(f(x_{1}),f(x_{2}))<\epsilon$ whenever $x_{1},x_{2}\in U_{\epsilon}$ (this is a standard fact proved in Math 121), and the even easier fact that $\Lambda 0=0$ by linearity. Now $\mathbb{R}$ is a metric space whose origin has a local base given by the balls $B_{\epsilon}(0)$ for all $\epsilon>0$; consequently, the origin $0$ of $H$ has a weak local base $\mathscr{B}$ given by the collection of all finite intersections of the subbasic open sets $$V(\Lambda,\epsilon):=\left\{x\in H:|\Lambda x|<\epsilon\right\},\qquad\Lambda\in H^{\star},\;\epsilon>0,$$ and since translation on any topological vector space is continuous (Definition 1.9.1), this local base completely determines the weak topology $\tau'$ on all of $H$ (thus we can reduce most arguments which are strictly topological in nature to arguments concerning only the origin). Explicitly, every weak neighborhood $U$ of $0$ is the union of basic open sets of the form $$V=\bigcap\limits_{j=1}^{n}V(\Lambda_{j},\epsilon_{j})=\{x\in H:|\Lambda_{j} x|<\epsilon_{j}\;\text{for every}\;1\leq j\leq n<\infty\},$$ and after a translation by $x$, each weak neighborhood of a point $x\in H$ is the union of basic open sets of the form $x+V$. (The notation $\Lambda_{j}$ is for convenience, and in no way implies we are ordering or otherwise enumerating $H^{\star}$.) Fixing $U$ and taking the intersection of the kernels of the $\Lambda_{j}$ (which is a subspace of $H$) corresponding to some basic open set $V$ which $U$ contains, we find $$\{0\}\subset K:=\bigcap\limits_{j=1}^{n}\text{Ker}(\Lambda_{j})=\{x\in H:\Lambda_{j}x=0\;\text{for all}\;1\leq j\leq n\}\subset V\subset U,$$ and since $x\mapsto (\Lambda_{1}x,\ldots,\Lambda_{n}x)$ takes $H$ into $\mathbb{R}^{n}$ with kernel $K$ as above, it follows that $$\dim H\leq \dim\mathbb{R}^{n}+\dim K,$$ this implying (since $H$ is infinite dimensional) $\dim K=\infty.$ Consequently, every weak neighborhood $U$ of $0$ contains an infinite dimensional subspace of $H$! Among other things, this proves that $U\cap S\neq\emptyset$, since $U$ contains every point of the form $\alpha y$, where $\alpha\in\mathbb{R}$ and $y\in K$, and in particular the point $\frac{y}{||y||}\in S$ for any nonzero $y\in K$. Translating the origin to any point $x\in H$ (cf. remark above) then shows that $U_{x}:=x+U$ is a weak neighborhood of $x$ containing the infinite dimensional affine subspace $K_{x}:=x+K$; that is, every weak neighborhood $U_{x}$ of every point $x\in H$ contains a line segment of the form $x+\alpha y$ (for $y\in K$) passing through $x$. It follows that $U_{x}\cap S\neq\emptyset$ whenever $||x||\leq1$, since in this case we can find an $\alpha$ corresponding to some $y\in K$ such that $||x+\alpha y||=1$ (for Hilbert spaces we may, in applying the Pythagorean identity, take $\alpha=\left(\frac{1-||x||^{2}}{||y||^{2}}\right)^{\frac{1}{2}}$ by choosing $y\in K$ orthogonal to $x$, which exists since $K$ is infinite dimensional).
We conclude therefore that $\overline{S}^{w}\supset B$, since every weak neighborhood of every $x\in B$ contains an infinite dimensional affine subspace through $x$ which intersects $S$.
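    (For the record, here is the one-line verification of that choice of $\alpha$: with $y\in K$ orthogonal to $x$ and $\alpha=\left(\frac{1-||x||^{2}}{||y||^{2}}\right)^{\frac{1}{2}}$, the Pythagorean identity gives $$||x+\alpha y||^{2}=||x||^{2}+\alpha^{2}||y||^{2}=||x||^{2}+\left(1-||x||^{2}\right)=1,$$ so that $x+\alpha y\in S$ indeed.)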

    (See remark (1) below.) To obtain the reverse inclusion, we must exhibit, for each $x\in H$ with $||x||>1$, a weak neighborhood $U$ of $x$ which does not intersect $S$. In the spirit of the preceding geometric proof, the geometric Hahn-Banach theorem (Theorem 1.5.14) easily provides us with such a neighborhood (see remark (4) below for an alternative approach to applying it). To apply it, we first record the obvious facts that the strong (norm) topology on $H$ is locally convex (Exercises 1.3.3 and 1.9.1), that it is normal (T4) in the sense that disjoint closed sets can be separated by open sets (this easy fact about metric spaces is standard in basic topology), and that $B$ is strongly closed; hence, there is a convex strongly open neighborhood $W$ of $x$ such that $W\cap B=\emptyset$. In light of the convexity of $B$, all of the conditions of the geometric Hahn-Banach theorem are satisfied, and so there is a hyperplane in $H$ which separates $B$ and $W$. Precisely, for each $x\notin B$, there exist a linear functional $\Lambda\in H^{\star}$ and $c\in\mathbb{R}$ such that $$\Lambda x_{0}< c\leq\Lambda z$$ for every $x_{0}\in W$ and $z\in B.$ By definition the pullback $U:=\Lambda^{-1}((-\infty,c))$ is a weak neighborhood of $x$, and by the separation bound above $U\cap B=\emptyset.$ Then $U$ is a weak neighborhood of $x$ which does not intersect $S$, and so $x\notin\overline{S}^{w}$, i.e. $\overline{S}^{w}\subset B$. The claim follows by combining the two inclusions.
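    (Incidentally, since $H$ is a Hilbert space, one can also write down the separating functional by hand without Hahn-Banach; a minimal sketch, assuming a real inner product: for $||x||>1$, take $\Lambda z:=(z\;|\;x)$ and $c:=\frac{||x||+||x||^{2}}{2}$, so that by Cauchy-Schwarz $$\Lambda z\leq||z||\,||x||\leq||x||<c<||x||^{2}=\Lambda x\qquad\text{for all}\;z\in B;$$ then $U:=\Lambda^{-1}((c,\infty))$ is a weak neighborhood of $x$ disjoint from $B\supset S$.)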

    Remarks:

    (1) After concluding $\overline{S}^{w}\supset B$, we can use Exercise 1.9.20 to conclude $\overline{S}^{w}=B$, since this exercise shows $B$ is weakly closed. This seemed to violate the instructions of the exam, though, since the exercise is too closely related; in any case, the proof above provides a nice application of some of the optional material in the text (namely, the geometric Hahn-Banach theorem), and facilitates a slight generalization of the problem as indicated below.

    (2) Note that the above proof goes through when $H$ is merely an infinite-dimensional normed space (not necessarily Banach); moreover, since every continuous linear functional $\Lambda$ is completely determined by its continuous real part (recall $z=\Re z-i\Re(iz)$ for every $z\in\mathbb{C}$), the geometric Hahn-Banach theorem continues to hold in a complex topological vector space $X$, with the separation being given by $\Re\Lambda$ and $c\in\mathbb{R}$. Indeed, with the real case already proved, it is easy to see that the continuous real linear functional $\lambda$ so constructed is the real part of a unique continuous complex linear functional $\Lambda$, and that $\Lambda x=\lambda x-i\lambda(ix)$. Thus, the conclusion can be generalized to: the weak closure of the unit sphere in an infinite dimensional complex normed space is the unit ball.
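    (A minimal verification that $\Lambda x:=\lambda x-i\lambda(ix)$ is complex linear, granting the real linearity of $\lambda$: it suffices to check $\Lambda(ix)=i\Lambda x$, and indeed $$\Lambda(ix)=\lambda(ix)-i\lambda(-x)=\lambda(ix)+i\lambda(x)=i\left(\lambda x-i\lambda(ix)\right)=i\Lambda x,$$ while $\Re\Lambda=\lambda$ is immediate from the definition.)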

    (3) Note that an immediate implication of the result above is that the strong topology strictly contains the weak topology in an infinite dimensional normed space, since $S$ is closed in the norm topology. Here is another observation that implies the same result, among others. In the proof above we showed that every weak neighborhood of $0$ contains an infinite dimensional subspace of $H$. This implies that the weak topology on $H$ is never locally bounded when $H$ is infinite dimensional, regardless of whether $H$ is normed, metrized, etc.; thus, if $H$ is a locally bounded and infinite dimensional topological vector space (note that normed spaces are locally bounded), then the strong topology on $H$ strictly contains the weak topology on $H$. Taken together, these observations provide a nice extension to part of Exercise 1.9.13. Thus, insofar as $H$ is normed, the weak and strong topologies never coincide when $H$ is infinite dimensional. On the other hand, we proved (after making some minor adjustments) that at least the weak and strong closures of any convex set coincide. There are numerous other observations in this direction; for example, the weak topology on $H$ is never locally compact (despite the unit ball being weakly compact when $H$ is reflexive) whenever $H$ is infinite dimensional (Exercise 1.9.24). It is also straightforward to prove the converse of Exercise 1.9.24, showing that a topological vector space is locally compact if and only if it is finite dimensional; this in turn implies that the weak and strong topologies coincide whenever $H$ is finite dimensional, since both topologies are Hausdorff, and one cannot weaken or strengthen such a topology without losing the Hausdorff or local compactness property, respectively.

    (4) There is a very convenient form of the geometric Hahn-Banach theorem which is stated on its Wikipedia page: if the topological vector space $X$ is locally convex, $A$ compact, and $B$ closed (both convex), then there exists a hyperplane separating these sets with an even stronger conclusion than in the case when $A$ is open. Specifically, there are two scalars $c,d\in\mathbb{R}$ and a continuous real linear functional $\Lambda\in X^{\star}$ (cf. remark (2) for the complex case) such that for all $x\in A$ and $y\in B$, $$\Lambda x< c< d<\Lambda y$$ holds. Then in the proof above, one can take the singleton $\{x\}$ (for $x\in B^{c}$) to be the compact set $A$ and the ball $B$ to be the closed set. Since this form of the Hahn-Banach theorem is not needed, I will not prove it in detail, but its proof is fairly straightforward and consists of recognizing that if $V$ is a convex neighborhood of $0$ (which exists since $X$ is now assumed locally convex), then $A+V$ is open and convex (the convexity is obvious; that it is open follows from $A+V=\bigcup_{x\in A}(x+V)$, i.e. a union of open sets indexed by elements of $A$), and one can arrange matters so that $(A+V)\cap B=\emptyset$ (this is a sort of ``almost'' normal property that all topological vector spaces have, in that a pair of disjoint compact and closed sets can be separated by open sets, but not necessarily two closed sets). Then one simply applies the original geometric Hahn-Banach theorem to the sets $A+V$ and $B$ to obtain a separating hyperplane $\Lambda$, noting then that $\Lambda(A)\subset\Lambda(A+V)$ and that $\Lambda(A)$ is compact in $\mathbb{R}$ and separated from $\Lambda(B)$.
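    (To spell out the last step, a sketch: with $\Lambda a'<c\leq\Lambda b$ for all $a'\in A+V$ and $b\in B$, pick $v\in V$ with $t:=\Lambda v>0$ (possible since $\Lambda\neq0$ and $sv_{0}\in V$ for all sufficiently small $|s|$); then every $a\in A$ satisfies $$\Lambda a=\Lambda(a+v)-t<c-t,$$ so $\sup_{A}\Lambda\leq c-t<c$, the supremum being attained on the compact set $\Lambda(A)$, and any $c',d$ with $\sup_{A}\Lambda<c'<d<c$ yield the strict separation $\Lambda x<c'<d<\Lambda y$.)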

    (5) While this remark is not directly related to the problem above, it seems worthwhile to mention it in light of remark (4). From the modified geometric Hahn-Banach theorem, we see that in a locally convex space the dual space $X^{\star}$ always separates points on $X$. But this is the only essential ingredient required to prove that the weak topology on $X$ is Hausdorff, thus providing a generalization to Exercise 1.9.14 (that the weak-* topology on $X^{\star}$ is Hausdorff follows immediately from the obvious fact that the linear functionals $\lambda_{x}(\Lambda)=\Lambda(x)$ are continuous (by definition) and separate points in $X^{\star}$).   

Problem #2.  


  1. See Exam Handout.

Solution.
  1. Let $\tau$ and $\upsilon$ denote the strong (norm) topologies on $X$ and $Y$, respectively, with a $'$ indicating the corresponding weak topologies. Then if $T$ is continuous from $X_{\tau'}\to Y_{\upsilon'}$, it is obviously continuous from $X_{\tau}\to Y_{\upsilon'}$ since $\tau\supset\tau'$ (Exercise 1.9.13). But $Y_{\upsilon'}$ is Hausdorff (Exercise 1.9.14) and $\upsilon\supset\upsilon'$, and so $T$ is continuous from $X_{\tau}\to Y_{\upsilon}$ as a consequence of the closed graph theorem (Theorem 1.7.19, condition (iii)). Suppose now $T$ is continuous from $X_{\tau}\to Y_{\upsilon}$ and let $U$ be a weakly open set in $Y_{\upsilon'}$. By the definition of the weak topology, there is a corresponding collection of open sets $V_{i}\subset\mathbb{C}$ such that $$W:=T^{-1}(U)=T^{-1}\left(\cup_{\alpha}\cap_{i}\Lambda_{i}^{-1}(V_{i})\right)=\cup_{\alpha}\cap_{i}T^{-1}\left(\Lambda_{i}^{-1}(V_{i})\right)=\cup_{\alpha}\cap_{i}\left(\Lambda_{i}\circ T\right)^{-1}(V_{i}),$$ where for each index $\alpha$ in some (arbitrary) index set $A$ we have $i=1,2,\ldots,n_{\alpha}$ and $\Lambda_{i}\in Y^{\star}$ (the $V_{i}$ and $\Lambda_{i}$ depending on $\alpha$ as well). The continuity of $T$ from $X_{\tau'}\to Y_{\upsilon'}$ now follows from showing $W$ is weakly open in $X_{\tau'}$, and this will follow by showing the continuity of each $\lambda_{i}:=\Lambda_{i}\circ T$ when viewed as a map from $X_{\tau'}$ to $\mathbb{C}$. To that end, note first the obvious fact that $T$ is continuous from $X_{\tau}\to Y_{\upsilon'}$ since $\upsilon\supset\upsilon'$. Then since the composition of two continuous linear functions is again a continuous linear function (they form a type of algebra), we see that $\lambda:=\Lambda\circ T$ for arbitrary $\Lambda\in Y^{\star}$ is a continuous linear map from $X_{\tau}$ to $\mathbb{C}$, i.e. $\lambda\in X^{\star}.$ But then by definition of the weak topology again, $\lambda$ is also a continuous linear map from $X_{\tau'}$ to $\mathbb{C}$, and the claim follows from the preceding observations.
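    (To spell out the closed graph step, a sketch: suppose $x_{n}\to x$ in $X_{\tau}$ and $Tx_{n}\to y$ in $Y_{\upsilon}$. Strong convergence implies weak convergence, so for every $\Lambda\in Y^{\star}$ the assumed weak continuity of $T$ gives $$\Lambda(Tx_{n})\to\Lambda(Tx)\quad\text{and}\quad\Lambda(Tx_{n})\to\Lambda(y),$$ whence $\Lambda(Tx-y)=0$ for every $\Lambda\in Y^{\star}$; since $Y^{\star}$ separates points (Hahn-Banach), $Tx=y$, so the graph of $T$ is closed and the closed graph theorem applies.)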

Problem #3.  


  1. See Exam Handout.

Solution.
  1. By assumption, for every $x\in H$ we have $$0\leq||T^{n+1}x||=||TT^{n}x||\leq||T||_{\text{op}}||T^{n}x||\leq||T^{n}x||,$$ and $$||T^{n}x||\leq||T^{n}||_{\text{op}}||x||=||T||_{\text{op}}^{n}||x||\leq||x||<\infty.$$ (The relation $||T^{n}||=||T||^{n}$ is a consequence of $T$ being self-adjoint, since $||T^{*}T||=||T||^{2}$ for any $T\in B(H)$.) Thus, being a bounded monotonically decreasing sequence, $\lim_{n\to\infty}||T^{n}x||$ exists for every $x\in H$. We now compute the Cauchy difference of the sequence $T^{2n}x$ to obtain \begin{align*} ||T^{2n}x-T^{2m}x||^{2} &=(T^{2n}x-T^{2m}x\;|\;T^{2n}x-T^{2m}x)\\ &=(T^{4n}x\;|\;x)-2(T^{2(n+m)}x\;|\;x)+(T^{4m}x\;|\;x)\\ &=||T^{2n}x||^{2}-2(T^{2(n+m)}x\;|\;x)+||T^{2m}x||^{2}. \end{align*} Since $$||T^{n}x||^{2}=(T^{n}x\;|\;T^{n}x)=(T^{2n}x\;|\;x)$$ converges as $n\to\infty$, we find that the sequences $(T^{4n}x\;|\;x)$, $(T^{4m}x\;|\;x)$, and $(T^{2(m+n)}x\;|\;x)$ all converge to the same limit (note $4n$, $4m$, and $2(m+n)$ are even, so these are subsequences of the convergent sequence $(T^{2k}x\;|\;x)$, cf. Exercise 1.6.8). Thus, $||T^{2n}x-T^{2m}x||\to0$ as $m,n\to\infty$. This holds for every $x$, so we conclude $\lim_{n\to\infty}T^{2n}$ exists in the strong operator topology in $B(H)$.
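    (Explicitly, writing $L:=\lim_{n\to\infty}||T^{n}x||^{2}=\lim_{n\to\infty}(T^{2n}x\;|\;x)$, the three inner products above each tend to $L$, so $$||T^{2n}x-T^{2m}x||^{2}=(T^{4n}x\;|\;x)-2(T^{2(n+m)}x\;|\;x)+(T^{4m}x\;|\;x)\longrightarrow L-2L+L=0$$ as $m,n\to\infty$, and the completeness of $H$ then furnishes the strong limit.)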
  2. Since $x\in V$ if and only if $T^{2}x-x=0$, we see that $V=\text{Ker}(T^{2}-I)=(T^{2}-I)^{-1}(\{0\})$, which is a closed subspace of $H$ since $T^{2}-I$ is continuous and the singleton set $\{0\}$ is closed. Now define $Px:=\lim_{n\to\infty}T^{2n}x$, which exists by (a) for every $x\in H$ (written $P$ rather than $T$ to avoid clashing with the given operator). It is clear that $P$ is linear because limit operations are linear and each $T^{2n}$ is. That $P$ is bounded follows from $||Px||=\lim_{n\to\infty}||T^{2n}x||\leq||x||$; thus $P$ is continuous and $P\in B(H)$. To show that $P$ is an orthogonal projection onto some subspace of $H$, it is enough to show that $P$ is an idempotent self-adjoint operator. To that end, fix $x,y\in H$ and compute $$(x\;|\;P^{*}Py)=(Px\;|\;Py)=\lim\limits_{n\to\infty}(T^{2n}x\;|\;T^{2n}y)=\lim_{n\to\infty}(x\;|\;(T^{2n})^{*}T^{2n}y)=\lim_{n\to\infty}(x\;|\;T^{4n}y)=(x\;|\;Py),$$ and also in the same manner $$(Px\;|\;Py)=\lim_{n\to\infty}(T^{4n}x\;|\;y)=(Px\;|\;y).$$ The conclusions that $P=P^{*}P=PP^{*}=P^{2}$ all follow immediately from this; thus, $P=\pi_{S}$ for some closed subspace $S=\mathscr{R}(P)$. Suppose now $x\in V$. Then $T^{2}x=x$, so $T^{2n}x=x$ for each $n$, and thus $Px=x$; that is, $P$ restricts to the identity operator on $V$, and $S\supset V$. To show the reverse inclusion, take $x\in H$ and note that by the continuity of $T^{2}$ we have $$T^{2}Px=T^{2}\lim_{n\to\infty}T^{2n}x=\lim_{n\to\infty}T^{2n+2}x=Px,$$ so that $Px\in V$ for every $x\in H$; hence $S=\mathscr{R}(P)\subset V$, and we conclude $P=\pi_{V}$.
  3. I wasn't able to come up with a counter-example in time, mostly because I don't have much intuition for what a self-adjoint operator is, let alone many concrete examples (barring projections and the integration operator in problem 6). But here is a quick example that everyone is familiar with from Fourier series which nicely illustrates how hard it is to converge in the uniform topology. Consider the space $L^{2}([0,2\pi])$ and the orthogonal projection operators $\pi_{n}f=(f\;|\;e_{n})e_{n}$ onto the orthonormal functions $e_{n}(t):=\frac{1}{\sqrt{2\pi}}e^{int}$, $n=1,2,\ldots$. Obviously each $\pi_{n}$ is self-adjoint (being an orthogonal projection), and it is clear from the definition that $||\pi_{n}||=1$ for each $n$. Bessel's inequality (Parseval) shows that $\pi_{n}\to0$ in the strong topology, since $\sum_{n=1}^{\infty}||\pi_{n}f||^{2}\leq||f||^{2}$ forces $||\pi_{n}f||\to0$ as $n\to\infty$ for each $f\in L^{2}$. But if $\pi_{n}$ converged in the uniform topology, the limit would have to be $0$ by uniqueness of (strong) limits, contradicting $||\pi_{n}-0||=1$; so $\pi_{n}$ does not converge in the uniform topology.
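    (A candidate counter-example in the spirit of parts (a) and (b), sketched under the assumption that part (c) asks whether the convergence in (a) can fail in the uniform (operator norm) topology: on $H=\ell^{2}(\mathbb{N})$, let $T$ be the diagonal operator $Te_{k}=(1-\frac{1}{k})e_{k}$ on the standard basis $\{e_{k}\}$. Then $T$ is self-adjoint with $||T||_{op}\leq1$, here $V=\text{Ker}(T^{2}-I)=\{0\}$, and $T^{2n}\to0=\pi_{V}$ strongly, since for $x=\sum_{k}c_{k}e_{k}$ dominated convergence gives $||T^{2n}x||^{2}=\sum_{k}|c_{k}|^{2}(1-\tfrac{1}{k})^{4n}\to0$; but $$||T^{2n}||_{op}=\sup_{k}\left(1-\frac{1}{k}\right)^{2n}=1\quad\text{for every}\;n,$$ so $T^{2n}$ cannot converge in the uniform topology.)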

Problem #4.  


  1. See Exam Handout.

Solution.
  1. (Note: This ended up being one of the more difficult problems on the exam, and I have kept my original line of thought mostly intact to illustrate this (mostly for myself as a learning aid)).

    Since we need to show strong convergence, let's begin by fixing $x\in H$ and computing the Cauchy difference (with $T_{n}:=(\pi_{V}\pi_{W})^{n}$) \begin{align*} ||T_{n}x-T_{m}x||^{2} &=\left(T_{n}x-T_{m}x\;|\;T_{n}x-T_{m}x\right)\\ &=\left(T_{n}x\;|\;T_{n}x\right)-2\left(T_{n}x\;|\;T_{m}x\right)+\left(T_{m}x\;|\;T_{m}x\right)\\ &=\left(T_{n}^{*}T_{n}x\;|\;x\right)-2\left(T_{m}^{*}T_{n}x\;|\;x\right)+\left(T_{m}^{*}T_{m}x\;|\;x\right)\\ &=\left(\pi_{W}T_{2n-1}x\;|\;x\right)-2\left(\pi_{W}T_{n+m-1}x\;|\;x\right)+\left(\pi_{W}T_{2m-1}x\;|\;x\right)\\ &=||T_{n}x||^{2}-2\left(T_{n}x\;|\;T_{m}x\right)+||T_{m}x||^{2}. \end{align*} The second to last line follows from the identity $$T_{n}^{*}=(\pi_{V}\pi_{W}\ldots\pi_{V}\pi_{W})^{*}=\pi_{W}^{*}\pi_{V}^{*}\ldots\pi_{W}^{*}\pi_{V}^{*}=\pi_{W}\pi_{V}\pi_{W}\ldots\pi_{V}\pi_{W}\pi_{V}=\pi_{W}T_{n-1}\pi_{V}$$ (where the self-adjointness of each projection is used) and then applying the idempotent identity $\pi_{V}^{2}=\pi_{V}$ to rewrite the resulting composition of operators; the last line, which is true for any squared inner product, is written for comparison with the more interesting line before it. Since $\pi_{W}$ is an orthogonal projection, it follows that $$||T_{n}x||^{2}=(\pi_{W}T_{2n-1}x\;|\;x)\leq(T_{2n-1}x\;|\;x),$$ this being because if $y\in H$ and $\pi_{S}$ is an orthogonal projection onto a closed subspace $S$, then $y=\pi_{S}y+\pi_{S^{\perp}}y$, and so by the Pythagorean identity $||y||^{2}=||\pi_{S}y||^{2}+||\pi_{S^{\perp}}y||^{2}$, i.e. $||\pi_{S}y||\leq||y||$ (one then just uses the polarization identity to conclude the same thing for inner products). Consequently, $$||T_{n+1}x||^{2}\leq(T_{2n+1}x\;|\;x)=(\pi_{V}\pi_{W}\pi_{V}\pi_{W}T_{2n-1}x\;|\;x)\leq(\pi_{W}T_{2n-1}x\;|\;x)=||T_{n}x||^{2}\;(!)$$ Thus, being a monotonically decreasing sequence of nonnegative numbers, $\lim\limits_{n\to\infty}||T_{n}x||$ exists for every $x\in H$. But this does not show $\lim_{n\to\infty}T_{n}x$ exists! We need some kind of weak convergence before we can make this claim (cf. Exercise 1.9.18 for instance). But at least notice that our remarks above also show that the subsequence $\{(T_{j}x\;|\;x)\}_{j\;\text{odd}}$ converges, since this is also a monotonically decreasing sequence bounded below by $0$. Suppose now we could arrange $n+m-1$ to always be odd, that is, of the form $2j-1$ (where $j\to\infty$ as $m,n\to\infty$). Then we would have $$||T_{n}x-T_{m}x||^{2}\leq(T_{2n-1}x\;|\;x)+(T_{2m-1}x\;|\;x)-2(T_{2j-1}x\;|\;x)\leq(T_{2n+1}x\;|\;x)+(T_{2m+1}x\;|\;x)-2(T_{2j+1}x\;|\;x)\leq\ldots.$$ We know that the right hand side of this inequality tends to $0$ as $m,n\to\infty$; however, because of the inequality, we cannot directly conclude $||T_{n}x-T_{m}x||\to0$ as $m,n\to \infty$. We could fix this by carrying around a chain of extra projections as indicated above, but this is awkward, and obviously does not resolve the more immediate issue of needing $m+n-1$ to be odd. But suppose we could show $x':=\lim_{n\to\infty}T_{n}x$ existed for every $x\in H$. Then we would immediately see that the limit operator $T$ was linear, because limit operations are linear and each $T_{n}$ is; moreover, if $x\in V\cap W$, then $T_{n}x=(\pi_{V}\pi_{W}\ldots\pi_{V}\pi_{W})x=x$, so $Tx=x$, and $T$ would be the identity on $V\cap W$ as required. But even if we could get here, we would only have concluded up to this point that $Tx=x$ for $x\in V\cap W$. To finish the proof, we would need to show that $T$ is an orthogonal projection (e.g. self-adjoint, idempotent, etc.)
and that $\mathscr{R}(T)\subset W\cap V$ (the reverse inclusion $\mathscr{R}(T)\supset W\cap V$ was already shown). But working with the sequence $T_{n}$ leads to further difficulties; for example, in an attempt to show that $T$ is self-adjoint, we would write (using the continuity of the inner product) $(Tx\;|\;y)=\lim_{n\to\infty}(T_{n}x\;|\;y)=\lim_{n\to\infty}(x\;|\;\pi_{W}T_{n-1}\pi_{V}y)=(x\;|\;\pi_{W}T\pi_{V}y),$ and thus would need to verify that $\pi_{W}T\pi_{V}y=Ty$. To show the inclusion $\mathscr{R}(T)\subset V\cap W$, one obvious way to proceed would be by verifying $\pi_{W}Tx=Tx=\pi_{V}Tx$, so that for each $x\in H$ we would have $Tx\in V\cap W$. Some of these equalities are trivial; for example, since $\pi_{V}T_{n}x=T_{n}x$ for all $x,n$ and $\pi_{V}$ is bounded, hence continuous, we could conclude by taking limits that $\pi_{V}Tx=Tx\in V$ (cf. Exercise 1.9.29). The same method shows $T\pi_{W}x=Tx$, but this equality does not help us, and the method fails for $\pi_{W}T_{n}x$ since we cannot apply any idempotent relation here. We conclude that this direct approach is fruitless.

    In light of the above observations, there is a clear, though admittedly awkward, way of resolving our issues (although I would be interested in seeing whether the direct approach above can be salvaged). Our major source of frustration stemmed from the need to ensure $m+n-1$ was odd in the Cauchy difference above, and also from the lack of control in applying an idempotent relation. We can resolve this by considering the recursively defined sequence $T'_{1}:=\pi_{V}$, $T'_{2}=\pi_{W}\pi_{V}$, $T'_{3}=\pi_{V}\pi_{W}\pi_{V}$, $T'_{4}=\pi_{W}\pi_{V}\pi_{W}\pi_{V}$, $\ldots$, which is similar to the sequence $T_{n}$. A quick glance at our previous work then shows that the above relations continue to hold, and with a couple of notational changes we can prove the assertions which we had difficulty with for $T_{n}$. However, even once this is done, it is still possible that $\lim_{n\to\infty}T'_{n}x\neq\lim_{n\to\infty}T_{n}x$ for some $x\in H$ ($T'_{n}$ is not even a subsequence of $T_{n}$, and we still have not shown $T_{n}$ even converges). However, this is easily corrected by showing that the ``reverse'' sequence $T''_{1}=\pi_{W}$, $T''_{2}=\pi_{V}\pi_{W}$, $T''_{3}=\pi_{W}\pi_{V}\pi_{W}$, $\ldots$ (for $n$ even we have $T'_{n}=T''^{*}_{n}$, and for $n$ odd each term of each sequence is self-adjoint) has the same strong limit as $T'_{n}$. Once this is done, since $T_{n}=T''_{2n}$ is a subsequence of $T''_{n}$, we will be able to conclude that $\lim_{n\to\infty}T_{n}$ exists and that $\lim T_{n}=\lim T'_{n}=\lim T''_{n}=:T.$

    We now proceed to show $T_{n}'$ and $T_{n}''$ both converge strongly to a common limit $T=\pi_{V\cap W}$; this will then finally show $T_{n}$ converges strongly to $\pi_{V\cap W}$, and the proof will be complete. In that vein, we begin with $T'_{n}$ and proceed exactly as before by fixing $x\in H$ and examining the convergence of the sequence $T'_{n}x$. We first note the similar (though different) adjoint relations $$T'^{*}_{n}T'_{n}=T'_{2n-1},$$ and $$T'^{*}_{n}T'_{m}=T'_{m+n-l},$$ where the integer $l$ accounts for the differences which occur when $m$ and $n$ have the same or opposite parity: when they are both odd or both even, $l=1$, and when one is odd and the other even, $l=0$. Note that we no longer have extra projections to worry about, and so we can maintain equality in the Cauchy difference below! Note also that $m+n-l$ is now always odd (the sum of two integers of the same parity is even, and of opposite parity is odd; subtracting $l$ as above then always produces an odd integer), so it can be written as $2j-1$ for some $j=j(m,n)\to\infty$ as $m,n\to\infty$. Thus, we find exactly as before (making the above changes) that $$||T'_{n}x-T'_{m}x||^{2}=(T'_{2n-1}x\;|\;x)-2(T'_{2j-1}x\;|\;x)+(T'_{2m-1}x\;|\;x).$$ In the same manner, we deduce that $$(T'_{2n-1}x\;|\;x)=||T'_{n}x||^{2},$$ so that $$(T'_{2n+1}x\;|\;x)=||T'_{n+1}x||^{2}=||\pi_{V}T'_{n}x||^{2}\leq||T'_{n}x||^{2}=(T'_{2n-1}x\;|\;x)$$ or $$(T'_{2n+1}x\;|\;x)=||T'_{n+1}x||^{2}=||\pi_{W}T'_{n}x||^{2}\leq||T'_{n}x||^{2}=(T'_{2n-1}x\;|\;x),$$ depending on the parity of $n$. In either case, we find that $\lim_{n\to\infty}(T'_{2n-1}x\;|\;x)$ exists, since this subsequence (indexed by odd numbers) is bounded below by $0$ and nonincreasing. Therefore, since $$||T'_{n}x-T'_{m}x||\to0$$ as $m,n\to\infty$ and $H$ is complete, we conclude at last that $$\lim\limits_{n\to\infty}T'_{n}x=:x'\in H$$ exists for every $x\in H$. Now we finish the proof. $T'$ is linear since limit operations are, and each $T'_{n}$ is. It is also clear that $T'$ is the identity on $V\cap W$, since for such $x$ we have $\pi_{V}x=x$ and $\pi_{W}x=x$, so that $T'_{n}x=x$ for each $n$. One of the motivating reasons for the definition of the sequence $T'_{n}$ is that we can now prove $\pi_{V}T'=T'$ and $\pi_{W}T'=T'$. Indeed, $\pi_{V}T'_{2n}x=T'_{2n+1}x$ and $\pi_{W}T'_{2n-1}x=T'_{2n}x$ by the relations above; therefore, we can take $n\to\infty$ and conclude $\pi_{V}T'x=\pi_{W}T'x=T'x$ for all $x\in H$ (cf. Exercise 1.9.29; note that we have also used the fact that if a sequence converges, all of its subsequences converge to the same limit). It follows that $\mathscr{R}(T')\subset V\cap W$, since $T'x=\pi_{V}T'x\in V$ and $T'x=\pi_{W}T'x\in W$, so $T'x\in V\cap W$ for every $x\in H$. Consequently, $\mathscr{R}(T')=W\cap V$, and $T'$ is the identity on $W\cap V$. It remains to show that $T'$ is an orthogonal projection, and this will follow from showing $T'$ is idempotent and self-adjoint. If $x,y\in H$, then the continuity of the inner product and our computations above show that $$(x\;|\;T'^{*}T'y)=(T'x\;|\;T'y)=\lim_{n\to\infty}(T'_{n}x\;|\;T'_{n}y)=\lim_{n\to\infty}(T'_{2n-1}x\;|\;y)=(T'x\;|\;y)=(x\;|\;T'^{*}y),$$ and also (in the same manner) $$(T'x\;|\;T'y)=\lim_{n\to\infty}(x\;|\;T'_{2n-1}y)=(x\;|\;T'y).$$ All of the relations $T'=T'^{*}T'=T'T'^{*}=T'^{2}$ follow immediately from these results; thus, $T'=\pi_{V\cap W}$.
Since it is obvious that the exact same argument goes through with $T''_{n}$, we conclude $T'=T''$, and so $$\lim_{n\to\infty}T_{n}x=\lim_{n\to\infty}T'_{n}x=\lim_{n\to\infty}T''_{n}x=\lim_{n\to\infty}(\pi_{V}\pi_{W})^{n}x=\pi_{V\cap W}x$$ for every $x\in H$; that is, $(\pi_{V}\pi_{W})^{n}\to\pi_{V\cap W}$ strongly in $B(H\to H)$.
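    (The bookkeeping identity used in this last step is worth displaying: grouping the factors of $T''_{2n}$ in pairs, $$T''_{2n}=\underbrace{(\pi_{V}\pi_{W})(\pi_{V}\pi_{W})\cdots(\pi_{V}\pi_{W})}_{n\;\text{pairs}}=(\pi_{V}\pi_{W})^{n}=T_{n},$$ so that $T_{n}$ is literally a subsequence of $T''_{n}$ and inherits its strong limit $\pi_{V\cap W}$.)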

Problem #5.  


  1. See Exam Handout.

Solution.
  1. Following the hint, a quick mental calculation motivates us to define linear functionals $$\Lambda_{n}^{x}(f):=(f*K_{n})(x)=\int_{\mathbb{R}}K_{n}(x-y)f(y)\;dy,$$ since then $$||\Lambda_{n}^{x}||_{op}=\sup\limits_{f\in BC(\mathbb{R}\to\mathbb{R}),\;||f||_{L^{\infty}}=1}\left|\int_{\mathbb{R}}K_{n}(x-y)f(y)\;dy\right|\leq\int_{\mathbb{R}}|K_{n}(y)|\;dy=||K_{n}||_{L^{1}}.$$ But this bound is only useful if we can demonstrate equality, or otherwise modify it to obtain a lower bound, since we are attempting to upper bound the term $||K_{n}||_{L^{1}}.$ To that end, fix $x=0$ and consider just the set of linear functionals $\{\Lambda_{n}\}.$ For fixed $n$, it is clear that the supremum defining $||\Lambda_{n}||$ above will be achieved by the continuous function which is most frequently $1$ in absolute value and of the same sign as $K_{n}$. Absent continuity, such a function is the simple function $\psi_{n}$ defined by $\psi_{n}(-y)=1$ when $K_{n}(y)\geq0$ and $\psi_{n}(-y)=-1$ whenever $K_{n}(y)<0$, for which $\Lambda_{n}(\psi_{n})=\int_{\mathbb{R}}|K_{n}(y)|\;dy=||K_{n}||_{L^{1}}$; approximating $\psi_{n}$ by continuous functions of sup norm $1$ (agreeing with $\psi_{n}$ off a set where $|K_{n}|$ has arbitrarily small integral) then shows $||\Lambda_{n}||_{op}=||K_{n}||_{L^{1}}$, and an application of the uniform boundedness principle to the pointwise bounded family $\{\Lambda_{n}\}$ yields the desired bound $\sup_{n}||K_{n}||_{L^{1}}<\infty$.

    Remarks. This exercise complements Example 1.7.9 following the proof of the uniform boundedness principle, concerning the Fourier transform. In fact, in Math 136 we showed that the Dirichlet kernels $\{D_{n}\}_{n=1}^{\infty}$ for the partial sums of the Fourier series of continuous functions on $[0,2\pi]$ have divergent $L^{1}$ norm as $n\to\infty$. If we replace $K_{n}$ by $D_{n}$ in this problem, then we are in the exact opposite situation, since we now have an unbounded set of linear functionals, and the uniform boundedness principle then implies that for every $x$ in $[0,2\pi]$ there is a corresponding dense collection of continuous functions whose Fourier series diverge at $x$!

Problem #6.  


  1. See Exam Handout.

Solution.
  1. First note that $$|Tf(x)|=\left|\int_{0}^{x}f(y)\;dy\right|\leq\int_{0}^{x}|f(y)|\;dy\leq||f||_{1}\leq||\chi_{[0,1]}||_{2}||f||_{2}<\infty$$ by Hölder's inequality and the fact that the indicator function $\chi_{[0,1]}$ has norm $1$ in $L^{2}([0,1])$. This shows that $T$ is well-defined, which isn't immediately obvious since $f$ is only assumed to be in $L^{2}([0,1])$, not $L^{1}([0,1])$ (and in fact this is only true because $[0,1]$ is finite, a counter-example being $x^{-1}$ on $[1,\infty)$ -- thus, we see another curious instance of the interplay between containments of $L^{p}$ spaces and how such containments depend not only on the exponents $p$ and $q$, but also on the underlying support of each function; see Exercises 1.3.11-14). Let $$\mathscr{F}:=\{Tf:||f||_{2}\leq1\},$$ and observe for each $F\in\mathscr{F}$ and $x,y\in[0,1]$ with $x\geq y$ that $$|F(x)-F(y)|\leq\int_{y}^{x}|f(t)|\;dt=\int_{0}^{1}|f(t)|\chi_{[y,x]}(t)\;dt\leq||\chi_{[y,x]}||_{2}||f||_{2}\leq|x-y|^{\frac{1}{2}},$$ i.e. a Hölder bound with constant $A=1$ and exponent $\alpha=\frac{1}{2}$. Thus, $T$ takes the unit ball $B(0;1)\subset L^{2}([0,1])$ into a subset $\mathscr{F}$ of the Hölder space $H^{1,\frac{1}{2}}([0,1])$. It is clear that each $F\in\mathscr{F}$ is continuous on the compact set $[0,1]$, and that the family $\mathscr{F}$ is equicontinuous with modulus of continuity $\delta=\epsilon^{2}$ (owing to the uniform Hölder continuity of each $F$) and uniformly bounded by $1$. All of the conditions of the Arzela-Ascoli theorem are satisfied (Theorem 1.8.23), and we conclude $\overline{\mathscr{F}}$ is compact in $BC([0,1]\to\mathbb{R})$; that is, $$\mathscr{F}=T(B(0;1))\subset H^{1,\frac{1}{2}}[0,1]\subset BC([0,1])\subset L^{2}([0,1])$$ is precompact in $BC([0,1])$. It remains to show that this precompactness in $BC([0,1])$ implies precompactness in the norm topology of $L^{2}([0,1])$. The easiest way to show this is by using the sequential version of the Arzela-Ascoli theorem (Corollary 1.8.25). Consider any sequence $\{F_{n}\}$ in $\mathscr{F}$. Then this sequence has a subsequence $F_{n_{j}}$ which converges uniformly on $[0,1]$ as $j\to\infty$ to a limit $F$, i.e. converges in the $L^{\infty}$ norm. But observe that $$\lim_{j\to\infty}||F_{n_{j}}-F||_{2}^{2}=\lim_{j\to\infty}\int_{0}^{1}|F_{n_{j}}(x)-F(x)|^{2}\;dx=0$$ by the dominated convergence theorem (actually, dominated convergence is not needed here: since the underlying measure space $[0,1]$ is finite, every $L^{\infty}$ function is automatically in $L^{2}$; in particular, estimating the integral after taking $j$ so large that $|F_{n_{j}}(x)-F(x)|<\epsilon$ for all $x\in[0,1]$ gives $||F_{n_{j}}-F||_{2}\leq\epsilon$), and so $\mathscr{F}$ is precompact in $L^{2}$ since sequential compactness is equivalent to compactness in normed (indeed metric) spaces. Note that the crucial fact here is that the measure space $[0,1]$ is finite, and Exercise 1.3.11 elaborates on this. In any case, we see that $T$ is compact as required.
  2. My immediate conjecture was the differentiation operator! But alas, a simple computation shows $T^{*}$ is not nearly this exciting. Indeed, for $f,g\in L^{2}([0,1])$ we need $(Tf\;|\;g)=(f\;|\;T^{*}g),$ so following our nose we compute \begin{align*} (Tf\;|\;g) &=\int_{[0,1]}\left(\int_{[0,x]}f(y)\;dy\right)\overline{g(x)}\;dx\\ &=\int_{[0,1]}\overline{g(x)}\int_{[0,1]}f(y)\chi_{[0,x]}(y)\;dy\,dx\\ &=\int_{[0,1]}f(y)\int_{[0,1]}\overline{g(x)}\chi_{[y,1]}(x)\;dx\,dy\\ &=\int_{[0,1]}f(y)\overline{\int_{[y,1]}g(x)\;dx}\;dy, \end{align*} where we have used the identity $\chi_{[0,x]}(y)=\chi_{[y,1]}(x)$ (both equal $1$ precisely when $y\leq x$). Thus, by uniqueness of the adjoint, $T^{*}$ is the integral operator $$T^{*}g(x)=\int_{x}^{1}g(y)\;dy,$$ i.e. integration from the other endpoint; in particular, this integral operator is not self-adjoint. Note that the use of Fubini's theorem to change the order of integration is justified since it is clear that $F(x,y):=\overline{g(x)}f(y)\chi_{[0,x]}(y)$ is absolutely integrable on $[0,1]\times[0,1]$, as $||f||_{1}\leq||1||_{2}||f||_{2}=||f||_{2}<\infty$ and $||\bar{g}||_{1}\leq||1||_{2}||\bar{g}||_{2}=||\bar{g}||_{2}<\infty$ by Hölder's inequality.
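    (As a quick sanity check, take $f=g\equiv1$: then $Tf(x)=x$ and $T^{*}g(y)=1-y$, and indeed $$(Tf\;|\;g)=\int_{0}^{1}x\;dx=\frac{1}{2}=\int_{0}^{1}(1-y)\;dy=(f\;|\;T^{*}g),$$ while $Tg(x)=x\neq1-x=T^{*}g(x)$ confirms that $T\neq T^{*}$.)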

Problem #7.  


  1. See Exam Handout.

Solution.
  1. Inspired by the similarities between semi-continuous functions and jump functions, define $\{D_{n}\}_{n=1}^{\infty}$ by $$D_{n}:=\left\{x\in X:\omega_{f}(x)>\frac{1}{n}\right\},$$ where $\omega_{f}(x)=\lim_{r\to0}\Omega_{f}\left(B(x;r)\right)$ is the oscillation of $f$ at $x$ (cf. Math 131). Note that $D:=\bigcup_{n}D_{n}\subset X$ is the set of discontinuities of $f$ in $X$, since $f$ is continuous at $x$ if and only if $\omega_{f}(x)=0$ (cf. Math 131). Now $f$ is upper semicontinuous and $X$ is complete (hence normal by Exercise 1.10.1), so there exist continuous functions $f_{j}:X\to(-\infty,\infty]$ such that $f_{j}(x)\geq f(x)$ and $f_{j}(x)\to f(x)$ as $j\to\infty$ for every $x\in X$ (Exercise 1.10.11). Now choose any $N>0$ and any ball $B$ in $X$ and define sets $$E_{m}=\bigcap\limits_{i,j=m}^{\infty}\left\{x\in B:d(f_{i}(x),f_{j}(x))\leq\frac{1}{2N}\right\}.$$ Then $B=\bigcup_{m}E_{m}$ since $f_{j}(x)\to f(x)$ for every $x$; thus, by Baire's category theorem (Theorem 1.7.3), there is at least one $E_{M}$ which is dense in some subball $B'\subset B$. Moreover, the continuity of each $f_{j}$ implies each $E_{m}$ is an intersection of closed sets, thus $E_{M}$ is closed and we have $E_{M}\supset B'$. This implies for every $x$ in this subball $B'$ that $d(f_{i}(x),f_{j}(x))\leq\frac{1}{2N}$ whenever $i,j\geq M$, a rather striking conclusion given the definition of $E_{M}$ as an infinite double intersection. Now, holding $M$ fixed we find $$\lim\limits_{j\to\infty}d(f_{M}(x),f_{j}(x))=d(f_{M}(x),f(x))\leq\frac{1}{2N}.$$ An application of the triangle inequality then shows $B'\cap D_{N}=\emptyset$, and since the choice of $B$ was arbitrary (in particular, independent of $N$), we see that every ball $B$ in $X$ contains a subball $B'$ which avoids those points of discontinuity of $f$ lying in $D_{N}$ (i.e. those points where $\omega_{f}(x)>\frac{1}{N}$). Therefore, in sending $N\to\infty$, we conclude that every ball $B$ of $X$ contains a subball $B'$ on which $f$ is continuous; moreover, we see that $D$ is the union of the nowhere dense sets $D_{n}$, and so $f$ is in fact continuous on a dense set of $X$!
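    (The triangle inequality step, explicitly: for $x\in B'$ and $r$ small enough that $B(x;r)\subset B'$, any $y,z\in B(x;r)$ satisfy $$d(f(y),f(z))\leq d(f(y),f_{M}(y))+d(f_{M}(y),f_{M}(z))+d(f_{M}(z),f(z))\leq\frac{1}{2N}+\Omega_{f_{M}}\left(B(x;r)\right)+\frac{1}{2N},$$ and letting $r\to0$ the middle term vanishes by the continuity of $f_{M}$, so $\omega_{f}(x)\leq\frac{1}{N}$ and hence $x\notin D_{N}$.)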

Problem #8.  


  1. See Exam Handout.

Solution.
  1. The Riesz representation theorem (Theorem 1.10.11) implies the existence of a unique positive Radon measure $\mu$ such that $I$ is represented by $\mu$, in the sense that $$I(f)=\int_{\mathbb{R}}f\;d\mu$$ for every $f\in C_{c}(\mathbb{R}\to\mathbb{R})$ (obviously $\mathbb{R}$ is a locally compact, $\sigma$-compact Hausdorff space). But $I$ is translation invariant in and of itself, so the induced measure $\mu$ must also be translation invariant, in the sense that $$I(f_{h})=\int_{\mathbb{R}}f(x-h)\;d\mu(x)=\int_{\mathbb{R}}f(x)\;d\mu(x+h)=\int_{\mathbb{R}}f(x)\;d\mu(x)=I(f).$$ Thus, for any Borel $E\subset\mathbb{R}$ and $x\in\mathbb{R}$, we must have $$\mu(x+E)=\mu(E).$$ But then $\mu$ is just a multiple of the Lebesgue measure $m$: by Exercise 1.2.23 in the previous textbook on measure theory, $\mu$ satisfies (i), (ii), and (iii), and the constant $c$ corresponds to the normalization of $\mu$, that is $\mu([0,1])=c$, and so $$I(f)=\int_{\mathbb{R}}f\;d\mu=c\int_{\mathbb{R}}f\;dx$$ as claimed, where $dx=dm(x)$, the Lebesgue measure on $\mathbb{R}.$ The above arguments are a little symbolic and not very rigorous, so let us be explicit. For $I(f)=I(f_{h})$ to hold, we must have $$\int_{\mathbb{R}}f(x-h)\;d\mu=\lim_{n\to\infty}\int_{\mathbb{R}}\phi_{n}(x-h)\;d\mu=\lim_{n\to\infty}\int_{\mathbb{R}}\phi_{n}(x)\;d\mu=\int_{\mathbb{R}}f(x)\;d\mu$$ for a monotone sequence of simple functions converging to $f$ pointwise a.e. (such a sequence exists, e.g. one can use the standard construction of partitioning the range of $f$ and pulling back to measurable sets in order to define each $\phi_{n}$, and the result is implied by the monotone convergence theorem). But the integral of each $\phi_{n}$ is just $$\int_{\mathbb{R}}\phi_{n}(x)\;d\mu=\sum\limits_{j=1}^{n}\alpha_{j}\mu(E_{j})$$ for measurable sets $E_{j}$ and scalars $\alpha_{j}$ (one can arrange the $E_{j}$ to be pairwise disjoint), and so $$\int_{\mathbb{R}}\phi_{n}(x-h)\;d\mu=\sum\limits_{j=1}^{n}\alpha_{j}\mu(E_{j}+h).$$ Thus we see that the translation invariance of the integral forces the translation invariance of the measure $\mu$. Now we show $\mu(E)=cm(E)$ for every Lebesgue measurable $E\subset\mathbb{R}$. As usual, set $c=\mu([0,1))$ and note that $m([0,1))=1$ (singletons are $\mu$-null, since by translation invariance and the local finiteness of the Radon measure $\mu$ a point of positive measure would force $\mu([0,1])=\infty$, so it makes no difference whether we normalize by $[0,1]$ or $[0,1)$). Now consider the $n$th dyadic mesh on $\mathbb{R}$, consisting of the disjoint intervals $[j2^{-n},(j+1)2^{-n})$, $j\in\mathbb{Z}$, each of length $2^{-n}$. We know that the Lebesgue measure $m$ assigns a mass of $2^{-n}$ to each such interval, and that the set $[0,1)$ is the union of $2^{n}$ intervals of length $2^{-n}$, all of which are translates of each other. The translation invariance of both $\mu$ and $m$ then implies, for every $I$ in the $n$th dyadic mesh (e.g. $|I|=2^{-n}$), that $$c=\mu([0,1))=\sum_{j=1}^{2^{n}}\mu(I+h_{j})=\sum_{j=1}^{2^{n}}\mu(I)=2^{n}\mu(I);$$ in other words, $$\mu(I)=c2^{-n}=c\,m(I)$$ for every dyadic interval of length $2^{-n}$. Standard arguments from measure theory then imply that $\mu$ and $cm$ agree on all Borel sets (in particular, because every open set is a countable union of disjoint dyadic intervals), and thus $\mu$ and $cm$ agree on the Lebesgue $\sigma$-algebra, since every Lebesgue measurable set differs from a Borel set by a null set. Passing from this measure to its associated integral, we then conclude $$I(f)=\int_{\mathbb{R}}f\;d\mu=c\int_{\mathbb{R}}f\;dx$$ as required.
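    (For instance, writing an open set $U\subset\mathbb{R}$ as a countable disjoint union of dyadic intervals $I_{k}$, countable additivity of the two (Radon) measures gives $$\mu(U)=\sum_{k}\mu(I_{k})=c\sum_{k}m(I_{k})=c\,m(U),$$ and outer regularity then extends the identity $\mu=c\,m$ from open sets to all Borel sets.)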





Exercise 1 (a) 1.1.3, (b) 1.1.4, (c) 1.1.7
  1. (Uniqueness of elementary measure).  Let $d\geq1$, and let $m':\mathscr{E}(\mathbb{R}^{d})\to\mathbb{R}^{+}$ be a non-negative, finitely additive, and translation-invariant map from the elementary sets to the non-negative reals. Prove the existence of a positive constant $c$ such that $m'(E)=cm(E)$ for all $E\in\mathscr{E}(\mathbb{R}^{d})$.

  2. (Products of elementary sets are elementary).   Let $d_{1},d_{2}\geq1$, let $E_{1}\subset\mathbb{R}^{d_{1}}$ and $E_{2}\subset\mathbb{R}^{d_{2}}$ be elementary, and show that $E_{1}\times E_{2}\subset\mathbb{R}^{d_{1}+d_{2}}$ is elementary with $m(E_{1}\times E_{2})=m(E_{1})m(E_{2})$.

  3. (Regions under graphs of continuous functions are Jordan measurable).   Let $B\subset\mathbb{R}^{d}$ be a closed box and $f\in\mathscr{C}[B\to\mathbb{R}]$. (1) Show that the graph $G_{f}=\{(x,f(x))\;:\;x\in B\}\subset\mathbb{R}^{d+1}$ has Jordan measure zero. (2) Show that the set $\mathscr{A}=\{(x,t)\;:\;x\in B,\;0\leq t\leq f(x)\}\subset\mathbb{R}^{d+1}$ is Jordan measurable.

Solution.
  1. Let $m$ be the usual elementary measure, $m'$ as above, and put $c=m'\left([0,1)\right)$. Then \begin{align*} c=m'([0,1)) &=m'\left(\bigcup\limits_{j=0}^{n-1}\left[\frac{j}{n},\frac{j+1}{n}\right)\right) &\text{(finite union of disjoint elementary sets)}\\ &=\sum\limits_{j=0}^{n-1}m'\left(\left[\frac{j}{n},\frac{j+1}{n}\right)\right) &\text{(finite additivity)}\\ &=\sum\limits_{j=0}^{n-1}m'\left(\left[0,\frac{1}{n}\right)\right) &\text{(translation invariance)}\\ &=n\cdot m'\left(\left[0,\frac{1}{n}\right)\right),\\ \end{align*} so that $$m'\left(\left[0,\frac{1}{n}\right)\right)=\frac{c}{n}.$$ Therefore $m'$ and $m$ agree on intervals of the form $[0,\frac{1}{n})$ up to the constant normalization $c$ of $[0,1)$; in particular, if $c=1$, they agree exactly on these intervals, and so in the following we may use the phrase "agree" without reference to normalization. Repeated applications of translation invariance and finite additivity now show that $m'$ and $m$ also agree on all intervals $[a,b)$ with rational end points. Restricting attention to such sets, and treating the measure of $[a,b)$ as a function of the endpoint vector $x=(a,b)$, we have \begin{align*} \Big|m'([a_{1},b_{1}))-m'([a_{2},b_{2}))\Big| &= c\Big|m([a_{1},b_{1}))-m([a_{2},b_{2}))\Big| \\ &= c\Big|(b_{1}-a_{1})-(b_{2}-a_{2})\Big| \\ &= c\Big|(b_{1}-b_{2})-(a_{1}-a_{2})\Big| \\ &\leq c\left(|b_{1}-b_{2}|+|a_{1}-a_{2}|\right)\\ &\leq 2c\,\big\|x_{1}-x_{2}\big\|_{\infty}, &\text{(equivalent $\ell_{\infty}$ norm)} \end{align*} which shows that $m'$ (and $m$) is Lipschitz continuous in the endpoints on the set of intervals $[a,b)$ with rational end points; and since $\mathbb{Q}$ is dense in $\mathbb{R}$, it follows (see the squeeze spelled out below) that $m$ and $m'$ agree on all intervals $[x,y)$ with real end points. Applying finite additivity again then shows $m$ and $m'$ agree on sets of the form $\bigcup_{k=1}^{N}[x_{k},y_{k})$, which are the elementary sets of $\mathbb{R}^{1}$. Repeating the argument with boxes in place of intervals (using products of intervals, cf. part 2) shows that $m'=cm$ on all elementary sets of $\mathbb{R}^{d}$, now with normalization constant $c=m'([0,1)^{d})$.
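    (To spell out the density step -- a small sketch using the monotonicity of $m'$, which follows from non-negativity and finite additivity: for real $x<y$ and rationals $p_{1}\leq x\leq q_{1}<p_{2}\leq y\leq q_{2}$, the containments $[q_{1},p_{2})\subset[x,y)\subset[p_{1},q_{2})$ give $$c(p_{2}-q_{1})=m'([q_{1},p_{2}))\leq m'([x,y))\leq m'([p_{1},q_{2}))=c(q_{2}-p_{1}),$$ and letting the rational endpoints converge to $x$ and $y$ squeezes $m'([x,y))=c(y-x)=c\,m([x,y))$.)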

  2. We have $E_{1}=\bigcup_{j=1}^{N} B^{1}_{j}$ and $E_{2}=\bigcup_{j=1}^{M} B^{2}_{j}$. Therefore \begin{align*} E_{1}\times E_{2} &= \left(B^{1}_{1}\cup\ldots\cup B^{1}_{N}\right)\times\left(B^{2}_{1}\cup\ldots\cup B^{2}_{M}\right)\\ &=\Big(\{B^{1}_{1}\times B^{2}_{1}\}\cup\ldots\cup \{B^{1}_{1}\times B^{2}_{M}\}\Big)\cup\ldots\cup\Big(\{B^{1}_{N}\times B^{2}_{1}\}\cup \ldots\cup \{B^{1}_{N}\times B^{2}_{M}\}\Big)\\ &= \bigcup\limits_{j=1}^{N}\bigcup\limits_{i=1}^{M}B^{1}_{j}\times B^{2}_{i} \\ &= \bigcup_{k=1}^{NM}B'_{k}, \end{align*} where $k$ indexes each cross product (which is again a box $B'_{k}$). Since $NM<\infty$, we see that $E_{1}\times E_{2}$ is indeed elementary. Furthermore, by part $(i)$ of Lemma $1.1.2$ from the text, we may assume that the sets $\{B^{1}_{j}\}_{j=1}^{N}$ and $\{B^{2}_{j}\}_{j=1}^{M}$ are pairwise disjoint, so that the collection $\{B^{1}_{j}\times B^{2}_{i}\}_{j,i}=\{B'_{k}\}_{k=1}^{NM}$ is also pairwise disjoint. Moreover, by part $(ii)$ we also have $m(E_{1})=\sum_{j=1}^{N}|B^{1}_{j}|$ and $m(E_{2})=\sum_{j=1}^{M}|B^{2}_{j}|$, so that \begin{align*} m(E_{1}\times E_{2}) &= m\left(\bigcup\limits_{k=1}^{NM}B'_{k}\right) \\ &= \sum\limits_{k=1}^{NM}|B'_{k}|\\ &= \sum\limits_{j=1}^{N}\sum\limits_{i=1}^{M}|B^{1}_{j}\times B^{2}_{i}| \\ &= \sum\limits_{j=1}^{N}\sum\limits_{i=1}^{M}|B^{1}_{j}||B^{2}_{i}|\\ &= \sum\limits_{j=1}^{N}|B^{1}_{j}|\sum\limits_{i=1}^{M}|B^{2}_{i}|\\ &= m(E_{1})m(E_{2}) \end{align*} as required.

  3. For this part, let $\mathscr{G}_{f}$ denote the graph of $f$ and $\mathscr{R}^{+}_{f}$ denote the region between the graph of $f$ and the coordinate plane where $f$ is nonnegative. To prove (1), let $\epsilon>0$ be given. The compactness of $B$ implies the existence of a $\delta_{\epsilon}$ (depending on $\epsilon$ only) such that $|x-y|<\delta_{\epsilon}$ implies $|f(x)-f(y)|<\epsilon$ for all $x,y\in B$. Corresponding to this $\delta_{\epsilon}$, put $\delta\leq\frac{1}{\sqrt{d}}\cdot\delta_{\epsilon}$. Then clearly $\delta$ also satisfies the uniform continuity condition; moreover, if $Q$ is a cube of side length $\delta$ with center $x$, then $Q\subset B(x;\delta_{\epsilon})$. Now suppose $B=\ell_{1}\times\ldots\times\ell_{d}$, and define integers $\{N_{i}\}_{i=1}^{d}$ by $$N_{i}=\left\lceil\frac{\ell_{i}}{\delta}\right\rceil=\min\left\{n\in\mathbb{N}\;:\;n\geq\frac{\ell_{i}}{\delta}\right\}.$$ Then $B$ can be covered by a family of $M=\prod_{i=1}^{d} N_{i}<\infty$ pairwise disjoint cubes $\{Q_{k}\}_{k=1}^{M}\subset\mathbb{R}^{d}$, each of side length $\delta$, hence volume $\delta^{d}$. This is achieved in the obvious way by first extending each side $\ell_{i}$ of $B$ by $N_{i}\delta-\ell_{i}$, then partitioning the resulting lengths into $N_{i}$ disjoint sub-intervals of length $\delta$, and finally taking the union of the $M$ cross products in a manner similar to part $(b)$. From this cover of $B$, we construct a family of $M$ pairwise disjoint boxes $\{B_{k}\}_{k=1}^{M}\subset\mathbb{R}^{d+1}$ by defining $B_{k}=Q_{k}\times I^{\epsilon}_{k}$, where $I^{\epsilon}_{k}=[\inf_{x\in Q_{k}}f(x),\sup_{x\in Q_{k}} f(x)]$, so that by uniform continuity $|I^{\epsilon}_{k}|\leq\epsilon$. Then $$G_{f}\subset\bigcup_{k=1}^{M}B_{k},$$ and we have \begin{align*} m^{\star,(J)}(G_{f}) &=\inf\limits_{B_{1}\cup\ldots\cup B_{N}\supset G_{f}}\sum\limits_{j=1}^{N}|B_{j}| \\ &\leq\sum\limits_{k=1}^{M}|B_{k}|\\ &=\delta^{d}\sum\limits_{k=1}^{M}|I^{\epsilon}_{k}|\\ &\leq M\delta^{d}\epsilon. \end{align*} Since $\epsilon$ is arbitrary, and $O(M\delta^{d}\epsilon)=O(\prod_{i=1}^{d}N_{i}\cdot\delta^{d}\epsilon)=O(\delta^{-d}\delta^{d}\epsilon)=O(\epsilon)$, we see that $M\delta^{d}\epsilon\to0$ as $\epsilon\to0$. Hence, $m^{\star,(J)}(G_{f})=0$, which implies that $m^{(J)}(G_{f})=0$.

    To prove (2), we make a slight modification to the construction of the $Q_{k}$ above, resulting in a family of cubes and boxes instead. We now keep each $\ell_{i}$ fixed and redefine $$N_{i}=\left\lfloor\frac{\ell_{i}}{\delta}\right\rfloor=\max\left\{n\in\mathbb{N}\;:\;n\leq\frac{\ell_{i}}{\delta}\right\}.$$ Then each $\ell_{i}$ can be partitioned into $N_{i}$ sub-intervals of length $\delta$ together with a ``fringe'' interval $F_{i}$ of length at most $\delta$. We now construct a family of pairwise disjoint boxes $\{Q_{k}\}_{k=1}^{M}$ ($M=O(\delta^{-d})<\infty$) which cover $B$ by taking the union of all cross products from the resulting sub-interval partitions. Note that any $Q_{k}$ arising from products involving some $F_{i}$ has volume $|Q_{k}|<\delta^{d}$, and in fact, because of the way the partition is made, $B=\bigcup_{k=1}^{M}Q_{k}$. Now assume $f$ is nonnegative (so that $\mathscr{A}$ contains the entire region under $f$) and define numbers $\{\alpha_{k}\}_{k=1}^{M}$ by $\alpha_{k}=\sup_{x\in Q_{k}}f(x)$ (this exists and is finite because of compactness). Then $|\alpha_{k}-f(x)|\leq\epsilon$ for every $x\in Q_{k}$ (by uniform continuity). We can then construct a finite sequence of boxes $\{B^{\alpha}_{k}\}_{k=1}^{M}$, each of volume $\alpha_{k}|Q_{k}|$, which cover the set $\mathscr{A}$ by setting $B^{\alpha}_{k}=Q_{k}\times[0,\alpha_{k}]$. Likewise, if we define numbers $\{\beta_{k}\}_{k=1}^{M}$ by $\beta_{k}=\inf_{x\in Q_{k}}f(x)$, we obtain in the same manner a finite sequence of boxes $\{B^{\beta}_{k}\}_{k=1}^{M}$ of volume $\beta_{k}|Q_{k}|$ which are contained in $\mathscr{A}$. (Essentially, we have formed upper and lower Darboux sums of $f$ on $B$.) We now take differences to obtain \begin{align*} \left|\sum\limits_{k=1}^{M}|B^{\alpha}_{k}|-\sum\limits_{k=1}^{M}|B^{\beta}_{k}|\right| &= \sum\limits_{k=1}^{M}\left(|B^{\alpha}_{k}|-|B^{\beta}_{k}|\right)\\ &=\sum\limits_{k=1}^{M}|Q_{k}|\,(\alpha_{k}-\beta_{k})\\ &\leq\delta^{d}\sum\limits_{k=1}^{M}(\alpha_{k}-\beta_{k})\\ &\leq\delta^{d}M\epsilon. \end{align*} Recalling that $\delta^{d}M=O(1)$ (even with the modified construction) and appealing to the obvious inequality $$\sum_{k=1}^{M}|B^{\beta}_{k}|\leq m_{\star,(J)}(\mathscr{A})\leq m^{\star,(J)}(\mathscr{A})\leq\sum_{k=1}^{M}|B^{\alpha}_{k}|,$$ we see that $$m^{\star,(J)}(\mathscr{A})-m_{\star,(J)}(\mathscr{A})=O(\epsilon),$$ so that $\mathscr{A}$ is Jordan measurable. The case where $f$ is possibly negative now follows immediately by observing that $m(\mathscr{A}_{f})=m(\mathscr{A}_{g})$, where $g$ is the function defined by $g(x)=f(x)$ for $x$ such that $f(x)\geq0$ and $g(x)=0$ for $x$ such that $f(x)<0$; note that $g$ is continuous and nonnegative, so the proof above applies to $g$.

Exercise 1 (a) 1.1.8, (b) 1.1.9, (c) 1.1.10
  1. (Measurability of triangular shapes).   Suppose $A,B,C\in\mathbb{R}^{2}$. (1) Show that the solid triangle with vertices $A,B,C$ is Jordan measurable. (2) Show that the Jordan measure of the solid triangle is equal to $\frac{1}{2}|(B-A)\wedge(C-A)|$, where $(a,b)\wedge(c,d):=ad-bc$.

  2. (Measurability of compact convex polytopes). Show that every compact convex polytope in $\mathbb{R}^{d}$ is Jordan measurable.





  3. (Measurability of open and closed balls). (1) Show that open and closed balls in $\mathbb{R}^{d}$ are Jordan measurable with $m(B_{r})=c_{d}r^{d}$ for some constant $c_{d}$ depending on $d$ only. (2) Establish (crude) upper and lower bounds on $c_{d}$.

Solution.
    1. (1) It is obvious that the solid triangle $\mathscr{T}$ is expressible as the region $\mathscr{R}_{f\geq g}$ bounded by two (possibly piecewise) continuous linear functions defined on the projection of $\mathscr{T}$ onto the $x$-axis, and so we may apply corollary 1 to conclude $\mathscr{T}$ is Jordan measurable. Note that it does not matter if $\mathscr{T}$ has an edge parallel to any of the coordinate axes. Note also that we can now use induction to prove that any $d$ dimensional triangle (simplex) is Jordan measurable. Indeed, having established the base case, suppose any $(d-1)$ dimensional triangle is Jordan measurable. Then any $d$ dimensional triangle is bounded between two continuous functions defined on its projection onto the $(d-1)$ dimensional coordinate hyperplane, and this projection is either a $(d-1)$ dimensional triangle or a box. Hence, corollary 1 shows that the $d$ dimensional triangle is Jordan measurable.

      (2) To get an idea of what the given expression is, we expand its definition: \begin{align*} \frac{1}{2}|(B-A)\wedge(C-A)| &= \frac{1}{2}|(B_{x}-A_{x},B_{y}-A_{y})\wedge(C_{x}-A_{x},C_{y}-A_{y})|\\ &= \frac{1}{2}|(B_{x}-A_{x})(C_{y}-A_{y})-(B_{y}-A_{y})(C_{x}-A_{x})|\\ &= \frac{1}{2}|B_{x}C_{y}+B_{y}A_{x}+A_{y}C_{x}-B_{x}A_{y}-A_{x}C_{y}-B_{y}C_{x}|\\ &= \frac{1}{2}|A_{x}(B_{y}-C_{y})-B_{x}(A_{y}-C_{y})+C_{x}(A_{y}-B_{y})|. \end{align*} It is clear now (and in retrospect, even before the computation) that the wedge expression is the $\vec{k}$ minor associated with the cross product determinant of $\vec{AB}\times\vec{AC}$ (viewing the plane vectors as vectors in $\mathbb{R}^{3}$), and because the $\vec{i}$ and $\vec{j}$ minor determinants are both $0$ (since the vectors spanning $\mathscr{T}$ are planar), it is also equal to the magnitude $|\vec{AB}\times\vec{AC}|$. Therefore, if $\theta$ is the angle between the vectors $\vec{AB}$ and $\vec{AC}$, we have that \begin{align*} \frac{1}{2}|(B-A)\wedge(C-A)| &=\frac{1}{2}|\vec{AB}\times\vec{AC}|\\ &=\frac{1}{2}|\vec{AB}||\vec{AC}|\sin\theta\\ &=\frac{1}{2}\,\mathrm{base}(\mathscr{T})\cdot\mathrm{height}(\mathscr{T}), \end{align*} which is the well known formula for the area (measure) of $\mathscr{T}$.
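      (As a quick sanity check, for $A=(0,0)$, $B=(1,0)$, $C=(0,1)$ the formula gives $$\frac{1}{2}|(1,0)\wedge(0,1)|=\frac{1}{2}|1\cdot1-0\cdot0|=\frac{1}{2},$$ which is indeed the area of this right triangle.)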

    2. From the definition, any compact convex polytope $P$ is the intersection of $N$ half spaces $H_{k}$, and so can be written as the solution set of a system of linear inequalities $$Ax\leq c,$$ or more explicitly $$\sum_{i=1}^{d}v_{k,i}\,x_{i}\leq c_{k},\;\;k=1,2,\ldots,N.$$ The solution set of such a system is a region bounded by a collection of hyperplanes, and in this case the region is assumed to be bounded. The result is trivial if $d=1$, since the only half spaces are intervals, the finite intersection of which is clearly Jordan measurable. For $d=2$ it is again trivial, for the projection of $P$ onto the $x$-axis is just a closed interval, and we can apply corollary 1: the boundary of each half plane is a line, and therefore $P$ can be expressed as the region $\mathscr{R}_{f\geq g}$ bounded by two (possibly piecewise) continuous linear functions defined on this interval. We now apply induction for the general case. Indeed, having established the base case, suppose any $(d-1)$ dimensional compact convex polytope is Jordan measurable. Then any $d$ dimensional compact convex polytope is bounded between two continuous functions defined on its projection onto the $(d-1)$ dimensional coordinate hyperplane, and this projection is in turn a $(d-1)$ dimensional compact convex polytope. Hence, corollary 1 shows that the $d$ dimensional compact convex polytope is Jordan measurable.

    3. (1) We first handle the case of a closed ball $B$, and the argument is identical to the previous parts of this exercise. If $d=1$, then any closed ball $B$ is just a closed interval, hence Jordan measurable. If $d=2$, then the projection of $B$ onto the $x$-axis is itself an interval, and corollary 1 applies since $B$ may be expressed as the region $\mathscr{R}_{f\geq g}$ between two continuous (square root) functions defined on this interval. Induction then shows that any $d$-dimensional closed ball is Jordan measurable. Indeed, having established the base case, assume any $(d-1)$ dimensional closed ball is Jordan measurable. Then any $d$ dimensional closed ball is bounded between two continuous functions defined on its projection onto the $(d-1)$ dimensional coordinate hyperplane, and this projection is in turn a $(d-1)$ dimensional closed ball. Hence, corollary 1 shows that the $d$ dimensional closed ball is Jordan measurable. To handle the case of an open ball $B^{\circ}$, we note that $\partial B^{\circ}=\mathscr{G}_{f}\cup\mathscr{G}_{g}$, and therefore $m^{\star,(J)}(\partial B^{\circ})=0$. Consequently, $m^{\star,(J)}(\bar{B}\Delta B^{\circ})=0$, so that $B^{\circ}$ is Jordan measurable since we just proved $\bar{B}$ is. Furthermore, their measures coincide since we may now apply additivity to obtain $m(\bar{B})=m(B^{\circ}\cup\partial B)=m(B^{\circ})+m(\partial B)=m(B^{\circ}).$

      To prove that $m(B_{r})=c_{d}r^{d}$, consider the open unit ball centered at the origin (taking the closure if necessary), $$B_{1}(0)=\left\{x\in\mathbb{R}^{d}:|x|<1\right\},$$ and let $rE:=\{rx:x\in E\}$ denote the dilation of a set $E\subset\mathbb{R}^{d}$ by $r>0$, so that $rB_{1}(0)=B_{r}(0)$. The crucial fact here is how the elementary measure behaves under dilations: a box of side lengths $\delta_{1},\ldots,\delta_{d}$ dilates to a box of side lengths $r\delta_{1},\ldots,r\delta_{d}$, whence its volume scales as $$\prod\limits_{i=1}^{d}r\delta_{i}=r^{d}\prod\limits_{i=1}^{d}\delta_{i}.$$ Since dilation distributes over unions (e.g. $r(E\cup F)=rE\cup rF$), every elementary set which covers (or is covered by) $B_{1}(0)$ dilates to an elementary set which covers (or is covered by) $B_{r}(0)$, and taking the infimum (respectively supremum) over all such sets shows that the outer and inner Jordan measures each scale by the factor $r^{d}$ under dilation. We know from (1) that $B_{1}(0)$ is Jordan measurable; hence so is $B_{r}(0)$, with $$m(B_{r}(0))=r^{d}\,m(B_{1}(0))=:c_{d}r^{d},$$ and since the Jordan measure is translation invariant, translating the ball to an arbitrary center leaves this value unchanged.

      (2) Crude lower and upper bounds for $c_{d}$ can be established by computing the measures of the supremal inscribed and infimal covering cubes of $B_{1}(0)$, for $d=1,2,\ldots$. To begin, let's examine a few concrete cases. When $d=1$ there is no need to compute bounds since $m^{1}(B_{1}(0))=2$. For $d=2$, it is clear that the supremal cube ($E_{2}$) has a diagonal length of $\mathscr{D}=2$ and that the infimal cube ($F_{2}$) has a side length of $\ell=2$ (this can be proved easily using simple optimization techniques from differential calculus). An application of the Pythagorean theorem then shows that $\mathscr{D}^{2}=2\ell^{2}$ for any cube in $\mathbb{R}^{2}$, hence $m(E_{2})=2$ and $m(F_{2})=4$. When $d=3$, we proceed similarly. The infimal cube $F_{3}$ again has side length $\ell=2$ and hence $m(F_{3})=2^{3}=8$. Likewise, the supremal cube has diagonal length $\mathscr{D}=2$. An application of the Pythagorean theorem does not proceed as simply for $d>2$. In any case, we have $\mathscr{D}^{2}=\ell^{2}+\mathscr{D}^{'2}$, where $\mathscr{D}^{'}$ is the projection of $\mathscr{D}$ onto $\mathbb{R}^{2}$ (a picture is helpful for the cases $d=2,3$). Then $\mathscr{D}^{'}$ forms the diagonal of a right triangle in $\mathbb{R}^{2}$ with sides $\ell$, hence $\mathscr{D}^{'}=\sqrt{2}\ell$, which implies $3\ell^{2}=\mathscr{D}^{2}$, so that $\ell=\frac{2}{\sqrt{3}}$, thus obtaining $m(E_{3})=\frac{8}{3\sqrt{3}}$. Therefore we assert that $$\left(\frac{2}{\sqrt{d}}\right)^{d}\leq c_{d}\leq2^{d},$$ which holds (so far) for $d=1,2,3$, with equality holding only for $d=1$. Now let's consider the situation for any $d+1$ (when I originally did this problem, I assumed induction would be necessary (hence the $d+1$), but it turns out not to be). The unit ball is certainly covered by a cube of side length $\ell=2$ ($F_{d+1}$), and it covers a cube of diagonal length $\mathscr{D}=2$ ($E_{d+1}$). (Note that we are no longer asserting such sets are supremal or infimal, but this doesn't matter at this stage.) The first claim follows from the definition of $B_{1}(0)$ for any dimension, since $|x|<1$ means $\left(\sum_{i=1}^{d+1}x_{i}^{2}\right)^{\frac{1}{2}}<1$, which forces $|x_{i}|<1$ for every $i$, and any such point is obviously in $F_{d+1}$; so $F_{d+1}$ covers $B_{1}(0)$. Likewise, every point of $E_{d+1}$ lies within the half-diagonal distance $\frac{\mathscr{D}}{2}=1$ of the origin, so $\overline{B_{1}(0)}$ covers $E_{d+1}$ (the containment is essentially strict: e.g., the point $x=(1,0,\ldots,0)$ is clearly not in $E_{d+1}$). Since all sets in question are measurable, we have $$m(E_{d+1})\leq m(B_{1}(0))\leq m(F_{d+1}).$$ Therefore, all we must do is compute the measures of $E_{d+1}$ and $F_{d+1}$, and if they agree with the conjectured bounds, then the bounding inequality holds for all $d=1,2,\ldots$. Clearly $m(F_{d+1})=2^{d+1}$ by definition. As for $E_{d+1}$, take the diagonal $\mathscr{D}=2$ passing through the origin and form a right triangle by dropping one side directly down the length of the cube at the point where the diagonal meets $B_{1}(0)$. This side is the component orthogonal to the orthogonal projection $\mathscr{D}^{'}$ of $\mathscr{D}\subset\mathbb{R}^{d+1}$ into $\mathbb{R}^{d}$, where $\mathscr{D}^{'}$ is the diagonal emanating through the origin of the projection of $E_{d+1}$ into $\mathbb{R}^{d}$. Hence the ``vertical'' side has length $\ell$, and the ``horizontal'' side $\mathscr{D}^{'}$ in turn satisfies $\mathscr{D}^{'2}=\ell^{2}+\mathscr{D}^{''2}$, where $\mathscr{D}^{''}$ is the projection into $\mathbb{R}^{d-1}$.
This process continues until the diagonal of a square, $\sqrt{2}\ell$, is reached in $\mathbb{R}^{2}$. Then back-substituting and applying the Pythagorean theorem at each stage yields $\mathscr{D}^{2}=4=\ell^{2}+d\ell^{2}$, which implies $\ell=\frac{2}{\sqrt{d+1}}$. Hence $$m(E_{d+1})=\left(\frac{2}{\sqrt{d+1}}\right)^{d+1}$$ as required.

      It can indeed be proven that the cubes $E_{d}$ and $F_{d}$ are the supremal contained and infimal covering single cubes for $B_{1}(0)$ in $\mathbb{R}^{d}$. For a general radius $\delta$ (i.e. for $B_{\delta}(0)$), we see that in the above formulas $2\mapsto2\delta$; therefore the respective measures are $|E_{d}|=\left(\frac{2\delta}{\sqrt{d}}\right)^{d}$ and $|F_{d}|=(2\delta)^{d}$. These cubes are useful in various instances; an independent numerical check of the resulting bounds is sketched below.
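    (The following Octave sketch compares the bounds against the standard closed form $c_{d}=\pi^{d/2}/\Gamma(\frac{d}{2}+1)$ for the volume of the unit ball, a well-known fact which we have not proven here.)

    % Compare the inscribed/circumscribed cube bounds with the closed form c_d.
    for d = 1:6
        lower = (2/sqrt(d))^d;
        upper = 2^d;
        c_d   = pi^(d/2)/gamma(d/2 + 1);    % volume of the unit ball in R^d
        printf("d=%d: %.4f <= c_d = %.4f <= %.4f\n", d, lower, c_d, upper);
    endfor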
    Exercise 3. (a) 1.1.11
    1. (Relative invariance of Jordan measure under linear operators).   Let $L:\mathbb{R}^{d}\to\mathbb{R}^{d}$ be a linear operator. (1) Show that there exists a real number $D$ such that $m(L(E))=Dm(E)$ for every elementary set $E$. (2) Show that if $E$ is Jordan measurable, then $L(E)$ is also, and that $m(L(E))=Dm(E)$. (3) Show that $D=|\det L|$.

    Solution.
    1. We first record some quick facts about how $L$ affects a general set $S$. A general linear operator on a finite $d$-dimensional real vector space can always be represented by a $d\times d$ real matrix $[L]_{ij}$ (the standard basis is assumed). Furthermore, a sequence of elementary row operations always exists such that either $[L]$ is reduced to the identity matrix $[I]$ ($L$ is an isomorphism) or to $[I]$ with some finite number of null rows ($L$ is singular). Call such row operations ``elementary operators'' $e:\mathbb{R}^{d}\to\mathbb{R}^{d}$, represented by matrices $[e]$ which are obtained by applying the corresponding single row operation to $[I]$. Since each elementary operator is invertible (the corresponding row operations are), it is evident that if we record the sequence of row operations that bring $[L]\to[I]$, we can re-obtain $[L]$ from $[I]$ by applying the inverse row operations in reverse sequence. Row operations act by left multiplication, so $[I]=[e_{n}]\ldots[e_{1}][L]$ and $[L]=[e_{1}]^{-1}\ldots[e_{n}]^{-1}[I]$, or in other words $$L(x)=(e_{1}^{-1}\circ e_{2}^{-1}\circ\ldots\circ e_{n}^{-1})(x).$$ (Although we have tacitly assumed $L$ is invertible, it is obvious that a singular $L$ still has a finite elementary operator decomposition.) This means that if $L$ is invertible, then $L(S)$ is obtained from $S$ by a sequence of coordinate plane reflections (row interchanges), proper shears (adding multiples of independent rows), and dilations (non-zero scalar multiples of rows). Note that when a dilation is by a negative scalar (in particular, $-1$), $S$ is reflected along the corresponding coordinate axis. Note also that a rigid motion (in particular, a rotation) is accomplished by a combination of reflections, shears and dilations. If $L$ is singular, then in addition to the above, $L(S)$ necessarily involves improper (e.g. projective) shears (linear combinations of dependent rows) or projections onto coordinate subspaces (coordinate dilations by $0$). Note that if $L$ is affine (i.e. $L=T+b$ with $T$ linear), then a translation of $S$ is also possible, and if we define $\det L:=\det T$, then this exercise applies to affine transformations as well. However, we shall not discuss affine transformations further, since our proof requires the translation invariance of $m$, and this is already easy to prove by itself. The decomposition is exhibited concretely in the sketch below.
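    (A minimal Octave sketch of the decomposition: the LU factorization with partial pivoting expresses a sample invertible matrix as a product of elementary reflections (the permutation), shears (the unit lower triangular factor), and dilations (the pivots); the matrix below is an arbitrary example.)

    % Sketch: P*A = L*U exhibits A as a product of elementary operators:
    % P (row interchanges), L (unit lower triangular shears), U (shears/dilations).
    A = [0 2 1; 1 1 0; 2 0 1];              % an arbitrary invertible example
    [L, U, P] = lu(A);
    printf("reassembly error: %g\n", max(max(abs(P'*L*U - A))));
    printf("|det A| = %g, product of |pivots| = %g\n", ...
           abs(det(A)), prod(abs(diag(U))));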

      (1) Let $E\in\mathscr{E}\left(\mathbb{R}^{d}\right)$ and first assume $L$ is invertible with $d\geq2$. Then $L(E)$ is either a dilation of $E$, a reflection of $E$ about a coordinate plane, a reflection of $E$ about a coordinate axis, a shear of $E$ about some fixed axis, or some combination of these. If $L(E)$ is a dilation of $E$ or a reflection of $E$ about a coordinate axis, then clearly $L(E)$ is still an elementary set, so it is Jordan measurable by definition. Likewise, if $L(E)$ is a reflection of $E$ about a coordinate plane, then because such a reflection amounts to a 90 degree rotation taking boxes to boxes, we see that $L(E)$ is again an elementary set. To deal with the final possibility of a shear, first consider a single box $B\subset\mathbb{R}^{d}$. If $L(B)$ is a shear of $B$, then it is a $d$-dimensional parallelogram. Since any parallelogram is the union of two $d$-dimensional triangles (e.g. $d$-simplexes), the $d$-dimensional equivalent of exercise 1.1.8 applies and we conclude $L(B)$ is Jordan measurable (the modification is straightforward, since it follows from corollary 1 and the fact that a $d$-dimensional triangle can be written as the region bounded by two piecewise continuous linear functions $f,g:\mathbb{R}^{d-1}\to\mathbb{R}$). The general case now follows: since $E$ is a finite union of boxes, $L(E)$ is a finite union of (possibly translated, dilated, and/or reflected) triangles, all of which are Jordan measurable, and Jordan measurable sets are closed under finite unions. If $L$ is singular, then let $d'=\dim\ker L\geq1$. If $d'\geq2$, then $L(E)$ has outer Jordan measure $0$: it is a bounded subset of a subspace of dimension at most $d-2$, and such a set can be covered by roughly $N^{d-2}$ cubes of side length $\frac{1}{N}$, of total measure $O(N^{-2})\to0$. This means that its inner Jordan measure is also $0$, so $L(E)$ is Jordan measurable. If $d'=1$, then $L(E)$ is a subset of a $d-1$ dimensional subspace, but in general (because of rotations) it is not conveniently covered by boxes directly. On the other hand, $L(E)$ is then contained in the graph of some continuous linear function $f:\mathbb{R}^{d-1}\to\mathbb{R}$ (after a permutation of coordinates if necessary), and so it is Jordan measurable with measure $0$ by exercise 1.1.7. Finally, if $d=1$, then only a dilation of $E$ is possible (and if $L$ is singular, then $L(S)=\{0\}$), which is evidently Jordan measurable.

      Now that we know $L(E)$ is Jordan measurable, we can show $m(L(E))=Dm(E)$ for some constant $D$. Define the map $m'(E)=m(L(E))$, i.e. $E\mapsto m(L(E))$. Then clearly $m'$ is a nonnegative function from the elementary sets to $\mathbb{R}^{+}$, i.e. $m':\mathscr{E}(\mathbb{R}^{d})\to\mathbb{R}^{+}$. Assume for now that $L$ is invertible. Let $E_{h}$ denote the translate of $E$ by $h$. By linearity, $L(E_{h})=L(E)_{L(h)}$, i.e. the image of a translate is a translate of the image. Combining this with the translation invariance of $m$ gives $$m'(E_{h})=m(L(E_{h}))=m(L(E)_{L(h)})=m(L(E))=m'(E),$$ which shows $m'$ is also translation invariant. Since $L$ is invertible and $d<\infty$, $L$ is a linear isomorphism (in particular, $L$ is one-to-one), so that if $\{E_{i}\}_{i=1}^{N}$ is a finite collection of pairwise disjoint elementary sets, then $\{L(E_{i})\}_{i=1}^{N}$ is a finite collection of pairwise disjoint Jordan measurable sets. Then, since $m$ is finitely additive, we have that \begin{align*} m'\left(\bigcup\limits_{i=1}^{N}E_{i}\right) &=m\left(L\left(\bigcup\limits_{i=1}^{N}E_{i}\right)\right)\\ &=m\left(\bigcup\limits_{i=1}^{N}L(E_{i})\right)\\ &=\sum\limits_{i=1}^{N}m(L(E_{i}))\\ &=\sum\limits_{i=1}^{N}m'(E_{i}), \end{align*} which shows $m'$ is also finitely additive. In the case that $L$ is singular, some of the $L(E_{i})$ may no longer be disjoint, so at present we only have sub-additivity. The only way we can recover additivity is if all intersecting $L(E_{i})$ have measure $0$, and for this to hold requires $L$ to map any elementary set $E$ to a set of measure $0$. This is in fact the case, being a consequence of the rank-nullity theorem as discussed previously (a singular $L$ maps into a proper subspace). Incidentally, this also proves $m'$ is translation invariant in the singular case, because $m'(E_{h})=m(L(E_{h}))=0=m(L(E))=m'(E)$. Hence, for any $L$, $m'$ is nonnegative, translation invariant, and finitely additive, which implies $m'(E)=m(L(E))=Dm(E)$ by the uniqueness of elementary measure up to normalization (one may take $D=m(L([0,1)^{d}))$).

      (2) Suppose $E$ is now a Jordan measurable set and assume $L$ is invertible (if $L$ is singular, then $L(E)$ has outer measure $0$, and is therefore Jordan measurable). Then for any $\epsilon>0$ there exists an elementary set $A$ such that $m^{\star,(J)}(A\Delta E)\leq\epsilon$. For convenience, let $F=A\Delta E$ and choose pairwise disjoint cubes $\{Q_{j}\}_{j=1}^{N}$ of side lengths $\ell_{j}$ such that both $$F\subset\bigcup_{j=1}^{N}Q_{j}$$ and $$\sum\limits_{j=1}^{N}|Q_{j}|\leq m^{\star,(J)}(F)+\epsilon$$ hold for the $\epsilon$ chosen. Now since $d<\infty$, $L$ is a bounded linear operator, and so it satisfies a Lipschitz condition on $\mathbb{R}^{d}$ for some $0<C<\infty$: $|L(x)-L(y)|\leq C|x-y|$ for all $x,y\in\mathbb{R}^{d}$. It follows that each image $L(Q_{j})$ has diameter at most $C\sqrt{d}\,\ell_{j}$, and is therefore contained in a cube $Q_{j}^{'}$ of side length $C\sqrt{d}\,\ell_{j}$, whence $$m^{\star,(J)}(L(F))\leq\sum\limits_{j=1}^{N}|Q_{j}^{'}|\leq\left(C\sqrt{d}\right)^{d}\sum\limits_{j=1}^{N}|Q_{j}|\leq\left(C\sqrt{d}\right)^{d}\left(m^{\star,(J)}(F)+\epsilon\right)\leq\left(C\sqrt{d}\right)^{d}2\epsilon.$$ Since $L$ is injective, we have $L(A)\Delta L(E)=L(A\Delta E)=L(F)$, and $L(A)$ is Jordan measurable with $m(L(A))=Dm(A)$ by part (1). Thus $L(E)$ differs from a Jordan measurable set by a set of arbitrarily small outer measure, and the comparison criterion (see the note below) shows that $L(E)$ is Jordan measurable with $m(L(E))=Dm(E)$.
      Note: The existence of the cover $\{Q_{j}\}_{j=1}^{N}$ follows from the definition of $m^{\star,(J)}$, since every bounded set of real numbers has an infimum. Cubes are chosen for convenience, but the Lipschitz estimate still works for arbitrary boxes (just more notation is required); that cubes can be chosen in the first place follows from the fact that any box can be arbitrarily approximated by a finite number of cubes through an obvious dissection process. The comparison criterion used at the end is proven in exercise 1.1.5 for comparison with elementary sets; but it is obvious the comparison is valid with Jordan measurable sets, since a Jordan measurable set is itself arbitrarily approximated by an elementary set.

      (3) First of all, if $L$ is singular, then $m(L(E))=0$ as already shown, and since $\det L=0$, the theorem holds in this case. So now we can assume $L$ is invertible and consider its elementary operator decomposition (which always exists) $$L(x)=(e_{1}\circ e_{2}\circ\ldots\circ e_{n})(x),$$ or in matrix multiplication form $$[L]_{ij}=[e_{1}][e_{2}]\ldots[e_{n}].$$ Since the determinant of an operator is defined by the determinant of the matrix which represents it, we see that $\det I=1$, and we can use this to compute the determinant of any elementary operator $e_{j}$, $j=1,2,\ldots,n$. If $e_{j}$ is an elementary reflection (single row interchange), then $\det e_{j}=-1$, since row (or column) interchanges only change the sign of the determinant. If $e_{j}$ is an elementary shear (adding a multiple of one row to another), then $\det e_{j}=1$, since such row operations do not affect the value of the determinant. Finally, if $e_{j}$ is an elementary dilation or coordinate reflection (single row scalar multiplication) by $\delta_{j}\neq0$, then $\det e_{j}=\delta_{j}$, since the determinant function is multilinear in the rows of a matrix. Using the identity $\det AB=\det A\det B$ then shows $$|\det L|=\prod_{j=1}^{n}|\det e_{j}|,$$ the product of the absolute values of all the individual component dilations. Thus it is enough to verify the theorem for when $L$ is an elementary operator. To do this, first consider a cube $Q$ of side length $\ell$. If $L$ is an elementary reflection about a coordinate plane (e.g. the plane $x_{i}=x_{k}$), then $L(Q)$ is $Q$ with two of its (uniform length $\ell$) sub-intervals interchanged, so no rotation occurs and $$m(L(Q))=m(Q)=1\cdot m(Q)=|\det L|m(Q).$$ In the case that $L$ is an elementary dilation by $\delta=(1,\ldots,\delta_{j},\ldots,1)$, only one of the sub-intervals of $Q$ is dilated, i.e. $[a_{j},b_{j})\mapsto[\delta_{j}a_{j},\delta_{j}b_{j})$, and if $\delta_{j}<0$ then the sub-interval is also reflected (since then $\delta_{j}a_{j}>\delta_{j}b_{j}$) about the orthogonal coordinate axis. But for cubes, such a reflection amounts to nothing more than a translation by $h=(0,\ldots,\pm|I_{j}|,\ldots,0)$ (the sign depends on $I_{j}$, and $h=0$ if $\delta_{j}>0$), as no rotation can occur. Therefore $$m(L(Q))=m((\delta Q)_{h})=m(\delta Q)=|I_{1}|\times\ldots\times|\delta_{j}||I_{j}|\times\ldots\times|I_{d}|=|\delta_{j}|m(Q)=|\det L|m(Q).$$ The case where $L$ is an elementary shear is the most difficult, since rotations always occur, in addition to a deformation of shape. However, the deformation is predictable for a cube, and always results in a $d$-dimensional parallelogram (for example, a parallelepiped when $d=3$). Since such objects are the union of two almost disjoint $d$-dimensional triangles (along their diagonal), and the measure of a triangle agrees with the usual formula $\frac{1}{2}bh$ (where $b$ is computed from $d-1$ sides) for computing the volume of a $d$-simplex (we verified this in exercise 1.1.8), and each side length of the cube is the same, we have that $$m(L(Q))=m(T_{1})+m(T_{2})=\frac{1}{2}\ell^{d}+\frac{1}{2}\ell^{d}=\ell^{d}=1\cdot m(Q)=|\det L|m(Q)$$ as required. Now choose $\epsilon>0$ and let $E$ be any Jordan measurable set (note that this implies $m^{\star,(J)}(E)=m_{\star,(J)}(E)$).
Then there exist collections of pairwise disjoint cubes $\{U_{i}\}_{i=1}^{M}$ and $\{Q_{j}\}_{j=1}^{N}$ such that both $$\bigcup\limits_{i=1}^{M}U_{i}\subset E\subset\bigcup\limits_{j=1}^{N}Q_{j}$$ and $$m(E)-\epsilon\leq\sum\limits_{i=1}^{M}|U_{i}|\leq m(E)\leq\sum\limits_{j=1}^{N}|Q_{j}|\leq m(E)+\epsilon$$ hold. And, since $L$ preserves inclusions and disjointness (injectivity), we obtain the estimate (where $|\det L|=\Delta$) \begin{align*} \Delta\Big(m(E)-\epsilon\Big) &\leq \Delta\left(\sum\limits_{i=1}^{M}|U_{i}|\right)\\ &=\sum\limits_{i=1}^{M}m(L(U_{i})) \leq m(L(E)) \leq \sum\limits_{j=1}^{N}m(L(Q_{j}))\\ &= \Delta\left(\sum\limits_{j=1}^{N}|Q_{j}|\right)\\ &\leq \Delta\Big(m(E)+\epsilon\Big). \end{align*} In particular, we have the estimate $$\Big|m(L(E))-\Delta m(E)\Big|\leq2\Delta\epsilon,$$ and since $\epsilon$ was arbitrary, we conclude $m(L(E))=|\det L|m(E)$ as required (note that boundedness of $L$ implies $\Delta<\infty$). A numerical illustration is sketched below.
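    (The following Octave sketch illustrates the result numerically: it estimates $m(T(Q))$ for the unit cube $Q=[0,1]^{d}$ by Monte Carlo and compares with $|\det T|m(Q)=|\det T|$; the matrix $T$ and the sample count are arbitrary choices.)

    % Monte Carlo estimate of m(T(Q)) for the unit cube Q = [0,1]^d versus |det T|.
    d = 3; Npts = 2e5;
    T = [2 1 0; 0 1 1; 1 0 3];              % an arbitrary invertible matrix
    V = dec2bin(0:2^d-1) - '0';             % vertices of Q as rows
    W = V*T';                               % their images under T
    lo = min(W); hi = max(W);               % bounding box of T(Q)
    Y = lo + rand(Npts, d).*(hi - lo);      % uniform samples in the bounding box
    X = (T\Y')';                            % preimages of the samples
    inside = all(X >= 0 & X <= 1, 2);       % y is in T(Q) iff T^{-1}y is in Q
    printf("Monte Carlo m(T(Q)) ~ %.4f, |det T| = %.4f\n", ...
           prod(hi - lo)*mean(inside), abs(det(T)));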

    Some Exercises in ODE Numerical Analysis II

    (Theoretical Component)

    Exercise #T1.   Derive the region of absolute stability for the following finite difference schemes:
    1. Backward Euler method: $$y_{n}=y_{n-1}+hf(y_{n}).$$
    2. Trapezoidal method: $$y_{n}=y_{n-1}+\frac{h}{2}\left[f(y_{n})+f(y_{n-1})\right].$$

    Solution.
    1. Applying (a) to the model stability equation we obtain \begin{align*} y_{n} &=y_{n-1}+h\lambda y_{n}\\ &=\frac{y_{n-1}}{1-h\lambda}\\ &=\frac{y_{0}}{(1-h\lambda)^{n}}. \end{align*} In order that $|y_{n}|\to0$ as $n\to\infty$ in the case $\Re\lambda<0$, we require (putting $z=h\lambda$) $$|1-z|>1.$$ If we identify $\mathbb{C}$ with the $xy$ plane, then $S=\{(x,y):(x-1)^{2}+y^{2}>1\}$, the exterior of the disk of radius $1$ centered at $z=1$. In particular, the interval of absolute stability is $(-\infty,0)\cup(2,\infty)$, so that the method is stable for all $h>0$ when $\Re\lambda<0$; indeed, the region of absolute stability contains the entire left half plane, so the method is A-stable. $\blacksquare$





  • Applying (b) to the model stability equation we obtain \begin{align*} y_{n} &=y_{n-1}+\frac{h}{2}\lambda y_{n-1}+\frac{h}{2}\lambda y_{n}\\ &=y_{n-1}\frac{1+\frac{h\lambda}{2}}{1-\frac{h\lambda}{2}}\\ &=y_{0}\left(\frac{2+h\lambda}{2-h\lambda}\right)^{n}. \end{align*} In order that $|y_{n}|\to0$ as $n\to\infty$ in the case $\Re\lambda<0$, we require (putting $z=h\lambda$) $$\left|\frac{2+z}{2-z}\right|<1.$$ Squaring both sides of $|2+z|<|2-z|$ with $z=x+iy\in\mathbb{C}$, the imaginary component is invariant under the change of sign and the inequality reduces to $8x<0$; hence it is clear that the region of absolute stability is exactly the left half plane $S=\{z\in\mathbb{C}:\Re z<0\}$. In particular, the interval of absolute stability is $(-\infty,0)$, so the method is stable for all $h>0$ when $\Re\lambda<0$. Moreover, this stability region is symmetric. $\blacksquare$
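    (A quick Octave check of the two regions just derived, using the amplification factors $R_{\text{BE}}(z)=\frac{1}{1-z}$ and $R_{\text{TR}}(z)=\frac{1+z/2}{1-z/2}$; the sample points are arbitrary.)

    % |R_BE| < 1 exactly when |1-z| > 1; |R_TR| < 1 exactly when Re z < 0.
    z = [-1+0.5i, -0.1, 0.5, 2.5, 3+1i];    % arbitrary sample points in C
    R_be = 1./(1 - z);
    R_tr = (1 + z/2)./(1 - z/2);
    printf("z = %5.2f%+5.2fi: |R_BE| = %6.3f, |R_TR| = %6.3f\n", ...
           [real(z); imag(z); abs(R_be); abs(R_tr)]);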


  • Exercise #T2.  
    1. Let $f(t,y)$ be the vector field of some $n$ dimensional IVP, $F(y)$ the vector field of the equivalent autonomous system (obtained by the introduction of a new variable $y_{n+1}=t$), and compare the eigenvalues of $Df$ and $DF$ at some point $(t^{\star}, y^{\star})$.

    Solution.
    1. If one converts an $n$ dimensional non-autonomous system to an equivalent $n+1$ dimensional autonomous system by the transformation $$\dot{x}=f(x,t)\mapsto\dot{y}=g(y),$$ where $y=(x,t)$ and $g(y)=(f(y),1)$, then it is evident $D_{y}g$ has the same eigenvalues as $D_{x}f$, in addition to the eigenvalue $\lambda_{n+1}=0$. Indeed, the $(n+1)$st row of $D_{y}g$ is $$\nabla g_{n+1}(y)=\nabla\dot{y}_{n+1}=\nabla(1)=(0,\ldots,0),$$ which is equivalent to saying that the gradient of the right hand side of the appended equation $\dot{y}_{n+1}=1$ vanishes identically. The $(n+1)$st column entries are $\frac{\partial g_{i}(y)}{\partial y_{n+1}}=\frac{\partial f_{i}(t,y)}{\partial t}$ (with $\frac{\partial1}{\partial t}=0$ for the $(n+1)$st entry). However, these entries do not matter, since we can expand the determinant along the $(n+1)$st row. Since the eigenvalues of $D_{y}g$ satisfy $$\det(D_{y}g-\lambda I)=0,$$ we see that the characteristic polynomial is $$P_{g}(\lambda)=-\lambda P_{f}(\lambda),$$ which shows the eigenvalues of $D_{y}g$ are those of $D_{x}f$ together with the $0$ eigenvalue. This is to be expected, however; for example, an isolated fixed point $x_{0}$ in the phase space of a linear autonomous system is actually a line $(x_{0},t)$, $-\infty<t<\infty$, in the trajectory space. By converting to an autonomous system, the trajectory space is viewed as a new higher dimensional phase space, one with a continuum of adjacent fixed points (incidentally, non-isolated fixed points only occur when there is a zero eigenvalue; of course, this example is artificial since the system is already autonomous, and viewing it in trajectory space instead of phase space is pointless). A short numerical illustration is sketched below.
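    (A minimal Octave sketch, with an assumed sample system $\dot{x}=Ax+(\sin t,0)^{T}$, confirming that the autonomized Jacobian picks up exactly one extra zero eigenvalue.)

    % Autonomize x' = A*x + [sin(t); 0] by appending t' = 1; compare spectra.
    A  = [0 1; -2 -3];                      % assumed linear part
    t0 = 0.7;                               % arbitrary sample time
    Df = A;                                 % spatial Jacobian D_x f (constant here)
    Dg = [Df, [cos(t0); 0];                 % df/dt enters the last column
          0, 0, 0];                         % gradient row of the appended t' = 1
    disp(sort(eig(Df))');                   % eigenvalues -2, -1
    disp(sort(eig(Dg))');                   % eigenvalues -2, -1, 0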

    Exercise #T3.   Consider the van der Pol equation of the form \begin{align*} \left(\begin{array}{c} \frac{dy_{1}}{dt}\\\\ \frac{dy_{2}}{dt} \end{array}\right) &= \left(\begin{array}{cc} \gamma&-\gamma\\\\ \frac{1}{\gamma}&0 \end{array}\right) \left(\begin{array}{c} y_{1}\\\\ y_{2} \end{array}\right) + \left(\begin{array}{c} -\frac{\gamma}{3}y_{1}^{3}\\\\ 0 \end{array}\right) &\text{where}&& \begin{array}{c} y_{1}(0)=\frac{1}{2},\\ y_{2}(0)=\frac{1}{2},\\ \gamma=10. \end{array} \end{align*}
    1. Compute $DF$.
    2. Estimate the eigenvalues of $DF$.
    3. Determine if this IVP is stiff or not if $t\in[0,20]$.

    Solution.
    1. \[ DF(y)=\left(\begin{array}{cc} \gamma(1-y_{1}^{2})&-\gamma\\ \frac{1}{\gamma}&0\end{array}\right)=\left(\begin{array}{cc} 10(1-y_{1}^{2})&-10\\ 0.1&0\end{array}\right)\]   
    2. $\operatorname{tr}A=10-10y_{1}^{2}$ and $\det A=1$. Therefore (as a function of $y_{1}$), \begin{align*} \lambda_{\pm}(y_{1}) &=\frac{\operatorname{tr}A\pm\sqrt{(\operatorname{tr}A)^{2}-4\det A}}{2}\\ &=\frac{(10-10y_{1}^{2})\pm\sqrt{100-200y_{1}^{2}+100y_{1}^{4}-4}}{2}\\ &=5-5y_{1}^{2}\pm\sqrt{25y_{1}^{4}-50y_{1}^{2}+24}. \end{align*}   
    3. If the solution $y(t;\frac{1}{2},\frac{1}{2})$ remains in the disk $|y|<2$ for all $t\in[0,20]$ (the Poincare-Bendixson theorem, together with the fact that the only critical point of the system is unstable, accounts for this: the solution remains bounded and winds onto a periodic orbit), then the expression in (b) gives a global estimate of the eigenvalues along the trajectory. The discriminant $25y_{1}^{4}-50y_{1}^{2}+24$ is negative only for $0.8<y_{1}^{2}<1.2$, where its minimum value is $-1$; so except in a relatively small neighborhood of $y_{1}^{2}=1$ the eigenvalues are real, and the imaginary components, when nonzero, are negligible. At $y_{1}=0$ (the critical point) the eigenvalues are approximately $0.1$ and $9.9$, so the system behaves like an unstable node there, while at $y_{1}=\pm2$ they are approximately $-0.03$ and $-30$. $\blacksquare$





  • If one intends to use a uniform step size, then this system has a high probability of being stiff (although the extremely large variation in the vector field at certain points in the phase space is very likely to temper the stiffness characteristics of this IVP, perhaps to the point of making it not stiff). To see this, first observe that although there is a small region about $y_{1}^{2}=1$ where the system is not very stiff (since there the real parts of the eigenvalues are not particularly large in relation to the step-size that would be chosen from accuracy considerations alone), we also have $\lambda\gg1$ where $y_{1}^{2}<1$ and $\lambda\ll-1$ where $|y_{1}|$ approaches $2$. In other words, the phase space contains regions where the eigenvalues have large magnitudes of opposite signs, governing two very different characteristic time scales: in one region the local trajectories are fast expanding, in the other fast contracting. An explicit scheme whose stepsize is picked from accuracy considerations alone will generally violate its absolute stability requirement in the strongly contracting regions, causing the approximation to blow up, while an implicit scheme, though stable there for any $h$, will need a small stepsize anyway to reliably capture the qualitative behavior of the fast expanding regions. No one stepsize obviously takes advantage of one behavior without detracting from the other, so the two regimes create opposite shortcomings in both explicit and implicit methods (a symmetric method such as the trapezoidal method at least remains stable in both). Therefore, where these striking opposite behaviors both matter, the system will be stiff with respect to both its stably contracting and its unstably expanding components; the short computation below illustrates the two time scales.
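    (Sketch in Octave, using only the trace and determinant computed in (b); the grid over $y_{1}$ is arbitrary.)

    % Eigenvalues of DF along y1 in [-2, 2]; det DF = 1, tr DF = 10 - 10*y1^2.
    y1 = linspace(-2, 2, 9);
    tr = 10 - 10*y1.^2;
    lam_p = (tr + sqrt(tr.^2 - 4))/2;       % complex for 0.8 < y1^2 < 1.2
    lam_m = (tr - sqrt(tr.^2 - 4))/2;
    printf("y1 = %5.2f: lam+ = %8.3f%+6.3fi, lam- = %8.3f%+6.3fi\n", ...
           [y1; real(lam_p); imag(lam_p); real(lam_m); imag(lam_m)]);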
    (Important Remarks!). This system brings up a very important point that I feel the text and many expositions on this matter neglect to consider. Absolute stability is concerned with more than just the case of local trajectories contracting or otherwise remaining equidistant (e.g. nonpositive eigenvalues). In this case it has been aptly demonstrated that if the stepsize is picked outside the region of absolute stability, the approximation is very prone to blowup (e.g. the global error follows the unacceptably large upper exponential error bound, rather than a sum of local truncation errors; in other words, the step size is not sufficient to damp the uncontrolled propagation of local roundoff/approximation error -- one can also view this as a sudden discontinuity in the graph of the error as a function of $h$ over some fixed time interval $[0,T]$, where the discontinuity occurs at the boundary of the region of absolute stability).

    On the other hand, when local trajectories are expanding (positive eigenvalues), choosing $h$ to be in the region of absolute stability again results in the approximation having significant error; the difference is that in the former case the error tends to blow up in an exponential fashion as the approximation itself grows (expands) away from the true solution, whereas in this latter case the error tends not to blow up in such a dramatic way, since the approximation itself simply tends to die out into a constant (the model problem being a decay to $0$), unless the true solution itself is growing quickly in magnitude away from the approximation. The point is, though, that in each case the error in the approximation becomes unacceptably large as it routinely fails to properly match the qualitative features of the local phase space at each step; how this extra error (in addition to normal roundoff/truncation error) is then propagated depends (as explained above) on whether the reason for the failure of qualitative matching is that $h\in S$ while local trajectories are in fact expanding (the approximation contracts), or that $h\notin S$ while local trajectories are in fact contracting (the approximation expands).

    As already mentioned, this system exhibits both cases of stiffness. As such, $h$ must be chosen to lie in the intersection of the separate restrictions $h\lambda_{1}\in S$ (assume $\lambda_{1}<0$) and $h\lambda_{2}\notin S$ (assume $\lambda_{2}>0$). Implicit methods tend to handle the former restriction for any $h$, while explicit methods tend to handle the latter restriction for any $h$. But neither (usually) handles BOTH simultaneously, unless the method is symmetric! This, then, might be what is called a "dual stiff" problem: a problem which is stiff with respect to both stable and unstable solutions! For such a problem, one must choose a method and stepsize very carefully, for neither a generic implicit nor explicit method is likely to handle both types of stiffness simultaneously.

    The computational experiments in the subsequent exercise (#C1(a)) demonstrate these remarks.   

  • (Computational Component)

    NOTICE! As there are far too many plots and data output files to reasonably place on this page/PDF file, I have placed them into a zip file which can be accessed via the embedded link below. The files are organized in folders according to each exercise and have the format #name#.png/txt. The first # corresponds to the particular method used (#1: forward Euler, #3: RKF4, #4: trapezium, #5: backward Euler) and the second # corresponds to the index of the sequence of decreasing stepsizes (#1 being the largest and #4 the smallest stepsize). Specific details are in each plot/data file, which should very clearly indicate what is what.
    Exercise.   Consider the particular van der Pol system \[ \begin{array}{ll} \frac{dx_{1}}{dt}=\gamma\left(x_{1}-\frac{1}{3}x_{1}^{3}-x_{2}\right)&x_{1}(0)=\frac{1}{2}\\\\ \frac{dx_{2}}{dt}=\frac{1}{\gamma}x_{1}&x_{2}(0)=\frac{1}{2}. \end{array} \] Using a sequence of computational experiments, determine whether the system is stiff for the parameters
    1. $\gamma=10$ and $t\in[0,20].$
    2. $\gamma=100$ and $t\in[0,20].$

    Solution.
    1. When $\gamma=10$ the problem turns out to be not stiff, but the analysis in \#T3 regarding the stiffness of this system still applies. The reason it is not stiff from a practical standpoint is that the accuracy considerations themselves demand a small stepsize (this is clear from the "relaxation-discharge" oscillations, whereby in certain regions of the phase space trajectories exhibit dramatic variation over very short time intervals). There are definitely very obvious absolute stability requirements, however, as indicated by the blowup and flat-lining of the approximation when the stepsize is too large for the forward Euler/RKF methods and the backward Euler method, respectively. The trapezoidal rule never experiences absolute stability problems, due to its symmetry, but it shows clearly the severity of the accuracy restriction (note that even when the trapezoidal method produces an unacceptably bad approximation, it is still at least qualitatively correct; this is in contrast to the backward Euler flat-lining and the explicit methods blowing up when absolute stability requirements are violated).   
    2. When $\gamma=100$ the problem turns out to be stiff. The sequence of approximations is similar to (a); however, once the qualitative behavior is resolved, the approximations are immediately accurate (contrast this with the gradual convergence in (a), where inaccuracies are only gradually damped out by decreasing the stepsize well past the absolute stability restriction). Indeed, the only effect of increasing $\gamma$ is that the relaxation-oscillation period is dramatically longer (a complete cycle for $\gamma=100$ was obtained only after taking $T>150$!). In particular, the solution remains bounded in the same $[-2,2]\times[-0.5,0.5]$ rectangle, and very probably has the same invariant attracting limit cycle as the previous IVP with $\gamma=10$. Because the system moves much more slowly in time, trajectories stay in the regions where the eigenvalues have high magnitudes of opposite sign for much longer, and this tends to amplify any stiffness characteristics of this particular system. Indeed, as soon as the qualitative behavior of the system is resolved (in particular, for the explicit methods), accuracy improvements are modest with continued decrease of the step-size, which indicates that unless extreme accuracy is required, the absolute stability restriction supersedes the accuracy requirements. As indicated in the remarks of the theoretical problem, these computations further demonstrate the dual-stiffness of this system. It is stiff with respect to both stable and unstable solutions, and therefore neither a generic implicit nor explicit method is likely to solve the problem independent of absolute stability requirements (at least for $\gamma$ sufficiently large). Thus we see a concrete case where an implicit method is not always the solution to approximating a stiff system!   

    Exercise.   Consider the particular van der Pol system \[ \begin{array}{ll} \frac{dx_{1}}{dt}=\frac{1}{100}-\left(\frac{1}{100}+x_{1}+x_{2}\right)\left(x_{1}^{2}+10001x_{1}+1001\right)&x_{1}(0)=0\\\\ \frac{dx_{2}}{dt}=\frac{1}{100}-\left(\frac{1}{100}+x_{1}+x_{2}\right)\left(1+x_{2}^{2}\right)&x_{2}(0)=\frac{1}{2}. \end{array} \]
    1. Using a sequence of computational experiments, determine whether the system is stiff.

    Solution.
    1. The problem is indeed very stiff. But unlike the stiffness exhibited in the previous problem, it is only unstably stiff. This is evidenced by the fact that even for large $h$ the backward Euler method captures the qualitative behavior of the solution, while for even very small $h$ the explicit Euler/RKF methods blow up. Once the explicit methods resolve the qualitative behavior, however, the accuracy is immediate, and only incremental gains are achieved by further reduction of the stepsize; this of course means the IVP is stiff (in the unstable sense). Since it is not stably stiff, an implicit method would allow one to choose a moderate stepsize based on accuracy requirements alone.   


    ODESolve#6.m (Executes sequence of computational experiments to determine stiffness of above IVPs).
    %%%%%%%%%%%%%%%%%%%%%%%
    % /////////////////// %
    % BEGIN CONFIGURATION %
    % /////////////////// %
    %%%%%%%%%%%%%%%%%%%%%%%
    
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    %%%%% USER INPUT PARAMETERS %%%%%
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    
    x0   = 1; % initial condition
    I    = [0, 10]; % time-interval [t_0, t_f]
    h    = .025;   % static step size
    N    = 0;      % mesh size
    FLAG = false;  % F -> h assigned; T -> N assigned
    CONV = false;  % estimate convergence rate T/F
    M    = 4;      % OSD method choice (1-FE; 2-MP; 3-RKF4; 4-Trapezoidal; 5-BE)
    
    %%%%%%%%%%%%%%%%%%%%%%%%%
    %%%%% VECTOR FIELD %%%%%%
    %%%%%%%%%%%%%%%%%%%%%%%%%
    
    function xdot = F(x, t)
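        % Sample scalar stiff test field x' = -50*(x - cos(t)); reconfigure this
        % F (and x0, I above) for the particular IVP under study.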
        xdot = [-50*(x(1)-cos(t))];
    endfunction
    
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    %%%%% EXPRESSION STRINGS %%%%%%
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    
    OSDM = {"M_1"; "M_2"; "M_3"; "M_4"; "M_5"};
    IVP = strcat("{x'= -50(x_{1}-cos(t)),",
            sprintf("\n"),
            "x_{0}=1}");
       
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % //////////////////////////// %
    % BEGIN MAIN PROGRAM EXECUTION %
    % //////////////////////////// %
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    
    %%% Repeat numerical experiments for M = 1, 3, 4, 5. %%%
    
    source("OSDM.m");
    fid = fopen(sprintf("Output%i.txt", M), "a");
    [msg, w, t, print_str, comp_plot, param_plot] = feval(OSDM{M, 1}, @F, I, x0, h, FLAG, IVP);
    fprintf(fid, "%s", print_str);
    print(comp_plot, "-dpng", sprintf("ComponentPlot%i.png", M));
    set(comp_plot, "visible", "on");
    fclose(fid);
    %%%%%%%%%%%%%%%
    %%%%% END %%%%%
    %%%%%%%%%%%%%%%
    
    OSDM.m (implementation of various ODE approximation methods).
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % /////////////////////////// %
    % ONE STEP DIFFERENCE METHODS %
    % /////////////////////////// %
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    %%%%% OSD METHOD TYPE-CHECK/ASSIGNMENT FUNCTION %%%%%
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    
    function [msg, h, N, t, dim, w] = set_osd_mthd_values(f, I, x0, z, FLAG)
        % Initialize all outputs so early returns on failed checks are well-defined.
        msg = 0; h = 0; N = 0; t = []; dim = 0; w = [];
        if (!is_function_handle(f) || !isrow(I) || !iscolumn(x0) || !isbool(FLAG))
            printf("Type Check Error: Invalid argument types (vector field, time interval, initial condition, or boolean flag)!\n");
            return;
        else % check round 1 passed
            % Probe the vector field once; wrap in try/catch since a bad field
            % errors on evaluation (an exist() check after the call can never fail).
            try
                test_F = f(x0, I(2));
            catch
                printf("Type Check Error: Invalid vector field parameters!\n");
                return;
            end_try_catch
            if (!iscolumn(test_F) || !(length(test_F) == length(x0)))
                printf("Type Check Error: Vector field is not the same dimension as initial data!\n");
                return;
            elseif (!(I(2)>I(1)))
                printf("Type Check Error: Time interval is not ordered!");
                msg = 0;
                return;
            else % check round 2 passed
                if (FLAG == false)
                    h = z;
                    if (!isfloat(h))
                        printf("Type Check Error: Invalid argument type (step size)!\n");
                        msg = 0;
                        return;
                    elseif (!(h>0))
                        printf("Type Check Error: Step size is not positive!\n");
                        msg = 0;
                        return;
                    endif
                elseif (FLAG == true)
                    N = z;
                    if (!(floor(N) == ceil(N)))
                        printf("Type Check Error: Invalid argument type (mesh size)!\n");
                        msg = 0;
                        return;
                    elseif (!(N>0))
                        printf("Type Check Error: Mesh size is not positive!\n");
                        msg = 0;
                        return;
                    endif % check round 3 passed
                endif
            endif
        endif % type check successful
        msg = 1;
        h = 0;
        N = 0;
        if (FLAG == false)
            h = z;
            N = ceil(abs(I(2)-I(1))/h);
        elseif (FLAG == true)
            N = z;
            h = abs(I(2)-I(1))/N;
        endif
        t = linspace(I(1), I(2), N+1);
        dim = length(x0);
        w = zeros(dim, N+1);
        w(:, 1) = x0;
    endfunction
    
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    %%%%% OSD METHOD DATA FORMATTER %%%%%
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    
    function [print_str, comp_plot, param_plot] = format_osd_data(t, w, h, N, dim, mthd, IVP)
        print_str = strcat(
        sprintf("%s (h = %f):\r\n\r\n", mthd, h(1)),
        sprintf("   x(%d) = [ %.8f,]\r\n", t(N+1), w(1, N+1)));
        if (dim > 2) % print the middle solution components, if any
            print_str = strcat(print_str, sprintf("            [ %.8f,]", w(2:dim-1, N+1)));
        endif
        print_str = strcat(print_str, sprintf("            [ %.8f  ]\r\n\r\n", w(dim, N+1)));
        comp_plot = figure("visible", "off");
        title(sprintf("COMPONENT SOLUTION PLOT\nApproximated using the %s\nStep Size: h=%.4f.\nTime Steps: N=%i", mthd, h, N));
        xlabel("t\n");
        ylabel("x_k");
        grid on;
        hold all;
        for (k=1 : dim)
            plot(t, w(k, :), sprintf(";x_%i(t);", k));
        endfor
        legend("location", "NORTHEAST");
        xlim = get(gca(), 'xlim');
        ylim = get(gca(), 'ylim');
        axis([xlim(1), xlim(2), ylim(1), ylim(2)+0.10*ylim(2)]);
        text(get(gca(),'XLim')(2)*.05, get(gca(),'YLim')(2)*.90, IVP);
        if (dim == 2)
            param_plot = figure("visible", "off");
            title(sprintf("PARAMETER SOLUTION PLOT\nApproximated using the %s\nStep Size: h=%.4f\nTime Steps: N=%i", mthd, h, N));
            xlabel("{x_{1}}");
            ylabel("{x_{2}}");
            grid on;
            hold all;
            plot(w(1, :), w(2, :), sprintf(";x(t) for t in [%d,%d];", t(1), t(N+1)));
            legend("location", "NORTHEAST");
            xlim = get(gca(), 'xlim');
            ylim = get(gca(), 'ylim');
            axis([xlim(1), xlim(2), ylim(1), ylim(2)+0.10*ylim(2)]);
            text(get(gca(),'XLim')(2)*.05, get(gca(),'YLim')(2)*.90, IVP);
        else
            param_plot = -1;
        endif
        hold off;
    endfunction
    
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    %%%%% FORWARD EULER METHOD %%%%%
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    
    function [msg, w, t, print_str, comp_plot, param_plot] = M_1(f, I, x0, z, FLAG, IVP)
        [msg, h, N, t, dim, w] = set_osd_mthd_values(f, I, x0, z, FLAG);
        if (!msg)
            w = 0; print_str = ""; comp_plot = -1; param_plot = -1;
            return;
        else
            for (j=1 : N)
                w(:, j+1) = w(:, j) + h*f(w(:, j), t(j));
            endfor
            [print_str, comp_plot, param_plot] = format_osd_data(t, w, h, N, dim, "Forward Euler Method", IVP);
        endif
    endfunction
    
    %%%%%%%%%%%%%%%%%%%%%%%%%%%
    %%%%% MIDPOINT METHOD %%%%%
    %%%%%%%%%%%%%%%%%%%%%%%%%%%
    function [msg, w, t, print_str, comp_plot, param_plot] = M_2(f, I, x0, z, FLAG, IVP)
        [msg, h, N, t, dim, w] = set_osd_mthd_values(f, I, x0, z, FLAG);
        if (!msg)
            w = 0; print_str = ""; comp_plot = -1; param_plot = -1;
            return;
        else
            for (j=1 : N)
                w(:, j+1) = w(:, j) + h*f(w(:, j) + (h/2)*f(w(:, j), t(j)), t(j) + h/2);
            endfor
            [print_str, comp_plot, param_plot] = format_osd_data(t, w, h, N, dim, "Midpoint Method", IVP);
        endif
    endfunction
    
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    %%%%% RUNGE-KUTTA METHOD (STAGE 4) %%%%%
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    
    function [msg, w, t, print_str, comp_plot, param_plot] = M_3(f, I, x0, z, FLAG, IVP)
        [msg, h, N, t, dim, w] = set_osd_mthd_values(f, I, x0, z, FLAG);
        if (!msg)
            w = 0; print_str = ""; comp_plot = -1; param_plot = -1;
            return;
        else
            for (j=1 : N)
                k_1 = h*f(w(:, j), t(j));
                k_2 = h*f(w(:, j) + k_1/2, t(j) + h/2);
                k_3 = h*f(w(:, j) + k_2/2, t(j) + h/2);
                k_4 = h*f(w(:, j) + k_3, t(j) + h);
                w(:, j+1) = w(:, j) + k_1/6 + (k_2 + k_3)/3 + k_4/6;
            endfor
          [print_str, comp_plot, param_plot] = format_osd_data(t, w, h, N, dim, "Runge Kutta 4 Method", IVP);
        endif
    endfunction
    
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    %%%%% TRAPEZOIDAL METHOD %%%%%
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    
    function z = G(w, w0, t0, t, h, f)
        % Residual of the trapezoidal implicit equation; zero when w = w_n.
        z = w0 + (h/2)*(f(w, t) + f(w0, t0)) - w;
    endfunction
    
    function [msg, w, t, print_str, comp_plot, param_plot] = M_4(f, I, x0, z, FLAG, IVP)
        [msg, h, N, t, dim, w] = set_osd_mthd_values(f, I, x0, z, FLAG);
        if (!msg)
            w = 0; print_str = ""; comp_plot = -1; param_plot = -1;
            return;
        else
            for (j=1 : N)
                w(:, j+1) = fsolve(@(x) G(x, w(:, j), t(j), t(j+1), h, f), w(:, j)); % via Octave's built-in fsolve; will implement from scratch later
            endfor
            [print_str, comp_plot, param_plot] = format_osd_data(t, w, h, N, dim, "Trapezoidal Method", IVP);
        endif
    endfunction
    
    %%%%%%%%%%%%%%%%%%%%%%%%%%
    %%%%% BACKWARD EULER %%%%%
    %%%%%%%%%%%%%%%%%%%%%%%%%%
    
    function z = G_b(w, w0, t0, t, h, f)
        % Residual of the backward Euler implicit equation; zero when w = w_n.
        z = w0 + h*f(w, t) - w;
    endfunction
    
    function [msg, w, t, print_str, comp_plot, param_plot] = M_5(f, I, x0, z, FLAG, IVP)
        [msg, h, N, t, dim, w] = set_osd_mthd_values(f, I, x0, z, FLAG);
        if (!msg)
            w = 0; print_str = ""; comp_plot = -1; param_plot = -1;
            return;
        else
            for (j=1 : N)
                w(:, j+1) = fsolve(@(x) G_b(x, w(:, j), t(j), t(j+1), h, f), w(:, j)); % via Octave's built-in fsolve; will implement from scratch later
            endfor
            [print_str, comp_plot, param_plot] = format_osd_data(t, w, h, N, dim, "Backward Euler", IVP);
        endif
    endfunction
    




    (Theoretical Component)

    Exercise #T1.  
    1. Explain why no explicit RKF method can be A-stable or A($\alpha$)-stable.

    Solution.
    1. The answer is quite simple. Applying any $s$-stage explicit RKF method to the stability test equation yields an inequality of the form $$\left|\underbrace{1+z+\frac{z^{2}}{2}+\ldots}_{p(z)}\right|< 1,$$ where $p(z)$ is a polynomial with $\deg p(z)\leq s$. But $|p(z)|\to\infty$ as $|z|\to\infty$ for any non-constant polynomial of a complex variable $z$, and so the inequality can only be satisfied on a bounded set. This means no explicit RKF method can be A or A($\alpha$) stable, since such methods must have unbounded regions of absolute stability. A quick numerical check follows.   
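    (A trivial Octave check for the RK4 stability polynomial $p(z)=1+z+\frac{z^{2}}{2}+\frac{z^{3}}{6}+\frac{z^{4}}{24}$: even along the negative real axis $|p(z)|$ eventually exceeds $1$, so the stability region is bounded even in the left half plane.)

    % |p(z)| along the negative real axis for the RK4 stability polynomial.
    p = @(z) 1 + z + z.^2/2 + z.^3/6 + z.^4/24;
    r = [1, 2.5, 5, 10, 50];
    printf("|p(-%g)| = %g\n", [r; abs(p(-r))]);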

    Exercise #T2.   Consider the task of solving the fixed point problem $z=q(z)$ using the method of successive approximations (e.g. computing the limit of the recursively defined sequence $z_{n+1}=q(z_{n})$, $z_{0}$ given), where $q:\mathbb{R}\to\mathbb{R}.$
    1. State reasonable conditions on $q$ which guarantee $q$ has a fixed point, that simple fixed point iteration converges to the fixed point, and that the fixed point is unique.
    2. Assume one is using the backward Euler method to solve a given IVP in standard form. Determine the fixed point problem which must be solved at each time step to advance from $y_{n-1}$ to $y_{n}$. Determine conditions on the timestep $h$ which guarantee the method of successive approximations converges to a unique $y_{n}$ at each timestep.
    3. Repeat (a) and (b) using the trapezoidal method, and compare the stepsize restrictions.
    4. What conclusions can be drawn about the utility of implicit finite difference methods for solving general stiff systems when one uses the method of successive approximations to solve the implicit equations which arise at each timestep?

    Solution.
    1. First we need to assume that we are working on a compact set, say $K\subset\mathbb{R}$, and that $q:K\to K$ (i.e. $q$ maps $K$ into a subset of itself). Mere continuity of $q$ is then enough to ensure $q$ has a fixed point in $K$ by the Brouwer fixed point theorem (since we are in $\mathbb{R}$, a simple proof using the intermediate value theorem is obtained by looking at the graphs of $q$ and $I_{K}$, the identity on $K$). To guarantee that simple fixed point iteration converges to a unique fixed point of $q$, we require $q$ to be a contraction mapping: $|q(x)-q(y)|\leq\lambda|x-y|$ for some constant $0\leq\lambda<1$ and all $x,y\in K$. This follows from the Banach fixed point theorem for complete metric spaces. When $q$ is differentiable, the mean value theorem shows that the bound $\sup_{z\in K}|q'(z)|<1$ is a convenient sufficient condition for $q$ to be a contraction on $K$.   
    2. For the backward Euler method, the fixed point problem at the $n$th step is $$y=q(y):=y_{n-1}+hf(y).$$ If $L$ is the Lipschitz constant of $f$ in the compact region of interest, then $|q(x)-q(y)|=h|f(x)-f(y)|\leq hL|x-y|$, so $h<L^{-1}$ will insure that $q$ is a contraction map (equivalently, when $f$ is smooth, that $|q'(y)|\leq hL<1$).   
    3. For the trapezoidal method, we have the fixed point problem $$q(y)=y_{n-1}+\frac{h}{2}f(y_{n-1})+\frac{h}{2}f(y).$$ Here $|q(x)-q(y)|\leq\frac{hL}{2}|x-y|$, so we may take $h<2L^{-1}$, a slight relaxation of the stepsize restriction on $h$ compared to using backward Euler. $\blacksquare$





  • The point to be taken from (b) and (c) is that for implicit methods, the particular fixed point problems which arise at each step must be considered in determining $h$; this is in addition to the usual stability and accuracy restrictions. If one uses simple fixed point iteration to solve the fixed point problems which arise, then one must restrict $h$ to be sufficiently small, in general proportional to $L^{-1}$ (even if the IVP is not stiff). But this is precisely the same restriction required of explicit methods when the IVP is stiff! Therefore, no advantage is gained in using implicit methods to solve stiff problems if simple fixed point iteration is employed.   
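    (The restriction is easy to see in practice. A minimal Octave sketch, using the assumed scalar test problem $y'=-50y$, so that $L=50$: one backward Euler step is solved by successive approximations with $hL<1$ and with $hL>1$.)

    % Successive approximations for one backward Euler step of y' = lambda*y.
    % The map q(y) = y0 + h*lambda*y is a contraction iff h*|lambda| < 1.
    lambda = -50; y0 = 1;
    for h = [0.01, 0.05]                    % h*L = 0.5 versus h*L = 2.5
        y = y0;
        for k = 1:30
            y = y0 + h*lambda*y;            % fixed point iteration
        endfor
        printf("h = %.2f: 30th iterate = %g (exact step value %g)\n", ...
               h, y, y0/(1 - h*lambda));
    endfor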

  • Exercise #T3.   Consider using the backward Euler method to generate an approximate solution to the linear constant coefficient system $$\frac{dy}{dt}=Ay,$$ $$y(0)=y_{0}.$$
    1. Determine the fixed point problem which must be solved at each timestep to advance the solution from $y_{n-1}$ to $y_{n}$.
    2. Give conditions (if any) on the timestep and initial guess $y^{0}_{n}$ required to guarantee the method of successive approximations converges to a unique $y_{n}$ at each timestep for every initial condition $y_{0}$.
    3. What conclusions can be drawn about the utility of implicit finite difference methods for solving stiff systems of linear constant coefficient equations when one uses the method of successive approximations to solve the implicit equations which arise at each timestep?

    Solution.
    1. The fixed point problem which must be solved at the $n$th timestep using the backward Euler method is of course $$y=y_{n-1}+hAy=:Q_{n}(y),$$ or in component form the system $$y_{i}=(y_{n-1})_{i}+h\sum_{j=1}^{m}a_{ij}y_{j},\qquad i=1,2,\ldots,m.$$   
    2. If we use the Banach fixed point criterion for convergence of the simple fixed point iteration, then there is no requirement imposed on the initial guess $y^{0}_{n}$ as long as we are working on a compact set $K$ that $Q$ maps into itself. Now, for any $x,y\in K$, we have \begin{align*} ||Q(x)-Q(y)|| &=||hAx-hAy||\\ &=||hA(x-y)||\\ &\leq h||A||\cdot||x-y||. \end{align*} Therefore, to insure $Q$ is a contraction on $K$ so that the simple fixed point iteration converges, we must have $h||A||< 1$ or $h<(||A||)^{-1}.$   
    3. From (b), we see that the restriction on $h$ automatically implies (since $\rho(A)\leq||A||$) $$h<\frac{1}{\rho(A)}.$$ Just as in the previous exercise, the takeaway point then is that simple fixed point iteration imposes the same restrictions on the stepsize $h$ for implicit methods as does stiffness on explicit methods. Therefore, if implicit methods are to be advantageous in solving stiff IVPs, a better method for solving implicit equations is necessary.   

    Exercise #T4.   For the following systems assume the trapezoidal method is used with an LU factorization technique (e.g. Gaussian elimination) to solve the implicit equations and determine the restrictions (if any) on the timesteps required in order to obtain a numerical solution.
    1. \[ \frac{dy}{dt}=\left(\begin{array}{cc}-2&1\\1&-2\end{array}\right)y \]
    2. \[ \frac{dy}{dt}=\left(\begin{array}{cc}2&1\\1&-2\end{array}\right)y \]

    Solution.
    1. The implicit equations which arise are \begin{align*} y &=Q_{n}(y)\\ &=y_{n-1}+\frac{h}{2}Ay_{n-1}+\frac{h}{2}Ay\\ &=\left(I+\frac{h}{2}A\right)y_{n-1}+\frac{h}{2}Ay. \end{align*} Equivalently, $$\left(\frac{h}{2}A-I\right)y=-\left(I+\frac{h}{2}A\right)y_{n-1},$$ and in component form \[ \left( \begin{array}{cc} -h-1&\frac{h}{2}\\ \frac{h}{2}&-h-1\\ \end{array} \right) \left( \begin{array}{c} y^{1}\\ y^{2} \end{array} \right) = \left( \begin{array}{c} (h-1)y^{1}_{n-1}-\frac{h}{2}y^{2}_{n-1}\\ (h-1)y^{2}_{n-1}-\frac{h}{2}y^{1}_{n-1} \end{array} \right) \] Since clearly $|-h-1|>|\frac{h}{2}|$ for all $h>0$, the coefficient matrix for the linear system is strictly diagonally dominant. It follows that the pivot elements at each stage of the Gaussian elimination algorithm are nonzero, and so the method is stable (only roundoff error can cause instability of an LU method, since it is direct and not iterative; diagonal dominance insures that this roundoff error remains controlled). Therefore, no restriction is needed on $h$ to insure the method obtains the required fixed point to advance the solution.   
    2. Similarly as in (a), we have in component form the implicit equations \[ \left( \begin{array}{cc} h-1&\frac{h}{2}\\ \frac{h}{2}&-h-1\\ \end{array} \right) \left( \begin{array}{c} y^{1}\\ y^{2} \end{array} \right) = \left( \begin{array}{c} -(h+1)y^{1}_{n-1}-\frac{h}{2}y^{2}_{n-1}\\ (h-1)y^{2}_{n-1}-\frac{h}{2}y^{1}_{n-1} \end{array} \right) \] However, $|h-1|$ is not always greater than $\frac{h}{2}$, so the coefficient matrix is not diagonally dominant. Moreover, if $h\approx1$, then $|A_{11}|\approx0$ while all other entries are (relatively) much larger. In this situation, in the $LU$ decomposition of $A$, $L$ has an entry $L_{21}=A_{21}/A_{11}$ which is much larger than any of the other entries in $L$ or $U$. In general, the roundoff error $\delta$ can be estimated by $$\delta=\frac{||L||\,||U||}{||A||}O(\epsilon_{\text{machine}}).$$ The $L_{21}$ entry dominates the other matrix norms, and so $$\delta\approx L_{21}\epsilon_{\text{machine}}.$$ If $L_{21}$ is on the order of $\epsilon_{\text{machine}}^{-1}$, then numerical instability results. All of this is to say that if $h\approx1$, then the method is unstable in the sense that round-off error will contaminate the estimate of the fixed point needed to advance the solution. Therefore, we require $h\ll1$, or at least $h$ bounded safely away from $1$.

      Note that even if no timestep restriction is necessary (which is usually the case for direct LU methods), the computational cost for using a direct method over an iterative method to solve the implicit equations at each step may very quickly overwhelm any advantage gained by a relaxed stepsize restriction (unless one knows in advance that the coefficient matrices have some particular form which can be exploited, such as bandedness). Furthermore, it should be noted that any restriction on $h$ does not arise from convergence requirements; indeed, direct methods always "converge." Finally, LU methods are obviously only applicable when the ODE system is linear, for otherwise the implicit equations are non-linear and one is forced to use an iterative procedure such as Newton's method.   
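
      To make the direct-solve route concrete, here is a minimal Octave sketch (the stepsize and starting vector are arbitrary test values) which advances one trapezoidal step for the system in part 1 by a single linear solve; the backslash operator performs the LU factorization and the forward/back substitutions:

      A  = [-2, 1; 1, -2];                          % system from part 1
      h  = 0.5;                                     % arbitrary test stepsize
      y  = [1; 0];                                  % y_{n-1}
      I2 = eye(2);
      y  = (I2 - (h/2)*A) \ ((I2 + (h/2)*A)*y);     % one trapezoidal step: y_n
      disp(y');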

    Exercise #T5.   Assume the backward Euler method with Newton iteration is to be used to solve the IVP \[ \begin{array}{cclr} \frac{du}{dt}&=&0.01-(0.01+u+v)[1+(u+1000)(u+1)]&u(0)=0,\\\\ \frac{dv}{dt}&=&0.01-(0.01+u+v)\left(1+v^{2}\right)&v(0)=0.5. \end{array} \]
    1. Write out in component form the fixed point problem which must be solved to advance each timestep.
    2. Write out the resulting iteration procedure from applying Newton's method to the fixed point problems in (a).

    Solution.
    1. Put $x=(u,v)$ and $f(u,v)=(\dot{u},\dot{v})$. Then backward Euler gives $$x_{n}=x_{n-1}+hf(x_{n}),$$ yielding the $n$th step fixed point problem $$x=x_{n-1}+hf(x).$$ Define $Q_{n}(x):=x-x_{n-1}-hf(x)$. Then the $n$th fixed point problem becomes the root problem $$Q_{n}(x)=0,$$ and in component form: \[\left( \begin{array}{c} Q^{1}_{n}\\ Q^{2}_{n}\end{array}\right) =\left(\begin{array}{c} u-u_{n-1}-0.01h+h(0.01+u+v)(1001+1001u+u^{2})\\ v-v_{n-1}-0.01h+h(0.01+u+v)(1+v^{2})\end{array}\right) =\left(\begin{array}{c} 0\\ 0\end{array}\right) \] where $(u,v)=(u_{n},v_{n})=x_{n}$, the advanced timestep.
    2. Applying Newton iteration to the root problem in (a) gives \begin{align*} x_{i+1} &=x_{i}-\frac{Q(x_{i})}{D_{x}Q(x_{i})}\\ &=x_{i}-[D_{x}Q(x_{i})]^{-1}Q(x_{i}) \end{align*} where \begin{align*} D_{x}Q(x) &=\left(\begin{array}{c} \nabla Q_{1}(u,v)\\ \nabla Q_{2}(u,v)\end{array}\right)\\ &=\left(\begin{array}{cc} 1+h(1001+1001u+u^{2})+h(2u+1001)(0.01+u+v)&h(1001+1001u+u^{2})\\ h(1+v^{2})&1+h(1+v^{2})+h(2v)(0.01+u+v)\end{array}\right). \end{align*} Put $\alpha=[D_{x}Q(x_{i})]^{-1}Q(x_{i})$ so that $\alpha$ can be solved for from the equation $$[D_{x}Q(x_{i})]\alpha=Q(x_{i}).$$ This is to avoid having to compute $[D_{x}Q(x_{i})]^{-1}$ directly, whether symbolically for all steps or approximately at each step. Then, in component form our iteration becomes: \[ \left(\begin{array}{c} u_{i+1}\\ v_{i+1}\end{array}\right) =\left(\begin{array}{c} u_{i}-\alpha^{1}\\ v_{i}-\alpha^{2}\end{array}\right) \] where $\alpha$ satisfies \[ \left(\begin{array}{cc} 1+h(1001+1001u_{i}+u_{i}^{2})+h(2u_{i}+1001)(0.01+u_{i}+v_{i})&h(1001+1001u_{i}+u_{i}^{2})\\ h(1+v_{i}^{2})&1+h(1+v_{i}^{2})+h(2v_{i})(0.01+u_{i}+v_{i})\end{array}\right) \left(\begin{array}{c} \alpha^{1}\\ \alpha^{2}\end{array}\right) \] \[ =\left(\begin{array}{c} u_{i}-u_{n-1}-0.01h+h(0.01+u_{i}+v_{i})(1001+1001u_{i}+u_{i}^{2})\\ v_{i}-v_{n-1}-0.01h+h(0.01+u_{i}+v_{i})(1+v_{i}^{2})\end{array}\right) \] Again, $(u_{i},v_{i})$ are the $i$th iterates in the Newton iteration, which are expected to converge to $(u_{n},v_{n})$, and $(u_{n-1},v_{n-1})$ are just the approximations from the previously computed timestep.   

    (Computational Component)

    Exercise #C1.  Consider the IVP in #T5, which was determined to be stiff (though only with respect to explicit methods) in the computational component of Assignment #6.

    Implement a routine to solve this problem using the backward Euler method combined with Newton iteration for $t\in[0,50]$ and initial conditions $(u,v)=(0,0.5)$.

    1. Determine the timestep required to obtain a numerical solution that is reliably accurate to $10^{-2}$ in each component at $t=50.$
    2. Solve (present the solution to) the IVP using the timestep determined in (a).
    3. Explain the stopping criterion implemented in the backward Euler routine.
    4. Report the maximal number of iterations which were required to advance the solution by one step.

    Solution.
    1. The conclusion after multiple simulations with step sizes varying from $h=4$ to $h=0.005$ is that both components are $O(10^{-2})$ accurate (i.e., accurate up to the second digit) for $h\leq1$.
      Data Output:
      Backward Euler (h = 4.000000):
      
         x(50) = [ -0.99447967,]
                  [ 0.98678828  ]
      
      (MAX ITERATES:14)
      
      Backward Euler (h = 3.000000):
      
         x(50) = [ -0.99264351,]
                  [ 0.98454347  ]
      
      (MAX ITERATES:13)
      
      Backward Euler (h = 2.000000):
      
         x(50) = [ -0.98939009,]
                  [ 0.98083846  ]
      
      (MAX ITERATES:12)
      
      Backward Euler (h = 1.000000):
      
         x(50) = [ -0.99055978,]
                  [ 0.98213119  ]
      
      (MAX ITERATES:9)
      
      Backward Euler (h = 0.500000):
      
         x(50) = [ -0.99117116,]
                  [ 0.98281274  ]
      
      (MAX ITERATES:7)
      
      Backward Euler (h = 0.100000):
      
         x(50) = [ -0.99168100,]
                  [ 0.98338368  ]
      
      (MAX ITERATES:6)
      
      Backward Euler (h = 0.050000):
      
         x(50) = [ -0.99175624,]
                  [ 0.98346833  ]
      
      (MAX ITERATES:6)
      
      Backward Euler (h = 0.010000):
      
         x(50) = [ -0.99186677,]
                  [ 0.98359377  ]
      
      (MAX ITERATES:6)
      
      Backward Euler (h = 0.005000):
      
         x(50) = [ -0.99190869,]
                  [ 0.98364173  ]
      
      (MAX ITERATES:5)
      

    2. From (a), we report a solution of $$x(50)=(u(50),v(50))=(-0.99055978,\;0.98213119),$$ computed using a stepsize of $h=1$ (see figure).

      [Figure: component and parametric solution plots of the computed solution for $h=1$.]

      The attached data files and plots at the end give more accurate approximations using smaller stepsizes.   
    3. The stopping criterion was taken to be when successive Newton iterates satisfied $$||x_{i}-x_{i-1}||_{2}< NTOL=0.00001,$$ as is standard. The bound $NTOL=0.00001$ was chosen to maintain consistency with the smallest stepsize picked in the sequence of simulations. Since the method has local truncation error $\tau_{n}(h)=O(h^{2})$, $\tau_{n}(0.005)=O(0.000025)\approx O(0.00001)$, or accuracy up to the fourth digit, and in turn accuracy up to at least the second digit, or $O(10^{-2})$. However, for good measure and to increase confidence, $NTOL$ was tightened to $NTOL=1\cdot10^{-6}$ (the value appearing in the routine below), resulting in probably more iterations than were necessary, as recorded in (d).   
    4. For the larger stepsize simulations, convergence was achieved after no more than approximately 15 iterations, the highest reported being 14. For $h<1$, convergence was achieved after no more than 5-7 iterations, with 9 iterations being the maximum for $h=1$, corresponding to the approximation recorded in (b).   
    Additional plots and corresponding data for the various stepsize simulations are produced by the following routines:

    ODESolve#7.m (implements above numerical simulations).
    %%%%%%%%%%%%%%%%%%%%%%%
    % /////////////////// %
    % BEGIN CONFIGURATION %
    % /////////////////// %
    %%%%%%%%%%%%%%%%%%%%%%%
    
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    %%%%% USER INPUT PARAMETERS %%%%%
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    
    x0   = [0; 0.5]; % initial condition
    I    = [0, 50]; % time-interval [t_0, t_f]
    h    = [4, 3, 2, 1, 0.5, 0.1, 0.05, 0.01, 0.005];
    N    = 0;      % mesh size
    FLAG = false;  % T -> N assigned (z is mesh size); F -> h assigned (z is stepsize)
    CONV = false;  % estimate convergence rate T/F
    M    = 6;      % OSD method choice (1-F.Euler; 2-MP; 3-RKF; 4-TRAP; 5-B.Euler; 6-B.Euler/Newton)
    
    %%%%%%%%%%%%%%%%%%%%%%%%%
    %%%%% VECTOR FIELD %%%%%%
    %%%%%%%%%%%%%%%%%%%%%%%%%
    
    function xdot = F(x, t)
        xdot = [
                    (1/100)-((1/100)+x(1)+x(2))*((x(1))^2+1001*x(1)+1001);
                    (1/100)-((1/100)+x(1)+x(2))*((x(2))^2+1)
               ];
    endfunction
    
    function z = DF(x, t) % optional (spatial!) Jacobian for implicit methods
        z = [
                -((x(1))^2+1001*x(1)+1001)-(0.01+x(1)+x(2))*(2*x(1)+1001), -((x(1))^2+1001*x(1)+1001);
                -1-(x(2))^2, -(1+(x(2))^2)-(2*x(2))*(0.01+x(1)+x(2))
            ];
    endfunction;
    
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    %%%%% EXPRESSION STRINGS %%%%%%
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    
    OSDM = {"M_1"; "M_2"; "M_3"; "M_4"; "M_5"; "M_6"};
    IVP = strcat("{x_{1}\'=0.01-(0.01+x_{1}+x_{2})(x_{1}^{2}+1001x_{1}+1001)}",
            sprintf("\n"),
            "{x_{2}\'=0.01-(0.01+x(1)+x(2))(x_{2}^{2}+1)}",
            sprintf("\n"),
            "{x(0)=(0,0.5)}");
       
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % //////////////////////////// %
    % BEGIN MAIN PROGRAM EXECUTION %
    % //////////////////////////// %
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    
    warning("off");
    source("OSDM#7.m");
    fid = fopen(sprintf("DataOutput.txt"), "a");
    for (j=1:length(h))
        [msg, w, t, print_str, comp_plot, param_plot, i] = feval(OSDM{M, 1}, @F, I, x0, h(j), FLAG, IVP, @DF);
        fprintf(fid, "%s(MAX ITERATES:%i)\r\n\r\n", print_str, i);
        print(comp_plot, "-dpng", sprintf("[%i]ComponentPlot#%i.png", M, j));
        print(param_plot, "-dpng", sprintf("[%i]ParametricPlot#%i.png", M, j));
    endfor
    fclose(fid);
    
    OSDM#7.m (implements backward Euler using Newton iteration)
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    %%%%% BACKWARD EULER w/ NEWTON %%%%%
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    
    function z = NG_b(w, w0, t, t0, h, f)
        z = w - w0 - h*f(w, t);
    endfunction
    
    function z = NDG_b(w, w0, t, t0, h, df)
        z = eye(length(w)) - h*df(w, t);
    endfunction
    
    function [z, k] = NEWT_ODE_IMPLCT_SOLVE(w0, t, t0, h, g, dg, f, df);
        z0 = w0;
        NMAX = 100;
        NTOL = 0.000001;
        k = 0;
        for (i=1:NMAX)
        z = z0 - dg(z0, w0, t, t0, h, df)\g(z0, w0, t, t0, h, f);  % solve the Newton system rather than forming the inverse
            k++;
            if (norm(z-z0,2) < NTOL)
                return;
            endif
            z0 = z;
        endfor
        printf("Newton iteration failed!  Discard result. . .\n");
    endfunction
    
    function [msg, w, t, print_str, comp_plot, param_plot, i] = M_6(f, I, x0, z, FLAG, IVP, df)
        [msg, h, N, t, dim, w] = set_osd_mthd_values(f, I, x0, z, FLAG);
        if (!msg)
            w = 0;
            return;
        else
            i = 0;
            k = 0;
            for (j=1 : N)
                [w(:, j+1), k] = NEWT_ODE_IMPLCT_SOLVE(w(:,j), t(j+1), t(j), h, @NG_b, @NDG_b, f, df);
                if (k > i)
                    i = k;
                endif
            endfor
            [print_str, comp_plot, param_plot] = format_osd_data(t, w, h, N, dim, "Backward Euler", IVP);
        endif
    endfunction
    
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    %%%%% OSD METHOD TYPE-CHECK/ASSIGNMENT FUNCTION %%%%%
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    
    function [msg, h, N, t, dim, w] = set_osd_mthd_values(f, I, x0, z, FLAG)
        if (!is_function_handle(f) || !isrow(I) || !iscolumn(x0) || !isbool(FLAG))
            printf("Type Check Error: Invalid argument types (vector field, time interval, initial condition, or boolean flag)!\n");
            msg = 0;
            return;
        else % check round 1 passed
            test_F = f(x0, I(2));
            if (!exist("test_F", "var"))
                printf("Type Check Error: Invalid vector field parameters!\n");
                msg = 0;
                return;
            elseif (!iscolumn(test_F) || !(length(test_F) == length(x0)))
                printf("Type Check Error: Vector field is not the same dimension as initial data!");
                msg = 0;
                return;
            elseif (!(I(2)>I(1)))
                printf("Type Check Error: Time interval is not ordered!");
                msg = 0;
                return;
            else % check round 2 passed
                if (FLAG == false)
                    h = z;
                    if (!isfloat(h))
                        printf("Type Check Error: Invalid argument type (step size)!\n");
                        msg = 0;
                        return;
                    elseif (!(h>0))
                        printf("Type Check Error: Step size is not positive!\n");
                        msg = 0;
                        return;
                    endif
                elseif (FLAG == true)
                    N = z;
                    if (!(floor(N) == ceil(N)))
                        printf("Type Check Error: Invalid argument type (mesh size)!\n");
                        msg = 0;
                        return;
                    elseif (!(N>0))
                        printf("Type Check Error: Mesh size is not positive!\n");
                        msg = 0;
                        return;
                    endif % check round 3 passed
                endif
            endif
        endif % type check successful
        msg = 1;
        h = 0;
        N = 0;
        if (FLAG == false)
            h = z;
            N = ceil(abs(I(2)-I(1))/h);
        elseif (FLAG == true)
            N = z;
            h = abs(I(2)-I(1))/N;
        endif
        t = linspace(I(1), I(2), N+1);
        dim = length(x0);
        w = zeros(dim, N+1);
        w(:, 1) = x0;
    endfunction
    
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    %%%%% OSD DATA FORMATTER %%%%%
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    
    function [print_str, comp_plot, param_plot] = format_osd_data(t, w, h, N, dim, mthd, IVP)
        print_str = strcat(
        sprintf("%s (h = %f):\r\n\r\n", mthd, h(1)),
        sprintf("   x(%d) = [ %.8f,]\r\n", t(N+1), w(1, N+1)));
        if (w(2:dim-1, N+1))
            print_str = strcat(print_str, sprintf("            [ %.8f,]", w(2:dim-1, N+1)));
        endif
        print_str = strcat(print_str, sprintf("            [ %.8f  ]\r\n\r\n", w(dim, N+1)));
        comp_plot = figure("visible", "off");
        title(sprintf("COMPONENT SOLUTION PLOT\n(Approximated using the %s with h=%.2f and N=%i.)", mthd, h, N));
        xlabel("t\n");
        ylabel("x_k");
        grid on;
        hold all;
        for (k=1 : dim)
            plot(t, w(k, :), sprintf(";x_%i(t);", k));
        endfor
        legend("location", "NORTHEAST");
        xlim = get(gca(), 'xlim');
        ylim = get(gca(), 'ylim');
        axis([xlim(1), xlim(2), ylim(1), ylim(2)+0.10*ylim(2)]);
        text(get(gca(),'XLim')(2)*.02, get(gca(),'YLim')(2)*.96, IVP);
        if (dim == 2)
            param_plot = figure("visible", "off");
            title(sprintf("PARAMETER SOLUTION PLOT\n(Approximated using the %s with h=%.2f and N=%i.)", mthd, h, N));
            xlabel("t\n");
            ylabel("x_k");
            grid on;
            hold all;
            plot(w(1, :), w(2, :), sprintf(";x(t) for t in [%d,%d];", t(1), t(N+1)));
            legend("location", "NORTHEAST");
            xlim = get(gca(), 'xlim');
            ylim = get(gca(), 'ylim');
            axis([xlim(1), xlim(2), ylim(1), ylim(2)+0.10*ylim(2)]);
            text(get(gca(),'XLim')(2)*.02, get(gca(),'YLim')(2)*.96, IVP);
        else
            param_plot = -1;
        endif
    endfunction
    



    (Theoretical Component)

    Exercise #T1.   Consider the $m$ dimensional linear constant coefficient system $\dot{y}=Ay$ and let $\{\lambda_{i}\}_{i=1}^{m}$ be the eigenvalues of $A$.
    1. Suppose $\Re(\lambda_{i})\leq0$ for each $i=1,2,\ldots,m$. Determine whether or not $h\lambda_{i}\in\mathscr{S}$ is a necessary condition for a one-step finite difference method to converge to a solution on $[0,T]$ as $h\to0$.

    Solution.
    1. The answer is a qualified yes. By its very definition, no numerical approximation $y_{n}$ can converge to the true solution $y$ of a stiff problem if $h\lambda_{k}\in\mathscr{S}$ is not satisfied for each $k$ as $h\to0$, since by construction $|y_{n}|\to\infty$ for such step sizes while $y$ remains bounded ($|y|\leq M<\infty$) or decays to zero ($|y|\to0$). So in this sense, the condition is necessary.

      On the other hand, if $h_{j}\to0$ is some decreasing sequence of step sizes and the ODE method $\Phi(y_{n},t_{n},h)$ is consistent (and therefore convergent when the vector field $f$ satisfies a Lipschitz condition, which is the case for $f(y)=Ay$), then the global error at the final time $T$ satisfies $e(h_{j})\leq C(T,\rho(A),|D^{p}_{y}Ay|)h_{j}^{p}$ where $p$ is the order of convergence of the ODE method. Since $C$ is independent of $h_{j}$ for all $j$, uniform in $T$ for $T<\infty$ fixed, uniform in $\rho(A)$ (i.e., in $|D_{y}f|$) for $A$ fixed, and $D^{p}_{y}Ay\equiv0$ for $p>1$, we may regard $C$ as an absolute constant. For any $\epsilon>0$, taking $j\geq M$, where $h_{M}<\epsilon^{\frac{1}{p}}C^{-\frac{1}{p}}$, implies $|e(h_{j})|\leq\epsilon$. This clearly implies $e_{n}\to0$ as $h\to0$, independent of any explicit stability requirements, and so the approximation $y_{n}\to y$ as $h\to0$. Indeed, for $T$ fixed the proportionality $C\propto e^{\rho(A)T}$ implies even for small $T$ that $C$ can be very large for stiff problems. However, the regularity of solutions to $\dot{y}=Ay$ insures that the estimate $|e(h_{j})|\leq Ch_{j}^{p}$ is correct, and therefore any numerical instability on the finite integration interval $[t_{0},T]$ is captured by the finite, even if large, magnitude of $C$. By sending $h\to0$, we again obtain convergence without explicit regard to stability requirements.

      Indeed, the implication is that $h_{j}\lambda_{k}\in\mathscr{S}$ for all $k$ and all $j$ sufficiently large; in other words, convergence implies that the absolute stability requirements are satisfied (note we started with the contrapositive of this statement). We also see that for the case $\Re(\lambda)>0$, $h\lambda\notin\mathscr{S}$ for $h$ sufficiently small; indeed, a strengthening of the conclusion is that convergence implies that all stability requirements are satisfied for all $h$ sufficiently small.   
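
      A quick numerical check of this claim in Octave (the stiff scalar test problem $\dot{y}=-50y$, $y(0)=1$, $T=1$ is an illustrative choice, not part of the exercise): forward Euler violates $|1+h\lambda|\leq1$ for $h>0.04$, yet the global error at $T$ still tends to zero as $h\to0$.

      lambda = -50;  T = 1;  y0 = 1;           % illustrative stiff scalar problem
      for h = [0.05, 0.04, 0.02, 0.01, 0.001]  % first value violates |1+h*lambda| <= 1
          N = round(T/h);
          y = y0;
          for n = 1:N
              y = y + h*lambda*y;              % forward Euler step
          endfor
          printf("h = %6.3f : |error at T| = %e\n", h, abs(y - y0*exp(lambda*T)));
      endfor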

    Exercise #T2.   Consider advancing the numerical solution to some system $\dot{y}=f(y)$ from $t_{n-1}$ to $t_{n}$ using two instances of the forward Euler method with respective stepsizes $h$ and $\frac{h}{2}$.
    1. Derive the first two non-zero terms of the local truncation error for the forward Euler method.
    2. Rewrite the forward Euler method using stepsize $\frac{h}{2}$ as an RKF2 method with stepsize $h$ and then derive the first two non-zero terms of the local truncation error.
    3. Compare the local truncation errors obtained in (a) and (b).
    4. Determine how the approximations obtained in (a) can be used to obtain estimates on the local truncation error at each step for the forward Euler method and RKF2 method derived in (b) using uniform stepsize $h$.

    Solution.
    1. For the forward Euler method \begin{align*} \tau_{n}(h) &=\left(y(t_{n-1})+hf(y(t_{n-1}))+\frac{h^{2}}{2}(f_{y}f)(y(t_{n-1}))+\frac{h^{3}}{6}(f_{yy}f^{2}+f_{y}^{2}f)(y(\xi_{n}))\right)\\ &-\Bigg(y(t_{n-1})+hf(y(t_{n-1}))\Bigg)\\ &=\frac{h^{2}}{2}(f_{y}f)(y(t_{n-1}))+\frac{h^{3}}{6}(f_{yy}f^{2}+f_{y}^{2}f)(y(\xi_{n})),\;\;\;\xi_{n}\in(t_{n-1},t_{n})\\ &=C_{1}h^{2}+C_{2}h^{3}+O(h^{4}). \end{align*}   
    2. Two steps of size $\frac{h}{2}$ using the forward Euler method can be written as a simple Runge-Kutta method: $$y_{n}=\underbrace{y_{n-1}+\frac{h}{2}f(y_{n-1})}_{y_{n-\frac{1}{2}}}+\frac{h}{2}f(y_{n-1}+\frac{h}{2}f(y_{n-1})).$$ The local truncation error is \begin{align*} \tau_{n}(h) &=\left(y(t_{n-1})+hf(y(t_{n-1}))+\frac{h^{2}}{2}(f_{y}f)(y(t_{n-1}))+\frac{h^{3}}{6}(f_{yy}f^{2}+f_{y}^{2}f)(y(\xi_{n}))\right)\\ &-\left(y(t_{n-1})+\frac{h}{2}f(y(t_{n-1}))+\frac{h}{2}f(y(t_{n-1}))+\frac{h^{2}}{4}(f_{y}f)(y(t_{n-1}))+\frac{h^{3}}{16}(f_{yy}f^{2})(y(\xi_{n}))\right)\\ &=\frac{h^{2}}{4}(f_{y}f)(y(t_{n-1}))+h^{3}\left(\frac{5}{48}f_{yy}f^{2}+\frac{1}{6}f_{y}^{2}f\right)(y(\xi_{n})),\;\;\;\xi_{n}\in(t_{n-1},t_{n})\\ &=K_{1}h^{2}+K_{2}h^{3}+O(h^{4}). \end{align*} Observe that averaging tangents at the left and mid points results in an $O(h^{2})$ local truncation error, whereas averaging tangents at the left and right end points leads to an $O(h^{3})$ local truncation error.   
    3. We see immediately that $K_{1}=\frac{1}{2}C_{1}$. If we split the terms of $C_{2}$ and $K_{2}$ and write them as $C_{2}=C_{2}^{1}+C_{2}^{2}$ and $K_{2}=K_{2}^{1}+K_{2}^{2}$ (with the first piece carrying $f_{yy}f^{2}$ and the second $f_{y}^{2}f$), then we see that $K_{2}^{1}=\frac{5}{8}C_{2}^{1}$ and $K_{2}^{2}=C_{2}^{2}.$ This result is somewhat peculiar. It shows that the LTE of this poorly implemented RKF2 method is still $O(h^{2})$, but with smaller error coefficients than forward Euler. Indeed, it says roughly that doubling the steps of forward Euler results in half the error. This is not quite correct though, since halving the stepsize of the forward Euler method should reduce the LTE by a factor of $\frac{1}{4}$, not $\frac{1}{2}$. However, viewed in this light, $\tau_{n}(h)$ here is actually the sum of two local truncation errors, not one!   
    4. Assuming enough regularity on solutions to justify asymptotic error expansions, we may write $$y(t_{n})-\Phi_{n}(h)=\delta_{n}+C_{1}h^{2}+C_{2}h^{3}+\ldots$$ and $$y(t_{n})-\Psi_{n}(h)=\delta'_{n}+K_{1}h^{2}+K_{2}h^{3}+\ldots,$$ where $\Phi_{n}(h)$ and $\Psi_{n}(h)$ are the numerical approximations $y_{n}$ from the methods in (a) and (b), respectively. Since the above error estimates are only valid if the previous approximations are exact, (presumably) small correction factors $\delta_{n},\delta_{n}'$ are introduced. Neglecting these and subtracting the two expansions gives $$\Psi_{n}(h)-\Phi_{n}(h)=(C_{1}-K_{1})h^{2}+O(h^{3})=\frac{1}{2}C_{1}h^{2}+O(h^{3}),$$ so the computable difference of the two approximations furnishes the per-step estimates $$\tau_{n}^{\Phi}(h)\approx2\left(\Psi_{n}(h)-\Phi_{n}(h)\right)\quad\text{and}\quad\tau_{n}^{\Psi}(h)\approx\Psi_{n}(h)-\Phi_{n}(h).$$   
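
      As a small Octave illustration of these estimates (the scalar test problem $y'=\cos(y)$ and the values of $y$ and $h$ are hypothetical choices), one full Euler step and two half steps are compared to produce the per-step LTE estimates:

      f = @(y) cos(y);                   % illustrative scalar vector field
      y = 0.3;  h = 0.1;                 % current value and stepsize (test values)
      Phi  = y + h*f(y);                 % one forward Euler step of size h
      ymid = y + (h/2)*f(y);             % first half step
      Psi  = ymid + (h/2)*f(ymid);       % second half step (the method from (b))
      tau_Phi = 2*(Psi - Phi);           % estimated LTE of forward Euler
      tau_Psi = Psi - Phi;               % estimated LTE of the two-half-step method
      printf("tau_Phi ~ %e, tau_Psi ~ %e\n", tau_Phi, tau_Psi);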




    (Theoretical Component)

    Exercise #T1.  Assume that one is attempting to approximate the solution to an IVP on some interval $[t_{0},T]$.
    1. Under what circumstances is it reasonable to assume that the error in the approximate solution at time $T$ is the sum of the local truncation errors at each timestep?
    2. Assuming (a), devise a timestep selection strategy that provides an approximate solution with error at time $T$ of size at most $TOL=\epsilon>0$ given. Provide a flowchart which describes the process in advancing from time $t_{n-1}$ to $t_{n}$ using this strategy.

    Solution.
    1. We can expect the global error at the final timestep to be well approximated by the sum of local truncation errors when both the order of convergence $p$ of some method $\Phi(h)$ to be chosen is high and the corresponding stepsize $h$ is small, relative to the length of the integration interval $T$ and the magnitude of the Lipschitz constant $L$ of the system. To see this, write any given explicit one-step method as $$y_{n}=y_{n-1}+h\Phi(t_{n-1},y_{n-1},h).$$ It is easy to prove (see assignment \#2) that the bound on the global error for such a method is given by $$E=|e_{N}|\leq e^{L_{\Phi}T}|e_{0}|+\frac{\tau}{hL_{\Phi}}\left(e^{L_{\Phi}T}-1\right)$$ where $\tau=\sup_{0\leq n\leq N}|\tau_{n}(h)|.$ If the vector field $f$ satisfies a Lipschitz condition, then we may take $L_{\Phi}=L_{f}=L$; moreover, since the method is one-step explicit, we may assume $y_{0}=y(0)$ so that $e_{0}\equiv0$. This gives $$E\leq\frac{\tau}{hL}\left(e^{LT}-1\right)=\frac{\tau}{hL}\left(\sum\limits_{n=1}^{\infty}\frac{(LT)^{n}}{n!}\right)=\frac{\tau}{h}T+\frac{\tau}{2h}LT^{2}+\ldots.$$ Now $\frac{\tau'}{h}T\leq\sum|\tau_{n}(h)|\leq N\tau=\frac{\tau}{h}T$, where $\tau'=\inf_{0\leq n\leq N}|\tau_{n}(h)|.$ If we take differences between the upper bound on $E$ and the lower bound on $\sum|\tau_{n}(h)|$, we obtain $$\left|E-\sum|\tau_{n}(h)|\right|\leq\frac{T}{h}\left(\tau-\tau'\right)+\frac{\tau}{h}\left(\frac{LT^{2}}{2}+\frac{L^{2}T^{3}}{6}+\ldots\right),$$ not a particularly useful estimate. Since the upper bound on the global error is likely to be a gross overestimate (observe as in exercise \#2 that the series expansion of $e^{LT}$ actually terminates after $p$ terms, $p$ being the order of convergence of the method), let us assume the difference can be well approximated by taking differences between the two upper bounds, yielding $$\left|E-\sum|\tau_{n}(h)|\right|\approx\frac{\tau}{h}\left(\frac{LT^{2}}{2}+\frac{L^{2}T^{3}}{6}+\ldots\right)=\frac{\tau}{hL}\left(e^{LT}-1-LT\right)=\frac{Ch^{p+1}}{hL}\left(e^{LT}-1-LT\right).$$ From this approximation, we see that the relative equivalence of the global error and the sum of local truncation errors depends heavily on (a) the length of the integration interval ($T$); (b) the magnitude of variation in the vector field across trajectories ($L$) (note also that while $C$ does not depend on $h$, $C=C(L,T)\nearrow\infty$ in general as $L,T\to\infty$); and (c) the order of the method and chosen stepsize. For highly variable vector fields (e.g. stiff problems) and long integration intervals, one cannot generally expect the global error to be approximately equal to the sum of local truncation errors; it will in general be greater. On the other hand, $L$ and $T$ are fixed before a method and stepsize are chosen. Therefore, one can insure the global error is well approximated by the sum of local truncation errors by either taking the stepsize to be sufficiently small or choosing a method with sufficiently high order of convergence (or typically some mutual combination of both).   
    2. With $T$ and $L$ fixed, the estimate in (a) shows that we can expect the sum of local truncation errors to approximate the global error at time $T$ so long as the method we choose has sufficiently high order of convergence and $h_{n}$ at each step satisfies some reasonable upper bound ($h\leq0.9$, say) -- this is just to establish some real-life basis for being able to assume (a).

      With the assumptions of (a) and chosen method $\Phi$, let $h_{1}$ be a reasonable starting step-size used to advance the solution from $y_{0}\to y_{1}$. At the first iteration, estimate the (absolute value of the) local truncation error $\tilde{\tau}_{1}\approx\tau_{1}(h_{1})$ using step-doubling or some other method. With this approximation, compute $$q_{1}=N_{1}\tilde{\tau_{1}}=\frac{T}{h_{1}}\tilde{\tau_{1}}.$$ If $q_{1}<\epsilon$, proceed from $y_{1}\to y_{2}$ using $h_{2}=h_{1}.$ If $q_{1}\geq\epsilon$, then $r_{1}=|\tilde{\tau_{1}}-\frac{\epsilon}{N_{1}}|$ is approximately the excess magnitude of the local truncation error at step $1$. Since $\tau_{1}(h_{1})=Ch_{1}^{p}$ for some constant $C$, $\tilde{\tau_{1}}\approx Ch_{1}^{p}$, and so we see that a good adjustment to $h_{1}$ is $$h_{1}\mapsto h_{1}^{\star}=\alpha\left(\frac{\epsilon}{N_{1}C}\right)^{\frac{1}{p}}=\alpha\left(\frac{\tilde{\tau_{1}}-r_{1}}{C}\right)^{\frac{1}{p}},$$ where $0<\alpha<1$ is a conservative correction factor which accounts for the uncertainty in how well $\tilde{\tau_{1}}$ approximates the true LTE. Continuing in this way, at the $n$th step we estimate $\tilde{\tau_{n}}$ and form the quantity $$q_{n}=\sum\limits_{j=1}^{n-1}\tilde{\tau_{j}}+N_{n}\tilde{\tau_{n}},\qquad N_{n}=\frac{T-t_{n-1}}{h_{n}},$$ which adds to the accumulated LTE estimates the amount the remaining $N_{n}$ steps ought to contribute; we accept the step if $q_{n}<\epsilon$, and otherwise replace $h_{n}$ by a suitably corrected $h_{n}^{\star}$ exactly as above and repeat until the step is accepted. By construction (and presuming, of course, the validity of the assumption in (a)), continuing this procedure until we reach the final time $T$ yields a global error of size at most $\epsilon$, with the factor $\alpha$ leaving a small buffer to spare. A sketch of this loop in code is given below.
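
      The following minimal Octave sketch outlines the loop just described; the helper estimate_lte (which would implement step-doubling or some other per-step LTE estimator, returning the advanced value together with the estimate $\tilde{\tau}_{n}$) is hypothetical, so this is a sketch of the control flow rather than a complete routine.

      function [t, y] = adapt_solve(f, y0, t0, T, h, tol, p, alpha)
          t = t0;  y = y0;  used = 0;                  % accumulated LTE estimates
          while (t < T)
              h = min(h, T - t);                       % do not step past T
              [ynew, tau] = estimate_lte(f, y, t, h);  % hypothetical helper
              N = ceil((T - t)/h);                     % remaining steps at this h
              if (used + N*tau < tol)                  % q_n < epsilon: accept step
                  y = ynew;  t = t + h;  used = used + tau;
              else                                     % shrink h as above and retry
                  h = alpha*h*((tol - used)/(N*tau))^(1/p);
              endif
          endwhile
      endfunction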


    Exercise #T2.  
    1. Determine the leading term of the local truncation error of the method $$y_{n}=\frac{4}{3}y_{n-1}-\frac{1}{3}y_{n-2}+\frac{2}{3}hf(t_{n},y_{n}).$$
    2. Determine if the above method is convergent.

    Solution.
    1. We will take $(t_{n-2},y(t_{n-2}))$ to be our base point (although any expansion point will work). With $\tilde{y_{n}}$ the approximation of $y_{n}$ assuming all previous values are exact, we have \begin{align*} \tau_{n}(h) &=y(t_{n})-\tilde{y_{n}}\\ &=\left\{y(t_{n-2})+2hy'(t_{n-2})+2h^{2}y''(t_{n-2})+\frac{4h^{3}}{3}y'''(t_{n-2})+O(16h^{4})\right\}\\ &-\Bigg\{\frac{4}{3}\left(y(t_{n-2})+hy'(t_{n-2})+\frac{h^{2}}{2}y''(t_{n-2})+\frac{h^{3}}{6}y'''(t_{n-2})+O(h^{4})\right)-\frac{1}{3}y(t_{n-2})\\ &+\frac{2}{3}h\left(y'(t_{n-2})+2hy''(t_{n-2})+2h^{2}y'''(t_{n-2})+O(8h^{3})\right)\Bigg\}\\ &=y(t_{n-2})\left(1-\frac{4}{3}+\frac{1}{3}\right)+hy'(t_{n-2})\left(2-\frac{4}{3}-\frac{2}{3}\right)+h^{2}y''(t_{n-2})\left(2-\frac{2}{3}-\frac{4}{3}\right)\\ &+h^{3}y'''(t_{n-2})\left(\frac{4}{3}-\frac{2}{9}-\frac{4}{3}\right)+O(h^{4})\\ &=-\frac{2}{9}h^{3}y'''(t_{n-2})+C_{2}h^{4}. \end{align*} Note that \begin{align*} f(y(t_{n})) &=f(y(t_{n-2}))+2hD_{t}f(y(t_{n-2}))+2h^{2}D^{2}_{t}f(y(t_{n-2}))+O((2h)^{3})\\ &=y'+2hf_{y}f+2h^{2}(f_{yy}f^{2}+f_{y}^{2}f)+O((2h)^{3})&\text{(evaluated at $t_{n-2}$)}\\ &=y'(t_{n-2})+2hy''(t_{n-2})+2h^{2}y'''(t_{n-2})+O((2h)^{3}), \end{align*} where $D_{t}=\frac{d}{dt}$. This allows us to avoid having to expand $f(y(t_{n}))$ about the "point" $y(t_{n})$ and using $\Delta y$, for which the best we can do is use the approximation $dy=f\Delta t\approx\Delta y$.

      Note also that for a general $k$ step linear multi-step method we have $$\sum_{j=0}^{k}a_{j}y_{n-j}=h\sum_{j=0}^{k}b_{j}f_{n-j}.$$ It is a trivial exercise to compute the local truncation error of the general method, and by grouping like powers of $h$ (as we did above), we can conclude in general that the system of equations \[\begin{array}{l} C_{0}=\sum\limits_{j=0}^{k}a_{j}=0\\ \vdots\\ C_{i}=(-1)^{i}\left(\frac{1}{i!}\sum\limits_{j=1}^{k}j^{i}a_{j}+\frac{1}{(i-1)!}\sum\limits_{j=0}^{k}j^{i-1}b_{j}\right)=0 \end{array} \] determines the coefficients $a_{j},b_{j}$ and the order of the method (a numerical check of these conditions for the present method is sketched after this solution). Note that by requiring the method to be linear in $f$ (e.g. disallowing compositions), we are able to obtain a very general constructibility criterion for a linear multistep method of any desired order. This is in marked contrast to ordinary one-step methods, where higher order can only be achieved by taking approximations within the current time interval, hence requiring the method to contain multiple functional compositions. The exact form of such compositions is the source of difficulty, and even when a proposed format is given, determining the coefficients is tedious due to the complicated Taylor expansions in the LTE computation (e.g. RKF methods).   
    2. While part (a) demonstrates the advantage of the theory of linear multi-step methods over arbitrary one-step methods when it comes to constructibility, there is one drawback: one-step methods once constructed are easily proven to be convergent, whereas linear multistep methods of arbitrary order can still fail to be convergent, and the theory which establishes criteria for convergence is difficult. Fortunately though, the theory is simple to apply. Define the characteristic polynomial of the method to be $$\rho(\xi)=\sum_{j=0}^{k}a_{j}\xi^{k-j}.$$ Then zero stability (hence convergence, since the method is presumably consistent) is equivalent to the roots of $\rho$ satisfying the root condition: $|\xi_{i}|\leq1$ for every root $\xi_{i}$, with any root of modulus one being simple. For our method, $$\rho(\xi)=\xi^{2}-\frac{4}{3}\xi+\frac{1}{3}=(\xi-1)\left(\xi-\frac{1}{3}\right),$$ so the roots are $\xi=1$ and the extraneous root $\xi=\frac{1}{3}$. Both satisfy the root condition, so the method is convergent; in fact, since the extraneous root lies strictly inside the unit disk, the method is strongly stable.   
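
      The order conditions and the root condition for this method are easy to check numerically; a short Octave sketch (following the $a_{j},b_{j}$ conventions above) is:

      a = [1, -4/3, 1/3];     % a_0, a_1, a_2 for y_n - (4/3)y_{n-1} + (1/3)y_{n-2}
      b = [2/3, 0, 0];        % b_0, b_1, b_2 (the implicit f_n term)
      k = 2;  j = 0:k;
      for i = 0:3
          if (i == 0)
              C = sum(a);
          else
              C = (-1)^i*(sum(j.^i.*a)/factorial(i) + sum(j.^(i-1).*b)/factorial(i-1));
          endif
          printf("C_%d = %g\n", i, C);   % prints C_0 = C_1 = C_2 = 0 and C_3 = -2/9
      endfor
      disp(roots(a)');                   % roots 1 and 1/3: root condition satisfied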

    Exercise #T3.  
    1. Determine the leading term of the local truncation error of the method $$y_{n}=-y_{n-1}+2y_{n-2}+h\left(\frac{5}{2}f(t_{n-1},y_{n-1})+\frac{1}{2}f(t_{n-2},y_{n-2})\right).$$
    2. Determine if the above method is convergent.

    Solution.
    1. Although we could appeal to the constructibility constraints presented in the previous exercise, and simply determine the first of the equations $C_{i}=0$ which the vector of coefficients $x=(a_{0},a_{1},a_{2},b_{1},b_{2})$ (where $a_{0}=1,a_{1}=1,a_{2}=-2,b_{1}=\frac{5}{2},b_{2}=\frac{1}{2}$) fails to satisfy (the first non-zero $C_{i}$ then giving the leading order term), we proceed exactly as in the previous problem instead. Expanding about the point $(t_{n-2},y(t_{n-2}))$ we obtain \begin{align*} \tau_{n}(h) &=y(t_{n})-\tilde{y_{n}}\\ &=\left\{y(t_{n-2})+2hy'(t_{n-2})+2h^{2}y''(t_{n-2})+\frac{4h^{3}}{3}y'''(t_{n-2})+O(16h^{4})\right\}\\ &-\Bigg\{-\left(y(t_{n-2})+hy'(t_{n-2})+\frac{h^{2}}{2}y''(t_{n-2})+\frac{h^{3}}{6}y'''(t_{n-2})+O(h^{4})\right)\\ &+2y(t_{n-2})+\frac{5h}{2}\left(y'(t_{n-2})+hy''(t_{n-2})+\frac{h^{2}}{2}y'''(t_{n-2})+O(h^{3})\right)+\frac{h}{2}y'(t_{n-2})\Bigg\}\\ &=y(t_{n-2})\left(\underbrace{1+1-2}_{0}\right)+hy'(t_{n-2})\left(\underbrace{2+1-\frac{5}{2}-\frac{1}{2}}_{0}\right)+h^{2}y''(t_{n-2})\left(\underbrace{2+\frac{1}{2}-\frac{5}{2}}_{0}\right)\\ &+h^{3}y'''(t_{n-2})\left(\underbrace{\frac{4}{3}+\frac{1}{6}-\frac{5}{4}}_{\frac{1}{4}}\right)+O(h^{4})\\ &=\frac{1}{4}h^{3}y'''(t_{n-2})+C_{2}h^{4}. \end{align*}   
    2. The characteristic polynomial of the method is $$\rho(\xi)=\xi^{2}+\xi-2=(\xi+2)(\xi-1).$$ Evidently the extraneous root $\xi=-2$ lies outside the unit disk; thus the method cannot be zero stable, and is therefore not convergent.   
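
      This failure is easy to observe numerically. In the following Octave sketch (the test IVP $y'=-y$, $y(0)=1$ on $[0,1]$ with exact starting values is an illustrative choice), refining $h$ makes the computed value at $t=1$ worse, not better, as the extraneous root $\xi=-2$ amplifies the truncation and roundoff errors:

      f = @(y) -y;                             % test problem y' = -y, y(0) = 1
      for h = [0.1, 0.05, 0.025]
          N = round(1/h);
          y = [1, exp(-h)];                    % exact starting values y_0, y_1
          for n = 3:N+1
              y(n) = -y(n-1) + 2*y(n-2) + h*((5/2)*f(y(n-1)) + (1/2)*f(y(n-2)));
          endfor
          printf("h = %5.3f : y(1) ~ %g (exact %g)\n", h, y(N+1), exp(-1));
      endfor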

    Exercise #T4.  
    1. Determine the rate of convergence of a $k$ step multistep method when the initial values are all taken to be $y(t_{0})$, i.e. $y_{0}=\ldots=y_{k-1}=y(t_{0})$.

    Solution.
    1. Part II of Theorem 5.1 from the text states that sufficient conditions for $O(h^{p})$ convergence of $O(h^{p+1})$ accurate multi-step methods (and incidentally, one-step methods) are zero stability (e.g. the root condition), consistency, and $O(h^{p+1})$ accurate initial data (this is just a restatement of the fundamental convergence theorem). We of course assume the method satisfies the root condition so that it is zero stable, and investigate the latter two conditions. Indeed, the condition on the initial data implies starting approximations which are $O(h^{p+1})$ accurate in the sense that the method used to approximate $y_{0},\ldots,y_{k-1}$ is consistent with $y(t_{0}),\ldots,y(t_{k-1})$ as $h\to0$. This is certainly the case when using $O(h^{p})$ convergent methods to obtain the starting approximations; however, it is obvious the numerical method $y_{n}=y_{0}$ is not consistent with the initial data for any ODE other than $f(t,y)\equiv 0$, and so the starting values $y_{1},\ldots,y_{k-1}$ are also not consistent with the IVP. Certainly though, the multistep method itself is consistent on the mesh points to which it is applied, but the overall method remains inconsistent, and so the conditions which guarantee convergence are not satisfied. But the convergence theorem is not an equivalence theorem, and therefore convergence is still possible; it is clear, however, that in such an exceptional case the full order of convergence $O(h^{p})$ cannot be expected.

      It is interesting to note that a one-step method will converge to a solution which has initial data $(t_{k-1},y_{0})$, and so it is never possible for a one-step method to be convergent if so much as one mesh point is inconsistent! Multistep methods, however, intermingle the previous approximations recursively throughout the integration, so it is not clear what solution the method converges to, if it converges at all, and therefore we cannot say with certainty, as we can for one-step methods, that the order of convergence is zero. To rigorously prove anything beyond this would require a deeper analysis of Theorem 5.1, which is beyond the scope of this course. However, a simple heuristic argument shows the order of convergence in most cases should be zero, even if just one mesh point is inconsistent. In fact, since $y(t_{0})$ is just the zero-order term in the expansion of $y(t)$ about $(t_{0},y_{0})$, the LTE at the first step will then be $O(h)$, and this poor approximation contaminates the successive approximations, leading to a global error of $\frac{1}{h}O(h)=O(1)$. If we take the next $k-2$ approximations to be $y_{0}$, the same heuristics apply, since $y_{0}$ is still the first term in the Taylor expansion at $t_{1},\ldots,t_{k-1}$; however, the deviation of $y(t)$ from the constant $y(t_{0})$ grows as $t_{j}$ moves farther from $t_{0}$, so the zero-order term becomes a worse and worse initial approximation. At best, the final error would then be $O(1)$, meaning the method would have order of convergence zero. The algebra of order notation provides another justification of the above heuristics. The global error is roughly no better than the sum of local truncation errors (and is usually greater). If the LTE in the first approximation is $O(h)$, and all others are $O(h^{p})$, then the global error will be roughly $O(h)+\sum O(h^{p})=O(h)$.

      As an interesting example which illustrates the exceptional case suggested above, consider the IVP $y'=\lambda y$ and suppose $y_{0}$ is consistent with $y(t_{0})$ but that $y_{1}$ is not consistent with $y(t_{1})$ by an amount $\epsilon=h$. The multistep method $y_{k}-y_{k-2}=2hf_{k-1}$ has characteristic roots $1$ and $-1$, so it satisfies the root condition and is itself consistent on the mesh points to which it is applied. The inconsistency of $y_{1}$, however, excites the extraneous root $\xi=-1$. Indeed, it is easy to see that the result is an oscillatory approximation to the decaying exponential curve which does not correct itself by sending $h\to0$ (however, at least the solution does not blow up as $h\to0$, which is characteristic of zero-unstable methods!). On the other hand, applying the 2-step Adams-Bashforth method results in apparent convergence as $h\to0$.   



