Modern Theory of Markov ChainsThis is the page for the course in 2013. For information about this year's course see here. Time and location
DescriptionHow many times do we need to shuffle a deck of $n$ cards in order to make them evenly mixed? Will a population of bacteria survive or eventually go extinct? Will a particle jumping randomly from one place to another eventually return to its original whereabouts? How quickly does a Monte Carlo simulation converge (if at all)? Motivated by modern applications and guided by examples, we develop the basic theory of Markov chains and familiarize ourselves with various mathematical techniques invented to analyze them.This course is a part of the master's program Stochastics and Financial Mathematics. Pre-requisitesWorking knowledge of basic concepts of probability theory (at a level covered by a first course in probability) is required. Mathematical maturity and computer skills are desirable.Literature
Course webpagehttp://www.staff.science.uu.nl/~taati001/markov/InstructorSiamak TaatiEvaluation
The course code is WISS123. Weekly outline
[June 18, 2013]
Summary
§
Project presentations
[June 11, 2013]
§
[June 7, 2013]
Time: 14:30, Place: Buys Ballot Gebouw 069
Summary
§
Project presentations
[June 4, 2013]
Summary
§
No new material.
Review of assignments and/or what the course was about. Bring your questions, comments, food for discussion. [May 28, 2013]
Summary
§
We continued the discussion of electric networks
and their correspondence with random walks.
The correspondence provides a general approach to
study the recurrence/transience of random walks.
We gave a proof of Pólya's theorem in the two-dimensional case
(i.e., the simple random walk on $\mathbb{Z}^2$ is recurrent)
and gave the idea of the proof in the three-dimensional case
(i.e., the simple random walk on $\mathbb{Z}^3$ is transient).
Starting with a general setting of electric networks, we showed how the question of recurrence/transience is reduced to finding appropriate lower/upper bounds for the effective resistance between the origin (i.e., a designated vertex) and the boundary of a box of size $n$ around the origin. The random walk is recurrent if and only if this effective resistance diverges to infinity as $n\to\infty$. Using Rayleigh's monotonicity principle, we were able to find a simple lower bound for the effective resistance in the two-dimensional lattice and show that it diverges to infinity. This settled the two-dimensional part of Pólya's theorm. To prove the monotonicity principle, we used Thomson's variational principle, which characterizes the current of a network with a given current boundary condition as the divergence-free flow that minimizes the dissipated energy. The variational principle also suggests a method to find upper bounds for the effective resistance. At the end, we gave a plausibility argument for the transience of the three-dimensional random walk. This was the idea behind Lyons's proof of the theorem, which you can find in chapter 21 of the textbook, or in chapter 1 of Grimmett's book. Other proofs can be found in the book of Doyle and Snell and in the paper by Levin and Peres.
[May 21, 2013]
Summary
§
We discussed the analogy between random walks and
electric networks. Our goal (to be achieved next week)
is to use this analogy to prove Pólya's theorem on random walks:
The simple random walk on $\mathbb{Z}^d$
is recurrent for $d=1,2$ and transient for $d\geq 3$.
The material is covered in chapters 9 and 21 of the textbook,
but we mostly follow chapter 1 of
Grimmett's book.
After observing the analogy between the drunkard's walk and an electric circuit consisting of $n$ unit resistors connected in series to a unit battery, we formulated and proved a general correspondence between random walks on weighted graphs and electric networks. Namely, we introduced a game on a weighted graph $(V,E)$ as follows. A subset $A\subseteq V$ of vertices are designated as final states, and and a number $g_0(a)$ is associated to each final state. The player performs a random walk on the graph until it arrives at one of the final states, when the game stops. If arrived at vertex $a\in A$, the player gains/loses $g_0(a)$ points. We wish to know the expected gain $g(x)$ if the player starts the random walk at vertex $x$. The corresponding electric network is obtained by placing a resistor (with appropriate resistance) in place of each edge, and connecting the final vertices to batteries in such a way that the voltage of $a\in A$ is $g_0(a)$. We claimed that the expected gain $g(x)$ of the random walk starting at vertex $x$ can be read from the electric network by measuring the voltage $W(x)$ of vertex $x$. This correspondence was proved by showing that both expected gain $g(\cdot)$ and the voltage $W(\cdot)$ are harmonic at every non-boundary vertex $x\in V\setminus A$, and that there is only one function that is harmonic at every vertex $x\in V\setminus A$ and satisfies a given boundary condition on $A$. We then observed how the problem of recurrence/transience of a random walk on a graph (such as $\mathbb{Z}^d$) can be translated using this correspondence to a problem about electric networks. [May 14, 2013]
§
[May 7, 2013]
Summary
§
We studied the mixing speed of the
top-to-random
card shuffling method.
The Markov chain that models the top-to-random shuffling has a unique stationary distribution (why?) which is the uniform distribution (why?). Starting with any arrangement of cards, the shuffling eventually mixes the cards uniformly (why?). The distance from stationary distribution has a cut-off at time $t_\text{mix}\approx n\log n$, where $n$ is the number of cards in the deck. Namely, the (worst case) distance from stationarity remains close to its maximal value (i.e., $1$) until around time $n\log n$, when in a relatively short period (of length $O(n)$) it drops to almost $0$. We proved this by establishing matching upper and lower bounds for the distance from stationary distribution in the vicinity of $n\log n$. To prove the upper bound, we used the fact that the Markov chain has a strong stationary time that has the same distribution as the time of acquiring all the coupons in the coupon collecting problem. [April 30, 2013]
Summary
§
Queen's day
[April 23, 2013]
Summary
§
We went through two more examples in which coupling arguments
could be used to estimate mixing times.
For the lazy version of the Ehrenfest model ($\equiv$ lazy random walk on the hypercube), we showed that the mixing time satisfies $t_\text{mix}(\mathrm{e}^{-r})\leq n\log n + rn$ (for all $r>0$). This, we proved by constructing a simultaneous construction (coupling) of the chain with all initial conditions. This reduced the problem to the so-called coupon collecting problem, which we took the chance to study in some detail. We then studied the Gibbs sampler for the hard-core model on a finite graph. The hard-core Gibbs sampler has rapid convergence only for small values of the fugacity $\lambda$. Our aim was to show that for $\lambda$ sufficiently small, the mixing time is bounded as $t_\text{mix}(\mathrm{e}^{-r})\leq \alpha_\lambda (n\log n + r n)$, where $\alpha_\lambda$ is a constant depending on $\lambda$. The proof (which we didn't find time to finish) is an example of the path coupling method. The basic idea is to couple two copies of the chain whose starting configurations disagree on exactly one position (a single particle). For any two configurations $x$ and $y$ we can then find a sequence $x=z_0,z_1,\ldots,z_n=y$ of configurations each disagreeing with the next one in exactly one position. Check out this (rather inefficient) Mathematica simulation (requires Version 9) of the Gibbs sampler for the hard-core model on a toroidal square lattice $\mathbb{Z}_n\times\mathbb{Z}_n$. Compare the evolution for small fugacities (say, $\lambda=0.25$) and large fugacities (say, $\lambda=25$). [April 16, 2013]
Summary
§
As an exercise in coupling arguments,
we gave a proof of Proposition 5.6 from the textbook,
which gives a bound $O(t^{-1/2})$ for the distance between
the distributions at times $t$ and $t+1$ for
the lazy version of any irreducible Markov chain.
The coupling argument reduced the problem to the problem of
estimating the tail of the distribution of the first hitting time of
site $0$ for a simple random walk on $\mathbb{Z}$
starting at site $1$.
We used the hitting time theorem and Stirling's approximation
to obtain the claimed bound.
We then used this as an excuse to prove two curious theorems about random walks on $\mathbb{Z}$: the hitting time theorem and the ballot theorem. For the hitting time theorem, we followed the elegant proof given in the paper
[April 9, 2013]
Summary
§
We continued with the study of the speed of mixing
in finite-state irreducible and aperiodic Markov chains.
As another example of using couplings to estimate the mixing speed, we considered a simple random walk on a path of length $n$ with loops at each end (Example 5.1 from the textbook). We found that the coupling time can be estimated by the first hitting time of position $n$ if the walk starts at position $0$. The expected value of this hitting time is $n(n+1)$, and the Markov inequality gives the bound $t_\text{mix}(\varepsilon)\leq n(n+1)/\varepsilon$ for the mixing time with accuracy $\varepsilon$. Being a bit more careful, we then noticed that the Markov bound for the coupling time was independent of the initial positions of the two coupled walks, and the unfortunate pedestrian principle lead us to the better bound $t_\text{mix}(2^{-r})\leq 2rn(n+1)$. We then studied the evolution of distance from stationarity $d(t)$ (and the related function $\bar{d}(t)$) for arbitrary finite-state irreducible and aperiodic Markov chains and verified that it decays exponentially. As a corollary, we found that in general, $t_\text{mix}(2^{-r})\leq r\, t_\text{mix}(1/4)$. Interpretation: the mixing time $t_\text{mix}(1/4)$ (or $t_\text{mix}(\varepsilon_0)$ for any fixed constant $\varepsilon_0<1/2$) can be seen as the time scale of the exponential decay of distance from stationarity. [April 2, 2013]
Summary
§
We talked about the speed of convergence to equilibrium
in (finite-state) irreducible, aperiodic chains.
The convergence theorem says that for a finite-state irreducible aperiodic Markov chain $(X_t)_{t\geq 0}$, the distribution of $X_t$ approaches (in total variation distance) to the stationary distribution. In most applications (say, in Monte Carlo algorithms, or for physical models), we want to know how long we should wait to be sure that the distribution of $X_t$ is within distance $\varepsilon>0$ from the stationary distribution. The smallest such time is called the mixing time (for accuracy $\varepsilon$). One approach to estimate the mixing time is to devise thought experiments in which two copies of the chain are run together (a coupling). One such thought experiment we had already used to prove the convergence theorem (independent before meeting, moving together once met). More sophisticated couplings (thought experiments) tailor-made for the particular model at hand could provide good estimates for the mixing time. For a lazy random walk on a cycle $\mathbb{Z}_n$, we devised a simple coupling that gave the upper bound $\frac{n^2}{4\varepsilon}$ for the mixing time. Similarly, for a lazy random walk on a $d$-dimensional torus $\mathbb{Z}_n^d$, we obtained the mixing time $\frac{n^2 d^2}{4\varepsilon}$. The way these bounds depend on $\varepsilon$ (inversely proportional) does not seem appealing, because we know that the distance from stationarity decreases exponentially. We shall improve this next time. [March 26, 2013]
Summary
§
Three important examples where the question of recurrence vs. transience shows up:
Assignment
For the basic coupling used in the convergence theorem, the diagonal set is a recurrent communication class that is accessible from all the other states.
Assignment 6
--- due: April 16, 2013
[March 19, 2013]
Summary
§
We gave a (partial) proof of the convergence theorem.
AssignmentWe mentioned several possible interpretations of the convergence of a sequence of distributions to another distribution. Two examples:
For two simple examples (random walk on an odd cycle, and the lazy Ehrenfest model), we saw that simple arguments using coupling two copies of the chain can be used to show convergence to the unique stationary distribution. We then formulated a general coupling argument that proves convergence if we know that the two coupled chains eventually merge. [March 12, 2013]
Summary
§
We talked about Monte Carlo simulations.
We introduced the Ising model of ferromagnetism as a guiding example. The macroscopic state of the system is described by a probability distribution (the Boltzmann distribution) on the configuration space. In order to study the model via simulations, one needs to be able to generate random samples from this distribution. We mentioned few other examples (from statistical mechanics and computer science) where one needs to generate random samples from a prescribed distribution. One approach is to design an ergodic Markov chain whose stationary distribution is the desired distribution, and run the chain for a long period. A Gibbs sampler chain does the trick. Another construction (which we did not find time to discuss about) is the Metropolis chain. A Gibbs sampler for the Ising model [Mathematica notebook (requires Version 9)] [March 5, 2013]
Summary
§
We mentioned without proof the convergence theorem (a.k.a. mixing theorem):
Assignment
Any Markov chain (on a countable state space) that is
irreducible and aperiodic and has a stationary distribution approaches
that stationary distribution.
We discussed the problem of identifying the stationary distribution(s)
of a Markov chain.
In many cases, the symmetries of the chain allow us to find a stationary distribution
without pain. We talked about two types of such symmetries:
Finally, we asked:
Do we get a Markov chain if we reverse the direction of time
in a bi-infinite Markov chain?
We found that the time-reversal of any bi-infinite Markov chain in equilibrium
is itself a Markov chain.
Assignment 4 (slightly updated)
--- due: March 19, 2013
[February 26, 2013]
Summary
§
Playing with Markov chains in Mathematica
[Notebook
(requires Version 9) or
CDF or
PDF]
We discussed the problem of uniqueness of stationary distribution.
[February 19, 2013]
Summary
§
We discussed the problem of existence of stationary distribution.
AssignmentWe first made the setting explicit:
Assignment 3
--- due: March 5, 2013 (preferably)
[February 12, 2013]
Summary
§
We considered two examples:
Assignment
Assignment 2
--- due: February 19, 2013
(extended till: preferably February 26, 2013)
Errata:
an extra condition is missing in Problem 1(a). For now, you could assume that
$N$ is independent of $Z_i$
and drop part (b). Later we shall prove a version of Wald's equation that covers part (b).
[February 5, 2013]
Summary
§
We discussed few examples:
Assignment
Assignment 1
--- due: February 19, 2013
|
|
Last Update: June 2, 2013
|