Universality of Uniform

Let $F$ be a CDF which is a continuous function and strictly increasing on the support of the distribution. This ensures that the inverse function $F^{-1}$ exists, as a function from $(0,1)$ to $\mathbb{R}$. We then have the following results.

  • Let $U \sim \mathrm{Unif}(0,1)$ and $X = F^{-1}(U)$. Then $X$ is an r.v. with CDF $F$.
  • Let $X$ be an r.v. with CDF $F$. Then $F(X) \sim \mathrm{Unif}(0,1)$.
  • This means that to sample any r.v. with CDF $F$, we uniformly pick a value $u$ from $\mathrm{Unif}(0,1)$ and then find the value $x$ with $F(x) = u$, i.e. we set $x = F^{-1}(u)$ (see the sketch after this list).
  • If $X$ is a discrete r.v. with distinct possible values $x_1, x_2, \dots$, then its expected value is $E(X) = \sum_j x_j P(X = x_j)$.
  • For any r.v.s $X$ and $Y$, and any constant $c$, $E(X + Y) = E(X) + E(Y)$ and $E(cX) = cE(X)$ (linearity of expectation).
  • Law of the unconscious statistician (LOTUS): If $X$ is a discrete r.v. and $g$ is a function from $\mathbb{R}$ to $\mathbb{R}$, then $E(g(X)) = \sum_x g(x) P(X = x)$, where the sum is taken over all possible values $x$ of $X$.
  • The variance of an r.v. $X$ is $\mathrm{Var}(X) = E(X - EX)^2$. The square root of the variance is called the standard deviation (SD): $\mathrm{SD}(X) = \sqrt{\mathrm{Var}(X)}$. Recall that when we write $E(X - EX)^2$, we mean the expectation of the random variable $(X - EX)^2$, not $(E(X - EX))^2$ (which is 0 by linearity).
  • For any r.v. $X$, $\mathrm{Var}(X) = E(X^2) - (EX)^2$.
  • For $X \sim \mathrm{Bin}(n, p)$, we can write $X = I_1 + I_2 + \cdots + I_n$ with $I_1, \dots, I_n$ i.i.d. $\mathrm{Bern}(p)$, and then $E(X) = np$ and $\mathrm{Var}(X) = np(1 - p)$.
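As a concrete sanity check on the inverse-CDF recipe above (and on the mean/variance formulas), here is a minimal Python sketch. It assumes an Exponential distribution with rate $\lambda = 2$ purely for illustration: it builds draws as $X = F^{-1}(U) = -\ln(1-U)/\lambda$ from $\mathrm{Unif}(0,1)$ draws and compares the sample mean and variance with the theoretical values $1/\lambda$ and $1/\lambda^2$.

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 2.0        # rate of the Exponential; illustrative choice
n = 100_000

# Universality of the Uniform: X = F^{-1}(U) with U ~ Unif(0, 1).
# For Expo(lam), F(x) = 1 - exp(-lam * x), so F^{-1}(u) = -ln(1 - u) / lam.
u = rng.uniform(0.0, 1.0, size=n)
x = -np.log(1.0 - u) / lam

# Compare sample mean and variance with E(X) = 1/lam and Var(X) = 1/lam^2.
print("sample mean:", x.mean(), " theory:", 1 / lam)
print("sample var :", x.var(), " theory:", 1 / lam**2)

# Var(X) = E(X^2) - (EX)^2, with E(X^2) estimated LOTUS-style from the draws.
print("E(X^2) - (EX)^2:", np.mean(x**2) - x.mean() ** 2)
```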

Law of Large Numbers

Let $X_1, X_2, \dots$ be i.i.d. with mean $\mu$ and variance $\sigma^2$. The law of large numbers says that as $n \to \infty$, the sample mean $\bar{X}_n = \frac{1}{n}(X_1 + \cdots + X_n)$ converges to the constant $\mu$ (with probability 1).
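The convergence is easy to see numerically. The sketch below (assuming $\mathrm{Unif}(0,1)$ draws, so $\mu = 0.5$; the sample sizes are arbitrary choices) tracks the running sample mean as $n$ grows.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=1_000_000)   # i.i.d. draws with mu = 0.5

# Running sample mean X_bar_n = (X_1 + ... + X_n) / n for every n.
running_mean = np.cumsum(x) / np.arange(1, x.size + 1)

for n in (10, 1_000, 100_000, 1_000_000):
    print(f"n = {n:>9,}: sample mean = {running_mean[n - 1]:.5f}  (mu = 0.5)")
```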

  • But what is its distribution along the way to becoming a constant? This is addressed by the central limit theorem (CLT), which, as its name suggests, is a limit theorem of central importance in statistics.

Central Limit Theorem

For large $n$, the distribution of $\bar{X}_n$ is approximately $\mathcal{N}(\mu, \sigma^2/n)$. Of course, we already knew from properties of expectation and variance that $\bar{X}_n$ has mean $\mu$ and variance $\sigma^2/n$; the central limit theorem gives us the additional information that $\bar{X}_n$ is approximately Normal with said mean and variance.

  • The CLT states that for large $n$, the distribution of $\bar{X}_n$ after standardization approaches a standard Normal distribution. By standardization, we mean that we subtract $\mu$, the expected value of $\bar{X}_n$, and divide by $\sigma/\sqrt{n}$, the standard deviation of $\bar{X}_n$: $\frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} \to \mathcal{N}(0, 1)$ in distribution as $n \to \infty$.
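To see the standardization at work, the sketch below repeatedly draws samples of size $n = 50$ from $\mathrm{Expo}(1)$ (an illustrative choice with $\mu = \sigma = 1$; the sample size and repetition count are also illustrative), forms $(\bar{X}_n - \mu)/(\sigma/\sqrt{n})$, and checks that the standardized values have roughly mean 0, variance 1, and standard-Normal probabilities.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n, reps = 50, 100_000      # sample size and number of repetitions; illustrative
mu, sigma = 1.0, 1.0       # Expo(1) has mean 1 and standard deviation 1

samples = rng.exponential(scale=1.0, size=(reps, n))
xbar = samples.mean(axis=1)

# Standardize: subtract mu and divide by sigma / sqrt(n).
z = (xbar - mu) / (sigma / np.sqrt(n))

print("mean of z:", z.mean())                       # close to 0
print("var  of z:", z.var())                        # close to 1
print("P(Z <= 1), empirical:", np.mean(z <= 1.0))   # close to Phi(1)
print("P(Z <= 1), Normal   :", norm.cdf(1.0))       # about 0.8413
```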

Markov Chain

A sequence of random variables $X_0, X_1, X_2, \dots$ taking values in the state space $\{1, 2, \dots, M\}$ is called a Markov chain if for all $n \geq 0$,

$$P(X_{n+1} = j \mid X_n = i, X_{n-1} = i_{n-1}, \dots, X_0 = i_0) = P(X_{n+1} = j \mid X_n = i).$$

The quantity $P(X_{n+1} = j \mid X_n = i)$ is called the transition probability from state $i$ to state $j$. (Subscripts indicate the timestep; values in the state space indicate states.)

  • When referring to a Markov chain, we will implicitly assume that it is time-homogeneous, which means that the transition probability $P(X_{n+1} = j \mid X_n = i)$ is the same for all times $n$. But care is needed, since the literature is not consistent about whether to say "time-homogeneous Markov chain" or just "Markov chain".
  • The above condition is called the Markov property, and it says that given the entire past history $X_0, X_1, \dots, X_n$, only the most recent term, $X_n$, matters for predicting $X_{n+1}$.
  • The Markov property greatly simplifies computations of conditional probability: instead of having to condition on the entire past, we only need to condition on the most recent value (see the simulation sketch below).
  • For more on Markov chains, see (Joseph Chang, 2007).
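As a small illustration of a time-homogeneous chain, the sketch below simulates a two-state chain from a transition matrix $Q$ (the matrix entries are illustrative, not from these notes); each step samples $X_{n+1}$ using only the row of $Q$ for the current state, which is the Markov property expressed in code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Transition matrix Q: Q[i, j] = P(X_{n+1} = j | X_n = i).  Illustrative values.
Q = np.array([[0.9, 0.1],
              [0.5, 0.5]])

def simulate_chain(Q, x0, steps, rng):
    """Simulate a time-homogeneous Markov chain started at state x0."""
    path = [x0]
    for _ in range(steps):
        current = path[-1]
        # Markov property: the next state depends only on the current state,
        # so we draw it from row Q[current] regardless of the earlier history.
        path.append(rng.choice(len(Q), p=Q[current]))
    return path

print("sample path:", simulate_chain(Q, x0=0, steps=20, rng=rng))

# Long-run fraction of time spent in state 0 (compare with the stationary
# distribution of Q, which is (5/6, 1/6) for this matrix).
long_path = np.array(simulate_chain(Q, x0=0, steps=100_000, rng=rng))
print("fraction of time in state 0:", np.mean(long_path == 0))
```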
Joseph Chang. (2007). Markov Chains - from Stochastic Processes. http://www.stat.yale.edu/~pollard/Courses/251.spring2013/Handouts/Chang-MarkovChains.pdf