Universality of Uniform
Let $F$ be a CDF which is a continuous function and strictly increasing on the support of the distribution. This ensures that the inverse function $F^{-1}$ exists, as a function from $(0,1)$ to $\mathbb{R}$. We then have the following results.
- Let $U \sim \mathrm{Unif}(0,1)$ and $X = F^{-1}(U)$. Then $X$ is an r.v. with CDF $F$.
- Let $X$ be an r.v. with CDF $F$. Then $F(X) \sim \mathrm{Unif}(0,1)$.
- This means that to sample an r.v. with CDF $F$, we can draw $u \sim \mathrm{Unif}(0,1)$ and then find the value $x$ at which the CDF equals $u$, i.e., set $x = F^{-1}(u)$ (see the sketch below).
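As a concrete illustration of this recipe, here is a minimal sketch in Python (NumPy assumed; the Expo(1) example and the function name are illustrative choices, not from the source). The Expo(1) CDF is $F(x) = 1 - e^{-x}$, so $F^{-1}(u) = -\log(1-u)$, and pushing uniform draws through $F^{-1}$ yields Expo(1) samples.

```python
import numpy as np

def expo_cdf_inverse(u):
    """Inverse of the Expo(1) CDF F(x) = 1 - exp(-x)."""
    return -np.log(1 - u)

rng = np.random.default_rng(0)
u = rng.uniform(0, 1, size=100_000)   # step 1: draw u ~ Unif(0,1)
x = expo_cdf_inverse(u)               # step 2: push through F^{-1}

# The resulting samples should match the Expo(1) mean and variance (both 1).
print(x.mean(), x.var())
```

The same two-step recipe works for any CDF that is continuous and strictly increasing on its support, as long as $F^{-1}$ can be evaluated.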
Expectation and Variance
- If $X$ is a discrete r.v. with distinct possible values $x_1, x_2, \dots$, then its expected value is $E(X) = \sum_j x_j P(X = x_j)$.
- Linearity of expectation: for any r.v.s $X, Y$ and any constant $c$, $E(X + Y) = E(X) + E(Y)$ and $E(cX) = cE(X)$.
- Law of the unconscious statistician (LOTUS): If $X$ is a discrete r.v. and $g$ is a function from $\mathbb{R}$ to $\mathbb{R}$, then $E(g(X)) = \sum_x g(x) P(X = x)$, where the sum is taken over all possible values $x$ of $X$.
- The variance of an r.v. $X$ is $\mathrm{Var}(X) = E(X - EX)^2$. The square root of the variance is called the standard deviation (SD): $\mathrm{SD}(X) = \sqrt{\mathrm{Var}(X)}$. Recall that when we write $E(X - EX)^2$, we mean the expectation of the random variable $(X - EX)^2$, not $(E(X - EX))^2$ (which is 0 by linearity).
- For any r.v. $X$, $\mathrm{Var}(X) = E(X^2) - (EX)^2$.
- For $X \sim \mathrm{Bin}(n, p)$, we can write $X = I_1 + \dots + I_n$ with $I_1, \dots, I_n \sim \mathrm{Bern}(p)$ i.i.d., and then $\mathrm{Var}(X) = \mathrm{Var}(I_1) + \dots + \mathrm{Var}(I_n) = np(1-p)$ (see the numerical check below).
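These facts lend themselves to a quick numerical sanity check. The sketch below (Python with NumPy, chosen here for illustration; the specific values $n = 10$, $p = 0.3$ are arbitrary) computes $E(X)$ and $E(X^2)$ for a Binomial via LOTUS, applies the shortcut $\mathrm{Var}(X) = E(X^2) - (EX)^2$, and then checks the result against a simulation of $X$ as a sum of i.i.d. Bernoulli indicators.

```python
import numpy as np
from math import comb

n, p = 10, 0.3
ks = np.arange(n + 1)
# Exact Binomial PMF: P(X = k) = C(n, k) p^k (1-p)^(n-k).
pmf = np.array([comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)])

EX  = np.sum(ks * pmf)        # E(X) = sum_k k P(X = k)
EX2 = np.sum(ks**2 * pmf)     # LOTUS with g(x) = x^2
var = EX2 - EX**2             # Var(X) = E(X^2) - (EX)^2

print(EX, n * p)              # both ~ 3.0
print(var, n * p * (1 - p))   # both ~ 2.1

# Same thing via the indicator decomposition X = I_1 + ... + I_n,
# simulated by summing n i.i.d. Bern(p) draws.
rng = np.random.default_rng(0)
X = rng.binomial(1, p, size=(200_000, n)).sum(axis=1)
print(X.mean(), X.var())      # close to 3.0 and 2.1
```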
Law of Large Numbers
Let $X_1, X_2, \dots$ be i.i.d. with mean $\mu$ and variance $\sigma^2$. The law of large numbers says that as $n \to \infty$, the sample mean $\bar{X}_n = \frac{1}{n}\sum_{j=1}^n X_j$ converges to the constant $\mu$ (with probability 1).
- But what is its distribution along the way to becoming a constant? This is addressed by the central limit theorem (CLT), which, as its name suggests, is a limit theorem of central importance in statistics.
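One way to see the law of large numbers concretely is to track the running sample mean of simulated draws. The sketch below (Python with NumPy, used here only for illustration) does this for i.i.d. Expo(1) draws, whose mean is $\mu = 1$.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(scale=1.0, size=100_000)   # i.i.d. Expo(1) draws, mu = 1

# Running sample means Xbar_n = (X_1 + ... + X_n) / n.
running_mean = np.cumsum(x) / np.arange(1, x.size + 1)

for n in (10, 100, 10_000, 100_000):
    print(n, running_mean[n - 1])   # drifts toward mu = 1 as n grows
```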
Central Limit Theorem
For large $n$, the distribution of $\bar{X}_n$ is approximately $\mathcal{N}(\mu, \sigma^2/n)$. Of course, we already knew from properties of expectation and variance that $\bar{X}_n$ has mean $\mu$ and variance $\sigma^2/n$; the central limit theorem gives us the additional information that $\bar{X}_n$ is approximately Normal with said mean and variance.
- The CLT states that for large $n$, the distribution of $\bar{X}_n$ after standardization approaches a standard Normal distribution. By standardization, we mean that we subtract $\mu$, the expected value of $\bar{X}_n$, and divide by $\sigma/\sqrt{n}$, the standard deviation of $\bar{X}_n$.
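A rough empirical check of this (Python with NumPy, again assumed for illustration; Expo(1) is chosen because it is clearly non-Normal, with $\mu = \sigma = 1$): standardize many independent sample means and compare their behavior to $\mathcal{N}(0, 1)$.

```python
import numpy as np

rng = np.random.default_rng(2)
mu, sigma, n, reps = 1.0, 1.0, 50, 100_000

# reps independent sample means, each from n i.i.d. Expo(1) draws.
xbar = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)

# Standardize: subtract mu and divide by the SD of Xbar_n, sigma / sqrt(n).
z = (xbar - mu) / (sigma / np.sqrt(n))

# If the CLT approximation is good, z behaves like N(0, 1).
print(z.mean(), z.std())             # close to 0 and 1
print(np.mean(np.abs(z) < 1.96))     # close to 0.95
```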
Markov Chain
A sequence of random variables $X_0, X_1, X_2, \dots$ taking values in the state space $\{1, 2, \dots, M\}$ is called a Markov chain if for all $n \geq 0$, $P(X_{n+1} = j \mid X_n = i, X_{n-1} = i_{n-1}, \dots, X_0 = i_0) = P(X_{n+1} = j \mid X_n = i)$.
The quantity $q_{ij} = P(X_{n+1} = j \mid X_n = i)$ is called the transition probability from state $i$ to state $j$ (subscripts on $X$ indicate the timestep; the values $i, j$ in the state space indicate states).
- When referring to a Markov chain we will implicitly assume that it is time-homogeneous, which means that the transition probability $P(X_{n+1} = j \mid X_n = i)$ is the same for all times $n$. But care is needed when reading other sources, since the literature is not consistent about whether to say "time-homogeneous Markov chain" or just "Markov chain".
- The above condition is called the Markov property, and it says that given the entire past history $X_0, X_1, \dots, X_n$, only the most recent term, $X_n$, matters for predicting $X_{n+1}$.
- The Markov property greatly simplifies computations of conditional probability: instead of having to condition on the entire past, we only need to condition on the most recent value (see the simulation sketch below).
- For more on Markov chains, see (Joseph Chang, 2007).
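To make transition probabilities and the Markov property concrete, here is a minimal simulation sketch (Python with NumPy; the two-state transition matrix is a hypothetical example, not taken from the source or from Chang's notes). It simulates a time-homogeneous chain from a transition matrix $Q$ with entries $q_{ij} = P(X_{n+1} = j \mid X_n = i)$ and then recovers $Q$ empirically from the observed one-step transitions.

```python
import numpy as np

# Hypothetical 2-state transition matrix (states 0 and 1);
# Q[i, j] = P(X_{n+1} = j | X_n = i), so each row sums to 1.
Q = np.array([[0.9, 0.1],
              [0.5, 0.5]])

rng = np.random.default_rng(3)
steps = 100_000
path = np.empty(steps, dtype=int)
path[0] = 0
for n in range(steps - 1):
    # Markov property: the next state depends only on the current state.
    path[n + 1] = rng.choice(2, p=Q[path[n]])

# Estimate q_ij by counting observed i -> j transitions along the path.
est = np.zeros((2, 2))
for i in range(2):
    from_i = path[:-1] == i
    for j in range(2):
        est[i, j] = np.mean(path[1:][from_i] == j)

print(est)   # close to Q
```

Because the chain is time-homogeneous, the empirical estimates can pool transitions from all timesteps $n$.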