The Quick Start Guide To Probability


Value: 8/10

Published or Updated on July 5, 2023

These notes follow along with Grinstead and Snell's Introduction to Probability. They are short and concise, with minimal written-out explanations; the focus is on how the mathematical notation transfers to real-world examples.

Table of Contents:

  1. Discrete Probability Distributions
  2. Continuous Probability Densities
  3. Combinatorics
  4. Conditional Probability

1. Discrete Probability Distributions

If the value of a variable $X$ is determined by chance, we call it a random variable (rv).

The sample space ($\Omega$) is the set of possible values for $X$.

The elements of $\Omega$ are called outcomes.

A subset $E \subset \Omega$ is called an event.

If the sample space is finite or countably infinite then the rv is discrete.

If $X$ is an rv with sample space $\Omega$, a probability distribution function (pdf) for $X$ is a function $m:\Omega \rightarrow \mathbb{R}$ such that,

  1. $m(\omega) \geq 0$ for each $\omega \in\Omega$
  2. $\sum_{\omega \in \Omega}m(\omega)=1$

For any event $E \subset \Omega$, we define the probability of $E$ to be,

$$P(E)=\sum_{\omega \in E}m(\omega)$$

Example: One Die Roll

Let $X$ be the random variable representing one die roll. What is the probability that a roll of the die will have a value $\leq 4$?

Solution,

Things we know:

$\Omega=\{1,2,3,4,5,6 \}$

$m(1)=m(2)=m(3)=m(4)=m(5)=m(6)=\frac{1}{6}$

The probability that a roll is less than or equal to 4 is,

$P(X \leq 4) = m(1)+m(2)+m(3)+m(4) = \frac{4}{6} = \frac{2}{3}$

Solution with different notation,

Define $E=\{1,2,3,4 \}$

$P(E)= m(1)+m(2)+m(3)+m(4) = \frac{2}{3}$
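
A minimal Python sketch of the same computation: store the mass function $m$ as a dictionary and sum it over the event $E$ (the helper name `prob` is just illustrative).

```python
from fractions import Fraction

# Mass function for one fair die roll: m(ω) = 1/6 for each ω in Ω.
m = {omega: Fraction(1, 6) for omega in range(1, 7)}

def prob(event, m):
    """P(E) = sum of m(ω) over ω in E."""
    return sum(m[omega] for omega in event)

E = {1, 2, 3, 4}
print(prob(E, m))  # 2/3
```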

Example: One Coin Flip

$\Omega=\{H,T \}$

$m(H)=m(T)=\frac{1}{2}$

Example: Two Coin Flips

$\Omega=\{HH,TT,HT,TH \}$

$m(HH)=m(TT)=m(HT)=m(TH)= \frac{1}{4}$

Let $E$ be the event that at least one heads comes up.

$P(E)=m(HH)+m(HT)+m(TH)$

$P(E)=\frac{1}{4}+\frac{1}{4}+\frac{1}{4}=\frac{3}{4}$

Let $F$ be the event that heads comes up on the first toss.

$P(F)= m(HH)+m(HT)$

$P(F)=\frac{1}{4}+\frac{1}{4}=\frac{1}{2}$

Properties

Thm 1.1:

Given $(X,\Omega, m)$, we have

  1. $P(E) \geq 0, \hspace{.5cm }\forall E \subset \Omega$
  2. $P(\Omega)=1$
  3. If $E \subset F \subset \Omega$, then $P(E) \leq P(F)$
  4. If $E \land F=\emptyset$ (disjoint), then $P(E \lor F)=P(E) +P(F)$
  5. $P(\neg E)=1-P(E)$

Thm 1.2:

If $E_1,...,E_n$ is a collection of pairwise disjoint subsets of $\Omega$, then $$P(E_1\cup...\cup E_n)=\sum_{i=1}^nP(E_i)$$

Thm 1.3:

Let $E_1,...,E_n$ be pairwise disjoint events with $\Omega=E_1 \cup...\cup E_n$. Then for any event $E$,

$$P(E)=\sum_{i=1}^nP(E \cap E_i)$$

Thm 1.4:

For any events $E, F \subset \Omega$ we have,

$$P(E\cup F) = P(E) + P(F) - P(E \cap F)$$

Example

Let $A$ and $B$ be events such that $P(A \cap B)=\frac{1}{4}, P(\neg A)=\frac{1}{3}, P(B)=\frac{1}{2}$. What is $P(A \cup B)$?

Solution

Things we know:

$P(A) = \frac{2}{3} \hspace{1cm} P(B)=\frac{1}{2}$

$P( \neg A) = \frac{1}{3} \hspace{.68cm} P(\neg B)=\frac{1}{2}$

$P(A \cap B)=\frac{1}{4}$

Applying thm 1.4,

$P(A \cup B) = P(A) + P(B) - P(A \cap B)$

$= \frac{2}{3} + \frac{1}{2} - \frac{1}{4} = \frac{11}{12}$
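
As a quick sanity check of the arithmetic, here is the same computation with exact fractions (purely illustrative):

```python
from fractions import Fraction

P_A = 1 - Fraction(1, 3)      # P(A) = 1 - P(¬A) = 2/3
P_B = Fraction(1, 2)
P_A_and_B = Fraction(1, 4)

# Thm 1.4: P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
print(P_A + P_B - P_A_and_B)  # 11/12
```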

Tree Diagrams

If an experiment is repeated, it's helpful to think of a new sample space consisting of paths through a tree. In general, given a probability space $(X, \Omega, m)$, repeating the experiment $n$ times can be represented by a tree of depth $n$ whose paths are the possible outcome sequences.

Example Showing Three Coin Flips
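
Since the tree itself is easiest to see by enumeration, here is a small sketch that lists the paths for three flips of a fair coin and assigns each path the product of its branch probabilities (every branch probability is $\frac{1}{2}$ under the fair-coin assumption):

```python
from itertools import product
from fractions import Fraction

branch = {"H": Fraction(1, 2), "T": Fraction(1, 2)}  # fair coin

# Each path through the tree is a sequence of three branch outcomes.
paths = list(product("HT", repeat=3))

# m(ω) = product of the branch probabilities along the path ω.
m = {}
for path in paths:
    p = Fraction(1)
    for step in path:
        p *= branch[step]
    m[path] = p

print(len(paths))          # 8 paths
print(sum(m.values()))     # 1
print(m[("H", "T", "H")])  # 1/8
```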

Odds Ratio

If we are willing to bet that event $E$ occurs at "r to 1 odds" we mean that:

$$P(E)=rP(\neg E)$$

Since $P(\neg E)= 1-P(E)$, we can plug in and solve,

$$P(E)=r(1-P(E))$$

$$\Leftrightarrow P(E)=\frac{r}{r+1}$$

$$\Leftrightarrow r=\frac{P(E)}{1-P(E)}$$

Example

Georgia has a 71% chance to win the college football championship game. Represent this in an odds ratio format.

Solution

$r=\frac{.71}{1-.71}=2.45$ to $1$ odds Georgia will win.

General Case

If we are willing to bet that event $E$ occurs at "r to s odds" we mean that $sP(E)=rP(\neg E)$; equivalently,

$$\Leftrightarrow P(E)=\frac{\frac{r}{s}}{\frac{r}{s}+1}$$

$$\Leftrightarrow \frac{r}{s}=\frac{P(E)}{1-P(E)}$$
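
A short sketch of the two conversions (the function names are illustrative):

```python
def prob_to_odds(p):
    """Return r such that the event is 'r to 1' odds: r = P(E) / (1 - P(E))."""
    return p / (1 - p)

def odds_to_prob(r, s=1):
    """Return P(E) for 'r to s' odds: P(E) = (r/s) / (r/s + 1)."""
    ratio = r / s
    return ratio / (ratio + 1)

print(round(prob_to_odds(0.71), 2))   # 2.45 (the Georgia example)
print(round(odds_to_prob(2.45), 3))   # ≈ 0.71
```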

2. Continuous Probability Densities

Up until now we have talked about modeling experiments where the sample space is finite or countably infinite. Now we consider the more general continuous case.

Suppose that $\Omega \subset \mathbb{R}^n$, and we have a function $\mu:\Omega \rightarrow \mathbb{R}$ such that:

1. $\mu(\omega) \geq 0$ for all $\omega \in \Omega$

2. $\int_{\Omega}\mu(\omega)d\omega=1$

We say that $\mu$ is a pdf for a continuous rv $X$ with outcomes in $\Omega$ if for each subset $E \subset \Omega$:

$$P(X \in E)=\int_{E}\mu(\omega)d\omega $$

where the integral is defined.

Example

1) We construct a spinner and take $X$ to be the angle in radians of the final position of the arrow after one spin. Find the probability of landing in an interval $(a,b) \subset [0,2\pi]$.

Solution

$\Omega=[0,2\pi]$

$\mu(\omega)=\frac{1}{2\pi}$. This choice of density function assumes every outcome is equally likely. For any interval $(a,b) \subset [0,2\pi]$ of angles,

$$P(X \in (a,b))=\int_a^b\frac{1}{2\pi}d\omega$$

$$=\frac{1}{2\pi}(b-a)$$
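
A quick Monte Carlo check of the spinner density against the closed form (the interval and sample size are arbitrary choices for illustration):

```python
import math
import random

a, b = 1.0, 2.5                      # any interval (a, b) ⊂ [0, 2π]
exact = (b - a) / (2 * math.pi)      # P(X ∈ (a, b)) = (b - a) / (2π)

random.seed(0)
n = 100_000
hits = sum(a < random.uniform(0, 2 * math.pi) < b for _ in range(n))

print(exact)      # ≈ 0.2387
print(hits / n)   # close to the exact value
```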

2) Suppose we raise a circular umbrella while it is raining. Let $X$ be the position at which the next raindrop hits the umbrella.

Things we know:

$\Omega=$ the unit disk

$\mu(\omega)=\frac{1}{\pi}$

Let's check that our properties match our density function and outcome space:

Recall from polar coordinates that $dxdy=rdrd\theta$

$$P(X \in \Omega)=\int_\Omega\frac{1}{\pi}dxdy$$

$$=\int_0^{2\pi} \int_0^1 \frac{1}{\pi} rdrd\theta$$

$$=\frac{1}{\pi} \int_0^{2\pi} \int_0^1 rdrd\theta$$

$$=\frac{1}{\pi} \int_0^{2\pi} \frac{r^2}{2} |_{r=0}^{r=1} \hspace{.5cm}d\theta$$

$$=\frac{1}{\pi} \int_0^{2\pi} \frac{1}{2} d\theta=\frac{2\pi}{2\pi}=1$$

Integrating our density function over the entire outcome space $\Omega$, we get 1 as desired.

Define the event $E=\{X$ is within $R$ of the center of the disk$\}$ for some $0 \leq R \leq 1$.

$$P(X \in E)=\int_E\frac{1}{\pi}dxdy$$

$$=\int_0^{2\pi} \int_0^R \frac{1}{\pi} rdrd\theta$$

$$=2\int_0^Rrdr=R^2$$

Now consider a new rv $Y$, the distance of $X$ from the center of the disk. This rv has sample space $\Omega=[0,1]$, but what is the right density?

Cumulative Distribution Function

Let $X$ be a random variable with $\Omega=\mathbb{R}$. We define the cumulative distribution function (cdf) of $X$ to be the function $F_X(x)$ such that, $$F_X(x)=P(X\leq x)$$

Property

Thm: If $X$ is a random variable with $\Omega=\mathbb{R}$ and density function $\mu(x)$, then the cdf $F_X(x)$ is,

$$F_X(x)=\int_{-\infty}^x \mu(t)dt$$

and

$$\frac{d}{dx}F_X(x)=\mu(x)$$

Continued Example 1

Consider the new rv $Y$, the distance of $X$ from the center of the disk. This rv has sample space $\Omega=[0,1]$, but what is the right density?

Observe from the umbrella example above,

$P(X$ is within $R$ of the center of the disk$)=R^2$

Thus

$P(Y \leq R)=R^2$

$F_Y(R)=R^2$ is the cdf;

Check using the thm,

$\mu_Y(R)=\frac{d}{dR}R^2=2R$

$\int_0^12RdR=1$ so $P(\Omega_Y)=1$ as desired
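
A Monte Carlo sketch that samples points uniformly on the unit disk (by rejection sampling from the enclosing square, consistent with the uniform density $\frac{1}{\pi}$) and checks $P(Y \leq R)=R^2$:

```python
import random

def sample_disk_point():
    """Rejection sampling: a uniform point in the unit disk."""
    while True:
        x, y = random.uniform(-1, 1), random.uniform(-1, 1)
        if x * x + y * y <= 1:
            return x, y

random.seed(0)
n = 100_000
R = 0.6
count = 0
for _ in range(n):
    x, y = sample_disk_point()
    if (x * x + y * y) ** 0.5 <= R:
        count += 1

print(count / n)  # should be close to R**2 = 0.36
```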

Example 2

Suppose $X$ is uniformly distributed in $[0,1]$. $Y=X^2$.

Find cdf and pdf of $Y$.

Solution

We find the cdf first:

$F_Y(\omega)=P(Y \leq \omega)=P(E_{\omega})$

$=P(X^2 \leq \omega)$

$=P(X \leq \sqrt{\omega})$

$= \int_0^{\sqrt{\omega}}1ds \hspace{1cm}$(probability density is 1 since the distribution is uniform)

$=\sqrt{\omega}$

Now we find the pdf. Note: pdf = (cdf)':

$\mu_Y(\omega)=(\sqrt{\omega})'$

$=\frac{1}{2\sqrt{\omega}}$
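
A small numerical check of $F_Y(\omega)=\sqrt{\omega}$ by simulation, and of the pdf by a finite difference of the empirical cdf (the evaluation point and step size are arbitrary):

```python
import random

random.seed(0)
n = 200_000
samples = [random.random() ** 2 for _ in range(n)]  # Y = X² with X uniform on [0, 1]

def F_Y(w):
    """Empirical cdf of Y at w."""
    return sum(y <= w for y in samples) / n

w = 0.25
print(F_Y(w))                                # ≈ sqrt(0.25) = 0.5
h = 0.01
print((F_Y(w + h) - F_Y(w - h)) / (2 * h))   # ≈ 1 / (2 * sqrt(0.25)) = 1.0
```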

Example 3

Suppose $X$ and $Y$ are chosen uniformly from $[0,1]$ and that $Z=X+Y$. Find the cdf and pdf of $Z$.

Solution

We find the cdf first:

$F_Z(\omega)=P(Z \leq \omega)=P(E_{\omega})$ $=P(X+Y \leq \omega)$

Notice we will have to define the cdf in a piecewise manner depending on the value of $\omega$; $F_Z(\omega)$ is the area of the region of the unit square where $x+y \leq \omega$.

$$F_Z(\omega)=\begin{cases}\frac{1}{2}\omega^2 & 0 \leq \omega \leq 1\\ 1-\frac{1}{2}(2-\omega)^2 & 1<\omega \leq 2\end{cases}$$

Now we find the pdf:

We know,

$X$ is distributed in $[0,1]$ with pdf $\mu_X(x)=1$

$Y$ is distributed in $[0,1]$ with pdf $\mu_Y(y)=1$

Since pdf = (cdf)' we know $Z$ will be distributed on $[0,2]$ with a piecewise pdf,

$$\mu_Z(\omega)=\begin{cases}\left(\frac{1}{2}\omega^2\right)'=\omega & 0 \leq \omega \leq 1\\ \left(1-\frac{1}{2}(2-\omega)^2\right)'=2-\omega & 1<\omega \leq 2\end{cases}$$
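
A Monte Carlo sketch of the triangular density of $Z=X+Y$ (bin width and sample count are arbitrary):

```python
import random

random.seed(0)
n = 200_000
z_samples = [random.random() + random.random() for _ in range(n)]

def density_estimate(w, h=0.02):
    """Fraction of samples in (w - h, w + h), divided by the bin width 2h."""
    return sum(w - h < z < w + h for z in z_samples) / (n * 2 * h)

print(density_estimate(0.5))  # ≈ ω     = 0.5
print(density_estimate(1.5))  # ≈ 2 - ω = 0.5
print(density_estimate(1.0))  # ≈ 1 (peak of the triangle)
```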

Survival Function

The survival function of a continuous random variable $X$ is given by:

$$S_X(t)=P(X>t)=1-F_X(t)=\int_t^{\infty}\mu(x)dx$$

  • This is the complement of the cdf function.
  • It gives the probability that the event has not occurred by duration t.
  • Another way to conceptualize: It gives the probability of being "alive" just before duration t.
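
For example, for the distance variable $Y$ above with cdf $F_Y(t)=t^2$ on $[0,1]$, the survival function is $S_Y(t)=1-t^2$. A one-line sketch:

```python
def survival(cdf, t):
    """S(t) = P(X > t) = 1 - F(t)."""
    return 1 - cdf(t)

F_Y = lambda t: t ** 2     # cdf of the distance-from-center example
print(survival(F_Y, 0.6))  # 1 - 0.36 = 0.64
```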

Appendix

CDF - the probability of an event, namely $P(X \leq x)$.

PDF - not the probability of any particular event; just a density function whose integrals over subsets of the outcome space give probabilities.

3. Combinatorics

Computing the probability of an event sometimes requires that we count the number of ways the event can occur. Enter new counting methods.

A product of sets $\Omega_1,...,\Omega_n$ is the set: $$\Omega=\Omega_1\times\Omega_2\times...\times \Omega_n=\{(\omega_1,...,\omega_n)|\omega_i \in \Omega_i\}$$

$\#\Omega$ denotes the number of elements in $\Omega$.

Pigeonhole Principle: If $f:\Omega_1 \rightarrow \Omega_2$ and $\#\Omega_2 < \#\Omega_1$, then $f$ is not one-to-one.

Falling Factorial $(n)_r$: $(n)_r=n\times(n-1)\times(n-2)\times...\times(n-r+1)$

Example: Birthday Problem

How many people do we need to have in a room to make the probability greater than $\frac{1}{2}$ that two people have the same birthday?

Solution

We know there are 365 possible birthdays for each person, thus $\#\Omega=365$.

We order the people from 1 to $r$.

A sample point $\omega$ is a sequence of $r$ birthdays, each chosen from the 365 possible days.

Thus there are $365^r$ possible sequences of birthdays.

We now must find the number of these sequences that have no duplicate birthdays (we compute the complement because it is easier to calculate).

We can choose any of the 365 days for the first element, then any of the remaining 364 days for the second, 363 for the third, and so on until we have made $r$ choices. For the $r$th choice, there will be $365-r+1$ possibilities. Hence the total number of sequences with no duplication is

$$365\cdot364\cdot...(365-r+1)$$

Assuming each sequence is equally likely,

$$p_r=\frac{365\cdot364\cdot...(365-r+1)}{(365)^r}$$

The numerator is the falling factorial $(n)_r$ defined above (here $n=365$), which is read as "n down r".

$$p_r=\frac{(365)_r}{(365)^r}$$

Testing values for r we get,

$$p_{22}=\frac{(365)_{22}}{(365)^{22}}=.524$$

$$p_{23}=\frac{(365)_{23}}{(365)^{23}}=.493$$

The probability for no duplication changes from greater than one half to less than one half as we move from 22 to 23 people. Thus we need 23 people in a room to ensure that the probability is greater than $\frac{1}{2}$ that two people have the same birthday.
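
A direct computation of $p_r=\frac{(365)_r}{365^r}$ (the probability of no shared birthday), confirming the crossover at 23 people:

```python
from fractions import Fraction

def p_no_duplicate(r, days=365):
    """p_r = (days)_r / days**r, the probability that r birthdays are all distinct."""
    p = Fraction(1)
    for i in range(r):
        p *= Fraction(days - i, days)
    return p

print(float(p_no_duplicate(22)))  # ≈ 0.524
print(float(p_no_duplicate(23)))  # ≈ 0.493
```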

Permutations

A permutation of a finite set $\Omega$ is a one-to-one mapping of $\Omega$ onto itself.

\#permutations$(\Omega)$ $=(\#\Omega)!$

- Ex: Given $\Omega=\{a,b,c\}$, a valid permutation $\sigma$ can be represented as:

- $(c,a,b)$

- $\sigma(a)=c, \sigma(b)=a, \sigma(c)=b$

- $c<a<b$

- All are equivalent.

A k-permutation is an ordering of $k$ elements of $\Omega$.

\#k-permutations$(\Omega)$ $=(\#\Omega)_k$

- Ex: A 2-permutation of $\{a,b,c\}$ is $b<c$.

If $\sigma$ is a permutation of $\Omega$ and $\sigma(\omega)=\omega$ for some $\omega \in \Omega$, then we say $\omega$ is a fixed point of the permutation.

- Ex: If $\Omega=\{1,2,3,4,5\}$, an example of a permutation $\sigma$ with a fixed point is $(3,1,4,2,5)$, where 5 is fixed.

If $\sigma$ has no fixed points it is called a derangement of $\Omega$.

Mind Boggling

If $\sigma \in$ Permutations$(\Omega)$ and all permutations are equally likely what is $P(\sigma$ is a derangement$)$?

A: approximately $\frac{1}{e}$ (see the derangement computation in the inclusion-exclusion section below).

Combinations

$\binom{n}{j}$ is the number of distinct $j$-element subsets of an $n$-element set. It is called a binomial coefficient.

- Ex: The subsets of $\{a,b,c\}$ are

- $\emptyset, \{a\},\{b\},\{c\},\{a,b\}, \{a,c\}, \{b,c\},\{a,b,c\}$

- $\binom{3}{0}=1,\hspace{.3cm} \binom{3}{1}=3, \hspace{.3cm}\binom{3}{2}=3,\hspace{.3cm} \binom{3}{3}=1$

- Notice: $\binom{3}{0}+ \binom{3}{1}+\binom{3}{2}+ \binom{3}{3}=2^3=8$

$$\binom{n}{j}=\frac{(n)_j}{j!}=\frac{n!}{j!(n-j)!}$$

See Wikipedia: Poker Hand Probabilities

Bernoulli trials are a sequence of $n$ independent chance experiments, each with probability of success $p$ and probability of failure $q=1-p$.

$$b(n,p,j)=\binom{n}{j}p^jq^{n-j}$$

The formula gives the probability of exactly $j$ successes in $n$ Bernoulli trials with success probability $p$.

The outcomes of a sequence of Bernoulli trials can be drawn as a tree; for three trials the outcomes are ordered triples of S's and F's. An outcome $\omega$ for the entire experiment is a path through the tree; for example, $\omega_7$ represents the outcome FFS. In general we assign the distribution function $m(\omega)$ to be the product of the branch probabilities along the path $\omega$.

Thus the probability that the three events occur, FFS, is the product of the probabilities for the individual events.

Example

1) A fair coin is tossed 6 times. What is the probability that exactly 3 heads turn up?

Solution

$b(6,\frac{1}{2},3)=\binom{6}{3}(\frac{1}{2})^3(\frac{1}{2})^{6-3}=\frac{6\cdot5\cdot4\cdot3\cdot2\cdot1}{(3\cdot2\cdot1)(3\cdot2\cdot1)}(\frac{1}{2})^3(\frac{1}{2})^3=0.3125$

2) A die is rolled 4 times. What is the probability that we roll exactly one 6?

Solution

$b(4,\frac{1}{6},1)=\binom{4}{1}(\frac{1}{6})^1(\frac{5}{6})^{4-1}=\frac{4\cdot3\cdot2\cdot1}{(1)(3\cdot2\cdot1)}(\frac{1}{6})^1(\frac{5}{6})^3=0.386$
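
The formula $b(n,p,j)$ as a short function, checked against both worked examples (using the standard library's `math.comb`):

```python
from math import comb

def b(n, p, j):
    """Probability of exactly j successes in n Bernoulli trials with success probability p."""
    return comb(n, j) * p**j * (1 - p) ** (n - j)

print(b(6, 1/2, 3))            # 0.3125
print(round(b(4, 1/6, 1), 3))  # 0.386
```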

Binomial Expansion

The quantity $(a+b)^n$ can be expressed in the form,

$$(a+b)^n=\sum_{j=0}^n\binom{n}{j}a^jb^{n-j}$$

See Pascal's Triangle

Recall Thm 1.4. The inclusion-exclusion principle below generalizes it to $n$ events; the binomial expansion above is used in its proof.

Inclusion-Exclusion Principle

Let $P$ be a probability distribution on a sample space $\Omega$, and let $A_1,A_2,...,A_n \subset \Omega$ be events. Then,

$$P(A_1 \lor A_2 \lor ...\lor A_n)=\sum_{i=1}^nP(A_i)-\sum_{1 \leq i < j \leq n}P(A_i \land A_j) + \sum_{1 \leq i < j < k \leq n}P(A_i \land A_j \land A_k)-...$$

proof

Suppose $\omega \in A_1 \lor...\lor A_n$.

Then $P(\omega)$ is added one time on the left. We will show it is added once on the right.

Assume $\omega$ is in exactly $k$ of the sets. Then its probability is added $k$ times to the first term, subtracted $\binom{k}{2}$ times in the second, added $\binom{k}{3}$ times, and so on. Thus the total number of times it is added is,

$$\binom{k}{1}-\binom{k}{2}+\binom{k}{3}-...+(-1)^{k-1}\binom{k}{k}$$

But,

$$0=(1-1)^k=\sum_{j=0}^k\binom{k}{j}(-1)^j=\binom{k}{0}-\sum_{j=1}^k\binom{k}{j}(-1)^{j-1}$$

So the alternating sum equals $\binom{k}{0}=1$, and the probability of $\omega$ is added exactly once on the right hand side as well.
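
A brute-force check of inclusion-exclusion on three small events under a uniform distribution on $\{1,...,12\}$ (the events are arbitrary illustrations):

```python
from fractions import Fraction
from itertools import combinations

omega = set(range(1, 13))
P = lambda E: Fraction(len(E), len(omega))   # uniform distribution on Ω

A = [{n for n in omega if n % 2 == 0},       # multiples of 2
     {n for n in omega if n % 3 == 0},       # multiples of 3
     {n for n in omega if n > 8}]            # {9, 10, 11, 12}

# Left-hand side: probability of the union.
lhs = P(set().union(*A))

# Right-hand side: alternating sum over non-empty subcollections.
rhs = Fraction(0)
for k in range(1, len(A) + 1):
    for combo in combinations(A, k):
        rhs += (-1) ** (k - 1) * P(set.intersection(*combo))

print(lhs, rhs, lhs == rhs)  # 3/4 3/4 True
```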

Example: Counting Derangements

Recall a permutation $\sigma$  is called a derangement if there are no fixed points.

We know there are $n!$ permutations of $\{1,...,n \}$. How many of these are derangements?

Solution

We proceed by computing the number of permutations with fixed points.

Let $A_i$ be the event that "i is fixed under this permutation".

Then,

$$P(A_1 \lor A_2 \lor ... \lor A_n)= \text{Probability that } \sigma \text{ has one or more fixed points}$$

Then,

$$P(A_i)=\frac{\#\{\sigma \,:\, \sigma(i)=i\}}{n!}=\frac{(n-1)!}{n!}=\frac{1}{n}$$

$$P(A_i \land A_j)=\frac{\#\{\sigma \,:\, \sigma(i)=i,\ \sigma(j)=j\}}{n!}=\frac{(n-2)!}{n!}=\frac{1}{n(n-1)}$$

So by the inclusion exclusion principle,

$$P(A_1 \lor A_2 \lor ... \lor A_n)=\binom{n}{1}\cdot\frac{1}{n}-\binom{n}{2}\cdot\frac{1}{n(n-1)}+\binom{n}{3}\cdot\frac{1}{n(n-1)(n-2)}-...$$

$$=1-\frac{(n)_2}{2!}\cdot\frac{1}{(n)_2}+\frac{(n)_3}{3!}\cdot\frac{1}{(n)_3}-...\pm\frac{1}{n!}$$

$$=1-P(\text{no fixed points})$$

So, $P(\text{no fixed points})=\frac{1}{2!}-\frac{1}{3!}+\frac{1}{4!}-...\pm\frac{1}{n!}$

$$=\sum_{k=0}^n\frac{(-1)^k}{k!} \approx \frac{1}{e}\approx 0.368\hspace{.5cm}\text{ for large } n$$

This is the truncated Taylor series for $e^{-1}$.
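
A check that the alternating series matches a brute-force count of derangements and approaches $\frac{1}{e}$ (small $n$ only, since the brute force enumerates all $n!$ permutations):

```python
import math
from itertools import permutations

def p_derangement_series(n):
    """P(no fixed points) = sum_{k=0}^{n} (-1)^k / k!"""
    return sum((-1) ** k / math.factorial(k) for k in range(n + 1))

def p_derangement_bruteforce(n):
    """Fraction of permutations of {0, ..., n-1} with no fixed point."""
    count = sum(all(sigma[i] != i for i in range(n))
                for sigma in permutations(range(n)))
    return count / math.factorial(n)

n = 7
print(p_derangement_series(n))      # ≈ 0.36786
print(p_derangement_bruteforce(n))  # the same value
print(1 / math.e)                   # ≈ 0.36788
```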

4. Conditional Probability

Suppose we assign a distribution function to a sample space and then learn that an event $E$ has occurred. How should we change the probabilities of the remaining events? Enter conditional probability. We call the new probability for an event $F$ the conditional probability of $F$ given $E$ and denote it by $P(F|E)$.

The conditional probability of $F$ occurring, given that $E$ occurs, is

$$P(F|E)=\frac{P(F\cap E)}{P(E)}$$

Given a space $\Omega$ of outcomes, a pdf $\mu$, and an event $E$, we can define the conditional distribution $\mu_E$ on $\Omega$ based on the event $E$, as sketched in code after the example below.

- Ex: $\Omega=\{1,2,3,4,5,6 \}, \mu=\frac{1}{6},...,\frac{1}{6}$. Given that $E=\{2\}$ and $F=\{2,4,6\}$:

- $\mu_E=(0,1,0,0,0,0)$

- $\mu_F=(0,\frac{1}{3},0,\frac{1}{3},0,\frac{1}{3})$
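
A sketch of how the conditional distribution is formed from $\mu$: zero out the mass outside the conditioning event and renormalize by its probability (helper names are illustrative).

```python
from fractions import Fraction

mu = {omega: Fraction(1, 6) for omega in range(1, 7)}   # fair die

def conditional(mu, F):
    """Zero out mass outside F, then renormalize by P(F)."""
    P_F = sum(mu[w] for w in F)
    return {w: (mu[w] / P_F if w in F else Fraction(0)) for w in mu}

print(conditional(mu, {2}))        # all mass on 2
print(conditional(mu, {2, 4, 6}))  # 1/3 on each of 2, 4, 6
```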

Let $E$ and $F$ be two events. We say they are **independent** if either:

1) Both events have positive probability and $P(E|F)=P(E)\text{ and } P(F|E)=P(F)$, or

2) At least one of the events has probability 0.

Properties

Another way to check for independence:

1) Two events $E$ and $F$ are **independent** iff $P(E\cap F)=P(E)\cdot P(F)$

proof $\Rightarrow$

We know $P(F|E)=\frac{P(F\cap E)}{P(E)}=P(F)$. So multiplying by $P(E)$ we get

$$P(F\cap E)=P(E)\cdot P(F)$$

$\Leftarrow$

We know $P(E\cap F)=P(E)\cdot P(F)$, so divide by $P(E)$ or $P(F)$ (whichever is positive) to recover the conditional form.

2) We say that $A_1,...,A_n$ are **mutually independent** if for every subset $\{A_{i_1},...,A_{i_m} \}$ of these events we have,

$$P(A_{i_1} \cap A_{i_2} \cap...\cap A_{i_m})=P(A_{i_1})\cdot P(A_{i_2})\cdot \cdot \cdot P(A_{i_m})$$

Example

Prove that if $E$ and $F$ are independent events, then $\neg E$ and $\neg F$ are independent.

Solution

We want to show $P(\neg E\cap \neg F)=P(\neg E)\cdot P(\neg F)$

$P(\neg E\cap \neg F)=P(\neg(E \cup F))$

$\hspace{2.76cm}=1-P(E\cup F)$

$\hspace{2.76cm}=1-(P(E)+P(F)-P(E\cap F))$

$\hspace{2.76cm}=1-P(E)-P(F)+P(E\cap F)$

Since $E$ and $F$ are independent,

$\hspace{2.76cm}=1-P(E)-P(F)+P(E)P(F)$

Factoring,

$\hspace{2.76cm}=(1-P(E))\cdot (1-P(F))$

$\hspace{2.76cm}=P(\neg E)\cdot P(\neg F)$

Joint Distribution Functions

Given random variables $X_1,...,X_n$ with outcomes $\Omega_1,...,\Omega_n$ and mass functions $\mu_1,...,\mu_n$, we can define the joint random variable $\vec{X}$ by,

$$\Omega=\Omega_1 \times\Omega_2 \times ...\times\Omega_n$$

with mass function,

$$\mu(\vec{\omega})=\text{probability of } (\omega_1,...,\omega_n), \text{ where }\omega_i\in\Omega_i$$

- If the random variables $X_1,...,X_n$ are mutually independent, then $\mu(\vec{\omega})=\mu_1(\omega_1)\mu_2(\omega_2)\cdot \cdot \cdot \mu_n(\omega_n)$

Given any distribution $\mu$ on $\Omega=\Omega_1 \times...\times \Omega_n$, we can define a marginal distribution on $\Omega_i$ by,

$$\mu_i(\omega_i)=P(\{(\omega_1,...,\omega_i,...\omega_n)|\omega_j \in \Omega_j \text{ for all } j \neq i\})$$

- If the random variables are independent, then the marginal distribution on $\Omega_i$ is equal to $\mu_i$.
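
A small sketch of a joint distribution built as a product of two independent dice, with the marginal recovered by summing the joint mass over the other coordinate (names are illustrative):

```python
from fractions import Fraction
from itertools import product

mu1 = {w: Fraction(1, 6) for w in range(1, 7)}   # first die
mu2 = {w: Fraction(1, 6) for w in range(1, 7)}   # second die

# Joint mass for independent variables: μ(ω1, ω2) = μ1(ω1) · μ2(ω2)
joint = {(w1, w2): mu1[w1] * mu2[w2] for w1, w2 in product(mu1, mu2)}

# Marginal on the first coordinate: sum the joint mass over the second coordinate.
marginal1 = {w1: sum(joint[(w1, w2)] for w2 in mu2) for w1 in mu1}

print(sum(joint.values()))  # 1
print(marginal1[3])         # 1/6, equal to mu1[3]
```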