Probability

The probability, \(P(E)\), of an event is the ratio of the number of ways it can occur to the number of all possible outcomes, assuming the outcomes are equally likely.

Some properties:

  • \(0 \leq P(E) \leq 1\): an impossible event has probability \(0\), a certain event probability \(1\).
  • \(P(\neg E) = 1 - P(E)\).
  • For mutually exclusive events, \(P(A \text{ or } B) = P(A) + P(B)\).

Example

Cards: 52 cards, 4 suits, 13 cards per suit.
Each suit: 1 ace, the numbers 2-10, and 3 face cards (king, queen, jack).
2 red suits: diamonds and hearts.
2 black suits: spades and clubs.

So, for example, the probability of drawing a red card is \(26/52 = 1/2\), and of drawing a king is \(4/52 = 1/13\).
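
To make this concrete, a short Python sketch that computes these probabilities by direct counting; the `probability` helper is just for illustration:

```python
from fractions import Fraction
from itertools import product

# Build the deck: 13 ranks in each of 4 suits, 52 cards in all.
ranks = ["ace"] + [str(n) for n in range(2, 11)] + ["jack", "queen", "king"]
suits = ["diamonds", "hearts", "spades", "clubs"]
deck = list(product(ranks, suits))

def probability(event, outcomes):
    """P(E): the number of ways the event occurs over all possible outcomes."""
    favorable = sum(1 for outcome in outcomes if event(outcome))
    return Fraction(favorable, len(outcomes))

print(probability(lambda c: c[1] in ("diamonds", "hearts"), deck))  # 1/2
print(probability(lambda c: c[0] == "king", deck))                  # 1/13
```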

Probability functions

The probability mass function is the probability that a discrete random variable equals some value.

The probability density function describes how probability is distributed over the values of a continuous random variable. It is not itself a probability; for a continuous variable, \(P(X = x) = 0\) for every single value \(x\).

The probability that a random variable falls within a range is the area under the \(PDF\) over that range. The \(PDF\) must be nonnegative and the total area under it must be \(1\).

The cumulative distribution function of a random variable \(X\) is \(CDF(x) = P(X \leq x)\).

The survivor function of a random variable \(X\) is \(S(x) = P(X > x) = 1 - CDF(x)\).

The quantile \(v\) of a CDF is the point \(x_v\) at which \(CDF(x_v) = v\). The percentile is a quantile where \(v\) is expressed as a percentage.
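
All of these are exposed by the distribution objects in `scipy.stats`; a sketch, assuming SciPy is available (the normal and binomial are arbitrary choices):

```python
from scipy.stats import binom, norm

# Discrete: the PMF is a genuine probability.
print(binom.pmf(3, n=10, p=0.5))       # P(X = 3) for X ~ Binomial(10, 0.5)

# Continuous: the PDF is a density; probabilities are areas,
# i.e. differences of the CDF.
print(norm.pdf(0.0))                   # density of the standard normal at 0
print(norm.cdf(1.0) - norm.cdf(-1.0))  # P(-1 < X <= 1), about 0.683

# Survivor function and quantile (inverse CDF).
print(norm.sf(1.96))                   # P(X > 1.96) = 1 - CDF(1.96)
print(norm.ppf(0.975))                 # 0.975 quantile, about 1.96
```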

Conditional Probability

If we ask about the chance of \(B\) given that \(A\) occurs, we find this conditional probability is the probability both \(A\) and \(B\) occur divided by the probability of \(A\):

\[ P(B|A) = \frac{P(A \& B)}{P(A)} \]

Bayes' rule:

\[ P(A|B) = \frac{P(A \& B)}{P(B)} = \frac{P(B|A) P(A)}{P(B)} \]

and since \(P(B) = P(B|A) P(A) + P(B|\neg A) P(\neg A)\), we find:

\[ P(A|B) = \frac{P(B|A) P(A)}{P(B|A) P(A) + P(B|\neg A) P(\neg A)} \]
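
A minimal numeric sketch of this identity; the probabilities below are invented purely for illustration:

```python
# Hypothetical inputs: P(A), P(B|A), P(B|~A).
p_a = 0.3
p_b_given_a = 0.8
p_b_given_not_a = 0.1

# Total probability: P(B) = P(B|A) P(A) + P(B|~A) P(~A).
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Bayes' rule: P(A|B) = P(B|A) P(A) / P(B).
p_a_given_b = p_b_given_a * p_a / p_b
print(p_a_given_b)  # 0.24 / 0.31 ≈ 0.774
```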

Medical diagnostics

In medicine these concepts are used to understand the efficacy of diagnostic tests.

  • sensitivity: \(P(\text{positive test given the patient has the disease}) = P(+|D)\)
  • specificity: \(P(\text{negative test given the patient does not}) = P(-|\neg D)\)

Since a test result is either positive or negative, \(P(+|\neg D) = 1 - P(-|\neg D)\): the false positive rate is one minus the specificity.

  • positive predictive value: \(P(D|+) = \frac{P(+|D) P(D)}{P(+|D) P(D) + P(+|\neg D) P(\neg D)}\)
  • negative predictive value: \(P(\neg D|-) = \frac{P(-|\neg D) P(\neg D)}{P(-|\neg D) P(\neg D) + P(-|D) P(D)}\)

The diagnostic likelihood ratios compare how probable a test result is in diseased versus non-diseased patients:

  • diagnostic likelihood ratio of a positive test: \(DLR_+ = \frac{\text{sensitivity}}{1 - \text{specificity}} = \frac{P(+|D)}{P(+|\neg D)}\) (generally large)

  • diagnostic likelihood ratio of a negative test: \(DLR_- = \frac{1 - \text{sensitivity}}{\text{specificity}} = \frac{P(-|D)}{P(-|\neg D)}\) (generally small)

  • post-test odds given positive: \(\frac{P(D|+)}{P(\neg D|+)} = \frac{P(+|D) P(D)}{P(+|\neg D) P(\neg D)}\)

  • post-test odds given negative: \(\frac{P(D|-)}{P(\neg D|-)} = \frac{P(-|D) P(D)}{P(-|\neg D) P(\neg D)}\)
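
In both cases the post-test odds are the pre-test odds \(\frac{P(D)}{P(\neg D)}\) multiplied by the corresponding \(DLR\). Putting the definitions together, a small calculator; the `diagnostics` helper and the test characteristics below are hypothetical:

```python
def diagnostics(sens, spec, prevalence):
    """PPV, NPV, and likelihood ratios from sensitivity, specificity, prevalence."""
    p_d, p_not_d = prevalence, 1 - prevalence
    fpr = 1 - spec                  # P(+|~D), the false positive rate
    fnr = 1 - sens                  # P(-|D), the false negative rate
    ppv = sens * p_d / (sens * p_d + fpr * p_not_d)
    npv = spec * p_not_d / (spec * p_not_d + fnr * p_d)
    return ppv, npv, sens / fpr, fnr / spec

# A sensitive, specific test for a rare condition.
ppv, npv, dlr_pos, dlr_neg = diagnostics(sens=0.997, spec=0.985, prevalence=0.001)
print(ppv)  # ≈ 0.062: even a good test yields a low PPV when prevalence is low
```

The low PPV here reflects the low pre-test odds rather than a weak test; reporting \(DLR_+\) sidesteps the dependence on prevalence.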

Expectations

The expected value is a measure of central tendency: the sum over outcomes, each weighted by its probability.

For discrete \(X\) with \(PMF\) \(p(x)\), the expected value is the sum:

\[ E(X) = \sum_i {x_i p(x_i)}. \]

If each outcome has the same probability, this is the mean. For \(n\) values \(x_i\) where \(p(x_i) = 1/n\),

\[ \mu = E(X) = \frac{1}{n} \sum_i {x_i}. \]
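
For instance, a fair six-sided die (a standard example, not specific to the text above):

```python
# E(X) for a fair die: each face has probability 1/6.
faces = [1, 2, 3, 4, 5, 6]
p = 1 / len(faces)
print(sum(x * p for x in faces))  # 3.5, the plain mean of the faces
```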

It is linear, so that with constants \(a, b, c\):

\[ \begin{align} E(aX) &= a E(X), \\ E(X + Y) &= E(X) + E(Y), \\ E(bX + cY) &= bE(X) + cE(Y). \end{align} \]
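
A quick simulation spot-check of linearity, assuming NumPy; the distributions and constants are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, size=100_000)         # E(X) = 2
y = rng.exponential(scale=3.0, size=100_000)  # E(Y) = 3

# Sample means approximate expectations, so both lines print ≈ 16.
print(np.mean(5 * x + 2 * y))
print(5 * np.mean(x) + 2 * np.mean(y))
```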

For continuous \(X\) with \(PDF\) \(f(t)\), the expected value is the integral:

\[ E(X) = \int t \, f(t) \, dt. \]
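
A numerical sketch of this integral for the Exponential(1) density \(f(t) = e^{-t}\), using SciPy's quadrature (an arbitrary choice of distribution, whose expected value is known to be \(1\)):

```python
import numpy as np
from scipy.integrate import quad

f = lambda t: np.exp(-t)  # PDF of Exponential(rate=1), supported on [0, inf)

# E(X) = integral of t f(t) dt over the support.
expected, _err = quad(lambda t: t * f(t), 0, np.inf)
print(expected)  # ≈ 1.0
```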

Variance

The variance is a measure of spread: the expected value of the squared deviation from the mean.

\[ \begin{align} \text{Var}(X) &= E((X - \mu)^2) \\ &= E(X^2 - 2 \mu X + \mu^2) \\ &= E(X^2) - 2 \mu E(X) + E(\mu^2) \\ \end{align} \]

and since \(\mu = E(X)\) is a constant, \(E(\mu^2) = \mu^2 = E(X)^2\), so

\[ \begin{align} &= E(X^2) - 2E(X)^2 + E(X)^2 \\ &= E(X^2) - E(X)^2, \end{align} \]

which for \(n\) equally likely values \(x_i\) becomes

\[ \text{Var}(X) = \frac{1}{n} \sum_i {x_i^2} - \left ( \frac{1}{n} \sum_i {x_i} \right )^2 = \frac{1}{n} \left ( \sum_i {x_i^2} - \frac{1}{n} \left ( \sum_i {x_i} \right )^2 \right ). \]
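
Both forms computed directly, assuming NumPy; note these are population variances, dividing by \(n\) rather than \(n - 1\):

```python
import numpy as np

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

var_def = np.mean((x - x.mean()) ** 2)       # E((X - mu)^2)
var_short = np.mean(x ** 2) - x.mean() ** 2  # E(X^2) - E(X)^2
print(var_def, var_short, np.var(x))         # all 4.0 (np.var defaults to ddof=0)
```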


Interpretation

From Chebyshev’s inequality, the probability that a random variable lies \(k\) or more standard deviations from its mean is at most \(1/k^2\):

\[ P(|X - \mu| \geq k \sigma) \leq \frac{1}{k^2}, \]

or, for the first few deviations,

\[ \begin{align} P(|X - \mu| \geq 1 \sigma) & \leq 1 \\ P(|X - \mu| \geq 2 \sigma) & \leq \frac{1}{4} = 0.25 \\ P(|X - \mu| \geq 3 \sigma) & \leq \frac{1}{9} \approx 0.111. \end{align} \]

This is more general than the 68-95-99.7 rule of the normal distribution, as it applies to any distribution that has a defined mean and variance (though the bounds are much looser).
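
An empirical check, assuming NumPy; the exponential distribution is a deliberately skewed, non-normal choice:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=1_000_000)  # mean 1, standard deviation 1
mu, sigma = x.mean(), x.std()

for k in (1, 2, 3):
    observed = np.mean(np.abs(x - mu) >= k * sigma)
    print(k, observed, "<=", 1 / k**2)  # observed tail fraction vs the bound
```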



