The probability, \(P(E)\), of an event is the ratio of the number of ways it can occur to the total number of possible outcomes, assuming all outcomes are equally likely.
Some properties:
\(0 \leq P(E) \leq 1\): the impossible event has probability \(0\) and the certain event has probability \(1\).
\(P(\neg E) = 1 - P(E)\).
For mutually exclusive events \(A\) and \(B\), \(P(A \text{ or } B) = P(A) + P(B)\).
Example
Cards: 52 cards, 4 suits, 13 cards per suit: 1 ace, the numbers 2-10, and 3 face cards (jack, queen, king).
2 red: diamonds and hearts.
2 black: spades and clubs.
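For instance, drawing one card at random from the deck:
\[ P(\text{ace}) = \frac{4}{52} = \frac{1}{13}, \qquad P(\text{red}) = \frac{26}{52} = \frac{1}{2}, \qquad P(\text{face card}) = \frac{12}{52} = \frac{3}{13}. \]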
The probability mass function is the probability that a discrete random variable equals some value.
The probability density function describes how probability is spread over the values of a continuous random variable; it is not itself a probability, and the probability of any single exact value is zero.
The probability that a continuous random variable falls within a range is the area under the \(PDF\) over that range. The \(PDF\) must be nonnegative and the total area under it must be \(1\).
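For example, a fair die has PMF \(p(x) = 1/6\) for \(x \in \{1, \dots, 6\}\), while the uniform distribution on \([0, 1]\) has PDF \(f(t) = 1\), so \(P(a \leq X \leq b) = b - a\) for \(0 \leq a \leq b \leq 1\).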
The cumulative distribution function of a random variable \(X\) is \(CDF(x) = P(X \leq x)\).
The survivor function of a random variable \(X\) is \(S(x) = P(X > x) = 1 - CDF(x)\).
The quantile at level \(v\) of a CDF is the point \(x_v\) at which \(CDF(x_v) = v\). A percentile is a quantile with \(v\) expressed as a percentage.
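As an illustration, a small sketch using SciPy (assuming `scipy` is installed) evaluates the CDF, survivor function, and a quantile of a standard normal distribution:

```python
from scipy.stats import norm

# Standard normal as a concrete example.
x = 1.0
print(norm.cdf(x))     # CDF(1.0) = P(X <= 1.0), about 0.841
print(norm.sf(x))      # survivor S(1.0) = 1 - CDF(1.0), about 0.159
print(norm.ppf(0.95))  # 0.95 quantile (95th percentile), about 1.645
```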
If we ask about the chance of \(B\) given that \(A\) has occurred, this conditional probability is the probability that both \(A\) and \(B\) occur divided by the probability of \(A\):
\[ P(B|A) = \frac{P(A \& B)}{P(A)} \]
\[ P(A|B) = \frac{P(A \& B)}{P(B)} = \frac{P(B|A) P(A)}{P(B)} \]
and since, by the law of total probability, \(P(B) = P(B|A) P(A) + P(B|\neg A) P(\neg A)\), we find:
\[ P(A|B) = \frac{P(B|A) P(A)}{P(B|A) P(A) + P(B|\neg A) P(\neg A)} \]
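A minimal sketch of this formula as a function (the function name and example numbers are illustrative, not from the text):

```python
def posterior(p_b_given_a, p_a, p_b_given_not_a):
    """P(A|B) from P(B|A), P(A), and P(B|not A) via Bayes' rule."""
    p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)  # law of total probability
    return p_b_given_a * p_a / p_b

# e.g. P(B|A) = 0.9, P(A) = 0.3, P(B|not A) = 0.2  ->  P(A|B) is about 0.66
print(posterior(0.9, 0.3, 0.2))
```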
In medicine these concepts are used to understand the efficacy of diagnostic tests. Write \(D\) for having the disease and \(+\), \(-\) for positive and negative test results. The sensitivity is \(P(+|D)\) and the specificity is \(P(-|\neg D)\); by definition, \(P(+|\neg D) = 1 - P(-|\neg D) = 1 - \text{specificity}\).
diagnostic likelihood ratio of a positive test: \(DLR_+ = \frac{P(+|D)}{P(+|\neg D)} = \frac{\text{sensitivity}}{1 - \text{specificity}}\) (generally large)
diagnostic likelihood ratio of a negative test: \(DLR_- = \frac{P(-|D)}{P(-|\neg D)} = \frac{1 - \text{sensitivity}}{\text{specificity}}\) (generally small)
post-test odds given a positive test: \(\frac{P(D|+)}{P(\neg D|+)} = \frac{P(+|D) P(D)}{P(+|\neg D) P(\neg D)} = DLR_+ \cdot \frac{P(D)}{P(\neg D)}\)
post-test odds given a negative test: \(\frac{P(D|-)}{P(\neg D|-)} = \frac{P(-|D) P(D)}{P(-|\neg D) P(\neg D)} = DLR_- \cdot \frac{P(D)}{P(\neg D)}\)
That is, the post-test odds of disease are the pre-test odds multiplied by the corresponding diagnostic likelihood ratio.
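As a worked example with illustrative numbers (not from any particular test): take sensitivity \(0.95\), specificity \(0.90\), and prevalence \(P(D) = 0.01\). Then
\[ DLR_+ = \frac{0.95}{1 - 0.90} = 9.5, \qquad \frac{P(D|+)}{P(\neg D|+)} = 9.5 \cdot \frac{0.01}{0.99} \approx 0.096, \]
so the post-test probability of disease given a positive result is about \(0.096 / 1.096 \approx 0.09\): a large likelihood ratio cannot by itself overcome a very low prevalence.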
The expected value is a measure of central tendency: the sum over outcomes weighted by their probabilities.
For discrete \(X\) with \(PMF\) \(p(x)\), the expected value is the sum:
\[ E(X) = \sum_i {x_i p(x_i)}. \]
If each outcome has the same probability, this reduces to the arithmetic mean. For \(n\) values \(x_i\), each with \(p(x_i) = 1/n\),
\[ \mu = E(X) = \frac{1}{n} \sum_i {x_i}. \]
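For a fair six-sided die, for example,
\[ E(X) = \frac{1 + 2 + 3 + 4 + 5 + 6}{6} = 3.5. \]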
It is linear, so that with constants \(a, b, c\):
\[ \begin{align} E(aX) &= a E(X), \\ E(X + Y) &= E(X) + E(Y), \\ E(bX + cY) &= bE(X) + cE(Y). \end{align} \]
For continuous \(X\) with \(PDF\) \(f(t)\), the expected value is the integral:
\[ E(X) = \int_{-\infty}^{\infty} t \, f(t) \, dt. \]
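For the uniform distribution on \([0, 1]\), for instance,
\[ E(X) = \int_0^1 t \, dt = \frac{1}{2}. \]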
The variance is a measure of spread: the expected value of the squared differences from the mean.
\[ \begin{align} \text{Var}(X) &= E((X - \mu)^2) \\ &= E(X^2) - E(2 \mu X) + E(\mu^2) \\ &= E(X^2) - 2 \mu E(X) + E(\mu^2) \\ \end{align} \]
and since \(\mu = E(X)\) is a constant, \(E(\mu^2) = \mu^2 = E(X)^2\), so
\[ \begin{align} \text{Var}(X) &= E(X^2) - 2E(X)^2 + E(X)^2 \\ &= E(X^2) - E(X)^2. \end{align} \]
For \(n\) equally likely outcomes (\(p(x_i) = 1/n\)), this is
\[ \begin{align} \text{Var}(X) &= \frac{1}{n} \sum_i {x_i^2} - \left ( \frac{1}{n} \sum_i {x_i} \right )^2 \\ &= \frac{1}{n} \left ( \sum_i {x_i^2} - \frac{1}{n} \left ( \sum_i {x_i} \right )^2 \right ). \end{align} \]
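Continuing the fair-die example,
\[ \text{Var}(X) = \frac{1^2 + 2^2 + \cdots + 6^2}{6} - 3.5^2 = \frac{91}{6} - \frac{49}{4} = \frac{35}{12} \approx 2.92. \]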
Interpretation
From Chebyshev’s inequality, the probability that a random variable is at least \(k\) standard deviations from its mean is at most \(1/k^2\):
\[ P(|X - \mu| \geq k \sigma) \leq \frac{1}{k^2}, \]
or, for the first few deviations,
\[ \begin{align} P(|X - \mu| \geq 1 \sigma) & \leq 1 \\ P(|X - \mu| \geq 2 \sigma) & \leq \frac{1}{4} = 0.25 \\ P(|X - \mu| \geq 3 \sigma) & \leq \frac{1}{9} \approx 0.111. \end{align} \]
This is more general than the 68-95-99.7 rule of the normal distribution, as it applies to any distribution that has a defined mean and variance.
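A quick empirical check (a sketch assuming NumPy is available): sample from a skewed distribution and compare the observed tail fractions with the Chebyshev bounds.

```python
import numpy as np

# Empirical check of Chebyshev's bound on a skewed (exponential) distribution.
rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=1_000_000)
mu, sigma = x.mean(), x.std()

for k in (2, 3):
    frac = np.mean(np.abs(x - mu) >= k * sigma)
    print(f"k={k}: observed {frac:.4f} <= bound {1 / k**2:.4f}")
```

The observed fractions sit well below the bounds, as expected: Chebyshev holds for any distribution with a defined mean and variance, so it is loose for most particular ones.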