- #[[ST2001 - Statistics in Data Science I]]
- **Previous Topic:** [[Probability]]
- **Next Topic:** [[Discrete Probability Distributions: Binomial & Poisson]]
- **Relevant Slides:** 
-
- # Random Variables
- What is a **random variable**? #card
card-last-interval:: 4
card-repeats:: 2
card-ease-factor:: 2.32
card-next-schedule:: 2022-11-21T20:17:50.156Z
card-last-reviewed:: 2022-11-17T20:17:50.157Z
card-last-score:: 3
- A **random variable** is a function that associates a real number with each element in the sample space.
- The probability distribution of a random variable $X$ gives the probability for each value of $X$.
- A random variable takes a **numeric** value based on the outcome of a random event.
- Random variables are denoted by a capital letter - $X$, $Y$, $Z$, etc.
- A particular value of a random variable will be denoted with a lower case letter - $x$, $y$, $z$, etc.
- What are the two types of random variables? #card
card-last-interval:: 28.3
card-repeats:: 4
card-ease-factor:: 2.66
card-next-schedule:: 2022-12-13T03:20:55.056Z
card-last-reviewed:: 2022-11-14T20:20:55.056Z
card-last-score:: 5
- There are two types of random variables:
- **Discrete** random variables can take one of a countable number of distinct outcomes, e.g., the number of heads in three coin tosses.
- **Continuous** random variables can take any numeric value within a range of values, e.g., the time spent waiting for a bus.
-
-
- # Probability Distributions
- ## Discrete Probability Distributions
- What is the **probability distribution** of some discrete random variable $X$? #card
card-last-interval:: -1
card-repeats:: 1
card-ease-factor:: 2.5
card-next-schedule:: 2022-11-15T00:00:00.000Z
card-last-reviewed:: 2022-11-14T20:14:46.764Z
card-last-score:: 1
- The set of ordered pairs $(x, f(x))$ is a **probability function**, **probability mass function** (pmf), or **probability distribution** of the discrete random variable $X$ if, for each possible outcome $x$:
- 1. $f(x) \geq 0$,
- 2. $\displaystyle \sum_x f(x) = 1$,
- 3. $P(X = x) = f(x)$.
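- A minimal numerical check of these three conditions, sketched in Python with a fair six-sided die as an assumed example:
  ```python
  # Hypothetical pmf f(x) = P(X = x) of a fair six-sided die
  f = {x: 1/6 for x in range(1, 7)}

  assert all(p >= 0 for p in f.values())     # 1. f(x) >= 0 for every x
  assert abs(sum(f.values()) - 1) < 1e-12    # 2. the f(x) sum to 1 over all x
  print(f[3])                                # 3. P(X = 3) = f(3) = 1/6
  ```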
- What is the **cumulative distribution function** of a discrete random variable $X$? #card
card-last-interval:: -1
card-repeats:: 1
card-ease-factor:: 2.5
card-next-schedule:: 2022-11-18T00:00:00.000Z
card-last-reviewed:: 2022-11-17T20:19:37.665Z
card-last-score:: 1
- The **cumulative distribution function** is the probability that a random variable $X$ with a given probability distribution will be ^^found at a value less than or equal to^^ $x$.
- The **cumulative distribution function** $F(x)$ of a discrete random variable $X$ with probability distribution $f(x)$ is:
- $$F(x) = P(X \leq x) = \sum_{t \leq x} f(t), \text{ for } - \infty < x < \infty$$
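- A sketch of the cdf, reusing the hypothetical fair-die pmf from above and summing $f(t)$ over all support points $t \leq x$:
  ```python
  # Hypothetical pmf of a fair six-sided die
  f = {x: 1/6 for x in range(1, 7)}

  def F(x):
      # F(x) = P(X <= x) = sum of f(t) over all support points t <= x
      return sum(p for t, p in f.items() if t <= x)

  print(F(3))    # P(X <= 3) = 3/6
  print(F(0))    # 0, no support points at or below 0
  print(F(10))   # 1.0, all support points are at or below 10
  ```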
- ## Continuous Probability Distributions
- What is the **probability density function** for a continuous random variable? #card
card-last-interval:: -1
card-repeats:: 1
card-ease-factor:: 2.5
card-next-schedule:: 2022-11-18T00:00:00.000Z
card-last-reviewed:: 2022-11-17T20:25:33.722Z
card-last-score:: 1
- The function $f(x)$ is a **probability density function** (pdf) for a continuous random variable $X$, defined over the set of real numbers, if:
- 1. $f(x) \geq 0, \text{ for all } x \in \mathbb{R}$,
- 2. $\int^{\infty}_{- \infty} f(x) dx = 1$,
- 3. $P(a < X < b) = \int^{b}_{a} f(x)dx$.
- **Note:** $P(X = x) = 0$ for any single value $x$, i.e., there is no area under the curve at exactly $x$.
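- A sketch of checking these conditions numerically, assuming an exponential density $f(x) = \lambda e^{-\lambda x}$ for $x \geq 0$ (an illustrative choice, not from the slides):
  ```python
  import numpy as np
  from scipy.integrate import quad

  lam = 2.0                                 # assumed rate parameter
  f = lambda x: lam * np.exp(-lam * x)      # exponential pdf; zero for x < 0, so integrals run over [0, inf)

  total, _ = quad(f, 0, np.inf)             # condition 2: total area under f is 1
  prob, _ = quad(f, 0.5, 1.0)               # condition 3: P(0.5 < X < 1.0) as an area
  print(total, prob)                        # ~1.0 and ~0.23
  ```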
-
-
- # Expected Value - Location
- What is the **expected value** for a **discrete** random variable? #card
card-last-interval:: -1
card-repeats:: 1
card-ease-factor:: 2.5
card-next-schedule:: 2022-11-19T00:00:00.000Z
card-last-reviewed:: 2022-11-18T18:33:23.280Z
card-last-score:: 1
- The average, or **expected value**, of a random variable is denoted by $E[X]$ or $\mu$.
- It can be found by summing each possible value multiplied by the probability that it occurs:
- $$\mu = E[X] = \sum_x xP(X = x)$$
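- A quick sketch using the hypothetical fair-die pmf from earlier:
  ```python
  # E[X] for a fair six-sided die: sum of x * P(X = x) over all x
  f = {x: 1/6 for x in range(1, 7)}
  mu = sum(x * p for x, p in f.items())
  print(mu)   # 21/6 = 3.5
  ```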
- What is the **expected value** for a **continuous** random variable? #card
card-last-interval:: -1
card-repeats:: 1
card-ease-factor:: 2.5
card-next-schedule:: 2022-11-18T00:00:00.000Z
card-last-reviewed:: 2022-11-17T20:18:12.187Z
card-last-score:: 1
- A useful summary of interest is the average, or **expected value**, of a random variable.
- The **expected value** is denoted by $E[X]$ or $\mu$.
- The **expected value** of a ***continuous*** random variable can be found by:
- $$\mu = E[X] = \int_{-\infty}^{\infty} xf(x)dx$$
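- A sketch using the same assumed exponential density as above (rate $\lambda = 2$, for which $E[X] = 1/\lambda = 0.5$):
  ```python
  import numpy as np
  from scipy.integrate import quad

  lam = 2.0
  f = lambda x: lam * np.exp(-lam * x)              # assumed exponential pdf on x >= 0

  mu, _ = quad(lambda x: x * f(x), 0, np.inf)       # E[X] = integral of x * f(x) dx
  print(mu)                                         # ~0.5
  ```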
- # Variance, Standard Deviation - Spread
- What is the **variance** & hence the **standard deviation** of a discrete random variable? #card
card-last-interval:: -1
card-repeats:: 1
card-ease-factor:: 2.36
card-next-schedule:: 2022-11-15T00:00:00.000Z
card-last-reviewed:: 2022-11-14T20:18:19.897Z
card-last-score:: 1
- The **variance** of a **discrete** random variable measures the expected squared deviation from the mean:
- $$\sigma^2 = \text{Var}(X) = E[(X - \mu)^2] = \sum_x (x - \mu)^2 P(X = x)$$
- Alternatively, the variance can be calculated as:
- $$\text{Var}(X) = E[X^2] - (E[X])^2$$
- where
- $$E[X^2] = \sum_x x^2 P(X = x)$$
- Often more useful is the **standard deviation**:
- $$\sigma = \text{sd}(X) = \sqrt{\text{Var}(X)}$$
- The standard deviation has the advantage of being in the same units as $X$ (& $\mu$).
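- A sketch checking that the definition and the shortcut formula agree, again using the hypothetical fair-die pmf:
  ```python
  import math

  f = {x: 1/6 for x in range(1, 7)}
  mu = sum(x * p for x, p in f.items())                      # E[X] = 3.5

  var_def = sum((x - mu) ** 2 * p for x, p in f.items())     # E[(X - mu)^2]
  var_alt = sum(x ** 2 * p for x, p in f.items()) - mu ** 2  # E[X^2] - (E[X])^2
  sd = math.sqrt(var_def)

  print(var_def, var_alt, sd)                                # ~2.92, ~2.92, ~1.71
  ```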
- What is the **variance** of a ***continuous*** random variable? #card
card-last-score:: 1
card-repeats:: 1
card-next-schedule:: 2022-11-19T00:00:00.000Z
card-last-interval:: -1
card-ease-factor:: 2.5
card-last-reviewed:: 2022-11-18T18:35:05.927Z
- The **variance** of a **continuous** random variable is:
- $$\text{Var}(X) = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)dx$$
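- A sketch with the same assumed exponential density (for $\lambda = 2$, $\text{Var}(X) = 1/\lambda^2 = 0.25$):
  ```python
  import numpy as np
  from scipy.integrate import quad

  lam = 2.0
  f = lambda x: lam * np.exp(-lam * x)                        # assumed exponential pdf on x >= 0

  mu, _ = quad(lambda x: x * f(x), 0, np.inf)                 # E[X]
  var, _ = quad(lambda x: (x - mu) ** 2 * f(x), 0, np.inf)    # E[(X - mu)^2]
  print(var)                                                  # ~0.25
  ```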
- # Means & Variances
- Adding or subtracting a constant shifts the mean, but does not change the variance or the standard deviation:
- $$E[X + c] = E[X] + c, \ \ \text{Var}(X + c) = \text{Var}(X), \ \ \text{sd}(X + c) = \text{sd}(X)$$
- $$E[X - c] = E[X] - c, \ \ \text{Var}(X - c) = \text{Var}(X), \ \ \text{sd}(X - c) = \text{sd}(X)$$
- Multiplying a random variable by a constant multiplies the mean by that constant, and the variance by the *square* of that constant.
- $$E[aX] = aE[X], \ \ \text{Var}(aX) = a^2 \text{Var}(X), \ \ \text{sd}(aX) = |a|\text{sd}(X)$$
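- A small simulation sketch of these rules, with assumed constants $a = 3$, $c = 10$ and draws from a fair die:
  ```python
  import numpy as np

  rng = np.random.default_rng(0)
  x = rng.integers(1, 7, size=1_000_000).astype(float)   # simulated fair-die outcomes
  a, c = 3.0, 10.0

  print(np.mean(x + c), np.mean(x) + c)       # E[X + c] = E[X] + c
  print(np.var(x + c), np.var(x))             # Var(X + c) = Var(X)
  print(np.mean(a * x), a * np.mean(x))       # E[aX] = a E[X]
  print(np.var(a * x), a ** 2 * np.var(x))    # Var(aX) = a^2 Var(X)
  ```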