- #ST2001 - Statistics in Data Science I
- Previous Topic: Probability
- Next Topic: Discrete Probability Distributions: Binomial & Poisson
- Relevant Slides:
-
Random Variables
- What is a random variable? #card
card-last-interval:: 4
card-repeats:: 2
card-ease-factor:: 2.32
card-next-schedule:: 2022-11-21T20:17:50.156Z
card-last-reviewed:: 2022-11-17T20:17:50.157Z
card-last-score:: 3
- A random variable is a function that associates a real number with each element in the sample space.
- The probability distribution of a random variable $X$ gives the probability for each value of $X$.
- A random variable takes a numeric value based on the outcome of a random event.
- Random variables are denoted by a capital letter: $X$, $Y$, $Z$, etc.
- A particular value of a random variable is denoted by a lower-case letter: $x$, $y$, $z$, etc.
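To make the "function on the sample space" idea concrete, here is a minimal sketch. The two-coin experiment and the names `sample_space`, `X`, and `distribution` are assumptions chosen for illustration; they do not come from the notes.

```python
from fractions import Fraction
from itertools import product

# Assumed example: sample space for two tosses of a fair coin.
sample_space = list(product("HT", repeat=2))

def X(outcome):
    """Random variable: maps an outcome to its number of heads."""
    return outcome.count("H")

# Probability distribution of X: every outcome is equally likely (1/4 each),
# so P(X = x) is the total probability of the outcomes that map to x.
distribution = {}
for outcome in sample_space:
    x = X(outcome)
    distribution[x] = distribution.get(x, Fraction(0)) + Fraction(1, len(sample_space))

for x in sorted(distribution):
    print(x, distribution[x])
```

Here a capital `X` is the function itself, while the lower-case `x` holds one particular value it takes, mirroring the notation above.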
- What are the two types of random variables? #card
card-last-interval:: 28.3
card-repeats:: 4
card-ease-factor:: 2.66
card-next-schedule:: 2022-12-13T03:20:55.056Z
card-last-reviewed:: 2022-11-14T20:20:55.056Z
card-last-score:: 5
- There are two types of random variables:
- Discrete random variables can take one of a finite (or countably infinite) number of distinct outcomes.
- Continuous random variables can take any numeric value within a range of values.
-
Probability Distributions
-
Discrete Probability Distributions
- What is the probability distribution of some discrete random variable $X$? #card
card-last-interval:: -1
card-repeats:: 1
card-ease-factor:: 2.5
card-next-schedule:: 2022-11-15T00:00:00.000Z
card-last-reviewed:: 2022-11-14T20:14:46.764Z
card-last-score:: 1
- The set of ordered pairs $(x, f(x))$ is a probability function, probability mass function (pmf), or probability distribution of the discrete random variable $X$ if, for each possible outcome $x$:
- $f(x) \geq 0$,
- $\displaystyle \sum_x f(x) = 1$,
- $P(X = x) = f(x)$.
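The first two pmf conditions can be checked mechanically. The following is a small sketch, assuming the pmf is stored as a dictionary from values to exact probabilities; `is_valid_pmf` and the sample dictionaries are hypothetical names used only for illustration.

```python
from fractions import Fraction

def is_valid_pmf(pmf):
    """Return True if every f(x) >= 0 and the f(x) values sum to exactly 1."""
    return all(p >= 0 for p in pmf.values()) and sum(pmf.values()) == 1

# Assumed example: number of heads in two fair coin tosses.
heads_pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
# A table whose probabilities only add up to 3/4, so it is not a pmf.
broken_pmf = {0: Fraction(1, 2), 1: Fraction(1, 4)}

print(is_valid_pmf(heads_pmf))   # True
print(is_valid_pmf(broken_pmf))  # False
```

Exact fractions are used so that the "sums to 1" test is not affected by floating-point rounding.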
- What is the cumulative distribution function of a discrete random variable $X$? #card
card-last-interval:: -1
card-repeats:: 1
card-ease-factor:: 2.5
card-next-schedule:: 2022-11-18T00:00:00.000Z
card-last-reviewed:: 2022-11-17T20:19:37.665Z
card-last-score:: 1
- The cumulative distribution function is the probability that a random variable $X$ with a given probability distribution will be ^^found at a value less than or equal to^^ $x$.
- The cumulative distribution function $F(x)$ of a discrete random variable $X$ with probability distribution $f(x)$ is:
- $$F(x) = P(X \leq x) = \sum_{t \leq x} f(t), \text{ for } -\infty < x < \infty$$
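The cumulative distribution function is a running total of the pmf. Below is a minimal sketch, again assuming a dictionary-based pmf (the two-coin example and the function name `cdf` are illustrative assumptions).

```python
from fractions import Fraction

# Assumed example pmf: number of heads in two fair coin tosses.
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

def cdf(x, pmf):
    """F(x) = P(X <= x): add f(t) over every possible value t with t <= x."""
    return sum((p for t, p in pmf.items() if t <= x), Fraction(0))

# F is defined for every real x, not only for the values X can take.
for x in (-1, 0, 0.5, 1, 2, 3):
    print(x, cdf(x, pmf))
```

Note that `cdf(0.5, pmf)` equals `cdf(0, pmf)`: the function is flat between possible values and jumps by $f(x)$ at each one.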
-
Continuous Probability Distributions
- What is the probability distribution function for a continuous random variable? #card
card-last-interval:: -1
card-repeats:: 1
card-ease-factor:: 2.5
card-next-schedule:: 2022-11-18T00:00:00.000Z
card-last-reviewed:: 2022-11-17T20:25:33.722Z
card-last-score:: 1
- The function $f(x)$ is a probability distribution function, more commonly called a probability density function (pdf), for a continuous random variable $X$, defined over the set of real numbers, if:
- $f(x) \geq 0, \text{ for all } x \in \mathbb{R}$,
- $\displaystyle \int_{-\infty}^{\infty} f(x) \, dx = 1$,
- $\displaystyle P(a < X < b) = \int_{a}^{b} f(x) \, dx$.
- Note: $P(X = x) = 0$, i.e., there is no area exactly at $x$.
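The pdf conditions can be checked numerically for a specific density. This sketch assumes the example density $f(x) = 2x$ on $[0, 1]$ (not from the notes) and uses a simple midpoint rule, which is one of several possible integration methods; `integrate` is a hypothetical helper.

```python
def f(x):
    """Assumed example density: f(x) = 2x on [0, 1], and 0 elsewhere."""
    return 2 * x if 0 <= x <= 1 else 0.0

def integrate(g, a, b, n=10_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

# f is zero outside [0, 1], so integrating over [0, 1] covers the whole real line.
total_area = integrate(f, 0, 1)
# P(0.25 < X < 0.5) is the area under f between 0.25 and 0.5.
p_interval = integrate(f, 0.25, 0.5)

print(round(total_area, 6), round(p_interval, 6))
```

The exact answers are $1$ and $0.5^2 - 0.25^2 = 0.1875$, so the printed values should agree with those to the displayed precision.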
-
-
Expected Value - Location
- What is expected value for a discrete random variable? #card
card-last-interval:: -1
card-repeats:: 1
card-ease-factor:: 2.5
card-next-schedule:: 2022-11-19T00:00:00.000Z
card-last-reviewed:: 2022-11-18T18:33:23.280Z
card-last-score:: 1
- The average, or expected value, of a random variable is denoted by $E[X]$ or $\mu$.
- It can be found by summing the products of each possible value and the probability that it occurs:
- $$\mu = E[X] = \sum_x x \, P(X = x)$$
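As a quick illustration of the sum, here is a sketch that computes the expected value of a fair six-sided die. The die example and the name `expected_value` are assumptions made for this illustration.

```python
from fractions import Fraction

# Assumed example: a fair six-sided die, each face with probability 1/6.
die_pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def expected_value(pmf):
    """E[X] = sum over x of x * P(X = x)."""
    return sum(x * p for x, p in pmf.items())

mu = expected_value(die_pmf)
print(mu)  # 7/2
```

The result, $3.5$, is not a value the die can actually show: the expected value describes the long-run average, not a typical single outcome.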
- What is the expected value for a continuous random variable? #card
card-last-interval:: -1
card-repeats:: 1
card-ease-factor:: 2.5
card-next-schedule:: 2022-11-18T00:00:00.000Z
card-last-reviewed:: 2022-11-17T20:18:12.187Z
card-last-score:: 1
- A useful summary of interest is the average, or expected value, of a random variable.
- The expected value is denoted by $E[X]$ or $\mu$.
- The expected value of a continuous random variable can be found by:
- $$\mu = E[X] = \int_{-\infty}^{\infty} x f(x) \, dx$$
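As a worked illustration, assume the density $f(x) = 2x$ for $0 \leq x \leq 1$ and $f(x) = 0$ elsewhere (an example chosen here, not one taken from the notes). Because $f$ is zero outside $[0, 1]$, the integral reduces to that interval:

$$E[X] = \int_{0}^{1} x \cdot 2x \, dx = \left[ \tfrac{2}{3} x^3 \right]_{0}^{1} = \tfrac{2}{3}$$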
-
Variance, Standard Deviation - Spread
- What is the variance & hence the standard deviation of a discrete random variable? #card
card-last-interval:: -1
card-repeats:: 1
card-ease-factor:: 2.36
card-next-schedule:: 2022-11-15T00:00:00.000Z
card-last-reviewed:: 2022-11-14T20:18:19.897Z
card-last-score:: 1
- The variance of a discrete random variable measures the expected squared deviation from the mean:
- $$\sigma^2 = \text{Var}(X) = E[(X - \mu)^2] = \sum_x (x - \mu)^2 P(X = x)$$
- Alternatively, variance can be calculated by:
- $$\text{Var}(X) = E[X^2] - (E[X])^2$$
- where
- $$E[X^2] = \sum_x x^2 P(X = x)$$
- Or, more usefully, the standard deviation is:
- $$\sigma = \text{sd}(X) = \sqrt{\text{Var}(X)}$$
- The standard deviation has the advantage of being in the same units as $X$ (and $\mu$).
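The two variance formulas should always agree. The sketch below checks this for a fair six-sided die; the example and the variable names are illustrative assumptions.

```python
from fractions import Fraction
from math import sqrt

# Assumed example: a fair six-sided die.
die_pmf = {x: Fraction(1, 6) for x in range(1, 7)}

# Mean: E[X] = sum of x * P(X = x).
mu = sum(x * p for x, p in die_pmf.items())

# Definition: Var(X) = sum of (x - mu)^2 * P(X = x).
var_definition = sum((x - mu) ** 2 * p for x, p in die_pmf.items())

# Shortcut: Var(X) = E[X^2] - (E[X])^2.
e_x_squared = sum(x ** 2 * p for x, p in die_pmf.items())
var_shortcut = e_x_squared - mu ** 2

# Standard deviation, in the same units as X.
sd = sqrt(float(var_definition))

print(var_definition, var_shortcut, round(sd, 4))  # 35/12 35/12 1.7078
```

The shortcut form is often easier by hand because it avoids subtracting $\mu$ from every value before squaring.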
- What is the variance of a continuous random variable? #card
card-last-score:: 1
card-repeats:: 1
card-next-schedule:: 2022-11-19T00:00:00.000Z
card-last-interval:: -1
card-ease-factor:: 2.5
card-last-reviewed:: 2022-11-18T18:35:05.927Z
- The variance of a continuous random variable is:
- $$\text{Var}(X) = \int_{-\infty}^{\infty} (x - \mu)^2 f(x) \, dx$$
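As a worked illustration, assume the density $f(x) = 2x$ for $0 \leq x \leq 1$ and $f(x) = 0$ elsewhere (an example chosen here, not one taken from the notes). Its mean is $\mu = \int_{0}^{1} x \cdot 2x \, dx = \tfrac{2}{3}$, so the variance is:

$$\text{Var}(X) = \int_{0}^{1} \left(x - \tfrac{2}{3}\right)^2 2x \, dx = \int_{0}^{1} \left(2x^3 - \tfrac{8}{3}x^2 + \tfrac{8}{9}x\right) dx = \tfrac{1}{2} - \tfrac{8}{9} + \tfrac{4}{9} = \tfrac{1}{18}$$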
-
Means & Variances
- Adding a constant to, or subtracting one from, a random variable shifts the mean but does not change the variance or the standard deviation.
- $$E[X + c] = E[X] + c, \quad \text{Var}(X + c) = \text{Var}(X), \quad \text{sd}(X + c) = \text{sd}(X)$$
- $$E[X - c] = E[X] - c, \quad \text{Var}(X - c) = \text{Var}(X), \quad \text{sd}(X - c) = \text{sd}(X)$$
- Multiplying a random variable by a constant multiplies the mean by that constant, the variance by the square of that constant, and the standard deviation by the absolute value of that constant.
- $$E[aX] = aE[X], \quad \text{Var}(aX) = a^2 \text{Var}(X), \quad \text{sd}(aX) = |a| \, \text{sd}(X)$$
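These shift and scale rules can be verified on a small example. The sketch below assumes a fair six-sided die and the arbitrary constants `a = -3` and `c = 10`; the helpers `mean`, `variance`, and `transform` are hypothetical names for this illustration.

```python
from fractions import Fraction

def mean(pmf):
    """E[X] = sum of x * P(X = x)."""
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """Var(X) = sum of (x - mu)^2 * P(X = x)."""
    mu = mean(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

def transform(pmf, a=1, c=0):
    """pmf of aX + c (assumes a != 0, so distinct values stay distinct)."""
    return {a * x + c: p for x, p in pmf.items()}

# Assumed example: a fair six-sided die and arbitrary constants.
die = {x: Fraction(1, 6) for x in range(1, 7)}
a, c = -3, 10

shifted = transform(die, c=c)  # distribution of X + c
scaled = transform(die, a=a)   # distribution of aX

print(mean(shifted) == mean(die) + c)               # shift moves the mean
print(variance(shifted) == variance(die))           # shift leaves the variance alone
print(mean(scaled) == a * mean(die))                # scaling multiplies the mean by a
print(variance(scaled) == a ** 2 * variance(die))   # and the variance by a^2
```

A negative `a` is used deliberately: the variance still scales by $a^2 = 9$, which is why the standard deviation rule needs $|a|$ rather than $a$.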