- #ST2001 - Statistics in Data Science I
- Previous Topic: Random Variables
- Next Topic: The Normal Distribution
- Relevant Slides:
- Often, the observations generated by different statistical experiments have the same type of behaviour.
- In general, only a handful of important probability distributions are needed to describe many of the discrete random variables encountered in practice.
-
Bernoulli Trials
collapsed:: true
- What is a Bernoulli Trial? #card
card-last-interval:: -1
card-repeats:: 1
card-ease-factor:: 2.5
card-next-schedule:: 2022-11-15T00:00:00.000Z
card-last-reviewed:: 2022-11-14T20:08:48.931Z
card-last-score:: 1
- A Bernoulli Trial is a random experiment with just two outcomes - success / failure.
- For a single trial, the random variable is
-
X = \begin{cases}1, & \text{success,} \\0, & \text{failure.}\end{cases}
- P(X = 1) = p and P(X = 0) = 1 - p, where p is the success probability, or more compactly:
-
P(X = x) = p^x (1-p)^{1-x}, \quad x = 0,1
-
-
- What is the expected value of a Bernoulli Trial? #card
card-last-interval:: -1
card-repeats:: 1
card-ease-factor:: 2.5
card-next-schedule:: 2022-11-15T00:00:00.000Z
card-last-reviewed:: 2022-11-14T16:20:53.147Z
card-last-score:: 1
-
E[X] = (0)(1-p)+(1)p = p
-
- What is the variance of a Bernoulli Trial? #card
card-last-interval:: -1
card-repeats:: 1
card-ease-factor:: 2.5
card-next-schedule:: 2022-11-15T00:00:00.000Z
card-last-reviewed:: 2022-11-14T16:24:49.061Z
card-last-score:: 1
-
Var(X) = p(1-p)
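- A short derivation of this result, assuming the standard identity Var(X) = E[X^2] - (E[X])^2 and using E[X] = p for a Bernoulli trial:

```latex
\begin{aligned}
E[X^2] &= (0)^2(1-p) + (1)^2 p = p \\
\text{Var}(X) &= E[X^2] - (E[X])^2 = p - p^2 = p(1-p)
\end{aligned}
```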
-
-
Bernoulli Trial Assumptions #card
card-last-interval:: -1
card-repeats:: 1
card-ease-factor:: 2.5
card-next-schedule:: 2022-11-15T00:00:00.000Z
card-last-reviewed:: 2022-11-14T16:24:43.818Z
card-last-score:: 1
- The outcomes of the trials are mutually independent.
- The probability of success p is constant over trials.
- Note that these assumptions may not always be appropriate.
-
Example: Camera Flash Tests
id:: 6368f276-bc7e-4d91-b7fb-c5b34c4c6feb
- The time to recharge the flash is tested in three mobile phone cameras. The probability that a camera passes the test is 0.8, and the cameras perform independently.
background-color:: green
- The random variable X denotes the number of cameras that pass the test. The last column of the tables shows the values of X assigned to each outcome of the experiment.
background-color:: green
- What is the probability that the first & second cameras pass the test, and the third one fails?
background-color:: green
- What is the probability that two cameras pass the test in three trials?
background-color:: green
- How many ways can this event happen?
-
\binom{n}{r} = \frac{n!}{r!(n-r)!} = \frac{3!}{2!(3-2)!} = 3
-
- What is the probability of this event?
- Each of the three ways has probability (0.8)(0.8)(0.2) = 0.128.
- Probability = 3(0.128) = 0.384.
- This is an example of the Binomial Distribution.
-
The Binomial Distribution
- What is a binomial random variable? #card
card-last-interval:: -1
card-repeats:: 1
card-ease-factor:: 2.5
card-next-schedule:: 2022-11-15T00:00:00.000Z
card-last-reviewed:: 2022-11-14T20:25:45.051Z
card-last-score:: 1
- A random experiment consists of n Bernoulli trials such that:
- The trials are independent.
- Each trial results in only two possible outcomes, labelled as "success" & "failure".
- The probability of a success in each trial, denoted as p, remains constant.
-
- The random variable X that equals the number of trials that result in a success is a binomial random variable with parameters 0 < p < 1 and n = 1, 2, \cdots.
- The probability mass function of X is
-
f(x) = \binom{n}{x}p^x (1-p)^{n-x} \ \ \ \ \ x = 0,1,\cdots, n
-
-
Example: Camera Flash Tests
- See ((6368f276-bc7e-4d91-b7fb-c5b34c4c6feb)) for the whole question.
background-color:: green
- Calculate the probability of 2 passes in 3 tests.
background-color:: green
- We are given that n = 3 and p = 0.8.
- Use the Binomial Distribution formula where X is the number of passes:
-
P(X = 2) = \binom{3}{2}(0.8)^2(0.2)^1 = 3(0.128) = 0.384
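- The binomial probability mass function can also be coded directly as a check on this calculation. A minimal sketch in R; `binom_pmf` is a hypothetical helper name, not something from the course material.

```r
# Binomial probability mass function written out from its formula:
# f(x) = choose(n, x) * p^x * (1 - p)^(n - x)
# (binom_pmf is an illustrative helper name)
binom_pmf <- function(x, n, p) {
  choose(n, x) * p^x * (1 - p)^(n - x)
}

# Camera Flash Tests: n = 3 cameras, each passing with probability 0.8
p_two_passes <- binom_pmf(2, n = 3, p = 0.8)
print(round(p_two_passes, 3))

# The probabilities over x = 0, 1, 2, 3 should add up to 1
total <- sum(binom_pmf(0:3, n = 3, p = 0.8))
print(total)
```

- Rounded to three decimals, `p_two_passes` is 0.384, matching 3(0.128).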
-
-
Example: Organic Pollution
id:: 6368f570-83e7-4642-a881-7ccd40bb0399
- Each sample of water has a 10% chance of containing a particular organic pollutant. Assume that the samples are independent with regard to the presence of the pollutant.
background-color:: green
- Find the probability that, in the next 18 samples, exactly 2 contain the pollutant.
background-color:: green
- Let X denote the number of samples that contain the pollutant in the next 18 samples analysed. Then X is a binomial random variable with p = 0.1 and n = 18.
-
P(X = 2) = \binom{18}{2}(0.1)^2(0.9)^{18-2} = 153(0.1)^2(0.9)^{16} = 0.2835
- Determine the probability that 3 \leq X < 7.
background-color:: green
-
X = 3,4,5,6
-
P(3 \leq X < 7) = P(X=3) + P(X=4) + P(X=5) + P(X=6)
-
\text{or}
-
P(3 \leq X < 7) = \sum^6_{x=3} \binom{18}{x}(0.1)^x(0.9)^{18-x}
-
= 0.168 + 0.070 + 0.022 + 0.005 = 0.265
-
-
Binomial Distributions in R
card-last-interval:: -1
card-repeats:: 1
card-ease-factor:: 2.5
card-next-schedule:: 2022-11-15T00:00:00.000Z
card-last-reviewed:: 2022-11-14T16:21:52.419Z
card-last-score:: 1
- dbinom(x, size, prob) returns P(X = x) for a binomial random variable, where x is the number of successes required, size is the total number of trials, & prob is the probability of success on each trial.
Example: Organic Pollution
- In ((6368f570-83e7-4642-a881-7ccd40bb0399)), x=2, size=18, & prob=0.1.
background-color:: green
-
dbinom(x=2, size=18, prob=0.1)
[1] 0.2835121
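- The interval question from the Organic Pollution example, P(3 ≤ X < 7), can be handled the same way. A minimal sketch, assuming base R's `dbinom` and `pbinom`; the variable names are illustrative.

```r
# Organic Pollution example: n = 18 samples, p = 0.1 per sample
n <- 18
p <- 0.1

# P(3 <= X < 7) as a sum of the point probabilities P(X = 3), ..., P(X = 6)
terms <- dbinom(3:6, size = n, prob = p)
p_sum <- sum(terms)

# The same probability from the cumulative distribution: P(X <= 6) - P(X <= 2)
p_cdf <- pbinom(6, size = n, prob = p) - pbinom(2, size = n, prob = p)

print(round(terms, 3))
print(round(c(p_sum, p_cdf), 3))
```

- Both routes give 0.265 to three decimals. `dbinom` accepts a vector of x values, so the four terms are computed in one call.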
-
-
Binomial Mean & Variance #card
card-last-interval:: -1
card-repeats:: 1
card-ease-factor:: 2.5
card-next-schedule:: 2022-11-22T00:00:00.000Z
card-last-reviewed:: 2022-11-21T13:08:26.634Z
card-last-score:: 1
- If X is a binomial random variable with parameters p & n:
- The mean & variance of the binomial distribution b(x; n,p) are
-
\mu = np \text{ and } \sigma^2 = npq \text{, where } q = 1-p
-
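- The formulas np and npq can be checked numerically against the definitions of mean and variance. A minimal sketch in R, reusing n = 18 and p = 0.1 from the Organic Pollution example as illustrative values.

```r
# Illustrative parameters (borrowed from the Organic Pollution example)
n <- 18
p <- 0.1

x <- 0:n                             # every value X can take
f <- dbinom(x, size = n, prob = p)   # P(X = x) for each value

# Mean and variance computed from their definitions
mu <- sum(x * f)
sigma2 <- sum((x - mu)^2 * f)

print(c(mean = mu, np = n * p))
print(c(variance = sigma2, npq = n * p * (1 - p)))
```

- Up to floating-point rounding, the two entries in each printed pair agree (1.8 for the mean and 1.62 for the variance).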
-
Chebyshev's Inequality
- What is Chebyshev's Inequality? #card
card-last-interval:: -1
card-repeats:: 1
card-ease-factor:: 2.5
card-next-schedule:: 2022-11-15T00:00:00.000Z
card-last-reviewed:: 2022-11-14T16:23:31.513Z
card-last-score:: 1
- Chebyshev's Inequality provides an estimate as to where a certain percentage of observations will lie relative to the mean once the standard deviation is known.
- For example, at least 75% of values will lie within two standard deviations of the mean.
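- The usual statement of the inequality, for a random variable X with mean \mu and standard deviation \sigma and any k > 0, with k = 2 giving the 75% figure:

```latex
P(|X - \mu| < k\sigma) \geq 1 - \frac{1}{k^2},
\qquad
k = 2: \quad 1 - \frac{1}{2^2} = 0.75
```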
-
Poisson Distribution
- What are Poisson Experiments? #card
card-last-interval:: -1
card-repeats:: 1
card-ease-factor:: 2.5
card-next-schedule:: 2022-11-22T00:00:00.000Z
card-last-reviewed:: 2022-11-21T13:05:40.034Z
card-last-score:: 1
- Experiments yielding numerical values of a random variable X, the number of outcomes occurring during a given time interval or in a specified region, are called Poisson Experiments.
- The given time interval may be of any length, such as a minute, a day, a week, a month, or even a year.
- A Poisson Experiment is derived from the Poisson Process and possesses the following properties:
- The number of outcomes occurring in one time interval or specified region of space is independent of the number that occur in any other disjoint time interval or region. In this sense, we say that the Poisson Process "has no memory".
- The probability that a single outcome will occur during a very short time interval or in a small region is proportional to the length of the time interval or the size of the region, and does not depend on the number of outcomes occurring outside this time interval or region.
- The probability that more than one outcome will occur in such a short time interval or fall in such a small region is negligible.
- What is the Poisson Distribution? #card
card-last-interval:: -1
card-repeats:: 1
card-ease-factor:: 2.5
card-next-schedule:: 2022-11-22T00:00:00.000Z
card-last-reviewed:: 2022-11-21T13:06:55.129Z
card-last-score:: 1
- The random variable X that equals the number of events in a Poisson Process is a Poisson Random Variable with parameter \lambda > 0, and the probability mass function is
-
f(x) = \frac{e^{-\lambda}\lambda^x}{x!} \text{ for } x = 0,1,2,3,\cdots
-
-
Mean & Variance of Poisson Distribution
- If \lambda is the average number of successes occurring in a given time interval or region in the Poisson Distribution, then the mean & the variance of the Poisson distribution are both equal to \lambda.
- Mean = \lambda, variance = \lambda.
- A one-parameter distribution.
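- A numerical check that the mean & variance both equal \lambda. A minimal sketch in R; the three rates and the truncation point of 100 are illustrative assumptions, not values from the notes.

```r
# Illustrative rates only (assumed for this sketch)
lambdas <- c(0.5, 2.3, 5)

# The Poisson support is infinite; truncating at 100 is an assumption
# that leaves a negligible tail for rates this small
x <- 0:100

for (lambda in lambdas) {
  f <- dpois(x, lambda)            # P(X = x) for each x
  mu <- sum(x * f)                 # mean from its definition
  sigma2 <- sum((x - mu)^2 * f)    # variance from its definition
  cat(sprintf("lambda = %.1f  mean = %.4f  variance = %.4f\n",
              lambda, mu, sigma2))
}
```

- For each rate, the printed mean and variance match \lambda to four decimals.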
-
Poisson Density Functions for Different Means
-
Poisson Example: Calculations for Wire Flaws
- Suppose that the number of flaws on a thin copper wire follows a Poisson Distribution with a mean of 2.3 flaws per millimetre.
background-color:: green
- Find the probability of exactly 2 flaws in 1mm of wire.
background-color:: green
-
P(X = 2) = \frac{e^{-2.3}2.3^{2}}{2!} = 0.265
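- The same probability in R, both by evaluating e^{-\lambda}\lambda^x / x! term by term and with the built-in `dpois`. A minimal sketch; the variable names are illustrative.

```r
lambda <- 2.3   # mean number of flaws per millimetre, from the example
x <- 2          # number of flaws of interest

# Evaluate the Poisson formula term by term
p_formula <- exp(-lambda) * lambda^x / factorial(x)

# Built-in Poisson probability mass function
p_builtin <- dpois(x, lambda)

print(round(c(p_formula, p_builtin), 3))
```

- Both values round to 0.265.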
-
-
Poisson Example: Car Park
- A car park has 3 entrances, A, B, & C. The number of cars per hour entering through each of these is Poisson-distributed with mean \lambda_A = 1.5, \lambda_B = 1.0, and \lambda_C = 2.5. Arrivals at each entrance are independent.
background-color:: green
- T is the total number of cars entering in an hour.
-
T \sim \text{ Poisson}(\lambda_A + \lambda_B + \lambda_C) \equiv \text{Poisson}(1.5 + 1.0 + 2.5) \equiv \text{Poisson}(5)
-
P(T = 4) = \frac{e^{-5} 5^4}{4!} = 0.1755
-
Sum of Independent Poisson Random Variables #card
card-last-interval:: -1
card-repeats:: 1
card-ease-factor:: 2.5
card-next-schedule:: 2022-11-15T00:00:00.000Z
card-last-reviewed:: 2022-11-14T15:54:18.796Z
card-last-score:: 1
- If X_1, X_2, \cdots, X_n are independently Poisson distributed with parameters \lambda_1, \lambda_2, \cdots, \lambda_n then
-
T = X_1 + X_2 + \cdots + X_n \text{ is Poisson}(\lambda_1 + \lambda_2 + \cdots + \lambda_n)
-
- and
-
E[T] = \lambda_1 + \lambda_2 + \cdots + \lambda_n
-
- and
-
\text{Var}(T) = \lambda_1 + \lambda_2 + \cdots + \lambda_n
-
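- The result can be sanity-checked on the Car Park example by comparing P(T = 4) under Poisson(5) with a brute-force sum over every way the three entrances can total 4 cars. A minimal sketch in R; the variable names are illustrative.

```r
# Hourly rates for the three entrances, from the Car Park example
lambda <- c(A = 1.5, B = 1.0, C = 2.5)

# Direct route: T ~ Poisson(1.5 + 1.0 + 2.5) = Poisson(5)
p_direct <- dpois(4, sum(lambda))

# Brute-force route: add P(A = a) P(B = b) P(C = c) over all a + b + c = 4,
# using independence of the three entrances
p_brute <- 0
for (a in 0:4) {
  for (b in 0:(4 - a)) {
    n_c <- 4 - a - b   # cars through entrance C
    p_brute <- p_brute +
      dpois(a, lambda[["A"]]) * dpois(b, lambda[["B"]]) * dpois(n_c, lambda[["C"]])
  }
}

print(round(c(direct = p_direct, brute_force = p_brute), 4))
```

- Both routes give 0.1755 to four decimals, which is what the sum rule predicts.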