- #[[ST2001 - Statistics in Data Science I]]
- **Previous Topic:** [[Random Variables]]
- **Next Topic:** [[The Normal Distribution]]
- **Relevant Slides:** ![Topic 6 - Binomial and Poisson.pdf](../assets/Topic_6_-_Binomial_and_Poisson_1665414148124_0.pdf)
-
- Often, the observations generated by different statistical experiments have the same type of behaviour.
- In general, only a handful of important probability distributions are needed to describe many of the discrete random variables encountered in practice.
-
- # Bernoulli Trials
  collapsed:: true
  - What is a **Bernoulli Trial**? #card
    card-last-interval:: -1
    card-repeats:: 1
    card-ease-factor:: 2.5
    card-next-schedule:: 2022-11-15T00:00:00.000Z
    card-last-reviewed:: 2022-11-14T20:08:48.931Z
    card-last-score:: 1
    - A **Bernoulli Trial** is a random experiment with just two outcomes - success / failure.
    - For a single trial, define the random variable:
      - $$X = \begin{cases}1, & \text{success,} \\0, & \text{failure.}\end{cases}$$
    - $P(X = 1) = p$ and $P(X=0) = 1 -p$, where $p$ is the success probability, or more compactly:
      - $$P(X = x) = p^x{(1-p)^{1-x}} \ \ \ \ \ x = 0,1$$
  - What is the **expected value** of a Bernoulli Trial? #card
    card-last-interval:: -1
    card-repeats:: 1
    card-ease-factor:: 2.5
    card-next-schedule:: 2022-11-15T00:00:00.000Z
    card-last-reviewed:: 2022-11-14T16:20:53.147Z
    card-last-score:: 1
    - $$E[X] = (0)(1-p)+(1)p = p$$
  - What is the **variance** of a Bernoulli Trial? #card
    card-last-interval:: -1
    card-repeats:: 1
    card-ease-factor:: 2.5
    card-next-schedule:: 2022-11-15T00:00:00.000Z
    card-last-reviewed:: 2022-11-14T16:24:49.061Z
    card-last-score:: 1
    - $$\text{Var}(X) = p(1-p)$$
  - ## Bernoulli Trial Assumptions #card
    card-last-interval:: -1
    card-repeats:: 1
    card-ease-factor:: 2.5
    card-next-schedule:: 2022-11-15T00:00:00.000Z
    card-last-reviewed:: 2022-11-14T16:24:43.818Z
    card-last-score:: 1
    - The outcomes of the trials are mutually **independent**.
    - The probability of success $p$ is **constant** over trials.
    - Note that these assumptions may not always be appropriate.
  - ## Example: Camera Flash Tests
    id:: 6368f276-bc7e-4d91-b7fb-c5b34c4c6feb
    - The time to recharge the flash is tested in three mobile phone cameras. The probability that a camera passes the test is 0.8, and the cameras perform independently.
      background-color:: green
    - The random variable $X$ denotes the number of cameras that pass the test. The last column of the table shows the values of $X$ assigned to each outcome of the experiment.
      background-color:: green
    - What is the probability that the first & second cameras pass the test, and the third one fails?
      background-color:: green
      - ![image.png](../assets/image_1667822368192_0.png)
      - Each camera test can be treated as a **Bernoulli Trial**.
      - $$P(PPF) = (0.8)(0.8)(0.2) = 0.128$$
    - What is the probability that exactly two cameras pass the test in three trials?
      background-color:: green
      - How many ways can this event happen?
        - $$\binom{n}{r} = \frac{n!}{r!(n-r)!} = \frac{3!}{2!(3-2)!} = 3$$
      - What is the probability of this event?
        - 0.128 for each of the three ways.
        - Probability = $3(0.128) = 0.384$.
        - This is an example of the **Binomial Distribution**.
-
- # The Binomial Distribution
  - What is a **binomial random variable**? #card
    card-last-interval:: -1
    card-repeats:: 1
    card-ease-factor:: 2.5
    card-next-schedule:: 2022-11-15T00:00:00.000Z
    card-last-reviewed:: 2022-11-14T20:25:45.051Z
    card-last-score:: 1
    - A random experiment consists of $n$ Bernoulli trials such that:
      - 1. The trials are independent.
        2. Each trial results in only two possible outcomes, labelled as "success" & "failure".
        3. The probability of a success in each trial, denoted as $p$, remains constant.
    - The random variable $X$ that equals the number of trials that result in a success is a **binomial random variable** with parameters $0 < p < 1$ and $n = 1, 2, \cdots$.
    - The **probability mass function** of $X$ is
      - $$f(x) = \binom{n}{x}p^x (1-p)^{n-x} \ \ \ \ \ x = 0,1,\cdots, n$$
  - ## Example: Camera Flash Tests
    - See ((6368f276-bc7e-4d91-b7fb-c5b34c4c6feb)) for the whole question.
      background-color:: green
    - Calculate the probability of 2 passes in 3 tests.
      background-color:: green
      - We are given that $n = 3$ and $p = 0.8$.
      - Use the Binomial Distribution formula where $X$ is the number of passes:
        - $$P(X = 2) = \binom{3}{2}(0.8)^2(0.2)^1 = 3(0.128) = 0.384$$
  - ## Example: Organic Pollution
    id:: 6368f570-83e7-4642-a881-7ccd40bb0399
    - Each sample of water has a 10% chance of containing a particular organic pollutant. Assume that the samples are independent with regard to the presence of the pollutant.
      background-color:: green
    - Find the probability that, in the next 18 samples, exactly 2 contain the pollutant.
      background-color:: green
      - Let $X$ denote the number of samples that contain the pollutant in the next 18 samples analysed. Then $X$ is a binomial random variable with $p = 0.1$ and $n = 18$.
      - $$P(X = 2) = \binom{18}{2}(0.1)^2(0.9)^{18-2} = 153(0.1)^2(0.9)^{16} = 0.2835$$
    - Determine the probability that $3 \leq X < 7$.
      background-color:: green
      - $$X = 3,4,5,6$$
      - $$P(3 \leq X < 7) = P(X=3) + P(X=4) + P(X=5) + P(X=6)$$
      - $$ \text{or}$$
      - $$P(3 \leq X < 7) = \sum^6_{x=3} \binom{18}{x}(0.1)^x(0.9)^{18-x}$$
      - $$ = 0.168 + 0.070 + 0.022 + 0.005 = 0.265$$
  - ## Binomial Distributions in R
    card-last-interval:: -1
    card-repeats:: 1
    card-ease-factor:: 2.5
    card-next-schedule:: 2022-11-15T00:00:00.000Z
    card-last-reviewed:: 2022-11-14T16:21:52.419Z
    card-last-score:: 1
    - `dbinom(x, size, prob)`, where `x` is the number of successes, `size` is the total number of trials, & `prob` is the probability of success on each trial.
    - ### Example: Organic Pollution
      - In ((6368f570-83e7-4642-a881-7ccd40bb0399)), `x=2`, `size=18`, & `prob=0.10`.
        background-color:: green
        - ```R
          # P(X = 2) for X ~ Binomial(n = 18, p = 0.1)
          dbinom(x=2, size=18, prob=0.1)
          # [1] 0.2835121
          ```
  - ## Binomial Mean & Variance #card
    card-last-interval:: -1
    card-repeats:: 1
    card-ease-factor:: 2.5
    card-next-schedule:: 2022-11-22T00:00:00.000Z
    card-last-reviewed:: 2022-11-21T13:08:26.634Z
    card-last-score:: 1
    - If $X$ is a **binomial random variable** with parameters $p$ & $n$:
      - The **mean** & **variance** of the binomial distribution $b(x; n,p)$ are
        - $$\mu = np \text{ and } \sigma^2 = npq \text{, where } q = 1-p$$
  - ## Chebyshev's Inequality
    - What is **Chebyshev's Inequality**? #card
      card-last-interval:: -1
      card-repeats:: 1
      card-ease-factor:: 2.5
      card-next-schedule:: 2022-11-15T00:00:00.000Z
      card-last-reviewed:: 2022-11-14T16:23:31.513Z
      card-last-score:: 1
      - **Chebyshev's Inequality** gives a lower bound on the proportion of observations that lie within $k$ **standard deviations** of the mean: at least $1 - 1/k^2$, whatever the distribution.
      - For example, with $k = 2$, at least 75% of values will lie within two standard deviations of the mean.
-
- # Poisson Distribution
  - What are **Poisson Experiments**? #card
    card-last-interval:: -1
    card-repeats:: 1
    card-ease-factor:: 2.5
    card-next-schedule:: 2022-11-22T00:00:00.000Z
    card-last-reviewed:: 2022-11-21T13:05:40.034Z
    card-last-score:: 1
    - Experiments yielding numerical values of a random variable $X$, the number of outcomes occurring during a given time interval or in a specified region, are called **Poisson Experiments**.
    - The given time interval may be of any length, such as a minute, a day, a week, a month, or even a year.
    - A Poisson Experiment is derived from the **Poisson Process** and possesses the following properties:
      - The number of outcomes occurring in one time interval or specified region of space is **independent** of the number that occur in any other disjoint time interval or region. In this sense, we say that the Poisson Process "has no memory".
      - The probability that a single outcome will occur during a very short time interval or in a small region is **proportional** to the **length** of the time interval or the size of the region, and does not depend on the number of outcomes occurring outside this time interval or region.
      - The probability that more than one outcome will occur in such a short time interval or fall in such a small region is **negligible**.
  - What is the **Poisson Distribution**? #card
    card-last-interval:: -1
    card-repeats:: 1
    card-ease-factor:: 2.5
    card-next-schedule:: 2022-11-22T00:00:00.000Z
    card-last-reviewed:: 2022-11-21T13:06:55.129Z
    card-last-score:: 1
    - The random variable $X$ that equals the number of events in a Poisson Process is a **Poisson Random Variable** with parameter $\lambda > 0$, and the probability mass function is
      - $$f(x) = \frac{e^{-\lambda}\lambda^x}{x!} \text{ for } x = 0,1,2,3,\cdots$$
  - ## Mean & Variance of Poisson Distribution
    - If $\lambda$ is the average number of successes occurring in a given time interval or region in the Poisson Distribution, then the **mean** & the **variance** of the Poisson distribution are both equal to $\lambda$.
    - Mean = $\lambda$, variance = $\lambda$.
    - The Poisson is a one-parameter distribution.
  - ## Poisson Density Functions for Different Means
    - ![image.png](../assets/image_1667824994941_0.png)
    - If the variance is much greater than the mean, then the Poisson Distribution would not be a good model for the distribution of the random variable.
  - ## Poisson Example: Calculations for Wire Flaws
    - Suppose that the number of flaws on a thin copper wire follows a Poisson Distribution with a mean of 2.3 flaws per millimetre.
      background-color:: green
    - Find the probability of exactly 2 flaws in 1mm of wire.
      background-color:: green
      - $$P(X = 2) = \frac{e^{-2.3}2.3^{2}}{2!} = 0.265$$
  - ## Poisson Example: Car Park
    - A car park has 3 entrances, $A$, $B$, & $C$.
      The number of cars per hour entering through each of these is Poisson-distributed with mean $\lambda_A = 1.5$, $\lambda_B = 1.0$, and $\lambda_C = 2.5$. Arrivals at each entrance are **independent**.
      background-color:: green
      - $T$ is the total number of cars entering in an hour.
      - $$T \sim \text{ Poisson}(\lambda_A + \lambda_B + \lambda_C) \equiv \text{Poisson}(1.5 + 1.0 + 2.5) \equiv \text{Poisson}(5)$$
      - $$P(T = 4) = \frac{e^{-5} 5^4}{4!} = 0.1755$$
  - ## Sum of Independent Poisson Random Variables #card
    card-last-interval:: -1
    card-repeats:: 1
    card-ease-factor:: 2.5
    card-next-schedule:: 2022-11-15T00:00:00.000Z
    card-last-reviewed:: 2022-11-14T15:54:18.796Z
    card-last-score:: 1
    - If $X_1, X_2, \cdots, X_n$ are independently Poisson distributed with parameters $\lambda_1, \lambda_2, \cdots, \lambda_n$ then
      - $$T = X_1 + X_2 + \cdots + X_n \text{ is Poisson}(\lambda_1 + \lambda_2 + \cdots + \lambda_n)$$
    - and
      - $$E[T] = \lambda_1 + \lambda_2 + \cdots + \lambda_n$$
    - and
      - $$\text{Var}(T) = \lambda_1 + \lambda_2 + \cdots + \lambda_n$$
-
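- The sum rule can be checked numerically for the car park example. The following is a minimal sketch in base R, not part of the slides: it uses `dpois` and `rpois` from the `stats` package, and the variable names (`lambda`, `p_exact`, `p_enum`, `total`), the seed, and the simulation size are illustrative assumptions.

```R
# Hourly arrival rates at entrances A, B and C (from the car park example)
lambda <- c(A = 1.5, B = 1.0, C = 2.5)

# P(T = 4) using the sum rule: T ~ Poisson(1.5 + 1.0 + 2.5) = Poisson(5)
p_exact <- dpois(4, lambda = sum(lambda))
print(p_exact)  # approximately 0.1755, matching the worked answer

# The same probability without the sum rule: add up the joint probability
# of every split (a, b, c_count) of 4 cars across the three entrances
p_enum <- 0
for (a in 0:4) {
  for (b in 0:(4 - a)) {
    c_count <- 4 - a - b
    p_enum <- p_enum +
      dpois(a, lambda[["A"]]) * dpois(b, lambda[["B"]]) * dpois(c_count, lambda[["C"]])
  }
}
print(p_enum)

# Simulation check of E[T] and Var(T); the seed and sample size are arbitrary
set.seed(1)
n_sim <- 1e5
total <- rpois(n_sim, lambda[["A"]]) + rpois(n_sim, lambda[["B"]]) + rpois(n_sim, lambda[["C"]])
sim_mean <- mean(total)
sim_var <- var(total)
print(c(mean = sim_mean, variance = sim_var))
```

- The enumeration agrees with `dpois(4, 5)` because the arrivals are independent, and the simulated mean and variance should both be close to $\lambda_A + \lambda_B + \lambda_C = 5$.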