155 lines
7.7 KiB
Markdown
- #[[ST2001 - Statistics in Data Science I]]
|
|
- **Previous Topic:** [[Sampling]]
|
|
- **Next Topic:** [[Random Variables]]
|
|
- **Relevant Slides:** 
|
|
-
|
|
- Probability provides the *framework* for the study & application of statistics.
|
|
- # What are Probabilities?
|
|
- Take, for example, a 6-sided die about to be tossed for the first time.
|
|
- **Classical:** 6 possible outcomes, each equally likely to occur by symmetry, so each has probability $\frac{1}{6}$.
|
|
- **Frequentist:** Empirical evidence shows that similar dice thrown in the past have landed on each side about equally often.
|
|
- **Subjective:** The degree of individual belief in occurrence of an event can be influenced by classical or frequentist arguments.
|
|
- Subjective probabilities are also influenced by other reasons when symmetry arguments don't apply & repeated trials are not possible.
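The frequentist view can be illustrated with a small simulation. This is only a sketch: it assumes a fair die modelled by Python's `random` module, and the seed and number of rolls are arbitrary choices, not values from the course.

```python
import random
from collections import Counter

random.seed(2001)   # arbitrary seed so the run is repeatable
n_rolls = 60_000    # arbitrary number of simulated tosses

# Simulate tossing a fair six-sided die many times.
rolls = [random.randint(1, 6) for _ in range(n_rolls)]
counts = Counter(rolls)

# Relative frequency of each face: count / number of tosses.
relative_frequency = {face: counts[face] / n_rolls for face in range(1, 7)}

for face, freq in relative_frequency.items():
    print(face, round(freq, 3))
```

Each relative frequency should land close to $\frac{1}{6} \approx 0.167$, which matches the classical symmetry argument.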
|
|
-
|
|
- # Probability
|
|
- The probability of an event $A$ is the number of (equally likely & disjoint) outcomes in the event divided by the total number of (equally likely & disjoint) possible outcomes.
|
|
- $$P(A) = \frac{\text{\# of outcomes in A}}{\text{\# of possible outcomes}}$$
|
|
- $$(0 \leq P(A) \leq 1)$$
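As a minimal sketch of the formula, assuming a fair six-sided die and a hypothetical event $A$ = "the roll is even" (the function name `classical_probability` is made up for illustration):

```python
from fractions import Fraction

# Sample space for one roll of a six-sided die (assumed fair).
sample_space = {1, 2, 3, 4, 5, 6}

# Hypothetical event A: the roll is even.
event_a = {outcome for outcome in sample_space if outcome % 2 == 0}


def classical_probability(event, space):
    """Outcomes in the event divided by the total number of possible outcomes."""
    return Fraction(len(event & space), len(space))


p_a = classical_probability(event_a, sample_space)
print(p_a)  # 1/2
```

Using `Fraction` keeps the result exact instead of a rounded float.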
|
|
- ## Sample Spaces
|
|
- What is a **sample space**? #card
|
|
card-last-interval:: -1
|
|
card-repeats:: 1
|
|
card-ease-factor:: 2.7
|
|
card-next-schedule:: 2022-11-15T00:00:00.000Z
|
|
card-last-reviewed:: 2022-11-14T16:42:50.065Z
|
|
card-last-score:: 1
|
|
- The set of all possible outcomes of a random experiment is called the **sample space**, $S$.
|
|
- $S$ is **discrete** if it consists of a finite or countably infinite set of outcomes.
|
|
- $S$ is **continuous** if it contains an interval of real numbers.
|
|
- $$P(S) = 1$$
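A discrete sample space can be listed explicitly. The sketch below assumes an experiment of rolling two fair dice (an example chosen here, not taken from the slides) and checks that the probabilities of all outcomes add up to $P(S) = 1$.

```python
from fractions import Fraction
from itertools import product

# Every ordered pair (first die, second die) is one outcome.
faces = range(1, 7)
sample_space = set(product(faces, repeat=2))

# With equally likely outcomes, each has probability 1 / |S|.
p_each = Fraction(1, len(sample_space))
p_s = sum(p_each for _ in sample_space)

print(len(sample_space))  # 36
print(p_s)                # 1
```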
|
|
- ## Events
|
|
- What is an **event**? #card
|
|
card-last-interval:: 4.14
|
|
card-repeats:: 2
|
|
card-ease-factor:: 2.56
|
|
card-next-schedule:: 2022-11-21T23:15:48.008Z
|
|
card-last-reviewed:: 2022-11-17T20:15:48.008Z
|
|
card-last-score:: 5
|
|
- An **event** is a specific collection of sample points / possible outcomes.
|
|
- An event is denoted by $E$ or by capital letters, $A$, $B$, etc.
|
|
- What is a **Simple Event**? #card
|
|
card-last-interval:: 3.57
|
|
card-repeats:: 2
|
|
card-ease-factor:: 2.46
|
|
card-next-schedule:: 2022-11-22T07:34:13.841Z
|
|
card-last-reviewed:: 2022-11-18T18:34:13.841Z
|
|
card-last-score:: 5
|
|
- A **Simple Event** is a collection of only **one** sample point / possible outcome.
|
|
- What is a **Compound Event**? #card
|
|
card-last-interval:: 4.14
|
|
card-repeats:: 2
|
|
card-ease-factor:: 2.56
|
|
card-next-schedule:: 2022-11-21T23:15:54.761Z
|
|
card-last-reviewed:: 2022-11-17T20:15:54.762Z
|
|
card-last-score:: 5
|
|
- A **Compound Event** is a collection of **more than one** sample point / possible outcome.
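The distinction between simple and compound events comes down to counting sample points. A small sketch, assuming a single die roll and a hypothetical helper `classify_event`:

```python
# Sample space for one roll of a six-sided die.
sample_space = {1, 2, 3, 4, 5, 6}


def classify_event(event):
    """Label an event by how many sample points it contains."""
    if not event <= sample_space:
        raise ValueError("an event must be a subset of the sample space")
    if len(event) == 1:
        return "simple"
    if len(event) > 1:
        return "compound"
    return "empty"


print(classify_event({6}))        # simple
print(classify_event({2, 4, 6}))  # compound
```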
|
|
- ## Permutations
|
|
- A **permutation** is an arrangement of objects.
|
|
- More generally, it is an ordered arrangement of $r$ objects chosen from $n$ distinct objects, where replacement in the selection is not allowed.
|
|
- The symbol $P^n_r$ represents the number of permutations of $r$ objects selected from $n$ distinct objects.
|
|
- The calculation is given by the formula:
|
|
- $$P^n_r = \frac{n!}{(n-r)!}$$
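The formula can be checked against a brute-force listing. This sketch assumes five distinct letters and $r = 3$, both arbitrary choices; `count_permutations` is a hypothetical helper name.

```python
from itertools import permutations
from math import factorial


def count_permutations(n, r):
    """Number of ordered arrangements of r objects chosen from n distinct objects."""
    if not 0 <= r <= n:
        raise ValueError("need 0 <= r <= n")
    return factorial(n) // factorial(n - r)


letters = "ABCDE"  # n = 5 distinct objects
arrangements = list(permutations(letters, 3))  # every ordered selection of r = 3

print(count_permutations(5, 3))  # 60
print(len(arrangements))         # 60
```

Both counts agree because `itertools.permutations` never reuses an object within one arrangement, which is the "no replacement" condition.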
|
|
- ## Joint Events (and / or)
|
|
- Probabilities of **joint events** can often be determined from the probabilities of the individual events that comprise them.
|
|
- Joint events are generated by applying basic set operations to individual events, specifically:
|
|
- **Complement** of event $A$ is $\bar{A} =$ all outcomes *not* in $A$.
|
|
- **Union** of events $A \cup B$; $A$ **or** $B$ or both.
|
|
- **Intersection** of events $A$ **and** $B$ -> $A \cap B$.
|
|
- **Disjoint** events cannot occur together -> $A \cap B = \emptyset$.
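These set operations map directly onto Python's built-in `set` type. The events below (a die roll being even, greater than 3, or in $\{1, 3\}$) are hypothetical examples.

```python
# Sample space for one roll of a six-sided die.
sample_space = {1, 2, 3, 4, 5, 6}

a = {2, 4, 6}  # hypothetical event A: the roll is even
b = {4, 5, 6}  # hypothetical event B: the roll is greater than 3
c = {1, 3}     # hypothetical event C, chosen to share no outcomes with A

complement_a = sample_space - a   # all outcomes not in A
union_ab = a | b                  # A or B or both
intersection_ab = a & b           # A and B
a_c_disjoint = a.isdisjoint(c)    # True when A and C cannot occur together

print(sorted(complement_a))     # [1, 3, 5]
print(sorted(union_ab))         # [2, 4, 5, 6]
print(sorted(intersection_ab))  # [4, 6]
print(a_c_disjoint)             # True
```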
|
|
-
|
|
- ## Probability of a Union ($A$ **or** $B$) #card
|
|
card-last-interval:: 21.53
|
|
card-repeats:: 4
|
|
card-ease-factor:: 2.32
|
|
card-next-schedule:: 2022-12-06T08:01:39.281Z
|
|
card-last-reviewed:: 2022-11-14T20:01:39.281Z
|
|
card-last-score:: 3
|
|
- For any two events $A$ and $B$, the probability of union is given by:
|
|
- $$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$
|
|
- For two **disjoint** (also called **mutually exclusive**) events $A$ and $B$, the probability that one *or* the other occurs is the sum of the probabilities of the two events.
|
|
- $$P(A \cup B) = P(A) + P(B)$$
|
|
- If adding $P(A) + P(B)$ gives a value greater than 1, the events cannot be mutually exclusive -> there is an intersection, and $P(A \cap B)$ must be subtracted.
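A quick check of the addition rule, assuming a fair die with hypothetical events $A$ = "even" and $B$ = "greater than 3", which overlap in $\{4, 6\}$:

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}
a = {2, 4, 6}  # hypothetical event A: the roll is even
b = {4, 5, 6}  # hypothetical event B: the roll is greater than 3


def probability(event):
    """Classical probability on a sample space of equally likely outcomes."""
    return Fraction(len(event), len(sample_space))


# Addition rule: P(A or B) = P(A) + P(B) - P(A and B).
p_union_rule = probability(a) + probability(b) - probability(a & b)

# Direct count of the outcomes in the union, for comparison.
p_union_direct = probability(a | b)

print(p_union_rule, p_union_direct)  # 2/3 2/3
```

Here $P(A) + P(B) = 1$ on its own, which overstates the answer because the outcomes 4 and 6 are counted twice.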
|
|
- ## Intersections ($A$ **and** $B$)
|
|
card-last-interval:: 3.51
|
|
card-repeats:: 2
|
|
card-ease-factor:: 2.6
|
|
card-next-schedule:: 2022-10-08T00:26:58.336Z
|
|
card-last-reviewed:: 2022-10-04T12:26:58.337Z
|
|
card-last-score:: 5
|
|
- #### Multiplication Rule for Independent Events #card
|
|
card-last-interval:: 33.64
|
|
card-repeats:: 4
|
|
card-ease-factor:: 2.9
|
|
card-next-schedule:: 2022-12-18T11:07:08.232Z
|
|
card-last-reviewed:: 2022-11-14T20:07:08.232Z
|
|
card-last-score:: 5
|
|
- For two **independent** events $A$ and $B$, the probability that *both* $A$ **and** $B$ occur is the product of the probabilities of the two events.
|
|
- $$P(A \cap B) = P(A) \times P(B)$$
|
|
- Two events are **independent** if the occurrence of one has no effect on the probability of the other.
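The multiplication rule can be confirmed by enumeration. This sketch assumes two fair dice rolled together, with hypothetical events "first die shows 6" and "second die shows 6":

```python
from fractions import Fraction
from itertools import product

# All 36 ordered outcomes of rolling two dice.
sample_space = set(product(range(1, 7), repeat=2))


def probability(event):
    """Classical probability on a sample space of equally likely outcomes."""
    return Fraction(len(event), len(sample_space))


first_is_six = {o for o in sample_space if o[0] == 6}
second_is_six = {o for o in sample_space if o[1] == 6}

# Multiplication rule for independent events.
p_product = probability(first_is_six) * probability(second_is_six)

# Direct count of outcomes where both dice show 6.
p_both = probability(first_is_six & second_is_six)

print(p_product, p_both)  # 1/36 1/36
```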
|
|
- ## Conditional Probability
|
|
card-last-score:: 1
|
|
card-repeats:: 1
|
|
card-next-schedule:: 2022-10-04T23:00:00.000Z
|
|
card-last-interval:: -1
|
|
card-ease-factor:: 2.5
|
|
card-last-reviewed:: 2022-10-04T12:31:44.517Z
|
|
- What is **conditional probability**? #card
|
|
card-last-interval:: -1
|
|
card-repeats:: 1
|
|
card-ease-factor:: 2.36
|
|
card-next-schedule:: 2022-11-15T00:00:00.000Z
|
|
card-last-reviewed:: 2022-11-14T16:30:38.038Z
|
|
card-last-score:: 1
|
|
- $P(B | A)$ is the probability of event $B$ occurring, given that event $A$ has already occurred.
|
|
- The **conditional probability** of $B$ given $A$, denoted by $P(B | A)$, is defined by: #card
|
|
card-last-interval:: 0.98
|
|
card-repeats:: 2
|
|
card-ease-factor:: 2.36
|
|
card-next-schedule:: 2022-11-15T19:08:28.312Z
|
|
card-last-reviewed:: 2022-11-14T20:08:28.312Z
|
|
card-last-score:: 3
|
|
- $$P(B|A) = \frac{P(A \cap B)}{P(A)} \text{, provided } P(A) > 0$$
|
|
- **Note:** $P(A)$ cannot equal 0, since we know that $A$ *has* occurred.
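A small worked example of the definition, assuming a fair die with hypothetical events $A$ = "even" and $B$ = "greater than 3" (`conditional_probability` is an illustrative helper, not a library function):

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}
a = {2, 4, 6}  # hypothetical event A: the roll is even
b = {4, 5, 6}  # hypothetical event B: the roll is greater than 3


def probability(event):
    """Classical probability on a sample space of equally likely outcomes."""
    return Fraction(len(event), len(sample_space))


def conditional_probability(b_event, a_event):
    """P(B | A) = P(A and B) / P(A), defined only when P(A) > 0."""
    p_a = probability(a_event)
    if p_a == 0:
        raise ValueError("P(A) must be greater than 0")
    return probability(a_event & b_event) / p_a


p_b_given_a = conditional_probability(b, a)
print(p_b_given_a)  # 2/3
```

Knowing the roll is even narrows the possibilities to $\{2, 4, 6\}$, and two of those three are greater than 3.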
|
|
- ### General Multiplication Rule for Dependent Events #card
|
|
card-last-interval:: -1
|
|
card-repeats:: 1
|
|
card-ease-factor:: 2.6
|
|
card-next-schedule:: 2022-11-15T00:00:00.000Z
|
|
card-last-reviewed:: 2022-11-14T16:33:46.273Z
|
|
card-last-score:: 1
|
|
- The conditional probability can be rewritten to further generalise the multiplication rule:
|
|
- $$P(A \cap B) = P(A) \cdot P(B|A)$$
|
|
- $$P(B \cap A) = P(B) \cdot P(A|B)$$
|
|
- $$\text{Since } P(A \cap B) = P(B \cap A) \text{, it follows that}$$
|
|
- $$P(A) \cdot P(B | A) = P(B) \cdot P(A |B)$$
|
|
- These results mean that $P(A |B)$ can be calculated once we know $P(A)$, $P(B)$, and $P(B | A)$.
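The general multiplication rule is easiest to see with sampling without replacement. The sketch below assumes a standard 52-card deck and two cards drawn in order, an example chosen here for illustration, and compares the rule with a brute-force count.

```python
from fractions import Fraction
from itertools import permutations

# Build a standard 52-card deck as (rank, suit) pairs.
ranks = ["A"] + [str(n) for n in range(2, 11)] + ["J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = [(rank, suit) for rank in ranks for suit in suits]

# A: first card is an ace. B: second card is an ace.
p_first_ace = Fraction(4, 52)               # P(A)
p_second_ace_given_first = Fraction(3, 51)  # P(B | A): one ace is already gone
p_rule = p_first_ace * p_second_ace_given_first

# Brute force: every ordered pair of two different cards.
draws = list(permutations(deck, 2))
both_aces = [d for d in draws if d[0][0] == "A" and d[1][0] == "A"]
p_enumerated = Fraction(len(both_aces), len(draws))

print(p_rule, p_enumerated)  # 1/221 1/221
```

The events are dependent because removing the first ace changes the chance of the second, so the simple product $P(A) \times P(B)$ would not apply.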
|
|
- #### Bayes' Theorem #card
|
|
card-last-interval:: -1
|
|
card-repeats:: 1
|
|
card-ease-factor:: 2.36
|
|
card-next-schedule:: 2022-11-22T00:00:00.000Z
|
|
card-last-reviewed:: 2022-11-21T13:07:04.337Z
|
|
card-last-score:: 1
|
|
- **Bayes' Theorem** states that:
|
|
- $$P(A | B) = \frac{P(B | A) \cdot P(A)}{P(B)} \text{ for } P(B)>0$$
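Bayes' Theorem can be checked on a case small enough to verify by hand. This sketch assumes a fair die with hypothetical events $A$ = "even" and $B$ = "the roll is a 6"; `bayes` is an illustrative helper name.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}
a = {2, 4, 6}  # hypothetical event A: the roll is even
b = {6}        # hypothetical event B: the roll is a 6


def probability(event):
    """Classical probability on a sample space of equally likely outcomes."""
    return Fraction(len(event), len(sample_space))


def bayes(p_b_given_a, p_a, p_b):
    """P(A | B) = P(B | A) * P(A) / P(B), defined only when P(B) > 0."""
    if p_b == 0:
        raise ValueError("P(B) must be greater than 0")
    return p_b_given_a * p_a / p_b


p_b_given_a = probability(a & b) / probability(a)  # P(B | A) = 1/3
p_a_given_b = bayes(p_b_given_a, probability(a), probability(b))

# Direct computation from the definition, for comparison.
p_a_given_b_direct = probability(a & b) / probability(b)

print(p_a_given_b, p_a_given_b_direct)  # 1 1
```

The result is 1 because a roll of 6 is always even, while $P(B | A)$ is only $\frac{1}{3}$: the two conditional probabilities are generally not equal.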
|
|
- ## Independence
|
|
- Two events $A$ and $B$ are independent if and only if:
|
|
- $$P(A \cap B) = P(A)\cdot P(B)$$
|
|
- Therefore, to obtain the probability that two independent events will occur, we simply find the product of their individual probabilities.
|
|
- Equivalently, two events $A$ and $B$ are independent if and only if:
|
|
- $$P(B | A) = P(B) \text{ or } P(A|B) = P(A)$$
|
|
- assuming the existence of the conditional probabilities.
|
|
- Otherwise, $A$ and $B$ are **dependent**.
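Both forms of the independence test can be applied by enumeration. The sketch assumes two fair dice and three hypothetical events: "first die is even", "the sum is 7", and "the sum is 8". The helper `is_independent` is made up for illustration.

```python
from fractions import Fraction
from itertools import product

# All 36 ordered outcomes of rolling two dice.
sample_space = set(product(range(1, 7), repeat=2))


def probability(event):
    """Classical probability on a sample space of equally likely outcomes."""
    return Fraction(len(event), len(sample_space))


def is_independent(a, b):
    """True when P(A and B) equals P(A) * P(B)."""
    return probability(a & b) == probability(a) * probability(b)


first_even = {o for o in sample_space if o[0] % 2 == 0}
sum_is_7 = {o for o in sample_space if sum(o) == 7}
sum_is_8 = {o for o in sample_space if sum(o) == 8}

# Conditional form of the same test: compare P(B | A) with P(B).
p_sum7_given_even = probability(first_even & sum_is_7) / probability(first_even)

print(is_independent(first_even, sum_is_7))      # True
print(is_independent(first_even, sum_is_8))      # False
print(p_sum7_given_even, probability(sum_is_7))  # 1/6 1/6
```

A sum of 7 is reachable from any first die, so knowing the first die is even tells us nothing; a sum of 8 rules out a first die of 1, which is why that pair is dependent.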
|
|
- If in an experiment, the events $A$ and $B$ can both occur, then:
|
|
- $$P(A \cap B) = P(A)P(B|A) \text{, provided } P(A) > 0$$
|
|
- |