uni/Hypothesis Testing.md at 9694cc81d803692f51940714f97805d0509d6c63

aindreas/uni

Fork 0

Files

Andrew 1f7d812b98 Rename year directories to allow natural ordering

2023-12-20 03:57:27 +00:00

10 KiB

Raw Blame History

#ST2001 - Statistics in Data Science I
Previous Topic: Sampling Distributions & Confidence Intervals
Next Topic: Correlation & Linear Regression
Relevant Slides:
What is a hypothesis test? #card card-last-interval:: -1 card-repeats:: 1 card-ease-factor:: 2.5 card-next-schedule:: 2022-11-15T00:00:00.000Z card-last-reviewed:: 2022-11-14T16:21:16.988Z card-last-score:: 1
- A hypothesis test is intended to assess whether a population parameter of interest is equal to some specified value of direct interest to the researcher.
- Hypothesis tests are structured in a very specific manner.
- The CLT and t-distribution provide the framework for assessing if the sample mean is not the same as the proposed parameter mean.
Null & Alternative Hypotheses #card
card-last-interval:: 2.8 card-repeats:: 1 card-ease-factor:: 2.6 card-next-schedule:: 2022-11-17T11:15:25.890Z card-last-reviewed:: 2022-11-14T16:15:25.891Z card-last-score:: 5
- The null hypothesis is a claim to be tested - often the sceptical claim of "no effect", i.e.:
  - H_0 : \mu = \mu_0
- The alternative hypothesis is an alternative claim under consideration, often represented by a range of parameter values, i.e.:
  - H_1 : \mu \neq \mu_0
- We only reject the null in favour of the alternative if there is strong supporting evidence.
- We decide a priori how much evidence is "strong" enough to reject the null.
Stages in Hypothesis Testing #card
card-last-interval:: -1 card-repeats:: 1 card-ease-factor:: 2.5 card-next-schedule:: 2022-11-15T00:00:00.000Z card-last-reviewed:: 2022-11-14T15:53:56.498Z card-last-score:: 1
- 1. Null Hypothesis: The hypothesis that the population parameter is equal to some claimed value (H_0).
  2. Study or Alternative Hypothesis: The hypothesis that must be true if the null hypothesis is false (H_1).
  3. Collect appropriate data.
  4. ^^Assess, through a test statistic, how probable (the p-value) it would be to observe data as or more extreme than the data actually collected if, in fact, the null hypothesis was true.^^
  5. Come to a conclusion whether or not to reject the null hypothesis.
- Rejecting / Not Rejecting the Null
  - If we do not reject the null hypothesis in favour of the alternative, we are saying that the effect indicated by the sample is due only to sampling variation.
  - If we do reject the null hypothesis in favour of the alternative, we are saying that the effect indicated by the sample is real, in that it is more than can be attributed to sampling variation.
Formal Testing Using p-values
- What is the p-value? #card card-last-interval:: -1 card-repeats:: 1 card-ease-factor:: 2.5 card-next-schedule:: 2022-11-19T00:00:00.000Z card-last-reviewed:: 2022-11-18T18:36:35.132Z card-last-score:: 1
  - The p-value is the ==probability of observing data at least as favourable to the alternative hypothesis== as our current data set, if the null hypothesis is true.
  - The p-value is a way of quantifying the strength of the evidence against the null hypothesis and in favour of the alternative. Formally, the p-value is a conditional probability.
  - The smaller the p-value, the stronger the data favours H_1 over H_0. A small p-value (usually < 0.05) corresponds to sufficient evidence to reject H_0 in favour of H_1.
One-Sample Tests for the Population Mean
- Steps #card
  card-last-interval:: -1 card-repeats:: 1 card-ease-factor:: 2.5 card-next-schedule:: 2022-11-15T00:00:00.000Z card-last-reviewed:: 2022-11-14T16:16:23.254Z card-last-score:: 1
  - 1. Specify the hypotheses about \mu.
    2. Calculate a test statistic - based on the sampling distribution of the sample mean.
    3. See how extreme the test statistic is if the null hypothesis was true - compare the test statistic with the t or Normal Distribution.
    4. Make a decision: reject the null, or fail to reject it.
- Strategy #card
  card-last-interval:: -1 card-repeats:: 1 card-ease-factor:: 2.5 card-next-schedule:: 2022-11-15T00:00:00.000Z card-last-reviewed:: 2022-11-14T16:13:48.280Z card-last-score:: 1
  - If the sample came from the population in question, the sample mean should be "close" to the population mean in question.
    - "Close" needs to take into account the sample size used and the variability in the measure (i.e., the standard error).
  - For testing means, the ((6356abee-cb6a-48c5-8f8b-72122b6099eb)) or t-distribution (or the bootstrap) is key.
- Conditions
  - Independence: Random samples / assignment.
  - Normality: For small samples where we use the t-distribution, we require the observations to be approximately normally distributed. For larger (n \geq 30) samples with no extreme skew we can use the CLT and do not require the observations to be normally distributed.
- p-values & (\alpha) Significance Levels #card
  card-last-interval:: 0.98 card-repeats:: 1 card-ease-factor:: 2.36 card-next-schedule:: 2022-11-15T15:23:09.685Z card-last-reviewed:: 2022-11-14T16:23:09.685Z card-last-score:: 3
  - A p-value \leq 0.05 is (typically) considered as sufficient evidence against a null hypothesis (i.e., sufficient evidence to reject the null).
  - If the p-value for the test of a parameter with 2-sided alternative is < 0.05, the 95% Confidence Interval will not include the parameter.
  - A p-value is not the probability of the null hypothesis being true given the data observed - It is the probability of observing such data (or more extreme data) given that the null hypothesis is actually true.
    - A non-significant test does not imply that the null hypothesis is true - It actually means that ==we do not have enough evidence to reject the null hypothesis.==
    - A significant result does not mean that the alternative hypothesis is true - It means that we have ==enough evidence to reject the null.==
  - Statistical Signifance
    - Whenever the p-value is less than a particular threshold, the result is said to be statistically significant at that level.
      - The threshold should be decided a priori, before you calculate the test statistic.
    - For example, if the threshold is p \leq 0.05, the result is statistically significant at the 5% level; if p \leq 0.01, the result is statistically significant at the 1% level, and so on.
    - If a result is statistically significant at the 100\alpha\% level, we can also say that the null hypothesis is "rejected at level 100\alpha\%
  - Example: Golf Club Design
    - An experiment was performed in which 15 drivers produced by a particular club-maker were selected at random, and their coefficients of restitution measured. It is of interest to determine if there is evidence (with \alpha = 0.05 significance level) to support a claim that the mean coefficient of restitution exceeds 0.82. background-color:: green
    - The sample mean & sample standard deviation are \bar x = 0.83725 & s = 0.02456. background-color:: green
      - The objective of the experimenter is to demonstrate that the mean coefficient of restitution exceeds 0.82, hence, a one-sided alternative hypothesis is appropriate.
        
        The parameter of interest is the mean coefficient of restitution, \mu.
        
        The null hypothesis is H_0: \mu = 0.82.
        
        The alternative hypothesis is H_1: \mu > 0.82.
        
        We decide a priori that we will reject H_0 is the p-value is < 0.05.
        
        The test statistic is
        
        T_0 = \frac{\bar X - \mu_0}{S / \sqrt{n}}
        
        Computations: Since \bar x = 0.83725, s = 0.02456, \mu = 0.82, and n = 15, our observed test statistic is
        
        t_0 = \frac{0.83725 - 0.82}{0.02456 / \sqrt{15}} = 2.72
      - Conclusions: The probability of observing such data (or more extreme data) if the null hypothesis is true is less than 0.008.
      - Interpretation: There is strong evidence (p = 0.008) to conclude that the mean coefficient of restitution exceeds 0.82.
        
        A CI would give an interval estimate as to what it actually is.
- Connection Between Hypothesis Tests & Confidence Intervals
  - A close relationship exists between the test of a hypothesis for \theta & the confidence interval for \theta.
    - If [I, u] is a 95% confidence interval for the parameter \theta, the test of the null hypothesis against a 2-sided alternative at the 0.05 significance level
      - H_0: \theta = \theta_0
      - H_1: \theta \neq \theta_0
      - will lead to rejection of H_0 if and only if \theta_0 is not in the 95% CI [I,u].
Decision Errors
- What is a type 1 error? #card card-last-interval:: -1 card-repeats:: 1 card-ease-factor:: 2.5 card-next-schedule:: 2022-11-15T00:00:00.000Z card-last-reviewed:: 2022-11-14T15:53:18.720Z card-last-score:: 1
  - A type 1 error is rejecting the null hypothesis when H_0 is true.
  - Type 1 Error Rate
    - As a general rule, we reject H_0 if the p-value is less than 0.05, i.e., we use a significance level of 0.05, \alpha = 0.05.
      - This means that, for those cases where H_0 is actually true, we do not want to incorrectly reject it more than 5% of the times.
      - In other words, when using a 5% significance level, there is about a 5% chance of making a Type 1 Error if the null hypothesis is true.
    - P(\text{Type 1 Error}) = \alpha
    - P(\text{Reject }H_0 | H_0 \text{ true}) = \alpha
      - This is why we prefer small values of \alpha - increasing \alpha increases the Type 1 Error rate.
- What is a type 2 error? #card card-last-interval:: 2.8 card-repeats:: 1 card-ease-factor:: 2.6 card-next-schedule:: 2022-11-17T10:54:03.881Z card-last-reviewed:: 2022-11-14T15:54:03.881Z card-last-score:: 5
  - A type 2 error is failing to reject the null hypothesis when H_A is true.

10 KiB Raw Blame History

Null & Alternative Hypotheses #card

Stages in Hypothesis Testing #card

Rejecting / Not Rejecting the Null

Formal Testing Using p-values

One-Sample Tests for the Population Mean

Steps #card

Strategy #card

Conditions

p-values & (\alpha) Significance Levels #card

Statistical Signifance

Example: Golf Club Design

Connection Between Hypothesis Tests & Confidence Intervals

Decision Errors

Type 1 Error Rate

10 KiB

Raw Blame History

p-values & (`\alpha`) Significance Levels #card