- #ST2001 - Statistics in Data Science I
- Previous Topic: Exploratory Data Analysis
- Next Topic: Probability
- Relevant Slides:
- What is a Parameter? #card
card-last-interval:: -1
card-repeats:: 1
card-ease-factor:: 2.6
card-next-schedule:: 2022-11-23T00:00:00.000Z
card-last-reviewed:: 2022-11-22T13:40:14.581Z
card-last-score:: 1
- A parameter is a single value summarising some feature or variable of interest in the population.
- It is usually unknown.
- What is inference? #card
card-last-interval:: 33.64
card-repeats:: 4
card-ease-factor:: 2.9
card-next-schedule:: 2022-11-22T23:24:34.466Z
card-last-reviewed:: 2022-10-20T08:24:34.467Z
card-last-score:: 5
- Inference is the process of making decisions about a population based on information in a sample.
- A consequence of natural variation is that two samples drawn from the same population will usually give different estimates of the population parameters.
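- The effect of natural variation can be sketched with a small simulation. This is only an illustration: the population below is made up (normally distributed with mean 50 and standard deviation 10), and `sample_mean` is a hypothetical helper, not something from the module.

```python
import random
import statistics

# Hypothetical population (an assumption for illustration only).
rng = random.Random(42)
population = [rng.gauss(50, 10) for _ in range(10_000)]

def sample_mean(pop, n, rng):
    """Estimate the population mean from a simple random sample of size n."""
    return statistics.mean(rng.sample(pop, n))

# Two samples drawn from the same population.
estimate_a = sample_mean(population, 30, rng)
estimate_b = sample_mean(population, 30, rng)

# The parameter is known here only because we simulated the whole population;
# in practice it is usually unknown.
parameter = statistics.mean(population)

print(f"parameter = {parameter:.2f}")
print(f"sample A  = {estimate_a:.2f}")
print(f"sample B  = {estimate_b:.2f}")
```

- Both estimates should land near the parameter, but they will generally differ from it and from each other.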
-
Sampling
collapsed:: true
- What is non-probabilistic sampling? #card
card-last-interval:: 29.04
card-repeats:: 4
card-ease-factor:: 2.56
card-next-schedule:: 2022-12-13T20:02:44.059Z
card-last-reviewed:: 2022-11-14T20:02:44.059Z
card-last-score:: 5
- Non-probabilistic sampling methods are techniques for obtaining a sample that is not chosen at random and may therefore be subject to sampling bias.
-
Simple Random Sample
-
Difficulties:
- Obtaining a sampling frame (list of all experimental units).
- Possibly time consuming / expensive.
- Minority groups, by chance, may not be represented in the sample.
-
-
Stratified Random Sampling #card
card-last-interval:: -1
card-repeats:: 1
card-ease-factor:: 2.36
card-next-schedule:: 2022-11-18T00:00:00.000Z
card-last-reviewed:: 2022-11-17T20:19:40.391Z
card-last-score:: 1
- Split entire population into homogeneous groups, called strata.
- Take a Simple Random Sample from each stratum.
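- The two steps above can be sketched in Python. This is a minimal sketch under stated assumptions: `stratified_sample`, the `stratum_of` callback, and the undergrad/postgrad population are all hypothetical, and an equal sample size per stratum is just one possible allocation.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, n_per_stratum, seed=0):
    """Split units into strata, then take a simple random sample from each."""
    rng = random.Random(seed)

    # Step 1: allocate every unit to exactly one stratum.
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)

    # Step 2: simple random sample (without replacement) within each stratum.
    sample = {}
    for name, members in strata.items():
        k = min(n_per_stratum, len(members))
        sample[name] = rng.sample(members, k)
    return sample

# Hypothetical population with a small minority stratum (5% postgrads).
population = [(i, "undergrad") for i in range(950)]
population += [(i, "postgrad") for i in range(950, 1000)]

sample = stratified_sample(population, stratum_of=lambda u: u[1], n_per_stratum=10)
for name, members in sample.items():
    print(name, len(members))
```

- Every stratum appears in the sample by construction, which is not guaranteed under a Simple Random Sample of the same total size.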
-
Stratified VS Simple Random Sample
- Ensure representation from minority groups.
- Estimates of the population parameters per stratum may be of interest.
- Possible reduction in cost per observation in the survey.
- Increased accuracy through reduced sampling error (less variation within a stratum).
-
Difficulties
- Can you correctly allocate each individual to one & only one stratum?
- Should every group receive equal weight?
- What if some strata are more varied than others?
- Take into account mean, variance, and cost to get "optimal allocation".
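- One standard textbook form of optimal allocation is given below as a sketch; the notes themselves do not state a formula, so all symbols are assumptions: $n$ is the total sample size, $N_h$ the size of stratum $h$, $\sigma_h$ its standard deviation, and $c_h$ the cost per observation in that stratum.

$$
n_h = n \cdot \frac{N_h \sigma_h / \sqrt{c_h}}{\sum_{k} N_k \sigma_k / \sqrt{c_k}}
$$

- Under this form, larger, more variable, and cheaper strata receive more of the sample. With equal costs it reduces to allocation proportional to $N_h \sigma_h$.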
-
-
Cluster Sampling #card
card-last-interval:: -1
card-repeats:: 1
card-ease-factor:: 2.36
card-next-schedule:: 2022-11-18T00:00:00.000Z
card-last-reviewed:: 2022-11-17T20:18:48.138Z
card-last-score:: 1
- Instead of randomly choosing individuals, a Simple Random Sample of collections or groups of individuals is taken.
- The population is broken up into regions or groups, usually a natural partition, called a cluster.
- Ideally, clusters are internally heterogeneous and similar to one another (homogeneous between clusters).
- Each cluster is assumed to be representative of the entire population.
- A small number of clusters is selected at random.
- Every individual within a cluster is observed.
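- The procedure above can be sketched as follows. This is an illustrative sketch only: `cluster_sample` and the school data are hypothetical, and it implements the one-stage version described here, where every individual in a chosen cluster is observed.

```python
import random

def cluster_sample(clusters, n_clusters, seed=0):
    """Pick a few clusters at random and observe every individual in them."""
    rng = random.Random(seed)
    # Simple random sample of cluster names, not of individuals.
    chosen = rng.sample(sorted(clusters), n_clusters)
    # Observe the whole of each selected cluster.
    return {name: list(clusters[name]) for name in chosen}

# Hypothetical natural partition: 8 schools with 20 students each.
clusters = {f"school_{i}": [f"s{i}_{j}" for j in range(20)] for i in range(8)}

sample = cluster_sample(clusters, n_clusters=2)
for name, members in sample.items():
    print(name, len(members))
```

- Only a list of clusters is needed to draw the sample, not a list of every individual in the population.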
-
Cluster Over Stratified
- Sampling frame not necessarily needed.
- May be more practical and / or economical than Simple or Stratified Random Sampling.
- Will be biased if the entire cluster is not sampled.
- Be careful if clusters are homogeneous internally and heterogeneous between one another, as this can increase sampling error.
- Note: In stratified sampling, all strata are sampled, while in cluster sampling only some clusters are sampled.
-
Studies & Experiments
-
Observational Studies & Experiments #card
card-last-interval:: 4.14
card-repeats:: 2
card-ease-factor:: 2.56
card-next-schedule:: 2022-11-27T15:18:34.757Z
card-last-reviewed:: 2022-11-23T12:18:34.758Z
card-last-score:: 5
- In an Observational Study, data is collected only by observing what occurs.
- E.g., surveys, historical records.
- When researchers want to investigate causal relationships, it's best to conduct an experiment.
- Usually there will be both an explanatory variable & a response variable.
- Be wary of confounding variables.
-
Designed (Comparative) Study
- A well-designed experiment allows us to establish a cause-and-effect relationship.
- The experimenter must identify:
- at least one explanatory variable, called a factor, to manipulate.
- at least one response variable to measure.
- The experimenter must also control any other nuisance factors that could influence the response.
-
e.g., weather, day of the week.
-
-