- #[[ST2001 - Statistics in Data Science I]]
- **Previous Topic:** [[Exploratory Data Analysis]]
- **Next Topic:** [[Probability]]
- **Relevant Slides:** ![Topic 3 - Sampling.pdf](../assets/Topic_3_-_Sampling_1663599787566_0.pdf)
-
- What is a **Parameter**? #card
  card-last-interval:: -1
  card-repeats:: 1
  card-ease-factor:: 2.6
  card-next-schedule:: 2022-11-23T00:00:00.000Z
  card-last-reviewed:: 2022-11-22T13:40:14.581Z
  card-last-score:: 1
  - A **parameter** is a single value summarising some feature or variable of interest in the population.
  - It is usually unknown.
- What is **inference**? #card
  card-last-interval:: 33.64
  card-repeats:: 4
  card-ease-factor:: 2.9
  card-next-schedule:: 2022-11-22T23:24:34.466Z
  card-last-reviewed:: 2022-10-20T08:24:34.467Z
  card-last-score:: 5
  - **Inference** is the process of making decisions about a population based on information in a sample.
  - A consequence of **natural variation** is that two samples drawn from the same population will usually give different estimates of the population parameters.
-
- # Sampling
  collapsed:: true
  - What is **non-probabilistic sampling**? #card
    card-last-interval:: 29.04
    card-repeats:: 4
    card-ease-factor:: 2.56
    card-next-schedule:: 2022-12-13T20:02:44.059Z
    card-last-reviewed:: 2022-11-14T20:02:44.059Z
    card-last-score:: 5
    - **Non-probabilistic sampling** methods are techniques for obtaining a sample that is not chosen at random and may be subject to **sampling bias**.
  - ## Simple Random Sample
    - ### Difficulties:
      - Obtaining a sampling frame (a list of all experimental units).
      - Possibly time consuming / expensive.
      - Minority groups, by chance, may not be represented in the sample.
  - ## Stratified Random Sampling #card
    card-last-interval:: -1
    card-repeats:: 1
    card-ease-factor:: 2.36
    card-next-schedule:: 2022-11-18T00:00:00.000Z
    card-last-reviewed:: 2022-11-17T20:19:40.391Z
    card-last-score:: 1
    - 1. Split the entire population into **homogeneous groups**, called **strata**.
      2. 
         Take a Simple Random Sample from each stratum.
    - ### Stratified VS Simple Random Sample
      - Ensures representation from minority groups.
      - Estimates of the population parameters per stratum may be of interest.
      - Possible reduction in cost per observation in the survey.
      - Increased accuracy due to reduced sampling error (less variation within a stratum).
    -
    - ### Difficulties
      - Can you correctly allocate each individual to one & only one stratum?
      - Should every group receive equal weight?
      - What if some strata are more varied than others?
      - Take into account mean, variance, and cost to get "optimal allocation".
  - ## Cluster Sampling #card
    card-last-interval:: -1
    card-repeats:: 1
    card-ease-factor:: 2.36
    card-next-schedule:: 2022-11-18T00:00:00.000Z
    card-last-reviewed:: 2022-11-17T20:18:48.138Z
    card-last-score:: 1
    - Instead of randomly choosing individuals, a Simple Random Sample of collections or groups of individuals is taken.
    - The population is broken up into regions or groups, usually a *natural partition*, called **clusters**.
    - Ideally, each cluster is internally heterogeneous and the clusters are similar to one another.
    - Each cluster is assumed to be representative of the entire population.
    - A small number of clusters is selected at random.
    - Every individual within a selected cluster is observed.
    -
    - ### Cluster Over Stratified
      - Sampling frame not necessarily needed.
      - May be more practical and / or economical than Simple or Stratified Random Sampling.
      - Will be biased if the entire cluster is not sampled.
      - Be careful if clusters are homogeneous internally and heterogeneous between one another, as this can increase sampling error.
  - **Note:** In stratified sampling, all strata are sampled, while in cluster sampling only some clusters are sampled.
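- The three sampling schemes above can be compared with a minimal sketch, assuming a made-up population of 1000 students. The field names (`year` as the stratum, `class` as the cluster, `score` as the variable of interest), the sample sizes, and the helper function names are all hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict
from statistics import mean


def simple_random_sample(population, n, rng):
    """Draw n units without replacement; every unit is equally likely."""
    return rng.sample(population, n)


def stratified_sample(population, stratum_of, n_per_stratum, rng):
    """Split the population into strata, then take an SRS from each stratum."""
    strata = defaultdict(list)
    for unit in population:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, min(n_per_stratum, len(members))))
    return sample


def cluster_sample(population, cluster_of, n_clusters, rng):
    """Take an SRS of whole clusters, then observe every unit in them."""
    clusters = defaultdict(list)
    for unit in population:
        clusters[cluster_of(unit)].append(unit)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for c in chosen for unit in clusters[c]]


# Hypothetical population: 4 year groups (strata) and 50 classes (clusters).
rng = random.Random(42)
population = [
    {"id": i, "year": i % 4 + 1, "class": i % 50, "score": rng.gauss(60, 10)}
    for i in range(1000)
]

srs_a = simple_random_sample(population, 40, rng)
srs_b = simple_random_sample(population, 40, rng)
strat = stratified_sample(population, lambda u: u["year"], 10, rng)
clust = cluster_sample(population, lambda u: u["class"], 2, rng)

# Natural variation: two samples from the same population will usually
# give different estimates of the population mean.
print(mean(u["score"] for u in srs_a), mean(u["score"] for u in srs_b))
print(len(strat), len(clust))
```

- In this sketch every stratum contributes to the stratified sample, while the cluster sample contains only the units of the two selected classes, which mirrors the note above. Equal allocation per stratum is an assumption made here; it is not the "optimal allocation" mentioned earlier.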
- # Studies & Experiments
  - ## Observational Studies & Experiments #card
    card-last-interval:: 4.14
    card-repeats:: 2
    card-ease-factor:: 2.56
    card-next-schedule:: 2022-11-27T15:18:34.757Z
    card-last-reviewed:: 2022-11-23T12:18:34.758Z
    card-last-score:: 5
    - In an **Observational Study**, data is collected only by *observing* what occurs.
      - E.g., surveys, historical records.
    - When researchers want to investigate **causal relationships**, it's best to conduct an experiment.
      - Usually there will be both an explanatory variable & a response variable.
      - Be wary of confounding variables.
  - ## Designed (Comparative) Study
    - An experiment allows us to establish a cause-and-effect relationship.
    - The experimenter must identify:
      - at least one **explanatory variable**, called a **factor**, to manipulate.
      - at least one **response** variable to measure.
    - The experimenter must also control any other **nuisance factors** that could influence the response.
      - e.g., weather, day of the week.
-