  • #ST2001 - Statistics in Data Science I
  • Previous Topic: Exploratory Data Analysis
  • Next Topic: Probability
  • Relevant Slides: Topic 3 - Sampling.pdf
  • What is a Parameter? #card card-last-interval:: -1 card-repeats:: 1 card-ease-factor:: 2.6 card-next-schedule:: 2022-11-23T00:00:00.000Z card-last-reviewed:: 2022-11-22T13:40:14.581Z card-last-score:: 1
    • A parameter is a single value summarising some feature or variable of interest in the population.
    • It is usually unknown.
  • What is inference? #card card-last-interval:: 33.64 card-repeats:: 4 card-ease-factor:: 2.9 card-next-schedule:: 2022-11-22T23:24:34.466Z card-last-reviewed:: 2022-10-20T08:24:34.467Z card-last-score:: 5
    • Inference is the process of making decisions about a population based on information in a sample.
  • A consequence of natural variation is that two samples drawn from the same population will usually give different estimates of the population parameters.
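  • A minimal sketch of this sampling variation, assuming a made-up population of 10,000 normally distributed values (the population, seed, and sample size of 30 are illustrative choices, not from the slides). Two samples from the same population give two different estimates of the same parameter:

```python
import random
import statistics


def sample_mean(population, n, rng):
    """Mean of a simple random sample of size n, drawn without replacement."""
    return statistics.mean(rng.sample(population, n))


rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population (an assumption for illustration only).
population = [rng.gauss(50, 10) for _ in range(10_000)]
mu = statistics.mean(population)  # the parameter, normally unknown in practice

estimate_a = sample_mean(population, 30, rng)
estimate_b = sample_mean(population, 30, rng)

print(f"parameter={mu:.2f}  sample A={estimate_a:.2f}  sample B={estimate_b:.2f}")
```

  • Both estimates land near the parameter, but neither equals it and they differ from each other, which is why inference has to account for sampling error.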
  • Sampling

    collapsed:: true
    • What is non-probabilistic sampling? #card card-last-interval:: 29.04 card-repeats:: 4 card-ease-factor:: 2.56 card-next-schedule:: 2022-12-13T20:02:44.059Z card-last-reviewed:: 2022-11-14T20:02:44.059Z card-last-score:: 5
      • Non-probabilistic sampling methods are techniques for obtaining a sample that is not chosen at random and may therefore be subject to sampling bias.
    • Simple Random Sample

      • Difficulties:

        • Obtaining a sampling frame (list of all experimental units).
        • Possibly time consuming / expensive.
        • Minority groups, by chance, may not be represented in the sample.
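      • A rough sketch of a Simple Random Sample and of the last difficulty above, assuming a hypothetical sampling frame of 1,000 units in which 2% belong to a minority group (the frame, group labels, and sample size of 25 are invented for illustration):

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n units from the sampling frame without replacement,
    so that every unit is equally likely to be chosen."""
    return random.Random(seed).sample(frame, n)


# Hypothetical sampling frame: (unit id, group) pairs, 20 of 1,000 in the minority.
frame = [(f"unit{i}", "minority" if i < 20 else "majority") for i in range(1000)]

one_sample = simple_random_sample(frame, 25, seed=0)

# Repeat the sampling 1,000 times and count samples with no minority units at all.
missed = sum(
    all(group == "majority" for _, group in simple_random_sample(frame, 25, seed=s))
    for s in range(1000)
)
print(f"{missed / 1000:.0%} of samples contained no minority units")
```

      • Under these assumptions, well over half of the samples miss the minority group entirely, which is the motivation for stratifying.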
    • Stratified Random Sampling #card

      card-last-interval:: -1 card-repeats:: 1 card-ease-factor:: 2.36 card-next-schedule:: 2022-11-18T00:00:00.000Z card-last-reviewed:: 2022-11-17T20:19:40.391Z card-last-score:: 1
        1. Split entire population into homogeneous groups, called strata.
        2. Take a Simple Random Sample from each stratum.
      • Stratified VS Simple Random Sample

        • Ensure representation from minority groups.
        • Estimates of the population parameters per stratum may be of interest.
        • Possible reduction in cost per observation in the survey.
        • Increased precision through reduced sampling error (less variation within a stratum).
        • Difficulties

          • Can you correctly allocate each individual to one & only one stratum?
          • Should every group receive equal weight?
          • What if some strata are more varied than others?
          • Take into account mean, variance, and cost to get "optimal allocation".
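      • The two steps of Stratified Random Sampling can be sketched as follows. This assumes proportional allocation (the same sampling fraction in every stratum), which is only one possible answer to the weighting questions above and is not the "optimal allocation" mentioned there; the function name `stratified_sample` and the data are hypothetical:

```python
import random
from collections import defaultdict


def stratified_sample(units, stratum_of, fraction, seed=None):
    """Split units into strata, then take a simple random sample from each.

    Uses proportional allocation (an assumption): each stratum contributes
    the same fraction of its members, with at least one unit per stratum.
    """
    rng = random.Random(seed)

    # Step 1: allocate every unit to one and only one stratum.
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)

    # Step 2: simple random sample within each stratum.
    sample = {}
    for name, members in strata.items():
        k = max(1, round(fraction * len(members)))
        sample[name] = rng.sample(members, k)
    return sample


# Hypothetical population: 20 minority units and 980 majority units.
units = [(i, "minority" if i < 20 else "majority") for i in range(1000)]
sample = stratified_sample(units, stratum_of=lambda u: u[1], fraction=0.05, seed=1)

print({name: len(chosen) for name, chosen in sample.items()})
```

      • With a 5% fraction this selects 1 minority unit and 49 majority units, so the minority group is always represented, unlike in a Simple Random Sample of the same size.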
    • Cluster Sampling #card

      card-last-interval:: -1 card-repeats:: 1 card-ease-factor:: 2.36 card-next-schedule:: 2022-11-18T00:00:00.000Z card-last-reviewed:: 2022-11-17T20:18:48.138Z card-last-score:: 1
      • Instead of randomly choosing individuals, a Simple Random Sample of collections or groups of individuals is taken.
      • The population is broken up into regions or groups, usually a natural partition, called clusters.
        • Ideally, each cluster is internally heterogeneous and the clusters are homogeneous with respect to one another.
      • Each cluster is assumed to be representative of the entire population.
      • A small number of clusters is selected at random.
      • Every individual within a cluster is observed.
      • Cluster Over Stratified

        • Sampling frame not necessarily needed.
        • May be more practical and / or economical than Simple or Stratified Random Sampling.
        • Estimates will be biased if the entire cluster is not sampled.
        • Be careful if there is homogeneity within clusters and heterogeneity between clusters, as this can increase sampling error.
        • Note: In stratified sampling, all strata are sampled, while in cluster sampling only some clusters are sampled.
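      • A minimal sketch of Cluster Sampling as described above: take a Simple Random Sample of clusters, then observe every individual in the chosen clusters. The school and pupil names are hypothetical, as is the function name `cluster_sample`:

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Select n_clusters clusters at random and keep every member of each.

    `clusters` maps a cluster name to the list of individuals in it.
    Only a list of clusters is needed, not a frame of all individuals.
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in chosen}


# Hypothetical natural partition: 12 schools with 30 pupils each.
clusters = {
    f"school_{c}": [f"school_{c}_pupil_{i}" for i in range(30)]
    for c in range(12)
}

sample = cluster_sample(clusters, n_clusters=3, seed=7)
total_observed = sum(len(members) for members in sample.values())
print(sorted(sample), total_observed)
```

      • Only 3 of the 12 clusters are observed, in contrast with stratified sampling, where every stratum contributes to the sample.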
  • Studies & Experiments

    • Observational Studies & Experiments #card

      card-last-interval:: 4.14 card-repeats:: 2 card-ease-factor:: 2.56 card-next-schedule:: 2022-11-27T15:18:34.757Z card-last-reviewed:: 2022-11-23T12:18:34.758Z card-last-score:: 5
      • In an Observational Study, data is collected only by observing what occurs.
        • E.g., surveys, historical records.
      • When researchers want to investigate causal relationships, it's best to conduct an experiment.
        • Usually there will be both an explanatory variable & a response variable.
        • Be wary of confounding variables.
    • Designed (Comparative) Study

      • A well-designed experiment allows us to establish a cause-and-effect relationship.
      • The experimenter must identify:
        • at least one explanatory variable, called a factor, to manipulate.
        • at least one response variable to measure.
      • The experimenter must also control any other nuisance factors that could influence the response.
        • e.g., weather, day of the week.