Add second year

2023-12-07 01:19:12 +00:00
parent 3291e5c79e
commit 3d12031ab8
1168 changed files with 431409 additions and 0 deletions
--- a/second/semester1/logseq-stuff/logseq/bak/pages/Sampling/2022-09-30T10_01_42.381Z.Desktop.md
+++ b/second/semester1/logseq-stuff/logseq/bak/pages/Sampling/2022-09-30T10_01_42.381Z.Desktop.md
@ -0,0 +1,100 @@
+- #[[ST2001 - Statistics in Data Science I]]
+- **Previous Topic:** [[Exploratory Data Analysis]]
+- **Next Topic:** [[Probability]]
+- **Relevant Slides:** ![Topic 3 - Sampling.pdf](../assets/Topic_3_-_Sampling_1663599787566_0.pdf)
+-
+- What is a **Parameter**? #card
+  card-last-interval:: -1
+  card-repeats:: 1
+  card-ease-factor:: 2.5
+  card-next-schedule:: 2022-09-19T23:00:00.000Z
+  card-last-reviewed:: 2022-09-19T18:03:31.423Z
+  card-last-score:: 1
+	- A **parameter** is a single value summarising some feature or variable of interest in the population.
+	- It is usually unknown.
+- What is **inference**? #card
+  card-last-interval:: 4.86
+  card-repeats:: 1
+  card-ease-factor:: 2.6
+  card-next-schedule:: 2022-09-24T13:58:35.788Z
+  card-last-reviewed:: 2022-09-19T17:58:35.789Z
+  card-last-score:: 5
+	- **Inference** is the process of making decisions about a population based on information in a sample.
+- A consequence of **natural variation** is that two samples drawn form the same population will usually give different estimates of the population parameters.
+-
+- # Sampling
+  collapsed:: true
+	- What is **non-probabilistic sampling**? #card
+	  card-last-interval:: -1
+	  card-repeats:: 1
+	  card-ease-factor:: 2.5
+	  card-next-schedule:: 2022-09-19T23:00:00.000Z
+	  card-last-reviewed:: 2022-09-19T17:51:31.085Z
+	  card-last-score:: 1
+		- **Non-probabilistic sampling** methods are techniques of obtaining a sample that is not chosen at random and may be subject to **sampling bias**.
+	- ## Simple Random Sample
+		- ### Difficulties:
+			- Obtaining a sampling frame (list of all experimental units).
+			- Possibly time consuming / expensive.
+			- Minority groups, by chance, may not be represented in the sample.
+	- ## Stratified Random Sampling #card
+	  card-last-interval:: -1
+	  card-repeats:: 1
+	  card-ease-factor:: 2.5
+	  card-next-schedule:: 2022-09-26T23:00:00.000Z
+	  card-last-reviewed:: 2022-09-26T12:16:41.996Z
+	  card-last-score:: 1
+		- 1. Split entire population into **homogeneous groups**, called **strata**.
+		  2. Take a Simple Random Sample from each stratum.
+		- ### Stratified VS Simple Random Sample
+			- Ensure representation from minority groups.
+			- Estimates of the population parameters per strata may be of interest.
+			- Possible reduction in cost per observation in the survey.
+			- Increased accuracy as reduced sampling error (less variation within a stratum).
+			-
+			- ### Difficulties
+				- Can you correctly allocate each individual to one & only one stratum?
+				- Should every group receive equal weight?
+				- What if some strata are more varied than others?
+				- Take into account mean, variance, and cost to get "optimal allocation".
+	- ## Cluster Sampling #card
+	  card-last-interval:: -1
+	  card-repeats:: 1
+	  card-ease-factor:: 2.5
+	  card-next-schedule:: 2022-09-26T23:00:00.000Z
+	  card-last-reviewed:: 2022-09-26T12:16:26.204Z
+	  card-last-score:: 1
+		- Instead of randomly choosing individuals, a Simple Random Sample of collection or groups of individuals is taken.
+		- The population is broken up into regions or groups, usually a *natural partition*, called a **cluster**.
+			- Internally heterogeneous, homogeneous between the clusters.
+		- Clusters are assumed representation of the entire population.
+		- Small number of clusters are selected at random.
+		- Every individual within a cluster is observed.
+		-
+		- ### Cluster Over Stratified
+			- Sampling frame not necessarily needed.
+			- May be more practical and / or economical than Simple or Stratified Random Sampling.
+			- Will be biased if the entire cluster is not sampled.
+			- Careful if homogeneity within a cluster and heterogeneity between clusters as this can increase sample error.
+			- **Note:** In stratified sampling, all strata are sampled, while in cluster sampling only some clusters are sampled.
+- # Studies & Experiments
+	- ## Observational Studies & Experiments #card
+	  card-last-interval:: -1
+	  card-repeats:: 1
+	  card-ease-factor:: 2.5
+	  card-next-schedule:: 2022-09-26T23:00:00.000Z
+	  card-last-reviewed:: 2022-09-26T12:12:25.456Z
+	  card-last-score:: 1
+		- In an **Observational Study**, data is collected only be *observing* what occurs.
+			- E.g., surveys, historical records.
+		- When researchers want to investigate **causal relationships**, it's best to conduct an experiment.
+			- Usually there will be both an explanatory variable & a response variable.
+			- Be wary of confounding variables.
+	- ## Designed (Comparative) Study
+		- An experiment allows us to prove a cause-and-effect relationship.
+		- The experimenter must identify:
+			- at least one **explanatory variable**, called a **factor** to manipulate.
+			- at least one **response** variable to measure.
+		- The experimenter must also control any other **nuisance factors** that could influence the response.
+			- e.g., weather, day of the week.
+				-
--- a/second/semester1/logseq-stuff/logseq/bak/pages/Sampling/2022-10-07T18_07_00.028Z.Desktop.md
+++ b/second/semester1/logseq-stuff/logseq/bak/pages/Sampling/2022-10-07T18_07_00.028Z.Desktop.md
@ -0,0 +1,100 @@
+- #[[ST2001 - Statistics in Data Science I]]
+- **Previous Topic:** [[Exploratory Data Analysis]]
+- **Next Topic:** [[Probability]]
+- **Relevant Slides:** ![Topic 3 - Sampling.pdf](../assets/Topic_3_-_Sampling_1663599787566_0.pdf)
+-
+- What is a **Parameter**? #card
+  card-last-interval:: -1
+  card-repeats:: 1
+  card-ease-factor:: 2.5
+  card-next-schedule:: 2022-10-07T23:00:00.000Z
+  card-last-reviewed:: 2022-10-07T10:42:17.374Z
+  card-last-score:: 1
+	- A **parameter** is a single value summarising some feature or variable of interest in the population.
+	- It is usually unknown.
+- What is **inference**? #card
+  card-last-interval:: 4
+  card-repeats:: 2
+  card-ease-factor:: 2.7
+  card-next-schedule:: 2022-10-07T14:31:27.689Z
+  card-last-reviewed:: 2022-10-03T14:31:27.690Z
+  card-last-score:: 5
+	- **Inference** is the process of making decisions about a population based on information in a sample.
+- A consequence of **natural variation** is that two samples drawn form the same population will usually give different estimates of the population parameters.
+-
+- # Sampling
+  collapsed:: true
+	- What is **non-probabilistic sampling**? #card
+	  card-last-interval:: 9.76
+	  card-repeats:: 3
+	  card-ease-factor:: 2.46
+	  card-next-schedule:: 2022-10-17T04:35:48.922Z
+	  card-last-reviewed:: 2022-10-07T10:35:48.922Z
+	  card-last-score:: 5
+		- **Non-probabilistic sampling** methods are techniques of obtaining a sample that is not chosen at random and may be subject to **sampling bias**.
+	- ## Simple Random Sample
+		- ### Difficulties:
+			- Obtaining a sampling frame (list of all experimental units).
+			- Possibly time consuming / expensive.
+			- Minority groups, by chance, may not be represented in the sample.
+	- ## Stratified Random Sampling #card
+	  card-last-interval:: 4
+	  card-repeats:: 2
+	  card-ease-factor:: 2.36
+	  card-next-schedule:: 2022-10-10T09:40:03.399Z
+	  card-last-reviewed:: 2022-10-06T09:40:03.400Z
+	  card-last-score:: 3
+		- 1. Split entire population into **homogeneous groups**, called **strata**.
+		  2. Take a Simple Random Sample from each stratum.
+		- ### Stratified VS Simple Random Sample
+			- Ensure representation from minority groups.
+			- Estimates of the population parameters per strata may be of interest.
+			- Possible reduction in cost per observation in the survey.
+			- Increased accuracy as reduced sampling error (less variation within a stratum).
+			-
+			- ### Difficulties
+				- Can you correctly allocate each individual to one & only one stratum?
+				- Should every group receive equal weight?
+				- What if some strata are more varied than others?
+				- Take into account mean, variance, and cost to get "optimal allocation".
+	- ## Cluster Sampling #card
+	  card-last-interval:: 1.47
+	  card-repeats:: 2
+	  card-ease-factor:: 2.36
+	  card-next-schedule:: 2022-10-08T21:38:07.258Z
+	  card-last-reviewed:: 2022-10-07T10:38:07.258Z
+	  card-last-score:: 3
+		- Instead of randomly choosing individuals, a Simple Random Sample of collection or groups of individuals is taken.
+		- The population is broken up into regions or groups, usually a *natural partition*, called a **cluster**.
+			- Internally heterogeneous, homogeneous between the clusters.
+		- Clusters are assumed representation of the entire population.
+		- Small number of clusters are selected at random.
+		- Every individual within a cluster is observed.
+		-
+		- ### Cluster Over Stratified
+			- Sampling frame not necessarily needed.
+			- May be more practical and / or economical than Simple or Stratified Random Sampling.
+			- Will be biased if the entire cluster is not sampled.
+			- Careful if homogeneity within a cluster and heterogeneity between clusters as this can increase sample error.
+			- **Note:** In stratified sampling, all strata are sampled, while in cluster sampling only some clusters are sampled.
+- # Studies & Experiments
+	- ## Observational Studies & Experiments #card
+	  card-last-interval:: -1
+	  card-repeats:: 1
+	  card-ease-factor:: 2.6
+	  card-next-schedule:: 2022-10-07T23:00:00.000Z
+	  card-last-reviewed:: 2022-10-07T10:38:47.387Z
+	  card-last-score:: 1
+		- In an **Observational Study**, data is collected only be *observing* what occurs.
+			- E.g., surveys, historical records.
+		- When researchers want to investigate **causal relationships**, it's best to conduct an experiment.
+			- Usually there will be both an explanatory variable & a response variable.
+			- Be wary of confounding variables.
+	- ## Designed (Comparative) Study
+		- An experiment allows us to prove a cause-and-effect relationship.
+		- The experimenter must identify:
+			- at least one **explanatory variable**, called a **factor** to manipulate.
+			- at least one **response** variable to measure.
+		- The experimenter must also control any other **nuisance factors** that could influence the response.
+			- e.g., weather, day of the week.
+				-
--- a/second/semester1/logseq-stuff/logseq/bak/pages/Sampling/2022-10-10T11_59_52.472Z.Desktop.md
+++ b/second/semester1/logseq-stuff/logseq/bak/pages/Sampling/2022-10-10T11_59_52.472Z.Desktop.md
@ -0,0 +1,100 @@
+- #[[ST2001 - Statistics in Data Science I]]
+- **Previous Topic:** [[Exploratory Data Analysis]]
+- **Next Topic:** [[Probability]]
+- **Relevant Slides:** ![Topic 3 - Sampling.pdf](../assets/Topic_3_-_Sampling_1663599787566_0.pdf)
+-
+- What is a **Parameter**? #card
+  card-last-interval:: 2.6
+  card-repeats:: 2
+  card-ease-factor:: 2.6
+  card-next-schedule:: 2022-10-11T05:09:09.705Z
+  card-last-reviewed:: 2022-10-08T15:09:09.705Z
+  card-last-score:: 5
+	- A **parameter** is a single value summarising some feature or variable of interest in the population.
+	- It is usually unknown.
+- What is **inference**? #card
+  card-last-interval:: 11.2
+  card-repeats:: 3
+  card-ease-factor:: 2.8
+  card-next-schedule:: 2022-10-18T19:20:23.932Z
+  card-last-reviewed:: 2022-10-07T15:20:23.932Z
+  card-last-score:: 5
+	- **Inference** is the process of making decisions about a population based on information in a sample.
+- A consequence of **natural variation** is that two samples drawn form the same population will usually give different estimates of the population parameters.
+-
+- # Sampling
+  collapsed:: true
+	- What is **non-probabilistic sampling**? #card
+	  card-last-interval:: 9.76
+	  card-repeats:: 3
+	  card-ease-factor:: 2.46
+	  card-next-schedule:: 2022-10-17T04:35:48.922Z
+	  card-last-reviewed:: 2022-10-07T10:35:48.922Z
+	  card-last-score:: 5
+		- **Non-probabilistic sampling** methods are techniques of obtaining a sample that is not chosen at random and may be subject to **sampling bias**.
+	- ## Simple Random Sample
+		- ### Difficulties:
+			- Obtaining a sampling frame (list of all experimental units).
+			- Possibly time consuming / expensive.
+			- Minority groups, by chance, may not be represented in the sample.
+	- ## Stratified Random Sampling #card
+	  card-last-interval:: 4
+	  card-repeats:: 2
+	  card-ease-factor:: 2.36
+	  card-next-schedule:: 2022-10-10T09:40:03.399Z
+	  card-last-reviewed:: 2022-10-06T09:40:03.400Z
+	  card-last-score:: 3
+		- 1. Split entire population into **homogeneous groups**, called **strata**.
+		  2. Take a Simple Random Sample from each stratum.
+		- ### Stratified VS Simple Random Sample
+			- Ensure representation from minority groups.
+			- Estimates of the population parameters per strata may be of interest.
+			- Possible reduction in cost per observation in the survey.
+			- Increased accuracy as reduced sampling error (less variation within a stratum).
+			-
+			- ### Difficulties
+				- Can you correctly allocate each individual to one & only one stratum?
+				- Should every group receive equal weight?
+				- What if some strata are more varied than others?
+				- Take into account mean, variance, and cost to get "optimal allocation".
+	- ## Cluster Sampling #card
+	  card-last-interval:: -1
+	  card-repeats:: 1
+	  card-ease-factor:: 2.36
+	  card-next-schedule:: 2022-10-09T23:00:00.000Z
+	  card-last-reviewed:: 2022-10-09T08:52:46.030Z
+	  card-last-score:: 1
+		- Instead of randomly choosing individuals, a Simple Random Sample of collection or groups of individuals is taken.
+		- The population is broken up into regions or groups, usually a *natural partition*, called a **cluster**.
+			- Internally heterogeneous, homogeneous between the clusters.
+		- Clusters are assumed representation of the entire population.
+		- Small number of clusters are selected at random.
+		- Every individual within a cluster is observed.
+		-
+		- ### Cluster Over Stratified
+			- Sampling frame not necessarily needed.
+			- May be more practical and / or economical than Simple or Stratified Random Sampling.
+			- Will be biased if the entire cluster is not sampled.
+			- Careful if homogeneity within a cluster and heterogeneity between clusters as this can increase sample error.
+			- **Note:** In stratified sampling, all strata are sampled, while in cluster sampling only some clusters are sampled.
+- # Studies & Experiments
+	- ## Observational Studies & Experiments #card
+	  card-last-interval:: -1
+	  card-repeats:: 1
+	  card-ease-factor:: 2.6
+	  card-next-schedule:: 2022-10-08T23:00:00.000Z
+	  card-last-reviewed:: 2022-10-08T15:24:45.845Z
+	  card-last-score:: 1
+		- In an **Observational Study**, data is collected only be *observing* what occurs.
+			- E.g., surveys, historical records.
+		- When researchers want to investigate **causal relationships**, it's best to conduct an experiment.
+			- Usually there will be both an explanatory variable & a response variable.
+			- Be wary of confounding variables.
+	- ## Designed (Comparative) Study
+		- An experiment allows us to prove a cause-and-effect relationship.
+		- The experimenter must identify:
+			- at least one **explanatory variable**, called a **factor** to manipulate.
+			- at least one **response** variable to measure.
+		- The experimenter must also control any other **nuisance factors** that could influence the response.
+			- e.g., weather, day of the week.
+				-
--- a/second/semester1/logseq-stuff/logseq/bak/pages/Sampling/2022-10-20T09_05_58.001Z.Desktop.md
+++ b/second/semester1/logseq-stuff/logseq/bak/pages/Sampling/2022-10-20T09_05_58.001Z.Desktop.md
@ -0,0 +1,100 @@
+- #[[ST2001 - Statistics in Data Science I]]
+- **Previous Topic:** [[Exploratory Data Analysis]]
+- **Next Topic:** [[Probability]]
+- **Relevant Slides:** ![Topic 3 - Sampling.pdf](../assets/Topic_3_-_Sampling_1663599787566_0.pdf)
+-
+- What is a **Parameter**? #card
+  card-last-interval:: 2.6
+  card-repeats:: 2
+  card-ease-factor:: 2.6
+  card-next-schedule:: 2022-10-11T05:09:09.705Z
+  card-last-reviewed:: 2022-10-08T15:09:09.705Z
+  card-last-score:: 5
+	- A **parameter** is a single value summarising some feature or variable of interest in the population.
+	- It is usually unknown.
+- What is **inference**? #card
+  card-last-interval:: 11.2
+  card-repeats:: 3
+  card-ease-factor:: 2.8
+  card-next-schedule:: 2022-10-18T19:20:23.932Z
+  card-last-reviewed:: 2022-10-07T15:20:23.932Z
+  card-last-score:: 5
+	- **Inference** is the process of making decisions about a population based on information in a sample.
+- A consequence of **natural variation** is that two samples drawn form the same population will usually give different estimates of the population parameters.
+-
+- # Sampling
+  collapsed:: true
+	- What is **non-probabilistic sampling**? #card
+	  card-last-interval:: 9.76
+	  card-repeats:: 3
+	  card-ease-factor:: 2.46
+	  card-next-schedule:: 2022-10-17T04:35:48.922Z
+	  card-last-reviewed:: 2022-10-07T10:35:48.922Z
+	  card-last-score:: 5
+		- **Non-probabilistic sampling** methods are techniques of obtaining a sample that is not chosen at random and may be subject to **sampling bias**.
+	- ## Simple Random Sample
+		- ### Difficulties:
+			- Obtaining a sampling frame (list of all experimental units).
+			- Possibly time consuming / expensive.
+			- Minority groups, by chance, may not be represented in the sample.
+	- ## Stratified Random Sampling #card
+	  card-last-interval:: 4
+	  card-repeats:: 2
+	  card-ease-factor:: 2.36
+	  card-next-schedule:: 2022-10-10T09:40:03.399Z
+	  card-last-reviewed:: 2022-10-06T09:40:03.400Z
+	  card-last-score:: 3
+		- 1. Split entire population into **homogeneous groups**, called **strata**.
+		  2. Take a Simple Random Sample from each stratum.
+		- ### Stratified VS Simple Random Sample
+			- Ensure representation from minority groups.
+			- Estimates of the population parameters per strata may be of interest.
+			- Possible reduction in cost per observation in the survey.
+			- Increased accuracy as reduced sampling error (less variation within a stratum).
+			-
+			- ### Difficulties
+				- Can you correctly allocate each individual to one & only one stratum?
+				- Should every group receive equal weight?
+				- What if some strata are more varied than others?
+				- Take into account mean, variance, and cost to get "optimal allocation".
+	- ## Cluster Sampling #card
+	  card-last-interval:: -1
+	  card-repeats:: 1
+	  card-ease-factor:: 2.36
+	  card-next-schedule:: 2022-10-09T23:00:00.000Z
+	  card-last-reviewed:: 2022-10-09T08:52:46.030Z
+	  card-last-score:: 1
+		- Instead of randomly choosing individuals, a Simple Random Sample of collection or groups of individuals is taken.
+		- The population is broken up into regions or groups, usually a *natural partition*, called a **cluster**.
+			- Internally heterogeneous, homogeneous between the clusters.
+		- Clusters are assumed representation of the entire population.
+		- Small number of clusters are selected at random.
+		- Every individual within a cluster is observed.
+		-
+		- ### Cluster Over Stratified
+			- Sampling frame not necessarily needed.
+			- May be more practical and / or economical than Simple or Stratified Random Sampling.
+			- Will be biased if the entire cluster is not sampled.
+			- Careful if homogeneity within a cluster and heterogeneity between clusters as this can increase sample error.
+			- **Note:** In stratified sampling, all strata are sampled, while in cluster sampling only some clusters are sampled.
+- # Studies & Experiments
+	- ## Observational Studies & Experiments #card
+	  card-last-interval:: 3.45
+	  card-repeats:: 2
+	  card-ease-factor:: 2.46
+	  card-next-schedule:: 2022-10-13T21:37:27.708Z
+	  card-last-reviewed:: 2022-10-10T11:37:27.708Z
+	  card-last-score:: 3
+		- In an **Observational Study**, data is collected only be *observing* what occurs.
+			- E.g., surveys, historical records.
+		- When researchers want to investigate **causal relationships**, it's best to conduct an experiment.
+			- Usually there will be both an explanatory variable & a response variable.
+			- Be wary of confounding variables.
+	- ## Designed (Comparative) Study
+		- An experiment allows us to prove a cause-and-effect relationship.
+		- The experimenter must identify:
+			- at least one **explanatory variable**, called a **factor** to manipulate.
+			- at least one **response** variable to measure.
+		- The experimenter must also control any other **nuisance factors** that could influence the response.
+			- e.g., weather, day of the week.
+				-
--- a/second/semester1/logseq-stuff/logseq/bak/pages/Sampling/2022-11-23T09_22_44.508Z.Desktop.md
+++ b/second/semester1/logseq-stuff/logseq/bak/pages/Sampling/2022-11-23T09_22_44.508Z.Desktop.md
@ -0,0 +1,100 @@
+- #[[ST2001 - Statistics in Data Science I]]
+- **Previous Topic:** [[Exploratory Data Analysis]]
+- **Next Topic:** [[Probability]]
+- **Relevant Slides:** ![Topic 3 - Sampling.pdf](../assets/Topic_3_-_Sampling_1663599787566_0.pdf)
+-
+- What is a **Parameter**? #card
+  card-last-interval:: -1
+  card-repeats:: 1
+  card-ease-factor:: 2.6
+  card-next-schedule:: 2022-11-15T00:00:00.000Z
+  card-last-reviewed:: 2022-11-14T16:30:17.122Z
+  card-last-score:: 1
+	- A **parameter** is a single value summarising some feature or variable of interest in the population.
+	- It is usually unknown.
+- What is **inference**? #card
+  card-last-interval:: 33.64
+  card-repeats:: 4
+  card-ease-factor:: 2.9
+  card-next-schedule:: 2022-11-22T23:24:34.466Z
+  card-last-reviewed:: 2022-10-20T08:24:34.467Z
+  card-last-score:: 5
+	- **Inference** is the process of making decisions about a population based on information in a sample.
+- A consequence of **natural variation** is that two samples drawn form the same population will usually give different estimates of the population parameters.
+-
+- # Sampling
+  collapsed:: true
+	- What is **non-probabilistic sampling**? #card
+	  card-last-interval:: 29.04
+	  card-repeats:: 4
+	  card-ease-factor:: 2.56
+	  card-next-schedule:: 2022-12-13T20:02:44.059Z
+	  card-last-reviewed:: 2022-11-14T20:02:44.059Z
+	  card-last-score:: 5
+		- **Non-probabilistic sampling** methods are techniques of obtaining a sample that is not chosen at random and may be subject to **sampling bias**.
+	- ## Simple Random Sample
+		- ### Difficulties:
+			- Obtaining a sampling frame (list of all experimental units).
+			- Possibly time consuming / expensive.
+			- Minority groups, by chance, may not be represented in the sample.
+	- ## Stratified Random Sampling #card
+	  card-last-interval:: -1
+	  card-repeats:: 1
+	  card-ease-factor:: 2.36
+	  card-next-schedule:: 2022-11-18T00:00:00.000Z
+	  card-last-reviewed:: 2022-11-17T20:19:40.391Z
+	  card-last-score:: 1
+		- 1. Split entire population into **homogeneous groups**, called **strata**.
+		  2. Take a Simple Random Sample from each stratum.
+		- ### Stratified VS Simple Random Sample
+			- Ensure representation from minority groups.
+			- Estimates of the population parameters per strata may be of interest.
+			- Possible reduction in cost per observation in the survey.
+			- Increased accuracy as reduced sampling error (less variation within a stratum).
+			-
+			- ### Difficulties
+				- Can you correctly allocate each individual to one & only one stratum?
+				- Should every group receive equal weight?
+				- What if some strata are more varied than others?
+				- Take into account mean, variance, and cost to get "optimal allocation".
+	- ## Cluster Sampling #card
+	  card-last-interval:: -1
+	  card-repeats:: 1
+	  card-ease-factor:: 2.36
+	  card-next-schedule:: 2022-11-18T00:00:00.000Z
+	  card-last-reviewed:: 2022-11-17T20:18:48.138Z
+	  card-last-score:: 1
+		- Instead of randomly choosing individuals, a Simple Random Sample of collection or groups of individuals is taken.
+		- The population is broken up into regions or groups, usually a *natural partition*, called a **cluster**.
+			- Internally heterogeneous, homogeneous between the clusters.
+		- Clusters are assumed representation of the entire population.
+		- Small number of clusters are selected at random.
+		- Every individual within a cluster is observed.
+		-
+		- ### Cluster Over Stratified
+			- Sampling frame not necessarily needed.
+			- May be more practical and / or economical than Simple or Stratified Random Sampling.
+			- Will be biased if the entire cluster is not sampled.
+			- Careful if homogeneity within a cluster and heterogeneity between clusters as this can increase sample error.
+			- **Note:** In stratified sampling, all strata are sampled, while in cluster sampling only some clusters are sampled.
+- # Studies & Experiments
+	- ## Observational Studies & Experiments #card
+	  card-last-interval:: -1
+	  card-repeats:: 1
+	  card-ease-factor:: 2.46
+	  card-next-schedule:: 2022-11-15T00:00:00.000Z
+	  card-last-reviewed:: 2022-11-14T16:49:06.426Z
+	  card-last-score:: 1
+		- In an **Observational Study**, data is collected only be *observing* what occurs.
+			- E.g., surveys, historical records.
+		- When researchers want to investigate **causal relationships**, it's best to conduct an experiment.
+			- Usually there will be both an explanatory variable & a response variable.
+			- Be wary of confounding variables.
+	- ## Designed (Comparative) Study
+		- An experiment allows us to prove a cause-and-effect relationship.
+		- The experimenter must identify:
+			- at least one **explanatory variable**, called a **factor** to manipulate.
+			- at least one **response** variable to measure.
+		- The experimenter must also control any other **nuisance factors** that could influence the response.
+			- e.g., weather, day of the week.
+				-
--- a/second/semester1/logseq-stuff/logseq/bak/pages/Sampling/2022-11-23T12_15_36.873Z.Desktop.md
+++ b/second/semester1/logseq-stuff/logseq/bak/pages/Sampling/2022-11-23T12_15_36.873Z.Desktop.md
@ -0,0 +1,100 @@
+- #[[ST2001 - Statistics in Data Science I]]
+- **Previous Topic:** [[Exploratory Data Analysis]]
+- **Next Topic:** [[Probability]]
+- **Relevant Slides:** ![Topic 3 - Sampling.pdf](../assets/Topic_3_-_Sampling_1663599787566_0.pdf)
+-
+- What is a **Parameter**? #card
+  card-last-interval:: -1
+  card-repeats:: 1
+  card-ease-factor:: 2.6
+  card-next-schedule:: 2022-11-15T00:00:00.000Z
+  card-last-reviewed:: 2022-11-14T16:30:17.122Z
+  card-last-score:: 1
+	- A **parameter** is a single value summarising some feature or variable of interest in the population.
+	- It is usually unknown.
+- What is **inference**? #card
+  card-last-interval:: 33.64
+  card-repeats:: 4
+  card-ease-factor:: 2.9
+  card-next-schedule:: 2022-11-22T23:24:34.466Z
+  card-last-reviewed:: 2022-10-20T08:24:34.467Z
+  card-last-score:: 5
+	- **Inference** is the process of making decisions about a population based on information in a sample.
+- A consequence of **natural variation** is that two samples drawn form the same population will usually give different estimates of the population parameters.
+-
+- # Sampling
+  collapsed:: true
+	- What is **non-probabilistic sampling**? #card
+	  card-last-interval:: 29.04
+	  card-repeats:: 4
+	  card-ease-factor:: 2.56
+	  card-next-schedule:: 2022-12-13T20:02:44.059Z
+	  card-last-reviewed:: 2022-11-14T20:02:44.059Z
+	  card-last-score:: 5
+		- **Non-probabilistic sampling** methods are techniques of obtaining a sample that is not chosen at random and may be subject to **sampling bias**.
+	- ## Simple Random Sample
+		- ### Difficulties:
+			- Obtaining a sampling frame (list of all experimental units).
+			- Possibly time consuming / expensive.
+			- Minority groups, by chance, may not be represented in the sample.
+	- ## Stratified Random Sampling #card
+	  card-last-interval:: -1
+	  card-repeats:: 1
+	  card-ease-factor:: 2.36
+	  card-next-schedule:: 2022-11-18T00:00:00.000Z
+	  card-last-reviewed:: 2022-11-17T20:19:40.391Z
+	  card-last-score:: 1
+		- 1. Split entire population into **homogeneous groups**, called **strata**.
+		  2. Take a Simple Random Sample from each stratum.
+		- ### Stratified VS Simple Random Sample
+			- Ensure representation from minority groups.
+			- Estimates of the population parameters per strata may be of interest.
+			- Possible reduction in cost per observation in the survey.
+			- Increased accuracy as reduced sampling error (less variation within a stratum).
+			-
+			- ### Difficulties
+				- Can you correctly allocate each individual to one & only one stratum?
+				- Should every group receive equal weight?
+				- What if some strata are more varied than others?
+				- Take into account mean, variance, and cost to get "optimal allocation".
+	- ## Cluster Sampling #card
+	  card-last-interval:: -1
+	  card-repeats:: 1
+	  card-ease-factor:: 2.36
+	  card-next-schedule:: 2022-11-18T00:00:00.000Z
+	  card-last-reviewed:: 2022-11-17T20:18:48.138Z
+	  card-last-score:: 1
+		- Instead of randomly choosing individuals, a Simple Random Sample of collection or groups of individuals is taken.
+		- The population is broken up into regions or groups, usually a *natural partition*, called a **cluster**.
+			- Internally heterogeneous, homogeneous between the clusters.
+		- Clusters are assumed representation of the entire population.
+		- Small number of clusters are selected at random.
+		- Every individual within a cluster is observed.
+		-
+		- ### Cluster Over Stratified
+			- Sampling frame not necessarily needed.
+			- May be more practical and / or economical than Simple or Stratified Random Sampling.
+			- Will be biased if the entire cluster is not sampled.
+			- Careful if homogeneity within a cluster and heterogeneity between clusters as this can increase sample error.
+			- **Note:** In stratified sampling, all strata are sampled, while in cluster sampling only some clusters are sampled.
+- # Studies & Experiments
+	- ## Observational Studies & Experiments #card
+	  card-last-interval:: -1
+	  card-repeats:: 1
+	  card-ease-factor:: 2.46
+	  card-next-schedule:: 2022-11-15T00:00:00.000Z
+	  card-last-reviewed:: 2022-11-14T16:49:06.426Z
+	  card-last-score:: 1
+		- In an **Observational Study**, data is collected only be *observing* what occurs.
+			- E.g., surveys, historical records.
+		- When researchers want to investigate **causal relationships**, it's best to conduct an experiment.
+			- Usually there will be both an explanatory variable & a response variable.
+			- Be wary of confounding variables.
+	- ## Designed (Comparative) Study
+		- An experiment allows us to prove a cause-and-effect relationship.
+		- The experimenter must identify:
+			- at least one **explanatory variable**, called a **factor** to manipulate.
+			- at least one **response** variable to measure.
+		- The experimenter must also control any other **nuisance factors** that could influence the response.
+			- e.g., weather, day of the week.
+				-