Summary of Statistics
Recommended Articles : 【Statistics】 Table of Contents
1. Data, Information, Knowledge
⑴ Data : raw values as given, before interpretation
⑵ Information : data that has been given a name (labeled data)
⑶ Knowledge : relationships among pieces of information
2. Ratio Scale, Interval Scale, Ordinal Scale, Nominal Scale
⑴ Ratio Scale : an absolute zero point exists, so ratios between values are meaningful. Example: absolute (Kelvin) temperature
⑵ Interval Scale : no absolute zero point, so ratios are not meaningful. Example: Celsius temperature
⑶ Ordinal Scale : values carry order, but differences between them are not meaningful. Example: rankings
⑷ Nominal Scale : categories with no inherent order. Example: gender
3. Difference between Hypothetico-Deductive Method and Data Science
⑴ The hypothetico-deductive method formulates a hypothesis first and then conducts experiments to test it
⑵ Data science conducts experiments (collects data) first and then formulates hypotheses
4. Accuracy and Precision
⑴ Accuracy : how close the sample mean is to the population mean (low bias)
⑵ Precision : how small the variance of the sample is (low spread)
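For illustration, a minimal NumPy sketch (hypothetical measurement settings) contrasting an inaccurate-but-precise estimator with an accurate-but-imprecise one:

```python
import numpy as np

rng = np.random.default_rng(0)
mu = 10.0  # true population mean

# Hypothetical settings: one biased but stable, one unbiased but noisy.
biased_stable = rng.normal(mu + 2.0, 0.1, size=1000)   # precise, inaccurate
unbiased_noisy = rng.normal(mu, 3.0, size=1000)        # accurate, imprecise

for name, x in [("biased+stable", biased_stable), ("unbiased+noisy", unbiased_noisy)]:
    print(f"{name}: bias = {x.mean() - mu:+.3f}, std = {x.std(ddof=1):.3f}")
```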
5. Confounding Effect
⑴ When a third factor affects both the manipulated variable and the dependent variable
⑵ Why correlation does not imply causation
6. Meaning of Batch Effect
⑴ Replicate experiments run under inconsistent, batch-specific conditions
⑵ When variables that should be controlled across batches are not properly controlled, statistical conclusions can be incorrect
7. Meaning of Observational Study and Experimental Study
⑴ An observational study tests hypotheses without changing conditions. Commonly applied in data science.
⑵ An experimental study tests hypotheses by deliberately changing conditions. Applied in conventional scientific methodology.
8. Difference between Replication Experiment and Replication Measurement
⑴ Replication experiments relate to accuracy. Example: testing a drug on multiple patients.
⑵ Replication measurements relate to precision. Example: measuring the same patient multiple times.
9. Comparison between Mean and Median
⑴ In a right-skewed distribution, the median lies to the left of the mean, since the mean is pulled toward the long tail (see the sketch below)
⑵ The median splits the distribution so that the area (probability) on each side is equal
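A quick NumPy sketch, using a hypothetical right-skewed exponential sample, showing the mean pulled to the right of the median:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=100_000)  # right-skewed sample

# The long right tail pulls the mean above the median.
print(f"median = {np.median(x):.3f}, mean = {x.mean():.3f}")  # mean > median
```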
10. Quantile-Quantile (Q-Q) Plot
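A Q-Q plot compares sample quantiles against the theoretical quantiles of a reference distribution; points near the diagonal indicate a good distributional fit. A minimal SciPy/matplotlib sketch with a hypothetical heavy-tailed sample:

```python
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
sample = rng.standard_t(df=3, size=500)  # heavy-tailed sample

# Points bending away from the line at both ends indicate heavy tails.
stats.probplot(sample, dist="norm", plot=plt)
plt.title("Q-Q plot against the normal distribution")
plt.show()
```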
11. Meaning of Independence
⑴ One variable provides no information about another variable
⑵ P(X = x, Y = y) = P(X = x) × P(Y = y)
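A minimal simulation sketch (two hypothetical independent dice) checking that the joint probability matches the product of the marginals:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
X = rng.integers(1, 7, size=n)  # fair die
Y = rng.integers(1, 7, size=n)  # second, independent fair die

# For independent variables, P(X=x, Y=y) should match P(X=x) * P(Y=y).
joint = np.mean((X == 3) & (Y == 5))
product = np.mean(X == 3) * np.mean(Y == 5)
print(f"P(X=3, Y=5) = {joint:.4f}, P(X=3)*P(Y=5) = {product:.4f}")
```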
12. Central Limit Theorem
Regardless of the shape of the population distribution, the distribution of sample means approaches a normal distribution as the sample size grows
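A minimal NumPy sketch illustrating this with a skewed (exponential) population; the sample size of 50 and the number of replicates are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)

# Population: a strongly skewed exponential distribution (not normal).
# Means of samples of size 50 are nevertheless close to normal.
sample_means = rng.exponential(scale=1.0, size=(10_000, 50)).mean(axis=1)

print(f"mean of sample means = {sample_means.mean():.3f} (population mean = 1)")
print(f"std of sample means  = {sample_means.std(ddof=1):.3f} "
      f"(theory: 1/sqrt(50) = {1 / np.sqrt(50):.3f})")
```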
13. Chi-Square Goodness-of-Fit Test
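The goodness-of-fit test compares observed category counts against counts expected under a hypothesized distribution. A minimal SciPy sketch with hypothetical die-roll counts:

```python
from scipy import stats

# Hypothetical counts over 120 throws; H0: all six faces equally likely.
observed = [25, 18, 22, 15, 23, 17]
expected = [20] * 6

chi2, p = stats.chisquare(f_obs=observed, f_exp=expected)
print(f"chi2 = {chi2:.2f}, p = {p:.3f}")  # large p: no evidence against fairness
```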
14. Chi-Square Independence Test
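The independence test checks whether two categorical variables in a contingency table are independent. A minimal SciPy sketch with a hypothetical 2x2 table:

```python
import numpy as np
from scipy import stats

# Hypothetical 2x2 contingency table: treatment group vs outcome.
table = np.array([[30, 10],
                  [20, 25]])

chi2, p, dof, expected = stats.chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p:.4f}")
# Small p: reject H0 that row and column variables are independent.
```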
15. Characteristics of the t-Distribution
⑴ The t-distribution has heavier (fatter) tails than the standard normal distribution
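A quick SciPy check of the heavier tails, comparing two-sided tail probabilities beyond |3|:

```python
from scipy import stats

# P(|Z| > 3) under the standard normal vs P(|T| > 3) under t-distributions:
# the t-distribution puts noticeably more probability in the tails.
print(f"normal : {2 * stats.norm.sf(3):.5f}")
for df in (3, 10, 30):
    print(f"t(df={df}): {2 * stats.t.sf(3, df):.5f}")
```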
16. Considerations when Formulating the Alternative Hypothesis
⑴ There should be potential for refutation
17. Parametric Testing vs. Non-Parametric Testing
⑴ Parametric Testing
① Generally used when the sample distribution follows (approximately) a normal distribution
② p-values are computed from the parameters of an assumed distribution
⑵ Non-Parametric Testing
① Generally used when the sample distribution does not follow a normal distribution
② p-values are computed without distributional parameters (e.g., from ranks)
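A minimal SciPy sketch contrasting the two approaches on the same hypothetical data: a parametric t-test versus the rank-based Mann-Whitney U test:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, size=30)
b = rng.normal(0.8, 1.0, size=30)

# Parametric: t-test assumes (approximately) normal populations.
t_stat, t_p = stats.ttest_ind(a, b)
# Non-parametric: Mann-Whitney U uses only ranks, no distribution parameters.
u_stat, u_p = stats.mannwhitneyu(a, b, alternative="two-sided")

print(f"t-test p = {t_p:.4f}, Mann-Whitney p = {u_p:.4f}")
```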
18. Reasons for One-Tailed Testing
⑴ Situation : when one direction of the effect can be confidently ruled out in advance
⑵ Advantages : Can reduce Type II errors
① Type II Error : failing to reject the null hypothesis when the alternative hypothesis is true
② Makes it easier for the experimenter to reach the desired (significant) conclusion
⑶ Disadvantage 1. Can lead to incorrect statistical conclusions : the p-value is effectively halved (underestimated)
⑷ Disadvantage 2. The directional choice must be justified (one must convince oneself, and others, before seeing the data)
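A minimal SciPy sketch of the p-value halving, on hypothetical one-sample data (the `alternative` keyword requires SciPy >= 1.6):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(0.3, 1.0, size=30)  # hypothetical sample, true mean > 0

# Two-tailed vs one-tailed test of H0: mean = 0.
_, p_two = stats.ttest_1samp(x, 0.0)
_, p_one = stats.ttest_1samp(x, 0.0, alternative="greater")

# When the effect lies in the hypothesized direction, the one-tailed p-value
# is half the two-tailed one, so rejection becomes easier (fewer Type II errors).
print(f"two-tailed p = {p_two:.4f}, one-tailed p = {p_one:.4f}")
```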
19. Experimental Design : Shoe Experiment
⑴ The left shoe and the right shoe differ significantly in shape and usage; the issue is how to compare the two fairly
20. Experimental Design
⑴ Problem : whether to run the experiment on a genetically diverse group or a genetically uniform group
⑵ Answer : the genetically diverse group
① Real populations are genetically diverse : conclusions drawn from a genetically uniform group may not hold in practice
② Significant conclusions can still be obtained through follow-up tests : subsequent application of the hypothetico-deductive method
21. Power of the Test
⑴ Power is the probability of rejecting the null hypothesis when the alternative is true (1 - β); increasing power means using statistical techniques that yield smaller p-values while α is held constant
⑵ Example 1. Tests based on the t-distribution have higher power at higher degrees of freedom
① Higher degrees of freedom make the t-distribution more similar to the normal distribution
② Higher degrees of freedom narrow the t-distribution, lowering the critical value and raising power (see the sketch below)
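A quick SciPy sketch showing the two-tailed critical value shrinking toward the normal's as the degrees of freedom grow:

```python
from scipy import stats

# Critical value for a two-tailed test at alpha = 0.05: as df grows, the
# t-distribution narrows toward the normal, so a smaller |t| suffices to reject.
for df in (5, 10, 30, 100):
    print(f"df={df:>3}: t_crit = {stats.t.ppf(0.975, df):.3f}")
print(f"normal : z_crit = {stats.norm.ppf(0.975):.3f}")
```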
⑶ Example 2. Two-sample t-test has higher power than paired sample t-test
① Paired sample t-test : essentially a one-sample test on the differences, so the degrees of freedom are n-1
② Two-sample t-test : two independent samples, so the degrees of freedom are n+m-2
③ Increasing degrees of freedom in the t-distribution leads to higher power, so two-sample t-test has higher power
⑷ Example 3. In the two-sample t-test, assuming equal variances gives higher power than not assuming them
① With the equal-variance assumption (pooled t-test), the degrees of freedom are n+m-2
② Without it (Welch's t-test), the degrees of freedom follow the Welch-Satterthwaite approximation and are generally smaller (see the sketch below)
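A minimal SciPy sketch contrasting the pooled and Welch variants on the same hypothetical samples:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, size=20)
b = rng.normal(0.5, 1.0, size=25)

# Pooled (equal-variance) test uses df = n + m - 2 = 43;
# Welch's test estimates df via the Welch-Satterthwaite equation, usually smaller.
_, p_pooled = stats.ttest_ind(a, b, equal_var=True)
_, p_welch = stats.ttest_ind(a, b, equal_var=False)
print(f"pooled p = {p_pooled:.4f}, Welch p = {p_welch:.4f}")
```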
⑸ Example 4. In regression analysis, F-test has higher power than t-test
⑸ Example 5. On data satisfying the parallelism assumption, ANCOVA has higher power than separately comparing the y-intercepts of each group's regression line
① Comparing y-intercepts separately works at the sample-size level of a single group
② ANCOVA estimates a single pooled error term across groups, so the effective sample size is about twice that of a single group
22. Reasons Not to Conduct Pairwise t-Tests When There Are Multiple Groups
Because Type I errors accumulate across many pairwise tests, even groups with little real difference are likely to be declared different (see the sketch below)
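A minimal sketch of the accumulation, assuming independent tests at α = 0.05:

```python
from math import comb

alpha = 0.05
for k in (3, 5, 10):              # number of groups
    m = comb(k, 2)                # number of pairwise t-tests
    fwer = 1 - (1 - alpha) ** m   # chance of >= 1 false positive if all H0 true
    print(f"{k} groups -> {m} tests -> familywise error rate = {fwer:.2f}")
```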
23. Assumptions of ANOVA Analysis
⑴ Normality : All data are sampled from populations that follow normal distributions
⑵ Independence : All data are independently sampled from populations
⑶ Homoscedasticity : All data, regardless of different means, are sampled from populations with the same variance
24. Meaning of Robustness
Even under heteroscedasticity or non-normality, statistical conclusions (rejecting or failing to reject the null hypothesis) remain unchanged when the sample size is large or measurements are repeated within categories
25. Assumptions of Linear Correlation
⑴ Randomly sampled data
⑵ Each variable is sampled from populations that follow normal distributions
⑶ The relationship appears linear
26. Difference between Correlation and Regression Analysis
⑴ Correlation simply indicates the degree of the relationship between variables
⑵ Regression analysis models a directional relationship from independent variables to the dependent variable. Since its goal is prediction, it does not by itself prove an actual causal relationship
27. Assumptions of Regression Model
⑴ Y values are assumed to be measured from populations that satisfy normality and homoscedasticity
⑵ Independent variable X is assumed to be measured without error : Difficult to satisfy in practice
⑶ The dependent variable is assumed to be determined by the independent variable
⑷ The relationship between X and Y is assumed to be linear
28. Meaning of Multicollinearity
In multiple linear regression, two or more independent variables have strong correlations, causing problems by increasing the standard errors of the regression coefficients
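A minimal statsmodels sketch (hypothetical data with two nearly collinear predictors) using the variance inflation factor (VIF) to flag multicollinearity:

```python
import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.1, size=n)  # nearly collinear with x1
x3 = rng.normal(size=n)                  # unrelated predictor

X = pd.DataFrame({"const": 1.0, "x1": x1, "x2": x2, "x3": x3})
for i, col in enumerate(X.columns):
    if col != "const":
        print(f"VIF({col}) = {variance_inflation_factor(X.values, i):.1f}")
# VIF much greater than 10 for x1 and x2 flags multicollinearity; x3 stays near 1.
```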
29. Assumptions of ANCOVA
⑴ Homoscedasticity
⑵ Independence
⑶ Normality
⑷ Linearity between covariate and dependent variable
⑸ Parallelism
30. Steps in ANCOVA
⑴ 1st. Verify that the interaction between the independent variable and the covariate is not statistically significant
⑵ 2nd. Fit the regression of the dependent variable on the covariate
⑶ 3rd. Fit a regression line for each level of the independent variable, adjusting only the y-intercept so as to minimize the sum of squared errors
⑷ 4th. Calculate residuals for the data within each level from that level's regression line
⑸ 5th. Evaluate each level's regression line at the overall mean of the covariate to obtain adjusted values
⑹ 6th. Reconstruct adjusted data for each level by adding the residuals back onto the adjusted values
⑺ 7th. Perform ANOVA on the adjusted data
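A minimal statsmodels sketch on hypothetical data; it fits ANCOVA as a single linear model (group + covariate) rather than the step-by-step adjustment above, after first checking the parallelism (interaction) assumption:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 40
df = pd.DataFrame({
    "group": np.repeat(["control", "treated"], n),
    "covariate": rng.normal(50.0, 10.0, size=2 * n),
})
# Hypothetical data: a true group effect of 5 on top of a linear covariate effect.
df["y"] = (2.0 + 0.4 * df["covariate"]
           + np.where(df["group"] == "treated", 5.0, 0.0)
           + rng.normal(0.0, 2.0, size=2 * n))

# Step 1: the group-by-covariate interaction should be non-significant (parallelism).
inter = smf.ols("y ~ C(group) * covariate", data=df).fit()
print(inter.pvalues.filter(like=":"))  # p-value of the interaction term

# Then fit the ANCOVA model: group effect adjusted for the covariate.
ancova = smf.ols("y ~ C(group) + covariate", data=df).fit()
print(ancova.summary().tables[1])
```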
31. Meaning of Parallelism
⑴ Parallelism refers to the assumption that the regression slopes are the same across groups, e.g., when fitting regression lines for contaminated and non-contaminated mining areas
⑵ If parallelism is not satisfied, a comparison made at a single chosen covariate value (e.g., the overall mean age) cannot represent the entire range of the covariate
32. Why Overfitting Should Be Avoided in Machine Learning
⑴ Overfitting incorporates noise specific to the sample, which reduces predictive accuracy on new data
⑵ For this reason, practical machine learning deliberately introduces error (e.g., regularization or noise) at each step (see the sketch below)
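A minimal NumPy sketch on hypothetical noisy sine data: a high-degree polynomial fits the training noise almost exactly but generalizes worse than a modest one:

```python
import numpy as np

rng = np.random.default_rng(0)
x_train = np.linspace(0, 1, 15)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.2, size=15)
x_test = np.linspace(0, 1, 200)
y_test = np.sin(2 * np.pi * x_test)

# The high-degree fit chases the training noise; its test error suffers.
for degree in (3, 10):
    coefs = np.polyfit(x_train, y_train, degree)
    train_err = np.mean((np.polyval(coefs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coefs, x_test) - y_test) ** 2)
    print(f"degree {degree:>2}: train MSE = {train_err:.4f}, test MSE = {test_err:.4f}")
```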
Input : 2019-12-10 00:07