Chapter 14-3. Kruskal-Wallis H test
Recommended Article: 【Statistics】 Chapter 14. Statistical Tests
1. Overview
⑴ Definition
① A non-parametric test for comparing the distributions of three or more independent groups
② The non-parametric counterpart of one-way ANOVA
③ Tests whether the group medians are equal rather than the group means (this reading assumes the group distributions have the same shape)
④ Sample sizes may vary across groups
⑵ (Reference) Choice of Test Method
① Single sample
○ Parametric test: One-sample t-test
○ Non-parametric test: Sign test, Wilcoxon signed rank test
② Two samples (paired samples): Essentially the same as the single-sample case, applied to the paired differences
○ Parametric test: Paired-sample t-test
○ Non-parametric test: Sign test, Wilcoxon signed rank test
③ Two samples (independent samples)
○ Parametric test: Independent-samples t-test
○ Non-parametric test: Wilcoxon rank sum test
④ Analysis of variance
○ Parametric test: ANOVA
○ Non-parametric test: Kruskal-Wallis test
⑤ Randomness
○ Non-parametric test: Runs test
⑥ Correlation analysis
○ Parametric: Pearson correlation coefficient
○ Non-parametric: Spearman rank correlation coefficient
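The choices listed above can be sketched in base R as follows. This is only an illustration: the data are simulated, the variable names (a, b, y, g, tests, correlations) are hypothetical, and the sign test is written as an exact binomial test on the number of values above the hypothesized median, which is one common way to carry it out. The runs test is left out because base R does not ship one.

```r
set.seed(1)                                   # simulated data, for illustration only
a <- rnorm(12, mean = 10)                     # first sample
b <- rnorm(12, mean = 11)                     # second sample (treated as paired or independent below)
y <- c(rnorm(8, mean = 0), rnorm(8, mean = 1), rnorm(8, mean = 2))
g <- factor(rep(c("g1", "g2", "g3"), each = 8))

tests <- list(
  # Single sample
  one_sample_t        = t.test(a, mu = 10),
  one_sample_signed   = wilcox.test(a, mu = 10),                  # Wilcoxon signed rank test
  one_sample_sign     = binom.test(sum(a > 10), sum(a != 10)),    # sign test via binomial test
  # Two samples (paired)
  paired_t            = t.test(a, b, paired = TRUE),
  paired_signed       = wilcox.test(a, b, paired = TRUE),
  # Two samples (independent)
  independent_t       = t.test(a, b),                             # Welch t-test by default
  independent_ranksum = wilcox.test(a, b),                        # Wilcoxon rank sum test
  # Three or more groups
  anova               = oneway.test(y ~ g, var.equal = TRUE),     # classical one-way ANOVA F test
  kruskal             = kruskal.test(y ~ g)
)

# Correlation analysis
correlations <- c(
  pearson  = cor(a, b, method = "pearson"),
  spearman = cor(a, b, method = "spearman")
)

# Collect the p-values side by side
sapply(tests, function(res) res$p.value)
```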
2. Kruskal-Wallis H Test
⑴ Example sample
Figure 1. Example sample
⑵ Step 1. Set up hypotheses
① Null hypothesis H0: Medians of each group are the same
② Alternative hypothesis H1: At least one group’s median is different
⑶ Step 2. Pool all 16 data points (four from each of the four sample groups) and rank them from smallest to largest; tied values receive the average of their ranks
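⑷ Step 3. Compute the test statistic H, which has a closed form in terms of the following quantities
① N: Total number of observations across all groups (16 in the example above)
② Rij: Rank of observation Yij (the i-th observation in group j) among all N observations
③ R.j: Sum of the ranks in group j
④ R..: Sum of all ranks, which always equals N(N + 1)/2
⑤ Unlike the F statistic in regular one-way ANOVA, which has the error sum of squares in the denominator, H has the total sum of squares of the ranks in the denominator.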
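For reference, the standard form of the Kruskal-Wallis statistic can be written with the symbols defined above. This is a reconstruction from those definitions, not a copy of the original figure; n_j (the number of observations in group j) and J (the number of groups) are notation added here. The second line is the familiar shortcut, which holds when there are no ties.

```latex
H = (N - 1)\,
    \frac{\sum_{j=1}^{J} n_j \left( \frac{R_{\cdot j}}{n_j} - \frac{R_{\cdot\cdot}}{N} \right)^{2}}
         {\sum_{j=1}^{J} \sum_{i=1}^{n_j} \left( R_{ij} - \frac{R_{\cdot\cdot}}{N} \right)^{2}}

% Without ties, the denominator equals N(N^2 - 1)/12 and H simplifies to
H = \frac{12}{N(N + 1)} \sum_{j=1}^{J} \frac{R_{\cdot j}^{2}}{n_j} - 3(N + 1)
```

The numerator is the between-group sum of squares of the ranks and the denominator is their total sum of squares, which is the sense in which the total sum of squares takes the place of the error sum of squares.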
⑸ Step 4. Compare H with the chi-squared distribution
① Under H0, H is distribution-free like other rank-based statistics, and for large samples it approximately follows a chi-squared distribution.
② Degrees of freedom: the number of groups (J) − 1. That is, in the example above, the degrees of freedom are 4 − 1 = 3.
③ Heuristically, each group contributes one χ2(1) term, and the constraint that the rank sums add up to a fixed total removes one degree of freedom, leaving J − 1.
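Steps 2 to 4 can be traced by hand in R. The sketch below uses made-up data with four groups of four observations and no ties; these are not the values from Figure 1, and the names my_data, y, and x are placeholders. H is computed as (N − 1) times the between-group sum of squares of the ranks divided by their total sum of squares, then compared with the built-in kruskal.test.

```r
# Hypothetical data: four groups of four observations (NOT the values from Figure 1)
my_data <- data.frame(
  y = c(12.1, 14.3, 11.8, 13.5,    # group A
        15.2, 16.8, 14.9, 17.1,    # group B
        10.4,  9.7, 11.2, 10.9,    # group C
        18.3, 16.1, 19.0, 17.6),   # group D
  x = factor(rep(c("A", "B", "C", "D"), each = 4))
)

# Step 2: rank all N observations together (ties would receive average ranks)
N <- nrow(my_data)
r <- rank(my_data$y)

# Step 3: rank sums per group and the statistic H
nj   <- tapply(r, my_data$x, length)   # group sizes n_j
Rj   <- tapply(r, my_data$x, sum)      # rank sums R.j
Rbar <- sum(r) / N                     # overall mean rank, R.. / N
H <- (N - 1) * sum(nj * (Rj / nj - Rbar)^2) / sum((r - Rbar)^2)

# Step 4: large-sample chi-squared approximation with J - 1 degrees of freedom
df      <- nlevels(my_data$x) - 1
p_value <- pchisq(H, df = df, lower.tail = FALSE)

# The built-in test should report the same statistic and degrees of freedom
fit <- kruskal.test(y ~ x, data = my_data)
c(by_hand = H, built_in = unname(fit$statistic))
```

For these particular numbers the rank sums are 26, 44, 10, and 56 for groups A to D, which gives H = 13.5 on 3 degrees of freedom.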
⑹ Rejection region for significance level α
① If H ≥ h(α, k, (1, 2, ∙∙∙, m)), then reject H0
② h(α, k, (1, 2, ∙∙∙, m)) is the critical value of H under H0, i.e., the upper 100α-th percentile of its null distribution, which satisfies P(H ≥ h(α, k, (1, 2, ∙∙∙, m))) = α
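Exact critical values of H for small samples are normally read from tables. For larger samples, a common shortcut is to replace the exact critical value with the corresponding chi-squared quantile. The sketch below assumes that large-sample approximation; alpha, J, and the observed H are hypothetical values chosen for illustration.

```r
alpha <- 0.05    # significance level (assumed)
J     <- 4       # number of groups (assumed)
H     <- 13.5    # hypothetical observed value of the statistic

# Approximate critical value: upper alpha quantile of chi-squared with J - 1 df
h_approx <- qchisq(1 - alpha, df = J - 1)

# Reject H0 when the observed statistic reaches the critical value
reject_H0 <- H >= h_approx
c(critical_value = h_approx, reject = reject_H0)
```

With 3 degrees of freedom and α = 0.05, the approximate critical value is about 7.81, so an observed H of 13.5 falls in the rejection region.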
⑺ RStudio
kruskal.test(y ~ x, data = my_data)  # y: numeric response, x: grouping factor, both columns of my_data
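In the call above, my_data, y, and x are placeholders. As a runnable illustration, the sketch below substitutes R's built-in PlantGrowth data set (plant weight for three treatment groups); the choice of data set is an assumption made here, not something the original specifies.

```r
# Built-in data set: numeric column 'weight', factor column 'group' with 3 levels
my_data <- PlantGrowth
str(my_data)

# Same call pattern as above, with the placeholders replaced by real column names
fit <- kruskal.test(weight ~ group, data = my_data)
print(fit)

# The result is an "htest" object; its pieces can be extracted individually
fit$statistic   # the H statistic (labelled "Kruskal-Wallis chi-squared")
fit$parameter   # degrees of freedom: number of groups - 1 = 2
fit$p.value     # p-value from the chi-squared approximation
```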
3. Friedman Test
⑴ A rank-based procedure for repeated-measures designs that tests the following hypotheses
① Null hypothesis H0: For all i = 1, …, I, (Yi,1, …, Yi,J) are exchangeable
② Alternative hypothesis H1: The negation of H0
⑵ Formula
① The treatment responses are compared within each subject.
② Rij: The rank of Yi,j among (Yi,1, …, Yi,J).
③ I: The number of subjects (blocks), i.e., the number of times the experiment is repeated.
④ Under the null hypothesis, the test statistic G asymptotically (as I → ∞) follows the chi-squared distribution with J − 1 degrees of freedom, where J is the number of treatments.
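The standard Friedman statistic without ties is G = 12 / (I J (J + 1)) × Σj R.j² − 3 I (J + 1), where R.j is the sum over subjects of the within-subject ranks for treatment j. The sketch below computes it by hand and compares it with R's friedman.test. The data matrix is made up (5 subjects by 3 treatments, no ties within a row), and the names Y, n_subjects, and n_treatments are hypothetical.

```r
# Hypothetical responses: rows = subjects (I = 5), columns = treatments (J = 3)
Y <- matrix(c(8.2, 9.1, 7.5,
              7.9, 9.4, 8.1,
              8.8, 9.0, 7.2,
              7.4, 8.6, 7.0,
              8.5, 8.3, 7.7),
            nrow = 5, byrow = TRUE)

n_subjects   <- nrow(Y)   # I
n_treatments <- ncol(Y)   # J

# Rank within each subject; apply() over rows returns a J x I matrix, so transpose back
R  <- t(apply(Y, 1, rank))
Rj <- colSums(R)          # rank sum per treatment

# Friedman statistic (no-ties form) and its asymptotic p-value
G <- 12 / (n_subjects * n_treatments * (n_treatments + 1)) * sum(Rj^2) -
  3 * n_subjects * (n_treatments + 1)
p_value <- pchisq(G, df = n_treatments - 1, lower.tail = FALSE)

# Built-in test on the same matrix (rows are blocks, columns are treatments)
fit <- friedman.test(Y)
c(by_hand = G, built_in = unname(fit$statistic))
```

For these numbers the treatment rank sums are 10, 14, and 6, which gives G = 6.4 on 2 degrees of freedom.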
Posted: 2019.08.24 00:58