Chapter 14-9. Kolmogorov-Smirnov Test
Recommended Post: 【Statistics】 Chapter 14. Statistical Test
1. General Kolmogorov-Smirnov Test
2. Parametric Kolmogorov-Smirnov Test
3. Cramér–von Mises Test
4. Kolmogorov-Smirnov Two-Sample Test
1. General Kolmogorov-Smirnov Test
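⑴ Given a sample X1, …, Xn drawn i.i.d. from an unknown distribution F, define the sample (i.e., empirical) distribution function as the step function
$$\hat F_n(x) = \frac{1}{n}\sum_{i=1}^{n} \mathbf{1}\{X_i \le x\}$$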
⑵ For the two-sided Kolmogorov-Smirnov test, reject the null hypothesis when the following value is large:
$$D = \sup_{x \in \mathbb{R}} \left| \hat F_n(x) - F_0(x) \right|$$
Figure 1. Kolmogorov-Smirnov test statistic
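As a small illustration (a sketch, assuming a simulated sample and F0 = Unif(0, 1); the helper name ks_stat is hypothetical), the supremum can be computed exactly from the order statistics, because the gap between the step function F̂n and a continuous F0 is largest either at a jump or just before one. The result is compared with the statistic reported by base R's ks.test.

```r
# Hypothetical helper: exact one-sample KS statistic D = sup_x |Fhat_n(x) - F0(x)|
# for a continuous null CDF F0 (passed as an R function).
ks_stat <- function(x, F0) {
  n <- length(x)
  u <- F0(sort(x))                  # F0 at the order statistics X(1) <= ... <= X(n)
  i <- seq_len(n)
  d_plus  <- max(i / n - u)         # Fhat_n above F0, checked at each jump
  d_minus <- max(u - (i - 1) / n)   # F0 above Fhat_n, checked just before each jump
  max(d_plus, d_minus)
}

set.seed(1)
x <- runif(30)                      # illustrative data; H0: F0 = Unif(0, 1)
Fhat <- ecdf(x)                     # base R step-function ECDF
Fhat(0.5)                           # proportion of observations <= 0.5
D <- ks_stat(x, punif)
D_base <- unname(ks.test(x, "punif")$statistic)
all.equal(D, D_base)                # the two computations agree
```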
⑶ Null hypothesis: H0: F = F0
① The null hypothesis is that the distribution F that generated the sample equals the hypothesized distribution F0; F̂ is only the empirical estimate of F.
② Under H0, the distribution of D does not depend on F0 (assumed to be continuous), so it has been tabulated for various sample sizes n.
⑷ Theory: Under the null hypothesis, √n D converges in distribution (n → ∞) to the supremum of the absolute value of a Brownian bridge B on [0, 1], i.e., sup_{0 ≤ t ≤ 1} |B(t)|, which is known as the Kolmogorov distribution
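The Kolmogorov distribution has a known series for its tail, P(sup |B(t)| > x) = 2 Σ_{k≥1} (−1)^{k−1} exp(−2k²x²). A minimal sketch of an asymptotic p-value based on it follows; the function name kolmogorov_tail and the truncation at 100 terms are assumptions of this sketch, and the approximation is only meant for reasonably large n.

```r
# Hypothetical helper: P(sup_t |B(t)| > x) for a Brownian bridge B on [0, 1],
# i.e. the asymptotic null tail of sqrt(n) * D. Truncation at `terms` is an assumption.
kolmogorov_tail <- function(x, terms = 100) {
  if (x <= 0) return(1)
  k <- seq_len(terms)
  p <- 2 * sum((-1)^(k - 1) * exp(-2 * k^2 * x^2))
  min(max(p, 0), 1)                 # clamp: the truncated series can leave [0, 1] for tiny x
}

kolmogorov_tail(1.358)              # close to 0.05, the classical 5% critical value

# Asymptotic p-value for an illustrative sample tested against Unif(0, 1)
set.seed(2)
n <- 200
u <- sort(runif(n))
D <- max(seq_len(n) / n - u, u - (seq_len(n) - 1) / n)
p_asym <- kolmogorov_tail(sqrt(n) * D)
```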
⑸ Simulation: The p-value can be computed through Monte Carlo simulation
① In practice, since the null distribution of D does not depend on F0, the simulation only needs to be performed once for each sample size n
② For example, one can simulate with F0 = Unif(0, 1), because F0(Xi) ~ Unif(0, 1) under H0 when F0 is continuous
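A sketch of this simulation, assuming F0 is continuous so that the transformed values F0(Xi) are Unif(0, 1) under H0: the null distribution of D is simulated once with uniform samples of size n and the observed statistic is compared against it. The names ks_unif, simulate_null_D and mc_pvalue, the choice B = 10000, the "+1" correction in the p-value, and the Exp(rate = 2) example are all choices made for illustration.

```r
# KS statistic of a sample u against Unif(0, 1)
ks_unif <- function(u) {
  n <- length(u)
  u <- sort(u)
  i <- seq_len(n)
  max(i / n - u, u - (i - 1) / n)
}

# Simulate the null distribution of D once for sample size n (it does not depend on F0)
simulate_null_D <- function(n, B = 10000) {
  replicate(B, ks_unif(runif(n)))
}

# Monte Carlo p-value with the common "+1" correction (an illustrative choice)
mc_pvalue <- function(d_obs, null_D) {
  (1 + sum(null_D >= d_obs)) / (1 + length(null_D))
}

set.seed(3)
x <- rexp(25, rate = 2)                  # illustrative data; H0: F0 = Exp(rate = 2)
d_obs <- ks_unif(pexp(x, rate = 2))      # transform by F0, then compare with Unif(0, 1)
null_D <- simulate_null_D(length(x))
p_mc <- mc_pvalue(d_obs, null_D)
p_exact <- ks.test(x, "pexp", rate = 2)$p.value   # base R result, for comparison
```

Because null_D depends only on n, it can be stored and reused for any continuous F0 tested with the same sample size.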
2. Parametric Kolmogorov-Smirnov Test
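⑴ For a parametric family 𝒢 = {Gθ : θ ∈ Θ}, with θ̂ estimated from the sample (e.g., by maximum likelihood), the test statistic is defined as follows:
$$D = \sup_{x \in \mathbb{R}} \left| \hat F_n(x) - G_{\hat\theta}(x) \right|$$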
⑵ The p-value is usually estimated through parametric bootstrap: Here, F hat and θ hat are recalculated for each bootstrap sample
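A minimal parametric-bootstrap sketch, assuming for illustration only that 𝒢 is the exponential family {Exp(rate = λ)} with λ̂ = 1 / sample mean (its maximum likelihood estimate). Each bootstrap sample is drawn from the fitted model, and λ̂ is re-estimated on that sample before the statistic is recomputed. The names ks_dist and param_boot_ks and B = 2000 are hypothetical choices.

```r
# KS distance between the ECDF of x and a fully specified continuous CDF
ks_dist <- function(x, cdf) {
  n <- length(x)
  u <- cdf(sort(x))
  i <- seq_len(n)
  max(i / n - u, u - (i - 1) / n)
}

# Parametric bootstrap p-value for H0: X ~ Exp(rate) for some unknown rate
# (the exponential family is an illustrative assumption)
param_boot_ks <- function(x, B = 2000) {
  n <- length(x)
  rate_hat <- 1 / mean(x)                              # MLE on the observed data
  d_obs <- ks_dist(x, function(q) pexp(q, rate = rate_hat))
  d_boot <- replicate(B, {
    xb <- rexp(n, rate = rate_hat)                     # sample from the fitted model
    rb <- 1 / mean(xb)                                 # re-estimate on the bootstrap sample
    ks_dist(xb, function(q) pexp(q, rate = rb))
  })
  list(statistic = d_obs, p.value = (1 + sum(d_boot >= d_obs)) / (1 + B))
}

set.seed(4)
res_exp  <- param_boot_ks(rexp(50, rate = 3))        # H0 true
res_unif <- param_boot_ks(runif(50, 1, 2))           # H0 false: Unif(1, 2) is not exponential
```

Re-estimating θ̂ inside the loop is the important design choice: it reproduces the fact that the observed statistic was also computed with an estimated parameter.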
⑶ When 𝒢 is the family of normal distributions, i.e., when testing whether the data come from some normal distribution (a normality test):
① This test is often called the Lilliefors normality test
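② Because the normal family is a location-scale family, the null distribution of the above statistic (with the mean and standard deviation estimated from the data) is the same for every distribution in the family, so it can be computed once by Monte Carlo simulation, e.g. with N(0, 1)
③ Therefore, the null distribution of this test statistic has been tabulated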
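④ R code (ks.test treats mu and sigma as known constants; if they are estimated from dat, the reported p-value is not the Lilliefors p-value and is too large, and the nortest package provides lillie.test(dat) for that case):
# Valid only when mu and sigma are fixed in advance, not computed from dat
ks.test(dat, "pnorm", mean=mu, sd=sigma)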
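A sketch of the Lilliefors test by Monte Carlo, following the reasoning above: since the statistic with the estimated mean and standard deviation has the same null distribution for every normal law, it is simulated once from N(0, 1). The names lillie_stat and lillie_mc and B = 5000 are hypothetical; for routine use, nortest::lillie.test provides a packaged version of this test.

```r
# Lilliefors statistic: KS distance to N(mean(x), sd(x)^2), parameters estimated from x
lillie_stat <- function(x) {
  n <- length(x)
  z <- sort((x - mean(x)) / sd(x))      # standardize with estimated location and scale
  u <- pnorm(z)
  i <- seq_len(n)
  max(i / n - u, u - (i - 1) / n)
}

# Monte Carlo p-value: the null distribution is simulated once from N(0, 1)
lillie_mc <- function(x, B = 5000) {
  d_obs <- lillie_stat(x)
  null_D <- replicate(B, lillie_stat(rnorm(length(x))))
  list(statistic = d_obs, p.value = (1 + sum(null_D >= d_obs)) / (1 + B))
}

set.seed(5)
dat <- rnorm(40, mean = 10, sd = 3)     # illustrative data
res <- lillie_mc(dat)
# Naive ks.test with plugged-in estimates: same statistic, but its p-value ignores the estimation
naive <- ks.test(dat, "pnorm", mean = mean(dat), sd = sd(dat))
```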
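⑷ The same logic applies to other location-scale families:
$$\mathcal{G} = \left\{ G_0\!\left(\frac{x - \mu}{\sigma}\right) : \mu \in \mathbb{R},\ \sigma > 0 \right\}$$
① Here, G0 is a given distribution on the real line ℝ, μ is the location parameter, and σ is the scale parameter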
② Location: mean, median, other quantiles/percentiles, etc.
③ Scale: standard deviation, median absolute deviation (MAD), etc.
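A short derivation of why the null distribution stays parameter-free, assuming the location and scale estimators are equivariant, i.e. μ̂(a + bX) = a + b μ̂(X) and σ̂(a + bX) = b σ̂(X) for b > 0 (this holds for the mean, median, sample quantiles, standard deviation and MAD listed above):

```latex
\begin{aligned}
X_i &= \mu + \sigma Z_i, \qquad Z_1, \dots, Z_n \overset{\text{iid}}{\sim} G_0, \\
\frac{X_i - \hat\mu(X)}{\hat\sigma(X)}
  &= \frac{\mu + \sigma Z_i - \mu - \sigma\,\hat\mu(Z)}{\sigma\,\hat\sigma(Z)}
   = \frac{Z_i - \hat\mu(Z)}{\hat\sigma(Z)} =: \tilde Z_i, \\
D &= \sup_{x}\left|\hat F_n(x) - G_0\!\left(\frac{x - \hat\mu}{\hat\sigma}\right)\right|
   = \sup_{t}\left|\frac{1}{n}\sum_{i=1}^{n} \mathbf{1}\{\tilde Z_i \le t\} - G_0(t)\right|.
\end{aligned}
```

The last line uses the substitution t = (x − μ̂)/σ̂, which is a bijection because σ̂ > 0. The result depends only on Z1, …, Zn, so the null distribution of D is the same for every (μ, σ) and can be simulated with μ = 0 and σ = 1.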
3. Cramér–von Mises Test
⑴ A variation of the Kolmogorov-Smirnov test
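⑵ The Cramér–von Mises test rejects the null hypothesis when the following value (again denoted D) is large:
$$D = \int_{-\infty}^{\infty} \left( \hat F_n(x) - F_0(x) \right)^2 f_0(x)\, dx$$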
① f0(x) = dF0(x)/dx is the probability density function (PDF) under the null hypothesis.
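② The statistic resembles a mean squared error (MSE): it averages the squared gap between F̂ and F0 with respect to the null distribution, whereas the Kolmogorov-Smirnov statistic takes the largest gap.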
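⑶ This has a simple closed-form expression that does not require integration:
$$D = \frac{1}{12n^2} + \frac{1}{n}\sum_{i=1}^{n} \left( F_0(X_{(i)}) - \frac{2i-1}{2n} \right)^2$$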
① Here, X(1) ≤ ⋯ ≤ X(n) is the ordered sample, known as order statistics.
② The null distribution of D does not depend on F0 and has been tabulated.
③ The asymptotic null distribution is also known but is complicated.
④ Therefore, Monte Carlo simulations can be used to compute the p-value.
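A sketch of the closed form and the Monte Carlo p-value, assuming the statistic is normalized as D = ∫(F̂n − F0)² f0 dx without a factor of n (some texts multiply by n). For F0 = Unif(0, 1), where f0 = 1 on [0, 1], the closed form is checked against a midpoint-rule approximation of the integral. The name cvm_stat, the grid size and B = 10000 are illustrative choices.

```r
# Closed form: D = 1/(12 n^2) + (1/n) * sum_i (F0(X(i)) - (2i - 1)/(2n))^2
# (normalization without a factor of n is an assumption of this sketch)
cvm_stat <- function(x, F0) {
  n <- length(x)
  u <- F0(sort(x))                       # F0 at the order statistics
  i <- seq_len(n)
  1 / (12 * n^2) + mean((u - (2 * i - 1) / (2 * n))^2)
}

set.seed(6)
x <- runif(20)                           # illustrative data; H0: F0 = Unif(0, 1)

# Direct check of the integral definition: midpoint rule for the integral of (Fhat(t) - t)^2 on [0, 1]
grid <- (seq_len(1e5) - 0.5) / 1e5
integral <- mean((ecdf(x)(grid) - grid)^2)
closed <- cvm_stat(x, punif)

# Monte Carlo p-value: the null distribution does not depend on F0, so simulate with Unif(0, 1)
null_D <- replicate(10000, cvm_stat(runif(length(x)), punif))
p_mc <- (1 + sum(null_D >= closed)) / (1 + length(null_D))
```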
4. Kolmogorov-Smirnov Two-Sample Test
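⑴ Let F̂m be the empirical distribution function of a sample X1, …, Xm and Ĝn that of an independent sample Y1, …, Yn. The (one-sided) Kolmogorov-Smirnov test rejects the hypothesis that both samples come from the same distribution for large values of
$$D^+_{m,n} = \sup_{x \in \mathbb{R}} \left( \hat F_m(x) - \hat G_n(x) \right)$$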
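A sketch computing the one-sided two-sample statistic directly from the two empirical CDFs and comparing it with base R, where alternative = "greater" in ks.test uses the same orientation max(F̂ − Ĝ). The sample sizes and the mean shift are arbitrary illustrative choices.

```r
set.seed(7)
x <- rnorm(30)                   # sample 1 (size m), illustrative
y <- rnorm(40, mean = 0.5)       # sample 2 (size n), illustrative
Fx <- ecdf(x)
Gy <- ecdf(y)
w <- c(x, y)                     # a difference of step functions attains its sup at a data point
d_plus <- max(Fx(w) - Gy(w))
d_base <- unname(ks.test(x, y, alternative = "greater")$statistic)
```

Evaluating only at the pooled data points is enough because both empirical CDFs are right-continuous step functions that change value only there.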
⑵ Theory
① The exact null distribution of $D^+_{m,n}$ can be computed efficiently using recursion formulas.
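② In the large-sample limit (m, n → ∞), under the null hypothesis:
$$\mathbb{P}\left( \sqrt{\frac{mn}{m+n}}\; D^+_{m,n} \ge x \right) \to \exp(-2x^2), \qquad x > 0$$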
③ This happens to be the same limiting distribution as in the one-sample case with sample size ⌊mn/(m + n)⌋.
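A sketch of the large-sample approximation: with N = mn/(m + n), P(√N D+ ≥ x) ≈ exp(−2x²). The helper names dplus_asym_p and dplus, the sample sizes and the 2000 replicates are hypothetical choices, and the simulation is only a rough sanity check of the 5% level under H0, not a proof.

```r
# Asymptotic one-sided p-value for the two-sample statistic D+_{m,n}
dplus_asym_p <- function(d, m, n) {
  N <- m * n / (m + n)
  exp(-2 * N * d^2)              # P(sup_t B(t) >= x) = exp(-2 x^2) with x = sqrt(N) * d
}

# D+ = max over pooled data points of Fhat_x - Ghat_y
dplus <- function(x, y) {
  w <- c(x, y)
  max(ecdf(x)(w) - ecdf(y)(w))
}

# Rough Monte Carlo check under H0 (both samples from the same continuous law)
set.seed(8)
m <- 150; n <- 100
sims <- replicate(2000, dplus(rnorm(m), rnorm(n)))
d_crit <- sqrt(-log(0.05) / (2 * m * n / (m + n)))   # asymptotic 5% critical value of D+
mean(sims >= d_crit)                                   # should be roughly 0.05
```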
Input: 2025.03.23 18:32