Chapter 14-9. Kolmogorov-Smirnov Test

Recommended Post: 【Statistics】 Chapter 14. Statistical Test


1. General Kolmogorov-Smirnov Test

2. Parametric Kolmogorov-Smirnov Test

3. Cramér–von Mises Test

4. Kolmogorov-Smirnov Two-Sample Test



1. General Kolmogorov-Smirnov Test

⑴ Define the sample (i.e., empirical) distribution function as follows; it is a step function that increases by 1/n at each observation


$$\hat{F}(x) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\{X_i \le x\}$$


⑵ The two-sided Kolmogorov-Smirnov test rejects the null hypothesis when the following statistic is large


$$D = \sup_{x \in \mathbb{R}} \left| \hat{F}(x) - F_0(x) \right|$$


Figure 1. Kolmogorov-Smirnov test statistic
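
The statistic can be computed without scanning every x: because F hat is a step function and F0 is nondecreasing, the supremum is reached at, or just before, one of the observations. The following is a minimal sketch under that observation, assuming a continuous F0; the function names and the sample values are illustrative only.

```python
def ks_statistic(sample, cdf):
    """Two-sided KS statistic D = sup_x |F_hat(x) - F0(x)| for a continuous F0."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f0 = cdf(x)
        # F_hat jumps from (i - 1)/n to i/n at the i-th order statistic,
        # so both sides of each jump are compared with F0.
        d = max(d, i / n - f0, f0 - (i - 1) / n)
    return d


def uniform_cdf(x):
    """CDF of Unif(0, 1), used here as an illustrative F0."""
    return min(1.0, max(0.0, x))


sample = [0.1, 0.4, 0.7]  # made-up data
print(ks_statistic(sample, uniform_cdf))  # approximately 0.3
```

For this sample the largest gap occurs at the last observation, where F hat reaches 1 while F0 equals 0.7.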


⑶ Null hypothesis

① The null hypothesis is H0: F = F0, where F is the unknown distribution that generated the sample; F hat is the empirical estimate of F, not part of the hypothesis itself

② Under H0, the distribution of D does not depend on F0 (assumed to be continuous) because F0(Xi) follows Unif(0, 1), so it has been tabulated for various sample sizes n

Theory: Under the null hypothesis, √n D converges in distribution (n → ∞) to the maximum absolute value of a Brownian bridge on [0, 1]
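
The limiting distribution above is commonly called the Kolmogorov distribution, and its tail probability has the series form P(K > t) = 2 Σ (−1)^(k−1) exp(−2k²t²), summed over k ≥ 1. The sketch below evaluates that series to obtain an approximate p-value for large n; the function names and the cut-off for very small t are choices made for this illustration.

```python
import math


def kolmogorov_sf(t, tol=1e-16):
    """Tail probability P(K > t) of the Kolmogorov distribution."""
    if t < 0.1:
        # Below this point the tail probability equals 1 to double precision,
        # and the alternating series converges slowly.
        return 1.0
    total, k = 0.0, 1
    while True:
        term = math.exp(-2.0 * k * k * t * t)
        total += term if k % 2 == 1 else -term
        if term < tol:
            break
        k += 1
    return min(1.0, max(0.0, 2.0 * total))


def ks_asymptotic_pvalue(d, n):
    """Large-sample p-value for an observed statistic d with sample size n."""
    return kolmogorov_sf(math.sqrt(n) * d)


print(kolmogorov_sf(1.36))  # close to 0.05
```

The approximation is only as good as the asymptotics; for small n the tabulated or simulated null distribution is the safer choice.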

Simulation: The p-value can be computed through Monte Carlo simulation

① In practice, since the distribution of D under the null hypothesis does not depend on F0, the simulation only needs to be run once for each sample size

② For example, the simulated samples can be drawn from F0 = Unif(0, 1)
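
The simulation can be sketched as follows: draw many samples of size n from Unif(0, 1), compute D for each, and report the fraction that is at least as large as the observed value. The function names, the default number of simulations, and the add-one correction in the p-value are choices made for this illustration.

```python
import random


def ks_statistic_uniform(sample):
    """Two-sided KS statistic against F0 = Unif(0, 1)."""
    xs = sorted(sample)
    n = len(xs)
    return max(max(i / n - u, u - (i - 1) / n) for i, u in enumerate(xs, start=1))


def ks_mc_pvalue(d_obs, n, n_sim=2000, seed=0):
    """Monte Carlo p-value for an observed statistic d_obs with sample size n."""
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_sim):
        simulated = [rng.random() for _ in range(n)]
        if ks_statistic_uniform(simulated) >= d_obs:
            exceed += 1
    # The add-one correction keeps the estimate strictly positive.
    return (exceed + 1) / (n_sim + 1)


print(ks_mc_pvalue(d_obs=0.4, n=10))
```

To test against another continuous F0, the observed statistic is computed with that F0, while the simulated samples can still come from Unif(0, 1).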



2. Parametric Kolmogorov-Smirnov Test

⑴ The test statistic is defined as follows


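$$D = \sup_{x \in \mathbb{R}} \left| \hat{F}(x) - G_{\hat{\theta}}(x) \right|, \qquad \mathcal{G} = \{ G_\theta : \theta \in \Theta \}$$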


⑵ The p-value is usually estimated through parametric bootstrap: bootstrap samples are drawn from the fitted distribution in 𝒢 (the one with parameter θ hat), and F hat and θ hat are recalculated for each bootstrap sample
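
A minimal sketch of the parametric bootstrap follows. It assumes, purely for illustration, that 𝒢 is the exponential family with rate θ, fitted by θ hat = 1 / (sample mean); the function names and defaults are not from the original text.

```python
import math
import random


def ks_statistic(sample, cdf):
    """Two-sided KS statistic against a continuous CDF."""
    xs = sorted(sample)
    n = len(xs)
    return max(max(i / n - cdf(x), cdf(x) - (i - 1) / n)
               for i, x in enumerate(xs, start=1))


def fitted_ks(sample):
    """Fit the exponential rate and return (rate_hat, D) for this sample."""
    rate = len(sample) / sum(sample)  # maximum likelihood estimate

    def cdf(x):
        return 1.0 - math.exp(-rate * x) if x > 0 else 0.0

    return rate, ks_statistic(sample, cdf)


def bootstrap_pvalue(sample, n_boot=1000, seed=0):
    """Parametric bootstrap p-value for the fitted-exponential KS statistic."""
    rng = random.Random(seed)
    n = len(sample)
    rate_hat, d_obs = fitted_ks(sample)
    exceed = 0
    for _ in range(n_boot):
        boot = [rng.expovariate(rate_hat) for _ in range(n)]
        # Both the parameter estimate and F_hat are recomputed on each sample.
        _, d_boot = fitted_ks(boot)
        exceed += d_boot >= d_obs
    return (exceed + 1) / (n_boot + 1)


data = [0.2, 1.5, 0.7, 3.1, 0.4, 0.9, 2.2, 0.1]  # made-up data
print(bootstrap_pvalue(data))
```

Refitting inside the loop is the essential step: comparing against the null distribution of the fixed-F0 test would ignore the fact that the parameter was estimated from the same data.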

⑶ When 𝒢 is the family of normal distributions: That is, when testing whether a given sample comes from a normal distribution (Normality Test)


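$$D = \sup_{x \in \mathbb{R}} \left| \hat{F}(x) - \Phi\!\left( \frac{x - \bar{X}}{S} \right) \right|$$

Here, X̄ is the sample mean, S is the sample standard deviation, and Φ is the standard normal distribution function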


① This test is often called the Lilliefors normality test

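② The null distribution of the above test statistic is the same for all distributions within the family (it does not depend on the true mean and variance), and it can be obtained through Monte Carlo simulation

③ Therefore, the null distribution of this test statistic is summarized in a table

④ R code: ks.test(dat, "pnorm", mean=mu, sd=sigma) runs the Kolmogorov-Smirnov test against a normal distribution with pre-specified mu and sigma; if mu and sigma are estimated from dat, the reported p-value is not calibrated for the Lilliefors test, and lillie.test(dat) from the nortest package can be used instead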
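
Because the statistic is unchanged when the data are shifted and rescaled, its null distribution can be simulated from N(0, 1) alone, which for this family coincides with the parametric bootstrap. The sketch below assumes the sample standard deviation with divisor n − 1; the function names and defaults are illustrative.

```python
import math
import random
import statistics


def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def lilliefors_statistic(sample):
    """KS statistic against the normal distribution fitted to the sample."""
    n = len(sample)
    mean = statistics.fmean(sample)
    sd = statistics.stdev(sample)  # divisor n - 1
    d = 0.0
    for i, x in enumerate(sorted(sample), start=1):
        f = normal_cdf((x - mean) / sd)
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def lilliefors_pvalue(sample, n_sim=2000, seed=0):
    """Monte Carlo p-value; N(0, 1) suffices because D is location-scale invariant."""
    rng = random.Random(seed)
    n = len(sample)
    d_obs = lilliefors_statistic(sample)
    exceed = 0
    for _ in range(n_sim):
        simulated = [rng.gauss(0.0, 1.0) for _ in range(n)]
        exceed += lilliefors_statistic(simulated) >= d_obs
    return (exceed + 1) / (n_sim + 1)


data = [0.3, -1.2, 0.8, 2.1, -0.5, 1.4, 0.0, -0.9]  # made-up data
print(lilliefors_statistic(data), lilliefors_pvalue(data))
```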

⑷ The same logic applies to distributions of other location-scale families


$$\mathcal{G} = \left\{ G_0\!\left( \frac{\cdot - a}{b} \right) : a \in \mathbb{R},\ b > 0 \right\}$$


① Here, G0 is a given distribution function defined on the real line ℝ

② Location: mean, median, quantiles/percentiles, etc

③ Scale: standard deviation, median absolute deviation, etc



3. Cramér–von Mises Test

⑴ A variation of the Kolmogorov-Smirnov test

⑵ The Cramér–von Mises test rejects the null hypothesis when the following value is large:


$$D^2 = \int_{-\infty}^{\infty} \left( \hat{F}(x) - F_0(x) \right)^2 f_0(x)\, dx$$


① f0(x) = dF0(x)/dx is the probability density function (PDF) under the null hypothesis.

② The statistic has the form of a mean-squared error (MSE): it is the squared difference between F hat and F0, averaged with respect to the null distribution.

⑶ This has a simple closed-form expression that does not require integration:


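$$n D^2 = n \int_{-\infty}^{\infty} \left( \hat{F}(x) - F_0(x) \right)^2 f_0(x)\, dx = \frac{1}{12n} + \sum_{i=1}^{n} \left( F_0(X_{(i)}) - \frac{2i - 1}{2n} \right)^2$$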


① Here, X(1) ≤ ⋯ ≤ X(n) is the ordered sample, known as order statistics.

② The null distribution of D does not depend on F0 and has been tabulated.

③ The asymptotic null distribution is also known but is complicated.

④ Therefore, Monte Carlo simulations can be used to compute the p-value.
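
The closed form and the Monte Carlo p-value can be sketched together. The code below uses the standard closed form 1/(12n) + Σ (F0(X(i)) − (2i − 1)/(2n))², which equals n times the integral statistic, and simulates the null distribution from Unif(0, 1); the function names and defaults are illustrative.

```python
import random


def cvm_statistic(sample, cdf):
    """Cramér-von Mises statistic: 1/(12n) + sum_i (F0(X_(i)) - (2i - 1)/(2n))^2."""
    xs = sorted(sample)
    n = len(xs)
    return 1.0 / (12 * n) + sum(
        (cdf(x) - (2 * i - 1) / (2 * n)) ** 2 for i, x in enumerate(xs, start=1)
    )


def identity_cdf(u):
    """CDF of Unif(0, 1) restricted to values in [0, 1)."""
    return u


def cvm_mc_pvalue(sample, cdf, n_sim=2000, seed=0):
    """Monte Carlo p-value; the null distribution does not depend on F0."""
    rng = random.Random(seed)
    n = len(sample)
    w_obs = cvm_statistic(sample, cdf)
    exceed = 0
    for _ in range(n_sim):
        simulated = [rng.random() for _ in range(n)]
        exceed += cvm_statistic(simulated, identity_cdf) >= w_obs
    return (exceed + 1) / (n_sim + 1)


data = [0.05, 0.1, 0.2, 0.25, 0.3]  # made-up data, bunched near zero
print(cvm_statistic(data, identity_cdf), cvm_mc_pvalue(data, identity_cdf))
```

The statistic attains its minimum value 1/(12n) when every F0(X(i)) sits exactly at (2i − 1)/(2n), that is, when the sample is as evenly spread under F0 as possible.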



4. Kolmogorov-Smirnov Two-Sample Test

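⑴ Given two independent samples X1, …, Xm and Y1, …, Yn, each with its own empirical distribution function, the (one-sided) Kolmogorov-Smirnov test rejects for large values of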


$$D^+_{m,n} = \sup_{x \in \mathbb{R}} \left( \hat{F}_X(x) - \hat{F}_Y(x) \right)$$


Theory

① The distribution of Dm,n+ can be computed exactly and efficiently using some recursion formulas.

② In the large-sample limit


$$\lim_{m, n \to \infty} P\left( \sqrt{\frac{mn}{m+n}}\, D^+_{m,n} > t \right) = e^{-2t^2}, \qquad t \ge 0$$


③ This happens to be the same limiting distribution as in the one-sample case with sample size ⌊mn / (m+n)⌋.
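
As a rough illustration of the one-sided two-sample test, the sketch below computes the supremum of the difference between the two empirical distribution functions over the pooled observations and converts it to an approximate p-value using the classical large-sample tail exp(−2t²) with t = √(mn/(m+n)) · D. The function names are illustrative, and the approximation is only meant for large m and n.

```python
import math
from bisect import bisect_right


def ks_two_sample_one_sided(xs, ys):
    """D+ = sup_t (F_hat_X(t) - F_hat_Y(t)), evaluated at the pooled observations."""
    xs_sorted, ys_sorted = sorted(xs), sorted(ys)
    m, n = len(xs_sorted), len(ys_sorted)
    d_plus = 0.0  # the difference is 0 to the left of all observations
    for t in xs_sorted + ys_sorted:
        diff = bisect_right(xs_sorted, t) / m - bisect_right(ys_sorted, t) / n
        d_plus = max(d_plus, diff)
    return d_plus


def asymptotic_pvalue(d_plus, m, n):
    """Large-sample p-value based on the limit exp(-2 t^2)."""
    t = math.sqrt(m * n / (m + n)) * d_plus
    return math.exp(-2.0 * t * t)


x = [0.1, 0.4, 0.5, 0.9, 1.2]  # made-up data
y = [0.8, 1.1, 1.5, 1.9, 2.3]  # made-up data
d = ks_two_sample_one_sided(x, y)
print(d, asymptotic_pvalue(d, len(x), len(y)))
```

With this orientation, a large value indicates that the X sample tends to take smaller values than the Y sample; swapping the arguments tests the opposite direction.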



Input: 2025.03.23 18:32
