
Chapter 14. Statistical Test



1. terminology

2. Neyman-Pearson lemma

3. generalized likelihood ratio test

4. p value


a. Comprehensive Summary of Statistical Test Examples

b. Simple Test

c. Kruskal-Wallis H Test

d. Wilcoxon Rank Test

e. Run Test

f. Fisher Exact Test (hypergeometric test)

g. Chi-Squared Test



1. terminology 

⑴ test

① definition: determining whether a hypothesis is statistically significant

application 1. randomization check (balance test): verifying that random sampling was carried out properly

application 2. causal effect: verifying that a particular treatment produces a significant change

② test statistic: a statistic that summarizes the n-dimensional information of the state space in one dimension for use in a statistical test

○ example: Z, T, χ², F, etc.

○ being able to reduce the data to one dimension is important when the size of the critical region is held constant

③ parametric test 

○ definition: testing parameters based on test statistics

○ in general, the population is assumed to follow a normal distribution: the central limit theorem justifies this assumption

○ in practice, applying a parametric test to a sample without the above assumption holding exactly is usually not a serious problem

④ Non-parametric test

○ Definition: A method of testing non-parametric characteristics through test statistics.

○ Used when the population distribution cannot be specified (distribution-free method).

○ Compared to parametric methods, the calculation of statistics is simpler and more intuitive to understand.

○ Less affected by outliers.

○ The reliability (power) of the test statistics is often lower than that of parametric tests.

⑵ hypothesis

① null hypothesis (H0): a hypothesis to be tested directly 

② alternative hypothesis (H1): the hypothesis accepted when the null hypothesis is rejected

○ Also known as the research hypothesis.

③ characteristics: for parameter θ,

○ H0 : Θ0 = {θ0, θ0′, θ0′′, ···}

○ H1 : Θ1 = {θ1, θ1′, θ1′′, ···}

characteristic 1. p(θ ∈ Θ0 or θ ∈ Θ1) = 1

characteristic 2. p(θ ∈ Θ0 and θ ∈ Θ1) = 0

④ classification

○ simple hypothesis: when the hypothesis set contains a single value, e.g., Θ0 = {θ0}, Θ1 = {θ1}, ···

○ composite hypothesis: if the hypothesis is not a simple hypothesis

○ example: the hypotheses H0 : θ ≤ θ0 and H1 : θ > θ0 are both composite hypotheses

⑶ introduction of critical region 

① state space = critical region + acceptance region

○ critical region (rejection region): the range of test-statistic values that leads to rejection of the null hypothesis

○ sample ∈ critical region: H0 is rejected 

○ sample ∉ critical region: H0 is not rejected 

② power function πC(θ): the probability of the samples being included in the critical region when the critical region is C and the parameter is θ


πC(θ) = P(X ∈ C; θ)

③ an example of power function 

○ p(x) = (4x³ / θ⁴) · I{0 < x < θ}

○ C = {x | x ≤ 0.5 or x > 1}

○ θ ≤ 0.5 


πC(θ) = 1 (∵ the entire support (0, θ) is contained in C)


○ 0.5 < θ ≤ 1 


πC(θ) = ∫ p(x) dx (x ∈ [0, 0.5]) = 1 / (16θ⁴)


○ 1 < θ


πC(θ) = ∫ p(x) dx (x ∈ [0, 0.5] ∪ [1, θ]) = 1 − 15 / (16θ⁴)
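The closed forms above can be checked numerically. A minimal Python sketch, assuming scipy is available (the function name `power` is illustrative):

```python
from scipy.integrate import quad

def power(theta):
    """pi_C(theta) = P(X in C; theta) for the density p(x) = 4x^3 / theta^4."""
    pdf = lambda x: 4 * x**3 / theta**4
    val, _ = quad(pdf, 0, min(0.5, theta))   # part of C below 0.5
    if theta > 1:
        hi, _ = quad(pdf, 1, theta)          # part of C above 1
        val += hi
    return val

for theta in [0.3, 0.8, 2.0]:
    print(f"theta={theta}: pi_C={power(theta):.6f}")
# expected: 1 for theta <= 0.5, 1/(16*0.8^4) ≈ 0.152588 for theta = 0.8,
#           1 - 15/(16*2^4) ≈ 0.941406 for theta = 2.0
```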


④ size of critical region (size of test): the maximum value of the probability of samples being included in the critical region when the null hypothesis is true 


α = maxθ∈Θ0 πC(θ)

⑤ power : the probability of samples being included in the critical region when the alternative hypothesis is true. it is also the probability of null hypothesis to be rejected when the alternative hypothesis is true 


power = πC(θ), θ ∈ Θ1

⑥ error : making a wrong statistical conclusion 

○ ideal critical region 


drawing

○ type Ⅰ error 

○ definition: the error of rejecting the null hypothesis when the null hypothesis is true

○ condition: well defined when the null hypothesis is a simple hypothesis

○ the probability of type Ⅰ error (α) = the size of the critical region 

○ significance level: 10%, 5%, 1%, etc 

○ confidence level: 90%, 95%, 99%, etc 

○ type Ⅱ error 

○ definition: the error of accepting the null hypothesis when the alternative hypothesis is true

○ condition: well defined when the alternative hypothesis is a simple hypothesis

○ the probability of type Ⅱ error (β) = 1 − power

○ trade-off between α and β 


drawing

Figure 1. Trade-off between type Ⅰ error (α) and type Ⅱ error (β)


○ the critical region takes the form of an interval above or below a specific cutoff (Neyman-Pearson lemma)

○ α and β cannot both be reduced at the same time (see the sketch below)
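A minimal numerical sketch of this trade-off, using a one-sided Z-test; the values mu0 = 0, mu1 = 1, n = 9, sigma = 1 are assumptions chosen for illustration:

```python
import numpy as np
from scipy.stats import norm

n, sigma, mu0, mu1 = 9, 1.0, 0.0, 1.0
se = sigma / np.sqrt(n)                    # standard error of the sample mean

for c in [0.3, 0.5, 0.7]:                  # cutoff of the critical region {xbar >= c}
    alpha = norm.sf(c, loc=mu0, scale=se)  # type I error: P(reject | H0)
    beta = norm.cdf(c, loc=mu1, scale=se)  # type II error: P(accept | H1)
    print(f"c={c}: alpha={alpha:.4f}, beta={beta:.4f}")
# raising the cutoff lowers alpha but raises beta, and vice versa
```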

⑷ comparison of critical region 

① criterion: when critical regions are the same in size, the one with greater power is better

② more powerful testing: for a specific θ1 ∈ Θ1 and two critical regions C1, C2


C1 is more powerful than C2 at θ1 if πC1(θ1) ≥ πC2(θ1)

③ most powerful testing : for a specific θ1 ∈ Θ1 and any critical region C,


πC*(θ1) ≥ πC(θ1) for every critical region C of the same size

④ uniformly most powerful testing: for any θ ∈ Θ1 and any critical region C,


πC*(θ) ≥ πC(θ) for all θ ∈ Θ1 and every critical region C of the same size


2. Neyman-Pearson lemma 

⑴ idea

① premise: H0 : θ = θ0, H1 : θ = θ1 (simple hypotheses)

② question: finding the critical region that maximizes the power when the size of the critical region is fixed

③ intuition: points of the state space are added to the critical region C one by one 

○ when a point x is included in C, both p(x, θ0) and p(x, θ1) are added to the size and the power, respectively

○ p(x, θ0): a kind of cost. the increase of p(x, θ0) increases the size of the critical region 

○ p(x, θ1): a kind of benefit. the increase of p(x, θ1) increases the power 

④ conclusion 

○ line-up strategy: it is advantageous to add points x to the critical region in decreasing order of the ratio p(x, θ1) ÷ p(x, θ0)

○ the critical region produced by the line-up strategy, C = {x | p(x, θ1) ÷ p(x, θ0) ≥ k}, is a critical region for uniformly most powerful testing

⑵ lemma 

① premise : H0, H1 are simple hypotheses

② statement: for a given k ∈ ℝ, the following critical region is a critical region for uniformly most powerful testing 


C* = {x | λ(x) = ℒ(θ1; x) / ℒ(θ0; x) ≥ k}

○ ℒ : likelihood function

○ likelihood ratio test (LR test): a test like λ(x) ≥ k

○ determination of critical region: to know the exact form of critical region, the size of the critical region should be given 

○ every x satisfying λ(x) ≥ k is included in critical region C*

○ every x satisfying λ(x) < k is not included in critical region C*

③ application

○ as only the order of p(x, θ1) ÷ p(x, θ0) is important, the following conversion using a monotone increasing function f(·) is allowed


λ(x) ≥ k ⇔ f(λ(x)) ≥ f(k) = k′

○ terms involving only θ0, θ1, n, etc. are easily removed 

○ point: modifying the critical region is allowed as long as the existence of k′ is ensured

○ determination of critical region: to know the exact form of the critical region, its size should be given 

⑶ proof 

① assumption: C* and C are the same in size


P(X ∈ C* ; θ0) = P(X ∈ C ; θ0) = α

② definition of C* 


C* = {x | p(x, θ1) ≥ k · p(x, θ0)}

③ conclusion: C* is a critical region with the uniformly most powerful testing


πC*(θ1) − πC(θ1) = ∫C*−C p(x, θ1) dx − ∫C−C* p(x, θ1) dx ≥ k ∫C*−C p(x, θ0) dx − k ∫C−C* p(x, θ0) dx = 0 (the last equality holds because C* and C have the same size)

example 1.

① X1, ···, Xn ~ Bernoulli(θ)


p(x, θ) = θ^Σxi (1 − θ)^(n − Σxi)

② H0 : θ = θ0, H1 : θ = θ1 > θ0

③ likelihood ratio test


λ(x) = (θ1/θ0)^Σxi · ((1 − θ1)/(1 − θ0))^(n − Σxi) ≥ k ⇔ Σxi ≥ k′ (∵ θ1 > θ0)

④ Z-test (significance level: α) 

○ θ1 > θ0 : one-tailed test


Z = (X̄ − θ0) / √(θ0(1 − θ0)/n) ≥ zα (normal approximation of Σxi by the central limit theorem)

○ θ1 < θ0 : one-tailed test 


Z = (X̄ − θ0) / √(θ0(1 − θ0)/n) ≤ −zα

○ a critical region for uniformly most powerful testing does not exist for a two-sided alternative, because the form of the critical region of the most powerful test depends on whether θ1 is larger than θ0 or not
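A minimal sketch of the size and power of the region {Σxi ≥ k}, using exact binomial probabilities; the values n = 20, θ0 = 0.3, θ1 = 0.5 are assumptions for illustration:

```python
from scipy.stats import binom

n, theta0, theta1 = 20, 0.3, 0.5
for k in range(8, 12):
    size = binom.sf(k - 1, n, theta0)    # P(sum(x) >= k | theta0) = alpha
    power = binom.sf(k - 1, n, theta1)   # P(sum(x) >= k | theta1)
    print(f"k={k}: size={size:.4f}, power={power:.4f}")
# sum(x) is discrete, so an exact size of 0.05 is generally unattainable:
# one takes the smallest k whose size does not exceed the target alpha
```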

example 2.

① X1, ···, Xn ~ N(μ, 1²)


p(x, μ) = (2π)^(−n/2) exp(−½ Σ(xi − μ)²)

② H0 : μ = μ0, H1 : μ = μ1 > μ0

③ likelihood ratio test


λ(x) = exp(n(μ1 − μ0)X̄ − n(μ1² − μ0²)/2) ≥ k ⇔ X̄ ≥ k′ (∵ μ1 > μ0)

④ Z-test : one-tailed test (significance level: α)


Z = √n(X̄ − μ0) ≥ zα
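A minimal sketch of this one-tailed Z-test on simulated data; the true mean 0.4, n = 25, and α = 0.05 are assumptions for illustration:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
mu0, alpha, n = 0.0, 0.05, 25
x = rng.normal(loc=0.4, scale=1.0, size=n)   # toy data; true mu = 0.4

z = np.sqrt(n) * (x.mean() - mu0)            # sigma = 1 is known here
z_alpha = norm.ppf(1 - alpha)                # one-tailed cutoff z_alpha
print(f"Z={z:.3f}, cutoff={z_alpha:.3f}, reject H0: {z >= z_alpha}")
```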

generalization 1. the form of the critical region remains the same whether or not H1 is a composite hypothesis, provided the critical region does not depend on the specific value of θ1

① X1, ···, Xn ~ Bernoulli(θ)

② H0 : θ = θ0, H1 : θ > θ0

generalization 2. in generalization 1, the form of the critical region also remains the same when the null hypothesis is a composite hypothesis with boundary θ0, since the size α is maximized at θ = θ0 

① X1, ···, Xn ~ Bernoulli(θ)

② H0 : θ < θ0, H1 : θ > θ0



3. generalized likelihood ratio test  

⑴ definition

① limitation of the Neyman-Pearson lemma: in general, the null and alternative hypotheses must be simple hypotheses

② GLR test (generalized likelihood ratio test) 


λ(x) = maxθ∈Θ p(x, θ) / maxθ∈Θ0 p(x, θ), and H0 is rejected when λ(x) ≥ k

③ the maximization maxθ p(x, θ) uses the maximum likelihood (ML) method 

④ this method has been shown to produce statistically reasonable critical regions 

example 1. Xi ~ N(μ, σ²), σ² is known 

① H0 : μ = μ0, H1 : μ ≠ μ0

② generalized likelihood ratio test  


λ(x) = exp(n(X̄ − μ0)² / (2σ²)), i.e., 2 ln λ = n(X̄ − μ0)²/σ²

③ τ-test: one-tailed test (significance level: α)


τ = n(X̄ − μ0)²/σ² ≥ χ²α(1)

④ Z-test : two-tailed test (significance level: α)


|Z| = |√n(X̄ − μ0)/σ| ≥ zα/2

⑤ it is proven that even if Xi does not follow the normal distribution, the above method can be applied approximately (for large n, by the central limit theorem) 

example 2. Xi ~ N(μ, σ²), σ² is unknown 

① H0 : μ = μ0, H1 : μ ≠ μ0

② generalized likelihood ratio test  


λ(x) = (Σ(xi − μ0)² / Σ(xi − X̄)²)^(n/2) = (1 + T²/(n − 1))^(n/2), where T = √n(X̄ − μ0)/S

③ F-test: one-tailed test (significance level: α)


T² = n(X̄ − μ0)²/S² ≥ Fα(1, n − 1)

④ T-test: two-tailed test (significance level: α) 


|T| = |√n(X̄ − μ0)/S| ≥ tα/2(n − 1)
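A minimal sketch of the two-tailed T-test using scipy's one-sample t-test on toy data (the simulated sample is an assumption):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(loc=0.5, scale=2.0, size=30)  # toy data
mu0 = 0.0

t, p = stats.ttest_1samp(x, popmean=mu0)     # two-sided by default
print(f"T={t:.3f}, p={p:.4f}")
# note: T^2 follows F(1, n-1), which is why the F-test and T-test above agree
```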

example 3. Xi ~ N(μ, σ²), σ² is unknown 

① H0 : μ = μ0, H1 : μ > μ0

② generalized likelihood ratio test 


λ(x) = (1 + T²/(n − 1))^(n/2) if X̄ ≥ μ0, and λ(x) = 1 otherwise, where T = √n(X̄ − μ0)/S

③ key ideas 

○ X̄ ≥ μ0 gives a higher likelihood ratio than X̄ < μ0, so the former has higher priority in the line-up strategy than the latter 

○ since the significance level is at most around 0.025, 0.05, or 0.10, it is sufficient to consider only X̄ ≥ μ0, which covers half of all possible cases 

④ T-test : one-tailed test (significance level: α)


T = √n(X̄ − μ0)/S ≥ tα(n − 1)

⑤ the same logic applies to H1 : μ < μ0 
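A minimal sketch of the one-tailed T-test computed directly from the definition of T; the true mean 0.8 and α = 0.05 are assumptions for illustration:

```python
import numpy as np
from scipy.stats import t as t_dist

rng = np.random.default_rng(2)
x = rng.normal(loc=0.8, scale=2.0, size=30)        # toy data
mu0, alpha, n = 0.0, 0.05, len(x)

T = np.sqrt(n) * (x.mean() - mu0) / x.std(ddof=1)  # sample sd S uses ddof=1
cutoff = t_dist.ppf(1 - alpha, df=n - 1)           # t_alpha(n-1)
print(f"T={T:.3f}, t_alpha={cutoff:.3f}, reject H0: {T >= cutoff}")
```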

example 4. Xi ~ N(μ, σ²), μ is unknown 

① H0 : σ² = σ0², H1 : σ² ≠ σ0²

② generalized likelihood ratio test


λ = f(τ) = (n/τ)^(n/2) · e^((τ − n)/2), where τ = Σ(Xi − X̄)²/σ0²

③ setting the critical region 

○ f(τ) is convex with its minimum at τ = n

condition 1. P(τ ≥ k’ | H0) + P(τ ≤ k’’ | H0) = α 

condition 2. f(k’) = f(k’’)


drawing

④ τ-test: two-tailed test (significance level: α)

○ numerical analysis is required to set an ideal critical region

○ in practice, simpler critical regions are used


τ ≤ χ²1−α/2(n − 1) or τ ≥ χ²α/2(n − 1) (∵ τ = (n − 1)S²/σ0² ~ χ²(n − 1) under H0)
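A minimal sketch of this simpler two-tailed variance test; σ0² = 1, α = 0.05, and the simulated sample are assumptions for illustration:

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(3)
x = rng.normal(loc=5.0, scale=1.5, size=40)  # toy data with true sigma = 1.5
sigma0_sq, alpha, n = 1.0, 0.05, len(x)

tau = (n - 1) * x.var(ddof=1) / sigma0_sq    # tau ~ chi2(n-1) under H0
lo = chi2.ppf(alpha / 2, df=n - 1)           # lower cutoff
hi = chi2.ppf(1 - alpha / 2, df=n - 1)       # upper cutoff
print(f"tau={tau:.2f}, reject H0: {tau <= lo or tau >= hi}")
```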

example 5. special likelihood ratio test

① definition

○ in the case that Xi ~ N(μ, σ²) and σ² is known, 2 ln λ ~ χ²(1) 

○ if the sample size n is large enough, the following is mathematically demonstrated, with degrees of freedom equal to the number of parameters constrained by the null hypothesis (Wilks' theorem), i.e.


2 ln λ → χ²(k) in distribution, where k = dim Θ − dim Θ0

② τ-test: one-tailed test (significance level: α)


2 ln λ ≥ χ²α(k)

③ supplements

○ some statisticians reserve the name likelihood ratio test (LR test) for this test alone  

○ some statisticians define −2 ln λ = 2 ln ℒ(H1) − 2 ln ℒ(H0), i.e., take λ = ℒ(H0)/ℒ(H1)
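A minimal sketch of the large-sample LR test in the known-variance normal case above, where 2 ln λ = n(X̄ − μ0)² has a closed form (the simulated data are an assumption):

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(4)
x = rng.normal(loc=0.3, scale=1.0, size=100)  # toy data; H0: mu = 0, sigma = 1 known

lr = len(x) * x.mean() ** 2    # 2 ln(lambda) = n * xbar^2 in this model
p = chi2.sf(lr, df=1)          # one-tailed chi-squared p-value, df = 1
print(f"2 ln lambda = {lr:.3f}, p = {p:.4f}")
```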



4. p value 

⑴ definition: the probability of obtaining values at least as extreme as the given sample when the null hypothesis is true

① caution: the p value is commonly misinterpreted as the probability that the null hypothesis is true; it is not

② rejecting when the test statistic falls in the critical region and rejecting when the p value is less than α are equivalent criteria (each is necessary and sufficient for the other)

⑵ calculation: θ* denotes the observed value

① right-sided test: p value = P(θ ≥ θ*)

② left-sided test: p value = P(θ ≤ θ*)

③ symmetric distribution about μ: p value = P(|θ − μ| ≥ |θ* − μ|)

④ chi-squared distribution: if θ* is bigger than the median, p value = P(θ ≥ θ*). if θ* is smaller than the median, p value = P(θ ≤ θ*)
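A minimal sketch of these calculation rules, using a standard normal statistic as the reference distribution (θ* = 1.8 is an arbitrary observed value):

```python
from scipy.stats import norm

theta_star, mu = 1.8, 0.0                  # observed statistic and center

p_right = norm.sf(theta_star)              # right-sided: P(theta >= theta*)
p_left = norm.cdf(theta_star)              # left-sided:  P(theta <= theta*)
p_sym = 2 * norm.sf(abs(theta_star - mu))  # symmetric case: P(|theta - mu| >= |theta* - mu|)
print(p_right, p_left, p_sym)
```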

⑶ power and p value 

① the main issues in classical statistics are finding distribution and increasing power

② strict meaning: with α held constant, higher power means a higher probability of rejecting the null hypothesis when the alternative hypothesis is true

③ the meaning of holding α constant

○ it fixes a common decision threshold (a "Maginot line") for each distribution obtained from the various statistical techniques

○ many cases other than the given sample are treated as consistent with the null hypothesis even if they do not necessarily indicate that the null hypothesis is true

④ the meaning of increasing 1 − β: making the threshold (Maginot line) more extreme across the various statistical techniques

⑤ intuitive meaning: with α held constant, higher power means using statistical techniques that produce a smaller p value

⑥ example 1. for the same sample, the F statistic yields a smaller p value than the t statistic → higher power

⑦ example 2. the t-distribution becomes narrower as the degrees of freedom increase → the power increases

⑧ different statistical techniques have different power: statistical conclusions may differ for the same data

⑷ example: correlation coefficient and p value

① H0 : X and Y are not correlated 

② meaning of the p value: the probability that a sample drawn from an uncorrelated population yields a correlation coefficient greater than the given one

③ assumptions for calculating the p value via the normal distribution

○ random sampling data

○ bivariate normal distribution: the two variables X and Y jointly follow a normal distribution

○ linear relationship: quadratic or cubic relationships are not suitable 

○ if the three conditions above are not met, the p value must be calculated by a non-parametric test (see the sketch below) 
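A minimal sketch contrasting the parametric (Pearson) and rank-based non-parametric (Spearman) correlation p-values on toy data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
x = rng.normal(size=50)
y = 0.5 * x + rng.normal(size=50)        # toy linearly related data

r, p_pearson = stats.pearsonr(x, y)      # parametric: assumes the conditions above
rho, p_spearman = stats.spearmanr(x, y)  # rank-based: distribution-free
print(f"Pearson r={r:.3f} (p={p_pearson:.2e}), "
      f"Spearman rho={rho:.3f} (p={p_spearman:.2e})")
```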

⑸ multiple testing problem 

① Overview

○ Problem Definition: Suppose we test 1,000 hypotheses and reject the null hypothesis for each hypothesis with a p-value less than α = 0.05. In this case, how many null hypotheses would we expect to be incorrectly rejected? The answer is approximately 50 (∵ 1000 × 0.05 = 50). Thus, we cannot assume that all rejected hypotheses are significant.

○ Key Issue: Conducting multiple statistical tests inherently increases the likelihood of inaccurate conclusions.

○ Example: This problem is particularly relevant when identifying differentially expressed genes (DEGs) from sequencing data consisting of multiple genes.

② Solution 1: Controlling the Family-Wise Error Rate (FWER)

○ Definition: The probability of making at least one incorrect conclusion among all hypotheses. For instance, a 5% FWER means that the probability of making even a single incorrect conclusion is less than or equal to 5%. This approach is very conservative and minimizes false positives.

Method 1. Sidak Correction: Adjusts the alpha threshold instead of the p-values. Used when p-values are independent.


αadjusted = 1 − (1 − α)^(1/d)


○ d: Number of statistical tests

Method 2. Bonferroni Correction: Adjusts individual p-values directly. Can be applied even if p-values are not independent. Very conservative.


padjusted = p × d


○ d: Number of statistical tests

○ Note: If the adjusted p-value exceeds 1, it is forcibly set to 1.
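A minimal sketch of both corrections on toy p-values (the five values are arbitrary):

```python
import numpy as np

p = np.array([0.001, 0.008, 0.039, 0.041, 0.042])  # toy p-values
d = len(p)
alpha = 0.05

alpha_sidak = 1 - (1 - alpha) ** (1 / d)  # Sidak: adjusted alpha threshold
p_bonf = np.minimum(p * d, 1.0)           # Bonferroni: adjusted p-values, capped at 1
print("Sidak-adjusted alpha:", alpha_sidak)
print("Bonferroni-adjusted p:", p_bonf)
```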

③ Solution 2: Controlling the False Discovery Rate (FDR)

○ Definition: Limits the proportion of incorrect conclusions (false discoveries) among hypotheses where the null hypothesis is rejected to a certain level.

Method 1. Benjamini–Hochberg (B&H):

○ Suitable when the correlations among tests are simple.

○ d: Number of statistical tests

○ rank: Sorting order of p-values

○ Note: If the adjusted p-value exceeds 1, it is forcibly set to 1.

○ The adjusted p-values should increase with rank (rank = 1 should have the smallest); if this monotonicity is violated, an adjustment step enforces it by taking, at each rank, the minimum of the initial adjusted values at that rank and all higher ranks.

○ Example: For significance level α, total tests m, and the i-th smallest p-value p(i)


p(i), adjusted = p(i) × (m / i)


| Gene | p-val | Rank | Initial Adj p-val | Final Adj p-val |
|------|-------|------|-------------------|-----------------|
| A | 0.039 | 3 | 0.039 × (25/3) = 0.325 | 0.21 |
| B | 0.001 | 1 | 0.001 × (25/1) = 0.025 | 0.025 |
| C | 0.041 | 4 | 0.041 × (25/4) = 0.256 | 0.21 |
| D | 0.042 | 5 | 0.042 × (25/5) = 0.21 | 0.21 |
| E | 0.008 | 2 | 0.008 × (25/2) = 0.1 | 0.1 |


Table 1. Example of B&H Test with 25 Genes
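A minimal sketch reproducing Table 1, including the monotonicity step; m = 25 in total, and the 20 unlisted genes are assumed to have p-values large enough not to affect the step:

```python
import numpy as np

genes = ['A', 'B', 'C', 'D', 'E']
p = np.array([0.039, 0.001, 0.041, 0.042, 0.008])
m = 25  # total number of tests

order = np.argsort(p)                              # ascending p-value order
initial = p[order] * m / np.arange(1, len(p) + 1)  # p * (m / rank)
# monotonicity step: the final value at rank i is the minimum of the initial
# adjusted values at ranks i, i+1, ..., taken from the largest rank downward
final = np.minimum(np.minimum.accumulate(initial[::-1])[::-1], 1.0)

adj = np.empty_like(p)
adj[order] = final
for g, pv, a in zip(genes, p, adj):
    print(f"{g}: p={pv}, BH-adjusted={a:.3f}")
```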


Method 2. Benjamini–Yekutieli (B&Y):


padjusted = p × (d / rank) × ∑i=1..d (1/i)


○ Suitable for cases with complex correlations among tests.

○ d: Number of statistical tests

○ rank: Sorting order of p-values

○ ∑i=1..d (1/i): adjustment constant that controls the FDR more conservatively by accounting for correlations among tests.

○ Note: If the adjusted p-value exceeds 1, it is forcibly set to 1.
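In practice these corrections are usually computed with a library. A minimal sketch using statsmodels' multipletests (assuming statsmodels is installed); note that here d = 5, so the B&H values differ from Table 1, which used m = 25:

```python
import numpy as np
from statsmodels.stats.multitest import multipletests

p = np.array([0.039, 0.001, 0.041, 0.042, 0.008])  # p-values of genes A-E
for method in ['bonferroni', 'fdr_bh', 'fdr_by']:
    reject, p_adj, _, _ = multipletests(p, alpha=0.05, method=method)
    print(method, np.round(p_adj, 4), reject)
```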



Input : 2019.06.19 14:52
