
Chapter 14. Statistical Test



1. terminology

2. Neyman-Pearson lemma

3. generalized likelihood ratio test

4. p value


a. Comprehensive Summary of Statistical Test Examples

b. Simple Test

c. Kruskal-Wallis H Test

d. Wilcoxon Rank Test

e. Run Test

f. Fisher Exact Test (hypergeometric test)

g. Chi-Squared Test



1. terminology 

⑴ test

① definition : verifying if the hypothesis is statistically significant

application 1. randomization check (balance test): verifying that random assignment produced balanced groups

application 2. causal effect: verifying that a particular treatment produces a significant change

② test statistic: summarizing the n-dimensional information of the state space in one dimension and using it for statistical test

○ example: Z, T, χ², F, etc.

○ being able to be summarized in one dimension is important when the size of the critical region is constant

③ parametric test 

○ definition: testing parameters based on test statistics

○ in general, the population is assumed to follow a normal distribution; the central limit theorem justifies this assumption

○ in practice, applying a parametric test even when this assumption is not strictly met is usually not a big problem

④ Non-parametric test

○ Definition: A method of testing non-parametric characteristics through test statistics.

○ Used when the population distribution cannot be specified (distribution-free method).

○ Compared to parametric methods, the calculation of statistics is simpler and more intuitive to understand.

○ Less affected by outliers.

○ The reliability of the test statistics is often insufficient.
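As an illustration of how simple a non-parametric calculation can be, here is a minimal sketch of the sign test in pure Python (the counts below are hypothetical): under H0 the number of positive signs follows Binomial(n, 0.5), so the p value needs only binomial coefficients.

```python
from math import comb

def sign_test_p(n_pos: int, n: int) -> float:
    """Two-sided sign test p value: under H0 the number of positive
    signs among n paired differences follows Binomial(n, 0.5)."""
    k = max(n_pos, n - n_pos)                         # more extreme tail count
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)                         # two-sided, capped at 1

# e.g. 9 of 10 hypothetical paired differences are positive:
p = sign_test_p(9, 10)   # 2 * (C(10,9) + C(10,10)) / 2**10 = 22/1024
```

No distributional assumption about the differences is needed, only that signs are equally likely under H0.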

⑵ hypothesis

① null hypothesis (H0): a hypothesis to be tested directly 

② alternative hypothesis (H1): a hypothesis to be accepted when null hypothesis is rejected 

○ Also known as the research hypothesis.

③ characteristics: for parameter θ,

○ H0 : Θ0 ={θ0, θ0’, θ0’’, ···} 

○ H1 : Θ1 ={θ1, θ1’, θ1’’, ···}

characteristic 1. p(θ ∈ Θ0 or θ ∈ Θ1) = 1

characteristic 2. p(θ ∈ Θ0 and θ ∈ Θ1) = 0

④ classification

○ simple hypothesis: in the cases of Θ ={θ0}, Θ ={θ1}, ···

○ composite hypothesis: if the hypothesis is not a simple hypothesis

○ example: the hypotheses that H0 : θ ≤ θ0, H1 : θ > θ0 are composite hypotheses

⑶ introduction of critical region 

① state space = critical region (rejection region) + acceptance region

○ rejection region: the range of test statistic values that leads to rejection of the null hypothesis

○ sample ∈ critical region: H0 is rejected 

○ sample ∉ critical region: H0 is not rejected 

② power function πC(θ): the probability of the samples being included in the critical region when the critical region is C and the parameter is θ


drawing

③ an example of power function 

○ p(x) = 4x³ / θ⁴ · I{0 < x < θ}

○ C = {x | x ≤ 0.5 or x > 1}

○ θ ≤ 0.5 


πC(θ) = 1


○ 0.5 < θ ≤ 1 


πC(θ) = ∫ p(x) dx over x ∈ [0, 0.5] = 1 / (16θ⁴)


○ 1 < θ


πC(θ) = ∫ p(x) dx over x ∈ [0, 0.5] ∪ [1, θ] = 1 - 15 / (16θ⁴)
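The three cases of this power function can be checked numerically; this sketch uses the CDF F(x) = (x/θ)⁴ implied by the density:

```python
def power(theta: float) -> float:
    """pi_C(theta) = P(X in C | theta) for density p(x) = 4x^3 / theta^4
    on (0, theta) and critical region C = {x <= 0.5 or x > 1}."""
    F = lambda x: min(max(x / theta, 0.0), 1.0) ** 4   # CDF of X
    return F(0.5) + (1.0 - F(1.0))                     # P(X <= 0.5) + P(X > 1)

# the three cases of the text: theta <= 0.5, 0.5 < theta <= 1, theta > 1
cases = [power(0.4), power(0.8), power(1.5)]
```

For θ = 0.4 the whole support lies in C, so the power is 1; the other two values agree with the closed forms 1/(16θ⁴) and 1 - 15/(16θ⁴).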


④ size of critical region (size of test): the maximum value of the probability of samples being included in the critical region when the null hypothesis is true 


drawing

⑤ power : the probability that the sample falls in the critical region when the alternative hypothesis is true; equivalently, the probability that the null hypothesis is rejected when the alternative hypothesis is true


drawing

⑥ error : making a wrong statistical conclusion 

○ ideal critical region 


drawing

○ type Ⅰ error 

○ definition : the error of rejecting the null hypothesis when the null hypothesis is true

○ condition : defined when null hypothesis is simple hypothesis

○ the probability of type Ⅰ error (α) = the size of the critical region 

○ significance level: 10%, 5%, 1%, etc 

○ confidence level: 90%, 95%, 99%, etc 

○ type Ⅱ error 

○ definition : the error of accepting the null hypothesis when the alternative hypothesis is true

○ condition : defined when alternative hypothesis is simple hypothesis

○ the probability of type Ⅱ error (β) = 1 - power

○ trade-off between α and β 


drawing

Figure. 1. trade-off between type Ⅰ error (α) and type Ⅱ error (β)


○ the critical region takes the form of an interval above or below a specific value (Neyman-Pearson lemma)

○ both α and β cannot be reduced
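The trade-off can be made concrete with a toy test of H0 : μ = 0 vs H1 : μ = 1 on a single N(μ, 1) observation (a hypothetical setup, not from the text): raising the rejection cutoff c lowers α but raises β.

```python
from statistics import NormalDist

Phi = NormalDist().cdf   # standard normal CDF

def alpha_beta(c: float, mu1: float = 1.0):
    """Reject H0: mu = 0 when the single N(mu, 1) observation exceeds c.
    alpha = P(X > c | mu = 0), beta = P(X <= c | mu = mu1)."""
    return 1 - Phi(c), Phi(c - mu1)

# raising the cutoff c shrinks alpha but inflates beta
table = [(c, *alpha_beta(c)) for c in (1.0, 1.5, 2.0)]
```

Any choice of c trades one error probability against the other; only a larger sample size can reduce both at once.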

⑷ comparison of critical region 

① criterion: when two critical regions have the same size, the one with the greater power is better

② more powerful testing: for a specific θ1 ∈ Θ1 and two critical regions C1, C2


drawing

③ most powerful testing : for a specific θ1 ∈ Θ1 and any critical region C,


drawing

④ uniformly most powerful testing: for any θ ∈ Θ1 and any critical region C,


drawing


2. Neyman-Pearson lemma 

⑴ idea

① premise: H0 : θ = θ0, H1 : θ = θ1 (simple hypothesis)

② question : finding a critical region that maximizes the power when the size of the critical region is constant

③ intuition: points of the state space are added to the critical region C one by one

○ when a sample x is added to C, both the accumulated p(x, θ0) and p(x, θ1) increase

○ p(x, θ0): a kind of cost. the increase of p(x, θ0) increases the size of the critical region 

○ p(x, θ1): a kind of benefit. the increase of p(x, θ1) increases the power 

④ conclusion 

○ line-up strategy: it is advantageous to put samples x with a larger ratio p(x, θ1) ÷ p(x, θ0) into the critical region first

○ the critical region built by the line-up strategy, C = {x | p(x, θ1) ÷ p(x, θ0) ≥ k}, is the critical region of the most powerful test
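The line-up strategy can be sketched on a hypothetical discrete state space: admit points in decreasing order of p(x, θ1) / p(x, θ0) until the size budget α is spent, then compare against brute force. (In discrete settings the greedy region may not hit the size exactly without randomization; in this toy pmf it does.)

```python
from itertools import combinations

# hypothetical discrete state space with pmfs under theta0 and theta1
xs = list(range(6))
p0 = [0.30, 0.25, 0.20, 0.15, 0.07, 0.03]   # p(x, theta0): the "cost"
p1 = [0.02, 0.08, 0.10, 0.20, 0.25, 0.35]   # p(x, theta1): the "benefit"
alpha = 0.10                                 # size budget

# line-up strategy: admit points in decreasing order of p1/p0
C, size = set(), 0.0
for x in sorted(xs, key=lambda x: p1[x] / p0[x], reverse=True):
    if size + p0[x] <= alpha + 1e-12:
        C.add(x)
        size += p0[x]
power = sum(p1[x] for x in C)

# brute force: no region of size <= alpha has more power
best = max(sum(p1[x] for x in S)
           for r in range(len(xs) + 1)
           for S in combinations(xs, r)
           if sum(p0[x] for x in S) <= alpha + 1e-12)
```

Here the greedy region collects the two points with the largest likelihood ratios and exhausts α exactly, matching the brute-force optimum.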

⑵ lemma 

① premise : H0, H1 are simple hypotheses

② statement : for any k ∈ ℝ, if we take the following critical region it will be a critical region for uniformly most powerful testing 


drawing

○ ℒ : likelihood function

○ likelihood ratio test (LR test): a test of the form λ(x) ≥ k

○ determination of critical region: to know the exact form of critical region, the size of the critical region should be given 

○ every x satisfying λ(x) ≥ k is included in critical region C*

○ every x satisfying λ(x) < k is not included in critical region C*

③ application

○ as only the order of p(x, θ1) ÷ p(x, θ0) is important, the following conversion using a monotone increasing function f(·) is allowed


drawing

○ terms related to θ0, θ1, n, etc are easily removed 

○ point: the modification of critical region is allowed as far as the existence of k’ is ensured  

○ determination of critical region: to know the exact form of critical region the size of the critical region should be given 

⑶ proof 

① assumption : C* and C are same in size


drawing

② definition of C* 


drawing

③ conclusion: C* is a critical region with the uniformly most powerful testing


drawing

example 1.

① X1, ···, Xn ~ Bernoulli(θ)


drawing

② H0 : θ = θ0, H1 : θ = θ1 > θ0

③ likelihood ratio test


drawing

④ Z-test (significance level : α)

○ θ1 > θ0 : one-tailed test


drawing

○ θ1 < θ0 : one-tailed test 


drawing

○ a critical region for a uniformly most powerful test does not exist here, because the form of the most powerful critical region depends on whether θ1 is bigger than θ0 or not
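A quick numerical check of why no uniformly most powerful two-sided test exists in example 1: the Bernoulli likelihood ratio depends on the data only through the success count s, and it is monotone in s, but the direction of monotonicity flips with the sign of θ1 - θ0 (the θ values below are hypothetical):

```python
def lam(s: int, n: int, th0: float, th1: float) -> float:
    """Likelihood ratio p(x, th1) / p(x, th0) for a Bernoulli(theta) sample
    with s successes out of n; it depends on x only through s."""
    return (th1 / th0) ** s * ((1 - th1) / (1 - th0)) ** (n - s)

n = 10
up   = [lam(s, n, 0.3, 0.5) for s in range(n + 1)]  # th1 > th0
down = [lam(s, n, 0.3, 0.1) for s in range(n + 1)]  # th1 < th0

increasing = all(a < b for a, b in zip(up, up[1:]))
decreasing = all(a > b for a, b in zip(down, down[1:]))
# both hold: the most powerful region is {sum x >= c} when th1 > th0
# but {sum x <= c} when th1 < th0, so one region cannot serve both sides
```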

example 2.

① X1, ···, Xn ~ N(μ, 1²)


drawing

② H0 : μ = μ0, H1 : μ = μ1 > μ0

③ likelihood ratio test


drawing

④ Z-test : one-tailed test (significance level : α)


drawing
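The one-tailed Z test of example 2 can be sketched in a few lines, assuming σ is known (the sample values below are hypothetical):

```python
from math import sqrt
from statistics import NormalDist

def z_test_upper(xs, mu0: float, sigma: float):
    """One-tailed Z test of H0: mu = mu0 vs H1: mu > mu0, sigma known.
    Returns the Z statistic and the upper-tail p value."""
    n = len(xs)
    xbar = sum(xs) / n
    z = (xbar - mu0) * sqrt(n) / sigma
    p = 1 - NormalDist().cdf(z)          # P(Z >= z) under H0
    return z, p

z, p = z_test_upper([1.2, 0.8, 1.5, 1.1, 0.9, 1.4], mu0=0.5, sigma=1.0)
# reject at alpha = 0.05 iff z > 1.645, equivalently iff p < 0.05
```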

generalization 1. the form of the critical region stays the same even when H1 is a composite hypothesis, provided the critical region does not depend on the specific value of θ1

① X1, ···, Xn ~ Bernoulli(θ)

② H0 : θ = θ0, H1 : θ > θ0

generalization 2. in generalization 1, the form of the critical region also stays the same when the null hypothesis is a composite hypothesis (e.g. θ < θ0), because the size α attains its maximum at θ = θ0

① X1, ···, Xn ~ Bernoulli(θ)

② H0 : θ < θ0, H1 : θ > θ0



3. generalized likelihood ratio test  

⑴ definition

① the limitation of the Neyman-Pearson lemma : in general, it requires both the null and the alternative hypothesis to be simple

② GLR test (generalized likelihood ratio test) 


drawing

③ max p(x, θ) is obtained by the maximum likelihood (ML) method

④ this method has been shown to produce statistically reasonable critical regions

example 1. Xi ~ N(μ, σ2), σ2 is known 

① H0 : μ = μ0, H1 : μ ≠ μ0

② generalized likelihood ratio test  


drawing

③ τ-test: one-tailed test (significance level: α)


drawing

④ Z-test : two-tailed test (significance level: α)


drawing

⑤ it has been shown that the above method can be applied approximately even if Xi does not follow the normal distribution

example 2. Xi ~ N(μ, σ2), σ2 is unknown 

① H0 : μ = μ0, H1 : μ ≠ μ0

② generalized likelihood ratio test  


drawing

③ F-test: one-tailed test (significance level: α)


drawing

④ T-test: two-tailed test (significance level: α)


drawing
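The GLR of example 2 can be verified numerically: maximizing the likelihood under H0 and under the full model and taking the ratio (under the convention λ = max_{H0} L ÷ max L) reproduces λ = (1 + t²/(n-1))^(-n/2), so rejecting for small λ is exactly the two-tailed T test. The data values below are hypothetical.

```python
from math import sqrt

def glr_and_t(xs, mu0: float):
    """Check lambda = (1 + t^2/(n-1))^(-n/2) for the normal-mean GLR
    with unknown sigma^2 (lambda = max_{H0} L / max L)."""
    n = len(xs)
    xbar = sum(xs) / n
    s2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)   # sample variance
    # MLE variances under the full model (mu free) and under H0 (mu = mu0);
    # plugging them in makes L = (2*pi*v)^(-n/2) * e^(-n/2), so the ratio
    # reduces to (v1 / v0)^(n/2)
    v1 = sum((x - xbar) ** 2 for x in xs) / n
    v0 = sum((x - mu0) ** 2 for x in xs) / n
    lam = (v1 / v0) ** (n / 2)                        # GLR, directly
    t = (xbar - mu0) / sqrt(s2 / n)                   # usual T statistic
    return lam, (1 + t ** 2 / (n - 1)) ** (-n / 2)    # same number

lam, lam_from_t = glr_and_t([2.1, 1.9, 2.5, 2.2, 1.8], mu0=2.0)
```

Since λ is a strictly decreasing function of t², "λ small" and "|t| large" define the same critical region.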

example 3. Xi ~ N(μ, σ2), σ2 is unknown 

① H0 : μ = μ0, H1 : μ > μ0

② generalized likelihood ratio test 


drawing

③ key assumptions 

○ Xavg ≥ μ0 has a higher likelihood ratio than Xavg < μ0, so the former has a higher priority in the line-up strategy than the latter 

○ since the significance level used in practice is at most 0.10 (e.g. 0.025, 0.05, 0.10), it is sufficient to consider only the case Xavg ≥ μ0, which covers half of all possible cases

④ T-test : one-tailed test (significance level: α)


drawing

⑤ the same logic applies when the alternative is H1 : μ < μ0

example 4. Xi ~ N(μ, σ2), μ is unknown 

① H0 : σ2 = σ02, H1 : σ2 ≠ σ02

② generalized likelihood ratio test


drawing

③ setting the critical region 

○ f(τ) is convex (opening upward) with its minimum at τ = n

condition 1. P(τ ≥ k’ | H0) + P(τ ≤ k’’ | H0) = α 

condition 2. f(k’) = f(k’’)


drawing

④ τ-test: two-tailed test (significance level: α)

○ numerical analysis is required to set an ideal critical region

○ in practice, simpler critical regions are used


drawing

example 5. special likelihood ratio test

① definition

○ in the case that Xi ~ N(μ, σ²) and σ² is known, 2 ln λ ~ χ²(1)

○ if the sample size n is large enough, 2 ln λ asymptotically follows a χ² distribution whose degrees of freedom equal the number of parameters being tested (Wilks' theorem), i.e.


drawing

② τ-test: one-tailed test (significance level : α)


drawing

③ supplements

○ some statisticians only refer to these tests as the likelihood ratio test (LR test)  

○ some statisticians define -2 ln λ = 2 ln ℒ(H1) - 2 ln ℒ(H0)
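A sketch checking the χ² behaviour for the known-σ normal case, where 2 · (max log-likelihood - log-likelihood at μ0) reduces to z² and is exactly χ²(1)-distributed under H0 (the sample size and seed below are arbitrary choices):

```python
import random

random.seed(0)
n, mu0, sigma = 20, 0.0, 1.0

def minus2_log_lambda(xs):
    """2 * (max log-likelihood - log-likelihood at mu0) with sigma known;
    algebra reduces it to n * (xbar - mu0)^2 / sigma^2 = z^2."""
    xbar = sum(xs) / len(xs)
    ll = lambda mu: sum(-(x - mu) ** 2 / (2 * sigma ** 2) for x in xs)
    return 2 * (ll(xbar) - ll(mu0))

# under H0 the statistic is chi-square(1): P(> 3.841) should be about 0.05
trials, hits = 2000, 0
for _ in range(trials):
    xs = [random.gauss(mu0, sigma) for _ in range(n)]
    if minus2_log_lambda(xs) > 3.841:
        hits += 1
frac = hits / trials   # close to 0.05
```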



4. p value 

⑴ definition: probability of more extreme values than a given sample when null hypothesis is true

① caution: the p value is not the probability that the null hypothesis is true; interpreting it that way is a common misconception

② the test statistic falls in the critical region if and only if the p value is less than α, so the two rejection criteria are equivalent

⑵ calculation : θ* is a measured value

① right-sided test: p value = P(θ ≥ θ*)

② left-sided test: p value = P(θ ≤ θ*)

③ symmetric distribution about μ: p value = P(|θ - μ| ≥ |θ* - μ|)

④ chi-squared distribution: if θ* is bigger than the median, p value = P(θ ≥ θ*). if θ* is smaller than the median, p value = P(θ ≤ θ*)
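The calculation rules above can be sketched with the standard normal as the (symmetric) sampling distribution; the measured value z* below is hypothetical:

```python
from statistics import NormalDist

Z = NormalDist()         # standard normal, symmetric about mu = 0
z_star = 1.92            # hypothetical measured test statistic

p_right = 1 - Z.cdf(z_star)                 # right-sided: P(Z >= z*)
p_left  = Z.cdf(z_star)                     # left-sided:  P(Z <= z*)
p_two   = 2 * (1 - Z.cdf(abs(z_star)))      # symmetric: P(|Z| >= |z*|)
# for a symmetric distribution the two-sided p value is twice the
# smaller one-sided p value
```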

⑶ power and p value 

① the main issues in classical statistics are finding distribution and increasing power

② strict meaning: high power means that if the alternative hypothesis is true when α is constant, the probability of rejecting the null hypothesis is high

③ meaning that α is constant

○ meaning to define a constant Maginot line for each distribution obtained from various statistical techniques

○ meaning that samples outside the critical region are treated as consistent with the null hypothesis, even when the null hypothesis is not necessarily true

④ meaning of increasing 1-β: meaning of making the Maginot Line more extreme in various statistical techniques

⑤ intuitive meaning: higher power means that we will use statistical techniques that show a smaller p value when α is constant

example 1. for the same sample, using the F statistic rather than the t statistic gives a smaller p value → higher power

example 2. the t-distribution becomes narrower as the degrees of freedom increase → the power increases

⑧ different statistical techniques have different power: meaning that statistical conclusions may differ for the same statistical data

⑷ example : correlation coefficient and p value

① H0 : X and Y are not correlated 

② meaning of p value : the probability that a sample drawn from an uncorrelated population yields a correlation coefficient at least as large as the given one

③ assumptions for calculating the p value via the normal distribution

○ random sampling data

○ bivariate normal distribution: the two variables X and Y jointly follow a normal distribution

○ linear relationship: quadratic or cubic relationships are not suitable

○ if the three conditions above are not met, the p value must be calculated by a non-parametric test

⑸ multiple testing problem 

① false positive : even if the significance threshold is set, incorrectly classified false-positive cases may occur

② multiple testing problem: an ordinary p value threshold can yield statistically significant conclusions that do not reflect reality

○ problematic especially in bioinformatics 

○ if the p value threshold is set to 5%, a significant gene may be interpreted as meaningful on the assumption that a single test yields no false positives; however, if the test is run on 30,000 genes, about 1,500 seemingly significant genes will be false positives, and those 1,500 genes should not be interpreted as meaningful

③ solution: adjusted p value is introduced 

○ q-value Bonferroni : a conservative correction (each p value is multiplied by the number of tests)

○ q-value FDR B&H (Benjamini–Hochberg method): controls the false discovery rate (FDR)

○ q-value FDR B&Y (Benjamini–Yekutieli method): controls the FDR under arbitrary dependence
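The Bonferroni and Benjamini–Hochberg adjustments can be sketched in a few lines of pure Python (the p values below are hypothetical):

```python
def bonferroni(ps):
    """Bonferroni-adjusted p values: p_adj = min(1, m * p)."""
    m = len(ps)
    return [min(1.0, m * p) for p in ps]

def benjamini_hochberg(ps):
    """Benjamini-Hochberg FDR-adjusted p values (step-up procedure):
    walk from the largest p value down, taking running minima of p*m/rank."""
    m = len(ps)
    order = sorted(range(m), key=lambda i: ps[i])
    adj = [0.0] * m
    running = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running = min(running, ps[i] * m / rank)
        adj[i] = running
    return adj

ps = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
adj = benjamini_hochberg(ps)
```

Bonferroni controls the family-wise error rate and is the most conservative of the three; B&H trades that strictness for FDR control and thus keeps more discoveries.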



Input : 2019.06.19 14:52
