Chapter 14-6. Fisher Exact Test (Hypergeometric Test)
Recommended Article : 【Statistics】 Chapter 14. Statistical Testing
1. Example
2. Explanation
3. Application
1. Example
Figure. 1. Example
⑴ Such a table as above is called a contingency table
2. Explanation
⑴ Premise : Marginal totals are known
① Marginal total : Refers to a + b, c + d, a + c, b + d
② It is also known that a + b + c + d = n
⑵ Null Hypothesis H0 : Male and female groups are the same group
⑶ Modification of Null Hypothesis : Male group is just a group randomly selected from a + c individuals out of n people
⑷ Statistic 1. Probability (Probability of coming out like the sample) : The probability that a of the randomly selected a + c individuals are studying
① Denominator : The case of randomly selecting a + c individuals out of n
② Numerator : Among n people, a + c are men, and a + b are studying (given), the case where a men are studying
⑸ Statistic 2. Odds Ratio : A measure showing whether given male and female groups are similar or dissimilar
① Sometimes expressed as - log (odds ratio)
② Concept similar to fold change in genetic group analysis
⑹ Statistic 3. Ratio : Generally represents a / (a+c)
⑺ Statistic 4. Count : Usually represents a, the number of elements in the intersection of the given two sets
⑻ The above calculation shows the same formula as the hypergeometric distribution
⑼ If the calculated p-value is very small
① The act of selecting the male group out of n people is not a random selection
② In other words, the male and female groups are different groups
○ ‘Different’ means that the ratio of men and women in the act of studying is significantly different
3. Application
⑴ Impact of sample size
① Can be used regardless of sample size
② Generally used when the sample size is small
○ In cases of large sample size, chi-squared test is generally used
○ Due to the size of factorial calculations, Fisher exact test is usually used when the sample size is small
○ However, the size of factorial calculations can be circumvented by logarithmic calculations
③ However, if the p-value is too small, only Fisher’s exact test is used
○ Chi-squared test is based on approximation, so it always outputs 0 in this case
⑵ Can also be used to test the similarity or identity of two sets
Figure. 2. Testing the similarity of two sets using Fisher’s exact test
① ‘Studying’ in Figure. 1. corresponds to Set A in Figure. 2. and ‘Men’ corresponds to Set B
⑶ Implementation in R
my.Fisher.exact.test <- function(total, A, B, cross){
a1 <- log10_factorial(A)
a2 <- log10_factorial(total - A)
a3 <- log10_factorial(B)
a4 <- log10_factorial(total - B)
b1 <- log10_factorial(cross)
b2 <- log10_factorial(A - cross)
b3 <- log10_factorial(B - cross)
b4 <- log10_factorial(total - cross - (A - cross) - (B - cross))
b5 <- log10_factorial(total)
out = a1 + a2 + a3 + a4 - b1 - b2 - b3 - b4 - b5
return(10^out)
}
Input : August 24, 2019, 01:28
Updated : April 18, 2022, 11:23