Chapter 14-6. Fisher Exact Test (Hypergeometric Test)
Recommended Article: 【Statistics】 Chapter 14. Statistical Testing
1. Example
2. Explanation
3. Application
1. Example
Figure 1. Example
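⑴ A table like the one above is called a contingency table; in the notation used below, a and b count the men and women who are studying, and c and d the men and women who are not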
2. Explanation
⑴ Premise: Marginal totals are known
① Marginal total: Refers to a + b, c + d, a + c, b + d
② It is also known that a + b + c + d = n
⑵ Null Hypothesis H0: The male and female groups are drawn from the same population
⑶ Reformulation of the Null Hypothesis: The male group is simply a random selection of a + c individuals out of the n people
⑷ Statistic 1. Probability (probability of observing the sample): The probability that exactly a of the a + c randomly selected individuals are studying
① Denominator: The number of ways to select a + c individuals out of n, i.e., C(n, a + c)
② Numerator: Given that a + c of the n people are men and a + b are studying, the number of ways in which exactly a men are studying, i.e., C(a + b, a) × C(c + d, c)
③ The above calculation shows the same formula as the hypergeometric distribution
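The counting argument above can be checked numerically. The following is a minimal sketch in R, assuming hypothetical cell counts (a = 3, b = 1, c = 1, d = 3) that are not taken from Figure 1; the variable names are illustrative only.

```r
# Hypothetical cell counts, assumed for illustration only
n_a <- 3  # men who are studying        (a)
n_b <- 1  # women who are studying      (b)
n_c <- 1  # men who are not studying    (c)
n_d <- 3  # women who are not studying  (d)
n   <- n_a + n_b + n_c + n_d

# Numerator and denominator as counted in the text
numerator   <- choose(n_a + n_b, n_a) * choose(n_c + n_d, n_c)
denominator <- choose(n, n_a + n_c)
p_manual    <- numerator / denominator

# Same quantity from the hypergeometric pmf in base R:
# dhyper(x, m, n, k) with x = a, m = number studying,
# n = number not studying, k = number of men
p_dhyper <- dhyper(n_a, n_a + n_b, n_c + n_d, n_a + n_c)

print(c(manual = p_manual, dhyper = p_dhyper))
```

Both values agree, which is the sense in which the test is also called the hypergeometric test.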
⑸ Statistic 2. Odds Ratio: A measure of how similar or dissimilar the given male and female groups are, computed as (a × d) / (b × c)
① Sometimes expressed as -log (odds ratio)
② Concept similar to fold change in genetic group analysis
③ An odds ratio of 1 means that the odds of studying are the same in both groups, i.e., there is no association
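As a small worked example, the sample odds ratio can be computed directly from the four cells. This sketch assumes the cell layout a = men studying, b = women studying, c = men not studying, d = women not studying, and reuses hypothetical counts; none of the numbers come from the original figure.

```r
# Hypothetical cell counts, assumed for illustration only
n_a <- 3; n_b <- 1   # studying: men, women
n_c <- 1; n_d <- 3   # not studying: men, women

odds_men   <- n_a / n_c              # odds of studying among men
odds_women <- n_b / n_d              # odds of studying among women
odds_ratio <- odds_men / odds_women  # algebraically (a * d) / (b * c)

# The log of the odds ratio is 0 when the two groups have equal odds
log_odds_ratio <- log(odds_ratio)

print(c(odds_ratio = odds_ratio, log_odds_ratio = log_odds_ratio))
```

Note that R's `fisher.test` reports a conditional maximum likelihood estimate of the odds ratio, which generally differs somewhat from this sample value.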
⑹ Statistic 3. Ratio: Generally represents a/(a+c)
⑺ Statistic 4. Count: Usually represents a, the number of elements in the intersection of the given two sets
⑻ Statistic 5. p-value: The probability, under H0 and with the marginal totals fixed, of obtaining a table whose probability (Statistic 1) is equal to or smaller than that of the observed table (i.e., a table at least as extreme)
⑼ If the calculated p-value is very small, H0 is rejected
① Selecting the male group out of the n people cannot be regarded as a random selection
② In other words, the male and female groups are different groups
○ ‘Different’ means that the proportion of people who are studying differs significantly between men and women
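The two-sided p-value can be sketched by enumerating every table that shares the observed marginal totals and summing the probabilities that do not exceed the observed one. The table below is hypothetical, and the small tolerance factor is an assumption made here to guard against floating-point ties.

```r
# Hypothetical 2 x 2 table (filled column by column), for illustration only
tab <- matrix(c(3, 1, 1, 3), nrow = 2,
              dimnames = list(c("studying", "not studying"),
                              c("men", "women")))

row1  <- sum(tab[1, ])   # a + b, number studying
col1  <- sum(tab[, 1])   # a + c, number of men
n     <- sum(tab)
a_obs <- tab[1, 1]

# All values of a that are compatible with the fixed marginal totals
a_vals <- max(0, row1 + col1 - n):min(row1, col1)

# Hypergeometric probability of each possible table
probs <- dhyper(a_vals, row1, n - row1, col1)
p_obs <- dhyper(a_obs, row1, n - row1, col1)

# Two-sided p-value: total probability of tables no more likely than the observed one
p_two_sided <- sum(probs[probs <= p_obs * (1 + 1e-7)])

# Built-in implementation for comparison
p_builtin <- fisher.test(tab)$p.value

print(c(manual = p_two_sided, fisher.test = p_builtin))
```

With these counts the p-value is far from small, so H0 would not be rejected.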
3. Application
⑴ Impact of sample size
① Can be used regardless of sample size
② Generally used when the sample size is small
○ In cases of large sample size, chi-squared test is generally used.
○ Because the factorials involved grow very quickly, Fisher's exact test is usually used when the sample size is small
○ However, this limitation can be circumvented by working with logarithms of the factorials
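The following sketch illustrates the factorial-size problem and the logarithmic workaround, using base R's `lchoose`. The counts (a population of 2000 with 500 in one margin and 400 in the other, and an overlap of 120) are hypothetical.

```r
# Direct factorials and binomial coefficients overflow double precision quickly
print(suppressWarnings(factorial(200)))   # Inf
naive_denominator <- choose(2000, 400)
print(is.infinite(naive_denominator))     # the denominator alone is already Inf

# Working on the log scale keeps every intermediate value small
log_p <- lchoose(500, 120) + lchoose(1500, 280) - lchoose(2000, 400)
p_via_logs <- exp(log_p)

# Reference value from the built-in hypergeometric pmf
p_reference <- dhyper(120, 500, 1500, 400)

print(c(via_logs = p_via_logs, dhyper = p_reference))
```

Only the final result is exponentiated, so the computation stays within range even though the individual binomial coefficients do not.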
③ However, if the p-value is extremely small, Fisher's exact test is preferred
○ The chi-squared test is based on an approximation that is least reliable in the extreme tail, and its reported p-value can underflow to 0 in this case
⑵ Can also be used to test the similarity or identity of two sets
Figure 2. Testing the similarity of two sets using Fisher’s exact test
① ‘Studying’ in Figure 1. corresponds to Set A in Figure 2. and ‘Men’ corresponds to Set B
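The set-overlap use case can be sketched as follows. The universe of 100 items and the two sets are made up for illustration; the one-sided alternative ("greater") is an assumption made here, reflecting the usual question of whether the overlap is larger than expected by chance.

```r
# Hypothetical universe and sets, for illustration only
universe <- paste0("item", 1:100)
set_A <- universe[1:20]    # 20 elements
set_B <- universe[13:27]   # 15 elements, 8 of them shared with set_A

n_both   <- length(intersect(set_A, set_B))          # a: in A and in B
n_A_only <- length(setdiff(set_A, set_B))            # b: in A, not in B
n_B_only <- length(setdiff(set_B, set_A))            # c: in B, not in A
n_none   <- length(universe) - length(union(set_A, set_B))  # d: in neither

# 2 x 2 table: rows = in A / not in A, columns = in B / not in B
tab <- matrix(c(n_both, n_B_only, n_A_only, n_none), nrow = 2,
              dimnames = list(c("in A", "not in A"), c("in B", "not in B")))

# One-sided test: is the overlap larger than expected under random selection?
p_fisher <- fisher.test(tab, alternative = "greater")$p.value

# Equivalent hypergeometric tail probability P(overlap >= observed)
p_hyper <- phyper(n_both - 1, length(set_A),
                  length(universe) - length(set_A),
                  length(set_B), lower.tail = FALSE)

print(c(fisher.test = p_fisher, phyper = p_hyper))
```

A small p-value here suggests that the two sets overlap more than two unrelated sets of the same sizes would.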
⑶ Exact testing for general S × T tables
① M = (m_{st} : s = 1, ···, S; t = 1, ···, T)
② Row sums: (m_{s*} : s = 1, ···, S)
③ Column sums: (m_{*t} : t = 1, ···, T)
④ n = ∑_s ∑_t m_{st}
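For an S × T table with fixed row and column sums, the probability of one particular table under independence is the product of the factorials of all marginal sums divided by n! times the product of the factorials of all cells. The sketch below computes this on the log scale; the helper name `table_prob` and the 2 × 3 counts are hypothetical.

```r
# Probability of a single S x T table given its row and column sums
# (hypothetical helper; generalizes the 2 x 2 hypergeometric probability)
table_prob <- function(M) {
  log_p <- sum(lfactorial(rowSums(M))) + sum(lfactorial(colSums(M))) -
    lfactorial(sum(M)) - sum(lfactorial(M))
  exp(log_p)
}

# Sanity check on a hypothetical 2 x 2 table
p_2x2 <- table_prob(matrix(c(3, 1, 1, 3), nrow = 2))

# Hypothetical 2 x 3 table; fisher.test also accepts tables larger than 2 x 2
tab_2x3 <- matrix(c(3, 1, 2, 2, 1, 4), nrow = 2)
p_table <- table_prob(tab_2x3)
p_value <- fisher.test(tab_2x3)$p.value

print(c(p_2x2 = p_2x2, p_table = p_table, p_value = p_value))
```

The exact p-value again sums the probabilities of all tables with the same margins that are no more likely than the observed one, which `fisher.test` handles internally.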
⑷ Implementation in R
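# Helper assumed here because the original does not define it: log10(x!) via base R's lfactorial
log10_factorial <- function(x) lfactorial(x) / log(10)

# Probability of one 2 x 2 table with fixed margins (Statistic 1, the hypergeometric
# probability); this is not yet the p-value, which sums such probabilities.
# total = n, A = a + b, B = a + c, cross = a
my.Fisher.exact.test <- function(total, A, B, cross){
  # log10 of the four marginal factorials (numerator)
  a1 <- log10_factorial(A)
  a2 <- log10_factorial(total - A)
  a3 <- log10_factorial(B)
  a4 <- log10_factorial(total - B)
  # log10 of the four cell factorials and of n! (denominator)
  b1 <- log10_factorial(cross)
  b2 <- log10_factorial(A - cross)
  b3 <- log10_factorial(B - cross)
  b4 <- log10_factorial(total - cross - (A - cross) - (B - cross))
  b5 <- log10_factorial(total)
  out <- a1 + a2 + a3 + a4 - b1 - b2 - b3 - b4 - b5
  return(10^out)
}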
Input: August 24, 2019, 01:28
Updated: April 18, 2022, 11:23