Korean, Spanish, Edit

Chapter 14-6. Fisher Exact Test (Hypergeometric Test)

Recommended Article: 【Statistics】 Chapter 14. Statistical Testing


1. Example

2. Explanation

3. Application



1. Example


image

Figure 1. Example


⑴ Such a table as above is called a contingency table



2. Explanation

⑴ Premise: Marginal totals are known

① Marginal total: Refers to a + b, c + d, a + c, b + d

② It is also known that a + b + c + d = n

⑵ Null Hypothesis H0: Male and female groups are the same group

⑶ Modification of Null Hypothesis: Male group is just a group randomly selected from a + c individuals out of n people

Statistic 1. Probability (Probability of coming out like the sample): The probability that a of the randomly selected a + c individuals are studying


image


① Denominator: The case of randomly selecting a + c individuals out of n

② Numerator: Among n people, a + c are men, and a + b are studying (given), the case where a men are studying

③ The above calculation shows the same formula as the hypergeometric distribution

Statistic 2. Odds Ratio: A measure showing whether given male and female groups are similar or dissimilar


image


① Sometimes expressed as -log (odds ratio)

② Concept similar to fold change in genetic group analysis

③ An odds ratio of 1 means that the two groups are very similar.

Statistic 3. Ratio: Generally represents a/(a+c)

Statistic 4. Count: Usually represents a, the number of elements in the intersection of the given two sets

Statistic 5. p-value: The probability of obtaining a p-value equal to or smaller (i.e., more extreme) than the given probability p.

⑼ If the calculated p-value is very small

① The act of selecting the male group out of n people is not a random selection

② In other words, the male and female groups are different groups

○ ‘Different’ means that the ratio of men and women in the act of studying is significantly different



3. Application

⑴ Impact of sample size

① Can be used regardless of sample size

② Generally used when the sample size is small

○ In cases of large sample size, chi-squared test is generally used.

○ Due to the size of factorial calculations, Fisher exact test is usually used when the sample size is small

○ However, the size of factorial calculations can be circumvented by logarithmic calculations

③ However, if the p-value is too small, only Fisher’s exact test is used

Chi-squared test is based on approximation, so it always outputs 0 in this case

⑵ Can also be used to test the similarity or identity of two sets


스크린샷 2025-03-28 오후 6 57 13

Figure 2. Testing the similarity of two sets using Fisher’s exact test


① ‘Studying’ in Figure 1. corresponds to Set A in Figure 2. and ‘Men’ corresponds to Set B

⑶ Exact testing for general S × T tables


스크린샷 2025-03-28 오후 6 58 14


① M = (mst : s = 1, ···, S; t = 1, ···, T)

② Row sums: (ms* : s = 1, ···, S)

③ Col sums: (m*t : t = 1, ···, T)

④ n = ∑st mst

⑷ Implementation in R

my.Fisher.exact.test <- function(total, A, B, cross){
  a1 <- log10_factorial(A)
  a2 <- log10_factorial(total - A)
  a3 <- log10_factorial(B)
  a4 <- log10_factorial(total - B)

  b1 <- log10_factorial(cross)
  b2 <- log10_factorial(A - cross)
  b3 <- log10_factorial(B - cross)
  b4 <- log10_factorial(total - cross - (A - cross) - (B - cross))
  b5 <- log10_factorial(total)

  out = a1 + a2 + a3 + a4 - b1 - b2 - b3 - b4 - b5
  return(10^out)
}



Input: August 24, 2019, 01:28

Updated: April 18, 2022, 11:23

results matching ""

    No results matching ""