14-2 Lecture. Simple Testing
Recommended Article : 【Statistics】 Lecture 14. Statistical Testing
1. Sign Test
2. ROC Analysis
1. Sign Test
⑴ Overview
① A nonparametric test of the location of the median that uses only the sign of each difference from the hypothesized median, ignoring the magnitude of that difference
② Each observation is converted to a + or - sign according to whether it lies above or below the hypothesized median θ0, and the test is based on the count of these signs
③ Assumes that the observations are independent and come from a continuous distribution
⑵ Procedure
① Step 1. Sample extraction
○ Draw a random sample from a continuous population
○ Discard any observations equal to the hypothesized median θ0, and denote the n remaining observations by X1, X2, …, Xn
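② Step 2. Test statistic
○ B = the number of + signs, i.e., the number of i with Xi > θ0
○ Under the null hypothesis, each remaining observation is equally likely to lie above or below θ0, so B ~ B(n, 1/2)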
③ Step 3. Rejection region for significance level α
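○ Null hypothesis : θ = θ0
○ b(α, n, 1/2) denotes the upper α point of B(n, 1/2), i.e., the value with P(B ≥ b(α, n, 1/2)) = α (because B is discrete, α can usually be attained only approximately)
○ If the alternative hypothesis is θ > θ0, then the rejection region is B ≥ b(α, n, 1/2)
○ If the alternative hypothesis is θ < θ0, then the rejection region is B < b(1 - α, n, 1/2), or equivalently B ≤ n - b(α, n, 1/2) by the symmetry of B(n, 1/2)
○ If the alternative hypothesis is θ ≠ θ0, then the rejection region is B ≥ b(α/2, n, 1/2) or B < b(1 - α/2, n, 1/2)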
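The three steps above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function names (binom_upper_tail, critical_value, sign_test, reject) are hypothetical, the data are made up, and critical_value uses an assumed conservative convention (the smallest b with P(B ≥ b) ≤ α), since the discrete binomial distribution rarely hits α exactly.

```python
from math import comb
from typing import Sequence, Tuple


def binom_upper_tail(k: int, n: int) -> float:
    """P(B >= k) for B ~ B(n, 1/2), computed exactly from binomial coefficients."""
    return sum(comb(n, i) for i in range(max(k, 0), n + 1)) / 2 ** n


def critical_value(alpha: float, n: int) -> int:
    """Upper alpha point b(alpha, n, 1/2), taken here (assumption) as the
    smallest b with P(B >= b) <= alpha, so the test size is at most alpha."""
    b = 0
    while binom_upper_tail(b, n) > alpha:
        b += 1
    return b


def sign_test(data: Sequence[float], theta0: float,
              alternative: str = "two-sided") -> Tuple[int, int, float]:
    """Return (B, n, exact p-value) for H0: median = theta0."""
    # Step 1: drop observations tied with theta0; n counts the rest.
    kept = [x for x in data if x != theta0]
    n = len(kept)
    # Step 2: B = number of + signs (observations above theta0).
    b = sum(1 for x in kept if x > theta0)
    p_upper = binom_upper_tail(b, n)          # P(B >= b)
    p_lower = 1 - binom_upper_tail(b + 1, n)  # P(B <= b)
    if alternative == "greater":    # H1: theta > theta0
        p = p_upper
    elif alternative == "less":     # H1: theta < theta0
        p = p_lower
    else:                           # H1: theta != theta0
        p = min(1.0, 2 * min(p_upper, p_lower))
    return b, n, p


def reject(b: int, n: int, alpha: float, alternative: str = "two-sided") -> bool:
    """Step 3: compare B with the rejection regions (symmetric form for the lower tail)."""
    if alternative == "greater":
        return b >= critical_value(alpha, n)
    if alternative == "less":
        return b <= n - critical_value(alpha, n)
    c = critical_value(alpha / 2, n)
    return b >= c or b <= n - c


# Hypothetical measurements; H0: median = 5.0 against H1: median > 5.0
data = [5.1, 4.8, 6.2, 5.9, 6.5, 5.5, 6.0, 4.9, 6.3, 5.8]
b, n, p = sign_test(data, 5.0, "greater")
print(b, n, p, reject(b, n, 0.05, "greater"))  # 8 10 0.0546875 False
```

Here 8 of the 10 signs are +, P(B ≥ 8) = 56/1024 ≈ 0.055, and b(0.05, 10, 1/2) = 9 under this convention, so H0 is not rejected at α = 0.05. The lower-tail region uses n - b(α, n, 1/2), which is the symmetric equivalent of B < b(1 - α, n, 1/2).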
2. ROC Analysis (receiver operating characteristic)
⑴ Metric Definitions
① TP (true positive) : The actual class is positive and the predicted class is positive, i.e., a correctly detected positive
② FN (false negative) : The actual class is positive but the predicted class is negative, i.e., a missed positive
③ FP (false positive) : The actual class is negative but the predicted class is positive, i.e., a false alarm
④ TN (true negative) : The actual class is negative and the predicted class is negative, i.e., a correctly rejected negative
⑤ Sensitivity (true positive rate, TPR) or Recall : TP / (TP + FN)
⑥ Specificity : TN / (TN + FP)
⑦ Accuracy : (TP + TN) / (TP + FN + FP + TN)
⑧ Error rate : 1 - Accuracy
⑨ Precision or Positive Predictive Value (PPV) : TP / (TP + FP)
⑩ Negative Predictive Value (NPV) : TN / (TN + FN)
⑪ False Positive Rate (FPR) : FP / (FP + TN) = 1 - Specificity
○ Not to be confused with the False Discovery Rate (FDR) : FP / (TP + FP) = 1 - Precision
⑫ F1 Score : 2 × precision × recall / (precision + recall)
○ The harmonic mean of precision and sensitivity (recall), combining both into a single performance indicator
○ Ranges from 0 to 1
○ High only when both precision and sensitivity are high
⑬ Kappa Statistic
○ K = (Pr(a) - Pr(e)) / (1 - Pr(e))
○ K : Kappa coefficient
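○ Pr(a) : Observed agreement, i.e., the proportion of cases where the prediction matches the actual value (for a classifier, this equals the accuracy)
○ Pr(e) : Expected agreement, i.e., the probability that the prediction and the actual value match by chance, computed from the marginal proportions of each category
○ Originally a measure of agreement between two observers rating the same items on a categorical scale; here the model and the actual values play the roles of the two observers
○ At most 1: values close to 1 indicate strong agreement between model predictions and actual values, 0 indicates agreement no better than chance, and negative values indicate agreement worse than chance
○ Reported alongside accuracy to show that the model's evaluation results are not merely the product of chance agreement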
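The formulas in ⑴ can be checked numerically with a short Python sketch. The function names confusion_counts and classification_metrics are hypothetical, the counts in the example are made up, and the sketch assumes a binary problem where every denominator is non-zero. The kappa computation uses the two-class form of Pr(e): the sum over classes of (share of actual cases in the class) × (share of predictions in the class).

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(actual: Sequence[bool],
                     predicted: Sequence[bool]) -> Tuple[int, int, int, int]:
    """Count (TP, FN, FP, TN); True stands for the positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)
    fn = sum(1 for a, p in pairs if a and not p)
    fp = sum(1 for a, p in pairs if not a and p)
    tn = sum(1 for a, p in pairs if not a and not p)
    return tp, fn, fp, tn


def classification_metrics(tp: int, fn: int, fp: int, tn: int) -> Dict[str, float]:
    """Metrics from ⑴; assumes every denominator below is non-zero."""
    n = tp + fn + fp + tn
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)
    accuracy = (tp + tn) / n
    # Pr(e): chance agreement from the marginals (actual share x predicted share per class)
    pr_e = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / n ** 2
    return {
        "sensitivity": sensitivity,
        "specificity": tn / (tn + fp),
        "accuracy": accuracy,
        "error_rate": 1 - accuracy,
        "precision": precision,
        "npv": tn / (tn + fn),
        "fpr": fp / (fp + tn),
        "fdr": fp / (tp + fp),
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
        "kappa": (accuracy - pr_e) / (1 - pr_e),  # Pr(a) = accuracy
    }


m = classification_metrics(tp=40, fn=10, fp=5, tn=45)
print({k: round(v, 4) for k, v in m.items()})
```

For these made-up counts, sensitivity is 0.8, specificity is 0.9, F1 equals 2TP / (2TP + FP + FN) = 80/95 ≈ 0.842, and Pr(e) = (50·45 + 50·55)/100² = 0.5, so κ = (0.85 - 0.5)/(1 - 0.5) = 0.7.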
⑵ Concordance Index
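① Moving the decision threshold generally shifts sensitivity and specificity in opposite directions: a threshold that labels more cases as positive raises sensitivity but lowers specificity, and vice versa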
Figure. 1. Trend of sensitivity and specificity with respect to the threshold
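② ROC curve : A graph of sensitivity (TPR) on the y-axis against 1 - specificity (= false positive rate, FPR) on the x-axis, traced out as the threshold varies
Figure. 2. ROC curve
○ The ideal case is when both sensitivity and specificity are 1, i.e., the top-left corner (0, 1) of the plot
③ Concordance Index : The area under the ROC curve (AUC)
○ Equivalently, the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as 1/2
④ For a classifier that performs no better than random guessing, the ROC curve follows the diagonal and the concordance index is 0.5
⑤ The concordance index cannot exceed 1; a perfect classifier attains exactly 1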
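The ROC curve and concordance index can be illustrated with the following Python sketch, which sweeps the threshold over the observed scores, integrates the curve with the trapezoidal rule, and compares the result with the pairwise concordance probability. The names roc_points, auc_trapezoid, and concordance are hypothetical, the scores are made up, and the sketch assumes a case is predicted positive when its score is at least the threshold.

```python
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float],
               labels: Sequence[bool]) -> List[Tuple[float, float]]:
    """(FPR, TPR) points obtained by lowering the threshold from above the max score.

    Assumption: a case is predicted positive when score >= threshold.
    """
    pos = sum(1 for y in labels if y)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    tp = fp = 0
    i = 0
    while i < len(ranked):
        s = ranked[i][0]
        # Tied scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == s:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))  # x = 1 - specificity, y = sensitivity
    return points


def auc_trapezoid(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def concordance(scores: Sequence[float], labels: Sequence[bool]) -> float:
    """Share of (positive, negative) pairs where the positive scores higher; ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical classifier scores and true labels
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [True, True, False, True, False, True, False, False]
print(auc_trapezoid(roc_points(scores, labels)), concordance(scores, labels))  # 0.8125 0.8125
```

Both routes give 13/16 = 0.8125 here. If every case receives the same score, the curve collapses to the diagonal from (0, 0) to (1, 1) and the area is 0.5, matching the random-classifier case in ④.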
Input : 2021.04.13 15:22