Chapter 20. Analysis of Variance of Regression Analysis
1. ANOVA of simple linear regression analysis
2. ANCOVA
1. ANOVA of simple linear regression analysis
⑴ problem situation
Age (year) | Number of mites |
---|---|
3 | 5 |
6 | 13 |
9 | 16 |
12 | 14 |
15 | 18 |
18 | 23 |
21 | 20 |
24 | 32 |
27 | 29 |
30 | 28 |
Table 1. data for the ANOVA of simple linear regression analysis (age and number of mites)
⑵ table of t statistic
factor | coefficient | standard error | t | significance |
---|---|---|---|---|
Intercept | 5.733 | 2.265 | 2.531 | 0.035 |
Age | 0.853 | 0.122 | 7.006 | < 0.001 |
Table 2. table of t statistic
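The entries of Table 2 can be reproduced from the data in Table 1 with the usual least-squares formulas. The following is a minimal sketch in plain Python; the function name `slope_intercept_t` is an illustrative choice, not something from the original analysis.

```python
import math

# Data from Table 1: age (years) and number of mites.
age = [3, 6, 9, 12, 15, 18, 21, 24, 27, 30]
mites = [5, 13, 16, 14, 18, 23, 20, 32, 29, 28]

def slope_intercept_t(x, y):
    """Return (estimate, standard error, t) for the intercept and the slope."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    # Residual mean square on n - 2 degrees of freedom.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ms_res = ss_res / (n - 2)
    se_slope = math.sqrt(ms_res / s_xx)
    se_intercept = math.sqrt(ms_res * (1 / n + x_bar ** 2 / s_xx))
    return {
        "Intercept": (intercept, se_intercept, intercept / se_intercept),
        "Age": (slope, se_slope, slope / se_slope),
    }

for name, (est, se, t) in slope_intercept_t(age, mites).items():
    print(f"{name:9s} {est:6.3f} {se:6.3f} {t:6.3f}")
```

Each t value is simply the coefficient divided by its standard error; the significance column additionally requires the t distribution with n - 2 = 8 degrees of freedom, which this sketch does not compute.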
⑶ table of F statistic
factor | sum of squares | df | mean square | F | significance |
---|---|---|---|---|---|
Regression | 539.648 | 1 | 539.648 | 49.086 | < 0.001 |
Residual | 87.952 | 8 | 10.994 | | |
Total | 627.600 | 9 | | | |
Table 3. table of F statistic
① null hypothesis H0 : the slope of the regression line is equal to zero
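② idea : if MS of regression is sufficiently larger than MS of residual, H0 is rejected and the slope of the regression line is judged to be non-zero
③ calculation : F = MS of regression / MS of residual = 539.648 / 10.994 = 49.086
④ the reason why the degrees of freedom of the regression is 1 : because there is only one regression variable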
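The sums of squares in Table 3 follow from splitting the total variation of the mite counts into a part explained by the fitted line and a residual part. The sketch below recomputes the table from the data in Table 1; the function name `regression_anova` and the dictionary keys are illustrative assumptions.

```python
# Data from Table 1: age (years) and number of mites.
age = [3, 6, 9, 12, 15, 18, 21, 24, 27, 30]
mites = [5, 13, 16, 14, 18, 23, 20, 32, 29, 28]

def regression_anova(x, y):
    """Return the ANOVA decomposition for a simple linear regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    fitted = [intercept + slope * xi for xi in x]

    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    ss_reg = sum((fi - y_bar) ** 2 for fi in fitted)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))

    df_reg = 1          # one regression variable
    df_res = n - 2      # n observations minus two estimated parameters
    ms_reg = ss_reg / df_reg
    ms_res = ss_res / df_res
    return {
        "ss_reg": ss_reg, "ss_res": ss_res, "ss_total": ss_total,
        "df_reg": df_reg, "df_res": df_res,
        "ms_reg": ms_reg, "ms_res": ms_res,
        "f": ms_reg / ms_res,
    }

table = regression_anova(age, mites)
print(f"Regression {table['ss_reg']:8.3f} {table['df_reg']} {table['ms_reg']:8.3f} F = {table['f']:.3f}")
print(f"Residual   {table['ss_res']:8.3f} {table['df_res']} {table['ms_res']:8.3f}")
print(f"Total      {table['ss_total']:8.3f} {table['df_reg'] + table['df_res']}")
```

The p value in the significance column would come from the F distribution with (1, 8) degrees of freedom, which is left out here to keep the sketch free of external libraries.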
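⑷ relationship between the F statistic and the t statistic : in simple linear regression, the two tests are equivalent
① with one regression variable, F = t² (7.006² ≈ 49.08, which matches F = 49.086 up to rounding), so the F test and the two-sided t test of the slope give the same p value, apart from rounding in the reported tables
② the F statistic uses more information than a single t statistic only when there are several regression variables, because it then tests all slopes jointly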
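For reference, the algebra that links the two statistics in simple linear regression can be sketched as follows. The symbols b_1 (estimated slope), S_xx and S_xy (corrected sums of squares and products) are notation assumed here; they do not appear in the tables above.

```latex
\begin{align*}
b_1 &= \frac{S_{xy}}{S_{xx}}, \qquad
\mathrm{SE}(b_1) = \sqrt{\frac{MS_{\text{residual}}}{S_{xx}}} \\
t^2 &= \frac{b_1^2}{\mathrm{SE}(b_1)^2}
     = \frac{b_1^2 \, S_{xx}}{MS_{\text{residual}}}
     = \frac{S_{xy}^2 / S_{xx}}{MS_{\text{residual}}}
     = \frac{SS_{\text{regression}}}{MS_{\text{residual}}}
     = \frac{MS_{\text{regression}}}{MS_{\text{residual}}} = F
\end{align*}
```

The last step uses the fact that the regression has 1 degree of freedom, so its sum of squares equals its mean square. With the values in Tables 2 and 3, 7.006² ≈ 49.08 and F = 49.086.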
2. ANCOVA
⑴ overview
① concept : a combination of simple linear regression analysis and one-way ANOVA
② necessity : in real problems, a second variable (a confounding factor) often varies together with the factor of interest and also affects the dependent variable, so its effect must be controlled
③ difference from two-way ANOVA : ANCOVA does not compete with a particular ANOVA technique; a covariate can be added to an ANOVA design, so ANOVA and ANCOVA can be performed at the same time
⑵ problem situation
① independent variable: whether it is a contaminated mine area or not
② dependent variable: lead concentration in organs of the rats
③ confounding factor (covariate) : age
⑶ table of results without considering age
factor | sum of squares | degree of freedom | mean square | F ratio |
---|---|---|---|---|
Treatment | SS Treatment | k-1 | MS Treatment = SS Treatment / (k-1) | F = MS Treatment / MS Error |
Error | SS Error | N-k | MS Error = SS Error / (N-k) | |
Total | SS Total | N-1 | | |
Table 4. table of results of simple one-way ANOVA
① if the age effect is not controlled, the residuals become larger
② as the residuals increase, MS Error increases and F ratio decreases
③ as F ratio decreases, the power decreases: in other words, it is difficult to prove the significance of the treatment
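The formulas in Table 4 can be sketched in a few lines of plain Python. The function name `one_way_anova` and the lead concentrations below are hypothetical; they are not the data behind the tables in this chapter.

```python
def one_way_anova(groups):
    """Return the entries of a one-way ANOVA table for a dict: group name -> values."""
    all_values = [v for values in groups.values() for v in values]
    n_total = len(all_values)   # N
    k = len(groups)             # number of treatment levels
    grand_mean = sum(all_values) / n_total

    ss_treatment = 0.0
    ss_error = 0.0
    for values in groups.values():
        group_mean = sum(values) / len(values)
        # Between-group part: distance of the group mean from the grand mean.
        ss_treatment += len(values) * (group_mean - grand_mean) ** 2
        # Within-group part: scatter of the observations around their group mean.
        ss_error += sum((v - group_mean) ** 2 for v in values)

    df_treatment = k - 1
    df_error = n_total - k
    ms_treatment = ss_treatment / df_treatment
    ms_error = ss_error / df_error
    return {
        "ss_treatment": ss_treatment, "ss_error": ss_error,
        "ss_total": ss_treatment + ss_error,
        "df_treatment": df_treatment, "df_error": df_error,
        "ms_treatment": ms_treatment, "ms_error": ms_error,
        "f": ms_treatment / ms_error,
    }

# Hypothetical lead concentrations in rats of mixed ages.
lead = {
    "mine": [52.0, 61.0, 70.0, 78.0, 85.0],
    "control": [40.0, 49.0, 57.0, 66.0, 74.0],
}
for key, value in one_way_anova(lead).items():
    print(key, round(value, 3))
```

Because age is ignored, all age-related scatter ends up in `ss_error`, which is exactly what inflates MS Error and lowers the F ratio.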
⑷ assumptions
① homoscedasticity
② independence
③ normality
④ the relationship between a covariate and the dependent variable should be linear
⑤ parallelism
○ for example, the regression lines fitted separately to the contaminated mine area and the non-contaminated area should have the same slope
○ satisfying parallelism is equivalent to having no interaction between the covariate and the independent variable
○ if parallelism is not satisfied, the difference between groups at one selected value (e.g., the overall mean of age) cannot represent the entire range of the covariate
Figure 1. example of lack of parallelism in ANCOVA
○ before ANCOVA, the interaction of age and region should be evaluated to confirm parallelism
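One way to evaluate parallelism is to compare a model in which each group has its own slope with a model in which all groups share one slope. The sketch below does this with closed-form sums; the function names and the rat data are hypothetical, and the p value (from the F distribution) is not computed.

```python
def sums(points):
    """Return (S_xx, S_xy, S_yy) of one group of (x, y) pairs about its own means."""
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    s_xx = sum((x - x_bar) ** 2 for x, _ in points)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    s_yy = sum((y - y_bar) ** 2 for _, y in points)
    return s_xx, s_xy, s_yy

def parallelism_f(groups):
    """F ratio for the covariate x factor interaction (test of equal slopes)."""
    k = len(groups)
    n_total = sum(len(points) for points in groups.values())
    parts = {name: sums(points) for name, points in groups.items()}

    # Full model: every group has its own slope and its own intercept.
    sse_full = sum(s_yy - s_xy ** 2 / s_xx for s_xx, s_xy, s_yy in parts.values())

    # Reduced model: one common slope, separate intercepts (parallel lines).
    pooled_xx = sum(p[0] for p in parts.values())
    pooled_xy = sum(p[1] for p in parts.values())
    pooled_yy = sum(p[2] for p in parts.values())
    sse_reduced = pooled_yy - pooled_xy ** 2 / pooled_xx

    df_interaction = k - 1
    df_error = n_total - 2 * k
    f = ((sse_reduced - sse_full) / df_interaction) / (sse_full / df_error)
    slopes = {name: s_xy / s_xx for name, (s_xx, s_xy, _) in parts.items()}
    return slopes, f, (df_interaction, df_error)

# Hypothetical (age, lead concentration) pairs for two sites.
rats = {
    "mine": [(2, 30.0), (4, 41.0), (6, 49.0), (8, 62.0), (10, 70.0)],
    "control": [(3, 18.0), (5, 27.0), (7, 39.0), (9, 46.0), (11, 58.0)],
}
slopes, f, df = parallelism_f(rats)
print("slopes per site:", slopes)
print("F for age x site:", f, "on", df, "degrees of freedom")
```

A small F ratio means that allowing separate slopes barely reduces the residual sum of squares, which supports the parallelism assumption.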
⑸ procedure
① 1st. confirm the correlation between age and lead concentration
Figure 2. correlation between age and lead concentration
② 2nd. confirm that the interaction of age and region is not statistically significant
factor | sum of squares | degree of freedom | mean square | F ratio | p value |
---|---|---|---|---|---|
Age | | | | | |
Site | | | | | |
Age × Site | | | | | NS |
Error | | | | | |
Total | | | | | |
Table 5. table of results including interaction term
③ 3rd. calculate the regression line of lead concentration according to age
Figure 3. regression line of lead concentration according to age
④ 4th. from the regression line obtained in the 3rd step, calculate two regression lines, one for each level of the independent variable, that satisfy the following conditions
Figure 4. calculation of two regression lines
○ starting from the regression line obtained in the 3rd step, keep the slope and change only the y-intercept
○ choose the y-intercept for each level of the independent variable so that the residual sum of squares within that level is minimized
⑤ 5th. calculate the residuals from each regression line obtained in the 4th step
Figure 5. calculation of residuals
⑥ 6th. calculate the average age of the entire group, and take the value of each regression line at that age as the standard value of that group
○ the average age is only a convenient choice : because the two lines are parallel, the difference between the standard values is the same at any age
Figure 6. calculation of standard value
⑦ 7th. for each treatment group, add the residuals obtained in the 5th step to the standard value of that group, so that each observation lies above or below the standard value by its residual
Figure 7. final result
⑧ 8th. finally, SS Error is smaller than in the one-way ANOVA, so the F ratio is larger and the p value is smaller
Figure 8. comparison of results
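The effect of the procedure on SS Error can be illustrated numerically. The sketch below is an assumption-laden version of the steps: it estimates the common slope from the pooled within-group sums, which is the least-squares solution when the slope and both intercepts are fitted together, and which coincides with the slope of the single overall line only when the groups have the same mean age. The function name `ancova` and the rat data are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)

def ancova(groups):
    """One-covariate ANCOVA for a dict: group name -> list of (age, lead) pairs."""
    points = [p for pts in groups.values() for p in pts]
    n_total, k = len(points), len(groups)
    grand_age = mean([x for x, _ in points])
    grand_lead = mean([y for _, y in points])

    # Within-group sums of squares and products, pooled over the groups.
    w_xx = w_xy = w_yy = 0.0
    for pts in groups.values():
        gx, gy = mean([x for x, _ in pts]), mean([y for _, y in pts])
        w_xx += sum((x - gx) ** 2 for x, _ in pts)
        w_xy += sum((x - gx) * (y - gy) for x, y in pts)
        w_yy += sum((y - gy) ** 2 for _, y in pts)
    slope = w_xy / w_xx  # common slope of the parallel lines

    # One intercept per group; the standard value is the height of each
    # line at the overall mean age.
    standard = {}
    for name, pts in groups.items():
        gx, gy = mean([x for x, _ in pts]), mean([y for _, y in pts])
        intercept = gy - slope * gx
        standard[name] = intercept + slope * grand_age

    sse_anova = w_yy                   # scatter around the plain group means
    sse_ancova = w_yy - slope * w_xy   # scatter around the parallel lines

    # A single line that ignores the groups gives the adjusted SS for the factor.
    t_xx = sum((x - grand_age) ** 2 for x, _ in points)
    t_xy = sum((x - grand_age) * (y - grand_lead) for x, y in points)
    t_yy = sum((y - grand_lead) ** 2 for _, y in points)
    sse_single_line = t_yy - t_xy ** 2 / t_xx
    ss_factor = sse_single_line - sse_ancova

    df_factor, df_error = k - 1, n_total - k - 1
    f_factor = (ss_factor / df_factor) / (sse_ancova / df_error)
    return {"slope": slope, "standard": standard, "sse_anova": sse_anova,
            "sse_ancova": sse_ancova, "f_factor": f_factor,
            "df": (df_factor, df_error)}

# Hypothetical (age, lead concentration) pairs for two sites.
rats = {
    "mine": [(2, 30.0), (4, 41.0), (6, 49.0), (8, 62.0), (10, 70.0)],
    "control": [(3, 18.0), (5, 27.0), (7, 39.0), (9, 46.0), (11, 58.0)],
}
result = ancova(rats)
print("SS Error without age:", result["sse_anova"])
print("SS Error with age   :", result["sse_ancova"])
print("standard values     :", result["standard"])
```

Once the age-related scatter is removed from the error term, the same difference between sites is compared against a much smaller MS Error.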
⑹ result
① correction result
Figure 9. lead concentration in contaminated mine areas
Figure 10. lead concentration in non-contaminated mine areas
② table of results before correction
factor | sum of squares | degree of freedom | mean square | F ratio | p value |
---|---|---|---|---|---|
Site | 320 | 1 | 320 | 2.74 | 0.115 |
Error | 2100.8 | 18 | 116.71 | | |
Total | 2420.800 | 19 | | | |
Table 6. table of results before correction
③ table of results after correction: the sum of squares for Age can be calculated from the regression line
factor | sum of squares | degree of freedom | mean square | F ratio | p value |
---|---|---|---|---|---|
Age | 1776.290 | 1 | 1776.290 | 93.054 | < 0.001 |
Site | 1094.335 | 1 | 1094.335 | 57.329 | < 0.001 |
Error | 324.510 | 17 | 19.089 | | |
Total | 2420.800 | 19 | | | |
Table 7. table of results after correction
④ report example : “A preliminary analysis for parallelism showed no significant difference between the slopes of the lines for lead concentration in relation to age (age × site: F(1,16) = 0.00, NS). The subsequent ANCOVA showed a significant effect of site (F(1,17) = 57.329, P < 0.001) as well as a significant effect of the covariate (age) (F(1,17) = 93.054, P < 0.001). Rats from the mine site had higher levels of lead than those from the control.”
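The mean squares and F ratios in Tables 6 and 7 follow directly from the listed sums of squares and degrees of freedom. A minimal sketch that recomputes them is shown below; the helper name `f_ratio` is an illustrative assumption, and the p values are not recomputed.

```python
def f_ratio(ss_effect, df_effect, ss_error, df_error):
    """Return (MS effect, MS error, F) from sums of squares and degrees of freedom."""
    ms_effect = ss_effect / df_effect
    ms_error = ss_error / df_error
    return ms_effect, ms_error, ms_effect / ms_error

# Table 6: before correction (age ignored).
print("Site, before:", f_ratio(320, 1, 2100.8, 18))

# Table 7: after correction (age included as a covariate).
print("Age,  after :", f_ratio(1776.290, 1, 324.510, 17))
print("Site, after :", f_ratio(1094.335, 1, 324.510, 17))
```

The error degrees of freedom drop from 18 to 17 because one more parameter (the slope for age) is estimated, while MS Error falls from about 116.7 to about 19.1.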
⑺ the reason for not comparing the y-intercept of the regression line in the contaminated mine area with that of the regression line in the other area
① situation : if parallelism is satisfied, it may seem much simpler to compare the y-intercepts of the two regression lines directly
② however, comparing y-intercepts uses a single number per group, which is similar to comparing sample groups that each contain only one observation
③ ANCOVA instead pools the error terms of both groups, which is similar to comparing sample groups with twice as many elements as a single sample group
④ therefore, ANCOVA has higher power than simply comparing y-intercepts
⑻ application 1. 2-factor ANCOVA
① applying the ANCOVA technique when analyzing 2-way ANOVA
② example: when the independent variables are gender and drug treatment, the dependent variable is blood pressure, and the confounding factor is age
⑼ application 2. if there are multiple confounding factors
① multiple linear regression analysis is used
② advanced regression analysis may also be used
Input: 2019.12.07 23:04