Chapter 19. Analysis of Variance of Regression Analysis


1. ANOVA of simple linear regression analysis

2. ANCOVA



1. ANOVA of simple linear regression analysis 

⑴ problem situation

Age (year) Number of mites
3 5
6 13
9 16
12 14
15 18
18 23
21 20
24 32
27 29
30 28
Table. 1. ANOVA of simple linear regression analysis


⑵ table of t statistic 

factor coefficient standard error t significance
Intercept 5.733 2.265 2.531 0.035
Age 0.853 0.122 7.006 0.001
Table. 2. table of t statistic 


⑶ table of F statistic

factor sum of squares df mean square F significance
Regression 539.648 1 539.648 49.086 < 0.001
Residual 87.952 8 10.994    
Total 627.600 9      
Table. 3. table of F statistic


① null hypothesis H0 : the slope of the regression line is equal to zero 

② idea: if MS of regression is much larger than MS of residual (i.e., the F ratio is large), H0 is rejected and the slope of the regression line is judged to be nonzero

③ calculation: F = MS of regression / MS of residual = 539.648 / 10.994 ≈ 49.086, with (1, 8) degrees of freedom

④ the reason why the regression has 1 degree of freedom: there is only one explanatory variable (age)
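
The entries of Table 2 and Table 3 can be reproduced from the raw data in Table 1. The following is a minimal sketch in plain Python; the variable names are illustrative and are not taken from the original.

```python
# Age (years) and number of mites, copied from Table 1.
age = [3, 6, 9, 12, 15, 18, 21, 24, 27, 30]
mites = [5, 13, 16, 14, 18, 23, 20, 32, 29, 28]

n = len(age)
mean_x = sum(age) / n
mean_y = sum(mites) / n

# Corrected sums of squares and cross-products.
sxx = sum((x - mean_x) ** 2 for x in age)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(age, mites))
ss_total = sum((y - mean_y) ** 2 for y in mites)

# Least-squares estimates of the regression line.
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Partition of the total sum of squares.
ss_regression = slope * sxy
ss_residual = ss_total - ss_regression

# Degrees of freedom: 1 for the single predictor, n - 2 for the residual.
df_regression = 1
df_residual = n - 2

ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_ratio = ms_regression / ms_residual

print(f"intercept = {intercept:.3f}, slope = {slope:.3f}")
print(f"SS regression = {ss_regression:.3f}, SS residual = {ss_residual:.3f}")
print(f"F = {f_ratio:.3f} with ({df_regression}, {df_residual}) degrees of freedom")
```

The p value is not computed here because it requires the F distribution, which is not part of the Python standard library.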

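⑷ the relationship between the F statistic and the t statistic: with a single predictor the two tests are equivalent, because F = t² (7.006² ≈ 49.086); the p value of 0.001 in Table 2 and the p value of < 0.001 in Table 3 describe the same test reported at different rounding

① the F statistic tests the regression as a whole, whereas the t statistic tests one coefficient at a time; the two carry different information only when the model has more than one predictor

② in simple linear regression the F test therefore has the same power as the two-sided t test on the slope, not greater power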
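
As a numerical check on how the two statistics relate for the data in Table 1, the sketch below computes the standard error and t statistic of the slope and compares t² with the F ratio. It uses the textbook formula SE(slope) = sqrt(MS residual / Sxx); the variable names are illustrative.

```python
import math

# Age (years) and number of mites, copied from Table 1.
age = [3, 6, 9, 12, 15, 18, 21, 24, 27, 30]
mites = [5, 13, 16, 14, 18, 23, 20, 32, 29, 28]

n = len(age)
mean_x = sum(age) / n
mean_y = sum(mites) / n
sxx = sum((x - mean_x) ** 2 for x in age)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(age, mites))
ss_total = sum((y - mean_y) ** 2 for y in mites)

slope = sxy / sxx
ss_regression = slope * sxy
ms_residual = (ss_total - ss_regression) / (n - 2)

# t statistic of the slope and F ratio of the regression.
se_slope = math.sqrt(ms_residual / sxx)
t_slope = slope / se_slope
f_ratio = ss_regression / ms_residual

print(f"SE(slope) = {se_slope:.3f}, t = {t_slope:.3f}")
print(f"t squared = {t_slope ** 2:.3f}, F = {f_ratio:.3f}")
```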



2. ANCOVA (analysis of covariance) 

⑴ overview

① concept of fusing simple linear regression analysis with one-way ANOVA

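② necessity: in real problems a second variable (the covariate) often differs between the groups being compared and also affects the dependent variable; unless this confounding effect is controlled, it can hide or distort the effect of the factor of interest

③ difference from two-way ANOVA: in two-way ANOVA both independent variables are categorical, whereas the covariate in ANCOVA is continuous. ANCOVA does not compete with any particular ANOVA design; the covariate adjustment can be combined with one-way or two-way ANOVA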
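
One common way to write the model that fuses the two techniques is sketched below. The notation is an assumption made for illustration and does not come from the original: y_ij is the response of observation j in group i, x_ij is its covariate value, τ_i is the effect of group i, and β is the slope shared by all groups.

```latex
y_{ij} = \mu + \tau_i + \beta\,(x_{ij} - \bar{x}) + \varepsilon_{ij},
\qquad \varepsilon_{ij} \sim N(0, \sigma^2)
```

With β = 0 this reduces to one-way ANOVA, and with all τ_i = 0 it reduces to simple linear regression.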

⑵ problem situation

① independent variable: whether it is a contaminated mine area or not

② dependent variable: lead concentration in organs of the rats

③ confounding variable (covariate): age

⑶ table of results without considering age


factor      sum of squares   degree of freedom   mean square                           F ratio
Treatment   SS Treatment     k-1                 MS Treatment = SS Treatment / (k-1)   F = MS Treatment / MS Error
Error       SS Error         N-k                 MS Error = SS Error / (N-k)
Total       SS Total         N-1
Table. 4. table of results of simple one-way ANOVA 


① if the age effect is not controlled, the residuals become larger

② as the residuals increase, MS Error increases and F ratio decreases

③ as the F ratio decreases, the power decreases: in other words, it becomes harder to detect a significant treatment effect
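
The chain of reasoning above (larger residuals, larger MS Error, smaller F ratio) can be illustrated with the formulas of Table 4. The sketch below uses two small hypothetical data sets that are not from the original: both have the same group means, but the second has more within-group spread.

```python
def one_way_anova(groups):
    """Return SS Treatment, SS Error and the F ratio for a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Between-group and within-group sums of squares.
    ss_treatment = sum(len(g) * (m - grand_mean) ** 2
                       for g, m in zip(groups, group_means))
    ss_error = sum((y - m) ** 2
                   for g, m in zip(groups, group_means) for y in g)

    ms_treatment = ss_treatment / (k - 1)
    ms_error = ss_error / (n_total - k)
    return ss_treatment, ss_error, ms_treatment / ms_error


# Hypothetical samples: group means are 2 and 3 in both cases.
tight = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0]]
wide = [[0.0, 2.0, 4.0], [1.0, 3.0, 5.0]]

_, ss_error_tight, f_tight = one_way_anova(tight)
_, ss_error_wide, f_wide = one_way_anova(wide)

print(f"tight: SS Error = {ss_error_tight}, F = {f_tight}")
print(f"wide:  SS Error = {ss_error_wide}, F = {f_wide}")
```

The treatment difference is identical in both data sets; only the uncontrolled spread changes the F ratio.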

⑷ assumptions

① homoscedasticity

② independence

③ normality

④ the relationship between a covariate and the dependent variable should be linear

⑤ parallelism

○ for example, when a regression line is fitted separately for the contaminated mine area and for the non-contaminated area, the two slopes should be the same

○ satisfying parallelism means the same thing as no interaction between the covariate and the factor

○ if parallelism is not satisfied, the difference between groups at one selected value (e.g., the overall mean of age) cannot represent the difference over the entire range of the covariate


drawing

Figure. 1. example of lack of parallelism in ANCOVA


○ before ANCOVA, the interaction of age and region should be evaluated to confirm parallelism
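
One way to evaluate the interaction is to compare a model with a separate slope per site against a model with one common slope. The sketch below does this with an F ratio on hypothetical data; the numbers, the site labels, and the helper `corrected_sums` are assumptions made for illustration and are not the rat data of the original.

```python
def corrected_sums(x, y):
    """Return Sxx, Sxy and Syy for one group."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxx, sxy, syy


# Hypothetical (age, lead concentration) samples for two sites.
groups = {
    "mine": ([2, 4, 6, 8], [12.0, 15.5, 18.0, 22.5]),
    "control": ([4, 6, 8, 10], [9.0, 12.5, 14.0, 18.5]),
}

sums = [corrected_sums(x, y) for x, y in groups.values()]
k = len(sums)
n_total = sum(len(x) for x, _ in groups.values())

# Full model: each site has its own slope and intercept.
sse_full = sum(syy - sxy ** 2 / sxx for sxx, sxy, syy in sums)

# Reduced model: one common slope, separate intercepts (parallel lines).
total_sxx = sum(s[0] for s in sums)
total_sxy = sum(s[1] for s in sums)
total_syy = sum(s[2] for s in sums)
common_slope = total_sxy / total_sxx
sse_reduced = total_syy - total_sxy ** 2 / total_sxx

# F ratio for the age x site interaction.
df_interaction = k - 1
df_error = n_total - 2 * k
f_interaction = ((sse_reduced - sse_full) / df_interaction) / (sse_full / df_error)

print(f"common slope = {common_slope:.3f}")
print(f"F = {f_interaction:.3f} with ({df_interaction}, {df_error}) degrees of freedom")
```

A small F ratio here means the separate slopes explain little beyond the common slope, which is the situation in which ANCOVA is appropriate.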

⑸ procedure

① 1st. confirm the correlation between age and lead concentration


drawing

Figure. 2. correlation between age and lead concentration


② 2nd. confirm that the interaction of age and region is not statistically significant 


factor sum of squares degree of freedom mean square F ratio p value
Age          
Site          
Age × Site         NS
Error          
Total          
Table. 5. table of results including interaction term


③ 3rd. calculate the regression line of lead concentration according to age 


drawing

Figure. 3. regression line of lead concentration according to age


④ 4th. calculate two regression lines satisfying the following conditions from the regression line obtained from 3rd step 


drawing

Figure. 4. calculation of two regression lines


○ keep the slope of the regression line obtained in the 3rd step and change only the y-intercept

○ for each level of the independent variable, choose the y-intercept that minimizes the residual sum of squares within that level

⑤ 5th. calculate the residuals from each regression line obtained from 4th step


drawing

Figure. 5. calculation of residuals


⑥ 6th. calculate the average age of the entire group; the value of each regression line at that age is designated as the standard value of that treatment group

○ the overall average age is only a conventional choice: because the two lines are parallel, the difference between the standard values is the same at any age


drawing

Figure. 6. calculation of standard value


⑦ 7th. for each treatment group, plot the residuals obtained in the 5th step above and below its standard value


drawing

Figure. 7. final result


⑧ 8th. finally, SS Error is smaller than in the analysis that ignores age, so the F ratio for the treatment is larger and its p value is smaller


drawing

Figure. 8. comparison of results
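
The 3rd to 8th steps can be sketched numerically as follows. The data are hypothetical and are not the rat data of the original. The sketch also assumes that the common slope is the pooled within-group slope, which is the usual ANCOVA estimate and may differ from how the line in Figure 3 was drawn.

```python
def mean(values):
    return sum(values) / len(values)


# Hypothetical ages and lead concentrations for two sites.
data = {
    "mine": {"age": [2, 4, 6, 8], "lead": [12.0, 15.5, 18.0, 22.5]},
    "control": {"age": [4, 6, 8, 10], "lead": [9.0, 12.5, 14.0, 18.5]},
}

# 3rd and 4th steps: one common slope, one intercept per site.
sxx = sxy = 0.0
for g in data.values():
    mx, my = mean(g["age"]), mean(g["lead"])
    sxx += sum((x - mx) ** 2 for x in g["age"])
    sxy += sum((x - mx) * (y - my) for x, y in zip(g["age"], g["lead"]))
common_slope = sxy / sxx
intercept = {name: mean(g["lead"]) - common_slope * mean(g["age"])
             for name, g in data.items()}

# 5th step: residuals from each site's own line.
residuals = {name: [y - (intercept[name] + common_slope * x)
                    for x, y in zip(g["age"], g["lead"])]
             for name, g in data.items()}

# 6th step: standard value of each line at the overall mean age.
overall_mean_age = mean([x for g in data.values() for x in g["age"]])
standard = {name: intercept[name] + common_slope * overall_mean_age
            for name in data}

# 7th step: age-adjusted observations scattered around the standard value.
adjusted = {name: [standard[name] + r for r in residuals[name]]
            for name in data}

# 8th step: compare SS Error with and without the age adjustment.
ss_error_unadjusted = sum((y - mean(g["lead"])) ** 2
                          for g in data.values() for y in g["lead"])
ss_error_adjusted = sum(r ** 2 for rs in residuals.values() for r in rs)

print(f"standard values: {standard}")
print(f"SS Error without age = {ss_error_unadjusted:.1f}")
print(f"SS Error with age    = {ss_error_adjusted:.1f}")
```

In this made-up example the mine rats are younger on average, so the adjustment both shrinks SS Error and widens the gap between the two sites.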


⑹ result

① correction result


drawing

Figure. 9. lead concentration in contaminated mine areas


drawing

Figure. 10. lead concentration in non-contaminated mine areas


② table of results before correction


factor sum of squares degree of freedom mean square F ratio p value
Site 320.000 1 320.000 2.742 0.115
Error 2100.800 18 116.711    
Total 2420.800 19      
Table. 6. table of results before correction


③ table of results after correction: the sum of squares for Age can be calculated from the regression line

factor sum of squares degree of freedom mean square F ratio p value
Age 1776.290 1 1776.290 93.054 < 0.001
Site 1094.335 1 1094.335 57.329 < 0.001
Error 324.510 17 19.089    
Total 2420.800 19      
Table. 7. table of results after correction


④ report example: “A preliminary analysis for parallelism showed no significant difference between the slopes of the lines for lead concentration in relation to age (age × site: F(1,16) = 0.00, NS). The subsequent ANCOVA showed a significant effect of site (F(1,17) = 57.329, P < 0.001) as well as a significant effect of the covariate (age) (F(1,17) = 93.054, P < 0.001). Rats from the mine site had higher levels of lead than those from the control.”
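
The F ratios in Table 6 and Table 7 follow directly from the sums of squares and degrees of freedom. The short check below recomputes them; the helper `f_ratio` is a hypothetical name introduced for illustration.

```python
def f_ratio(ss_effect, df_effect, ss_error, df_error):
    """F = (SS effect / df effect) / (SS error / df error)."""
    return (ss_effect / df_effect) / (ss_error / df_error)


# Table 6: one-way ANOVA that ignores age.
f_site_before = f_ratio(320.0, 1, 2100.8, 18)

# Table 7: ANCOVA with age as the covariate.
f_age_after = f_ratio(1776.290, 1, 324.510, 17)
f_site_after = f_ratio(1094.335, 1, 324.510, 17)

# The drop in SS Error between the two tables equals SS Age.
error_reduction = 2100.8 - 324.510

print(f"before: F(site) = {f_site_before:.3f}")
print(f"after:  F(age) = {f_age_after:.3f}, F(site) = {f_site_after:.3f}")
print(f"reduction in SS Error = {error_reduction:.3f}")
```

The reduction in SS Error matches the sum of squares for Age in Table 7, which is what makes the site effect detectable after the correction.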

⑺ the reason for not comparing the y-intercept of the regression line in the contaminated mine area with that of the regression line in the other area

① situation: if parallelism is satisfied, comparing the two y-intercepts directly looks like the simpler test

② comparing y-intercepts reduces each group to a single estimated value, which is similar to comparing sample groups with only one sample each

③ because ANCOVA pools the error terms of both groups, it is similar to comparing sample groups with twice as many elements as a single sample group

④ therefore, ANCOVA has higher power than simply comparing y-intercepts

application 1. 2-factor ANCOVA 

① the ANCOVA adjustment is applied to a two-way ANOVA design

② example: the independent variables are gender and drug treatment, the dependent variable is blood pressure, and the confounding factor is age

application 2. if there are multiple confounding factors  

① multiple linear regression analysis is used 

② advanced regression analysis may also be used



Input: 2019.12.07 23:04
