Chapter 19. Analysis of Variance of Regression Analysis
1. ANOVA of simple linear regression analysis
2. ANCOVA
1. ANOVA of simple linear regression analysis
⑴ problem situation
Age (year) | Number of mites |
---|---|
3 | 5 |
6 | 13 |
9 | 16 |
12 | 14 |
15 | 18 |
18 | 23 |
21 | 20 |
24 | 32 |
27 | 29 |
30 | 28 |
⑵ table of t statistics
factor | coefficient | standard error | t | significance |
---|---|---|---|---|
Intercept | 5.733 | 2.265 | 2.531 | 0.035 |
Age | 0.853 | 0.122 | 7.006 | 0.001 |
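A minimal sketch, in Python with only the standard library, of how the coefficients, standard errors, and t values in this table follow from the mite data by ordinary least squares. The function name simple_regression is a hypothetical helper, not a library API. It uses the textbook formulas SE(slope) = √(MSE / Sxx) and SE(intercept) = √(MSE · (1/n + x̄² / Sxx)), with MSE = SS Residual / (n − 2).

```python
# Mite data from the problem situation: age (years) and number of mites
ages = [3, 6, 9, 12, 15, 18, 21, 24, 27, 30]
mites = [5, 13, 16, 14, 18, 23, 20, 32, 29, 28]

def simple_regression(x, y):
    """Least-squares fit y = b0 + b1*x; returns coefficients, SEs and t values."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                      # slope
    b0 = y_bar - b1 * x_bar             # intercept
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    mse = sse / (n - 2)                 # residual mean square, df = n - 2
    se_b1 = (mse / sxx) ** 0.5
    se_b0 = (mse * (1 / n + x_bar ** 2 / sxx)) ** 0.5
    return {"b0": b0, "se_b0": se_b0, "t_b0": b0 / se_b0,
            "b1": b1, "se_b1": se_b1, "t_b1": b1 / se_b1}

for name, value in simple_regression(ages, mites).items():
    print(f"{name}: {value:.3f}")
```

Running it should reproduce the Intercept and Age rows of the table up to rounding.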
⑶ ANOVA table (F statistic)
factor | sum of squares | df | mean square | F | significance |
---|---|---|---|---|---|
Regression | 539.648 | 1 | 539.648 | 49.086 | < 0.001 |
Residual | 87.952 | 8 | 10.994 | ||
Total | 627.600 | 9 |
① null hypothesis H0 : the slope of the regression line is equal to zero
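② idea : if MS Regression is much larger than MS Residual, the line explains more variation than chance alone would, so H0 is rejected and the slope is judged not to be zero
③ calculation : F = MS Regression / MS Residual = 539.648 / 10.994 ≈ 49.086, which is compared with the F distribution with (1, 8) degrees of freedom
④ the regression has 1 degree of freedom because there is only one predictor (age); the residual has n − 2 = 8 degrees of freedom because two parameters (intercept and slope) are estimated from n = 10 observations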
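The ANOVA table above can be reproduced with a short sketch: SS Total splits into SS Regression = Sxy² / Sxx and SS Residual, and F = MS Regression / MS Residual. The helper name regression_anova is hypothetical. The p value is not computed here because it needs the F distribution; if SciPy is installed, scipy.stats.f.sf(F, 1, 8) returns the upper-tail probability.

```python
# Mite data from the problem situation
ages = [3, 6, 9, 12, 15, 18, 21, 24, 27, 30]
mites = [5, 13, 16, 14, 18, 23, 20, 32, 29, 28]

def regression_anova(x, y):
    """ANOVA table entries for simple linear regression: SS, df, MS and F."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    ss_reg = sxy ** 2 / sxx            # variation explained by the line
    ss_res = ss_total - ss_reg         # variation left around the line
    df_reg, df_res = 1, n - 2          # one predictor; two estimated parameters
    ms_reg, ms_res = ss_reg / df_reg, ss_res / df_res
    return {"ss_reg": ss_reg, "ss_res": ss_res, "ss_total": ss_total,
            "df_reg": df_reg, "df_res": df_res,
            "ms_reg": ms_reg, "ms_res": ms_res, "F": ms_reg / ms_res}

table = regression_anova(ages, mites)
print(f"Regression  SS={table['ss_reg']:.3f}  df={table['df_reg']}  MS={table['ms_reg']:.3f}  F={table['F']:.3f}")
print(f"Residual    SS={table['ss_res']:.3f}  df={table['df_res']}  MS={table['ms_res']:.3f}")
print(f"Total       SS={table['ss_total']:.3f}  df={table['df_reg'] + table['df_res']}")
```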
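⑷ comparison of the F statistic and the t statistic : the tables above seem to give a smaller p value for F (< 0.001) than for the t statistic of Age (0.001), but this is only a rounding artifact
① in simple linear regression the two tests are equivalent: the F statistic is exactly the square of the slope's t statistic (7.006² ≈ 49.086), and the square of a t variable with n − 2 degrees of freedom follows an F distribution with (1, n − 2) degrees of freedom
② therefore the F test of the regression and the two-sided t test of the slope give identical p values and have the same power; for t = 7.006 with 8 degrees of freedom the exact p value is below 0.001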
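A quick numerical check of the relation between the two statistics for the mite data. Algebraically, t² = b1² · Sxx / MSE = (Sxy² / Sxx) / MSE = SS Regression / MSE = F, so the two printed values agree up to floating-point error. With SciPy, 2 * scipy.stats.t.sf(abs(t), 8) and scipy.stats.f.sf(F, 1, 8) should return the same p value.

```python
# Mite data from the problem situation above
ages = [3, 6, 9, 12, 15, 18, 21, 24, 27, 30]
mites = [5, 13, 16, 14, 18, 23, 20, 32, 29, 28]

n = len(ages)
x_bar, y_bar = sum(ages) / n, sum(mites) / n
sxx = sum((x - x_bar) ** 2 for x in ages)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(ages, mites))
syy = sum((y - y_bar) ** 2 for y in mites)

slope = sxy / sxx
ss_reg = sxy ** 2 / sxx
mse = (syy - ss_reg) / (n - 2)          # MS Residual

t_slope = slope / (mse / sxx) ** 0.5     # t statistic for the slope
f_ratio = ss_reg / mse                   # F statistic (df_reg = 1)

print(f"t = {t_slope:.3f}, t^2 = {t_slope ** 2:.3f}, F = {f_ratio:.3f}")
```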
2. ANCOVA (analysis of covariance)
⑴ overview
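① concept : a combination of simple linear regression analysis and one-way ANOVA
② necessity : in real problem situations, a variable other than the factor of interest (for example, age) may also vary between groups and affect the dependent variable; if it is ignored, its effect is confounded with the factor or left in the error term
③ difference from two-way ANOVA : in ANCOVA the additional variable is a continuous covariate rather than a second categorical factor. ANCOVA does not compete with any particular ANOVA design: a covariate can be added to a one-way, two-way, or other ANOVA, so ANOVA and ANCOVA can be performed at the same time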
⑵ problem situation
① independent variable: site (contaminated mine area vs. uncontaminated control area)
② dependent variable: lead concentration in the rats' organs
③ covariate (potential confounder): age of the rats
⑶ table of results without considering age
factor | sum of squares | degree of freedom | mean square | F ratio |
---|---|---|---|---|
Treatment | SS Treatment | k-1 | MS Treatment = SS Treatment / (k-1) | F = MS Treatment / MS Error |
Error | SS Error | N-k | MS Error = SS Error / (N-k) | |
Total | SS Total | N-1 |
① if the age effect is not controlled, the variation due to age remains in the residuals, so the residuals become larger
② as the residuals increase, MS Error increases and the F ratio decreases
③ as the F ratio decreases, the power decreases: in other words, it becomes harder to show that the treatment effect is significant
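Points ① to ③ can be seen numerically with a minimal one-way ANOVA sketch. The two data sets are hypothetical (not the rat data): they have the same group means, but the second has wider scatter, as would happen if unexplained age variation were left in the residuals. The helper name one_way_anova is also hypothetical.

```python
def one_way_anova(groups):
    """One-way ANOVA table entries for a list of groups of observations."""
    all_obs = [v for g in groups for v in g]
    n_total, k = len(all_obs), len(groups)
    grand_mean = sum(all_obs) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_treatment = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_error = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    ms_treatment = ss_treatment / (k - 1)
    ms_error = ss_error / (n_total - k)
    return {"ss_treatment": ss_treatment, "ss_error": ss_error,
            "ms_treatment": ms_treatment, "ms_error": ms_error,
            "F": ms_treatment / ms_error}

# Hypothetical data: same group means (12 and 18), different scatter
tight = one_way_anova([[10, 12, 14], [16, 18, 20]])
wide = one_way_anova([[6, 12, 18], [12, 18, 24]])
print("tight:", tight["ms_error"], tight["F"])
print("wide: ", wide["ms_error"], wide["F"])
```

Both data sets have SS Treatment = 54, but MS Error rises from 4 to 36 and F falls from 13.5 to 1.5.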
⑷ assumptions
① homoscedasticity
② independence
③ normality
④ the relationship between a covariate and the dependent variable should be linear
⑤ parallelism
○ for example, if separate regression lines of lead concentration on age are fitted for the contaminated mine area and the uncontaminated area, their slopes should be the same
○ satisfying parallelism means the same thing as having no age × site interaction
○ if parallelism is not satisfied, the difference between the sites depends on age, so a comparison at one selected value (e.g., the overall mean age) cannot represent the entire range of the covariate
○ before ANCOVA, the age × site interaction should be tested to confirm parallelism
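One common way to test parallelism is to compare the residual sum of squares of a model with a separate slope for each site against a model with one common slope; the F ratio with (k − 1, N − 2k) degrees of freedom tests the age × site interaction (with 20 rats and two sites this gives 1 and 16 degrees of freedom). A sketch under that assumption, with hypothetical data because the rat measurements are not listed here; the function names are hypothetical helpers.

```python
def centered_sums(x, y):
    """Return Sxx, Sxy, Syy (sums of squares/products about the means)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    syy = sum((b - my) ** 2 for b in y)
    return sxx, sxy, syy

def parallelism_test(groups):
    """F test of the covariate x group interaction (equal slopes).

    groups: list of (covariate values, response values) pairs, one per group.
    Returns (F, numerator df, denominator df).
    """
    k = len(groups)
    n_total = sum(len(x) for x, _ in groups)
    sums = [centered_sums(x, y) for x, y in groups]
    # Separate slope in every group (model WITH interaction)
    sse_separate = sum(syy - sxy ** 2 / sxx for sxx, sxy, syy in sums)
    # One common slope, separate intercepts (model WITHOUT interaction)
    sxx_w = sum(s[0] for s in sums)
    sxy_w = sum(s[1] for s in sums)
    syy_w = sum(s[2] for s in sums)
    sse_common = syy_w - sxy_w ** 2 / sxx_w
    df_num, df_den = k - 1, n_total - 2 * k
    f = ((sse_common - sse_separate) / df_num) / (sse_separate / df_den)
    return f, df_num, df_den

# Hypothetical data: (ages, lead concentrations) for two sites
control = ([1, 2, 3, 4], [2, 4, 5, 7])
mine = ([2, 3, 4, 5], [8, 10, 11, 13])
print(parallelism_test([control, mine]))
```

Here both sites have the same within-site slope, so F is essentially 0, like the F1,16 = 0.00 in the report example; a nonsignificant F supports parallelism.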
⑸ procedure
① 1st. confirm that age is correlated with lead concentration
② 2nd. confirm that the age × site interaction is not statistically significant
factor | sum of squares | degree of freedom | mean square | F ratio | p value |
---|---|---|---|---|---|
Age | |||||
Site | |||||
Age × Site | NS | ||||
Error | |||||
Total |
③ 3rd. fit the regression line of lead concentration on age
④ 4th. fit two regression lines, one per site, that satisfy the following conditions
○ the two lines are parallel: they share a common slope and differ only in their y intercepts (the common slope is the pooled within-site slope, which in general differs from the slope of the single line from the 3rd step)
○ the common slope and the two intercepts minimize the residual sum of squares summed over both sites
⑤ 5th. calculate the residual of each observation from the regression line of its own site
⑥ 6th. compute the mean age of the entire group; the value of each site's regression line at that age is designated as the reference value (the adjusted mean) for that site
○ the overall mean age is just an example: because the lines are parallel, the difference between the sites is the same at any age
⑦ 7th. plot the residuals obtained from the 5th step above and below the reference value of each treatment group
⑧ 8th. finally, the adjusted points scatter much less than the raw data, so SS Error is smaller, the F ratio for site is larger, and the p value is smaller
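The procedure can be sketched for one covariate as below. This assumes the standard one-way ANCOVA formulas: the common slope is the pooled within-group slope, each adjusted mean is the value of its group's line at the overall mean of the covariate, SS Error is the residual sum of squares around the parallel lines (df = N − k − 1), and the group and covariate sums of squares are each adjusted for the other. The 7th step (plotting) is not drawn; an adjusted point is simply adjusted mean + residual. The data and function names are hypothetical.

```python
def centered_sums(x, y):
    """Return Sxx, Sxy, Syy (sums of squares/products about the means)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    syy = sum((b - my) ** 2 for b in y)
    return sxx, sxy, syy

def ancova(groups):
    """One-way ANCOVA with a single covariate.

    groups: list of (covariate values, response values) pairs, one per group.
    """
    k = len(groups)
    all_x = [a for x, _ in groups for a in x]
    all_y = [b for _, y in groups for b in y]
    n_total = len(all_x)
    grand_x = sum(all_x) / n_total

    # 4th step: parallel lines with a common (pooled within-group) slope
    within = [centered_sums(x, y) for x, y in groups]
    sxx_w = sum(s[0] for s in within)
    sxy_w = sum(s[1] for s in within)
    syy_w = sum(s[2] for s in within)   # SS Error of the plain one-way ANOVA
    slope = sxy_w / sxx_w
    intercepts = [sum(y) / len(y) - slope * sum(x) / len(x) for x, y in groups]

    # 5th step: residuals of each observation from its own group's line
    residuals = [[b - (b0 + slope * a) for a, b in zip(x, y)]
                 for (x, y), b0 in zip(groups, intercepts)]

    # 6th step: value of each line at the overall mean covariate (adjusted mean)
    adjusted_means = [b0 + slope * grand_x for b0 in intercepts]

    # 8th step: SS Error around the parallel lines; one extra df is spent on the slope
    ss_error = syy_w - sxy_w ** 2 / sxx_w
    df_error = n_total - k - 1
    ms_error = ss_error / df_error

    # Group effect adjusted for the covariate: compare with one line through all data
    sxx_t, sxy_t, syy_t = centered_sums(all_x, all_y)
    ss_group = (syy_t - sxy_t ** 2 / sxx_t) - ss_error
    # Covariate effect adjusted for group: reduction from the one-way SS Error
    ss_covariate = syy_w - ss_error

    return {
        "slope": slope, "intercepts": intercepts, "residuals": residuals,
        "adjusted_means": adjusted_means,
        "ss_covariate": ss_covariate, "ss_group": ss_group,
        "ss_error": ss_error, "df_error": df_error,
        "F_covariate": ss_covariate / ms_error,
        "F_group": (ss_group / (k - 1)) / ms_error,
    }

# Hypothetical data: (ages, lead concentrations) for two sites
control = ([1, 2, 3, 4], [2, 4, 5, 7])
mine = ([2, 3, 4, 5], [8, 10, 11, 13])
result = ancova([control, mine])
print("common slope:", result["slope"])
print("adjusted means:", result["adjusted_means"])
print("F_group:", result["F_group"], "F_covariate:", result["F_covariate"])
```

For these numbers a plain one-way ANOVA gives F = 72 / (26 / 6) ≈ 16.6, while the covariate-adjusted F for the group is about 403, which mirrors the 8th step.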
⑹ result
① correction result
② table of results before correction
factor | sum of squares | degree of freedom | mean square | F ratio | p value |
---|---|---|---|---|---|
Site | 320 | 1 | 320 | 2.74 | 0.115 |
Error | 2100.8 | 18 | 116.71 | ||
Total | 2420.800 | 19 |
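③ table of results after correction: the sum of squares for Age is the reduction in SS Error achieved by adding age to the model (2100.8 − 324.510 = 1776.290); because Age and Site are each adjusted for the other, their sums of squares do not add up to the total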
factor | sum of squares | degree of freedom | mean square | F ratio | p value |
---|---|---|---|---|---|
Age | 1776.290 | 1 | 1776.290 | 93.054 | < 0.001 |
Site | 1094.335 | 1 | 1094.335 | 57.329 | < 0.001 |
Error | 324.510 | 17 | 19.089 | ||
Total | 2420.800 | 19 |
④ report example : “A preliminary analysis for parallelism showed no significant difference between the slopes of the lines for lead concentration in relation to age (age × site: F1,16 = 0.00, NS). The subsequent ANCOVA showed a significant effect of site (F1,17 = 57.329, P < 0.001) as well as a significant effect of the covariate (age) (F1,17 = 93.054, P < 0.001). Rats from the mine site had higher levels of lead than those from the control site.”
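As a check on the two result tables, the F ratios can be recomputed from the sums of squares and degrees of freedom alone: each F is the effect's mean square divided by MS Error. Including age shrinks MS Error from about 116.71 to about 19.09, which is what raises the F ratio for site. The helper name f_ratios is hypothetical.

```python
def f_ratios(effects, ss_error, df_error):
    """F ratio of each effect: (SS_effect / df_effect) / MS_error."""
    ms_error = ss_error / df_error
    return {name: (ss / df) / ms_error for name, (ss, df) in effects.items()}

# Sums of squares and df copied from the tables before and after correction
before = f_ratios({"Site": (320, 1)}, ss_error=2100.8, df_error=18)
after = f_ratios({"Age": (1776.290, 1), "Site": (1094.335, 1)},
                 ss_error=324.510, df_error=17)
print({k: round(v, 3) for k, v in before.items()})
print({k: round(v, 3) for k, v in after.items()})
```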
⑺ why the y-intercepts of the regression lines for the contaminated mine area and the control area are not compared directly
① situation : if parallelism is satisfied, it seems much simpler to compare the two y-intercepts
② but each y-intercept is a single estimated value, so comparing them is like comparing two groups that each contain only one sample
③ ANCOVA instead pools the residuals of every observation from both sites into one error term, so the comparison is based on all of the data (twice the size of one site's sample group) rather than on one value per site
④ therefore, performing ANCOVA has higher power than simply comparing the y-intercepts
⑻ application 1. 2-factor ANCOVA
① adding a covariate to a two-way ANOVA
② example: the independent variables are gender and drug treatment, the dependent variable is blood pressure, and the covariate (confounding factor) is age
⑼ application 2. if there are multiple confounding factors
① multiple linear regression analysis is used
② advanced regression analysis may also be used
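With several covariates, ANCOVA can be written as a comparison of two multiple regression models: a reduced model with an intercept and the covariates only, and a full model that also contains 0/1 dummy columns for the groups; F = ((SSE_reduced − SSE_full) / (k − 1)) / (SSE_full / df_error). The sketch below assumes this extra-sum-of-squares approach and solves the normal equations with a small hand-written Gauss-Jordan routine; all names and data are hypothetical, and in practice a numerical library would be used instead.

```python
def solve(a, b):
    """Solve the square system a * beta = b (Gauss-Jordan, partial pivoting)."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def residual_ss(design, y):
    """Least-squares fit of y on the design columns; returns the residual SS."""
    p = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in design]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))

def ancova_extra_ss(covariates, groups, y):
    """F test for the group effect adjusted for one or more covariates."""
    levels = sorted(set(groups))
    # Dummy (0/1) coding: the first level is the reference category
    dummies = [[1.0 if g == lv else 0.0 for lv in levels[1:]] for g in groups]
    reduced = [[1.0] + list(c) for c in covariates]                     # covariates only
    full = [[1.0] + list(c) + d for c, d in zip(covariates, dummies)]  # covariates + group
    sse_reduced = residual_ss(reduced, y)
    sse_full = residual_ss(full, y)
    df_group = len(levels) - 1
    df_error = len(y) - len(full[0])
    f = ((sse_reduced - sse_full) / df_group) / (sse_full / df_error)
    return f, df_group, df_error

# Hypothetical data: one covariate (age) per row; more covariates = more columns per row
covariates = [[1], [2], [3], [4], [2], [3], [4], [5]]
sites = ["control"] * 4 + ["mine"] * 4
lead = [2, 4, 5, 7, 8, 10, 11, 13]
print(ancova_extra_ss(covariates, sites, lead))
```

With one covariate these numbers give F ≈ 403.3 with (1, 5) degrees of freedom. The same model comparison extends to application 1 by adding the second factor's dummy columns to both the reduced and the full model.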
Input: 2019.12.07 23:04