Chapter 20. Analysis of Variance of Regression Analysis

Higher category: 【Statistics】 Statistics Overview

1. ANOVA of simple linear regression analysis

⑴ problem situation

Age (year)	Number of mites
3	5
6	13
9	16
12	14
15	18
18	23
21	20
24	32
27	29
30	28

Table 1. ANOVA of simple linear regression analysis

⑵ table of t statistic

factor	coefficient	standard error	t	significance
Intercept	5.733	2.265	2.531	0.035
Age	0.853	0.122	7.006	< 0.001

Table 2. table of t statistic

⑶ table of F statistic

factor	sum of squares	df	mean square	F	significance
Regression	539.648	1	539.648	49.086	< 0.001
Residual	87.952	8	10.994
Total	627.600	9

Table 3. table of F statistic

① null hypothesis H0 : the slope of the regression line is equal to zero

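② idea: if MS Regression is much larger than MS Residual, the regression line explains more variation than would be expected by chance, so the slope of the regression line is not zero

③ calculation: F = MS Regression / MS Residual = 539.648 / 10.994 = 49.086 with (1, 8) degrees of freedom, so H0 is rejected (p < 0.001)

④ the regression has 1 degree of freedom because there is only one predictor (age); the residual has n - 2 = 10 - 2 = 8 degrees of freedom because two parameters (intercept and slope) are estimated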
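The ANOVA table above can be reproduced from the raw data in Table 1. The following is a minimal Python sketch (standard library only; the function name `regression_anova` is hypothetical) that fits the least-squares line and splits the total sum of squares into a regression part and a residual part:

```python
# Mite data from Table 1: age (years) and number of mites
ages = [3, 6, 9, 12, 15, 18, 21, 24, 27, 30]
mites = [5, 13, 16, 14, 18, 23, 20, 32, 29, 28]


def regression_anova(x, y):
    """Least-squares line and its ANOVA table (regression vs. residual)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    syy = sum((yi - y_bar) ** 2 for yi in y)

    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    ss_total = syy
    ss_regression = sxy ** 2 / sxx          # variation explained by the line
    ss_residual = ss_total - ss_regression  # variation left around the line

    df_regression, df_residual = 1, n - 2   # one predictor; two estimated parameters
    ms_regression = ss_regression / df_regression
    ms_residual = ss_residual / df_residual
    return {
        "slope": slope,
        "intercept": intercept,
        "ss_regression": ss_regression,
        "ss_residual": ss_residual,
        "ss_total": ss_total,
        "ms_residual": ms_residual,
        "F": ms_regression / ms_residual,
    }


table = regression_anova(ages, mites)
for key, value in table.items():
    print(f"{key:>14}: {value:.3f}")
```

The printed values should match Tables 2 and 3 up to rounding.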

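⑷ comparison of the t statistic and the F statistic: in simple linear regression, the two tests of the slope are equivalent

① F = t²: 7.006² ≈ 49.08, which agrees with F = 49.086 up to rounding

② both statistics are built from the same residual mean square with 8 degrees of freedom, so they give the same p value and neither test has more power than the other; an F test differs from the t tests only when it tests several coefficients jointly, as in multiple regression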
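In simple linear regression, the F ratio for the regression equals the square of the slope's t statistic. This can be checked numerically with a short Python sketch (standard library only, Table 1 data; variable names are illustrative). It recomputes the standard errors and t statistics of Table 2 and the F ratio of Table 3. No p value routine is used because the standard library has none, but since F = t² with the same residual degrees of freedom, the two-sided t test and the F test give the same p value.

```python
import math

# Mite data from Table 1
ages = [3, 6, 9, 12, 15, 18, 21, 24, 27, 30]
mites = [5, 13, 16, 14, 18, 23, 20, 32, 29, 28]

n = len(ages)
x_bar, y_bar = sum(ages) / n, sum(mites) / n
sxx = sum((x - x_bar) ** 2 for x in ages)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(ages, mites))
syy = sum((y - y_bar) ** 2 for y in mites)

slope = sxy / sxx
intercept = y_bar - slope * x_bar
ms_residual = (syy - sxy ** 2 / sxx) / (n - 2)

# Standard errors and t statistics of Table 2
se_slope = math.sqrt(ms_residual / sxx)
se_intercept = math.sqrt(ms_residual * (1 / n + x_bar ** 2 / sxx))
t_slope = slope / se_slope
t_intercept = intercept / se_intercept

# F ratio of Table 3: MS Regression / MS Residual (regression df = 1)
f_ratio = (sxy ** 2 / sxx) / ms_residual

print(f"t(slope) = {t_slope:.3f}, t(slope)^2 = {t_slope ** 2:.3f}, F = {f_ratio:.3f}")
print(f"t(intercept) = {t_intercept:.3f}")
```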

2. ANCOVA

⑴ overview

① a technique that combines simple linear regression analysis with one-way ANOVA

② necessity: in actual problem situations, a second variable (the covariate) can also affect the dependent variable, confounding the effect of the factor of interest

③ difference from two-way ANOVA: the second variable is a continuous covariate rather than a categorical factor; ANCOVA does not replace any particular ANOVA design, since a covariate can be added to an ANOVA model, so ANOVA and ANCOVA can be performed at the same time

⑵ problem situation

① independent variable: site (contaminated mine area or uncontaminated area)

② dependent variable: lead concentration in the organs of rats

③ confounding variable (covariate): age

⑶ table of results without considering age

factor	sum of squares	degree of freedom	mean square	F ratio
Treatment	SS Treatment	k-1	MS Treatment = SS Treatment / (k-1)	F = MS Treatment / MS Error
Error	SS Error	N-k	MS Error = SS Error / (N-k)
Total	SS Total	N-1

Table 4. table of results of simple one-way ANOVA

① if the age effect is not controlled, the variation due to age remains in the residuals, so the residuals become larger

② as the residuals increase, MS Error increases and the F ratio decreases

③ as the F ratio decreases, the power decreases: in other words, it becomes harder to show that the treatment effect is significant
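Table 4 can be turned into a small function. This Python sketch (the helper `one_way_anova` is hypothetical, and the toy data are not the rat measurements, which this chapter does not list) computes SS Treatment, SS Error and the F ratio, then applies the same F formula to the sums of squares reported for the rat data in Table 6:

```python
def one_way_anova(groups):
    """SS Treatment, SS Error and F ratio, following the layout of Table 4."""
    values = [v for g in groups for v in g]
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    group_means = [sum(g) / len(g) for g in groups]

    # Spread of the group means around the grand mean
    ss_treatment = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Spread of the observations around their own group mean
    ss_error = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)

    ms_treatment = ss_treatment / (k - 1)
    ms_error = ss_error / (n - k)
    return ss_treatment, ss_error, ms_treatment / ms_error


# Hypothetical toy data with two treatment groups
ss_t, ss_e, f = one_way_anova([[1, 2, 3], [4, 5, 6]])
print(ss_t, ss_e, f)  # 13.5 4.0 13.5

# The same F formula applied to the sums of squares reported in Table 6
f_table6 = (320 / 1) / (2100.8 / 18)
print(round(f_table6, 2))  # 2.74
```

Any variation that a covariate could have explained stays in SS Error, which is what keeps MS Error large and F small in the unadjusted analysis.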

⑷ assumptions

① homoscedasticity

② independence

③ normality

④ the relationship between a covariate and the dependent variable should be linear

⑤ parallelism

○ for example, if a regression line is fitted separately for the contaminated mine area and for the uncontaminated area, the two slopes should be the same

○ satisfying parallelism means that there is no interaction between the covariate and the factor (age × site)

○ if parallelism is not satisfied, a difference compared at one selected value (e.g., the overall mean age) cannot represent the entire range of the covariate

Figure 1. example of lack of parallelism in ANCOVA

○ before ANCOVA, the interaction of age and region should be evaluated to confirm parallelism
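The parallelism check amounts to an F test of the covariate × factor interaction: compare the residual sum of squares of separate lines per group (unequal slopes) with that of parallel lines sharing one pooled slope. Below is a minimal Python sketch, assuming the usual extra-sum-of-squares F test; the helper names and data are hypothetical, since the chapter does not list the rat measurements.

```python
def centered_sums(x, y):
    """Sxx, Sxy and Syy around the means of x and y."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((a - x_bar) ** 2 for a in x)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxx, sxy, syy


def parallelism_f(groups):
    """F test of the covariate x factor interaction (are the slopes equal?).

    groups: list of (x, y) pairs, one pair per level of the factor.
    """
    k = len(groups)
    n = sum(len(x) for x, _ in groups)
    within = [centered_sums(x, y) for x, y in groups]

    # Full model: a separate slope for every group
    sse_separate = sum(syy - sxy ** 2 / sxx for sxx, sxy, syy in within)
    # Reduced model: parallel lines sharing the pooled within-group slope
    sxx_w, sxy_w, syy_w = (sum(col) for col in zip(*within))
    sse_parallel = syy_w - sxy_w ** 2 / sxx_w

    df1, df2 = k - 1, n - 2 * k
    f_ratio = ((sse_parallel - sse_separate) / df1) / (sse_separate / df2)
    return f_ratio, df1, df2


# Hypothetical (age, lead) data for three illustrative sites
site_a = ([1, 2, 3, 4], [1, 3, 2, 4])
site_b = ([1, 2, 3, 4], [11, 13, 12, 14])  # same slope as site_a, shifted up
site_c = ([1, 2, 3, 4], [2, 6, 4, 8])      # twice as steep as site_a

print(parallelism_f([site_a, site_b]))  # F is 0 (up to rounding): identical slopes
print(parallelism_f([site_a, site_c]))  # F > 0: the slopes differ
```

With two groups and 20 observations in total, the degrees of freedom of this test are (1, 16), which matches the F_1,16 in the report example of the results.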

⑸ procedure

① 1st. confirm the correlation between age and lead concentration

Figure 2. correlation between age and lead concentration

② 2nd. confirm that the interaction of age and region is not statistically significant

factor	sum of squares	degree of freedom	mean square	F ratio	p value
Age		1
Site		1
Age × Site		1		0.00	NS
Error		16
Total		19

Table 5. table of results including interaction term

③ 3rd. calculate the regression line of lead concentration according to age

Figure 3. regression line of lead concentration according to age

④ 4th. from the regression line obtained in the 3rd step, calculate two regression lines (one per site) that satisfy the following conditions

Figure 4. calculation of two regression lines

○ keep the slope of the regression line obtained in the 3rd step and change only the y-intercept

○ choose each y-intercept to minimize the sum of squared residuals within each level of the independent variable
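Steps 3 to 5 can be sketched in Python as follows. One assumption to note: in the standard least-squares ANCOVA, the common slope of the parallel lines is the pooled within-group slope (Sxy summed over groups divided by Sxx summed over groups), which in general differs from the slope of the single regression line fitted to all data; this sketch uses the pooled slope. The helper name `fit_parallel_lines` and the data are hypothetical.

```python
def fit_parallel_lines(groups):
    """Parallel least-squares lines: one common slope, one intercept per group.

    groups: dict mapping a group label to (ages, values).
    """
    sxx_w = sxy_w = 0.0
    means = {}
    for label, (x, y) in groups.items():
        x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)
        means[label] = (x_bar, y_bar)
        sxx_w += sum((a - x_bar) ** 2 for a in x)
        sxy_w += sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))

    # Assumption: pooled within-group slope, as in standard ANCOVA
    slope = sxy_w / sxx_w
    # With the slope fixed, the least-squares intercept puts each line
    # through its own group's (mean age, mean value)
    intercepts = {g: y_bar - slope * x_bar for g, (x_bar, y_bar) in means.items()}
    # Step 5: residuals of each observation from its own group's line
    residuals = {
        g: [b - (intercepts[g] + slope * a) for a, b in zip(x, y)]
        for g, (x, y) in groups.items()
    }
    return slope, intercepts, residuals


# Hypothetical ages and lead concentrations, for illustration only
data = {
    "mine": ([1, 2, 3, 4], [11, 13, 12, 14]),
    "control": ([2, 3, 4, 5], [3, 5, 4, 6]),
}
slope, intercepts, residuals = fit_parallel_lines(data)
print(slope, intercepts)
print(residuals)
```

Because each line passes through its group's mean point, the residuals within each group sum to zero.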

⑤ 5th. calculate the residuals from each regression line obtained in the 4th step

Figure 5. calculation of residuals

⑥ 6th. compute the mean age of the entire sample and evaluate each group's regression line at that age; this fitted value is designated as the standard value of that group

○ the overall mean age is only one possible choice, and any age would do: because the lines are parallel, the difference between them is the same at every age

Figure 6. calculation of standard value
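Step 6 only evaluates each line at one common age. Because the lines share a slope, the gap between them does not depend on the age chosen, as this small Python sketch shows (the helper `reference_values`, the slope and the intercepts are all hypothetical):

```python
def reference_values(slope, intercepts, age):
    """Fitted value of each group's line at one common age (step 6)."""
    return {g: a + slope * age for g, a in intercepts.items()}


# Hypothetical parallel lines sharing the slope 0.8
slope = 0.8
intercepts = {"mine": 10.5, "control": 1.7}

for age in (0.0, 2.5, 10.0):
    ref = reference_values(slope, intercepts, age)
    # The gap between parallel lines does not depend on the chosen age
    print(age, round(ref["mine"] - ref["control"], 9))
```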

⑦ 7th. plot the residuals obtained in the 5th step above and below the standard value of each treatment group

Figure 7. final result

⑧ 8th. finally, SS Error is much smaller than before the correction (324.510 vs. 2100.8), so the F ratio for site is larger and its p value is smaller

Figure 8. comparison of results
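The drop in SS Error can be illustrated with hypothetical data in which the contaminated group is younger, so that age masks the site effect. This Python sketch (helper names are illustrative) computes SS Error and the F ratio for site with and without age as a covariate; the ANCOVA site term is taken as the reduction in residual SS when site is added to a model that already contains age.

```python
def centered_sums(x, y):
    """Sxx, Sxy and Syy around the means of x and y."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((a - x_bar) ** 2 for a in x)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxx, sxy, syy


def site_effect(groups):
    """SS Error and F ratio for the factor, without and with the covariate."""
    k = len(groups)
    all_x = [a for x, _ in groups for a in x]
    all_y = [b for _, y in groups for b in y]
    n = len(all_y)

    # Within-group sums (pooled over groups) and total sums
    within = [centered_sums(x, y) for x, y in groups]
    sxx_w, sxy_w, syy_w = (sum(col) for col in zip(*within))
    sxx_t, sxy_t, syy_t = centered_sums(all_x, all_y)

    # One-way ANOVA: error = variation of y within groups
    sse_anova = syy_w
    f_anova = ((syy_t - syy_w) / (k - 1)) / (sse_anova / (n - k))

    # ANCOVA: error = within-group variation not explained by the pooled slope
    sse_ancova = syy_w - sxy_w ** 2 / sxx_w
    sse_age_only = syy_t - sxy_t ** 2 / sxx_t  # residual SS of y on age alone
    ss_site = sse_age_only - sse_ancova        # site effect adjusted for age
    f_ancova = (ss_site / (k - 1)) / (sse_ancova / (n - k - 1))
    return sse_anova, sse_ancova, f_anova, f_ancova


# Hypothetical data: (ages, lead concentrations) for each site
mine = ([1, 2, 3, 4], [7, 10, 10, 13])
control = ([4, 5, 6, 7], [9, 10, 12, 13])
sse_anova, sse_ancova, f_anova, f_ancova = site_effect([mine, control])
print(f"SS Error: {sse_anova:.1f} -> {sse_ancova:.1f}")
print(f"F(site):  {f_anova:.2f} -> {f_ancova:.2f}")
```

For these toy numbers, SS Error falls from 28.0 to 2.4 and the F ratio for site rises from 0.43 to 21.49. The error degrees of freedom are n - k - 1 because one slope is estimated in addition to the k group intercepts, which gives 17 for the 20 rats in Table 7.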

⑹ result

① correction result

Figure 9. lead concentration in contaminated mine areas

Figure 10. lead concentration in non-contaminated mine areas

② table of results before correction

factor	sum of squares	degree of freedom	mean square	F ratio	p value
Site	320	1	320	2.74	0.115
Error	2100.8	18	116.71
Total	2420.800	19

Table 6. table of results before correction

③ table of results after correction: the sum of squares for Age is calculated from the regression line; because each sum of squares is adjusted for the other term, the rows do not add up to the total

factor	sum of squares	degree of freedom	mean square	F ratio	p value
Age	1776.290	1	1776.290	93.054	< 0.001
Site	1094.335	1	1094.335	57.329	< 0.001
Error	324.510	17	19.089
Total	2420.800	19

Table 7. table of results after correction

④ report example: “A preliminary analysis for parallelism showed no significant difference between the slopes of the lines for lead concentration in relation to age (age × site: F_1,16 = 0.00, NS). The subsequent ANCOVA showed a significant effect of site (F_1,17 = 57.329, P < 0.001) as well as a significant effect of the covariate (age) (F_1,17 = 93.054, P < 0.001). Rats from the mine site had higher levels of lead than those from the control site.”
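As a consistency check, the F ratios in Tables 6 and 7 can be recomputed from the reported sums of squares and degrees of freedom. The Python sketch below also shows that the sum of squares for Age in Table 7 equals the reduction in SS Error obtained by adding age to the model, which is why the rows of Table 7 (each adjusted for the other term) do not add up to the total:

```python
# Sums of squares and degrees of freedom as reported in Tables 6 and 7
ss_site_before, ss_error_before, df_error_before = 320.0, 2100.8, 18
ss_age, ss_site_after = 1776.290, 1094.335
ss_error_after, df_error_after = 324.510, 17

ms_error_before = ss_error_before / df_error_before
ms_error_after = ss_error_after / df_error_after

# Every effect has 1 degree of freedom, so its MS equals its SS
f_site_before = ss_site_before / ms_error_before
f_age = ss_age / ms_error_after
f_site_after = ss_site_after / ms_error_after

# Adding age to the model removes this much variation from SS Error
drop_in_error = ss_error_before - ss_error_after

print(f"before correction: F(site) = {f_site_before:.2f}")
print(f"after correction:  F(age) = {f_age:.3f}, F(site) = {f_site_after:.3f}")
print(f"SS Error drop = {drop_in_error:.3f} (SS for Age = {ss_age:.3f})")
```

The recomputed values agree with the tables up to rounding.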

⑺ why the y-intercepts of the regression lines for the contaminated mine area and the other area are not simply compared

① situation: if parallelism is satisfied, comparing the two y-intercepts directly looks much simpler

② however, comparing y-intercepts is similar to comparing sample groups that contain only one observation each

③ ANCOVA instead uses the error term pooled over all observations, so it is similar to comparing sample groups with twice as many elements as each given sample group

④ therefore, performing ANCOVA has higher power than simply comparing y-intercepts

⑻ application 1. 2-factor ANCOVA

① applying the ANCOVA technique to a two-way ANOVA design

② example: when the independent variables are gender and drug treatment, the dependent variable is blood pressure, and the confounding factor is age

⑼ application 2. if there are multiple confounding factors

① multiple linear regression analysis is used

② advanced regression analysis may also be used

Input: 2019.12.07 23:04
