Chapter 18. Advanced Regression Analysis
1. validity
2. panel data
3. instrumental variable
4. randomized controlled experiment
5. quasi-experiment
6. heterogeneous population
1. validity
⑴ internal validity
① definition : a qualitative evaluation of whether each coefficient obtained from the regression analysis is credibly estimated
② threat 1. omitted variable bias
○ definition: if an omitted variable satisfies the following two conditions, the conditional expectation of the error term is not zero
○ condition 1. the omitted variable is correlated with one or more of the included regressors
○ condition 2. the omitted variable is a determinant of Y
○ example of the expected value of the error term
○ solution
○ include the omitted variable in the regression analysis
○ if no data on the omitted variable are available, the following three methods exist:
○ method 1. panel data: removes characteristics that do not change over time
○ method 2. instrumental variable regression: extracts only the exogenous variation in the regressor through instrumental variables
○ method 3. collecting new data from a randomized controlled experiment
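○ (supplement) a minimal LaTeX sketch of the standard omitted variable bias result (the Stock & Watson form), writing ρXu for corr(Xi, ui):
$$\hat{\beta}_1 \xrightarrow{\;p\;} \beta_1 + \rho_{Xu}\frac{\sigma_u}{\sigma_X}$$
○ the bias term does not shrink as n grows, which is why the remedies above are needed rather than more data alone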
③ threat 2. wrong functional form bias
○ definition: bias arising from fitting a linear regression to a nonlinear relationship
○ a kind of omitted variable bias
④ threat 3. errors-in-variable bias or measurement error in the regressors
○ definition : a regressor measured with error, X̃i, can be correlated with the error term vi
○ formula : with X̃i = Xi + wi, the model Yi = β0 + β1Xi + ui becomes Yi = β0 + β1X̃i + vi, where vi = ui − β1wi
○ issue 1. the iron law of econometrics: the OLS slope estimator is biased toward zero (attenuated) relative to the true value
○ issue 2. the OLS estimator is not consistent
○ issue 3. statistical inference is inaccurate
○ solution
○ method 1. improve the accuracy of the measuring instruments
○ method 2. instrumental variable regression: extracts only the error-free variation in the regressor through instrumental variables
○ method 3. error correction: correction is possible if the error follows a known pattern
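○ (supplement) a LaTeX sketch of the classical attenuation result, assuming the measurement error wi is uncorrelated with Xi and ui:
$$\hat{\beta}_1 \xrightarrow{\;p\;} \frac{\sigma_X^2}{\sigma_X^2 + \sigma_w^2}\,\beta_1$$
○ since σX²/(σX² + σw²) < 1, the slope is pulled toward zero, which is the "iron law" of issue 1 above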
○ (note) if there is a measurement error in the dependent variable
○ formula : with Ỹi = Yi + wi, the model Yi = β0 + β1Xi + ui becomes Ỹi = β0 + β1Xi + vi, where vi = ui + wi
○ the estimator of the slope does not change
○ satisfying the three major assumptions of a simple linear regression model
○ assumption 1. Xi provides no information about vi : E(vi | Xi) = 0
○ assumption 2. Xi and Ỹi are i.i.d.
○ since Yi and wi are i.i.d. and mutually independent, Ỹi is i.i.d.
○ since Xi is independent of Yj and wj for i ≠ j, Xi and Ỹi are independent across observations
○ therefore, assumption 2 is satisfied
○ assumption 3. existence of 4th order moment
○ because ui and wi have finite 4th order moments and are mutually independent, vi = ui + wi has a finite 4th order moment
○ thus, (Xi, vi) has nonzero finite 4th order moments
○ there are three differences from the errors-in-variables (regressor measurement error) case
○ difference 1. the OLS estimator is consistent
○ difference 2. statistical inference is accurate
○ difference 3. it increases the variance of the regression error → increases the variance of the OLS estimator
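○ (supplement) the contrast between measurement error in X and in Y can be checked with a short simulation; the following R sketch uses hypothetical parameters and is only illustrative:
# measurement error in X attenuates the slope; measurement error in Y only adds noise
set.seed(1)
n  <- 10000
x  <- rnorm(n)                   # true regressor, variance 1
u  <- rnorm(n)                   # regression error
y  <- 2 + 3 * x + u              # true slope = 3
wx <- rnorm(n)                   # measurement error in X, variance 1
wy <- rnorm(n)                   # measurement error in Y
coef(lm(y ~ I(x + wx)))[2]       # near 3 * 1/(1 + 1) = 1.5 (attenuated)
coef(lm(I(y + wy) ~ x))[2]       # still near 3, but with a larger standard error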
⑤ threat 4. sample selection bias
○ bias that occurs when the sample is selected by a process related to the value of the dependent variable
○ in other words, bias generated by inferring the characteristics of the whole population from a non-random part
○ example 1. recruitment rate as a function of factors A and B
○ assume that the recruitment rate increases as A and B increase
○ people with a low A factor do not want to apply
○ among those with a low A factor, only those with a high B factor apply
○ as a result, the regression of the employment rate on factor A underestimates the true effect of factor A
⑥ threat 5. simultaneous causality bias
○ it is natural that there is a causal link from the independent variable to the dependent variable
○ if there is also a causal link from the dependent variable back to the independent variable, the coefficient of the independent variable is biased
○ it behaves like a feedback loop, which makes the estimated relationship misleadingly strong or weak
○ positive feedback loop : increases the absolute value of the coefficients
○ negative feedback loop : decreases the absolute value of the coefficients
○ example : birth rate and mortality rate have a mutual causal relationship, similar to a positive feedback loop
○ solution
○ method 1. instrumental variable regression: extracts only the variation in the regressor that is free of the reverse causal link
○ method 2. randomized controlled experiment: eliminates the reverse causality by assigning the treatment randomly
⑵ external validity
① definition : a qualitative evaluation of whether the coefficients for each independent variable obtained from regression analysis are applicable to other populations
② threat 1. non-representative sample: difference in populations themselves
③ threat 2. non-representative program or policy: difference in system
○ different systems can violate external validity even if the population is the same
○ example : difference in educational environment, difference in laws and institutions, difference in physical environment, etc
④ threat 3. general equilibrium effect
○ definition: treatment changes the overall environment, which can amplify or suppress the effectiveness of treatment
○ similar to simultaneous causality bias
○ example : effect of the existence of oil fields on income
○ existence of oil fields → increase in workers’ income
○ increase in workers’ income → increase in the inflow of new workers
○ increase in home purchases → increase in housing prices due to housing shortage → factor decreasing real income
○ increased traffic congestion → factor decreasing real income
○ higher income raises demand for restaurant quality → increase in dining-out costs → factor decreasing real income
⑤ solutions
○ adjust the conclusions of the regression relationship to the target population and setting
○ meta-analysis: comparing conclusions of similar but not identical populations
2. panel data
⑴ overview
① panel data refers to data of the form (Xit, Yit), i = 1, ···, n, t = 1, ···, T, in which each entity i is observed in multiple time periods t
② balanced panel data : all entities are observed in all time periods
③ unbalanced panel data : if it is not balanced panel data
④ (comparison) repeated cross-sectional data
○ panel data tracks the same individuals over time
○ repeated cross-sectional data draws a new sample in each period
○ a repeated cross-section may happen to include the same person in the before and after data, and it is cheaper to collect
⑵ before and after regression model
① formula (see the supplement at the end of this subsection)
○ this model can remove factors that are constant over time
○ Z differs from the intercept because it takes different values depending on i
② a kind of fixed effect regression model
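○ (supplement) a LaTeX sketch of the differencing idea: under the model Yit = β0 + β1Xit + β2Zi + uit, with Zi constant over time, subtracting the two periods removes Zi:
$$Y_{i2} - Y_{i1} = \beta_1 (X_{i2} - X_{i1}) + (u_{i2} - u_{i1})$$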
⑶ fixed effect regression model
① major assumptions
○ assumption 1. E(uit | Xi1, ···, XiT, αi) = 0 : it is not sufficient that E(uit | Xit, αi) = 0 (∵ the time averages of Y and u use information from all periods)
○ assumption 2. (Xi1, ···, XiT, ui1, ···, uiT) is i.i.d. across entities under the joint distribution : this does not mean that cov(uit, uis) = 0 for t ≠ s
○ assumption 3. existence of 4th order moments
○ assumption 4. no perfect multicollinearity : Xit must vary with t
○ under the major assumptions, the fixed effects estimator satisfies consistency and asymptotic normality
○ even if n increases to infinity, the time average of Y for each entity does not become a consistent estimator (∵ its accuracy depends on T, which is unrelated to n)
② formula
○ the data should be viewed as a table indexed along the axes i and t
○ in the case of T = 2, the situation is the same as the before and after regression model
○ the standard error of the slope = clustered standard error = heteroskedasticity- and autocorrelation-consistent (HAC) standard error
○ there are not T separate regression lines for t = 1, ···, T; there is just one regression line
○ the coefficient is β1, not β1,t
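○ (supplement) a LaTeX sketch of the within (entity-demeaning) transformation that the algorithm below implements, with $\bar{Y}_i = \frac{1}{T}\sum_{t=1}^{T} Y_{it}$ (and likewise for X and u):
$$Y_{it} - \bar{Y}_i = \beta_1 (X_{it} - \bar{X}_i) + (u_{it} - \bar{u}_i)$$
○ OLS on the demeaned data then estimates β1, since αi is removed along with the entity mean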
③ an example of the algorithm (R)
data <- read.csv("C:/Users/sun/Desktop/Guns.csv", header = T)
attach(data)

# definition of the variables (column indices follow the layout of Guns.csv)
y <- data[, 2]
y <- log(y)
x1 <- data[, 13]
x2 <- data[, 5]
x3 <- data[, 11]
x4 <- data[, 10]
x5 <- data[, 9]
x6 <- data[, 6]
x7 <- data[, 7]
x8 <- data[, 8]

# entity (state) means; state ids 3, 7, 14, 43, 52 do not occur in the data
state_y <- array(dim = 56)
state_x1 <- array(dim = 56)
state_x2 <- array(dim = 56)
state_x3 <- array(dim = 56)
state_x4 <- array(dim = 56)
state_x5 <- array(dim = 56)
state_x6 <- array(dim = 56)
state_x7 <- array(dim = 56)
state_x8 <- array(dim = 56)
for(i in 1:56){
  if(i != 3 && i != 7 && i != 14 && i != 43 && i != 52){
    data_sub <- data[stateid == i, ]
    state_y[i] <- mean(log(data_sub[, 2]))  # mean of the logged variable, to match y
    state_x1[i] <- mean(data_sub[, 13])
    state_x2[i] <- mean(data_sub[, 5])
    state_x3[i] <- mean(data_sub[, 11])
    state_x4[i] <- mean(data_sub[, 10])
    state_x5[i] <- mean(data_sub[, 9])
    state_x6[i] <- mean(data_sub[, 6])
    state_x7[i] <- mean(data_sub[, 7])
    state_x8[i] <- mean(data_sub[, 8])
  }
}

# within transformation: subtract each entity's mean (1173 = 51 states x 23 years)
Y <- array(dim = 1173)
X1 <- array(dim = 1173)
X2 <- array(dim = 1173)
X3 <- array(dim = 1173)
X4 <- array(dim = 1173)
X5 <- array(dim = 1173)
X6 <- array(dim = 1173)
X7 <- array(dim = 1173)
X8 <- array(dim = 1173)
for(i in 1:dim(data)[1]){
  j <- data[i, 12]  # stateid
  Y[i] <- y[i] - state_y[j]
  X1[i] <- x1[i] - state_x1[j]
  X2[i] <- x2[i] - state_x2[j]
  X3[i] <- x3[i] - state_x3[j]
  X4[i] <- x4[i] - state_x4[j]
  X5[i] <- x5[i] - state_x5[j]
  X6[i] <- x6[i] - state_x6[j]
  X7[i] <- x7[i] - state_x7[j]
  X8[i] <- x8[i] - state_x8[j]
}

# OLS on the demeaned data gives the fixed effects estimates
RELATION <- lm(Y ~ X1 + X2 + X3 + X4 + X5 + X6 + X7 + X8)
summary(RELATION)
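○ (note) the same within estimator can be obtained more concisely with the plm package; the sketch below assumes the column names of the Stock & Watson Guns.csv file (vio, shall, stateid, year, ...) and is only illustrative:
# fixed effects ("within") estimation with plm instead of manual demeaning
library(plm)
fe <- plm(log(vio) ~ shall + incarc_rate + density + avginc + pop +
            pb1064 + pw1064 + pm1029,
          data = data, index = c("stateid", "year"), model = "within")
summary(fe)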
④ when applying the fixed effect regression model yields a conclusion significantly different from the original result
○ this is a strong indication that the original model suffered from omitted variable bias
⑤ even if the main assumptions are satisfied, there can be autocorrelation
○ autocorrelation: uit and uit* (t ≠ t*) can be serially correlated; this is why HAC standard errors are used
○ the case without autocorrelation
○ the case with autocorrelation
○ the proof of cov(vit, vis) = 0 (assuming t ≠ s) in the case without autocorrelation
⑷ matrix notation of fixed effect regression model
① modeling
② assumption
③ fixed effects estimator
④ consistency
⑤ asymptotic normality
⑸ least squares dummy variables model (LSDV)
① formula
② the reason why D1i is not included: to avoid perfect multicollinearity
○ formula (see the supplement below)
○ if D1i were included together with the intercept, γ1 could not be identified
○ perfect multicollinearity caused by dummy variables is also called the dummy variable trap
③ the regression cannot be performed if the range of i (i.e., n) is too large, because there are too many regressors
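○ (supplement) a LaTeX sketch of the LSDV model for n entities, where Dji = 1 if observation i belongs to entity j and 0 otherwise:
$$Y_{it} = \beta_0 + \beta_1 X_{it} + \gamma_2 D2_i + \cdots + \gamma_n Dn_i + u_{it}$$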
⑹ time effect
① time effect term: marked as λt
② modeling
③ an example of the algorithm (R)
data <- read.csv("C:/Users/sun/Desktop/Guns.csv", header = T)
attach(data)

# definition of the variables (column indices follow the layout of Guns.csv)
y <- data[, 2]
y <- log(y)
x1 <- data[, 13]
x2 <- data[, 5]
x3 <- data[, 11]
x4 <- data[, 10]
x5 <- data[, 9]
x6 <- data[, 6]
x7 <- data[, 7]
x8 <- data[, 8]

# elimination of fixed state effects (entity means; ids 3, 7, 14, 43, 52 unused)
state_y <- array(dim = 56)
state_x1 <- array(dim = 56)
state_x2 <- array(dim = 56)
state_x3 <- array(dim = 56)
state_x4 <- array(dim = 56)
state_x5 <- array(dim = 56)
state_x6 <- array(dim = 56)
state_x7 <- array(dim = 56)
state_x8 <- array(dim = 56)
for(i in 1:56){
  if(i != 3 && i != 7 && i != 14 && i != 43 && i != 52){
    data_sub <- data[stateid == i, ]
    state_y[i] <- mean(log(data_sub[, 2]))  # mean of the logged variable, to match y
    state_x1[i] <- mean(data_sub[, 13])
    state_x2[i] <- mean(data_sub[, 5])
    state_x3[i] <- mean(data_sub[, 11])
    state_x4[i] <- mean(data_sub[, 10])
    state_x5[i] <- mean(data_sub[, 9])
    state_x6[i] <- mean(data_sub[, 6])
    state_x7[i] <- mean(data_sub[, 7])
    state_x8[i] <- mean(data_sub[, 8])
  }
}
Y <- array(dim = 1173)
X1 <- array(dim = 1173)
X2 <- array(dim = 1173)
X3 <- array(dim = 1173)
X4 <- array(dim = 1173)
X5 <- array(dim = 1173)
X6 <- array(dim = 1173)
X7 <- array(dim = 1173)
X8 <- array(dim = 1173)
for(i in 1:dim(data)[1]){
  j <- data[i, 12]  # stateid
  Y[i] <- y[i] - state_y[j]
  X1[i] <- x1[i] - state_x1[j]
  X2[i] <- x2[i] - state_x2[j]
  X3[i] <- x3[i] - state_x3[j]
  X4[i] <- x4[i] - state_x4[j]
  X5[i] <- x5[i] - state_x5[j]
  X6[i] <- x6[i] - state_x6[j]
  X7[i] <- x7[i] - state_x7[j]
  X8[i] <- x8[i] - state_x8[j]
}

# elimination of fixed time effects (year means, net of the grand mean)
time_Y <- array(dim = 23)
time_X1 <- array(dim = 23)
time_X2 <- array(dim = 23)
time_X3 <- array(dim = 23)
time_X4 <- array(dim = 23)
time_X5 <- array(dim = 23)
time_X6 <- array(dim = 23)
time_X7 <- array(dim = 23)
time_X8 <- array(dim = 23)
for(t in 77:99){
  data_sub2 <- data[year == t, ]
  time_Y[t - 76] <- mean(log(data_sub2[, 2])) - mean(state_y, na.rm = TRUE)  # logged, to match y
  time_X1[t - 76] <- mean(data_sub2[, 13]) - mean(state_x1, na.rm = TRUE)
  time_X2[t - 76] <- mean(data_sub2[, 5]) - mean(state_x2, na.rm = TRUE)
  time_X3[t - 76] <- mean(data_sub2[, 11]) - mean(state_x3, na.rm = TRUE)
  time_X4[t - 76] <- mean(data_sub2[, 10]) - mean(state_x4, na.rm = TRUE)
  time_X5[t - 76] <- mean(data_sub2[, 9]) - mean(state_x5, na.rm = TRUE)
  time_X6[t - 76] <- mean(data_sub2[, 6]) - mean(state_x6, na.rm = TRUE)
  time_X7[t - 76] <- mean(data_sub2[, 7]) - mean(state_x7, na.rm = TRUE)
  time_X8[t - 76] <- mean(data_sub2[, 8]) - mean(state_x8, na.rm = TRUE)
}
YY <- array(dim = 1173)
XX1 <- array(dim = 1173)
XX2 <- array(dim = 1173)
XX3 <- array(dim = 1173)
XX4 <- array(dim = 1173)
XX5 <- array(dim = 1173)
XX6 <- array(dim = 1173)
XX7 <- array(dim = 1173)
XX8 <- array(dim = 1173)
for(i in 1:dim(data)[1]){
  j <- data[i, 1]  # year
  YY[i] <- Y[i] - time_Y[j - 76]
  XX1[i] <- X1[i] - time_X1[j - 76]
  XX2[i] <- X2[i] - time_X2[j - 76]
  XX3[i] <- X3[i] - time_X3[j - 76]
  XX4[i] <- X4[i] - time_X4[j - 76]
  XX5[i] <- X5[i] - time_X5[j - 76]
  XX6[i] <- X6[i] - time_X6[j - 76]
  XX7[i] <- X7[i] - time_X7[j - 76]
  XX8[i] <- X8[i] - time_X8[j - 76]
}

RELATION <- lm(YY ~ XX1 + XX2 + XX3 + XX4 + XX5 + XX6 + XX7 + XX8)
summary(RELATION)
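○ (note) the same two-way fixed effects fit can be written in one line in its LSDV form with factor() dummies; the sketch again assumes the Guns.csv column names, and the reported standard errors are not the clustered (HAC) ones recommended in ⑶:
# entity and time fixed effects via dummy variables (LSDV)
twfe <- lm(log(vio) ~ shall + incarc_rate + density + avginc + pop +
             pb1064 + pw1064 + pm1029 +
             factor(stateid) + factor(year), data = data)
summary(twfe)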
⑺ time effect regression using dummy variables
① formula
② the reason why B1t is not included : to avoid perfect multicollinearity
○ formula
○ if B1t were included together with the intercept, δ1 could not be identified
○ perfect multicollinearity caused by dummy variables is also called the dummy variable trap
3. instrumental variable
⑴ definition: a method of extracting only the exogenous variation in a regressor by using a third variable
⑵ simple expression
① modeling
○ if there is one regression variable
○ if there are multiple regression variables
○ endogenous variable : a variable correlated with ui
○ exogenous variable : a variable that is not correlated with ui
○ exactly identified : m = k
○ over-identified : m > k
○ under-identified : m < k
○ modeling cannot be performed in the under-identified case : there must be at least as many instrumental variables as endogenous regressors (m ≥ k)
○ the reason why W is included : it is useful when it is difficult to find a Z that meets the exogeneity criterion on its own
② assumptions for using instrumental variables
○ assumption 1. E(ui | W1i, ···, Wri) = 0
○ assumption 2. (X1i, ···, Xki, W1i, ···, Wri, Z1i, ···, Zmi, Yi) is i.i.d.
○ assumption 3. all variables have finite 4th order moment
○ assumption 4. instrumental variable effectiveness
○ 4-1. instrument relevance
○ 4-2. instrument exogeneity
○ 4-3. no perfect collinearity
○ if the assumptions are satisfied, the TSLS estimator satisfies consistency and asymptotic normality
③ procedure
○ if there is one regression variable
○ 1st. regress Xi on the instrumental variable Zi
○ 2nd. compute the fitted values X̂i
○ 3rd. regress Yi on the fitted values X̂i
○ if there are multiple regression variables
○ 1st. regress each Xℓi on the instrumental variables : for ℓ = 1, ···, k,
○ 2nd. compute the fitted values X̂ℓi : for ℓ = 1, ···, k,
○ 3rd. regress Yi on the fitted values X̂1i, ···, X̂ki
○ running the two OLS regressions by hand miscalculates the standard errors of the second stage (see the note below)
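○ (note) in practice both stages and the corrected standard errors are computed in one call; a minimal sketch using ivreg from the AER package on simulated data with hypothetical parameters:
# TSLS in one call: regressors before "|", instruments after
library(AER)
set.seed(1)
n <- 5000
z <- rnorm(n)                      # instrument: relevant and exogenous
u <- rnorm(n)
x <- 0.8 * z + 0.5 * u + rnorm(n)  # endogenous regressor (correlated with u)
y <- 1 + 2 * x + u                 # true beta1 = 2
coef(lm(y ~ x))[2]                 # OLS: biased upward
summary(ivreg(y ~ x | z))          # TSLS: close to 2, with correct standard errors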
④ two-step least squares (TSLS) estimator
○ formula
○ proof
○ (note) if Zi := Xi, the TSLS estimator of β1 is identical to the OLS estimator of β1
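○ (supplement) in the single-regressor case the TSLS estimator reduces to a ratio of sample covariances, which makes the note above immediate (setting Zi := Xi gives the OLS formula):
$$\hat{\beta}_1^{TSLS} = \frac{s_{ZY}}{s_{ZX}} = \frac{\sum_{i}(Z_i - \bar{Z})(Y_i - \bar{Y})}{\sum_{i}(Z_i - \bar{Z})(X_i - \bar{X})}$$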
⑤ consistency
⑥ asymptotic normality
⑶ supplement of instrumental variable effectiveness
① instrument relevance
○ formula
○ weak instrumental variable: an instrumental variable that is not sufficiently correlated with the regressor; in this case the estimator can behave very badly
○ test of instrumental variable strength
○ when computing the 1st stage F statistic, the instrumental variable is considered strong if F is bigger than 10
○ available only under homoskedasticity
○ W1i, ···, Wri have nothing to do with the strength of an instrumental variable
② instrument exogeneity
○ formula
○ u would have to be observed to check instrument exogeneity directly, but it is unobservable
○ over-identifying restrictions test
○ when the following statistic is calculated, J follows a chi-squared distribution with m − k degrees of freedom
○ null hypothesis for J, H0 : all instrumental variables are exogenous
○ the logic is similar to instrument relevance : if the F statistic is small, there is no correlation (all coefficients are zero)
○ available only under homoskedasticity : many statistical programs also offer a heteroskedasticity-robust J test
○ when the null hypothesis is rejected, it cannot be determined which instrumental variable is endogenous
○ meaning of the degrees of freedom in the J statistic
○ k instrumental variables are used to construct the residuals: they correspond to the k endogenous regressors
○ the remaining m − k instrumental variables are used to test the correlation with the residuals
○ the J test cannot be applied in the exactly identified case because no instrumental variables remain for the correlation analysis: the J statistic is always zero in this case
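○ (supplement) a sketch of the J statistic construction: regress the TSLS residuals on (Z1i, ···, Zmi, W1i, ···, Wri), let F be the statistic for the hypothesis that all m instrument coefficients are zero, and then under H0
$$J = mF \xrightarrow{\;d\;} \chi^2_{m-k}$$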
③ no perfect collinearity
⑷ matrix notation
① modeling
○ Xi and Zi may overlap
② assumptions
○ Yi = Xiᵀβ + ui
○ (Yi, Xi, Zi), i = 1, ···, N are i.i.d.
○ E(ui | Zi) = 0
○ E(ZiXiᵀ) and E(ZiZiᵀ) are invertible
○ Zi, Xi, and ui have finite 4th order moments
③ procedure
○ 1st. regress Xi on the instrumental variables Zi
○ 2nd. compute the fitted values X̂i
○ 3rd. regress Yi on the fitted values X̂i
④ estimator
⑤ consistency
⑥ asymptotic normality
⑦ estimator of the variance of the normal distribution
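○ (supplement) a LaTeX sketch of the TSLS estimator in matrix form, writing PZ = Z(ZᵀZ)⁻¹Zᵀ for the projection onto the instruments:
$$\hat{\beta}^{TSLS} = (X^{\mathsf{T}} P_Z X)^{-1} X^{\mathsf{T}} P_Z Y$$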
⑸ exploration of instrumental variables: the exploration is in the realm of art
① Joshua Angrist (MIT)
② Steven Levitt (Chicago): published “Freakonomics”
③ Daron Acemoglu (MIT): published “Why Nations Fail”
4. randomized controlled experiment
⑴ overview
① definition: randomly extracting subjects from the population and then dividing the groups randomly again to perform different treatments
② randomized controlled experiments are rare in econometrics
③ randomized controlled experiments can remove omitted variable bias: even so, 100% validity is not guaranteed
④ it provides a standard against which causality can be judged
⑵ formula
① simple model
② model including additional regression variables
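○ (supplement) a LaTeX sketch of the two models, with Xi ∈ {0, 1} the treatment indicator so that β1 is the average treatment effect:
$$Y_i = \beta_0 + \beta_1 X_i + u_i, \qquad Y_i = \beta_0 + \beta_1 X_i + \beta_2 W_{1i} + \cdots + \beta_{1+r} W_{ri} + u_i$$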
③ reason for adding additional regression variables
○ reason 1. randomization check
○ regardless of whether additional regression variables are present or not, β1 is consistent
○ if β1 changes significantly depending on the presence or absence of the additional regressors, the assignment was not random
○ reason 2. efficiency: if there are additional regression variables, the variance is smaller
○ reason 3. conditional randomization
○ depending on individual characteristics, the assignment may not be random even if it appears to be random
○ sampling randomly with the additional regressors held fixed can minimize such concerns
○ the following conditional independence must be satisfied for the β1 estimator to be consistent: a weaker condition than independence
○ interaction: the treatment effect can depend on W
⑶ threats to internal validity
① failure to randomize
○ not only the treatment effect but also the non-random assignment effect appears
○ hypothesis test: if all coefficients are zero when Xi is regressed on the pre-treatment characteristics W1i, ···, Wri, the experiment can be regarded as a randomized experiment
○ example: if random assignment is based on names, a specific ethnic group may be preferentially assigned to the treatment group
② failure to follow treatment protocol (partial compliance)
○ definition: even though random assignment works well, subjects may not comply with the protocol
○ due to this, Xi can be correlated with ui
○ randomized encouragement design : partial compliance can be handled by using the random assignment as an instrumental variable for the actual treatment in an instrumental variable regression
③ attrition
○ definition: subjects dropping out of the sample, for reasons related to the treatment, after random assignment
④ Hawthorne effect
○ definition : the subjects' knowledge of the experiment they are participating in can affect the results of the experiment
○ in new drug research, a double-blind test can be used to avoid this issue
○ it is difficult to perform double-blind tests in econometrics
⑤ small sample
○ because research on human subjects is expensive, sample sizes tend to be small
○ many statistical inferences are based on asymptotic normality
○ if the sample size is small, the estimator's distribution should not be approximated by the normal distribution
⑷ threats to external validity
① non-representative sample
○ typical econometric experiments recruit undergraduate volunteers
○ volunteers are more motivated, so the measured effects can be overestimated
② non-representative program or policy
○ the experimental program or policy should be similar to the actual one
○ example: the experimental program is performed over a short period, while the real-life question of interest may require a longer horizon
③ general equilibrium effect
○ definition: treatment changes the overall environment, which can amplify or suppress the effectiveness of treatment
○ small-scale experiments do not reflect changes in the overall environment, so external validity must be considered separately
5. quasi-experiment
⑴ definition
① an experiment in which an independent variable is not under the control of a researcher and is conducted in a natural situation
② also known as natural experiment
③ objective: program evaluation
⑵ method 1. differences-in-differences (DID) estimator
① the simplest model (assuming panel data)
② model with additional regression variables (assuming panel data): because conditions may change between before and after data
③ criterion for repeated cross-sectional data
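○ (supplement) a LaTeX sketch of the DID estimator in terms of the four group means (treated/control, before/after):
$$\hat{\beta}_1^{DID} = (\bar{Y}^{\,treated,\,after} - \bar{Y}^{\,treated,\,before}) - (\bar{Y}^{\,control,\,after} - \bar{Y}^{\,control,\,before})$$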
⑶ method 2. instrumental variable regression
① 1st. define Zi as a variable that is as good as randomly assigned (the quasi-experimental analogue of treatment assignment in a randomized controlled experiment)
② 2nd. Zi is a good instrumental variable for Xi: instrument relevance is satisfied
③ 3rd. Yi is the outcome of interest
④ 4th. evaluate the effect of Xi on Yi using Zi as an instrumental variable
⑷ method 3. regression discontinuity design (RDD)
① overview
○ if a threshold (cut-off) ω0 is set, observations near the threshold are similar to one another
○ when observations on either side of the threshold are treated differently, the following difference in outcomes can be attributed entirely to the treatment effect
○ it is a very popular experimental technique
○ disadvantage : it is difficult to extend the regression discontinuity estimate to observations far from the threshold
② sharp regression discontinuity design
③ fuzzy regression discontinuity design
○ the treatment may not be assigned as cleanly as the Xi defined in the sharp regression discontinuity design
○ the following instrumental variable Zi can be a good instrument for the actual Xi
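○ (supplement) a LaTeX sketch of the sharp design: with running variable Wi and Xi = 1(Wi ≥ ω0), the treatment effect β1 can be estimated from
$$Y_i = \beta_0 + \beta_1 X_i + \beta_2 (W_i - \omega_0) + u_i$$
○ in the fuzzy design, Zi = 1(Wi ≥ ω0) is instead used as an instrument for the actual treatment Xi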
⑸ threats to internal validity
① failure to randomize
○ not only the treatment effect but also the non-random assignment effect appears
○ hypothesis test : if all coefficients are zero when Xi is regressed on the pre-treatment characteristics W1i, ···, Wri, the experiment can be regarded as a randomized experiment
○ example : if random processing is performed by name, a specific ethnic group may be assigned to the processing group preferentially
② failure to follow treatment protocol (partial compliance)
○ definition : even though random assignment works well, subjects may not comply with the protocol
○ due to this, Xi can be correlated with ui
○ randomized encouragement design: partial compliance can be handled by using the random assignment as an instrumental variable for the actual treatment in an instrumental variable regression
③ attrition
○ definition: subjects dropping out of the sample, for reasons related to the treatment, after random assignment
④ Hawthorne effect
○ there is little reason to worry about the Hawthorne effect in a quasi-experiment, because subjects are in their natural situation
⑤ instrumental variable effectiveness
○ instrument relevance can be evaluated from the data
○ instrument exogeneity may not hold even if the instrumental variable appears to be randomly assigned
○ example: in a study of income as a function of the draft lottery number, Xi and ui may be correlated if people with low numbers act to avoid conscription
⑹ threats to external validity
① non-representative sample
② non-representative program or policy
③ general equilibrium effect
⑺ criticism
① much effort goes into finding good instrumental variables in quasi-experiments
② there aren’t that many really good quasi-experiments
6. heterogeneous population
⑴ definition: the case in which the coefficients of the regression line, β0i and β1i, are not constants but vary across the sample
① β1i : heterogeneous effect of Xi
② the parameter of interest is E(β1i)
③ if β1i is observable, models using interaction can be used
④ if β1i is unobservable, it is analyzed as follows
⑵ OLS
① assumption : Xi should be random → Xi and (ui, β0i, β1i) should be independent
○ a condition that is difficult to satisfy in practice
② formula
⑶ instrumental variables estimation (IV)
① assumption : Zi should be random → Zi and (ui, vi, β0i, β1i, π0i, π1i) should be independent
② formula
○ E(β1iπ1i) / E(π1i) is called local average treatment effect (LATE)
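○ (supplement) a LaTeX sketch of the standard LATE result: with first stage Xi = π0i + π1i Zi + vi, the IV estimator converges to
$$\hat{\beta}_1^{IV} \xrightarrow{\;p\;} \frac{E(\beta_{1i}\pi_{1i})}{E(\pi_{1i})}$$
○ i.e., a weighted average of the individual effects β1i, weighted by how strongly each individual responds to the instrument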
③ conditions for equalizing LATE and ATE
○ case 1. β1i = β1 = constant : no heterogeneity in the treatment effect
○ case 2. π1i = π1 : no heterogeneity in the effect of the instrumental variable
○ case 3. β1i and π1i are independent
④ connotations
○ difficult to evaluate instrument exogeneity
○ the J-test only detects differences between the LATEs identified by different instruments