
Chapter 18. Advanced Regression Analysis 


1. validity

2. panel data

3. instrumental variable

4. randomized controlled experiment

5. quasi-experiment

6. heterogeneous population

1. validity 

⑴ internal validity

① definition : a qualitative evaluation of whether each coefficient obtained from the regression analysis is estimated reasonably (without bias) for the population studied

② threat 1. omitted variable bias

○ definition: if an omitted variable satisfies both of the following conditions, the conditional expectation of the error term given the regressors is not zero, so the OLS estimator is biased

condition 1. the omitted variable is correlated with one or more of the included regressors

condition 2. the omitted variable is a determinant of Y

○ example of the expected value of the residual
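
For the single-regressor model, the size of the bias can be sketched with the standard large-sample result below. The symbols are notation assumed here, not taken from the original figure: ρ_Xu is the correlation between X_i and the error u_i (which contains the omitted variable), and σ_u, σ_X are their standard deviations.

```latex
\[
\hat{\beta}_1 \xrightarrow{\;p\;} \beta_1 + \rho_{Xu}\,\frac{\sigma_u}{\sigma_X}
\]
```

When both conditions hold, ρ_Xu ≠ 0, so the bias does not vanish even as the sample size grows.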


○ solution 

○ include omitted variables in regression analysis

○ if there is no data related to the omitted variable, the following three methods exist:

method 1. panel data: removes omitted factors that do not change over time

method 2. instrumental variable regression: instrumental variables isolate the part of the variation in the regressor that is uncorrelated with the error term

method 3. collecting new data through a randomized controlled experiment

③ threat 2. wrong functional form bias

○ definition: bias that arises when a linear regression is fitted to a nonlinear relationship

○ a kind of omitted variable bias: the missing nonlinear terms act as omitted variables

④ threat 3. errors-in-variable bias or measurement error in the regressors 

○ definition : an independent variable with measurement error, X̃i, can be correlated with error vi 

○ formula
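
A sketch of the formula under the classical measurement error assumption, which is an assumption made here: X̃_i = X_i + w_i, with w_i independent of X_i and u_i. The error term v_i then contains the measurement error, which is why X̃_i and v_i are correlated.

```latex
\[
\begin{aligned}
Y_i &= \beta_0 + \beta_1 X_i + u_i
     = \beta_0 + \beta_1 \tilde{X}_i + v_i,
\qquad v_i = \beta_1 (X_i - \tilde{X}_i) + u_i \\
\hat{\beta}_1 &\xrightarrow{\;p\;} \frac{\sigma_X^2}{\sigma_X^2 + \sigma_w^2}\,\beta_1
\end{aligned}
\]
```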


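issue 1. the iron law of econometrics: under classical measurement error, the OLS estimator of the slope is biased toward zero (attenuation bias), so its magnitude tends to be smaller than the true value


issue 2. the OLS estimator is inconsistent

○ issue 3. statistical inference based on the estimator is unreliable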
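
The attenuation described in issues 1 and 2 can be illustrated with a small simulation. This is only a sketch on simulated data: the sample size, the true slope of 2, and the equal variances of the regressor and the measurement error are illustrative assumptions.

```r
# Simulated data; all names and parameter values are illustrative assumptions.
set.seed(1)
n <- 100000
beta1 <- 2                  # true slope
x <- rnorm(n, sd = 1)       # true regressor, variance 1
w <- rnorm(n, sd = 1)       # measurement error, variance 1, independent of x
u <- rnorm(n)               # regression error
y <- 1 + beta1 * x + u
x_tilde <- x + w            # mismeasured regressor

# slope using the true regressor: close to 2
slope_true <- unname(coef(lm(y ~ x))[2])
# slope using the mismeasured regressor: close to 2 * 1 / (1 + 1) = 1
slope_err <- unname(coef(lm(y ~ x_tilde))[2])
```

Increasing n does not move `slope_err` toward 2, which is what inconsistency means here.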

○ solution 

method 1. improvement of accuracy of measuring instruments

method 2. instrumental variable regression: instrumental variables isolate the part of the variation in the regressor that is uncorrelated with the error term

method 3. error correction: correction is possible if there is a pattern of error

○ (note) if there is a measurement error in the dependent variable

○ formula
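
A reconstruction consistent with the notation used in the assumptions below (Ỹ_i is the measured dependent variable, w_i its measurement error, and v_i = u_i + w_i):

```latex
\[
\tilde{Y}_i = Y_i + w_i
\quad\Longrightarrow\quad
\tilde{Y}_i = \beta_0 + \beta_1 X_i + v_i,
\qquad v_i = u_i + w_i
\]
```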


○ the OLS estimator of the slope remains unbiased and consistent: the measurement error wi is absorbed into the error term


○ the three major assumptions of the simple linear regression model are still satisfied

assumption 1. Xi does not provide any information on vi 


assumption 2. (Xi, Ỹi) are i.i.d.

○ as Yi and wi are each i.i.d. and mutually independent, Ỹi is i.i.d.

○ as Xi is independent of Yj and wj for i ≠ j, Xi is independent of Ỹj for i ≠ j

○ therefore, assumption 2 is satisfied

assumption 3. existence of 4th order moments

○ because ui and wi have finite 4th order moments and are mutually independent, vi = ui + wi has a finite 4th order moment

○ thus, (Xi, vi) have nonzero finite 4th order moments

○ there are three differences from errors-in-variables bias (measurement error in the regressors)

○ difference 1. the OLS estimator is consistent

difference 2. statistical inference remains valid

difference 3. the variance of the regression error increases → the variance of the OLS estimator increases


⑤ threat 4. sample selection bias

○ bias that arises from the way the data are selected

○ in other words, bias is generated by inferring the characteristics of the entire population from a non-randomly selected part

example 1. recruitment rate as a function of factor A and factor B

○ assume that the recruitment rate increases as A and B increase

○ most people with low A do not apply

○ among those with low A, only those with high B apply

○ as a result, the regression of the recruitment rate on A estimates an effect of A that is smaller than the true effect

⑥ threat 5. simultaneous causality bias

○ a regression presumes a causal link from the independent variable to the dependent variable

○ if there is also a causal link from the dependent variable to the independent variable, the coefficient of the independent variable is biased

○ the situation resembles a feedback loop written as a system of simultaneous equations

○ positive feedback loop : increases the absolute value of the coefficients

○ negative feedback loop : decreases the absolute value of the coefficients

○ example : birth rate and mortality rate have a mutual causal relationship, similar to a positive feedback loop

○ solution

method 1. instrumental variable regression: isolates the variation in the independent variable that is not affected by the reverse causal link

method 2. randomized controlled experiment: random assignment of the treatment removes the causal link from the dependent variable to the independent variable

⑵ external validity 

① definition : a qualitative evaluation of whether the coefficients for each independent variable obtained from regression analysis are applicable to other populations

② threat 1. non-representative sample: difference in populations themselves

③ threat 2. non-representative program or policy: difference in system

○ different systems can violate external validity even if the population is the same

○ example : difference in educational environment, difference in laws and institutions, difference in physical environment, etc

④ threat 3. general equilibrium effect

○ definition: treatment changes the overall environment, which can amplify or suppress the effectiveness of treatment

○ similar to simultaneous causality bias

example : effect of the existence of oil fields on income

○ existence of oil fields → increase in workers’ income

○ increase in workers’ income → increase in the inflow of new workers

○ increase in home purchase → increase in housing prices due to lack of housing → decrease in income

○ increased traffic congestion → a factor that reduces income

○ increased demand for higher-quality restaurants due to increased income → increase in dining out costs → a factor that reduces income

⑤ solutions

○ adjust the conclusions of the regression according to the differences in population and setting

○ meta-analysis: compare the conclusions of studies on similar but not identical populations

2. panel data

⑴ overview 

① referring to the following data
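
In the usual notation, which is assumed here since the original formula was lost, panel data consist of n entities observed over T time periods:

```latex
\[
(X_{it},\, Y_{it}), \qquad i = 1, \dots, n, \quad t = 1, \dots, T
\]
```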


② balanced panel data : every entity is observed in every time period

③ unbalanced panel data : panel data in which some entity is not observed in some time period

④ (comparison) repeated cross-sectional data

○ panel data tracks the same individuals over time

○ repeated cross-sectional data draws a new sample at each point in time

○ repeated cross-sectional data may happen to include the same person in both the before and after samples, and it is cheaper to collect

⑵ before and after regression model

① formula
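
A sketch of the before and after model for two periods, assuming the usual specification in which Z_i is an entity-specific factor that is constant over time. Differencing the two periods removes Z_i:

```latex
\[
\begin{aligned}
Y_{it} &= \beta_0 + \beta_1 X_{it} + \beta_2 Z_i + u_{it}, \qquad t = 1, 2 \\
Y_{i2} - Y_{i1} &= \beta_1 (X_{i2} - X_{i1}) + (u_{i2} - u_{i1})
\end{aligned}
\]
```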


○ this model removes factors that are constant over time

○ Z is different from intercept because it has different values depending on i

② a kind of fixed effect regression model 

⑶ fixed effect regression model

① major assumptions

assumption 1. E(uit | Xi1, ···, XiT, αi) = 0 : E(uit | Xit, αi) = 0 alone is not sufficient, because the entity averages of Y and u use information from all time periods

assumption 2. (Xi1, ···, XiT, ui1, ···, uiT), i = 1, ···, n, are i.i.d. draws from their joint distribution : this does not imply cov(uit, uis) = 0 for t ≠ s

assumption 3. existence of 4th order moments


○ assumption 4. no perfect multicollinearity : Xit must vary over t

○ under the major assumptions, the fixed effect estimator is consistent and asymptotically normal

○ even as n goes to infinity, the time average of Y for each entity is not consistent or asymptotically normal, because it averages over only T periods (T is unrelated to n)

② formula 
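
A sketch of the fixed effect model and its entity-demeaned form. The bar and tilde notation is assumed here: bars are averages over t for entity i, and tildes are deviations from those averages.

```latex
\[
\begin{aligned}
Y_{it} &= \beta_1 X_{it} + \alpha_i + u_{it} \\
\tilde{Y}_{it} &= \beta_1 \tilde{X}_{it} + \tilde{u}_{it},
\qquad \tilde{Y}_{it} = Y_{it} - \bar{Y}_i, \quad \tilde{X}_{it} = X_{it} - \bar{X}_i \\
\hat{\beta}_1 &= \frac{\sum_{i=1}^{n} \sum_{t=1}^{T} \tilde{X}_{it} \tilde{Y}_{it}}
                      {\sum_{i=1}^{n} \sum_{t=1}^{T} \tilde{X}_{it}^{2}}
\end{aligned}
\]
```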


○ data should be comprehended as a table represented on the axes of i and t

○ in the case of T = 2, the model is identical to the before and after regression model

○ the standard error of the slope = clustered standard error = heteroskedasticity- and autocorrelation-consistent (HAC) standard error

○ there are not T separate regression lines for t = 1, ···, T. there is just one regression line

○ the slope is not a period-specific β1,t but a common β1

③ an example of algorithm 

# columns are addressed by position: 2 = dependent variable, 12 = state id
data <- read.csv("C:/Users/sun/Desktop/Guns.csv", header = TRUE)

# dependent variable (in logs) and the eight regressors
y <- log(data[, 2])
x <- data[, c(13, 5, 11, 10, 9, 6, 7, 8)]
names(x) <- paste0("X", 1:8)
state <- data[, 12]

# entity demeaning: subtract each state's time average
# (the state mean is taken of log(y), the same variable that is demeaned)
demean <- function(v) v - ave(v, state)
Y <- demean(y)
X <- as.data.frame(lapply(x, demean))

# OLS on the demeaned data gives the fixed effect estimates of the slopes
# note: the standard errors reported by lm() are not clustered
RELATION <- lm(Y ~ X1 + X2 + X3 + X4 + X5 + X6 + X7 + X8, data = X)


④ when the fixed effect regression model leads to a conclusion significantly different from that of the original model

○ this strongly suggests that the original model suffered from omitted variable bias

⑤ even if the major assumptions are satisfied, there can be autocorrelation

○ autocorrelation: uit and uit* (t ≠ t*) are correlated within an entity (serial correlation). this is why HAC standard errors are used

○ the case without autocorrelation 


Figure. 1. the case without autocorrelation

○ the case with autocorrelation 


Figure. 2. the case with autocorrelation 

○ the proof of cov(vit, vis) = 0 (assuming t ≠ s) in the case without autocorrelation


⑷ matrix notation of fixed effect regression model 

① modeling


② assumption


③ fixed effects estimator


④ consistency


⑤ asymptotic normality


⑸ least squares dummy variables model (LSDV)

① formula
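
A reconstruction based on the surrounding text (D1_i is excluded and the dummy coefficients are called γ). Dj_i equals 1 if observation i belongs to entity j and 0 otherwise. The second line shows why including all n dummies together with the intercept causes perfect multicollinearity.

```latex
\[
\begin{aligned}
Y_{it} &= \beta_0 + \beta_1 X_{it} + \gamma_2 D2_i + \cdots + \gamma_n Dn_i + u_{it} \\
D1_i + D2_i + \cdots + Dn_i &= 1 \quad \text{for every } i
\end{aligned}
\]
```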


② the reason why D1i is not included: to avoid perfect multi-collinearity

○ formula


○ if D1i were included together with the intercept, γ1 could not be identified

○ perfect multicollinearity caused by dummy variables is also called the dummy variable trap

③ impractical if the range of i (i.e., n) is too large: the regression then has too many regressors (n − 1 dummy variables)
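
The equivalence between LSDV and the entity-demeaned regression, and the dummy variable trap, can be checked on a toy panel. This is a sketch on simulated data: the panel size, the fixed effects, and the true slope of 1.5 are illustrative assumptions.

```r
# Simulated toy panel; all values are illustrative assumptions.
set.seed(2)
n_ent <- 5
n_time <- 4
id <- rep(1:n_ent, each = n_time)
alpha <- c(3, -1, 0, 2, 5)[id]           # entity fixed effects
x <- rnorm(n_ent * n_time) + alpha       # regressor correlated with the fixed effect
y <- 1.5 * x + alpha + rnorm(n_ent * n_time)

# LSDV: intercept plus n - 1 entity dummies (factor() drops the first level)
b_lsdv <- unname(coef(lm(y ~ x + factor(id)))["x"])

# within estimator: OLS on entity-demeaned data
xd <- x - ave(x, id)
yd <- y - ave(y, id)
b_within <- unname(coef(lm(yd ~ xd - 1))["xd"])

# dummy variable trap: intercept plus all n dummies is rank deficient,
# so lm() reports NA for one of the dummy coefficients
D <- model.matrix(~ factor(id) - 1)
trap <- coef(lm(y ~ x + D))
```

`b_lsdv` and `b_within` agree up to floating point error, so demeaning is the practical route when n is large.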

⑹ time effect 

① time effect term: marked as λt  


② modeling
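
A sketch of the model with both entity and time fixed effects, using the λ_t notation introduced above. The form follows the usual specification and is assumed, since the original formula was lost.

```latex
\[
Y_{it} = \beta_1 X_{it} + \alpha_i + \lambda_t + u_{it}
\]
```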


③ an example of algorithm 

# columns are addressed by position: 1 = year, 2 = dependent variable, 12 = state id
data <- read.csv("C:/Users/sun/Desktop/Guns.csv", header = TRUE)

# definition
y <- log(data[, 2])
x <- data[, c(13, 5, 11, 10, 9, 6, 7, 8)]
names(x) <- paste0("XX", 1:8)
state <- data[, 12]
year <- data[, 1]

# elimination of fixed state effects and fixed time effects (balanced panel):
# v_it - (state mean) - (year mean) + (grand mean)
# all means are taken of the same variable that is transformed, including log(y)
within2 <- function(v) v - ave(v, state) - ave(v, year) + mean(v)
YY <- within2(y)
XX <- as.data.frame(lapply(x, within2))

# note: the standard errors reported by lm() are not clustered
RELATION <- lm(YY ~ XX1 + XX2 + XX3 + XX4 + XX5 + XX6 + XX7 + XX8, data = XX)


⑺ time effect regression using dummy variables

① formula
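
A reconstruction based on the surrounding text (B1_t is excluded and the dummy coefficients are called δ). Bs_t equals 1 if t = s and 0 otherwise. Keeping α_i for the entity effects is an assumption here.

```latex
\[
\begin{aligned}
Y_{it} &= \beta_0 + \beta_1 X_{it} + \delta_2 B2_t + \cdots + \delta_T BT_t + \alpha_i + u_{it} \\
B1_t + B2_t + \cdots + BT_t &= 1 \quad \text{for every } t
\end{aligned}
\]
```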


② the reason why B1t is not included : to avoid perfect multi-collinearity

○ formula


○ if B1t were included together with the intercept, δ1 could not be identified

○ perfect multicollinearity caused by dummy variables is also called the dummy variable trap

3. instrumental variable 

⑴ definition: a method that uses a third variable (the instrument) to isolate the part of the variation in the regression variable that is uncorrelated with the error term

⑵ simple expression

① modeling

○ if there is one regression variable


○ if there are multiple regression variables 


○ endogenous variable : a variable correlated with ui  

○ exogenous variable : a variable that is not correlated with ui

○ exactly identified : m = k (m : number of instrumental variables, k : number of endogenous variables)

○ over-identified : m > k

○ under-identified : m < k

○ the coefficients cannot be estimated in the under-identified case : at least as many instrumental variables as endogenous variables are needed

○ the reason why W is included : controlling for W helps when it is difficult to find a Z that meets the criteria on its own
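
The model described above can be sketched as follows, assuming the usual layout with k endogenous variables X, r included exogenous variables W, and m instrumental variables Z_{1i}, ..., Z_{mi} that do not appear in the equation:

```latex
\[
Y_i = \beta_0 + \beta_1 X_{1i} + \cdots + \beta_k X_{ki}
      + \beta_{k+1} W_{1i} + \cdots + \beta_{k+r} W_{ri} + u_i
\]
```

With one regression variable and no W, this reduces to Y_i = β_0 + β_1 X_i + u_i with a single instrument Z_i.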


② assumptions for using instrumental variables

assumption 1. E(ui | W1i, ···, Wri) = 0 

assumption 2. (X1i, ···, Xki, W1i, ···, Wri, Z1i, ···, Zmi, Yi) is i.i.d. 

assumption 3. all variables have finite 4th order moment

assumption 4. instrumental variable effectiveness

4-1. instrument relevance

4-2. instrument exogeneity

4-3. no perfect collinearity

○ if the assumptions are satisfied, the TSLS estimator is consistent and asymptotically normal

③ procedure

○ if there is one regression variable

○ 1st. regress Xi on the instrumental variable Zi by OLS

○ 2nd. compute the predicted value X̂i from the first-stage regression

○ 3rd. regress Yi on the predicted value X̂i by OLS

○ if there are multiple regression variables

○ 1st. regress each endogenous variable Xℓi on the instrumental variables Z1i, ···, Zmi and the exogenous variables W1i, ···, Wri : for ℓ = 1, ···, k,

○ 2nd. compute the predicted values X̂ℓi : for ℓ = 1, ···, k,

○ 3rd. regress Yi on the predicted values X̂1i, ···, X̂ki and on W1i, ···, Wri

○ running the two OLS regressions by hand gives incorrect standard errors in the second stage, because they are based on X̂i rather than Xi
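
The three steps for one regression variable can be sketched on simulated data. The data-generating process, including the true slope of 2 and the instrument strength of 0.8, is an illustrative assumption. Only the point estimate is computed, because the standard errors of the hand-made second stage are not valid.

```r
# Simulated data; all names and parameter values are illustrative assumptions.
set.seed(3)
n <- 50000
z <- rnorm(n)               # instrument: relevant and exogenous by construction
e <- rnorm(n)               # unobserved factor that makes x endogenous
x <- 0.8 * z + e + rnorm(n)
u <- e + rnorm(n)           # error term, correlated with x through e
y <- 1 + 2 * x + u          # true slope is 2

b_ols <- unname(coef(lm(y ~ x))[2])        # inconsistent: x is correlated with u

x_hat <- fitted(lm(x ~ z))                 # 1st and 2nd steps
b_tsls <- unname(coef(lm(y ~ x_hat))[2])   # 3rd step

b_ratio <- cov(z, y) / cov(z, x)           # closed form: s_ZY / s_ZX
```

`b_tsls` equals `b_ratio` up to floating point error and is close to 2, while `b_ols` is pulled away from 2 by the endogeneity.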

④ two-stage least squares (TSLS) estimator

○ formula
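
For one regression variable and one instrumental variable, the standard closed form is the ratio of sample covariances. The notation s_ZY and s_ZX is assumed here.

```latex
\[
\hat{\beta}_1^{TSLS} = \frac{s_{ZY}}{s_{ZX}}
= \frac{\sum_{i=1}^{n} (Z_i - \bar{Z})(Y_i - \bar{Y})}
       {\sum_{i=1}^{n} (Z_i - \bar{Z})(X_i - \bar{X})}
\]
```

The population counterpart follows from cov(Z_i, Y_i) = β_1 cov(Z_i, X_i) + cov(Z_i, u_i), where the last term is zero under instrument exogeneity and cov(Z_i, X_i) ≠ 0 under instrument relevance.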


○ proof


○ (note) if Zi := Xi, the TSLS estimator of β1 is identical to the OLS estimator of β1


⑤ consistency


⑥ asymptotic normality


⑶ supplement of instrumental variable effectiveness 

① instrument relevance

○ formula


○ weak instrumental variable: an instrumental variable that is only weakly correlated with the regression variable. the TSLS estimator can then be badly biased and far from normally distributed

○ test of instrumental variable strength

○ rule of thumb: if the first-stage F statistic is bigger than 10, the instrumental variable is considered strong


○ this rule of thumb is valid only under homoskedasticity

○ W1i, ···, Wri have nothing to do with the strength of an instrumental variable

② instrument exogeneity 

○ formula 


○ the error u would have to be known to check instrument exogeneity, so it cannot be verified directly


○ over-identifying restrictions test 

○ when the following statistic is calculated, J follows a chi-squared distribution with m − k degrees of freedom in large samples under the null hypothesis


○ null hypothesis for J, H0 : all instrumental variables are exogenous

○ the logic is similar to instrument relevance : if the F statistic is small, there is no correlation (all coefficients are 0)

○ valid only under homoskedasticity : many statistical programs also offer a heteroskedasticity-robust J-test

○ when the null hypothesis is rejected, the test does not tell which instrumental variable is endogenous

○ meaning of the degrees of freedom of the J statistic

○ k instrumental variables are used up to compute the residuals: they correspond to the k endogenous variables

○ the remaining m − k instrumental variables are used to test the correlation with the residuals

○ the J test cannot be applied in the exactly identified case, because no instrumental variables are left to test the correlation: the J statistic is always zero in this case

③ no perfect collinearity


⑷ matrix notation

① modeling


○ Xi and Zi may overlap

② assumptions 

○ Yi = Xiᵀβ + ui

○ (Yi, Xi, Zi), i = 1, ···, N is i.i.d.

○ E(ui | Zi) = 0

○ E(ZiXiᵀ), E(ZiZiᵀ) have inverse matrices

○ Zi, Xi, and ui have finite 4th order moments

③ procedure 

○ 1st. regress Xi on the instrumental variable Zi by OLS

○ 2nd. compute the predicted value X̂i from the first-stage regression

○ 3rd. regress Yi on the predicted value X̂i by OLS

④ estimator


⑤ consistency


⑥ asymptotic normality


⑦ estimator of the variance of the normal distribution  


⑸ exploration of instrumental variables: finding a good instrumental variable is more an art than a procedure

① Joshua Angrist (MIT)

② Steven Levitt (Chicago): published “Freakonomics”

③ Daron Acemoglu (MIT): published “Why Nations Fail”

4. randomized controlled experiment 

⑴ overview 

① definition: randomly sampling subjects from the population and then randomly assigning them to groups that receive different treatments

② randomized controlled experiments are rare in econometrics

③ randomized controlled experiments can remove omitted variable bias: 100% validity is still not guaranteed

④ they provide a benchmark against which to judge causality

⑵ formula 

① simple model


② model including additional regression variables
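
A sketch of the two models, assuming the usual notation in which X_i is the treatment indicator (or treatment level) and W_{1i}, ..., W_{ri} are the additional regression variables:

```latex
\[
\begin{aligned}
\text{simple model:} \quad & Y_i = \beta_0 + \beta_1 X_i + u_i \\
\text{with additional variables:} \quad & Y_i = \beta_0 + \beta_1 X_i
   + \beta_2 W_{1i} + \cdots + \beta_{1+r} W_{ri} + u_i
\end{aligned}
\]
```

With a binary X_i, the OLS estimate of β_1 in the simple model equals the difference between the average outcomes of the treatment group and the control group.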


③ reason for adding additional regression variables

reason 1. randomization check 

○ regardless of whether additional regression variables are present or not, the estimator of β1 is consistent

○ if the estimate of β1 changes significantly depending on the presence or absence of additional regression variables, the assignment was probably not random

reason 2. efficiency: if there are additional regression variables, the variance of the estimator is smaller

reason 3. conditional randomization

○ if the assignment depends on individual characteristics, it may not be random even if it appears to be

○ random assignment within groups defined by the additional regression variables can minimize such concerns

○ the following conditional mean independence must be satisfied for the β1 estimator to be consistent: a weaker condition than independence


○ interaction: treatment effect can depend on W


⑶ threats to internal validity 

① failure to randomize

○ not only the treatment effect but also the non-random assignment effect appears

○ hypothesis test: regress Xi on the pre-treatment characteristics W1i, ···, Wri; if the hypothesis that all coefficients are zero is not rejected, the experiment can be regarded as a randomized experiment

○ example: if the treatment is assigned by name, a specific ethnic group may be assigned to the treatment group preferentially

② failure to follow treatment protocol (partial compliance)

○ definition: even when random assignment is carried out properly, subjects may not follow the assigned treatment

○ due to this, Xi can be correlated with ui

○ randomized encouragement design : partial compliance can be handled by using the randomly assigned treatment as an instrumental variable for the treatment actually received

③ attrition

○ definition: subjects drop out of the study after random assignment for reasons related to the treatment

④ Hawthorne effect

○ definition : the subjects' awareness of taking part in an experiment can change their behavior and affect the results

○ in a new drug research, double blind test can be used to avoid this issue 

○ difficult to perform double-blind test in econometrics

⑤ small sample

○ because human-related research is expensive, the sample size is small

○ many statistical estimations are based on asymptotic normality

○ if the sample size is small, the normal approximation may be unreliable

⑷ threats to external validity 

① non-representative sample

○ many economic experiments recruit undergraduate volunteers

○ volunteers are more motivated, so the measured effects can be overestimated

② non-representative program or policy

○ the experimental program or policy should be similar to the actual one

○ example: the experimental program runs for a short period of time, while the real-world program of interest may require a longer time

③ general equilibrium effect 

○ definition: treatment changes the overall environment, which can amplify or suppress the effectiveness of treatment

○ small experiments do not reflect changes in the environment, so external validity must be considered separately

5. quasi-experiment

⑴ definition

① an experiment in which an independent variable is not under the control of a researcher and is conducted in a natural situation

② also known as natural experiment

③ objective: program evaluation

⑵ method 1. differences-in-differences (DID) estimator

① the simplest model (assuming panel data)
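
A sketch of the DID estimator in the standard notation, which is assumed here. ΔY_i is the change in the outcome between the before and after periods, X_i is the treatment indicator, G_i the treatment group indicator, and D_t the after-period indicator.

```latex
\[
\begin{aligned}
\Delta Y_i &= \beta_0 + \beta_1 X_i + u_i \\
\hat{\beta}_1^{DID} &= \left( \bar{Y}^{\,treat,\,after} - \bar{Y}^{\,treat,\,before} \right)
                     - \left( \bar{Y}^{\,control,\,after} - \bar{Y}^{\,control,\,before} \right) \\
\text{repeated cross-sections:} \quad
Y_{it} &= \beta_0 + \beta_1 X_{it} + \beta_2 G_i + \beta_3 D_t + u_{it},
\qquad X_{it} = G_i \times D_t
\end{aligned}
\]
```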


Figure. 3. graphical representation of DID estimator


② model with additional regression variables (assuming panel data): because conditions may change between before and after data


③ criterion for repeated cross-sectional data 
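
The panel-data version of the DID estimator can be checked on simulated data. This is a sketch only: the group sizes, the common trend of 1, and the true treatment effect of 2 are illustrative assumptions.

```r
# Simulated two-period panel; all values are illustrative assumptions.
set.seed(4)
n <- 2000
treat <- rep(c(0, 1), each = n / 2)                # treatment group indicator
y_before <- 10 + 3 * treat + rnorm(n)              # groups may differ before treatment
y_after <- y_before + 1 + 2 * treat + rnorm(n)     # common trend 1, true effect 2

# DID as a difference of average changes
dy <- y_after - y_before
did_means <- mean(dy[treat == 1]) - mean(dy[treat == 0])

# the same number from regressing the change on the treatment indicator
did_reg <- unname(coef(lm(dy ~ treat))[2])
```

The pre-existing gap of 3 between the groups and the common trend both cancel, so the estimate is close to 2.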


⑶ method 2. instrumental variable regression

① 1st. define Zi as a variable that is assigned as if in a randomized controlled experiment

② 2nd. Zi is a good instrumental variable for Xi: instrument relevance is satisfied

③ 3rd. Yi is the result of interest.

④ 4th. evaluate the effect of Xi on Yi with Zi as an instrumental variable 

⑷ method 3. regression discontinuity design (RDD)

① overview

○ if a threshold (cut-off) ω0 is set, the observations just below and just above the threshold are likely to be similar

○ when the observations on the two sides of the threshold are treated differently, the difference in outcomes at the threshold can be regarded entirely as a treatment effect

○ it is a very popular quasi-experimental technique

○ disadvantage : the estimate is local to the threshold and is difficult to apply to observations far from it

② sharp regression discontinuity design


Figure. 4. sharp regression discontinuity design


③ fuzzy regression discontinuity design 

○ crossing the threshold may not determine the treatment as cleanly as the Xi defined in the sharp regression discontinuity design; it may only change the probability of treatment

○ the following instrumental variable Zi can be a good instrumental variable on actual Xi  
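
A sketch of the two designs. W_i denotes the running variable, and treating the units at or above the threshold ω_0 is an assumption made here (the direction may be reversed in a given application).

```latex
\[
\begin{aligned}
\text{sharp:} \quad & X_i = \mathbf{1}\{ W_i \ge \omega_0 \}, \qquad
Y_i = \beta_0 + \beta_1 X_i + \beta_2 W_i + u_i \\
\text{fuzzy:} \quad & Z_i = \mathbf{1}\{ W_i \ge \omega_0 \}
\ \text{is used as an instrumental variable for the actual treatment } X_i
\end{aligned}
\]
```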


⑸ threats to internal validity

① failure to randomize

○ not only the treatment effect but also the non-random assignment effect appears

○ hypothesis test : regress Xi on the pre-treatment characteristics W1i, ···, Wri; if the hypothesis that all coefficients are zero is not rejected, the experiment can be regarded as a randomized experiment

○ example : if the treatment is assigned by name, a specific ethnic group may be assigned to the treatment group preferentially

② failure to follow treatment protocol (partial compliance)

○ definition : even when random assignment is carried out properly, subjects may not follow the assigned treatment

○ due to this, Xi can be correlated with ui

○ randomized encouragement design: partial compliance can be handled by using the randomly assigned treatment as an instrumental variable for the treatment actually received

③ attrition

○ definition: subjects drop out of the study after random assignment for reasons related to the treatment

④ Hawthorne effect 

○ no reason to be cautious about the Hawthorne effect in a quasi-experiment: because it is a natural experiment

⑤ instrumental variable effectiveness

○ instrument relevance can be evaluated through data

○ instrument exogeneity may not hold even if the instrumental variable appears to be randomly assigned

○ example: when researchers study income using the draft lottery number, people with low numbers may change their behavior to avoid conscription, so the lottery number can be correlated with ui

⑹ threats to external validity  

① non-representative sample 

② non-representative program or policy 

③ general equilibrium effect 

⑺ criticism

① much effort goes into finding good variables in quasi-experiments

② there aren't that many really good quasi-experiments

6. heterogeneous population 

⑴ definition: the case in which the coefficients of the regression line, β0i and β1i, are not constants but vary across individuals in the sample


① β1i : heterogeneous effect of Xi

② the parameter of interest is E(β1i) : the average treatment effect (ATE)

③ if β1i is observable, models using interaction can be used 

④ if β1i is unobservable, it is analyzed as follows

⑵ OLS 

① assumption : Xi should be random → Xi and (ui, β0i, β1i) should be independent

○ conditions that are difficult to satisfy in practice

② formula


⑶ instrumental variables estimation (IV) 

① assumption : Zi should be random → Zi and (ui, vi, β0i, β1i, π0i, π1i) should be independent

② formula 


○ E(β1iπ1i) / E(π1i) is called local average treatment effect (LATE)
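
A sketch of the heterogeneous-coefficient model and the two large-sample limits referred to in ⑵ and ⑶, written in the notation already used in the assumptions (β_0i, β_1i, π_0i, π_1i, u_i, v_i). The exact form of the lost formulas is assumed.

```latex
\[
\begin{aligned}
Y_i &= \beta_{0i} + \beta_{1i} X_i + u_i, \qquad
X_i = \pi_{0i} + \pi_{1i} Z_i + v_i \\
\hat{\beta}_1^{OLS} &\xrightarrow{\;p\;} E(\beta_{1i})
\quad \text{(average treatment effect, when } X_i \text{ is randomly assigned)} \\
\hat{\beta}_1^{TSLS} &\xrightarrow{\;p\;} \frac{E(\beta_{1i}\,\pi_{1i})}{E(\pi_{1i})}
\quad \text{(local average treatment effect)}
\end{aligned}
\]
```

The LATE weights each individual's effect β_1i by π_1i, that is, by how strongly the instrumental variable moves that individual's X_i.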

③ conditions for equalizing LATE and ATE

case 1. β1i = β1 = constant: no heterogeneity in the treatment effect

○ case 2. π1i = π1 : no heterogeneity in the effect of the instrumental variable

○ case 3. β1i and π1i are independent

④ connotations

○ difficult to evaluate instrument exogeneity

○ a rejection by the J-test may only reflect differences between the LATEs of different instrumental variables

Input: 2019.11.26 10:29
