/*******************************************************************************
********************************************************************************
********************************************************************************
This do-file shows how the MR-LATE estimator of Calvi, Lewbel and Tommasi (2017)
is implemented. For any coding question, please e-mail:
denni.tommasi@gmail.com.
As an example of how MR-LATE can be used, we simulate the following situation,
which is very common for applied researchers. Suppose that we want to estimate
the effect of an endogenous treatment variable D on an outcome variable Y, using
Z as an instrument. The problem we face is that D is not directly observed. We
only have a proxy T of the true treatment status. For example, we may be
interested in studying the effects of bargaining power on the health status of
family members, just like in our empirical application. Many other examples are
possible, in any field of research. As a proxy of the true treatment "power"
we may use a question in our dataset like the following:
"Who usually makes decisions about food in your household?"
And the answers we may have are:
1) man
2) both
3) woman
As applied researchers, we may decide that if the answer is 1), then this is a
household where the man makes the most important decisions, so he has "power".
If the answer is 3), the woman is the one having power. If the answer is 2), we
cannot know who actually makes decisions in the household. Answers like 2)
cause measurement error in the treatment, and hence biased estimates.
MR-LATE solves this problem. The following code implements a simple simulation
of MR-LATE. The naive OLS and IV estimators it can be compared with are left
commented out at the end of the program.
********************************************************************************
********************************************************************************
*******************************************************************************/
clear
clear matrix
program drop _all
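* set mem is ignored from Stata 12 onwards and set matsize from Stata 16
* onwards; both are kept for compatibility with older versions
set mem 10m
set matsize 800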
set seed 1234
set more off
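* Optional check (an addition to the original file, offered as a sketch):
* lincomest, used in section 1, is a user-written command and is not part of
* official Stata. The lines below assume it is distributed on SSC and stop
* with an error message if it is not found on the adopath.
capture which lincomest
if _rc {
    display as error "lincomest not found; try: ssc install lincomest"
    exit 199
}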
********************************************************************************
** 0. Set up
********************************************************************************
global obs "10000" // observations per replication
global Nreps "100" // number of Monte Carlo replications
global c "0.5" // mean bargaining power
global z "0.1" // strength of the instrument
********************************************************************************
** 1. Program MR-LATE
********************************************************************************
program MRLATE, rclass
drop _all
/* Set-up */
* Set number of obs
set obs $obs
* gen regressor
gen x = rnormal(0,0.1)
label var x "covariate"
* gen omitted variable
gen a = rnormal(0,0.1)
* gen binary instrumental variable (only 10% of individuals are eligible)
gen z_temp = runiform()
gen z = (z_temp > 0.9)
label var z "binary instrument"
drop z_temp
tab z
* gen error terms
gen u = rnormal(0,0.1)
sum u
replace u = u - r(mean)
gen v0 = rnormal(0,1)
sum v0
replace v0 = v0 - r(mean)
gen v1 = rnormal(0,1)
sum v1
replace v1 = v1 - r(mean)
*** True participation
gen d_star = $c + $z*z + 0.1*x + 0.1*a + u
gen d = (d_star > $c)
label var d "True status"
*** Generate the 3-valued proxy of the treatment (terciles of d_star)
xtile t = d_star, nq(3)
label var t "Proxy of the true treatment"
tab t, g(t_g)
*** Potential outcomes
gen y0 = 0.5 + 1*x + 1*a + v0
gen y1 = 1.5 + 1*x + 1*a + v1
gen y = (1-d)*y0 + d*y1
label var y "Observed outcome"
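/* Re-classify */
* t: t_a flags the top tercile of the proxy, t_b the bottom tercile
gen t_a = (t == 3)
gen t_b = (t == 1)
* y: outcome interacted with each indicator
gen y_a = t_a*y
gen y_b = t_b*y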
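* Sanity check (an addition, offered as a sketch): in this simulated design
* the top tercile of d_star should contain only truly treated units and the
* bottom tercile only untreated ones. The asserts stop the run if that fails,
* which could happen with a much smaller sample; drop them if not wanted.
assert d == 1 if t_a == 1
assert d == 0 if t_b == 1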
/* Estimation: MR-LATE*/
* t_a
* 1st stage
reg t_a z x
predict t_a_hat, xb
* 2nd stage
reg y_a t_a_hat x
est store new_a
* t_b
* 1st stage
reg t_b z x
predict t_b_hat, xb
* 2nd stage
reg y_b t_b_hat x
est store new_b
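*** MR-LATE: difference between the two second-stage coefficients
* lincomest (user-written) posts the combination as e(b), which simulate collects
gsem (y_a <- t_a_hat x) (y_b <- t_b_hat x), nocaps
lincomest _b[y_a: t_a_hat] - _b[y_b: t_b_hat]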
/*******************************************************************************
Remark. For estimation on an actual dataset, you may want to bootstrap your
standard errors. This is how it can be done (replace $seed and the cluster
variable with your own; neither is defined in this file):
xi: bootstrap , reps(1000) seed($seed) cluster(cluster): ///
gsem (y_a <- t_a_hat x) (y_b <- t_b_hat x), nocaps
lincomest _b[y_a: t_a_hat] - _b[y_b: t_b_hat]
*******************************************************************************/
/* Estimation: OLS or IV (estimates of the naive approach will be biased) */
* 1. OLS
*reg y x t
* 2. IV
*gen t_p = (t == 3) // This is how a naive approach would define the treatment
*ivregress 2sls y x (t_p=z)
end
********************************************************************************
** 2. MC simulations
********************************************************************************
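* _sim_1 collects the MR-LATE point estimate from each replication
simulate _b _se, reps($Nreps): MRLATE
sum _sim_1, d
* The true treatment effect is 1 (y1 - y0 = 1.5 - 0.5)
scalar Bias = r(mean) - 1
scalar MSE = Bias^2 + r(Var)
di Bias
di MSE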
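* Sketch (an addition to the original file): root MSE and the Monte Carlo
* standard error of the mean estimate, to judge whether the reported bias is
* distinguishable from simulation noise. Assumes _sim_1 and the scalar MSE
* from the lines above.
quietly sum _sim_1
di as text "RMSE = " as result %9.4f sqrt(MSE)
di as text "MC s.e. of mean = " as result %9.4f r(sd)/sqrt(r(N))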