/*******************************************************************************
********************************************************************************
********************************************************************************

This do-file shows how the MR-LATE estimator of Calvi-Lewbel-Tommasi (2017) is
actually implemented. For any coding question, please e-mail:
denni.tommasi@gmail.com.

As an example of how MR-LATE can be used, we simulate the following situation,
which is very common for applied researchers. Suppose that we want to estimate
the effect of an endogenous treatment variable D on an outcome variable Y,
using Z as an instrument. The problem we face is that D is not directly
observed. We only have a proxy T of the true treatment status.

For example, we may be interested in studying the effects of bargaining power
on the health status of family members, as in our empirical application. Many
other examples are possible, in any field of research. As a proxy of the true
treatment "power" we may use a question in our dataset like the following:

"Who usually makes decisions about food in your household?"

And the answers we may have are:
1) man
2) both
3) woman

As applied researchers, we may decide that if the answer is 1), then this is a
household where the man makes the most important decisions, so he has "power".
If the answer is 3), the woman is the one having power. If the answer is 2), we
cannot know who actually makes decisions in the household. Answers like 2) are
what cause measurement error in the treatment, and hence biased estimates.
MR-LATE solves this problem.

The following code implements a simple simulation to compare MR-LATE with the
standard OLS and IV estimators.

********************************************************************************
********************************************************************************
*******************************************************************************/

clear
clear matrix
program drop _all
set mem 10m          // ignored by Stata 12 and later
set matsize 800
set seed 1234
set more off

* Note: lincomest is a user-written command (ssc install lincomest)

********************************************************************************
** 0. Set up
********************************************************************************

global obs "10000"
global Nreps "100"
global c "0.5"   // mean bargaining power
global z "0.1"   // strength of the instrument

********************************************************************************
** 1. Program MR-LATE
********************************************************************************

program MRLATE, rclass
    drop _all

    /* Set-up */

    * Set number of obs
    set obs $obs

    * gen regressor
    gen x = rnormal(0,0.1)
    label var x "covariate"

    * gen omitted variable
    gen a = rnormal(0,0.1)

    * gen binary instrumental variable (only 10% of individuals are eligible)
    gen z_temp = runiform()
    gen z = (z_temp > 0.9)
    label var z "binary instrument"
    drop z_temp
    tab z

    * gen error terms (demeaned within each sample)
    gen u = rnormal(0,0.1)
    sum u
    replace u = u - r(mean)
    gen v0 = rnormal(0,1)
    sum v0
    replace v0 = v0 - r(mean)
    gen v1 = rnormal(0,1)
    sum v1
    replace v1 = v1 - r(mean)

    *** True participation
    gen d_star = $c + 0.1*z + 0.1*x + 0.1*a + u
    gen d = (d_star > $c)
    label var d "True status"

    *** Generate the 3 values of the treatment variable (terciles of d_star)
    xtile t = d_star, nq(3)
    label var t "Proxy of the true treatment"
    tab t, g(t_g)

    *** Potential outcomes (the true treatment effect is 1.5 - 0.5 = 1)
    gen y0 = 0.5 + 1*x + 1*a + v0
    gen y1 = 1.5 + 1*x + 1*a + v1
    gen y = (1-d)*y0 + d*y1
    label var y "Observed outcome"

    /* Re-classify */

    * t
    gen t_a = (t == 3)
    gen t_b = (t == 1)

    * y
    gen y_a = t_a*y
    gen y_b = t_b*y

    /* Estimation: MR-LATE */

    * t_a
    * 1st stage
    reg t_a z x
    predict t_a_hat, xb
    * 2nd stage
    reg y_a t_a_hat x
    est store new_a

    * t_b
    * 1st stage
    reg t_b z x
    predict t_b_hat, xb
    * 2nd stage
    reg y_b t_b_hat x
    est store new_b

    *** MR-LATE
    gsem (y_a <- t_a_hat x) (y_b <- t_b_hat x), nocaps
    lincomest _b[y_a: t_a_hat] - _b[y_b: t_b_hat]

/*******************************************************************************
Remark. For estimation on an actual dataset, you may want to bootstrap your
standard errors. This is how it can be done ($seed and the cluster variable
"cluster" are placeholders that you need to define):

xi: bootstrap , reps(1000) seed($seed) cluster(cluster): ///
    gsem (y_a <- t_a_hat x) (y_b <- t_b_hat x), nocaps
lincomest _b[y_a: t_a_hat] - _b[y_b: t_b_hat]
*******************************************************************************/

    /* Estimation: OLS or IV (estimates of the naive approach will be biased) */

    *1
    *reg y x t

    *2
    *gen t_p = (t == 3) // This is how a naive approach would define the treatment
    *ivregress 2sls y x (t_p=z)

end

********************************************************************************
** 2. MC simulations
********************************************************************************

simulate _b _se, reps($Nreps): MRLATE

* Bias and MSE relative to the true treatment effect of 1
sum _sim_1, d
scalar Bias = r(mean) - 1
scalar MSE = Bias^2 + r(Var)
di Bias
di MSE
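
********************************************************************************
** 3. Sketch: re-classifying a survey proxy on an actual dataset
********************************************************************************

/*
A minimal sketch, not part of the simulation above, of how the re-classification
and estimation steps of Section 1 might look on an actual dataset. It is kept
inside a comment because all variable names are hypothetical and must be
replaced with your own:
    decision : survey answer (1 = man, 2 = both, 3 = woman)
    outcome  : observed outcome Y
    instr    : binary instrument Z
    covar    : covariate

    * Re-classify: 3) is taken as treated, 1) as untreated
    gen t_a = (decision == 3) if !missing(decision)
    gen t_b = (decision == 1) if !missing(decision)
    gen y_a = t_a*outcome
    gen y_b = t_b*outcome

    * First stages
    reg t_a instr covar
    predict t_a_hat, xb
    reg t_b instr covar
    predict t_b_hat, xb

    * MR-LATE
    gsem (y_a <- t_a_hat covar) (y_b <- t_b_hat covar), nocaps
    lincomest _b[y_a: t_a_hat] - _b[y_b: t_b_hat]

As in the simulation, households answering 2) are set to zero in both t_a and
t_b, so the ambiguous answers are never counted as treated or as untreated.
*/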