* DISCLAIMER: The following code is provided as is and should be used at your own risk. * No user support is available. * Andrew Jones, November 2006 * This program relates to the material described in Chapter 3 of the book * Jones, A.M., Rice, N., Bago d'Uva, T. & Balia, S. (2007) * "Applied Health Economics", London: Routledge {ISBN: 9780415397728} * DO FILE FOR INTERVAL REGRESSION AND RELATED COMPUTATIONS * DESIGNED FOR USE WITH CANADIAN NPHS DATA capture clear capture log close set more off set memory 80000 log using JHE-hedg.log, replace use nphs94nm.dta #delimit ; global xvar "lincome educ1 educ2 educ3 educ4 househ student disabled unemploy retired other married div_wid m20_24 m25_29 m30_34 m35_39 m40_44 m45_49 m50_54 m55_59 m60_64 m65_69 m70_74 m75_79 m80_ f15_19 f20_24 f25_29 f30_34 f35_39 f40_44 f45_49 f50_54 f55_59 f60_64 f65_69 f70_74 f75_79 f80_"; #delimit cr; * Programmes for WEIGHTED data, using pweights = nmweight set matsize 800 capture drop predhui * compute percentiles of sah and hui cutoffs corresponding to these percentiles; svyprop sah [pweight=nmweight]; * compute cum frequencies in Excel; * compute hui cutoff points for percentile values in SPSS using freq option; * insert hui cutoff values in sah1 and sah2; * attributing lower and upper HUI bounds to sah intervals; recode sah1 0.428=0.428 0.773=0.756 0.897=0.897 0.968=0.947 recode sah2 0.428=0.428 0.773=0.756 0.897=0.897 0.968=0.947 tab sah1 sah2 * ANALYSIS OF THE DISTRIBUTION OF HUI; * kernel densities of HUI and ln(HUI); summ hui,detail sort sah by sah: summ hui, detail sort hui egen rhui=rank(hui) sort sah by sah: summ rhui,detail gen lnhui=ln(1-hui) if hui<1 replace lnhui=ln(0.001) if hui==1 kdensity hui kdensity lnhui *EMPIRICAL DISTRIBUTION FUNCTIONS; * full sample cumul hui, gen(chui) graph twoway scatter chui hui, ti("Empirical CDF of HUI") * by income quintile gen iquin=0 replace iquin=1 if rlincome<0.2 replace iquin=2 if rlincome>=0.2 & rlincome<0.4 replace iquin=3 if rlincome>=0.4 & rlincome<0.6 replace iquin=4 if rlincome>=0.6 & rlincome<0.8 replace iquin=5 if rlincome>=0.8 tab iquin [aweight=nmweight] cumul hui if iquin==1, gen(chui1) cumul hui if iquin==2, gen(chui2) cumul hui if iquin==3, gen(chui3) cumul hui if iquin==4, gen(chui4) cumul hui if iquin==5, gen(chui5) graph twoway scatter chui1 chui2 chui3 chui4 chui5 hui, ti("Empirical CDFs of HUI, by income quintile") * compute percentiles of sah and hui cutoffs corresponding to these percentiles for different income quintiles; svyprop sah [pweight=nmweight]if iquin==1 svyprop sah [pweight=nmweight]if iquin==2 svyprop sah [pweight=nmweight]if iquin==3 svyprop sah [pweight=nmweight]if iquin==4 svyprop sah [pweight=nmweight]if iquin==5 * PERCENTILES OF HUI CORRESPONDING TO FREQUENCIES OF SAH, WITH STD ERRORS; centile hui, centile(2.4 11 38.1 75.2) _pctile hui [pweight=nmweight], percentiles(2.4,11,38.1,75.2) return list centile lnhui, centile(2.4 11 38.1 75.2) _pctile lnhui [pweight=nmweight], percentiles(2.4,11,38.1,75.2) return list * TEST FOR NORMALITY sktest hui [aweight=nmweight] sktest lnhui [aweight=nmweight] * Generalised Lorenz and concentration curves; glcurve hui[aweight=nmweight] glcurve hui [aweight=nmweight], lorenz glcurve hui [aweight=nmweight], sortvar(lincome) glcurve hui [aweight=nmweight], sortvar(lincome) lorenz * Use to test for dominance glcurve hui [aweight=nmweight], sortvar(lincome) lorenz by(sex) split ti("Concentration curves for HUI, by gender") glcurve hui [aweight=nmweight], by(iquin) split glcurve hui [aweight=nmweight], by(iquin) split lorenz ti("Lorenz curves for HUI, by income quintile") * DESCRIPTIVE STATISTICS FOR HUI BY CATEGORY OF SAH; sort sah table sah, contents(n hui mean hui min hui max hui sd hui) table sah, contents(p25 hui p50 hui p75 hui) *table sah, contents(n lnhui mean lnhui min lnhui max lnhui sd lnhui) *table sah, contents(p25 lnhui p50 lnhui p75 lnhui) *by iquin: table sah, contents(n hui mean hui min hui max hui sd hui) *by iquin: table sah, contents(p25 hui p50 hui p75 hui) *by iquin: tab sah * INTERVAL REGRESSION * unconditional predictions for decomposition * interval regression intreg sah1 sah2 $xvar [pweight=nmweight] capture drop predhui predict predhui table sah, contents(n predhui mean predhui min predhui max predhui sd predhui) table sah, contents(p25 predhui p50 predhui p75 predhui) * conditional predictions (on sah) * interval regression (weighted) *intreg sah1 sah2 $xvar [pweight=nmweight] capture drop prechui predict prechui, e(sah1,sah2) *RESET test predict yf gen yf2=yf^2 quietly intreg sah1 sah2 yf2 $xvar [pweight=nmweight] test yf2 drop yf yf2 * OLS REGRESSION - using hui and sahhui summ hui if hui==1 summ hui reg hui $xvar [pweight=nmweight] * conditional prediction capture drop prhui1 predict prhui1, e(sah1,sah2) table sah, contents(n prhui1 mean prhui1 min prhui1 max prhui1 sd prhui1) table sah, contents(p25 prhui1 p50 prhui1 p75 prhui1) * unconditional predictions for decomposition capture drop prhui1 predict prhui1 predict ehui, resid summ ehui, detail kdensity ehui , ti("kernel density estimate for OLS residuals") * UNCONDITIONAL PREDICTIONS table sah, contents(n prhui1 mean prhui1 min prhui1 max prhui1 sd prhui1) table sah, contents(p25 prhui1 p50 prhui1 p75 prhui1) *RESET test predict yf gen yf2=yf^2 quietly reg hui yf2 $xvar[pweight=nmweight] test yf2 drop yf yf2 * ORDERED PROBIT REGRESSIONS oprobit sah $xvar [pweight=nmweight], ro capture drop prhui3 * unconditional predictions predict prhui3, xb by sah: summ prhui3 [aweight=nmweight] predict yf, xb * RESCALE ORDERED PROBIT PREDICTIONS *note sigma=1 for ordered probit * CONDITIONAL MEANS gen ey1 = yf +(-normden(_b[_cut1]-yf))/(norm(_b[_cut1]-yf)) if sah==1 gen ey2 = yf +(normden(_b[_cut1]-yf) - normden(_b[_cut2]-yf))/(norm(_b[_cut2]-yf) - norm(_b[_cut1]-yf)) if sah==2 gen ey3 = yf +(normden(_b[_cut2]-yf) - normden(_b[_cut3]-yf))/(norm(_b[_cut3]-yf) - norm(_b[_cut2]-yf)) if sah==3 gen ey4 = yf +(normden(_b[_cut3]-yf) - normden(_b[_cut4]-yf))/(norm(_b[_cut4]-yf) - norm(_b[_cut3]-yf)) if sah==4 gen ey5 = yf +(normden(_b[_cut4]-yf))/(1 - norm(_b[_cut4]-yf)) if sah==5 capture drop y2 y3 * rescale ordered probit predictions to [0,1] interval gen y2 = (yf - (-1.45436))/(1.96409 - (-1.45436)) * rescale to predicted value range of HUI gen y3 = 0.457835 + (0.96659-0.457835)*y2 * rescale to Cutler etc. gen y4 = (yf -_b[_cut1])/(_b[_cut4]-_b[_cut1]) gen y41=y4 replace y41=0 if y4<0 replace y41=1 if y4>1 summ yf y2 y3 y4 y41 * re-scale conditional means using y2 = a + b.yf = 0.425442 + 0.2925302yf gen ey21 = 0.425442 + 0.29253*ey1 gen ey22 = 0.425442 + 0.29253*ey2 gen ey23 = 0.425442 + 0.29253*ey3 gen ey24 = 0.425442 + 0.29253*ey4 gen ey25 = 0.425442 + 0.29253*ey5 *rescale conditional means using y3= c + dyf = 0.6742807 + 0.1488262yf gen ey31 = 0.6742807 + 0.1488262*ey1 gen ey32 = 0.6742807 + 0.1488262*ey2 gen ey33 = 0.6742807 + 0.1488262*ey3 gen ey34 = 0.6742807 + 0.1488262*ey4 gen ey35 = 0.6742807 + 0.1488262*ey5 *rescale using y4 gen ey41 = ey1/(_b[_cut4]-_b[_cut1]) gen ey42 = ey2/(_b[_cut4]-_b[_cut1]) gen ey43 = ey3/(_b[_cut4]-_b[_cut1]) gen ey44 = ey4/(_b[_cut4]-_b[_cut1]) gen ey45 = ey5/(_b[_cut4]-_b[_cut1]) * summary table conditional predictions sort sah by sah: summ ey1 ey2 ey3 ey4 ey5 [aweight=nmweight] by sah: summ ey21 ey22 ey23 ey24 ey25[aweight=nmweight] by sah: summ ey31 ey32 ey33 ey34 ey35 [aweight=nmweight] summ ey1 ey2 ey3 ey4 ey5 [aweight=nmweight] summ ey21 ey22 ey23 ey24 ey25[aweight=nmweight] summ ey31 ey32 ey33 ey34 ey35 [aweight=nmweight] summ ey41 ey42 ey43 ey44 ey45 [aweight=nmweight] summ ey1 ey2 ey3 ey4 ey5 summ ey21 ey22 ey23 ey24 ey25 summ ey31 ey32 ey33 ey34 ey35 summ ey41 ey42 ey43 ey44 ey45 by sah: summ hui sahhui prechui prhui1c prhui2c [aweight=nmweight] * RESET test gen yf2=yf^2 quietly oprobit sah yf2 $xvar [pweight=nmweight], ro test yf2 capture drop yf yf2 * GENERALISED ORDERED PROBIT gen cdsah1=0 gen cdsah2=0 gen cdsah3=0 gen cdsah4=0 replace cdsah1=1 if sah>1 replace cdsah2=1 if sah>2 replace cdsah3=1 if sah>3 replace cdsah4=1 if sah>4 probit cdsah1 $xvar [pweight=nmweight], ro predict cdyf1, xb probit cdsah2 $xvar [pweight=nmweight], ro predict cdyf2, xb probit cdsah3 $xvar [pweight=nmweight], ro predict cdyf3, xb probit cdsah4 $xvar [pweight=nmweight], ro predict cdyf4, xb gen prhop=cdsah1*cdyf1+cdsah2*cdyf2+cdsah3*cdyf3+cdsah4*cdyf4 summ prhui3 prhop graph prhui3 prhop cor prhui3 prhop reg prhui3 prhop [pweight=nmweight]