/*
The Inequality Key Figures disseminated on the LIS website are computed using
the R programming language. When attempting to replicate these figures in
Stata, small discrepancies may occasionally arise. These differences are due
to variations in numerical precision between the two software environments,
as well as dataset-specific characteristics. For example, in datasets with
very small subsamples (such as poorsm – rs19), results may be highly sensitive
to whether a few households are classified as poor or not. In other datasets
with many identical values, even minor precision differences can affect
poverty classification and lead to noticeable mismatches. Although such
discrepancies are very rare, users seeking full replication of the published
figures are advised to use R. For further questions, please contact the LIS
User Support team at usersupport@lisdatacenter.org
*/

// package = stata
// project = lis

// To select specific datasets, other than an entire country series, use option 'ccyy()' instead of 'iso2()' --> example: ccyy(de22 it16)

// 1) Load data
lissyuse, iso2(lu) ///
    hvars(hid hwgt nhhmem dhi dname year) // (data is at household-level)

levelsof dname, local(levels)
foreach ccyy of local levels {

    preserve   // keep the full multi-dataset file so the next iteration can subset it again

    di "`ccyy'"
    keep if dname == "`ccyy'"

    // 2) Data preparation

    *===================================
    * Filter out missing observations
    *===================================
    drop if missing(dhi)

    *===================================
    * Create person weight (data is at household-level)
    *===================================
    generate double pwt = hwgt * nhhmem

    *===================================
    * Bottom and top coding / outlier detection
    *===================================
    generate double dhi_log = log(dhi)
    replace dhi_log = 0 if dhi_log == . & dhi != .
    // keep negatives and 0
    sort dhi_log hwgt
    qui percentils dhi_log [aw = hwgt], p(25 75)
    local p75 = e(Perc_75)
    local p25 = e(Perc_25)
    gen double iqr = `p75' - `p25'                         // interquartile range
    gen double upper_bound = `p75' + (iqr * 3)             // upper bound for extreme values
    gen double lower_bound = `p25' - (iqr * 3)             // lower bound for extreme values
    replace dhi=exp(upper_bound) if dhi>exp(upper_bound)   // top code income at upper bound for extreme values
    replace dhi=exp(lower_bound) if dhi