Using R on the LISSY system

This page explains the special methods used to load LIS and LWS data in R and generate output, and documents the add-on packages currently available on the system.

Loading Data

Data is loaded into the workspace with the read.LIS function, which takes one required argument and three optional arguments:

Usage

read.LIS(ccyyuu, labels=TRUE, vars=NULL, subset=NULL)

Arguments

ccyyuu A string or vector of strings containing dataset identifiers, indicating which datasets to load. For the format of the identifier, see Details.
labels A logical value (default TRUE) indicating whether to use the value labels for categorical variables. If TRUE, categorical variables are converted to factors using their labels. If FALSE, the numeric codes are kept.
vars An optional vector of names of the variables to load. By default, all variables are returned.
subset An optional string giving a condition that selects the observations to return, e.g. "nhhmem17>0". By default, all observations are returned.

Details

The identifier ccyyuu consists of a two-letter country code and a two-digit year, followed by a one-letter code identifying the type of dataset within each database:

LIS datasets: 'h' or 'p' for the household or person file, e.g. "lu04h", "us10p"
LWS datasets: 'h', 'p' or 'r' for the household, person or implicate file, e.g. "ca05h", "us10p" or "us10r"
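
Because dataset identifiers are plain strings, a vector of identifiers covering several years can be built programmatically instead of typed by hand. The sketch below uses base R's sprintf(); the helper parse_ccyyuu is hypothetical (it is not part of LISSY) and simply splits identifiers into the three parts described above.

```r
# Build identifiers for the Luxembourg household files of 2004 and 2010;
# %02d pads the year to two digits (4 -> "04").
years <- c(4L, 10L)
ids <- sprintf("lu%02dh", years)
print(ids)

# Hypothetical helper (not part of LISSY): split identifiers such as
# "us10p" into country code, two-digit year and one-letter file type.
parse_ccyyuu <- function(id) {
  list(country = substr(id, 1, 2),
       year    = substr(id, 3, 4),
       type    = substr(id, 5, 5))
}

parts <- parse_ccyyuu(c("lu04h", "us10r"))
print(parts$type)
```

On LISSY, the resulting vector could then be passed directly to read.LIS, e.g. read.LIS(ids, vars=c('dname','hwgt','dhi')).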

If a vector of multiple dataset identifiers is provided, the datasets will be concatenated and returned as a single “stacked” data frame. Attempting to simultaneously retrieve two datasets containing different variables (e.g., a household and a person file) will cause an error, because such incompatible datasets cannot be stacked on top of one another.
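
The stacking rule can be illustrated locally with base R's rbind(), which behaves analogously: data frames with the same columns stack, while frames with different columns raise an error. The toy data frames below are invented and are not LIS data, and the source does not state that read.LIS uses rbind() internally.

```r
# Two toy "household" frames with identical columns stack cleanly.
h04 <- data.frame(dname = "lu04", hid = 1:2, dhi = c(30000, 45000))
h10 <- data.frame(dname = "lu10", hid = 1:3, dhi = c(32000, 50000, 41000))
stacked <- rbind(h04, h10)
print(nrow(stacked))   # 2 rows from 2004 plus 3 rows from 2010

# A toy "person" frame has different columns, so stacking fails;
# the error message is captured instead of stopping the script.
p10 <- data.frame(dname = "lu10", hid = c(1, 1, 2),
                  pid = c(1, 2, 1), age = c(40, 12, 67))
result <- tryCatch(rbind(h10, p10), error = function(e) conditionMessage(e))
print(result)
```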

See Examples for code that retrieves the household and person files separately and then generates a merged file, with household data appended to individual records.

When labels is set to FALSE, the value labels will still be stored in the attributes of the data frame, in the named list attribute “label.table”.
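
With labels=FALSE, the stored labels can still be used to convert the numeric codes afterwards. The sketch below builds a toy data frame by hand and assumes that each element of label.table is a named numeric vector whose values are the codes and whose names are the labels, as readstata13's read.dta13 stores them; the codes and labels shown are invented and are not the actual LIS coding of educ.

```r
# Toy data: numeric education codes plus a hand-built label.table
# attribute (assumed structure: names = labels, values = codes).
ds <- data.frame(pid = 1:5, educ = c(1, 2, 2, 3, 1))
attr(ds, "label.table") <- list(educ = c(low = 1, medium = 2, high = 3))

# Look up the labels for educ and turn the codes into a factor.
lt <- attr(ds, "label.table")$educ
ds$educ_f <- factor(ds$educ, levels = lt, labels = names(lt))
print(table(ds$educ_f))
```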

Value

A data frame with attributes. See the documentation for read.dta13 (in the readstata13 package) for more information.

Examples

# Load the household file for Luxembourg 2010
ds <- read.LIS('lu10h')

# Load the person file for Luxembourg 2010, numeric codes only
ds <- read.LIS('lu10p', labels=FALSE)
print(table(ds$educ))
print(attributes(ds)$label.table$educ)

# Load a combined dataset for Luxembourg 2004 and 2010, containing
# only the dataset name, weights, and disposable household income, for
# households containing children under age 18.
ds <- read.LIS(c('lu04h','lu10h'),
               vars=c('dname','hwgt','dhi'),
               subset="nhhmem17>0")

# Load and merge the household and person files for Luxembourg 2010
dsh <- read.LIS('lu10h')
dsp <- read.LIS('lu10p')
ds <- merge(dsp, dsh, by="hid")
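
When household and person files for several datasets are stacked, hid by itself may repeat across datasets (household 1 in 2004 is not household 1 in 2010), so the merge key should probably also include the dataset name. The sketch below uses toy data frames with the dname and hid variables from the examples above; treating c("dname", "hid") as the right key is an assumption about how household identifiers are assigned, and all values are invented.

```r
# Toy stacked household file: hid restarts at 1 in each dataset.
dsh <- data.frame(dname = c("lu04", "lu04", "lu10"),
                  hid   = c(1, 2, 1),
                  dhi   = c(30000, 45000, 52000))
# Toy stacked person file: one row per person.
dsp <- data.frame(dname = c("lu04", "lu04", "lu04", "lu10"),
                  hid   = c(1, 1, 2, 1),
                  pid   = c(1, 2, 1, 1))

# Matching on hid alone pairs persons with households from the
# wrong year, producing more rows than there are persons.
bad <- merge(dsp, dsh, by = "hid")
print(nrow(bad))

# Matching on dataset name and hid gives one row per person.
ds <- merge(dsp, dsh, by = c("dname", "hid"))
print(nrow(ds))
print(ds$dhi)
```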

Producing Output

Because of LIS security procedures, you may need to make small modifications to your code in order to produce output. To ensure that a result appears in the log file, you must explicitly print() the object, for example:

print(table(x))

In addition, the printing of objects of class data.frame has been disabled.
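
Since data frames cannot be printed, results are best reduced to vectors, tables or model summaries and passed to print(). The sketch below computes a weighted mean of disposable household income (dhi) by dataset using the household weight (hwgt) with base R's weighted.mean(); the toy values are invented, and on LISSY ds would come from read.LIS instead.

```r
# Toy household data standing in for
# read.LIS(c('lu04h','lu10h'), vars=c('dname','hwgt','dhi')).
ds <- data.frame(dname = c("lu04", "lu04", "lu10", "lu10"),
                 hwgt  = c(1, 3, 2, 2),
                 dhi   = c(20000, 40000, 30000, 50000))

# Weighted mean of dhi within each dataset, returned as a named
# numeric vector rather than a data frame.
by_dataset <- sapply(split(ds, ds$dname),
                     function(d) weighted.mean(d$dhi, d$hwgt))

# Explicit print() so the result appears in the log file.
print(by_dataset)
```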

Add-on packages

The R installation on LISSY includes all base R packages as well as all recommended packages. In addition, the packages listed below are available and can be loaded with library() or require().

If you would like to request a package that is not currently installed, please inform User Support and we will consider it for inclusion in a future system update.

Package Version Title
abind 1.4-3 Combine Multidimensional Arrays
arm 1.8-6 Data Analysis Using Regression and Multilevel/Hierarchical Models
coda 0.18-1 Output Analysis and Diagnostics for MCMC
dplyr 0.4.3 A Grammar of Data Manipulation
fit.models 0.5-10 fit.models
Formula 1.2-1 Extended Model Formulas
gdata 2.17.0 Various R Programming Tools for Data Manipulation
gee 4.13-19 Generalized Estimation Equation Solver
gmm 1.5-2 Generalized Method of Moments and Generalized Empirical Likelihood
gmodels 2.16.2 Various R Programming Tools for Model Fitting
gtools 3.5.0 Various R Programming Tools
Hmisc 3.17-1 Harrell Miscellaneous
lavaan 0.5-20 Latent Variable Analysis
lmtest 0.9-34 Testing Linear Regression Models
matrixcalc 1.0-3 Collection of functions for matrix calculations
maxLik 1.3-4 Maximum Likelihood Estimation and Related Tools
mi 1 Missing Data Imputation and Model Checking
miscTools 0.6-16 Miscellaneous Tools and Utilities
mlogit 0.2-4 Multinomial logit model
mvtnorm 1.0-3 Multivariate Normal and t Distributions
numDeriv 2014.2-1 Accurate Numerical Derivatives
pcaPP 1.9-60 Robust PCA by Projection Pursuit
plyr 1.8.3 Tools for Splitting, Applying and Combining Data
psych 1.5.8 Procedures for Psychological, Psychometric, and Personality Research
quantreg 5.19 Quantile Regression
robust 0.4-16 Robust Library
robustbase 0.92-5 Basic Robust Statistics
rrcov 1.3-8 Scalable Robust Estimators with High Breakdown Point
sandwich 2.3-4 Robust Covariance Matrix Estimators
sem 3.1-6 Structural Equation Models
SparseM 1.7 Sparse Linear Algebra
statmod 1.4.23 Statistical Modeling
stringi 1.0-1 Character String Processing Facilities
stringr 1.0.0 Simple, Consistent Wrappers for Common String Operations
survey 3.30-3 Analysis of complex survey samples
tidyr 0.4.1 Easily Tidy Data with `spread()` and `gather()` Functions
zoo 1.7-12 S3 Infrastructure for Regular and Irregular Time Series (Z's Ordered Observations)
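
library() and require() differ in how they handle a package that is not installed: library() stops with an error, while require() returns FALSE (with a warning unless quietly=TRUE), which makes it usable in a conditional. The sketch below relies only on base R; notAnInstalledPackage is a deliberately fictitious name, and survey is one of the packages listed above.

```r
# library() stops with an error when a package is missing...
lib_failed <- tryCatch({
  library("notAnInstalledPackage", character.only = TRUE)
  FALSE
}, error = function(e) TRUE)

# ...whereas require() returns FALSE and lets the script continue.
req_ok <- suppressWarnings(
  require("notAnInstalledPackage", character.only = TRUE, quietly = TRUE))

# A base package such as stats is always available.
stats_ok <- require("stats", character.only = TRUE, quietly = TRUE)

print(c(library_failed = lib_failed, require_ok = req_ok, stats_ok = stats_ok))

# Typical pattern: proceed only if an add-on package can be attached.
pkg <- "survey"
if (suppressWarnings(require(pkg, character.only = TRUE, quietly = TRUE))) {
  print(paste(pkg, "attached"))
} else {
  print(paste(pkg, "is not installed"))
}
```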