# Using R on the LISSY system

This page explains the special methods used to load *LIS and LWS data* in *R* and generate output, and documents the add-on packages currently available on the system.

## Loading Data

Data is read into the workspace using the *read.LIS* function, which has one required argument and three optional arguments. It works as follows:

## Usage

read.LIS(ccyyuu, labels=TRUE, vars=NULL, subset=NULL)

## Arguments

Argument | Description |
---|---|
ccyyuu | A string or vector of strings containing dataset identifiers, indicating which datasets to load. For the formatting of this identifier, see Details. |
labels | A logical value indicating whether to use the value labels for categorical variables. If TRUE, creates a factor from the labels. If FALSE, uses the numeric codes. |
vars | An optional vector of variables to be loaded. By default, all variables are returned. |
subset | An optional string specifying a subset of observations to be returned, written as a condition on the variables (e.g. "nhhmem17>0"). By default, all observations are returned. |

## Details

The format of *ccyyuu* is a country and year code followed by a one-letter code used to identify the specific type of dataset within each database:

Database | File type code | Examples |
---|---|---|
LIS | ‘h’ or ‘p’ for household or person file | “lu04h”, “us10p” |
LWS | ‘h’, ‘p’ or ‘r’ for household, person or implicate file | “ca05h”, “us10p” or “us10r” |

If a vector of multiple dataset identifiers is provided, the datasets will be concatenated and returned as a single “stacked” data frame. Attempting to simultaneously retrieve two datasets containing different variables (e.g., a household and a person file) will cause an error, because such incompatible datasets cannot be stacked on top of one another.
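The stacking rule can be illustrated without access to LISSY. The sketch below uses small invented data frames in place of real datasets and base R's `rbind()`; whether `read.LIS` calls `rbind()` internally is an assumption, and the variables `pid` and `age` are included only for illustration.

```r
# Toy stand-ins for two household files with identical variables
# (all values are invented for illustration).
h04 <- data.frame(dname = "lu04", hid = 1:2, dhi = c(30000, 42000))
h10 <- data.frame(dname = "lu10", hid = 1:3, dhi = c(35000, 28000, 51000))

# Same columns: the rows stack into a single data frame.
stacked <- rbind(h04, h10)
print(nrow(stacked))

# Toy stand-in for a person file, which carries different variables.
p10 <- data.frame(dname = "lu10", hid = c(1, 1, 2),
                  pid = c(1, 2, 1), age = c(40, 38, 61))

# Different columns: rbind() raises an error, which is caught here
# so that the script can continue.
result <- tryCatch(rbind(h10, p10),
                   error = function(e) "cannot stack")
print(result)
```

Loading household and person files in separate calls and combining them with `merge()` avoids this error.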

See Examples for code that retrieves the household and person files separately and then generates a merged file, with household data appended to individual records.

When labels is set to FALSE, the value labels will still be stored in the attributes of the data frame, in the named list attribute “label.table”.
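As a rough sketch of how the stored labels might be applied later, the example below builds a toy data frame with a "label.table" attribute and converts the numeric codes to a factor. It assumes that each entry of the list is a named numeric vector whose names are the labels and whose values are the codes; the codes and labels shown are invented.

```r
# Toy data frame standing in for a file loaded with labels=FALSE.
ds <- data.frame(educ = c(1, 2, 3, 2, 1))

# Assumed structure of the attribute: a named list with one named
# vector per variable (names = labels, values = numeric codes).
attr(ds, "label.table") <- list(educ = c(low = 1, medium = 2, high = 3))

# Look up the label table for one variable.
lab <- attr(ds, "label.table")$educ

# Build a labelled factor from the numeric codes.
ds$educ_f <- factor(ds$educ, levels = unname(lab), labels = names(lab))

print(table(ds$educ_f))
```

Keeping the numeric column alongside the factor makes it possible to use either form in later analysis.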

## Value

A data frame with attributes. See the documentation for read.dta13 for more information.

## Examples

    # Load the household file for Luxembourg 2010
    ds <- read.LIS('lu10h')

    # Load the person file for Luxembourg 2010, numeric codes only
    ds <- read.LIS('lu10p', labels=FALSE)
    print(table(ds$educ))
    print(attributes(ds)$label.table$educ)

    # Load a combined dataset for Luxembourg 2004 and 2010, containing only
    # the dataset name, weights, and disposable household income, for
    # households containing children under age 18.
    ds <- read.LIS(c('lu04h','lu10h'), vars=c('dname','hwgt','dhi'), subset="nhhmem17>0")

    # Load and merge the household and person files for Luxembourg 2010
    dsh <- read.LIS('lu10h')
    dsp <- read.LIS('lu10p')
    ds <- merge(dsp, dsh, by="hid")
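To show what the merge step produces, here is a minimal sketch that replaces the two `read.LIS` calls with small invented data frames. The variables `pid` and `age` and all values are assumptions chosen for illustration; only `hid` and `dhi` appear in the examples above.

```r
# Toy household file: one row per household.
dsh <- data.frame(hid = c(1, 2), dhi = c(35000, 28000))

# Toy person file: one row per person, two persons in household 1.
dsp <- data.frame(hid = c(1, 1, 2), pid = c(1, 2, 1), age = c(40, 38, 61))

# Each person row gains the variables of its household.
ds <- merge(dsp, dsh, by = "hid")

# Print the dimensions and variable names rather than the data frame.
print(dim(ds))
print(colnames(ds))
```

The merged file keeps one row per person, with the household variables repeated for every member of the same household.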

## Producing Output

Because of LIS security procedures, you may need to make small modifications to your code in order to produce output. To ensure that an object appears in the log file, you must explicitly *print()* it, for example:

print(table(x))

In addition, the printing of objects of class data.frame has been disabled.
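In practice this means printing summaries or aggregated results rather than the data itself. The sketch below uses an invented data frame with the variable names `dhi` and `hwgt` from the examples on this page; the values are made up.

```r
# Toy stand-in for a loaded household file (values are invented).
ds <- data.frame(dhi  = c(35000, 28000, 51000, 42000),
                 hwgt = c(1.2, 0.8, 1.0, 1.0))

# Compute an aggregate and print it explicitly.
wm <- weighted.mean(ds$dhi, ds$hwgt)
print(wm)

# Summaries and tables are printed the same way.
print(summary(ds$dhi))
print(table(ds$dhi > 30000))
```

Storing a result in an object first and then printing it keeps the computation and the output step clearly separated.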

## Add-on packages

The *R* installation on *LISSY* includes all the packages that are included in base *R*, as well as all the recommended packages. In addition, the packages listed below are available and can be loaded using functions such as

library()

or

require()

If you would like to request a package that is not currently installed, please inform User Support and we will consider it for inclusion in a future system update.
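A script can check whether a package is installed before loading it. The sketch below uses the base functions `requireNamespace()` and `packageVersion()`; the package name `stats` is only a placeholder that is always present, and any name from the table below could be substituted.

```r
# Placeholder package name; replace with the package you need.
pkg <- "stats"

if (requireNamespace(pkg, quietly = TRUE)) {
  # character.only = TRUE lets library() take the name from a variable.
  library(pkg, character.only = TRUE)
  print(paste(pkg, "version", as.character(packageVersion(pkg))))
} else {
  print(paste("Package", pkg, "is not installed"))
}
```

Printing the version explicitly records in the log file which release of the package produced the results.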

Package | Version | Title |
---|---|---|
abind | 1.4-3 | Combine Multidimensional Arrays |
arm | 1.8-6 | Data Analysis Using Regression and Multilevel/Hierarchical Models |
coda | 0.18-1 | Output Analysis and Diagnostics for MCMC |
dplyr | 0.4.3 | A Grammar of Data Manipulation |
fit.models | 0.5-10 | fit.models |
Formula | 1.2-1 | Extended Model Formulas |
gdata | 2.17.0 | Various R Programming Tools for Data Manipulation |
gee | 4.13-19 | Generalized Estimation Equation Solver |
gmm | 1.5-2 | Generalized Method of Moments and Generalized Empirical Likelihood |
gmodels | 2.16.2 | Various R Programming Tools for Model Fitting |
gtools | 3.5.0 | Various R Programming Tools |
Hmisc | 3.17-1 | Harrell Miscellaneous |
lavaan | 0.5-20 | Latent Variable Analysis |
lmtest | 0.9-34 | Testing Linear Regression Models |
matrixcalc | 1.0-3 | Collection of functions for matrix calculations |
maxLik | 1.3-4 | Maximum Likelihood Estimation and Related Tools |
mi | 1 | Missing Data Imputation and Model Checking |
miscTools | 0.6-16 | Miscellaneous Tools and Utilities |
mlogit | 0.2-4 | Multinomial logit model |
mvtnorm | 1.0-3 | Multivariate Normal and t Distributions |
numDeriv | 2014.2-1 | Accurate Numerical Derivatives |
pcaPP | 1.9-60 | Robust PCA by Projection Pursuit |
plyr | 1.8.3 | Tools for Splitting, Applying and Combining Data |
psych | 1.5.8 | Procedures for Psychological, Psychometric, and Personality Research |
quantreg | 5.19 | Quantile Regression |
robust | 0.4-16 | Robust Library |
robustbase | 0.92-5 | Basic Robust Statistics |
rrcov | 1.3-8 | Scalable Robust Estimators with High Breakdown Point |
sandwich | 2.3-4 | Robust Covariance Matrix Estimators |
sem | 3.1-6 | Structural Equation Models |
SparseM | 1.7 | Sparse Linear Algebra |
statmod | 1.4.23 | Statistical Modeling |
stringi | 1.0-1 | Character String Processing Facilities |
stringr | 1.0.0 | Simple, Consistent Wrappers for Common String Operations |
survey | 3.30-3 | Analysis of complex survey samples |
tidyr | 0.4.1 | Easily Tidy Data with `spread()` and `gather()` Functions |
zoo | 1.7-12 | S3 Infrastructure for Regular and Irregular Time Series (Z's Ordered Observations) |