midas: Efficient Multiple Imputation for Large and Complex Data in Python


This paper introduces software packages for efficiently imputing missing data with deep learning methods in Python (MIDASpy) and R (rMIDAS). The software implements a recently developed approach to multiple imputation known as MIDAS, which proceeds in three steps: it introduces an additional portion of missingness into the dataset, attempts to reconstruct this portion with a type of unsupervised neural network known as a denoising autoencoder, and uses the resulting model to draw imputations of the originally missing values. These steps are executed by a fast, scalable, and flexible algorithm that expands both the quantity and the range of data that can be analyzed with multiple imputation. To help users optimize the algorithm for their specific application, MIDASpy and rMIDAS offer a range of user-friendly tools for calibrating and validating the imputation model. We provide a comprehensive guide to these functionalities and demonstrate their usage on a large real-world dataset.
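
To make the three steps concrete, the following is a minimal NumPy sketch of the general idea. It does not use or reproduce the MIDASpy or rMIDAS API. The function name `impute_sketch`, its parameters, the single hidden layer, and the plain gradient-descent training loop are all illustrative assumptions. In particular, this toy version generates variation across draws by keeping the input corruption switched on at prediction time, which is a simplification and may differ from how the packages draw imputations.

```python
import numpy as np

def impute_sketch(X, m=5, hidden=8, corrupt=0.5, epochs=300, lr=0.05, seed=0):
    """Toy denoising-autoencoder imputation (hypothetical helper, not the MIDAS API).

    X is a 2D float array with np.nan marking missing cells.
    Returns a list of m completed copies of X.
    """
    rng = np.random.default_rng(seed)
    observed = ~np.isnan(X)                      # True where a value was recorded
    mu = np.nanmean(X, axis=0)
    sd = np.nanstd(X, axis=0) + 1e-8
    Z = np.where(observed, (X - mu) / sd, 0.0)   # standardize; missing cells become 0
    n, p = Z.shape

    # One hidden layer; sizes and initialization are arbitrary choices for the sketch.
    W1 = rng.normal(0.0, 0.1, (p, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.1, (hidden, p)); b2 = np.zeros(p)

    def forward():
        # Step 1: introduce additional missingness by zeroing random input cells.
        keep = rng.random((n, p)) > corrupt
        Zc = Z * keep / (1.0 - corrupt)
        # Step 2: try to reconstruct the uncorrupted data from the corrupted input.
        H = np.tanh(Zc @ W1 + b1)
        return Zc, H, H @ W2 + b2

    for _ in range(epochs):
        Zc, H, out = forward()
        # Squared-error loss counted only on cells that were originally observed.
        g_out = 2.0 * (out - Z) * observed / observed.sum()
        g_H = (g_out @ W2.T) * (1.0 - H ** 2)
        W2 -= lr * (H.T @ g_out); b2 -= lr * g_out.sum(axis=0)
        W1 -= lr * (Zc.T @ g_H);  b1 -= lr * g_H.sum(axis=0)

    # Step 3: draw m imputations; observed cells are always left untouched.
    draws = []
    for _ in range(m):
        _, _, out = forward()
        draws.append(np.where(observed, X, out * sd + mu))
    return draws
```

Each returned array keeps the observed values and fills only the originally missing cells, so the m completed datasets can be analyzed separately and the results combined, as is standard in multiple imputation.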

Working Paper