Statistics Colloquium: Dr. Naim Rashid
UNC
Abstract: Deep Learning (DL) methods have dramatically increased in popularity in recent years. While their initial success was demonstrated in the classification of images, there has been significant growth in the application of DL methods to an array of problems in the biomedical sciences. However, the greater prevalence and complexity of missing data in biomedical datasets present challenges for DL methods. Here we present a formal treatment of missing data in the context of Variational Autoencoders (VAEs), a popular DL architecture commonly used for unsupervised learning tasks such as dimension reduction, imputation, and learning latent representations of complex data. We propose a new VAE architecture with the flexibility to handle both ignorable (MCAR, MAR) and non-ignorable (MNAR) patterns of missingness in a unified framework. We also examine missingness in the context of feed-forward neural networks for supervised learning tasks (such as regression and classification), assuming ignorable or non-ignorable missingness in the input covariates. In both cases, we demonstrate the impact of improper handling of missing data on model performance. We motivate our methods with an EHR dataset of 12,000 ICU patients containing a large number of diagnostic measurements and clinical outcomes, where many features are only partially observed.