Statistics Colloquium, Dr. Abhirup Datta
Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health
Title: Large scale spatial analysis using sparse Cholesky factors
Abstract: Gaussian process (GP) models are widely used for analyzing space- and space-time-indexed data from forestry, environmental health, disease epidemiology, and other fields. However, traditional GP models entail computations that become prohibitive for datasets with a large number of spatial or temporal locations. In this talk, I will present highly scalable alternative models based on sparse Cholesky factors for analyzing massive spatial, spatio-temporal, and areal datasets. For spatial and spatio-temporal datasets, I will introduce our proposed Nearest Neighbor Gaussian Process (NNGP) models, which offer a scalable, fully model-based approach that effectively reproduces the corresponding inference from traditional (but highly expensive) GP models. For areal datasets, I will discuss ongoing work on the Directed Acyclic Graph Autoregressive Model, which provides an alternative to the widely used Conditional Autoregressive Model. I will describe matrix-free Markov chain Monte Carlo (MCMC) algorithms for massive scalability. I will also discuss applications to large-scale prediction of forest biomass and analysis of air pollution data.