Statistics Colloquium: Dr. Arkaprava Roy
Duke University
Title: Factor models for multigroup data
Abstract: Many modern data sets require inference methods that can estimate the shared and individual-specific components of variability in collections of matrices that change over time. Promising methods have been developed to analyze these types of data in static cases, but very few approaches are available for dynamic settings. To address this gap, we consider novel models and inference methods for pairs of matrices in which the columns correspond to multivariate observations at different time points. In order to characterize common and individual features, we propose a Bayesian dynamic factor modeling framework called Time Aligned Common and Individual Factor Analysis (TACIFA) that includes uncertainty in time alignment through an unknown warping function. We provide theoretical support for the proposed model, showing identifiability and posterior concentration. The structure enables efficient computation through a Hamiltonian Monte Carlo (HMC) algorithm. We show excellent performance in simulations, and illustrate the method through application to a social synchrony experiment.
https://arxiv.org/abs/1904.12103
Factor analysis is routinely used for dimensionality reduction. However, a major issue is 'brittleness', in which one can obtain substantially different factors when analyzing similar datasets. Factor models have been developed for multi-study data by using additive expansions incorporating common and study-specific factors. However, allowing study-specific factors runs counter to the goal of producing a single set of factors that holds across studies. As an alternative, we propose a class of Perturbed Factor Analysis (PFA) models that assume a common factor structure across studies after perturbing the data via multiplication by a study-specific matrix. Bayesian inference algorithms can be easily modified in this case by using a matrix normal hierarchical model for the perturbation matrices. The resulting model is just as flexible as current approaches in allowing arbitrarily large differences across studies, but has substantial advantages that we illustrate in simulation studies and an application to NHANES data. We additionally show advantages of PFA in single-study data analyses, in which we assign each individual their own perturbation matrix, including reduced generalization error and improved identifiability.
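As a rough illustration of the perturbation idea described above, the following Python sketch simulates multi-study data in which a single loading matrix is shared across studies once each study's observations are multiplied by its own matrix. It is a toy simulation under stated assumptions, not the speaker's implementation and not an inference algorithm: the function name `simulate_pfa`, the dimensions, the noise level, and the choice of drawing each perturbation matrix as the identity plus independent Gaussian noise (a simple special case of a matrix normal distribution centered at the identity) are all hypothetical.

```python
import numpy as np

def simulate_pfa(n_studies=3, n_per_study=5000, p=6, k=2,
                 alpha=0.01, sigma=0.5, seed=0):
    """Simulate data where Q_j x = Lam eta + eps holds in every study j.

    All names and default values are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    Lam = rng.normal(size=(p, k))            # loadings shared by all studies
    Qs, data = [], []
    for _ in range(n_studies):
        # Study-specific perturbation: identity plus small Gaussian noise
        # (assumed form; entries are i.i.d. with variance alpha).
        Q = np.eye(p) + np.sqrt(alpha) * rng.normal(size=(p, p))
        eta = rng.normal(size=(n_per_study, k))          # latent factors
        eps = sigma * rng.normal(size=(n_per_study, p))  # idiosyncratic noise
        Z = eta @ Lam.T + eps                # rows follow the common factor model
        X = np.linalg.solve(Q, Z.T).T        # observed rows: x_i = Q^{-1} z_i
        Qs.append(Q)
        data.append(X)
    return Lam, Qs, data

if __name__ == "__main__":
    Lam, Qs, data = simulate_pfa()
    target = Lam @ Lam.T + 0.5 ** 2 * np.eye(Lam.shape[0])
    for j, (Q, X) in enumerate(zip(Qs, data)):
        raw = np.cov(X, rowvar=False)            # covariance of observed data
        aligned = np.cov(X @ Q.T, rowvar=False)  # covariance after perturbation
        print(j,
              np.linalg.norm(raw - target),
              np.linalg.norm(aligned - target))
```

In this sketch the raw covariances differ from study to study, while the covariance of the perturbed data in every study is close to the same low-rank-plus-diagonal form. In practice the perturbation matrices are unknown and would have to be inferred jointly with the loadings, which is the role of the hierarchical model mentioned in the abstract.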