Stat Colloquium [In-Person]: Dr. Riddhi Ghosh
Bowling Green State University
Title: Statistical Methods for Network Data
Abstract: In the first part, a novel method for network regression will be introduced. Regression models applied to network data, where node attributes are the dependent variables, pose methodological challenges. As has been well studied, naive regression neither properly accounts for network community structure nor accounts for the dependent variable acting as both model outcome and covariate. To address this methodological gap, we propose a network regression model motivated by the important observation that controlling for community structure can, when a network is modular, significantly account for meaningful correlation between observations induced by network connections. We propose a generalized estimating equation (GEE) approach to learn model parameters based on node clusters defined through any single-membership community detection algorithm applied to the observed network. We provide a necessary condition on the network size and edge formation probabilities to establish the asymptotic normality of the estimates of the model parameters under the assumption that the graph structure is a stochastic block model. We evaluate the performance of our approach through simulations and apply it to estimate the impact of the county-level commercial airline transportation network on COVID-19 incidence rates and on net financial aid given or received. In the second part, a bootstrap-based approach to two-sample hypothesis testing for large random graphs of unequal size will be discussed. Our approach involves an algorithm for generating bootstrapped adjacency matrices from estimated community-wise edge probability matrices, forming the basis of the Frobenius test statistic. We derive the asymptotic distribution of the proposed test statistic and validate its stability and efficiency in detecting minor differences in underlying models through simulations. Furthermore, we explore its application to fMRI data, where we can distinguish brain activity patterns when subjects are exposed to sentences and pictures for two different stimuli and the control group.
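
For readers unfamiliar with the second part, the following is a minimal illustrative sketch of a generic parametric bootstrap for comparing two graphs of unequal size through their estimated community-wise edge probability matrices. It is not the speaker's algorithm: the function names, the use of known community labels, the simple averaging of the two estimates under the null, and the plain Frobenius distance as the statistic are all assumptions made for illustration.

```python
import math
import random


def sample_sbm(labels, B, rng):
    """Sample a symmetric, loop-free adjacency matrix from a stochastic block model."""
    n = len(labels)
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < B[labels[i]][labels[j]]:
                A[i][j] = A[j][i] = 1
    return A


def estimate_block_probs(A, labels, k):
    """Estimate the k x k community-wise edge probability matrix as edge densities."""
    edges = [[0] * k for _ in range(k)]
    pairs = [[0] * k for _ in range(k)]
    n = len(labels)
    for i in range(n):
        for j in range(i + 1, n):
            a, b = labels[i], labels[j]
            edges[a][b] += A[i][j]
            pairs[a][b] += 1
            if a != b:  # keep the estimate symmetric
                edges[b][a] += A[i][j]
                pairs[b][a] += 1
    return [[edges[a][b] / pairs[a][b] if pairs[a][b] else 0.0
             for b in range(k)] for a in range(k)]


def frobenius_distance(P, Q):
    """Frobenius norm of the difference between two equally sized matrices."""
    return math.sqrt(sum((p - q) ** 2
                         for rp, rq in zip(P, Q) for p, q in zip(rp, rq)))


def bootstrap_test(A1, labels1, A2, labels2, k, n_boot=200, seed=0):
    """Bootstrap p-value for H0: both graphs share one block probability matrix."""
    rng = random.Random(seed)
    B1 = estimate_block_probs(A1, labels1, k)
    B2 = estimate_block_probs(A2, labels2, k)
    observed = frobenius_distance(B1, B2)
    # Assumption: the null model is the simple average of the two estimates.
    pooled = [[(B1[a][b] + B2[a][b]) / 2 for b in range(k)] for a in range(k)]
    exceed = 0
    for _ in range(n_boot):
        # Regenerate both graphs from the pooled model, keeping their own sizes.
        S1 = estimate_block_probs(sample_sbm(labels1, pooled, rng), labels1, k)
        S2 = estimate_block_probs(sample_sbm(labels2, pooled, rng), labels2, k)
        if frobenius_distance(S1, S2) >= observed:
            exceed += 1
    return observed, (1 + exceed) / (1 + n_boot)


# Hypothetical example: two 2-community graphs with 40 and 60 nodes.
rng = random.Random(1)
labels_a = [i % 2 for i in range(40)]
labels_b = [i % 2 for i in range(60)]
graph_a = sample_sbm(labels_a, [[0.5, 0.1], [0.1, 0.5]], rng)
graph_b = sample_sbm(labels_b, [[0.1, 0.5], [0.5, 0.1]], rng)
stat, p_value = bootstrap_test(graph_a, labels_a, graph_b, labels_b, k=2)
print(stat, p_value)
```

In practice the community labels would come from a community detection step on each observed graph, and the reference distribution would follow the asymptotic result described in the talk rather than this simple resampling scheme.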