Statistics Colloquium: Dr. Dongjun Chung
Medical University of South Carolina
A statistical framework for the integration of GWAS results for multiple diseases with literature mining data
Abstract:
Integrating genetic studies of multiple diseases with biomedical big data has recently been recognized as a powerful approach to improving the identification of risk-associated genetic variants. However, it remains challenging to integrate genome-wide association study (GWAS) datasets for multiple diseases and to use the information in biomedical big data effectively in GWAS data analyses. In this presentation, I will discuss our novel DDNet-graph-GPA framework, which addresses these challenges. Specifically, we developed graph-GPA, a novel Bayesian model that integrates multiple GWAS datasets using a latent Markov random field architecture and allows the incorporation of external prior biological knowledge. In addition, we generated a biologically meaningful data resource for inferring disease-gene relationships through text mining of the biomedical literature guided by Gene Ontology knowledge. We further developed DDNet, a public database and web interface that allows researchers to mine relationships among diseases based on disease-gene associations in the biomedical literature. We applied the proposed approach to simulation studies and real GWAS datasets, with the disease-disease graph obtained from DDNet used as prior knowledge for graph-GPA. The results show that the proposed approach not only improves the identification of risk-associated genetic variants but also facilitates understanding of the genetic relationships among complex diseases. Finally, I will discuss our current research projects on more effective use of biomedical literature mining data, including GAIL and bayesGO.