Talk: ML Reproducibility: Sources of Algorithmic, Implementation, and Observational Variability
Presented by Kevin Coakley
Tuesday, October 29, 2024 · 4 - 5 PM
Join us for Talk Tuesday on October 29, 2024, at 4 PM.
Guest Speaker
Kevin Coakley, Computational and Data Science Research Specialist at the San Diego Supercomputer Center and UC San Diego
Talk Title
ML Reproducibility: Sources of Algorithmic, Implementation, and Observational Variability
Abstract
Reproducibility is fundamental to scientific research, as it underpins
trust, progress, and credibility. In machine learning (ML), achieving
reproducibility is difficult due to variability in algorithms,
implementations, and observational factors. This presentation explores
key contributors to irreproducibility in ML, including algorithmic
factors like hyperparameter tuning and random weight initialization,
implementation differences in software and hardware, and observational
factors such as dataset bias and data preprocessing. It emphasizes the
need to view ML model performance as a distribution, not a single metric
or average of results, and clarifies the difference between
reproducibility and portability. The goal is to guide researchers on
improving ML reproducibility and identifying the critical information
necessary for replicating experimental outcomes.
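
To illustrate the abstract's point about treating performance as a distribution, here is a minimal sketch, not taken from the talk itself. It repeats a run under several random seeds and reports the spread of scores instead of a single number. The function train_and_evaluate is a hypothetical stand-in that only simulates an accuracy value; the seed list and the simulated mean and noise level are illustrative assumptions.

```python
import random
import statistics


def train_and_evaluate(seed: int) -> float:
    """Hypothetical stand-in for one full training run.

    A real version would train a model with this seed controlling weight
    initialization, data shuffling, and similar sources of randomness.
    Here we only simulate a test accuracy (assumed values, not real results).
    """
    rng = random.Random(seed)
    # Simulated accuracy near 0.90 with seed-dependent noise, clamped to [0, 1].
    return min(1.0, max(0.0, rng.gauss(0.90, 0.01)))


def summarize_runs(seeds):
    """Run one evaluation per seed and describe the resulting distribution."""
    scores = [train_and_evaluate(seed) for seed in seeds]
    return {
        "n": len(scores),
        "mean": statistics.mean(scores),
        "stdev": statistics.stdev(scores),  # sample standard deviation
        "min": min(scores),
        "max": max(scores),
        "scores": scores,
    }


if __name__ == "__main__":
    summary = summarize_runs(range(10))
    print(f"runs:  {summary['n']}")
    print(f"mean:  {summary['mean']:.4f}")
    print(f"stdev: {summary['stdev']:.4f}")
    print(f"range: {summary['min']:.4f} to {summary['max']:.4f}")
```

Reporting the spread alongside the mean makes it easier to judge whether a difference between two methods exceeds run-to-run variation. In a real experiment, the seeds and the software and hardware environment would be recorded with the results.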
Speaker Bio
Kevin Coakley is a Computational and Data Science
Research Specialist at the San Diego Supercomputer Center and UC San
Diego, focusing on AI reproducibility. Kevin holds an MAS in
Architecture-based Enterprise Systems Engineering and Leadership from UC
San Diego and is pursuing a PhD in Computer Science at the Norwegian
University of Science and Technology. Kevin specializes in training and
evaluating machine learning models for accuracy and reproducibility in
applications like image recognition, time series prediction, and natural
language processing.