Computer Science and Electrical Engineering
University of Maryland, Baltimore County
M.S. Thesis Defense
Assessing Confidence in Relation Extraction Systems
Lianjie Sun
2:00pm Thursday, 27 July 2015, ITE 325b, UMBC
In information extraction, a central and challenging task is the extraction of relations. Systems that extract relations from text tend to be very productive, so it is important to quantify the confidence or certainty in what is extracted. In this thesis we introduce a framework for assessing confidence in relation extraction systems. We trained our system, which uses a logistic regression model, on manually tagged sentences from the New York Times Annotated Corpus. Empirical results based on ROC curves show that our system computes confidence better than previous systems such as ReVerb. We conclude with a detailed analysis of the features used in our system and explain how these features might be tailored for use in other relation extraction systems.
Committee: Drs. Tim Oates (chair), Charles Nicholas and Matt Schmill