Statistical Approach for Information Extraction of Biological Relationships

Title: A Statistical Approach for Information Extraction of Biological Relationships.
Name(s): Bell, Lindsey R., author
Zhang, Jinfeng, professor co-directing dissertation
Niu, Xufeng, professor co-directing dissertation
Tyson, Gary, university representative
Huffer, Fred, committee member
Department of Statistics, degree granting department
Florida State University, degree granting institution
Type of Resource: text
Genre: Text
Issuance: monographic
Date Issued: 2011
Publisher: Florida State University
Place of Publication: Tallahassee, Florida
Physical Form: computer
online resource
Extent: 1 online resource
Language(s): English
Abstract/Description: Vast amounts of biomedical information are stored in scientific literature, easily accessed through publicly available databases. Relationships among biomedical terms constitute a major part of our biological knowledge. Acquiring such structured information from unstructured literature can be done through human annotation, but doing so is time- and resource-intensive. As this content continues to grow rapidly, the popularity and importance of text mining for obtaining information from unstructured text become increasingly evident. Text mining has four major components. First, relevant articles are identified through information retrieval (IR); next, important concepts and terms are flagged using entity recognition (ER); and then relationships between these entities are extracted from the literature in a process called information extraction (IE). Finally, text mining takes these elements and seeks to synthesize new information from the literature. Our goal is information extraction from unstructured literature concerning biological entities. To do this, we use the structure of triplets, where each triplet contains two biological entities and one interaction word. The biological entities may include terms such as protein names, disease names, genes, and small molecules. Interaction words describe the relationship between the biological terms. Under this framework, we aim to combine the strengths of three classifiers in an ensemble approach. The three classifiers we consider are Bayesian Networks, Support Vector Machines, and a mixture of logistic models defined by interaction word. The three classifiers and the ensemble approach are evaluated on three benchmark corpora and one corpus that is introduced in this study. The evaluation includes cross-validation and cross-corpus validation to replicate an application scenario. The three classifiers are unique, and we find that the performance of individual classifiers varies depending on the corpus. Therefore, an ensemble of classifiers removes the need to choose one classifier and provides optimal performance.
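Illustrative note: the triplet structure and the idea of combining three classifiers can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the dissertation's method. The abstract does not say how the ensemble combines its members, so simple majority voting is assumed here. The three rule functions (keyword_rule, order_rule, negation_rule) are hypothetical toy stand-ins for the Bayesian Network, Support Vector Machine, and logistic-mixture classifiers, and the example sentence is invented for illustration.

```python
from collections import Counter
from typing import Callable, NamedTuple, Sequence


class Triplet(NamedTuple):
    """Two biological entities and one interaction word, plus the source sentence."""
    entity_a: str
    entity_b: str
    interaction_word: str
    sentence: str


# A classifier maps a candidate triplet to True (real relationship) or False.
Classifier = Callable[[Triplet], bool]


def keyword_rule(t: Triplet) -> bool:
    # Hypothetical stand-in: accept when the interaction word is in a tiny lexicon.
    return t.interaction_word.lower() in {"binds", "inhibits", "activates"}


def order_rule(t: Triplet) -> bool:
    # Hypothetical stand-in: accept when the interaction word lies between the entities.
    s = t.sentence
    a, w, b = s.find(t.entity_a), s.find(t.interaction_word), s.find(t.entity_b)
    return -1 not in (a, w, b) and min(a, b) < w < max(a, b)


def negation_rule(t: Triplet) -> bool:
    # Hypothetical stand-in: reject sentences that contain an explicit "not".
    return " not " not in f" {t.sentence.lower()} "


def majority_vote(classifiers: Sequence[Classifier], t: Triplet) -> bool:
    # Assumed combination rule: the label chosen by most classifiers wins.
    votes = Counter(clf(t) for clf in classifiers)
    return votes[True] > votes[False]


ENSEMBLE = (keyword_rule, order_rule, negation_rule)

if __name__ == "__main__":
    candidate = Triplet("BRCA1", "RAD51", "binds", "BRCA1 binds RAD51 in vitro.")
    print(majority_vote(ENSEMBLE, candidate))
```

With an odd number of members, a majority vote never ties, which is one reason three classifiers are convenient for this kind of combination.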
Identifier: FSU_migr_etd-1314 (IID)
Submitted Note: A Dissertation submitted to the Department of Statistics in partial fulfillment of the requirements for the degree of Doctor of Philosophy.
Degree Awarded: Summer Semester, 2011.
Date of Defense: June 9, 2011.
Keywords: protein, protein interaction, information extraction
Bibliography Note: Includes bibliographical references.
Advisory Committee: Jinfeng Zhang, Professor Co-Directing Dissertation; Xufeng Niu, Professor Co-Directing Dissertation; Gary Tyson, University Representative; Fred Huffer, Committee Member.
Subject(s): Statistics
Persistent Link to This Record: http://purl.flvc.org/fsu/fd/FSU_migr_etd-1314
Owner Institution: FSU

Bell, L. R. (2011). A Statistical Approach for Information Extraction of Biological Relationships. Retrieved from http://purl.flvc.org/fsu/fd/FSU_migr_etd-1314