Exploring the Data Work Organization of the Gene Ontology
Wu, Shuheng (author)
Stvilia, Besiki (professor directing dissertation)
Bass, Hank W. (university representative)
Jörgensen, Corinne, 1947- (committee member)
Kazmer, Michelle M. (committee member)
Florida State University (degree granting institution)
College of Communication and Information (degree granting college)
School of Library and Information Studies (degree granting department)
The advent of high-throughput techniques has led to an exponential increase in the size of biological data encoded in various formats and stored in different databases. This has made it challenging for biological scientists to retrieve, use, analyze, and integrate data. To meet the urgent need to organize a massive amount of heterogeneous data, there has been a trend towards the development of bio-ontologies. Among the many current bio-ontologies, the Gene Ontology (GO) is one of the most successful and has been widely used across different biological communities. This study applied Activity Theory and Stvilia's Information Quality Assessment Framework to examine the infrastructure supporting the development, maintenance, and use of the GO among different biological communities. Employing a netnographic approach, this study gathered data in a natural setting via archival data analysis, participant observations, and qualitative semi-structured interviews. The findings indicated that the GO was collaboratively developed and maintained by a consortium of biological communities, mainly model organism databases. Representatives from each GO Consortium member were assigned the role of GO curator and formed into groups working on different aspects of the GO. The division of labor within the GO Consortium ensures that the formidable ontology development process can be divided into manageable projects. The GO Consortium consists not only of biocurators but also of software engineers and bioinformaticians, who provide technical and software support. As an open community, the GO Consortium has been bringing in new groups and welcomes any individual to submit content for inclusion in the GO database. GO's collaborative development approach can be adopted by other similar ontologies or large-scale sociotechnical systems.
This study also provided a rich description of GO's data quality work and a conceptualization of GO's data quality structure, including a typology of GO's data quality problems and corresponding quality assurance actions. This knowledge base can be used for the design and management of similar sociotechnical systems and the development of best practices for knowledge organization system curation in molecular biology and biomedicine. The data curation skills that were perceived as important for the GO can not only inform the training of biocurators but also give new insight into curriculum design and training in Library and Information Science and Data Science. The findings of this study can benefit the GO by identifying various data quality issues and contradictions in its data curation work as well as suggesting strategies and actions for improvement. Future research includes developing quantitative models for assessing the quality of different aspects of GO's data curation work. Netnographic studies can be conducted with different groups and teams within the GO Consortium to investigate their data practices and collaboration patterns, which can inform the design of support repertoires for scientific teams.
Activity Theory, Data curation, Data quality, Gene Ontology, Knowledge organization, Scientific data
October 24, 2014.
A Dissertation submitted to the School of Information in partial fulfillment of the requirements for the degree of Doctor of Philosophy.
Includes bibliographical references.
Besiki Stvilia, Professor Directing Dissertation; Henry W. Bass, University Representative; Corinne L. Jörgensen, Committee Member; Michelle M. Kazmer, Committee Member.
Florida State University
This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s). The copyright in theses and dissertations completed at Florida State University is held by the students who author them.