Some of the material in is restricted to members of the community. By logging in, you may be able to gain additional access to certain collections or items. If you have questions about access or logging in, please use the form on the Contact Page.
Heart disease and premature birth continue to be the leading cause of mortality and neonatal mortality in large parts of the world. They are also estimated to have the highest medical expenditures in the United States. Early detection of heart disease incidence plays a critical role in preserving heart health, and identifying pregnancies at high risk of premature birth is highly valuable information for early interventions. The past few decades, identification of patients at high health risk have been based on logistic regression or Cox proportional hazards models. In more recent years, machine learning models have grown in popularity within the medical field for their superior predictive and classification performances over the classical statistical models. However, their performances in heart disease and premature birth predictions have been comparable and inconclusive, leaving the question of which model most accurately reflects the data difficult to resolve. Our aim is to incorporate information learned by different models into one final model that will generate superior predictive performances. We first compare the widely used machine learning models - the multilayer perceptron network, k-nearest neighbor and support vector machine - to the statistical models logistic regression and Cox proportional hazards. Then the individual models are combined into one in an ensemble approach, also referred to as ensemble modeling. The proposed approaches include SSE-weighted, AUC-weighted, logistic and flexible naive Bayes. The individual models are unique and capture different aspects of the data, but as expected, no individual one outperforms any other. The ensemble approach is an easily computed method that eliminates the need to select one model, integrates the strengths of different models, and generates optimal performances. Particularly in cases where the risk factors associated to an outcome are elusive, such as in premature birth, the ensemble models significantly improve their prediction.
A Dissertation submitted to the Department of Statistics in partial fulﬁllment of the requirements for the degree of Doctor of Philosophy.
Includes bibliographical references.
Dan McGee, Professor Directing Dissertation; Jinfeng Zhang, Professor Co-Directing Dissertation; Isaac Eberstein, University Representative; Debajyoti Sinha, Committee Member.
Florida State University
Use and Reproduction
This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s). The copyright in theses and dissertations completed at Florida State University is held by the students who author them.