Variable Selection of Correlated Predictors in Logistic Regression

Title: Variable Selection of Correlated Predictors in Logistic Regression: Investigating the Diet-Heart Hypothesis.
Name(s): Thompson, Warren R. (Warren Robert), author
McGee, Daniel, professor directing dissertation
Eberstein, Isaac, university representative
Huffer, Fred, committee member
Sinha, Debajyoti, committee member
She, Yiyuan, committee member
Department of Statistics, degree granting department
Florida State University, degree granting institution
Type of Resource: text
Genre: Text
Issuance: monographic
Date Issued: 2009
Publisher: Florida State University
Place of Publication: Tallahassee, Florida
Physical Form: computer
online resource
Extent: 1 online resource
Language(s): English
Abstract/Description: Variable selection is an important aspect of modeling. Its aim is to distinguish between the authentic variables, which are important in predicting the outcome, and the noise variables, which possess little to no predictive value. In other words, the goal is to find the variables that (collectively) best explain and predict changes in the outcome variable. The variable selection problem is exacerbated when correlated variables are included in the covariate set. This dissertation examines the variable selection problem in the context of logistic regression. Specifically, we investigated the merits of the bootstrap, ridge regression, the lasso, and Bayesian model averaging (BMA) as variable selection techniques when highly correlated predictors and a dichotomous outcome are considered. This dissertation also contributes to the literature on the diet-heart hypothesis. The diet-heart hypothesis has been around since the early twentieth century. Since then, researchers have attempted to isolate the nutrients in diet that promote coronary heart disease (CHD). After a century of research, there is still no consensus. In our current research, we used some of the more recent statistical methodologies (mentioned above) to investigate the effect of twenty dietary variables on the incidence of coronary heart disease. Logistic regression models were generated for the data from the Honolulu Heart Program, a study of CHD incidence in men of Japanese descent. Our results were largely method-specific. However, regardless of the method considered, there was strong evidence to suggest that alcohol consumption has a strong protective effect on the risk of coronary heart disease. Of the variables considered, dietary cholesterol and caffeine were the only variables that exhibited, at best, a moderately strong harmful association with CHD incidence. Further investigation that includes a broader array of food groups is recommended.
Identifier: FSU_migr_etd-1360 (IID)
Submitted Note: A Dissertation submitted to the Department of Statistics in partial fulfillment of the requirements for the degree of Doctor of Philosophy.
Degree Awarded: Fall Semester, 2009.
Date of Defense: August 10, 2009.
Keywords: Logistic Regression, Bootstrap, Lasso, Ridge Regression, Bayesian Model Averaging, Diet-Heart Hypothesis
Bibliography Note: Includes bibliographical references.
Advisory Committee: Daniel McGee, Professor Directing Dissertation; Isaac Eberstein, University Representative; Fred Huffer, Committee Member; Debajyoti Sinha, Committee Member; Yiyuan She, Committee Member.
Subject(s): Statistics
Persistent Link to This Record:
Owner Institution: FSU

Thompson, W. R. (2009). Variable Selection of Correlated Predictors in Logistic Regression: Investigating the Diet-Heart Hypothesis. Retrieved from