Search results
 Title
 AP Student Visual Preferences for Problem Solving.
 Creator

Swoyer, Liesl, Department of Statistics
 Abstract/Description

The purpose of this study is to explore the mathematical preference of high school AP Calculus students by examining their tendencies for using differing methods of thought. A student's preferred mode of thinking was measured on a scale ranging from a preference for analytical thought to a preference for visual thought as they completed derivative and antiderivative tasks presented both algebraically and graphically. This relates to previous studies by continuing to analyze the factors that have been found to mediate the students' performance and preference with regard to a variety of calculus tasks. Data were collected by Dr. Erhan Haciomeroglu at the University of Central Florida. Students' preferences were not affected by gender. Students were found to approach graphical and algebraic tasks similarly, without any significant change with regard to the derivative or antiderivative nature of the tasks. Highly analytic and highly visual students revealed the same proportion of change in visuality as harmonic students when more difficult calculus tasks were encountered. Thus, a strong preference for visual thinking when completing algebraic tasks was not the determining factor of their preferred method of thinking when approaching graphical tasks.
 Date Issued
 2012
 Identifier
 FSU_migr_uhm0052
 Format
 Thesis
 Title
 Testing for the Equality of Two Distributions on High Dimensional Object Spaces and Nonparametric Inference for Location Parameters.
 Creator

Guo, Ruite, Patrangenaru, Victor, Mio, Washington, Barbu, Adrian G. (Adrian Gheorghe), Bradley, Jonathan R., Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

Our view is that while some of the basic principles of data analysis are going to remain unchanged, others are to be gradually replaced with geometry and topology methods. Linear methods still make sense for functional data analysis, or in the context of tangent bundles of object spaces. Complex nonstandard data is represented on object spaces. An object space admitting a manifold stratification may be embedded in a Euclidean space. One defines the extrinsic energy distance associated with two probability measures on an arbitrary object space embedded in a numerical space, and one introduces an extrinsic energy statistic to test for homogeneity of distributions of two random objects (r.o.'s) on such an object space. This test is validated via a simulation example on the Kendall space of planar k-ads with a Veronese-Whitney (VW) embedding. One considers an application to medical imaging, to test for the homogeneity of the distributions of Kendall shapes of the midsections of the corpus callosum in a clinically normal population vs. a population of ADHD-diagnosed individuals. Surprisingly, due to the high dimensionality, these distributions are not significantly different, although they are known to have VW means that differ highly significantly. New spread and location parameters are to be added to reflect the nontrivial topology of certain object spaces. Topological data analysis (TDA) is going to be adapted to object spaces, and hypothesis testing for distributions is going to be based on extrinsic energy methods. For a random point on an object space embedded in a Euclidean space, the mean vector cannot be represented as a point on that space, except for the case when the embedded space is convex.
To address this misgiving, since the mean vector is the minimizer of the expected square distance, following Fréchet (1948), on an embedded compact object space, one may consider both minimizers and maximizers of the expected square distance to a given point on the embedded object space as the mean and, respectively, the antimean of the random point. Of all distances on an object space, one considers here the chord distance associated with the embedding of the object space, since for such distances one can give a necessary and sufficient condition for the existence of a unique Fréchet mean (respectively Fréchet antimean). For such distributions these location parameters are called the extrinsic mean (respectively extrinsic antimean), and the corresponding sample statistics are consistent estimators of their population counterparts. Moreover, around the extrinsic mean (antimean) located at a smooth point, one derives the limit distribution of such estimators.
 Date Issued
 2017
 Identifier
 FSU_SUMMER2017_Guo_fsu_0071E_13977
 Format
 Thesis
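As an illustration of the extrinsic mean and antimean described in the abstract above, the following minimal sketch treats only the simplest case: the unit sphere embedded in Euclidean space with the chord distance. For unit vectors, the average squared chord distance from a point p on the sphere to the sample is 2 - 2 p·x̄, where x̄ is the ordinary average vector, so the minimizer is x̄ normalized and the maximizer is its antipode. The function name and the sample data are assumptions made for illustration; this is not code from the thesis and does not cover Kendall shape spaces or the VW embedding.

```python
import math


def extrinsic_mean_and_antimean(points):
    """Sample extrinsic mean and antimean on the unit sphere (chord distance).

    `points` is a list of unit vectors given as tuples of floats.
    Hypothetical helper for illustration only.
    """
    n = len(points)
    dim = len(points[0])
    # Ordinary Euclidean average of the embedded points.
    average = [sum(p[k] for p in points) / n for k in range(dim)]
    norm = math.sqrt(sum(c * c for c in average))
    if norm == 0.0:
        # Every point of the sphere is then equally good: no unique answer.
        raise ValueError("average vector is zero; extrinsic mean is not unique")
    mean = tuple(c / norm for c in average)       # closest point on the sphere
    antimean = tuple(-c / norm for c in average)  # farthest point on the sphere
    return mean, antimean


def mean_squared_chord_distance(p, points):
    """Average squared Euclidean (chord) distance from p to the sample."""
    return sum(math.dist(p, x) ** 2 for x in points) / len(points)


sample = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
mean, antimean = extrinsic_mean_and_antimean(sample)
print(mean, antimean)
```

The uniqueness condition shows up here as the check that the average vector is nonzero; on the sphere, a zero average leaves every point equally distant from the sample on average.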
 Title
 An Effective and Efficient Approach for Clusterability Evaluation.
 Creator

Adolfsson, Andreas, Ackerman, Margareta, Brownstein, Naomi Chana, Haiduc, Sonia, Tyson, Gary Scott, Florida State University, College of Arts and Sciences, Department of Computer Science
 Abstract/Description

Clustering is an essential data mining tool that aims to discover inherent cluster structure in data. As such, the study of clusterability, which evaluates whether data possesses such structure, is an integral part of cluster analysis. Yet, despite their central role in the theory and application of clustering, current notions of clusterability fall short in two crucial aspects that render them impractical: most are computationally infeasible and others fail to classify the structure of real datasets. In this thesis, we propose a novel approach to clusterability evaluation that is both computationally efficient and successfully captures the structure in real data. Our method applies multimodality tests to the (one-dimensional) set of pairwise distances based on the original, potentially high-dimensional data. We present extensive analyses of our approach for both the Dip and Silverman multimodality tests on real data as well as 17,000 simulations, demonstrating the success of our approach as the first practical notion of clusterability.
 Date Issued
 2016
 Identifier
 FSU_SUMMER2017_Adolfsson_fsu_0071N_13478
 Format
 Thesis
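The reduction described in the abstract above, from possibly high-dimensional data to a one-dimensional set of pairwise distances, can be sketched as follows. The sketch deliberately does not implement the Dip or Silverman tests. In their place it uses a crude stand-in: the largest gap between consecutive sorted distances, as a fraction of the total spread. Both function names and the toy data are assumptions made for illustration only.

```python
import itertools
import math


def pairwise_distances(points):
    """Sorted Euclidean distances between all pairs of points."""
    return sorted(math.dist(p, q) for p, q in itertools.combinations(points, 2))


def largest_gap_ratio(distances):
    """Largest gap between consecutive sorted distances, relative to the spread.

    A crude bimodality heuristic standing in for a real multimodality test
    (it is NOT the Dip or Silverman test). Values near 1 suggest two well
    separated groups of distances; values near 0 suggest no clear gap.
    """
    if len(distances) < 2:
        return 0.0
    spread = distances[-1] - distances[0]
    if spread == 0.0:
        return 0.0
    largest_gap = max(b - a for a, b in zip(distances, distances[1:]))
    return largest_gap / spread


# Two tight, well separated clusters in the plane.
clustered = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
             (10.0, 10.0), (10.1, 10.0), (10.0, 10.1), (10.1, 10.1)]
# Evenly spaced points on a line: no cluster structure.
unclustered = [(float(i), 0.0) for i in range(10)]

print(largest_gap_ratio(pairwise_distances(clustered)))
print(largest_gap_ratio(pairwise_distances(unclustered)))
```

In the clustered case the within-cluster distances and the between-cluster distances form two separate groups, which is the kind of multimodality a formal test would be asked to detect.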
 Title
 Regression Methods for Skewed and Heteroscedastic Response with High-Dimensional Covariates.
 Creator

Wang, Libo, Sinha, Debajyoti, Taylor, Miles G., Pati, Debdeep, She, Yiyuan, Yang, Yun (Professor of Statistics), Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

The rise of studies with high-dimensional potential covariates has invited a renewed interest in dimension reduction that promotes more parsimonious models, ease of interpretation and computational tractability. However, current variable selection methods restricted to continuous response often assume Gaussian response for methodological as well as theoretical developments. In this thesis, we consider regression models that induce sparsity, gain prediction power, and accommodate response distributions beyond Gaussian with common variance. The first part of this thesis is a transform-both-sides Bayesian variable selection model (TBS) which allows skewness, heteroscedasticity and extremely heavy-tailed responses. Our method develops a framework which facilitates computationally feasible inference in spite of inducing nonlocal priors on the original regression coefficients. Even if the transformed conditional mean is no longer linear with respect to covariates, we still prove the consistency of our Bayesian TBS estimators. Simulation studies and real data analysis demonstrate the advantages of our methods. Another main part of this thesis deals with the above challenges from a frequentist standpoint. This model incorporates a penalized likelihood to accommodate skewed response, arising from an epsilon-skew-normal (ESN) distribution. With suitable optimization techniques to handle this two-piece penalized likelihood, our method demonstrates substantial gains in sensitivity and specificity even under high-dimensional settings. We conclude this thesis with a novel Bayesian semiparametric modal regression method along with its implementation and simulation studies.
 Date Issued
 2017
 Identifier
 FSU_SUMMER2017_Wang_fsu_0071E_13950
 Format
 Thesis
 Title
 Nonparametric Change Point Detection Methods for Profile Variability.
 Creator

Geneus, Vladimir J. (Vladimir Jacques), Chicken, Eric, Liu, Guosheng (Professor of Earth, Ocean and Atmospheric Science), Sinha, Debajyoti, Zhang, Xin (Professor of Engineering), Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

Because detecting profile changes in devices such as medical apparatus is important, so is locating the change point in the variability of different functions. In a sequence of functional observations (each of the same length), we wish to determine as quickly as possible when a change in the observations has occurred. Wavelet-based change point methods are proposed that determine when the variability of the noise in a sequence of functional profiles (e.g., the precision profile of medical devices) goes out of control from a known, fixed value or from an estimated in-control value. Various methods have been proposed which focus on changes in the form of the function. One method, the NEWMA, based on EWMA, focuses on changes in both. However, its drawback is that the form of the in-control function must be known. Other methods, including the χ² methods for Phase I and Phase II, make some assumption about the function. Our interest, however, is in detecting changes in the variance from one function to the next. In particular, we are interested not in differences from one profile to another (variance between) but in differences in variance (variance within). The functional portion of the profiles is allowed to come from a large class of functions and may vary from profile to profile. The estimator is evaluated under a variety of conditions, including allowing the wavelet noise subspace to be substantially contaminated by the profile's functional structure, and is compared to two competing noise monitoring methods. Nikoo and Noorossana (2013) propose a nonparametric wavelet regression method that uses two change point techniques to monitor the variance: a nonparametric control chart, via the mean of m median control charts, and a parametric control chart, via the χ² distribution. We propose improvements to their method by incorporating prior data and making use of likelihood ratios.
Our methods make use of the orthogonal properties of wavelet projections to accurately and efficiently monitor the level of noise from one profile to the next and to detect changes in noise in a Phase II setting. We show through simulation results that our proposed methods have better power and are more robust against the confounding effect between variance estimation and function estimation. The proposed methods are shown to be very efficient at detecting when the variability has changed through an extensive simulation study. Extensions are considered that explore the usage of windowing and estimated in-control values for the MAD method, and the effect of the exact distribution under normality rather than the asymptotic distribution. These developments are implemented in the parametric, nonparametric scale, and completely nonparametric settings. The proposed methodologies are tested through simulation, are applicable to various biometric and health-related topics, and have the potential to improve computational efficiency and reduce the number of assumptions required.
 Date Issued
 2017
 Identifier
 FSU_SUMMER2017_Geneus_fsu_0071E_13862
 Format
 Thesis
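To make the idea of monitoring noise variability through wavelet projections concrete, the sketch below estimates the noise level of each profile from its finest-level Haar detail coefficients using the standard median-absolute-deviation (MAD) rule, sigma ≈ median(|d|) / 0.6745. It then flags the first profile whose estimate exceeds a fixed multiple of a known in-control value. The flagging rule, the function names, and the simulated profiles are assumptions made for illustration; the thesis itself describes likelihood-ratio and control-chart methods that are not reproduced here.

```python
import math
import random
import statistics


def haar_noise_sigma(profile):
    """MAD-based noise estimate from finest-level Haar detail coefficients.

    For a smooth signal plus independent Gaussian noise, each coefficient
    (x[2k] - x[2k+1]) / sqrt(2) is dominated by noise with the same variance.
    """
    details = [(profile[i] - profile[i + 1]) / math.sqrt(2.0)
               for i in range(0, len(profile) - 1, 2)]
    return statistics.median([abs(d) for d in details]) / 0.6745


def first_out_of_control(profiles, sigma0, ratio=2.0):
    """Index of the first profile whose noise estimate exceeds ratio * sigma0.

    Hypothetical threshold rule for illustration; returns None if no
    profile is flagged.
    """
    for index, profile in enumerate(profiles):
        if haar_noise_sigma(profile) > ratio * sigma0:
            return index
    return None


def simulate_profiles(rng, sigmas, length=1024):
    """One sine-wave profile per entry of `sigmas`, with Gaussian noise added."""
    return [[math.sin(2.0 * math.pi * t / length) + rng.gauss(0.0, s)
             for t in range(length)]
            for s in sigmas]


rng = random.Random(0)
# Five in-control profiles (sigma = 1) followed by five with tripled noise.
profiles = simulate_profiles(rng, [1.0] * 5 + [3.0] * 5)
print(first_out_of_control(profiles, sigma0=1.0))
```

Because the estimate uses only differences of neighboring observations, a smooth functional component contributes very little to it, which is why the form of the in-control function does not need to be specified in this sketch.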
 Title
 Scalable and Structured High Dimensional Covariance Matrix Estimation.
 Creator

Sabnis, Gautam, Pati, Debdeep, Kercheval, Alec N., Sinha, Debajyoti, Chicken, Eric, Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

With rapid advances in data acquisition and storage techniques, modern scientific investigations in epidemiology, genomics, imaging and networks are increasingly producing challenging data structures in the form of high-dimensional vectors, matrices and multiway arrays (tensors), rendering traditional statistical and computational tools inappropriate. One hope for meaningful inferences in such situations is to discover an inherent lower-dimensional structure that explains the physical or biological process generating the data. The structural assumptions impose constraints that force the objects of interest to lie in lower-dimensional spaces, thereby facilitating their estimation and interpretation and, at the same time, reducing computational burden. The assumption of an inherent structure, motivated by various scientific applications, is often adopted as the guiding light in the analysis and is fast becoming a standard tool for parsimonious modeling of such high-dimensional data structures. The content of this thesis is specifically directed towards methodological development of statistical tools, with attractive computational properties, for drawing meaningful inferences through such structures. The third chapter of this thesis proposes a distributed computing framework, based on a divide-and-conquer strategy and hierarchical modeling, to accelerate posterior inference for high-dimensional Bayesian factor models. Our approach distributes the task of high-dimensional covariance matrix estimation to multiple cores, solves each subproblem separately via a latent factor model, and then combines these estimates to produce a global estimate of the covariance matrix. Existing divide-and-conquer methods focus exclusively on dividing the total number of observations n into subsamples while keeping the dimension p fixed.
The approach is novel in this regard: it includes all of the n samples in each subproblem and, instead, splits the dimension p into smaller subsets for each subproblem. The subproblems themselves can be challenging to solve when p is large due to the dependencies across dimensions. To circumvent this issue, a novel hierarchical structure is specified on the latent factors that allows for flexible dependencies across dimensions, while still maintaining computational efficiency. Our approach is readily parallelizable and is shown to be several orders of magnitude more computationally efficient than fitting a full factor model. The fourth chapter of this thesis proposes a novel way of estimating a covariance matrix that can be represented as a sum of a low-rank matrix and a diagonal matrix. The proposed method compresses high-dimensional data, computes the sample covariance in the compressed space, and lifts it back to the ambient space via a decompression operation. A salient feature of our approach relative to existing literature on combining sparsity and low-rank structures in covariance matrix estimation is that we do not require the low-rank component to be sparse. A principled framework for estimating the compressed dimension using Stein's Unbiased Risk Estimation theory is demonstrated. In the final chapter of this thesis, we tackle the problem of variable selection in high dimensions. Consistent model selection in high dimensions has received substantial interest in recent years and is an extremely challenging problem for Bayesians. The literature on model selection with continuous shrinkage priors is even less developed due to the unavailability of exact zeros in the posterior samples of the parameter of interest. Heuristic methods based on thresholding the posterior mean are often used in practice, but they lack theoretical justification, and inference is highly sensitive to the choice of the threshold.
We aim to address the problem of selecting variables through a novel method of post-processing the posterior samples.
 Date Issued
 2017
 Identifier
 FSU_SUMMER2017_Sabnis_fsu_0071E_14043
 Format
 Thesis
 Title
 Analysis of Multivariate Data with Random Cluster Size.
 Creator

Li, Xiaoyun, Sinha, Debajyoti, Zhou, Yi, McGee, Dan, Lipsitz, Stuart, Department of Statistics, Florida State University
 Abstract/Description

In this dissertation, we examine binary correlated data with present/absent components or missing data that are related to binary responses of interest. Depending on the data structure, correlated binary data can be referred to as clustered data if the sampling unit is a cluster of subjects, or as longitudinal data when it involves repeated measurements of the same subject over time. We propose our novel models in these two data structures and illustrate the models with real data applications. In biomedical studies involving clustered binary responses, the cluster size can vary because some components of the cluster can be absent. When both the presence of a cluster component and the binary disease status of a present component are treated as responses of interest, we propose a novel two-stage random effects logistic regression framework. For ease of interpretation of regression effects, both the marginal probability of presence/absence of a component and the conditional probability of disease status of a present component preserve the approximate logistic regression forms. We present a maximum likelihood method of estimation implementable using standard statistical software. We compare our models and the physical interpretation of regression effects with competing methods from the literature. We also present a simulation study to assess the robustness of our procedure to wrong specification of the random effects distribution and to compare finite sample performances of estimates with existing methods. The methodology is illustrated via analyzing a study of the periodontal health status in a diabetic Gullah population. We extend this model to longitudinal studies with binary longitudinal response and informative missing data. In longitudinal studies, when treating each subject as a cluster, cluster size is the total number of observations for each subject.
When data is informatively missing, the cluster size of each subject can vary and is related to the binary response of interest, and we are also interested in the missing mechanism. This is a modified situation of the clustered binary data with present components. We modify and adopt our proposed two-stage random effects logistic regression model so that both the marginal probability of binary response and missing indicator and the conditional probability of binary response and missing indicator preserve logistic regression forms. We present a Bayesian framework of this model and illustrate our proposed model on an AIDS data example.
 Date Issued
 2011
 Identifier
 FSU_migr_etd1425
 Format
 Thesis
 Title
 A Statistical Approach for Information Extraction of Biological Relationships.
 Creator

Bell, Lindsey R., Zhang, Jinfeng, Niu, Xufeng, Tyson, Gary, Huffer, Fred, Department of Statistics, Florida State University
 Abstract/Description

Vast amounts of biomedical information are stored in scientific literature, easily accessed through publicly available databases. Relationships among biomedical terms constitute a major part of our biological knowledge. Acquiring such structured information from unstructured literature can be done through human annotation, but is time- and resource-consuming. As this content continues to rapidly grow, the popularity and importance of text mining for obtaining information from unstructured text becomes increasingly evident. Text mining has four major components. First, relevant articles are identified through information retrieval (IR); next, important concepts and terms are flagged using entity recognition (ER); and then relationships between these entities are extracted from the literature in a process called information extraction (IE). Finally, text mining takes these elements and seeks to synthesize new information from the literature. Our goal is information extraction from unstructured literature concerning biological entities. To do this, we use the structure of triplets where each triplet contains two biological entities and one interaction word. The biological entities may include terms such as protein names, disease names, genes, and small molecules. Interaction words describe the relationship between the biological terms. Under this framework we aim to combine the strengths of three classifiers in an ensemble approach. The three classifiers we consider are Bayesian Networks, Support Vector Machines, and a mixture of logistic models defined by interaction word. The three classifiers and ensemble approach are evaluated on three benchmark corpora and one corpus that is introduced in this study. The evaluation includes cross-validation and cross-corpus validation to replicate an application scenario. The three classifiers are unique and we find that performance of individual classifiers varies depending on the corpus.
Therefore, an ensemble of classifiers removes the need to choose one classifier and provides optimal performance.
 Date Issued
 2011
 Identifier
 FSU_migr_etd1314
 Format
 Thesis
 Title
 Variable Selection of Correlated Predictors in Logistic Regression: Investigating the Diet-Heart Hypothesis.
 Creator

Thompson, Warren R. (Warren Robert), McGee, Daniel, Eberstein, Isaac, Huffer, Fred, Sinha, Debajyoti, She, Yiyuan, Department of Statistics, Florida State University
 Abstract/Description

Variable selection is an important aspect of modeling. Its aim is to distinguish between the authentic variables which are important in predicting outcome, and the noise variables which possess little to no predictive value. In other words, the goal is to find the variables that (collectively) best explain and predict changes in the outcome variable. The variable selection problem is exacerbated when correlated variables are included in the covariate set. This dissertation examines the variable selection problem in the context of logistic regression. Specifically, we investigated the merits of the bootstrap, ridge regression, the lasso and Bayesian model averaging (BMA) as variable selection techniques when highly correlated predictors and a dichotomous outcome are considered. This dissertation also contributes to the literature on the diet-heart hypothesis. The diet-heart hypothesis has been around since the early twentieth century. Since then, researchers have attempted to isolate the nutrients in diet that promote coronary heart disease (CHD). After a century of research, there is still no consensus. In our current research, we used some of the more recent statistical methodologies (mentioned above) to investigate the effect of twenty dietary variables on the incidence of coronary heart disease. Logistic regression models were generated for the data from the Honolulu Heart Program, a study of CHD incidence in men of Japanese descent. Our results were largely method-specific. However, regardless of the method considered, there was strong evidence to suggest that alcohol consumption has a strong protective effect on the risk of coronary heart disease. Of the variables considered, dietary cholesterol and caffeine were the only variables that, at best, exhibited a moderately strong harmful association with CHD incidence. Further investigation that includes a broader array of food groups is recommended.
 Date Issued
 2009
 Identifier
 FSU_migr_etd1360
 Format
 Thesis
 Title
 Some New Methods for Design and Analysis of Survival Data.
 Creator

Wang, Wenting, Sinha, Debajyoti, Arjmandi, Bahram H., McGee, Dan, Niu, Xufeng, Yu, Kai, Department of Statistics, Florida State University
 Abstract/Description

For survival outcomes, statistical equivalence tests to show that a new treatment is therapeutically equivalent to a standard treatment are usually based on the Cox (1972) proportional hazards assumption. We present an alternative method based on the linear transformation model (LTM) for two treatment arms, and show the advantages of using this equivalence test instead of tests based on Cox's model. LTM is a very general class of models including models such as the proportional odds survival model (POSM). We present a sufficient condition to check whether log-rank-based tests have inflated Type I error rates. We show that POSM and some other commonly used survival models within the LTM class all satisfy this condition. Simulation studies show that repeated use of our test instead of log-rank-based tests will be a safer statistical practice. Our second goal is to develop a practical Bayesian model for survival data with a high-dimensional covariate vector. We develop the Information Matrix (IM) and Information Matrix Ridge (IMR) priors for commonly used survival models including Cox's model and the cure rate model proposed by Chen et al. (1999), and examine many desirable theoretical properties including sufficient conditions for the existence of the moment generating functions for these priors and corresponding posterior distributions. The performance of these priors in practice is compared with some competing priors via the Bayesian analysis of a study that investigates the relationship between lung cancer survival time and a large number of genetic markers.
 Date Issued
 2010
 Identifier
 FSU_migr_etd1248
 Format
 Thesis
 Title
 Bayesian Generalized Polychotomous Response Models and Applications.
 Creator

Yang, Fang, Niu, XuFeng, Johnson, Suzanne B., McGee, Dan, Huffer, Fred, Department of Statistics, Florida State University
 Abstract/Description

Polychotomous quantal response models are widely used in medical and econometric studies to analyze categorical or ordinal data. In this study, we apply the Bayesian methodology through a mixed-effects polychotomous quantal response model. For the Bayesian polychotomous quantal response model, we assume uniform improper priors for the regression coefficients and explore the sufficient conditions for a proper joint posterior distribution of the parameters in the models. Simulation results from Gibbs sampling estimates will be compared to traditional maximum likelihood estimates to show the strength of using the uniform improper priors for the regression coefficients. Motivated by investigating the relationship between BMI categories and several risk factors, we carry out application studies to examine the impact of risk factors on BMI categories, especially for the categories of "Overweight" and "Obesities". By applying the mixed-effects Bayesian polychotomous response model with uniform improper priors, we would obtain interpretations of the association between risk factors and BMI similar to findings in the literature.
 Date Issued
 2010
 Identifier
 FSU_migr_etd1092
 Format
 Thesis
 Title
 Covariance on Manifolds.
 Creator

Balov, Nikolay H. (Nikolay Hristov), Srivastava, Anuj, Klassen, Eric, Patrangenaru, Victor, McGee, Daniel, Department of Statistics, Florida State University
 Abstract/Description

With the ever-increasing complexity of observational and theoretical data models, the sufficiency of the classical statistical techniques, designed to be applied only to vector quantities, is being challenged. Nonlinear statistical analysis has become an area of intensive research in recent years. Despite the impressive progress in this direction, a unified and consistent framework has not been reached. In this regard, the following work is an attempt to improve our understanding of random phenomena on non-Euclidean spaces. More specifically, the motivating goal of the present dissertation is to generalize the notion of distribution covariance, which in standard settings is defined only in Euclidean spaces, to arbitrary manifolds with a metric. We introduce a tensor field structure, named covariance field, that is consistent with the heterogeneous nature of manifolds. It not only describes the variability imposed by a probability distribution but also provides alternative distribution representations. The covariance field combines the distribution density with geometric characteristics of its domain and thus fills the gap between these two. We present some of the properties of the covariance fields and argue that they can be successfully applied to various statistical problems. In particular, we provide a systematic approach for defining parametric families of probability distributions on manifolds, parameter estimation for regression analysis, nonparametric statistical tests for comparing probability distributions, and interpolation between such distributions. We then present several application areas where this new theory may have potential impact. One of them is the branch of directional statistics, with a domain of influence ranging from geosciences to medical image analysis. The fundamental level at which the covariance-based structures are introduced also opens a new area for future research.
 Date Issued
 2009
 Identifier
 FSU_migr_etd1045
 Format
 Thesis
 Title
 A Study of the Asymptotic Properties of Lasso Estimates for Correlated Data.
 Creator

Gupta, Shuva, Bunea, Florentina, Gert, Joshua, Hollander, Myles, Wegkamp, Marten, Department of Statistics, Florida State University
 Abstract/Description

In this thesis we investigate post-model-selection properties of L1-penalized weighted least squares estimators in regression models with a large number of variables M and correlated errors. We focus on correct subset selection and on the asymptotic distribution of the penalized estimators. In the simple case of AR(1) errors we give conditions under which correct subset selection can be achieved via our procedure. We then provide a detailed generalization of this result to models with errors that have a weak-dependency structure (Doukhan 1996). In all cases, the number M of regression variables is allowed to exceed the sample size n. We further investigate the asymptotic distribution of our estimates when M < n, and show that under appropriate choices of the tuning parameters the limiting distribution is multivariate normal. This generalizes to the case of correlated errors the result of Knight and Fu (2000), obtained for regression models with independent errors.
 Date Issued
 2009
 Identifier
 FSU_migr_etd3896
 Format
 Thesis
 Title
 Nonparametric Estimation of Three Dimensional Projective Shapes with Applications in Medical Imaging and in Pattern Recognition.
 Creator

Crane, Michael, Patrangenaru, Victor, Liu, Xiuwen, Huffer, Fred W., Sinha, Debajyoti, Department of Statistics, Florida State University
 Abstract/Description

This dissertation is on the analysis of invariants of a 3D configuration from its 2D images in pictures of this configuration, without requiring any restriction on the camera positioning relative to the scene pictured. We briefly review some of the main results found in the literature. The methodology used is nonparametric and manifold-based, combined with standard computer vision reconstruction techniques. More specifically, we use asymptotic results for the extrinsic sample mean and the extrinsic sample covariance to construct bootstrap confidence regions for mean projective shapes of 3D configurations. Chapters 4, 5 and 6 contain new results. In chapter 4, we develop tests for coplanarity. Chapter 5 is on the reconstruction of 3D polyhedral scenes, including texture, from arbitrary partial views. In chapter 6, we develop a nonparametric methodology for estimating the mean change for matched samples on a Lie group. We then notice that for k ≥ 4, a manifold of projective shapes of k-ads in general position in 3D has the structure of a (3k − 15)-dimensional Lie group (P-Quaternions) that is equivariantly embedded in a Euclidean space; therefore, testing for mean 3D projective shape change amounts to a one-sample test for extrinsic mean P-Quaternion objects. The Lie group technique leads to a large-sample and nonparametric bootstrap test for one population extrinsic mean on a projective shape space, as recently developed by Patrangenaru, Liu and Sughatadasa. On the other hand, in the absence of occlusions, the 3D projective shape of a spatial configuration can be recovered from a stereo pair of images, thus allowing one to test for mean glaucomatous 3D projective shape change from standard stereo pairs of eye images.
 Date Issued
 2010
 Identifier
 FSU_migr_etd4607
 Format
 Thesis
 Title
 A Probabilistic and Graphical Analysis of Evidence in O.J. Simpson's Murder Case Using Bayesian Networks.
 Creator

Olumide, Kunle, Huffer, Fred, Shute, Valerie, Sinha, Debajyoti, Niu, Xufeng, Logan, Wayne, Department of Statistics, Florida State University
 Abstract/Description

This research work is an attempt to illustrate the versatility and wide applications of the field of statistical science. Specifically, the research work involves the application of statistics in the field of law. The application will focus on the subfields of Evidence and Criminal law using one of the most celebrated cases in the history of American jurisprudence, the 1994 O.J. Simpson murder case in California. Our task here is to do a probabilistic and graphical analysis of the body of evidence in this case using Bayesian Networks. We will begin the analysis by first constructing our main hypothesis regarding the guilt or non-guilt of the accused; this main hypothesis will be supplemented by a series of ancillary hypotheses. Using graphs and probability concepts, we will evaluate the probative force or strength of the evidence and how well the body of evidence at hand will prove our main hypothesis. We will employ Bayes' rule, likelihoods and likelihood ratios to carry out such an evaluation. Some sensitivity analyses will be carried out by varying the degree of our prior beliefs or probabilities, and evaluating the effect of such variations on the likelihood ratios regarding our main hypothesis.
 Date Issued
 2010
 Identifier
 FSU_migr_etd2287
 Format
 Thesis
 Title
 Investigating the Use of Mortality Data as a Surrogate for Morbidity Data.
 Creator

Miller, Gregory, Hollander, Myles, McGee, Daniel, Hurt, Myra, Wu, Wei, Zhang, Jinfeng, Department of Statistics, Florida State University
 Abstract/Description

We are interested in differences between risk models based on Coronary Heart Disease (CHD) incidence, or morbidity, compared to risk models based on CHD death. Risk models based on morbidity have been developed based on the Framingham Heart Study, while the European SCORE project developed a risk model for CHD death. Our goal is to determine whether these two developed models differ in treatment decisions concerning patient heart health. We begin by reviewing recent metrics in surrogate variables and prognostic model performance. We then conduct bootstrap hypothesis tests between two Cox proportional hazards models using Framingham data, one with incidence as a response and one with death as a response, and find that the coefficients differ for the age covariate, but find no significant differences for the other risk factors. To understand how surrogacy can be applied to our case, where the surrogate variable is nested within the true variable of interest, we examine models based on a composite event compared to models based on singleton events. We also conduct a simulation, simulating times to a CHD incidence and times from CHD incidence to CHD death, censoring at 25 years to represent the end of a study. We compare a Cox model with death response with a Cox model based on incidence using bootstrapped confidence intervals, and find differences for the age and systolic blood pressure covariates. We continue the simulation by using the Net Reclassification Index (NRI) to evaluate the treatment decision performance of the two models, and find that the two models do not perform significantly differently in correctly classifying events if the decisions are based on the risk ranks of the individuals. As long as the relative order of patients' risks is preserved across different risk models, treatment decisions based on classifying an upper specified percent as high risk will not be significantly different.
We conclude the dissertation with statements about future methods for approaching our question.
 Date Issued
 2011
 Identifier
 FSU_migr_etd2408
 Format
 Thesis
 Title
 Sparse Factor AutoRegression for Forecasting Macroeconomic Time Series with Very Many Predictors.
 Creator

Galvis, Oliver Kurt, She, Yiyuan, Okten, Giray, Beaumont, Paul, Huffer, Fred, Tao, Minjing, Department of Statistics, Florida State University
 Abstract/Description

Forecasting a univariate target time series in high dimensions with very many predictors poses challenges in statistical learning and modeling. First, many nuisance time series exist and need to be removed. Second, from economic theories, a macroeconomic target series is typically driven by a few latent factors constructed from some macroeconomic indices. Consequently, a high-dimensional problem arises where deleting junk time series and constructing predictive factors simultaneously are meaningful and advantageous for the accuracy of the forecasting task. In macroeconomics, multiple categories are available, with the target series belonging to one of them. With all series available, we advocate constructing category-level factors to enhance the performance of the forecasting task. We introduce a novel methodology, the Sparse Factor AutoRegression (SFAR) methodology, to construct predictive factors from a reduced set of relevant time series. SFAR attains dimension reduction via joint variable selection and rank reduction in high-dimensional time series data. A multivariate setting is used to achieve simultaneous low rank and cardinality control on the matrix of coefficients, where the $\ell_{0}$ constraint regulates the number of useful series and the rank constraint sets the upper bound for the number of constructed factors. The doubly constrained matrix estimation is a nonconvex mathematical problem optimized via an efficient iterative algorithm with a theoretical guarantee of convergence. SFAR fits factors using a sparse low-rank matrix in response to a target category series. Forecasting is then performed using lagged observations and shrinkage methods. We generate finite-sample data to verify our theoretical findings via a comparative study of the SFAR. We also analyze real-world macroeconomic time series data to demonstrate the usage of the SFAR in practice.
 Date Issued
 2014
 Identifier
 FSU_migr_etd8990
 Format
 Thesis
 Title
 Adaptive Series Estimators for Copula Densities.
 Creator

Gui, Wenhao, Wegkamp, Marten, Van Engelen, Robert A., Niu, Xufeng, Huffer, Fred, Department of Statistics, Florida State University
 Abstract/Description

In this thesis, based on an orthonormal series expansion, we propose a new nonparametric method to estimate copula density functions. Since the basis coefficients turn out to be expectations, empirical averages are used to estimate these coefficients. We propose estimators of the variance of the estimated basis coefficients and establish their consistency. We derive the asymptotic distribution of the estimated coefficients under mild conditions. We derive a simple oracle inequality for the copula density estimator based on a finite series using the estimated coefficients. We propose a stopping rule for selecting the number of coefficients used in the series and we prove that this rule minimizes the mean integrated squared error. In addition, we consider hard and soft thresholding techniques for sparse representations. We obtain oracle inequalities that hold with prescribed probability for various norms of the difference between the copula density and our threshold series density estimator. Uniform confidence bands are derived as well. The oracle inequalities clearly reveal that our estimator adapts to the unknown degree of sparsity of the series representation of the copula density. A simulation study indicates that our method is extremely easy to implement and works very well, and it compares favorably to the popular kernel-based copula density estimator, especially around the boundary points, in terms of mean squared error. Finally, we have applied our method to an insurance dataset. After comparing our method with the previous data analyses, we reach the same conclusion as the parametric methods in the literature and as such we provide additional justification for the use of the developed parametric model.
 Date Issued
 2009
 Identifier
 FSU_migr_etd3929
 Format
 Thesis
 Title
 Estimating the Probability of Cardiovascular Disease: A Comparison of Methods.
 Creator

Fan, Li, McGee, Daniel, Hurt, Myra, Niu, XuFeng, Huffer, Fred, Department of Statistics, Florida State University
 Abstract/Description

Risk prediction plays an important role in clinical medicine. It not only helps in educating patients to improve their lifestyle and in targeting individuals at high risk, but also guides treatment decisions. So far, various instruments have been used for risk assessment in different countries, and the risk predictions based on these different models are not consistent. In public use, a reliable risk prediction is necessary. This thesis discusses the models that have been developed for risk assessment and evaluates the performance of prediction at two levels, the overall level and the individual level. At the overall level, cross validation and simulation are used to assess the risk prediction, while at the individual level, the "Parametric Bootstrap" and the delta method are used to evaluate the uncertainty of the individual risk prediction. Further exploration of the reasons for the different performance among the models is ongoing.
 Date Issued
 2009
 Identifier
 FSU_migr_etd4508
 Format
 Thesis
 Title
 The One- and Two-Sample Problem for Data on Hilbert Manifolds with Applications to Shape Analysis.
 Creator

Qiu, Mingfei, Patrangenaru, Victor, Liu, Xiuwen, Slate, Elizabeth H., Barbu, Adrian G. (Adrian Gheorghe), Clickner, Robert Paul, Paige, Robert, Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

This dissertation is concerned with high-level imaging analysis. In particular, our focus is on extracting the projective shape information or the similarity shape from digital camera images or Magnetic Resonance Imaging (MRI). The approach is statistical, without making any assumptions about the distributions of the random object under investigation. The data is organized as points on a Hilbert manifold. In the case of projective shapes of finite-dimensional configurations of points, we consider testing a one-sample null hypothesis, while in the infinite-dimensional case, we consider a neighborhood hypothesis testing method. For 3D scenes, we retrieve the 3D projective shape and use the Lie group structure of the projective shape space. We test the equality of two extrinsic means by introducing the mean projective shape change. For 2D MRI of midsections of Corpus Callosum contours, we use an automatic matching technique that is necessary in pursuing one-sample neighborhood hypothesis testing for the similarity shapes. We conclude that the mean similarity shape of the Corpus Callosum of average individuals is very far from the shape of Albert Einstein's, which may explain his geniality. Another application of our Hilbert manifold methodology is the two-sample testing problem for Veronese-Whitney means of projective shapes of 3D contours. In particular, our data analysis consists of comparing 3D projective shapes of contours of leaves from the same tree species.
 Date Issued
 2015
 Identifier
 FSU_2015fall_Qiu_fsu_0071E_12922
 Format
 Thesis
 Title
 Statistical Methods for Big Data and Their Applications in Biomedical Research.
 Creator

Yu, Kaixian, Zhang, Jinfeng, Sang, QingXiang Amy, Barbu, Adrian G. (Adrian Gheorghe), She, Yiyuan, Sinha, Debajyoti, Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

Big data has brought both opportunities and challenges to our research community. Complex models can be built with large volumes of data that researchers have never had access to before. In this study we explore the structure learning of Bayesian networks (BNs) and its application to reverse engineering of gene regulatory networks (GRNs). A Bayesian network is a graphical representation of a joint distribution that encodes the conditional dependencies and independencies among the variables. We propose a novel three-stage BN structure learning method, called GRASP (GRowth-based Approach with Staged Pruning). In the first stage, a new skeleton (undirected edges) discovery method, double filtering (DF), is designed. Compared to existing methods, DF requires smaller sample sizes to achieve similar statistical power. In the second stage, based on the skeleton estimated in the first stage, we propose a sequential Monte Carlo (SMC) method to sample the edges and their directions to optimize a BIC-based score. The SMC method has less tendency to be trapped in local optima, and the computation is easily parallelizable. In the third stage, we reclaim the edges that may have been missed in the previous stages. We obtained satisfactory results from a simulation study and applied the method to infer GRNs from real experimental data. A method for personalized chemotherapy regimen selection for breast cancer and a novel algorithm for relationship extraction from unstructured documents will be discussed as well.
 Date Issued
 2016
 Identifier
 FSU_2016SP_Yu_fsu_0071E_13079
 Format
 Thesis
 Title
 Quasi-3D Statistical Inversion of Oceanographic Tracer Data.
 Creator

Herbei, Radu, Speer, Kevin, Wegkamp, Marten, Laurent, Louis St., Huffer, Fred, Niu, Xufeng, Department of Statistics, Florida State University
 Abstract/Description

We perform a quasi-3D Bayesian inversion of oceanographic tracer data from the South Atlantic Ocean. Initially we consider one active neutral density layer with an upper and lower boundary. The available hydrographic data is linked to model parameters (water velocities, diffusion coefficients) via a 3D advection-diffusion equation. A robust solution to the inverse problem considered can be attained by introducing prior information about parameters and modeling the observation error. This approach estimates both horizontal and vertical flow as well as diffusion coefficients. We find a system of alternating zonal jets at the depths of the North Atlantic Deep Water, consistent with direct measurements of flow and concentration maps. A uniqueness analysis of our model is performed in terms of the oxygen consumption rate. The vertical mixing coefficient bears some relation to the bottom topography even though we do not incorporate that into our model. We extend the method to a multilayer model, using thermal wind relations weakly in a local fashion (as opposed to integrating the entire water column) to connect layers vertically. Results suggest that the estimated deep zonal jets extend vertically, with a clear depth-dependent structure. The vertical structure of the flow field is modified by the tracer fields over that set a priori by thermal wind. Our estimates are consistent with observed flow at the depths of the Antarctic Intermediate Water; at still shallower depths, above the layers considered here, the subtropical gyre is a significant feature of the horizontal flow.
 Date Issued
 2006
 Identifier
 FSU_migr_etd4101
 Format
 Thesis
 Title
 Predictive Accuracy Measures for Binary Outcomes: Impact of Incidence Rate and Optimization Techniques.
 Creator

Scolnik, Ryan, McGee, Daniel, Slate, Elizabeth H., Eberstein, Isaac W., Huffer, Fred W. (Fred William), Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

Evaluating the performance of models predicting a binary outcome can be done using a variety of measures. While some measures intend to describe the model's overall fit, others more accurately describe the model's ability to discriminate between the two outcomes. If a model fits well but doesn't discriminate well, what does that tell us? Given two models, if one discriminates well but has poor fit while the other fits well but discriminates poorly, which of the two should we choose? The measures of interest for our research include the area under the ROC curve, Brier Score, discrimination slope, LogLoss, R-squared and F-score. To examine the underlying relationships among all of the measures, real data and simulation studies are used. The real data comes from multiple cardiovascular research studies, and the simulation studies are run under general conditions and also for incidence rates ranging from 2% to 50%. The results of these analyses provide insight into the relationships among the measures and raise concern for scenarios when the measures may yield different conclusions. The impact of incidence rate on the relationships provides a basis for exploring alternative maximization routines to logistic regression. While most of the measures are easily optimized using the Newton-Raphson algorithm, the maximization of the area under the ROC curve requires optimization of a nonlinear, nondifferentiable function. Usage of the Nelder-Mead simplex algorithm and close connections to economics research yield unique parameter estimates and general asymptotic conditions. Using real and simulated data to compare optimizing the area under the ROC curve to logistic regression further reveals the impact of incidence rate on the relationships, significant increases in achievable areas under the ROC curve, and differences in conclusions about including a variable in a model.
 Date Issued
 2016
 Identifier
 FSU_2016SP_Scolnik_fsu_0071E_13146
 Format
 Thesis
 Title
 Time-Varying Mixture Models for Financial Risk Management.
 Creator

Zhang, Shuguang, Niu, Xufeng, Cheng, Yingmei, Huffer, Fred W. (Fred William), Tao, Minjing, Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

Motivated by understanding the devastating financial crisis in 2008 that was partially caused by underestimation of financial risk, we propose a class of timevarying mixture models for risk analysis and management. There are various metrics for financial risk including value at risk (VaR), expected shortfall, expected / unexpected loss, etc. In this study we focus on VaR and one commonly used method to estimate VaR is the VarianceCovariance method, in which normal distribution is usually...
Show moreMotivated by understanding the devastating financial crisis in 2008 that was partially caused by underestimation of financial risk, we propose a class of timevarying mixture models for risk analysis and management. There are various metrics for financial risk including value at risk (VaR), expected shortfall, expected / unexpected loss, etc. In this study we focus on VaR and one commonly used method to estimate VaR is the VarianceCovariance method, in which normal distribution is usually assumed for asset returns that may underestimate the real risk. To address this issue, in this study we propose a series of twocomponent mixture models  one component is normal distribution and the other is a fattailed distribution such as Cauchy distribution, student's tdistribution or Gumbel distribution. Instead of assuming distribution parameters and weights to be constant, we allow them to change over time which guarantees exibility of our models. Monte Carlo ExpectationMaximization method and Monte Carlo maximum likelihood estimation were used for parameter estimation. Simulation studies are conducted and the models are applied in stock market price data.
 Date Issued
 2016
 Identifier
 FSU_2016SP_Zhang_fsu_0071E_13150
 Format
 Thesis
 Title
 Sparse Generalized PCA and Dependency Learning for Large-Scale Applications Beyond Gaussianity.
 Creator

Zhang, Qiaoya, She, Yiyuan, Ma, Teng, Niu, Xufeng, Sinha, Debajyoti, Slate, Elizabeth H., Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

The age of big data has renewed much interest in dimension reduction. How to cope with high-dimensional data remains a difficult problem in statistical learning. In this study, we consider the task of dimension reduction: projecting data into a lower-rank subspace while preserving maximal information. We investigate the pitfalls of classical PCA and propose a set of algorithms that function under high dimension, extend to all exponential family distributions, perform feature selection at the same time, and take missing values into consideration. Based upon the best performing one, we develop the SGPCA algorithm. With acceleration techniques and a progressive screening scheme, it demonstrates superior scalability and accuracy compared to existing methods. Concerned with the independence assumption of dimension reduction techniques, we propose a novel framework, Generalized Indirect Dependency Learning (GIDL), to learn and incorporate association structure in multivariate statistical analysis. Without constraints on the particular distribution of the data, GIDL takes any pre-specified smooth loss function and is able to both extract and infuse its association into the regression, classification, or dimension reduction problem. Experiments at the end serve to demonstrate its efficacy.
 Date Issued
 2016
 Identifier
 FSU_2016SP_Zhang_fsu_0071E_13087
 Format
 Thesis
 Title
 Modeling Differential Item Functioning (DIF) Using Multilevel Logistic Regression Models: A Bayesian Perspective.
 Creator

Chaimongkol, Saengla, Huffer, Fred W., Kamata, Akihito, Tate, Richard, Niu, XuFeng, McGee, Daniel, Department of Statistics, Florida State University
 Abstract/Description

A multilevel logistic regression approach provides an attractive and practical alternative for the study of Differential Item Functioning (DIF). It is not only useful for identifying items with DIF but also for explaining the presence of DIF. Kamata and Binici (2003) first attempted to identify group unit characteristic variables explaining the variation of DIF by using hierarchical generalized linear models. Their models were implemented by the HLM5 software, which uses the penalized or predictive quasi-likelihood (PQL) method. They found that the variance estimates produced by HLM5 for the level 3 parameters are substantially negatively biased. This study extends their work by using a Bayesian approach to obtain more accurate parameter estimates. Two different approaches to modeling the DIF will be presented. These are referred to as the relative and mixture distribution approaches, respectively. The relative approach measures the DIF of a particular item relative to the mean overall DIF for all items in the test. The mixture distribution approach treats the DIF as independent values drawn from a distribution which is a mixture of a normal distribution and a discrete distribution concentrated at zero. A simulation study is presented to assess the adequacy of the proposed models. This work also describes and studies models which allow the DIF to vary at level 3 (from school to school). In an example using real data, it is shown how the models can be applied to the identification of items with DIF and the explanation of the source of the DIF.
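The mixture distribution approach can be pictured with a small sampling sketch. It only draws item-level DIF values from a point mass at zero mixed with a normal distribution; it does not fit the multilevel model, and the function name, mixing probability, and normal parameters are hypothetical.

```python
import random


def sample_dif(n_items, p_zero, mean, sd, seed=0):
    """Draw DIF values: exactly 0 with probability p_zero, else Normal(mean, sd)."""
    rng = random.Random(seed)
    return [
        0.0 if rng.random() < p_zero else rng.gauss(mean, sd)
        for _ in range(n_items)
    ]


# Illustrative values: most items are assumed to show no DIF at all.
dif_values = sample_dif(n_items=1000, p_zero=0.8, mean=0.0, sd=0.5)
n_without_dif = sum(1 for d in dif_values if d == 0.0)
print(f"{n_without_dif} of {len(dif_values)} simulated items have zero DIF")
```

Under this kind of prior, an item is flagged when the data pull its DIF away from the point mass at zero.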
 Date Issued
 2005
 Identifier
 FSU_migr_etd3939
 Format
 Thesis
 Title
 Time Scales in Epidemiological Analysis.
 Creator

Chalise, Prabhakar, McGee, Daniel L., Chicken, Eric, Carlson, Elwood, Sinha, Debajyoti, Department of Statistics, Florida State University
 Abstract/Description

The Cox proportional hazards model is routinely used to model the time until an event of interest. Two time scales are used in practice: follow-up time and chronological age. The former is the most frequently used time scale in both clinical studies and longitudinal observational studies. However, there is no general consensus about which time scale is best. In recent years, papers have appeared arguing for using chronological age as the time scale, either with or without adjusting for the entry age. Also, it has been asserted that if the cumulative baseline hazard is exponential or if the age-at-entry is independent of the covariates, the two models are equivalent. Our studies do not satisfy these two conditions in general. We found that the true factor that makes the models perform significantly differently is the variability in the age-at-entry. If there is no variability in the entry age, time scales do not matter and both models estimate exactly the same coefficients. As the variability increases, the models disagree with each other. We also computed the optimum time scale proposed by Oakes and utilized it for the Cox model. Both our empirical and simulation studies show that the follow-up time scale model using age at entry as a covariate is better than the chronological age and Oakes time scale models. This finding is illustrated with two examples with data from the Diverse Population Collaboration. Based on our findings, we recommend using follow-up time as the time scale for epidemiological analysis.
 Date Issued
 2009
 Identifier
 FSU_migr_etd3933
 Format
 Thesis
 Title
 New Semiparametric Methods for Recurrent Events Data.
 Creator

Gu, Yu, Sinha, Debajyoti, Eberstein, Isaac W., McGee, Dan, Niu, Xufeng, Department of Statistics, Florida State University
 Abstract/Description

Recurrent events data are rising in all areas of biomedical research. We present a model for recurrent events data with the same link for the intensity and mean functions. Simple interpretations of the covariate effects on both the intensity and mean functions lead to a better understanding of the covariate effects on the recurrent events process. We use partial likelihood and empirical Bayes methods for inference and provide theoretical justifications as well as relationships between these methods. We also show the asymptotic properties of the empirical Bayes estimators. We illustrate the computational convenience and implementation of our methods with the analysis of a heart transplant study. We also propose an additive regression model and associated empirical Bayes method for the risk of a new event given the history of the recurrent events. Both the cumulative mean and rate functions have closed form expressions for our model. Our inference method for the semiparametric model is based on maximizing a finite dimensional integrated likelihood obtained by integrating over the nonparametric cumulative baseline hazard function. Our method can accommodate time-varying covariates and is easier to implement computationally than full Bayes methods based on iterative algorithms. The asymptotic properties of our estimates give large-sample justifications from a frequentist standpoint. We apply our method to a study of heart transplant patients to illustrate the computational convenience and other advantages of our method.
 Date Issued
 2011
 Identifier
 FSU_migr_etd3941
 Format
 Thesis
 Title
 New Methods in Tornado Risk and Vulnerability Assessments.
 Creator

Widen, Holly Marie, Elsner, James B., Hart, Robert E. (Robert Edward), Uejio, Christopher K., Pau, Stephanie, Medders, Lori A., Florida State University, College of Social Sciences and Public Policy, Department of Geography
 Abstract/Description

This dissertation includes a series of studies that present innovative methodologies to improve tornado risk and vulnerability assessments. Limitations of the historical tornado dataset are well known and relate to inconsistencies in data collection procedures, rating assessments, updates in technology, and public awareness. The limitations make it difficult to accurately evaluate tornado risk and vulnerability. Thus, the research presented in this dissertation aims to 1) improve tornado risk assessments using the historical dataset by accounting for known non-meteorological factors and 2) enhance tornado vulnerability assessments by utilizing a new dataset containing more precise damage survey data. This work includes three individual studies, two focused on risk and one on vulnerability, using different geographic scales. Tornado occurrence rates computed from the available reports are biased low relative to the unknown true rates. A method to estimate the annual statewide probability of getting hit by a tornado corrects this low bias by using the average report density as a function of distance from the nearest city center. The method is demonstrated on Kansas and then applied to 15 other tornado-prone states from Nebraska to Tennessee over the period 1950-2011. The adjusted rates are significantly higher than the raw rates and thus the return periods are shorter than previously thought (closer to 1000 years). The expected annual number of people exposed to tornadoes has also increased for every state. The evaluation of tornado occurrences is improved using a statistical model that produces a smoothed regional-scale climatology. The model is applied to data aggregated at the county level, including annual population, annual tornado counts, and an index of terrain roughness. 
The model has a term to capture the smoothed frequency relative to the state average and is used to examine additional hypotheses concerning relationships of tornado activity with terrain roughness and County Warning Area. Tornado reports are found to increase by 13% for a twofold increase in population across Kansas after accounting for improvements in rating procedures. The pattern of spatially correlated errors also shows Kansas tornado activity to be consistent with the dryline climatology. The model is significantly improved by adding terrain roughness, which has a negative relationship with tornado activity. Its flexibility is demonstrated by fitting it to data from Illinois, Mississippi, South Dakota, and Ohio. Advancements in technology have improved the collection of tornado damage survey data, which can be used to enhance vulnerability assessments. The National Weather Service (NWS) Damage Assessment Toolkit (DAT) contains the most extensive GIS-based damage survey data available to the public, which provides more precise damage path areas. These data are used with socioeconomic data in two statistical models. The models are developed to determine which factors are significant predictors of the incidence and magnitude of casualties while accounting for maximum EF Scale rating, total path area, and population density at the storm level. Percent unemployment is a significant predictor and produces the best model for the incidence of at least one tornado casualty. Although percent elderly generates the best model for predicting the magnitude of casualties, it is only marginally significant and its relationship is negative. The Southeast has the highest averages of the sensitivity factors considering all of the tornado events. These results highlight the need for heightened tornado awareness and preparedness as our exposure to these events increases due to our population continuing to expand. 
As demonstrated in this work, these methods can be used to enhance regional/local tornado forecasts, insurance risk estimates, public policy, urban planning, and emergency management and mitigation with the detection of spatiotemporal patterns in tornado activity (due to variations in climate) and vulnerability (due to changes in population demographics and urban sprawl). They can be employed to examine other geographic locations on multiple scales. They can also be adapted to study the patterns and relationships of other spatial and temporal phenomena.
 Date Issued
 2016
 Identifier
 FSU_2016SP_Widen_fsu_0071E_13208
 Format
 Thesis
 Title
 A Statistical Approach to an Ocean Circulation Inverse Problem.
 Creator

Choi, Seoeun, Huffer, Fred W., Speer, Kevin G., Nolder, Craig, Niu, Xufeng, Wu, Wei, Department of Statistics, Florida State University
 Abstract/Description

This dissertation presents, applies, and evaluates a statistical approach to an ocean circulation problem. The objective is to produce a map of ocean velocity in the North Atlantic from sparse measurements along ship tracks, using a Bayesian approach with a physical model. The Stommel Gulf Stream model, which relates the wind stress curl to the transport stream function, is the physical model. A Gibbs sampler is used to extract features from the posterior velocity field. To specify the prior, the equation of the Stommel Gulf Stream model on a two-dimensional grid is used. Comparisons with earlier approaches used by oceanographers are also presented.
 Date Issued
 2007
 Identifier
 FSU_migr_etd3758
 Format
 Thesis
 Title
 Modeling Multivariate Data with Parameter-Based Subspaces.
 Creator

Gupta, Ajay, Barbu, Adrian G. (Adrian Gheorghe), Meyer-Baese, Anke, She, Yihuan, Zhang, Jinfeng, Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

When modeling multivariate data such as vectorized images, one might have an extra parameter of contextual information that could be used to treat some observations as more similar to others. For example, images of faces can vary by yaw rotation, and one would expect a face rotated 65 degrees to the left to have characteristics more similar to a face rotated 55 degrees to the left than to a face rotated 65 degrees to the right. We introduce a novel method, parameterized principal component analysis (PPCA), that can model data with linear variation like principal component analysis (PCA), but can also take advantage of this parameter of contextual information like yaw rotation. Like PCA, PPCA models an observation using a mean vector and the product of observation-specific coefficients and basis vectors. Unlike PCA, PPCA treats the elements of the mean vector and basis vectors as smooth, piecewise linear functions of the contextual parameter. PPCA is fit by a penalized optimization that penalizes potential models which have overly large differences between corresponding mean or basis vector elements for similar parameter values. The penalty ensures that each observation's projection will share information with observations that have similar parameter values, but not with observations that have dissimilar parameter values. We tested PPCA on artificial data based on known, smooth functions of an added parameter, as well as on three real datasets with different types of parameters. We compared PPCA to independent principal component analysis (IPCA), which groups observations by their parameter values and projects each group using principal component analysis with no sharing of information for different groups. PPCA recovers the known functions with less error and projects the datasets' test set observations with consistently less reconstruction error than IPCA does. 
PPCA's performance is particularly strong, relative to IPCA, when there are limited training data. We also tested the use of spectral clustering to form the groups in an IPCA model. In our experiment, the clustered IPCA model had very similar error to the parameter-based IPCA model, suggesting that spectral clustering might be a viable alternative if one did not know the parameter values for an application.
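The model form described above (a mean vector plus coefficients times basis vectors, each a piecewise linear function of the contextual parameter) can be sketched as follows. This shows only how a fitted model of that shape would reconstruct an observation; the penalized fitting is not implemented, and the knot layout, function names, and toy numbers are assumptions made for illustration.

```python
from bisect import bisect_right


def interpolate(knots, vectors, theta):
    """Piecewise-linear interpolation of a vector-valued function given at sorted knots."""
    if theta <= knots[0]:
        return list(vectors[0])
    if theta >= knots[-1]:
        return list(vectors[-1])
    j = bisect_right(knots, theta) - 1
    t = (theta - knots[j]) / (knots[j + 1] - knots[j])
    return [(1 - t) * a + t * b for a, b in zip(vectors[j], vectors[j + 1])]


def reconstruct(theta, coeffs, knots, mean_knots, basis_knots):
    """x_hat = mean(theta) + sum_k coeffs[k] * basis_k(theta)."""
    x_hat = interpolate(knots, mean_knots, theta)
    for c, basis_k in zip(coeffs, basis_knots):
        b = interpolate(knots, basis_k, theta)
        x_hat = [x + c * bi for x, bi in zip(x_hat, b)]
    return x_hat


# Toy model: yaw angle as the parameter, 2-dimensional data, one basis vector.
knots = [-90.0, 0.0, 90.0]
mean_knots = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]]      # mean vector at each knot
basis_knots = [[[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]]   # basis vector 0 at each knot
print(reconstruct(45.0, [2.0], knots, mean_knots, basis_knots))
```

Because neighboring parameter values share knots, observations at 55 and 65 degrees use nearly the same mean and basis, which is the information sharing the abstract attributes to PPCA.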
 Date Issued
 2016
 Identifier
 FSU_2016SU_Gupta_fsu_0071E_13422
 Format
 Thesis
 Title
 Bayesian Inference and Novel Models for Survival Data with Cured Fraction.
 Creator

Gupta, Cherry Chunqi Huang, Sinha, Debajyoti, Glueckauf, Robert L., Slate, Elizabeth H., Pati, Debdeep, Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

Existing cure-rate survival models are generally not convenient for modeling and estimating the survival quantiles of a patient with specified covariate values. They also do not allow inference on the change in the number of clonogens over time. This dissertation proposes two novel classes of cure-rate models, the transform-both-sides cure-rate model (TBSCRM) and the clonogen proliferation cure-rate model (CPCRM). Both can be used to make inference about both the cure rate and the survival probabilities over time. The TBSCRM can also produce estimates of a patient's quantiles of survival time, and the CPCRM can produce estimates of a patient's expected number of clonogens at each time. We develop methods of Bayesian inference about the covariate effects on relevant quantities such as the cure rate, using Markov Chain Monte Carlo (MCMC) tools. We also show that the TBSCRM-based and CPCRM-based Bayesian methods perform well in simulation studies and outperform existing cure-rate models in application to the breast cancer survival data from the National Cancer Institute’s Surveillance, Epidemiology and End Results (SEER) database.
 Date Issued
 2016
 Identifier
 FSU_2016SU_Gupta_fsu_0071E_13423
 Format
 Thesis
 Title
 Examining the Relationship of Dietary Component Intakes to Each Other and to Mortality.
 Creator

Alrajhi, Sharifah, McGee, Daniel, Levenson, Cathy W., Niu, Xufeng, Sinha, Debajyoti, Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

In this essay we present an analysis examining the basic dietary structure and its relationship to mortality in the first National Health and Nutrition Examination Survey (NHANES I) conducted between 1971 and 1975. We used results from 24-hour recalls on 10,483 individuals in this study. All of the individuals in the analytic sample were followed through 1992 for vital status. The mean follow-up period for the participants was 16 years. During follow-up 2,042 (48%) males and 1,754 (27%) females died. We first attempted to capture the inherent structure of the dietary data using principal components analyses (PCA). We performed this estimation separately for each race (white and black) and gender (male and female) and compared the estimated principal components among these four strata. We found that the principal components were similar (but not identical) in the four strata. We also related our estimated principal components to mortality using Cox Proportional Hazards (CPH) models and related dietary components to mortality using forward variable selection.
 Date Issued
 2015
 Identifier
 FSU_2015fall_Alrajhi_fsu_0071E_12802
 Format
 Thesis
 Title
 Median Regression for Complex Survey Data.
 Creator

Fraser, Raphael André, Sinha, Debajyoti, Lipsitz, Stuart, Carlson, Elwood, Slate, Elizabeth H., Huffer, Fred W. (Fred William), Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

The ready availability of public-use data from various large national complex surveys has immense potential for the assessment of population characteristics: means, proportions, totals, etcetera. Using a model-based approach, complex surveys can be used to evaluate the effectiveness of treatments and to identify risk factors for important diseases such as cancer. Existing statistical methods based on estimating equations and/or utilizing resampling methods are often not valid with survey data due to design features such as stratification, multistage sampling and unequal selection probabilities. In this paper, we accommodate these design features in the analysis of highly skewed response variables arising from large complex surveys. Specifically, we propose a double-transform-both-sides based estimating equations approach to estimate the median regression parameters of the highly skewed response; the double-transform-both-sides method applies the same transformation twice to both the response and regression function. The usual sandwich variance estimate can be used in our approach, whereas a resampling approach would be needed for a pseudo-likelihood based on minimizing absolute deviations. Furthermore, the double-transform-both-sides estimator is relatively robust to the true underlying distribution, and has much smaller mean square error than the least absolute deviations estimator. The method is motivated by an analysis of laboratory data on urinary iodine concentration from the National Health and Nutrition Examination Survey.
 Date Issued
 2015
 Identifier
 FSU_2015fall_Fraser_fsu_0071E_12825
 Format
 Thesis
 Title
 Matched Sample Based Approach for Cross-Platform Normalization on Gene Expression Data.
 Creator

Shao, Jiang, Zhang, Jinfeng, Sang, Qing-Xiang Amy, Wu, Wei, Niu, Xufeng, Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

Gene-expression data profiles are widely used in all kinds of biomedical studies, especially in cancer research. This dissertation focuses on solving the problem of how to combine datasets arising from different studies. Of particular interest is how to remove the platform effect alone. The matched sample based cross-platform normalization method we developed is designed to tackle the data merging problem in two scenarios. The first is Affy-Agilent cross-platform normalization, in which both platforms belong to classic microarray gene expression profiling. The second is the integration of microarray data with Next Generation Sequencing genome data. We use several general validation measures to assess our method and compare it with the popular distance-weighted discrimination (DWD) method. Supported by the public web-based tool NCI-60 CellMiner and The Cancer Genome Atlas data portal, our proposed method outperformed DWD in both cross-platform scenarios. It can be further assessed by its ability to explore biological features in studies of cancer type discrimination. We applied our method to two classification problems: one is breast cancer tumor/normal status classification on microarray and next generation sequencing datasets; the other is breast cancer patients' chemotherapy response classification on GPL96 and GPL570 microarray datasets. Both problems show that classification power is increased after our matched sample based cross-platform normalization method is applied.
 Date Issued
 2015
 Identifier
 FSU_2015fall_Shao_fsu_0071E_12833
 Format
 Thesis
 Title
 An analysis of test reliability.
 Creator

Isaacson, Fenton R., Florida State University
 Abstract/Description

"The need for efficient means of testing has long been recognized. To obtain efficiency in testing requires the study of four attributes of the testing instrument, namely: reliability, validity, interpretability and administrability. It is the purpose of this paper to examine in some detail the first of these attributes, reliability. In particular, this is an attempt to analyse the reliability of Mathematics 101 Test D which was administered at Florida State University in the fall of 1948" (Introduction).
 Date Issued
 1949
 Identifier
 FSU_historic_AKP4870
 Format
 Thesis
 Title
 The Use of a Meta-Analysis Technique in Equating and Its Comparison with Several Small Sample Equating Methods.
 Creator

Caglak, Serdar, Paek, Insu, Patrangenaru, Victor, Almond, Russell G., Roehrig, Alysia D., Florida State University, College of Education, Department of Educational Psychology and Learning Systems
 Abstract/Description

The main objective of this study was to investigate the improvement of the accuracy of small sample equating, which typically occurs in teacher certification/licensure examinations due to a low volume of test takers per test administration, under the NonEquivalent Groups with Anchor Test (NEAT) design by combining previous and current equating outcomes using a metaanalysis technique. The proposed metaanalytic score transformation procedure was called "metaequating" throughout this study....
Show moreThe main objective of this study was to investigate the improvement of the accuracy of small sample equating, which typically occurs in teacher certification/licensure examinations due to a low volume of test takers per test administration, under the NonEquivalent Groups with Anchor Test (NEAT) design by combining previous and current equating outcomes using a metaanalysis technique. The proposed metaanalytic score transformation procedure was called "metaequating" throughout this study. To conduct metaequating, the previous and current equating outcomes obtained from the chosen equating methods (ID (Identity Equating), CircleArc (CA) and Nominal Weights Mean (NW)) and synthetic functions (SFs) of these methods (CAS and NWS) were used, and then, empirical Bayesian (EB) and metaequating (META) procedures were implemented to estimate the equating relationship between test forms at the population level. The SFs were created by giving equal weight to each of the chosen equating methods and the identity (ID) equating. Finally, the chosen equating methods, the SFs of each method (e.g., CAS, NWS, etc.), and also the META and EB versions (e.g., NWEB, CAMETA, NWSMETA, etc.) were investigated and compared under varying testing conditions. These steps involved manipulating some of the factors that influence the accuracy of test score equating. In particular, the effect of test form difficulty levels, the groupmean ability differences, the number of previous equatings, and the sample size on the accuracy of the equating outcomes were investigated. The Chained Equipercentile (CE) equating with 6univariate and 2bivariate moments loglinear presmoothing was used as the criterion equating function to establish the equating relationship between the new form and the base (reference) form with 50,000 examinees per test form. 
To compare the performance of the equating methods, small samples of examinees were randomly drawn from examinee populations with different ability levels in each simulation replication. Each pair of new and base test forms was randomly and independently selected from all available condition-specific test form pairs. Those test forms were then used to obtain previous equating outcomes. However, purposeful selections of the examinee ability and test form difficulty distributions were made to obtain the current equating outcomes in each simulation replication. The previous equating outcomes were later used to implement both the META and EB score transformation procedures. The effects of the study factors and their possible interactions on each of the accuracy measures were investigated along the entire-score range and the cut (reduced)-score range using a series of mixed-factorial ANOVA (MFA) procedures. The performances of the equating methods were also compared using post-hoc tests. Results show that the behavior of the equating methods varies with the level of group ability difference, test form difficulty difference, and new-group examinee sample size. Also, the use of both the META and EB procedures improved the accuracy of equating results on average. The META and EB versions of the chosen equating methods might therefore be a solution for equating test forms that are similar in their psychometric characteristics and are taken by new-form examinee samples of fewer than 50. However, since many factors affect equating results in practice, one should always expect that equating methods and score transformation procedures, or more generally estimation procedures, may function differently, to some degree, depending on the conditions in which they are implemented. 
Therefore, the recommendations in this study for the use of the proposed equating methods should be treated as a piece of information, not an absolute guideline, when forming a rule of thumb for practicing small-sample test equating in teacher certification/licensure examinations.
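The abstract above describes the synthetic functions (SFs) as an equal-weight combination of a chosen equating method and identity equating. A minimal Python sketch of that blending step follows, assuming the equated scores are already available; the function name `synthetic_function`, the `weight` parameter, and the sample scores are hypothetical and not taken from the thesis.

```python
def synthetic_function(equated_scores, raw_scores, weight=0.5):
    """Blend an equating method's output with identity equating.

    Assumption: identity equating maps each raw score to itself, so the
    synthetic score is weight * equated + (1 - weight) * raw. A weight of
    0.5 gives the equal weighting mentioned in the abstract.
    """
    if len(equated_scores) != len(raw_scores):
        raise ValueError("score lists must have the same length")
    return [weight * e + (1.0 - weight) * x
            for e, x in zip(equated_scores, raw_scores)]


# Hypothetical raw scores and the scores some equating method assigned them.
raw = [10, 20, 30]
equated = [12.0, 21.0, 29.0]
print(synthetic_function(equated, raw))  # [11.0, 20.5, 29.5]
```

Blending toward the identity function pulls a noisy small-sample estimate toward "no adjustment", which is reasonable only when the test forms are close in difficulty.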
 Date Issued
 2015
 Identifier
 FSU_2015fall_Caglak_fsu_0071E_12863
 Format
 Thesis
 Title
 Four Methods for Combining Dependent Effects from Studies Reporting Regression Analysis.
 Creator

Gunter, Tracey Danielle, Becker, Betsy Jane, Huffer, Fred W. (Fred William), Almond, Russell G., Paek, Insu, Florida State University, College of Education, Department of Educational Psychology and Learning Systems
 Abstract/Description

Over the years, a variety of indices have been proposed to summarize regression analyses. Unfortunately, the proposed indices are only appropriate when meta-analysts want to understand the role of a single predictor variable in predicting the outcome variable. However, sometimes meta-analysts want to understand the effect of a set of variables on an outcome variable. In this paper, four methods are presented for obtaining a composite effect for two focal predictor variables from a single regression model. The indices are the average of the standardized regression coefficients (ASC), the average of the standardized regression coefficients using Hedges and Olkin's (1985) approach (AHO), the sheaf coefficient (SC), and the squared multiple semipartial correlation coefficient (MSP). A simulation study was conducted to examine the behavior of the indices and their variances when the number of predictor variables in the model, the sample size, the correlations between the focal predictor variables in the model, and the correlations between the focal and nonfocal predictor variables in the model were manipulated. The results of the study show that the average bias values of the ASC and AHO estimates are small even when the sample size is small. Furthermore, the ASC and AHO estimates and their estimated variances are more precise than the other indices under all conditions examined. Therefore, when meta-analysts are interested in estimating the effect of a set of predictor variables on an outcome variable from a single regression model, the ASC or AHO procedures are preferred.
 Date Issued
 2015
 Identifier
 FSU_2015fall_Gunter_fsu_0071E_12829
 Format
 Thesis
 Title
 Functional Component Analysis and Regression Using Elastic Methods.
 Creator

Tucker, J. Derek, Srivastava, Anuj, Wu, Wei, Klassen, Eric, Huffer, Fred, Department of Statistics, Florida State University
 Abstract/Description

Constructing generative models for functional observations is an important task in statistical function analysis. In general, functional data contain both phase (or x or horizontal) and amplitude (or y or vertical) variability. Traditional methods often ignore the phase variability and focus solely on the amplitude variation, using cross-sectional techniques such as functional principal component analysis for dimension reduction and regression for data modeling. Ignoring phase variability leads to a loss of structure in the data and inefficiency in data models. Moreover, most methods use a "preprocessing" alignment step to remove the phase variability, without considering a more natural joint solution. This dissertation presents three approaches to this problem. The first relies on separating the phase (x-axis) and amplitude (y-axis), then modeling these components using joint distributions. This separation, in turn, is performed using a technique called elastic alignment of functions that involves a new mathematical representation of functional data. Then, using individual principal components, one each for the phase and amplitude components, it imposes joint probability models on the principal coefficients of these components while respecting the nonlinear geometry of the phase representation space. The second combines the phase variability into the objective function for two component analysis methods, functional principal component analysis and functional principal least squares. This creates a more complete solution, as the phase variability is removed while simultaneously extracting the components. The third approach combines the phase variability into the functional linear regression model and then extends the model to logistic and multinomial logistic regression. By incorporating the phase variability, a more parsimonious regression model is obtained and, therefore, more accurate prediction of observations is achieved. 
These models are then easily extended from functional data to curves (which are essentially functions in R²) to perform regression with curves as predictors. These ideas are demonstrated using random sampling from models estimated on simulated and real datasets, and the results show their superiority over models that ignore phase-amplitude separation. Furthermore, the models are applied to classification of functional data and achieve high performance in applications involving SONAR signals of underwater objects, handwritten signatures, periodic body movements recorded by smartphones, and physiological data.
 Date Issued
 2014
 Identifier
 FSU_migr_etd9106
 Format
 Thesis
 Title
 Parametric and Nonparametric Spherical Regression with Diffeomorphisms.
 Creator

Rosenthal, Michael, Srivastava, Anuj, Wu, Wei, Klassen, Eric, Pati, Debdeep, Department of Statistics, Florida State University
 Abstract/Description

Spherical regression explores relationships between pairs of variables on spherical domains. Spherical data have become more prevalent in biological, gaming, geographical, and meteorological investigations, creating a need for tools that analyze such data. Previous work on spherical regression has focused on rigid parametric models or nonparametric kernel smoothing methods. This leaves a large gap in the available tools, with no intermediate options currently available. This work develops two such intermediate models, a parametric model using projective linear transformations and a nonparametric model using diffeomorphic maps from a sphere to itself. The models are estimated in a maximum-likelihood framework using gradient-based optimization. For the parametric model, an efficient Newton-Raphson algorithm is derived and asymptotic analysis is developed. A first-order roughness penalty is specified for the nonparametric model using the Jacobian of diffeomorphisms. The prediction performance of the proposed models is compared with state-of-the-art methods using simulated and real data involving plate tectonics, cloud deformations, wind, accelerometer, bird migration, and vectorcardiogram data.
 Date Issued
 2014
 Identifier
 FSU_migr_etd9082
 Format
 Thesis
 Title
 Statistical Modelling and Applications of Neural Spike Trains.
 Creator

Lawhern, Vernon, Wu, Wei, Contreras, Robert J., Srivastava, Anuj, Huffer, Fred, Niu, Xufeng, Department of Statistics, Florida State University
 Abstract/Description

In this thesis we investigate statistical modelling of neural activity in the brain. We first develop a framework that extends the state-space Generalized Linear Model (GLM) of Eden and colleagues [20] to include the effects of hidden states. These states collectively represent variables that are not observed (or even observable) in the modelling process but can nonetheless have an impact on the neural activity. We then develop a framework that allows us to input a priori target information into the model. We examine both of these modelling frameworks on motor cortex data recorded from monkeys performing different target-driven hand and arm movement tasks. Finally, we perform temporal coding analysis of sensory stimulation using principled statistical models and show the efficacy of our approach.
 Date Issued
 2011
 Identifier
 FSU_migr_etd3251
 Format
 Thesis
 Title
 Statistical Models on Human Shapes with Application to Bayesian Image Segmentation and Gait Recognition.
 Creator

Kaziska, David M., Srivastava, Anuj, Mio, Washington, Chicken, Eric, Wegkamp, Marten, Department of Statistics, Florida State University
 Abstract/Description

In this dissertation we develop probability models for human shapes and apply those probability models to the problems of image segmentation and human identification by gait recognition. To build probability models on human shapes, we consider human shapes to be realizations of random variables on a space of simple closed curves and a space of elastic curves. Both of these spaces are quotient spaces of infinite-dimensional manifolds. Our probability models arise through Tangent Principal Component Analysis, a method of studying probability models on manifolds by projecting them onto a tangent plane to the manifold. Since we put the tangent plane at the Karcher mean of sample shapes, we begin our study by examining statistical properties of Karcher means on manifolds. We derive theoretical results for the location of Karcher means on certain manifolds, and perform a simulation study of properties of Karcher means on our shape space. Turning to the specific problem of distributions on human shapes, we examine alternatives for probability models and find that kernel density estimators perform well. We use this model to sample shapes and to perform shape testing. The first application we consider is human detection in infrared images. We pursue this application using Bayesian image segmentation, in which our proposed human in an image is a maximum likelihood estimate, obtained using a prior distribution on human shapes and a likelihood arising from a divergence measure on the pixels in the image. We then consider human identification by gait recognition. We examine human gait as a cyclostationary process on the space of elastic curves and develop a metric on processes based on the geodesic distance between sequences on that space. 
We develop and demonstrate a framework for gait recognition based on this metric, which includes the following elements: automatic detection of gait cycles, interpolation to register gait cycles, computation of a mean gait cycle, and identification by matching a test cycle to the nearest member of a training set. We perform the matching both by an exhaustive search of the training set and through an expedited method using cluster-based trees and boosting.
 Date Issued
 2005
 Identifier
 FSU_migr_etd3275
 Format
 Thesis
 Title
 A Framework for Comparing Shape Distributions.
 Creator

Henning, Wade, Srivastava, Anuj, Alamo, Rufina G., Huffer, Fred W. (Fred William), Wu, Wei, Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

The problem of comparing shape populations arises in many branches of science, including nanomanufacturing, medical imaging, particle analysis, fisheries, seed science, and computer vision. Researchers in these fields have traditionally characterized the profiles in these sets using combinations of scalar-valued descriptor features, such as aspect ratio or roughness, whose distributions are easy to compare using classical statistics. However, there is a desire in this community for a single comprehensive feature that uniquely defines these profiles. The shape of the profile itself is such a feature. Shape features have traditionally been studied as individuals, and comparing distributions underlying sets of shapes is challenging. Since the data come in the form of samples from shape populations, we use kernel methods to estimate underlying shape densities. We then take a metric approach to define a proper distance, termed the Fisher-Rao distance, to quantify differences between any two densities. This distance can be used for clustering, classification, and other types of statistical modeling; however, this dissertation focuses on comparing shape populations as a classical two-sample hypothesis test with populations characterized by respective probability densities on shape space. Since we are interested in the shapes of planar closed curves and the space of such curves is infinite-dimensional, there are some theoretical issues in defining and estimating densities on this space. We therefore use a spherical multidimensional scaling algorithm to project shape distributions to the unit two-sphere, and this allows us to use a von Mises-Fisher kernel for density estimation. The estimated densities are then compared using the Fisher-Rao distance, which, in turn, is estimated using Monte Carlo methods. This distance estimate is used as a test statistic for the two-sample hypothesis test mentioned above. 
We use a bootstrap approach to perform the test and to evaluate population classification performance. We demonstrate these ideas using applications from industrial and chemical engineering.
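To illustrate the kind of distance the abstract above refers to, here is a minimal Python sketch of the Fisher-Rao distance between two discrete probability distributions. It is a simplification and not the thesis's procedure: the thesis works with kernel density estimates on the unit two-sphere and a Monte Carlo estimate, whereas this sketch assumes finite probability vectors. It also assumes the convention in which densities are represented by their square roots on the unit sphere, so the distance is the arc length arccos(Σ √(pᵢqᵢ)); some texts scale this by a factor of 2. The function name is hypothetical.

```python
import math


def fisher_rao_distance(p, q):
    """Arc-length distance between two discrete probability vectors.

    Assumption: p and q are nonnegative, sum to 1, and have equal length.
    Under the square-root representation both lie on the unit sphere, and
    the distance is the angle between sqrt(p) and sqrt(q).
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    inner = sum(math.sqrt(a * b) for a, b in zip(p, q))
    # Clamp to [0, 1] to guard against floating-point overshoot.
    inner = min(1.0, max(0.0, inner))
    return math.acos(inner)


same = fisher_rao_distance([0.5, 0.5], [0.5, 0.5])      # identical distributions
disjoint = fisher_rao_distance([1.0, 0.0], [0.0, 1.0])  # no shared support
print(same, disjoint)
```

Under this convention the distance is bounded by π/2, which is attained when the two distributions have disjoint support.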
 Date Issued
 2014
 Identifier
 FSU_migr_etd9185
 Format
 Thesis
 Title
 Discrimination and Calibration of Prognostic Survival Models.
 Creator

Simino, Jeannette M., Hollander, Myles, McGee, Daniel, Hurt, Myra, Niu, XuFeng, Department of Statistics, Florida State University
 Abstract/Description

Clinicians employ prognostic survival models for diseases such as coronary heart disease and cancer to inform patients about risks, treatments, and clinical decisions (Altman and Royston 2000). These prognostic models are not useful unless they are valid in the population to which they are applied. There are no generally accepted algorithms for assessing the validity of an external survival model in a new population. Researchers often invoke measures of predictive accuracy, the degree to which predicted outcomes match observed outcomes (Justice et al. 1999). One component of predictive accuracy is discrimination, the ability of the model to correctly rank the individuals in the sample by risk. A common measure of discrimination for prognostic survival models is the concordance index, also called the c-statistic. We utilize the concordance index to determine the discrimination of Framingham-based Cox and Log-logistic models of coronary heart disease (CHD) death in cohorts from the Diverse Populations Collaboration, a collection of studies that encompasses many ethnic, geographic, and socioeconomic groups. Pencina and D'Agostino presented a confidence interval for the concordance index when assessing the discrimination of an external prognostic model. We perform simulations to determine the robustness of their confidence interval when measuring discrimination during internal validation. The Pencina and D'Agostino confidence interval is not valid in the internal validation setting because their assumption of mutually independent observations is violated. We compare the Pencina and D'Agostino confidence interval to a bootstrap confidence interval that we propose and that is valid for internal validation. We specifically examine the performance of the interval when the same sample is used both to fit a prognostic model and to determine its validity. 
The framework for our simulations is a Weibull proportional hazards model of CHD death fit to the Framingham exam 4 data. We then focus on the second component of accuracy, calibration, which measures the agreement between the observed and predicted event rates for groups of patients (Altman and Royston 2000). In 2000, van Houwelingen introduced a method called validation by calibration to allow a clinician to assess the validity of a well-accepted published survival model on his or her own patient population and adjust the published model to fit that population. Van Houwelingen embeds the published model into a new model with only 3 parameters, which helps combat the overfitting that occurs when models with many covariates are fit on data sets with a small number of events. We explore validation by calibration as a tool to adjust models when an external model over- or underestimates risk. Van Houwelingen discusses the general method and then focuses on the proportional hazards model. There are situations where proportional hazards may not hold; thus, we extend the methodology to the Log-logistic accelerated failure time model. We perform validation by calibration of Framingham-based Cox and Log-logistic models of CHD death to cohorts from the Diverse Populations Collaboration. Lastly, we conduct simulations that investigate the power of the global Wald validation by calibration test. We study its power to reject an invalid proportional hazards or Log-logistic accelerated failure time model under various scale and/or shape misspecifications.
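The concordance index mentioned in the abstract above can be sketched in a few lines of Python. This is a simplified pairwise version for right-censored data, offered as an illustration and not as the estimator or confidence interval studied in the thesis: a pair is treated as usable only when the subject with the shorter observed time had an event, pairs with tied times are skipped, and tied risk scores count one half. The function name and the sample data are hypothetical.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Fraction of usable pairs in which the higher-risk subject fails first.

    times  : observed follow-up times
    events : 1 if the event was observed, 0 if censored
    risks  : model risk scores (higher means earlier expected failure)
    """
    concordant = 0.0
    usable = 0
    for i, j in combinations(range(len(times)), 2):
        # Reorder so that subject i has the shorter observed time.
        if times[j] < times[i]:
            i, j = j, i
        # Skip tied times and pairs whose shorter time is censored.
        if times[i] == times[j] or not events[i]:
            continue
        usable += 1
        if risks[i] > risks[j]:
            concordant += 1.0
        elif risks[i] == risks[j]:
            concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


# Hypothetical cohort: the third subject is censored at time 9.
times = [5, 3, 9, 7]
events = [1, 1, 0, 1]
risks = [0.6, 0.9, 0.1, 0.4]
print(concordance_index(times, events, risks))  # 1.0
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking of subjects by risk.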
 Date Issued
 2009
 Identifier
 FSU_migr_etd0328
 Format
 Thesis
 Title
 A Comparison of Three Approaches to Confidence Interval Estimation for Coefficient Omega.
 Creator

Xu, Jie, Yang, Yanyun, Becker, Betsy Jane, Almond, Russell G., Florida State University, College of Education, Department of Educational Psychology and Learning Systems
 Abstract/Description

Coefficient Omega was introduced by McDonald (1978) as a reliability coefficient of composite scores for the congeneric model. Interval estimation (Neyman, 1937) on coefficient Omega provides a range of plausible values that is likely to capture the population reliability of composite scores. The Wald method, the likelihood method, and the bias-corrected and accelerated (BCa) bootstrap method are three methods for constructing confidence intervals for coefficient Omega (e.g., Cheung, 2009b; Kelley & Cheng, 2012; Raykov, 2002, 2004, 2009; Raykov & Marcoulides, 2004; Padilla & Divers, 2013). A very limited number of studies evaluating these three methods can be found in the literature (e.g., Cheung, 2007, 2009a, 2009b; Kelley & Cheng, 2012; Padilla & Divers, 2013). No simulation study has been conducted to evaluate the performance of these three methods for interval construction on coefficient Omega. In the current simulation study, I assessed these three methods by comparing their empirical performance on interval estimation for coefficient Omega. Four factors were included in the simulation design: sample size, number of items, factor loading, and degree of nonnormality. Two thousand datasets were generated in R 2.15.0 (R Core Team, 2012) for each condition. For each generated dataset, the three approaches (i.e., the Wald method, the likelihood method, and the BCa bootstrap method) were used to construct 95% confidence intervals for coefficient Omega in R 2.15.0. The results showed that when the data were multivariate normally distributed, the three methods performed equally well and coverage probabilities were very close to the prespecified .95 confidence level. When the data were multivariate nonnormally distributed, coverage probabilities decreased and intervals became wider for all three methods as the degree of nonnormality increased. 
In general, when the data departed from multivariate normality, the BCa bootstrap method performed better than the other two methods, with relatively higher coverage probabilities, while the Wald and likelihood methods were comparable and yielded narrower intervals than the BCa bootstrap method.
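For readers unfamiliar with coefficient Omega, the point estimate itself is simple to compute once a one-factor model has been fit. The Python sketch below assumes a congeneric model with the factor variance fixed at 1 and uncorrelated errors, in which case Omega is (Σλ)² / ((Σλ)² + Σψ), where λ are the factor loadings and ψ the error variances. The function name and the example values are hypothetical, and the sketch does not construct any of the three confidence intervals compared in the thesis.

```python
def coefficient_omega(loadings, error_variances):
    """Coefficient Omega for a one-factor congeneric model.

    Assumptions: factor variance is fixed at 1 and item errors are
    uncorrelated, so the composite's true-score variance is the squared
    sum of loadings and its error variance is the sum of error variances.
    """
    if len(loadings) != len(error_variances):
        raise ValueError("need one error variance per loading")
    true_variance = sum(loadings) ** 2
    return true_variance / (true_variance + sum(error_variances))


# Hypothetical standardized items: loading 0.7, error variance 1 - 0.7**2.
loadings = [0.7, 0.7, 0.7, 0.7]
error_variances = [0.51, 0.51, 0.51, 0.51]
print(round(coefficient_omega(loadings, error_variances), 3))
```

In practice the loadings and error variances would come from a fitted factor model, and the interval methods in the abstract quantify the sampling uncertainty of this estimate.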
 Date Issued
 2014
 Identifier
 FSU_migr_etd9269
 Format
 Thesis
 Title
 Bayesian Portfolio Optimization with Time-Varying Factor Models.
 Creator

Zhao, Feng, Niu, Xufeng, Cheng, Yingmei, Huffer, Fred W., Zhang, Jinfeng, Department of Statistics, Florida State University
 Abstract/Description

We develop a modeling framework to simultaneously evaluate various types of predictability in stock returns, including stocks' sensitivity ("betas") to systematic risk factors, stocks' abnormal returns unexplained by risk factors ("alphas"), and returns of risk factors in excess of the risk-free rate ("risk premia"). Both firm-level characteristics and macroeconomic variables are used to predict stocks' time-varying alphas and betas, and macroeconomic variables are used to predict the risk premia. All of the models are specified in a Bayesian framework to account for estimation risk, and informative prior distributions on both stock returns and model parameters are adopted to reduce estimation error. To gauge the economic significance of the predictability, we apply the models to the U.S. stock market and construct optimal portfolios based on model predictions. Out-of-sample performance of the portfolios is evaluated to compare the models. The empirical results confirm predictability from all of the sources considered in our model: (1) the equity risk premium is time-varying and predictable using macroeconomic variables; (2) stocks' alphas and betas differ cross-sectionally and are predictable using firm-level characteristics; and (3) stocks' alphas and betas are also time-varying and predictable using macroeconomic variables. Comparison of different subperiods shows that the predictability of stocks' betas is persistent over time, but the predictability of stocks' alphas and the risk premium has diminished to some extent. The empirical results also suggest that Bayesian statistical techniques, especially the use of informative prior distributions, help reduce model estimation error and result in portfolios that outperform the passive indexing strategy. The findings are robust in the presence of transaction costs.
 Date Issued
 2011
 Identifier
 FSU_migr_etd0526
 Format
 Thesis
 Title
 A Bayesian Approach to Meta-Regression: The Relationship Between Body Mass Index and All-Cause Mortality.
 Creator

Marker, Mahtab, McGee, Dan, Hurt, Myra, Niu, Xiufeng, Huffer, Fred, Department of Statistics, Florida State University
 Abstract/Description

This thesis presents a Bayesian approach to Meta-Regression and Individual Patient Data (IPD) Meta-analysis. The focus of the research is on establishing the relationship between Body Mass Index (BMI) and all-cause mortality. This has been an area of continuing interest in the medical and public health communities, and no consensus has been reached on what the optimal weight for individuals is. Standards are usually specified in terms of body mass index (BMI = weight (kg) / height (m)^2), which is associated with body fat percentage. Many studies in the literature have modelled the relationship between BMI and mortality and reported a variety of relationships, including U-shaped, J-shaped, and linear curves. The aim of my research was to use statistical methods to determine whether we can combine these diverse results and obtain a single estimated relationship, with which one can find the point of minimum mortality and establish reasonable ranges for optimal BMI, or how we can best examine the reasons for the heterogeneity of results. Commonly used techniques of Meta-analysis and Meta-regression are explored, and a problem with the estimation procedure in the multivariate setting is presented. A Bayesian approach using a Hierarchical Generalized Linear Mixed Model is suggested and implemented to overcome this drawback of standard estimation techniques. Another area that is explored briefly is Individual Patient Data meta-analysis. A Frailty model, or Random Effects Proportional Hazards Survival model, approach is proposed to carry out IPD meta-regression and arrive at a single estimated relationship between BMI and mortality, adjusting for the variation between studies.
 Date Issued
 2007
 Identifier
 FSU_migr_etd2736
 Format
 Thesis
 Title
 Nonlinear Multivariate Tests for High-Dimensional Data Using Wavelets with Applications in Genomics and Engineering.
 Creator

Girimurugan, Senthil Balaji, Chicken, Eric, Zhang, Jinfeng, Ahlquist, Jon, Tao, Minjing, Department of Statistics, Florida State University
 Abstract/Description

Gaussian processes are not uncommon in various fields of science such as engineering, genomics, quantitative finance, and astronomy, to name a few. In fact, such processes are special cases in a broader class of data known as functional data. When the underlying mean response of a process is a function, the resulting data from these processes are functional responses, and specialized statistical tools are required for their analysis. The methodology discussed in this work offers nonparametric tests that can detect differences in such data with greater power than existing methods and good control of Type I error. The incorporation of Wavelet Transforms makes the test an efficient approach because of their decorrelation properties. These tests are designed primarily to handle functional responses from multiple treatments simultaneously and are generally extensible to high-dimensional data. The sparseness introduced by Wavelet Transforms is another advantage of this test when compared to traditional tests. In addition to offering a theoretical framework, several applications of such tests in the fields of engineering, genomics, and quantitative finance are also discussed.
 Date Issued
 2014
 Identifier
 FSU_migr_etd8789
 Format
 Thesis
 Title
 Practical Methods for Equivalence and Non-Inferiority Studies with Survival Response.
 Creator

Martinez, Elvis Englebert, Sinha, Debajyoti, Levenson, Cathy W., Chicken, Eric, Lipsitz, Stuart, McGee, Daniel, Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

Determining the equivalence or non-inferiority of a new drug (test drug) with an existing treatment (reference drug) is an important topic of statistical interest. Wellek (1993) pioneered log-rank based equivalence and non-inferiority testing by formulating a testing procedure using the proportional hazards model (PHM) of Cox (1972). In many equivalence and non-inferiority trials, two hazard functions may converge to one rather than being proportional for all time points. In this case, the proportional odds survival model (POSM) of Bennett (1983) will be more appropriate than Cox's PHM assumption. We show that in both cases, when the wrong modeling assumption is made and Cox's PH assumption is violated, the popular procedure of Wellek (1993) has an inflated type I error. In contrast, our proposed POS model based equivalence and non-inferiority tests maintain the practitioner's desired 5% level of significance regardless of the underlying modeling assumption (e.g. Cox, 1972; Wellek, 1993). Furthermore, for non-inferiority trials, we introduce a method to determine the optimal sample size required when a desired power and type I error are specified and the data follow the POSM of Bennett (1983). For both of the above trials, we present simulation studies showing the finite approximation of powers and type I error rates, both when the underlying modeling assumptions are correctly specified and when they are misspecified.
 Date Issued
 2014
 Identifier
 FSU_migr_etd9214
 Format
 Thesis
 Title
 Goodness-of-Fit Tests for Logistic Regression.
 Creator

Wu, Sutan, McGee, Dan L., Zhang, Jinfeng, Hurt, Myra, Sinha, Debajyoti, Department of Statistics, Florida State University
 Abstract/Description

The generalized linear model and particularly the logistic model are widely used in public health, medicine, and epidemiology. Goodness-of-fit tests for these models are commonly used to describe how well a proposed model fits a set of observations. These different goodness-of-fit tests all have individual advantages and disadvantages. In this thesis, we mainly consider the performance of the Hosmer-Lemeshow test, Pearson's chi-square test, the unweighted sum of squares test and the cumulative residual test. We compare their performance in a series of empirical studies as well as particular simulation scenarios. We conclude that the unweighted sum of squares test and the cumulative sums of residuals test give better overall performance than the other two. We also conclude that the commonly suggested practice of assuming that a p-value less than 0.15 is an indication of lack of fit at the initial steps of model diagnostics should be adopted. Additionally, D'Agostino et al. presented the relationship of the stacked logistic regression and the Cox regression model in the Framingham Heart Study. In our future study, we will therefore examine the possibility and feasibility of adapting these goodness-of-fit tests to the Cox proportional hazards model using the stacked logistic regression.
 Date Issued
 2010
 Identifier
 FSU_migr_etd0693
 Format
 Thesis