Search results
 Title
 Weighted Adaptive Methods for Multivariate Response Models with an HIV/Neurocognitive Application.
 Creator

Geis, Jennifer Ann, She, Yiyuan, Meyer-Baese, Anke, Barbu, Adrian, Bunea, Florentina, Niu, Xufeng, Department of Statistics, Florida State University
 Abstract/Description

Multivariate response models are being used increasingly in almost all fields, with the necessary employment of inferential methods such as Canonical Correlation Analysis (CCA). This requires estimating the number of uncorrelated canonical relationships between the two sets or, equivalently, determining the rank of the coefficient estimator in the multivariate response model. One way to do this is by the Rank Selection Criterion (RSC) of Bunea et al., under the assumption that the error matrix has independent constant-variance entries. While this assumption is necessary to show their strong theoretical results, in practical application some flexibility is required; such an assumption cannot always be safely made. What is developed here is the theory that parallels Bunea et al.'s work with the addition of a "decorrelator" weight matrix. One choice for the weight matrix is the residual covariance, but this introduces many issues in practice. A computationally more convenient weight matrix is the sample response covariance. When such a weight matrix is chosen, CCA is directly accessible by this weighted version of RSC, giving rise to an Adaptive CCA (ACCA) with principal proofs for the large sample setting. However, particular considerations are required for the high-dimensional setting, where similar theoretical results do not hold. What is offered instead are extensive empirical simulations that reveal that using the sample response covariance still provides good rank recovery and estimation of the coefficient matrix, and hence also provides good estimation of the number of canonical relationships and variates. It is argued precisely why other versions of the residual covariance, including a regularized version, are poor choices in the high-dimensional setting. Another approach to avoid these issues is to employ some type of variable selection methodology before applying ACCA. Indeed, any group selection method may be applied prior to ACCA, as variable selection in the multivariate response model is the same as group selection in the univariate response model, which completely eliminates these high-dimensional concerns. To offer a practical application of these ideas, ACCA is applied to a "large sample" neurocognitive dataset. Then, a high-dimensional dataset is generated to which Group LASSO is first applied before ACCA. This provides a unique perspective into the relationships between cognitive deficiencies in HIV-positive patients and the extensive available neuroimaging measures.
 Date Issued
 2012
 Identifier
 FSU_migr_etd4861
 Format
 Thesis
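The rank-selection computation at the heart of the record above is compact enough to sketch. Below is a minimal numpy illustration, not the dissertation's code: responses are whitened with the sample response covariance (the weight-matrix choice discussed above), a least-squares fit is computed, and the rank is chosen by thresholding the singular values of the fitted matrix. The threshold form and all names are assumptions for illustration.

```python
import numpy as np

def weighted_rank_selection(X, Y, mu):
    """Sketch of rank selection on responses whitened by the sample
    response covariance (one choice of 'decorrelator' weight matrix).
    X : (n, p) predictors, Y : (n, q) responses, mu : penalty level.
    Returns the selected rank and a reduced-rank coefficient estimate.
    """
    # Whiten responses: cov(Y @ L) = I when L L^T = inv(cov(Y)).
    S = np.cov(Y, rowvar=False)
    L = np.linalg.cholesky(np.linalg.inv(S))
    Yw = Y @ L
    # Least-squares fit and SVD of the fitted values.
    B_ols, *_ = np.linalg.lstsq(X, Yw, rcond=None)
    U, s, Vt = np.linalg.svd(X @ B_ols, full_matrices=False)
    # Keep singular values whose squared size beats the penalty
    # (thresholding form assumed, mirroring the unweighted RSC).
    k = int(np.sum(s**2 > mu))
    B_k = B_ols @ Vt[:k].T @ Vt[:k]        # rank-k projection of the fit
    return k, B_k @ np.linalg.inv(L)       # map back to the original scale

# Toy usage: rank-1 truth should be recovered.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
B = np.outer(rng.normal(size=6), rng.normal(size=4))  # rank-1 coefficient
Y = X @ B + rng.normal(size=(200, 4))
print(weighted_rank_selection(X, Y, mu=50.0)[0])      # expected: 1
```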
 Title
 A Weakly-Informative Group-Specific Prior Distribution for Meta-Analysis.
 Creator

Thompson, Christopher, Becker, Betsy Jane, Clark, Kathleen M., Almond, Russell G., Aloe, Ariel M., Yang, Yanyun, Florida State University, College of Education, Department of Educational Psychology and Learning Systems
 Abstract/Description

While Bayesian meta-analysis has flourished in both methodological and substantive work, group-specific Bayesian modeling remains scarce. Common practice for choosing prior distributions entails using typical noninformative priors. Currently, there is a push to use more informative prior distributions. In this dissertation I propose a group-specific weakly informative prior distribution. The new prior distribution uses a frequentist estimate of between-studies heterogeneity as the noncentrality parameter in a folded noncentral t distribution. This new distribution is then modeled individually for groups based on some categorical factor. An extensive simulation study was performed to compare the performance of the new group-specific prior distribution with several noninformative prior distributions in a variety of meta-analytic scenarios. An application using data from a previously published meta-analysis on dynamic geometry software is also provided.
 Date Issued
 2016
 Identifier
 FSU_2016SP_Thompson_fsu_0071E_13051
 Format
 Thesis
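The prior described in the record above is easy to visualize: fold a noncentral t distribution at zero and set its noncentrality parameter to a frequentist heterogeneity estimate. A small scipy sketch, with the degrees of freedom and the plug-in value chosen arbitrarily for illustration:

```python
import numpy as np
from scipy.stats import nct

def folded_nct_pdf(tau, df, nc):
    """Density of |T| where T ~ noncentral t(df, nc), evaluated at tau >= 0."""
    tau = np.asarray(tau, dtype=float)
    return np.where(tau >= 0, nct.pdf(tau, df, nc) + nct.pdf(-tau, df, nc), 0.0)

# Hypothetical frequentist between-studies heterogeneity estimate.
tau_hat = 0.3
grid = np.linspace(0, 2, 201)
density = folded_nct_pdf(grid, df=4, nc=tau_hat)  # df = 4 is an arbitrary choice
print(grid[np.argmax(density)])  # the prior mode sits near the plug-in estimate
```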
 Title
 Wavelet-Based Bayesian Approaches to Sequential Profile Monitoring.
 Creator

Varbanov, Roumen, Chicken, Eric, Linero, Antonio Ricardo, Huffenberger, Kevin M., Yang, Yanyun, Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

We consider change-point detection and estimation in sequences of functional observations. This setting often arises when the quality of a process is characterized by such observations, termed profiles, and monitoring profiles for changes in structure can be used to ensure the stability of the process over time. While interest in profile monitoring has grown, few methods approach the problem from a Bayesian perspective. In this dissertation, we propose three wavelet-based Bayesian approaches to profile monitoring, the last of which can be extended to a general process monitoring setting. First, we develop a general framework for the problem of interest in which we base inference on the posterior distribution of the change point without placing restrictive assumptions on the form of profiles. The proposed method uses an analytic form of the posterior distribution in order to run online without relying on Markov chain Monte Carlo (MCMC) simulation. Wavelets, an effective tool for estimating nonlinear signals from noise-contaminated observations, enable the method to flexibly distinguish between sustained changes in profiles and the inherent variability of the process. Second, we modify the initial framework in a posterior approximation algorithm designed to utilize past information in a computationally efficient manner. We show that the approximation can detect changes of smaller magnitude better than traditional alternatives for curbing computational cost. Third, we introduce a monitoring scheme that allows an unchanged process to run infinitely long without a false alarm; the scheme maintains the ability to detect a change with probability one. We include theoretical results regarding these properties and illustrate the implementation of the scheme in the previously established framework. We demonstrate the efficacy of the proposed methods on simulated data, where they significantly outperform a relevant frequentist competitor.
 Date Issued
 2018
 Identifier
 2018_Sp_Varbanov_fsu_0071E_14513
 Format
 Thesis
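A toy version of the change-point posterior conveys the flavor of the first framework in the record above. This sketch scores a single mean shift at each candidate index for a sequence of scalar profile summaries with known noise level and a flat prior, a simplification of the wavelet-domain model; all settings are illustrative.

```python
import numpy as np

def changepoint_posterior(z, sigma=1.0):
    """Posterior over the change point tau for summaries z[0..n-1], assuming
    one mean shift at tau, Gaussian noise with known sigma, segment means at
    their MLEs, and a flat prior. (A simplification: the dissertation works
    with full wavelet coefficients, not scalar summaries.)
    """
    n = len(z)
    loglik = np.full(n, -np.inf)
    for tau in range(1, n - 1):
        m1, m2 = z[:tau].mean(), z[tau:].mean()
        rss = np.sum((z[:tau] - m1) ** 2) + np.sum((z[tau:] - m2) ** 2)
        loglik[tau] = -rss / (2 * sigma**2)
    w = np.exp(loglik - loglik.max())
    return w / w.sum()

rng = np.random.default_rng(1)
z = np.concatenate([rng.normal(0, 1, 40), rng.normal(1.5, 1, 20)])
print(np.argmax(changepoint_posterior(z)))  # near the true change point, 40
```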
 Title
 Volatility Matrix Estimation for High-Frequency Financial Data.
 Creator

Xue, Yang, Tao, Minjing, Cheng, Yingmei, Fendler, Rachel Loveitt, Huffer, Fred W., Niu, Xufeng, Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

Volatility is usually employed to measure the dispersion of asset returns, and it is widely used in risk analysis and asset management. The first chapter studies a kernel-based spot volatility matrix estimator with a pre-averaging approach for high-frequency data contaminated by market microstructure noise. When the sample size goes to infinity and the bandwidth vanishes, we show that our estimator is consistent, and its asymptotic normality is established with an optimal convergence rate. We also construct a consistent pairwise spot co-volatility estimator with the Hayashi-Yoshida method for non-synchronous high-frequency data with noise contamination. The simulation studies demonstrate that the proposed estimators work well under different noise levels, and their estimation performance improves with increasing sampling frequency. In empirical applications, we implement the estimators on the intraday prices of four component stocks of the Dow Jones Industrial Average. The second chapter presents a factor-based vast volatility matrix estimation method for high-frequency financial data with market microstructure noise, finite large jumps, and infinite-activity small jumps. We construct the sample volatility matrix estimator based on the approximate factor model, and use the pre-averaging and thresholding estimation method (PATH) to digest the noise and jumps. After using principal component analysis (PCA) to decompose the sample volatility matrix estimator, our proposed volatility matrix estimator is finally obtained by imposing block-diagonal regularization on the residual covariance matrix, sorting the assets by their Global Industry Classification Standard (GICS) codes. The Monte Carlo simulation shows that our proposed volatility matrix estimator can remove the majority of the effects of noise and jumps, and its estimation performance improves quickly as the sampling frequency increases. Finally, the PCA-based estimators are employed to perform volatility matrix estimation and asset allocation for S&P 500 stocks. To compare with the PCA-based estimators, we also include exchange-traded fund (ETF) data to construct observable factors, such as the Fama-French factors, for volatility estimation.
 Date Issued
 2018
 Identifier
 2018_Sp_Xue_fsu_0071E_14471
 Format
 Thesis
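For readers unfamiliar with pre-averaging, a univariate sketch of the device used in the record above is given below, with the usual weight g(x) = min(x, 1 - x) and constants psi1 = 1, psi2 = 1/12. The kernel smoothing, matrix, and Hayashi-Yoshida aspects of the chapter are omitted, and the bias-correction form shown is an assumption based on the standard pre-averaged realized variance.

```python
import numpy as np

def preaveraged_rv(r, kn):
    """Noise-robust realized variance from high-frequency returns r via
    pre-averaging over windows of length kn with weight g(x) = min(x, 1 - x).
    Univariate sketch only; the spot/matrix estimators in the chapter add
    kernel weights in time and handle asynchronous observation.
    """
    n = len(r)
    g = np.minimum(np.arange(1, kn) / kn, 1 - np.arange(1, kn) / kn)
    psi1, psi2 = 1.0, 1.0 / 12.0
    # Pre-averaged returns over overlapping windows.
    rbar = np.array([g @ r[i : i + kn - 1] for i in range(n - kn + 2)])
    raw = np.sum(rbar**2) / (psi2 * kn)
    bias = psi1 / (2 * psi2 * kn**2) * np.sum(r**2)  # removes the noise variance
    return raw - bias

rng = np.random.default_rng(2)
true_sigma2 = 1e-4
r = rng.normal(0, np.sqrt(true_sigma2 / 23400), 23400)   # efficient returns
r += np.diff(rng.normal(0, 5e-4, 23401))                 # microstructure noise
print(preaveraged_rv(r, kn=60), true_sigma2)             # estimate vs truth
```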
 Title
 Variable Selection of Correlated Predictors in Logistic Regression: Investigating the Diet-Heart Hypothesis.
 Creator

Thompson, Warren R. (Warren Robert), McGee, Daniel, Eberstein, Isaac, Huffer, Fred, Sinha, Debajyoti, She, Yiyuan, Department of Statistics, Florida State University
 Abstract/Description

Variable selection is an important aspect of modeling. Its aim is to distinguish between the authentic variables, which are important in predicting the outcome, and the noise variables, which possess little to no predictive value. In other words, the goal is to find the variables that (collectively) best explain and predict changes in the outcome variable. The variable selection problem is exacerbated when correlated variables are included in the covariate set. This dissertation examines the variable selection problem in the context of logistic regression. Specifically, we investigated the merits of the bootstrap, ridge regression, the lasso, and Bayesian model averaging (BMA) as variable selection techniques when highly correlated predictors and a dichotomous outcome are considered. This dissertation also contributes to the literature on the diet-heart hypothesis. The diet-heart hypothesis has been around since the early twentieth century. Since then, researchers have attempted to isolate the nutrients in diet that promote coronary heart disease (CHD). After a century of research, there is still no consensus. In our current research, we used some of the more recent statistical methodologies (mentioned above) to investigate the effect of twenty dietary variables on the incidence of coronary heart disease. Logistic regression models were generated for the data from the Honolulu Heart Program, a study of CHD incidence in men of Japanese descent. Our results were largely method-specific. However, regardless of the method considered, there was strong evidence to suggest that alcohol consumption has a strong protective effect on the risk of coronary heart disease. Of the variables considered, dietary cholesterol and caffeine were the only variables that, at best, exhibited a moderately strong harmful association with CHD incidence. Further investigation that includes a broader array of food groups is recommended.
 Date Issued
 2009
 Identifier
 FSU_migr_etd1360
 Format
 Thesis
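The bootstrap-plus-lasso side of the comparison in the record above is straightforward to prototype. A small sklearn sketch that resamples the data, fits an L1-penalized logistic regression each time, and reports how often each correlated predictor is selected; the data and all settings are synthetic stand-ins.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
n, p, rho = 500, 6, 0.8
# Highly correlated predictors (synthetic stand-in for the dietary variables).
cov = rho * np.ones((p, p)) + (1 - rho) * np.eye(p)
X = rng.multivariate_normal(np.zeros(p), cov, size=n)
beta = np.array([1.0, 0.0, 0.0, -1.0, 0.0, 0.0])
y = rng.binomial(1, 1 / (1 + np.exp(-(X @ beta))))

freq = np.zeros(p)
B = 200
for _ in range(B):
    idx = rng.integers(0, n, n)                      # bootstrap resample
    fit = LogisticRegression(penalty="l1", solver="liblinear", C=0.5)
    fit.fit(X[idx], y[idx])
    freq += (fit.coef_.ravel() != 0)                 # which predictors survived

print(np.round(freq / B, 2))  # bootstrap selection frequencies per predictor
```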
 Title
 The Use of a Meta-Analysis Technique in Equating and Its Comparison with Several Small Sample Equating Methods.
 Creator

Caglak, Serdar, Paek, Insu, Patrangenaru, Victor, Almond, Russell G., Roehrig, Alysia D., Florida State University, College of Education, Department of Educational Psychology and Learning Systems
 Abstract/Description

The main objective of this study was to investigate the improvement of the accuracy of small sample equating, which typically occurs in teacher certification/licensure examinations due to a low volume of test takers per test administration, under the Non-Equivalent Groups with Anchor Test (NEAT) design by combining previous and current equating outcomes using a meta-analysis technique. The proposed meta-analytic score transformation procedure is called "meta-equating" throughout this study. To conduct meta-equating, the previous and current equating outcomes obtained from the chosen equating methods (Identity (ID) equating, Circle-Arc (CA), and Nominal Weights Mean (NW)) and synthetic functions (SFs) of these methods (CAS and NWS) were used; then, empirical Bayesian (EB) and meta-equating (META) procedures were implemented to estimate the equating relationship between test forms at the population level. The SFs were created by giving equal weight to each of the chosen equating methods and identity (ID) equating. Finally, the chosen equating methods, the SFs of each method (e.g., CAS, NWS), and their META and EB versions (e.g., NW-EB, CA-META, NWS-META) were investigated and compared under varying testing conditions. These steps involved manipulating some of the factors that influence the accuracy of test score equating. In particular, the effects of test form difficulty levels, group-mean ability differences, the number of previous equatings, and sample size on the accuracy of the equating outcomes were investigated. Chained Equipercentile (CE) equating with 6-univariate- and 2-bivariate-moment log-linear presmoothing was used as the criterion equating function to establish the equating relationship between the new form and the base (reference) form, with 50,000 examinees per test form. To compare the performance of the equating methods, small samples of examinees were randomly drawn from examinee populations with different ability levels in each simulation replication. Each pair of new and base test forms was randomly and independently selected from all available condition-specific test form pairs. Those test forms were then used to obtain previous equating outcomes. However, purposeful selections of the examinee ability and test form difficulty distributions were made to obtain the current equating outcomes in each simulation replication. The previous equating outcomes were later used for the implementation of both the META and EB score transformation procedures. The effects of the study factors and their possible interactions on each of the accuracy measures were investigated along the entire score range and the cut (reduced) score range using a series of mixed-factorial ANOVA (MFA) procedures. The performances of the equating methods were also compared based on post-hoc tests. Results show that the behaviors of the equating methods vary with the level of the group ability difference, the test form difficulty difference, and the new-group examinee sample size. Also, the use of both the META and EB procedures improved the accuracy of equating results on average. The META and EB versions of the chosen equating methods therefore might be a solution for equating test forms that are similar in their psychometric characteristics and taken by new-form examinee samples of fewer than 50.
However, since there are many factors affecting equating results in practice, one should always expect that equating methods and score transformation procedures, or in more general terms, estimation procedures, may function differently, to some degree, depending on the conditions in which they are implemented. Therefore, one should consider the recommendations for the use of the proposed equating methods in this study as a piece of information, not an absolute guideline or rule of thumb, for practicing small sample test equating in teacher certification/licensure examinations.
 Date Issued
 2015
 Identifier
 FSU_2015fall_Caglak_fsu_0071E_12863
 Format
 Thesis
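The combining step in the record above can be sketched as a precision-weighted average of equating functions at each raw-score point. The inverse-variance weighting below is the standard fixed-effect meta-analysis rule and is only an assumption about how the META procedure combines outcomes; all numbers are hypothetical.

```python
import numpy as np

def meta_equate(equatings, variances):
    """Combine K equating functions (each an array of equated scores over the
    same raw-score points) by inverse-variance weighting at each point, the
    standard fixed-effect meta-analysis rule (details assumed here).
    equatings : (K, S) array; variances : (K, S) array of SE^2 estimates.
    """
    w = 1.0 / np.asarray(variances)
    return (w * equatings).sum(axis=0) / w.sum(axis=0)

# Three hypothetical small-sample equatings of a 40-item form.
scores = np.arange(41)
prev1 = scores + 1.2          # earlier administration
prev2 = scores + 0.8          # earlier administration
curr = scores + 2.0           # current administration, noisier
combined = meta_equate(
    np.vstack([prev1, prev2, curr]),
    np.vstack([np.full(41, 0.5), np.full(41, 0.5), np.full(41, 2.0)]),
)
print(combined[:3])  # pulled toward the more precise previous equatings
```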
 Title
 Univariate and Multivariate Volatility Models for Portfolio Value at Risk.
 Creator

Xiao, Jingyi, Niu, Xufeng, Ökten, Giray, Wu, Wei, Huffer, Fred W. (Fred William), Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

In modern financial risk management, modeling and forecasting stock return movements via their conditional volatilities, particularly predicting the Value at Risk (VaR), have become increasingly important for a healthy economic environment. In this dissertation, we evaluate and compare two main families of models for conditional volatilities, GARCH and Stochastic Volatility (SV), in terms of their VaR prediction performance for 5 major US stock indices. We calculate GARCH-type model parameters via Quasi Maximum Likelihood Estimation (QMLE), while for those of SV we employ MCMC with the Ancillary Sufficient Interweaving Strategy. We use the forecast volatilities corresponding to each model to predict the VaR of the 5 indices. We test the predictive performance of the estimated models by a two-stage backtesting procedure and then compare them via the Lopez loss function. Results of this dissertation indicate that even though it is more computationally demanding than GARCH-type models, SV dominates them in forecasting VaR. Since financial volatilities move together across assets and markets, it becomes apparent that modeling the volatilities in a multivariate framework is more appropriate. However, existing studies in the literature do not present compelling evidence for a strong preference between univariate and multivariate models. In this dissertation we also address the problem of forecasting portfolio VaR via multivariate GARCH models versus univariate GARCH models. We construct 3 portfolios with stock returns of 3 major US stock indices, 6 major banks, and 6 major technology companies respectively. For each portfolio, we model the portfolio conditional covariances with GARCH, EGARCH, MGARCH-BEKK, MGARCH-DCC, and GO-GARCH models. For each estimated model, the forecast portfolio volatilities are further used to calculate (portfolio) VaR. The ability to capture the portfolio volatilities is evaluated by MAE and RMSE; the VaR prediction performance is tested through a two-stage backtesting procedure and compared in terms of the loss function. The results of our study indicate that even though MGARCH models are better at predicting the volatilities of some portfolios, GARCH models can perform as well as their multivariate (and computationally more demanding) counterparts.
 Date Issued
 2019
 Identifier
 2019_Spring_Xiao_fsu_0071E_15172
 Format
 Thesis
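The GARCH side of the comparison in the record above reduces to a short filter. The sketch below computes a one-step-ahead VaR from a GARCH(1,1) recursion with parameters held fixed rather than estimated by QMLE, so it illustrates the mechanics rather than the dissertation's pipeline.

```python
import numpy as np
from scipy.stats import norm

def garch11_var(returns, omega, alpha, beta, level=0.01):
    """One-step-ahead Value at Risk from a GARCH(1,1) filter with given
    parameters (in practice these come from QMLE). Positive VaR = loss;
    zero-mean normal innovations are assumed for simplicity."""
    s2 = np.var(returns)                     # initialize at the sample variance
    for r in returns:
        s2 = omega + alpha * r**2 + beta * s2
    return -norm.ppf(level) * np.sqrt(s2)    # s2 is now the next-day forecast

# Simulate a GARCH(1,1) path to run the filter on.
rng = np.random.default_rng(4)
omega, alpha, beta = 1e-6, 0.08, 0.90
s2, r = 1e-4, []
for _ in range(2000):
    s2 = omega + alpha * (r[-1] if r else 0.0) ** 2 + beta * s2
    r.append(np.sqrt(s2) * rng.normal())
print(garch11_var(np.array(r), omega, alpha, beta))  # 1% one-day VaR
```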
 Title
 Ultrafast Lattice Dynamics in Metal Thin Films and Nano-Particles.
 Creator

Wang, Xuan, Cao, Jim, Yang, Wei, Bonesteel, Nicholas, Riley, Mark, Xiong, Peng, Department of Physics, Florida State University
 Abstract/Description

This thesis presents the new development of the 3rd-generation femtosecond diffractometer (FED) in Professor Jim Cao's group and its application to the study of ultrafast structural dynamics of solid state materials. The 3rd-generation FED surpasses its predecessor and other similar FED instruments through a DC electron gun that can generate much higher-energy electron pulses and a more efficient imaging system. This combination, together with miscellaneous improvements, significantly boosts the signal-to-noise ratio and thus enables us to study more complex solid state materials. Two main thrusts are discussed in detail in this thesis. The first is the dynamics of coherent phonon generation by ultrafast heating in gold thin films and nanoparticles, which emphasizes the electronic thermal stress. The other is the ultrafast dynamics in nickel, which shows that the mutual interactions among the lattice, spin, and electron subsystems can significantly alter the ultrafast lattice dynamics. In these studies, we exploit the advantage of the FED instrument as an ideal tool that can directly and simultaneously monitor the coherent and random motion of the lattice.
 Date Issued
 2010
 Identifier
 FSU_migr_etd1247
 Format
 Thesis
 Title
 TWO-WAY CLUSTER ANALYSIS WITH NOMINAL DATA.
 Creator

COOPER, PAUL GAYLORD., Florida State University
 Abstract/Description

Consider an M by N data matrix X whose elements may assume values 0, 1, 2, ..., H. Denote the rows of X by $\alpha_1, \alpha_2, \ldots, \alpha_M$. A tree on the rows of X is a sequence of distinct partitions $\{P_i\}_{i=1}^{k}$ such that: (a) $P_1 = \{(\alpha_1), \ldots, (\alpha_M)\}$, (b) $P_i$ is a refinement of $P_{i+1}$ for $i = 1, \ldots, k-1$, and (c) $P_k = \{(\alpha_1, \ldots, \alpha_M)\}$. The two-way clustering problem consists of simultaneously constructing trees on the rows, columns, and elements of X. A generalization of a two-way joining algorithm (TWJA) introduced by J. A. Hartigan (1975) is used to construct the three trees. The TWJA requires the definition of measures of dissimilarity between row clusters and column clusters respectively. Two approaches are used in the construction of these dissimilarity coefficients: one based on intuition and one based on a formal prediction model. For matrices with binary elements (0 or 1), measures of dissimilarity between row or column clusters are based on the number of mismatching pairs. Consider two distinct row clusters $R_p$ and $R_q$ containing $m_p$ and $m_q$ rows respectively. One measure of dissimilarity between them, $d_0(R_p, R_q)$, is given by a formula omitted in the original scan, where $b_{p\beta}$ and $b_{q\beta}$ are the numbers of ones in column $\beta$ of clusters $R_p$ and $R_q$ respectively. Two additional intuitive dissimilarity coefficients are also defined and studied. For matrices containing nominal-level data, dissimilarity coefficients are based on a formal prediction model. Analogous to the procedure of Cleveland and Relles (1974), for a given data matrix, the model consists of a scheme for random selection of two rows (or columns) from the matrix and an identification rule for distinguishing between the two rows (or columns). A loss structure is defined for both rows and columns, and the expected loss due to incorrect row or column identification is computed. The dissimilarity between two (say) row clusters is then defined to be the increase in expected loss due to joining those two row clusters into a single cluster. Stopping criteria are suggested for both the intuitive and prediction model approaches. For the intuitive approach, it is suggested that joining be stopped when the dissimilarity between the (say) row clusters to be joined next exceeds that expected by chance under the assumption that the (say) column totals of the matrix are fixed. For the prediction model approach, the stopping criterion is based on a cluster prediction model in which the objective is to distinguish between row or column clusters. A cluster identification rule is defined based on the information in the partitioned data matrix, and the expected loss due to incorrect cluster identification is computed. The expected cluster loss is also computed when cluster identification is based on strict randomization. The relative decrease in expected cluster loss due to identification based on the partitioned matrix versus that based on randomization is suggested as a stopping criterion. Both contrived and real data examples are used to illustrate and compare the two clustering procedures. Computational aspects of the procedure are discussed, and it is concluded that the intuitive approach is less costly in terms of computation time.
Further, five admissibility properties are defined and, for certain intuitive dissimilarity coefficients, the trees produced by the TWJA are shown to possess three of the five properties.
 Date Issued
 1980
 Identifier
 AAI8026123, 3084693, FSDT3084693, fsu:74194
 Format
 Document (PDF)
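The $d_0$ formula is omitted in the scanned abstract above. The sketch below implements one natural reading of the surrounding text, counting, column by column, the mismatching pairs formed by taking one row from each cluster; treat the exact form as an assumption.

```python
import numpy as np

def d0(Rp, Rq):
    """Plausible reconstruction of the mismatch dissimilarity between two row
    clusters of a binary matrix (the exact formula is omitted in the scan):
    for each column beta, count pairs of rows, one from each cluster, that
    disagree there, then sum over columns.
    """
    mp, mq = len(Rp), len(Rq)
    bp = Rp.sum(axis=0)          # ones per column in cluster R_p
    bq = Rq.sum(axis=0)          # ones per column in cluster R_q
    return np.sum(bp * (mq - bq) + bq * (mp - bp))

Rp = np.array([[1, 0, 1], [1, 1, 0]])
Rq = np.array([[0, 0, 1]])
print(d0(Rp, Rq))  # 4 mismatching pairs summed over the three columns
```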
 Title
 Trend and Variable-Phase Seasonality Estimation from Functional Data.
 Creator

Tai, LiangHsuan, Gallivan, Kyle A., Srivastava, Anuj, Wu, Wei, Klassen, E. (Eric), Ökten, Giray, Florida State University, College of Arts and Sciences, Department of Mathematics
 Abstract/Description

The problem of estimating trend and seasonality has been studied for several decades, although mostly in a single time series setup. This dissertation studies the problem of estimating these components from a functional data point of view, i.e., multiple curves, in situations where seasonal effects exhibit arbitrary time warpings or phase variability across different observations. Rather than ignoring the phase variability, or using an off-the-shelf alignment method to remove phase, we take a model-based approach and seek Maximum Likelihood Estimators (MLEs) of the trend and the seasonal effects, while performing alignments over the seasonal effects at the same time. The MLEs of trend, seasonality, and phase are computed using a coordinate descent based optimization method. We use bootstrap replication for computing confidence bands and for testing hypotheses about the estimated components. We also utilize the log-likelihood for selecting the trend subspace and for comparisons with other candidate models. This framework is demonstrated using experiments involving synthetic data and three real datasets (Berkeley growth velocity, U.S. electricity price, and USD exchange fluctuation). Our framework is further applied to another biological problem, significance analysis of gene sets in time-course gene expression data, where it outperforms the state-of-the-art method.
 Date Issued
 2017
 Identifier
 FSU_2017SP_Tai_fsu_0071E_13816
 Format
 Thesis
 Title
 Transformations of certain Gaussian random fields, with applications in survival analysis.
 Creator

Sun, Yanqing., Florida State University
 Abstract/Description

It has been almost sixty years since Kolmogorov introduced a distribution-free test for the simple null hypothesis that a distribution function coincides with a given distribution function. In 1949 Doob observed that Kolmogorov's approach could be simplified by transforming the empirical process to an empirical process based on uniform random variables. In recent years this approach has led to the construction of distribution-free tests when unknown parameters are present. The purpose of this dissertation is to apply the transformation approach in the setting of survival analysis, where censoring and covariate information further complicate the problem. Asymptotically distribution-free tests are developed for testing independence of a survival time from a covariate, and for checking the adequacy of Cox's proportional hazards model. The test statistics are obtained from certain test statistic processes (indexed by time and covariate) which converge in distribution to Brownian sheets. A simulation study is carried out to investigate the finite sample properties of the proposed tests, and they are applied to data from the British Medical Research Council's (1984) 4th myelomatosis trial.
 Date Issued
 1992
 Identifier
 AAI9309739, 3088014, FSDT3088014, fsu:76821
 Format
 Document (PDF)
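Doob's observation mentioned in the record above is one line in code: transform the data by the hypothesized distribution function and test the result for uniformity. A minimal scipy sketch for a simple null, with no estimated parameters or censoring:

```python
import numpy as np
from scipy.stats import expon, kstest

rng = np.random.default_rng(5)
x = expon(scale=2.0).rvs(size=300, random_state=rng)

# Doob's transformation: U = F0(X) is uniform under H0: X ~ F0.
u = expon(scale=2.0).cdf(x)
print(kstest(u, "uniform"))          # large p-value: H0 not rejected

u_wrong = expon(scale=1.0).cdf(x)    # misspecified null distribution
print(kstest(u_wrong, "uniform"))    # small p-value: H0 rejected
```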
 Title
 Transformation Models for Survival Data Analysis and Applications.
 Creator

Liu, Yang, Niu, Xufeng, Lloyd, Donald, McGee, Dan, Sinha, Debajyoti, Department of Statistics, Florida State University
 Abstract/Description

In standard survival models, it is often assumed that all uncensored subjects will eventually experience the event of interest. However, in some situations when the event considered is not death, it will never occur for a proportion of subjects. Survival models with a cure fraction are becoming popular in analyzing this type of study. We propose a generalized transformation model motivated by Zeng et al.'s (2006) transformed proportional time cure model. In our proposed model, fractional polynomials are used instead of the simple linear combination of the covariates. The proposed models give us more flexibility without losing any of the good properties of the original model, such as asymptotic consistency and asymptotic normality of the regression coefficients. The proposed model will better fit data where the relationship between the response variable and covariates is nonlinear. We also provide a power selection procedure based on the likelihood function. A simulation study is carried out to show the accuracy of the proposed power selection procedure. The proposed models are applied to coronary heart disease and cancer-related medical data from both observational cohort studies and clinical trials.
 Date Issued
 2009
 Identifier
 FSU_migr_etd1155
 Format
 Thesis
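The fractional-polynomial building block named in the record above is simple to construct. The sketch below follows the usual Royston-Altman convention, powers drawn from {-2, -1, -0.5, 0, 0.5, 1, 2, 3}, power 0 read as log x, and a repeated power p contributing an extra factor of log x; this is a generic illustration, not the dissertation's code.

```python
import numpy as np

FP_POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)  # conventional candidate set

def fp_basis(x, powers):
    """Fractional-polynomial design columns for positive x, following the
    usual convention: power 0 means log(x); a repeated power p contributes
    x**p * log(x) for each repetition."""
    x = np.asarray(x, dtype=float)
    cols, seen = [], {}
    for p in powers:
        base = np.log(x) if p == 0 else x**p
        reps = seen.get(p, 0)
        cols.append(base * np.log(x) ** reps)
        seen[p] = reps + 1
    return np.column_stack(cols)

x = np.linspace(0.5, 5, 5)
print(fp_basis(x, (1, 1)))  # columns: x and x*log(x), an FP2 with repeated power
```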
 Title
 Traits, Species, and Communities: Integrative Bayesian Approaches to Ecological Biogeography across Geographic, Environmental, Phylogenetic, and Morphological Space.
 Creator

Humphreys, John M., Elsner, James B., Steppan, Scott J., Mesev, Victor, Pau, Stephanie, Florida State University, College of Social Sciences and Public Policy, Department of Geography
 Abstract/Description

Assuming a methodological perspective, this dissertation proceeds through a series of studies that cover levels of biological organization ranging from the morphological traits of individual specimens to community assemblages. The presented research explores geographic extents ranging from local to global scales, examines both plants and animals, and explores relationships among species with common ancestry. The research appraises and then proposes solutions to a variety of as yet unresolved issues in species distribution modeling, including preferential sampling, spatial dependency, multi-scaled spatial processes, niche equilibrium assumptions, data structure arising from shared evolutionary history, and correlations between predictor variables. Approaching the geographic distribution of wetlands as an applied concern, the study presented in Chapter 2 emphasizes that the identification and inventory of wetlands are essential components of water resource management. To be effective in these endeavors, it is critical that the process used to detect and document wetlands be time efficient, accurate, and repeatable as new environmental information becomes available. Approaches dependent on aerial photographic interpretation of land cover by individual human analysts necessitate hours of assessment, introduce human error, and fail to include the best available soils and hydrologic data. The goal of Chapter 2 is to apply hierarchical modeling and Bayesian inference to predict the probability of wetland presence as a continuous gradient with explicit consideration of spatial structure. The presented spatial statistical model can evaluate 100 km² at a 50 x 50 meter resolution in approximately 50 minutes while simultaneously incorporating ancillary data and accounting for latent spatial processes. Model results demonstrate an ability to consistently capture wetlands identified through aerial interpretation with greater than 90% accuracy (scaled Brier score) and to identify wetland extents, ecotones, and hydrologic connections not identified through the use of other modeling and mapping techniques. The provided model is reasonably robust to changes in resolution, areal extents between 100 km² and 300 km², and region-specific physical conditions. As with modeling wetland occurrence, species distribution modeling aimed at forecasting the spread of invasive species under projected global warming also offers land managers an important tool for assessing future ecological risk and for prioritizing management actions. Chapter 3 applies Bayesian inference and newly available geostatistical tools to forecast global range expansion for the ecosystem-altering invasive climbing fern Lygodium microphyllum. The presented modeling framework emphasizes the need to account for spatial processes at both the individual and aggregate levels, the necessity of modeling nonlinear responses to environmental gradients, and the explanatory power of biotic covariates. Results indicate that Lygodium microphyllum will undergo global range expansion in concert with anthropogenic global warming and that the species is likely temperature- and dispersal-limited. Predictions are presented for current and future climate conditions assuming both limited and unlimited dispersal scenarios.
Finally, Chapter 4 provides a novel framework that combines multispecies joint modeling techniques with spatially explicit phylogenetic regression to simultaneously predict the probability of species occurrence and the geographic distribution of interspecific continuous morphological traits. Choosing the South American leaf-eared mice (genus: Phyllotis) as an empirical example, a three-tiered phylogenetic coregionalization trait biogeography model (PhyCoRTBio) is constructed. The conditionally dependent structure of the PhyCoRTBio model enables information from multiple species and from multiple specimen-specific trait metrics to be leveraged towards estimation of a focal species distribution. I hypothesize that, relative to other commonly used species distribution modeling methods, the PhyCoRTBio approach will exhibit improved performance in predicting occurrence for species within the genus Phyllotis. After describing its statistical implementation, this hypothesis is assessed by constructing PhyCoRTBio models for six different Phyllotis species and then comparing results to those derived using maximum entropy methods, random forest clustering, Gaussian random field species distribution models, and hierarchical Bayesian species distribution models. To judge the relative performance of each modeling approach, model sensitivity (proportion of correctly predicted presences), specificity (proportion of correctly predicted absences), the area under the receiver operating characteristic curve (AUC), and the True Skill Statistic (TSS) are calculated. Findings indicate that trait-based covariates improve model performance and highlight the need to consider spatial processes and phylogenetic information in multispecies distribution modeling.
 Date Issued
 2018
 Identifier
 2018_Sp_Humphreys_fsu_0071E_14298
 Format
 Thesis
 Title
 Tools for Statistical Analysis on Shape Spaces of Three-Dimensional Objects.
 Creator

Xie, Qian, Srivastava, Anuj, Klassen, E. (Eric), Huffer, Fred W. (Fred William), Wu, Wei, Zhang, Jinfeng, Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

With the increasing popularity of information technology, especially electronic imaging techniques, large amounts of high-dimensional data such as 3D shapes have become pervasive in science, engineering, and even people's daily lives in recent years. Though the data quantity is huge, the extraction of relevant knowledge from those data is still limited. How to understand data in a meaningful way is generally an open problem. The specific challenges include finding adequate mathematical representations of data and designing proper algorithms to process them. The existing tools for analyzing high-dimensional data, including 3D shape data, are found to be insufficient, as they usually suffer from many factors, such as misalignments, noise, and clutter. This thesis attempts to develop a framework for processing, analyzing, and understanding high-dimensional data, especially 3D shapes, by proposing a set of statistical tools including theory, algorithms, and optimization applied to practical problems. In particular, the following aspects of shape analysis are considered: 1. A framework adopting the SRNF representation, based on parallel transport of deformations across surfaces in the shape space, leads to statistical analysis of shape data. Three main analyses are conducted under this framework: (1) computing geodesics when either two end surfaces or the starting surface and an initial deformation are given; (2) parallel transporting deformations across surfaces; and (3) sampling random surfaces. 2. Computational efficiency plays an important role in performing statistical shape analysis on large datasets of 3D objects. To speed up the previous method, a framework with a numerical solution is introduced by approximating the inverse mapping, and it reduces the computational cost by an order of magnitude. 3. The geometrical and morphological information of 3D objects, or their shapes, can be analyzed explicitly using boundaries extracted from original image scans. An alternative idea is to consider variability in shapes directly from their embedding images. A novel framework is proposed to unify three important tasks: registering, comparing, and modeling images. 4. Finally, the spatial deformations learned from registering images are modeled using a GRID-based decomposition. This specific model provides a way to decompose a large deformation into local and fundamental ones so that shape differences between images are easily interpretable. We conclude this thesis with the conclusions drawn from this research and discuss potential future directions of statistical shape analysis in the last chapter, from both methodological and application perspectives.
 Date Issued
 2015
 Identifier
 FSU_migr_etd9495
 Format
 Thesis
 Title
 Time-Varying Mixture Models for Financial Risk Management.
 Creator

Zhang, Shuguang, Niu, Xufeng, Cheng, Yingmei, Huffer, Fred W. (Fred William), Tao, Minjing, Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

Motivated by understanding the devastating financial crisis of 2008, which was partially caused by underestimation of financial risk, we propose a class of time-varying mixture models for risk analysis and management. There are various metrics for financial risk, including value at risk (VaR), expected shortfall, and expected/unexpected loss. In this study we focus on VaR. One commonly used method to estimate VaR is the Variance-Covariance method, in which a normal distribution is usually assumed for asset returns, which may underestimate the real risk. To address this issue, we propose a series of two-component mixture models: one component is a normal distribution and the other is a fat-tailed distribution such as the Cauchy distribution, Student's t-distribution, or Gumbel distribution. Instead of assuming the distribution parameters and weights to be constant, we allow them to change over time, which guarantees the flexibility of our models. The Monte Carlo Expectation-Maximization method and Monte Carlo maximum likelihood estimation were used for parameter estimation. Simulation studies are conducted, and the models are applied to stock market price data.
 Date Issued
 2016
 Identifier
 FSU_2016SP_Zhang_fsu_0071E_13150
 Format
 Thesis
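For fixed parameters at a single time point, the mixture VaR described in the record above is obtained by inverting the mixture CDF numerically. A scipy sketch with a normal plus Student's t component; the parameter values are arbitrary, and the dissertation additionally lets the weights and parameters evolve over time.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm, t

def mixture_var(w, mu, sigma, df, loc, scale, level=0.01):
    """VaR at the given level for a two-component mixture:
    w * Normal(mu, sigma) + (1 - w) * Student-t(df, loc, scale).
    Positive VaR = loss. Parameters are held fixed here; the models
    above let them vary in time.
    """
    cdf = lambda x: w * norm.cdf(x, mu, sigma) + (1 - w) * t.cdf(x, df, loc, scale)
    # The level-quantile is the root of cdf(x) - level on a wide bracket.
    q = brentq(lambda x: cdf(x) - level, -50 * scale, 50 * scale)
    return -q

# Mostly normal returns with a fat-tailed 10% component (illustrative numbers).
print(mixture_var(w=0.9, mu=0.0, sigma=0.01, df=3, loc=0.0, scale=0.02))
```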
 Title
 Time-Varying Coefficient Models with ARMA-GARCH Structures for Longitudinal Data Analysis.
 Creator

Zhao, Haiyan, Niu, Xufeng, Huffer, Fred, Nolder, Craig, McGee, Dan, Department of Statistics, Florida State University
 Abstract/Description

The motivation for my research comes from the analysis of the Framingham Heart Study (FHS) data. The FHS is a long-term prospective study of cardiovascular disease in the community of Framingham, Massachusetts. The study began in 1948, and 5,209 subjects were initially enrolled. Examinations were given biennially to the study participants, and their status associated with the occurrence of disease was recorded. In this dissertation, the event we are interested in is the incidence of coronary heart disease (CHD). Covariates considered include sex, age, cigarettes per day (CSM), serum cholesterol (SCL), systolic blood pressure (SBP), and body mass index (BMI, weight in kilograms/height in meters squared). A review of the statistical literature indicates that the effects of the covariates on cardiovascular disease, or on death from all causes, in the Framingham study change over time. For example, the effect of SCL on cardiovascular disease decreases linearly over time. In this study, I examine the time-varying effects of the risk factors on CHD incidence. Time-varying coefficient models with ARMA-GARCH structure are developed in this research. The maximum likelihood and marginal likelihood methods are used to estimate the parameters in the proposed models. Since high-dimensional integrals are involved in the calculation of the marginal likelihood, the Laplace approximation is employed in this study. Simulation studies are conducted to evaluate the performance of these two estimation methods based on our proposed models. The Kullback-Leibler (KL) divergence and the root mean square error are employed in the simulation studies to compare the results obtained from the different methods. Simulation results show that the marginal likelihood approach gives more accurate parameter estimates but is more computationally intensive. Following the simulation study, our proposed models are applied to the Framingham Heart Study to investigate the time-varying effects of covariates with respect to CHD incidence. To specify the time-series structures of the effects of risk factors, the Bayesian Information Criterion (BIC) is used for model selection. Our study shows that the relationship between CHD and the risk factors changes over time. For males, there is a clearly decreasing linear trend in the age effect, which implies that the age effect on CHD is less pronounced for older patients than for younger ones. The effect of CSM stays almost the same for the first 30 years and decreases thereafter. There are slightly decreasing linear trends in both the SBP and BMI effects. Furthermore, the coefficients of SBP are mostly positive over time; i.e., patients with higher SBP are more likely to develop CHD, as expected. For females, there is also a clearly decreasing linear trend in the age effect, while the effects of SBP and BMI on CHD are mostly positive and do not change much over time.
 Date Issued
 2010
 Identifier
 FSU_migr_etd0527
 Format
 Thesis
 Title
 Time Scales in Epidemiological Analysis.
 Creator

Chalise, Prabhakar, McGee, Daniel L., Chicken, Eric, Carlson, Elwood, Sinha, Debajyoti, Department of Statistics, Florida State University
 Abstract/Description

The Cox proportional hazards model is routinely used to model the time until an event of interest. Two time scales are used in practice: follow-up time and chronological age. The former is the most frequently used time scale in both clinical studies and longitudinal observational studies. However, there is no general consensus about which time scale is best. In recent years, papers have appeared arguing for using chronological age as the time scale, either with or without adjusting for entry age. It has also been asserted that if the cumulative baseline hazard is exponential, or if the age at entry is independent of the covariates, the two models are equivalent. Our studies do not satisfy these two conditions in general. We found that the true factor that makes the models perform significantly differently is the variability in the age at entry. If there is no variability in the entry age, the time scales do not matter and both models estimate exactly the same coefficients. As the variability increases, the models disagree with each other. We also computed the optimal time scale proposed by Oakes and utilized it in the Cox model. Both our empirical and simulation studies show that the follow-up time scale model using age at entry as a covariate is better than the chronological age and Oakes time scale models. This finding is illustrated with two examples with data from the Diverse Population Collaboration. Based on our findings, we recommend using follow-up time as the time scale for epidemiological analysis.
 Date Issued
 2009
 Identifier
 FSU_migr_etd3933
 Format
 Thesis
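The specification recommended in the record above, follow-up time as the scale with age at entry entering as a covariate, is direct to fit in lifelines. A sketch on synthetic data; the column names, data, and coefficients are invented for illustration.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(6)
n = 500
entry_age = rng.uniform(40, 70, n)
sbp = rng.normal(130, 15, n)
# Synthetic exponential event times whose hazard rises with age and SBP.
hazard = 0.01 * np.exp(0.03 * (entry_age - 55) + 0.01 * (sbp - 130))
time = rng.exponential(1 / hazard)
event = (time < 10).astype(int)          # administrative censoring at 10 years
df = pd.DataFrame({"followup": np.minimum(time, 10), "event": event,
                   "entry_age": entry_age, "sbp": sbp})

# Follow-up time scale, entry age entering as a covariate.
cph = CoxPHFitter()
cph.fit(df, duration_col="followup", event_col="event")
print(cph.params_)  # log hazard ratios for entry_age and sbp
```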
 Title
 Theories on Group Variable Selection in Multivariate Regression Models.
 Creator

Ha, Seung-Yeon, She, Yiyuan, Ökten, Giray, Huffer, Fred, Sinha, Debajyoti, Department of Statistics, Florida State University
 Abstract/Description

We study group variable selection in the multivariate regression model. Group variable selection is equivalent to selecting the nonzero rows of the coefficient matrix: since there are multiple response variables, if one predictor is irrelevant to estimation then the corresponding row must be zero. In the high-dimensional setup, shrinkage estimation methods are applicable and guarantee smaller MSE than OLS, according to the James-Stein phenomenon (1961). As one class of shrinkage methods, we study penalized least squares estimation for group variable selection. Among these, we study L0 regularization and L0 + L2 regularization with the purpose of obtaining accurate prediction and consistent feature selection, and use the corresponding computational procedures, Hard TISP and Hard-Ridge TISP (She, 2009), to resolve the numerical difficulties. These regularization methods show better performance in both prediction and selection than the Lasso (L1 regularization), which is one of the popular penalized least squares methods. L0 achieves the same optimal rate of prediction loss and estimation loss as the Lasso, but it requires no restriction on the design matrix or sparsity for controlling the prediction error, and a more relaxed condition than the Lasso for controlling the estimation error. Also, for selection consistency, it requires a much more relaxed incoherence condition, which concerns the correlation between the relevant and irrelevant subsets of predictors. Therefore L0 can work better than the Lasso in both prediction and sparsity recovery in practical cases where correlation is high or sparsity is not low. We study another method, L0 + L2 regularization, which uses the combined penalty of L0 and L2. In the corresponding procedure, Hard-Ridge TISP, the two parameters work independently for selection and shrinkage (to enhance prediction) respectively, and it therefore gives better performance in some cases (such as low signal strength) than L0 regularization. For L0 regularization, λ works for selection but is tuned in terms of prediction accuracy. L0 + L2 regularization gives the optimal rate of prediction and estimation errors without any restriction when the coefficient of the L2 penalty is appropriately assigned. Furthermore, it can achieve a better rate of estimation error with an ideal choice of blockwise weights for the L2 penalty.
 Date Issued
 2013
 Identifier
 FSU_migr_etd7404
 Format
 Thesis
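The TISP iteration named in the record above alternates a gradient step with a row-wise threshold; for the hard-ridge penalty, a row is kept only if its norm clears a threshold and is then shrunk. The numpy sketch below follows the general recipe in She (2009), with the step size and tuning values chosen arbitrarily.

```python
import numpy as np

def hard_ridge_tisp(X, Y, lam, eta, step=None, iters=500):
    """Group (row-wise) selection in multivariate regression via a TISP-style
    iteration: gradient step, then a hard-ridge threshold on each row of B
    (keep a row only if its l2 norm exceeds lam, then shrink by 1/(1+eta)).
    A sketch of the idea; tuning and convergence checks are omitted.
    """
    p, q = X.shape[1], Y.shape[1]
    if step is None:
        step = 1.0 / np.linalg.norm(X, 2) ** 2   # keeps the iteration stable
    B = np.zeros((p, q))
    for _ in range(iters):
        Z = B + step * X.T @ (Y - X @ B)          # gradient step
        norms = np.linalg.norm(Z, axis=1)
        keep = norms > lam                        # hard selection of rows
        B = np.where(keep[:, None], Z / (1 + eta), 0.0)  # ridge shrinkage
    return B

rng = np.random.default_rng(7)
X = rng.normal(size=(100, 10))
B_true = np.zeros((10, 3)); B_true[:2] = rng.normal(size=(2, 3)) * 2
Y = X @ B_true + rng.normal(size=(100, 3))
B_hat = hard_ridge_tisp(X, Y, lam=0.5, eta=0.01)
print(np.nonzero(np.linalg.norm(B_hat, axis=1))[0])  # selected rows: ideally [0 1]
```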
 Title
 TESTS OF DISPLACEMENT AND ORDERED MEAN HYPOTHESES.
 Creator

SINCLAIR, DENNIS FRANKLIN., Florida State University
 Abstract/Description

Character displacement is an ecological process by which, theoretically, coexisting species diverge in size to reduce competition. A closely allied concept is deletion, in which species are excluded from a habitat because they do not differ sufficiently from other species living there. Character displacement has been a controversial topic in recent years, largely due to a lack of statistical procedures for testing its existence. We propose herein a variety of approaches for testing displacement and deletion hypotheses. The applicability of the methods extends beyond the motivating ecological problem to other fields. Consider the model $X_{ij} = \mu_i + \epsilon_{ij}$, $i = 1, \ldots, k$; $j = 1, \ldots, n_i$, where $X_{ij}$ is the $j$th observation on species $i$ with population mean $\mu_i$, and the $\epsilon_{ij}$ are independent normally distributed error terms with mean zero and common variance. Traditionally ecologists have regarded species sizes as randomly distributed. We develop tests for displacement and deletion by considering uniform, lognormal, and log-uniform distributions for species sizes. (A random variable $Y$ has a log-uniform distribution if $\log Y$ has a uniform distribution.) Most claimed manifestations of character displacement concern the ratios of each species size to the next smallest one (contiguous ratios). All but one of the test statistics are functions of spacings (logarithms of contiguous ratios). We prove a useful characterization of distributions in terms of spacings, and show that the log-uniform distribution produces constant expected contiguous ratios, an important property in character displacement studies. The random effects approaches generally lack power in detecting the suspected patterns. We develop further tests for the model in which the $\mu_i$ are regarded as fixed. This fixed effects approach, which may be more realistic ecologically, produces considerably more powerful tests. Displacement hypotheses in the fixed effects framework are expressed naturally in terms of the ordered means $\mu_{(1)} < \mu_{(2)} < \cdots < \mu_{(k)}$. We develop a general theory by which a particular class of linear hypotheses about any number of sets of ordered means may be tested. Finally, a functional relation is used to model the movement of species means from one environment to another. Existing asymptotic tests are shown to perform remarkably well for small samples.
 Date Issued
 1982
 Identifier
 AAI8223194, 3085332, FSDT3085332, fsu:74827
 Format
 Document (PDF)
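Since all but one of the test statistics are functions of spacings, a minimal sketch of the basic data reduction may help. This is our illustration, not the dissertation's code, and the example sizes are made up:

```python
import numpy as np

def spacings(sizes):
    """Spacings: logarithms of contiguous size ratios. With ordered sizes
    x_(1) <= ... <= x_(k), the i-th spacing is log(x_(i+1) / x_(i))."""
    s = np.sort(np.asarray(sizes, dtype=float))
    return np.diff(np.log(s))

# Under a log-uniform size distribution the expected contiguous ratios are
# constant, so the spacings of observed sizes should look roughly flat.
print(spacings([2.1, 3.4, 5.2, 8.3, 13.1]))
```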
 Title
 Tests and Classifications in Adaptive Designs with Applications.
 Creator

Chen, Qiusheng, Niu, Xufeng, McGee, Daniel, Slate, Elizabeth H., Zhang, Jinfeng, Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

Statistical tests for biomarker identification and classification methods for patient grouping are two important topics in adaptive designs of clinical trials. In this work, we evaluate four test methods for biomarker identification: a model-based identification method, the popular t-test, the nonparametric Wilcoxon Rank Sum test, and the Least Absolute Shrinkage and Selection Operator (Lasso) method. For selecting the best classification methods in Stage 2 of an adaptive design, we examine classification methods including recently developed machine learning approaches such as Random Forest, Lasso and Elastic-Net Regularized Generalized Linear Models (Glmnet), Support Vector Machine (SVM), Gradient Boosting Machine (GBM), and Extreme Gradient Boosting (XGBoost). Statistical simulations are carried out to assess the performance of the biomarker identification methods and the classification methods. The best identification method and classification technique are selected based on the True Positive Rate (TPR, also called Sensitivity) and the True Negative Rate (TNR, also called Specificity). The optimal test method for gene identification and classification method for patient grouping are then applied to the Adaptive Signature Design (ASD) to evaluate the performance of ASD in different situations, including simulated data and a real data set for breast cancer patients.
 Date Issued
 2018
 Identifier
 2018_Sp_Chen_fsu_0071E_14309
 Format
 Thesis
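A minimal sketch of the kind of TPR/TNR comparison described above, using scikit-learn on simulated data; the data-generating model and tuning choices here are our own illustrative assumptions, and XGBoost is omitted since it lives in a separate package:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))                       # toy biomarker matrix
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=500) > 0).astype(int)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = {
    "RandomForest": RandomForestClassifier(random_state=0),
    "GBM": GradientBoostingClassifier(random_state=0),
    "SVM": SVC(),
    "Logistic(L1)": LogisticRegression(penalty="l1", solver="liblinear"),
}
for name, m in models.items():
    yhat = m.fit(X_tr, y_tr).predict(X_te)
    tn, fp, fn, tp = confusion_matrix(y_te, yhat).ravel()
    print(f"{name}: TPR={tp / (tp + fn):.2f}  TNR={tn / (tn + fp):.2f}")
```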
 Title
 TESTING WHETHER NEW IS BETTER THAN USED OF A SPECIFIED AGE.
 Creator

PARK, DONG HO., Florida State University
 Abstract/Description

This research contributes to the theory and methods of testing hypotheses for classes of life distributions. Two classes of life distributions are considered in this dissertation: (1) the New Better Than Used (NBU) class: the life distribution $F$ is NBU if $\bar{F}(x+y) \le \bar{F}(x)\bar{F}(y)$ for all $x, y \ge 0$, where $\bar{F} \equiv 1 - F$; and (2) the New Better Than Used at $t_0$ (NBU-$t_0$) class: the life distribution $F$ is NBU-$t_0$ if $\bar{F}(x+t_0) \le \bar{F}(x)\bar{F}(t_0)$ for all $x \ge 0$. The NBU and NBU-$t_0$ classes have dual classes (New Worse Than Used and New Worse Than Used at $t_0$, respectively) defined by reversing the inequality. The NBU-$t_0$ class is a new class of life distributions and contains the NBU class. We study the basic properties of the NBU-$t_0$ class and propose a test of $H_0$: $\bar{F}(x+t_0) = \bar{F}(x)\bar{F}(t_0)$ for all $x \ge 0$, versus $H_A$: $\bar{F}(x+t_0) \le \bar{F}(x)\bar{F}(t_0)$ for all $x \ge 0$ with strict inequality for some $x \ge 0$, based on a complete random sample $X_1, \ldots, X_n$ from $F$. Our test can also be used to test $H_0$ against NWU-$t_0$ alternatives. Asymptotic relative efficiencies of our test with respect to the Hollander and Proschan (1972, Ann. Math. Statist. 43, 1136-1146) NBU test are calculated for several distributions. We extend our test of $H_0$ versus $H_A$ to accommodate randomly censored data. For the censored-data situation our test is based on a statistic [displayed formula omitted in the source abstract] in which the survival function is replaced by the Kaplan-Meier (1958, J. Amer. Statist. Assoc. 53, 457-481) estimator of $\bar{F}$. Under mild regularity conditions on the amount of censoring, a consistent test of $H_0$ versus $H_A$ for the randomly censored model is obtained. In Chapter III we develop a two-sample NBU test of the null hypothesis that two distributions $F$ and $G$ are equal, versus the alternative that $F$ is "more NBU" than $G$. Our test is based on a statistic $T_{m,n}$ [displayed formula omitted in the source abstract], where $m$ and $n$ are the sample sizes from $F$ and $G$, and $F_m$ and $G_n$ are the corresponding empirical distributions. Asymptotic normality of $T_{m,n}$, suitably normalized, is a direct consequence of Hoeffding's (1948, Ann. Math. Statist. 19, 293-325) U-statistic theorem. Then, using a consistent estimator of the null asymptotic variance of $N^{1/2} T_{m,n}$, where $N = m + n$, we obtain an asymptotically distribution-free test. We extend the two-sample NBU test to the k-sample case. Our test of $H_0$ versus $H_A$ utilizes the Kaplan-Meier estimator; however, there are other possible estimators of the survival function for the randomly censored model. (Author's abstract exceeds stipulated maximum length. Discontinued here with permission of author.)
 Date Issued
 1982
 Identifier
 AAI8301540, 3085466, FSDT3085466, fsu:74958
 Format
 Document (PDF)
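The NBU-$t_0$ defining inequality is easy to probe empirically. The sketch below is an illustrative diagnostic of ours, not the dissertation's formal test statistic (whose formula the source abstract omits); it computes the empirical margins $\bar{F}_n(x)\bar{F}_n(t_0) - \bar{F}_n(x+t_0)$ over a grid:

```python
import numpy as np

def nbu_t0_margins(sample, t0, n_grid=50):
    """Empirical margins Fbar(x)*Fbar(t0) - Fbar(x + t0) over a grid of x.
    Nonnegative margins are consistent with the NBU-t0 property."""
    x = np.sort(np.asarray(sample, dtype=float))
    sbar = lambda t: np.mean(x > t)                 # empirical survival function
    grid = np.linspace(0.0, x.max(), n_grid)
    return np.array([sbar(g) * sbar(t0) - sbar(g + t0) for g in grid])

# Memoryless (exponential) lifetimes give equality, so margins ~ 0 up to noise.
data = np.random.default_rng(1).exponential(scale=2.0, size=300)
print(nbu_t0_margins(data, t0=1.0).min())
```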
 Title
 TESTING WHETHER MEAN RESIDUAL LIFE CHANGES TREND.
 Creator

GUESS, FRANK MITCHELL., Florida State University
 Abstract/Description

Given that an item is of age t, the expected value of its random remaining life is called the mean residual life (MRL) at age t. We propose two new nonparametric classes of life distributions for modeling aging based on the MRL. The first class consists of distributions with "increasing initially, then decreasing mean residual life" (IDMRL); the IDMRL class models aging that is initially beneficial, then adverse. The second class, "decreasing, then increasing mean residual life" (DIMRL), models aging that is initially adverse, then beneficial. We present situations where IDMRL (DIMRL) distributions are useful models. We propose two testing procedures for $H_0$: constant MRL (i.e., exponentiality) versus $H_1$: IDMRL but not constant MRL (or $H_1'$: DIMRL but not constant MRL). The first procedure assumes the turning point $\tau$ from IMRL to DMRL is specified by the user or is known. Our IDMRL($\tau$) test statistic, $T_n$, is a differentiable statistical function of order 1; thus $T_n$, suitably standardized, is asymptotically normal. The second procedure assumes knowledge of the proportion $\rho$ of the population that "dies" at or before the turning point (knowledge of $\tau$ itself is not assumed). We use L-statistic theory to show that our IDMRL($\rho$) test statistic, $V_n^*$, appropriately standardized, is asymptotically normal; the exact null distribution of $V_n^*$ is also established. For each of these procedures an application is given. We then modify the complete-data tests to yield analogous censored-data procedures, exploiting the standard Kaplan-Meier estimator as a key tool. A limited Monte Carlo study investigates the censored-data procedures.
 Date Issued
 1984
 Identifier
 AAI8428699, 3085942, FSDT3085942, fsu:75428
 Format
 Document (PDF)
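A natural first diagnostic for an IDMRL or DIMRL pattern is to examine the empirical mean residual life. The sketch below is our illustration, not the $T_n$ or $V_n^*$ statistic of the dissertation:

```python
import numpy as np

def empirical_mrl(sample, ages):
    """Empirical mean residual life m(t) = E[X - t | X > t]: average the
    remaining lives of the observations exceeding each age t."""
    x = np.asarray(sample, dtype=float)
    return np.array([(x[x > t] - t).mean() if (x > t).any() else np.nan
                     for t in ages])

# An IDMRL pattern would show m(t) rising and then falling with age t.
x = np.random.default_rng(2).weibull(1.5, size=500)
print(empirical_mrl(x, ages=np.linspace(0.0, 2.0, 5)))
```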
 Title
 Testing for the Equality of Two Distributions on High Dimensional Object Spaces and Nonparametric Inference for Location Parameters.
 Creator

Guo, Ruite, Patrangenaru, Victor, Mio, Washington, Barbu, Adrian G. (Adrian Gheorghe), Bradley, Jonathan R., Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

Our view is that while some of the basic principles of data analysis are going to remain unchanged, others are to be gradually replaced with geometry and topology methods. Linear methods still make sense for functional data analysis, or in the context of tangent bundles of object spaces. Complex nonstandard data is represented on object spaces, and an object space admitting a manifold stratification may be embedded in a Euclidean space. One defines the extrinsic energy distance associated with two probability measures on an arbitrary object space embedded in a numerical space, and one introduces an extrinsic energy statistic to test for homogeneity of the distributions of two random objects (r.o.'s) on such an object space. This test is validated via a simulation example on the Kendall space of planar k-ads with a Veronese-Whitney (VW) embedding. One considers an application to medical imaging: testing for homogeneity of the distributions of Kendall shapes of midsections of the corpus callosum in a clinically normal population versus a population of ADHD-diagnosed individuals. Surprisingly, due to the high dimensionality, these distributions are not significantly different, although they are known to have highly significantly different VW means. New spread and location parameters are to be added to reflect the nontrivial topology of certain object spaces. Topological data analysis (TDA) is going to be adapted to object spaces, and hypothesis testing for distributions is going to be based on extrinsic energy methods. For a random point on an object space embedded in a Euclidean space, the mean vector cannot be represented as a point on that space, except when the embedded space is convex. To address this misgiving, since the mean vector is the minimizer of the expected square distance, following Fréchet (1948), on an embedded compact object space one may consider both minimizers and maximizers of the expected square distance to a given point on the embedded object space as the mean, respectively the antimean, of the random point. Of all distances on an object space, one considers here the chord distance associated with the embedding of the object space, since for such distances one can give a necessary and sufficient condition for the existence of a unique Fréchet mean (respectively Fréchet antimean). For such distributions these location parameters are called the extrinsic mean (respectively the extrinsic antimean), and the corresponding sample statistics are consistent estimators of their population counterparts. Moreover, around an extrinsic mean (or antimean) located at a smooth point, one derives the limit distribution of such estimators.
 Date Issued
 2017
 Identifier
 FSU_SUMMER2017_Guo_fsu_0071E_13977
 Format
 Thesis
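The extrinsic energy statistic builds on the classical two-sample energy statistic, computed with chord (Euclidean) distances on the embedded data. A minimal sketch under that reading, with a permutation null; this is our illustration and the function names are ours:

```python
import numpy as np
from scipy.spatial.distance import cdist

def energy_statistic(X, Y):
    """Two-sample energy statistic 2*E|X-Y| - E|X-X'| - E|Y-Y'| with
    Euclidean (chord) distances on embedded data; V-statistic form, with
    the zero diagonal included in the within-sample means."""
    return 2 * cdist(X, Y).mean() - cdist(X, X).mean() - cdist(Y, Y).mean()

def permutation_pvalue(X, Y, n_perm=999, rng=None):
    """Approximate the null by permuting pooled rows between the samples."""
    rng = rng if rng is not None else np.random.default_rng()
    pooled, m = np.vstack([X, Y]), len(X)
    obs, count = energy_statistic(X, Y), 0
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))
        count += energy_statistic(pooled[idx[:m]], pooled[idx[m:]]) >= obs
    return (count + 1) / (n_perm + 1)
```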
 Title
Testing for a time-dependent covariate effect in the linear risk model.
 Creator

Amirsehi, Kourosh., Florida State University
 Abstract/Description

We propose two tests to identify a time-dependent covariate effect in the partly parametric linear risk model, and derive asymptotic distributions of the test statistics under the assumption that the covariate effect of interest is constant. One of the asymptotic distributions depends on unknown functions, and we devise a weighted bootstrap procedure to estimate its quantiles. We also derive rates of convergence of maximum likelihood estimators of regression coefficients in both the nonparametric and the partly parametric linear risk models using the method of sieves. We carry out a simulation study to assess the performance of the proposed tests and apply them to real data from a clinical trial on myelomatosis.
 Date Issued
 1995
 Identifier
 AAI9620872, 3088860, FSDT3088860, fsu:77659
 Format
 Document (PDF)
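A generic multiplier (weighted) bootstrap of the kind alluded to above can be sketched as follows. This is an illustration under the common mean-one exponential-weights convention, not the dissertation's specific procedure:

```python
import numpy as np

def weighted_bootstrap_quantile(stat_fn, data, level=0.95, B=1000, rng=None):
    """Generic multiplier bootstrap: recompute the statistic with i.i.d.
    mean-one exponential weights and return the empirical quantile of the
    bootstrap draws. stat_fn(data, w) must accept a weight vector."""
    if rng is None:
        rng = np.random.default_rng()
    draws = [stat_fn(data, rng.exponential(size=len(data))) for _ in range(B)]
    return np.quantile(draws, level)

# Toy usage with a weighted-mean statistic.
data = np.random.default_rng(3).normal(size=200)
print(weighted_bootstrap_quantile(lambda x, w: np.average(x, weights=w), data))
```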
 Title
 Survival Analysis Using Bayesian Joint Models.
 Creator

Xu, Zhixing, Sinha, Debajyoti, Schatschneider, Christopher, Bradley, Jonathan R., Chicken, Eric, Lin, Lifeng, Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

In many clinical studies, each patient is at risk of recurrent events as well as a terminating event. In Chapter 2, we present a novel latent-class based semiparametric joint model that offers a clinically meaningful and estimable association between the recurrence profile and the risk of termination. Unlike previous shared-frailty based joint models, this model has a coherent interpretation of the covariate effects on all relevant functions and model quantities, whether conditional or unconditional on the events history. We offer a fully Bayesian method for estimation and prediction using a complete specification of the prior process of the baseline functions. When there is a lack of prior information about the baseline functions, we derive a practical and theoretically justifiable partial-likelihood based semiparametric Bayesian approach. Our Markov chain Monte Carlo tools for both Bayesian methods are implementable via publicly available software. Practical advantages of our methods are illustrated via a simulation study and the analysis of a transplant study with recurrent Non-Fatal Graft Rejections (NFGR) and the termination event of death due to total graft rejection. In Chapter 3, we are motivated by the important problem of estimating daily fine particulate matter (PM2.5) over the US. Tracking and estimating daily PM2.5 is important because PM2.5 has been shown to be directly related to mortality from lung disease, cardiovascular disease, and stroke; high values of PM2.5 thus constitute a public health problem in the US, and precise estimates of PM2.5 are needed to aid public policy decisions. We therefore propose a Bayesian hierarchical model for high-dimensional "multi-type" responses, by which we mean a collection of correlated responses with different distributional assumptions (e.g., continuous skewed observations and count-valued observations). The Centers for Disease Control and Prevention (CDC) database provides counts of mortalities related to PM2.5 and daily averaged PM2.5, which are treated as responses in our analysis. Our model capitalizes on the shared conjugate structure between the Weibull (to model PM2.5), Poisson (to model disease mortalities), and multivariate log-gamma distributions, and uses dimension reduction to aid computation. Our model can also be used to improve the precision of estimates and to estimate at undisclosed/missing counties. We provide a simulation study to illustrate the performance of the model and give an in-depth analysis of the CDC dataset.
 Date Issued
 2019
 Identifier
 2019_Spring_Xu_fsu_0071E_15078
 Format
 Thesis
 Title
 A Study of the Asymptotic Properties of Lasso Estimates for Correlated Data.
 Creator

Gupta, Shuva, Bunea, Florentina, Gert, Joshua, Hollander, Myles, Wegkamp, Marten, Department of Statistics, Florida State University
 Abstract/Description

In this thesis we investigate post-model-selection properties of L1-penalized weighted least squares estimators in regression models with a large number of variables M and correlated errors. We focus on correct subset selection and on the asymptotic distribution of the penalized estimators. In the simple case of AR(1) errors we give conditions under which correct subset selection can be achieved via our procedure. We then provide a detailed generalization of this result to models with errors that have a weak-dependency structure (Doukhan, 1996). In all cases, the number M of regression variables is allowed to exceed the sample size n. We further investigate the asymptotic distribution of our estimates when M < n, and show that under appropriate choices of the tuning parameters the limiting distribution is multivariate normal. This generalizes to the case of correlated errors the result of Knight and Fu (2000), obtained for regression models with independent errors.
 Date Issued
 2009
 Identifier
 FSU_migr_etd3896
 Format
 Thesis
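One concrete reading of "L1-penalized weighted least squares under AR(1) errors" is to whiten the data with the usual AR(1) transform and then run a standard Lasso. A minimal sketch under that assumption (ours, with the autocorrelation rho treated as known):

```python
import numpy as np
from sklearn.linear_model import Lasso

def ar1_whitened_lasso(X, y, rho, alpha=0.1):
    """Whiten X and y with the standard AR(1) (Prais-Winsten-style)
    transform, then run an ordinary Lasso on the transformed data.
    In practice rho would be estimated from residuals."""
    Xw, yw = X.astype(float).copy(), y.astype(float).copy()
    Xw[1:] -= rho * X[:-1]                   # quasi-difference rows 2..n
    yw[1:] -= rho * y[:-1]
    Xw[0] *= np.sqrt(1 - rho ** 2)           # first-observation transform
    yw[0] *= np.sqrt(1 - rho ** 2)
    return Lasso(alpha=alpha).fit(Xw, yw).coef_
```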
 Title
 A Study of Some Issues of GoodnessofFit Tests for Logistic Regression.
 Creator

Ma, Wei, McGee, Daniel, Mai, Qing, Levenson, Cathy W., Niu, Xufeng, Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

Goodness-of-fit tests are important for assessing how well a model fits a set of observations. The Hosmer-Lemeshow (HL) test is a popular and commonly used method for assessing goodness-of-fit in logistic regression, but it has two issues. First, one must specify the number of partition groups, and different groupings often suggest different decisions. In this study, we therefore propose several grouping tests that combine multiple HL tests with varying numbers of groups to reach a decision, instead of using one arbitrary grouping or searching for an optimum grouping; the best choice of groups is data-dependent and not easy to find. Second, the HL test has low power to detect missing interactions between continuous and dichotomous covariates. We therefore propose global and interaction tests to capture such violations. Simulation studies are carried out to assess the Type I errors and powers of all the proposed tests, which are illustrated using the bone mineral density data from NHANES III.
 Date Issued
 2018
 Identifier
 2018_Su_Ma_fsu_0071E_14681
 Format
 Thesis
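For reference, the standard HL statistic that the proposed grouping tests combine can be sketched as follows. This is our implementation; the dissertation's grouping convention may differ:

```python
import numpy as np
from scipy.stats import chi2

def hosmer_lemeshow(y, p, g=10):
    """HL statistic with g groups formed from quantiles of the fitted
    probabilities p; reference distribution is chi-square with g - 2 df."""
    y, p = np.asarray(y), np.asarray(p)
    stat = 0.0
    for idx in np.array_split(np.argsort(p), g):
        n_k, obs, exp = len(idx), y[idx].sum(), p[idx].sum()
        pbar = exp / n_k
        stat += (obs - exp) ** 2 / (n_k * pbar * (1 - pbar) + 1e-12)
    return stat, chi2.sf(stat, g - 2)

# Combining decisions across groupings, in the spirit of the proposed tests:
# pvals = [hosmer_lemeshow(y, p, g)[1] for g in range(6, 16)]
```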
 Title
 The Studies of Joint Structure Sparsity Pursuit in the Applications of Hierarchical Variable Selection and Fused Lasso.
 Creator

Jiang, He, She, Yiyuan, Ökten, Giray, Barbu, Adrian G. (Adrian Gheorghe), Mai, Qing, Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

In this dissertation, we study joint sparsity pursuit and its applications in variable selection for high-dimensional data. The first part of the dissertation focuses on hierarchical variable selection and its application in a two-way interaction model. In high-dimensional models that involve interactions, statisticians usually favor variable selection obeying certain logical hierarchical constraints. This part focuses on structural hierarchy, meaning that the existence of an interaction term implies that at least one or both associated main effects must be present. Lately this problem has attracted a lot of attention from statisticians, but existing computational algorithms converge slowly and cannot meet the challenge of big-data computation. More importantly, theoretical studies of hierarchical variable selection are extremely scarce, largely due to the difficulty that multiple sparsity-promoting penalties are enforced on the same subject. This work investigates a new type of estimator based on group multi-regularization to capture various types of structural parsimony simultaneously. We present nonasymptotic results based on combined statistical and computational analysis, and reveal the minimax optimal rate. A general-purpose algorithm is developed with a theoretical guarantee of strict iterate convergence and global optimality. Simulations and real-data experiments demonstrate the efficiency and efficacy of the proposed approach. The second topic studies the fused lasso, which pursues joint sparsity of the variables and of their consecutive differences simultaneously. The overlapping penalties of the fused lasso pose critical challenges for computation and theoretical analysis: existing theory is largely restricted to orthogonal designs, and nonasymptotic studies are scarce in the literature. In this work, we study the fused lasso and its application to a classification problem to achieve exact clustering. Computationally, we derive a simple-to-implement algorithm that scales well to big-data computation; in theory, we propose a brand-new technique and carry out nonasymptotic analysis. To evaluate the prediction performance theoretically, we derive an oracle inequality for the fused lasso estimator that gives the $\ell_2$ prediction error rate; the minimax optimal rate is also revealed. For estimation accuracy, $\ell_q$ ($1 \le q \le \infty$) norm error bounds for the fused lasso estimator are derived. Simulation studies show that exact clustering can be achieved using a post-thresholding technique.
 Date Issued
 2015
 Identifier
 FSU_migr_etd9362
 Format
 Thesis
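The convex core of the fused lasso can be written down directly with a generic solver; the dissertation also treats computational and theoretical aspects beyond this L1 formulation. A minimal sketch assuming the cvxpy package is available, with illustrative tuning values:

```python
import numpy as np
import cvxpy as cp  # assumed available; any convex solver would do

rng = np.random.default_rng(4)
n, p = 100, 40
beta_true = np.repeat([0.0, 2.0, 0.0, -1.5], 10)   # piecewise-constant signal
X = rng.normal(size=(n, p))
y = X @ beta_true + rng.normal(scale=0.5, size=n)

b = cp.Variable(p)
lam1, lam2 = 0.5, 2.0
objective = cp.Minimize(0.5 * cp.sum_squares(X @ b - y)
                        + lam1 * cp.norm1(b)            # sparsity of coefficients
                        + lam2 * cp.norm1(cp.diff(b)))  # sparsity of differences
cp.Problem(objective).solve()
# Post-thresholding small fitted differences yields the exact clustering
# discussed in the abstract.
print(np.round(b.value, 2))
```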
 Title
Structural Health Monitoring with Lamb-Wave Sensors: Problems in Damage Monitoring, Prognostics and Multisensory Decision Fusion.
 Creator

Mishra, Spandan, Vanli, Omer Arda, Okoli, Okenwa, Jung, Sungmoon, Park, Chiwoo, Florida State University, FAMU-FSU College of Engineering, Department of Industrial and Manufacturing Engineering
 Abstract/Description

Carbon fiber reinforced composites (CFRC) have several desirable traits that can be exploited in the design of advanced structures and systems. Applications requiring high strength-to-weight and stiffness-to-weight ratios, such as airplane fuselages, wind turbine blades, and watercraft, have found profound use for CFRC. Furthermore, low density, good vibration damping, easy manufacturability, carbon fiber's electrical conductivity, high thermal conductivity, and smooth surface finish provide additional benefits to users; applications of CFRC are relevant to aerospace, the military, wind turbines, robotics, sports equipment, and more. Among the many advantages of CFRC there are, however, a few disadvantages: CFRC undergo completely different failure patterns compared to metals. Once the yield strength is exceeded, CFRC fail suddenly and catastrophically. The inherent anisotropic nature of CFRC makes it very difficult for traditional condition monitoring methods to assess the condition of the structure, and the complex failure patterns, including delamination, micro-cracks, and matrix cracks, require specialized sensing and monitoring schemes for composite structures. This Ph.D. research focuses on developing an integrated structural health monitoring methodology for damage monitoring, remaining useful life (RUL) estimation, and decision fusion using Lamb-wave data. The main objective is to develop an integrated damage detection method that utilizes Lamb-wave sensor data to infer the state of the damage and make an accurate prognosis of the structure. Slow fatigue loading results in distinctive failure patterns in CFRC structures: fatigue damage first manifests itself as fiber breakage, then slowly progresses to matrix cracks, and ultimately leads to delamination damage. This type of failure process is very difficult to monitor using traditional damage monitoring methods such as X-ray, ultrasonic, or infrared evaluation. In this research, we use a principal component (PC) based multivariate cumulative sum (MCUSUM) chart to monitor the structure; the MCUSUM chart is very useful for monitoring structures undergoing slow and gradual change. For remaining useful life estimation, we propose the Wiener process model coupled with principal component regression (PCR). For damage detection/classification we study discriminant analysis, which, in spite of its popular use in image analysis and gene-data classification problems, has not been widely used for damage classification. We show that discriminant analysis is useful for detecting known damage modes while dealing with the high dimensionality of Lamb-wave data, and we modify standard Gaussian discriminant analysis by introducing regularization parameters to process raw Lamb-wave data directly, without requiring an intermediate feature extraction step.
 Date Issued
 2016
 Identifier
 FSU_2016SU_Mishra_fsu_0071E_13346
 Format
 Thesis
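A minimal sketch of PC-based MCUSUM monitoring in the spirit described above; this is our illustration, with the baseline window, reference value k, and control limit h all arbitrary choices:

```python
import numpy as np

def pc_mcusum(data, n_baseline=50, n_pc=3, k=0.5, h=5.0):
    """Principal-component-based multivariate CUSUM (Crosier-style).
    The first n_baseline rows are treated as in-control and define both
    the PC projection and the standardization."""
    centered = data - data[:n_baseline].mean(axis=0)
    _, _, Vt = np.linalg.svd(centered[:n_baseline], full_matrices=False)
    scores = centered @ Vt[:n_pc].T
    scores = scores / scores[:n_baseline].std(axis=0)
    s, alarms = np.zeros(n_pc), []
    for t, x in enumerate(scores):
        c = np.linalg.norm(s + x)
        s = np.zeros(n_pc) if c <= k else (s + x) * (1 - k / c)  # shrunken CUSUM
        if np.linalg.norm(s) > h:
            alarms.append(t)                                     # out of control
    return alarms
```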
 Title
 STOCHASTIC VERSIONS OF REARRANGEMENT INEQUALITIES WITH APPLICATIONS TO STATISTICS.
 Creator

D'ABADIE, CATHERINE ANNE., Florida State University
 Abstract/Description

In this dissertation we develop a theory which offers a unified approach to the problem of obtaining stochastic versions of deterministic rearrangement inequalities. To develop the theory, we first define two new classes of functions and establish preservation properties of these functions under various statistical and mathematical operations. Next we introduce the notion of stochastically similarly arranged (SSA) pairs of random vectors. We prove that if the random vectors $(X, Y)$ are SSA and the function $f$ from $\mathbb{R}^n \times \mathbb{R}^n$ into $\mathbb{R}^n$ is monotone with respect to a certain partial ordering on $\mathbb{R}^n \times \mathbb{R}^n$, then for every permutation $\pi$ certain stochastic inequalities [displayed formulas omitted in the source abstract] hold. This result yields a unified way of obtaining stochastic versions of rearrangement inequalities. We then show that many multivariate densities of interest in statistical practice govern pairs of random vectors which are SSA. Next we show that the SSA property is preserved under certain statistical operations on pairs of SSA random vectors; for example, the rank order of SSA random variables is SSA, and the SSA property is preserved under certain contamination models. Finally, we show how these results can be applied to problems in hypothesis testing.
 Date Issued
 1981
 Identifier
 AAI8205717, 3085181, FSDT3085181, fsu:74676
 Format
 Document (PDF)
 Title
 Stochastic Models and Inferences for Commodity Futures Pricing.
 Creator

Ncube, Moeti M., Srivastava, Anuj, Doran, James, Mason, Patrick, Niu, Xufeng, Huffer, Fred, Wu, Wei, Department of Statistics, Florida State University
 Abstract/Description

The stochastic modeling of financial assets is essential to the valuation of financial products and to investment decisions. These models are governed by parameters that are estimated through a process known as calibration. Current procedures typically perform a grid-search optimization of a given objective function over a specified parameter space. These methods can be computationally intensive and require restrictions on the parameter space to achieve timely convergence. In this thesis, we propose an alternative Kalman Smoother Expectation Maximization procedure (KSEM) that can jointly estimate all the parameters and produces better model fit than alternative estimation procedures. Further, we consider the additional complexity of modeling jumps or spikes that may occur in a time series. For this calibration we develop a Particle Smoother Expectation Maximization procedure (PSEM) for the optimization of nonlinear systems. This is an entirely new estimation approach, and we provide several examples of its application.
 Date Issued
 2009
 Identifier
 FSU_migr_etd2707
 Format
 Thesis
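The KSEM idea, jointly estimating state-space parameters by EM with a Kalman smoother in the E-step, can be sketched with an off-the-shelf tool. A toy illustration assuming the pykalman package is available (not the dissertation's commodity-futures model):

```python
import numpy as np
from pykalman import KalmanFilter  # assumed available; any EM-capable
                                   # linear-Gaussian smoother would serve

rng = np.random.default_rng(5)
state = np.cumsum(rng.normal(scale=0.1, size=200))        # toy latent factor
obs = (state + rng.normal(scale=0.3, size=200)).reshape(-1, 1)

kf = KalmanFilter(n_dim_state=1, n_dim_obs=1)
kf = kf.em(obs, n_iter=10)            # M-step: jointly re-estimate parameters
smoothed_means, _ = kf.smooth(obs)    # E-step output: smoothed latent states
```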
 Title
 Statistical Shape Analysis on Manifolds with Applications to Planar Contours and Structural Proteomics.
 Creator

Ellingson, Leif A., Patrangenaru, Vic, Mio, Washington, Zhang, Jinfeng, Niu, Xufeng, Department of Statistics, Florida State University
 Abstract/Description

The technological advances of recent years have produced a wealth of intricate digital imaging data that is analyzed effectively using the principles of shape analysis. Such data often lie on either high-dimensional or infinite-dimensional manifolds. With computing power now strong enough to handle this data, it is necessary to develop theoretically sound methodology that performs the analysis in a computationally efficient manner. In this dissertation, we propose approaches for doing so for planar contours and for the three-dimensional atomic structures of protein binding sites. First, we adapt Kendall's definition of direct similarity shapes of finite planar configurations to shapes of planar contours under certain regularity conditions, and utilize Ziezold's nonparametric view of Fréchet mean shapes. The space of direct similarity shapes of regular planar contours is embedded in a space of Hilbert-Schmidt operators in order to obtain the Veronese-Whitney extrinsic mean shape. For computations, it is necessary to use discrete approximations of both the contours and the embedding. For cases when landmarks are not provided, we propose an automated, randomized landmark selection procedure that is useful for contour matching within a population and is consistent with the underlying asymptotic theory. For inference on the extrinsic mean direct similarity shape, we consider a one-sample neighborhood hypothesis test and the use of the nonparametric bootstrap to approximate confidence regions. Bandulasiri et al. (2008) suggested using extrinsic reflection size-and-shape analysis to study the relationship between the structure and function of protein binding sites. To obtain meaningful results with this approach, it is necessary to identify the atoms common to a group of binding sites with similar functions and to obtain proper correspondences for these atoms. We explore this problem in depth and propose an algorithm for simultaneously finding the common atoms and their respective correspondences, based upon the Iterative Closest Point algorithm. For a benchmark data set, our classification results compare favorably with those of leading established methods. Finally, we discuss current directions in the field of statistics on manifolds, including a computational comparison of intrinsic and extrinsic analysis for various applications and a brief introduction to sample spaces with manifold stratification.
 Date Issued
 2011
 Identifier
 FSU_migr_etd0053
 Format
 Thesis
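For discretized planar contours (or landmark configurations) the Veronese-Whitney extrinsic mean has a concrete eigenvector description, sketched below. This is our simplification to the finite-dimensional case, not the dissertation's Hilbert-Schmidt implementation:

```python
import numpy as np

def vw_extrinsic_mean(configs):
    """Veronese-Whitney extrinsic mean of planar similarity shapes:
    represent each k-point configuration as a centered, unit-norm complex
    vector z and take the top eigenvector of the averaged projector z z*."""
    S = 0
    for c in configs:                      # c: (k, 2) array of landmarks
        z = c[:, 0] + 1j * c[:, 1]
        z = z - z.mean()                   # remove translation
        z = z / np.linalg.norm(z)          # remove scale
        S = S + np.outer(z, z.conj())
    _, vecs = np.linalg.eigh(S / len(configs))
    return vecs[:, -1]                     # mean shape, defined up to rotation
```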
 Title
 Statistical Shape Analysis of Neuronal Tree Structures.
 Creator

Duncan, Adam, Srivastava, Anuj, Klassen, E., Wu, Wei, Huffer, Fred W., Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

Neuron morphology plays a central role in characterizing cognitive health and the functionality of brain structures. The problem of quantifying neuron shapes, and of capturing the statistical variability of shapes, is difficult because axons and dendrites have tree structures that differ in both geometry and topology. In this work, we restrict to trees that consist of: (1) a main branch viewed as a parameterized curve in ℝ³, and (2) some number of secondary branches, also parameterized curves in ℝ³, which emanate from the main branch at arbitrary points. We present two shape-analytic frameworks that each give a metric structure to the set of such tree shapes. Both frameworks are based on an elastic metric on the space of curves, with certain shape-preserving nuisance variables modded out. In the first framework, the side branches are treated as a continuum of curve-valued annotations to the main branch. In the second framework, the side branches are treated as discrete entities and are matched to each other by permutation. We show geodesic deformations between tree shapes in both frameworks, and we show Fréchet means and modes of variability, as well as cross-validated classification between different experimental groups using the second framework. We conclude with a smaller project which extends some of these ideas to more general weighted attributed graphs.
 Date Issued
 2018
 Identifier
 2018_Sp_Duncan_fsu_0071E_14500
 Format
 Thesis
 Title
 Statistical Models on Human Shapes with Application to Bayesian Image Segmentation and Gait Recognition.
 Creator

Kaziska, David M., Srivastava, Anuj, Mio, Washington, Chicken, Eric, Wegkamp, Marten, Department of Statistics, Florida State University
 Abstract/Description

In this dissertation we develop probability models for human shapes and apply those models to the problems of image segmentation and human identification by gait recognition. To build probability models on human shapes, we consider human shapes to be realizations of random variables on a space of simple closed curves and a space of elastic curves. Both of these spaces are quotient spaces of infinite-dimensional manifolds. Our probability models arise through Tangent Principal Component Analysis, a method of studying probability models on manifolds by projecting them onto a tangent plane to the manifold. Since we place the tangent plane at the Karcher mean of the sample shapes, we begin by examining statistical properties of Karcher means on manifolds. We derive theoretical results for the location of Karcher means on certain manifolds, and perform a simulation study of properties of Karcher means on our shape space. Turning to the specific problem of distributions on human shapes, we examine alternatives for probability models and find that kernel density estimators perform well. We use this model to sample shapes and to perform shape testing. The first application we consider is human detection in infrared images. We pursue this application using Bayesian image segmentation, in which the proposed human shape in an image is a maximum likelihood estimate, obtained using a prior distribution on human shapes and a likelihood arising from a divergence measure on the pixels in the image. We then consider human identification by gait recognition. We treat human gait as a cyclostationary process on the space of elastic curves and develop a metric on processes based on the geodesic distance between sequences on that space. We develop and demonstrate a framework for gait recognition based on this metric, which includes the following elements: automatic detection of gait cycles, interpolation to register gait cycles, computation of a mean gait cycle, and identification by matching a test cycle to the nearest member of a training set. We perform the matching both by an exhaustive search of the training set and through an expedited method using cluster-based trees and boosting.
 Date Issued
 2005
 Identifier
 FSU_migr_etd3275
 Format
 Thesis
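Because the tangent plane is placed at the Karcher mean, computing that mean is the first step of Tangent PCA. A minimal sketch for the unit sphere, as a stand-in for the curve-based shape spaces used in the dissertation (our illustration):

```python
import numpy as np

def karcher_mean_sphere(points, n_iter=50, tol=1e-9):
    """Karcher (Fréchet) mean on the unit sphere by gradient descent:
    average the log maps of the points at the current estimate, then
    map the averaged tangent vector back with the exponential map."""
    mu = points[0] / np.linalg.norm(points[0])
    for _ in range(n_iter):
        logs = []
        for p in points:                       # log map of p at mu
            cos_t = np.clip(mu @ p, -1.0, 1.0)
            theta = np.arccos(cos_t)
            if theta < 1e-12:
                logs.append(np.zeros_like(mu))
            else:
                logs.append(theta * (p - cos_t * mu) / np.sin(theta))
        v = np.mean(logs, axis=0)
        nv = np.linalg.norm(v)
        if nv < tol:
            break
        mu = np.cos(nv) * mu + np.sin(nv) * v / nv   # exponential map at mu
    return mu
```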
 Title
 Statistical Modelling and Applications of Neural Spike Trains.
 Creator

Lawhern, Vernon, Wu, Wei, Contreras, Robert J., Srivastava, Anuj, Huffer, Fred, Niu, Xufeng, Department of Statistics, Florida State University
 Abstract/Description

In this thesis we investigate statistical modelling of neural activity in the brain. We first develop a framework that extends the state-space Generalized Linear Model (GLM) of Eden and colleagues [20] to include the effects of hidden states. These states collectively represent variables that are not observed (or even observable) in the modeling process but can nonetheless have an impact on the neural activity. We then develop a framework that allows us to input a priori target information into the model. We examine both of these modelling frameworks on motor cortex data recorded from monkeys performing different target-driven hand and arm movement tasks. Finally, we perform temporal coding analysis of sensory stimulation using principled statistical models and show the efficacy of our approach.
 Date Issued
 2011
 Identifier
 FSU_migr_etd3251
 Format
 Thesis
 Title
 Statistical Methods for Big Data and Their Applications in Biomedical Research.
 Creator

Yu, Kaixian, Zhang, Jinfeng, Sang, Qing-Xiang Amy, Barbu, Adrian G. (Adrian Gheorghe), She, Yiyuan, Sinha, Debajyoti, Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

Big data has brought both opportunities and challenges to our research community: complex models can now be built with volumes of data researchers never had access to before. In this study we explore structure learning of Bayesian networks (BNs) and its application to reverse engineering of gene regulatory networks (GRNs). A Bayesian network is a graphical representation of a joint distribution that encodes the conditional dependencies and independencies among the variables. We propose a novel three-stage BN structure learning method, called GRASP (GRowth-based Approach with Staged Pruning). In the first stage, a new skeleton (undirected edges) discovery method, double filtering (DF), is designed; compared to existing methods, DF requires smaller sample sizes to achieve similar statistical power. Based on the skeleton estimated in the first stage, we propose a sequential Monte Carlo (SMC) method to sample the edges and their directions to optimize a BIC-based score. The SMC method has less tendency to be trapped in local optima, and the computation is easily parallelizable. In the third stage, we reclaim edges that may have been missed in the previous stages. We obtain satisfactory results in a simulation study and apply the method to infer GRNs from real experimental data. A method for personalized chemotherapy regimen selection for breast cancer and a novel algorithm for relationship extraction from unstructured documents are discussed as well.
 Date Issued
 2016
 Identifier
 FSU_2016SP_Yu_fsu_0071E_13079
 Format
 Thesis
 Title
 A Statistical Approach to an Ocean Circulation Inverse Problem.
 Creator

Choi, Seoeun, Huffer, Fred W., Speer, Kevin G., Nolder, Craig, Niu, Xufeng, Wu, Wei, Department of Statistics, Florida State University
 Abstract/Description

This dissertation presents, applies, and evaluates a statistical approach to an ocean circulation problem. The objective is to produce a map of ocean velocity in the North Atlantic from sparse measurements along ship tracks, using a Bayesian approach with a physical model. The physical model is the Stommel Gulf Stream model, which relates the wind stress curl to the transport stream function. A Gibbs sampler is used to extract features from the posterior velocity field. To specify the prior, the equation of the Stommel Gulf Stream model on a two-dimensional grid is used. Comparisons with earlier approaches used by oceanographers are also presented.
 Date Issued
 2007
 Identifier
 FSU_migr_etd3758
 Format
 Thesis
 Title
 A Statistical Approach for Information Extraction of Biological Relationships.
 Creator

Bell, Lindsey R., Zhang, Jinfeng, Niu, Xufeng, Tyson, Gary, Huffer, Fred, Department of Statistics, Florida State University
 Abstract/Description

Vast amounts of biomedical information are stored in the scientific literature, easily accessed through publicly available databases. Relationships among biomedical terms constitute a major part of our biological knowledge. Acquiring such structured information from unstructured literature can be done through human annotation, but this is time- and resource-consuming. As this content continues to grow rapidly, the popularity and importance of text mining for obtaining information from unstructured text become increasingly evident. Text mining has four major components: first, relevant articles are identified through information retrieval (IR); next, important concepts and terms are flagged using entity recognition (ER); then relationships between these entities are extracted from the literature in a process called information extraction (IE); finally, text mining takes these elements and seeks to synthesize new information from the literature. Our goal is information extraction of biological relationships from unstructured literature. To do this, we use the structure of triplets, where each triplet contains two biological entities and one interaction word. The biological entities may include terms such as protein names, disease names, genes, and small molecules; interaction words describe the relationship between the biological terms. Under this framework we combine the strengths of three classifiers in an ensemble approach: Bayesian Networks, Support Vector Machines, and a mixture of logistic models defined by interaction word. The three classifiers and the ensemble approach are evaluated on three benchmark corpora and one corpus introduced in this study. The evaluation includes cross-validation and cross-corpus validation to replicate an application scenario. The three classifiers are distinct, and we find that the performance of individual classifiers varies depending on the corpus. An ensemble of classifiers therefore removes the need to choose one classifier and provides optimal performance.
 Date Issued
 2011
 Identifier
 FSU_migr_etd1314
 Format
 Thesis
 Title
 Statistical Analysis on Object Spaces with Applications.
 Creator

Yao, Kouadio David, Patrangenaru, Victor, Kercheval, Alec N., Liu, Xiuwen, Mio, Washington, Wang, Xiaoming, Florida State University, College of Arts and Sciences, Department of Mathematics
 Abstract/Description

Most of the data encountered is bounded nonlinear data. The Universe is bounded, planets are sphere-like objects, and life growing on Earth comes in various shapes and colors that can hardly be represented as points in a linear space; even if the object space they sit on is embedded in a Euclidean space, their mean vector cannot be represented as a point on that object space, except when the space is convex. To address this misgiving, since the mean vector is the minimizer of the expected square distance, following Fréchet (1948), on a compact metric space one may consider both minimizers and maximizers of the expected square distance to a given point on the object space as the mean, respectively the antimean, of a given random point. Of all distances on an object space, one considers here the chord distance associated with an embedding of the object space, since for such distances one can give a necessary and sufficient condition for the existence of a unique Fréchet mean (respectively Fréchet antimean). For such distributions these location parameters are called the extrinsic mean (respectively the extrinsic antimean), and the corresponding sample statistics are consistent estimators of their population counterparts. Moreover, one derives the limit distribution of such estimators around an extrinsic antimean located at a smooth point. Extrinsic analysis is thus a general framework that allows one to run object data analysis on nonlinear object spaces that can be embedded in a numerical space. In particular, one focuses on Veronese-Whitney (VW) means and antimeans of 3D projective shapes of configurations extracted from digital camera images. The 3D data extraction is greatly simplified by an RGB-based algorithm followed by the Faugeras-Hartley-Gupta-Chen 3D reconstruction method. In particular, one derives two-sample tests for face analysis based on projective shapes, and more generally a MANOVA-on-manifolds method to be used in 3D projective shape analysis. The manifold-based approach is also applicable to financial data analysis for exchange rates.
 Date Issued
 2016
 Identifier
 FSU_FA2016_Yao_fsu_0071E_13605
 Format
 Thesis
 Title
 Statistical Analysis of Trajectories on Riemannian Manifolds.
 Creator

Su, Jingyong, Srivastava, Anuj, Klassen, Erik, Huffer, Fred, Zhang, Jinfeng, Department of Statistics, Florida State University
 Abstract/Description

This thesis consists of two distinct topics: first, a framework for the estimation and analysis of trajectories on Riemannian manifolds; second, a framework for detecting, classifying, and estimating shapes in point-cloud data. The thesis mainly focuses on statistical analysis of trajectories that take values on nonlinear manifolds. There are many difficulties in analyzing temporal trajectories on nonlinear manifolds: the observed data are noisy and discrete, sampled at unsynchronized times, and the trajectories are observed under arbitrary temporal evolutions. We first address the problem of estimating full smooth trajectories on nonlinear manifolds using only a set of time-indexed points, for use in interpolation, smoothing, and prediction of dynamic systems. Furthermore, we study statistical analysis of trajectories that take values on nonlinear Riemannian manifolds and are observed under arbitrary temporal evolutions. Problems of analyzing such temporal trajectories, including registration, comparison, modeling, and evaluation, arise in many applications. We introduce a quantity that provides both a cost function for temporal registration and a proper distance for comparison of trajectories. This distance, in turn, is used to define statistical summaries, such as sample means and covariances, of given trajectories, and Gaussian-type models to capture their variability. Both theoretical proofs and experimental results are provided to validate this work. The problems of detecting, classifying, and estimating shapes in point-cloud data are important due to their general applicability in image analysis, computer vision, and graphics. They are challenging because the data are typically noisy, cluttered, and unordered. We study these problems using a fully statistical model in which the data are modeled by a Poisson process on the object's boundary (curves or surfaces), corrupted by additive noise and a clutter process. Using likelihood functions dictated by the model, we develop a generalized likelihood ratio test for detecting a shape in a point cloud. Additionally, we develop a procedure for estimating the most likely shapes in observed point clouds under given shape hypotheses. We demonstrate this framework with examples of 2D and 3D shape detection and estimation in both real and simulated data, and with an application to shape retrieval from a 3D shape database.
 Date Issued
 2013
 Identifier
 FSU_migr_etd7619
 Format
 Thesis
 Title
 Spatiotemporal Bayesian Hierarchical Models, with Application to Birth Outcomes.
 Creator

Norton, Jonathan D. (Jonathan David), Niu, Xufeng, Eberstein, Isaac, Huffer, Fred, McGee, Daniel, Department of Statistics, Florida State University
 Abstract/Description

A class of hierarchical Bayesian models is introduced for adverse birth outcomes, such as preterm birth, which are assumed to follow a conditional binomial distribution. The log-odds of an adverse outcome in a particular county, logit(p_i), follows a linear model which includes observed covariates and normally distributed random effects. Spatial dependence between neighboring regions is allowed for by including an intrinsic autoregressive (IAR) prior or an IAR convolution prior in the linear predictor. Temporal dependence is incorporated by also including a temporal IAR term. It is shown that the variance parameters underlying these random effects (IAR, convolution, convolution plus temporal IAR) are identifiable. The same results are shown to hold when the IAR is replaced by a conditional autoregressive (CAR) model. Furthermore, properties of the CAR parameter ρ are explored. The Deviance Information Criterion (DIC) is considered as a way to compare spatial hierarchical models. Simulations are performed to test whether the DIC can identify whether binomial outcomes come from an IAR, an IAR convolution, or independent normal deviates. Having established the theoretical foundations of the class of models and validated the DIC as a means of comparing models, we examine preterm birth and low birth weight counts in the state of Arkansas from 1994 to 2005. We find that preterm birth and low birth weight have different spatial patterns of risk, and that rates of low birth weight can be fit with a strikingly simple model that includes a constant spatial effect for all periods, a linear trend, and three covariates. It is also found that the risks of both outcomes are increasing over time, even after adjustment for covariates.
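For concreteness, a model of the kind described can be sketched as follows. The notation (m_i for the number of neighbors of county i, j ~ i for adjacency, and the variance symbols) is assumed here for illustration and is not taken from the dissertation:

    % Hierarchical binomial model with spatial IAR convolution and temporal IAR terms (a sketch)
    y_{it} \mid p_{it} \sim \mathrm{Binomial}(n_{it},\, p_{it})
    \mathrm{logit}(p_{it}) = x_{it}^{\top}\beta + u_i + v_i + \phi_t
    u_i \mid u_{-i} \sim N\!\Big(\tfrac{1}{m_i}\textstyle\sum_{j \sim i} u_j,\; \tau_u^2/m_i\Big)   % spatial IAR component
    v_i \sim N(0,\, \tau_v^2)                                                                       % unstructured component of the convolution prior
    \phi_t : \text{temporal IAR term over periods } t

The identifiability results referred to above concern variance parameters such as τ_u² and τ_v² in displays of this form.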
 Date Issued
 2008
 Identifier
 FSU_migr_etd2523
 Format
 Thesis
 Title
 Spatial Statistics and Its Applications in Biostatistics and Environmental Statistics.
 Creator

Hu, Guanyu, Huffer, Fred W. (Fred William), Paek, Insu, Sinha, Debajyoti, Slate, Elizabeth H., Bradley, Jonathan R., Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

This dissertation presents some topics in spatial statistics and their applications in biostatistics and environmental statistics. The field of spatial statistics is an active area of statistics. In Chapter 2 and Chapter 3, the goal is to build subregion models under the assumption that the responses or the parameters are spatially correlated. For regression models, considering spatially varying coefficients is a reasonable way to build subregion models. There are two different techniques for exploring spatially varying coefficients. One is geographically weighted regression (Brunsdon et al. 1998). The other is a spatially varying coefficients model, which assumes a stationary Gaussian process for the regression coefficients (Gelfand et al. 2003). Based on the ideas of these two techniques, we introduce techniques for exploring subregion models in survival analysis, an important area of biostatistics. In Chapter 2, we introduce modified versions of the Kaplan-Meier and Nelson-Aalen estimators which incorporate geographical weighting. We use ideas from counting-process theory to obtain these modified estimators, to derive variance estimates, and to develop associated hypothesis tests. In Chapter 3, we introduce a Bayesian parametric accelerated failure time model with spatially varying coefficients. These two techniques can explore subregion models in survival analysis using both nonparametric and parametric approaches. In Chapter 4, we introduce Bayesian parametric covariance regression analysis for a response vector. The proposed method defines a regression model between the covariance matrix of a p-dimensional response vector and auxiliary variables. We propose a constrained Metropolis-Hastings algorithm to obtain the estimates. Simulation results are presented to show the performance of both the regression and covariance matrix estimates. Furthermore, we present a more realistic simulation experiment in which our Bayesian approach performs better than the MLE. Finally, we illustrate the usefulness of our model by applying it to the Google Flu data. In Chapter 5, we give a brief summary of future work.
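As an illustration of the geographically weighted estimators of Chapter 2, the following sketch kernel-weights each subject by the distance from its site to a target location and forms the usual product-limit estimator from the weighted risk and event counts. The function name, the Gaussian kernel, and the synthetic data are assumptions made here for illustration, not the dissertation's code:

    import numpy as np

    def gw_kaplan_meier(times, events, sites, target, bandwidth):
        """Geographically weighted Kaplan-Meier sketch (assumed form):
        subjects are kernel-weighted by the distance from their site to a
        target location, and the product-limit estimator is computed from
        the weighted counting processes."""
        # Gaussian kernel weights from each subject's site to the target location.
        d = np.linalg.norm(sites - target, axis=1)
        w = np.exp(-0.5 * (d / bandwidth) ** 2)

        # Distinct event times, in increasing order.
        t_grid = np.unique(times[events == 1])
        surv, s = [], 1.0
        for t in t_grid:
            at_risk = w[times >= t].sum()                 # weighted number at risk Y_w(t)
            died = w[(times == t) & (events == 1)].sum()  # weighted events dN_w(t)
            if at_risk > 0:
                s *= 1.0 - died / at_risk                 # product-limit update
            surv.append(s)
        return t_grid, np.array(surv)

    # Illustrative call on synthetic data (all names here are hypothetical):
    rng = np.random.default_rng(0)
    n = 200
    sites = rng.uniform(0, 1, size=(n, 2))            # spatial coordinates
    times = rng.exponential(1 + sites[:, 0], size=n)  # survival varying over space
    events = rng.integers(0, 2, size=n)               # 1 = event, 0 = censored
    t, S = gw_kaplan_meier(times, events, sites,
                           target=np.array([0.5, 0.5]), bandwidth=0.2)

A weighted Nelson-Aalen estimator follows the same pattern, accumulating died/at_risk as a cumulative hazard instead of multiplying survival factors.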
 Date Issued
 2017
 Identifier
 FSU_FALL2017_Hu_fsu_0071E_14205
 Format
 Thesis