Search results
Pages
 Title
 Weighted Adaptive Methods for Multivariate Response Models with an HIV/Neurocognitive Application.
 Creator

Geis, Jennifer Ann, She, Yiyuan, Meyer-Baese, Anke, Barbu, Adrian, Bunea, Florentina, Niu, Xufeng, Department of Statistics, Florida State University
 Abstract/Description

Multivariate response models are used increasingly in almost all fields, with the necessary employment of inferential methods such as Canonical Correlation Analysis (CCA). This requires estimating the number of uncorrelated canonical relationships between the two sets or, equivalently, determining the rank of the coefficient estimator in the multivariate response model. One way to do this is the Rank Selection Criterion (RSC) of Bunea et al., under the assumption that the error matrix has independent entries with constant variance. While this assumption is necessary for their strong theoretical results, some flexibility is required in practical application; that is, such an assumption cannot always be safely made. Developed here is theory that parallels Bunea et al.'s work with the addition of a "decorrelator" weight matrix. One choice for the weight matrix is the residual covariance, but this introduces many issues in practice. A computationally more convenient weight matrix is the sample response covariance. When such a weight matrix is chosen, CCA is directly accessible through this weighted version of RSC, giving rise to an Adaptive CCA (ACCA) with principal proofs for the large-sample setting. However, particular considerations are required for the high-dimensional setting, where similar theory does not hold. What is offered instead are extensive empirical simulations revealing that the sample response covariance still provides good rank recovery and estimation of the coefficient matrix, and hence good estimation of the number of canonical relationships and variates. It is argued precisely why other versions of the residual covariance, including a regularized version, are poor choices in the high-dimensional setting. Another approach that avoids these issues is to employ some type of variable selection methodology before applying ACCA.
Indeed, any group selection method may be applied prior to ACCA, as variable selection in the multivariate response model is equivalent to group selection in the univariate response model, which completely eliminates these high-dimensional concerns. To offer a practical application of these ideas, ACCA is applied to a "large sample" neurocognitive dataset. Then a high-dimensional dataset is generated, to which Group LASSO is applied before ACCA. This provides a unique perspective on the relationships between cognitive deficiencies in HIV-positive patients and the extensive available neuroimaging measures.
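The weighted rank-selection idea described in this abstract can be sketched numerically. The following is only an illustration of the general mechanism, not the dissertation's ACCA estimator: whiten the fitted values with the inverse square root of the sample response covariance (the computationally convenient weight matrix the abstract mentions) and count the singular values that clear a threshold; the threshold value here is an arbitrary tuning constant, whereas RSC derives a theory-driven one.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy multivariate response model Y = X B + E with a rank-2 coefficient matrix.
n, p, q, true_rank = 200, 6, 5, 2
X = rng.normal(size=(n, p))
B = rng.normal(size=(p, true_rank)) @ rng.normal(size=(true_rank, q))
Y = X @ B + 0.1 * rng.normal(size=(n, q))

def weighted_rank_selection(X, Y, threshold):
    """Illustrative weighted rank selection (not the exact RSC/ACCA estimator).

    Whitens the fitted values with the inverse square root of the sample
    response covariance and counts singular values above `threshold`.
    """
    W = np.cov(Y, rowvar=False)                        # sample response covariance
    evals, evecs = np.linalg.eigh(W)
    W_inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    fitted = X @ np.linalg.lstsq(X, Y, rcond=None)[0]  # projection of Y onto col(X)
    s = np.linalg.svd(fitted @ W_inv_sqrt, compute_uv=False)
    return int(np.sum(s > threshold))

rank = weighted_rank_selection(X, Y, threshold=8.0)
print(rank)
```

With a clear low-rank signal, the whitened signal singular values scale like the square root of the sample size while the noise singular values stay small, so a mid-range threshold recovers the rank.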
 Date Issued
 2012
 Identifier
 FSU_migr_etd4861
 Format
 Thesis
 Title
 A Weakly-Informative Group-Specific Prior Distribution for Meta-Analysis.
 Creator

Thompson, Christopher, Becker, Betsy Jane, Clark, Kathleen M., Almond, Russell G., Aloe, Ariel M., Yang, Yanyun, Florida State University, College of Education, Department of Educational Psychology and Learning Systems
 Abstract/Description

While Bayesian meta-analysis has flourished in both methodological and substantive work, group-specific Bayesian modeling remains scarce. Common practice for choosing prior distributions entails using typical noninformative priors; currently, there is a push toward more informative prior distributions. In this dissertation I propose a group-specific weakly informative prior distribution. The new prior distribution uses a frequentist estimate of between-studies heterogeneity as the noncentrality parameter in a folded noncentral t distribution. This distribution is then modeled individually for groups defined by some categorical factor. An extensive simulation study was performed to compare the performance of the new group-specific prior distribution with that of several noninformative prior distributions across a variety of meta-analytic scenarios. An application using data from a previously published meta-analysis on dynamic geometry software is also provided.
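The folded noncentral t prior described above can be sketched by direct simulation. This is a minimal illustration, not the dissertation's model: a noncentral t variate is (Z + nc) / sqrt(V / df) with Z standard normal and V chi-squared on df degrees of freedom, and folding takes the absolute value so the prior lives on the nonnegative heterogeneity scale. The degrees of freedom and the noncentrality value 0.3 below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)

def folded_nct_sample(df, nc, size, rng):
    """Draw from a folded noncentral t distribution.

    `nc` plays the role of a frequentist estimate of between-studies
    heterogeneity, as in the abstract; folding (absolute value) makes the
    draws valid values for a nonnegative heterogeneity parameter.
    """
    z = rng.standard_normal(size)
    v = rng.chisquare(df, size)
    return np.abs((z + nc) / np.sqrt(v / df))

# Hypothetical example: centre the prior near an estimated heterogeneity of 0.3.
draws = folded_nct_sample(df=4, nc=0.3, size=100_000, rng=rng)
print(round(float(np.median(draws)), 2))
```

In a full Bayesian meta-analysis these draws would stand in for the prior on the between-studies standard deviation, one such prior per group.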
 Date Issued
 2016
 Identifier
 FSU_2016SP_Thompson_fsu_0071E_13051
 Format
 Thesis
 Title
 Variable Selection of Correlated Predictors in Logistic Regression: Investigating the Diet-Heart Hypothesis.
 Creator

Thompson, Warren R. (Warren Robert), McGee, Daniel, Eberstein, Isaac, Huffer, Fred, Sinha, Debajyoti, She, Yiyuan, Department of Statistics, Florida State University
 Abstract/Description

Variable selection is an important aspect of modeling. Its aim is to distinguish between the authentic variables, which are important in predicting the outcome, and the noise variables, which possess little to no predictive value. In other words, the goal is to find the variables that collectively best explain and predict changes in the outcome variable. The variable selection problem is exacerbated when correlated variables are included in the covariate set. This dissertation examines the variable selection problem in the context of logistic regression. Specifically, we investigated the merits of the bootstrap, ridge regression, the lasso, and Bayesian model averaging (BMA) as variable selection techniques when highly correlated predictors and a dichotomous outcome are considered. This dissertation also contributes to the literature on the diet-heart hypothesis. The diet-heart hypothesis has been around since the early twentieth century. Since then, researchers have attempted to isolate the nutrients in diet that promote coronary heart disease (CHD). After a century of research, there is still no consensus. In our current research, we used some of the more recent statistical methodologies (mentioned above) to investigate the effect of twenty dietary variables on the incidence of coronary heart disease. Logistic regression models were generated for data from the Honolulu Heart Program, a study of CHD incidence in men of Japanese descent. Our results were largely method-specific. However, regardless of the method considered, there was strong evidence that alcohol consumption has a strong protective effect on the risk of coronary heart disease. Of the variables considered, dietary cholesterol and caffeine were the only variables that, at best, exhibited a moderately strong harmful association with CHD incidence. Further investigation that includes a broader array of food groups is recommended.
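The lasso-for-logistic-regression setting this abstract investigates can be sketched with a toy optimiser. This is a minimal, assumed implementation (proximal gradient descent with soft-thresholding), not the software the study used; the synthetic data, penalty value, and step size are all illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data: 10 mutually correlated predictors, only the first two drive
# the dichotomous outcome.
n, p = 500, 10
base = rng.normal(size=(n, 1))                 # shared component -> correlation
X = 0.5 * base + rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:2] = [1.5, 1.5]
prob = 1.0 / (1.0 + np.exp(-(X @ beta_true)))
y = rng.binomial(1, prob)

def lasso_logistic(X, y, lam, lr=0.1, iters=2000):
    """L1-penalised logistic regression by proximal gradient descent.

    A sketch of lasso-style variable selection for a binary outcome; each
    step takes a gradient step on the logistic loss, then soft-thresholds.
    """
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(iters):
        grad = X.T @ (1.0 / (1.0 + np.exp(-X @ beta)) - y) / n
        beta = beta - lr * grad
        beta = np.sign(beta) * np.maximum(np.abs(beta) - lr * lam, 0.0)
    return beta

beta_hat = lasso_logistic(X, y, lam=0.05)
selected = np.flatnonzero(np.abs(beta_hat) > 1e-8)
print(selected)
```

With correlated predictors the penalty still shrinks the conditionally irrelevant coefficients toward zero while the truly predictive ones survive, which is the behaviour the dissertation compares across methods.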
 Date Issued
 2009
 Identifier
 FSU_migr_etd1360
 Format
 Thesis
 Title
 The Use of a Meta-Analysis Technique in Equating and Its Comparison with Several Small Sample Equating Methods.
 Creator

Caglak, Serdar, Paek, Insu, Patrangenaru, Victor, Almond, Russell G., Roehrig, Alysia D., Florida State University, College of Education, Department of Educational Psychology and Learning Systems
 Abstract/Description

The main objective of this study was to investigate improving the accuracy of small-sample equating, which typically occurs in teacher certification/licensure examinations due to a low volume of test takers per test administration, under the Non-Equivalent Groups with Anchor Test (NEAT) design by combining previous and current equating outcomes using a meta-analysis technique. The proposed meta-analytic score transformation procedure is called "meta-equating" throughout this study. To conduct meta-equating, the previous and current equating outcomes obtained from the chosen equating methods (Identity (ID) equating, Circle-Arc (CA), and Nominal Weights Mean (NW)) and the synthetic functions (SFs) of these methods (CAS and NWS) were used; then, empirical Bayesian (EB) and meta-equating (META) procedures were implemented to estimate the equating relationship between test forms at the population level. The SFs were created by giving equal weight to each of the chosen equating methods and identity (ID) equating. Finally, the chosen equating methods, the SFs of each method (e.g., CAS, NWS), and their META and EB versions (e.g., NW-EB, CA-META, NWS-META) were investigated and compared under varying testing conditions. These steps involved manipulating some of the factors that influence the accuracy of test score equating. In particular, the effects of test form difficulty levels, group-mean ability differences, the number of previous equatings, and sample size on the accuracy of the equating outcomes were investigated. Chained Equipercentile (CE) equating with 6-univariate- and 2-bivariate-moment log-linear presmoothing was used as the criterion equating function to establish the equating relationship between the new form and the base (reference) form with 50,000 examinees per test form.
To compare the performance of the equating methods, small samples of examinees were randomly drawn from examinee populations with different ability levels in each simulation replication. Each pair of new and base test forms was randomly and independently selected from all available condition-specific test form pairs. Those test forms were then used to obtain previous equating outcomes. However, purposeful selections of the examinee ability and test form difficulty distributions were made to obtain the current equating outcomes in each simulation replication. The previous equating outcomes were later used for the implementation of both the META and EB score transformation procedures. The effects of the study factors and their possible interactions on each of the accuracy measures were investigated along the entire score range and the cut (reduced) score range using a series of mixed-factorial ANOVA (MFA) procedures. The performances of the equating methods were also compared based on post-hoc tests. Results show that the behaviors of the equating methods vary based on each level of group ability difference, test form difficulty difference, and new-group examinee sample size. Also, the use of both the META and EB procedures improved the accuracy of equating results on average. The META and EB versions of the chosen equating methods therefore might be a solution for equating test forms that are similar in their psychometric characteristics and taken by new-form examinee samples of fewer than 50. However, since many factors affect equating results in practice, one should always expect that equating methods and score transformation procedures, or, in more general terms, estimation procedures, may function differently, to some degree, depending on the conditions in which they are implemented.
Therefore, the recommendations for the use of the proposed equating methods in this study should be taken as a piece of information rather than an absolute guideline or rule of thumb for practicing small-sample test equating in teacher certification/licensure examinations.
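The synthetic functions (SFs) described above admit a very small sketch: an SF blends a chosen equating function with identity equating, and the abstract states that equal weights were used. The linear "shift by 2 points" equating below is a hypothetical stand-in for a real equating method, not one from the study.

```python
def synthetic_function(equate, w):
    """Synthetic equating function: a weighted blend of a chosen equating
    method with identity equating (SF(x) = w*equate(x) + (1-w)*x).

    The abstract's SFs give equal weight to the method and to identity
    equating, i.e. w = 0.5.
    """
    return lambda x: w * equate(x) + (1.0 - w) * x

# Hypothetical mean-style equating: shift new-form scores up by 2 points.
mean_equate = lambda x: x + 2.0
sf = synthetic_function(mean_equate, w=0.5)
print(sf(30.0))  # 0.5*32.0 + 0.5*30.0 = 31.0
```

Pulling the blended function halfway toward identity is what makes SFs conservative for very small samples, which is why they appear alongside the META and EB procedures in the comparisons.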
 Date Issued
 2015
 Identifier
 FSU_2015fall_Caglak_fsu_0071E_12863
 Format
 Thesis
 Title
 Ultrafast Lattice Dynamics in Metal Thin Films and Nano-Particles.
 Creator

Wang, Xuan, Cao, Jim, Yang, Wei, Bonesteel, Nicholas, Riley, Mark, Xiong, Peng, Department of Physics, Florida State University
 Abstract/Description

This thesis presents the development of the 3rd-generation femtosecond diffractometer (FED) in Professor Jim Cao's group and its application to the study of ultrafast structural dynamics in solid-state materials. The 3rd-generation FED surpasses its predecessor and other similar FED instruments through a DC electron gun that can generate much higher-energy electron pulses and a more efficient imaging system. This combination, together with miscellaneous improvements, significantly boosts the signal-to-noise ratio and thus enables us to study more complex solid-state materials. Two main thrusts are discussed in detail in this thesis. The first is the dynamics of coherent phonon generation by ultrafast heating in gold thin films and nanoparticles, which emphasizes the electronic thermal stress. The other is the ultrafast dynamics in nickel, which shows that the mutual interactions among the lattice, spin, and electron subsystems can significantly alter the ultrafast lattice dynamics. In these studies, we exploit the advantage of the FED instrument as an ideal tool that can directly and simultaneously monitor the coherent and random motion of the lattice.
 Date Issued
 2010
 Identifier
 FSU_migr_etd1247
 Format
 Thesis
 Title
 TWO-WAY CLUSTER ANALYSIS WITH NOMINAL DATA.
 Creator

COOPER, PAUL GAYLORD., Florida State University
 Abstract/Description

Consider an M by N data matrix X whose elements may assume values 0, 1, 2, ..., H. Denote the rows of X by α_1, α_2, ..., α_M. A tree on the rows of X is a sequence of distinct partitions P_1, ..., P_k such that: (a) P_1 = {(α_1), ..., (α_M)}, (b) P_i is a refinement of P_{i+1} for i = 1, ..., k-1, and (c) P_k = {(α_1, ..., α_M)}. The two-way clustering problem consists of simultaneously constructing trees on the rows, columns, and elements of X. A generalization of a two-way joining algorithm (TWJA) introduced by J. A. Hartigan (1975) is used to construct the three trees. The TWJA requires the definition of measures of dissimilarity between row clusters and between column clusters, respectively. Two approaches are used in the construction of these dissimilarity coefficients: one based on intuition and one based on a formal prediction model. For matrices with binary elements (0 or 1), measures of dissimilarity between row or column clusters are based on the number of mismatching pairs. Consider two distinct row clusters R_p and R_q containing m_p and m_q rows, respectively. One measure of dissimilarity between R_p and R_q, d_0(R_p, R_q), is (DIAGRAM, TABLE OR GRAPHIC OMITTED... PLEASE SEE DAI), where b_pβ and b_qβ are the numbers of ones in column β of clusters R_p and R_q, respectively. Two additional intuitive dissimilarity coefficients are also defined and studied. For matrices containing nominal-level data, dissimilarity coefficients are based on a formal prediction model. Analogous to the procedure of Cleveland and Relles (1974), for a given data matrix the model consists of a scheme for random selection of two rows (or columns) from the matrix and an identification rule for distinguishing between the two rows (or columns).
A loss structure is defined for both rows and columns, and the expected loss due to incorrect row or column identification is computed. The dissimilarity between two (say) row clusters is then defined to be the increase in expected loss due to joining those two row clusters into a single cluster. Stopping criteria are suggested for both the intuitive and prediction-model approaches. For the intuitive approach, it is suggested that joining be stopped when the dissimilarity between the (say) row clusters to be joined next exceeds that expected by chance under the assumption that the (say) column totals of the matrix are fixed. For the prediction-model approach, the stopping criterion is based on a cluster prediction model in which the objective is to distinguish between row or column clusters. A cluster identification rule is defined based on the information in the partitioned data matrix, and the expected loss due to incorrect cluster identification is computed. The expected cluster loss is also computed when cluster identification is based on strict randomization. The relative decrease in expected cluster loss due to identification based on the partitioned matrix versus that based on randomization is suggested as a stopping criterion. Both contrived and real data examples are used to illustrate and compare the two clustering procedures. Computational aspects of the procedure are discussed, and it is concluded that the intuitive approach is less costly in terms of computation time. Further, five admissibility properties are defined and, for certain intuitive dissimilarity coefficients, the trees produced by the TWJA are shown to possess three of the five properties.
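The record omits the exact formula for d_0 (the "DIAGRAM ... OMITTED" placeholder), so the following is only an assumed, illustrative mismatch-count dissimilarity in the spirit the abstract describes: for each column, count the pairs of rows, one from each cluster, that disagree, using the column one-counts b_p and b_q. It should not be read as the dissertation's exact definition.

```python
import numpy as np

def mismatch_dissimilarity(R_p, R_q):
    """Illustrative mismatch-count dissimilarity between two binary row
    clusters (an assumed stand-in for the omitted d_0 formula).

    For column β, a pair (row from R_p, row from R_q) disagrees when one row
    has a 1 and the other a 0, so the per-column count is
    b_pβ * (m_q - b_qβ) + (m_p - b_pβ) * b_qβ; sum over columns.
    """
    R_p = np.asarray(R_p)
    R_q = np.asarray(R_q)
    m_p, m_q = R_p.shape[0], R_q.shape[0]
    b_p = R_p.sum(axis=0)   # ones per column in cluster R_p
    b_q = R_q.sum(axis=0)   # ones per column in cluster R_q
    mismatches = b_p * (m_q - b_q) + (m_p - b_p) * b_q
    return int(mismatches.sum())

R_p = [[1, 0, 1],
       [1, 1, 1]]
R_q = [[0, 0, 1]]
print(mismatch_dissimilarity(R_p, R_q))  # 3 disagreeing (row, row) pairs
```

A joining algorithm of the TWJA kind would repeatedly merge the pair of clusters with the smallest such dissimilarity until a stopping criterion fires.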
 Date Issued
 1980
 Identifier
 AAI8026123, 3084693, FSDT3084693, fsu:74194
 Format
 Document (PDF)
 Title
 Transformations of certain Gaussian random fields, with applications in survival analysis.
 Creator

Sun, Yanqing., Florida State University
 Abstract/Description

It has been almost sixty years since Kolmogorov introduced a distribution-free test for the simple null hypothesis that a distribution function coincides with a given distribution function. In 1949 Doob observed that Kolmogorov's approach could be simplified by transforming the empirical process to an empirical process based on uniform random variables. In recent years this approach has led to the construction of distribution-free tests when unknown parameters are present. The purpose of this dissertation is to apply the transformation approach in the setting of survival analysis, where censoring and covariate information further complicate the problem. Asymptotically distribution-free tests are developed for testing independence of a survival time from a covariate and for checking the adequacy of Cox's proportional hazards model. The test statistics are obtained from certain test statistic processes (indexed by time and covariate) which converge in distribution to Brownian sheets. A simulation study is carried out to investigate the finite-sample properties of the proposed tests, and they are applied to data from the British Medical Research Council's (1984) 4th myelomatosis trial.
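Doob's observation mentioned above can be sketched in a few lines: if T has continuous distribution function F, then U = F(T) is Uniform(0,1), so a test of "T ~ F" reduces to a distribution-free test of uniformity. The exponential null and sample size below are illustrative choices, not from the dissertation.

```python
import numpy as np

rng = np.random.default_rng(7)

def ks_uniform(u):
    """One-sample Kolmogorov statistic against the Uniform(0,1) CDF."""
    u = np.sort(u)
    n = u.size
    i = np.arange(1, n + 1)
    return float(max(np.max(i / n - u), np.max(u - (i - 1) / n)))

# Illustrative null: unit-rate exponential survival times, F(t) = 1 - exp(-t).
t = rng.exponential(scale=1.0, size=2000)  # data drawn from the null
u = 1.0 - np.exp(-t)                       # probability integral transform
print(round(ks_uniform(u), 3))
```

Because the transformed sample is uniform under the null regardless of F, the same critical values apply to every continuous null distribution, which is what "distribution-free" buys.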
 Date Issued
 1992
 Identifier
 AAI9309739, 3088014, FSDT3088014, fsu:76821
 Format
 Document (PDF)
 Title
 Transformation Models for Survival Data Analysis and Applications.
 Creator

Liu, Yang, Niu, Xufeng, Lloyd, Donald, McGee, Dan, Sinha, Debajyoti, Department of Statistics, Florida State University
 Abstract/Description

It is often assumed in standard survival models that all uncensored subjects will eventually experience the event of interest. However, in some situations, when the event considered is not death, it will never occur for a proportion of subjects. Survival models with a cure fraction are becoming popular for analyzing this type of study. We propose a generalized transformation model motivated by Zeng et al.'s (2006) transformed proportional time cure model. In our proposed model, fractional polynomials are used instead of a simple linear combination of the covariates. The proposed models give us more flexibility without losing any of the good properties of the original model, such as asymptotic consistency and asymptotic normality of the regression coefficients. The proposed model will better fit data where the relationship between the response variable and covariates is nonlinear. We also provide a power selection procedure based on the likelihood function. A simulation study is carried out to show the accuracy of the proposed power selection procedure. The proposed models are applied to coronary heart disease and cancer-related medical data from both observational cohort studies and clinical trials.
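The fractional-polynomial ingredient mentioned above can be sketched as a basis construction. This is a minimal illustration assuming the conventional fractional-polynomial rule that a power of 0 denotes log(x) (the usual candidate powers being {-2, -1, -0.5, 0, 0.5, 1, 2, 3}); the dissertation's power selection procedure then picks the powers by likelihood, which is not reproduced here.

```python
import numpy as np

def fractional_polynomial_basis(x, powers):
    """Fractional polynomial terms for a positive covariate x.

    Follows the common convention that power 0 means log(x); repeated powers
    (which add log(x) multipliers) are omitted in this minimal sketch.
    """
    x = np.asarray(x, dtype=float)
    return np.column_stack([np.log(x) if p == 0 else x ** p for p in powers])

# Hypothetical example: powers (-0.5, 2) for covariate values 1 and 4.
basis = fractional_polynomial_basis([1.0, 4.0], powers=(-0.5, 2))
print(basis)
```

In the proposed model these columns replace the raw covariate in the linear predictor, giving the nonlinear covariate effects the abstract refers to.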
 Date Issued
 2009
 Identifier
 FSU_migr_etd1155
 Format
 Thesis
 Title
 Traits, Species, and Communities: Integrative Bayesian Approaches to Ecological Biogeography across Geographic, Environmental, Phylogenetic, and Morphological Space.
 Creator

Humphreys, John M., Elsner, James B., Steppan, Scott J., Mesev, Victor, Pau, Stephanie, Florida State University, College of Social Sciences and Public Policy, Department of Geography
 Abstract/Description

Assuming a methodological perspective, this dissertation proceeds through a series of studies that cover levels of biological organization ranging from the morphological traits of individual specimens to community assemblages. The presented research explores geographic extents ranging from local to global scales, examines both plants and animals, and explores relationships among species with common ancestry. The research appraises and then proposes solutions to a variety of as-yet unresolved issues in species distribution modeling, including preferential sampling, spatial dependency, multiscaled spatial processes, niche equilibrium assumptions, data structure arising from shared evolutionary history, and correlations between predictor variables. Approaching the geographic distribution of wetlands as an applied concern, the study presented in Chapter 2 emphasizes that the identification and inventory of wetlands are essential components of water resource management. To be effective in these endeavors, it is critical that the process used to detect and document wetlands be time-efficient, accurate, and repeatable as new environmental information becomes available. Approaches dependent on aerial photographic interpretation of land cover by individual human analysts necessitate hours of assessment, introduce human error, and fail to include the best available soils and hydrologic data. The goal of Chapter 2 is to apply hierarchical modeling and Bayesian inference to predict the probability of wetland presence as a continuous gradient with explicit consideration of spatial structure. The presented spatial statistical model can evaluate 100 km2 at a 50 x 50 meter resolution in approximately 50 minutes while simultaneously incorporating ancillary data and accounting for latent spatial processes.
Model results demonstrate an ability to consistently capture wetlands identified through aerial interpretation with greater than 90% accuracy (scaled Brier score) and to identify wetland extents, ecotones, and hydrologic connections not identified through other modeling and mapping techniques. The provided model is reasonably robust to changes in resolution, areal extents between 100 km2 and 300 km2, and region-specific physical conditions. As with modeling wetland occurrence, species distribution modeling aimed at forecasting the spread of invasive species under projected global warming also offers land managers an important tool for assessing future ecological risk and prioritizing management actions. Chapter 3 applies Bayesian inference and newly available geostatistical tools to forecast global range expansion for the ecosystem-altering invasive climbing fern Lygodium microphyllum. The presented modeling framework emphasizes the need to account for spatial processes at both the individual and aggregate levels, the necessity of modeling nonlinear responses to environmental gradients, and the explanatory power of biotic covariates. Results indicate that Lygodium microphyllum will undergo global range expansion in concert with anthropogenic global warming and that the species is likely temperature- and dispersal-limited. Predictions are presented for current and future climate conditions under both limited and unlimited dispersal scenarios. Finally, Chapter 4 provides a novel framework combining multispecies joint modeling techniques with spatially explicit phylogenetic regression to simultaneously predict the probability of species occurrence and the geographic distribution of interspecific continuous morphological traits. Choosing the South American leaf-eared mice (genus: Phyllotis) as an empirical example, a three-tiered phylogenetic coregionalization trait biogeography model (PhyCoRTBio) is constructed.
The conditionally dependent structure of the PhyCoRTBio model enables information from multiple species and from multiple specimen-specific trait metrics to be leveraged toward estimation of a focal species' distribution. I hypothesize that, relative to other commonly used species distribution modeling methods, the PhyCoRTBio approach will exhibit improved performance in predicting occurrence for species within the genus Phyllotis. After describing its statistical implementation, this hypothesis is assessed by constructing PhyCoRTBio models for six different Phyllotis species and then comparing results to those derived using maximum entropy methods, random forest clustering, Gaussian random field species distribution models, and hierarchical Bayesian species distribution models. To judge the relative performance of each modeling approach, model sensitivity (proportion of correctly predicted presences), specificity (proportion of correctly predicted absences), the area under the receiver operating characteristic curve (AUC), and the True Skill Statistic (TSS) are calculated. Findings indicate that trait-based covariates improve model performance and highlight the need to consider spatial processes and phylogenetic information during multispecies distribution modeling.
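The performance measures named above follow directly from the definitions in the abstract; the True Skill Statistic is sensitivity + specificity - 1. The toy presence/absence vectors below are illustrative, not data from the dissertation.

```python
import numpy as np

def tss(y_true, y_pred):
    """True Skill Statistic for binary presence/absence predictions:
    sensitivity + specificity - 1."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    sensitivity = tp / (tp + fn)   # proportion of presences predicted present
    specificity = tn / (tn + fp)   # proportion of absences predicted absent
    return sensitivity + specificity - 1.0

# Hypothetical model output: 3 of 4 presences and 3 of 4 absences correct.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0]
print(tss(y_true, y_pred))  # 0.75 + 0.75 - 1 = 0.5
```

TSS ranges from -1 to 1, with 0 meaning no better than random, which is why it is used alongside AUC to rank the competing distribution models.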
 Date Issued
 2018
 Identifier
 2018_Sp_Humphreys_fsu_0071E_14298
 Format
 Thesis
 Title
 Time-Varying Coefficient Models with ARMA-GARCH Structures for Longitudinal Data Analysis.
 Creator

Zhao, Haiyan, Niu, Xufeng, Huffer, Fred, Nolder, Craig, McGee, Dan, Department of Statistics, Florida State University
 Abstract/Description

The motivation for my research comes from the analysis of the Framingham Heart Study (FHS) data. The FHS is a long-term prospective study of cardiovascular disease in the community of Framingham, Massachusetts. The study began in 1948, and 5,209 subjects were initially enrolled. Examinations were given biennially to the study participants, and their status associated with the occurrence of disease was recorded. In this dissertation, the event of interest is the incidence of coronary heart disease (CHD). Covariates considered include sex, age, cigarettes per day (CSM), serum cholesterol (SCL), systolic blood pressure (SBP), and body mass index (BMI, weight in kilograms divided by height in meters squared). A review of the statistical literature indicates that the effects of the covariates on cardiovascular disease, or on death from all causes, in the Framingham study change over time. For example, the effect of SCL on cardiovascular disease decreases linearly over time. In this study, I examine the time-varying effects of the risk factors on CHD incidence. Time-varying coefficient models with ARMA-GARCH structure are developed in this research. The maximum likelihood and marginal likelihood methods are used to estimate the parameters in the proposed models. Since high-dimensional integrals are involved in the calculation of the marginal likelihood, the Laplace approximation is employed. Simulation studies are conducted to evaluate the performance of these two estimation methods for our proposed models. The Kullback-Leibler (KL) divergence and the root mean square error are employed in the simulation studies to compare the results obtained from the different methods. Simulation results show that the marginal likelihood approach gives more accurate parameter estimates but is more computationally intensive.
Following the simulation study, our proposed models are applied to the Framingham Heart Study to investigate the time-varying effects of covariates on CHD incidence. To specify the time-series structures of the risk-factor effects, the Bayesian Information Criterion (BIC) is used for model selection. Our study shows that the relationship between CHD and the risk factors changes over time. For males, there is a clear decreasing linear trend in the age effect, which implies that the age effect on CHD is weaker for older patients than for younger ones. The effect of CSM stays almost constant over the first 30 years and decreases thereafter. There are slightly decreasing linear trends in the effects of both SBP and BMI. Furthermore, the coefficients of SBP are mostly positive over time; that is, patients with higher SBP are, as expected, more likely to develop CHD. For females, there is also a clear decreasing linear trend in the age effect, while the effects of SBP and BMI on CHD are mostly positive and change little over time.
 Date Issued
 2010
 Identifier
 FSU_migr_etd0527
 Format
 Thesis
 Title
 Time Scales in Epidemiological Analysis.
 Creator

Chalise, Prabhakar, McGee, Daniel L., Chicken, Eric, Carlson, Elwood, Sinha, Debajyoti, Department of Statistics, Florida State University
 Abstract/Description

The Cox proportional hazards model is routinely used to model the time until an event of interest. Two time scales are used in practice: follow-up time and chronological age. The former is the more frequently used time scale in both clinical studies and longitudinal observational studies. However, there is no general consensus about which time scale is best. In recent years, papers have appeared arguing for chronological age as the time scale, either with or without adjustment for entry age. It has also been asserted that if the cumulative baseline hazard is exponential, or if age at entry is independent of the covariates, the two models are equivalent. Our studies do not satisfy these two conditions in general. We found that the factor that truly makes the models perform differently is the variability in age at entry. If there is no variability in entry age, the time scale does not matter and both models estimate exactly the same coefficients; as the variability increases, the models increasingly disagree. We also computed the optimal time scale proposed by Oakes and used it in the Cox model. Both our empirical and simulation studies show that the follow-up-time model with age at entry as a covariate outperforms the chronological-age and Oakes time-scale models. This finding is illustrated with two examples using data from the Diverse Population Collaboration. Based on our findings, we recommend follow-up time as the time scale for epidemiological analysis.
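The two time scales compared in this abstract differ only in how each subject's risk interval is encoded. A minimal sketch of the two encodings (field names are illustrative, not from the dissertation): under follow-up time, the clock starts at 0 at study entry and entry age becomes a covariate; under chronological age, subjects enter the risk set late (left truncation) at their entry age.

```python
def follow_up_encoding(entry_age, exit_age, event):
    """Follow-up time scale: time starts at 0 at study entry;
    entry age enters the model as a covariate."""
    return {"start": 0.0, "stop": exit_age - entry_age,
            "event": event, "covariate_entry_age": entry_age}

def age_encoding(entry_age, exit_age, event):
    """Chronological-age time scale: the subject is left-truncated,
    joining the risk set only at their entry age."""
    return {"start": entry_age, "stop": exit_age, "event": event}

# One subject observed from age 50 to age 62.5, with an event:
rec_fu = follow_up_encoding(50.0, 62.5, True)
rec_age = age_encoding(50.0, 62.5, True)
```

When every subject shares the same entry age, the two encodings induce identical risk sets, which is consistent with the abstract's finding that the models agree exactly when entry-age variability is zero.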
 Date Issued
 2009
 Identifier
 FSU_migr_etd3933
 Format
 Thesis
 Title
 Theories on Group Variable Selection in Multivariate Regression Models.
 Creator

Ha, SeungYeon, She, Yiyuan, Okten, Giray, Huffer, Fred, Sinha, Debajyoti, Department of Statistics, Florida State University
 Abstract/Description

We study group variable selection in multivariate regression models. Group variable selection amounts to selecting the nonzero rows of the coefficient matrix: with multiple response variables, a predictor that is irrelevant to estimation corresponds to a row that is entirely zero. In the high-dimensional setting, shrinkage estimation methods are applicable and guarantee smaller MSE than OLS, by the James-Stein phenomenon (1961). As one such shrinkage approach, we study penalized least squares estimation for group variable selection. Specifically, we study L0 regularization and L0 + L2 regularization, with the goal of accurate prediction and consistent feature selection, and use the corresponding computational procedures Hard-TISP and Hard-Ridge TISP (She, 2009) to overcome the numerical difficulties. These regularization methods outperform the Lasso (L1 regularization), one of the most popular penalized least squares methods, in both prediction and selection. L0 achieves the same optimal rates of prediction loss and estimation loss as the Lasso, but it requires no restriction on the design matrix or on sparsity for controlling the prediction error, and a more relaxed condition than the Lasso for controlling the estimation error. For selection consistency, it requires a much weaker incoherence condition, which bounds the correlation between the relevant and irrelevant subsets of predictors. Therefore L0 can outperform the Lasso in both prediction and sparsity recovery in practical cases where correlation is high or sparsity is not low. We also study L0 + L2 regularization, which uses the combined penalty of L0 and L2. In the corresponding procedure, Hard-Ridge TISP, the two parameters work independently, one for selection and one for shrinkage (to enhance prediction), so it performs better in some situations (such as low signal strength) than L0 regularization.
For L0 regularization, λ governs selection but is tuned for prediction accuracy. L0 + L2 regularization gives the optimal rates of prediction and estimation error without any restriction, when the coefficient of the L2 penalty is appropriately chosen. Furthermore, it can achieve a better rate of estimation error with an ideal choice of blockwise weights for the L2 penalty.
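The TISP procedures named above iterate simple componentwise thresholding rules. A hedged sketch of the two rules in their standard textbook forms (scaling conventions may differ from She, 2009): hard thresholding implements the L0 penalty, and hard-ridge thresholding implements the combined L0 + L2 penalty, where λ controls selection and η controls shrinkage separately.

```python
def hard_threshold(t, lam):
    # L0-type rule: keep the coefficient untouched if it clears
    # the selection threshold lam, otherwise zero it out.
    return t if abs(t) > lam else 0.0

def hard_ridge_threshold(t, lam, eta):
    # L0 + L2 rule: lam selects, eta shrinks the survivors; the two
    # parameters act independently, as the abstract emphasizes.
    return t / (1.0 + eta) if abs(t) > lam else 0.0
```

Note that hard thresholding leaves selected coefficients unshrunk, whereas the hard-ridge rule shrinks them by 1/(1+η), which is what helps in the low-signal-strength regime.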
 Date Issued
 2013
 Identifier
 FSU_migr_etd7404
 Format
 Thesis
 Title
 TESTS OF DISPLACEMENT AND ORDERED MEAN HYPOTHESES.
 Creator

SINCLAIR, DENNIS FRANKLIN., Florida State University
 Abstract/Description

Character displacement is an ecological process by which, theoretically, coexisting species diverge in size to reduce competition. A closely allied concept is deletion, in which species are excluded from a habitat because they do not differ sufficiently from the other species living there. Character displacement has been a controversial topic in recent years, largely due to a lack of statistical procedures for testing its existence. We propose herein a variety of approaches for testing displacement and deletion hypotheses. The applicability of the methods extends beyond the motivating ecological problem to other fields. Consider the model X_ij = μ_i + ε_ij, i = 1, ..., k; j = 1, ..., n_i, where X_ij is the jth observation on species i with population mean μ_i, and the ε_ij are independent normally distributed error terms with mean zero and common variance. Traditionally, ecologists have regarded species sizes as randomly distributed. We develop tests for displacement and deletion by considering uniform, lognormal, and loguniform distributions for species sizes. (A random variable Y has a loguniform distribution if log Y has a uniform distribution.) Most claimed manifestations of character displacement concern the ratio of each species' size to the next smallest one (contiguous ratios). All but one of the test statistics are functions of spacings (logarithms of contiguous ratios). We prove a useful characterization of distributions in terms of spacings, and show that the loguniform distribution produces constant expected contiguous ratios, an important property in character displacement studies. The random-effects approaches generally lack power in detecting the suspected patterns. We develop further tests for the model in which the μ_i are regarded as fixed. This fixed-effects approach, which may be more realistic ecologically, produces considerably more powerful tests.
Displacement hypotheses in the fixed-effects framework are expressed naturally in terms of the ordered means μ_(1) < μ_(2) < ... < μ_(k). We develop a general theory by which a particular class of linear hypotheses about any number of sets of ordered means may be tested. Finally, a functional relation is used to model the movement of species means from one environment to another. Existing asymptotic tests are shown to perform remarkably well for small samples.
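The contiguous ratios and spacings described in this abstract are simple to compute. A minimal sketch (not the dissertation's test statistics): sizes equally spaced on the log scale, the idealized loguniform pattern, give exactly constant contiguous ratios.

```python
import math

def contiguous_ratios(sizes):
    # Ratio of each species' size to the next smallest one.
    s = sorted(sizes)
    return [s[i + 1] / s[i] for i in range(len(s) - 1)]

def spacings(sizes):
    # Logarithms of contiguous ratios: the quantity most of the
    # test statistics discussed above are built from.
    return [math.log(r) for r in contiguous_ratios(sizes)]

# Sizes equally spaced on the log scale (2, 4, 8, 16, 32):
# every contiguous ratio equals 2, i.e. every spacing equals log 2.
sizes = [2.0 ** k for k in range(1, 6)]
```

Any displacement pattern that equalizes spacings pushes the configuration toward this constant-ratio form, which is why constant expected contiguous ratios matter in these studies.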
 Date Issued
 1982
 Identifier
 AAI8223194, 3085332, FSDT3085332, fsu:74827
 Format
 Document (PDF)
 Title
 TESTING WHETHER NEW IS BETTER THAN USED OF A SPECIFIED AGE.
 Creator

PARK, DONG HO., Florida State University
 Abstract/Description

This research contributes to the theory and methods of testing hypotheses for classes of life distributions. Two classes of life distributions are considered in this dissertation: (1) The New Better Than Used (NBU) class: the life distribution F is NBU if F̄(x+y) ≤ F̄(x)F̄(y) for all x, y ≥ 0, where F̄ ≡ 1 − F. (2) The New Better Than Used at t_0 (NBU-t_0) class: the life distribution F is NBU-t_0 if F̄(x+t_0) ≤ F̄(x)F̄(t_0) for all x ≥ 0. The NBU and NBU-t_0 classes have dual classes (New Worse Than Used and New Worse Than Used at t_0, respectively) defined by reversing the inequality. The NBU-t_0 class is a new class of life distributions and contains the NBU class. We study the basic properties of the NBU-t_0 class and propose a test of H_0: F̄(x+t_0) = F̄(x)F̄(t_0) for all x ≥ 0, versus H_A: F̄(x+t_0) ≤ F̄(x)F̄(t_0) for all x ≥ 0 with strict inequality for some x ≥ 0, based on a complete random sample X_1, ..., X_n from F. Our test can also be used to test H_0 against NWU-t_0 alternatives. Asymptotic relative efficiencies of our test with respect to the Hollander and Proschan (1972, Ann. Math. Statist. 43, 1136-1146) NBU test are calculated for several distributions. We extend our test of H_0 versus H_A to accommodate randomly censored data. For the censored-data situation our test is based on a statistic (omitted in this record; see DAI) in which F̄ is estimated by the Kaplan-Meier (1958, J. Amer. Statist. Assoc. 53, 457-481) estimator. Under mild regularity conditions on the amount of censoring, a consistent test of H_0 versus H_A for the randomly censored model is obtained. In Chapter III we develop a two-sample NBU test of the null hypothesis that two distributions F and G are equal, versus the alternative that F is "more NBU" than G.
Our test is based on a statistic T_{m,n} (omitted in this record; see DAI), where m and n are the sample sizes from F and G, and F_m and G_n are the empirical distributions of F and G. Asymptotic normality of T_{m,n}, suitably normalized, is a direct consequence of Hoeffding's (1948, Ann. Math. Statist. 19, 293-325) U-statistic theorem. Then, using a consistent estimator of the null asymptotic variance of N^{1/2} T_{m,n}, where N = m + n, we obtain an asymptotically distribution-free test. We extend the two-sample NBU test to the k-sample case. Our test of H_0 versus H_A utilizes the Kaplan-Meier estimator; however, there are other possible estimators of the survival function for the randomly censored model. (Author's abstract exceeds stipulated maximum length; discontinued here with permission of the author.) UMI
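The defining NBU inequality F̄(x+y) ≤ F̄(x)F̄(y) can be illustrated by plugging in the empirical survival function. This is only a descriptive diagnostic, not the dissertation's calibrated test statistic; all names here are illustrative.

```python
def empirical_survival(sample):
    # Empirical survival function: F̄_n(x) = #{X_i > x} / n.
    n = len(sample)
    return lambda x: sum(1 for v in sample if v > x) / n

def nbu_violations(sample, grid):
    # Count grid pairs (x, y) where the plug-in survival function
    # violates the NBU inequality F̄(x+y) <= F̄(x) * F̄(y).
    S = empirical_survival(sample)
    return sum(1 for x in grid for y in grid if S(x + y) > S(x) * S(y))
```

For exponential data (the boundary of H_0) the inequality holds with equality in the population, so sampling noise produces violations in roughly half the pairs; a usable test must standardize a statistic and calibrate its null distribution, as the abstract describes.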
 Date Issued
 1982
 Identifier
 AAI8301540, 3085466, FSDT3085466, fsu:74958
 Format
 Document (PDF)
 Title
 TESTING WHETHER MEAN RESIDUAL LIFE CHANGES TREND.
 Creator

GUESS, FRANK MITCHELL., Florida State University
 Abstract/Description

Given that an item is of age t, the expected value of its random remaining life is called the mean residual life (MRL) at age t. We propose two new nonparametric classes of life distributions for modeling aging based on the MRL. The first class consists of those distributions with "increasing initially, then decreasing mean residual life" (IDMRL); the IDMRL class models aging that is initially beneficial, then adverse. The second class, "decreasing, then increasing mean residual life" (DIMRL), models aging that is initially adverse, then beneficial. We present situations where IDMRL (DIMRL) distributions are useful models. We propose two testing procedures for H_0: constant MRL (i.e., exponentiality) versus H_1: IDMRL but not constant MRL (or H_1': DIMRL but not constant MRL). The first procedure assumes the turning point τ from IMRL to DMRL is specified by the user or is known. Our IDMRL(τ) test statistic, T_n, is a differentiable statistical function of order 1; thus T_n, suitably standardized, is asymptotically normal. The second procedure assumes knowledge of the proportion ρ of the population that "dies" at or before the turning point (knowledge of τ itself is not assumed). We use L-statistic theory to show that our IDMRL(ρ) test statistic, V_n*, appropriately standardized, is asymptotically normal. The exact null distribution of V_n* is established. An application is given for each procedure. We then modify the complete-data tests to yield analogous censored-data procedures; the standard Kaplan-Meier estimator is a key tool that we exploit for the censored-data tests. A limited Monte Carlo study investigates the censored-data procedures.
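The mean residual life at the heart of this abstract has a simple empirical plug-in estimate: the average remaining life among items still alive at age t. A minimal sketch (illustrative only, not the dissertation's standardized statistics):

```python
def empirical_mrl(sample, t):
    # Mean residual life at age t, estimated as the average excess
    # lifetime among sample items that survive past t.
    survivors = [x - t for x in sample if x > t]
    return sum(survivors) / len(survivors) if survivors else 0.0
```

Plotting t against this estimate is the usual informal check for the IDMRL shape (rising, then falling) before applying a formal test; under H_0 (exponentiality) the population MRL is constant in t.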
 Date Issued
 1984
 Identifier
 AAI8428699, 3085942, FSDT3085942, fsu:75428
 Format
 Document (PDF)
 Title
 Testing for a timedependent covariate effect in the linear risk model.
 Creator

Amirsehi, Kourosh., Florida State University
 Abstract/Description

We propose two tests to identify a time-dependent covariate effect in the partly parametric linear risk model, and derive asymptotic distributions of the test statistics under the assumption that the covariate effect of interest is constant. One of the asymptotic distributions depends on unknown functions, and we devise a weighted bootstrap procedure to estimate its quantiles. We also derive rates of convergence of maximum likelihood estimators of the regression coefficients in both the nonparametric and the partly parametric linear risk models using the method of sieves. We carry out a simulation study to assess the performance of the proposed tests and apply them to real data from a clinical trial on myelomatosis.
 Date Issued
 1995
 Identifier
 AAI9620872, 3088860, FSDT3088860, fsu:77659
 Format
 Document (PDF)
 Title
 A Study of the Asymptotic Properties of Lasso Estimates for Correlated Data.
 Creator

Gupta, Shuva, Bunea, Florentina, Gert, Joshua, Hollander, Myles, Wegkamp, Marten, Department of Statistics, Florida State University
 Abstract/Description

In this thesis we investigate post-model-selection properties of L1-penalized weighted least squares estimators in regression models with a large number of variables M and correlated errors. We focus on correct subset selection and on the asymptotic distribution of the penalized estimators. In the simple case of AR(1) errors, we give conditions under which correct subset selection can be achieved via our procedure. We then provide a detailed generalization of this result to models with errors that have a weak-dependency structure (Doukhan, 1996). In all cases, the number M of regression variables is allowed to exceed the sample size n. We further investigate the asymptotic distribution of our estimates when M < n, and show that under appropriate choices of the tuning parameters the limiting distribution is multivariate normal. This generalizes to the case of correlated errors the result of Knight and Fu (2000), obtained for regression models with independent errors.
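A standard device behind weighted least squares with AR(1) errors is quasi-differencing: if e_t = ρ e_{t-1} + u_t, then y_t − ρ y_{t-1} regressed on x_t − ρ x_{t-1} has (approximately) independent errors, after which ordinary penalized least squares machinery applies. A hedged sketch of that transform, assuming ρ is known (in practice it must be estimated; this is a generic Cochrane-Orcutt-style step, not necessarily the thesis's exact weighting):

```python
def ar1_prewhiten(y, X, rho):
    # Quasi-differencing for AR(1) errors: returns the transformed
    # response and design (dropping the first observation), on which
    # an L1-penalized fit can then be run as in the independent case.
    y_t = [y[t] - rho * y[t - 1] for t in range(1, len(y))]
    X_t = [[X[t][j] - rho * X[t - 1][j] for j in range(len(X[0]))]
           for t in range(1, len(X))]
    return y_t, X_t
```

The transform costs one observation and leaves the coefficient vector unchanged, so subset selection on the transformed problem is selection on the original one.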
 Date Issued
 2009
 Identifier
 FSU_migr_etd3896
 Format
 Thesis
 Title
 Structural Health Monitoring with Lamb-Wave Sensors: Problems in Damage Monitoring, Prognostics and Multisensory Decision Fusion.
 Creator

Mishra, Spandan, Vanli, Omer Arda, Okoli, Okenwa, Jung, Sungmoon, Park, Chiwoo, Florida State University, FAMU-FSU College of Engineering, Department of Industrial and Manufacturing Engineering
 Abstract/Description

Carbon fiber reinforced composites (CFRC) have several desirable traits that can be exploited in the design of advanced structures and systems. Applications requiring high strength-to-weight and stiffness-to-weight ratios, such as airplane fuselages, wind turbine blades, and boats, have made extensive use of CFRC. Furthermore, low density, good vibration-damping ability, easy manufacturability, carbon fiber's electrical conductivity, high thermal conductivity, and smooth surface finish provide additional benefits. Applications of CFRC are found in aerospace, military, wind-turbine, robotics, and sports-equipment settings, among others. Alongside its many advantages, however, CFRC has a few disadvantages: it undergoes completely different failure patterns than metals. Once the yield strength is exceeded, CFRC fails suddenly and catastrophically. The inherently anisotropic nature of CFRC makes it very difficult for traditional condition-monitoring methods to assess the condition of the structure, and the complex failure patterns, including delamination, micro-cracks, and matrix cracks, require specialized sensing and monitoring schemes for composite structures. This Ph.D. research focuses on developing an integrated structural health monitoring methodology for damage monitoring, remaining useful life (RUL) estimation, and decision fusion using Lamb-wave data. The main objective is to develop an integrated damage detection method that uses Lamb-wave sensor data to infer the state of the damage and make an accurate prognosis for the structure. Slow fatigue loading produces a distinctive failure pattern in CFRC structures: fatigue damage first manifests itself as fiber breakage, then slowly progresses to matrix cracks, and ultimately leads to delamination damage.
This type of failure process is very difficult to monitor using traditional damage-monitoring methods such as X-ray, ultrasonic, and infrared evaluation. In this research, we use a principal component (PC) based multivariate cumulative sum (MCUSUM) chart to monitor the structure; the MCUSUM chart is especially useful for monitoring structures undergoing slow, gradual change. For remaining-useful-life (RUL) estimation, we propose a Wiener process model coupled with principal component regression (PCR). For damage detection and classification we study discriminant analysis, which, despite its popular use in image analysis and in gene-data classification problems, has not been widely used for damage classification. We show that discriminant analysis is useful for detecting known damage modes while dealing with the high dimensionality of Lamb-wave data, and we modify standard Gaussian discriminant analysis by introducing regularization parameters so that raw Lamb-wave data can be processed directly, without an intermediate feature-extraction step.
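A multivariate CUSUM on PC scores can be sketched in a few lines. This follows the common Crosier-style recursion and is illustrative only; the dissertation's chart design, reference parameter k, and control limit h are not specified in this record.

```python
import math

def mcusum(scores, k, h):
    # Crosier-style multivariate CUSUM applied to (already-extracted)
    # principal component scores, assumed centered at the in-control
    # mean. Returns the first index where the statistic exceeds the
    # control limit h (a possible damage onset), or None if in control.
    s = [0.0] * len(scores[0])
    for t, x in enumerate(scores):
        w = [si + xi for si, xi in zip(s, x)]        # accumulate
        c = math.sqrt(sum(wi * wi for wi in w))
        if c <= k:
            s = [0.0] * len(s)                       # reset: no drift
        else:
            shrink = 1.0 - k / c                     # shrink toward 0 by k
            s = [wi * shrink for wi in w]
        if math.sqrt(sum(si * si for si in s)) > h:
            return t
    return None
```

Because the statistic accumulates small shifts over time, it suits the slow, gradual fatigue degradation described above better than a one-shot outlier test would.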
 Date Issued
 2016
 Identifier
 FSU_2016SU_Mishra_fsu_0071E_13346
 Format
 Thesis
 Title
 STOCHASTIC VERSIONS OF REARRANGEMENT INEQUALITIES WITH APPLICATIONS TO STATISTICS.
 Creator

D'ABADIE, CATHERINE ANNE., Florida State University
 Abstract/Description

In this dissertation we develop a theory that offers a unified approach to the problem of obtaining stochastic versions of deterministic rearrangement inequalities. To develop the theory we first define two new classes of functions and establish preservation properties of these functions under various statistical and mathematical operations. Next we introduce the notion of stochastically similarly arranged (SSA) pairs of random vectors. We prove that if the random vectors (X, Y) are SSA and the function f from R^n × R^n into R^n is monotone with respect to a certain partial ordering on R^n × R^n, then for every permutation π the corresponding stochastic inequalities (omitted in this record; see DAI) hold. This result yields a unified way of obtaining stochastic versions of rearrangement inequalities. We then show that many multivariate densities of interest in statistical practice govern pairs of random vectors which are SSA. Next we show that the SSA property is preserved under certain statistical operations on pairs of SSA random vectors; for example, the rank order of SSA random variables is SSA, and the SSA property is preserved under certain contamination models. Finally, we show how these results can be applied to problems in hypothesis testing.
 Date Issued
 1981
 Identifier
 AAI8205717, 3085181, FSDT3085181, fsu:74676
 Format
 Document (PDF)
 Title
 Stochastic Models and Inferences for Commodity Futures Pricing.
 Creator

Ncube, Moeti M., Srivastava, Anuj, Doran, James, Mason, Patrick, Niu, Xufeng, Huffer, Fred, Wu, Wei, Department of Statistics, Florida State University
 Abstract/Description

The stochastic modeling of financial assets is essential to the valuation of financial products and to investment decisions. These models are governed by parameters that are estimated through a process known as calibration. Current procedures typically perform a grid-search optimization of a given objective function over a specified parameter space; such methods can be computationally intensive and require restrictions on the parameter space to achieve timely convergence. In this thesis, we propose an alternative Kalman Smoother Expectation Maximization procedure (KSEM) that can jointly estimate all the parameters and produces better model fit than alternative estimation procedures. Further, we consider the additional complexity of modeling jumps or spikes that may occur in a time series. For this calibration we develop a Particle Smoother Expectation Maximization procedure (PSEM) for the optimization of nonlinear systems. This is an entirely new estimation approach, and we provide several examples of its application.
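The building block of a KSEM-style procedure is the Kalman filter (the E-step adds a backward Rauch-Tung-Striebel smoothing pass, and the M-step updates parameters in closed form). A minimal scalar sketch under an assumed linear-Gaussian state-space model; model form, names, and defaults here are illustrative, not the thesis's commodity-futures specification.

```python
def kalman_filter(ys, a, q, r, m0=0.0, p0=1.0):
    # Scalar linear-Gaussian state-space model:
    #   x_t = a * x_{t-1} + w_t,  w_t ~ N(0, q)   (state)
    #   y_t = x_t + v_t,          v_t ~ N(0, r)   (observation)
    # Returns the filtered state means E[x_t | y_1..y_t].
    means = []
    m, p = m0, p0
    for y in ys:
        m_pred, p_pred = a * m, a * a * p + q   # predict
        gain = p_pred / (p_pred + r)            # Kalman gain
        m = m_pred + gain * (y - m_pred)        # measurement update
        p = (1.0 - gain) * p_pred
        means.append(m)
    return means
```

With observation noise r near zero the filter tracks the observations almost exactly, a quick sanity check on the recursion.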
 Date Issued
 2009
 Identifier
 FSU_migr_etd2707
 Format
 Thesis
 Title
 Statistical Shape Analysis on Manifolds with Applications to Planar Contours and Structural Proteomics.
 Creator

Ellingson, Leif A., Patrangenaru, Vic, Mio, Washington, Zhang, Jinfeng, Niu, Xufeng, Department of Statistics, Florida State University
 Abstract/Description

The technological advances of recent years have produced a wealth of intricate digital imaging data that is analyzed effectively using the principles of shape analysis. Such data often lie on high-dimensional or infinite-dimensional manifolds. With computing power now strong enough to handle these data, it is necessary to develop theoretically sound methodology that performs the analysis in a computationally efficient manner. In this dissertation, we propose approaches for doing so for planar contours and for the three-dimensional atomic structures of protein binding sites. First, we adapt Kendall's definition of direct similarity shapes of finite planar configurations to shapes of planar contours under certain regularity conditions, and utilize Ziezold's nonparametric view of Frechet mean shapes. The space of direct similarity shapes of regular planar contours is embedded in a space of Hilbert-Schmidt operators in order to obtain the Veronese-Whitney extrinsic mean shape. For computation, it is necessary to use discrete approximations of both the contours and the embedding. For cases when landmarks are not provided, we propose an automated, randomized landmark selection procedure that is useful for contour matching within a population and is consistent with the underlying asymptotic theory. For inference on the extrinsic mean direct similarity shape, we consider a one-sample neighborhood hypothesis test and the use of the nonparametric bootstrap to approximate confidence regions. Bandulasiri et al. (2008) suggested using extrinsic reflection size-and-shape analysis to study the relationship between the structure and function of protein binding sites. To obtain meaningful results with this approach, it is necessary to identify the atoms common to a group of binding sites with similar functions and to obtain proper correspondences for these atoms.
We explore this problem in depth and propose an algorithm for simultaneously finding the common atoms and their respective correspondences based upon the Iterative Closest Point algorithm. For a benchmark data set, our classification results compare favorably with those of leading established methods. Finally, we discuss current directions in the field of statistics on manifolds, including a computational comparison of intrinsic and extrinsic analysis for various applications and a brief introduction of sample spaces with manifold stratification.
 Date Issued
 2011
 Identifier
 FSU_migr_etd0053
 Format
 Thesis
 Title
 Statistical Models on Human Shapes with Application to Bayesian Image Segmentation and Gait Recognition.
 Creator

Kaziska, David M., Srivastava, Anuj, Mio, Washington, Chicken, Eric, Wegkamp, Marten, Department of Statistics, Florida State University
 Abstract/Description

In this dissertation we develop probability models for human shapes and apply them to the problems of image segmentation and human identification by gait recognition. To build probability models on human shapes, we consider human shapes to be realizations of random variables on a space of simple closed curves and a space of elastic curves, both of which are quotient spaces of infinite-dimensional manifolds. Our probability models arise through Tangent Principal Component Analysis, a method of studying probability models on manifolds by projecting them onto a tangent plane to the manifold. Since we place the tangent plane at the Karcher mean of the sample shapes, we begin by examining statistical properties of Karcher means on manifolds: we derive theoretical results for the location of Karcher means on certain manifolds and perform a simulation study of their properties on our shape space. Turning to the specific problem of distributions on human shapes, we examine alternative probability models and find that kernel density estimators perform well. We use this model to sample shapes and to perform shape testing. The first application we consider is human detection in infrared images. We pursue this application through Bayesian image segmentation, in which the proposed human in an image is a maximum likelihood estimate obtained from a prior distribution on human shapes and a likelihood arising from a divergence measure on the pixels in the image. We then consider human identification by gait recognition: we model human gait as a cyclostationary process on the space of elastic curves and develop a metric on such processes based on the geodesic distance between sequences in that space.
We develop and demonstrate a framework for gait recognition based on this metric, which includes the following elements: automatic detection of gait cycles, interpolation to register gait cycles, computation of a mean gait cycle, and identi_cation by matching a test cycle to the nearest member of a training set. We perform the matching both by an exhaustive search of the training set and through an expedited method using clusterbased trees and boosting.
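The tangent-plane construction above is anchored at the Karcher mean, computed by gradient descent on the Fréchet function. As an illustrative sketch only (not the dissertation's shape-space code), here is that iteration on the simplest curved manifold, the unit circle; the function name and step size are our own assumptions:

```python
import numpy as np

def karcher_mean_circle(angles, steps=100, lr=1.0):
    """Intrinsic (Karcher) mean of points on the unit circle S^1.

    Gradient descent on the Frechet function F(m) = (1/2n) sum_i d(m, a_i)^2,
    where d is geodesic (angular) distance. Illustrative sketch only.
    """
    m = angles[0]
    for _ in range(steps):
        # log map at m: signed angular differences wrapped to (-pi, pi]
        diffs = (np.asarray(angles) - m + np.pi) % (2 * np.pi) - np.pi
        grad = diffs.mean()                 # negative Riemannian gradient of F
        m = (m + lr * grad) % (2 * np.pi)   # exponential map on the circle
        if abs(grad) < 1e-12:
            break
    return m
```

Unlike a Euclidean average of angles, this handles wrap-around correctly: the mean of points just on either side of angle 0 is near 0, not near pi.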
 Date Issued
 2005
 Identifier
 FSU_migr_etd3275
 Format
 Thesis
 Title
 Statistical Modelling and Applications of Neural Spike Trains.
 Creator

Lawhern, Vernon, Wu, Wei, Contreras, Robert J., Srivastava, Anuj, Huffer, Fred, Niu, Xufeng, Department of Statistics, Florida State University
 Abstract/Description

In this thesis we investigate statistical modelling of neural activity in the brain. We first develop a framework which extends the state-space Generalized Linear Model (GLM) of Eden and colleagues [20] to include the effects of hidden states. These states, collectively, represent variables which are not observed (or even observable) in the modeling process but nonetheless can have an impact on the neural activity. We then develop a framework that allows us to input a priori target information into the model. We examine both of these modelling frameworks on motor cortex data recorded from monkeys performing different target-driven hand and arm movement tasks. Finally, we perform temporal coding analysis of sensory stimulation using principled statistical models and show the efficacy of our approach.
 Date Issued
 2011
 Identifier
 FSU_migr_etd3251
 Format
 Thesis
 Title
 A Statistical Approach to an Ocean Circulation Inverse Problem.
 Creator

Choi, Seoeun, Huffer, Fred W., Speer, Kevin G., Nolder, Craig, Niu, Xufeng, Wu, Wei, Department of Statistics, Florida State University
 Abstract/Description

This dissertation presents, applies, and evaluates a statistical approach to an ocean circulation problem. The objective is to produce a map of ocean velocity in the North Atlantic from sparse measurements along ship tracks, using a Bayesian approach with a physical model. The physical model is the Stommel Gulf Stream model, which relates the wind stress curl to the transport stream function. A Gibbs sampler is used to extract features from the posterior velocity field. To specify the prior, the equation of the Stommel Gulf Stream model on a two-dimensional grid is used. Comparisons with earlier approaches used by oceanographers are also presented.
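The Gibbs sampler mentioned above alternates draws from full conditional distributions. A minimal sketch of that mechanic on a toy target, a standard bivariate normal with correlation rho, rather than the Stommel-model posterior; all names and settings here are illustrative:

```python
import numpy as np

def gibbs_bivariate_normal(rho, n_iter=20000, burn_in=1000, seed=0):
    """Two-block Gibbs sampler for a standard bivariate normal with correlation rho.

    Each full conditional is univariate normal: x | y ~ N(rho*y, 1 - rho^2),
    and symmetrically for y | x. Returns the post-burn-in draws.
    """
    rng = np.random.default_rng(seed)
    sd = np.sqrt(1 - rho ** 2)
    x, y = 0.0, 0.0
    draws = np.empty((n_iter, 2))
    for t in range(n_iter):
        x = rng.normal(rho * y, sd)   # draw x from its full conditional
        y = rng.normal(rho * x, sd)   # draw y from its full conditional
        draws[t] = (x, y)
    return draws[burn_in:]
```

In the dissertation's setting the full conditionals come from the Stommel-model prior and the ship-track likelihood; the alternating structure is the same.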
 Date Issued
 2007
 Identifier
 FSU_migr_etd3758
 Format
 Thesis
 Title
 A Statistical Approach for Information Extraction of Biological Relationships.
 Creator

Bell, Lindsey R., Zhang, Jinfeng, Niu, Xufeng, Tyson, Gary, Huffer, Fred, Department of Statistics, Florida State University
 Abstract/Description

Vast amounts of biomedical information are stored in scientific literature, easily accessed through publicly available databases. Relationships among biomedical terms constitute a major part of our biological knowledge. Acquiring such structured information from unstructured literature can be done through human annotation, but is time and resource consuming. As this content continues to rapidly grow, the popularity and importance of text mining for obtaining information from unstructured text becomes increasingly evident. Text mining has four major components. First, relevant articles are identified through information retrieval (IR); next, important concepts and terms are flagged using entity recognition (ER); and then relationships between these entities are extracted from the literature in a process called information extraction (IE). Finally, text mining takes these elements and seeks to synthesize new information from the literature. Our goal is information extraction from unstructured literature concerning biological entities. To do this, we use the structure of triplets, where each triplet contains two biological entities and one interaction word. The biological entities may include terms such as protein names, disease names, genes, and small molecules. Interaction words describe the relationship between the biological terms. Under this framework we aim to combine the strengths of three classifiers in an ensemble approach. The three classifiers we consider are Bayesian Networks, Support Vector Machines, and a mixture of logistic models defined by interaction word. The three classifiers and the ensemble approach are evaluated on three benchmark corpora and one corpus that is introduced in this study. The evaluation includes cross validation and cross-corpus validation to replicate an application scenario. The three classifiers are unique, and we find that the performance of individual classifiers varies depending on the corpus.
Therefore, an ensemble of classifiers removes the need to choose one classifier and provides optimal performance.
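One simple way to realize the ensemble idea described above is majority voting over the three classifiers' labels; the dissertation's actual combination rule may differ, so treat this as an assumed baseline sketch:

```python
from collections import Counter

def majority_vote(predictions):
    """Combine labels from several classifiers by majority vote.

    predictions: list of per-classifier label lists, all the same length.
    For ties, Counter.most_common keeps first-encountered order, so the
    label cast earliest in the classifier list wins.
    """
    n_samples = len(predictions[0])
    combined = []
    for i in range(n_samples):
        votes = [p[i] for p in predictions]
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined
```

With, say, a Bayesian Network, an SVM, and a logistic mixture each producing binary triplet labels, the vote keeps a prediction whenever at least two classifiers agree.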
 Date Issued
 2011
 Identifier
 FSU_migr_etd1314
 Format
 Thesis
 Title
 Statistical Analysis of Trajectories on Riemannian Manifolds.
 Creator

Su, Jingyong, Srivastava, Anuj, Klassen, Erik, Huffer, Fred, Zhang, Jinfeng, Department of Statistics, Florida State University
 Abstract/Description

This thesis consists of two distinct topics. First, we present a framework for estimation and analysis of trajectories on Riemannian manifolds. Second, we propose a framework for detecting, classifying, and estimating shapes in point cloud data. This thesis mainly focuses on statistical analysis of trajectories that take values on nonlinear manifolds. There are many difficulties when analyzing temporal trajectories on nonlinear manifolds. First, the observed data are always noisy and discrete, at unsynchronized times. Second, trajectories are observed under arbitrary temporal evolutions. In this work, we first address the problem of estimating full smooth trajectories on nonlinear manifolds using only a set of time-indexed points, for use in interpolation, smoothing, and prediction of dynamic systems. Furthermore, we study statistical analysis of trajectories that take values on nonlinear Riemannian manifolds and are observed under arbitrary temporal evolutions. The problem of analyzing such temporal trajectories, including registration, comparison, modeling, and evaluation, arises in many applications. We introduce a quantity that provides both a cost function for temporal registration and a proper distance for comparison of trajectories. This distance, in turn, is used to define statistical summaries, such as the sample means and covariances, of given trajectories, and Gaussian-type models to capture their variability. Both theoretical proofs and experimental results are provided to validate our work. The problems of detecting, classifying, and estimating shapes in point cloud data are important due to their general applicability in image analysis, computer vision, and graphics. They are challenging because the data is typically noisy, cluttered, and unordered.
We study these problems using a fully statistical model where the data is modeled using a Poisson process on the object's boundary (curves or surfaces), corrupted by additive noise and a clutter process. Using likelihood functions dictated by the model, we develop a generalized likelihood ratio test for detecting a shape in a point cloud. Additionally, we develop a procedure for estimating the most likely shapes in observed point clouds under given shape hypotheses. We demonstrate this framework using examples of 2D and 3D shape detection and estimation in both real and simulated data, and the use of this framework in shape retrieval from a 3D shape database.
 Date Issued
 2013
 Identifier
 FSU_migr_etd7619
 Format
 Thesis
 Title
 Spatiotemporal Bayesian Hierarchical Models, with Application to Birth Outcomes.
 Creator

Norton, Jonathan D. (Jonathan David), Niu, Xufeng, Eberstein, Isaac, Huffer, Fred, McGee, Daniel, Department of Statistics, Florida State University
 Abstract/Description

A class of hierarchical Bayesian models is introduced for adverse birth outcomes such as preterm birth, which are assumed to follow a conditional binomial distribution. The log-odds of an adverse outcome in a particular county, logit(p_i), follows a linear model which includes observed covariates and normally distributed random effects. Spatial dependence between neighboring regions is allowed for by including an intrinsic autoregressive (IAR) prior or an IAR convolution prior in the linear predictor. Temporal dependence is incorporated by also including a temporal IAR term. It is shown that the variance parameters underlying these random effects (IAR, convolution, convolution plus temporal IAR) are identifiable. The same results are also shown to hold when the IAR is replaced by a conditional autoregressive (CAR) model. Furthermore, properties of the CAR parameter ρ are explored. The Deviance Information Criterion (DIC) is considered as a way to compare spatial hierarchical models. Simulations are performed to test whether the DIC can identify whether binomial outcomes come from an IAR, an IAR convolution, or independent normal deviates. Having established the theoretical foundations of the class of models and validated the DIC as a means of comparing models, we examine preterm birth and low birth weight counts in the state of Arkansas from 1994 to 2005. We find that preterm birth and low birth weight have different spatial patterns of risk, and that rates of low birth weight can be fit with a strikingly simple model that includes a constant spatial effect for all periods, a linear trend, and three covariates. It is also found that the risks of each outcome are increasing over time, even with adjustment for covariates.
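The IAR prior mentioned above is commonly defined through the precision matrix Q = D - W, where W is the 0/1 region adjacency matrix and D the diagonal of neighbor counts; Q is singular, which is part of why identifiability of the variance parameters needs proof. A small sketch with an assumed four-county line layout (not the Arkansas adjacency structure):

```python
import numpy as np

def iar_precision(adjacency):
    """Precision matrix Q = D - W of an intrinsic autoregressive (IAR) prior.

    W is a symmetric 0/1 adjacency matrix over regions; D holds the neighbor
    counts on its diagonal. Q has zero row sums, so it is singular and the
    IAR is an improper prior, used only as a random-effects smoother.
    """
    W = np.asarray(adjacency, dtype=float)
    D = np.diag(W.sum(axis=1))
    return D - W

# Four counties in a line: 1-2-3-4 (illustrative layout)
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
Q = iar_precision(W)
```

The rank of Q is the number of regions minus the number of connected components, so for one connected map of n counties it is n - 1.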
 Date Issued
 2008
 Identifier
 FSU_migr_etd2523
 Format
 Thesis
 Title
 Sparse Factor AutoRegression for Forecasting Macroeconomic Time Series with Very Many Predictors.
 Creator

Galvis, Oliver Kurt, She, Yiyuan, Okten, Giray, Beaumont, Paul, Huffer, Fred, Tao, Minjing, Department of Statistics, Florida State University
 Abstract/Description

Forecasting a univariate target time series in high dimensions with very many predictors poses challenges in statistical learning and modeling. First, many nuisance time series exist and need to be removed. Second, from economic theory, a macroeconomic target series is typically driven by a few latent factors constructed from some macroeconomic indices. Consequently, a high-dimensional problem arises in which deleting junk time series and simultaneously constructing predictive factors is meaningful and advantageous for the accuracy of the forecasting task. In macroeconomics, multiple categories are available, with the target series belonging to one of them. With all series available, we advocate constructing category-level factors to enhance the performance of the forecasting task. We introduce a novel methodology, Sparse Factor AutoRegression (SFAR), to construct predictive factors from a reduced set of relevant time series. SFAR attains dimension reduction via joint variable selection and rank reduction in high-dimensional time series data. A multivariate setting is used to achieve simultaneous low rank and cardinality control on the matrix of coefficients, where an $\ell_{0}$ constraint regulates the number of useful series and the rank constraint sets an upper bound on the number of constructed factors. The doubly-constrained estimation problem is nonconvex and is optimized via an efficient iterative algorithm with a theoretical guarantee of convergence. SFAR fits factors using a sparse low rank matrix in response to a target category series. Forecasting is then performed using lagged observations and shrinkage methods. We generate finite-sample data to verify our theoretical findings via a comparative study of the SFAR. We also analyze real-world macroeconomic time series data to demonstrate the usage of the SFAR in practice.
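The two constraints described above, an $\ell_{0}$-type bound on the nonzero rows (useful series) and a rank bound (number of factors), can be illustrated with a single projection step: row hard-thresholding followed by SVD truncation. This sketches the constraint set only; it is not the dissertation's SFAR algorithm, and all names here are our own:

```python
import numpy as np

def project_sparse_low_rank(B, rank, keep_rows):
    """Project a coefficient matrix onto row-sparsity, then onto low rank.

    Keeps the keep_rows rows of B with the largest Euclidean norm (an
    l0-style cardinality constraint), then takes the best rank-`rank`
    approximation of the result via truncated SVD (Eckart-Young).
    """
    B = np.asarray(B, dtype=float)
    norms = np.linalg.norm(B, axis=1)
    keep = np.argsort(norms)[-keep_rows:]
    S = np.zeros_like(B)
    S[keep] = B[keep]                 # zero out all but the largest rows
    U, d, Vt = np.linalg.svd(S, full_matrices=False)
    d[rank:] = 0                      # truncate the spectrum to rank r
    return U @ np.diag(d) @ Vt
```

Because the SVD truncation of a row-sparse matrix keeps its zero rows zero, the output satisfies both constraints at once, which is what makes alternating schemes of this kind workable.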
 Date Issued
 2014
 Identifier
 FSU_migr_etd8990
 Format
 Thesis
 Title
 SOME RESULTS ON THE DISTRIBUTION OF GRUBBS ESTIMATORS.
 Creator

BRINDLEY, DENNIS ALFRED., Florida State University
 Abstract/Description

This dissertation is concerned with the estimation of error variances in a non-replicated two-way classification and with inferences based on the estimators so derived. The postulated model used throughout the present work is $y_{ij} = \mu_i + \beta_j + \epsilon_{ij}$, where $y_{ij}$ is the observation in the $i$-th row and $j$-th column, $\mu_i$ is the parameter representing the mean of the $i$-th row, $\beta_j$ is the parameter representing the additional effect of the $j$-th column, (DIAGRAM, TABLE OR GRAPHIC OMITTED...PLEASE SEE DAI), and the $\epsilon_{ij}$ are independent, zero-mean, normal variates with (DIAGRAM, TABLE OR GRAPHIC OMITTED...PLEASE SEE DAI). A set of unbiased estimates, (DIAGRAM, TABLE OR GRAPHIC OMITTED...PLEASE SEE DAI), developed in earlier work by Grubbs (J. AMER. STATIST. ASSOC. 43 (1948), 243-264), Ehrenberg (BIOMETRIKA 37 (1950), 347-357), and Russell and Bradley (BIOMETRIKA 45 (1958), 111-129), is considered. The exact joint density of $Q_1, \ldots, Q_r$ is obtained for r = 3, and two exact results are derived for testing the null hypothesis, (DIAGRAM, TABLE OR GRAPHIC OMITTED...PLEASE SEE DAI), unknown, versus the two specific alternatives, (DIAGRAM, TABLE OR GRAPHIC OMITTED...PLEASE SEE DAI), for at least some j, j = 1, 2, 3, and, (DIAGRAM, TABLE OR GRAPHIC OMITTED...PLEASE SEE DAI)
 Date Issued
 1982
 Identifier
 AAI8229146, 3085401, FSDT3085401, fsu:74896
 Format
 Document (PDF)
 Title
 Some New Methods for Design and Analysis of Survival Data.
 Creator

Wang, Wenting, Sinha, Debajyoti, Arjmandi, Bahram H., McGee, Dan, Niu, Xufeng, Yu, Kai, Department of Statistics, Florida State University
 Abstract/Description

For survival outcomes, statistical equivalence tests to show that a new treatment is therapeutically equivalent to a standard treatment are usually based on the Cox (1972) proportional hazards assumption. We present an alternative method based on the linear transformation model (LTM) for two treatment arms, and show the advantages of using this equivalence test instead of tests based on Cox's model. The LTM is a very general class of models that includes models such as the proportional odds survival model (POSM). We present a sufficient condition for checking whether log-rank based tests have inflated Type I error rates. We show that the POSM and some other commonly used survival models within the LTM class all satisfy this condition. Simulation studies show that repeated use of our test instead of log-rank based tests will be a safer statistical practice. Our second goal is to develop a practical Bayesian model for survival data with a high-dimensional covariate vector. We develop the Information Matrix (IM) and Information Matrix Ridge (IMR) priors for commonly used survival models, including the Cox model and the cure rate model proposed by Chen et al. (1999), and examine many desirable theoretical properties, including sufficient conditions for the existence of the moment generating functions for these priors and corresponding posterior distributions. The performance of these priors in practice is compared with some competing priors via the Bayesian analysis of a study that investigates the relationship between lung cancer survival time and a large number of genetic markers.
 Date Issued
 2010
 Identifier
 FSU_migr_etd1248
 Format
 Thesis
 Title
 Semiparametric Survival Analysis Using Models with Log-Linear Median.
 Creator

Lin, Jianchang, Sinha, Debajyoti, Zhou, Yi, Lipsitz, Stuart, McGee, Dan, Niu, XuFeng, She, Yiyuan, Department of Statistics, Florida State University
 Abstract/Description

First, we present two novel semiparametric survival models with log-linear median regression functions for right censored survival data. These models are useful alternatives to the popular Cox (1972) model and linear transformation models (Cheng et al., 1995). Compared to existing semiparametric models, our models have many important practical advantages, including interpretation of the regression parameters via the median and the ability to address heteroscedasticity. We demonstrate that our modeling techniques facilitate the ease of prior elicitation and computation for both parametric and semiparametric Bayesian analysis of survival data. We illustrate the advantages of our modeling, as well as model diagnostics, via reanalysis of a small-cell lung cancer study. Results of our simulation study provide further guidance regarding appropriate modelling in practice. Our second goal is to develop the methods of analysis and associated theoretical properties for interval censored and current status survival data. These new regression models use a log-linear regression function for the median. We present frequentist and Bayesian procedures for estimation of the regression parameters. Our model is a useful and practical alternative to the popular semiparametric models which focus on modeling the hazard function. We illustrate the advantages and properties of our proposed methods via reanalyzing a breast cancer study. Our other aim is to develop a model which is able to account for the heteroscedasticity of the response, together with robust parameter estimation and outlier detection using sparsity penalization. Some preliminary simulation studies have been conducted to compare the performance of the proposed model and the existing median lasso regression model. Considering the estimation bias, mean squared error, and other identification benchmark measures, our proposed model performs better than the competing frequentist estimator.
 Date Issued
 2012
 Identifier
 FSU_migr_etd4992
 Format
 Thesis
 Title
 The Risk of Lipids on Coronary Heart Disease: Prognostic Models and Meta-Analysis.
 Creator

Almansour, Aseel, McGee, Daniel, Flynn, Heather, Niu, Xufeng, Sinha, Debajyoti, Department of Statistics, Florida State University
 Abstract/Description

Prognostic models are widely used in medicine to estimate a particular patient's risk of developing disease. Numerous prognostic models have been developed for predicting cardiovascular disease risk, including those by Wilson et al. using the Framingham Study [17], by Assmann et al. using the PROCAM study [22], and by Conroy et al. [33] using a pool of European cohorts. The prognostic models developed by these researchers differed in their approach to estimating risk, but all included one or more of the lipid determinations: Total Cholesterol (TC), Low Density Lipoproteins (LDL), High Density Lipoproteins (HDL), or the ratios TC/HDL and LDL/HDL. None of these researchers included both LDL and TC in the same model due to the high correlation between these measurements. In this thesis we examine some questions about the inclusion of lipid determinations in prognostic models: Can the effects of LDL and TC on the risk of dying from CHD be differentiated? If one measure is demonstrably stronger than the other, then a single model using that variable would be considered advantageous. Is it possible to derive a single measure from TC and LDL that is a stronger predictor than either measure? If so, then a new summarization of the lipid measurements should be used in prognostic modeling. Does the addition of HDL to a prognostic model improve the predictive accuracy of the model? If it does, then this almost universally available determination should be used when developing prognostic models. We use data from nine independent studies to examine these issues. The studies were chosen because they include longitudinal follow-up of participants and included lipid determinations in the baseline examination of participants. There are many methodologies available for developing prognostic models, including logistic regression and the proportional hazards model.
We used the proportional hazards model since we have follow-up times and times to death from CHD for all of the participants in the included studies. We summarized our results using a meta-analytic approach. Using the meta-analytic approach, we addressed the additional questions of whether the results vary significantly among the different studies and whether adding additional characteristics to the prognostic models changes the estimated effect of the lipid determinations. All of our results are presented stratified by gender and, when appropriate, by race. Finally, because our studies were not selected randomly, we also examined whether there is evidence of bias in our meta-analyses. For this examination we used funnel plots with related methodology for testing whether there is evidence of bias in the results.
 Date Issued
 2014
 Identifier
 FSU_migr_etd8724
 Format
 Thesis
 Title
 Riemannian Shape Analysis of Curves and Surfaces.
 Creator

Kurtek, Sebastian, Srivastava, Anuj, Klassen, Eric, Wu, Wei, Huffer, Fred, Dryden, Ian, Department of Statistics, Florida State University
 Abstract/Description

Shape analysis of curves and surfaces is a very important tool in many applications ranging from computer vision to bioinformatics and medical imaging. There are many difficulties when analyzing shapes of parameterized curves and surfaces. Firstly, it is important to develop representations and metrics such that the analysis is invariant to parameterization in addition to the standard transformations (rigid motion and scaling). Furthermore, under the chosen representations and metrics, the analysis must be performed on infinite-dimensional and sometimes nonlinear spaces, which poses an additional difficulty. In this work, we develop and apply methods which address these issues. We begin by defining a framework for shape analysis of parameterized open curves and extend these ideas to shape analysis of surfaces. We utilize the presented frameworks in various classification experiments spanning multiple application areas. In the case of curves, we consider the problem of clustering DT-MRI brain fibers, classification of protein backbones, modeling and segmentation of signatures, and statistical analysis of biosignals. In the case of surfaces, we perform disease classification using 3D anatomical structures in the brain, classification of handwritten digits by viewing images as quadrilateral surfaces, and finally classification of cropped facial surfaces. We provide two additional extensions of the general shape analysis frameworks that are the focus of this dissertation. The first considers shape analysis of marked spherical surfaces, where in addition to the surface information we are given a set of manually or automatically generated landmarks. This requires additional constraints on the definition of the reparameterization group and is applicable in many domains, especially medical imaging and graphics. Second, we consider reflection symmetry analysis of planar closed curves and spherical surfaces.
Here, we also provide an example of disease detection based on brain asymmetry measures. We close with a brief summary and a discussion of open problems, which we plan on exploring in the future.
 Date Issued
 2012
 Identifier
 FSU_migr_etd4963
 Format
 Thesis
 Title
 A Riemannian Framework for Annotated Curves Analysis.
 Creator

Liu, Wei, Srivastava, Anuj, Zhang, Jinfeng, Klassen, Eric P., Huffer, Fred, Department of Statistics, Florida State University
 Abstract/Description

We propose a Riemannian framework for shape analysis of annotated curves: curves that have certain attributes defined along them, in addition to their geometries. These attributes may be in the form of vector-valued functions, discrete landmarks, or symbolic labels, and provide auxiliary information along the curves. The resulting shape analysis, that is, comparing, matching, and deforming, is naturally influenced by the auxiliary functions. Our idea is to construct curves in higher dimensions using both geometric and auxiliary coordinates, and analyze the shapes of these curves. The difficulty comes from the need to remove different groups from different components: the shape is invariant to rigid motion, global scale, and reparameterization, while the auxiliary component is usually invariant only to reparameterization. Thus, the removal of some transformations (rigid motion and global scale) is restricted to the geometric coordinates only, while the reparameterization group is removed from all coordinates. We demonstrate this framework using a number of experiments.
 Date Issued
 2011
 Identifier
 FSU_migr_etd4997
 Format
 Thesis
 Title
 Ridge regression: Application to educational data.
 Creator

Churngchow, Chidchanok., Florida State University
 Abstract/Description

Ridge regression is a type of regression technique which was developed to remedy the problem of multicollinearity in regression analysis. The major problem with multicollinearity is that it causes high variances in the estimation of regression coefficients. The ridge model introduces some bias into the regression equation in order to reduce the variance of the estimators. The purposes of this study were to demonstrate the application of the ridge regression model to educational data and to compare the characteristics and performance of the ridge method and the least squares method. In this study, four types of ridge were compared to the least squares method: ridge trace, generalized, ordinary, and directed ridge. The sample for this study consisted of 141 public schools in Dade County, Florida. The dependent variable was the students' average scores in mathematical computation and reading comprehension. Six variables representing teacher and student characteristics were employed as the predictors. The performance of ridge and least squares was compared in terms of the confidence interval of an individual estimator and predictive accuracy for the whole model. Since the statistical inference for the ridge method has not been completely developed, the bootstrap technique, with a sample size of twenty, was used to calculate the confidence interval of each estimator. The study resulted in a successful application of ridge regression to school-level data, in which it was found that (1) ridge regression yielded a smaller confidence interval for every estimated regression coefficient and (2) ridge regression produced higher predictive accuracy than ordinary least squares. Since the results were based on just one particular set of data, it cannot be guaranteed that ridge always outperforms the least squares method in all cases.
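The bias-variance trade described in the abstract is visible in the closed-form ridge estimator $(X'X + \lambda I)^{-1} X'y$. A sketch with two nearly collinear predictors; the simulated data and penalty value are illustrative, not the Dade County data:

```python
import numpy as np

def ridge_coef(X, y, lam):
    """Closed-form ridge estimate (X'X + lam*I)^{-1} X'y.

    lam = 0 recovers ordinary least squares; lam > 0 trades a little bias
    for a large variance reduction when predictors are collinear.
    """
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

rng = np.random.default_rng(1)
x1 = rng.normal(size=200)
x2 = x1 + 0.01 * rng.normal(size=200)      # nearly collinear predictor
X = np.column_stack([x1, x2])
y = x1 + x2 + rng.normal(size=200)         # true coefficients (1, 1)
b_ols = ridge_coef(X, y, 0.0)
b_ridge = ridge_coef(X, y, 1.0)
```

Individually the OLS coefficients can explode in opposite directions along the near-null direction of $X'X$, while the ridge coefficients stay moderate and their sum remains close to the true total effect.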
 Date Issued
 1988
 Identifier
 AAI8805652, 3086742, FSDT3086742, fsu:76217
 Format
 Document (PDF)
 Title
 Revealing Sparse Signals in Functional Data.
 Creator

Ivanescu, Andrada E. (Andrada Eugenia), Bunea, Florentina, Wegkamp, Marten, Gert, Joshua, Niu, Xufeng, Hollander, Myles, Department of Statistics, Florida State University
 Abstract/Description

My dissertation presents a novel statistical method to estimate a sparse signal in functional data and to construct confidence bands for the signal. Existing methods for inference for the mean function in this framework include smoothing splines and kernel estimates. Our methodology involves thresholding a least squares estimator, and the threshold level depends on the sources of variability that exist in this type of data. The proposed estimation method and the confidence bands successfully adapt to the sparsity of the signal. We present supporting evidence through simulations and applications to real datasets.
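The core step, thresholding a least squares estimator at a level tied to the data's variability, can be sketched as hard thresholding of a pointwise mean estimate. The threshold rule used here (3 standard errors) and all names are our own assumptions, not the dissertation's data-driven level:

```python
import numpy as np

def hard_threshold(estimate, level):
    """Keep entries whose magnitude exceeds the threshold; zero the rest.

    In the functional-data setting the input would be a pointwise least
    squares estimate of the mean curve and the level would be driven by the
    estimated variance components; here the level is simply passed in.
    """
    estimate = np.asarray(estimate, dtype=float)
    return np.where(np.abs(estimate) > level, estimate, 0.0)

# sparse signal: zero everywhere except on a short interval
t = np.linspace(0, 1, 200)
signal = np.where((t > 0.4) & (t < 0.5), 2.0, 0.0)
rng = np.random.default_rng(2)
sigma, n_curves = 0.3, 50
noisy_mean = signal + sigma * rng.normal(size=t.size) / np.sqrt(n_curves)
est = hard_threshold(noisy_mean, 3 * sigma / np.sqrt(n_curves))
```

The thresholded estimate is exactly zero over most of the null region while retaining the signal on its support, which is the sense in which the method "adapts to sparsity".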
 Date Issued
 2008
 Identifier
 FSU_migr_etd3852
 Format
 Thesis