Current Search: Research Repository » Statistics » Probabilities
Search results
Pages
 Title
 2D Affine and Projective Shape Analysis, and Bayesian Elastic Active Contours.
 Creator

Bryner, Darshan W., Srivastava, Anuj, Klassen, Eric, Gallivan, Kyle, Huffer, Fred, Wu, Wei, Zhang, Jinfeng, Department of Statistics, Florida State University
 Abstract/Description

An object of interest in an image can be characterized to some extent by the shape of its external boundary. Current techniques for shape analysis consider the notion of shape to be invariant to the similarity transformations (rotation, translation, and scale), but oftentimes in 2D images of 3D scenes, perspective effects can transform the shapes of objects in a more complicated manner than can be modeled by the similarity transformations alone. Therefore, we develop a general Riemannian framework for shape analysis where metrics and related quantities are invariant to larger groups, the affine and projective groups, that approximate the transformations arising from perspective skews. Highlighting two possibilities for representing object boundaries, ordered points (or landmarks) and parametrized curves, we study different combinations of these representations (points and curves) and transformations (affine and projective). Specifically, we provide solutions to three out of four situations and develop algorithms for computing geodesics and intrinsic sample statistics, leading up to Gaussian-type statistical models, and classifying test shapes using such models learned from training data. In the case of parametrized curves, an added issue is to obtain invariance to the reparameterization group. The geodesics are constructed by particularizing the path-straightening algorithm to the geometries of the current manifolds and are used, in turn, to compute shape statistics and Gaussian-type shape models. We demonstrate these ideas using a number of examples from shape and activity recognition. After developing such Gaussian-type shape models, we present a variational framework for naturally incorporating these shape models as prior knowledge to guide active contours for boundary extraction in images.
This so-called Bayesian active contour framework is especially suitable for images where boundary estimation is difficult due to low contrast, low resolution, and the presence of noise and clutter. In traditional active contour models, curves are driven towards the minimum of an energy composed of image and smoothing terms. We introduce an additional shape term based on shape models of prior known relevant shape classes. The minimization of this total energy, using iterated gradient-based updates of curves, leads to an improved segmentation of object boundaries. We demonstrate this Bayesian approach to segmentation using a number of shape classes in many imaging scenarios, including the imaging modalities of SAS (synthetic aperture sonar) and SAR (synthetic aperture radar), for which it is notoriously difficult to obtain accurate boundary extractions. In practice, the training shapes used for prior shape models may be collected from viewing angles different from those for the test images and thus may exhibit a shape variability brought about by perspective effects. Therefore, by allowing the prior shape model to be invariant to, say, affine transformations of curves, we propose an active contour algorithm whose resulting segmentation is robust to perspective skews.
 Date Issued
 2013
 Identifier
 FSU_migr_etd8534
 Format
 Thesis
 Title
 Age Effects in the Extinction of Planktonic Foraminifera: A New Look at Van Valen's Red Queen Hypothesis.
 Creator

Wiltshire, Jelani, Huffer, Fred, Parker, William, Chicken, Eric, Sinha, Debajyoti, Department of Statistics, Florida State University
 Abstract/Description

Van Valen's Red Queen hypothesis states that within a homogeneous taxonomic group the age is statistically independent of the rate of extinction. The case of the Red Queen hypothesis addressed here is when the homogeneous taxonomic group is a group of similar species. Since Van Valen's work, various statistical approaches have been used to address the relationship between taxon duration (age) and the rate of extinction. Some of the more recent approaches to this problem using Planktonic Foraminifera (Foram) extinction data include Weibull and exponential modeling (Parker and Arnold, 1997) and Cox proportional hazards modeling (Doran et al., 2004, 2006). I propose a general class of test statistics that can be used to test for the effect of age on extinction. These test statistics allow for a varying background rate of extinction and attempt to remove the effects of other covariates when assessing the effect of age on extinction. No model is assumed for the covariate effects. Instead, I control for covariate effects by pairing or grouping together similar species. I use simulated data sets to compare the power of the statistics. In applying the test statistics to the Foram data, I have found age to have a positive effect on extinction.
 Date Issued
 2010
 Identifier
 FSU_migr_etd0952
 Format
 Thesis
 Title
 Algorithmic Lung Nodule Analysis in Chest Tomography Images: Lung Nodule Malignancy Likelihood Prediction and a Statistical Extension of the Level Set Image Segmentation Method.
 Creator

Hancock, Matthew C. (Matthew Charles), Magnan, Jeronimo Francisco, Duke, D. W., Hurdal, Monica K., Mio, Washington, Florida State University, College of Arts and Sciences, Department of Mathematics
 Abstract/Description

Lung cancer has the highest mortality rate of all cancers in both men and women in the United States. The algorithmic detection, characterization, and diagnosis of abnormalities found in chest CT scan images can aid radiologists by providing additional medically relevant information to consider in their assessment of medical images. Such algorithms, if robustly validated in clinical settings, carry the potential to improve the health of the general population. In this thesis, we first give an analysis of publicly available chest CT scan annotation data, in which we determine upper bounds on the expected classification accuracy when certain radiological features are used as inputs to statistical learning algorithms for the purpose of inferring the likelihood of a lung nodule being either malignant or benign. Second, a statistical extension of the level set method for image segmentation is introduced and applied to both synthetically generated and real three-dimensional image volumes of lung nodules in chest CT scans, obtaining results comparable to the current state of the art on the latter.
 Date Issued
 2018
 Identifier
 2018_Sp_Hancock_fsu_0071E_14427
 Format
 Thesis
 Title
 Analysis of cross-classified data using negative binomial models.
 Creator

Ramakrishnan, Viswanathan., Florida State University
 Abstract/Description

Several procedures are available for analyzing cross-classified data under the Poisson model. When data suggest the presence of "non-Poisson" variation, an alternative model is desirable. Often a negative binomial model is useful as an alternative. In this dissertation, methodology for analyzing data under a two-parameter negative binomial model is provided. A conditional likelihood approach is suggested to simplify estimation and inference procedures. Large sample properties of the conditional likelihood approach are derived. Based on simulations, these properties are examined for small samples. The suggested methodology is applied to two sets of data from ecological research studies.
 Date Issued
 1989
 Identifier
 AAI9016503, 3161994, FSDT3161994, fsu:78193
 Format
 Document (PDF)
 Title
 An analysis of test reliability.
 Creator

Isaacson, Fenton R., Florida State University
 Abstract/Description

"The need for efficient means of testing has long been recognized. To obtain efficiency in testing requires the study of four attributes of the testing instrument, namely: reliability, validity, interpretability, and administrability. It is the purpose of this paper to examine in some detail the first of these attributes, reliability. In particular, this is an attempt to analyse the reliability of Mathematics 101 Test D, which was administered at Florida State University in the fall of 1948" (Introduction).
 Date Issued
 1949
 Identifier
 FSU_historic_AKP4870
 Format
 Thesis
 Title
 Bayesian Inference and Novel Models for Survival Data with Cured Fraction.
 Creator

Gupta, Cherry Chunqi Huang, Sinha, Debajyoti, Glueckauf, Robert L., Slate, Elizabeth H., Pati, Debdeep, Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

Existing cure-rate survival models are generally not convenient for modeling and estimating the survival quantiles of a patient with specified covariate values. They also do not allow inference on the change in the number of clonogens over time. This dissertation proposes two novel classes of cure-rate models, the transform-both-sides cure-rate model (TBSCRM) and the clonogen proliferation cure-rate model (CPCRM). Both can be used to make inference about both the cure-rate and the survival probabilities over time. The TBSCRM can also produce estimates of a patient's quantiles of survival time, and the CPCRM can produce estimates of a patient's expected number of clonogens at each time. We develop methods of Bayesian inference about the covariate effects on relevant quantities such as the cure-rate, methods which use Markov chain Monte Carlo (MCMC) tools. We also show that the TBSCRM-based and CPCRM-based Bayesian methods perform well in simulation studies and outperform existing cure-rate models in application to the breast cancer survival data from the National Cancer Institute's Surveillance, Epidemiology, and End Results (SEER) database.
 Date Issued
 2016
 Identifier
 FSU_2016SU_Gupta_fsu_0071E_13423
 Format
 Thesis
 Title
 Bayesian Methods for Skewed Response Including Longitudinal and Heteroscedastic Data.
 Creator

Tang, Yuanyuan, Sinha, Debajyoti, Pati, Debdeep, Flynn, Heather, She, Yiyuan, Lipsitz, Stuart, Zhang, Jinfeng, Department of Statistics, Florida State University
 Abstract/Description

Skewed response data are very common in practice, especially in the biomedical area. We begin our work with the skewed longitudinal response without heteroscedasticity. We extend the skewed error density to the multivariate response. Then we study heteroscedasticity. We extend the transform-both-sides model to the Bayesian variable selection area to handle the univariate skewed response, where the variance of the response is a function of the median. Finally, we propose a novel model to handle the skewed univariate response with a flexible heteroscedasticity. For longitudinal studies with heavily skewed continuous response, statistical models and methods focusing on the mean response are not appropriate. In this paper, we present a partial linear model of the median regression function of the skewed longitudinal response. We develop a semiparametric Bayesian estimation procedure using an appropriate Dirichlet process mixture prior for the skewed error distribution. We provide justifications for using our methods, including theoretical investigation of the support of the prior, asymptotic properties of the posterior, and simulation studies of finite sample properties. Ease of implementation and advantages of our model and method compared to existing methods are illustrated via analysis of a cardiotoxicity study of children of HIV-infected mothers. Our second aim is to develop a Bayesian simultaneous variable selection and estimation of median regression for a skewed response variable. Our hierarchical Bayesian model can incorporate advantages of the $l_0$ penalty for skewed and heteroscedastic error. Some preliminary simulation studies have been conducted to compare the performance of the proposed model and the existing frequentist median lasso regression model. Considering the estimation bias and total squared error, our proposed model performs as well as, or better than, competing frequentist estimators.
In biomedical studies, the covariates often affect the location, scale, as well as the shape of the skewed response distribution. Existing biostatistical literature mainly focuses on mean regression with a symmetric error distribution. While such modeling assumptions and methods are often deemed restrictive and inappropriate for a skewed response, completely nonparametric methods may lack a physical interpretation of the covariate effects. Existing nonparametric methods also lack an easily implementable computational tool. For a skewed response, we develop a novel model accommodating a nonparametric error density that depends on the covariates. The advantages of our associated semiparametric Bayes method include the ease of prior elicitation/determination, an easily implementable posterior computation, theoretically sound properties of the selection of priors, and accommodation of possible outliers. The practical advantages of the method are illustrated via a simulation study and an analysis of a real-life epidemiological study on the serum response to DDT exposure during the gestation period.
 Date Issued
 2013
 Identifier
 FSU_migr_etd7622
 Format
 Thesis
 Title
 Bayesian Modeling and Variable Selection for Complex Data.
 Creator

Li, Hanning, Pati, Debdeep, Huffer, Fred W. (Fred William), Kercheval, Alec N., Sinha, Debajyoti, Bradley, Jonathan R., Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

As we routinely encounter high-throughput datasets in complex biological and environmental research, developing novel models and methods for variable selection has received widespread attention. In this dissertation, we address a few key challenges in Bayesian modeling and variable selection for high-dimensional data with complex spatial structures. a) Most Bayesian variable selection methods are restricted to mixture priors having separate components for characterizing the signal and the noise. However, such priors encounter computational issues in high dimensions. This has motivated continuous shrinkage priors, resembling the two-component priors while facilitating computation and interpretability. While such priors are widely used for estimating high-dimensional sparse vectors, selecting a subset of variables remains a daunting task. b) Spatial/spatial-temporal data sets with complex structures are nowadays commonly encountered in various scientific research fields ranging from atmospheric sciences, forestry, environmental science, and biological science to social science. Selecting important spatial variables that have significant influences on occurrences of events is undoubtedly necessary and essential for providing insights to researchers. Self-excitation, the feature whereby the occurrence of an event increases the likelihood of more occurrences of the same type of event nearby in time and space, can be found in many natural/social events. Research on modeling data with the self-excitation feature has drawn increasing interest recently. However, the existing literature on self-exciting models with inclusion of high-dimensional spatial covariates is still underdeveloped. c) The Gaussian process is among the most powerful model frameworks for spatial data. Its major bottleneck is the computational complexity, which stems from inversion of the dense matrices associated with a Gaussian process covariance.
Hierarchical divide-and-conquer Gaussian process models have been investigated for ultra-large data sets. However, the computation associated with scaling the distributed computing algorithm to handle a large number of subgroups poses a serious bottleneck. In Chapter 2 of this dissertation, we propose a general approach for variable selection with shrinkage priors. The presence of very few tuning parameters makes our method attractive in comparison to ad hoc thresholding approaches. The applicability of the approach is not limited to continuous shrinkage priors, but it can be used along with any shrinkage prior. Theoretical properties for near-collinear design matrices are investigated, and the method is shown to have good performance in a wide range of synthetic data examples and in a real data example on selecting genes affecting survival due to lymphoma. In Chapter 3 of this dissertation, we propose a new self-exciting model that allows the inclusion of spatial covariates. We develop algorithms which are effective in obtaining accurate estimation and variable selection results in a variety of synthetic data examples. Our proposed model is applied to Chicago crime data, where the influence of various spatial features is investigated. In Chapter 4, we focus on a hierarchical Gaussian process regression model for ultra-high-dimensional spatial datasets. By evaluating the latent Gaussian process on a regular grid, we propose an efficient computational algorithm through circulant embedding. The latent Gaussian process borrows information across multiple subgroups, thereby obtaining a more accurate prediction. The hierarchical model and our proposed algorithm are studied through simulation examples.
 Date Issued
 2017
 Identifier
 FSU_FALL2017_Li_fsu_0071E_14159
 Format
 Thesis
 Title
 Bayesian Models for Capturing Heterogeneity in Discrete Data.
 Creator

Geng, Junxian, Slate, Elizabeth H., Pati, Debdeep, Schmertmann, Carl P., Zhang, Xin, Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

Population heterogeneity exists frequently in discrete data. Many Bayesian models perform reasonably well in capturing this subpopulation structure. Typically, the Dirichlet process mixture model (DPMM) and a variable-dimensional alternative that we refer to as the mixture of finite mixtures (MFM) model are used, as they both have natural byproducts of clustering derived from Polya urn schemes. The first part of this dissertation focuses on a model for the association between a binary response and binary predictors. The model incorporates Boolean combinations of predictors, called logic trees, as parameters arising from a DPMM or MFM. Joint modeling is proposed to solve the identifiability issue that arises when using a mixture model for a binary response. Different MCMC algorithms are introduced and compared for fitting these models. The second part of this dissertation is the application of the mixture of finite mixtures model to community detection problems. Here, the communities are analogous to the clusters in the earlier work. A probabilistic framework that allows simultaneous estimation of the number of clusters and the cluster configuration is proposed. We prove clustering consistency in this setting. We also illustrate the performance of these methods with simulation studies and discuss applications.
 Date Issued
 2017
 Identifier
 FSU_2017SP_Geng_fsu_0071E_13791
 Format
 Thesis
 Title
 Bayesian nonparametric estimation via Gibbs sampling for coherent systems with redundancy.
 Creator

Lawson, Kevin Lee., Florida State University
 Abstract/Description

We consider a coherent system S consisting of m independent components for which we do not know the distributions of the components' lifelengths. If we know the structure function of the system, then we can estimate the distribution of the system lifelength by estimating the distributions of the lifelengths of the individual components. Suppose that we can collect data under the 'autopsy model', wherein a system is run until a failure occurs and then the status (functioning or dead) of each component is obtained. This test is repeated n times. The autopsy statistics consist of the age of the system at the time of breakdown and the set of parts that are dead by the time of breakdown. Using the structure function and the recorded status of the components, we then classify the failure time of each component. We develop a nonparametric Bayesian estimate of the distributions of the component lifelengths and then use this to obtain an estimate of the distribution of the lifelength of the system. The procedure is applicable to machine-test settings wherein the machines have redundant designs. A parametric procedure is also given.
 Date Issued
 1994
 Identifier
 AAI9502812, 3088467, FSDT3088467, fsu:77272
 Format
 Document (PDF)
 Title
 A Bayesian Semiparametric Joint Model for Longitudinal and Survival Data.
 Creator

Wang, Pengpeng, Slate, Elizabeth H., Bradley, Jonathan R., Wetherby, Amy M., Lin, Lifeng, Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

Many biomedical studies monitor both a longitudinal marker and a survival time on each subject under study. Modeling these two endpoints as joint responses has the potential to improve the inference for both. We consider the approach of Brown and Ibrahim (2003), which proposes a Bayesian hierarchical semiparametric joint model. The model links the longitudinal and survival outcomes by incorporating the mean longitudinal trajectory as a predictor for the survival time. The usual parametric mixed effects model for the longitudinal trajectory is relaxed by using a Dirichlet process prior on the coefficients. A Cox proportional hazards model is then used for the survival time. The complicated joint likelihood increases the computational complexity. We develop a computationally efficient method by using a multivariate log-gamma distribution instead of a Gaussian distribution to model the data. We use Gibbs sampling combined with Neal's algorithm (2000) and the Metropolis-Hastings method for inference. Simulation studies illustrate the procedure and compare this log-gamma joint model with the Gaussian joint models. We apply this joint modeling method to human immunodeficiency virus (HIV) data and prostate-specific antigen (PSA) data.
 Date Issued
 2019
 Identifier
 2019_Spring_Wang_fsu_0071E_15120
 Format
 Thesis
 Title
 BAYESIAN SOLUTIONS TO SOME CLASSICAL PROBLEMS OF STATISTICS.
 Creator

PEREIRA, CARLOS ALBERTO DE BRAGANCA., Florida State University
 Abstract/Description

Three of the basic questions of Statistics may be stated as follows: (A) Which portion of the data X is actually informative about the parameter of interest (theta)? (B) How can all the relevant information about (theta) provided by the data X be extracted? (C) What kind of information about (theta) do the data X possess? The perspective of this dissertation is that of a Bayesian. Chapter I is essentially concerned with question A. The theory of conditional independence is explained, and the relations between ancillarity, sufficiency, and statistical independence are discussed in depth. Some related concepts like specific sufficiency, bounded completeness, and splitting sets are also studied in some detail. The language of conditional independence is used in the remaining chapters. Chapter II deals with question B for the particular problem of analysing categorical data with missing entries. It is demonstrated how a suitably chosen prior for the frequency parameters can streamline the analysis in the presence of missing entries due to nonresponse or other causes. The two cases where the data follow the Multinomial or the Multivariate Hypergeometric model are treated separately. In the first case it is adequate to restrict the prior (for the cell probabilities) to the class of Dirichlet distributions. In the Hypergeometric case it is convenient to select a prior (for the cell population frequencies) from the class of Dirichlet-Multinomial (DM) distributions. The DM distributions are studied in detail. Chapter III is directly related to question C. Conditions on the likelihood function and on the prior distribution are presented in order to assess the effect of the sample on the posterior distribution.
More specifically, it is shown that under certain conditions, the larger the observations obtained, the larger (stochastically in terms of the posterior distribution) is the appropriate parameter. Finally, Chapter IV deals with the characterization of distributions in terms of Blackwell comparison of experiments. It is shown that a result (for the Hypergeometric model) obtained in Chapter II is actually a consequence of a property of complete families of distributions.
 Date Issued
 1980
 Identifier
 AAI8108380, 3084857, FSDT3084857, fsu:74358
 Format
 Document (PDF)
 Title
 Bayesian Tractography Using Geometric Shape Priors.
 Creator

Dong, Xiaoming, Srivastava, Anuj, Klassen, E. (Eric), Wu, Wei, Huffer, Fred W. (Fred William), Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

Diffusion-weighted imaging (DWI) and tractography have been developed over decades and are key elements in recent large-scale efforts to map the human brain. Together, the two techniques provide a unique possibility to access the macroscopic structure and connectivity of the human brain non-invasively and in vivo. The information obtained not only helps visualize brain connectivity and segment the brain into different functional areas, but also provides tools for understanding major cognitive diseases such as multiple sclerosis, schizophrenia, and epilepsy. Much effort has been put into this area. On the one hand, a vast spectrum of tractography algorithms has been developed in recent years, ranging from deterministic approaches through probabilistic methods to global tractography; on the other hand, various mathematical models, such as the diffusion tensor, multi-tensor models, spherical deconvolution, and Q-ball modeling, have been developed to better exploit the acquisition-dependent DWI signal. Despite considerable progress in this area, current methods still face many challenges, such as sensitivity to noise, numerous false positive/negative fibers, inability to handle complex fiber geometry, and high computational cost. More importantly, recent research has shown that even with high-quality data the results of current tractography methods may not improve, suggesting that an anatomically accurate map of the human brain is unlikely to be obtained solely from the diffusion profile. Motivated by these issues, this dissertation develops a global approach that incorporates anatomically validated geometric shape priors when reconstructing neuronal fibers. The fiber tracts between regions of interest are initialized and updated via deformations based on gradients of the posterior energy defined in this work. This energy has contributions from the diffusion data, shape prior information, and a roughness penalty. The dissertation first describes and demonstrates the proposed method on a 2D dataset and then extends it to 3D phantom data and real brain data. The results show that the proposed method is relatively immune to issues such as noise, complicated fiber structures like fiber crossings and kissings, and false positive fibers, and achieves more explainable tractography results.
 Date Issued
 2019
 Identifier
 2019_Spring_DONG_fsu_0071E_15144
 Format
 Thesis
 Title
 Building a Model Performance Measure for Examining Clinical Relevance Using Net Benefit Curves.
 Creator

Mukherjee, Anwesha, McGee, Daniel, Hurt, Myra M., Slate, Elizabeth H., Sinha, Debajyoti, Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

ROC curves are often used to evaluate the predictive accuracy of statistical prediction models. This thesis studies other measures which incorporate not only the statistical but also the clinical consequences of using a particular prediction model. Depending on the disease and population under study, the misclassification costs of false positives and false negatives vary. Decision Curve Analysis (DCA) takes this cost into account by using the threshold probability (the probability above which a patient opts for treatment). Using the DCA technique, a net benefit curve is built by plotting "Net Benefit", a function of the expected benefit and expected harm of using a model, against the threshold probability. Only the threshold probability range relevant to the disease and population under study is used to plot the net benefit curve, so as to obtain optimal results from a particular statistical model. This thesis concentrates on constructing a summary measure to determine which predictive model yields the highest net benefit. The most intuitive approach is to calculate the area under the net benefit curve. We examined whether using weights, such as the estimated empirical distribution of the threshold probability, to compute a weighted area under the curve creates a better summary measure. Real data from multiple cardiovascular research studies, the Diverse Population Collaboration (DPC) datasets, are used to compute the summary measures: area under the ROC curve (AUROC), area under the net benefit curve (ANBC), and weighted area under the net benefit curve (WANBC). The results of the analysis are used to compare these measures, to examine whether they agree with each other, and to determine which would be best to use in specified clinical scenarios. For different models the summary measures and their standard errors (SE) were calculated to study the variability in the measures. The method of meta-analysis is used to summarize these estimated summary measures to reveal whether there is significant variability among these studies.
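The "Net Benefit" quantity described in this abstract has a standard Decision Curve Analysis form, NB(pt) = TP/n − (FP/n)·pt/(1 − pt). A minimal sketch follows; the function names are hypothetical, and the thesis's weighted summary measure is not reproduced here.

```python
# Net benefit at threshold probability pt (Decision Curve Analysis):
#   NB(pt) = TP/n - (FP/n) * pt / (1 - pt)
# y: true 0/1 outcomes; p: predicted probabilities from a model.
# A sketch under the standard DCA definition; the thesis's ANBC/WANBC
# summary measures are more elaborate than the simple area below.

def net_benefit(y, p, pt):
    n = len(y)
    tp = sum(1 for yi, pi in zip(y, p) if pi >= pt and yi == 1)
    fp = sum(1 for yi, pi in zip(y, p) if pi >= pt and yi == 0)
    return tp / n - (fp / n) * pt / (1 - pt)

def area_under_nb(y, p, thresholds):
    """Unweighted area over a threshold range via the trapezoidal rule."""
    nbs = [net_benefit(y, p, t) for t in thresholds]
    return sum((nbs[i] + nbs[i + 1]) / 2 * (thresholds[i + 1] - thresholds[i])
               for i in range(len(thresholds) - 1))
```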
 Date Issued
 2018
 Identifier
 2018_Sp_Mukherjee_fsu_0071E_14350
 Format
 Thesis
 Title
 A Class of Semiparametric Volatility Models with Applications to Financial Time Series.
 Creator

Chung, Steve S., Niu, XuFeng, Gallivan, Kyle, Sinha, Debajyoti, Wu, Wei, Department of Statistics, Florida State University
 Abstract/Description

The autoregressive conditional heteroskedasticity (ARCH) and generalized autoregressive conditional heteroskedasticity (GARCH) models capture the dependency of the conditional second moments. The idea behind ARCH/GARCH models is quite intuitive: in ARCH models, past squared innovations describe the present squared volatility; in GARCH models, both past squared innovations and past squared volatilities define the present volatility. Since their introduction they have been extensively studied and well documented in the financial and econometric literature, and many variants of ARCH/GARCH models have been proposed. To list a few, these include exponential GARCH (EGARCH), GJR-GARCH (or threshold GARCH), integrated GARCH (IGARCH), quadratic GARCH (QGARCH), and fractionally integrated GARCH (FIGARCH). The ARCH/GARCH models and their variants have gained a great deal of attention and remain a popular choice for modeling volatility. Despite their popularity, they suffer from limited model flexibility: volatility is a latent variable, and imposing a specific model structure violates this latency assumption. Recently, several attempts have been made to ease the strict structural assumptions on volatility, and both nonparametric and semiparametric volatility models have been proposed in the literature. We review and discuss these modeling techniques in detail. In this dissertation, we propose a class of semiparametric multiplicative volatility models in which the volatility is defined as a product of parametric and nonparametric parts. Due to the positivity restriction, we apply log and square transformations to the volatility. We assume that the parametric part is GARCH(1,1), which serves as an initial guess of the volatility, and we estimate the GARCH(1,1) parameters by the conditional likelihood method. The nonparametric part assumes an additive structure; some interpretability may be lost by assuming additivity, but we gain flexibility. Each additive component is constructed from a sieve of Bernstein basis polynomials and acts as an improvement on the parametric component. The model is estimated by an iterative algorithm based on boosting. We modified the boosting algorithm (the one given in Friedman 2001) so that it uses a penalized least squares method. As penalty functions we tried LASSO, ridge, and elastic net penalties, and found that in our simulations and application the ridge penalty worked best. Our semiparametric multiplicative volatility model is evaluated using simulations and applied to six major exchange rates and the S&P 500 index. The results show that the proposed model outperforms existing volatility models in both in-sample estimation and out-of-sample prediction.
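The GARCH(1,1) recursion that serves as the parametric part can be sketched as follows. This shows only the variance-filtering step; the conditional-likelihood estimation and the nonparametric Bernstein-polynomial correction described in the abstract are not reproduced here.

```python
# GARCH(1,1) conditional variance recursion:
#   sigma2_t = omega + alpha * eps_{t-1}^2 + beta * sigma2_{t-1}
# A filtering sketch given residuals eps and fixed parameters; the
# dissertation fits (omega, alpha, beta) by conditional likelihood and
# then corrects the volatility with a nonparametric additive component.

def garch11_filter(eps, omega, alpha, beta):
    """Return the conditional variance series implied by residuals eps."""
    # Initialize at the unconditional variance omega / (1 - alpha - beta),
    # which exists when alpha + beta < 1.
    sigma2 = [omega / (1 - alpha - beta)]
    for t in range(1, len(eps)):
        sigma2.append(omega + alpha * eps[t - 1] ** 2 + beta * sigma2[t - 1])
    return sigma2
```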
 Date Issued
 2014
 Identifier
 FSU_migr_etd8756
 Format
 Thesis
 Title
 Comparative mRNA Expression Analysis Leveraging Known Biochemical Interactions.
 Creator

Steppi, Albert Joseph, Zhang, Jinfeng, Sang, QingXiang, Wu, Wei, Niu, Xufeng, Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

We present two studies incorporating existing biological knowledge into differential gene expression analysis in an attempt to place the results within a broader biological context. The studies investigate breast cancer health disparities between differing ethnic groups by comparing gene expression levels in tumor samples from patients from different ethnic populations. We incorporate existing knowledge by making comparisons not just between individual genes, but between sets of related genes and networks of interacting genes. In the first study, a comparison is made between mRNA expression patterns in Asian and Caucasian American breast cancer samples in an attempt to better understand why breast cancer incidence and mortality rates are significantly lower in Asian Americans than in Caucasian Americans. In the second study, the expression levels of genes related to drug and xenobiotic metabolizing enzymes (DXME) are compared between African, Asian, and Caucasian American breast cancer patients. The expression of genes related to these enzymes has been found to significantly affect drug clearance and the onset of drug resistance. Both studies found differentially expressed genes and pathways that may be associated with health disparities between the three ethnic populations. A thorough investigation of the literature was made in order to understand the context in which these differences in gene expression could affect the development and progression of breast tumors, and to identify genes and pathways that may be differentially expressed between the ethnic groups in general but not associated with breast cancer. Many of the relevant differences in gene expression were found to be linked to factors such as diet and differences in body composition. The process of finding relevant pathways and sets of interacting genes to inform comparative mRNA expression analysis can be laborious and time-consuming. The literature is expanding at an exponential rate, and there is little hope for research groups to keep up with all of the latest research. It is becoming more common for journals to require authors to make their results available in public databases, but many results concerning biochemical interactions are accessible only in unstructured text. Extracting relationships and interactions from the biological literature using techniques from machine learning and natural language processing is an important and growing field of research. To gain a better understanding of this field, we participated in the BioCreative VI Track 4 challenge, which involved classifying PubMed abstracts that contain examples of protein-protein interactions that are affected by a mutation. We discuss the model we developed and the lessons learned while participating in the competition. The problem of acquiring sufficient quantities of quality labeled data is a great obstacle to improving performance. We present a web application we are developing to streamline the annotation of entity-entity interactions in text. It makes use of a database of known interactions to locate passages that are likely to be relevant and offers a simple and concise user interface to minimize the cognitive burden on the annotator.
 Date Issued
 2018
 Identifier
 2018_Sp_Steppi_fsu_0071E_14522
 Format
 Thesis
 Title
 A Comparison of Estimators in Hierarchical Linear Modeling: Restricted Maximum Likelihood versus Bootstrap via Minimum Norm Quadratic Unbiased Estimators.
 Creator

Delpish, Ayesha Nneka, Niu, XuFeng, Tate, Richard L., Huffer, Fred W., Zahn, Douglas, Department of Statistics, Florida State University
 Abstract/Description

The purpose of the study was to investigate the relative performance of two estimation procedures, restricted maximum likelihood (REML) and the bootstrap via MINQUE, for a two-level hierarchical linear model under a variety of conditions. Specific focus lay on observing whether the bootstrap via MINQUE procedure offered improved accuracy in the estimation of the model parameters and their standard errors in situations where normality may not be guaranteed. Through Monte Carlo simulations, the importance of this assumption for the accuracy of multilevel parameter estimates and their standard errors was assessed using the accuracy index of relative bias and by observing the coverage percentages of 95% confidence intervals constructed for both estimation procedures. The study systematically varied the number of groups at level 2 (30 versus 100), the size of the intraclass correlation (0.01 versus 0.20), and the distribution of the observations (normal versus chi-squared with 1 degree of freedom). The number-of-groups and intraclass-correlation factors produced effects consistent with those previously reported: as the number of groups increased, the bias in the parameter estimates decreased, with a more significant effect observed for those estimates obtained via REML. High levels of the intraclass correlation also led to a decrease in the efficiency of parameter estimation under both methods. Study results show that while both the restricted maximum likelihood and the bootstrap via MINQUE estimates of the fixed effects were accurate, the efficiency of the estimates was affected by the distribution of errors, with the bootstrap via MINQUE procedure outperforming REML. Both procedures produced less efficient estimators under the chi-squared distribution, particularly for the variance-covariance component estimates.
 Date Issued
 2006
 Identifier
 FSU_migr_etd0771
 Format
 Thesis
 Title
 A comparison of robust and least squares regression models using actual and simulated data.
 Creator

Gilbert, Scott Alan., Florida State University
 Abstract/Description

The purpose of this study was to compare several robust regression techniques to ordinary least squares (OLS) regression when analyzing bivariate and multivariate data. The bivariate analysis compared the performance of alternative robust procedures with standard OLS regression techniques in regard to the detection of outliers, and demonstrated the weaknesses of OLS regression and the standard OLS outlier diagnostic techniques when multiple outliers are present. In addition, this research assessed the empirical performance of alpha and power under three non-normal probability density functions using a Monte Carlo simulation. The first analysis focused on several bivariate data sets. Each data set was plotted and each of the regression models was used to analyze the data. The usual results (e.g., R², regression coefficients, standard errors, and regression diagnostics) were examined to give a visual as well as empirical analysis of the models' performance in the presence of multiple outliers. The second component of this study entailed a Monte Carlo simulation of five robust regression models and OLS regression under four probability density functions. The variables included in the study were placed in one 2¹3² and two 3² factorial designs repeated over four probability density functions, resulting in a total of 90 experimental runs of the Monte Carlo simulation. Random samples were generated and then transformed to fit the desired distributional moment characteristics. The incremental null hypothesis was used as the basis for calculating empirical alpha and power values. The analysis demonstrated the inadequacies of the standard OLS-based outlier detection methods and explained how regression analysis could be improved if a robust regression method is used in parallel with OLS regression. The multivariate analysis demonstrated the robustness of the OLS regression model to three non-normal populations. It further demonstrated a moderate inflation of alpha for the M-class of robust regression models and a lack of power stability with the rank transform regression method. Based on the results of this study, recommendations were made for using robust regression methods, and suggestions for future research were offered.
 Date Issued
 1992
 Identifier
 AAI9222385, 3087822, FSDT3087822, fsu:76632
 Format
 Document (PDF)
 Title
 THE COMPARISON OF SENSITIVITIES OF EXPERIMENTS (MAXIMUM LIKELIHOOD, RANDOM, FIXED, ANALYSIS OF VARIANCE).
 Creator

YOUNG, BARBARA NELSON., Florida State University
 Abstract/Description

The sensitivity of a measurement technique is defined to be its ability to detect differences among the treatments in a fixed effects design, or the presence of a between-treatments component of variance in a random effects design. Consider an experiment, consisting of two identical sub-experiments, designed specifically for the purpose of comparing two measurement techniques. It is assumed that the techniques of analysis of variance are applicable in analyzing the data obtained from the two measurement techniques. The sub-experiments may have either fixed or random treatment effects in either one-way or general block designs. It is assumed that the experiment yields bivariate observations from the two measurement methods which may or may not be independent. Likelihood ratio tests are used in the various settings of this dissertation both to extend current techniques and to provide alternative methods for comparing the sensitivities of experiments.
 Date Issued
 1985
 Identifier
 AAI8524629, 3086182, FSDT3086182, fsu:75665
 Format
 Document (PDF)
 Title
 A Comparison of Three Approaches to Confidence Interval Estimation for Coefficient Omega.
 Creator

Xu, Jie, Yang, Yanyun, Becker, Betsy Jane, Almond, Russell G., Florida State University, College of Education, Department of Educational Psychology and Learning Systems
 Abstract/Description

Coefficient Omega was introduced by McDonald (1978) as a reliability coefficient of composite scores for the congeneric model. Interval estimation (Neyman, 1937) for coefficient Omega provides a range of plausible values which is likely to capture the population reliability of composite scores. The Wald method, the likelihood method, and the bias-corrected and accelerated (BCa) bootstrap method are three methods for constructing a confidence interval for coefficient Omega (e.g., Cheung, 2009b; Kelley & Cheng, 2012; Raykov, 2002, 2004, 2009; Raykov & Marcoulides, 2004; Padilla & Divers, 2013). Only a very limited number of studies evaluating these three methods can be found in the literature (e.g., Cheung, 2007, 2009a, 2009b; Kelley & Cheng, 2012; Padilla & Divers, 2013), and no simulation study has been conducted to evaluate the performance of all three methods for interval construction on coefficient Omega. In the current simulation study, I assessed these three methods by comparing their empirical performance on interval estimation for coefficient Omega. Four factors were included in the simulation design: sample size, number of items, factor loading, and degree of non-normality. Two thousand datasets were generated in R 2.15.0 (R Core Team, 2012) for each condition. For each generated dataset, the three approaches (i.e., the Wald method, likelihood method, and bias-corrected and accelerated bootstrap method) were used to construct a 95% confidence interval for coefficient Omega in R 2.15.0. The results showed that when the data were multivariate normally distributed, the three methods performed equally well and coverage probabilities were very close to the pre-specified .95 confidence level. When the data were multivariate non-normally distributed, coverage probabilities decreased and interval widths became wider for all three methods as the degree of non-normality increased. In general, when the data departed from multivariate normality, the BCa bootstrap method performed better than the other two methods, with relatively higher coverage probabilities, while the Wald and likelihood methods were comparable and yielded narrower interval widths than the BCa bootstrap method.
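McDonald's coefficient Omega for the congeneric model is (Σλ)² / ((Σλ)² + Σθ), where λ are the factor loadings and θ the error variances. A minimal sketch from known parameter values follows; the study instead estimates these quantities from data and builds Wald, likelihood, or BCa bootstrap intervals around the estimate.

```python
# McDonald's coefficient Omega for a congeneric model with a
# unit-variance factor:
#   omega = (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances)
# A sketch from assumed known parameters, not an interval-estimation
# procedure; the function name is hypothetical.

def coefficient_omega(loadings, error_variances):
    s = sum(loadings)
    return s * s / (s * s + sum(error_variances))
```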
 Date Issued
 2014
 Identifier
 FSU_migr_etd9269
 Format
 Thesis
 Title
 A comparison of two methods of bootstrapping in a reliability model.
 Creator

Chiang, YuangChin., Florida State University
 Abstract/Description

We consider bootstrapping in the following reliability model, which was considered by Doss, Freitag, and Proschan (1987). Available for testing is a sample of iid systems, each having the same structure of m independent components. Each system is continuously observed until it fails. For every component in each system, either a failure time or a censoring time is recorded: a failure time is recorded if the component fails before or at the time of system failure; otherwise a censoring time is recorded. To estimate the distributions of the component lifelengths F1, ..., Fm, one can formally compute the Kaplan-Meier estimates F̂1, ..., F̂m. Various quantities of interest, such as the probability that a new system will survive time t0, may then be estimated by combining F̂1, ..., F̂m in a suitable way. In this model, bootstrapping can be carried out in two different ways. One can resample n systems at random from the original n systems. Alternatively, one can construct artificial systems by generating independent random lifelengths from the Kaplan-Meier estimates F̂j, and from those form artificial data. The two methods are distinct. We show that asymptotically, bootstrapping by either method yields correct answers. We also compare the two methods via simulation studies.
 Date Issued
 1988
 Identifier
 AAI8906216, 3161719, FSDT3161719, fsu:77918
 Format
 Document (PDF)
 Title
 The computation of probabilities which involve spacings, with applications to the scan statistic.
 Creator

Lin, ChienTai., Florida State University
 Abstract/Description

We develop a methodology for evaluating probabilities which involve linear combinations of spacings and then present some applications of this methodology. The basic idea underlying our method was given by Huffer (1988): a recursion is used to break up the joint distribution of several linear combinations of spacings into a sum of simpler components. The same recursion is then applied to each of these components, and so on. The process is continued until we obtain components which are simple and easily expressed in closed form. We describe algorithms and a computer program (written in C) which implement this approach. Our approach has two advantages. First, it is fairly general and can be used to solve a variety of problems involving linear combinations of spacings. Secondly, because the output of our procedure is a polynomial whose coefficients are computed exactly, we can supply numerical answers which are accurate to any required degree of precision. We apply our program to compute the distribution of the scan statistic for small sample sizes. We also use the recursion and computer program to calculate the lower-order moments of the number of clumps in randomly distributed points. We can use these moments to obtain bounds and approximations for the distribution of the scan statistic. Our approximations are based on fitting a compound Poisson distribution to the moments of the number of clumps.
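The scan statistic whose small-sample distribution the dissertation computes exactly via the spacings recursion can be cross-checked by crude simulation. A rough Monte Carlo sketch, with hypothetical function names; it does not reproduce the exact polynomial method described above.

```python
# For n points uniform on (0, 1), the scan statistic S_w is the largest
# number of points falling in any window of width w. The dissertation
# obtains its distribution exactly; simulation gives only a rough check.

import random

def scan_statistic(points, w):
    """Maximum number of points in any window [x, x + w]."""
    pts = sorted(points)
    n = len(pts)
    best = 0
    for i in range(n):
        # Windows starting at an observed point suffice for the maximum.
        j = i
        while j < n and pts[j] <= pts[i] + w:
            j += 1
        best = max(best, j - i)
    return best

def prob_scan_at_least(n, w, k, reps=2000, seed=0):
    """Monte Carlo estimate of P(S_w >= k) for n uniform points."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(reps)
               if scan_statistic([rng.random() for _ in range(n)], w) >= k)
    return hits / reps
```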
 Date Issued
 1993
 Identifier
 AAI9416150, 3088291, FSDT3088291, fsu:77095
 Format
 Document (PDF)
 Title
 Conditional bootstrap methods for censored data.
 Creator

Kim, JiHyun., Florida State University
 Abstract/Description

We first consider the random censorship model of survival analysis. The pairs of positive random variables (Xi, Yi), i = 1, ..., n, are independent and identically distributed, with distribution functions F(t) = P(Xi ≤ t) and G(t) = P(Yi ≤ t), and the Y's are independent of the X's. We observe only (Ti, δi), i = 1, ..., n, where Ti = min(Xi, Yi) and δi = I(Xi ≤ Yi). The X's represent survival times, the Y's represent censoring times. Efron (1981) proposed two bootstrap methods for the random censorship model and showed that they are distributionally the same. Akritas (1986) established the weak convergence of the bootstrapped Kaplan-Meier estimator of F when bootstrapping is done by this method. Let us now consider bootstrapping more closely. Suppose that we wish to estimate the variance of F̂(t). If we knew the Y's, then we would condition on them by the ancillarity principle, since the distribution of the Y's does not depend on F. That is, we would want to estimate Var{F̂(t) | Y1, ..., Yn}. Unfortunately, in the random censorship model we do not see all the Y's: if δi = 0 we see the exact value of Yi, but if δi = 1 we know only that Yi > Ti. Let us denote this information on the Y's by C. Thus, what we want to estimate is Var{F̂(t) | C}. Efron's scheme is appropriate for estimating the unconditional variance. We propose a new bootstrap method which provides an estimate of Var{F̂(t) | C}. In this research we show that the Kaplan-Meier estimator of F formed by the new bootstrap method has the same limiting distribution as the one formed by Efron's approach. The results of simulation studies assessing the small-sample performance of the two bootstrap methods are reported. We also consider the model in which the Xi's are censored by the Yi's and also by known fixed constants, and propose an appropriate bootstrap method for that model. This bootstrap method is a readily modified version of the new bootstrap method above.
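Efron's (1981) "simple" bootstrap referred to in this abstract resamples the observed pairs (Ti, δi) with replacement and recomputes the Kaplan-Meier estimator on each resample. A minimal sketch for contrast, with hypothetical function names; the conditional scheme proposed in the dissertation is not implemented here.

```python
# Efron's simple bootstrap for right-censored data: resample the
# observed (time, status) pairs and recompute the Kaplan-Meier
# estimator. This targets the unconditional variance of F-hat(t),
# not the conditional variance Var{F-hat(t) | C} of the dissertation.

import random

def kaplan_meier(times, deltas, t):
    """Kaplan-Meier estimate of the survival function S(t) = P(X > t)."""
    surv = 1.0
    for u in sorted(set(times)):
        if u > t:
            break
        at_risk = sum(1 for ti in times if ti >= u)
        deaths = sum(1 for ti, di in zip(times, deltas) if ti == u and di == 1)
        if at_risk > 0:
            surv *= 1 - deaths / at_risk
    return surv

def bootstrap_km_variance(times, deltas, t, reps=500, seed=0):
    """Bootstrap variance of the Kaplan-Meier survival estimate at t."""
    rng = random.Random(seed)
    n = len(times)
    ests = []
    for _ in range(reps):
        idx = [rng.randrange(n) for _ in range(n)]
        ests.append(kaplan_meier([times[i] for i in idx],
                                 [deltas[i] for i in idx], t))
    m = sum(ests) / reps
    return sum((e - m) ** 2 for e in ests) / (reps - 1)
```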
 Date Issued
 1990
 Identifier
 AAI9113938, 3162201, FSDT3162201, fsu:78399
 Format
 Document (PDF)
 Title
 Contributions to the theory of arrangement increasing functions.
 Creator

Proschan, Michael Arthur., Florida State University
 Abstract/Description

A function f(x) which increases each time we transpose an out-of-order pair of coordinates, x_j > x_k for some j < k, is called arrangement increasing (AI): such a function increases whenever we correct an out-of-order pair x_j > x_k by transposing the two x coordinates. The theory of AI functions is tailor-made for ranking and selection problems, in which case we assume that the density f(θ, x) of observations with respective parameters θ1, ..., θn is AI, and the goal is to determine the largest or smallest parameters. In this dissertation we present new applications of AI functions in such areas as biology and reliability, and we generalize the notion of AI functions. We consider multivector extensions, some with and one without respect to parameter vectors, and we connect these. Another generalization (TEGO) is motivated by the connection between total positivity (TP) and AI; TEGO results are shown to imply AI and TP results. We also define and develop a partial ordering on densities of rank vectors. The theory, which involves finding the extreme points of the convex set of AI rank densities, is then used to establish some power results of rank tests.
 Date Issued
 1989
 Identifier
 AAI9002934, 3161869, FSDT3161869, fsu:78068
 Format
 Document (PDF)
 Title
 Critical Issues in Survey Meta-Analysis.
 Creator

Gozutok, Ahmet Serhat, Becker, Betsy Jane, Huffer, Fred W., Yang, Yanyun, Paek, Insu, Florida State University, College of Education, Department of Educational Psychology and Learning Systems
 Abstract/Description

In research synthesis, researchers may aim to summarize people's attitudes and perceptions of phenomena that have been assessed using different measures. Self-report rating scales are among the most commonly used measurement tools for quantifying such latent constructs in education and psychology. However, self-report rating-scale questions measuring the same construct may differ from each other in many ways: scale format, number of response options, wording of questions, and labeling of response-option categories may vary across questions. Consequently, variations across measures of the same construct raise the issue of comparability of results across studies in meta-analytic investigations. In this study, I examine the complexities of summarizing the results of different survey questions about the same construct in a meta-analytic fashion. More specifically, this study focuses on the practical problems that arise when combining survey items that differ from one another in the wording of question stems, numbers of response-option categories, scale direction (i.e., unipolar and bipolar scales), response-scale labeling (i.e., fully-labeled scales and endpoints-labeled scales), and response-option labeling (e.g., "extremely happy" / "completely happy" / "most happy", "pretty happy" / "quite happy" / "moderately happy", and "not at all happy" / "least happy" / "most unhappy"). In addition, I propose practical solutions to handle the issues that arise from such variations when conducting a meta-analysis, and I discuss the implications of the proposed solutions from the perspective of meta-analysis. Examples are obtained from the collection of studies in the World Happiness Database (Veenhoven, 2006), which includes various single-item happiness measures.
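One common illustrative remedy for differing numbers of response options (a textbook linear-stretch transformation, not necessarily among the solutions this dissertation proposes) is to map each scale onto a common range before comparing means:

```python
def rescale_mean(mean, k, lo_new=0.0, hi_new=10.0):
    """Linearly map a mean observed on a 1..k response scale onto a
    common [lo_new, hi_new] range, so that means from surveys with
    different numbers of response options become comparable."""
    if k < 2:
        raise ValueError("need at least a 2-point scale")
    return lo_new + (mean - 1.0) / (k - 1.0) * (hi_new - lo_new)

# A mean of 4 on a 1-5 scale and a mean of 7 on a 1-9 scale
# land on the same point of the common 0-10 range:
print(rescale_mean(4, 5))  # 7.5
print(rescale_mean(7, 9))  # 7.5
```

Note that a purely linear map ignores the labeling and polarity issues the abstract raises; it addresses only the number-of-options problem.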
 Date Issued
 2018
 Identifier
 2018_Fall_Gozutok_fsu_0071E_14866
 Format
 Thesis
 Title
 Cumulative regression function methods in survival analysis and time series.
 Creator

Zhang, Mei-Jie., Florida State University
 Abstract/Description

One may estimate a conditional hazard function from grouped (and possibly censored) survival data by the time- and covariate-specific occurrence/exposure rate. Asymptotic results for cumulative versions of this estimator are developed, utilizing the general framework of counting processes. In particular, a grouped-data-based goodness-of-fit test for Cox's proportional hazards model is given. Various constraints on the asymptotic behavior of the widths of the calendar periods and covariate strata employed in grouping the data are needed to prove the results. Actual performance of the estimators and test statistics is evaluated by Monte Carlo methods. We also consider the problem of identifying the class of time series model to which a series belongs based on observation of part of the series. Techniques of nonparametric estimation have been applied to this problem by Auestad and Tjostheim (Biometrika 77 (1990): 669-687), who used kernel estimates of the one-step lagged conditional mean and variance functions. We study cumulative versions of such estimates. These are more stable than the kernel estimates and can be used to construct confidence bands for the underlying cumulative mean and variance functions. Goodness-of-fit tests for specific parametric models are also developed.
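The occurrence/exposure rate mentioned above can be illustrated with a small sketch (the function name, the grouping scheme, and the omission of covariate strata are our simplifying assumptions): within each calendar interval, observed events are divided by the person-time at risk, and the resulting rates are accumulated into a cumulative hazard estimate.

```python
def cumulative_hazard_grouped(times, events, bin_edges):
    """Occurrence/exposure estimate of the cumulative hazard from
    grouped, right-censored survival data.

    times:     observed time (event or censoring) for each subject
    events:    1 if the event was observed, 0 if censored
    bin_edges: boundaries of the grouping (calendar-period) intervals
    """
    H = [0.0]
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        # occurrences: observed events falling in [lo, hi)
        occ = sum(1 for t, d in zip(times, events) if d == 1 and lo <= t < hi)
        # exposure: total person-time spent at risk inside [lo, hi)
        exposure = sum(max(0.0, min(t, hi) - lo) for t in times)
        rate = occ / exposure if exposure > 0 else 0.0  # occurrence/exposure rate
        H.append(H[-1] + rate * (hi - lo))  # accumulate over the interval
    return H
```

For example, with subjects observed to 0.5 (event), 1.5 (event), and 2.5 (censored) on intervals [0,1), [1,2), [2,3), the first interval contributes 1 event over 2.5 person-years at risk.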
 Date Issued
 1991
 Identifier
 AAI9202323, 3087663, FSDT3087663, fsu:76478
 Format
 Document (PDF)
 Title
 DETERMINING A SUFFICIENT LEVEL OF INTERRATER RELIABILITY (POWER ANALYSIS, MISCLASSIFICATION, SAMPLE SIZE).
 Creator

RASP, JOHN M., Florida State University
 Abstract/Description

The reliability of a test or measurement procedure is, generally speaking, an index of the consistency of its results. Interrater reliability assesses the consistency of judgments among a set of raters. We model the observation taken on a subject by an unreliable procedure as the sum of a true score with mean $\mu$ and variance $\sigma_T^2$ and an error term with mean 0 and variance $\sigma_E^2$. The reliability coefficient then is $\rho = \sigma_T^2 / (\sigma_T^2 + \sigma_E^2)$. The reliability of an instrument or rating procedure is generally evaluated in an initial experiment (or series of experiments) known as a "reliability study." Once an instrument is established as having some degree of reliability, it is then used as a measurement tool in subsequent research, known as "decision studies." An unreliable procedure measures imperfectly. The impact of the error in measurement is investigated as it relates to three broad areas of statistical procedures: estimation, hypothesis testing, and decision-making. An unreliable measurement decreases the precision of estimates. The effect of an unreliable measurement on the width of a confidence interval for the population mean is examined. Also, an expression is developed to facilitate estimation of the reliability of a test or measurement in a decision study when the populations of interest may differ from those in the reliability study. An unreliable instrument weakens hypothesis tests. The extent to which lack of reliability attenuates the power of the two-sample t-test, the F-test in the analysis of variance, and the t-test for statistically significant correlation between two variables is investigated. An unreliable measurement engenders false classifications. A dichotomous decision is considered, and expressions for the probability of misclassifying a subject by a rating procedure with a given reliability are developed.
Overall as well as directional misclassification rates are found under the model of true scores and errors distributed as independent normals. Effects of departures from this model, by heavy-tailed and skewed true-score and error distributions, and by errors whose variance is a function of the true score, are considered. A general expression for this misclassification probability is found. A confidence interval for the misclassification probability is developed. These results provide tools to help a researcher make better-informed decisions concerning the design of an experiment: they permit the costs of increased reliability to be compared more knowledgeably with the consequences of using an unreliable measurement procedure in a given situation.
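The reliability coefficient and the misclassification probability for a dichotomous decision can be sketched as follows. The standardization to unit total variance and the Monte Carlo estimate are illustrative choices of ours; the dissertation derives closed-form expressions instead.

```python
import math
import random

def reliability(sigma_T2, sigma_E2):
    """Reliability coefficient rho = sigma_T^2 / (sigma_T^2 + sigma_E^2)."""
    return sigma_T2 / (sigma_T2 + sigma_E2)

def misclassification_prob(rho, cutoff, n=50_000, seed=0):
    """Monte Carlo estimate of the probability that the true score T and
    the observed score X = T + E fall on opposite sides of `cutoff`.
    Scores are standardized so Var(X) = 1: T ~ N(0, rho) and
    E ~ N(0, 1 - rho), which makes the reliability exactly rho."""
    rng = random.Random(seed)
    s_t, s_e = math.sqrt(rho), math.sqrt(1.0 - rho)
    wrong = 0
    for _ in range(n):
        t = rng.gauss(0.0, s_t)       # true score
        x = t + rng.gauss(0.0, s_e)   # observed (unreliable) score
        if (t > cutoff) != (x > cutoff):
            wrong += 1
    return wrong / n

# A perfectly reliable procedure (rho = 1) never misclassifies:
print(misclassification_prob(1.0, 0.0))  # 0.0
```

As the abstract argues, such calculations let the cost of raising $\rho$ be weighed against the misclassification rate incurred at a given cutoff.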
 Date Issued
 1984
 Identifier
 AAI8416723, 3085837, FSDT3085837, fsu:75324
 Format
 Document (PDF)
 Title
 Developing SRSF Shape Analysis Techniques for Applications in Neuroscience and Genomics.
 Creator

Wesolowski, Sergiusz, Wu, Wei, Bertram, R. (Richard), Srivastava, Anuj, Beerli, Peter, Mio, Washington, Florida State University, College of Arts and Sciences, Department of Mathematics
 Abstract/Description

This dissertation focuses on exploring the capabilities of the SRSF statistical shape analysis framework through various applications. Each application gives rise to a specific mathematical shape analysis model. The theoretical investigation of the models, driven by real data problems, gives rise to new tools and theorems necessary to conduct sound inference in the space of shapes. From a theoretical standpoint, robustness results are provided for the estimation of model parameters, and an ANOVA-like statistical testing procedure is discussed. The projects were the result of collaboration between theoretical and application-focused research groups: the Shape Analysis Group in the Department of Statistics at Florida State University, the Center of Genomics and Personalized Medicine at FSU, and FSU's Department of Neuroscience. As a consequence, each of the projects consists of two aspects: the theoretical investigation of the mathematical model and the application driven by a real-life problem. The application components are similar from the data modeling standpoint: in each case the problem is set in an infinite-dimensional space, elements of which are experimental data points that can be viewed as shapes. The three projects are: "A new framework for Euclidean summary statistics in the neural spike train space", which provides a statistical framework for analyzing spike train data and a new noise-removal procedure for neural spike trains; the framework adapts the SRSF elastic metric in the space of point patterns to provide a new notion of distance. "SRSF shape analysis for sequencing data reveal new differentiating patterns", which uses the shape interpretation of Next Generation Sequencing data to provide a new point of view on exon-level gene activity; the novel approach reveals differential gene behavior that cannot be captured by state-of-the-art techniques, and code is available online in a GitHub repository. "How changes in shape of nucleosomal DNA near TSS influence changes of gene expression", whose result is a novel shape analysis model explaining the relation between changes in the arrangement of DNA on nucleosomes and changes in differential gene expression.
 Date Issued
 2017
 Identifier
 FSU_FALL2017_Wesolowski_fsu_0071E_14177
 Format
 Thesis