Current Search: Research Repository » Statistics
Search results
 Title
 Fused Lasso and Tensor Covariance Learning with Robust Estimation.
 Creator

Kunz, Matthew Ross, She, Yiyuan, Stiegman, Albert E., Mai, Qing, Chicken, Eric, Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

With the increase in computation and data storage, a vast amount of information is now collected with scientific measurement devices. However, with this increase in data and the variety of domain applications, statistical methodology must be tailored to specific problems. This dissertation focuses on analyzing chemical information with an underlying structure. Robust fused lasso leverages information about the neighboring regression coefficient structure to create blocks of coefficients. Robust modifications are made to the mean to account for gross outliers in the data. This method is applied to near-infrared spectral measurements for prediction of an aqueous analyte concentration and is shown to improve prediction accuracy. The robust estimation and structure analysis are extended by examining graph structures within a clustered tensor. The tensor is subjected to wavelet smoothing and robust sparse precision matrix estimation for a detailed look into the covariance structure. This methodology is applied to catalytic kinetics data, where the graph structure estimates the elementary steps within the reaction mechanism.
 Date Issued
 2018
 Identifier
 2018_Fall_Kunz_fsu_0071E_14844
 Format
 Thesis
 Title
 Generalized Mahalanobis Depth in Point Process and Its Application in Neural Coding and Semi-Supervised Learning in Bioinformatics.
 Creator

Liu, Shuyi, Wu, Wei, Wang, Xiaoqiang, Zhang, Jinfeng, Mai, Qing, Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

In the first project, we propose to generalize the notion of depth to temporal point process observations. The new depth is defined as a weighted product of two probability terms: 1) the number of events in each process, and 2) the center-outward ranking on the event times conditioned on the number of events. In this study, we adopt the Poisson distribution for the first term and the Mahalanobis depth for the second term. We propose an efficient bootstrapping approach to estimate the parameters in the defined depth. In the case of a Poisson process, the observed events are order statistics, and the parameters can be estimated robustly with respect to sample size. We demonstrate the use of the new depth by ranking realizations from a Poisson process. We also test the new method in classification problems using simulations as well as real neural spike train data. The new framework is found to provide more accurate and robust classifications than commonly used likelihood methods. In the second project, we demonstrate the value of semi-supervised dimension reduction in a clinical setting. Semi-supervised dimension reduction uses unlabeled data to perform dimension reduction and can help build a more precise prediction model than common supervised dimension reduction techniques. After a thorough comparison with embedding methods that use labeled data only, we show the improvement from semi-supervised dimension reduction with unlabeled data in a breast cancer chemotherapy application. In our semi-supervised dimension reduction method, we explore not only adding unlabeled data to linear dimension reduction such as PCA, but also semi-supervised nonlinear dimension reduction, such as semi-supervised LLE and semi-supervised Isomap.
 Date Issued
 2018
 Identifier
 2018_Sp_Liu_fsu_0071E_14367
 Format
 Thesis
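The depth construction described in the abstract above can be sketched numerically. This is an illustrative reading, not the dissertation's code: the weight `w` and the plug-in values of `mu` and `cov` are assumptions (in the dissertation they would come from the proposed bootstrapping approach).

```python
import math
import numpy as np

def mahalanobis_depth(x, mu, cov):
    """Mahalanobis depth: 1 / (1 + squared Mahalanobis distance)."""
    d = np.asarray(x, float) - mu
    md2 = float(d @ np.linalg.solve(cov, d))
    return 1.0 / (1.0 + md2)

def point_process_depth(times, lam, mu, cov, w=0.5):
    """Weighted product of (1) a Poisson term for the event count and
    (2) the Mahalanobis depth of the event times given that count."""
    k = len(times)
    count_term = math.exp(-lam) * lam ** k / math.factorial(k)
    time_term = mahalanobis_depth(times, mu, cov)
    return count_term ** w * time_term ** (1.0 - w)

# Rank two realizations with k = 2 events each; mu and cov are assumed here.
mu, cov = np.array([1.0, 2.0]), np.eye(2)
d_central = point_process_depth([1.0, 2.0], 2.0, mu, cov)
d_outlying = point_process_depth([5.0, 9.0], 2.0, mu, cov)
```

A realization whose event times sit at the center of the conditional distribution receives a larger depth than an outlying one with the same event count.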
 Title
 Generalized Pearson-Fisher chi-square goodness-of-fit tests, with applications to models with life history data.
 Creator

Li, Gang., Florida State University
 Abstract/Description

Suppose that $X_1,\ldots,X_n$ are i.i.d. $\sim F$, and we wish to test the null hypothesis that $F$ is a member of the parametric family ${\cal F}=\{F_\theta(x):\theta\in\Theta\}$, where $\Theta\subset\mathbb{R}^q$. The classical Pearson-Fisher chi-square test involves partitioning the real axis into $k$ cells $I_1,\ldots,I_k$ and forming the chi-square statistic $X^2=\sum_{i=1}^{k}(O_i - nF_{\hat\theta}(I_i))^2/nF_{\hat\theta}(I_i)$, where $O_i$ is the number of observations falling into cell $i$ and $\hat\theta$ is the value of $\theta$ minimizing $\sum_{i=1}^{k}(O_i - nF_\theta(I_i))^2/nF_\theta(I_i)$. We obtain a generalization of this test to any situation for which there is available a nonparametric estimator $\hat F$ of $F$ for which $n^{1/2}(\hat F - F)\xrightarrow{d} W$, where $W$ is a continuous zero-mean Gaussian process satisfying a mild regularity condition. We allow the cells to be data dependent. Essentially, we estimate $\theta$ by the value $\hat\theta$ that minimizes a "distance" between the vectors $(\hat F(I_1),\ldots,\hat F(I_k))$ and $(F_\theta(I_1),\ldots,F_\theta(I_k))$, where distance is measured through an arbitrary positive definite quadratic form, and then form a chi-square-type test statistic based on the difference between $(\hat F(I_1),\ldots,\hat F(I_k))$ and $(F_{\hat\theta}(I_1),\ldots,F_{\hat\theta}(I_k))$. We prove that this test statistic has asymptotically a chi-square distribution with $k-q-1$ degrees of freedom, and point out some errors in the literature on chi-square tests in survival analysis. Our procedure is very general and applies to a number of well-known models in survival analysis, such as right censoring and left truncation. We apply our method to questions of model selection in estimating the distribution of the length of the incubation period of the AIDS virus using the CDC's data on blood-transfusion-related AIDS. Our analysis suggests some models that seem to fit better than those used in the literature.
 Date Issued
 1992
 Identifier
 AAI9234234, 3087898, FSDT3087898, fsu:76708
 Format
 Document (PDF)
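The classical Pearson-Fisher setup that the abstract above generalizes can be sketched for a one-parameter ($q = 1$) exponential null with data-driven quantile cells. The golden-section search, the choice $k = 5$, and the quantile cell boundaries are illustrative assumptions, not the dissertation's method.

```python
import math
import random

def min_chisq_test(data, k=5):
    """Pearson-Fisher-type statistic for an exponential null: estimate the
    rate by minimizing the chi-square distance over k data-driven cells,
    then refer the statistic to chi-square with k - q - 1 = k - 2 df."""
    n = len(data)
    xs = sorted(data)
    cuts = [xs[n * j // k] for j in range(1, k)]  # empirical quantile cells
    counts = [0] * k
    for x in data:
        counts[sum(x > c for c in cuts)] += 1

    def chisq(rate):
        cdf = [1.0 - math.exp(-rate * c) for c in cuts] + [1.0]
        prev, stat = 0.0, 0.0
        for obs, f in zip(counts, cdf):
            p = f - prev        # null cell probability under this rate
            prev = f
            stat += (obs - n * p) ** 2 / (n * p)
        return stat

    # golden-section search for the minimum-chi-square rate estimate
    a, b = 1e-6, 10.0 * n / sum(data)
    g = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(200):
        c1, c2 = b - g * (b - a), a + g * (b - a)
        if chisq(c1) < chisq(c2):
            b = c2
        else:
            a = c1
    return chisq((a + b) / 2.0), k - 2

random.seed(1)
sample = [random.expovariate(1.5) for _ in range(400)]
stat, df = min_chisq_test(sample, k=5)
```

Because the data here are generated under the null, the statistic should look like a small chi-square(3) draw.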
 Title
 Generating Poisson and binomial random variates.
 Creator

Lee, Wen-Chiung., Florida State University
 Abstract/Description

Many methods for generating variates from discrete distributions have been developed over the years. They vary from simple to complicated, from specific to general. Some are based on interesting underlying theory, while others are more concerned with efficient computer implementation. This dissertation is directed toward the latter. We describe methods that are best suited for efficient (fast) computer implementation. We develop specific programs for both the Poisson and the binomial distributions, with two versions of each: one for when the parameters are fixed and the other for when the parameters change from call to call. These programs are developed with a spare-no-expense attitude, and timing comparisons support our belief that they are faster than any other published methods. For the fixed-parameter case, we give an algorithm that combines table lookup, the square histogram (Marsaglia's lecture notes), and the direct search method, and apply it to the Poisson and binomial distributions. For the variable-parameter Poisson case, we take advantage of Marsaglia's (1986) approach and incorporate additional techniques to obtain a Poisson variate generator that works for any value of $\lambda$, using, most of the time, the integer part of a polynomial in a normal variate. We extend the procedure to the binomial distribution.
 Date Issued
 1993
 Identifier
 AAI9334283, 3088178, FSDT3088178, fsu:76985
 Format
 Document (PDF)
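The fixed-parameter generator described in the abstract above combines table lookup with Marsaglia's square histogram. The sketch below instead uses Walker's alias method, a close relative, to draw Poisson variates from a truncated probability table; the truncation at 20 cells, $\lambda = 4$, and folding the tail mass into the last cell are illustrative assumptions, not the dissertation's code.

```python
import math
import random

def build_alias(probs):
    """Preprocess a pmf into tables supporting O(1) sampling
    (Walker's alias method, a relative of the square histogram)."""
    n = len(probs)
    scaled = [p * n for p in probs]
    alias, cutoff = list(range(n)), [1.0] * n
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]
    while small and large:
        s, l = small.pop(), large.pop()
        cutoff[s], alias[s] = scaled[s], l
        scaled[l] -= 1.0 - scaled[s]      # donate mass to the small cell
        (small if scaled[l] < 1.0 else large).append(l)
    return alias, cutoff

def draw(alias, cutoff, rng):
    """One table lookup plus one comparison per variate."""
    u = rng.random() * len(alias)
    i = int(u)
    return i if u - i < cutoff[i] else alias[i]

# Truncated Poisson(4) table; the tiny tail mass is folded into the last cell.
lam, m = 4.0, 20
pmf = [math.exp(-lam) * lam ** k / math.factorial(k) for k in range(m)]
pmf[-1] += 1.0 - sum(pmf)
alias, cutoff = build_alias(pmf)
rng = random.Random(7)
xs = [draw(alias, cutoff, rng) for _ in range(100_000)]
```

The sample mean should sit very close to $\lambda = 4$, since the truncated tail mass beyond 19 is negligible.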
 Title
 Geometric Approaches for Analysis of Images, Densities and Trajectories on Manifolds.
 Creator

Zhang, Zhengwu, Srivastava, Anuj, Klassen, E. (Eric), Wu, Wei, Pati, Debdeep, Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

In this dissertation, we focus on the problem of analyzing high-dimensional functional data using geometric approaches. The term functional data here refers to images, densities, and trajectories on manifolds. The nature of these data imposes difficulties on statistical analysis. First, the objects are functional and hence infinite-dimensional; one needs to explore representations of each type that facilitate subsequent statistical analysis. Second, the representation spaces are often nonlinear manifolds, so proper Riemannian structures are necessary to compare objects. Third, the analysis and comparison of objects need to be invariant to certain nuisance variables. For example, comparison between two images should be invariant to their blur levels, and comparison between time-indexed trajectories on manifolds should be invariant to their temporal evolution rates. We start by introducing frameworks for representing, comparing, and analyzing functions in Euclidean space, including signals, images, and densities, where the comparisons are invariant to the Gaussian blur present in these objects. Applications in blur-level matching, blurred image recognition, image classification, and two-sample hypothesis testing are discussed. Next, we present frameworks for analyzing longitudinal trajectories on a manifold M, where the analysis is invariant to the reparameterization action (temporal variation). In particular, we are interested in trajectories on two manifolds: the two-sphere and the set of symmetric positive-definite matrices. Applications such as bird migration and hurricane track analysis, visual speech recognition, and hand gesture recognition demonstrate the advantages of the proposed frameworks. Finally, a Bayesian framework for clustering shapes of curves is presented, with examples of clustering cell shapes and protein structures.
 Date Issued
 2015
 Identifier
 FSU_migr_etd9503
 Format
 Thesis
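One concrete ingredient of trajectory analysis on the two-sphere, as discussed in the abstract above, is the geodesic (great-circle) distance between points. The sketch below pairs it with a naive pointwise distance between time-registered trajectories; the dissertation's framework additionally removes temporal variation by optimizing over reparameterizations, which this sketch omits.

```python
import math

def sphere_dist(p, q):
    """Geodesic (great-circle) distance between unit vectors on S^2."""
    dot = sum(a * b for a, b in zip(p, q))
    return math.acos(max(-1.0, min(1.0, dot)))  # clamp for float safety

def trajectory_rms_dist(t1, t2):
    """Root-mean-square pointwise geodesic distance between two
    time-registered trajectories of equal length (no reparameterization)."""
    ds = [sphere_dist(p, q) ** 2 for p, q in zip(t1, t2)]
    return math.sqrt(sum(ds) / len(ds))
```

For example, antipodal points are at distance $\pi$, and a pole is at distance $\pi/2$ from any equatorial point.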
 Title
 Goodness-of-Fit Tests for Logistic Regression.
 Creator

Wu, Sutan, McGee, Dan L., Zhang, Jinfeng, Hurt, Myra, Sinha, Debajyoti, Department of Statistics, Florida State University
 Abstract/Description

The generalized linear model, and particularly the logistic model, is widely used in public health, medicine, and epidemiology. Goodness-of-fit tests for these models are commonly used to describe how well a proposed model fits a set of observations. These goodness-of-fit tests each have individual advantages and disadvantages. In this thesis, we mainly consider the performance of the Hosmer-Lemeshow test, Pearson's chi-square test, the unweighted sum of squares test, and the cumulative residual test. We compare their performance in a series of empirical studies as well as particular simulation scenarios. We conclude that the unweighted sum of squares test and the cumulative sums of residuals test give better overall performance than the other two. We also conclude that the commonly suggested practice of treating a p-value less than 0.15 as an indication of lack of fit at the initial steps of model diagnostics should be adopted. Additionally, D'Agostino et al. presented the relationship between stacked logistic regression and the Cox regression model in the Framingham Heart Study; in future work, we will examine the possibility and feasibility of adapting these goodness-of-fit tests to the Cox proportional hazards model using stacked logistic regression.
 Date Issued
 2010
 Identifier
 FSU_migr_etd0693
 Format
 Thesis
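One of the tests compared in the abstract above, the Hosmer-Lemeshow test, admits a compact sketch: sort by predicted probability, form equal-count groups, and compare observed versus expected event counts. The equal-count grouping with g = 10 and the chi-square reference with g - 2 degrees of freedom follow common practice and are assumptions here, not necessarily the thesis's exact implementation.

```python
import random

def hosmer_lemeshow(y, p, g=10):
    """Hosmer-Lemeshow statistic: split observations into g equal-count
    groups by predicted probability; sum (observed - expected)^2 scaled
    by the binomial variance in each group."""
    pairs = sorted(zip(p, y))
    n = len(pairs)
    stat = 0.0
    for j in range(g):
        group = pairs[j * n // g:(j + 1) * n // g]
        m = len(group)
        obs = sum(yi for _, yi in group)       # observed events
        exp = sum(pi for pi, _ in group)       # expected events
        pbar = exp / m
        denom = m * pbar * (1.0 - pbar)
        if denom > 0.0:
            stat += (obs - exp) ** 2 / denom
    return stat, g - 2

# Data simulated under a correctly specified model, so the statistic
# should resemble a chi-square(8) draw.
rng = random.Random(3)
probs = [rng.uniform(0.05, 0.95) for _ in range(1000)]
ys = [1 if rng.random() < pi else 0 for pi in probs]
stat, df = hosmer_lemeshow(ys, probs)
```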
 Title
 High Level Image Analysis on Manifolds via Projective Shapes and 3D Reflection Shapes.
 Creator

Lester, David T. (David Thomas), Patrangenaru, Victor, Liu, Xiuwen, Barbu, Adrian G. (Adrian Gheorghe), Tao, Minjing, Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

Shape analysis is a widely studied topic in modern statistics with important applications in areas such as medical imaging. Here we focus on two-sample hypothesis testing for both finite and infinite extrinsic mean shapes of configurations. First, we present a test for equality of mean projective shapes of 2D contours based on rotations. Second, we present a test for mean 3D reflection shapes based on the Schoenberg mean. We apply these tests to footprint data (contours), clamshells (3D reflection shapes), and human facial configurations extracted from digital camera images. We also present the method of MANOVA on manifolds and apply it to face data extracted from digital camera images. Finally, we present a new statistical tool called antiregression.
 Date Issued
 2017
 Identifier
 FSU_2017SP_Lester_fsu_0071E_13856
 Format
 Thesis
 Title
 High-Dimensional Statistical Methods for Tensor Data and Efficient Algorithms.
 Creator

Pan, Yuqing, Mai, Qing, Zhang, Xin, Yu, Weikuan, Slate, Elizabeth H., Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

In contemporary sciences, it is of great interest to study supervised and unsupervised learning problems for high-dimensional tensor data. In this dissertation, we develop new methods for tensor classification and clustering problems and discuss algorithms to enhance their performance. For supervised learning, we propose the CATCH model, short for Covariate-Adjusted Tensor Classification in High dimensions, which efficiently integrates low-dimensional covariates and the tensor to perform classification and variable selection. The CATCH model preserves and utilizes the structure of the data for maximum interpretability and optimal prediction. We propose a penalized approach to select a subset of tensor predictor entries that has direct discriminative effects after adjusting for covariates. Theoretical results confirm that our approach achieves variable selection consistency and optimal classification accuracy. For unsupervised learning, we consider the clustering problem for high-dimensional tensor data and propose an efficient procedure based on the EM algorithm. It directly estimates the sparse discriminant vector from a penalized objective function and provides computationally efficient rules to update all other parameters. Meanwhile, the algorithm takes advantage of the tensor structure to reduce the number of parameters, which leads to lower storage costs. The performance of our method over existing methods is demonstrated in simulated and real data examples. Moreover, based on tensor computation, we propose a novel algorithm, referred to as the SMORE algorithm, for differential network analysis. The SMORE algorithm has low storage cost and high computation speed, especially in the presence of strong sparsity. It also provides a unified framework for binary and multiple network problems. In addition, we note that the SMORE algorithm can be applied to high-dimensional quadratic discriminant analysis problems, providing a new approach for multiclass high-dimensional quadratic discriminant analysis. In the end, we discuss some directions for future work, including new approaches, applications, and relaxed assumptions.
 Date Issued
 2019
 Identifier
 2019_Spring_Pan_fsu_0071E_15135
 Format
 Thesis
 Title
 A hypothesis test of cumulative sums of multinomial parameters.
 Creator

Clair, James Hunter., Florida State University
 Abstract/Description

Consider $N$ times to repair, $T_1, T_2, \ldots, T_N$, from a repair time distribution function $F(\cdot)$. Let $p_{01}, p_{02}, \ldots, p_{0K}$ be $K$ proportions with $\sum_{\nu=1}^{K} p_{0\nu} < 1$. We wish to have at least $100(\sum_{\nu=1}^{i} p_{0\nu})\%$ of items repaired by time $L_i$, $1 \le i \le K$, $K \ge 2$. Denote the unknown quantity $F(L_i) - F(L_{i-1})$ as $p_i$, $1 \le i \le K$. Thus we wish to test the hypothesis (UNFORMATTED TABLE OR EQUATION FOLLOWS). A simple procedure is to test this hypothesis with the $K$ statistics $N_1, \sum_{\nu=1}^{2} N_\nu, \ldots, \sum_{\nu=1}^{K} N_\nu$, where $\sum_{\nu=1}^{i} N_\nu$ is the number of repairs that take place on or before $L_i$, $1 \le i \le K$. Each $\sum_{\nu=1}^{i} N_\nu$ is a binomial random variable with unknown parameter $\sum_{\nu=1}^{i} p_\nu$. The hypothesis $H_0$ is rejected if any of the $\sum_{\nu=1}^{i} N_\nu \le n_i^0$, where the $n_i^0$ are chosen from binomial tables. This test is shown to have several deficiencies, and we construct an alternative procedure with which to test this hypothesis. The Generalized Likelihood Ratio Test (GLRT) statistic is based on the multinomial random variable $(N_1, N_2, \ldots, N_K)$ with parameter $(p_1, p_2, \ldots, p_K)$. The parameter space is (UNFORMATTED TABLE OR EQUATION FOLLOWS). An algorithm is constructed and computer code supplied to calculate $\lambda(N)$ efficiently for any finite $N$. For small samples, computer code is given to calculate exactly $\delta$ or a p-value for an observed value of $\lambda(N(K))$, $2 \le K \le 5$ and $K \le N \le N(K)$. For large $N$, we apply a theorem of Feder (1968) to evaluate the asymptotic critical values and power. The GLRT statistic $\lambda(N)$ is shown to be approximately a union-intersection test and thus is approximated by a collection of uniformly most powerful unbiased tests of binomial parameters. The GLRT is shown empirically in the case of $K = 3$ to have higher power than competing union-intersection tests. Two power estimation techniques are described and compared empirically. Reference: Feder, Paul J. (1968), "On the distribution of the log likelihood ratio test statistic when the true parameter is 'near' the boundaries of the hypothesis region," Annals of Mathematical Statistics, 39, 2044-2055.
 Date Issued
 1988
 Identifier
 AAI8822443, 3161637, FSDT3161637, fsu:77837
 Format
 Document (PDF)
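The "simple procedure" criticized in the abstract above can be made concrete: for each cutoff, choose a binomial critical value $n_i^0$ and reject $H_0$ if any cumulative count $N_1 + \cdots + N_i \le n_i^0$. The per-component level `alpha_each` below is an illustrative choice, not the dissertation's; the GLRT the dissertation develops replaces this componentwise rule.

```python
import math

def binom_cdf(x, n, p):
    """P(Bin(n, p) <= x), computed directly from the pmf."""
    if x < 0:
        return 0.0
    return sum(math.comb(n, k) * p ** k * (1 - p) ** (n - k)
               for k in range(x + 1))

def critical_values(n, cum_p0, alpha_each):
    """For each hypothesized cumulative proportion, the largest n_i^0 with
    P(Bin(n, p) <= n_i^0) <= alpha_each; reject H0 if any cumulative count
    N_1 + ... + N_i falls at or below its n_i^0."""
    out = []
    for p in cum_p0:
        x = -1                      # -1 means this component never rejects
        while binom_cdf(x + 1, n, p) <= alpha_each:
            x += 1
        out.append(x)
    return out

cvs = critical_values(50, [0.3, 0.6, 0.8], 0.05)
```

Since a binomial with a larger success probability is stochastically larger, the critical values are non-decreasing across the cutoffs.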
 Title
 Identifiability in the autopsy model of reliability theory.
 Creator

Antoine, Robin Michael., Florida State University
 Abstract/Description

Let S be a coherent system of m components acting independently. Two statistical models are considered. In the autopsy model, S is observed until it fails; the set of failed components and the failure time of the system are noted, but the failure times of the dead components are not known. In the second model, which was considered by Doss, Freitag and Proschan (Ann. Statist., 1989), the failure times of the dead components are also known. In the autopsy model, it is not always possible to estimate or identify the component life lengths from the observed data. A sufficient condition for the identifiability of the component distributions is given for the case in which the distributions are assumed to be analytic. Necessary and sufficient conditions are given for the case in which the distributions are assumed to belong to certain parametric families. The model of Doss, Freitag and Proschan is considered in two special cases: in the first, the component distributions are known to be identical; in the second, the distributions are known to be exponential. Estimators of the component and system life lengths are given for each of these cases, and the asymptotic relative efficiency of each with respect to the corresponding estimator of Doss, Freitag and Proschan is calculated.
 Date Issued
 1992
 Identifier
 AAI9222356, 3087814, FSDT3087814, fsu:76624
 Format
 Document (PDF)
 Title
 Identifying influential effects in factorial experiments with sixteen runs: Empirical Bayes approaches.
 Creator

Chen, Ching-Hsiang., Florida State University
 Abstract/Description

To identify influential effects in unreplicated (possibly fractionated) factorial experiments, the effect-sparsity assumption (Box and Meyer (1986), Technometrics, 28, 11-18) has been adopted in many studies. Although this assumption has traditionally been used for outlier-detection problems, it may not be suitable for describing the effects from factorial experiments. In this research, we examine the effect-sparsity approach and propose empirical Bayes methods that relax this assumption. The study also examines the identification of influential effects based on information about the design structure, such as the alias relationships, design resolution, and sizes of interactions. A simulation study, based primarily on the criterion of reducing the experimental cost of misidentifying factors, was performed to compare the different methods. The results show that when the number of factors is large and the factorial experiment is highly fractionated, incorporating information about the design structure into the analysis reduces the cost in a screening experiment compared to methods that do not consider design structure.
 Date Issued
 1994
 Identifier
 AAI9424751, 3088354, FSDT3088354, fsu:77159
 Format
 Document (PDF)
 Title
 Impact of Missing Data on Building Prognostic Models and Summarizing Models Across Studies.
 Creator

Munshi, Mahtab R., McGee, Daniel, Eberstein, Isaac, Hollander, Myles, Niu, Xufeng, Chattopadhyay, Somesh, Department of Statistics, Florida State University
 Abstract/Description

We examine the impact of missing data in two settings: the development of prognostic models and the addition of new risk factors to existing risk functions. Most statistical software presently available performs complete case analysis, wherein only participants with known values for all of the characteristics being analyzed are included in model development. Missing data also affects the summarization of evidence across multiple studies using meta-analytic techniques. As medical research progresses, new covariates become available for studying various outcomes. While we want to investigate the influence of new factors on the outcome, we also do not want to discard the historical datasets that lack information about these markers. Our research plan is to investigate different methods to estimate parameters for a model when some of the covariates are missing. These methods include likelihood-based inference for the study-level coefficients and likelihood-based inference for the logistic model on the person-level data. We compare the results from our methods to the corresponding results from complete case analysis. We focus our empirical investigation on a historical example, the addition of high density lipoproteins to existing equations for predicting death due to coronary heart disease, and verify our methods through simulation studies on this example.
 Date Issued
 2005
 Identifier
 FSU_migr_etd2191
 Format
 Thesis
 Title
 The Impact of Rater Variability on Relationships among Different Effect-Size Indices for Inter-Rater Agreement between Human and Automated Essay Scoring.
 Creator

Yun, Jiyeo, Becker, Betsy Jane, Huffer, Fred W. (Fred William), Paek, Insu, Zhang, Qian, Florida State University, College of Education, Department of Educational Psychology and Learning Systems
 Abstract/Description

Since researchers began investigating automatic scoring systems for writing assessments, they have examined relationships between human and machine scoring and have suggested evaluation criteria for inter-rater agreement. The main purpose of my study is to investigate the magnitudes of, and relationships among, indices for inter-rater agreement used to assess the relatedness of human and automated essay scoring, and to examine the impact of rater variability on inter-rater agreement. The study consists of two parts: an empirical study and a simulation study. Based on the results of the empirical study, the overall effects for inter-rater agreement were .63 and .99 for exact and adjacent proportions of agreement, .48 for kappas, and between .75 and .78 for correlations. Additionally, there were significant differences between 6-point scales and the other scales (i.e., 3-, 4-, and 5-point scales) for correlations, kappas, and proportions of agreement. Moreover, based on the results for the simulated data, the highest agreement and lowest discrepancies were achieved in the matched rater distribution pairs. Specifically, the means of the exact and adjacent proportions of agreement, kappa and weighted kappa values, and correlations were .58, .95, .42, .78, and .78, respectively, while the average standardized mean difference was .0005 in the matched rater distribution pairs. Acceptable values for inter-rater agreement as evaluation criteria for automated essay scoring, the impact of rater variability on inter-rater agreement, and relationships among inter-rater agreement indices are discussed.
 Date Issued
 2017
 Identifier
 FSU_FALL2017_Yun_fsu_0071E_14144
 Format
 Thesis
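Three of the indices compared in the abstract above (exact and adjacent proportions of agreement, and Cohen's kappa) can be computed directly from two raters' scores. This is a generic sketch of the standard definitions; weighted kappa and correlations are omitted for brevity.

```python
def agreement_indices(r1, r2, k):
    """Exact/adjacent proportions of agreement and Cohen's kappa for two
    raters scoring the same essays on a 1..k scale."""
    n = len(r1)
    exact = sum(a == b for a, b in zip(r1, r2)) / n
    adjacent = sum(abs(a - b) <= 1 for a, b in zip(r1, r2)) / n
    # chance agreement from the two raters' marginal score distributions
    pe = sum((r1.count(s) / n) * (r2.count(s) / n) for s in range(1, k + 1))
    kappa = (exact - pe) / (1.0 - pe) if pe < 1.0 else 1.0
    return exact, adjacent, kappa
```

For instance, two raters who agree exactly on 3 of 6 essays and within one point on 5 of 6, with uniform marginals on a 3-point scale, yield kappa = 0.25.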
 Title
 Impact of Violations of Measurement Invariance in Longitudinal Mediation Modeling.
 Creator

Xu, Jie, Yang, Yanyun, Zhang, Qian, Huffer, Fred W. (Fred William), Becker, Betsy J., Florida State University, College of Education, Department of Educational Psychology and...
Show moreXu, Jie, Yang, Yanyun, Zhang, Qian, Huffer, Fred W. (Fred William), Becker, Betsy J., Florida State University, College of Education, Department of Educational Psychology and Learning Systems
 Abstract/Description

Research has shown that cross-sectional mediation analysis cannot accurately reflect a true longitudinal mediated effect. To investigate longitudinal mediated effects, different longitudinal mediation models have been proposed, each focusing on different research questions related to longitudinal mediation. When fitting mediation models to longitudinal data, the assumption of longitudinal measurement invariance is usually made. However, the consequences of violating this assumption have not been thoroughly studied in mediation analysis. No studies have examined issues of measurement noninvariance in a latent cross-lagged panel mediation (LCPM) model with three or more measurement occasions. The goal of the current study is to investigate the impact of violations of measurement invariance on longitudinal mediation analysis. The focal model in the study is the LCPM model suggested by Cole and Maxwell (2003). This model can be used to examine mediated effects among the latent predictor, mediator, and outcome variables across time. In addition, it can account for measurement error and allow for the evaluation of longitudinal measurement invariance. Simulation methods were used, and the investigation was performed using population covariance matrices and sample data generated under various conditions. Eight design factors were considered for data generation: sample size, proportion of noninvariant items, position of latent factors with noninvariant items, type of noninvariant parameters, magnitude of noninvariance, pattern of noninvariance, size of the direct effect, and size of the mediated effect. Results from the population investigation were evaluated based on overall model fit and the calculated direct and mediated effects; results from finite sample analysis were evaluated in terms of convergence and inadmissible solutions, overall model fit, bias/relative bias, coverage rates, and statistical power/type I error rates.
In general, results obtained from finite sample analysis were consistent with those from the population investigation, with respect to both model fit and parameter estimation. The type I error rate of the mediated effects was inflated under the noninvariant conditions with a small sample size (200); power of the direct and mediated effects was excellent (1.0 or close to 1.0) across all investigated conditions. Type I error rates based on the chi-square test statistic were seriously inflated under the invariant conditions, especially when the sample size was relatively small. Power for detecting model misspecifications due to longitudinal noninvariance was excellent across all investigated conditions. Fit indices (CFI, TLI, RMSEA, and SRMR) were not sensitive in detecting misspecifications caused by violations of measurement invariance in the investigated LCPM model. Study results also showed that as the magnitude of noninvariance, the proportion of noninvariant items, and the number of positions of latent variables with noninvariant items increased, estimation of the direct and mediated effects tended to be less accurate. The decreasing pattern of change in item parameters over measurement occasions resulted in the least accurate estimates of the direct and mediated effects. Parameter estimates were fairly accurate under the conditions of the decreasing-then-increasing pattern and the mixed pattern of change in item parameters. Findings from this study can help empirical researchers better understand the potential impact of violating measurement invariance on longitudinal mediation analysis using the LCPM model.
 Date Issued
 2019
 Identifier
 2019_Spring_Xu_fsu_0071E_14994
 Format
 Thesis
 Title
 The importance of skewness and kurtosis in the time-series of security returns.
 Creator

St. Pierre, Eileen Foley., Florida State University
 Abstract/Description

The importance of skewness and kurtosis in the return generating process is assessed by examining the out-of-sample forecasting power of three different Exponential GARCH models that assume the conditional errors are generated by a normal distribution, a generalized error distribution, and a nonparametric distribution. These models are selected because they incorporate the time-series properties of security returns, and each of these distributions allows for various degrees of conditional skewness and kurtosis. First, daily security returns of firms listed on the New York and American Stock Exchanges over the period 1971 to 1991, excluding the year 1987, are used to estimate the three models. This study finds that the importance of skewness and kurtosis varies over time and across firm size. The length of the holding period also affects the accuracy and reliability of expected returns generated by the three Exponential GARCH models. Second, daily security returns of National Market System firms in the OTC market from 1988 to 1991, computed from both traded prices and bid-ask averages, are used to estimate the three models. This study finds that there is a trade-off between obtaining lower forecast errors and the volatility of the forecast errors when skewness and kurtosis are incorporated in the return generating process. Overall, forecast errors are lower and less volatile when bid-ask averages are used to compute security returns. However, the bid-ask "bounce" does not have a significant effect on the importance of skewness and kurtosis in the return generating process.
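As a sketch of the time-series structure involved, an Exponential GARCH(1,1) recursion with normal errors can be simulated as below. The parameter values are illustrative only and are not estimates from the dissertation.

```python
import math
import random

# EGARCH(1,1) log-variance recursion with standard normal errors:
#   ln(sigma_t^2) = omega + beta*ln(sigma_{t-1}^2)
#                   + alpha*(|z_{t-1}| - E|z|) + gamma*z_{t-1}
# omega, alpha, gamma, beta below are invented illustrative values.
omega, alpha, gamma, beta = -0.10, 0.20, -0.05, 0.95
e_abs_z = math.sqrt(2 / math.pi)      # E|z| for z ~ N(0, 1)

random.seed(0)
log_var = omega / (1 - beta)          # start at the unconditional level
returns, sigmas = [], []
for _ in range(250):                  # roughly one year of daily returns
    sigma = math.exp(0.5 * log_var)
    z = random.gauss(0, 1)
    sigmas.append(sigma)
    returns.append(sigma * z)
    # gamma < 0 makes negative shocks raise volatility more (leverage effect)
    log_var = omega + beta * log_var + alpha * (abs(z) - e_abs_z) + gamma * z
```

Replacing `random.gauss` with draws from a generalized error distribution or a nonparametric residual distribution changes the conditional skewness and kurtosis while leaving the variance recursion intact, which is the comparison the study exploits.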
 Date Issued
 1993
 Identifier
 AAI9407828, 3088217, FSDT3088217, fsu:77021
 Format
 Document (PDF)
 Title
 Improvement of Quality Prediction in Inter-Connected Manufacturing System by Integrating Multi-Source Data.
 Creator

Ren, Jie, Wang, Hui, Vanli, Omer Arda, Park, Chiwoo, Huffer, Fred W. (Fred William), Florida State University, FAMU-FSU College of Engineering, Department of Industrial and Manufacturing Engineering
 Abstract/Description

With the development of advanced sensing and network technology, such as wireless data transmission and data storage and analytics under cloud platforms, the manufacturing plant is going through a new revolution in which different production units/components can communicate with each other, leading to interconnected manufacturing. The interconnection enables the close coordination of process control actions among machines to improve product quality. Traditional quality prediction methods that focus on data from a single source are not sufficient to deal with the variation modeling and quality prediction problems involved in interconnected manufacturing. Instead, new quality prediction methods that can integrate data from multiple sources are necessary. This research addresses the fundamental challenges in improving quality prediction by data fusion for interconnected manufacturing, including knowledge sharing and transfer among different machines and collaborative error monitoring. The methodology is demonstrated through surface machining and additive manufacturing processes. The first study is on surface quality prediction for one machining process by fusing multi-resolution spatial data measured from multiple surfaces or different surface machining processes. The surface variation is decomposed into a global trend part, which characterizes the spatially varying relationship between selected process variables and surface height, and a zero-mean spatial Gaussian process part. Three models, including two varying-coefficient-based spatial models and an inference-rule-based spatial model, are proposed and compared. Also, a transfer learning technique is used to help train the model by transferring useful information from a data-rich surface to a data-lacking surface, which demonstrates the advantage of interconnected manufacturing.
The second study deals with the surface mating errors caused by the surface variations from two interconnected surface machining processes. A model aggregating data from two surfaces is proposed to predict the leak areas for surface assembly. Using the measurements of leak areas and the profiles of mated surfaces as training data, along with the Hagen–Poiseuille law, this study develops a novel diagnostic method to predict potential leak areas (leakage paths). The effectiveness and robustness of the proposed method are verified by an experiment and a simulation study. The approach provides practical guidance for the subsequent assembly process as well as troubleshooting in manufacturing processes. The last study focuses on learning a quality prediction model in interconnected additive manufacturing systems, in which the different 3D printing processes involved are driven by similar printing mechanisms and can exchange quality data via a network. A quality prediction model that estimates the printing widths along the printing paths for material-extrusion-based additive manufacturing (a.k.a. fused filament fabrication or fused deposition modeling) is established by leveraging the between-printer quality data. The established mathematical model quantifies the printing line width along the printing paths based on kinematic parameters, e.g., printing speed and acceleration, while considering data from multiple printers that share between-machine similarity. The method allows for between-printer knowledge sharing to improve quality prediction, so that a printing process with limited historical data can quickly learn an effective quality model without intensive retraining, thus improving the system's responsiveness to product variety. In the long run, the outcome of this research can contribute to the development of highly efficient Internet-of-Things manufacturing services for personalized products.
 Date Issued
 2019
 Identifier
 2019_Spring_Ren_fsu_0071E_15160
 Format
 Thesis
 Title
 AN INCREASING FAILURE RATE APPROACH TO CONSERVATIVE LOW DOSE EXTRAPOLATION (SAFE DOSE).
 Creator

SCHELL, MICHAEL J., Florida State University
 Abstract/Description

This dissertation provides a new method of treating the conservative low dose extrapolation problem. One wishes to determine the largest dose d, called the "safe" dose, for which P(F(d) ≤ r) ≥ 1 − η, where F(d) is the proportion of failures, say cancers induced, at dose d by time T. F is a life distribution function, presumed to come from some class of functions F, T is prespecified, r ∈ (0,1), and F(x,y) denotes the proportion of failures at doses (x,y) by fixed time T. Four extensions of the univariate class of IFR functions are introduced, differing in the way that convexity of the hazard function, H(x,y) = −ln(1 − F(x,y)), is posited. The notion of dependent action is considered and a hypothesis test for its existence is given. Conservative low dose extrapolation techniques for the two most prominent classes are given. An upper bound for the hazard function is established for low doses, with proofs that the bounds are sharp.
 Date Issued
 1984
 Identifier
 AAI8427325, 3085936, FSDT3085936, fsu:75422
 Format
 Document (PDF)
 Title
 Individual Patient-Level Data Meta-Analysis: A Comparison of Methods for the Diverse Populations Collaboration Data Set.
 Creator

Dutton, Matthew Thomas, McGee, Daniel, Becker, Betsy, Niu, Xufeng, Zhang, Jinfeng, Department of Statistics, Florida State University
 Abstract/Description

DerSimonian and Laird define meta-analysis as "the statistical analysis of a collection of analytic results for the purpose of integrating their findings". One alternative to classical meta-analytic approaches is known as Individual Patient-Level Data, or IPD, meta-analysis. Rather than depending on summary statistics calculated for individual studies, IPD meta-analysis analyzes the complete data from all included studies. Two potential approaches to incorporating IPD data into the meta-analytic framework are investigated. A two-stage analysis is first conducted, in which individual models are fit for each study and summarized using classical meta-analysis procedures. Secondly, a one-stage approach that models all of the data in a single analysis and summarizes the information across studies is investigated. Data from the Diverse Populations Collaboration data set are used to investigate the differences between these two methods in a specific example. The bootstrap procedure is used to determine whether the two methods produce statistically different results in the DPC example. Finally, a simulation study is conducted to investigate the accuracy of each method in given scenarios.
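The two-stage approach can be sketched as follows: per-study effect estimates are pooled by inverse-variance weighting, with the DerSimonian–Laird moment estimator supplying the between-study variance for the random-effects pool. The effect sizes and variances below are invented for illustration.

```python
# Hypothetical per-study effect estimates (e.g., log hazard ratios) and
# their within-study variances from a first-stage fit to each study.
effects   = [0.30, 0.45, 0.10, 0.52, 0.25]
variances = [0.04, 0.09, 0.02, 0.12, 0.05]

w = [1 / v for v in variances]
fixed = sum(wi * e for wi, e in zip(w, effects)) / sum(w)  # fixed-effect pool

# DerSimonian-Laird moment estimator of the between-study variance tau^2
q = sum(wi * (e - fixed) ** 2 for wi, e in zip(w, effects))
c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
tau2 = max(0.0, (q - (len(effects) - 1)) / c)

# the random-effects pool re-weights each study by 1 / (v_i + tau^2)
w_re = [1 / (v + tau2) for v in variances]
random_effects = sum(wi * e for wi, e in zip(w_re, effects)) / sum(w_re)
```

A one-stage analysis would instead fit a single (typically hierarchical) model to the pooled patient-level records; the dissertation compares these two routes on the DPC data.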
 Date Issued
 2011
 Identifier
 FSU_migr_etd0620
 Format
 Thesis
 Title
 Inference for a nonlinear semimartingale regression model.
 Creator

Utikal, Klaus Johannes., Florida State University
 Abstract/Description

Consider the semimartingale regression model $X(t) = X(0) + \int_0^t Y(s)\alpha(s,Z(s))\,ds + M(t)$, where $Y, Z$ are observable covariate processes, $\alpha$ is a (deterministic) function of both time and the covariate process $Z$, and $M$ is a square integrable martingale. Under the assumption that i.i.d. copies of $X, Y, Z$ are observed continuously over a finite time interval, inference for the function $\alpha(t,z)$ is investigated. Applications of this model include hazard function estimation for survival analysis and inference for the drift function of a diffusion process. An estimator $\hat A$ for the time-integrated $\alpha(t,z)$ and a kernel estimator of $\alpha(t,z)$ itself are introduced. For $X$ a counting process, $\hat A$ reduces to the Nelson–Aalen estimator when $Z$ is not present in the model. Various forms of consistency are proved, and rates of convergence and asymptotic distributions of the estimators are derived. Asymptotic confidence bands for the time-integrated $\alpha(t,z)$ and a Kolmogorov–Smirnov-type test of equality of $\alpha$ at different levels of the covariate are given. For the case $Y \equiv 1$ we introduce an estimator $\hat{\cal A}$ of the time- and space-integrated $\alpha(t,z)$. The asymptotic distribution of the estimator $\hat{\cal A}$ is derived under the assumption that the covariate process $Z$ is ${\cal F}_0$-adapted, where $({\cal F}_t)$ is the filtration with respect to which $M$ is a martingale. In the counting process case this amounts to assuming that $X$ is a doubly stochastic Poisson process. Weak convergence of the appropriately normalized time- and state-indexed process $\hat{\cal A}$ to a Gaussian random field is shown. As an application of this result, confidence bands for the covariate-state-integrated hazard function of a doubly stochastic Poisson process whose intensity does not explicitly depend on time are derived.
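For the counting-process special case mentioned above, the Nelson–Aalen estimator of the cumulative hazard has a simple closed form. The sketch below uses invented right-censored data.

```python
# Minimal Nelson-Aalen cumulative hazard sketch for right-censored data;
# the sample times and censoring flags are invented for illustration.
times  = [2, 3, 3, 5, 7, 8, 11, 11, 12]
events = [1, 1, 0, 1, 1, 0, 1, 0, 1]   # 1 = failure observed, 0 = censored

# at each distinct failure time t, add d_t / n_t
# (failures at t over the number still at risk just before t)
distinct_failures = sorted({t for t, e in zip(times, events) if e})
cum_hazard, H = {}, 0.0
for t in distinct_failures:
    d = sum(1 for ti, ei in zip(times, events) if ti == t and ei)
    n = sum(1 for ti in times if ti >= t)
    H += d / n
    cum_hazard[t] = H
```

The estimator is a non-decreasing step function; in the dissertation's more general model the increment at each jump is additionally modulated by the covariate process.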
 Date Issued
 1987
 Identifier
 AAI8807999, 3086793, FSDT3086793, fsu:76268
 Format
 Document (PDF)
 Title
 Inference for Semiparametric Time-Varying Covariate Effect Relative Risk Regression Models.
 Creator

Ye, Gang, McKeague, Ian W., Wang, Xiaoming, Huffer, Fred W., Song, KaiSheng, Department of Statistics, Florida State University
 Abstract/Description

A major interest of survival analysis is to assess covariate effects on survival via appropriate conditional hazard function regression models. The Cox proportional hazards model, which assumes an exponential form for the relative risk, has been a popular choice. However, other regression forms, such as Aalen's additive risk model, may be more appropriate in some applications. In addition, covariate effects may depend on time, which cannot be reflected by a Cox proportional hazards model. In this dissertation, we study a class of time-varying covariate effect regression models in which the link function (relative risk function) is a twice continuously differentiable, prespecified, but otherwise general function. This is a natural extension of the Prentice–Self model, in which the link function is general but covariate effects are modelled as time invariant. In the first part of the dissertation, we focus on estimating the cumulative or integrated covariate effects. The standard martingale approach based on counting processes is utilized to derive a likelihood-based iterating equation. An estimator for the cumulative covariate effect that is generated from the iterating equation is shown to be √n-consistent. Asymptotic normality of the estimator is also demonstrated. Another aspect of the dissertation is to investigate a new test for the above time-varying covariate effect regression model and to study consistency of the test based on martingale residuals. For Aalen's additive risk model, we introduce a test statistic based on the Huffer–McKeague weighted-least-squares estimator and show its consistency against some alternatives. An alternative way to construct a test statistic, based on Bayesian bootstrap simulation, is introduced. An application to real lifetime data is also presented.
 Date Issued
 2005
 Identifier
 FSU_migr_etd0949
 Format
 Thesis
 Title
 Influence Measures for Bayesian Data Analysis.
 Creator

De Oliveira, Melaine C. (Melaine Cristina), Sinha, Debajyoti, Panton, Lynn B., Bradley, Jonathan R., Linero, Antonio Ricardo, Lipsitz, Stuart, Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

Identifying influential observations in the data is desired to ensure proper inference and statistical analysis. Modern methods to identify influential cases use cross-validation diagnostics based on the effect of deletion of the ith observation on inference. A popular method to identify influential observations is to use the Kullback–Leibler divergence between the posterior distribution of the parameter of interest given the full data and the posterior distribution given the cross-validated data, where the cross-validated data has the ith observation removed. Although, in Bayesian inference, the posterior distribution contains all the relevant information about a parameter of interest, when the goal is prediction the predictive distribution should perhaps be used to identify influential observations. We therefore extend our method to the comparison of the posterior predictive distributions given the full data and the cross-validated data. We generalize and extend existing popular Bayesian cross-validated influence diagnostics using Bregman-divergence-based measures (BD). We derive useful properties of these BD based on the influence of each observation on the posterior distribution, and we show that they can be extended to the predictive distribution. We show that these BD-based measures allow interpretable calibration and that they can be computed via Markov chain Monte Carlo (MCMC) samples from a single posterior based on the full data. We illustrate how our new measures of influence have more useful practical roles for data analysis than popular Bayesian residual analysis tools (CPO) in an example of meta-analysis with binary response and in other cases of interval-censored data.
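A toy sketch of the Kullback–Leibler case-deletion diagnostic computed from draws from a single full-data posterior: for a factorized likelihood, KL(full || case-deleted) equals E_full[log f(y_i | theta)] − log CPO_i, where CPO_i is the usual harmonic-mean quantity. The model, data, and "posterior draws" below are all invented (the draws are simulated rather than produced by a real MCMC run).

```python
import math
import random

# Toy model: y_i ~ N(theta, 1); observation 3 (y = 2.8) looks like an outlier.
random.seed(1)
y = [0.1, -0.3, 0.4, 2.8, 0.0]
# stand-in for MCMC draws from the full-data posterior of theta
theta_draws = [random.gauss(0.2, 0.3) for _ in range(5000)]

def loglik(yi, theta):
    # N(theta, 1) log-likelihood
    return -0.5 * math.log(2 * math.pi) - 0.5 * (yi - theta) ** 2

def kl_influence(i):
    ll = [loglik(y[i], th) for th in theta_draws]
    # CPO_i via the identity CPO_i = (E_full[1 / f(y_i | theta)])^-1
    inv_mean = sum(math.exp(-l) for l in ll) / len(ll)
    log_cpo = -math.log(inv_mean)
    # KL(full || case-deleted) = E_full[log f(y_i | theta)] - log CPO_i
    return sum(ll) / len(ll) - log_cpo

scores = [kl_influence(i) for i in range(len(y))]
```

The divergence is non-negative by Jensen's inequality, and the outlying observation receives by far the largest score, which is the calibration behavior the dissertation generalizes to Bregman divergences.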
 Date Issued
 2018
 Identifier
 2018_Su_DeOliveira_fsu_0071E_14712
 Format
 Thesis
 Title
 INFORMATION IN CENSORED MODELS.
 Creator

SCONING, JAMES., Florida State University
 Abstract/Description

Criteria are developed for measuring information in the randomly right-censored model. Measures which are appropriate include an extension of Shannon's entropy. The measures are seen to satisfy some fundamental properties, including (1) information decreases as censoring increases stochastically, (2) the uncensored case is always at least as informative as any censored model, and (3) the information gain is marginally decreasing. Measures of information in censored models can also be developed by adapting measures of dependence between the lifetime variable and the observed variable. Some common notions of bivariate dependence enjoy property (1) cited above. An exception occurs when dependence is defined in terms of association. Conditions under which the coefficients of divergence satisfy (1) and (2) are established. Information is also studied in terms of asymptotic efficiency. We consider the proportional hazards model where the distribution G of the censoring random variable is related to the distribution F of the lifetime variable via (1 − G) = (1 − F)^β. Nonparametric estimators of F are developed for the case where β is unknown and the case where β is known. Of interest in their own right, these estimators also enable us to study the robustness of the Kaplan–Meier estimator (KME) in a nonparametric model for which it is not the preferred estimator. Comparisons are based on asymptotic efficiencies and exact mean square errors. We also compare the KME to the empirical survival function, thereby providing, in a nonparametric setting, a measure of the loss in efficiency due to censoring.
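The final comparison above can be illustrated directly: with invented right-censored data, the Kaplan–Meier estimator discounts censored subjects from the risk set, while the naive empirical survival function treats every recorded time as a failure and therefore underestimates survival.

```python
# Toy comparison of the Kaplan-Meier estimator with the empirical survival
# function; times, and which of them are censored, are invented.
times  = [1, 2, 2, 4, 5, 6, 6, 9]
events = [1, 1, 0, 1, 0, 1, 1, 1]   # 1 = failure observed, 0 = censored

def kaplan_meier(t):
    s = 1.0
    for u in sorted({ti for ti, ei in zip(times, events) if ei and ti <= t}):
        d = sum(1 for ti, ei in zip(times, events) if ti == u and ei)
        n = sum(1 for ti in times if ti >= u)   # at risk just before u
        s *= 1 - d / n
    return s

def empirical(t):
    # naive estimate that treats censored times as if they were failures
    return sum(1 for ti in times if ti > t) / len(times)
```

With these data the Kaplan–Meier curve sits above the empirical survival function wherever censoring has occurred, which is the efficiency-loss contrast the abstract refers to.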
 Date Issued
 1986
 Identifier
 AAI8605791, 3086279, FSDT3086279, fsu:75762
 Format
 Document (PDF)
 Title
 Intensity Estimation in Poisson Processes with Phase Variability.
 Creator

Gordon, Glenna, Wu, Wei, Whyte, James, Srivastava, Anuj, Chicken, Eric, Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

Intensity estimation for Poisson processes is a classical problem and has been extensively studied over the past few decades. However, current methods of intensity estimation assume that phase variability, or compositional noise, i.e. a nonlinear shift along the time axis, is nonexistent in the data, which is an unreasonable assumption for practical observations. The key challenge is that these observations are not "aligned", and registration procedures are required for successful estimation. As a result, these estimation methods can yield estimators that are inefficient or that underperform in simulations and applications. This dissertation summarizes two key projects which examine estimation of the intensity of a Poisson process in the presence of phase variability. The first project proposes an alignment-based framework for intensity estimation. First, it is shown that the intensity function is area-preserved with respect to compositional noise. Such a property implies that the time warping is encoded only in the density, or normalized intensity, function. Then, the intensity function can be decomposed into the product of the estimated total intensity (a scalar value) and the estimated density function. The estimation of the density relies on a metric which measures the phase difference between two density functions. An asymptotic study shows that the proposed estimation algorithm provides a consistent estimator for the normalized intensity. The success of the proposed estimation algorithm is illustrated using two simulations, and the new framework is applied to a real data set of neural spike trains, showing that the proposed estimation method yields improved classification accuracy over previous methods. The second project utilizes 2014 Florida data from the Healthcare Cost and Utilization Project's State Inpatient Database and State Emergency Department Database (provided to the U.S. Department of Health and Human Services, Agency for Healthcare Research and Quality by the Florida Agency for Health Care Administration) to examine heart failure emergency department arrival times. Current estimation methods for examining emergency department arrival data ignore the functional nature of the data and implement naive analysis methods. In this dissertation, the arrivals are treated as a Poisson process and the intensity of the process is estimated using existing density estimation and function registration methods. The results of these analyses show the importance of considering the functional nature of emergency department arrival data and the critical role that function registration plays in the intensity estimation of the arrival process.
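The decomposition described in the first project can be sketched numerically: the estimated intensity is the product of a total-intensity scalar (mean event count per trial) and a density estimate of the event times. The spike times below are invented and assumed already aligned; a simple Gaussian kernel density stands in for the registration-based density estimation of the dissertation.

```python
import math

# Invented, pre-aligned event times from two trials on [0, 1]
spikes = [[0.11, 0.35, 0.36, 0.70], [0.12, 0.33, 0.71, 0.74, 0.90]]
all_events = [t for trial in spikes for t in trial]

# scalar part of the decomposition: expected number of events per trial
total_intensity = len(all_events) / len(spikes)

h = 0.08  # kernel bandwidth (a tuning choice)
def density(t):
    # Gaussian kernel density estimate of the normalized intensity
    return sum(math.exp(-0.5 * ((t - s) / h) ** 2) / (h * math.sqrt(2 * math.pi))
               for s in all_events) / len(all_events)

def intensity(t):
    # intensity = (total intensity) x (density), per the area-preservation
    # property stated in the abstract
    return total_intensity * density(t)
```

With phase variability present, the density factor would first be estimated after registering the trials under a phase metric; only the density, not the scalar, is affected by the warping.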
 Date Issued
 2016
 Identifier
 FSU_FA2016_Gordon_fsu_0071E_13511
 Format
 Thesis
 Title
 Interrelating of Longitudinal Processes: An Empirical Example.
 Creator

RoyalThomas, Tamika Y. N., McGee, Daniel, Levenson, Cathy, Sinha, Debajyoti, Osmond, Clive, Niu, Xufeng, Department of Statistics, Florida State University
 Abstract/Description

The Barker Hypothesis states that maternal and 'in utero' attributes during pregnancy affect a child's cardiovascular health throughout life. We present an analysis of a unique longitudinal dataset from Jamaica that consists of three longitudinal processes: (i) Maternal longitudinal process: blood pressure and anthropometric measurements at seven time points on the mother during pregnancy. (ii) In utero measurements: ultrasound measurements of the fetus taken at six time points during pregnancy. (iii) Birth-to-present process: children's anthropometric and blood pressure measurements at 24 time points from birth to 14 years. A comprehensive analysis of the interrelationship of these three longitudinal processes is presented using joint modeling for multivariate longitudinal profiles. We propose a new methodology for examining a child's cardiovascular risk by extending a current view of likelihood estimation. Joint modeling of multivariate longitudinal profiles is performed, and the extension of the traditional likelihood method is utilized and compared to the maximum likelihood estimates. Our main goal is to examine whether the process in mothers predicts fetal development, which in turn predicts the future cardiovascular health of the children. One of the difficulties with 'in utero' and early childhood data is that certain variables are highly correlated, so dimension reduction techniques are quite applicable in this scenario. Principal component analysis (PCA) is utilized to create a smaller set of uncorrelated variables, which is then used in a longitudinal analysis setting. These principal components are then used in an optimal linear mixed model for longitudinal data, which indicates that in utero and early childhood attributes predict the future cardiovascular health of the children.
This dissertation has added to the body of knowledge on the developmental origins of adult diseases and has supplied some significant results while utilizing a rich diversity of statistical methodologies.
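The PCA step can be sketched in closed form for two correlated measurements; the weight and length values below are invented, and `pc1` would play the role of the uncorrelated score entering the linear mixed model.

```python
import math

# Invented anthropometric measurements for six children (correlated pair)
weight = [3.1, 3.4, 2.9, 3.8, 3.3, 3.6]
length = [49.0, 51.0, 48.5, 53.0, 50.5, 52.0]

def center(x):
    m = sum(x) / len(x)
    return [v - m for v in x]

w, l = center(weight), center(length)
n = len(w)
# entries of the 2x2 sample covariance matrix
sww = sum(v * v for v in w) / (n - 1)
sll = sum(v * v for v in l) / (n - 1)
swl = sum(a * b for a, b in zip(w, l)) / (n - 1)

# leading eigenvalue/eigenvector of [[sww, swl], [swl, sll]] in closed form
lam = 0.5 * (sww + sll + math.sqrt((sww - sll) ** 2 + 4 * swl ** 2))
v1, v2 = swl, lam - sww                 # unnormalized eigenvector
norm = math.hypot(v1, v2)
v1, v2 = v1 / norm, v2 / norm

# first principal component scores (one uncorrelated summary per child)
pc1 = [v1 * a + v2 * b for a, b in zip(w, l)]
```

With strongly correlated inputs, the leading component absorbs nearly all of the variance, which is exactly why a single score per child can feed the subsequent mixed-model stage.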
 Date Issued
 2011
 Identifier
 FSU_migr_etd1792
 Format
 Thesis
 Title
 Investigating the Categories for Cholesterol and Blood Pressure for Risk Assessment of Death Due to Coronary Heart Disease.
 Creator

Franks, Billy J., McGee, Daniel, Hurt, Myra, Huffer, Fred, Niu, Xufeng, Department of Statistics, Florida State University
 Abstract/Description

Many characteristics for predicting death due to coronary heart disease are measured on a continuous scale. These characteristics, however, are often categorized for clinical use and to aid in treatment decisions. We would like to derive a systematic approach to determine the best categorizations of systolic blood pressure and cholesterol level for use in identifying individuals who are at high risk for death due to coronary heart disease, and to compare these data-derived categories to those in common usage. Whatever categories are chosen, they should allow physicians to accurately estimate the probability of survival from coronary heart disease until some time t. The best categories will be those that provide the most accurate prediction of an individual's risk of dying by t. The approach used to determine these categories will be a version of Classification and Regression Trees that can be applied to censored survival data. The major goals of this dissertation are to obtain data-derived categories for risk assessment, to compare these categories to the ones already recommended in the medical community, and to assess the performance of these categories in predicting survival probabilities.
 Date Issued
 2005
 Identifier
 FSU_migr_etd4402
 Format
 Thesis
 Title
 Investigating the Chi-Square-Based Model-Fit Indexes for WLSMV and ULSMV Estimators.
 Creator

Xia, Yan, Yang, Yanyun, Huffer, Fred W. (Fred William), Almond, Russell G., Becker, Betsy Jane, Paek, Insu, Florida State University, College of Education, Department of Educational Psychology and Learning Systems
 Abstract/Description

In structural equation modeling (SEM), researchers use the model chi-square statistic and model-fit indexes to evaluate model-data fit. Root mean square error of approximation (RMSEA), comparative fit index (CFI), and Tucker-Lewis index (TLI) are widely applied model-fit indexes. When data are ordered and categorical, the most popular estimator is the diagonally weighted least squares (DWLS) estimator. Robust corrections have been proposed to adjust the uncorrected chi-square statistic from DWLS so that its first- and second-order moments are in alignment with the target central chi-square distribution under correctly specified models. DWLS with such a correction is called the mean- and variance-adjusted weighted least squares (WLSMV) estimator. An alternative to WLSMV is the mean- and variance-adjusted unweighted least squares (ULSMV) estimator, which has been shown to perform as well as, or slightly better than, WLSMV. Because the chi-square statistic is corrected, the chi-square-based RMSEA, CFI, and TLI are also corrected by replacing the uncorrected chi-square statistic with the robust chi-square statistic. The robust model-fit indexes calculated in this way are named the population-corrected robust (PR) model-fit indexes, following Brosseau-Liard, Savalei, and Li (2012). The PR model-fit indexes are currently reported in almost every application where WLSMV or ULSMV is used. Nevertheless, previous studies have found that the PR model-fit indexes from WLSMV are sensitive to several factors such as sample size, model size, and the thresholds for categorization. The first focus of this dissertation is the dependency of model-fit indexes on the thresholds for ordered categorical data. Because the weight matrix in the WLSMV fit function and the correction factors for both WLSMV and ULSMV include the asymptotic variances of thresholds and polychoric correlations, the model-fit indexes are very likely to depend on the thresholds.
The dependency of model-fit indexes on the thresholds is not a desirable property, because when the misspecification lies in the factor structure (e.g., cross-loadings are ignored or two factors are treated as a single factor), model-fit indexes should reflect that misspecification rather than the threshold values. As alternatives to the PR model-fit indexes, Brosseau-Liard et al. (2012), Brosseau-Liard and Savalei (2014), and Li and Bentler (2006) proposed the sample-corrected robust (SR) model-fit indexes. The PR fit indexes are found to converge to distorted asymptotic values, whereas the SR fit indexes converge to their definitions asymptotically. However, the SR model-fit indexes were proposed for continuous data, and have been neither investigated nor implemented in SEM software when WLSMV and ULSMV are applied. This dissertation therefore investigates the PR and SR model-fit indexes for WLSMV and ULSMV. The first part of the simulation study examines the dependency of the model-fit indexes on the thresholds when the model misspecification results from omitting cross-loadings or collapsing factors in confirmatory factor analysis. The study is conducted on extremely large computer-generated datasets in order to approximate the asymptotic values of the model-fit indexes. The results show that only the SR fit indexes from ULSMV are independent of the population threshold values, given the other design factors. The PR fit indexes from ULSMV, and the PR and SR fit indexes from WLSMV, are influenced by thresholds, especially when data are binary and the hypothesized model is greatly misspecified. The second part of the simulation varies the sample size from 100 to 1,000 to investigate whether the SR fit indexes under finite samples are more accurate estimates of the defined values of RMSEA, CFI, and TLI, compared with the uncorrected model-fit indexes without robust correction and the PR fit indexes. Results show that the SR fit indexes are the most accurate in general.
However, when the thresholds differ across items, data are binary, and the sample size is less than 500, all versions of these indexes can be very inaccurate; in such situations, larger sample sizes are needed. In addition, the conventional cutoffs developed from continuous data with maximum likelihood (e.g., RMSEA < .06, CFI > .95, and TLI > .95; Hu & Bentler, 1999) have been applied to WLSMV and ULSMV despite arguments against such a practice (e.g., Marsh, Hau, & Wen, 2004). For comparison purposes, this dissertation reports the RMSEA, CFI, and TLI based on continuous data using maximum likelihood before the variables are categorized to create ordered categorical data. Results show that the model-fit indexes from maximum likelihood are very different from those from WLSMV and ULSMV, suggesting that the conventional rules should not be applied to WLSMV and ULSMV.
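The three indexes named above have standard chi-square-based closed forms; a minimal sketch of the uncorrected (maximum-likelihood-style) definitions, not the WLSMV/ULSMV robust corrections studied in the dissertation:

```python
import math

def fit_indexes(chi2_m, df_m, chi2_b, df_b, n):
    """RMSEA, CFI, and TLI from the hypothesized model's chi-square
    (chi2_m, df_m), the baseline model's chi-square (chi2_b, df_b),
    and the sample size n."""
    # RMSEA: per-degree-of-freedom, per-subject misfit
    rmsea = math.sqrt(max(chi2_m - df_m, 0.0) / (df_m * (n - 1)))
    # CFI: relative reduction in noncentrality versus the baseline model
    cfi = 1.0 - max(chi2_m - df_m, 0.0) / max(chi2_m - df_m, chi2_b - df_b, 0.0)
    # TLI: compares chi-square/df ratios, penalizing model complexity
    tli = ((chi2_b / df_b) - (chi2_m / df_m)) / ((chi2_b / df_b) - 1.0)
    return rmsea, cfi, tli
```

When the model chi-square equals its degrees of freedom (perfect fit on average), RMSEA is 0 and CFI is 1.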
 Date Issued
 2016
 Identifier
 FSU_2016SU_Xia_fsu_0071E_13379
 Format
 Thesis
 Title
 Investigating the Use of Mortality Data as a Surrogate for Morbidity Data.
 Creator

Miller, Gregory, Hollander, Myles, McGee, Daniel, Hurt, Myra, Wu, Wei, Zhang, Jinfeng, Department of Statistics, Florida State University
 Abstract/Description

We are interested in differences between risk models based on Coronary Heart Disease (CHD) incidence, or morbidity, and risk models based on CHD death. Risk models based on morbidity have been developed from the Framingham Heart Study, while the European SCORE project developed a risk model for CHD death. Our goal is to determine whether these two models differ in treatment decisions concerning patient heart health. We begin by reviewing recent metrics for surrogate variables and prognostic model performance. We then conduct bootstrap hypothesis tests between two Cox proportional hazards models fit to Framingham data, one with incidence as the response and one with death as the response, and find that the coefficients differ for the age covariate but show no significant differences for the other risk factors. To understand how surrogacy applies to our case, where the surrogate variable is nested within the true variable of interest, we examine models based on a composite event compared to models based on singleton events. We also conduct a simulation, simulating time to CHD incidence and time from CHD incidence to CHD death, censoring at 25 years to represent the end of a study. We compare a Cox model with death as the response to a Cox model based on incidence using bootstrapped confidence intervals, and find differences in the age and systolic blood pressure coefficients. We continue the simulation by using the Net Reclassification Index (NRI) to evaluate the treatment-decision performance of the two models, and find that the two models do not perform significantly differently in correctly classifying events if the decisions are based on the risk ranks of the individuals. As long as the relative order of patients' risks is preserved across different risk models, treatment decisions based on classifying an upper specified percent as high risk will not be significantly different.
We conclude the dissertation with statements about future methods for approaching our question.
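The two-category NRI used above can be sketched as follows; the 0.2 risk cutoff is a hypothetical illustration, not a value from the dissertation:

```python
def net_reclassification_index(old_risk, new_risk, events, threshold=0.2):
    """Two-category Net Reclassification Index comparing two risk models.

    old_risk, new_risk: predicted risks from the two models
    events: 1 if the subject experienced the event, 0 otherwise
    threshold: cutoff defining the high-risk category (hypothetical value)
    """
    up_e = down_e = up_ne = down_ne = n_e = n_ne = 0
    for old, new, y in zip(old_risk, new_risk, events):
        moved_up = old < threshold <= new      # low-risk -> high-risk
        moved_down = new < threshold <= old    # high-risk -> low-risk
        if y:
            n_e += 1
            up_e += moved_up
            down_e += moved_down
        else:
            n_ne += 1
            up_ne += moved_up
            down_ne += moved_down
    # events should be reclassified upward, non-events downward
    return (up_e - down_e) / n_e + (down_ne - up_ne) / n_ne
```

A positive NRI means the new model moves events up and non-events down more often than the reverse; identical classifications give an NRI of zero.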
 Date Issued
 2011
 Identifier
 FSU_migr_etd2408
 Format
 Thesis
 Title
 AN INVESTIGATION OF THE EFFECT OF THE SWAMPING PHENOMENON ON SEVERAL BLOCK PROCEDURES FOR MULTIPLE OUTLIERS IN UNIVARIATE SAMPLES.
 Creator

WOOLLEY, THOMAS WILLIAM, JR., Florida State University
 Abstract/Description

Statistical outliers have been an issue of concern to researchers for over two centuries, and are the focus of this study. Sources of outliers, and various means for dealing with them, are discussed. Also presented are general descriptions of univariate outlier tests as well as the two approaches to handling multiple-outlier situations, consecutive and block testing. The major problems inherent in these latter methods, masking and swamping, respectively, are recounted. Specifically, the primary aim of this study is to assess the susceptibility to swamping of four block procedures for multiple outliers in univariate samples. Pseudorandom samples are generated from a unit normal distribution, and varying numbers of upper outliers are placed in them according to specified criteria. A swamping index is created which reflects the relative vulnerability of each test to declare a block consisting of the outliers and the most extreme upper non-outlier discordant as a unit. The results of this investigation reveal that the four block tests differ in their respective susceptibilities to swamping, depending upon sample size and the prespecified number of outliers assumed to be present. Rank orderings of these four tests based upon their vulnerability to swamping under varying circumstances are presented. In addition, alternate approaches to calculating the swamping index when four or more outliers exist are described. Recommendations concerning the appropriate application of the four block procedures under differing situations, and proposals for further research, are advanced.
 Date Issued
 1981
 Identifier
 AAI8113272, 3084903, FSDT3084903, fsu:74401
 Format
 Document (PDF)
 Title
 KLEMS translog cost estimates and energy elasticities.
 Creator

Campbell, Timothy Alan., Florida State University
 Abstract/Description

Data from the Bureau of Labor Statistics (BLS) for capital, labor, energy, materials, and business services (KLEMS) are used to estimate translog cost functions. Much of the work developing and testing production and cost functions has used the same Berndt and Wood (BW) data for total manufacturing. Results from the BLS data are compared with the BW data, and considerable differences are found. To improve the translog estimates, the Kalman filter and state space form are used to permit the time proxy for technological change to follow a random walk with drift. The general state space form provides a unified structure that subsumes other models; after smoothing, the Kalman filter model is equivalent to one that includes the time proxy. An error-correction model (ECM) is used to make the translog specification more dynamic. Nested within the most general ECM specification are the more restrictive static, partial adjustment, and autoregressive models. Likelihood ratio tests reject the more restricted models in favor of the general ECM specification, but theoretical symmetry and adding-up restrictions are rejected for most two-digit Standard Industrial Classification industries using the general ECM specification. Elasticities are computed for total manufacturing and compared with those found in other studies, with a special emphasis on energy. Many violations of the monotonicity, own-price, and concavity theoretical requirements are found.
 Date Issued
 1993
 Identifier
 AAI9410157, 3088225, FSDT3088225, fsu:77029
 Format
 Document (PDF)
 Title
 Knowledge acquisition and pattern recognition with random sets.
 Creator

Peng, Xiantu T., Florida State University
 Abstract/Description

In this dissertation we investigate knowledge acquisition (KA) and pattern recognition (PR) from a mathematical point of view. Based on random set theory, we develop estimation theorems and procedures for set-valued statistics, such as nonparametric estimators and set-valuedization techniques. Under a random interval assumption, we establish some special possibility distributions that can be easily implemented in KA tools. The knowledge studied here consists of rules describing relationships between various concepts, as used in diagnosis (pattern recognition) expert systems. Several examples are given to illustrate the estimation theorems and procedures for the acquisition of concepts and relationships. We apply our acquisition techniques to a modeling prediction example in two different ways: one by acquiring the concepts and relationships simultaneously, the other by acquiring rules for predefined concepts. On two classification problems, we use our methods to acquire classification rules, and the results are compared with several machine learning methods. Finally, we introduce an expert system shell, STIM, which is largely based on the theory and methods developed here. The embedded KA tools and recognition process are discussed in detail.
 Date Issued
 1991
 Identifier
 AAI9213744, 3087750, FSDT3087750, fsu:76560
 Format
 Document (PDF)
 Title
 LARGE DEVIATION LOCAL LIMIT THEOREMS, WITH APPLICATIONS.
 Creator

CHAGANTY, NARASINGA RAO., Florida State University
 Abstract/Description

Let {X_n, n ≥ 1} be a sequence of i.i.d. random variables with E(X_1) = 0, Var(X_1) = 1. Let ψ(s) be the cumulant generating function (c.g.f.) and γ the large deviation rate of X_1 [formula omitted in the source abstract]. Let S_n = X_1 + ... + X_n. Under some mild conditions on ψ, Richter (Theory Prob. Appl. (1957) 2, 206-219) showed that the probability density function f_n of S_n/√n has an asymptotic expression [omitted in the source abstract] whenever x_n = o(√n) and √n x_n > 1. In this dissertation we obtain similar large deviation local limit theorems for arbitrary sequences of random variables, not necessarily sums of i.i.d. random variables, thereby increasing the applicability of Richter's theorem. Let {T_n, n ≥ 1} be an arbitrary sequence of non-lattice random variables with characteristic function (c.f.) φ_n. Let ψ_n, γ_n be the c.g.f. and the large deviation rate of T_n/n. The main theorem in Chapter II shows that under some standard conditions on ψ_n, which imply that T_n/n converges to a constant in probability, the density function K_n of T_n/n has an asymptotic expression [omitted in the source abstract], where m_n is any sequence of real numbers and τ_n is defined by ψ_n′(τ_n) = m_n. When T_n is the sum of n i.i.d. random variables, our result reduces to Richter's theorem. Similar theorems for lattice-valued random variables are also presented, which are useful in obtaining asymptotic probabilities for the Wilcoxon signed-rank test statistic and Kendall's tau. In Chapter III we use the results of Chapter II to obtain a central limit theorem for sums of a triangular array of dependent random variables X_j^(n), j = 1, ..., n, with joint distribution given by z_n^(-1) exp{H_n(x_1, ..., x_n)} Π dP(x_j), where x_i ∈ R for all i ≥ 1.
The function H_n(x_1, ..., x_n) is known as the Hamiltonian. Here P is a probability measure on R. When H_n(x_1, ..., x_n) = log φ_n(s_n/n), where s_n = x_1 + ... + x_n, and the probability measure P satisfies appropriate conditions, we show that there exist an integer r ≥ 1 and a sequence τ_n such that (S_n − nτ_n)/n^(1−1/2r) has a limiting distribution which is non-Gaussian if r ≥ 2. This result generalizes the theorems of Jong-Woo Jeon (Ph.D. Thesis, Dept. of Stat., F.S.U. (1979)) and Ellis and Newman (Z. Wahrscheinlichkeitstheorie und Verw. Gebiete (1978) 44, 117-139). Chapters IV and V extend the above to the multivariate case.
 Date Issued
 1982
 Identifier
 AAI8225279, 3085419, FSDT3085419, fsu:74914
 Format
 Document (PDF)
 Title
 Learning Political Will in Organizations: A Social Learning Theory Perspective.
 Creator

Maher, Liam Patrick, Ferris, Gerald R., Schatschneider, Christopher, Hochwarter, Wayne A., Van Iddekinge, Chad H., Wang, Gang, Florida State University, College of Business, Department of Management
 Abstract/Description

The past several decades have seen great advances in the field of organizational politics. At the individual level, political skill has garnered the majority of the scholarly focus, whereas its motivational counterpart, political will, has gone relatively unexamined. Political will represents the motivation to engage in political behavior, which, regardless of the skill with which it is executed, potentially has tremendous effects on myriad organizational outcomes. Thus, it is critical for scholars to understand how political will spreads through work units. This dissertation synthesizes theories of political will, political skill, social identity, social learning, and relationship quality to explain how followers learn political will from their leaders and environments. Specifically, I propose that when leaders possess political will, they engage in political behavior; followers learn the virtues and drawbacks of political behavior from their leaders, both vicariously and through direct mentoring, and thus their political will should be a function of their leader's political will. Because leaders and their many followers differ in their levels of leader-member relationship quality, political skill, and self-concept congruence, it is proposed that these differences drive the level of learning that occurs. The proposed model is tested using data from 406 government workers and their 78 direct supervisors. The primary analyses supported only the hypothesis that leader political will predicts leader political behavior. Exploratory analyses that employed follower-rated measures of leader political behavior provided evidence that follower political will is a function of follower perceptions of their leader's political behavior and their own histories with organizational politics. Strengths, limitations, and opportunities for future research are discussed.
 Date Issued
 2018
 Identifier
 2018_Sp_Maher_fsu_0071E_14422
 Format
 Thesis
 Title
 Likelihood ratio based confidence bands in survival analysis.
 Creator

Yang, Jie., Florida State University
 Abstract/Description

Thomas and Grunkemeier (1975) introduced a nonparametric likelihood ratio approach to confidence interval estimation of survival probabilities based on right-censored data. We construct simultaneous confidence bands for survival, cumulative hazard rate, and quantile functions using this approach. The boundaries of the bands for survival functions are contained within (0,1). A procedure essentially equivalent to a bias correction is developed; the resulting increase in coverage accuracy is illustrated by an example and a simulation study. We examine various versions of likelihood ratio based (LR) confidence bands for the survival function and compare them with the Hall-Wellner band and Nair's equal-precision band. We show that LR bands for the cumulative hazard rate function and the quantile function can be obtained by applying a functional and the inverse transformation of the survival function, respectively, to an LR band for the survival function. At the same time, the test-based and reflected methods are shown to be valid for constructing bands for the quantile function. The various confidence bands for the quantile function are illustrated through an example.
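A rough sketch of the pointwise Thomas-Grunkemeier construction underlying these bands: hazards are re-estimated under the constraint S(t) = p via a multiplier λ, and the -2 log likelihood ratio is evaluated; the bracketing bounds and bisection below are illustrative choices, not the dissertation's algorithm:

```python
import math

def constrained_surv(risk_death, lam):
    """Product-limit estimate with constrained hazards d_i/(n_i + lam)."""
    prod = 1.0
    for n_i, d_i in risk_death:
        prod *= 1.0 - d_i / (n_i + lam)
    return prod

def neg2_log_lr(risk_death, p):
    """-2 log empirical likelihood ratio for H0: S(t) = p.

    risk_death: (n_i, d_i) pairs, numbers at risk and deaths at each
    distinct death time up to t; lam = 0 recovers the Kaplan-Meier value.
    """
    # bisect for the multiplier lam making the constrained estimate hit p
    lo = -min(n_i - d_i for n_i, d_i in risk_death) + 1e-9
    hi = 1e6                                    # illustrative upper bracket
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if constrained_surv(risk_death, mid) < p:
            lo = mid                            # estimate increases with lam
        else:
            hi = mid
    lam = (lo + hi) / 2.0
    stat = 0.0
    for n_i, d_i in risk_death:
        h0 = d_i / n_i                          # unconstrained hazard
        h1 = d_i / (n_i + lam)                  # constrained hazard
        stat += 2.0 * d_i * math.log(h0 / h1)
        if n_i > d_i:
            stat += 2.0 * (n_i - d_i) * math.log((1.0 - h0) / (1.0 - h1))
    return stat
```

Inverting this statistic against a chi-square quantile yields a pointwise confidence interval for S(t) whose limits stay inside (0,1), as the abstract notes for the full bands.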
 Date Issued
 1995
 Identifier
 AAI9544337, 3088762, FSDT3088762, fsu:77561
 Format
 Document (PDF)
 Title
 Limit theorems for Markov random fields.
 Creator

Kurien, Thekkthalackal Varugis., Florida State University
 Abstract/Description

Markov Random Fields (MRFs) have been extensively applied in statistical mechanics as well as in Bayesian image analysis. MRFs are a special class of dependent random variables located at the vertices of a graph whose joint distribution includes a parameter called the temperature. When the number of vertices of the graph tends to infinity, the normalized distributions of statistics based on these random variables converge in distribution. It can happen that, for certain values of the temperature, the rate of growth of these normalizing constants changes drastically. This feature is generally used to explain the phenomenon of phase transition as understood by physicists. In this dissertation we show that this drastic change in normalizing constants occurs even in the relatively smooth case when all the random variables are Gaussian. Hence any image-analytic MRF ought to be checked for such discontinuous behavior before any analysis is performed. Mixed limit theorems in Bayesian image analysis seek to replace intensive simulations of MRFs with limit theorems that approximate the distribution of the MRFs as the number of sites increases. The problem of deriving mixed limit theorems for MRFs on a one-dimensional lattice graph with an acceptor function that has a second moment has been studied by Chow. A mixed limit theorem for the integer lattice graph is derived when the acceptor function does not have a second moment, as for instance when the acceptor function is a symmetric stable density of index 0 < α < 2.
 Date Issued
 1991
 Identifier
 AAI9202304, 3087655, FSDT3087655, fsu:76470
 Format
 Document (PDF)
 Title
 Logistic Regression, Measures of Explained Variation, and the Base Rate Problem.
 Creator

Sharma, Dinesh R., McGee, Daniel L., Hurt, Myra, Niu, XuFeng, Chicken, Eric, Department of Statistics, Florida State University
 Abstract/Description

One of the desirable properties of the coefficient of determination (R2 measure) is that its values for different models should be comparable whether the models differ in one or more predictors or in the dependent variable, or whether the models are specified as being different for different subsets of a dataset. This allows researchers to compare the adequacy of models across subgroups of the population, or of models with different but related dependent variables. However, the various analogs of the R2 measure used for logistic regression analysis are highly sensitive to the base rate (the proportion of successes in the sample) and thus do not possess this property. An R2 measure sensitive to the base rate is not suitable for comparing the same or different models on different datasets, different subsets of a dataset, or different but related dependent variables. We evaluated 14 R2 measures that have been suggested, or might be useful, for measuring the explained variation in logistic regression models, based on three criteria: 1) intuitively reasonable interpretability; 2) numerical consistency with the rho2 of the underlying model; and 3) base rate sensitivity. We carried out a Monte Carlo simulation study to examine the numerical consistency and the base rate dependency of the various R2 measures for logistic regression analysis. We found all of the parametric R2 measures to be substantially sensitive to the base rate. The magnitude of the base rate sensitivity of these measures tends to be further influenced by the rho2 of the underlying model. None of the measures considered in our study performs equally well on all three evaluation criteria. While R2L stands out for its intuitively reasonable interpretability as a measure of explained variation as well as its independence from the base rate, it appears to severely underestimate the underlying rho2.
We found R2CS to be numerically most consistent with the underlying rho2, with R2N its nearest competitor. In addition, the base rate sensitivity of these two measures appears to be very close to that of R2L, the most base-rate-invariant parametric R2 measure. Therefore, we suggest using R2CS and R2N for logistic regression modeling, especially when it is reasonable to believe that an underlying latent variable exists. However, when the latent variable does not exist, comparability with the underlying rho2 is not an issue and R2L might be a better choice over all the R2 measures.
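The measures compared above have closed forms in terms of log-likelihoods; a sketch, assuming R2L denotes the McFadden log-likelihood-ratio measure, R2CS the Cox-Snell measure, and R2N the Nagelkerke rescaling:

```python
import math

def logistic_r2_measures(ll_model, ll_null, n):
    """Three R2 analogs for a fitted logistic regression, from the model
    log-likelihood, the intercept-only log-likelihood, and the sample size."""
    r2_l = 1.0 - ll_model / ll_null                          # McFadden, R2_L
    r2_cs = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)   # Cox-Snell, R2_CS
    r2_cs_max = 1.0 - math.exp(2.0 * ll_null / n)            # attainable maximum of R2_CS
    r2_n = r2_cs / r2_cs_max                                 # Nagelkerke, R2_N
    return r2_l, r2_cs, r2_n
```

Because R2_CS cannot reach 1 even for a perfect model, the Nagelkerke version divides by its attainable maximum, which is one reason the two behave so similarly under base-rate changes.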
 Date Issued
 2006
 Identifier
 FSU_migr_etd1789
 Format
 Thesis
 Title
 LUMPABILITY AND WEAK LUMPABILITY IN FINITE MARKOV CHAINS.
 Creator

ABDELMONEIM, ATEF MOHAMED., Florida State University
 Abstract/Description

Consider a Markov chain x(t), t = 0, 1, 2, ..., with a finite state space N = {1, 2, ..., n}, transition probability matrix P = (p_ij), i, j ∈ N, and an initial probability vector V = (v_i), i ∈ N. For m ≤ n, let A = {A_1, A_2, ..., A_m} be a partition of the set N, and define the process y(t) = i whenever x(t) ∈ A_i. The new process y(t), called a function of the Markov chain, need not be Markov. If y(t) is again Markov whatever the initial probability vector of x(t), then x(t) is said to be lumped to y(t) with respect to the partition A. If y(t) is Markov only for certain initial probability vectors of x(t), then x(t) is said to be weakly lumped to y(t) with respect to the partition A. Conditions under which x(t) can be lumped or weakly lumped to y(t) with respect to A are introduced. Relationships between the two processes x(t) and y(t), and the properties of the new process y(t), are discussed. Criteria are developed to determine whether a given Markov chain can be weakly lumped with respect to a given partition, in terms of an analysis of systems of linear equations. Necessary and sufficient conditions on the transition probability matrix of a Markov chain, a partition A of N, and a subset S of probability vectors for weak lumpability to occur are given in terms of the solution classes of these systems of linear equations. Finally, given that weak lumping occurs, the class S of all initial probability vectors which allow weak lumping is determined, as is the transition probability matrix of the lumped process y(t). Lumpability and weak lumpability are also studied for Markov chains which are not irreducible. This involves a study of the interplay between two partitions of the state space N: the partition C, induced by the closed sets of states of the Markov chain, and the partition A, with respect to which lumpability is to be considered.
Under the assumption that lumpability occurs, the relationships which must exist between the sets of the two partitions A and C are obtained in detail. It is found, for example, that if neither partition is a refinement of the other and (A, C) form an irreducible pair of partitions over N, then for each A ∈ A and C ∈ C, A ∩ C ≠ ∅. Further conditions which the transition probability matrix P must satisfy if lumpability is to hold are obtained, as are relationships which must exist between P and P*. Suppose a process y(t) is known to arise as a result of a weak lumping or lumping from some unknown Markov chain x(t). Let χ(t) be the class of all Markov chains x(t) with n states which yield this weak lumping or lumping. The problem of characterizing this class, and a class S of initial probability vectors which allow this lumping, is considered. A complete solution is given when n = 3 and m = 2. The importance of lumpability in applications is discussed.
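The strong-lumpability condition underlying this work (block-to-block transition mass constant within each block, as in Kemeny and Snell) is easy to check numerically. A minimal sketch, with an invented example rather than anything from the dissertation:

```python
import numpy as np

def is_lumpable(P, partition, tol=1e-12):
    """Kemeny-Snell test for strong lumpability of transition matrix P
    with respect to a partition (a list of lists of state indices).
    Returns (True, lumped_matrix) or (False, None)."""
    P = np.asarray(P, dtype=float)
    lumped = np.zeros((len(partition), len(partition)))
    for i, Ai in enumerate(partition):
        for j, Aj in enumerate(partition):
            # total probability of jumping into block Aj from each state of Ai
            block_sums = P[np.ix_(Ai, Aj)].sum(axis=1)
            if np.ptp(block_sums) > tol:  # must be constant over Ai
                return False, None
            lumped[i, j] = block_sums[0]
    return True, lumped
```

For instance, the 3-state chain with rows (0.5, 0.25, 0.25), (0.25, 0.5, 0.25), (0.25, 0.25, 0.5) lumps with respect to {{1}, {2, 3}} to a 2-state chain.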
 Date Issued
 1980
 Identifier
 AAI8109927, 3084860, FSDT3084860, fsu:74361
 Format
 Document (PDF)
 Title
 Marked Determinantal Point Processes.
 Creator

Feng, Yiming, Nolder, Craig, Niu, Xufeng, Bradley, Jonathan R., Huffer, Fred W. (Fred William), Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

Determinantal point processes (DPPs), which can be defined by their correlation kernels with known moments, are useful models for point patterns in which nearby points exhibit repulsion. They have many nice properties, such as closed-form densities, tractable estimation of parameterized families, and no edge effects. Univariate DPPs have been well studied, both in discrete and continuous settings, although their statistical applications are fairly recent and still rather limited; multivariate DPPs, the so-called multi-type marked DPPs, have been little explored. In this thesis, we propose a class of multivariate DPPs based on a block kernel construction. For the marked DPP, we show that the conditions for existence of the DPP can easily be satisfied. The block construction allows us to model the individually marked DPPs as well as to control the scale of repulsion between points having different marks. Unlike other researchers, who model the kernel function of a DPP directly, we model its spectral representation, which not only guarantees the existence of the multivariate DPP but also makes simulation-based estimation methods readily available. In our research, we adopted a bivariate complex Fourier basis, which exhibits nice properties such as constant intensity and approximate isotropy at short distances between nearby points. The parameterized block kernels can approximate commonly used covariance functions via Fourier expansion. The parameters can be estimated by maximum likelihood estimation, a Bayesian approach, or minimum contrast estimation.
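For intuition about the repulsion a DPP encodes: in the discrete case, a DPP with marginal kernel K assigns inclusion probability det(K_S) to every finite point set S, so joint occurrence is discounted relative to independence. A small illustration with an invented 2-point kernel (not from the thesis):

```python
import numpy as np

def inclusion_probability(K, S):
    """For a discrete DPP with marginal kernel K (symmetric, eigenvalues
    in [0, 1]), the probability that every point in S appears in the
    sample is det(K_S), the determinant of the principal submatrix."""
    K = np.asarray(K, dtype=float)
    S = list(S)
    return float(np.linalg.det(K[np.ix_(S, S)]))
```

With K = [[0.5, 0.4], [0.4, 0.5]], each point appears with probability 0.5, but the pair appears with probability 0.25 - 0.16 = 0.09, well below the independent product 0.25; the off-diagonal entry controls the strength of the repulsion.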
 Date Issued
 2019
 Identifier
 2019_Spring_Feng_fsu_0071E_15011
 Format
 Thesis
 Title
 Matched Sample Based Approach for CrossPlatform Normalization on Gene Expression Data.
 Creator

Shao, Jiang, Zhang, Jinfeng, Sang, QingXiang Amy, Wu, Wei, Niu, Xufeng, Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

Gene expression profiling data are widely used in all kinds of biomedical studies, especially in cancer research. This dissertation focuses on the problem of combining datasets arising from different studies; of particular interest is how to remove the platform effect alone. The matched-sample-based cross-platform normalization method we developed is designed to tackle the data merging problem in two scenarios. The first is Affy-Agilent cross-platform normalization, where both platforms are classic microarray gene expression profiles. The second is the integration of microarray data with next-generation sequencing data. We use several general validation measures to assess our method and compare it with the popular distance-weighted discrimination (DWD) method. On data from the public web-based tool NCI60 CellMiner and The Cancer Genome Atlas data portal, our proposed method outperformed DWD in both cross-platform scenarios. The method can be further assessed by its ability to preserve biological features in studies of cancer type discrimination. We applied our method to two classification problems: breast cancer tumor/normal status classification on microarray and next-generation sequencing datasets, and breast cancer chemotherapy response classification on GPL96 and GPL570 microarray datasets. Both problems show that classification power is increased after our matched-sample-based cross-platform normalization.
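One simple reading of the matched-sample idea is a per-gene calibration: use samples assayed on both platforms to learn a linear map from one platform's scale to the other's, then apply that map to unmatched data. The sketch below is an illustrative assumption (per-gene OLS via `np.polyfit`), not the dissertation's exact model:

```python
import numpy as np

def matched_sample_calibration(A_matched, B_matched, A_new):
    """Per-gene OLS calibration: learn an intercept/slope mapping
    platform-A values onto the platform-B scale from matched samples
    (rows = samples, columns = genes), then transform new platform-A
    data. A sketch of the matched-sample idea, not the exact method."""
    n_genes = A_matched.shape[1]
    B_hat = np.empty_like(A_new, dtype=float)
    for g in range(n_genes):
        slope, intercept = np.polyfit(A_matched[:, g], B_matched[:, g], 1)
        B_hat[:, g] = intercept + slope * A_new[:, g]
    return B_hat
```

Because the matched samples are biologically identical across platforms, any systematic difference the fit captures is, by design, platform effect rather than biology.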
 Date Issued
 2015
 Identifier
 FSU_2015fall_Shao_fsu_0071E_12833
 Format
 Thesis
 Title
 A MatchedSampleBased Normalization Method: CrossPlatform Microarray and NGS Data Integration.
 Creator

Zhang, Se Rin, Zhang, Jinfeng, Sang, QingXiang, Wu, Wei, Niu, Xufeng, Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

Utilizing high-throughput gene expression data stored in public archives not only saves research time and cost but also enhances statistical power. However, gene expression profiling data can be obtained from many different technical platforms, and the same gene expression quantified on different platforms has different distributional properties, which makes data integration across multiple platforms challenging. Several cross-platform normalization methods have been developed to remove the differences caused by platform discrepancy, but they remove important biological signals as well. Zhang and Jiang (2015) introduced a new method that focuses on eliminating the platform effect among the systematic effects by employing matched samples, measured on different platforms, to obtain a benchmark model. Since the matched samples have no biological difference, their approach can remove solely the platform effect. They showed that the new method performs better than the distance-weighted discrimination (DWD) method. This work is a follow-up study in which we attempt to improve the method by incorporating a Fast Linear Mixed Regression (FLMER) model. The results indicate that the FLMER model works better than the originally proposed ordinary least squares (OLS) model in after-normalization concordance comparisons and in differential expression (DE) analysis. We also compare our method to other existing cross-platform normalization methods, including not only DWD but also the empirical Bayes, XPN, and GQ methods. The results show that the proposed method performs much better than the other cross-platform normalization methods at removing platform differences while keeping the biological information.
 Date Issued
 2018
 Identifier
 2018_Fall_Zhang_fsu_0071E_14868
 Format
 Thesis
 Title
 A MATHEMATICAL STUDY OF THE DIRICHLET PROCESS.
 Creator

TIWARI, RAM CHANDRA., Florida State University
 Abstract/Description

This dissertation is a contribution to the theory of Bayesian nonparametrics. A construction of the Dirichlet process (Ferguson, 1973) on a finite set χ is introduced in such a way that it leads to Blackwell's (1973) constructive definition of a Dirichlet process on a Borel space (χ, A). If (χ, A) is a Borel space and P is a random probability measure on (χ, A) with a Dirichlet process prior D^α, then, under the condition that the α-measure of every open subset of χ is positive, for almost every realization P of P the set of discrete mass points of P is dense in χ. A more general constructive definition introduced by Sethuraman (1978) is used to derive several new properties of the Dirichlet process and to present in a unified way some of the known properties of the process. An alternative construction of Dalal's (1975) G-invariant Dirichlet process (G being a finite group of transformations) is presented. The Bayes estimates of an estimable parameter of degree k (k ≥ 1), namely ψ_k(P) = ∫⋯∫ h(x_1, ..., x_k) dP(x_1)⋯dP(x_k), where h is a symmetric kernel, are derived for the no-sample-size case and for a sample of size n from P, under the squared error loss function and a Dirichlet process prior. Using the Bayes estimate of ψ_k(P) for the no-sample-size case, the (marginal) distribution of a sample from P (when the prior for P is the Dirichlet process) is obtained. The extension to the case when the prior for P is a G-invariant Dirichlet process is also obtained. Let (χ, A) be the one-dimensional Euclidean space (R_1, B_1). Consider a sequence {D^(α_N + γ)} of Dirichlet processes such that α_N(χ) converges to zero as N tends to infinity, where γ and the α_N are finite measures on A. It is shown that D^(α_N + γ) converges weakly to D^γ in the topology of weak convergence on P, the class of all probability measures on (χ, A).
As a corollary, it follows that D^(α_N + nF_n) converges weakly to D^(nF_n), where F_n is the empirical distribution of the sample. Suppose α_N(χ) converges to zero and α_N/α_N(χ) converges uniformly to α/α(χ) as N tends to infinity. If {D^(α_N)} is a sequence of Dirichlet process priors for a random probability measure P on (χ, A), then P, in the limit, is a random probability measure concentrated on the set of degenerate probability measures on (χ, A), and the point of degeneracy is distributed as α/α(χ) on (χ, A). To the sequence of priors {D^(α_N)} for P there corresponds a sequence of Bayes estimates of ψ_k(P). The limit of this sequence of Bayes estimates as α_N(χ) converges to zero, called the limiting Bayes estimate of ψ_k(P), is obtained. When P is a random probability measure on {0, 1}, Sethuraman (1978) proposed a more general class of conjugate priors for P which contains both the family of Dirichlet processes and the family of priors introduced by Dubins and Freedman (1966). As an illustration, a numerical example is considered and the Bayes estimates of the mean and the variance of P are computed under three distinct priors chosen from Sethuraman's class. The computer algorithm for this calculation is presented.
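Sethuraman's constructive definition referenced above is what is now known as the stick-breaking representation of the Dirichlet process. A truncated sketch (the truncation level and base-measure sampler are illustrative choices, not from the dissertation):

```python
import numpy as np

def stick_breaking_dp(alpha_mass, base_sampler, n_atoms, rng=None):
    """Truncated Sethuraman stick-breaking draw from a Dirichlet process:
    weights w_k = v_k * prod_{j<k} (1 - v_j) with v_k ~ Beta(1, alpha_mass),
    and atoms drawn i.i.d. from the base measure."""
    rng = np.random.default_rng(rng)
    v = rng.beta(1.0, alpha_mass, size=n_atoms)
    # leftover stick lengths before each break: 1, (1-v_1), (1-v_1)(1-v_2), ...
    weights = v * np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))
    atoms = base_sampler(n_atoms, rng)
    return atoms, weights
```

Every realization is a discrete probability measure, which is the constructive counterpart of the density-of-mass-points property discussed in the abstract.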
 Date Issued
 1981
 Identifier
 AAI8108190, 3084828, FSDT3084828, fsu:74329
 Format
 Document (PDF)
 Title
 Median Regression for Complex Survey Data.
 Creator

Fraser, Raphael André, Sinha, Debajyoti, Lipsitz, Stuart, Carlson, Elwood, Slate, Elizabeth H., Huffer, Fred W. (Fred William), Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

The ready availability of public-use data from various large national complex surveys has immense potential for the assessment of population characteristics: means, proportions, totals, et cetera. Using a model-based approach, complex surveys can be used to evaluate the effectiveness of treatments and to identify risk factors for important diseases such as cancer. Existing statistical methods based on estimating equations and/or resampling are often not valid with survey data because of design features such as stratification, multistage sampling, and unequal selection probabilities. In this work, we accommodate these design features in the analysis of highly skewed response variables arising from large complex surveys. Specifically, we propose a double-transform-both-sides estimating equations approach to estimate the median regression parameters of the highly skewed response; the double-transform-both-sides method applies the same transformation twice to both the response and the regression function. The usual sandwich variance estimate can be used in our approach, whereas a resampling approach would be needed for a pseudo-likelihood based on minimizing absolute deviations. Furthermore, the double-transform-both-sides estimator is relatively robust to the true underlying distribution and has a much smaller mean square error than the least absolute deviations estimator. The method is motivated by an analysis of laboratory data on urinary iodine concentration from the National Health and Nutrition Examination Survey.
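Read literally, "apply the same transformation twice to both the response and the regression function" and fit the median suggests the following toy sketch. The transform h, the weighted absolute-deviation criterion, and the optimizer are all illustrative assumptions; the dissertation's actual estimating equations and sandwich variance are not reproduced here:

```python
import numpy as np
from scipy.optimize import minimize

def dtbs_median_fit(y, X, weights, h=np.log1p):
    """Toy transform-both-sides median fit: the same monotone transform
    h is applied twice to both the response and the regression function
    X @ beta, and the survey-weighted absolute deviations on that scale
    are minimized. An illustrative sketch, not the exact method."""
    def hh(z):
        return h(h(z))  # double transform
    def loss(beta):
        return np.sum(weights * np.abs(hh(y) - hh(X @ beta)))
    beta0 = np.linalg.lstsq(X, y, rcond=None)[0]  # OLS starting value
    res = minimize(loss, beta0, method="Nelder-Mead")
    return res.x
```

Because hh is monotone, the minimizer targets the conditional median on the original scale while the double transform tames the skewness of the residuals.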
 Date Issued
 2015
 Identifier
 FSU_2015fall_Fraser_fsu_0071E_12825
 Format
 Thesis