Search results
 Title
 Generalized Pearson-Fisher chi-square goodness-of-fit tests, with applications to models with life history data.
 Creator

Li, Gang., Florida State University
 Abstract/Description

Suppose that $X_1,\ldots,X_n$ are i.i.d. $\sim F$, and we wish to test the null hypothesis that $F$ is a member of the parametric family ${\cal F}=\{F_\theta(x);\ \theta\in\Theta\}$, where $\Theta\subset\mathbb{R}^q$. The classical Pearson-Fisher chi-square test involves partitioning the real axis into $k$ cells $I_1,\ldots,I_k$ and forming the chi-square statistic $X^2=\sum_{i=1}^{k}(O_i - nF_{\hat\theta}(I_i))^2/nF_{\hat\theta}(I_i)$, where $O_i$ is the number of observations falling into cell $i$ and $\hat\theta$ is the value of $\theta$ minimizing $\sum_{i=1}^{k}(O_i - nF_\theta(I_i))^2/nF_\theta(I_i)$. We obtain a generalization of this test to any situation for which there is available a nonparametric estimator $\hat F$ of $F$ for which $n^{1/2}(\hat F - F)\stackrel{d}{\to}W$, where $W$ is a continuous zero-mean Gaussian process satisfying a mild regularity condition. We allow the cells to be data dependent. Essentially, we estimate $\theta$ by the value $\hat\theta$ that minimizes a "distance" between the vectors $(\hat F(I_1),\ldots,\hat F(I_k))$ and $(F_\theta(I_1),\ldots,F_\theta(I_k))$, where distance is measured through an arbitrary positive definite quadratic form, and then form a chi-square-type test statistic based on the difference between $(\hat F(I_1),\ldots,\hat F(I_k))$ and $(F_{\hat\theta}(I_1),\ldots,F_{\hat\theta}(I_k))$. We prove that this test statistic has asymptotically a chi-square distribution with $k-q-1$ degrees of freedom, and point out some errors in the literature on chi-square tests in survival analysis. Our procedure is very general and applies to a number of well-known models in survival analysis, such as right censoring and left truncation. We apply our method to questions of model selection in the problem of estimating the distribution of the length of the incubation period of the AIDS virus, using the CDC's data on blood-transfusion-related AIDS. Our analysis suggests some models that seem to fit better than those used in the literature.
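To make the classical statistic concrete, here is a minimal sketch of the Pearson-Fisher approach described above (not the dissertation's generalized procedure): cell counts are compared with cell probabilities under a candidate family, with $\theta$ chosen by minimum chi-square over a crude grid. The exponential family, the cell boundaries, and the grid are illustrative assumptions.

```python
import math
import random

def pearson_fisher(data, cells, F, theta_grid):
    """Pearson-Fisher chi-square: estimate theta by minimum chi-square over a
    grid, then return the minimized statistic and the estimate."""
    n = len(data)
    # observed counts O_i for cells (cells[i], cells[i+1]]
    O = [0] * (len(cells) - 1)
    for x in data:
        for i in range(len(O)):
            if cells[i] < x <= cells[i + 1]:
                O[i] += 1
                break
    def chisq(theta):
        s = 0.0
        for i in range(len(O)):
            e = n * (F(cells[i + 1], theta) - F(cells[i], theta))
            s += (O[i] - e) ** 2 / e
        return s
    theta_hat = min(theta_grid, key=chisq)   # minimum chi-square estimate
    return chisq(theta_hat), theta_hat

# Illustrative example: exponential family F_theta(x) = 1 - exp(-x/theta),
# data simulated with true theta = 2.
random.seed(0)
data = [random.expovariate(1 / 2.0) for _ in range(500)]
F = lambda x, th: 1.0 - math.exp(-x / th) if x > 0 else 0.0
cells = [0.0, 0.7, 1.5, 2.5, 4.0, float("inf")]
grid = [1.0 + 0.05 * i for i in range(41)]       # theta in [1.0, 3.0]
stat, theta_hat = pearson_fisher(data, cells, F, grid)
# k = 5 cells, q = 1 parameter: asymptotically chi-square with k - q - 1 = 3 df
print(round(theta_hat, 2), round(stat, 2))
```

With a correctly specified model the statistic should be comparable to a chi-square variate with 3 degrees of freedom; the dissertation's contribution is replacing the empirical cell counts with any nonparametric estimator $\hat F$ satisfying the weak-convergence condition.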
 Date Issued
 1992
 Identifier
 AAI9234234, 3087898, FSDT3087898, fsu:76708
 Format
 Document (PDF)
 Title
 Generating Poisson and binomial random variates.
 Creator

Lee, Wen-Chiung., Florida State University
 Abstract/Description

Many methods for generating variates from discrete distributions have been developed over the years. They vary from simple to complicated, from specific to general. Some are based on interesting underlying theory, while others are more concerned with efficient computer implementation. This dissertation is directed toward the latter. We describe methods that are best suited for efficient (fast) computer implementation. We develop specific programs for both the Poisson and the binomial distributions, with two versions of each: one for when the parameters are fixed and the other for when the parameters change from call to call. These programs are developed with a spare-no-expense attitude, and timing comparisons support our belief that they are faster than any other published methods. For the fixed-parameter case, an algorithm which combines table lookup, the square histogram (Marsaglia's lecture notes), and the direct search method is given. We apply the algorithm to the Poisson and the binomial distributions. For the variable-parameter Poisson case, we take advantage of Marsaglia's (1986) approach and incorporate additional techniques in order to have a Poisson variate generator which works for any value of $\lambda$, using, most of the time, the integer part of a polynomial in a normal variate. We extend the procedure to the binomial distribution.
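As a rough sketch of the fixed-parameter ingredients mentioned above, a square-histogram (alias-style) table gives O(1) table-lookup sampling from a finite pmf. The truncation point and the value of λ are illustrative assumptions; this is not the dissertation's own code.

```python
import math
import random

def square_histogram(probs):
    """Build square-histogram (alias) tables for a finite discrete pmf,
    allowing O(1) table-lookup sampling per variate."""
    k = len(probs)
    scaled = [p * k for p in probs]                 # average bar height is 1
    alias, cutoff = list(range(k)), [1.0] * k
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]
    while small and large:
        s, l = small.pop(), large.pop()
        cutoff[s] = scaled[s]        # keep s's own mass below the cutoff
        alias[s] = l                 # top of the square is filled by l
        scaled[l] -= 1.0 - scaled[s]
        (small if scaled[l] < 1.0 else large).append(l)
    return alias, cutoff

def draw(alias, cutoff):
    u = random.random() * len(cutoff)
    i = int(u)                       # one table lookup, one comparison
    return i if u - i < cutoff[i] else alias[i]

# Truncated Poisson(4) pmf as an illustrative target distribution.
lam, m = 4.0, 20
pmf = [math.exp(-lam) * lam ** i / math.factorial(i) for i in range(m)]
total = sum(pmf)
pmf = [p / total for p in pmf]
alias, cutoff = square_histogram(pmf)
random.seed(0)
draws = [draw(alias, cutoff) for _ in range(100_000)]
print(round(sum(draws) / len(draws), 2))   # sample mean sits near lambda = 4
```

Each variate costs one uniform draw, one floor, and one comparison, which is the kind of constant-time lookup the abstract has in mind for the fixed-parameter generators.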
 Date Issued
 1993
 Identifier
 AAI9334283, 3088178, FSDT3088178, fsu:76985
 Format
 Document (PDF)
 Title
 Geometric Approaches for Analysis of Images, Densities and Trajectories on Manifolds.
 Creator

Zhang, Zhengwu, Srivastava, Anuj, Klassen, E. (Eric), Wu, Wei, Pati, Debdeep, Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

In this dissertation, we focus on the problem of analyzing high-dimensional functional data using geometric approaches. The term functional data here refers to images, densities and trajectories on manifolds. The nature of these data imposes difficulties on statistical analysis. First, the objects are functional data and hence infinite dimensional; one needs to explore representations of each type that facilitate subsequent statistical analysis. Second, the representation spaces are often nonlinear manifolds, so proper Riemannian structures are necessary to compare objects. Third, the analysis and comparison of objects need to be invariant to certain nuisance variables; for example, comparison between two images should be invariant to their blur levels, and comparison between time-indexed trajectories on manifolds should be invariant to their temporal evolution rates. We start by introducing frameworks for representing, comparing and analyzing functions in Euclidean space, including signals, images and densities, in which the comparisons are invariant to the Gaussian blur present in these objects. Applications in blur-level matching, blurred image recognition, image classification and two-sample hypothesis testing are discussed. Next, we present frameworks for analyzing longitudinal trajectories on a manifold M, where the analysis is invariant to the reparameterization action (temporal variation). In particular, we are interested in analyzing trajectories in two manifolds: the two-sphere and the set of symmetric positive-definite matrices. Applications such as bird migration and hurricane track analysis, visual speech recognition and hand gesture recognition are used to demonstrate the advantages of the proposed frameworks. Finally, a Bayesian framework for clustering shapes of curves is presented, and examples of clustering cell shapes and protein structures are discussed.
 Date Issued
 2015
 Identifier
 FSU_migr_etd9503
 Format
 Thesis
 Title
 Goodness-of-Fit Tests for Logistic Regression.
 Creator

Wu, Sutan, McGee, Dan L., Zhang, Jinfeng, Hurt, Myra, Sinha, Debajyoti, Department of Statistics, Florida State University
 Abstract/Description

The generalized linear model, and particularly the logistic model, is widely used in public health, medicine, and epidemiology. Goodness-of-fit tests for these models are popularly used to describe how well a proposed model fits a set of observations. These goodness-of-fit tests each have individual advantages and disadvantages. In this thesis, we mainly consider the performance of the Hosmer-Lemeshow test, Pearson's chi-square test, the unweighted sum of squares test and the cumulative residual test. We compare their performance in a series of empirical studies as well as particular simulation scenarios. We conclude that the unweighted sum of squares test and the cumulative sums of residuals test give better overall performance than the other two. We also conclude that the commonly suggested practice of treating a p-value less than 0.15 as an indication of lack of fit at the initial steps of model diagnostics should be adopted. Additionally, D'Agostino et al. presented the relationship between stacked logistic regression and the Cox regression model in the Framingham Heart Study; in future work, we will examine the possibility and feasibility of adapting these goodness-of-fit tests to the Cox proportional hazards model using stacked logistic regression.
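The Hosmer-Lemeshow idea compared above can be sketched as follows: observations are binned by fitted probability into (typically ten) groups, and observed event counts are compared with expected counts in each bin. The simulated logistic data, and the shortcut of using the true probabilities as stand-ins for fitted values, are illustrative assumptions.

```python
import math
import random

def hosmer_lemeshow(y, p, groups=10):
    """Hosmer-Lemeshow statistic: bin by fitted probability, then compare
    observed and expected event counts per bin; approximately chi-square with
    groups - 2 df when probabilities come from a fitted logistic model."""
    pairs = sorted(zip(p, y))                # sort by fitted probability
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        nk = len(chunk)
        ek = sum(pi for pi, _ in chunk)      # expected events in the bin
        ok = sum(yi for _, yi in chunk)      # observed events in the bin
        stat += (ok - ek) ** 2 / (ek * (1 - ek / nk))
    return stat

# Illustrative data from a correctly specified logistic model.
random.seed(1)
x = [random.gauss(0, 1) for _ in range(2000)]
p = [1 / (1 + math.exp(-(0.5 + xi))) for xi in x]
y = [1 if random.random() < pi else 0 for pi in p]
C = hosmer_lemeshow(y, p)
print(round(C, 2))   # small values indicate no evidence of lack of fit
```

A known weakness discussed in this literature is that the statistic depends on the (arbitrary) grouping, which is one motivation for the unweighted sum of squares and cumulative-residual alternatives.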
 Date Issued
 2010
 Identifier
 FSU_migr_etd0693
 Format
 Thesis
 Title
 High Level Image Analysis on Manifolds via Projective Shapes and 3D Reflection Shapes.
 Creator

Lester, David T. (David Thomas), Patrangenaru, Victor, Liu, Xiuwen, Barbu, Adrian G. (Adrian Gheorghe), Tao, Minjing, Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

Shape analysis is a widely studied topic in modern statistics with important applications in areas such as medical imaging. Here we focus on two-sample hypothesis testing for both finite and infinite extrinsic mean shapes of configurations. First, we present a test for equality of mean projective shapes of 2D contours based on rotations. Second, we present a test for mean 3D reflection shapes based on the Schoenberg mean. We apply these tests to footprint data (contours), clamshells (3D reflection shapes) and human facial configurations extracted from digital camera images. We also present the method of MANOVA on manifolds and apply it to face data extracted from digital camera images. Finally, we present a new statistical tool called antiregression.
 Date Issued
 2017
 Identifier
 FSU_2017SP_Lester_fsu_0071E_13856
 Format
 Thesis
 Title
 A hypothesis test of cumulative sums of multinomial parameters.
 Creator

Clair, James Hunter., Florida State University
 Abstract/Description

Consider $N$ times to repair, $T_1,T_2,\ldots,T_N$, from a repair time distribution function $F(\cdot)$. Let $p_{01},p_{02},\ldots,p_{0K}$ be $K$ proportions with $\sum_{\nu=1}^{K}p_{0\nu} < 1$. We wish to have at least $100(\sum_{\nu=1}^{K}p_{0\nu})\%$ of items repaired by time $L_i$, $1 \le i \le K$, $K \ge 2$. Denote the unknown quantity $F(L_i) - F(L_{i-1})$ as $p_i$, $1 \le i \le K$. Thus we wish to test the hypothesis (unformatted table or equation in the original). A simple procedure is to test this hypothesis with the $K$ statistics $N_1$, $\sum_{\nu=1}^{2}N_\nu,\ldots,\sum_{\nu=1}^{K}N_\nu$, where $\sum_{\nu=1}^{i}N_\nu$ is the number of repairs that take place on or before $L_i$, $1 \le i \le K$. Each $\sum_{\nu=1}^{i}N_\nu$ is a binomial random variable with unknown parameter $\sum_{\nu=1}^{i}p_\nu$. The hypothesis $H_0$ is rejected if any of the $\sum_{\nu=1}^{i}N_\nu \le n_i^0$, where the $n_i^0$ are chosen from binomial tables. This test is shown to have several deficiencies. We construct an alternative procedure with which to test this hypothesis. The generalized likelihood ratio test (GLRT) statistic is based on the multinomial random variable $(N_1,N_2,\ldots,N_K)$, with parameter $(p_1,p_2,\ldots,p_K)$.
The parameter space is (unformatted table or equation in the original). An algorithm is constructed and computer code supplied to calculate $\lambda(N)$ efficiently for any finite $N$. For small samples, computer code is given to calculate exactly $\delta$ or a p-value for an observed value of $\lambda(N(K))$, $2 \le K \le 5$ and $K \le N \le N(K)$. For large $N$, we apply a theorem of Feder (1968) to evaluate the asymptotic critical values and power. The GLRT statistic $\lambda(N)$ is shown to be approximately a union-intersection test and thus is approximated by a collection of uniformly most powerful unbiased tests of binomial parameters. The GLRT is shown empirically in the case of $K = 3$ to have higher power than competing union-intersection tests. Two power estimation techniques are described and compared empirically. Reference: Feder, Paul J. (1968), "On the distribution of the log likelihood ratio test statistic when the true parameter is 'near' the boundaries of the hypothesis region," Annals of Mathematical Statistics, 39, 2044-2055.
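The simple binomial-tables procedure criticized above can be sketched directly: each cumulative repair count is compared with a cut-off from the corresponding binomial distribution. The sample size, the hypothesized proportions, and the per-test level below are illustrative assumptions; the dissertation's GLRT itself is not reproduced here.

```python
from math import comb

def binom_cdf(x, n, p):
    """P(Bin(n, p) <= x) by direct summation."""
    return sum(comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(x + 1))

def critical_values(n, p0, alpha_each):
    """For each i, the largest cut-off c with P(Bin(n, P_i) <= c) <= alpha_each,
    where P_i = p0[0] + ... + p0[i] is the hypothesized cumulative proportion."""
    crits, cum = [], 0.0
    for p in p0:
        cum += p
        c = -1
        while binom_cdf(c + 1, n, cum) <= alpha_each:
            c += 1
        crits.append(c)
    return crits

def simple_test(counts, crits):
    """Reject H0 if any cumulative repair count is at or below its cut-off."""
    s = 0
    for n_i, c in zip(counts, crits):
        s += n_i
        if s <= c:
            return True   # reject
    return False

# n = 50 items; hypothesized proportions repaired within each interval.
p0 = [0.5, 0.3, 0.1]
crits = critical_values(50, p0, alpha_each=0.01)
print(crits, simple_test([30, 12, 6], crits), simple_test([10, 5, 5], crits))
```

Because the $K$ cumulative counts are strongly dependent, running $K$ separate binomial tests like this controls neither size nor power well, which is the kind of deficiency that motivates the multinomial GLRT.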
 Date Issued
 1988
 Identifier
 AAI8822443, 3161637, FSDT3161637, fsu:77837
 Format
 Document (PDF)
 Title
 Identifiability in the autopsy model of reliability theory.
 Creator

Antoine, Robin Michael., Florida State University
 Abstract/Description

Let S be a coherent system of m components acting independently. Two statistical models are considered. In the autopsy model, S is observed until it fails; the set of failed components and the failure time of the system are noted, but the failure times of the dead components are not known. In the second model, which was considered by Doss, Freitag and Proschan (Ann. Statist., 1989), the failure times of the dead components are also known. In the autopsy model, it is not always possible to estimate or identify the component life lengths from the observed data. A sufficient condition for the identifiability of the component distributions is given for the case in which the distributions are assumed to be analytic. Necessary and sufficient conditions are given for the case in which the distributions are assumed to belong to certain parametric families. The model of Doss, Freitag and Proschan is considered in two special cases. In the first of these the component distributions are known to be identical; in the second, the distributions are known to be exponential. Estimators of the component and system life lengths are given for each of these cases, and the asymptotic relative efficiency of each with respect to the corresponding estimator of Doss, Freitag and Proschan is calculated.
 Date Issued
 1992
 Identifier
 AAI9222356, 3087814, FSDT3087814, fsu:76624
 Format
 Document (PDF)
 Title
 Identifying influential effects in factorial experiments with sixteen runs: Empirical Bayes approaches.
 Creator

Chen, Ching-Hsiang., Florida State University
 Abstract/Description

To identify influential effects in unreplicated (possibly fractionated) factorial experiments, the effect-sparsity assumption (Box and Meyer (1986), Technometrics, 28, 11-18) has been adopted in many studies. Although this assumption has been traditionally used for outlier-detecting problems, it may not be suitable to describe the effects from factorial experiments. In this research, we examine the effect-sparsity approach and propose empirical Bayes methods relaxing this assumption. The study also examines the identification of influential effects based on information about the design structure, such as the alias relationships, design resolution, and sizes of interactions. A simulation study, based primarily on the criterion of reducing the experimental cost of misidentifying factors, has been performed to compare different methods. The results show that when the number of factors is large and when the factorial experiment is highly fractionated, incorporating information about the design structure into the analysis reduces the cost of a screening experiment compared to methods that do not consider design structure.
 Date Issued
 1994
 Identifier
 AAI9424751, 3088354, FSDT3088354, fsu:77159
 Format
 Document (PDF)
 Title
 Impact of Missing Data on Building Prognostic Models and Summarizing Models Across Studies.
 Creator

Munshi, Mahtab R., McGee, Daniel, Eberstein, Isaac, Hollander, Myles, Niu, Xufeng, Chattopadhyay, Somesh, Department of Statistics, Florida State University
 Abstract/Description

We examine the impact of missing data in two settings: the development of prognostic models and the addition of new risk factors to existing risk functions. Most statistical software presently available performs complete case analysis, wherein only participants with known values for all of the characteristics being analyzed are included in model development. Missing data also impacts the summarization of evidence among multiple studies using meta-analytic techniques. As we progress in medical research, new covariates become available for studying various outcomes. While we want to investigate the influence of new factors on the outcome, we also do not want to discard the historical datasets that lack information about these markers. Our research plan is to investigate different methods to estimate parameters for a model when some of the covariates are missing. These methods include likelihood-based inference for the study-level coefficients and likelihood-based inference for the logistic model on the person-level data. We compare the results from our methods to the corresponding results from complete case analysis. We focus our empirical investigation on a historical example, the addition of high-density lipoproteins to existing equations for predicting death due to coronary heart disease. We verify our methods through simulation studies on this example.
 Date Issued
 2005
 Identifier
 FSU_migr_etd2191
 Format
 Thesis
 Title
 The Impact of Rater Variability on Relationships among Different Effect-Size Indices for Inter-Rater Agreement between Human and Automated Essay Scoring.
 Creator

Yun, Jiyeo, Becker, Betsy Jane, Huffer, Fred W. (Fred William), Paek, Insu, Zhang, Qian, Florida State University, College of Education, Department of Educational Psychology and Learning Systems
 Abstract/Description

Since researchers began investigating automatic scoring systems in writing assessments, they have studied relationships between human and machine scoring and have suggested evaluation criteria for inter-rater agreement. The main purpose of my study is to investigate the magnitudes of and relationships among indices for inter-rater agreement used to assess the relatedness of human and automated essay scoring, and to examine impacts of rater variability on inter-rater agreement. To implement the investigations, my study consists of two parts: empirical and simulation studies. Based on the results from the empirical study, the overall effects for inter-rater agreement were .63 and .99 for exact and adjacent proportions of agreement, .48 for kappas, and between .75 and .78 for correlations. Additionally, significant differences between 6-point scales and the other scales (i.e., 3-, 4-, and 5-point scales) existed for correlations, kappas and proportions of agreement. Moreover, based on the results of the simulated data, the highest agreements and lowest discrepancies were achieved in the matched rater distribution pairs. Specifically, the means of exact and adjacent proportions of agreement, kappa and weighted kappa values, and correlations were .58, .95, .42, .78, and .78, respectively, while the average standardized mean difference was .0005 in the matched rater distribution pairs. Acceptable values for inter-rater agreement as evaluation criteria for automated essay scoring, impacts of rater variability on inter-rater agreement, and relationships among inter-rater agreement indices are discussed.
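The agreement indices reported above can be illustrated with a small sketch of Cohen's kappa and its quadratically weighted version for a 6-point scale; the two rating vectors below are invented for illustration.

```python
def cohen_kappa(r1, r2, categories, weighted=False):
    """Cohen's kappa for two raters; quadratic weights yield weighted kappa,
    which credits near-agreement on ordered score scales."""
    k = len(categories)
    idx = {c: i for i, c in enumerate(categories)}
    n = len(r1)
    obs = [[0.0] * k for _ in range(k)]
    for a, b in zip(r1, r2):
        obs[idx[a]][idx[b]] += 1.0 / n
    pa = [sum(row) for row in obs]                        # rater-1 marginals
    pb = [sum(row[j] for row in obs) for j in range(k)]   # rater-2 marginals
    if weighted:
        w = [[1 - ((i - j) / (k - 1)) ** 2 for j in range(k)] for i in range(k)]
    else:
        w = [[float(i == j) for j in range(k)] for i in range(k)]
    po = sum(w[i][j] * obs[i][j] for i in range(k) for j in range(k))
    pe = sum(w[i][j] * pa[i] * pb[j] for i in range(k) for j in range(k))
    return (po - pe) / (1 - pe)

# Invented human and machine scores on a 6-point scale.
human   = [1, 2, 2, 3, 4, 4, 5, 5, 6, 6]
machine = [1, 2, 3, 3, 4, 5, 5, 6, 6, 6]
kappa  = cohen_kappa(human, machine, [1, 2, 3, 4, 5, 6])
wkappa = cohen_kappa(human, machine, [1, 2, 3, 4, 5, 6], weighted=True)
print(round(kappa, 3), round(wkappa, 3))
```

Because every disagreement here is only one scale point, weighted kappa comes out much higher than unweighted kappa, mirroring the study's pattern of low exact agreement but high adjacent agreement.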
 Date Issued
 2017
 Identifier
 FSU_FALL2017_Yun_fsu_0071E_14144
 Format
 Thesis
 Title
 The importance of skewness and kurtosis in the time-series of security returns.
 Creator

St. Pierre, Eileen Foley., Florida State University
 Abstract/Description

The importance of skewness and kurtosis in the return generating process is assessed by examining the out-of-sample forecasting power of three different Exponential GARCH models that assume the conditional errors are generated by a normal distribution, a generalized error distribution, and a nonparametric distribution. These models are selected because they incorporate the time-series properties of security returns, and each of these distributions allows for various degrees of conditional skewness and kurtosis. First, daily security returns of firms listed on the New York and American Stock Exchanges over the period 1971 to 1991, excluding the year 1987, are used to estimate the three models. This study finds that the importance of skewness and kurtosis varies over time and across firm size. The length of the holding period also affects the accuracy and reliability of expected returns generated by the three Exponential GARCH models. Second, daily security returns of National Market System firms in the OTC market from 1988 to 1991, computed from both traded prices and bid-ask averages, are used to estimate the three models. This study finds that there is a trade-off between obtaining lower forecast errors and the volatility of the forecast errors when skewness and kurtosis are incorporated in the return generating process. Overall, forecast errors are lower and less volatile when bid-ask averages are used to compute security returns. However, the bid-ask "bounce" does not have a significant effect on the importance of skewness and kurtosis in the return generating process.
 Date Issued
 1993
 Identifier
 AAI9407828, 3088217, FSDT3088217, fsu:77021
 Format
 Document (PDF)
 Title
 AN INCREASING FAILURE RATE APPROACH TO CONSERVATIVE LOW DOSE EXTRAPOLATION (SAFE DOSE).
 Creator

SCHELL, MICHAEL J., Florida State University
 Abstract/Description

This dissertation provides a new method of treating the conservative low dose extrapolation problem. One wishes to determine the largest dose $d$, called the "safe" dose, for which $P(F(d) \le r) \ge 1 - \eta$, where $F(d)$ is the proportion of failures, say cancers induced, at dose $d$ by time $T$. Here $F$ is a life distribution function, presumed to come from some class of functions $\mathcal{F}$, $T$ is prespecified, and $r \in (0,1)$. $F(x,y)$ denotes the proportion of failures at doses $(x,y)$ by fixed time $T$. Four extensions of the univariate class of IFR functions are introduced, differing in the way that convexity of the hazard function $H(x,y) = -\ln(1 - F(x,y))$ is posited. The notion of dependent action is considered and a hypothesis test for its existence given. Conservative low dose extrapolation techniques for the two most prominent classes are given. An upper bound for the hazard function is established for low doses, with proofs that the bounds are sharp.
 Date Issued
 1984
 Identifier
 AAI8427325, 3085936, FSDT3085936, fsu:75422
 Format
 Document (PDF)
 Title
 Individual PatientLevel Data MetaAnalysis: A Comparison of Methods for the Diverse Populations Collaboration Data Set.
 Creator

Dutton, Matthew Thomas, McGee, Daniel, Becker, Betsy, Niu, Xufeng, Zhang, Jinfeng, Department of Statistics, Florida State University
 Abstract/Description

DerSimonian and Laird define meta-analysis as "the statistical analysis of a collection of analytic results for the purpose of integrating their findings." One alternative to classical meta-analytic approaches is known as Individual Patient-Level Data, or IPD, meta-analysis. Rather than depending on summary statistics calculated for individual studies, IPD meta-analysis analyzes the complete data from all included studies. Two potential approaches to incorporating IPD data into the meta-analytic framework are investigated. A two-stage analysis is first conducted, in which individual models are fit for each study and summarized using classical meta-analysis procedures. Second, a one-stage approach that models all of the data in a single analysis and summarizes the information across studies is investigated. Data from the Diverse Populations Collaboration data set are used to investigate the differences between these two methods in a specific example. The bootstrap procedure is used to determine whether the two methods produce statistically different results in the DPC example. Finally, a simulation study is conducted to investigate the accuracy of each method in given scenarios.
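The second stage of the two-stage approach described above can be sketched as pooling per-study coefficients with fixed-effect inverse-variance weights. The study estimates and variances below are invented for illustration.

```python
import math

def two_stage_meta(estimates, variances):
    """Stage 2 of a two-stage IPD meta-analysis: pool per-study coefficient
    estimates with fixed-effect inverse-variance weights."""
    w = [1 / v for v in variances]           # precision weights
    pooled = sum(wi * bi for wi, bi in zip(w, estimates)) / sum(w)
    se = math.sqrt(1 / sum(w))               # standard error of the pooled value
    return pooled, se

# Invented per-study log-odds-ratio estimates and their variances.
betas = [0.42, 0.35, 0.58, 0.30]
vars_ = [0.04, 0.02, 0.09, 0.03]
pooled, se = two_stage_meta(betas, vars_)
print(round(pooled, 3), round(se, 3))
```

The one-stage alternative would instead fit a single model, for example a logistic regression with study effects, to the stacked person-level data rather than pooling per-study summaries.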
 Date Issued
 2011
 Identifier
 FSU_migr_etd0620
 Format
 Thesis
 Title
 Inference for a nonlinear semimartingale regression model.
 Creator

Utikal, Klaus Johannes., Florida State University
 Abstract/Description

Consider the semimartingale regression model $X(t) = X(0) + \int_0^t Y(s)\alpha(s,Z(s))\,ds + M(t)$, where $Y, Z$ are observable covariate processes, $\alpha$ is a (deterministic) function of both time and the covariate process $Z$, and $M$ is a square integrable martingale. Under the assumption that i.i.d. copies of $X, Y, Z$ are observed continuously over a finite time interval, inference for the function $\alpha(t,z)$ is investigated. Applications of this model include hazard function estimation for survival analysis and inference for the drift function of a diffusion process. An estimator $\hat A$ for the time-integrated $\alpha(t,z)$ and a kernel estimator of $\alpha(t,z)$ itself are introduced. For $X$ a counting process, $\hat A$ reduces to the Nelson-Aalen estimator when $Z$ is not present in the model. Various forms of consistency are proved, and rates of convergence and asymptotic distributions of the estimators are derived. Asymptotic confidence bands for the time-integrated $\alpha(t,z)$ and a Kolmogorov-Smirnov-type test of equality of $\alpha$ at different levels of the covariate are given. For the case $Y \equiv 1$ we introduce an estimator $\hat{\cal A}$ of the time- and space-integrated $\alpha(t,z)$. The asymptotic distribution of $\hat{\cal A}$ is derived under the assumption that the covariate process $Z$ is ${\cal F}_0$-adapted, where $({\cal F}_t)$ is the filtration with respect to which $M$ is a martingale. In the counting process case this amounts to assuming that $X$ is a doubly stochastic Poisson process. Weak convergence of the appropriately normalized time- and state-indexed process $\hat{\cal A}$ to a Gaussian random field is shown. As an application of this result, confidence bands for the covariate-state-integrated hazard function of a doubly stochastic Poisson process whose intensity does not explicitly depend on time are derived.
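As a concrete special case mentioned above, the Nelson-Aalen estimator of the cumulative hazard from right-censored data can be sketched as follows; the small sample is invented for illustration.

```python
def nelson_aalen(times, events):
    """Nelson-Aalen estimator of the cumulative hazard from right-censored
    data: sum of d_i / n_i over distinct event times, where d_i is the number
    of events at t_i and n_i the number at risk just before t_i."""
    data = sorted(zip(times, events))
    n = len(data)
    at_risk = n
    H, curve = 0.0, []
    i = 0
    while i < n:
        t = data[i][0]
        d = r = 0
        while i < n and data[i][0] == t:
            d += data[i][1]      # events at time t
            r += 1               # subjects leaving the risk set at t
            i += 1
        if d:
            H += d / at_risk
            curve.append((t, H))
        at_risk -= r
    return curve

# Small illustrative sample: (time, 1 = event, 0 = censored).
times  = [2, 3, 3, 5, 7, 8, 9, 11]
events = [1, 1, 0, 1, 0, 1, 1, 0]
curve = nelson_aalen(times, events)
for t, H in curve:
    print(t, round(H, 3))
```

The dissertation's estimator $\hat A$ generalizes this step-function idea to increments that depend on a covariate process, recovering exactly this estimator when the covariate is absent.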
Show less  Date Issued
 1987
 Identifier
 AAI8807999, 3086793, FSDT3086793, fsu:76268
 Format
 Document (PDF)
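The reduction mentioned in the abstract above has a simple concrete form: with no covariate $Z$, the Nelson-Aalen estimator of the cumulative hazard $A(t)=\int_0^t\alpha(s)\,ds$ from right-censored data sums (number of events)/(number at risk) over the distinct event times. A minimal sketch (the function name and interface are ours, for illustration only):

```python
import numpy as np

def nelson_aalen(times, events):
    """Nelson-Aalen estimate of the cumulative hazard A(t) from right-censored
    data: at each distinct event time, add (# events) / (# still at risk).
    'events' is 1 for an observed failure, 0 for a censored observation."""
    times, events = np.asarray(times, float), np.asarray(events, int)
    uniq = np.unique(times[events == 1])                      # distinct event times
    at_risk = np.array([(times >= t).sum() for t in uniq])    # risk-set sizes
    d = np.array([((times == t) & (events == 1)).sum() for t in uniq])
    return uniq, np.cumsum(d / at_risk)
```

Censored points contribute to the risk sets but add no jumps, which is why the estimator is well defined under right censoring.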
 Title
 Inference for Semiparametric Time-Varying Covariate Effect Relative Risk Regression Models.
 Creator

Ye, Gang, McKeague, Ian W., Wang, Xiaoming, Huffer, Fred W., Song, KaiSheng, Department of Statistics, Florida State University
 Abstract/Description

A major interest of survival analysis is to assess covariate effects on survival via appropriate conditional hazard function regression models. The Cox proportional hazards model, which assumes an exponential form for the relative risk, has been a popular choice. However, other regression forms, such as Aalen's additive risk model, may be more appropriate in some applications. In addition, covariate effects may depend on time, which cannot be reflected by a Cox proportional hazards model. In this dissertation, we study a class of time-varying covariate effect regression models in which the link function (relative risk function) is a twice continuously differentiable, prespecified, but otherwise general function. This is a natural extension of the Prentice-Self model, in which the link function is general but covariate effects are modelled as time-invariant. In the first part of the dissertation, we focus on estimating the cumulative or integrated covariate effects. The standard martingale approach based on counting processes is utilized to derive a likelihood-based iterating equation. An estimator for the cumulative covariate effect that is generated from the iterating equation is shown to be $\sqrt{n}$-consistent. Asymptotic normality of the estimator is also demonstrated. Another aspect of the dissertation is to investigate a new test for the above time-varying covariate effect regression model and to study consistency of the test based on martingale residuals. For Aalen's additive risk model, we introduce a test statistic based on the Huffer-McKeague weighted-least-squares estimator and show its consistency against some alternatives. An alternative way to construct a test statistic, based on Bayesian bootstrap simulation, is introduced. An application to real lifetime data is also presented.
 Date Issued
 2005
 Identifier
 FSU_migr_etd0949
 Format
 Thesis
 Title
 Influence Measures for Bayesian Data Analysis.
 Creator

De Oliveira, Melaine C. (Melaine Cristina), Sinha, Debajyoti, Panton, Lynn B., Bradley, Jonathan R., Linero, Antonio Ricardo, Lipsitz, Stuart, Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

Identifying influential observations in the data is desired to ensure proper inference and statistical analysis. Modern methods to identify influential cases use cross-validation diagnostics based on the effect on inference of deleting the i-th observation. A popular method to identify influential observations is to use the Kullback-Leibler divergence between the posterior distribution of the parameter of interest given the full data and the posterior distribution given the cross-validated data, where the cross-validated data has the i-th observation removed. Although, in Bayesian inference, the posterior distribution contains all the relevant information about a parameter of interest, when the goal is prediction the predictive distribution should perhaps be used to identify influential observations. We therefore extend our method to the comparison of the posterior predictive distributions given the full data and the cross-validated data. We generalize and extend existing popular Bayesian cross-validated influence diagnostics using a Bregman divergence based measure (BD). We derive useful properties of these BD measures of the influence of each observation on the posterior distribution and show that they can be extended to the predictive distribution. We show that these BD based measures allow interpretable calibration and that they can be computed via Markov chain Monte Carlo (MCMC) samples from a single posterior based on the full data. We illustrate how our new measures of the influence of observations have more useful practical roles for data analysis than popular Bayesian residual analysis tools (CPO) in an example of meta-analysis with binary response and in other cases of interval-censored data.
 Date Issued
 2018
 Identifier
 2018_Su_DeOliveira_fsu_0071E_14712
 Format
 Thesis
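The single-posterior computation mentioned in the abstract above can be illustrated with the classical Kullback-Leibler case-deletion identity $\mathrm{KL}\bigl(p(\theta\mid D)\,\|\,p(\theta\mid D_{-i})\bigr) = E[\log f(y_i\mid\theta)] + \log E[1/f(y_i\mid\theta)]$, with expectations under the full-data posterior (the second term is minus the log of the CPO). This is a hedged sketch of that standard special case, not the dissertation's Bregman generalization:

```python
import numpy as np

def kl_case_deletion(loglik):
    """Case-deletion KL influence per observation from one full-data posterior
    sample. loglik[s, i] = log f(y_i | theta_s) for posterior draw theta_s.
    Computes KL_i = E[log f(y_i|theta)] + log E[1/f(y_i|theta)]; the
    inverse-CPO term uses a log-sum-exp for numerical stability."""
    mean_log = loglik.mean(axis=0)
    neg = -loglik
    m = neg.max(axis=0)
    log_inv_cpo = m + np.log(np.exp(neg - m).mean(axis=0))
    return mean_log + log_inv_cpo
```

By Jensen's inequality each KL value is nonnegative, and observations the posterior fits poorly receive larger values.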
 Title
 INFORMATION IN CENSORED MODELS.
 Creator

SCONING, JAMES., Florida State University
 Abstract/Description

Criteria are developed for measuring information in the randomly right-censored model. Measures which are appropriate include an extension of Shannon's entropy. The measures are seen to satisfy some fundamental properties, including (1) information decreases as censoring increases stochastically, (2) the uncensored case is always at least as informative as any censored model, and (3) the information gain is marginally decreasing. Measures of information in censored models can also be developed by adapting measures of dependence between the lifetime variable and the observed variable. Some common notions of bivariate dependence enjoy property (1) cited above. An exception occurs when dependence is defined in terms of association. Conditions under which the coefficients of divergence satisfy (1) and (2) are established. Information is also studied in terms of asymptotic efficiency. We consider the proportional hazards model where the distribution G of the censoring random variable is related to the distribution F of the lifetime variable via $(1-G) = (1-F)^\beta$. Nonparametric estimators of F are developed for the case where $\beta$ is unknown and the case where $\beta$ is known. Of interest in their own right, these estimators also enable us to study the robustness of the Kaplan-Meier estimator (KME) in a nonparametric model for which it is not the preferred estimator. Comparisons are based on asymptotic efficiencies and exact mean square errors. We also compare the KME to the empirical survival function, thereby providing, in a nonparametric setting, a measure of the loss in efficiency due to censoring.
 Date Issued
 1986
 Identifier
 AAI8605791, 3086279, FSDT3086279, fsu:75762
 Format
 Document (PDF)
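The relation $(1-G)=(1-F)^\beta$ above is the Koziol-Green proportional censoring model. Under it, the observed time $T=\min(X,C)$ has survival $(1-F)^{1+\beta}$ and $p = P(\delta=1) = 1/(1+\beta)$, so $1-F = S_T^{\,p}$, which yields a simple plug-in estimator (the Abdushukurov-Cheng-Lin form) that can be compared with the Kaplan-Meier estimator. A sketch under our own simulation assumptions, not the dissertation's estimators:

```python
import numpy as np

def acl_survival(t_obs, delta, t):
    """Survival estimate under the Koziol-Green model (1-G) = (1-F)^beta:
    S_X(t) = S_T(t)^p, with p = P(delta=1) estimated by the uncensored rate."""
    p = delta.mean()
    s_t = (t_obs > t).mean()          # empirical survival of the observed time T
    return s_t ** p

# Illustrative data: X ~ Exp(1) and beta = 0.5, so C ~ Exp(0.5) and S_X(1) = e^{-1}.
rng = np.random.default_rng(1)
n = 20000
x = rng.exponential(1.0, n)
c = rng.exponential(2.0, n)           # rate 0.5 corresponds to scale 2
t_obs, delta = np.minimum(x, c), (x <= c).astype(float)
```

Here the uncensored proportion estimates $1/(1+\beta)=2/3$, and the plug-in estimate at $t=1$ should be near $e^{-1}$.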
 Title
 Intensity Estimation in Poisson Processes with Phase Variability.
 Creator

Gordon, Glenna, Wu, Wei, Whyte, James, Srivastava, Anuj, Chicken, Eric, Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

Intensity estimation for Poisson processes is a classical problem and has been extensively studied over the past few decades. However, current methods of intensity estimation assume that phase variability or compositional noise, i.e., a nonlinear shift along the time axis, is nonexistent in the data, which is an unreasonable assumption for practical observations. The key challenge is that these observations are not "aligned", and registration procedures are required for successful estimation. As a result, these estimation methods can yield estimators that are inefficient or that underperform in simulations and applications. This dissertation summarizes two key projects which examine estimation of the intensity of a Poisson process in the presence of phase variability. The first project proposes an alignment-based framework for intensity estimation. First, it is shown that the intensity function is area-preserved with respect to compositional noise. Such a property implies that the time warping is encoded only in the density, or normalized intensity, function. The intensity function can then be decomposed into the product of the estimated total intensity (a scalar value) and the estimated density function. The estimation of the density relies on a metric which measures the phase difference between two density functions. An asymptotic study shows that the proposed estimation algorithm provides a consistent estimator for the normalized intensity. The success of the proposed estimation algorithm is illustrated using two simulations, and the new framework is applied to a real data set of neural spike trains, showing that the proposed estimation method yields improved classification accuracy over previous methods. The second project utilizes 2014 Florida data from the Healthcare Cost and Utilization Project's State Inpatient Database and State Emergency Department Database (provided to the U.S. Department of Health and Human Services, Agency for Healthcare Research and Quality by the Florida Agency for Health Care Administration) to examine heart failure emergency department arrival times. Current estimation methods for examining emergency department arrival data ignore the functional nature of the data and implement naive analysis methods. In this dissertation, the arrivals are treated as a Poisson process and the intensity of the process is estimated using existing density estimation and function registration methods. The results of these analyses show the importance of considering the functional nature of emergency department arrival data and the critical role that function registration plays in the intensity estimation of the arrival process.
 Date Issued
 2016
 Identifier
 FSU_FA2016_Gordon_fsu_0071E_13511
 Format
 Thesis
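The decomposition described above, intensity = (total intensity) × (normalized intensity, a density), suggests a simple baseline estimator that ignores phase variability: average the event counts across realizations and smooth the pooled event times. A sketch with an illustrative Gaussian kernel (our choices of bandwidth and interface, not the dissertation's registered estimator):

```python
import numpy as np

def intensity_estimate(trains, grid, bw=0.05):
    """lambda(t) ~= (mean events per realization) * kernel density of pooled times.
    'trains' is a list of arrays of event times from i.i.d. Poisson realizations."""
    pooled = np.concatenate(trains)
    mean_count = np.mean([len(tr) for tr in trains])
    z = (grid[:, None] - pooled[None, :]) / bw
    dens = np.exp(-0.5 * z ** 2).sum(axis=1) / (len(pooled) * bw * np.sqrt(2 * np.pi))
    return mean_count * dens
```

With phase variability present, this naive estimator oversmooths the shared structure; that is exactly the gap the alignment-based framework addresses.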
 Title
 Interrelating of Longitudinal Processes: An Empirical Example.
 Creator

Royal-Thomas, Tamika Y. N., McGee, Daniel, Levenson, Cathy, Sinha, Debajyoti, Osmond, Clive, Niu, Xufeng, Department of Statistics, Florida State University
 Abstract/Description

The Barker Hypothesis states that maternal and `in utero' attributes during pregnancy affect a child's cardiovascular health throughout life. We present an analysis of a unique longitudinal dataset from Jamaica that consists of three longitudinal processes: (i) Maternal longitudinal process: blood pressure and anthropometric measurements at seven timepoints on the mother during pregnancy. (ii) In utero process: ultrasound measurements of the fetus taken at six timepoints during pregnancy. (iii) Birth-to-present process: children's anthropometric and blood pressure measurements at 24 timepoints from birth to 14 years. A comprehensive analysis of the interrelationship of these three longitudinal processes is presented using joint modeling for multivariate longitudinal profiles. We propose a new methodology for examining a child's cardiovascular risk by extending a current view of likelihood estimation. Joint modeling of multivariate longitudinal profiles is done, and the extension of the traditional likelihood method is utilized in this work and compared to the maximum likelihood estimates. Our main goal is to examine whether the process in mothers predicts fetal development, which in turn predicts the future cardiovascular health of the children. One of the difficulties with `in utero' and early childhood data is that certain variables are highly correlated, so dimension reduction techniques are quite applicable in this scenario. Principal component analysis (PCA) is utilized to create a smaller set of uncorrelated variables, which is then used in a longitudinal analysis setting. These principal components are then utilized in an optimal linear mixed model for longitudinal data, which indicates that in utero and early childhood attributes predict the future cardiovascular health of the children. This dissertation adds to the body of knowledge on the developmental origins of adult diseases and supplies some significant results while utilizing a rich diversity of statistical methodologies.
 Date Issued
 2011
 Identifier
 FSU_migr_etd1792
 Format
 Thesis
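The PCA step described above can be sketched in a few lines via the singular value decomposition of the centered data matrix (a generic sketch, not the dissertation's code):

```python
import numpy as np

def pca(X, k=None):
    """Principal component scores and loadings via SVD of the centered data.
    Returns (scores, loadings, explained-variance fractions)."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    frac = s ** 2 / (s ** 2).sum()          # variance explained per component
    k = len(s) if k is None else k
    return U[:, :k] * s[:k], Vt[:k].T, frac
```

Keeping only the leading components with large explained-variance fractions yields the smaller set of uncorrelated variables that is then fed into the linear mixed model.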
 Title
 Investigating the Categories for Cholesterol and Blood Pressure for Risk Assessment of Death Due to Coronary Heart Disease.
 Creator

Franks, Billy J., McGee, Daniel, Hurt, Myra, Huffer, Fred, Niu, Xufeng, Department of Statistics, Florida State University
 Abstract/Description

Many characteristics for predicting death due to coronary heart disease are measured on a continuous scale. These characteristics, however, are often categorized for clinical use and to aid in treatment decisions. We would like to derive a systematic approach to determine the best categorizations of systolic blood pressure and cholesterol level for use in identifying individuals who are at high risk of death due to coronary heart disease, and to compare these data-derived categories to those in common usage. Whatever categories are chosen, they should allow physicians to accurately estimate the probability of survival from coronary heart disease until some time t. The best categories will be those that provide the most accurate prediction of an individual's risk of dying by t. The approach used to determine these categories will be a version of Classification And Regression Trees that can be applied to censored survival data. The major goals of this dissertation are to obtain data-derived categories for risk assessment, to compare these categories to the ones already recommended in the medical community, and to assess the performance of these categories in predicting survival probabilities.
 Date Issued
 2005
 Identifier
 FSU_migr_etd4402
 Format
 Thesis
 Title
 Investigating the Chi-Square-Based Model-Fit Indexes for WLSMV and ULSMV Estimators.
 Creator

Xia, Yan, Yang, Yanyun, Huffer, Fred W. (Fred William), Almond, Russell G., Becker, Betsy Jane, Paek, Insu, Florida State University, College of Education, Department of Educational Psychology and Learning Systems
 Abstract/Description

In structural equation modeling (SEM), researchers use the model chi-square statistic and model-fit indexes to evaluate model-data fit. Root mean square error of approximation (RMSEA), comparative fit index (CFI), and Tucker-Lewis index (TLI) are widely applied model-fit indexes. When data are ordered and categorical, the most popular estimator is the diagonally weighted least squares (DWLS) estimator. Robust corrections have been proposed to adjust the uncorrected chi-square statistic from DWLS so that its first and second order moments are in alignment with the target central chi-square distribution under correctly specified models. DWLS with such a correction is called the mean- and variance-adjusted weighted least squares (WLSMV) estimator. An alternative to WLSMV is the mean- and variance-adjusted unweighted least squares (ULSMV) estimator, which has been shown to perform as well as, or slightly better than, WLSMV. Because the chi-square statistic is corrected, the chi-square-based RMSEA, CFI, and TLI are also corrected by replacing the uncorrected chi-square statistic with the robust chi-square statistic. The robust model fit indexes calculated in this way are called the population-corrected robust (PR) model fit indexes, following Brosseau-Liard, Savalei, and Li (2012). The PR model fit indexes are currently reported in almost every application when WLSMV or ULSMV is used. Nevertheless, previous studies have found that the PR model fit indexes from WLSMV are sensitive to several factors, such as sample sizes, model sizes, and thresholds for categorization. The first focus of this dissertation is the dependency of model fit indexes on the thresholds for ordered categorical data. Because the weight matrix in the WLSMV fit function and the correction factors for both WLSMV and ULSMV include the asymptotic variances of thresholds and polychoric correlations, the model fit indexes are very likely to depend on the thresholds. The dependency of model fit indexes on the thresholds is not a desirable property, because when the misspecification lies in the factor structures (e.g., cross-loadings are ignored or two factors are considered as a single factor), model fit indexes should reflect such misspecification rather than the threshold values. As alternatives to the PR model fit indexes, Brosseau-Liard et al. (2012), Brosseau-Liard and Savalei (2014), and Li and Bentler (2006) proposed the sample-corrected robust (SR) model fit indexes. The PR fit indexes are found to converge to distorted asymptotic values, but the SR fit indexes converge to their definitions asymptotically. However, the SR model fit indexes were proposed for continuous data, and have been neither investigated nor implemented in SEM software when WLSMV and ULSMV are applied. This dissertation thus investigates the PR and SR model fit indexes for WLSMV and ULSMV. The first part of the simulation study examines the dependency of the model fit indexes on the thresholds when the model misspecification results from omitting cross-loadings or collapsing factors in confirmatory factor analysis. The study is conducted on extremely large computer-generated datasets in order to approximate the asymptotic values of model fit indexes. The results show that only the SR fit indexes from ULSMV are independent of the population threshold values, given the other design factors. The PR fit indexes from ULSMV, and the PR and SR fit indexes from WLSMV, are influenced by thresholds, especially when data are binary and the hypothesized model is greatly misspecified. The second part of the simulation varies the sample sizes from 100 to 1000 to investigate whether the SR fit indexes under finite samples are more accurate estimates of the defined values of RMSEA, CFI, and TLI, compared with the uncorrected model fit indexes without robust correction and the PR fit indexes. Results show that the SR fit indexes are more accurate in general. However, when the thresholds differ across items, data are binary, and the sample size is less than 500, all versions of these indexes can be very inaccurate. In such situations, larger sample sizes are needed. In addition, the conventional cutoffs developed from continuous data with maximum likelihood (e.g., RMSEA < .06, CFI > .95, and TLI > .95; Hu & Bentler, 1999) have been applied to WLSMV and ULSMV regardless of the arguments against such a practice (e.g., Marsh, Hau, & Wen, 2004). For comparison purposes, this dissertation reports the RMSEA, CFI, and TLI based on continuous data using maximum likelihood before the variables are categorized to create ordered categorical data. Results show that the model fit indexes from maximum likelihood are very different from those from WLSMV and ULSMV, suggesting that the conventional rules should not be applied to WLSMV and ULSMV.
 Date Issued
 2016
 Identifier
 FSU_2016SU_Xia_fsu_0071E_13379
 Format
 Thesis
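For readers unfamiliar with the indexes under study, the chi-square-based definitions can be sketched directly (one common set of conventions; software differs in details such as using $n$ versus $n-1$ in RMSEA):

```python
import math

def fit_indexes(chi2, df, chi2_b, df_b, n):
    """Chi-square-based fit indexes under one common convention.
    chi2_b, df_b come from the baseline (independence) model; n is sample size."""
    rmsea = math.sqrt(max((chi2 - df) / (df * (n - 1)), 0.0))
    cfi = 1.0 - max(chi2 - df, 0.0) / max(chi2_b - df_b, chi2 - df, 0.0)
    tli = ((chi2_b / df_b) - (chi2 / df)) / ((chi2_b / df_b) - 1.0)
    return rmsea, cfi, tli
```

Substituting the robust (mean- and variance-adjusted) chi-square for the uncorrected one in these formulas is exactly how the PR indexes discussed above are produced.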
 Title
 Investigating the Use of Mortality Data as a Surrogate for Morbidity Data.
 Creator

Miller, Gregory, Hollander, Myles, McGee, Daniel, Hurt, Myra, Wu, Wei, Zhang, Jinfeng, Department of Statistics, Florida State University
 Abstract/Description

We are interested in differences between risk models based on coronary heart disease (CHD) incidence, or morbidity, and risk models based on CHD death. Risk models based on morbidity have been developed from the Framingham Heart Study, while the European SCORE project developed a risk model for CHD death. Our goal is to determine whether these two models differ in treatment decisions concerning patient heart health. We begin by reviewing recent metrics in surrogate variables and prognostic model performance. We then conduct bootstrap hypothesis tests between two Cox proportional hazards models fit to Framingham data, one with incidence as the response and one with death as the response, and find that the coefficients differ for the age covariate, but find no significant differences for the other risk factors. To understand how surrogacy can be applied to our case, where the surrogate variable is nested within the true variable of interest, we examine models based on a composite event compared to models based on singleton events. We also conduct a simulation, simulating times to CHD incidence and times from CHD incidence to CHD death, censoring at 25 years to represent the end of a study. We compare a Cox model with death as the response to a Cox model based on incidence using bootstrapped confidence intervals, and find differences for the age and systolic blood pressure coefficients. We continue the simulation by using the Net Reclassification Index (NRI) to evaluate the treatment decision performance of the two models, and find that the two models do not perform significantly differently in correctly classifying events if the decisions are based on the risk ranks of the individuals. As long as the relative order of patients' risks is preserved across different risk models, treatment decisions based on classifying an upper specified percent as high risk will not be significantly different. We conclude the dissertation with statements about future methods for approaching our question.
 Date Issued
 2011
 Identifier
 FSU_migr_etd2408
 Format
 Thesis
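The categorical NRI used above has a simple definition: among subjects with events, upward reclassifications by the new model count in its favor and downward ones against it; among non-events, the reverse. A sketch of the generic formula (not the dissertation's code):

```python
import numpy as np

def categorical_nri(old_cat, new_cat, event):
    """NRI = [P(up|event) - P(down|event)] + [P(down|nonevent) - P(up|nonevent)],
    where up/down means the new model moved a subject to a higher/lower risk
    category than the old model did."""
    old_cat, new_cat, event = (np.asarray(a) for a in (old_cat, new_cat, event))
    up, down = new_cat > old_cat, new_cat < old_cat
    e, ne = event == 1, event == 0
    return (up[e].mean() - down[e].mean()) + (down[ne].mean() - up[ne].mean())
```

An NRI near zero, as in the rank-preserving situation the abstract describes, indicates that switching models does not improve classification of events versus non-events.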
 Title
 AN INVESTIGATION OF THE EFFECT OF THE SWAMPING PHENOMENON ON SEVERAL BLOCK PROCEDURES FOR MULTIPLE OUTLIERS IN UNIVARIATE SAMPLES.
 Creator

WOOLLEY, THOMAS WILLIAM, JR., Florida State University
 Abstract/Description

Statistical outliers have been an issue of concern to researchers for over two centuries, and are the focus of this study. Sources of outliers, and various means of dealing with them, are discussed. Also presented are general descriptions of univariate outlier tests, as well as the two approaches to handling multiple outlier situations: consecutive and block testing. The major problems inherent in these latter methods, masking and swamping, respectively, are recounted. Specifically, the primary aim of this study is to assess the susceptibility to swamping of four block procedures for multiple outliers in univariate samples. Pseudo-random samples are generated from a unit normal distribution, and varying numbers of upper outliers are placed in them according to specified criteria. A swamping index is created which reflects the relative vulnerability of each test to declaring a block of outliers and the most extreme upper non-outlier discordant, as a unit. The results of this investigation reveal that the four block tests disagree in their respective susceptibilities to swamping, depending upon sample size and the prespecified number of outliers assumed to be present. Rank orderings of these four tests based upon their vulnerability to swamping under varying circumstances are presented. In addition, alternate approaches to calculating the swamping index when four or more outliers exist are described. Recommendations concerning the appropriate application of the four block procedures under differing situations, and proposals for further research, are advanced.
 Date Issued
 1981
 Identifier
 AAI8113272, 3084903, FSDT3084903, fsu:74401
 Format
 Document (PDF)
 Title
 KLEMS translog cost estimates and energy elasticities.
 Creator

Campbell, Timothy Alan., Florida State University
 Abstract/Description

Data from the Bureau of Labor Statistics (BLS) for capital, labor, energy, materials, and business services (KLEMS) are used to estimate translog cost functions. Much of the work developing and testing production and cost functions has used the same Berndt and Wood (BW) data for total manufacturing. Results from the BLS data are compared with the BW data, and considerable differences are found. To improve the translog estimates, the Kalman filter and state space form are used in an effort to permit the time proxy for technological change to follow a random walk with drift. The general state space form provides a unified structure that subsumes other models. After smoothing, the Kalman filter model is equivalent to including the time proxy. An error-correction model (ECM) is used to make the translog specification more dynamic. Nested within the most general ECM specification are the more restrictive static, partial adjustment, and autoregressive models. Likelihood ratio tests reject the more restricted models in favor of the general ECM specification, but theoretical symmetry and adding-up restrictions are rejected for most two-digit Standard Industrial Code industries using the general ECM specification. Elasticities are computed for total manufacturing and compared with those found in other studies, with a special emphasis on energy. Many violations of the monotonicity, own-price, and concavity theoretical requirements are found.
 Date Issued
 1993
 Identifier
 AAI9410157, 3088225, FSDT3088225, fsu:77029
 Format
 Document (PDF)
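A random-walk-with-drift time proxy of the kind described above can be illustrated with a textbook local-linear-trend Kalman filter with state $x_t = (\text{level}_t, \text{drift}_t)$. This is a generic sketch under assumed noise variances, not the dissertation's specification:

```python
import numpy as np

def kalman_filter_drift(y, q_level=1e-4, r=1.0):
    """Kalman filter for level_t = level_{t-1} + drift + w_t, y_t = level_t + v_t,
    with a constant drift state. Returns the final filtered state [level, drift]."""
    F = np.array([[1.0, 1.0], [0.0, 1.0]])   # state transition
    H = np.array([[1.0, 0.0]])               # only the level is observed
    Q = np.diag([q_level, 0.0])              # process noise (none on the drift)
    x, P = np.zeros(2), np.eye(2) * 1e3      # vague prior
    for obs in y:
        x, P = F @ x, F @ P @ F.T + Q        # predict
        S = (H @ P @ H.T).item() + r         # innovation variance
        K = (P @ H.T).ravel() / S            # Kalman gain, shape (2,)
        x = x + K * (obs - x[0])             # update state
        P = P - np.outer(K, H @ P)           # update covariance
    return x
```

Smoothing this model backward over the sample is what makes the state-space time proxy comparable to simply including a deterministic time trend, as the abstract notes.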
 Title
 Knowledge acquisition and pattern recognition with random sets.
 Creator

Peng, Xiantu T., Florida State University
 Abstract/Description

In this dissertation we investigate knowledge acquisition (KA) and pattern recognition (PR) from a mathematical point of view. Based on random set theory, we develop some estimation theorems and procedures for set-valued statistics, such as nonparametric estimators and set-valuedization techniques. Under a random interval assumption, we establish some special possibility distributions that can be easily implemented in KA tools. The knowledge studied here consists of rules describing relationships between various concepts, as used in diagnosis (pattern recognition) expert systems. Several examples are given to illustrate the estimation theorems and procedures for the acquisition of concepts and relationships. We use our acquisition techniques on a modeling prediction example in two different ways: one is by acquiring the concepts and relationships simultaneously, the other is by acquiring rules for predefined concepts. On two classification problems, we use our methods to acquire classification rules. The results are compared with several machine learning methods. Finally, we introduce an expert system shell, STIM, which is largely based on the theory and methods developed here. The embedded KA tools and recognition process are discussed in detail.
 Date Issued
 1991
 Identifier
 AAI9213744, 3087750, FSDT3087750, fsu:76560
 Format
 Document (PDF)
 Title
 LARGE DEVIATION LOCAL LIMIT THEOREMS, WITH APPLICATIONS.
 Creator

CHAGANTY, NARASINGA RAO., Florida State University
 Abstract/Description

Let {X_n, n ≥ 1} be a sequence of i.i.d. random variables with E(X_1) = 0, Var(X_1) = 1. Let ψ(s) be the cumulant generating function (c.g.f.) and [formula omitted; see DAI] be the large deviation rate of X_1. Let S_n = X_1 + ... + X_n. Under some mild conditions on ψ, Richter (Theory Prob. Appl. (1957) 2, 206-219) showed that the probability density function f_n of S_n/√n has the asymptotic expression [formula omitted; see DAI] whenever x_n = o(√n) and √n x_n > 1. In this dissertation we obtain similar large deviation local limit theorems for arbitrary sequences of random variables, not necessarily sums of i.i.d. random variables, thereby increasing the applicability of Richter's theorem. Let {T_n, n ≥ 1} be an arbitrary sequence of non-lattice random variables with characteristic function (c.f.) φ_n. Let ψ_n, γ_n be the c.g.f. and the large deviation rate of T_n/n. The main theorem in Chapter II shows that under some standard conditions on ψ_n, which imply that T_n/n converges to a constant in probability, the density function K_n of T_n/n has the asymptotic expression [formula omitted; see DAI], where m_n is any sequence of real numbers and τ_n is defined by ψ_n'(τ_n) = m_n. When T_n is the sum of n i.i.d. random variables, our result reduces to Richter's theorem. Similar theorems for lattice-valued random variables are also presented; these are useful in obtaining asymptotic probabilities for the Wilcoxon signed-rank test statistic and Kendall's tau. In Chapter III we use the results of Chapter II to obtain a central limit theorem for sums of a triangular array of dependent random variables X_j^(n), j = 1, ..., n, with joint distribution given by z_n^(-1) exp{H_n(x_1, ..., x_n)} ∏ dP(x_j), where x_i ∈ R for all i ≥ 1. The function H_n(x_1, ..., x_n) is known as the Hamiltonian. Here P is a probability measure on R. When H_n(x_1, ..., x_n) = log φ_n(s_n/n), where s_n = x_1 + ... + x_n, and the probability measure P satisfies appropriate conditions, we show that there exists an integer r ≥ 1 and a sequence τ_n such that (S_n - nτ_n)/n^(1 - 1/2r) has a limiting distribution which is non-Gaussian if r ≥ 2. This result generalizes the theorems of Jong-Woo Jeon (Ph.D. Thesis, Dept. of Stat., F.S.U. (1979)) and Ellis and Newman (Z. Wahrscheinlichkeitstheorie und Verw. Gebiete (1978) 44, 117-139). Chapters IV and V extend the above to the multivariate case.
 Date Issued
 1982
 Identifier
 AAI8225279, 3085419, FSDT3085419, fsu:74914
 Format
 Document (PDF)
 Title
 Learning Political Will in Organizations: A Social Learning Theory Perspective.
 Creator

Maher, Liam Patrick, Ferris, Gerald R., Schatschneider, Christopher, Hochwarter, Wayne A., Van Iddekinge, Chad H., Wang, Gang, Florida State University, College of Business, Department of Management
 Abstract/Description

The past several decades have seen great advances in the field of organizational politics. At the individual level, political skill has garnered the majority of the scholarly focus, whereas its motivational counterpart, political will, has gone relatively unexamined. Political will represents the motivation to engage in political behavior, which, regardless of the skill with which it is executed, potentially has tremendous effects on myriad organizational outcomes. Thus, it is critical for scholars to understand how political will spreads through work units. This dissertation synthesizes theories of political will, political skill, social identity, social learning, and relationship quality to explain how followers learn political will from their leaders and environments. Specifically, I plan to show that when leaders possess political will, they engage in political behavior. Followers will learn the virtues and drawbacks of political behavior from their leaders, both vicariously and through direct mentoring, and thus their political will should be a function of their leader's political will. Because leaders and their many followers differ in their levels of leader-member relationship quality, political skill, and self-concept congruence, it is proposed that these differences will drive the level of learning that occurs. The proposed model is tested using data from 406 government workers and their 78 direct supervisors. The primary analyses supported only the hypothesis that leader political will predicts leader political behavior. Exploratory analyses that employed follower-rated measures of leader political behavior provided evidence that follower political will is a function of follower perceptions of their leader's political behavior and their own histories with organizational politics. Strengths, limitations, and opportunities for future research are discussed.
 Date Issued
 2018
 Identifier
 2018_Sp_Maher_fsu_0071E_14422
 Format
 Thesis
 Title
 Likelihood ratio based confidence bands in survival analysis.
 Creator

Yang, Jie., Florida State University
 Abstract/Description

Thomas and Grunkemeier (1975) introduced a nonparametric likelihood ratio approach to confidence interval estimation of survival probabilities based on right-censored data. We construct simultaneous confidence bands for survival, cumulative hazard rate, and quantile functions using this approach. The boundaries of the bands for survival functions are contained within (0,1). A procedure essentially equivalent to a bias correction is developed. The resulting increase in coverage accuracy is illustrated by an example and a simulation study. We look at various versions of likelihood ratio based (LR) confidence bands for the survival function and compare them with the Hall-Wellner band and Nair's equal precision band. We show that LR bands for the cumulative hazard rate function and the quantile function can be obtained by applying a functional and the inverse transformation of the survival function, respectively, to an LR band for the survival function. At the same time, the test-based and reflected methods are shown to be valid for constructing bands for the quantile function. The various confidence bands for the quantile function are illustrated through an example.
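As background for the bands discussed above, the Kaplan-Meier estimator that the likelihood-ratio band constructions build on can be sketched as follows. This is a minimal illustration of the point estimator only, not the Thomas-Grunkemeier band machinery, and the example data are made up.

```python
import numpy as np

def kaplan_meier(times, events):
    """Kaplan-Meier estimate of the survival function from right-censored
    data: `times` are follow-up times, `events` is 1 for an observed event,
    0 for censoring. Returns the distinct event times and the survival
    estimate just after each of them."""
    times = np.asarray(times, float)
    events = np.asarray(events, int)
    uniq = np.unique(times[events == 1])          # distinct event times
    surv, s = [], 1.0
    for t in uniq:
        at_risk = np.sum(times >= t)              # still under observation
        deaths = np.sum((times == t) & (events == 1))
        s *= 1 - deaths / at_risk                 # product-limit step
        surv.append(s)
    return uniq, np.array(surv)

# Tiny made-up sample: events at 1, 2, 4; one censoring at 3.
t, s = kaplan_meier([1, 2, 3, 4], [1, 1, 0, 1])
```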
 Date Issued
 1995
 Identifier
 AAI9544337, 3088762, FSDT3088762, fsu:77561
 Format
 Document (PDF)
 Title
 Limit theorems for Markov random fields.
 Creator

Kurien, Thekkthalackal Varugis., Florida State University
 Abstract/Description

Markov Random Fields (MRFs) have been extensively applied in statistical mechanics as well as in Bayesian image analysis. MRFs are a special class of dependent random variables located at the vertices of a graph whose joint distribution includes a parameter called the temperature. When the number of vertices of the graph tends to infinity, the normalized distributions of statistics based on these random variables converge in distribution. It can happen that, for certain values of the temperature, the rate of growth of these normalizing constants changes drastically. This feature is generally used to explain the phenomenon of phase transition as understood by physicists. In this dissertation we show that this drastic change in normalizing constants occurs even in the relatively smooth case when all the random variables are Gaussian. Hence any image-analytic MRF ought to be checked for such discontinuous behavior before any analysis is performed. Mixed limit theorems in Bayesian image analysis seek to replace intensive simulations of MRFs with limit theorems that approximate the distribution of the MRFs as the number of sites increases. The problem of deriving mixed limit theorems for MRFs on a one-dimensional lattice graph with an acceptor function that has a second moment has been studied by Chow. A mixed limit theorem for the integer lattice graph is derived when the acceptor function does not have a second moment, as for instance when the acceptor function is a symmetric stable density of index 0 < α < 2.
 Date Issued
 1991
 Identifier
 AAI9202304, 3087655, FSDT3087655, fsu:76470
 Format
 Document (PDF)
 Title
 Logistic Regression, Measures of Explained Variation, and the Base Rate Problem.
 Creator

Sharma, Dinesh R., McGee, Daniel L., Hurt, Myra, Niu, XuFeng, Chicken, Eric, Department of Statistics, Florida State University
 Abstract/Description

One of the desirable properties of the coefficient of determination (R² measure) is that its values for different models should be comparable whether the models differ in one or more predictors, or in the dependent variable, or whether the models are specified as being different for different subsets of a dataset. This allows researchers to compare the adequacy of models across subgroups of the population or models with different but related dependent variables. However, the various analogs of the R² measure used for logistic regression analysis are highly sensitive to the base rate (proportion of successes in the sample) and thus do not possess this property. An R² measure sensitive to the base rate is not suitable for comparing the same or different models on different datasets, different subsets of a dataset, or different but related dependent variables. We evaluated 14 R² measures that have been suggested or might be useful for measuring the explained variation in logistic regression models, based on three criteria: 1) intuitively reasonable interpretability; 2) numerical consistency with the ρ² of the underlying model; and 3) base rate sensitivity. We carried out a Monte Carlo simulation study to examine the numerical consistency and the base rate dependency of the various R² measures for logistic regression analysis. We found all of the parametric R² measures to be substantially sensitive to the base rate. The magnitude of the base rate sensitivity of these measures tends to be further influenced by the ρ² of the underlying model. None of the measures considered in our study performs equally well on all three evaluation criteria. While R²_L stands out for its intuitively reasonable interpretability as a measure of explained variation as well as its independence from the base rate, it appears to severely underestimate the underlying ρ². We found R²_CS to be numerically most consistent with the underlying ρ², with R²_N its nearest competitor. In addition, the base rate sensitivity of these two measures appears to be very close to that of R²_L, the most base-rate-invariant parametric R² measure. Therefore, we suggest using R²_CS and R²_N for logistic regression modeling, especially when it is reasonable to believe that an underlying latent variable exists. However, when the latent variable does not exist, comparability with the underlying ρ² is not an issue and R²_L might be a better choice among all the R² measures.
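For concreteness, the measures singled out above can be computed directly from model log-likelihoods. The mapping of the abstract's labels to formulas (R²_L as the likelihood-ratio/McFadden measure, R²_CS as Cox-Snell, R²_N as Nagelkerke) is an assumption based on standard usage, and the numbers below are illustrative only.

```python
import numpy as np

def pseudo_r2(ll_full, ll_null, n):
    """Common pseudo-R^2 measures for a logistic model, from the
    log-likelihoods of the fitted model (ll_full) and the intercept-only
    model (ll_null), and the sample size n."""
    r2_l = 1 - ll_full / ll_null                     # likelihood-ratio (McFadden)
    r2_cs = 1 - np.exp(2 * (ll_null - ll_full) / n)  # Cox-Snell
    r2_n = r2_cs / (1 - np.exp(2 * ll_null / n))     # Nagelkerke (rescaled Cox-Snell)
    return r2_l, r2_cs, r2_n

# Illustrative log-likelihood values, not from any study in this record.
r2_l, r2_cs, r2_n = pseudo_r2(ll_full=-60.0, ll_null=-100.0, n=200)
```

Note that Nagelkerke's measure always lies above Cox-Snell's, since it divides by the maximum attainable Cox-Snell value.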
 Date Issued
 2006
 Identifier
 FSU_migr_etd1789
 Format
 Thesis
 Title
 LUMPABILITY AND WEAK LUMPABILITY IN FINITE MARKOV CHAINS.
 Creator

ABDELMONEIM, ATEF MOHAMED., Florida State University
 Abstract/Description

Consider a Markov chain x(t), t = 0, 1, 2, ..., with a finite state space N = {1, 2, ..., n}, transition probability matrix P = (p_ij), i, j ∈ N, and an initial probability vector V = (v_i), i ∈ N. For m ≤ n let A = {A_1, A_2, ..., A_m} be a partition of the set N. Define the process [definition omitted; see DAI]. The new process y(t), called a function of the Markov chain, need not be Markov. If y(t) is again Markov, whatever the initial probability vector of x(t), then x(t) is said to be lumped to y(t) with respect to the partition A. If y(t) is Markov for only certain initial probability vectors of x(t), x(t) is said to be weakly lumped to y(t) with respect to the partition A. Conditions under which x(t) can be lumped or weakly lumped to y(t) with respect to A are introduced. Relationships between the two processes x(t) and y(t) and the properties of the new process y(t) are discussed. Criteria are developed to determine whether a given Markov chain can be weakly lumped with respect to a given partition in terms of an analysis of systems of linear equations. Necessary and sufficient conditions on the transition probability matrix of a Markov chain, a partition A of N, and a subset S of probability vectors for weak lumpability to occur are given in terms of the solution classes of these systems of linear equations. Finally, given that weak lumping occurs, the class S of all initial probability vectors which allow weak lumping is determined, as is the transition probability matrix of the lumped process y(t). Lumpability and weak lumpability are also studied for Markov chains which are not irreducible. This involves a study of the interplay between two partitions of the state space N: the partition C, induced by the closed sets of states of the Markov chain, and the partition A, with respect to which lumpability is to be considered. Under the assumption that lumpability occurs, the relationships which must exist between sets of the two partitions A and C are obtained in detail. It is found, for example, that if neither partition is a refinement of the other and (A, C) form an irreducible pair of partitions over N, then for each A ∈ A and C ∈ C, A ∩ C ≠ ∅. Further conditions which the transition probability matrix P must satisfy if lumpability is to hold are obtained, as are relationships which must exist between P and P*. Suppose a process y(t) is known to arise as a result of a weak lumping or lumping from some unknown Markov chain x(t). Let χ(t) be the class of all Markov chains x(t) with n states which yield this weak lumping or lumping. The problem of characterizing this class, and a class S of initial probability vectors which allow this lumping, is considered. A complete solution is given when n = 3 and m = 2. The importance of lumpability in applications is discussed.
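The (strong) lumpability condition described above, that the total transition probability into each block must be the same for every state within a source block, admits a direct numerical check. The chain and partition below are invented purely for illustration.

```python
import numpy as np

def is_lumpable(P, partition, tol=1e-12):
    """Check the classical strong-lumpability condition: for every block
    A_j of the partition, the probability of moving from state i into A_j
    must be identical for all states i lying in the same block."""
    for block in partition:
        into_block = P[:, block].sum(axis=1)   # mass into this block, per state
        for src in partition:
            if np.ptp(into_block[src]) > tol:  # values differ within a source block
                return False
    return True

# States 1 and 2 send identical mass to {0} and to {1, 2}, so the chain
# lumps with respect to the partition {{0}, {1, 2}}.
P = np.array([[0.5, 0.25, 0.25],
              [0.2, 0.4,  0.4],
              [0.2, 0.3,  0.5]])
partition = [[0], [1, 2]]
print(is_lumpable(P, partition))  # -> True
```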
 Date Issued
 1980
 Identifier
 AAI8109927, 3084860, FSDT3084860, fsu:74361
 Format
 Document (PDF)
 Title
 Matched Sample Based Approach for CrossPlatform Normalization on Gene Expression Data.
 Creator

Shao, Jiang, Zhang, Jinfeng, Sang, QingXiang Amy, Wu, Wei, Niu, Xufeng, Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

Gene-expression data profiles are widely used in all kinds of biomedical studies, especially in cancer research. This dissertation work focuses on solving the problem of how to combine datasets arising from different studies. Of particular interest is how to remove the platform effect alone. The matched-sample-based cross-platform normalization method we developed is designed to tackle the data merging problem in two scenarios. The first is Affy-Agilent cross-platform normalization, which belongs to the classic microarray gene expression setting. The second is the integration of microarray data with Next Generation Sequencing genome data. We use several general validation measures to assess our method and compare it with the popular Distance-Weighted Discrimination (DWD) method. Using the public web-based tool NCI60 CellMiner and The Cancer Genome Atlas data portal, our proposed method outperformed DWD in both cross-platform scenarios. It can be further assessed by its ability to explore biological features in studies of cancer type discrimination. We applied our method to two classification problems: one is breast cancer tumor/normal status classification on microarray and next generation sequencing datasets; the other is breast cancer patients' chemotherapy response classification on GPL96 and GPL570 microarray datasets. Both problems show that classification power is increased after our matched-sample-based cross-platform normalization.
 Date Issued
 2015
 Identifier
 FSU_2015fall_Shao_fsu_0071E_12833
 Format
 Thesis
 Title
 A MatchedSampleBased Normalization Method: CrossPlatform Microarray and NGS Data Integration.
 Creator

Zhang, Se Rin, Zhang, Jinfeng, Sang, QingXiang, Wu, Wei, Niu, Xufeng, Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

Utilizing high-throughput gene expression data stored in public archives not only saves research time and cost but also enhances the power of its statistical support. However, gene expression profiling data can be obtained from many different technical platforms. The same gene expression quantified by different platforms has different distributional properties, which makes data integration across multiple platforms challenging. Several cross-platform normalization methods have been developed to remove the differences caused by platform discrepancy, but they remove important biological signals as well. Zhang and Jiang (2015) introduced a new method focusing on eliminating the platform effect among systematic effects by employing matched samples, measured by different platforms, to obtain a benchmark model. Since the matched samples have no biological difference, their approach is robust in removing solely the platform effect. They showed that the new method performs better than the Distance Weighted Discrimination (DWD) method. This paper is a follow-up study of their work, and we attempt to improve the new method by incorporating a Fast Linear Mixed Regression (FLMER) model. The results indicate that the FLMER model works better than the originally proposed OLS (Ordinary Least Squares) model in after-normalization concordance comparison and Differential Expression (DE) analysis. Also, we compare our method to other existing cross-platform normalization methods: not only DWD but also Empirical Bayes, XPN, and GQ methods. The results showed that the proposed method performs much better than other cross-platform normalization methods at removing platform differences while keeping the biological information.
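A minimal sketch of the matched-sample idea described in these two records: estimate a per-gene benchmark mapping on the matched samples only (here plain per-gene OLS, standing in for the OLS baseline mentioned above), then apply it to new target-platform samples. The function name and the toy arrays are illustrative, not the dissertation's actual implementation.

```python
import numpy as np

def matched_sample_normalize(ref_matched, tgt_matched, tgt_new):
    """Fit a per-gene linear (OLS) mapping from the target platform onto
    the reference platform using matched samples, then apply it to new
    target-platform samples. Rows are samples, columns are genes."""
    x_mean = tgt_matched.mean(axis=0)
    y_mean = ref_matched.mean(axis=0)
    # per-gene covariance and variance give the OLS slope and intercept
    cov = ((tgt_matched - x_mean) * (ref_matched - y_mean)).mean(axis=0)
    slope = cov / tgt_matched.var(axis=0)
    intercept = y_mean - slope * x_mean
    return tgt_new * slope + intercept

# Toy data: the reference platform reads exactly 2x + 3 of the target,
# so a new target sample [4, 5] should map to [11, 13].
tgt_matched = np.array([[1.0, 0.0], [2.0, 1.0], [3.0, 2.0]])
ref_matched = 2 * tgt_matched + 3
out = matched_sample_normalize(ref_matched, tgt_matched, np.array([[4.0, 5.0]]))
```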
 Date Issued
 2018
 Identifier
 2018_Fall_Zhang_fsu_0071E_14868
 Format
 Thesis
 Title
 A MATHEMATICAL STUDY OF THE DIRICHLET PROCESS.
 Creator

TIWARI, RAM CHANDRA., Florida State University
 Abstract/Description

This dissertation is a contribution to the theory of Bayesian nonparametrics. A construction of the Dirichlet process (Ferguson (1973)) on a finite set χ is introduced in such a way that it leads to Blackwell's (1973) constructive definition of a Dirichlet process on a Borel space (χ, A). If (χ, A) is a Borel space and P is a random probability measure on (χ, A) with a Dirichlet process prior D^α, then under the condition that the α-measure of every open subset of χ is positive, for almost every realization P of P the set of discrete mass points of P is dense in χ. A more general constructive definition introduced by Sethuraman (1978) is used to derive several new properties of the Dirichlet process and to present in a unified way some of the known properties of the process. An alternative construction of Dalal's (1975) G-invariant Dirichlet process (G being a finite group of transformations) is presented. The Bayes estimates of an estimable parameter ψ_k(P) of degree k (k ≥ 1), namely [formula omitted; see DAI], where h is a symmetric kernel, are derived for the no-sample-size case and for a sample of size n from P under the squared error loss function and a Dirichlet process prior. Using the result of the Bayes estimate of ψ_k(P) for the no-sample-size case, the (marginal) distribution of a sample from P (when the prior for P is the Dirichlet process) is obtained. The extension to the case when the prior for P is a G-invariant Dirichlet process is also obtained. Let (χ, A) be the one-dimensional Euclidean space (R_1, B_1). Consider a sequence {D^(α_N+γ)} of Dirichlet processes such that α_N(χ) converges to zero as N tends to infinity, where γ and the α_N's are finite measures on A. It is shown that D^(α_N+γ) converges weakly to D^γ in the topology of weak convergence on P, the class of all probability measures on (χ, A). As a corollary, it follows that D^(α_N+nF_n) converges weakly to D^(nF_n), where F_n is the empirical distribution of the sample. Suppose α_N(χ) converges to zero and α_N/α_N(χ) converges uniformly to α/α(χ) as N tends to infinity. If {D^(α_N)} is a sequence of Dirichlet process priors for a random probability measure P on (χ, A), then P, in the limit, is a random probability measure concentrated on the set of degenerate probability measures on (χ, A), and the point of degeneracy is distributed as α/α(χ) on (χ, A). To the sequence of priors {D^(α_N)} for P there corresponds a sequence of Bayes estimates of ψ_k(P). The limit of this sequence of Bayes estimates as α_N(χ) converges to zero, called the limiting Bayes estimate of ψ_k(P), is obtained. When P is a random probability measure on {0, 1}, Sethuraman (1978) proposed a more general class of conjugate priors for P which contains both the family of Dirichlet processes and the family of priors introduced by Dubins and Freedman (1966). As an illustration, a numerical example is considered and the Bayes estimates of the mean and the variance of P are computed under three distinct priors chosen from Sethuraman's class of priors. The computer algorithm for this calculation is presented.
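Sethuraman's constructive definition referred to above is what is now usually called stick-breaking. A minimal truncated sampler under that construction might look like the following; the truncation level, base measure, and function name are illustrative choices, not part of this record.

```python
import numpy as np

def dp_stick_breaking(alpha, base_sampler, n_atoms=1000, seed=0):
    """Approximate draw from a Dirichlet process D^alpha by truncated
    stick-breaking: beta_k ~ Beta(1, alpha), and the k-th weight is
    beta_k times the stick left over, prod_{j<k} (1 - beta_j). Atoms are
    drawn i.i.d. from the base measure."""
    rng = np.random.default_rng(seed)
    betas = rng.beta(1.0, alpha, size=n_atoms)
    leftover = np.concatenate(([1.0], np.cumprod(1.0 - betas)[:-1]))
    weights = betas * leftover
    atoms = base_sampler(rng, n_atoms)
    return atoms, weights

# Base measure: standard normal on the real line. Smaller alpha puts
# most of the mass on the first few atoms.
atoms, weights = dp_stick_breaking(2.0, lambda rng, n: rng.normal(size=n))
```

The truncation leaves an exponentially small remainder of the stick unassigned, so the weights sum to very nearly 1.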
 Date Issued
 1981
 Identifier
 AAI8108190, 3084828, FSDT3084828, fsu:74329
 Format
 Document (PDF)
 Title
 Median Regression for Complex Survey Data.
 Creator

Fraser, Raphael André, Sinha, Debajyoti, Lipsitz, Stuart, Carlson, Elwood, Slate, Elizabeth H., Huffer, Fred W. (Fred William), Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

The ready availability of public-use data from various large national complex surveys has immense potential for the assessment of population characteristics: means, proportions, totals, and so on. Using a model-based approach, complex surveys can be used to evaluate the effectiveness of treatments and to identify risk factors for important diseases such as cancer. Existing statistical methods based on estimating equations and/or resampling methods are often not valid with survey data due to design features such as stratification, multistage sampling, and unequal selection probabilities. In this paper, we accommodate these design features in the analysis of highly skewed response variables arising from large complex surveys. Specifically, we propose a double-transform-both-sides based estimating equations approach to estimate the median regression parameters of the highly skewed response; the double-transform-both-sides method applies the same transformation twice to both the response and the regression function. The usual sandwich variance estimate can be used in our approach, whereas a resampling approach would be needed for a pseudo-likelihood based on minimizing absolute deviations. Furthermore, the double-transform-both-sides estimator is relatively robust to the true underlying distribution and has a much smaller mean square error than the least absolute deviations estimator. The method is motivated by an analysis of laboratory data on urinary iodine concentration from the National Health and Nutrition Examination Survey.
 Date Issued
 2015
 Identifier
 FSU_2015fall_Fraser_fsu_0071E_12825
 Format
 Thesis
 Title
 Meta Analysis and Meta Regression of a Measure of Discrimination Used in Prognostic Modeling.
 Creator

Rivera, Gretchen L., McGee, Daniel, Hurt, Myra, Niu, Xufeng, Sinha, Debajyoti, Department of Statistics, Florida State University
 Abstract/Description

In this paper we are interested in predicting death with the underlying cause of coronary heart disease (CHD). There are two prognostic modeling methods used to predict CHD: the logistic model and the proportional hazards model. For this paper we consider the logistic model. The dataset used is the Diverse Populations Collaboration (DPC) dataset, which includes 28 studies. The DPC dataset has epidemiological results from investigations conducted in different populations around the world. For our analysis we include those individuals who are 17 years old or older. The predictors are: age, diabetes, total serum cholesterol (mg/dl), high density lipoprotein (mg/dl), systolic blood pressure (mmHg), and whether the participant is a current cigarette smoker. There is a natural grouping within the studies, such as gender, rural or urban area, and race. Based on these strata we have 84 cohort groups. Our main interest is to evaluate how well the prognostic model discriminates. For this, we used the area under the Receiver Operating Characteristic (ROC) curve. The main idea of the ROC curve is that a set of subjects is known to belong to one of two classes (signal or noise group). An assignment procedure then assigns each subject to a class on the basis of information observed. The assignment procedure is not perfect: sometimes a subject is misclassified. To evaluate the quality of performance of this procedure, we used the area under the ROC curve (AUROC). The AUROC varies from 0.5 (no apparent accuracy) to 1.0 (perfect accuracy). For each logistic model we found the AUROC and its standard error (SE). We used meta-analysis to summarize the estimated AUROCs and to evaluate whether there is heterogeneity in our estimates. To evaluate the existence of significant heterogeneity we used the Q statistic. Since heterogeneity was found in our study, we compare seven different methods for estimating τ² (between-study variance). We conclude by examining whether differences in study characteristics explain the heterogeneity in the values of the AUROC.
 Date Issued
 2013
 Identifier
 FSU_migr_etd7580
 Format
 Thesis
 Title
 Meta-Analysis of Factor Analyses: Comparison of Univariate and Multivariate Approaches Using Correlation Matrices and Factor Loadings.
 Creator

Cho, Kyunghwa, Becker, Betsy Jane, Huffer, Fred W. (Fred William), Paek, Insu, Yang, Yanyun, Florida State University, College of Education, Department of Educational Psychology and Learning Systems
 Abstract/Description

Currently, more sophisticated techniques such as factor analyses are frequently applied in primary research and thus may need to be meta-analyzed. This topic has been given little attention in the past due to its complexity. Because factor analysis is becoming more popular in research in many areas, including education, social work, social science, and so on, the study of methods for the meta-analysis of factor analyses is also becoming more important. The first main purpose of this dissertation is to compare the results of seven different approaches to doing meta-analysis of confirmatory factor analyses. Specifically, five approaches are based on univariate meta-analysis methods. The other two approaches use multivariate meta-analysis to obtain the factor loadings and the standard errors of the factor loadings. The results from each approach are compared. Given the fact that factor analyses are commonly used in many areas, the second purpose of this dissertation is to explore the appropriate approach or approaches to use for the meta-analysis of factor analyses, especially Confirmatory Factor Analysis (CFA). When the average sample size was small, the IRD, WMC, WMFL, and GLSMFL approaches showed better performance in estimating parameters than the UMC, MFL, and GLSMC approaches. With large average sample sizes (larger than 150), the performance in estimating the parameters seemed to be similar across all seven approaches. Based on my simulation results, researchers who want to conduct meta-analytic confirmatory factor analysis can apply any of these approaches to synthesize the results from primary studies if their studies have n > 150.
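The multivariate side of the comparison rests on generalized least squares (GLS) pooling of a vector of loadings across studies. A minimal sketch of that idea, with hypothetical loading vectors and covariance matrices (the seven named approaches differ in details not shown here):

```python
import numpy as np

def gls_pool(loading_vectors, cov_matrices):
    """Multivariate (GLS) pooling of a vector of factor loadings across
    studies.  Each study contributes a loading vector and the covariance
    matrix of those loadings; the pooled vector weights each study by
    its inverse covariance.  Returns (pooled vector, pooled covariance)."""
    precision = sum(np.linalg.inv(V) for V in cov_matrices)
    pooled_cov = np.linalg.inv(precision)
    weighted = sum(np.linalg.inv(V) @ np.asarray(y, dtype=float)
                   for y, V in zip(loading_vectors, cov_matrices))
    return pooled_cov @ weighted, pooled_cov
```

With equal (identity) covariances, the pooled vector reduces to the simple average of the study loading vectors, which is a useful sanity check.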
 Date Issued
 2015
 Identifier
 FSU_migr_etd9570
 Format
 Thesis
 Title
 A Method for Finding the Nadir of Non-Monotonic Relationships.
 Creator

Tan, Fei, McGee, Daniel, Lloyd, Donald, Huffer, Fred, Niu, Xufeng, Dutton, Gareth, Department of Statistics, Florida State University
 Abstract/Description

Different methods have been proposed to model the J-shaped or U-shaped relationship between a risk factor and mortality so that the optimal risk-factor value (nadir) associated with the lowest mortality can be estimated. The basic model considered is the Cox proportional hazards model. Current methods include a quadratic method, a method with transformation, fractional polynomials, a change point method, and fixed-knot spline regression. A quadratic method contains both the linear and the quadratic term of the risk factor; it is simple, but it often generates unrealistic nadir estimates. The transformation method converts the original risk factor so that after transformation it has a Normal distribution, but this may not work when there is no good transformation to normality. Fractional polynomials are an extended class of regular polynomials that apply negative and fractional powers to the risk factor. Compared with the quadratic method or the transformation method, they do not always have a good model interpretation, and inferences about them do not incorporate the uncertainty coming from preselection of powers and degree. A change point method models the prognostic index using two pieces of upward quadratic functions that meet at their common nadir. This method assumes the knot and the nadir are the same, which is not always true. Fixed-knot spline regression has also been used to model nonlinear prognostic indices, but its inference does not account for variation arising from knot selection. Here we consider spline regressions with free knots, a natural generalization of the quadratic, the change point, and the fixed-knot spline methods. They can be applied to risk factors that do not have a good transformation to normality, while keeping intuitive model interpretations. Asymptotic normality and consistency of the maximum partial likelihood estimators are established under certain conditions. When the conditions are not satisfied, simulations are used to explore asymptotic properties. The new method is motivated by and applied to nadir estimation in non-monotonic relationships between BMI (body mass index) and all-cause mortality. Its performance is compared with that of existing methods, using criteria of nadir estimation ability and goodness of fit.
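Among the baseline methods above, the quadratic method is the one with a closed-form nadir: if the prognostic index is b1·x + b2·x², its minimizer is −b1/(2·b2) whenever b2 > 0. A minimal sketch (coefficient values hypothetical; the free-knot spline estimator itself is not reproduced here):

```python
def quadratic_nadir(b1, b2):
    """Nadir of the quadratic prognostic index b1*x + b2*x**2 in a Cox
    model: the risk-factor value minimizing the log hazard.  Valid only
    for an upward-opening quadratic (b2 > 0); otherwise the index has no
    interior minimum and the J/U-shape assumption fails."""
    if b2 <= 0:
        raise ValueError("prognostic index is not U-shaped (need b2 > 0)")
    return -b1 / (2.0 * b2)
```

The abstract's criticism of this method is visible here: the estimate depends entirely on two global coefficients, so a poor quadratic fit can push the nadir to an unrealistic value.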
 Date Issued
 2007
 Identifier
 FSU_migr_etd1719
 Format
 Thesis
 Title
 Methods of Block Thresholding Across Multiple Resolution Levels in Adaptive Wavelet Estimation.
 Creator

Schleeter, Tiffany M., Chicken, Eric, Clark, Kathleen M., Pati, Debdeep, Sinha, Debajyoti, Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

Blocking methods of thresholding have demonstrated many advantages over term-by-term methods in adaptive wavelet estimation. These blocking methods are resolution-level specific, meaning the coefficients are grouped together only within the same resolution level. Techniques have not yet been proposed for blocking across multiple resolution levels, and existing methods do not take into consideration varying shapes of blocks of wavelet coefficients. Here, several methods of block thresholding across multiple resolution levels are described. Various simulation studies analyze the use of these methods on nonparametric functions, including comparisons to other blocking and non-blocking wavelet thresholding methods. The introduction of this new technique raises the question of when it will be advantageous over resolution-level specific methods. Another simulation study demonstrates a method of statistically selecting when blocking across resolution levels is beneficial over traditional techniques. Additional analysis assesses how effective the automated selection method is, both in simulation and in practice.
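The resolution-level-specific baseline being generalized can be sketched as a James-Stein-style block shrinker (in the spirit of BlockJS): each block of coefficients is kept or killed as a unit, shrunk by a factor depending on the block's energy. The data, block size, and threshold constant below are hypothetical, and the thesis's cross-level methods are not reproduced here.

```python
import numpy as np

def block_threshold(coeffs, block_size, lam, sigma):
    """James-Stein style block thresholding within one resolution level:
    each block of wavelet coefficients is shrunk by the factor
    max(0, 1 - lam*sigma^2*L/S2), where S2 is the block's sum of squares
    and L the block length.  Low-energy blocks are zeroed entirely."""
    c = np.asarray(coeffs, dtype=float)
    out = np.zeros_like(c)
    for start in range(0, c.size, block_size):
        block = c[start:start + block_size]
        s2 = np.sum(block ** 2)
        shrink = max(0.0, 1.0 - lam * sigma ** 2 * block.size / s2) if s2 > 0 else 0.0
        out[start:start + block_size] = shrink * block
    return out
```

With noise level sigma = 1 and lam = 1, a high-energy block such as [3, 4] is shrunk only slightly, while a near-noise block such as [0.1, -0.1] is zeroed, which is the qualitative behavior that makes blockwise rules outperform term-by-term thresholding.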
 Date Issued
 2015
 Identifier
 FSU_migr_etd9677
 Format
 Thesis
 Title
 Minimax Tests for Nonparametric Alternatives with Applications to High Frequency Data.
 Creator

Yu, Han, Song, Kai-Sheng, Quine, Jack, Huffer, Fred, McGee, Dan, Department of Statistics, Florida State University
 Abstract/Description

We present a general methodology for developing asymptotically distribution-free, asymptotically minimax tests. The tests are constructed via a nonparametric density-quantile function, and the limiting distribution is derived by a martingale approach. The procedure can be viewed as a novel extension of the classical parametric likelihood ratio test. The proposed tests are shown to be omnibus within an extremely large class of nonparametric global alternatives characterized by simple conditions. Furthermore, we establish that the proposed tests provide better minimax distinguishability. The tests have much greater power for detecting high-frequency nonparametric alternatives than existing classical tests such as the Kolmogorov-Smirnov and Cramér-von Mises tests. The good performance of the proposed tests is demonstrated by Monte Carlo simulations and applications in High Energy Physics.
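The classical baseline the proposed tests are compared against is easy to state concretely. A minimal sketch of the Kolmogorov-Smirnov statistic, sup over x of |F_n(x) − F(x)|, for a sample against a fully specified null CDF (the proposed minimax tests themselves are not reproduced here):

```python
def ks_statistic(sample, cdf):
    """One-sample Kolmogorov-Smirnov statistic: the largest vertical
    distance between the empirical CDF of the sample and the null CDF.
    The supremum is attained just before or at an order statistic, so it
    suffices to check i/n and (i+1)/n against F at each sorted point."""
    x = sorted(sample)
    n = len(x)
    d = 0.0
    for i, xi in enumerate(x):
        f = cdf(xi)
        d = max(d, abs((i + 1) / n - f), abs(i / n - f))
    return d
```

Because this statistic weights all frequencies equally, it is known to have low power against rapidly oscillating (high-frequency) departures from the null, which is exactly the regime the abstract targets.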
 Date Issued
 2006
 Identifier
 FSU_migr_etd0796
 Format
 Thesis
 Title
 Mixed-Effects Models for Count Data with Applications to Educational Research.
 Creator

Shin, Jihyung, Niu, Xufeng, Hu, Shouping, Al Otaiba, Stephanie Dent, McGee, Daniel, Wu, Wei, Department of Statistics, Florida State University
 Abstract/Description

This research is motivated by an analysis of reading research data. We are interested in modeling the test outcome of the ability to fluently recode letters into sounds for kindergarten children aged between 5 and 7. The data showed excessive zero scores (more than 30% of children) on the test. In this dissertation, we carefully examine models dealing with excessive zeros, which are based on a mixture of distributions: a distribution concentrated at zero and a standard probability distribution with non-negative values. In such cases, a log-normal random variable (for semicontinuous data) or a Poisson random variable (for count data) is observed with some probability. The previously proposed models, the mixed-effects and mixed-distribution models (MEMD) of Tooze et al. (2002) for semicontinuous data and the zero-inflated Poisson (ZIP) regression models of Lambert (1992) for count data, are reviewed. We apply zero-inflated Poisson models to repeated measures of zero-inflated data by introducing a pair of possibly correlated random effects into the zero-inflated Poisson model to accommodate within-subject correlation and between-subject heterogeneity. The model describes the effect of predictor variables on the probability of nonzero responses (occurrence) and on the mean of nonzero responses (intensity) separately. The likelihood function, approximated by adaptive Gaussian quadrature, is maximized using dual quasi-Newton optimization. The maximum likelihood estimates are obtained through a standard statistical software package. A simulation study is conducted varying the model parameters, the number of subjects, and the number of measurements per subject, and the results are presented. The dissertation ends with the application of the model to reading research data and directions for future research. We examine the number of correct letter sounds counted for children over the 2008-2009 academic year. We find that age, gender, and socioeconomic status are significantly related to the letter sound fluency of children in both parts of the model. The model provides a better explanation of the data structure and easier interpretation of parameter values, as they are the same as in standard logistic and Poisson regression models. The model can be extended to accommodate serial correlation, which can be observed in longitudinal data. Also, one may consider a multilevel zero-inflated Poisson model. Although the multilevel model was proposed previously, parameter estimation by penalized quasi-likelihood methods is questionable, and further examination is needed.
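The core of the ZIP model (before random effects are added) is its two-part likelihood: a zero arises either from the inflation component with probability pi or from the Poisson component. A minimal sketch of the log-likelihood for fixed parameters (the counts and parameter values are hypothetical; the dissertation's random-effects version and quadrature approximation are not shown):

```python
import math

def zip_loglik(counts, pi, lam):
    """Log-likelihood of the zero-inflated Poisson model: each count is
    an extra zero with probability pi, otherwise Poisson(lam).  Setting
    pi = 0 recovers the plain Poisson log-likelihood, a useful check."""
    ll = 0.0
    for y in counts:
        pois = math.exp(-lam) * lam ** y / math.factorial(y)
        if y == 0:
            # a zero can come from either component
            ll += math.log(pi + (1.0 - pi) * pois)
        else:
            # a positive count must come from the Poisson component
            ll += math.log((1.0 - pi) * pois)
    return ll
```

As expected, adding zero inflation (pi > 0) raises the likelihood of an observed zero relative to the plain Poisson model, which is why ZIP models fit data with more than 30% zeros better than a single Poisson distribution.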
 Date Issued
 2012
 Identifier
 FSU_migr_etd5181
 Format
 Thesis
 Title
 Modeling Differential Item Functioning (DIF) Using Multilevel Logistic Regression Models: A Bayesian Perspective.
 Creator

Chaimongkol, Saengla, Huffer, Fred W., Kamata, Akihito, Tate, Richard, Niu, Xufeng, McGee, Daniel, Department of Statistics, Florida State University
 Abstract/Description

A multilevel logistic regression approach provides an attractive and practical alternative for the study of Differential Item Functioning (DIF). It is not only useful for identifying items with DIF but also for explaining the presence of DIF. Kamata and Binici (2003) first attempted to identify group-unit characteristic variables explaining the variation of DIF by using hierarchical generalized linear models. Their models were implemented with the HLM5 software, which uses the penalized or predictive quasi-likelihood (PQL) method. They found that the variance estimates produced by HLM5 for the level-3 parameters are substantially negatively biased. This study extends their work by using a Bayesian approach to obtain more accurate parameter estimates. Two different approaches to modeling the DIF are presented, referred to as the relative and the mixture distribution approach, respectively. The relative approach measures the DIF of a particular item relative to the mean overall DIF for all items in the test. The mixture distribution approach treats the DIF as independent values drawn from a distribution which is a mixture of a normal distribution and a discrete distribution concentrated at zero. A simulation study is presented to assess the adequacy of the proposed models. This work also describes and studies models which allow the DIF to vary at level 3 (from school to school). In an example using real data, it is shown how the models can be applied to the identification of items with DIF and the explanation of the source of the DIF.
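The mixture distribution approach described above is a spike-and-slab prior: each item's DIF is exactly zero with some probability, and otherwise normally distributed. A minimal sketch of drawing from that prior (the parameter names are hypothetical, and the full Bayesian model with its multilevel structure is not reproduced here):

```python
import random

def sample_dif(pi0, sigma, rng=random.Random(0)):
    """Draw one DIF value from the spike-and-slab mixture prior:
    exactly zero with probability pi0 (the item has no DIF), otherwise
    Normal(0, sigma^2) (the item's DIF magnitude)."""
    if rng.random() < pi0:
        return 0.0
    return rng.gauss(0.0, sigma)
```

The point mass at zero is what lets the posterior distinguish items with genuinely no DIF from items with small but real DIF, something a purely continuous prior cannot express.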
 Date Issued
 2005
 Identifier
 FSU_migr_etd3939
 Format
 Thesis