Current Search: Research Repository » Statistics
Search results
 Title
 Intensity Estimation in Poisson Processes with Phase Variability.
 Creator

Gordon, Glenna, Wu, Wei, Whyte, James, Srivastava, Anuj, Chicken, Eric, Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

Intensity estimation for Poisson processes is a classical problem and has been extensively studied over the past few decades. However, current methods of intensity estimation assume phase variability or compositional noise, i.e., a nonlinear shift along the time axis, is nonexistent in the data, which is an unreasonable assumption for practical observations. The key challenge is that these observations are not "aligned", and registration procedures are required for successful estimation. As a result, these estimation methods can yield estimators that are inefficient or that underperform in simulations and applications. This dissertation summarizes two key projects which examine estimation of the intensity of a Poisson process in the presence of phase variability. The first project proposes an alignment-based framework for intensity estimation. First, it is shown that the intensity function is area-preserved with respect to compositional noise. Such a property implies that the time warping is only encoded in the density, or normalized intensity, function. Then, the intensity function can be decomposed into the product of the estimated total intensity (a scalar value) and the estimated density function. The estimation of the density relies on a metric which measures the phase difference between two density functions. An asymptotic study shows that the proposed estimation algorithm provides a consistent estimator for the normalized intensity. The success of the proposed estimation algorithm is illustrated using two simulations, and the new framework is applied to a real data set of neural spike trains, showing that the proposed estimation method yields improved classification accuracy over previous methods. The second project utilizes 2014 Florida data from the Healthcare Cost and Utilization Project's State Inpatient Database and State Emergency Department Database (provided to the U.S. Department of Health and Human Services, Agency for Healthcare Research and Quality by the Florida Agency for Health Care Administration) to examine heart failure emergency department arrival times. Current estimation methods for examining emergency department arrival data ignore the functional nature of the data and implement naive analysis methods. In this dissertation, the arrivals are treated as a Poisson process and the intensity of the process is estimated using existing density estimation and function registration methods. The results of these analyses show the importance of considering the functional nature of emergency department arrival data and the critical role that function registration plays in the intensity estimation of the arrival process. A brief illustrative sketch of the intensity decomposition follows this record.
 Date Issued
 2016
 Identifier
 FSU_FA2016_Gordon_fsu_0071E_13511
 Format
 Thesis
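The record above describes decomposing a Poisson intensity into a total-intensity scalar times a density. As a rough illustration only, and not the dissertation's registration-based algorithm, the Python sketch below estimates an intensity from event trains that are assumed to be already aligned, using a kernel density estimate; the function name and interface are hypothetical.

```python
import numpy as np
from scipy.stats import gaussian_kde

def estimate_intensity(aligned_trains, grid):
    """Estimate a Poisson intensity as (mean event count) x (density).

    aligned_trains : list of 1-D arrays of event times, assumed already
                     registered (phase variability removed).
    grid           : time points at which to evaluate the intensity.
    """
    counts = np.array([len(t) for t in aligned_trains])
    total_intensity = counts.mean()           # scalar: expected number of events
    pooled = np.concatenate(aligned_trains)   # pooled event times across trains
    density = gaussian_kde(pooled)(grid)      # normalized intensity (density) estimate
    return total_intensity * density

# toy usage: three "aligned" spike trains on [0, 1]
rng = np.random.default_rng(0)
trains = [np.sort(rng.uniform(0, 1, rng.poisson(50))) for _ in range(3)]
grid = np.linspace(0, 1, 200)
lam_hat = estimate_intensity(trains, grid)
```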
 Title
 Prediction and Testing for Non-Parametric Random Function Signals in a Complex System.
 Creator

Hill, Paul C., Chicken, Eric, Klassen, Eric, Niu, Xufeng, Barbu, Adrian, Department of Statistics, Florida State University
 Abstract/Description

Methods employed in the construction of prediction bands for continuous curves require a different approach to those used for a data point. In many cases, the underlying function is unknown, and thus a distribution-free approach which preserves sufficient coverage for the entire signal is necessary in the signal analysis. This paper discusses three methods for the formation of (1-α)100% bootstrap prediction bands, and their performances are compared through the coverage probabilities obtained for each technique. Bootstrap samples are first obtained for the signal, and then three different criteria are provided for the removal of α100% of the curves, resulting in the (1-α)100% prediction band. The first method uses the L1 distance between the upper and lower curves as a gauge to extract the widest bands in the dataset of signals. Also investigated are extractions using the Hausdorff distance between the bounds as well as an adaptation of the bootstrap intervals discussed in Lenhoff et al. (1999). The bootstrap prediction bands each have good coverage probabilities for the continuous signals in the dataset. For a 95% prediction band, the coverages obtained were 90.59%, 93.72% and 95% for the L1 distance, Hausdorff distance and the adjusted bootstrap methods respectively. The methods discussed in this paper have been applied to constructing prediction bands for spring discharge in a successful manner, giving good coverage in each case. Spring discharge measured over time can be considered as a continuous signal, and the ability to predict future signals of spring discharge is useful for monitoring flow and other issues related to the spring. While in some cases rainfall has been fitted with the gamma distribution, the discharge of the spring, represented as continuous curves, is better approached without assuming any specific distribution. The bootstrap aspect occurs not in sampling the output discharge curves but rather in simulating the input recharge that enters the spring. Bootstrapping the rainfall as described in this paper allows for adequately creating new samples over different periods of time as well as specific rain events such as hurricanes or drought. The bootstrap prediction methods put forth in this paper provide an approach that supplies adequate coverage for prediction bands for signals represented as continuous curves. The pathway outlined by the flow of the discharge through the springshed is described as a tree. A nonparametric pairwise test, motivated by the idea of K-means clustering, is proposed to decipher whether there is equality between two trees in terms of their discharges. A large sample approximation is devised for this lower-tail significance test, and test statistics for different numbers of input signals are compared to a generated table of critical values. A small sketch of the band-trimming idea follows this record.
 Date Issued
 2012
 Identifier
 FSU_migr_etd4910
 Format
 Thesis
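The band-construction idea above (drop the widest α100% of bootstrap curves, keep the envelope of the rest) can be sketched as follows. This is one plausible reading of the L1 criterion, not the dissertation's exact procedure; the function name and the greedy removal rule are assumptions for illustration.

```python
import numpy as np

def l1_trimmed_band(curves, alpha=0.05):
    """Form a (1 - alpha)100% prediction band from bootstrap curves by
    greedily discarding the alpha*100% of curves that most widen the band,
    measured by the L1 area between the upper and lower envelopes.

    curves : array of shape (B, n) -- B bootstrap signals on a common grid.
    """
    curves = np.asarray(curves, dtype=float)
    B = len(curves)
    keep = np.ones(B, dtype=bool)
    for _ in range(int(np.floor(alpha * B))):
        idx = np.where(keep)[0]
        lo, hi = curves[idx].min(axis=0), curves[idx].max(axis=0)
        scores = []
        for i in idx:
            rest = idx[idx != i]
            lo_i, hi_i = curves[rest].min(axis=0), curves[rest].max(axis=0)
            # how much the L1 band area shrinks if curve i is removed
            scores.append(np.trapz(hi - lo, dx=1.0) - np.trapz(hi_i - lo_i, dx=1.0))
        keep[idx[np.argmax(scores)]] = False
    idx = np.where(keep)[0]
    return curves[idx].min(axis=0), curves[idx].max(axis=0)
```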
 Title
 Estimation and Sequential Monitoring of Nonlinear Functional Responses Using Wavelet Shrinkage.
 Creator

Cuevas, Jordan, Chicken, Eric, Sobanjo, John, Niu, Xufeng, Wu, Wei, Department of Statistics, Florida State University
 Abstract/Description

Statistical process control (SPC) is widely used in industrial settings to monitor processes for shifts in their distributions. SPC is generally thought of in two distinct phases: Phase I, in which historical data is analyzed in order to establish an in-control process, and Phase II, in which new data is monitored for deviations from the in-control form. Traditionally, SPC had been used to monitor univariate (multivariate) processes for changes in a particular parameter (parameter vector). Recently, however, technological advances have resulted in processes in which each observation is actually an n-dimensional functional response (referred to as a profile), where n can be quite large. Additionally, these profiles are often unable to be adequately represented parametrically, making traditional SPC techniques inapplicable. This dissertation starts out by addressing the problem of nonparametric function estimation, which would be used to analyze process data in a Phase I setting. The translation-invariant wavelet estimator (TI) is often used to estimate irregular functions, despite the drawback that it tends to oversmooth jumps. A trimmed translation-invariant estimator (TTI) is proposed, of which the TI estimator is a special case. By reducing the point-by-point variability of the TI estimator, TTI is shown to retain the desirable qualities of TI while improving reconstructions of functions with jumps. Attention is then turned to the Phase II problem of monitoring sequences of profiles for deviations from in-control. Two profile monitoring schemes are proposed; the first monitors for changes in the noise variance using a likelihood ratio test based on the highest detail level of wavelet coefficients of the observed profile. The second offers a semiparametric test to monitor for changes in both the functional form and noise variance. Both methods make use of wavelet shrinkage in order to distinguish relevant functional information from noise contamination. Different forms of each of these test statistics are proposed and results are compared via Monte Carlo simulation. An illustrative sketch of the standard translation-invariant estimator follows this record.
 Date Issued
 2012
 Identifier
 FSU_migr_etd4788
 Format
 Thesis
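For context, the standard translation-invariant (cycle-spinning) wavelet estimator that the proposed TTI estimator builds on can be sketched with PyWavelets as follows; the universal threshold and the number of shifts are illustrative choices, and the trimming step of the TTI estimator itself is not shown.

```python
import numpy as np
import pywt

def ti_wavelet_estimate(y, wavelet="db4", level=4, shifts=16):
    """Standard translation-invariant (cycle-spinning) wavelet estimate:
    average soft-thresholded reconstructions over circular shifts.
    The trimmed TTI estimator would modify how the shifted
    reconstructions are combined (e.g., trimming extremes pointwise).
    """
    n = len(y)
    # noise level from the finest-scale detail coefficients (MAD estimate)
    sigma = np.median(np.abs(pywt.wavedec(y, wavelet, level=level)[-1])) / 0.6745
    thr = sigma * np.sqrt(2 * np.log(n))          # universal threshold
    recons = []
    for s in range(shifts):
        coeffs = pywt.wavedec(np.roll(y, s), wavelet, level=level)
        coeffs = [coeffs[0]] + [pywt.threshold(c, thr, mode="soft") for c in coeffs[1:]]
        recons.append(np.roll(pywt.waverec(coeffs, wavelet)[:n], -s))
    return np.mean(recons, axis=0)
```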
 Title
 Weighted Adaptive Methods for Multivariate Response Models with an HIV/Neurocognitive Application.
 Creator

Geis, Jennifer Ann, She, Yiyuan, Meyer-Baese, Anke, Barbu, Adrian, Bunea, Florentina, Niu, Xufeng, Department of Statistics, Florida State University
 Abstract/Description

Multivariate response models are being used increasingly in almost all fields, with the necessary employment of inferential methods such as Canonical Correlation Analysis (CCA). This requires the estimation of the number of uncorrelated canonical relationships between the two sets, or, equivalently, determining the rank of the coefficient estimator in the multivariate response model. One way to do this is by the Rank Selection Criterion (RSC) of Bunea et al., with the assumption that the error matrix has independent constant variance entries. While this assumption is necessary to show their strong theoretical results, in practical application some flexibility is required. That is, such an assumption cannot always be safely made. What is developed here is the theory that parallels Bunea et al.'s work with the addition of a "decorrelator" weight matrix. One choice for the weight matrix is the residual covariance, but this introduces many issues in practice. A computationally more convenient weight matrix is the sample response covariance. When such a weight matrix is chosen, CCA is directly accessible by this weighted version of RSC, giving rise to an Adaptive CCA (ACCA) with principal proofs for the large sample setting. However, particular considerations are required for the high-dimensional setting, where similar theory does not hold. What is offered instead are extensive empirical simulations that reveal that using the sample response covariance still provides good rank recovery and estimation of the coefficient matrix, and hence also provides good estimation of the number of canonical relationships and variates. It is argued precisely why other versions of the residual covariance, including a regularized version, are poor choices in the high-dimensional setting. Another approach to avoid these issues is to employ some type of variable selection methodology first before applying ACCA. Truly, any group selection method may be applied prior to ACCA, as variable selection in the multivariate response model is the same as group selection in the univariate response model, and thus completely eliminates these high-dimensional concerns. To offer a practical application of these ideas, ACCA is applied to a "large sample" neurocognitive dataset. Then, a high-dimensional dataset is generated to which Group LASSO is first utilized before ACCA. This provides a unique perspective into the relationships between cognitive deficiencies in HIV-positive patients and the extensive, available neuroimaging measures. A brief sketch of rank selection by singular-value thresholding follows this record.
 Date Issued
 2012
 Identifier
 FSU_migr_etd4861
 Format
 Thesis
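To make the rank-selection step concrete, here is a minimal sketch of an RSC-style reduced-rank estimate obtained by thresholding the singular values of the projected response. The penalty constant, the function name, and the omission of the "decorrelator" weight matrix are simplifications, so treat this as an assumed illustration rather than the ACCA procedure itself.

```python
import numpy as np

def rank_selection(X, Y, penalty):
    """RSC-style estimate: keep singular values of the projected response
    P_X Y whose squares exceed `penalty`, then form the reduced-rank
    coefficient estimate of that rank. A weighted/adaptive variant would
    first right-multiply Y by a weight ("decorrelator") matrix.
    """
    Q, _ = np.linalg.qr(X)                     # orthonormal basis for col(X)
    PY = Q @ (Q.T @ Y)                         # projection of Y onto col(X)
    U, s, Vt = np.linalg.svd(PY, full_matrices=False)
    r = int(np.sum(s**2 > penalty))            # selected rank
    B_ls = np.linalg.pinv(X) @ Y               # least-squares coefficients
    P_r = Vt[:r].T @ Vt[:r]                    # projector onto top-r right singular vectors
    return r, B_ls @ P_r                       # rank-r coefficient estimate
```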
 Title
 Nonparametric Wavelet Thresholding and Profile Monitoring for Non-Gaussian Errors.
 Creator

McGinnity, Kelly, Chicken, Eric, Hoeflich, Peter, Niu, Xufeng, Zhang, Jinfeng, Department of Statistics, Florida State University
 Abstract/Description

Recent advancements in data collection allow scientists and researchers to obtain massive amounts of information in short periods of time. Often this data is functional and quite complex. Wavelet transforms are popular, particularly in the engineering and manufacturing fields, for handling these types of complicated signals. A common application of wavelets is in statistical process control (SPC), in which one tries to determine as quickly as possible if and when a sequence of profiles has gone out-of-control. However, few wavelet methods have been proposed that do not rely in some capacity on the assumption that the observational errors are normally distributed. This dissertation aims to fill this void by proposing a simple, nonparametric, distribution-free method of monitoring profiles and estimating change-points. Using only the magnitudes and location maps of thresholded wavelet coefficients, our method uses the spatial adaptivity property of wavelets to accurately detect profile changes when the signal is obscured with a variety of non-Gaussian errors. Wavelets are also widely used for the purpose of dimension reduction. Applying a thresholding rule to a set of wavelet coefficients results in a "denoised" version of the original function. Once again, existing thresholding procedures generally assume independent, identically distributed normal errors. Thus, the second main focus of this dissertation is a nonparametric method of thresholding that does not assume Gaussian errors, or even that the form of the error distribution is known. We improve upon an existing even-odd cross-validation method by employing block thresholding and level dependence, and show that the proposed method works well on both skewed and heavy-tailed distributions. Such thresholding techniques are essential to the SPC procedure developed above.
 Date Issued
 2013
 Identifier
 FSU_migr_etd7502
 Format
 Thesis
 Title
 The Frequentist Performance of Some Bayesian Confidence Intervals for the Survival Function.
 Creator

Tao, Yingfeng, Huffer, Fred, Okten, Giray, Sinha, Debajyoti, Niu, Xufeng, Department of Statistics, Florida State University
 Abstract/Description

Estimation of a survival function is a very important topic in survival analysis with contributions from many authors. This dissertation considers estimation of confidence intervals for the survival function based on right-censored or interval-censored survival data. Most of the methods for estimating pointwise confidence intervals and simultaneous confidence bands of the survival function are reviewed in this dissertation. In the right-censored case, almost all confidence intervals are based in some way on the Kaplan-Meier estimator, first proposed by Kaplan and Meier (1958) and widely used as the nonparametric estimator in the presence of right-censored data. For interval-censored data, the Turnbull estimator (Turnbull (1974)) plays a similar role. For a class of Bayesian models involving Dirichlet priors, Doss and Huffer (2003) suggested several simulation techniques to approximate the posterior distribution of the survival function by using Markov chain Monte Carlo or sequential importance sampling. These techniques lead to probability intervals for the survival function (at arbitrary time points) and its quantiles for both the right-censored and interval-censored cases. This dissertation will examine the frequentist properties and general performance of these probability intervals when the prior is noninformative. Simulation studies will be used to compare these probability intervals with other published approaches. Extensions of the Doss-Huffer approach are given for constructing simultaneous confidence bands for the survival function and for computing approximate confidence intervals for the survival function based on Edgeworth expansions using posterior moments. The performance of these extensions is studied by simulation. A sketch of the Kaplan-Meier baseline with Greenwood intervals follows this record.
 Date Issued
 2013
 Identifier
 FSU_migr_etd7624
 Format
 Thesis
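As a frequentist reference point for the comparisons described above, a minimal Kaplan-Meier estimate with Greenwood pointwise confidence intervals can be computed as follows (right-censored data only; the function name and output layout are illustrative).

```python
import numpy as np
from scipy.stats import norm

def kaplan_meier_ci(times, events, alpha=0.05):
    """Kaplan-Meier estimate with Greenwood pointwise (1 - alpha) intervals.

    times  : observed times (event or right-censoring)
    events : 1 if the event was observed, 0 if right-censored
    Returns a list of (time, survival, lower, upper) tuples.
    """
    times = np.asarray(times, float)
    events = np.asarray(events, int)
    z = norm.ppf(1.0 - alpha / 2.0)
    surv, var_sum, rows = 1.0, 0.0, []
    for t in np.unique(times[events == 1]):
        n_risk = np.sum(times >= t)
        d = np.sum((times == t) & (events == 1))
        surv *= 1.0 - d / n_risk
        if n_risk > d:
            var_sum += d / (n_risk * (n_risk - d))   # Greenwood accumulator
        se = surv * np.sqrt(var_sum)
        rows.append((t, surv, max(surv - z * se, 0.0), min(surv + z * se, 1.0)))
    return rows
```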
 Title
 Bayesian Methods for Skewed Response Including Longitudinal and Heteroscedastic Data.
 Creator

Tang, Yuanyuan, Sinha, Debajyoti, Pati, Debdeep, Flynn, Heather, She, Yiyuan, Lipsitz, Stuart, Zhang, Jinfeng, Department of Statistics, Florida State University
 Abstract/Description

Skewed response data are very common in practice, especially in the biomedical area. We begin our work from the skewed longitudinal response without heteroscedasticity. We extend the skewed error density to the multivariate response. Then we study the heteroscedasticity. We extend the transform-both-sides model to the Bayesian variable selection area to handle the univariate skewed response, where the variance of the response is a function of the median. Finally, we propose a novel model to handle the skewed univariate response with a flexible heteroscedasticity. For longitudinal studies with heavily skewed continuous response, statistical models and methods focusing on the mean response are not appropriate. In this paper, we present a partial linear model of the median regression function of skewed longitudinal response. We develop a semiparametric Bayesian estimation procedure using an appropriate Dirichlet process mixture prior for the skewed error distribution. We provide justifications for using our methods, including theoretical investigation of the support of the prior, asymptotic properties of the posterior and also simulation studies of finite sample properties. Ease of implementation and advantages of our model and method compared to existing methods are illustrated via analysis of a cardiotoxicity study of children of HIV-infected mothers. Our second aim is to develop a Bayesian simultaneous variable selection and estimation of median regression for a skewed response variable. Our hierarchical Bayesian model can incorporate advantages of the $l_0$ penalty for skewed and heteroscedastic error. Some preliminary simulation studies have been conducted to compare the performance of the proposed model and the existing frequentist median lasso regression model. Considering the estimation bias and total square error, our proposed model performs as well as, or better than, competing frequentist estimators. In biomedical studies, the covariates often affect the location, scale as well as the shape of the skewed response distribution. Existing biostatistical literature mainly focuses on mean regression with a symmetric error distribution. While such modeling assumptions and methods are often deemed restrictive and inappropriate for skewed response, completely nonparametric methods may lack a physical interpretation of the covariate effects. Existing nonparametric methods also lack an easily implementable computational tool. For a skewed response, we develop a novel model accommodating a nonparametric error density that depends on the covariates. The advantages of our semiparametric associated Bayes method include the ease of prior elicitation/determination, an easily implementable posterior computation, theoretically sound properties of the selection of priors and accommodation of possible outliers. The practical advantages of the method are illustrated via a simulation study and an analysis of a real-life epidemiological study on the serum response to DDT exposure during the gestation period.
 Date Issued
 2013
 Identifier
 FSU_migr_etd7622
 Format
 Thesis
 Title
 Statistical Analysis of Trajectories on Riemannian Manifolds.
 Creator

Su, Jingyong, Srivastava, Anuj, Klassen, Erik, Huffer, Fred, Zhang, Jinfeng, Department of Statistics, Florida State University
 Abstract/Description

This thesis consists of two distinct topics. First, we present a framework for estimation and analysis of trajectories on Riemannian manifolds. Second, we propose a framework for detecting, classifying, and estimating shapes in point cloud data. This thesis mainly focuses on statistical analysis of trajectories that take values on nonlinear manifolds. There are many difficulties when analyzing temporal trajectories on nonlinear manifolds. First, the observed data are always noisy and discrete at unsynchronized times. Second, trajectories are observed under arbitrary temporal evolutions. In this work, we first address the problem of estimating full smooth trajectories on nonlinear manifolds using only a set of time-indexed points, for use in interpolation, smoothing, and prediction of dynamic systems. Furthermore, we study statistical analysis of trajectories that take values on nonlinear Riemannian manifolds and are observed under arbitrary temporal evolutions. The problem of analyzing such temporal trajectories, including registration, comparison, modeling and evaluation, arises in many applications. We introduce a quantity that provides both a cost function for temporal registration and a proper distance for comparison of trajectories. This distance, in turn, is used to define statistical summaries, such as the sample means and covariances, of given trajectories and Gaussian-type models to capture their variability. Both theoretical proofs and experimental results are provided to validate our work. The problems of detecting, classifying, and estimating shapes in point cloud data are important due to their general applicability in image analysis, computer vision, and graphics. They are challenging because the data is typically noisy, cluttered, and unordered. We study these problems using a fully statistical model where the data is modeled using a Poisson process on the object's boundary (curves or surfaces), corrupted by additive noise and a clutter process. Using likelihood functions dictated by the model, we develop a generalized likelihood ratio test for detecting a shape in a point cloud. Additionally, we develop a procedure for estimating most likely shapes in observed point clouds under given shape hypotheses. We demonstrate this framework using examples of 2D and 3D shape detection and estimation in both real and simulated data, and a usage of this framework in shape retrieval from a 3D shape database.
 Date Issued
 2013
 Identifier
 FSU_migr_etd7619
 Format
 Thesis
 Title
 Meta-Analysis and Meta-Regression of a Measure of Discrimination Used in Prognostic Modeling.
 Creator

Rivera, Gretchen L., McGee, Daniel, Hurt, Myra, Niu, Xufeng, Sinha, Debajyoti, Department of Statistics, Florida State University
 Abstract/Description

In this paper we are interested in predicting death with the underlying cause of coronary heart disease (CHD). There are two prognostic modeling methods used to predict CHD: the logistic model and the proportional hazards model. For this paper we consider the logistic model. The dataset used is the Diverse Populations Collaboration (DPC) dataset, which includes 28 studies. The DPC dataset has epidemiological results from investigations conducted in different populations around the world. For our analysis we include those individuals who are 17 years old or older. The predictors are: age, diabetes, total serum cholesterol (mg/dl), high density lipoprotein (mg/dl), systolic blood pressure (mmHg) and whether the participant is a current cigarette smoker. There is a natural grouping within the studies, such as gender, rural or urban area and race. Based on these strata we have 84 cohort groups. Our main interest is to evaluate how well the prognostic model discriminates. For this, we used the area under the Receiver Operating Characteristic (ROC) curve. The main idea of the ROC curve is that a set of subjects is known to belong to one of two classes (signal or noise group). Then an assignment procedure assigns each object to a class on the basis of information observed. The assignment procedure is not perfect: sometimes an object is misclassified. We want to evaluate the quality of performance of this procedure; for this we used the area under the ROC curve (AUROC). The AUROC varies from 0.5 (no apparent accuracy) to 1.0 (perfect accuracy). For each logistic model we found the AUROC and its standard error (SE). We used meta-analysis to summarize the estimated AUROCs and to evaluate whether there is heterogeneity in our estimates. To evaluate the existence of significant heterogeneity we used the Q statistic. Since heterogeneity was found in our study, we compare seven different methods for estimating τ² (the between-study variance). We conclude by examining whether differences in study characteristics explain the heterogeneity in the values of the AUROC. A short sketch of the Q statistic and one τ² estimator follows this record.
 Date Issued
 2013
 Identifier
 FSU_migr_etd7580
 Format
 Thesis
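The heterogeneity assessment mentioned above can be illustrated with Cochran's Q and the DerSimonian-Laird estimator of τ², one of several τ² estimators the dissertation compares. The helper below assumes each study contributes an AUROC and its standard error; the function name is illustrative.

```python
import numpy as np

def dersimonian_laird(auroc, se):
    """Random-effects meta-analysis of AUROC estimates: Cochran's Q for
    heterogeneity and the DerSimonian-Laird estimate of the between-study
    variance tau^2, followed by the random-effects pooled AUROC.
    """
    auroc, se = np.asarray(auroc, float), np.asarray(se, float)
    w = 1.0 / se**2                              # fixed-effect weights
    theta_fe = np.sum(w * auroc) / np.sum(w)     # fixed-effect pooled AUROC
    Q = np.sum(w * (auroc - theta_fe)**2)        # Cochran's Q statistic
    k = len(auroc)
    c = np.sum(w) - np.sum(w**2) / np.sum(w)
    tau2 = max(0.0, (Q - (k - 1)) / c)           # DL between-study variance
    w_re = 1.0 / (se**2 + tau2)                  # random-effects weights
    theta_re = np.sum(w_re * auroc) / np.sum(w_re)
    return theta_re, tau2, Q
```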
 Title
 An Ensemble Approach to Predicting Health Outcomes.
 Creator

Nilles, Ester Kim, McGee, Dan, Zhang, Jinfeng, Eberstein, Isaac, Sinha, Debajyoti, Department of Statistics, Florida State University
 Abstract/Description

Heart disease and premature birth continue to be leading causes of mortality and neonatal mortality in large parts of the world. They are also estimated to have the highest medical expenditures in the United States. Early detection of heart disease incidence plays a critical role in preserving heart health, and identifying pregnancies at high risk of premature birth is highly valuable information for early interventions. Over the past few decades, identification of patients at high health risk has been based on logistic regression or Cox proportional hazards models. In more recent years, machine learning models have grown in popularity within the medical field for their superior predictive and classification performances over the classical statistical models. However, their performances in heart disease and premature birth predictions have been comparable and inconclusive, leaving the question of which model most accurately reflects the data difficult to resolve. Our aim is to incorporate information learned by different models into one final model that will generate superior predictive performance. We first compare the widely used machine learning models (the multilayer perceptron network, k-nearest neighbor and support vector machine) to the statistical models logistic regression and Cox proportional hazards. Then the individual models are combined into one in an ensemble approach, also referred to as ensemble modeling. The proposed approaches include SSE-weighted, AUC-weighted, logistic and flexible naive Bayes. The individual models are unique and capture different aspects of the data, but as expected, no individual one outperforms any other. The ensemble approach is an easily computed method that eliminates the need to select one model, integrates the strengths of different models, and generates optimal performance. Particularly in cases where the risk factors associated with an outcome are elusive, such as in premature birth, the ensemble models significantly improve prediction. A sketch of an AUC-weighted combination follows this record.
 Date Issued
 2013
 Identifier
 FSU_migr_etd7530
 Format
 Thesis
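One of the combination rules named above, AUC weighting, can be sketched as follows; the dictionary-based interface is an assumption, and scikit-learn's roc_auc_score is used only to score the base models on a validation set.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def auc_weighted_ensemble(val_probs, val_y, test_probs):
    """AUC-weighted ensemble: each base model's predicted probabilities are
    weighted by its validation AUC (an SSE-weighted variant would swap AUC
    for an inverse squared-error score).

    val_probs  : dict of model name -> validation predicted probabilities
    val_y      : validation labels (0/1)
    test_probs : dict of model name -> test predicted probabilities
    """
    aucs = {m: roc_auc_score(val_y, p) for m, p in val_probs.items()}
    total = sum(aucs.values())
    weights = {m: a / total for m, a in aucs.items()}
    return sum(weights[m] * np.asarray(test_probs[m]) for m in test_probs)
```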
 Title
 Mixed-Effects Models for Count Data with Applications to Educational Research.
 Creator

Shin, Jihyung, Niu, Xufeng, Hu, Shouping, Al Otaiba, Stephanie Dent, McGee, Daniel, Wu, Wei, Department of Statistics, Florida State University
 Abstract/Description

This research is motivated by an analysis of reading research data. We are interested in modeling the test outcome of the ability to fluently recode letters into sounds for kindergarten children aged between 5 and 7. The data showed excessive zero scores (more than 30% of children) on the test. In this dissertation, we carefully examine models dealing with excessive zeros, which are based on a mixture of distributions: a distribution degenerate at zero and a standard probability distribution with non-negative values. In such cases, a log-normal random variable (for semi-continuous data) or a Poisson random variable (for count data) is observed with some probability. The previously proposed models, the mixed-effects and mixed-distribution models (MEMD) by Tooze et al. (2002) for semi-continuous data and the zero-inflated Poisson (ZIP) regression models by Lambert (1992) for count data, are reviewed. We apply zero-inflated Poisson models to repeated-measures data by introducing a pair of possibly correlated random effects into the zero-inflated Poisson model to accommodate within-subject correlation and between-subject heterogeneity. The model describes the effect of predictor variables on the probability of nonzero responses (occurrence) and the mean of nonzero responses (intensity) separately. The likelihood function is maximized using dual quasi-Newton optimization of an objective approximated by adaptive Gaussian quadrature. The maximum likelihood estimates are obtained through a standard statistical software package. Using different model parameters, numbers of subjects, and numbers of measurements per subject, a simulation study is conducted and the results are presented. The dissertation ends with the application of the model to reading research data and future research. We examine the number of correct letter sounds counted for children collected over the 2008-2009 academic year. We find that age, gender and socioeconomic status are significantly related to the letter sound fluency of children in both parts of the model. The model provides a better explanation of the data structure and easier interpretation of parameter values, as they are the same as in standard logistic models and Poisson regression models. The model can be extended to accommodate serial correlation, which can be observed in longitudinal data. Also, one may consider a multilevel zero-inflated Poisson model. Although the multilevel model was proposed previously, parameter estimation by penalized quasi-likelihood methods is questionable, and further examination is needed. A sketch of the fixed-effects ZIP likelihood follows this record.
 Date Issued
 2012
 Identifier
 FSU_migr_etd5181
 Format
 Thesis
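For orientation, the fixed-effects zero-inflated Poisson likelihood underlying the model above can be written down directly. The random effects and adaptive Gaussian quadrature used in the dissertation are omitted, and the function below is an illustrative sketch, not the fitted model.

```python
import numpy as np
from scipy.special import gammaln

def zip_negloglik(params, y, X_occ, X_int):
    """Negative log-likelihood of a zero-inflated Poisson regression.

    Occurrence part: logit P(structural zero) = X_occ @ gamma
    Intensity part : log E[count | not structural zero] = X_int @ beta
    params is the concatenation (gamma, beta).
    """
    p_occ = X_occ.shape[1]
    gamma, beta = params[:p_occ], params[p_occ:]
    pi = 1.0 / (1.0 + np.exp(-(X_occ @ gamma)))   # probability of a structural zero
    lam = np.exp(X_int @ beta)                    # Poisson mean for the count part
    ll_zero = np.log(pi + (1.0 - pi) * np.exp(-lam))
    ll_pos = np.log(1.0 - pi) - lam + y * np.log(lam) - gammaln(y + 1)
    return -np.sum(np.where(y == 0, ll_zero, ll_pos))

# usage sketch: scipy.optimize.minimize(zip_negloglik, x0, args=(y, X_occ, X_int))
```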
 Title
 A Riemannian Framework for Annotated Curves Analysis.
 Creator

Liu, Wei, Srivastava, Anuj, Zhang, Jinfeng, Klassen, Eric P., Huffer, Fred, Department of Statistics, Florida State University
 Abstract/Description

We propose a Riemannian framework for shape analysis of annotated curves, curves that have certain attributes defined along them in addition to their geometries. These attributes may be in the form of vector-valued functions, discrete landmarks, or symbolic labels, and provide auxiliary information along the curves. The resulting shape analysis, that is, comparing, matching, and deforming, is naturally influenced by the auxiliary functions. Our idea is to construct curves in higher dimensions using both geometric and auxiliary coordinates, and analyze shapes of these curves. The difficulty comes from the need to remove different groups from different components: the shape is invariant to rigid motion, global scale and reparameterization, while the auxiliary component is usually invariant only to the reparameterization. Thus, the removal of some transformations (rigid motion and global scale) is restricted only to the geometric coordinates, while the reparameterization group is removed for all coordinates. We demonstrate this framework using a number of experiments.
 Date Issued
 2011
 Identifier
 FSU_migr_etd4997
 Format
 Thesis
 Title
 Nonparametric Data Analysis on Manifolds with Applications in Medical Imaging.
 Creator

Osborne, Daniel Eugene, Patrangenaru, Victor, Liu, Xiuwen, Barbu, Adrian, Chicken, Eric, Department of Statistics, Florida State University
 Abstract/Description

Over the past twenty years, there has been a rapid development in Nonparametric Statistical Analysis on Manifolds applied to Medical Imaging problems. In this body of work, we focus on two different medical imaging problems. The first problem corresponds to analyzing CT scan data. In this context, we perform nonparametric analysis on the 3D data retrieved from CT scans of healthy young adults, on the Size-and-Reflection Shape Space of k-ads in general position in 3D. This work is a part of a larger project on planning reconstructive surgery in severe skull injuries, which includes preprocessing and postprocessing steps of CT images. The next problem corresponds to analyzing MR diffusion tensor imaging data. Here, we develop a two-sample procedure for testing the equality of the generalized Frobenius means of two independent populations on the space of symmetric positive definite matrices. These new methods naturally lead to an analysis based on Cholesky decompositions of covariance matrices, which helps to decrease computational time and does not increase dimensionality. The resulting nonparametric matrix-valued statistics are used for testing if there is a difference on average between corresponding signals in Diffusion Tensor Images (DTI) in young children with dyslexia when compared to their clinically normal peers. The results presented here correspond to data that was previously analyzed in the literature using parametric methods, which also showed a significant difference.
 Date Issued
 2012
 Identifier
 FSU_migr_etd5085
 Format
 Thesis
 Title
 Riemannian Shape Analysis of Curves and Surfaces.
 Creator

Kurtek, Sebastian, Srivastava, Anuj, Klassen, Eric, Wu, Wei, Huffer, Fred, Dryden, Ian, Department of Statistics, Florida State University
 Abstract/Description

Shape analysis of curves and surfaces is a very important tool in many applications ranging from computer vision to bioinformatics and medical imaging. There are many difficulties when analyzing shapes of parameterized curves and surfaces. Firstly, it is important to develop representations and metrics such that the analysis is invariant to parameterization in addition to the standard transformations (rigid motion and scaling). Furthermore, under the chosen representations and metrics, the analysis must be performed on infinite-dimensional and sometimes nonlinear spaces, which poses an additional difficulty. In this work, we develop and apply methods which address these issues. We begin by defining a framework for shape analysis of parameterized open curves and extend these ideas to shape analysis of surfaces. We utilize the presented frameworks in various classification experiments spanning multiple application areas. In the case of curves, we consider the problem of clustering DT-MRI brain fibers, classification of protein backbones, modeling and segmentation of signatures and statistical analysis of biosignals. In the case of surfaces, we perform disease classification using 3D anatomical structures in the brain, classification of handwritten digits by viewing images as quadrilateral surfaces, and finally classification of cropped facial surfaces. We provide two additional extensions of the general shape analysis frameworks that are the focus of this dissertation. The first one considers shape analysis of marked spherical surfaces, where in addition to the surface information we are given a set of manually or automatically generated landmarks. This requires additional constraints on the definition of the reparameterization group and is applicable in many domains, especially medical imaging and graphics. Second, we consider reflection symmetry analysis of planar closed curves and spherical surfaces. Here, we also provide an example of disease detection based on brain asymmetry measures. We close with a brief summary and a discussion of open problems, which we plan on exploring in the future.
 Date Issued
 2012
 Identifier
 FSU_migr_etd4963
 Format
 Thesis
 Title
 A Novel Riemannian Metric for Analyzing Spherical Functions with Applications to HARDI Data.
 Creator

Ncube, Sentibaleng, Srivastava, Anuj, Klassen, Eric, Wu, Wei, Niu, Xufeng, Department of Statistics, Florida State University
 Abstract/Description

We propose a novel Riemannian framework for analyzing orientation distribution functions (ODFs), or their probability density functions (PDFs), in HARDI data sets for use in comparing, interpolating, averaging, and denoising PDFs. This is accomplished by separating shape and orientation features of PDFs, and then analyzing them separately under their own Riemannian metrics. We formulate the action of the rotation group on the space of PDFs, and define the shape space as the quotient space of PDFs modulo the rotations. In other words, any two PDFs are compared in: (1) shape, by rotationally aligning one PDF to another and using the Fisher-Rao distance on the aligned PDFs, and (2) orientation, by comparing their rotation matrices. This idea improves upon the results from using the Fisher-Rao metric in analyzing PDFs directly, a technique that is being used increasingly, and leads to geodesic interpolations that are biologically feasible. This framework leads to definitions and efficient computations for the Karcher mean that provide tools for improved interpolation and denoising. We demonstrate these ideas using an experimental setup involving several PDFs.
 Date Issued
 2011
 Identifier
 FSU_migr_etd5064
 Format
 Thesis
 Title
 Theories on Group Variable Selection in Multivariate Regression Models.
 Creator

Ha, SeungYeon, She, Yiyuan, Okten, Giray, Huffer, Fred, Sinha, Debajyoti, Department of Statistics, Florida State University
 Abstract/Description

We study group variable selection in the multivariate regression model. Group variable selection is equivalent to selecting the nonzero rows of the coefficient matrix, since there are multiple response variables and thus, if one predictor is irrelevant to estimation, then the corresponding row must be zero. In the high-dimensional setup, shrinkage estimation methods are applicable and guarantee smaller MSE than OLS according to the James-Stein phenomenon (1961). As one of the shrinkage methods, we study penalized least-squares estimation for group variable selection. Among them, we study L0 regularization and L0 + L2 regularization with the purpose of obtaining accurate prediction and consistent feature selection, and use the corresponding computational procedures Hard TISP and Hard-Ridge TISP (She, 2009) to solve the numerical difficulties. These regularization methods show better performance both on prediction and selection than the Lasso (L1 regularization), which is one of the popular penalized least-squares methods. L0 achieves the same optimal rate of prediction loss and estimation loss as the Lasso, but it requires no restriction on the design matrix or sparsity for controlling the prediction error, and a more relaxed condition than the Lasso for controlling the estimation error. Also, for selection consistency, it requires a much more relaxed incoherence condition, which bounds the correlation between the relevant subset and the irrelevant subset of predictors. Therefore L0 can work better than the Lasso both on prediction and sparsity recovery in practical cases where the correlation is high or the sparsity is not low. We study another method, L0 + L2 regularization, which uses the combined penalty of L0 and L2. For the corresponding procedure Hard-Ridge TISP, two parameters work independently for selection and shrinkage (to enhance prediction) respectively, and therefore it gives better performance in some cases (such as low signal strength) than L0 regularization. For L0 regularization, λ works for selection but is tuned in terms of prediction accuracy. L0 + L2 regularization gives the optimal rate of prediction and estimation errors without any restriction, when the coefficient of the l2 penalty is appropriately assigned. Furthermore, it can achieve a better rate of estimation error with an ideal choice of blockwise weight for the l2 penalty. A small sketch of a row-wise hard-thresholding iteration follows this record.
 Date Issued
 2013
 Identifier
 FSU_migr_etd7404
 Format
 Thesis
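A bare-bones version of a TISP-style iteration with row-wise hard thresholding, the group analogue of the L0 penalty discussed above, might look like the sketch below. The step size, stopping rule, and the omission of the ridge part of Hard-Ridge TISP are simplifications, so this is an assumed illustration rather than She's exact procedure.

```python
import numpy as np

def hard_tisp_rows(X, Y, lam, n_iter=500):
    """Iterative gradient step plus row-wise hard thresholding for group
    variable selection in a multivariate regression Y ~ X B: a row of B
    survives only if its l2 norm after the gradient step exceeds lam.
    """
    X, Y = np.asarray(X, float), np.asarray(Y, float)
    k = np.linalg.norm(X, 2) ** 2            # spectral norm squared; makes the step contractive
    B = np.zeros((X.shape[1], Y.shape[1]))
    for _ in range(n_iter):
        Z = B + X.T @ (Y - X @ B) / k        # gradient step on the squared loss
        norms = np.linalg.norm(Z, axis=1)    # row-wise l2 norms
        B = np.where((norms > lam)[:, None], Z, 0.0)   # hard-threshold whole rows
    return B
```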
 Title
 Monte Carlo Likelihood Estimation for Conditional Autoregressive Models with Application to Sparse Spatiotemporal Data.
 Creator

Bain, Rommel, Huffer, Fred, Becker, Betsy, Niu, Xufeng, Srivastava, Anuj, Department of Statistics, Florida State University
 Abstract/Description

Spatiotemporal modeling is increasingly used in a diverse array of fields, such as ecology, epidemiology, health care research, transportation, economics, and other areas where data arise from a spatiotemporal process. Spatiotemporal models describe the relationship between observations collected from different spatiotemporal sites. The modeling of spatiotemporal interactions arising from spatiotemporal data is done by incorporating the space-time dependence into the covariance structure. A main goal of spatiotemporal modeling is the estimation and prediction of the underlying process that generates the observations under study and the parameters that govern the process. Furthermore, analysis of the spatiotemporal correlation of variables can be used for estimating values at sites where no measurements exist. In this work, we develop a framework for estimating quantities that are functions of complete spatiotemporal data when the spatiotemporal data is incomplete. We present two classes of conditional autoregressive (CAR) models (the homogeneous CAR (HCAR) model and the weighted CAR (WCAR) model) for the analysis of sparse spatiotemporal data (the log of monthly mean zooplankton biomass) collected on a spatiotemporal lattice by the California Cooperative Oceanic Fisheries Investigations (CalCOFI). These models allow for spatiotemporal dependencies between nearest-neighbor sites on the spatiotemporal lattice. Typically, CAR model likelihood inference is quite complicated because of the intractability of the CAR model's normalizing constant. Sparse spatiotemporal data further complicates likelihood inference. We implement Monte Carlo likelihood (MCL) estimation methods for parameter estimation of our HCAR and WCAR models. Monte Carlo likelihood estimation provides an approximation for intractable likelihood functions. We demonstrate our framework by giving estimates for several different quantities that are functions of the complete CalCOFI time series data.
 Date Issued
 2013
 Identifier
 FSU_migr_etd7283
 Format
 Thesis
 Title
 Nonparametric Nonstationary Density Estimation Including Upper Control Limit Methods for Detecting Change Points.
 Creator

Becvarik, Rachel A., Chicken, Eric, Liu, Guosheng, Sinha, Debajyoti, Wu, Wei, Department of Statistics, Florida State University
 Abstract/Description

Nonstationary nonparametric densities occur naturally in applications such as monitoring the amount of toxins in the air and monitoring internet streaming data. Progress has been made in estimating these densities, but there is little current work on monitoring them for changes. A new statistic is proposed which effectively monitors these nonstationary nonparametric densities through the use of transformed wavelet coefficients of the quantiles. This method is completely nonparametric, requiring no particular distributional assumptions, thus making it effective in a variety of conditions. Existing methods for monitoring sequential data typically focus on using a single-value upper control limit (UCL) based on a specified in-control average run length (ARL) to detect changes in these nonstationary statistics. However, such a UCL is not designed to take into consideration the false alarm rate, the power associated with the test, or the underlying distribution of the ARL. Additionally, if the monitoring statistic is known to be monotonic over time (which is typical in methods using maxima in their statistics, for example), the flat UCL does not adjust to this property. We propose several methods for creating UCLs that provide improved power and simultaneously adjust the false alarm rate to user-specified values. Our methods are constructive in nature, making no use of assumed distribution properties of the underlying monitoring statistic. We evaluate the different proposed UCLs through simulations to illustrate the improvements over current UCLs. The proposed method is evaluated with respect to profile monitoring scenarios and the proposed density statistic. The method is applicable for monitoring any monotonically nondecreasing nonstationary statistic. A sketch of a simulation-calibrated UCL follows this record.
 Date Issued
 2013
 Identifier
 FSU_migr_etd7292
 Format
 Thesis
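The constructive flavor of the proposed UCLs can be illustrated by calibrating a flat limit directly from simulated in-control paths of the monitoring statistic. In the sketch below, simulate_stat is a hypothetical user-supplied function, and the quantile rule targets a run-level false alarm rate rather than an ARL, so this is an illustration of the idea rather than the dissertation's procedures.

```python
import numpy as np

def simulate_ucl(simulate_stat, run_length, n_paths=2000, false_alarm=0.05, rng=None):
    """Distribution-free UCL: simulate in-control paths of the monitoring
    statistic and choose the flat control limit so that at most
    `false_alarm` of the paths ever exceed it within `run_length` steps.

    simulate_stat : function (run_length, rng) -> array of the in-control
                    monitoring statistic over time (hypothetical interface).
    """
    rng = np.random.default_rng(rng)
    maxima = np.array([simulate_stat(run_length, rng).max() for _ in range(n_paths)])
    return np.quantile(maxima, 1.0 - false_alarm)   # UCL = (1 - alpha) quantile of path maxima
```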
 Title
 Elastic Shape Analysis of RNAs and Proteins.
 Creator

Laborde, Jose M., Srivastava, Anuj, Zhang, Jinfeng, Klassen, Eric, McGee, Daniel, Department of Statistics, Florida State University
 Abstract/Description

Proteins and RNAs are molecular machines performing biological functions in the cells of all organisms. Automatic comparison and classification of these biomolecules are fundamental yet open problems in the field of Structural Bioinformatics. An outstanding unsolved issue is the definition and efficient computation of a formal distance between any two biomolecules. Current methods use alignment scores, which are not proper distances, to derive statistical tests for comparison and classification. This work applies Elastic Shape Analysis (ESA), a method recently developed in computer vision, to construct rigorous mathematical and statistical frameworks for the comparison, clustering and classification of proteins and RNAs. ESA treats biomolecular structures as 3D parameterized curves, which are represented with a special map called the square root velocity function (SRVF). In the resulting shape space of elastic curves, one can perform statistical analysis of curves as if they were random variables. One can compare, match and deform one curve into another, as well as compute averages and covariances of curve populations, and perform hypothesis testing and classification of curves according to their shapes. We have successfully applied ESA to the comparison and classification of protein and RNA structures. We further extend the ESA framework to incorporate additional non-geometric information that tags the shape of the molecules (namely, the sequence of nucleotide/amino-acid letters for RNAs/proteins and, in the latter case, also the labels for the so-called secondary structure). The biological representation is chosen such that the ESA framework continues to be mathematically formal. We have achieved superior classification of RNA functions compared to state-of-the-art methods on benchmark RNA datasets, which has led to the publication of this work in the journal Nucleic Acids Research (NAR). Based on the ESA distances, we have also developed a fast method to classify protein domains by using a representative set of protein structures generated by a clustering-based technique we call Multiple Centroid Class Partitioning (MCCP). Comparison with other standard approaches showed that MCCP significantly improves the accuracy while keeping the representative set smaller than the other methods. The current schemes for the classification and organization of proteins (such as SCOP and CATH) assume a discrete space of their structures, where a protein is classified into one and only one class in a hierarchical tree structure. Our recent study, and studies by other researchers, showed that the protein structure space is more continuous than discrete. To capture the complex but quantifiable continuous nature of protein structures, we propose to organize these molecules using a network model, where individual proteins are mapped to possibly multiple nodes of classes, each associated with a probability. Structural classes will then be connected to form a network based on overlaps of corresponding probability distributions in the structural space. A brief sketch of the SRVF map follows this record.
 Date Issued
 2013
 Identifier
 FSU_migr_etd8586
 Format
 Thesis
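The SRVF map mentioned above has a simple closed form, q(t) = x'(t) / sqrt(||x'(t)||), under which elastic comparison reduces to an L2 computation. The sketch below evaluates it on a discretely sampled curve; no rotational or reparameterization alignment is performed, and the function names are illustrative.

```python
import numpy as np

def srvf(curve, t):
    """Square-root velocity function of a parameterized curve.

    curve : array of shape (n, d), points sampled along the curve
    t     : array of shape (n,), parameter values
    """
    deriv = np.gradient(curve, t, axis=0)            # dx/dt, componentwise
    speed = np.linalg.norm(deriv, axis=1)
    speed = np.where(speed < 1e-12, 1e-12, speed)     # guard against zero speed
    return deriv / np.sqrt(speed)[:, None]

def srvf_distance(q1, q2, t):
    """L2 distance between two SRVFs on a common grid (before alignment)."""
    diff = np.sum((q1 - q2) ** 2, axis=1)
    return np.sqrt(np.trapz(diff, t))
```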
 Title
 Failure Time Regression Models for Thinned Point Processes.
 Creator

Holden, Robert T., Huffer, Fred G., Nichols, Warren, McGee, Dan, Sinha, Debajyoti, Department of Statistics, Florida State University
 Abstract/Description

In survival analysis, data on the time until a specific criterion event (or "endpoint") occurs are analyzed, often with regard to the effects of various predictors. In the classic applications, the criterion event is in some sense a terminal event, e.g., death of a person or failure of a machine or machine component. In these situations, the analysis requires assumptions only about the distribution of waiting times until the criterion event occurs and the nature of the effects of the predictors on that distribution. Suppose that the criterion event is not a terminal event that can only occur once, but is a repeatable event. The sequence of events forms a stochastic point process. Further suppose that only some of the events are detected (observed); the detected events form a thinned point process. Any failure time model based on the data will be based not on the time until the first occurrence, but on the time until the first detected occurrence of the event. The implications of estimating survival regression models from such incomplete data will be analyzed. It will be shown that the effect of thinning on regression parameters depends on the combination of the type of regression model, the type of point process that generates the events, and the thinning mechanism. For some combinations, the effect of a predictor will be the same for the time to the first event and the time to the first detected event. For other combinations, the regression effect will be changed as a result of the incomplete detection.
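A small simulation sketch of the thinning effect described above: events arrive as a homogeneous Poisson process, each event is independently detected with probability p, and the time to the first event is compared with the time to the first detected event. The rate, detection probability and horizon are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def first_times(rate=2.0, p=0.3, horizon=50.0):
    """Return (time to first event, time to first detected event) for one realization."""
    n = rng.poisson(rate * horizon)
    times = np.sort(rng.uniform(0.0, horizon, size=n))
    detected = times[rng.random(n) < p]                   # independent thinning
    t_first = times[0] if n > 0 else np.inf
    t_first_detected = detected[0] if detected.size > 0 else np.inf
    return t_first, t_first_detected

sims = np.array([first_times() for _ in range(10000)])
print("mean time to first event:         ", sims[:, 0].mean().round(3))  # near 1/rate = 0.5
print("mean time to first detected event:", sims[:, 1].mean().round(3))  # near 1/(rate*p) = 1.667
```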
 Date Issued
 2013
 Identifier
 FSU_migr_etd8568
 Format
 Thesis
 Title
 2D Affine and Projective Shape Analysis, and Bayesian Elastic Active Contours.
 Creator

Bryner, Darshan W., Srivastava, Anuj, Klassen, Eric, Gallivan, Kyle, Huffer, Fred, Wu, Wei, Zhang, Jinfeng, Department of Statistics, Florida State University
 Abstract/Description

An object of interest in an image can be characterized to some extent by the shape of its external boundary. Current techniques for shape analysis consider the notion of shape to be invariant to the similarity transformations (rotation, translation and scale), but oftentimes in 2D images of 3D scenes, perspective effects can transform shapes of objects in a more complicated manner than can be modeled by the similarity transformations alone. Therefore, we develop a general Riemannian framework for shape analysis where metrics and related quantities are invariant to larger groups, the affine and projective groups, which approximate the transformations that arise from perspective skews. Highlighting two possibilities for representing object boundaries, ordered points (or landmarks) and parametrized curves, we study different combinations of these representations (points and curves) and transformations (affine and projective). Specifically, we provide solutions to three out of the four situations and develop algorithms for computing geodesics and intrinsic sample statistics, leading up to Gaussian-type statistical models, and classifying test shapes using such models learned from training data. In the case of parametrized curves, an added issue is to obtain invariance to the reparameterization group. The geodesics are constructed by particularizing the path-straightening algorithm to the geometries of the current manifolds and are used, in turn, to compute shape statistics and Gaussian-type shape models. We demonstrate these ideas using a number of examples from shape and activity recognition. After developing such Gaussian-type shape models, we present a variational framework for naturally incorporating these shape models as prior knowledge to guide active contours for boundary extraction in images. This so-called Bayesian active contour framework is especially suitable for images where boundary estimation is difficult due to low contrast, low resolution, and the presence of noise and clutter. In traditional active contour models, curves are driven towards the minimum of an energy composed of image and smoothing terms. We introduce an additional shape term based on shape models of prior known relevant shape classes. The minimization of this total energy, using iterated gradient-based updates of curves, leads to an improved segmentation of object boundaries. We demonstrate this Bayesian approach to segmentation using a number of shape classes in many imaging scenarios, including the synthetic imaging modalities of SAS (synthetic aperture sonar) and SAR (synthetic aperture radar), for which accurate boundary extraction is notoriously difficult. In practice, the training shapes used for prior shape models may be collected from viewing angles different from those for the test images and thus may exhibit a shape variability brought about by perspective effects. Therefore, by allowing the prior shape model to be invariant to, say, affine transformations of curves, we propose an active contour algorithm whose resulting segmentation is robust to perspective skews.
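As a hedged sketch of one ingredient of affine-invariant shape analysis, the code below centers a landmark configuration and whitens it with its own scatter matrix, so that two affinely related configurations agree up to an orthogonal transformation that a Procrustes step then removes; the square and the distortion matrix are made-up examples, and this is not the dissertation's full Riemannian framework.

```python
import numpy as np

def affine_standardize(X):
    """Center a (k x 2) landmark configuration and whiten it with its own scatter."""
    Xc = X - X.mean(axis=0)                              # remove translation
    cov = Xc.T @ Xc / len(Xc)                            # 2x2 scatter matrix
    evals, evecs = np.linalg.eigh(cov)
    whitener = evecs @ np.diag(1.0 / np.sqrt(evals)) @ evecs.T
    return Xc @ whitener                                 # GL(2) part reduced to a rotation/reflection

square = np.array([[0., 0.], [2., 0.], [2., 1.], [0., 1.]])
A = np.array([[1.0, 0.7], [0.2, 1.5]])                   # arbitrary invertible linear distortion
distorted = square @ A.T + np.array([3.0, -1.0])         # affine transform of the same shape

S1, S2 = affine_standardize(square), affine_standardize(distorted)

# The standardized configurations differ only by an orthogonal matrix, which an
# ordinary Procrustes alignment removes.
U, _, Vt = np.linalg.svd(S1.T @ S2)
R = U @ Vt
print(np.allclose(S1 @ R, S2))
```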
 Date Issued
 2013
 Identifier
 FSU_migr_etd8534
 Format
 Thesis
 Title
 The Relationship Between Body Mass and Blood Pressure in Diverse Populations.
 Creator

Abayomi, Emilola J., McGee, Daniel, Lackland, Daniel, Hurt, Myra, Chicken, Eric, Niu, Xufeng, Department of Statistics, Florida State University
 Abstract/Description

High blood pressure is a major determinant of risk for Coronary Heart Disease (CHD) and stroke, leading causes of death in the industrialized world. A myriad of pharmacological treatments for elevated blood pressure, defined as a blood pressure greater than 140/90 mmHg, are available and have at least partially resulted in large reductions in the incidence of CHD and stroke in the U.S. over the last 50 years. The factors that may increase blood pressure levels are not well understood, but body mass is thought to be a major determinant of blood pressure level. Obesity is measured through various methods (skinfolds, waist-to-hip ratio, bioelectrical impedance analysis (BIA), etc.), but the most commonly used measure is body mass index, BMI = Weight(kg)/Height(m)^2.
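A small worked example of the quoted formula; the weights and heights below are illustrative only.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight in kilograms divided by height in meters squared."""
    return weight_kg / height_m ** 2

for w, h in [(60.0, 1.70), (85.0, 1.75), (110.0, 1.80)]:
    print(f"{w} kg, {h} m -> BMI {bmi(w, h):.1f}")
```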
 Date Issued
 2012
 Identifier
 FSU_migr_etd5308
 Format
 Thesis
 Title
 The Relationship of Diabetes to Coronary Heart Disease Mortality: A MetaAnalysis Based on PersonLevel Data.
 Creator

Williams, Felicia Gray, McGee, Daniel, Hurt, Myra, Pati, Debdeep, Sinha, Debajyoti, Department of Statistics, Florida State University
 Abstract/Description

Studies have suggested that diabetes is a stronger risk factor for coronary heart disease (CHD) in women than in men. We present a meta-analysis of person-level data from 42 cohort studies in which diabetes, CHD mortality and potential confounders were available and a minimum of 75 CHD deaths occurred. These studies followed up 77,863 men and 84,671 women aged 42 to 73 years on average from the US, Denmark, Iceland, Norway and the UK. Individual study prevalence rates of self-reported diabetes mellitus at baseline ranged between less than 1% in the youngest cohort and 15.7% (males) and 11.1% (females) in the NHLBI CHS study of the elderly. CHD death rates varied between 2% and 20%. A meta-analysis was performed in order to calculate overall hazard ratios (HR) of CHD mortality among diabetics compared to non-diabetics using Cox proportional hazards models. The random-effects HR associated with baseline diabetes and adjusted for age was significantly higher for females, 2.65 (95% CI: 2.34, 2.96), than for males, 2.33 (95% CI: 2.07, 2.58) (p=0.004). These estimates were similar to the random-effects estimates adjusted additionally for serum cholesterol, systolic blood pressure, and current smoking status: females 2.69 (95% CI: 2.35, 3.03) and males 2.32 (95% CI: 2.05, 2.59). They also agree closely with estimates (odds ratios of 2.9 for females and 2.3 for males) obtained in a recent meta-analysis of 50 studies of both fatal and non-fatal CHD that was not based on person-level data. This evidence suggests that diabetes diminishes the female advantage. An additional analysis was performed by race; only 14 cohorts could be included in this meta-analysis. This analysis showed no significant difference between the black and white cohorts before (p=0.68) or after adjustment for the major CHD risk factors (p=0.88), although the limited number of studies may lack the power to detect any differences.
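The sketch below illustrates the kind of random-effects pooling described above, using DerSimonian-Laird weights on study-level log hazard ratios; the per-study estimates and standard errors are invented for illustration and are not the dissertation's data.

```python
import numpy as np

log_hr = np.log(np.array([2.4, 2.9, 2.1, 3.2, 2.6]))    # hypothetical per-study hazard ratios
se = np.array([0.15, 0.20, 0.12, 0.25, 0.18])            # hypothetical standard errors

w_fixed = 1.0 / se ** 2
theta_fixed = np.sum(w_fixed * log_hr) / np.sum(w_fixed)
Q = np.sum(w_fixed * (log_hr - theta_fixed) ** 2)         # Cochran's Q heterogeneity statistic
df = len(log_hr) - 1
c = np.sum(w_fixed) - np.sum(w_fixed ** 2) / np.sum(w_fixed)
tau2 = max(0.0, (Q - df) / c)                             # DerSimonian-Laird between-study variance

w_rand = 1.0 / (se ** 2 + tau2)
theta_rand = np.sum(w_rand * log_hr) / np.sum(w_rand)
se_rand = np.sqrt(1.0 / np.sum(w_rand))
lo, hi = theta_rand - 1.96 * se_rand, theta_rand + 1.96 * se_rand
print(f"pooled HR = {np.exp(theta_rand):.2f} (95% CI {np.exp(lo):.2f}, {np.exp(hi):.2f})")
```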
 Date Issued
 2013
 Identifier
 FSU_migr_etd7662
 Format
 Thesis
 Title
 Semiparametric Survival Analysis Using Models with LogLinear Median.
 Creator

Lin, Jianchang, Sinha, Debajyoti, Zhou, Yi, Lipsitz, Stuart, McGee, Dan, Niu, XuFeng, She, Yiyuan, Department of Statistics, Florida State University
 Abstract/Description

First, we present two novel semiparametric survival models with log-linear median regression functions for right-censored survival data. These models are useful alternatives to the popular Cox (1972) model and linear transformation models (Cheng et al., 1995). Compared to existing semiparametric models, our models have many important practical advantages, including interpretation of the regression parameters via the median and the ability to address heteroscedasticity. We demonstrate that our modeling techniques facilitate the ease of prior elicitation and computation for both parametric and semiparametric Bayesian analysis of survival data. We illustrate the advantages of our modeling, as well as model diagnostics, via reanalysis of a small-cell lung cancer study. Results of our simulation study provide further guidance regarding appropriate modeling in practice. Our second goal is to develop methods of analysis and associated theoretical properties for interval-censored and current status survival data. These new regression models use a log-linear regression function for the median. We present frequentist and Bayesian procedures for estimation of the regression parameters. Our model is a useful and practical alternative to the popular semiparametric models which focus on modeling the hazard function. We illustrate the advantages and properties of our proposed methods via reanalysis of a breast cancer study. Our other aim is to develop a model which is able to account for heteroscedasticity of the response, together with robust parameter estimation and outlier detection using sparsity penalization. Some preliminary simulation studies have been conducted to compare the performance of the proposed model and an existing median lasso regression model. Considering the estimation bias, mean squared error and other identification benchmark measures, our proposed model performs better than the competing frequentist estimator.
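As a minimal, hedged illustration of the log-linear median idea: with uncensored data, the median of log survival time can be modeled linearly in covariates and fit by quantile regression at the 0.5 quantile. The dissertation's methods additionally handle right-censored, interval-censored and current status data, which this sketch does not; the data below are simulated.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 500
x = rng.normal(size=n)
# Median of log T is 0.5 + 0.8 x; heteroscedastic, heavy-tailed noise with median zero.
log_t = 0.5 + 0.8 * x + (0.3 + 0.2 * np.abs(x)) * 0.1 * rng.standard_cauchy(n)
df = pd.DataFrame({"log_t": log_t, "x": x})

fit = smf.quantreg("log_t ~ x", df).fit(q=0.5)           # median (0.5-quantile) regression
print(fit.params)                                        # intercept and slope near (0.5, 0.8)
```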
 Date Issued
 2012
 Identifier
 FSU_migr_etd4992
 Format
 Thesis
 Title
 The Risk of Lipids on Coronary Heart Disease: Prognostic Models and MetaAnalysis.
 Creator

Almansour, Aseel, McGee, Daniel, Flynn, Heather, Niu, Xufeng, Sinha, Debajyoti, Department of Statistics, Florida State University
 Abstract/Description

Prognostic models are widely used in medicine to estimate a particular patient's risk of developing disease. Numerous prognostic models have been developed for predicting cardiovascular disease risk, including those by Wilson et al. using the Framingham Study [17], by Assmann et al. using the Procam study [22], and by Conroy et al. [33] using a pool of European cohorts. The prognostic models developed by these researchers differed in their approach to estimating risk, but all included one or more of the lipid determinations: total cholesterol (TC), low-density lipoproteins (LDL), high-density lipoproteins (HDL), or the ratios TC/HDL and LDL/HDL. None of these researchers included both LDL and TC in the same model due to the high correlation between these measurements. In this thesis we examine several questions about the inclusion of lipid determinations in prognostic models. Can the effects of LDL and TC on the risk of dying from CHD be differentiated? If one measure is demonstrably stronger than the other, then a single model using that variable would be considered advantageous. Is it possible to derive a single measure from TC and LDL that is a stronger predictor than either measure? If so, then a new summarization of the lipid measurements should be used in prognostic modeling. Does the addition of HDL to a prognostic model improve the predictive accuracy of the model? If it does, then this almost universally available determination should be used when developing prognostic models. We use data from nine independent studies to examine these issues. The studies were chosen because they include longitudinal follow-up of participants and included lipid determinations in the baseline examination of participants. There are many methodologies available for developing prognostic models, including logistic regression and the proportional hazards model. We used the proportional hazards model since we have follow-up times and times to death from CHD for all of the participants in the included studies. We summarized our results using a meta-analytic approach. Using the meta-analytic approach, we addressed the additional questions of whether the results vary significantly among the different studies and whether adding additional characteristics to the prognostic models changes the estimated effect of the lipid determinations. All of our results are presented stratified by gender and, when appropriate, by race. Finally, because our studies were not selected randomly, we also examined whether there is evidence of bias in our meta-analyses. For this examination we used funnel plots, with related methodology for testing whether there is evidence of bias in the results.
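A hedged sketch of the modeling step described above: a Cox proportional hazards fit (via the lifelines package) on simulated follow-up data with two highly correlated lipid-like covariates standing in for TC and LDL plus an "HDL" term; all variables and parameter values are illustrative assumptions, not the nine studies used in the thesis.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(2)
n = 2000
tc = rng.normal(0, 1, n)
ldl = 0.9 * tc + 0.1 * rng.normal(0, 1, n)               # strongly correlated with TC
hdl = rng.normal(0, 1, n)
risk = 0.5 * tc - 0.4 * hdl                              # true log-hazard uses TC and HDL only
T = rng.exponential(scale=np.exp(-risk))                 # exponential survival times
C = rng.exponential(scale=2.0, size=n)                   # independent censoring
df = pd.DataFrame({"time": np.minimum(T, C), "event": (T <= C).astype(int),
                   "TC": tc, "LDL": ldl, "HDL": hdl})

# Fitting TC and LDL together is unstable because of their collinearity, which is
# why the prognostic models cited above include only one of them at a time.
cph = CoxPHFitter()
cph.fit(df[["time", "event", "TC", "HDL"]], duration_col="time", event_col="event")
print(cph.params_)
```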
 Date Issued
 2014
 Identifier
 FSU_migr_etd8724
 Format
 Thesis
 Title
 A Class of Semiparametric Volatility Models with Applications to Financial Time Series.
 Creator

Chung, Steve S., Niu, XuFeng, Gallivan, Kyle, Sinha, Debajyoti, Wu, Wei, Department of Statistics, Florida State University
 Abstract/Description

The autoregressive conditional heteroskedasticity (ARCH) and generalized autoregressive conditional heteroskedasticity (GARCH) models capture the dependence in the conditional second moments. The idea behind ARCH/GARCH models is quite intuitive: for ARCH models, past squared innovations describe the present squared volatility; for GARCH models, both past squared innovations and past squared volatilities define the present volatility. Since their introduction, these models have been extensively studied and well documented in the financial and econometric literature, and many variants of ARCH/GARCH models have been proposed. To list a few, these include exponential GARCH (EGARCH), GJR-GARCH (or threshold GARCH), integrated GARCH (IGARCH), quadratic GARCH (QGARCH), and fractionally integrated GARCH (FIGARCH). The ARCH/GARCH models and their variants have gained a lot of attention and are still a popular choice for modeling volatility. Despite their popularity, they suffer from a lack of model flexibility: volatility is a latent variable, and imposing a specific model structure on it conflicts with this latency. Recently, several attempts have been made to ease the strict structural assumptions on volatility, and both nonparametric and semiparametric volatility models have been proposed in the literature. We review and discuss these modeling techniques in detail. In this dissertation, we propose a class of semiparametric multiplicative volatility models. We define the volatility as a product of parametric and nonparametric parts. Due to the positivity restriction, we apply log and square transformations to the volatility. We assume that the parametric part is GARCH(1,1), and it serves as an initial guess of the volatility. We estimate the GARCH(1,1) parameters by the conditional likelihood method. The nonparametric part assumes an additive structure; assuming an additive structure may cost some interpretability, but it gains flexibility. Each additive component is constructed from a sieve of Bernstein basis polynomials. The nonparametric component acts as an improvement on the parametric component. The model is estimated by an iterative algorithm based on boosting. We modified the boosting algorithm (the one given in Friedman, 2001) so that it uses a penalized least squares method. As the penalty function, we tried three choices: LASSO, ridge, and elastic net penalties. We found that, in our simulations and application, the ridge penalty worked best. Our semiparametric multiplicative volatility model is evaluated using simulations and applied to six major exchange rates and the S&P 500 index. The results show that the proposed model outperforms the existing volatility models in both in-sample estimation and out-of-sample prediction.
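A compact sketch of the parametric component mentioned above: simulate a GARCH(1,1) series and recover (omega, alpha, beta) by maximizing the Gaussian conditional likelihood. The semiparametric multiplicative model then multiplies such a fit by a nonparametric correction, which is not reproduced here; all parameter values are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)

def simulate_garch(n, omega=0.05, alpha=0.1, beta=0.85):
    r, s2 = np.zeros(n), np.zeros(n)
    s2[0] = omega / (1 - alpha - beta)                   # unconditional variance
    r[0] = np.sqrt(s2[0]) * rng.standard_normal()
    for t in range(1, n):
        s2[t] = omega + alpha * r[t - 1] ** 2 + beta * s2[t - 1]
        r[t] = np.sqrt(s2[t]) * rng.standard_normal()
    return r

def neg_loglik(params, r):
    omega, alpha, beta = params
    if omega <= 0 or alpha < 0 or beta < 0 or alpha + beta >= 1:
        return np.inf                                    # enforce stationarity/positivity
    s2 = np.empty_like(r)
    s2[0] = r.var()
    for t in range(1, len(r)):
        s2[t] = omega + alpha * r[t - 1] ** 2 + beta * s2[t - 1]
    return 0.5 * np.sum(np.log(s2) + r ** 2 / s2)

r = simulate_garch(3000)
fit = minimize(neg_loglik, x0=np.array([0.1, 0.05, 0.8]), args=(r,), method="Nelder-Mead")
print(fit.x.round(3))                                    # should be near (0.05, 0.10, 0.85)
```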
 Date Issued
 2014
 Identifier
 FSU_migr_etd8756
 Format
 Thesis
 Title
 PERCENTILE RESIDUAL LIFE FUNCTIONS  PROPERTIES, TESTING AND ESTIMATION.
 Creator

JOE, HARRY SUE WAH., Florida State University
 Abstract/Description

Let F be a life distribution with survival function F̄ ≡ 1 − F. Conditional on survival to time t, the remaining life has survival function F̄_t(x) = F̄(t + x)/F̄(t), x ≥ 0, 0 ≤ t < F^{-1}(1). The mean residual life function of F is m_F(t), the expected remaining life given survival to time t (the defining display is omitted in the source record), provided F has a finite mean. The α-percentile or quantile (0 < α < 1) residual life function of F is q_{α,F}(t) = F̄_t^{-1}(1 − α) = F^{-1}(1 − ᾱF̄(t)) − t, 0 ≤ t < F^{-1}(1), where ᾱ = 1 − α. Statisticians find it useful to categorize life distributions according to different aging properties. Categories which involve m_F(t) are the decreasing mean residual life (DMRL) class and the new better than used in expectation (NBUE) class. The DMRL class consists of distributions F such that m_F(t) is monotone decreasing on (0, F^{-1}(1)), and the NBUE class consists of distributions F such that m_F(0) ≥ m_F(t) for all 0 < t < F^{-1}(1). Analogous categories which involve q_{α,F}(t) are the decreasing α-percentile residual life (DPRL-α) class and the new better than used with respect to the α-percentile (NBUP-α) class. The mean residual life function is of interest in biometry, actuarial studies and reliability, and the DMRL and NBUE classes of life distributions are useful for modelling situations where items deteriorate with age. In the statistical literature, there are several papers which consider properties or estimation of the mean residual life function or consider testing situations involving the DMRL and NBUE classes. Only one previous paper discusses the α-percentile residual life function. This dissertation is concerned with properties and estimation of the α-percentile residual life function, and with testing problems involving the α-percentile residual life function. Properties of q_{α,F}(t) and of the DPRL-α, NBUP-α and their dual classes are studied in Chapter II. In Chapter III, tests are developed for testing exponentiality against alternatives of DPRL-α and NBUP-α. In Chapter IV, these tests are extended to accommodate randomly censored data. In Chapter V, a distribution-free two-sample test is developed for testing the hypothesis that two life distributions F and G are equal against the alternative that q_{α,F}(t) ≥ q_{α,G}(t) for all t. In Chapter VI, strong consistency, asymptotic normality, bias and mean squared error of the estimator F_n^{-1}(1 − ᾱF̄_n(t)) − t of q_{α,F}(t) are studied, where F_n is the empirical distribution function and F̄_n ≡ 1 − F_n.
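A numerical illustration of the estimator studied in Chapter VI, the empirical α-percentile residual life F_n^{-1}(1 − ᾱF̄_n(t)) − t, computed on a simulated exponential sample (for which the true 0.5-percentile residual life equals log 2 at every t, by memorylessness); the sample size and evaluation points are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(4)
x = np.sort(rng.exponential(scale=1.0, size=5000))

def percentile_residual_life(x_sorted, t, alpha=0.5):
    """Empirical alpha-percentile residual life at time t."""
    surv_t = np.mean(x_sorted > t)                       # empirical survival F_n_bar(t)
    p = 1.0 - (1.0 - alpha) * surv_t                     # target quantile level
    return np.quantile(x_sorted, p) - t

for t in [0.0, 0.5, 1.0, 2.0]:
    print(t, round(percentile_residual_life(x, t, alpha=0.5), 3))
# For Exp(1) the true 0.5-percentile residual life is log(2) ~ 0.693 at every t.
```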
 Date Issued
 1982
 Identifier
 AAI8214932, 3085276, FSDT3085276, fsu:74771
 Format
 Document (PDF)
 Title
 A NEW METHOD FOR ESTIMATING LIFE DISTRIBUTIONS FROM INCOMPLETE DATA.
 Creator

KITCHIN, JOHN FRANCIS., Florida State University
 Abstract/Description

We construct a new estimator for a continuous life distribution from incomplete data, the Piecewise Exponential Estimator (PEXE). To date, the principal method of nonparametric estimation from incomplete data is the Product-Limit Estimator (PLE) introduced by Kaplan and Meier [J. Amer. Statist. Assoc. (1958) 53]. Our formulation of the estimation problem posed by incomplete data is essentially that of Kaplan and Meier, but we approach its solution from the viewpoint of reliability and life testing. In this work we establish rigorously the asymptotic (large sample) properties of the PEXE. Our results include the strong consistency of the PEXE under various sets of assumptions plus the weak convergence of the PEXE, suitably normalized, to a Gaussian process. From an intermediate result in our weak convergence proof we derive asymptotic confidence bands and a goodness-of-fit test based on the PEXE. Though our main objective is the introduction of a new estimator for incomplete data and the study of its asymptotic properties, our second contribution to this area of research is the extension of the asymptotic results of the extensively used PLE. In particular, our results extend the work of Peterson [J. Amer. Statist. Assoc. (1977) 72] and Langberg, Proschan, and Quinzi [Ann. Statist. (1980) 8] in strong consistency and that of Breslow and Crowley [Ann. Statist. (1974) 2] in weak convergence. Finally, we show that the new PEXE, as an alternative to the traditional PLE, has several advantages for estimating a continuous life distribution from incomplete data, along with some drawbacks. Since the two estimators are so alike asymptotically, we concentrate on differences between the PEXE and the PLE for estimation from small samples.
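For orientation, the sketch below computes the product-limit (Kaplan-Meier) estimator, the PLE that the abstract takes as its point of comparison, from scratch on simulated right-censored data; the PEXE itself replaces the resulting step function by a piecewise-exponential curve with constant hazard between observed failures, a refinement not implemented here.

```python
import numpy as np

rng = np.random.default_rng(5)
T = rng.exponential(1.0, size=300)                       # true lifetimes, Exp(1)
C = rng.exponential(1.5, size=300)                       # independent censoring times
time, event = np.minimum(T, C), (T <= C).astype(int)

order = np.argsort(time)
time, event = time[order], event[order]

surv, n_at_risk = 1.0, len(time)
km_t, km_s = [0.0], [1.0]
for t, d in zip(time, event):
    if d == 1:                                           # a death among n_at_risk subjects
        surv *= 1.0 - 1.0 / n_at_risk
        km_t.append(t)
        km_s.append(surv)
    n_at_risk -= 1                                       # the subject leaves the risk set either way

idx = np.searchsorted(km_t, 1.0) - 1                     # last jump at or before t = 1
print(round(km_s[idx], 3), "vs true", round(np.exp(-1.0), 3))
```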
 Date Issued
 1980
 Identifier
 AAI8104261, 3084762, FSDT3084762, fsu:74263
 Format
 Document (PDF)
 Title
 A MATHEMATICAL STUDY OF THE DIRICHLET PROCESS.
 Creator

TIWARI, RAM CHANDRA., Florida State University
 Abstract/Description

This dissertation is a contribution to the theory of Bayesian nonparametrics. A construction of the Dirichlet process (Ferguson [1973]) on a finite set X is introduced in such a way that it leads to Blackwell's (1973) constructive definition of a Dirichlet process on a Borel space (X, A). If (X, A) is a Borel space and P is a random probability measure on (X, A) with a Dirichlet process prior D^α, then under the condition that the α-measure of every open subset of X is positive, for almost every realization P of P the set of discrete mass points of P is dense in X. A more general constructive definition introduced by Sethuraman (1978) is used to derive several new properties of the Dirichlet process and to present in a unified way some of the known properties of the process. An alternative construction of Dalal's (1975) G-invariant Dirichlet process (G being a finite group of transformations) is presented. The Bayes estimates of an estimable parameter of degree k (k ≥ 1), namely ψ_k(P) = ∫···∫ h(x_1, ..., x_k) dP(x_1)···dP(x_k), where h is a symmetric kernel, are derived for the no-sample-size case and for a sample of size n from P under the squared error loss function and a Dirichlet process prior. Using the result of the Bayes estimate of ψ_k(P) for the no-sample-size case, the (marginal) distribution of a sample from P (when the prior for P is the Dirichlet process) is obtained. The extension to the case when the prior for P is a G-invariant Dirichlet process is also obtained. Let (X, A) be the one-dimensional Euclidean space (R^1, B^1). Consider a sequence {D^{α_N + γ}} of Dirichlet processes such that α_N(X) converges to zero as N tends to infinity, where γ and the α_N's are finite measures on A. It is shown that D^{α_N + γ} converges weakly to D^γ in the topology of weak convergence on the class of all probability measures on (X, A). As a corollary, it follows that D^{α_N + nF_n} converges weakly to D^{nF_n}, where F_n is the empirical distribution of the sample. Suppose α_N(X) converges to zero and α_N/α_N(X) converges uniformly to α/α(X) as N tends to infinity. If {D^{α_N}} is a sequence of Dirichlet process priors for a random probability measure P on (X, A), then P, in the limit, is a random probability measure concentrated on the set of degenerate probability measures on (X, A), and the point of degeneracy is distributed as α/α(X) on (X, A). To the sequence of priors {D^{α_N}} for P, there corresponds a sequence of Bayes estimates of ψ_k(P). The limit of this sequence of Bayes estimates when α_N(X) converges to zero as N tends to infinity, called the limiting Bayes estimate of ψ_k(P), is obtained. When P is a random probability measure on {0, 1}, Sethuraman (1978) proposed a more general class of conjugate priors for P which contains both the family of Dirichlet processes and the family of priors introduced by Dubins and Freedman (1966). As an illustration, a numerical example is considered and the Bayes estimates of the mean and the variance of P are computed under three distinct priors chosen from Sethuraman's class of priors. The computer algorithm for this calculation is presented.
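A hedged sketch of Sethuraman's constructive (stick-breaking) definition referred to above: a realization of a Dirichlet process is built as P = Σ_k p_k δ_{θ_k}, with p_k = v_k Π_{j<k}(1 − v_j), v_k ~ Beta(1, α(X)) and θ_k drawn from the normalized base measure. The standard normal base measure and total mass 5 below are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(6)

def dp_stick_breaking(total_mass=5.0, n_atoms=500):
    """Truncated stick-breaking draw from a Dirichlet process."""
    v = rng.beta(1.0, total_mass, size=n_atoms)
    weights = v * np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    atoms = rng.standard_normal(n_atoms)                 # draws from the base measure N(0, 1)
    return atoms, weights

atoms, weights = dp_stick_breaking()
print(weights.sum().round(4))                            # close to 1 after 500 sticks
print((atoms * weights).sum().round(3))                  # random mean of the realized P
# The realization is a discrete distribution almost surely, matching the
# mass-point property discussed in the abstract.
```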
 Date Issued
 1981
 Identifier
 AAI8108190, 3084828, FSDT3084828, fsu:74329
 Format
 Document (PDF)
 Title
 AN INVESTIGATION OF THE EFFECT OF THE SWAMPING PHENOMENON ON SEVERAL BLOCK PROCEDURES FOR MULTIPLE OUTLIERS IN UNIVARIATE SAMPLES.
 Creator

WOOLLEY, THOMAS WILLIAM, JR., Florida State University
 Abstract/Description

Statistical outliers have been an issue of concern to researchers for over two centuries, and are the focus of this study. Sources of outliers, and various means for dealing with them, are discussed. Also presented are general descriptions of univariate outlier tests as well as the two approaches to handling multiple outlier situations, consecutive and block testing. The major problems inherent in these latter methods, masking and swamping, respectively, are recounted. Specifically, the primary aim of this study is to assess the susceptibility to swamping of four block procedures for multiple outliers in univariate samples. Pseudo-random samples are generated from a unit normal distribution, and varying numbers of upper outliers are placed in them according to specified criteria. A swamping index is created which reflects the relative vulnerability of each test to declare a block of outliers and the most extreme upper non-outlier discordant, as a unit. The results of this investigation reveal that the four block tests disagree in their respective susceptibilities to swamping depending upon sample size and the pre-specified number of outliers assumed to be present. Rank orderings of these four tests based upon their vulnerability to swamping under varying circumstances are presented. In addition, alternate approaches to calculating the swamping index when four or more outliers exist are described. Recommendations concerning the appropriate application of the four block procedures under differing situations, and proposals for further research, are advanced.
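An illustrative simulation in the spirit of the study design: plant k upper outliers in unit-normal samples, apply a block test sized for k + 1 upper outliers, and record how often the block (the outliers plus the largest genuine observation) is declared discordant as a unit, an empirical swamping rate. The Tietjen-Moore statistic is used purely as a stand-in here; the four block procedures compared in the dissertation are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(7)

def tietjen_moore_upper(x, k):
    """E_k: residual sum of squares after dropping the k largest, over the total."""
    xs = np.sort(x)
    kept = xs[:-k]
    return np.sum((kept - kept.mean()) ** 2) / np.sum((xs - xs.mean()) ** 2)

def critical_value(n, k, alpha=0.05, reps=2000):
    stats = [tietjen_moore_upper(rng.standard_normal(n), k) for _ in range(reps)]
    return np.quantile(stats, alpha)                     # small E_k is evidence of outliers

n, k_true, shift = 20, 2, 6.0
crit = critical_value(n, k_true + 1)                     # block test sized for k_true + 1

reps, swamped = 2000, 0
for _ in range(reps):
    x = rng.standard_normal(n)
    x[:k_true] += shift                                  # plant k_true upper outliers
    if tietjen_moore_upper(x, k_true + 1) < crit:
        swamped += 1                                     # clean maximum swept up with the block
print(f"empirical swamping rate: {swamped / reps:.3f}")
```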
 Date Issued
 1981
 Identifier
 AAI8113272, 3084903, FSDT3084903, fsu:74401
 Format
 Document (PDF)
 Title
 ESTIMATION AND PREDICTION FOR EXPONENTIAL TIME SERIES MODELS.
 Creator

MOHAMED, FOUAD YEHIA., Florida State University
 Abstract/Description

This work is concerned with the study of stationary time series models in which the marginal distribution of the observations follows an exponential distribution. This is in contrast to the standard models in the literature where the error sequence and hence the marginal distributions of the o
 Date Issued
 1981
 Identifier
 AAI8205698, 3085176, FSDT3085176, fsu:74671
 Format
 Document (PDF)
 Title
 TimeVarying Coefficient Models with ARMAGARCH Structures for Longitudinal Data Analysis.
 Creator

Zhao, Haiyan, Niu, Xufeng, Huffer, Fred, Nolder, Craig, McGee, Dan, Department of Statistics, Florida State University
 Abstract/Description

The motivation of my research comes from the analysis of the Framingham Heart Study (FHS) data. The FHS is a long-term prospective study of cardiovascular disease in the community of Framingham, Massachusetts. The study began in 1948 and 5,209 subjects were initially enrolled. Examinations were given biennially to the study participants and their status associated with the occurrence of disease was recorded. In this dissertation, the event we are interested in is the incidence of coronary heart disease (CHD). Covariates considered include sex, age, cigarettes per day (CSM), serum cholesterol (SCL), systolic blood pressure (SBP) and body mass index (BMI, weight in kilograms/height in meters squared). A review of the statistical literature indicates that the effects of the covariates on cardiovascular disease, or on death from all causes, in the Framingham study change over time. For example, the effect of SCL on cardiovascular disease decreases linearly over time. In this study, I would like to examine the time-varying effects of the risk factors on CHD incidence. Time-varying coefficient models with ARMA-GARCH structure are developed in this research. The maximum likelihood and the marginal likelihood methods are used to estimate the parameters in the proposed models. Since high-dimensional integrals are involved in the calculation of the marginal likelihood, the Laplace approximation is employed in this study. Simulation studies are conducted to evaluate the performance of these two estimation methods based on our proposed models. The Kullback-Leibler (KL) divergence and the root mean square error are employed in the simulation studies to compare the results obtained from different methods. Simulation results show that the marginal likelihood approach gives more accurate parameter estimates, but is more computationally intensive. Following the simulation study, our proposed models are applied to the Framingham Heart Study to investigate the time-varying effects of covariates with respect to CHD incidence. To specify the time-series structures of the effects of risk factors, the Bayesian Information Criterion (BIC) is used for model selection. Our study shows that the relationship between CHD and the risk factors changes over time. For males, there is a clearly decreasing linear trend in the age effect, which implies that the age effect on CHD is less pronounced for older patients than for younger patients. The effect of CSM stays almost the same in the first 30 years and decreases thereafter. There are slightly decreasing linear trends in the effects of both SBP and BMI. Furthermore, the coefficients of SBP are mostly positive over time, i.e., patients with higher SBP are more likely to develop CHD, as expected. For females, there is also a clearly decreasing linear trend in the age effect, while the effects of SBP and BMI on CHD are mostly positive and do not change much over time.
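A one-dimensional sketch of the Laplace approximation mentioned above: the integral of exp(h(u)) is approximated by expanding h around its mode, and the result is checked against numerical quadrature. The integrand is an arbitrary smooth example, not the random-effects likelihood of the proposed models.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.integrate import quad

def h(u):
    return -2.0 * (u - 1.0) ** 2 - 0.1 * u ** 4          # smooth, unimodal log-integrand

res = minimize_scalar(lambda u: -h(u))                   # locate the mode of h
u_hat = res.x
eps = 1e-4
h2 = (h(u_hat + eps) - 2 * h(u_hat) + h(u_hat - eps)) / eps ** 2   # h''(u_hat) < 0

laplace = np.exp(h(u_hat)) * np.sqrt(2 * np.pi / (-h2))  # Laplace approximation
exact, _ = quad(lambda u: np.exp(h(u)), -np.inf, np.inf) # quadrature benchmark
print(round(laplace, 4), round(exact, 4))
```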
 Date Issued
 2010
 Identifier
 FSU_migr_etd0527
 Format
 Thesis
 Title
 TESTS OF DISPLACEMENT AND ORDERED MEAN HYPOTHESES.
 Creator

SINCLAIR, DENNIS FRANKLIN., Florida State University
 Abstract/Description

Character displacement is an ecological process by which, theoretically, coexisting species diverge in size to reduce competition. A closely allied concept is deletion, in which species are excluded from a habitat because they do not differ sufficiently from other species living there. Character displacement has been a controversial topic in recent years, largely due to a lack of statistical procedures for testing its existence. We propose herein a variety of approaches for testing displacement and deletion hypotheses. The applicability of the methods extends beyond the motivating ecological problem to other fields. Consider the model X_ij = μ_i + ε_ij, i = 1, ..., k; j = 1, ..., n_i, where X_ij is the jth observation on species i with population mean μ_i. The ε_ij's are independent normally distributed error terms with mean zero and common variance. Traditionally, ecologists have regarded species sizes as randomly distributed. We develop tests for displacement and deletion by considering uniform, log-normal and log-uniform distributions for species sizes. (A random variable Y has a log-uniform distribution if log Y has a uniform distribution.) Most claimed manifestations of character displacement concern the ratios of each species size to the next smallest one (contiguous ratios). All but one of the test statistics are functions of spacings (logarithms of contiguous ratios). We prove a useful characterization of distributions in terms of spacings, and show that the log-uniform distribution produces constant expected contiguous ratios, an important property in character displacement studies. The random effects approaches generally lack power in detecting the suspected patterns. We develop further tests for the model in which the μ_i's are regarded as fixed. This fixed effects approach, which may be more realistic ecologically, produces considerably more powerful tests. Displacement hypotheses in the fixed effects framework are expressed naturally in terms of the ordered means μ_(1) < μ_(2) < ... < μ_(k). We develop a general theory by which a particular class of linear hypotheses about any number of sets of ordered means may be tested. Finally, a functional relation is used to model the movement of species means from one environment to another. Existing asymptotic tests are shown to perform remarkably well for small samples.
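A quick simulation of the property noted above: if species sizes are log-uniform, the contiguous ratios (each ordered size divided by the next smallest) have the same expectation at every position in the ordering. The sizes below are simulated, not ecological data.

```python
import numpy as np

rng = np.random.default_rng(8)
k, reps = 6, 20000
log_sizes = rng.uniform(np.log(1.0), np.log(100.0), size=(reps, k))   # log-uniform sizes
sizes = np.sort(np.exp(log_sizes), axis=1)

ratios = sizes[:, 1:] / sizes[:, :-1]                    # contiguous ratios, k - 1 per sample
print(ratios.mean(axis=0).round(3))                      # roughly equal across positions
```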
 Date Issued
 1982
 Identifier
 AAI8223194, 3085332, FSDT3085332, fsu:74827
 Format
 Document (PDF)
 Title
 SOME RESULTS ON THE DISTRIBUTION OF GRUBBS ESTIMATORS.
 Creator

BRINDLEY, DENNIS ALFRED., Florida State University
 Abstract/Description

This dissertation is concerned with the estimation of error variances in a non-replicated two-way classification and with inferences based on the estimators so derived. The postulated model used throughout the present work is y_ij = μ_i + β_j + ε_ij, where y_ij is the observation in the ith row and jth column, μ_i is the parameter representing the mean of the ith row, β_j is the parameter representing the additional effect of the jth column (subject to a constraint whose display is omitted in the source), and the ε_ij are independent, zero-mean, normal variates whose variance structure is given by a display omitted in the source. A set of unbiased estimates Q_1, ..., Q_r (defined by displays omitted in the source), developed in earlier work by Grubbs (J. Amer. Statist. Assoc. 43 (1948), 243-264), Ehrenberg (Biometrika 37 (1950), 347-357) and Russell and Bradley (Biometrika 45 (1958), 111-129), is considered. The exact joint density of Q_1, ..., Q_r is obtained for r = 3, and two exact results are derived for testing a null hypothesis (display omitted in the source; the common parameter value is unknown) against two specific alternatives (displays omitted in the source), the first of which holds for at least some j, j = 1, 2, 3.
 Date Issued
 1982
 Identifier
 AAI8229146, 3085401, FSDT3085401, fsu:74896
 Format
 Document (PDF)
 Title
 LARGE DEVIATION LOCAL LIMIT THEOREMS, WITH APPLICATIONS.
 Creator

CHAGANTY, NARASINGA RAO., Florida State University
 Abstract/Description

Let {X_n, n ≥ 1} be a sequence of i.i.d. random variables with E(X_1) = 0, Var(X_1) = 1. Let ψ(s) be the cumulant generating function (c.g.f.) and let γ be the large deviation rate of X_1 (its defining display is omitted in the source). Let S_n = X_1 + ... + X_n. Under some mild conditions on ψ, Richter (Theory Prob. Appl. (1957) 2, 206-219) showed that the probability density function f_n of S_n/√n has an asymptotic expression (display omitted in the source) whenever x_n = o(√n) and √n x_n > 1. In this dissertation we obtain similar large deviation local limit theorems for arbitrary sequences of random variables, not necessarily sums of i.i.d. random variables, thereby increasing the applicability of Richter's theorem. Let {T_n, n ≥ 1} be an arbitrary sequence of non-lattice random variables with characteristic function (c.f.) φ_n. Let ψ_n and γ_n be the c.g.f. and the large deviation rate of T_n/n. The main theorem in Chapter II shows that, under some standard conditions on ψ_n which imply that T_n/n converges to a constant in probability, the density function K_n of T_n/n has an asymptotic expression (display omitted in the source), where m_n is any sequence of real numbers and τ_n is defined by ψ_n'(τ_n) = m_n. When T_n is the sum of n i.i.d. random variables, our result reduces to Richter's theorem. Similar theorems for lattice-valued random variables are also presented, which are useful in obtaining asymptotic probabilities for the Wilcoxon signed-rank test statistic and Kendall's tau. In Chapter III we use the results of Chapter II to obtain a central limit theorem for sums of a triangular array of dependent random variables X_j^(n), j = 1, ..., n, with joint distribution given by z_n^{-1} exp{H_n(x_1, ..., x_n)} Π dP(x_j), where x_i ∈ R for all i ≥ 1. The function H_n(x_1, ..., x_n) is known as the Hamiltonian, and P is a probability measure on R. When H_n(x_1, ..., x_n) = log φ_n(s_n/n), where s_n = x_1 + ... + x_n, and the probability measure P satisfies appropriate conditions, we show that there exist an integer r ≥ 1 and a sequence τ_n such that (S_n − nτ_n)/n^(1 − 1/(2r)) has a limiting distribution which is non-Gaussian if r ≥ 2. This result generalizes the theorems of Jong-Woo Jeon (Ph.D. Thesis, Dept. of Statistics, F.S.U. (1979)) and Ellis and Newman (Z. Wahrscheinlichkeitstheorie und Verw. Gebiete (1978) 44, 117-139). Chapters IV and V extend the above to the multivariate case.
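As a hedged numerical companion to the classical i.i.d. setting generalized above, the sketch below evaluates the saddlepoint (exponentially tilted) density approximation for the mean of n Exp(1) variables and compares it with the exact Gamma density; this is illustrative of large-deviation local limit behavior and is not an implementation of the dissertation's theorems.

```python
import numpy as np
from scipy.stats import gamma

n = 25

def saddlepoint_density(x):
    """Saddlepoint approximation to the density of the mean of n Exp(1) variables."""
    s_hat = 1.0 - 1.0 / x                 # solves psi'(s) = 1/(1 - s) = x
    psi = -np.log(1.0 - s_hat)            # c.g.f. of Exp(1) at the saddlepoint
    psi2 = 1.0 / (1.0 - s_hat) ** 2       # psi''(s_hat)
    return np.sqrt(n / (2 * np.pi * psi2)) * np.exp(n * (psi - s_hat * x))

xs = np.array([0.6, 0.8, 1.0, 1.4, 2.0])  # includes points well into the tails
exact = gamma.pdf(xs, a=n, scale=1.0 / n) # exact density of the mean of n Exp(1) variables
print(np.round(saddlepoint_density(xs), 4))
print(np.round(exact, 4))
```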
 Date Issued
 1982
 Identifier
 AAI8225279, 3085419, FSDT3085419, fsu:74914
 Format
 Document (PDF)
 Title
 A Comparison of Estimators in Hierarchical Linear Modeling: Restricted Maximum Likelihood versus Bootstrap via Minimum Norm Quadratic Unbiased Estimators.
 Creator

Delpish, Ayesha Nneka, Niu, XuFeng, Tate, Richard L., Huffer, Fred W., Zahn, Douglas, Department of Statistics, Florida State University
 Abstract/Description

The purpose of the study was to investigate the relative performance of two estimation procedures, restricted maximum likelihood (REML) and the bootstrap via MINQUE, for a two-level hierarchical linear model under a variety of conditions. Specific focus lay on observing whether the bootstrap via MINQUE procedure offered improved accuracy in the estimation of the model parameters and their standard errors in situations where normality may not be guaranteed. Through Monte Carlo simulations, the importance of this assumption for the accuracy of multilevel parameter estimates and their standard errors was assessed using the accuracy index of relative bias and by observing the coverage percentages of 95% confidence intervals constructed for both estimation procedures. The study systematically varied the number of groups at level 2 (30 versus 100), the size of the intraclass correlation (0.01 versus 0.20) and the distribution of the observations (normal versus chi-squared with 1 degree of freedom). The number-of-groups and intraclass-correlation factors produced effects consistent with those previously reported: as the number of groups increased, the bias in the parameter estimates decreased, with a more pronounced effect observed for the estimates obtained via REML. High levels of the intraclass correlation also led to a decrease in the efficiency of parameter estimation under both methods. Study results show that while both the restricted maximum likelihood and the bootstrap via MINQUE estimates of the fixed effects were accurate, the efficiency of the estimates was affected by the distribution of errors, with the bootstrap via MINQUE procedure outperforming REML. Both procedures produced less efficient estimators under the chi-squared distribution, particularly for the variance-covariance component estimates.
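A hedged sketch of the REML arm of the comparison: simulate a two-level random-intercept model with a chosen intraclass correlation and fit it by restricted maximum likelihood with statsmodels; the bootstrap-via-MINQUE competitor and the chi-squared error condition are not reproduced here, and all settings are illustrative.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(9)
groups, per_group = 30, 20
icc = 0.20
tau2 = icc / (1 - icc)                                   # between-group variance when sigma^2 = 1

g = np.repeat(np.arange(groups), per_group)
u = rng.normal(0.0, np.sqrt(tau2), size=groups)          # random intercepts
x = rng.normal(size=groups * per_group)
y = 1.0 + 0.5 * x + u[g] + rng.normal(size=groups * per_group)

df = pd.DataFrame({"y": y, "x": x, "g": g})
fit = sm.MixedLM.from_formula("y ~ x", groups="g", data=df).fit(reml=True)
print(fit.fe_params)                                     # fixed effects near (1.0, 0.5)
print(float(fit.cov_re.iloc[0, 0]))                      # estimated between-group variance
```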
 Date Issued
 2006
 Identifier
 FSU_migr_etd0771
 Format
 Thesis
 Title
 Estimation from Data Representing a Sample of Curves.
 Creator

Auguste, Anna L., Bunea, Florentina, Mason, Patrick, Hollander, Myles, Huffer, Fred, Department of Statistics, Florida State University
 Abstract/Description

This dissertation introduces and assesses an algorithm to generate confidence bands for a regression function or a main effect when multiple data sets are available. In particular, it proposes to construct confidence bands for different trajectories and then aggregate these to produce an overall confidence band for a mean function. An estimator of the regression function or main effect is also examined. First, nonparametric estimators and confidence bands are formed on each data set separately. Then each data set is in turn treated as a testing set for aggregating the preliminary results from the remaining data sets. The criterion used for this aggregation is either the least squares (LS) criterion or a BIC-type penalized LS criterion. The proposed estimator is the average over data sets of these aggregates. It is thus a weighted sum of the preliminary estimators. The proposed confidence band is the minimum L1 band of all the M aggregate bands when we only have a main effect. In the case where there is some random effect, we suggest an adjustment to the confidence band; the proposed confidence band is then the minimum L1 band of all the M adjusted aggregate bands. Desirable asymptotic properties are shown to hold. A simulation study examines the performance of each technique relative to several alternate methods and theoretical benchmarks. An application to seismic data is conducted.
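A rough sketch, under simplifying assumptions, of the aggregation idea described above: each of M data sets yields a preliminary estimate of the same regression function, each data set in turn serves as a testing set for choosing least-squares aggregation weights for the other estimates, and the final estimator averages the M aggregates. The cubic-polynomial preliminary estimators and the simulated data are illustrative stand-ins, and no confidence bands are constructed.

```python
import numpy as np

rng = np.random.default_rng(11)
M, n = 5, 100
f = lambda x: np.sin(2 * np.pi * x)                      # common regression function

xs = [np.sort(rng.random(n)) for _ in range(M)]
ys = [f(x) + 0.3 * rng.standard_normal(n) for x in xs]
fits = [np.polynomial.Polynomial.fit(x, y, deg=3) for x, y in zip(xs, ys)]

aggregates = []
for m in range(M):                                       # data set m plays the testing role
    others = [k for k in range(M) if k != m]
    X = np.column_stack([fits[k](xs[m]) for k in others])
    w, *_ = np.linalg.lstsq(X, ys[m], rcond=None)        # LS aggregation weights
    aggregates.append((others, w))

def estimator(x):
    vals = [sum(wk * fits[k](x) for k, wk in zip(others, w))
            for others, w in aggregates]
    return np.mean(vals, axis=0)                         # average of the M aggregates

grid = np.linspace(0.05, 0.95, 5)
print(np.round(estimator(grid), 3))
print(np.round(f(grid), 3))
```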
 Date Issued
 2006
 Identifier
 FSU_migr_etd0286
 Format
 Thesis
 Title
 Modelling experimental data analysis.
 Creator

Ford, Charles Wesley, Jr., Florida State University
 Abstract/Description

An important goal of research in scientific databases is to provide a capability for managing the acquisition and analysis of data and to assist in the development and evaluation of data analysis functions. This dissertation is an important step toward this goal and represents an extension and application of object-oriented database technology. It uses the object-oriented approach to provide a model which describes a very general strategy for data analysis. This model is used to precisely define data acquisition and data analysis in a style which is well suited to management by a database system. The model has been implemented and evaluated in the context of a very complex experimental physics project. The implementation is described in detail with examples of the database schema and operations using the C++ binding of ODMG-93. As a result of the careful application of object-oriented methods, the management of data acquisition and analysis has become feasible for domain scientists. The application of the methods described herein to specific problems in experimental physics has resulted in a database which will be used to manage all of the data acquisition and data analysis for the Large Acceptance Spectrometer (CLAS) at the Continuous Electron Beam Accelerator Facility (CEBAF), a U.S. Department of Energy project.
 Date Issued
 1995
 Identifier
 AAI9529599, 3088659, FSDT3088659, fsu:77461
 Format
 Document (PDF)
 Title
 On a general repair model for repairable systems.
 Creator

Dorado, Crisanto Ayap., Florida State University
 Abstract/Description

The minimal repair process assumes that upon repair a system is restored to its functioning condition just before failure. For systems with few vulnerable components it is more reasonable to assume that repair actually brings the state of the system to a level that is between "completely new" and "prior to failure". Kijima (1989) introduced models for such a repair process based on the notion of age reduction. Under age reduction, the system, upon repair, is functionally the same as an identical system of lesser age. An alternative to age reduction is the notion of extra life. Under this notion, the system, upon repair, enjoys a longer expected remaining life than it would have had under a minimal repair. In this dissertation, we introduce a repair model that generalizes Kijima's models so as to include both the notions of age reduction and extra life. We then look at the problem of estimating system reliability based on observations of the repair process from several systems working independently. We make use of counting processes and martingales to derive large-sample properties of the estimator.
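A hedged simulation of a Kijima type I (age reduction) repair process, one of the models the abstract generalizes: after each failure, repair resets the system's virtual age to a fraction a of its value, so a = 1 corresponds to minimal repair and a = 0 to perfect repair. The Weibull lifetime with increasing hazard and all parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(10)
shape, scale = 2.0, 1.0                                  # Weibull(2, 1): increasing hazard

def next_gap(v):
    """Sample the next inter-failure time given current virtual age v."""
    u = rng.random()
    # Solve F_bar(v + x) / F_bar(v) = u for x under the Weibull survival function.
    return (v ** shape - np.log(u) * scale ** shape) ** (1.0 / shape) - v

def simulate_failures(a, horizon=10.0):
    t, v, times = 0.0, 0.0, []
    while True:
        gap = next_gap(v)
        t += gap
        if t > horizon:
            return np.array(times)
        times.append(t)
        v = a * (v + gap)                                # Kijima type I age reduction at repair

for a in (0.0, 0.5, 1.0):
    counts = [len(simulate_failures(a)) for _ in range(300)]
    print(a, round(float(np.mean(counts)), 2))           # more failures as a -> 1 (weaker repair)
```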
 Date Issued
 1995
 Identifier
 AAI9540050, 3088702, FSDT3088702, fsu:77504
 Format
 Document (PDF)
 Title
 PART 1 - THE LIMITING DISTRIBUTION OF THE LIKELIHOOD RATIO STATISTIC -2 LOG(LAMBDA(N)) UNDER A CLASS OF LOCAL ALTERNATIVES. PART 2 - MINIMUM AVERAGE RISK DECISION PROCEDURES FOR THE NONCENTRAL CHI-SQUARE DISTRIBUTION.
 Creator

LEVER, WILLIAM EDWIN., The Florida State University
 Date Issued
 1968
 Identifier
 AAI6811680, 2985783, FSDT2985783, fsu:70292
 Format
 Document (PDF)