 Prediction and Testing for Non-Parametric Random Function Signals in a Complex System.
 Creator

Hill, Paul C., Chicken, Eric, Klassen, Eric, Niu, Xufeng, Barbu, Adrian, Department of Statistics, Florida State University
 Abstract/Description

Methods for constructing prediction bands for continuous curves require a different approach from those used for a single data point. In many cases the underlying function is unknown, so the signal analysis calls for a distribution-free approach that preserves sufficient coverage for the entire signal. This paper discusses three methods for forming (1-α)100% bootstrap prediction bands and compares their performance through the coverage probabilities obtained for each technique. Bootstrap samples are first obtained for the signal, and then three different criteria are provided for the removal of α100% of the curves, resulting in the (1-α)100% prediction band. The first method uses the L1 distance between the upper and lower curves as a gauge to extract the widest bands in the dataset of signals. Also investigated are extractions using the Hausdorff distance between the bounds, as well as an adaptation of the bootstrap intervals discussed in Lenhoff et al. (1999). The bootstrap prediction bands each have good coverage probabilities for the continuous signals in the dataset. For a 95% prediction band, the coverages obtained were 90.59%, 93.72% and 95% for the L1 distance, Hausdorff distance and adjusted bootstrap methods, respectively. The methods discussed in this paper have been applied successfully to constructing prediction bands for spring discharge, giving good coverage in each case. Spring discharge measured over time can be considered a continuous signal, and the ability to predict future signals of spring discharge is useful for monitoring flow and other issues related to the spring. While rainfall has in some cases been fitted with the gamma distribution, the discharge of the spring, represented as continuous curves, is better approached without assuming any specific distribution. The bootstrap aspect occurs not in sampling the output discharge curves but rather in simulating the input recharge that enters the spring. 
Bootstrapping the rainfall as described in this paper allows new samples to be created over different periods of time as well as for specific rain events such as hurricanes or drought. The bootstrap prediction methods put forth in this paper provide an approach that supplies adequate coverage for prediction bands for signals represented as continuous curves. The pathway outlined by the flow of the discharge through the springshed is described as a tree. A nonparametric pairwise test, motivated by the idea of K-means clustering, is proposed to determine whether two trees are equal in terms of their discharges. A large-sample approximation is devised for this lower-tail significance test, and test statistics for different numbers of input signals are compared to a generated table of critical values.
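The abstract does not spell out the removal rule, so the following is only one possible reading of the L1-distance criterion, sketched under stated assumptions: curves are sampled on a common grid, the band is the pointwise envelope of the retained curves, and curves are dropped greedily, each time removing the one whose removal shrinks the L1 width of the envelope the most. The function names (`envelope`, `l1_width`, `prediction_band`) are hypothetical and not taken from the thesis.

```python
def envelope(curves):
    """Pointwise lower and upper bounds of a set of curves on a common grid."""
    lower = [min(vals) for vals in zip(*curves)]
    upper = [max(vals) for vals in zip(*curves)]
    return lower, upper


def l1_width(curves):
    """L1 distance between the upper and lower envelope (unit grid spacing assumed)."""
    lower, upper = envelope(curves)
    return sum(u - l for l, u in zip(lower, upper))


def prediction_band(curves, alpha):
    """Greedy sketch: drop a fraction alpha of the curves, then return the envelope.

    Assumption: at each step the curve whose removal narrows the envelope
    the most is discarded. This is an illustrative rule, not the thesis's.
    """
    kept = list(curves)
    n_remove = int(alpha * len(curves))
    for _ in range(n_remove):
        best = min(
            range(len(kept)),
            key=lambda i: l1_width(kept[:i] + kept[i + 1:]),
        )
        kept.pop(best)
    return envelope(kept)


# Five toy "bootstrap" curves; the last one is an outlier.
curves = [[0, 0, 0], [1, 1, 1], [2, 2, 2], [3, 3, 3], [10, 10, 10]]
lower, upper = prediction_band(curves, alpha=0.2)
```

With five curves and alpha = 0.2, a single curve is removed. The greedy search is quadratic in the number of curves, which is acceptable for a sketch but would need a smarter rule for large bootstrap samples.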
 Date Issued
 2012
 Identifier
 FSU_migr_etd4910
 Format
 Thesis
 Title
 Estimation and Sequential Monitoring of Nonlinear Functional Responses Using Wavelet Shrinkage.
 Creator

Cuevas, Jordan, Chicken, Eric, Sobanjo, John, Niu, Xufeng, Wu, Wei, Department of Statistics, Florida State University
 Abstract/Description

Statistical process control (SPC) is widely used in industrial settings to monitor processes for shifts in their distributions. SPC is generally thought of in two distinct phases: Phase I, in which historical data is analyzed in order to establish an in-control process, and Phase II, in which new data is monitored for deviations from the in-control form. Traditionally, SPC has been used to monitor univariate (multivariate) processes for changes in a particular parameter (parameter vector). Recently, however, technological advances have resulted in processes in which each observation is actually an n-dimensional functional response (referred to as a profile), where n can be quite large. Additionally, these profiles often cannot be adequately represented parametrically, making traditional SPC techniques inapplicable. This dissertation starts by addressing the problem of nonparametric function estimation, which would be used to analyze process data in a Phase I setting. The translation-invariant wavelet estimator (TI) is often used to estimate irregular functions, despite the drawback that it tends to oversmooth jumps. A trimmed translation-invariant estimator (TTI) is proposed, of which the TI estimator is a special case. By reducing the point-by-point variability of the TI estimator, TTI is shown to retain the desirable qualities of TI while improving reconstructions of functions with jumps. Attention is then turned to the Phase II problem of monitoring sequences of profiles for deviations from in-control. Two profile monitoring schemes are proposed. The first monitors for changes in the noise variance using a likelihood ratio test based on the highest detail level of wavelet coefficients of the observed profile. The second offers a semiparametric test to monitor for changes in both the functional form and noise variance. Both methods make use of wavelet shrinkage in order to distinguish relevant functional information from noise contamination. 
Different forms of each of these test statistics are proposed, and results are compared via Monte Carlo simulation.
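To make "wavelet shrinkage" concrete, here is a minimal sketch of the classical recipe, not the TI or TTI estimators proposed in the dissertation. It assumes a Haar wavelet, a signal whose length is a power of two, a noise scale estimated from the finest detail level by the median absolute deviation, and the universal soft threshold sigma * sqrt(2 log n). All function names are hypothetical.

```python
import math
import statistics


def haar_forward(signal):
    """Orthonormal Haar transform; len(signal) must be a power of two.

    Returns (approximation, details) with details ordered coarsest to finest.
    """
    approx = list(signal)
    details = []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.append([(a - b) / math.sqrt(2) for a, b in pairs])
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
    details.reverse()  # the first level computed is the finest; put it last
    return approx, details


def haar_inverse(approx, details):
    """Invert haar_forward, rebuilding from the coarsest level to the finest."""
    approx = list(approx)
    for level in details:
        rebuilt = []
        for s, d in zip(approx, level):
            rebuilt.append((s + d) / math.sqrt(2))
            rebuilt.append((s - d) / math.sqrt(2))
        approx = rebuilt
    return approx


def soft(x, t):
    """Soft thresholding: shrink x toward zero by t."""
    return math.copysign(max(abs(x) - t, 0.0), x)


def denoise(signal):
    """Shrink all detail coefficients with the universal threshold."""
    approx, details = haar_forward(signal)
    finest = [abs(d) for d in details[-1]]
    sigma = statistics.median(finest) / 0.6745  # MAD-based noise scale
    t = sigma * math.sqrt(2 * math.log(len(signal)))
    shrunk = [[soft(d, t) for d in level] for level in details]
    return haar_inverse(approx, shrunk)
```

The noise scale is taken from the finest detail level because those coefficients are typically dominated by noise, which is also why the first monitoring scheme in the abstract works at that level.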
 Date Issued
 2012
 Identifier
 FSU_migr_etd4788
 Format
 Thesis
 Title
 Weighted Adaptive Methods for Multivariate Response Models with an HIV/Neurocognitive Application.
 Creator

Geis, Jennifer Ann, She, Yiyuan, Meyer-Baese, Anke, Barbu, Adrian, Bunea, Florentina, Niu, Xufeng, Department of Statistics, Florida State University
 Abstract/Description

Multivariate response models are being used increasingly in almost all fields, with the necessary employment of inferential methods such as Canonical Correlation Analysis (CCA). This requires estimating the number of uncorrelated canonical relationships between the two sets or, equivalently, determining the rank of the coefficient estimator in the multivariate response model. One way to do this is the Rank Selection Criterion (RSC) of Bunea et al., which assumes that the error matrix has independent, constant-variance entries. While this assumption is necessary to show their strong theoretical results, some flexibility is required in practical application; that is, such an assumption cannot always be safely made. What is developed here is theory that parallels Bunea et al.'s work with the addition of a "decorrelator" weight matrix. One choice for the weight matrix is the residual covariance, but this introduces many issues in practice. A computationally more convenient weight matrix is the sample response covariance. When such a weight matrix is chosen, CCA is directly accessible through this weighted version of RSC, giving rise to an Adaptive CCA (ACCA) with principal proofs for the large-sample setting. However, particular considerations are required for the high-dimensional setting, where similar theory does not hold. What is offered instead are extensive empirical simulations revealing that the sample response covariance still provides good rank recovery and estimation of the coefficient matrix, and hence also good estimation of the number of canonical relationships and variates. It is argued precisely why other versions of the residual covariance, including a regularized version, are poor choices in the high-dimensional setting. Another approach to avoiding these issues is to employ some type of variable selection methodology before applying ACCA. 
Indeed, any group selection method may be applied prior to ACCA, as variable selection in the multivariate response model is the same as group selection in the univariate response model and thus completely eliminates these high-dimensional concerns. To offer a practical application of these ideas, ACCA is applied to a "large sample" neurocognitive dataset. Then a high-dimensional dataset is generated, to which Group LASSO is applied before ACCA. This provides a unique perspective on the relationships between cognitive deficiencies in HIV-positive patients and the extensive available neuroimaging measures.
 Date Issued
 2012
 Identifier
 FSU_migr_etd4861
 Format
 Thesis
 Title
 Nonparametric Wavelet Thresholding and Profile Monitoring for Non-Gaussian Errors.
 Creator

McGinnity, Kelly, Chicken, Eric, Hoeflich, Peter, Niu, Xufeng, Zhang, Jinfeng, Department of Statistics, Florida State University
 Abstract/Description

Recent advancements in data collection allow scientists and researchers to obtain massive amounts of information in short periods of time. Often this data is functional and quite complex. Wavelet transforms are popular, particularly in the engineering and manufacturing fields, for handling these types of complicated signals. A common application of wavelets is in statistical process control (SPC), in which one tries to determine as quickly as possible if and when a sequence of profiles has gone out of control. However, few wavelet methods have been proposed that do not rely in some capacity on the assumption that the observational errors are normally distributed. This dissertation aims to fill this void by proposing a simple, nonparametric, distribution-free method of monitoring profiles and estimating change-points. Using only the magnitudes and location maps of thresholded wavelet coefficients, our method uses the spatial adaptivity property of wavelets to accurately detect profile changes when the signal is obscured by a variety of non-Gaussian errors. Wavelets are also widely used for the purpose of dimension reduction. Applying a thresholding rule to a set of wavelet coefficients results in a "denoised" version of the original function. Once again, existing thresholding procedures generally assume independent, identically distributed normal errors. Thus, the second main focus of this dissertation is a nonparametric method of thresholding that does not assume Gaussian errors, or even that the form of the error distribution is known. We improve upon an existing even-odd cross-validation method by employing block thresholding and level dependence, and show that the proposed method works well on both skewed and heavy-tailed distributions. Such thresholding techniques are essential to the SPC procedure developed above.
 Date Issued
 2013
 Identifier
 FSU_migr_etd7502
 Format
 Thesis
 Title
 The Frequentist Performance of Some Bayesian Confidence Intervals for the Survival Function.
 Creator

Tao, Yingfeng, Huffer, Fred, Okten, Giray, Sinha, Debajyoti, Niu, Xufeng, Department of Statistics, Florida State University
 Abstract/Description

Estimation of a survival function is a very important topic in survival analysis, with contributions from many authors. This dissertation considers estimation of confidence intervals for the survival function based on right-censored or interval-censored survival data. Most of the methods for estimating pointwise confidence intervals and simultaneous confidence bands of the survival function are reviewed in this dissertation. In the right-censored case, almost all confidence intervals are based in some way on the Kaplan-Meier estimator, first proposed by Kaplan and Meier (1958) and widely used as the nonparametric estimator in the presence of right-censored data. For interval-censored data, the Turnbull estimator (Turnbull (1974)) plays a similar role. For a class of Bayesian models involving Dirichlet priors, Doss and Huffer (2003) suggested several simulation techniques to approximate the posterior distribution of the survival function by using Markov chain Monte Carlo or sequential importance sampling. These techniques lead to probability intervals for the survival function (at arbitrary time points) and its quantiles for both the right-censored and interval-censored cases. This dissertation will examine the frequentist properties and general performance of these probability intervals when the prior is noninformative. Simulation studies will be used to compare these probability intervals with other published approaches. Extensions of the Doss-Huffer approach are given for constructing simultaneous confidence bands for the survival function and for computing approximate confidence intervals for the survival function based on Edgeworth expansions using posterior moments. The performance of these extensions is studied by simulation.
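For readers unfamiliar with the Kaplan-Meier estimator mentioned above, the sketch below computes the standard product-limit estimate, multiplying the factors (1 - d_i / n_i) over the distinct event times, for right-censored data. It illustrates the point estimate only, not the Bayesian intervals studied in the dissertation; the function name and input layout are assumptions made for this example.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times  : observed times (event or censoring time)
    events : 1 if the event was observed, 0 if right-censored
    Returns a list of (t, S(t)) pairs at each distinct event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Gather every observation tied at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1 - deaths / n_at_risk
            curve.append((t, surv))
        # Events and censored subjects both leave the risk set after t.
        n_at_risk -= removed
    return curve


# Toy data: subjects 2 and 5 are right-censored.
curve = kaplan_meier([1, 2, 3, 4, 5], [1, 0, 1, 1, 0])
```

Censored subjects contribute to the risk set up to their censoring time but never trigger a drop in the curve, which is what distinguishes this estimate from the empirical survival function.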
 Date Issued
 2013
 Identifier
 FSU_migr_etd7624
 Format
 Thesis
 Title
 Meta Analysis and Meta Regression of a Measure of Discrimination Used in Prognostic Modeling.
 Creator

Rivera, Gretchen L., McGee, Daniel, Hurt, Myra, Niu, Xufeng, Sinha, Debajyoti, Department of Statistics, Florida State University
 Abstract/Description

In this paper we are interested in predicting death with the underlying cause of coronary heart disease (CHD). There are two prognostic modeling methods used to predict CHD: the logistic model and the proportional hazards model. For this paper we consider the logistic model. The dataset used is the Diverse Populations Collaboration (DPC) dataset, which includes 28 studies. The DPC dataset has epidemiological results from investigations conducted in different populations around the world. For our analysis we include those individuals who are 17 years old or older. The predictors are age, diabetes, total serum cholesterol (mg/dl), high-density lipoprotein (mg/dl), systolic blood pressure (mmHg) and whether the participant is a current cigarette smoker. There is a natural grouping within the studies, such as gender, rural or urban area, and race. Based on these strata we have 84 cohort groups. Our main interest is to evaluate how well the prognostic model discriminates. For this, we used the area under the Receiver Operating Characteristic (ROC) curve. The main idea of the ROC curve is that each of a set of subjects is known to belong to one of two classes (signal or noise group). An assignment procedure then assigns each subject to a class on the basis of the information observed. The assignment procedure is not perfect: sometimes a subject is misclassified. To evaluate the quality of this procedure we used the area under the ROC curve (AUROC). The AUROC varies from 0.5 (no apparent accuracy) to 1.0 (perfect accuracy). For each logistic model we found the AUROC and its standard error (SE). We used meta-analysis to summarize the estimated AUROCs and to evaluate whether there is heterogeneity in our estimates. To evaluate the existence of significant heterogeneity we used the Q statistic. Since heterogeneity was found in our study, we compare seven different methods for estimating τ² (the between-study variance). 
We conclude by examining whether differences in study characteristics explained the heterogeneity in the values of the AUROC.
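As an illustration of the heterogeneity step, the sketch below computes Cochran's Q from a set of estimates and their standard errors using inverse-variance weights, then the DerSimonian-Laird moment estimate of τ². The abstract does not name the seven τ² estimators it compares, so treating DerSimonian-Laird as one of them is an assumption, and the function name and the example numbers are made up for the illustration.

```python
def cochran_q_and_dl_tau2(estimates, std_errors):
    """Return (pooled estimate, Cochran's Q, DerSimonian-Laird tau^2).

    Uses fixed-effect inverse-variance weights w_i = 1 / SE_i^2.
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    sum_w = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum_w
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, estimates))
    k = len(estimates)
    # Scaling constant from the expectation of Q under the random-effects model.
    c = sum_w - sum(w * w for w in weights) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c)  # truncated at zero
    return pooled, q, tau2


# Hypothetical AUROC estimates from two cohorts with equal standard errors.
pooled, q, tau2 = cochran_q_and_dl_tau2([0.60, 0.90], [0.10, 0.10])
```

Under homogeneity Q has expectation k - 1, so the estimator is positive only when the observed spread exceeds what the within-study standard errors alone would explain.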
 Date Issued
 2013
 Identifier
 FSU_migr_etd7580
 Format
 Thesis
 Title
 Mixed-Effects Models for Count Data with Applications to Educational Research.
 Creator

Shin, Jihyung, Niu, Xufeng, Hu, Shouping, Al Otaiba, Stephanie Dent, McGee, Daniel, Wu, Wei, Department of Statistics, Florida State University
 Abstract/Description

This research is motivated by an analysis of reading research data. We are interested in modeling the test outcome measuring the ability of kindergarten children aged between 5 and 7 to fluently recode letters into sounds. The data showed excessive zero scores (more than 30% of children) on the test. In this dissertation, we carefully examine models dealing with excessive zeros, which are based on a mixture of distributions: a distribution with zeros and a standard probability distribution with non-negative values. In such cases, a log-normal variable or a Poisson random variable is often observed with probability from semicontinuous data or count data. The previously proposed models, mixed-effects and mixed-distribution models (MEMD) by Tooze et al. (2002) for semicontinuous data and zero-inflated Poisson (ZIP) regression models by Lambert (1992) for count data, are reviewed. We apply zero-inflated Poisson models to repeated-measures zero-inflated data by introducing a pair of possibly correlated random effects into the zero-inflated Poisson model to accommodate within-subject correlation and between-subject heterogeneity. The model describes the effect of predictor variables on the probability of nonzero responses (occurrence) and the mean of nonzero responses (intensity) separately. The likelihood function, approximated by adaptive Gaussian quadrature, is maximized using dual quasi-Newton optimization. The maximum likelihood estimates are obtained through a standard statistical software package. A simulation study is conducted using different model parameters, numbers of subjects, and numbers of measurements per subject, and the results are presented. The dissertation ends with the application of the model to reading research data and a discussion of future research. We examine the number of correct letter sounds counted for children, collected over the 2008-2009 academic year. 
We find that age, gender and socioeconomic status are significantly related to the letter sound fluency of children in both parts of the model. The model provides a better explanation of the data structure and easier interpretation of parameter values, as they are the same as in standard logistic models and Poisson regression models. The model can be extended to accommodate serial correlation, which can be observed in longitudinal data. One may also consider a multilevel zero-inflated Poisson model. Although the multilevel model was proposed previously, parameter estimation by penalized quasi-likelihood methods is questionable, and further examination is needed.
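The zero-inflated Poisson distribution underlying the model can be sketched as a two-component mixture: with probability `pi_zero` the response is a structural zero, and otherwise it is drawn from a Poisson distribution with mean `lam`. The sketch below covers only this basic probability mass function, without the covariates or correlated random effects described in the abstract; the parameter names are assumptions chosen for the example.

```python
import math


def zip_pmf(k, pi_zero, lam):
    """P(Y = k) for a zero-inflated Poisson variable.

    pi_zero : probability of a structural (excess) zero
    lam     : mean of the Poisson component
    """
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    if k == 0:
        # Zeros come from both the structural part and the Poisson part.
        return pi_zero + (1 - pi_zero) * poisson
    return (1 - pi_zero) * poisson


# Probability of a zero score when 30% of zeros are structural and lam = 2.
p_zero = zip_pmf(0, pi_zero=0.3, lam=2.0)
```

The mean of this distribution is (1 - pi_zero) * lam, so excess zeros pull the mean below that of the plain Poisson component. In a regression setting `pi_zero` and `lam` would each depend on predictors through a logit and a log link, respectively.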
 Date Issued
 2012
 Identifier
 FSU_migr_etd5181
 Format
 Thesis
 Title
 A Novel Riemannian Metric for Analyzing Spherical Functions with Applications to HARDI Data.
 Creator

Ncube, Sentibaleng, Srivastava, Anuj, Klassen, Eric, Wu, Wei, Niu, Xufeng, Department of Statistics, Florida State University
 Abstract/Description

We propose a novel Riemannian framework for analyzing orientation distribution functions (ODFs), or their probability density functions (PDFs), in HARDI data sets for use in comparing, interpolating, averaging, and denoising PDFs. This is accomplished by separating shape and orientation features of PDFs, and then analyzing them separately under their own Riemannian metrics. We formulate the action of the rotation group on the space of PDFs, and define the shape space as the quotient space of PDFs modulo the rotations. In other words, any two PDFs are compared in (1) shape, by rotationally aligning one PDF to the other and using the Fisher-Rao distance on the aligned PDFs, and (2) orientation, by comparing their rotation matrices. This idea improves upon the results from using the Fisher-Rao metric to analyze PDFs directly, a technique that is being used increasingly, and leads to geodesic interpolations that are biologically feasible. This framework leads to definitions and efficient computations for the Karcher mean that provide tools for improved interpolation and denoising. We demonstrate these ideas using an experimental setup involving several PDFs.
 Date Issued
 2011
 Identifier
 FSU_migr_etd5064
 Format
 Thesis
 Title
 A Sender-Centric Approach to Spam and Phishing Control.
 Creator

Sanchez, Fernando X. (Fernando Xavier), Duan, Zhenhai, Niu, Xufeng, Yuan, Xin, Aggarwal, Sudhir, Department of Scientific Computing, Florida State University
 Abstract/Description

The Internet email system, a popular online communication tool, has been increasingly misused by ill-willed users to carry out malicious activities including spamming and phishing. Alarmingly, in recent years the nature of email-based malicious activities has evolved from being purely annoying (with the notorious example of spamming) to being criminal (with the notorious example of phishing). Despite more than a decade of anti-spam and anti-phishing research and development efforts, both the sophistication and volume of spam and phishing messages on the Internet have continuously been on the rise over the years. A key difficulty in the control of email-based malicious activities is that malicious actors have great operational flexibility in performing them, in terms of both the email delivery infrastructure and email content; moreover, existing anti-spam and anti-phishing measures allow for an arms race between malicious actors and the anti-spam and anti-phishing community. In order to effectively control email-based malicious activities such as spamming and phishing, we argue that we must limit (and ideally, eliminate) the operational flexibility that malicious actors have enjoyed over the years. In this dissertation we develop and evaluate a sender-centric approach (SCA) to addressing the problem of email-based malicious activities so as to control spam and phishing emails on the Internet. SCA consists of three complementary components, which together greatly limit the operational flexibility of malicious actors in sending spam and phishing emails. The first two components of SCA focus on limiting the infrastructural flexibility of malicious actors in delivering emails, and the last component focuses on limiting the flexibility of malicious actors in manipulating the content of emails. 
In the first component of SCA, we develop a machine-learning-based system to prevent malicious actors from utilizing compromised machines to send spam and phishing emails. Given that the vast majority of spam and phishing emails are delivered via compromised machines on the Internet today, this system can greatly limit the infrastructural flexibility of malicious actors. Ideally, malicious actors should be forced to send spam and phishing messages from their own machines so that blacklists and reputation-based systems can be effectively used to block spam and phishing emails. The machine-learning-based system we develop in this dissertation is a critical step towards this goal. In recent years, malicious actors have also started to employ advanced techniques to hijack network prefixes in conducting email-based malicious activities, which makes the control and attribution of spam and phishing emails even harder. In the second component of SCA, we develop a practical approach to improving the security of the Internet inter-domain routing protocol BGP. Given that the key difficulties in adopting any mechanism to secure Internet inter-domain routing are the overhead and incremental deployment property of the mechanism, our scheme is designed to have minimum overhead, and it can be incrementally deployed by individual networks on the Internet to protect themselves (and their customer networks), so that individual networks have incentives to deploy the scheme. In addition to the infrastructural flexibility in delivering spam and phishing emails, malicious actors have enormous flexibility in manipulating the format and content of email messages. In particular, malicious actors can forge phishing messages that closely resemble legitimate messages in terms of both format and content. Although malicious actors have immense power in manipulating the format and content of phishing emails, they cannot completely hide how a message is delivered to the recipients. 
Based on this observation, in the last component of SCA, we develop a system to identify phishing emails based on sender-related information instead of the format or content of email messages. Together, the three complementary components of SCA will greatly limit the operational flexibility and capability that malicious actors have enjoyed over the years in delivering spam and phishing emails, and we believe that SCA will make a significant contribution towards addressing the spam and phishing problem on the Internet.
 Date Issued
 2011
 Identifier
 FSU_migr_etd5163
 Format
 Thesis
 Title
 Monte Carlo Likelihood Estimation for Conditional Autoregressive Models with Application to Sparse Spatiotemporal Data.
 Creator

Bain, Rommel, Huffer, Fred, Becker, Betsy, Niu, Xufeng, Srivastava, Anuj, Department of Statistics, Florida State University
 Abstract/Description

Spatiotemporal modeling is increasingly used in a diverse array of fields, such as ecology, epidemiology, health care research, transportation, economics, and other areas where data arise from a spatiotemporal process. Spatiotemporal models describe the relationship between observations collected from different spatiotemporal sites. The modeling of spatiotemporal interactions arising from spatiotemporal data is done by incorporating the space-time dependence into the covariance structure. A main goal of spatiotemporal modeling is the estimation and prediction of the underlying process that generates the observations under study and the parameters that govern the process. Furthermore, analysis of the spatiotemporal correlation of variables can be used for estimating values at sites where no measurements exist. In this work, we develop a framework for estimating quantities that are functions of complete spatiotemporal data when the spatiotemporal data is incomplete. We present two classes of conditional autoregressive (CAR) models, the homogeneous CAR (HCAR) model and the weighted CAR (WCAR) model, for the analysis of sparse spatiotemporal data (the log of monthly mean zooplankton biomass) collected on a spatiotemporal lattice by the California Cooperative Oceanic Fisheries Investigations (CalCOFI). These models allow for spatiotemporal dependencies between nearest-neighbor sites on the spatiotemporal lattice. Typically, CAR model likelihood inference is quite complicated because of the intractability of the CAR model's normalizing constant. Sparse spatiotemporal data further complicates likelihood inference. We implement Monte Carlo likelihood (MCL) estimation methods for parameter estimation of our HCAR and WCAR models. Monte Carlo likelihood estimation provides an approximation for intractable likelihood functions. We demonstrate our framework by giving estimates for several different quantities that are functions of the complete CalCOFI time series data.
 Date Issued
 2013
 Identifier
 FSU_migr_etd7283
 Format
 Thesis
 Title
 The Relationship Between Body Mass and Blood Pressure in Diverse Populations.
 Creator

Abayomi, Emilola J., McGee, Daniel, Lackland, Daniel, Hurt, Myra, Chicken, Eric, Niu, Xufeng, Department of Statistics, Florida State University
 Abstract/Description

High blood pressure is a major determinant of risk for Coronary Heart Disease (CHD) and stroke, leading causes of death in the industrialized world. A myriad of pharmacological treatments for elevated blood pressure, defined as a blood pressure greater than 140/90 mmHg, are available and have at least partially resulted in large reductions in the incidence of CHD and stroke in the U.S. over the last 50 years. The factors that may increase blood pressure levels are not well understood, but body mass is thought to be a major determinant of blood pressure level. Obesity is measured through various methods (skinfolds, waist-to-hip ratio, bioelectrical impedance analysis (BIA), etc.), but the most commonly used measure is body mass index, BMI = weight (kg) / height (m)².
 Date Issued
 2012
 Identifier
 FSU_migr_etd5308
 Format
 Thesis
 Title
 3-Manifolds of S^1-Category Three.
 Creator

Wang, Dongxu, Heil, Wolfgang, Niu, Xufeng, Klassen, Eric P., Hironaka, Eriko, Nichols, Warren D., Department of Mathematics, Florida State University
 Abstract/Description

I study 3-manifold theory, which is a fascinating research area in topology. Many new ideas and techniques have been introduced in recent years, which makes it an active and fast-developing subject. It is one of the most fruitful branches of today's mathematics and, with the solution of the Poincaré conjecture, it is getting more attention. This dissertation is motivated by results about categorical properties of 3-manifolds. This can be rephrased as the study of 3-manifolds which can be covered by certain sets satisfying some homotopy properties. A special case is the problem of classifying 3-manifolds that can be covered by three simple S^1-contractible subsets. S^1-contractible subsets are subsets of a 3-manifold M^3 that can be deformed into a circle in M^3. In this thesis, I consider more geometric subsets with this property, namely subsets that are homeomorphic to 3-balls, solid tori and solid Klein bottles. The main result is a classification of all closed 3-manifolds that can be obtained as a union of three solid Klein bottles.
 Date Issued
 2013
 Identifier
 FSU_migr_etd7650
 Format
 Thesis
 Title
 Assessment of Parametric and Model Uncertainty in Groundwater Modeling.
 Creator

Lu, Dan, Ye, Ming, Niu, Xufeng, Beerli, Peter, Curtis, Gary, Navon, Michael, Plewa, Tomasz, Department of Scientific Computing, Florida State University
 Abstract/Description

Groundwater systems are open and complex, rendering them prone to multiple conceptual interpretations and mathematical descriptions. When multiple models are acceptable based on available knowledge and data, model uncertainty arises. One way to assess the model uncertainty is postulating several alternative hydrologic models for a site and using model selection criteria to (1) rank these models, (2) eliminate some of them, and/or (3) weight and average prediction statistics generated by multiple models based on their model probabilities. This multimodel analysis has led to some debate among hydrogeologists about the merits and demerits of common model selection criteria such as AIC, AICc, BIC, and KIC. This dissertation contributes to the discussion by comparing the abilities of the two common Bayesian criteria (BIC and KIC) theoretically and numerically. The comparison results indicate that, using MCMC results as a reference, KIC yields more accurate approximations of model probability than does BIC. Although KIC reduces asymptotically to BIC, KIC provides consistently more reliable indications of model quality for a range of sample sizes. In the multimodel analysis, the model-averaging predictive uncertainty is a weighted average of the predictive uncertainties of individual models, so it is important to properly quantify each individual model's predictive uncertainty. Confidence intervals based on regression theories and credible intervals based on Bayesian theories are conceptually different ways to quantify predictive uncertainties, and both are widely used in groundwater modeling. This dissertation explores their differences and similarities theoretically and numerically. The comparison results indicate that given Gaussian distributed observation errors, for linear or linearized nonlinear models, linear confidence and credible intervals are numerically identical when consistent prior parameter information is used. 
For nonlinear models, nonlinear confidence and credible intervals can be numerically identical if parameter confidence and credible regions based on the approximate likelihood method are used and intrinsic model nonlinearity is small; but they differ in practice due to numerical difficulties in calculating both confidence and credible intervals. Model error is a more vital issue than differences between confidence and credible intervals for individual models, suggesting the importance of considering alternative models. Model calibration results are the basis for the model selection criteria to discriminate between models. However, how to incorporate calibration data errors into the calibration process is an unsettled problem. It has been seen that due to the improper use of the error probability structure in the calibration, the model selection criteria lead to an unrealistic situation in which one model receives overwhelmingly high averaging weight (even 100%), which cannot be justified by available data and knowledge. This dissertation finds that the errors reflected in the calibration should include two parts, measurement errors and model errors. To consider the probability structure of the total errors, I propose an iterative calibration method with two stages of parameter estimation. The multimodel analysis based on the estimation results leads to more reasonable averaging weights and better averaging predictive performance, compared to those obtained when considering only measurement errors. Traditionally, data-worth analyses have relied on a single conceptual-mathematical model with prescribed parameters. Yet this renders model predictions prone to statistical bias and underestimation of uncertainty and thus affects groundwater management decisions. This dissertation proposes a multimodel approach to optimum data-worth analyses that is based on model averaging within a Bayesian framework. 
The developed multimodel Bayesian approach to data-worth analysis works well in a real geostatistical problem. In particular, the selection of the target for additional data collection based on the approach is validated against actual data collected. The last part of the dissertation presents an efficient method of Bayesian uncertainty analysis. While Bayesian analysis is vital to quantify predictive uncertainty in groundwater modeling, its application has been hindered in multimodel uncertainty analysis because of the computational cost of numerous model executions and the difficulty in sampling from the complicated posterior probability density functions of model parameters. This dissertation develops a new method to improve the computational efficiency of Bayesian uncertainty analysis using a sparse-grid method. The developed sparse-grid-based method for Bayesian uncertainty analysis demonstrates superior accuracy and efficiency compared with classic importance sampling and an MCMC sampler when applied to a groundwater flow model.
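The averaging weights mentioned in this abstract can be illustrated with a minimal Python sketch. It assumes the commonly used approximation in which a model's posterior probability is proportional to its prior times exp(-ΔIC/2), where ΔIC is the difference between the model's criterion value (BIC or KIC) and the smallest value among the candidates; the abstract does not state the exact formula the dissertation uses. The function name `ic_model_weights` and the criterion values are hypothetical.

```python
import math


def ic_model_weights(ic_values, priors=None):
    """Turn information-criterion values (e.g. BIC or KIC) into averaging weights.

    Assumed formula (not taken from the abstract):
        w_k = p_k * exp(-0.5 * (IC_k - IC_min)) / sum_j p_j * exp(-0.5 * (IC_j - IC_min))
    """
    n = len(ic_values)
    if priors is None:
        priors = [1.0 / n] * n  # equal prior model probabilities
    ic_min = min(ic_values)  # subtracting the minimum avoids numerical underflow
    unnormalized = [
        p * math.exp(-0.5 * (ic - ic_min)) for ic, p in zip(ic_values, priors)
    ]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


# Hypothetical criterion values for three alternative models.
weights = ic_model_weights([100.0, 102.0, 130.0])
print(weights)
```

Because the weights depend exponentially on the criterion differences, a gap of a few tens of units is enough to push nearly all the weight onto one model, which is the situation the abstract describes as unrealistic.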
 Date Issued
 2012
 Identifier
 FSU_migr_etd5003
 Format
 Thesis
 Title
 The Risk of Lipids on Coronary Heart Disease: Prognostic Models and Meta-Analysis.
 Creator

Almansour, Aseel, McGee, Daniel, Flynn, Heather, Niu, Xufeng, Sinha, Debajyoti, Department of Statistics, Florida State University
 Abstract/Description

Prognostic models are widely used in medicine to estimate particular patients' risk of developing disease. Numerous prognostic models have been developed for predicting cardiovascular disease, including those by Wilson et al. using the Framingham Study[17], by Assmann et al. using the Procam study[22] and by Conroy et al.[33] using a pool of European cohorts. The prognostic models developed by these researchers differed in their approach to estimating risk but all included one or more of the lipid determinations: Total Cholesterol (TC), Low Density Lipoproteins (LDL), High Density Lipoproteins (HDL), or the ratios TC/HDL and LDL/HDL. None of these researchers included both LDL and TC in the same model due to the high correlation between these measurements. In this thesis we will examine some questions about the inclusion of lipid determinations in prognostic models: Can the effect of LDL and TC on the risk of dying from CHD be differentiated? If one measure is demonstrably stronger than the other, then a single model using that variable would be considered advantageous. Is it possible to derive a single measure from TC and LDL that is a stronger predictor than either measure? If so, then a new summarization of the lipid measurements should be used in prognostic modeling. Does the addition of HDL to a prognostic model improve the predictive accuracy of the model? If it does, then this determination, which is almost universally measured, should be used when developing prognostic models. We use data from nine independent studies to examine these issues. The studies were chosen because they include longitudinal follow-up of participants and included lipid determinations in the baseline examination of participants. There are many methodologies available for developing prognostic models, including logistic regression and the proportional hazards model. 
We used the proportional hazards model since we have follow-up times and times to death from CHD on all of the participants in the included studies. We summarized our results using a meta-analytic approach. Using the meta-analytic approach, we addressed the additional question of whether the results vary significantly among the different studies and also whether adding additional characteristics to the prognostic models changes the estimated effect of the lipid determinations. All of our results are presented stratified by gender and, when appropriate, by race. Finally, because our studies were not selected randomly, we also examined whether there is evidence of bias in our meta-analyses. For this examination we used funnel plots with related methodology for testing whether there is evidence of bias in the results.
 Date Issued
 2014
 Identifier
 FSU_migr_etd8724
 Format
 Thesis
 Title
 A Class of Semiparametric Volatility Models with Applications to Financial Time Series.
 Creator

Chung, Steve S., Niu, XuFeng, Gallivan, Kyle, Sinha, Debajyoti, Wu, Wei, Department of Statistics, Florida State University
 Abstract/Description

The autoregressive conditional heteroskedasticity (ARCH) and generalized autoregressive conditional heteroskedasticity (GARCH) models capture the dependency of the conditional second moments. The idea behind the ARCH/GARCH model is quite intuitive. For ARCH models, past squared innovations describe the present squared volatility. For GARCH models, both the past squared innovations and the past squared volatilities define the present volatility. Since their introduction, they have been extensively studied and well documented in the financial and econometric literature, and many variants of ARCH/GARCH models have been proposed. To list a few, these include exponential GARCH (EGARCH), GJR-GARCH (or threshold GARCH), integrated GARCH (IGARCH), quadratic GARCH (QGARCH), and fractionally integrated GARCH (FIGARCH). The ARCH/GARCH models and their variants have gained a lot of attention and are still a popular choice for modeling volatility. Despite their popularity, they suffer from limited model flexibility. Volatility is a latent variable and hence, imposing a specific model structure violates this latency assumption. Recently, several attempts have been made to ease the strict structural assumptions on volatility. Both nonparametric and semiparametric volatility models have been proposed in the literature. We review and discuss these modeling techniques in detail. In this dissertation, we propose a class of semiparametric multiplicative volatility models. We define the volatility as a product of parametric and nonparametric parts. Due to the positivity restriction, we take the log and square transformations on the volatility. We assume that the parametric part is GARCH(1,1) and it serves as an initial guess for the volatility. We estimate the GARCH(1,1) parameters by using the conditional likelihood method. The nonparametric part assumes an additive structure. There may be some loss of interpretability from assuming an additive structure, but we gain flexibility. 
Each additive part is constructed from a sieve of Bernstein basis polynomials. The nonparametric component acts as an improvement on the parametric component. The model is estimated with an iterative algorithm based on boosting. We modified the boosting algorithm (the one given in Friedman 2001) such that it uses a penalized least squares method. We tried three different penalty functions: LASSO, ridge, and elastic net penalties. We found that, in our simulations and application, the ridge penalty worked best. Our semiparametric multiplicative volatility model is evaluated using simulations and applied to the six major exchange rates and the S&P 500 index. The results show that the proposed model outperforms the existing volatility models in both in-sample estimation and out-of-sample prediction.
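The parametric GARCH(1,1) part described in this abstract can be sketched in a few lines of Python. This is only an illustration of the standard recursion sigma2[t] = omega + alpha * eps[t-1]^2 + beta * sigma2[t-1]; it does not reproduce the dissertation's semiparametric model, its log and square transformations, or its boosting step. The function name `garch11_variance` and the parameter values are hypothetical, and starting the recursion at the unconditional variance is an assumption.

```python
def garch11_variance(innovations, omega, alpha, beta):
    """Return the GARCH(1,1) conditional variances for a series of innovations.

    The result has len(innovations) + 1 entries: entry 0 is the starting value
    and the last entry is the one-step-ahead variance forecast.
    """
    if omega <= 0 or alpha < 0 or beta < 0 or alpha + beta >= 1:
        raise ValueError("need omega > 0, alpha >= 0, beta >= 0, alpha + beta < 1")
    # Assumption: initialize at the unconditional variance omega / (1 - alpha - beta).
    sigma2 = [omega / (1.0 - alpha - beta)]
    for eps in innovations:
        # Past squared innovation and past variance define the next variance.
        sigma2.append(omega + alpha * eps ** 2 + beta * sigma2[-1])
    return sigma2


# Hypothetical parameters and innovations, for illustration only.
path = garch11_variance([0.0, 2.0, -1.0], omega=0.1, alpha=0.1, beta=0.8)
print(path)
```

A large innovation (here 2.0) raises the next conditional variance, and the beta term makes that increase decay gradually, which is the volatility clustering these models are designed to capture.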
 Date Issued
 2014
 Identifier
 FSU_migr_etd8756
 Format
 Thesis
 Title
 Modeling High-Frequency Order Book Dynamics with Support Vector Machines.
 Creator

Zhang, Yuan, Kercheval, Alec N., Niu, Xufeng, Nichols, Warren, Kim, Kyounghee, Department of Mathematics, Florida State University
 Abstract/Description

A machine-learning-based framework is proposed in this paper to capture the dynamics of high-frequency limit order books in financial markets and automate the real-time prediction of metrics characterizing the dynamics, such as mid-price and price spread crossing. By representing each entry in a limit order book with a vector of features including price and volume at different levels as well as statistical features derived from the limit order book, the proposed framework builds a learning model for each metric with the help of multiclass support vector machines (SVMs) to predict the directions of market movement. Experiments with real data as well as synthetic data establish that features selected by the proposed framework have highly differentiating capability, that the models built are effective and efficient in predicting price movements, and that trading strategies based on the resulting models can achieve profitable returns with low risk.
 Date Issued
 2013
 Identifier
 FSU_migr_etd8670
 Format
 Thesis
 Title
 Evaluation of Engineering Properties of Hot Mix Asphalt Concrete for the Mechanistic-Empirical Pavement Design.
 Creator

Xiao, Yuan, Ping, WeiChou V., Niu, Xufeng, Abichou, Tarek, Sobanjo, John, Department of Civil and Environmental Engineering, Florida State University
 Abstract/Description

Hot Mix Asphalt (HMA) is a viscoelastic material and has been broadly used in pavement structures. It is important to understand the mechanism of the complex behaviors of HMA mixtures in the field for improving pavement mechanical performance. Aggregate gradation and asphalt binder are two key factors that influence the engineering properties of HMA. The asphalt binder plays a significant role in the elastic properties of HMA and it is the essential component that determines HMA's viscous behavior. Many research works suggest that Styrene-Butadiene-Styrene (SBS) polymer is a promising modifier to improve the asphalt binder, and hence to benefit the HMA viscoelastic properties. The specific beneficial characteristics and appropriate polymer concentration need to be identified. In addition, aggregate gradation requirements have been defined in the Superpave mix design criteria. However, a potentially sound coarse mixture with the gradation curve passing below the coarse size limit may be disqualified from being used. There is a need to evaluate the Superpave gradation requirements by studying mixtures purposely designed to exceed the control limits. Moreover, the mechanical parameters adopted by AASHTO to characterize HMA properties are shifting from the indirect diametral tensile (IDT) test to the dynamic modulus test (DMT), because the DMT has the ability to simulate real traffic conditions and to record more viscoelastic information about HMA. Thus, the DMT and the IDT test for implementing the AASHTO Mechanistic-Empirical Design Guide (M-E PDG) need to be discussed. The primary objective of this research study was to evaluate the fracture mechanics properties of HMA concrete and to study the correlation between the DMT and the IDT test for Superpave mixtures. An experimental program was performed on asphalt mixtures with various types of materials. 
The laboratory testing program was developed by applying a viscoelastic fracture mechanics-based framework that, according to past research studies, appeared to be capable of describing the full mechanical properties of HMA. The goals of these experiments are to evaluate the effect of aggregate type, the effect of gradation adjustment to control mix designs, and the effect of SBS polymer on the fracture mechanics properties of HMA mixtures. Two standard coarse mixes were selected as control levels for fracture mechanics tests: one granite mixture and one limestone mixture. Each control mix design was modified to two different gradation levels with the control asphalt binder (PG 67-22) and three SBS polymer content levels (3.0%, 4.5%, and 6.0%) with the original aggregate gradation. The experimental program for the dynamic complex modulus test involved 20 Superpave asphalt concrete mixtures commonly used in Florida with a range of aggregates and mix designs. Data evaluation of the test results indicated that increasing the nominal maximum size aggregate amount by 5% to 15% in the standard coarse mix designs had a negligible effect on HMA fracture mechanics properties. The SBS polymer-modified asphalt binder improved the fracture mechanics behavior of asphalt mixtures comprehensively. The limestone materials hold advantages over granite materials in improving thermal cracking performance at low service temperatures and rutting resistance at high service temperatures. The master curve construction and linear regression analysis indicated that the total resilient modulus increased with an increase in dynamic modulus at a specific loading frequency. The resilient modulus values were comparable with the dynamic modulus values at the loading frequency of 4 Hz. A correlation relationship was developed for predicting the dynamic modulus from existing resilient modulus values of the asphalt concrete mixture in implementing the mechanistic-empirical pavement design.
 Date Issued
 2009
 Identifier
 FSU_migr_etd0411
 Format
 Thesis
 Title
 Analysis and Predictions of Extreme Coastal Water Levels.
 Creator

Xu, Sudong, Huang, Wenrui, Niu, Xufeng, Nnaji, Soronnadi, Abichou, Tarek, Department of Civil and Environmental Engineering, Florida State University
 Abstract/Description

Understanding the characteristics of the probability distribution of extreme water levels is important for coastal flood mitigation and engineering design. In this study, frequency analysis has been conducted to investigate probability distributions along the coast of the U.S. by using the three-parameter Generalized Extreme Value (GEV) method. The GEV model combines three types of probability distributions (Type I for Gumbel, Type II for Fréchet, or Type III for Weibull) into one expression. The type of distribution at each studied station can be identified from one of the three parameters of the GEV model. In this study, the whole U.S. coast was divided into four study areas: Pacific Coast, Northeast Atlantic Coast, Southeast Atlantic Coast and Gulf of Mexico Coast. Nine National Oceanic and Atmospheric Administration (NOAA) stations with a long history of data (more than 70 years) in the four study areas were chosen in this study. Parameters of the GEV model were estimated from the annual maximum water levels of the studied stations using the Maximum Likelihood Estimation (MLE) method. A t-test was applied in this study to determine whether the shape parameter was greater than, less than or equal to 0, which identifies the type of the GEV model. Results show that different coastal areas have different probability distribution characteristics. The probability distribution characteristics of the Pacific Coast and the Northeast Atlantic Coast are similar, following extreme value Type I and III models. The Southeast Atlantic Coast and Gulf of Mexico Coast were found to have similar probability distribution characteristics. Their probability distributions were found to follow extreme value Type I and II models, which are different from those of the Pacific Coast and Northeast Atlantic Coast. The performance of the GEV model was also studied in the four coastal areas. 
The GEV model works well at the five studied stations of the Pacific Coast and the Northeast Atlantic Coast but does not work well on the Southeast Atlantic Coast and the Gulf of Mexico Coast. Adequate predictions of extreme annual maximum coastal water levels (such as the 100-year flood elevation) are also very important for flood hazard mitigation in coastal areas of Florida, USA. In this study, a frequency analysis method has been developed to provide more accurate predictions of 1% annual maximum water levels for Florida coastal waters. Using 82 and 94 years of water level data at Pensacola and Fernandina, the performance of traditional frequency analysis methods, including the advanced Generalized Extreme Value distribution method, has been evaluated. Comparison with observations of annual maximum water levels with return periods of 83 and 95 years indicates that traditional methods are unable to provide satisfactory predictions of 1% annual maximum water levels that account for hurricane-induced extreme water levels. Based on the characteristics of the annual maximum water level distribution at the Pensacola and Fernandina stations, a new probability distribution method has been developed in this study. Comparison with observations indicates that the method presented in this study significantly improves the accuracy of predictions of 1% annual maximum water levels. For the Fernandina station, predictions of extreme water levels match well with the general trend of observations. With a correlation coefficient of 0.98, the error for the maximum observed extreme water level of 3.11 m (NGVD datum) with a return period of 95 years is 0.92%. For the Pensacola station, the prediction error for the maximum observed extreme water level with a return period of 83 years is 5.5%, with a correlation value of 0.98. In frequency analysis of the 100-year coastal flood (FEMA 2005), annual extreme high water levels are often used. However, in many coastal areas, long-term water level data are unavailable. 
In addition, some water level records may be missing due to damage to measurement instruments during hurricanes. In this study, a method has been developed that employs an artificial neural network and harmonic analysis for predicting extreme coastal water levels during hurricanes. The combined water levels were decomposed into tidal signals and storm surge. The tidal signal can be derived by harmonic analysis, while storm surge can be predicted by neural network modeling based on the observed wind speeds and atmospheric pressure. The neural network model employs a three-layer feed-forward backpropagation structure with an advanced scaled conjugate training algorithm. The method presented in this study has been successfully tested at Panama City Beach and Apalachicola on the Florida coast for Hurricane Dennis and Hurricane Ivan. At both stations, model-predicted peak elevations match well with observations in both hurricane events. The decomposed storm surge hydrograph also makes it possible to analyze potential extreme water levels if storm surge occurs during spring high tide.
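To make the GEV frequency analysis in this abstract concrete, the following Python sketch computes the water level associated with a given return period from the three GEV parameters. It assumes the parameterization in which a positive shape parameter corresponds to Type II (Fréchet), a negative one to Type III (Weibull), and zero to Type I (Gumbel); some hydrology texts use the opposite sign, and the abstract does not say which convention the study follows. The function name `gev_return_level` and all parameter values are hypothetical, not results from the study.

```python
import math


def gev_return_level(mu, sigma, xi, return_period):
    """Level exceeded on average once every `return_period` years under a GEV fit.

    mu: location, sigma: scale (> 0), xi: shape (assumed sign: xi > 0 is Frechet).
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    if return_period <= 1:
        raise ValueError("return_period must exceed 1 year")
    p = 1.0 / return_period          # annual exceedance probability
    y = -math.log(1.0 - p)
    if abs(xi) < 1e-12:
        # Type I (Gumbel) limit as the shape parameter goes to zero.
        return mu - sigma * math.log(y)
    return mu - (sigma / xi) * (1.0 - y ** (-xi))


# Hypothetical parameters (metres), for illustration only.
level_100yr = gev_return_level(mu=1.2, sigma=0.25, xi=0.1, return_period=100)
level_10yr = gev_return_level(mu=1.2, sigma=0.25, xi=0.1, return_period=10)
print(level_10yr, level_100yr)
```

With these conventions the 100-year level corresponds to the 1% annual maximum water level discussed in the abstract, and a positive shape parameter gives a heavier upper tail and therefore higher levels at long return periods than the Gumbel case.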
 Date Issued
 2007
 Identifier
 FSU_migr_etd0416
 Format
 Thesis
 Title
 Time-Varying Coefficient Models with ARMA-GARCH Structures for Longitudinal Data Analysis.
 Creator

Zhao, Haiyan, Niu, Xufeng, Huffer, Fred, Nolder, Craig, McGee, Dan, Department of Statistics, Florida State University
 Abstract/Description

The motivation of my research comes from the analysis of the Framingham Heart Study (FHS) data. The FHS is a long-term prospective study of cardiovascular disease in the community of Framingham, Massachusetts. The study began in 1948 and 5,209 subjects were initially enrolled. Examinations were given biennially to the study participants and their status associated with the occurrence of disease was recorded. In this dissertation, the event we are interested in is the incidence of coronary heart disease (CHD). Covariates considered include sex, age, cigarettes per day (CSM), serum cholesterol (SCL), systolic blood pressure (SBP) and body mass index (BMI, weight in kilograms/height in meters squared). A review of the statistical literature indicates that the effects of the covariates on cardiovascular disease or death caused by all possible diseases in the Framingham study change over time. For example, the effect of SCL on cardiovascular disease decreases linearly over time. In this study, I would like to examine the time-varying effects of the risk factors on CHD incidence. Time-varying coefficient models with ARMA-GARCH structure are developed in this research. The maximum likelihood and the marginal likelihood methods are used to estimate the parameters in the proposed models. Since high-dimensional integrals are involved in the calculations of the marginal likelihood, the Laplace approximation is employed in this study. Simulation studies are conducted to evaluate the performance of these two estimation methods based on our proposed models. The Kullback-Leibler (KL) divergence and the root mean square error are employed in the simulation studies to compare the results obtained from different methods. Simulation results show that the marginal likelihood approach gives more accurate parameter estimates, but is more computationally intensive. 
Following the simulation study, our proposed models are applied to the Framingham Heart Study to investigate the time-varying effects of covariates with respect to CHD incidence. To specify the time-series structures of the effects of risk factors, the Bayesian Information Criterion (BIC) is used for model selection. Our study shows that the relationship between CHD and risk factors changes over time. For males, there is a clearly decreasing linear trend for the age effect, which implies that the age effect on CHD is less significant for older patients than for younger patients. The effect of CSM stays almost the same in the first 30 years and decreases thereafter. There are slightly decreasing linear trends for the effects of both SBP and BMI. Furthermore, the coefficients of SBP are mostly positive over time, i.e., patients with higher SBP are more likely to develop CHD, as expected. For females, there is also a clearly decreasing linear trend for the age effect, while the effects of SBP and BMI on CHD are mostly positive and do not change much over time.
 Date Issued
 2010
 Identifier
 FSU_migr_etd0527
 Format
 Thesis
 Title
 Meta-Analytic Structural Equation Modeling (MASEM): Comparison of the Multivariate Methods.
 Creator

Zhang, Ying, Becker, Betsy Jane, Yang, Yanyun, Niu, Xufeng, Eklund, Robert, Department of Educational Psychology and Learning Systems, Florida State University
 Abstract/Description

Meta-analytic Structural Equation Modeling (MASEM) has drawn interest from many researchers recently. In doing MASEM, researchers usually first synthesize correlation matrices across studies using meta-analysis techniques and then analyze the pooled correlation matrix using structural equation modeling techniques. Several multivariate methods of MASEM have been proposed. In this dissertation, I compared the commonly used multivariate methods for meta-analytic path modeling. Specifically, I examined the Generalized Least Squares (GLS) method (Becker, 1992; Becker & Schram, 1994) and the Two-Stage Structural Equation Modeling (TSSEM) method (Cheung, 2002; Cheung & Chan, 2005) using both simulation studies and real data analyses. Both the traditional GLS approach (Becker, 1992) and the modified GLS approaches (Becker & Fahrbach, 1994) were applied and compared with the TSSEM approach. Fixed-effects data and random-effects data were generated to see how these approaches differ at the first and second stages of MASEM. The results show that the modified GLS approach performs as well as or better than the TSSEM approach in both the first step of synthesizing correlation matrices and the second step of estimating the parameters and standard errors, using both fixed-effects data and random-effects data. The original GLS approach only performs well when the within-study sample size is large enough (of the simulation situations in this dissertation, n ). Both the modified GLS approach and the TSSEM approach produce equivalent parameter estimates across all conditions. However, the standard errors from the TSSEM approach seem to be overestimates under certain conditions. Overall, both the modified GLS and TSSEM approaches are appropriate for conducting meta-analytic path modeling and the difference in parameter estimates is minimal.
 Date Issued
 2011
 Identifier
 FSU_migr_etd0534
 Format
 Thesis
 Title
 A Comparison of Estimators in Hierarchical Linear Modeling: Restricted Maximum Likelihood versus Bootstrap via Minimum Norm Quadratic Unbiased Estimators.
 Creator

Delpish, Ayesha Nneka, Niu, XuFeng, Tate, Richard L., Huffer, Fred W., Zahn, Douglas, Department of Statistics, Florida State University
 Abstract/Description

The purpose of the study was to investigate the relative performance of two estimation procedures, the restricted maximum likelihood (REML) and the bootstrap via MINQUE, for a two-level hierarchical linear model under a variety of conditions. Specific focus lay on observing whether the bootstrap via MINQUE procedure offered improved accuracy in the estimation of the model parameters and their standard errors in situations where normality may not be guaranteed. Through Monte Carlo simulations, the importance of this assumption for the accuracy of multilevel parameter estimates and their standard errors was assessed using the accuracy index of relative bias and by observing the coverage percentages of 95% confidence intervals constructed for both estimation procedures. The study systematically varied the number of groups at level 2 (30 versus 100), the size of the intraclass correlation (0.01 versus 0.20) and the distribution of the observations (normal versus chi-squared with 1 degree of freedom). The number of groups and intraclass correlation factors produced effects consistent with those previously reported: as the number of groups increased, the bias in the parameter estimates decreased, with a more significant effect observed for those estimates obtained via REML. High levels of the intraclass correlation also led to a decrease in the efficiency of parameter estimation under both methods. Study results show that while both the restricted maximum likelihood and the bootstrap via MINQUE estimates of the fixed effects were accurate, the efficiency of the estimates was affected by the distribution of errors, with the bootstrap via MINQUE procedure outperforming REML. Both procedures produced less efficient estimators under the chi-squared distribution, particularly for the variance-covariance component estimates.
 Date Issued
 2006
 Identifier
 FSU_migr_etd0771
 Format
 Thesis
 Title
 The Robustness of Real Interest Rate Parity Tests to Alternative Measures of Real Interest Rates.
 Creator

Pipatchaipoom, Onsurang, Norrbin, Stefan, Niu, XuFeng, Beaumont, Paul, Marquis, Milton, Department of Economics, Florida State University
 Abstract/Description

Prior research using the ex ante real interest rate has led to mixed evidence about the validity of the Fisher relationship and the Real Interest Parity hypothesis. In particular, authors have disagreed over whether the ex ante real rate of interest is stationary or not, and have therefore used different econometric methodologies to test the theories. Such a controversy may stem from the methods used in constructing the ex ante real interest rate, since measuring the unobserved real interest rate requires underlying assumptions about the forecasting behavior of agents. Our findings indicate that the time series properties of the constructed ex ante real interest rate appear to be sensitive not only to the method used, but also to the choice of inflation rate calculation used to construct the series. Therefore, we would anticipate that hypothesis testing that involves the ex ante real interest rate will provide a wide range of conclusions depending on the methodology used to construct the real rates. To examine whether different approaches to constructing the real interest rate series matter in real interest rate parity (RIP) hypothesis testing, different linear methods of testing real interest rate parity are employed to analyze the real interest rate linkages among a group of OECD countries. The stationarity conclusions for the real interest rate series are subject to the choice of method for constructing the underlying real interest rates. This leads to the problem of selecting the methodology for conducting RIP tests. If the real interest rates are assumed to be stationary, the standard linear regression tests of real rate equalization will be adequate. Otherwise, a cointegration technique will be more appropriate to test for the stationary relationship among the variables.
By assuming nonstationary real interest rates, we investigate whether there exists a common trend among the real interest rates by using bivariate and multivariate cointegration tests. The findings indicate that the existence of RIP depends substantially on how the real rates are computed. The final essay deals with the effects of the choice of interest rate methodology on nonlinear tests of RIP. Research has found evidence supporting gradual regime-switching behavior of real interest rate adjustments due to the existence of transactions costs. There may be no response of the domestic rate to foreign rate changes when the deviation between the real rates is small due to transactions costs. However, when shocks to both rates are large enough to make arbitrage profitable, this may evoke quick adjustments to restore the parity. We investigate nonlinear adjustments of real interest rates toward the long-run equilibrium using smooth transition autoregressive (STAR) models. Both logistic (LSTAR) and exponential (ESTAR) smooth transition autoregressive models are considered. The findings indicate that there exist nonlinearities in OECD real interest rate adjustment. However, the types of models as well as the speed of reestablishing RIP depend on the approaches used in measuring the underlying real interest rates.
 Date Issued
 2005
 Identifier
 FSU_migr_etd0702
 Format
 Thesis
 Title
 Effects of Arrest on Intimate Partner Violence Incidence and Revictimization: Logistic Regression and Regression Time Series Analysis of the National Crime Victimization Survey from 1987 to 2003.
 Creator

Cho, Hyunkag, McNeece, C. Aaron, Niu, Xufeng, Wilke, Dina J., College of Social Work, Florida State University
 Abstract/Description

The police have actively intervened in intimate partner violence (IPV) since the 1980s to hold batterers accountable by legal punishment, to prevent future violence, and to provide for victims' safety. However, research results on the effectiveness of police intervention are inconclusive. Moreover, the majority of studies focused on batterers, overlooking victim-related factors in examining the effectiveness of police intervention. This paper used the National Crime Victimization Survey from 1987 to 2003 to examine whether arrest of batterers has an effect in reducing revictimization. Overall, younger, separated victims are more likely to be revictimized than older, married women. Rape or sexual assault victims and those without injury from the previous victimization are more vulnerable to revictimization. Also, IPV incidence rates declined by half, and arrest rates of batterers doubled from 1987 to 2004. With regard to the effect of arrest, the study results support the specific effect of arrest on victims' safety. Logistic regression analysis of data from 2,462 victims showed that when the police arrested batterers, the victims' chance of revictimization fell by half. However, the general effect of arrest on incidence rates is not as apparent as the specific effect. The time-lagged effect of arrest on incidence rates, which was shown in this study, needs future research for meaningful interpretation because there is no theory to explain such a delayed effect. Since dual examination of the specific effect and the general effect showed the overall effectiveness of arrest in reducing IPV incidence and revictimization, social work policy and practice fields will be able to develop informed, effective intervention strategies in IPV.
 Date Issued
 2006
 Identifier
 FSU_migr_etd3795
 Format
 Thesis
 Title
 Investigating the Categories for Cholesterol and Blood Pressure for Risk Assessment of Death Due to Coronary Heart Disease.
 Creator

Franks, Billy J., McGee, Daniel, Hurt, Myra, Huffer, Fred, Niu, Xufeng, Department of Statistics, Florida State University
 Abstract/Description

Many characteristics for predicting death due to coronary heart disease are measured on a continuous scale. These characteristics, however, are often categorized for clinical use and to aid in treatment decisions. We would like to derive a systematic approach to determine the best categorizations of systolic blood pressure and cholesterol level for use in identifying individuals who are at high risk for death due to coronary heart disease and to compare these data-derived categories to those in common usage. Whatever categories are chosen, they should allow physicians to accurately estimate the probability of survival from coronary heart disease until some time t. The best categories will be those that provide the most accurate prediction of an individual's risk of dying by t. The approach that will be used to determine these categories will be a version of Classification And Regression Trees that can be applied to censored survival data. The major goals of this dissertation are to obtain data-derived categories for risk assessment, to compare these categories to the ones already recommended in the medical community, and to assess the performance of these categories in predicting survival probabilities.
 Date Issued
 2005
 Identifier
 FSU_migr_etd4402
 Format
 Thesis
 Title
 Statistical Shape Analysis on Manifolds with Applications to Planar Contours and Structural Proteomics.
 Creator

Ellingson, Leif A., Patrangenaru, Vic, Mio, Washington, Zhang, Jinfeng, Niu, Xufeng, Department of Statistics, Florida State University
 Abstract/Description

The technological advances in recent years have produced a wealth of intricate digital imaging data that is analyzed effectively using the principles of shape analysis. Such data often lies on either high-dimensional or infinite-dimensional manifolds. With computing power also now strong enough to handle this data, it is necessary to develop theoretically sound methodology to perform the analysis in a computationally efficient manner. In this dissertation, we propose approaches for doing so for planar contours and the three-dimensional atomic structures of protein binding sites. First, we adapt Kendall's definition of direct similarity shapes of finite planar configurations to shapes of planar contours under certain regularity conditions and utilize Ziezold's nonparametric view of Fréchet mean shapes. The space of direct similarity shapes of regular planar contours is embedded in a space of Hilbert-Schmidt operators in order to obtain the Veronese-Whitney extrinsic mean shape. For computations, it is necessary to use discrete approximations of both the contours and the embedding. For cases when landmarks are not provided, we propose an automated, randomized landmark selection procedure that is useful for contour matching within a population and is consistent with the underlying asymptotic theory. For inference on the extrinsic mean direct similarity shape, we consider a one-sample neighborhood hypothesis test and the use of the nonparametric bootstrap to approximate confidence regions. Bandulasiri et al. (2008) suggested using extrinsic reflection size-and-shape analysis to study the relationship between the structure and function of protein binding sites. In order to obtain meaningful results for this approach, it is necessary to identify the atoms common to a group of binding sites with similar functions and obtain proper correspondences for these atoms.
We explore this problem in depth and propose an algorithm for simultaneously finding the common atoms and their respective correspondences based upon the Iterative Closest Point algorithm. For a benchmark data set, our classification results compare favorably with those of leading established methods. Finally, we discuss current directions in the field of statistics on manifolds, including a computational comparison of intrinsic and extrinsic analysis for various applications and a brief introduction of sample spaces with manifold stratification.
 Date Issued
 2011
 Identifier
 FSU_migr_etd0053
 Format
 Thesis
 Title
 Impact of Missing Data on Building Prognostic Models and Summarizing Models Across Studies.
 Creator

Munshi, Mahtab R., McGee, Daniel, Eberstein, Isaac, Hollander, Myles, Niu, Xufeng, Chattopadhyay, Somesh, Department of Statistics, Florida State University
 Abstract/Description

We examine the impact of missing data in two settings, the development of prognostic models and the addition of new risk factors to existing risk functions. Most statistical software presently available performs complete case analysis, wherein only participants with known values for all of the characteristics being analyzed are included in model development. Missing data also impacts the summarization of evidence amongst multiple studies using meta-analytic techniques. As we progress in medical research, new covariates become available for studying various outcomes. While we want to investigate the influence of new factors on the outcome, we also do not want to discard the historical datasets that do not have information about these markers. Our research plan is to investigate different methods to estimate parameters for a model when some of the covariates are missing. These methods include likelihood-based inference for the study-level coefficients and likelihood-based inference for the logistic model on the person-level data. We compare the results from our methods to the corresponding results from complete case analysis. We focus our empirical investigation on a historical example, the addition of high-density lipoproteins to existing equations for predicting death due to coronary heart disease. We verify our methods through simulation studies on this example.
 Date Issued
 2005
 Identifier
 FSU_migr_etd2191
 Format
 Thesis
 Title
 A Statistical Approach for Information Extraction of Biological Relationships.
 Creator

Bell, Lindsey R., Zhang, Jinfeng, Niu, Xufeng, Tyson, Gary, Huffer, Fred, Department of Statistics, Florida State University
 Abstract/Description

Vast amounts of biomedical information are stored in scientific literature, easily accessed through publicly available databases. Relationships among biomedical terms constitute a major part of our biological knowledge. Acquiring such structured information from unstructured literature can be done through human annotation, but is time- and resource-consuming. As this content continues to grow rapidly, the popularity and importance of text mining for obtaining information from unstructured text becomes increasingly evident. Text mining has four major components. First, relevant articles are identified through information retrieval (IR); next, important concepts and terms are flagged using entity recognition (ER); and then relationships between these entities are extracted from the literature in a process called information extraction (IE). Finally, text mining takes these elements and seeks to synthesize new information from the literature. Our goal is information extraction from unstructured literature concerning biological entities. To do this, we use the structure of triplets, where each triplet contains two biological entities and one interaction word. The biological entities may include terms such as protein names, disease names, genes, and small molecules. Interaction words describe the relationship between the biological terms. Under this framework we aim to combine the strengths of three classifiers in an ensemble approach. The three classifiers we consider are Bayesian Networks, Support Vector Machines, and a mixture of logistic models defined by interaction word. The three classifiers and the ensemble approach are evaluated on three benchmark corpora and one corpus that is introduced in this study. The evaluation includes cross validation and cross-corpus validation to replicate an application scenario. The three classifiers are unique and we find that the performance of individual classifiers varies depending on the corpus.
Therefore, an ensemble of classifiers removes the need to choose one classifier and provides optimal performance.
 Date Issued
 2011
 Identifier
 FSU_migr_etd1314
 Format
 Thesis
 Title
 Assimilation of Hyperspectral Satellite Radiance Observations within Tropical Cyclones.
 Creator

Lin, Haidao, Zou, Xiaolei, Niu, Xufeng, Ellingson, Robert G., Liu, Guosheng, Hart, Robert, Department of Earth, Ocean and Atmospheric Sciences, Florida State University
 Abstract/Description

The availability of high-resolution temperature and water vapor data is critical for the study of mesoscale weather phenomena (e.g., convective initiations and tropical cyclones). As hyperspectral infrared sounders, the Atmospheric Infrared Sounder (AIRS) and the Geosynchronous Imaging Fourier Transform Spectrometer (GIFTS) could provide high-resolution atmospheric profiles by measuring radiation in many thousands of different channels. This work focuses on the assessment of the potential value of satellite hyperspectral radiance data for the study of convective initiations (CI) and on the assimilation of AIRS radiance observations within tropical storms. First, the potential capability of hyperspectral infrared measurements (GIFTS) to provide convective precipitation forecasts has been studied and assessed. Using both the observed and the model-predicted profiles as input to the GIFTS radiative transfer model (RTM), it is shown that the simulated GIFTS radiance could capture the high vertical and temporal variability of the real and modeled atmosphere prior to a convective initiation, as well as the differences between observations and model forecasts. This study suggests the potential for hyperspectral infrared radiance data to make an important contribution to the improvement of the forecast skill of convective precipitation. Second, as the first step toward applying AIRS data to tropical cyclone (TC) prediction, a set of dropsonde profiles during Hurricane Rita (2005) is used to simulate AIRS radiance data and to assess the ability of AIRS data to capture the vertical variability within TCs through one-dimensional variational (1DVar) twin experiments. The AIRS observation errors and background errors are first estimated. Five sets of 1DVar twin experiments are then performed using different combinations of AIRS channels. Finally, results from these 1DVar experiments are analyzed.
Major findings are: (1) AIRS radiance data contain useful information about the vertical variability of the temperature and water vapor within hurricanes; (2) assimilation of AIRS radiances significantly reduced errors in background temperature in the lower troposphere and relative humidity in the upper troposphere; (3) the near-real-time (NRT) channel set provided by NOAA/NESDIS seems sufficient for capturing the vertical variability of the atmosphere in the upper troposphere of TCs, but not in the lower troposphere; and (4) the channels whose weighting functions peak within the layer between 500 and 700 hPa could provide useful information about the atmospheric state below 700 hPa. A channel selection method is proposed to capture most of the vertical variability of temperature and water vapor within TCs contained in AIRS data. Finally, AIRS radiance data within TCs have been assimilated in the 1DVar experiments, and the retrieval temperature and water vapor profiles are compared with colocated Global Positioning System (GPS) radio occultation (RO) soundings and dropsonde profiles. The comparisons of AIRS 1DVar retrieval profiles with GPS RO soundings show that AIRS data can greatly improve the analysis of temperature and water vapor profiles within TCs. The comparisons of retrieval profiles with dropsonde data during Hurricane Rita, however, showed some discrepancies, partly due to the differences between these two measurements and the uncertainties of the AIRS errors.
 Date Issued
 2010
 Identifier
 FSU_migr_etd1326
 Format
 Thesis
 Title
 Bayesian Generalized Polychotomous Response Models and Applications.
 Creator

Yang, Fang, Niu, XuFeng, Johnson, Suzanne B., McGee, Dan, Huffer, Fred, Department of Statistics, Florida State University
 Abstract/Description

Polychotomous quantal response models are widely used in medical and econometric studies to analyze categorical or ordinal data. In this study, we apply the Bayesian methodology through a mixed-effects polychotomous quantal response model. For the Bayesian polychotomous quantal response model, we assume uniform improper priors for the regression coefficients and explore the sufficient conditions for a proper joint posterior distribution of the parameters in the models. Simulation results from Gibbs sampling estimates will be compared to traditional maximum likelihood estimates to show the strength of using the uniform improper priors for the regression coefficients. Motivated by the investigation of the relationship between BMI categories and several risk factors, we carry out application studies to examine the impact of risk factors on BMI categories, especially for the categories "Overweight" and "Obesities". By applying the mixed-effects Bayesian polychotomous response model with uniform improper priors, we would obtain interpretations of the association between risk factors and BMI similar to the findings in the literature.
 Date Issued
 2010
 Identifier
 FSU_migr_etd1092
 Format
 Thesis
 Title
 A Probabilistic and Graphical Analysis of Evidence in O.J. Simpson's Murder Case Using Bayesian Networks.
 Creator

Olumide, Kunle, Huffer, Fred, Shute, Valerie, Sinha, Debajyoti, Niu, Xufeng, Logan, Wayne, Department of Statistics, Florida State University
 Abstract/Description

This research work is an attempt to illustrate the versatility and wide applications of the field of statistical science. Specifically, the research work involves the application of statistics in the field of law. The application will focus on the subfields of Evidence and Criminal law using one of the most celebrated cases in the history of American jurisprudence: the 1994 O.J. Simpson murder case in California. Our task here is to do a probabilistic and graphical analysis of the body of evidence in this case using Bayesian Networks. We will begin the analysis by first constructing our main hypothesis regarding the guilt or non-guilt of the accused; this main hypothesis will be supplemented by a series of ancillary hypotheses. Using graphs and probability concepts, we will evaluate the probative force or strength of the evidence and how well the body of evidence at hand proves our main hypothesis. We will employ Bayes' rule, likelihoods and likelihood ratios to carry out such an evaluation. Some sensitivity analyses will be carried out by varying the degree of our prior beliefs or probabilities, and evaluating the effect of such variations on the likelihood ratios regarding our main hypothesis.
 Date Issued
 2010
 Identifier
 FSU_migr_etd2287
 Format
 Thesis
 Title
 Adaptive Series Estimators for Copula Densities.
 Creator

Gui, Wenhao, Wegkamp, Marten, Van Engelen, Robert A., Niu, Xufeng, Huffer, Fred, Department of Statistics, Florida State University
 Abstract/Description

In this thesis, based on an orthonormal series expansion, we propose a new nonparametric method to estimate copula density functions. Since the basis coefficients turn out to be expectations, empirical averages are used to estimate these coefficients. We propose estimators of the variance of the estimated basis coefficients and establish their consistency. We derive the asymptotic distribution of the estimated coefficients under mild conditions. We derive a simple oracle inequality for the copula density estimator based on a finite series using the estimated coefficients. We propose a stopping rule for selecting the number of coefficients used in the series and we prove that this rule minimizes the mean integrated squared error. In addition, we consider hard and soft thresholding techniques for sparse representations. We obtain oracle inequalities that hold with prescribed probability for various norms of the difference between the copula density and our threshold series density estimator. Uniform confidence bands are derived as well. The oracle inequalities clearly reveal that our estimator adapts to the unknown degree of sparsity of the series representation of the copula density. A simulation study indicates that our method is extremely easy to implement and works very well, and that it compares favorably to the popular kernel-based copula density estimator, especially around the boundary points, in terms of mean squared error. Finally, we have applied our method to an insurance dataset. Comparing our method with previous data analyses, we reach the same conclusion as the parametric methods in the literature, and as such we provide additional justification for the use of the developed parametric model.
 Date Issued
 2009
 Identifier
 FSU_migr_etd3929
 Format
 Thesis
 Title
 Estimating the Probability of Cardiovascular Disease: A Comparison of Methods.
 Creator

Fan, Li, McGee, Daniel, Hurt, Myra, Niu, XuFeng, Huffer, Fred, Department of Statistics, Florida State University
 Abstract/Description

Risk prediction plays an important role in clinical medicine. It not only helps in educating patients to improve their lifestyle and in targeting individuals at high risk, but also guides treatment decisions. So far, various instruments have been used for risk assessment in different countries, and the risk predictions based on these different models are not consistent. For public use, reliable risk prediction is necessary. This thesis discusses the models that have been developed for risk assessment and evaluates the performance of prediction at two levels: the overall level and the individual level. At the overall level, cross validation and simulation are used to assess the risk prediction, while at the individual level, the "Parametric Bootstrap" and the delta method are used to evaluate the uncertainty of the individual risk prediction. Further exploration of the reasons for the different performance among the models is ongoing.
 Date Issued
 2009
 Identifier
 FSU_migr_etd4508
 Format
 Thesis
 Title
 Developing Crash Modification Factors for Urban Highway with Substandard Wide Curb Lane.
 Creator

Mbatta, Geophrey, Moses, Ren, Niu, Xufeng, Sobanjo, John, AbdelRazig, Yassir, Department of Civil and Environmental Engineering, Florida State University
 Abstract/Description

Across the United States, a great deal of attention is being focused on promoting energy-efficient and environmentally friendly modes of transportation. Bicycling is an integral part of a sustainable transportation system and is one of the most efficient modes. The growing use of bicycles for commuting and leisure activities is creating conflicts with motorized traffic, mainly due to deficient roadway facilities that were in the past designed primarily to accommodate motorized traffic. In 2008, 716 bicyclists were killed in the USA, which corresponds to 2 percent of total traffic fatalities reported. In the same year, over 52,000 bicyclists were also reported to have been injured in traffic crashes. When the data is broken down state by state, Florida ranked second with 6.82 bicyclist fatalities per million population. These statistics provide a grim reminder of the dangers faced by bicyclists riding on urban and rural highways. This study developed lane width crash modification factors (CMFs) for roads with wide outside lanes narrower than or equal to 14 ft and inside lanes narrower than or equal to 11 ft, the minimum widths recommended by the Florida Department of Transportation (FDOT) for arterial roadways in the state of Florida. The road segments used were urban four-lane roads with a divided median and four-lane roads with a two-way left-turn lane (TWLT). Data used in the evaluation included 25 centerline miles of urban four-lane (TWLT) roads and 75 centerline miles of four-lane roads with a divided median. Two main types of crash modification factors and safety performance functions were developed in this study.
The first type is crash modification factors and safety performance functions for all types of crashes, and the second is crash modification factors and safety performance functions for motor vehicle-bicyclist crashes only. In sum, the results obtained from this study suggest that maintaining an inside lane width of 11.5 ft and an outside lane width of 13 ft for 4D and 5T could result in fewer crashes of all types. Additionally, a decrease in motor vehicle-bicyclist crashes can also be achieved with an outside lane width of 13 ft; the inside lane width was not found to contribute to an increase or decrease in motor vehicle-bicyclist crashes.
 Date Issued
 2011
 Identifier
 FSU_migr_etd2627
 Format
 Thesis
 Title
 Early 19th Century U.S. Hurricanes: A GIS Tool and Climate Analysis.
 Creator

Bossak, Brian H., Elsner, James B., Niu, Xufeng, Baker, E. Jay, Jacobson, R. Dan, Department of Geography, Florida State University
 Abstract/Description

Hurricane climate research is based on data spanning the last 100 years or so. To better understand rare but potentially catastrophic hurricane events it is helpful to have longer records. Records from historical archives are available, but they need to be collated and edited. Efforts to collate U.S. tropical cyclone information from the first half of the 19th Century using a Geographic Information System (GIS) have been conducted in this research. The Historical Hurricane Impact Tool (HHIT) is based on Environmental Systems Research Institute's (ESRI) ArcView GIS 3.1. Statements concerning coastal and near-coastal impacts are reproduced within map callout boxes. The callout boxes point to the geographic location of the documented information. Map layers are used for different archival sources. The HHIT, which is available in hardcopy format and will be online in the near future via an internet map server, can be used by scientists, emergency managers, and the general public to better estimate the risk of a hurricane catastrophe. The U.S. hurricane database ("Best-Track") was recently extended from 1871 back to 1851 through the work of NOAA's Atlantic Hurricane Reanalysis Project. In addition, the previously mentioned Historical Hurricane Impact Tool (HHIT) has been utilized to collate and list recorded U.S. hurricanes back to the year 1800. The combination of NOAA's "Best-Track" data back to 1851 and the HHIT collated hurricane list back to 1800 provides an unprecedented look at U.S. hurricane activity since the beginning of the industrial revolution. This research also examines U.S. (major) hurricanes over four 50-year epochs, and then further examines regional trends in U.S. hurricanes. Seasonal distributions are similar across epochs. The earliest epoch contains the greatest ratio of major hurricanes to all U.S. hurricanes.
Each epoch is further divided into three separate regions, and hurricane landfalls in Florida and the East Coast region are found to have an inverse relationship. Furthermore, the relationship between climate variables such as ENSO, the NAO, the PDO, and U.S. hurricanes appears to be different in the first epoch (1801-1850) than in the other three epochs (1851-2000). The relationships noted are robust to changes in sample size. A physical explanation for the noted trend is presented in a later chapter. Other climate influences on U.S. hurricanes, including volcanic eruptions and sunspots, are explored for effects on landfall counts.
 Date Issued
 2003
 Identifier
 FSU_migr_etd3514
 Format
 Thesis
 Title
 Quasi-3D Statistical Inversion of Oceanographic Tracer Data.
 Creator

Herbei, Radu, Speer, Kevin, Wegkamp, Marten, Laurent, Louis St., Huffer, Fred, Niu, Xufeng, Department of Statistics, Florida State University
 Abstract/Description

We perform a quasi-3D Bayesian inversion of oceanographic tracer data from the South Atlantic Ocean. Initially we consider one active neutral density layer with an upper and lower boundary. The available hydrographic data is linked to model parameters (water velocities, diffusion coefficients) via a 3D advection-diffusion equation. A robust solution to the inverse problem considered can be attained by introducing prior information about parameters and modeling the observation error. This approach estimates both horizontal and vertical flow as well as diffusion coefficients. We find a system of alternating zonal jets at the depths of the North Atlantic Deep Water, consistent with direct measurements of flow and concentration maps. A uniqueness analysis of our model is performed in terms of the oxygen consumption rate. The vertical mixing coefficient bears some relation to the bottom topography even though we do not incorporate that into our model. We extend the method to a multilayer model, using thermal wind relations weakly in a local fashion (as opposed to integrating the entire water column) to connect layers vertically. Results suggest that the estimated deep zonal jets extend vertically, with a clear depth-dependent structure. The vertical structure of the flow field is modified by the tracer fields over that set a priori by thermal wind. Our estimates are consistent with observed flow at the depths of the Antarctic Intermediate Water; at still shallower depths, above the layers considered here, the subtropical gyre is a significant feature of the horizontal flow.
 Date Issued
 2006
 Identifier
 FSU_migr_etd4101
 Format
 Thesis
 Title
 New Semiparametric Methods for Recurrent Events Data.
 Creator

Gu, Yu, Sinha, Debajyoti, Eberstein, Isaac W., McGee, Dan, Niu, Xufeng, Department of Statistics, Florida State University
 Abstract/Description

Recurrent events data are rising in all areas of biomedical research. We present a model for recurrent events data with the same link for the intensity and mean functions. Simple interpretations of the covariate effects on both the intensity and mean functions lead to a better understanding of the covariate effects on the recurrent events process. We use partial likelihood and empirical Bayes methods for inference and provide theoretical justifications as well as relationships between these methods. We also show the asymptotic properties of the empirical Bayes estimators. We illustrate the computational convenience and implementation of our methods with the analysis of a heart transplant study. We also propose an additive regression model and associated empirical Bayes method for the risk of a new event given the history of the recurrent events. Both the cumulative mean and rate functions have closed form expressions for our model. Our inference method for the semiparametric model is based on maximizing a finite dimensional integrated likelihood obtained by integrating over the nonparametric cumulative baseline hazard function. Our method can accommodate time-varying covariates and is computationally easier to implement than full Bayes methods based on iterative algorithms. The asymptotic properties of our estimates give the large-sample justifications from a frequentist standpoint. We apply our method to a study of heart transplant patients to illustrate the computational convenience and other advantages of our method.
 Date Issued
 2011
 Identifier
 FSU_migr_etd3941
 Format
 Thesis
 Title
 Statistical Modelling and Applications of Neural Spike Trains.
 Creator

Lawhern, Vernon, Wu, Wei, Contreras, Robert J., Srivastava, Anuj, Huffer, Fred, Niu, Xufeng, Department of Statistics, Florida State University
 Abstract/Description

In this thesis we investigate statistical modelling of neural activity in the brain. We first develop a framework which is an extension of the state-space Generalized Linear Model (GLM) by Eden and colleagues [20] to include the effects of hidden states. These states, collectively, represent variables which are not observed (or even observable) in the modeling process but nonetheless can have an impact on the neural activity. We then develop a framework that allows us to input a priori target information into the model. We examine both of these modelling frameworks on motor cortex data recorded from monkeys performing different target-driven hand and arm movement tasks. Finally, we perform temporal coding analysis of sensory stimulation using principled statistical models and show the efficacy of our approach.
 Date Issued
 2011
 Identifier
 FSU_migr_etd3251
 Format
 Thesis
 Title
 Bayesian Portfolio Optimization with Time-Varying Factor Models.
 Creator

Zhao, Feng, Niu, Xufeng, Cheng, Yingmei, Huffer, Fred W., Zhang, Jinfeng, Department of Statistics, Florida State University
 Abstract/Description

We develop a modeling framework to simultaneously evaluate various types of predictability in stock returns, including stocks' sensitivity ("betas") to systematic risk factors, stocks' abnormal returns unexplained by risk factors ("alphas"), and returns of risk factors in excess of the risk-free rate ("risk premia"). Both firm-level characteristics and macroeconomic variables are used to predict stocks' time-varying alphas and betas, and macroeconomic variables are used to predict the risk premia. All of the models are specified in a Bayesian framework to account for estimation risk, and informative prior distributions on both stock returns and model parameters are adopted to reduce estimation error. To gauge the economic significance of the predictability, we apply the models to the U.S. stock market and construct optimal portfolios based on model predictions. Out-of-sample performance of the portfolios is evaluated to compare the models. The empirical results confirm predictability from all of the sources considered in our model: (1) The equity risk premium is time-varying and predictable using macroeconomic variables; (2) Stocks' alphas and betas differ cross-sectionally and are predictable using firm-level characteristics; and (3) Stocks' alphas and betas are also time-varying and predictable using macroeconomic variables. Comparison of different subperiods shows that the predictability of stocks' betas is persistent over time, but the predictability of stocks' alphas and the risk premium has diminished to some extent. The empirical results also suggest that Bayesian statistical techniques, especially the use of informative prior distributions, help reduce model estimation error and result in portfolios that outperform the passive indexing strategy. The findings are robust in the presence of transaction costs.
 Date Issued
 2011
 Identifier
 FSU_migr_etd0526
 Format
 Thesis
 Title
 Essays in Corporate Finance.
 Creator

Krieger, Kevin, Niu, Xufeng, Cheng, Yingmei, Haslem, Bruce, Department of Finance, Florida State University
 Abstract/Description

This dissertation examines two underdeveloped topics in the field of corporate finance. In the first chapter, I detail the impact of expectations on the performance of firms under the leadership of new CEOs. I seek to answer the heretofore untested question of whether greater pressure motivates new CEOs to succeed, encourages them to engage in manipulative behaviors, or both. I show that new CEOs with greater expectations are significantly more likely to report superior achievement. However, after possible manipulation is accounted for, this superiority is reduced to insignificant levels. I conclude that while a small motivational effect of expectations may exist, spurring some new CEOs to perform, the strongest impact of heightened expectations for new CEOs is an increased likelihood to artificially inflate performance measures for the sake of appearance. In the second chapter, I question whether the market's response to the announcement of significantly increased dividends depends on the future free cash flows, realized by the firm, which may prove to be susceptible to agency costs. I consider the market's reaction to the announcement of substantial dividend hikes and find this reaction is not tied to the future free cash flows of the firm; however, I also find that the reaction is strongly linked to the excess cash balance of the firm in the year of the dividend increase. I contend the market attempts to consider agency costs when evaluating new dividends, but in doing so it fails to look beyond the firm's current cash situation.
 Date Issued
 2009
 Identifier
 FSU_migr_etd2870
 Format
 Thesis
 Title
 The Effects of Degree Type, the Integration Process, and External Factors on Degree Completion for Mothers in College: A Comparison Study of Single Mother and Married Mother College Students.
 Creator

McLaughlin, Alicia Nicole, Wilke, Dina, Niu, XuFeng, Radey, Melissa, Randolph, Karen, College of Social Work, Florida State University
 Abstract/Description

The National Center for Education Statistics reports that single mother college students are nearly three times as likely to drop out of college during their first year of study compared to single females without children. Qualitative studies on single mothers indicate that financial problems and demands of parenthood are reasons that precipitate voluntary withdrawal from college. These studies also indicate that being able to academically and socially integrate into the collegiate atmosphere increases the chance of completing a degree. Considering the various obstacles facing single mothers, it becomes important to examine why some single mothers graduate from college while others leave without degrees. Therefore, the focus of this study was to examine how potential factors impacted degree completion for single mothers. To understand the magnitude of how potential factors impacted degree completion, comparisons with married mothers were performed. Although vast amounts of higher education research have been conducted on degree completion, little attention has been given exclusively to student-mothers attending college, particularly those who are single. This study utilized data provided in the Beginning Postsecondary Students Longitudinal Study (BPS:96/01 – restricted level) employing logistic regression to investigate the influence of the integration process (academic integration and social integration), degree type (certificate, associate, and bachelor), and pertinent external factors (age of child, financial difficulties, and family difficulties) on degree completion for single and married mothers as separate groups. Findings revealed that the proposed model of degree completion operated similarly for single and married mothers. This study validated concepts from Tinto's (1993) model of institutional departure for single and married mothers.
With the exception of having a child under the age of five, degree type, the integration process, and external factors predicted degree completion as hypothesized. Results from this study filled the gap in knowledge by becoming the first to examine factors that impacted degree completion on nationally representative samples of student-mother undergraduates. Results from this study could inform educational administrators, advocates for single mothers, and educational policy makers about the on-campus and off-campus experiences of single mothers so that better educational and advocacy decisions can be enacted. This was significant not only for single mothers but also for the 73% of nontraditional students attending postsecondary institutions in America.
 Date Issued
 2009
 Identifier
 FSU_migr_etd2525
 Format
 Thesis
 Title
 Individual Patient-Level Data Meta-Analysis: A Comparison of Methods for the Diverse Populations Collaboration Data Set.
 Creator

Dutton, Matthew Thomas, McGee, Daniel, Becker, Betsy, Niu, Xufeng, Zhang, Jinfeng, Department of Statistics, Florida State University
 Abstract/Description

DerSimonian and Laird define meta-analysis as "the statistical analysis of a collection of analytic results for the purpose of integrating their findings." One alternative to classical meta-analytic approaches is known as Individual Patient-Level Data, or IPD, meta-analysis. Rather than depending on summary statistics calculated for individual studies, IPD meta-analysis analyzes the complete data from all included studies. Two potential approaches to incorporating IPD data into the meta-analytic framework are investigated. A two-stage analysis is first conducted, in which individual models are fit for each study and summarized using classical meta-analysis procedures. Secondly, a one-stage approach that singularly models the data and summarizes the information across studies is investigated. Data from the Diverse Populations Collaboration (DPC) data set are used to investigate the differences between these two methods in a specific example. The bootstrap procedure is used to determine if the two methods produce statistically different results in the DPC example. Finally, a simulation study is conducted to investigate the accuracy of each method in given scenarios.
 Date Issued
 2011
 Identifier
 FSU_migr_etd0620
 Format
 Thesis
 Title
 The Effect of Risk Factors on Coronary Heart Disease: An Age-Relevant Multivariate Meta Analysis.
 Creator

Li, Yan, McGee, Dan, She, Yiyuan, Eberstein, Ike, Niu, Xufeng, Department of Statistics, Florida State University
 Abstract/Description

The importance of major risk factors, such as hypertension, total cholesterol, body mass index, diabetes, and smoking, for predicting incidence and mortality of Coronary Heart Disease (CHD) is well known. In light of the fact that age is also a major risk factor for CHD death, a natural question is whether the risk effects on CHD change with age. This thesis focuses on examining the interaction between age and risk factors using data from multiple studies containing differing age ranges. The aim of my research is to use statistical methods to determine whether we can combine these diverse results to obtain an overall summary, from which one can determine how the risk effects on CHD death change with age. One intuitive approach is to use classical meta analysis based on generalized linear models. More specifically, one can fit a logistic model with CHD death as response and age, a risk factor and their interaction as covariates for each of the studies, and conduct meta analysis on every set of three coefficients in the multivariate setting to obtain 'synthesized' coefficients. Another aspect of the thesis is a new method, meta analysis with respect to curves, that goes beyond linear models. The basic idea is that one can choose the same spline with the same knots on covariates, say age and systolic blood pressure (SBP), for all the studies to ensure common basis functions. The knot-based tensor product basis coefficients obtained from penalized logistic regression can be used for multivariate meta analysis. Using the common basis functions and the 'synthesized' knot-based basis coefficients from meta analysis, a two-dimensional smooth surface on the age-SBP domain is estimated. By cutting through the smooth surface along two axes, the resulting slices show how the risk effect on CHD death changes at an arbitrary age as well as how the age effect on CHD death changes at an arbitrary SBP value. The application to multiple studies will be presented.
 Date Issued
 2010
 Identifier
 FSU_migr_etd1428
 Format
 Thesis
 Title
 Multistate Intensity Model with AR-GARCH Random Effect for Corporate Credit Rating Transition Analysis.
 Creator

Li, Zhi, Niu, Xufeng, Huffer, Fred, Kercheval, Alec, Wu, Wei, Department of Statistics, Florida State University
 Abstract/Description

This thesis presents a stochastic process and time series study on corporate credit rating and market-implied rating transitions. By extending an existing model, this paper incorporates the generalized autoregressive conditional heteroscedastic (GARCH) random effects to capture volatility changes in the instantaneous transition rates. The GARCH model is a crucial part of financial research because its ability to model volatility changes gives market practitioners the flexibility to build more accurate models on high-frequency financial data. Corporate rating transition modeling has historically dealt with low-frequency data, for which there was no need to specify the volatility. However, the newly published Moody's market-implied ratings exhibit much higher transition frequencies. Therefore, we feel that it is necessary to capture the volatility component and make extensions to existing models to reflect this fact. The theoretical model specification and estimation details are discussed thoroughly in this dissertation. The performance of our models is studied on several simulated data sets and compared to the original model. Finally, the models are applied to both Moody's issuer rating and market-implied rating transition data as an application.
 Date Issued
 2010
 Identifier
 FSU_migr_etd1426
 Format
 Thesis
 Title
 Flexible Additive Risk Models Using Piecewise Constant Hazard Functions.
 Creator

Uhm, Daiho, Huffer, Fred W., Kercheval, Alec, McGee, Dan, Niu, Xufeng, Department of Statistics, Florida State University
 Abstract/Description

We study a weighted least squares (WLS) estimator for Aalen's additive risk model which allows for a very flexible handling of covariates. We divide the follow-up period into intervals and assume a constant hazard rate in each interval. The model is motivated as a piecewise approximation of a hazard function composed of three parts: arbitrary nonparametric functions for some covariate effects, smoothly varying functions for others, and known (or constant) functions for yet others. The proposed estimator is an extension of the grouped data version of the Huffer-McKeague estimator (1991). Our estimator may also be regarded as a piecewise constant analog of the semiparametric estimates of McKeague & Sasieni (1994), and Lin & Ying (1994). By using a fairly large number of intervals, we should get an essentially semiparametric model similar to the McKeague-Sasieni and Lin-Ying approaches. For our model, since the number of parameters is finite (although large), conventional approaches (such as maximum likelihood) are easy to formulate and implement. The approach is illustrated by simulations, and is applied to data from the Framingham heart study.
 Date Issued
 2007
 Identifier
 FSU_migr_etd1464
 Format
 Thesis
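The piecewise constant hazard assumption described in the record above can be illustrated with a minimal sketch. It only shows how a survival probability follows from a hazard that is constant within each follow-up interval; it is not the WLS estimator from the thesis. The function name `survival_probability`, the interval endpoints, and the rates are hypothetical, and the hazard is assumed to be zero beyond the last interval.

```python
import math

def survival_probability(t, breaks, rates):
    """Survival S(t) = exp(-cumulative hazard) for a piecewise constant hazard.

    breaks: increasing right endpoints of the intervals (first interval starts at 0).
    rates:  constant hazard rate assumed within each interval (hypothetical values).
    """
    cumulative_hazard = 0.0
    left = 0.0
    for right, rate in zip(breaks, rates):
        if t <= left:
            break
        # Add the hazard accumulated over the part of this interval before t.
        cumulative_hazard += rate * (min(t, right) - left)
        left = right
    return math.exp(-cumulative_hazard)

# Hypothetical follow-up split into three unit-length intervals.
breaks = [1.0, 2.0, 3.0]
rates = [0.1, 0.2, 0.3]
s_mid = survival_probability(1.5, breaks, rates)
```

In an additive risk model the rate in each interval would itself be a sum of covariate contributions; the sketch takes the rates as given.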
 Title
 A Class of Mixed-Distribution Models with Applications in Financial Data Analysis.
 Creator

Tang, Anqi, Niu, Xufeng, Cheng, Yingmei, Wu, Wei, Huffer, Fred, Department of Statistics, Florida State University
 Abstract/Description

Statisticians often encounter data in the form of a combination of discrete and continuous outcomes. A special case is zero-inflated longitudinal data, where the response variable has a large proportion of zeros. These data exhibit correlation because observations are obtained on the same subjects over time. In this dissertation, we propose a two-part mixed-distribution model for zero-inflated longitudinal data. The first part of the model is a logistic regression model for the probability of a nonzero response; the other part is a linear model for the mean response given that the outcome is not zero. Random effects with an AR(1) covariance structure are introduced into both parts of the model to allow for serial correlation and subject-specific effects. Estimating the two-part model is challenging because of the high-dimensional integration necessary to obtain the maximum likelihood estimates. We propose a Monte Carlo EM (MCEM) algorithm for computing the maximum likelihood estimates of the parameters. Through a simulation study, we demonstrate the good performance of the MCEM method in parameter and standard error estimation. To illustrate, we apply the two-part model with correlated random effects and the model with autoregressive random effects to executive compensation data to investigate potential determinants of CEO stock option grants.
 Date Issued
 2011
 Identifier
 FSU_migr_etd1710
 Format
 Thesis
 Title
 Logistic Regression, Measures of Explained Variation, and the Base Rate Problem.
 Creator

Sharma, Dinesh R., McGee, Daniel L., Hurt, Myra, Niu, XuFeng, Chicken, Eric, Department of Statistics, Florida State University
 Abstract/Description

One of the desirable properties of the coefficient of determination (R2 measure) is that its values for different models should be comparable whether the models differ in one or more predictors, or in the dependent variable, or whether the models are specified as being different for different subsets of a dataset. This allows researchers to compare the adequacy of models across subgroups of the population or models with different but related dependent variables. However, the various analogs of the R2 measure used for logistic regression analysis are highly sensitive to the base rate (proportion of successes in the sample) and thus do not possess this property. An R2 measure that is sensitive to the base rate is not suitable for comparing the same or different models across different datasets, different subsets of a dataset, or different but related dependent variables. We evaluated 14 R2 measures that have been suggested or might be useful for measuring the explained variation in logistic regression models, based on three criteria: 1) intuitively reasonable interpretability; 2) numerical consistency with the rho2 of the underlying model; and 3) base rate sensitivity. We carried out a Monte Carlo simulation study to examine the numerical consistency and the base rate dependency of the various R2 measures for logistic regression analysis. We found all of the parametric R2 measures to be substantially sensitive to the base rate. The magnitude of the base rate sensitivity of these measures tends to be further influenced by the rho2 of the underlying model. None of the measures considered in our study performed equally well on all three evaluation criteria. While R2L stands out for its intuitively reasonable interpretability as a measure of explained variation as well as its independence from the base rate, it appears to severely underestimate the underlying rho2.
We found R2CS to be numerically most consistent with the underlying rho2, with R2N its nearest competitor. In addition, the base rate sensitivity of these two measures appears to be very close to that of R2L, the most base rate invariant parametric R2 measure. Therefore, we suggest using R2CS and R2N for logistic regression modeling, especially when it is reasonable to believe that an underlying latent variable exists. However, when the latent variable does not exist, comparability with the underlying rho2 is not an issue and R2L might be a better choice than all the other R2 measures.
 Date Issued
 2006
 Identifier
 FSU_migr_etd1789
 Format
 Thesis
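As a rough illustration of three of the measures named in the record above, the sketch below computes them from model log-likelihoods. It assumes, following common notation rather than anything stated in the abstract, that R2L is the likelihood ratio (McFadden) measure, R2CS the Cox and Snell measure, and R2N the Nagelkerke rescaling of R2CS. The function names and input values are hypothetical.

```python
import math

def null_loglik(n, base_rate):
    """Log-likelihood of the intercept-only logistic model for a given base rate."""
    p = base_rate
    return n * (p * math.log(p) + (1.0 - p) * math.log(1.0 - p))

def pseudo_r2(ll_null, ll_model, n):
    """Return (R2_L, R2_CS, R2_N) under the assumed definitions.

    ll_null:  log-likelihood of the intercept-only model
    ll_model: log-likelihood of the fitted model
    n:        sample size
    """
    r2_l = 1.0 - ll_model / ll_null                          # likelihood ratio (McFadden)
    r2_cs = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)   # Cox and Snell
    r2_cs_max = 1.0 - math.exp(2.0 * ll_null / n)            # upper bound of Cox and Snell
    r2_n = r2_cs / r2_cs_max                                 # Nagelkerke rescaling
    return r2_l, r2_cs, r2_n

# Hypothetical example: n = 100, base rate 0.5, fitted log-likelihood -50.
ll0 = null_loglik(100, 0.5)
r2_l, r2_cs, r2_n = pseudo_r2(ll0, -50.0, 100)
```

Because the upper bound of R2CS depends on the null log-likelihood, and hence on the base rate, the same fitted model can score differently across samples with different base rates, which is the kind of sensitivity the abstract examines.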
 Title
 Semiparametric Survival Analysis Using Models with Log-Linear Median.
 Creator

Lin, Jianchang, Sinha, Debajyoti, Zhou, Yi, Lipsitz, Stuart, McGee, Dan, Niu, XuFeng, She, Yiyuan, Department of Statistics, Florida State University
 Abstract/Description

First, we present two novel semiparametric survival models with log-linear median regression functions for right-censored survival data. These models are useful alternatives to the popular Cox (1972) model and linear transformation models (Cheng et al., 1995). Compared to existing semiparametric models, our models have many important practical advantages, including interpretation of the regression parameters via the median and the ability to address heteroscedasticity. We demonstrate that our modeling techniques facilitate prior elicitation and computation for both parametric and semiparametric Bayesian analysis of survival data. We illustrate the advantages of our modeling, as well as model diagnostics, via reanalysis of a small-cell lung cancer study. Results of our simulation study provide further guidance regarding appropriate modeling in practice. Our second goal is to develop methods of analysis and associated theoretical properties for interval-censored and current status survival data. These new regression models use a log-linear regression function for the median. We present frequentist and Bayesian procedures for estimation of the regression parameters. Our model is a useful and practical alternative to the popular semiparametric models which focus on modeling the hazard function. We illustrate the advantages and properties of our proposed methods by reanalyzing a breast cancer study. Our other aim is to develop a model which is able to account for heteroscedasticity of the response, together with robust parameter estimation and outlier detection using sparsity penalization. Some preliminary simulation studies have been conducted to compare the performance of the proposed model and the existing median lasso regression model. Considering the estimation bias, mean squared error, and other identification benchmark measures, our proposed model performs better than the competing frequentist estimator.
 Date Issued
 2012
 Identifier
 FSU_migr_etd4992
 Format
 Thesis
 Title
 The Effect of High Groundwater Level on Pavement Subgrade Performance.
 Creator

Zhang, Chaohan, Ping, W. Virgil, Niu, Xufeng, Hilton, Amy Chan, Abichou, Tarek, Abdullah, Makola, Department of Civil and Environmental Engineering, Florida State University
 Abstract/Description

A high groundwater table exerts detrimental effects on the roadway base and the whole pavement. Base clearance guidelines have been developed to prevent water from entering the pavement system in order to reduce its detrimental effects. This dissertation presents an experimental study to evaluate the effects of high groundwater and moisture on determining pavement base clearance for granular subgrades. Full-scale in-lab and test-pit tests were conducted to simulate the pavement profile and vehicle dynamic impact on the pavement. Eight types of granular subgrades were tested for this study. From the tests, using layer theory, the resilient modulus for each layer (layer resilient modulus) can be compared with the resilient modulus results from laboratory tests. A multiple regression model will be established to predict soil resilient modulus without conducting a resilient modulus test. The dominant factor or factors in the effect of moisture on resilient modulus will be discussed. The results showed that a 24-inch base clearance was considered adequate for protecting the base of most A-3 and A-2 subgrades against high groundwater tables. The lab resilient modulus and layer resilient modulus show the same trend for each soil as the moisture content changes. The SR70 A-2-4 (14% fines) soil was more susceptible to changes in the groundwater table than the other soils. The percent of fines or the percent of clay in the subgrade soil is not a good indicator of the influence of moisture on the resilient modulus. The coefficient of uniformity and coefficient of curvature of the subgrade gradations, which better represent the whole shape of the gradation curve, are better indicators of the effect of moisture on modulus.
 Date Issued
 2004
 Identifier
 FSU_migr_etd0545
 Format
 Thesis
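The two gradation indicators named at the end of the record above have standard textbook definitions, sketched here for reference: the coefficient of uniformity Cu = D60 / D10 and the coefficient of curvature Cc = D30^2 / (D10 * D60), where Dx is the particle diameter at x percent passing. The function name and the diameters are hypothetical and are not taken from the thesis.

```python
def gradation_coefficients(d10, d30, d60):
    """Return (Cu, Cc) from particle diameters at 10, 30 and 60 percent passing.

    All three diameters must be in the same unit (for example, millimetres).
    """
    cu = d60 / d10                # coefficient of uniformity
    cc = d30 ** 2 / (d10 * d60)   # coefficient of curvature
    return cu, cc

# Hypothetical gradation curve values in millimetres.
cu, cc = gradation_coefficients(d10=0.1, d30=0.2, d60=0.4)
```

Both values summarize the shape of the whole gradation curve rather than a single sieve fraction, which is the contrast the abstract draws with percent fines.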
 Title
 Revealing Sparse Signals in Functional Data.
 Creator

Ivanescu, Andrada E. (Andrada Eugenia), Bunea, Florentina, Wegkamp, Marten, Gert, Joshua, Niu, Xufeng, Hollander, Myles, Department of Statistics, Florida State University
 Abstract/Description

My dissertation presents a novel statistical method to estimate a sparse signal in functional data and to construct confidence bands for the signal. Existing methods for inference for the mean function in this framework include smoothing splines and kernel estimates. Our methodology involves thresholding a least squares estimator, and the threshold level depends on the sources of variability that exist in this type of data. The proposed estimation method and the confidence bands successfully adapt to the sparsity of the signal. We present supporting evidence through simulations and applications to real datasets.
 Date Issued
 2008
 Identifier
 FSU_migr_etd3852
 Format
 Thesis
 Title
 Spatio-Temporal Evolutions of Non-Orthogonal Equatorial Wave Modes Derived from Observations.
 Creator

Barton, Cory, Cai, Ming, Niu, Xufeng, Clarke, Allan J., Speer, Kevin G. (Kevin George), Sura, Philip, Florida State University, College of Arts and Sciences, Program in Geophysical Fluid Dynamics
 Abstract/Description

Equatorial waves have been studied extensively due to their importance to the tropical climate and weather systems. Historically, their activity has been diagnosed mainly in the wavenumber-frequency domain. Recently, many studies have projected observational data onto parabolic cylinder functions (PCFs), which represent the meridional structure of individual wave modes, to obtain time-dependent spatial wave structures. The non-orthogonality of wave modes has, however, posed a problem when attempting to separate data into wave fields where the waves project onto the same structure functions. We propose the development and application of a new methodology for equatorial wave expansion of instantaneous flows using the full equatorial wave spectrum. By creating a mapping from the meridional structure function amplitudes to the equatorial wave class amplitudes, we are able to diagnose instantaneous wave fields and determine their evolution. Because all meridional modes are shared by some subset of the wave classes, we require constraints on the wave class amplitudes to yield a closed system with a unique solution for all waves' spatial structures, including IG waves. A synthetic field is analyzed using this method to determine its accuracy for data of a single vertical mode. The wave class spectra diagnosed using this method successfully match the correct dispersion curves even if an incorrect depth is chosen for the spatial decomposition. In the case of more than one depth scale, waves with varying equivalent depth may be similarly identified using the dispersion curves. The primary vertical mode is the 200 m equivalent depth mode, which is that of the peak projection response. A distinct spectral power peak along the Kelvin wave dispersion curve for this value validates our choice of equivalent depth, although the possibility of depth varying with time and height is explored.
The wave class spectra diagnosed assuming this depth scale mostly match their expected dispersion curves, showing that this method successfully partitions the wave spectra by calculating wave amplitudes in physical space. This is particularly striking because the time evolution, and therefore the frequency characteristics, is determined simply by a time series of independently diagnosed instantaneous horizontal fields. We use the wave fields diagnosed by this method to study wave evolution in the context of the stratospheric QBO of zonal wind, confirming the continuous evolution of the selection mechanism for equatorial waves in the middle atmosphere. The amplitude cycle synchronized with the background zonal wind, as predicted by QBO theory, is present in the wave class fields even though the dynamics are not forced by the method itself. We have additionally identified a time evolution of the zonal wavenumber spectrum responsible for the amplitude variability in physical space. Similar to the temporal characteristics, the vertical structures are also the result of a simple height cross-section through multiple independently diagnosed levels.
 Date Issued
 2016
 Identifier
 FSU_2016SP_Barton_fsu_0071E_13099
 Format
 Thesis