Current Search: Huffer, Fred
Search results
 Title
 A Spectral Element Method to Price Single and Multi-Asset European Options.
 Creator

Zhu, Wuming, Kopriva, David A., Huffer, Fred, Case, Bettye Anne, Kercheval, Alec N., Okten, Giray, Wang, Xiaoming, Department of Mathematics, Florida State University
 Abstract/Description

We develop a spectral element method to price European options under the Black-Scholes model, Merton's jump diffusion model, and Heston's stochastic volatility model with one or two assets. The method uses piecewise high-order Legendre polynomial expansions to approximate the option price, represented pointwise on a Gauss-Lobatto mesh within each element. This piecewise polynomial approximation allows an exact representation of the nonsmooth initial condition. For options with one asset under the jump diffusion model, the convolution integral is approximated by high-order Gauss-Lobatto quadratures. A second-order implicit/explicit (IMEX) approximation is used to integrate in time, with the convolution integral integrated explicitly. The use of the IMEX approximation in time means that only a block-diagonal, rather than full, system of equations needs to be solved at each time step. For options with two variables, i.e., two assets under the Black-Scholes model or one asset under the stochastic volatility model, the domain is subdivided into quadrilateral elements. Within each element, the expansion basis functions are chosen to be tensor products of the Legendre polynomials. Three iterative methods are investigated to solve the system of equations at each time step with the corresponding second-order time integration schemes, i.e., IMEX and Crank-Nicolson. Also, the boundary conditions are carefully studied for the stochastic volatility model. The method is spectrally accurate (exponentially convergent) in space and second-order accurate in time for European options under all three models. Spectral accuracy is observed not only in the solution, but also in the Greeks.
 Date Issued
 2008
 Identifier
 FSU_migr_etd0513
 Format
 Thesis
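A minimal sketch of the second-order IMEX time stepping described in the abstract above: the stiff diffusion term is advanced implicitly (Crank-Nicolson) while the convolution term is advanced explicitly (Adams-Bashforth 2), so the same linear system is reused every step. A 1-D finite-difference operator and a hypothetical kernel stand in for the thesis's spectral element discretization.

```python
import numpy as np

n, dt = 100, 1e-3
x = np.linspace(0.0, 1.0, n)
h = x[1] - x[0]
# Tridiagonal 1-D diffusion operator (stand-in for the spectral element operator).
A = (np.diag(-2.0 * np.ones(n)) + np.diag(np.ones(n - 1), 1)
     + np.diag(np.ones(n - 1), -1)) / h**2
A[0, :] = 0.0
A[-1, :] = 0.0                                   # crude Dirichlet rows

def conv(u):
    return 0.1 * u                               # placeholder jump convolution

I = np.eye(n)
lhs = I - 0.5 * dt * A                           # fixed system: factor once, reuse
u_prev = u = np.maximum(x - 0.5, 0.0)            # nonsmooth payoff data
for _ in range(10):
    expl = 1.5 * conv(u) - 0.5 * conv(u_prev)    # explicit AB2 extrapolation
    rhs = (I + 0.5 * dt * A) @ u + dt * expl     # Crank-Nicolson right-hand side
    u_prev, u = u, np.linalg.solve(lhs, rhs)
```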
 Title
 Variance Gamma Pricing of American Futures Options.
 Creator

Yoo, Eunjoo, Nolder, Craig A., Huffer, Fred, Case, Bettye Anne, Kercheval, Alec N., Quine, Jack, Department of Mathematics, Florida State University
 Abstract/Description

In financial markets under uncertainty, the classical Black-Scholes model cannot explain empirical facts such as the fat tails observed in the probability density. To overcome this drawback, during the last decade, Lévy process and stochastic volatility models were introduced into financial modeling. Today crude oil futures markets are highly volatile. The purpose of this dissertation is to develop a mathematical framework in which American options on crude oil futures contracts are priced more effectively than by current methods. In this work, we use the Variance Gamma process to model the futures price process. To generate the underlying process, we use a random tree method and evaluate the option prices at each tree node. Through fifty replications of a random tree, the averaged value is taken as the true option price. Pricing performance using this method is assessed using American options on crude oil commodity contracts from December 2003 to November 2004. For comparison with the Variance Gamma model, we also price using the Black-Scholes model. Over the entire sample period, a positive skewness and high kurtosis, especially in the short-term options, are observed. In terms of pricing errors, the Variance Gamma process performs better than the Black-Scholes model for the American options on crude oil commodities.
 Date Issued
 2008
 Identifier
 FSU_migr_etd0691
 Format
 Thesis
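A sketch of the Variance Gamma dynamics underlying the random tree method of the abstract above: VG is Brownian motion with drift run on an independent gamma clock. The parameters (theta, sigma, nu) are the usual VG ones with illustrative values, not values calibrated to the crude oil data.

```python
import numpy as np

rng = np.random.default_rng(0)
theta, sigma, nu = -0.1, 0.25, 0.3
F0, T, n_steps, n_paths = 50.0, 1.0, 252, 10_000
dt = T / n_steps

dG = rng.gamma(shape=dt / nu, scale=nu, size=(n_paths, n_steps))   # gamma clock
dX = theta * dG + sigma * np.sqrt(dG) * rng.standard_normal((n_paths, n_steps))
X = np.cumsum(dX, axis=1)                                          # VG paths
omega = np.log(1.0 - theta * nu - 0.5 * sigma**2 * nu) / nu        # martingale correction
t = dt * np.arange(1, n_steps + 1)
F = F0 * np.exp(omega * t + X)          # risk-neutral futures price paths
print(F[:, -1].mean())                  # ~F0: the futures price is driftless
```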
 Title
 Numerical Methods for Portfolio Risk Estimation.
 Creator

Zhang, Jianke, Kercheval, Alec, Huffer, Fred, Gallivan, Kyle, Beaumont, Paul, Nichols, Warren, Department of Mathematics, Florida State University
 Abstract/Description

In portfolio risk management, a global covariance matrix forecast often needs to be adjusted by changing diagonal blocks corresponding to specific submarkets. Unless certain constraints are obeyed, this can result in the loss of positive definiteness of the global matrix. Imposing the proper constraints while minimizing the disturbance of off-diagonal blocks leads to a non-convex optimization problem in numerical linear algebra called the Weighted Orthogonal Procrustes Problem. We analyze and compare two local minimizing algorithms and offer an algorithm for global minimization. Our methods are faster and more effective than current numerical methods for covariance matrix revision.
 Date Issued
 2007
 Identifier
 FSU_migr_etd0542
 Format
 Thesis
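A toy illustration of the failure mode the abstract above addresses: naively overwriting a diagonal block of a global covariance forecast can destroy positive definiteness, which is what the Weighted Orthogonal Procrustes machinery is designed to prevent. The matrices here are made up.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 6))
C = A @ A.T + np.eye(6)                    # a valid global covariance matrix

C_bad = C.copy()
C_bad[:2, :2] = 0.01 * np.eye(2)           # revised submarket block, dropped in naively
print(np.linalg.eigvalsh(C).min() > 0)     # True: the original is positive definite
print(np.linalg.eigvalsh(C_bad).min())     # typically negative: definiteness is lost
```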
 Title
 Calibration of Multivariate Generalized Hyperbolic Distributions Using the EM Algorithm, with Applications in Risk Management, Portfolio Optimization and Portfolio Credit Risk.
 Creator

Hu, Wenbo, Kercheval, Alec, Huffer, Fred, Case, Bettye, Nichols, Warren, Nolder, Craig, Department of Mathematics, Florida State University
 Abstract/Description

The distributions of many financial quantities are well-known to have heavy tails, exhibit skewness, and have other non-Gaussian characteristics. In this dissertation we study an especially promising family: the multivariate generalized hyperbolic (GH) distributions. This family includes and generalizes the familiar Gaussian and Student t distributions, and the so-called skewed t distributions, among many others. The primary obstacle to the application of such distributions is the numerical difficulty of calibrating the distributional parameters to the data. In this dissertation we describe a way to stably calibrate GH distributions for a wider range of parameters than has previously been reported. In particular, we develop a version of the EM algorithm for calibrating GH distributions. This is a modification of methods proposed in McNeil, Frey, and Embrechts (2005), and generalizes the algorithm of Protassov (2004). Our algorithm extends the stability of the calibration procedure to a wide range of parameters, now including parameter values that maximize log-likelihood for our real market data sets. This allows certain GH distributions to be used for the first time in modeling contexts where previously they were numerically intractable. Our algorithm enables us to make new uses of GH distributions in three financial applications. First, we forecast univariate Value-at-Risk (VaR) for stock index returns, and we show in out-of-sample backtesting that the GH distributions outperform the Gaussian distribution. Second, we calculate an efficient frontier for equity portfolio optimization under the skewed t distribution, using Expected Shortfall as the risk measure. Here, we show that the Gaussian efficient frontier is actually unreachable if returns are skewed t distributed. Third, we build an intensity-based model to price Basket Credit Default Swaps by calibrating the skewed t distribution directly, without the need to separately calibrate the skewed t copula. To our knowledge this is the first use of the skewed t distribution in portfolio optimization and in portfolio credit risk.
 Date Issued
 2005
 Identifier
 FSU_migr_etd3694
 Format
 Thesis
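A sketch of the first application in the abstract above, out-of-sample VaR backtesting. A Student t stands in for the GH family, whose calibration requires the custom EM algorithm the dissertation develops; the return series is simulated, not the market data used in the thesis.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
returns = stats.t.rvs(df=4, scale=0.01, size=1500, random_state=rng)

window, alpha, hits = 500, 0.01, 0
for i in range(window, len(returns)):
    df, loc, scale = stats.t.fit(returns[i - window:i])   # rolling refit
    var = stats.t.ppf(alpha, df, loc=loc, scale=scale)    # 1% VaR forecast
    hits += returns[i] < var                              # count exceedances
print(hits / (len(returns) - window))                     # should be near alpha
```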
 Title
 Option Pricing with Self-Similar Additive Processes.
 Creator

Galloway, Mack L. (Mack Laws), Nolder, Craig, Huffer, Fred, Beaumont, Paul, Case, Bettye Anne, Quine, John R., Department of Mathematics, Florida State University
 Abstract/Description

The use of time-inhomogeneous additive models in option pricing has gained attention in recent years due to their potential to adequately price options across both strike and maturity with relatively few parameters. In this thesis two such classes of models based on the self-similar additive processes of Sato are developed. One class of models consists of the risk-neutral exponentials of a self-similar additive process, while the other consists of the risk-neutral exponentials of a Brownian motion time-changed by an independent, increasing, self-similar additive process. Examples from each class are constructed in which the time-one distributions are Variance Gamma or Normal Inverse Gaussian distributed. Pricing errors are assessed for the case of Standard and Poor's 500 index options from the year 2005. Both sets of time-inhomogeneous additive models show dramatic improvement in pricing error over their associated Lévy processes. Furthermore, with regard to the average of the pricing errors over the quote dates studied, the self-similar Normal Inverse Gaussian model yields a mean pricing error significantly less than that implied by the bid-ask spreads of the options, and also significantly less than that given by its associated, less parsimonious, Lévy stochastic volatility model.
 Date Issued
 2006
 Identifier
 FSU_migr_etd4372
 Format
 Thesis
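A sketch of the property that makes the models in the abstract above convenient for European pricing: for an H-self-similar additive (Sato) process, X_T has the law of T^H X_1, so only the time-one distribution is needed. Normal Inverse Gaussian is used as the time-one law, as in the abstract; the parameters a, b, H and contract data are illustrative, not fitted to the 2005 S&P 500 quotes.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
a, b, H = 1.5, -0.3, 0.55
S0, K, r, T = 100.0, 100.0, 0.03, 0.5

x1 = stats.norminvgauss.rvs(a, b, size=200_000, random_state=rng)  # time-one law
xT = T**H * x1                                   # self-similar scaling in law
m = np.log(np.mean(np.exp(xT)))                  # empirical martingale correction
ST = S0 * np.exp(r * T + xT - m)                 # risk-neutral terminal price
call = np.exp(-r * T) * np.mean(np.maximum(ST - K, 0.0))
print(round(call, 3))
```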
 Title
 Partial Differential Equation Methods to Price Options in the Energy Market.
 Creator

Yan, Jinhua, Kopriva, David, Huffer, Fred, Case, Bettye Anne, Nolder, Craig, Wang, Xiaoming, Department of Mathematics, Florida State University
 Abstract/Description

We develop partial differential equation methods with well-posed boundary conditions to price average strike options and swing options in the energy market. We use the energy method to develop boundary conditions that make a two-space-variable model of Asian options well-posed on a finite domain. To test the performance of the well-posed boundary conditions, we price an average strike call. We also derive new boundary conditions for the average strike option from put-call parity. Numerical results show that the well-posed boundary conditions work appropriately, and solutions with the new boundary conditions match the similarity solution significantly better than those provided in the existing literature. To price swing options, we develop a finite element penalty method on a one-factor mean-reverting diffusion model. We use the energy method to find well-posed boundary conditions on a finite domain, derive formulas to estimate the size of the numerical domain, and develop a priori error estimates for both Dirichlet and Neumann boundary conditions. We verify the results through numerical experiments. Since the optimal exercise price is unknown in advance, which makes swing option valuation challenging, we use a penalty method to resolve the difficulty caused by the early exercise feature. Numerical results show that the finite element penalty method is thousands of times faster than the binomial tree method at the same level of accuracy. Furthermore, we price a multiple-right swing option with different strike prices. We find that a jump discontinuity can occur in the initial condition of a swing right, since the exercise of another swing right may force its optimal exercise region to shrink. We develop an algorithm to identify the optimal exercise boundary at each time level, which allows us to record the optimal exercise time. Numerical results are accurate to one cent compared with the benchmark solutions computed by a binomial tree method. We extend the applications to multiple-right swing options with a waiting period restriction. A waiting period exists between two swing rights exercised in succession, so we cannot exercise the latter right at an optimal exercise opportunity within the waiting period, but must wait for the first optimal exercise opportunity after the waiting period. Therefore, we keep track of the optimal exercise time when pricing each swing right. We also verify an extreme case numerically: as the waiting time decreases, the value of an M-right swing option increases to M times the value of an American option, as expected.
 Date Issued
 2013
 Identifier
 FSU_migr_etd7673
 Format
 Thesis
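A minimal sketch of the penalty idea the abstract above uses for the early exercise feature: add a term rho * max(payoff - u, 0) to the discretized equation so the constraint u >= payoff is enforced without tracking the free boundary. For brevity this is a plain Black-Scholes American put on an explicit finite-difference grid; the thesis itself uses finite elements and a mean-reverting energy-market model.

```python
import numpy as np

K, r, sigma, T = 100.0, 0.05, 0.2, 1.0
n, m = 101, 4000
S = np.linspace(0.0, 200.0, n)
dS, dt = S[1] - S[0], T / m
payoff = np.maximum(K - S, 0.0)
u = payoff.copy()
rho = 1.0 / dt     # with an explicit step this penalty size acts like projection

for _ in range(m):
    u_ss = (u[2:] - 2.0 * u[1:-1] + u[:-2]) / dS**2
    u_s = (u[2:] - u[:-2]) / (2.0 * dS)
    pde = 0.5 * sigma**2 * S[1:-1] ** 2 * u_ss + r * S[1:-1] * u_s - r * u[1:-1]
    u[1:-1] += dt * (pde + rho * np.maximum(payoff[1:-1] - u[1:-1], 0.0))
    u[0], u[-1] = K, 0.0                         # boundary values

print(u[np.searchsorted(S, 100.0)])              # American put price at S = 100
```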
 Title
 Stochastic Volatility Extensions of the Swap Market Model.
 Creator

Tzigantcheva, Milena G. (Milena Gueorguieva), Nolder, Craig, Huffer, Fred, Case, Bettye Anne, Kercheval, Alec, Quine, Jack, Sumners, De Witt, Department of Mathematics, Florida State University
 Abstract/Description

Two stochastic volatility extensions of the Swap Market Model, one with jumps and one without, are derived. In both stochastic volatility extensions of the Swap Market Model, the instantaneous volatility of the forward swap rates evolves according to a square-root diffusion process. In the jump-diffusion stochastic volatility extension of the Swap Market Model, proportional log-normal jumps are applied to the swap rate dynamics. The speed, flexibility, and accuracy of the fast fractional Fourier transform made a fast calibration to European swaption market prices possible. A specific functional form of the instantaneous swap rate volatility structure was used to match the observed evidence that the volatility of the instantaneous swap rate decreases with longer swaption maturity and with larger swaption tenors.
 Date Issued
 2008
 Identifier
 FSU_migr_etd1762
 Format
 Thesis
 Title
 Impulse Control Problems under Non-Constant Volatility.
 Creator

Moreno, Juan F. (Juan Felipe), Kercheval, Alec, Huffer, Fred, Beaumont, Paul, Nichols, Warren, Nolder, Craig, Wang, Xiaoming, Department of Mathematics, Florida State University
 Abstract/Description

The objective of this dissertation is to study impulse control problems in situations where the volatility of the underlying process is not constant. First, we explore the case where the dynamics of the underlying process are modified for a fixed (or random with known probability distribution) period of time after each intervention of the impulse control. We propose a modified intervention operator to be used in the Quasi-Variational Inequalities approach for solving impulse control problems, and we formulate and prove a verification theorem for finding the Value Function of the problem and the optimal control. Second, we use a perturbation approach to tackle impulse control problems when the volatility of the underlying process is stochastic but mean-reverting. The perturbation method permits approximation of the Value Function and the parameters of the optimal control. Finally, we present a numerical scheme to obtain solutions to impulse control problems with constant and stochastic volatility. Throughout the thesis we find explicit solutions to practical applications in financial mathematics, specifically in optimal central bank intervention in the exchange rate and in optimal dividend payment policies.
 Date Issued
 2007
 Identifier
 FSU_migr_etd2271
 Format
 Thesis
 Title
 Sensitivity Analysis of Options under Lévy Processes via Malliavin Calculus.
 Creator

Bayazit, Dervis, Nolder, Craig A., Huffer, Fred, Case, Bettye Anne, Kopriva, David, Okten, Giray, Quine, Jack, Department of Mathematics, Florida State University
 Abstract/Description

The sensitivity analysis of options is as important as pricing in option theory, since it is used for hedging strategies and hence for risk management purposes. This dissertation presents new sensitivities for options when the underlying follows an exponential Lévy process, specifically the Variance Gamma and Normal Inverse Gaussian processes. The calculation of these sensitivities is based on a finite-dimensional Malliavin calculus and the centered finite difference method via Monte Carlo simulations. We give explicit formulas that are used directly in Monte Carlo simulations. Using simulations, we show that a localized version of the Malliavin estimator outperforms others, including the centered finite difference estimator, for call and digital options under option pricing models driven by Variance Gamma and Normal Inverse Gaussian processes. In order to compare the performance of these methods, we use an inverse Fourier transform method to calculate the exact values of the sensitivities of European call and digital options written on the S&P 500 index. Our results show that a variation of the localized Malliavin calculus approach gives a robust estimator, while the convergence of the centered finite difference method in Monte Carlo simulations varies with the different Greeks and the new sensitivities that we introduce. We also discuss an approximation method for the Variance Gamma process. We introduce new random number generators for the pathwise simulations of the approximating process. We improve convergence results for a type of sensitivity by using a mixed Malliavin calculus on the increments of the approximating process.
 Date Issued
 2010
 Identifier
 FSU_migr_etd1157
 Format
 Thesis
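A sketch of the centered finite-difference benchmark the Malliavin estimators in the abstract above are compared against: a Monte Carlo delta with common random numbers. Geometric Brownian motion is used here for brevity instead of the VG/NIG models; all parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
S0, K, r, sigma, T, h = 100.0, 100.0, 0.03, 0.2, 1.0, 0.5
Z = rng.standard_normal(500_000)                 # common random numbers

def disc_payoff(s0):
    ST = s0 * np.exp((r - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * Z)
    return np.exp(-r * T) * np.maximum(ST - K, 0.0)

delta = np.mean((disc_payoff(S0 + h) - disc_payoff(S0 - h)) / (2.0 * h))
print(round(delta, 3))                           # ~0.60 for these parameters
```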
 Title
 ANOVA for Parameter-Dependent Nonlinear PDEs and Numerical Methods for the Stochastic Stokes Equations.
 Creator

Chen, Zheng, Gunzburger, Max, Huffer, Fred, Peterson, Janet, Wang, Xiaoqiang, Department of Mathematics, Florida State University
 Abstract/Description

This dissertation includes the application of analysis-of-variance (ANOVA) expansions to analyze solutions of parameter-dependent partial differential equations, and the analysis and finite element approximation of the Stokes equations with stochastic forcing terms. In the first part of the dissertation, the impact of parameter-dependent boundary conditions on the solutions of a class of nonlinear PDEs is considered. Based on the ANOVA expansions of functionals of the solutions, the effects of different parameter sampling methods on the accuracy of surrogate optimization approaches to PDE-constrained optimization are considered. The effects of the smoothness of the functional and the nonlinearity in the PDE on the decay of the higher-order ANOVA terms are studied. The concept of effective dimension is used to determine the accuracy of the ANOVA expansions. Demonstrations are given to show that whenever truncated ANOVA expansions of functionals provide accurate approximations, optimizers found through a simple surrogate optimization strategy are also relatively accurate. The effects of several parameter sampling strategies on the accuracy of the surrogate optimization method are also considered; it is found that for this sparse sampling application, the Latin hypercube sampling method has advantages over other well-known sampling methods. Although most of the results are presented and discussed in the context of surrogate optimization problems, they also apply to other settings such as stochastic ensemble methods and reduced-order modeling for nonlinear PDEs. In the second part of the dissertation, we study the numerical analysis of the Stokes equations driven by a stochastic process. The random processes we use are white noise, colored noise, and the homogeneous Gaussian process. When the process is white noise, we deal with the singularity of the matrix Green's functions in the form of mild solutions with the aid of the theory of distributions. We develop finite element methods to solve the stochastic Stokes equations. In the 2D and 3D cases, we derive error estimates for the approximate solutions. The results of numerical experiments are provided in the 2D case to demonstrate the algorithm and convergence rates. On the other hand, the singularity of the matrix Green's functions necessitates the use of the homogeneous Gaussian process. In the framework of the theory of abstract Wiener spaces, stochastic integrals with respect to the homogeneous Gaussian process can be defined on a larger space than L². With some conditions on the density function in the definition of the homogeneous Gaussian process, the matrix Green's functions have well-defined integrals. We study the probability properties of this kind of integral and simulate discretized colored noise.
 Date Issued
 2007
 Identifier
 FSU_migr_etd3851
 Format
 Thesis
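The abstract above singles out Latin hypercube sampling for the sparse parameter sampling; this is the standard construction via scipy, with arbitrary placeholder dimensions and bounds.

```python
from scipy.stats import qmc

sampler = qmc.LatinHypercube(d=4, seed=0)
unit = sampler.random(n=20)                      # 20 points in [0, 1)^4
params = qmc.scale(unit, l_bounds=[0.0] * 4, u_bounds=[2.0, 1.0, 1.0, 5.0])
# Each column of `unit` hits each of the 20 equal-probability bins exactly once:
print(sorted((unit[:, 0] * 20).astype(int)))     # the integers 0..19
```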
 Title
 A Computational Study of Ion Conductance in the KcsA K⁺ Channel Using a Nernst-Planck Model with Explicit Resident Ions.
 Creator

Jung, Yong Woon, Mascagni, Michael A., Huffer, Fred, Bowers, Philip, Klassen, Eric, Cogan, Nick, Department of Mathematics, Florida State University
 Abstract/Description

In this dissertation, we describe the biophysical mechanisms underlying the relationship between the structure and function of the KcsA K+ channel. Because of the conciseness of electrodiffusion theory and the computational advantages of a continuum approach, Nernst-Planck (NP) type models such as the Goldman-Hodgkin-Katz (GHK) and Poisson-Nernst-Planck (PNP) models have been used to describe currents in ion channels. However, the standard PNP (SPNP) model is known to be inapplicable to narrow ion channels because it cannot handle discrete ion properties. To overcome this weakness, we formulated the explicit resident ions Nernst-Planck (ERINP) model, which applies a local explicit model where the continuum model fails. We then tested the effects of the ERI Coulomb potential, the ERI induced potential, and the ERI dielectric constant on ion conductance in the ERINP model. Using the current-voltage (I-V) and current-concentration (I-C) relationships determined from the ERINP model, we discovered biologically significant information that is unobtainable from the traditional continuum model. The mathematical analysis of the K+ ion dynamics revealed a tight structure-function system with a shallow well, a deep well, and two K+ ions resident in the selectivity filter. We also demonstrated that the ERINP model not only reproduces the experimental results with a realistic set of parameters, but also reduces CPU costs.
 Date Issued
 2010
 Identifier
 FSU_migr_etd3741
 Format
 Thesis
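For reference, the Goldman-Hodgkin-Katz current equation mentioned in the abstract above, in its standard textbook form; the permeability and K+ concentrations below are illustrative, not the values used in the dissertation.

```python
import numpy as np

F, R, Temp = 96485.0, 8.314, 298.0               # SI constants

def ghk_current(V, P, z, c_in, c_out):
    """GHK current density at membrane potential V (volts); mol/m^3 concentrations."""
    xi = z * F * V / (R * Temp)
    return P * z * F * xi * (c_in - c_out * np.exp(-xi)) / (1.0 - np.exp(-xi))

print(ghk_current(V=0.05, P=1e-7, z=1, c_in=140.0, c_out=5.0))  # outward K+ current
```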
 Title
 Time-Varying Coefficient Models with ARMA-GARCH Structures for Longitudinal Data Analysis.
 Creator

Zhao, Haiyan, Niu, Xufeng, Huffer, Fred, Nolder, Craig, McGee, Dan, Department of Statistics, Florida State University
 Abstract/Description

The motivation for my research comes from the analysis of the Framingham Heart Study (FHS) data. The FHS is a long-term prospective study of cardiovascular disease in the community of Framingham, Massachusetts. The study began in 1948, and 5,209 subjects were initially enrolled. Examinations were given biennially to the study participants, and their status associated with the occurrence of disease was recorded. In this dissertation, the event we are interested in is the incidence of coronary heart disease (CHD). Covariates considered include sex, age, cigarettes per day (CSM), serum cholesterol (SCL), systolic blood pressure (SBP), and body mass index (BMI, weight in kilograms divided by height in meters squared). A review of the statistical literature indicates that the effects of the covariates on cardiovascular disease, or on death from all causes, in the Framingham study change over time. For example, the effect of SCL on cardiovascular disease decreases linearly over time. In this study, I examine the time-varying effects of the risk factors on CHD incidence. Time-varying coefficient models with ARMA-GARCH structure are developed in this research. The maximum likelihood and marginal likelihood methods are used to estimate the parameters in the proposed models. Since high-dimensional integrals are involved in the calculation of the marginal likelihood, the Laplace approximation is employed in this study. Simulation studies are conducted to evaluate the performance of these two estimation methods based on our proposed models. The Kullback-Leibler (KL) divergence and the root mean square error are employed in the simulation studies to compare the results obtained from different methods. Simulation results show that the marginal likelihood approach gives more accurate parameter estimates, but is more computationally intensive. Following the simulation study, our proposed models are applied to the Framingham Heart Study to investigate the time-varying effects of covariates with respect to CHD incidence. To specify the time-series structures of the effects of risk factors, the Bayesian Information Criterion (BIC) is used for model selection. Our study shows that the relationship between CHD and the risk factors changes over time. For males, there is a clearly decreasing linear trend in the age effect, which implies that the age effect on CHD is less significant for older patients than for younger patients. The effect of CSM stays almost the same for the first 30 years and decreases thereafter. There are slightly decreasing linear trends in the effects of both SBP and BMI. Furthermore, the coefficients of SBP are mostly positive over time, i.e., patients with higher SBP are more likely to develop CHD, as expected. For females, there is also a clearly decreasing linear trend in the age effect, while the effects of SBP and BMI on CHD are mostly positive and do not change much over time.
 Date Issued
 2010
 Identifier
 FSU_migr_etd0527
 Format
 Thesis
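The two comparison criteria named in the abstract above, in simple reference form: the KL divergence (here the closed form for two univariate Gaussians, a common yardstick, not necessarily the exact densities compared in the thesis) and the root mean square error. Inputs are arbitrary examples.

```python
import numpy as np

def kl_normal(mu1, s1, mu2, s2):
    # KL( N(mu1, s1^2) || N(mu2, s2^2) ) in closed form.
    return np.log(s2 / s1) + (s1**2 + (mu1 - mu2) ** 2) / (2.0 * s2**2) - 0.5

def rmse(estimates, truth):
    e, t = np.asarray(estimates), np.asarray(truth)
    return np.sqrt(np.mean((e - t) ** 2))

print(kl_normal(0.0, 1.0, 0.1, 1.2), rmse([1.1, 0.9, 1.2], [1.0, 1.0, 1.0]))
```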
 Title
 Estimation from Data Representing a Sample of Curves.
 Creator

Auguste, Anna L., Bunea, Florentina, Mason, Patrick, Hollander, Myles, Huffer, Fred, Department of Statistics, Florida State University
 Abstract/Description

This dissertation introduces and assesses an algorithm to generate confidence bands for a regression function or a main effect when multiple data sets are available. In particular, it proposes to construct confidence bands for different trajectories and then aggregate these to produce an overall confidence band for a mean function. An estimator of the regression function or main effect is also examined. First, nonparametric estimators and confidence bands are formed on each data set separately. Then each data set is in turn treated as a testing set for aggregating the preliminary results from the remaining data sets. The criterion used for this aggregation is either the least squares (LS) criterion or a BIC-type penalized LS criterion. The proposed estimator is the average over data sets of these aggregates. It is thus a weighted sum of the preliminary estimators. The proposed confidence band is the minimum L1 band of all M aggregate bands when there is only a main effect. In the case where there is some random effect, we suggest an adjustment to the confidence band; the proposed confidence band is then the minimum L1 band of all M adjusted aggregate bands. Desirable asymptotic properties are shown to hold. A simulation study examines the performance of each technique relative to several alternative methods and theoretical benchmarks. An application to seismic data is conducted.
 Date Issued
 2006
 Identifier
 FSU_migr_etd0286
 Format
 Thesis
 Title
 Investigating the Categories for Cholesterol and Blood Pressure for Risk Assessment of Death Due to Coronary Heart Disease.
 Creator

Franks, Billy J., McGee, Daniel, Hurt, Myra, Huffer, Fred, Niu, Xufeng, Department of Statistics, Florida State University
 Abstract/Description

Many characteristics for predicting death due to coronary heart disease are measured on a continuous scale. These characteristics, however, are often categorized for clinical use and to aid in treatment decisions. We would like to derive a systematic approach to determining the best categorizations of systolic blood pressure and cholesterol level for use in identifying individuals who are at high risk of death due to coronary heart disease, and to compare these data-derived categories to those in common usage. Whatever categories are chosen, they should allow physicians to accurately estimate the probability of survival from coronary heart disease until some time t. The best categories will be those that provide the most accurate prediction of an individual's risk of dying by time t. The approach used to determine these categories is a version of Classification and Regression Trees that can be applied to censored survival data. The major goals of this dissertation are to obtain data-derived categories for risk assessment, compare these categories to the ones already recommended in the medical community, and assess the performance of these categories in predicting survival probabilities.
 Date Issued
 2005
 Identifier
 FSU_migr_etd4402
 Format
 Thesis
 Title
 Riemannian Shape Analysis of Curves and Surfaces.
 Creator

Kurtek, Sebastian, Srivastava, Anuj, Klassen, Eric, Wu, Wei, Huffer, Fred, Dryden, Ian, Department of Statistics, Florida State University
 Abstract/Description

Shape analysis of curves and surfaces is a very important tool in many applications, ranging from computer vision to bioinformatics and medical imaging. There are many difficulties in analyzing shapes of parameterized curves and surfaces. First, it is important to develop representations and metrics such that the analysis is invariant to parameterization in addition to the standard transformations (rigid motion and scaling). Furthermore, under the chosen representations and metrics, the analysis must be performed on infinite-dimensional and sometimes nonlinear spaces, which poses an additional difficulty. In this work, we develop and apply methods that address these issues. We begin by defining a framework for shape analysis of parameterized open curves and extend these ideas to shape analysis of surfaces. We utilize the presented frameworks in various classification experiments spanning multiple application areas. In the case of curves, we consider the problems of clustering DT-MRI brain fibers, classification of protein backbones, modeling and segmentation of signatures, and statistical analysis of biosignals. In the case of surfaces, we perform disease classification using 3D anatomical structures in the brain, classification of handwritten digits by viewing images as quadrilateral surfaces, and finally classification of cropped facial surfaces. We provide two additional extensions of the general shape analysis frameworks that are the focus of this dissertation. The first considers shape analysis of marked spherical surfaces, where in addition to the surface information we are given a set of manually or automatically generated landmarks. This requires additional constraints on the definition of the reparameterization group and is applicable in many domains, especially medical imaging and graphics. Second, we consider reflection symmetry analysis of planar closed curves and spherical surfaces. Here, we also provide an example of disease detection based on brain asymmetry measures. We close with a brief summary and a discussion of open problems, which we plan to explore in the future.
 Date Issued
 2012
 Identifier
 FSU_migr_etd4963
 Format
 Thesis
 Title
 A Riemannian Framework for Annotated Curves Analysis.
 Creator

Liu, Wei, Srivastava, Anuj, Zhang, Jinfeng, Klassen, Eric P., Huffer, Fred, Department of Statistics, Florida State University
 Abstract/Description

We propose a Riemannian framework for shape analysis of annotated curves: curves that have certain attributes defined along them, in addition to their geometries. These attributes may be in the form of vector-valued functions, discrete landmarks, or symbolic labels, and provide auxiliary information along the curves. The resulting shape analysis, that is, comparing, matching, and deforming, is naturally influenced by the auxiliary functions. Our idea is to construct curves in higher dimensions using both geometric and auxiliary coordinates, and to analyze the shapes of these curves. The difficulty comes from the need to remove different groups from different components: the shape is invariant to rigid motion, global scale, and reparameterization, while the auxiliary component is usually invariant only to reparameterization. Thus, the removal of some transformations (rigid motion and global scale) is restricted to the geometric coordinates, while the reparameterization group is removed from all coordinates. We demonstrate this framework with a number of experiments.
 Date Issued
 2011
 Identifier
 FSU_migr_etd4997
 Format
 Thesis
 Title
 A Probabilistic and Graphical Analysis of Evidence in O.J. Simpson's Murder Case Using Bayesian Networks.
 Creator

Olumide, Kunle, Huffer, Fred, Shute, Valerie, Sinha, Debajyoti, Niu, Xufeng, Logan, Wayne, Department of Statistics, Florida State University
 Abstract/Description

This research work is an attempt to illustrate the versatility and wide applicability of statistical science. Specifically, it involves the application of statistics in the field of law. The application focuses on the subfields of evidence and criminal law, using one of the most celebrated cases in the history of American jurisprudence: the 1994 O.J. Simpson murder case in California. Our task here is to carry out a probabilistic and graphical analysis of the body of evidence in this case using Bayesian networks. We begin the analysis by first constructing our main hypothesis regarding the guilt or non-guilt of the accused; this main hypothesis is supplemented by a series of ancillary hypotheses. Using graphs and probability concepts, we evaluate the probative force, or strength, of the evidence and how well the body of evidence at hand proves our main hypothesis. We employ Bayes' rule, likelihoods, and likelihood ratios to carry out this evaluation. Some sensitivity analyses are carried out by varying the degree of our prior beliefs, or probabilities, and evaluating the effect of such variations on the likelihood ratios regarding our main hypothesis.
 Date Issued
 2010
 Identifier
 FSU_migr_etd2287
 Format
 Thesis
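The engine behind the evidence evaluation described in the abstract above is Bayes' rule in odds form: posterior odds = prior odds x likelihood ratio. The prior and the two likelihood ratios below are invented for illustration, not the probabilities used in the dissertation.

```python
prior_p = 0.5
likelihood_ratios = [40.0, 0.8]          # hypothetical items of evidence

odds = prior_p / (1.0 - prior_p)
for lr in likelihood_ratios:             # items assumed conditionally independent
    odds *= lr
posterior_p = odds / (1.0 + odds)
print(round(posterior_p, 4))             # 0.9697
```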
 Title
 Sparse Factor AutoRegression for Forecasting Macroeconomic Time Series with Very Many Predictors.
 Creator

Galvis, Oliver Kurt, She, Yiyuan, Okten, Giray, Beaumont, Paul, Huffer, Fred, Tao, Minjing, Department of Statistics, Florida State University
 Abstract/Description

Forecasting a univariate target time series in high dimensions with very many predictors poses challenges in statistical learning and modeling. First, many nuisance time series exist and need to be removed. Second, from economic theory, a macroeconomic target series is typically driven by a few latent factors constructed from macroeconomic indices. Consequently, a high-dimensional problem arises in which deleting junk time series and constructing predictive factors simultaneously is meaningful and advantageous for the accuracy of the forecasting task. In macroeconomics, multiple categories are available, with the target series belonging to one of them. With all series available, we advocate constructing category-level factors to enhance the performance of the forecasting task. We introduce a novel methodology, Sparse Factor AutoRegression (SFAR), to construct predictive factors from a reduced set of relevant time series. SFAR attains dimension reduction via joint variable selection and rank reduction in high-dimensional time series data. A multivariate setting is used to achieve simultaneous low-rank and cardinality control on the matrix of coefficients, where the $\ell_0$ constraint regulates the number of useful series and the rank constraint sets the upper bound on the number of constructed factors. The doubly-constrained estimation is a non-convex mathematical problem, optimized via an efficient iterative algorithm with a theoretical guarantee of convergence. SFAR fits factors using a sparse low-rank matrix in response to a target category series. Forecasting is then performed using lagged observations and shrinkage methods. We generate finite-sample data to verify our theoretical findings via a comparative study of SFAR. We also analyze real-world macroeconomic time series data to demonstrate the use of SFAR in practice.
 Date Issued
 2014
 Identifier
 FSU_migr_etd8990
 Format
 Thesis
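A sketch of the two constraints SFAR combines, shown in their simplest projected forms: keep the q rows of the coefficient matrix with the largest norms (the $\ell_0$, variable-screening part) and truncate the SVD to rank k (the factor part). The actual SFAR algorithm iterates such projections with fitting steps; the matrix below is random, purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
B = rng.standard_normal((30, 5))         # 30 candidate series, 5 responses
q, k = 10, 2

keep = np.argsort(np.linalg.norm(B, axis=1))[-q:]
B_sparse = np.zeros_like(B)
B_sparse[keep] = B[keep]                 # cardinality control (row sparsity)

U, s, Vt = np.linalg.svd(B_sparse, full_matrices=False)
B_hat = (U[:, :k] * s[:k]) @ Vt[:k]      # best rank-k approximation
print(np.linalg.matrix_rank(B_hat), int(B_hat.any(axis=1).sum()))  # 2 10
```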
 Title
 Estimating the Probability of Cardiovascular Disease: A Comparison of Methods.
 Creator

Fan, Li, McGee, Daniel, Hurt, Myra, Niu, XuFeng, Huffer, Fred, Department of Statistics, Florida State University
 Abstract/Description

Risk prediction plays an important role in clinical medicine. It not only helps in educating patients to improve their lifestyle and in targeting individuals at high risk, but also guides treatment decisions. So far, various instruments have been used for risk assessment in different countries, and the risk predictions from these different models are not consistent. For public use, a reliable risk prediction is necessary. This thesis discusses the models that have been developed for risk assessment and evaluates the performance of prediction at two levels: the overall level and the individual level. At the overall level, cross-validation and simulation are used to assess the risk prediction, while at the individual level, the "parametric bootstrap" and the delta method are used to evaluate the uncertainty of the individual risk prediction. Further exploration of the reasons producing different performance among the models is ongoing.
 Date Issued
 2009
 Identifier
 FSU_migr_etd4508
 Format
 Thesis
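A sketch of the parametric bootstrap used at the individual level in the abstract above: redraw the logistic coefficients from their estimated sampling distribution and recompute the predicted risk. The coefficients, covariance, and covariates below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(6)
beta_hat = np.array([-6.0, 0.03, 0.02])          # intercept, age, SBP (hypothetical)
cov_hat = np.diag([0.25, 1e-5, 4e-6])            # estimated coefficient covariance
x = np.array([1.0, 55.0, 140.0])                 # one individual's covariates

draws = rng.multivariate_normal(beta_hat, cov_hat, size=10_000)
risks = 1.0 / (1.0 + np.exp(-draws @ x))         # risk under each redrawn model
print(np.percentile(risks, [2.5, 97.5]))         # individual-level 95% interval
```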
 Title
 Adaptive Series Estimators for Copula Densities.
 Creator

Gui, Wenhao, Wegkamp, Marten, Van Engelen, Robert A., Niu, Xufeng, Huffer, Fred, Department of Statistics, Florida State University
 Abstract/Description

In this thesis, based on an orthonormal series expansion, we propose a new nonparametric method to estimate copula density functions. Since the basis coefficients turn out to be expectations, empirical averages are used to estimate these coefficients. We propose estimators of the variance of the estimated basis coefficients and establish their consistency. We derive the asymptotic distribution of the estimated coefficients under mild conditions. We derive a simple oracle inequality for the copula density estimator based on a finite series using the estimated coefficients. We propose a stopping rule for selecting the number of coefficients used in the series, and we prove that this rule minimizes the mean integrated squared error. In addition, we consider hard and soft thresholding techniques for sparse representations. We obtain oracle inequalities that hold with prescribed probability for various norms of the difference between the copula density and our thresholded series density estimator. Uniform confidence bands are derived as well. The oracle inequalities clearly reveal that our estimator adapts to the unknown degree of sparsity of the series representation of the copula density. A simulation study indicates that our method is extremely easy to implement and works very well, and that it compares favorably to the popular kernel-based copula density estimator, especially around the boundary points, in terms of mean squared error. Finally, we apply our method to an insurance data set. After comparing our method with previous data analyses, we reach the same conclusion as the parametric methods in the literature, and as such we provide additional justification for the use of the developed parametric model.
 Date Issued
 2009
 Identifier
 FSU_migr_etd3929
 Format
 Thesis
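The core mechanic of the series method in the abstract above: for an orthonormal basis {phi_j} on [0, 1], the copula density coefficients are expectations c_jk = E[phi_j(U) phi_k(V)], estimated by empirical averages. Shifted Legendre polynomials are one convenient orthonormal choice (not necessarily the basis used in the thesis); the (u, v) sample here is a dependent toy pair.

```python
import numpy as np
from numpy.polynomial import legendre

rng = np.random.default_rng(7)
n, J = 5000, 4
u = rng.uniform(size=n)
v = np.clip(u + 0.1 * rng.standard_normal(n), 0.0, 1.0)   # dependent toy pairs

def phi(j, x):
    c = np.zeros(j + 1)
    c[j] = 1.0                                   # select the j-th Legendre polynomial
    return np.sqrt(2 * j + 1) * legendre.legval(2.0 * x - 1.0, c)

C = np.array([[np.mean(phi(j, u) * phi(k, v)) for k in range(J)]
              for j in range(J)])                # estimated basis coefficients
copula_density = lambda s, t: sum(C[j, k] * phi(j, s) * phi(k, t)
                                  for j in range(J) for k in range(J))
print(round(float(copula_density(0.5, 0.5)), 3))
```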
 Title
 Bayesian Generalized Polychotomous Response Models and Applications.
 Creator

Yang, Fang, Niu, XuFeng, Johnson, Suzanne B., McGee, Dan, Huffer, Fred, Department of Statistics, Florida State University
 Abstract/Description

Polychotomous quantal response models are widely used in medical and econometric studies to analyze categorical or ordinal data. In this study, we apply Bayesian methodology through a mixed-effects polychotomous quantal response model. For the Bayesian polychotomous quantal response model, we assume uniform improper priors for the regression coefficients and explore sufficient conditions for a proper joint posterior distribution of the parameters in the models. Simulation results from Gibbs sampling estimates are compared to traditional maximum likelihood estimates to show the strength of using uniform improper priors for the regression coefficients. Motivated by an investigation of the relationship between BMI categories and several risk factors, we carry out application studies to examine the impact of risk factors on BMI categories, especially the categories "Overweight" and "Obesity". Applying the mixed-effects Bayesian polychotomous response model with uniform improper priors, we obtain interpretations of the association between risk factors and BMI similar to the findings in the literature.
 Date Issued
 2010
 Identifier
 FSU_migr_etd1092
 Format
 Thesis
 Title
 A Statistical Approach for Information Extraction of Biological Relationships.
 Creator

Bell, Lindsey R., Zhang, Jinfeng, Niu, Xufeng, Tyson, Gary, Huffer, Fred, Department of Statistics, Florida State University
 Abstract/Description

Vast amounts of biomedical information are stored in the scientific literature, easily accessed through publicly available databases. Relationships among biomedical terms constitute a major part of our biological knowledge. Acquiring such structured information from unstructured literature can be done through human annotation, but this is time- and resource-consuming. As this content continues to grow rapidly, the popularity and importance of text mining for obtaining information from unstructured text become increasingly evident. Text mining has four major components. First, relevant articles are identified through information retrieval (IR); next, important concepts and terms are flagged using entity recognition (ER); and then relationships between these entities are extracted from the literature in a process called information extraction (IE). Finally, text mining takes these elements and seeks to synthesize new information from the literature. Our goal is information extraction from unstructured literature concerning biological entities. To do this, we use the structure of triplets, where each triplet contains two biological entities and one interaction word. The biological entities may include terms such as protein names, disease names, genes, and small molecules. Interaction words describe the relationship between the biological terms. Under this framework we aim to combine the strengths of three classifiers in an ensemble approach. The three classifiers we consider are Bayesian networks, support vector machines, and a mixture of logistic models defined by interaction word. The three classifiers and the ensemble approach are evaluated on three benchmark corpora and one corpus that is introduced in this study. The evaluation includes cross-validation and cross-corpus validation to replicate an application scenario. The three classifiers are distinct, and we find that the performance of individual classifiers varies depending on the corpus. An ensemble of classifiers therefore removes the need to choose one classifier and provides optimal performance.
 Date Issued
 2011
 Identifier
 FSU_migr_etd1314
 Format
 Thesis
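A generic sketch of the ensemble step described in the abstract above: average the members' predicted probabilities that a candidate triplet expresses a true relationship. The binary features are random stand-ins for real triplet features, and naive Bayes stands in for the Bayesian-network member.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import BernoulliNB
from sklearn.svm import SVC

rng = np.random.default_rng(8)
X = rng.integers(0, 2, size=(300, 20)).astype(float)   # toy triplet features
y = (X[:, 0] + X[:, 1] > 0).astype(int)                # toy relationship labels

members = [BernoulliNB(), SVC(probability=True), LogisticRegression(max_iter=1000)]
probs = [m.fit(X, y).predict_proba(X)[:, 1] for m in members]
ensemble = np.mean(probs, axis=0)                      # averaged vote
print(((ensemble > 0.5) == y).mean())                  # training agreement, toy data
```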
 Title
 Variable Selection of Correlated Predictors in Logistic Regression: Investigating the Diet-Heart Hypothesis.
 Creator

Thompson, Warren R. (Warren Robert), McGee, Daniel, Eberstein, Isaac, Huffer, Fred, Sinha, Debajyoti, She, Yiyuan, Department of Statistics, Florida State University
 Abstract/Description

Variable selection is an important aspect of modeling. Its aim is to distinguish between the authentic variables, which are important in predicting the outcome, and the noise variables, which possess little to no predictive value. In other words, the goal is to find the variables that collectively best explain and predict changes in the outcome variable. The variable selection problem is exacerbated when correlated variables are included in the covariate set. This dissertation examines the variable selection problem in the context of logistic regression. Specifically, we investigated the merits of the bootstrap, ridge regression, the lasso, and Bayesian model averaging (BMA) as variable selection techniques when highly correlated predictors and a dichotomous outcome are considered. This dissertation also contributes to the literature on the diet-heart hypothesis. The diet-heart hypothesis has been around since the early twentieth century. Since then, researchers have attempted to isolate the nutrients in diet that promote coronary heart disease (CHD). After a century of research, there is still no consensus. In our research, we used some of the more recent statistical methodologies mentioned above to investigate the effect of twenty dietary variables on the incidence of coronary heart disease. Logistic regression models were generated for data from the Honolulu Heart Program, a study of CHD incidence in men of Japanese descent. Our results were largely method-specific. However, regardless of the method considered, there was strong evidence to suggest that alcohol consumption has a strong protective effect on the risk of coronary heart disease. Of the variables considered, dietary cholesterol and caffeine were the only ones that, at best, exhibited a moderately strong harmful association with CHD incidence. Further investigation that includes a broader array of food groups is recommended.
 Date Issued
 2009
 Identifier
 FSU_migr_etd1360
 Format
 Thesis
 Title
 The Frequentist Performance of Some Bayesian Confidence Intervals for the Survival Function.
 Creator

Tao, Yingfeng, Huffer, Fred, Okten, Giray, Sinha, Debajyoti, Niu, Xufeng, Department of Statistics, Florida State University
 Abstract/Description

Estimation of a survival function is a very important topic in survival analysis, with contributions from many authors. This dissertation considers estimation of confidence intervals for the survival function based on right-censored or interval-censored survival data. Most of the methods for estimating pointwise confidence intervals and simultaneous confidence bands for the survival function are reviewed in this dissertation. In the right-censored case, almost all confidence intervals are based in some way on the Kaplan-Meier estimator, first proposed by Kaplan and Meier (1958) and widely used as the nonparametric estimator in the presence of right-censored data. For interval-censored data, the Turnbull estimator (Turnbull, 1974) plays a similar role. For a class of Bayesian models involving Dirichlet priors, Doss and Huffer (2003) suggested several simulation techniques to approximate the posterior distribution of the survival function using Markov chain Monte Carlo or sequential importance sampling. These techniques lead to probability intervals for the survival function (at arbitrary time points) and its quantiles for both the right-censored and interval-censored cases. This dissertation examines the frequentist properties and general performance of these probability intervals when the prior is noninformative. Simulation studies are used to compare these probability intervals with other published approaches. Extensions of the Doss-Huffer approach are given for constructing simultaneous confidence bands for the survival function and for computing approximate confidence intervals for the survival function based on Edgeworth expansions using posterior moments. The performance of these extensions is studied by simulation.
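For reference, a minimal NumPy sketch of the frequentist baseline mentioned above: the Kaplan-Meier estimator with Greenwood-formula pointwise intervals. The data are illustrative, and interval censoring is not handled:

    import numpy as np

    def km_greenwood(times, events, z=1.96):
        # events: 1 = event observed, 0 = right-censored
        t = np.asarray(times, float)
        d = np.asarray(events, int)
        S, gw, out = 1.0, 0.0, []
        for u in np.unique(t[d == 1]):           # distinct event times, ascending
            at_risk = np.sum(t >= u)
            deaths = np.sum((t == u) & (d == 1))
            S *= 1.0 - deaths / at_risk          # Kaplan-Meier product step
            gw += deaths / (at_risk * (at_risk - deaths))
            se = S * np.sqrt(gw)                 # Greenwood standard error
            out.append((u, S, max(S - z * se, 0.0), min(S + z * se, 1.0)))
        return out

    for row in km_greenwood([2, 3, 3, 5, 7, 8, 8, 9], [1, 1, 0, 1, 1, 0, 1, 0]):
        print(row)   # (time, S(t), lower, upper)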
 Date Issued
 2013
 Identifier
 FSU_migr_etd7624
 Format
 Thesis
 Title
 A Method for Finding the Nadir of Non-Monotonic Relationships.
 Creator

Tan, Fei, McGee, Daniel, Lloyd, Donald, Huffer, Fred, Niu, Xufeng, Dutton, Gareth, Department of Statistics, Florida State University
 Abstract/Description

Different methods have been proposed to model the J-shaped or U-shaped relationship between a risk factor and mortality so that the optimal risk-factor value (nadir) associated with the lowest mortality can be estimated. The basic model considered is the Cox proportional hazards model. Current methods include a quadratic method, a method with transformation, fractional polynomials, a change point method, and fixed-knot spline regression. The quadratic method contains both the linear and the quadratic terms of the risk factor; it is simple, but it often generates unrealistic nadir estimates. The transformation method converts the original risk factor so that after transformation it has a normal distribution, but this may not work when there is no good transformation to normality. Fractional polynomials are an extended class of regular polynomials that apply negative and fractional powers to the risk factor. Compared with the quadratic method or the transformation method, they do not always have a good model interpretation, and inferences about them do not incorporate the uncertainty coming from the preselection of powers and degree. A change point method models the prognostic index using two pieces of upward quadratic functions that meet at their common nadir. This method assumes the knot and the nadir are the same, which is not always true. Fixed-knot spline regression has also been used to model nonlinear prognostic indices, but its inference does not account for variation arising from knot selection. Here we consider spline regressions with free knots, a natural generalization of the quadratic, the change point, and the fixed-knot spline methods. They can be applied to risk factors that do not have a good transformation to normality while keeping intuitive model interpretations. Asymptotic normality and consistency of the maximum partial likelihood estimators are established under certain conditions. When the conditions are not satisfied, simulations are used to explore the asymptotic properties. The new method is motivated by and applied to nadir estimation in the non-monotonic relationship between BMI (body mass index) and all-cause mortality. Its performance is compared with that of existing methods, using criteria of nadir estimation ability and goodness of fit.
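A minimal sketch of the simplest method described above, the quadratic method: with linear and quadratic coefficients b1 and b2, the nadir is -b1/(2*b2). The Python/statsmodels example below uses a logistic rather than a Cox model and simulated U-shaped data, both illustrative simplifications:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    bmi = rng.uniform(18, 40, 2000)
    # simulate U-shaped mortality risk with a true nadir at BMI = 25
    logit = -3 + 0.02 * (bmi - 25) ** 2
    death = (rng.random(2000) < 1 / (1 + np.exp(-logit))).astype(int)

    X = sm.add_constant(np.column_stack([bmi, bmi ** 2]))
    fit = sm.Logit(death, X).fit(disp=0)
    b0, b1, b2 = fit.params
    print("estimated nadir:", -b1 / (2 * b2))   # close to 25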
 Date Issued
 2007
 Identifier
 FSU_migr_etd1719
 Format
 Thesis
 Title
 A Class of Mixed-Distribution Models with Applications in Financial Data Analysis.
 Creator

Tang, Anqi, Niu, Xufeng, Cheng, Yingmei, Wu, Wei, Huffer, Fred, Department of Statistics, Florida State University
 Abstract/Description

Statisticians often encounter data in the form of a combination of discrete and continuous outcomes. A special case is zero-inflated longitudinal data, where the response variable has a large portion of zeros. These data exhibit correlation because observations are obtained on the same subjects over time. In this dissertation, we propose a two-part mixed-distribution model for zero-inflated longitudinal data. The first part of the model is a logistic regression model for the probability of a nonzero response; the other part is a linear model for the mean response given that the outcome is not zero. Random effects with an AR(1) covariance structure are introduced into both parts of the model to allow for serial correlation and subject-specific effects. Estimating the two-part model is challenging because of the high-dimensional integration necessary to obtain the maximum likelihood estimates. We propose a Monte Carlo EM algorithm for computing the maximum likelihood estimates of the parameters. Through a simulation study, we demonstrate the good performance of the MCEM method in parameter and standard error estimation. To illustrate, we apply the two-part model with correlated random effects and the model with autoregressive random effects to executive compensation data to investigate potential determinants of CEO stock option grants.
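A minimal sketch of the two-part idea without the random effects, in Python/statsmodels; the simulated cross-sectional data are illustrative, and the AR(1) random effects and Monte Carlo EM step that make the real model challenging are deliberately omitted:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    n = 1000
    x = rng.standard_normal(n)
    nonzero = rng.random(n) < 1 / (1 + np.exp(-(0.5 + x)))   # part 1: occurrence
    y = np.where(nonzero, 2.0 + 1.5 * x + rng.standard_normal(n), 0.0)

    X = sm.add_constant(x)
    part1 = sm.Logit(nonzero.astype(int), X).fit(disp=0)     # P(y != 0)
    part2 = sm.OLS(y[nonzero], X[nonzero]).fit()             # E[y | y != 0]
    print(part1.params, part2.params)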
 Date Issued
 2011
 Identifier
 FSU_migr_etd1710
 Format
 Thesis
 Title
 Optimal Linear Representations of Images under Diverse Criteria.
 Creator

Rubinshtein, Evgenia, Srivastava, Anuj, Liu, Xiuwen, Huffer, Fred, Chicken, Eric, Department of Statistics, Florida State University
 Abstract/Description

Image analysis often requires dimension reduction before statistical analysis in order to apply sophisticated procedures. Motivated by eventual applications, a variety of criteria have been proposed: reconstruction error, class separation, non-Gaussianity measured by kurtosis, sparseness, mutual information, recognition of objects, and their combinations. Although some criteria have analytical solutions, the remaining ones require numerical approaches. We present geometric tools for finding linear projections that optimize a given criterion for a given data set. The main idea is to formulate a problem of optimization on a Grassmann or a Stiefel manifold, and to use the differential geometry of the underlying space to construct optimization algorithms. Purely deterministic updates lead to local solutions, while the addition of random components allows for stochastic gradient searches that eventually lead to global solutions. We demonstrate these results using several image datasets, including natural images and facial images.
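A minimal NumPy sketch of a deterministic gradient search on the Stiefel manifold, using the simplest criterion above (reconstruction error, i.e., PCA) and a QR retraction; the criterion, step size, and retraction are illustrative choices, not the dissertation's algorithms:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.standard_normal((500, 10))
    C = X.T @ X / 500                      # sample covariance of image features
    k, step = 2, 0.1
    U, _ = np.linalg.qr(rng.standard_normal((10, k)))   # start on the manifold
    for _ in range(300):
        G = 2 * C @ U                      # Euclidean gradient of trace(U' C U)
        G -= U @ (U.T @ G)                 # drop the component inside span(U)
        U, _ = np.linalg.qr(U + step * G)  # QR retraction back onto the manifold
    # the span of U should now match the top-k principal subspace:
    print(np.trace(U.T @ C @ U) / np.linalg.eigvalsh(C)[-k:].sum())   # close to 1

Adding a random perturbation to G at each step turns this local search into the stochastic version described above.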
 Date Issued
 2006
 Identifier
 FSU_migr_etd1926
 Format
 Thesis
 Title
 A Bayesian Approach to Meta-Regression: The Relationship Between Body Mass Index and All-Cause Mortality.
 Creator

Marker, Mahtab, McGee, Dan, Hurt, Myra, Niu, Xiufeng, Huffer, Fred, Department of Statistics, Florida State University
 Abstract/Description

This thesis presents a Bayesian approach to Meta-Regression and Individual Patient Data (IPD) Meta-analysis. The focus of the research is on establishing the relationship between Body Mass Index (BMI) and all-cause mortality. This has been an area of continuing interest in the medical and public health communities, and no consensus has been reached on what the optimal weight for individuals is. Standards are usually specified in terms of body mass index (BMI = weight(kg)/height(m)^2), which is associated with body fat percentage. Many studies in the literature have modelled the relationship between BMI and mortality and reported a variety of relationships, including U-shaped, J-shaped, and linear curves. The aim of my research was to use statistical methods to determine whether we can combine these diverse results and obtain a single estimated relationship, from which one can find the point of minimum mortality and establish reasonable ranges for optimal BMI, or, failing that, how we can best examine the reasons for the heterogeneity of results. Commonly used techniques of Meta-analysis and Meta-regression are explored, and a problem with the estimation procedure in the multivariate setting is presented. A Bayesian approach using a Hierarchical Generalized Linear Mixed Model is suggested and implemented to overcome this drawback of standard estimation techniques. Another area which is explored briefly is that of Individual Patient Data meta-analysis. A Frailty model, or Random Effects Proportional Hazards Survival model, approach is proposed to carry out IPD meta-regression and arrive at a single estimated relationship between BMI and mortality, adjusting for the variation between studies.
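For orientation, a minimal NumPy sketch of standard random-effects meta-analysis via the DerSimonian-Laird estimator, the frequentist counterpart of the Bayesian pooling proposed here; the study-level estimates and variances are made-up numbers:

    import numpy as np

    theta = np.array([0.12, 0.30, -0.05, 0.22])   # study-level effects (illustrative)
    v = np.array([0.010, 0.020, 0.015, 0.030])    # their sampling variances
    w = 1 / v
    theta_fixed = np.sum(w * theta) / np.sum(w)
    Q = np.sum(w * (theta - theta_fixed) ** 2)    # heterogeneity statistic
    k = len(theta)
    tau2 = max(0.0, (Q - (k - 1)) / (np.sum(w) - np.sum(w ** 2) / np.sum(w)))
    w_star = 1 / (v + tau2)                       # random-effects weights
    theta_re = np.sum(w_star * theta) / np.sum(w_star)
    se_re = np.sqrt(1 / np.sum(w_star))
    print(theta_re, se_re)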
 Date Issued
 2007
 Identifier
 FSU_migr_etd2736
 Format
 Thesis
 Title
 Stochastic Models and Inferences for Commodity Futures Pricing.
 Creator

Ncube, Moeti M., Srivastava, Anuj, Doran, James, Mason, Patrick, Niu, Xufeng, Huffer, Fred, Wu, Wei, Department of Statistics, Florida State University
 Abstract/Description

The stochastic modeling of financial assets is essential to the valuation of financial products and to investment decisions. These models are governed by certain parameters that are estimated through a process known as calibration. Current procedures typically perform a grid-search optimization of a given objective function over a specified parameter space. These methods can be computationally intensive and require restrictions on the parameter space to achieve timely convergence. In this thesis, we propose an alternative Kalman Smoother Expectation Maximization procedure (KSEM) that can jointly estimate all the parameters and produces a better model fit than alternative estimation procedures. Further, we consider the additional complexity of modeling jumps or spikes that may occur in a time series. For this calibration we develop a Particle Smoother Expectation Maximization procedure (PSEM) for the optimization of nonlinear systems. This is an entirely new estimation approach, and we provide several examples of its application.
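A minimal NumPy sketch of the E-step machinery behind a KSEM-style procedure: a scalar Kalman filter followed by a Rauch-Tung-Striebel smoother. The AR(1) state model is an illustrative stand-in for the commodity-price dynamics, and the M-step parameter updates are omitted:

    import numpy as np

    def kalman_rts(y, phi, q, r, m0=0.0, p0=1.0):
        # filter for x_t = phi x_{t-1} + N(0, q), observed as y_t = x_t + N(0, r)
        n = len(y)
        m_p, p_p = np.zeros(n), np.zeros(n)   # one-step predictions
        m_f, p_f = np.zeros(n), np.zeros(n)   # filtered estimates
        m, p = m0, p0
        for t in range(n):
            m_p[t], p_p[t] = phi * m, phi ** 2 * p + q
            k = p_p[t] / (p_p[t] + r)         # Kalman gain
            m = m_p[t] + k * (y[t] - m_p[t])
            p = (1 - k) * p_p[t]
            m_f[t], p_f[t] = m, p
        m_s, p_s = m_f.copy(), p_f.copy()     # backward RTS smoothing pass
        for t in range(n - 2, -1, -1):
            c = phi * p_f[t] / p_p[t + 1]
            m_s[t] = m_f[t] + c * (m_s[t + 1] - m_p[t + 1])
            p_s[t] = p_f[t] + c ** 2 * (p_s[t + 1] - p_p[t + 1])
        return m_s, p_s

    rng = np.random.default_rng(0)
    x, ys = 0.0, []
    for _ in range(100):
        x = 0.9 * x + rng.normal(scale=0.5)
        ys.append(x + rng.normal(scale=1.0))
    m_s, p_s = kalman_rts(np.array(ys), phi=0.9, q=0.25, r=1.0)

In an EM iteration, the smoothed moments m_s and p_s would feed closed-form updates of phi, q, and r.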
 Date Issued
 2009
 Identifier
 FSU_migr_etd2707
 Format
 Thesis
 Title
 Spatiotemporal Bayesian Hierarchical Models, with Application to Birth Outcomes.
 Creator

Norton, Jonathan D. (Jonathan David), Niu, Xufeng, Eberstein, Isaac, Huffer, Fred, McGee, Daniel, Department of Statistics, Florida State University
 Abstract/Description

A class of hierarchical Bayesian models is introduced for adverse birth outcomes, such as preterm birth, which are assumed to follow a conditional binomial distribution. The log-odds of an adverse outcome in a particular county, logit(p(i)), follows a linear model which includes observed covariates and normally distributed random effects. Spatial dependence between neighboring regions is allowed for by including an intrinsic autoregressive (IAR) prior or an IAR convolution prior in the linear predictor. Temporal dependence is incorporated by also including a temporal IAR term. It is shown that the variance parameters underlying these random effects (IAR, convolution, convolution plus temporal IAR) are identifiable. The same results are shown to hold when the IAR is replaced by a conditional autoregressive (CAR) model. Furthermore, properties of the CAR parameter ρ are explored. The Deviance Information Criterion (DIC) is considered as a way to compare spatial hierarchical models. Simulations are performed to test whether the DIC can identify whether binomial outcomes come from an IAR, an IAR convolution, or independent normal deviates. Having established the theoretical foundations of the class of models and validated the DIC as a means of comparing models, we examine preterm birth and low birth weight counts in the state of Arkansas from 1994 to 2005. We find that preterm birth and low birth weight have different spatial patterns of risk, and that rates of low birth weight can be fit with a strikingly simple model that includes a constant spatial effect for all periods, a linear trend, and three covariates. It is also found that the risks of each outcome are increasing over time, even after adjustment for covariates.
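A minimal NumPy sketch of how the IAR and CAR priors above are built from a neighborhood structure; the four-region chain is an illustrative map, not the Arkansas counties:

    import numpy as np

    # adjacency for a chain of 4 regions: 0-1-2-3 (illustrative)
    A = np.array([[0, 1, 0, 0],
                  [1, 0, 1, 0],
                  [0, 1, 0, 1],
                  [0, 0, 1, 0]], float)
    D = np.diag(A.sum(axis=1))            # number of neighbors of each region
    Q_iar = D - A                         # IAR precision matrix (singular by design)
    rho = 0.9
    Q_car = D - rho * A                   # proper CAR precision for |rho| < 1
    u = np.array([0.1, -0.2, 0.05, 0.3])  # spatial random effects
    log_prior = -0.5 * u @ Q_iar @ u      # IAR log density, up to constants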
 Date Issued
 2008
 Identifier
 FSU_migr_etd2523
 Format
 Thesis
 Title
 Functional Component Analysis and Regression Using Elastic Methods.
 Creator

Tucker, J. Derek, Srivastava, Anuj, Wu, Wei, Klassen, Eric, Huffer, Fred, Department of Statistics, Florida State University
 Abstract/Description

Constructing generative models for functional observations is an important task in statistical function analysis. In general, functional data contain both phase (or x, or horizontal) and amplitude (or y, or vertical) variability. Traditional methods often ignore the phase variability and focus solely on the amplitude variation, using cross-sectional techniques such as functional principal component analysis for dimension reduction and regression for data modeling. Ignoring phase variability leads to a loss of structure in the data and to inefficiency in data models. Moreover, most methods use a "preprocessing" alignment step to remove the phase variability, without considering a more natural joint solution. This dissertation presents three approaches to this problem. The first relies on separating the phase (x-axis) and amplitude (y-axis) components and then modeling them using joint distributions. This separation, in turn, is performed using a technique called elastic alignment of functions, which involves a new mathematical representation of functional data. Then, using individual principal components for the phase and amplitude components, it imposes joint probability models on the principal coefficients of these components while respecting the nonlinear geometry of the phase representation space. The second approach incorporates the phase variability into the objective function of two component analysis methods, functional principal component analysis and functional partial least squares. This creates a more complete solution, as the phase variability is removed while the components are simultaneously extracted. The third approach incorporates the phase variability into the functional linear regression model and then extends the model to logistic and multinomial logistic regression. By incorporating the phase variability, a more parsimonious regression model is obtained, and therefore more accurate prediction of observations is achieved. These models are then easily extended from functional data to curves (which are essentially functions in R^2) to perform regression with curves as predictors. These ideas are demonstrated using random sampling for models estimated from simulated and real datasets, showing their superiority over models that ignore phase-amplitude separation. Furthermore, the models are applied to the classification of functional data and achieve high performance in applications involving SONAR signals of underwater objects, handwritten signatures, periodic body movements recorded by smart phones, and physiological data.
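A minimal NumPy sketch of the square-root slope function (SRSF), the representation commonly used for elastic alignment in this line of work; the test function is illustrative, and the dynamic-programming step that finds the optimal warping is omitted:

    import numpy as np

    t = np.linspace(0.0, 1.0, 200)
    f = np.sin(2 * np.pi * t ** 1.3)         # a function with phase distortion
    df = np.gradient(f, t)
    q = np.sign(df) * np.sqrt(np.abs(df))    # SRSF: q = sign(f') * sqrt(|f'|)
    # warping f by gamma maps q to (q o gamma) * sqrt(gamma'), which preserves
    # the L2 norm -- the isometry that makes joint alignment well posed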
 Date Issued
 2014
 Identifier
 FSU_migr_etd9106
 Format
 Thesis
 Title
 Age Effects in the Extinction of Planktonic Foraminifera: A New Look at Van Valen's Red Queen Hypothesis.
 Creator

Wiltshire, Jelani, Huffer, Fred, Parker, William, Chicken, Eric, Sinha, Debajyoti, Department of Statistics, Florida State University
 Abstract/Description

Van Valen's Red Queen hypothesis states that within a homogeneous taxonomic group the age is statistically independent of the rate of extinction. The case of the Red Queen hypothesis addressed here is the one in which the homogeneous taxonomic group is a group of similar species. Since Van Valen's work, various statistical approaches have been used to address the relationship between taxon duration (age) and the rate of extinction. Some of the more recent approaches to this problem using Planktonic Foraminifera (Foram) extinction data include Weibull and Exponential modeling (Parker and Arnold, 1997) and Cox proportional hazards modeling (Doran et al., 2004, 2006). I propose a general class of test statistics that can be used to test for the effect of age on extinction. These test statistics allow for a varying background rate of extinction and attempt to remove the effects of other covariates when assessing the effect of age on extinction. No model is assumed for the covariate effects. Instead, I control for covariate effects by pairing or grouping together similar species. I use simulated data sets to compare the power of the statistics. In applying the test statistics to the Foram data, I have found age to have a positive effect on extinction.
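A minimal NumPy sketch of the model-free grouped-comparison idea: a within-pair permutation test for an age effect. The paired extinction ages are simulated stand-ins for the Foram data, and the simple mean difference stands in for the dissertation's class of test statistics:

    import numpy as np

    rng = np.random.default_rng(1)
    age_a = rng.exponential(5.0, 50)   # ages at extinction, species A of each pair
    age_b = rng.exponential(6.0, 50)   # ages at extinction, matched species B
    diffs = age_a - age_b
    obs = diffs.mean()

    perm = np.empty(10000)
    for i in range(10000):
        signs = rng.choice([-1.0, 1.0], size=50)   # relabel within each pair
        perm[i] = np.mean(signs * diffs)
    p_value = np.mean(np.abs(perm) >= abs(obs))
    print(obs, p_value)

Pairing similar species and permuting only within pairs is what removes the covariate effects without modeling them.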
 Date Issued
 2010
 Identifier
 FSU_migr_etd0952
 Format
 Thesis
 Title
 Multistate Intensity Model with AR-GARCH Random Effect for Corporate Credit Rating Transition Analysis.
 Creator

Li, Zhi, Niu, Xufeng, Huffer, Fred, Kercheval, Alec, Wu, Wei, Department of Statistics, Florida State University
 Abstract/Description

This thesis presents a stochastic process and time series study of corporate credit rating and market implied rating transitions. By extending an existing model, it incorporates generalized autoregressive conditional heteroscedastic (GARCH) random effects to capture volatility changes in the instantaneous transition rates. The GARCH model is a crucial part of financial research, since its ability to model volatility changes gives market practitioners the flexibility to build more accurate models of high-frequency financial data. Corporate rating transition modeling historically dealt with low-frequency data, for which there was no need to specify the volatility. However, the newly published Moody's market implied ratings exhibit much higher transition frequencies. Therefore, we feel that it is necessary to capture the volatility component and extend existing models to reflect this fact. The theoretical model specification and estimation details are discussed thoroughly in this dissertation. The performance of our models is studied on several simulated data sets and compared to the original model. Finally, the models are applied to both Moody's issuer rating and market implied rating transition data.
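A minimal NumPy sketch of the GARCH(1,1) recursion that supplies the volatility dynamics referenced above; the parameter values are illustrative:

    import numpy as np

    rng = np.random.default_rng(2)
    omega, alpha, beta, n = 0.05, 0.10, 0.85, 1000
    h = np.empty(n)                            # conditional variances
    eps = np.empty(n)                          # innovations
    h[0] = omega / (1 - alpha - beta)          # unconditional variance
    eps[0] = np.sqrt(h[0]) * rng.standard_normal()
    for t in range(1, n):
        h[t] = omega + alpha * eps[t - 1] ** 2 + beta * h[t - 1]
        eps[t] = np.sqrt(h[t]) * rng.standard_normal()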
 Date Issued
 2010
 Identifier
 FSU_migr_etd1426
 Format
 Thesis
 Title
 Statistical Modelling and Applications of Neural Spike Trains.
 Creator

Lawhern, Vernon, Wu, Wei, Contreras, Robert J., Srivastava, Anuj, Huffer, Fred, Niu, Xufeng, Department of Statistics, Florida State University
 Abstract/Description

In this thesis we investigate statistical modelling of neural activity in the brain. We first develop a framework which extends the state-space Generalized Linear Model (GLM) of Eden and colleagues [20] to include the effects of hidden states. These states, collectively, represent variables which are not observed (or even observable) in the modeling process but can nonetheless have an impact on the neural activity. We then develop a framework that allows us to input a priori target information into the model. We examine both of these modelling frameworks on motor cortex data recorded from monkeys performing different target-driven hand and arm movement tasks. Finally, we perform temporal coding analysis of sensory stimulation using principled statistical models and show the efficacy of our approach.
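A minimal NumPy sketch of the discrete-time point-process GLM likelihood on which state-space spike-train models of this kind are built; the covariate, the parameters, and the Bernoulli spike simulation are illustrative:

    import numpy as np

    rng = np.random.default_rng(3)
    dt, n = 0.001, 5000                            # 1 ms bins, 5 s of data
    x = np.cumsum(rng.standard_normal(n)) * 0.01   # a slowly varying covariate
    beta0, beta1 = np.log(20.0), 1.0
    lam = np.exp(beta0 + beta1 * x)                # conditional intensity (Hz)
    spikes = rng.random(n) < lam * dt              # Bernoulli spike generation
    loglik = np.sum(spikes * np.log(lam * dt) - lam * dt)
    print(spikes.sum(), loglik)

A state-space GLM replaces the observed covariate x with a latent state whose posterior is tracked over time.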
 Date Issued
 2011
 Identifier
 FSU_migr_etd3251
 Format
 Thesis
 Title
 Quasi-3D Statistical Inversion of Oceanographic Tracer Data.
 Creator

Herbei, Radu, Speer, Kevin, Wegkamp, Marten, Laurent, Louis St., Huffer, Fred, Niu, Xufeng, Department of Statistics, Florida State University
 Abstract/Description

We perform a quasi-3D Bayesian inversion of oceanographic tracer data from the South Atlantic Ocean. Initially, we consider one active neutral density layer with an upper and lower boundary. The available hydrographic data are linked to the model parameters (water velocities, diffusion coefficients) via a 3D advection-diffusion equation. A robust solution to the inverse problem considered can be attained by introducing prior information about the parameters and modeling the observation error. This approach estimates both horizontal and vertical flow as well as diffusion coefficients. We find a system of alternating zonal jets at the depths of the North Atlantic Deep Water, consistent with direct measurements of flow and with concentration maps. A uniqueness analysis of our model is performed in terms of the oxygen consumption rate. The vertical mixing coefficient bears some relation to the bottom topography, even though we do not incorporate topography into our model. We extend the method to a multilayer model, using thermal wind relations weakly in a local fashion (as opposed to integrating the entire water column) to connect layers vertically. Results suggest that the estimated deep zonal jets extend vertically, with a clear depth-dependent structure. The vertical structure of the flow field is modified by the tracer fields relative to that set a priori by thermal wind. Our estimates are consistent with observed flow at the depths of the Antarctic Intermediate Water; at still shallower depths, above the layers considered here, the subtropical gyre is a significant feature of the horizontal flow.
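A minimal NumPy sketch of the forward model at the heart of such an inversion: a steady advection-diffusion operator discretized with centered differences. The 1-D grid, zero boundary values, and point source are illustrative simplifications of the 3D layer model:

    import numpy as np

    n, dx, u, kap = 50, 1.0, 0.5, 1.0    # grid, velocity, diffusivity (illustrative)
    # u dC/dx - kap d2C/dx2 = s, centered differences, C = 0 at both boundaries
    main = np.full(n, 2 * kap / dx ** 2)
    sup = np.full(n - 1, u / (2 * dx) - kap / dx ** 2)
    sub = np.full(n - 1, -u / (2 * dx) - kap / dx ** 2)
    A = np.diag(main) + np.diag(sup, 1) + np.diag(sub, -1)
    s = np.zeros(n); s[n // 2] = 1.0     # point tracer source
    C = np.linalg.solve(A, s)            # tracer concentration along the section
    # in the Bayesian inversion, A depends on the unknown u and kap, and priors
    # on them regularize the fit of the modeled C to the observed tracer fields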
 Date Issued
 2006
 Identifier
 FSU_migr_etd4101
 Format
 Thesis