Current Search: Research Repository » Statistics » Wu, Wei
Search results
 Title
 2D Affine and Projective Shape Analysis, and Bayesian Elastic Active Contours.
 Creator

Bryner, Darshan W., Srivastava, Anuj, Klassen, Eric, Gallivan, Kyle, Huffer, Fred, Wu, Wei, Zhang, Jinfeng, Department of Statistics, Florida State University
 Abstract/Description

An object of interest in an image can be characterized to some extent by the shape of its external boundary. Current techniques for shape analysis consider the notion of shape to be invariant to the similarity transformations (rotation, translation, and scale), but oftentimes in 2D images of 3D scenes, perspective effects can transform the shapes of objects in a more complicated manner than can be modeled by the similarity transformations alone. Therefore, we develop a general Riemannian framework for shape analysis in which metrics and related quantities are invariant to larger groups, the affine and projective groups, which approximate the transformations that arise from perspective skews. Highlighting two possibilities for representing object boundaries, ordered points (or landmarks) and parametrized curves, we study different combinations of these representations (points and curves) and transformations (affine and projective). Specifically, we provide solutions to three out of four situations and develop algorithms for computing geodesics and intrinsic sample statistics, leading up to Gaussian-type statistical models, and classify test shapes using such models learned from training data. In the case of parametrized curves, an added issue is obtaining invariance to the reparameterization group. The geodesics are constructed by particularizing the path-straightening algorithm to the geometries of the current manifolds and are used, in turn, to compute shape statistics and Gaussian-type shape models. We demonstrate these ideas using a number of examples from shape and activity recognition. After developing such Gaussian-type shape models, we present a variational framework for naturally incorporating these shape models as prior knowledge to guide active contours for boundary extraction in images.
This so-called Bayesian active contour framework is especially suitable for images where boundary estimation is difficult due to low contrast, low resolution, and the presence of noise and clutter. In traditional active contour models, curves are driven toward the minimum of an energy composed of image and smoothing terms. We introduce an additional shape term based on shape models of prior known relevant shape classes. The minimization of this total energy, using iterated gradient-based updates of curves, leads to an improved segmentation of object boundaries. We demonstrate this Bayesian approach to segmentation using a number of shape classes in many imaging scenarios, including the imaging modalities of SAS (synthetic aperture sonar) and SAR (synthetic aperture radar), for which accurate boundary extraction is notoriously difficult. In practice, the training shapes used for prior shape models may be collected from viewing angles different from those of the test images and thus may exhibit shape variability brought about by perspective effects. Therefore, by allowing the prior shape model to be invariant to, say, affine transformations of curves, we propose an active contour algorithm whose resulting segmentation is robust to perspective skews.
 Date Issued
 2013
 Identifier
 FSU_migr_etd8534
 Format
 Thesis
 Title
 Bayesian Tractography Using Geometric Shape Priors.
 Creator

Dong, Xiaoming, Srivastava, Anuj, Klassen, E. (Eric), Wu, Wei, Huffer, Fred W. (Fred William), Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

Diffusion-weighted imaging (DWI) and tractography have been developed over decades and are key elements in recent, large-scale efforts for mapping the human brain. Together, the two techniques offer a unique possibility to assess the macroscopic structure and connectivity of the human brain noninvasively and in vivo. The information obtained not only helps visualize brain connectivity and segment the brain into different functional areas, but also provides tools for understanding major cognitive diseases such as multiple sclerosis, schizophrenia, and epilepsy. Considerable effort has been devoted to this area. On the one hand, a vast spectrum of tractography algorithms has been developed in recent years, ranging from deterministic approaches through probabilistic methods to global tractography; on the other hand, various mathematical models, such as the diffusion tensor, multi-tensor models, spherical deconvolution, and Q-ball modeling, have been developed to better exploit the acquisition-dependent DWI signal. Despite considerable progress in this area, current methods still face many challenges, such as sensitivity to noise, large numbers of false positive/negative fibers, inability to handle complex fiber geometry, and high computational cost. More importantly, recent research has shown that, even with high-quality data, the results of current tractography methods may not improve, suggesting that it is unlikely that an anatomically accurate map of the human brain can be obtained solely from the diffusion profile. Motivated by these issues, this dissertation develops a global approach that incorporates an anatomically validated geometric shape prior when reconstructing neuron fibers. The fiber tracts between regions of interest are initialized and updated via deformations based on gradients of the posterior energy defined in this work.
This energy has contributions from the diffusion data, shape prior information, and a roughness penalty. The dissertation first describes and demonstrates the proposed method on a 2D dataset and then extends it to 3D phantom data and real brain data. The results show that the proposed method is relatively immune to issues such as noise, complicated fiber structures such as fiber crossings and kissings, and false positive fibers, and achieves more explainable tractography results.
 Date Issued
 2019
 Identifier
 2019_Spring_DONG_fsu_0071E_15144
 Format
 Thesis
 Title
 A Class of MixedDistribution Models with Applications in Financial Data Analysis.
 Creator

Tang, Anqi, Niu, Xufeng, Cheng, Yingmei, Wu, Wei, Huffer, Fred, Department of Statistics, Florida State University
 Abstract/Description

Statisticians often encounter data in the form of a combination of discrete and continuous outcomes. A special case is zero-inflated longitudinal data, where the response variable has a large portion of zeros. These data exhibit correlation because observations are obtained on the same subjects over time. In this dissertation, we propose a two-part mixed-distribution model for zero-inflated longitudinal data. The first part of the model is a logistic regression model for the probability of a non-zero response; the other part is a linear model for the mean response given that the outcome is not zero. Random effects with an AR(1) covariance structure are introduced into both parts of the model to allow for serial correlation and subject-specific effects. Estimating the two-part model is challenging because of the high-dimensional integration necessary to obtain the maximum likelihood estimates. We propose a Monte Carlo EM algorithm for computing the maximum likelihood estimates of the parameters. Through simulation studies, we demonstrate the good performance of the MCEM method in parameter and standard error estimation. As an illustration, we apply the two-part model with correlated random effects and the model with autoregressive random effects to executive compensation data to investigate potential determinants of CEO stock option grants.
 Date Issued
 2011
 Identifier
 FSU_migr_etd1710
 Format
 Thesis
 Title
 A Class of Semiparametric Volatility Models with Applications to Financial Time Series.
 Creator

Chung, Steve S., Niu, XuFeng, Gallivan, Kyle, Sinha, Debajyoti, Wu, Wei, Department of Statistics, Florida State University
 Abstract/Description

The autoregressive conditional heteroskedasticity (ARCH) and generalized autoregressive conditional heteroskedasticity (GARCH) models capture the dependency of the conditional second moments. The idea behind ARCH/GARCH models is quite intuitive: in ARCH models, past squared innovations describe the present squared volatility; in GARCH models, both past squared innovations and past squared volatilities define the present volatility. Since their introduction, these models have been extensively studied and well documented in the financial and econometric literature, and many variants of ARCH/GARCH models have been proposed. To list a few, these include exponential GARCH (EGARCH), GJR-GARCH (or threshold GARCH), integrated GARCH (IGARCH), quadratic GARCH (QGARCH), and fractionally integrated GARCH (FIGARCH). The ARCH/GARCH models and their variants have gained a lot of attention and remain a popular choice for modeling volatility. Despite their popularity, they suffer from limited model flexibility: volatility is a latent variable, and imposing a specific model structure conflicts with this latency. Recently, several attempts have been made to ease the strict structural assumptions on volatility, and both nonparametric and semiparametric volatility models have been proposed in the literature. We review and discuss these modeling techniques in detail. In this dissertation, we propose a class of semiparametric multiplicative volatility models, defining the volatility as a product of parametric and nonparametric parts. Due to the positivity restriction, we apply log and square transformations to the volatility. We assume that the parametric part is GARCH(1,1), which serves as an initial guess of the volatility, and we estimate the GARCH(1,1) parameters using the conditional likelihood method. The nonparametric part assumes an additive structure. There may be some loss of interpretability in assuming an additive structure, but we gain flexibility.
Each additive part is constructed from a sieve of Bernstein basis polynomials, and the nonparametric component acts as an improvement on the parametric component. The model is estimated with an iterative algorithm based on boosting. We modify the boosting algorithm of Friedman (2001) so that it uses a penalized least squares method, trying three different penalty functions: LASSO, ridge, and elastic net. In our simulations and application, the ridge penalty works best. Our semiparametric multiplicative volatility model is evaluated using simulations and applied to six major exchange rates and the S&P 500 index. The results show that the proposed model outperforms existing volatility models in both in-sample estimation and out-of-sample prediction.
 Date Issued
 2014
 Identifier
 FSU_migr_etd8756
 Format
 Thesis
 Title
 Comparative mRNA Expression Analysis Leveraging Known Biochemical Interactions.
 Creator

Steppi, Albert Joseph, Zhang, Jinfeng, Sang, QingXiang, Wu, Wei, Niu, Xufeng, Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

We present two studies that incorporate existing biological knowledge into differential gene expression analysis in an attempt to place the results within a broader biological context. The studies investigate breast cancer health disparities between ethnic groups by comparing gene expression levels in tumor samples from patients from different ethnic populations. We incorporate existing knowledge by making comparisons not just between individual genes, but between sets of related genes and networks of interacting genes. In the first study, a comparison is made between mRNA expression patterns in Asian and Caucasian American breast cancer samples in an attempt to better understand why breast cancer incidence and mortality rates are significantly lower in Asian Americans than in Caucasian Americans. In the second study, the expression levels of genes related to drug and xenobiotic metabolizing enzymes (DXME) are compared between African, Asian, and Caucasian American breast cancer patients. The expression of genes related to these enzymes has been found to significantly affect drug clearance and the onset of drug resistance. Both studies found differentially expressed genes and pathways that may be associated with health disparities between the three ethnic populations. A thorough investigation of the literature was made in order to understand the context in which these differences in gene expression could affect the development and progression of breast tumors, and to identify genes and pathways that may be differentially expressed between the ethnic groups in general but are not associated with breast cancer. Many of the relevant differences in gene expression were found to be linked to factors such as diet and differences in body composition. The process of finding relevant pathways and sets of interacting genes to inform comparative mRNA expression analysis can be laborious and time consuming.
The literature is expanding at an exponential rate, and there is little hope for research groups to keep up with all of the latest research. It is becoming more common for journals to require authors to make their results available in public databases, but many results concerning biochemical interactions are accessible only in unstructured text. Extracting relationships and interactions from the biological literature using techniques from machine learning and natural language processing is an important and growing field of research. To gain a better understanding of this field, we participated in the BioCreative VI Track 4 challenge, which involved classifying PubMed abstracts that contain examples of protein-protein interactions affected by a mutation. We discuss the model we developed and the lessons learned while participating in the competition. The problem of acquiring sufficient quantities of quality labeled data is a great obstacle to improving performance. We present a web application we are developing to streamline the annotation of entity-entity interactions in text. It makes use of a database of known interactions to locate passages that are likely to be relevant, and offers a simple and concise user interface to minimize the cognitive burden on the annotator.
 Date Issued
 2018
 Identifier
 2018_Sp_Steppi_fsu_0071E_14522
 Format
 Thesis
 Title
 Developing SRSF Shape Analysis Techniques for Applications in Neuroscience and Genomics.
 Creator

Wesolowski, Sergiusz, Wu, Wei, Bertram, R. (Richard), Srivastava, Anuj, Beerli, Peter, Mio, Washington, Florida State University, College of Arts and Sciences, Department of Mathematics
 Abstract/Description

This dissertation focuses on exploring the capabilities of the SRSF statistical shape analysis framework through various applications. Each application gives rise to a specific mathematical shape analysis model. The theoretical investigation of these models, driven by real data problems, gives rise to new tools and theorems necessary to conduct sound inference in the space of shapes. From a theoretical standpoint, robustness results are provided for model parameter estimation, and an ANOVA-like statistical testing procedure is discussed. The projects were the result of collaboration between theoretical and application-focused research groups: the Shape Analysis Group in the Department of Statistics at Florida State University, the Center of Genomics and Personalized Medicine at FSU, and FSU's Department of Neuroscience. As a consequence, each of the projects consists of two aspects: the theoretical investigation of the mathematical model and an application driven by a real-life problem. The application components are similar from the data modeling standpoint: in each case the problem is set in an infinite-dimensional space, elements of which are experimental data points that can be viewed as shapes. The three projects are: "A new framework for Euclidean summary statistics in the neural spike train space," which provides a statistical framework for analyzing spike train data and a new noise removal procedure for neural spike trains; the framework adapts the SRSF elastic metric in the space of point patterns to provide a new notion of distance. "SRSF shape analysis for sequencing data reveals new differentiating patterns," which uses a shape interpretation of Next Generation Sequencing data to provide a new view of exon-level gene activity; this novel approach reveals new differential gene behavior that cannot be captured by state-of-the-art techniques. Code is available online in a GitHub repository.
"How changes in shape of nucleosomal DNA near TSS influence changes of gene expression." The result of this work is a novel shape analysis model explaining the relation between changes in the DNA arrangement on nucleosomes and changes in differential gene expression.
 Date Issued
 2017
 Identifier
 FSU_FALL2017_Wesolowski_fsu_0071E_14177
 Format
 Thesis
 Title
 Elastic Functional Principal Component Analysis for Modeling and Testing of Functional Data.
 Creator

Duncan, Megan, Srivastava, Anuj, Klassen, E., Huffer, Fred W., Wu, Wei, Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

Statistical analysis of functional data requires tools for comparing, summarizing, and modeling observed functions as elements of a function space. A key issue in Functional Data Analysis (FDA) is the presence of phase variability in the observed data. A successful statistical model of functional data has to account for this phase variability; otherwise, the ensuing inferences can be inferior. Recent methods for FDA include steps for phase separation or functional alignment. For example, Elastic Functional Principal Component Analysis (Elastic FPCA) uses the strengths of Functional Principal Component Analysis (FPCA), along with tools from elastic FDA, to perform joint phase-amplitude separation and modeling. A related problem in FDA is to quantify and test for the amount of phase in given data. We develop two types of hypothesis tests for the significance of phase variability: a metric-based approach and a model-based approach. The metric-based approach treats phase and amplitude as independent components and uses their respective metrics to apply the Friedman-Rafsky test, Schilling's nearest neighbors test, and the energy test to test differences between functions and their amplitudes. In the model-based test, we use concordance correlation coefficients to quantify the agreement between functions and their reconstructions using FPCA and Elastic FPCA. We demonstrate this framework using a number of simulated and real datasets, including weather, Tecator, and growth data.
 Date Issued
 2018
 Identifier
 2018_Sp_Duncan_fsu_0071E_14470
 Format
 Thesis
 Title
 Elastic Functional Regression Model.
 Creator

Ahn, Kyungmin, Srivastava, Anuj, Klassen, E., Wu, Wei, Huffer, Fred W., Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

Functional variables serve important roles as predictors in a variety of pattern recognition and vision applications. Focusing on a specific subproblem, termed scalar-on-function regression, most current approaches adopt the standard L2 inner product to form a link between functional predictors and scalar responses. These methods may perform poorly when predictor functions contain nuisance phase variability, i.e., when predictors are temporally misaligned due to noise. While a simple solution could be to pre-align predictors as a preprocessing step before applying a regression model, this alignment is seldom optimal from the perspective of regression. In this dissertation, we propose a new approach, termed elastic functional regression, in which alignment is included in the regression model itself and is performed in conjunction with the estimation of the other model parameters. This model is based on a norm-preserving warping of predictors, rather than the standard time warping of functions, and provides better prediction in situations where the shape or amplitude of the predictor is more useful than its phase. We demonstrate the effectiveness of this framework using simulated and real data.
 Date Issued
 2018
 Identifier
 2018_Sp_Ahn_fsu_0071E_14452
 Format
 Thesis
 Title
 Estimation and Sequential Monitoring of Nonlinear Functional Responses Using Wavelet Shrinkage.
 Creator

Cuevas, Jordan, Chicken, Eric, Sobanjo, John, Niu, Xufeng, Wu, Wei, Department of Statistics, Florida State University
 Abstract/Description

Statistical process control (SPC) is widely used in industrial settings to monitor processes for shifts in their distributions. SPC is generally thought of in two distinct phases: Phase I, in which historical data are analyzed in order to establish an in-control process, and Phase II, in which new data are monitored for deviations from the in-control form. Traditionally, SPC has been used to monitor univariate (multivariate) processes for changes in a particular parameter (parameter vector). Recently, however, technological advances have resulted in processes in which each observation is actually an n-dimensional functional response (referred to as a profile), where n can be quite large. Additionally, these profiles often cannot be adequately represented parametrically, making traditional SPC techniques inapplicable. This dissertation starts by addressing the problem of nonparametric function estimation, which would be used to analyze process data in a Phase I setting. The translation-invariant wavelet estimator (TI) is often used to estimate irregular functions, despite the drawback that it tends to oversmooth jumps. A trimmed translation-invariant estimator (TTI) is proposed, of which the TI estimator is a special case. By reducing the point-by-point variability of the TI estimator, TTI is shown to retain the desirable qualities of TI while improving reconstructions of functions with jumps. Attention is then turned to the Phase II problem of monitoring sequences of profiles for deviations from in-control behavior. Two profile monitoring schemes are proposed: the first monitors for changes in the noise variance using a likelihood ratio test based on the highest detail level of wavelet coefficients of the observed profile; the second offers a semiparametric test to monitor for changes in both the functional form and the noise variance. Both methods make use of wavelet shrinkage in order to distinguish relevant functional information from noise contamination.
Different forms of each of these test statistics are proposed, and results are compared via Monte Carlo simulation.
 Date Issued
 2012
 Identifier
 FSU_migr_etd4788
 Format
 Thesis
 Title
 A Framework for Comparing Shape Distributions.
 Creator

Henning, Wade, Srivastava, Anuj, Alamo, Rufina G., Huffer, Fred W. (Fred William), Wu, Wei, Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

The problem of comparing shape populations is present in many branches of science, including nanomanufacturing, medical imaging, particle analysis, fisheries, seed science, and computer vision. Researchers in these fields have traditionally characterized the profiles in such sets using combinations of scalar-valued descriptor features, like aspect ratio or roughness, whose distributions are easy to compare using classical statistics. However, there is a desire in this community for a single comprehensive feature that uniquely defines these profiles. The shape of the profile itself is such a feature. Shape features have traditionally been studied individually, and comparing distributions underlying sets of shapes is challenging. Since the data come in the form of samples from shape populations, we use kernel methods to estimate the underlying shape densities. We then take a metric approach to define a proper distance, termed the Fisher-Rao distance, to quantify differences between any two densities. This distance can be used for clustering, classification, and other types of statistical modeling; however, this dissertation focuses on comparing shape populations via a classical two-sample hypothesis test, with populations characterized by their respective probability densities on shape space. Since we are interested in the shapes of planar closed curves and the space of such curves is infinite dimensional, there are theoretical issues in defining and estimating densities on this space. We therefore use a spherical multidimensional scaling algorithm to project shape distributions onto the unit two-sphere, which allows us to use a von Mises-Fisher kernel for density estimation. The estimated densities are then compared using the Fisher-Rao distance, which, in turn, is estimated using Monte Carlo methods. This distance estimate is used as the test statistic for the two-sample hypothesis test mentioned above.
We use a bootstrap approach to perform the test and to evaluate population classification performance. We demonstrate these ideas using applications from industrial and chemical engineering.
 Date Issued
 2014
 Identifier
 FSU_migr_etd9185
 Format
 Thesis
 Title
 Functional Component Analysis and Regression Using Elastic Methods.
 Creator

Tucker, J. Derek, Srivastava, Anuj, Wu, Wei, Klassen, Eric, Huffer, Fred, Department of Statistics, Florida State University
 Abstract/Description

Constructing generative models for functional observations is an important task in statistical function analysis. In general, functional data contain both phase (or x, horizontal) and amplitude (or y, vertical) variability. Traditional methods often ignore the phase variability and focus solely on the amplitude variation, using cross-sectional techniques such as functional principal component analysis for dimension reduction and regression for data modeling. Ignoring phase variability leads to a loss of structure in the data and to inefficiency in data models. Moreover, most methods use a "preprocessing" alignment step to remove the phase variability, without considering a more natural joint solution. This dissertation presents three approaches to this problem. The first relies on separating the phase (x-axis) and amplitude (y-axis) and then modeling these components using joint distributions. This separation, in turn, is performed using a technique called elastic alignment of functions, which involves a new mathematical representation of functional data. Then, using individual principal components, one each for the phase and amplitude components, it imposes joint probability models on the principal coefficients of these components while respecting the nonlinear geometry of the phase representation space. The second approach incorporates the phase variability into the objective function of two component analysis methods, functional principal component analysis and functional principal least squares. This creates a more complete solution, as the phase variability is removed while the components are simultaneously extracted. The third approach incorporates the phase variability into the functional linear regression model and then extends the model to logistic and multinomial logistic regression. By incorporating the phase variability, a more parsimonious regression model is obtained and, therefore, more accurate prediction of observations is achieved.
These models are then easily extended from functional data to curves (which are essentially functions in R²) to perform regression with curves as predictors. These ideas are demonstrated using random sampling from models estimated on simulated and real datasets, showing their superiority over models that ignore phase-amplitude separation. Furthermore, the models are applied to classification of functional data and achieve high performance in applications involving SONAR signals of underwater objects, handwritten signatures, periodic body movements recorded by smartphones, and physiological data.
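The elastic alignment mentioned above is commonly built on the square-root velocity function (SRVF) representation, under which the Fisher-Rao metric reduces to the ordinary L2 metric. A minimal sketch of that representation; the grid, test functions, and helper names are illustrative assumptions, not code from the dissertation:

```python
import numpy as np

def srvf(f, t):
    """Square-root velocity function q(t) = sign(f'(t)) * sqrt(|f'(t)|).
    Under this representation the Fisher-Rao metric becomes the ordinary
    L2 metric, which is what makes elastic alignment tractable."""
    df = np.gradient(f, t)
    return np.sign(df) * np.sqrt(np.abs(df))

def l2_dist(q1, q2, t):
    """L2 distance between two SRVFs (an amplitude distance, pre-alignment)."""
    dt = t[1] - t[0]
    return float(np.sqrt(np.sum((q1 - q2) ** 2) * dt))

# Two copies of the same bump observed under different time warps:
# f2(t) = f1(gamma(t)) with the warp gamma(t) = t^1.5.
t = np.linspace(0.0, 1.0, 501)
f1 = np.exp(-((t - 0.5) ** 2) / 0.01)
gamma = t ** 1.5
f2 = np.exp(-((gamma - 0.5) ** 2) / 0.01)

q1, q2 = srvf(f1, t), srvf(f2, t)
```

A full elastic alignment would next minimize over warps gamma the quantity ||q1 - (q2 ∘ gamma) · sqrt(gamma')||, typically by dynamic programming; the sketch stops at the representation itself.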
 Date Issued
 2014
 Identifier
 FSU_migr_etd9106
 Format
 Thesis
 Title
 Generalized Mahalanobis Depth in Point Process and Its Application in Neural Coding and Semi-Supervised Learning in Bioinformatics.
 Creator

Liu, Shuyi, Wu, Wei, Wang, Xiaoqiang, Zhang, Jinfeng, Mai, Qing, Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

In the first project, we propose to generalize the notion of depth to temporal point process observations. The new depth is defined as a weighted product of two probability terms: 1) the number of events in each process, and 2) the center-outward ranking of the event times conditioned on the number of events. In this study, we adopt the Poisson distribution for the first term and the Mahalanobis depth for the second term. We propose an efficient bootstrapping approach to estimate the parameters in the defined depth. In the case of a Poisson process, the observed events are order statistics, and the parameters can be estimated robustly with respect to sample size. We demonstrate the use of the new depth by ranking realizations from a Poisson process. We also test the new method in classification problems using simulations as well as real neural spike train data. It is found that the new framework provides more accurate and robust classification than commonly used likelihood methods. In the second project, we demonstrate the value of semi-supervised dimension reduction in a clinical setting. The advantage of semi-supervised dimension reduction is easy to understand: it uses information from unlabeled data to perform dimension reduction, which can help build a more precise prediction model than common supervised dimension reduction techniques. After a thorough comparison with embedding methods that use labeled data only, we show the improvement from adding unlabeled data in a breast cancer chemotherapy application. In our semi-supervised dimension reduction method, we explore not only adding unlabeled data to linear dimension reduction such as PCA, but also semi-supervised nonlinear dimension reduction, such as semi-supervised LLE and semi-supervised Isomap.
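The depth described above can be sketched as a weighted product of a Poisson count probability and a Mahalanobis depth on the event times. The exponent weighting, the uniform-order-statistic moments, and all names below are illustrative assumptions, not the dissertation's exact estimator:

```python
import numpy as np
from math import exp, factorial

def poisson_pmf(n, lam):
    return exp(-lam) * lam ** n / factorial(n)

def mahalanobis_depth(x, mu, cov):
    """Classical Mahalanobis depth: 1 / (1 + (x - mu)' Sigma^{-1} (x - mu))."""
    d = np.asarray(x, dtype=float) - mu
    return 1.0 / (1.0 + d @ np.linalg.solve(cov, d))

def point_process_depth(times, lam, mu, cov, w=0.5):
    """Weighted product of (1) the Poisson probability of the event count and
    (2) the Mahalanobis depth of the event-time vector given that count.
    The exponent form P(N=n)^w * MD(times|n)^(1-w) is an illustrative choice."""
    return poisson_pmf(len(times), lam) ** w \
        * mahalanobis_depth(times, mu, cov) ** (1 - w)

# For a homogeneous Poisson process on [0, 1] conditioned on n events, the
# event times are uniform order statistics with E[t_(k)] = k / (n + 1); we
# estimate the conditional covariance by simulation (a crude bootstrap).
rng = np.random.default_rng(0)
n, lam = 3, 3.0
mu = np.arange(1, n + 1) / (n + 1)
samples = np.sort(rng.uniform(size=(5000, n)), axis=1)
cov = np.cov(samples.T)

central = point_process_depth(mu, lam, mu, cov)                  # at the mean
extreme = point_process_depth([0.90, 0.95, 0.99], lam, mu, cov)  # bunched late
```

A realization near the conditional mean receives a higher depth than one with all events bunched at the end of the interval, which is the ranking behavior the abstract describes.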
 Date Issued
 2018
 Identifier
 2018_Sp_Liu_fsu_0071E_14367
 Format
 Thesis
 Title
 Geometric Approaches for Analysis of Images, Densities and Trajectories on Manifolds.
 Creator

Zhang, Zhengwu, Srivastava, Anuj, Klassen, E. (Eric), Wu, Wei, Pati, Debdeep, Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

In this dissertation, we focus on the problem of analyzing high-dimensional functional data using geometric approaches. The term functional data here refers to images, densities and trajectories on manifolds. The nature of these data imposes difficulties on statistical analysis. First, the objects are functional data, which are infinite-dimensional; one needs to explore possible representations of each type that facilitate subsequent statistical analysis. Second, the representation spaces are often nonlinear manifolds, so proper Riemannian structures are necessary to compare objects. Third, the analysis and comparison of objects need to be invariant to certain nuisance variables. For example, comparison between two images should be invariant to their blur levels, and comparison between time-indexed trajectories on manifolds should be invariant to their temporal evolution rates. We start by introducing frameworks for representing, comparing and analyzing functions in Euclidean space, including signals, images and densities, where the comparisons are invariant to the Gaussian blur present in these objects. Applications in blur-level matching, blurred image recognition, image classification and two-sample hypothesis testing are discussed. Next, we present frameworks for analyzing longitudinal trajectories on a manifold M, where the analysis is invariant to the reparameterization action (temporal variation). In particular, we are interested in analyzing trajectories in two manifolds: the two-sphere and the set of symmetric positive-definite matrices. Applications such as bird migration and hurricane track analysis, visual speech recognition and hand gesture recognition are used to demonstrate the advantages of the proposed frameworks. In the end, a Bayesian framework for clustering of shapes of curves is presented, and examples of clustering cell shapes and protein structures are discussed.
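For trajectories on the two-sphere, the basic building blocks are the geodesic (arc-length) distance and geodesic interpolation along great circles. A minimal sketch of those primitives (not code from the dissertation):

```python
import numpy as np

def sphere_dist(p, q):
    """Geodesic (arc-length) distance between unit vectors on the sphere."""
    return float(np.arccos(np.clip(p @ q, -1.0, 1.0)))

def sphere_geodesic(p, q, s):
    """Point at fraction s along the great-circle arc from p to q (slerp).
    Both p and q must be unit vectors."""
    theta = sphere_dist(p, q)
    if theta < 1e-12:
        return p.copy()
    return (np.sin((1 - s) * theta) * p + np.sin(s * theta) * q) / np.sin(theta)

p = np.array([1.0, 0.0, 0.0])
q = np.array([0.0, 1.0, 0.0])
mid = sphere_geodesic(p, q, 0.5)   # geodesic midpoint, still on the sphere
```

Repeatedly applying such interpolation and distance computations (together with exponential and log maps) underlies registration and averaging of manifold-valued trajectories.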
 Date Issued
 2015
 Identifier
 FSU_migr_etd9503
 Format
 Thesis
 Title
 Intensity Estimation in Poisson Processes with Phase Variability.
 Creator

Gordon, Glenna, Wu, Wei, Whyte, James, Srivastava, Anuj, Chicken, Eric, Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

Intensity estimation for Poisson processes is a classical problem and has been studied extensively over the past few decades. However, current methods of intensity estimation assume that phase variability, or compositional noise (i.e., a nonlinear shift along the time axis), is nonexistent in the data, which is an unreasonable assumption for practical observations. The key challenge is that these observations are not "aligned", and registration procedures are required for successful estimation. As a result, these estimation methods can yield estimators that are inefficient or that underperform in simulations and applications. This dissertation summarizes two key projects that examine estimation of the intensity of a Poisson process in the presence of phase variability. The first project proposes an alignment-based framework for intensity estimation. First, it is shown that the intensity function is area-preserved with respect to compositional noise. This property implies that the time warping is encoded only in the density, or normalized intensity, function. The intensity function can then be decomposed into the product of the estimated total intensity (a scalar value) and the estimated density function. The estimation of the density relies on a metric that measures the phase difference between two density functions. An asymptotic study shows that the proposed estimation algorithm provides a consistent estimator of the normalized intensity. The success of the proposed estimation algorithm is illustrated using two simulations, and the new framework is applied to a real data set of neural spike trains, showing that the proposed estimation method yields improved classification accuracy over previous methods. The second project utilizes 2014 Florida data from the Healthcare Cost and Utilization Project's State Inpatient Database and State Emergency Department Database (provided to the U.S. 
Department of Health and Human Services, Agency for Healthcare Research and Quality by the Florida Agency for Health Care Administration) to examine heart failure emergency department arrival times. Current estimation methods for examining emergency department arrival data ignore the functional nature of the data and implement naive analysis methods. In this dissertation, the arrivals are treated as a Poisson process and the intensity of the process is estimated using existing density estimation and function registration methods. The results of these analyses show the importance of considering the functional nature of emergency department arrival data and the critical role that function registration plays in the intensity estimation of the arrival process.
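The decomposition described above (intensity = total intensity × density) suggests a simple estimator once the realizations have been registered. A hedged sketch, assuming pre-aligned event times and a Gaussian kernel density estimate; the bandwidth and simulation settings are illustrative:

```python
import numpy as np

def estimate_intensity(spike_trains, grid, bandwidth=0.05):
    """Decompose the intensity as lambda(t) = Lambda * f(t):
    Lambda (total intensity) is the mean event count per realization, and
    f is a kernel density estimate of the pooled event times. Registration
    of the trains (not shown) should precede this step."""
    Lambda = float(np.mean([len(s) for s in spike_trains]))
    pooled = np.concatenate(spike_trains)
    dg = grid[1] - grid[0]
    # Gaussian kernel density estimate evaluated on the grid
    diffs = (grid[:, None] - pooled[None, :]) / bandwidth
    f = np.exp(-0.5 * diffs ** 2).sum(axis=1)
    f /= f.sum() * dg                 # normalize to a density on the grid
    return Lambda, Lambda * f

# Toy data: 50 realizations of a homogeneous Poisson process on [0, 1]
rng = np.random.default_rng(1)
grid = np.linspace(0.0, 1.0, 201)
trains = [np.sort(rng.uniform(size=rng.poisson(20))) for _ in range(50)]
Lambda, intensity = estimate_intensity(trains, grid)
```

By construction the estimated intensity integrates back to the estimated total intensity, mirroring the area-preservation property the abstract invokes.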
 Date Issued
 2016
 Identifier
 FSU_FA2016_Gordon_fsu_0071E_13511
 Format
 Thesis
 Title
 Investigating the Use of Mortality Data as a Surrogate for Morbidity Data.
 Creator

Miller, Gregory, Hollander, Myles, McGee, Daniel, Hurt, Myra, Wu, Wei, Zhang, Jinfeng, Department of Statistics, Florida State University
 Abstract/Description

We are interested in differences between risk models based on Coronary Heart Disease (CHD) incidence, or morbidity, and risk models based on CHD death. Risk models based on morbidity have been developed from the Framingham Heart Study, while the European SCORE project developed a risk model for CHD death. Our goal is to determine whether these two models differ in treatment decisions concerning patient heart health. We begin by reviewing recent metrics for surrogate variables and prognostic model performance. We then conduct bootstrap hypothesis tests between two Cox proportional hazards models using Framingham data, one with incidence as the response and one with death as the response, and find that the coefficients differ for the age covariate but show no significant differences for the other risk factors. To understand how surrogacy applies to our case, where the surrogate variable is nested within the true variable of interest, we examine models based on a composite event compared to models based on singleton events. We also conduct a simulation, simulating times to CHD incidence and times from CHD incidence to CHD death, censoring at 25 years to represent the end of a study. We compare a Cox model with death as the response to a Cox model based on incidence using bootstrapped confidence intervals, and find that the age and systolic blood pressure coefficients differ. We continue the simulation by using the Net Reclassification Index (NRI) to evaluate the treatment-decision performance of the two models, and find that the two models do not perform significantly differently in correctly classifying events if the decisions are based on the risk ranks of the individuals. As long as the relative order of patients' risks is preserved across different risk models, treatment decisions based on classifying an upper specified percent as high risk will not be significantly different. 
We conclude the dissertation with a discussion of future methods for approaching our question.
 Date Issued
 2011
 Identifier
 FSU_migr_etd2408
 Format
 Thesis
 Title
 Matched-Sample-Based Approach for Cross-Platform Normalization on Gene Expression Data.
 Creator

Shao, Jiang, Zhang, Jinfeng, Sang, QingXiang Amy, Wu, Wei, Niu, Xufeng, Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

Gene expression data profiles are widely used in all kinds of biomedical studies, especially in cancer research. This dissertation focuses on the problem of how to combine datasets arising from different studies; of particular interest is how to remove the platform effect alone. The matched-sample-based cross-platform normalization method we developed is designed to tackle the data-merging problem in two scenarios. The first is Affy-Agilent cross-platform normalization, both platforms being classic microarray gene expression technologies. The second is the integration of microarray data with next-generation sequencing data. We use several general validation measures to assess our method and compare it with the popular distance-weighted discrimination (DWD) method. Using data from the public web-based tool NCI-60 CellMiner and The Cancer Genome Atlas data portal, our proposed method outperformed DWD in both cross-platform scenarios. The method can be further assessed by its ability to explore biological features in studies of cancer-type discrimination. We applied our method to two classification problems: breast cancer tumor/normal status classification on microarray and next-generation sequencing datasets, and breast cancer chemotherapy response classification on GPL96 and GPL570 microarray datasets. Both problems show that classification power increases after our matched-sample-based cross-platform normalization.
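The core idea, a calibration model learned from samples measured on both platforms, can be sketched as a per-gene least-squares fit. This is an illustrative reconstruction under a simple linear model, not the dissertation's exact procedure:

```python
import numpy as np

def matched_sample_normalize(ref, other, new_other):
    """Per-gene OLS calibration learned from matched samples.

    ref, other : (n_matched, n_genes) arrays with expression of the SAME
        samples measured on the reference platform and the other platform.
    new_other  : (m, n_genes) samples measured only on the other platform,
        mapped onto the reference platform's scale gene by gene."""
    x_mean, y_mean = other.mean(axis=0), ref.mean(axis=0)
    slope = ((other - x_mean) * (ref - y_mean)).sum(axis=0) \
        / ((other - x_mean) ** 2).sum(axis=0)
    intercept = y_mean - slope * x_mean
    return new_other * slope + intercept

# Toy demo: platform B reports 2 * (true expression) + 1, with no noise,
# so calibration should recover the reference scale exactly.
rng = np.random.default_rng(2)
true_expr = rng.normal(size=(10, 5))
ref = true_expr                       # reference platform = ground truth
other = 2.0 * true_expr + 1.0         # other platform, shifted and scaled
new_true = rng.normal(size=(4, 5))
normalized = matched_sample_normalize(ref, other, 2.0 * new_true + 1.0)
```

Because the matched samples share the same biology, whatever the calibration removes is, ideally, pure platform effect; real data would of course include measurement noise on both sides.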
 Date Issued
 2015
 Identifier
 FSU_2015fall_Shao_fsu_0071E_12833
 Format
 Thesis
 Title
 A Matched-Sample-Based Normalization Method: Cross-Platform Microarray and NGS Data Integration.
 Creator

Zhang, Se Rin, Zhang, Jinfeng, Sang, QingXiang, Wu, Wei, Niu, Xufeng, Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

Utilizing high-throughput gene expression data stored in public archives not only saves research time and cost but also enhances statistical power. However, gene expression profiling data can be obtained from many different technical platforms, and the same gene expression quantified by different platforms has different distributional properties, which makes data integration across multiple platforms challenging. Several cross-platform normalization methods have been developed to remove the differences caused by platform discrepancy, but they remove important biological signals as well. Zhang and Jiang (2015) introduced a new method that focuses on eliminating the platform effect among the systematic effects by employing matched samples, measured on different platforms, to obtain a benchmark model. Since the matched samples have no biological difference, their approach robustly removes only the platform effect. They showed that the new method performs better than the Distance-Weighted Discrimination (DWD) method. This work is a follow-up study of theirs, and we attempt to improve the method by incorporating the Fast Linear Mixed Regression (FLMER) model. The results indicate that the FLMER model works better than the originally proposed ordinary least squares (OLS) model in after-normalization concordance comparison and differential expression (DE) analysis. We also compare our methods with other existing cross-platform normalization methods: not only DWD but also the empirical Bayes, XPN, and GQ methods. The results show that the proposed method performs much better than the other cross-platform normalization methods at removing platform differences while keeping the biological information.
 Date Issued
 2018
 Identifier
 2018_Fall_Zhang_fsu_0071E_14868
 Format
 Thesis
 Title
 Mixed-Effects Models for Count Data with Applications to Educational Research.
 Creator

Shin, Jihyung, Niu, Xufeng, Hu, Shouping, Al Otaiba, Stephanie Dent, McGee, Daniel, Wu, Wei, Department of Statistics, Florida State University
 Abstract/Description

This research is motivated by an analysis of reading research data. We are interested in modeling the outcome of a test of the ability of kindergarten children, aged between 5 and 7, to fluently recode letters into sounds. The data showed excessive zero scores (more than 30% of children) on the test. In this dissertation, we carefully examine models that deal with excessive zeros, which are based on a mixture of distributions: a point mass at zero and a standard probability distribution with non-negative values. In such cases, a log-normal random variable (for semicontinuous data) or a Poisson random variable (for count data) is observed with some probability. The previously proposed models, the mixed-effects and mixed-distribution models (MEMD) of Tooze et al. (2002) for semicontinuous data and the zero-inflated Poisson (ZIP) regression models of Lambert (1992) for count data, are reviewed. We extend the zero-inflated Poisson model to repeated measures data by introducing a pair of possibly correlated random effects to accommodate within-subject correlation and between-subject heterogeneity. The model describes the effect of predictor variables on the probability of nonzero responses (occurrence) and on the mean of nonzero responses (intensity) separately. The likelihood function, approximated by adaptive Gaussian quadrature, is maximized using dual quasi-Newton optimization, and the maximum likelihood estimates are obtained through a standard statistical software package. Varying the model parameters, the number of subjects, and the number of measurements per subject, a simulation study is conducted and the results are presented. The dissertation ends with the application of the model to reading research data and directions for future research. We examine the number of correct letter sounds counted for children over the 2008-2009 academic year. 
We find that age, gender and socioeconomic status are significantly related to the letter-sound fluency of children in both parts of the model. The model provides a better explanation of the data structure and easier interpretation of parameter values, as they are the same as in standard logistic and Poisson regression models. The model can be extended to accommodate serial correlation, which can be observed in longitudinal data. One may also consider a multilevel zero-inflated Poisson model; although such a model has been proposed previously, parameter estimation by penalized quasi-likelihood methods is questionable, and further examination is needed.
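The zero-inflated Poisson distribution at the heart of the model mixes a point mass at zero with a Poisson count. A minimal sketch of its probability mass function (the parameter names are illustrative):

```python
from math import exp, factorial

def zip_pmf(y, lam, pi0):
    """Zero-inflated Poisson: with probability pi0 the count is a structural
    zero; otherwise it is Poisson(lam).
        P(Y = 0) = pi0 + (1 - pi0) * e^{-lam}
        P(Y = y) = (1 - pi0) * e^{-lam} * lam^y / y!,   y >= 1"""
    base = exp(-lam) * lam ** y / factorial(y)
    return pi0 + (1 - pi0) * base if y == 0 else (1 - pi0) * base

def zip_mean(lam, pi0):
    """E[Y] = (1 - pi0) * lam, shrunk below lam by the extra zeros."""
    return (1 - pi0) * lam
```

In the regression version, pi0 and lam are each linked to covariates (logit and log links, respectively), which is how the occurrence and intensity parts of the model get separate predictor effects.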
 Date Issued
 2012
 Identifier
 FSU_migr_etd5181
 Format
 Thesis
 Title
 Multistate Intensity Model with AR-GARCH Random Effect for Corporate Credit Rating Transition Analysis.
 Creator

Li, Zhi, Niu, Xufeng, Huffer, Fred, Kercheval, Alec, Wu, Wei, Department of Statistics, Florida State University
 Abstract/Description

This thesis presents a stochastic process and time series study of corporate credit rating and market-implied rating transitions. By extending an existing model, it incorporates generalized autoregressive conditional heteroscedastic (GARCH) random effects to capture volatility changes in the instantaneous transition rates. The GARCH model is a crucial tool in financial research, since its ability to model volatility changes gives market practitioners the flexibility to build more accurate models of high-frequency financial data. Corporate rating transition modeling historically dealt with low-frequency data, for which there was no need to specify the volatility. However, the newly published Moody's market-implied ratings exhibit much higher transition frequencies; therefore, we feel it is necessary to capture the volatility component and extend existing models to reflect this fact. The theoretical model specification and estimation details are discussed thoroughly in this dissertation. The performance of our models is studied on several simulated data sets and compared to the original model. Finally, the models are applied to both Moody's issuer rating and market-implied rating transition data.
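The GARCH component mentioned above follows the standard GARCH(1,1) conditional-variance recursion. A minimal sketch (the parameter values are illustrative, not estimates from the thesis):

```python
import numpy as np

def garch11_variance(returns, omega, alpha, beta, sigma2_0):
    """GARCH(1,1) conditional-variance recursion:
        sigma2_t = omega + alpha * r_{t-1}^2 + beta * sigma2_{t-1}.
    The stationary (unconditional) variance is omega / (1 - alpha - beta)
    when alpha + beta < 1."""
    sigma2 = np.empty(len(returns) + 1)
    sigma2[0] = sigma2_0
    for t, r in enumerate(returns):
        sigma2[t + 1] = omega + alpha * r ** 2 + beta * sigma2[t]
    return sigma2

# With all returns zero, the recursion decays geometrically (rate beta)
# toward the fixed point omega / (1 - beta).
s = garch11_variance(np.zeros(300), omega=0.1, alpha=0.05,
                     beta=0.85, sigma2_0=3.0)
```

In the thesis's setting the same volatility mechanism is attached to random effects on instantaneous transition rates rather than to asset returns.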
 Date Issued
 2010
 Identifier
 FSU_migr_etd1426
 Format
 Thesis
 Title
 Nonparametric Nonstationary Density Estimation Including Upper Control Limit Methods for Detecting Change Points.
 Creator

Becvarik, Rachel A., Chicken, Eric, Liu, Guosheng, Sinha, Debajyoti, Wu, Wei, Department of Statistics, Florida State University
 Abstract/Description

Nonstationary nonparametric densities occur naturally in applications such as monitoring the amount of toxins in the air and monitoring internet streaming data. Progress has been made in estimating these densities, but there is little current work on monitoring them for changes. A new statistic is proposed that effectively monitors these nonstationary nonparametric densities through the use of transformed wavelet coefficients of the quantiles. The method is completely nonparametric, relying on no particular distributional assumptions, which makes it effective in a variety of conditions. Existing methods for monitoring sequential data typically use a single-value upper control limit (UCL), based on a specified in-control average run length (ARL), to detect changes in these nonstationary statistics. However, such a UCL is not designed to take into consideration the false alarm rate, the power associated with the test, or the underlying distribution of the run length. Additionally, if the monitoring statistic is known to be monotonic over time (which is typical in methods using maxima in their statistics, for example), the flat UCL does not adjust to this property. We propose several methods for creating UCLs that provide improved power and simultaneously adjust the false alarm rate to user-specified values. Our methods are constructive in nature, making no use of assumed distributional properties of the underlying monitoring statistic. We evaluate the proposed UCLs through simulations to illustrate the improvements over current UCLs. The proposed method is evaluated with respect to profile-monitoring scenarios and the proposed density statistic, and is applicable for monitoring any monotonically nondecreasing nonstationary statistic.
 Date Issued
 2013
 Identifier
 FSU_migr_etd7292
 Format
 Thesis
 Title
 A Novel Riemannian Metric for Analyzing Spherical Functions with Applications to HARDI Data.
 Creator

Ncube, Sentibaleng, Srivastava, Anuj, Klassen, Eric, Wu, Wei, Niu, Xufeng, Department of Statistics, Florida State University
 Abstract/Description

We propose a novel Riemannian framework for analyzing orientation distribution functions (ODFs), or their probability density functions (PDFs), in HARDI data sets, for use in comparing, interpolating, averaging, and denoising PDFs. This is accomplished by separating the shape and orientation features of PDFs and then analyzing them separately under their own Riemannian metrics. We formulate the action of the rotation group on the space of PDFs, and define the shape space as the quotient space of PDFs modulo rotations. In other words, any two PDFs are compared in: (1) shape, by rotationally aligning one PDF to the other and using the Fisher-Rao distance on the aligned PDFs, and (2) orientation, by comparing their rotation matrices. This idea improves upon results obtained by applying the Fisher-Rao metric to PDFs directly, a technique that is being used increasingly, and leads to geodesic interpolations that are biologically feasible. The framework also yields definitions and efficient computations for the Karcher mean, providing tools for improved interpolation and denoising. We demonstrate these ideas using an experimental setup involving several PDFs.
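The Fisher-Rao distance between two PDFs on a common grid has a convenient closed form: it is the arc length between their square roots on the unit Hilbert sphere. A hedged sketch (the Gaussian test densities and grid are illustrative):

```python
import numpy as np

def fisher_rao_dist(p, q, dx):
    """Fisher-Rao distance between densities sampled on a common grid:
    the arc length between sqrt(p) and sqrt(q) on the unit Hilbert sphere,
        d(p, q) = arccos( integral sqrt(p(x) q(x)) dx )."""
    inner = float(np.sum(np.sqrt(p * q)) * dx)
    return float(np.arccos(np.clip(inner, -1.0, 1.0)))

x = np.linspace(-6.0, 6.0, 2001)
dx = x[1] - x[0]

def gaussian_on_grid(mu, sigma):
    g = np.exp(-0.5 * ((x - mu) / sigma) ** 2)
    return g / (g.sum() * dx)        # normalize numerically on the grid

p = gaussian_on_grid(0.0, 1.0)
q = gaussian_on_grid(1.0, 1.0)
d = fisher_rao_dist(p, q, dx)
```

For two unit-variance Gaussians one mean-unit apart, the Bhattacharyya coefficient is exp(-1/8), so d should come out near arccos(exp(-1/8)); the square-root representation is also what makes the rotational alignment and Karcher mean computations in the abstract tractable.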
 Date Issued
 2011
 Identifier
 FSU_migr_etd5064
 Format
 Thesis
 Title
 Parametric and Nonparametric Spherical Regression with Diffeomorphisms.
 Creator

Rosenthal, Michael, Srivastava, Anuj, Wu, Wei, Klassen, Eric, Pati, Debdeep, Department of Statistics, Florida State University
 Abstract/Description

Spherical regression explores relationships between pairs of variables on spherical domains. Spherical data have become more prevalent in biological, gaming, geographical, and meteorological investigations, creating a need for tools that analyze such data. Previous work on spherical regression has focused on rigid parametric models or nonparametric kernel smoothing methods, leaving a gap in the available tools with no intermediate options. This work develops two such intermediate models: a parametric model using projective linear transformations and a nonparametric model using diffeomorphic maps from the sphere to itself. The models are estimated in a maximum-likelihood framework using gradient-based optimization. For the parametric model, an efficient Newton-Raphson algorithm is derived and an asymptotic analysis is developed. A first-order roughness penalty is specified for the nonparametric model using the Jacobian of the diffeomorphisms. The prediction performance of the proposed models is compared with state-of-the-art methods using simulated and real data involving plate tectonics, cloud deformations, wind, accelerometer, bird migration, and vectorcardiogram data.
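The parametric model's link is the projective linear transformation of the sphere, x ↦ Ax/|Ax| for a nonsingular matrix A; when A is orthogonal it reduces to the rigid rotation models mentioned above. A minimal sketch (the example matrices are illustrative):

```python
import numpy as np

def projective_linear(A, x):
    """Projective linear transformation of the sphere: x -> Ax / |Ax|.
    For orthogonal A this is a rigid rotation; a general nonsingular A
    bends the sphere, giving the more flexible parametric link."""
    y = A @ x
    return y / np.linalg.norm(y)

# A 90-degree rotation about the z-axis (rigid special case) ...
R = np.array([[0.0, -1.0, 0.0],
              [1.0,  0.0, 0.0],
              [0.0,  0.0, 1.0]])
# ... and a genuinely projective (non-orthogonal) transformation.
A = np.diag([2.0, 1.0, 1.0])

e1 = np.array([1.0, 0.0, 0.0])
rotated = projective_linear(R, e1)
skewed = projective_linear(A, np.array([1.0, 1.0, 0.0]) / np.sqrt(2.0))
```

Estimation would maximize a likelihood over the entries of A (up to scale, since A and cA induce the same map), which is where the Newton-Raphson algorithm in the abstract comes in.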
 Date Issued
 2014
 Identifier
 FSU_migr_etd9082
 Format
 Thesis
 Title
 Riemannian Shape Analysis of Curves and Surfaces.
 Creator

Kurtek, Sebastian, Srivastava, Anuj, Klassen, Eric, Wu, Wei, Huffer, Fred, Dryden, Ian, Department of Statistics, Florida State University
 Abstract/Description

Shape analysis of curves and surfaces is a very important tool in many applications ranging from computer vision to bioinformatics and medical imaging. There are many difficulties in analyzing shapes of parameterized curves and surfaces. First, it is important to develop representations and metrics such that the analysis is invariant to parameterization in addition to the standard transformations (rigid motion and scaling). Furthermore, under the chosen representations and metrics, the analysis must be performed on infinite-dimensional and sometimes nonlinear spaces, which poses an additional difficulty. In this work, we develop and apply methods that address these issues. We begin by defining a framework for shape analysis of parameterized open curves and extend these ideas to shape analysis of surfaces. We utilize the presented frameworks in various classification experiments spanning multiple application areas. In the case of curves, we consider the problems of clustering DT-MRI brain fibers, classification of protein backbones, modeling and segmentation of signatures, and statistical analysis of biosignals. In the case of surfaces, we perform disease classification using 3D anatomical structures in the brain, classification of handwritten digits by viewing images as quadrilateral surfaces, and classification of cropped facial surfaces. We provide two additional extensions of the general shape analysis frameworks that are the focus of this dissertation. The first considers shape analysis of marked spherical surfaces, where in addition to the surface information we are given a set of manually or automatically generated landmarks; this requires additional constraints on the definition of the reparameterization group and is applicable in many domains, especially medical imaging and graphics. Second, we consider reflection symmetry analysis of planar closed curves and spherical surfaces. 
Here, we also provide an example of disease detection based on brain asymmetry measures. We close with a brief summary and a discussion of open problems, which we plan on exploring in the future.
 Date Issued
 2012
 Identifier
 FSU_migr_etd4963
 Format
 Thesis
 Title
 Robust Function Registration Using Depth on the Phase Variability.
 Creator

Cleveland, Jason D., Wu, Wei, Klassen, E. (Eric), Srivastava, Anuj, Tao, Minjing, Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

In the field of functional data analysis, registration remains a fundamental problem. Registration must take into account which underlying template is chosen as the "center" for alignment purposes and the hurdles that come with each choice. In this dissertation we cover the registration of temporal observations with both compositional and additive noises present under a mean template, and the registration of temporal observations under a median template. Our first project covers an adaptation of the Fisher-Rao framework that can yield a consistent estimator when both noises are present. Our second project covers a similar registration but uses data depth. The adapted Fisher-Rao method gives a mean template and the data depth method gives a median. As in standard statistics, we wish to explore outlier removal, appropriate template usage, and pattern classification. Various frameworks have been developed over the past two decades where registrations are conducted based on optimal time warping between functions. Comparison of functions solely based on time warping, however, may have limited application, in particular when certain constraints are desired in the registration. In the first project, we study registration with a norm-preserving constraint. A closely related problem is signal estimation, where the goal is to estimate the ground-truth template given random observations with both compositional and additive noises. We propose to adopt the Fisher-Rao framework to compute the underlying template, and mathematically prove that this framework leads to a consistent estimator. We then illustrate the constrained Fisher-Rao registration using simulations as well as two real data sets. It is found that the constrained method is robust with respect to additive noise and has superior alignment and classification performance compared to conventional, unconstrained registration methods.
Recently, statistical methodologies have been developed and extended to deal with functional observations. An existing notion of depth has advanced the ways in which we can view these functional observations, namely through an intuitive center-outward ordering scheme and median signal template estimation. However, functional observations often contain noise. Thus, we propose a semiparametric model and an algorithm that yields a consistent estimator for the underlying median template. We also propose a new band-based boxplot methodology for outlier detection and removal. We illustrate the robustness of this depth-based registration using simulations as well as two real data sets.
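The Fisher-Rao framework referenced in this abstract is commonly made computable through the square-root slope function (SRSF) representation, under which the Fisher-Rao metric reduces to the ordinary L2 metric and time warping acts by isometries. A minimal sketch of that standard representation (function names are illustrative; this is not the dissertation's code):

```python
import numpy as np

def srsf(f, t):
    """Square-root slope function q = sign(f') * sqrt(|f'|).

    Under this representation the Fisher-Rao distance between
    functions becomes the ordinary L2 distance between their SRSFs,
    which is what makes warping-invariant registration tractable.
    """
    df = np.gradient(f, t)
    return np.sign(df) * np.sqrt(np.abs(df))

def warp(f, t, gamma):
    """Compose f with a warping function gamma: (f o gamma)(t)."""
    return np.interp(gamma, t, f)

t = np.linspace(0, 1, 500)
f = np.sin(2 * np.pi * t)
gamma = t ** 2                 # a simple warping with gamma(0)=0, gamma(1)=1

q_f = srsf(f, t)
q_warped = srsf(warp(f, t, gamma), t)

# The squared L2 norm of the SRSF equals the total variation of f,
# which warping does not change (the isometry property):
dt = t[1] - t[0]
print(np.sum(q_f**2) * dt, np.sum(q_warped**2) * dt)  # both approximately 4
```

The two printed norms agree up to discretization error; this invariance is the property that makes template estimation under warping well posed.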
 Date Issued
 2017
 Identifier
 FSU_2017SP_Cleveland_fsu_0071E_13838
 Format
 Thesis
 Title
 Shape Based Function Estimation.
 Creator

Dasgupta, Sutanoy, Srivastava, Anuj, Pati, Debdeep, Klassen, E. (Eric), Huffer, Fred W. (Fred William), Wu, Wei, Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

Estimation of functions is an extremely rich and well-researched area with broad applications spanning several scientific fields. We develop a shape-based framework for probability density and general function modelling. The framework encompasses both shape-constrained and unconstrained estimation, and can accommodate a much broader notion of shape constraints than has been considered in the literature. The estimation approach is a two-step process where the first step creates a template or an initial guess, and the second step "improves" the estimate according to an appropriate objective function. We derive asymptotic properties of the estimators in different scenarios, and illustrate the performance of the estimates through several simulated as well as real data examples.
 Date Issued
 2019
 Identifier
 2019_Summer_Dasgupta_fsu_0071E_15347
 Format
 Thesis
 Title
 Sparse Feature and Element Selection in High-Dimensional Vector Autoregressive Models.
 Creator

Huang, Xue, Niu, Xufeng, She, Yiyuan, Cheng, Yingmei, Huffer, Fred W. (Fred William), Wu, Wei, Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

This thesis aims to identify the underlying structures of multivariate time series and to propose a methodology for constructing predictive VAR models. Due to the complexity of high dimensions in multivariate time series, forecasting a target series with many predictors in VAR models poses a challenge in statistical learning and modeling. The quadratically increasing dimension of the parameter space, known as the "curse of dimensionality," poses considerable challenges to multivariate time series models. Meanwhile, two facts are involved in reducing dimensions in multivariate time series: first, some nuisance time series exist and are better removed; second, a target time series is typically driven by a few dependent elements constructed from some indices. To address these challenges, our approach is to reduce both the dimensions of the series and the features involved in each series simultaneously. As a result, the original high-dimensional structure can be modeled using a lower-dimensional time series, and subsequently the forecasting performance is improved. The methodology we introduce in this work is called Sparse Feature and Element Selection (SFES). It employs an "L1 + group L1" penalty to conduct group selection and variable selection within each group simultaneously. Our contributions in this thesis are twofold. First, the doubly-constrained regularization in SFES is a convex mathematical problem, and we optimize it using a fast but simple-to-implement algorithm. We evaluate this algorithm with a large-scale dataset and theoretically prove that it has guaranteed strict iterative convergence and global optimality. Second, we theoretically present non-asymptotic results based on a combined statistical and computational analysis. A sharp oracle inequality is proved to reveal its power in predictive learning. We compare SFES with the related work on Sparse Group Lasso (SGL) to show that the proposed method is both computationally efficient and theoretically justified. Experiments using simulated data and real-world macroeconomic time series data are conducted to demonstrate the efficiency and efficacy of the proposed SFES in practice.
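The "L1 + group L1" penalty described above has the sparse-group-lasso form, whose proximal operator (elementwise soft-thresholding followed by groupwise shrinkage) is a standard building block of proximal-gradient solvers for such penalties. A generic sketch of that operator, not the SFES algorithm itself:

```python
import numpy as np

def prox_sparse_group(v, groups, lam1, lam2):
    """Proximal operator of lam1*||x||_1 + lam2*sum_g ||x_g||_2.

    This two-stage shrinkage yields sparsity both across groups and
    within each selected group, matching the 'L1 + group L1' idea.
    """
    # Stage 1: elementwise soft-thresholding (the L1 part).
    x = np.sign(v) * np.maximum(np.abs(v) - lam1, 0.0)
    # Stage 2: group soft-thresholding (the group-L1 part).
    out = np.zeros_like(x)
    for g in groups:
        norm = np.linalg.norm(x[g])
        if norm > lam2:
            out[g] = (1.0 - lam2 / norm) * x[g]
    return out

v = np.array([3.0, -0.5, 0.2, 2.0, 0.1, -0.1])
groups = [[0, 1, 2], [3, 4, 5]]
x = prox_sparse_group(v, groups, lam1=0.4, lam2=0.5)
print(x)   # small entries are zeroed; surviving groups are shrunk
```

Within a proximal-gradient loop, this operator would be applied after each gradient step on the VAR least-squares loss.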
 Date Issued
 2016
 Identifier
 FSU_FA2016_Huang_fsu_0071E_13659
 Format
 Thesis
 Title
 A Statistical Approach to an Ocean Circulation Inverse Problem.
 Creator

Choi, Seoeun, Huffer, Fred W., Speer, Kevin G., Nolder, Craig, Niu, Xufeng, Wu, Wei, Department of Statistics, Florida State University
 Abstract/Description

This dissertation presents, applies, and evaluates a statistical approach to an ocean circulation problem. The objective is to produce a map of ocean velocity in the North Atlantic from sparse measurements along ship tracks, using a Bayesian approach with a physical model. The physical model is the Stommel Gulf Stream model, which relates the wind stress curl to the transport stream function. A Gibbs sampler is used to extract features from the posterior velocity field. To specify the prior, the equation of the Stommel Gulf Stream model on a two-dimensional grid is used. Comparisons with earlier approaches used by oceanographers are also presented.
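The Gibbs sampler mentioned above alternates draws from each full conditional distribution of the posterior. A toy illustration of the mechanism on a bivariate normal (the posterior velocity field in the dissertation is of course a far larger state space; all names here are illustrative):

```python
import numpy as np

def gibbs_bivariate_normal(rho, n_samples, seed=0):
    """Minimal Gibbs sampler for a standard bivariate normal with
    correlation rho: each coordinate is drawn from its full conditional
    given the other, the same alternating scheme used (on a gridded
    velocity field) to sample a high-dimensional posterior.
    """
    rng = np.random.default_rng(seed)
    x = y = 0.0
    samples = np.empty((n_samples, 2))
    cond_sd = np.sqrt(1.0 - rho**2)
    for i in range(n_samples):
        x = rng.normal(rho * y, cond_sd)   # x | y ~ N(rho*y, 1 - rho^2)
        y = rng.normal(rho * x, cond_sd)   # y | x ~ N(rho*x, 1 - rho^2)
        samples[i] = (x, y)
    return samples

s = gibbs_bivariate_normal(rho=0.8, n_samples=20000)
print(np.corrcoef(s[:, 0], s[:, 1])[0, 1])  # close to 0.8
```

Features of the posterior (means, credible intervals, derived maps) are then estimated from the retained samples, exactly as the abstract describes for the velocity field.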
 Date Issued
 2007
 Identifier
 FSU_migr_etd3758
 Format
 Thesis
 Title
 Statistical Modelling and Applications of Neural Spike Trains.
 Creator

Lawhern, Vernon, Wu, Wei, Contreras, Robert J., Srivastava, Anuj, Huffer, Fred, Niu, Xufeng, Department of Statistics, Florida State University
 Abstract/Description

In this thesis we investigate statistical modelling of neural activity in the brain. We first develop a framework which extends the state-space Generalized Linear Model (GLM) of Eden and colleagues [20] to include the effects of hidden states. These states, collectively, represent variables which are not observed (or even observable) in the modeling process but can nonetheless have an impact on the neural activity. We then develop a framework that allows us to input a priori target information into the model. We examine both of these modelling frameworks on motor cortex data recorded from monkeys performing different target-driven hand and arm movement tasks. Finally, we perform temporal coding analysis of sensory stimulation using principled statistical models and show the efficacy of our approach.
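A state-space GLM of this kind builds on the standard point-process GLM likelihood for spike trains. A minimal discrete-time version of that observation model (variable names are illustrative; the hidden-state extension itself is not shown):

```python
import numpy as np

def glm_loglik(spikes, covariates, beta, dt=0.001):
    """Discrete-time point-process GLM log-likelihood for a spike train.

    The conditional intensity is lambda_t = exp(x_t . beta); in small
    bins, log L = sum_t [ n_t * log(lambda_t * dt) - lambda_t * dt ].
    """
    lam = np.exp(covariates @ np.asarray(beta))
    return float(np.sum(spikes * np.log(lam * dt) - lam * dt))

# Simulate a spike train whose rate is modulated by a known covariate.
rng = np.random.default_rng(1)
n_bins, dt = 5000, 0.001
x = np.column_stack([np.ones(n_bins),
                     np.sin(2 * np.pi * np.arange(n_bins) * dt)])
beta_true = np.array([3.0, 1.0])           # ~20 Hz baseline, modulated
spikes = rng.poisson(np.exp(x @ beta_true) * dt)

ll_true = glm_loglik(spikes, x, beta_true)
ll_flat = glm_loglik(spikes, x, [3.0, 0.0])  # ignores the modulation
print(ll_true, ll_flat)
```

The log-likelihood is higher under the true modulated model than under a flat-rate alternative; state-space variants augment the covariates `x` with latent states evolving over time.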
 Date Issued
 2011
 Identifier
 FSU_migr_etd3251
 Format
 Thesis
 Title
 Statistical Shape Analysis of Neuronal Tree Structures.
 Creator

Duncan, Adam, Srivastava, Anuj, Klassen, E., Wu, Wei, Huffer, Fred W., Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

Neuron morphology plays a central role in characterizing cognitive health and functionality of brain structures. The problem of quantifying neuron shapes, and capturing statistical variability of shapes, is difficult because axons and dendrites have tree structures that differ in both geometry and topology. In this work, we restrict attention to trees that consist of: (1) a main branch viewed as a parameterized curve in ℝ³, and (2) some number of secondary branches, also parameterized curves in ℝ³, which emanate from the main branch at arbitrary points. We present two shape-analytic frameworks which each give a metric structure to the set of such tree shapes. Both frameworks are based on an elastic metric on the space of curves with certain shape-preserving nuisance variables modded out. In the first framework, the side branches are treated as a continuum of curve-valued annotations to the main branch. In the second framework, the side branches are treated as discrete entities and are matched to each other by permutation. We show geodesic deformations between tree shapes in both frameworks, and we show Fréchet means and modes of variability, as well as cross-validated classification between different experimental groups using the second framework. We conclude with a smaller project which extends some of these ideas to more general weighted attributed graphs.
 Date Issued
 2018
 Identifier
 2018_Sp_Duncan_fsu_0071E_14500
 Format
 Thesis
 Title
 Stochastic Models and Inferences for Commodity Futures Pricing.
 Creator

Ncube, Moeti M., Srivastava, Anuj, Doran, James, Mason, Patrick, Niu, Xufeng, Huffer, Fred, Wu, Wei, Department of Statistics, Florida State University
 Abstract/Description

The stochastic modeling of financial assets is essential to the valuation of financial products and to investment decisions. These models are governed by certain parameters that are estimated through a process known as calibration. Current procedures typically perform a grid-search optimization of a given objective function over a specified parameter space. These methods can be computationally intensive and require restrictions on the parameter space to achieve timely convergence. In this thesis, we propose an alternative Kalman Smoother Expectation Maximization procedure (KSEM) that can jointly estimate all the parameters and produces a better model fit compared to alternative estimation procedures. Further, we consider the additional complexity of modeling jumps or spikes that may occur in a time series. For this calibration we develop a Particle Smoother Expectation Maximization procedure (PSEM) for the optimization of nonlinear systems. This is an entirely new estimation approach, and we provide several examples of its application.
 Date Issued
 2009
 Identifier
 FSU_migr_etd2707
 Format
 Thesis
 Title
 Tools for Statistical Analysis on Shape Spaces of Three-Dimensional Objects.
 Creator

Xie, Qian, Srivastava, Anuj, Klassen, E. (Eric), Huffer, Fred W. (Fred William), Wu, Wei, Zhang, Jinfeng, Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

With the increasing popularity of information technology, especially electronic imaging techniques, large amounts of high-dimensional data such as 3D shapes have become pervasive in science, engineering, and even people's daily lives in recent years. Though the data quantity is huge, the extraction of relevant knowledge from those data is still limited. How to understand data in a meaningful way is generally an open problem. The specific challenges include finding adequate mathematical representations of data and designing proper algorithms to process them. The existing tools for analyzing high-dimensional data, including 3D shape data, are found to be insufficient as they usually suffer from many factors, such as misalignments, noise, and clutter. This thesis attempts to develop a framework for processing, analyzing, and understanding high-dimensional data, especially 3D shapes, by proposing a set of statistical tools including theory, algorithms, and optimization applied to practical problems. In particular, the following aspects of shape analysis are considered: 1. A framework adopting the SRNF representation, based on parallel transport of deformations across surfaces in the shape space, leads to statistical analysis of shape data. Three main analyses are conducted under this framework: (1) computing geodesics when either two end surfaces or the starting surface and an initial deformation are given; (2) parallel transporting deformations across surfaces; and (3) sampling random surfaces. 2. Computational efficiency plays an important role in performing statistical shape analysis on large datasets of 3D objects. To speed up the previous method, a framework with a numerical solution is introduced by approximating the inverse mapping, and it reduces the computational cost by an order of magnitude. 3. The geometrical and morphological information of 3D objects, or their shapes, can be analyzed explicitly using boundaries extracted from original image scans. An alternative idea is to consider variability in shapes directly from their embedding images. A novel framework is proposed to unify three important tasks: registering, comparing, and modeling images. 4. Finally, the spatial deformations learned from registering images are modeled using a GRID-based decomposition. This specific model provides a way to decompose a large deformation into local and fundamental ones so that shape differences between images are easily interpretable. We conclude with a summary of this research and discuss potential future directions of statistical shape analysis in the last chapter, from both methodological and application perspectives.
 Date Issued
 2015
 Identifier
 FSU_migr_etd9495
 Format
 Thesis
 Title
 Trend and VariablePhase Seasonality Estimation from Functional Data.
 Creator

Tai, LiangHsuan, Gallivan, Kyle A., Srivastava, Anuj, Wu, Wei, Klassen, E. (Eric), Ökten, Giray, Florida State University, College of Arts and Sciences, Department of Mathematics
 Abstract/Description

The problem of estimating trend and seasonality has been studied over several decades, although mostly in a single-time-series setup. This dissertation studies the problem of estimating these components from a functional data point of view, i.e., from multiple curves, in situations where seasonal effects exhibit arbitrary time warpings or phase variability across different observations. Rather than ignoring the phase variability, or using an off-the-shelf alignment method to remove phase, we take a model-based approach and seek Maximum Likelihood Estimators (MLEs) of the trend and the seasonal effects, while performing alignments over the seasonal effects at the same time. The MLEs of trend, seasonality, and phase are computed using a coordinate-descent-based optimization method. We use bootstrap replication for computing confidence bands and for testing hypotheses about the estimated components. We also utilize the log-likelihood for selecting the trend subspace and for comparisons with other candidate models. This framework is demonstrated using experiments involving synthetic data and three real data sets (Berkeley growth velocity, U.S. electricity price, and USD exchange fluctuation). Our framework is further applied to another biological problem, significance analysis of gene sets in time-course gene expression data, where it outperforms the state-of-the-art method.
 Date Issued
 2017
 Identifier
 FSU_2017SP_Tai_fsu_0071E_13816
 Format
 Thesis
 Title
 Univariate and Multivariate Volatility Models for Portfolio Value at Risk.
 Creator

Xiao, Jingyi, Niu, Xufeng, Ökten, Giray, Wu, Wei, Huffer, Fred W. (Fred William), Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

In modern-day financial risk management, modeling and forecasting stock return movements via their conditional volatilities, and particularly predicting the Value at Risk (VaR), have become increasingly important for a healthy economic environment. In this dissertation, we evaluate and compare two main families of models for conditional volatilities, GARCH and Stochastic Volatility (SV), in terms of their VaR prediction performance on 5 major US stock indices. We calculate GARCH-type model parameters via Quasi Maximum Likelihood Estimation (QMLE), while for those of SV we employ MCMC with the Ancillary Sufficient Interweaving Strategy. We use the forecast volatilities corresponding to each model to predict the VaR of the 5 indices. We test the predictive performance of the estimated models by a two-stage backtesting procedure and then compare them via the Lopez loss function. The results of this dissertation indicate that even though it is more computationally demanding than GARCH-type models, SV dominates them in forecasting VaR. Since financial volatilities move together across assets and markets, it becomes apparent that modeling the volatilities in a multivariate framework is more appropriate. However, existing studies in the literature do not present compelling evidence for a strong preference between univariate and multivariate models. In this dissertation we also address the problem of forecasting portfolio VaR via multivariate GARCH models versus univariate GARCH models. We construct 3 portfolios with stock returns of 3 major US stock indices, 6 major banks, and 6 major technology companies, respectively. For each portfolio, we model the portfolio conditional covariances with GARCH and EGARCH, and with MGARCH-BEKK, MGARCH-DCC, and GO-GARCH models. For each estimated model, the forecast portfolio volatilities are further used to calculate (portfolio) VaR. The ability to capture the portfolio volatilities is evaluated by MAE and RMSE; the VaR prediction performance is tested through a two-stage backtesting procedure and compared in terms of the loss function. The results of our study indicate that even though MGARCH models are better at predicting the volatilities of some portfolios, GARCH models can perform as well as their multivariate (and computationally more demanding) counterparts.
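As a concrete illustration of the GARCH side of such a comparison, a one-day-ahead VaR forecast from a GARCH(1,1) recursion can be sketched as follows (parameters are fixed by hand for illustration; the dissertation estimates them via QMLE):

```python
import numpy as np

def garch11_var(returns, omega, alpha, beta, level=0.99):
    """One-day-ahead Value at Risk from a GARCH(1,1) volatility forecast.

    The recursion is sigma2_{t+1} = omega + alpha * r_t^2 + beta * sigma2_t,
    and the VaR is the matching quantile of a normal return distribution.
    """
    sigma2 = np.var(returns)                # initialize at sample variance
    for r in returns:
        sigma2 = omega + alpha * r**2 + beta * sigma2
    z = {0.95: 1.645, 0.99: 2.326}[level]   # standard normal quantiles
    return z * np.sqrt(sigma2)

rng = np.random.default_rng(0)
rets = 0.01 * rng.standard_normal(250)      # one year of simulated returns
var99 = garch11_var(rets, omega=1e-6, alpha=0.05, beta=0.90)
print(var99)   # loss threshold exceeded on ~1% of days under the model
```

Backtesting then counts how often realized losses exceed this threshold and compares the exceedance rate (and a loss function such as Lopez's) against the nominal level.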
 Date Issued
 2019
 Identifier
 2019_Spring_Xiao_fsu_0071E_15172
 Format
 Thesis