Current Search: Research Repository » Statistics » Barbu, Adrian G. (Adrian Gheorghe)
Search results
 Title
 High Level Image Analysis on Manifolds via Projective Shapes and 3D Reflection Shapes.
 Creator

Lester, David T. (David Thomas), Patrangenaru, Victor, Liu, Xiuwen, Barbu, Adrian G. (Adrian Gheorghe), Tao, Minjing, Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

Shape analysis is a widely studied topic in modern statistics with important applications in areas such as medical imaging. Here we focus on two-sample hypothesis testing for both finite- and infinite-dimensional extrinsic mean shapes of configurations. First, we present a test for equality of mean projective shapes of 2D contours based on rotations. Second, we present a test for mean 3D reflection shapes based on the Schoenberg mean. We apply these tests to footprint data (contours), clamshells (3D reflection shapes), and human facial configurations extracted from digital camera images. We also present the method of MANOVA on manifolds and apply it to face data extracted from digital camera images. Finally, we present a new statistical tool called antiregression.
 Date Issued
 2017
 Identifier
 FSU_2017SP_Lester_fsu_0071E_13856
 Format
 Thesis
 Title
 Modeling Multivariate Data with Parameter-Based Subspaces.
 Creator

Gupta, Ajay, Barbu, Adrian G. (Adrian Gheorghe), Meyer-Baese, Anke, She, Yiyuan, Zhang, Jinfeng, Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

When modeling multivariate data such as vectorized images, one might have an extra parameter of contextual information that could be used to treat some observations as more similar to others. For example, images of faces can vary by yaw rotation, and one would expect a face rotated 65 degrees to the left to have characteristics more similar to a face rotated 55 degrees to the left than to a face rotated 65 degrees to the right. We introduce a novel method, parameterized principal component analysis (PPCA), that can model data with linear variation like principal component analysis (PCA), but can also take advantage of this parameter of contextual information, such as yaw rotation. Like PCA, PPCA models an observation using a mean vector and the product of observation-specific coefficients and basis vectors. Unlike PCA, PPCA treats the elements of the mean vector and basis vectors as smooth, piecewise-linear functions of the contextual parameter. PPCA is fit by a penalized optimization that penalizes candidate models with overly large differences between corresponding mean or basis-vector elements at similar parameter values. The penalty ensures that each observation's projection shares information with observations that have similar parameter values, but not with observations that have dissimilar parameter values. We tested PPCA on artificial data based on known, smooth functions of an added parameter, as well as on three real datasets with different types of parameters. We compared PPCA to independent principal component analysis (IPCA), which groups observations by their parameter values and projects each group using principal component analysis, with no sharing of information across groups. PPCA recovers the known functions with less error and projects the datasets' test-set observations with consistently less reconstruction error than IPCA does.

PPCA's performance is particularly strong, relative to IPCA, when training data are limited. We also tested the use of spectral clustering to form the groups in an IPCA model. In our experiment, the clustered IPCA model had error very similar to that of the parameter-based IPCA model, suggesting that spectral clustering might be a viable alternative when the parameter values are unknown for an application.
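The penalized optimization this abstract describes can be summarized by the following sketch of an objective; the notation ($\mu_k$, $B_k$, $c_i$, $\lambda$) is illustrative, not the dissertation's own.

```latex
\min_{\{\mu_k\},\,\{B_k\},\,\{c_i\}} \;
\sum_{i} \bigl\| x_i - \mu(\theta_i) - B(\theta_i)\,c_i \bigr\|_2^2
\;+\; \lambda \sum_{k} \Bigl( \bigl\|\mu_{k+1}-\mu_k\bigr\|_2^2
      + \bigl\|B_{k+1}-B_k\bigr\|_F^2 \Bigr)
```

Here $\mu(\theta)$ and $B(\theta)$ interpolate piecewise-linearly between knot values $\mu_k$, $B_k$ placed on a grid of parameter values, and the penalty on differences between adjacent knots is what forces observations with similar parameter values to share information.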
 Date Issued
 2016
 Identifier
 FSU_2016SU_Gupta_fsu_0071E_13422
 Format
 Thesis
 Title
 The One- and Two-Sample Problem for Data on Hilbert Manifolds with Applications to Shape Analysis.
 Creator

Qiu, Mingfei, Patrangenaru, Victor, Liu, Xiuwen, Slate, Elizabeth H., Barbu, Adrian G. (Adrian Gheorghe), Clickner, Robert Paul, Paige, Robert, Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

This dissertation is concerned with high-level image analysis. In particular, our focus is on extracting the projective shape or the similarity shape information from digital camera images or Magnetic Resonance Imaging (MRI). The approach is statistical, without making any assumptions about the distributions of the random objects under investigation. The data are organized as points on a Hilbert manifold. In the case of projective shapes of finite-dimensional configurations of points, we consider testing a one-sample null hypothesis, while in the infinite-dimensional case we consider a neighborhood hypothesis testing method. For 3D scenes, we retrieve the 3D projective shape and use the Lie group structure of the projective shape space. We test the equality of two extrinsic means by introducing the mean projective shape change. For 2D MRI of midsections of Corpus Callosum contours, we use an automatic matching technique that is necessary in pursuing a one-sample neighborhood hypothesis test for the similarity shapes. We conclude that the mean similarity shape of the Corpus Callosum of average individuals is very far from the shape of Albert Einstein's, which may explain his genius. Another application of our Hilbert manifold methodology is a two-sample testing problem for Veronese-Whitney means of projective shapes of 3D contours. In particular, our data consist of comparisons of 3D projective shapes of contours of leaves from the same tree species.
 Date Issued
 2015
 Identifier
 FSU_2015fall_Qiu_fsu_0071E_12922
 Format
 Thesis
 Title
 Parameter Sensitive Feature Selection for Learning on Large Datasets.
 Creator

Gramajo, Gary, Barbu, Adrian G. (Adrian Gheorghe), Kumar, Piyush, Huffer, Fred W. (Fred William), She, Yiyuan, Zhang, Jinfeng, Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

Though there are many feature selection methods for learning, they might not scale well to very large datasets, such as those generated in computer vision. Furthermore, it can be beneficial to capture and model the variability inherent to data such as face detection, where a plethora of face poses (i.e., parameters) are possible. We propose a parameter-sensitive learning method that can learn effectively on datasets that would otherwise be prohibitively large. Our contributions are the following. First, we propose an efficient feature selection algorithm that optimizes a differentiable loss with sparsity constraints. We note that any differentiable loss can be used, and the choice will vary with the application. The iterative algorithm alternates parameter updates with tightening of the sparsity constraints by gradually removing variables based on the coefficient magnitudes and a schedule. Second, we show how to train a single parameter-sensitive classifier that models the wide range of class variability. A single classifier is important because it reduces the amount of data necessary for training, compared to methods where a separate classifier is trained for each parameter value. Third, we show how to use nonlinear univariate response functions to obtain a nonlinear decision boundary with feature selection, an important characteristic since the separation of classes in real-world datasets is very challenging. Fourth, we show it is possible to mine hard negatives with feature selection, though it is more difficult. This is vital in computer vision, where 10^5 training examples can be generated per image. Fifth, we propose an approach to face detection using a 3D model on a number of face keypoints. We modify binary face features from the literature (generated using random forests) to fit into our 3D model framework.

Experiments on detecting the face keypoints and on face detection using the proposed 3D models and modified face features show that feature selection dramatically improves performance and comes close to the state of the art on two standard face detection datasets. We also apply our parameter-sensitive learning method with feature selection to detect malicious websites, a dataset with approximately 2.4 million websites and 3.3 million features per website. We outperform other batch algorithms and obtain results close to those of a high-performing online algorithm while using far fewer features.
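The first contribution above (alternating gradient updates with a pruning schedule that gradually removes low-magnitude coefficients) can be illustrated with a minimal sketch on a squared loss; the function name, learning rate, and schedule constants are assumptions for illustration, not the dissertation's exact algorithm.

```python
import numpy as np

def fsa(X, y, k, epochs=100, lr=0.01, mu=10.0):
    """Sketch of feature selection with annealing: gradient steps on a
    differentiable loss alternate with dropping low-magnitude coefficients
    until only k features remain."""
    n, p = X.shape
    keep = np.arange(p)          # indices of surviving features
    w = np.zeros(p)
    for e in range(epochs):
        # gradient step on the squared loss, restricted to surviving features
        r = X[:, keep] @ w - y
        w -= lr * (X[:, keep].T @ r) / n
        # annealing schedule: number of features kept at epoch e
        m = max(k, int(k + (p - k) * max(0.0, (epochs - 2 * e) / (2 * e * mu + epochs))))
        if m < keep.size:
            top = np.argsort(-np.abs(w))[:m]   # keep the largest coefficients
            keep, w = keep[top], w[top]
    return keep, w
```

Because pruning happens a little at a time rather than all at once, early noise in the coefficients has a chance to shrink before the final support is decided, which is what lets a single pass scale to very large feature sets.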
 Date Issued
 2015
 Identifier
 FSU_migr_etd9604
 Format
 Thesis
 Title
 Statistical Methods for Big Data and Their Applications in Biomedical Research.
 Creator

Yu, Kaixian, Zhang, Jinfeng, Sang, Qing-Xiang Amy, Barbu, Adrian G. (Adrian Gheorghe), She, Yiyuan, Sinha, Debajyoti, Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

Big data has brought both opportunities and challenges to our research community. Complex models can be built with volumes of data researchers have never had access to before. In this study we explore structure learning of Bayesian networks (BNs) and its application to reverse engineering of gene regulatory networks (GRNs). A Bayesian network is a graphical representation of a joint distribution that encodes the conditional dependencies and independencies among the variables. We propose a novel three-stage BN structure learning method called GRASP (GRowth-based Approach with Staged Pruning). In the first stage, a new skeleton (undirected edges) discovery method, double filtering (DF), was designed. Compared to existing methods, DF requires smaller sample sizes to achieve similar statistical power. Based on the skeleton estimated in the first stage, we propose a sequential Monte Carlo (SMC) method to sample the edges and their directions to optimize a BIC-based score. The SMC method is less prone to being trapped in local optima, and its computation is easily parallelizable. In the third stage, we reclaim edges that may have been missed in the previous stages. We obtained satisfactory results in a simulation study and applied the method to infer GRNs from real experimental data. A method for personalized chemotherapy regimen selection for breast cancer and a novel algorithm for relationship extraction from unstructured documents are discussed as well.
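The BIC-based score that the SMC sampler optimizes decomposes over nodes; for linear-Gaussian conditionals a per-node score can be sketched as below. The name `node_bic` and this particular form are illustrative assumptions, not GRASP's exact score.

```python
import numpy as np

def node_bic(data, child, parents):
    """BIC score of one node given a candidate parent set, assuming a
    linear-Gaussian conditional: regress the child on its parents and
    trade off fit (residual sum of squares) against model size."""
    n = data.shape[0]
    y = data[:, child]
    X = np.column_stack([np.ones(n)] + [data[:, p] for p in parents])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    k = X.shape[1] + 1                      # coefficients + noise variance
    return -0.5 * n * np.log(rss / n) - 0.5 * k * np.log(n)
```

A candidate DAG is scored by summing `node_bic` over its nodes, which is the quantity the sampled edge configurations are compared on.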
 Date Issued
 2016
 Identifier
 FSU_2016SP_Yu_fsu_0071E_13079
 Format
 Thesis
 Title
 The Studies of Joint Structure Sparsity Pursuit in the Applications of Hierarchical Variable Selection and Fused Lasso.
 Creator

Jiang, He, She, Yiyuan, Ökten, Giray, Barbu, Adrian G. (Adrian Gheorghe), Mai, Qing, Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

In this dissertation, we study joint sparsity pursuit and its applications to variable selection in high-dimensional data. The first part of the dissertation focuses on hierarchical variable selection and its application in a two-way interaction model. In high-dimensional models that involve interactions, statisticians usually favor variable selection obeying certain logical hierarchical constraints. This part focuses on structural hierarchy, which means that the existence of an interaction term implies that one or both of the associated main effects must be present. Lately this problem has attracted a lot of attention from statisticians, but existing computational algorithms converge slowly and cannot meet the challenge of big-data computation. More importantly, theoretical studies of hierarchical variable selection are extremely scarce, largely due to the difficulty that multiple sparsity-promoting penalties are enforced on the same subject. This work investigates a new type of estimator based on group multi-regularization to capture various types of structural parsimony simultaneously. We present non-asymptotic results based on combined statistical and computational analysis, and reveal the minimax optimal rate. A general-purpose algorithm is developed with a theoretical guarantee of strict iterate convergence and global optimality. Simulations and real-data experiments demonstrate the efficiency and efficacy of the proposed approach. The second topic studies the Fused Lasso, which pursues joint sparsity of both the variables and their consecutive differences. The overlapping penalties of the Fused Lasso pose critical challenges for computational studies and theoretical analysis. Previous theoretical analysis of the Fused Lasso, however, was performed only under an orthogonal design, and there is hardly any non-asymptotic study in the past literature.

In this work, we study the Fused Lasso and its application to a classification problem to achieve exact clustering. Computationally, we derive a simple-to-implement algorithm which scales well to big-data computation; in theory, we propose a new technique and perform non-asymptotic analysis. To evaluate the prediction performance theoretically, we derive an oracle inequality for the Fused Lasso estimator to show the $\ell_2$ prediction error rate. The minimax optimal rate is also revealed. For estimation accuracy, an $\ell_q$ ($1 \leq q \leq \infty$) norm error bound for the Fused Lasso estimator is derived. The simulation studies show that exact clustering can be achieved using a post-thresholding technique.
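The joint sparsity of variables and their consecutive differences that this abstract refers to is conventionally written as the following objective (standard Fused Lasso form; the tuning parameters $\lambda_1$, $\lambda_2$ are illustrative):

```latex
\hat\beta \;=\; \arg\min_{\beta \in \mathbb{R}^p}\;
\tfrac{1}{2}\,\lVert y - X\beta \rVert_2^2
\;+\; \lambda_1 \sum_{j=1}^{p} \lvert \beta_j \rvert
\;+\; \lambda_2 \sum_{j=2}^{p} \lvert \beta_j - \beta_{j-1} \rvert
```

The two penalties overlap on each $\beta_j$ (it appears in the $\ell_1$ term and in two difference terms), which is exactly what makes both the computation and the non-asymptotic analysis harder than for the plain Lasso.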
 Date Issued
 2015
 Identifier
 FSU_migr_etd9362
 Format
 Thesis
 Title
 Testing for the Equality of Two Distributions on High Dimensional Object Spaces and Nonparametric Inference for Location Parameters.
 Creator

Guo, Ruite, Patrangenaru, Victor, Mio, Washington, Barbu, Adrian G. (Adrian Gheorghe), Bradley, Jonathan R., Florida State University, College of Arts and Sciences, Department of Statistics
 Abstract/Description

Our view is that while some of the basic principles of data analysis are going to remain unchanged, others are to be gradually replaced with geometry and topology methods. Linear methods still make sense for functional data analysis, or in the context of tangent bundles of object spaces. Complex nonstandard data is represented on object spaces. An object space admitting a manifold stratification may be embedded in a Euclidean space. One defines the extrinsic energy distance associated with two probability measures on an arbitrary object space embedded in a numerical space, and one introduces an extrinsic energy statistic to test for homogeneity of the distributions of two random objects (r.o.'s) on such an object space. This test is validated via a simulation example on the Kendall space of planar k-ads with a Veronese-Whitney (VW) embedding. One considers an application to medical imaging, testing for homogeneity of the distributions of Kendall shapes of the midsections of the Corpus Callosum in a clinically normal population vs. a population of ADHD-diagnosed individuals. Surprisingly, due to the high dimensionality, these distributions are not significantly different, although they are known to have significantly different VW means. New spread and location parameters are to be added to reflect the nontrivial topology of certain object spaces. TDA is going to be adapted to object spaces, and hypothesis testing for distributions is going to be based on extrinsic energy methods. For a random point on an object space embedded in a Euclidean space, the mean vector cannot be represented as a point on that space, except when the embedded space is convex.

To address this shortcoming, since the mean vector is the minimizer of the expected squared distance, following Fréchet (1948), on an embedded compact object space one may consider both minimizers and maximizers of the expected squared distance to a given point on the embedded object space, as the mean, respectively the antimean, of the random point. Of all distances on an object space, one considers here the chord distance associated with the embedding of the object space, since for such distances one can give a necessary and sufficient condition for the existence of a unique Fréchet mean (respectively Fréchet antimean). For such distributions these location parameters are called the extrinsic mean (respectively the extrinsic antimean), and the corresponding sample statistics are consistent estimators of their population counterparts. Moreover, around an extrinsic mean (antimean) located at a smooth point, one derives the limit distribution of such estimators.
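For the simplest object space, the unit sphere with the chord distance inherited from its embedding, the extrinsic mean and antimean described above reduce to projecting the Euclidean mean onto the sphere; a minimal sketch (the function name is illustrative):

```python
import numpy as np

def extrinsic_mean_antimean(points):
    """For points on the unit sphere (chord distance from the embedding
    in R^3), the minimizer of the expected squared chord distance is the
    Euclidean mean projected onto the sphere, and the maximizer is its
    antipode; both are unique exactly when the Euclidean mean is nonzero."""
    m = points.mean(axis=0)
    norm = np.linalg.norm(m)
    if norm == 0:
        raise ValueError("Euclidean mean is zero: extrinsic mean not unique")
    return m / norm, -m / norm
```

This works because on the unit sphere the expected squared chord distance to $p$ is $E\|X\|^2 + 1 - 2\langle EX, p\rangle$, so minimizing (maximizing) over $p$ amounts to maximizing (minimizing) the inner product with the Euclidean mean.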
 Date Issued
 2017
 Identifier
 FSU_SUMMER2017_Guo_fsu_0071E_13977
 Format
 Thesis