In this dissertation, we study joint sparsity pursuit and its applications in variable selection for high-dimensional data. The first part of the dissertation focuses on hierarchical variable selection and its application in a two-way interaction model. In high-dimensional models that involve interactions, statisticians usually favor variable selection obeying certain logical hierarchical constraints. This part focuses on structural hierarchy, which means that the existence of an interaction term implies that at least one of the associated main effects must be present. This problem has lately attracted considerable attention from statisticians, but existing computational algorithms converge slowly and cannot meet the challenge of big data computation. More importantly, theoretical studies of hierarchical variable selection are extremely scarce, largely due to the difficulty that multiple sparsity-promoting penalties are enforced on the same coefficients. This work investigates a new type of estimator based on group multi-regularization to capture various types of structural parsimony simultaneously. We present non-asymptotic results based on a combined statistical and computational analysis, and reveal the minimax optimal rate. A general-purpose algorithm is developed with a theoretical guarantee of strict iterate convergence and global optimality. Simulations and real data experiments demonstrate the efficiency and efficacy of the proposed approach. The second topic studies the Fused Lasso, which pursues joint sparsity of both the variables and their consecutive differences simultaneously. The overlapping penalties of the Fused Lasso pose critical challenges to both computation and theoretical analysis. Existing theoretical analyses of the Fused Lasso, however, are mostly performed under an orthogonal design, and nonasymptotic studies are scarce in the literature.
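For reference, the structural hierarchy constraint described above is commonly written as follows; this is the standard "weak hierarchy" formulation, and the notation ($\beta_j$, $\beta_k$ for main effects, $\beta_{jk}$ for the interaction) is illustrative rather than the dissertation's own:

```latex
% Weak hierarchy in a two-way interaction model: a selected interaction
% forces at least one of its parent main effects into the model.
\hat{\beta}_{jk} \neq 0
  \;\Longrightarrow\;
  \hat{\beta}_{j} \neq 0 \ \text{ or } \ \hat{\beta}_{k} \neq 0 .
```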
In this work, we study the Fused Lasso and its application in a classification problem to achieve exact clustering. Computationally, we derive a simple-to-implement algorithm that scales well to big data; in theory, we propose a brand new technique and perform nonasymptotic analysis. To evaluate the prediction performance theoretically, we derive an oracle inequality for the Fused Lasso estimator to show the $\ell_2$ prediction error rate. The minimax optimal rate is also revealed. For estimation accuracy, an $\ell_q$ $(1 \leq q \leq \infty)$ norm error bound for the Fused Lasso estimator is derived. The simulation studies show that exact clustering can be achieved using a post-thresholding technique.
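As background, the Fused Lasso criterion whose joint sparsity pattern is referenced above takes the following standard form (Tibshirani et al., 2005); the tuning parameters $\lambda_1, \lambda_2$ control the sparsity of the coefficients and of their consecutive differences, respectively:

```latex
% Standard Fused Lasso objective: the first penalty promotes sparse
% coefficients, the second promotes piecewise-constant (fused) structure.
\min_{\beta \in \mathbb{R}^p}\;
  \frac{1}{2}\,\lVert y - X\beta \rVert_2^2
  + \lambda_1 \sum_{j=1}^{p} \lvert \beta_j \rvert
  + \lambda_2 \sum_{j=2}^{p} \lvert \beta_j - \beta_{j-1} \rvert .
```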
A Dissertation submitted to the Department of Statistics in partial fulfillment of the requirements for the degree of Doctor of Philosophy.
Includes bibliographical references.
Yiyuan She, Professor Directing Dissertation; Giray Okten, University Representative; Adrian Barbu, Committee Member; Qing Mai, Committee Member.
Florida State University
Use and Reproduction
This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s). The copyright in theses and dissertations completed at Florida State University is held by the students who author them.