Big-data applications typically involve huge numbers of observations and features, posing new challenges for variable selection and parameter estimation. Beyond efficiency, the desired algorithm should also be easy to implement; ad-hoc procedures can be fragile, so a design grounded in optimization rather than heuristics is preferred. Meanwhile, methods such as the LASSO and screening algorithms are derived under the assumption of mildly correlated features. Such assumptions make it possible to eliminate features aggressively, but they may be too stringent to hold in real-life high-dimensional applications. This paper proposes a novel "slow kill" (SK) technique based on nonconvex constrained optimization, which gradually identifies and removes irrelevant variables using adaptive ℓ2-shrinkage and a growing learning rate. Because the problem size can drop during the iterations of SK, the method is particularly suitable for large-scale variable selection. The interplay between statistics and optimization yields insightful results on the quantile-control, step-size, and shrinkage parameters, relaxing the regularity conditions under which the desired order of statistical accuracy is guaranteed. The theoretical study establishes the optimal error rate and fast convergence of SK without assuming asymptotic conditions or globally optimal solutions, and the technique applies to a general loss that is not necessarily a likelihood function. Moreover, no grid search is required to obtain a solution with a prescribed cardinality, which avoids the expensive computation otherwise spent tuning regularization parameters. Experiments on synthetic data show that SK outperforms state-of-the-art algorithms in various settings while remaining extremely scalable, and results on real data examples, including spam-website identification, digit recognition, and gene expression, further demonstrate the power of the proposed method.
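To make the iterative "shrink then remove" idea concrete, the following is a minimal sketch in Python, assuming a least-squares loss. The quantile-style removal rule, the growing learning-rate schedule, the ridge-type shrinkage, and all parameter names (target_k, q0, eta0, lam) are illustrative assumptions for exposition only, not the authors' exact SK algorithm.

```python
import numpy as np

def slow_kill_sketch(X, y, target_k, n_iter=100, q0=0.9, eta0=0.01, lam=1e-3):
    """Illustrative 'slow kill'-style variable selection sketch (assumed details).

    Each iteration: take a gradient step on the squared-error loss, apply
    l2 (ridge-type) shrinkage, then drop the weakest coefficients by a
    quantile-style rule so the active set shrinks slowly toward target_k.
    """
    n, p = X.shape
    active = np.arange(p)        # indices of features still alive
    beta = np.zeros(p)
    for t in range(n_iter):
        Xa = X[:, active]
        eta = eta0 * (1 + t / n_iter)                 # slowly growing learning rate (assumed schedule)
        grad = Xa.T @ (Xa @ beta[active] - y) / n
        beta[active] = (beta[active] - eta * grad) / (1 + eta * lam)  # gradient step + l2 shrinkage
        # keep the largest coefficients; the retained fraction decays toward target_k
        keep_frac = max(target_k / len(active), q0 ** (t + 1))
        k_t = max(target_k, int(np.ceil(keep_frac * len(active))))
        order = np.argsort(np.abs(beta[active]))[::-1][:k_t]
        removed = np.setdiff1d(np.arange(len(active)), order)
        beta[active[removed]] = 0.0                   # "kill" the weakest variables
        active = active[order]
    return beta, np.sort(active)

# Example usage on synthetic data (hypothetical setup)
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 1000))
beta_true = np.zeros(1000)
beta_true[:5] = 3.0
y = X @ beta_true + 0.1 * rng.standard_normal(200)
beta_hat, selected = slow_kill_sketch(X, y, target_k=5)
```

Note that, as in the abstract's description, the working problem size shrinks over the iterations (only the active columns are touched), and the final cardinality is prescribed directly rather than found by tuning a regularization parameter over a grid.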