By removing 90% or more of a deep neural network's (DNN's) parameters, a wide variety of pruning approaches not only compress DNNs but also improve generalization (model performance on test/unseen data). This observation is in tension, however, with emerging DNN generalization theory and empirical findings, which suggest that DNNs generalize better as their parameter counts rise, even under overparameterization (the use of more parameters than data points). Seeking to reconcile such modern findings with pruning-based generalization improvements, this thesis empirically studies the cause of improved generalization in pruned DNNs. We begin by providing support for our hypothesis that pruning regularizes similarly to noise injection, with a perhaps surprising result: pruning parameters more immediately important to the network leads to better generalization later, after the network has adapted to the pruning. We show that this behavior is a manifestation of a more general phenomenon. Across a wide variety of experimental configurations and pruning algorithms, pruning's benefit to generalization increases with pruning's instability (defined as the drop in test accuracy immediately after pruning). We study the limits of this generalization-stability tradeoff and use it to inform the derivation of a novel pruning algorithm that produces particularly unstable pruning and higher generalization. Such results suggest that accounting for this tradeoff would improve pruning algorithm design. Finally, we empirically examine the consistency of several generalization theories with the generalization-stability tradeoff and pruning-based generalization improvements.
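To make the two quantities above concrete, the following is a minimal, illustrative sketch (not the thesis's actual implementation) of magnitude pruning and of instability as defined here, i.e., the drop in test accuracy immediately after pruning. The function names and the NumPy-based setup are assumptions for illustration only.

```python
import numpy as np

def magnitude_prune(weights, frac):
    """Illustrative magnitude pruning: zero out the smallest-magnitude
    fraction `frac` of the weights, returning the pruned weights and mask."""
    flat = np.abs(weights).ravel()
    k = int(len(flat) * frac)
    if k == 0:
        return weights.copy(), np.ones_like(weights, dtype=bool)
    # k-th smallest absolute value serves as the pruning threshold
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask, mask

def pruning_instability(acc_before, acc_after):
    """Instability as defined in the text: the drop in test accuracy
    measured immediately after pruning."""
    return acc_before - acc_after

# Toy example: prune the 40% of weights with smallest magnitude.
w = np.array([0.05, -0.9, 0.3, -0.02, 0.7])
pruned, mask = magnitude_prune(w, 0.4)
# The two smallest-magnitude weights (0.05 and -0.02) are zeroed out.
```

Note that a pruning criterion targeting parameters *more* important to the network (as studied in this thesis) would select weights by a different score than magnitude; the instability measurement itself is unchanged.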
Notably, we find that less stable pruning heightens measures of DNN flatness (robustness to data-sample and parameter changes) that correlate positively with generalization, and that pruning-based generalization improvements are maintained when pruning is modified to remove parameters only temporarily. Thus, by demonstrating a regularization mechanism in pruning that depends on changes to sharpness-related complexity rather than parameter-count complexity, this thesis elucidates the compatibility of pruning-based generalization improvements with high generalization in overparameterized DNNs, while also corroborating the relevance of flatness to DNN generalization.