Title:
Data-driven penalties for model selection
Abstract:
Penalization procedures often suffer from their dependence on multiplying factors, whose
optimal values are either unknown or hard to estimate from data.
We propose a completely data-driven calibration algorithm for these factors in the
least-squares regression framework, without assuming a particular shape for the penalty.
Moreover, dimensionality-based penalties such as Mallows' Cp fail in the heteroscedastic
regression framework, so that the shape of the penalty itself has to be estimated.
Resampling is used to build penalties that are robust to heteroscedasticity, without requiring
prior information on the noise level.
For instance, V-fold penalization is shown to improve on V-fold cross-validation at a fixed
computational cost.