Data-driven penalties for optimal calibration of learning algorithms
Learning algorithms usually depend on one or several parameters that need to be chosen carefully. In this talk, we tackle the question of designing
penalties for an optimal choice of such regularization parameters in non-parametric regression.
First, we consider the problem of selecting among several linear estimators, which includes model selection for linear regression, the choice of a
regularization parameter in kernel ridge regression or spline smoothing, and the choice of a kernel in multiple kernel learning. We propose a new
penalization procedure that first consistently estimates the noise variance, using the concept of minimal penalty previously
introduced in the context of model selection. Plugging this variance estimate into Mallows' $C_L$ penalty is then proved to yield an algorithm
satisfying an oracle inequality.
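The two-step procedure above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure as analyzed in the talk: it assumes kernel ridge regression with a Gaussian kernel, takes $2\,\mathrm{tr}(A) - \mathrm{tr}(A^\top A)$ as the shape of the minimal penalty for a linear estimator $AY$, and reads off the noise variance at the largest drop in selected degrees of freedom over a grid of constants. The function names, the bandwidth, and the grid are hypothetical choices.

```python
import numpy as np

def kernel_ridge_smoothers(x, lambdas, bandwidth=0.1):
    """Smoother matrices A_lam = (K + n*lam*I)^{-1} K for a Gaussian kernel.

    The Gaussian kernel and its bandwidth are assumptions of this sketch.
    """
    n = len(x)
    K = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2 * bandwidth ** 2))
    return [np.linalg.solve(K + n * lam * np.eye(n), K) for lam in lambdas]

def variance_from_minimal_penalty(y, smoothers, c_grid):
    """Estimate the noise variance as the constant C at which the selected
    degrees of freedom drop the most (c_grid must be sorted increasingly)."""
    rss = np.array([np.sum((y - A @ y) ** 2) for A in smoothers])
    df = np.array([np.trace(A) for A in smoothers])
    df2 = np.array([np.trace(A.T @ A) for A in smoothers])
    shape = 2 * df - df2  # assumed shape of the minimal penalty
    # Degrees of freedom of the estimator selected for each constant C.
    selected_df = np.array([df[np.argmin(rss + c * shape)] for c in c_grid])
    # Largest drop between consecutive constants (a heuristic jump rule).
    jump = int(np.argmax(selected_df[:-1] - selected_df[1:]))
    return c_grid[jump + 1]

def select_with_mallows_cl(y, smoothers, sigma2):
    """Index minimizing RSS + 2 * sigma2 * tr(A), i.e. Mallows' C_L
    up to normalization and an additive constant."""
    rss = np.array([np.sum((y - A @ y) ** 2) for A in smoothers])
    df = np.array([np.trace(A) for A in smoothers])
    return int(np.argmin(rss + 2 * sigma2 * df))
```

A typical call would build the smoothers over a logarithmic grid of regularization parameters, estimate the variance with `variance_from_minimal_penalty`, and pass the result to `select_with_mallows_cl`. The quality of the variance estimate depends on the grid resolution and on the collection of estimators being rich enough to exhibit a clear jump.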
Second, when data are heteroscedastic, we show that dimensionality-based penalties are suboptimal for model selection in least-squares regression.
The shape of the penalty itself therefore has to be estimated. Resampling is used to build penalties that are robust to heteroscedasticity, without requiring prior
information on the noise level. For instance, V-fold penalization is shown to improve on V-fold cross-validation at a fixed computational cost.
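To make the idea of a resampling-based penalty concrete, here is a minimal sketch of a V-fold penalty for least-squares regression. It rests on several assumptions that are not stated in this abstract: the models are regressograms (piecewise-constant fits) on $[0, 1)$, the penalty averages over folds the gap between the full-sample and training-sample empirical risks of the estimator trained without one fold, and the multiplying constant defaults to $V - 1$. All function names are hypothetical.

```python
import numpy as np

def fit_histogram(x, y, n_bins):
    """Least-squares regressogram on [0, 1): mean of y within each bin.
    Empty bins are given the value 0 (an arbitrary convention)."""
    bins = np.minimum((x * n_bins).astype(int), n_bins - 1)
    sums = np.bincount(bins, weights=y, minlength=n_bins)
    counts = np.bincount(bins, minlength=n_bins)
    return np.divide(sums, counts, out=np.zeros(n_bins), where=counts > 0)

def predict_histogram(means, x):
    """Evaluate a fitted regressogram at the points x."""
    n_bins = len(means)
    bins = np.minimum((x * n_bins).astype(int), n_bins - 1)
    return means[bins]

def vfold_penalty(x, y, n_bins, V=5, C=None):
    """V-fold penalty: (C / V) * sum over folds of
    (risk on all data) - (risk on training data), for the fit without fold j."""
    n = len(x)
    C = V - 1 if C is None else C  # assumed default constant
    folds = np.arange(n) % V       # assumes the data order is exchangeable
    total = 0.0
    for j in range(V):
        train = folds != j
        means = fit_histogram(x[train], y[train], n_bins)
        resid2 = (y - predict_histogram(means, x)) ** 2
        total += resid2.mean() - resid2[train].mean()
    return C * total / V

def select_model(x, y, candidates, V=5):
    """Pick the number of bins minimizing empirical risk + V-fold penalty."""
    crit = []
    for d in candidates:
        means = fit_histogram(x, y, d)
        emp_risk = np.mean((y - predict_histogram(means, x)) ** 2)
        crit.append(emp_risk + vfold_penalty(x, y, d, V))
    return candidates[int(np.argmin(crit))]
```

Because the penalty is computed from the data rather than from the model dimension, it can adapt to a noise level that varies with `x`. Each candidate model is fitted V times on subsamples plus once on the full sample, which is comparable to the cost of V-fold cross-validation.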