Data-driven calibration of linear estimators with minimal penalties, with an application to multi-task regression
This talk tackles the problem of selecting among several linear estimators in non-parametric regression; this includes model selection for linear regression, the choice of a regularization parameter in kernel ridge regression or spline smoothing, the choice of a kernel in multiple kernel learning, the choice of a bandwidth for Nadaraya-Watson estimators, and the choice of k for k-nearest neighbors regression.
We propose a new algorithm that first consistently estimates the noise variance, using the concept of minimal penalty previously introduced in the context of model selection. Plugging this variance estimate into Mallows' C_L penalty is then proved to yield an algorithm satisfying an oracle inequality. Simulation experiments show that the proposed algorithm often significantly improves on existing calibration procedures such as 10-fold cross-validation or generalized cross-validation.
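The two-step procedure described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`cosine_projections`, `calibrate_linear_estimators`), the grid of candidate constants, and the jump threshold `n / 2` on the degrees of freedom are hypothetical choices. For each smoothing matrix A, the minimal penalty is assumed proportional to 2 tr(A) - tr(A^T A), and the final penalty is Mallows' C_L, 2 sigma^2 tr(A). Nested projections onto a cosine basis serve only as a simple family of linear estimators.

```python
import numpy as np

def cosine_projections(n, dims):
    """Projection matrices onto the first D vectors of an orthonormal
    cosine (DCT-II) basis on a regular grid of n points (illustrative family)."""
    i = np.arange(n) + 0.5
    B = np.cos(np.pi * np.outer(i, np.arange(n)) / n)
    B /= np.linalg.norm(B, axis=0)          # make the columns orthonormal
    return [B[:, :D] @ B[:, :D].T for D in dims]

def calibrate_linear_estimators(y, smoothers, df_threshold=None, n_grid=120):
    """Return (sigma2_hat, index of the selected estimator).

    Step 1: scan constants C and find the first one for which the estimator
            minimizing RSS + C * (2 tr(A) - tr(A^T A)) has few degrees of freedom.
    Step 2: plug that C into Mallows' C_L penalty 2 * C * tr(A).
    """
    n = len(y)
    if df_threshold is None:
        df_threshold = n / 2                # assumed jump threshold, not from the source
    rss = np.array([np.sum((y - A @ y) ** 2) for A in smoothers])
    df = np.array([np.trace(A) for A in smoothers])
    # tr(A^T A) equals the squared Frobenius norm of A
    shape_min = 2 * df - np.array([np.sum(A * A) for A in smoothers])

    C_grid = np.var(y) * np.logspace(-2, 1, n_grid)   # assumed search grid
    sigma2_hat = C_grid[-1]                 # fallback if no jump is observed
    for C in C_grid:
        if df[np.argmin(rss + C * shape_min)] < df_threshold:
            sigma2_hat = C                  # selected complexity has dropped
            break

    best = int(np.argmin(rss + 2 * sigma2_hat * df))
    return sigma2_hat, best

# Synthetic usage: smooth signal plus Gaussian noise of variance 0.25
rng = np.random.default_rng(0)
n, sigma = 200, 0.5
x = (np.arange(n) + 0.5) / n
y = np.sin(2 * np.pi * x) + sigma * rng.normal(size=n)
dims = list(range(1, n + 1, 2))
sigma2_hat, best = calibrate_linear_estimators(y, cosine_projections(n, dims))
print("estimated noise variance:", sigma2_hat, "selected dimension:", dims[best])
```

The same function accepts any list of smoothing matrices, for instance kernel ridge matrices over a grid of regularization parameters, provided the family contains estimators with enough degrees of freedom for the jump to be visible.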
We then provide an application to the kernel multiple ridge regression framework, which we refer to as multi-task regression. The theoretical analysis of this problem shows that the key quantity for optimal calibration is the covariance matrix of the noise between the different tasks. We present a new algorithm for estimating this covariance matrix, based on several single-task variance estimations. We show, in a non-asymptotic setting and under mild assumptions on the target function, that this estimator converges to the covariance matrix. Plugging this estimator into the corresponding ideal penalty then leads to an oracle inequality. We illustrate the behavior of our algorithm on synthetic examples.
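One way to build a covariance estimate from single-task variance estimates is the polarization identity Var(e_j + e_k) = Var(e_j) + Var(e_k) + 2 Cov(e_j, e_k), applied to each pair of tasks. The sketch below assumes this construction. The names `covariance_from_variances` and `rice_variance` are hypothetical, and the difference-based (Rice) variance estimator is only a stand-in for the minimal-penalty estimator so that the example stays self-contained.

```python
import numpy as np

def covariance_from_variances(Y, variance_estimator):
    """Estimate the p x p noise covariance between tasks.

    Y: (n, p) array, one column per task, observed at the same design points.
    variance_estimator: callable mapping a length-n vector to a scalar
                        noise-variance estimate (any single-task estimator).
    """
    p = Y.shape[1]
    var = np.array([variance_estimator(Y[:, j]) for j in range(p)])
    Sigma = np.diag(var)
    for j in range(p):
        for k in range(j + 1, p):
            # variance of the noise in the "sum task" y_j + y_k
            var_sum = variance_estimator(Y[:, j] + Y[:, k])
            Sigma[j, k] = Sigma[k, j] = 0.5 * (var_sum - var[j] - var[k])
    return Sigma

def rice_variance(y):
    """Stand-in single-task estimator: half the mean squared first difference."""
    d = np.diff(y)
    return np.sum(d ** 2) / (2 * (len(y) - 1))

# Synthetic usage: two tasks with correlated noise on a regular grid
rng = np.random.default_rng(0)
n = 2000
x = (np.arange(n) + 0.5) / n
true_cov = np.array([[1.0, 0.5], [0.5, 1.0]])
noise = rng.multivariate_normal(np.zeros(2), true_cov, size=n)
Y = np.column_stack([np.sin(2 * np.pi * x), np.cos(2 * np.pi * x)]) + noise
print(covariance_from_variances(Y, rice_variance))
```

The resulting matrix is symmetric by construction, but with a general variance estimator it need not be positive semi-definite, so a projection onto the positive semi-definite cone may be needed before it is plugged into a penalty.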
This talk is based on two joint works with Francis Bach and Matthieu Solnon:
S. Arlot, F. Bach. Data-driven Calibration of Linear Estimators with Minimal Penalties. arXiv:0909.1884
M. Solnon, S. Arlot, F. Bach. Multi-task Regression using Minimal Penalties. arXiv:1107.4512