Model selection and estimator selection for statistical learning


Tutorial given at the Scuola Normale Superiore di Pisa in February 2011. Since this tutorial mostly comes from the "Cours Peccot" lectures I gave in January 2011 at the Collège de France (Paris), you can also have a look at my lecture notes for the Cours Peccot (in French).


References:

[1] Luc Devroye, László Györfi, and Gábor Lugosi. A Probabilistic Theory of Pattern Recognition, volume 31 of Applications of Mathematics (New York). Springer-Verlag, New York, 1996.
[2] Pascal Massart. Concentration Inequalities and Model Selection, volume 1896 of Lecture Notes in Mathematics. Springer, Berlin, 2007. Lectures from the 33rd Summer School on Probability Theory held in Saint-Flour, July 6-23, 2003.
[3] Lucien Birgé and Pascal Massart. Minimal penalties for Gaussian model selection. Probab. Theory Related Fields, 138(1-2):33-73, 2007.
[4] Sylvain Arlot and Pascal Massart. Data-driven calibration of penalties for least-squares regression. J. Mach. Learn. Res., 10:245-279 (electronic), 2009.
[5] Jean-Patrick Baudry, Cathy Maugis, and Bertrand Michel. Slope Heuristics: Overview and Implementation. Technical Report 7223, INRIA, 2010.
[6] Sylvain Arlot and Francis Bach. Data-driven calibration of linear estimators with minimal penalties. Proceedings of NIPS 2009.
[7] Sylvain Arlot. Choosing a penalty for model selection in heteroscedastic regression. Preprint, 2010.
[8] Bradley Efron and Robert J. Tibshirani. An Introduction to the Bootstrap, volume 57 of Monographs on Statistics and Applied Probability. Chapman and Hall, New York, 1993.
[9] Sylvain Arlot. Model selection by resampling penalization. Electron. J. Stat., 3:557-624 (electronic), 2009.
[10] Sylvain Arlot and Alain Celisse. A survey of cross-validation procedures for model selection. Statist. Surv., 4:40-79, 2010.
[11] Sylvain Arlot and Alain Celisse. Segmentation of the mean of heteroscedastic data via cross-validation. Statistics and Computing, 2010.
[12] Sylvain Arlot. V-fold cross-validation improved: V-fold penalization. arXiv:0802.0566v2.
[13] Matthieu Lerasle. Optimal model selection in density estimation. hal-00422655, 2009.
[14] Adrien Saumard. Nonasymptotic quasi-optimality of AIC and the slope heuristics in maximum likelihood estimation of density using histogram models. hal-00512310, 2010.
[15] Adrien Saumard. The slope heuristics in heteroscedastic regression. hal-00512306, 2010.


Abstract

Prediction is among the major problems in statistical learning. Given a sequence of examples of pairs of random variables (X_i, Y_i), i = 1..n, the goal is to "predict" from X alone (the explanatory variables) the value of Y (the variable of interest). Many estimators have been proposed for prediction, and each estimator usually itself depends on one or several parameters, whose calibration is crucial for optimizing the statistical performance.
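
In symbols (a sketch of the standard framework; the loss \ell, the risk R, and the oracle m^\star are notation chosen here for illustration, not fixed by the abstract), the quality of a predictor \hat{f} is measured by its risk,

    R(\hat{f}) = \mathbb{E}\left[ \ell\big(\hat{f}(X), Y\big) \right],
    \qquad \text{e.g. } \ell\big(\hat{f}(X), Y\big) = \big(\hat{f}(X) - Y\big)^2 \text{ in regression,}

and, given a family of estimators (\hat{f}_m)_{m \in \mathcal{M}}, the ideal but unachievable choice is the oracle

    m^\star \in \operatorname{arg\,min}_{m \in \mathcal{M}} R(\hat{f}_m),

which data-driven selection procedures aim to mimic.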

These lectures will address the problem of data-driven estimator selection, which includes the calibration problem as well as model selection (e.g., which variables in X are the most useful for predicting Y?). Two main approaches will be considered: penalization of the empirical risk (with deterministic or data-driven penalties), and (V-fold) cross-validation.
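
To make the cross-validation approach concrete, here is a minimal, self-contained sketch (in Python/numpy) of V-fold cross-validation used to select among candidate models. The regression setting, the polynomial-degree model family, and all names are illustrative assumptions, not material from the lectures; penalization would instead minimize, over m, the empirical risk of \hat{f}_m on the full sample plus a penalty term pen(m).

    # Illustrative sketch: V-fold cross-validation for model selection.
    # The models (polynomial degrees) and the data are toy assumptions.
    import numpy as np

    def vfold_cv_risk(x, y, degree, V=5, seed=None):
        """V-fold estimate of the prediction risk of a least-squares
        polynomial fit of the given degree (quadratic loss)."""
        rng = np.random.default_rng(seed)
        perm = rng.permutation(len(x))
        errors = []
        for held_out in np.array_split(perm, V):
            train = np.setdiff1d(perm, held_out)
            coeffs = np.polyfit(x[train], y[train], degree)  # fit on V-1 folds
            pred = np.polyval(coeffs, x[held_out])           # predict held-out fold
            errors.append(np.mean((y[held_out] - pred) ** 2))
        return np.mean(errors)

    # Toy heteroscedastic data, in the spirit of references [7] and [11].
    rng = np.random.default_rng(0)
    x = rng.uniform(0.0, 1.0, 200)
    y = np.sin(4.0 * x) + (0.1 + 0.4 * x) * rng.normal(size=200)

    # The same seed keeps the fold partition identical across candidates,
    # so all models are compared on the same data splits.
    risks = {d: vfold_cv_risk(x, y, d, V=5, seed=1) for d in range(1, 10)}
    print("selected degree:", min(risks, key=risks.get))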

We will focus on two main kinds of questions. First, which theoretical results can be proved for these selection procedures, and how can these results help practitioners choose a selection procedure for a given statistical problem? Second, how can theory help to design new selection procedures that improve on existing ones?


