Séminaire Probabilités et Statistiques
Worst-Case Bounds for Cross-Validation as a Risk Estimator
28
May 2026
logo_team
Intervenant : Thomas Weinberger
Institution : EPFL
Heure : 14h00 - 15h00
Lieu : 3L15

We study the MSE of $k$-fold cross-validation (CV) as a risk estimator. Despite the widespread use of CV, principled guidance for choosing $k$ is largely absent, mainly due to the complex dependence between fold-wise error estimates. 

 

We first focus on the Majority algorithm in binary classification, a minimal yet nontrivial ERM procedure for which existing theory provides loose and even vacuous bounds. We introduce a minimax framework for CV risk estimation and show via a reduction to Majority that no ERM can achieve $O(1/n)$ minimax MSE once $k=\omega_n(1)$; instead, a $\Omega(\sqrt{k}/n)$ worst-case lower bound is unavoidable.

 

Complementing this, we construct a learning rule for which there exist distributions with $\Omega(k/n)$ MSE, matching (up to constants) the accuracy of a hold-out estimator of a single fold of size $n/k$. 

 

Together these results delineate the fundamental trade-off in resampling-based risk estimation: CV cannot fully exploit all $n$ samples for unbiased risk evaluation, and its minimax performance is pinned between the $k/n$ and $\sqrt{k}/n$ regimes. 

 

Based on joint work with Ido Nachum (University of Haifa) and Rüdiger Urbanke (EPFL) (COLT 2026). For a full version with additional results, see https://arxiv.org/pdf/2511.03554.

All (past and future) events