Worst-Case Bounds for Cross-Validation as a Risk Estimator

Séminaire Probabilités et Statistiques

28
May 2026

Intervenant :	Thomas Weinberger
Institution :	EPFL
Heure :	14h00 - 15h00
Lieu :	3L15

We study the MSE of $k$-fold cross-validation (CV) as a risk estimator. Despite the widespread use of CV, principled guidance for choosing $k$ is largely absent, mainly due to the complex dependence between fold-wise error estimates.

We first focus on the Majority algorithm in binary classification, a minimal yet nontrivial ERM procedure for which existing theory provides loose and even vacuous bounds. We introduce a minimax framework for CV risk estimation and show via a reduction to Majority that no ERM can achieve $O(1/n)$ minimax MSE once $k=\omega_n(1)$; instead, a $\Omega(\sqrt{k}/n)$ worst-case lower bound is unavoidable.

Complementing this, we construct a learning rule for which there exist distributions with $\Omega(k/n)$ MSE, matching (up to constants) the accuracy of a hold-out estimator of a single fold of size $n/k$.

Together these results delineate the fundamental trade-off in resampling-based risk estimation: CV cannot fully exploit all $n$ samples for unbiased risk evaluation, and its minimax performance is pinned between the $k/n$ and $\sqrt{k}/n$ regimes.

Based on joint work with Ido Nachum (University of Haifa) and Rüdiger Urbanke (EPFL) (COLT 2026). For a full version with additional results, see https://arxiv.org/pdf/2511.03554.

All (past and future) events