mai 2026
| Intervenant : | Thomas Weinberger |
| Institution : | EPFL |
| Heure : | 14h00 - 15h00 |
| Lieu : | 3L15 |
We study the MSE of $k$-fold cross-validation (CV) as a risk estimator. Despite the widespread use of CV, principled guidance for choosing $k$ is largely absent, mainly due to the complex dependence between fold-wise error estimates.
We first focus on the Majority algorithm in binary classification, a minimal yet nontrivial ERM procedure for which existing theory provides loose and even vacuous bounds. We introduce a minimax framework for CV risk estimation and show via a reduction to Majority that no ERM can achieve $O(1/n)$ minimax MSE once $k=\omega_n(1)$; instead, a $\Omega(\sqrt{k}/n)$ worst-case lower bound is unavoidable.
Complementing this, we construct a learning rule for which there exist distributions with $\Omega(k/n)$ MSE, matching (up to constants) the accuracy of a hold-out estimator of a single fold of size $n/k$.
Together these results delineate the fundamental trade-off in resampling-based risk estimation: CV cannot fully exploit all $n$ samples for unbiased risk evaluation, and its minimax performance is pinned between the $k/n$ and $\sqrt{k}/n$ regimes.
Based on joint work with Ido Nachum (University of Haifa) and Rüdiger Urbanke (EPFL) (COLT 2026). For a full version with additional results, see https://arxiv.org/pdf/2511.03554.