Maximal Deviations of Incomplete U-statistics with Applications to Empirical Risk Sampling
Résumé
It is the goal of this paper to extend the \textit{Empirical Risk Minimization} (ERM) paradigm, from a practical perspective, to the situation where a natural estimate of the risk is of the form of a $K$-sample $U$-statistics, as it is the case in the $K$-partite ranking problem for instance. Indeed, the numerical computation of the empirical risk is hardly feasible if not infeasible, even for moderate samples sizes. Precisely, it involves averaging $O(n^{d_1+\ldots+d_K})$ terms, when considering a $U$-statistic of degrees $(d_1,\;\ldots,\; d_K)$ based on samples of sizes proportional to $n$. We propose here to consider a drastically simpler Monte-Carlo version of the empirical risk based on $O(n)$ terms solely, which can be viewed as an \textit{incomplete generalized $U$-statistic}, and prove that, remarkably, the approximation stage does not damage the ERM procedure and yields a learning rate of order $O_{\mathbb{P}}(1/\sqrt{n})$. Beyond a theoretical analysis guaranteeing the validity of this approach, numerical experiments are displayed for illustrative purpose.
Origine : Fichiers produits par l'(les) auteur(s)