Finite Sample Improvement of Akaike's Information Criterion - Archive ouverte HAL
Journal article, IEEE Transactions on Information Theory, Year: 2021

Finite Sample Improvement of Akaike's Information Criterion

Abstract

Considering the selection of frequency histograms, we propose a modification of Akaike's Information Criterion that avoids overfitting, even when the sample size is small. We call this correction an over-penalization procedure. We emphasize that the principle of unbiased risk estimation for model selection can indeed be improved by addressing excess risk deviations in the design of the penalization procedure. On the theoretical side, we prove sharp oracle inequalities for the Kullback-Leibler divergence. These inequalities are valid with positive probability for any sample size and include the estimation of unbounded log-densities. Along the way, we derive several analytical lemmas related to the Kullback-Leibler divergence, as well as concentration inequalities, that are of independent interest. In a simulation study, we also demonstrate state-of-the-art performance of our over-penalization criterion for bin size selection, in particular outperforming the AICc procedure.
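To make the setting concrete, the sketch below illustrates classical AIC-based bin selection for a regular histogram, the baseline criterion that the paper's over-penalization modifies. This is a minimal illustration of the standard criterion (log-likelihood of the histogram estimator minus the number of free parameters), not the authors' over-penalized procedure; the function names and the `max_bins` search range are assumptions for the example.

```python
import numpy as np

def histogram_log_likelihood(data, n_bins):
    """Log-likelihood of the regular-histogram density estimator
    with n_bins bins over the data range."""
    n = len(data)
    counts, edges = np.histogram(data, bins=n_bins)
    widths = np.diff(edges)
    occupied = counts > 0  # empty bins contribute 0 (0 * log 0 convention)
    return np.sum(counts[occupied]
                  * np.log(counts[occupied] / (n * widths[occupied])))

def select_bins_aic(data, max_bins=50):
    """Select the bin count maximizing the AIC criterion:
    log-likelihood minus (n_bins - 1) free parameters."""
    scores = {D: histogram_log_likelihood(data, D) - (D - 1)
              for D in range(1, max_bins + 1)}
    return max(scores, key=scores.get)

rng = np.random.default_rng(0)
sample = rng.normal(size=200)
print(select_bins_aic(sample))
```

The over-penalization studied in the paper replaces the fixed `(D - 1)` penalty with a slightly larger, sample-size-dependent term so that the criterion remains reliable at small `n`; its exact form is given in the article.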
Files

SauNav_MLE_final.pdf (2.04 MB)
SauNav_MLE_Supp.pdf (412.34 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-03286369 , version 1 (14-07-2021)

Identifiers

Cite

Adrien Saumard, Fabien Navarro. Finite Sample Improvement of Akaike's Information Criterion. IEEE Transactions on Information Theory, 2021, 67 (10), ⟨10.1109/TIT.2021.3094770⟩. ⟨hal-03286369⟩

