A Large Scale Analysis of Logistic Regression: Asymptotic Performance and New Insights

Abstract: Logistic regression, one of the most popular machine learning methods for binary classification, has long been believed to be unbiased. In this paper, we consider the "hard" classification problem of separating high-dimensional Gaussian vectors, where the data dimension p and the sample size n are both large. Based on recent advances in random matrix theory (RMT) and high-dimensional statistics, we evaluate the asymptotic distribution of the logistic regression classifier and, consequently, provide the associated classification performance. This brings new insights into the internal mechanism of the logistic regression classifier, including a possible bias in the separating hyperplane, as well as into practical issues such as hyper-parameter tuning, thereby opening the door to novel RMT-inspired improvements.
Index Terms: High-dimensional statistics, logistic regression, machine learning, random matrix theory.
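The setting described in the abstract can be explored empirically with a small simulation. The sketch below is not the authors' asymptotic (RMT) analysis; it is a hypothetical Monte Carlo check under assumed choices: a symmetric two-class Gaussian mixture x = y·μ + z with identity covariance and μ = (2, 0, ..., 0), an L2 (ridge) penalty `lam` standing in for the hyper-parameter, and plain full-batch gradient descent. All function names, sizes (p = 50, n = 200), the step size and the λ grid are illustrative assumptions.

```python
import math
import random


def sample_gaussian_mixture(n, p, mu_norm, rng):
    """Draw n pairs (x, y): y is +1 or -1 with equal probability and
    x = y * mu + z with z ~ N(0, I_p).

    Hypothetical choice: mu = (mu_norm, 0, ..., 0) and identity covariance.
    """
    data = []
    for _ in range(n):
        y = 1 if rng.random() < 0.5 else -1
        x = [rng.gauss(0.0, 1.0) for _ in range(p)]
        x[0] += y * mu_norm
        data.append((x, y))
    return data


def stable_inv_one_plus_exp(m):
    """Return 1 / (1 + exp(m)) without overflowing for large |m|."""
    if m > 0:
        e = math.exp(-m)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(m))


def fit_logistic_regression(data, lam, lr=0.5, iters=100):
    """Minimise (1/n) * sum_i log(1 + exp(-y_i (w.x_i + b))) + (lam / 2) * ||w||^2
    by full-batch gradient descent; the intercept b is not penalised."""
    n, p = len(data), len(data[0][0])
    w, b = [0.0] * p, 0.0
    for _ in range(iters):
        grad_w = [lam * wj for wj in w]  # gradient of the ridge penalty
        grad_b = 0.0
        for x, y in data:
            margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            # d/dm log(1 + exp(-m)) = -1 / (1 + exp(m)); chain rule multiplies by y*x (for w) and y (for b)
            c = -y * stable_inv_one_plus_exp(margin) / n
            for j in range(p):
                grad_w[j] += c * x[j]
            grad_b += c
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def misclassification_rate(w, b, data):
    """Fraction of samples where sign(w.x + b) differs from the label y."""
    errors = 0
    for x, y in data:
        score = sum(wj * xj for wj, xj in zip(w, x)) + b
        if (1 if score >= 0 else -1) != y:
            errors += 1
    return errors / len(data)


if __name__ == "__main__":
    rng = random.Random(0)
    p, n = 50, 200  # hypothetical sizes: p/n = 0.25, dimension comparable to sample size
    train = sample_gaussian_mixture(n, p, mu_norm=2.0, rng=rng)
    test = sample_gaussian_mixture(2000, p, mu_norm=2.0, rng=rng)
    for lam in (0.01, 0.1, 1.0):  # hypothetical grid for the regularisation hyper-parameter
        w, b = fit_logistic_regression(train, lam)
        err = misclassification_rate(w, b, test)
        print(f"lambda={lam}: test error {err:.3f}, intercept b {b:+.3f}")
```

Varying `lam` and the ratio `p / n` gives an empirical feel for the hyper-parameter sensitivity and the fitted intercept discussed in the abstract; it does not reproduce the paper's asymptotic formulas.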
Document type: Conference paper

https://hal.archives-ouvertes.fr/hal-02139980
Contributor: Xiaoyi Mai
Submitted on: Sunday, May 26, 2019 - 5:12:12 PM
Last modification on: Friday, June 14, 2019 - 3:52:22 PM

Files produced by the author(s)


Citation

Xiaoyi Mai, Zhenyu Liao, Romain Couillet. A Large Scale Analysis of Logistic Regression: Asymptotic Performance and New Insights. ICASSP, May 2019, Brighton, United Kingdom. ⟨10.1109/ICASSP.2019.8683376⟩. ⟨hal-02139980⟩
