
Malware Detection in PDF Files Using Machine Learning

Abstract : We present how we used machine learning techniques to detect malicious behaviour in PDF files. To this end, we first set up an SVM (Support Vector Machine) classifier that was able to detect 99.7% of malware. However, this classifier was easy to fool with malicious PDF files that we forged to look like clean ones. For instance, we implemented a gradient-descent attack to evade this SVM, and this attack was almost 100% successful. Next, we provided countermeasures to this attack: a more elaborate feature selection and the use of a threshold allowed us to stop up to 99.99% of these attacks. Finally, using adversarial learning techniques, we were able to prevent gradient-descent attacks by iteratively feeding the SVM with forged malicious PDF files. We found that after 3 iterations, every PDF file forged by gradient descent was detected, completely preventing the attack.
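
The pipeline summarised in the abstract (train an SVM, evade it with a gradient-descent attack, then retrain on the forged files) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the abstract does not state the kernel, the features, or the attack details, so the sketch uses a linear SVM trained by subgradient descent, hypothetical count-style features, and an attacker who can only add content to a file (feature values may only increase). The names `train_linear_svm`, `gradient_attack`, and `adversarial_training` are hypothetical.

```python
import random


def score(w, b, x):
    """Decision value of a linear model; values at or above the threshold mean 'malicious'."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def is_malicious(w, b, x, threshold=0.0):
    # Assumption: lowering the threshold flags more files, which is one way
    # a defender could use a threshold against borderline forged samples.
    return score(w, b, x) >= threshold


def train_linear_svm(X, y, epochs=200, lr=0.01, lam=0.01, seed=0):
    """Stochastic subgradient descent on the L2-regularised hinge loss.

    Labels are +1 (malicious) and -1 (clean).
    """
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            margin = y[i] * score(w, b, X[i])
            w = [wi * (1.0 - lr * lam) for wi in w]  # regularisation shrinkage
            if margin < 1.0:  # sample violates the margin: hinge subgradient step
                w = [wi + lr * y[i] * xi for wi, xi in zip(w, X[i])]
                b += lr * y[i]
    return w, b


def gradient_attack(w, b, x, step=1.0, max_steps=50, threshold=0.0):
    """Forge a sample by descending the score along its gradient.

    For a linear model the gradient of the score with respect to x is w.
    Assumption: the attacker only adds content, so only features with a
    negative weight are increased; no feature is ever decreased.
    """
    x = list(x)
    for _ in range(max_steps):
        if score(w, b, x) < threshold:
            break  # the forged sample is now classified as clean
        x = [xi + step * max(0.0, -wi) for xi, wi in zip(x, w)]
    return x


def adversarial_training(X, y, rounds=3):
    """Retrain on forged malicious samples that evade the current model."""
    X, y = [list(row) for row in X], list(y)
    w, b = train_linear_svm(X, y)
    for _ in range(rounds):
        forged = [gradient_attack(w, b, x) for x, label in zip(X, y) if label == 1]
        evasive = [f for f in forged if not is_malicious(w, b, f)]
        if not evasive:
            break  # no forged sample evades the current model
        X += evasive
        y += [1] * len(evasive)
        w, b = train_linear_svm(X, y)
    return w, b
```

A call such as `adversarial_training(X, y)` on a small labelled feature matrix returns the retrained weights and bias. The sketch makes no claim to reproduce the detection rates reported in the abstract; with a purely linear model and an unconstrained additive attacker, the retraining loop is not guaranteed to converge in three rounds.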
Document type :
Conference papers

Contributor : Mathieu VALOIS
Submitted on : Monday, August 20, 2018 - 11:32:46 AM
Last modification on : Friday, August 5, 2022 - 2:54:52 PM
Long-term archiving on : Wednesday, November 21, 2018 - 12:58:10 PM




  • HAL Id : hal-01704766, version 2


Bonan Cuan, Aliénor Damien, Claire Delaplace, Mathieu Valois. Malware Detection in PDF Files Using Machine Learning. SECRYPT 2018 - 15th International Conference on Security and Cryptography, Jul 2018, Porto, Portugal. 8p. ⟨hal-01704766v2⟩


