Malware Detection in PDF Files Using Machine Learning

Abstract : We present how we used machine learning techniques to detect malicious behaviours in PDF files. To this end, we first set up an SVM (Support Vector Machine) classifier that was able to detect 99.7% of malware. However, this classifier was easy to fool with malicious PDF files that we forged to look like clean ones. For instance, we implemented a gradient-descent attack to evade this SVM. This attack was almost 100% successful. Next, we provided counter-measures to this attack: a more elaborate feature selection and the use of a threshold allowed us to stop up to 99.99% of these attacks. Finally, using adversarial learning techniques, we were able to prevent gradient-descent attacks by iteratively feeding the SVM with forged malicious PDF files. We found that after 3 iterations, every PDF file forged by gradient descent was detected, completely preventing the attack.
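The gradient-descent evasion mentioned in the abstract can be illustrated with a minimal sketch. It assumes a linear SVM whose decision value is w.x + b, with positive values meaning "malicious"; the weights, bias, feature vector, step size, and function names below are hypothetical and are not taken from the paper. The sketch works purely in feature space and ignores the step of turning the forged vector back into a valid PDF file.

```python
import numpy as np


def decision(w, b, x):
    """Linear SVM decision value: positive means 'malicious' (assumed convention)."""
    return float(np.dot(w, x) + b)


def gradient_descent_evasion(w, b, x, step=0.1, max_iter=1000):
    """Move x against the gradient of the decision function until the sample
    is classified as clean (decision value < 0) or max_iter is reached."""
    x_adv = np.array(x, dtype=float)
    for _ in range(max_iter):
        if decision(w, b, x_adv) < 0:
            break
        # For a linear model, the gradient of w.x + b with respect to x is w.
        x_adv -= step * w
    return x_adv


# Hypothetical weights and feature vector, chosen only for illustration.
w = np.array([1.5, -0.5, 2.0])
b = -1.0
x_malicious = np.array([2.0, 1.0, 3.0])

x_forged = gradient_descent_evasion(w, b, x_malicious)
print(decision(w, b, x_malicious) > 0)  # original sample is flagged
print(decision(w, b, x_forged) < 0)     # forged sample evades this classifier
```

Under the same assumptions, the adversarial learning step described in the abstract would correspond to adding such forged vectors, labelled as malicious, to the training set and refitting the classifier, then repeating the attack against the new model.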
Document type :
Conference papers

https://hal.archives-ouvertes.fr/hal-01704766
Contributor : Mathieu Valois
Submitted on : Monday, August 20, 2018 - 11:32:46 AM
Last modification on : Friday, June 14, 2019 - 6:31:19 PM
Long-term archiving on : Wednesday, November 21, 2018 - 12:58:10 PM


Identifiers

  • HAL Id : hal-01704766, version 2

Citation

Bonan Cuan, Aliénor Damien, Claire Delaplace, Mathieu Valois. Malware Detection in PDF Files Using Machine Learning. SECRYPT 2018 - 15th International Conference on Security and Cryptography, Jul 2018, Porto, Portugal. 8p. ⟨hal-01704766v2⟩
