Abstract: We present how we used machine learning techniques to detect malicious behaviours in PDF files.
To this end, we first set up an SVM (Support Vector Machine) classifier that was able to detect 99.7% of
malware. However, this classifier was easy to fool with malicious PDF files that we forged to make them
look like clean ones. For instance, we implemented a gradient-descent attack to evade this SVM. This attack
was almost 100% successful. Next, we provided countermeasures to this attack: more elaborate feature
selection and the use of a threshold allowed us to stop up to 99.99% of these attacks.
Finally, using adversarial learning techniques, we were able to prevent gradient-descent attacks by iteratively
feeding the SVM with malicious forged PDF files. We found that after 3 iterations, every gradient-descent
forged PDF file was detected, completely preventing the attack.
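
To give an intuition for the gradient-descent evasion mentioned above, the following is a minimal sketch and not our actual implementation. It assumes a linear decision function f(x) = w.x + b, where f(x) > 0 means "malicious". The weights, the bias, the feature vector, the step size, and the function names are all hypothetical values chosen for illustration. A real attack would also need to map the modified feature vector back to a valid PDF file, which this sketch ignores.

```python
# Hypothetical linear decision function: f(x) = w.x + b, f(x) > 0 means "malicious".
# All numeric values below are made up for illustration.

def decision(w, b, x):
    """Return the decision value of the (assumed linear) classifier for x."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def gradient_descent_evasion(w, b, x, step=0.1, max_iter=1000):
    """Move x against the gradient of f until it is classified as clean."""
    x = list(x)
    for _ in range(max_iter):
        if decision(w, b, x) <= 0:
            break
        # For a linear f, the gradient with respect to x is simply w.
        # Features are clamped at 0, assuming they are non-negative counts.
        x = [max(0.0, xi - step * wi) for xi, wi in zip(x, w)]
    return x


w = [1.5, -2.0, 0.5]         # hypothetical feature weights
b = -1.0                     # hypothetical bias
malicious = [3.0, 0.0, 2.0]  # hypothetical feature vector of a malicious file

evasive = gradient_descent_evasion(w, b, malicious)
print(decision(w, b, malicious) > 0)  # the original sample is flagged
print(decision(w, b, evasive) <= 0)   # the forged sample is not
```

In this setting, the adversarial learning step amounts to adding such forged vectors, labelled as malicious, to the training set and retraining the classifier.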