Performance/cost analysis of a cloud based solution for big data analytic: Application in intrusion detection

Abstract: The essential goal of 'Big Data' technology is to provide new techniques and tools to ingest and store large amounts of generated data so that it can be analyzed and processed to obtain insights and predictions, offering new opportunities to improve our lives in many domains. In this context, 'Big Data' addresses two essential issues: the real-time analysis issue, introduced by the increasing velocity at which data is generated, and the long-term analysis issue, introduced by the huge volume of stored data. To deal with these two issues, we propose in this paper a cloud-based solution for big data analytics on the Amazon cloud. Our objective is to evaluate the performance of the Big Data services offered with respect to the volume and velocity of the processed data. The dataset we use contains information about "network connections": approximately 5 million records with 41 features. The solution works as a network intrusion detector: it receives data records in real time from a Raspberry Pi node and predicts whether each connection is bad (a malicious intrusion or attack) or good (a normal connection). The prediction model was built using logistic regression. We evaluate the cloud resources needed to train the machine learning model (batch processing) and to classify new streaming data with the trained model in real time (real-time processing). The solution achieved high accuracy, and the results show that working with Big Data in the cloud is mainly a cost/performance trade-off: processing performance, in terms of response time for both long-term and real-time analysis, can always be guaranteed once the cloud resources are provisioned according to the needs.
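The abstract describes two phases: a batch phase that trains a logistic regression classifier on labeled connection records, and a real-time phase that scores records streamed from a sensor node. As a minimal, single-machine sketch of that pipeline (not the authors' cloud implementation, whose services and code are not described here), the Python below fits logistic regression by full-batch gradient descent on the log loss and then classifies an iterable of incoming records one at a time. The record layout, the hypothetical `synthetic_records` toy generator, the learning rate, the epoch count, and the 0.5 decision threshold are all assumptions for illustration; the real dataset has 41 features per record.

```python
import math
import random
from typing import Iterable, Iterator, List, Tuple

# Hypothetical record layout: numeric features plus a label,
# 1 for "bad" (attack) and 0 for "good" (normal connection).
Record = Tuple[List[float], int]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(data: List[Record], epochs: int = 200,
                              lr: float = 0.1) -> Tuple[List[float], float]:
    """Batch phase: fit weights and bias by full-batch gradient descent on log loss."""
    n_features = len(data[0][0])
    w = [0.0] * n_features
    b = 0.0
    n = len(data)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in data:
            # Gradient of the log loss w.r.t. the linear score is (p - y).
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j in range(n_features):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def classify_stream(stream: Iterable[List[float]], w: List[float], b: float,
                    threshold: float = 0.5) -> Iterator[str]:
    """Real-time phase: label each incoming record as soon as it arrives."""
    for x in stream:
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
        yield "bad" if p >= threshold else "good"


def synthetic_records(n: int, n_features: int, seed: int = 0) -> List[Record]:
    # Hypothetical toy stand-in for the connection records:
    # "bad" records are drawn around +1.5, "good" ones around -1.5.
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        y = rng.randint(0, 1)
        x = [rng.gauss(1.5 if y else -1.5, 1.0) for _ in range(n_features)]
        out.append((x, y))
    return out


if __name__ == "__main__":
    train = synthetic_records(500, n_features=5, seed=1)
    w, b = train_logistic_regression(train)
    test = synthetic_records(200, n_features=5, seed=2)
    preds = list(classify_stream((x for x, _ in test), w, b))
    acc = sum((p == "bad") == bool(y) for p, (_, y) in zip(preds, test)) / len(test)
    print(f"accuracy on toy data: {acc:.2f}")
```

In a deployment like the one described, the training step would run on provisioned cloud resources while `classify_stream` would consume records arriving from the Raspberry Pi node; keeping the two phases as separate functions mirrors the batch/real-time split whose resource needs the paper evaluates.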
Document type: Conference paper

https://hal.archives-ouvertes.fr/hal-02155795
Contributor: Frédéric Davesne
Submitted on: Thursday, June 13, 2019, 9:31:38 PM
Last modification on: Wednesday, November 13, 2019, 11:50:04 AM

Identifiers

  • HAL Id: hal-02155795, version 1

Citation

Nada Chendeb Taher, Imane Mallat, Nazim Agoulmine, Nour El-Mawass. Performance/cost analysis of a cloud based solution for big data analytic: Application in intrusion detection. 1st International Conference on Big Data and Cyber-Security Intelligence (BDCSIntell 2018), Dec 2018, Beirut, Lebanon, pp. 34-41. ⟨hal-02155795⟩
