Monday, October 22, 2018

anomaly detection machine learning visualization

  • An IDS (Intrusion Detection System) records attacks and generates log files.

Instead of handing someone a log file that describes how an attack happened, one can hand them a picture: a visual representation of the log records.

Attack / Pen Testing
Data Collection
Analysis
Visualization
Notification

Attack/Pen Testing: Metasploit Framework
Data Collection: Snort, Suricata and Bro (all open source)

Comparison among IDSs
Snort is single-threaded.
Suricata uses the Snort rule set, works with other supporting products, and adds multi-threading.
Bro provides additional features via its script-based analysis engine and ability to extend the response through scripts
Bro is often the best option for more critical tasks, such as those that involve:
Higher-level protocol knowledge
Working across multiple network flows
Using a custom algorithm to compute something about the traffic in question
One of the distinctive aspects of Bro is its categorization of logs
Bro generates notices based on customized or default scripts (detection).
https://www.sanog.org/resources/sanog29/SANOG29-Conference_real-time-visualisation-Aneela.pdf
  • The Suricata engine is capable of real time intrusion detection (IDS), inline intrusion prevention (IPS), network security monitoring (NSM) and offline pcap processing.

With standard input and output formats like YAML and JSON, integrations with tools like existing SIEMs, Splunk, Logstash/Elasticsearch, Kibana, and other databases become effortless.

https://suricata-ids.org/
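
As a rough illustration of why JSON output makes integration easy, the sketch below counts alerts per source IP and signature from line-delimited JSON events. The sample events are made up, and the field names used here (`event_type`, `src_ip`, `alert.signature`) are assumptions about the log layout that should be checked against your own Suricata output.

```python
import json
from collections import Counter

# Hypothetical sample events standing in for lines of a JSON log file.
SAMPLE_EVENTS = [
    {"event_type": "alert", "src_ip": "10.0.0.5",
     "alert": {"signature": "Example signature A"}},
    {"event_type": "flow", "src_ip": "10.0.0.7"},
    {"event_type": "alert", "src_ip": "10.0.0.5",
     "alert": {"signature": "Example signature A"}},
    {"event_type": "alert", "src_ip": "10.0.0.8",
     "alert": {"signature": "Example signature B"}},
]
sample_lines = [json.dumps(event) for event in SAMPLE_EVENTS]


def count_alerts(lines):
    """Count alert events per (source IP, signature) pair."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        event = json.loads(line)
        if event.get("event_type") != "alert":
            continue  # ignore non-alert records such as flows
        signature = event.get("alert", {}).get("signature", "unknown")
        counts[(event.get("src_ip", "unknown"), signature)] += 1
    return counts


alert_counts = count_alerts(sample_lines)
for (src_ip, signature), n in alert_counts.most_common():
    print(f"{src_ip}\t{signature}\t{n}")
```

With a real log, `sample_lines` would be replaced by iterating over the open log file; the resulting counts are the kind of table a visualization layer can plot directly.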


  • Anomaly Hunting with Suricata & Splunk

Applying Data Science to Security OPS
https://suricon.net/wp-content/uploads/2016/11/SuriCon2016_AnthonyTellez.pdf


  • RapidMiner Anomaly Detection Extension

The Anomaly Detection Extension for RapidMiner comprises the best-known unsupervised anomaly detection algorithms, which assign an individual anomaly score to each data row of an example set. It allows you to find data that is significantly different from the norm, without needing the data to be labeled.

Some of the algorithms are:

Local Outlier Factor (LOF)
k-NN Global Anomaly Score
Connectivity-based Outlier Factor (COF)
Local Correlation Integral (LOCI)
Local Outlier Probability (LoOP)
Cluster-based Local Outlier Factor (CBLOF)

https://github.com/Markus-Go/rapidminer-anomalydetection
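
To make "assigning an anomaly score to each data row" concrete, here is a minimal NumPy sketch of a k-NN style global score. It assumes the score is the mean distance to the k nearest neighbours (some variants use only the distance to the k-th neighbour). This is not the extension's code; the function name and the synthetic data are illustrative only.

```python
import numpy as np


def knn_global_anomaly_score(X, k=3):
    """Return, for each row, the mean distance to its k nearest neighbours."""
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances, shape (n, n).
    diffs = X[:, None, :] - X[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # A point must not count as its own neighbour.
    np.fill_diagonal(dists, np.inf)
    # Sort each row and average the k smallest distances.
    nearest = np.sort(dists, axis=1)[:, :k]
    return nearest.mean(axis=1)


# Synthetic data: 50 rows around the origin plus one far-away row.
rng = np.random.default_rng(0)
normal_rows = rng.normal(loc=0.0, scale=1.0, size=(50, 2))
outlier_row = np.array([[8.0, 8.0]])
data = np.vstack([normal_rows, outlier_row])

scores = knn_global_anomaly_score(data, k=3)
most_anomalous = int(np.argmax(scores))
print("row with the highest score:", most_anomalous)
```

No labels are involved: the score depends only on how isolated a row is, which is what makes the approach unsupervised.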


  • The RapidMiner Educational License Program provides free RapidMiner product licenses for academic usage to students, professors and researchers.

https://rapidminer.com/educational-program/


  • ELKI is an open source (AGPLv3) data mining software written in Java. The focus of ELKI is research in algorithms, with an emphasis on unsupervised methods in cluster analysis and outlier detection. In order to achieve high performance and scalability, ELKI offers data index structures such as the R*-tree that can provide major performance gains.

https://elki-project.github.io/


  • scikit-learn

Machine Learning in Python
Simple and efficient tools for data mining and data analysis
Built on NumPy, SciPy, and matplotlib
http://scikit-learn.org/stable/index.html
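
scikit-learn also ships an implementation of Local Outlier Factor, one of the algorithms listed earlier in these notes. The sketch below runs it on synthetic data; the data, the `n_neighbors` value and the variable names are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

# Synthetic data: a tight cluster plus two points far away from it.
rng = np.random.default_rng(42)
inliers = rng.normal(loc=0.0, scale=0.5, size=(100, 2))
outliers = np.array([[4.0, 4.0], [-4.0, 3.5]])
X = np.vstack([inliers, outliers])

lof = LocalOutlierFactor(n_neighbors=20)
labels = lof.fit_predict(X)  # -1 marks an outlier, 1 an inlier

# negative_outlier_factor_ is the negated LOF, so flip the sign:
# larger values then mean "more anomalous".
lof_scores = -lof.negative_outlier_factor_

top_two = np.argsort(lof_scores)[-2:]
print("rows with the highest LOF:", sorted(int(i) for i in top_two))
```

With the default threshold a few borderline inliers may also be labelled -1, so ranking rows by score is often more informative than relying on the labels alone.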


  • Dataiku DSS Community

https://www.dataiku.com/dss/editions/

  • Anomaly detection is the problem of identifying data points that don't conform to expected (normal) behaviour. 

For example, an anomaly in an MRI scan could indicate a malignant tumour, and an anomalous reading from a production plant sensor may indicate a faulty component.
Anomaly detection is the task of defining a boundary around normal data points so that they can be distinguished from outliers.

To keep things simple, we will use two features for each server: 1) throughput in mb/s and 2) response latency in ms.

A Gaussian model will be used to learn the underlying pattern of the dataset, on the assumption that the features follow a Gaussian distribution.
After that, we will find data points with a very low probability of being normal, which can therefore be considered outliers.
For the training set, we first fit a Gaussian distribution to the features, which requires their mean and variance.
NumPy provides methods to calculate both the mean and the variance (covariance matrix) efficiently.
Similarly, the SciPy library provides a method to estimate the Gaussian distribution.
http://aqibsaeed.github.io/2016-07-17-anomaly-detection/
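
The steps above can be sketched with NumPy and SciPy as follows. The data is synthetic (it is not the dataset from the linked post), and the 1% density threshold is an arbitrary assumption; in practice a threshold is usually tuned on labelled validation data.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Synthetic stand-in for the server data.
# Columns: throughput (mb/s), latency (ms).
rng = np.random.default_rng(1)
train = rng.normal(loc=[15.0, 14.0], scale=[1.0, 1.2], size=(300, 2))

# Fit the Gaussian: per-feature mean and the covariance matrix.
mu = train.mean(axis=0)
sigma = np.cov(train, rowvar=False)
model = multivariate_normal(mean=mu, cov=sigma)

new_points = np.array([
    [15.2, 14.1],  # close to the training mean
    [25.0, 5.0],   # far from anything seen in training
])
densities = model.pdf(new_points)

# Hypothetical threshold: the lowest 1% of densities on the training set.
epsilon = np.quantile(model.pdf(train), 0.01)
is_outlier = densities < epsilon
print(is_outlier)
```

Points whose density falls below `epsilon` lie outside the boundary the model draws around normal behaviour.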
