Thesis Abstracts 2002

Research and Graduate Studies Electrical and Computer Engineering

Detecting Malicious Use with Unlabelled Data Using Clustering and Outlier Analysis

By: Carosielli, Luciano, MASc

Supervisor: Dr. G.S. Knight

Abstract

Most commercial intrusion detection systems (IDSs) presently available are signature-based network IDSs. Organisations using these IDSs still have difficulty detecting intrusive activity within their networks because novel attacks are constantly being released, not all attacks traverse the network, and analysts can easily miss legitimate alarms when reviewing the vast alarm logs produced. Research into improving anomaly-based IDSs has received increasing attention over the last ten years. Many researchers are using popular data mining techniques in an effort to detect intrusive activity effectively. Empirical results obtained so far have demonstrated that these techniques can be effective when trained or calibrated using labelled datasets. Unfortunately, creating these labelled datasets is resource intensive.

This thesis simulates and analyses malicious activity on an existing local area network to determine whether that activity can be detected with data mining techniques using unlabelled datasets. The network connection data collected while running several popular malware tools is combined with daily intranet and Internet usage connection data to create training and test datasets. Semi-discrete decomposition (SDD) is used as a clustering and outlier analysis technique to identify the connections within the training datasets and to classify the previously unseen connections within the test datasets as either normal or anomalous. The empirical results obtained in this thesis will be compared with the results obtained by existing data mining techniques to determine SDD's effectiveness at labelling and classifying.
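To make the idea of SDD as a clustering and outlier analysis technique more concrete, the following is a minimal sketch and not the implementation used in the thesis. It assumes the commonly described greedy form of SDD, in which a matrix is approximated by a sum of terms d * x * y^T with the entries of x and y restricted to -1, 0 and 1. The toy connection matrix, the function names (`best_discrete_vector`, `sdd`, `small_clusters`) and the rule "rows in very small clusters are anomalous" are all illustrative assumptions.

```python
from collections import Counter


def best_discrete_vector(s):
    """Return v in {-1, 0, 1}^n that maximises (v . s)^2 / (v . v)."""
    # The optimum keeps the J largest |s_i| for some J, with matching signs.
    order = sorted(range(len(s)), key=lambda i: abs(s[i]), reverse=True)
    best_score, best_count, running = 0.0, 0, 0.0
    for count, i in enumerate(order, start=1):
        running += abs(s[i])
        score = running * running / count
        if score > best_score:
            best_score, best_count = score, count
    v = [0] * len(s)
    for i in order[:best_count]:
        v[i] = 1 if s[i] > 0 else -1
    return v


def frobenius_sq(r):
    """Squared Frobenius norm of a matrix stored as a list of rows."""
    return sum(value * value for row in r for value in row)


def sdd(a, terms, inner_iters=10):
    """Greedy SDD: returns the (d, x, y) terms and the residual norms."""
    m, n = len(a), len(a[0])
    r = [[float(value) for value in row] for row in a]  # residual matrix
    triples, norms = [], [frobenius_sq(r)]
    for _ in range(terms):
        # Start from the residual column with the largest squared norm.
        col_sq = [sum(r[i][j] ** 2 for i in range(m)) for j in range(n)]
        j0 = max(range(n), key=lambda j: col_sq[j])
        if col_sq[j0] == 0.0:
            break  # residual is exactly zero, nothing left to approximate
        y = [0] * n
        y[j0] = 1
        for _ in range(inner_iters):  # alternate: fix y, solve x; fix x, solve y
            x = best_discrete_vector(
                [sum(r[i][j] * y[j] for j in range(n)) for i in range(m)])
            y = best_discrete_vector(
                [sum(r[i][j] * x[i] for i in range(m)) for j in range(n)])
        xry = sum(x[i] * r[i][j] * y[j] for i in range(m) for j in range(n))
        d = xry / (sum(v * v for v in x) * sum(v * v for v in y))
        for i in range(m):
            for j in range(n):
                r[i][j] -= d * x[i] * y[j]
        triples.append((d, x, y))
        norms.append(frobenius_sq(r))
    return triples, norms


def small_clusters(labels, max_size):
    """Indices of rows whose cluster has at most max_size members."""
    sizes = Counter(labels)
    return [i for i, label in enumerate(labels) if sizes[label] <= max_size]


# Hypothetical data: one row per connection, three made-up numeric features.
toy = [[1, 1, 0]] * 5 + [[0, 0, 9]]
triples, norms = sdd(toy, terms=2)
# A row's cluster label is the sequence of its x entries across the terms.
labels = [tuple(x[i] for _, x, _ in triples) for i in range(len(toy))]
outliers = small_clusters(labels, max_size=1)
```

In this toy case the five identical rows share one label and the unusual sixth row sits alone in its own cluster, so it is the only row flagged. Real connection data would need feature scaling and a more considered rule for deciding which clusters count as anomalous.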