Speeding-up K-means clustering

Modern Intrusion Detection Systems (IDS) must process enormous quantities of traffic data in real time. For these systems to be practically useful, the amount of data an IDS has to process at a time must be reduced. The reduction is achieved by grouping similar attack signatures in the IDS knowledge base and comparing the actual network traffic with a representative of each group instead of with every member of the group. Grouping similar attack signatures is a classification problem, very often solved in an unsupervised way. In that case we speak of clustering attack signatures, i.e. finding well-separated groups of similar signatures without supervised learning. Of the many clustering methods, the K-means partitional clustering method has gained popularity because its time complexity is linear in the number of data units (feature vectors) to cluster. But as network bandwidth increases, even linear time complexity becomes insufficient. Since the beginning of the 21st century, several improvements to the original 50-year-old K-means algorithm have been proposed with the aim of reducing its time complexity, especially when it is implemented in a distributed computing environment. This talk reviews these proposals and poses some research questions related to the properties of clusters and their optimal number.
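To make the idea concrete, here is a minimal sketch of the classic (Lloyd-style) K-means iteration in plain Python. It is not the speaker's implementation: the function names (`squared_distance`, `nearest`, `kmeans`) and the toy two-dimensional vectors standing in for signature feature vectors are assumptions made for illustration only. Each pass compares every vector with every centroid, which is where the linear dependence on the number of vectors comes from.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to `point` (ties go to the lowest index)."""
    return min(range(len(centroids)),
               key=lambda j: squared_distance(point, centroids[j]))


def kmeans(points, k, iterations=100, seed=0):
    """Plain K-means; returns (centroids, labels).

    One pass costs O(n * k * d) for n vectors of dimension d.
    """
    rng = random.Random(seed)
    # Assumption: initial centroids are k distinct input vectors chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: attach every vector to its nearest centroid.
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the centroids would not move
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # keep the old centroid if a cluster became empty
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return centroids, labels


# Toy stand-ins for signature feature vectors: two well-separated groups.
signatures = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(signatures, k=2)

# New traffic is compared with the k representatives, not with all signatures.
group_of_new_traffic = nearest((9, 9), centroids)
```

In this sketch the centroids play the role of the group representatives: classifying a new traffic vector costs k distance computations rather than one per stored signature.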

Date: 22 September 2017, 12:12

Room: A146
