Thesis abstract:
Since malware first appeared in 1981, the number of malicious programs has exploded.
Nowadays, there are so many different variants of malicious software
(more than 150k different malware signatures) that analyzing them one by
one is not feasible.
Therefore, researchers are developing new techniques of analysis based
on what makes pieces of code "malicious": their behavior.
A program's behavior can be characterized through the sequence of
system calls, or API calls, that it executes to interact with the
underlying operating system. A pragmatic way to represent
software behavior is a list of executed API calls, together with information about each call.
API tracing can be performed through dynamic analysis, i.e., by running
malware samples in a sandbox with an instrumented operating system.
These calls can be aggregated into higher-level, more complex behaviors,
generating a hierarchy with different levels of granularity.
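
As an illustration only, the following minimal Python sketch shows one way an API-call trace could be represented and then aggregated into higher-level behaviors. The `ApiCall` record, the `RULES` table, the `aggregate` function, and the choice of Windows API names are all assumptions made for this example; they are not the representation or the rules used in the thesis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ApiCall:
    """One traced call: the API name plus (hypothetical) recorded arguments."""
    name: str
    args: tuple = ()


# Hypothetical, hand-written rules: each high-level behavior is defined as a
# contiguous sequence of API names. Real rules would be richer than this.
RULES = {
    "write_file": ("CreateFileW", "WriteFile", "CloseHandle"),
    "set_registry_value": ("RegOpenKeyExW", "RegSetValueExW"),
}


def aggregate(trace, rules=RULES):
    """Scan the trace left to right, replacing each contiguous run that
    matches a rule with the rule's label; unmatched calls are kept as-is."""
    names = [call.name for call in trace]
    result = []
    i = 0
    while i < len(names):
        for label, pattern in rules.items():
            if tuple(names[i:i + len(pattern)]) == pattern:
                result.append(label)
                i += len(pattern)
                break
        else:
            # No rule matched at this position: keep the low-level call.
            result.append(names[i])
            i += 1
    return result


trace = [
    ApiCall("CreateFileW", ("C:\\tmp\\a.bin",)),
    ApiCall("WriteFile"),
    ApiCall("CloseHandle"),
    ApiCall("Sleep", (1000,)),
    ApiCall("RegOpenKeyExW"),
    ApiCall("RegSetValueExW"),
]
behaviors = aggregate(trace)
print(behaviors)
```

Applying `aggregate` again to its own output, with rules written over behavior labels instead of API names, would give a further level of the hierarchy.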
In most previous research, such high-level behaviors need to be
defined by a knowledgeable expert in the field of malware analysis.
Our purpose is to develop and implement behavior-based malware analysis
tools that can assist analysts in their daily job, and automatically
detect and classify new behaviors, without human help, or with limited
assistance from an expert.
The idea is to exploit data mining and machine learning techniques to
extract relevant and new information from large datasets of malware samples.
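
To make the idea of mining recurring patterns concrete, here is a small Python sketch that counts API-call n-grams shared by several samples. It is an assumed, simplified stand-in for the data mining step, not the technique developed in the thesis; the function names, the `min_support` threshold, and the toy traces are hypothetical.

```python
from collections import Counter


def ngrams(seq, n):
    """Return all contiguous subsequences of length n as tuples."""
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]


def frequent_ngrams(traces, n=2, min_support=2):
    """Keep the n-grams that occur in at least `min_support` traces.
    Each n-gram is counted at most once per trace (per sample)."""
    support = Counter()
    for trace in traces:
        support.update(set(ngrams(trace, n)))
    return {gram: count for gram, count in support.items() if count >= min_support}


# Toy traces, one list of API names per (hypothetical) sample.
traces = [
    ["OpenProcess", "WriteProcessMemory", "CreateRemoteThread"],
    ["Sleep", "OpenProcess", "WriteProcessMemory"],
    ["CreateFileW", "WriteFile"],
]
patterns = frequent_ngrams(traces)
print(patterns)
```

Sequences that recur across many samples are candidates for new high-level behaviors that an analyst could then review and name.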