Survey of Machine Learning Techniques for Malware Analysis
Coping with malware is becoming increasingly challenging, given its
relentless growth in complexity and volume. One of the most common approaches
in the literature is to use machine learning techniques to automatically learn
the models and patterns behind such complexity, and to develop technologies that
keep pace with the speed at which novel malware is developed. This survey aims
at providing an overview of the way machine learning has been used so far in
the context of malware analysis. We systematize surveyed papers according to
their objectives (i.e., the expected output, what the analysis aims to achieve), what
information about malware they specifically use (i.e., the features), and what
machine learning techniques they employ (i.e., what algorithm is used to
process the input and produce the output). We also outline a number of problems
concerning the datasets used in the surveyed works, and finally introduce the
novel concept of malware analysis economics, the study of existing
tradeoffs among key metrics, such as analysis accuracy and economic cost.