Bose, S. (2020). Towards Explainability in Machine Learning for Malware Detection. Retrieved from https://purl.lib.fsu.edu/diginole/2020_Summer_Fall_Bose_fsu_0071E_16025
Malware has crippled computer systems, caused financial losses on the order of billions of dollars, and denied access to services for millions of users worldwide. Detecting malware before it can cause any damage is therefore a significant problem. A number of methods attempt to detect both pre-existing and new threats, mostly relying on signatures, byte sequences within the executable, for detection. Malware authors have tried to evade detection with varying levels of success using techniques such as advanced obfuscation and run-time packing. In recent years, machine learning techniques have been widely applied to a number of problems, including malware detection. Most machine learning systems are based on neural networks, which are not very interpretable, although that is slowly changing. To combat this problem, we have developed systems that detect malicious programs while also providing an explanation for the classification. The first system to this end is a method using topic models, a suite of statistical models that assign a set of "topics" to documents. Using topic models such as Latent Dirichlet Allocation (LDA) and Correlated Topic Models (CTM), the system is able to effectively detect malware it has not encountered before. The advantages and shortcomings of this approach are also presented. In the pursuit of more explainable AI models, a framework is presented that analyzes existing neural networks and explains their decisions; this should be beneficial in building more robust and efficient neural networks. The framework is applied to analyze MalConv and to reconcile two disparate findings about its results; it can also be used to analyze any convolutional network. Finally, to aid in explainable malware analysis, a reinforcement learning-based program analysis engine is proposed that leverages game-playing techniques to find valid paths to vulnerable code segments. Together, these methods should be beneficial in building an ecosystem of more interpretable and robust models for malware detection, and in helping analysts find new ways to protect users from new malware.
A Dissertation submitted to the Department of Computer Science in partial fulfillment of the requirements for the degree of Doctor of Philosophy.
Bibliography Note
Includes bibliographical references.
Advisory Committee
Xiuwen Liu, Professor Directing Dissertation; Ming Yu, University Representative; David Whalley, Committee Member; Zhi Wang, Committee Member.
Publisher
Florida State University
Identifier
2020_Summer_Fall_Bose_fsu_0071E_16025