Riccorl/ml-malware-classifier

Code style: black

ml-malware-classifier

Reference

Daniel Arp, Michael Spreitzenbarth, Malte Huebner, Hugo Gascon, and Konrad Rieck 
"Drebin: Efficient and Explainable Detection of Android Malware in Your Pocket", 
21st Annual Network and Distributed System Security Symposium (NDSS), February 2014
  • The original paper can be found here.
  • The original dataset can be found here.

Usage

The code is inside the code folder. Use main.py to run the script.

usage: main.py [-h] [-d DATA] [--type TYPE] [-s S] method

Android Malware Classificator

positional arguments:
  method       mnb=MultinomialNB, bnb=BernoulliNB, sgdc=SGDClassifier,
               lsvc=LinearSVC, svm=SVM, rf=RandomForest

optional arguments:
  -h, --help   show this help message and exit
  -d DATA      path to the dataset folder
  --type TYPE
  -s S         Feature subset
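
For readers who want to see how a command line like the one above fits together, here is a minimal argparse sketch that would print a similar help message. It is reconstructed from the help text only and is not the repository's actual main.py. The destination names, the None defaults, the restriction of method to the six listed abbreviations, and the example path data/small_drebin are all assumptions; the help text does not say what values --type and -s accept, so they are left as free-form strings.

```python
import argparse

# Abbreviation -> classifier name, copied from the help text above.
METHODS = {
    "mnb": "MultinomialNB",
    "bnb": "BernoulliNB",
    "sgdc": "SGDClassifier",
    "lsvc": "LinearSVC",
    "svm": "SVM",
    "rf": "RandomForest",
}


def build_parser():
    """Build a parser that mirrors the documented options (a sketch, not the original)."""
    parser = argparse.ArgumentParser(
        prog="main.py", description="Android Malware Classificator"
    )
    # Assumption: only the six listed abbreviations are accepted.
    parser.add_argument(
        "method",
        metavar="method",
        choices=sorted(METHODS),
        help=", ".join(f"{abbr}={name}" for abbr, name in METHODS.items()),
    )
    parser.add_argument("-d", dest="data", metavar="DATA", default=None,
                        help="path to the dataset folder")
    # The help text gives no description or allowed values for --type and -s.
    parser.add_argument("--type", dest="type", metavar="TYPE", default=None)
    parser.add_argument("-s", dest="s", metavar="S", default=None,
                        help="Feature subset")
    return parser


if __name__ == "__main__":
    # Hypothetical invocation: random forest on the small dataset folder.
    args = build_parser().parse_args(["rf", "-d", "data/small_drebin"])
    print(METHODS[args.method], args.data)
```

With the real script, the equivalent call would presumably pass the method abbreviation as the only positional argument, optionally followed by -d and a dataset path.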

The dataset is inside the data folder. There are two subfolders: small_drebin, which contains a very small portion of the original dataset, and medium_drebin, which contains roughly 5500 files. By default, the script uses the medium_drebin folder. To use custom data, put it inside the data folder. It should have this format:

- data
  |__ custom_data
      |__ feature_vectors
          |__ file_1
          |__ file_2
          ...
          |__ file_n
      |__ sha256_family.csv
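
The layout above can be read with a few lines of Python. The following is a minimal sketch, not the loader used by this repository. It assumes the conventions of the original Drebin dataset: each file in feature_vectors is named after the app's SHA-256 hash and lists one feature per line, and sha256_family.csv has a header row with a sha256 column naming the malicious samples. The function name load_dataset is hypothetical.

```python
import csv
from pathlib import Path


def load_dataset(root):
    """Return (samples, labels) for a folder laid out like custom_data above.

    samples is a list of feature sets, labels is a list of 0 (benign) / 1 (malware).
    """
    root = Path(root)

    # Assumption: the CSV has a header row with a "sha256" column,
    # and every hash listed there is a malware sample.
    with open(root / "sha256_family.csv", newline="") as fh:
        malware = {row["sha256"] for row in csv.DictReader(fh)}

    samples, labels = [], []
    for path in sorted((root / "feature_vectors").iterdir()):
        if not path.is_file():
            continue
        # Assumption: one feature string per line; blank lines are skipped.
        with open(path, encoding="utf-8", errors="replace") as fh:
            features = {line.strip() for line in fh if line.strip()}
        samples.append(features)
        # Assumption: the file name is the sample's SHA-256 hash.
        labels.append(1 if path.name in malware else 0)
    return samples, labels
```

Calling load_dataset on a custom_data folder and checking that both lists have the expected length is a quick way to confirm the folder matches the expected format before running the classifier.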