From 200043aec327a5e15888714aa28dcbe3b75ea6e4 Mon Sep 17 00:00:00 2001
From: Dongkwan Kim <0xdkay@gmail.com>
Date: Sat, 28 Nov 2020 00:46:48 +0900
Subject: [PATCH] Fix #1 and add more details in the readme

---
 README.md          | 69 ++++++++++++++++++++++++++++++++++++++--------
 example/example.sh |  2 +-
 requirements.txt   | 10 +++----
 3 files changed, 63 insertions(+), 18 deletions(-)

diff --git a/README.md b/README.md
index 599e29b..e642ccd 100644
--- a/README.md
+++ b/README.md
@@ -89,7 +89,7 @@ This step takes the most time.
 Please configure the `chunk_size` for parallel processing.
 
 ```bash
-$ python3 helper/do_idascript.py \
+$ python helper/do_idascript.py \
     --idapath "/home/dongkwan/.tools/ida-6.95" \
     --idc "tiknib/ida/fetch_funcdata.py" \
     --input_list "example/input_list_find.txt" \
@@ -101,32 +101,77 @@ Additionally, you can use this script to run any idascript in parallel.
 ### 2. Extract function type information for type features.
 
 ```bash
-python3 helper/extract_functype.py \
-  --source_list "example/source_list.txt" \
-  --input_list "example/input_list_find.txt" \
-  --ctags_dir "data/ctags" \
-  --threshold 1
+$ python helper/extract_functype.py \
+    --source_list "example/source_list.txt" \
+    --input_list "example/input_list_find.txt" \
+    --ctags_dir "data/ctags" \
+    --threshold 1
 ```
 
 ### 3. Extract numeric presemantic features and type features.
 
 ```bash
-python3 helper/extract_features.py \
-  --input_list "example/input_list_find.txt" \
-  --threshold 1
+$ python helper/extract_features.py \
+    --input_list "example/input_list_find.txt" \
+    --threshold 1
 ```
 
 ### 4. Evaluate target configuration
 
 ```bash
-python3 helper/test_roc.py \
-  --input_list "example/input_list_find.txt" \
-  --config "config/gnu/config_gnu_normal_all.yml"
+$ python helper/test_roc.py \
+    --input_list "example/input_list_find.txt" \
+    --config "config/gnu/config_gnu_normal_all.yml"
 ```
 
 For more details, please check `example/`.
 All configuration files for our experiments are in `config/`.
 
+# Issues
+
+### Tested environment
+We ran all our experiments on a server equipped with four Intel Xeon E7-8867v4
+2.40 GHz CPUs (total 144 cores), 896 GB DDR4 RAM, and 4 TB SSD. We setup Ubuntu
+16.04 with IDA Pro v6.95 on the server.
+
+We will make it run on IDA Pro v7.5 soon.
+
+### Tested python version
+- Python 3.8.0
+
+### Running example
+The time spent for running `example/example.sh` took as below.
+
+- Processing IDA analysis: 1384 s
+- Extracting function types: 102 s
+- Extracting features: 61 s
+- Training: 31 s
+- Testing: 0.8 s
+
+You can obtain below information after running `test_roc.py` in the example.
+Note that below is just one example.
+
+```
+Features:
+inst_num_abs_ctransfer (inter): 0.4749
+inst_num_cmp (inter): 0.5500
+inst_num_cndctransfer (inter): 0.5906
+...
+...
+...
+Avg \# of selected features: 13.0000
+Avg. TP-TN Gap: 0.3866
+Avg. TP-TN Gap of Grey: 0.4699
+Avg. ROC: 0.9424
+Std. of ROC: 0.0056
+Avg. AP: 0.9453
+Std. of AP: 0.0058
+Avg. Train time: 30.4179
+AVg. Test time: 1.4817
+Avg. # of Train Pairs: 155437
+Avg. # of Test Pairs: 17270
+```
+
 # Authors
 This project has been conducted by the below authors at KAIST.
 * [Dongkwan Kim](https://0xdkay.me/)
diff --git a/example/example.sh b/example/example.sh
index 9d4e878..0ebcae8 100755
--- a/example/example.sh
+++ b/example/example.sh
@@ -19,6 +19,6 @@ python3 helper/extract_features.py \
   --threshold 1
 
 echo "Testing features ..."
-python3 test_roc.py \
+python3 helper/test_roc.py \
   --input_list "example/input_list_find.txt" \
   --config "config/gnu/config_gnu_normal_all.yml"
diff --git a/requirements.txt b/requirements.txt
index e514627..14887ba 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -1,7 +1,7 @@
+numpy==1.19.4
+capstone==4.0.0
 coloredlogs==6.0
-capstone==4.0.0rc1
-numpy==1.11.0
-python_ctags3==1.2.4
-networkx==2.1
-scikit_learn==0.23.2
+networkx==2.5
+python_ctags3==1.5.0
 PyYAML==5.3.1
+scikit_learn==0.23.2
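
Note on the `requirements.txt` hunk: since this patch re-pins several packages, it may help to confirm that an existing environment matches the new pins before re-running the example. The following is a minimal sketch, not part of the patch or the repository. The helper names `parse_requirements` and `check_pins` are hypothetical, and it assumes Python 3.8+ (the version the README lists as tested) so that `importlib.metadata` is available.

```python
from importlib import metadata

# Pins as they stand after this patch is applied (copied from the hunk above).
REQUIREMENTS = """\
numpy==1.19.4
capstone==4.0.0
coloredlogs==6.0
networkx==2.5
python_ctags3==1.5.0
PyYAML==5.3.1
scikit_learn==0.23.2
"""


def parse_requirements(text):
    """Parse 'name==version' lines into a dict, skipping blanks and comments."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, version = line.partition("==")
        pins[name.strip()] = version.strip()
    return pins


def check_pins(pins, get_version=metadata.version):
    """Return {name: (expected, installed or None)} for every mismatch.

    `get_version` is injectable so the check can be exercised without
    touching the real environment.
    """
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    # Report each package whose installed version differs from its pin.
    for name, (expected, installed) in check_pins(parse_requirements(REQUIREMENTS)).items():
        print(f"{name}: expected {expected}, found {installed}")
```

Run as a script, it prints one line per package that is missing or whose installed version differs from the pin, and prints nothing when the environment matches. How distribution names such as `scikit_learn` are matched depends on the name normalization done by `importlib.metadata`, so a reported miss is worth double-checking with `pip show`.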