This folder contains the results of the experiments and the IPython notebooks used to extract the different metrics and generate the plots.
**Warning: the following steps will require about 13GB of free disk space.**
To download the data from Google Drive, use the `gdrive_download.py` Python3 script and follow the instructions below:
- Install the Python3 virtualenv

- Create a new virtualenv and install the required packages:
```bash
# create a new "env" environment
python3 -m venv ../env
# enter the virtual environment
source ../env/bin/activate
# install the requirements in the current environment
pip install -r ../requirements.txt
```
- Download and unzip the data in the corresponding folders:
```bash
python3 ../gdrive_download.py --results
```
The data will be unzipped in the following directories:
```
Results/data/Dataset-1
Results/data/Dataset-1-CodeCMR
Results/data/Dataset-2
Results/data/Dataset-Vulnerability
Results/data/raw_results
```
Most of the model implementations directly return the similarity between the function pairs for each dataset we tested. The CSV files with the results are saved in the corresponding dataset folder under the `data` directory.
All the CSV files use the same header:
```
idb_path_1,fva_1,idb_path_2,fva_2,sim
```
`idb_path` and `fva` are used as "primary keys" to identify a single function. The `sim` column contains the similarity (distance) value computed using the specific metric required by each approach.
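As an illustration, the results can be loaded and inspected with pandas. The following is a minimal sketch, assuming pandas is available; the CSV file name is hypothetical:
```python
import pandas as pd

# Load one of the results CSVs (hypothetical file name).
df = pd.read_csv("Results/data/Dataset-1/example_results.csv")

# Each function is identified by the pair (idb_path, fva).
key_columns = ["idb_path_1", "fva_1", "idb_path_2", "fva_2"]

# Inspect the similarity scores for the first few function pairs.
print(df[key_columns + ["sim"]].head())
```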
However, some models require an intermediate step to convert the output to this standard form. The `data/raw_results` folder includes the output from Asm2vec/Doc2vec, Catalog1, CodeCMR, and FunctionSimSearch (a generic conversion sketch follows the list below).
- Use the `Convert Asm2vec results` IPython notebook to process the Asm2vec and Doc2vec output (`data/raw_results/Asm2vec`)
- Use the `Convert Catalog1 results` IPython notebook to process the Catalog1 output (`data/raw_results/Catalog1`)
- Use the `Convert CodeCMR results` IPython notebook to process the CodeCMR output (`data/raw_results/CodeCMR`)
- Use the `Convert FunctionSimSearch results` IPython notebook to process the FunctionSimSearch output (`data/raw_results/FunctionSimSearch`).
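Regardless of the model, the conversion step boils down to emitting one CSV row per function pair with the standard header. Here is a minimal sketch, assuming the raw output has already been parsed into pairs and scores; the input data and output file name are hypothetical, and the authoritative logic lives in the notebooks above:
```python
import csv

# Hypothetical example: raw results already parsed into
# ((idb_path, fva), (idb_path, fva), score) tuples.
raw_pairs = [
    (("binaries/a.i64", "0x401000"), ("binaries/b.i64", "0x402000"), 0.87),
    (("binaries/a.i64", "0x401050"), ("binaries/b.i64", "0x403100"), 0.12),
]

# Write the rows using the header shared by all the results CSVs.
with open("converted_results.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["idb_path_1", "fva_1", "idb_path_2", "fva_2", "sim"])
    for (idb_1, fva_1), (idb_2, fva_2), sim in raw_pairs:
        writer.writerow([idb_1, fva_1, idb_2, fva_2, sim])
```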
Finally, there are three IPython notebooks to extract the metrics for all the experiments:

- `AUC and similarity plots` computes the AUC for each task and model configuration
- `MRR@10 and Recall@K` computes the MRR@10 and Recall@K metrics
- `Vulnerability task eval` generates the metrics for the Vulnerability test case.

The output is saved in the `metrics_and_plots` folder.
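To make the ranking metrics concrete, the following is a small self-contained sketch of how MRR@10 and Recall@K can be computed from the rank of the correct match for each query; the rank values are hypothetical, and the notebooks above contain the authoritative implementations:
```python
def mrr_at_10(ranks):
    """Mean Reciprocal Rank, counting only hits within the top 10.

    `ranks` holds the 1-based rank of the correct match for each query,
    or None if the match was not among the candidates.
    """
    scores = [1.0 / r if r is not None and r <= 10 else 0.0 for r in ranks]
    return sum(scores) / len(scores)


def recall_at_k(ranks, k):
    """Fraction of queries whose correct match appears in the top k."""
    hits = sum(1 for r in ranks if r is not None and r <= k)
    return hits / len(ranks)


# Hypothetical example: ranks of the true match for five queries.
ranks = [1, 3, None, 12, 2]
print(mrr_at_10(ranks))        # (1 + 1/3 + 0 + 0 + 1/2) / 5 ≈ 0.37
print(recall_at_k(ranks, 10))  # 3 of 5 queries hit in the top 10 -> 0.6
```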