
BinaryAI / CodeCMR

The BinaryAI / CodeCMR experiment consists of two steps. The first is an IDA Pro plugin that takes as input a JSON file specifying which functions to analyze and produces intermediate results in pickle format. In the second step, a neural network is trained to recognise similar functions; its output is a vector representation for each function. We release only Part 1, since the authors of the CodeCMR paper trained and tested the model for us (more information in the [technical report](../../Additional technical information.pdf)).
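
The second step is not part of this release. To illustrate what a per-function vector representation makes possible, the sketch below compares embeddings with cosine similarity. This metric, the function name `cosine_similarity`, and the toy 4-dimensional vectors are assumptions for illustration only. They are not taken from the CodeCMR model.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two function embeddings (illustrative only)."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("zero-length embedding")
    return dot / (norm_a * norm_b)


# Hypothetical embeddings: a query function and two candidate functions.
query = [0.9, 0.1, 0.0, 0.4]
candidate_same = [0.8, 0.2, 0.1, 0.5]
candidate_other = [-0.3, 0.9, 0.7, 0.0]

scores = {
    "candidate_same": cosine_similarity(query, candidate_same),
    "candidate_other": cosine_similarity(query, candidate_other),
}
# The candidate with the highest score is ranked as the most similar function.
best = max(scores, key=scores.get)
print(best)
```

Ranking candidates by score, rather than applying a fixed threshold, is one common way to use such embeddings for function search.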

Part 1

Before running the IDA plugin, follow the list of requirements in the IDA_scripts README.

  • Input: the JSON file with the selected functions (-j) and the output directory (-o).
  • Output: one pickle file per IDB. The file contains a serialized version of a NetworkX graph (with the extracted features) for each function analyzed in the IDB.

Notes:

  • The plugin requires the IDA Pro decompiler license.
  • IDA Pro requires the 32-bit version of IDA to decompile 32-bit binaries and the 64-bit version to decompile 64-bit binaries. Consequently, 32-bit binaries must be exported in .idb format, while 64-bit binaries require the .i64 format.
  • The IDB paths in the input JSON must be relative to the binary_function_similarity directory. The Python 3 script converts each relative path into a full path so that IDA Pro can load the IDB correctly.
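
The two notes above can be sketched as follows: resolve a relative IDB path against the repository root, then pick the IDA executable that matches the database format. The installation paths `IDA32_PATH` and `IDA64_PATH` and both function names are hypothetical. They illustrate the idea and are not taken from cli_codeCMR.py.

```python
import os

# Hypothetical installation paths; adjust them to your local IDA Pro setup.
IDA32_PATH = "/opt/idapro/idat"
IDA64_PATH = "/opt/idapro/idat64"


def resolve_idb_path(repo_root: str, relative_idb: str) -> str:
    """Turn a path relative to binary_function_similarity into an absolute path."""
    return os.path.abspath(os.path.join(repo_root, relative_idb))


def ida_for_idb(idb_path: str) -> str:
    """Pick the IDA executable matching the database format.

    .idb databases (32-bit binaries) need 32-bit IDA;
    .i64 databases (64-bit binaries) need 64-bit IDA.
    """
    ext = os.path.splitext(idb_path)[1].lower()
    if ext == ".i64":
        return IDA64_PATH
    if ext == ".idb":
        return IDA32_PATH
    raise ValueError(f"unsupported database extension: {ext!r}")
```

Keying the choice on the file extension follows directly from the note that 32-bit binaries are stored as .idb and 64-bit binaries as .i64.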

Example: run the plugin on the functions selected for the Dataset-1-CodeCMR test (requires the .i64 IDBs for 64-bit binaries and the .idb IDBs for 32-bit binaries):

cd IDA_CodeCMR
python3 cli_codeCMR.py -j ../../../DBs/Dataset-1-CodeCMR/features/selected_Dataset-1-CodeCMR.json -o Dataset-1-CodeCMR
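
To run a plugin on each IDB without the GUI, IDA Pro's command-line switches `-A` (autonomous mode), `-L<file>` (log file) and `-S<script>` (run a script once the database is loaded) can be combined as in the sketch below. This is an assumption about how such a batch run can be driven, not the actual logic of cli_codeCMR.py. The helper names and the environment variables `CODECMR_JSON` and `CODECMR_OUTPUT` used to pass the inputs to the plugin are hypothetical.

```python
import os
import subprocess
from typing import List


def build_ida_command(ida_path: str, plugin_script: str, idb_path: str,
                      log_path: str) -> List[str]:
    """Command line for a headless IDA run that executes plugin_script on idb_path."""
    return [
        ida_path,
        "-A",                  # autonomous mode: no dialog boxes
        f"-L{log_path}",       # write IDA's log to log_path
        f"-S{plugin_script}",  # run the plugin script after the database is opened
        idb_path,
    ]


def run_plugin(cmd: List[str], json_path: str, output_dir: str) -> int:
    """Run IDA, passing the inputs through hypothetical environment variables."""
    env = dict(os.environ)
    env["CODECMR_JSON"] = json_path     # hypothetical variable name
    env["CODECMR_OUTPUT"] = output_dir  # hypothetical variable name
    return subprocess.run(cmd, env=env).returncode
```

Passing the command as a list avoids shell quoting issues. A script path containing spaces may still need quoting inside the `-S` argument, depending on the IDA version.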

Run the unit tests:

python3 -m unittest test_codeCMR.py

Copyright information about the BinaryAI / CodeCMR plugin

IDA_CodeCMR.py includes code from https://github.com/binaryai/sdk/, which is licensed under GPL-3.0.