Decompiler outputs
!!! Some features described here were not yet merged to master !!!
This page describes various RetDec decompilation outputs.
The default decompilation (without any special options listed below) of an input file input.exe
produces the following output files:
- `input.exe.dsm`: Disassembly output in our custom format. Instruction mnemonics are in the default Capstone format.
- `input.exe.bc`: The final product of the Core decompilation part in LLVM bitcode format.
- `input.exe.ll`: Human-readable disassembly of LLVM bitcode file in LLVM IR format.
- `input.exe.config.json`: Metadata produced in the decompilation process.
- `input.exe.c`: The decompiled C code. The main output.
As you can see, the output file names are generated by appending the appropriate suffix to the input file name: `<input_file>.{dsm, bc, ll, config.json, c}`.
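The naming rule above can be sketched in a few lines of Python. This is an illustration only: the helper name `default_output_names` is hypothetical and not part of RetDec; it simply appends the listed suffixes to the input file name.

```python
from typing import List

# Suffixes of the default output files, as listed above.
SUFFIXES = ("dsm", "bc", "ll", "config.json", "c")


def default_output_names(input_file: str) -> List[str]:
    """Hypothetical helper (not part of RetDec): default output file names."""
    return [f"{input_file}.{suffix}" for suffix in SUFFIXES]


if __name__ == "__main__":
    for name in default_output_names("input.exe"):
        print(name)
```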
The following decompilation script (`retdec-decompiler.py`) options control the output generation process:
- `-o FILE`, `--output FILE`: If specified, the main decompilation output is stored in `FILE` instead of `<input_file>.c`. Furthermore, `FILE` (without a potential suffix) is used as a base name to generate the other output file names: `<FILE_w/o_suffix>.{dsm, bc, ll, config.json}`
- `-l LANGUAGE`, `--target-language LANGUAGE`: Sets the target high-level language to generate. Currently, only the C language is supported.
  Note: There used to be a `py` option generating a pseudo Python output. It was removed because it was scarcely tested, infrequently used, and it made adding new features harder and slower.
- `-f OUTPUT_FORMAT`, `--output-format OUTPUT_FORMAT`: The default `plain` option generates the main decompilation output directly as high-level-language source code into an associated text file (e.g. C source code into a `*.c` file). The `json` and `json-human` options generate the output source code as a stream of lexer tokens, plus additional information. See the section below for a detailed format description. With these two options, the suffix of the main decompilation output file is changed to `.json`.
- `--cleanup`: Removes temporary files created during decompilation. Only the main decompilation output file and the disassembly file are preserved.
- `--stop-after`: Stops the decompilation process after the given tool. Consequently, only the output files produced up to that point are generated.
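As a rough illustration of how these options combine, the following Python sketch assembles a command line for the decompilation script. It uses only the options listed above. Assumptions: the input file is passed as a positional argument, the script is reachable as `retdec-decompiler.py`, and `build_command` is a hypothetical helper name introduced here. The sketch only builds the argument list; it does not run anything.

```python
from typing import List, Optional


def build_command(input_file: str,
                  output: Optional[str] = None,
                  output_format: str = "plain",
                  cleanup: bool = False) -> List[str]:
    """Hypothetical helper: assemble an argument list for the script."""
    if output_format not in ("plain", "json", "json-human"):
        raise ValueError(f"unsupported output format: {output_format}")

    # Assumption: the script is invoked by name and takes the input file
    # as a positional argument.
    cmd = ["retdec-decompiler.py", input_file]
    if output is not None:
        cmd += ["-o", output]
    if output_format != "plain":  # `plain` is the documented default
        cmd += ["-f", output_format]
    if cleanup:
        cmd.append("--cleanup")
    return cmd


if __name__ == "__main__":
    print(" ".join(build_command("input.exe", output="out.c", cleanup=True)))
```

The resulting list could then be handed to something like `subprocess.run`, if the script is installed and on the search path.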
Parsing high-level-language source code is not trivial. However, third-party reversing tools might need to do just that in order to make use of RetDec output. Furthermore, additional meta-information may be required to enhance user experience or automated analyses, and such information is hard to convey in traditional high-level-language source code. Usage examples:
- Syntax highlighting in the RetDec IDA plugin.
- Relations between decompiled output lines/elements and the original disassembly instructions in the RetDec IDA plugin and the RetDec Radare2 plugin.
In order to make these applications possible, RetDec offers an option (see the previous section) to generate its output as a sequence of annotated lexer tokens in JSON format. Two JSON flavours can be generated:
- Human-readable JSON containing proper indentation (option `json-human`).
- Machine-readable JSON without any indentation (option `json`).
So that a single parser implementation can handle both flavours, they both use the same keys and values.
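Because the two flavours differ only in indentation, a standard JSON parser should read them identically. A minimal Python sketch, assuming a tiny hand-written token stream that uses the key names from the schema on this page (the token contents and the `<kind>` placeholder are made up for illustration and are not real RetDec output):

```python
import json

# Made-up sample data for illustration only; not real RetDec output.
sample = {
    "language": "C",
    "tokens": [
        {"addr": "0x8048577", "kind": "<kind>", "val": "return"},
    ],
}

compact = json.dumps(sample, separators=(",", ":"))  # stands in for `json`
pretty = json.dumps(sample, indent=4)                # stands in for `json-human`

# The same parser call handles both flavours and yields the same data.
parsed_compact = json.loads(compact)
parsed_pretty = json.loads(pretty)
print(parsed_compact == parsed_pretty)
```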
The current JSON schema is the following:
{
    "language" : "<language_ID>",
    "tokens" :
    [
        {
            "addr" : "<address_format>",
            "kind" : "<kind_values>",
            "val" : "<value>"
        },
        // ...
    ]
}
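A consumer of this schema might walk the `tokens` array and convert each `addr` string into an integer. The sketch below is one possible way to do that in Python; the function name `iter_tokens` and the sample document are hypothetical, and every key is read as optional because this sketch does not assume that each token object carries all three entries.

```python
import json
from typing import Iterator, Optional, Tuple

Token = Tuple[Optional[int], Optional[str], Optional[str]]


def iter_tokens(document: str) -> Iterator[Token]:
    """Hypothetical helper: yield (address, kind, value) per token object."""
    data = json.loads(document)
    for token in data.get("tokens", []):
        addr = token.get("addr")
        # Addresses are prefixed hexadecimal strings; int() with base 16
        # accepts the leading "0x".
        address = int(addr, 16) if addr else None
        yield address, token.get("kind"), token.get("val")


if __name__ == "__main__":
    # Made-up sample; "<kind>" is a placeholder, not a real kind value.
    sample = ('{"language": "C", "tokens": '
              '[{"addr": "0x8048577", "kind": "<kind>", "val": "main"}]}')
    for address, kind, value in iter_tokens(sample):
        print(hex(address), kind, value)
```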
- All the values are of the string data type.
- The `language` key identifies the high-level language being tokenized. Possible `<language_ID>` values:
Value | Description |
---|---|
C | C language |
- Source code is serialized as an array of token objects in a `tokens` array.
- Each token object contains the following entries:
  - Address with key `addr` and a value in prefixed hexadecimal format (e.g. `0x8048577`). More on addresses below.
  - Value with key `val`, which holds the actual token string as it would appear in the source code.
  - Token kind with key `kind` and the following values:
Value | Description |
---|---|