
Decompiler outputs

Peter Matula edited this page Aug 31, 2019 · 10 revisions

!!! Some features described here were not yet merged to master !!!

This page describes various RetDec decompilation outputs.

Generated files

The default decompilation (without any special options listed below) of an input file input.exe produces the following output files:

  • input.exe.dsm: Disassembly output in our custom format. Instruction mnemonics are in the default Capstone format.
  • input.exe.bc: The final product of the Core decompilation part in LLVM bitcode format.
  • input.exe.ll: Human-readable disassembly of the LLVM bitcode file, in LLVM IR format.
  • input.exe.config.json: Metadata produced in the decompilation process.
  • input.exe.c: The decompiled C code. The main output.

The output file names are generated by appending the appropriate suffix to the input file name: <input_file>.{dsm, bc, ll, config.json, c}
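The naming rule above can be sketched in a few lines of Python. This is only an illustration of the rule as described on this page, not code taken from RetDec; the names DEFAULT_SUFFIXES and default_output_files are hypothetical.

```python
# Suffixes listed on this page for a default decompilation.
# (Hypothetical helper for illustration; not part of RetDec.)
DEFAULT_SUFFIXES = ("dsm", "bc", "ll", "config.json", "c")


def default_output_files(input_file):
    """Return the output file names described for a default run."""
    # Each suffix is appended to the full input name, extension included.
    return [f"{input_file}.{suffix}" for suffix in DEFAULT_SUFFIXES]


if __name__ == "__main__":
    for path in default_output_files("input.exe"):
        print(path)
```

Note that the input file's own extension is kept, so input.exe yields input.exe.c rather than input.c.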

Output generation options

The following decompilation script (retdec-decompiler.py) options control the output generation process:

  • -o FILE, --output FILE
    If specified, the main decompilation output is stored in FILE instead of <input_file>.c. In addition, FILE (with any suffix removed) is used as the base name for the other output files: <FILE_w/o_suffix>.{dsm, bc, ll, config.json}
  • -l LANGUAGE, --target-language LANGUAGE
    Sets the target high-level language to generate. Currently, only C language is supported.
    Note: There used to be a py option that generated pseudo-Python output. It was removed because it was rarely tested, seldom used, and made adding new features harder and slower.
  • -f OUTPUT_FORMAT, --output-format OUTPUT_FORMAT
    Sets the format of the main decompilation output. The default plain option writes the output directly as high-level-language source code into a text file (e.g. C source code into a *.c file). The json and json-human options emit the source code as a stream of lexer tokens plus additional information, and change the suffix of the main decompilation output file to .json. See the JSON output file format section below for a detailed description.
  • --cleanup
    Removes temporary files created during decompilation. Only the main decompilation output file and the disassembly file are preserved.
  • --stop-after
    Stops the decompilation process after the given tool. Consequently, only the output files produced up to that point are generated.
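As a rough illustration of how -o FILE affects the other output names, the sketch below derives them from FILE. It assumes that "without a potential suffix" means dropping the last extension, which is what Python's os.path.splitext does; RetDec's actual rule may differ. The names SIDE_SUFFIXES and output_files_with_o are hypothetical.

```python
import os.path

# Suffixes of the non-main outputs, as listed for the -o option on this page.
SIDE_SUFFIXES = ("dsm", "bc", "ll", "config.json")


def output_files_with_o(output_file):
    """Return (main_output, other_outputs) for a hypothetical `-o output_file`."""
    # Assumption: only the last extension counts as the suffix to strip.
    base, _suffix = os.path.splitext(output_file)
    others = [f"{base}.{suffix}" for suffix in SIDE_SUFFIXES]
    return output_file, others


if __name__ == "__main__":
    main, others = output_files_with_o("out/result.c")
    print(main)
    for path in others:
        print(path)
```

If FILE has no suffix at all, the base name is FILE itself, so -o result would give result.dsm, result.bc, and so on under the same assumption.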

JSON output file format

Parsing high-level-language source code is not trivial. However, third-party reverse-engineering tools might need to do just that in order to make use of RetDec output. Furthermore, additional meta-information may be required to enhance the user experience or automated analyses: information that is hard to convey in traditional high-level-language source code. Usage examples:

To make these applications possible, RetDec offers an option (see the previous section) to generate its output as a sequence of annotated lexer tokens in JSON format. Two JSON flavours can be generated:

  • Human-readable JSON containing proper indentation (option json-human).
  • Machine-readable JSON without any indentation (option json).

Both flavours use the same keys and values, so a single parser implementation can handle either of them.
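A minimal sketch of what this means in practice: any standard JSON parser ignores indentation, so the same loading code should work for both flavours. The two sample strings are made up for illustration and follow the token layout given in the schema on this page; "example_kind" is a placeholder, not a real RetDec token kind, and load_tokens is a hypothetical helper.

```python
import json

# Hypothetical compact sample, standing in for output of the json option.
compact = (
    '{"language":"C","tokens":'
    '[{"addr":"0x8048577","kind":"example_kind","val":"return"}]}'
)
# The same data with indentation, standing in for the json-human option.
human = json.dumps(json.loads(compact), indent=4)


def load_tokens(text):
    """Parse either flavour and return (language, list_of_token_dicts)."""
    doc = json.loads(text)
    return doc["language"], doc["tokens"]


if __name__ == "__main__":
    # Both flavours decode to identical Python objects.
    print(load_tokens(compact) == load_tokens(human))
```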

The current JSON schema is the following:

{
    "language" : "<language_ID>",
    "tokens" :
    [
        {
            "addr" : "<address_format>",
            "kind" : "<kind_values>",
            "val" : "<value>"
        },
        // ...
    ]
}
  • All the values are of the string data type.
  • The language key identifies the high-level language being tokenized. Possible <language_ID> values:
    Value   Description
    C       C language
  • The source code is serialized as an array of token objects stored under the tokens key.
  • Each token object contains the following entries:
    • Address with key addr and value in prefixed hexadecimal format (e.g. 0x8048577). More on addresses below.
    • Value val, which holds the actual token string as it would appear in the source code.
    • Token kind with key kind and the following values:
Value Description
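To show how a consumer might use the token entries described above, here is a hedged sketch that rebuilds the source text and indexes token strings by address. It makes two assumptions not confirmed by this page: that every token object carries both addr and val, and that whitespace is itself emitted as tokens, so that joining the val strings reproduces the source. The function names and the sample document are hypothetical, and "example_kind" is a placeholder kind.

```python
from collections import defaultdict


def source_text(doc):
    """Concatenate token values; assumes whitespace is emitted as tokens."""
    return "".join(tok["val"] for tok in doc["tokens"])


def tokens_by_address(doc):
    """Map each address (as an int) to the token strings tagged with it."""
    index = defaultdict(list)
    for tok in doc["tokens"]:
        # All values are strings, so the prefixed hex address is converted here.
        index[int(tok["addr"], 16)].append(tok["val"])
    return dict(index)


if __name__ == "__main__":
    # Hypothetical sample document following the schema on this page.
    sample = {
        "language": "C",
        "tokens": [
            {"addr": "0x8048577", "kind": "example_kind", "val": "return"},
            {"addr": "0x8048577", "kind": "example_kind", "val": " "},
            {"addr": "0x8048577", "kind": "example_kind", "val": "0"},
            {"addr": "0x804857c", "kind": "example_kind", "val": ";"},
        ],
    }
    print(source_text(sample))
    print(tokens_by_address(sample))
```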