YARA Signator # User Manual for Version 0.3.1

fxb-cocacoding edited this page Apr 24, 2020 · 3 revisions

Getting Started:

This section covers everything needed to build a working copy of YARA-Signator and its dependencies.

Installation guidelines:

You need to build the following software:

Java:

oracle-jdk-bin-1.8

This is the version we tested, but we try to support all major Java versions and vendors. Feel free to use IBM, Oracle, OpenJDK or IcedTea JDK in any version >= 8, and open a bug report if a recent version is not supported. Java support issues will be handled with priority!

We recommend installing Java via the package manager of your distribution.

Maven:

To build this project you need Maven. Install it via the package manager of your distribution.

YARA:

yara and yarac, version 3.8.0

We plan to support more recent versions, but a fix would currently require many changes to the code base. We will fix this when we have time for it.

https://virustotal.github.io/yara/

Capstone Daemon:

capstone_daemon, version 0.3.0

The exact version should not be a problem here. Feel free to open bug reports either here or, if the problem is specific to capstone_daemon, in that project.

See the installation details here: https://github.com/fxb-cocacoding/capstone_server

PostgreSQL:

postgres-11

You need a working instance of PostgreSQL, at least version 11. Version 11 is tested, and an example configuration can be found in the section PostgreSQL Configuration.

We recommend installing PostgreSQL via the package manager of your distribution.
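
Before starting the builds described next, it can help to confirm that the basic command-line tools are reachable. The following is a minimal sketch, assuming that `java`, `mvn` and `git` are expected on your PATH after a package-manager install; the function name `missing_tools` is made up for this example. It does not check versions, and it does not cover the self-compiled YARA binaries, which are referenced by path in the configuration file.

```python
import shutil

# Executable names assumed to be on PATH after a package-manager install.
REQUIRED_TOOLS = ("java", "mvn", "git")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]
```

Calling `missing_tools()` returns an empty list when all three tools are found, otherwise the names that still need to be installed.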

java2yara:

java2yara 0.3.0

You need a copy of the library java2yara to build YARA-Signator.

git clone https://github.com/fxb-cocacoding/java2yara.git
cd java2yara
mvn package
mvn install:install-file -Dfile=target/java2yara-0.3.0-SNAPSHOT.jar -DpomFile=pom.xml

See more information on the page of the project:

https://github.com/fxb-cocacoding/java2yara.git

smda-reader:

smda-reader 0.3.0

You need a copy of the library smda-reader to build YARA-Signator.

git clone https://github.com/fxb-cocacoding/smda-reader.git
cd smda-reader
mvn package
mvn install:install-file -Dfile=target/smda-reader-0.3.0-SNAPSHOT.jar -DpomFile=pom.xml

See more information on the page of the project:

https://github.com/fxb-cocacoding/smda-reader.git

YARA-Signator:

yara-signator 0.3.1

Then you can finally build YARA-Signator:

git clone https://github.com/fxb-cocacoding/yara-signator.git
cd yara-signator
mvn package

The software is now usable and the binary package is located in the folder target.

No installation à la make install is necessary. You can keep the builds in any folder, because you provide the paths to the executables in the YARA-Signator configuration file.

First steps:

Check this section to see how to get your first YARA-Rules!

Input data:

You need proper input reports. Currently we only support smda reports. The section How to generate Disassembly Reports explains how to get properly formatted input reports.

When you have your stack of input reports, make sure that all of them are located in one folder. Again, it is very important that every report has a correct malpedia_filepath and family entry containing the name in a format like win.citadel. If you are not sure whether your reports are formatted correctly, check the example reports in the unit test resource directory in the smda-report repository.

Short example for a properly generated SMDA report from Malpedia with correct metadata:

{
 "architecture": "intel",
 "base_addr": 4194304,
 "bitness": 32,
 "buffer_size": 90112,
 "disassembly_errors": {},
 "execution_time": 1.09353,
 "filename": "",
 "metadata": {
  "family": "win.silence",
  "malpedia_filepath": "win.silence/2017-10-12-downloader/6ba9118ba1ab2aec60f77b8728e8f365c55ad5bfa4f7400602d24a01dd013e33_dump7_0x00400000",
  "message": "Analysis finished regularly."
 },
 "sha256": "61f92f6e0b9d505e50c0b5a4dfc628388a3cf9c6db009789aa0de3bf609a53dc",
 "smda_version": "1.0.3",
 "status": "ok",

[...]

The folder full of smda reports may contain subdirectories, but be sure to place only valid smda reports there and no other files (check for hidden files).
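
The requirements above can be checked before a run. The following is a minimal sketch, assuming that each report is a single JSON document with a `metadata` object as in the example; the function name `check_reports` and the wording of the problem messages are made up for this example. It walks the folder recursively, including hidden files, and reports every file that is not JSON or that lacks `family` or `malpedia_filepath`. It does not validate the rest of the smda format.

```python
import json
from pathlib import Path


def check_reports(root):
    """Walk `root` recursively and return a list of (path, problem) pairs."""
    problems = []
    # rglob("*") also yields hidden files, which must not be in the folder.
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        try:
            report = json.loads(path.read_text(encoding="utf-8"))
        except (UnicodeDecodeError, json.JSONDecodeError):
            problems.append((str(path), "not a JSON file"))
            continue
        metadata = report.get("metadata") if isinstance(report, dict) else None
        if not isinstance(metadata, dict):
            problems.append((str(path), "no metadata object"))
            continue
        for key in ("family", "malpedia_filepath"):
            if not metadata.get(key):
                problems.append((str(path), f"missing metadata.{key}"))
    return problems
```

An empty result from `check_reports("/path/to/smda_report_output")` means that no file was flagged by these checks.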

Feel free to ask for help or open a ticket if the input files are not readable. If the ticket turns out to be useless, we can still just remove it. The process is somewhat tricky, and we are working on making it more user-friendly.

Configuration:

See this example configuration file to get a first impression of how to use YARA-Signator. To see all possible settings, check out Configuration of YARA-Signator. The configuration file has to be placed in your home directory at ~/.yarasignator.conf. The name cannot be changed.

{
  "smda_path": "/home/fxb/mount/cruzialpostgres/datastore/smda_report_output/",
  "malpedia_path": "/home/fxb/mount/cruzialpostgres/datastore/malpedia/",
  "output_path": "/home/fxb/mount/cruzialpostgres/datastore/yara-output/",
  "yaraBinary": "/home/fxb/git/yara-3.8.0/yara",
  "yaracBinary": "/home/fxb/git/yara-3.8.0/yarac",
  "malpediaEvalScript": "/home/fxb/codingspace/uni/bachelorthesis/yara-signator/src/main/python/malpedia_evaluation.py",
  "malpediaEvalScriptOutput": "/tmp/95268496.json",
  "reduceInputForDebugging": false,
  "resumeFolder": "",

  "db_connection_string": "jdbc:postgresql://127.0.0.1/",
  "db_user": "postgres",
  "db_password": "",
  "db_name": "release_0_3_1",

  "skipSMDAInsertions": false,
  "skipUniqueNgramTableCreation": false,
  "skipYaraRuleGeneration": false,
  "skipRuleValidation": false,
  "skipNextGen": false,

  "insertion_threads": 16,
  "rulebuilder_threads": 8,

  "shuffle_seed": 12345678,
  "minInstructions": 100,
  "batchSize": 5000,
  "instructionLimitPerFamily": 15000000,
  "ng_recursion_limit": 1,

  "reportStatistics": true,
  "reportFileName": "report.csv",
  "duplicatesInsideSamplesEnabled": false,
  "permitOverlappingNgrams": true,
  "wildcardConfigEnabled": true,
  "rankingOptimizerEnabled": true,
  "scoreCommentEnabled": true,
  "prettifyEnabled": true,

  "wildcardConfig": [
    {
      "wildcardOperator": "callsandjumps"
    },
    { 
      "wildcardOperator": "datarefs"
    },
    { 
      "wildcardOperator": "binvalue"
    }
  ],

  "rankingConfig": [
    {
      "ranker": "rankPerNgramScore",
      "limit": 5000
    }
  ],

  "nextGenConfig": [
    { 
      "rankingConfig": [
        {
          "ranker": "rankPerNgramScore",
          "limit": 5000
        }
      ],
      "nextGenOperator": "CandidateOne",
      "rounds": 1,
      "permitOverlappingNgrams": false,
      "yara_condition": "7 of them",
      "yara_condition_limit": 7,
      "nextGenBreakout": {
        "score": "f_score",
        "score_limit": 0.9,
        "FPs_allowed": true
      }
    },
    { 
      "rankingConfig": [
        {
          "ranker": "rankPerNgramScore",
          "limit": 5000
        }
      ],
      "nextGenOperator": "ParseMalpediaEval",
      "rounds": 10,
      "permitOverlappingNgrams": false,
      "yara_condition": "1 of them",
      "yara_condition_limit": 7,
      "nextGenBreakout": {
        "score": "f_score",
        "score_limit": 0.9,
        "FPs_allowed": true
      }
    },
    { 
      "rankingConfig": [
        {
          "ranker": "rankPerNgramScore",
          "limit": 5000
        }
      ],
      "nextGenOperator": "CandidateOne",
      "rounds": 2,
      "permitOverlappingNgrams": false,
      "yara_condition": "7 of them",
      "yara_condition_limit": 7,
      "nextGenBreakout": {
        "score": "f_score",
        "score_limit": 0.9,
        "FPs_allowed": true
      }
    }
  ],

  "n": [
    4,
    5,
    6,
    7
  ]

}

The following settings need to be customized to fit your installation and environment:

"db_connection_string": "jdbc:postgresql://127.0.0.1/",
"db_user": "postgres",
"db_password": "",
"db_name": "release_0_3_1",

These are the connection details for your PostgreSQL installation. By default you should have a postgres user without a password. If you want to increase security, use a password-protected user. The database privileges must be wide enough to create databases, tables, partitions and indexes, and to edit, drop and delete them. That user does not create any new users, so you should stay away from using default root users etc.

VERY IMPORTANT:

  • The database name is important. If the database already exists, it will be dropped (this means your data is gone) within the first 30 seconds of the program execution.
  • Do not use a production DBMS server; it will become unresponsive.
  • Use sufficiently powerful hardware and a good configuration.
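
Because an existing database with the configured name is dropped, it may be worth checking the name before the first run. The following is a minimal sketch, assuming that you obtain the list of databases yourself with `psql -lqt`, which prints one `|`-separated row per database with the name in the first column. The function names `parse_psql_list` and `will_be_dropped` are made up for this example and are not part of YARA-Signator.

```python
def parse_psql_list(output):
    """Extract database names from `psql -lqt` output (first |-separated column)."""
    names = set()
    for line in output.splitlines():
        name = line.split("|", 1)[0].strip()
        # Continuation rows (e.g. wrapped access privileges) have an empty first column.
        if name:
            names.add(name)
    return names


def will_be_dropped(config, existing_databases):
    """Return True if the configured db_name matches an already existing database."""
    return config["db_name"] in existing_databases
```

The output could be captured with, for example, `subprocess.run(["psql", "-U", "postgres", "-lqt"], capture_output=True, text=True, check=True).stdout`, adjusted to your own connection details.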

"smda_path": "/home/fxb/mount/cruzialpostgres/datastore/smda_report_output/"

This must point to the directory containing all reports you want to process.

"malpedia_path": "/home/fxb/mount/cruzialpostgres/datastore/malpedia/"

This points to Malpedia. To get access, contact @push_pnx. Clone the complete Malpedia repository and add this folder here. The Malpedia folder has to be called malpedia/ (important!). It is the folder that contains the hidden .git folder. If this is not configured correctly, you may screw up the evaluation steps (leading to bad signatures).

"output_path": "/home/fxb/mount/cruzialpostgres/datastore/yara-output/"

This is the folder where the YARA-Rules are stored and regenerated during the run.

"yaraBinary": "/home/fxb/git/yara-3.8.0/yara"
"yaracBinary": "/home/fxb/git/yara-3.8.0/yarac"

Both need to be changed to your local, self-compiled (and outdated..) YARA version.

"malpediaEvalScript": "/home/fxb/codingspace/uni/bachelorthesis/yara-signator/src/main/python/malpedia_evaluation.py"

This is the Malpedia evaluation script. It is part of the repository you cloned to build YARA-Signator and is located at src/main/python/malpedia_evaluation.py.
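
The path settings described so far can be checked with a few lines of code before starting a run. The following is a minimal sketch, assuming the configuration is plain JSON as in the example; the function names `load_config` and `check_paths` and the message texts are made up for this example. It verifies that the two input directories and the three files exist, and that the Malpedia folder is called malpedia and contains a .git folder. `output_path` is deliberately left out, since this page does not say whether the folder has to exist beforehand.

```python
import json
import os
from pathlib import Path

DIR_KEYS = ("smda_path", "malpedia_path")
FILE_KEYS = ("yaraBinary", "yaracBinary", "malpediaEvalScript")


def load_config(path="~/.yarasignator.conf"):
    """Read the JSON configuration from its fixed location in the home directory."""
    with open(os.path.expanduser(path), encoding="utf-8") as handle:
        return json.load(handle)


def check_paths(config):
    """Return a list of human-readable problems found in the path settings."""
    problems = []
    for key in DIR_KEYS:
        value = config.get(key)
        if not value or not Path(value).is_dir():
            problems.append(f"{key}: not an existing directory")
    for key in FILE_KEYS:
        value = config.get(key)
        if not value or not Path(value).is_file():
            problems.append(f"{key}: not an existing file")
    malpedia = config.get("malpedia_path")
    if malpedia:
        root = Path(malpedia)
        if root.name != "malpedia":
            problems.append("malpedia_path: folder must be called malpedia/")
        if not (root / ".git").exists():
            problems.append("malpedia_path: no .git folder inside")
    return problems
```

`check_paths(load_config())` returns an empty list if none of these checks fail.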

"malpediaEvalScriptOutput": "/tmp/95268496.json"

It is not necessary to change this default value, but the file is overwritten multiple times. Be sure to use a file name that is not in use yet, otherwise the existing file will be erased.
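
If you prefer not to pick a file name by hand, one option is to let the operating system create an unused one. The following is a minimal sketch using Python's `tempfile` module; the function name `fresh_eval_output_path` and the `malpedia_eval_` prefix are made up for this example. It assumes that an empty, already existing file at that path is acceptable, since the file is overwritten during the run anyway.

```python
import os
import tempfile


def fresh_eval_output_path(directory=None):
    """Create an empty, uniquely named .json file and return its path."""
    fd, path = tempfile.mkstemp(prefix="malpedia_eval_", suffix=".json", dir=directory)
    os.close(fd)  # only the name is needed, not the open file descriptor
    return path
```

The returned path can then be used as the value of `malpediaEvalScriptOutput` in the configuration file.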

"reduceInputForDebugging": false

This is a developer feature that reduces the number of input files (currently to 350). It is useful if you want to run against a large corpus but need to inspect why the program crashed at a certain point.

"resumeFolder": ""

Useful if the program crashes after one of the following steps.

"skipSMDAInsertions": false,
"skipUniqueNgramTableCreation": false,
"skipYaraRuleGeneration": false,
"skipRuleValidation": false,
"skipNextGen": false,

These steps are the core components of our approach. If you want to resume after a crash, e.g. wrong yarac path etc., then you can change the respective value from false to true.
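
Flipping the flags for a resume can be sketched as follows. This is only an illustration: it assumes that the steps run in the order in which the flags are listed above, which this page does not state explicitly, and it does not touch `resumeFolder`. The function name `skip_completed_steps` is made up for this example.

```python
# Order as listed in the configuration; assumed to match the execution order.
STEP_FLAGS = (
    "skipSMDAInsertions",
    "skipUniqueNgramTableCreation",
    "skipYaraRuleGeneration",
    "skipRuleValidation",
    "skipNextGen",
)


def skip_completed_steps(config, completed):
    """Return a copy of `config` with the first `completed` steps marked as skipped."""
    if not 0 <= completed <= len(STEP_FLAGS):
        raise ValueError(f"completed must be between 0 and {len(STEP_FLAGS)}")
    updated = dict(config)
    for index, flag in enumerate(STEP_FLAGS):
        updated[flag] = index < completed
    return updated
```

For example, `skip_completed_steps(config, 2)` sets `skipSMDAInsertions` and `skipUniqueNgramTableCreation` to true and leaves the remaining three flags at false.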

"insertion_threads": 16
"rulebuilder_threads": 8

These values affect the runtime performance. A recommended value for insertion_threads is the number of CPU threads * 2 (although more than 64 is unlikely to give a further speedup, since disk I/O might become the bottleneck at a certain point).

The number of rulebuilder_threads should be the number of cpu cores. Be sure to use the same number of cores when running capstone_server as they will communicate.
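
The two rules of thumb can be written down as a small helper. The following is a minimal sketch; the function name `suggest_threads` is made up for this example, and treating 64 as a hard upper bound is an interpretation of the remark about disk I/O, not a documented limit. Python's standard library only reports logical CPUs, so the number of physical cores has to be passed in explicitly if it differs.

```python
import os


def suggest_threads(logical_cpus=None, physical_cores=None, insertion_cap=64):
    """Suggest thread counts following the rules of thumb from this section."""
    logical = logical_cpus or os.cpu_count() or 1
    # Without an explicit core count, fall back to the number of logical CPUs.
    cores = physical_cores or logical
    return {
        "insertion_threads": min(2 * logical, insertion_cap),
        "rulebuilder_threads": cores,
    }
```

On a machine with 8 CPU threads and 4 cores, `suggest_threads(8, 4)` gives 16 insertion threads and 4 rulebuilder threads.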

You can choose several rankers by writing just the name of the ranker as defined in the ranking factory. You can choose different prefilters in the wildcardConfig section based on the names of the prefilters (defined in another factory file). They are always executed in the order in which they are written in the config file. If you want to create custom filters and rankers, the factory files should be a good entry point. The design is very modular, so you should be able to integrate your own filters very quickly by creating a new class, adding it to the factory class and compiling again. A real plugin interface for external jar/class files is a TODO.
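
The general idea of a name-based factory with ordered execution can be illustrated as follows. This is a sketch of the pattern only, written in Python, and not the Java code of YARA-Signator: the names `PREFILTER_FACTORY`, `build_pipeline` and `run_pipeline` are made up, and the prefilters are placeholders that merely record that they ran. Only the three operator names are taken from the example configuration.

```python
def make_placeholder(name):
    """Build a stand-in prefilter that only records that it ran."""
    def prefilter(trace):
        return trace + [name]
    return prefilter


# Hypothetical factory: maps a configured name to an implementation.
PREFILTER_FACTORY = {
    name: make_placeholder(name)
    for name in ("callsandjumps", "datarefs", "binvalue")
}


def build_pipeline(wildcard_config):
    """Resolve each configured name through the factory, keeping the config order."""
    return [PREFILTER_FACTORY[entry["wildcardOperator"]] for entry in wildcard_config]


def run_pipeline(pipeline, trace=None):
    """Apply the prefilters one after another and return the recorded order."""
    trace = [] if trace is None else trace
    for prefilter in pipeline:
        trace = prefilter(trace)
    return trace
```

Adding a custom filter in this picture means writing a new implementation and registering its name in the factory; a name that is not registered raises a `KeyError` in `build_pipeline`.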

Details about other configurations and operation modes, engines etc. can be found at the respective wiki pages.

Operating Systems and Platforms:

Currently only recent or popular Linux distributions are supported. We develop on Gentoo Linux systems and test on Ubuntu, so these are in fact the best-tested distributions. There should be no problems using any other distribution, from Arch and Debian to Red Hat or CentOS.

Windows or macOS support is not planned.