avclass package, class, restructure #35

Open · eljeffeg wants to merge 38 commits into base: master

@eljeffeg eljeffeg commented Mar 9, 2021

This is a major change, but I figured I should create the PR for review, or in case you want to copy something. This fork requires Python 3.6+ and removes the original avclass 1.x (tag it for posterity), so there is only one AVCLASS and it's at version 2.x (compatibility mode -c still works). Some of the input parameters have changed to make it easier to add and detect things. It also pulls in some PRs from other authors that I thought were meaningful.

  • Created a python package
$ git clone http://.../avclass
$ cd avclass
$ pip3 install .
  • Converted the labeler into a Class
import json
from avclass.labeler import AVClassLabeler

av_class = AVClassLabeler()
result = av_class.run(
    files="./examples/malheurReference_lb.json",
    data_type="lb",
    path_export=True,
)
print(json.dumps(result))
  • Command line usage
$ avclass -i ./examples/malheurReference_lb.json -t lb -p
  • Broke up the main function (cyclomatic complexity was really high)
  • Added documentation and type hints
  • Added a new -json arg that prints the dictionary returned by the class to the console (used for testing, but might be useful)
  • Added new args -i (input) and -t (type), and removed -lb, -vt, -lbdir, -vtdir, -vt3, -gz (directories are still supported)
  • Added MetaDefender file support
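
For reviewers who want a feel for the new command-line surface, here is a minimal sketch of how the -i, -t, -p, and -json arguments listed above might be declared with argparse. It is not the fork's actual parser: the long option names, the `build_parser` helper, the `nargs` choice, and the help strings are assumptions made for illustration. Only the flag names and the `lb` type value come from this PR.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags described in this PR."""
    parser = argparse.ArgumentParser(prog="avclass")
    # -i accepts one or more files or directories (assumption: nargs="+").
    parser.add_argument("-i", "--input", nargs="+", required=True,
                        help="input file(s) or directory(ies)")
    # -t names the input format; "lb" is the only value shown in the PR.
    parser.add_argument("-t", "--type", default="lb",
                        help="input data type, e.g. lb")
    # -p and -json are simple on/off switches.
    parser.add_argument("-p", "--path", action="store_true",
                        help="export full tag paths")
    parser.add_argument("-json", action="store_true",
                        help="print the result dictionary as JSON")
    return parser


# Parse the same command line shown in the "Command line usage" bullet.
args = build_parser().parse_args(
    ["-i", "./examples/malheurReference_lb.json", "-t", "lb", "-p"]
)
print(args.input, args.type, args.path, args.json)
```

A single -i that takes both files and directories keeps the interface small, at the cost of the tool having to inspect each path to decide how to treat it.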

@malicialab
Owner

Apologies for letting this fall into a vacuum. We did check your merge request when it was first issued but, as you mentioned, it combined many changes and was not easy to merge except blindly.
In any case, we have started addressing some of the issues you raised (better late than never):

  • Now you can install avclass as a package and we have released it as a PyPI package.
  • We have properly formatted the docstrings, but have not added the type hints since that would require Python 3.7+

Other proposed changes will hopefully be integrated soon in some form. For example, we plan to modify the options to provide the input and the type, but so far we have focused on changes that did not require changes to existing (AVClass2) options beyond the tool name change. We also plan to refactor labeler.py to make it easier to use as a library.

In any case, many thanks for your contribution. We have added you to the list of significant contributors!

@malicialab
Owner

malicialab commented Feb 23, 2023

Now there are two command line options -f to provide files and -d to provide directories. The format of each file is automatically determined by the tool. Input-related options removed: -vt, -vt3, -lb, -vtdir, -lbdir, -gz
We prefer to have separate options for files and directories instead of a single -i option as in your fork to avoid potentially confusing semantics.

We have also changed the default behavior of the tool: it now runs in compatibility mode by default and outputs the families, which we believe is what most users want. There is a new -t option to output the full list of tags. We have also removed the -p option, since the new -t option includes the full path of each tag by default, which provides more info.
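
The automatic format detection mentioned in this comment could work along the following lines. This is a rough sketch and not the code in labeler.py: the `detect_format` function, its return strings, and the specific JSON keys it checks are assumptions about how VirusTotal v2/v3, MetaDefender, and simplified reports typically differ.

```python
import json


def detect_format(report: dict) -> str:
    """Guess the report format from its top-level keys (illustrative only)."""
    data = report.get("data")
    # Assumption: VirusTotal v3 nests results under data -> attributes.
    if isinstance(data, dict) and "attributes" in data:
        return "vt3"
    # Assumption: VirusTotal v2 file reports carry a top-level "scans" dict.
    if "scans" in report:
        return "vt2"
    # Assumption: MetaDefender reports carry a top-level "scan_results" dict.
    if "scan_results" in report:
        return "md"
    # Assumption: the simplified format lists (engine, label) pairs
    # under "av_labels".
    if "av_labels" in report:
        return "lb"
    return "unknown"


# Placeholder report in the simplified format; hash and label are made up.
line = json.dumps({"sha256": "0" * 64,
                   "av_labels": [["EngineA", "Trojan.Example"]]})
print(detect_format(json.loads(line)))
```

Checking the most deeply nested layout first avoids misclassifying a report that happens to share a top-level key with another format.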

@eljeffeg
Author

Awesome - I'll have to look it over, in particular to see whether you made any changes to common.py. I just implemented AVClass as a service in AssemblyLine. https://github.com/CybercentreCanada/assemblyline-service-avclass

Let me know if you have any suggestions about the implementation. One thing I did expand on was the translations and taxonomy. I provided an option to extend your dataset with Malpedia families / alt_names. I also added a function (I think it's called is_hex) because I was getting some false positives on short hex labels.
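
A check like the one described could look as follows. The real helper in the AssemblyLine service may well differ: the name `is_hex` is taken from the comment above, while the `min_len` threshold and the requirement of at least one decimal digit are assumptions chosen for illustration.

```python
import string

HEX_DIGITS = frozenset(string.hexdigits)


def is_hex(token: str, min_len: int = 4) -> bool:
    """Return True if token looks like a hex string rather than a name.

    Assumptions (not from the original code): tokens shorter than min_len
    are never treated as hex, and at least one decimal digit is required
    so that ordinary words such as "decade" are not discarded.
    """
    if len(token) < min_len:
        return False
    if not all(ch in HEX_DIGITS for ch in token):
        return False
    return any(ch.isdigit() for ch in token)


# Placeholder tokens; keep only those that do not look like hex strings.
tokens = ["zbot", "a1b2c3", "decade", "7f3e"]
print([t for t in tokens if not is_hex(t)])
```

Requiring a digit is a trade-off: it keeps real words made only of the letters a-f, but lets through all-letter hex strings.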

@malicialab
Owner

I have committed a refactoring of labeler.py to introduce a FileLabeler class. I have introduced a -o option to place the output into a file instead of stdout. I have slightly changed the output format so that -pup and -vtt are included in the now-default compatibility mode, but not with the -t (tags) option. We could add them with the tags as well if that seems useful.
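
One common way to support an option like -o is to pick the output stream once and write to it everywhere. The sketch below shows that pattern only. The `open_output` helper is hypothetical and is not taken from the refactored labeler.py, and the line written is a placeholder rather than the tool's real output format.

```python
import contextlib
import sys
from typing import Iterator, Optional, TextIO


@contextlib.contextmanager
def open_output(path: Optional[str]) -> Iterator[TextIO]:
    """Yield a writable stream: the file at path, or stdout if path is None."""
    if path is None:
        # stdout is owned by the interpreter, so it is not closed on exit.
        yield sys.stdout
    else:
        with open(path, "w", encoding="utf-8") as handle:
            yield handle


# With no -o value, output goes to stdout.
with open_output(None) as out:
    out.write("placeholder output line\n")
```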

Changes to common.py are limited to moving the get_sample_info* methods from AvLabels into FileLabeler. The idea is to leave AvLabels to focus on the labels and FileLabeler on files with many reports.

I plan to look into adding a few more options to AvLabels so that there is more flexibility when using the package as a library.

Regarding the integration into AssemblyLine, is your goal by adding Malpedia to provide additional aliases for a given tag? Or something else?

@eljeffeg
Author

Part of it was to add possible aliases and give a family a common name. But it could also expand what is detected as a family by adding additional "FAM:..." entries. If a family is detected, it can potentially also be linked to attribution, references, and a history of the malware.
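
To make the idea concrete, here is a small sketch of how Malpedia-style family records could be folded into an alias table and a set of FAM: tags. The record layout (`common_name`, `alt_names`), the `extend_with_families` helper, and the sample entries are placeholders for illustration, not actual Malpedia data or the AssemblyLine service code.

```python
from typing import Dict, Iterable, Set, Tuple


def extend_with_families(
    aliases: Dict[str, str],
    tags: Set[str],
    families: Iterable[dict],
) -> Tuple[Dict[str, str], Set[str]]:
    """Return copies of aliases/tags extended with the given family records."""
    new_aliases = dict(aliases)
    new_tags = set(tags)
    for record in families:
        name = record["common_name"].lower()
        # Every imported family becomes a candidate family tag.
        new_tags.add(f"FAM:{name}")
        for alt in record.get("alt_names", []):
            # Existing translations win over imported ones (design assumption).
            new_aliases.setdefault(alt.lower(), name)
    return new_aliases, new_tags


# Made-up record: one family with two alternative names.
families = [{"common_name": "ExampleFam", "alt_names": ["ExFam", "Exmpl"]}]
aliases, tags = extend_with_families({"exfam": "otherfam"}, set(), families)
print(aliases, sorted(tags))
```

Letting existing translations take precedence means an imported alias never silently overrides a mapping that was curated by hand.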

@malicialab
Owner

I just committed support for OPSWAT MetaDefender, as provided in this fork.

Regarding Malpedia, we added support for such an integration when producing the first MISP taxonomy. However, the intern working on this left before we could commit the cleaned-up script that automatically updates the MISP taxonomy to keep it in sync with the latest AVClass taxonomy and rules. We can try to resurrect that effort if it seems worthwhile for MISP users.
