Mining commits
In this wiki we describe how to use command-line options to mine the commits of a Git repository with MiSON. The functions of MiSON can also be imported into a script, in which case consult their respective docstrings.
The results will be a `csv` file with a row for each modified file in each commit. The saved fields are `commit_hash`, `author_name`, `author_email`, `committer_name`, `committer_email`, `commit_date`, `additions`, `deletions`, and `filename`.
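Since the output is a plain `csv` file with the columns listed above, it can be post-processed with standard tools. Below is a minimal sketch using Python's `csv` module; the sample rows (hashes, names, file paths) are made up for illustration and are not real MiSON output:

```python
import csv
import io

# Hypothetical sample of a MiSON commit table; the column names follow
# the schema described above, but the rows themselves are invented.
SAMPLE = """commit_hash,author_name,author_email,committer_name,committer_email,commit_date,additions,deletions,filename
abc123,Alice,alice@example.com,Alice,alice@example.com,2023-01-01,10,2,ts-order-service/src/main.py
abc123,Alice,alice@example.com,Alice,alice@example.com,2023-01-01,3,1,ts-auth-service/src/auth.py
def456,Bob,bob@example.com,Bob,bob@example.com,2023-01-02,7,0,ts-order-service/src/main.py
"""

def additions_per_author(csv_text):
    """Sum the 'additions' column per author across all file modifications."""
    totals = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["author_name"]] = totals.get(row["author_name"], 0) + int(row["additions"])
    return totals

print(additions_per_author(SAMPLE))  # {'Alice': 13, 'Bob': 7}
```

Note that because each row is one file modification, a single commit can span several rows; aggregations should account for this.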
Additionally, it is possible to map modified files to the corresponding microservice by providing a custom mapping function. The saved table will then also contain the field `microservice` (see Optional arguments).
The main command for mining commits with MiSON is `mison commit`.
At least the following options are mandatory in all cases:
- `--repo`: Path to the repository. Depending on the backend used, this can be a local path or a URL.
- `--backend`: Which backend to use; current options are `github` and `pydriller`.
- `--commit_table`: Output path to save the `csv` table of all mined commits and their file modifications. Can be `default`, in which case the filename follows the format `mison_BACKEND_commit_table_TIMESTAMP.csv`.
The following optional arguments can be provided in case mapping from modified file names to the corresponding microservice is necessary.
- `--import_mapping_file`: The name of the file from which a user-defined function with signature `str -> str` is imported. Can be a `*.py` file, in which case the default expected function name is `microservice_mapping` (can be modified with `--import_mapping_func`, see below). Alternatively, the name of a module defined in `mison.mappings` can be provided, for example `mison.mappings.trainticket` (feel free to submit a pull request adding mappings for common benchmarks!).
- `--import_mapping_func`: The name of the function to import from the specified custom file.
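A user-supplied mapping file could look like the sketch below. The `str -> str` signature and the default function name `microservice_mapping` come from the description above; the directory-prefix convention itself is an assumption for illustration, not something MiSON prescribes:

```python
# Hypothetical contents of a mapping file passed via --import_mapping_file.
# The function name is the default that MiSON expects; the mapping rule
# (top-level directory == microservice name) is made up for this example.
def microservice_mapping(filename):
    """Map a modified file path to a microservice name (str -> str)."""
    # Assume each microservice lives in a top-level directory named after it.
    return filename.split("/", 1)[0]

print(microservice_mapping("ts-order-service/src/main.py"))  # ts-order-service
```

In practice, the body of the function will depend on the layout of the analyzed repository.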
The following lists all available backends and their specific CLI options.
The `github` backend connects to the GitHub API to query data of a hosted repository and mine its commits.
- `--github_token`: A token for accessing the GitHub API. It needs permissions to access the desired repository.
- `--per_page`: Passed to the API request; the number of responses per page.
The `pydriller` backend uses the PyDriller Python library to mine the commits of a local or remote repository. The additional parameters are filters accepted by the `pydriller.Repository` class constructor.
- `--since`: Only commits after this date will be analyzed (converted to a datetime object).
- `--from_commit`: Only commits after this commit hash will be analyzed.
- `--from_tag`: Only commits after this tag will be analyzed.
- `--to`: Only commits up to this date will be analyzed (converted to a datetime object).
- `--to_commit`: Only commits up to this commit hash will be analyzed.
- `--to_tag`: Only commits up to this tag will be analyzed.
- `--order`: Order in which to traverse commits; options: `date-order`, `author-date-order`, `topo-order`, `reverse`.
- `--only_in_branch`: Only analyzes commits that belong to this branch.
- `--only_no_merge`: Only analyzes commits that are not merge commits.
- `--only_authors`: Only analyzes commits made by these authors (accepts a list of names).
- `--only_commits`: Only these commits will be analyzed (accepts a list of values).
- `--only_releases`: Only analyzes commits that are tagged ("release" is a GitHub term and does not actually exist in Git).
- `--filepath`: Only commits that modified this file will be analyzed.
- `--only_modifications_with_file_types`: Only analyzes commits in which at least one modification was done to a file of that type.
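Several of these filters can be combined in one run. The sketch below restricts mining to a date range on a single branch, excluding merge commits; the repository path, dates, and branch name are placeholders, and the accepted date format is an assumption:

```shell
# Hypothetical: mine only non-merge commits on 'main' made during 2023.
mison commit --repo /path/to/repo --backend pydriller --commit_table default \
    --since 2023-01-01 --to 2023-12-31 \
    --only_in_branch main --only_no_merge
```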