WARNING: THIS PROJECT HAS BEEN MIGRATED TO https://atlassian.petermac.org.au/bitbucket/projects/SJDAW/repos/mismatchfinder AND IS NO LONGER MAINTAINED AT THIS LOCATION.

MisMatchFinder

The MisMatchFinder algorithm identifies mismatches within reads compared to the reference genome and filters background noise unrelated to somatic mutations through:

the use of high thresholds for mapping and base quality,
strict consensus between overlapping read-pairs,
gnomAD-based germline variant filtering,
and a ctDNA-centric fragmentomics filter

Installation

Install time is 100% dependent on the compile time and therefore different between systems, but on a standard x86 laptop compile time is typically less than 2 minutes.

Requirements: Rust

Clone this repository
Build the release optimised binary with cargo build -r
Move the binary (target/release/mismatchfinder) into the $PATH

Usage

To ensure optimal results, please use the gnomAD echtvar (available from https://github.com/brentp/echtvar/releases) or an eqivalent echtvar file

USAGE:
    mismatchfinder [OPTIONS] --output <OUTPUT_FOLDER> [--] [BAMS]...

ARGS:
    <BAMS>...    Bams to analyse

OPTIONS:
        --blacklist_bed <BLACKLIST_FILE>
            Bed file for genomic regions to ignore in the analysis. White list bed file regions
            overwrite black list regions

        --fragment_length_intervals <FRAGMENT_LENGTH_INTERVALS>...
            Length of fragments to be considered in the analysis [default: 100-150 250-325]

        --germline_file <GERMLINE_FILE>
            File to read germline information from default is an echtvar file

    -h, --help
            Print help information

        --maximum_edit_distance_per_read <MAX_EDIT_DISTANCE_PER_READ>
            Maximum mismatches we allow in the read [default: 15]

        --minimum_average_base_quality <MIN_AVERAGE_BASE_QUALITY>
            Mimimum average base quality of the read [default: 25]

        --minimum_edit_distance_per_read <MIN_EDIT_DISTANCE_PER_READ>
            Mimimum mismatches we require in the read [default: 1]

    -o, --output <OUTPUT_FOLDER>
            Output folder to write files to

        --only_overlaps
            only use the overlap of the two reads

        --overwrite
            overwrite previous results

    -q, --minimum_mapping_quality <MIN_MAPPING_QUALITY>
            Mimimum mapping quality for a read to be considered [default: 20]

    -Q, --minimum_base_quality <MIN_BASE_QUALITY>
            Mimimum base quality of the mismatch (BQ is summed for a readpair if reads agree)
            [default: 65]

        --strict_overlaps
            only analyse mismatches if the read pair agree (does not restrict to only overlap)

    -V, --version
            Print version information

        --whitelist_bed <WHITELIST_FILE>
            Bed file for genomic regions to include in the analysis. White list bed file regions
            overwrite black list regions [default: all regions]

Example analysis

With the example bam runing the default analysis with the provided whitelist we generate the vcf output example_bamsites.vcf.gz

please refer to echtvar release to download the echtvar reference for gnomad v3.1.2

Runtime of this step should be less than 20 seconds on the test data

$ mismatchfinder --germline_file /path/to/echtvarfile/gnomad.v3.1.2.echtvar.v2.zip  --whitelist_bed /path/to/whitelist.bed -o /path/to/outputfolder/--strict_overlaps --only_overlaps /path/to/example.bam 
[...]
2023-12-08T04:15:16.586Z INFO [mismatchfinder] Found 8738 mismatches 
2023-12-08T04:15:18.389Z INFO [mismatchfinder] Found 828 somatic mismatches

once we have the vcf (can be found in the examplefolder), we can perform signature deconvolution in R once both data.table and sigminer are installed

library(data.table)
library(sigminer)

maf <- sigminer::read_vcf("example_bamsites.vcf.gz", genome_build = "hg38")
tally <- sigminer::sig_tally(maf, mode="SBS", ref_genome="BSgenome.Hsapiens.UCSC.hg38")

# get the relative signature weights
SBSfit <- sigminer::sig_fit(t(tally$nmf_matrix), sig_index = signatures_selection, sig_db = "SBS", return_class = "data.table", mode="SBS", type="relative")

#plot the signature weights (highlighting SBS7a)
plot(1:ncol(SBSfit[,-1]), round(SBSfit[1,-1], digits=5), type='h', lwd=4, lend=1, col=ifelse(colnames(SBSfit[1,-1])=="SBS7a", "red", "black"), xlab="", ylab="Signature weight", las=2, xaxt='n', bty='n')
text(1:ncol(SBSfit[,-1]), line2user(0,1), labels = colnames(SBSfit[,-1]), srt=45, adj=1, cex=0.5, xpd=TRUE, col=ifelse(colnames(SBSfit[1,-1])=="SBS7a", "black", "lightgrey"))

the output on the test data will look similar to this

Known issues and possible extentions

No CRAM support yet
Zarr support only rudimentary

Name		Name	Last commit message	Last commit date
Latest commit History 33 Commits
example		example
src		src
.gitattributes		.gitattributes
.gitignore		.gitignore
Cargo.lock		Cargo.lock
Cargo.toml		Cargo.toml
LICENSE		LICENSE
README.md		README.md
rust-toolchain.toml		rust-toolchain.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

WARNING: THIS PROJECT HAS BEEN MIGRATED TO https://atlassian.petermac.org.au/bitbucket/projects/SJDAW/repos/mismatchfinder AND IS NO LONGER MAINTAINED AT THIS LOCATION.

MisMatchFinder

Installation

Usage

Example analysis

Known issues and possible extentions

About

Releases

Packages

Languages

License

SebastianHollizeck/rust-mismatchfinder

Folders and files

Latest commit

History

Repository files navigation

WARNING: THIS PROJECT HAS BEEN MIGRATED TO https://atlassian.petermac.org.au/bitbucket/projects/SJDAW/repos/mismatchfinder AND IS NO LONGER MAINTAINED AT THIS LOCATION.

MisMatchFinder

Installation

Usage

Example analysis

Known issues and possible extentions

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages