elixir-uk/hcnv-uk

 
 

Title: hCNV Tools: Compatibility and Support in Galaxy Platform
Author: Khaled Jumah, Krzysztof Poterlowicz
Date: 11/04/2022
Output: html_document
Theme: jekyll-theme-cayman

Integrating Copy Number Variant (CNV) Detection Tools into Galaxy

Despite the development of numerous CNV detection tools over recent years, only a limited number have been integrated into the Galaxy platform, and even fewer remain fully supported and functional (Zhao et al., 2013).

The following table lists tools designed to identify heterogeneous Copy Number Variations (hCNV) in Whole Exome Sequencing datasets. For each tool it gives the name, a link to the source, availability in Conda and in Galaxy, and the maintenance status (actively supported or no longer maintained), to help select viable candidates for integration and use within Galaxy.

Title Description Source Tool version CONDA CONDA source Wrapper source Wrapper owner Wrapper version Comment Pull requests (Linked) Status Assignees
AbsCN-seq AbsCN-seq is an R package for estimation of tumor purity, ploidy, and copy number variation (CNV) in next-generation sequencing data. https://bioinformaticshome.com/tools/cnv/descriptions/AbsCN-seq.html 1.0 NO N/A N/A N/A N/A N/A N/A To add N/A
Accurity Accurity is a software tool for inferring tumor cell ploidy, tumor purity, and absolute allelic copy numbers for somatic copy number alterations in whole-genome sequencing (WGS) data. https://www.yfish.org/display/PUB/Accurity N/A NO N/A N/A N/A N/A (whole exome may work too) N/A To add N/A
ADTEx A tool for detection of copy number variation (CNV) in whole-exome data from paired tumor and healthy samples. The ADTEx algorithm uses Hidden Markov Models (HMM) to predict CNV counts, genotypes, polyploidy, aneuploidy, cell contamination, and baseline shifts. The authors originally named the tool CoNVEX but renamed it because of a conflict with another tool. https://bioinformaticshome.com/tools/cnv/descriptions/ADTEx.html 2.0 NO N/A N/A N/A N/A N/A N/A To add N/A
aCNViewer aCNViewer is a tool for visualization of copy number variation (CNV) and loss of heterozygosity (LOH) in tumor samples. aCNViewer uses the Docker application. Requires: R, Python, Mode, Affymetrix power tools, Ascat, sequenza, and ggplot2. https://bioinformaticshome.com/tools/cnv/descriptions/aCNViewer.html 2.2 NO N/A N/A N/A N/A N/A N/A To add N/A
Bamgineer A tool to simulate haplotype-phased, allele-specific copy number variation (CNV) in Binary Alignment Map (BAM) files, including cancer CNVs in exome and targeted cell-free DNA sequencing data. https://bioinformaticshome.com/tools/cnv/descriptions/Bamgineer.html 2.0 NO N/A N/A N/A N/A N/A N/A To add N/A
BubbleTree BubbleTree is an R package for analysis of tumor samples in next generation sequencing (NGS) data. The BubbleTree algorithm can estimate tumor purity, ploidy, clonality, and allele-specific copy number variation (CNV). It displays the results in a graph format. The authors claim that BubbleTree outperforms the THetA2, ABSOLUTE, AbsCN-seq and ASCAT tools. https://bioinformaticshome.com/tools/cnv/descriptions/BubbleTree.html 2.14.0 Yes https://anaconda.org/search?q=BubbleTree N/A N/A N/A N/A N/A To add N/A
CANOES CANOES is a tool for detection of copy number variation (CNV) in whole-exome sequencing data. The CANOES algorithm uses the negative binomial distribution to model read counts and a regression-based method on user-selected reference samples. https://bioinformaticshome.com/tools/cnv/descriptions/CANOES.html (the tool is not currently available) NO N/A N/A N/A N/A N/A N/A To add N/A
CODEX2 The tool uses depth information from the sequence to determine the presence or absence of a CNV https://github.com/yuchaojiang/CODEX2 N/A NO N/A N/A N/A N/A N/A N/A To add N/A
Control-FREEC a tool for assessing copy number and allelic content using next generation sequencing data. https://github.com/BoevaLab/FREEC v11.6 Yes https://anaconda.org/search?q=Control-FREEC%09 https://github.com/galaxyproject/tools-iuc/tree/master/tools/freec iuc 11.6 N/A N/A Up-to-date N/A
CoNIFER Uses exome sequencing data to find copy number variants (CNVs) and genotype the copy-number of duplicated genes https://sourceforge.net/projects/conifer/ 0.2.1 Yes https://anaconda.org/search?q=CoNIFER N/A N/A N/A N/A N/A To add N/A
CONTRA a tool for copy number variation (CNV) detection for targeted resequencing data such as those from whole-exome capture data. https://contra-cnv.sourceforge.net/ v2.0.6 NO N/A https://toolshed.g2.bx.psu.edu/repository fcaramia 2.4 N/A N/A To update N/A
CoNVaDING A tool for identification of copy number variation (CNV) specifically in next-generation sequencing (NGS) data. The CoNVaDING algorithm contains quality control metrics and can be used for clinical diagnostics. Requires: samtools, Perl, and the Statistics::Normality package for Perl. https://bioinformaticshome.com/tools/cnv/descriptions/CoNVaDING.html 1.2.0 NO N/A N/A N/A N/A N/A N/A To add N/A
CloneCNA CloneCNA is a tool to detect copy number variation (CNV) in heterogeneous tumor samples in whole-exome sequencing data. The CloneCNA algorithm identifies clonal and subclonal CNVs by combining the log-ratio of read counts between tumor and paired healthy samples with tumor B-allele frequencies at independent germline SNP locations. https://bioinformaticshome.com/tools/cnv/descriptions/CloneCNA.html 2.0 NO N/A N/A N/A N/A N/A N/A To add N/A
CNVkit A Python library and command-line software toolkit to infer and visualize copy number from high-throughput DNA sequencing data. https://github.com/etal/cnvkit 0.9.9 Yes https://anaconda.org/search?q=CNVkit N/A N/A N/A N/A N/A Up-to-date khaled196
cn.mops cn.mops (Copy Number estimation by a Mixture Of PoissonS) is an R package pipeline for analysis of copy number variation (CNV) in next-generation sequencing (NGS) data. https://bioinformaticshome.com/tools/cnv/descriptions/cn-mops.html 1.30.0 Yes https://anaconda.org/search?q=cn.mops N/A N/A N/A N/A N/A To add N/A
CNspector CNspector is a web-based tool for clinical diagnosis and visualization of copy number variation (CNV) using next-generation sequencing (NGS) data. https://bioinformaticshome.com/tools/cnv/descriptions/CNspector.html N/A NO N/A N/A N/A N/A special methods for WES N/A To add N/A
CNVfinder CNVfinder is a tool to detect copy number variation in whole-exome sequencing data generated using amplicon-based enrichment technologies. https://bioinformaticshome.com/tools/cnv/descriptions/CNVfinder.html 1.0.0b5 NO N/A N/A N/A N/A N/A N/A To add N/A
CNVnator A tool for discovery and characterization of copy number variation (CNV) in population genome sequencing data. The tool is also well suited for personal genome analysis. The method is based on mean-shift, multiple-bandwidth partitioning, and GC correction. https://bioinformaticshome.com/tools/cnv/descriptions/CNVnator.html N/A Yes https://anaconda.org/search?q=CNVnator https://toolshed.g2.bx.psu.edu/repository fanruimeng N/A It was used in a paper to detect CNVs in WES data N/A Neglected N/A
cnvOffSeq cnvOffSeq is specifically designed for the detection of intergenic copy number variation (CNV) using off-target exome sequencing data. The algorithm uses local adaptive singular value decomposition (SVD) to normalize off-target coverage. https://bioinformaticshome.com/tools/cnv/descriptions/cnvOffSeq.html 0.1.2 NO N/A N/A N/A N/A N/A N/A To add N/A
CN_Learn CN_Learn is a tool for copy number variation (CNV) detection. The algorithm integrates multiple CNV detection algorithms and learns to identify CNVs based on validated CNVs. https://bioinformaticshome.com/tools/cnv/descriptions/CN_Learn.html N/A NO N/A N/A N/A N/A N/A N/A To add N/A
DeAnnCNV DeAnnCNV (Detection and Annotation of Copy Number Variations) is a web-based tool to detect and annotate copy number variation (CNV) in whole-exome sequencing data. The algorithm is based on the GPHMM tool (see links). https://bioinformaticshome.com/tools/cnv/descriptions/DeAnnCNV.html N/A NO N/A N/A N/A N/A N/A N/A To add N/A
EXCAVATOR2 EXCAVATOR2 is a tool for detecting copy number variation (CNV) in whole-exome sequencing data. The EXCAVATOR2 algorithm uses a hidden Markov model (HMM) and a method that classifies genomic regions into five CNV states. EXCAVATOR2 is an enhanced version of EXCAVATOR (see links). https://bioinformaticshome.com/tools/cnv/descriptions/EXCAVATOR2.html 1.1.2 NO N/A N/A N/A N/A N/A N/A To add N/A
ExCNVSS ExCNVSS is a software tool to detect copy number variation (CNV) in whole-exome sequencing data. The ExCNVSS algorithm is based on an evaluation of the coverage and uses a scale-space filtering approach to resolve coverage biases. https://bioinformaticshome.com/tools/cnv/descriptions/ExCNVSS.html N/A NO N/A N/A N/A N/A N/A N/A To add N/A
ExoCNVTest An exome sequencing analysis pipeline to identify disease-associated CNVs and to generate absolute copy number genotypes at putatively associated loci. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3436806/ N/A NO N/A N/A N/A N/A N/A N/A To add N/A
exomeCopy The exomeCopy R package detects copy number variants (CNV) in exome sequencing data, including unpaired samples. The implementation is based on a hidden Markov model that uses background read depth and GC-content to normalize copy count. https://bioinformaticshome.com/tools/cnv/descriptions/exomeCopy.html 1.30.0 Yes https://anaconda.org/search?q=exomeCopy N/A N/A N/A N/A N/A To add N/A
ExomeCNV a tool for detection of copy number variation (CNV) and loss of heterozygosity (LOH). The algorithm uses statistics of sequence coverage and B-allele frequencies for the estimation of CNV and LOH. https://bioinformaticshome.com/tools/cnv/descriptions/ExomeCNV.html 1.4 NO N/A N/A N/A N/A N/A N/A To add N/A
ExomeDepth A tool for calling copy number variation (CNV) from targeted exome sequencing data. ExomeDepth tool is specifically designed to address technical variability between the samples. https://github.com/vplagnol/ExomeDepth 1.1.16 Yes https://anaconda.org/search?q=ExomeDepth https://toolshed.g2.bx.psu.edu/repository iuc 1.1.10 N/A N/A To update N/A
Genovar Genovar is a tool for detection of copy number variation (CNV). It can compare detected CNVs with variants in the Database of Genomic Variants (DGV) to find out whether variants are novel or previously detected. Genovar can visualize genomic source data, e.g., aCGH and sequence alignment data. https://bioinformaticshome.com/tools/cnv/descriptions/Genovar.html 0.951b NO N/A N/A N/A N/A N/A N/A To add N/A
HMZDelFinder HMZDelFinder is a tool for detection of homozygous/hemizygous copy number variation (CNV) in data from Mendelian disease cohorts. https://bioinformaticshome.com/tools/cnv/descriptions/HMZDelFinder.html 3.2.1 NO N/A N/A N/A N/A N/A N/A To add N/A
ichorCNA ichorCNA is an R tool for estimation of tumor fractions in ultra-low pass whole genome sequencing (WGS) and prediction of large-scale copy number variation (CNV). The ichorCNA algorithm uses a hidden Markov model (HMM) for the probabilistic modeling and works in sequencing coverages down to 0.1X. https://bioinformaticshome.com/tools/cnv/descriptions/ichorCNA.html 0.1.0 Yes https://anaconda.org/search?q=ichorCNA N/A N/A N/A N/A N/A To add N/A
iCNV A tool for detecting copy number variation (CNV) in various study designs: whole-exome sequencing, whole-genome sequencing, and single nucleotide polymorphism (SNP) arrays. The iCNV (integrated CNV) algorithm uses a hidden Markov model (HMM) to do platform-specific normalization and to integrate sequencing data with SNP-array data. https://bioinformaticshome.com/tools/cnv/descriptions/iCNV.html 1.4.0 Yes https://anaconda.org/search?q=iCNV N/A N/A N/A N/A N/A To add khaled196
MATCHCLIP A legacy tool for detecting copy number variation (CNV) breakpoints. The algorithm identifies the breakpoint CIGAR strings of reads and their positions. See links for the updated version. https://bioinformaticshome.com/tools/cnv/descriptions/MATCHCLIP.html N/A NO N/A N/A N/A N/A N/A N/A To add N/A
matchclips2 An updated version of matchclips for detecting copy number variation (CNV) breakpoints. The algorithm identifies the breakpoint CIGAR strings of reads and their positions. The updated algorithm runs much faster than the original one. https://bioinformaticshome.com/tools/cnv/descriptions/matchclips2.html N/A NO N/A N/A N/A N/A N/A N/A To add N/A
ONCOCNV ONCOCNV is a tool to detect copy number variation (CNV) in ultra-deep targeted sequencing. The algorithm uses multifactor normalization and annotation techniques to detect large copy number variation in amplicon sequencing data. The authors claim that their method has precision comparable to CGH techniques. https://bioinformaticshome.com/tools/cnv/descriptions/ONCOCNV.html 6.9 NO N/A N/A N/A N/A N/A N/A To add N/A
panelcn.MOPS A tool for the detection of copy number variation (CNV) in Next Generation Sequencing (NGS) data panels. This tool includes QC criteria for samples and for a region of interest. Implemented as a standalone tool with a graphical user interface for clinical diagnostics usage and as an R package. https://bioinformaticshome.com/tools/cnv/descriptions/panelcn-MOPS.html 1.1.0 Yes https://anaconda.org/search?q=panelcn.MOPS N/A N/A N/A N/A N/A To add N/A
Patchwork A tool for analysis and visualization of allele-specific copy number variation (CNV) and loss-of-heterozygosity (LOH) in cancer genomes. The Patchwork tool comes in two variants, Patchwork which takes BAM files as input and PatchworkCG takes CompleteGenomics files as input. Alternative name: patchworkCG. https://bioinformaticshome.com/tools/cnv/descriptions/Patchwork.html N/A NO N/A N/A N/A N/A N/A N/A To add N/A
PennCNV-ExomeSeq PennCNV Copy Number Variant Call Detection in Exome Sequencing https://penncnvexomeseq.sourceforge.net/ N/A NO N/A N/A N/A N/A N/A N/A To add N/A
PureCN PureCN package is for detection of copy number variation (CNV) and single nucleotide variation classification in targeted sequencing data. The PureCN algorithm estimates tumor purity, CNV, loss of heterozygosity (LOH), contamination, and classifies single nucleotide variants (SNVs) by somatic status and clonality. The algorithm integrates well with various somatic variant detection pipelines. https://bioinformaticshome.com/tools/cnv/descriptions/PureCN.html 1.14.1 Yes https://anaconda.org/search?q=PureCN N/A N/A N/A N/A N/A To add N/A
PropSeg A regression model for estimating DNA copy number applied to capture sequencing data https://ccb.nki.nl/software/ocs/ 0.9-4 NO N/A N/A N/A N/A N/A N/A To add N/A
PyLOH PyLOH is a tool for detecting copy number variation (CNV) and loss of heterozygosity in cancer genomes. The algorithm uses a unified probabilistic framework. https://bioinformaticshome.com/tools/cnv/descriptions/PyLOH.html 1.1 Yes https://anaconda.org/search?q=PyLOH N/A N/A N/A N/A N/A To add N/A
RefCNV A tool to detect copy number variation (CNV). The algorithm uses a reference set to estimate the sequence coverage of each exon. https://bioinformaticshome.com/tools/cnv/descriptions/RefCNV.html N/A NO N/A N/A N/A N/A N/A N/A To add N/A
RUbioSeq RUbioSeq is an integrated software suite for primary and secondary analysis of single nucleotide polymorphism (SNP), copy number variant (CNV), and bisulfite-seq data. https://bioinformaticshome.com/tools/cnv/descriptions/RUbioSeq.html 3.8.1 NO N/A N/A N/A N/A N/A N/A To add N/A
RUbioSeq+ RUbioSeq+ is an updated and extended version of RUbioSeq. It is an integrated software suite for primary and secondary analysis of single nucleotide polymorphism (SNP), copy number variant (CNV), and bisulfite-seq data. The extended version includes a pipeline for ChIP-seq experiments, DNA-seq experiments, improvements in the parallelization, and multithreading. https://bioinformaticshome.com/tools/cnv/descriptions/RUbioSeq-plus.html 3.8.1 NO N/A N/A N/A N/A N/A N/A To add N/A
SAAS-CNV SAAS-CNV is a tool to identify somatic copy number variation (CNV) in next-generation sequencing (NGS) data. The SAAS-CNV algorithm consists of four main steps: (1) reading a reference and alleles at each locus for comparison of the read depth and alleles with tumor and healthy samples, (2) joint segmentation on the two signals, (3) correction of the CNV baseline, (4) calling of somatic CNVs for all detected signals. https://bioinformaticshome.com/tools/cnv/descriptions/SAAS-CNV.html 0.3.4 NO N/A N/A N/A N/A N/A N/A To add N/A
SeqGene A tool which integrates mutation identification, annotation, genotyping, expression quantification, copy number variation (CNV), expression quantitative trait loci (eQTLs) detection, allele specific expression (ASE), differentially expressed genes (DEGs) identification, and pathway analysis workflows in a single package. https://sourceforge.net/projects/seqgene/ 2.5 NO N/A N/A N/A N/A N/A N/A To add N/A
Sequenza The Sequenza package provides tools to genotype cancer samples, estimate cancer cellularity, ploidy, and copy number variation (CNV), and infer mutated alleles. https://bioinformaticshome.com/tools/cnv/descriptions/Sequenza.html 3.0.0 Yes https://anaconda.org/search?q=Sequenza https://toolshed.g2.bx.psu.edu/repository artbio 3.0.0+ N/A N/A Up-to-date N/A
SynthEx SynthEx is a tool to detect copy number alteration (CNA) and to profile tumor heterogeneity in a variety of high-throughput sequencing data. The SynthEx algorithm uses a "synthetic normal" to represent the target; thus, it does not need matched normal samples. https://bioinformaticshome.com/tools/cnv/descriptions/SynthEx.html 1.06 NO N/A N/A N/A N/A N/A N/A To add N/A
TITAN TITAN is a tool for estimation of copy number variation (CNV) and loss of heterozygosity (LOH) in whole-genome sequencing data consisting of clonal cell populations. The TITAN algorithm uses a hidden Markov model (HMM). Alternative name: TitanCNA https://bioinformaticshome.com/tools/cnv/descriptions/TITAN.html 1.22.0 Yes https://anaconda.org/dranew/bioconductor-titan N/A N/A N/A can also work on WES data N/A To add N/A
VarScan2 a platform-independent mutation caller for targeted, exome, and whole-genome resequencing data generated on Illumina, SOLiD, Life/PGM, Roche/454, and similar instruments. https://sourceforge.net/projects/varscan/ 2.4.0 NO N/A https://toolshed.g2.bx.psu.edu/repository yhoogstrate 2.4.2 two tools are in the toolshed https://toolshed.g2.bx.psu.edu/repository N/A Neglected N/A
VCF2CNA VCF2CNA is a tool to detect copy number variation (CNV) in Tumor/Germline Variant Call Format (VCF) files. The VCF2CNA algorithm also computes tumor purity estimates for samples. https://bioinformaticshome.com/tools/cnv/descriptions/VCF2CNA.html N/A NO N/A N/A N/A N/A N/A N/A To add N/A
XCAVATOR a tool for detecting copy number variation (CNV) in whole-genome exome sequencing data https://sourceforge.net/projects/xcavator/ N/A NO N/A N/A N/A N/A N/A N/A To add N/A
XHMM Recover information on CNVs from targeted exome sequence data by running depth of coverage calculations, data normalization, CNV calling, and statistical genotyping https://github.com/RRafiee/XHMM N/A NO N/A N/A N/A N/A N/A N/A To add N/A
WISARD WISARD, a Workbench for Integrated Superfast Association study with Related Data is a statistical analysis toolkit for the analysis of large-scale single nucleotide polymorphism (SNP), copy number variation (CNV), and next-generation sequencing (NGS) data. With WISARD you can analyze related and unrelated samples. The code is optimized for running in multi-core CPUs. https://bioinformaticshome.com/tools/cnv/descriptions/WISARD.html 1.3.2 NO N/A N/A N/A N/A N/A N/A To add N/A
Manta Manta calls structural variants (SVs) and indels from mapped paired-end sequencing reads. It is optimized for analysis of germline variation in small sets of individuals and somatic variation in tumor/normal sample pairs. https://github.com/Illumina/manta v1.6.0 Yes https://anaconda.org/bioconda/manta https://toolshed.g2.bx.psu.edu/repository artbio 1.6+ N/A N/A Up-to-date N/A
CLAMMS CLAMMS is a scalable tool for detecting common and rare copy number variants from whole-exome sequencing data. https://github.com/rgcgithub/clamms 1.1 NO N/A N/A N/A N/A N/A N/A To add N/A
SeqCNV (check the tool link again) A novel method for identification of copy number variations in targeted next-generation sequencing data https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-017-1566-3#Sec9 N/A NO N/A N/A N/A N/A N/A N/A To add N/A
Ulysses accurate detection of low-frequency structural variations in large insert-size sequencing libraries http://www.lcqb.upmc.fr/ulysses/ 1.0 NO N/A N/A N/A N/A N/A N/A To add N/A
CNV-RF Random Forest–Based Copy Number Variation Detection Method Using Next-Generation Sequencing https://github.com/getiria-onsongo/tso_cnv/tree/cnv_paper 1.2.2 NO N/A N/A N/A N/A N/A N/A To add N/A
GATK gCNV Precise common and rare germline CNV calling with GATK https://aacrjournals.org/cancerres/article/78/13_Supplement/2287/626564 N/A NO N/A N/A N/A N/A N/A N/A To add N/A
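
The table can also be read as a work queue. The following is a minimal sketch, in Python, of one way to tally the Status column and to shortlist tools that already have a Conda package but no Galaxy wrapper. The `Tool` class, the two helper functions, and the prioritisation rule are illustrative assumptions rather than part of the project; only a handful of rows from the table are copied in.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    """One row of the table, reduced to the columns used here (hypothetical model)."""
    title: str
    conda: bool        # "CONDA" column: Yes / NO
    has_wrapper: bool  # True when "Wrapper source" is not N/A
    status: str        # "Status" column


# A few rows copied from the table above; this is not the full list.
TOOLS = [
    Tool("Control-FREEC", True, True, "Up-to-date"),
    Tool("CoNIFER", True, False, "To add"),
    Tool("CONTRA", False, True, "To update"),
    Tool("CNVnator", True, True, "Neglected"),
    Tool("ExomeDepth", True, True, "To update"),
    Tool("PureCN", True, False, "To add"),
    Tool("XHMM", False, False, "To add"),
]


def status_counts(tools):
    """Count how many tools fall under each Status value."""
    return Counter(tool.status for tool in tools)


def wrapper_only_candidates(tools):
    """Tools with a Conda package but no Galaxy wrapper.

    Assumption: these need the least work, since only the wrapper is missing.
    """
    return [tool.title for tool in tools if tool.conda and not tool.has_wrapper]


if __name__ == "__main__":
    print(dict(status_counts(TOOLS)))
    print(wrapper_only_candidates(TOOLS))
```

Extending `TOOLS` to every row would give the overall split between tools to add, to update, and already up to date.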

The Galaxy Training Network (GTN) community provides a comprehensive tutorial that guides readers through the complete process of integrating a tool into Galaxy, including:

  • Creating a Bioconda recipe for a new tool,
  • Writing a Galaxy tool wrapper, and
  • Testing and deploying the tool in both local and public Galaxy environments.

This document presents a case study on integrating CNVkit into Galaxy, following the steps outlined in the tutorial above.
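
To give a feel for the wrapper-writing step, here is a rough sketch that uses Python's standard `xml.etree.ElementTree` to generate a skeleton Galaxy tool XML for a single CNVkit command. It is not the wrapper produced in this project: the tool id `cnvkit_coverage`, the parameter names, and the choice of the `cnvkit.py coverage` subcommand are assumptions made for illustration, and a real wrapper would also need tests, help text, and citations.

```python
import xml.etree.ElementTree as ET


def build_wrapper(tool_id, name, version, package):
    """Return a skeleton Galaxy tool XML as a string (illustrative only)."""
    tool = ET.Element("tool", id=tool_id, name=name, version=version)

    # The requirement is resolved through Conda, which is why the
    # Bioconda recipe has to exist before the wrapper can work.
    requirements = ET.SubElement(tool, "requirements")
    requirement = ET.SubElement(
        requirements, "requirement", type="package", version=version
    )
    requirement.text = package

    # Command template; $bam, $targets and $coverage refer to the
    # hypothetical parameters declared below.
    command = ET.SubElement(tool, "command", detect_errors="exit_code")
    command.text = "cnvkit.py coverage '$bam' '$targets' -o '$coverage'"

    inputs = ET.SubElement(tool, "inputs")
    ET.SubElement(inputs, "param", name="bam", type="data",
                  format="bam", label="Aligned reads")
    ET.SubElement(inputs, "param", name="targets", type="data",
                  format="bed", label="Target regions")

    outputs = ET.SubElement(tool, "outputs")
    ET.SubElement(outputs, "data", name="coverage", format="tabular")

    ET.indent(tool)  # pretty-print; requires Python 3.9 or later
    return ET.tostring(tool, encoding="unicode")


if __name__ == "__main__":
    print(build_wrapper("cnvkit_coverage", "CNVkit coverage", "0.9.9", "cnvkit"))
```

For simplicity the sketch reuses the package version (0.9.9, as listed in the table) as the tool version. In practice the XML is written by hand and checked with the Galaxy tooling described in the tutorial.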

Benchmarking hCNV Detection Tools

Selecting a tool for analysis requires an understanding of each tool's CNV detection accuracy, execution time, and infrastructure requirements.

Our plan includes:

  1. Conducting benchmark tests on the CNV detection workflow in Galaxy to measure runtime and CNV detection accuracy.
  2. Comparing these results against a reference CNV workflow.

The reference CNV workflow is available through the Genome in a Bottle project.

Workflow and tools/code are available at:
https://github.com/NCBI-Hackathons/TheHumanPangenome/tree/master/MHC/e2e_notebooks
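
As an illustration of how detected CNVs might be compared against a reference call set, the sketch below computes precision and recall using reciprocal overlap. The 50% reciprocal-overlap threshold, the `CNV` record layout, and the toy coordinates are assumptions chosen for the example; they are not taken from the benchmark plan or from the Genome in a Bottle data.

```python
from typing import List, NamedTuple, Tuple


class CNV(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive
    kind: str   # "DEL" or "DUP"


def reciprocal_overlap(a: CNV, b: CNV) -> float:
    """Smaller of the two overlap fractions; 0.0 if the calls cannot match."""
    if a.chrom != b.chrom or a.kind != b.kind:
        return 0.0
    shared = min(a.end, b.end) - max(a.start, b.start)
    if shared <= 0:
        return 0.0
    return min(shared / (a.end - a.start), shared / (b.end - b.start))


def precision_recall(
    calls: List[CNV], truth: List[CNV], min_overlap: float = 0.5
) -> Tuple[float, float]:
    """Precision and recall of `calls` against `truth`.

    A call is a true positive if it reciprocally overlaps any truth CNV
    by at least `min_overlap`; a truth CNV is recovered on the same rule.
    """
    matched_calls = sum(
        any(reciprocal_overlap(c, t) >= min_overlap for t in truth) for c in calls
    )
    matched_truth = sum(
        any(reciprocal_overlap(c, t) >= min_overlap for c in calls) for t in truth
    )
    precision = matched_calls / len(calls) if calls else 0.0
    recall = matched_truth / len(truth) if truth else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy data, invented for the example.
    truth = [CNV("chr1", 1000, 2000, "DEL"), CNV("chr2", 5000, 9000, "DUP")]
    calls = [CNV("chr1", 1100, 2100, "DEL"), CNV("chr3", 100, 500, "DEL")]
    print(precision_recall(calls, truth))
```

Exome-based callers often report CNVs at exon resolution, so a benchmark may prefer a per-exon or per-gene comparison over interval overlap; the threshold and matching rule should be fixed before tools are compared.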

Training and Dissemination

The preprocessing steps for CNV data (from quality control to mapping) are typically similar across CNV detection workflows, with differences arising in the detection step itself.

Our plan to develop a tutorial for hCNV tools includes:

  1. Reviewing the input requirements for available CNV tools,
  2. Creating a shared preprocessing workflow that can be adapted to the input requirements of different tools,
  3. Highlighting the tool with the most extensive preprocessing needs as a central focus, while introducing other CNV tools as alternatives within the workflow,
  4. Providing links to further reading for detailed guidance on each specific CNV tool, and
  5. Identifying CNV detection tools available both within and outside Galaxy.

Current progress can be found in this tutorial, which uses Control-FREEC for CNV detection. This tutorial could serve as a foundation for future tutorials on this topic.
