---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```
# ProActive
<!-- badges: start -->
<!-- badges: end -->
**`ProActive` automatically detects regions of gapped and elevated read coverage
using a 2D pattern-matching algorithm. `ProActive` detects, characterizes and
visualizes read coverage patterns in both genomes and metagenomes. Optionally,
users may provide gene annotations associated with their genome or metagenome
in the form of a .gff file. In this case, `ProActive` will generate an additional
output table containing the gene annotations found within the detected regions of
gapped and elevated read coverage. Additionally, users can search for gene
annotations of interest in the output read coverage plots.**
Visualizing read coverage data is important because gaps and elevations in coverage can
indicate a variety of biological and non-biological scenarios, for example:
* Elevations and gaps in read coverage may be caused by some types of structural
variants. Deletions can cause gaps while duplications can cause elevations in read coverage [1].
* Highly active and/or abundant mobile genetic elements, such as transposable
elements [2] and prophages [3], can create elevations in read coverage
at their respective integration sites.
* Genetic regions with high mutation rates and/or high variability within the population
can generate gaps in read coverage [4].
* Poor quality sequencing reads and chimeric reference sequences may cause gaps
and elevations in read coverage.
**Since the cause of gaps and elevations in read coverage can be ambiguous,
ProActive is best used as a screening method to identify genetic regions for further
investigation with other tools!**
**References:**
1. Tattini L., D'Aurizio R., & Magi A. (2015). Detection of Genomic Structural
Variants from Next-Generation Sequencing Data. Frontiers in bioengineering and biotechnology,
3, 92. https://doi.org/10.3389/fbioe.2015.00092
2. Kleiner M., Bushnell B., Sanderson K.E. et al. (2020) Transductomics: sequencing-based
detection and analysis of transduced DNA in pure cultures and microbial communities.
Microbiome 8, 158. https://doi.org/10.1186/s40168-020-00935-5
3. Kieft K., Anantharaman K. (2022). Deciphering Active Prophages from Metagenomes. mSystems 7:e00084-22.
https\://doi.org/10.1128/msystems.00084-22
4. Fogarty E., Moore R. (2019). Visualizing contig coverages to better understand
microbial population structure. https://merenlab.org/2019/11/25/visualizing-coverages/
### Input files
#### Pileup file:
ProActive detects read coverage patterns using a pattern-matching algorithm that
operates on pileup files. A pileup file is a file format in which each row
summarizes the 'pileup' of reads at a specific genomic location. Pileup files
can be used to generate a rolling mean of read coverages and associated base
pair positions, which reduces data size while
preserving read coverage patterns. **ProActive requires that input pileup files**
**be generated using a 100 bp window/bin size.**
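
To make the 100 bp binning concrete, here is a minimal sketch in base R that averages simulated per-base coverage into 100 bp bins. The data and the column names `position` and `coverage` are made up for illustration; this is not ProActive's internal code, and the exact input layout ProActive expects is described in the vignette.

```r
# Simulated per-base coverage for a hypothetical 1,000 bp contig
# with a low-coverage stretch at positions 401-600 (not real data)
set.seed(1)
per_base <- data.frame(
  position = 1:1000,
  coverage = c(rpois(400, 20), rpois(200, 2), rpois(400, 20))
)

bin_size <- 100

# Assign each position to a bin: 1-100 -> bin 1, 101-200 -> bin 2, ...
bin_id <- (per_base$position - 1) %/% bin_size + 1

# One row per bin: the last base pair position in the bin and its mean coverage
binned <- data.frame(
  position = as.numeric(tapply(per_base$position, bin_id, max)),
  coverage = as.numeric(tapply(per_base$coverage, bin_id, mean))
)

print(binned)
```

The 1,000 per-base rows shrink to 10 binned rows, and the low-coverage stretch is still visible in bins 5 and 6.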
Pileup files can be generated by mapping sequencing reads to a
metagenome or genome fasta. **Read mapping should be performed using a high**
**minimum identity (0.97 or higher) and random mapping of ambiguous reads.** The
pileup files needed for ProActive are generated from the .bam files produced
during read mapping. Some read mappers, like
[BBMap](https://jgi.doe.gov/data-and-tools/software-tools/bbtools/bb-tools-user-guide/bbmap-guide/),
can generate pileup files directly in the
[`bbmap.sh`](https://github.com/BioInfoTools/BBMap/blob/master/sh/bbmap.sh)
command by using the `bincov` output with the `covbinsize=100`
parameter. **Otherwise, BBMap's**
**[`pileup.sh`](https://github.com/BioInfoTools/BBMap/blob/master/sh/pileup.sh)**
**can convert .bam files produced by any read mapper to pileup files**
**compatible with ProActive using the `bincov` output with `binsize=100`.**
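
As a rough sketch of the two routes described above, the BBMap arguments can be assembled in R and passed to `system2()`. All file names here are hypothetical placeholders, and the calls assume that `bbmap.sh` and `pileup.sh` are on your `PATH`; check the BBMap documentation for the options available in your installed version.

```r
# Hypothetical file names; replace with your own paths
ref_fasta <- "genome.fasta"
reads     <- "reads.fastq.gz"

# Route 1: map reads and write the binned coverage in a single bbmap.sh call
bbmap_args <- c(
  paste0("ref=", ref_fasta),
  paste0("in=", reads),
  "out=mapped.bam",
  "minid=0.97",                    # high minimum identity
  "ambiguous=random",              # randomly assign ambiguously mapped reads
  "bincov=pileup_bincov100.txt",   # binned coverage output
  "covbinsize=100"                 # 100 bp bins
)

# Route 2: convert an existing alignment file with pileup.sh
pileup_args <- c(
  "in=mapped.bam",
  "bincov=pileup_bincov100.txt",
  "binsize=100"
)

# Print the commands for inspection; uncomment the system2() calls to run them
cat("bbmap.sh", bbmap_args, "\n")
cat("pileup.sh", pileup_args, "\n")
# system2("bbmap.sh", args = bbmap_args)
# system2("pileup.sh", args = pileup_args)
```

Note that the bin size parameter is named `covbinsize` in `bbmap.sh` but `binsize` in `pileup.sh`.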
**NOTE:** For detailed information on input file format, please see the vignette. Users may also use
the 'sampleMetagenomePileup' and 'sampleGenomePileup' files that come pre-loaded with
ProActive as a reference.
#### gffTSV:
ProActive optionally accepts a .gff file as input. The .gff file must be
associated with the same metagenome or genome used to create your pileup file.
The .gff file should be tab-separated (TSV) and should follow the [general feature format](https://en.wikipedia.org/wiki/General_feature_format).
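
For orientation, the sketch below reads a tiny, made-up GFF file into a data frame with base R. The nine column names follow the general GFF layout and are assigned here for readability only; whether ProActive expects these names or a header row is an assumption to verify against the vignette and the bundled `sampleMetagenomegffTSV` object.

```r
# Write a tiny, made-up GFF file to a temporary path for demonstration
gff_path <- tempfile(fileext = ".gff")
writeLines(c(
  "##gff-version 3",
  "contig_1\tProdigal\tCDS\t150\t1200\t.\t+\t0\tID=gene_1;product=ABC transporter permease",
  "contig_1\tProdigal\tCDS\t1300\t2100\t.\t-\t0\tID=gene_2;product=chemotaxis protein CheY"
), gff_path)

# The nine standard GFF columns (names chosen for this example)
gff_cols <- c("seqid", "source", "type", "start", "end",
              "score", "strand", "phase", "attributes")

# Read as tab-separated text, skipping '#' comment lines and disabling quoting
gff <- read.delim(gff_path, header = FALSE, comment.char = "#",
                  col.names = gff_cols, quote = "", stringsAsFactors = FALSE)

str(gff)
```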
## Installation
Install ProActive from CRAN with:
``` r
install.packages("ProActive")
library(ProActive)
```
Install the development version of ProActive from [GitHub](https://github.com/) with:
``` r
if (!requireNamespace("devtools", quietly = TRUE)) {
  install.packages("devtools")
}
devtools::install_github("jlmaier12/ProActive")
library(ProActive)
```
## Quick start
```{r example}
library(ProActive)

## Metagenome mode
MetagenomeProActive <- ProActiveDetect(
  pileup = sampleMetagenomePileup,
  mode = "metagenome",
  gffTSV = sampleMetagenomegffTSV
)

MetagenomePlots <- plotProActiveResults(
  pileup = sampleMetagenomePileup,
  ProActiveResults = MetagenomeProActive
)

MetagenomeGeneMatches <- geneAnnotationSearch(
  ProActiveResults = MetagenomeProActive,
  pileup = sampleMetagenomePileup,
  gffTSV = sampleMetagenomegffTSV,
  geneOrProduct = "product",
  keyWords = c("transport", "chemotaxis")
)

## Genome mode
GenomeProActive <- ProActiveDetect(
  pileup = sampleGenomePileup,
  mode = "genome",
  gffTSV = sampleGenomegffTSV
)

GenomePlots <- plotProActiveResults(
  pileup = sampleGenomePileup,
  ProActiveResults = GenomeProActive
)

GenomeGeneMatches <- geneAnnotationSearch(
  ProActiveResults = GenomeProActive,
  pileup = sampleGenomePileup,
  gffTSV = sampleGenomegffTSV,
  geneOrProduct = "product",
  keyWords = c("ribosomal"),
  inGapOrElev = TRUE,
  bpRange = 5000
)
```