This tool helps with processing multiomics sequencing data including metagenomics, metatranscriptomics, and metatranslatomics data. There are three typical use cases of this pipeline:
(1) To analyze all omics at the same time, run mopp workflow --help
(2) To analyze each omic separately, run mopp metag --help, etc.
(3) To run a single analysis step, run mopp trim --help, etc.
The culmination of the pipelines is the creation of a feature count table using Woltka.
Different use cases may have more requirements, but for every use case, the following are necessary:
- An input folder with your sequencing data
- A tab-delimited metadata file that describes the sequencing data you would like processed and follows this template. Avoid . in file names except before the suffix. Avoid _Rxx in file names unless it indicates the strand (e.g. _R1, _R2). If any data is not meant to be paired, put 'R1' in the strand column.
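The two naming rules above can be checked before a run. The following is a minimal sketch, not part of mopp: check_name is a hypothetical helper, and the list of read-file suffixes it strips (.fastq.gz, .fq.gz, .fastq, .fq) is an assumption to adjust to your data.

```shell
# check_name is a hypothetical helper, not a mopp command.
check_name() {
    name=$(basename "$1")
    stem=$name
    # Strip an assumed set of read-file suffixes (adjust to your data).
    for suffix in .fastq.gz .fq.gz .fastq .fq; do
        case $name in
            *"$suffix") stem=${name%"$suffix"}; break ;;
        esac
    done
    # Rule 1: no '.' may remain once the suffix is removed.
    case $stem in
        *.*) echo "bad: '.' outside the suffix"; return 1 ;;
    esac
    # Rule 2: at most one _R<digits> token (the strand marker).
    count=$(printf '%s\n' "$stem" | grep -oE '_R[0-9]+' | wc -l | tr -d ' ')
    if [ "$count" -gt 1 ]; then
        echo "bad: more than one _R<n> token"
        return 1
    fi
    echo "ok"
}

check_name sample1_R1.fq.gz              # prints "ok"
check_name sample.v2_R1.fq.gz || true    # prints "bad: '.' outside the suffix"
```

Running the helper over every file in the input folder before starting a long job catches naming problems early.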
MOPP works with Python >= 3.8 and Linux-based web servers. To install mopp using pip, run the following commands:
conda create -n mopp python=3.8
conda activate mopp
conda install -c conda-forge -c bioconda bowtie2 trim-galore woltka
pip3 install mopp
To install the most up-to-date version of mopp, run the following commands:
git clone https://github.com/sherlyn99/mopp.git
cd mopp
conda env create -f mopp.yml -n mopp
conda activate mopp
pip install -e .
Alternatively, use mamba for faster installation:
git clone https://github.com/sherlyn99/mopp.git
cd mopp
conda install mamba -n base -c conda-forge # skip if mamba is already installed
mamba env create -f mopp.yml
mamba activate mopp
pip install -e .
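After installing, it can help to confirm that the command-line tools are reachable from the active environment. This is a small sketch under stated assumptions: missing_tools is a hypothetical helper, and the executable names (mopp, bowtie2, trim_galore, woltka) are assumed from the packages installed above.

```shell
# missing_tools is a hypothetical helper, not part of mopp.
# It prints every name from its arguments that is not found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Executable names assumed from the conda and pip packages installed above.
missing=$(missing_tools mopp bowtie2 trim_galore woltka)
if [ -z "$missing" ]; then
    echo "all tools found"
else
    echo "not on PATH:" $missing
fi
```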
The Web of Life (WoL) Database is used by MOPP as a reference for microbe phylogenies. This database is not included in the distribution and must be downloaded independently here. You can use the following command to download the required database files.
# make sure you are in mopp directory
mkdir mopp_db
cd mopp_db
wget --no-check-certificate -nH -np -r --cut-dirs=1 https://ftp.microbio.me/pub/wol-20April2021/ --reject="index.html*"
Create a bowtie2 index of the WoL database by running the following commands.
cd wol-20April2021
mkdir -p databases/bowtie2
xzcat genomes/concat.fna.xz > /tmp/input.fna
bowtie2-build --seed 42 /tmp/input.fna databases/bowtie2/WoLr1
rm /tmp/input.fna
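Building the index can take a long time, so it is worth confirming that it finished. A bowtie2 index consists of six files per prefix, with the extension .bt2 (or .bt2l for large indexes). The sketch below checks for them; index_complete is a hypothetical helper, not part of mopp or bowtie2.

```shell
# index_complete is a hypothetical helper, not part of mopp or bowtie2.
# It succeeds if all six files of a bowtie2 index exist for the given prefix.
index_complete() {
    prefix=$1
    for ext in bt2 bt2l; do
        ok=1
        for part in 1 2 3 4 rev.1 rev.2; do
            [ -f "$prefix.$part.$ext" ] || ok=0
        done
        [ "$ok" -eq 1 ] && return 0
    done
    return 1
}

# Run from inside wol-20April2021, where the index was built.
if index_complete databases/bowtie2/WoLr1; then
    echo "index looks complete"
else
    echo "index files missing"
fi
```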
MOPP uses micov for calculating genome coverages and filtering genomes based on coverage thresholds. This library is included in the conda environment. Learn more about it here.
usage: mopp workflow -i <Input Directory> -o <Output Directory> -m <Metadata (tsv)> -x <Index> -t <Num Threads> -l <Genome Lengths Path> -c <Cutoff> -ref <Reference Database> -p <Index Prefix>
example:
mopp workflow -i ./test/data \
-o ./test/data/out3/ \
-m ./test/data/metadata.tsv \
-x ./test/data/wol_subset_index/wol_subset0.1_index \
-t 4 \
-l /home/y1weng/genome_lengths.tsv \
-c 0.1 \
-ref ./test/data/wol_subset_index/wol_above10.concat.fna \
-p myTest \
-db /panfs/y1weng/01_woltka_db/wol1/wol-20April2021 \
-strat \
-r genus,species
This is the central tool of MOPP; it analyzes all omics at the same time. Using the metadata you provide, it processes metagenomic, metatranscriptomic, and metatranslatomic data to produce count tables that classify by genus and stratify species by UniRef annotation.
The workflow takes the following steps:
- Trim adapters off of all sequencing files.
- Align metagenomic files to the entire Web of Life Database and calculate genome coverages. A subset index of the Web of Life Database is created based on a desired genome coverage threshold.
- Align metagenomic, metatranscriptomic, and metatranslatomic data to the subset index.
- Generate genus classification and Species|Uniref stratification count tables using Woltka.
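The steps above roughly correspond to chaining the individual subcommands documented in this README. The following dry run only prints the commands it would issue; it is a sketch under stated assumptions, not a description of what mopp workflow does internally. All paths are hypothetical, the location of the generated subset index is a guess, and the samfiles and coverages.tsv names are taken from the examples in this document.

```shell
# Hypothetical locations; replace with your own.
IN=./test/data
OUT=./test/data/out3
META=./test/data/metadata.tsv
FULL_INDEX=databases/bowtie2/WoLr1    # index built from the full WoL
SUBSET_INDEX=$OUT/index/myTest        # assumed location of the subset index
LENGTHS=genome_lengths.tsv
REF=wol.concat.fna
WOL_DB=wol-20April2021

# run only prints the command; change its body to "$@" to execute for real.
run() { printf '+ %s\n' "$*"; }

plan_workflow() {
    run mopp trim -i "$IN" -o "$OUT/trimmed" -m "$META"
    # First pass: the workflow aligns only the metagenomic files here.
    run mopp align -i "$OUT/trimmed" -o "$OUT/aligned_full" -x "$FULL_INDEX" -t 4
    run mopp cov -i "$OUT/aligned_full/samfiles" -o "$OUT/cov" -l "$LENGTHS"
    run mopp generate-index -i "$OUT/cov/coverages.tsv" -o "$OUT/index" -c 0.1 -ref "$REF" -p myTest
    # Second pass: all omics against the subset index.
    run mopp align -i "$OUT/trimmed" -o "$OUT/aligned" -x "$SUBSET_INDEX" -t 4
    run mopp feature-table -i "$OUT/aligned/samfiles" -o "$OUT/features" -db "$WOL_DB" -strat -r genus,species
}

plan_workflow
```

Printing the plan first makes it easy to review paths before committing to the two alignment passes, which are the slow steps.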
usage: mopp trim -i <Input Directory> -o <Output Directory> -m <Metadata (tsv)>
example:
mopp trim -i ./test/data \
-o ./test/data/out3/cat \
-m ./test/data/metadata.tsv
mopp trim trims sequencing data provided in the input directory. The metadata indicates which type of data it is (metaG, metaT, or metaRS) so that optimal trimming parameters can be selected case-by-case.
usage: mopp align -i <Input Directory> -o <Output Directory> -m <Metadata (tsv)> -p <Pattern> -x <Index> -t <Num Threads>
example:
mopp align -i ./test/data/out3/cat \
-p *.fq.gz \
-o ./test/data/out3/aligned \
-x ./test/data/wol_subset_index/wol_subset0.1_index \
-t 4
mopp align aligns the sequencing data provided in the input directory to the reference index. Providing a file pattern with -p restricts the run to files whose names match that pattern. Allocating more threads to this command with -t can reduce processing time.
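One shell detail is worth knowing when passing a pattern to -p. If mopp align is meant to receive the pattern itself (an assumption; this README does not say), an unquoted glob can be expanded by the shell first whenever matching files exist in the current directory. The sketch below demonstrates the difference with a hypothetical count_args helper and throwaway files.

```shell
# count_args is a hypothetical helper that reports how many arguments it received.
count_args() { echo "$#"; }

demo=$(mktemp -d)
: > "$demo/a.fq.gz"
: > "$demo/b.fq.gz"

# Unquoted: the shell replaces the glob with the two matching file names.
( cd "$demo" && count_args -p *.fq.gz )      # prints 3
# Quoted: the pattern reaches the program unchanged.
( cd "$demo" && count_args -p '*.fq.gz' )    # prints 2

rm -r "$demo"
```

Quoting the pattern, as in -p '*.fq.gz', gives the same result regardless of what is in the working directory.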
usage: mopp cov -i <Input Directory> -o <Output Directory> -m <Metadata (tsv)> -l <Genome Lengths Path>
example:
mopp cov -i ./test/data/out3/aligned/samfiles \
-o ./test/data/out3/cov \
-l /home/y1weng/genome_lengths.tsv
mopp cov uses micov's compress to produce a spreadsheet with calculated genome coverages. This is essential for selecting an optimal coverage threshold when generating a subset index.
usage: mopp generate-index -i <Input Coverage> -o <Output Directory> -c <Cutoff> -ref <Reference Database> -p <Prefix>
example:
mopp generate-index -i ./test/data/out2/cov/coverages.tsv \
-c 0.1 \
-ref ./test/data/wol_subset_index/wol_above10.concat.fna \
-o ./test/data/out3/index \
-p myTest
mopp generate-index creates a subset index from a larger database, given a cutoff threshold. For example, -c 0.2 would generate a subset that only contains genomes with 20% or greater coverage.
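To make the cutoff concrete, the sketch below applies a 0.2 threshold to a tiny made-up coverage table. The two-column layout (genome ID, covered fraction between 0 and 1) and the genome IDs are assumptions for illustration; the real coverages.tsv may be laid out differently. coverage_table and keep_genomes are hypothetical helpers.

```shell
# Hypothetical coverage table: genome ID and covered fraction (0 to 1).
# The real layout of coverages.tsv may differ.
coverage_table() {
    printf 'genome_id\tcoverage\n'
    printf 'G000001\t0.35\n'
    printf 'G000002\t0.05\n'
    printf 'G000003\t0.20\n'
}

# keep_genomes CUTOFF: print the IDs whose coverage is at least CUTOFF.
keep_genomes() {
    awk -F '\t' -v c="$1" 'NR > 1 && $2 + 0 >= c + 0 { print $1 }'
}

coverage_table | keep_genomes 0.2    # prints G000001 and G000003
```

Trying a few cutoffs this way on the real coverage output shows how many genomes each threshold would keep before an index is built.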
usage: mopp feature-table -i <Input Directory> -o <Output Directory> -db <Woltka Database> -strat <Stratification>
example:
mopp feature-table -i ./test/data/out3/aligned/samfiles \
-o ./test/data/out2/features \
-db /panfs/y1weng/01_woltka_db/wol1/wol-20April2021 \
-strat \
-r genus,species
mopp feature-table is the culmination of the processing pipeline. Given the processed sequencing files, the command produces a feature count table using the Woltka database. Stratification options include species, genus, and genus,species coming soon.