Server setup instructions for the Stenglein lab taxonomic assessment pipeline

This is the pipeline used in the Stenglein lab to taxonomically classify sequences in NGS datasets.

The goal of this document is to describe how to set up a computer to run this pipeline.

Conda installation

The dependencies for this pipeline (see below) can be installed via conda. The conda recipe file for 64-bit Linux is here. After downloading this recipe file, create a conda environment with this command:

conda env create --name taxonomy --file taxo_recipe.yaml

Dependencies

The main dependencies of this pipeline, all included in the conda environment described above, are:

fastqc
bowtie2
cutadapt
blast
diamond
spades

The pipeline also uses some Perl modules and the scripts in this repository.
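A quick way to confirm that the environment is active and the tools are on your PATH is a small shell check like the one below. It is only a sketch: the executable names are assumptions about what each package installs (BLAST+ provides blastn and blastx, SPAdes provides spades.py), and it does not check the Perl modules.

```shell
#!/usr/bin/env bash
# Report which pipeline executables are missing from PATH.
# Executable names are assumptions: BLAST+ installs blastn/blastx,
# SPAdes installs spades.py. Run this after `conda activate taxonomy`.
check_deps() {
  local missing=0 tool
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found:   $tool"
    else
      echo "MISSING: $tool"
      missing=$((missing + 1))
    fi
  done
  # Exit status is non-zero if anything was missing
  [ "$missing" -eq 0 ]
}

check_deps fastqc bowtie2 cutadapt blastn blastx diamond spades.py \
  || echo "Some dependencies are missing; is the taxonomy environment active?"
```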

The pipeline also expects local installations of the NCBI nt/nr databases, as well as the NCBI taxonomy database for accession->taxid mapping.

Installing NCBI databases

The pipeline needs databases of nucleotide (nt) and protein (nr) sequences from NCBI. These databases are large: expect them to take up roughly 1 TB of disk space, and they continue to grow.
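Before downloading, it is worth checking that the target filesystem has room. The sketch below compares free space against a 1 TiB threshold; both the threshold and the DB_DIR variable are assumptions, so raise the number as the databases grow.

```shell
#!/usr/bin/env bash
# Check whether the filesystem holding a directory has at least MIN_KB free.
# -P gives one POSIX-format line per filesystem; -k reports 1024-byte blocks,
# so column 4 of the second line is the available space in KiB.
has_free_space() {
  local dir="$1" min_kb="$2" avail_kb
  avail_kb=$(df -Pk "$dir" | awk 'NR == 2 {print $4}')
  echo "$dir: ${avail_kb} KiB available, ${min_kb} KiB wanted"
  [ "$avail_kb" -ge "$min_kb" ]
}

DB_DIR="${DB_DIR:-.}"               # hypothetical database directory
ONE_TIB_KB=$((1024 * 1024 * 1024))  # 1 TiB expressed in KiB (assumed threshold)

has_free_space "$DB_DIR" "$ONE_TIB_KB" \
  || echo "Warning: less than 1 TiB free under $DB_DIR"
```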

We download and set up the NCBI databases using the scripts in the server directory of this repository. The main scripts to run are:

Download and process the NCBI taxonomy database

https://github.com/stenglein-lab/taxonomy_pipeline/blob/master/server/fetch_NCBI_Taxonomy_db
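To illustrate what accession->taxid mapping involves, here is a lookup against one of NCBI's accession2taxid files (for example nucl_gb.accession2taxid.gz). Whether fetch_NCBI_Taxonomy_db downloads exactly these files, and how the pipeline stores them, are assumptions; check the script. The lookup_taxid function is hypothetical and only meant for spot checks.

```shell
#!/usr/bin/env bash
# Print the taxid for one accession.version from an NCBI accession2taxid file.
# Assumed layout (tab-delimited, one header line):
#   accession  accession.version  taxid  gi
# Files ending in .gz are decompressed on the fly.
# Exit status is 0 if the accession was found, 1 otherwise.
lookup_taxid() {
  local acc="$1" file="$2"
  if [ "${file%.gz}" != "$file" ]; then
    gzip -cd -- "$file"
  else
    cat -- "$file"
  fi | awk -F'\t' -v acc="$acc" '
    NR > 1 && $2 == acc { print $3; found = 1; exit }
    END { exit !found }'
}

# Example usage (ACCESSION.1 is a placeholder, not a real accession):
#   lookup_taxid ACCESSION.1 nucl_gb.accession2taxid.gz
```

This scans the file linearly, which is fine for checking a handful of accessions but far too slow for bulk classification of multi-gigabyte mapping files.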

Download and process the NCBI nt and nr databases

https://github.com/stenglein-lab/taxonomy_pipeline/blob/master/server/fetch_nr_nt
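After fetch_nr_nt finishes, a sanity check along these lines can confirm that BLAST+ can open the databases. It assumes the script leaves BLAST-formatted databases named nt and nr in one directory; DB_DIR and those names are hypothetical placeholders, so adjust them to match the script. If nr is only used as a DIAMOND database, this check does not apply to it. blastdbcmd ships with BLAST+, and its -info option prints a summary of a database.

```shell
#!/usr/bin/env bash
# Report whether BLAST+ can open a database.
# Returns 0 if it can, 1 if it cannot, 2 if blastdbcmd is not on PATH.
check_blastdb() {
  local db="$1"
  if ! command -v blastdbcmd >/dev/null 2>&1; then
    echo "blastdbcmd not found; is the taxonomy environment active?" >&2
    return 2
  fi
  if blastdbcmd -db "$db" -info >/dev/null 2>&1; then
    echo "OK: $db"
  else
    echo "cannot open BLAST database: $db" >&2
    return 1
  fi
}

# Hypothetical location and names; adjust to match what fetch_nr_nt produced.
DB_DIR="${DB_DIR:-/path/to/ncbi_dbs}"
for db in nt nr; do
  check_blastdb "$DB_DIR/$db" || true
done
```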
