server_setup_instructions.md

File metadata and controls

40 lines (25 loc) · 2.16 KB

Server setup instructions for the Stenglein lab taxonomic assessment pipeline

This is the pipeline used in the Stenglein lab to taxonomically classify sequences in NGS datasets.

This document describes how to set up a computer to run this pipeline.

Conda installation

The dependencies for this pipeline (see below) can be installed via conda. The conda recipe file for 64-bit Linux is here. After downloading this recipe file, create a conda environment with this command:

conda create --name taxonomy --file taxo_recipe.yaml
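After creating the environment, it can be useful to confirm that conda actually registered it before trying to run the pipeline. The following is a minimal sketch, not part of the pipeline itself: the helper name `env_listed` is hypothetical, and it assumes that `conda env list` prints one environment per line with the environment name in the first column.

```shell
#!/usr/bin/env bash
# Sketch: check that an environment named "taxonomy" shows up in `conda env list`.
# env_listed is a hypothetical helper, not a script from this repository.

# Reads `conda env list`-style output on stdin and succeeds (exit 0)
# only if some line has the given name in its first column.
env_listed() {
    awk -v name="$1" '$1 == name {found=1} END {exit !found}'
}

# Only query conda if it is installed and on the PATH.
if command -v conda >/dev/null 2>&1; then
    if conda env list | env_listed taxonomy; then
        echo "Found conda environment: taxonomy"
    else
        echo "Conda environment 'taxonomy' not found" >&2
    fi
fi
```

Once the environment is listed, it can be activated with `conda activate taxonomy`.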

Dependencies

This pipeline has several main dependencies (included in the conda environment described above), including:

The pipeline also depends on some Perl modules and the scripts in this repository.

The pipeline also expects local installations of the NCBI nt/nr databases, as well as the NCBI taxonomy database for accession->taxid mapping.
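To illustrate what accession->taxid mapping means in practice, here is a minimal sketch of a lookup against a tab-delimited mapping file. It assumes the column layout of NCBI's `*.accession2taxid` files (accession, accession.version, taxid, gi, with one header line). Whether the pipeline reads these files directly or a processed form of them is not stated here, and `lookup_taxid` is a hypothetical helper, not one of the pipeline's scripts.

```shell
#!/usr/bin/env bash
# Sketch: look up the taxid for one accession.version in a mapping file.
# Assumed columns (tab-delimited, with header): accession, accession.version, taxid, gi.
# lookup_taxid is a hypothetical helper, not a script from this repository.

# Usage: lookup_taxid <accession.version> <mapping_file>
# Prints the taxid of the first matching record, or nothing if there is no match.
lookup_taxid() {
    awk -F '\t' -v acc="$1" 'NR > 1 && $2 == acc {print $3; exit}' "$2"
}
```

For example, `lookup_taxid NC_045512.2 nucl_gb.accession2taxid` would scan the file linearly for that accession. A linear scan is slow on the full NCBI files, so this is only meant to show the shape of the mapping.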

Installing NCBI databases

The pipeline needs databases of nucleotide (nt) and protein (nr) sequences from NCBI. These databases are large: expect them to take up roughly 1 TB of disk space, and they continue to grow.
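Given the size of these databases, it may be worth checking free disk space before starting a download. The sketch below is one way to do that. The helper name `has_free_space` and the variables `DB_DIR` and `REQUIRED_GB` are assumptions made for illustration, not settings used by the pipeline's scripts.

```shell
#!/usr/bin/env bash
# Sketch: warn if the target filesystem has less free space than expected.
# DB_DIR and REQUIRED_GB are illustrative assumptions, not pipeline settings.

# Usage: has_free_space <directory> <required_gb>
# Succeeds if the filesystem holding <directory> has at least <required_gb> GB available.
has_free_space() {
    local dir="$1" required_gb="$2" avail_kb
    # df -Pk prints POSIX-format output in 1024-byte blocks;
    # the fourth column of the second line is the available space.
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 {print $4}')
    [ "$avail_kb" -ge $(( required_gb * 1024 * 1024 )) ]
}

DB_DIR="${DB_DIR:-.}"               # where the databases would be stored
REQUIRED_GB="${REQUIRED_GB:-1024}"  # roughly 1 TB, per the estimate above

if has_free_space "$DB_DIR" "$REQUIRED_GB"; then
    echo "OK: at least ${REQUIRED_GB} GB free under ${DB_DIR}"
else
    echo "WARNING: less than ${REQUIRED_GB} GB free under ${DB_DIR}" >&2
fi
```

Because the databases keep growing, leaving headroom beyond the current size is a sensible choice when setting `REQUIRED_GB`.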

We download and set up the NCBI databases using the scripts in the server directory of this repository. The main scripts to run are:

Download and process the NCBI taxonomy database

Download and process the NCBI nt and nr databases