-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathREADME_for_developers
119 lines (66 loc) · 2.78 KB
/
README_for_developers
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
DiscoVir: virome pipeline
Files for running pipeline in LOCUS
• Snakefile: pipeline script (reads in configs, commands for each pipeline step/rule)
• cluster_setup.smk: helper script for reading in cluster config file
• project_config.yaml: for snakemake --configfile option. config file with details for a specific project - working/input/output directories, path to scripts and other configs, sample names, options for specific rules in the pipeline, etc.
• locus.cluster_config.yaml: cluster configuration file for snakemake --cluster-config option. specifically for NIAID Locus HPC which uses UGE. (sets parameters for qsub command for each rule's job, and which environment modules to use)
• locus_submit_vp.sh: batch job submit script for running the pipeline on Locus
• scripts: see scripts README
Packages and versions (also found in locus_config.yaml) for all rules in pipeline
rule genomad:
• genomad/1.5.2
• genomad database (genomaddb): /hpcdata/bio_data/genomad/genomad_db/
version 1.3 is what we are currently using. There is a updated version (1.5).
The databases can be download directly from https://zenodo.org/records/8339387
or with the command below, but that might be the most updated version. (We should maybe update)
```
genomad download-database .
```
• https://portal.nersc.gov/genomad/installation.html
rule verse_genomad:
• Verse/0.1.5
• Python/3.9.5-GCCcore-10.3.0
• https://github.com/qinzhu/VERSE
rule checkV
checkV
• checkv/0.8.1
• Python/2.7.15-foss-2018b
• checkv database (checkvdb): /hpcdata/bio_data/checkv/checkv-db-v1.5
• https://bitbucket.org/berkeleylab/checkv/src/master/
rule checkv_filter
Seqtk
• Seqtk/1.3.r106
• https://github.com/lh3/seqtk
rule bbtools_dedupe
BBmap
• bbmap/38.90
• https://jgi.doe.gov/data-and-tools/software-tools/bbtools/bb-tools-user-guide/installation-guide/
rule mmseqs
• mmseqs2/14.7e284
• Python/3.9.5-GCCcore-10.3.0
• https://github.com/soedinglab/MMseqs2
rule vOTU
-Python/3.9.5-GCCcore-10.3.0
rule vs4dramv
Virsorter2
• package: virsorter/2.2.3-Python-3.8.10
• https://github.com/jiarong/VirSorter2
rule dramv and rule amgs
rule dramv and rule amgs
• dram/1.4.6_ms
• dram database: /hpcdata/bcbb/shared/microbiome_share/modules/dram/1.4.6_ms
• https://github.com/WrightonLabCSU/DRAM
rule verse_dramv
- Python/3.9.5-GCCcore-10.3.0
- liftoff/1.6.3-Python-3.9.12
rule iphop
• iphop/1.3.0
• iphop database: /hpcdata/bio_data/iphop/Sept_2021_pub_rw/
• https://bitbucket.org/srouxjgi/iphop/src/main/
rule diamond
• diamond/2.0.15.153
• diamond database (diamonddb): /hpcdata/bio_data/DIAMOND_ncbi_db/nr.diamond.dmnd
Database version is diamond v0.9.28.129
• https://github.com/bbuchfink/diamond
rule gene_tables
- Python/3.9.5-GCCcore-10.3.0