WGS2AMR

This pipeline was built to predict antimicrobial resistance (AMR) to 8 common antibiotics in 5 different Gram-negative species based on whole-genome sequencing data.

Installation of dependencies

This project was tested with all dependencies installed on a Linux-based machine.

DIAMOND

This tool is used to align the bacterial sequencing files (fastq format) to a precompiled database of antimicrobial resistance genes. To install DIAMOND, follow the instructions on its GitHub page.

The database to align to is provided in this project and does not need to be compiled.

R

Processing the output of the DIAMOND aligner and predicting AMR is done in R. This project was developed with R 3.5.0, so that version or any later one should work (older versions are expected to work as well). After installing R on your machine, install the following packages before running the pipeline:

install.packages(c("tibble", "dplyr", "seqinr", "stringr", "purrr", "DescTools", "xgboost"))
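
To see which of these packages are still missing, a check along the following lines can help. This is a minimal sketch and not part of the project: the function name `missing_r_packages` is hypothetical, and it assumes you pass the same Rscript path you would give to the `-r` argument (or have `Rscript` on your `PATH`).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of wgs2amr): list the required R packages that
# are not installed yet. $1 is the Rscript executable (default: Rscript on the PATH).
missing_r_packages() {
  local rscript="${1:-Rscript}"
  "$rscript" \
    -e 'pkgs <- c("tibble", "dplyr", "seqinr", "stringr", "purrr", "DescTools", "xgboost")' \
    -e 'cat(setdiff(pkgs, rownames(installed.packages())), sep = "\n")'
}

# Example: prints one package name per line, or nothing if all are installed.
# missing_r_packages /pathToR/bin/Rscript
```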

SRA-Tools (optional)

If you would like to download and process publicly available files directly from NCBI's Sequence Read Archive (SRA), you can install the SRA-Tools.
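
Once the dependencies are installed, a quick check can confirm that the tools are reachable before running the pipeline. The following is a minimal sketch, not part of the project: the function name `check_tools` is hypothetical, and each argument is either a command name on your `PATH` or the full path to the executable.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of wgs2amr): report which tools can be found.
# Returns 0 if all tools were found and 1 if at least one is missing.
check_tools() {
  local missing=0
  local tool
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found:   $tool"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# Example (fastq-dump is only needed when downloading from SRA):
# check_tools diamond Rscript fastq-dump
```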

Input

The models were created to make predictions for the following bacterial species: Acinetobacter baumannii, Enterobacter cloacae, Escherichia coli, Klebsiella aerogenes and Klebsiella pneumoniae. They were trained to be species independent and could theoretically make predictions for other Gram-negative species, but they have not been validated for those thus far.

The input files should be next-generation sequencing files in either fastq or compressed fastq.gz format. The reads can be in a single file, or in two separate files in the case of paired-end reads. The testFiles folder in the wgs2amr folder contains dummy files that can be used to test the pipeline once all dependencies are installed.
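
Before starting a run, it can be worth checking that a file really looks like fastq input. The sketch below is not part of the project and the function name `is_fastq` is hypothetical. It only checks the file extension and that the first character is `@`, which starts every FASTQ record header, so it is a sanity check rather than a full format validation.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of wgs2amr): basic sanity check of a reads file.
# Succeeds if the file is readable, has a .fastq or .fastq.gz extension, and
# its first character is "@".
is_fastq() {
  local file="$1"
  local first
  [ -r "$file" ] || return 1
  case "$file" in
    *.fastq.gz) first="$(gzip -cd < "$file" 2>/dev/null | head -c 1)" ;;
    *.fastq)    first="$(head -c 1 < "$file")" ;;
    *)          return 1 ;;
  esac
  [ "$first" = "@" ]
}

# Example:
# is_fastq /pathTo/wgs2amr/testFiles/testFile.fastq.gz && echo "looks like fastq"
```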

It is also possible to download and process files directly from NCBI's Sequence Read Archive (SRA) if the SRA-Tools have been installed on your machine and you know the SRR number of the sample of interest. The pipeline can process files of different sequencing depths, but shallow sequencing is more likely to miss important resistance genes, so the error rate could be higher in such cases.

Running the pipeline

Download the project folder

Download the wgs2amr folder from this GitHub page and put it on the same machine where you installed the dependencies.

Call the wgs2amr.sh script - Local files

The wgs2amr folder contains the master script wgs2amr.sh, which can be called with the following arguments:

-d : Path to the diamond script in the diamond folder
-r : Either the path to the Rscript in the R bin folder, or the name of the R module to load on a Linux machine (e.g. R/3.5.0)
-f : The first sequence file. If there is only one sequence file, set this argument. Both fastq and fastq.gz are supported
-s : The second sequence file in the case of two paired-end reads files. Again, both fastq and fastq.gz are supported. If there is only one file, this argument can be omitted
-o : The location of the output folder where the prediction results in csv format will be stored. If not set, this will default to the RESULTS folder within the wgs2amr folder

Here are some examples of a typical wgs2amr.sh call using the test data provided in the wgs2amr folder:

#Using two paired-end reads input files and the path to the Rscript
/pathToScript/wgs2amr.sh \
-d '/pathToDiamond/diamond' \
-r '/pathToR/bin/Rscript' \
-f '/pathTo/wgs2amr/testFiles/readsFile1.fastq.gz' \
-s '/pathTo/wgs2amr/testFiles/readsFile2.fastq.gz'

#Using one combined reads file and the name of the R module
/pathToScript/wgs2amr.sh \
-d '/pathToDiamond/diamond' \
-r 'R/3.5.0' \
-f '/pathTo/wgs2amr/testFiles/testFile.fastq.gz'
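
When several samples need to be processed, the calls above can be generated in a loop. The following is a sketch under stated assumptions and not part of the project: the function name `print_wgs2amr_calls` is hypothetical, and it assumes paired files are named `<sample>_1.fastq.gz` and `<sample>_2.fastq.gz`, which may not match your own naming scheme. It only prints the commands so they can be reviewed first.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of wgs2amr): print one wgs2amr.sh call per sample.
# Assumes paired files are named <sample>_1.fastq.gz and <sample>_2.fastq.gz;
# adjust the pattern to your own naming scheme.
# Usage: print_wgs2amr_calls <readsDir> <wgs2amr.sh> <diamond> <Rscript or R module>
print_wgs2amr_calls() {
  local reads_dir="$1" script="$2" diamond="$3" r="$4"
  local first second
  for first in "$reads_dir"/*_1.fastq.gz; do
    [ -e "$first" ] || continue   # nothing matched: skip the literal pattern
    second="${first%_1.fastq.gz}_2.fastq.gz"
    if [ -e "$second" ]; then
      echo "$script -d '$diamond' -r '$r' -f '$first' -s '$second'"
    else
      # No second file found: treat the sample as a single reads file.
      echo "$script -d '$diamond' -r '$r' -f '$first'"
    fi
  done
}

# Example:
# print_wgs2amr_calls /pathTo/reads /pathToScript/wgs2amr.sh /pathToDiamond/diamond R/3.5.0
```

Once the printed commands look right, the output can be piped to `bash` to run them one after another.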

Call the wgs2amr.sh script - Download data from SRA

This will download the read files of the specified SRR from NCBI's SRA and put them in the wgs2amr/downloads/ folder. If the files have already been downloaded, this step is skipped to save time and bandwidth. Should a file be corrupt and lead to errors, simply delete both read files from the folder and run the script again, which will start a new download.

-n : Use the fastq-dump tool of NCBI's SRAToolkit to download an SRR file. Set this parameter to the path of the fastq-dump script or the name of the SRAToolkit module to load on a Linux machine.
-d : Path to the diamond script in the diamond folder
-r : Either the path to the Rscript in the R bin folder, or the name of the R module to load on a Linux machine (e.g. R/3.5.0)
-f : The SRR number you would like to download and process.
-o : The location of the output folder where the prediction results in csv format will be stored. If not set, this will default to the RESULTS folder within the wgs2amr folder

#Downloading data from NCBI's SRA then processing it
/pathToScript/wgs2amr.sh \
-n '/pathToSRAtoolkit/fastq-dump' \
-d '/pathToDiamond/diamond' \
-r 'R/3.5.0' \
-f 'SRR4017839'
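
If a download turns out to be corrupt, removing its read files can be scripted as below. This is a sketch and not part of the project: the function name `remove_srr_download` is hypothetical, and it assumes that the files in the downloads folder start with the SRR number (for example `SRR4017839_1.fastq`). Check the actual file names in your folder before relying on it.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of wgs2amr): delete the downloaded read files of
# one SRR so that the next run downloads them again.
# Assumption: the file names start with the SRR number, followed by "_" or ".".
# Usage: remove_srr_download <downloadsDir> <SRR number>
remove_srr_download() {
  local dir="$1" srr="$2"
  # Refuse anything that is not an SRR number, so the glob cannot match everything.
  case "$srr" in
    SRR[0-9]*) ;;
    *) echo "not an SRR number: $srr" >&2; return 1 ;;
  esac
  rm -f -- "$dir/$srr"[_.]*
}

# Example:
# remove_srr_download /pathTo/wgs2amr/downloads SRR4017839
```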

Note: If an output folder is specified, make sure the path ends with a trailing slash (/).
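
To avoid mistakes with the end of the output path, the path can be normalized before it is passed to `-o`. The following is a small sketch, not part of the project: the function name `with_trailing_slash` is hypothetical, and it assumes a Linux-style path that should end with exactly one `/`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of wgs2amr): make a folder path end with exactly one "/".
with_trailing_slash() {
  local path="$1"
  # Strip all trailing slashes, then add a single one back.
  while [ "${path%/}" != "$path" ]; do
    path="${path%/}"
  done
  echo "$path/"
}

# Example: the result can be passed to wgs2amr.sh as -o "$out_dir"
out_dir="$(with_trailing_slash '/pathTo/myResults')"
```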

Output

The output file is named after the first sequence file, in the format <filename>_AMRpredictions.csv, and can be found in the default or specified output folder. The first column contains the names of the 8 antibiotics for which predictions were made, the second column the predicted susceptibility (susceptible or resistant), and the last column the reliability, which gives an indication of how confident the model is that the prediction is correct (0 = very uncertain, 1 = near certain).

Example of the test file output:

antibiotic	prediction	reliability
cefepime	resistant	0.70
cefotaxime	resistant	0.78
ceftriaxone	resistant	0.98
ciprofloxacin	resistant	0.96
gentamicin	susceptible	0.88
levofloxacin	resistant	0.96
meropenem	susceptible	0.40
tobramycin	resistant	0.66
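
Predictions with a low reliability deserve a closer look. The sketch below lists them and is not part of the project: the function name `low_reliability` is hypothetical, the default threshold of 0.5 is an arbitrary choice, and it assumes the file is comma-separated with a header row and the three columns described above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of wgs2amr): list predictions whose reliability is
# below a threshold. Assumes a comma-separated file with a header row and the
# columns antibiotic, prediction, reliability.
# Usage: low_reliability <predictions.csv> [threshold, default 0.5]
low_reliability() {
  local file="$1" threshold="${2:-0.5}"
  awk -F ',' -v t="$threshold" '
    NR == 1 { next }               # skip the header row
    {
      gsub(/"/, "")                # drop quotes that some CSV writers add
      if (NF < 3) next             # ignore blank or incomplete lines
      if ($3 + 0 < t + 0) print $1 ": " $2 " (reliability " $3 ")"
    }
  ' "$file"
}

# Example:
# low_reliability /pathTo/wgs2amr/RESULTS/testFile_AMRpredictions.csv 0.6
```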

NOTE: Temporary files, such as the DIAMOND alignment files, are stored in the wgs2amr/temp folder and can safely be removed once the pipeline has finished.
