We used Guppy version 5.0.7 to basecall the fast5 files produced by the MinION. It was run on the GPU partition.
Script for Guppy
#!/bin/bash -e
#SBATCH --job-name=Guppy_EW
#SBATCH --account=uoo02752
#SBATCH --time=10:00:00
#SBATCH --partition=gpu #Guppy runs faster on NeSI's GPU partition than on the CPU partitions
#SBATCH --gres=gpu:1 #GPU configuration suggested by NeSI support
#SBATCH --mem=6G
#SBATCH --ntasks=4
#SBATCH --cpus-per-task=1
#SBATCH --output=%x-%j.out
#SBATCH --error=%x-%j.err
#SBATCH --mail-type=ALL
#SBATCH [email protected]
module load ont-guppy-gpu/5.0.7
# --recursive searches subfolders for fast5 files as well
# --disable_qscore_filtering writes all reads to a single folder instead of pass/fail; QC is done later
guppy_basecaller -i ../path/to/all/fast5/files/ \
-s ./path/to/output/directory/ \
--flowcell FLO-MIN106 \
--kit SQK-LSK109 \
--num_callers 4 -x auto \
--recursive \
--trim_barcodes --disable_qscore_filtering
We then concatenated all the FASTQ files produced into a single file, EW_nanopore_merged.fastq.
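A minimal sketch of that concatenation step, assuming the Guppy output directory from the script above (with --disable_qscore_filtering all FASTQ files end up in one folder):
# Merge all per-batch FASTQ files written by Guppy into one file
cat ./path/to/output/directory/*.fastq > EW_nanopore_merged.fastq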
We used PycoQC to check the quality of our long-read data. It produces an interactive HTML file with a detailed description of data quality. It takes the sequencing_summary.txt file generated by Guppy as input.
Script for PycoQC
#!/bin/bash -e
#SBATCH --nodes 1
#SBATCH --cpus-per-task 1
#SBATCH --ntasks 5
#SBATCH --job-name PycoQc
#SBATCH --mem=2G
#SBATCH --time=00:20:00
#SBATCH --account=uoo02752
#SBATCH --output=%x.%j.out
#SBATCH --error=%x.%j.err
#SBATCH --mail-type=ALL
#SBATCH [email protected]
#SBATCH --hint=nomultithread
export PATH="/nesi/nobackup/uoo02752/nematode/bin/miniconda3/bin:$PATH"
# sequencing_summary.txt is produced by Guppy in its output folder
pycoQC -f path/to/sequencing_summary.txt \
-o path/to/output/file/pycoQC_output.html
We used the DNA CS (lambda phage control) during Nanopore sequencing library preparation, so we use NanoLyse to remove lambda DNA from our FASTQ files.
Script for Nanolyse
#!/bin/bash -e
#SBATCH --nodes 1
#SBATCH --cpus-per-task 1
#SBATCH --ntasks 5
#SBATCH --job-name Nanolyse.EW
#SBATCH --mem=1G
#SBATCH --time=01:00:00
#SBATCH --account=uoo02752
#SBATCH --output=%x.%j.out
#SBATCH --error=%x.%j.err
#SBATCH --mail-type=ALL
#SBATCH [email protected]
#SBATCH --hint=nomultithread
module load NanoLyse/1.2.0-gimkl-2020a
cat path/to/EW_nanopore_merged.fastq | NanoLyse | gzip > EW_nanopore_merged_filtered.fastq.gz
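As an optional sanity check (not part of the original scripts; filenames assumed from the steps above), the read counts before and after lambda removal can be compared:
# FASTQ records are 4 lines each, so line count / 4 = read count
echo "before: $(( $(cat path/to/EW_nanopore_merged.fastq | wc -l) / 4 )) reads"
echo "after:  $(( $(zcat EW_nanopore_merged_filtered.fastq.gz | wc -l) / 4 )) reads"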
Guppy only removes adapters attached to the ends of reads, so we use Porechop on top of that to remove any remaining adapters in the middle of reads.
Script for Porechop
#!/bin/bash -e
#SBATCH --nodes 1
#SBATCH --cpus-per-task 1
#SBATCH --ntasks 10
#SBATCH --job-name PorechopEW
#SBATCH --mem=40G
#SBATCH --time=02:00:00
#SBATCH --account=uoo02752
#SBATCH --output=%x.%j.out
#SBATCH --error=%x.%j.err
#SBATCH --mail-type=ALL
#SBATCH [email protected]
#SBATCH --hint=nomultithread
module load Porechop/0.2.4-gimkl-2020a-Python-3.8.2
porechop -i /path/to/EW_nanopore_merged_filtered.fastq.gz -o /path/to/output/EW_nanopore_merged_filtered_porechop.fastq.gz --threads 10
NanoQC produces an HTML report on the quality of the reads.
Script for NanoQC
#!/bin/bash -e
#SBATCH --nodes 1
#SBATCH --cpus-per-task 1
#SBATCH --ntasks 10
#SBATCH --job-name NanoQC.EW
#SBATCH --mem=10G
#SBATCH --time=03:00:00
#SBATCH --account=uoo02752
#SBATCH --output=%x.%j.out
#SBATCH --error=%x.%j.err
#SBATCH --mail-type=ALL
#SBATCH [email protected]
#SBATCH --hint=nomultithread
export PATH="/nesi/nobackup/uoo02752/nematode/bin/miniconda3/bin:$PATH" #NanoQC was installed in a conda environment
nanoQC /path/to/EW_nanopore_merged_filtered_porechop.fastq.gz -o /path/to/output/directory
To assemble the genome with long reads, we tried various available long-read genome assemblers. Among them, Flye performed the best. We also tried different versions of Flye, and v2.7.1 (not the latest) performed better, so we ran Flye v2.7.1 with three iterations of polishing.
Script for Flye
#!/bin/bash -e
#SBATCH --nodes 1
#SBATCH --cpus-per-task 10 #match the 10 threads requested with flye -t 10
#SBATCH --job-name Flye.EW.v7
#SBATCH --mem=80G
#SBATCH --time=5-00:00:00
#SBATCH --account=uoo02752
#SBATCH --output=%x_%j.out
#SBATCH --error=%x_%j.err
#SBATCH --mail-type=ALL
#SBATCH [email protected]
#SBATCH --hint=nomultithread
module load Flye/2.7.1-gimkl-2020a-Python-2.7.18
flye --nano-raw /path/to/EW_nanopore_merged_filtered_porechop.fastq.gz -o /path/to/output/directory-EW_Flye.v2.7 -t 10 -i 3 #Flye 2.7.1 takes ONT reads via --nano-raw (--nano-hq was only added in later releases)
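Flye writes the final assembly to assembly.fasta and per-contig statistics to assembly_info.txt inside its output directory. A quick way to inspect the result (paths assumed from the script above):
cd /path/to/output/directory-EW_Flye.v2.7
grep -c ">" assembly.fasta               # number of contigs
column -t assembly_info.txt | less -S    # contig length, coverage, circularity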