A pipeline to validate read trimming preformance. Assumes paired end.
Dependencies are the latest stable version of nextflow
and conda
(alternatively, BYO versions of the tools found in resources/environment.yml
and remove all conda
configuration in nextflow.config
).
cutadapt
is used to trim reads, given a known adapter sequence (--cutadapt_adapter
). Other arguments can be provided with--cutadapt_args
.- Independently,
bbmerge
is used to automatically detect adapter sequences from untrimmed data. Untrimmed files are subsetted to a fixed number of reads, configurable with--subsample_n
. bbmerge
consensus sequences are compared to the adapter provided tocutadapt
.- Consensus sequences are merged with a premade adapter list (can be altered via
--adapters
). The default uses the list provided bybbmap
. bbduk
is used with the merged adapter list to filter for adapters. Two different sets of arguments are used,-bbduk_standard_args
and-bbduk_lenient_args
.- Random adapter sequences are generated and the same parameters are used with the random adapters as a comparison.
- Report is generated. Most frequent matches are compared across each parameter combination.
WORKFLOW=path/to/adapter-checks/
nextflow run path/to/adapter-checks \
--input_fastq input_dir \
--cutadapt_adapter 'CTGTCTCTTATACACATCT' \
--cutadapt_args '-e 0.3' \
--bbduk_standard_args 'ktrim=r mink=11 hdist=2 k=21 tbo' \
--bbduk_lenient_args 'ktrim=r mink=11 hdist=1 k=23 tbo' \
--output results_dir \
-resume
A report will be generated with some custom figures.
TODO: It would be nice to have those as part of a multiQC plugin, perhaps.