VEuPathDB/blast-similarity-nextflow

THIS REPO IS 🚧 UNDER CONSTRUCTION 🚧 and NOT Used in ANY production CODE

Nextflow Conversion of BlastSimilarityTask

blastSimilarity

flowchart TD
    p0((Channel.fromPath))
    p1([splitFasta])
    p2(( ))
    p3[nonConfiguredDatabase:createDatabase]
    p4(( ))
    p5[nonConfiguredDatabase:blastSimilarity]
    p6([collectFile])
    p7(( ))
    p8([collectFile])
    p9(( ))
    p10([collectFile])
    p11(( ))
    p0 --> p1
    p1 -->|seqs| p5
    p2 -->|newdb.fasta| p3
    p3 --> p5
    p4 -->|fastaName| p5
    p5 --> p6
    p5 --> p8
    p5 --> p10
    p6 --> p7
    p8 --> p9
    p10 --> p11
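In the diagram, splitFasta breaks seqFile into groups of fastaSubsetSize sequences, and each group is passed to blastSimilarity. The shell sketch below roughly approximates that splitting step with awk, so you can preview how an input file will be chunked. It is not the workflow's own code. The split_fasta helper and the chunk_N.fa file names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: approximates the splitFasta step with awk. Not the workflow's own code.
set -euo pipefail

# split_fasta (hypothetical helper): write each run of <per_chunk> FASTA records
# from <fasta> into <outdir>/chunk_1.fa, chunk_2.fa, ...
split_fasta() {
  local fasta="$1" per_chunk="$2" outdir="$3"
  mkdir -p "$outdir"
  awk -v n="$per_chunk" -v dir="$outdir" '
    /^>/ {
      # A header line starts a new record; open a new chunk every n records.
      if (count % n == 0) {
        if (out != "") close(out)
        chunk++
        out = sprintf("%s/chunk_%d.fa", dir, chunk)
      }
      count++
    }
    # Copy headers and sequence lines into the current chunk
    # (anything before the first header is skipped).
    out != "" { print > out }
  ' "$fasta"
}

# Demo: 9 toy sequences, as in the saveAllBlastFiles example.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
for i in {1..9}; do printf '>seq%d\nACGT\n' "$i"; done > "$tmp/input.fa"

split_fasta "$tmp/input.fa" 3 "$tmp/by3"   # 3 chunk files of 3 sequences each
split_fasta "$tmp/input.fa" 1 "$tmp/by1"   # 9 chunk files of 1 sequence each
ls "$tmp/by3" "$tmp/by1"
```

The chunk count is the number of sequences divided by fastaSubsetSize, rounded up. This matches the number of zipped files described for saveAllBlastFiles.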

Explanation of nextflow.config file parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| blastProgram | string | Name of the NCBI BLAST program to run (for example, blastp or blastn). |
| seqFile | string | Path to the input FASTA file. |
| preConfiguredDatabase | boolean | Set to true if you already have database files generated by NCBI's makeblastdb, so they do not need to be generated again. If true, you must also supply databaseDir and databaseBaseName. |
| databaseDir | string | Path to the directory containing the database files. The directory may contain other files, but every file whose name begins with databaseBaseName will be brought into the process. |
| databaseBaseName | string | Root name of your database files. For example, "newdb.fasta" would be used for the files in blastSimilarity/data/database. |
| databaseFasta | string | Path to the FASTA file used to create the database. Required if preConfiguredDatabase is false. |
| databaseType | string | Type of database: either "prot" or "nucl". Only needed if preConfiguredDatabase is false. |
| dataFile | string | Name for the main output file. |
| logFile | string | Name for the log file. |
| outputDir | string | Path to the directory where output files are stored. |
| saveAllBlastFiles | boolean | If true, the BLAST output from every BLAST run is saved as a zipped file. For example, with 9 sequences in the input file and fastaSubsetSize set to 1, you will receive 9 zipped files; with fastaSubsetSize set to 3, you will receive 3. Each zipped file is named after the identifier of the first sequence in its group (which will also be the last one in the zipped file). |
| saveGoodBlastFiles | boolean | Similar to saveAllBlastFiles, except that only files containing at least one hit are saved. saveGoodBlastFiles and saveAllBlastFiles should not both be true. |
| doNotParse | boolean | This tool operates in two steps: running BLAST, then grepping through the output to collect and return values. If true, only the raw BLAST output is generated and returned. If false, the output continues on to the processing step. |
| printSimSeqsFile | boolean | If true, changes the output format of dataFile to report the sequence accession from seqFile, the taxon it matched in the database, the p-value, the p-value exponent, and some per-identity and per-match statistics. |
| blastParamsFile | string | Path to a file containing additional BLAST parameters, written exactly as you would pass them on the command line. |
| fastaSubsetSize | integer | Number of sequences in each chunk of seqFile passed to the blastSimilarity process. |
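As a starting point, this shell snippet writes a minimal nextflow.config covering the parameters above, plus a blastParamsFile. It assumes the parameters are set in a standard Nextflow `params { }` scope. Every path and value (blastp, data/input.fasta, results, the BLAST options, and so on) is a hypothetical placeholder to replace with your own. It follows the preConfiguredDatabase = false route, so databaseFasta and databaseType are set, and databaseDir and databaseBaseName are left commented out.

```shell
#!/usr/bin/env bash
# Writes a hypothetical nextflow.config and BLAST parameter file.
# All paths and values are illustrative placeholders.
set -euo pipefail

# Extra BLAST options, written as they would appear on the command line.
cat > blastParams.txt <<'EOF'
-evalue 1e-5 -max_target_seqs 50
EOF

cat > nextflow.config <<'EOF'
params {
  blastProgram          = "blastp"
  seqFile               = "data/input.fasta"
  preConfiguredDatabase = false
  // Used only when preConfiguredDatabase = true:
  // databaseDir        = "blastSimilarity/data/database"
  // databaseBaseName   = "newdb.fasta"
  databaseFasta         = "data/newdb.fasta"
  databaseType          = "prot"
  dataFile              = "blastSimilarity.out"
  logFile               = "blastSimilarity.log"
  outputDir             = "results"
  saveAllBlastFiles     = false   // do not set both save flags to true
  saveGoodBlastFiles    = true
  doNotParse            = false
  printSimSeqsFile      = false
  blastParamsFile       = "blastParams.txt"
  fastaSubsetSize       = 3
}
EOF
```

Pass the resulting file to the run command with `-c nextflow.config`. Depending on how the workflow resolves paths, you may need absolute paths instead of the relative placeholders shown here.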

Get Started

  • Install Nextflow

    curl https://get.nextflow.io | bash

  • Run the script

    nextflow run VEuPathDB/blastSimilarity -with-trace -c <config_file> -r main