This pipeline is extensible: new methods for assembling MSAs, building guide trees, and evaluating MSAs can be incorporated. Before a component can be added, a Nextflow module for it must be created. Typically, it is best to create an nf-core module, but for specific cases or testing, a local module may be more suitable. Even for local modules, following nf-core conventions is recommended. Some useful resources for this process are listed below:
- The nf-core documentation
- The Nextflow documentation for modules
- The nf-core DSL2 module tutorial
- The nf-core module documentation
- The nf-test documentation
- The nf-core Slack, in particular the `#multiplesequencealign` channel. Feel free to reach out!
Please also check the contribution guidelines.
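If you have never written an nf-core module before, the minimal sketch below may help you orient yourself before diving into the resources above. It is a hypothetical aligner module: the tool name `mytool`, its command-line flags, and the conda/container definitions are placeholders, not part of the pipeline. A real module should be generated from the nf-core module template and follow the conventions of existing modules such as famsa/align:

```groovy
// Hypothetical nf-core-style module for an alignment tool called "mytool".
// All names, flags, and version pins below are placeholders for illustration only.
process MYTOOL_ALIGN {
    tag "$meta.id"
    label 'process_medium'

    conda "bioconda::mytool=1.0.0"          // placeholder environment
    container "biocontainers/mytool:1.0.0"  // placeholder container

    input:
    tuple val(meta), path(fasta)            // meta map plus the unaligned sequences

    output:
    tuple val(meta), path("*.aln"), emit: alignment // the alignment, which must be in FASTA format
    path "versions.yml"           , emit: versions

    script:
    def args   = task.ext.args   ?: ''
    def prefix = task.ext.prefix ?: "${meta.id}"
    """
    mytool align $args ${fasta} > ${prefix}.aln

    cat <<-END_VERSIONS > versions.yml
    "${task.process}":
        mytool: \$(mytool --version)
    END_VERSIONS
    """
}
```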
The following steps will guide you through adding a new MSA tool to the pipeline. Once done, you will be able to systematically deploy and benchmark your tool against all others included in the pipeline. You are also welcome to contribute your tool back to the pipeline if you wish.
0. Create an nf-core module for your tool. Instructions on how to contribute new modules here. Use other modules (e.g. famsa) as a template. Ensure the output is in FASTA format. You can look at an example of a complete new tool integration here.
1. Fork this repository and create a new branch (e.g. `add-famsa`).
2. Include the module in the alignment subworkflow (`subworkflows/local/align.nf`):
   - Install the module, e.g. with the command `nf-core modules install famsa/align`.
   - Include the module in `subworkflows/local/align.nf`, example here.
   - Add a branch to the correct channel, depending on your tool's input. Example for sequence-based tools here and for structure-based tools here.
   - Add the code to correctly execute the tool, as done here.
   - Feed the output alignment and versions channels back into the `msa` channel. Make sure to `mix()` them so they do not get overwritten! Example. A hedged sketch of this wiring is shown after this list.
3. Add the aligner to the aligner config in `conf/modules.config`. Example. A generic config sketch is shown below.
4. Update the docs:
   - Update `docs/usage.md`
   - Update `CITATIONS.md`
   - Update `CHANGELOG.md`
   - Update the citations in the utils subworkflow, here
5. Add your tool to the toolsheet in the test dataset repository. Example.
6. Open a PR against the `dev` branch of the nf-core repository :)
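To make step 2 above more concrete, here is a simplified, hypothetical sketch of how a new sequence-based aligner could be wired into `subworkflows/local/align.nf`. The channel names, the branching condition, and the include path are illustrative assumptions; the linked examples show the names the subworkflow actually uses:

```groovy
// Simplified, hypothetical excerpt of subworkflows/local/align.nf
include { MYTOOL_ALIGN } from '../../modules/nf-core/mytool/align/main'

workflow ALIGN {
    take:
    ch_seqs            // channel: [ meta, fasta ] of unaligned sequences

    main:
    ch_msa      = Channel.empty()
    ch_versions = Channel.empty()

    // Route each sample to the aligner requested in its meta map (hypothetical field name)
    ch_seqs
        .branch { meta, fasta ->
            mytool: meta.aligner == 'MYTOOL'
            other : true
        }
        .set { ch_to_align }

    // Execute the new aligner on its branch
    MYTOOL_ALIGN ( ch_to_align.mytool )

    // mix() the outputs into the shared channels so results from other aligners are not overwritten
    ch_msa      = ch_msa.mix(MYTOOL_ALIGN.out.alignment)
    ch_versions = ch_versions.mix(MYTOOL_ALIGN.out.versions)

    emit:
    msa      = ch_msa
    versions = ch_versions
}
```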
Congratulations, your aligner is now in nf-core/multiplesequencealign!
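For step 3, entries in `conf/modules.config` typically follow the standard nf-core pattern of a `withName` selector setting `ext.args`, `ext.prefix`, and `publishDir`. The stanza below is only a generic, hypothetical illustration; the pipeline's aligner config may be structured differently, so follow the linked example:

```groovy
// Generic nf-core-style config stanza; the selector and all options are placeholders
process {
    withName: 'MYTOOL_ALIGN' {
        ext.args   = { meta.args_aligner ?: '' }   // hypothetical: per-sample tool arguments
        ext.prefix = { "${meta.id}_mytool" }       // hypothetical: name outputs after sample and tool
        publishDir = [
            path: { "${params.outdir}/alignment" },
            mode: params.publish_dir_mode,
            pattern: '*.aln'
        ]
    }
}
```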
To add a tool for estimating a guide tree, follow exactly the same steps as for adding an aligner above, with the only difference being that the subworkflow to update is `subworkflows/local/compute_trees.nf`.
Adding a new evaluation mainly requires changes in the `subworkflows/local/evaluate.nf` subworkflow.
0. Create a module, local or nf-core, for your evaluation tool. Instructions on how to contribute new modules here. Use other modules (e.g. tcoffee/alncompare) as a template. Ensure the output is in CSV format. To merge the correct evaluation files and report the final output, the pipeline uses the `meta` field, which specifies the tools used. This information has to be included in the CSV returned by the module so that it can be merged later; these lines in tcoffee/alncompare take care of it.
1. Fork this repository and create a new branch (e.g. `add-tcoffee-alncompare`).
2. Include the module in the evaluation subworkflow (`subworkflows/local/evaluate.nf`):
   - Add a `calc_yourscore` parameter to the pipeline in `nextflow.config` and document it in `nextflow_schema.json`. The parameter can then be set by the user to decide whether to run your evaluation workflow. Example.
   - Add a code block to `subworkflows/local/evaluate.nf` that calls the newly added evaluation module if the appropriate parameter is passed to the pipeline. Example. A hedged sketch of such a block is shown after this list.
   - To ensure the called module produces an output file with the correct name for merging the evaluation outputs, add a config option in `conf/modules.config`. Example.
3. Incorporate the evaluation output into the summary output. After computing the scores of the different evaluation tools, the pipeline merges them into several summary CSVs (per metric, in total, and in combination with the dataset statistics). For this to happen, the output of the individual evaluation runs needs to be concatenated using the `CSVTK_CONCAT` module twice: first in the evaluation call, to merge all runs of a single evaluation tool, and then in the merging step. The sketch after this list illustrates the first of these calls.
4. Update the docs:
   - Update `docs/usage.md`
   - Update `CITATIONS.md`
   - Update `CHANGELOG.md`
   - Update the citations in the utils subworkflow, here
5. Open a PR :)
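To illustrate steps 2 and 3 above, here is a simplified, hypothetical sketch of the kind of block that could be added to `subworkflows/local/evaluate.nf`: the module only runs when the corresponding parameter is set, and its per-alignment CSVs are gathered and concatenated with `CSVTK_CONCAT`. The `calc_yourscore` parameter name comes from step 2; the `MYSCORE` module, all channel names, and the grouping logic are placeholders, so follow the linked examples for the actual wiring:

```groovy
// Simplified, hypothetical excerpt of subworkflows/local/evaluate.nf
include { MYSCORE      } from '../../modules/local/myscore/main'
include { CSVTK_CONCAT } from '../../modules/nf-core/csvtk/concat/main'

workflow EVALUATE {
    take:
    ch_msa          // channel: [ meta, msa ] of computed alignments

    main:
    ch_scores   = Channel.empty()
    ch_versions = Channel.empty()

    // Only run this evaluation if the user asked for it
    if (params.calc_yourscore) {
        MYSCORE ( ch_msa )
        ch_versions = ch_versions.mix(MYSCORE.out.versions)

        // Collect all per-alignment CSVs (which carry the tool information from the meta map)
        // and concatenate them into a single CSV for this metric
        MYSCORE.out.scores
            .map { meta, csv -> csv }
            .collect()
            .map { csvs -> [ [id: 'myscore_summary'], csvs ] }
            .set { ch_myscore_csvs }

        CSVTK_CONCAT ( ch_myscore_csvs, 'csv', 'csv' )
        ch_scores = ch_scores.mix(CSVTK_CONCAT.out.csv)
    }

    emit:
    scores   = ch_scores
    versions = ch_versions
}
```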
Now your evaluation metric is incorporated into nf-core/multiplesequencealign! Congratulations!