-
Notifications
You must be signed in to change notification settings - Fork 42
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add gprofiler #199
Add gprofiler #199
Conversation
…profiler), added citations for gprofiler, added gprofiler section to report
… added report text if no enrichment found, added gprofiler2 params to report and nextflow schema
|
… contrasts are investigated
…ing error in schema, added rds to GOST output
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to have a another run-through of this in the morning. But for now: do you think it's sensible to have BOTH GSEA and gprofiler run? We could have a variety of gene set methods available, and I'm inclined to have one active at a time or the report will get very busy and complex.
Good point, but I would prefer not to outright prevent multiple methods from running simultaneously (sb might want to compare the results for example). I feel like if all of these methods are disabled by default, that should be fine, no? If a user then decides to run multiple ones in parallel, that's sort of on them :'D |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some more comments.
Mostly I think gprofiler should be better integrated into existing structures for working with gene sets (gene set file channels, report sections etc).
Note: This can only be merged after nf-core/modules#4538 |
…sea docu; replaced gsea with gprofiler in test config (gsea is still tested in test_affy); moved round_digits and gene_sets to new mods_ param category
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really close, just a few last comments and I tidied up the bash filtering script
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a commit - hope you don't mind, just check you agree.
At a minimum we should document the new option in usage.
Alternatively, and just a suggestion, we reorder the priorities in the gprofiler2 module, so that the new mode option isn't necessary (because the gprofiler2 module would simply ignore the gene sets in the presence of one of the other options - we'd just have to document that).
What do you think?
…ndance into add_gpro, update gprofiler2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Last batch of corrections. With these addressed we're ready to go!
|
||
def run_gene_set_analysis = params.gsea_run || params.gprofiler2_run | ||
|
||
if (run_gene_set_analysis) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you really don't think you can handle running the gprofiler module multiple times for different gene set files, suggest modifying the validation logic like:
if (run_gene_set_analysis) {
if (params.gene_sets_files) {
gene_sets_files = params.gene_sets_files.split(",")
ch_gene_sets = Channel.of(gene_sets_files).map { file(it, checkIfExists: true) }
if (params.gprofiler2_run && (!params.gprofiler2_token && !params.gprofiler2_organism) && gene_sets_files.size() > 1) {
error("gprofiler2 can only work with a single gene set file")
}
} else if (params.gsea_run) {
error("GSEA activated but gene set file not specified!")
} else if (params.gprofiler2_run) {
if (!params.gprofiler2_token && !params.gprofiler2_organism) {
error("To run gprofiler2, please provide a run token, GMT file or organism!")
}
} else {
ch_gene_sets = [] // For methods that can run without gene sets
}
}
(sorry for not making this a proper suggestion- GitHub wouldn't let me becuase of the deleted lines)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes, that is simpler of course.
No, I will certainly enable multiple GMTs in the future when I get around to it, just not right now (I'll mention that in the error).
Co-authored-by: Jonathan Manning <[email protected]>
Co-authored-by: Jonathan Manning <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good to go- thanks for working with me on this, think things are much tighter now.
PR checklist
nf-core lint
).nextflow run . -profile test,docker --outdir <OUTDIR>
).docs/usage.md
is updated.docs/output.md
is updated.CHANGELOG.md
is updated.README.md
is updated (including new tool citations and authors/contributors).