Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dev ratio #288

Draft
wants to merge 151 commits into
base: dev
Choose a base branch
from
Draft

Dev ratio #288

wants to merge 151 commits into from

Conversation

bjlang
Copy link

@bjlang bjlang commented Sep 13, 2024

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the pipeline conventions in the contribution docs
  • If necessary, also make a PR on the nf-core/differentialabundance branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core lint).
  • Ensure the test suite passes (nf-test test main.nf.test -profile test,docker).
  • Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

Cristina Araiz and others added 30 commits July 18, 2024 10:38
Coda workflow.
Files that are unique to the coda workflow have been added to the corresponding folders (conf, modules, subworkflows...). Files that are present in the pipeline have been added as ***_coda to avoid overwriting the original ones (modules_coda.config, nextflow_coda.config, main_coda.nf... etc).
Update dev-ratio with changes from dev branch
Include CoDa workflow in main workflow
- use contrast file information for differential analysis with PropD
Copy link
Member

@pinin4fjords pinin4fjords left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we may have a difference of opinion here, but I really think components should be kept simple, and complexity around channel manipulations should be pushed up to workflows, it's just the nf-core way.

I think I'm going to have a go at pushing the differential workflow up into nf-core/modules (in simplified form).

take:
ch_counts // [ meta_exp, counts ] with meta keys: method, args_diff
ch_samplesheet // [ meta_exp, samplesheet ]
ch_contrasts // [ meta_contrast, contrast_variable, reference, target ]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ch_contrasts // [ meta_contrast, contrast_variable, reference, target ]
ch_contrasts // [ meta_contrast, contrast_variable, reference, target ]
method // limma, deseq2 ...

Comment on lines +27 to +33
ch_counts
.branch {
propd: it[0]["method"] == "propd"
deseq2: it[0]["method"] == "deseq2"
limma: it[0]["method"] == "limma"
}
.set { ch_counts }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ch_counts
.branch {
propd: it[0]["method"] == "propd"
deseq2: it[0]["method"] == "deseq2"
limma: it[0]["method"] == "limma"
}
.set { ch_counts }

Sorry, I still don't like this. We should keep the interface simple rather than forcing people to stuff things into meta maps.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see your problem. The meta is existing in every nf-core module and subworkflow and from what I see it is also not uncommon to use the meta information for downstream analysis decisions. Take for example the single_end flag in RNAseq, ChIPseq, ATACseq,... Both modules (e.g. trimgalore ) and subworkflows are using it, and thus forcing the flag to exist. This is even also used for branching.

Copy link
Member

@pinin4fjords pinin4fjords Nov 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The single-end thing is a special case, we try and discourage the propagation of assumptions about fields in the meta.

The branching you point at is necessitated by the need to direct outputs form the first processes to the right place. It's not engineered in to the inputs. If I were writing that subworkflow without that first process I'd be saying the same thing- the single end status would be a straighforward subworkflow input.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The point here is that the tool status has nothing to do with the count file. You're forcing people to build it into the meta to suit the use case, but it would just make life harder for other users of the subworkflow.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


workflow DIFFERENTIAL {
take:
ch_counts // [ meta_exp, counts ] with meta keys: method, args_diff
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ch_counts // [ meta_exp, counts ] with meta keys: method, args_diff
ch_abundance // [ meta_exp, counts ] with meta keys: method, args_diff

This won't always be counts- e.g. for arrays it will be 'intensities', for PROTEUS it will be whatever that techology uses.

// Perform differential analysis with propd
// ----------------------------------------------------

// TODO propd currently don't support blocking, so we should not run propd with same contrast_variable, reference and target,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can assume that will be resolved at the workflow level (filtering the contrasts channel)

include { DESEQ2_DIFFERENTIAL } from '../../../modules/nf-core/deseq2/differential/main'
include { DESEQ2_DIFFERENTIAL as DESEQ2_NORM } from "../../../modules/nf-core/deseq2/differential/main"
include { LIMMA_DIFFERENTIAL } from '../../../modules/nf-core/limma/differential/main'
include { FILTER_DIFFTABLE as FILTER_DIFFTABLE_LIMMA } from '../../../modules/local/filter_difftable'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This subworkflow will never run BOTH Limma AND DESeq2, so you don't need to import and call it twice. You should output DESeq2 + Limma to the same channels, which get filtered by the same process.

Copy link

@suzannejin suzannejin Nov 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but that is the idea with the multiply running options no?
If the user wants to run --pathway deseq2_gsea,limma_gsea
Both deseq2 and limma would be run, but the output would be channeled into different places.
How would you deal with that instead?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh yes, I think maybe this is down to our difference in thoughts on architecture.

In my view the subworkflow would run twice to make that happen.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ahhhhh so you would have import various subworkflows (eg DIFFERENTIAL_DESEQ2, DIFFERENTIAL_LIMMA) inside the workflow and run them.... actually that is nice :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. We keep all components as simple as possible, and manage complexity via channel operations in the calling workflow.

subworkflows/local/differential/main.nf Outdated Show resolved Hide resolved
'community.wave.seqera.io/library/bioconductor-limma_r-ggplot2_r-propr:17abd3f137436739' }"

input:
tuple val(meta), path(count), path(samplesheet), val(contrast_variable), val(reference), val(target)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very un-nf-core, and doesn't match the analagous differential modules. I think this will need to be split for nf-core

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay that is not a problem

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

even though why this is not recommended by nf-core?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh- has there been a change? Could you point at that guideline?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure if there a change/guideline for that...that is why i asked because i was not sure why you say this is un-nf-core.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean, the convention in modules is not to combine everything into a single input channel, but only to combine related things. You could copy the interface of the other differential modules, which would have better consistency and would prevent us having to do special operations to fit this module.


// conda "${moduleDir}/environment.yml"
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
'oras://community.wave.seqera.io/library/bioconductor-limma_r-ggplot2_r-propr:209490acb0e524e3' :
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to use the https versions of the singularity URIs for Seqera Containers. It's a bit of a pain, but here's how: https://nfcore.slack.com/archives/CJRH30T6V/p1729023924708159?thread_ts=1729023915.530259&cid=CJRH30T6V

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants