Skip to content

Commit

Permalink
Merge pull request #807 from d4straub/TreeSummarizedExperiment
Browse files Browse the repository at this point in the history
Export TreeSummarizedExperiment R object
  • Loading branch information
d4straub authored Jan 14, 2025
2 parents 55164ab + 5af4545 commit 64de8ae
Show file tree
Hide file tree
Showing 25 changed files with 238 additions and 80 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

- [#798](https://github.com/nf-core/ampliseq/pull/798) - Added SILVA version 138.2 of DADA2 taxonomy database: `silva=138.2` or `silva` as parameter to `--dada2_ref_taxonomy`
- [#804](https://github.com/nf-core/ampliseq/pull/804) - Added version 10 of Unite as parameter for `--dada_ref_taxonomy` (issue [#768](https://github.com/nf-core/ampliseq/issues/768))
- [#807](https://github.com/nf-core/ampliseq/pull/807) - Export of TreeSummarizedExperiment R object by default, can be omitted with `--skip_tse`, also added ability to skip phyloseq R object generation with `--skip_phyloseq`

### `Changed`

Expand Down
4 changes: 4 additions & 0 deletions CITATIONS.md
Original file line number Diff line number Diff line change
Expand Up @@ -149,6 +149,10 @@

> McMurdie PJ, Holmes S (2013). “phyloseq: An R package for reproducible interactive analysis and graphics of microbiome census data.” PLoS ONE, 8(4), e61217.
- [TreeSummarizedExperiment](https://doi.org/10.12688/f1000research.26669.2)

> Huang R, Soneson C, Ernst FGM et al. TreeSummarizedExperiment: a S4 class for data with hierarchical structure [version 2; peer review: 3 approved]. F1000Research 2021, 9:1246.
### Non-default tools

- [ITSx](https://besjournals.onlinelibrary.wiley.com/doi/10.1111/2041-210X.12073)
Expand Down
33 changes: 26 additions & 7 deletions assets/report_template.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -107,6 +107,7 @@ params:
picrust_pathways: FALSE
sbdi: FALSE
phyloseq: FALSE
tse: FALSE
---

<!-- Load libraries -->
Expand Down Expand Up @@ -1615,19 +1616,37 @@ but if you run nf-core/ampliseq with a sample metadata table (`--metadata`) any
"))
```

<!-- Section on PHYLOSEQ results -->
<!-- Section on exported R objects -->

```{r, eval = !isFALSE(params$phyloseq), results='asis'}
```{r, results='asis'}
any_robject <- !isFALSE(params$phyloseq) || !isFALSE(params$tse)
```

```{r, eval = !isFALSE(any_robject), results='asis'}
cat(paste0("
# Phyloseq
# R objects
[Phyloseq](https://doi.org/10.1371/journal.pone.0061217)
is a popular R package to analyse and visualize microbiom data.
The produced RDS files contain phyloseq objects and can be loaded directely into R and phyloseq.
Microbiome data can be analysed and visualized with certain R packages. For convenience, R objects in RDS format are provided.
"))
if ( !isFALSE(params$phyloseq) ) {
cat(paste0("
[Phyloseq](https://doi.org/10.1371/journal.pone.0061217) objects and can be loaded directely into R with package 'phyloseq'.
The objects contain an ASV abundance table and a taxonomy table.
If available, metadata and phylogenetic tree will also be included in the phyloseq object.
The files can be found in folder [phyloseq](../phyloseq/).
"))
"))
}
if ( !isFALSE(params$tse) ) {
cat(paste0("
[TreeSummarizedExperiment](https://doi.org/10.12688/f1000research.26669.2) (TreeSE, TSE)
objects can be loaded into R with package 'TreeSummarizedExperiment'. and contain an ASV abundance table,
a taxonomy table, and sequences.
If available, metadata and phylogenetic tree will also be included in the object.
The files can be found in folder [treesummarizedexperiment](../treesummarizedexperiment/).
"))
}
```

<!-- Section on methods -->
Expand Down
9 changes: 9 additions & 0 deletions conf/modules.config
Original file line number Diff line number Diff line change
Expand Up @@ -1057,6 +1057,15 @@ process {
pattern: "*.rds"
]
}

withName: TREESUMMARIZEDEXPERIMENT {
publishDir = [
path: { "${params.outdir}/treesummarizedexperiment" },
mode: params.publish_dir_mode,
pattern: "*.rds"
]
}

withName: 'MULTIQC' {
ext.args = { params.multiqc_title ? "--title \"$params.multiqc_title\"" : '' }
publishDir = [
Expand Down
3 changes: 3 additions & 0 deletions conf/test_multi.config
Original file line number Diff line number Diff line change
Expand Up @@ -30,4 +30,7 @@ params {
dada_ref_taxonomy = "rdp=18"
skip_dada_addspecies = true
input = params.pipelines_testdata_base_path + "ampliseq/samplesheets/Samplesheet_multi.tsv"

skip_phyloseq = true
skip_tse = true
}
2 changes: 2 additions & 0 deletions conf/test_pacbio_its.config
Original file line number Diff line number Diff line change
Expand Up @@ -37,4 +37,6 @@ params {

// Prevent default taxonomic classification
skip_dada_taxonomy = true

skip_phyloseq = true
}
2 changes: 2 additions & 0 deletions conf/test_reftaxcustom.config
Original file line number Diff line number Diff line change
Expand Up @@ -37,4 +37,6 @@ params {

// Skip downstream analysis with QIIME2
skip_qiime_downstream = true

skip_tse = true
}
8 changes: 5 additions & 3 deletions docs/output.md
Original file line number Diff line number Diff line change
Expand Up @@ -46,7 +46,7 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d
- [Differential abundance analysis](#differential-abundance-analysis) - Calling differentially abundant features with ANCOM or ANCOM-BC
- [PICRUSt2](#picrust2) - Predict the functional potential of a bacterial community
- [SBDI export](#sbdi-export) - Swedish Biodiversity Infrastructure (SBDI) submission file
- [Phyloseq](#phyloseq) - Phyloseq R objects
- [R object](#r-objects) - Phyloseq and TreeSummarizedExperiment R objects
- [Read count report](#read-count-report) - Report of read counts during various steps of the pipeline
- [Pipeline information](#pipeline-information) - Report metrics generated during the workflow execution

Expand Down Expand Up @@ -629,15 +629,17 @@ Most of the fields in the template will not be populated by the export process,

</details>

### Phyloseq
### R objects

This directory will hold phyloseq objects for each taxonomy table produced by this pipeline. The objects will contain an ASV abundance table and a taxonomy table. If the pipeline is provided with metadata, that metadata will also be included in the phyloseq object. A phylogenetic tree will also be included if the pipeline produces a tree.
Pipeline results are stored in phyloseq and TreeSummarizedExperiment R objects for each taxonomy table produced by this pipeline. The R objects will contain an ASV abundance table and a taxonomy table, and optionally sequences, metadata and a phylogenetic tree.

<details markdown="1">
<summary>Output files</summary>

- `phyloseq/`
- `<taxonomy>_phyloseq.rds`: Phyloseq R object.
- `treesummarizedexperiment/`
- `<taxonomy>_TreeSummarizedExperiment.rds`: TreeSummarizedExperiment R object.

</details>

Expand Down
2 changes: 2 additions & 0 deletions modules/local/summary_report.nf
Original file line number Diff line number Diff line change
Expand Up @@ -58,6 +58,7 @@ process SUMMARY_REPORT {
path(picrust_pathways)
path(sbdi, stageAs: 'sbdi/*')
path(phyloseq, stageAs: 'phyloseq/*')
path(tse, stageAs: 'tse/*')

output:
path "*.svg" , emit: svg, optional: true
Expand Down Expand Up @@ -137,6 +138,7 @@ process SUMMARY_REPORT {
ancombc_formula ? "ancombc_formula='"+ ancombc_formula.join(",") +"'" : "",
sbdi ? "sbdi='"+ sbdi.join(",") +"'" : "",
phyloseq ? "phyloseq='"+ phyloseq.join(",") +"'" : "",
tse ? "tse='"+ tse.join(",") +"'" : "",
]
// groovy list to R named list string; findAll removes empty entries
params_list_named_string = params_list_named.findAll().join(',').trim()
Expand Down
82 changes: 82 additions & 0 deletions modules/local/treesummarizedexperiment.nf
Original file line number Diff line number Diff line change
@@ -0,0 +1,82 @@
process TREESUMMARIZEDEXPERIMENT {
tag "$prefix"
label 'process_low'

conda "bioconda::bioconductor-treesummarizedexperiment=2.10.0"
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
'https://depot.galaxyproject.org/singularity/bioconductor-treesummarizedexperiment:2.10.0--r43hdfd78af_0' :
'biocontainers/bioconductor-treesummarizedexperiment:2.10.0--r43hdfd78af_0' }"

input:
tuple val(prefix), path(tax_tsv), path(otu_tsv)
path sam_tsv
path tree

output:
tuple val(prefix), path("*TreeSummarizedExperiment.rds"), emit: rds
path "versions.yml" , emit: versions

when:
task.ext.when == null || task.ext.when

script:
def sam_tsv = "\"${sam_tsv}\""
def otu_tsv = "\"${otu_tsv}\""
def tax_tsv = "\"${tax_tsv}\""
def tree = "\"${tree}\""
def prefix = "\"${prefix}\""
"""
#!/usr/bin/env Rscript
suppressPackageStartupMessages(library(TreeSummarizedExperiment))
# Read otu table. It must be in a SimpleList as a matrix where rows
# represent taxa and columns samples.
otu_mat <- read.table($otu_tsv, sep="\\t", header=TRUE, row.names=1)
otu_mat <- as.matrix(otu_mat)
assays <- SimpleList(counts = otu_mat)
# Read taxonomy table. Correct format for it is DataFrame.
taxonomy_table <- read.table($tax_tsv, sep="\\t", header=TRUE, row.names=1)
taxonomy_table <- DataFrame(taxonomy_table)
# Match rownames between taxonomy table and abundance matrix.
taxonomy_table <- taxonomy_table[match(rownames(otu_mat), rownames(taxonomy_table)), ]
# Create TreeSE object.
tse <- TreeSummarizedExperiment(
assays = assays,
rowData = taxonomy_table
)
# If taxonomy table contains sequences, move them to referenceSeq slot
if (!is.null(rowData(tse)[["sequence"]])) {
referenceSeq(tse) <- DNAStringSet( rowData(tse)[["sequence"]] )
rowData(tse)[["sequence"]] <- NULL
}
# If provided, we add sample metadata as DataFrame object. rownames of
# sample metadata must match with colnames of abundance matrix.
if (file.exists($sam_tsv)) {
sample_meta <- read.table($sam_tsv, sep="\\t", header=TRUE, row.names=1)
sample_meta <- sample_meta[match(colnames(tse), rownames(sample_meta)), ]
sample_meta <- DataFrame(sample_meta)
colData(tse) <- sample_meta
}
# If provided, we add phylogeny. The rownames in abundance matrix must match
# with node labels in phylogeny.
if (file.exists($tree)) {
phylogeny <- ape::read.tree($tree)
rowTree(tse) <- phylogeny
}
saveRDS(tse, file = paste0($prefix, "_TreeSummarizedExperiment.rds"))
# Version information
writeLines(c("\\"${task.process}\\":",
paste0(" R: ", paste0(R.Version()[c("major","minor")], collapse = ".")),
paste0(" TreeSummarizedExperiment: ", packageVersion("TreeSummarizedExperiment"))),
"versions.yml"
)
"""
}
2 changes: 2 additions & 0 deletions nextflow.config
Original file line number Diff line number Diff line change
Expand Up @@ -106,6 +106,8 @@ params {
skip_dada_addspecies = false
skip_alpha_rarefaction = false
skip_diversity_indices = false
skip_phyloseq = false
skip_tse = false
skip_multiqc = false
skip_report = false

Expand Down
8 changes: 8 additions & 0 deletions nextflow_schema.json
Original file line number Diff line number Diff line change
Expand Up @@ -797,6 +797,14 @@
"type": "boolean",
"description": "Skip alpha and beta diversity analysis"
},
"skip_phyloseq": {
"type": "boolean",
"description": "Skip exporting phyloseq rds object(s)"
},
"skip_tse": {
"type": "boolean",
"description": "Skip exporting TreeSummarizedExperiment rds object(s)"
},
"skip_multiqc": {
"type": "boolean",
"description": "Skip MultiQC reporting"
Expand Down
44 changes: 0 additions & 44 deletions subworkflows/local/phyloseq_workflow.nf

This file was deleted.

5 changes: 4 additions & 1 deletion subworkflows/local/qiime2_diversity.nf
Original file line number Diff line number Diff line change
Expand Up @@ -27,7 +27,8 @@ workflow QIIME2_DIVERSITY {
ch_versions_qiime2_diversity = Channel.empty()

//Phylogenetic tree for beta & alpha diversities
if (!ch_tree) {
produce_tree = !ch_tree ? true : false
if (produce_tree) {
QIIME2_TREE ( ch_seq )
ch_versions_qiime2_diversity = ch_versions_qiime2_diversity.mix(QIIME2_TREE.out.versions)
ch_tree = QIIME2_TREE.out.qza
Expand Down Expand Up @@ -82,6 +83,8 @@ workflow QIIME2_DIVERSITY {
}

emit:
tree_qza = ch_tree
tree_nwk = produce_tree ? QIIME2_TREE.out.nwk : []
depth = !skip_diversity_indices ? QIIME2_DIVERSITY_CORE.out.depth : []
alpha = !skip_diversity_indices ? QIIME2_DIVERSITY_ALPHA.out.alpha : []
beta = !skip_diversity_indices ? QIIME2_DIVERSITY_BETA.out.beta : []
Expand Down
50 changes: 50 additions & 0 deletions subworkflows/local/robject_workflow.nf
Original file line number Diff line number Diff line change
@@ -0,0 +1,50 @@
/*
* Create phyloseq objects
*/

include { PHYLOSEQ } from '../../modules/local/phyloseq'
include { PHYLOSEQ_INASV } from '../../modules/local/phyloseq_inasv'
include { TREESUMMARIZEDEXPERIMENT } from '../../modules/local/treesummarizedexperiment'

workflow ROBJECT_WORKFLOW {
take:
ch_tax
ch_tsv
ch_meta
ch_robject_intree
run_qiime2

main:
ch_versions_robject_workflow = Channel.empty()

if ( params.metadata ) {
ch_robject_inmeta = ch_meta.first() // The .first() is to make sure it's a value channel
} else {
ch_robject_inmeta = []
}

if ( run_qiime2 ) {
if ( params.exclude_taxa != "none" || params.min_frequency != 1 || params.min_samples != 1 ) {
ch_robject_inasv = PHYLOSEQ_INASV ( ch_tsv ).tsv
} else {
ch_robject_inasv = ch_tsv
}
} else {
ch_robject_inasv = ch_tsv
}

if ( !params.skip_phyloseq ) {
PHYLOSEQ ( ch_tax.combine(ch_robject_inasv), ch_robject_inmeta, ch_robject_intree )
ch_versions_robject_workflow = ch_versions_robject_workflow.mix(PHYLOSEQ.out.versions)
}

if ( !params.skip_tse ) {
TREESUMMARIZEDEXPERIMENT ( ch_tax.combine(ch_robject_inasv), ch_robject_inmeta, ch_robject_intree )
ch_versions_robject_workflow = ch_versions_robject_workflow.mix(TREESUMMARIZEDEXPERIMENT.out.versions)
}

emit:
phyloseq = !params.skip_phyloseq ? PHYLOSEQ.out.rds : []
tse = !params.skip_tse ? TREESUMMARIZEDEXPERIMENT.out.rds : []
versions = ch_versions_robject_workflow
}
1 change: 1 addition & 0 deletions tests/pipeline/doubleprimers.nf.test
Original file line number Diff line number Diff line change
Expand Up @@ -32,6 +32,7 @@ nextflow_pipeline {
{ assert snapshot(path("$outputDir/input/Samplesheet_double_primer.tsv")).match("input") },
{ assert new File("$outputDir/qiime2/abundance_tables/feature-table.tsv").exists() },
{ assert new File("$outputDir/phyloseq/kraken2_phyloseq.rds").exists() },
{ assert new File("$outputDir/treesummarizedexperiment/kraken2_TreeSummarizedExperiment.rds").exists() },
{ assert snapshot(path("$outputDir/kraken2/ASV_tax.greengenes.kraken2.classifiedreads.txt"),
path("$outputDir/kraken2/ASV_tax.greengenes.kraken2.complete.tsv"),
path("$outputDir/kraken2/ASV_tax.greengenes.kraken2.tsv")).match("kraken2") },
Expand Down
3 changes: 2 additions & 1 deletion tests/pipeline/failed.nf.test
Original file line number Diff line number Diff line change
Expand Up @@ -29,7 +29,8 @@ nextflow_pipeline {
{ assert new File("$outputDir/qiime2/diversity/alpha_diversity/shannon_vector/kruskal-wallis-pairwise-treatment1.csv").exists() },
{ assert new File("$outputDir/qiime2/diversity/beta_diversity/bray_curtis_pcoa_results-PCoA/index.html").exists() },
{ assert new File("$outputDir/summary_report/summary_report.html").exists() },
{ assert new File("$outputDir/phyloseq/dada2_phyloseq.rds").exists() }
{ assert new File("$outputDir/phyloseq/dada2_phyloseq.rds").exists() },
{ assert new File("$outputDir/treesummarizedexperiment/dada2_TreeSummarizedExperiment.rds").exists() }
)
}
}
Expand Down
Loading

0 comments on commit 64de8ae

Please sign in to comment.