nfDA
dw-thomson committed Oct 9, 2024
1 parent 93789fa commit fd82735
Showing 7 changed files with 1,352 additions and 188 deletions.
45 changes: 44 additions & 1 deletion content/DifferentialAbundance.md
@@ -4,7 +4,50 @@ title = 'nf-core/DifferentialAbundance'

![DifferentialAbundance_pipeline](DifferentialAbundance_pipeline.png)

#### nf-core/DifferentialAbundance
nf-core/differentialabundance is a bioinformatics pipeline that can be used to analyse data represented as matrices, comparing groups of observations to generate differential statistics and downstream analyses. The pipeline supports RNA-seq data such as that generated by the nf-core rnaseq workflow, and Affymetrix arrays via .CEL files. Other types of matrix may also work with appropriate changes to parameters, and PRs to support additional specific modalities are welcomed.

1. Optionally generate a list of genomic feature annotations using the input GTF file (if a table is not explicitly supplied).
2. Cross-check matrices, sample annotations, feature set and contrasts to ensure consistency.
3. Run differential analysis over all contrasts specified.
4. Optionally run a differential gene set analysis.
5. Generate exploratory and differential analysis plots for interpretation.
6. Optionally build and (if specified) deploy a Shiny app for fully interactive mining of results.
7. Build an HTML report based on R markdown, with interactive plots (where possible) and tables.

# Set-up run
***for those with more confidence*** this run can be prepared from scratch in a new directory.
- the pipeline will take ~40 mins to run
- for this reason there is a completed run in ~/workshop/nfDifferentialAbundance/, where you can go through the motions
- a good chance to demonstrate the **resume** feature of nextflow
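
Nextflow caches completed tasks in its `work/` directory, so re-running the same command with `-resume` from the same launch directory reuses them instead of starting again. A minimal sketch of what that could look like here: the input file names (`samplesheet.csv`, `contrasts.csv`, `counts.tsv`, `genes.gtf`), the `outs` directory and the `docker` profile are assumptions, and the parameter names should be checked against `nextflow run nf-core/differentialabundance --help`. The command is only printed unless `DRY_RUN=0` is set.

```bash
#!/usr/bin/env bash
# Hypothetical file names; replace them with the files for your own run.
cmd=(
  nextflow run nf-core/differentialabundance
  -profile rnaseq,docker
  --input samplesheet.csv
  --contrasts contrasts.csv
  --matrix counts.tsv
  --gtf genes.gtf
  --outdir outs
  -resume   # reuse cached results from a previous (partial) run
)

# Print the command by default; set DRY_RUN=0 to actually launch it.
if [ "${DRY_RUN:-1}" = "0" ]; then
  "${cmd[@]}"
else
  printf '%s ' "${cmd[@]}"
  echo
fi
```

Running the identical command a second time with `-resume` should report the already-finished processes as cached rather than re-executing them.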


```bash
mkdir ~/workshop/nfDifferentialAbundance2
cd ~/workshop/nfDifferentialAbundance2
ls -l
```
***optional*** you can use the nf-core launch command to build a launch command, but these instructions will be using a 'nextflow run' command

The minimal input requirements are
1. Sample sheet
- containing the sample information, metadata and group relationships

2




nf-core launch
#### nf-core/DifferentialAbundance
There are a few ways of installing an nf-core pipeline, but it happens automatically when you use a *nextflow run nf-core/* command.
You can check the ~/.nextflow/assets folder to see what is already installed.
142 changes: 79 additions & 63 deletions content/nfRNAseq.md
@@ -13,8 +13,8 @@ nf-core list -s stars

Let's use nf-core tools to build a command to run the nf-core/RNAseq pipeline
```bash
mkdir ~/workshop/nfcore
cd ~/workshop/nfcore
mkdir -p ~/workshop/nfRNAseq
cd ~/workshop/nfRNAseq

nf-core launch -h
```
@@ -60,7 +60,7 @@ nf-core launch --id 1728482936_6064841c2138
```
- alternatively you can use a nextflow run command

- no need to run this command; in the interest of time (and the lack of disk space on your instance), I've pre-prepared the outputs for this run. We will run the next pipeline to completion.
- no need to run this command; in the interest of time (and the lack of disk space on your instance), I've pre-prepared the outputs for this run

navigate to run directory to see the nextflow run command
```bash
@@ -84,81 +84,97 @@ The dataset used throughout this workshop is as follow:
- treatment vs control
- 4 replicates for each
```
- the information is reflected in the samplesheet and run command
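
The design above (2 cell lines x 2 conditions x 4 replicates = 16 samples) maps onto one row per sample in an nf-core/rnaseq samplesheet. A minimal sketch, assuming hypothetical sample names in the style of `Acontrol1`, hypothetical FASTQ paths, paired-end reads and `auto` strandedness; the real samplesheet for the workshop run may differ.

```bash
#!/usr/bin/env bash
# Build a samplesheet with one row per sample.
# Sample names and FASTQ paths are hypothetical placeholders.
samplesheet="$(mktemp)"
echo "sample,fastq_1,fastq_2,strandedness" > "${samplesheet}"

for line in A B; do                        # 2 cell lines
  for condition in control treatment; do   # 2 conditions
    for rep in 1 2 3 4; do                 # 4 replicates each
      sample="${line}${condition}${rep}"
      echo "${sample},fastq/${sample}_1.fastq.gz,fastq/${sample}_2.fastq.gz,auto"
    done
  done
done >> "${samplesheet}"

# Count the data rows (everything after the header line).
n_samples=$(tail -n +2 "${samplesheet}" | wc -l | tr -d ' ')
echo "samples: ${n_samples}"
```

Counting the rows like this is a quick sanity check that the samplesheet matches the experimental design before launching a long run.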

# Results



1. Merge re-sequenced FastQ files (cat)
2. Sub-sample FastQ files and auto-infer strandedness (fq, Salmon)
3. Read QC (FastQC)
4. UMI extraction (UMI-tools)
5. Adapter and quality trimming (Trim Galore!)
6. Removal of genome contaminants (BBSplit)
7. Removal of ribosomal RNA (SortMeRNA)
8. Choice of multiple alignment and quantification routes:
   1. STAR -> Salmon
   2. STAR -> RSEM
   3. HiSAT2 -> NO QUANTIFICATION
9. Sort and index alignments (SAMtools)
10. UMI-based deduplication (UMI-tools)
11. Duplicate read marking (picard MarkDuplicates)
12. Transcript assembly and quantification (StringTie)
13. Create bigWig coverage files (BEDTools, bedGraphToBigWig)
14. Extensive quality control:
    1. RSeQC
    2. Qualimap
    3. dupRadar
    4. Preseq
    5. DESeq2
15. Pseudoalignment and quantification (Salmon or ‘Kallisto’; optional)
16. Present QC for raw read, alignment, gene biotype, sample similarity, and strand-specificity checks (MultiQC, R)

In this section we will work through setting up the nf-core/RNAseq pipeline. We will be choosing specific parameters and, finally, going through what the output looks like.

# setting up nf-core/RNAseq
- most of this information is reflected in the samplesheet and run command

#### dataset
The dataset used throughout this workshop is as follows:
#### Run Summary

List the results directory. All nf-core pipelines give their outputs a consistent structure.
```bash
tree /home/workshop/workshop/nfRNAseq/outs
```
- 16 Samples sequenced with an MGI400 sequencer at SAGC using the Tecan Universal RNA-seq library protocol.
- 2 different cancer cell lines (human)
- treatment vs control
- 4 replicates for each
```
***NOTE*** some concessions had to be made to use this dataset in the workshop, given the large file sizes, the long run times and the need for high compute resources.

#### what are the inputs to an RNAseq pipeline
[link to execution timeline](../execution_timeline_2024-10-05_16-02-39.html)

# .fastq /.fastq.gz
```
[link to execution report](../execution_report_2024-10-05_16-02-39.html)

# Genomics Files Background
Quick background on genomic file formats to help describe the inputs and outputs of an NGS experiment.
Knowing what these files are is not only important for finding which files to use for a pipeline; it is also a key foundation of genomic bioinformatics.
Being able to use and manipulate each file opens up many opportunities, and is often required for troubleshooting.
Many are plain text files, which means they can be manipulated with basic text editing.
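
Because these formats are plain text, standard shell tools are often enough to inspect them. A minimal sketch using a tiny made-up FASTQ file (the reads are invented for illustration): every FASTQ record spans four lines, so the read count is the line count divided by four.

```bash
#!/usr/bin/env bash
# Write a tiny two-read FASTQ file (made-up reads, for illustration only).
fastq="$(mktemp)"
cat > "${fastq}" <<'EOF'
@read1
ACGTACGT
+
IIIIIIII
@read2
TTGGCCAA
+
IIIIIIII
EOF

# Each FASTQ record spans 4 lines: header, sequence, '+', qualities.
n_lines=$(wc -l < "${fastq}" | tr -d ' ')
n_reads=$(( n_lines / 4 ))
echo "reads: ${n_reads}"

# For a compressed file the same idea applies (sample.fastq.gz is a placeholder):
# gzip -dc sample.fastq.gz | awk 'END { print NR / 4 }'
```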

##### .fastq /.fastq.gz

```
![](../fastqfile.png)

# .fasta
#### .fasta
The genome.fa file is a plain text representation of the genome sequence. This is the 'reference' to which the sequencing files (.fastq) are aligned.

# .gtf
```bash
head ./smallRNA/outs/mirtrace/[:]/mirtrace/qc_passed_reads.rnatype_unknown.collapsed/Acontrol1.fastp.fasta
```
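
Beyond `head`, a FASTA file can be summarised with `grep` and `awk`. A minimal sketch on a tiny made-up FASTA (sequence names and bases are invented): header lines start with `>`, and a sequence may wrap over several lines.

```bash
#!/usr/bin/env bash
# Write a tiny FASTA file with two made-up sequences.
fasta="$(mktemp)"
cat > "${fasta}" <<'EOF'
>chr1
ACGTACGTAC
GTAC
>chr2
TTGGCC
EOF

# Header lines start with '>', so counting them counts the sequences.
n_seqs=$(grep -c '^>' "${fasta}")
echo "sequences: ${n_seqs}"

# Print each sequence name with its total length, summing wrapped lines.
awk '/^>/ { if (name) print name, len; name = substr($0, 2); len = 0; next }
     { len += length($0) }
     END { if (name) print name, len }' "${fasta}"
```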
#### .gtf
genes.gtf: a genomic interval file referencing the genome. This file describes the genes/transcripts. It is required for counting where reads are mapped, and often carries a lot more annotation data about each gene/transcript.
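
A GTF file has nine tab-separated columns, with the feature type (gene, transcript, exon, ...) in column 3 and the annotation attributes in column 9. A minimal sketch on a tiny made-up GTF (coordinates and IDs are invented) showing how features can be counted with `awk` and `cut`:

```bash
#!/usr/bin/env bash
gtf="$(mktemp)"
# printf keeps the tab separators explicit; coordinates and IDs are invented.
printf 'chr1\tdemo\tgene\t100\t900\t.\t+\t.\tgene_id "g1";\n' > "${gtf}"
printf 'chr1\tdemo\ttranscript\t100\t900\t.\t+\t.\tgene_id "g1"; transcript_id "t1";\n' >> "${gtf}"
printf 'chr1\tdemo\texon\t100\t300\t.\t+\t.\tgene_id "g1"; transcript_id "t1";\n' >> "${gtf}"
printf 'chr1\tdemo\texon\t600\t900\t.\t+\t.\tgene_id "g1"; transcript_id "t1";\n' >> "${gtf}"

# Column 3 holds the feature type; skip comment lines starting with '#'.
n_exons=$(awk -F'\t' '!/^#/ && $3 == "exon"' "${gtf}" | wc -l | tr -d ' ')
echo "exons: ${n_exons}"

# Tally every feature type in the file.
cut -f3 "${gtf}" | sort | uniq -c
```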

# .bed
#### .bed
![](../bedfile.png)

#### Genomic file formats
- Now that we have gone through how to run the nf-core/RNAseq pipeline, let's look at the inputs and outputs in detail.
- using this chance to describe the key genomic file formats used in RNAseq and beyond
- Knowing what these files are isn't only important in finding which files to use for a pipeline, but a key foundation of genomic bioinformati$
- being able to use and manipulate each file opens up many opportunities, and is often required for troubleshooting when something has gone w$
- many files are plain text files, which means they can be manipulated with basic text editing. I'll be going through some examples.
- For a more visual perspective, we'll also be using a genome browser, IGV (Integrative Genomics Viewer) to get a feel for what information eac$


# bam
#### bam
![](../bamfile.png)

# bigwig


# Walkthrough of outputs nf-core/RNAseq
The MultiQC summary is a real strength of nf-core pipelines.
A lot of the key analyses are captured.

[link to multiqc report](../multiqc_report.html)

```
Overview of the key processes run with nf-core/RNAseq.
- identifying steps for troubleshooting a run.
- far more than just mapping (star) and counting (RSEM)
- demonstrating the need for a workflow manager
```
### Preprocessing
1. cat - Merge re-sequenced FastQ files
2. FastQC - Raw read QC

[link to fastqc.html](../Acontrol1_1_fastqc.html)

3. UMI-tools extract - UMI barcode extraction
4. TrimGalore - Adapter and quality trimming
5. BBSplit - Removal of genome contaminants
6. SortMeRNA - Removal of ribosomal RNA

### Alignment and quantification
1. STAR and Salmon - Fast splice-aware genome alignment and transcriptome quantification
2. STAR via RSEM - Alignment and quantification of expression levels
3. HISAT2 - Memory-efficient splice-aware alignment to a reference

### Alignment post-processing
1. SAMtools - Sort and index alignments
2. UMI-tools dedup - UMI-based deduplication
3. picard MarkDuplicates - Duplicate read marking

### Other steps
1. StringTie - Transcript assembly and quantification
2. BEDTools and bedGraphToBigWig - Create bigWig coverage files

### Quality control
1. RSeQC - Various RNA-seq QC metrics
2. Qualimap - Various RNA-seq QC metrics
3. dupRadar - Assessment of technical / biological read duplication
4. Preseq - Estimation of library complexity
5. featureCounts - Read counting relative to gene biotype
6. DESeq2 - PCA plot and sample pairwise distance heatmap and dendrogram
7. MultiQC - Present QC for raw reads, alignment, read counting and sample similarity

### Pseudoalignment and quantification
1. Salmon - Wicked fast gene and isoform quantification relative to the transcriptome
2. Kallisto - Near-optimal probabilistic RNA-seq quantification

### Workflow reporting and genomes
1. Reference genome files - Saving reference genome indices/files
2. Pipeline information - Report metrics generated during the workflow execution

[link to execution report](../execution_report_2024-10-05_16-02-39.html)
1 change: 1 addition & 0 deletions content/setup_environment.md
@@ -245,6 +245,7 @@ rm samtools-1.21.tar.bz2
cd samtools-1.21/
make
make install
export PATH=$PATH:/home/workshop/bin/bin
```
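
The `export` line only affects the current shell session. A minimal sketch of a safer variant that avoids adding the directory twice and then checks whether `samtools` is found; it assumes the same install directory as above, and adding the line to `~/.bashrc` is one common way to make it persistent.

```bash
#!/usr/bin/env bash
# Directory used in the export line above.
tool_dir="/home/workshop/bin/bin"

# Append only if it is not already on PATH, so re-running is harmless.
case ":${PATH}:" in
  *":${tool_dir}:"*) ;;
  *) export PATH="${PATH}:${tool_dir}" ;;
esac

# Confirm which samtools (if any) the shell will now find.
if command -v samtools >/dev/null 2>&1; then
  samtools --version | head -n 1
else
  echo "samtools not found on PATH"
fi
```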

[link to RNAseq Background slides](../Workshop_RNAseq_Intro.pdf)
99 changes: 61 additions & 38 deletions public/differentialabundance/index.html
@@ -12,43 +12,23 @@


<meta name="description" content="
nf-core/differentialabundance is a bioinformatics pipeline that can be used to analyse data represented as matrices, comparing groups of observations to generate differential statistics and downstream analyses. The pipeline supports RNA-seq data such as that generated by the nf-core rnaseq workflow, and Affymetrix arrays via .CEL files. Other types of matrix may also work with appropriate changes to parameters, and PRs to support additional specific modalities are welcomed.
Optionally generate a list of genomic feature annotations using the input GTF file (if a table is not explicitly supplied).
Cross-check matrices, sample annotations, feature set and contrasts to ensure consistency.
Run differential analysis over all contrasts specified.
Optionally run a differential gene set analysis.
Generate exploratory and differential analysis plots for interpretation.
Optionally build and (if specified) deploy a Shiny app for fully interactive mining of results.
Build an HTML report based on R markdown, with interactive plots (where possible) and tables.
nf-core/DifferentialAbundance
There are a few ways of installing an nf-core pipeline, but it happens automatically when you use a nextflow run nf-core/ command.
You can check the ~/.nextflow/assets folders to see what is already installed
ls -l ~/.nextflow/assets/nf-core/
If you don&rsquo;t see the pipeline, you can pull it from the nf-core website.
nextflow pull nf-core/differentialabundance
but this happens automatically when running nextflow run nf-core/differentialabundance
R Shiny App
one of the key outputs from the nf-core/DifferentialAbundance pipeline is an Shiny app.
This runs R processes in the background and presents the data as a site.
One way to run the app is to kick it off in R studio, to do this you can download the App directory and use the following R code to open it.
Set-up run
Launching the App on your local R Studio
option 1">
for those with more confidence this run can be prepared from scratch in a new directory.">



@@ -154,23 +134,66 @@ <h1 class="Heading-title">

</header>
<p><img src="DifferentialAbundance_pipeline.png" alt="DifferentialAbundance_pipeline"></p>
<p>nf-core/differentialabundance is a bioinformatics pipeline that can be used to analyse data represented as matrices, comparing groups of observations to generate differential statistics and downstream analyses. The pipeline supports RNA-seq data such as that generated by the nf-core rnaseq workflow, and Affymetrix arrays via .CEL files. Other types of matrix may also work with appropriate changes to parameters, and PRs to support additional specific modalities are welcomed.</p>
<ol>
<li>Optionally generate a list of genomic feature annotations using the input GTF file (if a table is not explicitly supplied).</li>
<li>Cross-check matrices, sample annotations, feature set and contrasts to ensure consistency.</li>
<li>Run differential analysis over all contrasts specified.</li>
<li>Optionally run a differential gene set analysis.</li>
<li>Generate exploratory and differential analysis plots for interpretation.</li>
<li>Optionally build and (if specified) deploy a Shiny app for fully interactive mining of results.</li>
<li>Build an HTML report based on R markdown, with interactive plots (where possible) and tables.</li>
</ol>



<h1 id="set-up-run">
<a class="Heading-link u-clickable" href="/workshop_nfRNAseq/differentialabundance/#set-up-run">Set-up run</a>
</h1>
<p><em><strong>for those with more confidence</strong></em> this run can be prepared from scratch in a new directory.</p>
<ul>
<li>the pipeline will take ~40 mins to run</li>
<li>for this reason there is a completed run in ~/workshop/nfDifferentialAbundance/, where you can go through the motions</li>
<li>a good chance to demonstrate the <strong>resume</strong> feature of nextflow</li>
</ul>



<h4 id="nf-coredifferentialabundance">
<a class="Heading-link u-clickable" href="/workshop_nfRNAseq/differentialabundance/#nf-coredifferentialabundance">nf-core/DifferentialAbundance</a>
</h4>
<p>There are a few ways of installing an nf-core pipeline, but it happens automatically when you use a <em>nextflow run nf-core/</em> command.
You can check the ~/.nextflow/assets folder to see what is already installed.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">mkdir ~/workshop/nfDifferentialAbundance2
</span></span><span class="line"><span class="cl"><span class="nb">cd</span> ~/workshop/nfDifferentialAbundance2
</span></span><span class="line"><span class="cl">ls -l</span></span></code></pre></div>
<p><em><strong>optional</strong></em> you can use the nf-core launch command to build a launch command, but these instructions will be using a &rsquo;nextflow run&rsquo; command</p>
<p>The minimal input requirements are</p>
<ol>
<li>Sample sheet</li>
</ol>
<ul>
<li>containing the sample information, metadata and group relationships</li>
</ul>
<p>2</p>
<p>nf-core launch</p>



<pre tabindex="0"><code>ls -l ~/.nextflow/assets/nf-core/</code></pre>
<p>If you don&rsquo;t see the pipeline, you can pull it from the nf-core website.</p>
<pre tabindex="0"><code>






####
nf-core/DifferentialAbundance

There are a few ways of installing an nf-core pipeline, but it happens automatically when you use a *nextflow run nf-core/* command.
You can check the ~/.nextflow/assets folder to see what is already installed.</code></pre>
<p>ls -l ~/.nextflow/assets/nf-core/</p>



<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">nextflow pull nf-core/differentialabundance</span></span></code></pre></div>
<pre tabindex="0"><code>If you don&#39;t see the pipeline, you can pull it from the nf-core website.
```bash
nextflow pull nf-core/differentialabundance</code></pre>
<p>but this happens automatically when running nextflow run nf-core/differentialabundance</p>
<p><img src="nextflowpull.png" alt="nfpull"></p>

1,041 changes: 1,041 additions & 0 deletions public/execution_report_2024-10-05_16-02-39.html

Large diffs are not rendered by default.
