nfDA

sagc-bioinformatics · Oct 9, 2024 · fd82735 · fd82735
1 parent 93789fa
commit fd82735

Showing 7 changed files with 1,352 additions and 188 deletions.
diff --git a/content/DifferentialAbundance.md b/content/DifferentialAbundance.md
@@ -4,7 +4,50 @@ title = 'nf-core/DifferentialAbundance'
 
 ![DifferentialAbundance_pipeline](DifferentialAbundance_pipeline.png)
 
-#### nf-core/DifferentialAbundance
+nf-core/differentialabundance is a bioinformatics pipeline that can be used to analyse data represented as matrices, comparing groups of observations to generate differential statistics and downstream analyses. The pipeline supports RNA-seq data such as that generated by the nf-core rnaseq workflow, and Affymetrix arrays via .CEL files. Other types of matrix may also work with appropriate changes to parameters, and PRs to support additional specific modalities are welcomed.
+
+1. Optionally generate a list of genomic feature annotations using the input GTF file (if a table is not explicitly supplied).
+2. Cross-check matrices, sample annotations, feature set and contrasts to ensure consistency.
+3. Run differential analysis over all contrasts specified.
+4. Optionally run a differential gene set analysis.
+5. Generate exploratory and differential analysis plots for interpretation.
+6. Optionally build and (if specified) deploy a Shiny app for fully interactive mining of results.
+7. Build an HTML report based on R markdown, with interactive plots (where possible) and tables.
+
+# Set-up run
+***for those with more confidence*** this run can be prepared from scratch in a new directory. 
+- the pipeline will take ~40 minutes to run
+- for this reason there is a completed run in ~/workshop/nfDifferentialAbundance/, where you can go through the motions
+- this is a good chance to demonstrate the **resume** feature of Nextflow
+
+
+```bash
+mkdir ~/workshop/nfDifferentialAbundance2
+cd ~/workshop/nfDifferentialAbundance2
+ls -l
+```
+***optional*** you can use the nf-core launch tool to build the run command, but these instructions will use a 'nextflow run' command
+
+The minimal input requirements are:
+1. Sample sheet
+- containing the sample information, metadata and group relationships
+
+2
+
+
+
+```bash
+nf-core launch
+```
+
+
+
+
+
+
+
+
+#### nf-core/DifferentialAbundance
 
 There are a few ways of installing an nf-core pipeline, but this happens automatically when you use a *nextflow run nf-core/* command.
 You can check the ~/.nextflow/assets folder to see what is already installed.
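
As a sketch of the **resume** feature mentioned in the set-up notes of this file: Nextflow's `-resume` option reuses cached task results from an earlier run launched from the same directory, so a ~40 minute run does not have to start again from the first step. The helper name `resume_cmd`, the file names and the `docker` profile below are placeholders assumed for illustration, not the workshop's actual inputs, and the parameter names should be checked against the pipeline's own usage documentation. The command is only printed here, not executed.

```bash
# Sketch only: resume_cmd is a hypothetical helper, and the inputs passed to it
# (samplesheet.csv, contrasts.csv, counts.tsv, outs) are placeholder names.
# $1 = sample sheet, $2 = contrasts table, $3 = abundance matrix, $4 = output directory
resume_cmd() {
  printf '%s\n' "nextflow run nf-core/differentialabundance \
-profile rnaseq,docker \
--input $1 --contrasts $2 --matrix $3 --outdir $4 \
-resume"
}

# Print the command for review before running it by hand.
resume_cmd samplesheet.csv contrasts.csv counts.tsv outs
```

Printing the command first makes it easy to confirm the inputs before committing to a long run; re-issuing the same command with `-resume` after an interruption should pick up from the cached tasks.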

diff --git a/content/nfRNAseq.md b/content/nfRNAseq.md
@@ -13,8 +13,8 @@ nf-core list -s stars
 
 Let's use nf-core tools to build a command to run the nf-core/RNAseq pipeline
 ```bash
-mkdir ~/workshop/nfcore
-cd ~/workshop/nfcore  
+mkdir -p  ~/workshop/nfRNAseq
+cd ~/workshop/nfRNAseq 
 
 nf-core launch -h
 ```
@@ -60,7 +60,7 @@ nf-core launch --id 1728482936_6064841c2138
 ```
 - alternatively you can use a nextflow run command
 
-- no need to run this command, in the interest of time (and the lack of disk space on your intance), I've pre-prepared the outputs for this run. We will run the next pipeline to completion.
+- no need to run this command: in the interest of time (and given the lack of disk space on your instance), I've pre-prepared the outputs for this run
 
 navigate to run directory to see the nextflow run command
 ```bash
@@ -84,81 +84,97 @@ The dataset used throughout this workshop is as follow:
 - treatment vs control
 - 4 replicates for each
 ```
-- the information is reflected in the samplesheet and run command
-
-# Results
-
-
-
-	1. Merge re-sequenced FastQ files (cat)
-	2. Sub-sample FastQ files and auto-infer strandedness (fq, Salmon)
-	3. Read QC (FastQC)
-	4. UMI extraction (UMI-tools)
-	5. Adapter and quality trimming (Trim Galore!)
-	6. Removal of genome contaminants (BBSplit)
-	7. Removal of ribosomal RNA (SortMeRNA)
-	8. Choice of multiple alignment and quantification routes:
-		1. STAR -> Salmon
-		2. STAR -> RSEM
-		3. HiSAT2 -> NO QUANTIFICATION
-	9. Sort and index alignments (SAMtools)
-	10. UMI-based deduplication (UMI-tools)
-	11. Duplicate read marking (picard MarkDuplicates)
-	12. Transcript assembly and quantification (StringTie)
-	13. Create bigWig coverage files (BEDTools, bedGraphToBigWig)
-	14. Extensive quality control:
-		1. RSeQC
-		2. Qualimap
-		3. dupRadar
-		4. Preseq
-		5. DESeq2
-	15. Pseudoalignment and quantification (Salmon or ‘Kallisto’; optional)
-	16. Present QC for raw read, alignment, gene biotype, sample similarity, and strand-specificity checks (MultiQC, R)
-
-In this section we will work through setting up the nf-core/RNAseq pipeline. We will be chosing specific parameters and finally, going through was the output looks like.
-
-# setting up nf-core/RNAseq
+- most of this information is reflected in the samplesheet and run command
 
-#### dataset
-The dataset used throughout this workshop is as follow:
+#### Run Summary
 
+List the results directory. All nf-core pipelines produce outputs with a consistent structure.
+```bash
+tree /home/workshop/workshop/nfRNAseq/outs
 ```
-- 16 Samples sequenced with an MGI400 sequencer at SAGC using the Tecan Universal RNA-seq library protocol.
-- 2 different cancer cell lines (human)
-- treatment vs control
-- 4 replicates for each
-```
-***NOTE*** some concessions had to be made to work with this for workshop. Taking into account the large file sizes, the long run times and need for high compute resources.
 
-#### what are the inputs to an RNAseq pipeline
+[link to execution timeline](../execution_timeline_2024-10-05_16-02-39.html)
 
-# .fastq /.fastq.gz
-```
+[link to execution report](../execution_report_2024-10-05_16-02-39.html)
+
+# Genomics Files Background
+Quick background on genomic file formats to help describe the inputs and outputs of an NGS experiment.
+Knowing what these files are isn't only important for finding which files to use for a pipeline; it is a key foundation of genomic bioinformatics.
+Being able to use and manipulate each file opens up many opportunities, and is often required for troubleshooting.
+Many are plain text files, which means they can be manipulated with basic text editing.
+
+##### .fastq /.fastq.gz
 
-```
 ![](../fastqfile.png)
 
-# .fasta
+#### .fasta
+The genome.fa file is a plain text representation of the genome sequence. This is the 'reference' to which the sequencing files (.fastq) are aligned.
 
-# .gtf
+```bash
+head ./smallRNA/outs/mirtrace/[:]/mirtrace/qc_passed_reads.rnatype_unknown.collapsed/Acontrol1.fastp.fasta
+```
+#### .gtf
+genes.gtf: a genomic interval file referencing the genome. This file describes the genes/transcripts. It is required for counting where reads are mapped, and often carries a lot more annotation data about each gene/transcript.
 
-# .bed
+#### .bed
 ![](../bedfile.png)
 
-#### Genomic file formats
-- now that we have gone through how to run the nf-core/RNAseq pipeline. Let's look at the inputs and outputs in detail.
-- using this chance to describe what are the key genomic file formats used in RNAseq and beyond
-- Knowing what these files are isn't only important in finding which files to use for a pipeline, but a key foundation of genomic bioinformati$
-- being able to use and manipulate each file open's up many opportunities, and is often required for troubleshooting wehn something has gone w$
-- many files are plain text files, this means they can be manipluated with basic text editing. I'll be going through some examples.
-- For a more visual perspective, we'll also be using a genome browser, IGV (Integrative Genome Browser) to get a feel for what information eac$
 
-
-# bam
+#### bam
 ![](../bamfile.png)
 
-# bigwig
-
 
+# Walkthrough of nf-core/RNAseq outputs
+The MultiQC summary is a real strength of nf-core pipelines.
+A lot of the key analyses are captured in it.
 
 [link to multiqc report](../multiqc_report.html)
+
+```
+Overview of the key processes run with nf-core/RNAseq.
+- identifying the steps to check when troubleshooting a run.
+- far more than just mapping (star) and counting (RSEM)
+- demonstrating the need for a workflow manager
+```
+### Preprocessing
+1. cat - Merge re-sequenced FastQ files
+2. FastQC - Raw read QC
+
+[link to fastqc.html](../Acontrol1_1_fastqc.html)
+
+3. UMI-tools extract - UMI barcode extraction
+4. TrimGalore - Adapter and quality trimming
+5. BBSplit - Removal of genome contaminants
+6. SortMeRNA - Removal of ribosomal RNA
+
+### Alignment and quantification
+1. STAR and Salmon - Fast splice-aware genome alignment and transcriptome quantification
+2. STAR via RSEM - Alignment and quantification of expression levels
+3. HISAT2 - Memory-efficient splice-aware alignment to a reference
+
+### Alignment post-processing
+1. SAMtools - Sort and index alignments
+2. UMI-tools dedup - UMI-based deduplication
+3. picard MarkDuplicates - Duplicate read marking
+
+### Other steps
+1. StringTie - Transcript assembly and quantification
+2. BEDTools and bedGraphToBigWig - Create bigWig coverage files
+
+### Quality control
+1. RSeQC - Various RNA-seq QC metrics
+2. Qualimap - Various RNA-seq QC metrics
+3. dupRadar - Assessment of technical / biological read duplication
+4. Preseq - Estimation of library complexity
+5. featureCounts - Read counting relative to gene biotype
+6. DESeq2 - PCA plot and sample pairwise distance heatmap and dendrogram
+7. MultiQC - Present QC for raw reads, alignment, read counting and sample similarity
+
+### Pseudoalignment and quantification
+1. Salmon - Wicked fast gene and isoform quantification relative to the transcriptome
+2. Kallisto - Near-optimal probabilistic RNA-seq quantification
+### Workflow reporting and genomes
+1. Reference genome files - Saving reference genome indices/files
+2. Pipeline information - Report metrics generated during the workflow execution
+
+[link to execution report](../execution_report_2024-10-05_16-02-39.html)
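
The point made in the "Genomics Files Background" section of this file, that many genomic files are plain text and can be handled with basic text tools, can be sketched with two tiny made-up files. The reads, the gene `G1` and the file names below are invented for illustration and are not workshop data; the example writes to a temporary directory and removes it afterwards.

```bash
# Work in a throwaway directory so nothing in the workshop folders is touched.
tmp=$(mktemp -d)

# A two-read FASTQ file: every record is exactly four lines
# (header, sequence, '+' separator, base qualities).
cat > "$tmp/demo.fastq" <<'EOF'
@read1
ACGTACGT
+
IIIIIIII
@read2
TTGCA
+
IIIII
EOF

# Number of reads = number of lines / 4.
n_reads=$(awk 'END { print NR / 4 }' "$tmp/demo.fastq")
echo "reads: $n_reads"

# A minimal two-line GTF (tab-separated, nine columns); the gene is made up.
printf 'chr1\tdemo\tgene\t100\t500\t.\t+\t.\tgene_id "G1";\n'  > "$tmp/demo.gtf"
printf 'chr1\tdemo\texon\t100\t200\t.\t+\t.\tgene_id "G1";\n' >> "$tmp/demo.gtf"

# Count the records whose feature type (column 3) is "gene".
n_genes=$(awk -F'\t' '$3 == "gene"' "$tmp/demo.gtf" | wc -l | tr -d ' ')
echo "genes: $n_genes"

# Clean up the temporary files.
rm -r "$tmp"
```

The same two `awk` one-liners should work on real files; for a compressed `.fastq.gz`, the contents would need to be decompressed into the pipe first (for example with `gzip -dc`).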
diff --git a/content/setup_environment.md b/content/setup_environment.md
@@ -245,6 +245,7 @@ rm samtools-1.21.tar.bz2
 cd samtools-1.21/
 make
 make install
+export PATH=$PATH:/home/workshop/bin/bin
 ```
 
 [link to RNAseq Background slides](../Workshop_RNAseq_Intro.pdf)
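
The `export PATH` line added in the hunk above appends the directory every time it runs. If that line is later copied into a shell start-up file (an assumption; the diff does not say where it lives), a guarded append avoids duplicate entries. `path_append` is a hypothetical helper name, sketched here for illustration.

```bash
# Append a directory to PATH only if it is not already listed.
path_append() {
  case ":$PATH:" in
    *":$1:"*) ;;                 # already present: do nothing
    *) PATH="$PATH:$1" ;;        # otherwise add it to the end
  esac
}

path_append /home/workshop/bin/bin   # directory taken from the hunk above
path_append /home/workshop/bin/bin   # second call is a no-op
export PATH
```

Wrapping `$PATH` in colons before matching means the first and last entries are compared the same way as the ones in the middle.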
diff --git a/public/differentialabundance/index.html b/public/differentialabundance/index.html
@@ -12,43 +12,23 @@
 
 
     <meta name="description" content="
+nf-core/differentialabundance is a bioinformatics pipeline that can be used to analyse data represented as matrices, comparing groups of observations to generate differential statistics and downstream analyses. The pipeline supports RNA-seq data such as that generated by the nf-core rnaseq workflow, and Affymetrix arrays via .CEL files. Other types of matrix may also work with appropriate changes to parameters, and PRs to support additional specific modalities are welcomed.
 
+Optionally generate a list of genomic feature annotations using the input GTF file (if a table is not explicitly supplied).
+Cross-check matrices, sample annotations, feature set and contrasts to ensure consistency.
+Run differential analysis over all contrasts specified.
+Optionally run a differential gene set analysis.
+Generate exploratory and differential analysis plots for interpretation.
+Optionally build and (if specified) deploy a Shiny app for fully interactive mining of results.
+Build an HTML report based on R markdown, with interactive plots (where possible) and tables.
 
 
 
-  nf-core/DifferentialAbundance
-
-There are a few ways of installing the nf-core pipeline. But this happens automatically when you use a nexflow run nf-core/ commands.
-You can check the ~/.nextflow/assets folders to see what is already installed
-
-  
-
-ls -l ~/.nextflow/assets/nf-core/
-If you don&rsquo;t see the pipeline, you can pull it from the nf-core website.
-
-  
-
-nextflow pull nf-core/differentialabundance
-but this happeds automatically when running the nextflow run nf-core/differentialabundance
-
-
-
-
-
-  R Shiny App
-
-
-one of the key outputs from the nf-core/DifferentialAbundance pipeline is an Shiny app.
-This runs R processes in the background and presents the data as a site.
-One way to run the app is to kick it off in R studio, to do this you can download the App directory and use the following R code to open it.
-
-
 
 
+  Set-up run
 
-  Launching the App on your local R Studio
-
-option 1">
+for those with more confidence this run can be prepared from scratch in a new directory.">
 
 
 
@@ -154,23 +134,66 @@ <h1 class="Heading-title">
 
 </header>
   <p><img src="DifferentialAbundance_pipeline.png" alt="DifferentialAbundance_pipeline"></p>
+<p>nf-core/differentialabundance is a bioinformatics pipeline that can be used to analyse data represented as matrices, comparing groups of observations to generate differential statistics and downstream analyses. The pipeline supports RNA-seq data such as that generated by the nf-core rnaseq workflow, and Affymetrix arrays via .CEL files. Other types of matrix may also work with appropriate changes to parameters, and PRs to support additional specific modalities are welcomed.</p>
+<ol>
+<li>Optionally generate a list of genomic feature annotations using the input GTF file (if a table is not explicitly supplied).</li>
+<li>Cross-check matrices, sample annotations, feature set and contrasts to ensure consistency.</li>
+<li>Run differential analysis over all contrasts specified.</li>
+<li>Optionally run a differential gene set analysis.</li>
+<li>Generate exploratory and differential analysis plots for interpretation.</li>
+<li>Optionally build and (if specified) deploy a Shiny app for fully interactive mining of results.</li>
+<li>Build an HTML report based on R markdown, with interactive plots (where possible) and tables.</li>
+</ol>
+
 
 
+<h1 id="set-up-run">
+  <a class="Heading-link u-clickable" href="/workshop_nfRNAseq/differentialabundance/#set-up-run">Set-up run</a>
+</h1>
+<p><em><strong>for those with more confidence</strong></em> this run can be prepared from scratch in a new directory.</p>
+<ul>
+<li>the pipeline will take ~40 minutes to run</li>
+<li>for this reason there is a completed run in ~/workshop/nfDifferentialAbundance/, where you can go through the motions</li>
+<li>a good chance to demonstrate the <strong>resume</strong> feature of nextflow</li>
+</ul>
+
+
 
-<h4 id="nf-coredifferentialabundance">
-  <a class="Heading-link u-clickable" href="/workshop_nfRNAseq/differentialabundance/#nf-coredifferentialabundance">nf-core/DifferentialAbundance</a>
-</h4>
-<p>There are a few ways of installing the nf-core pipeline. But this happens automatically when you use a <em>nexflow run nf-core/</em> commands.
-You can check the ~/.nextflow/assets folders to see what is already installed</p>
+<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">mkdir ~/workshop/nfDifferentialAbundance2
+</span></span><span class="line"><span class="cl"><span class="nb">cd</span> ~/workshop/nfDifferentialAbundance2
+</span></span><span class="line"><span class="cl">ls -l</span></span></code></pre></div>
+<p><em><strong>optional</strong></em> you can use the nf-core launch command to build a launch command, but these instructions will be using a &rsquo;nextflow run&rsquo; command</p>
+<p>The minimal input requirements are</p>
+<ol>
+<li>Sample sheet</li>
+</ol>
+<ul>
+<li>containing the sample information, metadata and group relationships</li>
+</ul>
+<p>2</p>
+<p>nf-core launch</p>
 
 
 
-<pre tabindex="0"><code>ls -l ~/.nextflow/assets/nf-core/</code></pre>
-<p>If you don&rsquo;t see the pipeline, you can pull it from the nf-core website.</p>
+<pre tabindex="0"><code>
+
+
+
+
+
+
+#### 
+nf-core/DifferentialAbundance
+
+There are a few ways of installing the nf-core pipeline. But this happens automatically when you use a *nexflow run nf-core/* commands.
+You can check the ~/.nextflow/assets folders to see what is already installed</code></pre>
+<p>ls -l ~/.nextflow/assets/nf-core/</p>
 
 
 
-<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">nextflow pull nf-core/differentialabundance</span></span></code></pre></div>
+<pre tabindex="0"><code>If you don&#39;t see the pipeline, you can pull it from the nf-core website. 
+```bash
+nextflow pull nf-core/differentialabundance</code></pre>
 <p>but this happeds automatically when running the nextflow run nf-core/differentialabundance</p>
 <p><img src="nextflowpull.png" alt="nfpull"></p>
 

diff --git a/public/execution_report_2024-10-05_16-02-39.html b/public/execution_report_2024-10-05_16-02-39.html