workshop

sagc-bioinformatics · Oct 9, 2024 · 2039b5e
1 parent fd82735

Showing 8 changed files with 16,617 additions and 56 deletions.
diff --git a/content/DifferentialAbundance.md b/content/DifferentialAbundance.md
@@ -20,42 +20,18 @@ nf-core/differentialabundance is a bioinformatics pipeline that can be used to a
 - for this reason there is a completed run in ~/workshop/nfDifferentialAbundance/, where you can go through the motions
 - a good chance to demonstrate the **resume** feature of nextflow
 
-
 ```bash
-mkdir ~/workshop/nfDifferentialAbundance2
-cd ~/workshop/nfDifferentialAbundance2
+mkdir ~/workshop/DifferentialAbundance
+cd ~/workshop/DifferentialAbundance
 ls -l
 ```
 ***Optional:*** you can use the `nf-core launch` command to build a launch command interactively, but these instructions use a `nextflow run` command.
 
-The minimal input requirements are
-1. Sample sheet
-- containg the sample information, metadata and group relationships
-
-2
-
-
-
-
-nf-core launch 
-```
-
-
-
-
-
-
-
-#### 
-nf-core/DifferentialAbundance
-
-There are a few ways of installing the nf-core pipeline. But this happens automatically when you use a *nexflow run nf-core/* commands.
 You can check the `~/.nextflow/assets` folder to see which pipelines are already installed:
-
-```
+```bash
 ls -l ~/.nextflow/assets/nf-core/
 ```
-If you don't see the pipeline, you can pull it from the nf-core website. 
+If you don't see the pipeline, you can pull it from the nf-core GitHub repository:
 ```bash
 nextflow pull nf-core/differentialabundance
 ```
@@ -64,6 +40,28 @@ but this happeds automatically when running the nextflow run nf-core/differentia
 ![nfpull](nextflowpull.png)
 
 
+To identify the requirements of the pipeline, see the usage documentation:
+https://nf-co.re/differentialabundance/1.5.0/docs/usage/
+The minimal input requirements are:
+1. A sample sheet, containing the sample information, metadata and group relationships.
+2. A counts table, such as the one produced by the nf-core/rnaseq pipeline.
+3. A transcript length table, also output by the nf-core/rnaseq pipeline. It allows more accurate normalisation based on transcript length.
+4. A profile selecting the experiment type and the container config, here `-profile rnaseq,singularity`.
+5. A GTF file, ideally from the same genome reference that was used in the mapping step of the nf-core/rnaseq run.
+6. Optional: a gene set file for GSEA (gene set enrichment analysis), which can be downloaded from https://www.gsea-msigdb.org/gsea/msigdb/human/collections.jsp
+
+
+If you just want the results, you can resume the previous run with the following command:
+```bash
+cd ~/workshop/nfDifferentialAbundance
+sh nf_differentialabundance.sh
+```
+
+#### MultiQC summary
+[MultiQC report](../SAGC_Workshop_RNAseq.html)
+
+
+
 # R Shiny App
 - One of the key outputs from the nf-core/DifferentialAbundance pipeline is a Shiny app.
 - It runs R processes in the background and presents the results as an interactive website.

diff --git a/content/DifferentialAbundance.md.save b/content/DifferentialAbundance.md.save
@@ -0,0 +1,225 @@
++++
+title = 'nf-core/DifferentialAbundance'
++++
+
+![DifferentialAbundance_pipeline](DifferentialAbundance_pipeline.png)
+
+nf-core/differentialabundance is a bioinformatics pipeline that can be used to analyse data represented as matrices, comparing groups of observations to generate differential statistics and downstream analyses. The pipeline supports RNA-seq data such as that generated by the nf-core rnaseq workflow, and Affymetrix arrays via .CEL files. Other types of matrix may also work with appropriate changes to parameters, and PRs to support additional specific modalities are welcomed.
+
+1. Optionally generate a list of genomic feature annotations using the input GTF file (if a table is not explicitly supplied).
+2. Cross-check matrices, sample annotations, feature set and contrasts to ensure consistency.
+3. Run differential analysis over all contrasts specified.
+4. Optionally run a differential gene set analysis.
+5. Generate exploratory and differential analysis plots for interpretation.
+6. Optionally build and (if specified) deploy a Shiny app for fully interactive mining of results.
+7. Build an HTML report based on R markdown, with interactive plots (where possible) and tables.
+
+# Set-up run
+***For those with more confidence:*** this run can be prepared from scratch in a new directory.
+- the pipeline will take ~40 mins to run
+- for this reason there is a completed run in ~/workshop/nfDifferentialAbundance/, where you can go through the motions
+- a good chance to demonstrate the **resume** feature of Nextflow
+
+```bash
+mkdir ~/workshop/DifferentialAbundance
+cd ~/workshop/DifferentialAbundance
+ls -l
+```
+***Optional:*** you can use the `nf-core launch` command to build a launch command interactively, but these instructions use a `nextflow run` command.
+
+You can check the `~/.nextflow/assets` folder to see which pipelines are already installed:
+```bash
+ls -l ~/.nextflow/assets/nf-core/
+```
+If you don't see the pipeline, you can pull it from the nf-core GitHub repository:
+```bash
+nextflow pull nf-core/differentialabundance
+```
+This also happens automatically when you run `nextflow run nf-core/differentialabundance`.
+
+![nfpull](nextflowpull.png)
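The "check, then pull" step above can be written as a small bash helper. This is a minimal sketch. It assumes Nextflow's default asset cache (`$NXF_HOME/assets`, normally `~/.nextflow/assets`) unless `NXF_ASSETS` is set. The function names `pipeline_installed` and `ensure_pipeline` are hypothetical and are not part of Nextflow or nf-core.

```bash
# Hypothetical helpers (not part of Nextflow or nf-core).
# Nextflow caches pulled pipelines under $NXF_ASSETS, which defaults to
# $NXF_HOME/assets, i.e. ~/.nextflow/assets for most users.

# Succeed (exit status 0) if the pipeline is already in the local cache.
pipeline_installed() {
  local pipeline="$1"   # e.g. nf-core/differentialabundance
  local assets="${NXF_ASSETS:-${NXF_HOME:-$HOME/.nextflow}/assets}"
  [ -d "$assets/$pipeline" ]
}

# Pull the pipeline only if it is not cached yet.
ensure_pipeline() {
  local pipeline="$1"
  if pipeline_installed "$pipeline"; then
    echo "$pipeline is already installed"
  else
    nextflow pull "$pipeline"
  fi
}
```

For example, run `ensure_pipeline nf-core/differentialabundance`. Skipping the pull is only a convenience, because `nextflow run` fetches a missing pipeline on its own.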
+
+
+To identify the requirements of the pipeline, see the usage documentation:
+https://nf-co.re/differentialabundance/1.5.0/docs/usage/
+The minimal input requirements are:
+1. A sample sheet, containing the sample information, metadata and group relationships.
+2. A counts table, such as the one produced by the nf-core/rnaseq pipeline.
+3. A transcript length table, also output by the nf-core/rnaseq pipeline. It allows more accurate normalisation based on transcript length.
+4. A profile selecting the experiment type and the container config, here `-profile rnaseq,singularity`.
+5. A GTF file, ideally from the same genome reference that was used in the mapping step of the nf-core/rnaseq run.
+6. Optional: a gene set file for GSEA (gene set enrichment analysis), which can be downloaded from https://www.gsea-msigdb.org/gsea/msigdb/human/collections.jsp
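Before a ~40 minute run, it can help to confirm that the input files listed above exist and are non-empty. The bash sketch below does that with a hypothetical `check_inputs` helper. The launch command in the comments is only an illustration. The counts and length file names are assumptions (typical nf-core/rnaseq Salmon outputs), and the parameter names should be checked against the 1.5.0 usage page before use.

```bash
# Hypothetical pre-flight check: print every input that is missing or empty,
# and return a non-zero status if there are any.
check_inputs() {
  local missing=0 f
  for f in "$@"; do
    if [ ! -s "$f" ]; then          # -s: file exists and has size > 0
      echo "missing or empty: $f"
      missing=1
    fi
  done
  return "$missing"
}

# Example launch (file names are assumptions; check the parameter names
# against https://nf-co.re/differentialabundance/1.5.0/docs/usage/):
#
# if check_inputs SampleSheet.csv contrasts.csv genome.gtf \
#      salmon.merged.gene_counts.tsv salmon.merged.gene_lengths.tsv; then
#   nextflow run nf-core/differentialabundance -r 1.5.0 \
#     -profile rnaseq,singularity \
#     --input SampleSheet.csv \
#     --contrasts contrasts.csv \
#     --matrix salmon.merged.gene_counts.tsv \
#     --transcript_length_matrix salmon.merged.gene_lengths.tsv \
#     --gtf genome.gtf \
#     --outdir outs \
#     -resume
# fi
```

Adding `-resume` to a re-run lets Nextflow reuse cached results for any step whose inputs have not changed. That is what makes resuming the completed run quick.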
+
+
+If you just want the results, you can resume the previous run with the following command:
+```bash
+cd ~/workshop/nfDifferentialAbundance
+sh nf_differentialabundance.sh
+```
+
+#### MultiQC summary
+[MultiQC report](../SAGC_Workshop_RNAseq.html)
+
+
+
+# R Shiny App
+- One of the key outputs from the nf-core/DifferentialAbundance pipeline is a Shiny app.
+- It runs R processes in the background and presents the results as an interactive website.
+- One way to run the app is to launch it from RStudio: download the app directory and open it with the R code below.
+
+## Launching the app in your local RStudio
+***option 1***
+
+The easiest way to run your R Shiny app is with RStudio.
+
+First, ***download*** the nf-core/DifferentialAbundance outputs you generated to your PC using `scp`.
+
+```bash
+IP=xxx.xxx.xxx.xxx   # replace with your instance's IP address
+
+mkdir -p ~/workshop_RNAseq/nfDifferentialAbundance
+scp -r workshop@$IP:/home/workshop/workshop/nfDifferentialAbundance/outs/ ~/workshop_RNAseq/nfDifferentialAbundance/
+
+cd ~/workshop_RNAseq/nfDifferentialAbundance/outs
+```
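The R code below needs the folder that contains `data.rds`. This small bash sketch finds it in the downloaded results. The helper name is hypothetical. The default path assumes the `scp` destination above, and the app sub-folder appears to be named after the study (`SAGC_Workshop_RNAseq` in this workshop).

```bash
# Hypothetical helper: print each directory under the results folder that
# holds a shinyngs app (identified by its data.rds file).
# The default location assumes the scp destination used above.
find_shiny_app_dirs() {
  local outs="${1:-$HOME/workshop_RNAseq/nfDifferentialAbundance/outs}"
  find "$outs" -type f -name data.rds -path '*shinyngs_app*' -exec dirname {} \;
}

# Usage:
#   find_shiny_app_dirs                                                       # local download
#   find_shiny_app_dirs /home/workshop/workshop/nfDifferentialAbundance/outs  # on the VM
```

Paste the printed path into `setwd()`.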
+Open RStudio and paste the following R code into the top-left (script) panel. It requires a few packages, including ***shinyngs***; the installation lines only need to be run once.
+```r
+## To run the Shiny app
+# run this in RStudio
+
+# install packages (only needed once)
+install.packages(c("remotes", "markdown"))
+if (!require("BiocManager", quietly = TRUE))
+  install.packages("BiocManager")
+BiocManager::install(c("SummarizedExperiment", "GSEABase", "limma"))
+remotes::install_github("pinin4fjords/shinyngs")
+
+library(shinyngs)
+library(markdown)
+
+# set the working directory to the app folder (the one containing data.rds)
+# if you downloaded the outputs to the directory described above, it will be:
+setwd("~/workshop_RNAseq/nfDifferentialAbundance/outs/shinyngs_app/SAGC_Workshop_RNAseq")
+
+#####
+esel <- readRDS("data.rds") # requires the working directory set above
+app <- prepareApp("rnaseq", esel)
+shiny::shinyApp(app$ui, app$server)
+```
+
+## Using Rstudio via Nectar
+***option 2***
+
+You can also run RStudio on a virtual machine. This can be preferable for large datasets, because you control the compute resources used.
+
+| Setting | Description |
+|---------|-------------|
+| Host | Use the instance IP address, or the hostname [hostname].[project].cloud.edu.au |
+| Login | Use the username you created in the Configure Application dialog |
+
+
+You should see the following login page; log in with your password.\
+![Nectar Rstudio](Nectar_Rstudio.png)
+
+
+In RStudio, paste this code into the top-left panel, making sure the path to the app directory is correct.
+```r
+##To run shiny App
+# run this in R studio
+
+#######################
+# install packages
+#install.packages("remotes")
+#if (!require("BiocManager", quietly = TRUE))
+#  install.packages("BiocManager")
+#BiocManager::install(c("SummarizedExperiment", "GSEABase", "limma"), force = TRUE)
+#remotes::install_github('pinin4fjords/shinyngs', force = TRUE)
+
+library(remotes)
+library(shinyngs)
+library(markdown)
+
+#######################
+
+# navigate to where the shiny app is
+# if you are running RStudio on your Nectar instance, it is in the 'outs' directory of the nf-core/DifferentialAbundance run
+setwd("/home/workshop/workshop/nfDifferentialAbundance/outs/shinyngs_app/SAGC_Workshop_RNAseq/")
+
+#####
+esel <- readRDS("data.rds") # you need to navigate to the 'shinyngs_app' directory
+app <- prepareApp("rnaseq", esel)
+shiny::shinyApp(app$ui, app$server)
+```
+
+# Interactive RNAseq data analysis
+The previous step is worth the hassle, because it gives you access to the R Shiny app with all your results available.
+
+![](RshinyHomepage.png)
+
+From here you can see all the parameters you set in your *nextflow run* launch script, and access the data analysis.
+
+Once you've made it to this point, take some time to navigate around. You will see interactive versions of many key differential expression analysis tools, all running in the background in your RStudio session.
+
+The pipeline has done the equivalent of days of writing R code for DESeq2 and ggplot2, so there is a fair bit to unpack and understand.\
+We'll walk through some of the key analyses and results.
+
+![](Rshiny_volcano.png)
+
+
+
+![](RshinyExperimentalData.png)
+The 'Experimental data' page shows that all of this information came from the sample sheet provided to nf-core/DifferentialAbundance.
+
+```bash
+cd /home/workshop/workshop/nfDifferentialAbundance/
+cat SampleSheet.csv
+```
+	sample,fastq_1,fastq_2,treatment,cellline,condition
+	Acontrol1,,,control,A,Acontrol
+	Acontrol2,,,control,A,Acontrol
+	Acontrol3,,,control,A,Acontrol
+	Acontrol4,,,control,A,Acontrol
+	Atreated1,,,treated,A,Atreated
+	Atreated2,,,treated,A,Atreated
+	Atreated3,,,treated,A,Atreated
+	Atreated4,,,treated,A,Atreated
+	Bcontrol1,,,control,B,Bcontrol
+	Bcontrol2,,,control,B,Bcontrol
+	Bcontrol3,,,control,B,Bcontrol
+	Bcontrol4,,,control,B,Bcontrol
+	Btreated1,,,treated,B,Btreated
+	Btreated2,,,treated,B,Btreated
+	Btreated3,,,treated,B,Btreated
+	Btreated4,,,treated,B,Btreated
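
You can confirm the group structure in a sample sheet like this one (here, four replicates per `condition`) by counting the rows for each value of a column. This is a bash/awk sketch. `count_by_column` is a hypothetical helper, and it assumes a simple CSV with a header row and no quoted commas.

```bash
# Hypothetical helper: count samples per value of a named sample-sheet column.
# Assumes a plain CSV with a header row and no quoted fields containing commas.
count_by_column() {
  local sheet="$1" column="$2"
  awk -F',' -v col="$column" '
    { sub(/\r$/, "") }                      # tolerate Windows line endings
    NR == 1 {
      # find the index of the requested column in the header
      for (i = 1; i <= NF; i++) if ($i == col) idx = i
      if (!idx) { print "no such column: " col > "/dev/stderr"; exit 1 }
      next
    }
    NF > 0 { n[$idx]++ }                    # skip blank lines, tally the rest
    END { for (k in n) print k, n[k] }
  ' "$sheet" | sort
}

# Usage:
#   count_by_column SampleSheet.csv condition
```

For the sample sheet above, this should report four samples each for `Acontrol`, `Atreated`, `Bcontrol` and `Btreated`.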
+
+```bash
+cat contrasts.csv 
+```
+	id,variable,reference,target
+	cellline,cellline,A,B
+	treatedVScontrol1,condition,Acontrol,Atreated
+	treatedVScontrol2,condition,Bcontrol,Btreated
+
+You can add extra metadata columns to the sample sheet and extra rows to contrasts.csv; each new contrast row gives the app another comparison.
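Step 2 of the pipeline overview cross-checks the sample annotations against the contrasts. The bash/awk sketch below follows the same idea in a simplified form, which can catch typos before a long run. `check_contrasts` is hypothetical and is not the pipeline's own validation. It only checks that each `variable` is a sample-sheet column and that its `reference` and `target` values occur in that column.

```bash
# Hypothetical, simplified contrasts check: every contrast's 'variable' must
# be a sample-sheet column, and its 'reference' and 'target' must be values
# seen in that column. Assumes simple CSVs (header row, no quoted commas).
# Prints one line per problem and returns a non-zero status if any are found.
check_contrasts() {
  local sheet="$1" contrasts="$2"
  awk -F',' '
    { sub(/\r$/, "") }                       # tolerate Windows line endings
    NF == 0 { next }                         # skip blank lines
    FNR == NR {                              # first file: the sample sheet
      if (FNR == 1) {
        for (i = 1; i <= NF; i++) { name[i] = $i; is_col[$i] = 1 }
      } else {
        for (i = 1; i <= NF; i++) seen[name[i], $i] = 1
      }
      next
    }
    FNR == 1 {                               # second file: contrasts header
      for (i = 1; i <= NF; i++) idx[$i] = i
      next
    }
    {
      id = $(idx["id"]); v = $(idx["variable"])
      r = $(idx["reference"]); t = $(idx["target"])
      if (!(v in is_col)) { print id ": no column " v; bad = 1; next }
      if (!((v, r) in seen)) { print id ": " v " has no value " r; bad = 1 }
      if (!((v, t) in seen)) { print id ": " v " has no value " t; bad = 1 }
    }
    END { exit (bad ? 1 : 0) }
  ' "$sheet" "$contrasts"
}

# Usage:
#   check_contrasts SampleSheet.csv contrasts.csv && echo "contrasts look consistent"
```

With the two files shown above, all three contrasts should pass.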
+
+![](RowMetadata.png)
+Looking at the 'row metadata' page, you can see that all the gene information comes from the ***gtf*** file we used.\
+For well-annotated genomes such as human (GRCh38/hg38), there is a lot of extra information that can be pulled out.
+```bash
+head -n 10 genome.gtf
+```
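The per-gene information on the 'row metadata' page is stored in column 9 of the GTF as `key "value";` pairs. The bash/awk sketch below pulls a few of these out of the `gene` records. `gtf_gene_table` is a hypothetical helper. The attribute names are assumptions about your GTF: Ensembl files use `gene_biotype`, while GENCODE files use `gene_type`.

```bash
# Hypothetical helper: tabulate gene_id, gene_name and gene_biotype from the
# 'gene' records of a GTF (column 9 holds key "value"; pairs). Attribute
# names are assumptions: GENCODE files use gene_type rather than gene_biotype.
gtf_gene_table() {
  local gtf="$1"
  awk -F'\t' '
    BEGIN { OFS = "\t"; print "gene_id", "gene_name", "gene_biotype" }
    /^#/ { next }                              # skip header/comment lines
    $3 == "gene" {
      id = name = type = "NA"
      n = split($9, attrs, ";")
      for (i = 1; i <= n; i++) {
        split(attrs[i], kv, " ")               # kv[1] = key, kv[2] = "value"
        val = kv[2]; gsub(/"/, "", val)        # drop the surrounding quotes
        if (kv[1] == "gene_id") id = val
        else if (kv[1] == "gene_name") name = val
        else if (kv[1] == "gene_biotype") type = val
      }
      print id, name, type
    }
  ' "$gtf"
}

# Usage:
#   gtf_gene_table genome.gtf | head
```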
+
+
+![normalisation](normalisation.png)
+
+
+![PCA](PCA.png)
+
+![cluster](heirachicalclustering.png)
+
+![foldchange](foldchange.png)
+
+![PDX1](PDX1.png)
diff --git a/content/sRNA.md b/content/sRNA.md
@@ -4,7 +4,6 @@ title = 'nf-core/smRNAseq'
 https://nf-co.re/smrnaseq/2.2.1/
 
 #### nf-core/smRNAseq
-test
 
 Pipeline summary
 1. Raw read QC (FastQC)

diff --git a/public/SAGC_Workshop_RNAseq.html b/public/SAGC_Workshop_RNAseq.html
diff --git a/public/differentialabundance/index.html b/public/differentialabundance/index.html
@@ -159,43 +159,46 @@ <h1 id="set-up-run">
 
 
 
-<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">mkdir ~/workshop/nfDifferentialAbundance2
-</span></span><span class="line"><span class="cl"><span class="nb">cd</span> ~/workshop/nfDifferentialAbundance2
+<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">mkdir ~/workshop/DifferentialAbundance
+</span></span><span class="line"><span class="cl"><span class="nb">cd</span> ~/workshop/DifferentialAbundance
 </span></span><span class="line"><span class="cl">ls -l</span></span></code></pre></div>
 <p><em><strong>Optional:</strong></em> you can use the <code>nf-core launch</code> command to build a launch command interactively, but these instructions use a <code>nextflow run</code> command.</p>
-<p>The minimal input requirements are</p>
-<ol>
-<li>Sample sheet</li>
-</ol>
-<ul>
-<li>containg the sample information, metadata and group relationships</li>
-</ul>
-<p>2</p>
-<p>nf-core launch</p>
+<p>You can check the <code>~/.nextflow/assets</code> folder to see which pipelines are already installed:</p>
 
 
 
-<pre tabindex="0"><code>
-
-
+<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">ls -l ~/.nextflow/assets/nf-core/</span></span></code></pre></div>
+<p>If you don&rsquo;t see the pipeline, you can pull it from the nf-core GitHub repository:</p>
 
+
 
+<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">nextflow pull nf-core/differentialabundance</span></span></code></pre></div>
+<p>This also happens automatically when you run <code>nextflow run nf-core/differentialabundance</code>.</p>
+<p><img src="nextflowpull.png" alt="nfpull"></p>
+<p>To identify the requirements of the pipeline, see the usage documentation:
+<a href="https://nf-co.re/differentialabundance/1.5.0/docs/usage/">https://nf-co.re/differentialabundance/1.5.0/docs/usage/</a>
+The minimal input requirements are:</p>
+<ol>
+<li>A sample sheet, containing the sample information, metadata and group relationships.</li>
+<li>A counts table, such as the one produced by the nf-core/rnaseq pipeline.</li>
+<li>A transcript length table, also output by the nf-core/rnaseq pipeline. It allows more accurate normalisation based on transcript length.</li>
+<li>A profile selecting the experiment type and the container config, here <code>-profile rnaseq,singularity</code>.</li>
+<li>A GTF file, ideally from the same genome reference that was used in the mapping step of the nf-core/rnaseq run.</li>
+<li>Optional: a gene set file for GSEA (gene set enrichment analysis), which can be downloaded from <a href="https://www.gsea-msigdb.org/gsea/msigdb/human/collections.jsp">https://www.gsea-msigdb.org/gsea/msigdb/human/collections.jsp</a></li>
+</ol>
+<p>If you just want the results, you can resume the previous run with the following command:</p>
 
+
 
-#### 
-nf-core/DifferentialAbundance
+<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl"><span class="nb">cd</span> ~/workshop/nfDifferentialAbundance
+</span></span><span class="line"><span class="cl">sh nf_differentialabundance.sh</span></span></code></pre></div>
 
-There are a few ways of installing the nf-core pipeline. But this happens automatically when you use a *nexflow run nf-core/* commands.
-You can check the ~/.nextflow/assets folders to see what is already installed</code></pre>
-<p>ls -l ~/.nextflow/assets/nf-core/</p>
 
-
 
-<pre tabindex="0"><code>If you don&#39;t see the pipeline, you can pull it from the nf-core website. 
-```bash
-nextflow pull nf-core/differentialabundance</code></pre>
-<p>but this happeds automatically when running the nextflow run nf-core/differentialabundance</p>
-<p><img src="nextflowpull.png" alt="nfpull"></p>
+<h4 id="multiqc-summary">
+  <a class="Heading-link u-clickable" href="/workshop_nfRNAseq/differentialabundance/#multiqc-summary">MultiQC summary</a>
+</h4>
+<p><a href="../SAGC_Workshop_RNAseq.html">MultiQC report</a></p>
 
 
 

diff --git a/public/srna/index.html b/public/srna/index.html
@@ -18,7 +18,6 @@
 
   nf-core/smRNAseq
 
-test
 Pipeline summary
 
 Raw read QC (FastQC)
@@ -181,7 +180,6 @@ <h1 class="Heading-title">
 <h4 id="nf-coresmrnaseq">
   <a class="Heading-link u-clickable" href="/workshop_nfRNAseq/srna/#nf-coresmrnaseq">nf-core/smRNAseq</a>
 </h4>
-<p>test</p>
 <p>Pipeline summary</p>
 <ol>
 <li>Raw read QC (FastQC)</li>