From a569e5585b35a6a425d76cdb193e23a2a95d25c5 Mon Sep 17 00:00:00 2001
From: mconomos <mconomos@uw.edu>
Date: Mon, 10 Jun 2024 11:39:47 -0700
Subject: [PATCH] update locus zoom exercises

---
 04_conditional_analysis.Rmd  | 30 ++++++++++-------
 04_conditional_analysis.html | 62 +++++++++++++++++++++++++-----------
 2 files changed, 62 insertions(+), 30 deletions(-)

diff --git a/04_conditional_analysis.Rmd b/04_conditional_analysis.Rmd
index 737fc0f..fd7da4d 100644
--- a/04_conditional_analysis.Rmd
+++ b/04_conditional_analysis.Rmd
@@ -49,20 +49,20 @@ The [Locus Zoom Shiny App](https://locuszoom-shiny-app.bdc.sb-webapp.com/) is an
 The application requires data to be stored as a JSON file. There is a `GENESIS Data JSONizer` tool that converts single-variant association test results .RData file as output by the `GENESIS Single Variant Association Testing` app into the required JSON file. This tool also calculates the linkage disequilibrium (LD) measures required to make the LocusZoom plot for the selected variants.
 
 - Click the "GENESIS Data JSONizer" tab at the top of the screen
-- Select Input Files
+- Select Input Files from your Project
   - GDS file: `1KG_phase3_GRCh38_subset_chr1.gds`
-  - .RData file: `1KG_trait_1_chr1.RData`
+  - .RData file: `1KG_trait_1_assoc_chr1.RData`
 - JSONizer parameters
   - Check: "Specify variant and a flanking region around it"
   - Select the position of the variant of interest: 212956321
-  - Specify flanking region: 50000 (i.e. 50kb in each direction).
+  - Specify flanking region: 100000 (i.e. 100kb in each direction).
   - Select test type: score
 - Click: JSONize
   
 You have the option to download the JSON file to your local environment or upload it to the BioData Catalyst platform and save it for later, if you desire. 
 
 - Expand: JSON File - Download and Export Form
-- Set a file name (e.g. "1KG_trait_1_chr1_212956321")
+- Set a file name (e.g. "1KG_trait_1_assoc_chr1_212956321_100kb")
 - Choose extension: `.json`
 - Click: Export JSON file to platform
 - Select your Project and Click: Confirm
@@ -73,31 +73,35 @@ There are several optional data layers you can add to your LocusZoom plot. The m
 - Expand: Option Data Layers
 - Expand: Linkage Disequilibrium
 - Select Data Source: Compute LD Data
-- Select reference variant: 1:212956321_?/? (our variant of interest)
+- Select reference variant: 1:212956321_T/C (our variant of interest)
 - Click: Calculate LD
+- Note: leave the "use sample set file for LD calculation" option unchecked; that option lets you restrict the LD calculation to a subset of samples from your dataset
 
 You can expand the Linkage Disequilibrium Data Overview tab to see a preview of the calculated LD data, and you can download the data as a JSON file to your local environment or upload it to the BioData Catalyst platform and save it for later, if you desire.
 
 - Expand: JSON File - Download and Export Form
-- Set a file name (e.g. "1KG_trait_1_chr1_212956321_LD")
+- Set a file name (e.g. "1KG_trait_1_assoc_chr1_212956321_LD_100kb")
 - Choose extension: `.json`
 - Click: Export JSON file to platform
 - Select your Project and Click: Confirm
 - Click: Upload
 
-You need to select the Genome Build that matches your data:
-
-- Change the Genome Build to GRCh38 for this dataset
+Note that you need to select the Genome Build that matches your data. In this case, our data is in build GRCh38, which is the default setting. \n
 
 You can review the Initial Plot State Info to make sure everything looks as expected, and then make the plot!
 
 - Click: Generate plot
 
-The generated plot is interactive. You can hover over variants to see their chromosome, position, alleles, and association p-value. You can drag the figure left or right to see different sections of the plotted region. You can save the current figure as a .png or .svg file either locally or on the BioData Catalyst platform. 
+The generated plot is interactive. You can hover over variants to see their chromosome, position, alleles, and association p-value. You can hover over genes to see their Ensembl gene ID and other information. You can drag the figure left or right to see different sections of the plotted region. Click on "Show Legend" to see how the color coding of points corresponds to LD $r^2$ values. Our data for this exercise is a subset of the whole chromosome, so the plot may show large gaps with no variants; you shouldn't expect to see such gaps when working with the full WGS data. You can save the current figure as a .png or .svg file either locally or on the BioData Catalyst platform. Note that the Locus Zoom plot generated as described above is saved as `1KG_trait_1_assoc_chr1_212956321_100kb_LocusZoom.svg` in your project files.
+
+- What gene is our lead variant located in?
+- What is the position of the second most significant variant and what is its LD $r^2$ value with our lead variant?
+- What is the largest LD $r^2$ value observed at any variant with our lead variant?
 
 If you've saved your .json association results file and your .json LD statistics file to your Project, you can come back later and recreate your LocusZoom plot by selecting the "Use Your Own Data Sources" tab at the top of the LocusZoom Shiny App page. This time, rather than JSONizing the data, you can select the .json files as input, and set the plotting parameters the same as we did above.
 
 
+
 ## Conditional Analysis
 
 One of the most common post-GWAS analyses we routinely perform is to run conditional analyses to explore if there are any secondary hits at loci (regions) with significant variant associations. Conditional analyses include genetic variants in the null model (i.e. the conditional variants) to adjust for their effects on the trait, just like the other fixed effect covariates in the model. The idea is to see if other association signals remain after accounting for (i.e. conditioning on) the effect(s) of the conditional variant(s).
@@ -120,7 +124,7 @@ assoc[assoc$Score.pval < 5e-8, ]
 
 In our original association analysis, we found that there were 6 genome-wide significant variants at two distinct loci. In the particular example here, it is pretty clear that we can consider our hits as two distinct loci, as they are at opposite ends of the chromosome and the physical distance between them is ~188Mb. Therefore, we identify our conditional variants as those at `1:212956321` and `1:25046749`. \n
 
-### Conditoinal Null Model
+### Conditional Null Model
 
 When preparing our data to run the conditional null model, we need to actually extract the genotype values from the GDS file. It is easiest to use the `variant.id` values from the GDS file, but remember that these are unique to your GDS file. 
 
@@ -347,7 +351,9 @@ From looking at the truncated Manhattan plot, we see that the signal from the lo
 
 ## Exercise 4.3 (LocusZoom Shiny App)
 
-Return to the LocusZoom Shiny App and make locus zoom plots indexed by our secondary hit at position 212951423, using both the original and conditional association analysis results. For the original analysis results, you can use the data you JSONized before. For the conditional analysis results, you will need to JSONize the association statistics from that analysis. What do you observe in these locus zoom plots? 
+Return to the LocusZoom Shiny App and make locus zoom plots indexed by our secondary hit at position 212951423, using both the original and conditional association analysis results. For the original analysis results, you can use the association data you JSONized before, but you will need to re-calculate LD statistics with this variant as the reference. For the conditional analysis results, you will need to JSONize the association statistics from that analysis. What do you observe in these locus zoom plots? \n
+
+Note that the Locus Zoom plots generated as described in this exercise are saved as `1KG_trait_1_assoc_chr1_212951423_100kb_LocusZoom.svg` and `1KG_trait_1_assoc_cond_chr1_212951423_100kb_LocusZoom.svg` in your project files.
 
 
 
diff --git a/04_conditional_analysis.html b/04_conditional_analysis.html
index 0c87f12..f683a82 100644
--- a/04_conditional_analysis.html
+++ b/04_conditional_analysis.html
@@ -431,16 +431,16 @@ <h2>Locus Zoom Plots</h2>
 variants.</p>
 <ul>
 <li>Click the “GENESIS Data JSONizer” tab at the top of the screen</li>
-<li>Select Input Files
+<li>Select Input Files from your Project
 <ul>
 <li>GDS file: <code>1KG_phase3_GRCh38_subset_chr1.gds</code></li>
-<li>.RData file: <code>1KG_trait_1_chr1.RData</code></li>
+<li>.RData file: <code>1KG_trait_1_assoc_chr1.RData</code></li>
 </ul></li>
 <li>JSONizer parameters
 <ul>
 <li>Check: “Specify variant and a flanking region around it”</li>
 <li>Select the position of the variant of interest: 212956321</li>
-<li>Specify flanking region: 50000 (i.e. 50kb in each direction).</li>
+<li>Specify flanking region: 100000 (i.e. 100kb in each direction).</li>
 <li>Select test type: score</li>
 </ul></li>
 <li>Click: JSONize</li>
@@ -450,7 +450,7 @@ <h2>Locus Zoom Plots</h2>
 for later, if you desire.</p>
 <ul>
 <li>Expand: JSON File - Download and Export Form</li>
-<li>Set a file name (e.g. “1KG_trait_1_chr1_212956321”)</li>
+<li>Set a file name (e.g. “1KG_trait_1_assoc_chr1_212956321_100kb”)</li>
 <li>Choose extension: <code>.json</code></li>
 <li>Click: Export JSON file to platform</li>
 <li>Select your Project and Click: Confirm</li>
@@ -465,9 +465,12 @@ <h2>Locus Zoom Plots</h2>
 <li>Expand: Option Data Layers</li>
 <li>Expand: Linkage Disequilibrium</li>
 <li>Select Data Source: Compute LD Data</li>
-<li>Select reference variant: 1:212956321_?/? (our variant of
+<li>Select reference variant: 1:212956321_T/C (our variant of
 interest)</li>
 <li>Click: Calculate LD</li>
+<li>Note: leave the “use sample set file for LD calculation” option
+unchecked; that option lets you restrict the LD calculation to a subset
+of samples from your dataset</li>
 </ul>
 <p>You can expand the Linkage Disequilibrium Data Overview tab to see a
 preview of the calculated LD data, and you can download the data as a
@@ -475,16 +478,16 @@ <h2>Locus Zoom Plots</h2>
 platform and save it for later, if you desire.</p>
 <ul>
 <li>Expand: JSON File - Download and Export Form</li>
-<li>Set a file name (e.g. “1KG_trait_1_chr1_212956321_LD”)</li>
+<li>Set a file name
+(e.g. “1KG_trait_1_assoc_chr1_212956321_LD_100kb”)</li>
 <li>Choose extension: <code>.json</code></li>
 <li>Click: Export JSON file to platform</li>
 <li>Select your Project and Click: Confirm</li>
 <li>Click: Upload</li>
 </ul>
-<p>You need to select the Genome Build that matches your data:</p>
-<ul>
-<li>Change the Genome Build to GRCh38 for this dataset</li>
-</ul>
+<p>Note that you need to select the Genome Build that matches your
+data. In this case, our data is in build GRCh38, which is the default
+setting. </p>
 <p>You can review the Initial Plot State Info to make sure everything
 looks as expected, and then make the plot!</p>
 <ul>
@@ -492,9 +495,25 @@ <h2>Locus Zoom Plots</h2>
 </ul>
 <p>The generated plot is interactive. You can hover over variants to see
 their chromosome, position, alleles, and association p-value. You can
-drag the figure left or right to see different sections of the plotted
-region. You can save the current figure as a .png or .svg file either
-locally or on the BioData Catalyst platform.</p>
+hover over genes to see their Ensembl gene ID and other information. You
+can drag the figure left or right to see different sections of the
+plotted region. Click on “Show Legend” to see how the color coding of
+points corresponds to LD <span class="math inline">\(r^2\)</span>
+values. Our data for this exercise is a subset of the whole chromosome,
+so the plot may show large gaps with no variants; you shouldn’t expect
+to see such gaps when working with the full WGS data. You can save the
+current figure as a .png or .svg file either locally or on the BioData
+Catalyst platform. Note that the Locus Zoom plot generated as described
+above is saved as <code>1KG_trait_1_assoc_chr1_212956321_100kb_LocusZoom.svg</code>
+in your project files.</p>
+<ul>
+<li>What gene is our lead variant located in?</li>
+<li>What is the position of the second most significant variant and what
+is its LD <span class="math inline">\(r^2\)</span> value with our lead
+variant?</li>
+<li>What is the largest LD <span class="math inline">\(r^2\)</span>
+value observed at any variant with our lead variant?</li>
+</ul>
 <p>If you’ve saved your .json association results file and your .json LD
 statistics file to your Project, you can come back later and recreate
 your LocusZoom plot by selecting the “Use Your Own Data Sources” tab at
@@ -563,8 +582,8 @@ <h3>Selecting Conditional Variants</h3>
 conditional variants as those at <code>1:212956321</code> and
 <code>1:25046749</code>. </p>
 </div>
-<div id="conditoinal-null-model" class="section level3">
-<h3>Conditoinal Null Model</h3>
+<div id="conditional-null-model" class="section level3">
+<h3>Conditional Null Model</h3>
 <p>When preparing our data to run the conditional null model, we need to
 actually extract the genotype values from the GDS file. It is easiest to
 use the <code>variant.id</code> values from the GDS file, but remember
@@ -970,9 +989,16 @@ <h2>Exercise 4.3 (LocusZoom Shiny App)</h2>
 <p>Return to the LocusZoom Shiny App and make locus zoom plots indexed
 by our secondary hit at position 212951423, using both the original and
 conditional association analysis results. For the original analysis
-results, you can use the data you JSONized before. For the conditional
-analysis results, you will need to JSONize the association statistics
-from that analysis. What do you observe in these locus zoom plots?</p>
+results, you can use the association data you JSONized before, but you
+will need to re-calculate LD statistics with this variant as the
+reference. For the conditional analysis results, you will need to
+JSONize the association statistics from that analysis. What do you
+observe in these locus zoom plots? </p>
+<p>Note that the Locus Zoom plots generated as described in this
+exercise are saved as
+<code>1KG_trait_1_assoc_chr1_212951423_100kb_LocusZoom.svg</code> and
+<code>1KG_trait_1_assoc_cond_chr1_212951423_100kb_LocusZoom.svg</code>
+in your project files.</p>
 </div>
 <div id="exercise-4.4-locuszoom-shiny-app" class="section level2">
 <h2>Exercise 4.4 (LocusZoom Shiny App)</h2>