Commit

update locus zoom exercises
mconomos committed Jun 10, 2024
1 parent 5f0183f commit a569e55
Showing 2 changed files with 62 additions and 30 deletions.
30 changes: 18 additions & 12 deletions 04_conditional_analysis.Rmd
@@ -49,20 +49,20 @@ The [Locus Zoom Shiny App](https://locuszoom-shiny-app.bdc.sb-webapp.com/) is an
The application requires data to be stored as a JSON file. There is a `GENESIS Data JSONizer` tool that converts single-variant association test results .RData file as output by the `GENESIS Single Variant Association Testing` app into the required JSON file. This tool also calculates the linkage disequilibrium (LD) measures required to make the LocusZoom plot for the selected variants.

- Click the "GENESIS Data JSONizer" tab at the top of the screen
- Select Input Files
- Select Input Files from your Project
- GDS file: `1KG_phase3_GRCh38_subset_chr1.gds`
- .RData file: `1KG_trait_1_chr1.RData`
- .RData file: `1KG_trait_1_assoc_chr1.RData`
- JSONizer parameters
- Check: "Specify variant and a flanking region around it"
- Select the position of the variant of interest: 212956321
- Specify flanking region: 50000 (i.e. 50kb in each direction).
- Specify flanking region: 100000 (i.e. 100kb in each direction).
- Select test type: score
- Click: JSONize
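
To make the flanking-region setting concrete, here is a minimal R sketch of the window that a 100kb flank implies around the variant of interest. The toy `assoc` data frame, its values, and the inclusive window boundaries are assumptions for illustration only; the JSONizer tool may define its boundaries differently.

```r
# Toy stand-in for single-variant association results. The column names
# (chr, pos, Score.pval) mirror the `assoc` table used later in this
# tutorial, but the values are invented for illustration.
assoc <- data.frame(
  chr = "1",
  pos = c(212800000, 212856321, 212956321, 213056321, 213100000),
  Score.pval = c(0.4, 1e-3, 1e-12, 0.02, 0.7)
)

lead_pos <- 212956321  # position of the variant of interest
flank    <- 100000     # flanking region in bp (100kb in each direction)

# Window boundaries; treating both ends as inclusive is an assumption
window <- c(start = lead_pos - flank, end = lead_pos + flank)

# Keep only the variants that fall inside the window
in_window <- assoc$chr == "1" &
  assoc$pos >= window[["start"]] &
  assoc$pos <= window[["end"]]
region <- assoc[in_window, ]

print(window)
print(region)
```

With real results, the same subsetting can be a quick check on how many variants to expect in the plotted region.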

You have the option to download the JSON file to your local environment or upload it to the BioData Catalyst platform and save it for later, if you desire.

- Expand: JSON File - Download and Export Form
- Set a file name (e.g. "1KG_trait_1_chr1_212956321")
- Set a file name (e.g. "1KG_trait_1_assoc_chr1_212956321_100kb")
- Choose extension: `.json`
- Click: Export JSON file to platform
- Select your Project and Click: Confirm
@@ -73,31 +73,35 @@ There are several optional data layers you can add to your LocusZoom plot. The m
- Expand: Option Data Layers
- Expand: Linkage Disequilibrium
- Select Data Source: Compute LD Data
- Select reference variant: 1:212956321_?/? (our variant of interest)
- Select reference variant: 1:212956321_T/C (our variant of interest)
- Click: Calculate LD
- Note: do not check the "use sample set file for LD calculation" button -- this allows you to select a subset of samples from your dataset

You can expand the Linkage Disequilibrium Data Overview tab to see a preview of the calculated LD data, and you can download the data as a JSON file to your local environment or upload it to the BioData Catalyst platform and save it for later, if you desire.

- Expand: JSON File - Download and Export Form
- Set a file name (e.g. "1KG_trait_1_chr1_212956321_LD")
- Set a file name (e.g. "1KG_trait_1_assoc_chr1_212956321_LD_100kb")
- Choose extension: `.json`
- Click: Export JSON file to platform
- Select your Project and Click: Confirm
- Click: Upload

You need to select the Genome Build that matches your data:

- Change the Genome Build to GRCh38 for this dataset
Note that you need to select the Genome Build that matches your data. In this case, our data is in build GRCh38, which is the default setting. \n

You can review the Initial Plot State Info to make sure everything looks as expected, and then make the plot!

- Click: Generate plot

The generated plot is interactive. You can hover over variants to see their chromosome, position, alleles, and association p-value. You can drag the figure left or right to see different sections of the plotted region. You can save the current figure as a .png or .svg file either locally or on the BioData Catalyst platform.
The generated plot is interactive. You can hover over variants to see their chromosome, position, alleles, and association p-value. You can hover over genes to see their Ensembl gene ID and other information. You can drag the figure left or right to see different sections of the plotted region. Click on "Show Legend" to see how the color coding of points corresponds to LD $r^2$ values. Note that our data for this exercise is a subset of the whole chromosome -- you shouldn't expect to see large gaps with no variants when working with the full WGS data. You can save the current figure as a .png or .svg file either locally or on the BioData Catalyst platform. Note that the Locus Zoom plot generated as described above is saved as `1KG_trait_1_assoc_chr1_212956321_100kb_LocusZoom.svg` in your project files.

- What gene is our lead variant located in?
- What is the position of the second most significant variant and what is its LD $r^2$ value with our lead variant?
- What is the largest LD $r^2$ value observed at any variant with our lead variant?
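
As a rough illustration of what an LD $r^2$ value measures, the sketch below computes the squared Pearson correlation between genotype dosages at two variants. The dosage vectors and the helper `ld_r2` are hypothetical, and the app may compute LD differently (for example, from phased haplotypes), so treat this as intuition rather than a description of its method.

```r
# Invented genotype dosages (0, 1, or 2 copies of the alternate allele)
# for eight samples at a lead variant and two nearby variants.
lead  <- c(0, 1, 2, 1, 0, 2, 1, 0)
var_a <- c(0, 1, 2, 1, 0, 2, 1, 0)  # identical pattern to the lead variant
var_b <- c(2, 0, 1, 0, 2, 1, 0, 1)  # different pattern

# Hypothetical helper: squared Pearson correlation of dosages,
# used here as a simple genotype-based estimate of LD r^2
ld_r2 <- function(g1, g2) cor(g1, g2)^2

print(ld_r2(lead, var_a))  # perfectly correlated dosages
print(ld_r2(lead, var_b))  # weaker correlation
```

Values near 1 mean two variants carry nearly the same information across samples.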

If you've saved your .json association results file and your .json LD statistics file to your Project, you can come back later and recreate your LocusZoom plot by selecting the "Use Your Own Data Sources" tab at the top of the LocusZoom Shiny App page. This time, rather than JSONizing the data, you can select the .json files as input, and set the plotting parameters the same as we did above.



## Conditional Analysis

One of the most common post-GWAS analyses we routinely perform is to run conditional analyses to explore if there are any secondary hits at loci (regions) with significant variant associations. Conditional analyses include genetic variants in the null model (i.e. the conditional variants) to adjust for their effects on the trait, just like the other fixed effect covariates in the model. The idea is to see if other association signals remain after accounting for (i.e. conditioning on) the effect(s) of the conditional variant(s).
@@ -120,7 +124,7 @@ assoc[assoc$Score.pval < 5e-8, ]

In our original association analysis, we found that there were 6 genome-wide significant variants at two distinct loci. In the particular example here, it is pretty clear that we can consider our hits as two distinct loci, as they are at opposite ends of the chromosome and the physical distance between them is ~188Mb. Therefore, we identify our conditional variants as those at `1:212956321` and `1:25046749`. \n
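
The ~188Mb figure can be checked directly from the two positions quoted above. This is a small arithmetic sketch; the variable names are illustrative.

```r
# Positions of the two conditional variants on chromosome 1
pos_locus1 <- 212956321
pos_locus2 <- 25046749

# Physical distance between the two loci, in base pairs and megabases
distance_bp <- pos_locus1 - pos_locus2
distance_mb <- distance_bp / 1e6

print(distance_bp)
print(round(distance_mb))
```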

### Conditoinal Null Model
### Conditional Null Model

When preparing our data to run the conditional null model, we need to actually extract the genotype values from the GDS file. It is easiest to use the `variant.id` values from the GDS file, but remember that these are unique to your GDS file.

@@ -347,7 +351,9 @@ From looking at the truncated Manhattan plot, we see that the signal from the lo

## Exercise 4.3 (LocusZoom Shiny App)

Return to the LocusZoom Shiny App and make locus zoom plots indexed by our secondary hit at position 212951423, using both the original and conditional association analysis results. For the original analysis results, you can use the data you JSONized before. For the conditional analysis results, you will need to JSONize the association statistics from that analysis. What do you observe in these locus zoom plots?
Return to the LocusZoom Shiny App and make locus zoom plots indexed by our secondary hit at position 212951423, using both the original and conditional association analysis results. For the original analysis results, you can use the association data you JSONized before, but you will need to re-calculate LD statistics with this variant as the reference. For the conditional analysis results, you will need to JSONize the association statistics from that analysis. What do you observe in these locus zoom plots? \n

Note that the Locus Zoom plots generated as described in this exercise are saved as `1KG_trait_1_assoc_chr1_212951423_100kb_LocusZoom.svg` and `1KG_trait_1_assoc_cond_chr1_212951423_100kb_LocusZoom.svg` in your project files.



62 changes: 44 additions & 18 deletions 04_conditional_analysis.html
@@ -431,16 +431,16 @@ <h2>Locus Zoom Plots</h2>
variants.</p>
<ul>
<li>Click the “GENESIS Data JSONizer” tab at the top of the screen</li>
<li>Select Input Files
<li>Select Input Files from your Project
<ul>
<li>GDS file: <code>1KG_phase3_GRCh38_subset_chr1.gds</code></li>
<li>.RData file: <code>1KG_trait_1_chr1.RData</code></li>
<li>.RData file: <code>1KG_trait_1_assoc_chr1.RData</code></li>
</ul></li>
<li>JSONizer parameters
<ul>
<li>Check: “Specify variant and a flanking region around it”</li>
<li>Select the position of the variant of interest: 212956321</li>
<li>Specify flanking region: 50000 (i.e. 50kb in each direction).</li>
<li>Specify flanking region: 100000 (i.e. 100kb in each direction).</li>
<li>Select test type: score</li>
</ul></li>
<li>Click: JSONize</li>
@@ -450,7 +450,7 @@ <h2>Locus Zoom Plots</h2>
for later, if you desire.</p>
<ul>
<li>Expand: JSON File - Download and Export Form</li>
<li>Set a file name (e.g. “1KG_trait_1_chr1_212956321”)</li>
<li>Set a file name (e.g. “1KG_trait_1_assoc_chr1_212956321_100kb”)</li>
<li>Choose extension: <code>.json</code></li>
<li>Click: Export JSON file to platform</li>
<li>Select your Project and Click: Confirm</li>
@@ -465,36 +465,55 @@ <h2>Locus Zoom Plots</h2>
<li>Expand: Option Data Layers</li>
<li>Expand: Linkage Disequilibrium</li>
<li>Select Data Source: Compute LD Data</li>
<li>Select reference variant: 1:212956321_?/? (our variant of
<li>Select reference variant: 1:212956321_T/C (our variant of
interest)</li>
<li>Click: Calculate LD</li>
<li>Note: do not check the “use sample set file for LD calculation”
button – this allows you to select a subset of samples from your
dataset</li>
</ul>
<p>You can expand the Linkage Disequilibrium Data Overview tab to see a
preview of the calculated LD data, and you can download the data as a
JSON file to your local environment or upload it to the BioData Catalyst
platform and save it for later, if you desire.</p>
<ul>
<li>Expand: JSON File - Download and Export Form</li>
<li>Set a file name (e.g. “1KG_trait_1_chr1_212956321_LD”)</li>
<li>Set a file name
(e.g. “1KG_trait_1_assoc_chr1_212956321_LD_100kb”)</li>
<li>Choose extension: <code>.json</code></li>
<li>Click: Export JSON file to platform</li>
<li>Select your Project and Click: Confirm</li>
<li>Click: Upload</li>
</ul>
<p>You need to select the Genome Build that matches your data:</p>
<ul>
<li>Change the Genome Build to GRCh38 for this dataset</li>
</ul>
<p>Note that you need to select the Genome Build that matches your
data. In this case, our data is in build GRCh38, which is the default
setting. </p>
<p>You can review the Initial Plot State Info to make sure everything
looks as expected, and then make the plot!</p>
<ul>
<li>Click: Generate plot</li>
</ul>
<p>The generated plot is interactive. You can hover over variants to see
their chromosome, position, alleles, and association p-value. You can
drag the figure left or right to see different sections of the plotted
region. You can save the current figure as a .png or .svg file either
locally or on the BioData Catalyst platform.</p>
hover over genes to see their Ensembl gene ID and other information. You
can drag the figure left or right to see different sections of the
plotted region. Click on “Show Legend” to see how the color coding of
points corresponds to LD <span class="math inline">\(r^2\)</span>
values. Note that our data for this exercise is a subset of the whole
chromosome – you shouldn’t expect to see large gaps with no variants
when working with the full WGS data. You can save the current figure as
a .png or .svg file either locally or on the BioData Catalyst platform.
Note that the Locus Zoom plot generated as described above is saved as
<code>1KG_trait_1_assoc_chr1_212956321_100kb_LocusZoom.svg</code> in
your project files.</p>
<ul>
<li>What gene is our lead variant located in?</li>
<li>What is the position of the second most significant variant and what
is its LD <span class="math inline">\(r^2\)</span> value with our lead
variant?</li>
<li>What is the largest LD <span class="math inline">\(r^2\)</span>
value observed at any variant with our lead variant?</li>
</ul>
<p>If you’ve saved your .json association results file and your .json LD
statistics file to your Project, you can come back later and recreate
your LocusZoom plot by selecting the “Use Your Own Data Sources” tab at
@@ -563,8 +582,8 @@ <h3>Selecting Conditional Variants</h3>
conditional variants as those at <code>1:212956321</code> and
<code>1:25046749</code>. </p>
</div>
<div id="conditoinal-null-model" class="section level3">
<h3>Conditoinal Null Model</h3>
<div id="conditional-null-model" class="section level3">
<h3>Conditional Null Model</h3>
<p>When preparing our data to run the conditional null model, we need to
actually extract the genotype values from the GDS file. It is easiest to
use the <code>variant.id</code> values from the GDS file, but remember
@@ -970,9 +989,16 @@ <h2>Exercise 4.3 (LocusZoom Shiny App)</h2>
<p>Return to the LocusZoom Shiny App and make locus zoom plots indexed
by our secondary hit at position 212951423, using both the original and
conditional association analysis results. For the original analysis
results, you can use the data you JSONized before. For the conditional
analysis results, you will need to JSONize the association statistics
from that analysis. What do you observe in these locus zoom plots?</p>
results, you can use the association data you JSONized before, but you
will need to re-calculate LD statistics with this variant as the
reference. For the conditional analysis results, you will need to
JSONize the association statistics from that analysis. What do you
observe in these locus zoom plots? </p>
<p>Note that the Locus Zoom plots generated as described in this
exercise are saved as
<code>1KG_trait_1_assoc_chr1_212951423_100kb_LocusZoom.svg</code> and
<code>1KG_trait_1_assoc_cond_chr1_212951423_100kb_LocusZoom.svg</code>
in your project files.</p>
</div>
<div id="exercise-4.4-locuszoom-shiny-app" class="section level2">
<h2>Exercise 4.4 (LocusZoom Shiny App)</h2>