-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathtemplate.qmd
271 lines (218 loc) · 9.27 KB
/
template.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
---
title: "VIPERA report: `r params$name`"
subtitle: "Workflow version: `r params$workflow_version`"
date: last-modified
date-format: "YYYY-MM-DD"
title-block-banner: true
format:
html:
page-layout: article
embed-resources: true
smooth-scroll: true
theme: cosmo
toc: true
toc-expand: true
toc-location: left
toc-title: Summary
number-sections: true
css: config/report.styles.css
editor: visual
params:
ufboot_reps: ""
shalrt_reps: ""
min_ivar_freq: ""
workflow_version: ""
div: ""
freyja: ""
tree: ""
tempest: ""
SNV: ""
SNV_s: ""
evo: ""
div_value: ""
panel: ""
volcano: ""
tree_ml: ""
fig_cor_snp: ""
stats_lm: ""
table: ""
sum_nv: ""
heat_tab: ""
omega_plot: ""
name: ""
output-file: report.html
---
<head>
<link rel="preconnect" href="https://fonts.googleapis.com">
<link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
<link href="https://fonts.googleapis.com/css2?family=Montserrat&display=swap" rel="stylesheet">
</head>
```{r read, echo = F, message = F, include = F}
library(jsonlite)
library(gt)
library(tidyr)
library(dplyr)
library(heatmaply)
library(readr)
# Diversity
div_values <- fromJSON(params$div_value)
# Temporal signal
stats <- fromJSON(params$stats_lm)
correlation <- stats[["r2"]]
sub_rate <- stats[["sub_rate"]]
sub_rate <- round(sub_rate, digits = 2)
p_value_lm <- stats[["pvalue"]]
# NV counts
nv.counts <- fromJSON(params$sum_nv)
n_SNV <- nv.counts[["SNV"]]
n_INDELS <- nv.counts[["INDELS"]]
# Summary table
table <- read.csv(params$table)
# Heatmap
vcf <- read_csv(params$heat_tab)
row.names(vcf) <- vcf$`...1`
vcf <- vcf[-1]
cor.mat <- cor(vcf)
cor.mat[is.na(cor.mat)] <- 0
```
## Summary of the target samples dataset
```{r summary_table, echo = F,message = F}
#| label: tbl-summary
#| tbl-cap: Target dataset summary
table %>%
mutate(
DeltaTime = difftime(
as.Date(Collection_Date),
min(as.Date(Collection_Date), na.rm = TRUE), units = "days"
)
) %>%
gt %>%
cols_label(
Sample = "Sample",
Collection_Date = "Collection date",
Lineage = "Lineage",
DeltaTime = "Days after first sampling"
) %>%
tab_style(
style = list(
cell_text(weight = "bold")
),
location = cells_column_labels()
) %>%
cols_align(
align = "center",
columns = everything()
) %>%
opt_table_font(
font = google_font("Montserrat")
)
```
## Evidence for single, serially-sampled infection
### Lineage admixture
The estimated lineage admixture for each sample has been calculated
using [Freyja](https://github.com/andersen-lab/Freyja).
![Estimated lineage admixture of each sample.
Samples in the X-axis are ordered chronologically, from more ancient to newer.](`r params$freyja`){#fig-freyja}
### Phylogenetic reconstruction
A maximum likelihood tree of the target and context samples has been
built using [IQTREE](http://www.iqtree.org/).
The target samples `r stats[["monophyly"]]` monophyletic. The clade
that contains all the target samples is supported by a UFBoot score of
$`r stats[["boot"]]`$% and a SH-aLRT score of $`r stats[["alrt"]]`$% (@fig-tree_ml).
![Maximum-likelihood phylogeny with $`r params$ufboot_reps`$ UFBoot
and $`r params$shalrt_reps`$ SH-aLRT support replicates of the
target dataset and its context samples. The clade that contains the target
samples is squared in red.](`r params$tree_ml`){#fig-tree_ml}
### Nucleotide diversity comparison
Nucleotide diversity (π) has been calculated for $`r div_values[["boot.reps"]]`$ random
sample subsets of size $`r div_values[["sample.size"]]`$, extracted
from the context dataset. The distribution of the nuclotide diversity is assumed to
`r div_values[["norm.text"]]` be normal after performing a Shapiro-Wilk test
(p-value of $`r div_values[["normal.pvalue"]]`$).
The nucleotide diversity of the target samples is $`r div_values[["diversity"]] `$ (red line in @fig-div).
Assuming the independence of the context samples, the `r div_values[["type.test"]]`
p-value of the context samples having a nucleotide diversity (in orange in @fig-div)
as low as that of the target dataset is $`r div_values[["p.value"]]`$.
![Analysis of the nucleotide diversity (π). The orange line describes
a normal distribution with the same mean and standard deviation as the distribution
of π from $`r div_values[["boot.reps"]]`$ subsets of $`r div_values[["sample.size"]]`$
sequences from the context. The red vertical line indicates the π value of the
target samples.](`r params$div`){#fig-div}
## Evolutionary trajectory of the serially-sampled SARS-CoV-2 infection
### Number of polymorphic sites
Sites with minor allele frequency $> `r params$min_ivar_freq`$ are considered polymorphic.
The linear association between the collection date of the samples and the number of
polymorphic sites has an $R^2$ of $`r nv.counts[["r2"]]`$ and a p-value of
$`r nv.counts[["value"]]`$ (@fig-fig_cor_snp).
![Number of polymorphic sites along time. The
blue line shows the linear model fit.](`r params$fig_cor_snp`){#fig-fig_cor_snp}
### Description of intra-host nucleotide variants
A total of $`r n_SNV`$ different single nucleotide variants (SNV) and $`r n_INDELS`$
insertions and deletions (indels) have been detected along the genome (@fig-SNV).
::: {.panel-tabset}
## Whole genome
![Summary of the intra-host accumulation of nucleotide variants,
using the reconstructed dataset ancestor as reference. A) Nucleotide
variants per site along the SARS-CoV-2 genome. Relative abundance of NVs is calculated
with a sliding window of width $`r nv.counts[["window"]]`$ nucleotides and a step of
$`r nv.counts[["step"]]`$. Labels indicate the coding regions of the non structural
proteins (NSP) within ORF1ab. B) Genome variation along the genome for each sample.
The Y-axis displays samples in chronological order, with the earliest collection date
at the bottom, and the latest, at the top.](`r params$SNV`){#fig-SNV}
## Spike ORF
![Summary of the intra-host accumulation of nucleotide variants
in the spike sequence, using the reconstructed dataset ancestor as reference. A) Nucleotide
variants per site along the S gene. Relative abundance of NVs is calculated
with a sliding window of width $`r nv.counts[["window"]]`$ nucleotides and a step of
$`r nv.counts[["step"]]`$. B) Genome variation along the S gene for each sample.
The Y-axis displays samples in chronological order, with the earliest collection date
at the bottom, and the latest, at the top.](`r params$SNV_s`){#fig-SNV_s}
:::
### Temporal signal of the intra-host mutations
The correlation of the allele frequency of each NV with the time since the
initial sampling has been calculated (@fig-volcano).
![Pearson’s correlation coefficients and adjusted p-values of
allele frequencies with time. Red dashed line indicates adjusted $p = 0.05$.
Labeled dots represent nucleotide variants correlated with time
(adjusted $p < 0.05$).](`r params$volcano`){#fig-volcano}
Significantly correlated nucleotide variants are described in more detail in @fig-panel.
![Time series of relative allele frequencies. The shown positions include
nucleotide variants with a significant correlation with time and sites with more
than two possible states. Each subplot depicts the progression of the allele
frequencies in time for a given genome position.](`r params$panel`){#fig-panel}
A neighbor-joining tree has been constructed using pairwise distances
between target samples (@fig-tree), based on the allele frequencies measured from
read mappings.
![Neighbor-joining tree based on the pairwise allele
frequency-weighted distances.](`r params$tree`){#fig-tree}
To estimate the evolutionary rate, root-to-tip distances measured on the previous
tree (@fig-tree) have been correlated with time, obtaining a $R^2$ of
$`r correlation`$ and a p-value of $`r p_value_lm`$. The estimated evolutionary
rate is $`r sub_rate`$ number of changes per year (@fig-tempest).
![Scatterplot depicting the relationship between root-to-tip
distances and the number of days passed since the first sample. The red
line shows the linear model fit.](`r params$tempest`){#fig-tempest}
### Correlation between alternative alleles
To detect possible interactions between mutations, pairwise correlation between allele
frequencies are calculated (@fig-heatmap). The heatmap is an interactive figure that allows
zooming in on specific regions.
```{r heatmap, echo = F, message = F, fig.align = 'center'}
#| label: fig-heatmap
#| fig-cap: "Interactive hierarchically clustered heatmap of the pairwise Pearson’s correlation coefficients between the time series of allele frequencies in the case study."
heatmaply_cor(
cor.mat,
fontsize_row = 8,
fontsize_col = 8,
column_text_angle = 90
)
```
### Non-synonymous and synonymous substitution rate over time
To track selection footprints, the substitutions per synonymous site ($dS$) and
per non-synonymous site ($dN$) for each sample with respect to the reconstructed
ancestral sequence have been calculated (@fig-evo), as well as their ratio ($\omega = dN/dS$; @fig-omega).
::: {.panel-tabset}
## $dN$ and $dS$
![Time series of $dN$ and $dS$. Each point corresponds to a different sample, sorted in chronological order.](`r params$evo`){#fig-evo}
## $\omega$ ($dN/dS$)
![Time series of $\omega$ ($dN/dS$). Each point corresponds to a different sample, sorted in chronological order.](`r params$omega_plot`){#fig-omega}
:::