-
Notifications
You must be signed in to change notification settings - Fork 0
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
* Change the docs * Add model description * Add API for numeric and quasimultinomial submodules. * Finish API description * Mark as uncompleted
- Loading branch information
1 parent
0009d38
commit 245bf79
Showing
13 changed files
with
297 additions
and
19 deletions.
There are no files selected for viewing
This file was deleted.
Oops, something went wrong.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,12 @@ | ||
# Dynamics | ||
|
||
Models predicting the changes in variant abundances over time. | ||
|
||
Import as | ||
|
||
```python | ||
|
||
from covvfit import dynamics | ||
``` | ||
|
||
::: covvfit.dynamics.JointLogisticGrowthParams |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,32 @@ | ||
# API Reference | ||
|
||
The package is divided into several modules, corresponding to different utilities. | ||
|
||
## Quasimultinomial model | ||
|
||
The quasimultinomial noise model, which corresponds to the high-level *Covvfit* utilities. | ||
The API is available [here](quasimultinomial.md). | ||
|
||
**Caution:** There are public functions in this submodule, which are being refactored into other modules. | ||
|
||
## Dynamics | ||
|
||
The growth models, linking model parameters to outcomes of idealised abundance. | ||
The API is available [here](dynamics.md). | ||
|
||
|
||
## Preprocessing | ||
|
||
Data preprocessing utilities, used to load deconvoluted data into the model and reshape them into appropriate data formats. | ||
The API is available [here](preprocessing.md). | ||
|
||
|
||
## Plotting | ||
|
||
The plotting utilities. | ||
The API is available [here](plotting.md). | ||
|
||
## Numeric | ||
|
||
General numerical programming utilities. | ||
The API is available [here](numeric.md). |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,30 @@ | ||
# Numeric | ||
|
||
General numerical programming utilities. | ||
|
||
It can be imported as: | ||
```python | ||
from covvfit import numeric | ||
``` | ||
|
||
or | ||
|
||
```python | ||
import covvfit | ||
|
||
covvfit.numeric.some_function() | ||
``` | ||
|
||
|
||
## Optimization | ||
|
||
::: covvfit.numeric.OptimizeMultiResult | ||
|
||
::: covvfit.numeric.jax_multistart_minimize | ||
|
||
|
||
## Matrix operations | ||
|
||
::: covvfit.numeric.log_matrix | ||
|
||
::: covvfit.numeric.log1mexp |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,35 @@ | ||
# Plotting | ||
|
||
Plotting utilities. | ||
|
||
Import as: | ||
|
||
```python | ||
|
||
from covvfit import plot | ||
``` | ||
|
||
::: covvfit.plot.ArrangedGrid | ||
|
||
|
||
::: covvfit.plot.arrange_into_grid | ||
|
||
|
||
::: covvfit.plot.set_axis_off | ||
|
||
|
||
## Timeseries submodule | ||
|
||
Import as | ||
|
||
```python | ||
|
||
from covvfit import plot | ||
plot_ts = plot.timeseries | ||
``` | ||
|
||
::: covvfit.plot.timeseries.plot_fit | ||
|
||
::: covvfit.plot.timeseries.plot_complement | ||
|
||
::: covvfit.plot.timeseries.plot_data |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,13 @@ | ||
# Preprocessing | ||
|
||
Data preprocessing utilities, used to load deconvoluted data into the model and reshape them into appropriate data formats. | ||
|
||
Import as: | ||
|
||
```python | ||
|
||
from covvfit import preprocess | ||
``` | ||
|
||
::: covvfit.preprocess.TimeScaler | ||
|
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,14 @@ | ||
# Quasimultinomial | ||
|
||
The quasimultinomial noise model, which corresponds to the high-level *Covvfit* utilities. | ||
It can be imported as: | ||
|
||
```python | ||
from covvfit import quasimultinomial as qm | ||
``` | ||
|
||
It has the following capabilities: | ||
|
||
::: covvfit.quasimultinomial.construct_total_loss | ||
|
||
::: covvfit.quasimultinomial.construct_model |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
# Command line workflow | ||
|
||
Tutorial to be added in February 2025. |
This file was deleted.
Oops, something went wrong.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,35 @@ | ||
# Covvfit: Variant fitness estimates from wastewater data | ||
|
||
*Covvfit* is a framework for estimating relative growth advantages of different variants from deconvolved wastewater samples. | ||
It consists of command line tools, which can be included in the existing workflows and a Python package, which can be used to quickly develop custom solutions and extensions. | ||
|
||
|
||
|
||
|
||
## FAQ | ||
|
||
**How do I run *Covvfit* on my data?** | ||
|
||
We recommend to start using *Covvfit* as a command line tool, with the tutorial available [here](cli.md). | ||
|
||
**What data does *Covvfit* use?** | ||
|
||
*Covvfit* uses deconvolved wastewater data, accepting relative abundances of different variants measured at different locations and times. | ||
Tools such as [LolliPop](https://github.com/cbg-ethz/LolliPop) or [Freyja](https://github.com/andersen-lab/Freyja/) can be used to deconvolve wastewater data. | ||
|
||
**Can *Covvfit* predict emergence of new variants?** | ||
|
||
No, *Covvfit* explicitly assumes that no new variants emerge in the considered timeframe, so its predictions are unlikely to hold on longer timescales. | ||
The underlying model also cannot take into account changes in the transmission dynamics or immune response, so that it cannot predict the effects of vaccination programs or lockdowns. | ||
|
||
**How can I contact the developers?** | ||
|
||
In case you find a bug, want to ask about integrating *Covvfit* into your pipeline, or have any other feedback, we would love to hear it via our [issue tracker](https://github.com/cbg-ethz/covvfit/issues)! | ||
In this manner, other users can also benefit from your insights. | ||
|
||
**Is there a manuscript associated with the tool?** | ||
|
||
The manuscript is being finalised. We hope to release the preprint describing the method in February 2025. | ||
In case you would like to cite *Covvfit* in your work, we would recommend the following for now: | ||
|
||
D. Dreifuss, P. Czyż, N. Beerenwinkel, *Learning and forecasting selection dynamics of SARS-CoV-2 variants from wastewater sequencing data using Covvfit* (2025; in preparation). URL: https://github.com/cbg-ethz/covvfit |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,56 @@ | ||
# Statistical models | ||
|
||
Here, we explain the statistical models used. | ||
|
||
The models are composed of two components: | ||
|
||
- The growth model used to describe how the abundances increase or decrease, basing on competition between different variants. This model links some interpretable parameters (such as the growth advantages) with idealised (unobserved) abundances, $p(t)$. | ||
- The noise model connecting idealised abundances $p(t)$ with deconvolved values $y(t)$, which are subject to some noise. | ||
|
||
## Growth model | ||
|
||
### Single-location model | ||
|
||
We consider V competing variants, numbered between 1, ..., V present in the population. At time $t$ the relative abundances of the variants are represented by a vector $p (t) = (p_1(t), p_2(t), ..., p_V(t))$. | ||
We assume survival-of-the fittest type of [selection dynamics](https://en.wikipedia.org/wiki/Replicator_equation#Equation), where the $i$-th variant has a fitness value $f_i$, which is fixed in time, and the relative abundances change at a rate proportional to their fitness advantage over the average value: | ||
|
||
$$ \frac{\mathrm d}{\mathrm dt}p_i(t)= p_i(t)\left(f_i - \sum_{j=1}^V p_j(t) f_j \right).$$ | ||
|
||
In this model, the selection advantage of variant $X_i$ over variant $X_j$ is given by $s_{ij}=f_i-f_j$. | ||
Note that the model dynamics is determined only by the relative selection advantages, rather than the fitness values $f_i$, making the problem non-identifiable: adding a constant to all fitness values does not change the dynamics of the model. | ||
Hence, without loss of generality, we set $f_1=0$, to avoid this identifiability issue. | ||
|
||
This set of ordinary differential equations has the analytical solution | ||
|
||
$$ p_i(t)=\frac{\exp(f_i\cdot t+b_i)}{ \sum_{j=1}^V \exp(f_j\cdot t + b_j)} $$ | ||
|
||
Where $b_1, b_2, ..., b_V$ are constants given by the initial conditions. Similarly as with fitness values, adding a constant to all $b_i$ does not change the functions $p_i(t)$. We fix $b_1 = 0$. | ||
|
||
### Multiple-location model | ||
|
||
The above model can be used to describe a collection of location-specific relative abundance vectors $p_k(t)$, where the index $k\in \{1,..., K\}$ represents a spatial location (e.g., the city or a district connected to one data collection system). | ||
|
||
We expect that the introduction times of different variants to different locations may be different, which we accommodate by defining location-specific parameters $b_{vk}$. However, if the wastewater sampling locations are located at the same country subject to the same vaccine program and immunization, we suspect that the processes $p_k(t)$ are not entirely independent. | ||
We therefore *make an assumption that the fitness value does not change across the locations*. | ||
|
||
Note that while we find this assumption plausible in the analysis of the data from the same country, it may not hold when analysing locations subject to different vaccine programs. | ||
|
||
To summarize, we infer parameters $b_{vk}$ for all variants $v\in \{1, ..., V\}$ and locations $k\in \{1,..., K\}$ together with fitness values $f_1, ..., f_V$, which are shared between the sampling locations. | ||
We use the identifiability constraints $f_1 = 0$ and $b_{1k} = 0$ for all $k$. | ||
|
||
## Noise model | ||
|
||
We deconvolute a wastewater sample collected at time point $t$ and location $k$ to obtain the observed relative abundances vector | ||
|
||
$y_k(t) =(y_{1k}(t), ..., y_{Vk}(t))$, where $y_{vk}(t)$ represents the relative abundance of variant $v\in \{1, ..., V\}$ as obtained in the deconvolution procedure. | ||
|
||
Due to a small load of viral signal, amplification through next generation sequencing method, and deconvolution procedure, we do not have an explicit generative model linking the ideal abundance value $p_k(t)$, to the deconvolved value $y_k(t)$. | ||
Instead, we use the quasi-likelihood approach, where we assume that $\mathbb E[y_k(t)] = p_k(t),$ | ||
and use a covariance function in a generalized linear model corresponding to the scaled multinomial distribution. | ||
In this manner, the quasi-likelihood inference allows one to correct the obtained confidence intervals by using the dispersion parameter, which is adapted to capture the variance observed in the data. | ||
|
||
As both $y_k(t)$ and $p_k(t)$ are probability vectors, we use the quasi-multinomial model, in which the quasi-loglikelihood function is given by | ||
|
||
$$ q(f,b)= \sum_{k=1}^K \sum_{t=1}^T\sum_{v=1}^V y_{vk}(t) \log p_{vk}(t). $$ | ||
|
||
We numerically optimize it to find the maximum quasi-likelihood estimate $\hat \theta = (\hat f, \hat b)$. |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.