Please cite as: Lavezzo, E., Franchin, E., Ciavarella, C. et al. Suppression of a SARS-CoV-2 outbreak in the Italian municipality of Vo’. Nature (2020). https://doi.org/10.1038/s41586-020-2488-1
This repository contains
- the data collected in Vo to assess COVID-19 prevalence at two subsequent surveys conducted in February and March 2020;
- the code used to fit a compartmental SEIR-like model to the observed prevalence data;
- the code used to estimate the serial interval and reproduction number from a reconstruction of the transmission chains;
- the code used to assess the association between COVID-19 infection and age or sex via logistic regression.
The data folder contains the file anonymised_data_public_final.xlsx
, which has four sheets:
- anonymised data (sheet 1) contains a line list with contact data, dates and results of swab testing, symptoms and hospitalisation data;
- legend (sheet 2) contains a description of the information contained in sheet 1;
- RT_PCR_comparison (sheet 3) contains the viral load data (both Ct values and genome equivalents) and a statistical analysis of these data;
- RT_PCR_DATA (sheet 4) contains the viral load data in sheet 3 linked to the subjects IDs of sheet 1.
The scripts folder contains the main scripts for each separate analysis
SEIR_Covid_Vo.R
is the main script for fitting a SEIR-like model to prevalence dataitaly_analysis_script.R
is the main script for estimating the serial interval and the reproduction number. This script requires scriptitaly_cleaning_data_script.R
to be run beforehand.Logistic_regression.R
is the main script to assess the association between swab positivity and age or sex
The R folder contains the functions used by the main scripts.
SEIR_Covid_Vo.R
uses the following scripts:
- model.R
- clean.R
- figures.R
- tables.R
italy_analysis_script.R
uses the following scripts:
- checking_functions.R
- serial_interval.functions.R
- reproduction_number_functions.R
- plot_clusters.R
- plot_tree.R
The code calibrates a compartmental model of SARS-CoV-2 transmission to the prevalence data observed in Vo using the Metropolis-Hastings Markov Chain Monte Carlo (MCMC) method. The computational time depends on the number of MCMC iterations and chains. Convergence was assessed from three MCMC chains starting at different initial points: for all parameter combinations we run 200,000 MCMC iterations, thinned the chains taking every 100th iteration and removed the first 200 observations (burnin).
Number of subjects tested and observed number of pre-symptomatic, symptomatic and asymptomatic study participants testing positive to SARS-CoV-2 at two surveys conducted in the municipality of Vo, Italy in February and March 2020.
S
: susceptible individualsE
: individuals that are incubating the virus: infected but not yet infectious and with undetectable viral loadTP
: individuals with detectable viral load before the onset of symptoms, we assume they are infectiousI_S
: infectious individuals with detectable viral load who show symptomsI_A
: infectious individuals with detectable viral load who do not show symptomsTP_S
: symptomatic individuals that are no longer infectious but have a detectable viral loadTP_A
: asymptomatic individuals that are no longer infectious but have a detectable viral loadTN
: individuals who test negative (undetectable viral load)
tSeed
: time infection is seeded in Votime1
: time of first surveytime2
: time of second surveytQ
: time lockdown startedN
: Vo resident populationR0_1
: basic reproduction number before the implementation of lockdown1 - w
: proportional reduction of the basic reproduction number due to the lockdownseed
: number of infectious individuals at time tSeed, needed to trigger the epidemicp
: proportion of asymptomatic infections (not developing symptoms throughout the whole infection)1 / nu
: average time from infection to virus detectability1 / delta
: average time from virus detectability to symptom onset1 / gamma
: average duration of infectiousness from time of symptom onset1 / sigma
: average duration of virus detectability beyond the infectious period
The code reconstructs transmission chains following the algorithms described in Supplementary Text S1 and S2. The serial interval and the effective reproduction number are estimated from the reconstructed transmission chains. The 95% confidence interval around the central estimates is calculated by bootstrapping. The results presented in the paper were obtained with 10,000 iterations. The computational time depends on the number of iterations used.
italy_cleaning_data_script.R
uses the data file to generate the following outputs:
all_contacts.rds
lists all contacts between individuals (direct, indirect, imputed household)all_contacts.tsv
- same asall_contacts.rds
but in tsv formatdirect_contacts.rds
lists all contacts between individuals (direct and imputed household contacts only)direct_contacts.tsv
- same asdirect_contacts.rds
but in tsv formatlinelist.rds
lists all individuals who tested positive for SARS-CoV-2, their ids, household ids, dates of first positive and negative test, last positive and negative test, and date of onset (where available)linelist.tsv
- same aslinelist.rds
but in tsv format
These output files are used in italy_analysis_script.R
to execute the algorithms describes in Supplementary Text S1 and S2 for estimating the central estimates of the serial interval and effective reproduction number and their 95% confidence intervals.
plot_clusters.R
plots the observed transmission clusters as shown in Extended Data Figure 4b.
The code in Logistic_regression.R
does not depend on any external function. It runs logistic regresison on the line list data to test the association between positivity and age and sex.