This subdirectory contains scripts and notebooks that generate the primer pools used for Delta variant spike deep mutational scanning experiments. The design is run by the Snakefile, which writes results to ./results/. The code generates four separate primer pools:
- ./results/gisaid_primer_oPool.csv contains specific forward and reverse primer pools for all mutations present in GISAID spike sequences.
- ./results/usher_primer_oPool.csv contains specific forward and reverse primers for each independently reoccurring mutation on the SARS-CoV-2 phylogenetic tree.
- ./results/positiveSel_primer_oPool.csv contains random forward and reverse primers for each spike site that is undergoing positive selection.
- ./results/paired_positiveSel_primer_oPool.csv contains forward and reverse primers for generating spike mutants with paired mutations at closely spaced spike residues that are undergoing positive selection.
The final primer sequences used as ordering sheets for oPools on IDTdna are the .csv files ending with oPool in ./results/.
The file results/aggregated_mutations.csv indicates all mutations that are designed in each category.
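
As a quick way to see which order sheets a run produced, the sketch below lists the .csv files in a results directory whose names end with oPool. The function name find_opool_sheets and its default directory argument are illustrative assumptions, not part of this repo's code.

```python
from pathlib import Path


def find_opool_sheets(results_dir="results"):
    """Return sorted paths of the CSV files whose names end with 'oPool.csv'.

    `results_dir` is a hypothetical argument; it defaults to the ./results/
    directory described above.
    """
    return sorted(Path(results_dir).glob("*oPool.csv"))


if __name__ == "__main__":
    # Print the name of each order sheet found in ./results/.
    for sheet in find_opool_sheets():
        print(sheet.name)
```

Other files in the same directory, such as aggregated_mutations.csv, are skipped because their names do not end with oPool.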
- GISAID_data/spikeprot0724.fasta contains an alignment of all spike proteins as downloaded from the Download tab of the EpiCov section of GISAID on July-26-2021. Note that the download yields a zipped .tar file; this file was then un-tarred and unzipped. Due to GISAID data sharing terms, this file is not actually included in the repo.
- ./reference_sequences subdirectory contains SARS-CoV-2 spike reference sequences and lookup tables required to renumber positions between variants.
- scripts subdirectory contains scripts for generating data. Scripts work as follows:
  - spike_positive_selection_sites.py uses SARS-CoV-2 spike protein selection data (as described in this paper) to filter for positions in spike that are undergoing positive selection.
  - filter_and_align_gisaid.py uses the GISAID_data/spikeprot0724.fasta sequences to align all SARS-CoV-2 spike sequences deposited in GISAID as of July-26-2021.
  - spike_alignment_counts.py extracts all mutations in the GISAID spike alignments relative to the Wuhan-1 sequence.
  - spike_mutcounts.py counts the number of independently reoccurring mutations on the SARS-CoV-2 phylogenetic tree available from UShER.
  - 2021Jan_create_primers.py and create_primers_del.py create random or specific amino acid change primers, respectively.
- ./notebooks subdirectory contains notebooks used to generate primer pools found in ./results/primers.
  - gisaid_variant_primers.py.ipynb generates specific amino acid primers for each mutation present in GISAID data.
  - usher_primers.py.ipynb generates specific amino acid primers for each independently reoccurring mutation on the SARS-CoV-2 phylogenetic tree.
  - positive_selection_primers.py.ipynb generates NNG/NNC primer pools for each position on spike that is undergoing positive selection.
  - paired_positive_selection_primers.py.ipynb generates pools of NNG/NNC primers that introduce paired mutations for closely located sites that are undergoing positive selection.
  - oPool_primer_sheets.py.ipynb takes primer pools generated by the notebooks above and formats spreadsheets in accordance with the IDTdna oPool order input format.
- Bernadeta's lab notebook that includes all experiments done on this project can be found here.
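
To illustrate what an NNG/NNC primer for a single spike site looks like, here is a minimal sketch that replaces one codon with each degenerate codon and keeps a fixed number of flanking nucleotides on either side. It is not the code in scripts/: the function names, the fixed flank length, and the toy sequence are assumptions made for illustration, and the actual scripts may choose primer lengths differently (for example, based on melting temperature).

```python
# Complement table for DNA; the degenerate base N pairs with N.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def degenerate_codon_primers(coding_seq, site, flank=15, codons=("NNG", "NNC")):
    """Build (forward, reverse) primer pairs that randomize one codon.

    coding_seq : in-frame coding sequence (hypothetical input)
    site       : 1-based codon number to randomize
    flank      : nucleotides kept on each side (assumed fixed here)
    codons     : degenerate codons to substitute at the site
    """
    coding_seq = coding_seq.upper()
    start = 3 * (site - 1)  # 0-based index of the codon's first base
    end = start + 3
    if start - flank < 0 or end + flank > len(coding_seq):
        raise ValueError("site is too close to the sequence end for this flank")
    upstream = coding_seq[start - flank:start]
    downstream = coding_seq[end:end + flank]
    primers = []
    for codon in codons:
        forward = upstream + codon + downstream
        primers.append((forward, reverse_complement(forward)))
    return primers


if __name__ == "__main__":
    toy_gene = "ATG" * 12  # toy sequence, not a real spike sequence
    for fwd, rev in degenerate_codon_primers(toy_gene, site=6, flank=6):
        print(fwd, rev)
```

Each site yields one forward and one reverse primer per degenerate codon, so the default settings give two primer pairs per site.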