This subdirectory contains input data used by the pipeline.
PacBio_amplicon.gb: GenBank file with the features to parse with alignparse. The H5 HA is marked as gene (the gene of interest) and the barcode site is annotated.
PacBio_feature_parse_specs.yaml: How to parse the PacBio amplicon using alignparse.
PacBio_runs.csv: List of PacBio CCS FASTQs used to link barcodes to variants. It must have the following columns:
library
: name of the library
run
: name of the sequencing run, must be unique
fastq
: FASTQ file from running CCS
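The column requirements above can be checked before running the pipeline. The following is a minimal sketch using only the Python standard library. The function name `check_pacbio_runs` and the example rows are hypothetical and are not part of the pipeline itself.

```python
import csv
import io

# Columns that PacBio_runs.csv is described as requiring.
REQUIRED_COLUMNS = {"library", "run", "fastq"}


def check_pacbio_runs(handle):
    """Return the rows of a PacBio_runs.csv-style table after basic checks.

    Hypothetical helper: raises ValueError if a required column is missing
    or if a run name appears more than once.
    """
    reader = csv.DictReader(handle)
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    rows = list(reader)
    runs = [row["run"] for row in rows]
    duplicates = sorted({r for r in runs if runs.count(r) > 1})
    if duplicates:
        raise ValueError(f"duplicate run names: {duplicates}")
    return rows


# Made-up example content; library, run, and file names are illustrative only.
example = io.StringIO(
    "library,run,fastq\n"
    "libA,libA_run1,ccs/libA_run1.fastq.gz\n"
    "libA,libA_run2,ccs/libA_run2.fastq.gz\n"
)
rows = check_pacbio_runs(example)
print(f"{len(rows)} runs passed the checks")
```

To check the real file, pass `open("PacBio_runs.csv", newline="")` in place of the in-memory example.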
site_numbering_map.csv: Maps several different numbering schemes for HA. Columns in the spreadsheet include: sequential_site (sequential numbering of H5 HA: 1, 2, 3, ...), reference_site (H3 reference numbering applied to H5 HA), reference_H1_site (H1 numbering applied to H5 HA), mature_H5_site (H5 HA sequential numbering starting after the signal peptide), HA1_HA2_H5_site (sequential H5 HA numbering of HA1 and HA2), region (assigns each site to a region of the protein).
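As an illustration of how the numbering map can be used, the sketch below builds a lookup from one numbering scheme to another. The helper `build_site_lookup` is hypothetical, and the example rows contain made-up values rather than real HA numbering. Site labels are kept as strings, on the assumption that some schemes may use labels that are not plain integers.

```python
import csv
import io


def build_site_lookup(handle, from_col="sequential_site", to_col="reference_site"):
    """Map each value in from_col to the value in to_col on the same row.

    Hypothetical helper; both keys and values are kept as strings.
    """
    lookup = {}
    for row in csv.DictReader(handle):
        lookup[row[from_col]] = row[to_col]
    return lookup


# Made-up rows for illustration; these are not real site assignments.
example = io.StringIO(
    "sequential_site,reference_site,region\n"
    "1,-10,region_x\n"
    "2,-9,region_x\n"
    "3,-8,region_y\n"
)
seq_to_ref = build_site_lookup(example)
print(seq_to_ref["2"])
```

The same function can map between any pair of columns in the file by changing `from_col` and `to_col`.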
data/mutation_design_classification.csv classifies mutations into the different categories of designed mutations. Has columns sequential_site, amino_acid, and mutation_type.
neutralization_standard_barcodes.csv contains barcodes for the neutralization standards. Has columns barcode and name, giving the barcode and the name of its neutralization standard set.
barcode_runs.csv contains all samples and paths to sequencing files. It has the following format:
sample
: sample name
library
: name of library
date
: date of sequencing
fastq_R1
: path to one or more FASTQ R1 sequencing files; multiple files should be semicolon-delimited
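Because fastq_R1 can hold several semicolon-delimited paths, code that reads this file needs to split that field. The sketch below shows one way to do so with the standard library. The function `read_barcode_runs`, the sample names, the date format, and the file paths are all hypothetical.

```python
import csv
import io


def read_barcode_runs(handle):
    """Read a barcode_runs.csv-style table, splitting fastq_R1 into a list.

    Hypothetical helper: whitespace around each path is stripped and
    empty entries (for example from a trailing semicolon) are dropped.
    """
    rows = []
    for row in csv.DictReader(handle):
        row = dict(row)
        row["fastq_R1"] = [
            path.strip() for path in row["fastq_R1"].split(";") if path.strip()
        ]
        rows.append(row)
    return rows


# Made-up example content; names, dates, and paths are illustrative only.
example = io.StringIO(
    "sample,library,date,fastq_R1\n"
    "sample_1,libA,2023-01-01,reads/s1_L001_R1.fastq.gz;reads/s1_L002_R1.fastq.gz\n"
    "sample_2,libA,2023-01-01,reads/s2_R1.fastq.gz\n"
)
barcode_runs = read_barcode_runs(example)
for row in barcode_runs:
    print(row["sample"], len(row["fastq_R1"]))
```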
func_effects_config.yml has the configuration for analyzing functional effects of mutations. The format is explained within the file.
antibody_escape_config.yml has the configuration for analyzing escape from antibodies, stability treatment, etc.
summary_config.yml has the configuration for making summaries across assays.