- Fix minor issue with embedding Altair plots in VitePress homepage.
- Various updates to the `summary` plots:
  - Add option `no_mean_lineplot` to not show antibody-escape mean lineplot (see this issue).
  - Add option `lineplot_antibody_label_loc` ("right" or "top") for labels on antibody-escape lineplots (see this issue).
  - Add site labels to lineplots when only one antibody per group (addresses this issue).
  - Add `scale_lineplot_height` option to adjust height of lineplots.
  - Add `selectable_per_antibody_heatmap` option to make antibody escape heatmap selectable for which antibody is shown.
- Fix bug in `func_effect_diffs` tooltips introduced in 3.19.2.
- Make `func_effect_diffs` work when there are many selections by making a correlation heatmap rather than many scatters, and showing tooltips as a list of per-selection values. Also makes `per_selection_tooltips` not needed in the config for `func_effect_diffs`.
- Fixed bug introduced in version 3.19.0 that causes pip installation of `alignparse==0.6.3` to fail.
- Added the ability to average results (functional effects, functional effect shifts, antibody escape) for different experiments keeping only different regions (sites) for each selection. This is useful if the experiments are done on different halves of a gene. Addresses this issue.
- The `func_effects_global_epistasis` notebooks now make plots showing correlation between mutation effects inferred for mutations by the global epistasis model versus the average effects of these mutations among all single-mutant variants. This helps with evaluating questions like this about whether single mutant estimates are being affected by the global epistasis model.
- Increased the number of iterations and the stringency of the convergence criteria for the `multidms` global epistasis model fitting per the suggestions here.
- Update `conda` environment to latest versions of `altair` (5.4 -> 5.5), `baltic` (0.2.2 -> 0.3), `biopython` (1.83 -> 1.84), `entrez-direct` (16.2 -> 22.4), `markdown` (3.4 -> 3.6), `matplotlib` (3.8 -> 3.9), `minimap2` (2.26 -> 2.28), `plotnine` (0.12 -> 0.14), `python` (3.11 -> 3.12), `ruamel.yaml` (0.17 -> 0.18), `snakemake` (8.20.6 -> 8.25.3), `alignparse` (0.6.2 -> 0.6.3), `polyclonal` (6.12 -> 6.14), and `multidms` (0.3.3 -> 0.4.2). Note that `multidms` 0.4.2 fixes [this issue](https://github.com/matsengrp/multidms/issues/165) where `altair` was being downgraded to 5.1.2. Note that `polyclonal` 6.14 fixes [this issue](https://github.com/dms-vep/dms-vep-pipeline-3/issues/107) where the `minimum max of site` slider did not work properly with hidden sites.
- Made some internal changes to code to adjust to version update of `ruamel.yaml`, as new version drops `round_trip_dump` and `safe_load`.
- Some notebooks previously read input parameters from the YAML configuration file, which is non-ideal as it meant that `snakemake` was not explicitly tracking what was passed to them. Fixed this: now all parameters they use are passed via `papermill` parameterization.
- Fixed bug introduced in version 3.14 that causes `analyze_variant_counts` notebook to be excluded from docs in some cases.
- Update node version for homepage deployment and add note to README about how deployment version must match that used to create `package-lock.json` (addresses this issue).
- Add `no_heatmap` option to `summaries_config` to not show some heatmaps. Addresses this issue.
- Allow JSON (`*.json`) files to be listed in the auto-rendered DOCS in the same way CSV and FASTA files can. Useful for linking to `dms-viz` JSONs.
- Update `snakemake`, `altair`, and `upsetplot` versions in `environment.yml`.
- Remove `defaults` channel from `environment.yml`. Addresses this issue.
- Update to `polyclonal` 6.12 in order to work with latest `binarymap`; otherwise the update to the newest `binarymap` version caused an error.
- Add `duplicate_fastq_R1` flag to `config.yaml` to allow the user to specify what happens if a FASTQ is duplicated among samples. Addresses this issue.
- Update some packages in `conda` env (fixes this issue):
  - `pandas` to 2.2
  - `multidms` to 0.3.3
  - `altair` to 5.3
  - `snakemake` to 8.14
  - `neutcurve` to 2.1
  - `papermill` to 2.6
- Added test that compares actual results to expected ones for `test_example`, for better testing of pipeline updates.
- Make change in version 3.14.0 backward compatible by assuming `use_precomputed_barcode_counts` is `False` if not specified.
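The backward-compatible default can be sketched as below. This is a minimal illustration and not the pipeline's actual code: the helper name `use_precomputed` and the two example `config` dictionaries are hypothetical; only the key `use_precomputed_barcode_counts` and its default of `False` come from the entry above.

```python
def use_precomputed(config):
    """Return the flag, treating a missing key as False (hypothetical helper)."""
    return bool(config.get("use_precomputed_barcode_counts", False))


# A config written before the option existed simply lacks the key.
old_style_config = {}
new_style_config = {"use_precomputed_barcode_counts": True}

print(use_precomputed(old_style_config))
print(use_precomputed(new_style_config))
```

Reading the key with a default means older configuration files keep working without edits.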
- Add option to use pre-computed barcode counts (`use_precomputed_barcode_counts`) so pipeline can be entirely run from pre-calculated barcode-variant table and barcode counts within repo. Addresses this issue.
- Add replicate scatter plots in the notebooks that average mutation effects. Addresses this issue.
- Make nicer version of the scatter plot produced by `func_effect_diffs` that compares the functional effects in the two conditions, and save it to the default docs.
- Allow more customization of summary plots, such as by specifying CSVs explicitly. This is a backward-incompatible change in how you specify the YAML configuration for the summary plots, in that now the CSV file is specified and all non-escape phenotypes are grouped together rather than as separate keys.
- Fix bug in summary plot line plot zooming. Now sites in line plot are always aligned with overlay bar prior to zooming, and missing sites are shown empty. Addresses this issue.
- Fix bug in setting of limits in summary plots with `min_at_least`.
- Remove the titles and legends from interactive figures (see this issue). These were not really being used, and with the new VitePress option that is a better way to provide detailed information around figures. This change:
  - removes the `title` and `legend` keys associated with various figures and plots in the configuration YAMLs.
  - the pipeline no longer makes the `<path>_nolegend.html` versions of files as the final version (`<path>.html`) now does not have a legend, so correspondingly removes the `format_altair_html` rule. This removes the following files:
    - `common.smk`
    - `common_funcs.smk`
    - `scripts/format_altair_html.py`
  - Remove `bs4` from `environment.yml` as it is no longer needed.
- Allow `mutation_annotations` file that provides annotations for specific mutations, such as how many nucleotide mutations are required to generate them (see this issue). In the test example, this is implemented by specifying `nt changes to codon`. These annotations can then be used as filters by specifying the indicated columns in the configuration for the average across replicates plots for antibody escape and functional effects, as well as in the summaries.
- Bug fix to alignment of line plot and scale bar and some filtering bugs in summary plots
- Make the change that added the VitePress homepage backward-compatible as originally intended, by not building the homepage (versus raising an error) if VitePress-related data are totally omitted from the config.
- Add code and instructions on how to optionally build a nicer and more customized documentation using VitePress for hosting on GitHub Pages.
- Update the version of `mafft` from `7.520` to `7.525` to avoid channel priority issues.
- Allow multiple summaries to be specified in `summaries_config.yaml`. Addresses this issue. Note that the name of the output files in `./results/summaries` has slightly changed, and as a result you will need to update the `.gitignore` slightly. When migrating to this new version, fully delete your old `./results/summaries` subdirectory to remove obsolete naming, then re-run. Also, the table of contents in the output HTML for GitHub pages now labels the summaries slightly differently.
- Lint Jupyter notebooks by fixing `ruff.toml` (addresses this issue).
- Enable `func_effect_diffs` heat map to be filtered by mutation effects on each individual type of functional effect (addresses this issue). This change actually just updates the `func_effects_config.yml` for the test example, not the actual code.
- Pipeline can run correctly when `antibody_escape_config` is null (addresses this issue).
- Better error messages with problems with `barcode_runs`
- Add `func_effect_diffs` option to compare differences among functional effects.
- Update environment:
  - update `polyclonal` to 6.11 (addresses this issue)
  - update `neutcurve` to 1.1.2
  - update `altair` to 5.2.0
  - update `biopython` to 1.83
  - update `pandas` to 2.1 and add `pyarrow`. Did not update to `pandas` 2.2 due to this issue.
  - update to `seaborn` 0.13
  - update to `snakemake` 8.3. Note that this means the recommended usage now changes from `--use-conda` to `--software-deployment-method conda`.
- sort rows in prob escape values for consistent output, may very slightly change some of the fit antibody-escape values
- Fit each lasso weight independently in `multidms` comparisons (see this issue).
- Fix `PeriodicWildCardError` that was raised by `snakemake` for the `_nolegend` output HTML plots for certain wildcards.
- Fix how notebook linting is done with `ruff` in tests.
- Fix bug in reporting pre-selection counts cutoff in `func_scores`. See this issue. The cutoff was applied correctly before and that has not changed; this just fixes reporting of it in plots in `analyze_func_scores`.
- Fix bug in `per_antibody_escape.csv` production in `summary` introduced in version 3.5.1.
- Fix bug in `summary` when there are no antibodies, introduced in version 3.5.1.
- Fix tooltips in std chart in averaging notebooks when column has a `.` in the name (see this issue).
- `summary` rule now creates a per-antibody escape CSV file.
- Updated to `polyclonal` 6.9
- Add options to filter average measurements based on excessive variability in measurements across replicates (see this issue):
  - `func_effects_config.yaml` now allows you to specify `effect_std: <value>` under `plot_kwargs: addtl_slider_stats`; remember to also add `addtl_slider_stats_as_max: [effect_std]`. You can also specify `floor_for_effect_std` under `avg_func_effects` to floor before computing the standard deviation.
  - `avg_func_effects.ipynb` now makes plots with `effect_std` slider and also plots distribution of effect standard deviations to help you choose a good initial value for this filter.
  - the results files with the average functional effects now include the `effect_std` column
  - `antibody_escape_config.yaml` now allows you to specify `escape_std: <value>` under `plot_kwargs: addtl_slider_stats`; remember to also add `addtl_slider_stats_as_max: [escape_std]`.
  - `avg_antibody_escape.ipynb` now makes plots with `escape_std` slider and also plots distribution of escape standard deviations to help you choose a good initial value for this filter.
  - the results files with the average antibody escape now include the `escape_std` column
  - `summaries_config.yml` now provides a `le_filter` option which can be used to set a filter on the standard deviations.
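The idea behind the standard-deviation filter can be sketched in plain Python. This is only an illustration under stated assumptions, not the pipeline's implementation: the function names `effect_std` and `passes_filter` and the example values are hypothetical, and the use of the sample standard deviation is an assumption. The sketch treats the slider value as a maximum allowed standard deviation and applies the optional floor to per-replicate values before computing it.

```python
import statistics


def effect_std(values, floor=None):
    """Sample standard deviation of per-replicate values.

    If floor is given, values below it are raised to the floor first
    (assumed behavior, mirroring the floor_for_effect_std description).
    """
    if floor is not None:
        values = [max(v, floor) for v in values]
    return statistics.stdev(values)


def passes_filter(values, max_std, floor=None):
    """Keep a mutation only if its standard deviation is at most max_std."""
    return effect_std(values, floor) <= max_std


# Hypothetical per-replicate functional effects for one mutation.
replicate_effects = [-6.0, -2.0]
print(passes_filter(replicate_effects, max_std=2.0))
print(passes_filter(replicate_effects, max_std=2.0, floor=-3.0))
```

In this sketch, flooring keeps two strongly negative but differing measurements from producing a large standard deviation.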
- Customize scales in probability escape plots by adding `concentration_scale` and `concentration_title` to assay configuration in `antibody_escape_config.yaml`. See this issue.
- Added `baltic` to the environment.
- Update to `dmslogo` 0.7.0, which fixes this issue
- Update versions of some software in `conda` env:
  - `altair` -> 5.1.2
  - `matplotlib` -> 3.8
  - `papermill` -> 2.4
  - `snakemake` -> 7.32
  - `dmslogo` -> added version 0.6.3
  - `neutcurve` -> added version 0.5.7
- In `summary` plots you can now specify either `max_at_least` or `fixed_max` and `min_at_least` or `fixed_min` for heatmaps.
- Order antibodies in `summary` plots as ordered in config. Addresses this issue.
- Fix bug that occurs when `barcode_runs` is set to `null` in the `config.yaml`. Addresses this issue.
- Fix bug in `summary` when no `other_assays`; also fix filtering on heatmaps sliders when more than two other properties. Addresses this issue.
- Antibody names passed to `summary` cannot have non-alphanumeric characters. Previously this led to a silent and hard-to-diagnose failure. Now there is an explicit check that throws an interpretable error if any antibodies for `summary` are assigned non-alphanumeric names (addresses this issue).
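A check of this kind might look like the sketch below. It is an illustration only: the function name `check_antibody_names` and the example names are hypothetical, and it assumes "alphanumeric" means what Python's `str.isalnum` accepts, which may differ from the pipeline's exact rule.

```python
def check_antibody_names(names):
    """Raise ValueError naming any antibody with non-alphanumeric characters."""
    bad = [name for name in names if not name.isalnum()]
    if bad:
        raise ValueError(f"antibody names must be alphanumeric, got: {bad}")


# Hypothetical names: the first list passes, the second raises.
check_antibody_names(["S2M11", "REGN10933"])
try:
    check_antibody_names(["LY-CoV1404"])
except ValueError as err:
    print(err)
```

Failing early with the offending names listed is what turns a silent downstream failure into an interpretable error.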
- Fix bug so `summary` works even if no antibody escape data.
- Include legend in `summary` plot when no antibody escape data.
- Fix bug in site sorting in line plots in `summary.ipynb`
- Add new rules to create summaries of all of the data across assays. This adds the new rules in `summaries.smk` and the new configuration in `summaries_config.yml`.
- Add `show_icXX_in_docs` key to `antibody_escape_config.yml` under `avg_assay`, and then do not show ICXX values in final docs if this is `False` or missing. The reason is that these values seem to be highly correlated with the escape values themselves, so there is often not much advantage showing them additionally. Furthermore, they sometimes seem to be linearly correlated with validation assays but not with slope 1, making the unit-less escape value perhaps better (and so giving a reason not to show ICXX). This change is backward incompatible with respect to how the final HTML docs look: unless you add `show_icXX_in_docs: true` for a given assay average, it will not be shown in docs. Addresses this issue.
- Fix bug introduced in version 3.3.0 (here) where the information written to the `avg_escape` CSVs was not correct for non-antibody and non-receptor-affinity selections.
- Improve visual appearance of some site plots by making x-axis labels less crowded.
- In `avg_escape`, plot the correlations using the same filters and metric that is being used for the final average plot. Addresses this issue.
- Restructure to allow arbitrary assays in `antibody_escape_config.yml`, not just antibody escape and receptor affinity. Backwards incompatible: This restructuring requires you to add a new `assays` key to `antibody_escape_config.yml` that defines the assays being used.
- Do not show assays in the docs if no samples for that assay.
- Allow additional site numbering schemes to be defined in addition to `reference_site` and `sequential_site` by retaining as a tooltip any column in `site_numbering_map` that ends in `site`. This change also means you no longer need to specify `sequential_site` under `addtl_tooltip_stats` in `antibody_escape_config.yml`.
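The column-selection rule can be sketched as follows. This is a minimal illustration, not the pipeline's code: the helper name `site_tooltip_columns` is hypothetical, as are the `mature_site` and `region` column names; only `reference_site`, `sequential_site`, and the "ends in `site`" rule come from the entry above.

```python
def site_tooltip_columns(columns):
    """Return the columns whose names end in 'site', preserving order."""
    return [col for col in columns if col.endswith("site")]


# Hypothetical header of a site numbering map.
columns = ["sequential_site", "reference_site", "mature_site", "region"]
print(site_tooltip_columns(columns))
```

Selecting by suffix means a new numbering scheme only needs a suitably named column in the map, with no extra tooltip configuration.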
- De-clutter the HTML docs by creating subsections that separate the "Final summary plots", "Analysis notebooks", and "Data files". Addresses this issue.
- Add `other_target_files` variable: files can be added to this (eg, in `custom_rules.smk`) and then the pipeline will ensure these files are also created. Addresses this issue.
- Updated to `polyclonal` 6.7, which addresses this issue and this issue.
- Add `min_filters` to `plot_hide_stats` in `antibody_escape_config.yaml` to only hide/filter robustly measured values.
- Constrain wildcards in `format_altair_html` to better enable custom rules
- Generalize `antibody_escape` to also allow receptor affinity selections with soluble receptor.
- Changed output of `avg_escape` rules for antibodies to be `mut_effect` rather than `mut_escape`. This is because `effect` can generally refer to other phenotypes too.
- For the average mutation effects and ICXX values, also report the per-model values.
- Upgrade to `polyclonal` 6.6, which now shows (hidden) mutations with deleterious functional effects of other hiding metrics.
- Collapse lists in docs index even if just one entry
- Apply `times_seen` filter when computing correlations in `avg_escape`.
- Internally restructured to re-use a single rule for formatting altair plots by creating `common.smk` and then putting the `format_altair_html` rule there and removing plot-specific versions of that rule.
- Fixed a bug in `avg_func_effect_shifts` when lasso penalty is in scientific notation.
- Upgrade to `polyclonal` 6.5.
- For average plots, change default to only show mutations if present in one more than half of all models / selections / comparisons. This reduces showing of mutations found in a minority of replicates.
- Compute and analyze shifts in functional effects between different conditions computed using `multidms`. This adds the `func_effect_shifts`, `avg_func_effect_shifts`, and `format_avg_func_effect_shifts_chart` rules.
- Update `multidms` to version 0.2.1
- Add `ipywidgets` and `seaborn` to environment
- Improvements to GitHub actions testing: save log files on failure, and don't build `conda` environment twice.
- Allow underscores in antibody selection names. (Note this changes paths for prob escape files).
- Add tooltips to plot in `avg_antibody_escape`
- Use log-scale in neutralization vs concentration plots for concentration.
- Added plot of diversity indices in `analyze_variant_counts`
- Fix x-axis range of line plots in `avg_antibody_escape`
- Plot x-axis using reference rather than sequential site in `build_codon_variants` and `analyze_variant_counts`
- Allow the parent gene sequence to not start with ATG (start codon) if it fully translates otherwise.
- Update to `polyclonal` 6.4
- Fix error causing crashing of `build_docs` due to leftover line from earlier debugging.
- Update to `polyclonal` 6.3
- Updated to `multidms` 0.1.9
- Upgrade `multidms` to version 0.1.6, which fixes error with letter-suffixed site numbers.
This is a total re-write of the pipeline in a new repository, replacing versions 1 and 2 that are still available at https://github.com/dms-vep/dms-vep-pipeline. There are numerous changes to the pipeline structure in version 3 relative to versions 1 and 2, although much of the logic of the pipeline is the same.