First pass at making snakemake workflow for innovation model #11
I've tried to convert the compare_natural notebook to a Python script. There are a couple of remaining tasks for this section:
I couldn't run Snakemake due to a couple of bugs in the Snakefile:
1. run_models.smk was missing a comma.
2. The Snakefile was missing a pandas import.
3. There was some confusion in analysis_periods between iterating over a list and iterating over a dictionary. This should be solved by passing analysis_periods.keys() to expand.

However, I couldn't test the workflow because the data/ files are missing, so I'm not 100% sure this fixes things.
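To illustrate point 3, here is a minimal sketch in plain Python of generating one target per analysis period from the dictionary keys. The period names, dates, output pattern, and the `expand_periods` helper are all hypothetical, and the helper only mimics a single-wildcard `expand` so the example runs without Snakemake.

```python
# Hypothetical analysis_periods mapping; the real names and dates live in the workflow config.
analysis_periods = {
    "period_a": {"min_date": "2022-01-01", "max_date": "2022-06-30"},
    "period_b": {"min_date": "2022-07-01", "max_date": "2022-12-31"},
}

# Hypothetical output pattern with a single {period} wildcard.
pattern = "results/{period}/sequence-counts.tsv"


def expand_periods(pattern, periods):
    """Return one path per period name, mimicking a single-wildcard expand()."""
    return [pattern.format(period=name) for name in periods]


# Passing the keys makes it explicit that the wildcard takes period names,
# not the nested settings dictionaries.
targets = expand_periods(pattern, analysis_periods.keys())
print(targets)
```

In a Snakefile, the corresponding call would be something like `expand(pattern, period=analysis_periods.keys())`.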
Thanks so much for diving in here @marlinfiggins. I'm sorry that I didn't notice this PR before you pointed it out to me last month. I just tried working from
I think that you're assuming that local files like I just did almost exactly this over here: https://github.com/blab/fitness-dynamics?tab=readme-ov-file#provision-metadata-locally. I can continue review once I know how to provision local data. Also, separate question: can we just drop |
This copies over logic from https://github.com/blab/fitness-dynamics, where the prepare_clade_data rule that calls scripts/prepare-data.py is based on a defined analysis window of min_date and max_date rather than on included_days. This is significantly cleaner for performing historical analyses. Additionally, this drops references to cases (and the requirement of passing cases to prepare-data.py). We're not using cases in the MLR analysis, and they just add unused overhead.
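As a rough illustration of the min_date / max_date idea, the sketch below filters a table of sequence counts to an inclusive date window with pandas. It is not the code in scripts/prepare-data.py. The `filter_to_window` helper, the column names, and the sample rows are assumptions made for the example.

```python
import pandas as pd


def filter_to_window(counts, min_date, max_date, date_col="date"):
    """Keep rows whose date falls within [min_date, max_date], inclusive."""
    dates = pd.to_datetime(counts[date_col])
    mask = (dates >= pd.Timestamp(min_date)) & (dates <= pd.Timestamp(max_date))
    return counts.loc[mask].reset_index(drop=True)


# Made-up sequence counts, for illustration only.
counts = pd.DataFrame(
    {
        "date": ["2022-01-01", "2022-02-15", "2022-04-01"],
        "clade": ["21K", "21L", "22A"],
        "sequences": [10, 25, 7],
    }
)

window = filter_to_window(counts, "2022-02-01", "2022-03-31")
print(window)
```

A fixed window like this makes a historical analysis reproducible, since the same two dates always select the same rows no matter when the workflow is run.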
The line variant_relationships = pd.DataFrame(parent_map).reset_index() in prepare-pango-relationships.py was throwing an error for me. I've fixed it in this commit.
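The comment does not say what the error was. One common way this construction fails, assuming parent_map is a plain dict mapping each variant name to a single parent string, is that pandas raises a ValueError for a dict of scalar values. The sketch below reproduces that failure and shows an explicit two-column construction. The lineage names and column names are illustrative, and this is not necessarily the fix made in the commit.

```python
import pandas as pd

# Assumed shape: variant name -> parent name (illustrative values).
parent_map = {"BA.2": "B.1.1.529", "BA.2.12.1": "BA.2", "BA.5": "B.1.1.529"}

# A dict of scalars cannot be turned into a DataFrame without an index.
try:
    pd.DataFrame(parent_map)
except ValueError as err:
    print(f"ValueError: {err}")

# Building from (variant, parent) pairs gives one row per relationship.
variant_relationships = pd.DataFrame(
    list(parent_map.items()), columns=["variant", "parent"]
)
print(variant_relationships)
```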
Hey @marlinfiggins. I was able to provision data locally and fix some errors to get me to
However, I can't figure out how to even begin with phenotypes. From
I'm pretty sure this was an issue with line 165 of:
I just updated this to
(continued...) However now calling
Can you please take a look at this? Please add instructions in |
This PR implements a Snakemake workflow for provisioning sequence counts, using methods similar to those in forecasts-ncov. This is my first time working with Snakemake, so any suggestions, comments, or questions are appreciated.
Major changes include:
config.yaml
for specification here)
workflow/snakemake_rules/prepare_data.smk
)
scripts/run-innovation-model
)

Still to be accomplished:
mlr-fitness/data/pango-relationships.nb
for correctness

Note: I've borrowed the files scripts/prepare-data.py and scripts/collapse-lineages.py directly from forecasts-ncov. Let me know if there's a better way of doing this.