-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactored SPP plugin w/tellread support. #92
Conversation
Updated to support recent changes to Job.submit_job(). These changes allow for submitting multiple Slurm jobs in parallel and waiting on 1+ jobs until they complete or terminate. Added basic support for TellRead.
Refactored qp-klp plugin using mixins to break up tellseq.sh into components and support a version 2.0 Tellread workflow in mg-scripts. klp.py needs more cleanup and the tests need to be refactored. All references to Step in klp need to be replaced. The actual execute_pipeline() function for TellSeq Workflow() needs to be pasted from StandardMetagenomicWorkflow() and modified to use the new TellReadJob() class in mg-scripts.
Updated Assays and Instruments for Amplicon and TellSeq workflows.
Reorganized sample input files into folders to allow for easier naming.
NB: This PR depends on biocore/mg-scripts#149 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few questions.
qp_klp/tests/data/sample-sheets/metagenomic/tellseq/good_sheet_absq_draft1.csv
Outdated
Show resolved
Hide resolved
@antgonza ty for your patience! It's ready for review! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some questions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few more comments.
return df | ||
|
||
|
||
class MetaOmic(Assay): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not 100% why we need Assay and MetaOmic; could you clarify?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. Historically what we've had are two (Amplicon, Metagenomic) then three (Metatranscriptomic) pipelines that are largely similar with but with several large and small differences along the path. One example is Amplicon skipping the host-filtering step and then having to copy and reorganize files so that the output can be fed into FastQC and load preps into Qiita downstream. Metagenomic and Metatranscriptomic are essentially identical but they are required to have individual types. Now TellSeq is essentially the same pipeline downstream of host-filtering but the conversion process is entirely different and the host filtering needed to be modified to support its metadata. (Note that presently the TellSeq pipeline didn't require a separate Assay mixing but you can see how future additions might.)
Since we like to stay 'dry' and not repeat ourselves in code, we can't define four different pipelines that are 80% identical. Previous versions of SPP resolved this with an ever growing number of 'if type == 'Amplicon' else:' type statements. However introducing new features potentially broke all pathways and it became harder to understand how to modify the code to add new features or fix bugs.
What Assays does is break out all the code that changes depending on the pipeline being Amplicon or Meta*omic and organizes it so that it's all easier to read, changes to one type won't affect the other so it doesn't need to be retested, and we don't need to constantly test whether we're an amplicon run or not. Moreover helper methods that are shared between the different pipelines are defined in the base Assay class and don't have to be duplicated in each class.
from metapool import load_sample_sheet | ||
|
||
|
||
PROTOCOL_NAME_NONE = "None" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is never used again, except in line 24; why not just define there?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm taking a page out of metapool in that I'm starting to define global string constants at the top of the appropriate file. I actually have 'ASSAY_NAME_NONE' defined for Assays as well. It's true that it's currently only used in one case, but IMHO it's good to define it with the others so that they're all defined in the same place rather than, 'two cases up here and the one negative case down there where you're likely to forget about it.'
Refactored SPP plugin to support TellSeq. Replaced Step() workflow class and its two children (Metagenomic, Amplicon) w/Mixins. The new base Workflow() class is mixed with an Assay class and a SequencingTech class to create a new subclass e.g. StandardMetagenomicWorkflow(). Metatranscriptomic was also implemented as a separate class as well. All workflows are now much more extensible.
Job submissions and tests also updated to support sacct -> squeue migration and to support multiple Slurm jobs in parallel and wait on 1+ jobs until they complete or terminate.