Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactored SPP plugin w/tellread support. #92

Merged
merged 32 commits into from
Dec 12, 2024

Conversation

charles-cowart
Copy link
Contributor

@charles-cowart charles-cowart commented Sep 16, 2024

Refactored SPP plugin to support TellSeq. Replaced Step() workflow class and its two children (Metagenomic, Amplicon) w/Mixins. The new base Workflow() class is mixed with an Assay class and a SequencingTech class to create a new subclass e.g. StandardMetagenomicWorkflow(). Metatranscriptomic was also implemented as a separate class as well. All workflows are now much more extensible.

Job submissions and tests also updated to support sacct -> squeue migration and to support multiple Slurm jobs in parallel and wait on 1+ jobs until they complete or terminate.

Updated to support recent changes to Job.submit_job(). These changes
allow for submitting multiple Slurm jobs in parallel and waiting on 1+
jobs until they complete or terminate.

Added basic support for TellRead.
Refactored qp-klp plugin using mixins to break up tellseq.sh into
components and support a version 2.0 Tellread workflow in mg-scripts.

klp.py needs more cleanup and the tests need to be refactored. All
references to Step in klp need to be replaced. The actual
execute_pipeline() function for TellSeq Workflow() needs to be pasted
from StandardMetagenomicWorkflow() and modified to use the new
TellReadJob() class in mg-scripts.
Updated Assays and Instruments for Amplicon and TellSeq workflows.
Reorganized sample input files into folders to allow for easier naming.
@charles-cowart charles-cowart changed the title WIP: Updated to support changes to submit_job. Refactored SPP plugin w/tellread support. Nov 25, 2024
@charles-cowart
Copy link
Contributor Author

NB: This PR depends on biocore/mg-scripts#149

Copy link
Member

@antgonza antgonza left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few questions.

qp_klp/Assays.py Show resolved Hide resolved
qp_klp/Assays.py Show resolved Hide resolved
qp_klp/Assays.py Outdated Show resolved Hide resolved
qp_klp/Assays.py Outdated Show resolved Hide resolved
qp_klp/CHECKLIST.txt Outdated Show resolved Hide resolved
qp_klp/StandardMetagenomicWorkflow.py Show resolved Hide resolved
qp_klp/tests/data/sample-sheets/.DS_Store Outdated Show resolved Hide resolved
qp_klp/tests/data/sample-sheets/metagenomic/.DS_Store Outdated Show resolved Hide resolved
qp_klp/SequencingTech.py Outdated Show resolved Hide resolved
qp_klp/SequencingTech.py Outdated Show resolved Hide resolved
@charles-cowart
Copy link
Contributor Author

@antgonza ty for your patience! It's ready for review!

Copy link
Member

@antgonza antgonza left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some questions.

qp_klp/StandardAmpliconWorkflow.py Outdated Show resolved Hide resolved
qp_klp/StandardMetagenomicWorkflow.py Outdated Show resolved Hide resolved
qp_klp/StandardMetatranscriptomicWorkflow.py Outdated Show resolved Hide resolved
qp_klp/__init__.py Show resolved Hide resolved
qp_klp/klp.py Show resolved Hide resolved
qp_klp/tests/data/tellread_test.sbatch Show resolved Hide resolved
Copy link
Member

@antgonza antgonza left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few more comments.

return df


class MetaOmic(Assay):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not 100% why we need Assay and MetaOmic; could you clarify?

Copy link
Contributor Author

@charles-cowart charles-cowart Dec 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. Historically what we've had are two (Amplicon, Metagenomic) then three (Metatranscriptomic) pipelines that are largely similar with but with several large and small differences along the path. One example is Amplicon skipping the host-filtering step and then having to copy and reorganize files so that the output can be fed into FastQC and load preps into Qiita downstream. Metagenomic and Metatranscriptomic are essentially identical but they are required to have individual types. Now TellSeq is essentially the same pipeline downstream of host-filtering but the conversion process is entirely different and the host filtering needed to be modified to support its metadata. (Note that presently the TellSeq pipeline didn't require a separate Assay mixing but you can see how future additions might.)

Since we like to stay 'dry' and not repeat ourselves in code, we can't define four different pipelines that are 80% identical. Previous versions of SPP resolved this with an ever growing number of 'if type == 'Amplicon' else:' type statements. However introducing new features potentially broke all pathways and it became harder to understand how to modify the code to add new features or fix bugs.

What Assays does is break out all the code that changes depending on the pipeline being Amplicon or Meta*omic and organizes it so that it's all easier to read, changes to one type won't affect the other so it doesn't need to be retested, and we don't need to constantly test whether we're an amplicon run or not. Moreover helper methods that are shared between the different pipelines are defined in the base Assay class and don't have to be duplicated in each class.

qp_klp/Assays.py Outdated Show resolved Hide resolved
qp_klp/Assays.py Outdated Show resolved Hide resolved
from metapool import load_sample_sheet


PROTOCOL_NAME_NONE = "None"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is never used again, except in line 24; why not just define there?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm taking a page out of metapool in that I'm starting to define global string constants at the top of the appropriate file. I actually have 'ASSAY_NAME_NONE' defined for Assays as well. It's true that it's currently only used in one case, but IMHO it's good to define it with the others so that they're all defined in the same place rather than, 'two cases up here and the one negative case down there where you're likely to forget about it.'

@antgonza antgonza merged commit 2d66c98 into qiita-spots:main Dec 12, 2024
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants