Refactored SPP plugin w/tellread support. #92

charles-cowart · 2024-09-16T22:43:07Z

Refactored the SPP plugin to support TellSeq. Replaced the Step() workflow class and its two children (Metagenomic, Amplicon) with mixins. The new base Workflow() class is mixed with an Assay class and a SequencingTech class to create a new subclass, e.g. StandardMetagenomicWorkflow(). Metatranscriptomic is also implemented as a separate class. As a result, all workflows are now easier to extend.
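A minimal sketch of the composition described above, not the plugin's actual code. Only Workflow, StandardMetagenomicWorkflow and execute_pipeline() are names from this PR; the mixin classes, the method names (convert_raw_data, quality_control, load_preps), the returned strings and TellSeqMetagenomicWorkflow are assumptions made for illustration.

```python
class Workflow:
    """Base class: owns the step order shared by every pipeline."""

    def execute_pipeline(self):
        # Each step is resolved through the MRO, so the mixins decide
        # what actually runs.
        return [self.convert_raw_data(), self.quality_control(), self.load_preps()]

    def load_preps(self):
        return f"load {self.assay_type} preps"


class Metagenomic:
    """Assay mixin (hypothetical): behaviour that depends on the assay."""
    assay_type = "Metagenomic"

    def quality_control(self):
        return "host-filter reads"


class Illumina:
    """SequencingTech mixin (hypothetical): behaviour that depends on the platform."""

    def convert_raw_data(self):
        return "convert BCL to FASTQ"


class TellSeq:
    """SequencingTech mixin (hypothetical) with a different conversion step."""

    def convert_raw_data(self):
        return "run TellRead conversion"


class StandardMetagenomicWorkflow(Workflow, Metagenomic, Illumina):
    pass


class TellSeqMetagenomicWorkflow(Workflow, Metagenomic, TellSeq):
    # Hypothetical name: swapping one mixin changes only the conversion step.
    pass


print(StandardMetagenomicWorkflow().execute_pipeline())
print(TellSeqMetagenomicWorkflow().execute_pipeline())
```

Under these assumptions, supporting a new platform means writing one new mixin and one short subclass; the shared step order in Workflow stays untouched.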

Job submission and its tests were also updated for the sacct -> squeue migration. They now support submitting multiple Slurm jobs in parallel and waiting on one or more jobs until they complete or terminate.

Updated to support recent changes to Job.submit_job(). These changes allow for submitting multiple Slurm jobs in parallel and waiting on 1+ jobs until they complete or terminate. Added basic support for TellRead.
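The "wait on 1+ jobs" behaviour could look roughly like the sketch below. It is not the implementation of Job.submit_job() in mg-scripts. The function wait_on_jobs, its parameters and the get_states callable are hypothetical; get_states stands in for whatever queries squeue and returns a {job_id: state} mapping. The state names are standard Slurm job states.

```python
import time

# Slurm states after which a job will not make further progress.
TERMINAL_STATES = {"COMPLETED", "FAILED", "CANCELLED", "TIMEOUT"}


def wait_on_jobs(job_ids, get_states, poll_interval=10.0, max_polls=None,
                 sleep=time.sleep):
    """Block until every job in job_ids reaches a terminal state.

    get_states is a caller-supplied callable (hypothetical, e.g. a thin
    wrapper around squeue) that maps a list of job ids to {job_id: state}.
    Returns {job_id: final_state}. Raises TimeoutError after max_polls
    polls if any job is still outstanding.
    """
    pending = set(job_ids)
    final = {}
    polls = 0
    while pending:
        if max_polls is not None and polls >= max_polls:
            raise TimeoutError(f"still waiting on jobs: {sorted(pending)}")
        states = get_states(sorted(pending))
        polls += 1
        for job_id in sorted(pending):
            state = states.get(job_id)
            if state in TERMINAL_STATES:
                final[job_id] = state
                pending.discard(job_id)
            # A job missing from the report stays pending in this sketch;
            # real code needs a policy for jobs the scheduler no longer lists.
        if pending:
            sleep(poll_interval)
    return final
```

Passing the poller and the sleep function in as arguments keeps the loop testable without a Slurm cluster: a test can supply canned state reports and a no-op sleep.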

Refactored the qp-klp plugin using mixins to break up tellseq.sh into components and support a version 2.0 TellRead workflow in mg-scripts. klp.py needs more cleanup and the tests need to be refactored. All references to Step in klp need to be replaced. The actual execute_pipeline() function for the TellSeq Workflow() needs to be copied from StandardMetagenomicWorkflow() and modified to use the new TellReadJob() class in mg-scripts.

Updated Assays and Instruments for Amplicon and TellSeq workflows.

Reorganized sample input files into folders to allow for easier naming.

charles-cowart · 2024-12-03T18:17:02Z

NB: This PR depends on biocore/mg-scripts#149

antgonza

A few questions.

qp_klp/Assays.py

qp_klp/CHECKLIST.txt

qp_klp/StandardMetagenomicWorkflow.py

qp_klp/tests/data/configuration_profiles/iseq_metagenomic.json

qp_klp/tests/data/configuration_profiles/miseq_metagenomic.json

qp_klp/tests/data/sample-sheets/.DS_Store

qp_klp/tests/data/sample-sheets/metagenomic/.DS_Store

qp_klp/SequencingTech.py

qp_klp/tests/data/sample-sheets/metagenomic/tellseq/good_sheet_absq_draft1.csv

charles-cowart · 2024-12-06T06:03:16Z

@antgonza ty for your patience! It's ready for review!

antgonza

Some questions.

qp_klp/StandardAmpliconWorkflow.py

qp_klp/StandardMetagenomicWorkflow.py

qp_klp/StandardMetatranscriptomicWorkflow.py

qp_klp/__init__.py

qp_klp/klp.py

qp_klp/tests/data/tellread_test.sbatch

antgonza

A few more comments.

antgonza · 2024-12-11T14:42:09Z

qp_klp/Assays.py

+        return df
+
+
+class MetaOmic(Assay):


I'm not 100% why we need Assay and MetaOmic; could you clarify?

charles-cowart · 2024-12-12

Sure. Historically we've had two (Amplicon, Metagenomic) and then three (adding Metatranscriptomic) pipelines that are largely similar but differ in several large and small ways along the path. One example is Amplicon skipping the host-filtering step and then having to copy and reorganize files so that the output can be fed into FastQC and preps can be loaded into Qiita downstream. Metagenomic and Metatranscriptomic are essentially identical, but they are required to have individual types. TellSeq is essentially the same pipeline downstream of host filtering, but the conversion process is entirely different and the host filtering needed to be modified to support its metadata. (Note that the TellSeq pipeline doesn't presently require a separate Assay mixin, but you can see how future additions might.)

Since we like to stay DRY and not repeat ourselves in code, we can't define four different pipelines that are 80% identical. Previous versions of SPP resolved this with an ever-growing number of "if type == 'Amplicon': ... else: ..." statements. However, every new feature risked breaking all pathways, and it became harder to understand how to modify the code to add features or fix bugs.

What Assays does is break out all the code that changes depending on whether the pipeline is Amplicon or Meta*omic and organize it so that it's easier to read. Changes to one type won't affect the other, so the other doesn't need to be retested, and we don't need to constantly check whether we're an amplicon run or not. Moreover, helper methods that are shared between the pipelines are defined in the base Assay class and don't have to be duplicated in each class.
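To illustrate the point, here is a hedged sketch of how subclass overrides can replace the "if type == 'Amplicon'" checks. Only the class names Assay and MetaOmic appear in the diff above; Amplicon, Metagenomic and Metatranscriptomic as subclasses, the method names, and the step descriptions are assumptions for illustration, not the contents of qp_klp/Assays.py.

```python
class Assay:
    """Base class: helpers shared by every assay type are defined once here."""
    assay_type = "None"

    def summarize(self, steps):
        # Shared helper (hypothetical), inherited by all subclasses.
        return f"{self.assay_type}: " + " -> ".join(steps)

    def quality_control(self):
        raise NotImplementedError


class Amplicon(Assay):
    assay_type = "Amplicon"

    def quality_control(self):
        # Amplicon skips host filtering and reorganizes files for FastQC.
        return self.summarize(["copy and reorganize files", "FastQC"])


class MetaOmic(Assay):
    """Behaviour common to Metagenomic and Metatranscriptomic."""

    def quality_control(self):
        return self.summarize(["host filtering", "FastQC"])


class Metagenomic(MetaOmic):
    assay_type = "Metagenomic"


class Metatranscriptomic(MetaOmic):
    # Identical behaviour to Metagenomic, but a distinct type.
    assay_type = "Metatranscriptomic"


for assay in (Amplicon(), Metagenomic(), Metatranscriptomic()):
    # The caller never checks the assay type; dispatch does it.
    print(assay.quality_control())
```

In this arrangement a change to Amplicon.quality_control() cannot alter the Meta*omic path, which is the isolation the reply describes.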

qp_klp/Assays.py

antgonza · 2024-12-11T14:44:11Z

qp_klp/Protocol.py

+from metapool import load_sample_sheet
+
+
+PROTOCOL_NAME_NONE = "None"


This is never used again, except in line 24; why not just define it there?

charles-cowart · 2024-12-12

I'm taking a page out of metapool in that I'm starting to define global string constants at the top of the appropriate file. I actually have 'ASSAY_NAME_NONE' defined for Assays as well. It's true that it's currently used in only one place, but IMHO it's good to define it with the others so that they're all in the same place, rather than 'two cases up here and the one negative case down there where you're likely to forget about it.'
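A small sketch of that convention. PROTOCOL_NAME_NONE and its value come from the diff above; the other two constants, their values, and the classes are hypothetical and only show how the "negative" case sits next to its siblings.

```python
# All protocol names are declared together at the top of the module.
PROTOCOL_NAME_NONE = "None"
PROTOCOL_NAME_ILLUMINA = "Illumina"  # hypothetical sibling constant
PROTOCOL_NAME_TELLSEQ = "TellSeq"    # hypothetical sibling constant


class Protocol:
    # The base class uses the "none" name; subclasses override it.
    protocol_type = PROTOCOL_NAME_NONE


class Illumina(Protocol):
    protocol_type = PROTOCOL_NAME_ILLUMINA


class TellSeq(Protocol):
    protocol_type = PROTOCOL_NAME_TELLSEQ


print([cls.protocol_type for cls in (Protocol, Illumina, TellSeq)])
```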

charles-cowart added 25 commits September 16, 2024 15:40

Updated to support changes to submit_job.

7d7aeea

Updated to support recent changes to Job.submit_job(). These changes allow for submitting multiple Slurm jobs in parallel and waiting on 1+ jobs until they complete or terminate. Added basic support for TellRead.

Updated

cdb613a

point to add_tellread branch of mg-scripts

22d5f72

Update git w/latest on Workflows

4e508d9

klp updated to use new classes, streamlined

ab986e0

Added tests for Workflow creation using factory class.

1953326

Updated Assays and Instruments for Amplicon and TellSeq workflows.

Reorganized sample input files into folders.

485ec72

Reorganized sample input files into folders to allow for easier naming.

Revamping existing tests

0bf48c1

Revamping NuQCJob tests

dd68d5c

New tests exercise ConvertJob and NuQCJob

707506a

Updates from this week

0704682

Updated Amplicon quality control test.

f3deac4

Address test fail on MacOS vs Linux

933e78f

Updates

09a1cc2

Pre-test push

73d5d06

Checkpoint

b46ba32

checkpoint2

ebe2915

Code audit

e963ed7

All tests are passing again post metapool + mgscripts updates

4f92021

flake8

b443c14

smaller dummy files

3d7e8ab

update test

6a9ac71

Updates

632c366

Removed TODOs

2b1e4e9

charles-cowart changed the title ~~WIP: Updated to support changes to submit_job.~~ Refactored SPP plugin w/tellread support. Nov 25, 2024

charles-cowart requested a review from antgonza November 25, 2024 05:20

Updates

e28bb9f

antgonza requested changes Dec 3, 2024

wasade reviewed Dec 3, 2024

qp_klp/SequencingTech.py (outdated)

wasade reviewed Dec 4, 2024

qp_klp/SequencingTech.py (outdated)

wasade reviewed Dec 4, 2024

qp_klp/tests/data/sample-sheets/metagenomic/tellseq/good_sheet_absq_draft1.csv (outdated)

charles-cowart added 3 commits December 3, 2024 17:29

Some updates based on feedback

f195631

Removing .DS_Store

41dfbf5

Updates based on feedback

e0272f8

antgonza requested changes Dec 9, 2024

charles-cowart added 2 commits December 10, 2024 10:04

Updates based on feedback

ccfc81b

Updates based on testing

dd6db59

antgonza requested changes Dec 11, 2024

Updates based on feedback

32c808d

antgonza merged commit 2d66c98 into qiita-spots:main Dec 12, 2024
2 checks passed
