Added testing for disease predicates #151

lnrekerle · 2024-05-01T16:40:23Z

closes #137

I added a disease test, and proof that it works in the RPGRIP1 notebook.
A couple questions before I push this tho:

Did we want to run all diseases (like we do currently with phenotypes) or have the user give a specific disease (what we're currently doing)?
Should I rewrite compare_disease_vs_genotype so that the user gives a function they would normally see rather than having to give a class that's normally hidden? Or is the way it is right now good for now? This could be a rather large rewrite for that file (_gp_impl.py)

Thank you @pnrobinson @ielis

…class.

ielis · 2024-05-03T18:16:52Z

Hi @lnrekerle

thanks for the code! I added a few improvements/fixes to the code.

Regarding the questions.

Did we want to run all diseases (like we do currently with phenotypes) or have the user give a specific disease (what we're currently doing)?

I think we should let the user to provide a set of "specific diseases" - these are the diseases the user is interested to see. Let me call these query diseases. The query diseases act as a filter. If we have a cohort with 5 diseases and the user provides 2 query diseases, at most 2 query diseases will make it into results. If zero or (perhaps) just one of the query diseases is in fact represented in the cohort, the code should probably explode since this is likely a bug. Otherwise, we would either return an empty result or a result with correlation for one disease only, which IMO is not what the user wants to see.

Besides supporting the query diseases, IMO many times the users will want to see all diseases of the cohort. We should make this easy. One way to do this is to write a function like this:

class CohortAnalysis:

    # bla bla bla

   def compare_genotype_by_diseases(
        self,
        genotype_predicate: GenotypePolyPredicate,
        disease_ids: typing.Optional[typing.Sequence[hpotk.TermId]] = None,
   ) -> GenotypePhenotypeAnalysisResult:
        pass

The function takes a genotype poly predicate, which can be (as you know) anything, and an optional sequence of disease IDs formatted as hpotk.TermId. If disease_ids==None, then the user wants to see all diseases. If len(disease_ids) in {0, 1} we should probably explode because of what I wrote above. Otherwise, we use the disease_ids as a filter.

Should I rewrite compare_disease_vs_genotype so that the user gives a function they would normally see rather than having to give a class that's normally hidden? Or is the way it is right now good for now? This could be a rather large rewrite for that file (_gp_impl.py)

This question is a little hard for me to understand, I'll try my best though.
I think we are approaching a point where we will have to write a function like this:

def compare_genotype_by_phenotype(
    self,
    genotype_predicate: GenotypePolyPredicate,
    phenotype_predicate: PhenotypePolyPredicate,
) -> GenotypePhenotypeAnalysisResult:
    pass

This will become a central point of the entire analysis. All convenience functions that we currently have on CohortAnalysis, such as compare_by_protein_feature_type or compare_by_variant_key will actually boil down to compare_genotype_by_phenotype. So, most of the time, the user will not have to deal with a "class that is normally hidden" (e.g. ExonPredicate or PropagatingPhenotypePredicate) but the user will call one of the convenience function.
However, when the convenience function is not there, the user will have to create genotype/phenotype predicate and run the analysis.

This may seem like a huge change, but we actually +- have this in GpCohortAnalysis._apply_poly_predicate. We may need to discuss the requirements and tweak this a bit at some but it should not be too much work.

This sounds like a good subject for the next meeting. Please think about the requirements. Right now we _apply_poly_predicate takes an iterable of phenotype predicates and single genotype poly predicates. It would be good to think about the cardinalities, especially the corner cases of the phenotype part. E.g. what happens with patients diagnosed with multiple diseases.

Added testing for disease predicates

971b91d

lnrekerle self-assigned this May 1, 2024

lnrekerle marked this pull request as draft May 1, 2024 16:40

lnrekerle requested review from ielis and pnrobinson May 1, 2024 16:40

lnrekerle and others added 8 commits May 1, 2024 11:53

Oops, forgot to actually push the tests.

dd874b0

DiseasePresencePredicate should be tested in a separate test suite/…

c565b15

…class.

Remove unused import.

21e75cf

Update pydoc.

cf342a2

Remove redundant imports.

a3406e5

Improve handling of formatting TermIds of non-HPO terms.

712c6ba

Split HPO phenotypes and diseases and use as fixtures.

a15c2ee

Improve the comment string.

f7e077c

lnrekerle added 3 commits May 8, 2024 15:56

small fixes including fixing LMNA inputs

515bd97

Fixed a test

42770aa

Merge remote-tracking branch 'origin/develop' into disease_testing

ab84c8f

lnrekerle marked this pull request as ready for review May 9, 2024 17:46

lnrekerle merged commit 1df6274 into develop May 10, 2024
5 checks passed

lnrekerle deleted the disease_testing branch May 10, 2024 19:14

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Added testing for disease predicates #151

Added testing for disease predicates #151

lnrekerle commented May 1, 2024 •

edited

Loading

ielis commented May 3, 2024

Added testing for disease predicates #151

Added testing for disease predicates #151

Conversation

lnrekerle commented May 1, 2024 • edited Loading

ielis commented May 3, 2024

lnrekerle commented May 1, 2024 •

edited

Loading