Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NEVER MERGE: Test PR for adding synonyms - qc-duplicate-exact-synonym-no-abbrev #8584

Conversation

joeflack4
Copy link
Collaborator

@joeflack4 joeflack4 commented Jan 15, 2025

Testing whether or not bug fix for qc-duplicate-exact-synonym-no-abbrev failures in the following PR works:

Context:

- Updated wget URL for files using a mondo-ingest feature branch
@joeflack4 joeflack4 self-assigned this Jan 15, 2025
@joeflack4 joeflack4 added the bug label Jan 15, 2025
@joeflack4 joeflack4 marked this pull request as draft January 15, 2025 23:30
- Reinstated the mondo-edit.obo version from the commit before running these tests. Ran synonym pipeline and updated mondo-edit.obo
@joeflack4 joeflack4 force-pushed the test-PR-adding-syns--v3--qc-duplicate-exact-synonym-no-abbrev branch from 6c21378 to f34ad89 Compare January 15, 2025 23:32
- Again, using a new round of changes: Reinstated the mondo-edit.obo version from the commit before running these tests. Ran synonym pipeline and updated mondo-edit.obo
- Again, using a new round of changes: Reinstated the mondo-edit.obo version from the commit before running these tests. Ran synonym pipeline and updated mondo-edit.obo
- Now passes locally. Will also run QC in PR.
Copy link
Collaborator Author

@joeflack4 joeflack4 Jan 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

QC

I ran the qc-duplicate-exact-synonym-no-abbrev.sparql QC locally and it is now passing.

Now the CI GitHub action is running and we can confirm there when it is done: https://github.com/monarch-initiative/mondo/actions/runs/12802317540

edit: Unexpected to me, CI fully passed! Not just the 1 QC check that this PR was meant to address.
It looks like the 7 failures that were remaining in this PR were also handled:

Copy link
Member

@matentzn matentzn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not understand what this PR exactly is doing, but synonym sync should never remove anything; so I don't know why I am looking at 30,000 removed lines in Mondo right now as a fix to a major known problem. What should be happening IMO is that you simply to not add new synonyms violating this QC check..


$(TMPDIR)/synonyms-added.robot.tsv:
wget "https://raw.githubusercontent.com/monarch-initiative/mondo-ingest/refs/heads/main/src/ontology/reports/sync-synonym/sync-synonyms.added.robot.tsv" -O $@
wget "https://raw.githubusercontent.com/monarch-initiative/mondo-ingest/refs/heads/qc-duplicate-exact-synonym-no-abbrev--mini-build/src/ontology/reports/sync-synonym/sync-synonyms.added.robot.tsv" -O $@
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do not merge this

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed! This is a 'do not merge' / 'never merge' test PR.

joeflack4

This comment was marked as duplicate.

Copy link
Collaborator Author

@joeflack4 joeflack4 Jan 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Explanation of negative diff

I do not understand what this PR exactly is doing, but synonym sync should never remove anything; so I don't know why I am looking at 30,000 removed lines in Mondo right now as a fix to a major known problem. What should be happening IMO is that you simply to not add new synonyms violating this QC check..

@matentzn Ah, this is because This PR is merging into test-PR-adding-syns, which already added a ton of synonyms which are violating QC.

What this does is not add those synonyms. After the last commit in test-PR-adding-syns, I reinstated the last mondo-edit.obo that existed before that branch. This is what is causing the "removals". Then I run the synonym sync which uses my mondo-ingest fix branch, and it doesn't add these violations, and QC now passes.

FYI for these mondo PRs, I would not tag you for review as I know you are tied up with other stuff right now, but mondo auto-tags you.

Edit: Might help to see the specific commit where reinstating the old mondo-edit causes this diff.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops thanks for letting me know :)

Copy link
Collaborator

@twhetzel twhetzel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In reviewing the mondo-edit.obo file in this commit one thing that appears is that the Synonym Sync was not properly handling cases of synonyms where EXCLUDE was present since many of the removals in this commit are DOID synonyms where the existing synonym in mondo-edit.obo was already tagged with EXCLUDE.

@@ -203046,6 +203193,7 @@ subset: otar {source="MONDO:OTAR"}
subset: rare
synonym: "cth -" EXACT [icd11.foundation:1415819835]
synonym: "CTH - [cystathioninuria]" EXACT [icd11.foundation:1415819835]
synonym: "CTH - [cystathioninuria]" EXACT [icd11.foundation:1415819835]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now, both synonyms exist in the diff BUT the synonym that has the double whitespace should not be in Mondo.
@joeflack4 - when the Synonym Sync is run again as part of the "Week 3" pipeline tasks, will only the synonym with the double whitespace fixed be added? That is, is seeing synonym: "CTH - [cystathioninuria]" EXACT [icd11.foundation:1415819835] an artifact of how this PR was generated?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked into this. It appears the stripping logic is working correctly.

The source of the problem is that while synonym: "CTH - [cystathioninuria]" EXACT [icd11.foundation:1415819835] exists in this older state of mondo-edit.obo that this PR is based off, it does not exist currently:

id: MONDO:0009058
name: cystathioninuria
synonym: "cystathionase deficiency" EXACT [DOID:0090142, OMIM:219500, Orphanet:212]
synonym: "Cystathione gamma-lyase deficiency syndrome" EXACT [DOID:0090142]
synonym: "cystathione gamma-lyase deficiency syndrome" EXACT [DOID:0090142]
synonym: "cystathioninuria" EXACT CLINGEN_LABEL [DOID:0090142, icd11.foundation:1415819835, MONDO:ambiguous, NCIT:C129070, OMIM:219500, Orphanet:212]
synonym: "cystathioninuria (disease)" EXACT [https://orcid.org/0000-0002-6601-2165]
synonym: "gamma-cystathionase deficiency" EXACT [DOID:0090142, icd11.foundation:1415819835, Orphanet:212]

Since it does not appear among those synonyms, it is showing up as an "added" case instead of a "confirmed" case.

Copy link
Collaborator

@twhetzel twhetzel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, the changes in the diff for this commit look correct. This diff shows a number of categories of changes due to various things changing and the diff does not only represent changes only to resolve the Mondo QC check issue for qc-duplicate-exact-synonym-no-abbrev. The number of categories of changes and number of overall changes has made this PR more time intensive to review.

To document the review, here are the categories of changes I've seen in this commit (f34ad89):

  • OMIM synonyms were stripped of the trailing ", formerly" and now represented as RELATED --> this is correct
    Example:
- synonym: "arthrogryposis, distal, type 8, formerly" EXACT [OMIM:178110]
synonym: "arthrogryposis, distal, type 8" RELATED [MONDO:Lexical, OMIM:178110]
  • synonyms that are abbreviations are now also represented with the ABBREVIATION tag --> this is correct
    Example:
    previous: no example
    current: + synonym: "AUTOSOMAL DOMINANT" EXACT ABBREVIATION [OMIM:178110]
    This is the main item that needed to be fixed in the code that generated this PR.

  • synonyms that are only new to this PR
    Example:
    synonym: "Stiff man spectrum disorder" EXACT [Orphanet:3198] --> this is correct based on Orphanet

  • synonyms are now not added when the same synonym already exists with the tag EXCLUDE
    Example:

- synonym: "DiGeorge sequence" EXACT [DOID:11198] --> this removal is correct
synonym: "DiGeorge sequence" RELATED EXCLUDE [DOID:11198]
  • synonyms now have the "double whitespace" issue fixed
synonym: "CTH -  [cystathioninuria]" EXACT [icd11.foundation:1415819835]
+ synonym: "CTH - [cystathioninuria]" EXACT [icd11.foundation:1415819835]
  • there are a handful of synonyms from OMIM in the diff that still have "..., formerly" EXACT --> these are incorrectly generated in the OMIM Ingest pipeline and I will add a new ticket into the OMIM repo about this. however, this is a small issue that does not require changes in the related code PR
    Examples:
synonym: "ANOP1, formerly" EXACT [OMIM:301590]

and

synonym: "maa" RELATED [OMIM:309800]
+ synonym: "MAA, formerly" EXACT ABBREVIATION [OMIM:309800]

and

+ synonym: "microphthalmia, syndromic 4" RELATED [OMIM:309800]
synonym: "microphthalmia, syndromic 4, formerly" EXACT [MONDO:Lexical, OMIM:301590]

and

synonym: "deafness, autosomal recessive 105" NARROW [OMIM:616958]
- synonym: "deafness, autosomal recessive 105, formerly" EXACT [OMIM:608653]
  • some Orphanet synonyms appear as-is and then also with 'syndrome' appended. this is not correct, but this is a small issue that does not require changes in the related code PR
synonym: "distal duplication 18q" EXACT [Orphanet:1716]
+ synonym: "Distal duplication 18q syndrome" EXACT [Orphanet:1716]
  • synonym: "ERYTHROCYTE ADA, ELEVATED, HEMOLYTIC ANEMIA DUE TO" EXACT ABBREVIATION [OMIM:301083]
    --> this is not an abbreviation. this very minor issue, I will add this to the OMIM repo

  • Other OMIM ingest issues:

synonym: "intellectual developmental disorder with gastrointestinal difficulties and high pain threshold" EXACT [OMIM:617450]
- synonym: "intellectual developmental disorder with gastrointestinal difficulties and high pain threshold, formerly" EXACT [OMIM:617450]

--> The remaining synonym should have been RELATED, not EXACT

synonym: "neurodevelopmental disorder with brain, liver, and lung abnormalities" EXACT [OMIM:618007]
- synonym: "neurodevelopmental disorder with brain, liver, and lung abnormalities, formerly" EXACT [OMIM:613658]

--> The remaining synonym should have been RELATED, not EXACT

synonym: "cap myopathy 1" EXACT []
- synonym: "cap myopathy 1, formerly" EXACT [OMIM:255310]

--> The remaining synonym should have been RELATED, not EXACT

  • synonym not parsed correctly. The OMIM content is not being parsed correctly for https://omim.org/entry/609029 so the value 22) is incorrectly represented as a synonym in omim.owl. I will post this to the OMIM repo
+ synonym: "22) SYNDROME" EXACT ABBREVIATION [OMIM:609029]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants