-
Notifications
You must be signed in to change notification settings - Fork 54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
NEVER MERGE: Test PR for adding synonyms - qc-duplicate-exact-synonym-no-abbrev
#8584
NEVER MERGE: Test PR for adding synonyms - qc-duplicate-exact-synonym-no-abbrev
#8584
Conversation
- Updated wget URL for files using a mondo-ingest feature branch
- Reinstated the mondo-edit.obo version from the commit before running these tests. Ran synonym pipeline and updated mondo-edit.obo
6c21378
to
f34ad89
Compare
- Again, using a new round of changes: Reinstated the mondo-edit.obo version from the commit before running these tests. Ran synonym pipeline and updated mondo-edit.obo
- Again, using a new round of changes: Reinstated the mondo-edit.obo version from the commit before running these tests. Ran synonym pipeline and updated mondo-edit.obo - Now passes locally. Will also run QC in PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
QC
I ran the qc-duplicate-exact-synonym-no-abbrev.sparql
QC locally and it is now passing.
Now the CI GitHub action is running and we can confirm there when it is done: https://github.com/monarch-initiative/mondo/actions/runs/12802317540
edit: Unexpected to me, CI fully passed! Not just the 1 QC check that this PR was meant to address.
It looks like the 7 failures that were remaining in this PR were also handled:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not understand what this PR exactly is doing, but synonym sync should never remove anything; so I don't know why I am looking at 30,000 removed lines in Mondo right now as a fix to a major known problem. What should be happening IMO is that you simply to not add new synonyms violating this QC check..
|
||
$(TMPDIR)/synonyms-added.robot.tsv: | ||
wget "https://raw.githubusercontent.com/monarch-initiative/mondo-ingest/refs/heads/main/src/ontology/reports/sync-synonym/sync-synonyms.added.robot.tsv" -O $@ | ||
wget "https://raw.githubusercontent.com/monarch-initiative/mondo-ingest/refs/heads/qc-duplicate-exact-synonym-no-abbrev--mini-build/src/ontology/reports/sync-synonym/sync-synonyms.added.robot.tsv" -O $@ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do not merge this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed! This is a 'do not merge' / 'never merge' test PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Explanation of negative diff
I do not understand what this PR exactly is doing, but synonym sync should never remove anything; so I don't know why I am looking at 30,000 removed lines in Mondo right now as a fix to a major known problem. What should be happening IMO is that you simply to not add new synonyms violating this QC check..
@matentzn Ah, this is because This PR is merging into test-PR-adding-syns
, which already added a ton of synonyms which are violating QC.
What this does is not add those synonyms. After the last commit in test-PR-adding-syns
, I reinstated the last mondo-edit.obo
that existed before that branch. This is what is causing the "removals". Then I run the synonym sync which uses my mondo-ingest
fix branch, and it doesn't add these violations, and QC now passes.
FYI for these mondo
PRs, I would not tag you for review as I know you are tied up with other stuff right now, but mondo
auto-tags you.
Edit: Might help to see the specific commit where reinstating the old mondo-edit
causes this diff.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops thanks for letting me know :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In reviewing the mondo-edit.obo
file in this commit one thing that appears is that the Synonym Sync was not properly handling cases of synonyms where EXCLUDE
was present since many of the removals in this commit are DOID synonyms where the existing synonym in mondo-edit.obo
was already tagged with EXCLUDE.
src/ontology/mondo-edit.obo
Outdated
@@ -203046,6 +203193,7 @@ subset: otar {source="MONDO:OTAR"} | |||
subset: rare | |||
synonym: "cth -" EXACT [icd11.foundation:1415819835] | |||
synonym: "CTH - [cystathioninuria]" EXACT [icd11.foundation:1415819835] | |||
synonym: "CTH - [cystathioninuria]" EXACT [icd11.foundation:1415819835] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now, both synonyms exist in the diff BUT the synonym that has the double whitespace should not be in Mondo.
@joeflack4 - when the Synonym Sync is run again as part of the "Week 3" pipeline tasks, will only the synonym with the double whitespace fixed be added? That is, is seeing synonym: "CTH - [cystathioninuria]" EXACT [icd11.foundation:1415819835]
an artifact of how this PR was generated?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked into this. It appears the stripping logic is working correctly.
The source of the problem is that while synonym: "CTH - [cystathioninuria]" EXACT [icd11.foundation:1415819835]
exists in this older state of mondo-edit.obo
that this PR is based off, it does not exist currently:
id: MONDO:0009058
name: cystathioninuria
synonym: "cystathionase deficiency" EXACT [DOID:0090142, OMIM:219500, Orphanet:212]
synonym: "Cystathione gamma-lyase deficiency syndrome" EXACT [DOID:0090142]
synonym: "cystathione gamma-lyase deficiency syndrome" EXACT [DOID:0090142]
synonym: "cystathioninuria" EXACT CLINGEN_LABEL [DOID:0090142, icd11.foundation:1415819835, MONDO:ambiguous, NCIT:C129070, OMIM:219500, Orphanet:212]
synonym: "cystathioninuria (disease)" EXACT [https://orcid.org/0000-0002-6601-2165]
synonym: "gamma-cystathionase deficiency" EXACT [DOID:0090142, icd11.foundation:1415819835, Orphanet:212]
Since it does not appear among those synonyms, it is showing up as an "added" case instead of a "confirmed" case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, the changes in the diff for this commit look correct. This diff shows a number of categories of changes due to various things changing and the diff does not only represent changes only to resolve the Mondo QC check issue for qc-duplicate-exact-synonym-no-abbrev
. The number of categories of changes and number of overall changes has made this PR more time intensive to review.
To document the review, here are the categories of changes I've seen in this commit (f34ad89):
- OMIM synonyms were stripped of the trailing ", formerly" and now represented as RELATED --> this is correct
Example:
- synonym: "arthrogryposis, distal, type 8, formerly" EXACT [OMIM:178110]
synonym: "arthrogryposis, distal, type 8" RELATED [MONDO:Lexical, OMIM:178110]
-
synonyms that are abbreviations are now also represented with the ABBREVIATION tag --> this is correct
Example:
previous: no example
current:+ synonym: "AUTOSOMAL DOMINANT" EXACT ABBREVIATION [OMIM:178110]
This is the main item that needed to be fixed in the code that generated this PR. -
synonyms that are only new to this PR
Example:
synonym: "Stiff man spectrum disorder" EXACT [Orphanet:3198]
--> this is correct based on Orphanet -
synonyms are now not added when the same synonym already exists with the tag EXCLUDE
Example:
- synonym: "DiGeorge sequence" EXACT [DOID:11198] --> this removal is correct
synonym: "DiGeorge sequence" RELATED EXCLUDE [DOID:11198]
- synonyms now have the "double whitespace" issue fixed
synonym: "CTH - [cystathioninuria]" EXACT [icd11.foundation:1415819835]
+ synonym: "CTH - [cystathioninuria]" EXACT [icd11.foundation:1415819835]
- there are a handful of synonyms from OMIM in the diff that still have "..., formerly" EXACT --> these are incorrectly generated in the OMIM Ingest pipeline and I will add a new ticket into the OMIM repo about this. however, this is a small issue that does not require changes in the related code PR
Examples:
synonym: "ANOP1, formerly" EXACT [OMIM:301590]
and
synonym: "maa" RELATED [OMIM:309800]
+ synonym: "MAA, formerly" EXACT ABBREVIATION [OMIM:309800]
and
+ synonym: "microphthalmia, syndromic 4" RELATED [OMIM:309800]
synonym: "microphthalmia, syndromic 4, formerly" EXACT [MONDO:Lexical, OMIM:301590]
and
synonym: "deafness, autosomal recessive 105" NARROW [OMIM:616958]
- synonym: "deafness, autosomal recessive 105, formerly" EXACT [OMIM:608653]
- some Orphanet synonyms appear as-is and then also with 'syndrome' appended. this is not correct, but this is a small issue that does not require changes in the related code PR
synonym: "distal duplication 18q" EXACT [Orphanet:1716]
+ synonym: "Distal duplication 18q syndrome" EXACT [Orphanet:1716]
-
synonym: "ERYTHROCYTE ADA, ELEVATED, HEMOLYTIC ANEMIA DUE TO" EXACT ABBREVIATION [OMIM:301083]
--> this is not an abbreviation. this very minor issue, I will add this to the OMIM repo -
Other OMIM ingest issues:
synonym: "intellectual developmental disorder with gastrointestinal difficulties and high pain threshold" EXACT [OMIM:617450]
- synonym: "intellectual developmental disorder with gastrointestinal difficulties and high pain threshold, formerly" EXACT [OMIM:617450]
--> The remaining synonym should have been RELATED, not EXACT
synonym: "neurodevelopmental disorder with brain, liver, and lung abnormalities" EXACT [OMIM:618007]
- synonym: "neurodevelopmental disorder with brain, liver, and lung abnormalities, formerly" EXACT [OMIM:613658]
--> The remaining synonym should have been RELATED, not EXACT
synonym: "cap myopathy 1" EXACT []
- synonym: "cap myopathy 1, formerly" EXACT [OMIM:255310]
--> The remaining synonym should have been RELATED, not EXACT
- synonym not parsed correctly. The OMIM content is not being parsed correctly for https://omim.org/entry/609029 so the value
22)
is incorrectly represented as a synonym in omim.owl. I will post this to the OMIM repo
+ synonym: "22) SYNDROME" EXACT ABBREVIATION [OMIM:609029]
Testing whether or not bug fix for
qc-duplicate-exact-synonym-no-abbrev
failures in the following PR works:qc-duplicate-exact-synonym-no-abbrev
failures inmondo
mondo-ingest#751Context: