Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug fix: Synonym Sync: Remove recapitalization of acronyms #724

Merged

Conversation

joeflack4
Copy link
Contributor

@joeflack4 joeflack4 commented Dec 17, 2024

Overview

Fixed bug where this operation was being done for -updated and -confirmed, but it should only be done for -added.
Removed this recapitalization.

Pre-merge checklist

Documentation

Was the documentation added/updated under docs/?

  • Yes
  • No, updates to the docs were not necessary after careful consideration

QC

Was the full pipeline run before submitting this PR using sh run.sh make build-mondo-ingest on this branch (after
docker pull obolibrary/odkfull:dev), and no errors occurred?

  • Yes
  • No, there are no functional (code-related) changes to the pipeline in the PR, so no re-run is necessary

New Packages

Were any new Python packages added?

Were any other non-Python packages added?

PR Review and Conversations Resolved

Has the PR been sufficiently reviewed by at least 1 team member of the Mondo Technical team and all threads resolved?

  • Yes

Fixed bug where this operation was being done for -updated and -confirmed, but it should only be done for -added.
@joeflack4 joeflack4 force-pushed the bugfix-syn-sync-acronym-recap branch from 9a5a058 to b8c0679 Compare December 17, 2024 21:33
@joeflack4 joeflack4 changed the title Bug fix: Synonym Sync: Recapitalization of acronyms Bug fix: Synonym Sync: Accidental recapitalization of acronyms Dec 17, 2024
@joeflack4 joeflack4 self-assigned this Dec 17, 2024
@joeflack4
Copy link
Contributor Author

I need to come back to this. Might just want to get rid of this code here and introduce a warning elsewhere if Mondo acronyms are lowercase.

@joeflack4 joeflack4 changed the title Bug fix: Synonym Sync: Accidental recapitalization of acronyms Bug fix: Synonym Sync: Remove recapitalization of acronyms Dec 17, 2024
@joeflack4 joeflack4 added the bug Something isn't working label Dec 17, 2024
@twhetzel
Copy link
Contributor

Can you create a ticket and explain what is happening and what the current consequences are and which of these consequences this PR fixes?

Your comment:

I need to come back to this. Might just want to get rid of this code here and introduce a warning elsewhere if Mondo acronyms are lowercase.

makes it sound like there is more to think through here.

- Bug fix: I don't think we should have ever had this in the first place. I don't see that it actually fixes anything; only should cause bugs.
@joeflack4 joeflack4 mentioned this pull request Dec 17, 2024
9 tasks
Copy link
Contributor

@twhetzel twhetzel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comments inline.

@joeflack4 joeflack4 changed the base branch from bugfix-syn-sync-dupe-rows-684 to develop December 19, 2024 22:48
@joeflack4 joeflack4 changed the base branch from develop to bugfix-syn-sync-dupe-rows-684 December 19, 2024 22:52
@twhetzel twhetzel self-requested a review January 9, 2025 10:07
@joeflack4 joeflack4 merged commit b0edf7a into bugfix-syn-sync-dupe-rows-684 Jan 9, 2025
@joeflack4 joeflack4 deleted the bugfix-syn-sync-acronym-recap branch January 9, 2025 22:06
joeflack4 added a commit that referenced this pull request Jan 11, 2025
…tion of acronyms (#724)]

* Bug: Synonym Sync: Duplicate rows #684
- Bug fix: Fixed case in which sources can have multiple synonyms which only vary by capitalization, causing duplicate rows to appear in the results. We don't consider source capitalization as authoritative, so these variations are only useful for analysis and should not show up as multiple rows to be processed. Thus, we now aggregate capitalization variations into the single column synonym_case_diff_source.
- Bug fix: Address issues related to mondo capitalization variation.
- Update: Ensure the following columns are now in the output and are together: synonym_case_mondo, synonym_case_diff_mondo, synonym_case_mondo_is_many, synonym_case_source, synonym_case_diff_source, synonym_case_source_is_many. Note that previously, we had removed synonym_case_mondo & synonym_case_source, opting instead for the 'diff' columns, because it was previously only valuable to show the original capitalizations if there was a difference between the two. But now that we can have multiple variations in capitalization on the same syonym, it is useful to see the original case by itself, as wel as all the variations.
- Update: For synonym_case_source_is_many, ensure that all variations show up in synonym_case_source and synonym_case_diff_source columns. Note that when there are multiple capitalization variations at the source, we only need 1 row.
- Update: For all synonym_case_mondo_is_many, ensure that all variations show up in the synonym_case_diff_mondo column. But leave synonym_case_mondo as it is. We need to preserve the original case for that row, since unlike the source, we will retain multiple rows in the case that Mondo has multiple capitalization variations for a single synonym.

* Bug fix: Synonym Sync: Remove recapitalization of acronyms (#724)
- Fixed bug where this operation was being done for -updated and -confirmed, but it should only be done for -added.
- Bug fix: I don't think we should have ever had this in the first place. I don't see that it actually fixes anything; only should cause bugs.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants