Fix for donor assignment negative infinity likelihoods. #477

Merged
merged 1 commit into from
Oct 23, 2024

Conversation

jamesnemesh
Collaborator

Fix for an edge-case bug in which multiple reads are summarized as a single UMI with a single phred score. Because the underlying error rate is quantized to a phred score, a very high error rate of 0.9 or higher is quantized to a phred score of 0. When the phred score is unpacked back to a probability, this is equivalent to an error rate of 1, which causes errors in the downstream likelihoods.

To fix this, during summarization, a phred score less than 1 is set to 1. This avoids problems in the likelihood calculations, where a probability of 0 becomes negative infinity when logged. (Probabilities should also always include some uncertainty, which is lost if the phred score is 0.)
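The failure mode and the clamp can be illustrated with a minimal sketch. This is not the code from `SummarizeUMIBaseQualities.java`: the class name, method names, and the `MIN_PHRED` constant are hypothetical, and it assumes the standard phred conversion Q = -10 * log10(p) with rounding to the nearest integer.

```java
// Illustrative sketch only; all names here are hypothetical, not the project's API.
final class PhredClampSketch {

    // Assumed floor for summarized phred scores, per the PR description.
    public static final int MIN_PHRED = 1;

    // Quantize an error rate to an integer phred score: Q = round(-10 * log10(p)).
    public static int errorRateToPhred(double errorRate) {
        return (int) Math.round(-10.0 * Math.log10(errorRate));
    }

    // Unpack a phred score back to an error rate: p = 10^(-Q / 10).
    public static double phredToErrorRate(int phred) {
        return Math.pow(10.0, -phred / 10.0);
    }

    // Raise any phred score below the floor up to the floor.
    public static int clampPhred(int phred) {
        return Math.max(phred, MIN_PHRED);
    }

    // Log10 probability that the base call is correct, given its phred score.
    public static double logProbCorrect(int phred) {
        return Math.log10(1.0 - phredToErrorRate(phred));
    }

    public static void main(String[] args) {
        double errorRate = 0.95;                 // very high summarized error rate
        int raw = errorRateToPhred(errorRate);   // quantizes to phred 0
        int clamped = clampPhred(raw);           // raised to MIN_PHRED

        System.out.println("raw phred:     " + raw);
        System.out.println("raw log10:     " + logProbCorrect(raw));
        System.out.println("clamped phred: " + clamped);
        System.out.println("clamped log10: " + logProbCorrect(clamped));
    }
}
```

With an unclamped phred score of 0, the unpacked error rate is exactly 1, so the probability of a correct call is 0 and its logarithm is negative infinity. With the floor applied, the unpacked error rate is about 0.79 and the logarithm stays finite.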

… UMI with a single phred score. During this summarization, if the phred score is less than 1, it's set to 1. Because the underlying error rate is quantized to a phred score, a very high error rate of 0.9 or higher is rounded to a phred score of 0. To avoid errors, phred scores of 0 are set to 1. This avoids problems in the likelihood calculations, where a probability of 0 causes issues when logged. (Probabilities should also always include some uncertainty, which is lost if the phred score is 0.)
@jamesnemesh jamesnemesh merged commit aa5b923 into master Oct 23, 2024
4 checks passed
@jamesnemesh jamesnemesh deleted the jn_donor_assignment_bug branch October 23, 2024 03:08

codecov bot commented Oct 23, 2024

Codecov Report

Attention: Patch coverage is 60.00000% with 2 lines in your changes missing coverage. Please review.

Project coverage is 80.65%. Comparing base (825b982) to head (655fa4c).
Report is 2 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...digitalallelecounts/SummarizeUMIBaseQualities.java | 50.00% | 1 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@             Coverage Diff              @@
##             master     #477      +/-   ##
============================================
- Coverage     80.65%   80.65%   -0.01%     
  Complexity     4690     4690              
============================================
  Files           358      358              
  Lines         20030    20032       +2     
  Branches       3118     3119       +1     
============================================
  Hits          16156    16156              
- Misses         2608     2609       +1     
- Partials       1266     1267       +1     

