Add workflow for evaluating predictions #12
Draft
This workflow takes in three parts:
It then estimates several metrics for the predictions, such as accuracy, precision, recall, and F1. These are only estimates of the true metrics, because the positive and negative manually curated mappings are likely incomplete and therefore biased in which things were curated (e.g., I always curate the easiest first, which skews my manual curations towards positive calls).
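As a rough illustration of how such estimates could be computed, here is a minimal sketch. It assumes predictions and curations are plain sets of (subject, object) pairs, and that a prediction counts as a true positive when it appears in the positive curations and as a false positive when it appears in the negative curations. The function name `estimate_metrics` and the example identifiers are hypothetical and are not taken from this PR's code.

```python
def estimate_metrics(predicted: set, positive: set, negative: set) -> dict:
    """Estimate metrics for predictions against curated mappings.

    Assumption (not from the PR): uncurated predictions are ignored,
    which is exactly why the results are estimates.
    """
    tp = len(predicted & positive)  # predicted and curated as correct
    fp = len(predicted & negative)  # predicted but curated as incorrect
    fn = len(positive - predicted)  # curated as correct but not predicted
    tn = len(negative - predicted)  # curated as incorrect and not predicted
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Made-up placeholder mappings, for illustration only
predicted = {("a:1", "x:1"), ("b:2", "y:2"), ("c:3", "z:3")}
positive = {("a:1", "x:1"), ("d:4", "w:4")}
negative = {("b:2", "y:2")}
metrics = estimate_metrics(predicted, positive, negative)
```

In this toy example, the prediction `("c:3", "z:3")` has not been curated either way, so it does not contribute to any of the four counts.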
Why is this useful?
Organizers of mapping tool competitions don't have to keep writing their own infrastructure for running them. You do the following:
Demonstration
This also comes with a demonstration that compares a combination of first-party ontology curations and third-party Biomappings curations against lexical mapping predictions made by Gilda. It reports the following when applied to a small number of OBO Foundry ontologies.
Completion refers to the percentage of predicted mappings that appear in the curated sets (both positive and negative). A higher completion reduces the impact of curation bias. For example, a completion of 100% means every prediction has been curated, so the metrics are no longer skewed by which predictions were chosen for curation.
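The definition of completion above can be sketched as follows. This assumes the same set-of-pairs representation for predictions and curations; the function name `completion` and the example identifiers are hypothetical.

```python
def completion(predicted: set, positive: set, negative: set) -> float:
    """Return the fraction of predictions found in either curated set."""
    if not predicted:
        return 0.0  # assumption: no predictions means nothing is covered
    curated = positive | negative
    return len(predicted & curated) / len(predicted)


# Made-up placeholder mappings, for illustration only
example_predicted = {("a:1", "x:1"), ("b:2", "y:2"), ("c:3", "z:3")}
example_positive = {("a:1", "x:1")}
example_negative = {("b:2", "y:2")}
example_completion = completion(example_predicted, example_positive, example_negative)
```

Two of the three placeholder predictions have been curated here, so the completion is two thirds.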
Note that lexical matching has pretty high precision, i.e., most of the predictions it makes are right, but it is more prone to false negatives, so accuracy can vary. Some observations:
Caution
Mapping shouldn't be a competition. Make your predictions, curate them, contribute them to Biomappings or directly upstream, then everyone benefits and we don't have to keep playing this game.