Fix spot prediction in projection command #209

Merged
merged 1 commit into from
Apr 9, 2024

JeanMainguy
Member

This PR fixes a rare problem with the spot prediction of the projection command.

Context and problem

In projection, we reproduce the spot graph and check that the original RGPs are in their spots. Then, from this spot graph, we add the new RGPs and find their spots.
In the pangenome of GTDB species HIMB11 sp003486095 (built with 20 genomes), I found an RGP that initially had no spot but is nevertheless included in the spot graph recreated by projection.

Explanation

It turns out that its right border contained a gene from a family that was initially considered multigenic, but this family became non-multigenic during projection because the projected genome was added to the family statistics.
Normally, the projected genome shouldn't be taken into account when rebuilding the spot graph.
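
As a toy illustration of how one extra genome can flip a family's status, here is a minimal sketch. It assumes a family counts as multigenic when the fraction of its genomes carrying more than one copy reaches a threshold; the function name `is_multigenic`, the `dup_margin` value of 0.05, and the copy counts are all assumptions for the example, not data from the HIMB11 pangenome.

```python
def is_multigenic(copies_per_genome, dup_margin=0.05):
    """Hypothetical rule: a family is multigenic when the fraction of its
    genomes carrying more than one copy reaches dup_margin."""
    duplicated = sum(1 for n in copies_per_genome.values() if n > 1)
    return duplicated / len(copies_per_genome) >= dup_margin


# Toy family present in 20 genomes, one of which carries two copies.
family = {f"genome_{i}": 1 for i in range(20)}
family["genome_0"] = 2
before = is_multigenic(family)  # ratio is 1/20

# Adding the projected genome (single copy) dilutes the ratio to 1/21.
family["projected"] = 1
after = is_multigenic(family)
```

With these made-up numbers the family is multigenic before the projected genome is added and non-multigenic afterwards, which is the kind of flip described above.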

Implemented solution

To fix this problem, multigenic families are now computed before the input genes are mapped to the pangenome families, so the genes of the projected genome can no longer alter the family statistics used in the multigenic computation.
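
The ordering can be sketched as follows. This is not the code of the PR: the data structure (family name mapped to per-genome copy counts), the helper names `multigenic_families` and `project`, and the `dup_margin` threshold are hypothetical and only illustrate computing the multigenic set first and freezing it before mapping the input genes.

```python
def multigenic_families(families, dup_margin=0.05):
    """Return the names of families duplicated in at least dup_margin of
    their genomes (assumed rule, for illustration only).

    `families` maps a family name to {genome name: copy count}.
    """
    result = set()
    for name, copies in families.items():
        duplicated = sum(1 for n in copies.values() if n > 1)
        if copies and duplicated / len(copies) >= dup_margin:
            result.add(name)
    return result


def project(families, genome_name, gene_to_family):
    # Step 1: compute multigenic families from the original genomes only.
    multigenics = frozenset(multigenic_families(families))
    # Step 2: map the input genes; this changes the family statistics.
    for family_name in gene_to_family.values():
        copies = families.setdefault(family_name, {})
        copies[genome_name] = copies.get(genome_name, 0) + 1
    # Step 3: the spot graph would be rebuilt using the frozen set.
    return multigenics


families = {"famA": {f"genome_{i}": 1 for i in range(20)}}
families["famA"]["genome_0"] = 2
frozen = project(families, "projected", {"gene_1": "famA"})
recomputed = multigenic_families(families)  # what the old order would see
```

In this toy case `frozen` still contains `famA`, whereas recomputing after the mapping no longer does, so freezing the set up front keeps the rebuilt spot graph consistent with the original one.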

@axbazin axbazin self-requested a review April 9, 2024 07:07
@axbazin axbazin merged commit 26c00cc into dev Apr 9, 2024
6 checks passed

2 participants