GFA file of only gap records segfaults #30
Comments
Yikes. I assume we'd prefer an error (e.g. "Segment not found for gap <gap_id>")? Just to verify I understand correctly: this is not valid GFA, and we should never get GFA that has the records spread across multiple files like this, right?
Short answer: yes, it's not valid GFA. Long answer: ABySS produces a GFA file of the segment records and edge records. For large genomes this file can be quite large. In a second step, ABySS uses the paired-end and mate-pair reads to estimate the distances between segments and outputs the gap records. Rather than make a copy of the potentially large S+E records, it outputs only the gap records. This is why ABySS can read a GFA graph that is spread across multiple files. It'd be useful to me if GFAKluge could also read these split files. Your call, of course, whether you want to support that or not. It's easy enough to use either
Interesting. How big are these two files? I have been thinking about restructuring the command line tools to not build the GFAKluge object when the graph isn't being modified. When I get around to this I'll add support for breaking the graph into multiple files (with a stern warning, of course).
I guess I should mention: tools that don't modify the graph are:
These tools would support ABySS's split-file format, with a warning. The rest of the tools should support the complete ( (S + E) + (G) ) file, even if it is very large, and should be able to handle it regardless of record order. I didn't intend to enforce an order on GFA files in GFAKluge, but it seems I've done it by accident for gap records (and probably edges as well).
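To illustrate the order-independent handling described above, here is a minimal Python sketch, not GFAKluge's actual code (which is C++). It assumes GFA2 tab-separated records (`S <sid> <slen> <sequence>` and `G <gid> <sid1> <sid2> <dist> <var>`), collects segments and gaps in whatever order they arrive, and only checks gap references once all input has been read. The names `GfaError`, `strip_orientation`, and `load_gfa2` are hypothetical and exist only for this example.

```python
from itertools import chain
from typing import Dict, Iterable, List, Tuple


class GfaError(ValueError):
    """Raised for structurally invalid GFA input (hypothetical helper)."""


def strip_orientation(ref: str) -> str:
    """Drop the trailing '+' or '-' from a GFA2 segment reference."""
    return ref[:-1] if ref.endswith(("+", "-")) else ref


def load_gfa2(lines: Iterable[str]) -> Tuple[Dict[str, int], List[Tuple[str, str, str]]]:
    """Read S and G records in any order, then validate the gap references."""
    segments: Dict[str, int] = {}            # segment id -> declared length
    gaps: List[Tuple[str, str, str]] = []    # (gap id, ref1, ref2)

    for number, raw in enumerate(lines, start=1):
        fields = raw.rstrip("\r\n").split("\t")
        if fields[0] == "S":
            if len(fields) < 4:
                raise GfaError(f"Malformed segment record on line {number}")
            segments[fields[1]] = int(fields[2])
        elif fields[0] == "G":
            if len(fields) < 6:
                raise GfaError(f"Malformed gap record on line {number}")
            gaps.append((fields[1], fields[2], fields[3]))
        # Other record types (H, E, F, ...) are ignored in this sketch.

    # Validation is deferred until every record has been seen, so a gap may
    # appear before the segments it refers to.
    for gid, ref1, ref2 in gaps:
        for ref in (ref1, ref2):
            if strip_orientation(ref) not in segments:
                raise GfaError(f"Segment not found for gap {gid}: {ref}")
    return segments, gaps


# Example: gap records first, segment records second, read as one stream.
gap_lines = ["G\tg1\t1+\t2-\t100\t*"]
segment_lines = ["H\tVN:Z:2.0", "S\t1\t4\tACGT", "S\t2\t4\tTTTT"]
segments, gaps = load_gfa2(chain(gap_lines, segment_lines))
```

Because the check runs after the read loop, the same function covers both the split-file case (chain the files together) and the gap-only case, which raises a `GfaError` rather than dereferencing a missing segment.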
For a human genome: Thanks again, Eric!
A GFA file ought to include both segments and gap records. It'd be preferable if gfakluge didn't segfault when encountering such a file.