
munmap_chunk(): invalid pointer on ubam file - KMC crashes on BAM with long reads #230

Open
tbenavi1 opened this issue Feb 29, 2024 · 4 comments

@tbenavi1

Hello,

I am running the following KMC command:

kmc -k31 -t42 -m50 -sm -ci1 -cs100000000 -fbam data.u.bam data tmp

and I receive the following error at the end of stage 1:

Stage 1: 100%
munmap_chunk(): invalid pointer

Do you know how to fix this? Is there an email address where I can send an example file that triggers the issue? I don't want to share the data publicly. Thank you.

@marekkokot marekkokot added the bug label Mar 1, 2024
@marekkokot marekkokot changed the title munmap_chunk(): invalid pointer on ubam file munmap_chunk(): invalid pointer on ubam file - KMC crashes on BAM with long reads Mar 1, 2024
@marekkokot
Contributor

This bug is a little nasty because it stems from an assumption we made when developing the first versions of KMC: that reads are short.
Later, we added support for long reads, but we have not yet done this for the BAM format.
Because this bug is quite complex, I don't know how soon we can fix it.
So, for now, the best option is to convert the BAMs to FASTQ with samtools and use the FASTQ files as KMC input.

@marekkokot
Contributor

For future me:

it seems there is a buffer overflow in skipSingleBGZFBlock.
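For context, a BAM file is a series of BGZF blocks. Each block is a gzip member whose header carries an extra subfield with SI1='B', SI2='C' and a 2-byte BSIZE equal to the total block size minus 1 (SAM/BAM specification, BGZF section). The sketch below is not KMC's skipSingleBGZFBlock, and we don't know where the actual overflow is; it only shows, under that assumption, how a block skipper can validate every length it reads against the bytes actually in the buffer. The function name `bgzf_block_size` is hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>

// Reads a little-endian uint16 at p (caller guarantees 2 readable bytes).
static std::uint16_t read_le16(const std::uint8_t* p) {
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}

// Hypothetical helper: returns the total size of the BGZF block starting at
// buf[0], or std::nullopt if the header is malformed or the block is not fully
// contained in the first `len` bytes. Every read is bounds-checked against len.
std::optional<std::size_t> bgzf_block_size(const std::uint8_t* buf, std::size_t len) {
    constexpr std::size_t kFixedHeader = 12;  // ID1 ID2 CM FLG MTIME(4) XFL OS XLEN(2)
    if (len < kFixedHeader) return std::nullopt;
    if (buf[0] != 31 || buf[1] != 139 || buf[2] != 8 || (buf[3] & 4) == 0)
        return std::nullopt;  // not a gzip member with the FEXTRA flag set

    const std::size_t xlen = read_le16(buf + 10);
    const std::size_t extra_end = kFixedHeader + xlen;
    if (len < extra_end) return std::nullopt;  // extra field runs past the buffer

    // Walk the extra subfields (SI1, SI2, SLEN(2), data[SLEN]) looking for "BC".
    std::size_t pos = kFixedHeader;
    while (pos + 4 <= extra_end) {
        const std::size_t slen = read_le16(buf + pos + 2);
        if (pos + 4 + slen > extra_end) return std::nullopt;  // subfield overruns XLEN
        if (buf[pos] == 'B' && buf[pos + 1] == 'C' && slen == 2) {
            const std::size_t block_size = std::size_t{read_le16(buf + pos + 4)} + 1;  // BSIZE + 1
            if (block_size < extra_end + 8) return std::nullopt;  // no room for CRC32 + ISIZE
            if (block_size > len) return std::nullopt;  // block continues past the buffer
            return block_size;
        }
        pos += 4 + slen;
    }
    return std::nullopt;  // no BC subfield found
}
```

A caller would advance by the returned size; a `std::nullopt` caused by truncation means "refill the buffer and retry" rather than "read past the end".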

@tbenavi1
Author

Hello, I have a follow-up issue/question. We are trying to save space on our cluster, so if possible I would prefer not to save the output when converting BAMs to FASTA/FASTQ. So I tried to run KMC with bash process substitution. For example,

kmc -k31 -t42 -m50 -sm -ci1 -cs100000000 -fm <(samtools fasta -@ 42 file.ubam) db tmp

However, I get the error:

Error: Error: /dev/fd/63 is not a file

which I believe comes from

ostr << "Error: " << f_name << " is not a file";
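With process substitution, bash hands the program a path such as /dev/fd/63 that refers to a pipe (FIFO), so any "is this a regular file?" test rejects it. We don't know exactly which check KMC performs before printing this message; the sketch below assumes a `std::filesystem`-based check (C++17) to show the distinction. `InputKind`, `classify_status` and `classify_input` are hypothetical names.

```cpp
#include <filesystem>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical classification of an input path.
enum class InputKind { RegularFile, Pipe, Unavailable, Other };

// Pure classification of an already-obtained file status.
InputKind classify_status(const fs::file_status& st) {
    if (!fs::exists(st)) return InputKind::Unavailable;
    if (fs::is_regular_file(st)) return InputKind::RegularFile;
    if (fs::is_fifo(st)) return InputKind::Pipe;  // e.g. /dev/fd/63 from <(...)
    return InputKind::Other;                      // directories, devices, sockets, ...
}

// fs::status follows symlinks like stat(), so /dev/fd/N resolves to the pipe itself.
InputKind classify_input(const std::string& path) {
    std::error_code ec;
    const fs::file_status st = fs::status(path, ec);
    if (ec) return InputKind::Unavailable;  // simplification: any stat() failure
    return classify_status(st);
}
```

Accepting `InputKind::Pipe` in such a check is only safe for a tool that reads its input exactly once, because a pipe cannot be rewound or reopened.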

Is there any way to update KMC to allow it to take process substitution as input? Thanks for any information.

@marekkokot
Contributor

marekkokot commented Apr 24, 2024

This is a little more complex.
KMC reads each input file twice. The first pass reads only a very small portion of it to tune load balancing.
After that, the file is closed and reopened for the actual processing.
This makes KMC unsuitable for streaming/pipe processing :(
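A generic way to keep a "sample first, then process" design while still reading the input only once is to buffer the sampled prefix and replay it before the rest of the stream. This is not a description of KMC or of the experimental branch; it is a minimal C++17 sketch of that pattern, with a hypothetical function name (`sample_then_stream`), and it costs holding the sample in memory.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>  // std::istringstream, handy for testing with in-memory data
#include <string>
#include <string_view>

// Hypothetical single-pass pattern: read the first `sample_bytes` of `in`,
// pass them to `inspect` (e.g. to tune load balancing), then feed the sample
// followed by the rest of the stream to `consume`. Nothing is reread, so `in`
// may be backed by a pipe.
template <typename Inspect, typename Consume>
void sample_then_stream(std::istream& in, std::size_t sample_bytes,
                        Inspect inspect, Consume consume,
                        std::size_t chunk_bytes = 1 << 16) {
    if (chunk_bytes == 0) chunk_bytes = 1;  // avoid zero-length reads looping forever

    std::string sample(sample_bytes, '\0');
    in.read(sample.data(), static_cast<std::streamsize>(sample.size()));
    sample.resize(static_cast<std::size_t>(in.gcount()));  // input may be shorter

    inspect(std::string_view(sample));
    if (!sample.empty()) consume(std::string_view(sample));  // replay the buffered prefix

    std::string buf(chunk_bytes, '\0');
    while (in) {  // stops once a read hits EOF or an error
        in.read(buf.data(), static_cast<std::streamsize>(buf.size()));
        const std::streamsize got = in.gcount();
        if (got > 0) consume(std::string_view(buf.data(), static_cast<std::size_t>(got)));
    }
}
```

The design choice is to trade a small, bounded buffer for the ability to accept non-seekable inputs such as `<(samtools fasta ...)`.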
We know this is quite a limitation, and we will do our best to make KMC work in pipe mode in the future.

We have this unstable branch here: https://github.com/refresh-bio/KMC/tree/experimental/stbm
This branch has a -sss parameter; if you set it to -sssmin_hash, it should work. For example, this seems to work:

bin/kmc -sssmin_hash -k27 <(cat in.fq) o .

Keep in mind that this branch is not production-ready. We use it for testing and experiments, and it may disappear at some point.
I'm also not sure about its performance etc.
I think it should just work, so you may try it.
I am not sure whether the strict memory mode (-sm, which you use) works correctly on this branch, but 50 GB (-m50) should be fine without this parameter.
If you spot any issues, let me know, although I am not sure when we will fix them.
Let me know if you try it :)

Edit: Also, on this branch, kmc_tools may not work if you use -sssmin_hash. I don't remember the details now.


2 participants