You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I want the unitigs induced by a subset of the kmers from a read set. Is there any way to create a file of kmers that you can read into bcalm2, and then compact into unitigs?
In fact I want to pick out kmers that I think are single copy in a diploid, based on depth. So a simple way to do that would be for you to allow me to give a maximum as well as a minimum copy number for each kmer. I would then set those by looking at a histogram. But ultimately I would like the freedom to correct the kmer counts by GC content (of their parent read pairs), so it would be more general to allow me to pass you a set of kmers.
I guess I could make a fasta file out of them, with one kmer per sequence...
Thanks, Richard
The text was updated successfully, but these errors were encountered:
For now, you can specify a minimum/maximum kmer abundance threshold using the options -abundance-min A -abundance-max B, noting that it will keep only kmers of abundance A<=x<=B.
Otherwise, the input formats of Bcalm are fasta/fastq reads, or a custom HDF5 file that contains counted k-mers. That format isn't that easy to create from scratch, but not impossible for us to maybe write a custom program for that. In that format, Bcalm expects that kmers are partitioned according to their minimizer.
Giving as input a fasta file with one kmer per sequence would be an acceptable solution, bearing in mind that abundance would be lost (unless the kmer is repeated as many times as its abundance, which isn't very elegant).
What's the volume of kmers that you wish to give to Bcalm: 10's of millions, billions?
Also, in what format do you have the counted k-mers right now?
Hello Rayan et al,
I want the unitigs induced by a subset of the kmers from a read set. Is there any way to create a file of kmers that you can read into bcalm2, and then compact into unitigs?
In fact I want to pick out kmers that I think are single copy in a diploid, based on depth. So a simple way to do that would be for you to allow me to give a maximum as well as a minimum copy number for each kmer. I would then set those by looking at a histogram. But ultimately I would like the freedom to correct the kmer counts by GC content (of their parent read pairs), so it would be more general to allow me to pass you a set of kmers.
I guess I could make a fasta file out of them, with one kmer per sequence...
Thanks, Richard
The text was updated successfully, but these errors were encountered: