-
Notifications
You must be signed in to change notification settings - Fork 72
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kmc_tools simple intersect & union, not considering reverse complement? #227
Comments
Hi, I am not fully understanding.
|
Hi Marek, Thank you for your prompt response; much appreciated. Great to learn that by default, KMC already considers reverse complement (RC). I'm currently working on improvise something and would like to share my approach with you, seeking your input on its viability. I'm utilizing resequencing data from 300 samples, encompassing various species/subspecies within Malus. My goal is to explore the phylogenetic relationships among these samples based on kmer similarities, first to generate a kmer similarity matrix for all samples. The steps involve kmer counting for each sample and kmer comparison for every pair of samples ( kmc -> kmc_dump -> kmer_comparison.py ). To optimize efficiency in terms of time and space, I'm contemplating using only the R1 read for each sample. So learned from what you said, if I used the -b mode, RC kmers from different samples might be treated as distinct kmers, thus introducing bias into the results. To clarify, for each comparison, I'm calculating shared (unique) kmers relative to total (unique) kmers. I'm pleased to learn that the software inherently considers both original and RC simultaneously by default. (oh, maybe still problematic... since the lower of these two was chosen , it may varied across samples.. ), I seems to me that kmc_tools simple intersection function does not consider RC? Moving on to another aspect of my project, I have a question about the appropriate values for -ci and -cx when comparing shared kmers between samples. I think -ci should be set at 2 to keep all shared ones and discard error-prone kmers occurring only once. As for -cx, I've used 1000, but I'd appreciate your thoughts on this choice. The kmer size is 21. Thank you once again for your assistance. |
Hello, in my project I would like to just calculate the kmer counts for one side of the reads, read_1.fq.gz, for example.
And then afterwards, I would like to compare kmer.dump across samples, I would like to calculate shared kmers relative to total unique kmers, to this end, considering reverse complement is important, but I found kmc_tools intersect does not consider reverse complemented kmers between samples? Or, I have to do kmer calculation for both reads for each sample, then compare, that is the only way?
Thanks very much.
The text was updated successfully, but these errors were encountered: