kmc_tools filter does not accept large FASTA input #221

Open
mscharmann opened this issue Sep 8, 2023 · 1 comment

@mscharmann

mscharmann commented Sep 8, 2023

Hello,
first of all, thank you for giving us KMC and kmc_tools, which I use frequently. I am now trying to retrieve the contigs of a genome assembly that contain k-mers from a database, using kmc_tools filter (ver. 3.2.1, 2022-01-04). The input to kmc_tools filter is therefore in FASTA format. The file contains many FASTA records (hundreds to thousands), but each sequence is on a single line, not "wrapped" across multiple lines. Some sequences are more than 10 or even 100 megabases long, and the entire FASTA file is larger than 1 Gb. Neither the input file parameter -fa nor the undocumented -fm behaves as the help message suggests; I always get an

"Error: Wrong input file!"

Edit: this seems to be specific to very long sequences, in both FASTA and FASTQ format; the command succeeds when the sequences are only tens of kb long. Converting my genome contigs into (fake) FASTQ format does not help either.
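Since the command reportedly succeeds when sequences are only tens of kb long, one possible workaround (an untested sketch, not a fix in kmc_tools) is to split each contig into overlapping windows before running the filter. Consecutive windows overlap by k - 1 bases, so every k-mer of a contig falls entirely inside at least one window and no database hit is lost. The function names `read_fasta`, `chunk_sequence`, `chunk_records` and the `<contig>__<offset>` naming scheme are hypothetical; `k` must match the k-mer length of the KMC database.

```python
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from FASTA lines (single-line or wrapped)."""
    name = None
    parts = []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(parts)
            # Keep only the first whitespace-delimited token as the record name.
            fields = line[1:].split()
            name = fields[0] if fields else ""
            parts = []
        elif line and name is not None:
            parts.append(line)
    if name is not None:
        yield name, "".join(parts)


def chunk_sequence(seq: str, chunk_len: int, k: int) -> Iterator[Tuple[int, str]]:
    """Yield (start, window) pairs of at most chunk_len bases covering seq.

    Consecutive windows overlap by k - 1 bases, so every k-mer of seq lies
    entirely inside at least one window.
    """
    if k < 1 or chunk_len < k:
        raise ValueError("need 1 <= k <= chunk_len")
    step = chunk_len - (k - 1)
    start = 0
    while True:
        yield start, seq[start:start + chunk_len]
        if start + chunk_len >= len(seq):
            break
        start += step


def chunk_records(records: Iterable[Tuple[str, str]],
                  chunk_len: int, k: int) -> Iterator[str]:
    """Yield FASTA text per window, named '<contig>__<offset>' (hypothetical scheme)."""
    for name, seq in records:
        for start, window in chunk_sequence(seq, chunk_len, k):
            yield f">{name}__{start}\n{window}\n"
```

Usage might look like `fout.writelines(chunk_records(read_fasta(fin), chunk_len=10_000, k=25))` with `fin` and `fout` opened on the assembly and the chunked output. Assuming the filter output keeps the record headers, contigs can be recovered by stripping the `__<offset>` suffix from the surviving names. Note that any count thresholds passed to kmc_tools filter would then apply per window rather than per contig.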

Many thanks and best regards,
Mathias

@marekkokot
Contributor

Hi, thank you for using KMC and for reporting this issue. I guess something is wrong with how kmc_tools handles long sequences. I will try to take a look. It would be really helpful if you could share some of the input files that cause this.
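If sharing the real >1 Gb assembly is impractical, a synthetic file with the same shape (single-line records, some of them tens to hundreds of megabases long) might serve as a reproducer. The sketch below is hypothetical and untested against kmc_tools; whether random sequence triggers the same "Wrong input file!" error is an assumption to check. It streams each long record in blocks so the whole sequence is never held as one string, while still writing it on a single line.

```python
import random
from typing import Iterable, Iterator


def synthetic_fasta(lengths: Iterable[int], seed: int = 0,
                    block: int = 1_000_000) -> Iterator[str]:
    """Yield the text of a FASTA file with one random ACGT record per length.

    Each sequence is emitted in pieces of at most `block` bases with no
    newlines in between, so every record stays on a single line.
    """
    rng = random.Random(seed)  # fixed seed makes the file reproducible
    for i, n in enumerate(lengths):
        yield f">synthetic_{i} len={n}\n"
        remaining = n
        while remaining > 0:
            m = min(block, remaining)
            yield "".join(rng.choices("ACGT", k=m))
            remaining -= m
        yield "\n"
```

For example, `out.writelines(synthetic_fasta([50_000, 20_000_000, 120_000_000]))` on a file opened for writing mixes one record in the range that reportedly works with two far longer ones; the lengths are illustrative only.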