kmc_tools filter does not accept large FASTA input #221

mscharmann · 2023-09-08T08:26:31Z

Hello,
first of all, thank you for giving us KMC and kmc_tools, which I use frequently. I am now trying to retrieve the contigs of a genome assembly that contain k-mers from a database, using kmc_tools filter (ver. 3.2.1, 2022-01-04). The input to kmc_tools filter is therefore in FASTA format. The file contains multiple FASTA records (hundreds/thousands), and each sequence is on a single line, not "wrapped" across multiple lines. Some sequences are longer than 10 or even 100 megabases, and the entire FASTA file is >1 Gb in size. Neither the input file parameter -fa nor the undocumented -fm behaves as the help message suggests. I always get this error:

"Error: Wrong input file!"

Edit: this seems to be specific to very long sequences, in both FASTA and FASTQ format; the command succeeds when the sequences are only tens of kb long. Converting my genome contigs into mock FASTQ format does not help.

Many thanks and best regards,
Mathias

marekkokot · 2023-09-08T09:26:09Z

Hi, thank you for using KMC and for reporting this issue. I guess something is wrong with the handling of long sequences in kmc_tools. I will try to take a look. It would be really helpful if you could share some of the input files that cause this.
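
If the real assembly is too large to share, a small synthetic file with the same layout (one sequence per line, some records very long) might serve as a test case. The following is only a sketch in Python: the function names, record names, and lengths are made up for illustration, and it is not confirmed that a random sequence of this kind triggers the same "Wrong input file" error.

```python
import random


def write_single_line_fasta(path, lengths, seed=0):
    """Write one FASTA record per entry in `lengths`, each sequence on one line."""
    rng = random.Random(seed)
    with open(path, "w") as out:
        for i, n in enumerate(lengths, start=1):
            # Hypothetical record names; any header text would do.
            out.write(f">contig_{i} length={n}\n")
            # Emit the sequence in chunks so a 100 Mb record is never held
            # in memory at once; the newline is written only at the end.
            remaining = n
            while remaining > 0:
                chunk = min(remaining, 1_000_000)
                out.write("".join(rng.choices("ACGT", k=chunk)))
                remaining -= chunk
            out.write("\n")


def record_lengths(path):
    """Return the sequence length of every record in a FASTA file."""
    lengths = []
    with open(path) as fh:
        for line in fh:
            if line.startswith(">"):
                lengths.append(0)
            elif lengths:
                lengths[-1] += len(line.strip())
    return lengths


if __name__ == "__main__":
    # Assumed sizes: one short record (tens of kb) and one long one (20 Mb).
    write_single_line_fasta("synthetic.fa", [50_000, 20_000_000])
    print(record_lengths("synthetic.fa"))
```

Running `record_lengths` on the real input would also show which records exceed a given length, which may help narrow down the size at which the error starts to appear.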
