Split "kmerseek index" into "index-create-sketch", "index-get-kmers", "index-rocksdb" #7
Labels: enhancement (New feature or request), good first issue (Good for newcomers), Python (Only involves writing Python code)
Currently, `kmerseek index` wraps three commands in Sourmash/Branchwater:

- `index-create-sketch`: `sourmash scripts manysketch` (uses as many CPUs as available, low-medium memory)
- `index-get-kmers`: `sourmash sig kmers` (takes a long time, but only 1 CPU and low memory)
- `index-create-rocksdb`: `sourmash scripts index` (uses as many CPUs as available, low-medium memory)

These were combined for convenience on the command line, but for creating Nextflow pipelines, we probably want to add the option to separate them so that they can be put into separate processes. For example, `sourmash sig kmers` only uses one CPU and takes a really long time, like 24hrs+ on larger datasets, but it shouldn't block proceeding with `kmerseek search`.

To clarify,
`kmerseek index` would still wrap the call to all three of `index-create-sketch`, `index-get-kmers`, `index-rocksdb`.

Notice that each command is named
`index-VERB-something` -- I like this style for clarity. Also taking suggestions on naming, the hardest problem!
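
A rough sketch of how the split could look in Python, in case it helps the discussion. This is not kmerseek's actual code: the `STEPS` table, `run_step`, `index`, and the `runner` parameter are all hypothetical names, and only the three base `sourmash` commands come from this issue. The exact arguments kmerseek passes to each command are not shown here, so they are left to the caller via `extra_args`. The step name `index-create-rocksdb` follows the list above; the title uses `index-rocksdb`.

```python
import subprocess
from typing import Callable, Mapping, Optional, Sequence

# Base commands as listed in this issue. Arguments are deliberately omitted:
# the flags kmerseek passes are an open detail, so callers supply them.
STEPS = {
    "index-create-sketch": ["sourmash", "scripts", "manysketch"],
    "index-get-kmers": ["sourmash", "sig", "kmers"],
    "index-create-rocksdb": ["sourmash", "scripts", "index"],
}

Runner = Callable[[Sequence[str]], object]


def _default_runner(cmd: Sequence[str]) -> None:
    # Run the command and raise if it exits with a non-zero status.
    subprocess.run(list(cmd), check=True)


def run_step(
    name: str,
    extra_args: Sequence[str] = (),
    runner: Runner = _default_runner,
) -> list:
    """Run one step on its own, e.g. from a dedicated Nextflow process."""
    cmd = [*STEPS[name], *extra_args]
    runner(cmd)
    return cmd


def index(
    args_by_step: Optional[Mapping[str, Sequence[str]]] = None,
    runner: Runner = _default_runner,
) -> list:
    """Convenience wrapper: run all three steps in order, like `kmerseek index`."""
    args_by_step = args_by_step or {}
    return [run_step(name, args_by_step.get(name, ()), runner) for name in STEPS]
```

With this shape, each subcommand would call `run_step` for its own step, so a pipeline can give the slow single-CPU `index-get-kmers` step its own process, while `index` keeps the current one-shot behavior. Passing a custom `runner` (for example `calls.append`) allows a dry run without invoking sourmash.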