Split "kmerseek index" into "index-create-sketch", "index-get-kmers", "index-rocksdb" #7

Open
olgabot opened this issue Feb 14, 2025 · 1 comment
Labels
enhancement New feature or request good first issue Good for newcomers Python Only involves writing Python code

Comments

@olgabot
Contributor

olgabot commented Feb 14, 2025

Currently kmerseek index wraps three commands in Sourmash/Branchwater:

  • index-create-sketch: sourmash scripts manysketch (uses as many CPUs as available, low-medium memory)
  • index-get-kmers: sourmash sig kmers (takes a long time, but only 1 CPU and low memory)
  • index-create-rocksdb: sourmash scripts index (uses as many CPUs as available, low-medium memory)

These were combined for convenience on the command line, but for Nextflow pipelines we probably want the option to run them separately so that each can go into its own process. For example, sourmash sig kmers uses only one CPU and takes a really long time (24+ hours on larger datasets), but it shouldn't block kmerseek search from proceeding.
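
As a rough illustration of why separate steps help a pipeline, here is a minimal Python sketch of the step dependencies. The dependency table is an assumption inferred from this comment (k-mer extraction and RocksDB indexing both need the sketch, and search needs only the RocksDB index), not something confirmed by the kmerseek code. The name `ready_steps` is hypothetical.

```python
# Assumed dependencies between steps: each key maps to the steps it waits for.
# This table is a guess based on the issue text, not the kmerseek implementation.
DEPENDS_ON = {
    "index-create-sketch": set(),
    "index-get-kmers": {"index-create-sketch"},
    "index-create-rocksdb": {"index-create-sketch"},
    "search": {"index-create-rocksdb"},
}


def ready_steps(done):
    """Return the steps that may start once the steps in `done` have finished."""
    done = set(done)
    return sorted(
        step
        for step, deps in DEPENDS_ON.items()
        if step not in done and deps <= done
    )


if __name__ == "__main__":
    # Once the sketch and RocksDB index exist, search can start
    # even though index-get-kmers has not finished.
    print(ready_steps(["index-create-sketch", "index-create-rocksdb"]))
```

Under these assumptions, a workflow manager could start search as soon as the RocksDB index exists while the slow single-CPU k-mer step keeps running in its own process.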

To clarify, kmerseek index would still wrap all three of index-create-sketch, index-get-kmers, and index-create-rocksdb.
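
One possible shape for this, sketched with the standard-library argparse rather than whatever CLI framework kmerseek actually uses (that choice is an assumption). The subcommand names and the Sourmash/Branchwater command prefixes come from this issue. The helpers `build_parser` and `planned_commands` are hypothetical, and every real input/output argument is deliberately left out.

```python
import argparse

# Subcommand name -> Sourmash/Branchwater command prefix named in this issue.
# Real arguments (inputs, outputs, k-mer sizes, ...) are intentionally omitted.
STEPS = {
    "index-create-sketch": ["sourmash", "scripts", "manysketch"],
    "index-get-kmers": ["sourmash", "sig", "kmers"],
    "index-create-rocksdb": ["sourmash", "scripts", "index"],
}


def build_parser():
    """Hypothetical parser: one subcommand per step, plus the 'index' wrapper."""
    parser = argparse.ArgumentParser(prog="kmerseek")
    subparsers = parser.add_subparsers(dest="command", required=True)
    for name in ["index", *STEPS]:
        subparsers.add_parser(name)
    return parser


def planned_commands(command):
    """Return the command prefixes a subcommand would run, in order."""
    names = list(STEPS) if command == "index" else [command]
    return [STEPS[name] for name in names]


if __name__ == "__main__":
    args = build_parser().parse_args()
    for cmd in planned_commands(args.command):
        # Dry run only: print instead of executing.
        print(" ".join(cmd))
```

With this layout, `kmerseek index` keeps its current behavior by running all three steps in order, while a Nextflow process can call a single step on its own.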

Notice that each command is named index-VERB-something -- I like this style for clarity. Also taking suggestions on naming, the hardest problem!

@heuermh

heuermh commented Feb 19, 2025

> Notice that each command is named index-VERB-something -- I like this style for clarity. Also taking suggestions on naming, the hardest problem!

I think this sounds reasonable. It is also compatible with tab completion, which kmerseek should provide.
