% sourmash sig summarize collections/ncbi-viruses.mf.csv
== This is sourmash version 4.8.14. ==
== Please cite Irber et. al (2024), doi:10.21105/joss.06830. ==
** loading from 'collections/ncbi-viruses.mf.csv'
path filetype: StandaloneManifestIndex
location: collections/ncbi-viruses.mf.csv
is database? yes
has manifest? yes
num signatures: 692919
** examining manifest...
total hashes: 606952063
summary of sketches:
230973 sketches with skipm2n3, k=24, scaled=50 303273376 total hashes
230973 sketches with DNA, k=21, scaled=50 151805251 total hashes
230973 sketches with DNA, k=31, scaled=50 151873436 total hashes
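As a quick consistency check on the summary above, the per-type counts add up to the manifest totals. Three sketch types with 230973 sketches each give 692919 signatures, which fits one sketch per genome per sketch type (an inference from the numbers, not something the output states). A minimal Python sketch of that arithmetic, with the numbers copied from the output:

```python
# Per-sketch-type (count, total hashes), copied from "summary of sketches" above.
summary = {
    ("skipm2n3", 24): (230973, 303273376),
    ("DNA", 21): (230973, 151805251),
    ("DNA", 31): (230973, 151873436),
}

# The manifest totals should equal the sums over sketch types.
num_signatures = sum(n for n, _ in summary.values())
total_hashes = sum(h for _, h in summary.values())

print(num_signatures)  # 692919
print(total_hashes)    # 606952063

# Mean number of retained hashes per sketch, for each sketch type.
for (moltype, k), (n, hashes) in summary.items():
    print(f"{moltype} k={k}: {hashes / n:.0f} hashes/sketch")
```

Both totals match the "num signatures" and "total hashes" lines reported by sourmash sig summarize.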
Execution time/memory:
Command being timed: "sourmash scripts gbsketch collections/ncbi-viruses.links.csv -n 9 -r 10 -p skipm2n3,k=24,scaled=50 -p dna,k=21,k=31,scaled=50 --failed gbsketch-fail.ncbi-viruses.txt --checksum-fail gbsketch-check-fail.ncbi-viruses.txt -o databases/ncbi-viruses.skip_m2n3.k24.zip -c 1 --batch 1000"
User time (seconds): 3942.62
System time (seconds): 644.22
Percent of CPU this job got: 77%
Elapsed (wall clock) time (h:mm:ss or m:ss): 1:38:06
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 4387888
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 7
Minor (reclaiming a frame) page faults: 3025881
Voluntary context switches: 20260228
Involuntary context switches: 23931
Swaps: 0
File system inputs: 0
File system outputs: 11742808
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
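The "Percent of CPU" line can be recomputed from the other fields: (user + system time) / wall-clock time = (3942.62 + 644.22) / 5886 s ≈ 0.78, so on average the job kept less than one core busy. The small Python sketch below parses a few lines of the GNU time -v style report above and redoes that arithmetic. The helper names are hypothetical, written only for this illustration, and "kbytes" is treated as 1024 bytes.

```python
# A few lines copied from the time -v report above.
report = """\
User time (seconds): 3942.62
System time (seconds): 644.22
Elapsed (wall clock) time (h:mm:ss or m:ss): 1:38:06
Maximum resident set size (kbytes): 4387888
"""


def parse_elapsed(text: str) -> float:
    """Convert 'h:mm:ss' or 'm:ss' (seconds may be fractional) to seconds."""
    seconds = 0.0
    for part in text.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


# Split each line at its last ": " so values like "1:38:06" stay intact.
fields = {}
for line in report.splitlines():
    key, _, value = line.rpartition(": ")
    fields[key] = value.strip()

user = float(fields["User time (seconds)"])
system = float(fields["System time (seconds)"])
wall = parse_elapsed(fields["Elapsed (wall clock) time (h:mm:ss or m:ss)"])
max_rss_kib = int(fields["Maximum resident set size (kbytes)"])

cpu_percent = 100 * (user + system) / wall
print(f"wall clock: {wall:.0f} s")                   # 5886 s
print(f"CPU utilization: {cpu_percent:.1f}%")         # 77.9%
print(f"peak RSS: {max_rss_kib / 1024**2:.2f} GiB")  # 4.18 GiB
```

The computed 77.9% agrees with the 77% that time reports as an integer, and the peak memory of the sketching run was about 4.2 GiB.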
Viral genome databases are now available on farm, as well as for download! 🎉
TODO:
files
location:
/group/ctbrowngrp5/sourmash-db/ncbi-viruses-2025.01
The .sig.zip databases are here:
Note that these databases use scaled=50 and include skipmer (m2n3, k=24) databases, built with parameters that seemed to perform well in https://github.com/sourmash-bio/2024-ictv-challenge-sourmash.
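Skipmers replace contiguous k-mers with a cyclic pattern that keeps m bases out of every n; the usual motivation is that skipping every third base makes matches in coding regions more robust to changes at the third codon position. The Python sketch below assumes m2n3 means "keep 2 of every 3 bases" and that k counts the kept bases, so a k=24 skipmer spans 35 bases. It illustrates only the generic idea: the function name is hypothetical, and it does not reproduce how sourmash hashes, canonicalizes, or handles reading frames for skipm2n3 sketches.

```python
from typing import Iterator, Tuple


def skipmers(seq: str, m: int = 2, n: int = 3, k: int = 24) -> Iterator[Tuple[int, str]]:
    """Yield (start, skipmer) pairs using a keep-m-of-every-n pattern.

    Illustrative only; not sourmash's skipmer implementation.
    """
    # Offsets within a window: keep positions whose index mod n is < m,
    # until k bases have been collected.
    offsets = []
    pos = 0
    while len(offsets) < k:
        if pos % n < m:
            offsets.append(pos)
        pos += 1
    span = offsets[-1] + 1  # window length needed to collect k bases
    # Slide the pattern along the sequence one base at a time.
    for start in range(len(seq) - span + 1):
        yield start, "".join(seq[start + o] for o in offsets)


# Small example with k=4: offsets are 0, 1, 3, 4 (position 2 is skipped).
for start, sm in skipmers("ACGTTGCA", k=4):
    print(start, sm)
# 0 ACTT
# 1 CGTG
# 2 GTGC
# 3 TTCA
```

Because the pattern is applied at every start position, every reading frame is covered by some window.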
The databases are available for download here:
ncbi-viruses-2025.01/ncbi-viruses.dna.k=21.scaled=50.sig.zip
ncbi-viruses-2025.01/ncbi-viruses.dna.k=31.scaled=50.sig.zip
ncbi-viruses-2025.01/ncbi-viruses.lineages.csv
ncbi-viruses-2025.01/ncbi-viruses.skip_m2n3.k=24.scaled=50.sig.zip
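The scaled=50 in these file names refers to FracMinHash sketching: each k-mer is hashed to a 64-bit value, and only hashes below a threshold of roughly 2^64 / scaled are kept, so a sketch retains about 1 in 50 of a genome's distinct k-mers. The Python sketch below illustrates that selection rule under stated assumptions. sourmash uses MurmurHash3, while this example substitutes hashlib's blake2b, so its hash values will not match real sourmash sketches. The canonical k-mer here is the lexicographically smaller of the k-mer and its reverse complement, and all function names are hypothetical.

```python
import hashlib
import random

MAX_HASH = 2**64
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def hash64(kmer: str) -> int:
    # Stand-in 64-bit hash (blake2b); sourmash itself uses MurmurHash3.
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "little")


def revcomp(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def frac_minhash(seq: str, k: int = 21, scaled: int = 50) -> set:
    """Keep every canonical k-mer hash below MAX_HASH // scaled."""
    threshold = MAX_HASH // scaled
    hashes = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        canonical = min(kmer, revcomp(kmer))  # strand-independent k-mer
        h = hash64(canonical)
        if h < threshold:
            hashes.add(h)
    return hashes


# A random 50 kb "genome": about 50,000 distinct 21-mers, so roughly
# 50,000 / 50 = 1,000 hashes are expected in the sketch.
random.seed(0)
genome = "".join(random.choice("ACGT") for _ in range(50_000))
sketch = frac_minhash(genome, k=21, scaled=50)
print(len(sketch))
```

Because selection depends only on the hash value, two sketches built with the same k, scaled, and hash function keep the same hashes for shared k-mers, which is what makes them directly comparable.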
build repos and scripts
For sketching, I used the code in https://github.com/sourmash-bio/2025-sourmash-ncbi-viral-databases to get a list of all viral genomes and sketched them with directsketch (sourmash scripts gbsketch).