Multithreading during M3VCF/MSAV Generation? #62

Open
mragsac opened this issue Jul 28, 2023 · 3 comments


mragsac commented Jul 28, 2023

I am trying to generate a custom reference panel with Minimac3 (M3VCF) and Minimac4 (MSAV), and was wondering whether these operations are multithreaded, or can be made to run multithreaded?

Commands to Generate Reference Files

# Minimac3
Minimac3 --refHaps chr${chr}.vcf.gz --processReference --prefix m3vcfs/chr${chr} --myChromosome {chr_prefix} --rsid

# Minimac4
minimac4 --compress-reference reference.{sav,bcf,vcf.gz} > reference.msav

When I use the --cpus flag, the CPUs I have available don't appear to be used when I check with htop...

Contributor

yukt commented Jul 28, 2023 via email

Author

mragsac commented Jul 28, 2023

Thank you for your speedy reply!!

Is it expected that a single chromosome from an imputation panel would take multiple days to compress to the M3VCF or MSAV format? I'm trying to understand whether something is wrong with how I'm running things or whether this is expected behavior ...

Contributor

jonathonl commented Jul 28, 2023

Yes, it can take a long time for large reference panels. With Minimac4, you can speed up the compression by running multiple processes (instead of threads), one per chunk, and then concatenating the chunks:

bcftools view chr1.vcf.gz -Ou -r chr1:1-10000000 -i 'POS>=1' | minimac4 --compress-reference /dev/stdin > chr1_1_10000000.msav
bcftools view chr1.vcf.gz -Ou -r chr1:10000001-20000000 -i 'POS>=10000001' | minimac4 --compress-reference /dev/stdin > chr1_10000001_20000000.msav
 ...
sav concat $( ls chr1_*.msav | sort -V ) -o chr1.msav

I don't know for sure whether this approach is possible for minimac3.

bcftools: https://github.com/samtools/bcftools
sav: https://github.com/statgen/savvy/releases/download/v2.1.0/savvy-2.1.0-Linux-x86_64-cli.sh
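
The per-chunk commands above follow a regular pattern, so they could be generated rather than typed by hand. The following is only a rough sketch: the helper names `chunk_bounds` and `chunk_commands`, the 25,000,000 bp chromosome length, and the 10,000,000 bp chunk size are illustrative assumptions, not values from this thread. It only prints the commands (a dry run) and does not run bcftools or minimac4 itself.

```shell
#!/usr/bin/env bash
# Sketch: print one "bcftools | minimac4" compression command per chunk.
# The helper names, chromosome length, and chunk size are hypothetical.

# Print "start end" for each fixed-size chunk covering 1..chrom_len.
chunk_bounds() {
  local chrom_len=$1 chunk_size=$2 start=1 end
  while [ "$start" -le "$chrom_len" ]; do
    end=$(( start + chunk_size - 1 ))
    # Clamp the last chunk to the end of the chromosome.
    if [ "$end" -gt "$chrom_len" ]; then
      end=$chrom_len
    fi
    echo "$start $end"
    start=$(( end + 1 ))
  done
}

# Print the compression pipeline for each chunk, mirroring the commands above.
chunk_commands() {
  local vcf=$1 chrom=$2 chrom_len=$3 chunk_size=$4 start end
  chunk_bounds "$chrom_len" "$chunk_size" | while read -r start end; do
    echo "bcftools view $vcf -Ou -r $chrom:$start-$end -i 'POS>=$start' | minimac4 --compress-reference /dev/stdin > ${chrom}_${start}_${end}.msav"
  done
}

# Dry run: print the commands for an assumed 25 Mb chromosome in 10 Mb chunks.
chunk_commands chr1.vcf.gz chr1 25000000 10000000
```

The printed lines can be saved to a file, reviewed, and then run as separate processes. One option, assuming GNU parallel is installed, is `parallel -j 4 < commands.txt`, followed by the `sav concat` step shown above once every chunk has finished.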
