Reference Panel download #66

Open
Shrishtee-kandoi opened this issue Jan 23, 2024 · 8 comments
Comments

@Shrishtee-kandoi

Hi Minimac4 Team,

I am currently facing challenges while attempting to run "minimac4" to impute my genotype files on our High-Performance Cluster. I have encountered a couple of issues that I believe may require your expertise to resolve.

Reference Panel Download:
I attempted to download the reference panel from the link provided on the Minimac4 wiki page. However, although I can connect to the FTP server, the download does not start. I would appreciate guidance on the correct procedure or any troubleshooting steps.

Target Study VCF File:
I want to confirm whether the "targetStudy.vcf" file mentioned in the documentation refers to the VCF file generated from PLINK files using the following commands from:

plink --bfile YOURFILE --keep-allele-order --freq --out YOURFILE.output --allow-no-sex
plink --bfile YOURFILE --recode vcf --out YOURFILE.output_file --keep-allele-order
vcf-sort YOURFILE.output_file.vcf | bgzip -c > pre_impute_YOURFILE.vcf.gz

Is this the correct process for creating the target VCF file for Minimac4 imputation?
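The vcf-sort step in this pipeline matters because tabix indexing expects records grouped by contig and ordered by position. As a hedged illustration (not part of PLINK or minimac4), this Python sketch reports the first data line that breaks that order; it checks only contig grouping and position order, not the contig order of any particular reference.

```python
import gzip
from typing import Iterable, Optional


def first_unsorted_record(lines: Iterable[str]) -> Optional[str]:
    """Return the first VCF data line that breaks sort order, or None if sorted."""
    seen_contigs = set()
    current_contig = None
    last_pos = 0
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, ## meta lines and the #CHROM header
        fields = line.rstrip("\n").split("\t")
        contig, pos = fields[0], int(fields[1])
        if contig != current_contig:
            if contig in seen_contigs:
                return line  # contig reappears after a different contig
            seen_contigs.add(contig)
            current_contig, last_pos = contig, 0
        if pos < last_pos:
            return line  # position went backwards within a contig
        last_pos = pos
    return None


def check_vcf_gz(path: str) -> Optional[str]:
    """Check a bgzipped VCF; gzip can read BGZF because it is multi-member gzip."""
    with gzip.open(path, "rt") as handle:
        return first_unsorted_record(handle)
```

For example, `check_vcf_gz("pre_impute_YOURFILE.vcf.gz")` returning `None` means no out-of-order record was found.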

Imputation Code:
For imputation, I am using the following code:

minimac4 --refHaps refPanel.m3vcf \
         --haps targetStudy.vcf \
         --prefix testRun \
         --cpus 5

Is "refPanel.m3vcf" downloadable from the link above?
And can I substitute "targetStudy.vcf" with "pre_impute_YOURFILE.vcf.gz" in this command?

Your assistance in resolving these issues would be immensely valuable, and I appreciate your time and support in advance.

Thank you!

@jonathonl
Contributor

That wiki is legacy documentation. Use the readme in this repo instead.

You can use the https protocol instead of ftp to download the reference panels: https://share.sph.umich.edu/minimac4/panels/.
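If a plain HTTPS download is easier to script on a cluster than an FTP client, a minimal Python sketch using only the standard library might look like this. It assumes `url` is a full file URL copied from the directory listing at https://share.sph.umich.edu/minimac4/panels/; no file names or subdirectories are hard-coded because the exact layout is not given here.

```python
import os
import shutil
import urllib.request
from urllib.parse import urlparse


def local_name(url: str) -> str:
    """Derive the local file name from the last component of the URL path."""
    name = os.path.basename(urlparse(url).path)
    if not name:
        raise ValueError(f"URL does not point to a file: {url}")
    return name


def download(url: str, dest_dir: str = ".") -> str:
    """Stream `url` to dest_dir in chunks (panels can be large) and return the path."""
    if urlparse(url).scheme != "https":
        raise ValueError("expected an https:// URL")
    dest = os.path.join(dest_dir, local_name(url))
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest
```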

You must also index your target VCF (tabix -p vcf pre_impute_YOURFILE.vcf.gz).

The minimac4 command you reference will work with the correct reference panel, but it is deprecated. See the readme for the commands to use with the latest version. You will be using an *.msav reference panel instead of an *.m3vcf.gz.
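A hedged Python sketch of the newer invocation, based only on the command forms quoted in this thread (`minimac4 <panel>.msav <target>.vcf.gz -o <output>`). It assumes a recent minimac4 release is on PATH and that `-o` may precede the positional files; the pre-flight check reflects the tabix indexing requirement, assuming the index is the `.tbi` file that `tabix -p vcf` writes.

```python
import os
import subprocess
from typing import List


def minimac4_command(reference_msav: str, target_vcf_gz: str, output: str) -> List[str]:
    """Reference and target are positional in the newer CLI (no --refHaps/--haps)."""
    return ["minimac4", "-o", output, reference_msav, target_vcf_gz]


def run_imputation(reference_msav: str, target_vcf_gz: str, output: str) -> None:
    # tabix -p vcf writes the index as <target>.tbi next to the bgzipped VCF.
    index = target_vcf_gz + ".tbi"
    if not os.path.exists(index):
        raise FileNotFoundError(f"missing {index}; run: tabix -p vcf {target_vcf_gz}")
    subprocess.run(minimac4_command(reference_msav, target_vcf_gz, output), check=True)
```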

@Shrishtee-kandoi
Author

Thanks for getting back! I was able to download the reference panels.

I am now following the command lines exactly as outlined in the readme file:

minimac4 1000g_phase3_v5.chr14.with_parameter_estimates.msav pre_impute_YOURFILE.vcf.gz > imputed_YOURFILE.sav

However, it seems to be disregarding the parameters, and I'm receiving the following warnings:

WARNING - 
Problems encountered parsing command line:

Command line parameter 1000g_phase3_v5.chr14.with_parameter_estimates.msav (#1) ignored
Command line parameter pre_impute_upenn_ucla_mssm_impute_chr14.output_file.vcf.gz (#2) ignored

The same issue persists when using the command:
minimac4 1000g_phase3_v5.chr14.with_parameter_estimates.msav pre_impute_upenn_ucla_mssm_impute_chr14.output_file.vcf.gz -o imputed.vcf.gz

Am I required to include additional flags, or is there something else I might be overlooking?
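When a cluster module may lag behind the latest release, it can help to recognize the symptom automatically. This Python sketch matches the warning text quoted above; it assumes that exact wording, and other minimac4 versions may phrase it differently. Positional reference and target files being ignored suggests an older build that only understands flags such as `--refHaps` and `--haps`.

```python
import re
from typing import List

# Pattern copied from the warning quoted in this thread.
IGNORED_PARAM = re.compile(r"Command line parameter (\S+) \(#(\d+)\) ignored")


def ignored_parameters(stderr_text: str) -> List[str]:
    """Return the parameters the CLI reported as ignored, in order."""
    return [m.group(1) for m in IGNORED_PARAM.finditer(stderr_text)]


def looks_like_legacy_cli(stderr_text: str) -> bool:
    """True if any positional argument was reported as ignored."""
    return bool(ignored_parameters(stderr_text))
```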

@jonathonl
Contributor

I think you are using an old version of minimac4. See the latest at https://github.com/statgen/Minimac4/releases.

@Shrishtee-kandoi
Author

Thanks! I was waiting for our cluster to update the module. It works now!!
I also have one last question: does minimac4 provide QC results and information on excluded SNPs, similar to the Imputation Server?

@jonathonl
Contributor

No, the Imputation Server uses its own routines for the QC preprocessing step (which includes variant and chunk exclusion). The only metrics Minimac4 provides are in the INFO fields of the imputed results (R2, ER2, AVG_CS). You can get a sites-only version of the results with the --sites option, which produces a VCF with these INFO fields but no genotype data. This file is also generated automatically when using the --prefix option.
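Since those INFO fields are the only metrics available, downstream filtering has to be done by the user. A minimal Python sketch, assuming a sites-only VCF whose INFO values for R2, ER2 and AVG_CS are plain numbers; ER2 is treated as optional because it is only reported for some variants (genotyped sites), and the 0.3 R2 cutoff is a hypothetical example, not a minimac4 default.

```python
import gzip
from typing import Dict, Iterable, Iterator, List, TextIO, Tuple

METRICS = ("R2", "ER2", "AVG_CS")


def parse_info(info: str) -> Dict[str, str]:
    """Split a VCF INFO column into key -> value; flags map to ''."""
    out = {}
    for entry in info.split(";"):
        key, sep, value = entry.partition("=")
        out[key] = value if sep else ""
    return out


def metrics(lines: Iterable[str]) -> Iterator[Tuple[str, int, Dict[str, float]]]:
    """Yield (contig, pos, {metric: value}) for each data line."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        info = parse_info(fields[7])
        values = {k: float(info[k]) for k in METRICS if info.get(k)}
        yield fields[0], int(fields[1]), values


def well_imputed(lines: Iterable[str], min_r2: float = 0.3) -> List[Tuple[str, int]]:
    """Positions whose R2 meets the (hypothetical) cutoff."""
    return [(c, p) for c, p, m in metrics(lines) if m.get("R2", 0.0) >= min_r2]


def open_sites(path: str) -> TextIO:
    """Open a plain or bgzipped sites VCF as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)
```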

@Shrishtee-kandoi
Author

Awesome! Thank you.

@Shrishtee-kandoi
Author

Hi Jonathon,

I've recently updated my files to the hg38 build. The link you provided before (https://share.sph.umich.edu/minimac4/panels/) has reference files for 1000g_phase3_v5, which are on hg19. Can you point me to a reference panel for hg38, specifically 1000 Genomes Phase 3 (hg38)?

Thank you!

@jonathonl
Copy link
Contributor

We do not yet host a b38 panel. You would have to generate one on your own using a phased 1000g call set with minimac4 --compress-reference.
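A hedged Python sketch of that step, assuming `minimac4 --compress-reference <phased.vcf.gz>` writes the .msav panel to stdout (as in `minimac4 --compress-reference phased.vcf.gz > panel.msav`) and that minimac4 is on PATH. The phasing check is an illustrative helper, not part of minimac4: it flags unphased genotypes (`/`) in a VCF data line, since the input must be a phased call set. Parameter estimates, which the hosted hg19 files include, are not addressed here.

```python
import subprocess
from typing import List


def compress_reference_command(phased_vcf: str) -> List[str]:
    """Argument list for building an .msav panel from a phased VCF/BCF."""
    return ["minimac4", "--compress-reference", phased_vcf]


def compress_reference(phased_vcf: str, out_msav: str) -> None:
    """Run the command and redirect its stdout into out_msav."""
    with open(out_msav, "wb") as out:
        subprocess.run(compress_reference_command(phased_vcf), stdout=out, check=True)


def all_samples_phased(vcf_line: str) -> bool:
    """False if any sample's GT in this data line uses the unphased '/' separator."""
    fields = vcf_line.rstrip("\n").split("\t")
    gt_index = fields[8].split(":").index("GT")
    for sample in fields[9:]:
        if "/" in sample.split(":")[gt_index]:
            return False
    return True
```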
