-
Notifications
You must be signed in to change notification settings - Fork 11
Example 1
This example illustrates TBA
Step 1. Download the simulated summary-level data example1.mat
.
The example1.mat
contains the following data.
-
betahat
: 12758 by 1 vector, the single-SNP effect size estimate for each SNP -
se
: 12758 by 1 vector, the standard errors of the single-SNP effect size estimates -
Nsnp
: 12758 by 1 vector, the sample size of each SNP -
R
: 12758 by 12758 matrix, the LD matrix estimated from a reference panel -
bwd
: integer, the bandwidth of the matrixR
-
BR
: (bwd
+1) by 12758 matrix, the banded storage of the matrixR
Note that only betahat
, se
, Nsnp
and R
are needed for model fitting. The other two quantities, bwd
and BR
, are used in SNP heritability calculation.
Step 2. Check the "small effects" model assumption.
Using the summary data, we can compute the squared sample correlation between the phenotype and each SNP. We check the "small effects" assumption by looking at these marginal squared correlation values.
>> chatsqr = (betahat(:).^2) ./ (Nsnp(:).*(se(:).^2) + betahat(:).^2);
>> disp(prctile(log10(chatsqr), 0:25:100));
-11.6029 -4.1154 -3.4721 -2.9962 -1.5982
Since our data is simulated from genotypes on a single chromosome, the simulated effect sizes per SNP are larger than would be expected in a typical GWAS (Table 1 in our paper).
Step 3. Fit the RSS-BVSR, RSS-BSLMM and RSS-ASH model via MCMC.
To fit the RSS-BVSR and RSS-BSLMM model, only the length of MCMC simulation is needed.
Ndraw = 2e6;
Nburn = 2e5;
Nthin = 9e1;
[betasam, gammasam, hsam, logpisam, Naccept] = rss_bvsr(betahat, se, R, Nsnp, Ndraw, Nburn, Nthin);
[bsam, zsam, lpsam, hsam, rsam, Naccept] = rss_bslmm(betahat, se, R, Nsnp, Ndraw, Nburn, Nthin);
The fitting of RSS-ASH model requires a grid for the prior standard deviations for the effect sizes.
Ndraw = 5e7;
Nburn = 1e7;
Nthin = 1e3;
sigma_beta = [0 0.001 0.003 0.01 0.03 0.1 0.3 1 3];
[bsam, zsam, wsam, lsam, Naccept] = rss_ash(betahat, se, R, Nsnp, sigma_beta, Ndraw, Nburn, Nthin);
Step 4. Estimate the SNP heritability.
We now use the posterior of multiple-SNP effect sizes to estimate the SNP heritability.
M = length(hsam); % the length of posterior simulations
pvesam = zeros(M,1); % preallocate the pve posterior estimates
for i = 1:M
pvesam(i) = compute_pve(bsam(i,:), betahat, se, Nsnp, bwd, BR, 1);
end
Recall that the SNP heritability estimator (Equation 3.8 in our paper) involves vector-matrix-vector product. To speed the calculation, we exploit the banded structure of R
and use the banded version of vector-matrix-vector product implemented in lapack
. Hence, the banded storage BR
, instead of the original form R
, is used to calculate the SNP heritability.
Step 5. Summarize the results.
The data is simulated with the true SNP heritability (PVE) being 0.2. The following table summarizes the posterior estimates (with 95% credible interval) and the total computational time (including MCMC iterations and PVE calculations) for the three models.
Model | PVE estimation | Total time |
---|---|---|
RSS-BVSR | 0.200 [0.125, 0.290] | 1.38 hours |
RSS-BSLMM | 0.216 [0.136, 0.306] | 2.52 hours |
RSS-ASH | 0.197 [0.114, 0.286] | 6.69 hours |
The following histograms depict the posterior distributions of estimated SNP heritability under the three models.