Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

error in priorSum_m[, s]: incorrect number of dimensions #98

Open
pdimens opened this issue Jun 26, 2024 · 12 comments
Open

error in priorSum_m[, s]: incorrect number of dimensions #98

pdimens opened this issue Jun 26, 2024 · 12 comments

Comments

@pdimens
Copy link

pdimens commented Jun 26, 2024

Hello, I've been running stitch a bunch and this is the first time I'm seeing this particular error:

...
[2024-06-25 09:56:53] Done generating inputs
[2024-06-25 09:56:53] Copying files onto tempdir
[2024-06-25 09:56:57] Done copying files onto tempdir
[2024-06-25 09:56:57] Generate allele count
[2024-06-25 09:57:04] Quantiles across SNPs of per-sample depth of coverage
[2024-06-25 09:57:04]     5%   25%   50%   75%   95%
[2024-06-25 09:57:04]  0.401 0.896 1.086 1.216 1.442
[2024-06-25 09:57:04] Done generating allele count
[2024-06-25 09:57:04] Outputting will be done in 21 blocks with on average 9833.9 SNPs in them
[2024-06-25 09:57:04] Begin parameter initialization
[2024-06-25 09:57:07] Done parameter initialization
[2024-06-25 09:57:07] Start EM
[2024-06-25 09:57:07] Number of samples: 384
[2024-06-25 09:57:07] Number of SNPs: 206512
[2024-06-25 09:57:07] Start of iteration 1
[2024-06-25 10:31:12] Start of iteration 2
[2024-06-25 11:07:39] Start of iteration 3
[2024-06-25 11:42:08] Start of iteration 4
Error in priorSum_m[, s] : incorrect number of dimensions
Calls: do.call ... <Anonymous> -> completeSampleIteration -> calculate_updates
In addition: Warning message:
In mclapply(sampleRanges, mc.cores = nCores, FUN = subset_of_complete_iteration,  :
  scheduled cores 2, 12 did not deliver results, all values of the jobs will be affected
Execution halted

Is this something you have seen before?

For context, this is how STITCH was run:

[2024-06-25 09:55:38] Running STITCH(chr = CM031720.1, nGen = 100, posfile = Impute/workflow/input/stitch/CM031720.1.stitch, K = 35, S = 10, outputdir = /local/workdir/pd348/shad_haplotag/Impute/modeldiploid_usebxTrue_bxlimit100000_k35_s10_ngen100/contigs/CM031720.1, nStarts = , tempdir = /local/workdir/pd348/shad_haplotag/Impute/modeldiploid_usebxTrue_bxlimit100000_k35_s10_ngen100/contigs/CM031720.1/tmp, bamlist = Impute/workflow/input/samples.list, cramlist = , sampleNames_file = , reference = , genfile = , method = diploid, output_format = bgvcf, B_bit_prob = 16, outputInputInVCFFormat = FALSE, downsampleToCov = 50, downsampleFraction = 1, readAware = TRUE, chrStart = NA, chrEnd = NA, regionStart = NA, regionEnd = NA, buffer = NA, maxDifferenceBetweenReads = 1000, maxEmissionMatrixDifference = 1e+10, alphaMatThreshold = 1e-04, emissionThreshold = 1e-04, iSizeUpperLimit = 600, bqFilter = 17, niterations = 40, shuffleHaplotypeIterations = c(4, 8, 12, 16), splitReadIterations = NA, nCores = 20, expRate = 0.5, maxRate = 100, minRate = 0.1, Jmax = 1000, regenerateInput = TRUE, originalRegionName = NA, keepInterimFiles = FALSE, keepTempDir = FALSE, outputHaplotypeProbabilities = FALSE, switchModelIteration = 39, generateInputOnly = FALSE, restartIterations = NA, refillIterations = c(6, 10, 14, 18), downsampleSamples = 1, downsampleSamplesKeepList = NA, subsetSNPsfile = NA, useSoftClippedBases = FALSE, outputBlockSize = 1000, outputSNPBlockSize = 10000, inputBundleBlockSize = NA, genetic_map_file = , reference_haplotype_file = , reference_legend_file = , reference_sample_file = , reference_populations = NA, reference_phred = 20, reference_iterations = 40, reference_shuffleHaplotypeIterations = c(4, 8, 12, 16), output_filename = CM031720.1.vcf.gz, initial_min_hapProb = 0.2, initial_max_hapProb = 0.8, regenerateInputWithDefaultValues = FALSE, plotHapSumDuringIterations = FALSE, plot_shuffle_haplotype_attempts = FALSE, plotAfterImputation = TRUE, save_sampleReadsInfo = FALSE, gridWindowSize = NA, shuffle_bin_nSNPs = NULL, shuffle_bin_radius = 5000, keepSampleReadsInRAM = FALSE, useTempdirWhileWriting = FALSE, output_haplotype_dosages = FALSE, use_bx_tag = TRUE, bxTagUpperLimit = 1e+05)
@rwdavies
Copy link
Owner

Offhand, no, I don't think I've seen this before. Did you try re-running it and see if it worked? Does it seem to be happening consistently for a given region?

I often see weird errors driven by running out of memory in a multicore job but those error messages look different normally. Any chance that could be an issue here?

@pdimens
Copy link
Author

pdimens commented Jul 1, 2024

Still not sure what caused this, but my guess is something to do with the container it ran in.

@pdimens
Copy link
Author

pdimens commented Jul 2, 2024

Nope, I'm getting this error consistently for these data. Is there some kind of debugging information I can provide to help diagnose this?

@Zilong-Li
Copy link
Collaborator

Hi,

For debugging, please first show the sessionInfo().

I would also suggest

  • turn off multithreading, set nCores=1
  • use default options as possible as you can
  • try different regions

I am guessing there probably are NAs in priorSum_m returned in HMM for your data after running few iterations.

@pdimens
Copy link
Author

pdimens commented Jul 2, 2024

@Zilong-Li here is the session info. In the meantime, I'm running it with a single core. Data's big, might take a while.

> sessionInfo()
R version 4.2.3 (2023-03-15)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Rocky Linux 8.6 (Green Obsidian)

Matrix products: default
BLAS/LAPACK: /lustre1/home2/nt246_0001/pd348/.snakemake/conda/cc0299658c70184c6a34b389b25ac216_/lib/libopenblasp-r0.3.27.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
[1] STITCH_1.6.8 rrbgen_0.0.6

loaded via a namespace (and not attached):
[1] compiler_4.2.3 Rcpp_1.0.12

@rwdavies
Copy link
Owner

rwdavies commented Jul 5, 2024

Can you re-try setting S=1, while S > 1 should be supported, perhaps there are some weird edge case going on?

Maybe after that, can you re-try with the latest version of STITCH, just in case that fixes it?

Since it crashed on iteration 4, and that is when the "shuffle haplotype" heuristic comes on, you might also want to change this shuffleHaplotypeIterations = c(4, 8, 12, 16) to something like shuffleHaplotypeIterations = c(8, 12, 16) and see what happens

@pdimens
Copy link
Author

pdimens commented Jul 19, 2024

When S = 1, the error goes away. Any idea of why this might be happening?

@Zilong-Li
Copy link
Collaborator

Okay. As Robbie said, the main focus is on S=1. Probably there are some edge case for S > 1. If you can send some data over, we may take a look at them

@pdimens
Copy link
Author

pdimens commented Jul 19, 2024

The data I'm working with is quite large (380 samples). I'm not sure how I can feasibly share it.

@rwdavies
Copy link
Owner

Apologies, I don't have a specific idea of why it's occuring. It's almost certainly some bug / edge case for S > 1 that doesn't occur for S = 1.

If it's working for S = 1, I'd recommend that for now. S=1 is vastly better tested, S > 1 was more experimental, and while it can increase accuracy, it's generally of marginal to moderate importance.

If you can remind me in a week or so I might have some time to come back and try to check / augment the tests and see if I can find anything.

@pdimens
Copy link
Author

pdimens commented Sep 9, 2024

@rwdavies Hey, this is the reminder you requested.

I should also note that I ran STITCH with S = 1 and received the same error:

[2024-09-07 17:56:30] Running STITCH(chr = CM031716.1, nGen = 100, posfile = Impute/workflow/input/stitch/CM031716.1.stitch, K = 35, S = 1, outputdir = /local/workdir/pd348/shad_haplotag/Impute/modeldiploid/usebxTrue/k35_s1_ngen100_bxlimit750000/contigs/CM031716.1, nStarts = , tempdir = /local/workdir/pd348/shad_haplotag/Impute/modeldiploid/usebxTrue/k35_s1_ngen100_bxlimit750000/contigs/CM031716.1/tmp, bamlist = Impute/workflow/input/samples.list, cramlist = , sampleNames_file = , reference = , genfile = , method = diploid, output_format = bgvcf, B_bit_prob = 16, outputInputInVCFFormat = FALSE, downsampleToCov = 50, downsampleFraction = 1, readAware = TRUE, chrStart = NA, chrEnd = NA, regionStart = NA, regionEnd = NA, buffer = NA, maxDifferenceBetweenReads = 1000, maxEmissionMatrixDifference = 1e+10, alphaMatThreshold = 1e-04, emissionThreshold = 1e-04, iSizeUpperLimit = 600, bqFilter = 17, niterations = 40, shuffleHaplotypeIterations = c(4, 8, 12, 16), splitReadIterations = NA, nCores = 19, expRate = 0.5, maxRate = 100, minRate = 0.1, Jmax = 1000, regenerateInput = TRUE, originalRegionName = NA, keepInterimFiles = FALSE, keepTempDir = FALSE, outputHaplotypeProbabilities = FALSE, switchModelIteration = 39, generateInputOnly = FALSE, restartIterations = NA, refillIterations = c(6, 10, 14, 18), downsampleSamples = 1, downsampleSamplesKeepList = NA, subsetSNPsfile = NA, useSoftClippedBases = FALSE, outputBlockSize = 1000, outputSNPBlockSize = 10000, inputBundleBlockSize = NA, genetic_map_file = , reference_haplotype_file = , reference_legend_file = , reference_sample_file = , reference_populations = NA, reference_phred = 20, reference_iterations = 40, reference_shuffleHaplotypeIterations = c(4, 8, 12, 16), output_filename = CM031716.1.vcf.gz, initial_min_hapProb = 0.2, initial_max_hapProb = 0.8, regenerateInputWithDefaultValues = FALSE, plotHapSumDuringIterations = FALSE, plot_shuffle_haplotype_attempts = FALSE, plotAfterImputation = TRUE, save_sampleReadsInfo = FALSE, gridWindowSize = NA, shuffle_bin_nSNPs = NULL, shuffle_bin_radius = 5000, keepSampleReadsInRAM = FALSE, useTempdirWhileWriting = FALSE, output_haplotype_dosages = FALSE, use_bx_tag = TRUE, bxTagUpperLimit = 750000)
[2024-09-07 17:56:30] Program start
[2024-09-07 17:56:30] Get and validate pos and gen
[2024-09-07 17:56:31] Done get and validate pos and gen
[2024-09-07 17:56:31] Get BAM sample names
[2024-09-07 17:56:31] Done getting BAM sample names
[2024-09-07 17:56:31] Generate inputs
[2024-09-07 17:56:37] downsample sample PO135_1 - 25 of 93434 reads removed 
[2024-09-07 17:56:38] downsample sample KE665_3 - 16 of 122879 reads removed 
[2024-09-07 17:56:42] downsample sample ME329_1 - 23 of 84758 reads removed 
[2024-09-07 17:56:42] downsample sample DE600_1 - 68 of 116342 reads removed 
[2024-09-07 17:56:43] downsample sample PO150_3 - 62 of 99523 reads removed 
[2024-09-07 17:56:44] downsample sample DE626_4 - 20 of 140267 reads removed 
[2024-09-07 17:56:47] downsample sample CO071_1 - 81 of 167875 reads removed 
[2024-09-07 17:56:47] downsample sample SC011_4 - 20 of 148934 reads removed 
[2024-09-07 17:56:47] downsample sample SC002_1 - 19 of 127229 reads removed 
[2024-09-07 17:56:49] downsample sample SC007_1 - 14 of 122101 reads removed 
[2024-09-07 17:56:50] downsample sample HU017b_2 - 15 of 128150 reads removed 
[2024-09-07 17:56:52] downsample sample HU117_4 - 21 of 90680 reads removed 
[2024-09-07 17:56:53] downsample sample AW202_2 - 23 of 73669 reads removed 
[2024-09-07 17:56:53] downsample sample NM296_3 - 55 of 115818 reads removed 
[2024-09-07 17:56:53] downsample sample AW197_1 - 32 of 107871 reads removed 
[2024-09-07 17:56:57] downsample sample SJ022_3 - 20 of 39804 reads removed 
[2024-09-07 17:56:57] downsample sample HU017_1 - 20 of 134224 reads removed 
[2024-09-07 17:56:57] downsample sample RK106_1 - 22 of 68064 reads removed 
[2024-09-07 17:56:58] downsample sample DE601_1 - 25 of 96369 reads removed 
[2024-09-07 17:56:58] downsample sample PO153_3 - 50 of 94602 reads removed 
[2024-09-07 17:56:58] downsample sample NM264_1 - 12 of 67829 reads removed 
[2024-09-07 17:56:59] downsample sample ME334_2 - 20 of 77450 reads removed 
[2024-09-07 17:56:59] downsample sample ME361_4 - 25 of 80469 reads removed 
[2024-09-07 17:57:00] downsample sample ME331_1 - 24 of 66494 reads removed 
[2024-09-07 17:57:01] downsample sample PO134_1 - 51 of 94739 reads removed 
[2024-09-07 17:57:02] downsample sample BL053_1 - 30 of 82980 reads removed 
[2024-09-07 17:57:02] downsample sample CO078_2 - 4 of 87903 reads removed 
[2024-09-07 17:57:04] downsample sample PO132_1 - 23 of 86748 reads removed 
[2024-09-07 17:57:07] downsample sample BL075_3 - 23 of 101001 reads removed 
[2024-09-07 17:57:07] downsample sample PO160_4 - 39 of 72589 reads removed 
[2024-09-07 17:57:08] downsample sample HU017c_3 - 10 of 120148 reads removed 
[2024-09-07 17:57:08] downsample sample AW205_2 - 40 of 108672 reads removed 
[2024-09-07 17:57:10] downsample sample PO148_3 - 30 of 169883 reads removed 
[2024-09-07 17:57:11] downsample sample AW211_3 - 13 of 131175 reads removed 
[2024-09-07 17:57:12] downsample sample NM278_2 - 19 of 99331 reads removed 
[2024-09-07 17:57:15] downsample sample PO133_1 - 26 of 85501 reads removed 
[2024-09-07 17:57:17] downsample sample ME358_4 - 27 of 105247 reads removed 
[2024-09-07 17:57:19] downsample sample BL081_3 - 18 of 98195 reads removed 
[2024-09-07 17:57:19] downsample sample AW224_4 - 26 of 94961 reads removed 
[2024-09-07 17:57:20] downsample sample CO072_1 - 26 of 104917 reads removed 
[2024-09-07 17:57:21] downsample sample SC021_3 - 42 of 110142 reads removed 
[2024-09-07 17:57:21] downsample sample SC001_1 - 17 of 101151 reads removed 
[2024-09-07 17:57:24] downsample sample PO137_1 - 35 of 92338 reads removed 
[2024-09-07 17:57:26] downsample sample BL054_1 - 7 of 93337 reads removed 
[2024-09-07 17:57:26] downsample sample AW213_3 - 9 of 120168 reads removed 
[2024-09-07 17:57:27] downsample sample CO083_3 - 16 of 123768 reads removed 
[2024-09-07 17:57:28] downsample sample RK134_4 - 11 of 87199 reads removed 
[2024-09-07 17:57:31] downsample sample SJ002_1 - 76 of 94332 reads removed 
[2024-09-07 17:57:31] downsample sample BL072_2 - 32 of 95599 reads removed 
[2024-09-07 17:57:32] downsample sample RK105_1 - 68 of 81964 reads removed 
[2024-09-07 17:57:35] downsample sample PO144_2 - 48 of 75769 reads removed 
[2024-09-07 17:57:37] downsample sample ME343_3 - 21 of 119199 reads removed 
[2024-09-07 17:57:37] downsample sample SC010_4 - 27 of 137282 reads removed 
[2024-09-07 17:57:37] downsample sample SC027_4 - 37 of 125892 reads removed 
[2024-09-07 17:57:38] downsample sample PO158_4 - 35 of 142752 reads removed 
[2024-09-07 17:57:38] downsample sample DE620_3 - 179 of 102432 reads removed 
[2024-09-07 17:57:39] downsample sample RK128_3 - 37 of 116501 reads removed 
[2024-09-07 17:57:39] downsample sample KE661_2 - 27 of 61336 reads removed 
[2024-09-07 17:57:41] downsample sample CO087_3 - 27 of 79018 reads removed 
[2024-09-07 17:57:43] downsample sample NM291_2 - 18 of 60979 reads removed 
[2024-09-07 17:57:43] downsample sample CO091_4 - 27 of 145201 reads removed 
[2024-09-07 17:57:43] downsample sample RK131_3 - 18 of 71609 reads removed 
[2024-09-07 17:57:44] downsample sample AW225_4 - 46 of 67086 reads removed 
[2024-09-07 17:57:45] downsample sample SC006_1 - 35 of 130622 reads removed 
[2024-09-07 17:57:46] downsample sample PO140_2 - 40 of 124692 reads removed 
[2024-09-07 17:57:46] downsample sample ME335_2 - 10 of 77056 reads removed 
[2024-09-07 17:57:46] downsample sample CO069_1 - 82 of 87657 reads removed 
[2024-09-07 17:57:47] downsample sample DE611_2 - 9 of 67965 reads removed 
[2024-09-07 17:57:49] downsample sample SJ025_4 - 34 of 67762 reads removed 
[2024-09-07 17:57:50] downsample sample HU024_1 - 27 of 99320 reads removed 
[2024-09-07 17:57:50] Load and convert BAM 100 of 380
[2024-09-07 17:57:51] downsample sample ME338_2 - 30 of 93122 reads removed 
[2024-09-07 17:57:51] downsample sample NM259_1 - 22 of 101537 reads removed 
[2024-09-07 17:57:52] Load and convert BAM 200 of 380
[2024-09-07 17:57:52] downsample sample AW196_1 - 52 of 122552 reads removed 
[2024-09-07 17:57:56] downsample sample CO098_4 - 10 of 137063 reads removed 
[2024-09-07 17:57:56] downsample sample DE602_1 - 25 of 110334 reads removed 
[2024-09-07 17:57:56] downsample sample BL080_3 - 19 of 116061 reads removed 
[2024-09-07 17:57:57] downsample sample KE648_1 - 21 of 41157 reads removed 
[2024-09-07 17:57:57] Load and convert BAM 300 of 380
[2024-09-07 17:57:59] downsample sample BL068_2 - 18 of 63629 reads removed 
[2024-09-07 17:58:00] downsample sample CO079_2 - 71 of 112925 reads removed 
[2024-09-07 17:58:01] downsample sample CO086_3 - 20 of 86955 reads removed 
[2024-09-07 17:58:04] downsample sample PO155_4 - 99 of 156512 reads removed 
[2024-09-07 17:58:06] downsample sample AW196d_4 - 49 of 178058 reads removed 
[2024-09-07 17:58:11] Done generating inputs
[2024-09-07 17:58:11] Copying files onto tempdir
[2024-09-07 17:58:16] Done copying files onto tempdir
[2024-09-07 17:58:16] Generate allele count
[2024-09-07 17:58:28] Quantiles across SNPs of per-sample depth of coverage
[2024-09-07 17:58:28]     5%   25%   50%   75%   95%
[2024-09-07 17:58:28]  0.439 0.942 1.097 1.213 1.415
[2024-09-07 17:58:28] Done generating allele count
[2024-09-07 17:58:28] Outputting will be done in 31 blocks with on average 9897.7 SNPs in them
[2024-09-07 17:58:28] Begin parameter initialization
[2024-09-07 17:58:29] Done parameter initialization
[2024-09-07 17:58:29] Start EM
[2024-09-07 17:58:29] Number of samples: 380
[2024-09-07 17:58:29] Number of SNPs: 306828
[2024-09-07 17:58:29] Start of iteration 1
[2024-09-07 18:03:31] Start of iteration 2
[2024-09-07 18:08:42] Start of iteration 3
[2024-09-07 18:14:01] Start of iteration 4
[2024-09-07 18:19:36] Shuffle haplotypes - Iteration 4 - change on average 366 intervals out of 1042 considered
[2024-09-07 18:19:36] Start of iteration 5
[2024-09-07 18:24:50] Start of iteration 6
[2024-09-07 18:30:33] Iteration - 6 - refill infrequently used haplotypes
[2024-09-07 18:30:34] Refill infrequently used haplotypes - on average, 0.1% of regions replaced
[2024-09-07 18:30:34] Start of iteration 7
[2024-09-07 18:35:55] Start of iteration 8
Error in priorSum_m[, s] : incorrect number of dimensions
Calls: do.call ... <Anonymous> -> completeSampleIteration -> calculate_updates
In addition: Warning message:
In mclapply(sampleRanges, mc.cores = nCores, FUN = subset_of_complete_iteration,  :
  scheduled core 7 did not deliver a result, all values of the job will be affected
Execution halted

@rwdavies
Copy link
Owner

Thanks. I'm very confused as to how this could throw an error. This code which contains priorSum_m[, s] is rather innocuous

    ## normalize prior
    minPriorSum <- (1 / K) / 100 ## this seems arbitrary!
    for(s in 1:S) {
        priorSum_m[, s] <- priorSum_m[, s] / sum(priorSum_m[, s])
        ## make min
        priorSum_m[, s][priorSum_m[, s] < minPriorSum] <- minPriorSum
        priorSum_m[, s] <- priorSum_m[, s] / sum(priorSum_m[, s])
    }

it's very hacky, but if you're willing, would you be able to add some print statements, then re-compile the code and re-run?

For instance, you could replace the above with something like the below, or possibly even save priorSum_m in the loop before the error, and take a look afterwards

    ## normalize prior
    minPriorSum <- (1 / K) / 100 ## this seems arbitrary!
    for(s in 1:S) {
        print(priorSum_m) ## temporary code
        print(dim(priorSum_m[, s]) ## temporary code
        priorSum_m[, s] <- priorSum_m[, s] / sum(priorSum_m[, s])
        ## make min
        priorSum_m[, s][priorSum_m[, s] < minPriorSum] <- minPriorSum
        priorSum_m[, s] <- priorSum_m[, s] / sum(priorSum_m[, s])
    }

to re-compile STITCH hopefully this would work for you

## cd to where the STITCH repo is installed
## I am assuming dependencies are already installed otherwise see repo README
./scripts/build-and-install.R

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants