Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

STITCH crashed #84

Open
rmott opened this issue Jul 12, 2023 · 11 comments
Open

STITCH crashed #84

rmott opened this issue Jul 12, 2023 · 11 comments

Comments

@rmott
Copy link

rmott commented Jul 12, 2023

I've just run stitch on 474 recombinant inbred arabidopsis genomes. As test case I am using just the first 1Mb of cars (about 11k SNP sites), with K=19 (the number of founders of the population), without a reference panel. It crashes after 26 iterations with a message:

error: Mat::operator(): index out of bounds

Here is the output log:

source("stitch.R")
[2023-07-12 15:05:11] Running STITCH(chr = 1, nGen = 7, posfile = chr1.pos.txt, K = 19, S = 1, outputdir = ./tair10.STITCH/, nStarts = , tempdir = NA, bamlist = bamfiles.markdup.reduced.list, cramlist = , sampleNames_file = , reference = , genfile = , method = pseudoH
aploid, output_format = bgvcf, B_bit_prob = 16, outputInputInVCFFormat = FALSE, downsampleToCov = 50, downsampleFraction = 1, readAware = TRUE, chrStart = NA, chrEnd = NA, regionStart = 1, regionEnd = 1000000, buffer = 1000, maxDifferenceBetweenReads = 1000, maxEmissi
onMatrixDifference = 10000000000, alphaMatThreshold = 0.0001, emissionThreshold = 0.0001, iSizeUpperLimit = 600, bqFilter = 17, niterations = 40, shuffleHaplotypeIterations = c(4, 8, 12, 16), splitReadIterations = 25, nCores = 5, expRate = 0.5, maxRate = 100, minRate
= 0.1, Jmax = 1000, regenerateInput = TRUE, originalRegionName = NA, keepInterimFiles = FALSE, keepTempDir = FALSE, outputHaplotypeProbabilities = FALSE, switchModelIteration = NA, generateInputOnly = FALSE, restartIterations = NA, refillIterations = c(6, 10, 14, 18),
downsampleSamples = 1, downsampleSamplesKeepList = NA, subsetSNPsfile = NA, useSoftClippedBases = FALSE, outputBlockSize = 1000, outputSNPBlockSize = 10000, inputBundleBlockSize = NA, genetic_map_file = , reference_haplotype_file = , reference_legend_file = , referen
ce_sample_file = , reference_populations = NA, reference_phred = 20, reference_iterations = 40, reference_shuffleHaplotypeIterations = c(4, 8, 12, 16), output_filename = NULL, initial_min_hapProb = 0.2, initial_max_hapProb = 0.8, regenerateInputWithDefaultValues = FAL
SE, plotHapSumDuringIterations = FALSE, plot_shuffle_haplotype_attempts = FALSE, plotAfterImputation = TRUE, save_sampleReadsInfo = FALSE, gridWindowSize = NA, shuffle_bin_nSNPs = NULL, shuffle_bin_radius = 5000, keepSampleReadsInRAM = FALSE, useTempdirWhileWriting =
FALSE, output_haplotype_dosages = FALSE, use_bx_tag = TRUE, bxTagUpperLimit = 50000)
[2023-07-12 15:05:11] Program start
[2023-07-12 15:05:11] Get and validate pos and gen
[2023-07-12 15:05:12] Done get and validate pos and gen
[2023-07-12 15:05:12] There are 0 variants in the left buffer region -999 <= position < 1
[2023-07-12 15:05:12] There are 11406 variants in the central region 1 <= position <= 1000000
[2023-07-12 15:05:12] There are 15 variants in the right buffer region 1000000 < position <= 1001000
[2023-07-12 15:05:12] Get BAM sample names
[2023-07-12 15:05:13] Done getting BAM sample names
[2023-07-12 15:05:13] Generate inputs
[2023-07-12 15:05:17] Load and convert BAM 100 of 474
[2023-07-12 15:05:20] downsample sample A77622_1 - 158 of 26132 reads removed
[2023-07-12 15:05:21] downsample sample A77910_1 - 84 of 20825 reads removed
[2023-07-12 15:05:22] downsample sample A77529_1 - 209 of 30861 reads removed
[2023-07-12 15:05:22] Load and convert BAM 200 of 474
[2023-07-12 15:05:23] downsample sample A77530_1 - 138 of 24400 reads removed
[2023-07-12 15:05:27] downsample sample A77916_1 - 95 of 24217 reads removed
[2023-07-12 15:05:28] Load and convert BAM 300 of 474
[2023-07-12 15:05:30] downsample sample A77536_1 - 8 of 30203 reads removed
[2023-07-12 15:05:35] Load and convert BAM 400 of 474
[2023-07-12 15:05:43] downsample sample A77740_1 - 147 of 28398 reads removed
[2023-07-12 15:05:43] downsample sample A77930_1 - 62 of 48928 reads removed
[2023-07-12 15:05:51] downsample sample A77937_1 - 189 of 30677 reads removed
[2023-07-12 15:05:52] downsample sample A77752_1 - 75 of 22978 reads removed
[2023-07-12 15:05:55] downsample sample A77660_1 - 5 of 25248 reads removed
[2023-07-12 15:05:58] downsample sample A77759_1 - 71 of 21445 reads removed
[2023-07-12 15:06:01] downsample sample A77950_1 - 112 of 24604 reads removed
[2023-07-12 15:06:04] downsample sample A77669_1 - 133 of 24651 reads removed
[2023-07-12 15:06:20] downsample sample A77972_1 - 12 of 25401 reads removed
[2023-07-12 15:06:23] downsample sample A77690_1 - 7 of 27502 reads removed
[2023-07-12 15:06:28] downsample sample A77696_1 - 50 of 21002 reads removed
[2023-07-12 15:06:29] downsample sample A77794_1 - 119 of 22716 reads removed
[2023-07-12 15:06:31] downsample sample A77890_1 - 75 of 19710 reads removed
[2023-07-12 15:06:33] downsample sample A77986_1 - 126 of 25146 reads removed
[2023-07-12 15:06:37] downsample sample A77991_1 - 111 of 24507 reads removed
[2023-07-12 15:06:38] downsample sample A77614_1 - 79 of 19718 reads removed
[2023-07-12 15:06:41] downsample sample A77901_1 - 109 of 22712 reads removed
[2023-07-12 15:06:44] Done generating inputs
[2023-07-12 15:06:44] Copying files onto tempdir
[2023-07-12 15:06:53] Done copying files onto tempdir
[2023-07-12 15:06:53] Generate allele count
[2023-07-12 15:07:02] Quantiles across SNPs of per-sample depth of coverage
[2023-07-12 15:07:02] 5% 25% 50% 75% 95%
[2023-07-12 15:07:02] 3.816 5.908 7.569 9.315 12.293
[2023-07-12 15:07:02] Done generating allele count
[2023-07-12 15:07:02] Outputting will be done in 2 blocks with on average 5703 SNPs in them
[2023-07-12 15:07:02] Begin parameter initialization
[2023-07-12 15:07:02] Done parameter initialization
[2023-07-12 15:07:02] Initialize readProbs
[2023-07-12 15:07:11] Done initializing readProbs
[2023-07-12 15:07:11] Start EM
[2023-07-12 15:07:11] Number of samples: 474
[2023-07-12 15:07:11] Number of SNPs: 11421
[2023-07-12 15:07:11] Start of iteration 1
[2023-07-12 15:07:27] Start of iteration 2
[2023-07-12 15:07:43] Start of iteration 3
[2023-07-12 15:07:59] Start of iteration 4
[2023-07-12 15:08:16] Shuffle haplotypes - Iteration 4 - change on average 11 intervals out of 11 considered
[2023-07-12 15:08:16] Start of iteration 5
[2023-07-12 15:08:33] Start of iteration 6
[2023-07-12 15:08:49] Iteration - 6 - refill infrequently used haplotypes
[2023-07-12 15:08:49] Refill infrequently used haplotypes - on average, 29.6% of regions replaced
[2023-07-12 15:08:49] Start of iteration 7
[2023-07-12 15:09:05] Start of iteration 8
[2023-07-12 15:09:22] Shuffle haplotypes - Iteration 8 - change on average 17 intervals out of 17 considered
[2023-07-12 15:09:22] Start of iteration 9
[2023-07-12 15:09:39] Start of iteration 10
[2023-07-12 15:09:55] Iteration - 10 - refill infrequently used haplotypes
[2023-07-12 15:09:55] Refill infrequently used haplotypes - on average, 30.3% of regions replaced
[2023-07-12 15:09:55] Start of iteration 11
[2023-07-12 15:10:12] Start of iteration 12
[2023-07-12 15:10:29] Shuffle haplotypes - Iteration 12 - change on average 19 intervals out of 20 considered
[2023-07-12 15:10:29] Start of iteration 13
[2023-07-12 15:10:45] Start of iteration 14
[2023-07-12 15:11:01] Iteration - 14 - refill infrequently used haplotypes
[2023-07-12 15:11:01] Refill infrequently used haplotypes - on average, 26.3% of regions replaced
[2023-07-12 15:11:01] Start of iteration 15
[2023-07-12 15:11:17] Start of iteration 16
[2023-07-12 15:11:35] Shuffle haplotypes - Iteration 16 - change on average 19 intervals out of 21 considered
[2023-07-12 15:11:35] Start of iteration 17
[2023-07-12 15:11:51] Start of iteration 18
[2023-07-12 15:12:07] Iteration - 18 - refill infrequently used haplotypes
[2023-07-12 15:12:07] Refill infrequently used haplotypes - on average, 27.2% of regions replaced
[2023-07-12 15:12:07] Start of iteration 19
[2023-07-12 15:12:25] Start of iteration 20
[2023-07-12 15:12:43] Start of iteration 21
[2023-07-12 15:13:00] Start of iteration 22
[2023-07-12 15:13:17] Start of iteration 23
[2023-07-12 15:13:34] Start of iteration 24
[2023-07-12 15:13:51] Start of iteration 25
[2023-07-12 15:17:11] Split reads, average N=11 (0.045 %)
[2023-07-12 15:17:11] Start of iteration 26

error: Mat::operator(): index out of bounds

error: Mat::operator(): index out of bounds

error: Mat::operator(): index out of bounds

error: Mat::operator(): index out of bounds

error: Mat::operator(): index out of bounds
[2023-07-12 15:17:11] Error in forwardBackwardHaploid(sampleReads = sampleReads, eHapsCurrent_tc = eHapsCurrent_tc, :
Mat::operator(): index out of bounds

Error in check_mclapply_OK(single_iteration_results) :
An error occured during STITCH. The first such error is above
In addition: Warning message:
In mclapply(sampleRanges, mc.cores = nCores, FUN = subset_of_complete_iteration, :
all scheduled cores encountered errors in user code

@rwdavies
Copy link
Owner

Hey Richard!

As a general comment, I'm not sure I'd advise running pseudoHaploid anymore. I think for most instances, including here, diploid is probably the best bet.

The error looks legit, but is happening at iteration 26, which is just after iteration 25, which is the default iteration for splitReadIterations. My suggestion would be to try and turn this off (possibly by setting to NA, or -1, I can't remember). What this tries to do (splitReadIterations) is detect long reads that span ancestral recombination breakpoints and split them, for an overall better model fit. But I feel it is more hassle than it's worth. It's fine to disable it.

Best,
Robbie

@rmott
Copy link
Author

rmott commented Jul 12, 2023 via email

@rwdavies
Copy link
Owner

For truly haploid data, diploid isn't so bad. What about "diploid-inbred"? That outputs diploid but assumes the samples are haploid (I think this is what you want to do)

p.s. if you want to test it out, you might want to turn on plot_shuffle_haplotype_attempts, then send me the plots, we can discuss. the default parameters are tuned for human / mice, and I'm not sure about Ne / recomb rate for arabidopsis, you might want to e.g. set shuffle_bin_radius lower, we can discuss

Robbie

@rmott
Copy link
Author

rmott commented Jul 12, 2023 via email

@rwdavies
Copy link
Owner

Sure, do you want to come grab lunch at Somerville some day perhaps? I'm free quite often now during the summer

@rmott
Copy link
Author

rmott commented Jul 12, 2023 via email

@rwdavies
Copy link
Owner

how old is your bgzip? maybe update? bgzip is the easiest thing to install since sliced bread

this bit "unable to start device JPEG " is on you, google search for how to fix

you can potentially get around it using plotAfterImputation = FALSE I think, but plots are nice!

next week for lunch is fine, maybe email my gmail

@rmott
Copy link
Author

rmott commented Jul 12, 2023 via email

@rmott
Copy link
Author

rmott commented Jul 12, 2023 via email

@rwdavies
Copy link
Owner

http://www.htslib.org/download/

should be in htslib I think

@rwdavies
Copy link
Owner

I think also possibly

./scripts/install-dependencies.sh bgzip

should install a copy and symlink it to your STITCH github home dir

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants