Hello,
I am running Drop-seq as packaged in Demuxafy, and I find that very few SNPs pass AssignCellsToSamples; most reads are lost at the MissingTagFilteringIterator step. I suspect this is because my TagReadWithGeneFunction step did not work, although its output did not report any errors. Here is an excerpt of the log:
INFO 2025-01-18 11:48:20 TagReadWithGeneFunction Processed 280,000,000 records. Elapsed time: 00:54:39s. Time for last 1,000,000: 11s. Last read position: chrX:119,577,305
INFO 2025-01-18 11:48:32 TagReadWithGeneFunction Processed 281,000,000 records. Elapsed time: 00:54:50s. Time for last 1,000,000: 11s. Last read position: chrX:141,178,215
INFO 2025-01-18 11:48:45 TagReadWithGeneFunction Processed 282,000,000 records. Elapsed time: 00:55:03s. Time for last 1,000,000: 13s. Last read position: chrX:154,400,619
INFO 2025-01-18 11:48:57 TagReadWithGeneFunction Processed 283,000,000 records. Elapsed time: 00:55:15s. Time for last 1,000,000: 12s. Last read position: KI270733.1:135,046
INFO 2025-01-18 12:15:13 TagReadWithGeneFunction Processed 441,000,000 records. Elapsed time: …… 01:21:32s. Time for last 1,000,000: 9s. Last read position: */*
INFO 2025-01-18 12:15:23 TagReadWithGeneFunction Processed 442,000,000 records. Elapsed time: 01:21:41s. Time for last 1,000,000: 9s. Last read position: */*
INFO 2025-01-18 12:15:32 TagReadWithGeneFunction Processed 443,000,000 records. Elapsed time: 01:21:51s. Time for last 1,000,000: 9s. Last read position: */*
INFO 2025-01-18 12:15:42 TagReadWithGeneFunction Processed 444,000,000 records. Elapsed time: 01:22:00s. Time for last 1,000,000: 9s. Last read position: */*
[Sat Jan 18 12:15:51 UTC 2025] org.broadinstitute.dropseqrna.metrics.TagReadWithGeneFunction done. Elapsed time: 82.16 minutes. Runtime.totalMemory()=2149580800
The command and the first two records of the resulting dropulation_tag_bam.bam file are as follows:
apptainer exec Demuxafy.sif TagReadWithGeneFunction \
ANNOTATIONS_FILE=refdata-cellranger-GRCh38-3.0.0/genes/genes.gtf \
INPUT=merge_sorted.bam \
OUTPUT=dropulation_tag_bam.bam
A00984:468:HCFKMDSX2:3:1659:24596:1658 0 chr1 10029 0 82M8S * 0 0 CCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACACTAACCCAAACCCTAACACTAACACAAAACAAAAA FFF,FF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,:F,FF,,F,FF,F,,FFF,,,FF,,,,:F,,:,FF, CB:Z:TGCGGGTGTGCCTTTC-1 UB:Z:TTATCCGGTAGC RE:A:I XF:Z:INTERGENIC RG:Z:20k_NSCLC_DTC_3p_nextgem_Multiplex_intron:0:1:HCFKMDSX2:3-7123AC16 NH:i:6 HI:i:1 nM:i:4 CR:Z:TGCGGGTGTGCCTTTC UR:Z:TTATCCGGTAGC AS:i:72 CY:Z:FFFFFFFFFFF,,FFF UY:Z:FFF:FFFF:FFF xf:i:0
A00984:467:HC7WWDSX2:4:2236:12572:25269 16 chr1 10535 1 63M27S * 0 0 GTACCACCGAAATCTGTGCAGAGGAGAACGCAGCTCCGCCCTCGCGGTGCTCTCCGGGTCTGTCCCCATGTACTCTGCGTTGATACCACT FFFF:FFFFF,FF:FFFFFFFFFFFFFFFFF:F:FFFF,F,FFFFFFFFFFFFFFF,FFFFF:F:FFFFFFF:FFFFFFFF:FFFFFFFF CB:Z:GTGCGTGTCGACATAC-1 UB:Z:CTGCAGGCCACA RE:A:I XF:Z:INTERGENIC RG:Z:20k_NSCLC_DTC_3p_nextgem_Multiplex_intron:0:1:HC7WWDSX2:4 NH:i:3 HI:i:1 nM:i:1 CR:Z:GTGCGTGTCGACATAC UR:Z:CTGCAGGCCACA AS:i:60 CY:Z:FFFFFFFFFFFFFFFF UY:Z:FFFF,FFFFFFF xf:i:0 ts:i:26
Only 3 SNPs pass the AssignCellsToSamples step. The command and the tool's output are as follows:
apptainer exec Demuxafy.sif Dropulation_AssignCellsToSamples.py --CELL_BC_FILE merge_barcode.tsv \
--INPUT_BAM dropulation_tag_bam.bam \
--OUTPUT assignments.tsv.gz \
--VCF result_sorted.vcf \
--SAMPLE_FILE donor_list.txt \
--CELL_BARCODE_TAG CB \
--MOLECULAR_BARCODE_TAG UB \
--VCF_OUTPUT assignment.vcf \
--MAX_ERROR_RATE 0.05
INFO 2025-01-18 09:12:39 SNPUMIBasePileupIterator Processed 756,000,000 records. Elapsed time: 00:22:36s. Time for last 1,000,000: 1s. Last read position: */*. Last read name: A00984:468:HCFKMDSX2:2:1219:15808:9173
INFO 2025-01-18 09:12:41 SNPUMIBasePileupIterator Processed 757,000,000 records. Elapsed time: 00:22:37s. Time for last 1,000,000: 1s. Last read position: */*. Last read name: A00984:467:HC7WWDSX2:3:1420:27145:10676
INFO 2025-01-18 09:12:41 ChromosomeFilteringIterator Records pass [452343157] records fail [304945439]
INFO 2025-01-18 09:12:41 MapQualityFilteredIterator Records pass [438809206] records fail [13533951]
INFO 2025-01-18 09:12:41 MissingTagFilteringIterator Records pass [10876] records fail [438798330]
INFO 2025-01-18 09:12:41 SNPUMICellReadIteratorWrapper UMIs that with at least one SNP [870] of total UMIs [2077]
INFO 2025-01-18 09:12:44 AssignCellsToSamples Processed [3] SNPs in BAM + VCF
INFO 2025-01-18 09:12:44 AssignCellsToSamples Finished!
[Sat Jan 18 09:12:44 UTC 2025] org.broadinstitute.dropseqrna.barnyard.digitalallelecounts.sampleassignment.AssignCellsToSamples done. Elapsed time: 22.75 minutes. Runtime.totalMemory()=188743680
I have no idea what is causing this, and any suggestions would be greatly appreciated!
Thanks,
PJChen
I'm finding your output dump a bit hard to read and incomplete. You might instead attach your full logs for these two programs as text. You mentioned that TagReadWithGeneFunction failed. Here's what I would do to check on that:
Run TagReadWithGeneFunction on your bam.
Look at your BAM with samtools, and look for reads that are tagged appropriately - they would have gn, gs, gf tags. Check what proportion of your reads have these tags.
Example:
#get the number of reads that were tagged.
samtools view dropulation_tag_bam.bam |grep gn:Z: |wc -l
#get the total number of reads
samtools view dropulation_tag_bam.bam |wc -l
For some random data I just pulled, I see:
40303253 reads have the gn tag (the read is assigned to at least one gene)
57687585 reads in total
This is expected - the majority of the reads in a good RNASeq experiment will be genic, but not all reads are genic. Showing just the first two reads at the start of chromosome 1 (where there's no gene) doesn't prove that the program didn't work. The process of how this program works and how to interpret reads is here.
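The two counts above can also be gathered in a single pass over the BAM. The following is only a minimal sketch: `tag_fraction` is a hypothetical helper name, it assumes `awk` is available, and it simply checks the optional SAM fields (column 12 onward) for a tag prefix. With the example numbers above (40303253 of 57687585), the fraction comes out at roughly 0.70.

```shell
# Hypothetical helper: summarize how many SAM records carry a given tag.
# Reads SAM text on stdin; the tag prefix defaults to gn:Z:.
tag_fraction() {
    awk -v tag="${1:-gn:Z:}" '
        /^@/ { next }                      # skip SAM header lines
        {
            total++
            # optional tags start at column 12 of a SAM record
            for (i = 12; i <= NF; i++) {
                if (index($i, tag) == 1) { tagged++; break }
            }
        }
        END {
            printf "tagged=%d total=%d fraction=%.4f\n", tagged, total, (total ? tagged / total : 0)
        }'
}

# Usage (assumes samtools is on PATH; file name taken from the question):
# samtools view dropulation_tag_bam.bam | tag_fraction gn:Z:
# samtools view dropulation_tag_bam.bam | tag_fraction XF:Z:
```

The second usage line is there because the two records pasted in the question carry an XF:Z:INTERGENIC tag rather than gn/gs/gf; whether that reflects a different tag naming in the packaged version is an assumption worth checking against the full logs.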
If the number of tagged reads is low, then you should check that your GTF is intact (not truncated). My guess is that this step probably did work, and your problem is later. Typically, having a VCF file that has valid records is the harder part of the process. The Census-Seq cookbook on this GitHub repository's main page has a bunch of information on how to properly clean up VCF files. You might also want to take a look at the donor assignment cookbook if you haven't already.
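One rough way to check for a truncated GTF is to summarize it and look for malformed records or a suspiciously early last contig. This is a sketch under stated assumptions, not a validator: `gtf_summary` is a hypothetical helper name, and it assumes the file is tab-separated with 9 columns per record and contains `gene` feature rows.

```shell
# Hypothetical helper: rough sanity summary of a GTF read from stdin.
gtf_summary() {
    awk -F'\t' '
        /^#/ { next }                      # skip comment/header lines
        NF == 0 { next }                   # skip empty lines
        NF != 9 { bad++; next }            # a GTF record has 9 tab-separated columns
        {
            contigs[$1]++                  # track which contigs appear
            if ($3 == "gene") genes++      # count gene-level records
            last = $1                      # remember the contig of the last good record
        }
        END {
            n = 0
            for (c in contigs) n++
            printf "contigs=%d genes=%d malformed=%d last_contig=%s\n", n, genes, bad, last
        }'
}

# Usage (path taken from the question):
# gtf_summary < refdata-cellranger-GRCh38-3.0.0/genes/genes.gtf
```

A nonzero `malformed` count, or a contig and gene count far below what the reference should contain, would suggest the file was cut short.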