Differences in Souporcell results using Demuxafy pipeline vs standalone tool #58

Open
ubinKim opened this issue Dec 5, 2024 · 2 comments

Comments

@ubinKim

ubinKim commented Dec 5, 2024

When running Souporcell through v3.0.0 of the Demuxafy pipeline on snATAC-seq data, I obtained results that differ significantly from those generated by running Souporcell as a standalone tool. This seems strange to me because I assumed that Demuxafy serves as a wrapper for the various demultiplexing tools, so I expected the results to match. Upon inspecting the output files, I noticed that the following seven files were not generated when using the Demuxafy pipeline:

  • fastqs.done
  • minimap.err
  • remapping.done
  • retag.err
  • retagging.done
  • souporcell_minimap_tagged_sorted.bam
  • souporcell_minimap_tagged_sorted.bam.bai

Despite this, no errors were reported, and the pipeline appeared to complete all processes successfully. However, the droplet assignment results were drastically different between the two approaches.
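A quick way to confirm which of the intermediate files listed above are absent from a given output directory is a small shell helper like the one below. This is only a sketch: the function name `check_souporcell_outputs` is made up for illustration, and the file list is simply the seven names reported in this issue, not an authoritative list of Souporcell outputs.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report which of the seven intermediate files named in
# this issue are missing from a Souporcell output directory.
check_souporcell_outputs() {
  local outdir="$1"
  local f
  for f in fastqs.done minimap.err remapping.done retag.err retagging.done \
           souporcell_minimap_tagged_sorted.bam \
           souporcell_minimap_tagged_sorted.bam.bai; do
    # -e is true for any existing file, regardless of size or type
    if [ ! -e "${outdir}/${f}" ]; then
      echo "missing: ${f}"
    fi
  done
}

# Example call (SOUPORCELL_OUTDIR as used in the commands in this issue):
#   check_souporcell_outputs "${SOUPORCELL_OUTDIR}"
```

Running it on both output directories and comparing the lists shows at a glance which steps (FASTQ extraction, remapping, retagging) left no trace in each run.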

For reference, here is the code using the Demuxafy pipeline:

singularity exec Demuxafy.sif Souporcell.py \
  -i ${BAM} \
  -b ${BARCODES} \
  -f ${FASTA} \
  -t ${THREADS} \
  -o ${SOUPORCELL_OUTDIR} \
  -k ${N} \
  --common_variants ${VCF} \
  --no_umi True

singularity exec Demuxafy.sif bash souporcell_summary.sh ${SOUPORCELL_OUTDIR}/clusters.tsv > ${SOUPORCELL_OUTDIR}/souporcell_summary.tsv

And here is the code to run Souporcell without Demuxafy (except for summarizing the results, for which I used the souporcell_summary.sh script provided by Demuxafy):

singularity exec souporcell_release.sif souporcell_pipeline.py \
  -i ${BAM} \
  -b ${BARCODES} \
  -f ${FASTA} \
  -t ${THREADS} \
  -o ${SOUPORCELL_OUTDIR} \
  -k ${N} \
  --common_variants ${VCF} \
  --no_umi True

singularity exec Demuxafy.sif bash souporcell_summary.sh ${SOUPORCELL_OUTDIR}/clusters.tsv > ${SOUPORCELL_OUTDIR}/souporcell_summary.tsv

With this code, I got the following results when I demultiplexed three samples:
[image]

As you can see, I used the same arguments in both cases. The input files, parameters, and Souporcell version (v2.5) were also identical. The attached code examples do not use reference SNP genotypes, but the behavior was consistent when using reference SNP genotypes as well.
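To quantify how far two runs disagree at the droplet level, the two clusters.tsv files can be cross-tabulated per barcode. The sketch below assumes that clusters.tsv is tab-separated with a header line and has the barcode, status, and assignment in columns 1 to 3. That layout and the function name `crosstab_assignments` are assumptions here and should be checked against the actual files. Because cluster numbers need not correspond between runs when no reference genotypes are given, a cross-tabulation is more informative than a line-by-line diff.

```bash
#!/usr/bin/env bash
# Hypothetical helper: cross-tabulate status:assignment pairs for barcodes
# present in both clusters.tsv files.
# Assumed layout: tab-separated, one header line,
#   column 1 = barcode, column 2 = status, column 3 = assignment.
crosstab_assignments() {
  local run_a="$1" run_b="$2"
  awk -F'\t' '
    FNR == 1 { next }                               # skip each header line
    NR == FNR { first[$1] = $2 ":" $3; next }       # remember run A by barcode
    ($1 in first) { count[first[$1] "\t" $2 ":" $3]++ }  # pair with run B
    END { for (k in count) print k "\t" count[k] }
  ' "$run_a" "$run_b" | sort
}

# Example call (both paths are placeholders):
#   crosstab_assignments demuxafy_out/clusters.tsv standalone_out/clusters.tsv
```

Each output line is `runA_status:assignment`, `runB_status:assignment`, and the number of shared barcodes with that combination. Two runs that agree up to relabeling put nearly all barcodes in one cell per row.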

Questions:

  1. What might be the reason for the difference in results between the two approaches? Are there undocumented preprocessing steps or any modifications to default options within the Demuxafy pipeline that could affect demultiplexing outputs? I also reviewed the publication but could not find relevant details about such discrepancies. Could similar differences arise when using other demultiplexing tools within Demuxafy?

  2. When using the Demuxafy pipeline, I also found that Souporcell fails to run unless SNP genotype information is provided (i.e., you must use either the --common_variants or the --known_genotypes option). Is it intended behavior that Demuxafy requires SNP genotype information for Souporcell while the standalone tool does not?

Thank you in advance!

@drneavin
Owner

Hi @ubinKim,

I apologise for my delayed response. I'm pretty sure this is an issue with my wrapping script, which is meant to check that the VCFs and BAMs use the same chr encoding, but it doesn't appear to be parsing the arguments correctly. I'll update this in the new year, but in the meantime you can run the typical souporcell_pipeline.py from Demuxafy as well to double-check that you get the same results. If you do run that, I would be interested to hear whether you get the same results as running directly from the souporcell image.
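For readers who want to check the chr encoding by hand, the kind of comparison described above can be sketched as follows. This is not the Demuxafy wrapper's code, only an illustration of the idea. The function name `contigs_missing_from_bam` is hypothetical, and it expects two plain-text inputs: a SAM header (for example from `samtools view -H`) and an uncompressed VCF.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print contig names used in VCF records that do not
# appear as @SQ SN: entries in the BAM header.
#   $1 = text file containing the SAM header
#   $2 = uncompressed VCF
contigs_missing_from_bam() {
  awk -F'\t' '
    NR == FNR {                                   # first file: SAM header
      if ($1 == "@SQ")
        for (i = 2; i <= NF; i++)
          if ($i ~ /^SN:/) bam[substr($i, 4)] = 1 # strip the "SN:" prefix
      next
    }
    /^#/ { next }                                 # skip VCF header lines
    !($1 in bam) && !seen[$1]++ { print $1 }      # report each contig once
  ' "$1" "$2"
}

# Example call (header.sam and variants.vcf are placeholder file names):
#   samtools view -H "${BAM}" > header.sam
#   gzip -dc "${VCF}" > variants.vcf      # only if the VCF is gzip-compressed
#   contigs_missing_from_bam header.sam variants.vcf
```

Empty output means every contig in the VCF records also exists in the BAM header. Output such as `1` alongside a BAM header that uses `chr1` points to a chr-prefix mismatch.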

Thanks!
-Drew

@ubinKim
Author

ubinKim commented Dec 18, 2024

Hi @drneavin,

Thanks for the reply.
I ran souporcell_pipeline.py from the Demuxafy image using the same inputs, and I did indeed get the same results as running directly from the souporcell image (without Demuxafy). It also ran without SNP genotype information being provided.

Could you also check other demultiplexing tools within Demuxafy to see if they might have similar problems when you update the script? Thank you so much for your time and effort.
