-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Extracting methylation data from aligned BAMs #63
Comments
Hi thanks for letting us know this, the ML tag is not meant to be malformed we will investigate and get back to you |
Hi! I am also interested in extracting the methylation information and was wondering if there is anything new about this topic? I realised that you are using the ML:Z: tag but couldnt find any description for this in the SAM format specification. Could you maybe commend why you use this tag and not the ML:B:C? |
Just looking at the code, I think you've hit on the error @Henrikkoe. The code was adapted to use pysam from code that directly wrote SAM text. The value of the tag is being created as a string and passed to pysam which gladly writes the string as a Z format tag rather than a B format tag. We will need to change this. |
I just quickly changed the tags from ML:Z: to ML:B:C for a test set and got this error for "modkit pileup":
So I think for each alignment within a concatemer the MN tag should be updated according the alignment length. |
Thanks @Henrikkoe, That sounds like a secondary issue, after chopping up the read we need to invalidate the You should be able to test this further for us. The We should be able to recalculate and amend MN (worst case we can discard it) for the monomer reads. |
I did a small check and the |
I may have confused things: yesterday I made a release of pore-c-py that ensures MN is updated correctly. In the process I changed the behaviour to consistently remove all the mod tags if one is removed (because it makes no sense to leave them in a partial state). |
I am still a bit confused about the tags that are removed and added at which time during processing. If I understood it correctly then is the input for pore-c-py digest an unaligned bam file with the called base modifications. At this moment in the pipeline there was no alignment and only the digest of the concatemer into monomers. These monomers can be alignments later in the processing and basically representing a partial state/alignment of the concatemer. Then the sequence/alignment length (MN) should be updated according to the new length. This is what I dont see in the bam file entries but I see (except of the ML:Z: to ML:B:C tag label) a correct representation of ML and MM tags. So the ML and MM tags are trimmed correctly. Only all alignments of a concatemer then have a wrong MN tag. Here a truncated representation of a bam file entry: |
You are correct: in the current release the MN tag is not being updated correctly (and the ML tag is a string tag not and array tag). We need to make a new release to correct these two issues. We've corrected the code in the pore-c-py package but not updated wf-pore-c to use that new code. |
Ah okay I see, thank you! Do you already have an estimate when the wf-pore-c will be updated? |
Hi, this is updated now in the latest release v1.2.0 |
Thank you so much!!! |
Awesome! Thanks |
It seems that there is still an issue with a not correctly trimmed ML and MM tag for some reads/alignments.
|
Oh no sorry, I think we missed a capitalisation. Will amend again and let you know when ready. |
Hi, this has now been released in version v1.2.2 |
Ask away!
Hi! Quick question on extracting methylation data from aligned reads. Our raw bams have been methylation called already, so it seems like the easiest way to extract methylation information would be to use modkit on the aligned bams at {alias}.ns.bam. However, using "modkit pileup" gives errors of the form
The code for pore-c-py seems like it is designed to pass modified bam information to the digested aligned output bams, so I am confused as to why this is happening. It may be an issue with the way I am calling modkit, but before I attempt troubleshooting further I want to ask if it should be possible to use modkit to extract methylation data from BAMs aligned from the pipeline, or if there is another preferred I should be processing methylation calls. Thank you!
The text was updated successfully, but these errors were encountered: