Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sumstatsBP_overlap.py failed to handle paired-end reads with unequal R1 and R2 lengths #85

Open
Qirong-Lin opened this issue Jan 16, 2024 · 0 comments

Comments

@Qirong-Lin
Copy link

Describe the bug
Hi Caleb, thanks for developing this amazing tool! I'm running a atac-seq dataset with unequal R1 and R2 read lengths due as I trimmed the low-quality area of R1 after sequencing. It throws out the error after running:

Traceback (most recent call ast):
File"/home/qirong/anaconda3/envs/mgatk py38/ib/python3.8/site-packages/mgatk/bin/python/sumstatsBP overlap.py",line 107,in <module>
fwd overlap use idx = np.where(fwd overlap quality > rev overlap quality)[0]ValueError: operands could not be broadcast together with shapes (53,) (52,)

After checking the code, this is because R1 covers the entire sequence or R2, or vice versa, which perhaps causes the error when extracting the overlapped area. i.e. R1 contains R1 only, overlapped region, R1 only area, while the entire R2 is overlapped with R1.

Additional context
I modified the sumstatsBP_overlap.py by adding codes to detect whether such overlap happens and extract the overlapped regions and exclusive regions by R1_only_5prime, overlapped, R1_only_3prime, or vice versa.
sumstatsBP_overlap.zip
I'm not 100% sure whether this is the proper way and looking forward to discussing more about this.

Best,
Qirong

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant