Which parameter can be used to filter the scallop result? #35

Open
Huangyizhong opened this issue Nov 8, 2021 · 2 comments

Comments

@Huangyizhong

Hi there,
Scallop is a good tool for assembling Illumina data, and it gave me many transcripts that other tools could not. When I used ORFfinder to predict ORFs from the Scallop results, I found many transcripts without canonical splice sites such as GT-AG, GC-AG, or AT-AC. As shown in picture 1, the Scallop results differ from the other data: many Scallop transcripts do not use canonical splice sites. Are there any parameters that can be used to filter them out? Also, as shown in picture 2, the transcript looks very strange!
Thanks so much!
Sincerely
Yizhong Huang

image

image

@shaomingfu
Collaborator

Hi Yizhong,

Re question 1: Scallop fully uses the splice sites predicted by the aligner. So far it does not contain any model or parameter to detect or filter out poorly supported non-canonical splice sites. We will probably add such a feature in future releases. For now, you may try: (1) checking whether an aligner such as STAR or HISAT2 provides parameters to control splice sites, and/or (2) writing a script of your own to filter the transcripts assembled by Scallop.
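The second suggestion could be sketched roughly as follows. This is only a minimal illustration, not part of Scallop: all function names (`read_fasta`, `read_exons`, `intron_motifs`, `is_canonical`, `filter_gtf`) are hypothetical. It assumes the genome fits in memory as a dict of chromosome name to sequence, that exon coordinates follow the standard 1-based inclusive GTF convention, and that "canonical" means GT-AG, GC-AG, or AT-AC. Transcripts with no strand (`.`) are kept if their introns are canonical on either strand, and single-exon transcripts are always kept because they have no introns to check.

```python
import re

# Donor/acceptor dinucleotide pairs treated as canonical (an assumption of this sketch).
CANONICAL = {("GT", "AG"), ("GC", "AG"), ("AT", "AC")}
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")
_TID = re.compile(r'transcript_id "([^"]+)"')


def revcomp(seq):
    """Reverse-complement a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def read_fasta(lines):
    """Parse FASTA lines into {sequence_name: sequence}."""
    genome, name, chunks = {}, None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                genome[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        genome[name] = "".join(chunks)
    return genome


def read_exons(gtf_lines):
    """Collect exons per transcript: {tid: (chrom, strand, [(start, end), ...])}."""
    exons = {}
    for line in gtf_lines:
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(fields) < 9 or fields[2] != "exon":
            continue
        match = _TID.search(fields[8])
        if match:
            entry = exons.setdefault(match.group(1), (fields[0], fields[6], []))
            entry[2].append((int(fields[3]), int(fields[4])))
    return exons


def intron_motifs(chrom_seq, strand, exon_list):
    """Yield (donor, acceptor) dinucleotides for each intron of one transcript."""
    exon_list = sorted(exon_list)
    for (_, end1), (start2, _) in zip(exon_list, exon_list[1:]):
        left = chrom_seq[end1:end1 + 2].upper()            # first two intron bases
        right = chrom_seq[start2 - 3:start2 - 1].upper()   # last two intron bases
        if strand == "-":
            # On the minus strand the donor is at the higher genomic coordinate.
            yield revcomp(right), revcomp(left)
        else:
            yield left, right


def is_canonical(chrom_seq, strand, exon_list):
    """True if every intron is canonical (on either strand when strand is unknown)."""
    strands = (strand,) if strand in ("+", "-") else ("+", "-")
    return any(
        all(motif in CANONICAL for motif in intron_motifs(chrom_seq, s, exon_list))
        for s in strands
    )


def filter_gtf(gtf_lines, genome):
    """Return the GTF lines whose transcript has only canonical introns."""
    gtf_lines = list(gtf_lines)
    keep = {
        tid
        for tid, (chrom, strand, exon_list) in read_exons(gtf_lines).items()
        if chrom in genome and is_canonical(genome[chrom], strand, exon_list)
    }
    kept_lines = []
    for line in gtf_lines:
        match = _TID.search(line)
        if match is None or match.group(1) in keep:
            kept_lines.append(line)
    return kept_lines
```

A possible way to use it would be to open the reference FASTA and the Scallop GTF, pass the file handles to `read_fasta` and `filter_gtf`, and write the returned lines to a new GTF. Note that this checks only the splice-site sequence, not how many reads support each junction.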

Re question 2: the assembled transcripts seem strange to me too. Is this sample strand-specific? If so, did you specify the library type when running Scallop?

Best,
Mingfu

@Huangyizhong
Author

@shaomingfu Thanks so much for your quick reply! It is a pity that Scallop has no parameter for filtering splice sites. I checked the human annotation file with gffread, and almost all of its transcripts use canonical splice sites. Maybe I can use gffread to filter the Scallop transcripts directly. How can I choose a proper read-count threshold for filtering out the undesired transcripts? As shown in picture 1, the Scallop transcript has two more bases (CT) than the other data. The strange transcript I attached is not from strand-specific data; how should I deal with it? I just ran Scallop as follows: `${scallop} -i ${bam[$PBS_ARRAYID]} -o ${output}/${NAME}_scallop.gtf`
Thanks again for your kind help
Sincerely
Yizhong Huang
