Difference between Scallop-LR and Scallop releases? #23

Open
LehmannN opened this issue Apr 21, 2020 · 2 comments


@LehmannN

Hello,

I installed both Scallop-LR v0.9.2 and Scallop v0.10.4 and tried running each on Nanopore data. Scallop-LR is not working (it runs but returns nothing), whereas Scallop works fine when I follow the advice in issue #11. I tried multiple parameter sets for both versions (with very low thresholds for Scallop-LR, to see whether it would detect anything at all):

Scallop-LR:
--verbose 2
--library_type unstranded
--min_transcript_coverage 1
--min_single_exon_coverage 2
--min_transcript_length_increase 10
--min_transcript_length_base 10
--min_mapping_quality 1
--min_bundle_gap 1
--min_num_hits_in_bundle 1
--min_splice_hits 1
--min_bundary_hits 2

Scallop:
--verbose 1
--library_type unstranded
--min_transcript_coverage 1
--min_single_exon_coverage 10
--min_transcript_length_increase 50
--min_transcript_length_base 150
--min_mapping_quality 1
--max_num_cigar 10000
--min_bundle_gap 50
--min_num_hits_in_bundle 5
--min_flank_length 3
--min_splice_bundary_hits 1

So I wondered:
1- Is there a reason Scallop-LR is not working? I thought this version would be better suited to long-read data, but so far I have not been able to get it to work.
2- What are the key differences between Scallop-LR and Scallop?
3- Do you have any new advice, beyond that in issue #11, for running Scallop on ONT data?

Thanks a lot for your help,

Nathalie

@shaomingfu
Collaborator

Hello,

> 1- Is there a reason Scallop-LR is not working? I thought this version would be better suited to long-read data, but so far I have not been able to get it to work.

The current version of Scallop-LR is optimized primarily for PacBio Iso-Seq data. We haven't tested it on ONT data.

> 2- What are the key differences between Scallop-LR and Scallop?

(1) Scallop-LR uses the 5'/3' primers in PacBio reads to detect the start and end boundaries of expressed transcripts (this information is passed to Scallop-LR via a header file);
(2) Scallop-LR tries to correct the coordinates of splice junctions, because long-read data have a high error rate;
(3) Scallop-LR uses a post-processing step that clusters transcripts to reduce false positives, again to address the high error rate.
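To make item (2) concrete, here is a minimal sketch of one generic way to correct noisy junction coordinates: greedily snap each observed junction to a better-supported junction whose donor and acceptor both lie within a few bases. This is a hypothetical illustration, not Scallop-LR's actual procedure; the `correct_junctions` name, the 5 bp window, and the greedy rule are all assumptions.

```python
from collections import Counter
from typing import Dict, List, Tuple

# A splice junction as (donor, acceptor) coordinates on one chromosome/strand.
Junction = Tuple[int, int]


def correct_junctions(observed: List[Junction], window: int = 5) -> Dict[Junction, Junction]:
    """Hypothetical sketch: map each observed junction to a corrected one.

    Junctions are visited from most to least supported. A junction whose donor
    and acceptor are both within `window` bp of an existing anchor is snapped
    to that anchor; otherwise it becomes a new anchor itself.
    """
    support = Counter(observed)  # number of reads supporting each junction
    anchors: List[Junction] = []
    mapping: Dict[Junction, Junction] = {}
    for junction in sorted(support, key=lambda j: (-support[j], j)):
        for anchor in anchors:
            if (abs(anchor[0] - junction[0]) <= window
                    and abs(anchor[1] - junction[1]) <= window):
                mapping[junction] = anchor
                break
        else:  # no nearby anchor: keep this junction as-is
            anchors.append(junction)
            mapping[junction] = junction
    return mapping


if __name__ == "__main__":
    reads = [(100, 500)] * 10 + [(102, 499)] * 2 + [(300, 800)] * 3
    print(correct_junctions(reads))
```

Here the two reads supporting (102, 499) are snapped to the ten-read junction (100, 500), while (300, 800) is far from any anchor and is kept unchanged.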

The core algorithm (decomposing the splice graph into paths in the presence of phasing paths) is the same in Scallop-LR and Scallop.

> 3- Do you have any new advice, beyond that in issue #11, for running Scallop on ONT data?

For now, you can try Scallop on ONT data. Your parameters look good to me; you could also set --min_num_hits_in_bundle 1 to further increase sensitivity, since long-read data usually have low coverage.
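For reference, the Scallop settings posted in this thread, with `--min_num_hits_in_bundle` lowered to 1 as suggested, can be assembled into a command line as below. This is a minimal Python sketch: the file names are placeholders, and it assumes `scallop` is on the `PATH` and takes the input BAM and output GTF via `-i` and `-o`.

```python
import shlex
from typing import List

# Scallop options taken from this thread; --min_num_hits_in_bundle is lowered
# from 5 to 1 for low-coverage long reads, as suggested.
ONT_OPTIONS = [
    ("--verbose", "1"),
    ("--library_type", "unstranded"),
    ("--min_transcript_coverage", "1"),
    ("--min_single_exon_coverage", "10"),
    ("--min_transcript_length_increase", "50"),
    ("--min_transcript_length_base", "150"),
    ("--min_mapping_quality", "1"),
    ("--max_num_cigar", "10000"),
    ("--min_bundle_gap", "50"),
    ("--min_num_hits_in_bundle", "1"),
    ("--min_flank_length", "3"),
    ("--min_splice_bundary_hits", "1"),  # flag spelled exactly as in the thread
]


def scallop_command(bam_path: str, gtf_path: str) -> List[str]:
    """Build the argument list for one Scallop run (input BAM, output GTF)."""
    cmd = ["scallop", "-i", bam_path, "-o", gtf_path]
    for flag, value in ONT_OPTIONS:
        cmd += [flag, value]
    return cmd


if __name__ == "__main__":
    # Placeholder file names; print the command instead of running it.
    print(shlex.join(scallop_command("ont_reads.bam", "ont_transcripts.gtf")))
```

To actually run it, the list can be passed to `subprocess.run(cmd, check=True)`. Keeping the options in one list makes it easy to compare alternative parameter sets side by side.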

Thanks for using our tools!


@LehmannN
Author

Thanks for your detailed reply!

> (1) Scallop-LR uses the 5'/3' primers in PacBio reads to detect the start and end boundaries of expressed transcripts (this information is passed to Scallop-LR via a header file);

That may be one reason Scallop-LR does not work with ONT data. Could you point me to the part of the code where this is done, so that I can try to modify it?

> you can set --min_num_hits_in_bundle 1

It is not clear to me exactly what a "bundle" is (I couldn't find more information in either the paper or the documentation). Could you explain it in a bit more detail? Thanks!
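The thread does not define a bundle. Judging from the parameter names, it appears to be a cluster of nearby read alignments (roughly, one gene locus): a new bundle starts when a read begins more than `--min_bundle_gap` bases beyond the right end of the current one, and bundles with fewer than `--min_num_hits_in_bundle` reads are dropped. The Python sketch below illustrates that interpretation on a single chromosome; it is an assumption, not Scallop's code, and `group_into_bundles` is a hypothetical name.

```python
from typing import List, Tuple

# An alignment as (start, end) on one chromosome; spliced reads are
# represented only by their outermost coordinates here.
Read = Tuple[int, int]


def group_into_bundles(reads: List[Read], min_bundle_gap: int = 50,
                       min_num_hits_in_bundle: int = 1) -> List[List[Read]]:
    """Hypothetical sketch of bundling: cluster reads separated by small gaps."""
    bundles: List[List[Read]] = []
    current: List[Read] = []
    right_end = 0  # rightmost coordinate covered by the current bundle
    for start, end in sorted(reads):
        # A gap larger than min_bundle_gap closes the current bundle.
        if current and start - right_end > min_bundle_gap:
            bundles.append(current)
            current = []
        right_end = max(right_end, end) if current else end
        current.append((start, end))
    if current:
        bundles.append(current)
    # Drop bundles supported by too few reads.
    return [b for b in bundles if len(b) >= min_num_hits_in_bundle]


if __name__ == "__main__":
    reads = [(100, 600), (150, 700), (720, 900), (2000, 2500), (5000, 5400), (5010, 5300)]
    for threshold in (1, 2):
        sizes = [len(b) for b in group_into_bundles(reads, 50, threshold)]
        print(threshold, sizes)
```

With these toy reads, the lone read at 2000-2500 forms its own bundle, which survives only when the threshold is 1. Under this interpretation, that is why lowering `--min_num_hits_in_bundle` can help with sparse long-read coverage.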

I also have two more questions:
1- I end up with many artifactual transcripts (like those described in issue #11). At the time, you said:

> For now, you can try setting --min_splice_bundary_hits to a higher number (say, 2 to 5) to filter out junctions that are supported by fewer than that number of reads. But this may also filter out lowly expressed transcripts. [...] You can also try post-processing these transcripts by merging similar transcripts that differ by only a few base pairs on some exons.

Do you know of a tool that can merge similar transcripts? I thought of bedtools merge, but if you know of something else, I'd be glad to hear it!
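As a rough illustration of the post-processing suggested in issue #11, the sketch below greedily merges transcripts that have the same number of exons and whose exon boundaries all agree within a tolerance. `bedtools merge` combines overlapping intervals rather than comparing whole exon chains, so a chain-aware comparison is used here instead. This is a hypothetical sketch, not an existing tool: the function names, the 10 bp tolerance, and the rule of keeping the first-listed transcript as the representative are assumptions.

```python
from typing import List, Tuple

Exon = Tuple[int, int]
# Exons sorted by start; all exons on the same chromosome and strand.
Transcript = List[Exon]


def similar(a: Transcript, b: Transcript, tolerance: int) -> bool:
    """True if both transcripts have the same exon count and every exon
    boundary differs by at most `tolerance` bp."""
    if len(a) != len(b):
        return False
    return all(abs(x0 - y0) <= tolerance and abs(x1 - y1) <= tolerance
               for (x0, x1), (y0, y1) in zip(a, b))


def merge_similar(transcripts: List[Transcript], tolerance: int = 10) -> List[Transcript]:
    """Greedy merge: a transcript is dropped if it is similar to an already
    kept representative, so list the best-supported transcripts first."""
    kept: List[Transcript] = []
    for t in transcripts:
        if not any(similar(t, k, tolerance) for k in kept):
            kept.append(t)
    return kept


if __name__ == "__main__":
    a = [(100, 200), (300, 400), (500, 650)]
    b = [(103, 200), (302, 398), (500, 660)]  # a with small boundary shifts
    print(merge_similar([a, b], tolerance=10))
```

Sorting transcripts by coverage before calling `merge_similar` keeps the best-supported version of each cluster. Because the outermost ends of long-read transcripts tend to vary more than internal splice sites, one might also allow a larger tolerance on the first and last exon boundaries.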

2- I have both ONT and short-read data for the same cell type. Would you recommend running both datasets together in a single run, or running them separately and merging the two results? Or would it be better to turn to tools like IDP-denovo, as mentioned in the paper?

Thanks! Have a nice day.

Nathalie
