Difference between Scallop-LR and Scallop releases ? #23

LehmannN · 2020-04-21T16:05:11Z

Hello,

I installed both Scallop-LR v0.9.2 and Scallop v0.10.4 and tried to run both on Nanopore data. Scallop-LR is not working (it runs but does not return anything), but Scallop is working fine following the advice in issue #11. I tried multiple sets of parameters for both versions (with very low thresholds for Scallop-LR, to see if anything would be detected):

Scallop-LR:
--verbose 2
--library_type unstranded
--min_transcript_coverage 1
--min_single_exon_coverage 2
--min_transcript_length_increase 10
--min_transcript_length_base 10
--min_mapping_quality 1
--min_bundle_gap 1
--min_num_hits_in_bundle 1
--min_splice_hits 1
--min_bundary_hits 2

Scallop:
--verbose 1
--library_type unstranded
--min_transcript_coverage 1
--min_single_exon_coverage 10
--min_transcript_length_increase 50
--min_transcript_length_base 150
--min_mapping_quality 1
--max_num_cigar 10000
--min_bundle_gap 50
--min_num_hits_in_bundle 5
--min_flank_length 3
--min_splice_bundary_hits 1

Thus, I wondered:
1- Is there a reason Scallop-LR is not working? I thought this version would be better suited to long-read data, but I have not managed to get it to work so far.
2- What are the key differences between Scallop-LR and Scallop?
3- Do you have more (new) advice than what is in issue #11 for running Scallop with ONT data?

Thanks a lot for your help,

Nathalie

shaomingfu · 2020-04-21T16:43:10Z

> 1- Is there a reason Scallop-LR is not working? I thought this version would be better suited to long-read data, but I have not managed to get it to work so far.

The current version of Scallop-LR is primarily optimized for PacBio IsoSeq data. We haven't tested it on any ONT data.

> 2- What are the key differences between Scallop-LR and Scallop?

(1) Scallop-LR takes advantage of the 5'/3' primers in the PacBio reads to detect the starting/ending boundaries of expressed transcripts (piped to Scallop-LR via a header file);
(2) Scallop-LR tries to correct the coordinates of splice junctions, as long-read data suffer from a high error rate;
(3) Scallop-LR uses a postprocessing step that clusters transcripts to reduce false positives, again to address the high error rate.

The core algorithm (to decompose splice graph into paths in the presence of phasing paths) used in Scallop-LR is the same as in Scallop.
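To make point (3) concrete, here is a minimal sketch of what clustering near-identical transcripts could look like. It is an illustration only, not Scallop-LR's actual postprocessing code: the representation of a transcript as a chain of introns, the greedy strategy, and the tolerance `tol` are all assumptions made for the example.

```python
from typing import List, Tuple

Intron = Tuple[int, int]      # (donor, acceptor) coordinates of one intron
Chain = Tuple[Intron, ...]    # ordered introns of one transcript


def similar(a: Chain, b: Chain, tol: int) -> bool:
    """True if both chains have the same number of introns and every
    junction coordinate differs by at most `tol` bases."""
    if len(a) != len(b):
        return False
    return all(
        abs(s1 - s2) <= tol and abs(e1 - e2) <= tol
        for (s1, e1), (s2, e2) in zip(a, b)
    )


def cluster_chains(chains: List[Chain], tol: int = 5) -> List[List[Chain]]:
    """Greedy clustering: each chain joins the first cluster whose
    representative (its first member) it is similar to."""
    clusters: List[List[Chain]] = []
    for chain in chains:
        for cluster in clusters:
            if similar(cluster[0], chain, tol):
                cluster.append(chain)
                break
        else:
            # No existing cluster matched, so this chain starts a new one.
            clusters.append([chain])
    return clusters


# Hypothetical intron chains, invented for the example.
chains = [
    ((100, 200), (300, 400)),
    ((102, 199), (300, 403)),   # same structure, junctions shifted by a few bp
    ((100, 200), (350, 400)),   # second intron starts 50 bp later
]
clusters = cluster_chains(chains, tol=5)
print(len(clusters))  # 2
```

The first two chains fall into one cluster because every junction is within 5 bp; the third differs by 50 bp at one junction and stays separate. One transcript per cluster would then be reported.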

> 3- Do you have more (new) advice than what is in issue #11 for running Scallop with ONT data?

For now you may try Scallop for ONT data. The parameters you set look good to me (you can set --min_num_hits_in_bundle 1 to further increase sensitivity, considering that long-read data usually have low coverage).
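As a sketch, the Scallop parameter set from the original post, with `--min_num_hits_in_bundle` lowered to 1 as suggested, could be assembled into a command line as follows. The usual `scallop -i <bam> -o <gtf>` invocation is assumed, and the file names `ont_sorted.bam` and `ont_scallop.gtf` are hypothetical placeholders.

```python
import shlex

# Options copied from the thread; only --min_num_hits_in_bundle is changed (5 -> 1).
options = {
    "--verbose": 1,
    "--library_type": "unstranded",
    "--min_transcript_coverage": 1,
    "--min_single_exon_coverage": 10,
    "--min_transcript_length_increase": 50,
    "--min_transcript_length_base": 150,
    "--min_mapping_quality": 1,
    "--max_num_cigar": 10000,
    "--min_bundle_gap": 50,
    "--min_num_hits_in_bundle": 1,
    "--min_flank_length": 3,
    "--min_splice_bundary_hits": 1,
}


def build_command(bam: str, gtf: str, opts: dict) -> list:
    """Return the argument list for one Scallop run."""
    cmd = ["scallop", "-i", bam, "-o", gtf]
    for flag, value in opts.items():
        cmd += [flag, str(value)]
    return cmd


# Hypothetical input and output paths.
cmd = build_command("ont_sorted.bam", "ont_scallop.gtf", options)
print(" ".join(shlex.quote(part) for part in cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once Scallop is installed and the BAM file is coordinate-sorted.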

Thanks for using our tools!

LehmannN · 2020-04-22T12:47:37Z

Thanks for your detailed reply !

> (1) Scallop-LR takes advantage of the 5'/3' primers in the PacBio reads to detect the starting/ending boundaries of expressed transcripts (piped to Scallop-LR via a header file);

That may be one of the reasons why Scallop-LR is not working with ONT data. Could you show me in which part of the code this is done, so I can try to make some modifications?

> you can set --min_num_hits_in_bundle 1

It is not clear to me exactly what a "bundle" is (I couldn't find more information in either the paper or the documentation). Could you explain a bit more? Thanks!
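Judging from the option names `--min_bundle_gap` and `--min_num_hits_in_bundle`, a bundle appears to be a group of aligned reads covering one locus: reads stay in the same bundle until the gap to the next read reaches the minimum gap, and bundles with too few reads are dropped. The sketch below illustrates that reading only. It is an assumption rather than Scallop's implementation, and the function name and read coordinates are invented for the example.

```python
from typing import List, Optional, Tuple

Read = Tuple[int, int]  # (start, end) of one alignment on a single chromosome


def group_into_bundles(reads: List[Read], min_gap: int, min_hits: int) -> List[List[Read]]:
    """Group position-sorted reads into bundles.

    A new bundle starts when the next read begins at least `min_gap`
    bases after the furthest end seen so far in the current bundle.
    Bundles with fewer than `min_hits` reads are discarded.
    """
    bundles: List[List[Read]] = []
    current: List[Read] = []
    current_end: Optional[int] = None
    for start, end in sorted(reads):
        if current and start - current_end >= min_gap:
            bundles.append(current)
            current = []
            current_end = None
        current.append((start, end))
        current_end = end if current_end is None else max(current_end, end)
    if current:
        bundles.append(current)
    return [b for b in bundles if len(b) >= min_hits]


# Hypothetical alignments: three reads around one locus, one read far away.
reads = [(100, 500), (450, 900), (920, 1200), (5000, 5600)]
bundles = group_into_bundles(reads, min_gap=50, min_hits=1)
print(len(bundles))  # 2
```

Under this reading, the default-like setting of 5 for `min_hits` would discard both bundles in the example, which is why lowering it to 1 helps with low-coverage long-read data.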

I also have 2 more questions:
1- I end up with a lot of artifactual transcripts (such as those described in issue #11). At that time you said:

> For now, you can try setting --min_splice_bundary_hits to a higher number (say, 2 to 5) to filter out junctions that are supported by fewer than this number of reads. But this may also filter out transcripts that are lowly expressed. [...] You can also try postprocessing these transcripts by merging similar transcripts that differ by only a few base pairs on some exons.

Do you know a tool that could merge similar transcripts? I thought of bedtools merge, but if you know of something else, I would be glad to hear about it!

2- I have ONT + short-read data for the same cell type. Would you recommend running both datasets together (one single run)? Or running them separately and merging the two results? Or would it be better to turn to tools like IDP-denovo, as mentioned in the paper?

Thanks ! Have a nice day

Nathalie
