Challenge 3 Evaluation

Submitted transcript models will be evaluated using SQANTI3 descriptors and a number of LRGASP-agreed evaluation metrics. To compute these metrics on your own data, run the evaluation script, an adapted version of the original SQANTI3 code that automatically generates an HTML report with the Challenge 3 evaluation results. Please download or clone the entire sqanti3_evaluation directory, including the utilities subfolder.

Challenge 3 of LRGASP was designed to evaluate how well different pipelines reconstruct a transcriptome without using a previous annotation or a reference genome (except for the "long and genome" and "free-style" categories, in which using a genome is allowed). For the evaluation itself, however, we will use a de novo ONT-based genome available here to obtain information about the submitted transcriptomes.

Setting up the environment

To install all the dependencies needed by the evaluation script, please use the YML file to build a conda environment.

conda env create -f sqanti3_lrgasp.yml
source activate sqanti3_lrgasp

SQANTI3 also relies on some scripts from cDNA_Cupcake. Please install it with the sqanti3_lrgasp environment activated, and add cDNA_Cupcake and cDNA_Cupcake/sequence to your PYTHONPATH.

(sqanti3_lrgasp)$ git clone
(sqanti3_lrgasp)$ cd cDNA_Cupcake
(sqanti3_lrgasp)$ python build
(sqanti3_lrgasp)$ python install

(sqanti3_lrgasp)$ export PYTHONPATH=$PYTHONPATH:<path_to>/cDNA_Cupcake/sequence/
(sqanti3_lrgasp)$ export PYTHONPATH=$PYTHONPATH:<path_to>/cDNA_Cupcake/

Remember to activate the sqanti3_lrgasp environment and set up the PYTHONPATH every time you run the evaluation.
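Because the PYTHONPATH has to be set in every new session, it can help to keep the exports in a small helper that avoids adding the same directory twice. The following is only a sketch: the function name add_to_pythonpath and the CUPCAKE_DIR variable are hypothetical and not part of the evaluation code, and the default location is a placeholder for your own clone.

```shell
# Append a directory to PYTHONPATH only if it is not already listed.
add_to_pythonpath() {
    dir="$1"
    case ":${PYTHONPATH:-}:" in
        *":${dir}:"*) ;;  # already present, nothing to do
        *) PYTHONPATH="${PYTHONPATH:+${PYTHONPATH}:}${dir}" ;;
    esac
    export PYTHONPATH
}

# CUPCAKE_DIR is a hypothetical variable; point it at your own cDNA_Cupcake clone.
CUPCAKE_DIR="${CUPCAKE_DIR:-${HOME}/cDNA_Cupcake}"
add_to_pythonpath "${CUPCAKE_DIR}"
add_to_pythonpath "${CUPCAKE_DIR}/sequence"
```

Sourcing a file with these lines after activating the sqanti3_lrgasp environment leaves both directories on the PYTHONPATH exactly once, however often it is sourced.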


SQANTI3 normally compares your transcript models against some kind of genome annotation. That is not possible here, because Challenge 3 replicates a situation in which no information beyond sequencing data is available. The only annotation file provided to SQANTI3 for Challenge 3 is therefore the spike-in annotation. Here you can find that GTF. It contains the annotation of Lexogen's SIRVs-Set 4.

LRGASP will use Illumina junction coverage to evaluate your transcript models. We therefore recommend running the evaluation with this analysis enabled. To do so:

  • SJ coverage: Splice-junction (SJ) information depends on the sample being analyzed, so you first need to run STAR with --twopassMode to map the Illumina reads against the genome and identify possible splice junctions. The resulting splice-junction file can then be passed to the evaluation script with the -c parameter. This is an example of how we normally run STAR for SJ detection:
  1. Create a genome index with STAR without providing the reference annotation. We don't want to bias the SJ detection towards the annotated splice sites.
STAR --runThreadN <num_threads> --runMode genomeGenerate --genomeDir <star_index> --genomeFastaFiles <reference_genome_FASTA> --outTmpDir <index_dir_tmp>
  2. Map the short reads using --twopassMode.
STAR --runThreadN <num_threads> --genomeDir <star_index> --readFilesIn <read1> <read2> --outFileNamePrefix <output_prefix> --alignSJoverhangMin 8 --alignSJDBoverhangMin 1 --outFilterType BySJout --outSAMunmapped Within --outFilterMultimapNmax 20 --outFilterMismatchNoverLmax 0.04 --outFilterMismatchNmax 999 --alignIntronMin 20 --alignIntronMax 1000000 --alignMatesGapMax 1000000 --sjdbScore 1 --genomeLoad NoSharedMemory --outSAMtype BAM SortedByCoordinate --twopassMode Basic
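Before passing the STAR result to the evaluation, it may be worth checking that the mapping actually produced supported junctions. STAR writes its junctions to a tab-separated file named <output_prefix>SJ.out.tab, whose seventh column holds the number of uniquely mapping reads crossing each junction. The helper below is a minimal sketch for such a sanity check; count_supported_junctions is a hypothetical name and is not part of STAR or SQANTI3.

```shell
# Count junctions supported by at least <min_reads> uniquely mapping reads.
# Column 7 of STAR's SJ.out.tab is the number of uniquely mapping reads
# crossing the junction.
count_supported_junctions() {
    sj_file="$1"
    min_reads="${2:-1}"  # default: at least one uniquely mapping read
    awk -F '\t' -v min="${min_reads}" \
        '$7 >= min { n++ } END { print n + 0 }' "${sj_file}"
}
```

For example, count_supported_junctions <output_prefix>SJ.out.tab 3 prints how many junctions have three or more uniquely mapping reads. A count of zero suggests that something went wrong during mapping.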

It is also necessary to provide metadata files in JSON format. Completing a submission requires experiment and entry JSON files, and those are the files the evaluation script expects. Here you can find more information about metadata files and some templates for them.

If you don't have those JSON files ready yet, --experiment_json and --entry_json are now optional arguments. If they are not defined, a couple of dummy files in the utilities/ folder will be used instead. Please make sure that the species field of experiment_dummy.json is correct: manatee samples used a different set of spike-ins than mouse samples, so a wrong species setting will result in a wrong evaluation of the SIRVs.
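Since a wrong species silently leads to a wrong SIRV evaluation, a quick check of the JSON before launching the run can save time. The sketch below assumes that species is stored as a plain string field (for example "species": "manatee"); the exact layout and allowed values should be taken from the metadata templates. Both function names are hypothetical, and the grep-based extraction is a rough check, not a full JSON parser.

```shell
# Print the value of the first "species" string field in a JSON file.
get_species() {
    grep -o '"species"[[:space:]]*:[[:space:]]*"[^"]*"' "$1" \
        | head -n 1 \
        | sed 's/.*"\([^"]*\)"$/\1/'
}

# Return non-zero, with a message on stderr, if the species is not the expected one.
check_species() {
    json_file="$1"
    expected="$2"
    found="$(get_species "${json_file}")"
    if [ "${found}" != "${expected}" ]; then
        echo "species is '${found}', expected '${expected}'" >&2
        return 1
    fi
}
```

Running check_species utilities/experiment_dummy.json manatee before a manatee evaluation would then fail loudly if the dummy file still describes a different species.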


This is an example of how to run the script:

python manatee_submitted.fasta lrgasp_sirv4.gtf lrgasp_manatee_sirv1.fasta \
	--experiment_json manatee_experiment.json --entry_json manatee_entry.json \
	-c -d /my_output/directory -o manatee_submission_test

This will generate in /my_output/directory the two main SQANTI3 outputs (*_classification.txt and *_junctions.txt) and an HTML file that, in this case, will be called manatee_submission_test_Evaluation_report.html.
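After a run, a short check can confirm that the expected files were written. The sketch below assumes that the classification and junctions files carry the same prefix given with -o as the HTML report does; check_outputs is a hypothetical helper and not part of the evaluation code.

```shell
# Verify that the three expected output files exist and are not empty.
check_outputs() {
    out_dir="$1"
    prefix="$2"
    missing=0
    for suffix in _classification.txt _junctions.txt _Evaluation_report.html; do
        if [ ! -s "${out_dir}/${prefix}${suffix}" ]; then
            echo "missing or empty: ${out_dir}/${prefix}${suffix}" >&2
            missing=1
        fi
    done
    return "${missing}"
}
```

For the example above, check_outputs /my_output/directory manatee_submission_test returns 0 only if all three files are present and non-empty.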


