Earwig_genome_project

This repository contains all the scripts used to assemble and annotate the Earwig genome. The pipeline is presented in three parts:

Genome assembly
De novo repeat library
Genome annotation

1. Genome assembly

The genome was assembled using linked reads from 10x Genomics Chromium and long reads from Oxford Nanopore. Long and linked reads were assembled separately, and the two assemblies were then merged. After multiple iterations of scaffolding, gap closing, and removal of haplotigs and contaminants, the assembly was polished with mRNA-seq reads to obtain the final assembly. A schematic representation is shown in the figure below (created with BioRender.com).

Workflow and Scripts:

Linked read assembly
Long read assembly
Merging two assemblies
Further processing with long reads
Further processing with linked reads
Processing with RNA-seq reads
Final bits: Haplotigs removal, cleaning and polishing

2. De novo repeat library

A comprehensive de novo repeat library was prepared for the assembled genome. It was used for repeat content analysis, for repeat masking, and as input to the annotation pipeline.

Workflow:

Repeat library preparation
Concatenating, filtering, and classifying repeats
    1. RepeatClassifier
Repeat masking the genome
    1. RepeatMasker
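
The concatenation and filtering step listed above can be sketched in Python as follows. This is only an illustration and not the script used in this repository: the function names, the 80 bp minimum length, and the exact-duplicate filter are all assumptions chosen for the example, and the actual filtering criteria may differ.

```python
def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif header is not None:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def merge_repeat_libraries(paths, min_length=80):
    """Concatenate libraries, dropping short and exact-duplicate sequences.

    min_length is a hypothetical cutoff, not a value taken from the pipeline.
    Duplicates are detected case-insensitively; the first copy is kept.
    """
    seen = set()
    merged = []
    for path in paths:
        for header, seq in read_fasta(path):
            key = seq.upper()
            if len(key) < min_length or key in seen:
                continue
            seen.add(key)
            merged.append((header, seq))
    return merged


def write_fasta(records, path, width=60):
    """Write (header, sequence) pairs as FASTA, wrapping lines at `width`."""
    with open(path, "w") as out:
        for header, seq in records:
            out.write(f">{header}\n")
            for i in range(0, len(seq), width):
                out.write(seq[i:i + width] + "\n")
```

With hypothetical input names, `write_fasta(merge_repeat_libraries(["lib_a.fa", "lib_b.fa"]), "merged_repeats.fa")` would produce a single library that could then be passed on for classification and masking.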

3. Genome annotation

The Maker2 pipeline was used for genome annotation. mRNA-seq data were assembled de novo using Trinity. Other relevant publicly available datasets were downloaded and used as input.

Processing mRNA-seq data
GeneMark-ES
Braker
Configuring and running Maker2 pipeline
