Earwig_genome_project

This repository contains all the scripts used to assemble and annotate the Earwig genome. The pipeline is presented in three parts:

  1. Genome assembly
  2. De novo repeat library
  3. Genome annotation

1. Genome assembly

The genome was assembled using linked reads from 10x Chromium and long reads from Oxford Nanopore. The long and linked reads were assembled separately, and the two assemblies were then merged. After multiple iterations of scaffolding, gap closing, and removal of haplotigs and contaminants, the assembly was polished with mRNA-seq reads to obtain the final assembly. A schematic of the workflow is shown in the figure below (created with BioRender.com).

[Figure: schematic of the genome assembly workflow]

Workflow and Scripts:

  1. Linked read assembly
  2. Long read assembly
  3. Merging two assemblies
  4. Further processing with long reads
  5. Further processing with linked reads
  6. Processing with RNA-seq reads
  7. Final bits: Haplotigs removal, cleaning and polishing

2. De novo repeat library

A comprehensive de novo repeat library was prepared for the assembled genome. It was used for repeat content analysis, for repeat masking, and as input to the annotation pipeline.

Workflow:

  1. Repeat library preparation
    1. RepeatModeler
    2. LTRharvest & LTRdigest
    3. TransposonPSI
    4. SINE database
  2. Concatenating, filtering, and classifying repeats
    1. RepeatClassifier
  3. Repeat masking the genome
    1. RepeatMasker
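
The concatenating and filtering step can be illustrated with a minimal Python sketch. It is not the script used in this repository: the function names (`read_fasta`, `merge_libraries`, `write_fasta`), the `min_length` cutoff, and the choice to drop only exact duplicate sequences are all assumptions made for illustration. It merges several FASTA repeat libraries into one, prefixes each header with its source file name, and skips short or duplicated sequences.

```python
from pathlib import Path


def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            else:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def merge_libraries(paths, min_length=50):
    """Concatenate repeat libraries, dropping short and exact-duplicate sequences.

    min_length is a hypothetical cutoff, not a value taken from this project.
    """
    seen = set()
    merged = []
    for path in paths:
        source = Path(path).stem  # tag each record with the file it came from
        for header, seq in read_fasta(path):
            key = seq.upper()  # compare case-insensitively
            if len(key) < min_length or key in seen:
                continue
            seen.add(key)
            merged.append((f"{source}|{header}", seq))
    return merged


def write_fasta(records, out_path, width=60):
    """Write (header, sequence) pairs as FASTA with fixed-width sequence lines."""
    with open(out_path, "w") as out:
        for header, seq in records:
            out.write(f">{header}\n")
            for start in range(0, len(seq), width):
                out.write(seq[start:start + width] + "\n")
```

A call such as `write_fasta(merge_libraries(["repeatmodeler.fa", "ltr.fa"]), "merged_repeats.fa")` (file names hypothetical) would produce a single library file. Removing near-identical rather than exact duplicates would need a clustering tool and is outside this sketch.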

3. Genome annotation

The Maker2 pipeline was used for genome annotation. mRNA-seq data were assembled de novo using Trinity. Other relevant publicly available datasets were downloaded and used as input.

  1. Processing mRNA-seq data
  2. GeneMark-ES
  3. Braker
  4. Configuring and running Maker2 pipeline