Skip to content

Codeptor/DNA-Sequencing

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

7 Commits
 
 
 
 

Repository files navigation

DNA Sequence Error Detection using Pysam

This Jupyter notebook demonstrates DNA sequence error detection using Pysam, focusing on a small E. coli genome dataset.

Overview

The notebook covers:

  1. Dataset Selection and Preparation
  2. Prerequisites and Setup
  3. Reading BAM Files with Pysam
  4. Error Detection Algorithms
  5. Analyzing Error Patterns
  6. Implementing Quality Control Measures
  7. Comparative Analysis

Requirements

  • Python 3.7+
  • Jupyter Notebook environment (e.g., Google Colab)
  • Basic understanding of DNA sequencing concepts
  • Familiarity with Python programming

Installation

The notebook includes installation steps for required tools and libraries:

  • SRA Toolkit
  • BWA
  • Samtools
  • Python libraries: pysam, numpy, matplotlib, seaborn

Usage

  1. Open the notebook in a Jupyter environment (e.g., Google Colab).
  2. Run the cells sequentially to perform the analysis.
  3. The notebook will download a small E. coli dataset and reference genome.
  4. Follow the step-by-step process to detect and analyze sequencing errors.

Output

The notebook generates several visualizations:

  • Mismatch types bar plot
  • Indel size distribution histogram
  • Error rate comparison before and after quality filtering

Note

This notebook is designed for educational and research purposes, demonstrating DNA sequencing error detection techniques on a small scale.

About

DNA Sequencing using PySAM

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published