Manuscript Code Repository: Analysis of the limited M. tuberculosis pangenome reveals potential pitfalls of pan-genome analysis approaches

This repository contains all code and bioinformatics pipelines needed to reproduce the analyses in the manuscript "Analysis of the limited M. tuberculosis pangenome reveals potential pitfalls of pan-genome analysis approaches".

All initial bioinformatics processing (such as assembly, alignment, annotation, and pan-genome estimation) was implemented using the Snakemake workflow engine, while downstream data processing and analysis are available in Python-based Jupyter notebooks.

Contents

Installation

All dependencies needed to reproduce this analysis can be installed via Conda.

# 1) Clone repository
git clone https://github.com/farhat-lab/mtb-pg-benchmarking-2024paper

# 2) Create a conda environment named 'CoreEnv_PG_V1'
cd mtb-pg-benchmarking-2024paper/

conda env create --file CondaEnvs/CoreEnv_PG_V1.yml -n CoreEnv_PG_V1

# 3) Activate environment (used for the Snakemake pipelines and data analysis)
conda activate CoreEnv_PG_V1
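
After activating the environment, a quick sanity check can confirm that the main command-line tools are visible on `PATH`. The sketch below is illustrative only: the tool list (`snakemake`, `jupyter`) is an assumption based on the workflow described in this README, not a statement of what `CondaEnvs/CoreEnv_PG_V1.yml` installs, and `missing_tools` is a hypothetical helper name.

```python
import shutil

# Assumption: the environment exposes these command-line tools. Adjust the
# list to match what CondaEnvs/CoreEnv_PG_V1.yml actually installs.
EXPECTED_TOOLS = ["snakemake", "jupyter"]


def missing_tools(tools):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools(EXPECTED_TOOLS)
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All expected tools were found on PATH.")
```

If any tool is reported as missing, the environment was probably not activated in the current shell, or the tool is not part of the environment.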

Snakemake pipelines

All data processing starting from raw reads (FASTQ) was orchestrated using the Snakemake workflow system. Refer to the Snakemake directory's README.md for detailed usage of each pipeline.
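
Before launching a full run, it is often useful to ask Snakemake for a dry run, which lists the jobs it would execute without running them. The sketch below only assembles such a command. The Snakefile and config paths are placeholders, not real paths in this repository, and `snakemake_dry_run` is a hypothetical helper. The actual entry points and options for each pipeline are described in the Snakemake directory's README.md.

```python
import shlex


def snakemake_dry_run(snakefile, configfile=None, cores=1):
    """Build a `snakemake` dry-run command as an argument list.

    `--dry-run` makes Snakemake print the planned jobs without executing them.
    """
    cmd = ["snakemake", "--snakefile", snakefile,
           "--cores", str(cores), "--dry-run"]
    if configfile is not None:
        cmd += ["--configfile", configfile]
    return cmd


# Placeholder paths: replace them with a pipeline's real Snakefile and config.
cmd = snakemake_dry_run("path/to/Snakefile", configfile="path/to/config.yaml")
print(shlex.join(cmd))
```

The printed command can be pasted into a shell where the Conda environment is active. Dropping `--dry-run` would start the real run.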

Data Analysis

The DataAnalysis/ directory contains Jupyter notebooks for downstream data processing, table generation, and figure generation related to this work.

The notebooks can be run once the initial assembly, annotation, and analysis pipelines have completed. Alternatively, the processed pipeline outputs can be downloaded.

Refer to the Analysis directory's README.md for a detailed overview of this project's analysis.

Supplemental Files

🚧 Check back soon 🚧

Curated Datasets

All hybrid and short-read genome assemblies used in this work, along with their annotations, are organized and made accessible for future use. In addition, the relatively large outputs of the various pan-genome analyses are available via Zenodo.

Genome assemblies used

🚧 Check back soon 🚧

Pan-genome analyses

🚧 Check back soon 🚧

License

This repository is distributed under the terms of the MIT license.