scripts to generate figures/tables from Gorelick et al. Nat Metab, 2021

alhafidzhamdan/mtdna-mutations

Respiratory complex and tissue lineage drive recurrent mutations in tumour mtDNA

Instructions:

The following steps can be used to recreate the essential parts of the figures in the manuscript. All code is in R (>=3.6.0).

Clone this repo:

git clone https://github.com/reznik-lab/mtdna-mutations
cd mtdna-mutations

Install required R packages from CRAN:

## Check for missing required packages and install them.
required.packages <- c('data.table','ggplot2','cowplot','RColorBrewer',
                       'parallel','ggsignif','binom','scales',
                       'MASS','ggrepel','Hmisc','Rcpp',
                       'car','here','magrittr','knitr','rmarkdown','survminer','DescTools')

new.packages <- required.packages[!(required.packages %in% installed.packages()[,"Package"])]
if(length(new.packages)>0) install.packages(new.packages)

You will also need to install Pandoc if you don't already have it; see https://pandoc.org/installing.html
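Before rendering, it can help to confirm that the command-line tools this README calls are actually on your PATH. The sketch below is not part of the repository: `missing_tools` is a hypothetical helper name, and it assumes Pandoc is expected on PATH (rmarkdown can also locate it in other ways).

```shell
# Hypothetical helper (not part of this repo): list which commands are missing.
# Prints one missing command name per line and returns non-zero if any are missing.
missing_tools() {
  rc=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      printf '%s\n' "$tool"
      rc=1
    fi
  done
  return "$rc"
}

# Example call covering the tools used in this README:
# missing_tools git R Rscript pandoc wget || echo "install the tools listed above first"
```

If the function prints nothing and returns 0, every listed tool was found.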

Generate an HTML file with the figures from the main text and extended data:

R -e "rmarkdown::render('html/figures.Rmd', output_file = 'figures.html')"

The generated file html/figures.html can be opened in a web browser to view the panels from the main text and extended data figures.

Optional:

The above commands use data provided in this repo, including a number of "precalculated" datasets based on methods described in the manuscript. All such precalculated datasets are in the data/processed_data subdirectory, and can be recreated with the following commands:

## If Bioconductor isn't already installed, install it.
if(!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")

## Install additional required packages from Bioconductor if not installed.
if(!'DESeq2' %in% installed.packages()) BiocManager::install("DESeq2")
if(!'apeglm' %in% installed.packages()) BiocManager::install("apeglm")
if(!'fgsea' %in% installed.packages()) BiocManager::install("fgsea")

  1. Generate miscellaneous tables used in other scripts, based on sample coverage data:
## Generate a table of number of samples covered per position, 
## and a matrix of effective-gene-lengths for each sample. Generates: 
## - data/processed_data/samples_callable_per_position.txt
## - data/processed_data/sample_gene_lengths.txt.gz
## Additionally generate similar files using a cutoff of 20 alt-reads rather than the default 5:
## - data/processed_data/samples_callable_per_position20.txt
## - data/processed_data/sample_gene_lengths20.txt.gz
Rscript r/do/process_sample_coverage.R
  2. Generate a table of the fraction of mtDNA with callable mutations in tumor, normal, and combined tumor+normal samples, at thresholds of 5+ and 20+ alt reads:
## Generates: data/processed_data/frac_callable.txt
Rscript r/do/get_frac_callable.R
  3. Determine recurrent positions for mtDNA SNVs, truncating indels, and tRNA mutations:
## Calculate SNV hotspots. Generates:
## - data/processed_data/hotspots_snv.txt
Rscript r/do/hotspot_snv.R

## Calculate frame-shift indel hotspots. Generates:
## - data/processed_data/hotspots_indel.txt
Rscript r/do/hotspot_indel.R

## Calculate tRNA alignment hotspots. Generates:
## - data/processed_data/hotspots_trna.txt
## - data/processed_data/hotspots_trna_pcawg.txt ## for figure EDF7b
Rscript r/do/hotspot_trna.R
  4. Annotate samples with mtDNA variants, % callable, and overall mtDNA-status:
## Generates: 
## - data/processed_data/sample_hotspot_status_tcga.txt
## - data/processed_data/sample_hotspot_status_impact.txt
## - data/processed_data/hotspot_region_map.txt
Rscript r/do/annotate_sample_mutations.R
  5. Calculate differentially-expressed genes and run GSEA:
## Generates:
## - data/processed_data/rnaseq_truncating_gsea_hallmark_precalculated.txt
## - data/processed_data/rnaseq_truncating_histogram_precalculated.txt
## - data/processed_data/rnaseq_vus_gsea_hallmark_precalculated.txt
## - data/processed_data/rnaseq_vus_histogram_precalculated.txt
## - data/processed_data/rnaseq_r25q_gsea_hallmark_precalculated.txt

## First, download TCGA RNA-Seq RSEM data (1.8GB file):
wget http://api.gdc.cancer.gov/data/3586c0da-64d0-4b74-a449-5ff4d9136611 -O data/original_data/resources/EBPlusPlusAdjustPANCAN_IlluminaHiSeq_RNASeqV2.geneExp.tsv

## Now run DESeq and fGSEA analyses:
Rscript r/do/run_deseq.R
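The steps above can be run one after another from the repository root. The sketch below is one way to do that and is not part of the repository: `run_steps` and `missing_outputs` are hypothetical helper names, and running the scripts strictly in the listed order is an assumption based on how this README presents them. The RNA-Seq file from step 5 must already be downloaded before `r/do/run_deseq.R` is reached.

```shell
# Script paths from steps 1-5, in the order this README lists them.
STEPS="r/do/process_sample_coverage.R
r/do/get_frac_callable.R
r/do/hotspot_snv.R
r/do/hotspot_indel.R
r/do/hotspot_trna.R
r/do/annotate_sample_mutations.R
r/do/run_deseq.R"

# run_steps RUNNER: call RUNNER on each script, stopping at the first failure.
# Pass "echo" as RUNNER for a dry run that only prints the script paths.
run_steps() {
  runner=$1
  for script in $STEPS; do
    "$runner" "$script" || { echo "failed: $script" >&2; return 1; }
  done
}

# missing_outputs DIR FILE...: print each FILE that is absent or empty under DIR.
# Returns non-zero if anything is missing.
missing_outputs() {
  dir=$1
  shift
  rc=0
  for f in "$@"; do
    [ -s "$dir/$f" ] || { printf '%s\n' "$f"; rc=1; }
  done
  return "$rc"
}
```

For example, `run_steps echo` previews the order, `run_steps Rscript` executes it, and `missing_outputs data/processed_data frac_callable.txt hotspots_snv.txt hotspots_indel.txt hotspots_trna.txt` reports any of those files that were not produced.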

Citation

Contact

E-mail any questions to [email protected].