This repository contains code and data from Koche et al., "Extrachromosomal circular DNA drives oncogenic genome remodeling in neuroblastoma", Nat Genet (2020).
Circle Calls
Data/CircleseqCircles.bed
contains Circle-seq circle calls for our dataset
Data/WGSinferredCircles_eccDNA.txt
contains WGS-inferred circle calls for our dataset (eccDNA only)
Data/WGSinferredCircles_ecDNA.txt
contains WGS-inferred circle calls for our dataset (ecDNA only)
WGS Rearrangement Calls
Data/MergedSV_StrictestFiltering_CircleSeqCircleAnnotation.txt
are the merged and filtered interchromosomal rearrangement calls (see Structural Variant Merging) which were overlapped with Circle-seq circle calls for classification (available upon request)
Data/MergedSV_StrictestFiltering_WGSCircleAnnotation.txt
are the merged and filtered interchromosomal rearrangement calls (see Structural Variant Merging) which were overlapped with WGS-inferred circle calls for classification (available upon request)
Data/PedPanCanSVs.csv
contains structural variants in the DKFZ Pediatric Pan-Cancer dataset published by Gröbner et al. 2018 (data downloaded from the corresponding R2 platform)
Data/PedPanCanMeta.csv
contains metadata such as the tumor entity for the DKFZ Pediatric Pan-Cancer dataset
Expression Data
Data/berlin_cohort_rnaseq_fpkm.txt
is RNA-seq data for the Berlin neuroblastoma cohort (available upon request)
Data/peifer_54nb_fpkms.txt
is RNA-seq data published by Peifer et al.
Clinical Data
Data/ClinicalData.csv
specifies risk group and survival data
Other
Data/BroadIntervals/b37-[...]_MinusBlckLst.txt
specifies the mappable, non-blacklisted genome we use for randomization analyses
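As an illustration of how a mappable-genome interval list can feed a randomization analysis, here is a minimal Python sketch that draws a random region lying fully inside one allowed interval. It is not taken from this repository: the interval values, the `(chromosome, start, end)` layout, and the function name `sample_region` are assumptions made for the example, and the real interval file may be structured differently.

```python
import bisect
import itertools
import random

# Hypothetical mappable intervals as (chromosome, start, end), half-open.
# The real interval file's columns and coordinates may differ.
MAPPABLE = [
    ("1", 10_000, 50_000),
    ("1", 80_000, 100_000),
    ("2", 0, 30_000),
]


def sample_region(intervals, length, rng):
    """Draw one region of `length` bp that lies fully inside one interval.

    Each interval is weighted by its number of valid start positions, so
    every admissible region is equally likely.
    """
    # Number of valid start positions per interval (0 if the interval is too short).
    weights = [max(0, end - start - length + 1) for _, start, end in intervals]
    cumulative = list(itertools.accumulate(weights))
    total = cumulative[-1] if cumulative else 0
    if total == 0:
        raise ValueError("no interval can hold a region of this length")
    pick = rng.randrange(total)
    # First interval whose cumulative weight exceeds the pick.
    idx = bisect.bisect_right(cumulative, pick)
    chrom, start, _ = intervals[idx]
    offset = pick - (cumulative[idx] - weights[idx])
    region_start = start + offset
    return chrom, region_start, region_start + length
```

For example, `sample_region(MAPPABLE, 5_000, random.Random(42))` returns one `(chromosome, start, end)` tuple. Passing the random generator in explicitly keeps repeated randomizations reproducible.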
Structural Variant Merging
script_merge_translocations[...].py
collapses interchromosomal structural variants from different callers
script_filter_mapqBAM_3.0.py
filters collapsed interchromosomal structural variants using common criteria
script_merge_intraSVs_[...].py
collapses intrachromosomal structural variants from different callers
script_filter_atleast2callers_2.0.py
filters collapsed intrachromosomal structural variants using common criteria
script_merge_circularized_regions_[...].py
overlaps interchromosomal structural variants with circle calls and classifies accordingly
script_merge_circularized_regions_[...]_intraSV_[...].py
overlaps intrachromosomal structural variants with circle calls and classifies accordingly
script_search_integrations_[...].py
searches for two-breakpoint patterns indicative of circle integration into a chromosome
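The idea of collapsing calls from different callers and then keeping only well-supported ones can be sketched in a few lines of Python. This is a simplified illustration, not the logic of the scripts listed above: the `Call` record, the 500 bp tolerance, the greedy grouping, and the function names are all assumptions, and the "at least two callers" threshold is only suggested by one script's file name.

```python
from collections import namedtuple

# Hypothetical record layout for an interchromosomal call; the real scripts
# may use different fields and also consider strand or orientation.
Call = namedtuple("Call", "caller chrom1 pos1 chrom2 pos2")


def collapse_calls(calls, tolerance=500):
    """Greedily group calls whose two breakpoints both lie within `tolerance` bp.

    Each cluster is compared against its first member only, which is a
    simplification chosen for brevity.
    """
    clusters = []
    ordered = sorted(calls, key=lambda c: (c.chrom1, c.chrom2, c.pos1, c.pos2))
    for call in ordered:
        for cluster in clusters:
            ref = cluster[0]
            same_chroms = (call.chrom1, call.chrom2) == (ref.chrom1, ref.chrom2)
            if (same_chroms
                    and abs(call.pos1 - ref.pos1) <= tolerance
                    and abs(call.pos2 - ref.pos2) <= tolerance):
                cluster.append(call)
                break
        else:
            # No existing cluster matched, so this call starts a new one.
            clusters.append([call])
    return clusters


def supported_by(clusters, min_callers=2):
    """Keep clusters reported by at least `min_callers` distinct callers."""
    return [c for c in clusters
            if len({call.caller for call in c}) >= min_callers]
```

A typical use would be `supported_by(collapse_calls(calls))`, which returns only those merged events that more than one caller agrees on.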
Tree-shaped rearrangement regions are referred to as Palm Trees throughout the code.
RunAll.R
runs scripts in the correct order
Parse*.R and Prepare*.R
are several scripts to read data from different sources and create tidy representations for further analysis
CallPalmTrees.R
uses output of structural variant merging to call palm trees (a.k.a. tree-shaped rearrangements or clusters of rearrangements)
CallPalmTreesPedPanCan.R
calls palm trees on published data from Gröbner et al. 2018 (using the same settings as in CallPalmTrees.R)
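To give a feel for what "calling clusters of rearrangements" can mean, the following Python sketch groups nearby breakpoints on the same chromosome and reports groups with enough members. It is a generic gap-based clustering and should not be read as the algorithm in CallPalmTrees.R: the function name `call_dense_regions` and the `max_gap` and `min_breakpoints` values are hypothetical.

```python
from collections import defaultdict


def call_dense_regions(breakpoints, max_gap=1_000_000, min_breakpoints=3):
    """Group breakpoints separated by at most `max_gap` bp on one chromosome.

    `breakpoints` is an iterable of (chromosome, position) pairs. Returns
    (chromosome, first_position, last_position, n_breakpoints) tuples for
    every group with at least `min_breakpoints` members.
    Both thresholds are illustrative assumptions, not the published settings.
    """
    by_chrom = defaultdict(list)
    for chrom, pos in breakpoints:
        by_chrom[chrom].append(pos)

    regions = []
    for chrom in sorted(by_chrom):
        positions = sorted(by_chrom[chrom])
        group = [positions[0]]
        for pos in positions[1:]:
            if pos - group[-1] <= max_gap:
                group.append(pos)
                continue
            # Gap too large: close the current group and start a new one.
            if len(group) >= min_breakpoints:
                regions.append((chrom, group[0], group[-1], len(group)))
            group = [pos]
        if len(group) >= min_breakpoints:
            regions.append((chrom, group[0], group[-1], len(group)))
    return regions
```

Using identical parameters for two datasets, as the description of CallPalmTreesPedPanCan.R indicates, is what makes the resulting region counts comparable.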
CircosPlots.R
plots the identified rearrangements and palm tree regions
PalmTreeStackPlot.R
creates plots that integrate all CN, SV and Circle-seq information
GeneralPalmTreeStatistics.R
analyses number and length distributions of palm trees
PalmTreeDensity.R
plots genome-wide recurrence of palm trees
PedPanCanStats.R
explores palm tree occurrence in the PedPanCan dataset
CompareBerlinCohortToPedPanCan.R
compares palm tree prevalence between the Peifer/Berlin dataset and the PedPanCan neuroblastoma cases
AnalyseNBCallPalmTreesRandomization.Rmd
explores 500 synthetic datasets for our cohort and the PedPanCan dataset respectively. Synthetic datasets were obtained by randomizing breakpoint positions during palm tree calling. This is used to estimate false-discovery rates due to the number of rearrangements per sample.
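The randomization idea described above can be sketched as follows in Python: keep the number of breakpoints per chromosome fixed, redraw their positions, rerun a caller, and compare the average number of calls in the synthetic data with the observed number. Everything here besides the count of 500 synthetic datasets is an assumption for illustration, including the chromosome lengths, the toy window-counting caller, and the function names; the notebook itself may compute the false-discovery rate differently.

```python
import random

# Hypothetical chromosome lengths in bp, for illustration only.
CHROM_LENGTHS = {"1": 1_000_000, "2": 800_000}


def randomize(breakpoints, chrom_lengths, rng):
    """Keep each breakpoint's chromosome but redraw its position uniformly."""
    return [(chrom, rng.randrange(chrom_lengths[chrom]))
            for chrom, _ in breakpoints]


def count_dense_windows(breakpoints, window=10_000, min_breakpoints=3):
    """Toy stand-in for a caller: count fixed windows holding many breakpoints."""
    counts = {}
    for chrom, pos in breakpoints:
        key = (chrom, pos // window)
        counts[key] = counts.get(key, 0) + 1
    return sum(1 for n in counts.values() if n >= min_breakpoints)


def estimate_fdr(breakpoints, chrom_lengths, n_random=500, seed=0):
    """Ratio of mean calls in randomized data to calls in the observed data."""
    rng = random.Random(seed)
    observed = count_dense_windows(breakpoints)
    if observed == 0:
        return 0.0
    random_calls = [
        count_dense_windows(randomize(breakpoints, chrom_lengths, rng))
        for _ in range(n_random)
    ]
    expected_false = sum(random_calls) / n_random
    return min(1.0, expected_false / observed)
```

Because the number of breakpoints is preserved, a sample with many rearrangements produces more chance clusters in the synthetic data, which is the effect this kind of estimate is meant to capture.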
PalmTreeGenes.R
analyzes which genes can be found within palm tree regions
PalmTreeTargetsCloseToOncogenes.R
statistically analyses whether palm tree target sites are significantly enriched in the neighborhood of cancer-related genes
ExpressionAnalysis.R
screens for deregulated genes close to palm tree associated rearrangements
RegionSampling.R
contains sampling methods to randomly sample regions from a masked or unmasked genome
Overlap[...]CircleCalls.R
statistically analyses the degree of overlap between palm tree regions and eccDNA / ecDNA inferred from WGS / Circle-seq
MergeWGSCSOverlapPlots.R
merges results from the preceding scripts for integrative plotting
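One common way to assess the degree of overlap between two sets of regions is a permutation test: count how many regions overlap a circle call, then repeat the count after moving the regions to random positions. The Python sketch below shows this on an unmasked toy genome. It is an assumed approach for illustration, not the statistic implemented in the scripts above; the interval layout and the names `n_overlapping` and `empirical_p` are hypothetical.

```python
import random


def overlaps(a, b):
    """True if half-open intervals (chrom, start, end) share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def n_overlapping(regions, circles):
    """Number of regions that overlap at least one circle."""
    return sum(any(overlaps(r, c) for c in circles) for r in regions)


def empirical_p(regions, circles, chrom_lengths, n_perm=1000, seed=0):
    """One-sided empirical p-value for observing at least this much overlap.

    Regions keep their chromosome and length but get a uniformly random
    start. Masking of unmappable sequence is deliberately left out here.
    """
    rng = random.Random(seed)
    observed = n_overlapping(regions, circles)
    hits = 0
    for _ in range(n_perm):
        shuffled = []
        for chrom, start, end in regions:
            length = end - start
            new_start = rng.randrange(chrom_lengths[chrom] - length + 1)
            shuffled.append((chrom, new_start, new_start + length))
        if n_overlapping(shuffled, circles) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)
```

In a real analysis the random placement would be restricted to the mappable, non-blacklisted genome rather than whole chromosomes.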
ClinicalData.R
explores the prognostic significance of palm trees
CircleJunctionAnalyseSvabaBreakpoints.R
investigates sequence characteristics of reconstructed Circle-seq circle junctions
SVAnalyseSvabaBreakpoints.R
investigates sequence characteristics of interchromosomal rearrangement calls by Svaba
MemeAnalysis.sh
runs MEME on several breakpoint-associated sequences
myHomology.R
obtains microhomology estimates for accurate breakpoint coordinates from the reference genome
Repeats.R
tests for association of breakpoints with repetitive regions
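Microhomology at a junction can be estimated from reference sequence alone when breakpoint coordinates are exact: it is the number of bases by which the junction could be shifted without changing the joined sequence. The Python sketch below counts matching bases on both sides of the two breakpoints. It is a simplified illustration that ignores strand orientation and is not a port of myHomology.R; the function name, argument layout, and the `max_len` cap are assumptions.

```python
def microhomology_length(left_seq, left_end, right_seq, right_start, max_len=50):
    """Count bases shared around a junction joining two reference segments.

    The junction joins left_seq[:left_end] to right_seq[right_start:].
    Bases that match when walking forward or backward from the two
    breakpoints in parallel make the exact junction position ambiguous,
    and their total is returned as the microhomology length.
    """
    # Walk forward: bases after the left breakpoint vs. start of the right piece.
    forward = 0
    while (forward < max_len
           and left_end + forward < len(left_seq)
           and right_start + forward < len(right_seq)
           and left_seq[left_end + forward] == right_seq[right_start + forward]):
        forward += 1

    # Walk backward: end of the left piece vs. bases before the right breakpoint.
    backward = 0
    while (backward < max_len
           and left_end - 1 - backward >= 0
           and right_start - 1 - backward >= 0
           and left_seq[left_end - 1 - backward] == right_seq[right_start - 1 - backward]):
        backward += 1

    return forward + backward
```

For a junction between two chromosomes, `left_seq` and `right_seq` would be reference windows around each breakpoint, extracted on the strand implied by the rearrangement.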
If you have any questions concerning code or data, please do not hesitate to contact us at [email protected].