This document describes all data files associated with this project. Each file has the following associated information:
- File type will be one of:
  - Reference file: Obtained from an external source/database. When known, the obtained data and a link to the external source are included.
  - Modified reference file: Obtained from an external source/database but modified for OpenPBTA use.
  - Processed data file: Data that are processed upstream of the analysis project, e.g., the output of a somatic single nucleotide variant method. Links to the relevant D3B Center or Kids First workflow (and version where applicable) are included in Origin.
  - Analysis file: Any file created by a script in `analyses/*`.
- Origin
  - For Processed data files, a link to the relevant D3B Center or Kids First workflow (and version where applicable).
  - When applicable, a link to the specific script that produced (or modified, for Modified reference file types) the data.
- File description
  - A brief one-sentence description of what the file contains (e.g., bed files contain coordinates for features XYZ).
File name | File Type | Origin | File Description |
---|---|---|---|
histologies-base.tsv | Data file | Cohort-specific data files and databases | Clinical and sequencing metadata for each biospecimen |
histologies.tsv | Modified data file | molecular-subtyping-integrate | histologies-base.tsv plus molecular_subtype, cancer_group, integrated_diagnosis, and harmonized_diagnosis |
intersect_cds_lancet_strelka_mutect_WGS.bed | Analysis file | snv-callers | Intersection of gencode.v39.primary_assembly.annotation.gtf.gz CDS with Lancet, Strelka2, and Mutect2 regions |
intersect_strelka_mutect_WGS.bed | Analysis file | snv-callers | Intersection of gencode.v39.primary_assembly.annotation.gtf.gz CDS with Strelka2 and Mutect2 regions |
efo-mondo-map.tsv | Reference mapping file | Manual collation | Mapping of EFO and MONDO codes to cancer groups |
efo-mondo-map-prefill.tsv | Modified reference mapping file | Analysis file generated in molecular-subtyping-integrate | Mapping of EFO and MONDO codes to cancer groups |
ensg-hugo-pmtl-mapping.tsv | Reference mapping file | Manual curation of PMTLv1.1 by FNL; RNA-Seq pipeline GTF mapping | File that maps Hugo Symbols to ENSEMBL gene IDs and each ENSG to the RMTL curated by FNL |
*.bed | Reference file | Manual collation | BED files used for variant calling and for TMB calculation |
uberon-map-gtex-group.tsv | Reference mapping file | Manual collation | Mapping of UBERON codes to tissue types in GTEx broad groups |
uberon-map-gtex-subgroup.tsv | Reference mapping file | Manual collation | Mapping of UBERON codes to tissue types in GTEx subgroups |
methyl-beta-values.rds | Processed data file | methylation beta values | Methylation beta values |
methyl-m-values.rds | Processed data file | methylation m values | Methylation m values |
rna-isoform-expression-rsem-tpm.rds | Processed data file | RNA isoform TPM files | RNA isoform TPM files |
fusion-dgd.tsv | Processed data file | DGD merged fusion results | DGD merged fusion results |
fusion-arriba.tsv.gz | Processed data file | Gene fusion detection; Workflow | Fusion - Arriba TSV, annotated with FusionAnnotator |
fusion-starfusion.tsv.gz | Processed data file | Gene fusion detection; Workflow | Fusion - STARFusion TSV |
fusion-annoFuse.tsv.gz | Processed data file | AnnoFuse QC filtered fusion file; Workflow | Filter out normal and non-expressed fusions |
fusion_summary_embryonal_foi.tsv | Analysis file | fusion-summary | Summary file for presence of embryonal tumor fusions of interest |
fusion_summary_ependymoma_foi.tsv | Analysis file | fusion-summary | Summary file for presence of ependymal tumor fusions of interest |
fusion_summary_ewings_foi.tsv | Analysis file | fusion-summary | Summary file for presence of Ewing's sarcoma fusions of interest |
fusion_summary_lgg_hgg_foi.tsv | Analysis file | fusion-summary | Summary file for presence of LGG and HGG fusions of interest |
fusion-putative-oncogenic.tsv | Analysis file | fusion_filtering | Filtered and prioritized fusions |
gene-counts-rsem-expected_count-collapsed.rds | Analysis file | PBTA+GMKF+TARGET collapse-rnaseq | Gene expression - RSEM expected_count for each sample collapsed to gene symbol (gene-level) |
gene-expression-rsem-tpm-collapsed.rds | Analysis file | PBTA+GMKF+TARGET collapse-rnaseq | Gene expression - RSEM TPM for each sample collapsed to gene symbol (gene-level) |
tcga_gene-counts-rsem-expected_count-collapsed.rds | Modified reference file | TCGA samples lifted from GENCODE v27 to v39 | Gene expression - RSEM counts for each sample collapsed to gene symbol (gene-level) |
tcga_gene-expression-rsem-tpm-collapsed.rds | Modified reference file | TCGA samples lifted from GENCODE v27 to v39 | Gene expression - RSEM TPM for each sample collapsed to gene symbol (gene-level) |
gtex_gene-expression-rsem-tpm-collapsed.rds | Modified reference file | GTEX v8 release lifted to GENCODE v39 | Gene expression - RSEM TPM for each sample collapsed to gene symbol (gene-level) |
gtex_gene-counts-rsem-expected_count-collapsed.rds | Modified reference file | GTEX v8 release lifted to GENCODE v39 | Gene expression - RSEM counts for each sample collapsed to gene symbol (gene-level) |
WGS.hg38.lancet.300bp_padded.bed | Reference Target/Baits File | SNV and INDEL calling | WGS.hg38.lancet.unpadded.bed file with each region padded by 300 bp |
WGS.hg38.lancet.unpadded.bed | Reference Regions File | SNV and INDEL calling | hg38 WGS regions created using UTR, exome, and start/stop codon features of the GENCODE 31 reference, augmented with PASS variant calls from Strelka2 and Mutect2 |
WGS.hg38.mutect2.vardict.unpadded.bed | Reference Regions File | SNV and INDEL calling | hg38 BROAD Institute interval calling list (restricted to Chr1-22,X,Y,M and non-N regions) used for Mutect2 and VarDict variant callers |
WGS.hg38.strelka2.unpadded.bed | Reference Regions File | SNV and INDEL calling | hg38 BROAD Institute interval calling list (restricted to Chr1-22,X,Y,M) used for Strelka2 variant caller |
WGS.hg38.vardict.100bp_padded.bed | Reference Regions File | SNV and INDEL calling | WGS.hg38.mutect2.vardict.unpadded.bed with each region padded by 100 bp used for VarDict variant caller |
snv-consensus-plus-hotspots.maf.tsv.gz | Analysis file | Kids First somatic workflow consensus calls | Consensus (2 of 4) maf +1/4 hotspots |
snv-mutect2-tumor-only-plus-hotspots.maf.tsv.gz | Analysis file | Kids First Tumor Only workflow | Mutect2 tumor only with additional filters to remove t_alt_count <5 |
cnv-cnvkit.seg.gz | Processed data file | Copy number variant calling; Workflow | Somatic Copy Number Variant - CNVkit SEG file |
cnv-consensus.seg.gz | Analysis file | [copy_number_consensus_call](https://github.com/d3b-center/OpenPedCan-analysis/tree/dev/analyses/copy_number_consensus_call) | Somatic Copy Number Variant - WGS samples only |
cnvkit_with_status.tsv, consensus_seg_with_status.tsv | Analysis files | copy_number_consensus_call | CNVkit calls for WXS or CNV consensus calls for WGS with gain/loss status |
cnv-consensus-gistic.gz | Analysis file | run-gistic | GISTIC results - WGS samples only |
cnv-controlfreec.tsv.gz | Processed data file | Copy number variant calling; Workflow | Somatic Copy Number Variant - TSV file that is a merge of ControlFreeC *_CNVs files |
cnv-controlfreec-tumor-only.tsv.gz | Processed data file | Copy number variant calling Workflow - tumor only | Somatic Copy Number Variant - TSV file that is a merge of ControlFreeC *_CNVs files |
cnv-gatk.seg.gz | Processed data file | Copy number variant calling | Somatic Copy Number Variant - TSV SEG file produced by GATK CNV |
consensus_wgs_plus_cnvkit_wxs_plus_freec_tumor_only.tsv.gz | Analysis file | focal-cn-file-preparation | TSV file containing genes with copy number changes per biospecimen; all chromosomes |
consensus_wgs_plus_cnvkit_wxs_plus_freec_tumor_only_x_and_y.tsv.gz | Analysis file | focal-cn-file-preparation | TSV file containing genes with copy number changes per biospecimen; sex chromosomes only |
consensus_wgs_plus_cnvkit_wxs_plus_freec_tumor_only_autosomes.tsv.gz | Analysis file | focal-cn-file-preparation | TSV file containing genes with copy number changes per biospecimen; autosomes only |
snv-mutation-tmb-all.tsv | Processed data file | tmb-calculation | TSV file with sample names and their tumor mutation burden counting all variants |
snv-mutation-tmb-coding.tsv | Processed data file | tmb-calculation | TSV file with sample names and their tumor mutation burden counting all variants in coding regions only |
sv-manta.tsv.gz | Processed data file | Structural variant calling; Workflow | Somatic Structural Variant - Manta output, annotated with AnnotSV (WGS samples only) |
splice-events-rmats.tsv.gz | Processed data | Kids First splice variant workflow; Workflow | rMATs single sample workflow |
cptac-protein-imputed-phospho-expression-log2-ratio.tsv.gz | Processed data | CPTAC pediatric brain tumor phospho-proteomics expression | Imputed phospho-protein expression, log2 TMT ratio |
cptac-protein-imputed-prot-expression-abundance.tsv.gz | Processed data | CPTAC pediatric brain tumor protein expression | Imputed whole cell protein expression, total abundance |
cptac-protein-imputed-prot-expression-log2-ratio.tsv.gz | Processed data | CPTAC pediatric brain tumor protein expression | Imputed whole cell protein expression, log2 TMT ratio |
gbm-protein-imputed-phospho-expression-abundance.tsv.gz | Processed data | CPTAC adult GBM brain tumor phospho-proteomics expression | Imputed phospho-protein expression, total abundance |
gbm-protein-imputed-prot-expression-abundance.tsv.gz | Processed data | CPTAC adult GBM brain tumor protein expression | Imputed whole cell expression, total abundance |
hope-protein-imputed-phospho-expression-abundance.tsv.gz | Processed data | Adult and Young Adolescent (AYA) brain tumor phospho-proteomics expression (Project HOPE) | Imputed phospho-protein expression, total abundance |
hope-protein-imputed-prot-expression-abundance.tsv.gz | Processed data | Adult and Young Adolescent (AYA) brain tumor protein expression (Project HOPE) | Imputed whole cell protein expression, total abundance |
rna-dna-qc-stats.tsv | Reference QC file | Quality control metrics for WGS, WXS, DNA panel, and RNA-Seq samples | Used to filter samples for data release |
mirna-expression-counts.rds | Processed data | miRNA expression counts | Generated from HTG-Seq |
independent-specimens.methyl.primary.tsv | | | |
independent-specimens.methyl.relapse.tsv | | | |
independent-specimens.rnaseq.primary.eachcohort.tsv | | | |
independent-specimens.rnaseq.primary.tsv | | | |
independent-specimens.rnaseq.relapse-pre-release.tsv | | | |
independent-specimens.rnaseq.relapse.eachcohort.tsv | | | |
independent-specimens.rnaseq.relapse.tsv | | | |
independent-specimens.rnaseq.primary-plus-pre-release.tsv | | | |
independent-specimens.rnaseqpanel.primary-plus.pre-release.tsv | | | |
independent-specimens.rnaseqpanel.primary-plus.tsv | | | |
independent-specimens.rnaseqpanel.primary.eachcohort.tsv | | | |
independent-specimens.rnaseqpanel.primary.tsv | | | |
independent-specimens.rnaseqpanel.relapse.eachcohort.tsv | | | |
independent-specimens.rnaseqpanel.relapse.tsv | | | |
independent-specimens.wgs.primary-plus.eachcohort.tsv | | | |
independent-specimens.wgs.primary-plus.tsv | | | |
independent-specimens.wgs.primary.eachcohort.tsv | | | |
independent-specimens.wgs.primary.tsv | | | |
independent-specimens.wgs.relapse.eachcohort.tsv | | | |
independent-specimens.wgs.relapse.tsv | | | |
independent-specimens.wgswxspanel.primary-plus.eachcohort.prefer.wgs.tsv | | | |
independent-specimens.wgswxspanel.primary-plus.eachcohort.prefer.wxs.tsv | | | |
independent-specimens.wgswxspanel.primary-plus.prefer.wgs.tsv | | | |
independent-specimens.wgswxspanel.primary-plus.prefer.wxs.tsv | | | |
independent-specimens.wgswxspanel.primary.eachcohort.prefer.wgs.tsv | | | |
independent-specimens.wgswxspanel.primary.eachcohort.prefer.wxs.tsv | | | |
independent-specimens.wgswxspanel.primary.eachcohort.tsv | | | |
independent-specimens.wgswxspanel.primary.prefer.wgs.tsv | | | |
independent-specimens.wgswxspanel.primary.prefer.wxs.tsv | | | |
independent-specimens.wgswxspanel.primary.tsv | | | |
independent-specimens.wgswxspanel.relapse.eachcohort.prefer.wgs.tsv | | | |
independent-specimens.wgswxspanel.relapse.eachcohort.prefer.wxs.tsv | | | |
independent-specimens.wgswxspanel.relapse.eachcohort.tsv | | | |
independent-specimens.wgswxspanel.relapse.prefer.wgs.tsv | | | |
independent-specimens.wgswxspanel.relapse.prefer.wxs.tsv | | | |
independent-specimens.wgswxspanel.relapse.tsv | | | |
independent-specimens.methyl.primary-plus.eachcohort.tsv | | | |
independent-specimens.methyl.primary.eachcohort.tsv | | | |
independent-specimens.methyl.relapse.eachcohort.tsv | Analysis files | independent-samples | Independent (non-redundant) sample list of DNA, RNA, or methylation samples of all sequencing methods, from primary, primary-plus, or relapse tumors within each or across all cohorts |
independent-specimens.rnaseq.primary-plus-pre-release.tsv | | | |
independent-specimens.rnaseq.primary-pre-release.tsv | | | |
independent-specimens.rnaseq.relapse-pre-release.tsv | Analysis files | independent-samples | Independent (non-redundant) sample list of RNA samples of all sequencing methods, from primary, primary-plus, or relapse tumors across all cohorts, used to run fusion_filtering pre-release |
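
Several of the region files above are described as an unpadded BED file "with each region padded by" 300 bp or 100 bp. As a rough illustration of what that padding means, here is a minimal Python sketch. It is not the workflow's actual method, and the function names `parse_bed` and `pad_regions` are hypothetical. It assumes standard 0-based, half-open BED coordinates, clamps starts at 0, and does not clamp ends to chromosome lengths or merge regions that overlap after padding.

```python
from typing import Iterable, List, Tuple

# (chrom, start, end) in 0-based, half-open BED coordinates
BedRegion = Tuple[str, int, int]


def parse_bed(lines: Iterable[str]) -> List[BedRegion]:
    """Parse the first three columns (chrom, start, end) of BED lines."""
    regions: List[BedRegion] = []
    for line in lines:
        line = line.strip()
        # Skip blank lines and common comment/header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        regions.append((chrom, int(start), int(end)))
    return regions


def pad_regions(regions: Iterable[BedRegion], pad: int) -> List[BedRegion]:
    """Extend each region by `pad` bp on both sides; starts are clamped at 0."""
    return [(chrom, max(0, start - pad), end + pad) for chrom, start, end in regions]


# Illustrative input only; these coordinates are made up.
example = ["# toy regions", "chr1\t1000\t2000", "chr1\t100\t250"]
padded = pad_regions(parse_bed(example), pad=300)
for chrom, start, end in padded:
    print(f"{chrom}\t{start}\t{end}")
```

With `pad=300`, the first toy region grows from 1000-2000 to 700-2300, and the second is clamped to start at 0. A production version would also need chromosome sizes to cap the end coordinates.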