(4-B/N) update and run fusion summary #513

Merged on Jan 3, 2024 (114 commits)

Commits
bccc21e
update and run fusion summary
jharenza Dec 30, 2023
7d2fa49
run tmb
jharenza Dec 30, 2023
d62ea60
Merge branch 'focal-cn-v13' into fus-summ-v13
jharenza Dec 30, 2023
5782a54
Merge branch 'fus-summ-v13' into tmb-v13
jharenza Dec 30, 2023
c7e9db2
run gsva
jharenza Dec 30, 2023
927e051
run tp53
jharenza Dec 30, 2023
4c0a3e8
rerun MB
jharenza Dec 30, 2023
e1bfb98
restore yaml
jharenza Dec 30, 2023
9436603
run cranio
jharenza Dec 30, 2023
1d0e8e6
pull code from PR493 and rerun
jharenza Dec 30, 2023
92d096e
use keep cols for maf merge
jharenza Dec 30, 2023
a288f32
Merge branch 'mb-v13' into epn-v13
jharenza Dec 30, 2023
96a53b3
run embryonal subtype for v13
zzgeng Dec 15, 2023
acd2b53
add match id and remove PPTC NBL samples
zzgeng Dec 17, 2023
36e0f90
add code from PR494 and rerun
jharenza Dec 30, 2023
b4e4332
pull code from PR495 and rerun
jharenza Dec 30, 2023
1d61240
update gitignore
jharenza Dec 30, 2023
0817909
remove rds file from git
jharenza Dec 30, 2023
a288752
update gitignore
jharenza Dec 30, 2023
3bfdb6f
use match id, rerun with v13
jharenza Dec 17, 2023
3813978
add match id, rerun for v13
jharenza Dec 17, 2023
c5e9746
rerun
jharenza Dec 30, 2023
40543a8
add match_id, run v13
jharenza Dec 17, 2023
b255a95
add match id, run pineo v13
jharenza Dec 17, 2023
0a0f304
rerun
jharenza Dec 30, 2023
fd3e585
pull code from PR500 and rerun:
jharenza Dec 30, 2023
b63010f
pull code from PR502 and rerun
jharenza Dec 30, 2023
63d0629
pull code from PR502 and rerun
jharenza Dec 30, 2023
da15e2f
pull code from PR505 and rerun
jharenza Dec 30, 2023
7203b4a
pull code from PR506 and rerun
jharenza Dec 30, 2023
0ab32d0
make else statement for GA
jharenza Dec 30, 2023
80e9d61
fix df in else
jharenza Dec 30, 2023
8cedd42
Merge branch 'mb-v13' into epn-v13
jharenza Dec 30, 2023
ddeff93
Merge branch 'epn-v13' into emb-v13
jharenza Dec 30, 2023
2b70f6a
remove MN1--MN1 fusions, as they are in epn/lgg not emb
jharenza Dec 30, 2023
e874761
Merge branch 'emb-v13' into chordoma-v13
jharenza Dec 30, 2023
ca4f696
change rds to scratch to fix perm issue, rerun
jharenza Dec 30, 2023
c833738
Merge branch 'chordoma-v13' into ews-v13
jharenza Dec 30, 2023
339b356
Merge branch 'ews-v13' into neuro-v13
jharenza Dec 30, 2023
35b0c91
Merge branch 'neuro-v13' into atrt-v13
jharenza Dec 30, 2023
9cf3045
Merge branch 'atrt-v13' into pineo-v13
jharenza Dec 30, 2023
b131f24
Merge branch 'pineo-v13' into hgg-v13
jharenza Dec 30, 2023
e39b62c
Merge branch 'hgg-v13' into lgg-v13
jharenza Dec 30, 2023
d3714bb
Merge branch 'lgg-v13' into nbl-v13
jharenza Dec 30, 2023
73353c8
Merge branch 'nbl-v13' into path-v13
jharenza Dec 30, 2023
254fb14
rerun
jharenza Dec 30, 2023
d600a0e
Merge branch 'path-v13' into int-v13
jharenza Dec 30, 2023
c166dc5
rerun
jharenza Dec 30, 2023
bb4a481
use match id for methyl matching
jharenza Dec 30, 2023
3b3a42a
Merge branch 'path-v13' into int-v13
jharenza Dec 30, 2023
183dcac
rerun after match id fix in path
jharenza Dec 30, 2023
1e46bd2
fix cg for DHG astrocytoma/gbm/etc
jharenza Dec 30, 2023
240e0d5
update 7316-3240 cg and harm dx
jharenza Dec 30, 2023
ebd8f52
add html
jharenza Dec 30, 2023
b2496c8
fix path free text inclusion using grepl
jharenza Dec 30, 2023
0aa5de6
add specimen list in output
jharenza Dec 30, 2023
8285717
Merge branch 'emb-v13' into chordoma-v13
jharenza Dec 30, 2023
04bf6d7
Merge branch 'chordoma-v13' into ews-v13
jharenza Dec 30, 2023
7a14c3c
Merge branch 'ews-v13' into neuro-v13
jharenza Dec 30, 2023
7e0b629
Merge branch 'neuro-v13' into atrt-v13
jharenza Dec 30, 2023
d1622ca
add PPTC samples
jharenza Dec 30, 2023
2412563
Merge branch 'atrt-v13' into pineo-v13
jharenza Dec 30, 2023
c99c38f
Merge branch 'pineo-v13' into hgg-v13
jharenza Dec 30, 2023
63fe5bd
add PPTC to combine table, was missed!
jharenza Dec 30, 2023
3d5d4b2
add EGFR to alterations
jharenza Dec 31, 2023
3a105f6
Merge branch 'hgg-v13' into lgg-v13
jharenza Dec 31, 2023
bf8e8e2
Merge branch 'lgg-v13' into nbl-v13
jharenza Dec 31, 2023
9d05044
Merge branch 'nbl-v13' into path-v13
jharenza Dec 31, 2023
dc3ee4c
rerun
jharenza Dec 31, 2023
e7f531f
Merge branch 'path-v13' into int-v13
jharenza Dec 31, 2023
10f46f6
fix chordoma subtypes adding TBC
jharenza Dec 31, 2023
4a39283
update BS_Q2R56X78, BS_QV60J6XZ, BS_3DV5FVPQ as K28 mut or altered ba…
jharenza Dec 31, 2023
b4afd07
Merge branch 'path-v13' into int-v13
jharenza Dec 31, 2023
1f9b2fe
fix chordoma typo
jharenza Dec 31, 2023
5f4ad65
Merge branch 'path-v13' into int-v13
jharenza Dec 31, 2023
f818d51
rerun
jharenza Dec 31, 2023
760b884
fix NA cancer groups
jharenza Dec 31, 2023
16c720b
oops, add back other cgs
jharenza Dec 31, 2023
7c6b1db
fix a few more NA, add subtype count table
jharenza Dec 31, 2023
ee84c11
Independent samples v13
jharenza Dec 31, 2023
4840991
add chordoma loss samples
jharenza Dec 31, 2023
0ca563d
run for v13
jharenza Dec 31, 2023
bfb15d5
fix appending rnaseq files, chordoma, and add new samples for non gatk
jharenza Dec 31, 2023
f20a6ac
rerun
jharenza Dec 31, 2023
010ce50
fix typo
jharenza Dec 31, 2023
f525195
add embryonal samples
jharenza Jan 1, 2024
3fe6c7b
change read_rds to readRDS
jharenza Jan 1, 2024
ac8c778
test changing perm of rds file
jharenza Jan 1, 2024
5e083b3
change back to results dir, add gitignore
jharenza Jan 1, 2024
545f744
remove turn off subset data option, dir back to scratch, rm gitignore
jharenza Jan 1, 2024
bd1c001
use match_id
jharenza Jan 2, 2024
fe919a2
Merge branch 'indep2-v13' into subset-v13
jharenza Jan 2, 2024
f7dd28f
Merge pull request #531 from d3b-center/subset-v13
jharenza Jan 2, 2024
e5e71c8
Merge pull request #530 from d3b-center/indep2-v13
jharenza Jan 2, 2024
0ac88a3
Merge pull request #529 from d3b-center/int-v13
zzgeng Jan 2, 2024
73a982b
Merge pull request #528 from d3b-center/path-v13
zzgeng Jan 2, 2024
6839173
Merge pull request #527 from d3b-center/nbl-v13
zzgeng Jan 2, 2024
9380798
Merge pull request #526 from d3b-center/lgg-v13
zzgeng Jan 2, 2024
32fb140
Merge pull request #525 from d3b-center/hgg-v13
zzgeng Jan 2, 2024
d0b717c
Merge pull request #524 from d3b-center/pineo-v13
zzgeng Jan 2, 2024
7981a3e
Merge pull request #523 from d3b-center/atrt-v13
zzgeng Jan 2, 2024
110f096
Merge pull request #522 from d3b-center/neuro-v13
zzgeng Jan 2, 2024
398d223
Merge pull request #521 from d3b-center/ews-v13
zzgeng Jan 2, 2024
2bb49a6
Merge pull request #520 from d3b-center/chordoma-v13
jharenza Jan 2, 2024
1a414a7
Merge pull request #519 from d3b-center/emb-v13
jharenza Jan 2, 2024
f631287
Merge pull request #518 from d3b-center/epn-v13
jharenza Jan 2, 2024
7dd8b64
Merge pull request #517 from d3b-center/mb-v13
jharenza Jan 3, 2024
454737a
rerun HGG, fixing PXA and hope snv maf bugs
jharenza Jan 3, 2024
b2bfef0
rerun path, int, update BS_H1XPVS9A for osissue 490
jharenza Jan 3, 2024
5c5896c
rerun path, int, update BS_H1XPVS9A for osissue 490
jharenza Jan 3, 2024
565d82f
Merge branch 'tp53-v13' of github.com:d3b-center/OpenPedCan-analysis …
jharenza Jan 3, 2024
0ff3e10
Merge pull request #516 from d3b-center/tp53-v13
jharenza Jan 3, 2024
1b5f3df
Merge pull request #515 from d3b-center/gsea-v13
jharenza Jan 3, 2024
0a4ef28
Merge pull request #514 from d3b-center/tmb-v13
jharenza Jan 3, 2024
5 changes: 5 additions & 0 deletions .gitignore
@@ -21,3 +21,8 @@ open_pbta_envs.txt

 # Everything in scratch
 .scratch/
+
+# Expression files in subset directories
+analyses/molecular-subtyping*/*subset/*rsem-tpm*
+
+
176 changes: 123 additions & 53 deletions analyses/create-subset-files/01-get_biospecimen_identifiers.R

Large diffs are not rendered by default.

78 changes: 13 additions & 65 deletions analyses/create-subset-files/02-subset_files.R
@@ -16,40 +16,8 @@ suppressWarnings(
 )
 suppressPackageStartupMessages(library(optparse))
 suppressPackageStartupMessages(library(data.table))
-suppressPackageStartupMessages(library(arrow))
 suppressPackageStartupMessages(options(readr.show_col_types = FALSE))

-write_maf_file <- function(maf_df, file_name, version_string) {
-# Given a data.frame that contains the fields for a MAF file, write a gzipped
-# MAF file and include the version information provided in version_string.
-#
-# Note: if file_name exists, it will be overwritten
-#
-# Args:
-# maf_df: A data.frame that contains the MAF info.
-# file_name: Output file name, including the full path.
-# version_string: the version string that will be written to the first line
-# of the file at file_name
-#
-# Returns: intended to be used to write files only
-
-# if the file name supplied to this function ends in `.gz`, take it out for
-# the purposes of writeLines, etc.
-# we'll gzip it at the end with R.utils::gzip and this extension is not needed
-if (grepl(".gz", file_name)) {
-file_name <- sub(".gz", "", file_name)
-}
-
-# write the version string to the top of the file
-writeLines(version_string, con = file_name)
-
-# write the tabular data of maf_df
-readr::write_tsv(maf_df, path = file_name, append = TRUE, col_names = TRUE)
-
-# now gzip the file
-R.utils::gzip(file_name, overwrite = TRUE)
-}
-
 subset_files <- function(filename, biospecimen_ids, output_directory) {
 # given the full path to a file to be subset and the list of biospecimen ids
 # to use for subsetting, write a file of the same name to the output directory
@@ -75,30 +43,6 @@ subset_files <- function(filename, biospecimen_ids, output_directory) {
 # filtering strategy depends on the file type, mostly because how the sample
 # IDs change based on the file type -- that's why this logic is required
 if (grepl("snv", filename)) {
-# if (grepl("hotspots", filename)) {
-# snv_file <- data.table::fread(filename,
-# skip = 1, # skip version string
-# data.table = FALSE,
-# showProgress = FALSE)
-# # we need to obtain the version string from the first line of the MAF file
-# version_string <- readLines(filename, n = 1)
-# # filter + write to file with custom function
-# snv_file %>%
-# dplyr::filter(Tumor_Sample_Barcode %in% biospecimen_ids) %>%
-# write_maf_file(file_name = output_file,
-# version_string = version_string)
-# snv_file %>%
-# dplyr::filter(Tumor_Sample_Barcode %in% biospecimen_ids) %>%
-# readr::write_tsv(output_file)
-# } else {
-# # in a column 'Tumor_Sample_Barcode'
-# snv_file <- data.table::fread(filename, data.table = FALSE,
-# showProgress = FALSE)
-# snv_file %>%
-# dplyr::filter(Tumor_Sample_Barcode %in% biospecimen_ids) %>%
-# readr::write_tsv(output_file)
-# }
-# in a column 'Tumor_Sample_Barcode'
 snv_file <- data.table::fread(filename, data.table = FALSE,
 showProgress = FALSE)
 snv_file %>%
@@ -133,7 +77,7 @@ subset_files <- function(filename, biospecimen_ids, output_directory) {
 fusion_file %>%
 dplyr::filter(Sample %in% biospecimen_ids |
 # this is required for the fusion-summary module and TP53 module
-grepl("RELA|MN1|EWSR1|FGFR1--TACC1|MYB--QKI|BRAF|TP53--TRPS1|TP53--PSMG4", FusionName)) %>%
+grepl("ZFTA|MN1|EWSR1|FGFR1--TACC1|MYB--QKI|BRAF|TP53--TRPS1|TP53--PSMG4", FusionName)) %>%
 readr::write_tsv(output_file)
 } else if (grepl("dgd", filename)) {
 fusion_file %>%
@@ -168,9 +112,14 @@ subset_files <- function(filename, biospecimen_ids, output_directory) {
 expression_file %>% dplyr::select(transcript_id, gene_symbol,
 !!!rlang::quos(any_of(biospecimen_ids))) %>%
 readr::write_rds(output_file)
-} else if (grepl("methyl", filename)) {
-expression_file %>% dplyr::select(Probe_ID,
-!!!rlang::quos(any_of(biospecimen_ids))) %>%
+# } else if (grepl("methyl", filename)) {
+# expression_file %>% dplyr::select(Probe_ID,
+# !!!rlang::quos(any_of(biospecimen_ids))) %>%
+# readr::write_rds(output_file)
+} else if (grepl("gtex", filename)) {
+expression_file <- readr::read_rds(filename)
+biospecimen_ids <- intersect(colnames(expression_file), biospecimen_ids)
+expression_file %>% dplyr::select(!!!rlang::quos(any_of(biospecimen_ids))) %>%
 readr::write_rds(output_file)
 } else {
 expression_file %>% dplyr::select(!!!rlang::quos(any_of(biospecimen_ids))) %>%
@@ -182,12 +131,11 @@ subset_files <- function(filename, biospecimen_ids, output_directory) {
 independent_file %>%
 dplyr::filter(Kids_First_Biospecimen_ID %in% biospecimen_ids) %>%
 readr::write_tsv(output_file)
-} else if (grepl("splice-events-rmats", filename)) {
+# } else if (grepl("splice-events-rmats", filename)) {
 # in a column 'sample_id'
-rmats_file <- arrow::read_tsv_arrow(filename)
-rmats_file %>%
-dplyr::filter(sample_id %in% biospecimen_ids) %>%
-readr::write_tsv(output_file)
+# rmats_file <- vroom::vroom(filename) %>%
+# dplyr::filter(sample_id %in% biospecimen_ids) %>%
+# readr::write_tsv(output_file)
 } else {
 # error-handling
 stop("File type unrecognized by 'subset_files'")
6 changes: 4 additions & 2 deletions analyses/create-subset-files/README.md
@@ -1,4 +1,4 @@
-## Steps for creating subset files for CI
+## Steps for creating subset files for GitHub Actions CI

 1. Update to the most recent release of the data by running `bash download-data.sh` in the root directory of the repository.
 2. Run the shell script to generate subset files (from the root directory of the repository):
@@ -21,6 +21,8 @@ Non-matched samples are also added to each file (10% of `--num_matched`), which
 Some files are copied over in their entirety (e.g., BED files).
 See `create_subset_files.sh` for more information.

+Note: `splice-events-rmats.tsv.gz` and all `methyl*` files are skipped in v13 because of their large size and because no modules currently use them routinely.
+
 #### Special considerations

 Certain analysis modules have required modifications to the subset file creation steps beyond randomly selecting participants.
@@ -55,6 +57,6 @@ Running the following from the root directory of the repository
 SKIP_SUBSETTING=1 ./analyses/create-subset-files/create_subset_files.sh
 ```

-will skip the subsetting file steps that are implemented in R and only copy files that are included in full (e.g., `pbta-histologies.tsv`) and generate a new `md5sum.txt`.
+will skip the subsetting file steps that are implemented in R and only copy files that are included in full (e.g., `histologies.tsv`) and generate a new `md5sum.txt`.
 This is intended to be used when the only files that need to be updated are those that are copied over without being reduced in size in any way.
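
As a rough illustration of the `SKIP_SUBSETTING=1` path described in the README, the sketch below copies files that are kept in full and then rebuilds `md5sum.txt`. It is not the body of `create_subset_files.sh`: the helper name `refresh_subset`, the numeric guard on `SKIP_SUBSETTING`, the checksum loop, and the throwaway directories are all assumptions made for this example, and it assumes GNU `md5sum` is available on the `PATH`.

```shell
#!/usr/bin/env bash
set -eu

# refresh_subset FULL_DIR SUBSET_DIR
# Hypothetical helper (not part of the repository): copies BED files in full,
# optionally "runs" a subsetting step, then regenerates md5sum.txt.
refresh_subset() {
  local full_dir="$1"
  local subset_dir="$2"
  mkdir -p "$subset_dir"

  # Assumed guard: any value greater than 0 skips the subsetting step.
  if [ "${SKIP_SUBSETTING:-0}" -gt 0 ]; then
    echo "skipping subsetting step"
  else
    echo "running subsetting step"  # placeholder for the R scripts
  fi

  # Files that are copied over in their entirety (BED files here).
  cp "$full_dir"/*.bed "$subset_dir"/

  # Rebuild the checksum manifest from scratch, excluding the manifest itself.
  (
    cd "$subset_dir"
    rm -f md5sum.txt
    for f in *; do
      if [ "$f" = "md5sum.txt" ]; then
        continue
      fi
      md5sum "$f"
    done > md5sum.txt
  )
}

# Demo with a throwaway directory and two empty placeholder BED files.
work_dir="$(mktemp -d)"
mkdir -p "$work_dir/full"
touch "$work_dir/full/a.bed" "$work_dir/full/b.bed"
SKIP_SUBSETTING=1 refresh_subset "$work_dir/full" "$work_dir/subset"
cat "$work_dir/subset/md5sum.txt"
rm -rf "$work_dir"
```

Deleting and rewriting the manifest on every run, rather than appending to it, keeps `md5sum.txt` in step with whatever files are actually present in the subset directory.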

Binary file modified analyses/create-subset-files/biospecimen_ids_for_subset.RDS
Binary file not shown.
12 changes: 10 additions & 2 deletions analyses/create-subset-files/create_subset_files.sh
@@ -8,7 +8,7 @@ set -o pipefail

 # Set defaults for release and biospecimen file name
 BIOSPECIMEN_FILE=${BIOSPECIMEN_FILE:-biospecimen_ids_for_subset.RDS}
-RELEASE=${RELEASE:-v12}
+RELEASE=${RELEASE:-v13}
 NUM_MATCHED=${NUM_MATCHED:-15}

 # This option controls whether or not the two larger MAF files are skipped as
@@ -41,7 +41,6 @@ fi
 # download Illumina methylation annotations file if does not exist in data
 # from the data release s3 bucket
 URL="https://d3b-openaccess-us-east-1-prd-pbta.s3.amazonaws.com/open-targets"
-RELEASE="v12"
 PROBES="infinium.gencode.v39.probe.annotations.tsv.gz"
 if [ -f "${DATA_DIRECTORY}/${PROBES}" ]; then
 echo "${PROBES} exists, skip downloading"
@@ -101,6 +100,15 @@ cp $FULL_DIRECTORY/cnv-consensus-gistic.zip $SUBSET_DIRECTORY
 # all bed files
 cp $FULL_DIRECTORY/*.bed $SUBSET_DIRECTORY

+# DGD fusion file
+cp $FULL_DIRECTORY/fusion-dgd.tsv.gz $SUBSET_DIRECTORY
+
+# All proteomic files
+cp $FULL_DIRECTORY/*protein* $SUBSET_DIRECTORY
+
+# Full tumor only MAF (for now, it is small)
+cp $FULL_DIRECTORY/snv-mutect2-tumor-only-plus-hotspots.maf.tsv.gz $SUBSET_DIRECTORY
+
 # if the md5sum.txt file already exists, get rid of it
 cd $SUBSET_DIRECTORY
 rm -f md5sum.txt
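
The `RELEASE` handling in `create_subset_files.sh` changes in two places: the default moves from `v12` to `v13`, and the later hard-coded `RELEASE="v12"` assignment appears to be dropped. The sketch below shows why the second part matters when a script relies on `${VAR:-default}`: an unconditional assignment further down discards whatever the caller passed in. Both function names are hypothetical and exist only for this illustration; they are not part of the script.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical helper: mirrors the default-expansion pattern and prints the
# release that would be used.
resolve_release() {
  # Use the caller's RELEASE if it is set and non-empty, else fall back to v13.
  local release="${RELEASE:-v13}"
  printf '%s\n' "$release"
}

# Hypothetical helper: same pattern, followed by a hard-coded reassignment.
resolve_release_clobbered() {
  local release="${RELEASE:-v13}"
  release="v12"  # unconditional assignment discards any override
  printf '%s\n' "$release"
}

unset RELEASE
resolve_release                        # prints v13 (default applies)
RELEASE=v11 resolve_release            # prints v11 (override respected)
RELEASE=v11 resolve_release_clobbered  # prints v12 (override lost)
```

With the hard-coded assignment gone, an invocation such as `RELEASE=v13 NUM_MATCHED=15 ./analyses/create-subset-files/create_subset_files.sh` should behave the way the defaults block at the top of the script suggests.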
36 changes: 18 additions & 18 deletions analyses/efo-mondo-mapping/results/efo-mondo-map-prefill.tsv
@@ -4,73 +4,70 @@ Acute Myeloid Leukemia EFO_0000222 MONDO_0018874 NCIT_C3171
Adamantinomatous Craniopharyngioma EFO_1000069 MONDO_0002787 NCIT_C4726
Adenocarcinoma EFO_0000228 MONDO_0004970 NCIT_C2852
Adrenocortical Carcinoma EFO_1000796 MONDO_0006639 NCIT_C9325
Anaplastic Large Cell Lymphoma EFO_0003032 MONDO_0020325 NCIT_C3720
Alveolar soft part sarcoma NA NA NA
Angiosarcoma EFO_0003968 MONDO_0016982 NCIT_C3088
Astroblastoma MONDO_0016707 MONDO_0016707 NCIT_C4324
Astrocytoma EFO_0000272 MONDO_0019781 NCIT_C6958
Atypical Teratoid Rhabdoid Tumor EFO_1002008 MONDO_0020560 NCIT_C6906
Atypical choroid plexus papilloma MONDO_0002684 MONDO_0002684 NCIT_C53686
B Acute Lymphoblastic Leukemia/Lymphoma EFO_0000094 MONDO_0004967 NCIT_C8644
Bladder Urothelial Carcinoma EFO_0006544 MONDO_0005611 NCIT_C39851
Breast Invasive Carcinoma EFO_1000307 MONDO_0006256 NCIT_C9245
Burkitt Leukemia/Lymphoma EFO_0000309 MONDO_0007243 NCIT_C2912
CIC-DUX4 Sarcoma EFO_0000691 MONDO_0005089 NCIT_C165663
CIC-rearranged sarcoma NA NA NA
CNS Burkitt's lymphoma EFO_0000309 MONDO_0007243 NCIT_C2912
CNS Embryonal tumor EFO_0005784 MONDO_0018843 NCIT_C5398
CNS Melanoma EFO_0002617 MONDO_0005191 NCIT_C133504
CNS neuroblastoma EFO_0000621 MONDO_0006130 NCIT_C4826
CNS tumor with BCOR internal tandem duplication NA NA NA
Cavernoma EFO_1000151 MONDO_0003155 NCIT_C3086
Central neurocytoma EFO_1000856 MONDO_0019134 NCIT_C3791
Cervical Squamous Cell Carcinoma and Endocervical Adenocarcinoma EFO_1000162 MONDO_0006143 NCIT_C157526
Cholangiocarcinoma EFO_0005221 MONDO_0019087 NCIT_C4436
Chondromyxoid fibroma EFO_0000332 MONDO_0018447 NCIT_C3830
Chordoma Orphanet_178 MONDO_0008978 NCIT_C2947
Choroid plexus carcinoma MONDO_0016718 MONDO_0016718 NCIT_C4715
Choroid plexus papilloma EFO_1000177 MONDO_0009837 NCIT_C3698
Choroid plexus tumor EFO_0007206 MONDO_0016717 NCIT_C4533
Chromophobe renal cell carcinoma EFO_0000335 MONDO_0017885 NCIT_C4146
Chronic Myelogenous Leukemia EFO_0000339 MONDO_0011996 NCIT_C3174
Clear cell sarcoma of the kidney EFO_0000350 MONDO_0005006 NCIT_C4264
Colon Adenocarcinoma EFO_1001949 MONDO_0002271 NCIT_C4349
Colon Carcinoma NA NA NA
Congenital malignant brain tumor NA NA NA
Craniopharyngioma EFO_1000209 MONDO_0002787 NCIT_C2964
Cutaneous Melanoma EFO_0000389 MONDO_0005012 NCIT_C3510
Desmoid-type fibromatosis EFO_0009907 Orphanet_873 NCIT_C9182
Desmoplastic infantile astrocytoma and ganglioglioma MONDO_0016731 MONDO_0016731 NCIT_C4747
Diffuse fibrillary astrocytoma MONDO_0016688 MONDO_0016688 NCIT_C4322
Diffuse hemispheric glioma MONDO_0016680 MONDO_0016680 NA
Diffuse intrinsic pontine glioma EFO_1000026 MONDO_0006033 NCIT_C94764
Diffuse leptomeningeal glioneuronal tumor MONDO_0016745 MONDO_0016745 NCIT_C129424
Diffuse midline glioma EFO_1000026 MONDO_0006033 NCIT_C129309
Dysembryoplastic neuroepithelial tumor EFO_0005551 MONDO_0005505 NCIT_C9505
Dysgerminoma MONDO_0003002 MONDO_0003002 NCIT_C2996
EBV-Positive Diffuse Large B-Cell Lymphoma NA NA NA
Embryonal tumor with multilayer rosettes MONDO_0016715 MONDO_0016715 NCIT_C129499
Ependymoma EFO_1000028 MONDO_0016698 NCIT_C3017
Epstein-Barr virus-related tumor MONDO_0017342 MONDO_0017342 NA
Esophageal Carcinoma EFO_0002916 MONDO_0019086 NCIT_C3513
Ewing sarcoma EFO_0000174 MONDO_0012817 NCIT_C4817
Extraventricular neurocytoma MONDO_0016727 MONDO_0016727 NCIT_C92555
Fibromyxoid lesion MONDO_0037745 MONDO_0037745 NCIT_C66760
Follicular Variant Thyroid Gland Papillary Carcinoma NA NA NA
Ganglioglioma EFO_0003094 MONDO_0016733 NCIT_C3788
Ganglioneuroblastoma EFO_0000502 MONDO_0005035 NCIT_C3790
Ganglioneuroma EFO_0000500 MONDO_0005033 NCIT_C3049
Germ Cell Tumor EFO_0000514 MONDO_0005040 NCIT_C3708
Germinoma MONDO_0020580 MONDO_0020580 NCIT_C121618
Glial-neuronal tumor MONDO_0016729 MONDO_0016729 NCIT_C4747
Glial-neuronal tumor NOS MONDO_0016729 MONDO_0016729 NCIT_C4747
Glioblastoma MONDO_0018177 MONDO_0018177 NCIT_C30587
Glioblastoma Multiforme EFO_0000519 MONDO_0018177 NCIT_C3058
Head and Neck Squamous Cell Carcinoma EFO_0000181 MONDO_0010150 NCIT_C34447
Hemangioblastoma MONDO_0016748 MONDO_0016748 NCIT_C3801
Hepatoblastoma EFO_1000292 MONDO_0018666 NCIT_C3728
Hepatocellular Carcinoma EFO_0000182 MONDO_0007256 NCIT_C3099
Hepatocellular neoplasm NOS NA NA NA
High-grade glioma MONDO_0100342 MONDO_0100342 NCIT_C4822
High-grade neuroepithelial tumor NA NA NA
Histiocytic tumor MONDO_0020081 MONDO_0020081 NCIT_C9294
Hodgkin's lymphoma EFO_0000183 MONDO_0004952 NCIT_C9357
Infant-type hemispheric glioma EFO_0005543 MONDO_0014695 NCIT_C185471
Infantile Fibrosarcoma MONDO_0002678 MONDO_0002678 NCIT_C4244
Infantile hemispheric glioma NA NA NA
Inflammatory Myofibroblastic Tumor MONDO_0015798 MONDO_0015798 NCIT_C6481
Intrahepatic Cholangiocarcinoma EFO_1001961 MONDO_0003210 NCIT_C35417
Intraneural perineuroma MONDO_0015032 MONDO_0015032 NCIT_C6911
Juvenile xanthogranuloma EFO_1000311 MONDO_0015534 NCIT_C3451
Langerhans Cell histiocytosis EFO_1000318 MONDO_0018310 NCIT_C3107
@@ -87,22 +84,22 @@ Mesenchymal tumor EFO_1000473 MONDO_0003512 NCIT_C7059
Mesothelioma EFO_0000588 MONDO_0005065 NCIT_C3234
Metastatic secondary tumors EFO_0009812 MONDO_0024883 NCIT_C4968
Mixed germ cell tumor MONDO_0015864 MONDO_0015864 NCIT_C4290
Myeloid Leukemia Associated with Down Syndrome NA NA NA
Myeloid Sarcoma NA NA NA
Neuroblastoma EFO_0000621 MONDO_0005072 NCIT_C3270
Neuroepithelial tumor with PATZ1 fusion NA NA NA
Neurofibroma/Plexiform EFO_0000658 MONDO_0003304 NCIT_C3797
Non-Hodgkin Lymphoma EFO_0005952 MONDO_0018908 NCIT_C3211
Non-germinomatous germ cell tumor MONDO_0020580 MONDO_0020580 NCIT_C121619
Oligodendroglioma EFO_0000632 MONDO_0016695 NCIT_C3288
Osteosarcoma EFO_0000637 MONDO_0009807 NCIT_C9145
Other tumor NA NA NA
Ovarian Serous Cystadenocarcinoma EFO_1000043 MONDO_0006046 NCIT_C7978
Pancreatic Adenocarcinoma EFO_1000044 MONDO_0006047 NCIT_C8294
Perineuroma MONDO_0019404 MONDO_0019404 NCIT_C4973
Pancreatoblastoma NA NA NA
Papillary Carcinoma NA NA NA
Pheochromocytoma and Paraganglioma EFO_0020005 MONDO_0035540 NA
Pilocytic astrocytoma Orphanet_251612 MONDO_0016691 NCIT_C4047
Pineoblastoma EFO_1000475 MONDO_0016722 NCIT_C9344
Pineocytoma EFO_1000476 MONDO_0016723 NCIT_C6966
Pleomorphic xanthoastrocytoma MONDO_0016690 MONDO_0016690 NCIT_C4323
Primary intracranial sarcoma NA NA NA
Primary mediastinal large B cell lymphoma MONDO_0004021 MONDO_0020323 NCIT_C9280
Prostate Adenocarcinoma EFO_0000673 MONDO_0005082 NCIT_C2919
Rectum Adenocarcinoma EFO_0005631 MONDO_0002169 NCIT_C9383
@@ -115,15 +112,18 @@ Rosai-Dorfman disease MONDO_0006412 MONDO_0006412 NCIT_C36075
Rosette-forming glioneuronal tumor MONDO_0016736 MONDO_0016736 NCIT_C129431
Sarcoma EFO_0000691 MONDO_0005089 NCIT_C9118
Schwannoma EFO_0000693 MONDO_0002546 NCIT_C3269
Small Cell Carcinoma NA NA NA
Spindle cell neoplasm NA NA NA
Stomach Adenocarcinoma EFO_0000503 MONDO_0005036 NCIT_C4004
Subependymal Giant Cell Astrocytoma MONDO_0016693 MONDO_0016693 NCIT_C3696
T Acute Lymphoblastic Leukemia/Lymphoma EFO_0000209 MONDO_0004963 NCIT_C3183
Teratoma MONDO_0002601 MONDO_0002601 NCIT_C3403
Testicular Germ Cell Tumor EFO_1000566 MONDO_0010108 NCIT_C8591
Thymoma EFO_1000581 MONDO_0006456 NCIT_C3411
Thyroid Carcinoma EFO_0002892 MONDO_0015075 NCIT_C4815
Thyroid Gland Follicular Carcinoma EFO_0000501 MONDO_0005034 NCIT_C8054
Thyroid Gland Papillary Carcinoma EFO_0000641 MONDO_0005075 NCIT_C4035
Thyroid gland neoplasm NA NA NA
Type I Pleuropulmonary Blastoma NA NA NA
Uterine Carcinosarcoma EFO_1000613 MONDO_0006485 NCIT_C42700
Uterine Corpus Endometrial Carcinoma EFO_0007532 MONDO_0000553 NCIT_C159413
Uveal Melanoma EFO_1000616 MONDO_0006486 NCIT_C7712
8 changes: 4 additions & 4 deletions analyses/fusion-summary/01-fusion-summary.Rmd
@@ -62,10 +62,10 @@ prepareOutput <- function(fuseDF, bioid) {
 fuseDF %>%
 # some fusions have in-frame and frameshift fusion calls for a sample
 # this will make unique fusionName and Sample dataset to get 1/0 values
-dplyr::select(Sample,FusionName) %>%
-unique() %>%
-reshape2::dcast(Sample ~ FusionName,fun.aggregate = length) %>%
-right_join(data.frame(Sample = bioid)) %>%
+distinct(Sample, FusionName) %>%
+mutate(Count = 1) %>%
+pivot_wider(names_from = FusionName, values_from = Count, values_fill = list(Count = 0)) %>%
+right_join(data.frame(Sample = specimensUnion)) %>%
 replace(is.na(.), 0) %>%
 rename(Kids_First_Biospecimen_ID = Sample)
 }