Commit
This PR adds a workflow for automating the analysis of the landscape of a given domain, given a (mostly declarative) configuration describing the resources in that domain. It includes five landscape analyses:

1. [Disease](https://github.com/biopragmatics/semra/blob/disease-landscape/scripts/disease/disease-landscape.ipynb)
2. [Cell & Cell Line](https://github.com/biopragmatics/semra/blob/disease-landscape/scripts/cell/cell-landscape.ipynb)
3. [Anatomy](https://github.com/biopragmatics/semra/blob/disease-landscape/scripts/anatomy/anatomy-landscape.ipynb)
4. [Protein Complex](https://github.com/biopragmatics/semra/blob/disease-landscape/scripts/complex/complex-landscape.ipynb)
5. [Gene](https://github.com/biopragmatics/semra/blob/disease-landscape/scripts/gene/gene-landscape.ipynb)

To do:

- [x] Make comparison chart between raw mappings + processed
- [x] Integrate GARD
- [x] Integrate OMIM
- [x] Integrate Orphanet
- [x] Remove HPO
- [x] Create upset plot
- [x] Slice out irrelevant hierarchies from MeSH, EFO, etc.
- [x] Create landscape histogram

This PR also makes other improvements to the underlying SeMRA pipeline and web app, and closes #16.

![](https://raw.githubusercontent.com/biopragmatics/semra/disease-landscape/notebooks/landscape/disease/graph.svg)

We're able to automatically generate an UpSet plot like the one in [How many rare diseases are there? (Haendel *et al.*, 2020)](https://doi.org/10.1038/d41573-019-00180-y) (a plot similar to the following appears in the [supplementary info](https://media.nature.com/original/magazine-assets/d41573-019-00180-y/17308594) and an explanation appears on [Zenodo](https://zenodo.org/records/3478576)). Note that our plot covers all diseases, not specifically rare ones:

![](https://raw.githubusercontent.com/biopragmatics/semra/disease-landscape/notebooks/landscape/disease/landscape_upset.svg)

The following histogram estimates how many diseases there are. Importantly, it shows how many diseases appear in only a single resource, how many appear in every resource, and how many appear in a few:

![](https://raw.githubusercontent.com/biopragmatics/semra/disease-landscape/notebooks/landscape/disease/landscape_histogram.svg)
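
For readers who want the intuition behind the histogram and UpSet plot, here is a minimal sketch of the counting involved. It is not the SeMRA implementation: the `components` data, the resource prefixes, and both function names are hypothetical, and it assumes mappings have already been merged so that each set lists the resources that contain one disease concept.

```python
from collections import Counter

# Hypothetical toy data (not real SeMRA output): each set holds the
# resources that contain one merged disease concept.
components = [
    {"doid", "mondo", "mesh"},
    {"doid", "mondo"},
    {"mesh"},
    {"doid", "mondo", "mesh"},
    {"mondo"},
]
resources = {"doid", "mondo", "mesh"}


def landscape_histogram(components):
    """Count concepts by the number of resources that contain them."""
    return Counter(len(component) for component in components)


def upset_counts(components):
    """Count concepts per exact combination of resources (UpSet bars)."""
    return Counter(frozenset(component) for component in components)


histogram = landscape_histogram(components)
print(dict(sorted(histogram.items())))  # {1: 2, 2: 1, 3: 2}
print("in a single resource:", histogram[1])
print("in all resources:", histogram[len(resources)])
print("estimated total concepts:", sum(histogram.values()))
```

In this toy example, two concepts appear in only one resource, two appear in all three, and the total number of merged concepts (five) is the estimate of how many distinct diseases there are.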