Commit

updated the paper
ialbert committed Apr 21, 2024
1 parent f08c5f8 commit 2072431
Showing 5 changed files with 39 additions and 22 deletions.
11 changes: 5 additions & 6 deletions Makefile
@@ -41,9 +41,6 @@ tag: test
exe: test
python src/genescape/exe.py --build

shiny: test
pip install rsconnect
rsconnect deploy shiny src/genescape --name biostar --title GeneScape

# Generate images for the documentation
docimg:
@@ -63,8 +60,6 @@ fix:
push:
git commit -am 'saving work' && git push



build: clean
rm -rf build dist
hatch build
@@ -106,6 +101,10 @@ clean:
rm -rf build dist ${IDX_FILE}

env:
conda create -n genescape python=3.11 shiny rsconnect graphviz
micromamba create -n shiny python=3.11 rsconnect-python graphviz make -y

shiny:
cp -f src/genescape/web.py src/app/app.py
rsconnect deploy shiny src/app --name biostar --title GeneScape

.PHONY: test lint fix push clean build publish obo docimg web
4 changes: 4 additions & 0 deletions README.md
@@ -8,6 +8,10 @@

**GeneScape** is a Python-based [Shiny][pyshiny] application that can be run both at the command line and via a graphical user interface.

There is a public version of the software at:

* https://biostar.shinyapps.io/genescape/

[pyshiny]: https://shiny.posit.co/py/

## Quickstart
10 changes: 10 additions & 0 deletions docs/paper.bib
@@ -92,3 +92,13 @@ @ebi.ac.uk;
eprint = {https://academic.oup.com/bioinformatics/article-pdf/25/22/3045/48997998/bioinformatics\_25\_22\_3045.pdf},
}

@InProceedings{networkx,
author = {Aric A. Hagberg and Daniel A. Schult and Pieter J. Swart},
title = {Exploring Network Structure, Dynamics, and Function using NetworkX},
booktitle = {Proceedings of the 7th Python in Science Conference},
pages = {11--15},
address = {Pasadena, CA USA},
year = {2008},
editor = {Ga\"el Varoquaux and Travis Vaught and Jarrod Millman},
}

35 changes: 19 additions & 16 deletions docs/paper.md
@@ -25,9 +25,9 @@ bibliography: paper.bib

The Gene Ontology (GO) [@Ashburner2000; @GO2023] is a structured vocabulary that describes gene products in the context of their associated functions. The ontology takes the form of a directed graph, where each node defines a term, and each edge represents a hierarchical relationship between the terms (the words of the vocabulary).

For example, in the GO data, `GO:0090630` defines *activation of GTPase activity* and is a child of `GO:0043547`, which is a *positive regulation of GTPase activity* which in turn is a child of `GO:0051345` representing a *positive regulation of hydrolase activity*.
For example, in the GO data, `GO:0090630` defines *activation of GTPase activity* and is a child of `GO:0043547`, defined as *positive regulation of GTPase activity*, which in turn is a child of `GO:0051345`, representing *positive regulation of hydrolase activity*.
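
The child-to-parent chain in this example can be sketched as a tiny directed graph. This is a minimal illustration in plain Python, not GeneScape's own code: the `PARENTS` mapping and the `ancestors` helper are hypothetical names, and only the three terms named in the example are included, so the chain is truncated at `GO:0051345`.

```python
# Hypothetical child -> parents mapping; holds only the three terms from the example.
PARENTS = {
    "GO:0090630": ["GO:0043547"],  # activation of GTPase activity
    "GO:0043547": ["GO:0051345"],  # positive regulation of GTPase activity
    "GO:0051345": [],              # positive regulation of hydrolase activity (chain truncated here)
}


def ancestors(term, parents=PARENTS):
    """Return every term reachable from `term` by following child -> parent edges."""
    seen = []
    stack = list(parents.get(term, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.append(node)
            stack.extend(parents.get(node, []))
    return seen


print(ancestors("GO:0090630"))
```

Walking the edges from a term toward the root is what places a gene's annotation in its wider functional context.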

Gene association files (GAF) are text files used to annotate an organism's gene products with Gene Ontology terms, associating a function to a gene product. For example, a GAF file connects a gene product label, such as `ZC3H11B`, with multiple GO terms, such as `GO:0046872` or `GO:0016973`. The complete human genome GAF representation contains 288,575 associations of 19,606 gene symbols over 18,680 GO terms.
Gene association files (GAF) are text files used to annotate an organism's gene products with Gene Ontology terms, associating a function to a gene product. For example, a GAF file connects a gene product label, such as `ZC3H11B`, with multiple GO terms, such as `GO:0046872` or `GO:0016973`. The complete human genome GAF representation contains 288,575 associations of 19,606 gene symbols with over 18,680 GO terms.

The [Gene Ontology Consortium][GO] maintains GAF files for various organisms. Typical genomic analysis protocols generate gene lists that must be placed in a functional context.

@@ -39,12 +39,14 @@ The most annotated gene in the human genome, `HTT1`, currently has 1098 annotati

Web-based tools designed to visualize and filter gene ontology data include `AmiGO` [@AmiGO] and `QuickGO` [@QuickGO]. Command line tools like `goatools` [@goatools] support GO term lineage visualization. R packages like `topGO` [@topGO] implement GO structure visualizations of enriched GO terms. We are unaware of locally installable software that specifically allows for interactive filtering and visualization of gene ontology data derived from gene lists.

GeneScape is a Python package that allows users to visualize a list of gene products in terms of the functional context represented by the Gene Ontology. GeneScape is distributed both as a command-line tool and as GUI-enabled standalone software that does not require Python to be installed on the user's computer, thus making it accessible to a wide range of users.
GeneScape is a Python package that allows users to visualize a list of gene products in terms of the functional context represented by the Gene Ontology. GeneScape is distributed both as a command-line tool and as GUI-enabled standalone software via the [Shiny platform][shiny], thus making it accessible to a wide range of users.

[shiny]: https://shiny.posit.co/

GeneScape is distributed with prebuilt databases for human and mouse genomes. For other organisms, users need to download the GAF files from the Gene Ontology website and run the command:

```
genescape build --gaf mydata.gaf --index mydata.index.gz
genescape build --gaf mydata.gaf.gz --index mydata.index.gz
```

The `build` command will create a database that can then be used for all subsequent analyses with the software. Users should consult the [GeneScape documentation][docs] for up-to-date details.
@@ -64,17 +66,18 @@ GRTP1
GeneScape first transforms the above gene input list into a GO term list, where additional information is added to each term:

```
gid,root,count,function,source,size,label
GO:0090630,BP,1,activation of GTPase activity,GRTP1,4,(1/4)
GO:0046982,MF,1,protein heterodimerization activity,ABTB3,4,(1/4)
GO:0031083,CC,1,BLOC-1 complex,BCAS4,4,(1/4)
GO:0016020,CC,1,membrane,ABTB3,4,(1/4)
GO:0005737,CC,1,cytoplasm,BCAS4,4,(1/4)
GO:0005615,CC,1,extracellular space,C3P1,4,(1/4)
...
count,function,root,goid,source,size,label
1,activation of GTPase activity,BP,GO:0090630,GRTP1,4,(1/4)
1,protein heterodimerization activity,MF,GO:0046982,ABTB3,4,(1/4)
1,BLOC-1 complex,CC,GO:0031083,BCAS4,4,(1/4)
1,membrane,CC,GO:0016020,ABTB3,4,(1/4)
1,cytoplasm,CC,GO:0005737,BCAS4,4,(1/4)
1,extracellular space,CC,GO:0005615,C3P1,4,(1/4)
1,GTPase activator activity,MF,GO:0005096,GRTP1,4,(1/4)
1,endopeptidase inhibitor activity,MF,GO:0004866,C3P1,4,(1/4)
```
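
A table in this shape can be read with the Python standard library alone. The sketch below is illustrative rather than part of GeneScape: it assumes the column order `count,function,root,goid,source,size,label`, copies the eight rows shown, and uses a hypothetical helper, `terms_by_root`, to group GO identifiers by their root namespace.

```python
import csv
import io
from collections import defaultdict

# Rows copied from the term list shown in the paper.
TABLE = """\
count,function,root,goid,source,size,label
1,activation of GTPase activity,BP,GO:0090630,GRTP1,4,(1/4)
1,protein heterodimerization activity,MF,GO:0046982,ABTB3,4,(1/4)
1,BLOC-1 complex,CC,GO:0031083,BCAS4,4,(1/4)
1,membrane,CC,GO:0016020,ABTB3,4,(1/4)
1,cytoplasm,CC,GO:0005737,BCAS4,4,(1/4)
1,extracellular space,CC,GO:0005615,C3P1,4,(1/4)
1,GTPase activator activity,MF,GO:0005096,GRTP1,4,(1/4)
1,endopeptidase inhibitor activity,MF,GO:0004866,C3P1,4,(1/4)
"""


def terms_by_root(text):
    """Group GO ids by root namespace (BP, MF, CC), keeping the input order."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        groups[row["root"]].append(row["goid"])
    return dict(groups)


for root, goids in terms_by_root(TABLE).items():
    print(root, len(goids), goids)
```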

In the next step, GeneScape visualizes the GO terms as the graph structure that represents the functional context of the genes relative to the larger Gene Ontology.
In the next step, GeneScape draws the GO terms as a graph structure using the NetworkX package [@networkx], helping users visualize the functional context of the genes relative to the larger Gene Ontology.

![Ontology subgraph for a gene list \label{fig:interface}](images/genescape-output1.png){height="216pt"}

@@ -83,9 +86,9 @@ Various colors are used to provide additional context to the nodes in the graph;
Since the resulting graphs may also be large, with thousands of nodes, the main interface provides input widgets that allow users to interactively
reduce the subgraph to nodes for which:

1. The function definitions match certain patterns
2. A minimum number of genes share a function,
3. Nodes belong to a specific GO subtree: Biological Process (BP), Molecular Function (MF), Cellular Component (CC)
1. The function definitions match certain patterns.
2. A minimum number of genes share a function.
3. Nodes belong to a specific GO subtree: Biological Process (BP), Molecular Function (MF), Cellular Component (CC).
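
The three criteria listed here can be sketched as a single filter function. This is a hedged illustration, not the interface GeneScape exposes: `filter_terms` and its parameters `pattern`, `mincount`, and `root` are hypothetical names, and the rows are a small hand-written subset of the term list.

```python
import re


def filter_terms(rows, pattern=None, mincount=1, root=None):
    """Keep rows that satisfy all three criteria at once."""
    kept = []
    for row in rows:
        # 1. The function definition must match the pattern (case-insensitive).
        if pattern and not re.search(pattern, row["function"], re.IGNORECASE):
            continue
        # 2. At least `mincount` genes must share the function.
        if int(row["count"]) < mincount:
            continue
        # 3. The term must belong to the requested subtree: BP, MF or CC.
        if root and row["root"] != root:
            continue
        kept.append(row)
    return kept


rows = [
    {"goid": "GO:0090630", "function": "activation of GTPase activity", "root": "BP", "count": 1},
    {"goid": "GO:0005096", "function": "GTPase activator activity", "root": "MF", "count": 1},
    {"goid": "GO:0016020", "function": "membrane", "root": "CC", "count": 1},
]

print([r["goid"] for r in filter_terms(rows, pattern="gtpase")])
print([r["goid"] for r in filter_terms(rows, root="CC")])
```

Applying the filters to the term list before drawing keeps the resulting subgraph small enough to read.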

As an example, take the input gene list of just four genes:

1 change: 1 addition & 0 deletions src/app/requirements.txt
@@ -0,0 +1 @@
genescape
