preparing for 1.0
ialbert committed May 22, 2024
1 parent f3ae1d9 commit 47aeb97
Showing 10 changed files with 62 additions and 38 deletions.
9 changes: 6 additions & 3 deletions Makefile
@@ -40,9 +40,12 @@ tag: test

# Generate images for the documentation
docimg:
genescape tree -o docs/images/gs_output_1.png src/genescape/data/test_genes.txt
genescape tree -o docs/images/gs_output_2.png src/genescape/data/test_genes.txt --mincov 1
genescape tree -m repair --mincov 2 -o docs/images/gs_output_3.png src/genescape/data/test_genes.txt
genescape tree -o docs/images/gs_output_1.png docs/genelist_1.txt
genescape tree -o docs/images/gs_output_2.png docs/genelist_2.txt --mincov 1
genescape tree -m lipid --mincov 2 -o docs/images/gs_output_3.png docs/genelist_2.txt

pdf:
(cd docs && pandoc paper.md -o paper.pdf)

# Runs a linter.
lint:
4 changes: 4 additions & 0 deletions docs/genelist_1.txt
@@ -0,0 +1,4 @@
ABTB3
BCAS4
C3P1
GRTP1
4 changes: 4 additions & 0 deletions docs/genelist_2.txt
@@ -0,0 +1,4 @@
Cyp1a1
Sphk2
Sptlc2
Smpd3
Binary file modified docs/images/gs_output_1.png
Binary file modified docs/images/gs_output_2.png
Binary file modified docs/images/gs_output_3.png
60 changes: 39 additions & 21 deletions docs/paper.md
@@ -1,5 +1,5 @@
---
title: 'GeneScape: A Python package for gene ontology analysis'
title: 'GeneScape: A Python package for gene ontology visualization'
tags:
- Python
- biology
@@ -35,7 +35,7 @@ The [Gene Ontology Consortium][GO] maintains GAF files for various organisms. Ty

# Statement of need

The most annotated gene in the human genome, `HTT1`, currently has 1098 annotations. Thus, even small lists of genes may have a large number of annotations presenting an extraordinary challenge for interpretation. There is a clear need to visualize shared gene functions in an informative manner.
The most annotated gene in the human genome, `HTT`, currently has 1100 annotations. Thus, even small lists of genes may have a large number of annotations presenting an extraordinary challenge for interpretation. There is a clear need to visualize shared gene functions in an informative manner.

Web-based tools designed to visualize and filter gene ontology data include `AmiGO` [@AmiGO] and `QuickGO` [@QuickGO]. Command line tools like `goatools` [@goatools] support GO term lineage visualization. R packages like `topGO` [@topGO] implement GO structure visualizations of enriched GO terms. We are unaware of locally installable software that specifically allows for interactive filtering and visualization of gene ontology terms derived from gene lists.

@@ -72,28 +72,43 @@ GRTP1
Users can process the list above via the command line or via the Shiny interface. A command line invocation might look like:

```console
genescape tree genes.txt -o output.pdf
genescape tree genes1.txt -o output.pdf
```

The command above will produce the image:

![Ontology subgraph for a gene list \label{fig:interface}][img_shiny]{height="216pt"}

[img_shiny]: images/gs_output_1.png

Internally, GeneScape first transforms the input gene list into a GO term list, where additional information is added to each term:

```
count,function,root,goid,source,size,label
1,activation of GTPase activity,BP,GO:0090630,GRTP1,4,(1/4)
1,protein heterodimerization activity,MF,GO:0046982,ABTB3,4,(1/4)
1,BLOC-1 complex,CC,GO:0031083,BCAS4,4,(1/4)
1,membrane,CC,GO:0016020,ABTB3,4,(1/4)
1,cytoplasm,CC,GO:0005737,BCAS4,4,(1/4)
1,extracellular space,CC,GO:0005615,C3P1,4,(1/4)
1,GTPase activator activity,MF,GO:0005096,GRTP1,4,(1/4)
1,endopeptidase inhibitor activity,MF,GO:0004866,C3P1,4,(1/4)
Coverage,Function,Domain,GO,Genes
1,endopeptidase inhibitor activity,MF,GO:0004866,C3P1
1,GTPase activator activity,MF,GO:0005096,GRTP1
1,extracellular space,CC,GO:0005615,C3P1
1,cytoplasm,CC,GO:0005737,BCAS4
1,membrane,CC,GO:0016020,ABTB3
1,PDZ domain binding,MF,GO:0030165,ABTB3
1,BLOC-1 complex,CC,GO:0031083,BCAS4
1,"synaptic transmission, glutamatergic",BP,GO:0035249,ABTB3
1,exploration behavior,BP,GO:0035640,ABTB3
1,protein heterodimerization activity,MF,GO:0046982,ABTB3
1,protein stabilization,BP,GO:0050821,ABTB3
1,activation of GTPase activity,BP,GO:0090630,GRTP1
1,glutamatergic synapse,CC,GO:0098978,ABTB3
```
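
The gene-to-term step described above can be sketched roughly as follows. This is only an illustration and not GeneScape's implementation: the `ANNOTATIONS` mapping and the `coverage_table` helper are hypothetical names, and the annotations are a hand-copied subset of the table above rather than data read from a GAF file.

```python
from collections import defaultdict

# Hypothetical annotations: gene symbol -> GO term IDs.
# A small subset copied from the table above, not parsed from a GAF file.
ANNOTATIONS = {
    "C3P1": ["GO:0004866", "GO:0005615"],
    "GRTP1": ["GO:0005096", "GO:0090630"],
    "BCAS4": ["GO:0005737", "GO:0031083"],
}


def coverage_table(genes, annotations):
    """Map each GO ID to the sorted list of input genes annotated with it."""
    carriers = defaultdict(set)
    for gene in genes:
        # Genes without annotations are skipped silently in this sketch.
        for goid in annotations.get(gene, []):
            carriers[goid].add(gene)
    return {goid: sorted(names) for goid, names in carriers.items()}


table = coverage_table(["C3P1", "GRTP1", "BCAS4"], ANNOTATIONS)
for goid in sorted(table):
    # Coverage is the number of input genes that carry the term.
    print(len(table[goid]), goid, ",".join(table[goid]))
```

In this sketch the coverage of a term is simply the number of input genes that carry it, which corresponds to the `Coverage` column of the table.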

In the next step, GeneScape draws the GO terms as a graph using the NetworkX package [@networkx], helping users visualize the functional context of the genes relative to the larger Gene Ontology.

![Ontology subgraph for a gene list \label{fig:interface}](images/gs_output_1.png){height="216pt"}
Various colors and labels are used to provide additional context to the nodes in the graph; for example, functions present in the input genes are colored green. The intermediate nodes are colored by their category. Node labels display the total annotations and the number of genes that carry that function.

Various colors are used to provide additional context to the nodes in the graph; for example, functions derived from the input genes are colored green. The intermediate nodes are colored by their category.
![Filtering a large graph for a specific term \label{fig:help}][img_help]{height="216pt"}

[img_help]: images/node_help_1.png

In the web interface, users can zoom in and out of the tree. The software's command-line version supports generating outputs in various formats, such as PDF or PNG.

Since the resulting graphs may also be large, with thousands of nodes, the main interface provides input widgets that allow users to interactively
reduce the subgraph to nodes for which:
@@ -111,21 +126,24 @@ Sptlc2
Smpd3
```

the resulting functional ontology graph is huge:
the resulting functional ontology graph is large with 641 nodes and 1007 edges:

![Very few genes can produce a large ontology tree \label{fig:huge}][img_bigtree]

![Very few genes can produce a large ontology tree \label{fig:huge}](images/gs_output_2.png)
[img_bigtree]: images/gs_output_2.png

Users can reduce the tree to show only terms that match the word `repair` via the graphical user interface or the command line:
Users can reduce the tree to show only terms that match the word `lipid` and with at least two genes supporting the function via the graphical user interface or the command line:

```console
genescape tree -m repair genes.txt -o output.pdf
genescape tree -m lipid --mincov 2 genes2.txt -o output.pdf
```

The filtering process will result in a smaller tree focused on the functions that contain the word "repair":
The filtering process will result in a smaller tree with 18 nodes and 29 edges focused on the functions that contain the word "lipid":

![Filtering a large graph for a specific term \label{fig:filter}](images/gs_output_3.png){height="216pt"}
![Filtering a large graph for a specific term \label{fig:filter}][img_filter]{height="216pt"}

[img_filter]: images/gs_output_3.png
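
The combined pattern and coverage filter can be illustrated with a short sketch. It is not the code GeneScape runs: `ROWS` and `filter_terms` are hypothetical, the coverage values are invented for illustration, and the case-insensitive matching is an assumption.

```python
import re

# Hypothetical rows of (coverage, function name); the values are illustrative only.
ROWS = [
    (2, "lipid metabolic process"),
    (1, "lipid binding"),
    (3, "membrane"),
    (2, "sphingolipid biosynthetic process"),
]


def filter_terms(rows, pattern="", mincov=1):
    """Keep rows whose name matches the regex and whose coverage is at least mincov."""
    # Case-insensitive matching is an assumption made for this sketch.
    regex = re.compile(pattern, re.IGNORECASE)
    return [(cov, name) for cov, name in rows if cov >= mincov and regex.search(name)]


kept = filter_terms(ROWS, pattern="lipid", mincov=2)
for cov, name in kept:
    print(cov, name)
```

Because the pattern is treated as a regular expression in this sketch, an alternation such as `lipid|membrane` would keep terms that match either word.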

In the web interface, users can zoom in and out of the tree. The software's command-line version supports generating outputs in various formats, such as PDF or PNG.

The software's primary purpose is to allow users to assess the functional depth of genes and to identify commonalities and differences in the functional context of these genes.

4 changes: 2 additions & 2 deletions src/genescape/gs_graph.py
@@ -49,7 +49,7 @@ def __init__(self, idg, targets=[], root=utils.NS_ALL, mincount=1, pattern=''):

# Store the missing targets
if self.missing_targets:
msg = f"{len(self.missing_targets)} unknown symbols: {','.join(self.missing_targets)}"
msg = f"{len(self.missing_targets)} unknown symbols: {', '.join(self.missing_targets)}"
self.errors.append(msg)

# The valid targets.
@@ -75,7 +75,7 @@ def __init__(self, idg, targets=[], root=utils.NS_ALL, mincount=1, pattern=''):
self.missing_goids = list(filter(lambda x: x not in self.idx.obo, self.goids))

if self.missing_goids:
msg = f"Missing GO terms: {','.join(self.missing_goids)}"
msg = f"Missing GO terms: {', '.join(self.missing_goids)}"
self.errors.append(msg)

# Nodes to build the subgraph from.
2 changes: 1 addition & 1 deletion src/genescape/main.py
@@ -222,7 +222,7 @@ def web(idx_fname='', host='localhost', port=8000, reload=False, test=False):
if test:
os.environ['GENESCAPE_TEST'] = '1'

shiny.run_app("genescape.shiny.tree.app:app", host=host, port=port, reload=reload, factory=True)
shiny.run_app("genescape.shiny.tree.app:app", host=host, port=port, reload=reload)


@run.command()
17 changes: 6 additions & 11 deletions src/genescape/shiny/tree/app.py
@@ -64,7 +64,6 @@ def random_gene_list(res, N=50):
The **Coverage** column indicates how many genes in the input cover that function.
How to reduce the graph size:
1. The **Coverage** filter will select for the minimum coverage.
@@ -75,26 +74,22 @@ def random_gene_list(res, N=50):
* **Coverage=5** at least 5 genes carrying the function
* **Filter=GTP|kinase** keep only functions that match either `GTP` or `kinase`.
Notes:
1. When the **Coverage** is not specified the program will guess a reasonable value for it.
"""

HELP_TAB = """
Type a list of genes or GO terms then press "Draw Tree".
Tips for making graphs smaller:
## Usage
1. Filter for minimum coverage in the functions. You can use regular expressions in the pattern.
1. Press **Draw Tree** to generate the graph.
1. The **Coverage** indicates how many genes in the input cover that function.
1. When the **Coverage** is not specified the program will guess a reasonable value for it.
**Legend**
## Legend
<img src="https://raw.githubusercontent.com/ialbert/genescape-central/main/docs/images/node_help_1.png" class="img-fluid help" alt="Node help">
The coverage indicates how many genes in the input cover that function.
**Colors**
## Colors
Green nodes indicate functions present in input genes.