Neighbours_package v2
This is the new package for gene neighbourhood analysis. The package contains two main scripts:
Both scripts set up a mongo connection and read the data folder, which stores KEGG pathway information.
Finally, a module is required that contains most of the functions used by the scripts.
Usage: -u <unigene cluster> or -f <file(unigenes_list).txt>
Use -u to inspect a single unigene. To analyze a complete list of unigenes, use -f.
The output contains three different lines:
- KEGG assignment
- eggNOG assignment
- List of neighbour genes for the analysed unigene
In the first and second lines, columns 1 to 11 contain:
gmgc: unigene cluster
query_cogs: cogs (KEGG, eggNOG) assigned to the query unigene cluster in the mongo database
subject_cogs: cogs (KEGG, eggNOG) predicted by neighbourhood analysis
analysed_orfs: number of ORFs that contain the query unigene cluster
number_neigh_genes: total number of neighbour genes present in all the ORFs that make up the unigene cluster
number_neigh_with_cogs: number of neighbour genes with a cog assignment (KEGG or eggNOG)
unique_cogs: number of unique cogs present in the query unigene cluster
count_of_cogs: total count of all cogs (KEGG or eggNOG) present in the query unigene cluster
cog_conservation: 1 - [(unique_cogs/count_of_cogs)/number_neigh_with_cogs]
hit_cog_percentage: percentage of presence of a given cog in the neighbourhood
cog_description: functional description of the cogs
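As a worked example of the cog_conservation formula above, here is a minimal Python sketch. The function name and the input numbers are made up for illustration, and returning None when a denominator is zero is an assumption, not necessarily what the package does.

```python
def cog_conservation(unique_cogs, count_of_cogs, number_neigh_with_cogs):
    """Compute 1 - [(unique_cogs / count_of_cogs) / number_neigh_with_cogs]."""
    # Assumption: with no cogs or no annotated neighbours the score is undefined.
    if count_of_cogs == 0 or number_neigh_with_cogs == 0:
        return None
    return 1 - (unique_cogs / count_of_cogs) / number_neigh_with_cogs


# Illustrative values only: 2 unique cogs, 8 cogs in total, 8 annotated neighbours.
score = cog_conservation(unique_cogs=2, count_of_cogs=8, number_neigh_with_cogs=8)
print(score)  # 0.96875
```

With these numbers, (2/8)/8 = 0.03125, so the score is 0.96875: the fewer distinct cogs there are relative to the total, the closer the value gets to 1.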
The third line lists the neighbour genes of the analysed unigene. ORFs are separated by commas, and the neighbourhood of each ORF is represented by unigene identifiers rather than gene IDs. For example, for one selected ORF from the 000_000_005 unigene cluster, five unigenes are separated by '@': two upstream and two downstream of the query ORF, with the query unigene that represents the ORF in the middle.
If a neighbour gene is not present, its unigene identifier is replaced by 'NA'.
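The format of the third line can be read back with a few lines of Python. This is only a sketch based on the description above (commas between ORFs, '@' between unigenes, 'NA' for missing neighbours). The function name is hypothetical, and every identifier in the sample string other than 000_000_005 is invented.

```python
def parse_neighbourhoods(line):
    """Split a neighbour line into one list of unigene identifiers per ORF.

    Missing neighbours ('NA') are returned as None.
    """
    orfs = []
    for orf in line.strip().split(","):
        orf = orf.strip()
        if not orf:
            continue  # skip empty fields, e.g. after a trailing comma
        unigenes = [None if u == "NA" else u for u in orf.split("@")]
        orfs.append(unigenes)
    return orfs


# Invented sample: two ORFs, two neighbours on each side of the query unigene.
sample = "000_111_222@000_333_444@000_000_005@NA@NA,000_555_666@NA@000_000_005@000_777_888@000_999_000"
for neighbourhood in parse_neighbourhoods(sample):
    query = neighbourhood[len(neighbourhood) // 2]  # query unigene sits in the middle
    print(query, neighbourhood)
```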
Usage: <number_of_neighbours_genes_to_display> <unigene_cluster>
Example: 2 000_000_005
The first argument controls the number of unigenes to display upstream and downstream of your target.
The plotting script computes all the different syntenies present in the unigene cluster and shows only the unique syntenies for each unigene cluster.
The ORFs of the unigene cluster are highlighted in grey, in the central position. Neighbour genes appear in colour: yellow for KEGG and green for eggNOG. Missing neighbour genes are marked with a red [X]. Functional descriptions are shown below.
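Keeping only the unique syntenies of a cluster can be sketched as follows. This is not the package's own code: the function name is hypothetical, and it assumes that two syntenies are the same only when they list the same unigenes in the same order (a reversed neighbourhood counts as different).

```python
def unique_syntenies(neighbourhoods):
    """Return each distinct neighbourhood once, in order of first appearance."""
    seen = set()
    unique = []
    for neighbourhood in neighbourhoods:
        key = tuple(neighbourhood)  # lists are not hashable, tuples are
        if key not in seen:
            seen.add(key)
            unique.append(list(neighbourhood))
    return unique


# Invented example: three ORFs, two of which share the same synteny.
orfs = [
    ["A", "B", "Q", "C", "D"],
    ["A", "B", "Q", "C", "D"],
    ["A", None, "Q", "C", "E"],
]
print(unique_syntenies(orfs))
```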