Error while retrieving HOGIDs #46

dhanuushbala · 2025-01-03T10:09:49Z

I am trying to identify all missing genes in my species of interest for my project. For this, I performed the proteome assessment and retrieved the HOG IDs of all missing genes. I now want to verify that the genes on this list are actually missing by blasting a gene sequence from the same HOG family against my species of interest and comparing the result with the annotation file I have.

I can retrieve the list successfully, but when I use the HOG IDs to get the sequence of a related HOG member, I get "500 Server Error: Internal Server Error" for some HOG IDs (not all).

Are there any internal errors on the server? It would be great if you could help resolve my problem.

Do the HOG IDs change when the LUCA h5 database is updated? I am asking because during my study I had to rerun the proteome assessment when the proteome was updated. I would also like to know the version.

Sincerely,
Dhanuush.

alpae · 2025-01-06T11:46:16Z

Hi Dhanuush,

Could you let us know how you try to get the sequences for a given HOG ID? Are you using the OMA API?

It would also be helpful if you could tell us when the requests were made, so I can check in the logs what goes wrong.

Regarding the second question: yes, the HOG IDs change with every OMA release. The letter in the HOG ID (currently E) advances to the next letter. The OMA browser resolves old HOG IDs to new ones where possible (based on shared gene membership and taxonomic level). We are also considering providing a flat file in the future to convert old HOG IDs to new ones.
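To make the versioning concrete, here is a minimal sketch of reading the release letter from a HOG ID, so that IDs collected under different releases can be spotted before they are mixed. It assumes the IDs look like the ones quoted in this thread (`HOG:`, one capital letter, then digits). The helper names `hog_release_letter` and `split_by_release` are hypothetical and not part of omadb.

```python
import re

# Assumed ID shape, based only on the IDs quoted in this thread,
# e.g. "HOG:E0732584". Anything after the digits is ignored.
_HOG_ID = re.compile(r"^HOG:([A-Z])(\d+)")


def hog_release_letter(hog_id):
    """Return the release letter of a HOG ID, or None if it does not match."""
    match = _HOG_ID.match(hog_id)
    return match.group(1) if match else None


def split_by_release(hog_ids, current_letter="E"):
    """Split IDs into (current, other) lists by their release letter."""
    current, other = [], []
    for hog_id in hog_ids:
        if hog_release_letter(hog_id) == current_letter:
            current.append(hog_id)
        else:
            other.append(hog_id)
    return current, other
```

IDs that land in the second list would need to be resolved to the current release (for example through the OMA browser) before being queried.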

Best,

Adrian

dhanuushbala · 2025-01-06T17:35:39Z

Yes, I am using the API. Here is the snippet for clarity:

`from omadb import Client
import random

c = Client()
random.seed(0)

# Write a FASTA file (nucleotide sequences).
# output_dir and lost_df come from earlier steps not shown here.
fasta_file = output_dir + 'missing_genes.fna'
lost_hogs = lost_df['hog']
level = 'Tetrapoda'

with open(fasta_file, "w") as outfile:
    for hog_id in lost_hogs:
        # Pick one random member of the HOG at the requested level.
        members = c.hogs.members(hog_id, level=level)
        members = [x['entry_nr'] for x in members]
        entrynr = int(random.sample(members, 1)[0])

        # Header: entry number | canonical id | HOG id
        outfile.write(">" + str(entrynr) + "|" +
                      c.entries[entrynr]['canonicalid'] + "|" +
                      hog_id +
                      "\n")
        outfile.write(c.entries[entrynr]['cdna'] + "\n")`

We ran this in December (around the 12th/13th). I ran it again this week in January, and it gave similar errors.

Okay, thanks. What is the current version (the E version) called?
How frequent are these updates?

Best,
Dhanuush.

alpae · 2025-01-07T11:15:55Z

Dear Dhanuush,

It looks like you enforce the Tetrapoda level when retrieving the HOG members, but several of these HOGs don't reach up the species tree as far as Tetrapoda (e.g. 'HOG:E0732584' goes up to Amniota, and 'HOG:E0718996' goes up to Myomorpha). Why do you expect the gene to exist at the Tetrapoda level? (You didn't try to map it with OMAmer to that level only, right?)

If you want to ensure that you don't select a member gene from the HOG that is too distant from your species, you could first check whether the HOG reaches a certain level, and fetch the members at that level, or at the root level otherwise:

for hog_id in lost_hogs:
    hoginfo = c.hogs.info(hog_id)
    if "Tetrapoda" == hoginfo.level or "Tetrapoda" in hoginfo.alternative_levels:
        lev = "Tetrapoda"
    else:
        lev = None
    members = c.hogs.members(hog_id, level=lev)
    members = [x['entry_nr'] for x in members]

Of course, instead of checking only Tetrapoda, you could also iterate over the whole lineage of your species of interest, e.g. try the more recent levels first and go up until you find one that the HOG reaches.
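The lineage walk described above can be sketched as a small helper that does not call the API itself. This is only an illustration under stated assumptions: `pick_level` is a hypothetical name, `hog_levels` stands for whatever set of levels you know the HOG is defined at (for example `hoginfo.level` plus `hoginfo.alternative_levels` from the snippet above), and the lineage list is an illustrative subset of the mouse lineage, not necessarily the exact level names used in OMA.

```python
def pick_level(hog_levels, lineage):
    """Return the most recent level in `lineage` that the HOG is defined at.

    `lineage` must be ordered from the most recent clade to the oldest.
    Returns None if no level matches, which can then be passed on as
    level=None to fall back to the HOG's root level.
    """
    available = set(hog_levels)
    for level in lineage:
        if level in available:
            return level
    return None


# Illustrative subset of a mouse lineage, most recent clade first (assumption).
mouse_lineage = ["Murinae", "Myomorpha", "Rodentia", "Mammalia", "Amniota", "Tetrapoda"]

print(pick_level({"Myomorpha"}, mouse_lineage))            # Myomorpha
print(pick_level({"Amniota", "Mammalia"}, mouse_lineage))  # Mammalia
print(pick_level({"Vertebrata"}, mouse_lineage))           # None
```

The returned value would replace the hard-coded `"Tetrapoda"` when calling `c.hogs.members(hog_id, level=...)`.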

The current OMA release (with HOG letter E) is called All.Jul2024. Note that you can check which OMA release an OMAmer database was built from by running

omamer info --db LUCA.h5

We usually update the OMA browser 1-2 times per year (nowadays rather once a year).

Cheers, Adrian

dhanuushbala · 2025-01-07T14:14:24Z

Hi Adrian,

Right! Now I see where the problem is. Thanks a lot for the information.

I forgot to mention something. I am trying this out with several species (Pleurodeles, axolotl, Xenopus, mouse, human, chicken, etc.).

I do not get this error for Pleurodeles, axolotl and Xenopus (the basal tetrapods), but I do get it for mouse, human and the other higher-level tetrapods. While this makes sense, I am wondering why the basal tetrapods I am looking at do not produce these errors (that is, why HOG IDs that do not reach Tetrapoda are not showing up among the missing genes in these species). I will check whether anything went wrong in the first step, the proteome assessment.
If you have comments on these thoughts, please let me know!

Cheers,
Dhanuush.

alpae · 2025-01-08T06:59:31Z

Hi Dhanuush,

As a guess, it could simply be because OMA does not include that many basal tetrapod species, so there are fewer levels between the extant species and the Tetrapoda level at which we could infer gene gains (i.e. the root level of a HOG). Only 3 out of 117 species are non-amniote species.

Cheers
