I was able to run FastOMA, and while analyzing the output I noticed that some genes that I know should be orthologous are separated into several HOGs (with consecutive numbers). So when I tried to use iHam, I could not trace their evolutionary history in one big picture.
Interestingly, these HOGs point to the same OMAmer root HOG in the RootHOGs.tsv file.
I also tried to align these HOGs, and they do not seem to differ much.
My questions are:
What could cause a gene family to be broken into several HOGs?
Could it be due to the gene missing in several taxa?
Thanks for your kind help!
Best,
Edi
Hi Edi
I'm glad you managed to run it. It's unfortunate that you couldn't see the big picture you were looking for. We have seen some cases like this before. It takes some additional checks to work out the exact evolutionary scenario (or whether it is a limitation of the algorithm).
So the gene family inference by FastOMA (OMAmer under the hood) is not bad; enough homology was detected. FastOMA then infers a gene tree at each taxonomic level for each gene family, and infers duplication events using the species overlap concept. One possible explanation is that FastOMA infers a duplication event at the root level of the initial gene family. It then generates one smaller HOG for each ancestral gene (duplicate) at that level. This would mean that a few homologous genes were gained at that level (which might be unlikely).
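To make the species overlap idea concrete, here is a minimal Python sketch on a toy gene tree. It is not FastOMA's actual implementation: the tree encoding (nested lists with (gene, species) leaves), the function names, and the toy data are all assumptions made for illustration. A node is called a duplication when two of its child subtrees share at least one species, and a speciation otherwise.

```python
# Toy illustration of the species overlap concept (NOT FastOMA's own code).
# A leaf is a (gene_id, species) tuple; an internal node is a list of subtrees.

def species_of(tree):
    """Return the set of species found at or below a node."""
    if isinstance(tree, tuple):
        return {tree[1]}
    result = set()
    for child in tree:
        result |= species_of(child)
    return result


def label_events(tree, events=None):
    """Collect (event, species) pairs for all internal nodes, root first."""
    if events is None:
        events = []
    if isinstance(tree, tuple):
        return events  # leaves carry no event
    child_sets = [species_of(child) for child in tree]
    # Duplication if any pair of child subtrees has a species in common.
    overlap = any(
        child_sets[i] & child_sets[j]
        for i in range(len(child_sets))
        for j in range(i + 1, len(child_sets))
    )
    events.append(("duplication" if overlap else "speciation",
                   sorted(species_of(tree))))
    for child in tree:
        label_events(child, events)
    return events


# Hypothetical family: both subtrees below the root contain human and mouse.
toy_tree = [
    [("g1", "human"), ("g2", "mouse")],
    [("g3", "human"), ("g4", "mouse")],
]
events = label_events(toy_tree)
for event, species in events:
    print(event, species)
```

In this toy case the root is labelled a duplication, so each of the two subtrees would end up as its own smaller HOG, which is the kind of split described above.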
If it is an important gene family for your analysis, I can dig into it. I need the fasta file of the gene family and the species tree, which are in the relevant folder in work. We can find the folder for e.g. HOG:E1111 by running something like cd work; find . -name ".command.log" | xargs grep E1111. The folder also includes .command.sh, which gives us the location of the HOG fasta.
I'm wondering how the taxonomic level of the initial gene family (the LCA of all species present in the gene family) compares to that of the final ones. We can see the LCA in .command.log. If it is the same as the root of the input species tree, then it means that the set of input proteomes doesn't have enough outgroups (I guess this scenario is unlikely). It is also unlikely to have a few gained genes at one taxonomic level (there might be some other genes in other species which weren't detected by OMAmer).
I can re-run the relevant step in FastOMA (.command.sh) and add fastoma-infer-subhogs ... --msa-write --gene-trees-write to write all gene trees, to see exactly what happened and whether the duplication inference was correct and not due to some rogue sequences.
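If the find/grep one-liner is awkward to use (for example on a system without xargs), the same search can be sketched in Python. This is only a rough equivalent under stated assumptions: the function name find_work_dirs is hypothetical, and the only things taken from the thread are the work folder, the .command.log and .command.sh file names, and the example ID E1111.

```python
from pathlib import Path


def find_work_dirs(work_root, hog_id):
    """Return folders under work_root whose .command.log mentions hog_id."""
    root = Path(work_root)
    if not root.is_dir():
        return []
    matches = []
    for log in root.rglob(".command.log"):
        try:
            text = log.read_text(errors="replace")
        except OSError:
            continue  # unreadable log file, skip it
        # Plain substring match, like grep: "E1111" also matches "E11112".
        if hog_id in text:
            matches.append(log.parent)
    return sorted(matches)


if __name__ == "__main__":
    # "work" and "E1111" are the example values used in the thread.
    for folder in find_work_dirs("work", "E1111"):
        has_script = (folder / ".command.sh").exists()
        print(folder, "(.command.sh present)" if has_script else "")
```

Each folder it prints should then contain the .command.sh that points to the HOG fasta file.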