Questions about how pangene defines genes #15

Open
Fight-a-tiger-in-the-mountain opened this issue Oct 24, 2024 · 0 comments

question2024-10-24.docx

Dear Professor Li Heng, I am a postgraduate student. I analyzed a set of sheep assemblies following your pipeline, but I have run into some problems and would like to ask for your advice.

  1. Regarding the definition of genes in your script: how does pangene define a gene, and how does it determine whether a gene is present in an assembly?

I ran your pipeline and obtained results, but some of them look inconsistent. Part of my output is shown below.

image

The results show the ASIP gene only in ASM1117029 and ASM2422226. However, I also aligned the gene against all of the assemblies with BLAST, and the results are as follows:

image
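For anyone reproducing the BLAST side of this comparison, here is a minimal sketch of how the best hit for a query could be pulled out of one assembly's BLAST output. It assumes BLAST was run with the standard 12-column tabular format (`-outfmt 6`), which the post does not state; the function name `best_hit` and the toy hit lines are hypothetical and are not taken from the attached files.

```python
from typing import NamedTuple, Optional


class BlastHit(NamedTuple):
    query: str
    subject: str
    pident: float    # percent identity of the aligned region
    length: int      # alignment length
    bitscore: float


def best_hit(outfmt6_text: str, query_id: str) -> Optional[BlastHit]:
    """Return the highest-bitscore hit for query_id, or None if there is none."""
    best = None
    for line in outfmt6_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        # Standard outfmt 6 columns: qseqid sseqid pident length mismatch
        # gapopen qstart qend sstart send evalue bitscore
        if len(fields) < 12 or fields[0] != query_id:
            continue
        hit = BlastHit(fields[0], fields[1], float(fields[2]),
                       int(fields[3]), float(fields[11]))
        if best is None or hit.bitscore > best.bitscore:
            best = hit
    return best


# Hypothetical toy output for a single assembly (not real data).
TOY_BLAST = (
    "ASIP\tchr13\t98.50\t400\t6\t0\t1\t400\t1000\t1399\t1e-180\t700\n"
    "ASIP\tchr2\t80.00\t120\t24\t0\t1\t120\t500\t619\t1e-20\t95.5\n"
)
hit = best_hit(TOY_BLAST, "ASIP")
```

Reporting the identity and alignment length of the best hit, rather than only whether any hit exists, makes it easier to compare a BLAST result with the presence/absence call from another tool, since a short or low-identity hit is not necessarily a full-length gene copy.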

My coding ability is not good enough to fully understand your script, so I would like to ask: what criterion does the script use to decide whether an individual contains a given gene?

  2. Regarding the interpretation of the results generated by your script: I used your code to generate some output and found the ASIP gene in the GFA file, as shown in the following figure.

image
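To check programmatically which genomes actually traverse a gene segment in the graph, something like the following sketch could be used. It assumes the GFA uses GFA 1.1 `W` (walk) lines, one per contig of each assembly, and that segments are named by gene, as the ASIP match in the GFA file suggests; both points should be verified against the actual file. The function name and the toy graph are hypothetical.

```python
import re


def walks_through_segment(gfa_text, segment_name):
    """Return sample names whose W-line walk visits segment_name (first-seen order)."""
    samples = []
    for line in gfa_text.splitlines():
        if not line.startswith("W\t"):
            continue
        fields = line.split("\t")
        # GFA 1.1 walk line: W, sample, haplotype index, sequence id,
        # start, end, walk
        if len(fields) < 7:
            continue
        # A walk looks like ">geneA<geneB": an orientation character
        # followed by a segment name, repeated.
        visited = re.findall(r"[<>]([^<>\s]+)", fields[6])
        if segment_name in visited and fields[1] not in samples:
            samples.append(fields[1])
    return samples


# Hypothetical toy graph (not real data): three assemblies, two gene segments.
TOY_GFA = "\n".join([
    "H\tVN:Z:1.1",
    "S\tASIP\t*",
    "S\tGRID1\t*",
    "W\tasmA\t0\tchr13\t*\t*\t>GRID1>ASIP",
    "W\tasmB\t0\tchr13\t*\t*\t>GRID1",
    "W\tasmC\t0\tchr13\t*\t*\t<ASIP<GRID1",
])
print(walks_through_segment(TOY_GFA, "ASIP"))  # ['asmA', 'asmC']
```

A segment can be defined on an `S` line without every genome walking through it, so listing the walks gives a per-assembly view that a plain text search for the gene name does not.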

The corresponding bubble diagram has also been generated, as shown in the following figure

image

The ASIP gene was also found in the Rtab file

image
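The presence/absence row for a single gene can be read out of the Rtab file with a short script. This is only a sketch: it assumes a Roary-style layout (tab-separated, a header row with the gene column first and one column per assembly, and counts where 0 means absent), which should be checked against the real file. The function name and the toy table are hypothetical.

```python
import csv
import io


def samples_with_gene(rtab_text, gene):
    """Return the column names with a non-zero value for gene, or None if no such row."""
    reader = csv.reader(io.StringIO(rtab_text), delimiter="\t")
    header = next(reader)  # first cell labels the gene column, the rest are samples
    for row in reader:
        if row and row[0] == gene:
            return [name for name, value in zip(header[1:], row[1:])
                    if value.strip() not in ("", "0")]
    return None


# Hypothetical toy table (not real data).
TOY_RTAB = (
    "Gene\tasmA\tasmB\tasmC\n"
    "ASIP\t1\t0\t1\n"
    "GRID1\t1\t1\t1\n"
)
print(samples_with_gene(TOY_RTAB, "ASIP"))  # ['asmA', 'asmC']
```

Returning `None` for a missing row keeps "gene not in the table at all" distinct from "gene in the table but absent from every assembly", which would give an empty list.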

However, I cannot find ASIP in the bubble file:

image

The same problem appears in genes such as IFNT11, GRID1 and KLHL.

Those are my main questions. The assemblies I used are listed in the following table (the entries shown in red are the ones that were removed).

Awassi | ASM4054305v1
Bangladeshi_sheep | ASM3243364v1
Charollais_sheep | ASM2241674v1
Chinese_Merino_sheep | ASM2243282v1
Dorper | ASM1914517v1
East_Friesian | NWAFU_Friesian_1.0
East_Friesian_sheep | ASM3343944v1
Guide_black_fur_sheep | ASM4025935v1
Hu_sheep | ASM1117029v1
Hu_sheep | T2T-sheep1.0M
hu_sheep | T2T-sheep1.0
Kazak_sheep | ASM2243284v1
Kermani_sheep | ASM2243283v1
Polled_Dorset_sheep | ASM2241691v1
Qiaoke_sheep | ASM2241668v1
Rambouillet | ARS-UI_Ramb_v3.0
Romanov_sheep | ASM2422217v1
Romney_sheep | ASM2253800v1
Suffolk_sheep | ASM2241672v1
Texel_sheep | ASM2241677v1
Tibetan_sheep | CAU_O.aries_1.0
Ujimqin_sheep | ASM2241675v1
Waggir_sheep | ASM2422226v1
White_Dorper_sheep | ASM2241669v1
Yunnan_sheep | ASM2241678v1

The protein data I used is Ensembl's sheep (Rambouillet v2.0) protein set:
Ovis_aries_rambouillet.ARS-UI_Ramb_v2.0.pep.all.fa

The GTF file I used is the corresponding annotation file:
Ovis_aries_rambouillet.ARS-UI_Ramb_v2.0.111.gtf

My main code is as follows

image image image image

The attachments are my GFA file, Rtab file, bubble file, gene result file and BLAST alignment file, respectively.

I would sincerely appreciate any advice you can give when you have time.
提问2024-10-24.docx
question.zip
