-
Notifications
You must be signed in to change notification settings - Fork 93
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add sus scrofa #1312 #1672
base: main
Are you sure you want to change the base?
add sus scrofa #1312 #1672
Conversation
I will work with @AprilYUZhang on revising this PR (for issue #1312) and the follow-up QC. |
stdpopsim/catalog/SusScr/species.py
Outdated
_ZhangEtAl = stdpopsim.Citation( | ||
doi="https://doi.org/10.1016/j.gpb.2022.02.001", | ||
year=2022, | ||
author="Mingpeng Zhang et al.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change to author="Zhang et al.",
stdpopsim/catalog/SusScr/species.py
Outdated
_JohnssonEtAl = stdpopsim.Citation( | ||
doi="https://doi.org/10.1186/s12711-021-00643-0", | ||
year=2021, | ||
author="Martin Johnsson et al.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change to author="Johnsson et al.",
stdpopsim/catalog/SusScr/species.py
Outdated
"16": 1.1407464686635653e-08, | ||
"17": 1.3779949516098884e-08, | ||
"18": 1.3679923888648167e-08, | ||
"X": -1, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should provide a value for X chromosome too. I appreciate that Johnsson et al. (2021) does not list X-chromosome recombination rate. Hence, I suggest that you calculate average recombination rate across other chromosomes (exclude Y and MT) and replace this for the X-chromosome rec rate value. Make this calculation clear in the code here.
stdpopsim/catalog/SusScr/species.py
Outdated
# citation is optional and can be deleted if not needed.] | ||
citations=[ | ||
stdpopsim.Citation( | ||
author="", year=-1, doi="", reasons={stdpopsim.CiteReason.ASSEMBLY} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@AprilYUZhang provide reference for the pig genome assembly that we use here
stdpopsim/catalog/SusScr/species.py
Outdated
common_name="Pig", | ||
genome=_genome, | ||
ploidy=2, | ||
# The age at first pregnancy varies in the wild from about 10 to 20 months, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@AprilYUZhang cite the paper here in the comments and point to page/figure/table for there estimates
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I have no idea; this sentence is from the explanation of generation time. Should I write "the details see in _ZhangEtAl"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Point to where in the paper is this detail, say page
stdpopsim/catalog/SusScr/species.py
Outdated
author="Martin Johnsson et al.", | ||
reasons={stdpopsim.CiteReason.REC_RATE}, | ||
) | ||
# [The following are notes for implementers and should be deleted |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@AprilYUZhang cite the paper here in the comments and point to page/figure/table for there estimates
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@AprilYUZhang I left some comments for you to address. I will then have another look. Thanks for this addition to the stdpopsim catalogue!
stdpopsim/catalog/SusScr/species.py
Outdated
"MT": 0, | ||
} | ||
|
||
# the de novo mutation rate is inferred by Mingpeng Zhang et al 2022, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@AprilYUZhang point to page/figure/table for there estimates from Zhang et al
…us_scrofa pull remote, becase remote have been updated
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #1672 +/- ##
=======================================
Coverage 99.85% 99.85%
=======================================
Files 136 139 +3
Lines 4690 4718 +28
Branches 470 470
=======================================
+ Hits 4683 4711 +28
Misses 3 3
Partials 4 4 ☔ View full report in Codecov by Sentry. |
I advised @AprilYUZhang offline on this implementation, which is now very complete, and I volunteer to QC it. I have a question though about recombination rate averaging. The information on the recombination rate for this species comes from a study with an extensive pedigree (so we deem it's very good) and the study reported the recombination rate per Mb of each autosome. To get a value for the X chromosome, @AprilYUZhang has:
I think this makes sense, but would like to hear from others here what you have done in such cases. I think that weighted average to represent the recombination rate per basepair for the genome is the right thing to do (average of ratios ...). But is this the right value to use for the X chromosome (in the absence of other information) or should one use the simple (unweighted) average. It's a minor detail, but I thought I would ask here. Otherwise, I think this looks very good @AprilYUZhang and could soon be merged in by the maintainers. You will have to squash the commits before this happens though! @AprilYUZhang is also looking at adding the recombination map and demographic models later. |
Weighted average is consistent with what we do elsewhere - sounds good! |
# which is roughly equivalent to the average age of | ||
# the first pregnancy in sows and the beginning of rut in boars | ||
# plus the pregnant gestation period of sows." | ||
generation_time=3, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm - generation time isn't the average age of first pregnancy of an individual, it's the average age of a parent at time of birth of their child, i.e., the thing you'd get by picking a random individual and asking how old their parent was when they were born. For instance, human generation time is 25-30 years (estimates vary).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@petrelharp your are totally right, but this is verbatim text from the paper that we are following, so I guess we stick to their 3 years, which is an OK value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Um, I guess? I mean, according to the stated rationale, that explanation is an underestimate, probably by a good factor, unless most pigs die after one litter.
This paper used 5 years, which the paper you're citing cites several other papers as using? Hm, that's "based on a studbook"; I wonder what the method is?
However, googling suggests that a mean life span is like 4-8 years; so 3 can't be too far off (maybe just a factor of 2). If you do want to use this one, then make sure you document that it is not the actual generation time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@petrelharp you are completely right here! Darn, I should know this (generation interval is a key quantity in the breeder's equation) :( I guess I was too fixated on following the publication. Indeed, looking now at this, 3 years is indeed low. @AprilYUZhang best if you follow https://link.springer.com/article/10.1186/s12711-016-0204-2 for the generation interval statistic. This will be a "fun" discrepancy when dealing with demographic models:(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if part of the discrepancy is that for the breeder's equation the relevant "generation time" is "how fast can we get these pigs to breed" which is more like the minimum value, not the average-in-the-wild value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, your definition still stands! Generation interval is the average age of parents at the age of their progeny. Having said this, to increase response to selection, breeders aim to reduce this average by replacing parents as soon as possible.
Looks good! Just a thing about the generation time. |
Thank you for your suggestions @gregorgorjanc @petrelharp! I will follow these comments. I still have a question: do we need to change other parameters? Because now there are mainly two sets of pig demography paramsters. One is sourced from "Analyses of pig genomes provide insight into porcine demography and evolution" where the generation interval is 5 years and the mutation rate is |
That is a good question. I think that (a) for the "species" parameters, choose what you judge to be the "best". However, for just this sort of reason, demographic models can specify their own mutation rate (and I think generation time?) that overrides the species parameters, so things are self-consistent. If you run into problems, let me know? |
@AprilYUZhang That 2.5x10^-8 is from human studies (at least that's what https://pmc.ncbi.nlm.nih.gov/articles/PMC3566564/ and https://pmc.ncbi.nlm.nih.gov/articles/PMC4053821/ state, and https://link.springer.com/article/10.1186/s12711-016-0204-2 cites these two papers as a source). The one you started with (3.6x10^-9) has an independent estimate from de-novo mutations in a pedigree, so it should be better, but it's an order of magnitude lower than the human one (which might be ok) and the recombination rate (which makes me wonder). Annoyingly, this new demographic model preprint https://www.biorxiv.org/content/10.1101/2025.02.05.636574v1 also used 5 years for generation interval and 2.5x10^-8 mutation rate (citing https://link.springer.com/article/10.1186/s12711-016-0204-2), so these values keep propagating because everyone uses them:( Chrissy has done work in this space (https://icqg6.org/wp-content/uploads/2020/11/ICQG6_2020_Abstract_Book.pdf#page=85, https://link.springer.com/article/10.1186/s12864-023-09296-3) so talk to her - she will be able to guide you to the best current estimate for mutation rate. But for the generation interval, I do think that 3 years could indeed be low, particularly for a wild population, which would correspond to most of the period that we would simulate here! I reckon that 5 years is a better estimate, however I am now starting to wonder if the value of 5 is propagated through literature similarly as the human mutation rate!? Can you please check the average age at first litter for sows and for boars and their average lifespan and a reference that looked into this without plucking a value from the air? |
No description provided.