How should I decide the number of Segregating sites? #76
-
In my simulation, I want to use different chromosome lengths, in order to replicate real data as closely as possible. I'm validating my simulated data by calculating the linkage disequilibrium decay, exporting the data with writePlink(). According to the manual, this function exports data using the segregating site values, not the actual base pair values. I'm having trouble understanding what a "segregating site" represents, and whether there is an equivalence between base pairs and segregating sites that I could use to simulate my data.
Replies: 2 comments 2 replies
-
@alanaselli a segregating site is simply a polymorphic site in a genome. I would have to check the actual position in the genome ... @gaynorr can you help?
-
AlphaSimR itself doesn't track physical base pair positions. It only tracks genetic map positions, because that is what really matters when modeling recombination. The order is the same for both, so AlphaSimR does track the physical ordering of loci.

To get base pair positions, you have to convert genetic map positions to physical base pair positions. How you do this depends on how the simulation was created. If you created the simulation from external genotype data, you probably already have the base pair positions. However, I'm assuming you are more likely to be using MaCS via runMacs or runMacs2, so I'll show how to handle this case.

The first thing to note is that MaCS also doesn't track physical base pair positions. As far as I can tell, this is the case for other coalescent software programs too. These programs take a number of base pairs as an argument, but it is used for calculating probabilities, not for tracking actual base pairs. The genome itself is instead modeled using continuous numbers, as opposed to discrete base pair positions. In the case of MaCS, these numbers range between 0 and 1.

To get a physical base pair position from MaCS, you have to convert the continuous numbers MaCS simulates to discrete base pair positions. As an example, assume the software simulates a mutation at position 0.28948 in a simulated piece of DNA. This can be converted to a base pair position by multiplication. Let's assume we used 10,000 base pairs for this piece of DNA, so the approximate base pair position would be 0.28948 × 10,000 = 2894.8, or about 2,895.

I should also note that AlphaSimR forces the first genetic map position on each chromosome to zero, so it loses a bit of information coming from MaCS about where the positions were actually simulated. This means all the base pair positions get shifted over a bit, but the distances between them are conserved.

Finally, I should also note that the way MaCS gets used in AlphaSimR assumes that recombination rates are uniform across the genome. This is clearly not the case in reality, as recombination rates tend to be low around centromeres and higher near telomeres. What I'm really assuming in AlphaSimR is that causal variants (QTLs) and genotyped variants (SNPs) are evenly spread along the genetic map, because the physical map itself doesn't really matter to the simulation. Since gene density is often assumed to be higher around the telomeres, this seems reasonable for QTLs. Whether it is reasonable for SNPs can be assessed by looking at their real distribution along genetic maps.

Here is an example simulation showing how to convert genetic map positions to base pair positions:
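As a minimal sketch of the arithmetic behind that conversion, written in plain Python rather than with AlphaSimR: the function names below are hypothetical and are not part of AlphaSimR or MaCS, and the chromosome size (1 Morgan, 1e8 base pairs) is an assumption chosen only for illustration. It relies on the uniform recombination rate assumption described above, under which the conversion is a simple rescaling.

```python
def macs_to_bp(position, n_bp):
    """Scale a continuous position in [0, 1] to an approximate base pair position."""
    return round(position * n_bp)


def genetic_to_bp(gen_pos, gen_len, n_bp):
    """Convert genetic map positions (in Morgans) to base pair positions.

    Assumes a uniform recombination rate, so that base pairs and genetic
    distance are proportional along the whole chromosome.
    """
    bp_per_morgan = n_bp / gen_len
    return [round(p * bp_per_morgan) for p in gen_pos]


def shift_to_zero(bp_pos):
    """Shift positions so the first locus sits at 0; distances are unchanged."""
    first = bp_pos[0]
    return [p - first for p in bp_pos]


# Worked example from the text: position 0.28948 on a 10,000 bp segment.
example_bp = macs_to_bp(0.28948, 10_000)

# Hypothetical chromosome: 1 Morgan long, 1e8 base pairs (assumed values).
gen_map = [0.0, 0.0125, 0.25, 0.5, 1.0]
bp_map = genetic_to_bp(gen_map, gen_len=1.0, n_bp=100_000_000)

# Forcing the first position to zero shifts every locus by the same amount.
raw_bp = [macs_to_bp(p, 10_000) for p in (0.1, 0.3, 0.6)]
shifted = shift_to_zero(raw_bp)

print(example_bp)
print(bp_map)
print(shifted)
```

To simulate chromosomes of different lengths under the same assumption, the same rescaling would be applied per chromosome with that chromosome's own genetic length and base pair count.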