-
Hello, I am currently using AlphasimR to simulate two traits and would like to know if it is possible to simulate incomplete pleiotropy, where some QTLs are shared between the two traits while others are trait-specific. Additionally, can I control the correlation value between the two traits during this simulation? I read that this is not possible in 2022. Has this changed? Thank you in advance for your assistance. |
Beta Was this translation helpful? Give feedback.
Replies: 2 comments 2 replies
-
Hi @vfririon. Yes, you have the option to simulate fully pleiotropic and non-pleotropic traits or a completely general solution where you provide QTL effects of your choice. We cover some of this in our recent paper - see Plant breeding simulations with AlphaSimR at https://doi.org/10.1002/csc2.21312 and associated code examples at https://github.com/HighlanderLab/jbancic_alphasimr_plants/blob/main/04_Features/multipleTraits.R and https://github.com/HighlanderLab/jbancic_alphasimr_plants/blob/main/04_Features/importExternalHaplo.R The flexibility comes at a cost of the ease of controlling genetic correlation - this is why the default is fully pleiotropic - in this case it is easy to specify the desired correlation, less so in the other cases. I think more could be done here and would be an interesting algorithmic and code development! |
Beta Was this translation helpful? Give feedback.
-
Hi @gregorgorjanc, AlphaSimR does not natively allow to simulate incomplete pleiotropy. To overcome this limitation, four "pseudo-traits" are simulated: two that exhibit complete pleiotropy and two that are independent. These pseudo-traits can then be paired and combined, either theoretically or through selection on a multi-trait index, for exemple. ####################################################################################
#' An AlphaSimR script that generates a genetic background for two correlated traits
#' (some QTLs are pleiotropic, that is, common to both traits, others are specific).
#' The script targets specified genetic variances and correlation.
#'
#' @author Victor FRIRION (URFM - INRAE)
#'
#' R version 4.3.0 (2023-04-21 ucrt)
####################################################################################
#' @description
#' The `createPop` function create a population with specified genetic parameters
#'
#' @param drawingMethod Character. Specifies the method for drawing allele frequencies.
#' Options include `"Fixed"`, `"User-defined"`, `"Uniform"`, `"Beta"`, `"Gamma"`, and `"Normal"`.
#' @param nbInd Integer. Number of individuals in the population.
#' @param var_trait1 Numeric. Genetic variance for Trait 1.
#' @param tolerance_mean_trait1 Numeric. Tolerance for the genetic mean of Trait 1. The target genetic mean value is 0.
#' @param nbChr_trait1_spe Integer. Number of chromosomes with QTLs specific to Trait 1 (one locus per chromosome).
#' @param var_trait2 Numeric. Genetic variance for Trait 2.
#' @param tolerance_mean_trait2 Numeric. Tolerance for the genetic mean of Trait 2. The target genetic mean value is 0.
#' @param nbChr_trait2_spe Integer. Number of chromosomes with QTLs specific to Trait 2 (one locus per chromosome).
#' @param geneticCorr Numeric. Target genetic correlation between Trait 1 and Trait 2.
#' @param tolerance_corr Numeric. Tolerance for the genetic correlation.
#' @param pleioVarProp_trait1 Numeric. Proportion of genetic variance for Trait 1 due to the pleiotropic QTLs.
#' @param pleioVarProp_trait2 Numeric. Proportion of genetic variance for Trait 2 due to the pleiotropic QTLs.
#' @param nbChr_pleio Integer. Number of chromosomes with pleiotropic QTLs shared between the traits (one locus per chromosome).
#' @param maxIterations Integer. Maximum number of iterations allowed to achieve the desired genetic parameters. Default is 1000.
#'
#' @return The function returns a list containing an object of class "Pop" and the SimParam object from AlphaSimR.
#' @return Additionally, it stores population characteristics (e.g., genotypes, substitution effects) in the global environment for further diagnosis.
#'
#' @details
#' This function generates a genetic map and haplotypes to initialize a population using the AlphaSimR package.
#' It iteratively adjusts the population to meet the specified genetic means and correlation within given tolerances.
#'
#' @note AlphaSimR does not natively support incomplete pleiotropy. To address this,
#' four pseudo-traits are simulated: two exhibiting complete pleiotropy and two
#' independent. These are then paired and combined to achieve the desired outcome.
#'
createPop <- function(drawingMethod,
nbInd,
var_trait1 = NULL,
tolerance_mean_trait1 = NULL,
nbChr_trait1_spe = NULL,
var_trait2 = NULL,
tolerance_mean_trait2 = NULL,
nbChr_trait2_spe = NULL,
geneticCorr = NULL,
tolerance_corr = NULL,
pleioVarProp_trait1 = NULL,
pleioVarProp_trait2 = NULL,
nbChr_pleio = NULL,
maxIterations = 1000){
## Check if AlphaSimR is installed, install it if missing, and then load it
list.of.packages <- c("AlphaSimR")
new.packages <- list.of.packages[!(list.of.packages %in% installed.packages()[,"Package"])]
if(length(new.packages)) install.packages(new.packages)
sapply (list.of.packages, require, character.only = T)
## Calculate genetic variances, and correlation
var_trait1_spe <- var_trait1 * (1 - pleioVarProp_trait1)
var_trait1_pleio <- var_trait1 * pleioVarProp_trait1
var_trait2_spe <- var_trait2 * (1 - pleioVarProp_trait2)
var_trait2_pleio <- var_trait2 * pleioVarProp_trait2
cov_total <- geneticCorr * sqrt(var_trait1) * sqrt(var_trait2) # Total genetic covariance
corr_pleio <- cov_total / (sqrt(var_trait1_pleio) * sqrt(var_trait2_pleio)) # Contribution of the pleiotropic effects to the total correlation
# Check if corr_pleio is >= 1, which can occur when specified values of pleiotropic variances are
# too low relative to total variances
if (corr_pleio >= 1) {
stop("The specified genetic correlation cannot be achieved with the current total and pleiotropic variance values. Please check and adjust the variance or correlation input values.")
}
corrMatrix <- matrix(c(1, corr_pleio,
corr_pleio, 1),
nrow = 2, ncol = 2)
rownames(corrMatrix) <- c("Trait1", "Trait2")
colnames(corrMatrix) <- c("Trait1", "Trait2")
## Create a genetic map for nbChr_tot chromosomes, with 1 segregating site per chromosome, that is, per linkage group
nbChr_tot <- nbChr_trait1_spe + nbChr_trait2_spe + nbChr_pleio # Total number of chromosomes
genMap <- lapply(1:nbChr_tot, function(x) seq(1, length.out = 1))
## Set the frequency of allele 1 at each locus based on the user-specified drawing method
alleleFreqs <- switch(drawingMethod,
"Fixed" = rep(0.5, nbChr_tot),
"User-defined" = {
userDefinedFreqs <- c(0.2, 0.5, 0.8, 0.3, 0.7, 0.4, 0.6, 0.1, 0.9, 0.5, 0.5, 1, 0.1) # configurable
if (length(userDefinedFreqs) != nbChr_tot) {
stop(paste("The allele frequency vector must have length", nbChr_tot))
}
userDefinedFreqs
},
"Uniform" = runif(nbChr_tot, min = 0.01, max = 0.99),
"Beta" = rbeta(nbChr_tot, shape1 = 0.5, shape2 = 2), # configurable
"Gamma" = {
gammaDraw <- rgamma(nbChr_tot, shape = 1, rate = 2) # configurable
(gammaDraw - min(gammaDraw)) / (max(gammaDraw) - min(gammaDraw)) * 0.98 + 0.01
},
"Normal" = {
normalDraw <- rnorm(nbChr_tot)
(normalDraw - min(normalDraw)) / (max(normalDraw) - min(normalDraw)) * 0.98 + 0.01
},
stop("Invalid drawingMethod. Use 'User-defined', 'Uniform', 'Beta', 'Gamma', or 'Normal'.")
)
## Generate haplotypes for each locus from the frequencies specified for allele 1
generateHaplotypes <- function(nbInd, freq) {
# Generate alleles 0/1 with probability defined by freq for allele 1
sample(x = 0:1, size = nbInd * 2 , replace = TRUE, prob = c(1 - freq, freq))
}
haplotypes <- lapply(1:nbChr_tot, function(i) { matrix(generateHaplotypes(nbInd, alleleFreqs[i]), nrow = nbInd * 2, ncol = 1) })
## Create a founder population from the genetic map and haplotypes
founderPop = newMapPop(genMap = genMap, haplotypes = haplotypes)
## Loop to achieve genetic means close to 0 and a correlation near the specified target
cat(paste0("Start loop to satisfy tolerance criteria...", "\n"))
proccess <- T
j <- 1
while (proccess && j < maxIterations) {
# Simulation parameters setup
SP <<- SimParam$new(founderPop)
# Add an unused trait with a QTL on each chromosome (Otherwise, if the first added trait only
# includes certain chromosomes, the unused chromosomes are deleted.)
SP$addTraitA(nQtlPerChr = 1,
mean = 0,
var = 0,
name = c("NULL"))
# Add a single pseudo-trait ('Trait1_spe')
SP$addTraitA(nQtlPerChr = c(rep(1, nbChr_trait1_spe), rep(0, nbChr_trait2_spe), rep(0, nbChr_pleio)),
mean = 0,
var = c(var_trait1_spe),
gamma = T,
shape = 1, # TODO: shape to be configured
name = c("Trait1_spe"))
# Add a single pseudo-trait ('Trait2_spe')
SP$addTraitA(nQtlPerChr = c(rep(0, nbChr_trait1_spe), rep(1, nbChr_trait2_spe), rep(0, nbChr_pleio)),
mean = 0,
var = c(var_trait2_spe),
gamma = T,
shape = 1, # TODO: shape to be configured
name = c("Trait2_spe"))
# Add two pseudo-traits ('Trait1_pleio' and 'Trait2_pleio'), exhibiting complete pleiotropy
SP$addTraitA(nQtlPerChr = c(rep(0, nbChr_trait1_spe), rep(0, nbChr_trait2_spe), rep(1, nbChr_pleio)),
mean = c(0,0),
var = c(var_trait1_pleio, var_trait2_pleio),
gamma = T,
shape = 1, # TODO: shape to be configured
name = c("Trait1_pleio", "Trait2_pleio"),
corA = corrMatrix)
# Get the substitution effects
substiEffects_trait1_spe <- SP$traits[[2]]@addEff
substiEffects_trait2_spe <- SP$traits[[3]]@addEff
substiEffects_trait1_pleio <- SP$traits[[4]]@addEff
substiEffects_trait2_pleio <- SP$traits[[5]]@addEff
substiEffects_trait1 <- c(substiEffects_trait1_spe, substiEffects_trait1_pleio)
substiEffects_trait2 <- c(substiEffects_trait2_spe, substiEffects_trait2_pleio)
# Create an initial population from the founderPop
pop1 <- newPop(founderPop, simParam = SP)
# Get the genotypes and compute the genetic means and correlation
genotypes <- pullQtlGeno(pop1)
genotypes_trait1_spe <- genotypes[, (1 : nbChr_trait1_spe)]
genotypes_trait2_spe <- genotypes[, (1 + nbChr_trait1_spe):(nbChr_trait1_spe + nbChr_trait2_spe)]
genotypes_pleio <- genotypes[, (1 + nbChr_trait1_spe + nbChr_trait2_spe):(nbChr_trait1_spe + nbChr_trait2_spe + nbChr_pleio)]
genotypes_trait1 <- cbind(genotypes_trait1_spe, genotypes_pleio)
genotypes_trait2 <- cbind(genotypes_trait2_spe, genotypes_pleio)
mean_trait1 <- mean((genotypes_trait1 - 1) %*% substiEffects_trait1)
mean_trait2 <- mean((genotypes_trait2 - 1) %*% substiEffects_trait2)
corr_pleio_pop <- cor(substiEffects_trait1_pleio, substiEffects_trait2_pleio)
cat(sprintf("\r Number of iterations: %d", j))
j <- j + 1
if (j == maxIterations) {
cat("\n")
stop("Failure - Please adjust the tolerance values or increase the number of iterations and try again.")
}
# Check if the genetic means and correlation are tolerable for traits 1 and 2
if (abs(mean_trait1) < tolerance_mean_trait1 &&
abs(mean_trait2) < tolerance_mean_trait2 &&
(abs(corr_pleio_pop - corr_pleio) < tolerance_corr)) {
proccess <- FALSE
cat(paste0("\n", "Tolerance criteria met", "\n"))
}
}
## Add population characteristics in the global environment (for diagnostics)
# 1. Number of chromosomes
list2env(mget(c("nbChr_trait1_spe", "nbChr_trait2_spe", "nbChr_pleio", "nbChr_tot")), envir = .GlobalEnv)
# 2. Substitution effects
list2env(mget(c("substiEffects_trait1", "substiEffects_trait2")), envir = .GlobalEnv)
list2env(mget(c("substiEffects_trait1_spe", "substiEffects_trait2_spe", "substiEffects_trait1_pleio", "substiEffects_trait2_pleio")), envir = .GlobalEnv)
# 3. Genotypes
list2env(mget(c("genotypes", "genotypes_trait1","genotypes_trait2")), envir = .GlobalEnv)
list2env(mget(c("genotypes_trait1_spe","genotypes_trait2_spe", "genotypes_pleio")), envir = .GlobalEnv)
# 4. Genetic means
list2env(mget(c("mean_trait1", "mean_trait2")), envir = .GlobalEnv)
# 5. Genetic variances
observedVar_trait1 <- varA(pop = pop1, simParam = SP)[2,2] + varA(pop = pop1, simParam = SP)[4,4]
observedVar_trait2 <- varA(pop = pop1, simParam = SP)[3,3] + varA(pop = pop1, simParam = SP)[5,5]
list2env(mget(c("observedVar_trait1", "observedVar_trait2")), envir = .GlobalEnv)
# 6. Genetic correlations between traits
corr_Trait1xTrait2 <- cor((genotypes_trait1 - 1) %*% substiEffects_trait1,
(genotypes_trait2 - 1) %*% substiEffects_trait2)
list2env(mget(c("corr_Trait1xTrait2", "corr_pleio_pop")), envir = .GlobalEnv)
## Return population and SP
cat(paste0("Population created successfully", "\n"))
return(list(pop1, SP))
}
## Example call:
rm(list = setdiff(ls(), c("createPop"))) # Clear the global environment, keeping only the createPop() function
myPopSP = createPop(
drawingMethod = "Fixed", # 'Fixed', 'Uniform', 'Beta', 'Gamma', 'Normal' or 'User-defined'
nbInd = 25000,
var_trait1 = 0.0042,
tolerance_mean_trait1 = 0.01,
nbChr_trait1_spe = 20,
var_trait2 = 6.4e-07,
tolerance_mean_trait2 = 0.0001,
nbChr_trait2_spe = 20,
geneticCorr = 0.35,
tolerance_corr = 0.01,
pleioVarProp_trait1 = 0.5,
pleioVarProp_trait2 = 0.5,
nbChr_pleio = 20,
maxIterations = 5000
)
## Diagnostic
# Actual allele 1 frequencies
plotAlleleFrequencies <- function(traitNum) {
alleleFreqs <- colMeans(get(paste0("genotypes_trait", traitNum))) / 2
barplot(alleleFreqs, main = paste("Trait ", traitNum),
xlab = "Allele", ylab = "Allele Frequency", ylim = c(0, 1))
hist(alleleFreqs, main = paste("Trait ", traitNum),
xlab = "Allele Frequency", xlim = c(0, 1), breaks = seq(0, 1, by = 0.05))
}
plotAlleleFrequencies(traitNum = 1)
plotAlleleFrequencies(traitNum = 2)
# Genetic means and variances
calculateAndPrintStats <- function(traitNum) {
genotypes <- get(paste0("genotypes_trait", traitNum))
substiEffects <- get(paste0("substiEffects_trait", traitNum))
alleleFreqs <- colMeans(genotypes) /2
geneticMean <- get(paste0("mean_trait", traitNum))
observedVar <- get(paste0("observedVar_trait", traitNum))
panmicticVar <- (2 * alleleFreqs * (1 - alleleFreqs)) %*% (substiEffects^2)
geneticMean <- format(geneticMean, scientific = TRUE, digits = 3)
observedVar <- format(observedVar, scientific = TRUE, digits = 3)
panmicticVar <- format(panmicticVar, scientific = TRUE, digits = 3)
cat(paste0(paste0("Trait ", traitNum), "\n"))
cat(paste0("GeneticMean = ", geneticMean, "\n"))
cat(paste0("observedVar = ", observedVar, ", panmicticVar = ", panmicticVar, "\n"))
}
calculateAndPrintStats(traitNum = 1)
calculateAndPrintStats(traitNum = 2)
# Genetic correlation between trait 1 and trait 2
plot((genotypes_trait1 - 1) %*% substiEffects_trait1,
(genotypes_trait2 - 1) %*% substiEffects_trait2,
xlab="Trait 1",
ylab="Trait 2",
main = (paste0("Cor(", "Trait 1", ", ", "Trait 2",") = ", round(corr_Trait1xTrait2, 3))))
# Comparing pleiotropic and specific allele substitution effects
#' @warning : Since the pleiotropic and non-pleiotropic allelic effects are drawn independently,
#' it is crucial to ensure that their distributions are similar, especily when pleioVarProp values
#' are far from 0.5.
plotSubstitutionEffects <- function(traitNum) {
substiEffects_pleio <- get(paste0("substiEffects_trait", traitNum, "_pleio"))
substiEffects_spe <- get(paste0("substiEffects_trait", traitNum, "_spe"))
absSubstiEffects_pleio <- abs(substiEffects_pleio)
absSubstiEffects_spe <- abs(substiEffects_spe)
breaks <- seq(0, max(c(absSubstiEffects_pleio, absSubstiEffects_spe)), length.out = 31)
hist(absSubstiEffects_pleio, breaks = breaks, col = rgb(1, 0, 0, 0.5),
main = paste0("Trait ", traitNum), xlab = "Substitution effects",
xlim = range(c(absSubstiEffects_pleio, absSubstiEffects_spe)),
ylim = c(0, max(hist(absSubstiEffects_pleio, plot = FALSE, breaks = breaks)$counts,
hist(absSubstiEffects_spe, plot = FALSE, breaks = breaks)$counts)))
hist(absSubstiEffects_spe, breaks = breaks, col = rgb(0, 0, 1, 0.5), add = TRUE)
legend("topright", legend = c("pleio QTL", "spe QTL"),
fill = c(rgb(1, 0, 0, 0.5), rgb(0, 0, 1, 0.5)))
print(wilcox.test(substiEffects_pleio, substiEffects_spe)) # Test de Wilcoxon
print(ks.test(substiEffects_pleio, substiEffects_spe)) # Test de Kolmogorov-Smirnov
}
plotSubstitutionEffects(traitNum = 1)
plotSubstitutionEffects(traitNum = 2)
|
Beta Was this translation helpful? Give feedback.
Hi @gregorgorjanc,
As promised, here is a script to simulate a genetic background for two traits correlated with incomplete pleiotropy (some QTLs are pleiotropic, that is, common to both traits, others are specific).
AlphaSimR does not natively allow to simulate incomplete pleiotropy. To overcome this limitation, four "pseudo-traits" are simulated: two that exhibit complete pleiotropy and two that are independent. These pseudo-traits can then be paired and combined, either theoretically or through selection on a multi-trait index, for exemple.