Initial add of Jason's LVB Bayesian posts

fishR-Core-Team · Feb 5, 2024 · f5027a0 · f5027a0
1 parent 4b3289f
commit f5027a0
Show file tree

Hide file tree

Showing 32 changed files with 2,708 additions and 79 deletions.
diff --git a/_freeze/blog/posts/2024-2-5_LVB_brms/index/execute-results/html.json b/_freeze/blog/posts/2024-2-5_LVB_brms/index/execute-results/html.json
@@ -0,0 +1,16 @@
+{
+  "hash": "6c48e1d8902d47a2036d49eb08e6130c",
+  "result": {
+    "markdown": "---\ntitle: Bayesian LVB I - brms\ndescription: Tutorial on how to estimate parameters of the LVB model with Bayesian inference and brms\nauthor: Jason Doll\ndate: 2/5/2024\ncategories:\n  - growth\n  - Bayesian\n  - Stan\n---\n\n\n# Introduction\n\nThe use of Bayesian inference in fisheries biology has been increasing. For good reason, as there are many benefits to taking a Bayesian approach. I won't go into those reasons here but you can read about them in @dorazio_2016 and @dolljacquemin_2018. This post assumes you have already decided to use Bayesian methods and will present how to estimate parameters of the von Bertalanffy growth model. Previous posts describe frequentist methods [here](../2019-12-31_vonB_plots_1/) and [here](../2020-1-2_vonB_plots_2/)\n\nThere are many programming languages that can be used to fit models with Bayesian inference. In this post, I will use Stan with the `brms` package. The `brms` packages is a wrapper for Stan that makes fitting simple models easier than writing the full model code. In a [second post](../2024-2-6_LVB_Stan/), I will show how the same model is fit using `rstan` and writing the full Stan model code. Although the `brms` package will write the full Stan model for you.\n\nBoth methods will fit the typical three parameter von Bertalanffy growth model\n\n$$\nTL_i=L_\\infty * (1-e^{(-\\kappa * (t_i-t_0))} )\n$$\nwhere $TL_i$ is total length of individual *i*, $L_\\infty$ is the average maximum length obtained, $\\kappa$ is the Brody growth coefficient, $t_i$ is the age of individual *i*, and $t_0$ is the theoretical age at zero length. To finish the model, an error term is added:\n\n$$\nTL_i=L_\\infty * (1-e^{(-\\kappa * (t_i-t_0))} + \\epsilon_i)\n$$\n$$\n\\epsilon_i \\sim normal(0,\\sigma)\n$$\nWhere $\\epsilon_i$ is a random error term for individual *i* with a mean of 0 and standard deviation $\\sigma$.\n\n# Prior probabilities\nAt the heart of Bayesian analysis is the prior probability distribution. This post will use non-informative prior probability distributions. When and how to use informative priors when fitting a von Bertalanffy growth model will be discussed in a future post. However, you can read about it in @dolljacquemin_2018. The prior probability distributions used in this post are:\n\n| Parameter  | Prior Probability Distribution |\n|------------|--------------------------------|\n| $L_\\infty$ | normal(0,1000)                 |\n| $\\kappa$   | normal(0,10)                   |\n| $t_0$      | normal(0,10)                   |\n| $\\sigma$   | student-t(3,0,30)              |\n\nStan parameterizes the normal distribution with the mean and standard deviation and the student-t distribution with the degrees of freedom, mean, and standard deviation\n\n# Preliminaries\nFirst step is to load the necessary packages\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nlibrary(FSA)\nlibrary(FSAdata)   # for data\nlibrary(dplyr)     # for filter(), select()\nlibrary(ggplot2)   # for plotting\nlibrary(brms)      # for fitting Stan models\nlibrary(tidybayes) # for plotting posterior results\nlibrary(bayesplot) # for plotting posterior predictive checks\n```\n:::\n\n\n# Data\n\nThe `WalleyeErie2` data available in the [`FSAdata`](https://fishr-core-team.github.io/FSAdata/) package was used in previous posts demonstrating von Bertalanffy growth models and will once again be used here. These data are Lake Erie Walleye (*Sander vitreus*) captured during October-November, 2003-2014. As before, the primary interest here is in the tl (total length in mm) and age variables. The data will also be filtered to focus only on female Walleye from location \"1\" captured in 2014.\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\ndata(WalleyeErie2,package=\"FSAdata\")\nwf14T <- WalleyeErie2 %>%\n  filter(year==2014,sex==\"female\",loc==1) %>%\n  select(-year,-sex,-setID,-loc,-grid)\nheadtail(wf14T)\n```\n\n::: {.cell-output .cell-output-stdout}\n```{.output}\n#R|       tl    w      mat age\n#R|  1   445  737 immature   2\n#R|  2   528 1571   mature   4\n#R|  3   380  506 immature   1\n#R|  323 488 1089 immature   2\n#R|  324 521 1408   mature   3\n#R|  325 565 1745   mature   3\n```\n:::\n:::\n\n\n# brms\n\nThe first step is to specify the formula using `brmsformula()`. The first argument below is the von Bertalanffy growth model, the second argument is to specify any hierarchical model of parameters (for example, estimate parameters by group or random effects), and the third is to tell `brms` that I want to fit a nonlinear model. Note in this example I do not specify any random effects for parameters. Also note that tl and age MUST be exactly how they are shown in the column headers of the data.\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\n#Set formula\nformula <- brmsformula(tl ~ Linf * (1 - exp(-K * (age - t0))),\n                       Linf ~ 1, K ~ 1, t0 ~ 1, nl=TRUE)\n```\n:::\n\n\nThe next step is to specify the prior probability distribution. Several parameters can't be negative values so they are truncated using the `lb=0` argument.\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\npriors <- prior(normal(0,1000), nlpar=\"Linf\", lb=0) +\n          prior(normal(0,10), nlpar=\"K\", lb=0) +\n          prior(normal( 0,10), nlpar=\"t0\") +\n          prior(student_t(3,0,40), class=sigma)\n```\n:::\n\n\nAn optional step is to specify initial values for the parameters. It is good practice to always specify initial values, particularly with non-linear and other complex models. I will fit the model using multiple chains and it is advisable to use different starting values for each chain. To accomplish this, we will specify a function and use a random number generator for each parameter. Adjust the range for the uniform distribution to cover a large range of values that make sense for your data. You can use other distributions as long as they match the declared range in the model code. In this example, I am using the random uniform function because $L_\\infty$ and $\\kappa$ are restricted to be positive in the model. Therefore, the starting value must be positive.\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\ninitsLst <- function() list(\n  Linf=runif(1, 200, 800),\n  K=runif(1, 0.05, 3.00),\n  t0=rnorm(1, 0, 0.5)\n)\n```\n:::\n\n\nFinally, use the `brm()` function to send the model, data, priors, initial values, and any specified settings to Stan and save the output in the `fit1` object\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nfit1 <- brm(formula,            # calls the formula object created above\n            family=gaussian(),  # specifies the error distribution\n            data=wf14T,         # the data object\n            prior=priors,       # the prior probability object \n            init=initsLst,      # the initial values object\n            chains=3,           # number of chains, typically 3 to 4\n            cores=3,            # number of cores for multi-core processing. Typically set to\n                                # match number of chains. Adjust as needed to match the number\n                                # of cores on your computer and number of chains\n            iter=3000,          # number of iterations\n            warmup=1000,        # number of warm up steps to discard\n            control=list(adapt_delta=0.80,    # Adjustments to algorithm to \n                           max_treedepth=15)) # improve convergence.\n```\n:::\n\n# Assess convergence\nBefore diving into the output, I like to examine the chains to see if they sufficiently mixed and reached a stationary posterior distribution.\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nplot(fit1)\n```\n\n::: {.cell-output-display}\n![](index_files/figure-html/unnamed-chunk-7-1.png){fig-align='center' width=528}\n:::\n:::\n\n\nThe chains appear to have mixed well for all parameters and reached a stationary posterior. This is seen by a unimodal distribution (on the left) and caterpillar plots for each parameter appear \"on top\" of each other (right).\n\nWe can also do a posterior predictive check to assess how well the model fit the data. A posterior predictive check compares observed data to predicted values based on the fitted model. If the model predicted values are \"similar\" to the observed data values then you can conclude the model fits the data well.\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\npp_check(fit1,ndraws=100)  # posterior predictive checks\n```\n\n::: {.cell-output-display}\n![](index_files/figure-html/unnamed-chunk-8-1.png){fig-align='center' width=336}\n:::\n:::\n\n\nThis figure shows two lines, one represents the data (*y*) and the other represents posterior predicted values from the model (*y<sub>rep</sub>*). The *y<sub>rep</sub>* generally follows the observed data so we can conclude the model fits the data well.\n\n# Posterior summary\nFinally, we can examine the summary table of the output.\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nsummary(fit1)\n```\n\n::: {.cell-output .cell-output-stdout}\n```{.output}\n#R|   Family: gaussian \n#R|    Links: mu = identity; sigma = identity \n#R|  Formula: tl ~ Linf * (1 - exp(-K * (age - t0))) \n#R|           Linf ~ 1\n#R|           K ~ 1\n#R|           t0 ~ 1\n#R|     Data: wf14T (Number of observations: 325) \n#R|    Draws: 3 chains, each with iter = 3000; warmup = 1000; thin = 1;\n#R|           total post-warmup draws = 6000\n#R|  \n#R|  Population-Level Effects: \n#R|                 Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS\n#R|  Linf_Intercept   648.30      9.82   629.35   668.55 1.00     1426     1791\n#R|  K_Intercept        0.36      0.02     0.32     0.40 1.00     1322     1623\n#R|  t0_Intercept      -1.29      0.08    -1.45    -1.12 1.00     1378     1878\n#R|  \n#R|  Family Specific Parameters: \n#R|        Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS\n#R|  sigma    18.38      0.71    17.08    19.84 1.00     2706     2752\n#R|  \n#R|  Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS\n#R|  and Tail_ESS are effective sample size measures, and Rhat is the potential\n#R|  scale reduction factor on split chains (at convergence, Rhat = 1).\n```\n:::\n:::\n\n\nThe summary table provides the point estimates, 95% credible intervals, Rhat values, and ESS. The Rhat values are another check to assess model convergence. It is generally accepted that you want Rhat values less than 1.10. Bulk_ESS and Tail_ESS are also used to assess convergence. Bulk_ESS and Tail_ESS refer to \"Effective Sample Size.\" Because of the nature of MCMC methods, each successive sample from the posterior will typically be autocorrelated within a chain. Autocorrelation within the chains can increase uncertainty in the estimates. One way to assessing how much autocorrelation is present and big of an effect it might be, is with the \"Effective Sample Size.\" ESS represents the number of independent draws from the posterior. The ESS will be lower than the actual number of draws and you are looking for a high ESS. It has been recommended that an ESS of 1,000 for each parameter is sufficient @burkner_2017.\n\n# Posterior plotting\nNow that we are convinced the model fit the data well, let's plot the data and model predictions using the `ggplot2` and `tidybayes` packages.\n\n\n::: {.cell layout-align=\"center\"}\n\n```{.r .cell-code}\nwf14T %>%\n  add_predicted_draws(fit1) %>%  # adding the posterior distribution with tidybayes\n  ggplot(aes(x=age, y=tl)) +  \n  stat_lineribbon(aes(y=.prediction), .width=c(.95, .80, .50),  # regression line and CI\n                  alpha=0.5, colour=\"black\") +\n  geom_point(data=wf14T, colour=\"darkblue\", size=3) +   # raw data\n  scale_fill_brewer(palette=\"Greys\") +\n  ylab(\"Total length (mm)\\n\") + \n  xlab(\"\\nAge (years)\") +\n  theme_bw() +\n  theme(legend.title=element_blank(),\n        legend.position=c(0.15, 0.85))\n```\n\n::: {.cell-output-display}\n![](index_files/figure-html/unnamed-chunk-10-1.png){fig-align='center' width=336}\n:::\n:::\n\n\nThe figure above shows the observed data in blue circles, the prediction line as a solid black line, and the posterior prediction intervals (0.50, 0.80, and 0.95) in different shades of gray.\n\n\n::: {.cell layout-align=\"center\"}\n\n:::\n",
+    "supporting": [
+      "index_files"
+    ],
+    "filters": [
+      "rmarkdown/pagebreak.lua"
+    ],
+    "includes": {},
+    "engineDependencies": {},
+    "preserve": {},
+    "postProcess": true
+  }
+}
diff --git a/_freeze/blog/posts/2024-2-5_LVB_brms/index/figure-html/unnamed-chunk-10-1.png b/_freeze/blog/posts/2024-2-5_LVB_brms/index/figure-html/unnamed-chunk-10-1.png
diff --git a/_freeze/blog/posts/2024-2-5_LVB_brms/index/figure-html/unnamed-chunk-7-1.png b/_freeze/blog/posts/2024-2-5_LVB_brms/index/figure-html/unnamed-chunk-7-1.png
diff --git a/_freeze/blog/posts/2024-2-5_LVB_brms/index/figure-html/unnamed-chunk-8-1.png b/_freeze/blog/posts/2024-2-5_LVB_brms/index/figure-html/unnamed-chunk-8-1.png
diff --git a/_freeze/blog/posts/2024-2-6_LVB_Stan/index/execute-results/html.json b/_freeze/blog/posts/2024-2-6_LVB_Stan/index/execute-results/html.json
diff --git a/_freeze/blog/posts/2024-2-6_LVB_Stan/index/figure-html/unnamed-chunk-12-1.png b/_freeze/blog/posts/2024-2-6_LVB_Stan/index/figure-html/unnamed-chunk-12-1.png
diff --git a/_freeze/blog/posts/2024-2-6_LVB_Stan/index/figure-html/unnamed-chunk-13-1.png b/_freeze/blog/posts/2024-2-6_LVB_Stan/index/figure-html/unnamed-chunk-13-1.png
diff --git a/_freeze/blog/posts/2024-2-6_LVB_Stan/index/figure-html/unnamed-chunk-14-1.png b/_freeze/blog/posts/2024-2-6_LVB_Stan/index/figure-html/unnamed-chunk-14-1.png
diff --git a/_freeze/blog/posts/2024-2-6_LVB_Stan/index/figure-html/unnamed-chunk-16-1.png b/_freeze/blog/posts/2024-2-6_LVB_Stan/index/figure-html/unnamed-chunk-16-1.png