
Commit

Merge pull request #5 from jk-brown/2-edits-to-workflow-documents-to-prep-for-publication

Updating workflow for reproducibility
jk-brown authored Oct 17, 2024
2 parents cd21e18 + 20db10d commit e591d7f
Showing 12 changed files with 303 additions and 259 deletions.
6 changes: 2 additions & 4 deletions workflows/01_building_dataset_from_Sherwood2020.Rmd
@@ -9,14 +9,12 @@ knitr::opts_chunk$set(echo = TRUE)

# Goal

Build and save dataset of ECS values from different ECS configuraiton distirbutions as provided by S20 supplemental information.
Build and save data set of ECS values from different ECS configuration distributions as provided by S20 supplemental information.

# Building data set

Here we are building ECS data sets for each of the evidence configurations of interest. We are using data from [Sherwood et al. 2020](https://agupubs.onlinelibrary.wiley.com/doi/full/10.1029/2019RG000678), henceforth referred to as S20. The data were drawn from the supplemental information in S20 and represent ECS percentile estimates from likelihood distributions quantified in S20 for each of the five evidence configurations (each containing different combinations of lines of evidence). There are other configurations available in S20 beyond the five we have chosen here. The percentile estimates reported in these data represent the 5th, 10th, 17th, 20th, 25th, 50th, 75th, 80th, 83rd, 90th, and 95th percentiles, as well as the mode and the mean.

Names of each vector list are coded to represent the evidence configuration the data are associated with.

```{r}
ecs_data_list <- list(
"Baseline" = c(
Expand Down Expand Up @@ -90,5 +88,5 @@ ecs_data_list <- list(
Write the data frame and store it in the `data` directory.

```{r}
saveRDS(ecs_data_list, "data/ecs_posterior_data_S20.RDS")
saveRDS(ecs_data_list, "data/ecs_prior_data_S20.RDS")
```
55 changes: 5 additions & 50 deletions workflows/02_fitting_and_sampling_ecs_distributions.Rmd
@@ -11,8 +11,9 @@ knitr::opts_chunk$set(echo = TRUE)

The goal of this script is to fit an easy-to-sample parametric distribution to the ECS values provided in S20. We will then sample the distributions, save the data, and visualize them.

## Source all setup, mapping, and helper functions stored in `sourced`
```{r}
library(MASS)
source("sourced/source_all.R")
```

# Fit Distribution to Sherwood et al. 2020 data
@@ -25,7 +26,7 @@ Some notes on `fitdistr()` usage: The function fits a maximum likelihood distrib

```{r}
# read in the ecs_data and store as a list
ecs_data <- readRDS("data/ecs_posterior_data_S20.RDS")
ecs_data <- readRDS("data/ecs_prior_data_S20.RDS")
```
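The fitting step itself is collapsed in this diff. As a rough sketch of what the `fitdistr()` call can look like here (assuming each element of `ecs_data` is a numeric vector of ECS values and a gamma distribution is the target, as noted in workflow 03):

```{r}
# Illustrative sketch, not code from this commit: fit a gamma distribution
# to one configuration's ECS values by maximum likelihood, then sample it.
fit_baseline <- MASS::fitdistr(ecs_data$Baseline, densfun = "gamma")

# draw a large sample from the fitted distribution
n_samples <- 10000
baseline_draws <- rgamma(n_samples,
                         shape = fit_baseline$estimate["shape"],
                         rate  = fit_baseline$estimate["rate"])
```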

@@ -70,10 +71,6 @@ The result is a new list (`ecs_sample_list`) of data frames, one for each eviden

# Visualize Simulated Samples

```{r}
library(ggplot2)
```

Once the ECS samples are produced and stored in a list of data frames (`ecs_sample_list`), we visualize the sample distribution with `ggplot2`.

Before working in `ggplot2`, we build a single data frame with all samples and the name of the evidence configuration for each sample, to make plotting easier.
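The reshaping code is collapsed in this diff; a minimal sketch of the idea, assuming each element of `ecs_sample_list` is a data frame with an `ECS` column (as the quantile check below suggests):

```{r}
# Illustrative sketch, not code from this commit: stack the per-configuration
# sample data frames into one long data frame keyed by configuration name.
ecs_long <- do.call(rbind, lapply(names(ecs_sample_list), function(config) {
  data.frame(ECS = ecs_sample_list[[config]]$ECS, evidence = config)
}))
```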
@@ -125,46 +122,9 @@ quantile(ecs_sample_list$Baseline_Emergent_constraints$ECS, probs = c(0.05,0.50,

Here we want to determine the optimal distribution hyperparameters for a skew normal distribution that meets specified quantile requirements. For this task we utilize the `sn` and `optimx` packages. The objective is to fit a skew normal distribution whose 5%, 50%, and 95% quantiles reflect the IPCC ECS range (median 3.0, range 2.0-5.0).


We need the following packages:
```{r}
# Load the required libraries
library(sn)
library(optimx)
```

## Define objective function

Define an objective function to quantify the deviation of the quantiles for the skew normal distribution from the desired IPCC values.

```{r}
# Objective function to find the optimal xi, omega, and alpha
objective_function <- function(params) {
xi <- params[1]
omega <- params[2]
alpha <- params[3]
# Calculate the quantiles for the given parameters
q5 <- qsn(0.05, xi = xi, omega = omega, alpha = alpha)
q50 <- qsn(0.50, xi = xi, omega = omega, alpha = alpha)
q95 <- qsn(0.95, xi = xi, omega = omega, alpha = alpha)
# Desired quantiles
desired_q5 <- 2.0
desired_q50 <- 3.0
desired_q95 <- 5.0
# Calculate the squared differences
error <- (q5 - desired_q5)^2 + (q50 - desired_q50)^2 + (q95 - desired_q95)^2
# Print intermediate values for debugging
cat("xi:", xi, "omega:", omega, "alpha:", alpha, "error:", error, "\n")
return(error)
}
```
We define an objective function to quantify the deviation of the quantiles for the skew normal distribution from the desired IPCC values. This function is stored in `sourced/helper_functions.R`.

Here, the hyperparameters (`xi`, `omega`, and `alpha`) represent the location, scale, and shape of the skew normal distribution, respectively. The function computes the 5th, 50th, and 95th percentiles of the skew normal distribution for a given set of hyperparameters (via `qsn()`). It then compares these values to the targeted IPCC values (median 3.0, range 2.0-5.0) by calculating squared differences. This approach allows us to determine how well the hyperparameters of the skew normal distribution align with the desired percentiles.
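The optimization call itself is collapsed in this diff. A sketch of how the objective can be minimized with `optimx` (the starting values below are assumptions, not values taken from the workflow):

```{r}
# Illustrative sketch, not code from this commit: minimize the squared-error
# objective over the skew normal hyperparameters with a derivative-free method.
library(optimx)

opt_result <- optimx(par = c(xi = 2.5, omega = 1.0, alpha = 2.0),
                     fn = objective_function,
                     method = "Nelder-Mead")
```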

@@ -233,7 +193,7 @@ print(likely)
```
Here we can see that the very likely distribution aligns exactly with the IPCC expectation. However, the likely range is slightly more constrained than what has been proposed by IPCC AR6.
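A quick way to reproduce that comparison, assuming `xi`, `omega`, and `alpha` hold the optimized hyperparameters:

```{r}
# Illustrative sketch, not code from this commit: compare fitted quantiles to
# the IPCC AR6 very likely (5-95%) and likely (17-83%) ranges.
qsn(c(0.05, 0.50, 0.95), xi = xi, omega = omega, alpha = alpha)
qsn(c(0.17, 0.83), xi = xi, omega = omega, alpha = alpha)
```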

## Visualize the distribution
## Preliminary visualization of the distribution
```{r}
# convert the data to data frame
IPCC_est_df <- data.frame(ECS = data)
@@ -249,18 +209,13 @@ ECS_sn <-
xlim = c(2.0, 5.0), fill = "#F21A00", alpha = 0.5) + # Shade area between 2.0 and 5.0
geom_area(stat = "function", fun = dsn, args = list(xi = xi, omega = omega, alpha = alpha),
xlim = c(2.6, 3.43), fill = "#F21A00", alpha = 0.7) +
# geom_area(stat = "function", fun = dsn, args = list(xi = xi, omega = omega, alpha = alpha),
# xlim = c(0, 2.0), fill = "grey", alpha = 0.5) +
# geom_area(stat = "function", fun = dsn, args = list(xi = xi, omega = omega, alpha = alpha),
# xlim = c(5.0, 8.0), fill = "grey", alpha = 0.5) +
geom_area(stat = "function", fun = dsn, args = list(xi = xi, omega = omega, alpha = alpha),
xlim = c(3.0, 3.01), fill = "black", alpha = 0.5)
ECS_sn
```


## Add the IPCC AR6 distribution value to ECS sample list
```{r}
# add estimated IPCC ECS values to ecs_sample_list
44 changes: 13 additions & 31 deletions workflows/03_plotting_ecs_samples_S1.Rmd
@@ -11,9 +11,6 @@ knitr::opts_chunk$set(echo = TRUE)

Produce a figure with the ECS samples. This may serve as a good supplemental figure for the paper to show the distribution shapes for the evidence configurations from S20, based on the gamma distribution assumption.

```{r}
library(tidyverse)
```
# Produce figure

We produce PDF curves in a panel using the color palette and then save the figure in the `figures` directory.
@@ -27,54 +24,39 @@ ecs_sample_list <- readRDS("data/ecs_samples_plot_data.RDS")
ecs_df_S1 <- data.frame(
value = unlist(ecs_sample_list),
evidence = rep(names(ecs_sample_list), each = n),
scenario = rep(names(ecs_sample_list), each = n),
row.names = NULL
)
# recode evidence configurations for aesthetics
ecs_df_S1$evidence <- ecs_df_S1$evidence %>%
recode(
IPCC = "IPCC AR6",
No_Process = "No Process",
No_Historical = "No Historical",
No_Paleoclimate = "No Paleoclimate",
Baseline_Emergent_constraints = "Baseline + Emergent constraints"
)
# edit order of the facet panels
facet_order <- c(
"IPCC AR6",
"Baseline",
"No Historical",
"No Process",
"No Paleoclimate",
"Baseline + Emergent constraints"
)
ecs_df_S1_fig_data <- recode_scenarios(ecs_df_S1)
# convert the evidence configuration to factor with the facet order we want
ecs_df_S1$evidence <- factor(ecs_df_S1$evidence, levels = facet_order)
ecs_df_S1_fig_data$scenario <- factor(ecs_df_S1_fig_data$scenario, levels = scenario_order)
# plot with ggplot
ggplot() +
geom_density(data = ecs_df_S1,
geom_density(data = ecs_df_S1_fig_data,
aes(x = value,
color = evidence,
fill = evidence),
color = scenario,
fill = scenario),
linewidth = 0.75,
alpha = 0.2,
bw = 0.3) +
scale_color_manual(values = ECS_COLORS,
name = "ECS Configurations") +
guide = "none") +
scale_fill_manual(values = ECS_COLORS,
name = "ECS Configurations") +
guide = "none") +
scale_x_continuous(breaks = seq(from = 0, to = 9,
by = 1)) +
labs(x = "Equilibrium Climate Sensitivity Value",
y = "Densisty") +
facet_wrap(~ evidence) +
y = "Density") +
facet_wrap(~ scenario) +
theme_light() +
theme(legend.position = "bottom")
theme(legend.position = "bottom",
strip.text = element_text(size = 10, color = "black"),
strip.background = element_rect(fill = "white"))
ggsave("figures/S1_fig_ECS_distributions.png",
device = "png",
46 changes: 32 additions & 14 deletions workflows/04_running_the_model_computing_probabilities.Rmd
@@ -11,13 +11,6 @@ knitr::opts_chunk$set(echo = TRUE)

The goal of this script is to run Matilda with each of the ECS distributions we sampled previously.

```{r}
library(matilda)
options(matilda.verbose = FALSE)
library(parallel)
library(tidyverse)
```

# Using ECS samples to run Matilda

We use ECS values sampled from the estimated parametric distributions from S20 to propagate the varying levels of uncertainty associated with evidence configurations to probabilistic climate projections. This provides an opportunity to better understand how different ECS evidence configurations affect temperature trajectories from a simple carbon cycle climate model.
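The model runs themselves are collapsed in this diff. A sketch of the general pattern with `matilda` and `hector` (the scenario file and the single-configuration structure are assumptions):

```{r}
# Illustrative sketch, not code from this commit: set up a Hector core and
# run the ensemble for one ECS configuration.
ini_file <- system.file("input/hector_ssp245.ini", package = "hector")
core <- newcore(ini_file, name = "ssp245")

# parameter data frame holding only the sampled ECS values
param_df <- data.frame(ECS = ecs_sample_list$Baseline$ECS)
baseline_result <- iterate_model(core = core, params = param_df)
```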
@@ -45,8 +38,6 @@ The result will be a new core object that will be a required input to run th

# Generate values for other model parameters

**This needs to be edited** I think for this experiment it will be more straight forward to keep all parameters aside from ECS fixed. This will reduce the complexity that is introduced from parameter interactions and will isolate the affect of the ECS distributions from different ECS evidence configurations. Below are notes that will be edited accordingly and the code chunk in this section will be skipped.

Matilda works by running Hector multiple times to build a perturbed parameter ensemble, thus applying parameter uncertainty to model outputs. We need to produce parameter values to accompany the ECS values we sampled in previous steps of the workflow.

Parameter sets are generated in Matilda using `generate_params`. We use this function to produce `n` initial parameter sets (`init_params`). In this analysis we do not sample the other parameters; instead, we isolate Hector's response to the different ECS distributions by using Hector defaults for all parameters aside from ECS.
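A sketch of that step (assuming `n` draws and a `core` object from the previous chunk; only the `ECS` column is kept so every other parameter stays at its Hector default):

```{r}
# Illustrative sketch, not code from this commit: generate initial parameter
# sets, then keep only the ECS column so the other parameters use defaults.
init_params <- generate_params(core = core, draws = n)
init_params <- init_params["ECS"]
```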
@@ -182,7 +173,7 @@ gmst_criteria_data <- gmst_criteria_data %>% rename("value" = "Anomaly..deg.C.",
# Temperature anomaly error vector
gmst_error <- read.csv("data-raw/annual_gmst_SD_quant.csv")
temp_error_vec <- gmst_error$value
```

Plot temperature data with time-varying uncertainty error as a sanity check:

```{r}
# for plotting add error values to criteria data
gmst_criteria_data$error <- gmst_error$value
ggplot() +
@@ -195,12 +190,17 @@ ggplot() +
ymax = value + error),
alpha = 0.2)
```

# create a new ocean c uptake criterion
Create new temperature criterion:
```{r}
# create a GMST criterion
gmst_criterion <- new_criterion("gmst", years = gmst_criteria_data$year, obs_values = gmst_criteria_data$value)
```

*The Matilda-provided gmst criterion only uses the 1961-2023 time frame. We create the new criterion to use the 1850-2024 time frame.*

Create criterion using ocean carbon uptake from the Global Carbon Project:

```{r}
@@ -253,10 +253,10 @@ names(model_weights)
## model_weights before culling constraint
saveRDS(model_weights, "data/pre_culled_model_weights.RDS")
```
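The scoring step that produces `model_weights` is collapsed in this diff. A sketch of the general pattern (the choice of `score_bayesian` and scoring against the single `gmst_criterion` are assumptions):

```{r}
# Illustrative sketch, not code from this commit: weight each configuration's
# ensemble against the observational criterion.
model_weights_sketch <- lapply(model_result, function(result) {
  score_runs(result, criterion = gmst_criterion, score_function = score_bayesian)
})
```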

Filter out ensemble members with near-zero weight:
```{r}
## CURRENTLY TESTING CONSTRAINING RESULT BY MINIMUM WEIGHT

```{r}
# filter out ensemble members that do not meet the minimum weight constraint (> 1e-6)
constrained_weights <- lapply(model_weights, function(df) {
@@ -265,6 +265,7 @@ constrained_weights <- lapply(model_weights, function(df) {
return(filtered_result)
})
```

Constrained weights can be merged with the constrained results. This would produce a list (based on ECS scenario) of the constrained model ensemble and the assigned weights for each run. However, because some of the models have been filtered out during the constraint, we need to re-normalize so the weights sum to 1. This way we can still use the resulting ensembles to compute metrics and probabilities accurately.
@@ -278,6 +279,10 @@ weighted_ensemble <- Map(function(a, b) {
}, constrained_weights, model_result)
```

Sanity check to ensure names are correct and the re-weighted ensemble sums to 1.0:
```{r}
#sanity check names
names(weighted_ensemble)
@@ -299,6 +304,10 @@ norm_weights_ensemble <- lapply(names(weighted_ensemble), function(df_name) {
return(df)
})
```

Sanity check that element names are preserved and final weights sum to 1.0:
```{r}
# make sure element names are preserved
names(norm_weights_ensemble) <- names(weighted_ensemble)
@@ -309,9 +318,10 @@ sapply(norm_weights_ensemble, function(df) {
print(sum)
})
```

# saving as a list does this work?
```{r}
# save normalized weighted ensemble
saveRDS(norm_weights_ensemble, "data/weighted_ensemble.RDS")
```
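The metric computation that produces `eoc_warming_results` is collapsed in this diff. A sketch of the general pattern (the variable name and the 2081-2100 averaging window are assumptions):

```{r}
# Illustrative sketch, not code from this commit: define an end-of-century
# warming metric and compute it for each configuration's ensemble.
eoc_metric <- new_metric(var = "global_tas", years = 2081:2100, op = mean)
eoc_warming_sketch <- lapply(model_result, metric_calc, metric = eoc_metric)
```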
@@ -381,6 +391,10 @@ eoc_gsat_metrics <- Map(function(a, b){
}, eoc_warming_results, norm_weights_ensemble)
```

Sanity check to verify the normalized metric weights sum to 1.0:
```{r}
# Apply function to calculate and print sum of the specified column
# Verify the final normalized weights
sapply(eoc_gsat_metrics, function(df) {
@@ -390,6 +404,10 @@ sapply(eoc_gsat_metrics, function(df) {
print(sum)
})
```

Save weighted metric data:
```{r}
# save the result for future visualization
saveRDS(eoc_gsat_metrics, "data/weighted_gsat_metrics.RDS")