
Commit

Merge pull request #5 from jk-brown/2-edits-to-workflow-documents-to-prep-for-publication

Updating workflow for reproducibility
jk-brown authored Oct 17, 2024
2 parents cd21e18 + 20db10d commit e591d7f
Showing 12 changed files with 303 additions and 259 deletions.
6 changes: 2 additions & 4 deletions workflows/01_building_dataset_from_Sherwood2020.Rmd
@@ -9,14 +9,12 @@ knitr::opts_chunk$set(echo = TRUE)

# Goal

Build and save dataset of ECS values from different ECS configuraiton distirbutions as provided by S20 supplemental information.
Build and save data set of ECS values from different ECS configuration distributions as provided by S20 supplemental information.

# Building data set

Here we are building ECS data sets for each of the evidence configurations of interest. We are using data from [Sherwood et al. 2020](https://agupubs.onlinelibrary.wiley.com/doi/full/10.1029/2019RG000678), henceforth referred to as S20. The data were drawn from the supplemental information in S20 and represent ECS percentile estimates from likelihood distributions quantified in S20 for each of the five evidence configurations (each containing different combinations of lines of evidence). There are other configurations available in S20 beyond the five we have chosen here. The percentile estimates reported in these data represent the 5th, 10th, 17th, 20th, 25th, 50th, 75th, 80th, 83rd, 90th, and 95th percentiles, as well as the mode and the mean.

Names of each vector list are coded to represent the evidence configuration the data are associated with.

```{r}
ecs_data_list <- list(
"Baseline" = c(
Expand Down Expand Up @@ -90,5 +88,5 @@ ecs_data_list <- list(
Write the data frame and store it in the `data` directory.

```{r}
saveRDS(ecs_data_list, "data/ecs_posterior_data_S20.RDS")
saveRDS(ecs_data_list, "data/ecs_prior_data_S20.RDS")
```
55 changes: 5 additions & 50 deletions workflows/02_fitting_and_sampling_ecs_distributions.Rmd
@@ -11,8 +11,9 @@ knitr::opts_chunk$set(echo = TRUE)

The goal of this script is to fit an easy-to-sample parametric distribution to the ECS values provided in S20. We will then sample the distributions, save the data, and visualize them.

## Source all setup, mapping, and helper functions stored in `sourced`
```{r}
library(MASS)
source("sourced/source_all.R")
```

# Fit Distribution to Sherwood et al. 2020 data
@@ -25,7 +26,7 @@ Some notes on `fitdistr()` usage: The function fits a maximum likelihood distrib

```{r}
# read in the ecs_data and store as a list
ecs_data <- readRDS("data/ecs_posterior_data_S20.RDS")
ecs_data <- readRDS("data/ecs_prior_data_S20.RDS")
```
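The fitting step itself is collapsed in this diff. As a rough sketch of what the `fitdistr()` call can look like here (assuming each element of `ecs_data` is a numeric vector of ECS values and a gamma distribution is the target, as noted in workflow 03):

```{r}
# Illustrative sketch, not code from this commit: fit a gamma distribution
# to one configuration's ECS values by maximum likelihood, then sample it.
fit_baseline <- MASS::fitdistr(ecs_data$Baseline, densfun = "gamma")

# draw a large sample from the fitted distribution
n_samples <- 10000
baseline_draws <- rgamma(n_samples,
                         shape = fit_baseline$estimate["shape"],
                         rate  = fit_baseline$estimate["rate"])
```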

@@ -70,10 +71,6 @@ The result is a new list (`ecs_sample_list`) of data frames, one for each eviden

# Visualize Simulated Samples

```{r}
library(ggplot2)
```

Once the ECS samples are produced and stored in a list of data frames (`ecs_sample_list`), we visualize the sample distribution with `ggplot2`.

Before working in `ggplot2`, we build a single data frame with all samples and the name of the evidence configuration for each sample, to make plotting easier.
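The reshaping code is collapsed in this diff; a minimal sketch of the idea, assuming each element of `ecs_sample_list` is a data frame with an `ECS` column (as the quantile check below suggests):

```{r}
# Illustrative sketch, not code from this commit: stack the per-configuration
# sample data frames into one long data frame keyed by configuration name.
ecs_long <- do.call(rbind, lapply(names(ecs_sample_list), function(config) {
  data.frame(ECS = ecs_sample_list[[config]]$ECS, evidence = config)
}))
```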
@@ -125,46 +122,9 @@ quantile(ecs_sample_list$Baseline_Emergent_constraints$ECS, probs = c(0.05,0.50,

Here we want to determine the optimal distribution hyperparameters for a skew normal distribution that meets specified quantile requirements. For this task we utilize the `sn` and `optimx` packages. The objective is to fit a skew normal distribution whose 5%, 50%, and 95% quantiles reflect the IPCC ECS range (median 3.0, range 2.0-5.0).


We need the following packages:
```{r}
# Load the required libraries
library(sn)
library(optimx)
```

## Define objective function

Define an objective function to quantify the deviation of the quantiles for the skew normal distribution from the desired IPCC values.

```{r}
# Objective function to find the optimal xi, omega, and alpha
objective_function <- function(params) {
xi <- params[1]
omega <- params[2]
alpha <- params[3]
# Calculate the quantiles for the given parameters
q5 <- qsn(0.05, xi = xi, omega = omega, alpha = alpha)
q50 <- qsn(0.50, xi = xi, omega = omega, alpha = alpha)
q95 <- qsn(0.95, xi = xi, omega = omega, alpha = alpha)
# Desired quantiles
desired_q5 <- 2.0
desired_q50 <- 3.0
desired_q95 <- 5.0
# Calculate the squared differences
error <- (q5 - desired_q5)^2 + (q50 - desired_q50)^2 + (q95 - desired_q95)^2
# Print intermediate values for debugging
cat("xi:", xi, "omega:", omega, "alpha:", alpha, "error:", error, "\n")
return(error)
}
```
We define an objective function to quantify the deviation of the quantiles for the skew normal distribution from the desired IPCC values. This function is stored in `sourced/helper_functions.R`.

Here, the hyperparameters (`xi`, `omega`, and `alpha`) represent the location, scale, and shape of the skew normal distribution, respectively. The function computes the 5th, 50th, and 95th percentiles of the skew normal distribution for a given set of hyperparameters (via `qsn()`). It then compares these values to the targeted IPCC values (median 3.0, range 2.0-5.0) by calculating squared differences. This approach allows us to determine how well the hyperparameters of the skew normal distribution align with the desired percentiles.
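The optimization call itself is collapsed in this diff. A sketch of how the objective can be minimized with `optimx` (the starting values below are assumptions, not values taken from the workflow):

```{r}
# Illustrative sketch, not code from this commit: minimize the squared-error
# objective over the skew normal hyperparameters with a derivative-free method.
library(optimx)

opt_result <- optimx(par = c(xi = 2.5, omega = 1.0, alpha = 2.0),
                     fn = objective_function,
                     method = "Nelder-Mead")
```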

@@ -233,7 +193,7 @@ print(likely)
```
Here we can see that the very likely distribution aligns exactly with the IPCC expectation. However, the likely range is slightly more constrained than what has been proposed by IPCC AR6.
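A quick way to reproduce that comparison, assuming `xi`, `omega`, and `alpha` hold the optimized hyperparameters:

```{r}
# Illustrative sketch, not code from this commit: compare fitted quantiles to
# the IPCC AR6 very likely (5-95%) and likely (17-83%) ranges.
qsn(c(0.05, 0.50, 0.95), xi = xi, omega = omega, alpha = alpha)
qsn(c(0.17, 0.83), xi = xi, omega = omega, alpha = alpha)
```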

## Visualize the distribution
## Preliminary visualization of the distribution
```{r}
# convert the data to data frame
IPCC_est_df <- data.frame(ECS = data)
@@ -249,18 +209,13 @@ ECS_sn <-
xlim = c(2.0, 5.0), fill = "#F21A00", alpha = 0.5) + # Shade area between 2.0 and 5.0
geom_area(stat = "function", fun = dsn, args = list(xi = xi, omega = omega, alpha = alpha),
xlim = c(2.6, 3.43), fill = "#F21A00", alpha = 0.7) +
# geom_area(stat = "function", fun = dsn, args = list(xi = xi, omega = omega, alpha = alpha),
# xlim = c(0, 2.0), fill = "grey", alpha = 0.5) +
# geom_area(stat = "function", fun = dsn, args = list(xi = xi, omega = omega, alpha = alpha),
# xlim = c(5.0, 8.0), fill = "grey", alpha = 0.5) +
geom_area(stat = "function", fun = dsn, args = list(xi = xi, omega = omega, alpha = alpha),
xlim = c(3.0, 3.01), fill = "black", alpha = 0.5)
ECS_sn
```


## Add the IPCC AR6 distribution value to ECS sample list
```{r}
# add estimated IPCC ECS values to ecs_sample_list
44 changes: 13 additions & 31 deletions workflows/03_plotting_ecs_samples_S1.Rmd
@@ -11,9 +11,6 @@ knitr::opts_chunk$set(echo = TRUE)

Produce a figure with the ECS samples. This may serve as a good supplemental figure for the paper to show the distribution shapes for the evidence configurations from S20, based on the gamma distribution assumption.

```{r}
library(tidyverse)
```
# Produce figure

We produce PDF curves in a panel using the color palette and then save the figure in the `figures` directory.
@@ -27,54 +24,39 @@ ecs_sample_list <- readRDS("data/ecs_samples_plot_data.RDS")
ecs_df_S1 <- data.frame(
value = unlist(ecs_sample_list),
evidence = rep(names(ecs_sample_list), each = n),
scenario = rep(names(ecs_sample_list), each = n),
row.names = NULL
)
# recode evidence configurations for aesthetics
ecs_df_S1$evidence <- ecs_df_S1$evidence %>%
recode(
IPCC = "IPCC AR6",
No_Process = "No Process",
No_Historical = "No Historical",
No_Paleoclimate = "No Paleoclimate",
Baseline_Emergent_constraints = "Baseline + Emergent constraints"
)
# edit order of the facet panels
facet_order <- c(
"IPCC AR6",
"Baseline",
"No Historical",
"No Process",
"No Paleoclimate",
"Baseline + Emergent constraints"
)
ecs_df_S1_fig_data <- recode_scenarios(ecs_df_S1)
# convert the evidence configuration to factor with the facet order we want
ecs_df_S1$evidence <- factor(ecs_df_S1$evidence, levels = facet_order)
ecs_df_S1_fig_data$scenario <- factor(ecs_df_S1_fig_data$scenario, levels = scenario_order)
# plot with ggplot
ggplot() +
geom_density(data = ecs_df_S1,
geom_density(data = ecs_df_S1_fig_data,
aes(x = value,
color = evidence,
fill = evidence),
color = scenario,
fill = scenario),
linewidth = 0.75,
alpha = 0.2,
bw = 0.3) +
scale_color_manual(values = ECS_COLORS,
name = "ECS Configurations") +
guide = "none") +
scale_fill_manual(values = ECS_COLORS,
name = "ECS Configurations") +
guide = "none") +
scale_x_continuous(breaks = seq(from = 0, to = 9,
by = 1)) +
labs(x = "Equilibrium Climate Sensitivity Value",
y = "Densisty") +
facet_wrap(~ evidence) +
y = "Density") +
facet_wrap(~ scenario) +
theme_light() +
theme(legend.position = "bottom")
theme(legend.position = "bottom",
strip.text = element_text(size = 10, color = "black"),
strip.background = element_rect(fill = "white"))
ggsave("figures/S1_fig_ECS_distributions.png",
device = "png",
46 changes: 32 additions & 14 deletions workflows/04_running_the_model_computing_probabilities.Rmd
@@ -11,13 +11,6 @@ knitr::opts_chunk$set(echo = TRUE)

The goal of this script is to run Matilda with each of the ECS distributions we sampled previously.

```{r}
library(matilda)
options(matilda.verbose = FALSE)
library(parallel)
library(tidyverse)
```

# Using ECS samples to run Matilda

We use ECS values sampled from the estimated parametric distributions from S20 to propagate the varying levels of uncertainty associated with evidence configurations to probabilistic climate projections. This provides an opportunity to better understand how different ECS evidence configurations affect temperature trajectories from a simple carbon cycle climate model.
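The model runs themselves are collapsed in this diff. A sketch of the general pattern with `matilda` and `hector` (the scenario file and the single-configuration structure are assumptions):

```{r}
# Illustrative sketch, not code from this commit: set up a Hector core and
# run the ensemble for one ECS configuration.
ini_file <- system.file("input/hector_ssp245.ini", package = "hector")
core <- newcore(ini_file, name = "ssp245")

# parameter data frame holding only the sampled ECS values
param_df <- data.frame(ECS = ecs_sample_list$Baseline$ECS)
baseline_result <- iterate_model(core = core, params = param_df)
```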
@@ -45,8 +38,6 @@ The result will be a new core object that will be a required input to run th

# Generate values for other model parameters

**This needs to be edited** I think for this experiment it will be more straight forward to keep all parameters aside from ECS fixed. This will reduce the complexity that is introduced from parameter interactions and will isolate the affect of the ECS distributions from different ECS evidence configurations. Below are notes that will be edited accordingly and the code chunk in this section will be skipped.

Matilda works by running Hector multiple times to build a perturbed parameter ensemble, thus applying parameter uncertainty to model outputs. We need to produce parameter values to accompany the ECS values we sampled in previous steps of the workflow.

Parameter sets are generated in Matilda using `generate_params`. We use this function to produce `n` initial parameter sets (`init_params`). In this analysis we do not sample the other parameters; instead, we isolate Hector's response to the different ECS distributions by using Hector defaults for all parameters aside from ECS.
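A sketch of that step (assuming `n` draws and a `core` object from the previous chunk; only the `ECS` column is kept so every other parameter stays at its Hector default):

```{r}
# Illustrative sketch, not code from this commit: generate initial parameter
# sets, then keep only the ECS column so the other parameters use defaults.
init_params <- generate_params(core = core, draws = n)
init_params <- init_params["ECS"]
```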
@@ -182,7 +173,7 @@ gmst_criteria_data <- gmst_criteria_data %>% rename("value" = "Anomaly..deg.C.",
# Temperature anomaly error vector
gmst_error <- read.csv("data-raw/annual_gmst_SD_quant.csv")
temp_error_vec <- gmst_error$value
```

Plot temperature data with time-varying uncertainty error as a sanity check:

```{r}
# for plotting add error values to criteria data
gmst_criteria_data$error <- gmst_error$value
ggplot() +
@@ -195,12 +190,17 @@ ggplot() +
ymax = value + error),
alpha = 0.2)
```

# create a new ocean c uptake criterion
Create new temperature criterion:
```{r}
# create a GMST criterion
gmst_criterion <- new_criterion("gmst", years = gmst_criteria_data$year, obs_values = gmst_criteria_data$value)
```

*The Matilda-provided gmst criterion only uses the 1961-2023 time frame. We create the new criterion to use the 1850-2024 time frame.*

Create criterion using ocean carbon uptake from the Global Carbon Project:

```{r}
@@ -253,10 +253,10 @@ names(model_weights)
## model_weights before culling constraint
saveRDS(model_weights, "data/pre_culled_model_weights.RDS")
```
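The scoring step that produces `model_weights` is collapsed in this diff. A sketch of the general pattern (the choice of `score_bayesian` and scoring against the single `gmst_criterion` are assumptions):

```{r}
# Illustrative sketch, not code from this commit: weight each configuration's
# ensemble against the observational criterion.
model_weights_sketch <- lapply(model_result, function(result) {
  score_runs(result, criterion = gmst_criterion, score_function = score_bayesian)
})
```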

Filter out ensemble members with near-zero weight:
```{r}
## CURRENTLY TESTING CONSTRAINING RESULT BY MINIMUM WEIGHT

```{r}
# filter out ensemble members that do not meet the minimum weight constraint (> 1e-6)
constrained_weights <- lapply(model_weights, function(df) {
@@ -265,6 +265,7 @@ constrained_weights <- lapply(model_weights, function(df) {
return(filtered_result)
})
```

Constrained weights can be merged with the constrained results. This would produce a list (based on ECS scenario) of the constrained model ensemble and the assigned weights for each run. However, because some of the models have been filtered out during the constraint, we need to re-normalize so the weights sum to 1. This way we can still use the resulting ensembles to compute metrics and probabilities accurately.
@@ -278,6 +279,10 @@ weighted_ensemble <- Map(function(a, b) {
}, constrained_weights, model_result)
```

Sanity check to ensure names are correct and the re-weighted ensemble sums to 1.0:
```{r}
#sanity check names
names(weighted_ensemble)
@@ -299,6 +304,10 @@ norm_weights_ensemble <- lapply(names(weighted_ensemble), function(df_name) {
return(df)
})
```

Sanity check that element names are preserved and final weights sum to 1.0:
```{r}
# make sure element names are preserved
names(norm_weights_ensemble) <- names(weighted_ensemble)
@@ -309,9 +318,10 @@ sapply(norm_weights_ensemble, function(df) {
print(sum)
})
```

# saving as a list does this work?
```{r}
# save normalized weighted ensemble
saveRDS(norm_weights_ensemble, "data/weighted_ensemble.RDS")
```
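The metric computation that produces `eoc_warming_results` is collapsed in this diff. A sketch of the general pattern (the variable name and the 2081-2100 averaging window are assumptions):

```{r}
# Illustrative sketch, not code from this commit: define an end-of-century
# warming metric and compute it for each configuration's ensemble.
eoc_metric <- new_metric(var = "global_tas", years = 2081:2100, op = mean)
eoc_warming_sketch <- lapply(model_result, metric_calc, metric = eoc_metric)
```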
@@ -381,6 +391,10 @@ eoc_gsat_metrics <- Map(function(a, b){
}, eoc_warming_results, norm_weights_ensemble)
```

Sanity check to verify the normalized metric weights sum to 1.0:
```{r}
# Apply function to calculate and print sum of the specified column
# Verify the final normalized weights
sapply(eoc_gsat_metrics, function(df) {
@@ -390,6 +404,10 @@ sapply(eoc_gsat_metrics, function(df) {
print(sum)
})
```

Save weighted metric data:
```{r}
# save the result for future visualization
saveRDS(eoc_gsat_metrics, "data/weighted_gsat_metrics.RDS")