2024-10-29
bleutils is a collection of R functions for use by the BLE LTER IM team. Install it from GitHub:
remotes::install_github("BLE-LTER/bleutils")
First, load the example data. cp_data is included with bleutils and is a minimal example of what investigators need to give the IM team before we can proceed.
library(bleutils)
df <- cp_data
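A quick look at the example input (at minimum it has a station code column and a collection date column, which the steps below use):
# Inspect the minimal example input
str(df)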
add_cp_cols will add columns such as node, lagoon, lat/lon, and so on to a data.frame, provided a column of station codes, e.g., "EWLD1". Use this to quickly add the required columns to a dataset.
df <- add_cp_cols(df, station_code_col = "station")
Note that this function relies on a table of station information. This is kept on Box at BLE-IM/BLE_LTER_CP_Stations.csv, and a direct download link to it is passed to the function via the station_source argument. To generate a direct link to the Box file, go to its Shared Link Settings and copy the URL under "Direct Link". You can also give the function a local file path instead of a Box link.
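The station table itself can be refreshed with update_cp_stations, given a source file (the Box direct link or a local path):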
update_cp_stations(source_file = "path_goes_here")
infer_season outputs a vector of categorical season values (under ice/break up/open water) from a vector of date values (a column in a data.frame). If the date vector is of type chr instead of Date or POSIXct (i.e., datetimes), as in the case of the example data, you will first need to convert the vector.
# Example when dates are chr, without a time portion, e.g., "6/27/2019"
library(lubridate)
df$date_collected <- as.Date(parse_date_time(df$date_collected, "mdy"))
# Now infer the season
df$season <- infer_season(df, date_col = "date_collected")
order_cp_cols orders columns according to the standard BLE order. Check the function documentation for the exact order depending on data type (water/sediment/mooring). Columns need to be named exactly as specified, so make sure that's done before running this function.
# In the example data, date_collected must be renamed to date_time
names(df)[names(df)=="date_collected"] <- "date_time"
# Now order the columns
df <- order_cp_cols(df, type = "water")
sort_cp_rows orders data rows according to the standard BLE order. Check the function documentation for the exact sort order depending on data type (water/sediment/mooring). Columns need to be named exactly as specified, so make sure that's done before running this function.
df <- sort_cp_rows(df, type = "water")
init_datapkg creates a directory structure and a templated R script for a new dataset. It will warn you if the dataset ID already exists (based on directory names in base_path) and report the next available numeric ID.
init_datapkg(base_path = getwd(),
dataset_id = 9999L,
dataset_nickname = "test")
init_script creates an R script and populates it from a template. For a new dataset, init_datapkg calls init_script(type = "init") and no further action is needed. For updating existing datasets, use the argument type = "update".
Note that the file directory must exist before running this function.
init_script(dataset_id = 9999,
file_dir = file.path(getwd(), "9999_test", "EML_RProject_9999", "2022"),
type = "update")
The main difference is that an update script also has code to pull the latest version of the dataset from EDI's production server, so the new data can be appended to it.
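For reference, here is a minimal sketch of such a pull, assuming the EDIutils package and BLE's EDI scope knb-lter-ble; the code the template generates is the authoritative version.
library(EDIutils)

# Find the newest published revision of, e.g., dataset 13 on EDI's production server
rev <- list_data_package_revisions("knb-lter-ble", 13, filter = "newest")
pkg_id <- paste("knb-lter-ble", 13, rev, sep = ".")

# Read the first data entity of that revision into a data.frame
entities <- read_data_entity_names(pkg_id)
raw <- read_data_entity(pkg_id, entities$entityId[1])
published <- readr::read_csv(file = raw)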
The templates live in inst/ if updates are needed.
We decided to append abbreviated units to the end of attribute names (see the BLE handbook). To facilitate that, wrap calls to MetaEgress::get_meta() in bleutils::append_units. This saves us from having to store the units in attribute names in metabase, and makes updates easier.
From this point on, the example commands will not work unless your machine can connect to an instance of metabase. Feel free to run them if yours can.
metadata <- MetaEgress::get_meta(dbname = "ble_metabase",
dataset_ids = 13,
user = "insert_or_enter_in_console",
password = "insert_or_enter_in_console")
metadata <- append_units(metadata)
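As an illustration (hypothetical names; the exact abbreviations follow the BLE handbook), an attribute stored in metabase as water_temp with unit celsius might come out of append_units as water_temp_C.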
rename_attributes renames all columns in a data.frame to what's in the metadata. The usual assumptions apply: there are as many columns in the data as in the metadata, and they are ordered the same way.
df <- rename_attributes(metadata,
dataset_id = 13,
entity = 1,
x = df,
append_units = TRUE)
BLE's Core Program datasets include a CSV of personnel with the years of data they were involved in. This information is stored in metabase and is specific to BLE. MetaEgress::get_meta() doesn't query this information, so use:
export_personnel(dataset_ids = 13,
file_name = "BLE_LTER_chlorophyll_personnel.csv")
During ice break-up, sometimes the field team cannot reach the actual station coordinates and ends up at a nearby location. In 2023, for transparency, we decided to label when this has happened with different station codes and the actual coordinates, instead of the station the field team was supposed to be at. The correct_stations function takes a CSV kept on Box at BLE-IM/breakup_station_corrections_20192022.csv with the dates, the original stations affected, and the new station codes to label these dates with. You can also use a local file with the same info. Note that this function strips the station information columns from the data.frame; they can be added back in via add_cp_cols, given that the Box station file also has the info for the new station codes.
df <- correct_stations(df)
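For instance, assuming the station code column is still named "station" as in the earlier example:
# Re-attach node, lagoon, lat/lon, etc. for the corrected station codes
df <- add_cp_cols(df, station_code_col = "station")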
For data with taxa, CSVs usually include a WoRMS aphia_id and the basic taxonomic hierarchy. What's in WoRMS changes frequently over time. The check_classification function compares the hierarchy in the CSV with what's in WoRMS for each aphia_id. Mismatches are printed to the console and saved to a file.
input_dataframe <- read.csv("your_data_table.csv") # load your data table CSV into a data frame
file_path <- "C:/data/check_taxa_results.txt" # use forward slashes (or doubled backslashes) in Windows paths
check_classification(input_dataframe, file_path)