This week we're looking at the Worldwide Bureaucracy Indicators (WWBI) dataset from the World Bank.
The Worldwide Bureaucracy Indicators (WWBI) database is a unique cross-national dataset on public sector employment and wages that aims to fill an information gap, thereby helping researchers, development practitioners, and policymakers gain a better understanding of the personnel dimensions of state capability, the footprint of the public sector within the overall labor market, and the fiscal implications of the public sector wage bill. The dataset is derived from administrative data and household surveys, thereby complementing existing, expert perception-based approaches.
The World Bank introduced the dataset with a series of four blogs:
Can you replicate the figures in the blogs? Can you display any of the data more clearly than in the blogs?
# Option 1: tidytuesdayR package
## install.packages("tidytuesdayR")
tuesdata <- tidytuesdayR::tt_load('2024-04-30')
## OR
tuesdata <- tidytuesdayR::tt_load(2024, week = 18)
wwbi_data <- tuesdata$wwbi_data
wwbi_series <- tuesdata$wwbi_series
wwbi_country <- tuesdata$wwbi_country
# Option 2: Read directly from GitHub
wwbi_data <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2024/2024-04-30/wwbi_data.csv')
wwbi_series <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2024/2024-04-30/wwbi_series.csv')
wwbi_country <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2024/2024-04-30/wwbi_country.csv')
- Explore the data, watching out for interesting relationships. We would like to emphasize that you should not draw conclusions about causation in the data. There are various moderating variables that affect all data, many of which might not have been captured in these datasets. As such, our suggestion is to use the data provided to practice your data tidying and plotting techniques, and to consider for yourself what nuances might underlie these relationships.
- Create a visualization, a model, a shiny app, or some other piece of data-science-related output, using R or another programming language.
- Share your output and the code used to generate it on social media with the #TidyTuesday hashtag.
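
As a starting point for the exploration step, a minimal sketch might filter one indicator and plot it over time. The data frame below is made-up toy data shaped like `wwbi_data`: the country codes and values are invented, and the indicator code is only an illustrative label, so check `wwbi_series` for the codes that are actually available.

```r
library(dplyr)
library(ggplot2)

# Toy stand-in for wwbi_data (hypothetical countries and values).
wwbi_data <- data.frame(
  country_code = rep(c("AAA", "BBB"), each = 3),
  indicator_code = "BI.WAG.TOTL.GD.ZS", # illustrative label, not verified here
  year = rep(2015:2017, times = 2),
  value = c(8.1, 8.4, 8.0, 11.2, 11.5, 12.1)
)

# Keep a single indicator so that all values share one unit.
wage_bill <- wwbi_data |>
  filter(indicator_code == "BI.WAG.TOTL.GD.ZS")

# One line per country, with points marking the observed years.
p <- ggplot(wage_bill, aes(x = year, y = value, colour = country_code)) +
  geom_line() +
  geom_point() +
  labs(x = "Year", y = "Indicator value", colour = "Country")

p
```

With the real data, the same pattern applies after replacing the toy data frame with the `wwbi_data` object loaded above.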

`wwbi_data.csv`

variable | class | description |
---|---|---|
country_code | character | 3-letter ISO_3166-1 code |
indicator_code | character | code identifying the indicator of bureaucracy |
year | numeric | year of the data |
value | numeric | numeric value of the data |

`wwbi_series.csv`

variable | class | description |
---|---|---|
indicator_code | character | code identifying the indicator of bureaucracy |
indicator_name | character | name of the indicator |
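
Because the data table and the series table share `indicator_code`, the readable indicator names can be attached with a join. The following is a small sketch on invented rows (the codes `IND.ONE` and `IND.TWO`, the names, and the values are all hypothetical), assuming the column names listed in the two tables above.

```r
library(dplyr)

# Toy stand-ins for the two tables (hypothetical codes, names, and values).
wwbi_data <- data.frame(
  country_code = c("AAA", "AAA", "BBB"),
  indicator_code = c("IND.ONE", "IND.TWO", "IND.ONE"),
  year = c(2015, 2015, 2016),
  value = c(1, 2, 3)
)
wwbi_series <- data.frame(
  indicator_code = c("IND.ONE", "IND.TWO"),
  indicator_name = c("First hypothetical indicator", "Second hypothetical indicator")
)

# Attach the indicator name to every observation.
labelled <- wwbi_data |>
  left_join(wwbi_series, by = "indicator_code")

# Codes in the data with no match in the series table (none in this toy example).
unmatched <- wwbi_data |>
  anti_join(wwbi_series, by = "indicator_code")
```

The `anti_join()` check is optional, but it is a cheap way to confirm that every code in the data has a name before plotting.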

`wwbi_country.csv`

variable | class | description |
---|---|---|
country_code | character | 3-letter ISO_3166-1 code |
short_name | character | short or common name for the country |
table_name | character | more alphabetically sortable name of the country |
long_name | character | full name of the country |
x2_alpha_code | character | 2-letter ISO_3166-1 code |
currency_unit | character | currency unit |
special_notes | character | special notes |
region | character | region |
income_group | character | low, lower middle, upper middle, or high income |
wb_2_code | character | alternate 2-letter code |
national_accounts_base_year | integer | national accounts base year |
national_accounts_reference_year | integer | national accounts reference year |
sna_price_valuation | character | UN system of national accounts price valuation |
lending_category | character | International Development Association (IDA), International Bank for Reconstruction and Development (IBRD), a blend, or neither |
other_groups | character | Heavily Indebted Poor Countries initiative (HIPC), or countries classified as the "Euro area" |
system_of_national_accounts | integer | which System of National Accounts methodology the country uses (1968, 1993, or 2008 version) |
balance_of_payments_manual_in_use | character | the version of the Balance of Payments Manual used by the country |
external_debt_reporting_status | character | estimate, preliminary, or actual |
system_of_trade | character | general or special trade system. Under the general system, imports include goods imported for domestic consumption and imports into bonded warehouses and free trade zones. Under the special system, imports comprise goods imported for domestic consumption (including transformation and repair) and withdrawals for domestic consumption from bonded warehouses and free trade zones. Goods transported through a country en route to another are excluded. |
government_accounting_concept | character | government accounting concept |
imf_data_dissemination_standard | character | International Monetary Fund data-dissemination standard: Special Data Dissemination Standard (SDDS, 1996, created for countries that have or seek to have access to international markets), SDDS Plus (2012, the highest tier of data standards, intended for systemically important economies), enhanced GDDS (e-GDDS, 2015, encouraging participants to emphasize data publication) |
latest_household_survey | character | which household survey was most recently administered |
source_of_most_recent_income_and_expenditure_data | character | which survey serves as the basis for income and expenditure data |
vital_registration_complete | logical | whether the vital registration is complete |
latest_agricultural_census | integer | year of latest agricultural census |
latest_industrial_data | integer | year of latest industrial data |
latest_trade_data | integer | year of latest trade data |
latest_population_census_year | integer | year of latest population census |
latest_population_census_notes | character | notes about latest population census |
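
The country table is mostly useful as a lookup for grouping variables such as `region` and `income_group`. As a hedged sketch, the example below joins invented observations to an invented country table and summarises one indicator by income group. All codes, group labels, and values are hypothetical; only the column names come from the tables above.

```r
library(dplyr)

# Toy stand-ins (hypothetical countries, groups, and values).
wwbi_country <- data.frame(
  country_code = c("AAA", "BBB", "CCC"),
  income_group = c("High income", "Low income", "High income")
)
wwbi_data <- data.frame(
  country_code = c("AAA", "AAA", "BBB", "CCC"),
  indicator_code = "IND.ONE",
  year = c(2015, 2016, 2015, 2016),
  value = c(10, 12, 4, 20)
)

# Join on country_code, then summarise a single indicator per income group.
by_income <- wwbi_data |>
  filter(indicator_code == "IND.ONE") |>
  left_join(
    select(wwbi_country, country_code, income_group),
    by = "country_code"
  ) |>
  group_by(income_group) |>
  summarise(
    n_countries = n_distinct(country_code),
    median_value = median(value),
    .groups = "drop"
  )
```

Summaries like this describe the sample only. Coverage differs by country and year, so a group median can be driven by which countries happen to report.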
library(tidyverse)
library(janitor)
library(here)
library(fs)
library(withr)
working_dir <- here::here("data", "2024", "2024-04-30")
url <- "https://databank.worldbank.org/data/download/WWBI_CSV.zip"
file_path <- withr::local_tempfile(fileext = ".zip")
download.file(url, file_path)
extract_dir <- withr::local_tempdir("csvs")
unzip(file_path, exdir = extract_dir)
wwbi_country <- readr::read_csv(
  fs::path(extract_dir, "WWBICountry.csv")
) |>
  janitor::clean_names() |>
  janitor::remove_empty("cols") |>
  dplyr::mutate(
    # Several columns are years, make them integers
    national_accounts_reference_year = as.integer(national_accounts_reference_year),
    latest_industrial_data = as.integer(latest_industrial_data),
    latest_trade_data = as.integer(latest_trade_data),
    latest_population_census_year = as.integer(stringr::str_extract(
      latest_population_census,
      "^\\d{4}"
    )),
    latest_agricultural_census = as.integer(stringr::str_extract(
      latest_agricultural_census,
      "^\\d{4}"
    )),
    national_accounts_base_year = as.integer(stringr::str_extract(
      national_accounts_base_year,
      "^\\d{4}"
    )),
    system_of_national_accounts = as.integer(stringr::str_extract(
      system_of_national_accounts,
      "\\d{4}"
    )),
    latest_population_census_notes = stringr::str_remove(
      latest_population_census,
      "^\\d{4}\\.?\\s*"
    ),
    latest_population_census_notes = dplyr::na_if(
      latest_population_census_notes,
      ""
    ),
    # vital_registration_complete is either "yes" or "NA"
    vital_registration_complete = !is.na(vital_registration_complete)
  ) |>
  dplyr::select(-"latest_population_census")
wwbi_series <- readr::read_csv(
  fs::path(extract_dir, "WWBISeries.csv"),
  col_types = paste(rep("c", 21), collapse = "")
) |>
  janitor::clean_names() |>
  janitor::remove_empty("cols") |>
  dplyr::rename(indicator_code = "series_code")
wwbi_data <- readr::read_csv(
  fs::path(extract_dir, "WWBIData.csv"),
  col_types = paste(c(rep("c", 4), rep("d", 21), "c"), collapse = "")
) |>
  janitor::clean_names() |>
  # indicator_name and country_name are redundant.
  dplyr::select(-"indicator_name", -"country_name") |>
  janitor::remove_empty("cols") |>
  tidyr::pivot_longer(
    cols = -c(country_code, indicator_code),
    names_to = "year",
    names_transform = ~ as.integer(stringr::str_remove(.x, "x")),
    values_to = "value"
  ) |>
  dplyr::filter(!is.na(value))
readr::write_csv(
  wwbi_data,
  fs::path(working_dir, "wwbi_data.csv")
)
readr::write_csv(
  wwbi_series,
  fs::path(working_dir, "wwbi_series.csv")
)
readr::write_csv(
  wwbi_country,
  fs::path(working_dir, "wwbi_country.csv")
)
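
Two steps in the cleaning script are easier to follow on a small example: reshaping the year columns into a long `year`/`value` pair, and splitting a census string into a year and a note. The sketch below reproduces both on invented inputs. The wide data frame, its codes, and the census strings are hypothetical, and the year transform is written as a named list entry rather than the formula shorthand used in the script.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Hypothetical wide input: after janitor::clean_names(), year columns
# such as "2000" are named "x2000".
wide <- data.frame(
  country_code = c("AAA", "BBB"),
  indicator_code = c("IND.ONE", "IND.ONE"),
  x2000 = c(1.5, NA),
  x2001 = c(2.5, 3.5)
)

# One row per country, indicator, and year; drop years with no value.
long <- wide |>
  pivot_longer(
    cols = -c(country_code, indicator_code),
    names_to = "year",
    names_transform = list(year = function(x) as.integer(str_remove(x, "^x"))),
    values_to = "value"
  ) |>
  filter(!is.na(value))

# Hypothetical census strings: a leading year, optionally followed by a note.
census <- c("2011. Hypothetical note about the census.", "2020", NA)

# The leading four digits become the year.
census_year <- as.integer(str_extract(census, "^\\d{4}"))

# Whatever follows the year is the note; an empty remainder becomes NA.
census_notes <- na_if(str_remove(census, "^\\d{4}\\.?\\s*"), "")
```

In this toy input, the row for `BBB` in 2000 is dropped because its value is missing, which mirrors the final `filter()` in the script.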