Skip to content

Latest commit

 

History

History
192 lines (160 loc) · 9.23 KB

readme.md

File metadata and controls

192 lines (160 loc) · 9.23 KB

Worldwide Bureaucracy Indicators

This week we're looking at the Worldwide Bureaucracy Indicators (WWBI) dataset from the World Bank.

The Worldwide Bureaucracy Indicators (WWBI) database is a unique cross-national dataset on public sector employment and wages that aims to fill an information gap, thereby helping researchers, development practitioners, and policymakers gain a better understanding of the personnel dimensions of state capability, the footprint of the public sector within the overall labor market, and the fiscal implications of the public sector wage bill. The dataset is derived from administrative data and household surveys, thereby complementing existing, expert perception-based approaches.

The World Bank introduced the dataset with a series of four blogs:

Can you replicate the figures in the blogs? Can you display any of the data more clearly than in the blogs?

The Data

# Option 1: tidytuesdayR package 
## install.packages("tidytuesdayR")

tuesdata <- tidytuesdayR::tt_load('2024-04-30')
## OR
tuesdata <- tidytuesdayR::tt_load(2024, week = 18)

wwbi_data <- tuesdata$wwbi_data
wwbi_series <- tuesdata$wwbi_series
wwbi_country <- tuesdata$wwbi_country


# Option 2: Read directly from GitHub

wwbi_data <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2024/2024-04-30/wwbi_data.csv')
wwbi_series <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2024/2024-04-30/wwbi_series.csv')
wwbi_country <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2024/2024-04-30/wwbi_country.csv')

How to Participate

  • Explore the data, watching out for interesting relationships. We would like to emphasize that you should not draw conclusions about causation in the data. There are various moderating variables that affect all data, many of which might not have been captured in these datasets. As such, our suggestion is to use the data provided to practice your data tidying and plotting techniques, and to consider for yourself what nuances might underlie these relationships.
  • Create a visualization, a model, a shiny app, or some other piece of data-science-related output, using R or another programming language.
  • Share your output and the code used to generate it on social media with the #TidyTuesday hashtag.

Data Dictionary

wwbi_data.csv

variable class description
country_code character 3-letter ISO_3166-1 code
indicator_code character code identifying the indicator of bureaucracy
year numeric year of the data
value numeric numeric value of the data

wwbi_series.csv

variable class description
indicator_code character code identifying the indicator of bureaucracy
indicator_name character name of the indicator

wwbi_country.csv

variable class description
country_code character 3-letter ISO_3166-1 code
short_name character short or common name for the country
table_name character more alphabetically sortable name of the country
long_name character full name of the country
x2_alpha_code character 2-letter ISO_3166-1 code
currency_unit character currency unit
special_notes character special notes
region character region
income_group character low, lower middle, upper middle, or high income
wb_2_code character alternate 2-letter code
national_accounts_base_year integer national accounts base year
national_accounts_reference_year integer national accounts reference year
sna_price_valuation character UN system of national accounts price valuation
lending_category character International Development Association (IDA), Interanational Bank of Reconstruction and Development (IBRD), a blend or neither
other_groups character Heavily Indebted Poor Countries initiative (HIPC), or countries classified as the "Euro area"
system_of_national_accounts integer which System of National Accounts methodology the country uses (1968, 1993, or 2008 version)
balance_of_payments_manual_in_use character the version of the Balance of Payments Manual used by the country
external_debt_reporting_status character estimate, preliminary, or actual
system_of_trade character Under the general system imports include goods imported for domestic consumption and imports into bonded warehouses and free trade zones. Under the special system imports comprise goods imported for domestic consumption (including transformation and repair) and withdrawals for domestic consumption from bonded warehouses and free trade zones. Goods transported through a country en route to another are excluded.
government_accounting_concept character government accounting concept
imf_data_dissemination_standard character International Monetary Fund data-dissemination standard: Special Data Dissemination Standard (SDDS, 1996, created for countries
that have or seek to have access to international markets), SDDS Plus (2012, the highest tier of data standards, intended for systemically important economies), enhanced GDDS (e-GDDS, 2015, encouraging participants to emphasize data publication)
latest_household_survey character which household survey was most recently administered
source_of_most_recent_income_and_expenditure_data character which survey serves as the basis for income and expenditure data
vital_registration_complete logical whether the vital registration is complete
latest_agricultural_census integer year of latest agricultural census
latest_industrial_data integer year of latest industrial data
latest_trade_data integer year of latest trade data
latest_population_census_year integer year of latest population census
latest_population_census_notes character notes about latest population census

Cleaning Script

library(tidyverse)
library(janitor)
library(here)
library(fs)
library(withr)

working_dir <- here::here("data", "2024", "2024-04-30")

url <- "https://databank.worldbank.org/data/download/WWBI_CSV.zip"

file_path <- withr::local_tempfile(fileext = ".zip")
download.file(url, file_path)

extract_dir <- withr::local_tempdir("csvs")
unzip(file_path, exdir = extract_dir)

wwbi_country <- readr::read_csv(
  fs::path(extract_dir, "WWBICountry.csv")
) |> 
  janitor::clean_names() |> 
  janitor::remove_empty("cols") |> 
  dplyr::mutate(
    # Several columns are years, make them integers
    national_accounts_reference_year = as.integer(national_accounts_reference_year),
    latest_industrial_data = as.integer(latest_industrial_data),
    latest_trade_data = as.integer(latest_trade_data),
    latest_population_census_year = as.integer(stringr::str_extract(
      latest_population_census,
      "^\\d{4}"
    )),
    latest_agricultural_census = as.integer(stringr::str_extract(
      latest_agricultural_census,
      "^\\d{4}"
    )),
    national_accounts_base_year = as.integer(stringr::str_extract(
      national_accounts_base_year,
      "^\\d{4}"
    )),
    system_of_national_accounts = as.integer(stringr::str_extract(
      system_of_national_accounts,
      "\\d{4}"
    )),
    latest_population_census_notes = stringr::str_remove(
      latest_population_census,
      "^\\d{4}\\.?\\s*"
    ),
    latest_population_census_notes = dplyr::na_if(
      latest_population_census_notes,
      ""
    ),
    # vital_registration_complete is either "yes" or "NA"
    vital_registration_complete = !is.na(vital_registration_complete) 
  ) |> 
  dplyr::select(-"latest_population_census")

wwbi_series <- readr::read_csv(
  fs::path(extract_dir, "WWBISeries.csv"),
  col_types = paste(rep("c", 21), collapse = "")
) |> 
  janitor::clean_names() |> 
  janitor::remove_empty("cols") |> 
  dplyr::rename(indicator_code = "series_code")

wwbi_data <- readr::read_csv(
  fs::path(extract_dir, "WWBIData.csv"),
  col_types = paste(c(rep("c", 4), rep("d", 21), "c"), collapse = "")
) |> 
  janitor::clean_names() |> 
  # indicator_name and country_name are redundant.
  dplyr::select(-"indicator_name", -"country_name") |> 
  janitor::remove_empty("cols") |> 
  tidyr::pivot_longer(
    cols = -c(country_code, indicator_code),
    names_to = "year",
    names_transform = ~ as.integer(stringr::str_remove(.x, "x")),
    values_to = "value"
  ) |> 
  dplyr::filter(!is.na(value))

readr::write_csv(
  wwbi_data,
  fs::path(working_dir, "wwbi_data.csv")
)
readr::write_csv(
  wwbi_series,
  fs::path(working_dir, "wwbi_series.csv")
)
readr::write_csv(
  wwbi_country,
  fs::path(working_dir, "wwbi_country.csv")
)