📊 wpp: new 2024 release #53

Merged
merged 42 commits into from
Jul 12, 2024
Commits
a996ab5
chore: update python deps
marcelgerber Jun 26, 2024
42578ed
feat: indicator-based population explorer mostly works
marcelgerber Jun 26, 2024
f4e0f58
feat: indicator-based population explorer works
marcelgerber Jun 26, 2024
bd5c65c
feat: get rid of table defs!!
marcelgerber Jun 26, 2024
485299d
fix: fix up color scales
marcelgerber Jun 26, 2024
48f5e32
feat: drop/rename explicit title and subtitle fields
marcelgerber Jun 26, 2024
bad8b3f
feat: use 2024 data!
marcelgerber Jun 26, 2024
e84b766
population_broad -> population
lucasrodes Jun 26, 2024
85c2fb3
wip
lucasrodes Jun 26, 2024
6038d51
wip
lucasrodes Jun 26, 2024
2ba1426
population change
owidbot Jun 28, 2024
5fbe0b8
wip: explorer config
owidbot Jun 28, 2024
89c6ca4
wip
owidbot Jun 28, 2024
9aa3658
add f/m migration
owidbot Jul 1, 2024
cb32259
feat: no longer drop title/subtitle/note
marcelgerber Jul 1, 2024
ee60006
Revert "feat: get rid of table defs!!"
marcelgerber Jul 1, 2024
09df639
feat: specify map color schemes as metadata
marcelgerber Jul 1, 2024
d9fae5c
feat: get rid of explicit source information
marcelgerber Jul 1, 2024
70dc030
enhance: clarify titles and subtitles
marcelgerber Jul 1, 2024
6638a21
enhance: specify sensible `yAxisMin` for "Life expectancy at age ..."
marcelgerber Jul 1, 2024
06d0876
enhance: use better color scheme for 100+ deaths
marcelgerber Jul 1, 2024
19115aa
enhance: remove most explicit units
marcelgerber Jul 1, 2024
7919598
enhance: explicitly set empty subtitles to single-space string
marcelgerber Jul 1, 2024
717292c
enhance: don't specify column display name in most cases
marcelgerber Jul 1, 2024
f45cdcc
enhance: no special treatment for column type
marcelgerber Jul 1, 2024
b30372d
enhance: use `catalogPath` column
marcelgerber Jul 2, 2024
dff4e31
wip
owidbot Jul 4, 2024
d6a4113
wip
owidbot Jul 4, 2024
bb2269f
wip
lucasrodes Jul 4, 2024
6bac296
wip
owidbot Jul 4, 2024
f33f39a
wip
lucasrodes Jul 4, 2024
b189c75
wip
owidbot Jul 4, 2024
6677a58
wip
lucasrodes Jul 4, 2024
5c1cdc3
wip
owidbot Jul 4, 2024
0652952
wip map brackets
owidbot Jul 5, 2024
6d30c6c
map brackets
owidbot Jul 5, 2024
69a3ba5
remove 'fertility' keyword
owidbot Jul 10, 2024
62dc08b
remove note on mortality rates
owidbot Jul 10, 2024
46b5753
fix unwanted '-' removal
owidbot Jul 10, 2024
2095b8b
remove dimensions based on public release
owidbot Jul 12, 2024
17e4986
bump wpp dataset version
owidbot Jul 12, 2024
637d6b7
disable scenarios for num deaths
owidbot Jul 12, 2024
4,448 changes: 1,800 additions & 2,648 deletions explorers/population-and-demography.explorer.tsv

Large diffs are not rendered by default.

107 changes: 56 additions & 51 deletions scripts/demography-explorer/age_group.csv
@@ -1,51 +1,56 @@
slug,csv_slug,name,title_suffix
all,all,Total,
none,none,None
0,0,Under 1 year,of children under the age of 1
0_4,0_4,Under 5 years,of children under the age of 5
0_14,0_14,Under 15 years,of children under the age of 15
0_24,0_24,Under 25 years,under the age of 25
15_64,15_64,15-64 years,aged 15 to 64 years
1_4,1_4,1–4 years,aged 1 to 4 years
5_9,5_9,5–9 years,aged 5 to 9 years
10_14,10_14,10–14 years,aged 10 to 14 years
15_19,15_19,15–19 years,aged 15 to 19 years
15plus,15plus,15+ years,older than 15 years
18plus,18plus,18+ years,older than 18 years
20_29,20_29,20–29 years,aged 20 to 29 years
30_39,30_39,30–39 years,aged 30 to 39 years
40_49,40_49,40–49 years,aged 40 to 49 years
50_59,50_59,50–59 years,aged 50 to 59 years
60_69,60_69,60–69 years,aged 60 to 69 years
70_79,70_79,70–79 years,aged 70 to 79 years
80_89,80_89,80–89 years,aged 80 to 89 years
90_99,90_99,90–99 years,aged 90 to 99 years
100plus,100plus,100+ years,older than 100 years
mothers_15_19,15_19,Mothers aged 15–19 years,from mothers aged 15 to 19 years
mothers_20_24,20_24,Mothers aged 20–24 years,from mothers aged 20 to 24 years
mothers_25_29,25_29,Mothers aged 25–29 years,from mothers aged 25 to 29 years
mothers_30_34,30_34,Mothers aged 30–34 years,from mothers aged 30 to 34 years
mothers_35_39,35_39,Mothers aged 35–39 years,from mothers aged 35 to 39 years
mothers_40_44,40_44,Mothers aged 40–44 years,from mothers aged 40 to 44 years
mothers_45_49,45_49,Mothers aged 45–49 years,from mothers aged 45 to 49 years
aged_15,15,Aged 15,at age 15
aged_65,65,Aged 65,at age 65
aged_80,80,Aged 80,at age 80
at_birth,at_birth,At birth,at birth
1,1,At age 1,at age 1
5,5,At age 5,at age 5
10,10,At age 10,at age 10
15,15,At age 15,at age 15
20,20,At age 20,at age 20
30,30,At age 30,at age 30
40,40,At age 40,at age 40
50,50,At age 50,at age 50
60,60,At age 60,at age 60
65,65,At age 65,at age 65
70,70,At age 70,at age 70
80,80,At age 80,at age 80
90,90,At age 90,at age 90
100_and_over,100plus,At age 100 and over,at age 100 and over
dependency_total,dependency_total,Total dependency ratio,
dependency_child,dependency_child,Youth dependency ratio,
dependency_old,dependency_old,Old-age dependency ratio,
slug,csv_slug,name,title_suffix,plain
all,all,Total,,
none,none,None,,
0,0,Under 1 year,of children under the age of 1,
0_4,0_4,Under 5 years,of children under the age of 5,
0_14,0_14,Under 15 years,of children under the age of 15,
0_24,0_24,Under 25 years,under the age of 25,
15_64,15_64,15-64 years,aged 15 to 64 years,
1_4,1_4,1–4 years,aged 1 to 4 years,
5_9,5_9,5–9 years,aged 5 to 9 years,
10_14,10_14,10–14 years,aged 10 to 14 years,
15_19,15_19,15–19 years,aged 15 to 19 years,
15plus,15plus,15+ years,older than 15 years,
18plus,18plus,18+ years,older than 18 years,
20_29,20_29,20–29 years,aged 20 to 29 years,
30_39,30_39,30–39 years,aged 30 to 39 years,
40_49,40_49,40–49 years,aged 40 to 49 years,
50_59,50_59,50–59 years,aged 50 to 59 years,
60_69,60_69,60–69 years,aged 60 to 69 years,
70_79,70_79,70–79 years,aged 70 to 79 years,
80_89,80_89,80–89 years,aged 80 to 89 years,
90_99,90_99,90–99 years,aged 90 to 99 years,
100plus,100plus,100+ years,older than 100 years,
mothers_10_14,10_14,Mothers aged 10–14 years,from mothers aged 10 to 14 years,
mothers_15_19,15_19,Mothers aged 15–19 years,from mothers aged 15 to 19 years,
mothers_20_24,20_24,Mothers aged 20–24 years,from mothers aged 20 to 24 years,
mothers_25_29,25_29,Mothers aged 25–29 years,from mothers aged 25 to 29 years,
mothers_30_34,30_34,Mothers aged 30–34 years,from mothers aged 30 to 34 years,
mothers_35_39,35_39,Mothers aged 35–39 years,from mothers aged 35 to 39 years,
mothers_40_44,40_44,Mothers aged 40–44 years,from mothers aged 40 to 44 years,
mothers_45_49,45_49,Mothers aged 45–49 years,from mothers aged 45 to 49 years,
mothers_50_54,50_54,Mothers aged 50–54 years,from mothers aged 50 to 54 years,
aged_15,15,Aged 15,at age 15,15
aged_30,30,Aged 30,at age 30,30
aged_45,45,Aged 45,at age 45,45
aged_65,65,Aged 65,at age 65,65
aged_80,80,Aged 80,at age 80,80
aged_90,90,Aged 90,at age 90,90
at_birth,0,At birth,at birth,0
1,1,At age 1,at age 1,1
5,5,At age 5,at age 5,5
10,10,At age 10,at age 10,10
15,15,At age 15,at age 15,15
20,20,At age 20,at age 20,20
30,30,At age 30,at age 30,30
40,40,At age 40,at age 40,40
50,50,At age 50,at age 50,50
60,60,At age 60,at age 60,60
65,65,At age 65,at age 65,65
70,70,At age 70,at age 70,70
80,80,At age 80,at age 80,80
90,90,At age 90,at age 90,90
100_and_over,100plus,At age 100 and over,at age 100 and over,
dependency_total,dependency_total,Total dependency ratio,,
dependency_child,dependency_child,Youth dependency ratio,,
dependency_old,dependency_old,Old-age dependency ratio,,
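
The new `plain` column is filled in only for single-age rows such as `aged_15` and `at_birth`, and is empty elsewhere. How the explorer script consumes it is not visible in this diff, so the following is only a minimal sketch of reading it, using three rows copied from the new file. The helper `plain_age` is hypothetical and not part of this PR.

```python
import csv
import io

# Header plus three rows copied from the new age_group.csv.
AGE_GROUP_CSV = """slug,csv_slug,name,title_suffix,plain
all,all,Total,,
aged_15,15,Aged 15,at age 15,15
at_birth,0,At birth,at birth,0
"""


def plain_age(rows, slug):
    """Return the `plain` value for `slug` as an int, or None when the cell is empty.

    Hypothetical helper for illustration only.
    """
    for row in rows:
        if row["slug"] == slug:
            return int(row["plain"]) if row["plain"] else None
    raise KeyError(slug)


rows = list(csv.DictReader(io.StringIO(AGE_GROUP_CSV)))
print(plain_age(rows, "aged_15"))
print(plain_age(rows, "all"))
```

Note that `at_birth` now maps to `csv_slug` `0` with `plain` `0`, so an emptiness check has to test the string rather than the parsed number.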
93 changes: 45 additions & 48 deletions scripts/demography-explorer/demography-explorer.py
@@ -4,13 +4,14 @@
import textwrap
import pandas as pd
import re
from collections import defaultdict

# There are two datasets available:
# - DATASET_PATH_PREFIX: Classic dataset, with estimates for 1950-2023 and projections for 2024-2100.
# - DATASET_PATH_PREFIX_FULL: Alternative dataset, with projections for 1950-2100 (the 1950-2023 part is the same in all projections). This dataset is helpful in explorers to be able to plot the complete time series (estimates + projections) for a given projection.
DATASET_PATH_PREFIX = "grapher/un/2024-07-12/un_wpp/"
DATASET_PATH_PREFIX_FULL = "grapher/un/2024-07-12/un_wpp_full/"

def file_url(tableSlug):
    return (
        f"https://catalog.ourworldindata.org/explorers/un/2022/un_wpp/{tableSlug}.csv"
    )
COLS_TO_DROP = []


# %%
@@ -20,52 +21,43 @@ def substitute_rows(row):
        if isinstance(row[key], str):
            while "${" in row[key]:
                template = Template(row[key])
                row[key] = template.substitute(**row)
                row[key] = template.substitute(
                    **row, DATASET_PATH_PREFIX=DATASET_PATH_PREFIX
                )
    return row
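
The change above passes `DATASET_PATH_PREFIX` as an extra keyword, so `${DATASET_PATH_PREFIX}` can be used inside any cell alongside the row's own columns. A minimal standalone sketch of the repeated-substitution loop follows. The column names `table` and `metric` and the resulting path are made up for illustration, and `substitute_row` is a simplified stand-in rather than the script's function.

```python
from string import Template

DATASET_PATH_PREFIX = "grapher/un/2024-07-12/un_wpp/"


def substitute_row(row):
    """Expand ${...} placeholders in every string value, repeating until none remain."""
    row = dict(row)  # work on a copy so the caller's dict is untouched
    for key in row:
        if isinstance(row[key], str):
            # A substituted value may itself contain placeholders, hence the loop.
            while "${" in row[key]:
                row[key] = Template(row[key]).substitute(
                    **row, DATASET_PATH_PREFIX=DATASET_PATH_PREFIX
                )
    return row


# Hypothetical row: `table` and `metric` are illustrative column names.
example = {
    "yVariableIds": "${DATASET_PATH_PREFIX}${table}#${metric}",
    "table": "${metric}_table",
    "metric": "population",
}
print(substitute_row(example)["yVariableIds"])
```

`Template.substitute` raises `KeyError` for an unknown placeholder, and passing the prefix as a keyword would raise `TypeError` if a row ever had a column named `DATASET_PATH_PREFIX`.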


def table_def(tableSlug, rows, display_names):
    table_def = f"table {file_url(tableSlug)} {tableSlug}"
    rows["ySlugs"] = rows["ySlugs"].map(lambda x: x.split(" "))
    rows = rows.explode("ySlugs").drop_duplicates("ySlugs").reset_index(drop=True)
def table_def(rows, display_names):
    rows["yVariableIds"] = rows["yVariableIds"].map(lambda x: x.split(" "))
    rows = (
        rows.explode("yVariableIds")
        .drop_duplicates("yVariableIds")
        .reset_index(drop=True)
    )

    column_defs = rows.filter(regex="^column__", axis=1).rename(
        columns=lambda x: re.sub("^column__", "", x)
    )
    column_defs = column_defs.drop(columns=["type"])
    col_names = [
        "slug",
        "catalogPath",
        "name",
        "type",
        "sourceName",
        "sourceLink",
        "dataPublishedBy",
        "additionalInfo",
        *column_defs.columns,
    ]
    col_names = "\t".join(col_names)

    col_defs = [
        [
            row["ySlugs"],
            display_names[row["ySlugs"]],
            row["column__type"],
            "United Nations, World Population Prospects (2022)",
            "https://population.un.org/wpp/",
            "United Nations, Department of Economic and Social Affairs, Population Division (2022). World Population Prospects 2022, Online Edition.",
            """The 2022 Revision of World Population Prospects was released on 11 July 2022 by the Population Division of the Department of Economic and Social Affairs of the United Nations.\\n\\nIt presents population estimates from 1950 to the present, based on historical demographic trends. It also includes projections to the year 2100 based on a range of demographic scenarios. The three scenarios that we show (‘Low’, ‘Medium’, ‘High’) differ only with respect to the level of fertility; they share the same assumptions for sex ratio at birth, life expectancy and international migration.\\n\\nAll values are estimated based on current country borders.\\n\\nThe next revision of this data by the UN is due in 2024.""",
            row["yVariableIds"],
            display_names.get(row["yVariableIds"]) or "",
            *column_defs.loc[idx].values.tolist(),
        ]
        for (idx, row) in rows.iterrows()
    ]
    col_defs = ["\t".join(col) for col in col_defs]
    col_defs = textwrap.indent("\n".join(col_defs), "\t")

    return f"""{table_def}
columns {tableSlug}
    return f"""columns
{col_names}
location Country name EntityName
year Year Year
{col_defs}"""
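
After this change `table_def` no longer emits a per-table `table <url> <slug>` line; it returns a single `columns` block whose rows are keyed by `catalogPath`. The exact whitespace of the returned template is not recoverable from this page, so the sketch below only illustrates the general shape under the assumption that header and rows are tab-separated and tab-indented. `columns_block` and the example path are hypothetical.

```python
import textwrap


def columns_block(col_names, col_rows):
    """Render a `columns` section: a header row and data rows, tab-separated and tab-indented.

    Hypothetical helper; the layout is an assumption, not the PR's exact output.
    """
    header = "\t".join(col_names)
    body = "\n".join("\t".join(cells) for cells in col_rows)
    return "columns\n" + textwrap.indent(header + "\n" + body, "\t")


block = columns_block(
    ["catalogPath", "name"],
    # Illustrative path only; real indicator paths are not shown in this diff.
    [["grapher/un/2024-07-12/un_wpp/population#population", "Population"]],
)
print(block)
```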


@@ -117,20 +109,32 @@ def table_def(tableSlug, rows, display_names):
        .apply(lambda x: x.strip())
        .apply(lambda x: x[0].upper() + x[1:] if len(x) else x)
        .apply(lambda x: re.sub(" {2,}", " ", x))
        .apply(lambda x: x.replace("\\-\\", " "))
        # .apply(lambda x: x or " ")
    )
    # explicitly set empty strings to a single space, so we don't inherit it from ETL
    # df.loc[df[col] == "-", col] = " "


# %%
# Use DATASET_PATH_PREFIX_FULL when variant is not "None" (i.e. some projection scenario)
mask = df["projection__slug"] != "estimates"
df.loc[mask, "yVariableIds"] = df.loc[mask, "yVariableIds"].str.replace(
    DATASET_PATH_PREFIX, DATASET_PATH_PREFIX_FULL
)
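
The block above points every non-`estimates` row at the `un_wpp_full` dataset, so a projection chart can show the full 1950-2100 series. A dependency-free sketch of the same idea on plain dictionaries follows. `use_full_dataset`, the `medium` slug and the indicator paths are illustrative assumptions, not taken from the PR.

```python
DATASET_PATH_PREFIX = "grapher/un/2024-07-12/un_wpp/"
DATASET_PATH_PREFIX_FULL = "grapher/un/2024-07-12/un_wpp_full/"


def use_full_dataset(rows):
    """Return copies of `rows` where projection rows reference the full dataset."""
    out = []
    for row in rows:
        row = dict(row)
        if row["projection__slug"] != "estimates":
            row["yVariableIds"] = row["yVariableIds"].replace(
                DATASET_PATH_PREFIX, DATASET_PATH_PREFIX_FULL
            )
        out.append(row)
    return out


# Hypothetical rows; only the prefixes come from the script.
rows = [
    {"projection__slug": "estimates", "yVariableIds": DATASET_PATH_PREFIX + "population#x"},
    {"projection__slug": "medium", "yVariableIds": DATASET_PATH_PREFIX + "population#x"},
]
for row in use_full_dataset(rows):
    print(row["projection__slug"], row["yVariableIds"])
```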

# %%
# Extract column display names from ySlugs
# The `ySlugs` column can contain names for column slugs, e.g.:
# Extract column display names from yVariableIds
# The `yVariableIds` column can contain names for column slugs, e.g.:
# population_broad__all__15-24__records:"15-24 years"
# Note the colon, and especially the quotes around the name. They are required!
# This config will use the name "15-24 years" as the display name for the column.
# If an explicit name is not given, the row's title will be used instead.
col_display_names = {}

y_slug_re = r"([\w\-+]+):\"([^\"]+)\""
y_slug_re = r"([\w\-\/_#]+):\"([^\"]+)\""
for idx, row in df.iterrows():
    matches = re.finditer(y_slug_re, row["ySlugs"])
    matches = re.finditer(y_slug_re, row["yVariableIds"])
    slugs = []
    for match in matches:
        col_slug, col_name = match.groups()
@@ -139,21 +143,13 @@ def table_def(tableSlug, rows, display_names):
        col_display_names[col_slug] = col_name

    if len(slugs):
        row["ySlugs"] = " ".join(slugs)
    elif row["ySlugs"] not in col_display_names:
        col_display_names[row["ySlugs"]] = row["title"]
        df.loc[idx, "yVariableIds"] = " ".join(slugs)
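
The widened regex now accepts `/`, `_` and `#`, which catalog-path style identifiers need. The sketch below applies the new pattern to a single cell and separates the identifiers from their quoted display names. `split_display_names` is a hypothetical helper and the indicator paths are made up for illustration.

```python
import re

# Pattern copied from the new version of the script.
y_slug_re = r"([\w\-\/_#]+):\"([^\"]+)\""


def split_display_names(cell):
    """Return (cell without the :"name" suffixes, {identifier: display name}).

    Cells without any quoted name are returned unchanged.
    """
    names = {}
    ids = []
    for match in re.finditer(y_slug_re, cell):
        col_id, col_name = match.groups()
        ids.append(col_id)
        names[col_id] = col_name
    return (" ".join(ids) if ids else cell), names


# Hypothetical indicator paths, following the name:"label" syntax described above.
cell = (
    'grapher/un/2024-07-12/un_wpp/population#population__all__15_24:"15-24 years" '
    'grapher/un/2024-07-12/un_wpp/population#population__all__25_64:"25-64 years"'
)
ids, names = split_display_names(cell)
print(ids)
print(names)
```

As in the script, the quotes around the name are required: an entry written as `id:name` without quotes does not match and is left in the cell as-is.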

# %%
tables = df["tableSlug"].unique()
table_defs = [
    table_def(
        tableSlug,
        df[df["tableSlug"] == tableSlug].reset_index(drop=True),
        col_display_names,
    )
    for tableSlug in tables
    if tableSlug != ""
]
col_defs = table_def(
    df.reset_index(drop=True),
    col_display_names,
)

# %%

@@ -175,12 +171,13 @@ def table_def(tableSlug, rows, display_names):
# Drop all remaining programmatic columns containing __
df = df.drop(columns=df.filter(regex="__"))

# %%
df = df.rename(columns={col_name: "_" + col_name for col_name in COLS_TO_DROP})

# %%
graphers_tsv = df.to_csv(sep="\t", index=False)
graphers_tsv_indented = textwrap.indent(graphers_tsv, "\t")

table_defs = "\n".join(table_defs)

# %%
warning = "# DO NOT EDIT THIS FILE BY HAND. It is automatically generated using a set of input files. Any changes made directly to it will be overwritten.\n\n"

@@ -189,7 +186,7 @@ def table_def(tableSlug, rows, display_names):
    warning
    + template.substitute(
        graphers_tsv=graphers_tsv_indented,
        table_defs=table_defs,
        table_defs=col_defs,
    )
)
