
Code used to create the figures in our Panjiva data paper: "Bill of Lading Data in International Trade Research with an Application to the Covid-19 Pandemic"


maddieky/panjiva-code


Panjiva Data Methods

Code used to create the tables and figures in our FEDS note: "Bill of Lading Data in International Trade Research with an Application to the Covid-19 Pandemic" (Flaaen et al.).

database

All scripts used to build the database are written in Python and need a large server to process the raw data files. In our project, we used a server with 24 cores, 200 GB of RAM, and over 3 TB of disk space. Our version of the Panjiva Impala database is hosted on a Hadoop cluster. Your requirements may vary depending on the resources available.

  • Downloading data from FTP server:
    • Coordinator script with Ray local parallelization: data_ftppull_coordinator_ray.py
    • Worker script: data_ftppull_worker.py
  • Processing data and uploading to Hadoop:
    • Script with Dask local parallelization: process_and_load_dask.py
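
The coordinator/worker split used for the FTP download can be illustrated with a minimal sketch. It uses a thread pool from Python's standard-library `concurrent.futures` as a stand-in for Ray, so it is not the repository's implementation. The function names `fetch_one` and `fetch_all` and the file names are hypothetical, and the worker only simulates a download so that the example runs offline.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Dict, List, Tuple


def fetch_one(remote_name: str) -> Tuple[str, int]:
    """Hypothetical worker: stands in for downloading one file.

    A real worker would open an FTP connection and write the file to disk.
    Here it returns the name and a fake byte count (the name's length).
    """
    return remote_name, len(remote_name)


def fetch_all(remote_names: List[str], max_workers: int = 4) -> Dict[str, int]:
    """Hypothetical coordinator: fans names out to workers, collects results."""
    results: Dict[str, int] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(fetch_one, name) for name in remote_names]
        for future in as_completed(futures):
            name, n_bytes = future.result()  # re-raises any worker exception
            results[name] = n_bytes
    return results


if __name__ == "__main__":
    files = ["shipments_2019_01.csv", "shipments_2019_02.csv"]  # made-up names
    print(fetch_all(files))
```

Threads suit this sketch because downloads are I/O-bound. A framework such as Ray or Dask would more plausibly be chosen when the per-file work is CPU-heavy or has to be spread across many cores.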

Data Manipulation

All files used to query/manipulate data from our Panjiva Hadoop database are located in code_for_paper/data_manipulation or code_for_paper/stata:

  • Figs 1 + 2: pull_teu_shpt_val.R
  • Fig 3: pull_transport_2019.R
  • Fig 4: pull_mode_of_transport.R
  • Fig 5: pull_walmart_redaction.R
  • Fig 6: pull_weighted_hist_shp_con.R
  • Fig 7: pull_hist_con_shp_shpt_per_year.R
  • Fig 8: pull_yoy_shp_per_con.R
  • Fig 9: pull_intramonth.R
  • Fig 10a: query_furniture_decomp.R, pull_furniture_decomp_step1.do, pull_furniture_decomp_step2.do
  • Fig 10b: query_all_decomp.R, pull_all_decomp_step1.do, pull_all_decomp_step2.do
  • Figs 11+12: query_port_to_port_data.R, step1_vessel_clean.do, step2_vessel_clean.do, port_analysis.do
  • Fig 13: pull_ports_teu.R
  • Fig 14: pull_teu_ports_all.R
  • Fig 15: panjiva-days-count-teu-port-data.ipynb
  • Figs 16 + 17: pull_trading_partners.R

Each file outputs a CSV file that serves as the input for the charting scripts listed in the next section.

Charting

All charts are created using ggplot2. All files used to create charts are in code_for_paper/figures:

  • Figs 1 + 2: fig_teu_shpt_val.R
  • Fig 3: fig_transport_2019.R
  • Fig 4: fig_mode_of_transport.R
  • Fig 5: fig_walmart_redaction.R
  • Fig 6: fig_weighted_hist_shp_con.R
  • Fig 7: fig_hist_con_shp_shpt_per_year.R
  • Fig 8: fig_yoy_shp_per_con.R
  • Fig 9: fig_intramonth.R
  • Fig 10a: fig_furn_decomp.R
  • Fig 10b: fig_all_decomp.R
  • Fig 11: fig_avg_days_between_shipments_ec_la.R
  • Fig 12: fig_inbound_reroute_la_lb.R
  • Fig 13: fig_ports_teu.R
  • Fig 14: fig_teu_ports_all.R
  • Fig 15: fig_teu_delay.R
  • Figs 16 + 17: fig_trading_partners.R
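
For most figures, the chart script carries the same name as its data script with the `pull_` prefix swapped for `fig_`. The sketch below encodes that naming pattern as a small lookup helper. The function `chart_script_for` and the `EXCEPTIONS` table are hypothetical conveniences, not part of the repository. The exceptions are copied from the lists above, and Figs 11 and 12 are left out because one set of data scripts feeds two chart scripts.

```python
from typing import Optional

# Pairs from the lists above that do not follow the pull_ -> fig_ pattern.
EXCEPTIONS = {
    "query_furniture_decomp.R": "fig_furn_decomp.R",                 # Fig 10a
    "query_all_decomp.R": "fig_all_decomp.R",                        # Fig 10b
    "panjiva-days-count-teu-port-data.ipynb": "fig_teu_delay.R",     # Fig 15
}


def chart_script_for(data_script: str) -> Optional[str]:
    """Return the chart script paired with a data script, or None if unknown."""
    if data_script in EXCEPTIONS:
        return EXCEPTIONS[data_script]
    # Default pattern: pull_<name>.R in data_manipulation, fig_<name>.R in figures.
    if data_script.startswith("pull_") and data_script.endswith(".R"):
        return "fig_" + data_script[len("pull_"):]
    return None  # e.g. Stata .do steps, or the Figs 11 + 12 scripts


if __name__ == "__main__":
    print(chart_script_for("pull_intramonth.R"))
```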

Tables

All files used to create tables are in tables_for_paper:

  • Tab 2: tab_transport_2019_by_country_levels.R
  • Tab 3: tab_missing_data.R
  • Tab 4: tab_top_consignees.R
  • Tab 5: tab_top_shippers.R
  • Tab 6: tab_census_panjiva_weight.R
