-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathREADME.Rmd
165 lines (108 loc) · 9.25 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
---
output:
github_document:
df_print: tibble
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
tidy = TRUE,
fig.path = "man/figures/README-",
out.width = "100%"
)
```
# uktrade
<!-- badges: start -->
[![lifecycle](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://www.tidyverse.org/lifecycle/#experimental)
[![R-CMD-check](https://github.com/pvdmeulen/uktrade/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/pvdmeulen/uktrade/actions/workflows/R-CMD-check.yaml)
<!-- badges: end -->
The goal of `uktrade` is to provide convenient wrapper functions for loading HMRC data in R.
## Installation
You can install the latest version of `uktrade` from this GitHub repository using `devtools`:
``` {r install, eval = FALSE}
devtools::install_github("pvdmeulen/uktrade")
```
## HMRC data
Her Majesty's Revenue & Customs (HMRC) is the United Kingdom's customs authority. Data on UK trade is available on their [uktradeinfo.com](https://www.uktradeinfo.com/) website, which is collected through a combination of customs declarations and Intrastat surveys (in the case of [some EU trade](https://www.gov.uk/intrastat)).
In the beginning of 2021, the [uktradeinfo.com](https://www.uktradeinfo.com/) website was updated. This update also introduced an Application Programming Interface (API) for accessing bulk trade (and trader) data. For smaller trade data extracts (< 30,000 rows), the [online tool](https://www.uktradeinfo.com/trade-data/) may be sufficient.
For more information on the various API endpoints and options, see [HMRC's API documentation](https://www.uktradeinfo.com/api-documentation/). Note that this service and the website itself are in beta at the time of writing - features may still be added or changed (e.g. the [number of rows per page was increased](https://www.uktradeinfo.com/news/enhancement-made-to-uktradeinfo-api-service/) from 30,000 to 40,000 in December 2021).
## This package
This package contains four functions:
|Function |Description |Status |
|-----------------|-------------------------------------------------------|-----------------------------------------------|
|`load_ots()` |a function for loading Overseas Trade Statistics data |:yellow_circle: experimental (use at own risk) |
|`load_rts()` |a function for loading Regional Trade Statistics data |:yellow_circle: experimental (use at own risk) |
|`load_trader()` |a function for loading trader data |:red_circle: *planned* |
|`load_custom()` |a function for loading a custom URL |:yellow_circle: experimental (use at own risk) |
All of these functions will output a `dataframe` object (or `tibble`) with the desired data, and are able to keep track of paginated results (in batches of 40,000 rows) as well as the API request limit of 60 requests per minute.
The first three are convenient wrapper functions which should make loading basic datasets easier (with the assumption that further data manipulation will be done in R after loading the data). Of course, the API allows for extensive customisation in what data is obtained. For this purpose, `load_custom()` allows you to specify a custom URL instead. This allows you to be more specific in your request (and in the case of trader data, expand specific columns to get more information).
## Example using OTS and RTS data
These functions are convenient wrappers for loading trade data - either detailed UK trade data using OTS (by CN8 code), or regional UK trade data using RTS (by SITC2 code). These functions will load the raw data and optionally join results with lookups obtained from the API (using, for example, the /Commodity and /Country endpoints). This makes data easy to read by humans, but also larger (more columns containing text).
### OTS
Loading all UK trade of single malt Scotch whisky (CN8 code 22083030) and bottled gin (22085011) for 2019 is done like so:
```{r load_ots example}
library(uktrade)
data <- load_ots(month = c(201901, 201912), commodity = c(22083030, 22085011), join_lookup = FALSE, output = "tibble")
# Results are now in a tibble (set output to "df" to obtain a dataframe):
data
```
Note that the `month` argument specifies a range in the form of `c(min, max)`, while the `commodity` argument specifies a collection of commodities (not a range).
As you can see, results are not easily interpretable as they stand. The HMRC API also has lookups which can be loaded separately using `load_custom()`, or joined automatically:
```{r load_ots example with join}
# Load specific lookups separately:
commodity_lookup <- load_custom(endpoint = "Commodity")
# Or join automatically with the `join_lookup = TRUE` option:
data <- load_ots(month = c(201901, 201912), commodity = c(22083030, 22085011), join_lookup = TRUE)
data
```
Loading aggregate data (such as all spirits, HS4 code 2208) is possible too (in the background, this is loading commodity codes greater than or equal to 22080000 and less than or equal to 22089999). You can also see what URL the code is using by specifying `print_URL = TRUE`:
```{r load_ots example with join and HS4}
data <- load_ots(month = c(201901, 201912), commodity = 2208, join_lookup = TRUE, print_url = TRUE)
data
```
When loading an HS2 code, or a SITC1 or SITC2 code, so-called Below Threshold Trade Allocation estimates are also loaded (for EU trade). These are, roughly, estimated values for those trades which fell below the Intrastat Survey threshold. At more detailed commodity levels, these estimates are excluded. BTTA trade estimates have different commodity codes (9-digit CN codes ending in 9999999, or 7-digit SITC codes ending in 99999):
```{r load_ots example with BTTA}
data <- load_ots(month = c(202101, 202103), commodity = "03", join_lookup = TRUE, print_url = TRUE)
library(dplyr)
library(stringr)
data %>%
filter(stringr::str_detect(Sitc4Code, "-"))
```
Specifying `commodity = NULL` and `SITC = NULL` will load all commodities (this may take considerable time). This will also include so-called non-response estimates, which have negative commodity codes (and currently cannot be split by e.g. SITC2 or HS2). For example, we can load all exports to Australia in January 2019:
```{r load_ots example with join and all commodities}
data <- load_ots(month = 201901, country = "AU", flow = 4, commodity = NULL, join_lookup = TRUE)
data
```
We can also use SITC codes - here, we load all beverage (SITC2 code 11) exports to Australia in January 2019:
```{r load_ots example with join and sitc commodities}
data <- load_ots(month = 201901, country = "AU", flow = 4, sitc = "11", join_lookup = TRUE)
data
```
Note that SITC codes need to be in character format, and include any leading zeros. This is because, unlike HS or CN codes, odd SITC codes exist (e.g. SITC1). We can also select a selection of codes in a similar way to HS/CN codes:
```{r load_ots example with join and sitc commodities 2}
data <- load_ots(month = 201901, country = "AU", flow = 4, sitc = c("0", "11"), join_lookup = TRUE)
data
```
### RTS
Loading all UK regional trade of 2-digit SITC Divisions '00 - Live animals other than animals of division 03' to '11 - Beverages' for 2019 is done by specifying `sitc = c(00, 11)` (leading zeros are removed). This again specifies a range, similar to the `month` argument. This is done to avoid URLs being too long.
```{r load_rts example}
data <- load_rts(month = c(201901, 201912), sitc = c(00, 11), join_lookup = TRUE)
data
```
Note: where relevant, BTTA data is [included in RTS data](https://www.gov.uk/government/statistics/overseas-trade-statistics-methodologies/regional-trade-in-goods-statistics-methodology) as well.
## Example using trader data
*Trader code is a work in progress...*
## Example using a custom URL
This example uses the `load_custom()` function to replicate the example given in the [API documentation](https://www.uktradeinfo.com/api-documentation/) for finding all traders that exported ‘Live horses (excl. pure-bred for breeding)’ from the ‘CB8’ post code in 2019. This way of loading data is possible, but will require more data manipulation after obtaining the results. This function is also the workhorse (no pun intended) behind the other functions.
```{r load_custom example}
data <- load_custom(endpoint = "Commodity", custom_search = "?$filter=Hs6Code eq '010129'&$expand=Exports($filter=MonthId ge 201901 and MonthId le 201912 and startswith(Trader/PostCode, 'CB8'); $expand=Trader)", output = "tibble")
data
```
Note that the variables expanded in the API query, Exports and Trader, both need to be expanded as they are contained in a nested <list> column. When unnested using `tidyr::unnest()`, we can see the final results. The second expanded variable (Trader) itself contains 8 columns (which are TraderId, CompanyName, five Address columns, and PostCode).
```{r unnesting columns}
tidyr::unnest(data, Exports, names_repair = "unique")
```
## MIT License
You're free to use this code and its associated documentation without restriction. This includes both personal and commercial use. However, there is no warranty provided or any liability on my part. See the LICENSE file for more information.