Improvements arising from suggestions evt #132

Merged
merged 11 commits into from
Feb 7, 2025
rename mobility datasets v2
eugenividal committed Jan 14, 2025
commit 59a0ddf971ed8c55d697dcc01493eea34045870e
6 changes: 3 additions & 3 deletions vignettes/v1-2020-2021-mitma-data-codebook.qmd
@@ -83,7 +83,7 @@ Data structure:

| Variable Name | **Description** |
|------------------------------------|------------------------------------|
-| `id` | District `id` assigned by the data provider. Matches with `id_origin`, `id_destination`, and `id` in district level [origin-destination data](#od-data) and [number of trips data](#pop-tc). |
+| `id` | District `id` assigned by the data provider. Matches with `id_origin`, `id_destination`, and `id` in district level [origin-destination data](#od-data) and [number of trips data](#ptc-data). |
| `census_districts` | A string with semicolon-separated list of census district semicolon-separated identifiers as classified by the Spanish Statistical Office (INE) that are spatially bound within polygons with `id` above. |
| `municipalities_mitma` | A string with semicolon-separated list of municipality identifiers as assigned by the data provider in municipality zones spatial dataset that correspond to a given district `id` . |
| `municipalities` | A string with semicolon-separated list of municipality identifiers as classified by the Spanish Statistical Office (INE) that correspond to polygons with `id` above. |
@@ -203,15 +203,15 @@ The same summary operation as provided in the example above can be done with the

{{< include ../inst/vignette-include/csv-date-filter-note.qmd >}}

-## 2.2. Population by trip count data {#pop-tc}
+## 2.2. Population by trip count data {#ptc-data}

The population by trip count data shows the number of individuals in each district or municipality, categorized by the trips they make: 0, 1, 2, or more than 2.

| **English Variable Name** | **Original Variable Name** | **Type** | **Description** |
|----------------|----------------|----------------|------------------------|
| `date` | `fecha` | `Date` | The date of the recorded data, formatted as `YYYY-MM-DD`. |
| `id` | `distrito` | `factor` | The identifier of the `district` or `municipality` zone. |
-| `n_trips` | `numero_viajes` | `factor` | The number of trips grouped into four categories `0`, `1`, `2`, or `2+`. |
+| `n_trips` | `numero_viajes` | `factor` | The number of individuals who made trips, categorized by `0`, `1`, `2`, or `2+` trips. |
| `n_persons` | `personas` | `factor` | The number of individuals making the trips from `district` or `municipality` with zone `id`. |
| `year` | `year` | `integer` | The year of the recorded data, extracted from the date. |
| `month` | `month` | `integer` | The month of the recorded data, extracted from the date. |
10 changes: 5 additions & 5 deletions vignettes/v2-2022-onwards-mitma-data-codebook.qmd
@@ -235,15 +235,15 @@ od_mean_trips_by_ses_over_the_4_days
# ℹ Use `print(n = ...)` to see more rows
```

-In this example above, becaus the data is with hourly intervals within each day, we first summed the number of trips for each day by age, sex, and income groups. We then grouped the data again dropping the day variable and calculated the mean number of trips per day by age, sex, and income groups. The full data for all 4 days was probably never loaded into memory all at once. Rather the available memory of the computer was used up to its maximum limit to make that calculation happen, without ever exceeding the available memory limit. If you were doing the same opearation on 100 or even more days, it would work in the same way and would be possible even with limited memory. This is done transparantly to the user with the help of [`DuckDB`](https://duckdb.org/){target="_blank"} (specifically, with [{duckdb} R package](https://r.duckdb.org/index.html){target="_blank"} @duckdb-r).
+In this example above, because the data is with hourly intervals within each day, we first summed the number of trips for each day by age, sex, and income groups. We then grouped the data again dropping the day variable and calculated the mean number of trips per day by age, sex, and income groups. The full data for all 4 days was probably never loaded into memory all at once. Rather the available memory of the computer was used up to its maximum limit to make that calculation happen, without ever exceeding the available memory limit. If you were doing the same opearation on 100 or even more days, it would work in the same way and would be possible even with limited memory. This is done transparently to the user with the help of [`DuckDB`](https://duckdb.org/){target="_blank"} (specifically, with [{duckdb} R package](https://r.duckdb.org/index.html){target="_blank"} @duckdb-r).

-The same summary operation as provided in the example above can be done with the entire dataset for multiple years worth of data on a regular laptop with 8-16 GB memory. It will take a bit of time to complete, but it will be done. To speed things up, please also see the [vignette on converting the data](convert.qmd) into formats that will increase the analsysis performance.
+The same summary operation as provided in the example above can be done with the entire dataset for multiple years worth of data on a regular laptop with 8-16 GB memory. It will take a bit of time to complete, but it will be done. To speed things up, please also see the [vignette on converting the data](convert.qmd) into formats that will increase the analysis performance.

{{< include ../inst/vignette-include/csv-date-filter-note.qmd >}}

-## 2.2. Number of trips data {#nt-data}
+## 2.2. Population by trip count data {#ptc-data}

-For each location, the "number of trips" data provides the number of individuals who spent the night there, with breakdown by the number of trips made, age, and sex.
+The population by trip count data shows the number of individuals in each district or municipality, categorized by the trips they make (0, 1, 2, or more than 2), age, and sex.

| **English Variable Name** | **Original Variable Name** | **Type** | **Description** |
|-----------------|-----------------|-----------------|----------------------|
@@ -272,7 +272,7 @@ Because this data is small, we can actually load it completely into memory:
nt_dist_tbl <- nt_dist |> dplyr::collect()
```

-## 2.3. Overnight stays {#os-data}
+## 2.3. Population by overnight stay data {#pos-data}

This dataset provides the number of people who spend the night in each location, also identifying their place of residence down to the census district level according to the [INE encoding](https://www.ine.es/ss/Satellite?c=Page&p=1259952026632&pagename=ProductosYServicios%2FPYSLayout&cid=1259952026632&L=1){target="_blank"}.