ntd_id: revise dim organizations and enrich ntd endpoints #3710

Open: charlie-costanzo wants to merge 8 commits into main from revise-dim-organiztions-and-enrich-ntd-endpoints

charlie-costanzo (Member) commented Feb 18, 2025

Description

This PR replaces the use of the deprecated _deprecated__ntd_agency_to_organization in dim_organizations with ntd_agency_info_key, enriches the NTD mart tables with caltrans_district from the refactored dim_organizations, and performs the other cleanup tasks outlined below:

Refactor mart.dim_organizations:

  • remove deprecated use of _deprecated__ntd_agency_to_organization
  • instead, use ntd_agency_info_key from int_transit_database__organizations_dim to supply the ntd_id previously inherited from _deprecated__ntd_agency_to_organization
  • Remaining Questions
    • Is this an appropriate substitution and use of ntd_id?
    • Should we be incorporating versioned use of ntd_id/ntd_agency_info_key based on analysis period? Thinking specifically in relation to cutover date and our dashboarding work.
    • Or is analysis best performed with the current ntd_id and caltrans_district?
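
To make the versioning question above concrete, here is a minimal sketch of the two join strategies. It is an illustration only: SQLite stands in for the warehouse, and the valid_from, valid_to, and is_current columns, the ntd_report table, and all sample values are hypothetical and not taken from this PR.

```python
import sqlite3

# In-memory SQLite as a stand-in for the warehouse (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical versioned dimension: one row per validity window.
CREATE TABLE dim_organizations (
    name TEXT, ntd_id TEXT, caltrans_district TEXT,
    valid_from TEXT, valid_to TEXT, is_current INTEGER
);
INSERT INTO dim_organizations VALUES
    ('Agency A', '90001', '04', '2020-01-01', '2023-06-30', 0),
    ('Agency A', '90001', '05', '2023-07-01', '9999-12-31', 1);

-- Hypothetical NTD fact table with a reporting date.
CREATE TABLE ntd_report (ntd_id TEXT, report_date TEXT, upt INTEGER);
INSERT INTO ntd_report VALUES
    ('90001', '2022-12-31', 100),
    ('90001', '2024-12-31', 120);
""")

# Option 1: always use the current version of the dimension.
CURRENT_SQL = """
SELECT r.report_date, o.caltrans_district
FROM ntd_report AS r
LEFT JOIN dim_organizations AS o
    ON r.ntd_id = o.ntd_id AND o.is_current = 1
ORDER BY r.report_date
"""

# Option 2: use the version that was valid on the report date.
# ISO-8601 date strings compare correctly as text.
AS_OF_SQL = """
SELECT r.report_date, o.caltrans_district
FROM ntd_report AS r
LEFT JOIN dim_organizations AS o
    ON r.ntd_id = o.ntd_id
    AND r.report_date BETWEEN o.valid_from AND o.valid_to
ORDER BY r.report_date
"""

current_rows = conn.execute(CURRENT_SQL).fetchall()
as_of_rows = conn.execute(AS_OF_SQL).fetchall()
print(current_rows)
print(as_of_rows)
```

With the current-version join, both reports pick up the latest district. With the as-of join, the 2022 report keeps the district that applied at the time. Which behavior is wanted depends on how the dashboards should treat historical periods.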

Enrich NTD mart tables with caltrans_district:

  • In the following directories: mart.ntd_annual_reporting, mart.ntd_funding_and_expenses, mart.ntd_ridership, mart.ntd_safety_and_security
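
The enrichment described here can be sketched as a left join from an NTD table to a deduplicated lookup of ntd_id and caltrans_district. This is a rough illustration, not the PR's actual models: SQLite stands in for the warehouse, the fct_ridership table, the upt column, and all sample rows are hypothetical, and the DISTINCT and NULL filter are an assumed guard against row fan-out.

```python
import sqlite3

# In-memory SQLite as a stand-in for the warehouse (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_organizations (name TEXT, ntd_id TEXT, caltrans_district TEXT);
INSERT INTO dim_organizations VALUES
    ('City of X', '90001', '04'),
    ('City of X Transit', '90001', '04'),  -- two organizations sharing an ntd_id
    ('County Y', NULL, '07');              -- organization with no ntd_id

-- Hypothetical NTD fact table.
CREATE TABLE fct_ridership (ntd_id TEXT, upt INTEGER);
INSERT INTO fct_ridership VALUES ('90001', 100), ('90002', 50);
""")

ENRICH_SQL = """
WITH current_dim_organizations AS (
    -- One row per (ntd_id, caltrans_district) so the join does not duplicate facts.
    SELECT DISTINCT ntd_id, caltrans_district
    FROM dim_organizations
    WHERE ntd_id IS NOT NULL
)
SELECT f.ntd_id, f.upt, o.caltrans_district
FROM fct_ridership AS f
LEFT JOIN current_dim_organizations AS o
    ON f.ntd_id = o.ntd_id
ORDER BY f.ntd_id
"""

enriched = conn.execute(ENRICH_SQL).fetchall()
print(enriched)
```

The left join keeps every fact row, so an ntd_id with no matching organization ends up with a NULL district instead of being dropped. If one ntd_id ever maps to more than one district, DISTINCT alone would not prevent duplicated fact rows, so a uniqueness test on the lookup would still be worth having.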

Data type handling in stg_transit_database__organizations:

  • Cast unnested_ntd_records/ntd_agency_info_key as string appropriately

Rename column in stg_ntd__major_safety_events / fct_major_safety_events:

  • rename ntdid as ntd_id for consistency

Fix incorrectly named CTEs, add explicit column names (dbt best practice):

  • mart.ntd_ridership
    • TODO: consolidate ntd ridership directories currently duplicated and split between ntd and ntd_ridership directories
    • This should probably mean continuing to use the tables currently in the ntd directory but moving them to the more specific ntd_ridership directory

Resolves: #3709

Type of change

  • Bug fix (non-breaking change which fixes an issue)

How has this been tested?

poetry run dbt run -s +mart.ntd_ridership +mart.ntd_annual_reporting +mart.ntd_safety_and_security +mart.funding_and_expenses +mart.transit_database
[Screenshot 2025-02-25 at 10:49:07 AM]

Post-merge follow-ups

  • No action required

@charlie-costanzo charlie-costanzo self-assigned this Feb 18, 2025
@charlie-costanzo charlie-costanzo added the data-pipeline-ingestion-and-modeling Ingesting, parsing and modeling data. Evan Siroky is product owner. label Feb 18, 2025
@charlie-costanzo charlie-costanzo force-pushed the revise-dim-organiztions-and-enrich-ntd-endpoints branch from b65ec76 to e95607a Compare February 21, 2025 16:59
@charlie-costanzo charlie-costanzo marked this pull request as ready for review February 25, 2025 16:55
current_dim_organizations AS (
SELECT
ntd_id,
caltrans_district
Member

I believe this is still extracting the Caltrans district from dim_organizations, which extracts it from a deprecated column in Airtable. We should probably remove Caltrans District from the dim_organizations table.

Member Author

Ah okay, thanks for letting me know @evansiroky! I was hoping to use this to enrich the new NTD mart tables with caltrans_district on ntd_id. Is there a different source of truth for caltrans_district that we could use?

@charlie-costanzo charlie-costanzo force-pushed the revise-dim-organiztions-and-enrich-ntd-endpoints branch from 2dbbd0e to 45c946e Compare February 27, 2025 17:59
@charlie-costanzo charlie-costanzo force-pushed the revise-dim-organiztions-and-enrich-ntd-endpoints branch from 45c946e to 7348187 Compare March 7, 2025 17:20
…ttern, remove deprecated airtable caltrans district columns from stg/int tables
Development

Successfully merging this pull request may close these issues.

Enrich dim_organizations with Caltrans District ID and pass through to new NTD endpoints
2 participants