Adding RTPA and MPO to external_airtable_california_transit_organizations.yml #3755

Merged
merged 1 commit into from
Mar 7, 2025

Conversation

fsalemi
Contributor

@fsalemi fsalemi commented Mar 7, 2025

Description

Adding RTPA and MPO to external_airtable_california_transit_organizations.yml in order to eventually add these fields to the dim_organizations in the GBC warehouse.

Resolves #3751

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation

Post-merge follow-ups

Once this step is completed, Staging, Intermediate and Marts models will be updated to include these fields in the dim_organizations.

  • No action required
  • Actions required (specified below)

@vevetron
Contributor

vevetron commented Mar 7, 2025

Please link to the issue this is trying to solve, even if it doesn't fully resolve the issue.

@fsalemi fsalemi merged commit 3e77334 into main Mar 7, 2025
1 check passed
@fsalemi fsalemi deleted the fsalemi-patch-1 branch March 7, 2025 23:15
@evansiroky
Member

@fsalemi and @vevetron, I believe this will not work. When we download from the Airtable API, Linked Record types will be an array of strings wherein each value is the record ID of another record that is being referred to.

Also, I renamed the MPO/RTPA column to RTPA, so I don't believe we will need that former column. Also please work with @csuyat-dot to see if he needs the historic RTPAs and MPOs to be present for dim_organizations prior to this date. That may be a more difficult lift.
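To illustrate the Linked Record point above: because each value is a record ID rather than a display name, a downstream step has to join those IDs against the table they point to. The following is a minimal sketch, not the project's actual code. The helper `resolve_linked_records` and the `planning_authorities` lookup (including the placeholder name in it) are assumptions made for illustration; only the field names and record IDs are taken from the organizations export.

```python
from typing import Dict, List, Optional


def resolve_linked_records(
    record: dict, field: str, lookup: Dict[str, str]
) -> List[Optional[str]]:
    """Map the record IDs stored in a Linked Record field to display names.

    Unknown IDs resolve to None so that a missing lookup entry stays visible.
    """
    ids = record.get(field) or []  # an empty link is exported as null
    return [lookup.get(record_id) for record_id in ids]


# Trimmed organization record, shaped like a row in organizations.jsonl.
organization = {
    "id": "recsBfXgev9ICDCY1",
    "name": "Presidio Trust",
    "rtpa": ["recSOmBsATeoQylk1"],
    "mpo": ["recSOmBsATeoQylk1"],
    "parent_organization": None,
}

# Hypothetical lookup: the real name behind this record ID is not given here.
planning_authorities = {"recSOmBsATeoQylk1": "Example Planning Authority"}

rtpa_names = resolve_linked_records(organization, "rtpa", planning_authorities)
parent_names = resolve_linked_records(
    organization, "parent_organization", planning_authorities
)
```

In the warehouse this would more likely be a SQL join after unnesting the array; the Python version only shows the shape of the data.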

@vevetron
Contributor

vevetron commented Mar 8, 2025

Checking the jsonl of organizations
One example - Presidio Trust:
"rtpa":["recSOmBsATeoQylk1"]
"mpo":["recSOmBsATeoQylk1"]

Full Json:
{"id":"recsBfXgev9ICDCY1","name":"Presidio Trust","mobility_services_managed":["recmdAc50ylp6TDxJ"],"mobility_services_operated":["recmdAc50ylp6TDxJ"],"website":"https:\/\/www.presidio.gov\/presidio-trust","record_creation_time":"2021-07-16T21:30:19.000Z","parent_organization":null,"service_type__from_mobility_services_managed_":["fixed-route"],"currently_operating__from_mobility_services_managed_":[true],"currently_operating__from_mobility_services_operated_":[true],"service_type__from_mobility_services_operated_":["fixed-route"],"organization_type":"Federal Government","funding_sources_for_managed_transportation":["private"],"headquarters_place":"San Francisco","caltrans_district":"04 - Oakland","planning_authority":["recSOmBsATeoQylk1"],"tracking_category":"Passive","reporting_category":"Other Transit","assist_category":"Limited","__of_fixed_route_services":1,"__services_w__complete_rt_status":1,"__fixed_route_services_w__static_gtfs":1,"complete_static_gtfs_coverage__1_yes_":1,"complete_rt_coverage":1,"__1_gtfs_feed_for_any_service__1_yes_":1,"___1_complete_rt_set__1_yes_":1,"service_availability_category__from_mobility_services_managed_":["Public\n"],"__fixed_route_or_deviated_fixed_route_services":1,"__fixed_route_or_deviated_fixed_route_service_w__static_gtfs":1,"__services_with_missing_static_feed_for_fixed_route_or_deviated_fixed_route":0,"gtfs_static_status":"Static OK","gtfs_realtime_status":"RT OK","missing_static":[],"services_needing_alerts":[],"services_needing_tripupdates_or_vehiclepositions":[],"service_availability":["Public\n"],"fixed_route_service_operator_type":["provider-operated"],"headquarters_state_country":"CA","schedule_datasets":0,"funding_sources__from_mobility_services_managed_":["recdGrp0YDnkaiRsw"],"funding_sources__from_mobility_services_managed__2":["recdGrp0YDnkaiRsw"],"qc__airtable_hack___unique_website":"starthttps:\/\/www.presidio.gov\/presidio-trustend","qc_automations":["recUgaBvLTFuC3KIm"],"number_of_duplicate_website_values":0,"manual_check__contact_on_website":"Unknown","manages_fixed_route_service":"Y","funding_sources__from_mobility_services_managed__3":["recdGrp0YDnkaiRsw"],"assessment_status":"No","hq_county_geography":["recj4vOglLJoYXBSc"],"hq_caltrans_district":["4"],"qc__count__hq_county_geography_":1,"funding_sources__from_mobility_services_managed__4":["recdGrp0YDnkaiRsw"],"funding_sources__from_mobility_services_managed__5":["recdGrp0YDnkaiRsw"],"temp___has_5311":"No","fixed_route__from_mobility_services_managed_":[true],"currently_operating_public__from_mobility_services_managed_":["Yes"],"currently_operating_public_fixed_route__from_mobility_services_managed_":["Yes"],"record_id":"recsBfXgev9ICDCY1","is_public_entity":"Yes","public_currently_operating":"Yes","public_currently_operating_fixed_route":"Yes","name_of_service__from_services_":["recmdAc50ylp6TDxJ"],"public_currently_operating_fixed_route__from_services_":["Yes"],"public_currently_operating__from_services_":["Yes"],"is_public__from_services_":["Yes"],"rider_requirements__from_services_":["rec9S3diP9OKrofzL"],"is_either_ntd_reporter_or_public_entity":"Yes","qc__unique_organization_name":"startPresidio Trustend","qc__has_minimum_required_fields":"OK","qc__number_of_duplicate_organization_names":1,"website_or_details_filled":"Yes","qc__count_of_services":1,"qc__services_and_hq_county":"No","is_ntd_reporter":"No","operating_county_geographies__from_mobility_services_managed_":["recj4vOglLJoYXBSc"],"organization_types":["Federal Government"],"qc__number_of_rtpas":1,"qc__number_of_mpos":1,"itp_id":257.0,"gtfs_datasets_produced":null,"ntd_id":null,"total_voms__ntd_":null,"service_area_sq_miles__ntd_":null,"service_area_population__ntd_":null,"rtpa":["recSOmBsATeoQylk1"],"gtfs_datasets_referenced":null,"gtfs_dataset__from_mobility_services_managed_":["recvj6iwkUqws5W99","recoM5LwmAiGLKOKz","recMQaaRx5sbGDEx0","recP11dOJsDBmdpMc","rec9AyXUSMUHFnLsH","recAQiomSPtajnjLy","recAfxxeJWxHexo6e"],"gtfs_schedule_status":["ok"],"fares_v2_status":["Blocked - Vendor"],"flex_status":null,"immediate_gtfs_goals":null,"gtfs_schedule_quality__from_mobility_services_managed_":null,"gtfs_realtime_quality__from_mobility_services_managed_":null,"hubspot_company_record_id":"1879168135","schedule_uris":["https:\/\/presidiobus.com\/gtfs","https:\/\/presidiobus.com\/gtfs-rt\/vehiclepositions","https:\/\/presidiobus.com\/gtfs-rt\/alerts","https:\/\/presidiobus.com\/gtfs-rt\/tripupdates","https:\/\/api.511.org\/transit\/datafeeds?api_key={{ MTC_511_API_KEY}}&operator_id=PG","https:\/\/api.511.org\/transit\/datafeeds?api_key={{ MTC_511_API_KEY}}&operator_id=RG","https:\/\/api.511.org\/transit\/servicealerts?api_key={{ MTC_511_API_KEY}}&agency=RG","https:\/\/api.511.org\/transit\/tripupdates?api_key={{ MTC_511_API_KEY}}&agency=RG","https:\/\/api.511.org\/transit\/vehiclepositions?api_key={{ MTC_511_API_KEY}}&agency=RG","https:\/\/api.511.org\/transit\/servicealerts?api_key={{ MTC_511_API_KEY}}&agency=PG","https:\/\/api.511.org\/transit\/tripupdates?api_key={{ MTC_511_API_KEY}}&agency=PG","https:\/\/api.511.org\/transit\/vehiclepositions?api_key={{ MTC_511_API_KEY}}&agency=PG"],"raw_ntd_id":null,"gtfs_datasets":["recMQaaRx5sbGDEx0","recvj6iwkUqws5W99","recP11dOJsDBmdpMc","recoM5LwmAiGLKOKz","reccqL2JazGZdj2UN","recD4c7CYsOu9CJQR","recJ4VQoXTBdWgMvP"],"holiday_website":"https:\/\/presidio.gov\/visit\/getting-to-and-around-the-park\/presidio-go-shuttle\/presidio-go-south-hills-shuttle-schedule","holiday_website_status":"Current","holiday_website_notes":"The site does not specify the calendar year.\u00a0\n","ntd_id_2022":null,"mpo":["recSOmBsATeoQylk1"],"details":null,"funding_programs":null,"feature_progress__schedule_demand_response_completeness":null,"fare_systems":null,"regional_gtfs_partner_organization":["recSOmBsATeoQylk1"],"regional_gtfs_partner_website":["https:\/\/511.org\/open-data\/transit"],"drmt_organization_name":null,"drmt_reported_5310_vehicles":null,"opm_id_drmt":null,"attending_calact_2022_fall_conference":null,"alias_":null,"service___component":null,"brand":null,"roles":null,"county_geography":null,"county_geography_3":null,"dotid":null,"feature_progress__realtime_fixed_route_completeness":null,"feature_progress__schedule_fixed_route_completeness":null,"administrating_organization":null,"services":null,"eligibility_programs":null,"coordinated_plans":null}

@vevetron
Contributor

vevetron commented Mar 8, 2025

Ugh I think this broke prod.

select * from cal-itp-data-infra.staging.stg_transit_database__organizations limit 1 -> Error while reading table: cal-itp-data-infra.external_airtable.california_transit__organizations, error message: JSON parsing error in row starting at position 4999: Array specified for non-repeated field: rtpa. File: gs://calitp-airtable/california_transit__organizations/dt=2025-03-06/ts=2025-03-06T02:08:54.955024+00:00/organizations.jsonl.gz

Going to revert.

@evansiroky I still see both MPO and RTPA in today's airtable organizations table, as in the above. Did we want to delete one of these in airtable or do we want both?

Both values are arrays, holding zero, one, or multiple values. We should be able to handle them the same way as parent_organization, which is also an array, though we might need to make them REPEATED rather than NULLABLE.

This record has multiple parent_organizations:

{"id":"recsWf2pKGGymVP8C","name":"Glenn County Transportation Commission","mobility_services_managed":null,"mobility_services_operated":null,"website":"https:\/\/www.countyofglenn.net\/committee\/local-transportation-commission\/welcome","record_creation_time":"2021-10-11T23:52:04.000Z","parent_organization":["rec0sMQyK2v8Cs4Io","reczUYNHfMp8ZSLhU","rec3XbkR6hSQR8a4W"],"service_type__from_mobility_services_managed_":null,"currently_operating__from_mobility_services_managed_":null,"currently_operating__from_mobility_services_operated_":null,"service_type__from_mobility_services_operated_":null,"organization_type":"MPO\/RTPA","funding_sources_for_managed_transportation":null,"headquarters_place":"Willows","caltrans_district":"03 - Marysville","planning_authority":null,"tracking_category":"Active","reporting_category":"Core","assist_category":"White Glove","__of_fixed_route_services":0,"__services_w__complete_rt_status":0,"__fixed_route_services_w__static_gtfs":0,"complete_static_gtfs_coverage

array_cols = ['roles', 'alias', 'mobility_services_managed', 'parent_organization',
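The "Array specified for non-repeated field" error above could be caught before merging by comparing a sample row against the modes declared for the external table. The following is a rough sketch under stated assumptions: `schema_modes` is a hypothetical, hand-written subset of the schema (field name to mode), and `find_mode_mismatches` is an illustrative helper, not something that exists in the repository.

```python
import json
from typing import Dict, List

# Hypothetical subset of the external table schema: field name -> mode.
schema_modes = {
    "name": "NULLABLE",
    "parent_organization": "REPEATED",
    "rtpa": "NULLABLE",  # array data + non-repeated mode reproduces the mismatch
    "mpo": "NULLABLE",
}


def find_mode_mismatches(jsonl_line: str, modes: Dict[str, str]) -> List[str]:
    """Return fields whose JSON value is an array but whose mode is not REPEATED.

    Fields absent from `modes` are skipped, since this sketch only checks
    columns that the schema actually declares.
    """
    record = json.loads(jsonl_line)
    return sorted(
        field
        for field, value in record.items()
        if isinstance(value, list) and modes.get(field) not in (None, "REPEATED")
    )


# Trimmed version of the Presidio Trust row from organizations.jsonl.
line = (
    '{"name":"Presidio Trust","parent_organization":null,'
    '"rtpa":["recSOmBsATeoQylk1"],"mpo":["recSOmBsATeoQylk1"]}'
)

mismatches = find_mode_mismatches(line, schema_modes)

# Declaring both fields as REPEATED clears the mismatch in this sketch.
fixed_modes = dict(schema_modes, rtpa="REPEATED", mpo="REPEATED")
remaining = find_mode_mismatches(line, fixed_modes)
```

Running a check like this over one recent `organizations.jsonl.gz` file before changing the yml would flag `rtpa` and `mpo` without touching the production table.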

vevetron pushed a commit that referenced this pull request Mar 8, 2025
This reverts commit 3e77334, reversing
changes made to c1faf49.
@vevetron vevetron mentioned this pull request Mar 8, 2025
4 tasks
vevetron added a commit that referenced this pull request Mar 8, 2025
This reverts commit 3e77334, reversing
changes made to c1faf49.

Co-authored-by: V <[email protected]>
@vevetron
Contributor

vevetron commented Mar 8, 2025

Somebody ran create_external_table at 3:22 pm, good call. I assume it's either @fsalemi or @evansiroky

select * from cal-itp-data-infra.staging.stg_transit_database__organizations limit 1 works now.

@fsalemi
Contributor Author

fsalemi commented Mar 8, 2025

@vevetron, we need to have both MPO and RTPA in the table. I should have deleted/renamed the old mpo_rpta in the yml file. I will correct these on Monday.
