We will want to add new rules to this validator as the static GTFS specification evolves. This page outlines the process of adding new rules to this tool. Note that:
- Notices related to file parsing and data types are defined in the core
- Notices related to GTFS semantics/business logic are encapsulated within the related validation rule class. See the example below in TripUsabilityValidator

- Check the current rules to make sure the rule doesn't already exist.
- Check the list of possible future rules to see if an issue already exists for the proposed rule.
  - If no issue exists, open a new issue.
- Discuss the rule with the community via the GitHub issue and come to a consensus on the exact logic and on whether it should be an ERROR, a WARNING, or an INFO. See definitions for severities.
  - Please note that ValidationNotices should be distinguished from SystemErrors: while ValidationNotices give information about the data quality, SystemErrors are not semantic errors. They give information about things that may have gone wrong during the validation process, such as being unable to unzip a GTFS archive.
- Implement the new rule using the process below.
Let's look at an example that checks that all trips in trips.txt have at least two stops in stop_times.txt. If a trip has fewer than two records in stop_times.txt, a WARNING should be generated.
tl;dr - here's what the complete rule looks like:
/**
 * Validates that every trip in "trips.txt" is used by at least two stops from "stop_times.txt"
 *
 * <p>Generated notice: {@link UnusableTripNotice}.
 */
@GtfsValidator
public class TripUsabilityValidator extends FileValidator {
  private final GtfsTripTableContainer tripTable;
  private final GtfsStopTimeTableContainer stopTimeTable;

  @Inject
  TripUsabilityValidator(
      GtfsTripTableContainer tripTable, GtfsStopTimeTableContainer stopTimeTable) {
    this.tripTable = tripTable;
    this.stopTimeTable = stopTimeTable;
  }

  @Override
  public void validate(NoticeContainer noticeContainer) {
    for (GtfsTrip trip : tripTable.getEntities()) {
      String tripId = trip.tripId();
      if (stopTimeTable.byTripId(tripId).size() <= 1) {
        noticeContainer.addValidationNotice(new UnusableTripNotice(trip.csvRowNumber(), tripId));
      }
    }
  }

  /**
   * Trips must have more than one stop to be usable.
   *
   * A trip must visit more than one stop in `stop_times.txt` to be usable by passengers for boarding and alighting.
   */
  @GtfsValidationNotice(
      severity = WARNING,
      files = @FileRefs({GtfsStopTimeSchema.class, GtfsTripSchema.class}),
      urls = {
        @UrlRef(
            label = "Original Python validator implementation",
            url = "https://github.com/google/transitfeed")
      })
  static class UnusableTripNotice extends ValidationNotice {
    /** The row number of the faulty record. */
    private final int csvRowNumber;

    /** The faulty record's id. */
    private final String tripId;

    UnusableTripNotice(int csvRowNumber, String tripId) {
      this.csvRowNumber = csvRowNumber;
      this.tripId = tripId;
    }
  }
}
The above TripUsabilityValidator.java file is located in the /main/src/main/java/org/mobilitydata/gtfsvalidator/validator folder, where all new validation rules are stored. All files annotated with @GtfsValidator will automatically be queued for processing - you don't need to manually tell the validator to execute this rule by editing another file.
The following steps explain how to implement the above rule.
If you want to take a look at a complete set of changes that implement this new rule (including tests and documentation) before diving into the instructions, see this commit on GitHub.
You'll notice that the above validation rule class ...extends FileValidator - this means that the rule needs to examine more than one record in a GTFS file or cross-reference more than one file.
Alternately, if the rule only needs to look at a single record at a time (e.g., to make sure each record's start date comes before the end date), you should use ...extends SingleEntityValidator instead - this is preferred for performance reasons.
For efficiency, multiple rules related to similar fields can be implemented in the same ...Validator.java class (e.g., to avoid iterating through all rows from GTFS file stop_times.txt for each rule), so take a look at the existing validators in the /main/src/main/java/org/mobilitydata/gtfsvalidator/validator folder to see if there is already a related rule.
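To make the difference between the two base classes more concrete, here is a minimal, self-contained sketch. The types below (`CalendarRow`, `SketchSingleEntityValidator`, `SketchFileValidator`, `ValidatorShapes`) are hypothetical stand-ins invented for illustration and are not the validator's real classes; notices are reduced to plain strings. Only the underlying idea, one record at a time versus access to the whole table, comes from the text above.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for one row of a GTFS file that carries a date range.
final class CalendarRow {
  final String serviceId;
  final LocalDate startDate;
  final LocalDate endDate;

  CalendarRow(String serviceId, LocalDate startDate, LocalDate endDate) {
    this.serviceId = serviceId;
    this.startDate = startDate;
    this.endDate = endDate;
  }
}

// Stand-in for a single-entity rule: it sees exactly one record per call.
interface SketchSingleEntityValidator<T> {
  void validate(T entity, List<String> notices);
}

// Stand-in for a file-level rule: it sees the whole table at once.
interface SketchFileValidator<T> {
  void validate(List<T> table, List<String> notices);
}

final class ValidatorShapes {
  // Needs only the current record, so the single-entity shape is enough.
  static final SketchSingleEntityValidator<CalendarRow> DATE_ORDER =
      (row, notices) -> {
        if (row.startDate.isAfter(row.endDate)) {
          notices.add("start_after_end:" + row.serviceId);
        }
      };

  // Compares records with each other, so it must see the whole table.
  static final SketchFileValidator<CalendarRow> DUPLICATE_ID =
      (table, notices) -> {
        Set<String> seen = new HashSet<>();
        for (CalendarRow row : table) {
          if (!seen.add(row.serviceId)) {
            notices.add("duplicate_id:" + row.serviceId);
          }
        }
      };

  // Runs both rules over a table and returns the collected notices.
  static List<String> run(List<CalendarRow> table) {
    List<String> notices = new ArrayList<>();
    for (CalendarRow row : table) {
      DATE_ORDER.validate(row, notices);
    }
    DUPLICATE_ID.validate(table, notices);
    return notices;
  }
}
```

In this sketch the date-order rule never needs anything beyond the row it is handed, which is the situation where the single-entity shape fits. The duplicate check has to remember what it saw in other rows, so it needs the file-level shape.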
Note that some validators are automatically generated based on annotations in the GTFS table schema classes, so you'll never need to implement these manually:
- ...ForeignKeyValidator - Checks if valid primary key values exist for all foreign key values (for all fields with @ForeignKey). For example, if there is a record in stop_times.txt with trip_id = 5, this validator will generate an error if there isn't a record in trips.txt with trip_id = 5.
- ...EndRangeValidator - Checks if time or date ranges for a record are in order (for all fields with @EndRange).
- ...TableHeaderValidator - Checks if @Required fields exist in a file and outputs INFO notices for any unknown fields (e.g., to help catch typos in field names).
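As an illustration of what a foreign-key check amounts to, the sketch below re-implements the idea in plain Java. It is not the generated code: `ForeignKeySketch` and `findDanglingTripIds` are hypothetical names, rows are reduced to bare `trip_id` strings, and reporting each missing id only once is a simplification chosen for this example.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class ForeignKeySketch {
  /**
   * Returns every trip_id referenced from stop_times.txt that has no matching
   * record in trips.txt, in order of first appearance and without repeats.
   */
  static List<String> findDanglingTripIds(
      List<String> tripIdsInTrips, List<String> tripIdsInStopTimes) {
    // Primary keys go into a set so each lookup is a constant-time check.
    Set<String> primaryKeys = new HashSet<>(tripIdsInTrips);
    Set<String> alreadyReported = new HashSet<>();
    List<String> dangling = new ArrayList<>();
    for (String foreignKey : tripIdsInStopTimes) {
      if (!primaryKeys.contains(foreignKey) && alreadyReported.add(foreignKey)) {
        dangling.add(foreignKey);
      }
    }
    return dangling;
  }
}
```

With trips `1` and `2` and stop times referencing `1`, `5`, `5`, and `2`, the only dangling id is `5`, which mirrors the `trip_id = 5` example above.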
The UnusableTripNotice is the container for information that will be exported to JSON when this rule detects a problem and is also where we declare if this notice is a WARNING or ERROR.
@GtfsValidationNotice(severity = WARNING, ...)
static class UnusableTripNotice extends ValidationNotice {
  /** The row number of the faulty record. */
  private final int csvRowNumber;

  /** The faulty record's id. */
  private final String tripId;

  UnusableTripNotice(int csvRowNumber, String tripId) {
    this.csvRowNumber = csvRowNumber;
    this.tripId = tripId;
  }
}
In this case, because the GTFS spec doesn't explicitly say that each trip requires at least two stops, we can't say it's an ERROR. But it's still suspicious (riders need to board and exit the vehicle), so we set this as a WARNING.
You can set up the notice constructor UnusableTripNotice(int csvRowNumber, String tripId) to take in whatever variables you want to pass from the validator to the notice, specifying them as fields in the notice to write them to the JSON output.
For example, this notice will appear in JSON output as:
{
  "notices":[
    {
      "code":"unusable_trip",
      "severity":"WARNING",
      "totalNotices":1,
      "notices":[
        {
          "csvRowNumber":40150,
          "tripId":"3362144"
        },
        ...
      ]
    }
  ]
}
Values for csvRowNumber and tripId will be different for each generated notice.
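The mapping from notice fields to JSON keys can be pictured with the sketch below. It rests on assumptions: the real exporter is not shown on this page, and `NoticeJsonSketch`, its nested `UnusableTripNotice` stand-in, `codeOf`, and `contextOf` are all hypothetical. The sketch reads the declared fields through reflection and derives the `code` by dropping the `Notice` suffix and converting the class name to snake case. That convention matches the `unusable_trip` example above, but it may not be how the tool actually does it.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

final class NoticeJsonSketch {
  // Hypothetical stand-in with the same fields as the notice in this guide.
  static final class UnusableTripNotice {
    private final int csvRowNumber;
    private final String tripId;

    UnusableTripNotice(int csvRowNumber, String tripId) {
      this.csvRowNumber = csvRowNumber;
      this.tripId = tripId;
    }
  }

  // Assumed convention: "UnusableTripNotice" becomes "unusable_trip".
  static String codeOf(Object notice) {
    String name = notice.getClass().getSimpleName();
    if (name.endsWith("Notice")) {
      name = name.substring(0, name.length() - "Notice".length());
    }
    // Put "_" between a lower-case letter or digit and the upper-case letter after it.
    return name.replaceAll("([a-z0-9])([A-Z])", "$1_$2").toLowerCase(Locale.ROOT);
  }

  // Collects every non-static declared field as a key/value pair.
  static Map<String, Object> contextOf(Object notice) {
    Map<String, Object> context = new LinkedHashMap<>();
    for (Field field : notice.getClass().getDeclaredFields()) {
      if (field.isSynthetic() || Modifier.isStatic(field.getModifiers())) {
        continue;
      }
      field.setAccessible(true);
      try {
        context.put(field.getName(), field.get(notice));
      } catch (IllegalAccessException e) {
        throw new IllegalStateException(e);
      }
    }
    return context;
  }
}
```

The practical takeaway is the same as in the text: whatever you declare as a field on the notice is what ends up as a key in the exported record, so field names should be chosen with the JSON consumer in mind.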
Here's the fun part - writing the rule. Because this rule ...extends FileValidator, we need to define what GTFS files we want - in this case the trips.txt and stop_times.txt tables.
We do that by declaring the two variables at the top of the class, GtfsTripTableContainer tripTable and GtfsStopTimeTableContainer stopTimeTable (similar ...TableContainer classes exist for all GTFS files).
We also need to include the @Inject annotation on the TripUsabilityValidator(...) constructor and assign the variables - this is how the validator gives you access to these files.
@GtfsValidator
public class TripUsabilityValidator extends FileValidator {
  private final GtfsTripTableContainer tripTable;
  private final GtfsStopTimeTableContainer stopTimeTable;

  @Inject
  TripUsabilityValidator(
      GtfsTripTableContainer tripTable, GtfsStopTimeTableContainer stopTimeTable) {
    this.tripTable = tripTable;
    this.stopTimeTable = stopTimeTable;
  }

  @Override
  public void validate(NoticeContainer noticeContainer) {
    for (GtfsTrip trip : tripTable.getEntities()) {
      String tripId = trip.tripId();
      if (stopTimeTable.byTripId(tripId).size() <= 1) {
        noticeContainer.addValidationNotice(new UnusableTripNotice(trip.csvRowNumber(), tripId));
      }
    }
  }
  ...
}
The validate() method contains the logic to loop through the trips table, and for each trip_id it checks if the size of the list of stop_times for that trip_id is less than or equal to 1.
If so, it adds a notice to the noticeContainer with the info needed to troubleshoot this error in the GTFS data - this notice will then be output to JSON.
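To see the rule's logic in isolation, the following sketch reproduces it without the validator framework. `TripUsabilitySketch`, `indexByTripId`, and `findUnusableTrips` are hypothetical names; the map built by `indexByTripId` only imitates the kind of lookup that `stopTimeTable.byTripId(...)` appears to provide, and stop times are reduced to `{trip_id, stop_id}` string pairs.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class TripUsabilitySketch {
  // Groups stop_times rows (given as {trip_id, stop_id} pairs) by trip_id.
  static Map<String, List<String>> indexByTripId(List<String[]> stopTimes) {
    Map<String, List<String>> index = new HashMap<>();
    for (String[] row : stopTimes) {
      index.computeIfAbsent(row[0], key -> new ArrayList<>()).add(row[1]);
    }
    return index;
  }

  // Returns the trip_ids that have zero or one stop_times records.
  static List<String> findUnusableTrips(List<String> tripIds, List<String[]> stopTimes) {
    Map<String, List<String>> byTripId = indexByTripId(stopTimes);
    List<String> unusable = new ArrayList<>();
    for (String tripId : tripIds) {
      // A trip with no stop_times at all is treated as having an empty list.
      int stopCount = byTripId.getOrDefault(tripId, Collections.emptyList()).size();
      if (stopCount <= 1) {
        unusable.add(tripId);
      }
    }
    return unusable;
  }
}
```

Building the index once keeps the check linear in the number of rows, instead of rescanning every stop time for each trip. Note that the `<= 1` comparison also flags trips with no stop times at all, not only trips with exactly one.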
That's it for the main rule logic!
Before we talk about documentation and testing for our new rule, let's look at how this rule would be implemented differently if it was looking at a single record at a time instead of multiple records and files.
Below is an example of a rule, FeedServiceDateValidator, that checks that if a start date has been provided in feed_info.txt, an end date was also provided (and vice versa).
Because the start and end date are within a single feed info record, we use ...extends SingleEntityValidator<GtfsFeedInfo> instead:
/**
 * Validates that if one of {@code (start_date, end_date)} fields is provided for {@code
 * feed_info.txt}, then the second field is also provided.
 *
 * <p>Generated notice: {@link MissingFeedInfoDateNotice}.
 */
@GtfsValidator
public class FeedServiceDateValidator extends SingleEntityValidator<GtfsFeedInfo> {
  @Override
  public void validate(GtfsFeedInfo feedInfo, NoticeContainer noticeContainer) {
    if (feedInfo.hasFeedStartDate() && !feedInfo.hasFeedEndDate()) {
      noticeContainer.addValidationNotice(
          new MissingFeedInfoDateNotice(feedInfo.csvRowNumber(), "feed_end_date"));
    } else if (!feedInfo.hasFeedStartDate() && feedInfo.hasFeedEndDate()) {
      noticeContainer.addValidationNotice(
          new MissingFeedInfoDateNotice(feedInfo.csvRowNumber(), "feed_start_date"));
    }
  }

  /**
   * Even though `feed_info.start_date` and `feed_info.end_date` are optional, if one field is
   * provided the second one should also be provided.
   */
  @GtfsValidationNotice(severity = WARNING, bestPractices = @FileRefs(GtfsFeedInfoSchema.class))
  static class MissingFeedInfoDateNotice extends ValidationNotice {
    /** The row number of the faulty record. */
    private final int csvRowNumber;

    /** Either `feed_end_date` or `feed_start_date`. */
    private final String fieldName;

    MissingFeedInfoDateNotice(int csvRowNumber, String fieldName) {
      this.csvRowNumber = csvRowNumber;
      this.fieldName = fieldName;
    }
  }
}
The validate() method now takes an additional parameter:
- GtfsFeedInfo feedInfo - The GTFS record type to validate. The type should always match the type you use in ...extends SingleEntityValidator<X>.
- NoticeContainer - same as before, you add your notices here to export to JSON.
Note that we don't need to declare the GTFS tables as fields and we can also omit the constructor. The notice subclass is declared the same as before.
Notices are documented directly in source code via comments.
Coming back to our example:
/**
 * Trips must have more than one stop to be usable.
 *
 * A trip must visit more than one stop in `stop_times.txt` to be usable by passengers for boarding and alighting.
 */
@GtfsValidationNotice(
    severity = WARNING,
    files = @FileRefs({GtfsStopTimeSchema.class, GtfsTripSchema.class}),
    urls = {
      @UrlRef(
          label = "Original Python validator implementation",
          url = "https://github.com/google/transitfeed")
    })
static class UnusableTripNotice extends ValidationNotice {
  /** The row number of the faulty record. */
  private final int csvRowNumber;

  /** The faulty record's id. */
  private final String tripId;

  UnusableTripNotice(int csvRowNumber, String tripId) {
    this.csvRowNumber = csvRowNumber;
    this.tripId = tripId;
  }
}
Each Notice needs a class-level comment of the following form:
/**
 * Short text describing the notice in a single line (required).
 *
 * Additional text further describing the notice on new lines (optional).
 */
The short description should generally describe the invalid condition found in the feed (e.g. "A recommended file is missing.") as opposed to describing the expected condition (e.g. "All recommended files should be present.").
Additional text describing the notice is allowed on lines separate from the short description. Markdown syntax is allowed and should be used instead of Javadoc syntax (e.g. prefer `value` over {@code value}). Unfortunately, our code formatter will still try to enforce Javadoc formatting to a certain extent, so if you need more complex Markdown formatting (e.g. a list or table), wrap it in a <pre> block:
/**
 * Invalid data.
 *
 * Here's a list of things to check:
 * <pre>
 * - a
 * - b
 * - c
 * </pre>
 */
If you would like to link to the GTFS reference or best-practices in a generic way, add a documentation reference to the @GtfsValidationNotice so that documentation can be generated in a consistent way.
Each field of the Notice must also be documented with a field-level /** comment */ (note: not // comment).
gtfs-validator tests use JUnit 4 and Google Truth.
Generally, you'll want to add test cases to make sure a notice isn't being generated for good data (check for false positive) and that a notice is being generated for bad data (check for false negative).
The following sections describe implementing tests for validators that extend either FileValidator or SingleEntityValidator.
tl;dr - The core of the test will end up looking like this:
public class TripUsabilityValidatorTest {
  ...
  @Test
  public void tripServingMoreThanOneStopShouldNotGenerateNotice() {
    assertThat(
            generateNotices(
                ImmutableList.of(
                    createTrip(1, "route id value", "service id value", "t0"),
                    createTrip(3, "route id value", "service id value", "t1")),
                ImmutableList.of(
                    createStopTime(0, "t0", "s0", 2),
                    createStopTime(2, "t0", "s1", 3),
                    createStopTime(0, "t1", "s3", 5),
                    createStopTime(2, "t1", "s4", 9))))
        .isEmpty();
  }

  @Test
  public void tripServingOneStopShouldGenerateNotice() {
    assertThat(
            generateNotices(
                ImmutableList.of(
                    createTrip(1, "route id value", "service id value", "t0"),
                    createTrip(3, "route id value", "service id value", "t1")),
                ImmutableList.of(
                    createStopTime(0, "t0", "s0", 2),
                    createStopTime(0, "t1", "s3", 5),
                    createStopTime(2, "t1", "s4", 9))))
        .containsExactly(new UnusableTripNotice(1, "t0"));
  }

  private static List<ValidationNotice> generateNotices(
      List<GtfsTrip> trips, List<GtfsStopTime> stopTimes) {
    NoticeContainer noticeContainer = new NoticeContainer();
    new TripUsabilityValidator(
            GtfsTripTableContainer.forEntities(trips, noticeContainer),
            GtfsStopTimeTableContainer.forEntities(stopTimes, noticeContainer))
        .validate(noticeContainer);
    return noticeContainer.getValidationNotices();
  }
}
See a full example here.
1️⃣ Create a NoticeContainer and the relevant GtfsTableContainers
NoticeContainer noticeContainer = new NoticeContainer();
GtfsTripTableContainer tripTable =
    GtfsTripTableContainer.forEntities(
        ImmutableList.of(
            createTrip(1, "route id value", "service id value", "t0"),
            createTrip(3, "route id value", "service id value", "t1")),
        noticeContainer);
GtfsStopTimeTableContainer stopTimeTable =
    GtfsStopTimeTableContainer.forEntities(
        ImmutableList.of(
            createStopTime(0, "t0", "s0", 2),
            createStopTime(2, "t0", "s1", 3),
            createStopTime(0, "t1", "s3", 5),
            createStopTime(2, "t1", "s4", 9)),
        noticeContainer);
2️⃣ Create an instance of the validator to test, passing the containers to its constructor
TripUsabilityValidator tripUsabilityValidator =
    new TripUsabilityValidator(tripTable, stopTimeTable);
3️⃣ Execute the validator's validate() method
tripUsabilityValidator.validate(noticeContainer);
4️⃣ Verify the content of the NoticeContainer.
assertThat(noticeContainer.getValidationNotices()).isEmpty();
Similar to the rule itself, implementing a unit test for a SingleEntityValidator is slightly different.
tl;dr - here's the core of what it looks like:
public class FeedServiceDateValidatorTest {
  ...
  @Test
  public void noStartDateShouldGenerateNotice() {
    assertThat(
            generateNotices(
                createFeedInfo(
                    1, "name value", "url value", Locale.CANADA, null, GtfsDate.fromEpochDay(450))))
        .containsExactly(new MissingFeedInfoDateNotice(1, "feed_start_date"));
  }

  @Test
  public void bothDatesCanBeBlank() {
    assertThat(
            generateNotices(
                createFeedInfo(
                    1, "name value", "https://www.mobilitydata.org", Locale.CANADA, null, null)))
        .isEmpty();
  }
  ...
  private static List<ValidationNotice> generateNotices(GtfsFeedInfo feedInfo) {
    NoticeContainer noticeContainer = new NoticeContainer();
    new FeedServiceDateValidator().validate(feedInfo, noticeContainer);
    return noticeContainer.getValidationNotices();
  }

  public static GtfsFeedInfo createFeedInfo(
      int csvRowNumber,
      String feedPublisherName,
      String feedPublisherUrl,
      Locale feedLang,
      GtfsDate feedStartDate,
      GtfsDate feedEndDate) {
    return new GtfsFeedInfo.Builder()
        .setCsvRowNumber(csvRowNumber)
        .setFeedPublisherName(feedPublisherName)
        .setFeedPublisherUrl(feedPublisherUrl)
        .setFeedLang(feedLang)
        .setDefaultLang(null)
        .setFeedStartDate(feedStartDate)
        .setFeedEndDate(feedEndDate)
        .setFeedVersion(null)
        .setFeedContactEmail(null)
        .setFeedContactUrl(null)
        .build();
  }
}
See a full example here.
1️⃣ Create a GtfsEntity via a helper method:
public static GtfsFeedInfo createFeedInfo(
    int csvRowNumber,
    String feedPublisherName,
    String feedPublisherUrl,
    Locale feedLang,
    GtfsDate feedStartDate,
    GtfsDate feedEndDate) {
  return new GtfsFeedInfo.Builder()
      .setCsvRowNumber(csvRowNumber)
      .setFeedPublisherName(feedPublisherName)
      .setFeedPublisherUrl(feedPublisherUrl)
      .setFeedLang(feedLang)
      .setDefaultLang(null)
      .setFeedStartDate(feedStartDate)
      .setFeedEndDate(feedEndDate)
      .setFeedVersion(null)
      .setFeedContactEmail(null)
      .setFeedContactUrl(null)
      .build();
}
2️⃣ Create a NoticeContainer:
NoticeContainer noticeContainer = new NoticeContainer();
3️⃣ Execute the validator on the previously defined parameters (GtfsEntity and NoticeContainer).
new FeedServiceDateValidator().validate(feedInfo, noticeContainer);
4️⃣ Verify the content of NoticeContainer:
@Test
public void noStartDateShouldGenerateNotice() {
  assertThat(
          generateNotices(
              createFeedInfo(
                  1, "name value", "url value", Locale.CANADA, null, GtfsDate.fromEpochDay(450))))
      .containsExactly(new MissingFeedInfoDateNotice(1, "feed_start_date"));
}