Duplicate/almost identical trips in GTFS Bus Data from Feb, 2025


elif ensari

May 13, 2026, 10:19:33 AM
to mtadeveloperresources
Hi, 
We are working with the GTFS bus schedules from February 2025 and are finding several Weekday trip ids representing identical or very similar trips. 

Why do these trips repeat, and is there an easy way to clean these up based on trip id or other data? Currently, we drop duplicates based on first departure, last arrival, number of stops, direction and route.
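A minimal sketch of the duplicate check described above, assuming the rows come from the standard GTFS `trips.txt` and `stop_times.txt` files (for example via `csv.DictReader`). The function name `trip_signatures` and the sample rows are hypothetical. It groups trip ids by signature rather than dropping them, so the matches can be inspected first.

```python
from collections import defaultdict

def trip_signatures(stop_times_rows, trips_rows):
    """Group trip_ids by (route, direction, first departure, last arrival, stop count)."""
    # Collect (stop_sequence, arrival, departure) per trip.
    by_trip = defaultdict(list)
    for row in stop_times_rows:
        by_trip[row["trip_id"]].append(
            (int(row["stop_sequence"]), row["arrival_time"], row["departure_time"])
        )
    groups = defaultdict(list)
    for trip in trips_rows:
        stops = sorted(by_trip.get(trip["trip_id"], []))
        if not stops:
            continue  # trip has no stop times, nothing to compare
        key = (
            trip["route_id"],
            trip["direction_id"],
            stops[0][2],   # first departure
            stops[-1][1],  # last arrival
            len(stops),    # number of stops
        )
        groups[key].append(trip["trip_id"])
    return groups

# Made-up rows for illustration only, not taken from the feed.
trips = [
    {"trip_id": "T1", "route_id": "B74", "direction_id": "0"},
    {"trip_id": "T2", "route_id": "B74", "direction_id": "0"},
    {"trip_id": "T3", "route_id": "B74", "direction_id": "1"},
]
stop_times = [
    {"trip_id": "T1", "stop_sequence": "1", "arrival_time": "14:25:00", "departure_time": "14:25:00"},
    {"trip_id": "T1", "stop_sequence": "2", "arrival_time": "14:31:00", "departure_time": "14:31:00"},
    {"trip_id": "T2", "stop_sequence": "1", "arrival_time": "14:25:00", "departure_time": "14:25:00"},
    {"trip_id": "T2", "stop_sequence": "2", "arrival_time": "14:31:00", "departure_time": "14:31:00"},
    {"trip_id": "T3", "stop_sequence": "1", "arrival_time": "14:40:00", "departure_time": "14:40:00"},
    {"trip_id": "T3", "stop_sequence": "2", "arrival_time": "14:46:00", "departure_time": "14:46:00"},
]
groups = trip_signatures(stop_times, trips)
duplicates = {key: ids for key, ids in groups.items() if len(ids) > 1}
print(duplicates)
```

Note that this key does not look at intermediate stop times, so two trips that differ only in the middle of the route would still be grouped together.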

Some examples of identical trips:
B74:
UP_A5-Weekday-SDon-086500_B74_605, UP_A5-Weekday-SDon-086500_B6_204, UP_A5-Weekday-SDon-086500_B6_281, UP_A5-Weekday-SDon-086500_B6_206

and 

Q22:
41914478-FRPA5-FR_A5-Weekday-10-SDon, 41914490-FRPA5-FR_A5-Weekday-10-SDon, 41914500-FRPA5-FR_A5-Weekday-10-SDon, 41914714-FRPA5-FR_A5-Weekday-10-SDon


There are trips where a majority of stop times are identical but there are a few seconds of difference for one or more stop times. Example trip ids:

B8 - 50th stop time shifts by 11 seconds:
JG_A5-Weekday-147400_B8_150 and JG_A5-Weekday-SDon-147400_B8_150

Q54 - all stop times shift by 1:50 after the 7th stop:
FP_A5-Weekday-121700_Q54_725 and FP_A5-Weekday-SDon-121700_Q58_880
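One way to pin down shifts like these is to compare the two trips stop by stop. The sketch below assumes each trip has already been reduced to a `{stop_sequence: arrival_time}` mapping. The helper names and the sample times are hypothetical and not taken from the feed.

```python
def to_seconds(hhmmss):
    """Convert a GTFS HH:MM:SS time (hours may exceed 23) to seconds."""
    hours, minutes, seconds = (int(part) for part in hhmmss.split(":"))
    return hours * 3600 + minutes * 60 + seconds

def stop_time_shifts(times_a, times_b):
    """Return {stop_sequence: seconds} for stops where trip B differs from trip A."""
    shifts = {}
    # Only stop sequences present in both trips can be compared.
    for seq in sorted(set(times_a) & set(times_b)):
        delta = to_seconds(times_b[seq]) - to_seconds(times_a[seq])
        if delta != 0:
            shifts[seq] = delta
    return shifts

# Made-up times for illustration only.
trip_a = {1: "08:00:00", 2: "08:02:30", 3: "08:05:00"}
trip_b = {1: "08:00:00", 2: "08:02:41", 3: "08:05:00"}
print(stop_time_shifts(trip_a, trip_b))  # {2: 11}
```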


Thank you



Stephen Bauman

May 13, 2026, 1:03:00 PM
to mtadeveloperresources
What's the date on the datasets you are working with? I've archived one set that was posted on the website on 2024-12-31 and another on 2025-02-15. Both archives are the complete 6 sets for the boroughs and MTA Bus.

This may not apply to your study, but you have to be careful when combining the datasets into a single one. I've noticed location discrepancies for the same bus stop between different datasets, so I've had to modify the stop id to include the dataset. This is probably a bigger problem in Queens, which is served by both NYCT and MTA Bus. As per the GTFS spec, IDs have to be unique (have unique properties) within the same GTFS. There is no guarantee among different GTFS schedules. I've been burned in the past.
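A rough sketch of that workaround, assuming each dataset's `stops.txt` has been read into a list of dicts. The feed names, the `feed:stop_id` prefix format, the tolerance, and the coordinates are all made up for illustration.

```python
def conflicting_stops(feeds, tolerance=1e-5):
    """Return stop_ids whose coordinates differ between feeds by more than tolerance."""
    seen = {}
    conflicts = set()
    for rows in feeds.values():
        for row in rows:
            pos = (float(row["stop_lat"]), float(row["stop_lon"]))
            first = seen.setdefault(row["stop_id"], pos)
            if abs(first[0] - pos[0]) > tolerance or abs(first[1] - pos[1]) > tolerance:
                conflicts.add(row["stop_id"])
    return conflicts

def namespace_stops(feeds):
    """Merge stops from several feeds, prefixing each stop_id with its feed name."""
    merged = []
    for feed_name, rows in feeds.items():
        for row in rows:
            new_row = dict(row)  # copy so the input rows stay untouched
            new_row["stop_id"] = f"{feed_name}:{row['stop_id']}"
            merged.append(new_row)
    return merged

# Hypothetical feeds with the same stop_id at slightly different locations.
feeds = {
    "queens": [{"stop_id": "500001", "stop_lat": "40.70000", "stop_lon": "-73.80000"}],
    "mta_bus": [{"stop_id": "500001", "stop_lat": "40.70050", "stop_lon": "-73.80000"}],
}
print(conflicting_stops(feeds))
print([row["stop_id"] for row in namespace_stops(feeds)])
```

The same prefix would also have to be applied to `stop_id` in each dataset's `stop_times.txt` so the references still resolve.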

Steve

Jayden Lin

May 13, 2026, 6:58:58 PM
to mtadeveloperresources
(If the name seems familiar: I took another look at the trips and found the actual reason why.)

Those identical B74 and Q22 trips are actual separate bus trips, all run at the same time. They are what the MTA refers to as school trippers, and they only run on school days. There are 4 of each because 4 buses start from the first stop, which is a school. They are therefore not duplicate trips but separate trips, all run at the same time.

In both the B8 and Q54 examples, the two trips seem the same but are technically different. The ones with "A5-Weekday" are trips operated when school is closed. They may seem the same, but internally they belong to a different schedule from the ones with "A5-Weekday-SDon", which are trips operated when school is open. Traffic conditions vary depending on whether it's a school day, and the schedule reflects that to some extent; there are also additional trips on school days that do not run when school is closed.

elif ensari

Jun 1, 2026, 3:12:10 PM
to mtadeveloperresources
Thank you Stephen,
Yes, we did find those identical stops with different locations too.

elif ensari

Jun 2, 2026, 2:43:16 PM
to mtadeveloperresources
I have one follow up question to clarify: 

If I want to take into account all the trips on a school day, should I only be looking at trips with an -SDon- tag? Or should I keep the unique trips (those that don't have identical routes and stop times) BUT get rid of those that are identical to a trip with an "-SDon-" tag but don't have "-SDon-" in their trip id?

Thank you 

Jayden Lin

Jun 3, 2026, 7:43:02 AM
to mtadeveloperresources
For school days, the most effective way is to use all SDon trips and not use the trips without SDon. On non-school days, the most effective way is to not use trips with SDon and use only those that just say Weekday.
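That rule could be sketched as below. It assumes the `SDon` and `Weekday` markers always appear as separate tokens between `-` or `_` characters, which holds for the trip ids quoted in this thread but has not been checked against the whole feed. The function names are hypothetical.

```python
import re

def id_tokens(trip_id):
    """Split a trip id on '-' and '_' so markers can be matched as whole tokens."""
    return re.split(r"[-_]", trip_id)

def is_school_day_trip(trip_id):
    """True if the trip id carries the SDon (school open) marker."""
    return "SDon" in id_tokens(trip_id)

def weekday_trips(trip_ids, school_open):
    """Keep Weekday trips that match the requested school-open/closed state."""
    return [
        trip_id
        for trip_id in trip_ids
        if "Weekday" in id_tokens(trip_id)
        and is_school_day_trip(trip_id) == school_open
    ]

# Trip ids quoted earlier in this thread.
ids = [
    "JG_A5-Weekday-147400_B8_150",
    "JG_A5-Weekday-SDon-147400_B8_150",
    "41914478-FRPA5-FR_A5-Weekday-10-SDon",
]
print(weekday_trips(ids, school_open=True))
print(weekday_trips(ids, school_open=False))
```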

Stephen Bauman

Jun 3, 2026, 11:02:28 PM
to mtadeveloperresources
There is a recent bus schedule dataset that might provide a definitive answer. It's on the NYS Open Data website.

It's:
I have not verified whether the Service ID column is the same as that found in the GTFS feed. This is the Data Dictionary explanation:

Name of service running for the day. The format of the codes varies between NYCT and MTA Bus routes. Both include the two-character depot code, the pick name, the last digit of the year, the type of day (Weekday/Saturday/Sunday), and if school is open, also SDon

There is also a Trip Type column that specifically identifies the following types:

The type of bus trip: 
1 = Normal revenue trip 
2 = Pull-out from depot 
3 = Pull-in to depot 
4 = Deadhead 
10 = School service – Limited (School service is service on a variant path associated with a school.) 
11 = School Service - Normal 
12 = Limited-stop service 
13 = Express service 
14 = Select Bus Service

There's also the School column:

Whether the trip is made while school is open or closed. This value is blank for non-revenue and weekend and holiday trips.

There should be enough information regarding school schedules. If the service id column agrees with the GTFS schedule, then you should be able to reconcile the school runs.
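A quick way to test that agreement might look like the sketch below. The header strings `"Service ID"` and `"Trip Type"` are assumptions based on the column names mentioned above; the actual CSV headers should be checked. The sample service ids are made up for illustration.

```python
# Trip Type codes for school service, from the list above.
SCHOOL_TRIP_TYPES = {10, 11}

def compare_service_ids(gtfs_trips, open_data_rows, column="Service ID"):
    """Compare service ids in GTFS trips.txt rows against the Open Data rows."""
    gtfs_ids = {row["service_id"] for row in gtfs_trips}
    open_ids = {row[column] for row in open_data_rows}
    return {
        "both": gtfs_ids & open_ids,
        "gtfs_only": gtfs_ids - open_ids,
        "open_data_only": open_ids - gtfs_ids,
    }

def school_service_rows(open_data_rows, column="Trip Type"):
    """Keep Open Data rows whose Trip Type is a school service code."""
    return [row for row in open_data_rows if int(row[column]) in SCHOOL_TRIP_TYPES]

# Made-up rows for illustration only.
gtfs_trips = [
    {"trip_id": "t1", "service_id": "JG_A5-Weekday"},
    {"trip_id": "t2", "service_id": "JG_A5-Weekday-SDon"},
]
open_data = [
    {"Service ID": "JG_A5-Weekday-SDon", "Trip Type": "11"},
    {"Service ID": "JG_A5-Saturday", "Trip Type": "1"},
]
print(compare_service_ids(gtfs_trips, open_data))
print(school_service_rows(open_data))
```

If `gtfs_only` and `open_data_only` come back mostly empty, the two sources likely share the same service id scheme.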