I want to add an amen to what Jehiah's said. Especially two points:
(1) SDP or GTFS doesn't matter. We're flexible and can surely handle either one. The important thing is that the data is kept up-to-date. Is the SDP data provided to Trips123 up-to-date? If so, maybe they can share that feed publicly.
(2) In addition to schedule and fare data, historical performance data would be excellent. We could alert riders to the on-time rate of the service they're riding. Motivated data miners could also do some interesting analysis of bottlenecks in the system, which might in turn be useful to the MTA.
I also want to emphasize that fare data is important to us at Routefriend. I know many developers are interested only in schedule data.
I agree, a lot of the discussion has been about schedule data because
it's sort of a starting point, but I agree that it's not the only
data we (as the public) are interested in. I won't pretend to speak
for everyone though, I'll just speak for myself.
SDP vs GTFS. I think GTFS is a good starting point as many tools are
moving that way, but I don't see the two as mutually exclusive. If
there is data in SDP format and not in GTFS format, then I'd say
publish that; simply put: publish whatever format the data is
currently in for starters. (Side note: I am assuming SDP is the
Schedule Data Profile referenced at http://j.mp/H2Mz and, if so, it
would be good to publish the XML schema as well so that the data is
truly open.)
If the MTA can convert data and publish it in both formats, it should
be published in both; if there are limitations or any data is dropped
in conversion (say complex fare rules) then it certainly should
be published in SDP format while the GTFS fare specifications mature.
other data sets:
headways: if the MTA schedules trains/buses based on headways, I think
that's what the schedule data should reflect; GTFS has a notion of
frequency-based service, which seems appropriate.
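To make the headway point concrete, here is a minimal sketch (Python)
of how an app might expand GTFS frequencies.txt rows into departure
times. The field names (trip_id, start_time, end_time, headway_secs)
come from the GTFS spec; the sample rows and the trip_id value are
made up for illustration and are not MTA data. It also assumes the
last departure falls before end_time.

```python
import csv
import io

# Hypothetical sample in frequencies.txt layout; not real MTA data.
SAMPLE = """trip_id,start_time,end_time,headway_secs
4_train_weekday,07:00:00,07:15:00,180
4_train_weekday,22:00:00,22:30:00,600
"""


def to_seconds(hhmmss):
    """Convert a GTFS HH:MM:SS string to seconds (hours may exceed 24)."""
    hours, minutes, seconds = (int(part) for part in hhmmss.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def to_hhmmss(total):
    """Convert seconds back to a GTFS-style HH:MM:SS string."""
    return "%02d:%02d:%02d" % (total // 3600, total % 3600 // 60, total % 60)


def expand_departures(frequencies_csv):
    """Map each trip_id to the departure times implied by its headways."""
    departures = {}
    for row in csv.DictReader(io.StringIO(frequencies_csv)):
        start = to_seconds(row["start_time"])
        end = to_seconds(row["end_time"])
        headway = int(row["headway_secs"])
        # Assumption: end_time is exclusive, so no departure at end_time.
        times = [to_hhmmss(t) for t in range(start, end, headway)]
        departures.setdefault(row["trip_id"], []).extend(times)
    return departures


departures = expand_departures(SAMPLE)
for trip_id, times in departures.items():
    print(trip_id, times)
```

The same feed can carry 3 minute headways at rush hour and 10 minute
headways at night without listing every individual trip.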
realtime data: there is existing data in use on LIRR and several
subway lines for delay information in an electronic format; it's
displayed on electronic signs and should be made electronically
available. (for example: current txt/email alerts only cover large
delays for lirr even though many stations have electronic displays
with more granular data).
historical performance data: I know data is collected (sometimes in
electronic format) for each scheduled trip (especially on the numbered
subway lines and LIRR) that captures at what time it arrives at each
point along the trip. It would be good to publish this data where
available on a day, week, or month delay. (I have seen a chart of
this for the 4, 5, 6 subway lines so I know it exists.)
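As a rough sketch of what developers could do with that kind of
per-trip arrival data, here is one way (Python) to compute an on-time
rate per line. The record layout, the sample values, and the 5 minute
on-time threshold are all assumptions for illustration; I don't know
what format the MTA actually stores this in.

```python
from collections import defaultdict

# Assumption: an arrival counts as on time if it is at most 5 minutes late.
ON_TIME_THRESHOLD_SECS = 300

# Hypothetical records: (line, trip, stop, scheduled_secs, actual_secs).
RECORDS = [
    ("4", "trip1", "Grand Central", 25200, 25260),
    ("4", "trip1", "Union Sq", 25500, 25900),
    ("4", "trip2", "Grand Central", 25380, 25380),
    ("4", "trip2", "Union Sq", 25680, 25800),
    ("LIRR", "trip9", "Jamaica", 30000, 30000),
    ("LIRR", "trip9", "Penn Station", 31200, 31800),
]


def on_time_rate(records, threshold=ON_TIME_THRESHOLD_SECS):
    """Return the fraction of on-time arrivals for each line."""
    totals = defaultdict(int)
    on_time = defaultdict(int)
    for line, _trip, _stop, scheduled, actual in records:
        totals[line] += 1
        if actual - scheduled <= threshold:
            on_time[line] += 1
    return {line: on_time[line] / totals[line] for line in totals}


rates = on_time_rate(RECORDS)
for line, rate in sorted(rates.items()):
    print("%s: %.0f%% on time" % (line, rate * 100))
```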
anonymized or aggregate farebox data: I know I have seen some charts
and visualizations of this, but I don't know where it's published or
accessed from; this should be published online along with other data.
On Wed, Oct 7, 2009 at 1:07 PM, Nicholas Bergson-Shilcock
<[address removed]> wrote:
> Hey all,
> In preparation for writing up some more advocacy and guidance documents
> on open transit data, I thought I'd throw out a few questions to the
> group to get people's feelings about what's most important.
> Feel free to weigh in on any/all of the following:
> First, we've mostly talked about schedule and fare data so far, but
> are there other categories of data that folks are really interested in?
> (Obviously real-time location data would be awesome, but that just
> doesn't exist yet for most of the system so it's not really much of an
> option; it'd be nice though to have access to the limited amount of
> real-time info that *does* currently exist.)
> Second, in terms of formats, it's my understanding that the MTA
> internally uses a format called SDP in part because it handles fare
> structures that GTFS doesn't. My thinking is that it'd be best to have
> both GTFS and SDP available. What do you all think?
> Lastly, I've learned that for some of the MTA agencies (e.g., NYCT),
> fixed schedules aren't necessarily the best option since they run more
> in terms of frequency. Do people care more about having (relatively)
> accurate schedules or very accurate headways (e.g., 4-trains at time X
> have 3 minute headways)?