Comments (17)
After talking with someone who has more experience managing .protos long-term, I suggest that we deprecate expired experimental fields using `[deprecated=true]`, so the field would look like:

`EXPERIMENTAL_FIELD_THAT_NOBODY_WANTED = 1 [deprecated=true];`
Here's why:
- It would allow any existing consumers or producers that have implemented support to leave their code as-is. If we used `reserved` instead, consumers and producers would need to remove implemented code, as the field would no longer be available the next time the stub code was generated by the protocol buffer compiler. With `[deprecated=true]`, the stub code may flag the field as deprecated in code annotations, but it would still be usable.
- It allows easier discovery of past experimental fields when someone wants to implement something similar (again) in the future.
The above would also allow us to easily "un-deprecate" a field. For example, let's say that the 2-year window expired for an experimental field, and several consumers implemented support but no one is publishing the data yet. The field would get marked as `[deprecated=true]` in the .proto, but early-adopting consumers could leave the code that supports the feature in their products. Then, if a publisher started publishing the data, we could re-adopt the field easily, again without the early-adopting consumers needing to change anything. And then, having the requisite producer/consumer, we could vote to officially adopt the field.
The primary downside to using `[deprecated=true]` instead of `reserved` is that the .proto would become more cluttered over time as old expired experimental fields pile up. But I think the benefits above outweigh the negatives.
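As a side-by-side sketch (the enum names here are hypothetical, purely for illustration), the two approaches look like this:

```proto
// Illustrative sketch only - enum names are hypothetical.
syntax = "proto2";

// Option A: deprecate in place. The value remains in the schema and
// generated code may carry a deprecation annotation, but it still works.
enum DeprecatedStyle {
  EXPERIMENTAL_FIELD_THAT_NOBODY_WANTED = 1 [deprecated = true];
}

// Option B: reserve the number. The value disappears from generated
// stubs, and protoc rejects any future attempt to reuse the number.
enum ReservedStyle {
  UNSET = 0;
  reserved 1;
  reserved "EXPERIMENTAL_FIELD_THAT_NOBODY_WANTED";
}
```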
Thoughts?
from transit.
@barbeau I removed it because I wanted to clean up the timeline of your PR but that didn't work and couldn't revert. That's why it's not showing up. 😬
I would make the time window for experimental fields 1 year. Hopefully most fields should take less time than that. Nothing else significant to add. A strong majority vote still makes sense to me.
Currently, the GTFS-realtime change process doesn't require producers and consumers for adoption of proposed fields as experimental OR production fields.
@gcamp As part of these changes, what would you think of introducing a requirement for an actual producer and consumer before an experimental field could be officially adopted as a production field? This would align GTFS-realtime with GTFS.
@barbeau I would agree in principle, but in practice we've had big problems integrating producers into the discussion on Github. I've had that happen with two producers who wanted changes to the spec but didn't follow along on Github or put energy into it (#87 and #111).
Maybe that's a symptom of a bigger problem with the producers we are working with, so I would agree.
@gcamp Ok, thanks. One other question here - after the expiration window passes, and assuming an experimental field doesn't get adopted, how exactly does it get deprecated?
So, given a field like:

`EXPERIMENTAL_FIELD_THAT_NOBODY_WANTED = 1`

...it looks like our options are, in increasing order of severity:

1. `EXPERIMENTAL_FIELD_THAT_NOBODY_WANTED = 1 [deprecated=true];` - Protobuf docs say "code may be annotated as `@Deprecated` by pb compilers". I interpret this as just being a hint to the tools, but in theory the field could still be used.
2. `reserved 1;` - Protobuf docs say "The protocol buffer compiler will complain if any future users try to use these field identifiers." Is this a compiler build failure? How exactly does it "complain"?
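For what it's worth, my understanding (worth verifying against a current protoc) is that "complain" means a hard compile error, not a warning. A sketch of the situation (names hypothetical):

```proto
// Illustrative sketch - enum and value names are hypothetical.
syntax = "proto2";

enum Example {
  UNSET = 0;
  reserved 1;
  // Uncommenting the next line should make protoc fail the build with an
  // error along the lines of "uses reserved number 1" - i.e. a compile
  // error, not merely an annotation in the generated code:
  // SOME_FUTURE_VALUE = 1;
}
```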
Any thoughts based on experience with protobufs are welcome!
I would prefer option 1, because as much as some fields might not turn out to be necessary, it would be nice to keep the option of reverting `EXPERIMENTAL_FIELD_THAT_NOBODY_WANTED` to actual use.
I agree with @barbeau that we should require at least one producer and one consumer for any new feature before it is formally adopted. This is what keeps the spec compact and focused on the concrete domain of passenger information, and prevents ideas from rushing ahead of implementations.
I can imagine, as @gcamp said, that it's difficult to get producers to engage in discussion on Github. But I don't see how this affects the requirement. Requiring real-world implementation before formally modifying the spec seems to be a separate matter from including the producer in the design discussion.
I am also in favor of leaving fields in the experimental phase for a significant period of time until we can judge whether they are being adopted by multiple parties.
I have no preference on the experimental field deprecation mechanism. It seems to me that both methods would allow the field to be re-activated in a future version of the spec in the event that the abandoned field is revived a few years later.
Sadly I can't provide any specific suggestions on how to deal with the process, but from my viewpoint the main problem with producers is that they're mostly busy keeping their live feeds running. I believe that in most agencies the feeds are maintained by one or two people whose main responsibilities are completely different. In my case it's solely a free-time activity - I was interested in eventually implementing #87, but it was only last week that we were finally ready to publish our data to the world, since communicating with the internal bus tracking system is not the easiest.

It means that it's quite demanding to implement experimental changes in the feeds, keep track of changes during the approval process, and distribute the extension proto files to the consumers. Another problem is finding consumers who are actually willing to work with the protobuf format; for many developers interested only in data from one agency, it's much easier to process the plain JSON they're used to. Extensions or experimental fields just bring complications, and not only for them.

So basically I guess that our only consumers of the protobuf format would be Google Maps and eventually a few more services with worldwide coverage, who can also take quite some time to implement changes.
Thanks for your comments @jakluk. I do see what you and @gcamp mean about the difficulty of getting producers to contribute to a discussion on the specification or modify their feeds.
However, I don't really understand how lack of time or resources on the producer side constitutes an argument for or against the requirement to have at least one working producer and consumer before changes to the spec are finalized. It seems only tangentially related to me.
If someone wants the spec to forge ahead quickly with ways to represent all sorts of situations, ahead of people actually implementing them, then they might find that the one-producer-one-consumer requirement slows down progress. But this is precisely the goal of the requirement - to prevent the spec from evolving at a faster rate than existing practice. It is common for people and organizations to generate ideas much faster than they allocate resources to actually understand and test them. The world already has plenty of exhaustive, voluminous data models that no one uses. GTFS benefits from avoiding that path.
@abyrd My experience so far with producers has been that they are willing to implement something only once it's accepted in the spec, even if it's something they themselves are asking for. Basically their time is so scarce (as @jakluk said) that they don't want to spend it implementing something that might be rejected later on.
On top of that, the complexity of maintaining two .proto files simultaneously is quite high, especially for consumers but also for producers: you need one .proto file to handle compliant GTFS-rt and a second one for experimental GTFS-rt.
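To illustrate (this is a hypothetical sketch, not a real proposal - the field name is made up), the "second .proto" typically imports the official one and hangs experimental data off the extension ranges that gtfs-realtime.proto reserves for this purpose:

```proto
// Hypothetical experimental extension - field name is illustrative only.
syntax = "proto2";

import "gtfs-realtime.proto";

package transit_realtime;

// The official gtfs-realtime.proto declares "extensions 1000 to 1999;"
// in its messages for exactly this kind of use.
extend TripDescriptor {
  optional string experimental_field = 1001;
}
```

Both producers and consumers then need this second file alongside the official one whenever the experimental data is involved.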
I imagine that's probably the case for most producers - they can't pay external contractors or vendors to build things that may be thrown away. It's completely legitimate for them to wait. But there are a few large or forward thinking agencies with their own internal technical capacity, who often write their own GTFS export or processing code. These agencies already add nonstandard fields to their feeds and push for changes to the spec, and GTFS was basically created and popularized by such agencies.
We won't have that much trouble getting a useful extension added to a feed from TriMet, the Netherlands, or maybe NYC or Boston (among other regions). It might take six months or even longer, but prototyping, testing, and revising a new proposal has a cost (in both effort and money), and until someone decides to take on that cost I think it's prudent to consider proposals as provisional.
Ideally the reference implementations that justify inclusion in the spec should be open source, which would cut the cost of follow-up implementation by other parties.
Here @mbta, we have some internal extensions, but we only publish them in JSON formats. While the space savings for protobuf files are substantial, there's added complexity to make sure that all clients who want a new field are using the custom `.proto` definition. It's also hard to make changes to an existing field in a protobuf, even an existing experimental field.
I don't have any good ideas here, unfortunately. Maybe a requirement to have a working producer and a consumer but not necessarily as a part of a protobuf? We're much more likely to experiment with a field in our JSON data than as something to add to our PB files.
@paulswartz Just to clarify, the current GTFS-realtime change process encourages producers/consumers to add new fields to the main .proto instead of implementing custom extensions, as long as the field seems generally useful, because of implementation issues. Just wanted to make sure that was clear for anyone following the discussion.
Proposal for defining GTFS-realtime experimental field consensus process based on this discussion is at #126.
We can tackle the limited window on experimental fields separately.
> Here @mbta, we have some internal extensions, but we only publish them in JSON formats. While the space savings for protobuf files are substantial, there's added complexity to make sure that all clients who want a new field are using the custom `.proto` definition. It's also hard to make changes to an existing field in a protobuf, even an existing experimental field.
This is an interesting point. Protocol buffers are an appealing technology for their compactness, the ability to generate bindings in multiple languages, etc. But if they're getting in the way of people adding fields and evolving the spec, then maybe they're not the best choice. General-purpose compression like gzip is remarkably efficient on text containing lots of repetitive keys and punctuation, like JSON. There is a significant additional cost in expanding the data into a giant string and parsing that string, but I'm not sure how often GTFS-RT needs to be consumed directly by low-powered embedded systems.
> I don't have any good ideas here, unfortunately. Maybe a requirement to have a working producer and a consumer but not necessarily as a part of a protobuf? We're much more likely to experiment with a field in our JSON data than as something to add to our PB files.
GTFS-RT does not currently mention or allow replacing Protobuf with alternative serialization formats, so realtime data in a non-protobuf format is not GTFS-RT. It is exactly this extra effort of implementing and adopting a field within Protobuf-based GTFS-RT - which is admittedly more complicated - that we would be waiting for before the specification is expanded to include a new field.
If implementation is so prohibitive that no one is willing to do it, then either:
a) New features are not important or useful enough for anyone to invest in creating and maintaining them, so should not exist; or
b) The Protobuf system is too inflexible for an evolving specification so GTFS-RT should be changed to use some more flexible text-based format (as GTFS-static does); or
c) We need to reassure producers by pre-approving new fields, stating that they will definitely become part of the spec as soon as they are implemented.
The third option might be useful: rather than voting to add an experimental field and voting again to add it to the spec once it's implemented, we could vote only once to include the experimental field in the spec automatically on some future date, conditional on it being implemented by a specific producer and consumer who have volunteered to do so. It would be up to those producers and consumers to coordinate among themselves to get the job done by the deadline, at which point the experimental field would automatically be part of the spec. This lessens the risk for the implementers.
Proposal for producer/consumer requirement, limited time for experimental fields opened at #140.
#140 is now approved and merged, and #126 was previously approved and merged, so this should cover the topics discussed in this issue. Thanks all!