mozilla-services / autopush
Python Web Push Server used by Mozilla
Home Page: https://autopush.readthedocs.io/
License: Mozilla Public License 2.0
Add a regular, daily check of the iOS feedback service to pull IDs that need to be removed from APNs.
On connect, we should sendClose any prior client connection with the same UAID.
We should also schedule a separate task that sweeps through all the connections and initiates a server-side ping for any connection whose last ping was more than N minutes/seconds ago.
OR: we turn on autobahn's WebSocket auto-ping feature.
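The sweep option above can be sketched as follows. The connection map, function names, and the 5-minute threshold are illustrative assumptions, not autopush's actual bookkeeping:

```python
import time

# Sketch of the sweep task: walk all tracked connections and return the
# UAIDs whose last ping is older than MAX_IDLE seconds, so the server
# can initiate a server-side ping for them.
MAX_IDLE = 300  # 5 minutes, illustrative

def sweep(connections, now=None):
    """connections maps uaid -> last ping timestamp (seconds)."""
    now = time.time() if now is None else now
    return [uaid for uaid, last_ping in connections.items()
            if now - last_ping > MAX_IDLE]
```

For example, `sweep({"a": 0, "b": 990}, now=1000)` returns `["a"]`: only "a" has been idle longer than the threshold.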
We're cancelling already-cancelled deferreds; these should be caught more gracefully, and all deferreds should include a trap for CancelledError.
While working on a different project with a different service, it occurred to me that we have a lot of moving parts. Some services exist in a "dev", "stage", "prod", or other such environment that is isolated from the others. This may be fine, or it may cause unusual issues due to information isolation (e.g. state existing in "stage" will not exist in "dev", so changes may not be properly reflected in all calls).
It would be useful for data responses to return the environment they are in as an optional component. This way, systems that may be consuming or processing results can easily confirm the source of the data that they are using and not accidentally "cross the streams".
thoughts?
I want to use Sourcegraph for autopush code search, browsing, and usage examples. Can an admin enable Sourcegraph for this repository? Just go to https://sourcegraph.com/github.com/mozilla-services/autopush. (It should only take 30 seconds.)
Thank you!
Idle TCP connections continue to slowly grow. We should cull old connections at new device registrations.
All deferreds generated in a client should be tracked so they can be cancelled if a client suddenly disconnects.
@jrconlin already fixed the endpoint to use `ap_settings`; this should be fixed in websocket.py's handler as well.
Add load testing component for potential continuous deployment.
There are a number of conflicting package versions between requirements.txt and setup.py.
Both autopush/autoendpoint should have a /status that returns OK.
Several parts of the code catch and retry provisioned throughput exceptions. These parts of the code should emit metrics indicating that, along with context, e.g. 'error.provisioned.new_connection', 'error.provisioned.store_notification'.
The Web Push spec defines a TTL header for app servers to specify message retention duration. Setting TTL: 0 indicates a message is ephemeral, and can be dropped immediately if the client is disconnected. This feature was requested by the Loop team to match the current Loop Push behavior, so let's add it to Simple Push.
DynamoDB doesn't support TTLs, but we can work around this by storing the expiration time and dropping expired messages when the client reconnects. We can also specify a TTL header in the response with the actual time-to-live; so, if an app server sends too many updates, we can indicate we're dropping messages via TTL: 0.
The Web Push spec also says that an omitted TTL is equivalent to TTL: 0. This could be a back-compat issue for us, since the current behavior is to store messages if the client is disconnected. OTOH, I like that storing messages requires an explicit opt-in, so it'd be nice to make this change while we have few users.
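A minimal sketch of the storage workaround above, treating an omitted TTL as TTL: 0 per the spec note; the "expiry" field name and helper names are hypothetical:

```python
import time

# Sketch of the DynamoDB workaround: store an absolute expiry per message
# and filter out expired messages when the client reconnects.
def store_expiry(ttl_seconds, now=None):
    """Absolute expiry time, or None for TTL: 0 (don't store at all)."""
    now = time.time() if now is None else now
    return None if ttl_seconds == 0 else now + ttl_seconds

def live_messages(messages, now=None):
    """Messages that have not yet expired."""
    now = time.time() if now is None else now
    return [m for m in messages if m["expiry"] > now]
```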
SimplePushServerProtocol only tags emitted metrics with the user agent. It'd be helpful to include the uaid_hash and remote-ip, too, like we do for AutoendpointHandler. The former would help with triaging Bugzilla tickets; the latter with tracking down connection counts per client.
Also, some other metrics we could collect: whether the mobilenetwork field is present in the client hello, indicating the client is on a cellular network (if we wanted, we could break this down by carrier, too: mobilenetwork["mcc"] and mobilenetwork["mnc"]), and bad_message counts.
Every user timeout for an HTTP connection to the endpoint results in a logged Sentry error; these shouldn't be logged, as user timeouts are normal.
_check_router should take the result of the pinger.register call, along with the bool, but it doesn't take a result or check it.
https://github.com/mozilla-services/autopush/blob/master/autopush/websocket.py#L174
For releng testing
A logging addition has most likely resulted in the increased 5xx rate on the endpoint server.
We should pong no more than once every 5 seconds; however, our pong delay is a hard-coded value that doesn't consider client latencies. Instead, we should track the last time we saw a ping: if that was more than 5 seconds ago, respond immediately; otherwise, subtract the elapsed time from 10 (to ensure the client doesn't time out waiting for our pong).
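The proposed pong throttling can be sketched as follows (the function name and the interpretation of "subtract the value from 10" are assumptions):

```python
# Sketch of the adaptive pong scheme: respond immediately if the last
# ping was over 5 seconds ago; otherwise delay so pongs stay spaced out
# without letting the client time out waiting.
def pong_delay(last_ping, now):
    """Seconds to wait before sending the pong."""
    elapsed = now - last_ping
    if elapsed > 5:
        return 0           # ping was long ago: pong immediately
    return 10 - elapsed    # recent ping: subtract elapsed from 10
```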
Meta bug for dealing with detecting and preventing system abuse.
Abusive behaviors may include:
Client bug: https://bugzilla.mozilla.org/show_bug.cgi?id=1157696
Now that @bbangert's fantastic routing refactor has landed, we can add a separate bridge for the TEF UDP wake-up platform. Unlike our other bridges, this one only sends a signal for the client to reconnect if it's offline.
It works like this: the client sends its wake-up parameters in the hello message:
{"messageType":"hello","uaid":"","channelIDs":[],"wakeup_hostport":{"ip":"127.0.0.1","port":65535},"mobilenetwork":{"mcc":"001","mnc":"01","netid":"001-01.default"}}
We'll store these params for the wake-up request. If we see a client isn't online, we can just store the message and send the ping afterward. The client will get that message the next time it reconnects.
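A small sketch, using the hello message above, of pulling out the parameters the server would store; the helper name is hypothetical:

```python
import json

# Sketch: extracting the wake-up parameters from the hello message so
# they can be stored for a later UDP wake-up request.
HELLO = json.loads(
    '{"messageType":"hello","uaid":"","channelIDs":[],'
    '"wakeup_hostport":{"ip":"127.0.0.1","port":65535},'
    '"mobilenetwork":{"mcc":"001","mnc":"01","netid":"001-01.default"}}'
)

def wakeup_params(msg):
    """Return the (hostport, network) pair to store, or (None, None)."""
    return msg.get("wakeup_hostport"), msg.get("mobilenetwork")
```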
Add logic for the server to send pings to the client. Send "ping_interval": _seconds_ as part of the "hello" response. Consider:
SimplePushServerProtocol should be renamed to PushServerProtocol as it's not just SimplePush. All tests/etc. should be updated appropriately.
This is blocked by #57.
The router/storage table names used must be configurable, as 'router' and 'storage' could cause name conflicts.
After #57 lands: the number of notifications per fetch is hardcoded to 10; this should be configurable via a CLI option.
Exceptions should be logged out using Sentry.
Modify the architecture to store individual messages with payload
To retain efficient message delivery and storage, a separate router and db table is needed for storing individual messages. Our current router table has some space to store arbitrary additional data, but in this case we need to know in advance whether the channel exists or not, to avoid storing lots of arbitrary data that may be for expired/unregistered channels.
This first version will have a new table, keyed by hash/range key: uaid-channel_id / timestamp
Using a channel_id of "" combined with uaid will return a structure with a 'channels' field that has all registered channels.
After the router lookup indicates to use individual message delivery, the individual message router will verify the channel id is valid, and deliver/store it as needed.
The router type will be stored for the connection, so several methods can act appropriately based on whether the prior simplepush style delivery is used, or webpush style. The following methods will need to be modified to toggle implementation based on router_type:
A new table for individual message storage will be added, along with all the registered channels for this user.
A new 'webpush router' for webpush-style data retention and routing will be added, storing individual messages instead of collapsing by version.
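A sketch of the key layout described above; the ":" separator and attribute names are assumptions for illustration:

```python
import time

# Sketch of the proposed message-table keys: the hash key combines uaid
# and channel_id, and the range key is a timestamp.
def message_key(uaid, channel_id, timestamp=None):
    return {
        "uaid_chid": "%s:%s" % (uaid, channel_id),  # hash key
        "timestamp": int(time.time()) if timestamp is None else timestamp,
    }

def all_channels_key(uaid):
    # A channel_id of "" addresses the record holding the user's
    # full 'channels' list.
    return "%s:" % uaid
```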
The endpoint should log more details on every hit, using the structured log output that is setup, to track:
Sphinx docs for a full doc site for autopush should be added. The restful docs for the registration endpoint should be switched to the sphinx http restful plugin for nice HTTP docs.
Retain connect info from clients, and add IWakeup interface and TEF wake-up protocol, deploy to staging for TEF testing.
Look into switching twisted's SSL for the Python 3 SSL backport to save ~ 10kb per connection
[what is our current total per connection size?]
For context: https://bugzilla.mozilla.org/show_bug.cgi?id=1152264
In #78, we introduced an adaptive response delay for clients that ping too frequently. Unfortunately, folks are still reporting high battery and data usage, even with the fix in place. This is caused by a bug in the client's adaptive ping logic, and affects all FxOS 2.x releases (1.x is unaffected because it didn't ship with adaptive pings).
Any client can potentially enter this state (especially those on unreliable networks), and there's no recovery apart from manually resetting the prefs. The client patch is in place, but has not yet been uplifted.
On our end, we can detect when clients enter a ping loop, and send a special WebSocket close code (4774). This is normally used for UDP wake-up: if the client detects this code, it won't reconnect, as it expects the server to wake it up for incoming notifications.
The trade-off is that phones on non-TEF networks won't receive any push notifications until their network status changes—either they lose reception and reconnect, or their phone switches between cellular and Wi-Fi. (TEF has their own UDP wake-up platform, so we can actually make this work for them). But it's a small price to pay for battery life and reasonable data usage.
A vague plan: when we detect a client in a ping loop, self.sendClose(code=4774).
Expose the autobahn max connection count as a config option so we can cut off excess connections before the server overloads.
If the server is overloaded, it'll be helpful to have a way to tell the client to go away and reconnect later. https://bugzilla.mozilla.org/show_bug.cgi?id=1184278 tracks the client work to support this.
I'll need example stage and prod configs. If they are close to the same, one will suffice.
Two blocking operations hit the thread-pool in the endpoint: decryption, and internal delivery to a connection node. We should switch to using twisted's HTTP client to reduce use of the thread-pool.
http://twisted.readthedocs.org/en/latest/web/howto/client.html
pip missing from pypy tarball:
$ make build
make: *** No rule to make target `/Users/rpappalardo/Dropbox/git/services-test/build/autopush/pypy/bin/pypy', needed by `/Users/rpappalardo/Dropbox/git/services-test/build/autopush/pypy/bin/pip'. Stop.
Right now the ping interval is hard-coded; we need to make it an option so that we can lower it temporarily to fix clients that lowered their value too far.
Right now each prop ping's code is its own module, but the prop ping isn't very modular inside endpoint.py.
For each prop ping, we should probably have a mapping:
proprietary code | should_store | func
gcm              | false        | gcm.some_func
tef              | true         | tef.other_func
indicating whether we should store the message and/or attempt local delivery, or pass it.
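The mapping could look something like this sketch; PropPing, the registry, and the route functions are hypothetical stand-ins for the real gcm/tef bridge modules:

```python
from collections import namedtuple

# Sketch of the proposed prop-ping registry: each proprietary ping type
# declares whether the message should also be stored, plus its routing
# function.
PropPing = namedtuple("PropPing", ["should_store", "func"])

def gcm_some_func(notification):
    return "gcm:%s" % notification

def tef_other_func(notification):
    return "tef:%s" % notification

PROP_PINGS = {
    "gcm": PropPing(should_store=False, func=gcm_some_func),
    "tef": PropPing(should_store=True, func=tef_other_func),
}

def route(kind, notification):
    """Look up the prop ping entry and dispatch the notification."""
    entry = PROP_PINGS[kind]
    return entry.should_store, entry.func(notification)
```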
On startup, autopush/autoendpoint should do a preliminary write/read from both DynamoDB tables to ensure they have appropriate permissions.
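A sketch of that self-check, assuming a generic table object with put/get/delete methods; the real implementation would target the boto DynamoDB router and storage tables:

```python
# Sketch of the startup self-check: write a sentinel item, read it back,
# then delete it, proving the process has read/write permissions.
SENTINEL = {"uaid": "startup-check", "value": "ok"}

def check_table(table):
    """True if the table allows a full write/read/delete cycle."""
    table.put(SENTINEL)
    item = table.get("startup-check")
    table.delete("startup-check")
    return item == SENTINEL
```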
The current Dockerfile builds just the project, to run either autopush or autoendpoint.
It'd be useful for rel-eng to have a single Docker image that can be started and 'does it all'. My thought is to add another Dockerfile (Docker now lets you have additional Dockerfiles and specify the name when building), and have it spin up moto, autopush, and autoendpoint, with auto* using the moto daemon for AWS instead of actual AWS.
The /status endpoint is a good starting point. I'd also like an endpoint that does a deeper check. Off the top of my head:
Right now we have clients that connect and never say hello. We only detect this 5 minutes in, with the autoPing. We should set a 20-second timer to remove these errant connections earlier.
If a notification is directly delivered but not ack'd, and the client drops, we drop the notification entirely. Per the todo in websocket.py:
# TODO: Any notifications directly delivered but not ack'd need
# to be punted to an endpoint router
We should add this code so that the connection node fires off a notification delivery to the router to redeliver these un-ack'd messages.