
mozilla-services / autopush


Python Web Push Server used by Mozilla

Home Page: https://autopush.readthedocs.io/

License: Mozilla Public License 2.0

Python 99.45% Makefile 0.16% Nix 0.07% Lua 0.25% Shell 0.02% Dockerfile 0.05%
mozilla python services-engineering-team webpush

autopush's People

Contributors

alex, alexcrichton, azuremarker, bbangert, crodjer, fzzzy, jonathanpmartins, jrconlin, marco-c, mozilla-github-standards, oremj, pjenvey, psiinon, pyup-bot, tarekziade, tublitzed, wh0

autopush's Issues

Bugfix: Disconnect duplicate UAIDs, sweep all connections for excessive idle time

On connect, we should sendClose any prior client connection with the same UAID.

We should also schedule a separate task that sweeps through all the connections and, if a connection's last ping was more than N minutes/seconds ago, initiates a server-side ping.

Alternatively, we turn on autobahn's WebSocket auto-ping feature.
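A minimal sketch of the first two ideas, assuming an in-memory registry keyed by UAID. `ConnectionRegistry` and its method names are hypothetical and not part of autopush; the caller would be responsible for calling sendClose on the prior connection and for pinging the idle ones.

```python
import time


class ConnectionRegistry:
    """Tracks one live connection per UAID (hypothetical helper)."""

    def __init__(self, max_idle_seconds, clock=time.monotonic):
        self.max_idle_seconds = max_idle_seconds
        self.clock = clock
        self._by_uaid = {}  # uaid -> (connection, time of last ping)

    def connect(self, uaid, connection):
        """Register a connection; return the prior one for this UAID, if any."""
        prior = self._by_uaid.get(uaid)
        self._by_uaid[uaid] = (connection, self.clock())
        return prior[0] if prior else None

    def saw_ping(self, uaid):
        """Record that the client for this UAID just pinged."""
        if uaid in self._by_uaid:
            connection, _ = self._by_uaid[uaid]
            self._by_uaid[uaid] = (connection, self.clock())

    def idle_connections(self):
        """Return connections whose last ping is older than the idle limit."""
        now = self.clock()
        return [connection for connection, last in self._by_uaid.values()
                if now - last > self.max_idle_seconds]
```

A periodic task could call `idle_connections()` and ping each result, while `connect()` hands back the duplicate that should be closed.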

Fix cancelled deferred error

We're cancelling already-cancelled deferreds. These should be caught more gracefully, and all deferreds should include a trap for Cancelled.

Services should self identify

While working on a different project with a different service, it occurred to me that we have a lot of moving parts. Some services exist in a "dev", "stage", "prod" or other such environment that is isolated from the others. This may be fine, or it may cause unusual issues due to information isolation (e.g. state existing in "stage" will not exist in "dev", so changes may not be properly reflected in all calls).

It would be useful for data responses to return the environment they are in as an optional component. This way, systems that consume or process results can easily confirm the source of the data they are using and not accidentally "cross the streams".

Thoughts?

Add provisioned error metrics.

Several parts of the code catch and retry provisioned throughput exceptions. Each of these should emit a metric recording the exception, named for the context in which it occurred.

E.g. 'error.provisioned.new_connection', 'error.provisioned.store_notification'.
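One way this could look, as a sketch only: a retry helper that bumps a counter named after the context on every throughput failure. `ProvisionedThroughputExceeded` is a stand-in exception class, and `retry_with_metric` and the `Counter`-based metrics sink are assumptions for illustration, not autopush code.

```python
from collections import Counter


class ProvisionedThroughputExceeded(Exception):
    """Stand-in for the storage backend's throughput exception (assumption)."""


def retry_with_metric(func, metrics, context, attempts=3):
    """Call func, retrying on throughput errors and counting each failure."""
    for attempt in range(attempts):
        try:
            return func()
        except ProvisionedThroughputExceeded:
            # Metric name follows the 'error.provisioned.<context>' pattern.
            metrics["error.provisioned." + context] += 1
            if attempt == attempts - 1:
                raise  # out of retries; let the caller handle it
```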

Add a TTL header for incoming messages

The Web Push spec defines a TTL header for app servers to specify message retention duration. Setting TTL: 0 indicates a message is ephemeral, and can be dropped immediately if the client is disconnected. This feature was requested by the Loop team to match the current Loop Push behavior, so let's add it to Simple Push.

DynamoDB doesn't support TTLs, but we can work around this by storing the expiration time and dropping expired messages when the client reconnects. We can also specify a TTL header in the response with the actual time-to-live...so, if an app server sends too many updates, we can indicate we're dropping messages via TTL: 0.

The Web Push spec also says that an omitted TTL is equivalent to TTL: 0. This could be a back-compat issue for us, since the current behavior is to store messages if the client is disconnected. OTOH, I like that storing messages requires an explicit opt-in, so it'd be nice to make this change while we have few users.
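The workaround described above (store an expiration time, drop expired messages on reconnect) might be sketched like this. The function names and the `expires` field are hypothetical, and the sketch takes the stricter reading where an omitted TTL header is treated as TTL: 0.

```python
import time


def expiry_for(ttl_header, now=None):
    """Return the absolute expiry time, or None if the message is ephemeral."""
    now = time.time() if now is None else now
    # Assumption: an omitted TTL header is equivalent to TTL: 0.
    ttl = int(ttl_header) if ttl_header is not None else 0
    return now + ttl if ttl > 0 else None


def drop_expired(stored, now=None):
    """Keep only stored messages whose expiry time is still in the future."""
    now = time.time() if now is None else now
    return [message for message in stored if message["expires"] > now]
```

A message for which `expiry_for` returns None would be delivered only if the client is connected, and never written to storage.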

Tag WebSocket server metrics with the device ID

SimplePushServerProtocol only tags emitted metrics with the user agent. It'd be helpful to include the uaid_hash and remote-ip, too, like we do for AutoendpointHandler. The former would help with triaging Bugzilla tickets; the latter with tracking down connection counts per client.

Also, some other metrics we could collect:

  • Whether a mobilenetwork field is present in the client hello, indicating the client is on a cellular network. If we wanted, we could break this down by carrier, too—mobilenetwork["mcc"] and mobilenetwork["mnc"].
  • Malformed and incomplete messages that call bad_message.
  • Whether the client was already connected (break down by local vs. remote).

Silence timeouts in endpoint

Every user timeout for an HTTP connection to the endpoint results in a logged Sentry error. These shouldn't be logged, as user timeouts are normal.

Adaptive pong delay

We should pong no more than once every 5 seconds. However, our pong delay is a hard-coded value that doesn't account for client latencies. Instead, we should track the last time we saw a ping: if it was more than 5 seconds ago, respond immediately; otherwise, delay the pong by 10 seconds minus the elapsed time (to ensure the client doesn't time out waiting for our pong).
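Read literally, the rule reduces to a small function. This is a sketch under that reading; `pong_delay` and its parameter names are made up for illustration, with the 5 and 10 second values taken from the description.

```python
def pong_delay(seconds_since_last_ping, min_interval=5.0, client_timeout=10.0):
    """Return how long to wait before ponging, in seconds."""
    if seconds_since_last_ping > min_interval:
        return 0.0  # the client is pinging slowly enough; answer right away
    # Otherwise wait out the remainder of the client's timeout window.
    return client_timeout - seconds_since_last_ping
```

For example, a ping arriving 2 seconds after the previous one would be answered after 8 seconds, while one arriving 7 seconds later would be answered immediately.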

Abuse Mitigation Bug

Meta bug for dealing with detecting and preventing system abuse.

Abusive behaviors may include:

  • Excessive channel registrations
  • Excessive UAID registrations
  • Excessive posts to invalid or inactive UAID/Channels

Add a UDP wake-up bridge

Client bug: https://bugzilla.mozilla.org/show_bug.cgi?id=1157696

Now that @bbangert's fantastic routing refactor has landed, we can add a separate bridge for the TEF UDP wake-up platform. Unlike our other bridges, this one only sends a signal for the client to reconnect if it's offline.

It works like this:

  • When the client connects to the push server, it opens a UDP port and includes the IP, port, and carrier info in the handshake: {"messageType":"hello","uaid":"","channelIDs":[],"wakeup_hostport":{"ip":"127.0.0.1","port":65535},"mobilenetwork":{"mcc":"001","mnc":"01","netid":"001-01.default"}}. We'll store these params for the wake-up request.
  • If the server supports UDP wake-up, and the connection idles for 10 seconds, the server closes the WebSocket with a special close code (4774). The client won't reconnect when it receives this close code.
  • When the push server wants to wake up a client, it sends an HTTP request to the TEF wake-up server (authenticated with a TLS client cert issued by TEF).
  • The TEF wake-up server sends a UDP packet to the client, which prompts the client to reconnect to the push server.

If we see a client isn't online, we can just store the message, and send the ping afterward. The client will get that message the next time it reconnects.
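As an illustration of the first step, here is a sketch that pulls the wake-up parameters out of a hello message shaped like the example above. `wakeup_params` and the flat dictionary it returns are assumptions, not the server's actual storage format.

```python
import json


def wakeup_params(hello_text):
    """Extract UDP wake-up info from a hello message, or return None."""
    message = json.loads(hello_text)
    if message.get("messageType") != "hello":
        return None
    hostport = message.get("wakeup_hostport")
    network = message.get("mobilenetwork")
    if not hostport or not network:
        return None  # client did not offer UDP wake-up
    return {
        "ip": hostport["ip"],
        "port": hostport["port"],
        "mcc": network["mcc"],
        "mnc": network["mnc"],
        "netid": network.get("netid"),
    }
```

A client that omits either field is simply treated as not supporting wake-up, so the server would keep its WebSocket open as usual.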

Switch to server driven pings

Add logic for server to send pings to client.

  1. include "ping_interval": <seconds> as part of "hello" response
  2. include logic for websocket connector to send pings at a given interval.
  3. if interval needs to change, send a "long form" ping message containing the new ping_interval.

consider:

  • On idle disconnect, note remote IP.
  • Add logic to allow for IP block based ping interval specifications.
  • Add logic to dynamically adjust ping delays.

Add WebPush style individual message delivery w/data

Modify the architecture to store individual messages with payload

Implementation Plan

To retain efficient message delivery and storage, a separate router and db table is needed for storing individual messages. Our current router table has some space to store arbitrary additional data, but in this case we need to know in advance whether the channel exists or not, to avoid storing lots of arbitrary data that may be for expired/unregistered channels.

This first version will have a new table, keyed by hash/range key: uaid-channel_id / timestamp

Using a channel_id of "" combined with uaid will return a structure with a 'channels' field that has all registered channels.

After the router lookup indicates to use individual message delivery, the individual message router will verify the channel id is valid, and deliver/store it as needed.

Websocket Changes

The router type will be stored for the connection, so several methods can act appropriately based on whether the prior simplepush style delivery is used, or webpush style. The following methods will need to be modified to toggle implementation based on router_type:

  • register
  • unregister
  • process_notifications
  • ack

Database Changes & Additions

A new table will be added for storing individual messages, along with all the registered channels for the user.

Router

A new 'webpush router' for webpush style data retention and routing will be added. It stores individual messages, instead of collapsing by version.

Add sphinx docs for autopush

Sphinx docs for a full doc site for autopush should be added. The restful docs for the registration endpoint should be switched to the sphinx http restful plugin for nice HTTP docs.

Send a special WebSocket close code for clients that ping too frequently

For context: https://bugzilla.mozilla.org/show_bug.cgi?id=1152264

In #78, we introduced an adaptive response delay for clients that ping too frequently. Unfortunately, folks are still reporting high battery and data usage, even with the fix in place. This is caused by a bug in the client's adaptive ping logic, and affects all FxOS 2.x releases (1.x is unaffected because it didn't ship with adaptive pings).

Any client can potentially enter this state (especially those on unreliable networks), and there's no recovery apart from manually resetting the prefs. The client patch is in place, but has not yet been uplifted.

On our end, we can detect when clients enter a ping loop, and send a special WebSocket close code (4774). This is normally used for UDP wake-up: if the client detects this code, it won't reconnect, as it expects the server to wake it up for incoming notifications.

The trade-off is that phones on non-TEF networks won't receive any push notifications until their network status changes—either they lose reception and reconnect, or their phone switches between cellular and Wi-Fi. (TEF has their own UDP wake-up platform, so we can actually make this work for them). But it's a small price to pay for battery life and reasonable data usage.

A vague plan:

  • Remove the adaptive response delay, and disconnect clients that ping too frequently.
  • On disconnect, store the connection lifespan in DynamoDB. I think this calls for a weighted moving average, to minimize the impact on well-behaved clients that happen to be on spotty networks.
  • When the client reconnects, look up its average connection lifespan. If it drops below a threshold (15 seconds?), flush any pending messages, then close its connection via self.sendClose(code=4774).
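The averaging step could be sketched as follows. This uses an exponentially weighted moving average, which is one possible choice of weighted average and not necessarily what the plan intends; the function names and the 0.3 weight are assumptions, and the 15 second threshold is the tentative value from the list above.

```python
def update_average(previous_average, lifespan, weight=0.3):
    """Fold a new connection lifespan (seconds) into the running average."""
    if previous_average is None:
        return float(lifespan)  # first observation for this client
    return weight * lifespan + (1 - weight) * previous_average


def should_close_with_4774(average, threshold=15.0):
    """True if the client's average lifespan suggests a ping loop."""
    return average is not None and average < threshold
```

A lower weight makes a single short connection on a spotty network count for less, at the cost of reacting more slowly to a client that really is stuck in a loop.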

Add max connections

Expose the autobahn max connection as a config option so we can cut off excess connections before the server overloads.

make build doesn't work with OSX pypy binary

pip missing from pypy tarball
    $ make build
    make: *** No rule to make target `/Users/rpappalardo/Dropbox/git/services-test/build/autopush/pypy/bin/pypy', needed by `/Users/rpappalardo/Dropbox/git/services-test/build/autopush/pypy/bin/pip'. Stop.

Add option to lower min ping interval

Right now the ping interval is hard-coded. We need to make it an option so that we can lower it temporarily to fix clients that lowered their value too far.

Modularize prop ping further

Right now each prop ping's code is its own module, but the prop ping isn't very modular inside endpoint.py.

For each prop ping, we should probably have a mapping:

proprietary code | should_store | func
gcm                false          gcm.some_func
tef                true           tef.other_func

Indicating whether we should store the message and/or attempt local delivery, or pass it.
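The mapping above might be expressed as a plain dictionary. This is only a sketch: `send_gcm` and `send_tef` are placeholders for the `gcm.some_func` and `tef.other_func` entries in the table, and `Bridge`, `BRIDGES`, and `route` are hypothetical names.

```python
from collections import namedtuple

Bridge = namedtuple("Bridge", ["should_store", "func"])


def send_gcm(message):
    """Placeholder for gcm.some_func."""
    return ("gcm", message)


def send_tef(message):
    """Placeholder for tef.other_func."""
    return ("tef", message)


BRIDGES = {
    "gcm": Bridge(should_store=False, func=send_gcm),
    "tef": Bridge(should_store=True, func=send_tef),
}


def route(kind, message, store):
    """Store the message if the bridge asks for it, then hand it off."""
    bridge = BRIDGES[kind]
    if bridge.should_store:
        store.append(message)
    return bridge.func(message)
```

With this shape, endpoint.py would only need the lookup in `route`, and adding a bridge would mean adding one dictionary entry.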

Startup check of backend

On startup, autopush/autoendpoint should do a preliminary write/read from both DynamoDB tables to ensure they have appropriate permissions.

Setup docker build file that runs it all

The current Dockerfile builds just the project, to run either autopush or autoendpoint.

It'd be useful for rel-eng to have a single docker that can be started and 'does it all'. My thought is to add another dockerfile (docker now lets you have additional docker files and specify the name when building), and have it spin up moto, autopush, and autoendpoint, with auto* using the moto daemon for AWS instead of actual AWS.

Add /status/health or /health

The /status endpoint is a good starting point. I'd also like an endpoint that does a deeper check. Off the top of my head:

  • make sure dynamodb is working

Add timer on connect for hello

Right now we have clients that connect, and never say hello. We only detect this 5 mins in with the autoPing. We should set a timer for 20 seconds to remove these errant connections earlier.

Bug: Send un-ackd direct delivery notifications to storage

If a notification is directly delivered but not ack'd, and the client drops, we drop the notification entirely. Per the todo in websocket.py:

    # TODO: Any notifications directly delivered but not ack'd need
    # to be punted to an endpoint router

We should add this code so that the connection node fires off a notification delivery to the router to redeliver these un-ack'd messages.
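A rough sketch of the bookkeeping this implies, assuming notifications can be identified by channel ID and version. `DirectDeliveryTracker` and the `redeliver` callback are hypothetical; in practice the callback would hand the notification back to the router.

```python
class DirectDeliveryTracker:
    """Remembers directly delivered notifications until they are ack'd."""

    def __init__(self):
        self._unacked = {}  # (channel_id, version) -> notification

    def delivered(self, channel_id, version, notification):
        self._unacked[(channel_id, version)] = notification

    def ack(self, channel_id, version):
        # Unknown acks are ignored, not treated as errors.
        self._unacked.pop((channel_id, version), None)

    def on_disconnect(self, redeliver):
        """Pass every un-ack'd notification to redeliver; return the count."""
        pending = list(self._unacked.values())
        self._unacked.clear()
        for notification in pending:
            redeliver(notification)
        return len(pending)
```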
