getlantern / broflake

Interoperable browser-based P2P proxies for censorship circumvention

License: GNU General Public License v3.0


broflake's Introduction

Browsers Unbounded



πŸ’€ Warning

This is prototype-grade software!

❓ What is Browsers Unbounded?

Browsers Unbounded is a system for distributed peer-to-peer proxying. The Browsers Unbounded system includes a browser-based client which enables volunteers to instantly provide proxying services just by accessing a web page. However, Browsers Unbounded is not just a web application! The Browsers Unbounded system introduces software libraries and protocol concepts designed to enable role-agnostic multi-hop p2p proxying across the entire Lantern network or on behalf of any circumvention tool that chooses to integrate it.

Put another way, Browsers Unbounded is a common language which enables circumvention tool users to describe, exchange, and share the resource of internet access across network boundaries and runtime environments.

πŸ’Ύ System components

[system architecture diagram]

Module     | Description
clientcore | library exposing Browsers Unbounded's high level client API
cmd        | driver code for operationalizing Browsers Unbounded outside of a controlling process
common     | data structures and functionality shared across Browsers Unbounded modules
egress     | egress server
freddie    | discovery, signaling, and matchmaking server
netstate   | network topology observability tool
ui         | embeddable web user interface

▢️ Quickstart for devs

  1. Clone this repo.

  2. Configure Mozilla Firefox to use a local HTTP proxy. In settings, search "proxy". Select Manual proxy configuration. Enter address 127.0.0.1, port 1080, and check the box labeled Also use this proxy for HTTPS.

  3. Build the native binary desktop client: cd cmd && ./build.sh desktop

  4. Build the native binary widget: cd cmd && ./build.sh widget

  5. Build the browser widget: cd cmd && ./build_web.sh

  6. Start Freddie: cd freddie/cmd && PORT=9000 go run main.go

  7. Start the egress server: cd egress/cmd && PORT=8000 go run egress.go

  8. Start a desktop client: cd cmd/dist/bin && FREDDIE=http://localhost:9000 EGRESS=http://localhost:8000 ./desktop

  9. Decision point: do you want to run a native binary widget or a browser widget? To start a native binary widget: cd cmd/dist/bin && FREDDIE=http://localhost:9000 EGRESS=http://localhost:8000 ./widget. Alternatively, to start a browser widget, follow the UI quickstart.

The widget and desktop client find each other via the discovery server, execute a signaling step, and establish several WebRTC connections.

  10. Start Mozilla Firefox. Use the browser as you normally would, visiting all your favorite websites. Your traffic is proxied in a chain: Firefox -> local HTTP proxy -> desktop client -> WebRTC -> widget -> WebSocket -> egress server -> remote HTTP proxy -> the internet.

πŸ•ΈοΈ Observing networks with netstate

The netstate module is a work-in-progress tool for observing Browsers Unbounded networks. netstate currently visualizes network topology, labeling each Browsers Unbounded node with an arbitrary, user-defined "tag" which may be injected at runtime.

netstated is a distributed state machine which collects and processes state changes from Browsers Unbounded clients. It serves a network visualization at GET /. The gv visualizer client looks for a netstated instance at localhost:8080.

In the example below, we assume that Freddie is at http://localhost:9000 and the egress server is at http://localhost:8000:

  1. Start netstated: cd netstate/d && go run netstated.go

  2. Start a widget as user Alice: cd cmd/dist/bin && NETSTATED=http://localhost:8080/exec TAG=Alice FREDDIE=http://localhost:9000 EGRESS=http://localhost:8000 ./widget

  3. Start a desktop client as user Bob: cd cmd/dist/bin && NETSTATED=http://localhost:8080/exec TAG=Bob FREDDIE=http://localhost:9000 EGRESS=http://localhost:8000 ./desktop

  4. Open a web browser and navigate to http://localhost:8080. As Alice and Bob complete the signaling process and establish connection(s) to one another, you should see the network you have created. You must refresh the page to update the visualization.

🎨 UI

[UI system diagram]

UI settings and configuration

The UI is bootstrapped with Create React App, then "re-wired" with rewire to build a single JS bundle entry. The React app binds to a custom <browsers-unbounded> DOM element and renders based on settings passed via its dataset. In development, this HTML can be found in ui/public/index.html. In production, the HTML is supplied by the "embedder" via https://unbounded.lantern.io/embed.

Example production embed:

<browsers-unbounded
   data-layout="banner"
   data-theme="dark"
   data-globe="true"
   data-exit="true"
   style='width: 100%;'
></browsers-unbounded>
<script defer="defer" src="https://embed.lantern.io/static/js/main.js"></script>

This table lists all the available settings that can be passed to the <browsers-unbounded> DOM element via data-* attributes. The "default" column shows the value used when the attribute is not set.

dataset    | description                                                 | default
layout     | string: "banner" or "panel" layout                          | banner
theme      | string: "dark", "light" or "auto" (browser settings) theme  | light
globe      | boolean: include the WebGL globe                            | true
exit       | boolean: include a toast on exit intent                     | true
menu       | boolean: include the menu                                   | true
keep-text  | boolean: include text asking to keep the tab open           | true
mobile-bg  | boolean: run in the background on mobile                    | false
desktop-bg | boolean: run in the background on desktop                   | true
editor     | boolean: include the debug dataset editor                   | false
branding   | boolean: include logos                                      | true
mock       | boolean: use the mock wasm client data                      | false
target     | string: "web", "extension-offscreen" or "extension-popup"   | web

In development, these settings can be customized using REACT_APP_* environment variables, either in your .env file or in your terminal. For example, to run the widget in "panel" layout, you can run REACT_APP_LAYOUT=panel yarn start. To run the widget with mock data, you can run REACT_APP_MOCK=true yarn start.

Settings can also be passed to the widget via the data-* attributes in ui/public/index.html. For example, to run the widget in "panel" layout, you can set data-layout="panel" in ui/public/index.html.

If you enable the editor (by setting REACT_APP_EDITOR=true or data-editor="true"), you can also edit the settings dynamically in the browser using a UI editor that renders above the widget. Note that the mock and target settings are not dynamic and therefore not editable in the browser. These two settings are static and must be set at the time the wasm interface is initialized.

Links:

Github pages sandbox

Browsers Unbounded website

UI quickstart for devs

  1. Work from the ui dir: cd ui
  2. Install deps: yarn

Development:

  1. Copy the example env: cp .env.development.example .env.development
  2. Start the dev server: yarn dev:web and open http://localhost:3000 to view it in the browser.

Production:

  1. Copy the example env: cp .env.production.example .env.production
  2. Build and deploy prod bundle to Github page: yarn deploy

UI deep dive for devs

  1. Work from the ui dir: cd ui

  2. Configure your .env file: cp .env.development.example .env.development

    1. Set REACT_APP_WIDGET_WASM_URL to your intended hosted widget.wasm file. If you are serving it from client in step #8, use http://localhost:9000/widget.wasm. If you ran ./build_web.sh (step #7) you can also use /widget.wasm. To configure for prod, point to a publicly hosted widget.wasm, e.g. https://embed.lantern.io/widget.wasm. If you know you know; if not, you likely want to use /widget.wasm.
    2. Set REACT_APP_GEO_LOOKUP_URL to your intended geo lookup service. Most likely https://geo.getiantem.org/lookup or http://localhost:<PORT>/lookup if testing geo lookups locally
    3. Set REACT_APP_STORAGE_URL to your intended iframe html for local storage of widget state and analytics. Most likely https://embed.lantern.io/storage.html or /storage.html if testing locally
    4. Set any REACT_APP_* variables as needed for your development environment. See UI settings and configuration for more info.
    5. Configure the WASM client endpoints: REACT_APP_DISCOVERY_SRV, REACT_APP_DISCOVERY_ENDPOINT, REACT_APP_EGRESS_ADDR & REACT_APP_EGRESS_ENDPOINT
  3. Install the dependencies: yarn

  4. To start in developer mode with hot-refresh server (degraded performance): run yarn dev:web and visit http://localhost:3000

  5. To build optimized for best performance:

    1. First configure your .env file: cp .env.production.example .env.production (see Step 2)
    2. Run yarn build:web
  6. To serve a build:

    1. Install a simple server e.g. npm install -g serve (or your lightweight http server of choice)
    2. Serve the build dir e.g. cd build && serve -s -l 3000 and visit http://localhost:3000
  7. To deploy to Github pages: yarn deploy

  8. Coming soon to a repo near you: yarn test

Browser extension quickstart for devs

  1. Work from the ui dir: cd ui

  2. Install the dependencies: yarn

  3. Configure your .env file: cd extension && cp .env.example .env

    1. Set EXTENSION_POPUP_URL to your intended hosted popup page. If you are serving it from ui in step #6, use http://localhost:3000/popup. To use prod, set to https://embed.lantern.io/popup.
    2. Set EXTENSION_OFFSCREEN_URL to your intended hosted offscreen page. If you are serving it from ui in step #6, use http://localhost:3000/offscreen. To use prod, set to https://embed.lantern.io/offscreen.
  4. To start in developer mode with hot-refresh server:

yarn dev:ext chrome
yarn dev:ext firefox 

This will compile the extension and output to the ui/extension/dist dir. You can then load the unpacked extension in your browser of choice.

  • For Chrome, go to chrome://extensions and click "Load unpacked" and select the ui/extension/dist/chrome dir.
  • For Firefox, go to about:debugging#/runtime/this-firefox and click "Load Temporary Add-on" and select the ui/extension/dist/firefox/manifest.json file.
  • For Edge, go to edge://extensions and click "Load unpacked" and select the ui/extension/dist/edge dir.
  5. To build for production:
yarn build:ext chrome
yarn build:ext firefox 

This will compile the extension and output a compressed build to the ui/extension/packages dir.

broflake's People

Contributors

noahlevenson, woodybury, myleshorton, forkner, crosse, oxtoacart, anacrolix

Stargazers

みγ₯γͺ γ‚Œγ„ avatar DrWhax avatar  avatar  avatar

Watchers

みづな れい

broflake's Issues

Enable clients to make domain fronted requests to Freddie

Freddie exposes an HTTP interface so that he's accessible in the worst case via domain fronting.

Given our role agnosticism model, peers do not see themselves as strictly "censored" or "free." But there must be some way for them to know when they should use domain fronted requests in their interactions with Freddie, and to actually do the domain fronting.

Deliverables

#21

Broflake: Bad system state after several cancelled requests?

Hunch: the problem is with our local HTTP CONNECT proxy. If this turns out to be true, we should wait until Flashlight integration is complete, and see if the issue persists.

Repro:

  1. Set cTableSize and pTableSize to 1. This is just to make sure we test repeated QUIC connections over a single WebSocket connection.

  2. Start Freddie, the egress server, a desktop client, and a widget.

  3. curl a file that's big enough for you to ctrl-Z the request before it completes. This is easiest for very large files, but if your fingers are fast enough, it's reproducible for small files too. I tested with curl --proxy http://127.0.0.1:1080 https://speed.hetzner.de/1GB.bin --output /dev/null and curl --proxy http://127.0.0.1:1080 https://nypost.com

  4. ctrl-Z the curl to cancel it.

  5. Repeat steps 3 and 4 a few times. After a few cancelled curls, your requests will stop working altogether. You will see the requests arrive at the egress server, but you won't receive any bytes in response. If you try an HTTP (not HTTPS) request, you will observe the egress server report a 200 OK, and it will report that it copied the response to the client with no errors.

  6. Stop the egress server and restart it. In a few moments, your desktop client will realize that its QUIC connection is broken, and it will redial the egress server.

  7. Repeat step 3. Now your requests are working again.

Alternate step 6:

Instead of stopping the egress server, stop your desktop client and restart it. Your desktop client will establish a new QUIC connection over the same WebSocket connection as before. If you now repeat step 3, your requests will be working again. (In a few moments, the egress server will realize that the old QUIC connection has died, and it will log a message indicating that it's closing that connection).

What do we think is going on?

When we enter the bad state, we still see requests arriving at the egress server. This suggests that the upstream route to the egress server is working A-OK. Moreover, when we enter the bad state, if we try an HTTP request, the egress server reports normal system behavior. When in the bad state, if we execute the alternate step 6, the egress server is able to detect that the previous QUIC connection is no longer active, suggesting that the egress server is still in a healthy state. All symptoms suggest that the response is getting blocked somewhere closer to the desktop client. If we can verify that response bytes are round tripping their way into the desktop client, it stands to reason that the local HTTP CONNECT proxy is the culprit.

Set client concurrency at runtime?

There's been a lot of discussion about bandwidth throttling, mostly in the context of protecting users who are on capped data plans (like mobile users).

Bandwidth throttling is being handled in https://github.com/getlantern/product/issues/40. But the problem of correctly setting the client bitrate is closely coupled to a different problem, which is correctly setting the client concurrency.

Consider the following example: The client is configured for a peer concurrency of 10. Bob is a mobile user on a capped data plan. Bob wisely sets his bitrate to 128k/sec to avoid racking up a huge bill. But now each of his connected peers experiences an effective bitrate of 12.8k/sec, which is unacceptably bad.

It's probably more desirable for Bob to help 1 peer well than 10 peers poorly -- so it'd be great if Bob's client could reconfigure itself for a concurrency of 1.

In theory, we can set client concurrency at runtime. If we finish moving client construction logic to the Broflake struct, we should be able to create and destroy new clients with arbitrary concurrency values during program execution.
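
The arithmetic in Bob's example suggests deriving concurrency from the bitrate cap rather than treating them as independent knobs. A minimal sketch of that calculation in Go, using hypothetical names rather than the actual client API:

package main

import "fmt"

// chooseConcurrency is a hypothetical helper: given a user-imposed total
// bitrate cap and a minimum acceptable per-peer bitrate, pick the largest
// concurrency that still serves each connected peer reasonably well.
func chooseConcurrency(totalBps, minPerPeerBps, maxConcurrency int) int {
	c := totalBps / minPerPeerBps
	if c < 1 {
		return 1
	}
	if c > maxConcurrency {
		return maxConcurrency
	}
	return c
}

func main() {
	// Bob caps himself at 128 kbps and wants each peer to see at least 100 kbps:
	// better to help 1 peer well than 10 peers poorly.
	fmt.Println(chooseConcurrency(128_000, 100_000, 10)) // 1
}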

Broflake Regression: repeated HTTP requests return a "context canceled" error

When you proxy your first HTTP request, it will work. But all subsequent HTTP requests will return a context canceled error. This applies only to HTTP (not HTTPS). Something like this used to happen and we fixed it during prototyping. But now it's happening again.

Repro:

  1. Start Freddie, the egress server, a desktop client, and a widget.

  2. Make an HTTP request with cURL: curl --proxy http://127.0.0.1:1080 http://info.cern.ch. It should succeed.

  3. Make another HTTP request with cURL -- it can be to the same host or a different one. It will result in a context canceled error.

  4. Knock yourself out making more HTTP requests with cURL. They will continue to result in context canceled errors.

I discovered this while working on #45, and deep in my bones I feel like the issues are related.

Revisit code organization with respect to the UI

Should the UI code become integrated as part of the Broflake monorepo, or is it best to keep it separate?

Perhaps our build flow considerations should inform our decision.

@woodybury pointed out that the Golang JavaScript glue code (wasm_exec.js) must be patched to bundle it with the UI in the desirable way.

@noahlevenson's prototype-era build scripts very jankily use a static glue code file copied from his GOROOT sometime in late October of 2022. This is definitely not what we want.

Borked client and runaway memory growth

@oxtoacart pushed a branch with a test that breaks the client. The test tries to download a 1GB file from Hetzner via two instances of the desktop client running in parallel, and one of the requesting HTTP clients misbehaves and doesn't read the response.

The typical result is that at least one of the desktop clients gets thrown into a bad state such that it will no longer proxy requests. Often, the widget breaks in unpredictable ways. In my informal tests, the desktop client state has been recoverable, but it can take a very long time. You should also reliably observe runaway memory growth.

Repro:

  1. Fire up Freddie and the egress server.

  2. Start two instances of the desktop client on ports 1080 and 1081: PORT=1080 ./desktop, PORT=1081 ./desktop

  3. Start one instance of the widget: ./widget. Let everyone discover each other.

  4. cd useragent && go run .

Putting this test aside for a moment, in my experience you can sometimes make unpredictable things happen just by trying to curl that Hetzner test file through Broflake under normal circumstances.

Disable mobile widget for launch?

During November's murder board, the group astutely observed that mobile users have some special considerations with respect to bandwidth. Many (most?) mobile users are on capped data plans, and so they must be extra careful not to proxy more traffic than they can afford.

Creating a sane and safe experience for mobile users probably requires a combination of technical features (bandwidth throttling, detecting connection type) and UX features (creating a welcome flow which communicates the situation, providing mobile users with special throttling defaults, etc.).

We don't have time to solve this stuff before launch, so we should probably just disable the widget on mobile devices with a "coming soon" message.

This issue touches at least two other issues:

https://github.com/getlantern/product/issues/40

https://github.com/getlantern/engineering/issues/78

Broflake: Figure out a better strategy for sending to worker channels

The system of channels which comprises the internal plumbing of our client provides a simple contract: Control plane IPC messages are guaranteed to be delivered, but data plane messages may be dropped at the edges (in the workers) if the client is struggling to maintain the data rate.

However, there's an ambiguous case which merits further scrutiny, and it's the case where we send messages to workers from the router in upstreamRouter.toWorker and downstreamRouter.toWorker.

These are currently implemented as blocking sends. But it is theoretically possible for a race to occur where a message is routed to a worker who has entered a different state and is no longer listening to its rx channel. When this happens, the worker's rx buffer will begin to fill up. If the buffer becomes full, the system will deadlock.

We currently rely on the fact that the worker buffers are sufficiently large -- and sending a message to a since-departed worker is a relatively rare situation -- such that rx buffers probably just never fill up.

But we should provide stronger guarantees here.

Potential solutions:

  1. workerFSM structs expose their current state to the outside world as they're executing. If we invent a convention around "active" vs. "inactive" states, the upstreamRouter and downstreamRouter can just avoid sending to a worker that's in an inactive state.

  2. The workerFSM can close its comms channels when it departs the active state and construct new channels whenever it re-enters the active state.

See also: #31
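
A minimal sketch of option 1, with hypothetical names (the real workerFSM and router types differ): the worker publishes an "active" flag, the router checks it before sending, and data plane messages are dropped rather than blocking when the worker isn't listening.

package main

import (
	"fmt"
	"sync/atomic"
)

type worker struct {
	active atomic.Bool // published by the workerFSM as it changes state
	rx     chan string
}

// toWorker drops the message -- consistent with data plane semantics --
// instead of blocking when the worker has left its active state or its
// rx buffer is full.
func toWorker(w *worker, msg string) {
	if !w.active.Load() {
		fmt.Println("worker inactive, dropping:", msg)
		return
	}
	select {
	case w.rx <- msg:
	default:
		fmt.Println("rx buffer full, dropping:", msg)
	}
}

func main() {
	w := &worker{rx: make(chan string, 4)}
	w.active.Store(true)
	toWorker(w, "delivered")
	w.active.Store(false)
	toWorker(w, "dropped")
}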

Document conventions for workers

The workerFSM abstraction is the way we implement protocol logic, and so it's likely we'll be writing and/or modifying workers to accommodate new discovery methods, transports, etc.

Implementers need to know a few things about writing workers, and this should be documented wherever makes the most sense:

Each workerFSM runs on its own goroutine. When you start a workerFSM, the worker's goroutine is created, and the worker begins executing. When you stop a workerFSM, it terminates that worker's goroutine at the conclusion of its currently executing state.

If you put a worker into an infinite loop, you must listen for its context cancellation -- otherwise, your worker will be unable to return from its current state, your worker will not obey calls to stop, and you will leak goroutines. The worker context is passed as the first argument to each FSMstate.

Similarly, if your worker creates child goroutines, it's your responsibility to figure out how to pass the worker context forward and terminate those goroutines when the worker context is cancelled.
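
A minimal sketch of the convention, assuming nothing about the actual workerFSM API beyond what's described above: a long-running state body selects on ctx.Done() so a call to stop can terminate it, and it passes ctx on to any goroutines it spawns.

package main

import (
	"context"
	"fmt"
	"time"
)

// busyState stands in for an FSMstate body that would otherwise loop forever.
func busyState(ctx context.Context) {
	ticker := time.NewTicker(200 * time.Millisecond)
	defer ticker.Stop()

	for {
		select {
		case <-ctx.Done():
			// Obey cancellation: return so the worker goroutine can exit
			// instead of leaking.
			fmt.Println("state cancelled, returning")
			return
		case <-ticker.C:
			// ...one iteration of work...
		}
	}
}

func main() {
	ctx, cancel := context.WithCancel(context.Background())
	go busyState(ctx)
	time.Sleep(time.Second)
	cancel() // analogous to calling stop on the workerFSM
	time.Sleep(100 * time.Millisecond)
}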

Multiple connections to the same peer show up as separate connections in the Broflake Widget UI

The clients are designed to indiscriminately maximize their connection redundancy. This means that in the absence of anyone else to connect to, Bob and Alice will redundantly max out their parallel connections to each other.

IMO, this is a desirable property of the system. In informal tests, it seems that the natural chaos and capacity of the marketplace ensures a relatively intelligent allocation of connections. (If you spin up N browser widgets and 1 desktop client, the desktop client will establish M random connections spread across the N browser widgets via Freddie's natural interleaving of genesis messages in the stream). And we avoid managing state or assigning any notion of peer ID.

In the worst case, under pathological network conditions, peers simply acquire a suboptimal redundancy configuration -- e.g., Bob creates two connections to Alice when he really should have created one connection to Alice and one connection to Charlie. But Bob will naturally correct this condition when/if Alice fails.

However, @woodybury has wisely pointed out that such a suboptimal redundancy configuration has UI implications. In a very small network, where Bob has created 3 parallel connections to Alice, those connections are exposed via our UI bindings as 3 individual peers. It would be much nicer if we could intelligently discern that Bob is actually connected to only 1 peer at that time.

Maybe a solve for the UI that doesn't introduce new state or peer ID: At connection time, could we just pass the IP address of the newly connected communications partner to the consumerConnectionChange event?

Consumers send their IP addresses to producers in a slice of ICE candidates "a la carte" during the signaling process -- so this might require refactoring the workerFSM in producer.go to look for the srflx candidate and propagate it forward to state 5:

https://github.com/getlantern/broflake/blob/7558ddb3cdff18cc750d32a5c9ffdc6165e15b66/client/go/producer.go#L249

IIRC, that's the only place where IP address gets exposed during discovery and signaling (unless we want to get Freddie involved), though I could be forgetting something...
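
For illustration only, here's one way to pluck the srflx address out of a slice of raw ICE candidate strings. It assumes standard SDP candidate formatting ("... <address> <port> typ srflx ..."); the actual code in producer.go works with structured candidate types rather than raw strings.

package main

import (
	"fmt"
	"strings"
)

// srflxAddress returns the connection address of the first server-reflexive
// candidate it finds, which is the public IP we'd want to propagate forward.
func srflxAddress(candidates []string) (string, bool) {
	for _, c := range candidates {
		fields := strings.Fields(c)
		for i, f := range fields {
			if f == "typ" && i+1 < len(fields) && fields[i+1] == "srflx" && i >= 2 {
				// The connection address sits two fields before "typ".
				return fields[i-2], true
			}
		}
	}
	return "", false
}

func main() {
	cands := []string{
		"candidate:1 1 udp 2130706431 192.168.1.10 54321 typ host",
		"candidate:2 1 udp 1694498815 203.0.113.7 61000 typ srflx raddr 192.168.1.10 rport 54321",
	}
	fmt.Println(srflxAddress(cands)) // 203.0.113.7 true
}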

Replace JSON with Protocol Buffers

For the sake of rapid prototyping, clients and Freddie currently speak to each other using JSON as an interchange format. We must use Protocol Buffers instead.

This is pretty important. Since we always planned to swap in Protobufs, you'll notice plenty of JSON serialization/deserialization code that is particularly grotesque and placeholder-y.

Error propagation from workers

While working on #39, an obliquely related problem emerged.

When a worker encounters a problem so un-handleable that a panic seems warranted, the worker clearly should not attempt to continue executing. It therefore has two options:

  1. Reset itself to state 0. (This is currently what workers do under less panicky but still errorish circumstances).

  2. Stop state machine execution completely, as if the controlling process called stop. (This is currently unprecedented -- we do not yet allow workers to stop the state machine).

Option 1 is reasonable and easy, but it raises a question: how should the worker surface the error to any interested processes -- like a logger or the UI?

Workers were designed to be blissfully unaware of their controlling process. Instead of injecting a bunch of references (to UI bindings, a logger, etc.) into each worker, workers communicate with the rest of the system in a decoupled way using IPC messages.

But we can't use IPC messages to propagate errors, because one of the errors is IPC message channel buffer overflow.

Google Cloud Run, scalability, and WebSocket timeouts

As of 01/12/2023, Broflake services are deployed on Google Cloud Run. Google Cloud Run imposes some opinionated restrictions which may be consequential even in the short term:

  1. Google Cloud Run caps the number of concurrent connections at 1000. With a widget upstream concurrency of 5, this means that our egress server has a maximum capacity of 200 users. However, the egress server should be scalable using GCR's autoscaling behavior, since an open WebSocket connection seems to ensure session stickiness with respect to a given container: https://cloud.google.com/run/docs/triggering/websockets#sticky-sessions. The ramifications for Freddie should be considered separately. #9 exists to reduce the number of connections used during signaling, which should really improve things. For now, though, the 1000-connection cap means that in theory Freddie can only handle the negotiation of 500 WebRTC connections simultaneously -- and it gets trickier when you consider that 200 uncensored users advertising all of their connection slots at once would max out GCR's limit, providing no capacity for censored users and essentially deadlocking the discovery process. Unlike the egress server, Freddie is not currently scalable using autoscale, since he doesn't provide a way to share connection state with other container instances.

  2. Google Cloud Run caps the maximum connection timeout at 60 minutes, which means any open WebSocket connection will be terminated after an hour -- so censored peers cannot maintain connections for more than an hour. However, this issue may be downstream of the constraints of reality: it's possible that network churn means the probability of maintaining any single connection for > 60 minutes approaches zero anyway.

Broflake Widget: after toggling off, it takes a very long time before returning to ready state?

Reported by @woodybury during the 01/11/2023 group testing session.

We know that toggling off is nondeterministic, because it's waiting on a sync.WaitGroup that's awaiting the tidying of all your connection slots. Depending on the state of your connections, there might be a lot of stuff to tidy up, or hardly anything to tidy up at all.

But there's definitely an upper bound for how long we expect this to take in the worst case. @woodybury, when you say "very long time," how long are we talking?
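
One way to make that upper bound explicit -- a sketch under the assumption that toggling off really is just a sync.WaitGroup wait, not the actual toggle-off code -- is to race the WaitGroup against a timeout so the UI can at least report when tidying is taking longer than expected:

package main

import (
	"fmt"
	"sync"
	"time"
)

// waitWithTimeout returns true if the WaitGroup quiesced before the deadline.
func waitWithTimeout(wg *sync.WaitGroup, d time.Duration) bool {
	done := make(chan struct{})
	go func() {
		wg.Wait()
		close(done)
	}()
	select {
	case <-done:
		return true
	case <-time.After(d):
		return false
	}
}

func main() {
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		time.Sleep(200 * time.Millisecond) // simulate tidying a connection slot
		wg.Done()
	}()
	fmt.Println("quiesced in time:", waitWithTimeout(&wg, time.Second))
}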


Geolocate at the UI layer instead of the protocol layer

This issue contradicts the work performed in #30

The idea: Since we're only geolocating to make the UX fancy for western users, geolocation should be performed at the UI layer instead of the protocol layer. We shouldn't bake geolocation into our protocol-level data structures and signaling.

The Broflake client should just send each connected consumer's IP address to JavaScript world, where UI logic should figure out how to geolocate it.

This came up during the 12/20 catchup meeting, and it's also mentioned here: https://github.com/getlantern/product/issues/36

Revisit systemwide mux/demux protocol post-MVP

Our system has the internal plumbing required for sophisticated N:M multiplexing, but we've currently turned it off via a demux hack.

In single-hop routing, real muxing between upstream and downstream routers is (theoretically) pretty easy. If route and backRoute are functions of smux client or stream IDs instead of worker IDs, I think things just work.

Multi-hop routing is harder. Each peer in a chain represents another layer of multiplexing. Do we need to implement our own muxing protocol for that? If yes, it might simply be an adaptation of smux or yamux etc...

Test Broflake's "desktop" client variant on Android

So far, the Broflake client codebase has only been tested using the desktop variant of the binary on macOS machines.

We're packaging this project into Flashlight, which means it'll exist in Lantern's Android client automagically as soon as it's included in Flashlight. (No issue, but the PR to make the project into a library to fit Flashlight is here.)

I doubt there'll be a difference in functionality but it's important to test the codebase on Android.

Here're the steps:

  • Compile the code for Android using the same gomobile flags we use to compile Flashlight for Android
  • Make a dummy Android project that embeds the generated .so library
  • Run the same flow in the README
  • See how bad things break down

CC @noahlevenson

Deliverables

#46

Firefox: widget doesn't detect datachannel closure

A new Firefox issue that's not as bad as the old Firefox issue. Repro:

  1. Start Freddie and the egress server.

  2. Start a desktop client with ./desktop.

  3. Load the embeddable widget in Firefox and pop open the console. Let the widget establish all 5 of its parallel connections with the desktop client.

  4. Give the desktop client a Ctrl-Z to abruptly disconnect it.

What we expect: In a few seconds, we expect the widget to learn that the desktop client has disconnected, and we expect to see a bunch of datachannel status changes reported in the Firefox console, followed by state machine activity as the widget tries to find a new connection.

What actually happens: The widget just carries on as if nothing ever happened.

Write the producerJITRouter and deploy it in the widget

The producerJITRouter, like the producerPoolRouter, maps consumers 1:1 to producers, so it's only suitable for managing egress consumers. But unlike the producerPoolRouter, the producerJITRouter creates egress connections "just in time," and tears them down when the associated consumer has departed.

This will require adoption of a new special path assertion character -- perhaps $ -- which indicates that the producer worker will create a circuit "on-demand" whenever the consumer worker indicates that it wants one. It's not yet clear how the consumer worker should signal to the producer router when it wants and does not want upstream connectivity -- perhaps by parameterizing a ConnectivityCheckIPC or ConsumerInfoIPC message?

Test the widget in Safari

We need to make this part of our testing flow. I ought to be doing it as part of my informal build poking, but I don't own a mac.

Test the widget in Microsoft Edge

Related: Do we care about Internet Explorer? I'm assuming the number of western users running IE is too insignificant for us to waste cycles on it for the widget, but lmk if anyone disagrees.

Add instrumentation and collect metrics

This should be done after the dust has settled post-Flashlight integration, because it's likely that the egress server will undergo considerable changes.

Egress server misbehavior when deployed

After deploying on Google Cloud Run, I observed a complex of [possibly] related bugs that are irreproducible when hosting locally:

  1. Leaking websocket connections, or just failing to decrement the number of websocket connections
  2. Strange interleaved log messages, even when min and max containers are set to 1
  3. Intermittent periods when clients will continuously report QUIC dial failure

Is this specific to Google Cloud Run? A good first step might be to deploi our boi on a very simple un-abstracted VM and see if we can repro the naughtiness.

Make client start and stop safer?

Currently, you can call start and stop in profoundly boneheaded ways. For example, you can make 6 consecutive calls to start, followed by 17 consecutive calls to stop. This results in undefined behavior.

We do, however, provide protection in the form of the ready event.

ready indicates that the client has quiesced from a previous call to stop and that it's safe to call start again. We obey ready for the widget UI, disabling the toggle while the system is quiescing, and it seems to work great.

But it's possible that listening for (and obeying) ready is too easy to get wrong. We might want to serialize all calls to start and stop, or simply make start and stop fail if they're called in the wrong state.
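
A minimal sketch of the "fail if called in the wrong state" option, with hypothetical names rather than the actual Broflake client API:

package main

import (
	"errors"
	"fmt"
	"sync"
)

type clientState int

const (
	stopped clientState = iota
	running
)

type client struct {
	mu    sync.Mutex
	state clientState
}

// start fails loudly instead of producing undefined behavior when the
// client is already running.
func (c *client) start() error {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.state != stopped {
		return errors.New("start called while already running")
	}
	c.state = running
	return nil
}

// stop likewise refuses to run unless the client is actually running.
func (c *client) stop() error {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.state != running {
		return errors.New("stop called while not running")
	}
	c.state = stopped
	return nil
}

func main() {
	c := &client{}
	fmt.Println(c.start()) // <nil>
	fmt.Println(c.start()) // error: already running
	fmt.Println(c.stop())  // <nil>
	fmt.Println(c.stop())  // error: not running
}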

Implement optimized HTTP client in client workers and/or revisit HTTP/2

The client workers currently use the default HTTP client to perform discovery and signaling, and we have thus far made no effort to control the number of TCP connections created by a client during a signaling session.

A client, over the course of a signaling session, will create a series of HTTP requests, each one representing a message exchange with their signaling partner. These requests should be placed over a single long-lived TCP connection.

Related to this is the topic of HTTP/2 and signaling session parallelism: a client who wishes to quickly advertise N connection slots will create N concurrent HTTP requests. If both Freddie and the clients speak HTTP/2, these parallel requests can be very desirably muxed over a single TCP connection. We discussed HTTP/2 support while brainstorming and writing RFCs, but I'm not sure where we left it.
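
A sketch of the kind of tuned HTTP client described above -- an assumption about shape, not the actual signaling code: cap connections per host so a series of signaling requests reuses one TCP connection, and attempt HTTP/2 so parallel slot advertisements can be muxed over it if Freddie supports it.

package main

import (
	"net/http"
	"time"
)

func newSignalingClient() *http.Client {
	return &http.Client{
		Timeout: 30 * time.Second,
		Transport: &http.Transport{
			ForceAttemptHTTP2:   true, // mux parallel requests if Freddie speaks HTTP/2
			MaxConnsPerHost:     1,    // funnel the signaling exchange over one TCP connection
			MaxIdleConnsPerHost: 1,
			IdleConnTimeout:     90 * time.Second,
		},
	}
}

func main() {
	client := newSignalingClient()
	_ = client // the signaling workers would reuse this client for the whole session
}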

Client data race when no upstream route has been established

In the desktop client, the user stream worker isn't supposed to enter the "proxying" state until an upstream route has been established. It's very simple to accomplish this -- we simply await a non-nil path assertion in state 0 like all the other workers do.

However, I decided to disable that simple path assertion check during development, if only to informally observe the behavior of the client in the pathological state when a user stream could not be routed upstream.

And I noticed something interesting! If there's no upstream route and you try to proxy a request, there's a data race which does not occur during normal operation (after an upstream route has been established).

There's an argument to be made that we should just correctly implement the path assertion check and the data race will go away for free, but it might be worth finding out why it happens...

Repro:

  1. Make sure you've built your desktop client with -race. build.sh desktop will do this for you.

  2. Start Freddie and the egress server.

  3. Start the desktop client.

  4. Proxy an arbitrary request through your desktop client -- e.g., curl --proxy http://127.0.0.1:1080 https://reddit.com

The desktop client should report a data race!

How to TLS-terminate control-plane traffic in the context of P2P

Control-plane traffic from Flashlight (namely: traffic to the geo server, pro server, and config server) has its TLS encryption terminated at the next node.

This next node is typically either a CDN, in the case of domain-fronting, or our own proxies in the case of regular Lantern proxies (i.e., chained roundtripper).

In the case of P2P, the next node is actually a random Broflake user on the internet, not a CDN or a Lantern proxy.

This is an issue since that control-plane traffic includes PII (e.g., requests to the pro server include credit card numbers and emails).

Goal of this ticket is to figure out how to deal with this in the context of the P2P project. The current ideas are:

  • Have the next node initiate a CONNECT request and funnel the traffic without looking at it.
    • This works but we'll need to figure out a way to include the original IP of the request initiator (i.e., the censored user in this case) since we're using IPs to geolocate the user and serve a limited number of proxies to them. If a nation-state actor decides to spoof a lot of IPs, they can enumerate all of Lantern's infra, which is bible bad.
  • Have the next node terminate the TLS connection, be able to inspect traffic and run this demo just for a week or so. This'll prove that the prototype works and then we can figure out a better solution.

This also brings another topic to mind: enumeration of Lantern proxies through spoofed IPs. @forkner mentioned it'll be cool to not use IPs as the main indicator for serving a limited number of proxies to a user, but to "mine" a new user ID (i.e., let the user spend CPU/memory resources through a proof of work or a KDF like Argon2) and use user IDs instead of IPs as the main indicator for serving a limited number of proxies to the user. This way a nation-state actor would have to spend resources to enumerate the infrastructure, which creates a much higher fence. I think we'll need another ticket to discuss these ideas.

CC @myleshorton @forkner @noahlevenson

A widget that's acquired only idle connections may confuse the user

Your widget may acquire some number of stable connections and report "N peers connected" -- but if your connected peers aren't pushing requests through those connections, your widget reports 0 bytes/sec throughput, and you might misinterpret it as being stalled or broken.

When censored peer concurrency > 1, censored peers will try to acquire these idle and unused connections for redundancy.

At some point in the near future, we'll probably write a multipath-y thing wherein all censored users utilize all of their upstream connections in parallel. This would essentially invalidate the topic of this ticket.

It's also possible that we might set censored peer concurrency to 1 for the MVP, which would also invalidate the topic of this ticket.

Can't stop the Broflake producerUserStream worker

user.go defines the producerUserStream workerFSM, which is the worker that deals with bytestreams generated on the user's computer -- e.g., bytestreams originating from their browser and entering Broflake via Flashlight.

During prototyping, before Flashlight integration, this workerFSM mocked in a bunch of Flashlight-like functionality, and things are a bit placeholder-y in there.

Consequently, this worker was never refactored to obey context cancellation. This means you can't stop it. If stop is called, the system will leak goroutines, and probably even worse things will happen.

After we figure out Flashlight integration and refactor this workerFSM, we need to make sure it obeys context cancellation, per the conventions for writing workers. (See: #22)

Add privacy policy to the web widget

This emerged during the murder board: we ought to have a privacy policy linked from the widget which explains that we don't sell your info to the government (and other pertinent info about encryption/security and things you might worry about when making connections to strangers).

cc: @derek @woodybury

Add a flag to prevent proxying through yourself?

During the 01/11/2023 group testing session, @myleshorton asked:

"are there guards against proxying through yourself if you're running both [the widget and the desktop client]?"

Answer: no, but there probably should be.

Looking to the far future -- in a world with true role agnosticism beyond the censored/uncensored binary, this feature would almost definitely be required, since every peer would be both a connectivity producer and a connectivity consumer.

One approach might entail adding a hash of peer IP address to genesis messages, so that peers can filter on that field when evaluating the suitability of connectivity advertisements.
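
A minimal sketch of that approach, with hypothetical field names: the advertiser includes a hash of its IP in the genesis message, and a consumer skips advertisements whose hash matches its own.

package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

func hashIP(ip string) string {
	sum := sha256.Sum256([]byte(ip))
	return hex.EncodeToString(sum[:])
}

type genesisMsg struct {
	IPHash string // hypothetical field added to the genesis message
}

// shouldIgnore filters out advertisements that would route us through ourselves.
func shouldIgnore(msg genesisMsg, myIP string) bool {
	return msg.IPHash == hashIP(myIP)
}

func main() {
	me := "203.0.113.7"
	msg := genesisMsg{IPHash: hashIP(me)}
	fmt.Println(shouldIgnore(msg, me))             // true: our own advertisement
	fmt.Println(shouldIgnore(msg, "198.51.100.2")) // false
}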

Develop and implement a short-term plan for in-country STUN servers

The question: What STUN servers should peers use? Uncensored web widget users can probably just hit Google's public server, but censored users are probably going to encounter blocking issues.

In the long term, we want to invent a fancy P2P STUN system. @noahlevenson has developed the beginnings of a plan for this.

We need to get something working right now, though. There are a few parts to this task:

  1. Clients should have some way to fetch a list of STUN servers dynamically at runtime. This behavior might be parameterized by client type, though, since widgets will probably do fine using stun.google.com forever.

  2. We should spin up STUN servers in-country, figure out how to detect when they're blocked, and figure out how to update clients with fresh unblocked STUN servers when they need them.

  3. We should start an ongoing program to crawl the IPv4 space using ZMap looking for public STUN servers that we can keep in our back pocket and serve to clients when required.

STUN system improvements

#66 lays the foundation for an extensible way to acquire and refresh dynamic lists of STUN servers.

Let's make three quick improvements:

  1. Clients should hardcode a backup list of STUN servers which is provided at build time. Ideally we'd fetch what we believe to be a fresh list during CI, and fail the build if we fail to procure the list.

  2. We should minimize the frequency at which we fetch the list in the STUN batch function.

  3. We must fetch the list using a domain fronted request!
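
A sketch of improvement 1, with hypothetical names: try to fetch a fresh list (in the real system this must be a domain fronted request), and fall back to a list baked in at build time if the fetch fails.

package main

import "fmt"

// buildTimeSTUNServers would be injected at build time (e.g. via -ldflags -X
// on a comma-separated string); placeholder entries are hardcoded here.
var buildTimeSTUNServers = []string{
	"stun:stun.example.org:3478",
	"stun:stun.example.net:3478",
}

// fetchFreshSTUNList stands in for the (domain fronted) list fetch; it fails
// here to exercise the fallback path.
func fetchFreshSTUNList() ([]string, error) {
	return nil, fmt.Errorf("fetch not implemented in this sketch")
}

// stunBatch prefers a fresh list but never returns an empty one.
func stunBatch() []string {
	if fresh, err := fetchFreshSTUNList(); err == nil && len(fresh) > 0 {
		return fresh
	}
	return buildTimeSTUNServers
}

func main() {
	fmt.Println(stunBatch())
}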

Geolocate users for the UI

Our web widget UI displays a cool visualization of internet traffic flowing to and from connected users. There's just one problem: we don't yet supply geolocation data to the UI bindings to make it work. Let's fix that.

There are a few things to think about here.

  1. What process does the geolocating? We need to geolocate both widget and desktop users, since the UI visualization draws an animated swoosh on the map from each connected user to the widget user's location. (Rumor has it that Lantern client users are already in possession of their own geolocation information?)

  2. If there isn't already a service or process we can use to do the geolocating, Freddie seems like a logical place to do it.

  3. How is geolocation information exchanged between peers? Currently, our protocol defines a producer-side constraints object -- the path assertion -- which is shared with consumers at discovery time via Freddie. However, there is no information currently shared by consumers for producers. In our RFCs for this project, we've discussed the possibility of introducing a consumer-side constraints object, primarily for matchmaking purposes -- that is, consumers might share some data indicating what kind of producers they'd prefer to hear about in the announcements stream. This consumer-side constraints object might be a logical place to consider adding geolocation, though this suggests state management problems of the kind we've made a strenuous effort to avoid. Could consumers simply piggyback their geolocation (and any other future information) on their response to a genesis message?

The ConsumerInfo struct was devised as a stub to begin addressing this problem:

https://github.com/getlantern/broflake/blob/05e8466c297cc69fd5156d4045100bc6213d278e/common/resource.go#L27

Currently, workers in the consumer table fire an event upon connection and disconnection which is accompanied by a ConsumerInfo struct describing the connectivity state.

Deliverables

DRAFT: #30

Worker RX buffer clearing

In theory, a router may attempt to send messages to a worker which has entered a different state and is no longer listening to its RX channel. When that happens, the router will begin filling the worker's RX channel buffer, and we'll panic when that buffer is full.

Do we perhaps want workers to clear their RX channel buffer when entering or exiting certain states? In addition to just being a good hygiene practice, this might be a simple way to provide "endpoint safety" -- that is, to prevent the following degenerate scenario:

Bob is a censored user currently proxying his traffic through the worker in slot 3 of Alice's worker table. In the middle of an HTTP request, Bob abruptly disconnects. There are several chunks of bytes destined for Bob's client which are still in transit. Charlie quickly claims slot 3 in Alice's worker table. The chunks of bytes destined for Bob now arrive at Alice's downstream router, which routes them to Charlie.

Another scenario we need to consider: Is it currently possible for a path assertion IPC message to sit and grow stale in a worker's RX buffer? If so, my hunch is that producer workers should clear their RX buffers twice: once before listening for a non-nil path assertion in state 1, and once before kicking off the proxyloop in state 5.
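
A minimal sketch of draining a worker's buffered RX channel on a state transition, so stale path assertions or bytes destined for a departed peer can't be misdelivered. The message type is hypothetical.

package main

import "fmt"

type ipcMsg struct{ payload string }

// drain empties whatever is currently buffered on rx without blocking.
func drain(rx chan ipcMsg) (n int) {
	for {
		select {
		case <-rx:
			n++
		default:
			return n
		}
	}
}

func main() {
	rx := make(chan ipcMsg, 8)
	rx <- ipcMsg{"stale path assertion"}
	rx <- ipcMsg{"bytes destined for a departed peer"}
	fmt.Println("drained", drain(rx), "stale messages") // drained 2 stale messages
}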

The widget fails to proxy in WebKit-based browsers

UPDATE 12/02/2022 PART 2: In conjunction with #12, it sure seems that this issue is actually about WebKit.

UPDATE 12/02/2022 PART 1: This issue was previously focused on all mobile platforms, but after some basic sleuthing it seems that the bug is related to iOS only. See the conversation below...

In pursuit of https://github.com/getlantern/engineering/issues/78, I fired up the widget in both Chrome and Safari on my iPhone 8. The widget completes signaling as expected -- but when I try to proxy Firefox through it, it doesn't work.

The UI Current throughput and Lifetime data proxied values do bounce around a bit in the way I'd expect them to, indicating that we're pushing bytes through the thing.

Inspecting the mobile Chrome log, it looks like the widget's connection to the egress server is borking.

Because of the way we mux and demux protocols at the egress server, sometimes WebSocket borkage actually reflects an error at a higher level of the stack -- for example, it's possible that something is going wrong with an smux stream.

Attached: the widget in mobile Chrome on my iPhone, and the related chrome://inspect output.

p2p web proxy

[Screenshot attached, 2022-11-30]
