privacylab / talek

A private publish-subscribe system

License: BSD 2-Clause "Simplified" License

Go 98.74% Makefile 0.37% Cuda 0.62% Dockerfile 0.20% Shell 0.06%

privacy cloud pubsub publish-subscribe messaging anonymity

talek's Introduction

Talek

Talek is a privacy-preserving messaging system. User communication is stored on untrusted systems using private information retrieval (PIR).

Getting Started

A basic client can be installed with go get github.com/privacylab/talek/cli/talekclient

Talek uses a construct called topic handles. Topics represent a stream of messages from one author to a few readers. The author who creates a topic can provide a handle to it to allow others to "follow along". A longer description of the specific guarantees of a topic is provided in the academic paper linked below.

Basic Usage:

talekclient --config=talek.conf --create --topic=newhandle
talekclient --config=talek.conf --topic=newhandle --write "Hello World"
talekclient --config=talek.conf --topic=newhandle --share=readOnlyHandle
talekclient --config=talek.conf --topic=readOnlyHandle --read

Develop

Pull requests are welcome! Please run all tests (see below) before submitting a PR.

System Dependencies

Depending on which PIR implementation you use, you may need to install OpenCL / CUDA. Make sure you have the latest graphics drivers for your video card.

NVIDIA CUDA:

Drivers
CUDA

OpenCL on Ubuntu:

sudo apt-get install -y ocl-icd-libopencl1 ocl-icd-opencl-dev opencl-headers clinfo
sudo ln -s /usr/lib/x86_64-linux-gnu/libOpenCL.so /usr/lib/x86_64-linux-gnu/libCL.so

OpenCL on macOS:

OpenCL is included in the developer tools. See here

Tools

gometalinter for linting

$ make get-tools

Testing

All tests should pass before submitting a pull request.

$ make test

The GPU backings are not built by default. Changes to pir/ that may affect the backing interface should be tested with go test -tags 'cuda,opencl' so that all drivers are covered.

Following Along:

Join the mailing list: https://lists.riseup.net/www/info/talek

Publication

Talek: a Private Publish-Subscribe Protocol.
Raymond Cheng, Will Scott, Bryan Parno, Irene Zhang, Arvind Krishnamurthy, Tom Anderson.
In Submission. 2020.
PDF

talek's Issues

Interest Vectors

Interest vectors need to be used to prioritize reads on the client.

Scaling Metaissue

High-level questions:

Can we do better than global timestamps? Strict serializability is not necessary
How do we minimize cross-talk across shards?

can't read a write to a topic

With the PIR backend from #57 and the shim in #61, operations run without crashing.

However, if I:

talekclient --create
talekclient --topic talek.handle --share talek.shared
talekclient --topic talek.handle --write "Hello World"
talekclient --topic talek.shared --read

The read will time out, without seeing the message written to the topic.

PIR Sharder

The frontend reader needs to:

Batch requests, and queue responses until a chosen number are in flight.
Split the global request vectors into per-shard vectors.
XOR the responses from shards back into client responses.
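A minimal sketch of the split and XOR steps, assuming an XOR-based PIR scheme in which a response is the XOR of every bucket the request vector selects, and assuming each shard holds a contiguous range of buckets. The function names and the bool-per-bucket request encoding are hypothetical, and batching is not shown.

```go
package main

// splitRequest splits a global PIR request vector (one bool per bucket) into
// per-shard vectors, assuming shard i holds buckets
// [i*bucketsPerShard, (i+1)*bucketsPerShard).
func splitRequest(req []bool, bucketsPerShard int) [][]bool {
	var shards [][]bool
	for start := 0; start < len(req); start += bucketsPerShard {
		end := start + bucketsPerShard
		if end > len(req) {
			end = len(req)
		}
		shards = append(shards, req[start:end])
	}
	return shards
}

// shardAnswer is what one shard would compute: the XOR of every bucket its
// slice of the request vector selects.
func shardAnswer(buckets [][]byte, req []bool, size int) []byte {
	out := make([]byte, size)
	for i, selected := range req {
		if selected {
			xorInto(out, buckets[i])
		}
	}
	return out
}

// combine XORs the per-shard answers back into a single client response.
// Because XOR is associative, this equals the XOR over the whole database.
func combine(answers [][]byte, size int) []byte {
	out := make([]byte, size)
	for _, a := range answers {
		xorInto(out, a)
	}
	return out
}

// xorInto sets dst[i] ^= src[i] for every byte of dst.
func xorInto(dst, src []byte) {
	for i := range dst {
		dst[i] ^= src[i]
	}
}
```

Since the frontend never needs to inspect individual buckets, it only has to remember which shard answers belong to which client request before XORing them together.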

Bloom Filter

http://blog.michaelschmatz.com/2016/04/11/how-to-write-a-bloom-filter-cpp/
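The linked post walks through a Bloom filter in C++. A minimal Go equivalent might look like the following; it derives its k hash functions from a single 64-bit FNV-1a hash by double hashing (index = h1 + i*h2 mod m), which is one common construction and an assumption here, not something the issue specifies.

```go
package main

import "hash/fnv"

// BloomFilter is a minimal Bloom filter with m bits and k hash functions.
type BloomFilter struct {
	bits []uint64
	m    uint64
	k    uint64
}

// NewBloomFilter allocates m bits (m must be > 0) and uses k probes per item.
func NewBloomFilter(m, k uint64) *BloomFilter {
	return &BloomFilter{bits: make([]uint64, (m+63)/64), m: m, k: k}
}

// hashes splits one FNV-1a sum into the two halves used for double hashing.
func (b *BloomFilter) hashes(item []byte) (uint64, uint64) {
	h := fnv.New64a()
	h.Write(item) // hash.Hash.Write never returns an error
	sum := h.Sum64()
	return sum & 0xffffffff, sum >> 32
}

// Add sets the k bits selected for item.
func (b *BloomFilter) Add(item []byte) {
	h1, h2 := b.hashes(item)
	for i := uint64(0); i < b.k; i++ {
		idx := (h1 + i*h2) % b.m
		b.bits[idx/64] |= 1 << (idx % 64)
	}
}

// MayContain returns false only if item was definitely never added;
// true means "probably present" (false positives are possible).
func (b *BloomFilter) MayContain(item []byte) bool {
	h1, h2 := b.hashes(item)
	for i := uint64(0); i < b.k; i++ {
		idx := (h1 + i*h2) % b.m
		if b.bits[idx/64]&(1<<(idx%64)) == 0 {
			return false
		}
	}
	return true
}
```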

Parse / Firebase Shim

Write a shim to automatically port Parse / Firebase apps

Understand if client needs to know sequence number of topics

Presumably it needs to keep track of where it is in a topic handle, but how does that bootstrap initially? In practice, the shared secret with the topic password should probably also indicate the 'current' sequence number; the client will then 'catch up' and keep track of where it is from there.

That requires a stateful client though, so there needs to be an interface for the topic to be serialized and restored with both the current sequence number and the seed.
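A minimal sketch of the serialize/restore interface, assuming the state that must survive a restart is just the topic seed and the sequence number reached so far. TopicState, Save, Restore and Advance are hypothetical names, not part of libpdb.

```go
package main

import "encoding/json"

// TopicState is a hypothetical snapshot of what a stateful client must
// persist for a topic: the seed it derives bucket locations from and the
// sequence number it has caught up to.
type TopicState struct {
	Seed []byte `json:"seed"`
	Seq  uint64 `json:"seq"`
}

// Save serializes the state; encoding/json base64-encodes []byte fields.
func (s TopicState) Save() ([]byte, error) {
	return json.Marshal(s)
}

// Restore rebuilds a TopicState from the output of Save.
func Restore(data []byte) (TopicState, error) {
	var s TopicState
	err := json.Unmarshal(data, &s)
	return s, err
}

// Advance records that the message at the current sequence number was read.
func (s *TopicState) Advance() { s.Seq++ }
```

A client would call Advance only after a read is confirmed to belong to the topic, then Save, so a restart resumes from the last message it actually saw.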

Wire Integration

https://github.com/wireapp

LibPDB API

Currently, the API in libpdb/client.go doesn't really make sense:

publishtrace and poll trace are debug methods and should be in a test class.
subscribe doesn't provide any interface for actually getting updates to the topic.
There is no way to cancel a subscription.
CreateTopic doesn't allow specifying a password, and feels pretty disconnected from publish.

Key exchange between client and follower trust domains

There's some care needed around the encryption process for messages between the client and the follower trust domains that goes beyond link-layer security.

We've talked about a symmetric random overlay XORed onto responses that the client can then XOR out on its side. Figuring out what that overlay is on the way out, and the process for passing requests onward to those follower trust domains, is something we should write a formal description for.
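One way the overlay could work, sketched under assumptions: the client shares a symmetric key with each follower trust domain, and each response gets a fresh 16-byte IV. The follower XORs an AES-CTR keystream onto its answer and the client regenerates the same keystream to strip it. The overlay function and the key/IV arrangement are hypothetical, not a description of Talek's protocol.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
)

// overlay XORs an AES-CTR keystream, derived from a key shared between the
// client and one follower trust domain plus a per-response IV, onto data.
// Applying it twice with the same key and IV restores the original bytes.
// key must be 16, 24 or 32 bytes; iv must be exactly aes.BlockSize (16)
// bytes, otherwise cipher.NewCTR panics.
func overlay(key, iv, data []byte) ([]byte, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	out := make([]byte, len(data))
	cipher.NewCTR(block, iv).XORKeyStream(out, data)
	return out, nil
}
```

With several followers, the client would remove one overlay per follower key. The IV must never repeat under the same key: two responses masked with the same keystream would reveal the XOR of their contents.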

Client needs to validate if responses actually belong to topic

When the alternate bucket is read, or when there is no next update, the client will get back random data that doesn't decode as expected.

libpdb/Topic needs to maintain a checksum or otherwise know when a read is 'successful', and not advance state in that case.
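A minimal sketch of the "checksum" idea using a keyed MAC instead of a plain checksum, so that random bucket contents fail verification with overwhelming probability. It assumes the tag sits at the end of the decrypted item and binds the sequence number; tagSize, sealMessage and openMessage are hypothetical.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/binary"
	"errors"
)

const tagSize = 16 // hypothetical: HMAC-SHA256 truncated to 16 bytes

var errNotOurs = errors.New("response does not belong to this topic")

// topicTag MACs the sequence number and message under the topic key.
func topicTag(key []byte, seq uint64, msg []byte) []byte {
	mac := hmac.New(sha256.New, key)
	var seqBytes [8]byte
	binary.BigEndian.PutUint64(seqBytes[:], seq)
	mac.Write(seqBytes[:])
	mac.Write(msg)
	return mac.Sum(nil)[:tagSize]
}

// sealMessage appends the tag so readers can recognise a genuine message.
func sealMessage(key []byte, seq uint64, msg []byte) []byte {
	return append(append([]byte{}, msg...), topicTag(key, seq, msg)...)
}

// openMessage verifies the tag. Data from an alternate bucket or a slot
// that has not been written yet fails, so the caller must not advance.
func openMessage(key []byte, seq uint64, data []byte) ([]byte, error) {
	if len(data) < tagSize {
		return nil, errNotOurs
	}
	msg, got := data[:len(data)-tagSize], data[len(data)-tagSize:]
	if !hmac.Equal(got, topicTag(key, seq, msg)) {
		return nil, errNotOurs
	}
	return msg, nil
}
```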

Add install instructions & simple examples

In order for us curious folks to check this out 😄 @willscott told me a bit about this project and it sounds quite cool!

Multiple Talek Instances

Currently only one instance of the protocol is supported.

Dynamic Configurations

Currently all servers are statically assigned. Dynamic configurations are needed to support:

Adding new groups/servers
Removing servers/group
Resharding

CUDA/OpenCL/CPU-SIMD Comparison

Tracking in https://github.com/privacylab/talek/tree/ryscheng-pir

Client fails to reconnect

If the client library's connection to the frontend breaks, it keeps trying to use the broken connection.

[FrontendRPC:RPC] frontend_rpc.go:49: rpc error: unexpected EOF
[FrontendRPC:RPC] frontend_rpc.go:49: rpc error: unexpected EOF
[FrontendRPC:RPC] frontend_rpc.go:49: rpc error: connection is shut down
[FrontendRPC:RPC] frontend_rpc.go:49: rpc error: connection is shut down
...

Test, Coverage, Lint script

And perhaps a pre-commit hook to make sure we're running it, and some CI too.

Cuckoo Hash

golang implementation https://github.com/asp2insp/cuckoofilter
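The linked package is a cuckoo filter (approximate membership). For comparison, a minimal sketch of a plain two-table cuckoo hash, where every key lives in one of exactly two slots so a lookup probes at most two locations. CuckooTable, maxKicks and the salted-FNV hash functions are assumptions for illustration.

```go
package main

import "hash/fnv"

const maxKicks = 32 // evictions tried before growing the table

type entry struct {
	key   string
	value []byte
	used  bool
}

// CuckooTable is a minimal two-table cuckoo hash table.
type CuckooTable struct {
	tables [2][]entry
}

// NewCuckooTable creates two tables of size slots each (size must be > 0).
func NewCuckooTable(size int) *CuckooTable {
	return &CuckooTable{tables: [2][]entry{make([]entry, size), make([]entry, size)}}
}

// slot salts FNV-1a with the table index so each table has its own hash.
func (c *CuckooTable) slot(t int, key string) int {
	h := fnv.New64a()
	h.Write([]byte{byte(t)})
	h.Write([]byte(key))
	return int(h.Sum64() % uint64(len(c.tables[t])))
}

// Get probes the key's slot in each table.
func (c *CuckooTable) Get(key string) ([]byte, bool) {
	for t := 0; t < 2; t++ {
		e := c.tables[t][c.slot(t, key)]
		if e.used && e.key == key {
			return e.value, true
		}
	}
	return nil, false
}

// Insert places key, moving evicted entries to their alternate table.
func (c *CuckooTable) Insert(key string, value []byte) {
	for t := 0; t < 2; t++ { // update in place if already present
		if e := &c.tables[t][c.slot(t, key)]; e.used && e.key == key {
			e.value = value
			return
		}
	}
	cur := entry{key: key, value: value, used: true}
	for kick := 0; kick < maxKicks; kick++ {
		t := kick % 2 // an entry evicted from table t moves to table 1-t
		i := c.slot(t, cur.key)
		cur, c.tables[t][i] = c.tables[t][i], cur
		if !cur.used {
			return
		}
	}
	c.grow() // likely cycle: enlarge and retry the homeless entry
	c.Insert(cur.key, cur.value)
}

// grow doubles both tables and reinserts every entry.
func (c *CuckooTable) grow() {
	old := c.tables
	n := 2 * len(old[0])
	c.tables = [2][]entry{make([]entry, n), make([]entry, n)}
	for t := 0; t < 2; t++ {
		for _, e := range old[t] {
			if e.used {
				c.Insert(e.key, e.value)
			}
		}
	}
}
```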

Item Length Preservation

Currently, the Client pads Publish messages with zeros up to DataSize, so the response is always the full DataSize long. Do we want to reserve a byte for tagging the actual length of the item?

Client needs to enforce message constraints

libPDB doesn't currently enforce the lengths of messages passed by the client. This length should be exposed and enforced from the config DataSize element.
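The two issues above (length preservation and DataSize enforcement) could be handled by one framing function. A minimal sketch, assuming a 2-byte big-endian length header rather than the single byte floated above, since one byte can only describe items up to 255 bytes; pad, unpad and lenPrefix are hypothetical.

```go
package main

import (
	"encoding/binary"
	"errors"
)

const lenPrefix = 2 // hypothetical: 2-byte big-endian length header

var errTooLong = errors.New("message longer than DataSize allows")

// pad frames msg as [len][msg][zeros...] so the result is exactly dataSize
// bytes, and rejects messages that would not fit.
func pad(msg []byte, dataSize int) ([]byte, error) {
	if len(msg) > dataSize-lenPrefix || len(msg) > 0xFFFF {
		return nil, errTooLong
	}
	out := make([]byte, dataSize)
	binary.BigEndian.PutUint16(out, uint16(len(msg)))
	copy(out[lenPrefix:], msg)
	return out, nil
}

// unpad strips the header and the zero padding, keeping any trailing
// zero bytes that were part of the original message.
func unpad(item []byte) ([]byte, error) {
	if len(item) < lenPrefix {
		return nil, errors.New("item shorter than length header")
	}
	n := int(binary.BigEndian.Uint16(item))
	if n > len(item)-lenPrefix {
		return nil, errors.New("corrupt length header")
	}
	return item[lenPrefix : lenPrefix+n], nil
}
```

The explicit header matters for messages that legitimately end in zero bytes, which naive zero-stripping would truncate.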

Applications Integration

Potential client-side shims to easily integrate with existing applications:

Signal
Parse/Firebase
Javascript

Prevent DoS attacks on the interest vector

If the client fully controls the interest vector, it can trigger popular vectors in order to trigger reads by other clients.

Perhaps zero-knowledge proofs can be used to enforce compliance of the interest vector.

Scheduling

How do we determine the global read/write rate?
How do we tradeoff this rate with rate-limiting and latency?
Can this be dynamic? e.g. with RAPPOR?

Find initial operators

We should provision 2-3 servers to run an initial demonstration instance of Talek for public interaction / testing.
I think 3 would be great, since 2 remains a bit of a degenerate case.

Replicated Central Controller

Using Raft

Globally log and order operations
Generate periodic shard configurations

Consistency and Safety of Write Epoch application

The initial implementation of this functionality in #39 uses a NextEpoch call in the follower API, but because this goes along a separate channel within the Shard, it may be inconsistent in when it is applied in different replicas.

Probably, adding a field to WriteArgs (or, preferably, using different structs between servers than in the client-server interface) is going to be an easier long-term way to do this, but it requires a bit more refactoring than this initial implementation.

Branding

If PDB isn't the name (and given the lack of a 'DB' interface, maybe that makes sense), we should not hesitate to move to something reasonable.

Continuous Integration for all tests

Currently we cannot run the PIR tests for OpenCL or CUDA because our Travis CI environment doesn't support GPU testing. Can we find one that does support it?

Related:
#64

Support multiple GPU devices in context_cl.go

Currently a ContextCL will select a random GPU for use. I think instead it should just enumerate all GPU devices and use them in a single context. Hopefully OpenCL would be smart enough to properly load balance concurrent execution.

Bloom should be in vendor?

It looks like the bloom/ directory is 3rd-party code. Why isn't it in vendor?

Kubernetes setup

We'd like some easy mechanism for setting up a service and failure detection.

DRBG: CTR Mode

Randomness for Dummy Requests

Should this be using crypto/rand?
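One common pattern is to seed from crypto/rand once and expand the seed with a block cipher in CTR mode. A minimal sketch follows; note that it is only an AES-CTR keystream generator, not a full NIST SP 800-90A CTR_DRBG (it has no reseeding or backtracking resistance). ctrStream and newCTRStream are hypothetical names.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
)

// ctrStream is an AES-256-CTR keystream generator seeded from crypto/rand.
// It implements io.Reader so it can fill dummy request vectors.
type ctrStream struct {
	stream cipher.Stream
}

func newCTRStream() (*ctrStream, error) {
	seed := make([]byte, 32+aes.BlockSize) // AES-256 key followed by IV
	if _, err := rand.Read(seed); err != nil {
		return nil, err
	}
	block, err := aes.NewCipher(seed[:32])
	if err != nil {
		return nil, err
	}
	return &ctrStream{stream: cipher.NewCTR(block, seed[32:])}, nil
}

// Read fills p with keystream bytes (the keystream XORed onto zeros).
func (c *ctrStream) Read(p []byte) (int, error) {
	for i := range p {
		p[i] = 0
	}
	c.stream.XORKeyStream(p, p) // in-place use is allowed: exact overlap
	return len(p), nil
}
```

Whether this beats calling crypto/rand directly depends on how many random bytes each dummy request consumes; that trade-off is what the issue is asking about.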

Vendoring dependencies

https://github.com/tools/godep
https://github.com/kardianos/govendor

Persistent connections

Currently all RPCs create a new TCP connection.

Branding

Talek should have a logo and other identifiable branding.

Server Authentication

PIR on GPU

http://developer.amd.com/resources/articles-whitepapers/opencl-optimization-case-study-simple-reductions/

Just a place to track progress

Address Malicious Subscribers

There isn't currently an exposed interface differentiating the owner of a topic who has the needed keys to publish to that topic, and subscribers who can read new values but aren't able to publish. Is that a distinction that is desired?

Client catch-up / resynchronization strategy

If the latency between a publisher writing a message and a client retrieving it is long enough for the message to fall out of the database, it would be nice to have a recovery strategy for re-synchronization beyond out-of-band resharing of the handle, which is the current behavior.

Generalized followers

Currently server/centralized.go expects to act as either a leader with a single follower, or as a follower.
We should at least generalize to support n followers for a single leader.

Compact config representation

Probably worth revisiting the JSON file serialization used for talek.conf and talek.handle. In particular, the "public key as array of bytes" is probably better represented hex-encoded.

Ideally a handle would be serialized to a single line that was short enough to copy-paste. This compact representation could be (for example)
seed1.seed2.shared_secret.sequence_number
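A minimal sketch of the single-line form suggested above, with the byte fields hex-encoded as proposed for public keys. The Handle struct and ParseHandle are hypothetical; the field order follows the seed1.seed2.shared_secret.sequence_number example.

```go
package main

import (
	"encoding/hex"
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// Handle is a hypothetical in-memory form of a topic handle.
type Handle struct {
	Seed1, Seed2, SharedSecret []byte
	Seq                        uint64
}

// String renders seed1.seed2.shared_secret.sequence_number, hex fields first.
func (h Handle) String() string {
	return fmt.Sprintf("%s.%s.%s.%d", hex.EncodeToString(h.Seed1),
		hex.EncodeToString(h.Seed2), hex.EncodeToString(h.SharedSecret), h.Seq)
}

// ParseHandle is the inverse of String.
func ParseHandle(s string) (Handle, error) {
	parts := strings.Split(s, ".")
	if len(parts) != 4 {
		return Handle{}, errors.New("handle must have 4 dot-separated fields")
	}
	var h Handle
	var err error
	if h.Seed1, err = hex.DecodeString(parts[0]); err != nil {
		return Handle{}, err
	}
	if h.Seed2, err = hex.DecodeString(parts[1]); err != nil {
		return Handle{}, err
	}
	if h.SharedSecret, err = hex.DecodeString(parts[2]); err != nil {
		return Handle{}, err
	}
	if h.Seq, err = strconv.ParseUint(parts[3], 10, 64); err != nil {
		return Handle{}, err
	}
	return h, nil
}
```

Hex keeps the string copy-paste safe (no '.' inside fields); base64url would be shorter if length becomes a concern.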

server.config should include trust domains

currently, the server.Config struct is not able to fully configure a talek server. The missing piece is that it does not include the information about the trust domains. While a talek server does not need to know the keys of the other servers, it does need to know if it's a leader, and which other servers it should be RPC'ing with. (And eventually, the server keys should probably be used to establish and validate these RPC connections).

This is a self-contained work item: add a []trustDomainConfig to server.Config, update talekutil to fill it, and update the server driver to determine the leader and its RPC peers from it.
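A minimal sketch of that work item, assuming a star topology in which the leader opens RPC connections to every follower and followers only accept connections. TrustDomainConfig, Config and Role are illustrative names and fields, not Talek's actual types.

```go
package main

// TrustDomainConfig is a hypothetical per-server entry in the shared list.
type TrustDomainConfig struct {
	Name      string
	Address   string
	IsLeader  bool
	PublicKey []byte // reserved for authenticating RPC connections later
}

// Config sketches server.Config extended with the trust-domain list.
type Config struct {
	Self         string // Name of the trust domain this process runs as
	TrustDomains []TrustDomainConfig
}

// Role reports whether this server is the leader and, if so, the addresses
// of the followers it must RPC to (any number, not just one).
func (c Config) Role() (isLeader bool, followers []string) {
	for _, td := range c.TrustDomains {
		if td.Name == c.Self {
			isLeader = td.IsLeader
		}
	}
	if !isLeader {
		return false, nil
	}
	for _, td := range c.TrustDomains {
		if td.Name != c.Self {
			followers = append(followers, td.Address)
		}
	}
	return true, followers
}
```

Deriving the role from one shared list means every server can be started from the same configuration file, with only Self differing.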

Topic needs to calculate how often it should poll for updates

There isn't a code path for the client to manage relative topic update frequency, or how often it should make a real versus a dummy read request.

We can probably start with simple round-robin reads from active topics, but eventually relative speed should be tracked to optimize reading latency.
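A minimal sketch of the simple round-robin starting point, assuming the client issues exactly one read per time slot so real and dummy reads are indistinguishable on the wire. RoundRobin and Read are hypothetical types; tracking relative topic speed is left out.

```go
package main

// Read is one request slot: either a real read of Topic or a dummy read.
type Read struct {
	Topic string
	Dummy bool
}

// RoundRobin cycles through active topics, one read per slot, and emits
// dummy reads when no topic is active.
type RoundRobin struct {
	topics []string
	next   int
}

// Subscribe adds a topic to the rotation.
func (r *RoundRobin) Subscribe(topic string) { r.topics = append(r.topics, topic) }

// Unsubscribe removes a topic without skipping the one that was due next.
func (r *RoundRobin) Unsubscribe(topic string) {
	for i, t := range r.topics {
		if t == topic {
			r.topics = append(r.topics[:i], r.topics[i+1:]...)
			if r.next > i {
				r.next--
			}
			return
		}
	}
}

// NextRead returns the request for the current slot.
func (r *RoundRobin) NextRead() Read {
	if len(r.topics) == 0 {
		return Read{Dummy: true}
	}
	r.next %= len(r.topics)
	t := r.topics[r.next]
	r.next++
	return Read{Topic: t}
}
```

Replacing the fixed rotation with weights based on each topic's observed update rate would be the natural next step toward optimizing read latency.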

Distributed Point Functions

https://www.iacr.org/archive/eurocrypt2014/84410245/84410245.pdf

Server recovery

For shards
For central coordinator nodes

If I can't run go build successfully in the project because it fails on CUDA headers that I can't install or test (my computer doesn't support them), I'm just not going to run the tests.