
privacylab / talek


a Private Publish Subscribe System

License: BSD 2-Clause "Simplified" License

Go 98.74% Makefile 0.37% Cuda 0.62% Dockerfile 0.20% Shell 0.06%
privacy cloud pubsub publish-subscribe messaging anonymity

talek's Introduction

Talek


Talek is a privacy-preserving messaging system. User communication is stored on untrusted systems using PIR.

Getting Started

A basic client can be installed with: go get github.com/privacylab/talek/cli/talekclient

Talek uses a construct called topic handles. Topics represent a stream of messages from one author to a few readers. The author who creates a topic can provide a handle to it to allow others to "follow along". A longer description of the specific guarantees of a topic is provided in the academic paper linked below.

Basic Usage:

talekclient --config=talek.conf --create --topic=newhandle
talekclient --config=talek.conf --topic=newhandle --write "Hello World"
talekclient --config=talek.conf --topic=newhandle --share=readOnlyHandle
talekclient --config=talek.conf --topic=readOnlyHandle --read

Develop

Pull requests are welcome! Please run all tests (see below) before submitting a PR.

System Dependencies

Depending on which PIR implementation you use, you may need to install OpenCL / CUDA. Make sure you have the latest graphics drivers for your video card.

NVIDIA CUDA:

OpenCL on Ubuntu:

sudo apt-get install -y ocl-icd-libopencl1 ocl-icd-opencl-dev opencl-headers clinfo
sudo ln -s /usr/lib/x86_64-linux-gnu/libOpenCL.so /usr/lib/x86_64-linux-gnu/libCL.so

OpenCL on macOS:

  • OpenCL is included in the developer tools.

Tools

$ make get-tools

Testing

All tests should pass before submitting a pull request.

$ make test

The GPU backings are not built by default. Changes to pir/ that may affect the backing interface should be tested with go test -tags 'cuda,opencl' so that all drivers are covered.

Following Along:

Join the mailing list: https://lists.riseup.net/www/info/talek

Publication

Talek: a Private Publish-Subscribe Protocol.
Raymond Cheng, Will Scott, Bryan Parno, Irene Zhang, Arvind Krishnamurthy, Tom Anderson.
In Submission. 2020.
PDF


talek's Issues

Interest Vectors

Interest vectors need to be used to prioritize reads on the client

Scaling Metaissue

High-level questions:

  • Can we do better than global timestamps? Strict serializability is not necessary
  • How do we minimize cross-talk across shards?

can't read a write to a topic

With the PIR backend from #57 and the shim in #61, operations run without crashing.

However, if I:

talekclient --create
talekclient --topic talek.handle --share talek.shared
talekclient --topic talek.handle --write "Hello World"
talekclient --topic talek.shared --read

The read will time out, without seeing the message written to the topic.

PIR Sharder

The Front end reader needs to

  • Batch Requests, and queue responses until a decided number are in-flight.
  • Split the global request vectors into per-shard vectors
  • Xor responses from shards back to client responses
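The split and combine steps above can be sketched as follows. This is only an illustration under stated assumptions: the request vector is modeled as one bool per bucket, shards hold contiguous ranges of buckets, and the names splitRequest, shardAnswer, and combine are hypothetical rather than taken from the Talek code. Batching is left out.

```go
package main

import "fmt"

// splitRequest cuts a global request vector (one bool per bucket) into
// contiguous per-shard vectors of at most bucketsPerShard entries each.
func splitRequest(global []bool, bucketsPerShard int) [][]bool {
	var shards [][]bool
	for start := 0; start < len(global); start += bucketsPerShard {
		end := start + bucketsPerShard
		if end > len(global) {
			end = len(global)
		}
		shards = append(shards, global[start:end])
	}
	return shards
}

// xorInto XORs src into dst byte by byte; src must be at least as long as dst.
func xorInto(dst, src []byte) {
	for i := range dst {
		dst[i] ^= src[i]
	}
}

// shardAnswer XORs together the buckets selected by this shard's vector.
func shardAnswer(buckets [][]byte, req []bool, size int) []byte {
	out := make([]byte, size)
	for i, selected := range req {
		if selected {
			xorInto(out, buckets[i])
		}
	}
	return out
}

// combine XORs the per-shard responses back into one client response.
func combine(responses [][]byte, size int) []byte {
	out := make([]byte, size)
	for _, r := range responses {
		xorInto(out, r)
	}
	return out
}

func main() {
	db := [][]byte{{1, 1}, {2, 2}, {4, 4}, {8, 8}}
	global := []bool{true, false, true, true}

	var responses [][]byte
	for s, req := range splitRequest(global, 2) {
		responses = append(responses, shardAnswer(db[s*2:], req, 2))
	}
	fmt.Println(combine(responses, 2)) // buckets 0, 2 and 3 XORed: [13 13]
}
```

Because XOR is associative, the combined result equals what a single unsharded server would return for the same global vector.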

Understand if client needs to know sequence number of topics

Presumably it needs to keep track of where it is in a topic handle, but how does that bootstrap initially? In practice, the shared secret with the topic password should probably also indicate the 'current' sequence number, and the client will 'catch up' and then keep track of where it is from there.

That requires a stateful client though, so there needs to be an interface for the topic to be serialized and restored with both the current sequence number and the seed.
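One possible shape for such an interface, as a minimal sketch: the topicState type, its field names, and the JSON encoding are assumptions for illustration, not the existing libpdb API.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// topicState is a hypothetical snapshot of what a client must remember
// about a topic between runs: the seed and the current sequence number.
type topicState struct {
	Seed  []byte `json:"seed"` // encoded as base64 by encoding/json
	SeqNo uint64 `json:"seq_no"`
}

// saveTopic serializes the state so it can be written to disk.
func saveTopic(t topicState) ([]byte, error) {
	return json.Marshal(t)
}

// restoreTopic rebuilds the state from a previously saved snapshot.
func restoreTopic(data []byte) (topicState, error) {
	var t topicState
	err := json.Unmarshal(data, &t)
	return t, err
}

func main() {
	data, err := saveTopic(topicState{Seed: []byte{1, 2, 3}, SeqNo: 42})
	if err != nil {
		panic(err)
	}
	restored, err := restoreTopic(data)
	if err != nil {
		panic(err)
	}
	fmt.Println(restored.SeqNo, restored.Seed)
}
```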

LibPDB API

Currently, the api in libpdb/client.go doesn't really make sense:

  • publishtrace and poll trace are debug methods and should be in a test class
  • subscribe doesn't provide any interface for actually getting updates to the topic
  • No way to cancel a subscription
  • CreateTopic doesn't allow specification of password, and feels pretty disconnected from publish.

Key exchange between client and follower trust domains

There's some care needed around the encryption process for messages between the client and the follower trust domains that goes beyond link-layer security.

We've talked about a symmetric random overlay XOR'ed onto responses that the client can then XOR out on its side. Figuring out what that is on the way out, and figuring out the process for passing requests onwards to those follower trust domains, is something we should write a formal description for.
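To make the overlay idea concrete, here is a rough sketch that derives the pad from a shared key with AES-CTR. The choice of AES-CTR, the overlay function, and the per-response nonce are all assumptions; the issue does not say how the pad is generated or how the key is exchanged.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"fmt"
)

// overlay XORs data with a pad derived from key and nonce using AES-CTR.
// Applying it twice with the same key and nonce returns the original bytes.
// key must be 16, 24, or 32 bytes; nonce must be aes.BlockSize (16) bytes
// and must never be reused with the same key.
func overlay(key, nonce, data []byte) ([]byte, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	out := make([]byte, len(data))
	cipher.NewCTR(block, nonce).XORKeyStream(out, data)
	return out, nil
}

func main() {
	key := []byte("0123456789abcdef") // hypothetical shared secret
	nonce := make([]byte, aes.BlockSize)

	// Follower side: mask the response before it leaves the trust domain.
	masked, err := overlay(key, nonce, []byte("response"))
	if err != nil {
		panic(err)
	}
	// Client side: XOR the same pad back out.
	plain, err := overlay(key, nonce, masked)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(plain))
}
```

The XOR overlay commutes with the XOR that combines PIR responses, which is presumably why it was proposed; the sketch does not address how the key reaches each follower trust domain.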

Client needs to validate if responses actually belong to topic

When an alternate bucket is read, or when there is no next update, the client will get back random data that doesn't decode as expected.

libpdb/Topic needs to maintain a checksum or otherwise know when a read is 'successful', and not advance state when it is not.

Dynamic Configurations

Currently all servers are statically assigned. Dynamic configurations should support:

  • Adding new groups/servers
  • Removing groups/servers
  • Resharding

Client fails to reconnect

If the client library's connection to the frontend breaks, it keeps trying to use the broken connection.

[FrontendRPC:RPC] frontend_rpc.go:49: rpc error: unexpected EOF
[FrontendRPC:RPC] frontend_rpc.go:49: rpc error: unexpected EOF
[FrontendRPC:RPC] frontend_rpc.go:49: rpc error: connection is shut down
[FrontendRPC:RPC] frontend_rpc.go:49: rpc error: connection is shut down
...
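One way the client could recover, sketched under the assumption that the frontend connection is a Go net/rpc client (the error strings in the log match that package). The caller interface, the reconnecting wrapper, and fakeConn are hypothetical names; fakeConn only exists so the sketch runs without a server.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"net/rpc"
)

// caller is the subset of *rpc.Client that the wrapper needs.
type caller interface {
	Call(serviceMethod string, args interface{}, reply interface{}) error
	Close() error
}

// reconnecting redials and retries once when the connection is broken.
type reconnecting struct {
	dial func() (caller, error) // e.g. wraps rpc.Dial("tcp", addr)
	conn caller
}

// broken reports whether err means the underlying connection is dead.
func broken(err error) bool {
	return errors.Is(err, rpc.ErrShutdown) || errors.Is(err, io.ErrUnexpectedEOF)
}

func (r *reconnecting) Call(method string, args, reply interface{}) error {
	if r.conn == nil {
		c, err := r.dial()
		if err != nil {
			return err
		}
		r.conn = c
	}
	err := r.conn.Call(method, args, reply)
	if !broken(err) {
		return err
	}
	// Drop the dead connection, dial a fresh one, and retry once.
	r.conn.Close()
	r.conn = nil
	c, derr := r.dial()
	if derr != nil {
		return derr
	}
	r.conn = c
	return r.conn.Call(method, args, reply)
}

// fakeConn is a stand-in connection used only for this demonstration.
type fakeConn struct{ dead bool }

func (f *fakeConn) Call(string, interface{}, interface{}) error {
	if f.dead {
		return rpc.ErrShutdown
	}
	return nil
}
func (f *fakeConn) Close() error { return nil }

func main() {
	dials := 0
	r := &reconnecting{dial: func() (caller, error) {
		dials++
		return &fakeConn{dead: dials == 1}, nil // first connection is dead
	}}
	err := r.Call("Frontend.Ping", nil, nil) // hypothetical method name
	fmt.Println(dials, err)
}
```

Retrying is only safe for idempotent calls; a write that may already have been applied would need more care than this sketch takes.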

Item Length Preservation

Currently, the Client will pad Publish messages with zeros up to DataSize. The response will then be the full DataSize long. Do we want to reserve a byte for tagging the actual length of the item?
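A sketch of what a length tag could look like. It uses a two-byte big-endian prefix rather than the single byte suggested above, on the assumption that DataSize may exceed 255; the dataSize value and the pad and unpad names are illustrative only.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

const dataSize = 32 // hypothetical fixed item size
const prefixLen = 2 // bytes reserved for the length tag

// pad writes the message length, then the message, then zero padding.
func pad(msg []byte) ([]byte, error) {
	if len(msg) > dataSize-prefixLen {
		return nil, errors.New("message too long")
	}
	out := make([]byte, dataSize)
	binary.BigEndian.PutUint16(out, uint16(len(msg)))
	copy(out[prefixLen:], msg)
	return out, nil
}

// unpad reads the length tag and returns exactly the original message.
func unpad(item []byte) ([]byte, error) {
	if len(item) != dataSize {
		return nil, errors.New("unexpected item size")
	}
	n := int(binary.BigEndian.Uint16(item))
	if n > dataSize-prefixLen {
		return nil, errors.New("corrupt length tag")
	}
	return item[prefixLen : prefixLen+n], nil
}

func main() {
	item, err := pad([]byte("Hello World"))
	if err != nil {
		panic(err)
	}
	msg, err := unpad(item)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(item), string(msg))
}
```

With the tag in place, a message that legitimately ends in zero bytes survives the round trip, which plain zero padding cannot guarantee.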

Applications Integration

Potential client-side shims to easily integrate with existing applications

  • Signal
  • Parse/Firebase
  • Javascript

prevent dos attacks of interest vector

If a client fully controls its interest vector, it can set popular vectors to trigger reads by other clients.

Perhaps zero-knowledge proofs can be used to force compliance of the interest vector.

Scheduling

How do we determine the global read/write rate?
How do we trade off this rate against rate-limiting and latency?
Can this be dynamic? e.g. with RAPPOR?

Find initial operators

We should provision 2-3 servers to run an initial demonstration instance of Talek for public interaction / testing.
I think 3 would be great, since 2 remains a bit of a degenerate case.

Consistency and Safety of Write Epoch application

The initial implementation of this functionality in #39 uses a NextEpoch call in the follower API, but because this goes along a separate channel within the Shard, it may be inconsistent in when it is applied in different replicas.

Probably, adding a field to WriteArgs (or, preferably, using different structs between servers than in the client-server interface) is going to be an easier long-term way to do this, but it requires a bit more refactoring than this initial implementation.

Branding

If PDB isn't the name (and given the lack of a 'DB' interface, maybe that makes sense), we should probably not hesitate to move to something reasonable.

Continuous Integration for all tests

Currently we cannot run the PIR tests for OpenCL or CUDA because our Travis CI environment doesn't support GPU testing. Can we find one that does support it?

Related:
#64

Support multiple GPU devices in context_cl.go

Currently a ContextCL will select a random GPU for use. I think instead it should just enumerate all GPU devices and use them in a single context. Hopefully OpenCL would be smart enough to properly load balance concurrent execution.

Kubernetes setup

We'd like some easy mechanism for setting up a service and failure detection

Branding

Talek should have a logo and other identifiable branding.

Address Malicious Subscribers

There isn't currently an exposed interface differentiating the owner of a topic who has the needed keys to publish to that topic, and subscribers who can read new values but aren't able to publish. Is that a distinction that is desired?

Client catch-up / resynchronization strategy

It would be nice if, when the latency between a publisher writing a message and a client retrieving it is long enough for the message to fall out of the database, there were a recovery strategy for resynchronization beyond out-of-band resharing of the handle, which is the current behavior.

generalized followers

Currently server/centralized.go expects to act as either a leader with a single follower, or as a follower.
We should at least generalize to support n followers for a single leader.

Compact config representation

Probably worth revisiting the JSON file serialization used for talek.conf and talek.handle. In particular, the "public key as array of bytes" is probably better represented hex-encoded.

Ideally a handle would be serialized to a single line that was short enough to copy-paste. This compact representation could be (for example)
seed1.seed2.shared_secret.sequence_number
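A sketch of that single-line format, assuming hex for the three byte fields and decimal for the sequence number. The handle struct and its field names are hypothetical and do not mirror the current talek.handle layout.

```go
package main

import (
	"encoding/hex"
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// handle is a hypothetical in-memory form of a topic handle.
type handle struct {
	Seed1, Seed2, SharedSecret []byte
	SeqNo                      uint64
}

// String renders the handle as seed1.seed2.shared_secret.sequence_number.
func (h handle) String() string {
	return strings.Join([]string{
		hex.EncodeToString(h.Seed1),
		hex.EncodeToString(h.Seed2),
		hex.EncodeToString(h.SharedSecret),
		strconv.FormatUint(h.SeqNo, 10),
	}, ".")
}

// parseHandle is the inverse of String.
func parseHandle(s string) (handle, error) {
	parts := strings.Split(s, ".")
	if len(parts) != 4 {
		return handle{}, errors.New("expected 4 dot-separated fields")
	}
	var h handle
	var err error
	if h.Seed1, err = hex.DecodeString(parts[0]); err != nil {
		return handle{}, err
	}
	if h.Seed2, err = hex.DecodeString(parts[1]); err != nil {
		return handle{}, err
	}
	if h.SharedSecret, err = hex.DecodeString(parts[2]); err != nil {
		return handle{}, err
	}
	if h.SeqNo, err = strconv.ParseUint(parts[3], 10, 64); err != nil {
		return handle{}, err
	}
	return h, nil
}

func main() {
	h := handle{
		Seed1:        []byte{0x01, 0x02},
		Seed2:        []byte{0xab},
		SharedSecret: []byte{0xff, 0x00},
		SeqNo:        7,
	}
	fmt.Println(h) // 0102.ab.ff00.7
}
```

Hex doubles the length of each byte field; a denser encoding such as base64 would shorten the line further at some cost in readability.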

server.config should include trust domains

Currently, the server.Config struct is not able to fully configure a talek server. The missing piece is that it does not include the information about the trust domains. While a talek server does not need to know the keys of the other servers, it does need to know whether it's a leader, and which other servers it should be making RPCs to. (Eventually, the server keys should probably be used to establish and validate these RPC connections.)

This is a self-contained work item to add an []trustDomainConfig to server.Config, update talekutil to fill it, and update the server driver to determine leader and rpc's from it.

Topic needs to calculate how often it should poll for updates

There isn't a code path for the client managing relative topic update frequency, or how often it should make a real versus dummy read request.

We can probably start with simple round-robin reads from active topics, but eventually relative speed should be tracked to optimize reading latency.
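The simple starting point could look roughly like this. The scheduler type and nextRead method are hypothetical; the sketch only covers round-robin selection and the fallback to a dummy read when no topic is active, not the frequency tracking mentioned above.

```go
package main

import "fmt"

// scheduler cycles through the active topics in round-robin order.
type scheduler struct {
	topics []string // hypothetical topic identifiers
	next   int
}

// nextRead returns the topic to poll and whether the read is real.
// With no active topics it asks for a dummy read so that the client's
// request pattern does not reveal that it has nothing to fetch.
func (s *scheduler) nextRead() (topic string, real bool) {
	if len(s.topics) == 0 {
		return "", false
	}
	t := s.topics[s.next%len(s.topics)]
	s.next = (s.next + 1) % len(s.topics)
	return t, true
}

func main() {
	s := &scheduler{topics: []string{"alice", "bob"}}
	for i := 0; i < 3; i++ {
		t, real := s.nextRead()
		fmt.Println(t, real)
	}
	idle := &scheduler{}
	_, real := idle.nextRead()
	fmt.Println(real)
}
```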

Signal Integration

Fork Signal and use Talek for private messaging

May need a mechanism to manually add contacts via QR code if we cannot find a reasonable replacement for key distribution.

cuda/cl tags should be opt-in

Having dealt with this for a bit now, I've decided I care more, and that the optional libraries need to be opt-in parts of the build.

editors / standard ecosystem tools expect a 'go build' without tags to succeed, and tags are not always available. (e.g. atom doesn't have a way to set them https://www.bountysource.com/issues/42210626-does-go-plus-support-build-tags)

If I can't run go build successfully in the project because it fails on CUDA headers that I can't install or test because my computer doesn't support them, I'm just not going to run the tests.
