jcarreira / cirrus-kv
High-performance key-value store
License: Apache License 2.0
Should we have a mechanism to replicate data to multiple servers?
If so, should this be strongly consistent, or would eventual consistency be fine? Or should we support both?
It seems to me eventually consistent would be enough for a large class of applications, namely machine learning.
Add the other tests, such as the memory test and the simple get/put tests, to the make check command.
After installing Cirrus on two servers via the on-site instructions, I modified throughput.cpp on both servers to point to the IP address of the server running tcpservermain. Then I ran the following commands:
On one server:
nohup ./tcpservermain & disown
On the other:
nohup ./throughput & disown
throughput began running the tests and produced the following output:
throughput_128.log:
throughput 128 test
msg/s: 13.0619
bytes/s: 1671.92
throughput_4096.log:
throughput 4096 test
msg/s: 13.0543
bytes/s: 53470.2
throughput_51200.log:
throughput 51200 test
msg/s: 24.9987
bytes/s: 1.27993e+06
throughput_1048576.log:
throughput 1048576 test
msg/s: 24.9772
bytes/s: 2.61905e+07
The benchmark hung after this point. nohup.out reads:
Warming up
Warm up done
size is 128
Measuring msgs/s..
Warming up
Warm up done
size is 4096
Measuring msgs/s..
Warming up
Warm up done
size is 51200
Measuring msgs/s..
Warming up
Warm up done
size is 1048576
Measuring msgs/s..
Environment:
Ubuntu 14.04 LTS (Amazon EC2 m4.large)
g++ 6.3
Also ran automake --add-missing in order to make ./bootstrap.sh work without error.
We should be able to run make benchmark and get some numbers about latency and throughput.
The user probably has to specify the name/IP of another server where we can run the Cirrus server.
Look into the benchmarks folder to see an example of a benchmark.
Test_mult clients is not currently run by make check. Add it in.
Currently connect hangs.
Doing something like CIRRUS_LOGGING=0 should turn off logs. Check the C/C++ function getenv.
Successive async reads can end up writing to the same memory address, which can lead to buggy reads.
The same bug also exists with async writes.
The bug can be reproduced by running the iterator test in the iterator branch.
Right now TCPServer has the attribute max_objects, which is inconsistent with the interface of the RDMAServer (which uses pool_size). Both should use a raw count of bytes: pool_size.
In the same way, tcpservermain should allow setting the size of the server pool as a command line argument (but have default size).
Things to do:
The plan is to make the build system check for the existence of the Infiniband/RDMA library (e.g., using AC_CHECK_LIB [1]).
The RDMA backend can be put in between an
#ifdef HAVE_LIBRDMACM
// ... RDMA code here
#endif
The same for tests and benchmarks that depend on the RDMA/Infiniband backend.
[1] https://www.gnu.org/software/autoconf/manual/autoconf-2.66/html_node/Libraries.html
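A sketch of what the configure.ac check could look like (rdma_create_id is used as a representative librdmacm entry point; the conditional name HAVE_RDMA is an assumption):

```
# Define HAVE_LIBRDMACM when librdmacm is present; the build still
# succeeds without it, with the RDMA backend compiled out.
AC_CHECK_LIB([rdmacm], [rdma_create_id],
             [AC_DEFINE([HAVE_LIBRDMACM], [1],
                        [Define if librdmacm is available])],
             [AC_MSG_NOTICE([librdmacm not found; disabling RDMA backend])])
AM_CONDITIONAL([HAVE_RDMA],
               [test "x$ac_cv_lib_rdmacm_rdma_create_id" = xyes])
```

The AM_CONDITIONAL can then gate the RDMA tests and benchmarks in Makefile.am.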
When the client destructor is called as the client goes out of scope, the following message is shown:
terminate called without an active exception
This may be due to not detaching or joining the two loose threads in the destructor.
We should have a way to run tests and check style for every commit.
The test is designed to ensure the system works with multiple threads on one client. However, there is a concurrency issue: all threads read and write oid 1, overwriting one another. Additionally, there is a memory leak: d2 is never freed. As it stands now, the test will likely never pass.
We currently use raw structs to pass requests through the network.
We can check for some of the dependencies at configure time (in configure.ac). If the user has not installed some of these, we want to emit a nice error message.
Let's start by checking for the existence of:
flatc
cpplint
This may be useful:
https://stackoverflow.com/questions/7490978/autoconf-check-for-program-and-fail-if-not-found
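Following the approach in that link, the configure.ac checks could look like this (the macro arguments are a sketch, and the error messages are assumptions):

```
# Fail configure with a clear message when a required tool is missing.
AC_CHECK_PROG([FLATC], [flatc], [yes], [no])
AS_IF([test "x$FLATC" != xyes],
      [AC_MSG_ERROR([flatc not found; please install FlatBuffers])])

AC_CHECK_PROG([CPPLINT], [cpplint], [yes], [no])
AS_IF([test "x$CPPLINT" != xyes],
      [AC_MSG_ERROR([cpplint not found; try: sudo pip install cpplint])])
```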
Coverity issues a sizeable set of alarms on the existing code. We should check the report.
Opening the discussion on GPU disaggregation.
Two things come to mind:
This allows us to pay for a cheaper instance. However, GPUs are so much more expensive than any instance that the savings here are likely to be negligible.
GPUs are expensive and are exclusively allocated to a single user. However, they are likely to not be fully utilized at all times. This means they could be shared among concurrent users.
We could build a service that provides high levels of GPU virtualization by keeping the dataset remote. Isolation between concurrent tasks could be enforced in software (this has been shown to work, e.g., in Singularity, but I am not sure about this adversarial context).
We need a single script that runs all the tests: make test
Coverage statistics can be useful in driving the development of tests.
We may want to use gcov to get these stats and publish them to coveralls.io.
We should be able to easily change between RDMA or Ethernet.
Additionally, we should be able to disable all the RDMA dependencies when compiling on an ethernet-only environment (e.g., EC2).
Ideally, we would like to give developers the ability to provide their own eviction policy.
This might not be the way to go (at least for now) because:
References:
Redis eviction policies: https://redis.io/topics/lru-cache
The test benchmarks/throughput.cpp runs well on object sizes up to 50 kilobytes, but occasionally stalls on larger objects. This is present in the bandwidth_benchmark branch. As logging must be disabled to get accurate speeds for the test, the cause of the stalls is not readily apparent. The benchmark had been run without resetting the server in between; could this cause issues? Errors reading "pthread_setaffinity_np error 22" were also thrown on occasion, and only in the later revisions of the test.
Current speeds: (MB/s, messages/s) (at time of issue creation)
128 bytes: 20.7 MB/s, 162072
4K bytes: 556.371 MB/s, 135833
50K bytes: 2445.7 MB/s, 47767.9
1M bytes: 4442 MB/s, 4236.22
10M bytes: 4369.74 MB/s, 416.731
100M bytes: stalled entirely
Edit: ran the benchmark once more after resetting the remote server, and all tests ran, albeit after a long delay. Strangely, despite the tests taking so long, the results for transfer speeds are still rather high. This almost makes me think that the stall is happening outside of the timed section.
100M bytes: msg/s: 42.8607 bytes/s: 4494.27MB/s
~4.5 GB/s is the highest I've seen any benchmark run.
Right now some of our tests make some assumptions specific to our development environment (e.g., IPs of servers).
We should allow make test to run anywhere.
Currently we have a dependency on cityhash/libcuckoo. This means we need to set LD_LIBRARY_PATH to run any binary that uses Cirrus. For instance:
[joao@havoc:/data/joao/ddc/tests/object_store]% LD_LIBRARY_PATH=/data/joao/ddc/third_party/libcuckoo/cityhash-1.1.1/src/.libs ./test_fullblade_store
We should remove this dependency or find a way to statically link this library.
Includes the following features
store:
cache:
We should think of getting rid of the IP/port hardcoded values scattered throughout the tests.
A benefit of the hard coded values is that they simplify testing because we only need to call the binary to run the test -- no need to create a custom launch script.
We will need to create a python script to launch these tests.
The IP/port values can come from a few places:
@TylerADavis What do you think?
The current interface does not support asynchronous gets/puts, so the code for these was commented out of some tests and removed from test_fullblade_store. These tests should be added back once the interface supports asynchronous operations.
Add back the asynchronous operations, as well as true prefetching.
Come up with a way for the server to notify the client of errors, and for the futures to convey that error back to the user.
We should have automatic generation of code documentation.
Doxygen seems like a good tool to do this.
This is due to a dependency on a local .a that is not specified. For example, in src/server:
g++ -Wall -Wextra -ansi -fPIC -std=c++1z -pthread -o bladepoolmain bladepoolmain-bladepoolmain.o -L. -lserver -lrdmacm -libverbs -L../authentication/ -lauthentication -L../utils/ -lutils -L../common/ -lcommon
/usr/bin/ld: cannot find -lserver
collect2: error: ld returned 1 exit status
make: *** [bladepoolmain] Error 1
make: *** Waiting for unfinished jobs....
/usr/bin/ld: cannot find -lserver
collect2: error: ld returned 1 exit status
make: *** [allocmain] Error 1
Deep learning libraries follow a DistBelief-like pattern for model training.
We may be able to add the ability to access remote memory within those libraries without passing that implementation cost on to developers.
Currently, we are only seeing throughput of about 20 MB/s on 128-byte puts (before the introduction of the new interface). We should be seeing speeds of about 1 GB/s.
Current speeds: (MB/s, messages/s)
128 bytes: 20.7 MB/s, 162072
4K bytes: 556.371 MB/s, 135833
50K bytes: 2445.7 MB/s, 47767.9
1M bytes: 4442 MB/s, 4236.22
10M bytes: 4369.74 MB/s, 416.731
We should make Cirrus compatible with the Apache 2 license.
This entails removing the copyright messages from the source files.
We should be able to run Cirrus from within a lambda.
Includes the following features:
Store:
CacheManager:
At the moment, all state about the store is kept locally. If a client connects to a remote store that already contains objects, it will have no knowledge of the ObjectIDs in the store or of the mem_addr/peer_rkey and other information associated with them.
Should we implement some way for the client to get state from the server, and if so, how?
We should have an easy way to turn on and off logging information. Right now this involves recompiling.
An environment variable may be a good option.
It seems there are a few open-source platforms for microfunctions. It might be worth looking into what they do.
For instance:
https://github.com/Azure/service-fabric
http://blog.kubernetes.io/2017/01/fission-serverless-functions-as-service-for-kubernetes.html
Currently supports only one client at a time.
Add cpplint installation as a requirement in the documentation:
sudo pip install cpplint
In src/BladeAllocServer.cpp, allocator->allocate(size) can fail (an exception is thrown) when there is no more space available. We should be able to send a message back to the client side with this error and propagate it all the way to whoever called put().