Comments (5)
Adding Jongyul (@yulistic), who should be able to tell you about our Ceph/NFS configuration.
from assise.
For Ceph,
- In the write-latency microbenchmark, the number of data pools was 2. We deployed 1 MDS for the latency microbenchmark because it performs few metadata operations. For the scalability microbenchmark and the mail server experiment, we used the same number of MDS daemons as server machines.
- It corresponds to Ceph's MON and MGR daemons. Conceptually, the "cluster manager" does not handle data or metadata; it only deploys, controls, and monitors the distributed file system processes (daemons).
- Only the kernel page cache size was limited. We used the default configuration for the MDS cache limit.
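For reference, a CephFS setup along these lines can be configured with standard Ceph commands (a sketch; the file system name cephfs and the 4 GB cache value are assumptions, not values from the experiments above):

```shell
# Scale the number of active MDS daemons to match the number of server machines.
ceph fs set cephfs max_mds 3

# The MDS cache limit was left at its default; if you did want to change it,
# it is controlled by mds_cache_memory_limit (bytes).
ceph config set mds mds_cache_memory_limit 4294967296

# Verify the MDS layout.
ceph fs status cephfs
```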
For NFS,
- One client and one server.
For both,
- I wasn't able to find a way to directly restrict the page cache size. Instead, we restricted the total memory usage of the microbenchmarks. The amount of page cache used by a microbenchmark can be obtained after it finishes. Using this observation, we set the limit on the microbenchmark's total memory usage (with cgroup) so that it ends up using roughly 3 GB of page cache.
You can clear the buffer cache with:
echo 3 > /proc/sys/vm/drop_caches
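The cgroup-based cap described above could be set up roughly as follows (a sketch assuming a cgroup v2 hierarchy mounted at /sys/fs/cgroup and root privileges; the group name fsbench, the 4 GB limit, and run_benchmark are made-up placeholders):

```shell
# Create a cgroup and cap total memory (anonymous + page cache) at 4 GB.
sudo mkdir /sys/fs/cgroup/fsbench
echo $((4 * 1024 * 1024 * 1024)) | sudo tee /sys/fs/cgroup/fsbench/memory.max

# Move the current shell into the group, then run the benchmark from it.
echo $$ | sudo tee /sys/fs/cgroup/fsbench/cgroup.procs
./run_benchmark   # hypothetical benchmark command

# Afterwards, the "file" field of memory.stat reports page-cache usage in bytes.
grep '^file ' /sys/fs/cgroup/fsbench/memory.stat
```

Tuning memory.max until the reported page-cache usage lands near 3 GB reproduces the procedure described in the comment.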
from assise.
Hi Jongyul,
Thank you very much for the details of the benchmark. Do you mind if I ask some clarifying questions about the experimental setup for measuring read latency and LevelDB performance specifically?
LevelDB Application Benchmark
- Assise
I am running Assise with 3 hot replicas. With 3 replicas, Assise seems to get stuck at some point during replication. I tested with the normal iotest and also LevelDB. I run the LevelDB benchmark under the bench/leveldb/mlfs directory using the command:
./run_bench.sh fillseq
and the program seems to hang.
For reference, the run completes successfully if Assise uses 2 hot replicas.
Read Microbenchmark
- NFS and Ceph. NFS: 1 client and 1 server. Ceph: 2 OSD replicas. Using the fio tool.
- HIT
- Client in NFS, Node 1 in Ceph: Write then immediately read
fio --name=test --bs=<bs> --readwrite=write --size=1G --filename=/mnt/local_dir/bench_1.txt # write
fio --name=test --bs=<bs> --readwrite=read --size=1G --filename=/mnt/local_dir/bench_1.txt # immediately read
- MISS
- Server in NFS, Node 1 in Ceph: Write
fio --name=test --bs=<bs> --readwrite=write --size=1G --filename=/mnt/local_dir/bench_1.txt
- Client in NFS, Node 2 in Ceph: Clean cache and read
echo 3 > /proc/sys/vm/drop_caches
fio --name=test --bs=<bs> --readwrite=read --size=1G --filename=/mnt/local_dir/bench_1.txt
- Assise: Run on 2 hot replicas
- HIT
- Node 1
MLFS_DIGEST_TH=100 ./run.sh iobench_lat wr 1000M <BS> 1
- MISS
- Node 1
./run.sh iobench_lat sw 1000M <BS> 1 # write so that the file exists
- Node 2
./run.sh iobench_lat sr 1000M <BS> 1 # read from another node.
Is the above configuration correct? The results I got from these tests differ from those presented in the paper and do not reflect hit vs. miss in most cases. The results I have are as follows:
- NFS: the results for both Hit and Miss are close to the Hit result presented in the paper for most block sizes.
- Ceph: the results for both Hit and Miss are close to the Miss result presented in the paper for most block sizes.
- Assise: the results do not seem correct, as the Hit case can show higher latency than the Miss case.
May I ask for help with the discrepancies in the benchmark results described above? Thank you!
from assise.
Please compare your results with the raw latency of DRAM and of your IB or RoCE network. The miss latency should include the network-crossing overhead, which is much higher than a local DRAM access (the hit latency). Then you will be able to figure out which configuration (miss or hit) is incorrect.
The numbers were measured with the microbenchmarks in bench/micro. It seems better to use the same benchmarks to reproduce the numbers.
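To get a feel for the scale of those baselines, local DRAM and RDMA latencies can be measured with standard tools (a sketch; lat_mem_rd is from lmbench and ib_read_lat is from the perftest package, both assumed to be installed, and the hostname is a placeholder):

```shell
# Local memory-load latency (ns), walking a 128 MB array with a 128-byte stride.
lat_mem_rd 128 128

# RDMA read latency at 4 KB message size: start the server side first,
# then point the client side at the server's hostname.
ib_read_lat -s 4096                 # on the server node
ib_read_lat -s 4096 <server-host>   # on the client node
```

A hit should land near the DRAM number; a miss should be at least one network round trip slower.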
Mentioning @wreda for Assise results.
from assise.
I am running Assise with 3 hot replicas. With 3 replicas, Assise seems to get stuck at some point during replication. I tested with the normal iotest and also LevelDB. I run the LevelDB benchmark under the bench/leveldb/mlfs directory using the command:
./run_bench.sh fillseq
and the program seems to hang.
For reference, the run completes successfully if Assise uses 2 hot replicas.
This could be a bug in the 3-replica configuration. Feel free to open another issue for this with error logs/stack trace, and I'll be happy to take a look.
Read Microbenchmark
Assise: Run on 2 hot replicas
HIT
- Node 1
MLFS_DIGEST_TH=100 ./run.sh iobench_lat wr 1000M <BS> 1
MISS
- Node 1
./run.sh iobench_lat sw 1000M <BS> 1 # write so that the file exists
- Node 2
./run.sh iobench_lat sr 1000M <BS> 1 # read from another node.
Is the above configuration correct? The results I got from these tests differ from those presented in the paper and do not reflect hit vs. miss in most cases. The results I have are as follows:
- Assise: the results do not seem correct, as the Hit case can show higher latency than the Miss case.
May I ask for help with the discrepancies in the benchmark results described above? Thank you!
The configuration for Assise looks fine. Note, however, that for larger IO sizes (> 4 KB) you might see worse performance in the HIT case, since it needs to do multiple hash-table lookups. In any case, you can try running LibFS with profiling enabled for both HIT and MISS:
MLFS_PROFILE=1 ./run.sh iobench_lat
This will provide a more fine-grained performance breakdown, which might help us pinpoint the cause of the discrepancy.
EDIT: Upon further thought, the HIT performance is likely worse here because your file size is not small enough to fit inside the log (assuming you're using the default log size of 1 GB). This causes the file to spill over to the other caches. To avoid this, either reduce your file size or increase the log size. For example, at 4 KB IO and a log size of 1 GB, your file should be ≤ 256 MB (to account for any metadata overheads).
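As a back-of-envelope check of that sizing rule (a sketch; the 4x log-space amplification factor is an assumption inferred from the 1 GB log / 256 MB file guidance above, not a documented Assise constant):

```shell
LOG_SIZE_MB=1024    # default log size: 1 GB
AMPLIFICATION=4     # assumed log-space amplification per 4 KB write
MAX_FILE_MB=$((LOG_SIZE_MB / AMPLIFICATION))
echo "file should be <= ${MAX_FILE_MB} MB"   # prints: file should be <= 256 MB
```

Doubling the log size to 2 GB would, under the same assumption, allow files up to about 512 MB.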
from assise.