ut-osa / assise

License: GNU General Public License v2.0
assise's Issues

Not using Assise's libpmem during runtime.

Looks like we are not using Assise's libpmem library file (libfs/lib/nvml/src/nondebug/libpmem.so) during runtime.

LD_FLAGS = -lpthread -laio -lstdc++ -lm -lnuma -L$(NVML_DIR)/nondebug/ -lpmem -lrt -L$(RDMA_CORE_DIR)/build/lib -Wl,-rpath=$(RDMA_CORE_DIR)/build/lib #-Wl,-fuse-ld=gold
(The corresponding -Wl,-rpath entry for libpmem is missing.)
Due to this, at runtime Assise falls back to the system's libpmem (e.g., the one in /usr/lib/x86_64-linux-gnu/) instead.
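
Concretely, the missing piece would be something along these lines (a sketch; NVML_DIR is the variable already used in the LD_FLAGS line above, mirroring the existing rpath entry for rdma-core):

# append to the existing LD_FLAGS in the LibFS Makefile
LD_FLAGS += -Wl,-rpath=$(NVML_DIR)/nondebug/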

Here's the ldd output:

# ldd build/libmlfs.so
	linux-vdso.so.1 (0x00007ffce1d9e000)
	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f71a300d000)
	libaio.so.1 => /lib/x86_64-linux-gnu/libaio.so.1 (0x00007f71a3008000)
	libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f71a2e26000)
	libpmem.so.1 => /lib/x86_64-linux-gnu/libpmem.so.1 (0x00007f71a2de1000)   <-- system libpmem
	librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f71a2dd6000)
	librdma.so => /home/om/wspace/assise/libfs/lib/rdma/librdma.so (0x00007f71a2dc6000)
	libjemalloc.so.2 => /home/om/wspace/assise/libfs/lib/jemalloc-4.5.0/lib/libjemalloc.so.2 (0x00007f71a2d64000)
	libsyscall_intercept.so.0 => /home/om/wspace/assise/libfs/lib/syscall_intercept/install/lib/libsyscall_intercept.so.0 (0x00007f71a2c3f000)

After adding the -Wl,-rpath flag, here's the ldd output:

# ldd build/libmlfs.so
	linux-vdso.so.1 (0x00007fffbb9ff000)
	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fb73863c000)
	libaio.so.1 => /lib/x86_64-linux-gnu/libaio.so.1 (0x00007fb738637000)
	libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fb738455000)
	libpmem.so.1 => /home/om/wspace/assise/libfs/lib/nvml/src/nondebug/libpmem.so.1 (0x00007fb738421000)   <-- Assise's libpmem
	librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fb738416000)
	librdma.so => /home/om/wspace/assise/libfs/lib/rdma/librdma.so (0x00007fb738406000)
	libjemalloc.so.2 => /home/om/wspace/assise/libfs/lib/jemalloc-4.5.0/lib/libjemalloc.so.2 (0x00007fb7383a4000)
	libsyscall_intercept.so.0 => /home/om/wspace/assise/libfs/lib/syscall_intercept/install/lib/libsyscall_intercept.so.0 (0x00007fb73827f000)

Is this the intended behavior? We already link against Assise's libpmem at compile time (via the -L flag), which is what got me wondering.

Thanks

Update of inode link count not persisted

Consider the example scenario:
A new filesystem that has only /mlfs

  1. Check the number of nlinks for /mlfs; it returns 1.
  2. Create a directory /mlfs/A.
  3. Check the number of nlinks again; it returns 2 (from DRAM).
  4. (The first 3 steps run in the same process.)
  5. In a new process, check the number of nlinks for /mlfs; it returns 1 (it should be 2).

If my understanding is correct, the link count update is not persisted during log digestion.

I attempted a fix for this here: #21

Here's the sample program I used to test it:

create.c

#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <assert.h>

#include <mlfs/mlfs_interface.h>

#define PARENT_DIR "/mlfs"
#define CHILD_DIR "/mlfs/A"

int main() {
    init_fs();
    int ret;
    struct stat statbuf;

    lstat(PARENT_DIR, &statbuf);
    printf("nlinks before creating = %ld\n", statbuf.st_nlink);

    ret = mkdir(CHILD_DIR, 0777);
    assert(ret == 0);

    lstat(PARENT_DIR, &statbuf);
    printf("nlinks after creating = %ld\n", statbuf.st_nlink);
    shutdown_fs();
}

check.c

#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <assert.h>
#include <unistd.h>

#include <mlfs/mlfs_interface.h>

#define PARENT_DIR "/mlfs"
#define CHILD_DIR "/mlfs/A"

int main() {
    init_fs();

    int ret;
    struct stat statbuf;

    ret = access(CHILD_DIR, F_OK);
    assert(ret == 0);

    printf("%s exists!\n", CHILD_DIR);

    assert(lstat(PARENT_DIR, &statbuf) == 0);
    printf("nlinks = %ld\n", statbuf.st_nlink);

    shutdown_fs();
}

Program error in 16+ threads

Hello, we would like to test Assise on our cluster. Assise works well with multiple threads, but when the number of threads increases to 16, Assise stops working and shows the following error message:

Failed to acquire msg buffer. Buffer is locked. [error code: 0]

Do you have any ideas and suggestions?

Hitting assertion failure during replication

Hello,

I have configured 32GB of emulated NVM on my machine using the steps mentioned in the repository, for a 2-node cluster using RDMA. I have set the dev sizes to 8GB of NVM using utils/change_dev_size.py 8 0 0.

Then I start the cluster, and try to run the example mentioned in the repository: ./tests/run.sh iotest sw 2G 4K 1

However, libfs fails with the following assertion most of the time:

Assertion failed: src/distributed/replication.c, start_rsync_session(), 995 at 'peer->remote_start <= peer->start_digest'

Could I get help with this issue? I am happy to provide access to my cluster if that helps solve the issue.

Setting up Cluster with Multiple Nodes - Segmentation Fault

Hi,

Setup

I am trying to set up a simple cluster with 2 nodes. These are the network interfaces of each node:

  1. Node 1

eno33: 128.110.219.19
enp65s0f0: 10.10.1.2

  2. Node 2

eno33: 128.110.219.27
enp65s0f0: 10.10.1.3

In each of these nodes, I set g_n_hot_rep to 2 and the RPC interface to

static struct peer_id hot_replicas[g_n_hot_rep] = {                                                         
 { .ip = "10.10.1.2", .role = HOT_REPLICA, .type = KERNFS_PEER},                                   
 { .ip = "10.10.1.3", .role = HOT_REPLICA, .type = KERNFS_PEER},
};

I run KernFS starting from the node that has 10.10.1.3 as its interface.

Result

I received a segmentation fault:

initialize file system
dev-dax engine is initialized: dev_path /dev/dax0.0 size 8192 MB
Reading root inode with inum: 1fetching node's IP address..
Process pid is 4013
ip address on interface 'ib0' is 10.10.1.2
cluster settings:
--- node 0 - ip:10.10.1.2
--- node 1 - ip:10.10.1.3
Connecting to KernFS instance 1 [ip: 10.10.1.3]
./run.sh: line 15:  4013 Segmentation fault      LD_LIBRARY_PATH=../build:../../libfs/lib/nvml/src/nondebug/ LD_PRELOAD=../../libfs/lib/jemalloc-4.5.0/lib/libjemalloc.so.2 MLFS_PROFILE=1 numactl -N0 -m0 $@

Debugging

After debugging, it looks like the segmentation fault comes from libfs/lib/rdma/agent.c at lines 96 and 130: the rdma_cm_id struct returned by rdma_create_id is NULL.
I also ran the file system as a local file system, with g_n_hot_rep = 1 and the RPC interface set to localhost, and it works.
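
For reference, a defensive check like the following (a sketch only; the helper name is mine, but rdma_create_id() is the standard librdmacm call) would at least turn the crash into a readable error:

#include <stdio.h>
#include <stdlib.h>
#include <rdma/rdma_cma.h>

/* Sketch: fail loudly instead of dereferencing a NULL rdma_cm_id later. */
static struct rdma_cm_id *create_id_checked(struct rdma_event_channel *ec)
{
    struct rdma_cm_id *id = NULL;

    if (rdma_create_id(ec, &id, NULL, RDMA_PS_TCP) || !id) {
        perror("rdma_create_id");   /* e.g. no usable RDMA device or driver issue */
        exit(EXIT_FAILURE);
    }
    return id;
}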

Do you mind helping me with this problem? Thank you very much!

Segmentation fault on pthread_create when initializing connections

Note that this issue is not from Assise but from the syscall_intercept library. It will be useful if your system uses a recent glibc (one whose pthread_create() uses the clone3 system call).

When initializing Assise, threads for communication between LibFS and KernFS are created via pthread_create(). On recent systems, pthread_create() invokes the SYS_clone3 system call instead of SYS_clone. Unfortunately, the current syscall_intercept does not handle this syscall correctly, so it causes a segmentation fault inside the syscall_intercept library.

If you are struggling with this segmentation fault, you may want to fix the syscall_intercept code based on this issue.

As of today (09/12/2022), that code still does not handle the issue properly.

Hope this knowledge helps you.

How to set up log recovery in Assise?

I am trying to do a simple crash consistency test for Assise in local mode (i.e., non-distributed), using only NVM.

Here's what the test does:

do_crash.c

  1. Create a file
  2. _exit(0) // so that exit handlers are not invoked (and the log is therefore not digested). Note that we do not call fsync.

check_crash.c

  1. Execute access call to check if file created in do_crash.c is present

check_crash.c does not seem to pass, i.e., it says the file is absent. The test passes if a clean exit is done (by commenting out _exit(0)).

I believe this is because log recovery is not enabled. How do I set it up so I can test crash consistency?
I tried adding a digest call in LibFS's init_log(), but the n_digest value it loads does not seem to reflect the correct value. I may be wrong here, but n_digest appears to be stored in DRAM rather than NVM: it only seems to be updated in DRAM after a transaction completes, not in NVM. I don't have a deep understanding of the codebase, so I'm probably missing something here.

It would be great if you could help me in setting it up. I believe this scenario is similar to the OS fail-over experiment described in the paper.

I've also included the do_crash.c and check_crash.c programs here for reference.

Thanks in advance for your help!

do_crash.c

#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/wait.h>
#include <fcntl.h>
#include <unistd.h>
#include <string.h>
#include <assert.h>
#include <stdlib.h>

#define FILE_NAME "/mlfs/foo"

int main() {
    int ret;
    int child;
    int wstatus;
    int fd;


    if(remove(FILE_NAME) == 0) {
        printf("Deleted file successfully!\n");
    } else {
        printf("Deletion of file unsuccessful, probably does not exist!\n");
    }

    printf("Tring to open\n");
    fd = open(FILE_NAME, O_CREAT | O_RDWR, 0644);
    assert(fd >=0 );
    printf("Open completed\n");
    _exit(0);  // Commenting this line ensures remaining logs are digested due to invocation of Assise's exit handler

    close(fd);
}

check_crash.c

#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <string.h>
#include <assert.h>

#define FILE_NAME "/mlfs/foo"

int main() {
    int ret;

    ret = access(FILE_NAME, F_OK);
    printf("The return value is: %d\n",ret);
    assert(ret == 0);
    printf("The file is present!\n");

    return 0;
}

Running the warm replicas experiment from the paper

Hello,

I am interested in replicating the reserve replica experiment ("LevelDB random read latencies with reserve replica") from the Assise paper, and I was wondering if there are any specific directions I should follow for setting up reserve replicas.

Related questions I had:

  1. Do I need to set the MLFS_REPLICA flag (or rather, unset the -DMASTER) on the reserve replica KernFS?
  2. Do I need SSDs to set up reserve replicas? Would I set the USE_SSD flag and the SSD_READ_REDIRECT flag? In addition, in the mkfs.sh script, instead of mkfs.mlfs 1, would I run mkfs.mlfs 2?

Thank you!

Can't unmount and re-mount in same process

Hi,

I am actually having this issue with Strata, but I think it exists in Assise as well. I have a program that attempts to perform the following set of steps twice:

  1. Initialize a new instance of Strata (calls mkfs on two emulated PM devices, run a command to set up kernfs, run init_fs to start the libfs)
  2. Create, fsync, and close a file in Strata
  3. Unmount Strata by calling shutdown_fs() and killing the kernfs process

In the first iteration, everything works as expected. The second time, when I get to step 2, the process is just killed. It appears to be killed while Strata is trying to open the file, because none of my error handling code after the open() call runs. Strata doesn't print any error messages.

I noticed that LibFS only initializes the file system in init_fs() if a variable named initialized is 0. init_fs() sets this variable to 1, but shutdown_fs() doesn't set it back to 0. Is this intentional? I added a line in shutdown_fs() so that initialized is set back to 0 when the system is shut down, and things started working as expected.
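
For reference, this is roughly the change (a sketch only; the flag name comes from the description above, the signature is abbreviated, and the rest of shutdown_fs() is elided):

/* LibFS: sketch of the change described above, not the exact upstream code */
static int initialized = 0;      /* flag that init_fs() checks before mounting */

void shutdown_fs(void)           /* signature abbreviated for this sketch */
{
    /* ... existing teardown: flush/digest the log, close connections ... */

    initialized = 0;             /* allow a later init_fs() in the same process */
}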

Also - is there a way to shut down kernfs cleanly from an external process? I see that it has a shutdown_fs() function but I don't immediately see a way to invoke it externally, and I'd like to be able to umount kernfs after running arbitrary workloads.

Thanks!

./run.sh: line 15: 30193 Segmentation fault (core dumped)

I'm interested in distributed file systems and am trying to install Assise. I encountered problems performing this step:
https://github.com/ut-osa/assise#6-run-kernfs

cd kernfs/tests
sudo ./run.sh kernfs

The output is as follows:

initialize file system
dev-dax engine is initialized: dev_path /dev/pmem5 size 61440 MB
Reading root inode with inum: 1fetching node's IP address..
Process pid is 31305
ip address on interface 'ib0' is 10.0.0.53
cluster settings:
--- node 0 - ip:10.0.0.53
--- node 1 - ip:10.0.0.52
--- node 2 - ip:10.0.0.51
--- node 3 - ip:10.0.0.50
Connecting to KernFS instance 1 [ip: 10.0.0.52]
./run.sh: line 15: 31305 Segmentation fault      (core dumped) LD_LIBRARY_PATH=../build:../../libfs/lib/nvml/src/nondebug/ LD_PRELOAD=../../libfs/lib/jemalloc-4.5.0/lib/libjemalloc.so.2 MLFS_PROFILE=1 numactl -N0 -m0 $@

And the output of dmesg

[  323.619063] kernfs[3643]: segfault at 220 ip 00007f1441bb0761 sp 00007f053c3f54c0 error 4 in librdmacm.so.1.3.30.0[7f1441baa000+17000]
[  323.619098] kernfs[3538]: segfault at 220 ip 00007f1441bb304a sp 00007ffd22e7c700 error 4 in librdmacm.so.1.3.30.0[7f1441baa000+17000]

I'm sure that our RDMA devices and PMEM work well.

Access control in Assise

Hi,

We're working on implementing access control using Assise leases. We have a proposed methodology, but have a few questions about the code.

Before presenting the questions, here's our current proposed methodology:

  • Each LibFS registers with a SharedFS upon initialization. We would add an extra step where the LibFS shares its owner and the owner's primary group with the local SharedFS.
    • The local SharedFS would track these values in a map (PID to owner and group).
    • The map could be backed up to a private file for crash consistency.
    • Any LibFS requests must first go through the local SharedFS, so if we need to forward this information to another SharedFS, any forwarded requests would also include this owner + group information.
  • The SharedFS holding the requested file/directory will make the lease decision. Three new fields in the inode, uid, gid, and perms, will be added.
    • The process's uid and gid are compared against the inode's uid and gid to determine its class (user, group, or other).
    • The type of lease (read/write) determines which set of permission bits in perms to check against (a sketch of this check follows the list).
  • If the lease is granted, it will behave identically to current Assise, regardless of whether the permissions of the file/directory change.
    • If a process maliciously writes to its log when it's denied permissions (or has no lease in general), digesting those log entries should fail because the process didn't have the lease at that time.
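
Below is a minimal sketch of the permission check we have in mind (the inode fields uid/gid/perms and the lease-type names are our proposed additions, not existing Assise code):

#include <stdbool.h>
#include <sys/types.h>

/* Proposed (not existing) lease types and per-inode permission fields. */
enum lease_type { LEASE_READ, LEASE_WRITE };

struct inode_perm {
    uid_t  uid;
    gid_t  gid;
    mode_t perms;   /* standard rwxrwxrwx bits */
};

/* Would the requesting process be allowed a lease of this type? */
static bool lease_permitted(const struct inode_perm *ip, uid_t uid, gid_t gid,
                            enum lease_type type)
{
    mode_t want = (type == LEASE_WRITE) ? 02 : 04;   /* w or r bit */

    if (uid == ip->uid)
        return ip->perms & (want << 6);   /* user class */
    if (gid == ip->gid)
        return ip->perms & (want << 3);   /* group class */
    return ip->perms & want;              /* other class */
}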

Here are our questions:

  1. Do we also need to implement execute permissions?
  2. Read leases don't seem to be implemented in KernFS's modify_lease_state function.
    • Do we need to implement these?
    • Could we instead just do a naive check and return something like -EACCES if the permissions check doesn't work out?
  3. acquire_lease is commented out in namex, which is presumably where Assise goes to acquire leases.
    • Is this supposed to be left commented out?
    • If so, where does LibFS guide POSIX calls to acquire read/write leases?
  4. We propose to have SharedFSes track the owner / the owner's primary group of each local LibFS.
    • We can't find any existing data structure in libfs/src/mkfs/mkfs.c or kernfs/fs.c that explicitly sets up relationships between LibFSes and KernFSes.
    • Can we assume that these relationships are registered statically?
    • If so, how can we send process owner + group information to SharedFSes when a LibFS starts up? Is this a secure procedure, or should we assume processes can lie about their owner?
  5. Should there be any changes to the digestion of logs?
    • If our permission checks are correct, and digestion rejects writes from processes that lack(ed) the write leases at the time of their writes, then the answer should be no.
    • Are there any edge cases where lease history is unused for checking the validity of log writes?
  6. How can we run libfs as a non-privileged user?
    • For example, running ./run.sh iotest sw 2G 4K 1 (no sudo) yields the following error. It looks like something in the shim requires extra privileges.
    • Would it be okay to spoof LibFS owners / groups in our tests, if this is something we can't avoid?
dev-dax engine is initialized: dev_path /dev/dax0.0 size 4096 MB
fetching node's IP address..
Process pid is 19681
ip address on interface 'lo' is 127.0.0.1
cluster settings:
--- node 0 - ip:127.0.0.1
Connecting to KernFS instance 0 [ip: 127.0.0.1]
[Local-Client] Creating connection (pid:19681, app_type:0, status:pending) to 127.0.0.1:12345 on sockfd 0
[Local-Client] Creating connection (pid:19681, app_type:1, status:pending) to 127.0.0.1:12345 on sockfd 1
[Local-Client] Creating connection (pid:19681, app_type:2, status:pending) to 127.0.0.1:12345 on sockfd 2
In thread
In thread
In thread
SEND --> MSG_INIT [pid 0|19681]
RECV <-- MSG_SHM [paths: /shm_recv_0|/shm_send_0]
shm_open failed.
: Permission denied
  7. When running the lease test, e.g. sudo ./run.sh lease_test c 1 1, we cannot run the test because we get an error, for example: incorrect fd -2: file /mlfs/fileset/f0_7618.
    • Other tests, like many_files_test, work fine.
    • We also see similar errors in the output logs when running sudo ./run_lease_test_example.sh. What can we do to solve this?

Thanks for your help!

Assise with MPI

I'm trying to run the BTIO benchmark (link) with Assise over Intel Optane DCPMM to evaluate the performance of Assise and Optane. I have configured Assise on a single node and run the benchmark on that node using 4 MPI ranks (processes).
When running, I get the following error from each MPI rank:
mca_fbtl_posix_pwritev: error in writev:Bad file descriptor.
It seems that Assise does not intercept MPI read/write calls.
Assuming Assise is under continuous development, is it expected to support MPI I/O in the future?
Thank you.

Compile errors

Hello,

I ran into some compile errors when building the code out of the box.

Errors happened when I tried installing dependencies and compiling libfs, as shown in the screenshots below (compile_libfs and install_deps).

My system uses Debian 11 and gcc/g++ version 10.2.1.

Seg fault running Assise as local FS

Hi folks,

I am trying to set up Assise to run as a local file system but I'm having trouble getting it to run. I've been able to successfully build Assise, configure storage, run mkfs, and start up the KernFS/SharedFS process. I followed the instructions here to configure Assise to run as a single local file system. When I try to run a program from libfs/tests (I've been using mkdir_user but have tried a few others), the KernFS appears to segfault. I spent some time trying to figure out where it might be occurring without much luck, although it appears to occur before mkdir_user's main function actually runs.

I did make some small changes to Assise, although I don't think they are the cause of the issue. I want to run Assise on a very small emulated PM device (128 MB would be best, a couple GB at most) so I had to reduce the number of inodes and the size of each LibFS's log in order to prevent asserts from failing.

I'm running Assise on a QEMU/KVM virtual machine with 4 cores, 8GB of RAM, and Linux kernel 5.1. I've tried running it with 128MB, 1GB, 2GB, and 3GB of emulated PM and get the segmentation fault on all of them.

I also tried disabling the DISTRIBUTED compilation flag, but ran into build issues; I can post more details about that if I need to remove this flag to get things to work.

Thanks in advance for your help!

Memory Region Failure ibv_reg_mr failed [error code: 14]

Hi,

Problem

When setting up, I encountered the ibv_reg_mr failed [error code: 14] error, indicating that the requested memory region is a bad/invalid address (errno 14 is EFAULT).
Here are the relevant values from my setup:

  • At libfs/lib/global/global.h
    • g_log_size 32768UL // (128 MB)
  • At libfs/lib/rdma/mr.c
    • mrs[0] = {addr = 140733004644352, length = 4299161600}
    • mrs[1] = {addr = 140728713871360, length = 111071232}
  • At vi /etc/default/grub
    • GRUB_CMDLINE_LINUX="memmap=8G!4G"
  • Size configuration
    • python3 ./utils/change_dev_size.py 8 0 0
  • At /etc/security/limits.conf
    • * soft memlock unlimited
      * hard memlock unlimited
      

Do you mind helping me out with this problem?

Thank you!

Segmentation fault when leases are enabled

We're trying to run Assise with leases enabled. We set the -DUSE_LEASE flag in the KernFS and LibFS Makefiles respectively, and set up our file system using emulated NVM as shown in the readme.

Unfortunately, it seems that there is a segmentation fault occurring in iotest when LibFS acquires a lease. This only happens when leases are active: when they're disabled, the program runs normally. Here is the output of the KernFS and LibFS for the iotest from the readme:

KernFS:

sudo ./run.sh kernfs
initialize file system
dev-dax engine is initialized: dev_path /dev/dax0.0 size 4096 MB
Reading root inode with inum: 1fetching node's IP address..
Process pid is 25584
ip address on interface 'lo' is 127.0.0.1
cluster settings:
--- node 0 - ip:127.0.0.1
MLFS cluster initialized
[Local-Server] Listening on port 12345 for connections. interrupt (^C) to exit.
Adding connection with sockfd: 0
Adding connection with sockfd: 1
Adding connection with sockfd: 2
RECV <-- MSG_INIT [pid 0]
RECV <-- MSG_INIT [pid 1]
RECV <-- MSG_INIT [pid 2]
[add_peer_socket():80] Peer connected (ip: 127.0.0.1, pid: 25597)
[add_peer_socket():97] Established connection with 127.0.0.1 on sock:1 of type:1 and peer:0x7f8baa40f000
[add_peer_socket():97] Established connection with 127.0.0.1 on sock:0 of type:0 and peer:0x7f8baa40f000
SEND --> MSG_SHM [paths: /shm_recv_0|/shm_send_0]
start shmem_poll_loop for sockfd 0
[add_peer_socket():97] Established connection with 127.0.0.1 on sock:2 of type:2 and peer:0x7f8baa40f000
SEND --> MSG_SHM [paths: /shm_recv_2|/shm_send_2]
start shmem_poll_loop for sockfd 2
SEND --> MSG_SHM [paths: /shm_recv_1|/shm_send_1]
start shmem_poll_loop for sockfd 1
[discard_leases():933] discarding all leases for peer ID = 1
Connection terminated [sockfd:0]
Connection terminated [sockfd:1]
Exit server_thread 
Exit server_thread 
./run.sh: line 15: 25584 Segmentation fault      LD_LIBRARY_PATH=../build:../../libfs/lib/nvml/src/nondebug/ LD_PRELOAD=../../libfs/lib/jemalloc-4.5.0/lib/libjemalloc.so.2 MLFS_PROFILE=1 numactl -N0 -m0 $@

LibFS

sudo ./run.sh iotest sw 2G 4K 1
[tid:25597][device_init():50] dev id 1
dev-dax engine is initialized: dev_path /dev/dax0.0 size 4096 MB
[tid:25597][device_init():50] dev id 2
[tid:25597][device_init():50] dev id 3
[tid:25597][cache_init():293] allocating 262144 blocks for DRAM read cache
[tid:25597][read_superblock():504] [dev 1] superblock: size 883712 nblocks 856598 ninodes 300000 inodestart 2 bmap start 27087 datablock_start 27114
fetching node's IP address..
Process pid is 25597
ip address on interface 'lo' is 127.0.0.1
cluster settings:
[tid:25597][register_peer_log():245] assigning peer (ip: 127.0.0.1 pid: 0) to log id 0
--- node 0 - ip:127.0.0.1
Connecting to KernFS instance 0 [ip: 127.0.0.1]
[Local-Client] Creating connection (pid:25597, app_type:0, status:pending) to 127.0.0.1:12345 on sockfd 0
In thread
[Local-Client] Creating connection (pid:25597, app_type:1, status:pending) to 127.0.0.1:12345 on sockfd 1
In thread
[Local-Client] Creating connection (pid:25597, app_type:2, status:pending) to 127.0.0.1:12345 on sockfd 2
[tid:25597][init_rpc():148] awaiting remote KernFS connections
In thread
SEND --> MSG_INIT [pid 0|25597]
SEND --> MSG_INIT [pid 2|25597]
SEND --> MSG_INIT [pid 1|25597]
RECV <-- MSG_SHM [paths: /shm_recv_1|/shm_send_1]
RECV <-- MSG_SHM [paths: /shm_recv_0|/shm_send_0]
[tid:25599][add_peer_socket():63] found socket 1
[tid:25599][_find_peer():176] trying to find peer with ip 127.0.0.1 and pid 0 (peer count: 1 | sock count: 0)
[tid:25599][_find_peer():206] peer[0]: ip 127.0.0.1 pid 0
[add_peer_socket():97] Established connection with 127.0.0.1 on sock:1 of type:1 and peer:0x565200a47d30
start shmem_poll_loop for sockfd 1
RECV <-- MSG_SHM [paths: /shm_recv_2|/shm_send_2]
[tid:25598][add_peer_socket():63] found socket 0
[tid:25600][add_peer_socket():63] found socket 2
[tid:25598][_find_peer():176] trying to find peer with ip 127.0.0.1 and pid 0 (peer count: 1 | sock count: 1)
[tid:25598][_find_peer():191] sockfd[0]: ip 127.0.0.1 pid 0
[add_peer_socket():97] Established connection with 127.0.0.1 on sock:0 of type:0 and peer:0x565200a47d30
start shmem_poll_loop for sockfd 0
[tid:25600][_find_peer():176] trying to find peer with ip 127.0.0.1 and pid 0 (peer count: 1 | sock count: 2)
[tid:25600][_find_peer():191] sockfd[0]: ip 127.0.0.1 pid 0
[add_peer_socket():97] Established connection with 127.0.0.1 on sock:2 of type:2 and peer:0x565200a47d30
start shmem_poll_loop for sockfd 2
[tid:25597][rpc_bootstrap():909] peer send: |bootstrap |25597
[tid:25598][signal_callback():1357] received rpc with body: |bootstrap |1 on sockfd 0
[signal_callback():1370] Assigned LibFS ID=1
MLFS cluster initialized
[tid:25597][init_log():148] end of the log e7c00
init log dev 1 start_blk 916481 end 949248
[tid:25597][ialloc():647] get inode - inum 1
[tid:25597][init_fs():459] LibFS is initialized on dev 1
Total file size: 2147483648B
io size: 4096B
# of thread: 1
[tid:25597][mlfs_posix_mkdir():419] [POSIX] mkdir(/mlfs/)
[tid:25597][namex():274] namex: path /mlfs/, parent 1, name mlfs
[acquire_lease():430] LIBFS ID= 1 trying to acquire lease of type 2 for inum 1
./run.sh: line 5: 25597 Segmentation fault      LD_PRELOAD=../build/libmlfs.so MLFS_PROFILE=1 ${@}

My best guess is that LibFS segfaults when it tries to access the lease cache in acquire_lease(). The function calls lcache_find() and looks inside a hash table to find the lease. I have a feeling there is something incorrect about our setup (strata_access@kitten3-lom on the UT cluster), but it might not be, since the code works fine with leases off. Do you know where we could start to debug this? Thank you!

Multi-thread Filebench

Hi,

I am interested in Assise and am currently doing some build and test work on it.

Here is a problem I met when using filebench as benchmark tool:

I have built the filebench.mlfs executable in "assise/bench/filebench" and successfully run the "run_varmail.sh" script with good results.
But when I change "varmail_mlfs.f", specifically the "nrthreads" parameter from 1 to 2, there is a crash.

As you know, multi-thread performance is always important, and I would really like to see Filebench results on Assise.

Could you please help me find a way to run Filebench with multiple threads? Thanks a lot.

Confused about function compute_log_blocks in libfs/src/log/log.c

Hi folks,
When reading the function compute_log_blocks in libfs/src/log/log.c, we're confused about the code for case L_TYPE_FILE. In the else branch, if size is 3.5 × g_block_size_bytes, then after the shift only 3 blocks are added. However, according to our understanding, 4 blocks should be added (see the sketch after the function below). Would you mind explaining this to us? Thanks!

static uint32_t compute_log_blocks(struct logheader_meta *loghdr_meta)
{
	  struct logheader *loghdr = loghdr_meta->loghdr; 
	  uint8_t type, n_iovec; 
	  uint32_t nr_log_blocks = 0;
	  int i;
	  for (i = 0, n_iovec = 0; i < loghdr->n; i++) {
		  type = loghdr->type[i];
  
		  switch(type) {
			  case L_TYPE_UNLINK:
			  case L_TYPE_INODE_CREATE:
			  case L_TYPE_INODE_UPDATE:
			  case L_TYPE_ALLOC: {
				  nr_log_blocks++;
				  break;
			  } 
			  case L_TYPE_DIR_ADD:
			  case L_TYPE_DIR_RENAME:
			  case L_TYPE_DIR_DEL:
			  case L_TYPE_FILE: {
				  uint32_t size;
				  size = loghdr_meta->io_vec[n_iovec].size;
  
				  if (size < g_block_size_bytes)
					  nr_log_blocks++;
				  else
					  nr_log_blocks += 
						  (size >> g_block_size_shift);
				  n_iovec++;
				  break;
			  }
			  default: {
				  panic("unsupported log type\n");
				  break;
			  }
		  }
	  }
	  return nr_log_blocks;
}
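
For reference, the form we expected the else branch to take (a sketch of our reading, not a tested patch):

/* count a trailing partial block too, e.g. 3.5 blocks -> 4 log blocks */
nr_log_blocks += (size + g_block_size_bytes - 1) >> g_block_size_shift;

(For nonzero sizes this ceiling division would also cover the size < g_block_size_bytes case.)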

Setting up a cluster with 8 nodes

Hello,

I wish to create an Assise cluster with 8 nodes, with 3-way replication of data. How do I set up the nodes with replicas manually? I am guessing that we need to make certain modifications in libfs/src/distributed/rpc_interface.h.

For example, should I set g_n_hot_rep = 8 (the total number of nodes in the cluster)?
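
For context, here is roughly what I imagine the change in libfs/src/distributed/rpc_interface.h would look like, following the format of the existing hot_replicas array (the IP addresses below are placeholders, and I'm not sure whether this is how 3-way replica groups are actually expressed):

static struct peer_id hot_replicas[g_n_hot_rep] = {
    { .ip = "10.10.1.1", .role = HOT_REPLICA, .type = KERNFS_PEER },
    { .ip = "10.10.1.2", .role = HOT_REPLICA, .type = KERNFS_PEER },
    { .ip = "10.10.1.3", .role = HOT_REPLICA, .type = KERNFS_PEER },
    /* ... entries for the remaining five nodes ... */
};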

How do I configure which SharedFS replicates which parts of the cached file system namespace?

Thanks for your help.

Removing files doesn't work

Hi,

I have come across an issue where files can't be deleted from the /mlfs directory.

Here is an example of performing ls, followed by rm, and then ls again.

# ~kfirzv/bin/run_with_assise.sh ls -l /mlfs/

dev-dax engine is initialized: dev_path /dev/dax5.0 size 512000 MB
fetching node's IP address..
Process pid is 34257
ip address on interface 'lo' is 127.0.0.1
cluster settings:
--- node 0 - ip:127.0.0.1
Connecting to KernFS instance 0 [ip: 127.0.0.1]
[Local-Client] Creating connection (pid:34257, app_type:0, status:pending) to 127.0.0.1:12345 on sockfd 0
[Local-Client] Creating connection (pid:34257, app_type:1, status:pending) to 127.0.0.1:12345 on sockfd 1
[Local-Client] Creating connection (pid:34257, app_type:2, status:pending) to 127.0.0.1:12345 on sockfd 2
In thread
In thread
In thread
SEND --> MSG_INIT [pid 2|34257]
RECV <-- MSG_SHM [paths: /shm_recv_0|/shm_send_0]
[add_peer_socket():97] Established connection with 127.0.0.1 on sock:2 of type:2 and peer:0x7bf2a0
start shmem_poll_loop for sockfd 2
SEND --> MSG_INIT [pid 1|34257]
SEND --> MSG_INIT [pid 0|34257]
RECV <-- MSG_SHM [paths: /shm_recv_1|/shm_send_1]
[add_peer_socket():97] Established connection with 127.0.0.1 on sock:1 of type:1 and peer:0x7bf2a0
start shmem_poll_loop for sockfd 1
RECV <-- MSG_SHM [paths: /shm_recv_2|/shm_send_2]
[add_peer_socket():97] Established connection with 127.0.0.1 on sock:0 of type:0 and peer:0x7bf2a0
start shmem_poll_loop for sockfd 0
[signal_callback():1370] Assigned LibFS ID=1
MLFS cluster initialized
init log dev 1 start_blk 125564929 end 125827072
total 9216
---------- 1 root root 1048576 Jan  1  1970 mpi_hello_world_0
---------- 1 root root 1048576 Jan  1  1970 mpi_hello_world_1
---------- 1 root root 1048576 Jan  1  1970 mpi_hello_world_2
---------- 1 root root 1048576 Jan  1  1970 mpi_hello_world_3
---------- 1 root root 1048576 Jan  1  1970 mpi_hello_world_4
---------- 1 root root 1048576 Jan  1  1970 mpi_hello_world_5
---------- 1 root root 1048576 Jan  1  1970 mpi_hello_world_6
---------- 1 root root 1048576 Jan  1  1970 mpi_hello_world_7
---------- 1 root root 1048576 Jan  1  1970 mpi_hello_world_8


# -----------------------------


# ~kfirzv/bin/run_with_assise.sh rm -rf /mlfs/mpi_hello_*

dev-dax engine is initialized: dev_path /dev/dax5.0 size 512000 MB
fetching node's IP address..
Process pid is 34287
ip address on interface 'lo' is 127.0.0.1
cluster settings:
--- node 0 - ip:127.0.0.1
Connecting to KernFS instance 0 [ip: 127.0.0.1]
[Local-Client] Creating connection (pid:34287, app_type:0, status:pending) to 127.0.0.1:12345 on sockfd 0
[Local-Client] Creating connection (pid:34287, app_type:1, status:pending) to 127.0.0.1:12345 on sockfd 1
In thread
[Local-Client] Creating connection (pid:34287, app_type:2, status:pending) to 127.0.0.1:12345 on sockfd 2
In thread
In thread
SEND --> MSG_INIT [pid 1|34287]
SEND --> MSG_INIT [pid 0|34287]
RECV <-- MSG_SHM [paths: /shm_recv_0|/shm_send_0]
RECV <-- MSG_SHM [paths: /shm_recv_1|/shm_send_1]
[add_peer_socket():97] Established connection with 127.0.0.1 on sock:1 of type:1 and peer:0x21ac2a0
start shmem_poll_loop for sockfd 1
[add_peer_socket():97] Established connection with 127.0.0.1 on sock:0 of type:0 and peer:0x21ac2a0
start shmem_poll_loop for sockfd 0
SEND --> MSG_INIT [pid 2|34287]
RECV <-- MSG_SHM [paths: /shm_recv_2|/shm_send_2]
[add_peer_socket():97] Established connection with 127.0.0.1 on sock:2 of type:2 and peer:0x21ac2a0
start shmem_poll_loop for sockfd 2
[signal_callback():1370] Assigned LibFS ID=1
MLFS cluster initialized
init log dev 1 start_blk 125564929 end 125827072

# --------------------------

# ~kfirzv/bin/run_with_assise.sh ls -l /mlfs/

dev-dax engine is initialized: dev_path /dev/dax5.0 size 512000 MB
fetching node's IP address..
Process pid is 34306
ip address on interface 'lo' is 127.0.0.1
cluster settings:
--- node 0 - ip:127.0.0.1
Connecting to KernFS instance 0 [ip: 127.0.0.1]
[Local-Client] Creating connection (pid:34306, app_type:0, status:pending) to 127.0.0.1:12345 on sockfd 0
[Local-Client] Creating connection (pid:34306, app_type:1, status:pending) to 127.0.0.1:12345 on sockfd 1
[Local-Client] Creating connection (pid:34306, app_type:2, status:pending) to 127.0.0.1:12345 on sockfd 2
In thread
In thread
In thread
SEND --> MSG_INIT [pid 2|34306]
SEND --> MSG_INIT [pid 0|34306]
SEND --> MSG_INIT [pid 1|34306]
RECV <-- MSG_SHM [paths: /shm_recv_1|/shm_send_1]
RECV <-- MSG_SHM [paths: /shm_recv_0|/shm_send_0]
RECV <-- MSG_SHM [paths: /shm_recv_2|/shm_send_2]
[add_peer_socket():97] Established connection with 127.0.0.1 on sock:0 of type:0 and peer:0xac22a0
start shmem_poll_loop for sockfd 0
[add_peer_socket():97] Established connection with 127.0.0.1 on sock:2 of type:2 and peer:0xac22a0
start shmem_poll_loop for sockfd 2
[add_peer_socket():97] Established connection with 127.0.0.1 on sock:1 of type:1 and peer:0xac22a0
start shmem_poll_loop for sockfd 1
[signal_callback():1370] Assigned LibFS ID=1
MLFS cluster initialized
init log dev 1 start_blk 125564929 end 125827072
total 9216
---------- 1 root root 1048576 Jan  1  1970 mpi_hello_world_0
---------- 1 root root 1048576 Jan  1  1970 mpi_hello_world_1
---------- 1 root root 1048576 Jan  1  1970 mpi_hello_world_2
---------- 1 root root 1048576 Jan  1  1970 mpi_hello_world_3
---------- 1 root root 1048576 Jan  1  1970 mpi_hello_world_4
---------- 1 root root 1048576 Jan  1  1970 mpi_hello_world_5
---------- 1 root root 1048576 Jan  1  1970 mpi_hello_world_6
---------- 1 root root 1048576 Jan  1  1970 mpi_hello_world_7
---------- 1 root root 1048576 Jan  1  1970 mpi_hello_world_8

In addition, there seem to be files that are not deleted even after re-allocating the NVRAM between app-direct and memory mode, re-creating the namespace, and re-running mkfs. For instance, rm -rf /mlfs/* gives an error that some of these files can't be deleted even though they shouldn't exist anymore.

Is there any way to clean the cache of Assise?

Thanks,
Kfir

Cleanup of threads managing connections to SharedFS

I am trying to run a simple program that calls init_fs() and shutdown_fs() multiple times in the same process.

But I faced some issues with the socket connections not being cleaned up properly. My guess is that Assise previously relied on the OS cleaning up the socket descriptors when the process exits, which does not happen if we reinitialize LibFS in the same process.

I have taken a stab at fixing this and it works for my specific use case where I use it in local mode (strata mode).

Here is the pull request: #19

Would be great if you could review this.

Also, here is the sample program that I was trying to make work:

#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <string.h>
#include <assert.h>

#include <mlfs/mlfs_interface.h>


int main() {
    int i;
    for(i=1; i<=3; i++) {
        printf("Init %d \n", i);
        init_fs();

        printf("Shutdown %d \n", i);
        shutdown_fs();
        sleep(3);
    }

    return 0;
}

Thanks!

Benchmark Details

Hi,

I am interested in replicating the benchmark setup as detailed in the Assise paper, and I would like to ask some details about the NFS and CephFS configuration.

In the experimental configuration section, it is stated that:

  • Ceph
    • "machines are used as OSD and MDS replicas in Ceph"
    • "Ceph's cluster managers run on 2 additional testbed machines"
  • NFS
    • "NFS uses only one machine as server"

For Ceph,

  1. If the machine number is 2, as used in the write latency microbenchmark:
    a. What is the number of data pool replicas, metadata pool replicas, and MDS replicas, respectively?
  2. What does the 'Ceph cluster manager' mentioned in the paper refer to? Is it the MDS replicas?
    a. If yes, does it mean that the MDS replica number will be 2, because it is run on 2 additional testbed machines?
  3. As the kernel buffer cache is limited to 3GB for all file systems, does this refer only to the kernel page cache size? Are there any specifications on the MDS cache?

For NFS,

  1. If the machine number is 2, does it mean that there are 1 client and 1 server, or 2 clients and 1 server in the cluster?

For both,

  1. I am interested in how you set the Linux page cache size to 3GB and make sure that the other clients' kernel buffer caches are completely empty before reading. I tried some NFS mount options, but it still looks like the other client does some read-ahead on writes that happen on other clients, since the benchmark values I get for a Read-HIT and a Read-MISS are the same.

Thank you very much for the kind help!

Memory registration failed with errno: Cannot allocate memory

I'm using two nodes to run Assise, with 60GB of DRAM emulating NVM.

I got the error ibv_reg_mr failed [error code: 12]. This seems to be because ibv_reg_mr fails when registering such a large region.

initialize file system
dev-dax engine is initialized: dev_path /dev/dax0.0 size 49152 MB
Reading root inode with inum: 1fetching node's IP address..
Process pid is 34400
ip address on interface 'ib0' is 10.10.1.3
cluster settings:
--- node 0 - ip:10.10.1.3
--- node 1 - ip:10.10.1.2
Connecting to KernFS instance 1 [ip: 10.10.1.2]
[RDMA-Client] Creating connection (pid:0, app_type:0, status:pending) to 10.10.1.2:12345 on sockfd 0
[RDMA-Client] Creating connection (pid:0, app_type:1, status:pending) to 10.10.1.2:12345 on sockfd 1
[RDMA-Client] Creating connection (pid:0, app_type:2, status:pending) to 10.10.1.2:12345 on sockfd 2
[RDMA-Server] Listening on port 12345 for connections. interrupt (^C) to exit.
creating background thread to poll completions (blocking) test
register memory
registering mr #0 with addr:140431182528512 and size:4299161600
registeration failed with errno: Cannot allocate memory
ibv_reg_mr failed [error code: 12]

I kept reducing g_log_size, and at 4096 it works. How can I use a larger log size?

Thanks

About ZooKeeper

When running Assise, do I need to configure ZooKeeper first?
