
Comments (12)

wreda commented on July 20, 2024

I assume you weren't able to run the RPC test. If so, then the error is not Assise-related. The LD_PRELOAD or use of emulated NVM shouldn't be a factor here.

I haven't encountered this particular error myself but, if I had to guess, it could simply be a driver issue. It might make sense to first check whether the MLNX_OFED drivers are properly installed and that the required modules are loaded in your kernel (e.g. libmlx5, libmlx4). That could be the culprit. If that doesn't help, you can try posting this on the Mellanox community forums.

from assise.

wreda commented on July 20, 2024

I think this is likely due to Assise not finding the proper interface. Can you change rdma_intf at rpc_interface.h#L24 to your RDMA network interface name and rebuild? I presume in your case that should be enp65s0f0.
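For reference, that change is a one-line edit (a sketch only; the exact declaration at rpc_interface.h#L24 may differ in your checkout):

```c
/* libfs/lib/rdma/rpc_interface.h -- sketch; exact form may differ */
char rdma_intf[] = "enp65s0f0";  /* was "ib0"; use your RDMA NIC's name */
```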


agnesnatasya commented on July 20, 2024

Hi Waleed,

Result

Thank you very much for your help. I set rdma_intf = enp65s0f0 on both nodes, and I changed utils/rdma_setup.sh from ib0 to enp65s0f0 as well, but it still segfaults.

The error message is slightly different on each node. On the 10.10.1.3 node, it says

initialize file system
dev-dax engine is initialized: dev_path /dev/dax0.0 size 8192 MB
Reading root inode with inum: 1fetching node's IP address..
Process pid is 19046
ip address on interface 'enp65s0f0' is 10.10.1.3
cluster settings:
--- node 0 - ip:10.10.1.2
--- node 1 - ip:10.10.1.3
./run.sh: line 15: 19046 Segmentation fault      LD_LIBRARY_PATH=../build:../../libfs/lib/nvml/src/nondebug/ LD_PRELOAD=../../libfs/lib/jemalloc-4.5.0/lib/libjemalloc.so.2 MLFS_PROFILE=1 numactl -N0 -m0 $@

On the 10.10.1.2 node it says

initialize file system
dev-dax engine is initialized: dev_path /dev/dax0.0 size 8192 MB
Reading root inode with inum: 1fetching node's IP address..
Process pid is 9886
ip address on interface 'enp65s0f0' is 10.10.1.2
cluster settings:
--- node 0 - ip:10.10.1.2
--- node 1 - ip:10.10.1.3
Connecting to KernFS instance 1 [ip: 10.10.1.3]
./run.sh: line 15:  9886 Segmentation fault      LD_LIBRARY_PATH=../build:../../libfs/lib/nvml/src/nondebug/ LD_PRELOAD=../../libfs/lib/jemalloc-4.5.0/lib/libjemalloc.so.2 MLFS_PROFILE=1 numactl -N0 -m0 $@

There is an additional line: Connecting to KernFS instance 1 [ip: 10.10.1.3].

Debugging

Through GDB, it also looks like the rdma_cm_id struct is still NULL when rdma_bind_addr or rdma_resolve_addr is called.
The values of the other visible variables are as follows:
add_connection (ip=0x7ffff5335124 "10.10.1.3", port=0x7ffff521f010 "12345", app_type=0, pid=0, ch_type=<optimized out>, polling_loop=1)
and
addr= {sin6_family = 10, sin6_port = 0, sin6_flowinfo = 0, sin6_addr = {__in6_u = {__u6_addr8 = '\000' <repeats 15 times>, __u6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0}, __u6_addr32 = {0, 0, 0, 0}}}, sin6_scope_id = 0}

Do you happen to know the cause of this problem? Could it have something to do with connecting to the port on the other node? I have allowed port 12345 on both nodes.

Thank you very much for your help!


wreda commented on July 20, 2024

Thanks for the debugging effort! I suspect this is likely a firewall issue.

To test connectivity, you can try running the RPC application in lib/rdma/tests/ and see if it also produces an error. You can use the following commands: ./rpc_client <ip> <port> <iters> and ./rpc_server <port>. I've added additional checks to libfs/lib/rdma/agent.c to avoid segfaults; the error codes might help indicate the issue.


agnesnatasya commented on July 20, 2024

Hi Waleed,

Thank you very much for the checks in libfs/lib/rdma/agent.c! After running with the new version, I received error code 19; it looks like Assise is unable to find the device.

Debugging

Here are some of my debugging efforts:

  1. I traced again using GDB and found that rdma_event_channel ec is NULL when rdma_create_id() is called, which I suspect is why rdma_create_id() fails. After that call, the return value is -1, the error code is 19, and rdma_cm_id = NULL.

    • I tried to change libfs/lib/rdma-core/librdmacm/cma.c's rdma_create_event_channel() function
      • I changed the device name from /dev/infiniband/rdma_cm to /dev/dax0.0 (the name of the DAX device on my machine)
      • I added some print statements, but nothing is printed. I think some of the binaries might not be removed during make clean and hence not rebuilt during cd deps; ./install_deps.sh; cd ... However, I did check libfs/lib/rdma-core/build and it is properly rebuilt, so I'm not sure why my newest changes to the code don't show up.
    • I am also a little unsure about the LD_PRELOAD variable. Is it supposed to be LD_PRELOAD=../../libfs/lib/jemalloc-4.5.0/lib/libjemalloc.so.2 or LD_PRELOAD="../../libfs/lib/jemalloc-4.5.0/lib/libjemalloc.so.2 ../../libfs/build/libmlfs.so"?
  2. I also considered another point of failure: sockaddr_in6 addr is an IPv6 socket address, while the IP I provide in rpc_interface.h is IPv4. However, I don't think this is what causes rdma_create_id() to fail, because that function does not use the variable addr.

Changes

Regarding your previous suggestion about the firewall: it was a great suggestion, thank you! I realised the firewall rule I had enabled was on a different network interface. I've now allowed incoming and outgoing traffic on port 12345 for both nodes on the network interface used by RDMA (enp65s0f0 in my case), but I still receive the above error.

Further information

I am also using NVM emulation instead of actual NVM.

Do you have any idea regarding the above error? Thank you very much for your help!


agnesnatasya commented on July 20, 2024

Hi Waleed,

Thank you very much, that was indeed the error: I did not have RDMA set up yet and was not aware of it during the setup. Do you mind if I add a sentence or two mentioning that a properly configured RDMA device and interface is a prerequisite?


wreda commented on July 20, 2024

Thanks for confirming.

Do you mind if I add a sentence or two mentioning that a properly configured RDMA device and interface is a prerequisite?

Absolutely! The README can definitely benefit from this. Feel free to open a pull request and I'll merge it.


agnesnatasya commented on July 20, 2024

Thank you Waleed for that!

Do you mind if I clarify a few things about Assise to help me write proper additional setup instructions?

  1. I assume that the KernFS in this repository is equivalent to the SharedFS in the original paper. Is this correct?
  2. I am a little confused about why there isn't a cluster manager in this GitHub setup. Is it because this prototype only supports hot replicas, and every node defined in rpc_interface.h hot_replicas[] is a hot replica, hence there is no need to set up a separate cluster manager?
  3. Are all nodes part of all the other nodes' replication chains in the general workload setup? Or is this supposed to be determined by the cluster manager's policy? If my assumption in question 2 is correct, are all nodes in hot_replicas[] part of all the other nodes' replication chains, since there is no cluster manager?

Thank you very much, Waleed, for your kind help in clarifying this!


wreda commented on July 20, 2024

Sorry for the delayed reply! Last few weeks were hectic.

  1. I assume that the KernFS in this repository is equivalent to the SharedFS in the original paper. Is this correct?

Yes, that's correct.

  2. I am a little confused about why there isn't a cluster manager in this GitHub setup. Is it because this prototype only supports hot replicas, and every node defined in rpc_interface.h hot_replicas[] is a hot replica, hence there is no need to set up a separate cluster manager?

Our prototype currently doesn't come with an interface to the cluster manager (ZooKeeper). Only hot replicas, as you noted, are supported for now.

  3. Are all nodes part of all the other nodes' replication chains in the general workload setup? Or is this supposed to be determined by the cluster manager's policy? If my assumption in question 2 is correct, are all nodes in hot_replicas[] part of all the other nodes' replication chains, since there is no cluster manager?

Correct, all nodes defined in hot_replicas are part of the same replica group.
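Concretely, the replica group is just the set of entries in that array. A hypothetical sketch (the actual field names and types in rpc_interface.h may differ; this only illustrates the membership semantics):

```c
/* Sketch only -- the real declaration in rpc_interface.h may differ. */
static const char *hot_replicas[] = {
    "10.10.1.2",  /* node 0 */
    "10.10.1.3",  /* node 1 */
};
/* Every node listed here is a hot replica in the same replica group,
   so each node participates in every other node's replication chain. */
```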


agnesnatasya commented on July 20, 2024

Thanks a lot Waleed for the clarification!


caposerenity commented on July 20, 2024

Hi Waleed,

Thank you very much, that was indeed the error: I did not have RDMA set up yet and was not aware of it during the setup. Do you mind if I add a sentence or two mentioning that a properly configured RDMA device and interface is a prerequisite?

@agnesnatasya
Hi, I ran into the same segmentation fault, and it also seems to be caused by rdma_cm_id = NULL. Could you share more details about how you set up RDMA? Thanks a lot~


agnesnatasya commented on July 20, 2024

Hi @caposerenity! Sure! In my case, I have a lab cluster with Mellanox adapters and the InfiniBand drivers installed, and I use that to establish the RDMA connection between the nodes.
If you have machines with a Mellanox adapter but without the drivers, you can install the drivers by following one of the online guides, depending on the version of the device; one such document is https://network.nvidia.com/related-docs/prod_software/Mellanox_IB_OFED_Driver_for_VMware_vSphere_User_Manual_Rev_1_8_1.pdf, but you can also find more casual tutorials online.
If you do not have machines with a Mellanox adapter, I am not sure there is a workaround. You can definitely run a single-node Assise, which is similar to Strata (a local filesystem).

