Comments (4)
from assise.
I've added myself as a watcher, so I should be getting notifications.
@hayley-leblanc: There's no need to disable the DISTRIBUTED flag, as it has been deprecated. The steps you followed in the README should be sufficient. Since you've modified the storage configuration, I'd first double-check that you rebuilt both LibFS/KernFS and reran mkfs.sh successfully.
If you already did that, I'll likely need more context to know what might be causing this. Can you rerun KernFS in gdb and share the stack trace? You will need to first recompile KernFS with the -g flag.
from assise.
I double-checked that I cleaned and rebuilt LibFS and KernFS, ran change_dev_size.py, re-ran mkfs.sh, etc. with the new configuration, but I'm still running into the issue. Here's the output from running KernFS in gdb:
Starting program: /usr/bin/numactl -N0 -m0 kernfs
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
process 3005 is executing new program: /home/novavm/vmshare/assise/kernfs/tests/kernfs
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
initialize file system
dev-dax engine is initialized: dev_path /dev/dax0.0 size 3072 MB
[New Thread 0x7fff371ff700 (LWP 3009)]
[New Thread 0x7fff369fe700 (LWP 3010)]
[New Thread 0x7fff361fd700 (LWP 3011)]
[New Thread 0x7fff359fc700 (LWP 3012)]
[New Thread 0x7fff351fb700 (LWP 3013)]
[New Thread 0x7fff349fa700 (LWP 3014)]
[New Thread 0x7fff341f9700 (LWP 3015)]
[New Thread 0x7fff339f8700 (LWP 3016)]
[New Thread 0x7fff331f7700 (LWP 3017)]
Reading root inode with inum: 1fetching node's IP address..
Process pid is 3005
ip address on interface 'lo' is 127.0.0.1
cluster settings:
--- node 0 - ip:127.0.0.1
[New Thread 0x7fff329f6700 (LWP 3020)]
MLFS cluster initialized
[Local-Server] Listening on port 12345 for connections. interrupt (^C) to exit.
Adding connection with sockfd: 0
[New Thread 0x7fff321f5700 (LWP 3031)]
Adding connection with sockfd: 1
RECV <-- MSG_INIT [pid 0]
[New Thread 0x7fff319f4700 (LWP 3032)]
[add_peer_socket():80] Peer connected (ip: 127.0.0.1, pid: 3025)
[add_peer_socket():98] Established connection with 127.0.0.1 on sock:0 of type:0 and peer:0x7fff30e0f000
RECV <-- MSG_INIT [pid 2]
Adding connection with sockfd: 2
SEND --> MSG_SHM [paths: /shm_recv_0|/shm_send_0]
start shmem_poll_loop for sockfd 0
[add_peer_socket():98] Established connection with 127.0.0.1 on sock:1 of type:2 and peer:0x7fff30e0f000
SEND --> MSG_SHM [paths: /shm_recv_1|/shm_send_1]
start shmem_poll_loop for sockfd 1
[New Thread 0x7fff30bff700 (LWP 3033)]
RECV <-- MSG_INIT [pid 1]
[add_peer_socket():98] Established connection with 127.0.0.1 on sock:2 of type:1 and peer:0x7fff30e0f000
SEND --> MSG_SHM [paths: /shm_recv_2|/shm_send_2]
start shmem_poll_loop for sockfd 2
00000000000000000000000000000001
[New Thread 0x7fff2ffff700 (LWP 3034)]
[New Thread 0x7fff2f7fe700 (LWP 3035)]
Adding connection with sockfd: 3
[New Thread 0x7fff2effd700 (LWP 3048)]
Adding connection with sockfd: 4
RECV <-- MSG_INIT [pid 0]
[add_peer_socket():98] Established connection with 127.0.0.1 on sock:3 of type:0 and peer:0x7fff30e0f000
[New Thread 0x7fff2e7fc700 (LWP 3049)]
SEND --> MSG_SHM [paths: /shm_recv_3|/shm_send_3]
Adding connection with sockfd: 5
RECV <-- MSG_INIT [pid 2]
start shmem_poll_loop for sockfd 3
[New Thread 0x7fff2dbff700 (LWP 3050)]
[add_peer_socket():98] Established connection with 127.0.0.1 on sock:4 of type:2 and peer:0x7fff30e0f000
SEND --> MSG_SHM [paths: /shm_recv_4|/shm_send_4]
start shmem_poll_loop for sockfd 4
RECV <-- MSG_INIT [pid 1]
[add_peer_socket():98] Established connection with 127.0.0.1 on sock:5 of type:1 and peer:0x7fff30e0f000
SEND --> MSG_SHM [paths: /shm_recv_5|/shm_send_5]
start shmem_poll_loop for sockfd 5
00000000000000000000000000000011
[New Thread 0x7fff2cdff700 (LWP 3051)]
[New Thread 0x7fff2c5fe700 (LWP 3052)]
Thread 17 "kernfs" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fff2effd700 (LWP 3048)]
0x00007ffff7f4b9dd in init_replication (remote_log_id=remote_log_id@entry=2, peer=0x7ffff746d0c0, begin=begin@entry=644609, size=size@entry=906753, addr=addr@entry=0, end=0x7fff2dc0f020) at ./global/mem.h:36
36 return calloc(1, size);
And the stack trace:
#0 0x00007ffff7f4b9dd in init_replication (
remote_log_id=remote_log_id@entry=2, peer=0x7ffff746d0c0,
begin=begin@entry=644609, size=size@entry=906753, addr=addr@entry=0,
end=0x7fff2dc0f020) at ./global/mem.h:36
#1 0x00007ffff7f4d24b in register_peer_log (peer=0x7fff30e0f000,
find_id=<optimized out>) at distributed/peer.c:271
#2 0x00007ffff7f57d31 in signal_callback (msg=0x7ffff789f008) at fs.c:2389
#3 0x00007ffff7b11e09 in shmem_poll_loop (sockfd=sockfd@entry=3)
at shmem_ch.c:106
#4 0x00007ffff7b121a6 in local_server_thread (arg=<optimized out>)
at shmem_ch.c:339
#5 0x00007ffff7d18609 in start_thread (arg=<optimized out>)
at pthread_create.c:477
#6 0x00007ffff7e54293 in clone ()
at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
from assise.
It seems your segfault was due to an outdated mkdir_user script. It was calling init_fs() explicitly, which is not needed in the case of Assise (since this function is called automatically by LibFS). I've introduced a patch that addresses this.
Please pull and rebuild LibFS, KernFS, and the tests directory. Let me know if you're still having issues.
from assise.