networkblockdevice / nbd


Network Block Device

License: GNU General Public License v2.0

Shell 3.23% C 91.26% Makefile 1.58% M4 3.64% Emacs Lisp 0.02% Lex 0.09% Yacc 0.17%
nbd c storage network

nbd's Introduction

NBD README

Welcome to the NBD userland support files!

This package contains nbd-server and nbd-client.

To install the package, download the source and do the normal configure/make/make install dance. You'll need to install it on both the client and the server. Note that released nbd tarballs are found on sourceforge.

For compiling from git, do a checkout, install the SGML tools (docbook2man), and then run './autogen.sh' while inside your checkout. Then, see above.

Contributing

If you want to send a patch, please do not open a pull request; instead, send it to the mailing list.

Security issues

If you think you have found a security problem in NBD, please contact the mailing list. Do not just file an issue for this (although you may do so too if you prefer).

For embargoed issues, please contact Wouter Verhelst [email protected]

Using NBD

NBD is quite easy to use. First, on the client, you need to load the module and, if you're not using udev, to create the device nodes:

# modprobe nbd
# cd /dev
# ./MAKEDEV nbd0

(if you need more than one NBD device, repeat the above command for nbd1, nbd2, ...)

Next, write a configuration file for the server. An example looks like this:

# This is a comment
[generic]
    # The [generic] section is required, even if nothing is specified
    # there.
    # When either of these options are specified, nbd-server drops
    # privileges to the given user and group after opening ports, but
    # _before_ opening files.
    user = nbd
    group = nbd
[export1]
    exportname = /export/nbd/export1-file
    authfile = /export/nbd/export1-authfile
    timeout = 30
    filesize = 10000000
    readonly = false
    multifile = false
    copyonwrite = false
    prerun = dd if=/dev/zero of=%s bs=1k count=500
    postrun = rm -f %s
[otherexport]
    exportname = /export/nbd/experiment
    # The other options are all optional

The configuration file is parsed with GLib's GKeyFile, which parses key files as they are specified in the Freedesktop.org Desktop Entry Specification, as can be found at http://freedesktop.org/Standards/desktop-entry-spec. While this format was not intended to be used for configuration files, the glib API is flexible enough for it to be used as such.
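The prerun and postrun lines in the example configuration contain a %s placeholder. The following is a minimal sketch of how such a placeholder could be expanded, assuming %s stands for the file name of the export. The helper name expand_export_command is hypothetical and is not taken from nbd-server's source.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of nbd-server): returns a newly allocated
 * copy of `tmpl` in which every "%s" is replaced by `filename`.
 * The caller frees the result. Returns NULL on allocation failure. */
char *expand_export_command(const char *tmpl, const char *filename)
{
    size_t flen = strlen(filename);
    size_t count = 0;

    /* First pass: count placeholders so the output buffer can be sized. */
    for (const char *p = strstr(tmpl, "%s"); p != NULL; p = strstr(p + 2, "%s"))
        count++;

    /* Slightly over-allocates: each "%s" removed frees two bytes. */
    char *out = malloc(strlen(tmpl) + count * flen + 1);
    if (out == NULL)
        return NULL;

    /* Second pass: copy literal text and substitute each placeholder. */
    char *dst = out;
    const char *src = tmpl;
    const char *hit;
    while ((hit = strstr(src, "%s")) != NULL) {
        memcpy(dst, src, (size_t)(hit - src));
        dst += hit - src;
        memcpy(dst, filename, flen);
        dst += flen;
        src = hit + 2;
    }
    strcpy(dst, src);
    return out;
}
```

With the template "rm -f %s" and the file name /export/nbd/export1-file, this yields "rm -f /export/nbd/export1-file". The sketch does no shell quoting, so a file name containing spaces or shell metacharacters would need extra care.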

Now start the server:

nbd-server -C /path/to/configfile

Note that the filename must be an absolute path; i.e., something like /path/to/file, not ../file. See the nbd-server manpage for details on any available options.

Finally, you'll be able to start the client:

nbd-client <hostname> -N <export name> <nbd device>

e.g.,

nbd-client 10.0.0.1 -N otherexport /dev/nbd0

will use the second export in the above example (the one that exports /export/nbd/experiment).

nbd-client must be run as root; the same is not true for nbd-server (but do make sure that /var/run is writable by the user that nbd-server runs as; otherwise, you won't get a PID file, though the server will keep running).

There are packages (or similar) available for most current operating systems; see the "Packaging status" badge below for details.

For questions, please use the [email protected] mailing list.

Alternate implementations

Besides this project, the NBD protocol has been implemented by various other people. A (probably incomplete) list follows:

  • nbdkit is a multithreaded NBD server with a plugin architecture.
  • libnbd is a library to aid in writing NBD clients.
  • qemu contains an embedded NBD server, an embedded NBD client, and a standalone NBD server (qemu-nbd). They maintain a status document of their NBD implementation.
  • A GEOM gate-based client implementation for FreeBSD exists. It has not seen any updates since 2018, and only implements the client side (any server should run on FreeBSD unmodified, however).
  • A Windows client implementation exists as part of the RBD implementation of Ceph for Windows.
  • lwNBD is an NBD server library targeting bare-metal or OS-embedded systems. It has a plugin architecture.

Additionally, these implementations once existed but are now no longer maintained:

  • xnbd: This was an NBD implementation with a few extra protocol messages that allowed for live migration. Its code repository has disappeared.
  • enbd: This was an NBD implementation with a few extra protocol messages that allowed extra ioctl calls to be passed on (e.g., the "eject" message for a CD-ROM device that was being exported through NBD). It appears to no longer be maintained.
  • Hurd translator: There was a proof-of-concept implementation of the NBD protocol once as a translator for The Hurd. We do not know what its current status is.
  • Christoph Lohmann once wrote a client implementation for Plan 9. The link he provided us is now stale; we do not know what its current status is.

Badges

Download Network Block Device Coverity Scan Build Status CII badge Travis

Packaging status

nbd's People

Contributors

abligh, ahippo, alexeicolin, anoo1, bignaux, bonzini, corvuscorax, danielkucera, ebblake, eworm-de, fanael, ffontaine, folkertvanheusden, fstirlitz, hailfinger, jacmet, jan-krieg, joshtriplett, juhaerk, ldv-alt, marekpikula, merovius, panarom, reidrac, scrizt, tuomasjjrasanen, vapier, vivier, waveform80, yoe

nbd's Issues

A tiny issue in the protocol doc

NBD_CMD_FLUSH is modelled on the Linux kernel empty bio with REQ_FLUSH set. NBD_CMD_FLAG_FUA is modelled on the Linux kernel bio with REQ_FUA set. In case of ambiguity in this specification, the kernel documentation may be useful.

I noticed the Linux kernel has renamed REQ_FLUSH to REQ_PREFLUSH:
torvalds/linux@28a8f0d
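For readers unfamiliar with how the flush command and the FUA flag appear on the wire, here is a minimal sketch that packs an NBD request header in network byte order. The constant values and the 28-byte layout reflect my reading of the NBD protocol document and should be checked against it. The function name nbd_pack_request is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Values as I understand them from the NBD protocol document. */
#define NBD_REQUEST_MAGIC 0x25609513u
#define NBD_CMD_WRITE     1
#define NBD_CMD_FLUSH     3
#define NBD_CMD_FLAG_FUA  (1u << 0)
#define NBD_REQUEST_SIZE  28

/* Stores the low `nbytes` bytes of `value` in big-endian order. */
static void put_be(uint8_t *dst, uint64_t value, size_t nbytes)
{
    for (size_t i = 0; i < nbytes; i++)
        dst[i] = (uint8_t)(value >> (8 * (nbytes - 1 - i)));
}

/* Hypothetical helper: serializes one request header into `buf`. */
void nbd_pack_request(uint8_t buf[NBD_REQUEST_SIZE], uint16_t flags,
                      uint16_t type, uint64_t handle,
                      uint64_t offset, uint32_t length)
{
    put_be(buf + 0, NBD_REQUEST_MAGIC, 4);  /* request magic */
    put_be(buf + 4, flags, 2);              /* command flags, e.g. FUA */
    put_be(buf + 6, type, 2);               /* command type */
    put_be(buf + 8, handle, 8);             /* opaque handle echoed by the server */
    put_be(buf + 16, offset, 8);            /* byte offset into the export */
    put_be(buf + 24, length, 4);            /* length of the affected range */
}
```

A flush would be packed with type NBD_CMD_FLUSH and zero offset and length, while a write that must reach stable storage before the reply would use type NBD_CMD_WRITE with NBD_CMD_FLAG_FUA set in flags.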

Ubuntu 17.10 can't seem to get nbd working, getting segfault

I wanted to try nbd to add some swap space to an older machine that only has 512MB.

I have an Ubuntu 17.10 machine on which I've installed nbd-client and nbd-server with "sudo apt install nbd-client nbd-server".

I've made a 4GB file in a ramdisk with tmpfs:

sudo mkdir /mnt/a
sudo mount -t tmpfs -o size=4096M tmpfs /mnt/a
dd if=/dev/zero of=/mnt/a/NBDFILE count=$((1024*1024*1024*4/512)) status=progress
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 10.2248 s, 420 MB/s
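The count argument in the dd command above is the target size divided by dd's default block size of 512 bytes. A small sketch of that arithmetic follows. The function name blocks_for_size is made up for illustration.

```c
#include <stdint.h>

/* Number of blocks of `block_size` bytes needed to hold `total_bytes`,
 * rounding up. Returns 0 if block_size is 0. */
uint64_t blocks_for_size(uint64_t total_bytes, uint64_t block_size)
{
    if (block_size == 0)
        return 0;
    return total_bytes / block_size + (total_bytes % block_size != 0);
}
```

For 4 GiB (4294967296 bytes) and 512-byte blocks, this gives 8388608 blocks, which matches the byte total dd reports.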

Then I launch nbd-server:

nbd-server -C /dev/null 9000 /mnt/a/NBDFILE

and try to connect to it at 127.0.0.1:

sudo  nbd-client 127.0.0.1 9000 /dev/nbd0
Warning: the oldstyle protocol is no longer supported.
This method now uses the newstyle protocol with a default export
Negotiation: ..Error: Read failed: End of file
Exiting.

and when I look in my dmesg log, there is a segfault:

[451892.201945] nbd-server[18901]: segfault at 0 ip 00007feb584e1cbe sp 00007ffdf2e6d818 error 4 in libc-2.26.so[7feb58441000+1d6000]

I also tried two systems running Debian 9.3 32bit and got the same Read failed: End of file error.

Am I doing something wrong?

I tried it with sudo nbd-server and it didn't make a difference, still a segfault:

[452796.954020] nbd-server[19156]: segfault at 0 ip 00007f649bf6fcbe sp 00007ffe2e3652a8 error 4 in libc-2.26.so[7f649becf000+1d6000]
[452913.048412] nbd-server[19177]: segfault at 0 ip 00007f4ea6827cbe sp 00007ffd0d86d888 error 4 in libc-2.26.so[7f4ea6787000+1d6000]

===============================================

sudo apt install nbd-client
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following NEW packages will be installed:
nbd-client
0 upgraded, 1 newly installed, 0 to remove and 81 not upgraded.
Need to get 34.3 kB of archives.
After this operation, 128 kB of additional disk space will be used.
Get:1 http://us.archive.ubuntu.com/ubuntu artful/universe amd64 nbd-client amd64 1:3.15.2-3 [34.3 kB]
Fetched 34.3 kB in 0s (103 kB/s)
Preconfiguring packages ...
Selecting previously unselected package nbd-client.
(Reading database ... 402621 files and directories currently installed.)
Preparing to unpack .../nbd-client_1%3a3.15.2-3_amd64.deb ...
Unpacking nbd-client (1:3.15.2-3) ...
Processing triggers for ureadahead (0.100.0-20) ...
Processing triggers for systemd (234-2ubuntu12.1) ...
Processing triggers for man-db (2.7.6.1-2) ...
Setting up nbd-client (1:3.15.2-3) ...
update-initramfs: deferring update (trigger activated)
update-rc.d: warning: start and stop actions are no longer supported; falling back to defaults
Processing triggers for ureadahead (0.100.0-20) ...
Processing triggers for systemd (234-2ubuntu12.1) ...
Processing triggers for initramfs-tools (0.125ubuntu12) ...
update-initramfs: Generating /boot/initrd.img-4.13.0-38-generic

sudo apt install nbd-server
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following NEW packages will be installed:
nbd-server
0 upgraded, 1 newly installed, 0 to remove and 81 not upgraded.
Need to get 52.5 kB of archives.
After this operation, 164 kB of additional disk space will be used.
Get:1 http://us.archive.ubuntu.com/ubuntu artful/main amd64 nbd-server amd64 1:3.15.2-3 [52.5 kB]
Fetched 52.5 kB in 0s (125 kB/s)
Preconfiguring packages ...
Selecting previously unselected package nbd-server.
(Reading database ... 402607 files and directories currently installed.)
Preparing to unpack .../nbd-server_1%3a3.15.2-3_amd64.deb ...
Unpacking nbd-server (1:3.15.2-3) ...
Processing triggers for ureadahead (0.100.0-20) ...
Setting up nbd-server (1:3.15.2-3) ...

Creating config file /etc/nbd-server/config with new version
Adding system user `nbd' (UID 127) ...
Adding new group `nbd' (GID 136) ...
Adding new user `nbd' (UID 127) with group `nbd' ...
Not creating home directory `/etc/nbd-server'.
Processing triggers for systemd (234-2ubuntu12.1) ...
Processing triggers for man-db (2.7.6.1-2) ...
Processing triggers for ureadahead (0.100.0-20) ...

lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 17.10
Release: 17.10
Codename: artful

Export config parameter "maxconnections" seems to apply globally, not per export

From man -s5 nbd-server:

       maxconnections
              Optional; integer

              If specified, then it limits the number of opened connections for this export.

In real world usage, in the version supplied by ubuntu xenial (16.04), I observe that the maximum connections are enforced as a total for all exports, rather than per-export, as would be expected for an export-level option, and as documented here.
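To make the expected per-export behaviour concrete, here is a minimal sketch in which each export keeps its own connection counter. The structure and function names are hypothetical and do not reflect nbd-server's actual data structures.

```c
/* Hypothetical bookkeeping, one instance per export. */
struct export_limit {
    const char *name;
    int max_connections;   /* 0 means "no limit" in this sketch */
    int active;            /* connections currently open on this export */
};

/* Returns 1 and counts the connection if this export still has room,
 * otherwise returns 0. Other exports are not consulted. */
int export_try_connect(struct export_limit *e)
{
    if (e->max_connections > 0 && e->active >= e->max_connections)
        return 0;
    e->active++;
    return 1;
}

/* Releases one connection slot on this export. */
void export_disconnect(struct export_limit *e)
{
    if (e->active > 0)
        e->active--;
}
```

With two exports that each set maxconnections = 1, a second client on the first export would be refused while a first client on the second export would still be accepted. That is the behaviour the man page text suggests.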

Build fails on Illumos/OpenSolaris

nbd-server won't build on Illumos/OpenSolaris because of the lack of d_type (and the DT_* constants). The problem (and a solution) is explained here: http://blueslugs.com/2010/06/13/adirent-ch-adding-d_type-to-struct-dirent-on-opensolaris/

Currently I worked around this by commenting out the while loop in do_cfile_dir (I don't use that feature anyway).

I'd be happy to provide an Illumos/OmniOS VM if anyone has time to tackle this.
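One portable alternative to d_type is to call stat() on each directory entry and test the mode bits. The sketch below shows that approach. The function name entry_is_regular_file and the fixed buffer size are assumptions, and this is not the fix that was applied upstream.

```c
#include <stdio.h>
#include <sys/stat.h>

/* Returns 1 if dir/name is a regular file, 0 if it is something else,
 * and -1 if the path is too long or stat() fails.
 * stat() follows symbolic links, so a link to a regular file counts. */
int entry_is_regular_file(const char *dir, const char *name)
{
    char path[4096];   /* assumed large enough for this sketch */
    struct stat st;

    int n = snprintf(path, sizeof(path), "%s/%s", dir, name);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;
    if (stat(path, &st) != 0)
        return -1;
    return S_ISREG(st.st_mode) ? 1 : 0;
}
```

A readdir() loop could call this with the directory path and de->d_name instead of switching on de->d_type, at the cost of one extra system call per entry.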

nbd-server.c: In function ‘do_cfile_dir’:
nbd-server.c:529:12: error: ‘struct dirent’ has no member named ‘d_type’
   switch(de->d_type) {
            ^
nbd-server.c:530:9: error: ‘DT_UNKNOWN’ undeclared (first use in this function)
    case DT_UNKNOWN:
         ^
nbd-server.c:530:9: note: each undeclared identifier is reported only once for each function it appears in
nbd-server.c:541:9: error: ‘DT_REG’ undeclared (first use in this function)
    case DT_REG:
         ^
nbd-server.c: In function ‘send_reply’:
nbd-server.c:1254:3: warning: initialization from incompatible pointer type [enabled by default]
   { &magic, sizeof(magic) },
   ^
nbd-server.c:1254:3: warning: (near initialization for ‘v_data[0].iov_base’) [enabled by default]
nbd-server.c:1255:3: warning: initialization from incompatible pointer type [enabled by default]
   { &opt, sizeof(opt) },
   ^
nbd-server.c:1255:3: warning: (near initialization for ‘v_data[1].iov_base’) [enabled by default]
nbd-server.c:1256:3: warning: initialization from incompatible pointer type [enabled by default]
   { &reply_type, sizeof(reply_type) },
   ^
nbd-server.c:1256:3: warning: (near initialization for ‘v_data[2].iov_base’) [enabled by default]
nbd-server.c:1257:3: warning: initialization from incompatible pointer type [enabled by default]
   { &datsize, sizeof(datsize) },
   ^
nbd-server.c:1257:3: warning: (near initialization for ‘v_data[3].iov_base’) [enabled by default]
make[2]: *** [nbd_server-nbd-server.o] Error 1
make[2]: Leaving directory `/root/nbd'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/root/nbd'
make: *** [all] Error 2

inetd mode is broken

Whenever nbd-server is launched in inetd mode (passing 0 as the port on the command line), it immediately exits, after sending these two messages to syslog:

getsockname failed: Bad file descriptor
Failed to set peername

I managed to make it work by applying this patch (against 3.17):

--- nbd-3.17/nbd-server.c
+++ nbd-3.17/nbd-server.c
@@ -3538,19 +3538,6 @@ int main(int argc, char *argv[]) {
 
        if(serve) {
                g_array_append_val(servers, *serve);
-
-               if(strcmp(genconf.modernport, "0")==0) {
-#ifndef ISSERVER
-                       err("inetd mode requires syslog");
-#endif
-                       CLIENT* client = g_malloc(sizeof(CLIENT));
-                       client->net = -1;
-                       if(!commit_client(client, serve)) {
-                               exit(EXIT_FAILURE);
-                       }
-                       mainloop_threaded(client);
-                       return 0;
-               }
        }
     
        if(!servers || !servers->len) {
@@ -3594,5 +3581,18 @@ int main(int argc, char *argv[]) {
 #endif
                                        ));
 #endif
+
+       if(strcmp(genconf.modernport, "0")==0) {
+#ifndef ISSERVER
+               err("inetd mode requires syslog");
+#endif
+               CLIENT* client = negotiate(0, servers, &genconf);
+               if(!client) {
+                       exit(EXIT_FAILURE);
+               }
+               mainloop_threaded(client);
+               return 0;
+       }
+
        serveloop(servers, &genconf);
 }

But I feel this patch may be too heavy-handed, so I'm not submitting it as a pull request.
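The syslog message is consistent with getsockname() being called on descriptor -1, the value the removed code assigned to client->net, whereas under inetd the connected socket arrives on descriptor 0. The sketch below shows the failure in isolation. The helper name probe_socket_fd is hypothetical.

```c
#include <errno.h>
#include <sys/socket.h>

/* Returns 0 if `fd` is a socket whose local address can be read,
 * otherwise the errno value reported by getsockname(). */
int probe_socket_fd(int fd)
{
    struct sockaddr_storage addr;
    socklen_t len = sizeof(addr);

    if (getsockname(fd, (struct sockaddr *)&addr, &len) == 0)
        return 0;
    return errno;
}
```

Calling probe_socket_fd(-1) returns EBADF, which strerror() renders as "Bad file descriptor", the same text as in the log above.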

copy on write for block device

Is there a way to write the diff file to a specific directory? I am exporting the /dev/mapper/vgXX-lvXX block device, and /dev/mapper is polluted by *.diff files.

Thanks.

Negotiation: Error: INIT_PASSWD bad

Server running on a router with LEDE and nbd-server 3.11

root@casa:~# nbd-server 192.168.21.1:9000 /dev/sdb

** (process:2534): WARNING **: Specifying an export on the command line is deprecated.        

** (process:2534): WARNING **: Please use a configuration file instead.                       
** Message: virtstyle ipliteral                
** Message: connect from 192.168.21.112, assigned file is /dev/sdb                            
** Message: Can't open authorization file /etc/nbd-server/allow (No such file or directory).  
** Message: Authorized client                  
** Message: Starting to serve                  
** Message: Size of exported file/device is 240057409536                                      
Error: Read failed: Connection reset by peer   
Exiting.                                  

Client running on Archlinux with nbd-client version 3.15.3

~/P/nbd ❯❯❯ sudo nbd-client 192.168.21.1 9000 /dev/nbd0
Warning: the oldstyle protocol is no longer supported.                                        
This method now uses the newstyle protocol with a default export                              
Negotiation: Error: INIT_PASSWD bad            
Exiting.   

Error with version compiled from git tag nbd-3.11:

~/P/nbd ❯❯❯ sudo ./nbd-client -N ssd 192.168.21.1 9000 /dev/nbd0
Negotiation: Error: Server closed connection   
Exiting.

3.15 server and 3.17 client compatibility

I have an nbd server 3.15 on Debian stable and a 3.17 client on Debian sid. After some update it became broken. (Unfortunately, I don't know which previous version worked before the update.)

# pv /dev/nbd0 > /dev/null
pv: /dev/nbd0: read failed: Input/output error

client's dmesg:

block nbd0: Connection timed out
block nbd0: shutting down sockets
print_req_error: I/O error, dev nbd0, sector 54784
print_req_error: I/O error, dev nbd0, sector 54528
print_req_error: I/O error, dev nbd0, sector 54528
Buffer I/O error on dev nbd0, logical block 6816, async page read

server's syslog:

nbd_server[3507]: Read failed: End of file
nbd_server[3507]: Exiting.
nbd_server[1665]: Child exited with 1

3.15 client on debian stable works fine.

Should a 3.17 client work with a 3.15 server, or is this the correct behavior?

Protocol does not guarantee data consistency for on-the-fly commands

From the protocol description: "The server MAY process commands out
of order". There are a few conditions specified, but nothing about
guarantees given on the ordering of actions performed for commands.

Specifically, there is no guarantee (or lack thereof) expressed about
the consistency of data when two or more commands are on-the-fly, have
overlapping ranges and at least one is a write command (write,
write-zeroes, trim).

So the questions:

  1. Is that something that the protocol should specify?

  2. What is the behaviour of the reference implementation? What do the
    various clients (Linux, qemu) expect?

Note: two overlapping ranges A and B can be arranged as such:

A   [----]  [----]  [----]  [------]  [----]      [----]
B   [----]  [--]      [--]    [--]      [----]  [----]
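Whether two in-flight commands fall into one of the arrangements drawn above can be decided with a simple half-open interval test. The following is a minimal sketch. The function name ranges_overlap is made up for illustration and is not part of any NBD implementation.

```c
#include <stdint.h>

/* Returns 1 if [off_a, off_a + len_a) and [off_b, off_b + len_b) share
 * at least one byte. Zero-length ranges never overlap anything. */
int ranges_overlap(uint64_t off_a, uint64_t len_a,
                   uint64_t off_b, uint64_t len_b)
{
    if (len_a == 0 || len_b == 0)
        return 0;
    /* Compare by subtraction so that off + len can never overflow. */
    if (off_a <= off_b)
        return off_b - off_a < len_a;
    return off_a - off_b < len_b;
}
```

All six arrangements in the diagram return 1 here, while two ranges that merely touch, such as offset 0 length 4 and offset 4 length 4, return 0.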

RO block device redundancy

Is it possible to augment NBD to support multiple servers for reads on a read-only block device? Ideally, I'd like NBD to be able to read from multiple physical NBD servers. This would do two things: spread the load over more than one machine, and provide redundancy should one fail. Obviously, if writes were involved you couldn't easily do this, but I only need reads. I'm doing the copy-on-write on the client using device-mapper rather than inside NBD.

tls and tlshuge tests failed during 'make check'

nbd-3.15.2

$ make check
....
./tls
TLS handshake failed: Error in the pull function.

** (process:27332): WARNING **: Could not open socket: Could not read size: Resource temporarily unavailable

** (process:27332): WARNING **: Could not read size: Resource temporarily unavailable

** (process:27332): WARNING **: Could not run test: Could not read size: Resource temporarily unavailable
FAIL: tls
./tlshuge
TLS handshake failed: Error in the pull function.

** (process:27351): WARNING **: Could not open socket: Could not read size: Resource temporarily unavailable

** (process:27351): WARNING **: Could not read size: Resource temporarily unavailable

** (process:27351): WARNING **: Could not run test: Could not read size: Resource temporarily unavailable
FAIL: tlshuge

New NBD protocol based on HTTP/2

So, is there any chance of NBD switching to the HTTP/2 protocol? The more I work with NBD and etcd (gRPC API, which means HTTP/2), the more I think that HTTP/2 is very suitable for NBD.

Pros:

  • Smoother flow control; completely eliminates head-of-line blocking
  • Standard protocol for chunking and multiplexing streams.
  • Easier to extend
  • Easy implementation of exporting multiple devices (and listing them) over one TCP connection.
  • Binary protocol designed from the start for fast parsing.
  • It has an RFC :)
  • It is not a conceptually different protocol (compared to NBD), so rewriting will not require a major redesign.

Cons:

  • Binary protocol that is hard to implement from scratch (though there are HTTP/2 libraries for various languages)
  • It is incompatible with the current implementation at the protocol level (not in features)

So is there any work on this, or any thoughts about it?

"Buffer I/O error" regression between kernels 4.11 and 4.12

Using Ubuntu 16.04, nbd-server and client=1:3.13-1.

a) With mainline (vanilla) kernel 4.11.12 on the client, the commands below run fine without errors.
b) Then I upgrade to mainline kernel 4.12.0, and I run again:

modprobe nbd
nbd-client server-ip -N /opt/ltsp/i386 /dev/nbd5
dmesg

And I see the following errors:

[ 73.824873] nbd: registered device at major 43
[ 84.791001] nbd5: detected capacity change from 0 to 20936916992
[ 84.791071] block nbd5: Attempted send on invalid socket
[ 84.791077] blk_update_request: I/O error, dev nbd5, sector 0
[ 84.791080] Buffer I/O error on dev nbd5, logical block 0, async page read
<the 3 lines above repeated 10 times>
[ 84.791132] block nbd5: Attempted send on invalid socket
[ 84.791133] blk_update_request: I/O error, dev nbd5, sector 2
[ 84.791134] Buffer I/O error on dev nbd5, logical block 1, async page read
[ 84.791140] ldm_validate_partition_table(): Disk read failed.
[ 84.791175] Dev nbd5: unable to read RDB block 0
[ 84.791228] nbd5: unable to read partition table

I can reproduce this in many installations, real or VMs.
Thanks!

postrun isn't called on unclean client disconnections

If I run nbd-client server-ip -N swap /dev/nbd5, and then killall nbd-client (or in general disconnect without using nbd-client -d), the server-side postrun action isn't called.

This affects LTSP, which uses the postrun action to clean up NBD swap files:
https://bugs.launchpad.net/ltsp/+bug/1686062
While normally LTSP clients should call nbd-client -d on shutdown, there are cases (e.g. connectivity or abrupt client shutdown) where the client doesn't call it.

Please call the postrun action even on SIGPIPE or whenever else the server process realizes that the client has exited.
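One way a server process could notice that the client has gone away is to ignore SIGPIPE and inspect the error returned by write(). The sketch below illustrates that idea only. The names send_all and SEND_PEER_GONE are hypothetical, and nbd-server's real I/O path is not shown here.

```c
#include <errno.h>
#include <signal.h>
#include <unistd.h>

enum send_status { SEND_OK = 0, SEND_PEER_GONE = 1, SEND_ERROR = 2 };

/* Writes all `len` bytes to `fd`. Returns SEND_PEER_GONE when the other
 * end has disappeared, so the caller can run its cleanup action (for
 * example a postrun command) before exiting. */
enum send_status send_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    /* With SIGPIPE ignored, write() fails with EPIPE instead of the
     * signal terminating the process before cleanup can run. */
    signal(SIGPIPE, SIG_IGN);

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted, retry the same write */
            if (errno == EPIPE || errno == ECONNRESET)
                return SEND_PEER_GONE;
            return SEND_ERROR;
        }
        p += n;
        len -= (size_t)n;
    }
    return SEND_OK;
}
```

A caller would treat SEND_PEER_GONE like a clean disconnect and run the same teardown it runs after nbd-client -d.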

compile error in ubuntu 16.10

when compiling in ubuntu 16.10 I get this error:

nbd-client.c: In function ‘usage’:
nbd-client.c:861:38: error: ‘PROG_NAME’ undeclared (first use in this function)
fprintf(stderr, "%s version %s\n", PROG_NAME, PACKAGE_VERSION);
^~~~~~~~~
nbd-client.c:861:38: note: each undeclared identifier is reported only once for each function it appears in
nbd-client.c: In function ‘main’:
nbd-client.c:1067:36: error: ‘PROG_NAME’ undeclared (first use in this function)
printf("This is %s, from %s\n", PROG_NAME, PACKAGE_STRING);
^~~~~~~~~
Makefile:737: recipe for target 'nbd_client-nbd-client.o' failed
make[2]: *** [nbd_client-nbd-client.o] Error 1
make[2]: Leaving directory '/home/nick/src/nbd'
Makefile:822: recipe for target 'all-recursive' failed
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory '/home/nick/src/nbd'
Makefile:472: recipe for target 'all' failed
make: *** [all] Error 2

nbdtab systemd-mark

It looks like nbdtab does not recognize systemd-mark as corresponding to -systemd-mark. man nbdtab says it's a bug :-). I just lost two days debugging every possible systemd option to find out why it kills nbd-client :-/

docbook2man needed to build man pages

Is there a way to make building the man pages optional, or better, switch from docbook2man to another system for building them?
docbook2man is part of the docbook2X project, which is not easily available on *BSD. Also, building docbook2X requires Perl and a SAX parser (among other things) and is not trivial on *BSD.

Error message is "Unknown error" when client is unauthorized

This is minor, but it would be nice to detect the unauthorized case and issue that error explicitly instead of this:

Jan 06 23:43:18 x systemd[1]: Starting NBD client connection for nbd0...
Jan 06 23:43:18 x nbd_client[8734]: Unknown error in reply to NBD_OPT_GO; cannot continue
Jan 06 23:43:18 x nbd_client[8734]: Exiting.
Jan 06 23:43:18 x nbd-client[8734]: Negotiation: ..Error: Unknown error in reply to NBD_OPT_GO; cannot continue
Jan 06 23:43:18 x nbd-client[8734]: Exiting.

'block nbd0: Device being setup by another task' even after DISCONNECT and CLEAR_SOCK

After my system lost network connectivity (due to kernel bug, workaround is ip link set down/up), nbd-client won't reconnect.

nbd-client[6704]: Negotiation: ..size = 76300MB
nbd-client[6704]: bs=512, sz=80006348800 bytes
nbd_client[6704]: Kernel doesn't support multiple connections
nbd_client[6704]: Exiting.
nbd-client[6704]: Error: Kernel doesn't support multiple connections
nbd-client[6704]: Exiting.

Kernel log:

block nbd0: Device being setup by another task

I wrote a C utility to send ioctl (DISCONNECT and CLEAR_SOCK), and I see log statements from nbd.c in the kernel log confirming the commands. But the above error condition still triggers.

I see that in the code NBD_BOUND is set, but is never reset. Is that by design? Should it be reset in the disconnect handler where task is set to NULL?
PS. FWIW, I added the clear of this bit there, and am running the modified kernel. If the network failure occurs again, I'll see if the patch changes anything.

Fwd: Shouldn't nbd automatically load the kernel module?

Description of problem:
When installing the nbd package, there is nothing that automatically loads the kernel module nbd when using the [email protected]. The module is needed for the client side to work.

Version-Release number of selected component (if applicable):
nbd-3.16.1-1.fc26.x86_64

How reproducible:
Every time.

Steps to Reproduce:

  1. systemctl start nbd@...

Actual results:
The service fails.

Expected results:
Successful activation of the device.

Additional info:
I'm not sure if this is intentional or not. I worked around it by adding a file in /etc/modules-load.d, and then things work fine. Most services don't need that kind of additional help to work, however, so I felt it could be worth a report.

I can see reasons not to add this to all systems installing nbd. After all, only the client side needs the module. Perhaps an ExecStartPre entry running "modprobe nbd" could be added to the [email protected] file, or would that be too ugly?

As I said, maybe this is intentional. Feel free to close immediately with "won't fix" if so.

Fwd: Using nbd for swap causes dependency problems

https://bugzilla.redhat.com/show_bug.cgi?id=1490039

Description of problem:
I have a diskless box where I try to use nbd for swap partition. Out of the box, this causes a timeout and failure on startup. When systemd tries to start the dev-nbd0.swap unit, it waits for dev-nbd0.device, but that apparently doesn't start up, so there is a timeout. The [email protected] unit comes up much later, so even if there wasn't the timeout, there would be no swap space available.

I tried to add a dependency from dev-nbd0.swap to [email protected], but that causes a loop: [email protected] depends on sysinit.target which depends on swap.target which depends on dev-nbd0.swap which would then close the loop with my attempted dependency on [email protected].

Related to this: there is a Before=dev-%i.device entry in the [email protected] file, but that is ignored according to systemd:

sep 05 10:56:11 pluto systemd[1]: [email protected]: Dependency Before=dev-nbd0.device ignored (.device units cannot be delayed)

Version-Release number of selected component (if applicable):
nbd-3.16.1-1.fc26.x86_64

How reproducible:
Every time

Steps to Reproduce:

  1. Enable [email protected]
  2. Add an entry to swap on /dev/nbd0 in /etc/fstab

Additional info:
After some experimentation, I've come up with an alternative systemd service file for nbd which seems to work correctly. See the attachment for the details. The key difference from the distributed one is that my version does not include default dependencies. Instead, it lists a number of "lower level" dependencies I believe are needed. In addition, it also retriggers udevadm for the device after coming up, so systemd will be aware the device is ready. I'm not sure whether this latter problem could also be avoided with the appropriate dependencies; if so, I haven't found out how.

I'm not sure if this is the best solution, or even if it is correct. It seems to work in my use case.

mkfs on /dev/nbd0 failed

I ran truncate -s 4G my.img to create a blank image file and served it with nbd-server.
I then ran nbd-client -N my myservip /dev/nbd0 to connect the image to /dev/nbd0.
Running mkfs.btrfs /dev/nbd0, or any other mkfs (ext4, ext3, xfs), fails to create a filesystem on this image.
Tested on nbd-3.16.2.

Can't change the port in nbd-server

~ sudo nbd-server localhost:9000
~ sudo netstat -tlnp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name    
tcp        0      0 0.0.0.0:10809           0.0.0.0:*               LISTEN      3350/nbd-server     

As you can see, it still listens on the default port. The following prints a warning, but at least it listens on the right port:

~ sudo nbd-server localhost:9000 /dev/sdc
** (process:3121): WARNING **: Specifying an export on the command line is deprecated.

** (process:3121): WARNING **: Please use a configuration file instead.

Unfortunately, for some reason it also listens on a bunch of other ports:

~ sudo netstat -tlnp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name    
tcp        0      0 0.0.0.0:55055           0.0.0.0:*               LISTEN      3122/nbd-server                
tcp        0      0 0.0.0.0:10809           0.0.0.0:*               LISTEN      3122/nbd-server     
tcp        0      0 127.0.0.1:9000          0.0.0.0:*               LISTEN      3122/nbd-server             
tcp6       0      0 :::51489                :::*                    LISTEN      3122/nbd-server     
tcp6       0      0 ::1:9000                :::*                    LISTEN      3122/nbd-server

The configuration was simple:

[generic]
  user = nbd
  group = nbd
[otherexport]
    exportname = /dev/sdc
    #port = 9000

Old school does not work:

[generic]
  oldstyle=true
  user = nbd
  group = nbd
[otherexport]
    exportname = /dev/sdc
    port = 9000

because it just exits:

~ sudo nbd-server
** Message: Since 3.10, the oldstyle protocol is no longer supported. Please migrate to the newstyle protocol.
** Message: Exiting.

Is this a bug, or how can I specify the port it should listen on? Why does it listen on a bunch of other ports as well? Of course, I can just block those ports at the firewall level, but this is still peculiar behavior.

nbd-client does not deal with v6-mapped IPv4 addresses

client authorization never succeeds when authorized_client(client) is called

This block from nbdsrv.c does not work correctly for me:

        if(res->ai_family != addr->sa_family) {
            goto next;
        }

After commenting out this block, everything works fine:

//      if(res->ai_family != addr->sa_family) {
//          goto next;
//      }
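One situation in which such a family comparison rejects every entry is a client that arrives on a dual-stack socket: its address is then AF_INET6 in IPv4-mapped form (::ffff:a.b.c.d), while the entry it is compared against resolves to AF_INET. Whether that is what happens in this report is an assumption. The sketch below shows how a mapped address could be converted to plain IPv4 before the comparison, instead of dropping the check altogether. The helper name unmap_v4mapped is hypothetical and not part of nbd.

```c
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper, not part of nbd: if `sa` is an IPv4-mapped IPv6
 * address (::ffff:a.b.c.d), store the plain IPv4 form in `out` and
 * return 1. Return 0 for every other address and leave `out` untouched. */
int unmap_v4mapped(const struct sockaddr *sa, struct sockaddr_in *out)
{
    const struct sockaddr_in6 *s6;

    assert(sa != NULL && out != NULL);
    if (sa->sa_family != AF_INET6)
        return 0;
    s6 = (const struct sockaddr_in6 *)sa;
    if (!IN6_IS_ADDR_V4MAPPED(&s6->sin6_addr))
        return 0;

    memset(out, 0, sizeof(*out));
    out->sin_family = AF_INET;
    out->sin_port = s6->sin6_port;
    /* The IPv4 address is the last 4 of the 16 address bytes. */
    memcpy(&out->sin_addr, &s6->sin6_addr.s6_addr[12], 4);
    return 1;
}
```

A caller would run this on the client address first and, when it returns 1, compare families using the converted sockaddr_in.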

Potential off-by-one in offset handling of TRIM command

While playing around with the "TRIM" support, I found that nothing happens when using offset 0 (using nbd-server). I think exptrim contains an off-by-one bug:

if(prev.startoff < req->from) {

should be

if(prev.startoff <= req->from) {

That is only my suspicion, though. The change is confirmed to work only for a single-file export.
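To make the boundary case concrete, here is a small stand-alone model. It is not the exptrim code: the array of start offsets and the function file_for_offset are hypothetical, and only the comparison mirrors the snippet above. With a strict "less than", a request at offset 0 never matches the first file, whose start offset is also 0, so nothing is trimmed.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical model of a multi-file export: startoff[i] is the offset
 * within the export at which file i begins, in ascending order.
 * Returns the index of the file containing offset `from`, or -1 if no
 * file matches. */
int file_for_offset(const off_t *startoff, int nfiles, off_t from)
{
    int i, found = -1;

    assert(startoff != NULL && nfiles > 0);
    for (i = 0; i < nfiles; i++) {
        /* Inclusive comparison: a request that starts exactly where a
         * file starts (including offset 0) belongs to that file. With
         * "<" here, from == 0 would leave found at -1. */
        if (startoff[i] <= from)
            found = i;
    }
    return found;
}
```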

nbd gives access denied if export name is longer than 23 chars

I'm using nbd server/client 3.17

Using this conf:

[12345678901234567890123]
    exportname = /dev/null
    readonly = true
[123456789012345678901234]
    exportname = /dev/null
    readonly = true

nbd fails to connect to the second export with a really non-obvious error:

# nbd-client -N 12345678901234567890123 localhost /dev/nbd0
Negotiation: ..size = 17592186044415MB
bs=1024, sz=18446744073709550592 bytes

# nbd-client -N 123456789012345678901234 localhost /dev/nbd0
Negotiation: ..Error: Connection not allowed by server policy. Server said: Access denied by server configurationguration for th1

BTW, 'configurationguration' seems to be a typo.

`./configure && make` fails on Debian Jessie

With the 3.14 release tarball, ./configure && make fails with this error:

...
Making all in systemd
make[2]: Entering directory '/home/zander/elixir/nbd/nbd-3.14/systemd'
make[2]: *** No rule to make target '[email protected]', needed by '[email protected]'.  Stop.
make[2]: Leaving directory '/home/zander/elixir/nbd/nbd-3.14/systemd'
Makefile:708: recipe for target 'all-recursive' failed
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory '/home/zander/elixir/nbd/nbd-3.14'
Makefile:424: recipe for target 'all' failed
make: *** [all] Error 2

Include changelog in tarball?

Hi,

We maintain this package in Fedora. If possible, please include the changelog in the source tarball, so that users can quickly see what has changed.

Thanks!

exported files over something around 1 TiB get an insane device size on the client side and are actually empty

Hey.

I've been trying the following on Debian Sid ( @yoe ... pinging Wouter, who's the maintainer there):

Debian nbd version 1:3.15.1-2
Linux heisenberg 4.8.0-2-amd64 #1 SMP Debian 4.8.15-2 (2017-01-04) x86_64 GNU/Linux

  • 8TB SATA HDD connected via SATA/USB bridge to the host heisenberg
  • partition 2 is a LUKS container, which is mapped ("decrypted")
  • the "decrypted" device is 8000448233472 bytes in size (~8TB) and its owner is set to nbd:nbd
  • heisenberg works as NBD server and exports the "decrypted" device to localhost
  • another Debian sid (same kernel, nbd versions) runs inside a kvm on heisenberg, named "klenze"
  • there's a port forwarding between the two hosts for the nbd port
  • I did a blockdev --setro /dev/sdb* (with sdb being the SATA disk)
  • I tried with both, the dm-crypt mapping set up with --readonly and without
  • /etc/nbd-server/config:
[generic]
# If you want to run everything as root rather than the nbd user, you
# may either say "root" in the two following lines, or remove them
# altogether. Do not remove the [generic] section, however.
	user = nbd
	group = nbd
	includedir = /etc/nbd-server/conf.d
	listenaddr = 127.0.0.1
	max_threads = 1
	allowlist = true

# What follows are export definitions. You may create as much of them as
# you want, but the section header has to be unique.
[calestyo]
	exportname = /dev/mapper/data-a3
	readonly = true
	rotational = true

Now first, the whole setup works fine with a smaller test file (e.g. a 1GiB image file containing an ext4 can be mounted on the client).

When I now try to connect the NBD device on the client with:

# nbd-client localhost -N calestyo /dev/nbd0 
Negotiation: ..size = 7629822MB
bs=1024, sz=8000448233472 bytes

That seems to work nicely... the server shows something like:

Jan 14 02:53:42 heisenberg nbd-server[28682]: Stopping Network Block Device server: nbd-server.
Jan 14 02:53:42 heisenberg nbd-server[28685]:  nbd-server.
Jan 14 02:53:46 heisenberg nbd_server[28688]: Spawned a child process
Jan 14 02:53:46 heisenberg nbd_server[28691]: virtstyle ipliteral
Jan 14 02:53:46 heisenberg nbd_server[28691]: connect from 127.0.0.1, assigned file is /dev/mapper/data-a3
Jan 14 02:53:46 heisenberg nbd_server[28691]: No authorization file, granting access.
Jan 14 02:53:46 heisenberg nbd_server[28691]: Starting to serve
Jan 14 02:53:46 heisenberg nbd_server[28691]: Size of exported file/device is 8000448233472
Jan 14 02:55:54 heisenberg nbd_server[28688]: Child exited with 0
Jan 14 02:55:56 heisenberg nbd-server[28808]: Stopping Network Block Device server: nbd-server.

Now unfortunately, blkid /dev/nbd0 gives nothing, as does e.g. hd /dev/nbd0,

and worse:

# blockdev --getsize64 /dev/nbd0
18446743278064762880
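The reported value looks less arbitrary when expressed in 1024-byte blocks (the bs printed by nbd-client above). 8000448233472 / 1024 = 7812937728 blocks, which does not fit in 32 bits. Its low 32 bits, read as a signed integer, are -776996864. Multiplied by 1024 that is -795644788736, and as an unsigned 64-bit number that is 18446743278064762880, exactly the blockdev output. The sketch below only reproduces this arithmetic. It assumes that a signed 32-bit block count is involved somewhere; it is not taken from the nbd or kernel sources and does not show where such a truncation would happen.

```c
#include <assert.h>
#include <stdint.h>

/* Illustration only: the byte size that results if the block count is
 * squeezed through a signed 32-bit integer before being multiplied back
 * by the block size. */
uint64_t size_after_int32_blocks(uint64_t bytes, uint32_t blksize)
{
    uint32_t low;
    int64_t blocks;

    assert(blksize != 0);
    low = (uint32_t)(bytes / blksize);          /* keep the low 32 bits */
    blocks = (low & 0x80000000u)                /* reinterpret as signed */
        ? (int64_t)low - ((int64_t)1 << 32)
        : (int64_t)low;
    /* Converting a negative value to uint64_t wraps modulo 2^64. */
    return (uint64_t)(blocks * (int64_t)blksize);
}
```

Calling size_after_int32_blocks(8000448233472ULL, 1024) yields 18446743278064762880, while a 1 GiB size passes through unchanged, which fits the observation that the small test image works.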

Any ideas?

Cheers,
Chris.

NBD: Client failover causes kernel crash

I ran into a kernel crash while testing NBD client/server failover. Here is the stack dump I see on my Ubuntu 16.04 box.

[10554.029187] nbd: registered device at major 43
[10573.523556] EXT4-fs (nbd0): mounting ext2 file system using the ext4 subsystem
[10573.524366] EXT4-fs (nbd0): warning: mounting unchecked fs, running e2fsck is recommended
[10573.524500] EXT4-fs (nbd0): mounted filesystem without journal. Opts: (null)
[10591.278962] block nbd0: Receive control failed (result -512)
[10591.278971] block nbd0: pid 115995, nbd-client, got signal 9
[10591.278974] block nbd0: shutting down socket

[10638.646904] block nbd0: Attempted send on closed socket
[10638.646908] blk_update_request: I/O error, dev nbd0, sector 4632
[10638.646912] EXT4-fs warning (device nbd0): htree_dirblock_to_tree:958: inode #2: lblock 0: comm ls: error -5 reading directory block
[10662.102399] ------------[ cut here ]------------
[10662.102420] kernel BUG at /build/linux-0XAgc4/linux-4.4.0/fs/buffer.c:3005!
[10662.102427] invalid opcode: 0000 [#1] SMP
[10662.102434] Modules linked in: nbd ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter ip_tables xt_conntrack x_tables nf_nat nf_conntrack br_netfilter bridge stp llc aufs snd_hda_codec_hdmi binfmt_misc hp_wmi snd_hda_codec_realtek sparse_keymap snd_hda_codec_generic input_leds intel_rapl x86_pkg_temp_thermal intel_powerclamp snd_hda_intel coretemp snd_hda_codec kvm_intel snd_hda_core snd_hwdep kvm snd_pcm irqbypass snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq serio_raw snd_seq_device snd_timer sb_edac edac_core lpc_ich snd mei_me mei soundcore shpchp tpm_infineon 8250_fintek mac_hid ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi
[10662.102572] scsi_transport_iscsi parport_pc ppdev lp parport autofs4 btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear hid_generic usbhid hid nouveau crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 mxm_wmi lrw video gf128mul glue_helper ablk_helper i2c_algo_bit cryptd ttm drm_kms_helper syscopyarea sysfillrect e1000e sysimgblt psmouse fb_sys_fops ptp ahci drm pps_core libahci wmi fjes [last unloaded: nbd]
[10662.102673] CPU: 7 PID: 188844 Comm: umount Not tainted 4.4.0-78-generic #99-Ubuntu
[10662.102679] Hardware name: Hewlett-Packard HP Z440 Workstation/212B, BIOS M60 v02.31 12/14/2016
[10662.102686] task: ffff8807deb47000 ti: ffff8807d9e00000 task.ti: ffff8807d9e00000
[10662.102692] RIP: 0010:[] [] submit_bh_wbc+0x152/0x160
[10662.102706] RSP: 0018:ffff8807d9e03d40 EFLAGS: 00010246
[10662.102711] RAX: 0000000000000005 RBX: ffff88079bffbd00 RCX: 0000000000000000
[10662.102719] RDX: 0000000000000000 RSI: ffff88079bffbd00 RDI: 0000000000001411
[10662.103016] RBP: ffff8807d9e03d68 R08: 0000000000000000 R09: 0000000000000fff
[10662.103755] R10: 0000000000002d7c R11: 000000000000ef31 R12: 0000000000001411
[10662.104491] R13: 0000000000000008 R14: ffff8800c973a400 R15: ffff880802f83800
[10662.105227] FS: 00007f538a5ba840(0000) GS:ffff88080c7c0000(0000) knlGS:0000000000000000
[10662.105959] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[10662.106697] CR2: 0000000001a34878 CR3: 00000007deb1b000 CR4: 00000000001406e0
[10662.107431] Stack:
[10662.108161] ffff88079bffbd00 0000000000001411 0000000000000008 ffff8800c973a400
[10662.108902] ffff880802f83800 ffff8807d9e03d88 ffffffff812497bc ffffffff81f38d80
[10662.109639] ffff88079bffbd00 ffff8807d9e03dd0 ffffffff812bbf42 0000000000000034
[10662.110378] Call Trace:
[10662.111100] [ < ffffffff812497bc > ] __sync_dirty_buffer+0x6c/0x100
[10662.111825] [ < ffffffff812bbf42 > ] ext4_commit_super+0x1d2/0x290
[10662.112553] [ < ffffffff812bccb1 > ] ext4_put_super+0xe1/0x390
[10662.113276] [ < ffffffff812111ef > ] generic_shutdown_super+0x6f/0x100
[10662.113988] [ < ffffffff8121157c > ] kill_block_super+0x2c/0xa0
[10662.114694] [ < ffffffff812116d3 > ] deactivate_locked_super+0x43/0x70
[10662.115399] [ < ffffffff81211bac > ] deactivate_super+0x5c/0x60
[10662.116095] [ < ffffffff8122fc0f > ] cleanup_mnt+0x3f/0x90
[10662.116777] [ < ffffffff8122fca2 > ] __cleanup_mnt+0x12/0x20
[10662.117458] [ < ffffffff8109f011 > ] task_work_run+0x81/0xa0
[10662.118138] [ < ffffffff81003242 > ] exit_to_usermode_loop+0xc2/0xd0
[10662.118805] [ < ffffffff81003c6e > ] syscall_return_slowpath+0x4e/0x60
[10662.119469] [ < ffffffff81840b90 > ] int_ret_from_sys_call+0x25/0x8f
[10662.120121] Code: 44 89 ef e8 81 14 18 00 5b 31 c0 41 5c 41 5d 41 5e 41 5f 5d c3 40 f6 c7 01 0f 84 1c ff ff ff f0 80 63 01 f7 e9 12 ff ff ff 0f 0b <0f> 0b 0f 0b 0f 0b 0f 0b 0f 0b 0f 1f 40 00 0f 1f 44 00 00 55 31
[10662.121487] RIP [ < ffffffff81247a62 > ] submit_bh_wbc+0x152/0x160
[10662.122161] RSP < ffff8807d9e03d40 >

I have both nbd-server and nbd-client running on the same system, and the issue can be reproduced with the following commands:

Server:
truncate -s 10G /mnt/nbddisk
mkfs.ext4 /mnt/nbddisk
nbd-server 127.0.0.1@9000 /mnt/nbddisk

Client:
modprobe nbd
nbd-client 127.0.0.1 9000 /dev/nbd0
mount /dev/nbd0 /mnt/
kill -9 < pid of nbd-client >

After killing nbd-client, remounting /dev/nbd0 to a different folder fails with "/dev/nbd0 is already mounted or /mnt1/ busy".
Unmounting "/mnt" leads to the kernel crash above.

I found the thread below, which reports a similar crash. It concluded with suggestions, but I am not sure whether a fix was pushed to the mainline kernel.
https://sourceforge.net/p/nbd/mailman/message/34486113/

Is there any way this can be fixed in the driver? I would be glad to help in verifying the fix if needed.

Thanks,
Mehul.

data corruption with multi-threading and copy-on-write enabled

Since the introduction of multithreading, it looks like writing to an nbd device can corrupt data, as nbd-server does not write the data to the right offset. Setting max_threads to 1 works around it, but I still ran into deadlocks. Details below.

I reproduced it consistently using cow with a server on Cygwin 32bit or Linux 32 bit, but I suspect that it can also affect 64 bit and non cow exports.

I initially ran into the problem when exporting a VSS snapshot of a disk partition with cow from Cygwin. After connecting the nbd device, mounting the NTFS file system on it, making modifications and flushing the data to the nbd device, the NTFS file system ends up corrupted.

I can reproduce the corruption easily (more easily in 32 bits for some reason), also with the server on Linux, by doing:

truncate -s 500G file # 500GiB sparse file
cat > conf << \EOF
[generic]
        port = 12345
#       max_threads = 1

[default]
        exportname = /path/to/file
        copyonwrite = true
        maxconnections = 1
EOF
nbd-server -C conf

And connect that on the client and do:

printf abc | sudo dd bs=4k seek=10 conv=notrunc of=/dev/nbd0
printf xyz | sudo dd bs=4k seek=100 conv=notrunc of=/dev/nbd0
printf XYZ | sudo dd bs=4k seek=1000 conv=notrunc of=/dev/nbd0
blockdev --flushbufs /dev/nbd0

I found the diff file had:

0x0000 all zeros
0x1000 3rd new block
0x2000 1st new block

When compiling the server with debug, I see:

*handling read request
Asked to read 1024 bytes at 40960.
Page 10 is not here, we read the original one
(READ from fd 6 offset 40960 len 1024), ++

*handling read request
Asked to read 1024 bytes at 409600.
Page 100 is not here, we read the original one
(READ from fd 6 offset 409600 len 1024), ++

****handling write request
Asked to write 1024 bytes at 409600.
handling write request
Asked to write 1024 bytes at 40960.
Page 10 is not here, we put it at 1
(READ from fd 6 offset 40960 len 4096), Page 100 is not here, we put it at 0
(READ from fd 6 offset 409600 len 4096),++*handling read request
Asked to read 1024 bytes at 4096000.
Page 1000 is not here, we read the original one
(READ from fd 6 offset 4096000 len 1024), ++

**handling write request
Asked to write 1024 bytes at 4096000.
Page 1000 is not here, we put it at 2
(READ from fd 6 offset 4096000 len 4096), +

Note how the intermingled output suggests the requests are being processed in different threads.

What I think is happening is that each of the threads does an lseek() followed by a write(). Because there's no exclusive lock, the order of processing, instead of being thread1.lseek(); thread1.write(); thread2.lseek(); thread2.write(), can be thread1.lseek(); thread2.lseek(); thread1.write(); thread2.write(), and the data ends up being written in the wrong place.

With max_threads = 1, I no longer see those problems, but I see some occasional deadlocks (in my case when doing a ntfsclone on the nbd device). With max_threads = 1, we still have a thread handling the requests and another working on them. I suspect there's still some shared state/memory that is not properly mutexed.

Using pwrite() instead of lseek()+write() may remove the need for an exclusive lock, although I don't know whether atomicity is guaranteed regardless of the size of the data being written.

I ended up reverting to 3.9.1, where it runs as expected.
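As a minimal sketch of the pwrite() idea: pwrite() takes the target offset as an argument and neither uses nor moves the shared file position, so the lseek() step that another thread could interleave with disappears. The helper write_all_at and the demo function are hypothetical and not nbd-server code. The loop handles short writes; it does not settle the atomicity question raised above for overlapping writes.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write all `len` bytes of `buf` at absolute offset
 * `off` without touching the file position.
 * Returns 0 on success, -1 on error (errno is set). */
int write_all_at(int fd, const void *buf, size_t len, off_t off)
{
    const char *p = buf;

    assert(buf != NULL);
    while (len > 0) {
        ssize_t n = pwrite(fd, p, len, off);
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted before writing: retry */
            return -1;
        }
        if (n == 0) {               /* no progress: avoid looping forever */
            errno = EIO;
            return -1;
        }
        p += n;                     /* a short write is legal: keep going */
        off += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Usage example: two writes issued "out of order" still land at their own
 * offsets, because neither depends on a file position left behind by the
 * other. Returns 0 if the file contents are as expected. */
int demo_positional_writes(const char *path)
{
    char back[3];
    int ok;
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);

    if (fd < 0)
        return -1;
    ok = write_all_at(fd, "xyz", 3, 4096) == 0
      && write_all_at(fd, "abc", 3, 0) == 0
      && pread(fd, back, 3, 4096) == 3 && memcmp(back, "xyz", 3) == 0
      && pread(fd, back, 3, 0) == 3 && memcmp(back, "abc", 3) == 0;
    close(fd);
    unlink(path);
    return ok ? 0 : -1;
}
```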

NBD devices don't have ID_FS_TYPE

This bug causes a 10- or 30-second delay when NBD clients boot, because initramfs-tools calls wait-for-root /dev/nbd0 <10 or 30 second delay>, and wait-for-root doesn't see ID_FS_TYPE, so it only returns after the timeout.

It's been reported against udev in: https://bugs.freedesktop.org/show_bug.cgi?id=62565
...where it was marked as not a bug in udev, and the necessary changes for NBD were suggested:

removing the blacklist could only happen if the nbd kernel side is changed to send out events
like loop, dm, md are doing it

It's also been reported against nbd and initramfs in Ubuntu 5 years ago:
https://bugs.launchpad.net/ubuntu/+source/nbd/+bug/696435

In LTSP we're currently using this as a workaround:
http://bazaar.launchpad.net/~ltsp-upstream/ltsp/ltsp-trunk/view/head:/client/Debian/share/initramfs-tools/scripts/local-top/nbd_ltsp

Error during autoconf

While running autoconf on Ubuntu 14.04 I get the following errors. Can you tell me which package I am missing?

configure.ac:8: error: possibly undefined macro: AM_INIT_AUTOMAKE
If this token and others are legitimate, please use m4_pattern_allow.
See the Autoconf documentation.
configure.ac:9: error: possibly undefined macro: AM_MAINTAINER_MODE
configure.ac:154: error: possibly undefined macro: AM_PATH_GLIB_2_0
configure.ac:160: error: possibly undefined macro: AM_CONDITIONAL

Trim using fallocate/FALLOC_FL_PUNCH_HOLE broken

If I'm not mistaken, the trim support as implemented in nbd-server doesn't work as expected. Here's why: the call to fallocate passes only FALLOC_FL_PUNCH_HOLE as mode, which is not allowed. The (Linux) kernel source says, in fs/open.c:

/* Punch hole must have keep size set */
        if ((mode & FALLOC_FL_PUNCH_HOLE) &&
            !(mode & FALLOC_FL_KEEP_SIZE))
                 return -EOPNOTSUPP;

The corresponding man-page also mentions this.

Here's what happens at runtime:

read(4, "%`\225\23\0\1\0\4\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\5", 28) = 28
write(1, "UNKNOWN from 0 (0) len 5, ", 26UNKNOWN from 0 (0) len 5, ) = 26
fallocate(5, 02, 0, 0)                  = -1 EINVAL (Invalid argument)
write(1, "Performed TRIM request from 0 to"..., 41Performed TRIM request from 0 to 83886080) = 41

(I only now noticed that the length argument passed to fallocate is not as expected (5), although this might be related to my local change as described in #7.)

Using my server (which passes FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE as mode-argument) this works as expected on the same filesystem.

Note that NBD_CMD_TRIM isn't recognized by getcommandname, hence the "UNKNOWN" in the output above.

Finally, the result code of fallocate is not handled and returned to the client. Is this intentional?
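For reference, a minimal Linux-only sketch of a hole-punching call that satisfies the kernel check quoted above and also hands the result back to the caller. In the strace output the mode is 02, which is FALLOC_FL_PUNCH_HOLE alone; with FALLOC_FL_KEEP_SIZE (01) added it becomes 03. The helper name punch_hole is hypothetical, and how nbd-server should map the error onto a reply to the client is left open here.

```c
#define _GNU_SOURCE             /* for fallocate() in glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/falloc.h>       /* FALLOC_FL_PUNCH_HOLE, FALLOC_FL_KEEP_SIZE */
#include <sys/types.h>

/* Hypothetical helper (Linux only): deallocate `len` bytes starting at
 * `off` without changing the file size. Returns 0 on success, or the
 * errno value from fallocate() so that the caller can report it. */
int punch_hole(int fd, off_t off, off_t len)
{
    assert(off >= 0 && len > 0);
    if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                  off, len) < 0)
        return errno;
    return 0;
}
```

A file system without hole-punching support makes this return EOPNOTSUPP, which is a reason to check the result rather than log success unconditionally.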

nbd-server doesn't start with ipv6.disable=1 and unplugged cable

When the following two conditions are true, nbd-server doesn't start:

  • ipv6.disable=1 exists in /proc/cmdline
  • the network cable is unplugged, so that eth0 doesn't have an ipv4 IP at boot either

The error message in the logs is:

Jan 31 13:29:33 srv1-dide nbd_server[1330]: failed to setup servers: failed to open a modern socket: failed to create a socket: Address family not supported by protocol

I tested on Ubuntu 16.04 (nbd-server 1:3.13-1) and on Ubuntu 18.04 (nbd-server 1:3.16.2-1).
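The error text corresponds to errno EAFNOSUPPORT, which socket() returns for a family the running kernel does not provide. One possible way to tolerate that, sketched below under the assumption that the server could simply skip such a family and carry on with the next one, is to treat EAFNOSUPPORT as "try the next candidate" rather than as fatal. The function first_usable_socket is hypothetical and not how nbd-server is structured; it also does not cover the second condition in this report, the interface without an IPv4 address.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>

/* Hypothetical helper: try each address family in turn and return the
 * first stream socket that can be created. Families the running kernel
 * does not support (EAFNOSUPPORT) are skipped instead of being fatal.
 * On success stores the family used in *family_out and returns the fd;
 * returns -1 if no family worked or a different error occurred. */
int first_usable_socket(const int *families, size_t n, int *family_out)
{
    size_t i;

    assert(families != NULL && family_out != NULL);
    for (i = 0; i < n; i++) {
        int fd = socket(families[i], SOCK_STREAM, 0);
        if (fd >= 0) {
            *family_out = families[i];
            return fd;
        }
        if (errno != EAFNOSUPPORT)
            return -1;              /* a real error: report it */
    }
    return -1;
}
```

Called with the list { AF_INET6, AF_INET }, this would fall back to an IPv4 socket on a system booted with ipv6.disable=1.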

nbd-server before commit 2ab3a2d fails to communicate with nbd-client after commit e6b56c1

Recent nbd-clients fail to connect to slightly older nbd-servers.

  • nbd-server: 1:3.13-1 in Ubuntu 16.04
  • nbd-client: 1:3.16.2-1 in Ubuntu 18.04
  • command: nbd-client 10.161.254.11 -N /opt/ltsp/i386 /dev/nbd0
  • server syslog messages:

Jan 9 10:25:06 alkis nbd_server[1681]: Spawned a child process
Jan 9 10:25:06 alkis nbd_server[30471]: Negotiation failed/5a: magic mismatch
Jan 9 10:25:06 alkis nbd_server[30471]: Exiting.
Jan 9 10:25:06 alkis nbd_server[30471]: Modern initial negotiation failed
Jan 9 10:25:06 alkis nbd_server[1681]: Child exited with 1

Was there a protocol breaking change between those versions, or is this a regression?

Wrong byte-order in 'option' replies on LE systems

While running "nbd-client -l" under strace while working on my own NBD server library, I noticed this:

Negotiation
===========
Server Magic & Client Caps
--------------------------
read(4, "NBDMAGIC", 8)                  = 8
...
read(4, "IHAVEOPT", 8)                  = 8
...
read(4, "\0\1", 2)                      = 2
write(4, "\0\0\0\1", 4)                 = 4

Send option 'NBD_OPT_LIST'
--------------------------
write(4, "IHAVEOPT", 8)                 = 8
write(4, "\0\0\0\3", 4)                 = 4
write(4, "\0\0\0\0", 4)                 = 4
...

Receive option reply
--------------------
read(4, "\0\3\350\211\4Ue\251", 8)      = 8
read(4, "\3\0\0\0", 4)                  = 4    <====
read(4, "\0\0\0\2", 4)                  = 4
read(4, "\0\0\0\7", 4)                  = 4
read(4, "\0\0\0\3", 4)                  = 4
read(4, "foo", 3)                       = 3

According to the protocol spec, the option reply should contain the option identifier after the magic value, according to the option received from the client. As you can see, the byte-order seems to be wrong (this is an x86_64 system). I checked the nbd-server code, and indeed, in negotiate the option is read from the network, then converted into host-order, and later on ( handle_list -> send_reply) pushed as-is to the client. I think this requires one more htonl, both in the handle_list case as well as in the default case (returning NBD_REP_ERR_UNSUP). The opt value passed to handle_export_name is unused.

If this is indeed considered incorrect behaviour (if it's not I need to fix my library) I can write a patch (already did for testing, but wasn't sure what the best approach is -> pass network-order to handle_list, do the conversion in there,...).
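One way to avoid a missed conversion is to keep every value in host order inside the server and convert in exactly one place, when the reply header is serialized. The sketch below does that for the 20-byte header seen in the strace output: 8 bytes of reply magic, then option, reply type and data length, each as a 32-bit big-endian value. The function build_opt_reply is hypothetical and is not nbd-server's send_reply; the magic value is the one visible in the first read() of the reply above.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define OPT_REPLY_HDR_LEN 20

/* Hypothetical helper: serialize an option reply header into `out`.
 * `opt`, `reply_type` and `datalen` are given in host byte order; every
 * field is converted to network order here and nowhere else. */
void build_opt_reply(unsigned char out[OPT_REPLY_HDR_LEN],
                     uint32_t opt, uint32_t reply_type, uint32_t datalen)
{
    uint32_t fields[5];

    assert(out != NULL);
    /* Reply magic 0x0003e889045565a9, split into two 32-bit halves. */
    fields[0] = htonl(0x0003e889u);
    fields[1] = htonl(0x045565a9u);
    fields[2] = htonl(opt);         /* the option being answered */
    fields[3] = htonl(reply_type);
    fields[4] = htonl(datalen);
    memcpy(out, fields, sizeof(fields));
}
```

For the exchange above (option 3, reply type 2, length 7), bytes 8 to 11 of the header come out as 00 00 00 03 on both little- and big-endian hosts.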

Race condition on disconnections immediately after connections

Hi, the following code exposes some race condition in disconnections:

for i in $(seq 1 9); do
(
    nbd-client server -N /opt/ltsp/i386 /dev/nbd$i
    nbd-client -d /dev/nbd$i
) &
done

After the code runs (sometimes 2-3 runs are needed), some of the nbd-client instances are still running in a hung state, preventing nbd-client [re/dis]connections, blocking system shutdown etc.

This affects us in LTSP where we have a "connect, check if there's a newer version of the image, disconnect" logic, and it sometimes causes issues due to the aforementioned race condition.

Some of the errors displayed in dmesg:

[ 563.099196] block nbd6: NBD_DISCONNECT
[ 563.099477] block nbd6: shutting down socket
[ 563.099503] blk_update_request: I/O error, dev nbd6, sector 40892400
[ 563.099587] block nbd6: Receive control failed (result -104)
[ 563.103340] block nbd7: NBD_DISCONNECT
[ 563.103457] block nbd7: shutting down socket
[ 563.103515] blk_update_request: I/O error, dev nbd7, sector 40892024
[ 563.103805] BUG: unable to handle kernel NULL pointer dereference at 000000b8
[ 563.103808] IP: [] nbd_ioctl+0x873/0xa03 [nbd]
[ 563.103821] *pdpt = 000000002948c001 *pde = 0000000000000000
[ 563.103825] Oops: 0000 [#1] SMP
[ 563.103826] Modules linked in: nbd cpufreq_conservative cpufreq_powersave cpufreq_userspace evdev crc32_pclmul snd_intel8x0 snd_ac97_codec ac97_bus intel_rapl_perf snd_pcm snd_timer snd joydev pcspkr serio_raw ac soundcore sg battery video button parport_pc ppdev lp parport ip_tables x_tables autofs4 ext4 crc16 jbd2 crc32c_generic fscrypto ecb mbcache hid_generic usbhid hid sr_mod cdrom sd_mod ata_generic crc32c_intel aesni_intel xts aes_i586 ohci_pci lrw ehci_pci gf128mul ablk_helper cryptd ohci_hcd ehci_hcd psmouse ahci usbcore libahci usb_common ata_piix i2c_piix4 e1000 libata scsi_mod
[ 563.103855] CPU: 0 PID: 962 Comm: nbd-client Not tainted 4.9.0-4-686-pae #1 Debian 4.9.51-1
[ 563.103855] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 563.103857] task: ee2e6040 task.stack: f483e000
[ 563.103858] EIP: 0060:[] EFLAGS: 00010206 CPU: 0
[ 563.103860] EIP is at nbd_ioctl+0x873/0xa03 [nbd]
[ 563.103861] EAX: 0000002d EBX: 00001000 ECX: 00001000 EDX: 00000000
[ 563.103862] ESI: 00001000 EDI: 00000000 EBP: f483fe58 ESP: f483fdf8
[ 563.103863] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[ 563.103864] CR0: 80050033 CR2: 000000b8 CR3: 2e0e5860 CR4: 000406f0
[ 563.103867] Stack:
[ 563.103868] e9621000 00001000 00000100 00000001 f5495c00 ee396900 ee3969e8 f499449c
[ 563.103871] f4686400 00001000 f4994484 e94fc000 00000000 0000000f f4994428 df1a58d4
[ 563.103874] 98664467 00000000 00000069 00000000 d50cd4a7 f4686400 0006001f f85d4ad0
[ 563.103877] Call Trace:
[ 563.103890] [] ? page_add_new_anon_rmap+0x64/0xa0
[ 563.103892] [] ? nbd_queue_rq+0x120/0x120 [nbd]
[ 563.103896] [] ? blkdev_ioctl+0x25e/0xa90
[ 563.103898] [] ? do_wp_page+0x134/0x7a0
[ 563.103902] [] ? block_ioctl+0x3c/0x50
[ 563.103904] [] ? blkdev_fallocate+0x2c0/0x2c0
[ 563.103906] [] ? do_vfs_ioctl+0x91/0x720
[ 563.103907] [] ? handle_mm_fault+0x902/0xf40
[ 563.103910] [] ? __raw_callee_save___pv_queued_spin_unlock+0x6/0x10
[ 563.103912] [] ? SyS_ioctl+0x60/0x70
[ 563.103914] [] ? do_fast_syscall_32+0x8a/0x150
[ 563.103918] [] ? sysenter_past_esp+0x47/0x75
[ 563.103919] Code: 8b 45 cc 39 f3 8b 40 54 89 45 d0 0f 87 b6 00 00 00 8d b4 26 00 00 00 00 85 db 0f 84 e0 fe ff ff 8b 45 d4 8b 55 d0 89 f1 8d 04 40 <8b> 54 82 04 89 d0 29 f8 39 f3 0f 46 cb 39 c8 0f 47 c1 01 c7 29
[ 563.103940] EIP: []
[ 563.103942] nbd_ioctl+0x873/0xa03 [nbd]
[ 563.103943] SS:ESP 0068:f483fdf8
[ 563.103943] CR2: 00000000000000b8
[ 563.103946] ---[ end trace f2a60801a15f8bb7 ]---
[ 563.104109] block nbd7: Attempted send on closed socket
[ 563.104111] blk_update_request: I/O error, dev nbd7, sector 40892024
[ 563.104194] block nbd7: Attempted send on closed socket
[ 563.104196] blk_update_request: I/O error, dev nbd7, sector 40892024
[ 563.104198] Buffer I/O error on dev nbd7, logical block 20446012, async page read
[ 563.104201] block nbd7: Attempted send on closed socket

copy on write diff files are not getting removed

Hi,

I built a network boot around Arch Linux using NBD. Several clients mount their root filesystem from the server; writes go to a diff file on the server.
It works great, except that the server isn't deleting the temporary diff files.

Here is what I use:

  • Arch Linux clients nbd-client 3.15.3-1
  • CentOS 7 (nbd-server built from source), tested 3.15.3 and 3.16.1

server cfg:

[generic]
user = nbd
group = nbd

[aclient]
exportname = /opt/diskless/aclient.img
copyonwrite = true
cowdir = /tmp/

The export is a BTRFS-formatted sparse file.

I found a similar bug here: https://bugs.launchpad.net/ubuntu/+source/nbd/+bug/696454 , but for me it never works, even when I disconnect an export manually with 'nbd-client -d'

nbd root on ubuntu 16.04 fails due to systemd shutdown sending a kill(-1,SIGSTOP)

Hi,

I'm trying to set up nbd root on Ubuntu 16.04, which uses systemd for system management. Everything works well and as expected, but the partition is never cleanly closed on shutdown. I've tracked the problem to a kill(-1,SIGSTOP) signal that is sent by systemd-shutdown before the root is mounted read-only.

The problem is related to the one discussed in this thread 6 years ago:
https://sourceforge.net/p/nbd/mailman/message/27368126/

I'd like to avoid hacking systemd to solve this, or else the maintenance of the images will become very messy...

Timeout triggers too soon

Starting with version 3.17 (Arch Linux package 3.17-2) the server fails when the client connects:

Connection dropped: Inappropriate ioctl for device

strace tells me this is calling ioctl for BLKGETSIZE64:

ioctl(7, BLKGETSIZE64, 0x7ffd58417048) = -1 ENOTTY (Inappropriate ioctl for device)

The device is a read-only image file, not a block device.

nbd-server 3.17 with nbd-client 3.16.2 is fine, so a change at client side (netlink?) is involved.

Fixes for CentOS 6

The included patch allows 3.13 to be compiled on CentOS 6. The issue is that g_thread_init() was never called in nbd-server.

I've included a line that should fix this in configure.ac, but I can't test it because CentOS 6 doesn't have recent enough versions of autotools, aclocal etc. So I copy-pasted the relevant bits into configure.

It also adds includes of pthread.h where it is used.

Philip_Gwyn-nbd-3.13-g_thread_init-01.patch.txt
