rust-vmm / vhost
License: Apache License 2.0
The current clippy-x86 builder in vhost-ci doesn't cover code behind non-default features (e.g. vhost-kern, vhost-user, etc.), so we currently have some clippy errors there, which will be fixed by #23. To improve code quality, it would be nice if we could pass more flags to clippy, such as clippy --all-targets --all-features.
In PR #185, we fixed some new clippy warnings, but one remains that is related to __IncompleteArrayField, which should be auto-generated by bindgen. We used an old version that also generated Copy/Clone implementations, but it looks like newer versions removed them: rust-lang/rust-bindgen#1431. @jiangliu agreed to regenerate it with a new version of bindgen.
We also modified the generated vhost_binding.rs file a bit, so it may be better to move the new struct into another file, so we can update it easily next time.
PR #161 doesn't reset the VQ after receiving a GET_VRING_BASE message.
(PR #154 is also required, but it's already included in v0.9.0.)
This fix is required to suspend/resume a VM with a vhost-user device, like virtiofsd:
https://gitlab.com/virtio-fs/virtiofsd/-/issues/110
Looking for feedback and guidance here.
The README from this crate says:
A crate to implement the vHost ioctl interfaces and vHost-user protocol.
Do we really want to put together the vhost part that is about defining traits that can be used both for vhost and vhost-user devices, and the part about vhost-user protocol?
I think we could get the vhost-user-protocol living in its own crate since it could be reused by a vhost-user daemon. This would not force this daemon to pull the vhost/vhost-user traits code.
There are several projects using this crate, it would be great to publish this crate to crates.io.
Currently VhostUserSlaveReqHandler::set_slave_req_fd() accepts a SlaveFsCacheReq input parameter, which means the slave communication channel can only be used to support virtio-fs map/unmap requests.
It should take a UnixStream as input instead, so the backend implementation can decide which services are supported on the slave communication channel.
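A minimal sketch of what the proposed signature could look like, using a hypothetical trait and demo backend (the names here are illustrative, not the crate's actual API):

```rust
use std::os::unix::net::UnixStream;

// Hypothetical trait: the backend receives the raw stream and decides
// which protocol to speak on the slave channel.
pub trait SlaveReqChannel {
    fn set_slave_req_fd(&mut self, stream: UnixStream);
}

// Demo backend that simply stores the channel for later use.
pub struct DemoBackend {
    pub channel: Option<UnixStream>,
}

impl SlaveReqChannel for DemoBackend {
    fn set_slave_req_fd(&mut self, stream: UnixStream) {
        self.channel = Some(stream);
    }
}
```

With a UnixStream in hand, a virtio-fs backend could still wrap it in its map/unmap machinery, while other device types could use the channel for their own services.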
The vhost crate considers offsets between the config offset and length as valid:
vhost/src/vhost_user/message.rs
Line 662 in ee3e872
qemu, on the other hand, considers offsets valid from 0 to length, and assumes the offset uses CONFIG_OFFSET as a base.
See get_config in vhost-user.c, it sets config to offset 0 and length to max to cover the entire area.
https://source.codeaurora.org/quic/qemu/qemu/tree/hw/virtio/vhost-user.c?h=qemu.org/master#n2068
This mismatch causes the qemu GET_CONFIG message to be considered invalid by the vhost crate.
I didn't find anything in the spec indicating which is correct:
https://qemu.readthedocs.io/en/latest/interop/vhost-user.html
Deferring to qemu seems like the best thing for compatibility with qemu, but it might break existing users.
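For illustration, a qemu-compatible check could treat offsets as relative to the start of the config space, valid in [0, length). This is only a sketch; VHOST_USER_CONFIG_SIZE is an assumed constant for the maximum config space size (qemu caps it at 256 bytes):

```rust
// Assumed maximum size of the device config space (qemu uses 256 bytes).
pub const VHOST_USER_CONFIG_SIZE: u32 = 256;

// Accept accesses whose [offset, offset + size) range lies entirely within
// the config space, with the offset interpreted relative to its start.
pub fn is_valid_config_access(offset: u32, size: u32) -> bool {
    size != 0
        && offset < VHOST_USER_CONFIG_SIZE
        && offset
            .checked_add(size)
            .map_or(false, |end| end <= VHOST_USER_CONFIG_SIZE)
}
```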
I think there's a limitation for supporting the vIOMMU use case because of the following lines:
When an IOTLB is involved, vhost in the kernel considers desc_table_addr, used_ring_addr and avail_ring_addr as IOVAs. That means each of them could be either a GPA or a GVA. If it's a GPA, it's fine to check whether the address is in range, but if it's a GVA, the check doesn't make any sense.
Could we introduce a boolean or flag to let VhostKernBackend know about this use case, which would avoid the incorrect check?
@stefano-garzarella I've identified this issue while implementing vDPA for Cloud Hypervisor, so I was thinking an alternative could be to implement is_valid() in VhostKernVdpa, which would override the default implementation. WDYT?
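Either approach could be sketched as a backend-provided flag that disables the range check when the addresses are IOVAs. All names below are hypothetical stand-ins for the crate's actual VhostKernBackend::is_valid() machinery:

```rust
// Hypothetical trait modeling the proposed flag/override.
pub trait KernBackend {
    /// True when an IOTLB/vIOMMU translates ring addresses (they are IOVAs).
    fn rings_use_iova(&self) -> bool {
        false
    }

    /// Simplified address check: with an IOTLB the address may be a GVA,
    /// so comparing it against guest memory bounds is meaningless and the
    /// check is skipped.
    fn is_valid_addr(&self, addr: u64, mem_size: u64) -> bool {
        self.rings_use_iova() || addr < mem_size
    }
}

pub struct VdpaDemo; // e.g. a vDPA backend with vIOMMU enabled
pub struct PlainDemo; // e.g. a plain vhost-kern backend

impl KernBackend for VdpaDemo {
    fn rings_use_iova(&self) -> bool {
        true
    }
}

impl KernBackend for PlainDemo {}
```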
/cc @jiangliu
Now that the Cloud Hypervisor project has an official build against the external vhost crate's dragonball branch, it may be the right time to merge the dragonball branch into the master branch.
The following back-end request types are not part of the vhost-user protocol and should be removed:
pub enum BackendReq {
    ...
    /// Virtio-fs draft: map file content into the window.
    FS_MAP = 6,
    /// Virtio-fs draft: unmap file content from the window.
    FS_UNMAP = 7,
    /// Virtio-fs draft: sync file content.
    FS_SYNC = 8,
    /// Virtio-fs draft: perform a read/write from an fd directly to GPA.
    FS_IO = 9,
    ...
}
These are not in the vhost-user standard, which in itself is confusing, but they even conflict with values that are defined in the standard (see Back-end message types), such as VHOST_USER_BACKEND_SHARED_OBJECT_ADD, VHOST_USER_BACKEND_SHARED_OBJECT_REMOVE and VHOST_USER_BACKEND_SHARED_OBJECT_LOOKUP.
This issue is to continue the discussion started on #205
Hi,
vhost-net support is missing from the vhost_kernel mod here.
For better network performance, we have added vhost-net support in our private repository, and it has been working well for a while. I'd like to make a pull request later.
VhostUserBackendMut::handle_event has a bool return type that is not documented. Drilling into the code reveals that
// handle_event() returns true if an event is received from the exit event fd.
[1]
However, it looks like this is handled at [2]. Are there still valid use-cases where someone else should return true there? Or shall we drop the bool from VhostUserBackend and VhostUserBackendMut?
@jiangliu: Maybe you can comment on this?
[1] https://github.com/rust-vmm/vhost/blame/40006d0b3981504320ae66e165c9fd36ce9cfb18/crates/vhost-user-backend/src/event_loop.rs#L196
[2] https://github.com/rust-vmm/vhost/blame/40006d0b3981504320ae66e165c9fd36ce9cfb18/crates/vhost-user-backend/src/event_loop.rs#L207
We enabled this recently (#175) to make the builds pass, but maybe we should find an alternative solution that doesn't require forcefully enabling it here.
We recently added several new features mostly related to live migration (internal state migration, POSTCOPY support, SET_LOG_BASE message support), so I think we can release a new version of both crates soon.
Comments/objections?
Unable to build on macOS Monterey 12.2.1. Any chance to get around this?
➜ virtiofsd git:(main) cargo build --release
Compiling vhost v0.3.0
Compiling futures-executor v0.3.19
Compiling structopt v0.3.26
error[E0432]: unresolved import `vmm_sys_util::eventfd`
--> /Users/myuser/.cargo/registry/src/github.com-1ecc6299db9ec823/vhost-0.3.0/src/backend.rs:16:19
|
16 | use vmm_sys_util::eventfd::EventFd;
| ^^^^^^^ could not find `eventfd` in `vmm_sys_util`
error[E0432]: unresolved import `vmm_sys_util::sock_ctrl_msg`
--> /Users/myuser/.cargo/registry/src/github.com-1ecc6299db9ec823/vhost-0.3.0/src/vhost_user/connection.rs:18:19
|
18 | use vmm_sys_util::sock_ctrl_msg::ScmSocket;
| ^^^^^^^^^^^^^ could not find `sock_ctrl_msg` in `vmm_sys_util`
Compiling futures v0.3.19
error[E0599]: no method named `send_with_fds` found for struct `UnixStream` in the current scope
--> /Users/myuser/.cargo/registry/src/github.com-1ecc6299db9ec823/vhost-0.3.0/src/vhost_user/connection.rs:142:19
|
142 | self.sock.send_with_fds(iovs, rfds).map_err(Into::into)
| ^^^^^^^^^^^^^ method not found in `UnixStream`
error[E0599]: no method named `recv_with_fds` found for struct `UnixStream` in the current scope
--> /Users/myuser/.cargo/registry/src/github.com-1ecc6299db9ec823/vhost-0.3.0/src/vhost_user/connection.rs:317:45
|
317 | let (bytes, _) = unsafe { self.sock.recv_with_fds(&mut iovs, &mut [])? };
| ^^^^^^^^^^^^^ method not found in `UnixStream`
error[E0599]: no method named `recv_with_fds` found for struct `UnixStream` in the current scope
--> /Users/myuser/.cargo/registry/src/github.com-1ecc6299db9ec823/vhost-0.3.0/src/vhost_user/connection.rs:349:38
|
349 | let (bytes, fds) = self.sock.recv_with_fds(iovs, &mut fd_array)?;
| ^^^^^^^^^^^^^ method not found in `UnixStream`
Some errors have detailed explanations: E0432, E0599.
For more information about an error, try `rustc --explain E0432`.
error: could not compile `vhost` due to 5 previous errors
warning: build failed, waiting for other jobs to finish...
error: build failed
From @y-x41
While reviewing the vhost crate, it was found that the function VHostUserHandler::set_vring_base() lacks validation of the input parameter index, which is used for indexing an internal vector and could potentially lead to an out-of-bounds write, resulting in an application panic. The listing below depicts the vulnerable code segment.
impl<S, V, B> VhostUserBackendReqHandlerMut for VhostUserHandler<S, V, B>
where
    S: VhostUserBackend<V, B>,
    V: VringT<GM<B>>,
    B: NewBitmap + Clone,
{
    // [...]
    fn set_vring_base(&mut self, index: u32, base: u32) -> VhostUserResult<()> {
        let event_idx: bool = (self.acked_features & (1 << VIRTIO_RING_F_EVENT_IDX)) != 0;
        self.vrings[index as usize].set_queue_next_avail(base as u16);
        self.vrings[index as usize].set_queue_event_idx(event_idx);
        self.backend.set_event_idx(event_idx);
        Ok(())
    }
}
X41 advises performing thorough input parameter validation to prevent potential out-of-bounds reads/writes.
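A minimal sketch of such validation, using simplified stand-in types rather than the crate's real ones: look the vring up with get_mut and return an error instead of indexing (and panicking) on an out-of-range queue index.

```rust
#[derive(Debug, PartialEq)]
pub enum HandlerError {
    InvalidParam,
}

pub struct DemoVring {
    pub next_avail: u16,
}

pub struct DemoHandler {
    pub vrings: Vec<DemoVring>,
}

impl DemoHandler {
    pub fn set_vring_base(&mut self, index: u32, base: u32) -> Result<(), HandlerError> {
        // Reject indices outside the configured number of queues instead
        // of letting the vector index operation panic.
        let vring = self
            .vrings
            .get_mut(index as usize)
            .ok_or(HandlerError::InvalidParam)?;
        vring.next_avail = base as u16;
        Ok(())
    }
}
```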
Please add a CODEOWNERS file with the details of the maintainers. Please use the following format as outlined in: https://help.github.com/en/articles/about-code-owners
Most projects can simply follow the wildcard syntax. e.g.
* @owner1 @owner2
Not only does this provide a way to see who is responsible for this repository, they will also automatically be notified of incoming PR reviews.
Cargo.toml says "Apache-2.0 OR BSD-3-Clause", but many files contain lines like the following, which suggests that only the Apache-2.0 license applies:
// SPDX-License-Identifier: Apache-2.0
It's therefore unclear whether using this code under the terms of only the BSD-3-Clause license is permitted. If the intention is to use the BSD-3-Clause for code originally taken from crosvm, but the Apache-2.0 license for new code (like Cloud Hypervisor does), then Cargo.toml should say "Apache-2.0 AND BSD-3-Clause".
We need a way to allow Master to send a request whose header contains specific flags such as need_reply.
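A sketch of what this could look like, assuming the NEED_REPLY bit is bit 3 of the header flags as in the vhost-user spec (DemoHeader is an illustrative stand-in, not the crate's VhostUserMsgHeader):

```rust
// NEED_REPLY is bit 3 of the vhost-user message header flags.
pub const NEED_REPLY_MASK: u32 = 1 << 3;

#[derive(Default)]
pub struct DemoHeader {
    pub flags: u32,
}

impl DemoHeader {
    // Set or clear the NEED_REPLY bit on an outgoing request header.
    pub fn set_need_reply(&mut self, need: bool) {
        if need {
            self.flags |= NEED_REPLY_MASK;
        } else {
            self.flags &= !NEED_REPLY_MASK;
        }
    }
}
```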
After upgrading Cloud Hypervisor's dependencies on various rust-vmm crates [1], our CI reported that the vDPA integration tests (which use in-kernel vDPA device simulators) are failing with the latest release, vhost v0.7.0 [2].
I found the breaking commit is 7e76859, which essentially switches to using host virtual addresses (HVA) instead of guest physical addresses (GPA) with the ioctl VHOST_SET_VRING_ADDR. Considering the in-kernel vDPA device simulators were working fine with GPA, I wonder if this commit introduces a regression in vDPA support?
[1] cloud-hypervisor/cloud-hypervisor#5451
[2] https://cloud-hypervisor-jenkins.westus.cloudapp.azure.com/blue/organizations/jenkins/cloud-hypervisor/detail/PR-5451/6/pipeline/169
/cc @sboeuf @rbradford
I wonder if the community is fine with cutting a new point release and publishing it to crates.io to include the fix for the vDPA regression (#165)?
This would be very useful for Cloud Hypervisor to directly consume the released version of the vhost crate and upgrade all other rust-vmm dependencies. Thank you.
(This issue was originally created in @dagrh here cloud-hypervisor/cloud-hypervisor#1460)
The vhost user protocol has a limit of ~8 memory slots it can handle; that's an issue when you add lots of DIMMs, e.g. in Kata when you have lots of containers in a pod: (Related to kata-containers/runtime#2795 )
QEMU recently got support for VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS which does away with the limit on the slots.
We need to get the same feature.
Easiest way to trigger it is to start a vhost_user_fs and then connect a qemu with:
-m 2G,maxmem=16G,slots=16 \
-object memory-backend-memfd,id=mem1,size=256M,share=on -device pc-dimm,id=dimm1,memdev=mem1 \
-object memory-backend-memfd,id=mem2,size=256M,share=on -device pc-dimm,id=dimm2,memdev=mem2 \
-object memory-backend-memfd,id=mem3,size=256M,share=on -device pc-dimm,id=dimm3,memdev=mem3 \
-object memory-backend-memfd,id=mem4,size=256M,share=on -device pc-dimm,id=dimm4,memdev=mem4 \
-object memory-backend-memfd,id=mem5,size=256M,share=on -device pc-dimm,id=dimm5,memdev=mem5 \
-object memory-backend-memfd,id=mem6,size=256M,share=on -device pc-dimm,id=dimm6,memdev=mem6 \
-object memory-backend-memfd,id=mem7,size=256M,share=on -device pc-dimm,id=dimm7,memdev=mem7 \
-object memory-backend-memfd,id=mem8,size=256M,share=on -device pc-dimm,id=dimm8,memdev=mem8
The vhost-user implementation is missing the Inflight I/O tracking feature in order to recover fully from a backend crash or restart while the VM/guest is running.
Use rust-vmm-ci as a git submodule to test the code. For more details about how to make the switch, please check the rust-vmm-ci README.
Currently, some methods in the VhostUserSlaveReqHandler trait, such as set_vring_kick, take RawFds as their arguments. However, this design doesn't allow a trait implementor to know whether the given FDs are valid and who their owner is. This means it cannot safely call from_raw_fd() on a given FD.
So it'd be nice if the VhostUserSlaveReqHandler trait were updated to pass around Files instead of RawFds. Then the ownership of the underlying FD would become clearer.
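A sketch of the proposed shape, with hypothetical stand-in names: the handler receives an owned File (None meaning no fd was attached), so descriptor ownership is explicit and from_raw_fd() is no longer needed.

```rust
use std::fs::File;

// Hypothetical trait: the handler takes ownership of the eventfd.
pub trait KickHandler {
    fn set_vring_kick(&mut self, index: u8, file: Option<File>);
}

// Demo handler that stores the owned files per queue.
pub struct DemoKick {
    pub kicks: Vec<Option<File>>,
}

impl KickHandler for DemoKick {
    fn set_vring_kick(&mut self, index: u8, file: Option<File>) {
        // The File is dropped (and the fd closed) automatically when replaced.
        self.kicks[index as usize] = file;
    }
}
```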
@sameo , could you please help to pull the initial commit from https://github.com/jiangliu/vhost into this repository?
This PR LGTM, I just left a couple of comments.
Just for curiosity, why did you close #91?
Originally posted by @stefano-garzarella in #92 (review)
Hi @stefano-garzarella
I think there's an error in the current implementation of get_iova_range, as it uses the wrong ioctl: https://github.com/rust-vmm/vhost/blob/main/src/vhost_kern/vdpa.rs#L142
Could you confirm whether this is an issue?
The VhostUserInflight struct implements ByteValued, but it has 4 bytes of padding at the end, so reading that uninitialized memory is UB, making the API unsound:
/// Inflight I/O descriptor, used as the payload of the GET_INFLIGHT_FD and
/// SET_INFLIGHT_FD requests.
#[repr(C)]
#[derive(Copy, Clone, Default)]
pub struct VhostUserInflight {
    /// Size of the area to track inflight I/O.
    pub mmap_size: u64,
    /// Offset of this area from the start of the supplied file descriptor.
    pub mmap_offset: u64,
    /// Number of virtqueues.
    pub num_queues: u16,
    /// Size of virtqueues.
    pub queue_size: u16,
}
...
unsafe impl ByteValued for VhostUserInflight {}
The problem is basically reading that uninitialized memory as &[u8]: reading uninit memory is UB, and a reference to uninit memory is also UB.
Although the right thing to do is to fix qemu and make this struct packed, it would be difficult to do so without breaking backward compatibility.
Perhaps we can fix this while keeping backwards compatibility. One option: make VhostUserInflight #[repr(C, packed)] and, after reading VhostUserInflight, read a MaybeUninit<u32>; we could add a new method Endpoint<R>::skip(len: usize) for this, but we would need to skip it at the beginning of BackendReqHandler::handle_request():
pub fn handle_request(&mut self) -> Result<()> {
    let (size, buf) = match hdr.get_size() {
        0 => (0, vec![0u8; 0]),
        len => {
            // the UB happens inside recv_data()
            let (size2, rbuf) = self.main_sock.recv_data(len as usize)?;
            if size2 != len as usize {
                return Err(Error::InvalidMessage);
            }
            (size2, rbuf)
        }
    };
Another option: fix qemu (and the frontend) by adding an explicit padding field that is guaranteed to be initialized to 0:

typedef struct VhostUserInflight {
    uint64_t mmap_size;
    uint64_t mmap_offset;
    uint16_t num_queues;
    uint16_t queue_size;
    uint32_t padding; /* MUST be zero initialized */
} VhostUserInflight;

But this forces us to have two VhostUserInflight definitions, one for the backend and one for the frontend, because the frontend must continue sending the unpacked version with padding.
This issue is to continue the discussion started on #208
Hi,
@dorindabassey and I are working on a vhost-user-gpu implementation, see: rust-vmm/vhost-device#598.
We are aiming at being compatible with QEMU's displays, which means supporting the VHOST_USER_GPU_SET_SOCKET
message and the protocol on this socket.
For specification see: https://www.qemu.org/docs/master/interop/vhost-user-gpu.html.
The VHOST_USER_GPU_SET_SOCKET
is currently defined in the FrontendReq
enum:
vhost/vhost/src/vhost_user/message.rs
Line 162 in 6ce9d36
but is not being handled here:
vhost/vhost/src/vhost_user/backend_req_handler.rs
Lines 370 to 376 in 6ce9d36
The protocol on this socket is nearly the same as the normal vhost-user protocol, and the header structure is exactly the same as the existing VhostUserMsgHeader. The problem is the flags, because only the REPLY flag is specified; not even a VERSION flag is used.
I would like to implement this protocol, but my main problem is that I don't know whether adding new device-specific protocol code to this project is OK. I see that there are FS-specific things, but they might be removed (see #213).
I can think of 3 solutions with different levels of genericness:
1. Creating a GpuBackend struct and passing it to a set_gpu_socket callback when VHOST_USER_GPU_SET_SOCKET is received, in a similar fashion to the existing Backend and set_backend_req_fd callback:
vhost/vhost/src/vhost_user/backend_req_handler.rs
Lines 535 to 540 in 6ce9d36
vhost/vhost/src/vhost_user/backend_req_handler.rs
Lines 782 to 791 in 6ce9d36
2. Having a set_gpu_socket but passing it a UnixStream and letting the user parse the messages, possibly exposing Endpoint so there is no code duplication.
3. Not handling VHOST_USER_GPU_SET_SOCKET in this project at all, but providing a fully generic way to handle custom messages from the frontend, for messages that are not handled by the handle_request referenced above.
I am not sure what the stance is on device-specific and/or non-standard features in this project; any suggestions?
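For reference, the first solution could be sketched roughly like this, mirroring the existing set_backend_req_fd callback; all names here are hypothetical:

```rust
use std::os::unix::net::UnixStream;

// Hypothetical callback invoked by the request handler when the frontend
// sends VHOST_USER_GPU_SET_SOCKET.
pub trait GpuSocketHandler {
    fn set_gpu_socket(&mut self, stream: UnixStream);
}

// Demo GPU backend that keeps the socket for the vhost-user-gpu protocol.
pub struct DemoGpuBackend {
    pub gpu_socket: Option<UnixStream>,
}

impl GpuSocketHandler for DemoGpuBackend {
    fn set_gpu_socket(&mut self, stream: UnixStream) {
        self.gpu_socket = Some(stream);
    }
}
```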
We have noticed an issue in vhost while migrating VMs using Cloud Hypervisor. We are attempting to migrate a Docker container from Ubuntu 20.04 to Ubuntu 22.04 and noticed this issue. Details about our attempts are in cloud-hypervisor/cloud-hypervisor#5877.
The live_migration::live_migration_sequential::test_live_migration_ovs_dpdk test from https://github.com/cloud-hypervisor/cloud-hypervisor/blob/main/tests/integration.rs#L9616 gets stuck while running on an Ubuntu 22.04 host.
On further debugging, it gets stuck while sending the SET_LOG_BASE request to the backend at https://github.com/rust-vmm/vhost/blob/main/vhost/src/vhost_user/connection.rs#L538. It never returns from there (neither error nor success).
A backtrace is attached for further details.
Steps to reproduce:
Install openvswitch-switch-dpdk and:
$ sudo modprobe openvswitch
$ echo 6144 | sudo tee /proc/sys/vm/nr_hugepages
$ sudo service openvswitch-switch start
$ sudo ovs-vsctl init
$ sudo ovs-vsctl set Open_vSwitch . other_config:dpdk-init=true
$ sudo service openvswitch-switch restart
$ sudo ovs-vsctl add-br ovsbr0 -- set bridge ovsbr0 datapath_type=netdev
$ sudo ovs-vsctl add-port ovsbr0 vhost-user2 -- set Interface vhost-user2 type=dpdkvhostuserclient options:vhost-server-path=/tmp/dpdkvhostclient2
$ sudo ip link set up dev ovsbr0
$ sudo service openvswitch-switch restart
Clone Cloud Hypervisor and build
$ git clone [email protected]:cloud-hypervisor/cloud-hypervisor.git
$ cd cloud-hypervisor
$ cargo build
$ sudo setcap cap_net_admin+ep ./target/debug/cloud-hypervisor
Build custom kernel from here : https://github.com/cloud-hypervisor/cloud-hypervisor/tree/main#custom-kernel-and-disk-image
Get the Guest image from here : https://cloud-hypervisor.azureedge.net/focal-server-cloudimg-amd64-custom-20210609-0.qcow2
and follow the steps : https://github.com/cloud-hypervisor/cloud-hypervisor/tree/main#disk-image
Once you have custom kernel and guest image, run below command from different terminals
$ target/debug/cloud-hypervisor --api-socket /tmp/cloud-hypervisor.sock --cpus boot=2 --memory size=0,shared=on --memory-zone id=mem0,size=1G,shared=on,host_numa_node=0 --kernel <path to custom kernel>/vmlinux --cmdline "root=/dev/vda1 console=hvc0 rw systemd.journald.forward_to_console=1" --disk path=<path to focal>/focal-server-cloudimg-amd64-custom-20210609-0.raw path=/tmp/ubuntu-cloudinit.img --net tap=,mac=12:34:56:78:90:01,ip=192.168.1.1,mask=255.255.255.0 vhost_user=true,socket=/tmp/dpdkvhostclient2,num_queues=2,queue_size=256,vhost_mode=server -v
$ target/debug/cloud-hypervisor --api-socket /tmp/cloud-hypervisor.sock.dest -v
$ target/debug/ch-remote --api-socket /tmp/cloud-hypervisor.sock.dest receive-migration unix:/tmp/live-migration.sock
$ target/debug/ch-remote --api-socket /tmp/cloud-hypervisor.sock send-migration unix:/tmp/live-migration.sock