cneira / firecracker-task-driver
Nomad task driver that uses Firecracker to start micro-VMs
License: Apache License 2.0
Hi cneira,
The driver can't register a service with Consul.
job "neverwinter" {
datacenters = ["dc1"]
type = "service"
group "nwn-group" {
network {
mode = "cni/microvms"
}
service {
name = "nwn-service"
port = 22
address_mode = "alloc"
check {
type = "tcp"
interval = "10s"
timeout = "2s"
address_mode = "alloc"
}
}
task "nwn-server" {
driver = "firecracker-task-driver"
config {
Vcpus = 1
KernelImage = "/home/cneira/Development/vmlinuxs/vmlinux"
BootDisk= "/home/cneira/Development/rootfs/ubuntu/18.04/nwnrootfs.ext4"
Disks = [ "/home/cneira/Development/disks/disk0.ext4:rw" ]
Mem = 1000
Network = "microvms"
}
}
}
}
I modified some code, but it doesn't work correctly.
support-cni-service.txt
note: rename support-cni-service.txt to support-cni-service.patch
I think I should get the IP address assigned by the group -> network section, then set up taskConfigSpec.Nic.
Could you give me some advice?
Thx!
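The idea above could be sketched in Go roughly like this. Note this is only an illustration: the `Nic` struct here is a hypothetical local copy of the driver's config block, and the gateway/interface values would come from the same CNI result that assigned the address.

```go
package main

import (
	"fmt"
	"net"
)

// Nic mirrors the driver's Nic config block (a hypothetical local
// copy for illustration; the real struct lives in the driver sources).
type Nic struct {
	Ip          string // CIDR, e.g. "192.168.127.62/24"
	Gateway     string
	Interface   string
	Nameservers []string
}

// nicFromAlloc builds a Nic from the address the group network (CNI)
// assigned to the allocation, validating that it is a proper CIDR.
func nicFromAlloc(allocCIDR, gateway, iface string) (Nic, error) {
	if _, _, err := net.ParseCIDR(allocCIDR); err != nil {
		return Nic{}, fmt.Errorf("bad alloc address %q: %w", allocCIDR, err)
	}
	return Nic{Ip: allocCIDR, Gateway: gateway, Interface: iface}, nil
}

func main() {
	nic, err := nicFromAlloc("192.168.127.62/24", "192.168.127.1", "eth0")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", nic)
}
```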
Hi, could you please clarify what the license of this code is? I see that you have opted for GPLv2, but your codebase also includes sources coming directly from firectl, and those were originally Apache 2.0. Most of this file:
comes directly from firectl:
Could you please clarify? Thank you.
Got "access denied" when downloading the rootfses, probably because of the S3 bucket configuration. Please fix the links if possible. Thanks!
So I have a task start failing with the following, not-very-useful info:
rpc error: code = Unknown desc = task with ID "8ee3098b-7420-cb04-2892-fedaa3c730ba/tenant-plugin/339ec6bd" failed
However, going to the Nomad Agent logs I get the following, much more intelligible errors:
failure when invoking CNI: failed to load CNI configuration from dir "/etc/cni/conf.d" for network "default": no net configurations found in /etc/cni/conf.d
2022-04-04T13:23:32.274-0400 [INFO] client.driver_mgr.firecracker-task-driver: starting firecracker task: driver=firecracker-task-driver driver_cfg="{KernelImage: BootOptions: BootDisk: Disks:[] Network:default Nic:{Ip: Gateway: Interface: Nameservers:[]} Vcpus:1 Cputype: Mem:128 Firecracker:/usr/bin/firecracker Log: DisableHt:false}" @module=firecracker-task-driver timestamp=2022-04-04T13:23:32.274-0400
2022-04-04T13:23:32.274-0400 [INFO] client.driver_mgr.firecracker-task-driver: Starting firecracker: driver=firecracker-task-driver driver_initialize_container="&{/usr/bin/firecracker /tmp/NomadClient1700322499/3aee425c-e789-5c1c-e029-d552efbf942c/tenant-plugin/vmlinux console=ttyS0 reboot=k panic=1 pci=off nomodules /tmp/NomadClient1700322499/3aee425c-e789-5c1c-e029-d552efbf942c/tenant-plugin/rootfs.ext4 [] default { []} [] false 1 300 false false [] <nil> 0xc384c0}+" @module=firecracker-task-driver timestamp=2022-04-04T13:23:32.274-0400
2022-04-04T13:23:32.275-0400 [INFO] client.driver_mgr.firecracker-task-driver: Error starting firecracker vm: driver=firecracker-task-driver @module=firecracker-task-driver driver_cfg="Failed to start machine: failure when invoking CNI: failed to load CNI configuration from dir \"/etc/cni/conf.d\" for network \"default\": no net configurations found in /etc/cni/conf.d" timestamp=2022-04-04T13:23:32.275-0400
2022-04-04T13:23:32.275-0400 [ERROR] client.alloc_runner.task_runner: running driver failed: alloc_id=3aee425c-e789-5c1c-e029-d552efbf942c task=tenant-plugin error="rpc error: code = Unknown desc = task with ID \"3aee425c-e789-5c1c-e029-d552efbf942c/tenant-plugin/0e1713e6\" failed"
2022-04-04T13:23:32.275-0400 [INFO] client.alloc_runner.task_runner: not restarting task: alloc_id=3aee425c-e789-5c1c-e029-d552efbf942c task=tenant-plugin reason="Error was unrecoverable"
I was wondering if it'd be possible to propagate that error up to the UI? Thanks!
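For anyone hitting the same error: it usually just means no CNI network named "default" is configured on the client. A minimal /etc/cni/conf.d/default.conflist could look like the sketch below, assuming the ptp, host-local, and tc-redirect-tap plugins are installed; the subnet is illustrative.

```json
{
  "name": "default",
  "cniVersion": "0.4.0",
  "plugins": [
    {
      "type": "ptp",
      "ipMasq": true,
      "ipam": {
        "type": "host-local",
        "subnet": "192.168.127.0/24",
        "resolvConf": "/etc/resolv.conf"
      }
    },
    {
      "type": "tc-redirect-tap"
    }
  ]
}
```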
Hi Neira,
When Nomad runs a job that creates a micro-VM, it creates a veth interface, but when we stop the job, that veth is not removed.
So after running some jobs you end up with many veth interfaces on the host machine. Is this a bug, or do we have to do something in the job?
225: vethdc6cd6b7@if6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
link/ether 02:e2:74:60:24:28 brd ff:ff:ff:ff:ff:ff link-netnsid 1
inet 192.168.127.1/32 scope global vethdc6cd6b7
valid_lft forever preferred_lft forever
inet6 fe80::e2:74ff:fe60:2428/64 scope link
valid_lft forever preferred_lft forever
230: vethc53f5bdf@if6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
link/ether 72:3a:7f:35:a1:e1 brd ff:ff:ff:ff:ff:ff link-netnsid 2
inet 192.168.127.1/32 scope global vethc53f5bdf
valid_lft forever preferred_lft forever
inet6 fe80::703a:7fff:fe35:a1e1/64 scope link
valid_lft forever preferred_lft forever
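Until the driver cleans these up on job stop, a small Go helper can at least list the leftover veth interfaces for manual deletion (e.g. with `ip link delete <name>`). The parsing below is a best-effort sketch against the `ip` output format shown above.

```go
package main

import (
	"fmt"
	"strings"
)

// leftoverVeths scans `ip -o link`-style output and returns the veth
// interface names, so they can be deleted manually until the driver
// removes them itself when the job stops.
func leftoverVeths(ipLinkOutput string) []string {
	var names []string
	for _, line := range strings.Split(ipLinkOutput, "\n") {
		fields := strings.Fields(line)
		if len(fields) < 2 {
			continue
		}
		// Lines look like: "225: vethdc6cd6b7@if6: <BROADCAST,...>"
		name := strings.TrimSuffix(fields[1], ":")
		if at := strings.Index(name, "@"); at != -1 {
			name = name[:at]
		}
		if strings.HasPrefix(name, "veth") {
			names = append(names, name)
		}
	}
	return names
}

func main() {
	out := `225: vethdc6cd6b7@if6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500
230: vethc53f5bdf@if6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500`
	fmt.Println(leftoverVeths(out)) // [vethdc6cd6b7 vethc53f5bdf]
}
```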
How to operate this driver using Firecracker with Jailer?
Having the ability to snapshot and stop a running service would be great. Firecracker supports this.
It would allow massive over-provisioning of RAM if you run heavyweight instances that don't get much traffic, all without writing complex code (just keep all your state in RAM).
Here's how Codesandbox uses it to fork a running VM in under 2 seconds: https://codesandbox.io/blog/how-we-clone-a-running-vm-in-2-seconds.
Hello, I can't install firecracker-task-driver.
Nomad Versions: 1.0.3 and head
Firecracker: v0.22.4
How to reproduce:
job "test01-dc1" {
datacenters = ["dc1"]
type = "service"
group "test01-dc1" {
restart {
attempts = 0
mode = "fail"
}
task "firecracker" {
artifact {
source = ".../vmlinux-5.4.0-rc5.tar.gz"
destination = "."
}
artifact {
source = ".../centos-7-x86_64_rootfs.tar.gz"
destination = "."
}
driver = "firecracker-task-driver"
config {
Vcpus = 2
Mem = 128
Network = "test01-dc1"
}
}
}
}
{
"name": "test01-dc1",
"cniVersion": "0.4.0",
"plugins": [
{
"type": "macvlan",
"master": "br0",
"ipam": {
"type": "static",
"addresses": [
{
"address": "192.168.0.30/24",
"gateway": "192.168.0.30"
}
]
}
},
{
"type": "firewall"
},
{
"type": "portmap",
"capabilities": {
"portMappings": true
}
},
{
"type": "tc-redirect-tap"
}
]
}
Steps:
1 - Start the job
2 - Wait for it to initialize
3 - Stop the job
4 - List the job and it will list as dead
5 - Check the allocation and the vm will still be running
@cneira Thanks for your update.
However, it still can't support address_mode = "alloc".
{
"name": "firecracker",
"cniVersion": "0.4.0",
"plugins": [
{
"type": "ptp",
"ipMasq": true,
"ipam": {
"type": "host-local",
"subnet": "192.168.60.0/24",
"resolvConf": "/etc/resolv.conf"
}
},
{
"type": "tc-redirect-tap"
}
]
}
job "hello" {
datacenters = ["dc1"]
type = "service"
group "sshd" {
network {
# mode = "cni/mynet"
port "ssh" {
to = 22
}
}
service {
name = "sshd"
port = "ssh"
address_mode = "alloc"
check {
type = "tcp"
interval = "10s"
timeout = "2s"
address_mode = "alloc"
}
}
task "sshd" {
driver = "firecracker-task-driver"
config {
KernelImage = "/home/ox0spy/projects/nomad/study/firecracker/vmlinux.bin"
BootDisk = "/home/ox0spy/projects/nomad/study/firecracker/rootfs.ext4"
Firecracker = "/usr/local/bin/firecracker"
Vcpus = 1
Mem = 128
Network = "firecracker"
}
}
}
}
docs for address_mode in service block: https://www.nomadproject.io/docs/job-specification/service#address_mode
nomad status <alloc-id>
got the below error message:
Setup Failure failed to setup alloc: pre-run hook "group_services" failed: unable to get address for service "sshd": cannot use address_mode="alloc": no allocation network status reported
Originally posted by @ox0spy in #9 (comment)
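If I understand the Nomad plugin API correctly, this error means the driver never reports a network status when it starts the task. A sketch of the missing piece, using a local struct that mirrors Nomad's drivers.DriverNetwork (field names assumed from the plugin docs, not copied from the driver):

```go
package main

import "fmt"

// DriverNetwork is a local illustration of Nomad's drivers.DriverNetwork.
// Task drivers return it as the second value from StartTask; without it,
// group services with address_mode = "alloc" fail with
// "no allocation network status reported".
type DriverNetwork struct {
	PortMap       map[string]int
	IP            string
	AutoAdvertise bool
}

// networkStatus builds the status the driver would report once the
// micro-VM's IP is known from the CNI result.
func networkStatus(vmIP string) *DriverNetwork {
	return &DriverNetwork{
		IP:            vmIP,
		AutoAdvertise: true,
	}
}

func main() {
	fmt.Printf("%+v\n", networkStatus("192.168.60.15"))
}
```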
Hi there,
Currently the firecracker-task-driver can't use the firecracker v1.0.0 binary because of a breaking change in Firecracker: the ht_enabled field was renamed to smt and is now optional, per https://github.com/firecracker-microvm/firecracker/releases/tag/v1.0.0
Nomad: 1.1.2
Logs
2021-09-14T23:49:13.352+0200 [DEBUG] agent.plugin_loader: starting plugin: plugin_dir=/opt/nomad/plugins path=/opt/nomad/plugins/firecracker-task-driver args=[/opt/nomad/plugins/firecracker-task-driver]
2021-09-14T23:49:13.353+0200 [DEBUG] agent.plugin_loader: plugin started: plugin_dir=/opt/nomad/plugins path=/opt/nomad/plugins/firecracker-task-driver pid=1765320
2021-09-14T23:49:13.353+0200 [DEBUG] agent.plugin_loader: waiting for RPC address: plugin_dir=/opt/nomad/plugins path=/opt/nomad/plugins/firecracker-task-driver
2021-09-14T23:49:13.512+0200 [DEBUG] agent.plugin_loader: using plugin: plugin_dir=/opt/nomad/plugins version=2
2021-09-14T23:49:13.512+0200 [DEBUG] agent.plugin_loader.firecracker-task-driver: plugin address: plugin_dir=/opt/nomad/plugins network=unix address=/tmp/plugin021821091 timestamp=2021-09-14T23:49:13.510+0200
2021-09-14T23:49:13.522+0200 [DEBUG] agent.plugin_loader.stdio: received EOF, stopping recv loop: plugin_dir=/opt/nomad/plugins err="rpc error: code = Unimplemented desc = unknown service plugin.GRPCStdio"
2021-09-14T23:49:13.533+0200 [DEBUG] agent.plugin_loader: plugin process exited: plugin_dir=/opt/nomad/plugins path=/opt/nomad/plugins/firecracker-task-driver pid=1765320
2021-09-14T23:49:13.538+0200 [DEBUG] agent.plugin_loader: plugin exited: plugin_dir=/opt/nomad/plugins
2021-09-14T23:49:13.656+0200 [DEBUG] agent.plugin_loader.docker: using client connection initialized from environment: plugin_dir=/opt/nomad/plugins
2021-09-14T23:49:13.656+0200 [DEBUG] agent.plugin_loader.docker: using client connection initialized from environment: plugin_dir=/opt/nomad/plugins
I am working on a project where I deploy micro-VMs using Nomad. The driver works fine, but there is an issue: when a VM restarts, or when we update the job with a new rootfs, the VM fails to start. When I dug in, I found the driver is unable to assign an IP to the new VM because the IP range is exhausted. Troubleshooting further, I found that many Firecracker VMs had been created: with each restart, more and more VMs are provisioned uncontrollably, exhausting the whole IP range. Kindly refer to the screenshot to support my case. I am still trying to figure out this behavior. Technically the driver should update the rootfs, restart the VM with the new rootfs, and assign the same IP or a new one, so why is it creating VMs in the background? I would really appreciate help here.
I'm trying to follow the README, but I'm running into lots of issues understanding it. If I make it through, I'll try to make a PR with some clarity changes, but I wanted to note some issues I was wondering about upfront.
I'm also wondering: why do all the task driver options start with an uppercase letter? It makes them quite unpleasant to have in a Nomad file, while other options are lowercase afaik.
Anyway, we'll see how far I get, but some insight would be nice :)
What would be really helpful is if there were examples of using this driver.
Examples I would find particularly useful: