cneira / firecracker-task-driver

nomad task driver that uses firecracker to start micro-vms

License: Apache License 2.0

HCL 0.09% Go 99.21% Shell 0.70%
firecracker-microvms firecracker nomad task-driver vmlinux kernel-image bootdisk cni firecracker-task-driver rootfs

firecracker-task-driver's Introduction

Firecracker Task Driver

nomad task driver for creating Firecracker micro-vms.

Requirements

Note: The last version of firecracker that works with this nomad plugin is 0.25.2; more work is needed to make it work with later releases.

Installation

Compile and install the firecracker-task-driver binary, put it in plugin_dir, and then add a plugin "firecracker-task-driver" {} line to your nomad config file.

go get github.com/cneira/firecracker-task-driver
cp $GOPATH/bin/firecracker-task-driver YOURPLUGINDIR

Then in your nomad config file, set

plugin "firecracker-task-driver" {}

In developer/test mode (nomad agent -dev), plugin_dir appears to be unset, so you will need to mkdir plugins, copy the firecracker-task-driver binary to plugins, and add plugin_dir = "path/to/plugins" to the above config file. Then you can run it like:

nomad agent -dev -config nomad.config

For more details see the nomad docs.

Container network configuration

{
  "name": "default",
  "cniVersion": "0.4.0",
  "plugins": [
    {
      "type": "ptp",
      "ipMasq": true,
      "ipam": {
        "type": "host-local",
        "subnet": "192.168.127.0/24",
        "resolvConf": "/etc/resolv.conf"
      }
    },
    {
      "type": "firewall"
    },
    {
      "type": "tc-redirect-tap"
    }
  ]
}

Example : exposing port 27960 on micro-vm

{
  "name": "microvms2",
  "cniVersion": "0.4.0",
  "plugins": [
    {
      "type": "ptp",
      "ipMasq": true,
      "ipam": {
        "type": "host-local",
        "subnet": "192.168.127.0/24",
        "resolvConf": "/etc/resolv.conf"
      }
    },
    {
      "type": "firewall"
    },
    {
      "type": "portmap",
      "capabilities": {"portMappings": true},
      "runTimeConfig": {
        "portMappings": [
          { "hostPort": 27960, "containerPort": 27960, "protocol": "udp" }
        ]
      }
    },
    {
      "type": "tc-redirect-tap"
    }
  ]
}

These examples give your vms connectivity to the outside world. The name of the first network is default, and this name is the value passed as Network in the task driver job spec. The file name must also match the name of the network and carry the suffix .conflist (for example, default.conflist).
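The naming rule above is easy to get wrong, so a small check can help before submitting a job. The following is a minimal sketch, not part of the driver: `check_conflist` is a hypothetical helper, and the check for a `tc-redirect-tap` entry is an assumption based only on the fact that both examples above include that plugin in their chain.

```python
import json
from pathlib import Path


def check_conflist(path, network):
    """Return a list of problems found in a CNI .conflist file.

    `network` is the value you intend to pass as Network in the job spec.
    This is an illustrative helper, not something the driver provides.
    """
    path = Path(path)
    problems = []

    # The file name must be <network name>.conflist.
    if path.name != f"{network}.conflist":
        problems.append(f"file name {path.name!r} should be {network}.conflist")

    conf = json.loads(path.read_text())

    # The "name" field inside the file must match the network name too.
    if conf.get("name") != network:
        problems.append(f"name is {conf.get('name')!r}, expected {network!r}")

    # Assumption: the chain needs tc-redirect-tap, as in the examples above.
    plugin_types = [p.get("type") for p in conf.get("plugins", [])]
    if "tc-redirect-tap" not in plugin_types:
        problems.append("no tc-redirect-tap plugin in the chain")

    return problems
```

For instance, `check_conflist("/etc/cni/conf.d/default.conflist", "default")` would return an empty list when the file is consistent with the job spec; the directory shown is only an example location.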

Creating a rootfs and kernel image for firecracker

We need an ext4 root filesystem to use as the disk and an uncompressed vmlinux image; the process for generating them is described here.

Using ZFS zvols to create a rootfs for microvms

Leveraging ZFS zvols to expose a rootfs to firecracker is really simple, and ZFS has a lot of benefits.

First download a template image, for example the CentOS 7 template from OpenVZ.

Now create a ZVOL to host this tarball

$ zfs create -V 1G  zpool/centos7vm 
$ mkfs.ext4  /dev/zvol/zpool/centos7vm
$ mount -t ext4  /dev/zvol/zpool/centos7vm /mnt
$ tar xfvz centos-7-x86_64-minimal.tar.gz -C /mnt
$ zfs snapshot zpool/centos7vm@final 

Now just use your new zvol as your BootDisk. For example:

job "example3" {
  datacenters = ["dc1"]
  type        = "service"

  group "test" {
    restart {
      attempts = 0
      mode     = "fail"
    }
    task "test01" {
      driver = "firecracker-task-driver"
      config {
        Vcpus       = 1
        Mem         = 128
        KernelImage = "/home/cneira/kernel-images/vmlinux.bin"
        BootDisk    = "/dev/zvol/zpool/centos7vm"
        Network     = "default"
      }
    }
  }
}

Firecracker task driver options


KernelImage (not required, default: vmlinux )

  • Kernel image to be used on the micro-vm. If this option is omitted, the driver expects a vmlinux file in the allocation dir.

BootOptions (not required, default: "ro console=ttyS0 reboot=k panic=1 pci=off nomodules")

  • Kernel command line.

BootDisk (not required, default: rootfs.ext4)

  • ext4 rootfs to use. If this is omitted, the driver expects a rootfs called rootfs.ext4 in the allocation dir.

Disks (not required)

  • Additional disks to add to the micro-vm. Each entry must use the suffix :ro or :rw; the option can be specified multiple times.

Network (not required)

  • Network name if using CNI

Vcpus (not required, default: 1)

  • Number of cpus to assign to micro-vm.

Cputype (not required)

  • The CPU template defines a set of flags to be disabled in the microvm so that the features exposed to the guest are the same as in the selected instance type. Available templates are C3 and T2.

Mem (not required, default: 512)

  • Amount of memory in Megabytes to assign to micro-vm.

Firecracker (not required, default: "/usr/bin/firecracker")

  • Location of the firecracker binary. The option can be omitted if the environment variable FIRECRACKER_BIN is set.

Log (not required)

  • Where to write logs from micro-vm.

DisableHt (not required, default: false)

  • Disable CPU Hyperthreading.
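The :ro / :rw suffix required on Disks entries can be illustrated with a short sketch. This is not the driver's own parsing code; `parse_disk` is a hypothetical helper that only shows how an entry splits into a path and an access mode.

```python
def parse_disk(spec):
    """Split a Disks entry such as "/path/disk0.ext4:rw" into (path, read_only).

    Hypothetical helper for illustration; the driver's real parsing may differ.
    """
    # Split on the last colon so that colons earlier in the path are preserved.
    path, sep, mode = spec.rpartition(":")
    if not sep or not path or mode not in ("ro", "rw"):
        raise ValueError(f"disk entry {spec!r} must end in :ro or :rw")
    return path, mode == "ro"
```

An entry without one of the two suffixes raises `ValueError`, which mirrors the requirement stated for the Disks option.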

When the microvm starts, a file is created in /tmp/ named <task name>-<allocation id>. For example, /tmp/test01-785f9472-52a7-3dbf-8305-d482b1f7dc6f will contain the following info:

{
 "AllocId": "590983f4-499a-380f-420e-e5be4d5f46d9",
 "Ip": "192.168.127.62/24",
 "Serial": "/dev/pts/3",
 "Pid": "237216",
 "Vnic": "veth05fb4547vm"
}
  • AllocId (given by nomad)
  • Ip (Ip address assigned by cni configuration)
  • Serial (tty where a serial console is setup for the vm)
  • Pid ( Pid for the firecracker process that started the vm)
  • Vnic (virtual interface on the host linked to the vm)
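Because the state file is plain JSON, scripts can read it to find the guest IP or the serial device. The following is a minimal sketch under stated assumptions: the file name pattern `<task name>-<allocation id>` is inferred from the example path above, and `read_state` and `guest_ip` are hypothetical helpers, not part of the driver.

```python
import ipaddress
import json
from pathlib import Path


def read_state(task_name, alloc_id, state_dir="/tmp"):
    """Load the state file a microvm writes when it starts.

    Assumes the file is named <task name>-<allocation id> inside state_dir.
    """
    path = Path(state_dir) / f"{task_name}-{alloc_id}"
    return json.loads(path.read_text())


def guest_ip(state):
    """Return the guest address without its prefix length, e.g. 192.168.127.62."""
    return str(ipaddress.ip_interface(state["Ip"]).ip)
```

With the example contents shown above, `state["Serial"]` would be `/dev/pts/3` and `guest_ip(state)` would be `192.168.127.62`.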

Examples:

Omitting KernelImage and BootDisk

If KernelImage and BootDisk are not specified, they default to vmlinux and rootfs.ext4 in the allocation directory.

job "example" {
  datacenters = ["dc1"]
  type        = "service"
  group "test" {
    restart {
      attempts = 0
      mode     = "fail"
    }

    task "test01" {
      artifact {
        source      = "https://firecracker-kernels.s3-sa-east-1.amazonaws.com/vmlinux-5.4.0-rc5.tar.gz"
        destination = "."
      }
      artifact {
        source      = "https://firecracker-rootfs.s3-sa-east-1.amazonaws.com/ubuntu16.04.rootfs.tar.gz"
        destination = "."
      }
      driver = "firecracker-task-driver"
      config {
        Vcpus   = 1
        Mem     = 128
        Network = "default"
      }
    }
  }
}

CNI network configuration


job "cni-network-configuration-example" {
  datacenters = ["dc1"]
  type        = "service"

  group "test" {
    restart {
      attempts = 0
      mode     = "fail"
    }
    task "test01" {
      driver = "firecracker-task-driver"
      config {
       KernelImage = "/home/build/firecracker/hello-vmlinux.bin" 
       Firecracker = "/home/build/firecracker/firecracker" 
       Vcpus = 1 
       Mem = 128
       BootDisk = "/home/build/firecracker/hello-rootfs.ext4"
       Network = "fcnet"
      }
    }
  }
}

Additional Disks configuration


job "neverwinter" {
  datacenters = ["dc1"]
  type        = "service"

  group "nwn-group" {
    task "nwn-server" {
      driver = "firecracker-task-driver"
      config {
        Vcpus       = 1
        KernelImage = "/home/cneira/Development/vmlinuxs/vmlinux"
        BootDisk    = "/home/cneira/Development/rootfs/ubuntu/18.04/nwnrootfs.ext4"
        Disks       = ["/home/cneira/Development/disks/disk0.ext4:rw"]
        Mem         = 1000
        Network     = "default"
      }
    }
  }
}

Accessing the microvm using serial console

The firecracker-task-driver exposes the serial console, which is handy for troubleshooting network issues. Each microvm generates a state file in the /tmp/ directory, named using the task name + allocation id. For example:

-rw-r--r--. 1 root root  152 May 12 14:07 /tmp/test01-590983f4-499a-380f-420e-e5be4d5f46d9

The contents of the state file should be like the following:

{
 "AllocId": "590983f4-499a-380f-420e-e5be4d5f46d9",
 "Ip": "192.168.127.62/24",
 "Serial": "/dev/pts/3",
 "Pid": "237216",
 "Vnic": "veth05fb4547vm"
}

The Serial field tells us which serial port is exposed, so it is just a matter of connecting to it. You could use SCREEN(1) to connect to the serial console.

$ sudo screen /dev/pts/3

Started Update UTMP about System Runlevel Changes.

CentOS Linux 7 (Core)
Kernel 4.14.225 on an x86_64

192 login: 

Demo

asciicast

Support

ko-fi

It's also possible to support the project on Patreon

I work on this project in my free time, and my country is not on the list of countries available for GitHub Sponsors, so any help that lets me continue working on this is appreciated.

firecracker-task-driver's People

Contributors

cneira, dependabot[bot], ncode, scar26, valentatomas

firecracker-task-driver's Issues

Rootfs links not accessible

Got "access denied" when downloading the rootfses. Probably b/c S3 bucket config. Please fix the links if possible. Thanks!

Bug- Veth is not releasing when the MicroVM restart

I am working on a project where I am deploying the micro VM using nomad, the driver is working fine but there is an issue, when the VM is restarting or when we are updating the job with new rootfs, the VM is failing to start. When I dug more, I found the driver is unable to assign IP to the new VM as the IP range is exhausted. When I troubleshoot more I found that there are so many Firecracker VM is created, with each restart it provision more and more VMs uncontrollably and exhaust the whole IP range. Kindly refer to the screenshot to support my case. I am still trying to figure out this behavior of the driver. Technically it should update the rootfs and restart the VM with new rootfs and assign the same IP or new one but why it is creating the VM in the background? I would really appreciate the help here.

Add support for address_mode = "alloc"

@cneira Thanks for your update.

Now also can not support address_mode = "alloc"

cni conf: /etc/cni/conf.d/firecracker.conflist

{
  "name": "firecracker",
  "cniVersion": "0.4.0",
  "plugins": [
    {
      "type": "ptp",
      "ipMasq": true,
      "ipam": {
        "type": "host-local",
        "subnet": "192.168.60.0/24",
        "resolvConf": "/etc/resolv.conf"
      }
    },
    {
      "type": "tc-redirect-tap"
    }
  ]
}

job config

job "hello" {
    datacenters = ["dc1"]
    type = "service"

    group "sshd" {
        network {
            # mode = "cni/mynet"
            port "ssh" {
                to = 22
            }
        }
        service {
            name = "sshd"
            port = "ssh"
            address_mode = "alloc"
            check {
                type = "tcp"
                interval = "10s"
                timeout = "2s"
                address_mode = "alloc"
            }
        }

        task "sshd" {
            driver = "firecracker-task-driver"

            config {
                KernelImage = "/home/ox0spy/projects/nomad/study/firecracker/vmlinux.bin"
                BootDisk = "/home/ox0spy/projects/nomad/study/firecracker/rootfs.ext4"
                Firecracker = "/usr/local/bin/firecracker"
                Vcpus       = 1
                Mem         = 128
                Network     = "firecracker"
            }
        }
    }
}

docs for address_mode in service block: https://www.nomadproject.io/docs/job-specification/service#address_mode

run job

nomad status <alloc-id> got the below error message:

Setup Failure  failed to setup alloc: pre-run hook "group_services" failed: unable to get address for service "sshd": cannot use address_mode="alloc": no allocation network status reported

Originally posted by @ox0spy in #9 (comment)

Jailer

How to operate this driver using Firecracker with Jailer?

Bug install firecracker-task-driver

Hello, I can't install firecracker-task-driver.

  • go version go1.11 linux/amd64
  • command - go get github.com/cneira/firecracker-task-driver
  • error - package crypto/ed25519: unrecognized import path "crypto/ed25519" (import path does not begin with hostname)

How to registry service to consul

Hi cneira,

The driver can not support register service to consul

job "neverwinter" {
    datacenters = ["dc1"]
    type        = "service"

    group "nwn-group" {
        network {
            mode = "cni/microvms"
        }

        service {
            name = "nwn-service"
            port = 22
            address_mode = "alloc"
            check {
                type = "tcp"
                interval = "10s"
                timeout = "2s"
                address_mode = "alloc"
            }
        }

        task "nwn-server" {
            driver = "firecracker-task-driver"
            config {
                Vcpus = 1
                KernelImage = "/home/cneira/Development/vmlinuxs/vmlinux"
                BootDisk= "/home/cneira/Development/rootfs/ubuntu/18.04/nwnrootfs.ext4"
                Disks = [ "/home/cneira/Development/disks/disk0.ext4:rw" ]
                Mem = 1000
                Network = "microvms"
            }
        }
    }
}

I modify some code, but it's not work correctly.
support-cni-service.txt

note: move support-cni-service.txt support-cni-service.patch

I think I should get the IP Address assigned by group -> network section, then setup taskConfigSpec.Nic.

Could you give me some advice?

Thx!

Supporting snapshot, pause/restore

Having the ability to snapshot and stop a running service would be great. Firecracker supports this.

https://github.com/firecracker-microvm/firecracker/blob/3388fa94c2ceeb2269a6fc9479b6f2798604c4e7/docs/snapshotting/snapshot-support.md

It will allow massively over-provisioning on RAM, if you run super heavy instances that don't get a lot of traffic. All without writing hard code (just keep all your state in RAM).

Here's how Codesandbox uses it to fork a running VM in under 2 seconds: https://codesandbox.io/blog/how-we-clone-a-running-vm-in-2-seconds.

firecracker-task-driver err="rpc error: code = Unimplemented desc = unknown service plugin.GRPCStdio"

Nomad: 1.1.2

Logs

    2021-09-14T23:49:13.352+0200 [DEBUG] agent.plugin_loader: starting plugin: plugin_dir=/opt/nomad/plugins path=/opt/nomad/plugins/firecracker-task-driver args=[/opt/nomad/plugins/firecracker-task-driver]
    2021-09-14T23:49:13.353+0200 [DEBUG] agent.plugin_loader: plugin started: plugin_dir=/opt/nomad/plugins path=/opt/nomad/plugins/firecracker-task-driver pid=1765320
    2021-09-14T23:49:13.353+0200 [DEBUG] agent.plugin_loader: waiting for RPC address: plugin_dir=/opt/nomad/plugins path=/opt/nomad/plugins/firecracker-task-driver
    2021-09-14T23:49:13.512+0200 [DEBUG] agent.plugin_loader: using plugin: plugin_dir=/opt/nomad/plugins version=2
    2021-09-14T23:49:13.512+0200 [DEBUG] agent.plugin_loader.firecracker-task-driver: plugin address: plugin_dir=/opt/nomad/plugins network=unix address=/tmp/plugin021821091 timestamp=2021-09-14T23:49:13.510+0200
    2021-09-14T23:49:13.522+0200 [DEBUG] agent.plugin_loader.stdio: received EOF, stopping recv loop: plugin_dir=/opt/nomad/plugins err="rpc error: code = Unimplemented desc = unknown service plugin.GRPCStdio"
    2021-09-14T23:49:13.533+0200 [DEBUG] agent.plugin_loader: plugin process exited: plugin_dir=/opt/nomad/plugins path=/opt/nomad/plugins/firecracker-task-driver pid=1765320
    2021-09-14T23:49:13.538+0200 [DEBUG] agent.plugin_loader: plugin exited: plugin_dir=/opt/nomad/plugins
    2021-09-14T23:49:13.656+0200 [DEBUG] agent.plugin_loader.docker: using client connection initialized from environment: plugin_dir=/opt/nomad/plugins
    2021-09-14T23:49:13.656+0200 [DEBUG] agent.plugin_loader.docker: using client connection initialized from environment: plugin_dir=/opt/nomad/plugins

Readme improvements

I'm trying to follow the readme, but I'm running into lots of issues understanding it. If I make it through, I'll try to make a PR with some clarity changes, but I wanted to note some issues that I was wondering about upfront.

  1. In the container network config section, I tried figuring out how to install both repos, but couldn't make it work.
  2. It would be nice to know what is the "minimum to install" from the requirements, vs "nice to have" (if there are any that aren't required, I haven't made it through).
  3. I tried glancing over the rootfs and image section, but don't really understand why it's needed. This might just be my lack of Firecracker understanding though.

I'm also wondering, why do all the task driver options start with an uppercase letter? Makes them quite unpleasant to have in a Nomad file while other options are lowercase afaik.

Anyways, we'll see how far I get, but some insight might be nice :)

Request: Propagate Firecracker Task Driver errors to Nomad UI

So I have a task start failing with the following, not-very-useful info:

rpc error: code = Unknown desc = task with ID "8ee3098b-7420-cb04-2892-fedaa3c730ba/tenant-plugin/339ec6bd" failed

However, going to the Nomad Agent logs I get the following, much more intelligible errors:
failure when invoking CNI: failed to load CNI configuration from dir "/etc/cni/conf.d" for network "default": no net configurations found in /etc/cni/conf.d"

    2022-04-04T13:23:32.274-0400 [INFO]  client.driver_mgr.firecracker-task-driver: starting firecracker task: driver=firecracker-task-driver driver_cfg="{KernelImage: BootOptions: BootDisk: Disks:[] Network:default Nic:{Ip: Gateway: Interface: Nameservers:[]} Vcpus:1 Cputype: Mem:128 Firecracker:/usr/bin/firecracker Log: DisableHt:false}" @module=firecracker-task-driver timestamp=2022-04-04T13:23:32.274-0400
    2022-04-04T13:23:32.274-0400 [INFO]  client.driver_mgr.firecracker-task-driver: Starting firecracker: driver=firecracker-task-driver driver_initialize_container="&{/usr/bin/firecracker /tmp/NomadClient1700322499/3aee425c-e789-5c1c-e029-d552efbf942c/tenant-plugin/vmlinux  console=ttyS0 reboot=k panic=1 pci=off nomodules /tmp/NomadClient1700322499/3aee425c-e789-5c1c-e029-d552efbf942c/tenant-plugin/rootfs.ext4  [] default {   []} []    false 1  300    false false [] <nil> 0xc384c0}+" @module=firecracker-task-driver timestamp=2022-04-04T13:23:32.274-0400
    2022-04-04T13:23:32.275-0400 [INFO]  client.driver_mgr.firecracker-task-driver: Error starting firecracker vm: driver=firecracker-task-driver @module=firecracker-task-driver driver_cfg="Failed to start machine: failure when invoking CNI: failed to load CNI configuration from dir \"/etc/cni/conf.d\" for network \"default\": no net configurations found in /etc/cni/conf.d" timestamp=2022-04-04T13:23:32.275-0400
    2022-04-04T13:23:32.275-0400 [ERROR] client.alloc_runner.task_runner: running driver failed: alloc_id=3aee425c-e789-5c1c-e029-d552efbf942c task=tenant-plugin error="rpc error: code = Unknown desc = task with ID \"3aee425c-e789-5c1c-e029-d552efbf942c/tenant-plugin/0e1713e6\" failed"
    2022-04-04T13:23:32.275-0400 [INFO]  client.alloc_runner.task_runner: not restarting task: alloc_id=3aee425c-e789-5c1c-e029-d552efbf942c task=tenant-plugin reason="Error was unrecoverable"

I was wondering if it'd be possible to propagate that error up to the UI? Thanks!

Request for examples

What would be really helpful is if there were examples of using this driver.

Examples I would find particularly useful:

  1. Connecting to another task within a group where one of those tasks is a Firecracker VM (does one just talk to localhost?)
  2. Placing artifact data into the task
  3. Working with environment variables, or noting that it's not possible to do so

veth interface

Hi Neira,

When Nomad runs a job for creation of micro VM, it creates a veth interface, but when we stop the job, it doesn't remove that veth.
So, after running some jobs, you would have many veth interfaces on the host machine. It's a bug or we have to do something in the job?

225: vethdc6cd6b7@if6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
link/ether 02:e2:74:60:24:28 brd ff:ff:ff:ff:ff:ff link-netnsid 1
inet 192.168.127.1/32 scope global vethdc6cd6b7
valid_lft forever preferred_lft forever
inet6 fe80::e2:74ff:fe60:2428/64 scope link
valid_lft forever preferred_lft forever
230: vethc53f5bdf@if6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
link/ether 72:3a:7f:35:a1:e1 brd ff:ff:ff:ff:ff:ff link-netnsid 2
inet 192.168.127.1/32 scope global vethc53f5bdf
valid_lft forever preferred_lft forever
inet6 fe80::703a:7fff:fe35:a1e1/64 scope link
valid_lft forever preferred_lft forever

Dead lock stop jobs

Nomad Versions: 1.0.3 and head
Firecracker: v0.22.4

How to reproduce:

  • test01-dc1.nomad
job "test01-dc1" {
  datacenters = ["dc1"]
  type        = "service"

  group "test01-dc1" {
    restart {
      attempts = 0
      mode     = "fail"
    }

    task "firecracker" {
      artifact {
        source = ".../vmlinux-5.4.0-rc5.tar.gz"
        destination = "."
      }
      artifact {
        source = ".../centos-7-x86_64_rootfs.tar.gz"
        destination = "."
      }
      driver = "firecracker-task-driver"
      config {
        Vcpus = 2
        Mem = 128
        Network = "test01-dc1"
      }
    }
  }
}
  • /etc/cni/conf.d/test01-dc1.confdefault
{
  "name": "test01-dc1",
  "cniVersion": "0.4.0",
  "plugins": [
    {
      "type": "macvlan",
      "master": "br0",
      "ipam": {
         "type": "static",
         "addresses": [
            {
                "address": "192.168.0.30/24",
                "gateway": "192.168.0.30"
            }
         ]
      }
    },
    {
      "type": "firewall"
    },
    {
      "type": "portmap",
      "capabilities": {
        "portMappings": true
      }
    },
    {
      "type": "tc-redirect-tap"
    }
  ]
}

Steps:
1 - Start the job
2 - Wait for it to initialize
3 - Stop the job
4 - List the job and it will list as dead
5 - Check the allocation and the vm will still be running
