
golemfactory / ya-runtime-vm


Docker-like runtime environment implementation for Golem

License: GNU General Public License v2.0

Shell 1.15% Rust 50.31% C 44.82% Makefile 2.66% Dockerfile 1.06%
ya-runtime golem

ya-runtime-vm's Introduction

ya-runtime-vm

ya-runtime-vm is an implementation of a Docker-like runtime environment for Linux systems.

This repository consists of 2 crates:

  • ya-runtime-vm

    An application for running Virtual Machine images pre-built for yagna.

  • gvmkit

    A tool for converting Docker images into yagna Virtual Machine images and uploading them to a public repository. Requires Docker to be installed on your system.

Building

Prerequisites:

  • rustc

    Recommendation: use the Rust toolchain installer from https://rustup.rs/

  • musl-gcc

    On an Ubuntu system, execute in a terminal:

       sudo apt install musl musl-tools

Git checkout:

Initialize the runtime/init-container/liburing submodule:

git submodule init
git submodule update

Building:

cd runtime
cargo build

Installing

Prerequisites:

  • cargo-deb

    A Cargo helper command that automatically creates binary Debian packages. With Rust already installed, execute in a terminal:

    cargo install cargo-deb

Installation:

In a terminal, change the working directory to runtime and install the freshly built Debian package.

cd runtime
sudo dpkg -i $(cargo deb | tail -n1)

This will install the binary at /usr/lib/yagna/plugins/ya-runtime-vm/ya-runtime-vm.

Command line

Follow the installation section before executing.

ya-runtime-vm 0.2.5

USAGE:
    ya-runtime-vm [OPTIONS] <SUBCOMMAND>

FLAGS:
    -h, --help       Prints help information
    -V, --version    Prints version information

OPTIONS:
    -w, --workdir <workdir>              
    -t, --task-package <task-package>    
        --cpu-cores <cpu-cores>           [default: 1]
        --mem-gib <mem-gib>               [default: 0.25]
        --storage-gib <storage-gib>       [default: 0.25]

SUBCOMMANDS:
    test              Perform a self-test
    offer-template    Print the market offer template (JSON)
    deploy            Deploy an image
    start             Start a deployed image
    help              Prints this message or the help of the given subcommand(s)

Caveats

  • Docker VOLUME command

    Directories specified in the VOLUME command become mount points for directories on the host filesystem, so any contents the image placed in them will appear empty during execution.

    If you need to place static assets inside the image, avoid declaring that directory with the VOLUME command.

ya-runtime-vm's People

Contributors

boryspoplawski, demimarie, etam, evik42, fepitre, marmarek, mfranciszkiewicz, nieznanysprawiciel, omeg, prekucki, pwalski, tworec, wkargul


ya-runtime-vm's Issues

missing some cpu capabilities flags in "golem.inf.cpu.capabilities"

Some flags, such as avx2, bmi1 and bmi2, are missing from the list in "golem.inf.cpu.capabilities".

Running

let cpu = CpuInfo::try_new().unwrap();
println!("{:?}",cpu.capabilities);

on my local machine produces the correct flags, but some of them are missing from the list returned by the API.
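To pin down exactly which flags are dropped, the host's flags can be compared with the offered list. The following is a minimal sketch, assuming a Linux host where /proc/cpuinfo has a `flags` line, and assuming the offer uses the same flag spellings as the kernel (which may not hold). The function names `cpuinfo_flags` and `missing_capabilities` are hypothetical.

```python
def cpuinfo_flags(cpuinfo_text):
    """Return the flags from the first `flags` line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def missing_capabilities(cpuinfo_text, offered):
    """Return host flags that are absent from the offered list, sorted."""
    return sorted(cpuinfo_flags(cpuinfo_text) - set(offered))
```

On a real host, `cpuinfo_text` would come from reading /proc/cpuinfo, and `offered` from the "golem.inf.cpu.capabilities" property of the offer.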

Research 9p file server integration with the VM runtime

What:

  • research on how to integrate the 9p file server with the VM runtime
  • design

Why:

  • 9p file server will provide better means of mounting directories by expanding the fixed-size filesystem root (in tmpfs) and allow for exposing the existing contents of volume directories
  • 9p file server is cross-platform, which would allow porting the VM runtime to Windows

Running a process in a container

  • attach stdout & stderr - stream receiving on demand
  • adding env variables
  • workdir setting
  • user setting (optional)

Better experience of basic MVP scenario (docker-like)

Demo:

  • Ability to run more than one process during activity
  • Simple task with tricky characters in filepath and arguments
  • Workdir defined in dockerfile
  • Setting env variables

Extra:

  • Mounting dirs not defined as volumes

Make contents of `VOLUME`s available inside (currently) mounted directories

What

  • contents of VOLUME directories are available inside the VM

Why

  • increase compatibility with Docker
  • VOLUME is commonly used in Dockerfiles

Details:

  • support both read only root fs (excluding /tmp) and an overlay mounted on a local filesystem
  • extend VM image metadata to specify the root fs mode; default to overlay mode in the VM runtime
  • extend gvmkit-build with options to specify the root fs mode

mount /dev/shm

Python multiprocessing apps currently fail, apparently because /dev/shm is missing:

[2020-12-15 16:29:10,652 DEBUG yapapi.events] CommandStdErr(agr_id='426eb630c81522d8987ebeb2796ef6610e25e14d756013b8e167fcdc66346a51', task_id='2', cmd_idx=3, output='Traceback (most recent call last):\n  File "client.py", line 140, in <module>\n    asyncio.get_event_loop().run_until_complete(task)\n  File "/usr/lib/python3.7/asyncio/base_events.py", line 584, in run_until_complete\n    return future.result()\n  File "client.py", line 130, in process_json\n    results = await par_doRx(task)\n  File "client.py", line 47, in par_doRx\n    with ProcessPoolExecutor() as engine:\n  File "/usr/lib/python3.7/concurrent/futures/process.py", line 542, in __init__\n    pending_work_items=self._pending_work_items)\n  File "/usr/lib/python3.7/concurrent/futures/process.py", line 158, in __init__\n    super().__init__(max_size, ctx=ctx)\n  File "/usr/lib/python3.7/multiprocessing/queues.py", line 42, in __init__\n    self._rlock = ctx.Lock()\n  File "/usr/lib/python3.7/multiprocessing/context.py", line 67, in Lock\n    return Lock(ctx=self.get_context())\n  File "/usr/lib/python3.7/multiprocessing/synchronize.py", line 162, in __init__\n    SemLock.__init__(self, SEMAPHORE, 1, 1, ctx=ctx)\n  File "/usr/lib/python3.7/multiprocessing/synchronize.py", line 59, in __init__\n    unlink_now)\nOSError: [Errno 38] Function not implemented\n')
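Until /dev/shm is mounted, an application could probe for this failure up front instead of crashing inside ProcessPoolExecutor. The following is a minimal sketch, assuming the failure always surfaces when a multiprocessing lock is created, as in the traceback above. The function name `semaphores_available` is hypothetical.

```python
import multiprocessing


def semaphores_available():
    """Return True if multiprocessing can create a semaphore-backed lock."""
    try:
        multiprocessing.Lock()
    except (OSError, ImportError):
        # OSError covers "[Errno 38] Function not implemented" from the
        # traceback above; ImportError is raised on platforms whose Python
        # build lacks a working sem_open implementation.
        return False
    return True
```

A caller could use the result to fall back to `concurrent.futures.ThreadPoolExecutor` when process pools are not usable.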

Release Checklist

  • Check the license and references to other projects (QEMU components).
  • Detect whether the node supports virtualization.

Unable to build: apk 404

Steps:

  • git clone thisrepo --depth 10
  • cd ya-runtime-vm
  • cargo build --release --all
  • Error

Error:

error: failed to run custom build command for `ya-runtime-vm v0.2.5 (/home/hasezoey/Downloads/ya-runtime-vm/runtime)`

Caused by:
  process didn't exit successfully: `/home/hasezoey/Downloads/ya-runtime-vm/target/release/build/ya-runtime-vm-fa369a1c85baae37/build-script-build` (exit code: 1)
  --- stdout
  wget -q -O "unverified" "https://nl.alpinelinux.org/alpine/v3.13/main/x86_64/linux-virt-5.10.16-r0.apk"

  --- stderr
  make: *** [Makefile:31: unpacked_kernel] Error 8
  Error: make failed with code 2
warning: build failed, waiting for other jobs to finish...
error: build failed

Cause:
https://nl.alpinelinux.org/alpine/v3.13/main/x86_64/linux-virt-5.10.16-r0.apk returns 404 Page Not Found

Ran on:

/etc/lsb-release
DISTRIB_ID=LinuxMint
DISTRIB_RELEASE=20.1
DISTRIB_CODENAME=ulyssa
DISTRIB_DESCRIPTION="Linux Mint 20.1 Ulyssa"

/etc/upstream-release/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=20.04
DISTRIB_CODENAME=focal
DISTRIB_DESCRIPTION="Ubuntu Focal Fossa"

POC Deploy command

  • Decode container config from image
  • generate deploy descriptor & create volume directories

The last 8 bytes hold the config size in ASCII.
The config itself sits immediately before these 8 bytes.

Example extractor:

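with open('blender-0.1.golem-app', 'rb') as f:  # binary mode: the image is not text
    b = f.read()
    json_size = int(b[-8:])          # last 8 bytes: config size in ASCII
    config = b[-(json_size + 8):-8]  # JSON config sits right before the size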
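For large images it is unnecessary to read the whole file. The following is a minimal sketch that seeks from the end instead, assuming the layout described above (JSON config followed by an 8-byte ASCII decimal size). The function name `read_image_config` is hypothetical.

```python
import json
import os

TRAILER_LEN = 8  # length of the size field, per the description above


def read_image_config(path):
    """Return the container config stored at the end of an image file."""
    with open(path, "rb") as f:
        # Read only the trailing size field.
        f.seek(-TRAILER_LEN, os.SEEK_END)
        size = int(f.read(TRAILER_LEN))
        # Jump back to the start of the JSON config and decode it.
        f.seek(-(size + TRAILER_LEN), os.SEEK_END)
        return json.loads(f.read(size))
```

Only the trailer and the config are read, so memory use does not depend on the size of the image.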

Running container entry point

Example configuration for golemfactory/blender:

{
"Config": {
            "Hostname": "26d40000ae12",
            "Domainname": "",
            "User": "",
            "AttachStdin": false,
            "AttachStdout": true,
            "AttachStderr": true,
            "Tty": false,
            "OpenStdin": false,
            "StdinOnce": false,
            "Env": [
                "PATH=/opt/blender:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
                "BLENDER_MAJOR=2.78",
                "BLENDER_VERSION=2.78a",
                "BLENDER_BZ2_URL=http://download.blender.org/release/Blender2.78/blender-2.78a-linux-glibc211-x86_64.tar.bz2"
            ],
            "Cmd": null,
            "Image": "golemfactory/blender",
            "Volumes": null,
            "WorkingDir": "",
            "Entrypoint": [
                "/usr/local/bin/entrypoint.sh"
            ],
            "OnBuild": null,
            "Labels": {}
        }
}
  • Set the uid from the User field (example value: 1000:1000).
  • Set Env variables from the container definition.
  • Allow setting additional env variables on container start.
  • If the entry point process ends, the container should end as well.
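The first three points can be sketched as follows, taking the "Config" object shown above as a dictionary. This is only an illustration under stated assumptions: it handles numeric IDs only, treats an empty User as root, and defaults a missing gid to 0. The function names `parse_user` and `build_env` are hypothetical.

```python
def parse_user(user):
    """Split a `User` value such as "1000:1000" into (uid, gid)."""
    if not user:
        return 0, 0  # assumption: an empty User means root
    uid, _, gid = user.partition(":")
    # Assumption: a missing gid falls back to the root group.
    return int(uid), (int(gid) if gid else 0)


def build_env(config, extra=None):
    """Merge the image's `Env` entries with variables given at start."""
    env = {}
    for entry in config.get("Env") or []:  # "Env" may be null in the config
        key, _, value = entry.partition("=")
        env[key] = value
    env.update(extra or {})  # start-time variables override image defaults
    return env
```

Letting start-time variables win over the image's values is a design choice of this sketch, not something the list above prescribes.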

Handle docker `ENV` in VM runtime

What

  • the default exec env should contain environment variables set in Dockerfile

Why

  • increase compatibility with Docker
  • ENV is commonly used in Dockerfiles

Timeout error: unable to Golemize docker image using gvmkit-build

Running gvmkit-build golemfactory/blender:demo gives me this error.

pull golemfactory/blender:demo |███████████████| 1 in 16:00.2 (0.00/s)
Docker Image : sha256:c76719083b512020c48290943a01657be55fe08dda234ee43d4b332b2ca66361
Entry Point  : <none>
Working Dir  : /golem/work/
Command      : '/bin/bash'
User         : <none>
Volumes      :

     - /golem/output
     - /golem/resource
     - /golem/work

Env          :

     - PATH=/blender:/usr/bin/:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
     - BLENDER_MAJOR=2.80
     - BLENDER_VERSION=2.80
     - GLIBC_VERSION=217
     - BLENDER_BZ2_URL=http://download.blender.org/release/Blender2.80/blender-2.80-linux-glibc217-x86_64.tar.bz2

Output File  : golemfactory-blender-demo-c76719083b.gvmi

extracting files |███████████████| 13278 in 1:33.1 (142.57/s)
Traceback (most recent call last):
  File "/Users/hp/Documents/gitcoin-gr9/golem-tutorial/micmac-golem/compile-docker/venv/lib/python3.9/site-packages/urllib3/connectionpool.py", line 445, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "/Users/hp/Documents/gitcoin-gr9/golem-tutorial/micmac-golem/compile-docker/venv/lib/python3.9/site-packages/urllib3/connectionpool.py", line 440, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/local/Cellar/python@3.9/3.9.2_2/Frameworks/Python.framework/Versions/3.9/lib/python3.9/http/client.py", line 1347, in getresponse
    response.begin()
  File "/usr/local/Cellar/python@3.9/3.9.2_2/Frameworks/Python.framework/Versions/3.9/lib/python3.9/http/client.py", line 307, in begin
    version, status, reason = self._read_status()
  File "/usr/local/Cellar/python@3.9/3.9.2_2/Frameworks/Python.framework/Versions/3.9/lib/python3.9/http/client.py", line 268, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/usr/local/Cellar/python@3.9/3.9.2_2/Frameworks/Python.framework/Versions/3.9/lib/python3.9/socket.py", line 704, in readinto
    return self._sock.recv_into(b)
socket.timeout: timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/hp/Documents/gitcoin-gr9/golem-tutorial/micmac-golem/compile-docker/venv/lib/python3.9/site-packages/requests/adapters.py", line 439, in send
    resp = conn.urlopen(
  File "/Users/hp/Documents/gitcoin-gr9/golem-tutorial/micmac-golem/compile-docker/venv/lib/python3.9/site-packages/urllib3/connectionpool.py", line 755, in urlopen
    retries = retries.increment(
  File "/Users/hp/Documents/gitcoin-gr9/golem-tutorial/micmac-golem/compile-docker/venv/lib/python3.9/site-packages/urllib3/util/retry.py", line 532, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/Users/hp/Documents/gitcoin-gr9/golem-tutorial/micmac-golem/compile-docker/venv/lib/python3.9/site-packages/urllib3/packages/six.py", line 735, in reraise
    raise value
  File "/Users/hp/Documents/gitcoin-gr9/golem-tutorial/micmac-golem/compile-docker/venv/lib/python3.9/site-packages/urllib3/connectionpool.py", line 699, in urlopen
    httplib_response = self._make_request(
  File "/Users/hp/Documents/gitcoin-gr9/golem-tutorial/micmac-golem/compile-docker/venv/lib/python3.9/site-packages/urllib3/connectionpool.py", line 447, in _make_request
    self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
  File "/Users/hp/Documents/gitcoin-gr9/golem-tutorial/micmac-golem/compile-docker/venv/lib/python3.9/site-packages/urllib3/connectionpool.py", line 336, in _raise_timeout
    raise ReadTimeoutError(
urllib3.exceptions.ReadTimeoutError: UnixHTTPConnectionPool(host='localhost', port=None): Read timed out. (read timeout=60)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/hp/Documents/gitcoin-gr9/golem-tutorial/micmac-golem/compile-docker/venv/bin/gvmkit-build", line 8, in <module>
    sys.exit(build())
  File "/Users/hp/Documents/gitcoin-gr9/golem-tutorial/micmac-golem/compile-docker/venv/lib/python3.9/site-packages/gvmkit_build/build.py", line 181, in build
    builder.convert()
  File "/Users/hp/Documents/gitcoin-gr9/golem-tutorial/micmac-golem/compile-docker/venv/lib/python3.9/site-packages/gvmkit_build/build.py", line 64, in __exit__
    self._tool.remove(force=True, v=True)
  File "/Users/hp/Documents/gitcoin-gr9/golem-tutorial/micmac-golem/compile-docker/venv/lib/python3.9/site-packages/docker/models/containers.py", line 351, in remove
    return self.client.api.remove_container(self.id, **kwargs)
  File "/Users/hp/Documents/gitcoin-gr9/golem-tutorial/micmac-golem/compile-docker/venv/lib/python3.9/site-packages/docker/utils/decorators.py", line 19, in wrapped
    return f(self, resource_id, *args, **kwargs)
  File "/Users/hp/Documents/gitcoin-gr9/golem-tutorial/micmac-golem/compile-docker/venv/lib/python3.9/site-packages/docker/api/container.py", line 1009, in remove_container
    res = self._delete(
  File "/Users/hp/Documents/gitcoin-gr9/golem-tutorial/micmac-golem/compile-docker/venv/lib/python3.9/site-packages/docker/utils/decorators.py", line 46, in inner
    return f(self, *args, **kwargs)
  File "/Users/hp/Documents/gitcoin-gr9/golem-tutorial/micmac-golem/compile-docker/venv/lib/python3.9/site-packages/docker/api/client.py", line 245, in _delete
    return self.delete(url, **self._set_request_timeout(kwargs))
  File "/Users/hp/Documents/gitcoin-gr9/golem-tutorial/micmac-golem/compile-docker/venv/lib/python3.9/site-packages/requests/sessions.py", line 624, in delete
    return self.request('DELETE', url, **kwargs)
  File "/Users/hp/Documents/gitcoin-gr9/golem-tutorial/micmac-golem/compile-docker/venv/lib/python3.9/site-packages/requests/sessions.py", line 542, in request
    resp = self.send(prep, **send_kwargs)
  File "/Users/hp/Documents/gitcoin-gr9/golem-tutorial/micmac-golem/compile-docker/venv/lib/python3.9/site-packages/requests/sessions.py", line 655, in send
    r = adapter.send(request, **kwargs)
  File "/Users/hp/Documents/gitcoin-gr9/golem-tutorial/micmac-golem/compile-docker/venv/lib/python3.9/site-packages/requests/adapters.py", line 529, in send
    raise ReadTimeout(e, request=request)
requests.exceptions.ReadTimeout: UnixHTTPConnectionPool(host='localhost', port=None): Read timed out. (read timeout=60)

Environment

  • Intel MacOS Catalina
  • Python 3.9
  • gvmkit-build 0.2.5
  • Docker 20.10.5: Docker desktop is running on my system. I'm able to spin up containers.

gvmkit Docker API dependencies

I'm trying to develop a way to automate the generation of GVM images. This means setting up a container that has the Yagna service running and other dependencies installed in order to run gvmkit-build.

I notice that the gvmkit-build command relies heavily on Docker APIs: it uses the Docker Python client to pull images. This requires running a fully privileged Docker-in-Docker instance (which has security issues). The other option is to pass the Docker socket from the host to the container that is trying to build GVM images, which creates other complications.

In general, is it on the roadmap to remove some of the dependencies on the Docker API? In my opinion, the primary purpose of the gvmkit-build tool is to convert a container to a GVM image. It seems it should be possible to pass in an exported/flattened tar file that is an image of a container, or perhaps to use a different build tool (Kaniko, Buildah) to make the image. This would completely remove the dependency on the Docker API for pulling the image.

Next, gvmkit-build uses a squashfs Docker image for some of the manipulation needed to convert to a GVM image. Is it possible to do this as a regular script instead of requiring Docker? Using Docker for this again requires full Docker-in-Docker or passing the Docker socket to a container.

So to summarize: if we allow gvmkit-build to accept an image file as input, and if we allow converting the image without a squashfs Docker image, then we would be able to containerize the entire gvmkit-build environment. If we allow this, and we have network available on golem nodes, then we would be able to offload GVM image building to the golem network. Then golem nodes could build golem images for other nodes! Even without network for doing remote builds, I think being able to containerize the build environment could make it easier to make new images.

Implement ExeUnit runtime-api that talks with init in vm

Configurable QEMU runtime `cpu` param

Why:
First step to enable running Providers inside VM

What:
Default cpu param value is hardcoded to host.

Running ya-runtime-vm test on an Ubuntu 22 VM (x86_64), hosted on macOS (Intel) with Parallels 17 or VMware Fusion 12, results in:

[2022-06-21T09:21:13Z DEBUG ya_runtime_vm] VM: [    8.292299] Kernel panic - not syncing: Attempted to kill the idle task!
[2022-06-21T09:21:13Z DEBUG ya_runtime_vm] VM: [    8.292299] Rebooting in 1 seconds..
thread 'main' panicked at 'Failed to stop runtime: Error { code: Internal, message: "Sending quit failed: Connection reset by peer (os error 104)", context: {} }', runtime/src/main.rs:454:14

full log

To fix it I had to change the cpu param to kvm64.
It runs rather slowly, but I only needed it for my local dev setup.

This param should be configurable through exe-unit descriptor properties.
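The requested behavior could look roughly like this. It is a minimal sketch, assuming the descriptor properties arrive as a dictionary; the property key `cpu` and the function name `qemu_cpu_args` are hypothetical, while `-cpu` is the standard QEMU option for selecting the CPU model.

```python
DEFAULT_CPU = "host"  # the value that is currently hardcoded, per this issue


def qemu_cpu_args(properties):
    """Return the QEMU `-cpu` arguments, honouring an optional override."""
    # Hypothetical property key; fall back to the default when unset or empty.
    cpu = (properties.get("cpu") or "").strip() or DEFAULT_CPU
    return ["-cpu", cpu]
```

With no override the arguments stay `-cpu host`, so existing setups keep their current behavior; a nested-VM dev setup could pass `{"cpu": "kvm64"}`.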

I do not know whether this param should also be mentioned in the Handbook's Invalid VM issues section.
It says:

In any other case with the virtualization we recommend the:
sudo apt install cpu-checker && sudo kvm-ok command and follow the steps as given in the terminal interface.

In my case sudo kvm-ok indicated everything is fine.

❯ sudo kvm-ok
INFO: /dev/kvm exists
KVM acceleration can be used

Collect more unique info about CPU model

I'm by no means familiar with Rust myself, but SO referred me to this specific place in the code.

fn offer_template() -> anyhow::Result<serde_json::Value> {
    let cpu = CpuInfo::try_new()?;
    let model = format!(
        "Stepping {} Family {} Model {}",
        cpu.model.stepping, cpu.model.family, cpu.model.model
    );
    Ok(serde_json::json!({
        "properties": {
            "golem.inf.cpu.vendor": cpu.model.vendor,
            "golem.inf.cpu.model": model,
            "golem.inf.cpu.capabilities": cpu.capabilities,
        },
        "constraints": ""
    }))
}

The community would love to have more unique details about the CPU instead of the data collected from .model, as that isn't unique. This would allow requestors to filter out older CPUs and favor newer ones for higher performance. It would also provide better information on the public stats page about what people charge for their hardware.

Reza from the community proposed this:

The CPUID instruction returns lots of info. Some stats like the model number (unique per model), or anything else that is unique per CPU model, would be great to have if included.

I'm not familiar with Rust myself so I don't know if this is large request or not.

EDIT:
Community user found the info that we are interested in:

"The "brand" field is more interesting than the family, model and stepping fields that are currently collected. It is accessible via this method: https://docs.rs/raw-cpuid/9.0.0/raw_cpuid/struct.ExtendedFunctionInfo.html#method.processor_brand_string"

Brand specifically outputs this: brand = "Intel(R) Core(TM) i7-5500U CPU @ 2.40GHz"

Here's a sample output.
cpuid.log

Linux-virt package used in init-container

The Makefile in ya-runtime-vm/runtime/init-container/ has a bad target. Line 13 points at a link that no longer works, so nothing can be downloaded from it; I am fairly sure Alpine moved its package distribution server. I looked for the current package location and think this file may be the correct replacement:

https://dl-cdn.alpinelinux.org/alpine/v3.13/main/x86_64/linux-virt-5.10.29-r0.apk

I haven't managed to get that Makefile to build, though. After changing the target I still run into issues: the series of cp commands cannot locate the paths they copy files to. I suspect that is an extraction problem caused by the download link, so it may all be the same issue.
