aurae-runtime / aurae
Distributed systems runtime daemon written in Rust.
Home Page: https://aurae.io
License: Apache License 2.0
I was looking over https://github.com/aurae-runtime/aurae/blob/main/api/v0/runtime.proto, and two instances of this caught my attention.
/// A comma-separated list of CPU IDs where the task in the control group
/// can run. Dashes between numbers indicate ranges.
Would it be prudent to represent Cell.cpu_cpus and Cell.cpu_mems as repeated fields so that some of the stringly-typed input can be retired in favor of a structured data schema?
So the fields from Cell would look something like this:
repeated string cpu_cpus = 2;
...
repeated string cpu_mems = 4;
For the expression of ranges, I might consider using the oneof capability. A sketch:
message CPUSpec {
message Range {
string start = 1;
string end = 2;
}
oneof spec {
string id = 1;
Range range = 2;
}
}
repeated CPUSpec cpu_cpus = 2;
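To see how the current string maps onto the `oneof` sketch, here is a minimal Rust sketch that parses the cpuset-style list (comma-separated IDs, dashes for inclusive ranges) into structured values and renders it back. `CpuSpec`, `parse_cpuset`, and `render_cpuset` are hypothetical names, and the sketch assumes numeric (`u32`) IDs rather than the `string` fields used in the proto sketch.

```rust
use std::fmt;

/// Hypothetical Rust mirror of the proposed `CPUSpec` message, using
/// numeric IDs instead of strings.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum CpuSpec {
    /// A single CPU (or memory node) ID, e.g. `5`.
    Id(u32),
    /// An inclusive range of IDs, e.g. `0-3`.
    Range { start: u32, end: u32 },
}

impl fmt::Display for CpuSpec {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            CpuSpec::Id(id) => write!(f, "{id}"),
            CpuSpec::Range { start, end } => write!(f, "{start}-{end}"),
        }
    }
}

/// Parses a single numeric ID.
fn parse_id(s: &str) -> Result<u32, String> {
    s.trim().parse().map_err(|e| format!("invalid ID {s:?}: {e}"))
}

/// Parses one comma-separated element: either `N` or `N-M`.
fn parse_part(part: &str) -> Result<CpuSpec, String> {
    match part.split_once('-') {
        Some((start, end)) => {
            let (start, end) = (parse_id(start)?, parse_id(end)?);
            if start > end {
                return Err(format!("range {part:?} has start > end"));
            }
            Ok(CpuSpec::Range { start, end })
        }
        None => parse_id(part).map(CpuSpec::Id),
    }
}

/// Parses a cpuset-style list such as "0-3,5" into structured specs.
pub fn parse_cpuset(input: &str) -> Result<Vec<CpuSpec>, String> {
    input
        .split(',')
        .map(str::trim)
        .filter(|part| !part.is_empty())
        .map(parse_part)
        .collect()
}

/// Renders specs back into the comma-separated form the kernel accepts.
pub fn render_cpuset(specs: &[CpuSpec]) -> String {
    specs.iter().map(ToString::to_string).collect::<Vec<_>>().join(",")
}
```

With structured fields, validation (such as rejecting `3-1`) happens once at the API boundary, and the string form is only produced when writing to the cgroup filesystem.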
I believe this is a GitHub setting we need to investigate.
Right now auraescript errors can produce many different types of output.
Would it be possible to guarantee (or strongly encourage) auraescript users to always have their output in the form of valid JSON?
If we can instill patterns/best practices so that all auraescript output to stderr and stdout is valid JSON, we can begin logging and querying the data at scale later.
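As an illustration of what "every line is valid JSON" could mean in practice, here is a minimal std-only Rust sketch that formats one record per line with hand-rolled escaping. The field names (`stream`, `level`, `message`) and the function names are assumptions, not an agreed Aurae schema; a real implementation would more likely use a crate such as `serde_json`.

```rust
/// Escapes a string for use inside a JSON string literal (RFC 8259).
pub fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters must be \u-escaped.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Formats one structured record as a single line of JSON
/// (hypothetical schema: stream, level, message).
pub fn json_line(stream: &str, level: &str, message: &str) -> String {
    format!(
        "{{\"stream\":\"{}\",\"level\":\"{}\",\"message\":\"{}\"}}",
        json_escape(stream),
        json_escape(level),
        json_escape(message)
    )
}
```

Emitting exactly one JSON object per line (often called JSON Lines) keeps stdout/stderr streamable while still being queryable later.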
Similar to #21
More importantly, should each namespace get a VM?
How do we start to experiment with isolation primitives for Kubernetes? Even though Aurae will not have Kubernetes awareness, it should consider the scope of Kubernetes similar to the scope of Aurae. We just aim to standardize the components according to the principle of least awareness (#20).
Right now running auraed -v creates entirely too much TRACE output and is unusable. We need a way to use -v without flooding the screen with gRPC output.
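One way to keep `-v` useful is a per-target filter: auraed's own modules go to TRACE, while transport crates stay at INFO. The sketch below is std-only and hypothetical; in particular, the list of "noisy" crates (`h2`, `hyper`, `tonic`, `tower`) is an assumption about where the gRPC TRACE output comes from.

```rust
/// Log levels, ordered from least to most verbose.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Level {
    Error,
    Warn,
    Info,
    Debug,
    Trace,
}

/// Crates whose TRACE output is assumed to be gRPC/HTTP2 transport noise.
const NOISY_CRATES: &[&str] = &["h2", "hyper", "tonic", "tower"];

/// Returns the most verbose level allowed for a log target such as
/// "auraed::runtime" or "h2::codec".
pub fn max_level(target: &str, verbose: bool) -> Level {
    // The first path segment of a target is the crate name.
    let crate_name = target.split("::").next().unwrap_or(target);
    match (verbose, NOISY_CRATES.contains(&crate_name)) {
        (false, _) => Level::Info,
        // Transport crates stay quiet even with -v.
        (true, true) => Level::Info,
        (true, false) => Level::Trace,
    }
}

/// Decides whether a record at `level` from `target` should be emitted.
pub fn enabled(target: &str, level: Level, verbose: bool) -> bool {
    level <= max_level(target, verbose)
}
```

Most logging libraries offer this built in; for example, tracing-subscriber's `EnvFilter` accepts directives like `auraed=trace,h2=info`, if that is the stack in use.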
We need to follow the Rust documentation standard and begin auto-generating our documentation in the aurae.io repository.
More: https://doc.rust-lang.org/rust-by-example/meta/doc.html
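For reference, the rustdoc convention is a `///` comment on each public item, with sections such as `# Examples` whose code blocks `cargo test` runs as doctests. The function and its validation rule below are hypothetical, and so is the crate name `aurae_example` used in the doctest.

```rust
/// Returns `true` if `name` is usable as a cell name.
///
/// The rule shown here (non-empty; ASCII lowercase letters, digits and
/// `-`; not starting with `-`) is illustrative only.
///
/// # Examples
///
/// ```
/// use aurae_example::is_valid_cell_name; // hypothetical crate name
///
/// assert!(is_valid_cell_name("sleeper-cell"));
/// assert!(!is_valid_cell_name("-bad"));
/// ```
pub fn is_valid_cell_name(name: &str) -> bool {
    !name.is_empty()
        && !name.starts_with('-')
        && name
            .chars()
            .all(|c| c.is_ascii_lowercase() || c.is_ascii_digit() || c == '-')
}
```

`cargo doc --no-deps` renders these comments to HTML that could be published on aurae.io, and `cargo test --doc` keeps the examples honest.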
For example, https://aurae.io/ links to https://github.com/aurae-runtime/aurae/edit/master/docs/index.md. That should be main instead of master.
This looks simple to fix from the mkdocs docs, so I'll send a PR.
Steps to reproduce:
1. Start auraed.
2. Run the following TypeScript:
await cells.free(<runtime.FreeCellRequest>{
cellName: "non-existent-cell"
});
auraed logs:
16:25:22 [INFO] Starting Aurae Daemon Runtime...
16:25:22 [INFO] Register Server SSL Identity
16:25:22 [INFO] Validating SSL Identity and Root Certificate Authority (CA)
16:25:22 [INFO] User Access Socket Created: /var/run/aurae/aurae.sock
16:25:31 [INFO] CellService: free() cell_name="sleeper-cell"
thread 'tokio-runtime-worker' panicked at 'find cell_name in cgroup_table', auraed/src/runtime/mod.rs:122:14
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
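The panic comes from treating a missing table entry as impossible. A minimal sketch of the alternative, assuming a hypothetical `CgroupTable` backed by a `HashMap` (not auraed's actual type): look the cell up with `remove(...).ok_or_else(...)` and return an error that the gRPC layer can translate into a NOT_FOUND status.

```rust
use std::collections::HashMap;
use std::fmt;

/// Errors a hypothetical cells service can return instead of panicking.
#[derive(Debug, PartialEq, Eq)]
pub enum CellsError {
    CellNotFound { cell_name: String },
}

impl fmt::Display for CellsError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            CellsError::CellNotFound { cell_name } => write!(f, "cell {cell_name:?} not found"),
        }
    }
}

impl std::error::Error for CellsError {}

/// Hypothetical stand-in for the cgroup table: cell name -> cgroup handle.
#[derive(Default)]
pub struct CgroupTable {
    cells: HashMap<String, u64>,
}

impl CgroupTable {
    pub fn insert(&mut self, cell_name: &str, handle: u64) {
        self.cells.insert(cell_name.to_string(), handle);
    }

    /// Removes a cell. An unknown name becomes an error value instead of
    /// panicking the tokio worker thread.
    pub fn free(&mut self, cell_name: &str) -> Result<u64, CellsError> {
        self.cells.remove(cell_name).ok_or_else(|| CellsError::CellNotFound {
            cell_name: cell_name.to_string(),
        })
    }
}
```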
Right now we have several build jobs that are executed on every pull request and on every merge to main.
Can we please leverage #78 and have the builds use a container instead of running apt-get to install the dependencies on each iteration?
It seems like differences in casing in the generated code are causing property/field values to be lost.
JS/TS use camelCase for properties while Rust uses snake_case for fields.
@krisnova confirmed this as an issue on stream. A possible solution is to relax the casing requirements with another macro that tags all the fields with #[serde(alias = "camelCaseFieldName")], as Deno uses serde for serializing/deserializing.
I actually have a macro for this that probably needs a little adapting before I can contribute it. I'm just a little time constrained before the break, so I wanted to leave this issue as a note for future me. For now, if anyone is trying stuff out, just use hardtoread field names so the casing will be the same throughout.
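The core of such a macro is computing the camelCase alias for each snake_case field name. A minimal sketch of that conversion follows; `snake_to_camel` is a hypothetical helper, and a real proc macro would then emit `#[serde(alias = "...")]` with the result.

```rust
/// Converts a Rust `snake_case` field name into the `camelCase` form that
/// JS/TS callers send, e.g. `cell_name` -> `cellName`.
pub fn snake_to_camel(field: &str) -> String {
    let mut out = String::with_capacity(field.len());
    let mut upper_next = false;
    for c in field.chars() {
        if c == '_' {
            // Leading underscores are dropped rather than capitalizing the
            // first letter.
            upper_next = !out.is_empty();
        } else if upper_next {
            out.extend(c.to_uppercase());
            upper_next = false;
        } else {
            out.push(c);
        }
    }
    out
}
```

An alias accepts both spellings when deserializing. If renaming (rather than aliasing) is acceptable, serde's container attribute `#[serde(rename_all = "camelCase")]` does this without a custom macro.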
Similar to #66, can we please implement a VS Code language extension for AuraeScript?
It will need to read from the lib.rs source of truth for the language.
See also #63
Can we host an official project blog on the website? I'd like to move some of my articles over to the project to use as needed.
Also, as we identify new paradigms such as Aurae Cells and Aurae Pods, we will likely need to do some storytelling.
This is due to a bug in pbjson (which we depend on). There is currently an open PR to fix the issue in that repo:
Right now the community repository is the main source of truth for our community.
We should migrate/copy/move the documentation over to the website such that it makes it easy for newcomers to the project to understand how to get started with us.
What are the things new folks to the project would love to see that they didn't see before? Or maybe something that was surprising? Or something that wasn't discovered until later?
We should not be able to change a cell after it has been created; you must destroy the cell and allocate a new one if you want to change anything about it.
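A minimal sketch of how the type system can enforce this, assuming hypothetical `CellSpec` and `Cells` types (not Aurae's actual API): the spec has private fields and no setters, and the registry only offers allocate, get, and free, so "changing" a cell means freeing it and allocating a new spec.

```rust
use std::collections::HashMap;

/// Hypothetical cell specification. Fields are private and there are no
/// setters, so a spec cannot change after it is constructed.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct CellSpec {
    name: String,
    cpu_cpus: String,
}

impl CellSpec {
    pub fn new(name: &str, cpu_cpus: &str) -> Self {
        Self { name: name.to_string(), cpu_cpus: cpu_cpus.to_string() }
    }

    pub fn name(&self) -> &str {
        &self.name
    }

    pub fn cpu_cpus(&self) -> &str {
        &self.cpu_cpus
    }
}

/// Hypothetical registry: the only operations are allocate, get, and free.
#[derive(Default)]
pub struct Cells {
    allocated: HashMap<String, CellSpec>,
}

impl Cells {
    /// Allocating an existing name is rejected instead of acting as an update.
    pub fn allocate(&mut self, spec: CellSpec) -> Result<(), String> {
        if self.allocated.contains_key(spec.name()) {
            return Err(format!(
                "cell {:?} already exists; free it before re-allocating",
                spec.name()
            ));
        }
        self.allocated.insert(spec.name().to_string(), spec);
        Ok(())
    }

    pub fn get(&self, name: &str) -> Option<&CellSpec> {
        self.allocated.get(name)
    }

    pub fn free(&mut self, name: &str) -> Option<CellSpec> {
        self.allocated.remove(name)
    }
}
```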
Is there any way we can make optimizations in the Aurae compile time? How many dependencies are we building that we are no longer using? How about caching and optimization?
We should be able to structure the output of aurae scripts such that it can be queried.
Sticking with the "No YAML touches our project" mentality, I propose we identify a clever way to ensure that all output from the Aurae executable is structured using valid JSON.
As we bring the ability to schedule VMs and containers online, we need to start exploring patterns for leveraging the new isolation boundaries.
Or, more importantly, should each VM get a Kubelet? Do we want to be able to use Kubernetes taints/tolerations to be able to schedule to VMs running on a single machine?
What about the ability for the Aurae project to support a "Kubelet VM Pattern" that makes it easy to lump all of the Kubernetes "goop" together into a single image (similar to minikube) that can then be used to schedule containers within the VM.
Auraescript needs npm installed to build. The current output if it is missing is (thanks @moto-timo):
error: failed to run custom build command for `auraescript v0.1.0 (/home/ttorling/Projects/aurae/auraescript)`
Caused by:
process didn't exit successfully: `/home/ttorling/Projects/aurae/target/debug/build/auraescript-614525272d6885a2/build-script-build` (exit status: 1)
--- stdout
cargo:rerun-if-changed="src"
cargo:rerun-if-changed=""/home/ttorling/Projects/aurae/auraescript/lib""
--- stderr
Error: No such file or directory (os error 2)
warning: build failed, waiting for other jobs to finish...
make: *** [Makefile:63: auraescript] Error 101
This is currently unsuccessfully attempted in the auraescript/build.rs file. It would be best to make the fix there.
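A minimal sketch of the kind of check build.rs could perform, using only `std::process::Command`: try to run `npm --version` and turn a failure into an actionable message. The doubled quotes in the `rerun-if-changed` line above also suggest the path is being printed with `{:?}` (Debug formatting); printing `path.display()` avoids that. Both helper names are hypothetical.

```rust
use std::path::Path;
use std::process::Command;

/// Checks that `tool` can be executed, returning an actionable message if not.
pub fn require_tool(tool: &str) -> Result<(), String> {
    match Command::new(tool).arg("--version").output() {
        Ok(output) if output.status.success() => Ok(()),
        Ok(output) => Err(format!("`{tool} --version` exited with {}", output.status)),
        Err(err) => Err(format!(
            "could not run `{tool}` ({err}); AuraeScript needs {tool} installed to build"
        )),
    }
}

/// Formats a rerun-if-changed directive without Debug-style quoting.
pub fn rerun_if_changed(path: &Path) -> String {
    format!("cargo:rerun-if-changed={}", path.display())
}
```

In build.rs, `main` would call `require_tool("npm")`, `panic!` with the returned message on `Err`, and then `println!` each directive.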
Can we identify a way to leverage the lib.rs file and its corresponding definitions/macros/documentation to implement an IntelliJ custom language plugin?
See also #63
I believe we will need to develop a logging cache that serves as an in-memory caching layer (Note: in the future it will need to also persist to disk!) for both stdout and stderr streams from an executable within a cell.
We will need to be able to hook into a stream at any moment and have some basic guarantees about the data and how we retrieve it.
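A minimal in-memory sketch of such a cache, assuming a bounded ring buffer per executable: each line gets a monotonically increasing sequence number, so a reader can attach at any moment and resume from the last number it saw. `LogCache` and `Stream` are hypothetical; this version is single-threaded (it would need a `Mutex` or similar to be shared) and does not persist to disk.

```rust
use std::collections::VecDeque;

/// Which output stream a line came from.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Stream {
    Stdout,
    Stderr,
}

/// A bounded, in-memory cache of log lines for one executable.
pub struct LogCache {
    capacity: usize,
    next_seq: u64,
    lines: VecDeque<(u64, Stream, String)>,
}

impl LogCache {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be non-zero");
        Self { capacity, next_seq: 0, lines: VecDeque::with_capacity(capacity) }
    }

    /// Appends a line, evicting the oldest one when full. Returns its
    /// sequence number.
    pub fn push(&mut self, stream: Stream, line: impl Into<String>) -> u64 {
        if self.lines.len() == self.capacity {
            self.lines.pop_front();
        }
        let seq = self.next_seq;
        self.next_seq += 1;
        self.lines.push_back((seq, stream, line.into()));
        seq
    }

    /// Returns every cached line with a sequence number >= `from`, in order.
    /// Lines that have already been evicted are no longer available.
    pub fn read_from(&self, from: u64) -> Vec<(u64, Stream, String)> {
        self.lines.iter().filter(|(seq, _, _)| *seq >= from).cloned().collect()
    }

    /// Oldest sequence number still held, if any.
    pub fn oldest(&self) -> Option<u64> {
        self.lines.front().map(|(seq, _, _)| *seq)
    }
}
```

The guarantees this gives are ordering and gap detection: a reader whose last sequence number is older than `oldest()` knows it missed lines.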
All Aurae cells should set up their namespaces by scheduling a process immediately. We believe this will be the nested auraed.
Once the new namespaces have been "cloned" we can track their IDs.
All executables should be using the setns() system call rather than calling clone()/clone3() themselves. We should be entering already existing namespaces such that all executables in a cell share the same namespaces and namespace IDs.
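To prototype "join, don't create" without writing FFI, one option is nsenter(1) from util-linux, which calls setns() for each requested namespace of a target process before running a program. The sketch below only builds the command; `Namespace` and `command_in_cell` are hypothetical names, and the target PID would be the cell's first process (assumed here to be the nested auraed). Note that joining a PID namespace with setns() only affects children created afterwards, which is why nsenter forks before running the program.

```rust
use std::process::Command;

/// Namespaces of an existing process that a new executable should join.
#[derive(Debug, Clone, Copy)]
pub enum Namespace {
    Mount,
    Uts,
    Ipc,
    Net,
    Pid,
    Cgroup,
}

impl Namespace {
    /// The nsenter(1) long flag for this namespace.
    fn nsenter_flag(self) -> &'static str {
        match self {
            Namespace::Mount => "--mount",
            Namespace::Uts => "--uts",
            Namespace::Ipc => "--ipc",
            Namespace::Net => "--net",
            Namespace::Pid => "--pid",
            Namespace::Cgroup => "--cgroup",
        }
    }
}

/// Builds a command that runs `program` inside the existing namespaces of
/// `target_pid` instead of creating new ones.
pub fn command_in_cell(target_pid: u32, namespaces: &[Namespace], program: &str, args: &[&str]) -> Command {
    let mut cmd = Command::new("nsenter");
    cmd.arg("--target").arg(target_pid.to_string());
    for ns in namespaces {
        cmd.arg(ns.nsenter_flag());
    }
    // `--` separates nsenter's options from the program and its arguments.
    cmd.arg("--").arg(program).args(args);
    cmd
}
```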
What if we established a pattern such that a single configuration file/source was all that was needed to guarantee that an Aurae instance with defined workloads was able to start?
What if aurae.ts was all that was needed to remotely provision an entire node with associated workloads?
We should be able to add user mode networking and network devices to an unprivileged container and VM
See my thread: https://twitter.com/krisnova/status/1582353843110965248?s=20&t=y32VYPsVtNP1FOitiQ7h3w
I'd like to propose using bash (with an aurae CLI similar to kubectl or buildah) or Deno to script against aurae instead of moving forward with the Rhai implementation. I am not heavily invested in bash or Deno per se, but I am not sure that Rhai's unique properties serve the project better than other, more popular approaches to scripting against APIs. In short, why is Rhai better than python, javascript, bash, etc.? I tried to focus my thinking on who aurae users will probably be and how that theoretical persona would want to interact with the system. In addition, I am currently working on a system with some of the same goals as aurae, and I'm sure that project also biases my opinions about aurae.
Soooo, from the readme,
AuraeScript follows a similar client paradigm to the Kubernetes kubectl command. However, unlike Kubernetes, this is not a command line tool like kubectl. AuraeScript is a fully supported programming language complete with a systems standard library. The Aurae runtime project supports many clients, and the easiest client to get started building with is AuraeScript.
Does it make sense that the easiest client to get started building with is a scripting language users probably aren't familiar with? As a sysadmin, I would much rather reach for bash to get started hacking. I think buildah serves as a great model to emulate (tool intro here). buildah replaces Dockerfiles the same way auraescript is attempting to replace YAML for infra configuration. It should be simple to spin up an instance of aurae and configure it with a simple bash script or interactively from the terminal. Bash is the most intuitive answer for this. Moving past simplicity, power users can get pretty far with bash, awk, and a well-maintained GNU-style CLI such as kubectl or buildah before needing to reach for a beefier (and, crucially, more complex to integrate and maintain) scripting language such as python, javascript, perl, or lua.
And speaking of lua, an even simpler question to ask than above is why Rhai instead of lua? I've fiddled with it a bit to mess with my neovim configuration, but I am far from an expert. From my outsider's perspective, lua seems like an active success story for small, embed-able languages.
If the scripting language is important to provide a platform for the aurae standard library, then Deno becomes a very compelling option. Deno markets itself as a v8 runtime that is secure, hackable, and embed-able.
The Deno global object (Deno.spawn, Deno.test, Deno.exit) can be modified to include runtime-specific functionality. You can ship aurae users a full-featured runtime & compiler to build their aurae scripts with.
Side note: when building Docker we were struggling to provide flexibly replaceable extensions, as Go had no support for dynamic loading of libraries. As Docker wanted to ship with "batteries included but changeable", we ended up with those CNI / CSI constructs, where core functionality was pushed to external processes with more or less nice interfaces.
Now that auraed is launching nested versions of itself, we will need to proxy POSIX signals through auraed.
SIGKILL should terminate (kill) the process.
SIGHUP should reload the config from disk and reopen logfiles.
SIGINT should "interrupt" the process and begin to "die nice", ensuring any cleanup logic can be done.
Use SIGINT instead of SIGKILL to "free" a nested auraed after the signal handler has been implemented.
Proxy all signals to nested executables for them to manage independently.
For example, sending a SIGHUP to a nested auraed should proxy a SIGHUP to all of the nested executables!
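The intended policy can be written down as a small, testable table before wiring up real signal handling. In this hypothetical sketch, `Signal`, `Action`, and `forward_targets` are illustrative names and actual delivery (for example via kill(2)) is left out. One constraint is a kernel fact rather than a design choice: SIGKILL cannot be caught, blocked, or ignored, so a process receiving it gets no chance to proxy anything, which is another reason to free nested auraed instances with SIGINT.

```rust
/// The signals discussed in this issue, modeled as a plain enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Signal {
    Hup,
    Int,
    Kill,
}

/// What a (nested) auraed does with a signal it receives.
#[derive(Debug, PartialEq, Eq)]
pub enum Action {
    /// SIGHUP: reload config from disk, reopen logfiles, forward to children.
    ReloadAndForward,
    /// SIGINT: forward to children, run cleanup, then exit ("die nice").
    ShutdownAndForward,
    /// SIGKILL: cannot be handled; the kernel terminates the process
    /// immediately, so nothing can be proxied.
    KilledByKernel,
}

pub fn action_for(signal: Signal) -> Action {
    match signal {
        Signal::Hup => Action::ReloadAndForward,
        Signal::Int => Action::ShutdownAndForward,
        Signal::Kill => Action::KilledByKernel,
    }
}

/// Returns the (pid, signal) pairs to send to nested executables when this
/// auraed receives `signal`.
pub fn forward_targets(signal: Signal, children: &[u32]) -> Vec<(u32, Signal)> {
    match action_for(signal) {
        Action::KilledByKernel => Vec::new(),
        Action::ReloadAndForward | Action::ShutdownAndForward => {
            children.iter().map(|&pid| (pid, signal)).collect()
        }
    }
}
```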
The project needs a place to store containers, which could implicitly mean the project has an authz/secrets problem as well.
Can we please identify a place for the project to store containers? Can we also work with the other maintainers to make sure folks and the build systems have access to the container registry as needed?
For errors that lead to actionable resolutions by the user, we should include a link to some documentation about the error (hosted on aurae.io?) and a link to file a new issue here in the repo.
I think we can get this pretty easily by extending the From<CellsServiceError> for Status response translation to include known docs and links to file a new issue using constant strings.
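A self-contained sketch of that translation. To keep it compilable on its own, `Status` and `Code` are minimal stand-ins for the tonic types (in auraed this would build a real `tonic::Status`, e.g. with `Status::new(code, message)`), the error variants are hypothetical, and `ERROR_DOCS_BASE` points at docs pages that do not exist yet. `NEW_ISSUE_URL` is GitHub's standard new-issue URL for the repository.

```rust
use std::fmt;

/// Link for filing a new issue.
const NEW_ISSUE_URL: &str = "https://github.com/aurae-runtime/aurae/issues/new";
/// Hypothetical location for per-error docs; no such pages exist yet.
const ERROR_DOCS_BASE: &str = "https://aurae.io/errors/";

/// Subset of gRPC status codes used below.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Code {
    NotFound,
    AlreadyExists,
}

/// Minimal stand-in for `tonic::Status`.
#[derive(Debug)]
pub struct Status {
    pub code: Code,
    pub message: String,
}

#[derive(Debug)]
pub enum CellsServiceError {
    CellNotFound { cell_name: String },
    CellExists { cell_name: String },
}

impl CellsServiceError {
    /// Stable identifier used to build the docs link for this error.
    fn doc_slug(&self) -> &'static str {
        match self {
            CellsServiceError::CellNotFound { .. } => "cell-not-found",
            CellsServiceError::CellExists { .. } => "cell-exists",
        }
    }
}

impl fmt::Display for CellsServiceError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            CellsServiceError::CellNotFound { cell_name } => write!(f, "cell {cell_name:?} not found"),
            CellsServiceError::CellExists { cell_name } => write!(f, "cell {cell_name:?} already exists"),
        }
    }
}

impl From<CellsServiceError> for Status {
    fn from(err: CellsServiceError) -> Self {
        let code = match &err {
            CellsServiceError::CellNotFound { .. } => Code::NotFound,
            CellsServiceError::CellExists { .. } => Code::AlreadyExists,
        };
        // Append the docs link and the issue link to every message.
        let message = format!(
            "{err}\ndocs: {ERROR_DOCS_BASE}{}\nfile an issue: {NEW_ISSUE_URL}",
            err.doc_slug()
        );
        Status { code, message }
    }
}
```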
As we are still in the sandbox phase of building Aurae, we are using unwrap statements in the code. We should replace these with safer and more idiomatic error handling in Rust.
Additionally, we should build a linting system that prevents code like this from entering the project.
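A small before/after sketch of the pattern: return a `Result` and propagate with `?` instead of calling `unwrap()`. `ConfigError` and `parse_port` are hypothetical. The `deny` attribute uses Clippy's real `unwrap_used` and `expect_used` restriction lints, which plain `rustc` accepts but only `cargo clippy` enforces.

```rust
use std::error::Error;
use std::fmt;

#[derive(Debug, PartialEq, Eq)]
pub enum ConfigError {
    MissingField(&'static str),
    InvalidPort(String),
}

impl fmt::Display for ConfigError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ConfigError::MissingField(field) => write!(f, "missing {field}"),
            ConfigError::InvalidPort(reason) => write!(f, "invalid port: {reason}"),
        }
    }
}

impl Error for ConfigError {}

/// Parses the port out of a hypothetical `host:port` setting without any
/// `unwrap()`; Clippy rejects any that are added here later.
#[deny(clippy::unwrap_used, clippy::expect_used)]
pub fn parse_port(setting: &str) -> Result<u16, ConfigError> {
    let (_, port) = setting.rsplit_once(':').ok_or(ConfigError::MissingField("port"))?;
    port.parse::<u16>().map_err(|e| ConfigError::InvalidPort(e.to_string()))
}
```

Project-wide, the same lints could be enforced in CI or a make target with `cargo clippy -- -D clippy::unwrap_used`.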
For legal reasons we need to include the license in every file that we consider "Source Code".
Should we have a CI/CD check for this, or an easy make target we can run to check/append files as needed?
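A std-only sketch of the "check" half, which a make target could run: walk the tree, skip `target/` and `.git/`, and report `.rs` files whose first lines do not mention the license. The marker string is an assumption based on the standard Apache 2.0 header text and should be adjusted to whatever header the project adopts.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Marker expected near the top of each source file (an assumption based on
/// the standard Apache 2.0 header).
const LICENSE_MARKER: &str = "Apache License, Version 2.0";

/// Returns true if `contents` mentions the license within the first
/// `max_lines` lines.
pub fn has_license_header(contents: &str, max_lines: usize) -> bool {
    contents.lines().take(max_lines).any(|line| line.contains(LICENSE_MARKER))
}

/// Recursively collects `.rs` files under `dir` that lack the header.
pub fn missing_headers(dir: &Path) -> io::Result<Vec<PathBuf>> {
    let mut missing = Vec::new();
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        // Build output and VCS metadata are not source code.
        if path.file_name().is_some_and(|name| name == "target" || name == ".git") {
            continue;
        }
        if path.is_dir() {
            missing.extend(missing_headers(&path)?);
        } else if path.extension().is_some_and(|ext| ext == "rs")
            && !has_license_header(&fs::read_to_string(&path)?, 20)
        {
            missing.push(path);
        }
    }
    Ok(missing)
}
```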
There are a few things we would like to accomplish with this milestone, but the overall aim to be captured here is that it signifies the project has reached a maturity where it can commit to being ready to accept contributions at any time, regardless of one's prior exposure to Aurae. Consequently, this milestone is less about functionality and more about a state of readiness and cleanliness.
Some of the concrete deliverables of the milestone are:
v0
Some of the sociotechnical goals of the milestone are:
This umbrella issue will be associated with a milestone and will be updated with links to issues and to reflect the current state of progress towards the milestone.
The /auraescript/src/lib.rs file serves as the source of truth for objects, functions, types, and aliases for the AuraeScript programming language.
We need to establish a convention that will generate meaningful documentation directly from the source code.
For example, we expose the about() function in AuraeScript, which is defined here as:
pub fn register_stdlib(mut engine: Engine) -> Engine {
    // about function
    //
    // Reserved function name to share information about the current
    // client interpreter.
    engine.register_fn("about", about);
    engine
}
v0 style API/stdlib convention or do something else? What about semantic versioning in the Cargo.toml?
If there is no receiver, StreamLogger and LogChannel error, which prints "Failed to log message...". I think these errors can be safely ignored, as we have multiple loggers registered and StreamLogger is only used for the observe API.
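A minimal sketch of treating "no receiver" as an expected state rather than an error. It uses `std::sync::mpsc` to stay self-contained; `StreamLogger` here is a hypothetical stand-in, not auraed's type. If the real logger uses `tokio::sync::broadcast`, its `send` likewise returns an error when there are no active receivers, and the same handling applies.

```rust
use std::sync::mpsc::{self, SendError, Sender};

/// Hypothetical stand-in for StreamLogger: forwards log lines to an
/// observe-API subscriber, if one is listening.
pub struct StreamLogger {
    tx: Sender<String>,
}

impl StreamLogger {
    pub fn new(tx: Sender<String>) -> Self {
        Self { tx }
    }

    /// Sends a line and returns whether anyone received it. Having no
    /// receiver is normal (nobody is observing), so nothing is printed.
    pub fn log(&self, line: &str) -> bool {
        match self.tx.send(line.to_string()) {
            Ok(()) => true,
            Err(SendError(_unsent)) => false,
        }
    }
}

/// Convenience constructor returning the logger and its receiving end.
pub fn stream_logger() -> (StreamLogger, mpsc::Receiver<String>) {
    let (tx, rx) = mpsc::channel();
    (StreamLogger::new(tx), rx)
}
```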
One of the big problems with TLS/mTLS and Kubernetes is that in order to set up a cluster, you need to touch DNS records.
This is both a good thing and a bad thing. SAN and hostname material is often embedded in, and relied on by, various TLS scenarios.
How do we as a first principle call out some elegance around our relationship with DNS without pissing the internet off?
I just want to be able to say "puttin on the ritz" while debugging production.
Is it easy to add an alias for a flag?
Can we please break the build and fail a pull request if code creates a rust warning?
We are trying to keep a clean set of code for the project, and we would like to leverage the Makefile commands to ensure that no warnings are generated.
This includes
How do we know if our site is down?
Right now aurae.io is offline and is returning a 404.
Can we alert our Discord and let the maintainers know when the site is broken?
IPv6 is the future present.
IPv6 is now! The time has arrived! We are officially here.
Can we please adopt IPv6 support for the networking subsystem by default? Additionally, can we go a step further and adopt IPv6 for all of our documentation and code defaults moving forward? We should offer IPv4 documentation and code as a secondary example to the default IPv6 content.
Here at Aurae we create a lo loopback device listening on localhost ::1 (or the IPv4 equivalent 127.0.0.1).
use std::net::TcpListener;

fn main() {
    // Listen on the IPv6 loopback address.
    let listener = TcpListener::bind("[::1]:8080").unwrap();
    for stream in listener.incoming() {
        let _stream = stream.unwrap();
        println!("Connection established!");
    }
}
Here at Aurae we create a lo loopback device listening on localhost 127.0.0.1.
use std::net::TcpListener;

fn main() {
    // Listen on the IPv4 loopback address.
    let listener = TcpListener::bind("127.0.0.1:8080").unwrap();
    for stream in listener.incoming() {
        let _stream = stream.unwrap();
        println!("Connection established!");
    }
}
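One way to make IPv6 the default while keeping IPv4 as the secondary path is to hand `TcpListener::bind` both loopback addresses in order of preference. A slice of `SocketAddr` implements `ToSocketAddrs`, and `bind` tries each address until one succeeds. `bind_loopback` is a hypothetical helper name.

```rust
use std::io;
use std::net::{Ipv4Addr, Ipv6Addr, SocketAddr, TcpListener};

/// Binds to the IPv6 loopback address first and falls back to IPv4 loopback
/// only if that fails (for example, when IPv6 is disabled on the host).
pub fn bind_loopback(port: u16) -> io::Result<TcpListener> {
    let candidates = [
        SocketAddr::from((Ipv6Addr::LOCALHOST, port)),
        SocketAddr::from((Ipv4Addr::LOCALHOST, port)),
    ];
    // `bind` attempts each address in order and returns the first listener
    // that succeeds.
    TcpListener::bind(&candidates[..])
}
```

Calling `bind_loopback(8080)` gives `[::1]:8080` on IPv6-capable hosts and `127.0.0.1:8080` otherwise; `listener.local_addr()` reports which one was used.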
Auraescript takes too long to compile. I suspect it has to do with the build.rs doing too much each time, specifically redundant npm calls.
We want to have beautiful logs in the Aurae project. This is a perpetually open issue that will be ever-relevant to newcomers to the project.
At any time new contributors to the project are welcome to audit our current log lines.
In the code base you will see statements such as the following:
info!("an ugly log");
warn!("a bad log");
debug!("something debug");
trace!("trace something");
At any time in the course of the project's development, it is safe and encouraged for users to audit our log lines.
We need to admit that we are building a Turing complete language, and we intrinsically will adopt all of the major exciting problems of managing a popular programming language.
The first exciting problem we get to tackle is picking out a name which will inevitably be highly criticized by strange tech enthusiasts with opinions.
Some options for us to peruse.
Scanning over the open GH issues, I think there are some that we can possibly close. So naturally, I'm opening another issue to document the ones that may be ready to close.
Probably outdated due to Deno being JS/TS?
ts-proto (Proto -> Typescript generator) includes the docs from the proto files for us:
Warnings are breaking the builds, but we may not have completed activating the lints we want:
Do we still have this issue?
Other:
Should we consider isolating an Aurae cell at the namespace level as well? If so what are the sane defaults we should assume for every Aurae cell? Which namespaces should we consider, and what do we do given the various kernels and their support and awareness of each namespace?
Upon merging a PR to main (or committing directly to the branch because I am a horrible person) a GitHub action is kicked off to update the static site using GitHub pages. Shortly after the event, the website aurae.io will return a 404.
Looking in the GitHub Pages settings for the repository, the domain name value is unexpectedly missing.
Setting the value back to aurae.io fixes the 404, and the site is now updated with the most recent changes from the main branch.
The PSL releases the project to the public domain, and also includes the concept of a "steward" of the project.
Can we please change the license from Apache 2.0 to the PSL 1.0 and call out the Nivenly Foundation as our official steward?
We will need to update the CLA as well as have existing contributors to Aurae agree to the new terms.
Curious if there are any thoughts about clustering right now. The docs make references to Kubernetes, but from what I can tell the current code is purely focused on running AuraeScript against a local instance of aurae.
Is the current idea to fully offload this to an external system, such as Kubernetes, or have a cluster aware layer that is accessible from AuraeScript?
I ask because this very closely mirrors ideas I've been playing with and would like to contribute.
Right now it is possible to use the rmdir command in the /sys/fs/cgroup directory to manually destroy cgroups.
In the event that we manually destroy a cgroup that was started with auraed, the internal cache still believes the cgroup exists. This is problematic when we try to re-create the cgroup again with auraed.
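A minimal sketch of one mitigation: before acting on the cache, reconcile it with the filesystem and evict entries whose cgroup directory no longer exists. `CgroupCache` is hypothetical, the root is a parameter (normally `/sys/fs/cgroup`) so the logic can be exercised against a temporary directory, and the check is inherently racy, so creation code should still handle "already exists" and "not found" errors from the kernel.

```rust
use std::collections::HashMap;
use std::path::PathBuf;

/// Hypothetical cache of cgroups auraed believes it created.
pub struct CgroupCache {
    /// Normally /sys/fs/cgroup.
    root: PathBuf,
    cells: HashMap<String, PathBuf>,
}

impl CgroupCache {
    pub fn new(root: impl Into<PathBuf>) -> Self {
        Self { root: root.into(), cells: HashMap::new() }
    }

    /// Records a cgroup that was created under the root.
    pub fn record(&mut self, name: &str) {
        let path = self.root.join(name);
        self.cells.insert(name.to_string(), path);
    }

    pub fn contains(&self, name: &str) -> bool {
        self.cells.contains_key(name)
    }

    /// Drops entries whose cgroup directory is gone (e.g. removed with
    /// `rmdir`) and returns the evicted names in sorted order.
    pub fn reconcile(&mut self) -> Vec<String> {
        let mut evicted = Vec::new();
        self.cells.retain(|name, path| {
            let alive = path.exists();
            if !alive {
                evicted.push(name.clone());
            }
            alive
        });
        evicted.sort();
        evicted
    }
}
```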
Imagine the principle of least privilege but for systems awareness.
As we traverse up the stack, a system should only have awareness of the systems that sit "below" itself.
For example the Kubernetes Kubelet has awareness of the control plane. The control plane also has awareness of the Kubelet. This interdependence model is to be avoided with Aurae.
In other words, auraed should never have Kubernetes awareness. The Kubernetes control plane might potentially choose to leverage Aurae as a Kubelet/Systemd alternative; however, the interaction between these systems will likely need to be patched in a generic way to make them work.
For example, Kubernetes might want to schedule a "Pod"; however, Aurae should have no awareness of Kubernetes "pods". Aurae will just run containers; if Kubernetes chooses to group containers with shared networking, storage, and metadata and refer to that as a "pod", so be it.
This same pattern is reflected at the kernel level as well.
The kernel should never have awareness of Aurae, however Aurae will have kernel awareness. The pattern flows upwards with each system having awareness of the systems below itself, but never above.
The principle of least awareness.
Do we want to provide a secure way of installing, managing, upgrading, and authenticating kernel modules and eBPF probes?
Think DKMS but for more than just kernel modules, and with an authz gRPC API to do the dirty work. We could also authenticate 3rd party binary blobs as well as provide attestations they are what we want them to be.
Also worth considering: BCC and bpftrace, for existing BPF work we could easily give a story to.