asterinas / asterinas

Asterinas is a secure, fast, and general-purpose OS kernel, written in Rust and providing Linux-compatible ABI.

Home Page: https://asterinas.github.io/

License: Other

Languages: Rust 95.51%, Makefile 0.65%, Assembly 0.78%, C 2.61%, Shell 0.45%
Topics: kernel, os, rust, tee

asterinas's Introduction

asterinas-logo
A secure, fast, and general-purpose OS kernel written in Rust and compatible with Linux

English | 中文版

Introducing Asterinas

Asterinas is a secure, fast, and general-purpose OS kernel that provides Linux-compatible ABI. It can serve as a seamless replacement for Linux while enhancing memory safety and developer friendliness.

  • Asterinas prioritizes memory safety by employing Rust as its sole programming language and limiting the use of unsafe Rust to a clearly defined and minimal Trusted Computing Base (TCB). This innovative approach, known as the framekernel architecture, establishes Asterinas as a more secure and dependable kernel option.

  • Asterinas surpasses Linux in terms of developer friendliness. It empowers kernel developers to (1) utilize the more productive Rust programming language, (2) leverage a purpose-built toolkit called OSDK to streamline their workflows, and (3) choose between releasing their kernel modules as open source or keeping them proprietary, thanks to the flexibility offered by MPL.

While the journey towards a production-grade OS kernel can be challenging, we are steadfastly progressing towards our goal. Currently, Asterinas only supports x86-64 VMs. However, our aim for 2024 is to make Asterinas production-ready on x86-64 VMs.

Getting Started

Get yourself an x86-64 Linux machine with Docker installed. Follow the three simple steps below to get Asterinas up and running.

  1. Download the latest source code.
git clone https://github.com/asterinas/asterinas
  2. Run a Docker container as the development environment.
docker run -it --privileged --network=host --device=/dev/kvm -v ./asterinas:/root/asterinas asterinas/asterinas:0.4.2
  3. Inside the container, go to the project folder to build and run Asterinas.
make build
make run

If everything goes well, Asterinas is now up and running inside a VM.

The Book

See The Asterinas Book to learn more about the project.

License

Asterinas's source code and documentation primarily use the Mozilla Public License (MPL), Version 2.0. Select components are under more permissive licenses, detailed here. For the rationale behind the choice of MPL, see here.


asterinas's Issues

Tracking issue for user-mode development

This is the tracking issue for user-mode development.

User-mode development allows Jinux developers to test and debug more crates, components, or subsystems in user space, thus accelerating the development process. In particular, user-mode development should enable cargo test for the crates that depend on either jinux-framework or jinux-std.

The current progress towards enabling user-mode development:

  • Draft an RFC about how to support user-mode development in Jinux
  • Implement basic support for user-mode development, e.g., testing vm-related APIs in user mode.

Tracking unimplemented features of signals

  • support running signal handlers on a user-provided stack (set by sigaltstack)
  • save and restore floating-point registers (FP registers are not stored in CpuContext now)
  • the complex union field in siginfo_t; getting and setting a field of a union requires unsafe operations, currently not implemented
  • some signal-related syscalls

Implement a basic procfs

This is the tracking issue for procfs.

You may wonder why we prioritize procfs, a special-purpose file system that is rarely used directly by applications. The rationale is that ...

Implement the component initialization mechanism

We want to implement a component initialization mechanism that makes it easy for component developers to register components with the component system. The mechanism should follow these rules.

  1. Components should be initialized in priority order. A component may depend on other components, so it must wait for them to be initialized first. We want to generate priorities automatically so that developers do not need to fill them in.
  2. The component system can automatically find each component's name and match it with a priority at compile time or runtime.
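Automatically generating priorities from dependencies amounts to a topological sort over the component dependency graph. Below is a minimal user-space sketch (the component names and dependency edges are illustrative, not Jinux's actual components):

```rust
use std::collections::{HashMap, VecDeque};

/// Compute an initialization order with Kahn's algorithm:
/// a component is initialized only after all of its dependencies.
/// Returns None if the dependency graph contains a cycle.
fn init_order(deps: &[(&'static str, Vec<&'static str>)]) -> Option<Vec<&'static str>> {
    let mut indegree: HashMap<&str, usize> =
        deps.iter().map(|(name, d)| (*name, d.len())).collect();
    // Reverse edges: dependency -> dependents.
    let mut dependents: HashMap<&str, Vec<&str>> = HashMap::new();
    for (name, d) in deps {
        for dep in d {
            dependents.entry(*dep).or_default().push(*name);
        }
    }
    let mut ready: VecDeque<&str> = indegree
        .iter()
        .filter(|(_, &n)| n == 0)
        .map(|(&name, _)| name)
        .collect();
    let mut order = Vec::new();
    while let Some(name) = ready.pop_front() {
        order.push(name);
        for &dependent in dependents.get(name).into_iter().flatten() {
            let n = indegree.get_mut(dependent).unwrap();
            *n -= 1;
            if *n == 0 {
                ready.push_back(dependent);
            }
        }
    }
    // A cycle leaves some components with a non-zero in-degree.
    (order.len() == deps.len()).then_some(order)
}

fn main() {
    // Hypothetical components: the console depends on the UART driver,
    // which depends on the interrupt subsystem.
    let deps = vec![
        ("console", vec!["uart"]),
        ("uart", vec!["irq"]),
        ("irq", vec![]),
    ];
    let order = init_order(&deps).unwrap();
    assert_eq!(order, ["irq", "uart", "console"]);
    println!("{:?}", order);
}
```

With such an ordering in hand, the component system can invoke each component's init function in sequence without developers ever writing a priority number.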

TODOs

  • Automatically generate priorities.
  • Match component names with priorities.
  • Extract the component system as a separate crate

Upgrade bootloader to v0.11.0

The latest bootloader version is v0.11.0, which passes ramdisk-related parameters to the kernel. But its ramdisk support has not yet been released in a stable version. So the current idea is to open a new branch for this support and merge it after a stable version is released.

Draft RFC: Jinux component system

This RFC describes the motivation and design of Jinux's component system.

Motivation

The only way to write complex software that won't fall on its face is to build it out of simple modules connected by well-defined interfaces so that most problems are local and you can have some hope of fixing or optimizing a part without breaking the whole.

--The Art of Unix Programming

Every OS strives to be modular. OS developers have no choice but to embrace modularity as it is the only way to tame the sheer complexity of an OS and make it work. As Jinux developers, we are a bit more ambitious: not only do we want it to work, we want it to do so in a safe and secure way.

Rust can facilitate modular software design with language features like crates and modules, but we find them insufficient for large and complex software like Jinux. In particular, we argue that two areas lack support from the Rust language or ecosystem: one is modular initialization, and the other is access control. The former is important for OS maintainability, while the latter is for OS security.

Modular initialization

OS startup is not an easy job. A fully-fledged OS supports all kinds of CPU architectures, machine models, buses, block devices, network devices, file systems, and more. Each of these qualifies as one or more OS components, so there could be hundreds of OS components that need to be initialized during OS startup. And the initialization of components must be done in an order that takes into account the functional dependencies between the OS components.

To promote a highly modular architecture for Jinux, we decide that the concepts of OS components must be supported as first-class citizens and their initialization must be done in an automatic and ergonomic fashion.

Access control

Microkernels are the pioneers in the pursuit of constructing a highly modular OS with OS components. Take seL4-based multiserver OS as an example. Each OS component runs as a separate process to enforce strong isolation. The interfaces between OS components are well-defined with a dedicated language. This approach undoubtedly benefits security but comes at the price of communication overheads (e.g., RPC).

seL4 components

As Jinux developers, we believe that a language-based approach to OS components can reap the most important benefit (i.e., security) of the process-based approach while avoiding its main drawback (i.e., performance). In Jinux, we implement OS components as regular Rust crates. As we do not allow OS components to contain unsafe Rust code, crate-based OS components are as strong as process-based ones in preventing one component from messing up the memory of another. But here comes the key difference: process-based components can enforce access control by exposing their APIs selectively (e.g., through RPC), but crate-based ones must choose between exposing an API to all (using the pub keyword) or to none. Without enforcing some form of access control, crate-based OS components cannot match the security of process-based ones.

Design

The Jinux component system extends regular Rust crates with the extra abilities of modular initialization and access control.

One key design challenge is to enable these two features in the most ergonomic way. In Jinux, we want to promote constructing a highly-modular OS kernel by having a large number of fine-grained OS components. So programming OS components, including their initialization and access control, must be a breeze, not a pain in the ass.

Modular initialization

Static variable initialization

Initializing a Jinux component can be boiled down to initializing static variables in Rust. This is because 1) a Jinux component is simply a Rust crate; 2) crate-level states are commonly expressed as static variables.

Prior Rust OSes adopt one of two methods to initialize static variables. The first uses the lazy_static! macro, which ensures that the variable is initialized automatically when it is accessed for the first time.

lazy_static! {
    static ref UART_BUF: Mutex<VecDeque<u8>> = Mutex::new(VecDeque::with_capacity(512));
}

This method is easy to use, but it provides no means to gracefully handle errors that may occur during variable initialization, so it is not suited for a production-quality implementation.

The second method is wrapping the static value within Once<T>, whose init method will be invoked manually by the component's init function. If some code attempts to use Once<T> before its init method is invoked, a panic would be triggered.

static UART_BUF: Once<Mutex<VecDeque<u8>>> = Once::new();

pub fn init_uart() {
    UART_BUF.init(|| Mutex::new(VecDeque::with_capacity(512)));
}

The second method gives more control to the developers, including the ability to handle errors and to choose the timing of initialization. But this method does not scale as the number of OS components grows: in a centralized location, some magic code must be aware of the init function of each and every OS component and invoke them one by one, in the right order. This is against our design goal of ergonomics.

The Component trait

Our first step towards easy component initialization is to provide an abstraction for components, which is the Component trait.

pub struct UartComponent {
    buf: Mutex<VecDeque<u8>>,
}

impl Component for UartComponent {
    fn init() -> Result<Self> {
        let buf = Mutex::new(VecDeque::try_with_capacity(512)?);
        Ok(Self { buf })
    }

    fn name(&self) -> &str {
        "uart"
    }
}

The #[init_component] macro attribute

Next, we define a static variable that represents the component.

#[init_component]
static SELF: Once<UartComponent> = Once::new();

Thanks to the magic #[init_component] macro as well as the Component trait, the static variable will be initialized automatically by the component system.

Behind the scenes, this magic macro generates code that initializes the UartComponent. During kernel startup, the generated initialization code is invoked after all OS components that this UART component depends on have been initialized. As code generation and dependency resolution are conducted automatically by the component system, developers' effort is minimized.
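One plausible expansion strategy, shown here as a hand-written sketch rather than the macro's actual output, is to register each component's init function in a global registry that the component system drains in dependency order (the Registration type, the registry function, and register_uart are all hypothetical):

```rust
use std::sync::{Mutex, OnceLock};

// A hypothetical registry that the #[init_component] macro could target:
// each component contributes a name and an init function pointer.
struct Registration {
    name: &'static str,
    init: fn() -> Result<(), String>,
}

fn registry() -> &'static Mutex<Vec<Registration>> {
    static REGISTRY: OnceLock<Mutex<Vec<Registration>>> = OnceLock::new();
    REGISTRY.get_or_init(|| Mutex::new(Vec::new()))
}

// What the macro expansion might emit for the UART component.
fn register_uart() {
    registry().lock().unwrap().push(Registration {
        name: "uart",
        init: || {
            // Real generated code would call <UartComponent as Component>::init()
            // and store the result in the SELF static.
            Ok(())
        },
    });
}

/// Called once during kernel startup, after dependency resolution
/// has sorted the registrations into a valid order.
fn init_all() -> Result<Vec<&'static str>, String> {
    let regs = registry().lock().unwrap();
    let mut initialized = Vec::new();
    for reg in regs.iter() {
        (reg.init)()?;
        initialized.push(reg.name);
    }
    Ok(initialized)
}

fn main() {
    register_uart();
    let done = init_all().unwrap();
    assert_eq!(done, ["uart"]);
    println!("initialized: {:?}", done);
}
```

A real kernel would likely collect the registrations at link time (e.g., in a dedicated linker section) rather than at runtime, but the registry idea is the same.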

Access control

Let's consider this simple question: given crates a, b, and c, how can we let crate b access a static variable a::A while forbidding crate c from doing so?

The Controlled<T> wrapper type

We introduce Controlled<T>, which is used to wrap an access-controlled value.

// file: a/lib.rs

/// Resource represents some kind of resources managed by crate a.
pub struct Resource { /* ... */ }

impl Resource {
    pub(crate) fn new() -> Self { todo!() }

    pub fn count(&self) -> usize { todo!() }
}

static A: Controlled<Resource> = Controlled::new(Resource::new());

Anyone who has a reference to a Controlled<T> object cannot gain a reference to its internal value of T... unless using the access macro.
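A minimal sketch of how Controlled<T> could be implemented (the hidden accessor, the macro body, and the Resource fields are assumptions; the real type and the access macro are defined by the component system):

```rust
/// A wrapper that hides its inner value from ordinary code.
/// The only road to the inner `T` is a doc(hidden) accessor that
/// only the `access!` macro expansion is supposed to use; a tool
/// like `cargo component audit` can then check each expansion site
/// against the whitelist in Components.toml.
pub struct Controlled<T>(T);

impl<T> Controlled<T> {
    pub const fn new(value: T) -> Self {
        Controlled(value)
    }

    // Hidden from docs; direct calls would be flagged by the audit tool.
    #[doc(hidden)]
    pub fn _access(&self) -> &T {
        &self.0
    }
}

// A sketch of the access macro: it expands to the hidden accessor,
// leaving a syntactic marker the audit tool can search for.
macro_rules! access {
    ($ctrl:expr) => {
        $ctrl._access()
    };
}

// Stand-in for crate a's Resource type.
struct Resource {
    count: usize,
}

static A: Controlled<Resource> = Controlled::new(Resource { count: 7 });

fn main() {
    // In crate b, permitted by the whitelist:
    let n = access!(A).count;
    assert_eq!(n, 7);
    println!("count = {}", n);
}
```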

The access macro

Crate b is allowed to access a::A by using the access macro.

// file: b/lib.rs

fn use_a()  {
    access!(a::A).count();
}

But you may ask: what if crate c attempts to use the access macro? Well, now is the time to introduce the configuration file of Jinux component system.

The Components.toml file

The Jinux component system respects a configuration file named Components.toml, which, among other things, specifies the access control policy.

Here is an example configuration file specifying an access control policy where crate b is allowed to use a::A but crate c is not.

[components]
a = { path = "a/" }
b = { path = "b/" }
c = { path = "c/" }

[whitelist]

[whitelist.a.Resource]
b = true

The whitelist is a table whose keys are the public types of components and whose values specify whether a public type, when wrapped inside Controlled<_>, may be used by a given component.

The cargo component command

Since components are configured statically, the access control policies are also enforced statically. By default, the access control policy is not enforced when components are built. The rationale is that we do not want the new access control mechanism to interrupt or disturb development activities and flows as they are now.

To capture any violations against the access control policy, one should use the following command.

cargo component audit

cargo component is a Cargo subcommand that we developed for Jinux's component system. Its audit subcommand checks each usage of the access macro against the whitelist given in Components.toml and reports any access control violations it finds.

Implementation

To be continued.

framework: allow the kernel heap size to grow dynamically

The current implementation of the heap allocator is backed by a static memory pool whose size is determined at compile time. But the underlying allocator crate, buddy-system-allocator, actually supports adding more free memory dynamically with the add_to_heap method.

My suggestion is to allow the kernel heap size to grow dynamically with two steps:

  1. Initially, give the heap allocator a small memory pool whose size is determined at compile time;
  2. If the heap allocator runs out of free memory, let it ask for more memory from the frame allocator, which manages (almost) all usable memory on the system.
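The two-step scheme can be sketched in user space with a toy allocator (the pool sizes and the request_frames callback are stand-ins for the real frame allocator; a real kernel would call add_to_heap on the buddy allocator under its heap lock):

```rust
/// A toy bump allocator over a growable set of memory pools,
/// standing in for the kernel heap backed by buddy-system-allocator.
/// Returned "addresses" are offsets within the current pool; this is
/// only to illustrate the growth policy, not a usable allocator.
struct Heap {
    pools: Vec<Vec<u8>>, // each pool models a region handed to add_to_heap
    offset: usize,       // bump pointer within the last pool
}

impl Heap {
    /// Step 1: start with a small pool whose size is fixed at compile time.
    fn new(initial_size: usize) -> Self {
        Heap { pools: vec![vec![0u8; initial_size]], offset: 0 }
    }

    /// Step 2: on exhaustion, ask the frame allocator for more memory.
    fn alloc(&mut self, size: usize, request_frames: impl Fn(usize) -> Vec<u8>) -> usize {
        let pool_len = self.pools.last().unwrap().len();
        if self.offset + size > pool_len {
            // Out of memory: grab frames and "add_to_heap" them as a new pool.
            self.pools.push(request_frames(size.max(4096)));
            self.offset = 0;
        }
        let addr = self.offset;
        self.offset += size;
        addr
    }

    fn total_size(&self) -> usize {
        self.pools.iter().map(|p| p.len()).sum()
    }
}

fn main() {
    let mut heap = Heap::new(1024);
    let frame_alloc = |n: usize| vec![0u8; n]; // stand-in frame allocator
    heap.alloc(1000, &frame_alloc);
    assert_eq!(heap.total_size(), 1024);
    heap.alloc(100, &frame_alloc); // exhausts the pool, triggers growth
    assert_eq!(heap.total_size(), 1024 + 4096);
    println!("heap grew to {} bytes", heap.total_size());
}
```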

Add initramfs support

The progress of initramfs support:

  • Add the cpio-decoder crate to parse cpio newc-format archives.
  • Unpack the cpio archive to RamFS to prepare the root fs.
  • Wait for the bootloader to upgrade and use it to load the ramdisk file (depends on #68)
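For context on the parsing step, a cpio newc header is 110 bytes of ASCII: the magic "070701" followed by thirteen 8-character hex fields. A minimal sketch that extracts two of those fields (this is an illustration of the format, not the cpio-decoder crate's API):

```rust
/// Parse the namesize and filesize fields from a cpio "newc" header.
/// Header layout: 6-byte magic "070701", then 13 fields of 8 hex chars
/// (ino, mode, uid, gid, nlink, mtime, filesize, devmajor, devminor,
/// rdevmajor, rdevminor, namesize, check).
fn parse_newc_header(header: &[u8]) -> Option<(u32, u32)> {
    if header.len() < 110 || &header[..6] != b"070701" {
        return None;
    }
    let field = |i: usize| {
        let start = 6 + i * 8;
        let s = std::str::from_utf8(&header[start..start + 8]).ok()?;
        u32::from_str_radix(s, 16).ok()
    };
    let filesize = field(6)?;  // 7th field after the magic
    let namesize = field(11)?; // 12th field after the magic
    Some((namesize, filesize))
}

fn main() {
    // Build a header for a file whose name takes 2 bytes and whose data
    // takes 16 bytes; all other fields are zero for brevity.
    let mut header = b"070701".to_vec();
    for i in 0..13 {
        let value = match i {
            6 => 16, // filesize
            11 => 2, // namesize
            _ => 0,
        };
        header.extend_from_slice(format!("{:08X}", value).as_bytes());
    }
    assert_eq!(parse_newc_header(&header), Some((2, 16)));
    println!("parsed (namesize, filesize): {:?}", parse_newc_header(&header));
}
```

A full unpacker would then read the NUL-terminated name and the file data, each padded to a 4-byte boundary, and repeat until the "TRAILER!!!" entry.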

RFC: Support developing and testing jinux in user mode

Motivation

As a general-purpose operating system, Jinux contains many components, such as fs, vm, network, and scheduler. These components are logically independent of the details of the underlying hardware and can therefore be developed and tested in isolation. At present, Jinux can test these components through the custom test framework, but there are some problems. First, the tests run in kernel mode, so we need to start a virtual machine every time we run them; considering how frequently tests are run, this is rather expensive. Second, we still lack some testing support provided by the standard library, such as #[should_panic].

But under the current implementation, all components depend on jinux-std or jinux-frame. Since jinux-frame depends on the bootloader and contains many architecture-related assembly implementations, it can only be compiled for the x86_64-unknown-none target (or the current x86_64-custom), which is designed for kernel mode and cannot run directly in user mode.

Therefore, we hope to introduce a mechanism to directly test hardware-irrelevant components in user mode.

Explanation

Currently, we only aim to support testing function calls in hardware-irrelevant components. More complex cases, such as interrupts, exceptions, and system calls, are not included. Device drivers and user programs are not supported either. Such cases can be left to future RFCs or to the current integration testing framework.

Take a real case as an example: if we want to test whether the copy-on-write mechanism in Vmo is implemented correctly, we may write the following test code. At present, Vmo depends on VmFrame internally, and VmFrame depends on hardware: the bootloader needs to pass in the range of available physical memory before the physical memory allocator can be initialized and allocate VmFrames, so the test code cannot run in user mode. But considering that a VmFrame just holds a physical page, we can simply use a [u8; PHYS_MEM_SIZE] to simulate physical memory, with a VmFrame being just a slice of PAGE_SIZE; then such a test program can also run in user mode.

// Example: We want to test copy on write.
#[test]
pub fn test_copy_on_write() {
    let parent = VmoOptions::new(PAGE_SIZE).alloc().unwrap();
    // Write 255 to offset 0 in the parent vmo.
    parent.write_val(0, 255u8).unwrap();
    let child = parent.new_cow_child(0..parent.size()).alloc().unwrap();
    // Reading from the child vmo should yield 255.
    assert!(child.read_val(0).unwrap() == 255u8);
    // Write 1 to the child vmo.
    child.write_val(0, 1u8).unwrap();
    // Reading from the parent should still yield 255.
    assert!(parent.read_val(0).unwrap() == 255u8);
    // Reading from the child should yield 1.
    assert!(child.read_val(0).unwrap() == 1u8);
}
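The simulation idea can be sketched as follows (PHYS_MEM_SIZE, the bump-pointer frame allocator, and VmFrame's methods here are illustrative stand-ins, not jinux-frame's actual definitions):

```rust
use std::sync::Mutex;

const PAGE_SIZE: usize = 4096;
const PHYS_MEM_SIZE: usize = 16 * PAGE_SIZE;

// User-mode stand-in for physical memory: a static byte array.
static PHYS_MEM: Mutex<[u8; PHYS_MEM_SIZE]> = Mutex::new([0u8; PHYS_MEM_SIZE]);
// A trivial frame allocator: a bump counter over the simulated memory.
static NEXT_FRAME: Mutex<usize> = Mutex::new(0);

/// A "frame" is just the index of a PAGE_SIZE slice of the array.
struct VmFrame {
    index: usize,
}

impl VmFrame {
    fn alloc() -> Option<VmFrame> {
        let mut next = NEXT_FRAME.lock().unwrap();
        if (*next + 1) * PAGE_SIZE > PHYS_MEM_SIZE {
            return None; // simulated physical memory exhausted
        }
        let frame = VmFrame { index: *next };
        *next += 1;
        Some(frame)
    }

    fn write_byte(&self, offset: usize, value: u8) {
        PHYS_MEM.lock().unwrap()[self.index * PAGE_SIZE + offset] = value;
    }

    fn read_byte(&self, offset: usize) -> u8 {
        PHYS_MEM.lock().unwrap()[self.index * PAGE_SIZE + offset]
    }
}

fn main() {
    let frame = VmFrame::alloc().unwrap();
    frame.write_byte(0, 255);
    assert_eq!(frame.read_byte(0), 255);
    println!("frame {} holds {}", frame.index, frame.read_byte(0));
}
```

With such a backing store, higher-level abstractions like Vmo can run their unit tests under plain cargo test with no bootloader involved.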

Design

We adjust the interfaces of jinux-frame, dividing them into two types, umode and kmode, configured by features = "kmode" or features = "umode". If jinux-frame is configured as kmode, it depends on the bootloader and can be booted in a virtual machine to run in kernel mode. If configured as umode, it does not depend on the bootloader, so it can run directly on the host machine for testing.

crate-level dep

The figure above shows the crate-level dependencies inside jinux-frame after adding support for user-mode development. When running in kernel mode, only the crates in the green box are compiled and linked; when running in user mode, the crates in the orange box are compiled and linked.

We split the APIs of jinux-frame into three parts: frame-common, frame-kmode, and frame-umode. frame-kmode includes the parts that rely on the bootloader or architecture-related assembly. frame-umode provides the same interface as frame-kmode but internally relies only on user-mode implementations. frame-common holds general-purpose APIs, such as WaitQueue and SpinLock, which have nothing to do with specific hardware details.

The frame crate controls which APIs are exposed based on features, as shown in the code below. APIs from frame-kmode and frame-umode are controlled by specific features, while APIs from frame-common are not. The frame crate relies on the arch-related APIs to implement higher-level APIs.

// jinux-frame/vm/frames.rs
#[cfg(target_os = "none")]
pub use frame_kmode::alloc_frame;
#[cfg(not(target_os = "none"))]
pub use frame_umode::alloc_frame;

pub use frame_common::Paddr;

pub struct VmFrame {
    start_pa: Paddr,
}

impl VmFrame {
    pub fn alloc() -> Self {
        Self { start_pa: alloc_frame() }
    }
}

Crates that depend on jinux-frame do not need any modification.

# jinux-std/Cargo.toml
jinux-frame = { path = "../jinux-frame" }

The example below shows how to use APIs from jinux-frame. An API used in a test ultimately calls into frame-umode; outside of tests, it calls into frame-kmode.

// jinux-std/lib.rs
use jinux_frame::VmFrame;

#[cfg(all(test, not(target_os = "none")))]
mod test {
    use super::*;
    #[test]
    fn test_fn() {
        // Resolves to frame-umode when running `cargo test`.
        VmFrame::alloc();
    }
}

fn normal_fn() {
    // Resolves to frame-kmode when running `cargo kbuild`.
    VmFrame::alloc();
}

Implementation

avoid depending on bootloader

We need to avoid depending on the bootloader crate when testing in user mode, because the bootloader crate can only be compiled for the x86_64-unknown-none target, which cannot run on host machines. Cargo provides a target-specific dependency mechanism: we compile and link frame-kmode when building for the kernel target, and frame-umode when building for the host.

[dependencies]
frame-common = {path = "../frame-common"}

# kmode
[target.'cfg(target_os = "none")'.dependencies]
frame-kmode = { path = "../frame-kmode" }

# umode
[target.'cfg(not(target_os = "none"))'.dependencies]
frame-umode = { path = "../frame-umode" }

deal with existing tests

Existing integration tests may only run on the x86_64-unknown-none target, so these tests should also be put behind #[cfg(target_os = "none")].

reimplement kmode APIs

Considering the current API comprehensively, and referring to the definition of kernel-hal in zCore, the interface between frame-kmode/frame-umode and jinux-frame can be categorized. The table below shows which APIs are involved:

| Type      | Description                         | Examples                                         | How to reimplement |
|-----------|-------------------------------------|--------------------------------------------------|--------------------|
| console   | read from keyboard, write to screen | println, trace                                   | use std            |
| cpu       | cpu info                            | num_cpu, this_cpu                                | -                  |
| mm        | physical memory, page table         | PhysFrame (alloc, dealloc), phys_to_virt         | -                  |
| task      | low-level thread                    | Task (new, current, run, yield)                  | -                  |
| interrupt | interrupts (including timer)        | disable_interrupt/enable_interrupt, allocate_irq | -                  |
| timer     | system time                         | get_system_time                                  | -                  |
| user      | executing code in user mode         | UserMode (execute)                               | -                  |

Further possibilities

Not only unit tests, but also fuzzing or symbolic testing could be adopted.

Unresolved questions

  • What is the suitable API boundary between frame-kmode/umode and jinux-frame?
  • Should we keep frame-kmode/umode and frame in the same crate?

framework: reimplement address space-related data structures with radix trees

There are various address space-related data structures (e.g., memory mappings, page caches) that use BTreeMap (or hash maps) internally to map memory addresses to some other data types.

For these use cases, however, a better choice would be Linux-style radix trees, which are more efficient than binary trees in terms of memory consumption and CPU time.

  • Add a safe and generic radix tree implementation to Jinux Framework. Reuse existing Rust radix tree crates if possible.
  • Refactor various data structures with radix trees
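To illustrate the data structure, here is a minimal fixed-depth radix tree keyed by page index, with the 64-way fanout (6 bits per level) used by Linux's radix tree; all names are illustrative, and a production version would add path compression and tags:

```rust
const BITS_PER_LEVEL: usize = 6;
const FANOUT: usize = 1 << BITS_PER_LEVEL; // 64-way fanout per level

enum Node<V> {
    Internal(Vec<Option<Box<Node<V>>>>),
    Leaf(V),
}

pub struct RadixTree<V> {
    root: Option<Box<Node<V>>>,
    levels: usize,
}

impl<V> RadixTree<V> {
    /// `levels` internal levels cover keys below FANOUT^levels.
    pub fn new(levels: usize) -> Self {
        RadixTree { root: None, levels }
    }

    pub fn insert(&mut self, key: usize, value: V) {
        let levels = self.levels;
        Self::insert_at(&mut self.root, key, value, levels);
    }

    fn insert_at(slot: &mut Option<Box<Node<V>>>, key: usize, value: V, level: usize) {
        if level == 0 {
            *slot = Some(Box::new(Node::Leaf(value)));
            return;
        }
        let node = slot.get_or_insert_with(|| {
            Box::new(Node::Internal((0..FANOUT).map(|_| None).collect()))
        });
        // Consume BITS_PER_LEVEL bits of the key, most significant chunk first.
        let idx = (key >> ((level - 1) * BITS_PER_LEVEL)) & (FANOUT - 1);
        match &mut **node {
            Node::Internal(children) => Self::insert_at(&mut children[idx], key, value, level - 1),
            Node::Leaf(_) => unreachable!("key exceeds tree depth"),
        }
    }

    pub fn get(&self, key: usize) -> Option<&V> {
        let mut slot = self.root.as_ref()?;
        for level in (1..=self.levels).rev() {
            let idx = (key >> ((level - 1) * BITS_PER_LEVEL)) & (FANOUT - 1);
            match &**slot {
                Node::Internal(children) => slot = children[idx].as_ref()?,
                Node::Leaf(_) => return None,
            }
        }
        match &**slot {
            Node::Leaf(v) => Some(v),
            Node::Internal(_) => None,
        }
    }
}

fn main() {
    // Map page indices of a hypothetical page cache to marker values.
    let mut tree = RadixTree::new(3); // covers page indices below 64^3
    tree.insert(0, "first page");
    tree.insert(4097, "page 4097");
    assert_eq!(tree.get(4097), Some(&"page 4097"));
    assert_eq!(tree.get(12), None);
    println!("{:?}", tree.get(0));
}
```

Unlike a BTreeMap, lookup cost here is a fixed number of array indexings proportional to the tree depth, and densely clustered page indices share interior nodes, which is the case highlighted above for memory mappings and page caches.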

Add workqueue

This is the draft implementation for a simple workqueue mechanism.

/// Workqueues are a mechanism for asynchronous execution.
///
/// A workqueue is a FIFO queue of work items, each of which
/// is associated with a function. The work items are processed by
/// a pool of worker threads. When no work items are left for
/// processing, the worker threads shall go to sleep.
///
/// Ideally, the number of active worker threads should be kept
/// small to save resources, but not so small as to hurt performance.
/// The exact strategy for managing worker threads and dispatching
/// work items is transparent to the users of workqueues in order
/// to make the API as easy to use as possible.
///
/// In this initial implementation, there is only one workqueue and
/// one worker thread system-wide. Over time, we shall improve
/// the concurrency of the implementation by mimicking the strategy
/// adopted by Linux's Concurrency Managed Workqueue.
pub struct Workqueue;

impl Workqueue {
    /// Enqueue a closure as a work item.
    pub fn enqueue<F>(f: F)
    where
        F: FnOnce() + Send + 'static,
    {
        let work_item = Box::new(WorkItem::new(f));
        let mut queue = QUEUE.get().unwrap().lock();
        queue.push_back(work_item);
    }

    /// Initialize the workqueues and worker threads.
    pub fn init() {
        QUEUE.call_once(|| Mutex::new(LinkedList::new(LinkAdapter::new())));
        todo!("spawn the worker thread");
    }
}

static QUEUE: Once<Mutex<LinkedList<LinkAdapter>>> = Once::new();

intrusive_adapter!(LinkAdapter = Box<WorkItem>: WorkItem { link: LinkedListLink });

struct WorkItem {
    work: Box<dyn FnOnce() + Send>,
    link: LinkedListLink,
}

impl WorkItem {
    pub fn new<F>(f: F) -> Self
    where
        F: FnOnce() + Send + 'static,
    {
        Self {
            work: Box::new(f),
            link: LinkedListLink::new(),
        }
    }
}

mod test {
    fn closure() {
        Workqueue::enqueue(|| {
            println!("closure as a work item");
        })
    }
}

Support microVMs

Motivation

MicroVMs are light-weight VMs supported by Firecracker and QEMU's microvm machine type. MicroVMs provide isolated environments for containers to run efficiently and securely. MicroVMs only require minimal hardware support from the guest OS. Thus, MicroVMs are a perfect target for the MVP version of Jinux.

It’s a minimalist machine type without PCI nor ACPI support, designed for short-lived guests... The microvm machine type supports the following devices:

  • ISA bus
  • i8259 PIC (optional)
  • i8254 PIT (optional)
  • MC146818 RTC (optional)
  • One ISA serial port (optional)
  • LAPIC
  • IOAPIC (with kernel-irqchip=split by default)
  • kvmclock (if using KVM)
  • fw_cfg
  • Up to eight virtio-mmio devices (configured by the user)

One immediate benefit of microVM support is to enable live demo by using a serverless cloud service at an affordable cost, in the same way as how Kerla's live demo works.

TODOs

Use log crate to print information

Currently, all printing relies on macros such as info, error, and println provided by jinux-frame. We want to use the log crate instead.

Tracking the unimplemented syscalls to support busybox ash

This issue tracks the essential syscalls needed to support busybox binaries. We compile a statically linked busybox (other configs left as default), then use strace to record the syscalls.

The unimplemented syscalls for running ash:

  • readlink (can only read /proc/self/exe, which returns the name of the current executable file)
  • mprotect (only a fake implementation now)
  • prctl (set or get the process name)
  • getppid
  • getcwd (only returns the root directory as a fake result now)
  • ioctl (support commands about termios)
  • fcntl (only supports a naive dup now)
  • getpgrp
  • rt_sigaction
  • setpgid
  • openat (only returns ENOENT now)
  • fstat (only returns a fake result for stdout now)
  • read
  • close
  • geteuid (only returns a fake result now)
  • write (only writes to stdout now)
  • poll

Syscalls to implement for running ls in ash:

  • lseek
  • vfork

Known limitations in current implementations for futex

There are two main limitations in futex now.

  • Loading and storing the futex word requires atomic operations. Currently, we simply use read_bytes/write_bytes instead; we should implement atomic read/write with atomic instructions.
  • Set the correct key for futex word

Following the implementation in Occlum, kxos now uses the virtual address of a futex word as the futex key to distinguish futex words. In the kernel, all futex words are stored in a key-value map (HashMap or BTreeMap), so we can find the exact futex word given a futex key (the virtual address, in Occlum's case). Occlum has only one address space, so the virtual address is unique for each futex word. In kxos, each process has an individual address space, and the same futex word can be at different virtual addresses in different processes (see the futex manual).

futex

As the figure above shows, processes 1 and 2 may share futex words at different virtual addresses but the same physical address. A process needs to identify the exact futex word when it performs a futex wake, which wakes up all waiters on that futex word. Using the virtual address to distinguish futex words therefore does not seem to be a viable choice.

There can be two possible choices.

  1. Use the physical address as the key to distinguish futex words, since the physical address of a futex word is fixed. But for now, the kxos-frame API does not expose physical addresses.
  2. Still use the virtual address. This depends on how we implement shared memory: futex words are placed in shared memory, and if shared memory is mapped at the same virtual address in all processes, then the virtual address is enough.
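Choice 1 can be sketched as a global table keyed by the futex word's physical address (the virt-to-phys closures and the table type below are stand-ins; kxos-frame does not expose such an API yet):

```rust
use std::collections::HashMap;

type Paddr = usize;
type Tid = u32;

/// A global futex table: physical address of the futex word -> the tids
/// waiting on it. Two processes mapping the same shared page at different
/// virtual addresses resolve to the same bucket.
#[derive(Default)]
struct FutexTable {
    waiters: HashMap<Paddr, Vec<Tid>>,
}

impl FutexTable {
    /// `virt_to_phys` stands in for a page-table walk in the frame layer.
    fn wait(&mut self, vaddr: usize, tid: Tid, virt_to_phys: impl Fn(usize) -> Paddr) {
        self.waiters.entry(virt_to_phys(vaddr)).or_default().push(tid);
    }

    /// Wake and return all waiters on the futex word at `vaddr`.
    fn wake(&mut self, vaddr: usize, virt_to_phys: impl Fn(usize) -> Paddr) -> Vec<Tid> {
        self.waiters.remove(&virt_to_phys(vaddr)).unwrap_or_default()
    }
}

fn main() {
    let mut table = FutexTable::default();
    // Process 1 maps the shared page at 0x1000, process 2 at 0x7000;
    // both virtual addresses translate to physical address 0x9000.
    let p1 = |v: usize| v - 0x1000 + 0x9000;
    let p2 = |v: usize| v - 0x7000 + 0x9000;
    table.wait(0x1000, 11, p1);
    // Process 2 wakes the waiter via its own mapping of the same word.
    let woken = table.wake(0x7000, p2);
    assert_eq!(woken, [11]);
    println!("woken tids: {:?}", woken);
}
```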

Tracking issue for Ramfs

This is the tracking issue for Ramfs.

Ramfs is going to be the first file system implemented in KxOS because it is used as the initramfs.

  • Design a Ramfs implementation based on VMOs.

Review and clean up the code of iteration 1 (09/04)

  • 洪亮 reviews the current code implementation
  • Ensure the code style conforms to the guidelines
    • make fmt or cargo fmt
  • #7
  • #15
  • Reduce the APIs unnecessarily exposed by kx_frame, including
    • public functions
    • public methods of public types

List more technical debt...

On the differences among Task, Thread, and Process

Since KxOS introduces Task to encapsulate unsafe operations, there are three layers in the tasking area: Task, Thread, and Process. Before implementing them, we first need to clarify what Task, Thread, and Process are each responsible for. Below is a preliminary proposal for discussion.

  1. Task holds the minimum resources needed to run a function, including:
  • the function to execute
  • the data the function needs
  • the kernel stack
  • the kernel register state
  • the user address space (only for user-mode tasks)
  • tid (the thread ID; should Task hold it? mainly to conveniently obtain current_thread)
    Only Task is defined in the frame part, so Task needs to encapsulate all unsafe code.
  2. Thread is the minimum unit of scheduling and is the object held internally by the Scheduler. Thread and Task maintain a strict one-to-one relationship. A Thread holds:
  • tid (a tid of 0 indicates the main thread)
  • the thread's status (sleeping, exited, ready)
  • the priority
  • exit_code
  • a pointer to its current process (Weak<Process>)
    A Thread's status may be accessed concurrently, so Thread must control concurrent access internally with locks or atomic operations; in the end, Thread can be encapsulated as a Send+Sync type (without needing unsafe).
  3. Process represents a group of threads sharing resources, including but not limited to:
  • files
  • the address space
  • signals

Below are some interface definitions and some simple implementations.

/// Task代表运行一个函数所需要的最少资源,
/// 包括所需要执行的函数;函数所需的数据; 内核栈;内核寄存器状态;用户地址空间;tid?
pub struct Task {}

impl Task {
    /// 新建一个task
    pub fn new() {}
    /// 保存当前Task的状态,切换到另一个Task来执行
    pub fn switch_to(other: &Task) {}
    /// 读取当前的Task信息, Task信息一般是存在内核栈上的,
    pub fn current() -> Arc<Task> {}
}

/// Thread代表调度的基本单位,Thread与Task是严格一对一的关系
/// Thread内部包括tid(如果tid为0, 则表示为主线程), thread的状态,优先级,exit_code
/// Thread还会持有当前的进程(Weak<Process>)
pub struct Thread {}

pub struct ThreadStatus {
    Ready,
    Sleeping, //Blocking,
    Exited
}

impl Thread {
    /// Creates a new thread. This function may only be called when creating
    /// a process or from within the process itself.
    pub fn new() {
    }
    
    /// Gets the Thread corresponding to the current Task.
    pub fn current_thread() {}
    
    /// Yields the CPU held by the current thread.
    pub fn yield_now() {
        GLOBAL_SCHEDULER.re_schedule();
    }
    
    /// Exits a thread.
    /// If the current thread's tid is 0, the whole process must exit.
    pub fn exit(&self, exit_code: i32) {
        self.exit_code = exit_code;
        self.status = ThreadStatus::Exited;
        if self.tid == 0 {
            // logic for exiting the whole process
        }
        GLOBAL_SCHEDULER.re_schedule();
    }
    
    // This implementation may be problematic: the wake-up must be performed
    // in the timer's interrupt context.
    pub fn sleep(&self, ms: usize) {
        self.status = ThreadStatus::Sleeping;
        let current_thread = current_thread();
        timer::set_timeout(ms, || {
            current_thread.wake_up();
            GLOBAL_SCHEDULER.re_schedule();
        });
    }
    
    /// Wakes up a thread.
    pub fn wake_up(&self) {
        self.status = ThreadStatus::Ready;
    }
}

/// A Process corresponds to a group of Threads; the Thread whose tid is 0 is the main thread.
/// A Process manages shared resources, including files, the address space, signals, etc.
/// Functions such as fork and waitpid can be implemented on Process.
pub struct Process {}

// The scheduler holds all ready Threads.
pub struct Scheduler {}

impl Scheduler {
    pub fn enqueue(&self, thread: Arc<Thread>) {}
    pub fn dequeue(&self) -> Option<Arc<Thread>> {}
    
    /// Picks the next thread to run and switches to it.
    pub fn re_schedule(&self) {
        let current = current_thread();
        // Add current back to the scheduler if it is still ready
        if current.status == ThreadStatus::Ready {
            self.enqueue(current.clone());
        }
        let next = self.dequeue().unwrap_or_else(|| IDLE_THREAD.clone());
        current.task.switch_to(&next.task);
    }
}

Some open questions:

  1. If Task does not hold a tid, how do we implement current_thread?
  2. Can this design completely avoid the use of unsafe?
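
On open question 1, one possible direction (a sketch, not a settled design): give Task a type-erased data slot and stash the Thread there at creation time, so current_thread can be recovered by downcasting the current task's data, without Task holding a tid. The `data` field and the constructor below are hypothetical.

```rust
use std::any::Any;
use std::sync::Arc;

// Hypothetical Task with a type-erased data slot; in the kernel this
// would be reachable from the kernel stack or per-CPU storage.
struct Task {
    data: Arc<dyn Any + Send + Sync>,
}

struct Thread {
    tid: u32,
}

impl Task {
    // A task is created with an arbitrary payload (here: its Thread).
    fn new(data: Arc<dyn Any + Send + Sync>) -> Arc<Task> {
        Arc::new(Task { data })
    }
}

// Recover the Thread from the current task by downcasting its data slot.
fn current_thread(current_task: &Arc<Task>) -> Arc<Thread> {
    match current_task.data.clone().downcast::<Thread>() {
        Ok(thread) => thread,
        Err(_) => panic!("the current task is not associated with a thread"),
    }
}

fn main() {
    let thread = Arc::new(Thread { tid: 7 });
    let task = Task::new(thread);
    let recovered = current_thread(&task);
    assert_eq!(recovered.tid, 7);
}
```

This keeps the one-to-one Task/Thread pairing in one place (thread creation) and leaves Task generic over what runs on it.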

Tracking todo list from the 11.11 weekly meeting

  • use the safeptr wrapper to deal with unions safely (siginfo_t, ...)
  • sig_context can be removed
  • wait_until can provide a smaller interface (Result<Option> -> Option)
  • enable interruptible waits in std
  • determine a suitable lock granularity

Development Plan

The project development plan below divides the work into mainline and side tasks. Ideally, the mainline tasks form the fastest path to validating the project's architecture and distinguishing features, so that by following them step by step we can verify the project's feasibility with minimal time and cost. Side tasks may also be important and complex, but their value is mostly engineering or functional rather than academic or architectural.

Mainline tasks

  • August: build the system framework
    • Sprint 1: build the system framework (@宇科 & @剑锋)
      • Run the simplest possible kernel (08/21)
      • Run the simplest possible ELF program (08/28)
      • Review and clean up the code of iteration 1 (09/04)
  • September: validate privilege separation and implement a minimal kernel
    • Sprint 1: support virtio block devices (@宇科)
      • Get a PCI/virtio-based block device working (09/18)
      • Refactor the implementation to isolate and abstract unsafe (10/05)
      • Implement the open/read/write/close system calls
    • Sprint 2: support multi-process management and scheduling (@剑锋)
      • Implement fork/getpid/exit (09/11)
      • Start an ordinary C program (09/26)
      • Implement waitid/futex (10/10)
      • Implement sleep/yield (10/10)
      • Implement exec
      • Implement kill
  • October: validate zero-cost capabilities
    • Sprint 1: implement the capability framework (@剑锋)
      • Implement the typeflags and require macros (10/24)
    • Sprint 2: implement the VM capability API (@宇科)
      • Design the VM capability API
  • November: improve multi-process support and implement capabilities
    • Sprint 1: implement the VM capability API (@宇科)
      • Write the VM capability API spec (11/07, @洪亮)
    • Sprint 2: implement signals (@剑锋)
      • Implement basic signal functionality (11/07)
  • December: support running a shell

Side tasks:

  • Enable user-mode kx-frame
  • Multi-core support
  • Build an initramfs

Can the Task/Schedule-related frame APIs be made simpler?

Background

Currently, our task- and schedule-related frame APIs are rather high-level: task::spawn starts a task, task::yield switches tasks, WaitQueue::wait suspends a task, and scheduling algorithms are abstracted by the Scheduler trait.

Problem

To minimize kx_frame, 剑峰 suggested a direction for improvement: have Task expose only a single Task::switch_to method. It seems that, relying on this one method alone, the layers above the frame could safely implement spawn, yield, wait, schedule, and so on. I think this idea has merit and is worth thinking through further.

framework: combine vm and mm modules into one

In the source tree of Jinux Framework, there are two folders, vm and mm, that are in charge of memory management. There seems to be no point in having two separate modules; they should be combined into one.

How to derive Pod implementations automatically and safely?

Background

The plain old data (POD) trait Pod is a marker trait that tells whether a concrete type is safe to convert from and to arbitrary bytes of the same memory size. The trait is unsafe because implementing it for a wrong type causes undefined behavior.

Problem

The problem with this trait is that we have to implement Pod for countless types. And because the implementation of this trait is unsafe, it seems that we have to put each implementation, along with the definition of the target type, inside the privileged part of KxOS (i.e., kx-frame). This would bloat the size of the TCB, which is very undesirable.

Theoretically, we should be able to derive Pod implementations automatically and safely, thanks to the simple truth that a struct is POD if all its fields are POD.

Solution

First, we implement Pod for the primitive POD types in kx-frame.

Then, we implement a derive procedural macro, also as part of kx-frame. The macro would transform the following code

#[derive(Pod)]
#[repr(C)]
pub struct A {
    flags: i8,
    count: u64,
}

#[derive(Pod)]
#[repr(C)]
pub struct B {
    a: A,
    buf: [u8; 128],
}

to

#[repr(C)]
pub struct A {
    flags: i8,
    count: u64,
}

unsafe impl Pod for A {}

#[repr(C)]
pub struct B {
    a: A,
    buf: [u8; 128],
}

unsafe impl Pod for B {}

But how to prevent the procedural macro from misuse? For example, how to prevent the macro from generating the following code.

#[repr(C)]
pub struct C {
    nums: Vec<u8>,
}

// Lead to undefined behaviors!
unsafe impl Pod for C {}

Our idea is to introduce a magic macro called assert_impl_trait_for!(trait, type_), which, as its name suggests, checks at compile time whether trait is implemented by type_. If not, a compile error is thrown. With this new macro, our #[derive(Pod)] macro generates a code pattern as follows.

#[repr(C)]
pub struct B {
    a: A,
    buf: [u8; 128],
}

assert_impl_trait_for!(Pod, A);
assert_impl_trait_for!(Pod, [u8; 128]);

unsafe impl Pod for B {}

This way, we can ensure the unsafe implementation blocks are safe. But how do we write such a magical macro? The answer: while it is possible, we don't have to, because there is a simpler solution.

#[repr(C)]
pub struct B {
    a: A,
    buf: [u8; 128],
}

unsafe impl Pod for B 
where
    A: Pod,
    [u8; 128]: Pod,
{}

Yes, trait bounds are more flexible and powerful than you might think. This solution even works for generics.

#[repr(C)]
pub struct B<T: Pod> {
    any: T,
    buf: [u8; 128],
}

unsafe impl<T: Pod> Pod for B<T>
where
    T: Pod,
    [u8; 128]: Pod,
{}
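
This where-clause pattern can be exercised in ordinary Rust today. The sketch below is illustrative only: it defines a stand-in Pod trait (the real kx-frame trait will differ) with byte-conversion helpers, and guards a concrete impl with bounds on its field types, so that changing a field to a non-POD type breaks compilation. A padding-free layout is chosen so that reading the raw bytes is well-defined.

```rust
/// Minimal stand-in for the kx-frame `Pod` trait (illustrative only).
/// Safety: implementors must be valid for any bit pattern of their size
/// and contain no padding bytes.
unsafe trait Pod: Copy + Sized {
    fn as_bytes(&self) -> &[u8] {
        // Sound for padding-free POD types: every byte is initialized.
        unsafe {
            core::slice::from_raw_parts(
                self as *const Self as *const u8,
                core::mem::size_of::<Self>(),
            )
        }
    }

    fn from_bytes(bytes: &[u8]) -> Self {
        assert!(bytes.len() >= core::mem::size_of::<Self>());
        // Sound for POD types: any bit pattern is a valid value.
        unsafe { core::ptr::read_unaligned(bytes.as_ptr() as *const Self) }
    }
}

unsafe impl Pod for i8 {}
unsafe impl Pod for u64 {}
unsafe impl<T: Pod, const N: usize> Pod for [T; N] {}

#[repr(C)]
#[derive(Copy, Clone, PartialEq, Debug)]
struct A {
    count: u64,
    flags: [i8; 8], // 8-byte array keeps the struct free of padding
}

// The field bounds make this impl safe-by-construction: if `count` or
// `flags` changed to a non-Pod type, the impl would no longer compile.
#[allow(trivial_bounds)]
unsafe impl Pod for A
where
    u64: Pod,
    [i8; 8]: Pod,
{
}

fn main() {
    let a = A { count: 0xdead_beef, flags: [-1; 8] };
    let b = A::from_bytes(a.as_bytes());
    assert_eq!(a, b); // byte-level round trip preserves the value
}
```

Note that satisfied concrete bounds like these trigger the warn-by-default `trivial_bounds` lint, hence the `#[allow]`.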

Fix unsafe code in xmas-elf crate

There are several places where unsafe is used in xmas-elf. I examined the code, and they should be fixable using our Pod trait and POD derivation macro. So I think we should fork xmas-elf and make the necessary changes to remove unsafe using our Pod infrastructure. For this to work, we also need to extract the POD-related crates out of KxOS and host them as a separate project. This way, our xmas-elf fork can specify our POD crates as a dependency.

(By the way, I actually adopted the term Plain Old Data (POD) from xmas-elf.)

Support development on macOS

It seems that the hello world programs cannot be built on macOS. If so, either fix the build script (e.g., with gcc/clang) or upload the compiled binary with Git LFS.

Reorganize the codebase for clarity and scalability

Motivation

The organization of the codebase (under src/) is not clear, and the situation will only get worse as more crates are added. We propose to reorganize the codebase so that the code structure reflects privilege separation more clearly. Specifically, we want to:

  • Make it obvious which crates are privileged and which are not.
  • Keep this obvious even as the current crates are split into smaller crates for modularity and many new crates are added.

In addition to distinguishing between privileged and unprivileged code, the reorganization can also help with:

  • Distinguishing between KxOS-specific crates and general-purpose crates. For example, the type-flags crates do not depend on KxOS or include any KxOS-specific code, so their names should not begin with kxos_.
  • Distinguishing between library crates and component crates. Libraries provide utilities, while components control resources. Examples of library crates include type-flags, kxos-rights, and kxos-std; examples of component crates include kxos-syscall (extracting all system call dispatching logic from kxos-std into this new crate), kxos-pci (which controls the PCI subsystem), and kxos-virtio (which manages the virtio drivers).

Proposed changes

The new code structure would look like the following.

kxos/
    src/
        kxos-boot/
        framework/
            kxos-frame/
            pod/
            pod-derive/
        services/
            libs/
                kxos-std
                kxos-rights
                kxos-rights-proc
                kxos-util
                type-flags
                type-flags-util
            comps/
                kxos-syscall
                kxos-pci
                kxos-virtio
        apps/
        tests/
    README.md 

Some notes.

  • PODs should be put in a separate crate named pod because they are independent of kxos-frame. The pod-derive crate only depends on the pod crate.
  • user/ is renamed to apps/, which seems clearer.
  • I have some initial thoughts on implementing a component-level access control mechanism in addition to our current object-level access control mechanism based on zero-cost capabilities. This idea was originally proposed in CapComp; I need to adapt the original design for KxOS. By putting all components under comps/, it becomes clear which crates are subject to access control and which are not.
  • Need to update src/README.md.
  • One should use one's best judgment when extracting "component code" from kxos-std into kxos-syscall, while leaving "library code" in kxos-std.

Can we provide compatible interfaces with trapframe.rs?

Trapframe.rs is rCore's implementation of trap frames. It provides universal interfaces for dealing with trap frames across different architectures. We only care about its x86_64 part.

At the top level, trapframe.rs provides an init function, which initializes the GDT, TSS, and IDT and enables the syscall instruction (see the code comments). Such operations are also included in kxos-frame, so it is possible for kxos to follow the same init process.

There are some slight differences (which may not be a big problem).

  1. Trapframe.rs provides a more detailed TrapFrame structure (along with GpRegs).
  2. Trapframe.rs stores interrupt handlers in the IDT (init idt), while kxos currently registers interrupt handlers on each IrqLine and calls them manually. The difference is that the IDT supports only one handler per irq number, while kxos supports multiple handlers.
  3. Trapframe.rs seems to allocate an additional stack for handling traps (see the code comments).
  4. Trapframe.rs mentions support for fast syscalls (L44 and L61).
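
Point 2 can be illustrated with a small user-space sketch (names and fields here are simplified assumptions, not the kxos-frame API): a single low-level IDT stub per vector dispatches to any number of callbacks registered on the IrqLine, which a raw IDT entry alone cannot do.

```rust
use std::sync::{Arc, Mutex};

// A hypothetical trap frame; the real structure holds saved registers.
#[derive(Clone, Copy)]
struct TrapFrame {
    irq_num: u8,
}

type IrqHandler = Box<dyn Fn(&TrapFrame) + Send + Sync>;

// One IrqLine per vector. Unlike a raw IDT entry, it can hold any
// number of handlers, all of which run when the interrupt fires.
struct IrqLine {
    irq_num: u8,
    handlers: Mutex<Vec<IrqHandler>>,
}

impl IrqLine {
    fn new(irq_num: u8) -> Arc<Self> {
        Arc::new(IrqLine { irq_num, handlers: Mutex::new(Vec::new()) })
    }

    // Register one more handler on this line.
    fn on_active(&self, handler: impl Fn(&TrapFrame) + Send + Sync + 'static) {
        self.handlers.lock().unwrap().push(Box::new(handler));
    }

    // Called by the (single) low-level IDT stub for this vector.
    fn dispatch(&self) {
        let frame = TrapFrame { irq_num: self.irq_num };
        for handler in self.handlers.lock().unwrap().iter() {
            handler(&frame);
        }
    }
}

fn main() {
    let line = IrqLine::new(33);
    let hits = Arc::new(Mutex::new(0));
    for _ in 0..2 {
        let hits = hits.clone();
        line.on_active(move |frame| {
            assert_eq!(frame.irq_num, 33);
            *hits.lock().unwrap() += 1;
        });
    }
    line.dispatch(); // one interrupt, both handlers run
    assert_eq!(*hits.lock().unwrap(), 2);
}
```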

Treat object methods as entry points

We propose to slightly extend the current design of the component-level access control mechanism. As specified in the original RFC, the component-level access control mechanism applies to entry points, which include non-method public functions and public static variables. This RFC proposes to treat object methods as entry points, too.

Motivating example: what's wrong with getting the current thread?

One common programming pattern in OS kernels is to first get a pointer to the data structure that represents the current thread, and then access the OS resources associated with the thread through that pointer.

In Linux, there are 9000+ references to current, the C macro to get the task_struct for the current thread.

SYSCALL_DEFINE1(dup, unsigned int, fildes)
{
	int ret = -EBADF;
	struct file *file = __fget_files(current->files, fildes);

	if (file) {
		ret = get_unused_fd_flags(0);
		if (ret >= 0)
			fd_install(ret, file);
		else
			fput(file);
	}
	return ret;
}

Occlum follows a similar programming pattern with its Rust version of the current macro.

pub fn do_dup(old_fd: FileDesc) -> Result<FileDesc> {
    let current = current!();
    let mut files = current.files().write();
    let file = files.get(old_fd)?;
    let new_fd = files.add(file, false);
    Ok(new_fd)
}

This current-based programming pattern for resource access goes against our philosophy of the principle of least privilege, because the current macro provides open access to any OS resource attached to the current thread in almost all contexts.

Solution: the component-level access control mechanism comes to the rescue

Let's see how access control to object methods can address the problem above.

pub struct Thread {
    tid: Tid,
    process: Arc<Process>,
    files: Arc<RwLock<FileTable>>,
    // more...
}

impl Thread {
    pub fn current() -> Arc<Thread> { ... }

    pub fn tid(&self) -> Tid { self.tid }

    pub fn process(&self) -> &Arc<Process> { &self.process }

    #[controlled]
    pub fn files(&self) -> RwLockReadGuard<FileTable, '_> {
        self.files.read()
    }

    #[controlled]
    pub fn files_mut(&self) -> RwLockWriteGuard<FileTable, '_> {
        self.files.write()
    }
}

As you can see, the original files method is split into two methods: one returns a read-only guard, the other a mutable guard. Both methods are annotated with the #[controlled] attribute.

With these changes, the component-level access control mechanism can now prevent unauthorized components from using Thread::files and Thread::files_mut, thus making their access to current harmless.

What about trait methods?

Can the access control mechanism, which is based on MIR-level static analysis, work reliably on trait methods? Does the following syntax make sense?

pub trait File {
    #[controlled]
    fn read(&self, buf: &mut [u8]) -> Result<usize>;

    #[controlled]
    fn write(&self, buf: &mut [u8]) -> Result<usize>;
}

pub struct Pipe;

impl File for Pipe {
    fn read(&self, buf: &mut [u8]) -> Result<usize> { ... }
    fn write(&self, buf: &mut [u8]) -> Result<usize> { ... }
}

pub struct Socket;

impl File for Socket {
    fn read(&self, buf: &mut [u8]) -> Result<usize> { ... }
    fn write(&self, buf: &mut [u8]) -> Result<usize> { ... }
}

Implementation

...

Tracking issue: ELF parsing and address space construction

Currently, we only support parsing and building the address space for very simple ELF files. To run more complex programs, the current implementation needs further work. This issue tracks the features to be completed.

  • process initial stack & register state
    • Implement the data layout of the initial stack (command-line arguments, environment variables, etc.)
    • Set the register state correctly (argc, etc.)
    • Allocate the user heap
  • ELF parsing optimizations
    • Allow segment start addresses that are not page-aligned (for x86-64 ABI compatibility)
    • Parse the file format from the first chunk (128 bytes) of the ELF (following Linux's internal implementation) instead of reading the whole file
  • Dynamic linking support
  • A user-space runtime library that interfaces with the kernel (libc-compatible)
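
For the initial-stack item, the System V x86-64 ABI places, from the stack pointer upward: argc, the argv pointers plus a NULL terminator, the envp pointers plus a NULL terminator, and the auxiliary vector terminated by an AT_NULL pair. A simplified sketch that builds this image as word-sized slots (copying of the actual strings and final pointer fixups are omitted; the addresses passed in below are hypothetical):

```rust
// AT_NULL terminates the auxiliary vector (value from the ELF ABI).
const AT_NULL: u64 = 0;

// Build the word-sized slots at the bottom of the initial user stack:
// argc, argv pointers + NULL, envp pointers + NULL, auxv pairs + AT_NULL.
// The argument/environment strings themselves are assumed to have been
// copied higher up the stack already, at the given addresses.
fn build_initial_stack(
    argv_ptrs: &[u64],
    envp_ptrs: &[u64],
    auxv: &[(u64, u64)],
) -> Vec<u64> {
    let mut image = Vec::new();
    image.push(argv_ptrs.len() as u64); // argc
    image.extend_from_slice(argv_ptrs); // argv pointers
    image.push(0);                      // argv NULL terminator
    image.extend_from_slice(envp_ptrs); // envp pointers
    image.push(0);                      // envp NULL terminator
    for &(key, value) in auxv {         // auxiliary vector entries
        image.push(key);
        image.push(value);
    }
    image.push(AT_NULL);                // auxv terminator pair
    image.push(0);
    image
}

fn main() {
    // Hypothetical string addresses already placed in user memory;
    // (6, 4096) is the AT_PAGESZ entry.
    let image =
        build_initial_stack(&[0x7fff_0000, 0x7fff_0010], &[0x7fff_0020], &[(6, 4096)]);
    assert_eq!(image[0], 2); // argc
    assert_eq!(image[3], 0); // argv NULL terminator
}
```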

kxos-frame development progress

  1. Boot
  • Boot with the bootloader crate and print Hello World
  2. Interrupts
  • Implement irq
  • Register interrupt handlers
  • Tests
  3. Virtual memory
  • Build the framework
  • Tests
  4. Multitasking
  • Build the framework
  • Task switching
  5. User/kernel switching
  • Enter user mode
  • Trap into the kernel and print something
  • Hand control over to execute
  6. Drivers
  • Basic PCI infrastructure
  • Basic virtio infrastructure
  • Virtio block device
  • Virtio GPU device
  • Virtio input device
  • ...
  7. Synchronization
  • spin lock
  • wait
  8. Query and wrap CPU parameters
  • ...

Support essential shell commands

First batch (Including already supported):

  • sys_info
    • arch
    • ps
    • top
  • shell/terminal/console
    • ash
    • clear
    • stty
    • sh
  • fs/file
    • cat
    • cp
    • dirname
    • find
    • ln
    • ls
    • mkdir
    • mv
    • pwd
    • readlink
    • rm
    • rmdir
    • stat
    • tail
    • tee
    • touch
    • unlink
  • edit / interactive
    • sed
    • awk
    • vi
    • more
    • less
  • date/time
    • cal
    • date
  • utility
    • echo
    • env
    • sleep
    • kill
    • uname

Add CONTRIBUTING.md

Add a CONTRIBUTING.md file to provide guidance on coding style and conventions.

Iteration Plan for Feb. 2023

This plan captures our work in February, a 4-week iteration. In each iteration, we want to make balanced progress in two directions: adding new functionality and paying down technical debt.

Plan Items

Legend of marks:

  • work completed
  • 🏃 work in progress
  • blocked task
  • 💪 stretch goal for this iteration
  • 🔴 missing issue reference
  • a large work item, larger than one iteration

Component system

File systems

System calls

Misc

Technical debts

Docs

Implement safe pointers

I propose a new abstraction named SafePtr to handle pointers safely. SafePtr is quite like InFramePtr, but more general and flexible: it can represent any kind of pointer, including user-space pointers and physical pointers, and can be backed by any safe VM abstraction, including Vmar, Vmo, VmFrame, and more.

Furthermore, SafePtrs are capabilities. Currently, user space can be read and written freely with utility functions like read_bytes_from_user and write_val_to_user. This goes against our design philosophy of everything-is-a-capability. My plan is to replace these utility functions with SafePtr.

A draft implementation

A draft implementation is provided for reference. I am almost certain it won't compile, but it is extensively documented and should serve as a good starting point for a production-quality implementation.

The draft implementation depends on two other changes

/// Safe pointers.
/// 
/// # Overview
/// 
/// Safe pointers allow using pointers to access memory without 
/// unsafe code, which is a key enabler for writing device drivers in safe Rust.
/// 
/// To ensure its soundness, safe pointers (`SafePtr<T, M, _>`) have to be 
/// more restricted than raw pointers (`*const T` or `*mut T`).
/// More specifically, there are three major restrictions.
/// 
/// 1. A safe pointer can only refer to a value of a POD type `T: Pod`,
/// while a raw pointer can refer to a value of any type `T`.
/// 2. A safe pointer can only refer to an address within a virtual memory object
/// of type `M: VmIo` (e.g., VMAR and VMO), while a raw pointer can refer to 
/// an address within any virtual memory space.
/// 3. A safe pointer only allows one to copy values to/from the target address,
/// while a raw pointer allows one to borrow an immutable or mutable reference
/// to the target address.
/// 
/// The expressiveness of safe pointers, although less than that of 
/// raw pointers, is sufficient for our purpose of writing an OS kernel in safe
/// Rust.
/// 
/// In addition, safe pointers `SafePtr<T, M, R>` are associated with access 
/// rights, which are encoded statically with type `R: TRights`.
/// 
/// # Examples
/// 
/// ## Constructing a safe pointer
/// 
/// An instance of `SafePtr` can be created with a VM object and an address 
/// within the VM object.
/// 
/// ```
/// let u32_ptr: SafePtr<u32, Vec<u8>, _> = {
///     let vm_obj = VmoOptions::new(PAGE_SIZE).alloc().unwrap();
///     let addr = 16;
///     SafePtr::new(vm_obj, addr)
/// };
/// ```
/// 
/// The generic parameter `M` of `SafePtr<_, M, _>` must implement the `VmIo` 
/// trait. The most important `VmIo` types are `Vmar`, `Vmo`, `Mmio`, and 
/// `VmFrame`. The blanket implementations of `VmIo` also include pointer-like 
/// types that refer to a `VmIo` type. Some examples are `&Vmo`, `Box<Vmar>`, 
/// and `Arc<Mmio>`.
/// 
/// The safe pointer itself does not and cannot guarantee that its address is valid.
/// This is because different VM objects may interpret addresses differently
/// and each VM object can have different restrictions for valid addresses.
/// So the detection of invalid addresses is delayed to the time when the 
/// pointers are actually read from or written to.
/// 
/// Initially, a newly-created safe pointer has all access rights.
/// 
/// ## Reading and writing a safe pointer
/// 
/// The value pointed to by a safe pointer can be read or written with the
/// `read` or `write` method. Both methods may return errors. The possible reasons
/// of error are determined by the underlying VM objects. 
/// 
/// ```
/// u32_ptr.write(1234).unwrap();
/// assert!(u32_ptr.read().unwrap() == 1234);
/// ```
/// 
/// ## Manipulating a safe pointer
/// 
/// The address of a safe pointer can be obtained by the `addr` method.
/// The address can be updated by assigning a new value with the `set_addr` method
/// or updated incrementally through methods like `add`, `offset`, `byte_addr`,
/// `byte_offset`.
///  
/// The VM object of a safe pointer can also be obtained or updated through the
/// `vm` and `set_vm` methods. A new safe pointer that is backed by the same
/// VM object of an existing safe pointer can be obtained through the `borrow_vm`
/// method.
/// 
/// As an example, the code below shows how the `add` and `borrow_vm` methods 
/// can be used together to iterate over all values pointed to by an array pointer.
/// 
/// ```
/// fn collect_values<T>(array_ptr: &SafePtr<T, M, _>, array_len: usize) -> Vec<T> {
///     let mut curr_ptr: SafePtr<T, &M, _> = array_ptr.borrow_vm();
///     (0..array_len)
///         .iter()
///         .map(|_| {
///             let val = curr_ptr.read().unwrap();
///             curr_ptr.add(1);
///             val
///         })
///         .collect()
/// }
/// ```
/// 
/// The data type of a safe pointer can be converted with the `cast` method.
/// 
/// ```rust
/// let u8_ptr: SafePtr<u8, _, _> = u32_ptr.cast();
/// ```
/// 
/// ## Reading and writing the fields of a struct
/// 
/// Given a safe pointer that points to a struct (say, `Foo`), one can read
/// the value of its field as follows.
/// 
/// ```
/// pub struct Foo {
///     first: u64,
///     second: u32,
/// }
/// 
/// fn read_second_field<M: VmIo>(ptr: &SafePtr<Foo, M, _>) -> u32 {
///     let field_ptr = ptr
///         .borrow_vm()
///         .byte_add(offset_of!(Foo, second) as usize)
///         .cast::<u32>();
///     field_ptr.read().unwrap()
/// }
/// ```
/// 
/// But this coding pattern is too tedious for such a common task.
/// To make the life of users easier, we provide a convenient macro named 
/// `field_ptr`, which can be used to obtain the safe pointer of a field from
/// that of its containing struct.
/// 
/// ```
/// fn read_second_field<M: VmIo>(ptr: &SafePtr<Foo, M, _>) -> u32 {
///     let field_ptr = field_ptr!(ptr, Foo, second);
///     field_ptr.read().unwrap()
/// }
/// ``` 
/// 
/// # Access rights
/// 
/// A safe pointer may have a combination of three access rights:
/// Read, Write, and Dup.
pub struct SafePtr<T, M, R> {
    addr: usize,
    vm_obj: M,
    rights: R,
    phantom: PhantomData<T>,
}

impl<T: Pod, M: VmIo, R: TRights> SafePtr<T, M, R> {
    /// Create a new instance.
    /// 
    /// # Access rights
    /// 
    /// The default access rights of a new instance are `Read`, `Write`, and 
    /// `Dup`.
    pub fn new(vm_obj: M, addr: usize) -> Self {
        Self {
            vm_obj,
            addr,
            rights: <TRights![Read, Write, Dup]>::new(),
            phantom: PhantomData,
        }
    }

    // =============== Read and write methods ==============

    /// Read the value from the pointer.
    /// 
    /// # Access rights
    /// 
    /// This method requires the Read right.
    #[require(R > Read)]
    pub fn read(&self) -> Result<T> {
        self.vm_obj.read_val(self.addr)
    }

    /// Read a slice of values from the pointer.
    /// 
    /// # Access rights
    /// 
    /// This method requires the Read right.
    #[require(R > Read)]
    pub fn read_slice(&self, slice: &mut [T]) -> Result<()> {
        self.vm_obj.read_slice(self.addr, slice)
    }

    /// Overwrite the value at the pointer.
    /// 
    /// # Access rights
    /// 
    /// This method requires the Write right.
    #[require(R > Write)]
    pub fn write(&self, val: &T) -> Result<()> {
        self.vm_obj.write_val(self.addr, val)
    }

    /// Overwrite a slice of values at the pointer.
    /// 
    /// # Access rights
    /// 
    /// This method requires the Write right.
    #[require(R > Write)]
    pub fn write_slice(&self, slice: &[T]) -> Result<()> {
        self.vm_obj.write_slice(self.addr, slice)
    }

    // =============== Address-related methods ==============

    pub const fn addr(&self) -> usize {
        self.addr
    }

    pub const fn set_addr(&mut self, addr: usize) {
        self.addr = addr;
    }

    pub const fn is_aligned(&self) -> bool {
        self.addr % core::mem::align_of::<T>() == 0
    }

    pub const fn add(&mut self, count: usize) {
        let offset = count * core::mem::size_of::<T>();
        self.addr += offset;
    }

    pub const fn offset(&mut self, count: isize) {
        let offset = count * core::mem::size_of::<T>() as isize;
        if offset >= 0 {
            self.addr += offset as usize;
        } else {
            self.addr -= (-offset) as usize;
        }
    }

    pub const fn byte_add(&mut self, bytes: usize) {
        self.addr += bytes;
    }

    pub const fn byte_offset(&mut self, bytes: isize) {
        if bytes >= 0 {
            self.addr += bytes as usize;
        } else {
            self.addr -= (-bytes) as usize;
        }
    }

    // =============== VM object-related methods ==============

    pub const fn vm(&self) -> &M {
        &self.vm_obj
    }

    pub const fn set_vm(&mut self, vm_obj: M) {
        self.vm_obj = vm_obj;
    }

    pub const fn borrow_vm(&self) -> SafePtr<T, &M, R> {
        SafePtr {
            addr: self.addr,
            vm_obj: &self.vm_obj,
            rights: self.rights.clone(),
            phantom: PhantomData,
        }
    }

    // =============== Type conversion methods ==============

    pub const fn cast<U>(self) -> SafePtr<U, M, R> {
        let SafePtr { addr, vm_obj, rights, .. } = self;
        SafePtr {
            addr,
            vm_obj,
            rights,
            phantom: PhantomData,
        }
    }

    #[require(R > R1)]
    pub fn restrict<R1: TRights>(self) -> SafePtr<T, M, R1> {
        let SafePtr { addr, vm_obj, .. } = self;
        SafePtr {
            addr,
            vm_obj,
            rights: R1::new(),
            phantom: PhantomData,
        }
    }
}

#[require(R > Dup)]
impl<T, M: Clone, R: TRights> Clone for SafePtr<T, M, R> {
    fn clone(&self) -> Self {
        Self {
            addr: self.addr,
            vm_obj: self.vm_obj.clone(),
            rights: self.rights,
            phantom: PhantomData,
        }
    }
}

#[require(R > Dup)]
impl<T, M: Dup, R: TRights> Dup for SafePtr<T, M, R> {
    fn dup(&self) -> Result<Self> {
        let duplicated = Self {
            addr: self.addr,
            vm_obj: self.vm_obj.dup()?,
            rights: self.rights,
            phantom: PhantomData,
        };
        Ok(duplicated)
    }
}

/// Create a safe pointer for the field of a struct.
#[macro_export]
macro_rules! field_ptr {
    ($ptr:expr, $type:ty, $($field:tt)+) => {{
        use jinux_frame::offset_of;
        use jinux_frame::vm::VmIo;
        // import more...

        #[inline]
        fn new_field_ptr<T, M, R, U>(
            container_ptr: &SafePtr<T, M, R>,
            field_offset: *const U
        ) -> SafePtr<U, &M, R> 
        where
            T: Pod,
            M: VmIo,
            R: TRights,
            U: Pod,
        {
            container_ptr
                .borrow_vm()
                .byte_add(field_offset as usize)
                .cast()
        }

        let ptr = $ptr;
        let field_offset = offset_of!($type, $($field)+);
        new_field_ptr(ptr, field_offset)
    }}
}

Add basic CI support

Add some basic GitHub Actions to safeguard the quality of the codebase. These actions should test the results of common build steps:

make check
make test
make docs

How to use pointers in a safe and ergonomical way?

Problem

Here is an example extracted from rCore, where an arbitrary physical address is converted to a static, mutable reference.

struct VirtQueue {
    avail: &'static mut AvailRing,
    //...
}

impl VirtQueue {
    /// Create a new VirtQueue.
    pub fn new() -> Self {
        //...
        let avail = unsafe {
            &mut *(mm::phys_to_virt(cfg.queue_driver as usize) as *mut AvailRing)
        };
        Self {
            avail,
            //...
        }
    }

    pub fn add(&mut self, /*...*/) {
        //...
        let avail_slot = self.avail_idx & (self.queue_size - 1);
        self.avail.ring[avail_slot as usize] = head;
        //...
    }
}

Converting raw pointers to references is common in rCore as well as other Rust OSes. This practice is generally unsafe and can cause undefined behavior.

In KxOS, we cannot accept such a practice. The kx_frame crate provides two essential VM abstractions, VmFrame and VmSpace, for safe memory access without raw pointers. But it is still unclear whether VmFrame and VmSpace allow using pointers in an ergonomic way, preferably as easy to use as the unsafe practice shown above.

Solution

We propose a new abstraction for physical pointers called InFramePtr, along with two convenience macros for accessing an InFramePtr easily.

/// An in-frame pointer to a POD value, enabling safe access
/// to a POD value given its physical memory address.
pub struct InFramePtr<T> {
    frame: VmFrame,
    offset: u32,
    marker: PhantomData<*mut T>, 
}

impl<T: Pod> InFramePtr<T> {
    pub fn new(paddr: Paddr) -> Result<Self> {
        let frame = {
            let page_paddr = paddr & !(PAGE_SIZE - 1);
            let options = VmAllocOptions::new(1)
                .paddr(Some(page_paddr));
            VmFrameVec::allocate(&options)?.remove(0)
        };
        let offset = (paddr - frame.paddr()) as u32;
        Ok(Self {
            frame,
            offset,
            marker: PhantomData,   
        })
    }

    pub fn read(&self) -> T {
        self.frame.read_val(self.offset)
    }

    pub fn write(&self, new_val: T) {
        self.frame.write_val(self.offset, new_val)
    }

    pub fn offset(&self) -> usize {
        self.offset as _
    }

    pub fn frame(&self) -> &VmFrame {
        &self.frame
    }
}

/// Read from an in-frame pointer to a POD value.
///
/// An in-frame pointer's `read` method only allows the pointed value 
/// to be read as a whole. But it is a common need to read a field 
/// of the value. Both use cases are supported by this macro as shown
/// by the example below.
///
/// ```
/// #[derive(Pod)]
/// #[repr(C)]
/// struct Foo {
///   a: u8,
///   b: u64,
/// }
///
/// let foo_ptr = InFramePtr::<Foo>::new(0xabcd).unwrap();
/// let foo = read_ptr!(foo_ptr);
/// let foo_b = read_ptr!(foo_ptr.b);
/// assert!(foo.b == foo_b);
/// ```
macro_rules! read_ptr {
    ($($ptr:tt)*) => {
        todo!("omit details")
    }
}

/// Write to an in-frame pointer to a POD value.
///
/// Similar to the `read_ptr` macro.
macro_rules! write_ptr {
    ($($ptr:tt)*, $new_val:expr) => {
        todo!("omit details")
    }
}

Write a short paper

We plan to write a short paper in early 2023 as a mid-term report.

Candidate workshops to submit to:

  • Around February 2023: HotOS'23 (held every other year)
  • Around April 2023: ChinaSys'23
  • Around April 2023: APSys'23

kxos-std development plan and progress

Used to track the development progress of the kxos-std part:

  1. memory
  2. process
  • ELF file loading
  • Creating and starting initproc
  3. scheduler
  • FIFO
  • Time-slice-based round-robin scheduling
  4. syscall
  • sys_read
  • sys_write
  • sys_exit
  • sys_fork
  • sys_exec
  • sys_waitpid
  • sys_yield
  • sys_mmap
  5. userspace programs
  • shell

framework: reimplement the frame allocator with the buddy system

The current implementation of the frame allocator in Jinux Framework is based on a free list. This becomes increasingly inefficient as the number of free memory fragments grows. A more efficient choice is the buddy system, which is already used by the heap allocator. Ideally, the buddy system should be reused by both types of allocators to avoid code redundancy.
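
For reference, the core of a buddy frame allocator is small: free blocks are kept in per-order free lists, allocation splits a larger block until the requested order is reached, and freeing merges a block with its buddy (start XOR (1 << order)) for as long as the buddy is also free. A simplified sketch over frame indices, not the proposed implementation:

```rust
use std::collections::BTreeSet;

/// A minimal buddy system over frame indices (illustrative sketch).
struct BuddyAllocator {
    // free_lists[order] holds the start indices of free 2^order-frame blocks.
    free_lists: Vec<BTreeSet<usize>>,
}

impl BuddyAllocator {
    /// Manages `1 << max_order` frames starting at index 0.
    fn new(max_order: usize) -> Self {
        let mut free_lists = vec![BTreeSet::new(); max_order + 1];
        free_lists[max_order].insert(0);
        BuddyAllocator { free_lists }
    }

    /// Allocates a block of 2^order frames, returning its start index.
    fn alloc(&mut self, order: usize) -> Option<usize> {
        // Find the smallest free block that is big enough.
        let source = (order..self.free_lists.len())
            .find(|&o| !self.free_lists[o].is_empty())?;
        let start = *self.free_lists[source].iter().next().unwrap();
        self.free_lists[source].remove(&start);
        // Split it in half repeatedly, keeping the upper buddies free.
        for o in (order..source).rev() {
            self.free_lists[o].insert(start + (1 << o));
        }
        Some(start)
    }

    /// Frees a block, merging it with its buddy as long as possible.
    fn free(&mut self, mut start: usize, mut order: usize) {
        while order + 1 < self.free_lists.len() {
            let buddy = start ^ (1 << order);
            if !self.free_lists[order].remove(&buddy) {
                break;
            }
            start = start.min(buddy);
            order += 1;
        }
        self.free_lists[order].insert(start);
    }
}

fn main() {
    let mut buddy = BuddyAllocator::new(4); // 16 frames
    let a = buddy.alloc(0).unwrap(); // 1 frame
    let b = buddy.alloc(2).unwrap(); // 4 frames
    assert_ne!(a, b);
    buddy.free(a, 0);
    buddy.free(b, 2);
    // Everything merged back into one 16-frame block.
    assert!(buddy.free_lists[4].contains(&0));
}
```

Both allocation and free are O(log n) in the number of orders, versus the linear scans a fragmented free list can require.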
