onyx,heatd

Our current spinlocks are essentially:

struct spinlock { unsigned int word; };

void lock(struct spinlock *s)
{
    /* Spin until we atomically flip word from 0 (unlocked) to 1 (locked). */
    while (!cmpxchg(&s->word, 0, 1))
        cpu_pause();
}

This is alright for raw throughput, but it is palpably unfair to other, less fortunate CPUs: whichever CPU wins the cmpxchg race gets the lock, so a waiter can lose over and over again.
Two options (and we'll need both) are available:

Ticket locks
These are pretty much just: struct spinlock { u16 next_ticket; u16 curr_ticket; }; Waiters increment next_ticket (through an atomic fetch_add), then wait for curr_ticket to equal their ticket. This is still cache-awful, since every waiter spins on the same cacheline, but at least it's much fairer: CPUs get the lock in FIFO order. Throughput should take a hit, but that's okay. (Note: while writing this up, I noticed that a spin_try_lock would be tricky here, but it's doable with a single 32-bit cmpxchg.)
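A minimal sketch of the ticket lock described above, written as userspace C++ with std::atomic standing in for the kernel's atomics. Assumptions: both tickets are packed into one 32-bit word (curr_ticket in the low half, next_ticket in the high half), so the struct stays the same 4 bytes as today's unsigned int word; cpu_pause() is a local stand-in; ticket_lock, ticket_unlock and ticket_trylock are hypothetical names. The trylock shows the single 32-bit cmpxchg mentioned in the note.

```cpp
#include <atomic>
#include <cstdint>

// Stand-in for the kernel's cpu_pause(): PAUSE on x86, no-op elsewhere.
inline void cpu_pause()
{
#if defined(__x86_64__) || defined(__i386__)
    __builtin_ia32_pause();
#endif
}

// One 32-bit word, same size as today's `unsigned int word`:
// bits 0..15 = curr_ticket (now serving), bits 16..31 = next_ticket.
struct ticket_spinlock
{
    std::atomic<std::uint32_t> word{0};
};

void ticket_lock(ticket_spinlock *s)
{
    // Take a ticket by bumping next_ticket; the old high half is our number.
    std::uint32_t old = s->word.fetch_add(1u << 16, std::memory_order_relaxed);
    auto ticket = static_cast<std::uint16_t>(old >> 16);

    // Wait until curr_ticket reaches our ticket (FIFO handoff).
    while (static_cast<std::uint16_t>(s->word.load(std::memory_order_acquire)) != ticket)
        cpu_pause();
}

void ticket_unlock(ticket_spinlock *s)
{
    // Only the owner advances curr_ticket, but waiters may bump next_ticket
    // concurrently, so update the low half with a CAS loop (wrapping at 16 bits).
    std::uint32_t old = s->word.load(std::memory_order_relaxed);
    std::uint32_t desired;
    do
    {
        desired = (old & 0xFFFF0000u) | static_cast<std::uint16_t>(old + 1);
    } while (!s->word.compare_exchange_weak(old, desired, std::memory_order_release,
                                            std::memory_order_relaxed));
}

bool ticket_trylock(ticket_spinlock *s)
{
    std::uint32_t old = s->word.load(std::memory_order_relaxed);
    // The lock is free only when next_ticket == curr_ticket.
    if ((old >> 16) != (old & 0xFFFFu))
        return false;
    // A single 32-bit cmpxchg takes the next ticket only if nobody raced us.
    return s->word.compare_exchange_strong(old, old + (1u << 16), std::memory_order_acquire,
                                           std::memory_order_relaxed);
}
```

16-bit tickets are enough as long as fewer than 65536 CPUs can be waiting at once. The unlock uses a CAS loop because std::atomic can't add to just the low half of the word; a kernel version could instead do a 16-bit atomic add on the curr_ticket half.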
MCS locks
MCS locks basically make each waiter spin on its own cacheline, so they're OPTIMAL for systems with lots of threads and CPUs. The idea is to build a linked list (a queue) of waiting CPUs, where each CPU's node is a separate struct reached through percpu accessors. These nodes are kept percpu, and we should keep one per level (irqsave, normal (no preempt)), so that an acquisition from IRQ context doesn't clobber the node already in use by the code it interrupted.
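A minimal MCS sketch under the same userspace assumptions. The caller passes its queue node explicitly; in the kernel that node would come from the percpu accessors, one per level as described above. The names mcs_node, mcs_lock, mcs_spin_lock and mcs_spin_unlock are hypothetical. Note that this version stores a full tail pointer (8 bytes on 64-bit), so as written it would not meet the size NOTE; Linux's qspinlock avoids that by encoding the tail CPU number and nesting index inside its 32-bit lock word.

```cpp
#include <atomic>

// Stand-in for the kernel's cpu_pause(): PAUSE on x86, no-op elsewhere.
inline void cpu_pause()
{
#if defined(__x86_64__) || defined(__i386__)
    __builtin_ia32_pause();
#endif
}

// One queue node per waiter. In the kernel this would be a percpu struct,
// one per level (irqsave, normal); here the caller passes it in explicitly.
struct mcs_node
{
    std::atomic<mcs_node *> next{nullptr};
    std::atomic<bool> locked{false}; // true while this waiter must keep spinning
};

struct mcs_lock
{
    std::atomic<mcs_node *> tail{nullptr}; // last waiter in the queue, null if free
};

void mcs_spin_lock(mcs_lock *l, mcs_node *me)
{
    me->next.store(nullptr, std::memory_order_relaxed);
    me->locked.store(true, std::memory_order_relaxed);

    // Atomically append ourselves to the queue.
    mcs_node *prev = l->tail.exchange(me, std::memory_order_acq_rel);
    if (!prev)
        return; // queue was empty: we own the lock

    // Link behind our predecessor, then spin on our own node's cacheline.
    prev->next.store(me, std::memory_order_release);
    while (me->locked.load(std::memory_order_acquire))
        cpu_pause();
}

void mcs_spin_unlock(mcs_lock *l, mcs_node *me)
{
    mcs_node *next = me->next.load(std::memory_order_acquire);
    if (!next)
    {
        // No visible successor: try to mark the lock free.
        mcs_node *expected = me;
        if (l->tail.compare_exchange_strong(expected, nullptr, std::memory_order_release,
                                            std::memory_order_relaxed))
            return;
        // Someone swapped into tail but hasn't linked to us yet; wait for it.
        while (!(next = me->next.load(std::memory_order_acquire)))
            cpu_pause();
    }
    // Hand the lock directly to the next waiter.
    next->locked.store(false, std::memory_order_release);
}
```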

We likely want both: MCS locks aren't necessary in many cases, e.g. RISC-V machines with a small-ish number of CPUs.

NOTE: it's super important to keep the current struct spinlock size; we do not want to bloat all the various users of spinlocks in the kernel.

Fix scheduler primitive hanging bugs

VirtualBox exposes some bugs in the scheduler primitives that I hadn't seen before: they hang on a spinlock. Fix it ASAP.

Weird crashes around/related to pthread_key_create and possibly stack

Testcase: while true; do gcc -v >/dev/null 2>&1; done

When running it in two ttys at once, things start to break.

Couldn't see more than that since /bin/gcc wasn't built with symbols.
I also managed to crash bash in those loops, and same for dash (but that was very rare).

Use and pass around iovec_iters in the network stack

The network stack was written pre-iovec_iter, and as such handrolls a lot of the iovec_iter logic. For instance:

unix.cpp was written post-iovec_iter and as such creates its own iovec_iter
tcp.cpp, udp.cpp, icmp.cpp do not use iovec_iter and handroll the logic

This would probably mean passing around a structure different from msghdr, with an iovec_iter instead of a struct iovec.

#93 is related to this. Until the conversion happens, sendmsg can't be used with !IOVEC_USER.
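To illustrate the direction, a hypothetical sketch of an iovec_iter-style iterator and a msghdr replacement that carries it; none of these names are Onyx's actual iovec_iter API. The point is that the partial-iovec bookkeeping that tcp.cpp, udp.cpp and icmp.cpp currently handroll lives in one helper. It uses plain memcpy, i.e. the kernel-buffer (!IOVEC_USER) case; a user-memory variant would go through copy_from_user instead.

```cpp
#include <cstddef>
#include <cstring>
#include <sys/uio.h>

// Hypothetical iterator over an iovec array; Onyx's real iovec_iter may differ.
struct iovec_iter_sketch
{
    const iovec *vec;
    std::size_t nr_vecs;
    std::size_t idx = 0; // current iovec
    std::size_t off = 0; // offset into vec[idx]
};

// Hypothetical msghdr replacement passed down the stack instead of struct iovec.
struct kernel_msg_sketch
{
    const void *name;
    std::size_t namelen;
    iovec_iter_sketch iter;
};

// Copy up to len bytes out of the iterator into dst, advancing it.
// Returns the number of bytes copied (short if the iovecs run out).
std::size_t copy_from_iter(iovec_iter_sketch &it, void *dst, std::size_t len)
{
    std::size_t done = 0;
    while (done < len && it.idx < it.nr_vecs)
    {
        const iovec &v = it.vec[it.idx];
        std::size_t avail = v.iov_len - it.off;
        std::size_t n = len - done < avail ? len - done : avail;
        if (n)
            std::memcpy(static_cast<char *>(dst) + done,
                        static_cast<const char *>(v.iov_base) + it.off, n);
        done += n;
        it.off += n;
        if (it.off == v.iov_len) // this iovec is exhausted, move to the next one
        {
            it.idx++;
            it.off = 0;
        }
    }
    return done;
}
```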

Use after free in sigtimedwait

Onyx/kernel/kernel/signal.c

Line 1069 in d4827e0

if(copy_to_user(info, pending->info, sizeof(sigset_t)) < 0)

Reproducible kernel panic in vm.cpp

Repro: run "file /lib/libuuid.so"

panic: Assertion mm->shared_set_size == 0 failed in ...

[x86] Deal with non-zero BSP APIC ids

On some systems the BSP does not have APIC ID 0. Case in point: an AMD family 15h machine.

Check the I/O APIC redirection code (surely broken) and the MSI code (probably broken) for this.
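A sketch of the invariant to check for, assuming the bug is logical CPU numbers being used directly as APIC IDs: keep an explicit logical-CPU-to-APIC-ID table (hypothetical here, filled from the MADT) and always go through it when building interrupt destinations. The encodings assume xAPIC physical destination mode with 8-bit APIC IDs; x2APIC uses 32-bit IDs and differs.

```cpp
#include <cstdint>

constexpr unsigned max_cpus = 64; // arbitrary bound for the sketch

// Hypothetical table filled while parsing the MADT: logical CPU -> APIC ID.
// The BSP is logical CPU 0 even when its APIC ID is not 0.
std::uint8_t cpu_to_apic_id[max_cpus];

// MSI address (xAPIC, physical destination mode): 0xFEE00000 with the
// destination APIC ID in bits 19:12.
std::uint32_t msi_address_for_cpu(unsigned cpu)
{
    return 0xFEE00000u | (static_cast<std::uint32_t>(cpu_to_apic_id[cpu]) << 12);
}

// I/O APIC redirection entry: the destination field lives in bits 63:56.
std::uint64_t ioapic_dest_bits_for_cpu(unsigned cpu)
{
    return static_cast<std::uint64_t>(cpu_to_apic_id[cpu]) << 56;
}
```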

dcache tests

The dcache is very complex and could use some in-kernel unit tests for things that are hard to reach from userspace (i.e. not related to namei).

Fix weird inconsistent locking in the network stack (sockets)

For instance, many sendmsg() paths do not hold locks...

Add eBPF support

eBPF (and classic BPF) makes it possible to trace functions dynamically. Using it together with the current skeleton of nop'ed mcount calls, we'll be able to have really fast and efficient tracing.

dentry_wait_for_pending deadlock

There's a deadlock related to dentry_wait_for_pending. Couldn't get much more info than that.

Shower thoughts:

Let's say I'm going to create a file (dir inode write-locked), do a lookup, and find a pending dentry.
The pending dentry's resolver tries to take the dir inode lock shared.
This sounds like a problem?

Replace unsafe, manual thread_change_addr_limit with auto_addr_limit

Or maybe do away with it altogether? Like Linux did with set_fs.
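A sketch of what auto_addr_limit could look like as an RAII guard, assuming thread_change_addr_limit() sets a new limit and returns the old one (the real signature may differ; current_addr_limit and the limit values are x86-64 stand-ins). The alternative mentioned above, Linux's removal of set_fs(), instead gives kernel-pointer copies their own helpers so no address limit needs switching at all.

```cpp
#include <cstdint>

// Hypothetical stand-ins for the per-thread address limit.
constexpr std::uintptr_t user_addr_limit = 0x00007fffffffffffULL;
constexpr std::uintptr_t kernel_addr_limit = UINTPTR_MAX;
thread_local std::uintptr_t current_addr_limit = user_addr_limit;

// Assumed behaviour of the existing manual setter: install new, return old.
std::uintptr_t thread_change_addr_limit(std::uintptr_t new_limit)
{
    std::uintptr_t old = current_addr_limit;
    current_addr_limit = new_limit;
    return old;
}

// RAII guard: the old limit is restored on every exit path (early returns,
// error paths), which is what manual calls tend to get wrong.
class auto_addr_limit
{
public:
    explicit auto_addr_limit(std::uintptr_t new_limit)
        : old_{thread_change_addr_limit(new_limit)}
    {
    }
    ~auto_addr_limit()
    {
        thread_change_addr_limit(old_);
    }
    auto_addr_limit(const auto_addr_limit &) = delete;
    auto_addr_limit &operator=(const auto_addr_limit &) = delete;

private:
    std::uintptr_t old_;
};
```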

Serial input causes panic

Using the CI build of disk-image.img from https://github.com/heatd/Onyx/actions/runs/2907275605 (produced from 2cec04e) and the following QEMU incantation:

qemu-system-x86_64 -drive file=disk-image.img,format=raw,media=disk -boot d -enable-kvm -m 1G -cpu host,migratable=on,+invtsc -smp 4 -vga qxl -device usb-ehci -device usb-mouse -machine q35 -bios /usr/share/qemu/OVMF.fd  -netdev user,id=u1 -device virtio-net,netdev=u1 -serial stdio

Attempting to use the resulting serial console results in a panic on the very first keystroke.

No output beyond /bin/login: username: appears on the serial console.

For additional reference, two builds of QEMU were used with the same results:

QEMU emulator version 4.2.1 (Debian 1:4.2-3ubuntu6.23)
QEMU emulator version 6.2.92 (v6.0.0-8451-gb992cef642)

And since I am using -cpu host as requested, the host CPU is an Intel(R) Core(TM) i5-6600K CPU @ 3.50GHz (Q3'15 Skylake, poor old thing is in dire need of replacement).

Symlinks, O_NOFOLLOW don't work properly

Unify IDE and AHCI ata code

Instead of having everything duplicated, the ATA code should just probe published drives on the ATA bus.

Also, SCSI on top of ATA? Like Linux?

Add unit tests for tty job control

Currently, there are no tests for job control, which is not good.

Hunt for sleep under no-preemption space

Add an is_preemption_enabled() check in mutex_lock(), the wait_queue code, etc. This should find some bugs across the kernel; I usually trigger some of them when stressing it.
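A sketch of such a check, similar in spirit to Linux's might_sleep(). preempt_count, sleep_bugs, may_sleep and mutex_lock_sketch are hypothetical names; the real kernel would consult its own preemption counter and would probably dump a stack trace rather than just count violations.

```cpp
#include <cstdio>

// Hypothetical per-CPU preemption counter: > 0 means preemption is disabled
// (spinlock held, IRQ context, ...).
thread_local unsigned preempt_count = 0;
unsigned sleep_bugs = 0; // number of violations reported so far

bool is_preemption_enabled()
{
    return preempt_count == 0;
}

// Debug check for the top of anything that may block.
void may_sleep(const char *where)
{
    if (!is_preemption_enabled())
    {
        sleep_bugs++;
        std::fprintf(stderr, "BUG: %s called with preemption disabled (count=%u)\n", where,
                     preempt_count);
    }
}

// Hypothetical mutex_lock() showing where the check goes.
void mutex_lock_sketch()
{
    may_sleep(__func__);
    /* ... fast path, then possibly sleep on the wait queue ... */
}
```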

GCC + LTO breaks the kernel

Just doesn't properly boot, every CPU hangs in sched_idle().

Fix GitHub Actions CI testing flakiness

Commit pages when doing DMA with memory

Switch neighbours to queue pending packets

Currently we're (naively) waiting for ARP/NDP replies, which crashes the kernel if we do it on the bottom half.
Linux queues pending packets on arp_queue and dispatches them when switching a neighbour to NUD_REACHABLE.
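A simplified sketch of the queue-then-flush approach, with locking, timers and solicitation retransmits omitted. All names are hypothetical; pending plays the role of Linux's arp_queue, and the cap of 3 packets is arbitrary. The key property is that neigh_output() never waits, so it is safe to call from the bottom half.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical, heavily simplified neighbour entry.
enum class nud_state { incomplete, reachable };

struct packet
{
    std::vector<std::uint8_t> data;
};

using xmit_fn = std::function<void(const packet &, const std::uint8_t *hwaddr)>;

struct neighbour
{
    nud_state state = nud_state::incomplete;
    std::uint8_t hwaddr[6] = {};
    std::deque<packet> pending;                   // analogous to Linux's arp_queue
    static constexpr std::size_t max_pending = 3; // arbitrary cap for the sketch
};

// Output path: never blocks. Sends now if resolved, otherwise queues.
bool neigh_output(neighbour &n, packet p, const xmit_fn &xmit)
{
    if (n.state == nud_state::reachable)
    {
        xmit(p, n.hwaddr);
        return true;
    }
    if (n.pending.size() >= neighbour::max_pending)
        n.pending.pop_front(); // drop the oldest rather than grow without bound
    n.pending.push_back(std::move(p));
    return false;
}

// Called when the ARP/NDP reply arrives: switch to reachable, flush the queue.
void neigh_resolved(neighbour &n, const std::uint8_t (&hw)[6], const xmit_fn &xmit)
{
    std::copy(hw, hw + 6, n.hwaddr);
    n.state = nud_state::reachable;
    while (!n.pending.empty())
    {
        xmit(n.pending.front(), n.hwaddr);
        n.pending.pop_front();
    }
}
```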

POSIX requires pthread_create to inherit the fenv

See exec(3) in the POSIX spec for more details.
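A sketch of the fix in libc terms, assuming the start routine is wrapped by a trampoline: snapshot the creator's environment with fegetenv() before creating the thread and install it with fesetenv() in the new thread before user code runs. create_thread_inheriting_fenv is a hypothetical wrapper; inside a libc the same steps would go into pthread_create itself.

```cpp
#include <cfenv>
#include <pthread.h>

namespace
{
struct start_args
{
    void *(*fn)(void *);
    void *arg;
    std::fenv_t env; // creator's floating-point environment
};

void *start_trampoline(void *p)
{
    auto *a = static_cast<start_args *>(p);
    std::fesetenv(&a->env); // install the inherited fenv before user code runs
    void *(*fn)(void *) = a->fn;
    void *arg = a->arg;
    delete a;
    return fn(arg);
}
} // namespace

// Hypothetical wrapper illustrating the fix.
int create_thread_inheriting_fenv(pthread_t *t, void *(*fn)(void *), void *arg)
{
    auto *a = new start_args{fn, arg, {}};
    std::fegetenv(&a->env); // snapshot rounding mode, exception masks, etc.
    int err = pthread_create(t, nullptr, start_trampoline, a);
    if (err != 0)
        delete a;
    return err;
}
```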

Finish AHCI support

Default minimal build (without CONFIG_ZSTD) cannot boot

The default builds create a zstd initrd, which fails to decompress at boot time since we don't build the zstd code.

The build scripts should attempt to detect this and fall back to no compression?

Or possibly build zstd by default. It's not too large.

SLAB crashes on network intensive workloads

Using ping -f 127.0.0.1 > /dev/null we can find various allocator-related crashes. Note that these were all found with UBSAN and KASAN enabled.

For instance:

#0  halt () at arch/x86_64/debug.cpp:15
#1  0xffffffff8102dc3e in panic (msg=0xffffffff812efa3d "Assertion %s failed in %s:%u, in function %s\n") at kernel/panic.cpp:129
#2  0xffffffff8102dcab in __assert_fail (assertion=0xffffffff812f3ec7 "!list_is_empty(&cache->partial_slabs)", file=0xffffffff8130473c "kernel/mm/slab.cpp", line=596, 
    function=0xffffffff812fc7e2 "kmem_cache_alloc_refill_mag") at kernel/panic.cpp:135
#3  0xffffffff812df505 in kmem_cache_alloc_refill_mag (cache=cache@entry=0xffff80131cbdd380, pcpu=pcpu@entry=0xffff80131cbdd880, flags=flags@entry=0) at kernel/mm/slab.cpp:596
#4  0xffffffff812de7c2 in kmem_cache_alloc (cache=0xffff80131cbdd380, flags=0) at kernel/mm/slab.cpp:797
#5  0xffffffff812e0379 in kmalloc (size=176, flags=0) at kernel/mm/slab.cpp:1237
#6  0xffffffff8101b22b in operator new (size=18446744071582674336) at kernel/cppnew.cpp:17
#7  0xffffffff810a0918 in vmo_create (size=size@entry=240, priv=priv@entry=0x0 <abi::abi_data>) at kernel/mm/vm_object.cpp:39
#8  0xffffffff810e44eb in packetbuf::allocate_space (this=this@entry=0xffffd000367f9a80, length=length@entry=240) at kernel/net/packetbuf.cpp:54
#9  0xffffffff810e5449 in packetbuf_clone (original=0xffffd000367f9ee0) at kernel/net/packetbuf.cpp:195
#10 0xffffffff811025fa in loopback_send_packet (buf=0xffffffff813a6da0 <buffer_lock>, nif=0x0 <abi::abi_data>) at kernel/net/loopback.cpp:36
#11 0xffffffff810ce114 in ip::v4::send_packet (flow=..., buf=buf@entry=0xffffd000367f9ee0, options=...) at kernel/net/ipv4/ipv4.cpp:347
#12 0xffffffff810c6a1c in icmp::icmp_socket::sendmsg (this=<optimized out>, msg=<optimized out>, flags=<optimized out>) at kernel/net/ipv4/icmp.cpp:310
#13 0xffffffff81128f41 in socket_sendmsg (sock=0xffff80131e783900, umsg=0x556f5b691c20, flags=0) at kernel/net/socket.cpp:1322
#14 sys_sendmsg (sockfd=<optimized out>, msg=0x556f5b691c20, flags=0) at kernel/net/socket.cpp:1374
#15 0xffffffff811cde12 in do_syscall64 (frame=<optimized out>) at arch/x86_64/syscall.cpp:44
#16 0xffffffff811b9b0f in syscall_ENTRY64 () at arch/x86_64/entry.S:130
#17 0x0000000000000033 in abi::abi_data ()

and

Page fault inside list_remove (with 0xDEB5 aka LIST_REMOVE_POISON)

0xffff804d72890ca0:     0xffff804d72890cd0      0xffffffff812df9a6 <_ZL17kmem_free_to_slabP10slab_cacheP4slabPv+166>
0xffff804d72890cd0:     0xffff804d72890cf0      0xffffffff812dd898 <_ZN10quarantine3popEv+40>
0xffff804d72890cf0:     0xffff804d72890d30      0xffffffff812ddb98 <_ZN10quarantine5flushEv+120>
0xffff804d72890d30:     0xffff804d72890e50      0xffffffff810b2bb1 <_Z15page_do_reclaimP12reclaim_data+257>
0xffff804d72890e50:     0xffff804d72890fd8      0xffffffff8109fd60 <_ZL10pagedaemonPv+1168>
0xffff804d72890fd8:     0xffff804d72890ff8      0xffffffff811cec26 <_ZN3x868internal19kernel_thread_startEPv+70>

Are these triggered because of memory pressure + the KASAN quarantine? I can't tell.

/proc support?

Some kernel support for /proc and pseudo filesystems in general is in order, maybe?

At least, dentries need to be invalidate-able.

Enforce PATH_MAX

Not enforcing PATH_MAX is an easy DoS vector: userspace can hand the kernel arbitrarily long paths to copy and walk.
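A sketch of the bounded copy, assuming Linux's PATH_MAX of 4096 bytes including the terminating NUL. copy_path_bounded is a hypothetical helper; in the kernel each source byte would be fetched with a user-copy primitive rather than a plain load.

```cpp
#include <cerrno>
#include <cstddef>

// Linux's PATH_MAX, which counts the terminating NUL; assumed here.
constexpr std::size_t path_max = 4096;

// Copy a NUL-terminated path into a fixed buffer, failing with -ENAMETOOLONG
// instead of copying or walking without bound.
long copy_path_bounded(const char *src, char (&dst)[path_max])
{
    for (std::size_t i = 0; i < path_max; i++)
    {
        dst[i] = src[i];
        if (src[i] == '\0')
            return static_cast<long>(i); // length, excluding the NUL
    }
    return -ENAMETOOLONG;
}
```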

Add arm64 support

Raspberry pi time!

Port an IRC client and ircd

Sortix is beating us. Don't let that be the case.

virtio-net crashes the kernel in certain situations

After I invoke 'net_tests', there's likely memory corruption that crashes another part of the kernel. KASAN isn't catching it. I am confused.

Finish TCP support

A good start would be finishing the UDP corking and bringing it to TCP.

Unsafe memcpy of possibly non-trivial class in kernel/fs/pipe.cpp

Onyx/kernel/kernel/fs/pipe.cpp

Line 306 in d0e1831

/* TODO: This memcpy seems unsafe, at least... */
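The issue doesn't say which type pipe.cpp copies, so this is a generic sketch (hypothetical helper names, C++17): use memcpy only when std::is_trivially_copyable_v holds and fall back to copy assignment otherwise, or keep the memcpy behind a static_assert so the build breaks if the type ever stops being trivially copyable.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Copy n already-constructed elements. memcpy is only valid for trivially
// copyable types; anything else must go through its copy assignment.
template <typename T>
void copy_elems(T *dst, const T *src, std::size_t n)
{
    if constexpr (std::is_trivially_copyable_v<T>)
        std::memcpy(dst, src, n * sizeof(T));
    else
        std::copy_n(src, n, dst);
}

// Alternatively, keep the memcpy but refuse to compile for unsuitable types.
template <typename T>
void memcpy_checked(T *dst, const T *src, std::size_t n)
{
    static_assert(std::is_trivially_copyable_v<T>, "memcpy of a non-trivially-copyable type");
    std::memcpy(dst, src, n * sizeof(T));
}
```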

error: 'ACPI_STATUS' does not name a type; did you mean 'ACPI_STATE_S0'?

This issue arose when running make iso with the latest gcc toolchain and the latest sysroot.

Add virtio-scsi support

Integrate a basic tracing infrastructure

Add a basic tracing infrastructure to the kernel, with an efficient ring buffer and trace events, etc.
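One possible building block is a per-CPU single-producer/single-consumer ring of fixed-size records. The record layout and names are hypothetical, and dropping new events when full is a policy choice, not a requirement. It assumes exactly one producer per ring, so in the kernel the producer side would need IRQs disabled (or a separate ring per context), since an interrupt tracing on the same CPU is a second producer.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <optional>

struct trace_event // hypothetical fixed-size record
{
    std::uint64_t timestamp;
    std::uint32_t id;  // which trace event fired
    std::uint64_t arg; // event-specific payload
};

template <std::size_t N>
class trace_ring
{
    static_assert(N != 0 && (N & (N - 1)) == 0, "N must be a power of two");

    std::array<trace_event, N> buf_{};
    std::atomic<std::size_t> head_{0}; // next slot to write (producer only)
    std::atomic<std::size_t> tail_{0}; // next slot to read (consumer only)

public:
    // Producer side (the tracing CPU). Returns false and drops the event when full.
    bool push(const trace_event &e)
    {
        std::size_t h = head_.load(std::memory_order_relaxed);
        if (h - tail_.load(std::memory_order_acquire) == N)
            return false;
        buf_[h & (N - 1)] = e;
        head_.store(h + 1, std::memory_order_release); // publish the record
        return true;
    }

    // Consumer side (e.g. a reader draining the buffer to userspace).
    std::optional<trace_event> pop()
    {
        std::size_t t = tail_.load(std::memory_order_relaxed);
        if (t == head_.load(std::memory_order_acquire))
            return std::nullopt; // empty
        trace_event e = buf_[t & (N - 1)];
        tail_.store(t + 1, std::memory_order_release); // free the slot
        return e;
    }
};
```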

Convert various fds to implement read_iter and write_iter

Since e7871d3 we have a way to properly implement readv/writev for devices (the legacy ->read and ->write hooks are deprecated). With read_iter and write_iter, readv and writev work just like their regular counterparts, instead of issuing one internal read per iov (think of the tty readv doing an internal read for one iov, getting data, then hanging on the next... yuck).

Better kernel crash call traces?

Right now, only addresses are printed in the event of a crash, so one needs to run llvm-symbolizer on them to get the names of the functions involved. It would be good if this were printed automatically; Linux's call trace is a good example.

C++ mangled names may complicate things, though.
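A sketch of the lookup side, assuming the kernel is linked with an address-sorted symbol table (hypothetical here; Linux's kallsyms is the classic example). Names are stored as-is, so C++ symbols come out mangled; the output could still be piped through c++filt or llvm-cxxfilt afterwards, or names could be demangled when the table is generated. Without symbol sizes, an address past the end of the last function is attributed to it.

```cpp
#include <algorithm>
#include <cinttypes>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <iterator>
#include <vector>

struct ksym
{
    std::uintptr_t addr; // symbol start address
    const char *name;    // possibly mangled C++ name
};

// Find the symbol containing pc in a table sorted by address.
const ksym *ksym_lookup(const std::vector<ksym> &syms, std::uintptr_t pc)
{
    auto it = std::upper_bound(syms.begin(), syms.end(), pc,
                               [](std::uintptr_t a, const ksym &s) { return a < s.addr; });
    if (it == syms.begin())
        return nullptr; // below the first symbol
    return &*std::prev(it);
}

// Format one call-trace line as "name+0xoff", or the raw address if unknown.
void format_frame(char *buf, std::size_t len, const std::vector<ksym> &syms, std::uintptr_t pc)
{
    if (const ksym *s = ksym_lookup(syms, pc))
        std::snprintf(buf, len, "%s+0x%" PRIxPTR, s->name, pc - s->addr);
    else
        std::snprintf(buf, len, "0x%" PRIxPTR, pc);
}
```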

Creds aren't put back in this line

Onyx/kernel/kernel/net/ipv4.cpp

Line 608 in d417f5b

return true;
