heatd / onyx
UNIX-like operating system written in C and C++
License: Other
Our current spinlocks are essentially:
struct spinlock { unsigned int word; };
void lock(struct spinlock *s)
{
    /* Spin until we atomically flip word from 0 (free) to 1 (held). */
    while (!cmpxchg(&s->word, 0, 1))
        cpu_pause();
}
This is fine for raw throughput, but it is palpably unfair to less fortunate CPUs.
Two options (and we'll need both) are available:
Ticket locks
These are pretty much just a struct spinlock { u16 next_ticket; u16 curr_ticket; }. Waiters increment next_ticket (through an atomic fetch_add), then wait for curr_ticket to equal their ticket. This is still cache-unfriendly, but at least it's much fairer. Throughput should take a hit, but that's okay. (Note: while writing this up, I noticed that a spin_try_lock would be tricky with this layout, but it's doable with a single 32-bit cmpxchg.)
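A minimal sketch of the ticket lock described above, using C11 atomics in place of the kernel's own cmpxchg/cpu_pause helpers (an assumption; ticket_spinlock, ticket_lock, ticket_unlock and ticket_trylock are hypothetical names). Packing next_ticket into the high 16 bits and curr_ticket into the low 16 bits of one 32-bit word keeps the current struct spinlock size and turns spin_try_lock into a single 32-bit cmpxchg:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* High 16 bits: next_ticket. Low 16 bits: curr_ticket. */
struct ticket_spinlock {
    _Atomic uint32_t word;
};

_Static_assert(sizeof(struct ticket_spinlock) == sizeof(uint32_t),
               "ticket lock must not grow struct spinlock");

#define TICKET_SHIFT 16
#define TICKET_MASK 0xffffu

static inline void cpu_relax(void)
{
    /* Placeholder for cpu_pause() (e.g. the x86 "pause" instruction). */
}

void ticket_lock(struct ticket_spinlock *s)
{
    /* Take a ticket: atomically bump next_ticket, keeping its old value. */
    uint32_t old = atomic_fetch_add_explicit(&s->word, 1u << TICKET_SHIFT,
                                             memory_order_relaxed);
    uint16_t ticket = (uint16_t)(old >> TICKET_SHIFT);

    /* Spin until curr_ticket reaches our ticket. */
    while ((uint16_t)(atomic_load_explicit(&s->word, memory_order_acquire) &
                      TICKET_MASK) != ticket)
        cpu_relax();
}

void ticket_unlock(struct ticket_spinlock *s)
{
    /* Only the owner changes curr_ticket, but waiters may bump next_ticket
     * concurrently, so use a CAS loop that increments the low half without
     * carrying into next_ticket when curr_ticket wraps 0xffff -> 0. */
    uint32_t old = atomic_load_explicit(&s->word, memory_order_relaxed);
    uint32_t new_word;
    do {
        new_word = (old & ~TICKET_MASK) | ((old + 1) & TICKET_MASK);
    } while (!atomic_compare_exchange_weak_explicit(&s->word, &old, new_word,
                                                    memory_order_release,
                                                    memory_order_relaxed));
}

bool ticket_trylock(struct ticket_spinlock *s)
{
    uint32_t old = atomic_load_explicit(&s->word, memory_order_relaxed);

    /* Free only if nobody holds or waits: next_ticket == curr_ticket. */
    if ((old >> TICKET_SHIFT) != (old & TICKET_MASK))
        return false;

    /* One 32-bit cmpxchg claims the next ticket iff the word is unchanged. */
    return atomic_compare_exchange_strong_explicit(&s->word, &old,
                                                   old + (1u << TICKET_SHIFT),
                                                   memory_order_acquire,
                                                   memory_order_relaxed);
}
```

Letting next_ticket overflow off the top of the word is harmless because only the low 16 bits of each half are ever compared.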
MCS locks
MCS locks make each waiter spin on its own cacheline, so they scale well on systems with lots of threads and CPUs. The idea is to build a linked list of waiting CPUs, where each CPU's queue node is kept percpu and reached through percpu accessors. We should keep one node per level (irqsave, and normal with preemption disabled).
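A minimal MCS sketch in C11 atomics, assuming the caller supplies the queue node (in the kernel it would come from a percpu accessor, one node per level as described above; mcs_node, mcs_lock and mcs_unlock are hypothetical names). Each waiter spins only on its own node's locked flag, so handing the lock over touches just one other CPU's cacheline:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* One queue node per waiting CPU (percpu in the kernel; passed in here). */
struct mcs_node {
    _Atomic(struct mcs_node *) next;
    atomic_bool locked;
};

struct mcs_lock {
    _Atomic(struct mcs_node *) tail; /* NULL when the lock is free */
};

void mcs_lock(struct mcs_lock *l, struct mcs_node *me)
{
    atomic_store_explicit(&me->next, NULL, memory_order_relaxed);
    atomic_store_explicit(&me->locked, true, memory_order_relaxed);

    /* Append ourselves to the tail of the queue. */
    struct mcs_node *prev =
        atomic_exchange_explicit(&l->tail, me, memory_order_acq_rel);
    if (prev == NULL)
        return; /* Queue was empty: we own the lock. */

    /* Link behind our predecessor, then spin on our own node only. */
    atomic_store_explicit(&prev->next, me, memory_order_release);
    while (atomic_load_explicit(&me->locked, memory_order_acquire))
        ; /* cpu_pause() would go here */
}

void mcs_unlock(struct mcs_lock *l, struct mcs_node *me)
{
    struct mcs_node *next =
        atomic_load_explicit(&me->next, memory_order_acquire);

    if (next == NULL) {
        /* No known successor: try to mark the lock free. */
        struct mcs_node *expected = me;
        if (atomic_compare_exchange_strong_explicit(&l->tail, &expected, NULL,
                                                    memory_order_release,
                                                    memory_order_relaxed))
            return;
        /* Someone swapped into tail but hasn't linked to us yet; wait. */
        while ((next = atomic_load_explicit(&me->next,
                                            memory_order_acquire)) == NULL)
            ;
    }

    /* Hand the lock directly to the next waiter. */
    atomic_store_explicit(&next->locked, false, memory_order_release);
}
```

Note that this naive version stores a full pointer as the tail, which would grow struct spinlock on 64-bit machines. Keeping the current size would require encoding the tail as a CPU number plus node index in the 32-bit word instead, which is the approach Linux's qspinlock takes.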
We likely want both; MCS locks aren't needed in many cases, e.g. RISC-V machines with a small number of CPUs.
NOTE: it's super important to keep the current struct spinlock size. We do not want to bloat all the various users of spinlocks in the kernel.
VirtualBox exposes some bugs in the synchronization primitives that I hadn't seen before: things hang on a spinlock. This needs fixing ASAP.
Testcase: while true; do gcc -v >/dev/null 2>&1; done
Running it in two TTYs at once makes things start to break.
I couldn't see more than that, since /bin/gcc wasn't built with symbols.
I also managed to crash bash in those loops, and dash too (though that was very rare).
The network stack was written before iovec_iter existed, and as such it hand-rolls a lot of the iovec_iter logic. For instance:
This would probably mean passing around a structure other than msghdr, with an iovec_iter instead of a struct iovec.
#93 is related to this. Until the conversion happens, sendmsg can't be used with !IOVEC_USER.
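A minimal sketch of the kind of iterator the network stack could pass around instead of a raw struct iovec array. The iovec_iter_sketch type, iter_init and iter_copy_to_buf are hypothetical and are not Onyx's actual iovec_iter API; the point is that the "which iov am I in, and at what offset" bookkeeping lives in one place instead of being hand-rolled per caller:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/* Hypothetical iterator over an iovec array. */
struct iovec_iter_sketch {
    const struct iovec *vec; /* current iovec */
    size_t nr_vecs;          /* iovecs left, including the current one */
    size_t pos;              /* offset inside the current iovec */
    size_t bytes;            /* total bytes remaining */
};

void iter_init(struct iovec_iter_sketch *it, const struct iovec *vec, size_t nr)
{
    it->vec = vec;
    it->nr_vecs = nr;
    it->pos = 0;
    it->bytes = 0;
    for (size_t i = 0; i < nr; i++)
        it->bytes += vec[i].iov_len;
}

/* Copy up to len bytes out of the iterator into dst and advance it.
 * Returns the number of bytes actually copied. */
size_t iter_copy_to_buf(struct iovec_iter_sketch *it, void *dst, size_t len)
{
    size_t done = 0;

    while (done < len && it->nr_vecs > 0) {
        size_t avail = it->vec->iov_len - it->pos;
        size_t chunk = len - done < avail ? len - done : avail;

        if (chunk > 0)
            memcpy((char *)dst + done,
                   (const char *)it->vec->iov_base + it->pos, chunk);
        done += chunk;
        it->pos += chunk;
        it->bytes -= chunk;

        if (it->pos == it->vec->iov_len) {
            /* Current iovec exhausted (or empty): move to the next one. */
            it->vec++;
            it->nr_vecs--;
            it->pos = 0;
        }
    }
    return done;
}
```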
Line 1069 in d4827e0
Repro: Do "file /lib/libuuid.so"
panic: Assertion mm->shared_set_size == 0 failed in ...
Some systems have CPUs whose APIC IDs do not (or may not) include 0; AMD family 15h was a case in point.
Check the I/O APIC redirection code (surely broken) and the MSI code (probably broken) for this assumption.
The dcache is very complex and could use some in-kernel unit tests to exercise paths that are hard to reach from userspace (i.e. not related to namei).
Having lldb, gdb and a (good) strace would be great.
For instance, many sendmsg() paths do not hold locks...
eBPF (and classic BPF) is useful for dynamically tracing functions. Combined with the current skeleton of nop'ed mcount calls, it would give us really fast and efficient tracing.
There's a deadlock related to dentry_wait_for_pending. Couldn't get much more info than that.
Shower thoughts:
Or maybe do away with it altogether? Like Linux did with set_fs.
Triple faults
Using the CI build of disk-image.img from https://github.com/heatd/Onyx/actions/runs/2907275605 (produced from 2cec04e) and the following QEMU incantation:
qemu-system-x86_64 -drive file=disk-image.img,format=raw,media=disk -boot d -enable-kvm -m 1G -cpu host,migratable=on,+invtsc -smp 4 -vga qxl -device usb-ehci -device usb-mouse -machine q35 -bios /usr/share/qemu/OVMF.fd -netdev user,id=u1 -device virtio-net,netdev=u1 -serial stdio
Attempting to use the resulting serial console immediately results in a panic on the first keystroke:
No further output beyond /bin/login: username: is evident in the serial console.
For additional reference, two builds of QEMU were used with the same results:
QEMU emulator version 4.2.1 (Debian 1:4.2-3ubuntu6.23)
QEMU emulator version 6.2.92 (v6.0.0-8451-gb992cef642)
And since I am using -cpu host as requested, the host CPU is an Intel(R) Core(TM) i5-6600K CPU @ 3.50GHz (Q3'15 Skylake; the poor old thing is in dire need of replacement).
Instead of having everything duplicated, the ATA code should just probe published drives on the ATA bus.
Also, SCSI on top of ATA, like Linux?
Currently, there are no tests for job control, which is not good.
Add an is_preemption_enabled() check to mutex_lock(), the wait_queue code, etc. This should find some bugs across the kernel; I usually trigger some of them when stressing the kernel.
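A minimal sketch of such a check. The issue's is_preemption_enabled() is modelled here by a simple counter, and preempt_count, assert_may_sleep and mutex_sketch are hypothetical names. It reports and counts violations instead of panicking so the sketch can run in userspace; in the kernel this would be a panic or warning:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the per-CPU preemption counter:
 * zero means preemption is enabled. */
static unsigned int preempt_count;
static unsigned int sleep_bugs; /* number of violations reported */

static void preempt_disable(void) { preempt_count++; }

static void preempt_enable(void)
{
    assert(preempt_count > 0);
    preempt_count--;
}

static bool is_preemption_enabled(void) { return preempt_count == 0; }

/* Call at the top of anything that may sleep (mutex_lock, wait_queue
 * waits, ...): sleeping with preemption disabled is always a bug. */
static void assert_may_sleep(const char *func)
{
    if (!is_preemption_enabled()) {
        sleep_bugs++;
        fprintf(stderr, "BUG: %s() called with preemption disabled\n", func);
    }
}

struct mutex_sketch {
    bool held;
};

static void mutex_lock_sketch(struct mutex_sketch *m)
{
    assert_may_sleep(__func__);
    /* ...a real mutex would sleep on a wait queue while m is held... */
    m->held = true;
}

static void mutex_unlock_sketch(struct mutex_sketch *m) { m->held = false; }
```

The value of the check is that it fires on every call from a bad context, not only when the mutex happens to be contended and the caller actually sleeps.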
Just doesn't properly boot, every CPU hangs in sched_idle().
Currently we're (naively) waiting for ARP/NDP replies, which crashes the kernel if we do it in the bottom half.
Linux queues pending packets on arp_queue and dispatches them when switching a neighbour to NUD_REACHABLE.
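A minimal sketch of that approach: instead of waiting for the reply, the transmit path parks packets on the neighbour entry while resolution is incomplete, and the reply handler flushes them in order once the entry becomes reachable. struct pkt, struct neighbour, dev_xmit, neigh_output and neigh_set_reachable are hypothetical names loosely modelled on the Linux behaviour described above, and only two NUD states are modelled:

```c
#include <assert.h>
#include <stddef.h>

enum nud_state { NUD_INCOMPLETE, NUD_REACHABLE };

struct pkt { /* hypothetical packet buffer */
    struct pkt *next;
    int id;
};

struct neighbour {
    enum nud_state state;
    /* FIFO of packets awaiting resolution (a real one should be bounded). */
    struct pkt *pending_head, *pending_tail;
};

/* Hypothetical transmit hook; here it just records what was sent. */
static int sent[16];
static size_t nsent;
static void dev_xmit(struct pkt *p) { sent[nsent++] = p->id; }

/* Never blocks: either send now or park the packet on the neighbour. */
void neigh_output(struct neighbour *n, struct pkt *p)
{
    if (n->state == NUD_REACHABLE) {
        dev_xmit(p);
        return;
    }
    p->next = NULL;
    if (n->pending_tail)
        n->pending_tail->next = p;
    else
        n->pending_head = p;
    n->pending_tail = p;
    /* A real stack would (re)send the ARP/NDP solicitation here. */
}

/* Called when the ARP/NDP reply arrives, e.g. from the bottom half. */
void neigh_set_reachable(struct neighbour *n)
{
    struct pkt *p = n->pending_head;

    n->state = NUD_REACHABLE;
    n->pending_head = n->pending_tail = NULL;
    while (p) {
        struct pkt *next = p->next;
        dev_xmit(p);
        p = next;
    }
}
```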
See exec(3) in the POSIX spec for more details.
The default builds create a zstd initrd, which fails to decompress at boot time since we don't build the zstd code.
The build scripts should attempt to detect this and fall back to no compression?
Or possibly build zstd by default. It's not too large.
Using ping -f 127.0.0.1 > /dev/null, we can find various allocator-related crashes. Note that these were all found with UBSAN and KASAN enabled.
For instance:
#0 halt () at arch/x86_64/debug.cpp:15
#1 0xffffffff8102dc3e in panic (msg=0xffffffff812efa3d "Assertion %s failed in %s:%u, in function %s\n") at kernel/panic.cpp:129
#2 0xffffffff8102dcab in __assert_fail (assertion=0xffffffff812f3ec7 "!list_is_empty(&cache->partial_slabs)", file=0xffffffff8130473c "kernel/mm/slab.cpp", line=596,
function=0xffffffff812fc7e2 "kmem_cache_alloc_refill_mag") at kernel/panic.cpp:135
#3 0xffffffff812df505 in kmem_cache_alloc_refill_mag (cache=cache@entry=0xffff80131cbdd380, pcpu=pcpu@entry=0xffff80131cbdd880, flags=flags@entry=0) at kernel/mm/slab.cpp:596
#4 0xffffffff812de7c2 in kmem_cache_alloc (cache=0xffff80131cbdd380, flags=0) at kernel/mm/slab.cpp:797
#5 0xffffffff812e0379 in kmalloc (size=176, flags=0) at kernel/mm/slab.cpp:1237
#6 0xffffffff8101b22b in operator new (size=18446744071582674336) at kernel/cppnew.cpp:17
#7 0xffffffff810a0918 in vmo_create (size=size@entry=240, priv=priv@entry=0x0 <abi::abi_data>) at kernel/mm/vm_object.cpp:39
#8 0xffffffff810e44eb in packetbuf::allocate_space (this=this@entry=0xffffd000367f9a80, length=length@entry=240) at kernel/net/packetbuf.cpp:54
#9 0xffffffff810e5449 in packetbuf_clone (original=0xffffd000367f9ee0) at kernel/net/packetbuf.cpp:195
#10 0xffffffff811025fa in loopback_send_packet (buf=0xffffffff813a6da0 <buffer_lock>, nif=0x0 <abi::abi_data>) at kernel/net/loopback.cpp:36
#11 0xffffffff810ce114 in ip::v4::send_packet (flow=..., buf=buf@entry=0xffffd000367f9ee0, options=...) at kernel/net/ipv4/ipv4.cpp:347
#12 0xffffffff810c6a1c in icmp::icmp_socket::sendmsg (this=<optimized out>, msg=<optimized out>, flags=<optimized out>) at kernel/net/ipv4/icmp.cpp:310
#13 0xffffffff81128f41 in socket_sendmsg (sock=0xffff80131e783900, umsg=0x556f5b691c20, flags=0) at kernel/net/socket.cpp:1322
#14 sys_sendmsg (sockfd=<optimized out>, msg=0x556f5b691c20, flags=0) at kernel/net/socket.cpp:1374
#15 0xffffffff811cde12 in do_syscall64 (frame=<optimized out>) at arch/x86_64/syscall.cpp:44
#16 0xffffffff811b9b0f in syscall_ENTRY64 () at arch/x86_64/entry.S:130
#17 0x0000000000000033 in abi::abi_data ()
and
Page fault inside list_remove (with 0xDEB5 aka LIST_REMOVE_POISON)
0xffff804d72890ca0: 0xffff804d72890cd0 0xffffffff812df9a6 <_ZL17kmem_free_to_slabP10slab_cacheP4slabPv+166>
0xffff804d72890cd0: 0xffff804d72890cf0 0xffffffff812dd898 <_ZN10quarantine3popEv+40>
0xffff804d72890cf0: 0xffff804d72890d30 0xffffffff812ddb98 <_ZN10quarantine5flushEv+120>
0xffff804d72890d30: 0xffff804d72890e50 0xffffffff810b2bb1 <_Z15page_do_reclaimP12reclaim_data+257>
0xffff804d72890e50: 0xffff804d72890fd8 0xffffffff8109fd60 <_ZL10pagedaemonPv+1168>
0xffff804d72890fd8: 0xffff804d72890ff8 0xffffffff811cec26 <_ZN3x868internal19kernel_thread_startEPv+70>
Are these triggered because of memory pressure + the KASAN quarantine? I can't tell.
Some kernel support for /proc and pseudo filesystems in general is in order, maybe?
At the very least, dentries need to be invalidatable.
Not enforcing PATH_MAX is an easy DoS vector.
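A minimal sketch of enforcing the limit when a path enters the kernel: scan at most PATH_MAX bytes for the terminating NUL and fail with ENAMETOOLONG otherwise, so userspace can't make the kernel copy or walk arbitrarily long strings. copy_path_bounded is hypothetical, src stands in for a user pointer (a real version would go through the kernel's user-copy helpers), and the 4096 fallback is an assumption for platforms whose <limits.h> doesn't expose PATH_MAX:

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <string.h>

#ifndef PATH_MAX
#define PATH_MAX 4096 /* assumed fallback; PATH_MAX includes the NUL */
#endif

/* Copy a path into dst (which must hold PATH_MAX bytes).
 * Returns the path length, or -ENAMETOOLONG if no NUL is found
 * within the first PATH_MAX bytes. */
int copy_path_bounded(char *dst, const char *src)
{
    size_t len = 0;

    /* Never look past PATH_MAX bytes, however long src really is. */
    while (len < PATH_MAX && src[len] != '\0')
        len++;
    if (len == PATH_MAX)
        return -ENAMETOOLONG;

    memcpy(dst, src, len + 1); /* include the terminating NUL */
    return (int)len;
}
```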
Raspberry Pi time!
Sortix is beating us. Don't let that be the case.
After I invoke 'net_tests', there's likely memory corruption that crashes another part of the kernel. KASAN isn't catching it. I am confused.
A good start would be finishing up UDP corking and bringing it to TCP.
Onyx/kernel/kernel/fs/pipe.cpp
Line 306 in d0e1831
This issue arose when running make iso with the latest gcc toolchain and the latest sysroot.
Add a basic tracing infrastructure to the kernel, with an efficient ring buffer and trace events, etc.
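A minimal sketch of one such ring buffer, assuming one buffer per CPU so each has a single producer (the traced CPU) and a single consumer (the reader). trace_event, trace_ring and the push/pop functions are hypothetical, and the event layout is only an example. Indices run freely and are masked with a power-of-two size, so head - tail is the fill level even across wraparound:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define TRACE_RING_SIZE 8 /* must be a power of two */

struct trace_event { /* hypothetical fixed-size event record */
    uint64_t timestamp;
    uint32_t event_id;
    uint64_t arg;
};

struct trace_ring {
    _Atomic uint32_t head; /* next slot the producer writes */
    _Atomic uint32_t tail; /* next slot the consumer reads */
    struct trace_event slots[TRACE_RING_SIZE];
};

/* Producer side. Drops the event when full rather than blocking, since
 * tracing must never stall the code being traced. */
bool trace_ring_push(struct trace_ring *r, const struct trace_event *ev)
{
    uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);

    if (head - tail == TRACE_RING_SIZE)
        return false; /* full */
    r->slots[head & (TRACE_RING_SIZE - 1)] = *ev;
    /* Publish the slot contents before the new head. */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}

/* Consumer side. Returns false when the ring is empty. */
bool trace_ring_pop(struct trace_ring *r, struct trace_event *out)
{
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);

    if (head == tail)
        return false; /* empty */
    *out = r->slots[tail & (TRACE_RING_SIZE - 1)];
    /* Release the slot back to the producer. */
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return true;
}
```

This sketch drops new events when the buffer is full; overwriting the oldest entries is the other common choice, but it takes more care to keep a concurrent reader from seeing torn records.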
Since e7871d3 we have a way to properly implement readv/writev for devices (the legacy ->read and ->write are deprecated). With read_iter and write_iter, readv and writev work just like the regular read and write, rather than, say, the tty readv doing an internal read for one iov, getting data, then hanging on the next one... yuck.
Right now, only addresses are printed in the event of a crash, so one needs to run llvm-symbolizer on them to get the names of the functions involved. It would be good if this data were printed automatically, along the lines of Linux's call trace.
C++ mangled names may complicate things though.
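One way to do this is to embed a sorted address-to-name table in the kernel and binary-search it when printing each frame, similar in spirit to Linux's kallsyms. The table contents, ksym, ksym_lookup and format_frame below are hypothetical and the addresses are toy values; storing already-demangled names in the table would sidestep the C++ mangling issue at panic time:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct ksym { /* hypothetical embedded symbol table entry */
    uintptr_t addr;
    const char *name;
};

/* Must be sorted by address; a build step could generate it from the
 * kernel ELF's symbol table. Toy addresses for illustration only. */
static const struct ksym ksyms[] = {
    { 0x1000, "panic" },
    { 0x1100, "__assert_fail" },
    { 0x1400, "kmalloc" },
};

/* Find the last symbol whose address is <= addr, or NULL if none. */
static const struct ksym *ksym_lookup(uintptr_t addr)
{
    size_t lo = 0, hi = sizeof(ksyms) / sizeof(ksyms[0]);

    if (hi == 0 || addr < ksyms[0].addr)
        return NULL;
    /* Invariant: ksyms[lo].addr <= addr, and the answer is in [lo, hi). */
    while (hi - lo > 1) {
        size_t mid = lo + (hi - lo) / 2;
        if (ksyms[mid].addr <= addr)
            lo = mid;
        else
            hi = mid;
    }
    return &ksyms[lo];
}

/* Format one call-trace frame as "name+0xoff", or the raw address. */
int format_frame(char *buf, size_t len, uintptr_t addr)
{
    const struct ksym *s = ksym_lookup(addr);

    if (!s)
        return snprintf(buf, len, "0x%lx", (unsigned long)addr);
    return snprintf(buf, len, "%s+0x%lx", s->name,
                    (unsigned long)(addr - s->addr));
}
```

Because this table has no symbol sizes, an address past the end of the last function still resolves to it; recording sizes would let the lookup reject such addresses.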
Onyx/kernel/kernel/net/ipv4.cpp
Line 608 in d417f5b
POSIX requires us to fail with EBADF only when the dirfd is bad and we're actually using it (i.e. the path is relative, so we're not starting from the root directory).
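A minimal sketch of the ordering POSIX asks for: check whether the path is absolute before touching dirfd at all, so a bogus dirfd only yields EBADF for relative paths. The fd table, struct dir, SKETCH_AT_FDCWD and resolve_start_dir are hypothetical stand-ins for the kernel's own structures:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct dir { /* hypothetical directory handle */
    const char *name;
};

#define SKETCH_NR_FDS 8
#define SKETCH_AT_FDCWD (-100) /* hypothetical stand-in for AT_FDCWD */

static struct dir *fd_table[SKETCH_NR_FDS]; /* non-NULL: open directory */
static struct dir cwd = { "cwd" };

/* Pick the starting directory for an *at() call. *out == NULL means
 * "start at the root". dirfd is only validated for relative paths. */
int resolve_start_dir(int dirfd, const char *path, struct dir **out)
{
    if (path[0] == '/') {
        *out = NULL; /* absolute path: dirfd is ignored entirely */
        return 0;
    }
    if (dirfd == SKETCH_AT_FDCWD) {
        *out = &cwd;
        return 0;
    }
    if (dirfd < 0 || dirfd >= SKETCH_NR_FDS || fd_table[dirfd] == NULL)
        return -EBADF; /* bad dirfd, and we actually needed it */
    *out = fd_table[dirfd];
    return 0;
}
```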
I think it would be nice to have a copy of Onyx installed on a (virtual) hard drive. It would be useful for e.g. porting userspace stuff.
But IIRC there are some stability issues in such setups. What are those issues?
WORD_SIZE should be just sizeof(size_t), not sizeof(size_t)/CHAR_BIT.
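sizeof already yields a size in bytes, so dividing by CHAR_BIT (bits per byte) mixes units: with a 64-bit size_t it gives 8 / 8 = 1 instead of 8. A tiny sketch of the corrected definition; WORD_SIZE_BUGGY and is_word_aligned are hypothetical, the latter standing in for where word-at-a-time code would use the macro:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes per machine word, e.g. for word-at-a-time memcpy/memset loops. */
#define WORD_SIZE sizeof(size_t)

/* The buggy form, for comparison: bytes divided by bits-per-byte. */
#define WORD_SIZE_BUGGY (sizeof(size_t) / CHAR_BIT)

/* Hypothetical use: can p be accessed a whole word at a time? */
static int is_word_aligned(const void *p)
{
    return ((uintptr_t)p % WORD_SIZE) == 0;
}
```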
We have a strong need for efficient, percpu stat counters that do not require expensive atomics in fast paths. This may require a percpu memory allocator.
See Paul McKenney's perfbook for more details or ideas.
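A minimal sketch of a per-CPU stat counter, assuming a fixed CPU count, 64-byte cache lines, and statically allocated storage instead of a percpu allocator (pcpu_counter, NR_CPUS_SKETCH and the functions are hypothetical). Writers only touch their own CPU's slot with a relaxed load and store, avoiding a locked read-modify-write; readers sum every slot:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define NR_CPUS_SKETCH 4 /* hypothetical fixed CPU count */

/* One slot per CPU, each on its own (assumed 64-byte) cache line so
 * increments from different CPUs never bounce the same line. */
struct pcpu_slot {
    _Alignas(64) _Atomic uint64_t value;
};

struct pcpu_counter {
    struct pcpu_slot slots[NR_CPUS_SKETCH];
};

/* Fast path: only the owning CPU writes its slot, so a relaxed load plus
 * a relaxed store suffices. In the kernel this must run with preemption
 * disabled (and IRQs, if the counter is also bumped from interrupt
 * context) so the load/store pair isn't interleaved on the same CPU. */
static inline void pcpu_counter_add(struct pcpu_counter *c, unsigned int cpu,
                                    uint64_t n)
{
    _Atomic uint64_t *v = &c->slots[cpu].value;

    atomic_store_explicit(v, atomic_load_explicit(v, memory_order_relaxed) + n,
                          memory_order_relaxed);
}

/* Slow path: sum all slots. The result is approximate while writers are
 * active, which is usually acceptable for statistics. */
static uint64_t pcpu_counter_read(struct pcpu_counter *c)
{
    uint64_t sum = 0;

    for (unsigned int i = 0; i < NR_CPUS_SKETCH; i++)
        sum += atomic_load_explicit(&c->slots[i].value, memory_order_relaxed);
    return sum;
}
```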
Sending SIGHUPs to orphaned process groups, setting up the tty's concept of a foreground pgrp, controlling terminals, and signals all need to be implemented.
Go to every mmu.cpp and fix that. Also consider copying the new arch/x86_64/linker.ld to other architectures: it's better overall, does away with the vdso section and puts bss at the end (smaller executables on disk).