Comments (21)

tongchen126 commented on July 3, 2024

Hello, I have tested Debian on Rocket using a Nexys Video board, and a rootfs that uses sysvinit instead of systemd is uploaded to the release page.
The link is here.
I'm also interested in your Fedora rootfs and will give it a try in the future.

from linux-on-litex-rocket.

gsomlo commented on July 3, 2024

The rocket core is a few months old, but I don't think that would have anything to do with ability to support Fedora or Debian. Its freshness can be assessed by looking at the commit log of the https://github.com/litex-hub/pythondata-cpu-rocket repo (specifically, commits mentioning "update to chipsalliance/rocket-chip commit #xxx").

So far, the way I've been going about trying to boot Fedora is as follows (EDIT: added more detailed instructions and downloadable links):

  1. This tarball is a fully functional rv64gc Fedora Rawhide VM image from the F31 time frame (i.e., the last one still using BBL, before they switched to OpenSBI). IOW, you can (and are encouraged to) set it up in QEMU, and boot it there (there's a root and user account, both using password riscv, as shown on the console right before the login prompt):
    • unpack the tarball under /var/lib/libvirt/images/
    • import /var/lib/libvirt/images/fv64gc/fv64gc.xml using virsh and/or virt-manager
    • /var/lib/libvirt/images/fv64gc/bbl-5.1.0-0.rc1.git0.1.1.riscv64.fc31.riscv64 is the provided BBL-wrapped kernel image
    • /var/lib/libvirt/images/fv64gc/initramfs-5.2.0-0.rc7.git0.1.0.riscv64.fc31.riscv64.img is the initrd filesystem (also available on the "hard drive" image)
    • /var/lib/libvirt/images/fv64gc/Fedora-Developer-Rawhide-20190703.n.0-sda.raw is the HDD image, containing / and /boot partitions.
  2. Build the provided initrd image into our LiteX-specific linux kernel (litex-rebase branch):
        cp /var/lib/libvirt/images/fv64gc/initramfs*.img linux/initramfs.cpio.gz
    
        # build the kernel:
        pushd linux
        gunzip initramfs.cpio.gz
        make ARCH=riscv CROSS_COMPILE=riscv64-unknown-linux-gnu- \
                litex_rocket_defconfig litex_rocket_initramfs.config
        make ARCH=riscv CROSS_COMPILE=riscv64-unknown-linux-gnu- -j3
        popd
    
        # modify DTS bootargs line and set `bootargs` to the string shown below:
        vi linux-on-litex-rocket/conf/nexys4ddr.dts
    
        bootargs = "earlycon=sbi console=hvc0 swiotlb=noforce ro root=/dev/mmcblk0p2 rootfstype=ext2 fsck.mode=skip ignore_loglevel plymouth.enable=no systemd.log_level=debug";
    
        # build BBL:
        pushd riscv-pk/build
        ../configure --host=riscv64-unknown-linux-gnu \
                --with-arch=rv64imac \
                --with-payload=../../linux/vmlinux \
                --with-dts=../../conf/nexys4ddr.dts \
                --enable-logo
        make bbl
        riscv64-unknown-linux-gnu-objcopy -O binary bbl ~/boot.bin
        popd
    
  3. "Rip" the rv64gc-Fedora root filesystem:
    
        # make a copy of the raw image (it will be modified in the process):
        cp /var/lib/libvirt/images/fv64gc/*.raw .
    
        # mount the second partition from the image via loopback:
        losetup -f -P Fedora-Developer-Rawhide-20190703.n.0-sda.raw
        mount /dev/loop0p2 /mnt
    
        # modify existing fstab entries' device, FS type, and timeout values as shown below:
        vi /mnt/etc/fstab
    
        /dev/mmcblk0p2 / ext2 defaults,noatime,x-systemd.device-timeout=0,x-systemd.mount-timeout=0 0 0
        /dev/mmcblk0p1 /boot msdos defaults,noatime,x-systemd.device-timeout=0,x-systemd.mount-timeout=0 0 0
    
        # archive all files:
        (cd /mnt; find . | cpio -H newc -o | xz > ~/mmcblk0p2.ext2.cpio.xz)
    
        # unmount and disconnect the image from loopback (use the same loop device as above):
        umount /mnt
        losetup -d /dev/loop0
    
  4. Prepare the microSD card (e.g., /dev/sdd):
        # fdisk /dev/sdd
    
                Welcome to fdisk (util-linux 2.35.2).
                Changes will remain in memory, until you decide to write them.
                Be careful before using the write command.
    
                Command (m for help): p
                Disk /dev/sdd: 29.74 GiB, 31914983424 bytes, 62333952 sectors
                Disk model: SD/MMC
                Units: sectors of 1 * 512 = 512 bytes
                Sector size (logical/physical): 512 bytes / 512 bytes
                I/O size (minimum/optimal): 512 bytes / 512 bytes
                Disklabel type: dos
                Disk identifier: 0x67f480f9
    
                Device     Boot   Start      End  Sectors  Size Id Type
                /dev/sdd1          2048  2099199  2097152    1G  6 FAT16
                /dev/sdd2       2099200 62333951 60234752 28.7G 83 Linux
    
        # mkdosfs /dev/sdd1
        # mount /dev/sdd1 /mnt
        # cp boot.bin /mnt/
        # umount /mnt
    
        # mkfs.ext2 /dev/sdd2
        # mount /dev/sdd2 /mnt
        # (cd /mnt; xzcat ~/mmcblk0p2.ext2.cpio.xz | cpio -id)
        # umount /mnt
    
  5. Program the LiteX bitstream into the board and, with the microSD card plugged in, boot using sdcardboot.
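
The manual fstab edit in step 3 could also be scripted. The following is only a minimal sketch: `write_fstab` is a hypothetical helper (not part of any project mentioned here), and it assumes that `/` and `/boot` are the only entries wanted, so anything else in the target file is discarded.

```shell
#!/bin/sh
# Hypothetical helper: write the two fstab entries from step 3 to a file.
# Assumption: / and /boot are the only entries wanted; any other existing
# entries in the target file are overwritten.
write_fstab() {
    target=$1
    opts='defaults,noatime,x-systemd.device-timeout=0,x-systemd.mount-timeout=0'
    {
        printf '%s / ext2 %s 0 0\n' /dev/mmcblk0p2 "$opts"
        printf '%s /boot msdos %s 0 0\n' /dev/mmcblk0p1 "$opts"
    } > "$target"
}

# Example run against a scratch file rather than the real /mnt/etc/fstab.
fstab_demo=$(mktemp)
write_fstab "$fstab_demo"
cat "$fstab_demo"
```

In real use the call would be `write_fstab /mnt/etc/fstab` while the image partition is still mounted on /mnt.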

Right now, it gets stuck here for me:
[Screenshot: LitexRocketFedoraStuck]

So, any ideas on how to get past "a start job is waiting for /dev/mmcblk0p2" would be super helpful.

LMK what you think.

cjearls commented on July 3, 2024

I don't know what task it could be waiting for, but I have a few ideas for debugging or fixing the problem in a few different directions, so here they are in no particular order:

  1. Rocket appears to support debugging over JTAG, so we might be able to get a better idea of what's going on when it hangs if we enable debugging, resynthesize Rocket, and figure out where execution is when the boot hangs.

  2. It seems like newer versions of Fedora use OpenSBI, which from this webpage about requirements (https://review.coreboot.org/plugins/gitiles/opensbi/+/HEAD/docs/platform_requirements.md) appears to support trapping and handling floating point. It may be that whatever issue is happening was solved in a newer release, so the answer might be as simple as getting set up to use OpenSBI and adding support for newer versions of Fedora. I'm not sure exactly all that would go into switching from BBL, though, so maybe this isn't feasible.

  3. We could try simulating the LiteX to see if the bug still occurs. It might take a few days to get to the same point in the boot process, but if the bug doesn't occur when simulating, it would point to this being a hardware problem, which would narrow down the places we have to search for bugs significantly

What do you think? I'm still learning a lot about many of these projects, so I'm sure if any of these were as simple as "just do X" and it's fixed, it would be working by now.

gsomlo commented on July 3, 2024

  1. Rocket appears to support debugging over JTAG, so we might be able to get a better idea of what's going on when it hangs if we enable debugging, resynthesize Rocket, and figure out where execution is when the boot hangs.

Rigging up a LiteX+Rocket system to a debugger and stepping through CPU instructions would be interesting and useful in general, though not sure it's the right level of detail for this particular problem. Somehow bumping the systemd debug output level to where it would tell us what it's really waiting for might be faster in this particular case.

  2. It seems like newer versions of Fedora use OpenSBI, which from this webpage about requirements (https://review.coreboot.org/plugins/gitiles/opensbi/+/HEAD/docs/platform_requirements.md) appears to support trapping and handling floating point.

Actually, I thought the opposite was true: riscv-software-src/opensbi#148 (comment)
BBL does handle FPU emulation, which is one reason why I was trying to stick with it (for now). Adopting opensbi would either rule out the "smaller" FPGAs (e.g., ecp5) or would require someone to get its maintainers to revisit their decision w.r.t. FPU emulation.

  3. We could try simulating the LiteX to see if the bug still occurs. It might take a few days to get to the same point in the boot process, but if the bug doesn't occur when simulating, it would point to this being a hardware problem, which would narrow down the places we have to search for bugs significantly

I don't know the status of LiteSDCard simulation -- could one simulate a SDCard populated with 16GB worth of Fedora filesystems?
I do remember that last time I tried (a few years ago) it took about 9 hours to boot a 64-bit kernel on rocket in Verilator :)

I'm a bit short on spare time ATM, but I'll try to sanitize and publish my rv64gc Fedora VM disk image, so you should be able to at least catch up to where it gets stuck for me. Then, maybe one of us can dive into the systemd voodoo and figure out what's really getting stuck...

Adding opensbi support for LiteX (in general, and LiteX+Rocket in particular) is another worthy long term project, although not on my personal todo list. But it is indeed the way Fedora is headed, so it would have to get sorted out eventually in any case...

cjearls commented on July 3, 2024

  1. Rocket appears to support debugging over JTAG, so we might be able to get a better idea of what's going on when it hangs if we enable debugging, resynthesize Rocket, and figure out where execution is when the boot hangs.

Rigging up a LiteX+Rocket system to a debugger and stepping through CPU instructions would be interesting and useful in general, though not sure it's the right level of detail for this particular problem. Somehow bumping the systemd debug output level to where it would tell us what it's really waiting for might be faster in this particular case.

I see. I agree, it could be helpful in general, but it makes sense that it might be a little too low-level of a solution.

Actually, I thought the opposite was true: riscv/opensbi#148 (comment)
BBL does handle FPU emulation, which is one reason why I was trying to stick with it (for now). Adopting opensbi would either rule out the "smaller" FPGAs (e.g., ecp5) or would require someone to get its maintainers to revisit their decision w.r.t. FPU emulation.

The comment here is specifically referring to emulating the atomics, which OpenSBI needs to operate correctly, so it can't emulate that extension. I got this link directly from the OpenSBI Github https://review.coreboot.org/plugins/gitiles/opensbi/+/HEAD/docs/platform_requirements.md, and it says "The base RISC-V platform requirements for OpenSBI are [...] At least rv32ima or rv64ima required on all HARTs [...] The RISC-V extensions not covered by rv32ima or rv64ima are optional for OpenSBI. Although, OpenSBI will detect and handle some of these optional RISC-V extensions at runtime.

The optional RISC-V extensions handled by OpenSBI at runtime are:

D-extension: Double precision floating point
F-extension: Single precision floating point
H-extension: Hypervisor"
This seems to suggest that the F and D extensions would be automatically handled by OpenSBI.
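
To see at a glance which ISA strings in this thread list F and D at all, a rough sketch follows. `isa_has_ext` is a hypothetical helper with deliberate simplifications: it looks only at the single-letter extensions before the first underscore and expands `g` to `imafd`. It says nothing about whether firmware actually emulates a missing extension.

```shell
#!/bin/sh
# isa_has_ext ISA EXT: succeed if the single-letter extension EXT is listed
# in ISA (e.g. rv64imac). Simplifications (assumptions of this sketch): only
# the part before the first underscore is examined, and "g" is expanded to
# "imafd".
isa_has_ext() {
    base=${1%%_*}          # drop multi-letter extensions such as _zicsr
    letters=${base#rv32}   # strip the register-width prefix
    letters=${letters#rv64}
    case $letters in
        *g*) letters="imafd$letters" ;;
    esac
    case $letters in
        *"$2"*) return 0 ;;
        *) return 1 ;;
    esac
}

# ISA strings taken from this thread: the BBL build target, the Fedora
# userland, and the boot HART ISA printed by OpenSBI on VexRiscv.
for isa in rv64imac rv64gc rv32imas; do
    for ext in f d; do
        if isa_has_ext "$isa" "$ext"; then
            echo "$isa: $ext listed"
        else
            echo "$isa: $ext not listed"
        fi
    done
done
```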

  3. We could try simulating the LiteX to see if the bug still occurs. It might take a few days to get to the same point in the boot process, but if the bug doesn't occur when simulating, it would point to this being a hardware problem, which would narrow down the places we have to search for bugs significantly

I don't know the status of LiteSDCard simulation -- could one simulate a SDCard populated with 16GB worth of Fedora filesystems?
I do remember that last time I tried (a few years ago) it took about 9 hours to boot a 64-bit kernel on rocket in Verilator :)

I'm a bit short on spare time ATM, but I'll try to sanitize and publish my rv64gc Fedora VM disk image, so you should be able to at least catch up to where it gets stuck for me. Then, maybe one of us can dive into the systemd voodoo and figure out what's really getting stuck...

Sounds good. There's no rush, my semester finishes up over the next 2-3 weeks, so my spare time is partially constrained until then as well.

Adding opensbi support for LiteX (in general, and LiteX+Rocket in particular) is another worthy long term project, although not on my personal todo list. But it is indeed the way Fedora is headed, so it would have to get sorted out eventually in any case...

I think OpenSBI is already supported for Linux-on-Litex-VexRiscv, they have prebuilt bitstreams of it, and when I run Linux-on-LiteX-VexRiscv on my OrangeCrab board, it displays this before booting:
"--============= Liftoff! ===============--

OpenSBI v0.8-1-gecf7701


   ____                    _____ ____ _____
  / __ \                  / ____|  _ \_   _|
 | |  | |_ __   ___ _ __ | (___ | |_) || |
 | |  | | '_ \ / _ \ '_ \ \___ \|  _ < | |
 | |__| | |_) |  __/ | | |____) | |_) || |_
  \____/| .__/ \___|_| |_|_____/|____/_____|
        | |
        |_|

Platform Name : LiteX / VexRiscv-SMP
Platform Features : timer,mfdeleg
Platform HART Count : 8
Boot HART ID : 0
Boot HART ISA : rv32imas
BOOT HART Features : time
BOOT HART PMP Count : 0
Firmware Base : 0x40f00000
Firmware Size : 124 KB
Runtime SBI Version : 0.2
"

jluebbe commented on July 3, 2024

So, any ideas on how to get past "a start job is waiting for /dev/mmcblk0p2" would be super helpful.

I've seen issues similar to this caused by kernel configuration incompatible with udev/systemd. Comparing https://github.com/litex-hub/linux/blob/litex-rebase/arch/riscv/configs/litex_rocket_defconfig with https://github.com/systemd/systemd/blob/main/README I see:
CONFIG_SYSFS_DEPRECATED is y but should be n

The rest looks fine at first glance.
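
This comparison could be automated along the following lines. It is a sketch only: `check_kconfig` is a hypothetical helper, and the option list is a partial selection from the kernel requirements in the systemd README, not the full set.

```shell
#!/bin/sh
# check_kconfig FILE: report kernel options relevant to udev/systemd.
# The option names come from the systemd README (partial list); the
# function itself is a hypothetical helper written for this example.
check_kconfig() {
    cfg=$1
    status=0
    # Options that should be built in (=y).
    for opt in CONFIG_DEVTMPFS CONFIG_CGROUPS CONFIG_INOTIFY_USER \
               CONFIG_SIGNALFD CONFIG_TIMERFD CONFIG_EPOLL \
               CONFIG_SYSFS CONFIG_PROC_FS CONFIG_FHANDLE; do
        if ! grep -q "^${opt}=y\$" "$cfg"; then
            echo "missing: ${opt}=y"
            status=1
        fi
    done
    # Options that should be disabled.
    if grep -q '^CONFIG_SYSFS_DEPRECATED=y$' "$cfg"; then
        echo "should be off: CONFIG_SYSFS_DEPRECATED"
        status=1
    fi
    return "$status"
}

# Example against a deliberately incomplete scratch config.
demo_cfg=$(mktemp)
printf '%s\n' CONFIG_DEVTMPFS=y CONFIG_FHANDLE=y CONFIG_SYSFS_DEPRECATED=y > "$demo_cfg"
check_kconfig "$demo_cfg" || echo "config needs attention"
```

Against a real tree the argument would be the kernel build's `.config` file.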

gsomlo commented on July 3, 2024

CONFIG_SYSFS_DEPRECATED is y but should be n

Thanks @jluebbe -- I tried turning that off and it's still mostly the same behavior. Now, there's a time limit on waiting for /dev/mmcblk0p2 (there sometimes was one before, not sure how it stochastically picks between no limit and 1min 30s) and the boot process ends up like this:

...
[  OK  ] Started Journal Service.
         Starting Create Volatile Files and Directories...
[  OK  ] Started udev Kernel Device Manager.
         Starting udev Coldplug all Devices...
[  133.493104] systemd-journald[40]: Successfully sent stream file descriptor to service manager.
[  *** ] (2 of 3) A start job is running for /dev/mmcblk0p2 (21s / 1min 30s)
[ TIME ] Timed out waiting for device /dev/mmcblk0p2.
[DEPEND] Dependency failed for Initrd Root Device.
[DEPEND] Dependency failed for /sysroot.
[DEPEND] Dependency failed for Initrd Root File System.
[DEPEND] Dependency failed for Relo…figuration from the Real Root.
[  OK  ] Reached target Initrd File Systems.
[  213.090818] systemd-journald[40]: Sent WATCHDOG=1 notification.
         Starting Setup Virtual Console...
[  OK  ] Reached target Paths.
[  OK  ] Reached target Remote File Systems (Pre).
[  OK  ] Reached target Remote File Systems.
[  214.653112] systemd-journald[40]: Successfully sent stream file descriptor to service manager.
[  OK  ] Started Setup Virtual Console.
[  OK  ] Started Emergency Shell.
[  OK  ] Reached target Emergency Mode.
[  311.781778] systemd-journald[40]: Sent WATCHDOG=1 notification.
[  315.693123] systemd-journald[40]: Successfully sent stream file descriptor to service manager.
[  321.113122] systemd-journald[40]: Successfully sent stream file descriptor to service manager.
[  322.911197] systemd-journald[40]: Data hash table of /run/log/journal/3800c492b882414397814a5b17d8b631/system.journal has a fill level at 75.1 (1537 of 2047 items, 745472 file size, 485 bytes per hash table item), suggesting rotation.
[  323.033213] systemd-journald[40]: /run/log/journal/3800c492b882414397814a5b17d8b631/system.journal: Journal header limits reached or header out-of-date, rotating.
[  323.040566] systemd-journald[40]: Rotating...
[  323.106898] systemd-journald[40]: Journal effective settings seal=no compress=yes compress_threshold_bytes=512B
[  323.384536] systemd-journald[40]: Reserving 2047 entries in hash table.
[  323.392181] systemd-journald[40]: Vacuuming...
[  323.466821] systemd-journald[40]: Vacuuming done, freed 0B of archived journals from /run/log/journal/3800c492b882414397814a5b17d8b631.
[  393.123104] systemd-journald[40]: Sent WATCHDOG=1 notification.
[  491.781609] systemd-journald[40]: Sent WATCHDOG=1 notification.
[  578.973101] systemd-journald[40]: Sent WATCHDOG=1 notification.
[  671.781650] systemd-journald[40]: Sent WATCHDOG=1 notification.
...

with endless Sent WATCHDOG output, and no emergency shell.

I tried manually creating /dev/mmcblk0[p1|p2] nodes in the initrd cpio before embedding it into the kernel and BBL, but that didn't seem to help either. Not sure what it's "waiting" for, really... Systemd is not the most intuitive thing to debug, once one leaves the "beaten path" :)

michaelolbrich commented on July 3, 2024

Creating device nodes in the initrd will not help. Systemd will mount devtmpfs to /dev right at the beginning.
And waiting for devices does not mean waiting for the device node in /dev. It means waiting for the uevent. If the device is already there when udev starts then coldplug should trigger that uevent.

And it looks like the uevents never show up. In the past, this was often caused by a missing CONFIG_FHANDLE=y in the kernel config. But that's enabled by default in recent kernels.

The other odd thing is, that the Emergency Shell is starting, but apparently not in the tty that you're using. Are there multiple console=.. arguments in the kernel command-line? I think the shell will only start on one of them.
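
If a shell can be reached at all (for example from a different initramfs), one way to separate "the kernel never registered the partition" from "udev never processed the uevent" is to read sysfs directly. The sketch below assumes the usual `/sys/class/block/<dev>/uevent` layout; `list_block_uevents` is a hypothetical helper, and its optional root argument exists only so the example can run against a scratch directory.

```shell
#!/bin/sh
# list_block_uevents [SYSFS_ROOT]: print the DEVNAME the kernel reports for
# each block device, read straight from sysfs (no udev involved).
# SYSFS_ROOT defaults to /sys; the parameter is an assumption of this sketch
# so that it can be pointed at a fake tree.
list_block_uevents() {
    root=${1:-/sys}
    for f in "$root"/class/block/*/uevent; do
        [ -r "$f" ] || continue
        # Each uevent file holds KEY=value lines; pick out DEVNAME.
        sed -n 's/^DEVNAME=//p' "$f"
    done
}

# Example against a fake sysfs tree mimicking an SD card with two partitions.
fake_sys=$(mktemp -d)
for dev in mmcblk0 mmcblk0p1 mmcblk0p2; do
    mkdir -p "$fake_sys/class/block/$dev"
    printf 'MAJOR=179\nDEVNAME=%s\n' "$dev" > "$fake_sys/class/block/$dev/uevent"
done
list_block_uevents "$fake_sys"
```

If mmcblk0p2 shows up here on the real system but the start job still times out, the problem is more likely on the udev side than in the SD card driver.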

gsomlo commented on July 3, 2024

And it looks like the uevents never show up. In the past, this was often caused by a missing CONFIG_FHANDLE=y in the kernel config. But that's enabled by default in recent kernels.

Yeah, I can find CONFIG_FHANDLE=y in my kernel .config...

The other odd thing is, that the Emergency Shell is starting, but apparently not in the tty that you're using. Are there multiple console=.. arguments in the kernel command-line? I think the shell will only start on one of them.

My bootargs looks like this:

bootargs = "earlycon=sbi console=hvc0 swiotlb=noforce root=/dev/mmcblk0p2 rootfstype=ext2 fsck.mode=skip ignore_loglevel plymouth.enable=no systemd.log_level=debug";

Interaction with the uart currently occurs over ecalls into machine mode, where BBL takes care of it. I know there's a LiteX uart driver in linux proper, not sure if I actually need to figure out how to use it directly for this to work...
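
The question about multiple console= arguments can be checked mechanically. A minimal sketch, with `count_consoles` as a hypothetical helper; it treats `earlycon=` as a separate parameter and does not count it.

```shell
#!/bin/sh
# count_consoles "BOOTARGS": count the console= entries on a kernel command
# line. earlycon= is a different parameter and is deliberately not counted.
count_consoles() {
    n=0
    # Rely on word splitting: kernel arguments are space-separated.
    for arg in $1; do
        case $arg in
            console=*) n=$((n + 1)) ;;
        esac
    done
    echo "$n"
}

bootargs='earlycon=sbi console=hvc0 swiotlb=noforce root=/dev/mmcblk0p2 rootfstype=ext2 fsck.mode=skip ignore_loglevel plymouth.enable=no systemd.log_level=debug'
count_consoles "$bootargs"   # prints 1
```

On a running system the same function could be fed the contents of /proc/cmdline instead of the DTS string.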

gsomlo commented on July 3, 2024

For anyone interested, I edited #10 (comment) with a comprehensive list of steps going from a functional, standard-issue QEMU rv64gc VM to wherever I got stuck trying to get it booted on LiteX.
I think any further progress from this point is down to systemd voodoo... :)

rdolbeau commented on July 3, 2024

I've had no end of trouble running a basic Yocto on a LiteX/VexRiscv where buildroot is fine using the same (Yocto-compiled) kernel, including after adding a lot of locally compiled packages. Indeed, there could be some bad interactions between the devices in LiteX and systemd...

For opensbi, I had managed to compile a 64-bit variant of the LiteX/VexRiscv port to use with Rocket, but it didn't work. I'm attaching the patch (based on the LiteX opensbi) in case it could be useful as a starting point: opensbi_litexrocket.txt

cjearls commented on July 3, 2024

It might be worthwhile to check out https://github.com/firesim/FireMarshal; it appears to generate working RISC-V Linux distributions.

troibe commented on July 3, 2024

Thank you @tongchen126 this is really impressive!
How long did it take you to figure everything out?
Especially the part of using sysvinit instead of systemd...
@gsomlo Maybe we can integrate it in this repository if you can reproduce it.

tongchen126 commented on July 3, 2024

@developandplay
Yeah, it took me quite some time to make it work. Anyway, I hope the RCU stall and kernel oops that occur when using systemd can be fixed in the future.

roryt12 commented on July 3, 2024

I think I have made some progress on this. Please check my steps here:

https://github.com/roryt12/Riscv64-Debian-qmtech-wukong-FPGA

gsomlo commented on July 3, 2024

roryt12 commented on July 3, 2024

Gabriel, that was very helpful indeed: port.data_width gave me 128! Thank you. Maybe it would be helpful for others to add an INFO message there for the feature?

gsomlo commented on July 3, 2024

roryt12 commented on July 3, 2024

Bummer! After one day of flawless operation, I got RCU stalls again when I enabled sshd and logged in with 2 additional sessions. I used rcupdate.rcu_cpu_stall_suppress=1 in the boot args, and now it runs smoothly.

gsomlo commented on July 3, 2024

gsomlo commented on July 3, 2024

After some hacking on the Linux drivers/tty/serial/liteuart.c driver, I got it to go from crashing before/during login on Fedora to actually working:

At this point, I believe building LiteX + Rocket on any board that can accommodate a "full" variant (i.e., one which implements an FPU in gateware) should support running Fedora or any other rv64gc distro once the liteuart Linux driver is fixed upstream and the fix makes its way into a distro kernel. I'm currently working on that (slowly).
