Coder Social home page Coder Social logo

Comments (8)

palmer-dabbelt avatar palmer-dabbelt commented on June 27, 2024

I tried the Fedora Kconfig and it appears to use a dummy console. If I add CONFIG_NO_HZ=y to my regular config then I can boot, but if I select CONFIG_NO_HZ_IDLE=y then I get a hang. Do you have a Kconfig that manifests your error in QEMU simulation (ie, without a PCIe graphics card)? Walking all the way across the lab to check if the machine boots is a headache :).

from riscv-linux.

atishp04 avatar atishp04 commented on June 27, 2024

For me, it booted but hangs after
"Starting logging: OK
Starting mdev...
sort: /sys/devices/platform/Fixed: No such file or directory
modprobe: can't change directory to '/lib/modules': No such file or directory
Initializing random number generator... done.
Starting network...
udhcpc (v1.24.2) started
Sending discover...
Sending select for 10.0.2.15...
Lease of 10.0.2.15 obtained, lease time 86400
deleting routers
adding dns 10.0.2.3
Starting dropbear sshd: OK"

Are you seeing the same hang ? I am seeing same hang for both
CONFIG_NO_HZ=y
or
CONFIG_NO_HZ=y
CONFIG_NO_HZ_IDLE=y
CONFIG_NO_HZ_COMMON=y


However, it hangs during the boot for spike/smp combination (for both configuration).
[ 0.020000] clocksource: Switched to clocksource riscv_clocksource
[ 0.020000] NET: Registered protocol family 2
[ 0.020000] TCP established hash table entries: 16384 (order: 5, 131072 bytes)
[ 0.020000] TCP bind hash table entries: 16384 (order: 6, 262144 bytes)
[ 0.020000] TCP: Hash tables configured (established 16384 bind 16384)
[ 0.020000] UDP hash table entries: 1024 (order: 3, 32768 bytes)
[ 0.020000] UDP-Lite hash table entries: 1024 (order: 3, 32768 bytes)
[ 0.030000] NET: Registered protocol family 1
[ 0.050000] Unpacking initramfs...
[ 0.080000] Initialise system trusted keyrings
[ 0.080000] workingset: timestamp_bits=62 max_order=19 bucket_order=0
[ 21.020000] INFO: rcu_sched self-detected stall on CPU
[ 21.020000] INFO: rcu_sched self-detected stall on CPU
[ 21.020000] 3-...!: (698636 ticks this GP) idle=2b6/2/0 softirq=1/1 fqs=0
[ 21.020000] (t=2100 jiffies g=-299 c=-300 q=1376)
[ 21.020000] rcu_sched kthread starved for 2100 jiffies! g18446744073709551317 c18446744073709551316 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x402 ->cpu=1
[ 21.020000] rcu_sched I 0 8 2 0x00000000
[ 21.020000] Call Trace:
[ 21.020000] [< (ptrval)>] __schedule+0x1c6/0x4ea
[ 21.020000] [< (ptrval)>] schedule+0x16/0x58
[ 21.020000] [< (ptrval)>] schedule_timeout+0x12a/0x248
[ 21.020000] [< (ptrval)>] rcu_gp_kthread+0x31e/0x6cc
[ 21.020000] [< (ptrval)>] kthread+0xcc/0xde
[ 21.020000] [< (ptrval)>] ret_from_syscall+0xa/0xe
[ 21.020000] Task dump for CPU 2:
[ 21.020000] swapper/2 R running task 0 0 1 0x00000008
[ 21.020000] Call Trace:
[ 21.020000] [< (ptrval)>] __schedule+0x1c6/0x4ea
[ 21.020000] Task dump for CPU 3:
[ 21.020000] swapper/3 R running task 0 0 1 0x00000008
[ 21.020000] Call Trace:
[ 21.020000] [< (ptrval)>] walk_stackframe+0x0/0xa2
[ 21.020000] [< (ptrval)>] show_stack+0x26/0x34
[ 21.020000] [< (ptrval)>] sched_show_task+0xa6/0xfc
[ 21.020000] [< (ptrval)>] dump_cpu_task+0x3a/0x48
[ 21.020000] [< (ptrval)>] rcu_dump_cpu_stacks+0x7a/0xb8
[ 21.020000] [< (ptrval)>] rcu_check_callbacks+0x4a0/0x660
[ 21.020000] [< (ptrval)>] update_process_times+0x1e/0x48
[ 21.020000] [< (ptrval)>] tick_nohz_handler+0x7c/0xd2
[ 21.020000] [< (ptrval)>] riscv_timer_interrupt+0x26/0x32
[ 21.020000] [< (ptrval)>] riscv_intc_irq+0xaa/0xd2
[ 21.020000] [< (ptrval)>] ret_from_syscall+0xa/0xe
[ 21.030000] 2-...!: (698635 ticks this GP) idle=31a/2/0 softirq=1/1 fqs=0
[ 21.040000] (t=2102 jiffies g=-299 c=-300 q=1376)
[ 21.040000] rcu_sched kthread starved for 2102 jiffies! g18446744073709551317 c18446744073709551316 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x0 ->cpu=3
[ 21.040000] rcu_sched R running task 0 8 2 0x00000000

I am running with fairly latest linux
commit 8fa96b8
Author: Palmer Dabbelt [email protected]
Date: Wed Feb 7 18:23:09 2018 -0800

irq-riscv-intc post-merge cleanups

Should I update anything else to verify?

from riscv-linux.

palmer-dabbelt avatar palmer-dabbelt commented on June 27, 2024

The difference is probably just my working directory being a mess, but I can reproduce some failures with

$ git diff
diff --git a/Makefile b/Makefile
index 408a4247fe64..6375cab35974 100644
--- a/Makefile
+++ b/Makefile
@@ -130,6 +130,7 @@ endif
 
 $(vmlinux): $(linux_srcdir) $(linux_wrkdir)/.config $(buildroot_initramfs_sysroot_stamp)
        $(MAKE) -C $< O=$(linux_wrkdir) \
+               CROSS_COMPILE=riscv64-unknown-linux-gnu- \
                CONFIG_INITRAMFS_SOURCE="$(confdir)/initramfs.txt $(buildroot_initramfs_sysroot)" \
                CONFIG_INITRAMFS_ROOT_UID=$(shell id -u) \
                CONFIG_INITRAMFS_ROOT_GID=$(shell id -g) \
diff --git a/conf/linux_defconfig b/conf/linux_defconfig
index eff1c1731aa8..e0aac738869a 100644
--- a/conf/linux_defconfig
+++ b/conf/linux_defconfig
@@ -83,3 +83,6 @@ CONFIG_MTD=y
 CONFIG_MTD_SPI_NOR=y
 CONFIG_MTD_M25P80=y
 CONFIG_MTD_BLOCK_RO=y
+CONFIG_NO_HZ=y
+CONFIG_NO_HZ_IDLE=y
+CONFIG_NO_HZ_COMMON=y
diff --git a/linux b/linux
index 8fa96b887786..632fc702e0ef 160000
--- a/linux
+++ b/linux
@@ -1 +1 @@
-Subproject commit 8fa96b8877861fbcc7340f8f0a975569da254d93
+Subproject commit 632fc702e0efd519bf75f55c7f85ac7300e08dab
diff --git a/riscv-gnu-toolchain b/riscv-gnu-toolchain
--- a/riscv-gnu-toolchain
+++ b/riscv-gnu-toolchain
@@ -1 +1 @@
-Subproject commit b4dae89f85bf882852c6186b1284df11065bfcd9
+Subproject commit b4dae89f85bf882852c6186b1284df11065bfcd9-dirty
diff --git a/riscv-qemu b/riscv-qemu
index cd2c21d70361..6256f8ac6004 160000
--- a/riscv-qemu
+++ b/riscv-qemu
@@ -1 +1 @@
-Subproject commit cd2c21d70361f6887e265860c4f0dbb49ad323ca
+Subproject commit 6256f8ac6004677cc4652b1fcd77c839dfc05015-dirty

so I'm going to go figure out what's going on.

from riscv-linux.

palmer-dabbelt avatar palmer-dabbelt commented on June 27, 2024

@atishp04 is going to take a look, thanks!

from riscv-linux.

atishp04 avatar atishp04 commented on June 27, 2024

Just a quick update. I am able to get repeated stall warnings in qemu setup as well (with smp).

[ 21.020000] INFO: rcu_sched self-detected stall on CPU
[ 21.020000] INFO: rcu_sched self-detected stall on CPU
[ 21.020000] 3-...!: (698636 ticks this GP) idle=2b6/2/0 softirq=1/1 fqs=0
[ 21.020000] (t=2100 jiffies g=-299 c=-300 q=1376)
[ 21.020000] rcu_sched kthread starved for 2100 jiffies! g18446744073709551317 c18446744073709551316 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x402 ->cpu=1
[ 21.020000] rcu_sched I 0 8 2 0x00000000
[ 21.020000] Call Trace:
[ 21.020000] [< (ptrval)>] __schedule+0x1c6/0x4ea
[ 21.020000] [< (ptrval)>] schedule+0x16/0x58
[ 21.020000] [< (ptrval)>] schedule_timeout+0x12a/0x248
[ 21.020000] [< (ptrval)>] rcu_gp_kthread+0x31e/0x6cc
[ 21.020000] [< (ptrval)>] kthread+0xcc/0xde
[ 21.020000] [< (ptrval)>] ret_from_syscall+0xa/0xe
[ 21.020000] Task dump for CPU 2:
[ 21.020000] swapper/2 R running task 0 0 1 0x00000008
[ 21.020000] Call Trace:
[ 21.020000] [< (ptrval)>] __schedule+0x1c6/0x4ea
[ 21.020000] Task dump for CPU 3:
[ 21.020000] swapper/3 R running task 0 0 1 0x00000008
[ 21.020000] Call Trace:
[ 21.020000] [< (ptrval)>] walk_stackframe+0x0/0xa2
[ 21.020000] [< (ptrval)>] show_stack+0x26/0x34
[ 21.020000] [< (ptrval)>] sched_show_task+0xa6/0xfc
[ 21.020000] [< (ptrval)>] dump_cpu_task+0x3a/0x48
[ 21.020000] [< (ptrval)>] rcu_dump_cpu_stacks+0x7a/0xb8
[ 21.020000] [< (ptrval)>] rcu_check_callbacks+0x4a0/0x660
[ 21.020000] [< (ptrval)>] update_process_times+0x1e/0x48
[ 21.020000] [< (ptrval)>] tick_nohz_handler+0x7c/0xd2
[ 21.020000] [< (ptrval)>] riscv_timer_interrupt+0x26/0x32
[ 21.020000] [< (ptrval)>] riscv_intc_irq+0xaa/0xd2
[ 21.020000] [< (ptrval)>] ret_from_syscall+0xa/0xe
[ 21.030000] 2-...!: (698635 ticks this GP) idle=31a/2/0 softirq=1/1 fqs=0
[ 21.040000] (t=2102 jiffies g=-299 c=-300 q=1376)
[ 21.040000] rcu_sched kthread starved for 2102 jiffies! g18446744073709551317 c18446744073709551316 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x0 ->cpu=3
[ 21.040000] rcu_sched R running task 0 8 2 0x00000000
[ 21.040000] Call Trace:
[ 21.040000] [< (ptrval)>] __schedule+0x1c6/0x4ea
[ 21.040000] [< (ptrval)>] schedule+0x16/0x58
[ 21.040000] [< (ptrval)>] schedule_timeout+0x12a/0x248
[ 21.040000] [< (ptrval)>] rcu_gp_kthread+0x31e/0x6cc
[ 21.040000] [< (ptrval)>] kthread+0xcc/0xde
[ 21.040000] [< (ptrval)>] ret_from_syscall+0xa/0xe
[ 21.040000] Task dump for CPU 2:
[ 21.040000] swapper/2 R running task 0 0 1 0x00000008
[ 21.040000] Call Trace:
[ 21.040000] [< (ptrval)>] walk_stackframe+0x0/0xa2
[ 21.040000] [< (ptrval)>] show_stack+0x26/0x34
[ 21.040000] [< (ptrval)>] sched_show_task+0xa6/0xfc
[ 21.050000] [< (ptrval)>] dump_cpu_task+0x3a/0x48
[ 21.050000] [< (ptrval)>] rcu_dump_cpu_stacks+0x7a/0xb8
[ 21.050000] [< (ptrval)>] rcu_check_callbacks+0x4a0/0x660
[ 21.050000] [< (ptrval)>] update_process_times+0x1e/0x48
[ 21.050000] [< (ptrval)>] tick_nohz_handler+0x7c/0xd2
[ 21.050000] [< (ptrval)>] riscv_timer_interrupt+0x26/0x32
[ 21.050000] [< (ptrval)>] riscv_intc_irq+0xaa/0xd2
[ 21.050000] [< (ptrval)>] ret_from_syscall+0xa/0xe
[ 21.050000] Task dump for CPU 3:
[ 21.050000] swapper/3 R running task 0 0 1 0x00000008
[ 21.050000] Call Trace:
[ 21.050000] [< (ptrval)>] __schedule+0x1c6/0x4ea

The first one is after 21sec and rest of them keeps appearing after each ~63 secs. Still investigating.

from riscv-linux.

atishp04 avatar atishp04 commented on June 27, 2024

Got some time to look at the issue.
It looks like it's a self detected stall on one or more cpus where rcu_sched kthread couldn't run for more than 21 seconds. All other cpus seems to run the idle thread (swapper task).

arch_idle() function invokes wfi which puts the cpu to until "idle" mode until the next interrupt happens.
In periodic tick, the cpus are woken up by periodic timer tick. However, the no_hz idle case it should be only woken up when there is some task queued i.e. scheduler IPIs.

Here are possible causes:
1.All of the cpus are in idle. Thus nobody is forwarding the jiffies - This is not the case. We can see repeated stalls have correct jiffies.
2. The concerned cpu never returned from WFI after IPI. I am still investigating this cause. Doesn't seem likely because I have qemu logs which suggest the cpus do return from WFI after IPI.

  1. The concerned cpu immediately returns from WFI thus looping continuously without giving any other thread chance to run.

  2. My initial assumption was something wrong with timer_probe and clock source being initialized for each cpu. But that has been fixed with following patches.

http://lists.infradead.org/pipermail/linux-riscv/2018-April/000510.html

It is difficult to debug as I can't put printk in the arch_idle function as it is called so many times that serial console will hang in the first place. trace_printk is of no help as it is stuck during the boot itself. I am not aware of any method to access the trace buffer unless it's booted completely.

Looking for any ideas or suggestions to debug the problem further.

from riscv-linux.

sorear avatar sorear commented on June 27, 2024

Add an instruction or CSR to qemu which does printf on a register, then debug by doing csr writes in arch_idle? If you're feeling ambitious implement Liviu's semihosting proposal (already in openocd and should be findable on sw-dev, it's a small change to EBREAK translation)

from riscv-linux.

atishp04 avatar atishp04 commented on June 27, 2024

Fixed in
atishp04@fd1e893

We can close this issue now.

from riscv-linux.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.