openxiangshan / gem5
License: BSD 3-Clause "New" or "Revised" License
The function clearPrefetched() doesn't clear the _ever_prefetched flag. It seems that we need a clearEverPrefetched()-like function here. When that is applied, there is a slight change in scores (<0.02).
GEM5/src/mem/cache/cache_blk.hh
Lines 205 to 209 in 6a85ffb
Building with the command scons ./build/RISCV/gem5.opt -j 64 fails with the following error:
In file included from build/RISCV/arch/generic/pcstate.hh:49,
from build/RISCV/arch/generic/isa.hh:45,
from build/RISCV/arch/riscv/isa.hh:39,
from build/RISCV/arch/riscv/tlb.hh:38,
from build/RISCV/arch/riscv/tlb.cc:32:
build/RISCV/arch/riscv/tlb.cc: In member function ‘gem5::Fault gem5::RiscvISA::TLB::doTranslate(const RequestPtr&, gem5::ThreadContext*, gem5::BaseMMU::Translation*, gem5::BaseMMU::Mode, bool&)’:
build/RISCV/base/trace.hh:188:54: error: ‘paddr’ may be used uninitialized [-Werror=maybe-uninitialized]
188 | ::gem5::Trace::getDebugLogger()->dprintf_flag( \
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~
189 | ::gem5::curTick(), name(), #x, __VA_ARGS__); \
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
build/RISCV/arch/riscv/tlb.cc:1436:9: note: in expansion of macro ‘DPRINTF’
1436 | DPRINTF(TLB, "translate(vpn=%#x, asid=%#x): %#x pc%#x\n", vaddr,
| ^~~~~~~
build/RISCV/base/trace.hh:75:10: note: by argument 8 of type ‘const long unsigned int&’ to ‘void gem5::Trace::Logger::dprintf_flag(gem5::Tick, const string&, const string&, const char*, const Args& ...) [with Args = {long unsigned int, gem5::BitfieldType<gem5::bitfield_backend::Unsigned<long unsigned int, 59, 44> >, long unsigned int, long unsigned int}]’ declared here
75 | void dprintf_flag(Tick when, const std::string &name,
| ^~~~~~~~~~~~
build/RISCV/arch/riscv/tlb.cc:1264:10: note: ‘paddr’ declared here
1264 | Addr paddr;
I am using the xs-dev branch.
Also, is your XiangShan core in gem5 simply DerivO3CPU?
Running a bare-metal binary with simple_gem5.sh reports this error:
Attach 1 decoders to thread with addr: <orphan System>.cpu.decoder
Create threads for test sys cpu (RiscvO3CPU)
Add dtb for L1D prefetcher
Add L2 prefetcher as downstream of L1D prefetcher
Add L3 prefetcher as downstream of L2 prefetcher
Add dtb for L2 prefetcher
Finish memory system configuration
No cpu_class provided
Registering probe listeners for Prefetcher system.cpu.dcache.prefetcher
Registering probe listeners for Prefetcher system.cpu.dcache.prefetcher.bop_large
Registering probe listeners for Prefetcher system.cpu.dcache.prefetcher.bop_small
Registering probe listeners for Prefetcher system.cpu.dcache.prefetcher.ipcp
Registering probe listeners for Prefetcher system.cpu.dcache.prefetcher.spp
Registering probe listeners for Prefetcher system.l2.prefetcher
Registering probe listeners for Prefetcher system.l3.prefetcher
**** REAL SIMULATION ****
build/RISCV/sim/simulate.cc:194: info: Entering event queue @ 0. Starting simulation...
gem5 has encountered a segmentation fault!
Steps to reproduce:
Modify the script, based on simple_gem5.sh, as follows:
Change the option to --generic-rv-cpt=./xiangshang/benchmark/image/coremark.bare.1timesriscv.
The complete script is as follows:
# DO NOT track your local updates in this script!
# set -x
export gem5_home=/xiangshang/gem5/GEM5-xs-dev # The root of GEM5 project
export gem5=$gem5_home/build/RISCV/gem5.fast # GEM5 executable
# Note 1: workload list contains the workload name, checkpoint path, and parameters, looks like:
# astar_biglakes_122060000000 astar_biglakes_122060000000_0.244818/0/ 0 0 20 20
# bwaves_1003220000000 bwaves_1003220000000_0.036592/0/ 0 0 20 20
# Note 2: The meaning of fields:
# workload_name, checkpoint_path, skip insts(usually 0), functional_warmup insts(usually 0), detailed_warmup insts (usually 20), sample insts
# Note 3: you can write a script to generate such a list accordingly
export desc_dir=/xiangshang/benchmark/coremark/fs
export workload_list=/xiangshang/benchmark/coremark/fs/int_list.lst
# The checkpoint directory. We will find checkpoint_path in workload_list
# under this directory to get the checkpoint path.
export cpt_dir='/xiangshang/benchmark/image'
# A tag to identify current batch run
export tag="an-example-to-run-gem5-with-composite-prefetcher"
export log_file='log.txt'
export ds=$(pwd) # data storage. It is specific for BOSC machines, you can ignore it
export top_work_dir=$tag
export full_work_dir=$ds/exec-storage/$top_work_dir # work dir where stats data is stored
mkdir -p $full_work_dir
ln -sf $full_work_dir . # optional, you can customize it yourself
check() {
if [ $1 -ne 0 ]; then
echo FAIL
touch abort
exit
fi
}
function run() {
set -x
cpt=$1
dw_len=${2:-20000000}
# dw_len=${2:-1525605}
total_detail_len=${3:-40000000}
if [[ -n "$4" ]]; then
work_dir=$4
else
work_dir=$PWD
fi
arch_db=${5:-0}
cd $work_dir
if test -f "completed"; then
echo "Already completed; skip $cpt"
return
fi
rm -f abort
rm -f completed
cpt_name=$(basename -- "$cpt")
extension="${cpt_name##*.}"
# replace the path of gcpt.bin with your gcpt restorer
# gcpt restorer can be found in https://github.com/OpenXiangShan/NEMU/tree/gem5-ref-main/resource/gcpt_restore
# Please use gem5-ref-main branch
cpt_option="--generic-rv-cpt=$cpt --gcpt-restorer=/xiangshang/NEMU-gem5-ref-main/resource/gcpt_restore/build/gcpt.bin"
# You can also pass a baremetal bin here
if [ $extension != "gz" ]; then
cpt_option="--generic-rv-cpt=./xiangshang/benchmark/image/coremark.bare.1timesriscv --raw-cpt"
fi
if [[ "$arch_db" -eq "0" ]]; then
arch_db_args=
else
arch_db_args="--enable-arch-db --arch-db-file=mem_trace.db --arch-db-fromstart=True"
fi
if [[ -z "$crash_tick" ]]; then
crash_tick=-1
fi
if [[ -z "$capture_cycles" ]]; then
capture_cycles=30000
fi
start=$(($crash_tick - 500*$capture_cycles))
# start=$crash_tick
start=$(($start>0 ? $start : 0))
end=$(($crash_tick + 500*$capture_cycles))
start_end=" --debug-start=$start --debug-end=$end "
if [[ -n "$debug_flags" ]]; then
debug_flag_args=" --debug-flag=$debug_flags "
else
echo "No debug flag set"
debug_flag_args=
start_end=
fi
# --debug-flags=CommitTrace \
if [[ $crash_tick = -1 ]]; then
start_end=
debug_flag_args=
fi
echo "total_detail_len: $total_detail_len"
# gdb -ex run --args \
# Note 1: Use DecoupledBPUWithFTB to enable nanhu's decoupled frontend
# Note 2: MUST use DRAMsim3, or performance is skewed
# To enable DRAMSim3, follow ext/dramsim3/README
# Note 3: By default use MultiPrefetcher (SMS + BOP) as L2 prefetcher
# Note 4: Recommend to enable Difftest
######## Some additional args:
$gem5 $debug_flag_args $start_end \
$gem5_home/configs/example/fs.py \
--xiangshan-system --cpu-type=DerivO3CPU \
--mem-size=8GB \
--caches --cacheline_size=64 \
--l1i_size=64kB --l1i_assoc=8 \
--l1d_size=64kB --l1d_assoc=8 \
--l1d-hwp-type=XSCompositePrefetcher \
--short-stride-thres=0 \
--l2cache --l2_size=1MB --l2_assoc=8 \
--l3cache --l3_size=16MB --l3_assoc=16 \
--l1-to-l2-pf-hint \
--l2-hwp-type=WorkerPrefetcher \
--l2-to-l3-pf-hint \
--l3-hwp-type=WorkerPrefetcher \
--mem-type=DRAMsim3 \
--dramsim3-ini=$gem5_home/ext/dramsim3/xiangshan_configs/xiangshan_DDR4_8Gb_x8_3200_2ch.ini \
--bp-type=DecoupledBPUWithFTB --enable-loop-predictor \
--enable-difftest \
$arch_db_args $cpt_option \
--maxinsts=3849417830
check $?
# Here is a scratchpad for frequently used options
# Enable complex stride component or SPP component in composite prefetcher
# --l1d-enable-cplx \
# --l1d-enable-spp \
# Record arch db traces only after warmup
# --arch-db-fromstart=False
# Enable loop predictor and loop buffer
# --enable-loop-predictor \
# --enable-loop-buffer \
# Employ an ideal L2 cache with nearly-perfect hit rate and low-access latency
# --mem-type=SimpleMemory \
# --ideal-cache \
# Debugging memory corruption or memory leak
# valgrind -s --track-origins=yes --leak-check=full --show-leak-kinds=all --log-file=valgrind-out-2.txt --error-limit=no -v \
touch completed
}
function prepare_env() {
set -x
echo "prepare_env $@"
all_args=("$@")
task=${all_args[0]}
task_path=${all_args[1]}
gz=$(find -L $cpt_dir -wholename "*${task_path}*gz" | head -n 1)
echo $gz
work_dir=$top_work_dir/$task
echo $work_dir
mkdir -p $work_dir
}
function arg_wrapper() {
prepare_env $@
all_args=("$@")
args=(${all_args[0]})
k=1000
M=$((1000 * $k))
skip=${args[2]}
fw=${args[3]}
dw=${args[4]}
sample=${args[5]}
total_M=$(( ($dw + $sample)*$M ))
dw_M=$(( $dw*$M ))
run $gz $dw_M $total_M $work_dir 0 >$work_dir/$log_file 2>&1
}
function single_run() {
# run /nfs-nvme/home/zhouyaoyang/projects/nexus-am/apps/cachetest_i/build/cachetest_i-riscv64-xs.bin
task=$tag
work_dir=$full_work_dir
mkdir -p $work_dir
# Note: If you are debugging with single run, following 3 variables are mandatory.
# - It prints debug info in tick range: (crash_tick - 500 * capture_cycles, crash_tick + 500 * capture_cycles)
# - If you want to print debug info from beginning, set crash_tick to 0, and set capture_cycles to a large number
# crash_tick=$(( 0 ))
# capture_cycles=$(( 250000 ))
# debug_flags=CommitTrace # If you unset debug_flags, no debug print will be there
# If you unset debug_flags or crash_tick, no debug print will be there
# Commonly used flags for debug/tuning
# debug_flags=CommitTrace,IEW,Fetch,LSQUnit,Cache,Commit,IQ,LSQ,PageTableWalker,TLB,MSHR
warmup_inst=$(( 20 * 10**6 ))
max_inst=$(( 40 * 10**6 ))
# debug_gz=/nfs-nvme/home/share/checkpoints_profiles/spec06_rv64gcb_o2_20m/take_cpt/mcf_191500000000_0.105600/0/_191500000000_.gz
debug_gz=/nfs-nvme/home/share/checkpoints_profiles/spec06_rv64gcb_o2_20m/take_cpt/libquantum_1006500000000_0.149838/0/_1006500000000_.gz
rm -f $work_dir/completed
rm -f $work_dir/abort
run $debug_gz $warmup_inst $max_inst $work_dir 1 > $work_dir/$log_file 2>&1
}
export -f check
export -f run
export -f single_run
export -f arg_wrapper
export -f prepare_env
function parallel_run() {
# We use gnu parallel to control the parallelism.
# If your server has 32 cores and 64 SMT threads, we suggest to run with no more than 32 threads.
export num_threads=1
cat $workload_list | parallel -a - -j $num_threads arg_wrapper {}
}
# Usually, I use parallel_run to benchmark, and use single_run to debug
parallel_run
# single_run
Then run: ./simple_gem5.sh
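One detail in the script above may be worth ruling out: cpt_option passes the bare-metal image as a relative path (./xiangshang/...), while run() calls cd $work_dir before launching gem5, so the path is resolved against the work directory. Whether this is related to the segmentation fault is not established here. The following is a minimal sketch of a pre-flight check, where resolve_image is a hypothetical helper name that is not part of simple_gem5.sh:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of simple_gem5.sh): print the absolute path
# of an image file, or fail with a message naming the current directory.
resolve_image() {
    local path=$1
    if [ ! -r "$path" ]; then
        echo "image not readable from $(pwd): $path" >&2
        return 1
    fi
    # Make the path absolute so that a later `cd` cannot invalidate it.
    echo "$(cd "$(dirname -- "$path")" && pwd)/$(basename -- "$path")"
}
```

Under these assumptions, the image could be resolved once before the cd in run(), for example image=$(resolve_image "$cpt") || exit 1, and the absolute result passed to --generic-rv-cpt.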
I compiled the code from the branch dbp-merge-xsdev-221010.
git switch dbp-merge-xsdev-221010
scons build/RISCV/gem5.debug -j16
And tried to run a hello world program.
./build/RISCV/gem5.debug configs/example/se.py --cpu-type=DerivO3CPU --caches -c tests/test-progs/hello/bin/riscv/linux/hello
It failed with
gem5 Simulator System. https://www.gem5.org
gem5 is copyrighted software; use the --copyright option for details.
gem5 version [DEVELOP-FOR-22.1]
gem5 compiled May 25 2023 12:01:17
gem5 started May 25 2023 12:24:53
gem5 executing on mprc2, pid 1587869
command line: ./build/RISCV/gem5.debug configs/example/se.py --cpu-type=DerivO3CPU --caches -c tests/test-progs/hello/bin/riscv/linux/hello
build/RISCV/base/loader/image_file_data.cc:107: info: Loading file tests/test-progs/hello/bin/riscv/linux/hello
build/RISCV/base/loader/image_file_data.cc:127: info: File size is 4814352 bytes
build/RISCV/base/loader/image_file_data.cc:133: info: First 4 bytes are 0x7f 0x45 0x4c 0x46
build/RISCV/base/loader/image_file_data.cc:135: info: Mapped start address is ELF, 0x7ffff4d3e000
Global frequency set at 1000000000000 ticks per second
warn: No dot file generated. Please install pydot to generate the dot file and pdf.
build/RISCV/mem/dram_interface.cc:690: warn: DRAM device capacity (8192 Mbytes) does not match the address range assigned (512 Mbytes)
build/RISCV/base/loader/image_file_data.cc:107: info: Loading file tests/test-progs/hello/bin/riscv/linux/hello
build/RISCV/base/loader/image_file_data.cc:127: info: File size is 4814352 bytes
build/RISCV/base/loader/image_file_data.cc:133: info: First 4 bytes are 0x7f 0x45 0x4c 0x46
build/RISCV/base/loader/image_file_data.cc:135: info: Mapped start address is ELF, 0x7fffd4cfe000
build/RISCV/cpu/o3/cpu.cc:332: warn: Difftest is disabled
0: system.remote_gdb: listening for remote gdb on port 7000
**** REAL SIMULATION ****
build/RISCV/sim/simulate.cc:194: info: Entering event queue @ 0. Starting simulation...
gem5.debug: build/RISCV/cpu/pred/decoupled_bpred.cc:155: std::pair<bool, bool> gem5::branch_prediction::DecoupledBPU::decoupledPredict(const StaticInstPtr&, const InstSeqNum&, gem5::PCStateBase&, gem5::ThreadID): Assertion `pc.instAddr() < end && pc.instAddr() >= start' failed.
I'm wondering if the way I run hello world is wrong or the branch I'm using has a bug?
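What changes were made to gem5 to support your custom checkpoint format? Are there corresponding commits?
Running with the configuration parameters you provide produces an error. The parameters are as follows: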
"$gem5_home/build/RISCV/gem5.fast \
$gem5_home/configs/example/fs.py\
--xiangshan-system \
--bare-metal \
--cpu-type=DerivO3CPU \
--mem-size=8GB \
--caches --cacheline_size=64 \
--l1i_size=64kB --l1i_assoc=8 \
--l1d_size=64kB --l1d_assoc=8 \
--l1d-hwp-type=XSCompositePrefetcher \
--short-stride-thres=0 \
--l2cache --l2_size=1MB --l2_assoc=8 \
--l3cache --l3_size=16MB --l3_assoc=16 \
--l1-to-l2-pf-hint \
--l2-hwp-type=WorkerPrefetcher \
--l2-to-l3-pf-hint \
--l3-hwp-type=WorkerPrefetcher \
--mem-type=DRAMsim3 \
--dramsim3-ini=$gem5_home/ext/dramsim3/xiangshan_configs/xiangshan_DDR4_8Gb_x8_3200_2ch.ini \
--bp-type=DecoupledBPUWithFTB --enable-loop-predictor \
--kernel=./benchmark/image/coremark.2timesriscv \
--command-line='0x0 0x0 0x66 0 7 1 2000' |& tee run.log" Enter
It reports the following error:
Global frequency set at 1000000000000 ticks per second
WARNING: Output directory ext/dramsim3/DRAMsim3/ not exists! Using current directory for output!
fatal: system.workload.bootloader without default or user set value
gem5 Simulator System. https://www.gem5.org
gem5 is copyrighted software; use the --copyright option for details.
gem5 version [DEVELOP-FOR-22.1]
gem5 compiled Jan 25 2024 10:35:49
gem5 started Jan 25 2024 10:36:29
gem5 executing on n168-020-004, pid 17185
command line: ./GEM5-xs-dev/build/RISCV/gem5.fast ./GEM5-xs-dev/configs/example/fs.py --xiangshan-system --cpu-type=DerivO3CPU --mem-size=8GB --caches --cacheline_size=64 --l1i_size=64kB --l1i_assoc=8 --l1d_size=64kB --l1d_assoc=8 --l1d-hwp-type=XSCompositePrefetcher --short-stride-thres=0 --l2cache --l2_size=1MB --l2_assoc=8 --l3cache --l3_size=16MB --l3_assoc=16 --l1-to-l2-pf-hint --l2-hwp-type=WorkerPrefetcher --l2-to-l3-pf-hint --l3-hwp-type=WorkerPrefetcher --mem-type=DRAMsim3 --dramsim3-ini=./GEM5-xs-dev/ext/dramsim3/xiangshan_configs/xiangshan_DDR4_8Gb_x8_3200_2ch.ini --bp-type=DecoupledBPUWithFTB --enable-loop-predictor --kernel=./benchmark/image/coremark.2timesriscv '--command-line=0x0 0x0 0x66 0 7 1 2000'
[<m5.params.AddrRange object at 0x7f42d71dfc10>]
['basic']
db_switches: []
Attach 1 decoders to thread with addr: <orphan System>.cpu.decoder
Create threads for test sys cpu (RiscvO3CPU)
Add dtb for L1D prefetcher
Add L2 prefetcher as downstream of L1D prefetcher
Add L3 prefetcher as downstream of L2 prefetcher
Add dtb for L2 prefetcher
Finish memory system configuration
No cpu_class provided
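I understand that there are several versions of nanhu. Which version is the simulator aligned against, and how far along is that alignment? nanhu-G? nanhu v3?
I am currently trying to run simulations with gem5, with the goal of running a 16-core CPU. During kernel boot, CPU8-15 fail to come online.
The log is as follows: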
[ 0.004941] EFI services will not be available.
[ 0.006221] smp: Bringing up secondary CPUs ...
[ 1.026992] CPU8: failed to come online
[ 2.053885] CPU9: failed to come online
[ 3.080779] CPU10: failed to come online
[ 4.107673] CPU11: failed to come online
[ 5.134567] CPU12: failed to come online
[ 6.161461] CPU13: failed to come online
[ 7.188354] CPU14: failed to come online
[ 8.215248] CPU15: failed to come online
[ 8.215338] smp: Brought up 1 node, 8 CPUs
After the OS finishes booting, the system information is as follows:
root@UCanLinux:~ # ls -lh /sys/devices/system/cpu/
total 0
drwxr-xr-x 3 root root 0 Jan 1 00:00 cpu0
drwxr-xr-x 3 root root 0 Jan 1 00:00 cpu1
drwxr-xr-x 2 root root 0 Jan 1 00:00 cpu10
drwxr-xr-x 2 root root 0 Jan 1 00:00 cpu11
drwxr-xr-x 2 root root 0 Jan 1 00:00 cpu12
drwxr-xr-x 2 root root 0 Jan 1 00:00 cpu13
drwxr-xr-x 2 root root 0 Jan 1 00:00 cpu14
drwxr-xr-x 2 root root 0 Jan 1 00:00 cpu15
drwxr-xr-x 3 root root 0 Jan 1 00:00 cpu2
drwxr-xr-x 3 root root 0 Jan 1 00:00 cpu3
drwxr-xr-x 3 root root 0 Jan 1 00:00 cpu4
drwxr-xr-x 3 root root 0 Jan 1 00:00 cpu5
drwxr-xr-x 3 root root 0 Jan 1 00:00 cpu6
drwxr-xr-x 3 root root 0 Jan 1 00:00 cpu7
drwxr-xr-x 2 root root 0 Jan 1 00:00 cpu8
drwxr-xr-x 2 root root 0 Jan 1 00:00 cpu9
-r--r--r-- 1 root root 4.0K Jan 1 00:00 isolated
-r--r--r-- 1 root root 4.0K Jan 1 00:00 kernel_max
-r--r--r-- 1 root root 4.0K Jan 1 00:00 offline
-r--r--r-- 1 root root 4.0K Jan 1 00:00 online
-r--r--r-- 1 root root 4.0K Jan 1 00:00 possible
-r--r--r-- 1 root root 4.0K Jan 1 00:00 present
-rw-r--r-- 1 root root 4.0K Jan 1 00:00 uevent
root@UCanLinux:~ # cat /sys/devices/system/cpu/online
0-7
Only CPUs 0-7 are online. Does anyone have experience simulating a 16-core CPU with gem5 in FS mode? Does gem5 itself impose a limit on the number of cores?
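To compare how many CPUs came online with how many the kernel considers possible, the sysfs files shown in the listing above (online, possible, present) can be read and counted. The sketch below is illustrative only: count_cpu_list is a hypothetical helper, and it assumes the files hold the usual comma-separated range format such as 0-7.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count the CPUs named in a kernel cpulist string
# such as "0-7" or "0-3,8,10-11".
count_cpu_list() {
    local list=$1 total=0 part lo hi
    local IFS=','
    for part in $list; do
        [ -z "$part" ] && continue
        lo=${part%-*}   # text before the dash (the whole item if no dash)
        hi=${part#*-}   # text after the dash (the whole item if no dash)
        total=$(( total + hi - lo + 1 ))
    done
    echo "$total"
}

# Report the counts only where the sysfs files exist (i.e. on the guest).
cpu_dir=/sys/devices/system/cpu
if [ -r "$cpu_dir/online" ] && [ -r "$cpu_dir/possible" ]; then
    echo "online:   $(count_cpu_list "$(cat "$cpu_dir/online")")"
    echo "possible: $(count_cpu_list "$(cat "$cpu_dir/possible")")"
fi
```

If possible reports 16 while online reports 8, the kernel knows about all sixteen CPUs and the failure lies in bringing the secondary ones up, which is consistent with the "failed to come online" messages in the boot log.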