riscv-software-src / riscv-perf-model

Example RISC-V Out-of-Order/Superscalar Processor Performance Core and MSS Model

License: Apache License 2.0

CMake 3.49% C++ 87.69% Python 3.61% C 5.17% Makefile 0.04%
modeling out-of-order performance-analysis risc-v

riscv-perf-model's Introduction


Olympia is a Performance Model written in C++ for the RISC-V community as an example of an Out-of-Order RISC-V CPU Performance Model based on the Sparta Modeling Framework.

Olympia's intent is to provide a starting point for RISC-V CPU performance modeling development, enabling the community to build on Olympia by extending its functionality in areas like branch prediction, prefetching/caching concepts, application profiling, middle-core design, etc.

Currently, Olympia is a trace-driven simulator that runs instruction streams provided in either JSON format or STF. Work is under way, however, to extend Olympia with a functional back-end so it can run applications natively.
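As a rough illustration of the JSON trace input, the sketch below emits a tiny two-instruction trace with the Python standard library. The field names here ("mnemonic", "rs1", "rs2", "rd") are illustrative only; consult traces/example_json.json in the repo for the authoritative schema.

```python
# Hypothetical sketch: emit a tiny JSON instruction trace.
# Field names are illustrative -- see traces/example_json.json
# for the real schema accepted by olympia.
import json

trace = [
    {"mnemonic": "add",  "rs1": 1, "rs2": 2, "rd": 3},
    {"mnemonic": "andi", "rs1": 3, "rd": 4},
]

with open("my_trace.json", "w") as f:
    json.dump(trace, f, indent=4)
```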

Building

  1. Set up a clean working conda environment by following the directions here
  2. Download Sparta and check out the map_v2 branch, then follow the directions on the Sparta README to build and install Sparta
  3. Make sure you have the required libraries for the STF toolsuite installed
  4. Clone olympia
    git clone --recursive git@github.com:riscv-software-src/riscv-perf-model.git
    
  5. Build Olympia in the new conda environment created

################################################################################
# Enable conda environment (suggested)
conda activate sparta

################################################################################
# Optimized, no symbols

# A release build
mkdir release; cd release

# Assumes sparta was installed in the conda environment.
# If not, use -DSPARTA_SEARCH_DIR=/path/to/sparta/install
cmake .. -DCMAKE_BUILD_TYPE=Release

# Just builds the simulator
make olympia

################################################################################
# Fast Debug, optimized (not LTO) with debug symbols

# A FastDebug build
mkdir fastdebug; cd fastdebug

# Assumes sparta was installed in the conda environment.
# If not, use -DSPARTA_SEARCH_DIR=/path/to/sparta/install
cmake .. -DCMAKE_BUILD_TYPE=fastdebug

# Just builds the simulator
make olympia

################################################################################
# Debug

# A debug build
mkdir debug; cd debug

# Assumes sparta was installed in the conda environment.
# If not, use -DSPARTA_SEARCH_DIR=/path/to/sparta/install
cmake .. -DCMAKE_BUILD_TYPE=Debug

# Just builds the simulator
make olympia

################################################################################
# Regression
make regress

Developing

Developing on Olympia is encouraged! Please check the Issues section for areas needing contribution. If an issue has no assignee, the work isn't being done!

When developing on Olympia, please adhere to the documented Coding Style Guidelines.

Example Usage

Get Help Messages

./olympia --help                  # Full help
./olympia --help-brief            # Brief help
./olympia --help-topic topics     # Topics to get detailed help on
./olympia --help-topic parameters # Help on parameters

Get Simulation Layout

./olympia --show-tree       --no-run # Show the full tree; do not run the simulator
./olympia --show-parameters --no-run # Show the parameter tree; do not run the simulator
./olympia --show-loggers    --no-run # Show the loggers; do not run the simulator
# ... more --show options; see help

Running

# Run a given JSON "trace" file
./olympia ../traces/example_json.json

# Run a given STF trace file
./olympia ../traces/dhry_riscv.zstf

# Run a given STF trace file only 100K instructions
./olympia -i100K ../traces/dhry_riscv.zstf

# Run a given STF trace file and generate a
# generic full simulation report
./olympia ../traces/dhry_riscv.zstf --report-all dhry_report.out
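The -i flag above accepts count suffixes such as 100K and 1M. A tiny sketch of that convention, assuming K = 10^3 and M = 10^6 (check ./olympia --help for the authoritative semantics):

```python
# Sketch of the -i instruction-count suffix convention (assumed decimal:
# K = 1e3, M = 1e6, B = 1e9). Not Olympia's actual parser.
def parse_icount(value: str) -> int:
    suffixes = {"K": 10**3, "M": 10**6, "B": 10**9}
    value = value.strip().upper()
    if value and value[-1] in suffixes:
        return int(value[:-1]) * suffixes[value[-1]]
    return int(value)

print(parse_icount("100K"))  # 100000
```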

Generate and Consume Configuration Files

# Generate a baseline config
./olympia --write-final-config baseline.yaml --no-run

# Generate a config with a parameter change
./olympia -p top.cpu.core0.lsu.params.tlb_always_hit true --write-final-config always_hit_DL1.yaml --no-run
dyff between baseline.yaml always_hit_DL1.yaml

# Use the configuration file generated
./olympia -c always_hit_DL1.yaml -i1M ../traces/dhry_riscv.zstf

Generate Logs

# Log of all messages, different outputs
./olympia -i1K --auto-summary off ../traces/dhry_riscv.zstf \
   -l top info all_messages.log.basic   \
   -l top info all_messages.log.verbose \
   -l top info all_messages.log.raw

# Different logs, some shared
./olympia -i1K --auto-summary off ../traces/dhry_riscv.zstf \
   -l top.*.*.decode info decode.log \
   -l top.*.*.rob    info rob.log    \
   -l top.*.*.decode info decode_rob.log \
   -l top.*.*.rob    info decode_rob.log

Generate PEvents (for Correlation)

PEvents or Performance Events are part of the Sparta Modeling Framework typically used to correlate a performance model with RTL. Unlike pipeout collection Name/Value Definition Pairs (see an example in Inst.hpp), PEvent Name/Value Definitions are typically more compact. Below the surface, Sparta uses the logging infrastructure to collect the data.

Olympia has instrumented a few PEvents as an example. The following commands are useful in listing/using this functionality.

# Dump the list of supported PEvents
./olympia --help-pevents --no-run

# Generate RETIRE only pevents for the first 100 instructions of Dhrystone
./olympia traces/dhry_riscv.zstf --pevents retire.out RETIRE -i100

# Generate COMPLETE only pevents ...
./olympia traces/dhry_riscv.zstf --pevents complete.out COMPLETE -i100

# Generate COMPLETE pevents into complete.out and RETIRE pevents into retire.out ...
./olympia traces/dhry_riscv.zstf --pevents retire.out RETIRE --pevents complete.out COMPLETE -i100

# Generate RETIRE and COMPLETE pevents to the same file
./olympia traces/dhry_riscv.zstf --pevents complete_retire.out RETIRE,COMPLETE -i100

# Generate all pevents
./olympia traces/dhry_riscv.zstf --pevents complete_retire.out all -i100

Generate Reports

# Run with 1M instructions, generate a report from the top of the tree
# with stats that are not hidden; turn off the auto reporting
cat reports/core_stats.yaml
./olympia -i1M ../traces/dhry_riscv.zstf --auto-summary off  --report "top" reports/core_stats.yaml my_full_report.txt text

# Generate a report only for decode in text form
./olympia -i1M ../traces/dhry_riscv.zstf --auto-summary off  --report "top.cpu.core0.decode" reports/core_stats.yaml my_decode_report.txt text

# Generate a report in JSON format
./olympia -i1M ../traces/dhry_riscv.zstf --auto-summary off  --report "top" reports/core_stats.yaml my_json_report.json json

# Generate a report in CSV format
./olympia -i1M ../traces/dhry_riscv.zstf --auto-summary off  --report "top" reports/core_stats.yaml my_csv_report.csv csv

# Generate a report in HTML format
./olympia -i1M ../traces/dhry_riscv.zstf --auto-summary off  --report "top" reports/core_stats.yaml my_html_report.html html

Generate More Complex Reports

# Using a report definition file, program the report collection to
# start after 500K instructions
cat reports/core_report.def
./olympia -i1M ../traces/dhry_riscv.zstf --auto-summary off    \
   --report reports/core_report.def  \
   --report-search reports           \
   --report-yaml-replacements        \
       OUT_BASE my_report            \
       OUT_FORMAT text               \
       INST_START 500K

# Generate a time-series report -- capture all stats every 10K instructions
cat reports/core_timeseries.def
./olympia -i1M ../traces/dhry_riscv.zstf --auto-summary off       \
   --report reports/core_timeseries.def \
   --report-search reports              \
   --report-yaml-replacements           \
       OUT_BASE my_report               \
       TS_PERIOD 10K
python3 ./reports/plot_ts.y my_report_time_series_all.csv
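Beyond the bundled plotting script, a generated time-series CSV can be post-processed with the standard library. The exact column layout depends on the report definition, so the sketch below assumes only a header row followed by numeric rows:

```python
# Sketch: read a time-series report CSV with the stdlib csv module.
# Assumption: a single header row, then purely numeric data rows
# (the real layout depends on the report definition used).
import csv

def read_timeseries(path):
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        rows = [[float(x) for x in row] for row in reader if row]
    return header, rows
```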

Experimenting with Architectures

# By default, olympia uses the small_core architecture
./olympia -i1M  ../traces/dhry_riscv.zstf --auto-summary off --report-all report_small.out

# Use the medium sized core
cat arches/medium_core.yaml  # Example of the medium core
./olympia -i1M  ../traces/dhry_riscv.zstf --arch medium_core --auto-summary off --report-all report_medium.out
diff -y -W 150 report_small.out report_medium.out

# Use the big core
cat arches/big_core.yaml  # Example of the big core
./olympia -i1M  ../traces/dhry_riscv.zstf --arch big_core --auto-summary off --report-all report_big.out
diff -y -W 150 report_medium.out report_big.out

Generate and View a Pipeout

./olympia -i1M ../traces/dhry_riscv.zstf --debug-on-icount 100K -i 101K -z pipeout_1K --auto-summary off

# Launch the viewer
# *** MacOS use pythonw
python $MAP_BASE/helios/pipeViewer/pipe_view/argos.py -d pipeout_1K -l ../layouts/small_core.alf

Issue Queue Modeling

Olympia supports defining the mapping of issue queues to execution pipes, as well as the pipe targets available per execution unit. With the implementation of issue queues, Olympia now has a generic execution unit for all types: one no longer defines alu0 or fpu0; units are defined purely by their pipe targets rather than by unit type, as before. In the example below:

top.cpu.core0.extension.core_extensions:
  # this sets the pipe targets for each execution unit
  # you can set a multiple or just one:
  # ["int", "div"] would mean this execution pipe can accept
  # targets of: "int" and "div"
  pipelines:
  [
    ["int"], # exe0
    ["int", "div"], # exe1
    ["int", "mul"], # exe2
    ["int", "mul", "i2f", "cmov"], # exe3
    ["int"], # exe4
    ["int"], # exe5
    ["float", "faddsub", "fmac"], # exe6
    ["float", "f2i"], # exe7
    ["br"], # exe8
    ["br"] # exe9
  ]
  # this is used to set how many units per queue
  # ["0", "3"] means iq0 has exe0, exe1, exe2, and exe3, so it's inclusive
  # if you want just one execution unit to issue queue you can do:
  # ["0"] which would result in iq0 -> exe0
  # *note if you change the number of issue queues, 
  # you need to add it to latency matrix below

  issue_queue_to_pipe_map:
  [ 
    ["0", "1"], # iq0 -> exe0, exe1
    ["2", "3"], # iq1 -> exe2, exe3
    ["4", "5"], # iq2 -> exe4, exe5
    ["6", "7"], # iq3 -> exe6, exe7
    ["8", "9"]  # iq4 -> exe8, exe9
  ]

The pipelines section defines each execution unit's pipe targets. For example, the first row, ["int"], defines the first execution unit, exe0, which handles only instructions with the int pipe target. The second row defines an execution unit that handles both int and div instructions, and so on.

The issue_queue_to_pipe_map defines which execution units map to which issue queues, with the row position giving the issue queue number. In the above, ["0", "1"] in the first row means the first issue queue connects to exe0 and exe1; note that the range is inclusive of the end value. If one wanted a one-to-one mapping of execution units to issue queues, the above would become:

issue_queue_to_pipe_map:
  [ 
    ["0"], # iq0 -> exe0
    ["1"], # iq1 -> exe1
    ["2"], # iq2 -> exe2
    ["3"], # iq3 -> exe3
    ["4"], # iq4 -> exe4
    ["5"], # iq5 -> exe5
    ["6"], # iq6 -> exe6
    ["7"], # iq7 -> exe7
    ["8"], # iq8 -> exe8
    ["9"], # iq9 -> exe9
  ]
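The inclusive-range rule can be sketched in a few lines of Python; this is an illustration of the mapping semantics described above, not code from Olympia itself:

```python
# Sketch: expand issue_queue_to_pipe_map rows into the execution units
# each issue queue drives, using the inclusive-range rule described
# above (["0", "3"] means exe0 through exe3).
issue_queue_to_pipe_map = [
    ["0", "1"],  # iq0 -> exe0, exe1
    ["2", "3"],  # iq1 -> exe2, exe3
    ["4", "5"],  # iq2 -> exe4, exe5
    ["6", "7"],  # iq3 -> exe6, exe7
    ["8", "9"],  # iq4 -> exe8, exe9
]

def expand(mapping):
    expanded = {}
    for iq, bounds in enumerate(mapping):
        lo, hi = int(bounds[0]), int(bounds[-1])  # one-entry rows map a single unit
        expanded[f"iq{iq}"] = [f"exe{n}" for n in range(lo, hi + 1)]
    return expanded

print(expand(issue_queue_to_pipe_map)["iq0"])  # ['exe0', 'exe1']
```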

Additionally, one can rename the issue queues and execution units to more descriptive names of their use such as:

exe_pipe_rename:
  [
    ["exe0", "alu0"],
    ["exe1", "alu1"],
    ["exe2", "alu2"],
    ["exe3", "alu3"],
    ["exe4", "alu4"],
    ["exe5", "alu5"],
    ["exe6", "fpu0"],
    ["exe7", "fpu1"],
    ["exe8", "br0"],
    ["exe9", "br1"],
  ]

  # optional if you want to rename each iq* unit
  issue_queue_rename:
  [
    ["iq0", "iq0_alu"],
    ["iq1", "iq1_alu"],
    ["iq2", "iq2_alu"],
    ["iq3", "iq3_fpu"],
    ["iq4", "iq4_br"],
  ]

The above shows a one-to-one renaming of the execution units and issue queues. Keep in mind that order matters: entries must be listed in order (exe0, exe1, and so on). Also, renaming is all-or-nothing within a category: you must rename either all execution units or all issue queues, not a subset. It is fine, however, to rename only the execution units and leave the issue queues unrenamed.

Finally, if you do rename the issue queues, you will also need to update their names in the scoreboard definitions, like so:

top.cpu.core0.rename.scoreboards:
  # From
  # |
  # V
  integer.params.latency_matrix: |
      [["",         "lsu",     "iq0_alu", "iq1_alu", "iq2_alu", "iq3_fpu", "iq4_br"],
      ["lsu",       1,         1,         1,          1,        1,         1],
      ["iq0_alu",   1,         1,         1,          1,        1,         1],
      ["iq1_alu",   1,         1,         1,          1,        1,         1],
      ["iq2_alu",   1,         1,         1,          1,        1,         1],
      ["iq3_fpu",   1,         1,         1,          1,        1,         1],
      ["iq4_br",    1,         1,         1,          1,        1,         1]]
  float.params.latency_matrix: |
      [["",         "lsu",     "iq0_alu", "iq1_alu", "iq2_alu", "iq3_fpu", "iq4_br"],
      ["lsu",       1,         1,         1,          1,        1,         1],
      ["iq0_alu",   1,         1,         1,          1,        1,         1],
      ["iq1_alu",   1,         1,         1,          1,        1,         1],
      ["iq2_alu",   1,         1,         1,          1,        1,         1],
      ["iq3_fpu",   1,         1,         1,          1,        1,         1],
      ["iq4_br",    1,         1,         1,          1,        1,         1]]

riscv-perf-model's People

Contributors

aarongchan, ah-condor, arupc-vmicro, avinashmehtadelhi, bdutro, danbone, dingiso, furuame, h0lyalg0rithm, jeffnye-gh, kathlenemagnus, kathlenemagnus-mips, klingaard, kunal-buch, lhtin, light2802, peter-d, shubhf, vineethkm


riscv-perf-model's Issues

RAW hazard handling in olympia

Discussed in #72

Originally posted by avinashmehtadelhi July 28, 2023
Hi
I wanted to know more about how olympia handles RAW hazards.
PFA log from the dhrystone trace.
The following are 2 adjacent instructions:
uid: 90 1147a 'sub x10,x14,x15'
uid: 91 1147e 'andi x11,x10, +0xff'
(RAW hazard x10)
but both of them enter the ALU in clock 132, exit ALU in clock 133 and retire in 137
Is this correct behaviour?
dhry_big_1k.log.verbose.txt

Transfer instructions (F2I, I2F) update wrong scoreboard

The following sequence of instructions cause olympia to timeout:

[
    {
        "mnemonic": "fmv.d.x",
        "rs1": 1,
        "fd": 17
    },
    {
        "mnemonic": "fadd.d",
        "fs1": 10,
        "fs2": 17,
        "fd": 7
    }
]

The problem is that the fmv instruction is dispatched to the IEX block to read x1, but when that pipeline writes the f17 rename, it's actually writing to a random rename for x17.

The code that needs modification: https://github.com/riscv-software-src/riscv-perf-model/blob/f7c17dcb8b10f80650fa50cbc1affe31e3a0624f/core/ExecutePipe.cpp#L142C2-L143

The code should not assume the destination register file is the same as the unit's.

InstructionGenerator to also generate dummy ops

For more accurate Fetch Unit behavior, speculative ops that are not part of the correct-path, trace-provided instruction stream are desirable, i.e. wrong-path fetched ops (similar ops may be useful for prefetches).
The InstructionGenerator should be extended to create these dummy ops so that its unique_id stays correct and the correct-path trace instructions are not lost.
One approach would be to overload the getNextInst(const sparta::Clock * clk) with another signature like getNextInst(const sparta::Clock * clk, bool dummy).
Another approach might be to add a function getDummyInst(const sparta::Clock * clk)

Note from a prior email exchange with Knute:
The use of RISC-V hint instructions or no-ops where the operand number has special meaning.  For example, if you look at

inst = mavis_facade_->makeInstDirectly(ex_info, clk);

you can see how you can create a specific instruction + operands that you can use to determine if spec.

Build System between riscv-perf-model and map

Hi Knute,

After reading the message on the mailing list, I figured out that the version of map (before the new build system) is what makes building riscv-perf-model frustrating.

Why not add map at a specific commit as a git submodule in the riscv-perf-model repo, since they are closely bound? We could then update its commit when it is ready to use the same build system.

Thanks,
Dingisoul

Non-ascii characters in the generated alf files that are checked in, for example in `small_core.alf`

From @peter-d :

Hi Knute - I ran into problems with this PR; it seems there are some non-ascii characters in the generated alf files that are checked in, for example in `small_core.alf`:


On my end (WSL2 Ubuntu), this crashes the yaml parsing when you start the pipe viewer. I haven't been able to rerun the generate script without errors; it's not clear to me exactly which sparta version it depends on to work.

Originally posted by @peter-d in #107 (comment)

Implement micro op fusion in decode stage.

In the decode stage, we might find several pairs of uops that can be merged into one instruction to increase performance. Since this optimization is common in modern high-performance CPUs, we can add this feature for users to model the performance gain.

Compilation error on Apple Silicon box

/Users/zen/Sparta/riscv-perf-model/core/Inst.hpp:216:31: error: must specify at least one argument for '...' parameter of variadic macro [-Werror,-Wgnu-zero-variadic-macro-arguments]
                              SPARTA_ADDPAIR("complete",  &Inst::getCompletedStatus),                                                   
                              ^                                                                                                         
/Users/zen/Sparta/map/sparta/sparta/pairs/RegisterPairsMacro.hpp:38:29: note: expanded from macro 'SPARTA_ADDPAIR'                      
#define SPARTA_ADDPAIR(...) _ADDPAIR_UTIL(__VA_ARGS__)                                                                                  
                            ^                                                                                                           
/Users/zen/Sparta/map/sparta/sparta/pairs/RegisterPairsMacro.hpp:35:75: note: expanded from macro '_ADDPAIR_UTIL'                       
#define _ADDPAIR_UTIL(p_1, ...) addPair(p_1, std::forward<Args>(args)..., _ADDPAIR_RESOLVE(__VA_ARGS__)                                 
                                                                          ^                                                             
/Users/zen/Sparta/map/sparta/sparta/pairs/RegisterPairsMacro.hpp:32:77: note: expanded from macro '_ADDPAIR_RESOLVE'                    
#define _ADDPAIR_RESOLVE(...) GET_ARGS(__VA_ARGS__, _RESOLVED_2, _RESOLVED_1)(__VA_ARGS__)                                              
                                                                            ^                                                           
/Users/zen/Sparta/map/sparta/sparta/pairs/RegisterPairsMacro.hpp:29:9: note: macro 'GET_ARGS' defined here                              
#define GET_ARGS(_1, _2, FCN_NAME, ...) FCN_NAME                                                                                        
        ^                                                                                                                               
In file included from /Users/zen/Sparta/riscv-perf-model/mss/BIU.cpp:4:
In file included from /Users/zen/Sparta/riscv-perf-model/mss/BIU.hpp:15:
In file included from /Users/zen/Sparta/riscv-perf-model/core/CoreTypes.hpp:6:
/Users/zen/Sparta/riscv-perf-model/core/Inst.hpp:217:31: error: must specify at least one argument for '...' parameter of variadic macro [-Werror,-Wgnu-zero-variadic-macro-arguments]
                              SPARTA_ADDPAIR("unit",      &Inst::getUnit),
                              ^
/Users/zen/Sparta/riscv-perf-model/core/Inst.hpp:218:31: error: must specify at least one argument for '...' parameter of variadic macro [-Werror,-Wgnu-zero-variadic-macro-arguments]
                              SPARTA_ADDPAIR("latency",   &Inst::getExecuteTime),
                              ^
(the same SPARTA_ADDPAIR macro-expansion notes as above repeat for each error)

Linking CXX Executable issue

(sparta) [manicka.vinodhini@localhost release]$ make olympia
[ 0%] Built target stf_git_version
[ 49%] Built target stf
[ 54%] Built target mss
[ 94%] Built target core
[ 96%] Linking CXX executable olympia
/home/manicka.vinodhini/miniconda3/envs/sparta/lib/libboost_filesystem.so.1.78.0: error: undefined reference to '__cxa_throw_bad_array_new_length', version 'CXXABI_1.3.8'
/home/manicka.vinodhini/miniconda3/envs/sparta/lib/libyaml-cpp.so.0.7.0: error: undefined reference to 'std::__cxx11::basic_stringstream<char, std::char_traits<char>, std::allocator<char> >::basic_stringstream()', version 'GLIBCXX_3.4.26'
/home/manicka.vinodhini/miniconda3/envs/sparta/lib/libboost_serialization.so.1.78.0: error: undefined reference to 'std::uncaught_exceptions()', version 'GLIBCXX_3.4.22'

Installed server details:
OS: Centos 7
GCC version: 9.3.1(devtoolset-9)
Conda version: 23.7.3

clang-format in place, but not applied

Hi,

Thanks for uploading the .clang-format file. I tried applying .clang-format to the files in this repo, but it modifies all of them. For example:

diff --git a/core/Decode.cpp b/core/Decode.cpp
index 90e8f75..139d97e 100644
--- a/core/Decode.cpp
+++ b/core/Decode.cpp
@@ -1,6 +1,5 @@
 // <Decode.cpp> -*- C++ -*-

-
 #include <algorithm>

 #include "Decode.hpp"
@@ -12,20 +11,15 @@ namespace olympia
 {
     constexpr char Decode::name[];

-    Decode::Decode(sparta::TreeNode * node,
-                   const DecodeParameterSet * p) :
-        sparta::Unit(node),
-        fetch_queue_("FetchQueue", p->fetch_queue_size, node->getClock(), &unit_stat_set_),
-        num_to_decode_(p->num_to_decode)
+    Decode::Decode(sparta::TreeNode* node, const DecodeParameterSet* p) : sparta::Unit(node),
+                                                                          fetch_queue_("FetchQueue", p->fetch_queue_size, node->getClock(), &unit_stat_set_),
+                                                                          num_to_decode_(p->num_to_decode)

Can you please let me know if this is the expected behaviour? Is the .clang-format file meant only for newly added files/code? I used clang-format-15 to apply the format.

Originally posted by @RamVarad in #100

Enhance middle machine to enable dynamic topologies

Currently the olympia model has 1 ALU, 1 branch unit, etc., and Dispatch is hard-coded to this topology. Using Sparta's extension support, add a dynamic topology definition to allow a user to create different topologies.

This includes (as a start):

  1. Defining the number of ALUs, branch units, load/store units, etc
  2. Defining structures like instruction issuing queues and pipes (and the ratios for them)

Add python shell support to Olympia

The Sparta modeling framework has the beginnings of python support, although it is not fully fleshed out. This project entails two endeavors:

  1. Resurrect the python support in sparcians/map/sparta
  2. Have a working example of the use of python in Olympia

Basics of the initial implementation:

  1. Initialize a python shell:
   ./olympia --python-shell
  2. Create an Olympia simulation object:
(python) topology = "simple"
(python) scheduler = sparta.Scheduler()
(python) num_cores = 1
(python) workload = "traces/dhry_riscv.zstf"
(python) sim = OlympiaSim(topology, scheduler, num_cores, workload)
  3. Run the simulation:
(python) sim.run()

Advanced implementation (towards the other end of the project). This will allow the user to create an empty simulation container and create components/topology on the fly:

sim = OlympiaSim()

# Set up the simulation tree
top = sim.getRootTreeNode()
core_tn = sparta.ResourceTreeNode(top, "core", "RISC-V Core", sparta.TreeNode.GROUP_NAME_NONE, sparta.TreeNode.GROUP_IDX_NONE, olympia.core_factory)
fetch_tn = sparta.ResourceTreeNode(core_tn, "fetch", "Fetch Unit", sparta.TreeNode.GROUP_NAME_NONE, sparta.TreeNode.GROUP_IDX_NONE, olympia.fetch_factory)
# ... other nodes

# Set up configurations
fetch_tn.getParameters().num_to_fetch = 8
# ...other configs

# Set up simulation (instantiates Fetch, Decode, etc). (this is a made-up function)
sim.initializeComponents()

# Set up bindings
sparta.bind(top.core.fetch.ports.out_fetch_queue_write, top.core.decode.ports.in_fetch_queue_write)

# Finalize the tree
sim.finalize()

# Run it
sim.run(workload)

Add execution driven support

Olympia is a trace-driven model; the input to the simulator is a trace generated in the STF standard. The trace (.zstf or .stf file extension) is the stream of instructions a program executed on the core. The instruction stream consists of only those instructions that actually retired.

Because the trace contains only instructions on the correct path, performance deviations resulting from behaviors like branch mispredictions, speculative load/store/instruction cache effects, etc. are not taken into consideration when analyzing core performance.

In execution-driven mode (EDM), instead of using an instruction stream from an STF trace, the Olympia simulator would instead execute a workload directly including speculative behaviors.

To facilitate execution-driven mode, Olympia would need a "dynamic trace" from a functional model that is actively executing the workload. But the functional model cannot just execute the instruction stream as the program dictates. Instead, it must be directed by the performance model on which instructions to execute, including speculative paths. Doing so, however, will corrupt the program's execution, so the functional model must be able to recover from a flush and handle redirects just like real hardware.

This issue will define those requirements on the functional model to support OoO execution as well as speculative paths. From these requirements, a functional modeling API must be built and a backend chosen (like Spike or SAIL or ...?) to perform the execution.
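Purely as an illustration (this is not an existing Olympia or Spike API), the kind of interface the performance model would need from the functional model might look like the following: directed stepping, checkpointing before speculation, and rollback on flush.

```python
# Illustrative sketch only: a hypothetical directed-execution
# functional-model interface. The pc update is a placeholder for
# real decode/execute; names and signatures are made up.
class FunctionalModel:
    def __init__(self, reset_pc=0x1000):
        self.pc = reset_pc
        self._checkpoints = {}

    def step(self, pc):
        """Execute the instruction at pc -- possibly a wrong-path op."""
        self.pc = pc + 4  # placeholder for real decode/execute
        return self.pc

    def checkpoint(self, tag):
        """Snapshot architectural state before entering a speculative path."""
        self._checkpoints[tag] = self.pc

    def flush_to(self, tag):
        """Recover state after a misprediction, like real hardware."""
        self.pc = self._checkpoints.pop(tag)

fm = FunctionalModel()
fm.checkpoint("br0")  # about to speculate past a branch
fm.step(fm.pc)        # wrong-path instruction
fm.flush_to("br0")    # misprediction resolved: roll back
assert fm.pc == 0x1000
```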

Extend Olympia to support external components

Academic researchers are interested in testing out components they have worked on. We want Olympia to be flexible enough to support those components.
Here are some modules that we might like to include:

  • I2C
  • DMA
  • PCI
  • Fault modelling (temperature)
  • Dynamic Voltage and Frequency Handler
  • external GPU
  • Matrix processing engine

Add a branch predictor

Fetch needs a branch predictor. Open to considerations on the type. It should be parameterized, but a common API would be beneficial to keep the model generic.

Resolution of branches will require feedback from Retire and/or Execute. Typically this can be done with ports.

Improve Load/Store Unit

The load/store unit in the RISC-V Performance Model is a simple, single, in-order pipeline (sparta::Pipeline) with a fixed number of stages: MMU lookup, cache lookup, and completion.

To enhance this unit, or rather to make it more extensible, the taker of this GH issue must understand the current limitations of the block.

There is no unit test for the load/store unit

  • Add a unit test similar to Rename and Dispatch
  • Add testing for various loads and stores; ensure timing looks sane for given parameters

The pipeline stages are hardcoded

See

enum class PipelineStage
{
    MMU_LOOKUP   = 0, //1,
    CACHE_LOOKUP = 1, //3,
    COMPLETE     = 2, //4
    NUM_STAGES
};
with fixed latencies and length

  • Fix 1: Remove the enum and replace with load/store parameters; verify current behavior is unchanged.
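A minimal sketch of what "Fix 1" could look like (the names `StageDesc` and `LoadStorePipeline` are hypothetical, not current Olympia parameters): the stage list and per-stage latencies come from configuration instead of a hardcoded enum, and the default configuration reproduces the current three-stage behavior.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// One pipeline stage, as it might be described by a parameter set.
struct StageDesc {
    std::string name;
    uint32_t    latency;
};

// Parameter-driven pipeline: stage count and latencies are no longer
// compile-time constants, so configs can add stages or stretch latencies.
class LoadStorePipeline
{
public:
    explicit LoadStorePipeline(std::vector<StageDesc> stages)
        : stages_(std::move(stages)) {}

    uint32_t numStages() const { return static_cast<uint32_t>(stages_.size()); }

    // With the default 3 stages at 1 cycle each this should match the current
    // fixed MMU-lookup/cache-lookup/complete behavior, per the verification
    // requirement in Fix 1.
    uint32_t totalLatency() const {
        uint32_t sum = 0;
        for (const auto & s : stages_) sum += s.latency;
        return sum;
    }

private:
    std::vector<StageDesc> stages_;
};
```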

The MMU can only handle 1 outstanding miss

  • Add support to allow the MMU to service multiple outstanding misses; consider merging multiple misses into 1 access

The cache can only handle 1 outstanding miss

  • Add support to allow multiple cache misses; consider merging multiple misses to the same line into 1 miss
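The miss-merging idea in the two bullets above (for both the MMU and the cache) can be sketched with MSHR-style bookkeeping. This is an illustrative structure with hypothetical names, not Olympia code: a second miss to a line that is already outstanding merges into the existing entry rather than allocating a new access.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Miss Status Holding Register (MSHR) sketch: one entry per outstanding
// line, holding every instruction waiting on that line.
class MissHandler
{
public:
    // Returns true if this miss allocated a new outstanding access,
    // false if it merged into an existing one.
    bool recordMiss(uint64_t addr, uint32_t inst_id) {
        const uint64_t line = addr >> 6;  // assume 64B lines for illustration
        auto [it, inserted] = mshrs_.try_emplace(line);
        it->second.push_back(inst_id);
        return inserted;
    }

    // The line came back from the next level: wake everyone waiting on it.
    std::vector<uint32_t> complete(uint64_t addr) {
        const uint64_t line = addr >> 6;
        auto it = mshrs_.find(line);
        if (it == mshrs_.end()) return {};
        std::vector<uint32_t> waiters = std::move(it->second);
        mshrs_.erase(it);
        return waiters;
    }

    size_t outstanding() const { return mshrs_.size(); }

private:
    std::unordered_map<uint64_t, std::vector<uint32_t>> mshrs_;
};
```

A real implementation would also cap the number of MSHRs and back-pressure the pipe when they are exhausted.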

There is only support for 1 load/store pipeline

Arbitration for load/store pipe is simple

  • Add support for a smarter arbitration: loads take priority over stores; consider age, etc
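The suggested policy (loads over stores, then age) can be sketched as a standalone arbiter. The `MemRequest` type is hypothetical; only the priority rule comes from the bullet above.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

struct MemRequest {
    uint32_t uid;      // program-order age: smaller uid == older instruction
    bool     is_load;
};

// Pick a winner from the ready requests: loads beat stores, and within each
// class the oldest (smallest uid) wins. Empty input yields no winner.
std::optional<MemRequest> arbitrate(const std::vector<MemRequest> & ready)
{
    std::optional<MemRequest> winner;
    for (const auto & req : ready) {
        if (!winner ||
            (req.is_load && !winner->is_load) ||
            (req.is_load == winner->is_load && req.uid < winner->uid)) {
            winner = req;
        }
    }
    return winner;
}
```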

Separate loads from stores

  • Stores are typically not latency-critical. Add support to move them off to the side

Other ideas welcomed...

Generate STF trace for Coremark using STF instrumented Dromajo

Try out the steps to generate a Dhrystone STF trace following the instructions from https://github.com/riscv-software-src/riscv-perf-model/blob/master/traces/README.md . This is the suggested order of operations:
- replicate the current instructions in the README using the specified Dromajo SHA (which is old now) and generate a Dhrystone trace
- repeat the same steps with the latest Dromajo SHA to see if it's still working
- repeat the same steps as above to generate a Coremark trace
- look into possibilities of creating and maintaining a fork of Dromajo with STF generation capabilities instead of a patch.

Not able to trace with Dromajo

Hi

I am following the instructions in https://github.com/riscv-software-src/riscv-perf-model/blob/master/traces/README.md for trying tracing with new workloads.

The steps do not work out of the box on the latest checkout with Ubuntu 22.04. In particular, there were three errors in building buildroot.

  1. SIGSTKSZ not defined, which is similar to the issue described in openwrt/openwrt#9055 and could be resolved by applying the patch from https://launchpad.net/ubuntu/+archive/primary/+sourcefiles/m4/1.4.18-5ubuntu1/m4_1.4.18-5ubuntu1.debian.tar.xz
  2. _STAT_VER not defined, which was fixed with the diff for libfakeroot.c here - https://salsa.debian.org/clint/fakeroot/-/merge_requests/10/diffs#6c4c023cdb2bb2d22a70b89b3dca80920ac0dd79
  3. mknod operation not permitted in the cpio.mk step. The only fix I could find is to edit traces/stf_trace_gen/dromajo/run/buildroot-2020.05.1/fs/cpio/cpio.mk and add "sudo" before the mknod line.

After these fixes I could perform all the build steps in the README, but when running Dromajo it fails with -

[    0.594772] Freeing unused kernel memory: 232K
[    0.598231] Run /init as init process
mount: you must be root
mount: you must be root
mount: you must be root
mount: you must be root
can't open /dev/null: No such file or directory
can't open /dev/null: No such file or directory
can't open /dev/null: No such file or directory
can't open /dev/null: No such file or directory
hostname: sethostname: Operation not permitted
Starting syslogd: OK
Starting klogd: OK
Running sysctl: OK
Starting network: ip: RTNETLINK answers: Operation not permitted
ip: SIOCSIFFLAGS: Operation not permitted

This does not go away by running dromajo as root, nor can I perform the complete build as root, because the buildroot make infra complains that it is unsafe to do so. Trying this on a different machine did not help either.
I am stuck here and any suggestions to get past the issue would be much appreciated.

Thanks!

Full OpenSBI log if that helps -

csr_read: invalid CSR=0x30a
csr_read: invalid CSR=0xda0
csr_read: invalid CSR=0xfb0

OpenSBI v1.3-111-gdc0bb19
   ____                    _____ ____ _____
  / __ \                  / ____|  _ \_   _|
 | |  | |_ __   ___ _ __ | (___ | |_) || |
 | |  | | '_ \ / _ \ '_ \ \___ \|  _ < | |
 | |__| | |_) |  __/ | | |____) | |_) || |_
  \____/| .__/ \___|_| |_|_____/|____/_____|
        | |
        |_|

Platform Name             : ucbbar,dromajo-bare
Platform Features         : medeleg
Platform HART Count       : 1
Platform IPI Device       : aclint-mswi
Platform Timer Device     : aclint-mtimer @ 1000000Hz
Platform Console Device   : uart8250
Platform HSM Device       : ---
Platform PMU Device       : ---
Platform Reboot Device    : ---
Platform Shutdown Device  : ---
Platform Suspend Device   : ---
Platform CPPC Device      : ---
Firmware Base             : 0x80000000
Firmware Size             : 194 KB
Firmware RW Offset        : 0x20000
Firmware RW Size          : 66 KB
Firmware Heap Offset      : 0x28000
Firmware Heap Size        : 34 KB (total), 2 KB (reserved), 9 KB (used), 22 KB (free)
Firmware Scratch Size     : 4096 B (total), 328 B (used), 3768 B (free)
Runtime SBI Version       : 2.0

Domain0 Name              : root
Domain0 Boot HART         : 0
Domain0 HARTs             : 0*
Domain0 Region00          : 0x0000000012002000-0x0000000012002fff M: (I,R,W) S/U: (R,W)
Domain0 Region01          : 0x0000000080000000-0x000000008001ffff M: (R,X) S/U: ()
Domain0 Region02          : 0x0000000080020000-0x000000008003ffff M: (R,W) S/U: ()
Domain0 Region03          : 0x0000000002080000-0x00000000020bffff M: (I,R,W) S/U: ()
Domain0 Region04          : 0x0000000002000000-0x000000000207ffff M: (I,R,W) S/U: ()
Domain0 Region05          : 0x0000000000000000-0xffffffffffffffff M: () S/U: (R,W,X)
Domain0 Next Address      : 0x0000000080200000
Domain0 Next Arg1         : 0x0000000082200000
Domain0 Next Mode         : S-mode
Domain0 SysReset          : yes
Domain0 SysSuspend        : yes

Boot HART ID              : 0
Boot HART Domain          : root
Boot HART Priv Version    : v1.11
Boot HART Base ISA        : rv64imafdcv
Boot HART ISA Extensions  : none
Boot HART PMP Count       : 8
Boot HART PMP Granularity : 4
Boot HART PMP Address Bits: 38
Boot HART MHPM Info       : 0 (0x00000000)
Boot HART MIDELEG         : 0x0000000000000222
Boot HART MEDELEG         : 0x000000000000b109
[    0.000000] OF: fdt: Ignoring memory range 0x80000000 - 0x80200000
[    0.000000] Linux version 5.8.0-rc4 (shpotdar@ROG-Shivam) (riscv64-linux-gcc.br_real (Buildroot 2020.05.1-gf3c3112-dirty) 9.3.0, GNU ld (GNU Binutils) 2.32) #1 SMP Sun Nov 19 00:25:00 IST 2023
[    0.000000] earlycon: sbi0 at I/O port 0x0 (options '')
[    0.000000] printk: bootconsole [sbi0] enabled
[    0.000000] Initial ramdisk at: 0x(____ptrval____) (3239936 bytes)
[    0.000000] Zone ranges:
[    0.000000]   DMA32    [mem 0x0000000080200000-0x00000000bfffffff]
[    0.000000]   Normal   empty
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x0000000080200000-0x00000000bfffffff]
[    0.000000] Initmem setup node 0 [mem 0x0000000080200000-0x00000000bfffffff]
[    0.000000] software IO TLB: mapped [mem 0xbaee6000-0xbeee6000] (64MB)
[    0.000000] SBI specification v2.0 detected
[    0.000000] SBI implementation ID=0x1 Version=0x10003
[    0.000000] SBI v0.2 TIME extension detected
[    0.000000] SBI v0.2 IPI extension detected
[    0.000000] SBI v0.2 RFENCE extension detected
[    0.000000] SBI v0.2 HSM extension detected
[    0.000000] riscv: ISA extensions acdfimsuv
[    0.000000] riscv: ELF capabilities acdfim
[    0.000000] percpu: Embedded 17 pages/cpu s31976 r8192 d29464 u69632
[    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 258055
[    0.000000] Kernel command line: root=/dev/ram rw earlycon=sbi console=hvc0 bench=spec06_gcc
[    0.000000] Dentry cache hash table entries: 131072 (order: 8, 1048576 bytes, linear)
[    0.000000] Inode-cache hash table entries: 65536 (order: 7, 524288 bytes, linear)
[    0.000000] Sorting __ex_table...
[    0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
[    0.000000] Memory: 944340K/1046528K available (6406K kernel code, 4272K rwdata, 4096K rodata, 235K init, 317K bss, 102188K reserved, 0K cma-reserved)
[    0.000000] Virtual kernel memory layout:
[    0.000000]       fixmap : 0xffffffcefee00000 - 0xffffffceff000000   (2048 kB)
[    0.000000]       pci io : 0xffffffceff000000 - 0xffffffcf00000000   (  16 MB)
[    0.000000]      vmemmap : 0xffffffcf00000000 - 0xffffffcfffffffff   (4095 MB)
[    0.000000]      vmalloc : 0xffffffd000000000 - 0xffffffdfffffffff   (65535 MB)
[    0.000000]       lowmem : 0xffffffe000000000 - 0xffffffe03fe00000   (1022 MB)
[    0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
[    0.000000] rcu: Hierarchical RCU implementation.
[    0.000000] rcu:     RCU restricting CPUs from NR_CPUS=8 to nr_cpu_ids=1.
[    0.000000] rcu:     RCU debug extended QS entry/exit.
[    0.000000] rcu: RCU calculated value of scheduler-enlistment delay is 25 jiffies.
[    0.000000] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=1
[    0.000000] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
[    0.000000] riscv-intc: 64 local interrupts mapped
[    0.000000] plic: plic@10000000: mapped 31 interrupts with 1 handlers for 2 contexts.
[    0.000000] riscv_timer_init_dt: Registering clocksource cpuid [0] hartid [0]
[    0.000000] clocksource: riscv_clocksource: mask: 0xffffffffffffffff max_cycles: 0x1d854df40, max_idle_ns: 3526361616960 ns
[    0.000006] sched_clock: 64 bits at 1000kHz, resolution 1000ns, wraps every 2199023255500ns
[    0.000223] Console: colour dummy device 80x25
[    0.000296] printk: console [hvc0] enabled
[    0.000296] printk: console [hvc0] enabled
[    0.000382] printk: bootconsole [sbi0] disabled
[    0.000382] printk: bootconsole [sbi0] disabled
[    0.000504] Calibrating delay loop (skipped), value calculated using timer frequency.. 2.00 BogoMIPS (lpj=4000)
[    0.000633] pid_max: default: 32768 minimum: 301
[    0.001024] Mount-cache hash table entries: 2048 (order: 2, 16384 bytes, linear)
[    0.001115] Mountpoint-cache hash table entries: 2048 (order: 2, 16384 bytes, linear)
[    0.004068] rcu: Hierarchical SRCU implementation.
[    0.004819] smp: Bringing up secondary CPUs ...
[    0.004873] smp: Brought up 1 node, 1 CPU
[    0.006542] devtmpfs: initialized
[    0.010322] random: get_random_u32 called from bucket_table_alloc.isra.0+0x4e/0x154 with crng_init=0
[    0.011083] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041785100000 ns
[    0.011277] futex hash table entries: 256 (order: 2, 16384 bytes, linear)
[    0.012973] NET: Registered protocol family 16
[    0.082899] vgaarb: loaded
[    0.085001] SCSI subsystem initialized
[    0.086323] usbcore: registered new interface driver usbfs
[    0.086568] usbcore: registered new interface driver hub
[    0.086805] usbcore: registered new device driver usb
[    0.091012] clocksource: Switched to clocksource riscv_clocksource
[    0.116337] NET: Registered protocol family 2
[    0.117520] tcp_listen_portaddr_hash hash table entries: 512 (order: 2, 20480 bytes, linear)
[    0.117645] TCP established hash table entries: 8192 (order: 4, 65536 bytes, linear)
[    0.117786] TCP bind hash table entries: 8192 (order: 6, 262144 bytes, linear)
[    0.117994] TCP: Hash tables configured (established 8192 bind 8192)
[    0.118239] UDP hash table entries: 512 (order: 3, 49152 bytes, linear)
[    0.118387] UDP-Lite hash table entries: 512 (order: 3, 49152 bytes, linear)
[    0.118909] NET: Registered protocol family 1
[    0.120184] RPC: Registered named UNIX socket transport module.
[    0.120250] RPC: Registered udp transport module.
[    0.120305] RPC: Registered tcp transport module.
[    0.120360] RPC: Registered tcp NFSv4.1 backchannel transport module.
[    0.120434] PCI: CLS 0 bytes, default 64
[    0.121766] Unpacking initramfs...
[    0.322575] Freeing initrd memory: 3164K
[    0.324183] workingset: timestamp_bits=62 max_order=18 bucket_order=0
[    0.370769] NFS: Registering the id_resolver key type
[    0.370858] Key type id_resolver registered
[    0.370909] Key type id_legacy registered
[    0.370990] nfs4filelayout_init: NFSv4 File Layout Driver Registering...
[    0.371424] 9p: Installing v9fs 9p2000 file system support
[    0.372855] NET: Registered protocol family 38
[    0.372994] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 252)
[    0.373074] io scheduler mq-deadline registered
[    0.373128] io scheduler kyber registered
[    0.523949] Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
[    0.531201] 12002000.uart: ttyS0 at MMIO 0x12002000 (irq = 1, base_baud = 1562500) is a 16550A
[    0.536078] 12007000.uart: ttyS1 at MMIO 0x12007000 (irq = 2, base_baud = 1562500) is a 16550A
[    0.538090] [drm] radeon kernel modesetting enabled.
[    0.569040] loop: module loaded
[    0.573582] libphy: Fixed MDIO Bus: probed
[    0.576518] e1000e: Intel(R) PRO/1000 Network Driver - 3.2.6-k
[    0.576582] e1000e: Copyright(c) 1999 - 2015 Intel Corporation.
[    0.577025] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
[    0.577097] ehci-pci: EHCI PCI platform driver
[    0.577280] ehci-platform: EHCI generic platform driver
[    0.577478] ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver
[    0.577572] ohci-pci: OHCI PCI platform driver
[    0.577753] ohci-platform: OHCI generic platform driver
[    0.578609] usbcore: registered new interface driver uas
[    0.578851] usbcore: registered new interface driver usb-storage
[    0.579408] mousedev: PS/2 mouse device common for all mice
[    0.581571] usbcore: registered new interface driver usbhid
[    0.581633] usbhid: USB HID core driver
[    0.584444] NET: Registered protocol family 10
[    0.586799] Segment Routing with IPv6
[    0.586970] sit: IPv6, IPv4 and MPLS over IPv4 tunneling driver
[    0.588789] NET: Registered protocol family 17
[    0.589509] 9pnet: Installing 9P2000 support
[    0.589716] Key type dns_resolver registered
[    0.594772] Freeing unused kernel memory: 232K
[    0.598231] Run /init as init process
mount: you must be root
mount: you must be root
mount: you must be root
mount: you must be root
can't open /dev/null: No such file or directory
can't open /dev/null: No such file or directory
can't open /dev/null: No such file or directory
can't open /dev/null: No such file or directory
hostname: sethostname: Operation not permitted
Starting syslogd: OK
Starting klogd: OK
Running sysctl: OK
Starting network: ip: RTNETLINK answers: Operation not permitted
ip: SIOCSIFFLAGS: Operation not permitted
FAIL

Fix issues found during demo

Demo given on Sept 28, 2022 illustrated a couple of issues:

  • README needs some updating
  • The simulator exits with an error with the --no-run flag
  • total_insts_executed stat isn't being incremented in the model
  • Branches do not seem to be going to the branch unit

CC: @arupc

Setup CI regression

Will need to learn about what works with RISC-V software, but we need to set up CI for Olympia.

Add git SHA to output

Can we modify Olympia such that it always outputs the git SHA used to build the simulator? This will be helpful for reproducibility.

Typos

Currently:
An extraction of the Map/Spara Example Performance Model based on the Sparta Modeling Framework, Olympia is a fully-features OoO CPU performance model for the RISC-V community.

Olympia is a trace-driven simulator runnig instructions streams defined in either JSON format or STF.

Better:

An extraction of the Map/Sparta Example Performance Model based on the Sparta Modeling Framework, Olympia is a fully-featured OoO RISC-V CPU performance model for the RISC-V community.

Olympia is a trace-driven simulator running instruction streams defined in either JSON format or STF.

Central allocator class for all allocations

One of the latest branches of Olympia is seg-faulting due to memory allocators being destroyed before units in simulation.

When a unit in simulation is torn down, it will destroy internal members like queues, buffers, etc., which might still be holding onto objects that were allocated using an Allocator. We need to ensure the Allocators outlive the simulation components.

Ironically this was just asked in Map/Sparta: sparcians/map#445 (comment)
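The fix boils down to a C++ lifetime guarantee: members are destroyed in reverse declaration order. Below is a minimal illustration (not Sparta or Olympia code) showing why declaring the central allocator class before the units makes it the last thing destroyed.

```cpp
#include <string>
#include <vector>

// Records destruction order so we can observe it.
std::vector<std::string> g_destruction_log;

struct Allocators {
    ~Allocators() { g_destruction_log.push_back("allocators"); }
};

struct Unit {
    // A real unit's destructor may release objects owned by the allocators,
    // so the allocators must still be alive at this point.
    ~Unit() { g_destruction_log.push_back("unit"); }
};

struct Simulation {
    Allocators allocators;  // declared first  => destroyed LAST
    Unit       unit;        // declared second => destroyed first
};
```

Reversing the member order (or holding the allocators in a shorter-lived object) reproduces the seg-fault class of bug described above.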

cmake error while trying to build

Trying to build olympia from this repo and ran into this:

(sparta_dev) [master] src/riscv-perf-model/release % mkdir release; cd release
(sparta_dev) [master] src/riscv-perf-model/release % cmake .. -DCMAKE_BUILD_TYPE=Release -DSPARTA_BASE=</path/to/>map/sparta
-- The CXX compiler identification is GNU 10.4.0
-- Detecting CXX compiler ABI info
...
...
CMake Error at CMakeLists.txt:23 (include):
  include could not find requested file:

    /users/achakraborty/mywork/src/riscv-perf-model/stf_lib/cmake/stf-config.cmake


CMake Error at CMakeLists.txt:92 (add_subdirectory):
  The source directory

    /users/achakraborty/mywork/src/riscv-perf-model/stf_lib

  does not contain a CMakeLists.txt file.

Checked that when I build using the previous olympia repo, cmake is successful. Maybe we need to copy one more file from the old repo?

Error building an STF Capable Dromajo

Discussed in #80

Originally posted by Nofal475 August 23, 2023
I am following the steps given to build an STF-capable Dromajo, but for some reason it is failing.
I get a long error message when I run make, part of which is attached below.
If I don't apply dromajo_stf_lib.patch, I don't get this error.
Error:
/home/nofal/miniconda3/envs/sparta/x86_64-conda-linux-gnu/include/c++/12.3.0/type_traits:910:52: error: static assertion failed: template argument must be a complete class or an unbounded array
910 | static_assert(std::__is_complete_or_unbounded(__type_identity<_Tp>{}),
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~
/home/nofal/miniconda3/envs/sparta/x86_64-conda-linux-gnu/include/c++/12.3.0/type_traits:910:52: note: 'std::__is_complete_or_unbounded<__type_identitystf::TraceInfoRecord >((std::__type_identitystf::TraceInfoRecord(), std::__type_identitystf::TraceInfoRecord()))' evaluates to false
/home/nofal/miniconda3/envs/sparta/x86_64-conda-linux-gnu/include/c++/12.3.0/bits/stl_construct.h: In instantiation of 'void std::_Destroy(_ForwardIterator, _ForwardIterator) [with _ForwardIterator = stf::TraceInfoRecord*]':
/home/nofal/miniconda3/envs/sparta/x86_64-conda-linux-gnu/include/c++/12.3.0/bits/alloc_traits.h:850:15: required from 'void std::_Destroy(_ForwardIterator, _ForwardIterator, allocator<_T2>&) [with _ForwardIterator = stf::TraceInfoRecord*; _Tp = stf::TraceInfoRecord]'
/home/nofal/miniconda3/envs/sparta/x86_64-conda-linux-gnu/include/c++/12.3.0/bits/stl_vector.h:730:15: required from 'std::vector<_Tp, _Alloc>::~vector() [with _Tp = stf::TraceInfoRecord; _Alloc = std::allocatorstf::TraceInfoRecord]'
/home/nofal/Workspace/Student/map/riscv-perf-model/traces/stf_trace_gen/dromajo/stf_lib/stf-inc/stf_writer_base.hpp:24:11: required from here
/home/nofal/miniconda3/envs/sparta/x86_64-conda-linux-gnu/include/c++/12.3.0/bits/stl_construct.h:188:51: error: static assertion failed: value type is destructible
188 | static_assert(is_destructible<_Value_type>::value,
| ^~~~~
/home/nofal/miniconda3/envs/sparta/x86_64-conda-linux-gnu/include/c++/12.3.0/bits/stl_construct.h:188:51: note: 'std::integral_constant<bool, false>::value' evaluates to false
/home/nofal/miniconda3/envs/sparta/x86_64-conda-linux-gnu/include/c++/12.3.0/bits/stl_construct.h:195:25: error: invalid use of incomplete type 'std::iterator_traitsstf::TraceInfoRecord*::value_type' {aka 'class stf::TraceInfoRecord'}
195 | std::_Destroy_aux<__has_trivial_destructor(_Value_type)>::
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/nofal/Workspace/Student/map/riscv-perf-model/traces/stf_trace_gen/dromajo/stf_lib/stf-inc/stf_record_types.hpp:2055:11: note: forward declaration of 'std::iterator_traitsstf::TraceInfoRecord*::value_type' {aka 'class stf::TraceInfoRecord'}
2055 | class TraceInfoRecord : public TypeAwareSTFRecord<TraceInfoRecord, descriptors::internal::Descriptor::STF_TRACE_INFO> {
| ^~~~~~~~~~~~~~~
make[2]: *** [CMakeFiles/dromajo.dir/build.make:76: CMakeFiles/dromajo.dir/src/dromajo.cpp.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:147: CMakeFiles/dromajo.dir/all] Error 2
make: *** [Makefile:136: all] Error 2

Issue with Building Conda Environment

Getting this error on an M1 MacBook Air running macOS Ventura 13.3.1 when trying to build via conda. I was able to still run Olympia by skipping this step and just cloning and building Sparta in conda.

(sparta) aaronchan@Aarons-MacBook-Air-2 conda % conda env create -f environment.yml       
Collecting package metadata (repodata.json): done
Solving environment: failed

ResolvePackageNotFound: 
  - xz[version='>=5.2.6',build=h166bdaf_0]
  - sqlite[version='>=3.36.0',build=h9cd32fc_2]
  - yaml-cpp[version='>=0.7.0',build=h27087fc_2]
  - binutils_impl_linux-64[version='>=2.39',build=he00db2b_1]
  - ncurses[version='>=6.3',build=h27087fc_1]
  - liblapack[version='>=3.9.0',build=16_linux64_openblas]
  - libssh2[version='>=1.10.0',build=haa6b8db_3]
  - libffi[version='>=3.4.2',build=h7f98852_5]
  - libgfortran-ng[version='>=12.2.0',build=h69a702a_19]
  - libstdcxx-devel_linux-64[version='>=10.4.0',build=hd38fd1e_19]
  - zstd[version='>=1.5.2',build=h6239696_4]
  - pcre[version='>=8.45',build=h9c3ff4c_0]
  - _openmp_mutex[version='>=4.5',build=2_gnu]
  - libgcc-ng[version='>=12.2.0',build=h65d4601_19]
  - krb5[version='>=1.20.1',build=hf9c8cef_0]
  - libgcc-devel_linux-64[version='>=10.4.0',build=hd38fd1e_19]
  - doxygen[version='>=1.8.20',build=had0d8f1_0]
  - c-ares[version='>=1.18.1',build=h7f98852_0]
  - libcurl[version='>=7.87.0',build=h6312ad2_0]
  - libgfortran5[version='>=12.2.0',build=h337968e_19]
  - python[version='>=3.10.8',build=h257c98d_0_cpython]
  - rhash[version='>=1.4.3',build=h166bdaf_0]
  - hdf5[version='>=1.10.6',build=nompi_h6a2412b_1114]
  - libnsl[version='>=2.0.0',build=h7f98852_0]
  - gxx_linux-64[version='>=10.4.0',build=h6e491c6_11]
  - keyutils[version='>=1.6.1',build=h166bdaf_0]
  - libzlib[version='>=1.2.13',build=h166bdaf_4]
  - bzip2[version='>=1.0.8',build=h7f98852_4]
  - libgomp[version='>=12.2.0',build=h65d4601_19]
  - libopenblas[version='>=0.3.21',build=pthreads_h78a6416_3]
  - rapidjson[version='>=1.1.0',build=he1b5a44_1002]
  - _libgcc_mutex[version='>=0.1',build=conda_forge]
  - libsqlite[version='>=3.40.0',build=h753d276_0]
  - binutils_linux-64[version='>=2.39',build=h5fc0e48_11]
  - icu[version='>=68.2',build=h9c3ff4c_0]
  - libedit[version='>=3.1.20191231',build=he28a2e2_2]
  - openssl[version='>=1.1.1s',build=h0b41bf4_1]
  - tk[version='>=8.6.12',build=h27826a3_0]
  - readline[version='>=8.1.2',build=h0f457ee_0]
  - libiconv[version='>=1.17',build=h166bdaf_0]
  - libuv[version='>=1.44.2',build=h166bdaf_0]
  - expat[version='>=2.5.0',build=h27087fc_0]
  - gcc_impl_linux-64[version='>=10.4.0',build=h5231bdf_19]
  - gxx_impl_linux-64[version='>=10.4.0',build=h5231bdf_19]
  - boost-cpp[version='>=1.76.0',build=h312852a_1]
  - gcc_linux-64[version='>=10.4.0',build=h9215b83_11]
  - zlib[version='>=1.2.13',build=h166bdaf_4]
  - numpy[version='>=1.24.1',build=py310h08bbf29_0]
  - libstdcxx-ng[version='>=12.2.0',build=h46fd767_19]
  - libblas[version='>=3.9.0',build=16_linux64_openblas]
  - cppcheck[version='>=2.7.5',build=py310h94ea96f_1]
  - libsanitizer[version='>=10.4.0',build=h5246dfb_19]
  - ca-certificates[version='>=2022.12.7',build=ha878542_0]
  - libev[version='>=4.33',build=h516909a_1]
  - boost[version='>=1.76.0',build=py310h7c3ba0c_1]
  - ld_impl_linux-64[version='>=2.39',build=hcc3a1bd_1]
  - libuuid[version='>=2.32.1',build=h7f98852_1000]
  - libcblas[version='>=3.9.0',build=16_linux64_openblas]
  - curl[version='>=7.87.0',build=h6312ad2_0]
  - libnghttp2[version='>=1.51.0',build=hdcd2b5c_0]

Error trying to build olympia

Running into the following error trying to build olympia

% cmake .. -DCMAKE_BUILD_TYPE=Release -DSPARTA_BASE=<path/to/>/map/sparta
...
-- Build files have been written to: /users/achakraborty/mywork/src/riscv-perf-model/release
% make olympia
...
[ 44%] Building CXX object core/CMakeFiles/core.dir/Fetch.cpp.o
/users/achakraborty/mywork/src/riscv-perf-model/core/Fetch.cpp: In member function 'void olympia::Fetch::fetchInstruction_()':
/users/achakraborty/mywork/src/riscv-perf-model/core/Fetch.cpp:76:52: error: conversion from 'std::size_t' {aka 'long unsigned int'} to
 'uint32_t' {aka 'unsigned int'} may change value [-Werror=conversion]
   76 |         credits_inst_queue_ -= insts_to_send->size();

Adding a new sparta unit: L2Cache

  • Create a new sparta unit for L2Cache that receives requests from the IL1 and DL1 and sends requests out to the BIU for misses in the L2Cache.
  • L2Cache can handle multiple outstanding/pending misses and supports out-of-order returns from BIU.
  • Separate ports for acks/credits and responses for DL1, IL1 and BIU.

Enhance Olympia performance features

This GH issue is a starting point, or set of tasks, to add more functionality to Olympia based on what exists in the open-source community.

Those tasks are:

  • Find/list open-source performance simulators in the industry that are similar to Olympia (OoO, superscalar models, not necessarily RISC-V based)
  • For each simulator found, list what that simulator supports/has that Olympia lacks
  • Add GH issues to this project so that discussion/work can be determined

Once GH issues are added, this GH issue can be considered complete.

credits should be reduced by i in fetchInstruction()

The main loop in the function fetchInstruction will break abnormally when ex_inst is nullptr. The number of instructions that have been fetched will then be less than upper. I think the credits should be reduced by i, not by upper.
Like this:

uint32_t i = 0;
for(; i < upper; ++i) {
    ...
}
credits_inst_queue_ -= i;
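The same fix in a self-contained form (a simplified stand-in, not the actual Fetch code): because the loop can break early when the trace runs dry, credits must be decremented by the number actually fetched.

```cpp
#include <cstdint>
#include <vector>

// Simplified model of the fetch loop. Returns how many instructions were
// actually sent; `credits` is reduced by that count, not by `upper`.
uint32_t fetchAndCount(const std::vector<int> & trace,
                       uint32_t upper,
                       uint32_t & credits)
{
    uint32_t i = 0;
    for (; i < upper; ++i) {
        if (i >= trace.size()) break;  // the ex_inst == nullptr case
        // ... send trace[i] downstream ...
    }
    credits -= i;  // NOT `credits -= upper`
    return i;
}
```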

simdb example

I've tried to copy some of the simdb examples but I'm not having much luck getting things to work.

Would it be possible to implement an example using olympia?

Mainly I've followed
https://github.com/sparcians/map/blob/master/helios/plato/docs/capturing_hdf5_with_sparta.md

Which I've gotten into a state where it actually produces an h5 file, but I can't actually read it using pandas

(this is the example under plato/demo/data; the same thing occurs in my h5 file too)

[GCC 4.8.5 20150623 (Red Hat 4.8.5-39)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> pd.read_hdf('test.hdf5')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/vl/edatools/intern/python/3.8.3/CentOS-7/lib/python3.8/site-packages/pandas/io/pytables.py", line 398, in read_hdf
    raise ValueError(
ValueError: Dataset(s) incompatible with Pandas data types, not table, or no datasets found in HDF5 file.```

Separate Address/Data Readiness

Separate address and data readiness in the LSU and scoreboard to allow for address calculations to run ahead if data is not ready. Currently we wait for both address and data operands to be ready before starting store operations.
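A minimal sketch of the proposed split (field and method names are hypothetical): each store tracks address and data readiness independently, so address generation can issue as soon as the address operands arrive, even while the store data is still in flight.

```cpp
// Per-store scoreboard state with the address/data readiness split.
struct StoreEntry {
    bool addr_ready  = false;  // address operands have arrived
    bool data_ready  = false;  // data operand (RS2) has arrived
    bool addr_issued = false;  // address calculation has been performed

    // Address calc may run ahead of the data operand.
    bool canIssueAddress() const { return addr_ready && !addr_issued; }

    // The store itself still needs both the translated address and the data.
    bool canWriteToStoreBuffer() const { return addr_issued && data_ready; }
};
```

The current behavior corresponds to gating `canIssueAddress()` on `data_ready` as well; removing that dependence is the whole enhancement.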

Boost 1.81 not working (on macOS) - non-default boost location not supported by CMakeLists

Hi,

I've had some issues getting Olympia to build on macOS (M1, running Ventura 13.1); not sure if the issue is present on other systems: the default boost installed with brew install boost gives v1.81, which has issues in Boost Phoenix that cause builds to fail with duplicate symbols when linking (boostorg/phoenix#111). (The Sparta build also fails on this.)

Just for other people running into this, here's my journey on this issue:

First of all: not to worry, I also have boost v1.76 via homebrew, but it's not in a default location. So I had to build Sparta, Olympia, and therefore also stf_lib with a non-default boost location.

Then it got a bit complicated:

  • Sparta build needs -DBoost_DIR=/opt/homebrew/Cellar/[email protected]/1.76.0_3/lib/cmake/Boost-1.76.0 to point it to the cmake folder for the right boost version. (it does not, for some reason, use the cmake default included files to find boost)
  • Olympia includes sparta-config.cmake so needs the same command line option for make to find Boost.
  • But Olympia includes stf_lib in the build tree as a submodule, which does use the cmake default scripts, and they need another define on the command line: -DBOOST_ROOT=/opt/homebrew/Cellar/[email protected]/1.76.0_3

Which makes for a cmake command that gets a bit verbose:

cmake -S . -B Release -DCMAKE_BUILD_TYPE=Release -DSPARTA_BASE=$HOME/src/map/sparta \
   -DBOOST_ROOT=/opt/homebrew/Cellar/[email protected]/1.76.0_3 \
   -DBoost_DIR=/opt/homebrew/Cellar/[email protected]/1.76.0_3/lib/cmake/Boost-1.76.0

Which then passes cmake but fails at build, because boost is not added as an include for core and mss.

Which I've fixed in the respective CMakeLists.txt, PR incoming soon.

But more generally: it would be nice to harmonise the boost config across Sparta, Olympia and stf_lib, so only 1 command line arg is needed to use it if it's in a non-default place. This avoids the double command line args needed in cmake.

I am willing to file a PR against Sparta as well to harmonise it to the standard cmake way of finding boost, but the comments in the CMakeLists.txt there seem to hint the custom way is there for (historic) reasons.

Float to Integer instructions cause assertion: "Got an I2F instruction in an ExecutionPipe that does not source the integer RF"

All of the float-to-integer instructions cause this assertion to fire within the execute pipe:

if(SPARTA_EXPECT_FALSE(ex_inst->getPipe() == InstArchInfo::TargetPipe::I2F)) {
    sparta_assert(reg_file_ == core_types::RegFile::RF_INTEGER,
                  "Got an I2F instruction in an ExecutionPipe that does not source the integer RF");

If the intention is that these instructions use the i2f target pipe (not a new f2i type), then I think the code just needs to be changed to handle f2i, and the assertion removed.

Test trace:

[
    {
        "mnemonic": "fcvt.w.s",
        "rs1": 1,
        "rs2": 2,
        "rd": 3
    }
]

cmake fails complaining about relative paths found in include directories

I'm not sure if this is the right forum to discuss this. I built sparta on my Ubuntu system and ran cmake in the Olympia directory, which results in cmake complaining about relative paths in the include directories for zstd.

❯ cmake .. -DCMAKE_BUILD_TYPE=Release -DSPARTA_BASE=/scratch/joy/workspace/map/sparta
..
..
zstd_LIBRARIES (ADVANCED)
    linked by target "olympia" in directory /home/joy/workspace/riscv-perf-model
    linked by target "Dispatch_test" in directory /home/joy/workspace/riscv-perf-model/test/core/dispatch

CMake Error in CMakeLists.txt:
  Found relative path while evaluating include directories of "olympia":

    "zstd_INCLUDE_DIRS-NOTFOUND"



CMake Error in core/CMakeLists.txt:
  Found relative path while evaluating include directories of "core":

    "zstd_INCLUDE_DIRS-NOTFOUND"



CMake Error in mss/CMakeLists.txt:
  Found relative path while evaluating include directories of "mss":

    "zstd_INCLUDE_DIRS-NOTFOUND"

Model cache replacement policies

Develop replacement policies (random, LRU, TreePLRU) using Sparta for the various caches in Olympia.

Add documentation on how to add more complicated replacement policies.
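For reference, Tree-PLRU is the least obvious of the three policies. Here is a self-contained 4-way sketch (independent of Sparta, purely illustrative): each internal node of a binary tree holds one bit pointing toward the pseudo-least-recently-used side, and touching a way flips the bits on its path to point away from it.

```cpp
#include <cstdint>

// Tree-PLRU for a single 4-way set: 3 bits total (root + one per pair).
// left_/right_ encode which way of the pair is the PLRU candidate (0 or 1).
class TreePLRU4
{
public:
    // On an access, point the tree bits AWAY from the touched way.
    void touch(uint32_t way) {
        if (way < 2) { root_ = 1; left_  = (way == 0); }  // LRU moves right
        else         { root_ = 0; right_ = (way == 2); }  // LRU moves left
    }

    // Follow the bits toward the pseudo-LRU leaf.
    uint32_t victim() const {
        return (root_ == 0) ? (left_ ? 1u : 0u) : (right_ ? 3u : 2u);
    }

private:
    uint8_t root_ = 0, left_ = 0, right_ = 0;
};
```

Generalizing to N ways means storing N-1 bits per set and walking a log2(N)-deep tree; that generalization is a good candidate for the documentation item above.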

Rename reference counters not decremented for AMO instructions

There is a bug in the rename implementation: it doesn't decrement the reference counter for RS2 mappings on AMO instructions, leading to a lock-up once all rename registers are exhausted.

You can easily reproduce this by using a trace of repeated AMOs like:

   {
        "mnemonic": "amoadd.w",
        "rs1": 1,
        "rs2": 1,
        "rd": 1
    },
   {
        "mnemonic": "amoadd.w",
        "rs1": 1,
        "rs2": 1,
        "rd": 1
    }
...

olympia -p top.cpu.core0.rename.params.num_integer_renames 48 -l "top.cpu*" info info.log traces/amo_lockup.json

No error message is printed for the lockup, but you can see from the logfile that the trace hasn't finished:

{0000000084 00000084 top.cpu.core0.dispatch info} receiveCredits_: lsu got 1 credits, total: 8
{0000000084 00000084 top.cpu.core0.dispatch info} scheduleDispatchSession: no rob credits or no instructions to process
{0000000084 00000084 top.cpu.core0.lsu info} completeInst_: Complete Load Instruction: amoadd.w uid(16)
{0000000085 00000085 top.cpu.core0.rob info} retireInstructions_: num to retire: 1
{0000000085 00000085 top.cpu.core0.rob info} retireInstructions_: retiring uid: 16    RETIRED 0 pid: 16 'amoadd.w       1,1,1' 
{0000000085 00000085 top.cpu.core0.dispatch info} scheduleDispatchSession: no rob credits or no instructions to process
{0000000085 00000085 top.cpu.core0.dispatch info} robCredits_: ROB got 1 credits, total: 30
{0000000086 00000086 top.cpu.core0.rename info} getAckFromROB_: Retired instruction: uid: 16    RETIRED 0 pid: 16 'amoadd.w     1,1,1' 
{0000000086 00000086 top.cpu.core0.rename info} scheduleRenaming_: current stall: NO_RENAMES
{0000000000 -------- top.cpu.core0.rob info} ~ROB: ROB is destructing now, but you can still see this message

The bug appears to be caused by a mismatch between the conditions used to update the reference counter: the increment is gated on isLoadStoreInst() && exists(RS2), whereas the decrement is gated on just isStoreInst(). Atomics therefore trigger only the increment logic.

https://github.com/riscv-software-src/riscv-perf-model/blob/72ba498a95b4d4f8462e47eea3a6250db5288053/core/Rename.cpp#L273C1-L281C59

                    if(renaming_inst->isLoadStoreInst()) {
                        // check for data operand existing based on RS2 existence
                        // store data register info separately
                        if(src.field_id == mavis::InstMetaData::OperandFieldID::RS2) {
                            const auto rf  = olympia::coreutils::determineRegisterFile(src);
                            const auto num = src.field_value;
                            auto & bitmask = renaming_inst->getDataRegisterBitMask(rf);
                            const uint32_t prf = map_table_[rf][num];
                            reference_counter_[rf][prf]++;

https://github.com/riscv-software-src/riscv-perf-model/blob/72ba498a95b4d4f8462e47eea3a6250db5288053/core/Rename.cpp#L136C1-L139C10

        if(inst_ptr->isStoreInst()) {
            const auto & data_reg = inst_ptr->getRenameData().getDataReg();
            --reference_counter_[data_reg.rf][data_reg.val];
        }

Furthermore, it appears that the store data register is not released immediately. I don't believe this is as severe, because the mapping will be released once the architectural register is overwritten later on, but it still seems like a good idea to insert it into the freelist as soon as possible.
