
gthparch / macsim


A heterogeneous architecture timing model simulator.

Home Page: http://comparch.gatech.edu/hparch/macsim.html

Makefile 0.62% Python 6.71% C 15.61% Shell 0.08% C++ 75.86% Perl 1.00% M4 0.12%

macsim's Introduction

Macsim

Introduction

  • MacSim is a heterogeneous architecture timing model simulator developed at the Georgia Institute of Technology.
  • It simulates x86, ARM64, NVIDIA PTX, and Intel GEN GPU instructions and can be configured as either a trace-driven or an execution-driven cycle-level simulator. It models detailed microarchitectural behaviors, including pipeline stages, multi-threading, and memory systems.
  • MacSim is capable of simulating a variety of architectures, such as Intel's Sandy Bridge and Skylake (both CPUs and GPUs) and NVIDIA's Fermi. It can run homogeneous-ISA as well as heterogeneous-ISA multicore simulations. It also supports asymmetric multicore configurations (small cores + medium cores + big cores) and SMT or MT architectures.
  • Currently, an interconnection network model (based on IRIS) and a power model (based on McPAT) are connected.
  • MacSim is also one of the components of SST, so multiple MacSim simulators can run concurrently.
  • The project has been supported by Intel, NSF, and Sandia National Lab.

Note

  • If you're interested in Intel's integrated GPU model in MacSim, please refer to the intel_gpu branch.

  • We've developed a power model for GPU architecture using McPAT. Please refer to the following paper for more detailed information: Power Modeling for GPU Architecture using McPAT, by Jieun Lim, Nagesh B. Lakshminarayana, Hyesoon Kim, William Song, Sudhakar Yalamanchili, Wonyong Sung, in Transactions on Design Automation of Electronic Systems (TODAES), Vol. 19, No. 3.

  • We've characterised the performance of Intel's integrated GPUs using MacSim. Please refer to the following paper for more detailed information. Performance Characterisation and Simulation of Intel's Integrated GPU Architecture (ISPASS'18)

Intel GEN GPU Architecture

  • Intel GEN9 GPU Architecture:

Documentation

Please see MacSim documentation file for more detailed descriptions.

Download

  • You can download the latest copy from our git repository.
git clone -b intel_gpu https://github.com/gthparch/macsim.git

Download traces

/macsim/tools/download_trace_files.py

Build

./build.py --ramulator (please see /macsim/INSTALL)

People

Q & A

If you have a question, please open a GitHub issue.

Tutorial

  • We had a tutorial at HPCA-2012. Please visit here for the slides.
  • We had a tutorial at ISCA-2012. Please visit here for the slides.

SST+MacSim

  • Here are two example configurations of SST+MacSim.
    • A multi-socket system with cache coherence model:
    • A CPU+GPU heterogeneous system with shared memory:

macsim's People

Contributors

ammrat13, bluetopaz, farzonl, ghjeong12, hyesoon, jieuntemp, jlee663, macsimgt, mgoldstein322, nailifeng, omikun, pgera, pranith, ramyadhadidi, samjijina, williamwangpeng


macsim's Issues

execution-driven?

Hi,

As said in README.md, macsim can be either trace driven or execution-driven, but I didn't find any execution-driven example.

disabling BP has no impact in case of CPU (x86)

Hello,
While running a few experiments with the BP, I ran into an issue: whether we disable BP or not, the results do not change.
Looking into the issue, I found that in the frontend_c::predict_bpu function, the BP-disabled path has no effect in the CPU case (I am also not sure how it works in the GPU case):
if (*m_simBase->m_knobs->KNOB_USE_BRANCH_PREDICTION) {
  // do nothing
}
// no branch prediction
else {
  // GPU : stall on branch policy, stop fetching
  if (m_ptx_sim && *m_simBase->m_knobs->KNOB_MT_NO_FETCH_BR) {
    set_br_wait(uop->m_thread_id);
    mispredicted = false;
  }
}

My understanding from the flow is that it should return mispredicted = true if BP is disabled?
Although disabling BP is semantically not the same as a misprediction, this is the closest I can get from the code.
Right now, simply disabling BP has no impact.

In the meantime, I did something like:
if (*m_simBase->m_knobs->KNOB_USE_BRANCH_PREDICTION) {
  // do nothing
}
// no branch prediction
else {
  // GPU : stall on branch policy, stop fetching
  if (m_ptx_sim && *m_simBase->m_knobs->KNOB_MT_NO_FETCH_BR) {
    set_br_wait(uop->m_thread_id);
    mispredicted = false;
  } else {
    // CPU x86 path should return misprediction in case that bp is disabled!
    mispredicted = true;
  }
}
Not sure if this is what originally the code intended or not.
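
A standalone sketch can make the difference between the two variants above concrete. `BpKnobs`, `incoming`, and both function names are hypothetical stand-ins introduced for illustration (they are not MacSim identifiers); `incoming` is assumed to be whatever value `mispredicted` holds before this branch is reached.

```cpp
// Hypothetical stand-ins for the knobs quoted in this issue; not MacSim code.
struct BpKnobs {
  bool use_branch_prediction;  // models KNOB_USE_BRANCH_PREDICTION
  bool mt_no_fetch_br;         // models KNOB_MT_NO_FETCH_BR
  bool ptx_sim;                // models m_ptx_sim (true for a GPU core)
};

// Branch as quoted: with BP disabled, only the GPU stall-on-branch path
// changes the flag; every other path returns `incoming` unchanged.
constexpr bool mispredicted_as_quoted(const BpKnobs& k, bool incoming) {
  return k.use_branch_prediction           ? incoming
         : (k.ptx_sim && k.mt_no_fetch_br) ? false
                                           : incoming;
}

// Workaround from this issue: the added `else` forces a misprediction
// whenever BP is disabled and the GPU stall path is not taken.
constexpr bool mispredicted_with_workaround(const BpKnobs& k, bool incoming) {
  return k.use_branch_prediction           ? incoming
         : (k.ptx_sim && k.mt_no_fetch_br) ? false
                                           : true;
}

// x86 core, BP disabled, flag initially false.
static_assert(!mispredicted_as_quoted(BpKnobs{false, false, false}, false),
              "quoted code leaves the flag unchanged");
static_assert(mispredicted_with_workaround(BpKnobs{false, false, false}, false),
              "workaround reports a misprediction");
```

Note that in this sketch, as in the modified snippet, the added `else` also covers a GPU core running without KNOB_MT_NO_FETCH_BR, not only the x86 path named in the comment.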

Mohamed

Generating traces for integrated GPU

Hello,

I have a question regarding generating traces for OpenCL kernels running on an integrated GPU.

I'm using the tools in the x86_trace_generator directory to generate traces for OpenCL kernels to run on an integrated GPU. So far, the generated traces are only for one thread, which I think is for the whole kernel. I'm using both the -thread knob and -skipthread0 in this case. Even though I set the -thread knob to X threads, a trace file for only one thread is generated.

How can I generate traces for multiple threads (for example, 168 threads) for OpenCL kernels to use when simulating an integrated GPU?

Thank you,

about pin 2.12 version

The MacSim master version needs Pin 2.12, but this version can no longer be downloaded from the official Intel Pin website.

DRAM interface for WRITE/READ?

Hello,

I am looking at the provided DRAM interfaces for both DRAMsim and Ramulator, and something is not clear to me. In detail, in dram_dramsim.cc line 142 we see:
if (m_dramsim->addTransaction(req->m_type == MRT_WB,static_cast<uint64_t>(req->m_addr))) ..
and also in dram_ramulator.cc line 145 we have:
if (req->m_type != MRT_WB) { ...

Here it seems that every request generated by the cores is either of type MRT_WB and treated as a WRITE request, or anything else and treated as a READ request to DRAM.

Investigating with two different traces from IsolBench (Bandwidth Read and Bandwidth Write), I can confirm that most of the requests of the Bandwidth Read application are of type MRT_DFETCH and most of the requests of the Bandwidth Write application are of type MRT_DSTORE (not MRT_WB). Therefore, a simulation with Ramulator/DRAMsim always receives READ requests even when the core generates MRT_DSTORE.

Can you perhaps elaborate on this and let me know why all requests are classified only as MRT_WB or not MRT_WB? I believe MRT_DSTORE should not be considered a READ request! (Or am I wrong?)

Fixing the statement to consider: m_type == MRT_WB || m_type == MRT_DSTORE as WRITE request gives ASSERT FAILED for both Ramulator/DRAMsim.
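
The two classifications discussed above can be written down as a standalone sketch. The enum below is a hypothetical subset that reuses only the three type names mentioned in this issue, and the function names are made up for illustration. One possible explanation for the existing behavior, stated as an assumption rather than a confirmed answer: in a write-back, write-allocate cache hierarchy a store miss first fetches the line from DRAM (a read), and the modified data only reaches DRAM later as a write-back.

```cpp
// Hypothetical subset of request types, using the names from this issue.
enum class MemReqType { MRT_DFETCH, MRT_DSTORE, MRT_WB };

// Classification as quoted from dram_dramsim.cc and dram_ramulator.cc:
// only write-backs are sent to DRAM as writes.
constexpr bool is_dram_write_as_quoted(MemReqType t) {
  return t == MemReqType::MRT_WB;
}

// Variant tried in this issue: stores are treated as writes as well.
// The issue reports that this change trips an assertion in the simulator.
constexpr bool is_dram_write_with_stores(MemReqType t) {
  return t == MemReqType::MRT_WB || t == MemReqType::MRT_DSTORE;
}

static_assert(!is_dram_write_as_quoted(MemReqType::MRT_DSTORE),
              "quoted code issues a store miss to DRAM as a read");
static_assert(is_dram_write_with_stores(MemReqType::MRT_DSTORE),
              "the variant issues it as a write");
```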

Any feedback would be appreciated.

IARG_BRANCH_TARGET_ADDR is invalid for XBEGIN and XEND

The INS_IsControlFlow function is called in the instrument function to check whether the IARG_BRANCH_TARGET_ADDR argument is valid for the instruction. However, there are two instructions for which INS_IsControlFlow returns true but IARG_BRANCH_TARGET_ADDR is invalid. These instructions include XBEGIN and XEND. Therefore the code should be changed to:

if (INS_IsControlFlow(ins) && !INS_IsXend(ins) && !INS_IsXbegin(ins))
{
...
}

Credit goes to Divya Praneetha for stumbling upon this bug.

Questions:llc_decoupled_network

Hello
I hope you are well
When the memory type is llc_decoupled_network, the L3 caches are disabled and not connected to the other caches. Why, then, is a router still assigned to the L3 caches?
// NEXT_ID, PREV_ID, DONE, COUPLE_UP, COUPLE_DOWN, DISABLE, HAS_ROUTER
m_l3_cache[ii]->init(-1, -1, false, false, false, true, true);
Thank you very much for your reply.

Shared LLC(between CPU-GPU)

Hello
I hope you are well
First, how can I set up an LLC shared between the CPU and the GPU? Second, how can I tell whether a block in the LLC belongs to the CPU or the GPU?

Question

Hello
I hope you are well
Are there any GPU/CPU MacSim traces available for public access?

Perfect_BP should not have an impact if USE_BRANCH_PREDICTION was set to false

Hello,
Another issue in the frontend_c::predict_bpu function is that if BP was disabled but the user sets perfect_BP, it will still act as perfect.
It might be more of a usability than a correctness issue but it sounds more intuitive if BP was disabled to void the effect of the perfect_BP parameter.
In that case, this line of code:

if (*m_simBase->m_knobs->KNOB_USE_BRANCH_PREDICTION) mispredicted = false;

can be replaced by a condition that checks both knobs:

if ((*m_simBase->m_knobs->KNOB_USE_BRANCH_PREDICTION) && (*m_simBase->m_knobs->KNOB_PERFECT_BP)) mispredicted = false;
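
The proposed condition can be checked over the knob combinations in a small standalone sketch. `PerfectBpKnobs`, `resolve_mispredicted`, and `predictor_result` are hypothetical names used only here; `predictor_result` stands for the outcome the function would otherwise report.

```cpp
// Hypothetical stand-ins for the two knobs named in this issue.
struct PerfectBpKnobs {
  bool use_branch_prediction;  // models KNOB_USE_BRANCH_PREDICTION
  bool perfect_bp;             // models KNOB_PERFECT_BP
};

// Proposed rule: perfect BP clears the misprediction only when branch
// prediction itself is enabled; otherwise the incoming result is kept.
constexpr bool resolve_mispredicted(const PerfectBpKnobs& k,
                                    bool predictor_result) {
  return (k.use_branch_prediction && k.perfect_bp) ? false : predictor_result;
}

static_assert(!resolve_mispredicted(PerfectBpKnobs{true, true}, true),
              "perfect BP hides the misprediction when BP is enabled");
static_assert(resolve_mispredicted(PerfectBpKnobs{false, true}, true),
              "perfect_BP is ignored when BP is disabled");
```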

Mohamed

Trace Generator bug in creating traces for multi-threaded code

I am trying to use Intel PIN and trace generator to generate traces for code that is multithreaded, however I get a segmentation fault.

Directions to reproduce:

  1. Sample code (from online source):
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* function to be run as a thread always must have the same signature:
   it has one void* parameter and returns void* */
void *threadfunction(void *arg)
{
  printf("Hello, World!\n"); /*printf() is specified as thread-safe as of C11*/
  return 0;
}

int main(void)
{
  pthread_t thread;
  int createerror = pthread_create(&thread, NULL, threadfunction, NULL);
  int createerror2 = pthread_create(&thread, NULL, threadfunction, NULL);
  /*creates a new thread with default attributes and NULL passed as the argument to the start routine*/
  if (!createerror && !createerror2) /*check whether the thread creation was successful*/
    {
      pthread_join(thread, NULL); /*wait until the created thread terminates*/
      return 0;
    }
  fprintf(stderr, "%s\n", strerror(createerror));
  return 1;
}
  2. Compile with gcc pthread_hello_world.c -o HelloWorld -l pthread -Wall

  3. Run make clean && make in ../tools/x86_trace_generator

  4. Run pin: pin -t obj-intel64/trace_generator.so -- ./HelloWorld

Output:

-> Thread[0->0] begins.
-> Trace Generation Starts at icount 0
-> Thread[1->0] begins.
-> Trace Generation Starts at icount 0
-> Thread[2->0] begins.
-> Trace Generation Starts at icount 0
C: Tool (or Pin) caused signal 11 at PC 0x7fb2984de6bb
Segmentation fault (core dumped)

error while compiling macsim with SST

I'm getting the following error while trying to compile MacSim with SST.

##################################################################
make[6]: Entering directory '/mnt/New_Volume1/Acads/DDP/simulators/SST/src/sstelements-8.0.0/sst-elements-library-8.0.0/src/sst/elements/macsim'
CXX macsimComponent.lo
macsimComponent.cpp:6:44: fatal error: sst/core/serialization/element.h: No such file or directory
#include <sst/core/serialization/element.h>

compilation terminated.
##################################################################

I'm using the following commands:
./autogen.sh
./configure --prefix=$SST_HOME --with-boost=$BOOST_HOME
make && make install

Please help.

About a XED library updating issue

Hello, I followed the instructions on the website for updating XED with Pin 2.12.
But when I applied step 5, "replace string tr_opcode_names[66] = { in trace_generator (trace_generator.h) => extract_category()", I got the following errors:
trace_generator.cpp: In function 'void sanity_check()':
trace_generator.cpp:1165: error: 'tr_opcode_names' was not declared in this scope
trace_generator.cpp: In function 'void write_inst_to_file(std::ofstream*, Inst_info*)':
trace_generator.cpp:1199: error: 'tr_opcode_names' was not declared in this scope
trace_generator.cpp: In function 'void dprint_inst(LEVEL_BASE::ADDRINT, std::string*, LEVEL_VM::THREADID)':
trace_generator.cpp:1245: error: 'tr_opcode_names' was not declared in this scope
make: *** [obj-intel64/trace_generator.o] Error 1
It seems that the function extract_category() does not work as a replacement for the default definition.
Also, I have modified the initialization in xed_extractor.cpp:
void initialize_sim(void)
{
  extract_register();
  extract_category();
  macsim_opcode();
  macsim_opcode_enum();
  macsim_reg();
  macsim_pin_convert();
}
Is that correct?
Hope someone can help me, thanks.

frontend_c::btb_access function returns wrong value if BTB was disabled

Hello,
First of all, thank you for the efforts. I have been using Macsim since its early days (2012) as a PhD student and still use it.
I appreciate its cleanness and readability.
Recently, I came across this issue in the branch prediction.
If BTB is disabled, my understanding is that the frontend function frontend_c::btb_access should return true, since the return value in fact indicates whether there was a BTB miss. (As a side note, for the same reason the function name might be counterintuitive, since it actually determines whether there is a BTB miss.)

If this understanding is correct, this line:
if (!*KNOB(KNOB_ENABLE_BTB)) return false;
should be:
if (!*KNOB(KNOB_ENABLE_BTB)) return true;

With the original code, whether BTB is disabled or perfect in fact gives the same result.
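
The claim that a disabled BTB and a perfect BTB behave identically can be reproduced in a standalone sketch. All names are hypothetical: `perfect_btb` stands for whichever knob selects a perfect BTB (assumed here to never miss), and `lookup_hit` stands for the result of a real BTB lookup. Both functions return true on a BTB miss, following the return-value convention described above.

```cpp
// Hypothetical stand-ins; not MacSim code.
struct BtbKnobs {
  bool enable_btb;   // models KNOB_ENABLE_BTB
  bool perfect_btb;  // hypothetical knob: a perfect BTB never misses
};

// Behavior as quoted: a disabled BTB reports "no miss".
constexpr bool btb_miss_as_quoted(const BtbKnobs& k, bool lookup_hit) {
  return !k.enable_btb ? false : k.perfect_btb ? false : !lookup_hit;
}

// Behavior proposed in this issue: a disabled BTB always reports a miss.
constexpr bool btb_miss_proposed(const BtbKnobs& k, bool lookup_hit) {
  return !k.enable_btb ? true : k.perfect_btb ? false : !lookup_hit;
}

// Quoted code: disabled and perfect are indistinguishable.
static_assert(btb_miss_as_quoted(BtbKnobs{false, false}, false) ==
                  btb_miss_as_quoted(BtbKnobs{true, true}, false),
              "disabled BTB looks like a perfect BTB");
// Proposed code: they differ.
static_assert(btb_miss_proposed(BtbKnobs{false, false}, false) !=
                  btb_miss_proposed(BtbKnobs{true, true}, false),
              "disabled BTB now reports a miss");
```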

Thanks,
Mohamed

about the use of macsim sst

I use SST to connect to MacSim. I noticed "ifdef USING_SST" in the MacSim code, but I could not find where USING_SST is defined. When I tried to print USING_SST, the result was empty. Could you tell me how to define USING_SST? An example would be helpful.
Thanks!
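
For reference, `#ifdef` only tests whether a macro is defined, and the generic ways to define one are a `#define` in the source or a compiler flag such as `-DUSING_SST` (GCC and Clang). The sketch below shows that mechanism in isolation; how the MacSim or SST build scripts actually set the macro is not covered in this thread, and `kUsingSst` and `built_with_sst` are hypothetical names.

```cpp
// Build with `g++ -DUSING_SST file.cc` to select the first branch,
// or without the flag to select the second one.
#ifdef USING_SST
constexpr bool kUsingSst = true;
#else
constexpr bool kUsingSst = false;
#endif

// Hypothetical helper: reports whether the macro reached the compiler.
constexpr bool built_with_sst() { return kUsingSst; }
```

Printing the result of `built_with_sst()` at startup is one way to check whether the flag was actually passed to the compiler for a given build.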

GPU traces with previous version of MacSim?

I received the Parboil and the GraphBig-10K traces. I ran some tests to see whether they work with MacSim and found that my MacSim version is not able to read them (the traces that I have are version 14). To clarify, I am using MacSim version 1.2, and it appears that trace version 14 only works with the newer release of MacSim. Is there any workaround that enables an older MacSim (version 1.2, to be specific) to work with the traces that Sam handed to me?

For example, in the mri-q benchmark, I modified the kernel_config.txt to be:
-1 newptx
/trace2/ptx_jun2016/parboil_jul16/mri-q/small/_Z17ComputePhiMag_GPUPfS_S_i_0/Trace.txt
/trace2/ptx_jun2016/parboil_jul16/mri-q/small/_Z12ComputeQ_GPUiiPfS_S_S_S__0/Trace.txt

and also I modified the beginning of Trace.txt files as follows:
1024 newptx 6
and
16 newptx 3

Specifically, I receive the following when trying the traces version 14 with my MacSim:

src/trace_read.cc:1599: ASSERT FAILED (I=27 C=17152): temp_num_req > 0
src/trace_read.cc:1599: ASSERT FAILED (I=27 C=17152): size:0 max:64 num:0 type:2 num:1

Can you please advise?

Having difficulties with python when building

Hi,

I am getting some errors with build scripts in python.

Initially I attempted to compile with Python 3.9 and got errors regarding ConfigParser. An internet search revealed that this may be due to the Python version, as the ConfigParser module was renamed to configparser.

Due to this problem I installed python 2.7. But this time I get this error:

scons: Reading SConscript files ...
AttributeError: '_Environ' object has no attribute 'iteritems':
  File "/mnt/c/Users/hp1/Desktop/doga/assignment3/macsim/SConstruct", line 63:
    pre_compile_check()
  File "/mnt/c/Users/hp1/Desktop/doga/assignment3/macsim/SConstruct", line 51:
    for key,val in os.environ.iteritems():

How can I solve this? Any ideas?

Thanks in advance.

Configuration:

  • Windows 10
  • WSL 2
  • ArchLinux
