cmu-safari / ramulator

A Fast and Extensible DRAM Simulator, with built-in support for modeling many different DRAM technologies including DDRx, LPDDRx, GDDRx, WIOx, HBMx, and various academic proposals. Described in the IEEE CAL 2015 paper by Kim et al. at http://users.ece.cmu.edu/~omutlu/pub/ramulator_dram_simulator-ieee-cal15.pdf

License: MIT License

Languages: Makefile 0.33%, Python 1.64%, C++ 93.62%, Shell 2.16%, Perl 2.09%, C 0.16%

ramulator's Introduction

We released an updated version of Ramulator, called Ramulator 2.0, in August 2023. Ramulator 2.0 is easier to use, extend, and modify. It also supports the latest DRAM standards at the time (e.g., DDR5, LPDDR5, HBM3, GDDR6). We suggest that you use Ramulator 2.0 and welcome your feedback and bug/issue reports.

Ramulator: A DRAM Simulator

Ramulator is a fast and cycle-accurate DRAM simulator [1, 2] that supports a wide array of commercial, as well as academic, DRAM standards:

DDR3 (2007), DDR4 (2012)
LPDDR3 (2012), LPDDR4 (2014)
GDDR5 (2009)
WIO (2011), WIO2 (2014)
HBM (2013)
SALP [3]
TL-DRAM [4]
RowClone [5]
DSARP [6]

The initial release of Ramulator is described in the following paper:

Y. Kim, W. Yang, O. Mutlu. "Ramulator: A Fast and Extensible DRAM Simulator". In IEEE Computer Architecture Letters, March 2015.

For information on new features, along with an extensive memory characterization using Ramulator, please read:

S. Ghose, T. Li, N. Hajinazar, D. Senol Cali, O. Mutlu. "Demystifying Complex Workload–DRAM Interactions: An Experimental Study". In Proceedings of the ACM International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS), June 2019. In Proceedings of the ACM on Measurement and Analysis of Computing Systems (POMACS), 2019.

[1] Kim et al. Ramulator: A Fast and Extensible DRAM Simulator. IEEE CAL 2015.
[2] Ghose et al. Demystifying Complex Workload–DRAM Interactions: An Experimental Study. SIGMETRICS 2019.
[3] Kim et al. A Case for Exploiting Subarray-Level Parallelism (SALP) in DRAM. ISCA 2012.
[4] Lee et al. Tiered-Latency DRAM: A Low Latency and Low Cost DRAM Architecture. HPCA 2013.
[5] Seshadri et al. RowClone: Fast and Energy-Efficient In-DRAM Bulk Data Copy and Initialization. MICRO 2013.
[6] Chang et al. Improving DRAM Performance by Parallelizing Refreshes with Accesses. HPCA 2014.

Usage

Ramulator supports three different usage modes.

Memory Trace Driven: Ramulator directly reads memory traces from a file, and simulates only the DRAM subsystem. Each line in the trace file represents a memory request, with the hexadecimal address followed by 'R' or 'W' for read or write.

0x12345680 R
0x4cbd56c0 W
...
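As an illustration of the memory trace format, the following C++ sketch parses one line into an address and a read/write flag. It is a minimal example written against the format described above, not Ramulator's own trace reader; the names MemTraceEntry and parse_mem_trace_line are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <stdexcept>
#include <string>

// One request from a memory trace line: "<hex-address> <R|W>".
struct MemTraceEntry {
    std::uint64_t addr;
    bool is_write;
};

// Parses a line such as "0x12345680 R" (hypothetical helper).
// Throws std::invalid_argument on malformed input.
MemTraceEntry parse_mem_trace_line(const std::string& line) {
    std::istringstream in(line);
    std::string addr_tok, type_tok;
    if (!(in >> addr_tok >> type_tok))
        throw std::invalid_argument("expected '<addr> <R|W>': " + line);

    MemTraceEntry e;
    // Base 16 also accepts an optional "0x" prefix.
    e.addr = std::stoull(addr_tok, nullptr, 16);
    if (type_tok == "R")
        e.is_write = false;
    else if (type_tok == "W")
        e.is_write = true;
    else
        throw std::invalid_argument("unknown request type: " + type_tok);
    return e;
}
```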

CPU Trace Driven: Ramulator directly reads instruction traces from a file, and simulates a simplified model of a "core" that generates memory requests to the DRAM subsystem. Each line in the trace file represents a memory request, and can have one of the following two formats.

<num-cpuinst> <addr-read>: For a line with two tokens, the first token represents the number of CPU (i.e., non-memory) instructions before the memory request, and the second token is the decimal address of a read.
<num-cpuinst> <addr-read> <addr-writeback>: For a line with three tokens, the third token is the decimal address of the writeback request, which is the dirty cache-line eviction caused by the read request before it.
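The two CPU-trace line formats can be told apart by their token count. A minimal C++ sketch under that assumption follows; CpuTraceEntry and parse_cpu_trace_line are hypothetical names, not part of Ramulator. Because std::stoull throws std::invalid_argument when a token has no leading digits, non-numeric input is reported as an exception rather than silently parsed.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <stdexcept>
#include <string>

// One request from a CPU trace line (hypothetical record).
struct CpuTraceEntry {
    std::uint64_t bubble_count;    // non-memory instructions before the request
    std::uint64_t read_addr;       // decimal address of the read
    bool has_writeback;            // true if a third token was present
    std::uint64_t writeback_addr;  // dirty-line eviction caused by the read
};

// Parses "<num-cpuinst> <addr-read> [<addr-writeback>]".
// std::stoull throws std::invalid_argument if a token has no leading digits.
CpuTraceEntry parse_cpu_trace_line(const std::string& line) {
    std::istringstream in(line);
    std::string tok[3];
    int n = 0;
    while (n < 3 && in >> tok[n]) ++n;
    std::string extra;
    if (n < 2 || (in >> extra))
        throw std::invalid_argument("expected 2 or 3 tokens: " + line);

    CpuTraceEntry e{};
    e.bubble_count = std::stoull(tok[0]);
    e.read_addr = std::stoull(tok[1]);
    e.has_writeback = (n == 3);
    if (e.has_writeback) e.writeback_addr = std::stoull(tok[2]);
    return e;
}
```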

gem5 Driven: Ramulator runs as part of a full-system simulator (gem5 [7]), from which it receives memory requests as they are generated.

For some of the DRAM standards, Ramulator is also capable of reporting power consumption by relying on either VAMPIRE [8] or DRAMPower [9] as the backend.

[7] The gem5 Simulator System.
[8] Ghose et al. What Your DRAM Power Models Are Not Telling You: Lessons from a Detailed Experimental Study. SIGMETRICS 2018.
[9] Chandrasekar et al. DRAMPower: Open-Source DRAM Power & Energy Estimation Tool. IEEE CAL 2015.

Getting Started

Ramulator requires a C++11 compiler (e.g., clang++, g++-5).
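One quick way to check whether a toolchain's standard library is recent enough is to test for movable file streams: storing std::ofstream objects in a std::vector only compiles when std::ofstream is move-constructible, which libstdc++ first provided in GCC 5. The sketch below is a hypothetical check, not part of Ramulator's build; make_stream_slots is an illustrative name.

```cpp
#include <cassert>
#include <fstream>
#include <type_traits>
#include <vector>

// Fails to compile on standard libraries whose file streams are not movable
// (for example, the libstdc++ shipped with GCC 4.x).
static_assert(std::is_move_constructible<std::ofstream>::value,
              "std::ofstream must be movable; use a newer C++11 standard library");

// Builds one unopened output stream per channel. Growing the vector
// relocates the streams, which requires a move constructor.
std::vector<std::ofstream> make_stream_slots(int channels) {
    std::vector<std::ofstream> files;
    for (int i = 0; i < channels; ++i)
        files.emplace_back();
    return files;
}
```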

Memory Trace Driven

 $ cd ramulator
 $ make -j
 $ ./ramulator configs/DDR3-config.cfg --mode=dram dram.trace
 Simulation done. Statistics written to DDR3.stats
 # NOTE: dram.trace is a very short trace file provided only as an example.
 $ ./ramulator configs/DDR3-config.cfg --mode=dram --stats my_output.txt dram.trace
 Simulation done. Statistics written to my_output.txt
 # NOTE: optional --stats flag changes the statistics output filename

CPU Trace Driven

 $ cd ramulator
 $ make -j
 $ ./ramulator configs/DDR3-config.cfg --mode=cpu cpu.trace
 Simulation done. Statistics written to DDR3.stats
 # NOTE: cpu.trace is a very short trace file provided only as an example.
 $ ./ramulator configs/DDR3-config.cfg --mode=cpu --stats my_output.txt cpu.trace
 Simulation done. Statistics written to my_output.txt
 # NOTE: optional --stats flag changes the statistics output filename

gem5 Driven

Requires SWIG 2.0.12+ and gperftools (libgoogle-perftools-dev package on Ubuntu).

 $ hg clone http://repo.gem5.org/gem5-stable
 $ cd gem5-stable
 $ hg update -c 10231  # Revert to stable version from 5/31/2014 (10231:0e86fac7254c)
 $ patch -Np1 --ignore-whitespace < /path/to/ramulator/gem5-0e86fac7254c-ramulator.patch
 $ cd ext/ramulator
 $ mkdir Ramulator
 $ cp -r /path/to/ramulator/src Ramulator
 # Compile gem5
 # Run gem5 with `--mem-type=ramulator` and `--ramulator-config=configs/DDR3-config.cfg`

By default, gem5 uses the atomic CPU and atomic memory accesses, i.e., a detailed memory model like Ramulator is not actually used. To run gem5 in timing mode, a CPU type must be specified with the command-line parameter --cpu-type (e.g., --cpu-type=timing).

Simulation Output

Ramulator will report a series of statistics for every run, which are written to a file. We have provided a series of gem5-compatible statistics classes in Statistics.h.

Memory Trace/CPU Trace Driven: When run in memory trace driven or CPU trace driven mode, Ramulator will write these statistics to a file. By default, the filename will be <standard_name>.stats (e.g., DDR3.stats). You can write the statistics file to a different filename by adding --stats <filename> to the command line after the --mode switch (see examples above).

gem5 Driven: Ramulator automatically integrates its statistics into gem5. Ramulator's statistics are written directly into the gem5 statistics file, with the prefix "ramulator." added to each stat's name.

NOTE: When creating your own stats objects, don't place them inside STL containers that are automatically resized (e.g., vector). Since these containers copy their elements on resize, you will end up with duplicate statistics printed in the output file.
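To see why resizable containers cause duplicate statistics, consider a stat type that registers itself whenever it is constructed, including by copy. The C++ sketch below uses a hypothetical Stat class and registry (not the classes in Statistics.h) to show that growing a std::vector registers earlier elements again, while storing each stat behind a std::unique_ptr registers it exactly once.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical global registry: every constructed stat adds its name and
// nothing removes it, so every copy would be printed in the output.
std::vector<std::string>& stat_registry() {
    static std::vector<std::string> names;
    return names;
}

// Hypothetical stat that registers itself on construction and on copy.
struct Stat {
    std::string name;
    explicit Stat(std::string n) : name(std::move(n)) { stat_registry().push_back(name); }
    Stat(const Stat& other) : name(other.name) { stat_registry().push_back(name); }
};

// Pitfall: reallocation copies the existing elements, registering them again.
void make_stats_in_vector(int n) {
    std::vector<Stat> stats;
    for (int i = 0; i < n; ++i)
        stats.emplace_back("read_latency_" + std::to_string(i));
}

// Safer: allocate each stat once; resizing only moves the pointers.
void make_stats_by_pointer(int n) {
    std::vector<std::unique_ptr<Stat>> stats;
    for (int i = 0; i < n; ++i)
        stats.push_back(std::unique_ptr<Stat>(new Stat("read_latency_" + std::to_string(i))));
}
```

Calling reserve() up front also avoids reallocation when the final number of stats is known in advance.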

Reproducing Results from Paper (Kim et al. [1])

Debugging & Verification (Section 4.1)

For debugging and verification purposes, Ramulator can print the trace of every DRAM command it issues along with their address and timing information. To do so, please turn on the print_cmd_trace variable in the configuration file.

Comparison Against Other Simulators (Section 4.2)

For comparing Ramulator against other DRAM simulators, we provide a script that automates the process: test_ddr3.py. Before you run this script, however, you must specify the locations of the other simulators' executables and configuration files at designated lines in the script's source code:

Ramulator
DRAMSim2 (https://wiki.umd.edu/DRAMSim2): test_ddr3.py lines 39-40
USIMM (http://www.cs.utah.edu/~rajeev/jwac12): test_ddr3.py lines 54-55
DrSim (http://lph.ece.utexas.edu/public/Main/DrSim): test_ddr3.py lines 66-67
NVMain (http://wiki.nvmain.org): test_ddr3.py lines 78-79

Please refer to their respective websites to download, build, and set up the other simulators. The simulators must be executed in saturation mode (always filling up the request queues when possible).

All five simulators were configured using the same parameters:

DDR3-1600K (11-11-11), 1 Channel, 1 Rank, 2Gb x8 chips
FR-FCFS Scheduling
Open-Row Policy
32/32 Entry Read/Write Queues
High/Low Watermarks for Write Queue: 28/16
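The FR-FCFS policy listed above can be sketched as a selection function over a request queue. This simplified C++ version treats "ready" as "hits the row currently open in its bank" and ignores timing constraints, refresh, and read/write separation, which a full controller also has to handle; Req and pick_fr_fcfs are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical queued request.
struct Req {
    int bank;
    int row;
    long arrive;  // arrival cycle: smaller means older
};

// Simplified FR-FCFS: prefer requests that hit the row currently open in
// their bank ("first ready"), and break ties by age ("first come, first served").
// open_row[b] is the open row of bank b, or -1 if the bank is precharged.
// Returns the index of the chosen request, or -1 if the queue is empty.
int pick_fr_fcfs(const std::vector<Req>& q, const std::vector<int>& open_row) {
    int best = -1;
    bool best_hit = false;
    for (std::size_t i = 0; i < q.size(); ++i) {
        bool hit = open_row[q[i].bank] == q[i].row;
        if (best < 0 || (hit && !best_hit) ||
            (hit == best_hit && q[i].arrive < q[best].arrive)) {
            best = static_cast<int>(i);
            best_hit = hit;
        }
    }
    return best;
}
```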

Finally, execute test_ddr3.py <num-requests> to start off the simulation. Please make sure that there are no other active processes during simulation to yield accurate measurements of memory usage and CPU time.

Cross-Sectional Study of DRAM Standards (Section 4.3)

Please use the CPU traces (SPEC 2006) provided in the cputraces folder to run CPU trace driven simulations.

Other Tips

Power Estimation

For estimating power consumption, Ramulator can record the trace of every DRAM command it issues to a file in DRAMPower [9] format. To do so, please turn on the record_cmd_trace variable in the configuration file. The resulting DRAM command trace (e.g., cmd-trace-chan-N-rank-M.cmdtrace) should be fed into a compatible DRAM energy simulator such as VAMPIRE [8] or DRAMPower [9], with the correct configuration (standard/speed/organization), to estimate energy/power usage for a single rank (a current limitation of both VAMPIRE and DRAMPower).

Contributors

Yoongu Kim (Carnegie Mellon University)
Weikun Yang (Peking University)
Kevin Chang (Carnegie Mellon University)
Donghyuk Lee (Carnegie Mellon University)
Vivek Seshadri (Carnegie Mellon University)
Saugata Ghose (Carnegie Mellon University)
Tianshi Li (Carnegie Mellon University)
@henryzh

Acknowledgments

We thank the SAFARI group members who have contributed to the initial development of Ramulator, including Kevin Chang, Saugata Ghose, Donghyuk Lee, Tianshi Li, and Vivek Seshadri. We also thank the anonymous reviewers for feedback. This work was supported by NSF, SRC, and gifts from our industrial partners, including Google, Intel, Microsoft, Nvidia, Samsung, Seagate and VMware.


ramulator's Issues

Multicore Usage + Caches

We are planning to use multi-core Ramulator with caches. It would be great if we could have an updated README and examples of cpu.trace files for the cache-enabled case.

Specifically, how should the bubble count in the trace file change in the cache-enabled case?

Thanks!
Manish

Gem5-Ramulator fails with !req_stall assertion failure

Hello,

I am trying to run ramulator with Gem5 and using the latest ramulator version.
I am running Gem5 in full-system mode by first taking a checkpoint with the atomic CPU and simple memory, and then restoring from the checkpoint with the arm_detailed CPU and Ramulator. After restoring from the checkpoint, when I run highly memory-intensive synthetic benchmarks, gem5 aborts with the error below.
gem5.opt: build/ARM/mem/ramulator.cc:121: bool Ramulator::recvTimingReq(PacketPtr): Assertion `!req_stall' failed.

Could someone help me figure out the reason for this failure?

Thanks,
Prathap

HBM generation 2

Hi,
I was wondering: since HBM gen 2 is already on the market, do you have any plans to integrate it into Ramulator? Or do you think adding a new speed entry in HBM.h, with the necessary changes, would be sufficient? It would be really helpful if you could let me know.

Thanks a lot.

Mahzabeen

0 Value in Simulation Output

Hi,
Under the latest version (7b50b64) of Ramulator, I started the simulation with "./ramulator configs/DDR3-config.cfg --mode=cpu cputraces/429.mcf". However, in the output I found several "0" values (some are shown below) that do not seem reasonable.

ramulator.read_latency_avg_0 0.000000
ramulator.read_latency_sum_0 1276331318
ramulator.req_queue_length_avg_0 0.000000
ramulator.req_queue_length_sum_0 2147122205
ramulator.read_req_queue_length_avg_0 0.000000
ramulator.read_req_queue_length_sum_0 1276340988
ramulator.write_req_queue_length_avg_0 0.000000
ramulator.write_req_queue_length_sum_0 870781217

Could you help check whether those "0" values are correct or not?

Thanks a lot!

Capacity Mismatch for DDR3

Version of Clang used for compiling gem5

I have tried to compile gem5 with ramulator patched in. I am encountering compilation errors while using both g++ and clang compilers.

While using g++, the ramulator code gives a compile time error saying : template specialization not allowed in a different namespace.

While using clang, the gem5 part of the code fails with several kinds of compilation errors. I tried using clang versions 3.3, 3.4 and 3.5. Each fails with a different kind of compilation error.

I am curious to know what version of clang you used to compile gem5.

Does cache really work?

Hi,
In the ramulator root, I ran the commands as follows:

./ramulator configs/DDR3-A-config.cfg --mode=cpu --stats a.txt cputraces/483.xalancbmk
./ramulator configs/DDR3-B-config.cfg --mode=cpu --stats b.txt cputraces/483.xalancbmk

The only difference between these two configuration files is that cache is set to no in DDR3-A-config.cfg and to l1l2 in DDR3-B-config.cfg. (DDR3-A-config is actually the default DDR3 config file that comes with the repository.)
However, the two output files (a.txt and b.txt) are identical. Could you check whether the cache system is working, or whether I have made a mistake in using the cache?

Thanks a lot in advance!

How can I get write request callback?

I see there is a callback function for read requests, but I don't see any write request callback.
I found that the read request callback is invoked in Controller.cpp while processing the pending request queue, and the pending queue holds only read requests. Could you give me some suggestions on how to invoke a write request callback?

Thank you.

Runtime error gem5+ramulator with x86 & DDR3 as config

command line: ./build/X86/gem5.opt ./configs/example/se.py --mem-type=ramulator --ramulator-config=/home/sobanerje/ramulator-master/configs/DDR3-config.cfg -c tests/test-progs/hello/bin/x86/linux/hello
Global frequency set at 1000000000000 ticks per second
gem5.opt: build/X86/base/statistics.hh:1026: void Stats::VectorBase<Stats::Vector, Stats::StatStor>::doInit(size_type) [Derived = Stats::Vector, Stor = Stats::StatStor]: Assertion `s > 0 && "size must be positive!"' failed.
Program aborted at tick 0
Aborted (core dumped)

Please advise.
Thanks

What is the average amount of time to finish a simulation using Ramulator?

Hi Ramulator,

I'm trying to run a Gem5 simulation with Ramulator's memory (LPDDR4), but after more than 36 hours, the bbench-ics simulation still hasn't finished.
So, I would like to ask:

How long does it take for you to finish your bbench-ics simulation using ramulator memory?
I expect the bbench simulation to end automatically when it has finished; is this a correct expectation?

Here is my command to execute the simulation :
./build/ARM/gem5.fast configs/example/fs.py -b bbench-ics --kernel=vmlinux.smp.mouse.arm --frame-capture --mem-type=ramulator --ramulator-config=/path_to_ramulator/configs/LPDDR4-config.cfg --mem-size=2GB --disk-image=/path_to_gem5-full-system-images/ARMv7a-ICS-Android.SMP.nolock.img

Ramulator for NVMs

I'm interested in using Ramulator to simulate NVM (PCM). Do you know if anyone has used or written configs (.h/.cpp) for NVM in Ramulator? Any suggestions/tips are welcome.

timing lookup table typos?

DDR4.h (DDR3.h as well)
// CAS <-> CAS (between sibling ranks)
t[int(Command::RD)].push_back({Command::RD, 1, s.nBL + s.nRTRS, true});
t[int(Command::RD)].push_back({Command::RDA, 1, s.nBL + s.nRTRS, true});
t[int(Command::RDA)].push_back({Command::RD, 1, s.nBL + s.nRTRS, true});
t[int(Command::RDA)].push_back({Command::RDA, 1, s.nBL + s.nRTRS, true});

//ssk are these typos? looks like they should be wr > wr
t[int(Command::RD)].push_back({Command::WR, 1, s.nBL + s.nRTRS, true});
t[int(Command::RD)].push_back({Command::WRA, 1, s.nBL + s.nRTRS, true});
t[int(Command::RDA)].push_back({Command::WR, 1, s.nBL + s.nRTRS, true});
t[int(Command::RDA)].push_back({Command::WRA, 1, s.nBL + s.nRTRS, true});

// ssk these are the correct eqns for rd > wr
t[int(Command::RD)].push_back({Command::WR, 1, s.nCL + s.nBL + s.nRTRS - s.nCWL, true});
t[int(Command::RD)].push_back({Command::WRA, 1, s.nCL + s.nBL + s.nRTRS - s.nCWL, true});
t[int(Command::RDA)].push_back({Command::WR, 1, s.nCL + s.nBL + s.nRTRS - s.nCWL, true});
t[int(Command::RDA)].push_back({Command::WRA, 1, s.nCL + s.nBL + s.nRTRS - s.nCWL, true});
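For reference, the fields in these entries appear to be {following command, distance, minimum cycles, sibling-rank flag}. The C++ sketch below rebuilds only the RD rows from the snippet and evaluates both RD to WR candidates. It assumes illustrative DDR3-1600K values (nCL = 11, nCWL = 8, nBL = 4, nRTRS = 2) that were not read from DDR3.h, and TimingEntry, Speed, and the helper functions are hypothetical stand-ins. It also assumes the controller enforces the maximum over all matching entries, in which case the smaller nBL + nRTRS entry would never be the binding constraint.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

enum class Command { RD, RDA, WR, WRA, MAX };

// Assumed layout mirroring the snippet: {next command, distance, cycles, sibling}.
struct TimingEntry {
    Command cmd;
    int dist;
    int val;
    bool sibling;
};

// Illustrative DDR3-1600K cycle counts (assumed, not read from DDR3.h).
struct Speed { int nCL = 11, nCWL = 8, nBL = 4, nRTRS = 2; };

// Minimum cycles from `prev` to `next` on a sibling rank: the maximum over
// all matching entries.
int sibling_delay(const std::vector<TimingEntry> table[], Command prev, Command next) {
    int delay = 0;
    for (const TimingEntry& e : table[static_cast<int>(prev)])
        if (e.cmd == next && e.sibling && e.dist == 1)
            delay = std::max(delay, e.val);
    return delay;
}

// Adds the RD rows from the snippet, including both RD -> WR variants.
void build_rd_rows(std::vector<TimingEntry> table[], const Speed& s) {
    std::vector<TimingEntry>& rd = table[static_cast<int>(Command::RD)];
    rd.push_back({Command::RD, 1, s.nBL + s.nRTRS, true});
    rd.push_back({Command::WR, 1, s.nBL + s.nRTRS, true});                   // questioned entry
    rd.push_back({Command::WR, 1, s.nCL + s.nBL + s.nRTRS - s.nCWL, true});  // proposed entry
}
```

With these assumed values, RD to RD on a sibling rank is 4 + 2 = 6 cycles and RD to WR is 11 + 4 + 2 - 8 = 9 cycles.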

Ramulator does not follow the order of the trace file in DRAM mode

Hi,

When I run Ramulator in DRAM mode, the commands are not issued in the order given by the trace file.
For example, the order in the trace file is
Address_1 W
Address_1 R
Address_2 R
Address_3 R
the order of the issued commands is
Address_1 R
Address_2 R
Address_3 R
Address_1 W

The "Write" should be handled first. Doesn't the simulator follow the trace order in DRAM mode?
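One way a controller can produce exactly this order is read-prioritized scheduling with write-queue watermarks, like the 28/16 high/low watermarks listed in the README above: writes are buffered and drained in batches, so later reads can overtake an earlier write. The C++ sketch below is a simplified, hypothetical model of that policy (WriteDrainScheduler is not a Ramulator class). It assumes the controller enters write mode when the write queue reaches the high watermark or no reads are waiting, and leaves it once the queue is drained to the low watermark while reads are waiting.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <vector>

struct Request {
    std::string addr;
    bool is_write;
};

// Hypothetical front end with separate read and write queues.
class WriteDrainScheduler {
public:
    WriteDrainScheduler(std::size_t high, std::size_t low) : high_(high), low_(low) {}

    void enqueue(const Request& r) {
        if (r.is_write) writeq_.push_back(r);
        else readq_.push_back(r);
    }

    // Issues one request per call; returns false when both queues are empty.
    bool issue(Request& out) {
        // Enter write mode when writes pile up or no reads are waiting.
        if (!write_mode_ && !writeq_.empty() &&
            (writeq_.size() >= high_ || readq_.empty()))
            write_mode_ = true;
        // Leave write mode once drained to the low watermark (if reads wait).
        if (write_mode_ &&
            (writeq_.empty() || (writeq_.size() <= low_ && !readq_.empty())))
            write_mode_ = false;

        std::deque<Request>& q = write_mode_ ? writeq_ : readq_;
        if (q.empty()) return false;
        out = q.front();
        q.pop_front();
        return true;
    }

private:
    std::size_t high_, low_;
    bool write_mode_ = false;
    std::deque<Request> readq_, writeq_;
};
```

Fed the trace from this issue with watermarks 28/16, the sketch issues the three reads first and the write last, matching the observed order. A real controller must also make sure the read to Address_1 observes the pending write, for example by serving it from the write queue.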

completion time and queue hit

The questions below relate to the concept of "Burst Length (BL)" in the DRAM specification. They concern the case in "dram" mode where we have a queue hit (or a shortcut for read requests?): read request(s) that happen to have a write request to the same address already waiting in the wqueue.
What is the definition of request completion in Ramulator?

While completion of a write operation is not obvious, I believe a read operation ends when the memory controller successfully returns the requested data.

Let's say that we have trace like this:
0xC000000 W
0xC000000 R
0xC000000 R
0xC000000 R
...
No matter how many read requests follow, "ramulator.dram_cycles" does not change, contrary to my expectation. Does Ramulator assume that a request is done when a request to the same address is already in the queue?

When we have a queue hit as described above, it seems reasonable for the controller to complete the request in the next cycle, but don't we still need to account for something equivalent to the burst length between the memory controller queues and the LSQ (or any host) to return the data to whatever requested it? In other words, is it okay to set the depart time to the current clock + 1 in the memory controller, regardless of a data transfer time similar to nBL?

I am bringing these up because, unlike other trace-driven DRAM simulators, Ramulator supports an event-driven mode that can be integrated with other front ends such as GEM5.

Feel free to correct me if I miss any!

Thanks,
Yongkee
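The behavior asked about here can be modeled as a lookup in the write queue. The C++ sketch below is a hypothetical illustration, not Ramulator's controller code: a read whose address matches a queued write is answered from the write queue, and its departure time is the current cycle plus a forward_latency parameter. A value of 1 corresponds to the "current clock + 1" departure described in the question; charging a burst-length-like transfer time would mean passing a larger value, which is a modeling choice rather than something the question settles.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <deque>

struct PendingWrite { std::uint64_t addr; };

// Hypothetical read-path check: if a queued write targets the same address,
// the read is answered from the write queue instead of DRAM.
// Returns the departure cycle, or -1 if the read must go to DRAM.
long try_forward_read(const std::deque<PendingWrite>& writeq, std::uint64_t addr,
                      long clk, long forward_latency = 1) {
    bool hit = std::any_of(writeq.begin(), writeq.end(),
                           [addr](const PendingWrite& w) { return w.addr == addr; });
    return hit ? clk + forward_latency : -1;
}
```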

New gem5 patch is generating build error

Please have a look at the build error:

build/X86/mem/ramulator.cc:37:19: error: no matching constructor for initialization of
'ramulator::Gem5Wrapper'
wrapper = new ramulator::Gem5Wrapper(configs, system()->cacheLineSize());
^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
build/ramulator/Ramulator/src/Gem5Wrapper.h:20:5: note: candidate constructor not viable: no known
conversion from 'ramulator::Config' to 'const string' (aka 'const basic_string<char>') for 1st
argument
Gem5Wrapper(const string& config_file, int cacheline);
^
build/ramulator/Ramulator/src/Gem5Wrapper.h:14:7: note: candidate constructor
(the implicit copy constructor) not viable: requires 1 argument, but 2 were provided
class Gem5Wrapper

Thanks

stoul invalid argument, ramulator terminated

I installed ramulator-cputrace and cpu.trace runs fine, but when I run any file from the cputraces folder, it shows the following error.

./ramulator-cputrace cputraces/481.wrf.gz
terminate called after throwing an instance of 'std::invalid_argument'
what(): stoul
Aborted (core dumped)

Can you please help.

ramulator addressing

Hi,
I have a very simple question about the DRAM capacity configured in Main.cpp. For example, the DDR3 configuration is DDR3_8Gb_x8 with one channel and one rank. Does this mean that the total DDR3 capacity is 8 GByte [(8Gb*8)/8], or only 8 Gbit? From the address generation mechanism, I found it to be 8 GByte, because traces usually use byte-level addressing and for this configuration we can use 33 bits for addressing (as I computed, 33 = 6 (ignored) + 8 (col) + 3 (bank) + 16 (row)). However, the page size calculation suggests that each page is 2 KByte (2^14 bits), whereas with 2^8 (256) addressable columns (each of which stores 64 bytes of data), the page size would be 16 KByte (2^8 * 64 * 8 bits).
Please let me know if I am misunderstanding any part. I am using the bugfix branch.

Thanks
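The figures in this question can be reconciled with a short worked check. The sketch assumes the bit widths quoted above (6 offset, 8 column, 3 bank, 16 row), 8 Gb per chip, and eight x8 chips forming a 64-bit rank; it is illustrative arithmetic, not Ramulator's address mapper, and all constant names are hypothetical. Under these assumptions the 2 KB page is the row of a single chip, while 2^8 columns * 64 B = 16 KB is the same row spread across all eight chips of the rank.

```cpp
#include <cassert>
#include <cstdint>

// Assumed organization for DDR3_8Gb_x8, 1 channel, 1 rank (bit widths from
// the question; the chip count assumes a 64-bit data bus of x8 chips).
constexpr int kOffsetBits = 6;    // 64-byte cache line
constexpr int kColumnBits = 8;    // 256 cache-line-sized column groups
constexpr int kBankBits = 3;      // 8 banks
constexpr int kRowBits = 16;      // 65536 rows
constexpr int kChipsPerRank = 64 / 8;                 // eight x8 chips
constexpr std::uint64_t kChipBits = 8ULL << 30;       // 8 Gb per chip

constexpr int kAddrBits = kOffsetBits + kColumnBits + kBankBits + kRowBits;
constexpr std::uint64_t kRankBytesFromAddr = 1ULL << kAddrBits;
constexpr std::uint64_t kRankBytesFromChips = kChipsPerRank * kChipBits / 8;

// Row size seen by the whole rank versus by a single chip.
constexpr std::uint64_t kRankRowBytes = (1ULL << kColumnBits) * 64;     // 16 KB
constexpr std::uint64_t kChipRowBytes = kRankRowBytes / kChipsPerRank;  // 2 KB

static_assert(kAddrBits == 33, "6 + 8 + 3 + 16 address bits");
static_assert(kRankBytesFromAddr == kRankBytesFromChips, "both views give 8 GB");
static_assert(kChipRowBytes == 2048, "2 KB page per chip");
```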

Time related statistics in the simulation output?

Hi,
In the simulation output (for the CPU mode), there is only the number of cycles instead of the simulated time. So if I want to get the actual time, I need to find the duration of the cycle (tCK) in the corresponding header file and multiply it with the total number of DRAM cycles. Am I correct?

Could you add "Simulated Time" and "Actual Throughput (Bytes/Second)" to the simulation output? I believe they are really important metrics for evaluating system performance.

Thanks a lot!
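Under the approach described in the question, simulated time is the DRAM cycle count multiplied by tCK. The C++ sketch below illustrates the arithmetic, assuming DDR3-1600 (tCK = 1.25 ns, i.e., an 800 MHz clock with data transferred on both edges) and 64-byte requests; the function names are hypothetical and the inputs are made-up numbers.

```cpp
#include <cassert>
#include <cstdint>

// Simulated time in seconds from a DRAM cycle count and the clock period tCK (ns).
double simulated_seconds(std::uint64_t dram_cycles, double tck_ns) {
    return static_cast<double>(dram_cycles) * tck_ns * 1e-9;
}

// Achieved throughput in bytes per second for `requests` transfers of
// `bytes_per_request` bytes each (64-byte cache lines assumed by default).
double throughput_bytes_per_s(std::uint64_t requests, std::uint64_t dram_cycles,
                              double tck_ns, std::uint64_t bytes_per_request = 64) {
    double t = simulated_seconds(dram_cycles, tck_ns);
    return t > 0 ? static_cast<double>(requests * bytes_per_request) / t : 0.0;
}
```

For example, 800,000,000 cycles at 1.25 ns is 1 s of simulated time, and 1,000,000 64-byte requests in that time is 64,000,000 B/s.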

Can ramulator affect the simulation time in Gem5?

Apart from the Ramulator part, I ran Gem5 twice with the same configuration. But surprisingly, the simulation times ("sim_seconds" in stats.txt) are exactly the same. Did I miss anything, or is Ramulator not actually wired into Gem5's timing mechanism?

How did I run Gem5?
./gem5.opt configs/example/se.py --mem-type=ramulator --mem-size=2MB --ramulator-config=gem5-config1.cfg --caches --l2cache -c [my_executable]

./gem5.opt configs/example/se.py --mem-type=ramulator --mem-size=2MB --ramulator-config=gem5-config2.cfg --caches --l2cache -c [my_executable]

For config1.cfg:
standard = WideIO
channels = 4
ranks = 1
speed = WideIO_266
org = WideIO_8Gb

For config2.cfg:
standard = DDR3
channels = 4
ranks = 1
speed = DDR3_1600K
org = DDR3_4Gb_x8

Thanks a lot in advance!
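Both configurations above use a plain key = value layout. As a sketch under that assumption (not Ramulator's Config class; parse_config and load_config are hypothetical names), such a file could be read into a map as follows. Opening and parsing are separated so that a missing or mistyped path is reported on its own; skipping blank lines and '#' comments is this sketch's own choice.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>

// Trims leading and trailing spaces, tabs, and carriage returns.
static std::string trim(const std::string& s) {
    const char* ws = " \t\r";
    std::size_t b = s.find_first_not_of(ws);
    if (b == std::string::npos) return "";
    std::size_t e = s.find_last_not_of(ws);
    return s.substr(b, e - b + 1);
}

// Parses "key = value" lines; blank lines and lines starting with '#' are skipped.
std::map<std::string, std::string> parse_config(std::istream& in) {
    std::map<std::string, std::string> kv;
    std::string line;
    while (std::getline(in, line)) {
        std::string t = trim(line);
        if (t.empty() || t[0] == '#') continue;
        std::size_t eq = t.find('=');
        if (eq == std::string::npos)
            throw std::invalid_argument("expected 'key = value': " + t);
        kv[trim(t.substr(0, eq))] = trim(t.substr(eq + 1));
    }
    return kv;
}

// Opens a config file by path and fails loudly if it cannot be read.
std::map<std::string, std::string> load_config(const std::string& path) {
    std::ifstream file(path);
    if (!file.good())
        throw std::runtime_error("cannot open config file: " + path);
    return parse_config(file);
}
```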

Timing (cycle) accurate simulation

I have some questions regarding Ramulator and Ramulator+Gem5.

Starting with Ramulator: how were the memory traces provided in the cpu.trace folder generated? That is, what were the specifications of the system that generated those traces (memory capacity, type, processor type, etc.)?

About running Ramulator+Gem5: to get timing-accurate (cycle-accurate) results, do I need to run gem5 in full-system (FS) mode, or is system-emulation (SE) mode sufficient? To make the question clear, think of the simple example of pointer chasing, where loads miss in the last-level cache and have to retrieve the data from RAM. Each time, I need to go to memory to read the address of the next location I want to load, so the time it takes RAM to respond determines when the next memory request is issued. It's important for me to have this type of accuracy for some experiments that I want to run. So does SE (maybe with the CPU type set to detailed) suffice, or do I need FS?

Thanks
George
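The pointer-chasing pattern described in this question can be written as a small C++ microbenchmark in which every load address comes from the previous load, so memory latency cannot be overlapped. This is a hedged sketch: Node, make_chain, and chase are hypothetical names, the 64-byte node padding assumes 64-byte cache lines, and building the chain with Sattolo's algorithm (a random permutation with a single cycle, which keeps simple stride prefetchers from hiding misses) is a choice made here, not something the question specifies.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// One node per 64-byte cache line (assumed line size).
struct Node {
    std::size_t next;                    // index of the next node to load
    char pad[64 - sizeof(std::size_t)];
};

// Links n nodes into a single random cycle using Sattolo's algorithm.
std::vector<Node> make_chain(std::size_t n, unsigned seed) {
    std::vector<std::size_t> order(n);
    for (std::size_t i = 0; i < n; ++i) order[i] = i;
    std::mt19937 rng(seed);
    for (std::size_t i = n; i > 1; --i) {
        std::uniform_int_distribution<std::size_t> pick(0, i - 2);
        std::swap(order[i - 1], order[pick(rng)]);
    }
    std::vector<Node> nodes(n);
    for (std::size_t i = 0; i < n; ++i) nodes[i].next = order[i];
    return nodes;
}

// Follows the chain for `steps` dependent loads; each load needs the result
// of the previous one. Returning the final index keeps the loop from being
// optimized away.
std::size_t chase(const std::vector<Node>& nodes, std::size_t steps) {
    std::size_t p = 0;
    for (std::size_t s = 0; s < steps; ++s) p = nodes[p].next;
    return p;
}
```

In practice the chain should be much larger than the last-level cache so that most steps miss.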

Failed to compile with Gem5

I have cloned the Gem5 and reverted to the version 10231, and patched the gem5.

But when I try to compile gem5, it reports the following error:

[ LINK] -> ARM/gem5.opt
Undefined symbols for architecture x86_64:
"ramulator::ALDRAM::aldram_timing(ramulator::ALDRAM::Temp)", referenced from:
ramulator::Controller<ramulator::ALDRAM>::update_temp(ramulator::ALDRAM::Temp) in libramulator.a(Controller.os)
"ramulator::TLDRAM::standard_name", referenced from:
ramulator::Controller<ramulator::TLDRAM>::issue_cmd(ramulator::TLDRAM::Command, std::__1::vector<int, std::__1::allocator<int> > const&) in libramulator.a(Controller.os)
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
scons: *** [build/ARM/gem5.opt] Error 1
scons: building terminated because of errors.

I have tried different architectures in building gem5, such as "scons build/x86/gem5.opt" and "scons build/ARM/gem5.opt", the same error occurs.

Following is my OS and GCC configuration:

shawnlessdeMacBook-Pro:src shawnless$ uname -a
Darwin shawnlessdeMacBook-Pro.local 15.2.0 Darwin Kernel Version 15.2.0: Fri Nov 13 19:56:56 PST 2015; root:xnu-3248.20.55~2/RELEASE_X86_64 x86_64

shawnlessdeMacBook-Pro:src shawnless$ gcc --version
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.2.0
Thread model: posix
shawnlessdeMacBook-Pro:src shawnless$

Any suggestion ?

How to simulate RowClone?

I have not found how to simulate RowClone using ramulator.
What ISA should be used for it?

Failure in installing Ramulator

Hi,
When I try to compile Ramulator, I get the following errors:

In file included from src/Main.cpp:1:
In file included from src/Processor.h:4:
In file included from src/Request.h:4:
In file included from /usr/include/c++/4.8/vector:62:
/usr/include/c++/4.8/bits/stl_construct.h:75:38: error: call to
implicitly-deleted copy constructor of 'std::basic_ofstream<char>'
{ ::new(static_cast<void*>(__p)) _T1(std::forward<_Args>(__args)...); }
^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/include/c++/4.8/bits/stl_uninitialized.h:75:8: note: in instantiation of
function template specialization
'std::_Construct<std::basic_ofstream<char>, std::basic_ofstream<char> >'
requested here
std::_Construct(std::__addressof(*__cur), *__first);
^
/usr/include/c++/4.8/bits/stl_uninitialized.h:117:2: note: in instantiation of
function template specialization
'std::__uninitialized_copy<false>::__uninit_copy<std::move_iterator<std::basic_ofstream<char>
*>, std::basic_ofstream<char> *>' requested here
__uninit_copy(__first, __last, __result);
^
/usr/include/c++/4.8/bits/stl_uninitialized.h:258:19: note: in instantiation of
function template specialization
'std::uninitialized_copy<std::move_iterator<std::basic_ofstream<char> *>,
std::basic_ofstream<char> *>' requested here
{ return std::uninitialized_copy(__first, __last, __result); }
^
/usr/include/c++/4.8/bits/stl_uninitialized.h:279:19: note: in instantiation of
function template specialization
'std::__uninitialized_copy_a<std::move_iterator<std::basic_ofstream<char>
*>, std::basic_ofstream<char> *, std::basic_ofstream<char> >' requested
here
return std::__uninitialized_copy_a
^
/usr/include/c++/4.8/bits/vector.tcc:413:15: note: in instantiation of function
template specialization
'std::__uninitialized_move_if_noexcept_a<std::basic_ofstream<char> *,
std::basic_ofstream<char> *, std::allocator<std::basic_ofstream<char> > >'
requested here
= std::__uninitialized_move_if_noexcept_a
^
/usr/include/c++/4.8/bits/vector.tcc:101:4: note: in instantiation of function
template specialization 'std::vector<std::basic_ofstream<char>,
std::allocator<std::basic_ofstream<char> >
>::_M_emplace_back_aux<std::basic_string<char> >' requested here
_M_emplace_back_aux(std::forward<_Args>(__args)...);
^
src/SpeedyController.h:67:33: note: in instantiation of function template
specialization 'std::vector<std::basic_ofstream<char>,
std::allocator<std::basic_ofstream<char> >
>::emplace_back<std::basic_string<char> >' requested here
cmd_trace_files.emplace_back(prefix + to_string(i) + suffix);
^
src/Main.cpp:52:44: note: in instantiation of member function
'ramulator::SpeedyController<ramulator::DDR3>::SpeedyController' requested
here
SpeedyController ctrl = new SpeedyController(channel);
^
/usr/include/c++/4.8/fstream:599:28: note: copy constructor of
'basic_ofstream<char, std::char_traits<char> >' is implicitly deleted
because base class 'basic_ostream<char, std::char_traits<char> >' has a
deleted copy constructor
class basic_ofstream : public basic_ostream<_CharT,_Traits>
^
/usr/include/c++/4.8/ostream:58:27: note: copy constructor of
'basic_ostream<char, std::char_traits<char> >' is implicitly deleted
because base class 'basic_ios<char, std::char_traits<char> >' has a
deleted copy constructor
class basic_ostream : virtual public basic_ios<_CharT, _Traits>
^
/usr/include/c++/4.8/bits/basic_ios.h:66:23: note: copy constructor of
'basic_ios<char, std::char_traits<char> >' is implicitly deleted because
base class 'std::ios_base' has an inaccessible copy constructor
class basic_ios : public ios_base
^
class basic_ios : public ios_base
^
1 error generated.
Makefile:14: recipe for target 'ramulator-dramtrace' failed
make: *** [ramulator-dramtrace] Error 1

My OS is openSUSE 13.2 (Harlequin) (x86_64), LLVM is 3.6.2 and gcc is 4.8.3...

Do you have any clue why these errors happen? Which environment is recommended?

Thanks a lot in advance.

Problem in MemoryFactory<T>::create()

Hi,

I am using Gem5+Ramulator. When a Gem5Wrapper object is created, the corresponding MemoryFactory<T>::create() is called. When T is WideIO2, this function is specialized by MemoryFactory<WideIO2>::create(), implemented in MemoryFactory.cpp (line 35).

The weird thing is that when running Gem5 with Ramulator configured as WideIO2, it still uses the most general template (MemoryFactory.h line 52) instead of the specialized one. However, when I go back to standalone Ramulator and add a main() that only initializes a Gem5Wrapper with WideIO2, it correctly uses the specialized one. Could you check whether you have the same problem? If I am the only one who has this problem, could you give me some clues on how to resolve it?

I just use the default compile settings; my gcc version is 4.8.3 and my clang version is 3.6.2.

Thanks a lot!
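One common cause of this symptom (offered as a hypothesis, since the MemoryFactory sources are not shown here) is an explicit specialization that is defined in a .cpp file but not declared in the header. A translation unit that sees only the primary template instantiates it instead, which C++ makes ill-formed with no diagnostic required, so which version actually runs can depend on inlining and link order. The C++ sketch below uses hypothetical names (Factory, WideIO2Tag, DDR3Tag) to show the declaration pattern in a single file.

```cpp
#include <cassert>
#include <string>

// ---- factory.h (hypothetical) ----
struct WideIO2Tag {};
struct DDR3Tag {};

template <typename T>
struct Factory {
    // Primary template: the generic path.
    static std::string create() { return "generic"; }
};

// Declaring the explicit specialization in the header makes every
// translation unit that includes it call the specialized version.
template <>
std::string Factory<WideIO2Tag>::create();

// ---- factory.cpp (hypothetical) ----
template <>
std::string Factory<WideIO2Tag>::create() { return "specialized"; }
```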

Running GEM5+Ramulator with se.py+ruby with the new patch

I could successfully compile the GEM5+Ramulator environment with SGwithADD's recent (yesterday's) patch.
However, when I tried to run some benchmarks, it fails showing messages as follows.

gem5.opt: build/ramulator/Ramulator/src/Config.cpp:12: void ramulator::Config::parse(const string &): Assertion `file.good() && "Bad config file"' failed.
gem5 Simulator System. http://gem5.org
gem5 is copyrighted software; use the --copyright option for details.

gem5 compiled Aug 6 2015 15:03:19
gem5 started Aug 7 2015 11:38:22
gem5 executing on dongha-VirtualBox
command line: ./build/X86_MESI_Two_Level/gem5.opt ./configs/example/se.py --mem-type=ramulator --ramulator-config=DDR3 -c mcf_base.amd64-m64-gcc43-nn -o ./inp.in
ramulator added
Global frequency set at 1000000000000 ticks per second
Program aborted at tick 0

I followed the README file which says "Run gem5 with --mem-type=ramulator and --ramulator-config=DDR3".
But it seems ramulator fails to parse the '--ramulator-config=DDR3' option.

So my questions are:

1) Do I have to modify se.py or other GEM5 files? Or am I missing something in the command line? If so, can you help me with how?

Also, when I enable Ruby with the '--ruby' option, it seems gem5 does not include Ramulator.
2) Do I have to modify se.py to enable Ramulator with Ruby?

Anyway, I was lucky to find the patch as soon as I started using ramulator. Thanks.

error: no matching member function for call to 'erase'

I got this error:

error: no matching member function for call to 'erase'

when I tried to build gem5.opt in X86 architecture in gem5-stable.

Am I missing anything?

x86 Gem5/Ramulator Segmentation Fault

Hi all,

I have a problem trying to build gem5 with ramulator when compiling for x86 or ARM architecture. Curiously this problem does not occur for ALPHA architecture.

When building gem5 with ramulator for x86 I get the following errors/warnings:

[ CXX] X86/arch/x86/stacktrace.cc -> .po
build/X86/arch/x86/process.cc:73:18: warning: unused variable 'NumArgumentRegs' [-Wunused-const-variable]
static const int NumArgumentRegs = sizeof(ArgumentReg) / sizeof(const int);
^
build/X86/arch/x86/process.cc:81:18: warning: unused variable 'NumArgumentRegs32' [-Wunused-const-variable]
static const int NumArgumentRegs32 = sizeof(ArgumentReg) / sizeof(const int);
^

[ CXX] X86/mem/cache/cache.cc -> .po
build/X86/mem/ramulator.cc:74:10: warning: unused variable 'addr' [-Wunused-variable]
long addr = resp_queue.front()->getAddr();
^

[ SHCXX] ramulator/Ramulator/src/Config.cpp -> .os
build/fputils/fp80.c:51:21: warning: unused variable 'fp64_pinf' [-Wunused-const-variable]
static const fp64_t fp64_pinf = BUILD_FP64(0, 0, FP64_EXP_SPECIAL);
^
build/fputils/fp80.c:52:21: warning: unused variable 'fp64_ninf' [-Wunused-const-variable]
static const fp64_t fp64_ninf = BUILD_FP64(1, 0, FP64_EXP_SPECIAL);
^

In order to resolve that, I removed the -Wall flag. Then I am able to build gem5 but when I try to test it or run a full system simulation I get a "Segmentation Fault" error before the simulation even starts.

$ ./build/X86/gem5.prof configs/example/se.py -c tests/test-progs/hello/bin/x86/linux/hello
Segmentation fault (core dumped)

I am using clang 3.5, libstdc++, and swig 3.0.2, and gperftools is installed. I can successfully build and run gem5 and ramulator independently; these problems only arise when I try to include ramulator with gem5. Can anyone help?

Thanks,
George

Gem5+Ramulator Error when running SPEC2006 benchmarks

Hi all,

I came across a problem when testing Gem5+Ramulator with SPEC2006.

I installed SPEC2006 and then created static binaries for each benchmark. I tried to test them with gem5+ramulator, but an error popped up during their syscall-emulation (SE) execution.

ERROR:
gem5.opt: build/X86/mem/request.hh:567: int Request::contextId() const: Assertion 'privateFlags.isSet(VALID_CONTEXT_ID)' failed

I run them as in the example below:

build/X86/gem5.opt configs/example/se.py --cpu-type=detailed --caches --mem-type=ramulator --ramulator-config=/path_to_conf/DDR3-config.cfg -c /path_to_SPEC2006/staticbinaries/bzip2 -o /path_to_SPEC2006/input_for_bzip2

The emulation starts but is interrupted by the error mentioned above. The same problem occurs for all the benchmarks I tried.

Simple programs like hello world execute just fine. This problem does not appear when I run gem5 without ramulator.

Any ideas how to fix that? Thanks

Integration of ramulator with multi2sim

Hi,

I am from the team that develops the multi2sim simulator. Currently we don't have a DRAM model in our simulator and Ramulator seems to be a really good choice for integration. We would like to integrate it with multi2sim.
Could anyone tell me which files are relevant for this and what the best way to proceed is? We would be glad to have your support. Thanks

New ramulator running forever in cpu mode

Hi,
I was trying out the most recent ramulator, downloaded today (1/23/2018). In cpu mode, it runs forever with the simple cpu.trace that comes with ramulator.

Any help on this will be greatly appreciated.

Thanks

Ramulator runs forever

I tracked it down to this snippet from Controller.h. I grant that read-write arbitration is complex, and an always-on MC might be OK like this (there is no harm in leaving a few buffered writes around). But the simulator never completes and never prints stats. A real MC that did power management would want to purge them.

P.S. You should expose the debug option to print the commands to STDOUT. It would have taken me half the time to track this down.

/*** 3. Should we schedule writes? **/
if (!write_mode) {
    // yes -- write queue is almost full or read queue is empty
    if (writeq.size() >= int(0.8 * writeq.max)
        || readq.size() == 0) // Hasan did not appear to test this; writes are never purged, sim goes forever
        /* || readq.size() == 0) // Hasan: Switching to write mode when there are just a few
           // write requests, even if the read queue is empty, incurs a lot of overhead.
           // Commented out the read request queue empty condition
        */
        write_mode = true;
}
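For the "runs forever" case above, one way to keep Hasan's concern (no switch into write mode for just a handful of writes) while still letting the simulation terminate is to flush leftover writes only once the request source is exhausted. The sketch below is a simplified, hypothetical model of that idea, not Ramulator's Controller: `WriteDrainPolicy`, the `draining` flag, and the low watermark are assumptions; only the 0.8 high watermark comes from the snippet.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, simplified read/write mode arbiter (not Ramulator's Controller).
// Only the 0.8 high watermark is taken from the snippet; the low watermark and
// the `draining` flag are illustrative assumptions.
struct WriteDrainPolicy {
    std::size_t write_q_max;
    double high_watermark;  // enter write mode at or above this fill fraction
    double low_watermark;   // leave write mode below this fill fraction
    bool write_mode = false;

    WriteDrainPolicy(std::size_t q_max, double high, double low)
        : write_q_max(q_max), high_watermark(high), low_watermark(low) {}

    // Called once per cycle. `draining` is true once no new requests will arrive
    // (e.g. the trace has ended), so leftover writes must be flushed to finish.
    void update(std::size_t reads, std::size_t writes, bool draining) {
        const std::size_t high = static_cast<std::size_t>(high_watermark * write_q_max);
        const std::size_t low = static_cast<std::size_t>(low_watermark * write_q_max);
        // Flush writes only when nothing else can arrive and reads are idle.
        const bool drain_now = draining && reads == 0 && writes > 0;
        if (!write_mode) {
            // An empty read queue alone does not switch modes (avoids switching
            // for a few writes); a nearly full queue or the final drain does.
            if (writes >= high || drain_now) write_mode = true;
        } else if (writes == 0 || (writes < low && !drain_now)) {
            // Back to reads once writes fall below the low watermark, unless we
            // are draining, in which case keep writing until the queue is empty.
            write_mode = false;
        }
    }
};
```

With `draining` false, an empty read queue never triggers write mode on its own, which matches the commented-out behavior; once the input ends, the remaining writes are issued and the write queue can empty, so stats can be printed.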

Questions regarding DDR4.cpp

Hi all,

This is not really an issue but a question I have regarding the TimingEntry instance in DDR4.cpp.
Would it be possible to explain briefly how this TimingEntry vector container works, why multiple nCCDS entries are pushed for a RD command, and how they are used?

Is the current initialization init_timing set for DRAM only?

Kindly explain this feature.
Thanks
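As a rough illustration of why one command can carry several entries: each entry records one (earlier command, later command) pair, so an RD that must be spaced from a following RD and also from a following RDA needs two entries, each with nCCDS. The sketch below is a hypothetical miniature; the enum, the meaning of `dist` (how many issues of the earlier command back the constraint counts, e.g. 4 for a four-activate window), and the table layout are assumptions modeled on the question, not a copy of DDR4.cpp.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <vector>

// Hypothetical command set and timing entry (names assumed for illustration).
enum class Command { ACT, RD, RDA, WR, WRA, MAX };

struct TimingEntry {
    Command cmd;  // the later command that is constrained
    int dist;     // 1 = constraint relative to the most recent earlier command
    int val;      // minimum spacing in cycles
};

// table[int(c)] lists every constraint that issuing c places on later commands.
using TimingTable = std::array<std::vector<TimingEntry>, int(Command::MAX)>;

TimingTable make_rd_table(int nCCDS) {
    TimingTable t;
    // One RD constrains both a following RD and a following RDA: two entries.
    t[int(Command::RD)].push_back({Command::RD, 1, nCCDS});
    t[int(Command::RD)].push_back({Command::RDA, 1, nCCDS});
    // Likewise for an RDA followed by RD or RDA.
    t[int(Command::RDA)].push_back({Command::RD, 1, nCCDS});
    t[int(Command::RDA)].push_back({Command::RDA, 1, nCCDS});
    return t;
}

// Earliest cycle at which `next` may issue if `prev` was issued at `prev_clk`.
long earliest_issue(const TimingTable& t, Command prev, long prev_clk, Command next) {
    long ready = prev_clk;
    for (const auto& e : t[int(prev)])
        if (e.cmd == next && e.dist == 1)
            ready = std::max(ready, prev_clk + e.val);
    return ready;
}
```

In DDR4 generally, column-to-column spacing also depends on whether the two accesses hit the same bank group (tCCD_L) or different ones (tCCD_S), and write commands add further pairs; those distinctions are omitted here.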

Doesn't Ramulator Support Read/Write Values?

Hi all:
I have just inspected some of Ramulator's code and want to confirm: does Ramulator track read/write values? Each entry in dram.trace contains only an address and a read/write command.

Org Table

read request order and callback order

Why isn't the read request order the same as the callback order in ramulator? This is not the same as real memory access behavior. How can I make it ordered?
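A first-ready style scheduler may serve a younger row hit ahead of an older row miss, so callbacks presumably arrive in service order rather than request order. If a front end needs in-order completion, one option is a reorder buffer outside the memory model. The following is a minimal sketch under that assumption; `InOrderResponder`, `issue`, and `complete` are hypothetical and not part of Ramulator.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <utility>
#include <vector>

// Hypothetical wrapper that restores request order on top of an out-of-order
// memory model. Each read gets a sequence number at issue time; responses are
// held until every older read has also completed.
class InOrderResponder {
public:
    explicit InOrderResponder(std::function<void(long)> deliver)
        : deliver_(std::move(deliver)) {}

    // Tag a read in program order; pass the returned id along with the request.
    long issue() { return next_id_++; }

    // Call from the memory model's (possibly out-of-order) callback.
    void complete(long id, long addr) {
        done_[id] = addr;
        // Release the longest run of completed reads starting at the oldest one.
        while (!done_.empty() && done_.begin()->first == next_to_deliver_) {
            deliver_(done_.begin()->second);
            done_.erase(done_.begin());
            ++next_to_deliver_;
        }
    }

private:
    std::function<void(long)> deliver_;
    std::map<long, long> done_;  // completed but not yet delivered: id -> addr
    long next_id_ = 0;
    long next_to_deliver_ = 0;
};
```

The cost of this design is head-of-line blocking: a fast row hit waits behind any slower, older miss, so the observed latency of individual requests goes up.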

DRAM<T>::update_timing() code

When I was inspecting the code of DRAM.h, I was confused by DRAM.h:356 and DRAM.h:375, which say

for (auto& t : timing[int(cmd)]) {

but timing is a private member of DRAM which is initialized in constructor with

timing = spec->timing[int(level)];

where spec->timing is two-dimensional data for the whole spec. So DRAM.timing is a vector and DRAM.timing[int(cmd)] is a single structure, not a vector.

Why does DRAM::update_timing() use a for loop to traverse this scalar object?
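One possible reading, stated as an assumption to verify against DRAM.h: if `spec->timing` is declared as a two-dimensional array of `std::vector<TimingEntry>` (level × command) and the `timing` member is a pointer, then `spec->timing[int(level)]` is an array of vectors that decays to a pointer, and `timing[int(cmd)]` is a whole vector of constraints. The self-contained sketch below (hypothetical `Spec`/`Node` names, simplified to `dist == 1`) shows that this layout compiles and that the loop visits every constraint triggered by one command.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

enum class Level { Channel, Rank, Bank, MAX };
enum class Command { ACT, PRE, RD, WR, MAX };
struct TimingEntry { Command cmd; int dist; int val; };

// Assumed shape: one list of constraints per (level, command) pair.
struct Spec {
    std::vector<TimingEntry> timing[int(Level::MAX)][int(Command::MAX)];
};

struct Node {
    // spec->timing[int(level)] is an array of vectors; it decays to a pointer,
    // so timing[int(cmd)] below is a std::vector, not a single entry.
    std::vector<TimingEntry>* timing;
    long next[int(Command::MAX)] = {};  // earliest issue cycle per command

    Node(Spec* spec, Level level) : timing(spec->timing[int(level)]) {}

    // Simplified: only constraints relative to the most recent command.
    void update_timing(Command cmd, long clk) {
        for (const auto& t : timing[int(cmd)]) {
            if (t.dist != 1) continue;  // multi-command windows (e.g. tFAW) omitted
            next[int(t.cmd)] = std::max(next[int(t.cmd)], clk + t.val);
        }
    }
};
```

The cycle values used in a quick check (e.g. 11 for ACT-to-RD, 28 for ACT-to-PRE) would be illustrative only, not taken from any Ramulator spec.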

GEM5+Ramulator run issue

Hi all,

I'm trying to run GEM5+Ramulator.

I'm having this error:

gem5.opt: build/X86/mem/request.hh:567: int Request::contextId() const: Assertion `privateFlags.isSet(VALID_CONTEXT_ID)' failed.

Can anyone help?

Ramulator + ZSim

Hello all,

I know it is possible to use Ramulator with Gem5.
However, have you considered connecting Ramulator with ZSim? Would it be possible?

Thanks!
Best,
Milan

Hi, something about address mapping in the perfect ramulator

It's an honor to come across such a great simulator, ramulator.
I'm a beginner in memory controllers (MC) and DRAM, and all of my learning is based on ramulator.
I have some questions about its address mapping. Please help me. Thank you very much!

In my view, address mapping nowadays is very complex, e.g., XOR-based or segmentation schemes. In ramulator's memory code, there are only two address mapping types: ChRaBaRoCo and RoBaRaCoCh.
I think XOR, segmentation, etc. are necessary. Maybe there is some code for these that I haven't found.
So, please help me with the newest types of address mapping.
Thank you very much!! :)
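Per the question, the memory code here offers only ChRaBaRoCo and RoBaRaCoCh, but an XOR (permutation-based) bank index can be layered on top of any such bit split. The sketch below assumes a simple [offset | column | bank | row] layout from low to high bits and XORs the bank field with the low row bits; the layout, field widths, and `decode_xor` are hypothetical, not Ramulator's Memory.h.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical decoded address (field names assumed).
struct Decoded { uint64_t row, bank, col; };

// Bit layout from low to high (assumption): [tx offset | column | bank | row].
Decoded decode_xor(uint64_t addr, int tx_bits, int col_bits, int bank_bits) {
    uint64_t a = addr >> tx_bits;  // drop the byte offset within one transaction
    const uint64_t col = a & ((1ull << col_bits) - 1);
    a >>= col_bits;
    const uint64_t bank = a & ((1ull << bank_bits) - 1);
    a >>= bank_bits;
    const uint64_t row = a;
    // XOR the bank field with the low row bits: rows that would all land in the
    // same bank under the plain split are spread across different banks.
    const uint64_t xbank = bank ^ (row & ((1ull << bank_bits) - 1));
    return {row, xbank, col};
}
```

Because the XOR is reversible for a fixed row, every address still maps to a unique (row, bank, column); only the bank assignment changes, which is what spreads row-buffer conflicts between consecutive rows.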

Ramulator+Gem5 Width (and Latency?) change - MemoryFactory

I noticed that in MemoryFactory.h file the width of the channel is multiplied by the variable gang_number in order to make it equal with the cacheline (cache block size in bytes) variable.

When you do that, shouldn't you also change the read_latency of the channel? That is, similar to the width:
spec->read_latency *= gang_number;
since you assume that you can stream double the amount of data for each request?

Just wanted to make sure I am not missing something.
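The arithmetic behind the gang number can be written out. Assuming the bytes delivered by one burst are channel width × prefetch size / 8 (the helper and this formula are assumptions, not a copy of MemoryFactory.h), the gang number is the cache-line size divided by that amount. The example values (a 128-bit channel with 2n prefetch, a 64-bit channel with 8n prefetch) are hypothetical.

```cpp
#include <cassert>

// Hypothetical helper: how many channels must be ganged side by side so that
// one burst delivers a full cache line.
int gang_number(int channel_width_bits, int prefetch_size, int cacheline_bytes) {
    const int bytes_per_burst = channel_width_bits * prefetch_size / 8;
    // The cache line must be a whole multiple of one channel's burst.
    assert(bytes_per_burst > 0 && cacheline_bytes % bytes_per_burst == 0);
    return cacheline_bytes / bytes_per_burst;
}
```

Under this model the ganged channels transfer their shares in parallel, so each burst still lasts the same number of cycles; whether read_latency should change therefore depends on how MemoryFactory.h models the widened channel, which is exactly the question above.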

HBM IPC numbers

Hi,

When I run CPU trace simulations, I am seeing IPC numbers for HBM that are contrary to expectations. For some reason, these numbers do not follow the trend shown in the CAL paper. I have attached a plot showing the results I am seeing.

To debug the issue, I put this assert inside the Memory.h header file. It turns out the assertion was never triggered for HBM (note: it was triggered for WideIO2).

bool send(Request req)
{
...
// dispatch to the right channel
assert(req.addr_vec[0] <= 0);
return ctrls[req.addr_vec[0]]->enqueue(req);
}

Can somebody provide pointers on what could be the issue?

Thanks,
Kunal

How to set rowpolicy to Closed?

I would like to use autoprecharge.
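Independent of how Ramulator exposes its row policy, the decision a closed-page controller makes can be sketched: after a column access, close the row with the auto-precharge variant (RDA) unless a queued request would hit the same row. The sketch below is hypothetical (`choose_read_cmd`, `Req`, and the hit-preserving exception are assumptions); a strict closed policy would always return RDA.

```cpp
#include <cassert>
#include <deque>

// Hypothetical column-command choice for a read (not Ramulator's RowPolicy).
enum class ColCmd { RD, RDA };

struct Req { long bank; long row; };

ColCmd choose_read_cmd(const Req& r, const std::deque<Req>& queue, bool closed_policy) {
    if (!closed_policy) return ColCmd::RD;  // open-page: leave the row open
    for (const auto& q : queue)
        if (q.bank == r.bank && q.row == r.row)
            return ColCmd::RD;  // a pending hit to this row: keep it open for now
    return ColCmd::RDA;         // no pending hit: read and auto-precharge
}
```

Keeping the row open for an already-queued hit avoids an immediate re-activation; dropping that loop gives the strict always-precharge behavior.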

Problem with Ramulator.

To use ramulator, I followed README.md:

Memory trace Driven
$ cd ramulator
$ make -j

$ ./ramulator configs/DDR3-config.cfg --mode=dram dram.trace (this is where the problem occurs)

~/ramulator$ ls
configs dram.trace LICENSE plot.py test_ddr3.py
cpu.trace gem5-0e86fac7254c-ramulator.patch Makefile README.md test_spec.py
cputraces gem5-stable

~/ramulator$ ./ramulator configs/DDR3-config.cfg --mode=dram dram.trace
-bash: ./ramulator: No such file or directory

What should I do? Please help me.

Cache::send silently fails when it goes to lower levels?

Hello, I first want to thank the authors for making such a useful project. I am using the DRAM part of Ramulator standalone (i.e., not using the main file and Processor.cpp; no gem5), and everything works fine until I use a shared cache. In Cache::send, even if lower_cache fails to accept the request, this function still returns true while req is lost forever (it is not pushed to any wait_list). Specifically, the code segment is:

if (!is_last_level) {
  lower_cache->send(req);
} else {
  cachesys->wait_list.push_back(
      make_pair(cachesys->clk + latency[int(level)], req));
}
return true;

I tried to reproduce this problem with a minimal example. In this example, I use two L1 caches and one shared L2 cache, sending ten read requests (indices 0 to 9) from each core.

shared_ptr<CacheSystem> csys(new CacheSystem(configs, mem_send));
csys->first_level = Cache::Level::L1;
csys->last_level = Cache::Level::L2;
// Cache: direct map, line size = 1, size = 64
const int MSHR = 2;
const int SIZE = 64;
Cache *priv0 = new Cache(SIZE, 1, 1, MSHR, Cache::Level::L1, csys);
Cache *priv1 = new Cache(SIZE, 1, 1, MSHR, Cache::Level::L1, csys);
Cache *share = new Cache(SIZE, 1, 1, 2*MSHR, Cache::Level::L2, csys);
priv0->concatlower(share);
priv1->concatlower(share);
for (int i = 0, j0 = 0, j1 = 0; i < 1000; ++i) {
    if (j0 < 10) {
        if (priv0->send(Request(j0, Request::Type::READ, [share](Request r) {share->callback(r);}))) {
            j0++;
        }
    }
    if (j1 < 10) {
        if (priv1->send(Request(j1+1024, Request::Type::READ, [share](Request r) {share->callback(r);}))) {
            j1++;
        }
    }
    csys->tick();
    mem->tick();
}

During the first call of priv1->send(...), since core 0 has locked the only line (L1 and L2 are direct-mapped), the underlying lower_cache->send(req); fails at cache_set_unavailable++; return false;. However, I have no simple way to detect that this happened.

So my question is: is this correct behavior, or am I doing something wrong? The full listing is attached as my_main.txt and can be compiled with (g++ 7.3.1; run after a standard Ramulator compilation):

g++ my_main.cpp obj/*.o -DRAMULATOR -g
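One possible fix direction, sketched on a hypothetical two-level model rather than Ramulator's Cache class: make send() return the lower level's refusal instead of unconditionally returning true, so the caller keeps the request and retries, just as the driver loop above only advances j0/j1 on success. The `Level` struct and its fields are assumptions. If a real upper level has already allocated state (such as an MSHR) before forwarding, that allocation would also need to be undone or deferred; this sketch sidesteps the issue by checking the lower level before recording the request.

```cpp
#include <cassert>
#include <deque>

// Hypothetical cache level with a fixed number of outstanding slots.
struct Level {
    Level* lower = nullptr;
    int capacity;
    std::deque<long> accepted;  // requests currently held at this level

    explicit Level(int cap, Level* low = nullptr) : lower(low), capacity(cap) {}

    bool send(long addr) {
        if (static_cast<int>(accepted.size()) >= capacity) return false;  // no free slot
        if (lower && !lower->send(addr)) return false;  // propagate refusal upward
        accepted.push_back(addr);  // record only after every level below accepted
        return true;
    }
};
```

With the refusal propagated, a false return means "nothing was lost; try again next cycle", which is what the retry loop in the minimal example already expects.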

SALP-1 Assertion Failing with ChRaBaRoCo Addressing

When using ramulator-cputrace with the provided cpu.trace file, an assertion is triggered:

ramulator-cputrace: src/Scheduler.h:168: void ramulator::RowTable<ramulator::SALP>::update(typename T::Command, const vector<int> &, long) [T = ramulator::SALP]: Assertion `match->second.row == row' failed.

This assertion occurs using ChRaBaRoCo addressing if the following code is added to src/Main.cpp:

SALP* salp1_8 = new SALP(SALP::Org::SALP_4Gb_x8, SALP::Speed::SALP_1600K, SALP::Type::SALP_1, 8);
IPC = run_simulation(salp1_8, argv[1], 1, 1, 4, 1);
printf("%10s: %.5lf\n", "SALP-1", IPC / baseIPC);

Aside from the addressing mode, no other modifications have been made. The error does not appear using cpu.trace for RoBaRaCoCh addressing, though such errors may occur for a longer trace.

Gem5-stable Building fails because of -Wno-undefined-bool-conversion (Unknown Warning)

Hi Ramulator!

I have an issue when I tried to build the Gem5 Driven Section.

scons: Building targets ...
 [ISA DESC] ARM/arch/arm/isa/main.isa -> generated/inc.d
 [NEW DEPS] ARM/arch/arm/generated/inc.d -> arm-deps
 [ENVIRONS] arm-deps -> arm-environs
 [     CXX] ARM/sim/main.cc -> .o
error: unknown warning option '-Wno-undefined-bool-conversion'; did you mean '-Wno-bool-conversion'? [-Werror,-Wunknown-warning-option]
scons: *** [build/ARM/sim/main.o] Error 1
scons: building terminated because of errors.

I tried to locate this warning and found it in gem5-0e86fac7254c-ramulator.patch, line 37.
(It seems that the error is related to my clang or g++ version, but I'm not sure.)
** clang version : 3.4-1ubuntu3
** g++ version : Ubuntu 4.8.4-2ubuntu1~14.04

So, could you give me any suggestion?

Ramulator cycles are not included as part of Gem5 system sim time?!

Hi there,

I have run GEM5 with Ramulator. I have noticed that the simulation time of GEM5 does not include Ramulator simulation time.

For example:

I ran two simulations with the same benchmark:

GEM5+Ramulator using HBM
GEM5+Ramulator using DDR4

Run command:
./build/X86/gem5.opt -d m5out/test_HBM ./configs/example/fs.py --cpu-clock=1GHz --caches --l2cache --l1d_size=64kB --checkpoint-dir=m5out/cpt_general -r 1 --script=[my_benchmark] --mem-type=ramulator --ramulator-config=configs/HBM-config.cfg --cpu-type=detailed

Surprisingly, in the "stats.txt" file, the simulation time for DDR4 is only a few cycles less than the simulation time for HBM.

Why is Ramulator simulation time not included in the GEM5 simulation time?

What is the length of one read request?

column address bit width

I'm a little confused by the column address bit width when I'm tracing the CMD of dram controller.

The column address bit width is reduced by log2(prefetch_size) at src/Memory.h:122:

addr_bits[int(T::Level::MAX) - 1] -= calc_log2(spec->prefetch_size);

This reduction causes an incorrect cmd trace. For example, suppose the organization is 2Gbx8 and the channel width is 64. A single read command would return 8 chips * 8 bits * 8 bursts = 64 bytes of data.

The addresses sent to the memory controller should start at 0 and increment by 64, that is:
0x00000000
0x00000040
0x00000080

and the column address should start at 0 and increment by 8, that is, 0x0, 0x8, 0x10. But the cmd trace actually shows the column address increasing by one:
ACT 1: 0 0 0 0 0
RD 12: 0 0 0 0 0
RD 16: 0 0 0 0 1
RD 20: 0 0 0 0 2

I know that the low bits (3 bits for DDR3) are used only for burst order, not for addressing. I'm wondering whether the actual address sent to DRAM is exactly what the trace shows. If it is, please correct me if I have misunderstood the DRAM column address.

Thanks a lot in advance.
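The conversion the question describes can be written out under its own assumptions: 64-byte transactions, BL8, and a single channel/rank with the column field directly above the offset bits, with 10 column bits (as for a 2Gb x8 DDR3 part) minus 3 prefetch bits leaving 7 burst-index bits. The helpers are hypothetical. The controller-side index counts 64-byte bursts, and appending log2(8) = 3 zero bits gives the device column address.

```cpp
#include <cassert>
#include <cstdint>

// Index of the 64-byte burst within the row, assuming the column field sits
// directly above the transaction-offset bits (single channel/rank for brevity).
uint64_t burst_index(uint64_t addr, int tx_bits, int col_bits) {
    return (addr >> tx_bits) & ((1ull << col_bits) - 1);
}

// Device column address: the burst index with log2(prefetch) zero bits
// appended, since those low bits only select the burst order.
uint64_t device_column(uint64_t burst_idx, int prefetch_log2) {
    return burst_idx << prefetch_log2;
}
```

So 0x40 and 0x80 give burst indices 1 and 2 and device columns 0x8 and 0x10. Whether the trace is meant to print the burst index or the device column is what the question asks; the sketch only shows how the two relate.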
