hewlettpackard / cacti
An integrated cache and memory access time, cycle time, area, leakage, and dynamic power model
Home Page: http://www.hpl.hp.com/research/cacti/
The default configuration file (cache.cfg) crashes the cacti program when it is compiled with GCC 8. On inspection, I found some uninitialized member variables in the InputParameter class, so I fixed them as follows.
diff --git a/io.cc b/io.cc
index 3a798aa..2983dfa 100644
--- a/io.cc
+++ b/io.cc
@@ -63,7 +63,10 @@ InputParameter::InputParameter()
cl_power_gated(false),
interconect_power_gated(false),
power_gating(false),
- cl_vertical (true)
+ cl_vertical (true),
+ dram_ecc(NO_ECC),
+ io_type(DDR3),
+ dram_dimm(UDIMM)
This resolves the assertion failures I was encountering earlier. However, the output with the default cache.cfg is still unexpected and differs from what you get when compiling with older GCC versions such as 4.8.
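A minimal sketch of the same idea, assuming C++11 or later: default member initializers give every field a defined value even when a constructor's initializer list forgets it. The struct and enums below are hypothetical stand-ins that borrow member names from the diff; they are not CACTI's actual declarations.

```cpp
// Hypothetical stand-ins for the enums referenced in the diff above.
enum DramEcc { NO_ECC, SECDED };
enum IoType { DDR3, DDR4 };
enum DimmModel { UDIMM, RDIMM };

// Illustrative only: not the real InputParameter class.
struct InputParameterSketch {
    bool cl_vertical = true;
    DramEcc dram_ecc = NO_ECC;
    IoType io_type = DDR3;
    DimmModel dram_dimm = UDIMM;
    int num_clk = 0;         // defined value instead of whatever was in memory
    int mem_data_width = 0;  // still needs a real value from the config file

    // Empty initializer list: the defaults above still apply.
    InputParameterSketch() {}
};
```

With defaults in place, a missing config entry shows up as a predictable zero rather than a value that changes between compilers.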
How can I run this on Windows?
I'm trying to compile CACTI 6.5 (a clone of this repo) on an Ubuntu 14.04 64-bit system with gcc 4.8.4. The make command gives me the following error:
g++ -m32 -Wno-unknown-pragmas -O3 -msse2 -mfpmath=sse -DNTHREADS=4 -Icacti -c cacti/Ucache.cc -o obj_opt/Ucache.o
In file included from /usr/include/c++/4.8/bits/stl_algo.h:59:0,
from /usr/include/c++/4.8/algorithm:62,
from cacti/Ucache.cc:50:
/usr/include/c++/4.8/cstdlib:178:10: error: expected unqualified-id before ‘__int128’
inline __int128
As far as I understand from other occurrences of this error, it is somehow related to the difference between 32-bit and 64-bit builds.
The CACTI in McPAT also gives the same error.
How can I work around this problem?
Which platform do you prefer for these simulators?
Hello,
I am a beginner with CACTI. I want to evaluate various instruction cache configurations with CACTI, but I do not know what practical, realistic L1I cache config file parameters look like (number of ports, banks, etc.). Can anyone help me?
I also want to evaluate another memory with CACTI that is not a cache. However, when I set the block size to 16 for that memory, I got the error: the block size must be more than 64. How can I fix this problem?
Any help will be appreciated.
Thanks
Hi,
I tried to run the latest build with the example config files, but I got floating-point exceptions. I have tried building the binary with GCC 4.9 and GCC 6.5, and both builds had the same problem.
Here is the GDB backtrace from running ./cacti -infile 2DDRAM_micron1Gb.cfg. I would appreciate any insight you could provide!
Program received signal SIGFPE, Arithmetic exception.
0x000000000045e321 in IOTechParam::IOTechParam (this=0x6aa0b0, g_ip=0x0, io_type1=32767, num_mem_dq=-56112, mem_data_width=0, num_dq=6987952, connection=0, num_loads=1, freq=0)
at extio_technology.cc:918
918 (num_dq/mem_data_width)/(g_ip->num_clk/2);
(gdb) bt
#0 0x000000000045e321 in IOTechParam::IOTechParam (this=0x6aa0b0, g_ip=0x0, io_type1=32767, num_mem_dq=-56112, mem_data_width=0, num_dq=6987952, connection=0, num_loads=1, freq=0)
at extio_technology.cc:918
#1 0x000000000041f530 in cacti_interface (infile_name=...) at io.cc:1292
#2 0x0000000000411e64 in main (argc=0, argv=0x0) at main.cc:78
Thanks,
Andy
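The backtrace shows mem_data_width=0 at the point of the integer division on line 918, which is enough on its own to raise SIGFPE. A minimal sketch of guarding that quotient follows; the function name is hypothetical, and it assumes all three operands are plain ints, which the log suggests but does not confirm.

```cpp
#include <stdexcept>

// Hypothetical helper: reproduces the quotient quoted from line 918 of
// extio_technology.cc, but validates both divisors before dividing.
int dq_loads_per_clk_pair(int num_dq, int mem_data_width, int num_clk) {
    if (mem_data_width <= 0)
        throw std::invalid_argument("mem_data_width must be greater than zero");
    // num_clk / 2 is integer division, so 0 or 1 would yield a zero divisor.
    if (num_clk < 2)
        throw std::invalid_argument("num_clk must be at least 2");
    return (num_dq / mem_data_width) / (num_clk / 2);
}
```

A check like this turns the crash into a readable error that names the missing config parameter.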
The oldest supported technology in the current version is 90nm. Is there any way to make it work for 130nm? Or where could I find version 5.1, which the paper mentions as working for 130nm?
Hi,
I am trying to use the 3D DRAM config file, but I am getting some errors.
At first, I ran it as it is in the repository and it raised a floating-point exception. I found the source of the exception by debugging and added the following parameters (mem_data_width = 8 and num_clk = 2) as a workaround.
Then I ran it again and found some weird values for timing (tRCD, tRAS, tCAS), power, and TSV components. Some of them reach values up to e+107 ns and e+54 nJ. When I run with stacked-die-count equal to 1, I get reasonable values for timing and energy.
Has anyone come across this problem?
I am also interested in the parameters used in the cacti-3DD paper.
Thanks in advance.
Thanks for this impressive tool!
I would like to use CACTI 7 to explore SRAM area, power, and delay with different configurations, such as size, number of read ports, number of write ports, and frequency. But when I update the cfg file, I get the error: ERROR: no valid data array organizations found. The cfg file is as follows:
# Cache size
-size (bytes) 1024
//-size (bytes) 4096
//-size (bytes) 32768
//-size (bytes) 131072
//-size (bytes) 262144
//-size (bytes) 1048576
//-size (bytes) 2097152
//-size (bytes) 4194304
//-size (bytes) 8388608
//-size (bytes) 16777216
//-size (bytes) 33554432
//-size (bytes) 134217728
//-size (bytes) 67108864
//-size (bytes) 1073741824
//-size (bytes) 1048576
# power gating
-Array Power Gating - "false"
-WL Power Gating - "false"
-CL Power Gating - "false"
-Bitline floating - "false"
-Interconnect Power Gating - "false"
-Power Gating Performance Loss 0.01
# Line size
//-block size (bytes) 8
-block size (bytes) 64
# To model Fully Associative cache, set associativity to zero
//-associativity 0
-associativity 1
//-associativity 4
//-associativity 8
-read-write port 1
-exclusive read port 0
-exclusive write port 0
-single ended read ports 0
# Multiple banks connected using a bus
-UCA bank count 1
-technology (u) 0.028
//-technology (u) 0.040
//-technology (u) 0.032
//-technology (u) 0.090
# following three parameters are meaningful only for main memories
-page size (bits) 8192
-burst length 8
-internal prefetch width 8
# following parameter can have one of five values -- (itrs-hp, itrs-lstp, itrs-lop, lp-dram, comm-dram)
//-Data array cell type - "itrs-hp"
//-Data array cell type - "itrs-lstp"
-Data array cell type - "itrs-lop"
# following parameter can have one of three values -- (itrs-hp, itrs-lstp, itrs-lop)
//-Data array peripheral type - "itrs-hp"
//-Data array peripheral type - "itrs-lstp"
-Data array peripheral type - "itrs-lop"
# following parameter can have one of five values -- (itrs-hp, itrs-lstp, itrs-lop, lp-dram, comm-dram)
//-Tag array cell type - "itrs-hp"
//-Tag array cell type - "itrs-lstp"
-Tag array cell type - "itrs-lop"
# following parameter can have one of three values -- (itrs-hp, itrs-lstp, itrs-lop)
//-Tag array peripheral type - "itrs-hp"
//-Tag array peripheral type - "itrs-lstp"
-Tag array peripheral type - "itrs-lop
# Bus width include data bits and address bits required by the decoder
//-output/input bus width 16
-output/input bus width 512
// 300-400 in steps of 10
-operating temperature (K) 360
# Type of memory - cache (with a tag array) or ram (scratch ram similar to a register file)
# or main memory (no tag array and every access will happen at a page granularity Ref: CACTI 5.3 report)
//-cache type "cache"
-cache type "ram"
//-cache type "main memory"
# to model special structure like branch target buffers, directory, etc.
# change the tag size parameter
# if you want cacti to calculate the tagbits, set the tag size to "default"
-tag size (b) "default"
//-tag size (b) 22
# fast - data and tag access happen in parallel
# sequential - data array is accessed after accessing the tag array
# normal - data array lookup and tag access happen in parallel
# final data block is broadcasted in data array h-tree
# after getting the signal from the tag array
//-access mode (normal, sequential, fast) - "fast"
-access mode (normal, sequential, fast) - "normal"
//-access mode (normal, sequential, fast) - "sequential"
# DESIGN OBJECTIVE for UCA (or banks in NUCA)
-design objective (weight delay, dynamic power, leakage power, cycle time, area) 0:0:0:100:0
# Percentage deviation from the minimum value
# Ex: A deviation value of 10:1000:1000:1000:1000 will try to find an organization
# that compromises at most 10% delay.
# NOTE: Try reasonable values for % deviation. Inconsistent deviation
# percentage values will not produce any valid organizations. For example,
# 0:0:100:100:100 will try to identify an organization that has both
# least delay and dynamic power. Since such an organization is not possible, CACTI will
# throw an error. Refer CACTI-6 Technical report for more details
-deviate (delay, dynamic power, leakage power, cycle time, area) 20:100000:100000:100000:100000
# Objective for NUCA
-NUCAdesign objective (weight delay, dynamic power, leakage power, cycle time, area) 100:100:0:0:100
-NUCAdeviate (delay, dynamic power, leakage power, cycle time, area) 10:10000:10000:10000:10000
# Set optimize tag to ED or ED^2 to obtain a cache configuration optimized for
# energy-delay or energy-delay sq. product
# Note: Optimize tag will disable weight or deviate values mentioned above
# Set it to NONE to let weight and deviate values determine the
# appropriate cache configuration
//-Optimize ED or ED^2 (ED, ED^2, NONE): "ED"
-Optimize ED or ED^2 (ED, ED^2, NONE): "ED^2"
//-Optimize ED or ED^2 (ED, ED^2, NONE): "NONE"
-Cache model (NUCA, UCA) - "UCA"
//-Cache model (NUCA, UCA) - "NUCA"
# In order for CACTI to find the optimal NUCA bank value the following
# variable should be assigned 0.
-NUCA bank count 0
# NOTE: for nuca network frequency is set to a default value of
# 5GHz in time.c. CACTI automatically
# calculates the maximum possible frequency and downgrades this value if necessary
# By default CACTI considers both full-swing and low-swing
# wires to find an optimal configuration. However, it is possible to
# restrict the search space by changing the signaling from "default" to
# "fullswing" or "lowswing" type.
//-Wire signaling (fullswing, lowswing, default) - "Global_30"
-Wire signaling (fullswing, lowswing, default) - "default"
//-Wire signaling (fullswing, lowswing, default) - "lowswing"
//-Wire inside mat - "global"
-Wire inside mat - "semi-global"
//-Wire outside mat - "global"
-Wire outside mat - "semi-global"
-Interconnect projection - "conservative"
//-Interconnect projection - "aggressive"
# Contention in network (which is a function of core count and cache level) is one of
# the critical factor used for deciding the optimal bank count value
# core count can be 4, 8, or 16
//-Core count 4
-Core count 8
//-Core count 16
-Cache level (L2/L3) - "L3"
-Add ECC - "true"
//-Print level (DETAILED, CONCISE) - "CONCISE"
-Print level (DETAILED, CONCISE) - "DETAILED"
# for debugging
-Print input parameters - "true"
//-Print input parameters - "false"
# force CACTI to model the cache with the
# following Ndbl, Ndwl, Nspd, Ndsam,
# and Ndcm values
//-Force cache config - "true"
-Force cache config - "false"
-Ndwl 1
-Ndbl 1
-Nspd 0
-Ndcm 1
-Ndsam1 0
-Ndsam2 0
#### Default CONFIGURATION values for baseline external IO parameters to DRAM. More details can be found in the CACTI-IO technical report (), especially Chapters 2 and 3.
# Memory Type (D3=DDR3, D4=DDR4, L=LPDDR2, W=WideIO, S=Serial). Additional memory types can be defined by the user in extio_technology.cc, along with their technology and configuration parameters.
-dram_type "DDR3"
//-dram_type "DDR4"
//-dram_type "LPDDR2"
//-dram_type "WideIO"
//-dram_type "Serial"
# Memory State (R=Read, W=Write, I=Idle or S=Sleep)
//-io state "READ"
-io state "WRITE"
//-io state "IDLE"
//-io state "SLEEP"
#Address bus timing. To alleviate the timing on the command and address bus due to high loading (shared across all memories on the channel), the interface allows for multi-cycle timing options.
//-addr_timing 0.5 //DDR
-addr_timing 1.0 //SDR (half of DQ rate)
//-addr_timing 2.0 //2T timing (One fourth of DQ rate)
//-addr_timing 3.0 // 3T timing (One sixth of DQ rate)
# Memory Density (Gbit per memory/DRAM die)
-mem_density 4 Gb //Valid values 2^n Gb
# IO frequency (MHz) (frequency of the external memory interface).
-bus_freq 800 MHz //As of current memory standards (2013), valid range 0 to 1.5 GHz for DDR3, 0 to 533 MHz for LPDDR2, 0 - 800 MHz for WideIO and 0 - 3 GHz for Low-swing differential. However this can change, and the user is free to define valid ranges based on new memory types or extending beyond existing standards for existing dram types.
# Duty Cycle (fraction of time in the Memory State defined above)
-duty_cycle 1.0 //Valid range 0 to 1.0
# Activity factor for Data (0->1 transitions) per cycle (for DDR, need to account for the higher activity in this parameter. E.g. max. activity factor for DDR is 1.0, for SDR is 0.5)
-activity_dq 1.0 //Valid range 0 to 1.0 for DDR, 0 to 0.5 for SDR
# Activity factor for Control/Address (0->1 transitions) per cycle (for DDR, need to account for the higher activity in this parameter. E.g. max. activity factor for DDR is 1.0, for SDR is 0.5)
-activity_ca 0.5 //Valid range 0 to 1.0 for DDR, 0 to 0.5 for SDR, 0 to 0.25 for 2T, and 0 to 0.17 for 3T
# Number of DQ pins
-num_dq 72 //Number of DQ pins. Includes ECC pins.
# Number of DQS pins. DQS is a data strobe that is sent along with a small number of data-lanes so the source synchronous timing is local to these DQ bits. Typically, 1 DQS per byte (8 DQ bits) is used. The DQS is also typically differential, just like the CLK pin.
-num_dqs 18 //2 x differential pairs. Include ECC pins as well. Valid range 0 to 18. For x4 memories, could have 36 DQS pins.
# Number of CA pins
-num_ca 25 //Valid range 0 to 35 pins.
# Number of CLK pins. CLK is typically a differential pair. In some cases additional CLK pairs may be used to limit the loading on the CLK pin.
-num_clk 2 //2 x differential pair. Valid values: 0/2/4.
# Number of Physical Ranks
-num_mem_dq 2 //Number of ranks (loads on DQ and DQS) per buffer/register. If multiple LRDIMMs or buffer chips exist, the analysis for capacity and power is reported per buffer/register.
# Width of the Memory Data Bus
-mem_data_width 8 //x4 or x8 or x16 or x32 memories. For WideIO upto x128.
# RTT Termination Resistance
-rtt_value 10000
# RON Termination Resistance
-ron_value 34
# Time of flight for DQ
-tflight_value
# Parameter related to MemCAD
# Number of BoBs: 1,2,3,4,5,6,
-num_bobs 1
# Memory System Capacity in GB
-capacity 80
# Number of Channel per BoB: 1,2.
-num_channels_per_bob 1
# First Metric for ordering different design points
-first metric "Cost"
#-first metric "Bandwidth"
#-first metric "Energy"
# Second Metric for ordering different design points
#-second metric "Cost"
-second metric "Bandwidth"
#-second metric "Energy"
# Third Metric for ordering different design points
#-third metric "Cost"
#-third metric "Bandwidth"
-third metric "Energy"
# Possible DIMM option to consider
#-DIMM model "JUST_UDIMM"
#-DIMM model "JUST_RDIMM"
#-DIMM model "JUST_LRDIMM"
-DIMM model "ALL"
#if channels of each bob have the same configurations
#-mirror_in_bob "T"
-mirror_in_bob "F"
#if we want to see all channels/bobs/memory configurations explored
#-verbose "T"
#-verbose "F"
The output is:
Cache size : 1024
Block size : 64
Associativity : 1
Read only ports : 0
Write only ports : 0
Read write ports : 1
Single ended read ports : 0
Cache banks (UCA) : 1
Technology : 0.028
Temperature : 360
Tag size : 42
array type : Scratch RAM
Model as memory : 0
Model as 3D memory : 0
Access mode : 0
Data array cell type : 2
Data array peripheral type : 2
Tag array cell type : 2
Tag array peripheral type : 2
Optimization target : 2
Design objective (UCA wt) : 0 0 0 100 0
Design objective (UCA dev) : 20 100000 100000 100000 100000
Cache model : 0
Nuca bank : 0
Wire inside mat : 1
Wire outside mat : 1
Interconnect projection : 1
Wire signaling : 0
Print level : 1
ECC overhead : 1
Page size : 8192
Burst length : 8
Internal prefetch width : 8
Force cache config : 0
Subarray Driver direction : 1
iostate : WRITE
dram_ecc : NO_ECC
io_type : DDR3
dram_dimm : UDIMM
IO Area (sq.mm) = inf
IO Timing Margin (ps) = -14.1667
IO Votlage Margin (V) = 0.155
IO Dynamic Power (mW) = 1506.36 PHY Power (mW) = 232.752 PHY Wakeup Time (us) = 27.503
IO Termination and Bias Power (mW) = 2505.96
ERROR: no valid data array organizations found
Hi, I'm Jason Chang
I want to run CACTI 7.0 from your GitHub repository (https://github.com/HewlettPackard/cacti) or from this website (www.cs.utah.edu/~rajeev/cacti7/) on Linux RHEL 6. The instructions say to define the cache model using cache.cfg and then run the "cacti" binary (./cacti -infile cache.cfg). However, I cannot find a "cacti" file to run.
How do I create the "cacti" binary so that I can run the ./cacti command?
Thank you very much!
I've found this link on the web. It says CACTI can generate a .lib file. However, I couldn't find a way to do that.
I once tried setting the associativity to 0 and selecting the type "cache". The error told me that "CAM and fully associative must have at least 1 search port". However, I could not find a parameter named "search port" in the example .cfg files.
Can anyone help me?
Most of the config files in the repo lead to either a segmentation fault or a floating point error. I basically clone, compile, and run cacti without touching anything. I use an Ubuntu 16.04.3 LTS system with kernel 4.15.0-24-generic. Could you take a look at that?
You can find a summary of these config files below:
This is what I do:
git clone https://github.com/HewlettPackard/cacti.git Cacti7
cd Cacti7
make
./cacti -infile sample_config_files/lpddr3_cache.cfg
Here are the errors I get with the provided config files:
Config File | Behavior |
---|---|
cache.cfg | Finishes successfully |
ddr3.cfg | Throws Segmentation fault (core dumped) |
dram.cfg | Throws Floating point exception (core dumped) |
lpddr.cfg | Throws Segmentation fault (core dumped) |
2DDRAM_micron1Gb.cfg | Throws Floating point exception (core dumped) |
2DDRAM_Samsung2GbDDR2.cfg | Throws Floating point exception (core dumped) |
3DDRAM_Samsung3D8Gb_extened.cfg | Throws Floating point exception (core dumped) |
sample_config_files/diff_ddr3_cache.cfg | Quits with num_clk should be greater than zero! |
sample_config_files/ddr3_cache.cfg | Throws Segmentation fault (core dumped) |
sample_config_files/lpddr3_cache.cfg | Throws Segmentation fault (core dumped) |
sample_config_files/wideio_cache.cfg | Throws Segmentation fault (core dumped) |
Segmentation fault
Thread 1 "cacti" received signal SIGSEGV, Segmentation fault.
0x0000000000473d8e in find_all_bobs (memcad_params=0x0) at memcad.cc:396
396 int last_bw =(*memcad_all_channels)[0]->bandwidth;
(gdb) bt
#0 0x0000000000473d8e in find_all_bobs (memcad_params=0x0) at memcad.cc:396
#1 0x0000000000474a1e in solve_memcad (memcad_params=0x7fffffff3310) at memcad.cc:595
#2 0x00000000004207c3 in cacti_interface (infile_name=...) at io.cc:1344
#3 0x0000000000411fd3 in main (argc=0, argv=0x656c69666e692d) at main.cc:78
The segmentation fault happens after the Low-swing wire stats are logged, so by that point some time/area/power components have already been reported. But the segmentation fault makes me doubt their correctness. Could you check that?
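The faulting line at memcad.cc:396 indexes element 0 of a channel list and dereferences it. One plausible cause, not confirmed from the log, is that the list is null or empty for these configs. A minimal sketch of a defensive check, with hypothetical type and function names:

```cpp
#include <vector>

// Hypothetical stand-in for the channel records indexed in memcad.cc.
struct ChannelSketch {
    int bandwidth;
};

// Writes the first channel's bandwidth to `out` and returns true.
// Returns false, leaving `out` untouched, when there is nothing to read.
bool first_bandwidth(const std::vector<ChannelSketch*>* channels, int& out) {
    if (channels == nullptr || channels->empty() || (*channels)[0] == nullptr)
        return false;
    out = (*channels)[0]->bandwidth;
    return true;
}
```

The caller can then report that no channel configurations were generated instead of crashing.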
Floating point exception
Program received signal SIGFPE, Arithmetic exception.
0x000000000045fe31 in IOTechParam::IOTechParam (this=0x6ad070, g_ip=0x1400000000, io_type1=32767, num_mem_dq=-51952, mem_data_width=0, num_dq=7000176, connection=0, num_loads=1, freq=0) at extio_technology.cc:918
918 (num_dq/mem_data_width)/(g_ip->num_clk/2);
(gdb) bt
#0 0x000000000045fe31 in IOTechParam::IOTechParam (this=0x6ad070, g_ip=0x1400000000, io_type1=32767, num_mem_dq=-51952, mem_data_width=0, num_dq=7000176, connection=0, num_loads=1, freq=0) at extio_technology.cc:918
#1 0x00000000004206cf in cacti_interface (infile_name=...) at io.cc:1292
#2 0x0000000000411fd3 in main (argc=0, argv=0x656c69666e692d) at main.cc:78
The floating point exception happens directly after the configs are printed, so no results are reported.
Hi, I don't know if this is out of the scope of this repo, but I noticed that the web interface at http://quid.hpl.hp.com:9081/cacti/ is no longer running and I wanted to make sure someone knows about it.
Lines 449 to 465 in 1ffd8df
As above, it looks like after the comparator delay is assigned to access_time at line 449, access_time is overwritten by later lines of code. Does that mean the comparator delay is not counted in the final output?
Hello,
I came across a line of code in parameter.cc that seems like it may be a bug to my colleagues and me, though it could certainly be just my own misunderstanding.
Line 1947:
num_r_subarray = (int)ceil(capacity_per_die / (g_ip->nbanks * g_ip->block_sz * g_ip->data_assoc * Ndbl * Nspd));
specifies the number of rows per subarray, but seems to not take into account that the block size is specified in bytes, not bits. This is seen in the next line that specifies the number of columns:
num_c_subarray = (int)ceil((8 * g_ip->block_sz * g_ip->data_assoc * Nspd / Ndwl));
It seems to me that the line should be
num_r_subarray = (int)ceil(capacity_per_die / (g_ip->nbanks * 8 * g_ip->block_sz * g_ip->data_assoc * Ndbl * Nspd));
Which would be equivalent to saying:
num_r_subarray = (int)ceil(capacity_per_die / (num_c_subarray * Ndbl));
Which is what we would expect: height = area/width.
Could you please advise me on where/if my reasoning is incorrect?
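A quick way to check the units is to multiply rows, columns, and subarray counts back together and compare the result with the total bit count. The sketch below does this for both versions of the line, using made-up values and hypothetical helper names. It assumes a bank is tiled as Ndwl x Ndbl subarrays, and it does not settle whether capacity_per_die is in bytes or bits, which is the open question.

```cpp
#include <cmath>

// Hypothetical container for the organization parameters named above.
struct Org {
    int nbanks, block_sz, data_assoc, Ndwl, Ndbl;
    double Nspd;
};

// Rows per subarray; scale_by_8 selects the proposed variant of the line.
int rows_per_subarray(double capacity, const Org& o, bool scale_by_8) {
    double denom = o.nbanks * o.block_sz * o.data_assoc * o.Ndbl * o.Nspd;
    if (scale_by_8) denom *= 8;
    return static_cast<int>(std::ceil(capacity / denom));
}

// Columns per subarray, as in the second quoted line (block_sz in bytes).
int cols_per_subarray(const Org& o) {
    return static_cast<int>(
        std::ceil(8 * o.block_sz * o.data_assoc * o.Nspd / o.Ndwl));
}

// Total bit cells implied by a rows x cols subarray under the tiling assumption.
long long total_bits(int rows, int cols, const Org& o) {
    return 1LL * rows * cols * o.Ndbl * o.Ndwl * o.nbanks;
}
```

With a capacity value of 32768, 64-byte blocks, one bank, associativity 1, Nspd = 1, and Ndwl = Ndbl = 2, the original line gives 256 rows and reproduces 262144 bits (32768 x 8), while the proposed line gives 32 rows and reproduces 32768 bits. So the original line is self-consistent if capacity_per_die is in bytes, and the proposed line is self-consistent if it is in bits.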
When running "./cacti -infile ddr3.cfg", I get the error "ERROR: no valid data array organizations found".