neurosim / 3d_neurosim_v1.0
Benchmark framework of 3D integrated CIM accelerators for popular DNN inference, support both monolithic and heterogeneous 3D integration
How can we get the floorplan for H3D pipeline system?
I am trying to reproduce the results for the 2D 7nm SRAM design. I use an 8-bit VGG-8 network on the CIFAR-10 dataset. The VGG-8 network model is from DNN_NeuroSim_V1.4.
I set memcelltype = 1, novelMapping = true, SARADC = true, validated = false, synchronous = false, pipeline = false, M3D = false, technode = 7, featuresize = 18e-9, wireWidth = 18, levelOutput = 16, cellBit = 1, heightInFeatureSizeSRAM = 16, widthInFeatureSizeSRAM = 34.43, widthSRAMCellNMOS = 1, numColMuxed = 8
However, I get a readDynamicEnergy of 9.62642e+07 pJ (about 96.3 uJ). This differs from the result reported in 'Benchmarking Monolithic 3D Integration for Compute-in-Memory Accelerators: Overcoming ADC Bottlenecks and Maintaining Scalability to 7nm or Beyond', which is:
Area: 8.36mm^2, TOPS/W: 30.30, TOPS: 1.95, Power Density: 7.72e-03 W/mm^2, latency: 600us, dynamic energy: 35uJ
Do you have any suggestions to help me get the results similar to those in the paper?
My result is here.
------------------------------ Summary --------------------------------
ChipArea : 9.46458e+06um^2
Chip total CIM array : 3.52389e+06um^2
Total IC Area on chip (Global and Tile/PE local): 931046um^2
Total ADC (or S/As and precharger for SRAM) Area on chip : 2.04312e+06um^2
Total Accumulation Circuits (subarray level: adders, shiftAdds; PE/Tile/Global level: accumulation units) on chip : 1.80574e+06um^2
Other Peripheries (e.g. decoders, mux, switchmatrix, buffers, pooling and activation units) : 1.16078e+06um^2
Chip layer-by-layer readLatency (per image) is: 603729ns
Chip total readDynamicEnergy is: 9.62642e+07pJ
Chip total leakage Energy is: 6.02362e+06pJ
Chip total leakage Power is: 7531.8uW
Chip buffer readLatency is: 314434ns
Chip buffer readDynamicEnergy is: 236904pJ
Chip ic readLatency is: 65154.7ns
Chip ic readDynamicEnergy is: 3.45468e+06pJ
************************ Breakdown of Latency and Dynamic Energy *************************
----------- ADC (or S/As and precharger for SRAM) readLatency is : 173409ns
----------- Accumulation Circuits (subarray level: adders, shiftAdds; PE/Tile/Global level: accumulation units) readLatency is : 10241.2ns
----------- Other Peripheries (e.g. decoders, mux, switchmatrix, buffers, IC, pooling and activation units) readLatency is : 420079ns
----------- ADC (or S/As and precharger for SRAM) readDynamicEnergy is : 8.11379e+07pJ
----------- Accumulation Circuits (subarray level: adders, shiftAdds; PE/Tile/Global level: accumulation units) readDynamicEnergy is : 8.23443e+06pJ
----------- Other Peripheries (e.g. decoders, mux, switchmatrix, buffers, IC, pooling and activation units) readDynamicEnergy is : 6.8919e+06pJ
************************ Breakdown of Latency and Dynamic Energy *************************
----------------------------- Performance -------------------------------
Chip Operation Temperature (K): 313
Energy Efficiency TOPS/W (Layer-by-Layer Process): 12.0428
Throughput TOPS (Layer-by-Layer Process): 2.04038
Throughput FPS (Layer-by-Layer Process): 1656.37
Compute efficiency TOPS/mm^2 (Layer-by-Layer Process): 0.21558
Power Density W/mm^2 (Layer-by-Layer Process): 0.0179011
-------------------------------------- Hardware Performance Done --------------------------------------
My 'Param.cpp' is here.
Param::Param() {
/***************************************** user defined design options and parameters *****************************************/
operationmode = 2; // 1: conventionalSequential (Use several multi-bit RRAM as one synapse)
// 2: conventionalParallel (Use several multi-bit RRAM as one synapse)
memcelltype = 1; // 1: cell.memCellType = Type::SRAM
// 2: cell.memCellType = Type::RRAM
// 3: cell.memCellType = Type::FeFET
accesstype = 1; // 1: cell.accessType = CMOS_access
// 2: cell.accessType = BJT_access
// 3: cell.accessType = diode_access
// 4: cell.accessType = none_access (Crossbar Array)
transistortype = 1; // 1: inputParameter.transistorType = conventional
deviceroadmap = 2; // 1: inputParameter.deviceRoadmap = HP
// 2: inputParameter.deviceRoadmap = LSTP
globalBufferType = false; // false: register file
// true: SRAM
globalBufferCoreSizeRow = 128;
globalBufferCoreSizeCol = 128;
tileBufferType = false; // false: register file
// true: SRAM
tileBufferCoreSizeRow = 32;
tileBufferCoreSizeCol = 32;
peBufferType = false; // false: register file
// true: SRAM
chipActivation = true; // false: activation (reLu/sigmoid) inside Tile
// true: activation outside Tile
reLu = true; // false: sigmoid
// true: reLu
novelMapping = true; // false: conventional mapping
// true: novel mapping
SARADC = true; // false: MLSA
// true: sar ADC
currentMode = true; // false: MLSA use VSA
// true: MLSA use CSA
pipeline = false; // false: layer-by-layer process --> huge leakage energy in HP
// true: pipeline process
speedUpDegree = 8; // 1 = no speed up --> original speed
// 2 and more : speed up ratio, the higher, the faster
// A speed-up degree upper bound: when there is no idle period during each layer --> no need to further fold the system clock
// This idle period is defined by IFM sizes and data flow, the actual process latency of each layer may be different due to extra peripheries
validated = false; // false: no calibration factors
// true: validated by silicon data (wiring area in layout, gate switching activity, post-layout performance drop...)
synchronous = false; // false: asynchronous
// true: synchronous, clkFreq will be decided by sensing delay
M3D = false; // false: run 2D simulation
// true: run M3D simulation
/*** algorithm weight range, the default wrapper (based on WAGE) has fixed weight range of (-1, 1) ***/
algoWeightMax = 1;
algoWeightMin = -1;
/*** conventional hardware design options ***/
clkFreq = 1e9; // Clock frequency
temp = 300; // Temperature (K)
// technode: 130 --> wireWidth: 175
// technode: 90 --> wireWidth: 110
// technode: 65 --> wireWidth: 105
// technode: 45 --> wireWidth: 80
// technode: 32 --> wireWidth: 56
// technode: 22 --> wireWidth: 40
// technode: 14 --> wireWidth: 25
// technode: 10, 7 --> wireWidth: 18
technode = 7; // Technology
featuresize = 18e-9; // Wire width for subArray simulation
wireWidth = 18; // wireWidth of the cell for Accuracy calculation
globalBusDelayTolerance = 0.1; // relaxes bus delay for the global H-Tree (chip level: communication among tiles); with a tolerance of 0.1, the latency is relaxed to (1+0.1)*optimalLatency (trade-off with energy)
localBusDelayTolerance = 0.1; // relaxes bus delay for the local H-Tree (tile level: communication among PEs); with a tolerance of 0.1, the latency is relaxed to (1+0.1)*optimalLatency (trade-off with energy)
treeFoldedRatio = 4; // the H-Tree is assumed to be foldable in layout (saves area)
maxGlobalBusWidth = 2048; // the max bus width allowed at chip level (just an upper bound; the actual bus width is set by the auto floorplan)
// NOTE: Carefully choose this number!!!
// e.g. when use pipeline with high speedUpDegree, i.e. high throughput, need to increase the global bus width (interface of global buffer) --> guarantee global buffer speed
numRowSubArray = 128; // # of rows in single subArray
numColSubArray = 128; // # of columns in single subArray
/*** option to relax subArray layout ***/
relaxArrayCellHeight = 0; // relax ArrayCellHeight or not
relaxArrayCellWidth = 0; // relax ArrayCellWidth or not
numColMuxed = 8; // How many columns share 1 ADC (for eNVM and FeFET) or parallel SRAM
levelOutput = 16; // # of levels of the multilevelSenseAmp output, should be in 2^N forms; e.g. 32 levels --> 5-bit ADC
cellBit = 1; // precision of memory device
/*** parameters for SRAM ***/
// due to scaling, the suggested SRAM cell size above 22nm is 160F^2
// SRAM cell size at 14nm: 300F^2
// SRAM cell size at 10nm: 400F^2
// SRAM cell size at 7nm: 600F^2
heightInFeatureSizeSRAM = 16; // SRAM Cell height in feature size
widthInFeatureSizeSRAM = 34.43; // SRAM Cell width in feature size
widthSRAMCellNMOS = 1;
widthSRAMCellPMOS = 1;
widthAccessCMOS = 1;
minSenseVoltage = 0.1;
/*** parameters for analog synaptic devices ***/
heightInFeatureSize1T1R = 4; // 1T1R Cell height in feature size
widthInFeatureSize1T1R = 12; // 1T1R Cell width in feature size
heightInFeatureSizeCrossbar = 2; // Crossbar Cell height in feature size
widthInFeatureSizeCrossbar = 2; // Crossbar Cell width in feature size
resistanceOn = 6e3; // Ron resistance at Vr in the reported measurement data (need to recalculate below if considering the nonlinearity)
resistanceOff = 6e3*150; // Roff resistance at Vr in the reported measurement data (need to recalculate below if considering the nonlinearity)
maxConductance = (double) 1/resistanceOn;
minConductance = (double) 1/resistanceOff;
readVoltage = 0.5; // On-chip read voltage for memory cell
readPulseWidth = 10e-9; // read pulse width in sec
accessVoltage = 1.1; // Gate voltage for the transistor in 1T1R
resistanceAccess = resistanceOn*IR_DROP_TOLERANCE; // resistance of access CMOS in 1T1R
writeVoltage = 2; // Enable level shifter if writeVoltage > 1.5V
/*** Calibration parameters ***/
if(validated){
alpha = 1.44; // wiring area of level shifter
beta = 1.4; // latency factor of sensing cycle
gamma = 0.5; // switching activity of DFF in shifter-add and accumulator
delta = 0.15; // switching activity of adder
epsilon = 0.05; // switching activity of control circuits
zeta = 1.22; // post-layout energy increase
}
/***************************************** user defined design options and parameters *****************************************/
/***************************************** Initialization of parameters NO need to modify *****************************************/
if (memcelltype == 1) {
cellBit = 1; // force cellBit = 1 for all SRAM cases
}
/*** initialize operationMode as default ***/
conventionalParallel = 0;
conventionalSequential = 0;
BNNparallelMode = 0;
BNNsequentialMode = 0;
XNORsequentialMode = 0;
XNORparallelMode = 0;
switch(operationmode) {
case 6: XNORparallelMode = 1; break;
case 5: XNORsequentialMode = 1; break;
case 4: BNNparallelMode = 1; break;
case 3: BNNsequentialMode = 1; break;
case 2: conventionalParallel = 1; break;
case 1: conventionalSequential = 1; break;
default: printf("operationmode ERROR\n"); exit(-1);
}
/*** parallel read ***/
parallelRead = 0;
if(conventionalParallel || BNNparallelMode || XNORparallelMode) {
parallelRead = 1;
} else {
parallelRead = 0;
}
/*** Initialize interconnect wires ***/
switch(wireWidth) {
case 175: AR = 1.60; Rho = 2.20e-8; break; // for technode: 130
case 110: AR = 1.60; Rho = 2.52e-8; break; // for technode: 90
case 105: AR = 1.70; Rho = 2.68e-8; break; // for technode: 65
case 80: AR = 1.70; Rho = 3.31e-8; break; // for technode: 45
case 56: AR = 1.80; Rho = 3.70e-8; break; // for technode: 32
case 40: AR = 1.90; Rho = 4.03e-8; break; // for technode: 22
case 25: AR = 2.00; Rho = 5.08e-8; break; // for technode: 14
case 18: AR = 2.00; Rho = 6.35e-8; break; // for technode: 7, 10
case -1: break; // Ignore wire resistance or user define
default: puts("Wire width out of range"); exit(-1); // print the message before exiting
}
if (memcelltype == 1) {
wireLengthRow = wireWidth * 1e-9 * heightInFeatureSizeSRAM;
wireLengthCol = wireWidth * 1e-9 * widthInFeatureSizeSRAM;
} else {
if (accesstype == 1) {
wireLengthRow = wireWidth * 1e-9 * heightInFeatureSize1T1R;
wireLengthCol = wireWidth * 1e-9 * widthInFeatureSize1T1R;
} else {
wireLengthRow = wireWidth * 1e-9 * heightInFeatureSizeCrossbar;
wireLengthCol = wireWidth * 1e-9 * widthInFeatureSizeCrossbar;
}
}
Rho *= (1+0.00451*abs(temp-300));
if (wireWidth == -1) {
unitLengthWireResistance = 1.0; // Use a small number to prevent numerical error for NeuroSim
wireResistanceRow = 0;
wireResistanceCol = 0;
} else {
unitLengthWireResistance = Rho / ( wireWidth*1e-9 * wireWidth*1e-9 * AR );
wireResistanceRow = unitLengthWireResistance * wireLengthRow;
wireResistanceCol = unitLengthWireResistance * wireLengthCol;
}
/***************************************** Initialization of parameters NO need to modify *****************************************/
}