My name is Singularity Chen, and I am a Ph.D. candidate in SCSE at NTU. My interests are architecture design and exploring new and fashionable things.
Here are the programming languages I use (I also use Python and Perl in my private repos ⚡).
Deep Learning Accelerator Based on the Eyeriss V2 Architecture with custom RISC-V instruction extensions
Detected a combinational loop that should be broken inside the Router Cluster:
firrtl.transforms.CheckCombLoops$CombLoopException: : [module ClusterGroup] Combinational loop detected:
ClusterGroup.PECluster.io_dataPath_pSumIO_inIOs_0_valid
ClusterGroup.PECluster._GEN_1 @[PECluster.scala 44:46]
ClusterGroup.PECluster.muxInPSumDataWire_0_valid @[PECluster.scala 45:28 PECluster.scala 48:28]
ClusterGroup.PECluster.ProcessingElement_8.io_dataStream_ipsIO_valid @[PECluster.scala 42:38]
ClusterGroup.PECluster.ProcessingElement_8.Queue_4.io_enq_valid @[Decoupled.scala 288:22]
ClusterGroup.PECluster.ProcessingElement_8.Queue_4._GEN_8 @[Decoupled.scala 236:25]
ClusterGroup.PECluster.ProcessingElement_8.Queue_4.io_deq_valid @[Decoupled.scala 231:16 Decoupled.scala 236:40]
ClusterGroup.PECluster.ProcessingElement_8.ProcessingElementPad.io_dataStream_ipsIO_valid @[ProcessingElement.scala 44:26]
ClusterGroup.PECluster.ProcessingElement_8.ProcessingElementPad._T_104 @[ProcessingElement.scala 301:62]
ClusterGroup.PECluster.ProcessingElement_8.ProcessingElementPad._T_105 @[ProcessingElement.scala 301:91]
ClusterGroup.PECluster.ProcessingElement_8.ProcessingElementPad.io_dataStream_opsIO_valid @[ProcessingElement.scala 301:29]
ClusterGroup.PECluster.ProcessingElement_8.Queue_5.io_enq_valid @[Decoupled.scala 288:22]
ClusterGroup.PECluster.ProcessingElement_8.Queue_5._GEN_8 @[Decoupled.scala 236:25]
ClusterGroup.PECluster.ProcessingElement_8.Queue_5.io_deq_valid @[Decoupled.scala 231:16 Decoupled.scala 236:40]
ClusterGroup.PECluster.ProcessingElement_8.io_dataStream_opsIO_valid @[ProcessingElement.scala 45:23]
ClusterGroup.PECluster.ProcessingElement_4.io_dataStream_ipsIO_valid @[PECluster.scala 40:36]
ClusterGroup.PECluster.ProcessingElement_4.Queue_4.io_enq_valid @[Decoupled.scala 288:22]
ClusterGroup.PECluster.ProcessingElement_4.Queue_4._GEN_8 @[Decoupled.scala 236:25]
ClusterGroup.PECluster.ProcessingElement_4.Queue_4.io_deq_valid @[Decoupled.scala 231:16 Decoupled.scala 236:40]
ClusterGroup.PECluster.ProcessingElement_4.ProcessingElementPad.io_dataStream_ipsIO_valid @[ProcessingElement.scala 44:26]
ClusterGroup.PECluster.ProcessingElement_4.ProcessingElementPad._T_104 @[ProcessingElement.scala 301:62]
ClusterGroup.PECluster.ProcessingElement_4.ProcessingElementPad._T_105 @[ProcessingElement.scala 301:91]
ClusterGroup.PECluster.ProcessingElement_4.ProcessingElementPad.io_dataStream_opsIO_valid @[ProcessingElement.scala 301:29]
ClusterGroup.PECluster.ProcessingElement_4.Queue_5.io_enq_valid @[Decoupled.scala 288:22]
ClusterGroup.PECluster.ProcessingElement_4.Queue_5._GEN_8 @[Decoupled.scala 236:25]
ClusterGroup.PECluster.ProcessingElement_4.Queue_5.io_deq_valid @[Decoupled.scala 231:16 Decoupled.scala 236:40]
ClusterGroup.PECluster.ProcessingElement_4.io_dataStream_opsIO_valid @[ProcessingElement.scala 45:23]
ClusterGroup.PECluster.ProcessingElement.io_dataStream_ipsIO_valid @[PECluster.scala 39:38]
ClusterGroup.PECluster.ProcessingElement.Queue_4.io_enq_valid @[Decoupled.scala 288:22]
ClusterGroup.PECluster.ProcessingElement.Queue_4._GEN_8 @[Decoupled.scala 236:25]
ClusterGroup.PECluster.ProcessingElement.Queue_4.io_deq_valid @[Decoupled.scala 231:16 Decoupled.scala 236:40]
ClusterGroup.PECluster.ProcessingElement.ProcessingElementPad.io_dataStream_ipsIO_valid @[ProcessingElement.scala 44:26]
ClusterGroup.PECluster.ProcessingElement.ProcessingElementPad._T_104 @[ProcessingElement.scala 301:62]
ClusterGroup.PECluster.ProcessingElement.ProcessingElementPad._T_105 @[ProcessingElement.scala 301:91]
ClusterGroup.PECluster.ProcessingElement.ProcessingElementPad.io_dataStream_opsIO_valid @[ProcessingElement.scala 301:29]
ClusterGroup.PECluster.ProcessingElement.Queue_5.io_enq_valid @[Decoupled.scala 288:22]
ClusterGroup.PECluster.ProcessingElement.Queue_5._GEN_8 @[Decoupled.scala 236:25]
ClusterGroup.PECluster.ProcessingElement.Queue_5.io_deq_valid @[Decoupled.scala 231:16 Decoupled.scala 236:40]
ClusterGroup.PECluster.ProcessingElement.io_dataStream_opsIO_valid @[ProcessingElement.scala 45:23]
ClusterGroup.PECluster.io_dataPath_pSumIO_outIOs_0_valid @[PECluster.scala 38:34]
ClusterGroup.RouterCluster.io_dataPath_routerData_pSumRIO_0_inIOs_0_valid @[ClusterGroup.scala 92:61]
ClusterGroup.RouterCluster.PSumRouter.io_dataPath_inIOs_0_valid @[RouterCluster.scala 14:75]
ClusterGroup.RouterCluster.PSumRouter._GEN_6 @[RouterCluster.scala 153:28]
ClusterGroup.RouterCluster.PSumRouter.pSumInternalDataWire_valid @[RouterCluster.scala 154:22 RouterCluster.scala 158:22 RouterCluster.scala 162:22]
ClusterGroup.RouterCluster.PSumRouter._T_7 @[RouterCluster.scala 169:59]
ClusterGroup.RouterCluster.PSumRouter._GEN_16 @[RouterCluster.scala 166:29]
ClusterGroup.RouterCluster.PSumRouter.io_dataPath_outIOs_0_valid @[RouterCluster.scala 169:33]
ClusterGroup.RouterCluster.io_dataPath_routerData_pSumRIO_0_outIOs_0_valid @[RouterCluster.scala 14:75]
ClusterGroup.PECluster.io_dataPath_pSumIO_inIOs_0_valid @[ClusterGroup.scala 94:40]
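The loop above runs from PECluster's PSum input valid, through the PEs and their Queues, out to the PSumRouter, and back to PECluster. Conceptually, a combinational-loop check looks for a cycle in the graph of combinational signal dependencies, and a register anywhere on the cycle breaks it. The sketch below shows that idea as a plain depth-first search; it is not FIRRTL's actual CheckCombLoops implementation, and find_comb_loop plus the shortened signal names are hypothetical stand-ins for the ones in the log.

```python
from typing import Dict, List, Optional, Set


def find_comb_loop(edges: Dict[str, List[str]],
                   registers: Set[str]) -> Optional[List[str]]:
    """Return one combinational cycle (first node repeated at the end), or None.

    edges maps each signal to the signals it drives combinationally.
    Signals in `registers` are register outputs: a cycle through a register
    is not combinational, so the search never continues past one.
    """
    on_path: Set[str] = set()
    done: Set[str] = set()
    path: List[str] = []

    def dfs(node: str) -> Optional[List[str]]:
        on_path.add(node)
        path.append(node)
        if node not in registers:
            for nxt in edges.get(node, []):
                if nxt in on_path:
                    # Back edge: the cycle is the path suffix starting at nxt.
                    return path[path.index(nxt):] + [nxt]
                if nxt not in done:
                    loop = dfs(nxt)
                    if loop:
                        return loop
        on_path.discard(node)
        path.pop()
        done.add(node)
        return None

    for start in edges:
        if start not in done:
            loop = dfs(start)
            if loop:
                return loop
    return None


# Simplified, hypothetical version of the valid-signal path in the log.
edges = {
    "PECluster.pSumIn.valid": ["PE.ipsIO.valid"],
    "PE.ipsIO.valid": ["PE.opsIO.valid"],
    "PE.opsIO.valid": ["PECluster.pSumOut.valid"],
    "PECluster.pSumOut.valid": ["PSumRouter.in.valid"],
    "PSumRouter.in.valid": ["PSumRouter.out.valid"],
    "PSumRouter.out.valid": ["PECluster.pSumIn.valid"],
}
print(find_comb_loop(edges, registers=set()))
print(find_comb_loop(edges, registers={"PSumRouter.out.valid"}))  # None
```

The Queue_4/Queue_5 hops from io_enq_valid straight to io_deq_valid suggest those Chisel Queues pass valid through combinationally, as a Queue does when flow is enabled, so they may be one place where a register could cut the cycle.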
The PE should use a Toeplitz matrix.
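The note does not say which Toeplitz matrix is meant. One common reading, sketched here under that assumption, is the banded Toeplitz matrix built from a 1-D filter row, T[i][j] = w[j - i] for 0 <= j - i < S, so that multiplying T by an input row gives the same result as sliding the filter over it, i.e. the 1-D row convolution a PE performs in Eyeriss's row-stationary dataflow. conv_toeplitz, matvec, and conv1d are hypothetical helper names.

```python
from typing import List


def conv_toeplitz(weights: List[int], in_len: int) -> List[List[int]]:
    """(in_len - S + 1) x in_len Toeplitz matrix: T[i][j] = weights[j - i]."""
    s = len(weights)
    out_len = in_len - s + 1
    if out_len <= 0:
        raise ValueError("input row is shorter than the filter row")
    return [[weights[j - i] if 0 <= j - i < s else 0 for j in range(in_len)]
            for i in range(out_len)]


def matvec(m: List[List[int]], x: List[int]) -> List[int]:
    """Plain matrix-vector product."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def conv1d(weights: List[int], x: List[int]) -> List[int]:
    """Reference 'valid' 1-D sliding-window convolution (no filter flip)."""
    s = len(weights)
    return [sum(weights[k] * x[i + k] for k in range(s))
            for i in range(len(x) - s + 1)]


w = [1, 2, 3]
x = [4, 5, 6, 7, 8]
T = conv_toeplitz(w, len(x))
print(T)             # [[1, 2, 3, 0, 0], [0, 1, 2, 3, 0], [0, 0, 1, 2, 3]]
print(matvec(T, x))  # [32, 38, 44], same as conv1d(w, x)
```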
The error is shown below:
io_debugIO_eachPETopDebug_0_3_writeFinishRegVec_0=1 did not equal expected=0: Some(later PEs haven't begin poke address yet)
In fact, it begins poking inAct data into the later PEs instead of the earlier ones.
When I compile PECluster in the test file, I get several errors; one of them is shown below:
@[PECluster.scala 159:25 PECluster.scala 166:25 PECluster.scala 173:25] : muxInActDataWire[0][0].adrIOs.data.bits <= mux(_T_122, io.dataPath.inActIO[0].adrIOs.data.bits, mux(_T_123, io.dataPath.inActIO[1].adrIOs.data.bits, mux(_T_124, io.dataPath.inActIO[2].adrIOs.data.bits, VOID))) @[PECluster.scala 159:25 PECluster.scala 166:25 PECluster.scala 173:25]
firrtl.passes.CheckInitialization$RefNotInitializedException: @[PECluster.scala 20:38] : [module PECluster] Reference muxInActDataWire is not fully initialized.
However, when I checked lines 16703 to 16731 of the generated FIRRTL file, I found nothing unusual. The corresponding Chisel file is here
The GLB SRAM should have separate read and write ports: when accumulating PSums, the PE array reads the current PSum and may write a PSum back before that read finishes.
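The overlap can be illustrated with a cycle-level model. This is a simplified, hypothetical Python sketch (DualPortSRAM and accumulate are illustrative names, not classes in the repository) that assumes one read port and one write port, a one-cycle read latency, and read-before-write behavior. In the accumulation loop, every cycle after the first writes back the PSum read in the previous cycle while issuing the next read, which needs both ports in the same cycle; a single-port SRAM would have to alternate reads and writes instead.

```python
from typing import List, Optional


class DualPortSRAM:
    """One read port and one write port, both usable in the same cycle.

    Reads are registered (one-cycle latency) and see the contents from
    before that cycle's write (assumed read-before-write behavior).
    """

    def __init__(self, init: List[int]):
        self.mem = list(init)
        self.read_data: Optional[int] = None

    def tick(self, read_addr=None, write_addr=None, write_data=None):
        new_read = self.mem[read_addr] if read_addr is not None else None
        if write_addr is not None:
            self.mem[write_addr] = write_data
        self.read_data = new_read


def accumulate(sram: DualPortSRAM, addrs: List[int], deltas: List[int]) -> int:
    """Read addrs[t] in cycle t, write back old + deltas[t] in cycle t + 1.

    Consecutive accesses to the same address would read a stale value in
    this model (no forwarding), so the example uses distinct addresses.
    Returns the number of cycles used.
    """
    pending = None  # (address, delta) whose read was issued last cycle
    cycles = len(addrs) + 1
    for t in range(cycles):
        rd = addrs[t] if t < len(addrs) else None
        if pending is None:
            sram.tick(read_addr=rd)
        else:
            wa, delta = pending
            # read_data still holds last cycle's read result here.
            sram.tick(read_addr=rd, write_addr=wa,
                      write_data=sram.read_data + delta)
        pending = (rd, deltas[t]) if rd is not None else None
    return cycles


sram = DualPortSRAM([10, 20, 30, 40])
print(accumulate(sram, addrs=[0, 1, 2], deltas=[1, 2, 3]))  # 4
print(sram.mem)  # [11, 22, 33, 40]
```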
Currently, it generates only one data stream, but in practice we need a different number of Seq streams corresponding to the number of each kind of router, i.e., the numbers of input activation routers, weight routers, and partial sum routers.
Moreover, the relationships between them cannot simply be described as separate Seq streams.
Dear author,
Thank you for your excellent work in reproducing Eyeriss v2.
I want to ask how you implement different dataflows (OS, WS, IS) in your design.
Thank you
Just wondering: the import mill._ statement in build.sc raises errors. How do you set up the dependencies without a build.sbt?
Thanks.
The CSCSwitcher does not always work.
[info] goldenAdr Vs. outAdr = List(
(1,1), (4,4), (5,5), (7,7), (8,8), (0,0),
(2,2), (15,15), (5,5), (15,15), (6,6), (0,0),
(1,1), (3,3), (4,4), (5,5), (15,15), (0,0),
(3,3), (4,4), (15,15), (5,5), (7,7), (0,0),
(1,1), (15,15), (2,2), (15,15), (15,15), (0,0),
(1,1), (2,3), (15,15), (4,4), (6,6), (0,0),
(1,1), (4,4), (7,7), (8,8), (15,9), (0,0),
(1,15), (15,1), (2,4), (4,5), (5,0), (0,1),
(2,4), (3,5), (6,7), (7,8), (15,0), (0,2),
(1,15), (3,3), (4,4), (5,0), (6,1), (0,2),
(1,4), (2,5), (4,7), (15,0), (6,2), (0,3),
(1,4), (3,15), (4,6), (5,0), (15,1), (0,3),
(1,5), (2,15), (4,6), (6,0), (15,1), (0,2),
(2,4), (3,15), (4,6), (6,0), (15,15), (0,1),
(1,2), (15,3), (2,4), (15,0), (3,2), (0,3),
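The first rows of the log match and the mismatches begin part-way through, so a software reference encoder is useful for producing golden addresses independently of the circuit. Below is a minimal Python sketch of compressed sparse column (CSC) encoding under assumed conventions: adr holds the end index of each column in data, count holds the row index of each non-zero within its column, and an all-zero column is marked with a sentinel. These conventions, the sentinel value, and the names csc_encode/csc_decode are hypothetical; the repository's CSCSwitcher and genAdrCountData may encode differently. The example places more than one non-zero element in the last column, and the decoder gives a round-trip check.

```python
from typing import List, Tuple

# Hypothetical sentinel for an all-zero column; not taken from the repository.
ZERO_COL_CODE = 15


def csc_encode(matrix: List[List[int]],
               zero_col_code: int = ZERO_COL_CODE
               ) -> Tuple[List[int], List[int], List[int]]:
    """Encode a dense matrix (list of rows) column by column.

    data  - non-zero values in column-major order
    count - row index of each non-zero inside its column
    adr   - per column, the index one past its last non-zero in data,
            or zero_col_code when the column has no non-zeros
    """
    rows = len(matrix)
    cols = len(matrix[0]) if rows else 0
    data: List[int] = []
    count: List[int] = []
    adr: List[int] = []
    for c in range(cols):
        start = len(data)
        for r in range(rows):
            if matrix[r][c] != 0:
                data.append(matrix[r][c])
                count.append(r)
        # This layout is ambiguous if a real end index equals zero_col_code.
        adr.append(len(data) if len(data) > start else zero_col_code)
    return data, count, adr


def csc_decode(data, count, adr, rows, zero_col_code=ZERO_COL_CODE):
    """Rebuild the dense matrix from csc_encode's output."""
    matrix = [[0] * len(adr) for _ in range(rows)]
    start = 0
    for c, end in enumerate(adr):
        if end == zero_col_code:
            continue  # all-zero column: nothing stored
        for i in range(start, end):
            matrix[count[i]][c] = data[i]
        start = end
    return matrix


# The last column holds two non-zeros.
m = [[0, 5, 7],
     [0, 0, 8],
     [3, 0, 0]]
enc = csc_encode(m)
print(enc)  # ([3, 5, 7, 8], [2, 0, 0, 1], [1, 2, 4])
assert csc_decode(*enc, rows=3) == m
```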
When I try to generate the waveform of LazyEyeriss, it throws this error:
requirement failed: Diplomacy error: eyeriss.controlNode (A sink node with parent
'eyeriss/eyerissWrapper' at (LazyEyerissSpecTest.scala:18:40))
has 1 != 0 up/down inner parameters
genSparse should return the Toeplitz matrix.
If doEn is set with false.B here, the first SRAM's state will be "not done" after all three SRAMs finish. When I move it one cycle later, it works fine.
With timescope inside the three SRAM read forks, time does not stop and return to where it should; it just continues.
The PSum peeked from the circuit does not equal the golden results.
Either the golden results or the source pokes are wrong.
----------- begin to readout PSum ------------
pSum1 = List(31296, 49452, 47387, 1794, 7, 58771, 23086, 49675, 50953, 28015, 27001, 46664, 30807, 23661, 24353, 10185, 9385, 14769, 10381, 24182, 2995, 7169, 71117, 21761)
-------- 1-th Column PEs receive all inPSum
pSum3 = List(31297, 15064, 47382, 9551, 3568, 61938, 30815, 49678, 50959, 28017, 29409, 54567, 30815, 23653, 6847, 10186, 9393, 14763, 10384, 23815, 28363, 51185, 102775, 61797)
-------- 3-th Column PEs receive all inPSum
pSum0 = List(15136, 2, 22427, 70882, 13931, 263, 18098, 31471, 12650, 86682, 0, 72437, 2734, 5632, 1, 13686, 18794, 19264, 8521, 27892, 48654, 49481, 57641, 119523)
-------- 0-th Column PEs receive all inPSum
pSum2 = List(27743, 26004, 19017, 70885, 24458, 29114, 26909, 31472, 43189, 51480, 22232, 45734, 6, 45, 67, 13688, 16064, 13637, 8518, 27888, 79336, 82811, 57641, 111709)
-------- 2-th Column PEs receive all inPSum
pSum0 =
List(15136, 13925, 12653, 2731, 36625, 7551, 7, 261, 86683, 5629, 15642, 30509, 22426, 18095, 1, 0, 33110, 8440, 70890, 31474, 72428, 13687, 58895, 21247)
pSum1 =
List(31296, 3565, 50960, 30808, 3271, 13439, 15059, 61935, 28014, 23653, 2664, 23552, 47380, 30816, 29406, 6843, 37426, 16675, 9546, 49678, 54566, 10187, 13827, 29681)
pSum2 =
List(16079, 65294, 46726, 4, 32388, 8, 33139, 54610, 22674, 45, 33636, 311, 3882, 37541, 35569, 61, 20188, 20660, 6279, 7182, 51503, 561, 36677, 21631)
pSum3 =
List(56476, 14201, 6839, 65096, 4, 65474, 39218, 8, 49762, 41943, 44514, 49522, 48691, 6139, 4878, 36194, 27819, 45112, 66582, 36937, 25003, 33703, 1, 7244)
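For a first triage between an ordering problem and an arithmetic problem, a small comparison helper can help. This is a hypothetical Python sketch (diagnose_psum is not part of the repository), and the log above does not label which lists are golden and which are peeked, so the helper assumes nothing about that: if the two lists hold the same values in a different order, the arithmetic is probably fine and the poke or readout order is the more likely suspect; otherwise it reports the positions that differ.

```python
from typing import List


def diagnose_psum(golden: List[int], peeked: List[int]) -> str:
    """Classify how a peeked PSum list differs from its golden list."""
    if len(golden) != len(peeked):
        return f"length mismatch: {len(golden)} golden vs {len(peeked)} peeked"
    if golden == peeked:
        return "match"
    if sorted(golden) == sorted(peeked):
        # Same multiset of values: suspect ordering or index mapping,
        # not the accumulation itself.
        return "same values, different order"
    bad = [i for i, (g, p) in enumerate(zip(golden, peeked)) if g != p]
    return f"mismatches at positions {bad}"


print(diagnose_psum([1, 2, 3], [1, 2, 3]))  # match
print(diagnose_psum([1, 2, 3], [3, 1, 2]))  # same values, different order
print(diagnose_psum([1, 2, 3], [1, 5, 3]))  # mismatches at positions [1]
```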
Chipyard is a popular learning platform. Could you integrate your code, especially the test code, into the Chipyard environment?
genAdrCountData's address is incorrect when there is more than one non-zero element in the last column.
Although I have added fork.withRegion, it reports errors after one poke and peek cycle.
[inActIO0@Adr@0] poke 1 now
[inActIO0@Data@0] poke 562 now
[inActIO1@Adr@0] poke 1 now
[inActIO1@Data@0] poke 2194 now
[inActIO2@Adr@0] poke 1 now
[inActIO2@Data@0] poke 2945 now
[muxInActIO@0@0@Adr@0] now valid = Bool(true)
[muxInActIO@0@0@Adr@0] peek 1 now
[muxInActIO@0@0@Data@0] now valid = Bool(true)
[muxInActIO@0@0@Adr@0] peek 562 now
[muxInActIO@0@1@Adr@0] now valid = Bool(true)
[muxInActIO@0@1@Adr@0] peek 1 now
[muxInActIO@0@1@Data@0] now valid = Bool(true)
[muxInActIO@0@1@Adr@0] peek 2194 now
[muxInActIO@0@2@Adr@0] now valid = Bool(true)
[muxInActIO@0@2@Adr@0] peek 1 now
[muxInActIO@0@2@Data@0] now valid = Bool(true)
[muxInActIO@0@2@Adr@0] peek 2945 now
[muxInActIO@1@0@Adr@0] now valid = Bool(true)
[muxInActIO@1@0@Adr@0] peek 1 now
[muxInActIO@1@0@Data@0] now valid = Bool(true)
[muxInActIO@1@0@Adr@0] peek 2194 now
[muxInActIO@1@1@Adr@0] now valid = Bool(true)
[muxInActIO@1@1@Adr@0] peek 1 now
[muxInActIO@1@1@Data@0] now valid = Bool(true)
[muxInActIO@1@1@Adr@0] peek 2945 now
[muxInActIO@2@0@Adr@0] now valid = Bool(true)
[muxInActIO@2@0@Adr@0] peek 1 now
[muxInActIO@2@0@Data@0] now valid = Bool(true)
[muxInActIO@2@0@Adr@0] peek 2945 now
[inActIO0@Adr@0] t + 1, ready = Bool(true)
[inActIO0@Adr@1] poke 4 now
[inActIO0@Data@0] t + 1, ready = Bool(true)
[inActIO0@Data@1] poke 432 now
[inActIO1@Adr@0] t + 1, ready = Bool(true)
[inActIO1@Adr@1] poke 15 now
[inActIO1@Data@0] t + 1, ready = Bool(true)
[inActIO1@Data@1] poke 2241 now
[inActIO2@Adr@0] t + 1, ready = Bool(true)
[inActIO2@Adr@1] poke 15 now
[inActIO2@Data@0] t + 1, ready = Bool(true)
[inActIO2@Data@1] poke 1411 now
test PEClusterInAct Success: 0 tests passed in 4 cycles in 0.200795 seconds 19.92 Hz
Bool(IO io_inActToArrayData_muxInActData_0_0_dataIOs_data_ready in PEClusterInAct) -> Bool(IO io_inActToArrayData_inActIO_0_dataIOs_data_ready in PEClusterInAct): Divergent poking / peeking threads
chiseltest.ThreadOrderDependentException: Bool(IO io_inActToArrayData_muxInActData_0_0_dataIOs_data_ready in PEClusterInAct) -> Bool(IO io_inActToArrayData_inActIO_0_dataIOs_data_ready in PEClusterInAct): Divergent poking / peeking threads