
singularitykchen / dl_accelerator


Deep Learning Accelerator Based on Eyeriss V2 Architecture with custom RISC-V extended instructions

Scala 100.00%
chisel3 deep-learning-accelerator eyeriss final-year-project risc-v

dl_accelerator's Introduction

Hi there 👋

My name is Singularity Chen, and I am a Ph.D. candidate in SCSE at NTU. My interests are architecture design and exploring fashionable things.


Here are the coding languages that I use (actually, I also use Python and Perl in my private repos ⚡).

Top Langs


dl_accelerator's People

Contributors

hafred, sequencer, singularitykchen

dl_accelerator's Issues

[module ClusterGroup] Combinational loop detected

This is one kind of combinational loop that was detected; it should be broken inside the Router Cluster.

firrtl.transforms.CheckCombLoops$CombLoopException: : [module ClusterGroup] Combinational loop detected:
ClusterGroup.PECluster.io_dataPath_pSumIO_inIOs_0_valid
ClusterGroup.PECluster._GEN_1	 @[PECluster.scala 44:46]
ClusterGroup.PECluster.muxInPSumDataWire_0_valid	 @[PECluster.scala 45:28 PECluster.scala 48:28]
ClusterGroup.PECluster.ProcessingElement_8.io_dataStream_ipsIO_valid	 @[PECluster.scala 42:38]
ClusterGroup.PECluster.ProcessingElement_8.Queue_4.io_enq_valid	 @[Decoupled.scala 288:22]
ClusterGroup.PECluster.ProcessingElement_8.Queue_4._GEN_8	 @[Decoupled.scala 236:25]
ClusterGroup.PECluster.ProcessingElement_8.Queue_4.io_deq_valid	 @[Decoupled.scala 231:16 Decoupled.scala 236:40]
ClusterGroup.PECluster.ProcessingElement_8.ProcessingElementPad.io_dataStream_ipsIO_valid	 @[ProcessingElement.scala 44:26]
ClusterGroup.PECluster.ProcessingElement_8.ProcessingElementPad._T_104	 @[ProcessingElement.scala 301:62]
ClusterGroup.PECluster.ProcessingElement_8.ProcessingElementPad._T_105	 @[ProcessingElement.scala 301:91]
ClusterGroup.PECluster.ProcessingElement_8.ProcessingElementPad.io_dataStream_opsIO_valid	 @[ProcessingElement.scala 301:29]
ClusterGroup.PECluster.ProcessingElement_8.Queue_5.io_enq_valid	 @[Decoupled.scala 288:22]
ClusterGroup.PECluster.ProcessingElement_8.Queue_5._GEN_8	 @[Decoupled.scala 236:25]
ClusterGroup.PECluster.ProcessingElement_8.Queue_5.io_deq_valid	 @[Decoupled.scala 231:16 Decoupled.scala 236:40]
ClusterGroup.PECluster.ProcessingElement_8.io_dataStream_opsIO_valid	 @[ProcessingElement.scala 45:23]
ClusterGroup.PECluster.ProcessingElement_4.io_dataStream_ipsIO_valid	 @[PECluster.scala 40:36]
ClusterGroup.PECluster.ProcessingElement_4.Queue_4.io_enq_valid	 @[Decoupled.scala 288:22]
ClusterGroup.PECluster.ProcessingElement_4.Queue_4._GEN_8	 @[Decoupled.scala 236:25]
ClusterGroup.PECluster.ProcessingElement_4.Queue_4.io_deq_valid	 @[Decoupled.scala 231:16 Decoupled.scala 236:40]
ClusterGroup.PECluster.ProcessingElement_4.ProcessingElementPad.io_dataStream_ipsIO_valid	 @[ProcessingElement.scala 44:26]
ClusterGroup.PECluster.ProcessingElement_4.ProcessingElementPad._T_104	 @[ProcessingElement.scala 301:62]
ClusterGroup.PECluster.ProcessingElement_4.ProcessingElementPad._T_105	 @[ProcessingElement.scala 301:91]
ClusterGroup.PECluster.ProcessingElement_4.ProcessingElementPad.io_dataStream_opsIO_valid	 @[ProcessingElement.scala 301:29]
ClusterGroup.PECluster.ProcessingElement_4.Queue_5.io_enq_valid	 @[Decoupled.scala 288:22]
ClusterGroup.PECluster.ProcessingElement_4.Queue_5._GEN_8	 @[Decoupled.scala 236:25]
ClusterGroup.PECluster.ProcessingElement_4.Queue_5.io_deq_valid	 @[Decoupled.scala 231:16 Decoupled.scala 236:40]
ClusterGroup.PECluster.ProcessingElement_4.io_dataStream_opsIO_valid	 @[ProcessingElement.scala 45:23]
ClusterGroup.PECluster.ProcessingElement.io_dataStream_ipsIO_valid	 @[PECluster.scala 39:38]
ClusterGroup.PECluster.ProcessingElement.Queue_4.io_enq_valid	 @[Decoupled.scala 288:22]
ClusterGroup.PECluster.ProcessingElement.Queue_4._GEN_8	 @[Decoupled.scala 236:25]
ClusterGroup.PECluster.ProcessingElement.Queue_4.io_deq_valid	 @[Decoupled.scala 231:16 Decoupled.scala 236:40]
ClusterGroup.PECluster.ProcessingElement.ProcessingElementPad.io_dataStream_ipsIO_valid	 @[ProcessingElement.scala 44:26]
ClusterGroup.PECluster.ProcessingElement.ProcessingElementPad._T_104	 @[ProcessingElement.scala 301:62]
ClusterGroup.PECluster.ProcessingElement.ProcessingElementPad._T_105	 @[ProcessingElement.scala 301:91]
ClusterGroup.PECluster.ProcessingElement.ProcessingElementPad.io_dataStream_opsIO_valid	 @[ProcessingElement.scala 301:29]
ClusterGroup.PECluster.ProcessingElement.Queue_5.io_enq_valid	 @[Decoupled.scala 288:22]
ClusterGroup.PECluster.ProcessingElement.Queue_5._GEN_8	 @[Decoupled.scala 236:25]
ClusterGroup.PECluster.ProcessingElement.Queue_5.io_deq_valid	 @[Decoupled.scala 231:16 Decoupled.scala 236:40]
ClusterGroup.PECluster.ProcessingElement.io_dataStream_opsIO_valid	 @[ProcessingElement.scala 45:23]
ClusterGroup.PECluster.io_dataPath_pSumIO_outIOs_0_valid	 @[PECluster.scala 38:34]
ClusterGroup.RouterCluster.io_dataPath_routerData_pSumRIO_0_inIOs_0_valid	 @[ClusterGroup.scala 92:61]
ClusterGroup.RouterCluster.PSumRouter.io_dataPath_inIOs_0_valid	 @[RouterCluster.scala 14:75]
ClusterGroup.RouterCluster.PSumRouter._GEN_6	 @[RouterCluster.scala 153:28]
ClusterGroup.RouterCluster.PSumRouter.pSumInternalDataWire_valid	 @[RouterCluster.scala 154:22 RouterCluster.scala 158:22 RouterCluster.scala 162:22]
ClusterGroup.RouterCluster.PSumRouter._T_7	 @[RouterCluster.scala 169:59]
ClusterGroup.RouterCluster.PSumRouter._GEN_16	 @[RouterCluster.scala 166:29]
ClusterGroup.RouterCluster.PSumRouter.io_dataPath_outIOs_0_valid	 @[RouterCluster.scala 169:33]
ClusterGroup.RouterCluster.io_dataPath_routerData_pSumRIO_0_outIOs_0_valid	 @[RouterCluster.scala 14:75]
ClusterGroup.PECluster.io_dataPath_pSumIO_inIOs_0_valid	 @[ClusterGroup.scala 94:40]
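The trace above describes a cycle in the combinational dependency graph of the `valid` signal: it leaves the PECluster pSum output, passes through PSumRouter without a register, and returns to the PECluster pSum input. The following is a minimal Python sketch of that idea, not FIRRTL's actual CheckCombLoops pass. The node names are shortened from the trace, and `find_comb_loop` is a hypothetical helper. It finds the cycle with a depth-first search, then shows that the cycle disappears once the path inside the router is assumed to be registered.

```python
from typing import Dict, List, Optional


def find_comb_loop(graph: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one combinational cycle (first node repeated at the end), or None."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished
    color = {node: WHITE for node in graph}
    for successors in graph.values():
        for succ in successors:
            color.setdefault(succ, WHITE)
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        color[node] = GREY
        path.append(node)
        for succ in graph.get(node, []):
            if color[succ] == GREY:
                # succ is already on the current path, so a loop is closed here.
                return path[path.index(succ):] + [succ]
            if color[succ] == WHITE:
                found = visit(succ)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for node in list(color):
        if color[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None


# Combinational valid path, abbreviated from the trace (names are shortened, not real signals).
comb_paths = {
    "PECluster.pSumIn.valid": ["PE_8.ips.valid"],
    "PE_8.ips.valid": ["PE_8.ops.valid"],
    "PE_8.ops.valid": ["PE_4.ips.valid"],
    "PE_4.ips.valid": ["PE_4.ops.valid"],
    "PE_4.ops.valid": ["PE.ips.valid"],
    "PE.ips.valid": ["PE.ops.valid"],
    "PE.ops.valid": ["PECluster.pSumOut.valid"],
    "PECluster.pSumOut.valid": ["PSumRouter.in.valid"],
    "PSumRouter.in.valid": ["PSumRouter.out.valid"],
    "PSumRouter.out.valid": ["PECluster.pSumIn.valid"],
}
loop = find_comb_loop(comb_paths)

# Assumption: a register inside the router removes the in -> out combinational edge.
registered = dict(comb_paths)
registered["PSumRouter.in.valid"] = []
no_loop = find_comb_loop(registered)
```

In this model, cutting any single edge of the cycle is enough; the issue's suggestion corresponds to cutting the edge inside the router.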

The PE inAct Poke Started at Later PEs

The error is shown below:

io_debugIO_eachPETopDebug_0_3_writeFinishRegVec_0=1 did not equal expected=0: Some(later PEs haven't begin poke address yet)

In fact, the test begins poking inAct data into the later PEs instead of the earlier ones.

Cannot Compile PECluster

When I compile PECluster from the test file, I get several errors; one is shown below:

@[PECluster.scala 159:25 PECluster.scala 166:25 PECluster.scala 173:25] : muxInActDataWire[0][0].adrIOs.data.bits <= mux(_T_122, io.dataPath.inActIO[0].adrIOs.data.bits, mux(_T_123, io.dataPath.inActIO[1].adrIOs.data.bits, mux(_T_124, io.dataPath.inActIO[2].adrIOs.data.bits, VOID))) @[PECluster.scala 159:25 PECluster.scala 166:25 PECluster.scala 173:25]
firrtl.passes.CheckInitialization$RefNotInitializedException:  @[PECluster.scala 20:38] : [module PECluster]  Reference muxInActDataWire is not fully initialized.

But when I checked lines 16703 to 16731 of the generated FIRRTL file, I found nothing special. The corresponding Chisel file is here

GenOneStreamData should be corrected

Currently, it generates only one data stream, but in reality we need a different number of Seq streams for each router type, matching the number of input activation routers, weight routers, and partial sum routers.

Moreover, the relationships between them cannot be described simply as separate Seq streams.
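As a rough illustration of the first point only, here is a minimal Python sketch that produces a separate group of streams per router type. Everything in it is an assumption made for illustration: the function name `gen_router_streams`, the router counts, the stream length, and the value range. The project's real generator is written in Scala, and the relationships between the streams mentioned above are not modelled here.

```python
import random
from typing import Dict, List


def gen_router_streams(
    router_counts: Dict[str, int],
    stream_len: int,
    max_value: int,
    seed: int = 0,
) -> Dict[str, List[List[int]]]:
    """Generate one list of random streams per router type (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed so a test run is reproducible
    return {
        kind: [
            [rng.randint(1, max_value) for _ in range(stream_len)]
            for _ in range(count)
        ]
        for kind, count in router_counts.items()
    }


# Placeholder router counts; substitute the values from the actual configuration.
streams = gen_router_streams(
    {"inAct": 3, "weight": 3, "pSum": 4}, stream_len=8, max_value=255
)
```

Keying the result by router type keeps the number of streams tied to the router count instead of hard-coding a single stream.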

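Question about how to do energy/area modeling for rocketchip

Hi. I have also recently built an accelerator with Chisel, also in the Rocket + accelerator style, and I have now run program simulations with Verilator. Next I want to do power simulation. In other papers I have seen figures like this one:
image
They show the simulated power consumption of each component of the processor. Do you know how this is done? Which tools or software can do it? I hope we can discuss this.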

GLB InAct's write enable is always true

The GLB input activation's write enable (the read line) should go false one cycle after it goes true; otherwise it will cause the data stored in the input activation SRAM to be zero and affect the final result.
image
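One way a stuck write enable could zero the stored data is sketched below in Python. This is a toy behavioural model, not the GLB implementation. It assumes that the write data bus idles at zero after the single valid cycle and that the address does not change; neither assumption is confirmed by the issue. `simulate_sram` is a hypothetical helper.

```python
from typing import List, Tuple


def simulate_sram(cycles: List[Tuple[bool, int, int]], depth: int = 4) -> List[int]:
    """Apply (write_enable, address, write_data) once per clock cycle and return the memory."""
    mem = [0] * depth
    for write_enable, address, write_data in cycles:
        if write_enable:
            mem[address] = write_data
    return mem


# Assumed bus activity: valid data (42) for one cycle, then the bus idles at 0.
bus = [(1, 42), (1, 0), (1, 0)]  # (address, write_data) per cycle

# Write enable stuck at true: the idle zeros overwrite the valid word.
held_high = simulate_sram([(True, addr, data) for addr, data in bus])

# Write enable pulsed for the first cycle only: the valid word survives.
one_cycle_pulse = simulate_sram(
    [(cycle == 0, addr, data) for cycle, (addr, data) in enumerate(bus)]
)
```

Under these assumptions, `held_high` ends with zero at address 1, while `one_cycle_pulse` keeps 42 there.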

Eyeriss dataflow configuration

Dear author,
Thank you for your excellent work in reproducing Eyeriss v2.
I want to ask how you implement the different dataflows (OS, WS, IS) in your design.

Thank you

CSC switcher can only pass several cycles

The CSCSwitcher does not always work.

[info] goldenAdr Vs. outAdr = List(
(1,1), (4,4), (5,5), (7,7), (8,8), (0,0), 
(2,2), (15,15), (5,5), (15,15), (6,6), (0,0),
(1,1), (3,3), (4,4), (5,5), (15,15), (0,0),
(3,3), (4,4), (15,15), (5,5), (7,7), (0,0),
(1,1), (15,15), (2,2), (15,15), (15,15), (0,0),
(1,1), (2,3), (15,15), (4,4), (6,6), (0,0), 
(1,1), (4,4), (7,7), (8,8), (15,9), (0,0), 
(1,15), (15,1), (2,4), (4,5), (5,0), (0,1), 
(2,4), (3,5), (6,7), (7,8), (15,0), (0,2),
(1,15), (3,3), (4,4), (5,0), (6,1), (0,2), 
(1,4), (2,5), (4,7), (15,0), (6,2), (0,3), 
(1,4), (3,15), (4,6), (5,0), (15,1), (0,3), 
(1,5), (2,15), (4,6), (6,0), (15,1), (0,2), 
(2,4), (3,15), (4,6), (6,0), (15,15), (0,1), 
(1,2), (15,3), (2,4), (15,0), (3,2), (0,3),
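To find where the output first diverges from the golden sequence, the (golden, out) pairs can be scanned in order. The sketch below is a minimal Python helper, with `first_mismatch` as a hypothetical name, applied to two rows copied from the log above.

```python
from typing import List, Optional, Tuple


def first_mismatch(pairs: List[Tuple[int, int]]) -> Optional[int]:
    """Return the index of the first (golden, out) pair that differs, or None."""
    for index, (golden, out) in enumerate(pairs):
        if golden != out:
            return index
    return None


# Fifth row (all matching) and sixth row (first failure) of the log above.
excerpt = [
    (1, 1), (15, 15), (2, 2), (15, 15), (15, 15), (0, 0),
    (1, 1), (2, 3), (15, 15), (4, 4), (6, 6), (0, 0),
]
index = first_mismatch(excerpt)
```

In the full log, the first five rows match completely and the first differing pair is (2,3) in the sixth row; after that, most pairs differ.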

LazyEyeriss cannot generate waveform

Trying to generate a waveform for LazyEyeriss throws this error:

requirement failed: Diplomacy error: eyeriss.controlNode (A sink node with parent 
'eyeriss/eyerissWrapper' at  (LazyEyerissSpecTest.scala:18:40)) 
has 1 != 0 up/down inner parameters

Two Tricky Bugs at GLBCluster

  • When I poke doEn with false.B here, the first SRAM's state becomes "not done" after all three SRAMs finish. When I poke it one cycle later, it works fine.
  • Also, when I use timescope inside the fork that reads the three SRAMs, time does not stop and return to where it should, but keeps advancing.

The PSum from the circuit doesn't equal the theoretical results

The PSum peeked from the circuit does not equal the golden results.
Either the golden results or the source poke is wrong.

----------- begin to readout PSum ------------
pSum1 = List(31296, 49452, 47387, 1794, 7, 58771, 23086, 49675, 50953, 28015, 27001, 46664, 30807, 23661, 24353, 10185, 9385, 14769, 10381, 24182, 2995, 7169, 71117, 21761)
-------- 1-th Column PEs receive all inPSum
pSum3 = List(31297, 15064, 47382, 9551, 3568, 61938, 30815, 49678, 50959, 28017, 29409, 54567, 30815, 23653, 6847, 10186, 9393, 14763, 10384, 23815, 28363, 51185, 102775, 61797)
-------- 3-th Column PEs receive all inPSum
pSum0 = List(15136, 2, 22427, 70882, 13931, 263, 18098, 31471, 12650, 86682, 0, 72437, 2734, 5632, 1, 13686, 18794, 19264, 8521, 27892, 48654, 49481, 57641, 119523)
-------- 0-th Column PEs receive all inPSum
pSum2 = List(27743, 26004, 19017, 70885, 24458, 29114, 26909, 31472, 43189, 51480, 22232, 45734, 6, 45, 67, 13688, 16064, 13637, 8518, 27888, 79336, 82811, 57641, 111709)
-------- 2-th Column PEs receive all inPSum
pSum0 = 
List(15136, 13925, 12653, 2731, 36625, 7551, 7, 261, 86683, 5629, 15642, 30509, 22426, 18095, 1, 0, 33110, 8440, 70890, 31474, 72428, 13687, 58895, 21247)
pSum1 = 
List(31296, 3565, 50960, 30808, 3271, 13439, 15059, 61935, 28014, 23653, 2664, 23552, 47380, 30816, 29406, 6843, 37426, 16675, 9546, 49678, 54566, 10187, 13827, 29681)
pSum2 = 
List(16079, 65294, 46726, 4, 32388, 8, 33139, 54610, 22674, 45, 33636, 311, 3882, 37541, 35569, 61, 20188, 20660, 6279, 7182, 51503, 561, 36677, 21631)
pSum3 = 
List(56476, 14201, 6839, 65096, 4, 65474, 39218, 8, 49762, 41943, 44514, 49522, 48691, 6139, 4878, 36194, 27819, 45112, 66582, 36937, 25003, 33703, 1, 7244)

In PEClusterInAct: Divergent poking / peeking threads

Although I have added fork.withRegion, it reports errors after one poke-and-peek cycle.

[inActIO0@Adr@0] poke 1 now
[inActIO0@Data@0] poke 562 now
[inActIO1@Adr@0] poke 1 now
[inActIO1@Data@0] poke 2194 now
[inActIO2@Adr@0] poke 1 now
[inActIO2@Data@0] poke 2945 now
[muxInActIO@0@0@Adr@0] now valid = Bool(true)
[muxInActIO@0@0@Adr@0] peek 1 now
[muxInActIO@0@0@Data@0] now valid = Bool(true)
[muxInActIO@0@0@Adr@0] peek 562 now
[muxInActIO@0@1@Adr@0] now valid = Bool(true)
[muxInActIO@0@1@Adr@0] peek 1 now
[muxInActIO@0@1@Data@0] now valid = Bool(true)
[muxInActIO@0@1@Adr@0] peek 2194 now
[muxInActIO@0@2@Adr@0] now valid = Bool(true)
[muxInActIO@0@2@Adr@0] peek 1 now
[muxInActIO@0@2@Data@0] now valid = Bool(true)
[muxInActIO@0@2@Adr@0] peek 2945 now
[muxInActIO@1@0@Adr@0] now valid = Bool(true)
[muxInActIO@1@0@Adr@0] peek 1 now
[muxInActIO@1@0@Data@0] now valid = Bool(true)
[muxInActIO@1@0@Adr@0] peek 2194 now
[muxInActIO@1@1@Adr@0] now valid = Bool(true)
[muxInActIO@1@1@Adr@0] peek 1 now
[muxInActIO@1@1@Data@0] now valid = Bool(true)
[muxInActIO@1@1@Adr@0] peek 2945 now
[muxInActIO@2@0@Adr@0] now valid = Bool(true)
[muxInActIO@2@0@Adr@0] peek 1 now
[muxInActIO@2@0@Data@0] now valid = Bool(true)
[muxInActIO@2@0@Adr@0] peek 2945 now
[inActIO0@Adr@0] t + 1, ready = Bool(true)
[inActIO0@Adr@1] poke 4 now
[inActIO0@Data@0] t + 1, ready = Bool(true)
[inActIO0@Data@1] poke 432 now
[inActIO1@Adr@0] t + 1, ready = Bool(true)
[inActIO1@Adr@1] poke 15 now
[inActIO1@Data@0] t + 1, ready = Bool(true)
[inActIO1@Data@1] poke 2241 now
[inActIO2@Adr@0] t + 1, ready = Bool(true)
[inActIO2@Adr@1] poke 15 now
[inActIO2@Data@0] t + 1, ready = Bool(true)
[inActIO2@Data@1] poke 1411 now
test PEClusterInAct Success: 0 tests passed in 4 cycles in 0.200795 seconds 19.92 Hz


Bool(IO io_inActToArrayData_muxInActData_0_0_dataIOs_data_ready in PEClusterInAct) -> Bool(IO io_inActToArrayData_inActIO_0_dataIOs_data_ready in PEClusterInAct): Divergent poking / peeking threads
chiseltest.ThreadOrderDependentException: Bool(IO io_inActToArrayData_muxInActData_0_0_dataIOs_data_ready in PEClusterInAct) -> Bool(IO io_inActToArrayData_inActIO_0_dataIOs_data_ready in PEClusterInAct): Divergent poking / peeking threads
