lassen's People

Contributors

alexcarsello, bobcheng15, cdonovick, jack-melchert, jeffsetter, kalhankoul96, kuree, leonardt, nikhilbhagdikar, phanrahan, priyanka-raina, rdaly525, rsetaluri


lassen's Issues

Master is broken

It looks like some mapper tests and RTL tests are failing.

Mapper tests fixed by #24

Mapper/Packer tasks

  • Automatically generate and compile complex op rewrite rules
  • Full Halide test suite for mapper rewrite rules
  • Implement Instruction selection
  • Implement hierarchical packing
    • Constant/register packing
    • Investigate generating VPR format from PEak spec
    • Investigate using SMT to generate viable packings

Lassen Tapeout Task List

Resolving/Missing Features

  • Counter - dropping
  • Breakpoints/IRQ - Not dropping
  • Floating Point Comparison/flags, Ross, Nikhil
  • Resolve differences between the Python gmpy2.mpfr implementation and DesignWare, Caleb, Ross, Nikhil
  • clk_en/Register Valid
  • Carry in
  • compare specification of instructions to both python simulator and RTL, Lenny, Keyi

Missing randomized Tests

  • "micro" FP ops, Nikhil, Ross (branch 'float-test')

    • GetMant
    • AddIExp
    • SubExp
    • CnvExp2F
    • GetFInt
    • GetFFrac
    • FCnvInt2F
    • FPSub
    • FPAdd
  • Complex Ops, Nikhil, Ross (Branch 'complex-float')

    • div
    • ln
    • exp
    • round
    • Ceil
    • Floor
  • Register Modes (for each of the 5 inputs), Pat:

    • Delay
    • Const
    • Bypass
  • Carry tests, Ross ('add32' -> master)

    • adc
    • sbc
    • add32 (complex op)
    • sub32 (complex op)
  • IRQ

  • Reading and writing Data and bit registers (Ross/Keyi)

  • Stall

  • Asynchronous reset on Data/Bit Registers (Lenny)

RTL

  • Check that no functional units are duplicated, Lenny
    • check that FP FUs are not duplicated (only one instance of add/mul modules in output verilog)
    • reduced the number of multiply units via #131 and #132
  • Sanity check the RTL and look for simple optimizations, Lenny
    • Register (Peak) - visually inspected and tested via cdonovick/peak#56
    • RegisterMode - #125
    • These datapath elements are currently tested through test_pe, so we're fairly confident in their functionality:
      • _lut
      • adc
      • alu
      • cond
      • ite
      • lut
      • overflow
      • PE

Need memory specification in order to do mapping

I have a branch called 'mem' where I began specifying the memory using Peak.

I need the following fleshed out in the Peak representation of this memory tile.

  • All the ports that can be routed to
  • The full instruction
  • The functional specification (lower priority)

Currently I have only loosely specified the RAM/ROM modes in lassen/mem/*.py

Let me know if you have any questions.

Why is irq listed as a PE output?

It seems like the irq signal gets propagated into canal as a valid output. As a result, the irq signal goes to all the switchbox muxes, potentially increasing the area. What is this signal doing? Based on the code here:

lassen/lassen/sim.py

Lines 257 to 261 in 5909514

# calculate interrupt request
irq = Bit(0) # NYI
# return 16-bit result, 1-bit result, irq
return alu_res, res_p, irq

Is irq a way to output a constant 0 to the application network? If not, can someone explain which of the applications we have so far need it? Is it a form of premature optimization?

Rounding mode mismatch with the hardware

Two problems:

  1. The rounding mode (round to nearest even) does not match between the functional model and the RTL. See: https://buildkite.com/stanford-aha/lassen/builds/109#712c5179-b555-4ddb-8fd3-6da54ebba32b
    The functional model is using mpfr, which I believe should also be correct.

  2. When using -v with pytest, fault doesn't catch the error, yet running the test individually does. There is something wrong with either the test bench setup or fault. See the successful build: https://buildkite.com/stanford-aha/lassen/builds/108#ef4d8b94-f985-4e9e-9873-cdf64ba87be3
    @leonardt can you take a closer look?

Add comments on PE bitstream

Currently there is no way to know which ops a PE bitstream corresponds to. In other words, it's impossible to debug the PE. Can someone add a way to annotate the bitstream with comments so that a human programmer can understand what's going on?

Lassen Bug Tracker

FP_Mult Error mode issue
- Problem: RTL mismatched FPVector and Halide due to wrapper
- Solution: Use DW FP IP instead
- Verification: Lassen RTL tests
- Can be resolved today

SLT/SGT Bug
- Problem: SMT cannot infer SLT and SGT, but the random tests are passing
- Solution: Still debugging (Caleb)
- Verification: Automapper finding SLT and SGT

SubExp Bug
- Problem: A small portion of random tests (~5%) are failing on the SubExp micro op RTL
- Solution: Still debugging (Lenny; anyone else want to help?)
- Verification: Lassen RTL tests

Update complex op tests

@nikhilbhagdikar I have isolated your complex ops tests into a separate branch called 'complex-float'. Can you change them to inherit from Peak in that branch?

Again, in general please do small pull requests that do not have multiple orthogonal changes.

asr doesn't work at all

When I was running Jeff's end-to-end tests, the waveform I saw showed that asr is not implemented correctly in RTL. See the waveform below.
[image: waveform]

Then I went ahead and poked around inside lassen. Even after I added asr to RTL, the tests still passed. That's odd. I then asked pytest to print out everything, and here is what I found:

  1. asr is not working
  2. fault is not reporting an error even though verilator fails.
  3. (maybe a problem) Where is the SMT formal check on the RTL? Or is this not how formal works? (Please forgive me if I have made a wrong assertion.)

Here is the link to the print out: https://travis-ci.org/StanfordAHA/lassen/builds/532629001#L1217
You can reproduce the individual test using test_asr branch with the following command:

pytest tests/test_rtl.py -k "test_rtl[False-mode0-asr]" -s

You should see that the verilator assertion failed but pytest passed.

I'd call for a thorough manual examination of every single op we have, with manual tests for each, to prevent these kinds of mistakes from happening again.

Lassen read/write registers

@rdaly525
Peak core needs to output the following ports:

  • config_addr -> this is 8 bits.
  • config_data -> this is 32 bits wide
  • read_config_data -> this can be any size
  • config_en -> this is the write signal
  • reset -> 1-bit signal that sets everything to zero

Notice that there is no read signal (sorry I lied). The core should always return the value at the given addr on read_config_data.
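A minimal behavioural sketch of that interface in plain Python may help pin down the intent. It is not the Peak core: the class name `ConfigRegs`, the `step` method, the 32-bit read width (the list says read_config_data can be any size) and the choice to return the freshly written value in the same cycle are all assumptions made for illustration.

```python
class ConfigRegs:
    """Toy model of the config interface: 8-bit address, 32-bit data."""

    ADDR_BITS = 8
    DATA_BITS = 32

    def __init__(self):
        self.regs = [0] * (1 << self.ADDR_BITS)

    def step(self, config_addr, config_data, config_en, reset):
        """Advance one clock cycle and return read_config_data."""
        addr = config_addr & ((1 << self.ADDR_BITS) - 1)
        if reset:
            # reset sets everything to zero
            self.regs = [0] * len(self.regs)
        elif config_en:
            # config_en is the write signal
            self.regs[addr] = config_data & ((1 << self.DATA_BITS) - 1)
        # no read enable: the addressed value is always returned
        # (assumption: a write is visible on the read port immediately)
        return self.regs[addr]


regs = ConfigRegs()
regs.step(config_addr=3, config_data=0xDEADBEEF, config_en=1, reset=0)
readback = regs.step(config_addr=3, config_data=0, config_en=0, reset=0)
```

Whether a write should be visible on read_config_data in the same cycle or the next one is exactly the kind of detail this list leaves open.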

More Complex Ops

We need Round, Floor and Ceil. If we have one of these ops, the other two are very easy to create by just adding or subtracting 0.5 appropriately.
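As a rough sketch of how the three relate, here is one way to build the other two from Floor in plain Python. The helper names are hypothetical, `math.floor` stands in for the hardware op, and ties in the rounding helper go towards +infinity, which may not match the rounding mode the PE ends up using. Ceil is written as `-floor(-x)` because the plus or minus 0.5 shortcut depends on the tie-breaking rule.

```python
import math


def round_half_up(x):
    # round(x) expressed as floor(x + 0.5); ties go towards +infinity
    return math.floor(x + 0.5)


def ceil_from_floor(x):
    # ceil(x) == -floor(-x) holds exactly
    return -math.floor(-x)


samples = (-1.5, -0.2, 0.0, 1.2, 2.5)
results = [(x, round_half_up(x), ceil_from_floor(x)) for x in samples]
```

Note that `x + 0.5` is itself rounded in floating point, so inputs very close to a tie can land on the wrong side; randomized tests against a reference would show how much that matters here.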

@nikhilbhagdikar, could you add these to our complex ops (in lassen/stdlib/) and add some tests?

For the tests, please use hwtypes.FPVector in order to construct and manipulate python floating point values

DELAY mode doesn't work in RTL

Screenshot from 2019-05-10 11-44-41

Here we have a PE configured as mult (1, reg), where data0 is configured as a CONST register and data1 is configured as DELAY.
0: Signal from the data0 connection box. Since data0 is in CONST mode, as indicated by signal 2, this is expected.
1: Signal from the data1 connection box. This is the input to the PE core.
2: Register mode for data0. 0 is CONST, so this is expected.
3: Register const value for data0. This is expected since it's a multiply by 1.
4: Register mode for data1. 11 is 3, which is DELAY mode. This is also correct.
5: This is the signal going to the ALU unit. Notice that there is no delay!
6: This is the output from the ALU unit. No delay either.

Mode flags are here:

lassen/lassen/mode.py

Lines 9 to 17 in 5ed30d6

def gen_mode_type(family):
    """
    Field for specifying register modes
    """
    class Mode(family.Enum):
        CONST = 0   # Register returns constant in constant field
        BYPASS = 2  # Register is bypassed and input value is returned
        DELAY = 3   # Register written with input value, previous value returned
    return Mode
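For reference, the behaviour those comments describe can be sketched as a small Python model. This is only an illustration of the expected semantics: the `ModeRegister` class, its `step` method and the reset value of 0 are assumptions, not code from lassen.

```python
from enum import Enum


class Mode(Enum):
    CONST = 0   # register returns the constant field
    BYPASS = 2  # register is bypassed, input is returned
    DELAY = 3   # register is written with input, previous value returned


class ModeRegister:
    """Hypothetical behavioural model of one PE input register."""

    def __init__(self, mode, const=0):
        self.mode = mode
        self.const = const
        self.value = 0  # assumed reset value

    def step(self, data_in):
        if self.mode is Mode.CONST:
            return self.const
        if self.mode is Mode.BYPASS:
            return data_in
        # DELAY: output last cycle's value, then capture the new input
        previous, self.value = self.value, data_in
        return previous


# Same setup as the report: mult(1, reg) with data0 CONST and data1 DELAY
data0 = ModeRegister(Mode.CONST, const=1)
data1 = ModeRegister(Mode.DELAY)
products = [data0.step(0) * data1.step(x) for x in (5, 6, 7)]
```

Under this reading the product stream lags the data1 input by one cycle, which is the delay missing at signals 5 and 6 in the screenshot.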

Other "micro" ops for the PE?

Halide requires the use of the following complex floating point operations:

div, rem, log, exp, pow, sqrt, sin, cos, tan, asin, acos, atan2, tanh

We currently have a set of micro ops that are required for the floating point divide algorithm. They are as follows:

FGetMant
FAddIExp
FSubExp
FCnvExp2F
FGetFInt
FGetFFrac

Are these micro ops sufficient to do similar algorithms for the rest of these complex operations?
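To make the question concrete for one case, here is a hedged sketch of how log could decompose into a mantissa step and an exponent step, using the identity ln(x) = e * ln(2) + ln(m) for x = m * 2**e. It is plain Python, not PE code: `math.frexp` stands in for the mantissa and exponent micro ops, `math.log(m)` stands in for whatever table or polynomial would evaluate ln on the mantissa range, and the function name is hypothetical.

```python
import math


def ln_via_parts(x):
    """ln(x) = e * ln(2) + ln(m) for x = m * 2**e, with x > 0."""
    # frexp returns m in [0.5, 1) and an integer exponent e
    m, e = math.frexp(x)
    # math.log(m) is a stand-in for a small lookup or polynomial on m
    return e * math.log(2.0) + math.log(m)


approx = ln_via_parts(10.0)
```

This only suggests that log-like functions fit the same split; it does not answer the question for the trigonometric ops in the list.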

Wrong implementation for LTE_MIN

lassen/lassen/sim.py

Lines 77 to 84 in 008a0da

elif alu == ALU.GTE_Max:
    # C, V = a-b?
    pred = a >= b
    res, res_p = pred.ite(a,b), a >= b
elif alu == ALU.LTE_Min:
    # C, V = a-b?
    pred = a <= b
    res, res_p = pred.ite(a,b), a >= b

Someone needs to write more exhaustive tests on every op with multiple inputs.
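A minimal sketch of the kind of exhaustive check meant here, in plain Python rather than the hwtypes-based sim.py. It assumes the intended fix is for the LTE_Min predicate output to follow `a <= b`, matching the select; the function names `gte_max` and `lte_min` are illustrative only.

```python
def gte_max(a, b):
    pred = a >= b
    return (a if pred else b), pred


def lte_min(a, b):
    pred = a <= b
    # assumed fix: the predicate output uses the same comparison as the select
    return (a if pred else b), pred


# exhaustive check over a small signed range
values = range(-8, 8)
mismatches = [
    (a, b)
    for a in values
    for b in values
    if gte_max(a, b) != (max(a, b), a >= b)
    or lte_min(a, b) != (min(a, b), a <= b)
]
```

With 16-bit operands a full sweep is too large for a unit test, so a real version would likely combine corner cases with random sampling.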

Flags are wrong for FPAdd

We need the flags to be correct in order to do all the floating point comparison operations (<, <=, ==, !=, etc.).

Ideally this can be done using only the already existing flags, so that cond.py does not actually depend on the instruction.
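As a rough illustration of deriving every comparison from a fixed pair of flags, the sketch below models a zero flag and a negative flag on `a - b` in plain Python. The flag names Z and N and the helper names are assumptions for illustration, not what cond.py actually implements, and NaN handling is ignored.

```python
def compare_flags(a, b):
    """Return (Z, N) for the difference a - b. NaN is not handled."""
    diff = a - b
    return diff == 0.0, diff < 0.0


def comparisons(a, b):
    z, n = compare_flags(a, b)
    # every comparison is a function of the two flags only
    return {
        "==": z,
        "!=": not z,
        "<": n,
        "<=": n or z,
        ">": not (n or z),
        ">=": not n,
    }


table = comparisons(1.5, 2.25)
```

The point of the design is that the condition logic only sees Z and N, so it stays the same whichever instruction produced them.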

Better support for FPSub

I propose there be a dedicated configuration bit that can flip the sign bit of floating point values. This would prevent burning an additional PE when doing FPSUB. (Without it, you have to flip the sign bit by first doing an XOR in a separate PE.)
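The sign-flip trick itself can be sketched in a few lines of Python. IEEE-754 single precision is used here purely for illustration, since Python's `struct` supports it directly; the PE's own float format may differ, and `flip_sign` and `fp_sub` are hypothetical names.

```python
import struct

SIGN_MASK = 0x80000000  # sign bit of an IEEE-754 single


def flip_sign(x):
    # reinterpret the float as a 32-bit integer, XOR the sign bit, convert back
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return struct.unpack("<f", struct.pack("<I", bits ^ SIGN_MASK))[0]


def fp_sub(a, b):
    # a - b expressed as a + (-b), with the negation done by the bit flip
    return a + flip_sign(b)


diff = fp_sub(5.5, 2.25)
```

A configuration bit would apply the same XOR inside the PE before the adder instead of in a separate op.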

@alexcarsello, thoughts?

Leverage the functional tests to do RTL tests

One major issue we have seen is that we are missing a lot of RTL unit tests in lassen. Due to the nature of Peak we should be able to parameterize the functional tests (test_pe.py) to also do RTL tests.

Floating point lassen implementations seem to have the wrong endianness

Floating point numbers should have their fractional bits as their lsbs, and their sign bit as the msb.
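To illustrate that layout, the sketch below splits a float into its fields with the sign at the msb and the fraction in the lsbs. It uses IEEE-754 single precision only because Python's `struct` supports it directly; the field widths in lassen may differ, and `fp32_fields` is a hypothetical helper.

```python
import struct


def fp32_fields(x):
    """Split an IEEE-754 single into (sign, exponent, fraction)."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    sign = bits >> 31               # msb
    exponent = (bits >> 23) & 0xFF  # next 8 bits
    fraction = bits & 0x7FFFFF      # 23 lsbs
    return sign, exponent, fraction


fields = fp32_fields(-1.5)
```

For -1.5 the sign is 1, the biased exponent is 127 and only the top fraction bit is set.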

@nikhilbhagdikar, I suspect you have done the opposite in your lassen implementation for most of the fp ops.

I have a branch called 'float-test' that contains your floating point changes along with some more parameterized tests.

On this branch, you can run:
>pytest -k get_mant

and you will see an assertion error.

IRQ discussion

I have a preliminary branch implementing IRQ in lassen called 'irq'

Currently, for both the ALU output and the single bit output, you can enable irq along with a comparison value. If at any time while running the comparison is true, the PE will output a 1 on the irq output and also latch that 1 in a register in the PE. The reason for this latch is so that the SOC can easily determine which PE triggered the interrupt.

Open questions for lassen features:
- Should the IRQ be output in the same cycle in which the interrupt occurs?
- Should we continue to output the IRQ until it is cleared by software?
- Should we support multiple comparison operations? (==, !=, <, >, etc.)

Let me know your thoughts or any other assumptions I missed.
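To make the discussion concrete, here is a small Python model of one possible reading of the behaviour described above. It is not the code on the 'irq' branch: the class and method names are hypothetical, and it picks an answer to each open question (equality compare only, irq raised in the same cycle, output not held) purely so there is something specific to react to.

```python
class IrqUnit:
    """Hypothetical model of irq generation for one PE output."""

    def __init__(self, enable, compare_value):
        self.enable = enable
        self.compare_value = compare_value
        self.latched = 0  # sticky bit the SOC could read to find the source PE

    def step(self, result):
        # assumption: equality compare, irq raised in the same cycle as the match
        irq = int(self.enable and result == self.compare_value)
        self.latched |= irq
        return irq

    def clear(self):
        # assumption: software clears the latch explicitly
        self.latched = 0


unit = IrqUnit(enable=True, compare_value=42)
pulses = [unit.step(v) for v in (1, 42, 7)]
```

Holding the output until software clears it would amount to returning `self.latched` from `step` instead of `irq`.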

Why was VALID mode removed from PE registers?

During configuration, when the chip is clock-gated, using VALID mode prevents the registers from taking values. This mode proved to be very critical when I was testing the Jade chip.

So the question is: was VALID renamed to DELAY mode, or can we never clock gate the register in the future? If that's the case, there should be extra graph analysis to make sure that a register used in a counter is never one inside a PE, since you can't clock gate it.
