lassen issue discussions from StanfordAHA

Add the following complex ops to rewrite rule generation

umin, umax, smin, smax

Wrong implementation for LTE_MIN

Lines 77 to 84 in 008a0da

    
    elif alu == ALU.GTE_Max:
        # C, V = a-b?
        pred = a >= b
        res, res_p = pred.ite(a,b), a >= b
    elif alu == ALU.LTE_Min:
        # C, V = a-b?
        pred = a <= b
        res, res_p = pred.ite(a,b), a >= b

Someone needs to write more exhaustive tests on every op with multiple inputs.
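
As a rough illustration of what such an exhaustive test could look like, here is a minimal sketch in plain Python (not Peak or hwtypes). It assumes the intended 1-bit result for LTE_Min is `a <= b`, which the issue does not state explicitly; `gte_max`, `lte_min`, and `check_all` are hypothetical names, and the 4-bit width is chosen only so that every input pair can be enumerated.

```python
from itertools import product

WIDTH = 4  # small width so every input pair can be enumerated


def to_signed(x, width=WIDTH):
    """Interpret an unsigned width-bit value as two's complement."""
    return x - (1 << width) if x & (1 << (width - 1)) else x


def gte_max(a, b):
    pred = a >= b
    return (a if pred else b), pred


def lte_min(a, b):
    pred = a <= b
    # Assumed fix: reuse pred for the 1-bit result instead of `a >= b`.
    return (a if pred else b), pred


def check_all(signed):
    """Compare both ops against Python's min/max on every input pair."""
    for a, b in product(range(1 << WIDTH), repeat=2):
        x, y = (to_signed(a), to_signed(b)) if signed else (a, b)
        assert gte_max(x, y) == (max(x, y), x >= y)
        assert lte_min(x, y) == (min(x, y), x <= y)
    return (1 << WIDTH) ** 2  # number of input pairs checked
```

Calling `check_all(False)` and `check_all(True)` covers the unsigned and signed interpretations; with the original `a >= b` predicate in `lte_min`, the check would fail on the first pair where `a < b`.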

DELAY mode doesn't work in RTL

Here we have a PE configured as mult (1, reg), where data0 is configured as a const register and data1 is configured as DELAY.
0. Signal from the data0 connection box. Since data0 is in CONST mode, as indicated in signal 2, this is expected.
1. Signal from the data1 connection box. This is the input to the PE core.
2. Register mode for data0. 0 is CONST, so this is expected.
3. Register const value for data0. This is expected since it's a multiply by 1.
4. Register mode for data1. 11 is 3, which is DELAY mode. This is also correct.
5. This is the signal going to the ALU unit. Notice that there is no delay!
6. This is the output from the ALU unit. No delay either.

Mode flags are here:

lassen/lassen/mode.py

Lines 9 to 17 in 5ed30d6

    
    def gen_mode_type(family):
        """
        Field for specifying register modes
        """
        class Mode(family.Enum):
            CONST = 0   # Register returns constant in constant field
            BYPASS = 2  # Register is bypassed and input value is returned
            DELAY = 3   # Register written with input value, previous value returned
        return Mode
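
To make the expected DELAY behavior concrete, here is a minimal behavioral sketch in plain Python, based only on the comments in the enum above. `ModeRegister` and its `step` method are hypothetical names for illustration and are not part of lassen.

```python
from enum import Enum


class Mode(Enum):
    CONST = 0   # return the constant field
    BYPASS = 2  # return the input unchanged
    DELAY = 3   # store the input, return the previous value


class ModeRegister:
    def __init__(self, mode, const=0, init=0):
        self.mode = mode
        self.const = const
        self.value = init

    def step(self, data_in):
        """Advance one clock cycle and return the value seen by the ALU."""
        if self.mode is Mode.CONST:
            return self.const
        if self.mode is Mode.BYPASS:
            return data_in
        # DELAY: return the previously stored value, then latch the new input.
        out, self.value = self.value, data_in
        return out
```

With a DELAY register fed 5, 6, 7, the ALU side should see the initial value followed by 5 and 6, which is the one-cycle lag missing from signal 5 in the waveform description.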

sle not mapped properly

See https://travis-ci.com/StanfordAHA/GarnetFlow/jobs/204405717#L2120

Missing tests for LUTs

asr doesn't work at all

When I was running Jeff's end-to-end tests, I saw from the waveform that asr is not implemented correctly in RTL. See the waveform below.

Then I went ahead and poked around inside lassen. Even after I added asr to RTL, the tests still pass. That's odd. I then asked pytest to print out everything, and here is what I found:

asr is not working
fault is not reporting an error even though verilator fails.
(maybe a problem) Where is the SMT formal check on the RTL? Or is this not how formal works? (Please forgive me if I'm making a wrong assertion.)

Here is the link to the print out: https://travis-ci.org/StanfordAHA/lassen/builds/532629001#L1217
You can reproduce the individual test using test_asr branch with the following command:

pytest tests/test_rtl.py -k "test_rtl[False-mode0-asr]" -s

You should see the verilator assertion failed but pytest passed.
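
One generic way a test harness can avoid this kind of silent pass is to check the simulator's exit status explicitly. The sketch below is not how fault is implemented (the root cause here is not known from this report); `run_simulator` is a hypothetical helper, and the two commands are stand-ins for a failing and a passing simulator binary.

```python
import subprocess
import sys


def run_simulator(cmd):
    """Run a simulator command and fail loudly on a non-zero exit status."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        raise AssertionError(
            f"simulator exited with {result.returncode}:\n{result.stderr}"
        )
    return result.stdout


# Stand-ins for a failing and a passing simulator run (hypothetical commands).
FAILING = [sys.executable, "-c", "import sys; sys.exit(1)"]
PASSING = [sys.executable, "-c", "print('ok')"]
```

Because the failure is raised as an `AssertionError`, pytest would report the test as failed instead of passing it.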

I'd call for a thorough manual examination of every single op we have, with manual tests for each, to prevent these kinds of mistakes from happening again.

Add support for casting from float to Int

Lut bitvector constructor not supported by SMTBitVector

in lut.py: 16 dynamically constructing a new bitvector out of an array of Bits is not supported.

@cdonovick, is this something you want to support, or is there a better way to write this code?

signed ops are not working in RTL

Probably the same issue as asr.

I haven't tested smin yet, but I suspect it's the same error.

See smax branch.

See: https://travis-ci.com/StanfordAHA/lassen/builds/111969902

Master is broken

Looks like some mapper tests and rtl tests are failing.

Mapper tests fixed by #24

Please update the GarnetFlow requirements when you use non-released versions of packages

I will work on a solution that will block merge if the PR fails GarnetFlow.

IRQ discussion

I have a preliminary branch implementing IRQ in lassen called 'irq'

The features currently are that for both the alu output and the single bit output, you can enable irq along with a comparison value. If at any time during running the comparison is true, the PE will output a 1 on the irq output along with latching that 1 in a register in the PE. The reason for this latch is so that the SOC can easily determine which PE triggered the interrupt.

Open questions for lassen features:
-Should the IRQ be output the same cycle when the interrupt occurs?
-Should we continue to output the IRQ until it is cleared by software?
-Should we support multiple comparison operations? (==,!=,<,>,etc)

Let me know your thoughts or any other assumptions I missed.
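
For discussion purposes, the described behavior can be sketched as a small Python model. It assumes the IRQ is raised in the same cycle as the match and that only `==` is supported, both of which are open questions above; `IrqUnit` and its methods are hypothetical names, not the code on the 'irq' branch.

```python
class IrqUnit:
    def __init__(self, enable, compare_value):
        self.enable = enable
        self.compare_value = compare_value
        self.latched = 0  # sticky bit the SoC can read to find the source PE

    def step(self, alu_out):
        """Return the irq output for this cycle (same-cycle is an assumption)."""
        hit = int(self.enable and alu_out == self.compare_value)
        self.latched |= hit  # stays set until software clears it
        return hit

    def clear(self):
        """Model a software clear of the latched bit."""
        self.latched = 0
```

In this variant the irq output is a one-cycle pulse while the latch stays set; holding the output high until cleared would simply mean returning `self.latched` instead of `hit`.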

Add test that will verify circuit equivalence for generated RTL vs RTL-freeze RTL

The goal of this is to use CoSA to formally prove that changes to lassen do not change the generated RTL for the PE tile.

#135 contains the gold coreir json file which we should be comparing against.

Match Halide execution to floating point IPs

The bfloat implementation in Halide seems to be consistent with the IP with rnd=1. It would be much easier if we could use that mode instead.

mux not mapped correctly

See https://travis-ci.com/StanfordAHA/GarnetFlow/jobs/201924953#L4299

Add comments on PE bitstream

Currently there is no way to know what ops the PE bitstream corresponds to. In other words, it's impossible to debug the PE. Can someone add a way to comment the bitstream so that a human programmer can understand what's going on?

Leverage the functional tests to do RTL tests

One major issue we have seen is that we are missing a lot of RTL unit tests in lassen. Due to the nature of Peak we should be able to parameterize the functional tests (test_pe.py) to also do RTL tests.

add w/ external carry feature dropped?

Right now the carry in for the normal add op is always set to 0 (see https://github.com/StanfordAHA/lassen/blob/master/lassen/sim.py#L60). IIRC jade set the cin using a one-bit input (see https://github.com/StanfordAHA/CGRAGenerator/blob/master/hardware/generator_z/pe_new/pe/rtl/test_pe_comp.svp#L507) which enables the construction of a carry chain adder (e.g. for a 32 bit add).

Did we intend to drop this feature?
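
To illustrate why the external carry-in matters, here is a minimal Python sketch of a 32-bit add built from two 16-bit adds chained through the carry. `add16` and `add32` are hypothetical helpers for illustration and do not correspond to lassen or Jade code.

```python
MASK16 = 0xFFFF


def add16(a, b, cin=0):
    """16-bit add with carry-in; returns (sum, carry_out)."""
    total = (a & MASK16) + (b & MASK16) + (cin & 1)
    return total & MASK16, total >> 16


def add32(a, b):
    """32-bit add built from two chained 16-bit adds."""
    lo, carry = add16(a, b)                         # low halves, cin = 0
    hi, carry_out = add16(a >> 16, b >> 16, carry)  # high halves use the carry
    return (hi << 16) | lo, carry_out
```

With the carry-in hardwired to 0, the high-half add cannot see the carry out of the low half, so the chain cannot be built from the plain add op alone.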

FP mult not working

Continuing to test float pointwise. Created a new issue since the old one is too long: #111.

Build log: https://buildkite.com/stanford-aha/lassen/builds/194#65c1f0d0-c89a-47d4-97b9-5064d65ceaf5/80-1018

name_outputs incompatible with rtl generation

I would like to use name_outputs as a decorator on the call function. Currently this causes rtl generation to break, so I have to do a workaround.

Add tests for the following complex ops

Add coveralls to lassen

Broken lassen master branch

See here: https://travis-ci.com/StanfordAHA/GarnetFlow/builds/110987402
and here: https://travis-ci.com/StanfordAHA/lassen/builds/110816257

In the future people should run the garnet flow to make sure that nothing is broken...

FP add/mult not working

See https://buildkite.com/stanford-aha/lassen/builds/1#f7512cd6-f12f-455e-980a-3f25eafa2a84

Just a side note:
From now on, every push and PR in lassen will trigger a buildkite build, which tests the floating point ops with CW files. If you want to debug, please use kiwi to test locally.

EDIT:
I will improve the rtl_tester to intelligently choose a simulator based on the environment it's in. GarnetFlow is already doing that.

Better support for FPSub

I propose a dedicated configuration bit that can flip the sign bit of floating point values. This would prevent burning an additional PE when doing FPSUB. (You can flip the sign bit by first doing an XOR in a separate PE.)

@alexcarsello, thoughts?
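
A minimal sketch of the idea, assuming a 16-bit float encoding with the sign in the msb (for example bfloat16): negation is a single XOR with the sign bit, so a - b can be fed to the adder as a + (-b). `fp_neg` and `fp_sub_operands` are hypothetical names, and `negate_b` stands in for the proposed configuration bit.

```python
SIGN_BIT = 0x8000  # msb of a 16-bit float encoding (assumed layout)


def fp_neg(bits):
    """Negate a 16-bit float encoding by flipping only its sign bit."""
    return (bits ^ SIGN_BIT) & 0xFFFF


def fp_sub_operands(a_bits, b_bits, negate_b=True):
    """Return the operands an FP adder would see when computing a - b."""
    return a_bits, (fp_neg(b_bits) if negate_b else b_bits)
```

For example, bfloat16 1.0 is `0x3F80` and flipping the sign bit gives `0xBF80`, which is -1.0.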

Mapper/Packer tasks

automatically generate compile complex op rewrite rules
Full Halide test suite for mapper rewrite rules
Implement Instruction selection
Implement hierarchical packing
- Constant/register packing
- Investigate generating VPR format from PEak spec
- Investigate using SMT to generate viable packings

Other "micro" ops for the PE?

Halide requires the use of the following complex floating point operations:

div, rem, log, exp, pow, sqrt, sin, cos, tan, asin, acos, atan2, tanh

We currently have a bunch of microps that are required to do the floating point divide algorithm. They are as follows:

FGetMant
FAddIExp
FSubExp
FCnvExp2F
FGetFInt
FGetFFrac

Are these microps sufficient to be able to do similar algorithms for the rest of these complex operations?

Missing PE test for FPAdd

Better pytest generator for NANs in test_micro.py

List of possibly missing ops we would like to be able to support in lassen

32-bit Add
32-bit Mul
Carry-in
Counter mode
qualified register (valid)
Breakpoints/Watchpoints/IRQ

More Complex Ops

We need Round, Floor and Ceil. If we have one of these ops, the other two are very easy to create by just adding or subtracting 0.5 appropriately.

@nikhilbhagdikar, could you add these to our complex ops (in lassen/stdlib/) and add some tests?

For the tests, please use hwtypes.FPVector in order to construct and manipulate python floating point values
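
As a rough illustration of deriving the other ops from Floor, here is a sketch using plain Python floats rather than hwtypes.FPVector. The round-half-up tie rule is an assumption, and Ceil is derived by negation because a 0.5 offset alone has edge cases at exact integers; both function names are hypothetical.

```python
import math


def round_half_up(x):
    """Round to nearest, with ties going toward +infinity (assumed tie rule)."""
    return math.floor(x + 0.5)


def ceil_from_floor(x):
    """Ceil expressed through floor using ceil(x) == -floor(-x)."""
    return -math.floor(-x)
```

The same identities should carry over to a hardware float type, but the tie-breaking rule and the handling of NaN and infinities would need to be pinned down in the tests.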

Move reading/writing logic outside of lassen description

Add support for casting int to float

Flags are wrong for FPAdd

We need the flags to be correct in order to do all the floating point comparison operations (<, <=, ==, !=, etc.).

Ideally this can be done using only the existing flags, so that cond.py does not actually depend on the instruction.

Why is irq listed as PE output

It seems like the irq signal gets propagated into canal as a valid output. As a result, the irq signal goes to all the switchbox muxes, potentially increasing the area. What is this signal doing? Based on the code here:

lassen/lassen/sim.py

Lines 257 to 261 in 5909514

    
    # calculate interrupt request
    irq = Bit(0) # NYI
    # return 16-bit result, 1-bit result, irq
    return alu_res, res_p, irq

Is irq a way to output constant 0 to the application network? If not, can someone explain which of our applications so far need it? Is it a form of premature optimization?

Floating point lassen implementations seem to have the wrong endianness.

Floating point numbers should have their fractional bits as their lsbs, and their sign bit as the msb.

@nikhilbhagdikar, I suspect you have done the opposite in your lassen implementation for most of the fp ops.

I have a branch called 'float-test' that contains your floating point changes along with some more parameterized tests.

on this branch, you can run:
>pytest -k get_mant

and you will see an assertion error.
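
For reference, the bit ordering described above can be sketched as follows, assuming a bfloat16 layout (1 sign bit, 8 exponent bits, 7 fraction bits). The helper names are hypothetical and this is not the lassen `get_mant` implementation; the conversion simply truncates a float32 encoding.

```python
import struct


def float_to_bf16_bits(x):
    """Truncate a float32 encoding to its top 16 bits (no rounding)."""
    (bits32,) = struct.unpack(">I", struct.pack(">f", x))
    return bits32 >> 16


def bf16_fields(bits):
    """Split 16 bits into (sign, exponent, fraction), sign in the msb."""
    sign = (bits >> 15) & 0x1       # bit 15
    exponent = (bits >> 7) & 0xFF   # bits 14..7
    fraction = bits & 0x7F          # bits 6..0, the lsbs
    return sign, exponent, fraction
```

For instance, 1.0 encodes as `0x3F80`, which splits into sign 0, biased exponent 127, and fraction 0.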

Lassen read/write registers

@rdaly525
Peak core needs to output the following ports:

config_addr -> this is 8 bits.
config_data -> this is 32-bit wide
read_config_data -> this can be any size
config_en -> this is the write signal
reset -> 1-bit signal that sets everything to zero

Notice that there is no read signal (sorry I lied). The core should always return the value at the given addr on read_config_data.
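
A small behavioral model of this interface might look like the sketch below. The port names come from the list above; the class itself, the reset-before-write priority, and the choice that a read in the same cycle as a write returns the new value are all assumptions made for illustration.

```python
class ConfigRegisters:
    def __init__(self):
        self.regs = {}  # config_addr -> 32-bit value; missing entries read as 0

    def step(self, config_addr, config_data, config_en, reset=0):
        """One cycle: optional reset or write, then an unconditional read."""
        addr = config_addr & 0xFF  # config_addr is 8 bits wide
        if reset:
            self.regs.clear()      # reset sets everything to zero
        elif config_en:
            self.regs[addr] = config_data & 0xFFFFFFFF
        # No read strobe: read_config_data always reflects config_addr.
        return self.regs.get(addr, 0)
```

A write followed by an idle cycle at the same address returns the written value both times, and asserting reset returns 0.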

Signed gte max not working

See: https://travis-ci.com/StanfordAHA/lassen/jobs/201656574#L1389

Convert the complex ops into a peak class

@nikhilbhagdikar, Can you do a PR off of 'float-test' and transfer your complex op algorithms into an inherited peak class? There is an explicit example of how to do this in the test test_complex.py with the FMA.

Convert the complex op tests to use hwtypes.FPVector

Lassen Tapeout Task List

Resolving/Missing Features

Counter - dropping
Breakpoints/IRQ - Not dropping
Floating Point Comparison/flags, Ross, Nikhil
Resolve differences between python gmpy2.mpfr implementation and design ware, Caleb, Ross, Nikhil
clk_en/Register Valid
Carry in
compare specification of instructions to both python simulator and RTL, Lenny, Keyi

Missing randomized Tests

RTL

Check that no functional units are duplicated, Lenny
- check that FP FUs are not duplicated (only one instance of add/mul modules in output verilog)
- reduced the number of multiply units via #131 and #132
Sanity check the RTL and look for simple optimizations, Lenny
- Register (Peak) - visually inspected and tested via cdonovick/peak#56
- RegisterMode - #125
- These datapath elements are currently tested through test_pe so we're fairly confident in their functionality:
  - _lut
  - adc
  - alu
  - cond
  - ite
  - lut
  - overflow
  - PE

Add test to use all rules in rules/all.json

Need memory specification in order to do mapping

I have a branch called 'mem' where I began specifying the memory using Peak.

I need the following fleshed out in the Peak representation of this memory tile.

All the ports that can be routed to
The full instruction
The functional specification (lower priority)

Currently I have only loosely specified the RAM/ROM mode in lassen/mem/*.py

Let me know if you have any questions

Lassen Bug Tracker

FP_Mult Error mode issue
-Problem: RTL mismatched FPVector and Halide due to wrapper
-Solution: Use DW FP IP instead
-Verification: Lassen RTL tests
-Can be resolved today

SLT/SGT Bug
-Problem: SMT cannot infer SLT and SGT. But the random tests are passing
-Solution: Still debugging (Caleb)
-Verification: Automapper finding SLT and SGT

SubExp Bug
-Problem: A small portion of random tests (~5%) are failing the SubExp micro op RTL
-Solution: Still debugging (Lenny/anyone else wants to help?)
-Verification: Lassen RTL tests

Why was VALID mode removed from PE registers?

During configuration, when the chip is clock-gated, using VALID mode prevents the register from taking values. This mode proved to be critical when I was testing the Jade chip.

So the question is: was VALID renamed to DELAY mode, or can we never clock gate the register in the future? If the latter, there should be extra graph analysis to make sure that a register used in a counter is never placed inside a PE, since you can't clock gate it.

The rounding mode, which is round to nearest even, in the functional model and RTL does not match. See: https://buildkite.com/stanford-aha/lassen/builds/109#712c5179-b555-4ddb-8fd3-6da54ebba32b
The functional model is using mpfr, which I believe should also be correct.
When running pytest with -v, fault doesn't catch the error, yet running the test individually does. There is something wrong with either the test bench setup or fault. See the successful build: https://buildkite.com/stanford-aha/lassen/builds/108#ef4d8b94-f985-4e9e-9873-cdf64ba87be3
@leonardt can you take a closer look?
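
For comparing the two sides, a reference for round to nearest even when narrowing float32 to bfloat16 can be sketched with integer arithmetic. This is a commonly used bit trick, not the mpfr or DesignWare implementation; NaN and overflow handling are deliberately left out, and the function names are hypothetical.

```python
import struct


def f32_bits(x):
    """Return the 32-bit IEEE 754 single-precision encoding of x."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return bits


def bf16_round_nearest_even(x):
    """Round a float32 value to bfloat16 bits; NaN/overflow not handled."""
    bits = f32_bits(x)
    lsb = (bits >> 16) & 1   # lowest bit that survives truncation
    bits += 0x7FFF + lsb     # exact ties round to the even result
    return (bits >> 16) & 0xFFFF
```

The tie cases are the ones worth checking first: 1.00390625 (`0x3F808000`) lies exactly halfway between two bfloat16 values and rounds down to the even `0x3F80`, while 1.01171875 (`0x3F818000`) rounds up to the even `0x3F82`.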

