cucapra / filament

Fearless hardware design

Home Page: http://filamenthdl.com/

License: MIT License

filament's Issues

Axiom schemas for known functions

We've played around with encoding the semantics of log and pow using forall statements, but that didn't work: the solver will often time out or respond with unknown in the presence of quantifiers:

; definition of pow2
(assert (= (pow2 0) 1))
(assert (forall ((n Int)) (= (pow2 (+ n 1)) (* (pow2 n) 2))))

However, we can instantiate these assumptions for specific variables. For example, if someone writes:

comp Foo[#N, #K]() -> () where #N = pow2(#K) {}

It is sound to add the following assume statement to the body and the signature:

assume #N/2 = pow2(#K-1);

This may be enough to prove recursive component generation correct (for example, a reduction tree) and would avoid problems like #119. In general, there is no guarantee this will be enough, but at least we have a centralized source of soundness that avoids making the user litter their code with assume statements.
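As a sketch, the instantiated facts are quantifier-free and therefore decidable for the solver (the encoding of pow2 as an uninterpreted function follows the snippet above; the #K >= 1 side condition is an assumption needed for #K-1 to make sense):

(declare-fun pow2 (Int) Int)
(declare-const N Int)
(declare-const K Int)
(assert (>= K 1))
; instantiation of the schema for the binding #N = pow2(#K)
(assert (= N (pow2 K)))
; the forall axiom, instantiated only at n = K - 1
(assert (= (pow2 K) (* (pow2 (- K 1)) 2)))
; from these facts the solver can derive (= (div N 2) (pow2 (- K 1)))
(check-sat)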

Note: This is a specialized version of the general concept that we don't want the forall to be instantiated (Z3 terminology) too many times and want some notion of a fuel. See F*'s concept of fuel.

Generate Filament stub from Verilog module

Build a simple tool that parses a Verilog module's signature and generates a Filament stub from it. The next step would be to add comments to the Verilog module that can be extracted as signatures directly.
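A sketch of what the tool could do (the generated intervals are assumptions; without extra annotations the tool would presumably have to default to a combinational, single-cycle signature):

// input: a Verilog module
module Add(
  input  [31:0] left,
  input  [31:0] right,
  output [31:0] out
);

// generated Filament stub, defaulting every port to @[G, G+1]
extern "add.sv" {
  comp Add<G: 1>(@[G, G+1] left: 32, @[G, G+1] right: 32) -> (@[G, G+1] out: 32);
}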

Verifying complex conditions through predicate abstraction

Sometimes we might have complex constraints that are not easily proved using the SMT-backed solver. There are two possible solutions to this problem:

  1. Propagate the constraint through the hierarchy to the point where the constraint has concrete values and can be evaluated
  2. Add an assume statement that defers the checking of the constraint to the monomorphization stage.

Threading Through the Hierarchy

For example, if we have the following component:

comp Foo[#N, #K](...) where log2(#N) = #K { ... }

Because we have $log_2(N) = K$, we also know that $2^K = N$ but this fact might not be available in the theory of the solver we're using. Instead of actually requiring the solver to understand the theory of exponentials, we can just add the constraint to the component:

comp Foo[#N, #K](...) where log2(#N) = #K, exp(2, #K) = #N { ... }

Of course, if a component uses Foo, it has to prove this property as well and therefore needs to assume it:

comp Bar[#P, #Q] where log2(#P) = #Q, exp(2, #Q) = #P {
  f := new Foo[#P, #Q];
}
comp main() -> () {
  b := new Bar[16, 4];
}

Note that this "threading through" only needs to happen up to the main component, which provides concrete values for both parameters and can therefore prove that the constraint holds numerically.

assume Statements

The threading-through technique can get pretty cumbersome, especially when complex constraints are implied by the type safety checks. Instead of threading things through, we can use assume statements:

comp Foo[#N, #K](...) where log2(#N) = #K assume exp(2, #K) = #N { ... }

This basically states that if the constraints on the component are met then certain other conditions automatically hold. Other components can use Foo without having to assume extra constraints:

comp Bar[#P, #Q] where log2(#P) = #Q {
  f := new Foo[#P, #Q];
}

Of course, the assume statement itself could be invalid, so during monomorphization, when we have concrete values for #N and #K, we can plug them in and check that the assume statement held. If it didn't, we can signal a problem.

Bundles should have delays

Currently, the bundle syntax looks something like this:

bundle f[#N]: for<#i> @[G+#i, G+#i+1] #W

It is (hopefully) obvious that each wire in this bundle holds onto values for 1 cycle. However, for a bundle like this:

bundle f[#N]: for<#i> @[G, G+#i+1] #W

Each index holds onto a value for a varying number of cycles. This means that this bundle cannot be reused until the index that holds its value for the longest amount of time releases it. This is extremely similar to how the input-output ports of a module affect its delay. It implies that a bundle must have a delay:

bundle f<G: #N>: for<#i> @[G, G+#i+1] #W

This tells us that the bundle may hold onto values for long amounts of time, but that time is bounded by #N. A bundle well-formedness pass must check this property.

A bundle used inside a module affects the module's delay in the same way that an invocation does: for a module with delay d, we must ensure that the delay of each bundle b is less than or equal to d.
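As a sketch of the well-formedness check, using the proposed syntax:

// rejected: index #N-1 is live during @[G, G+#N], i.e. for #N cycles,
// but the declared delay of 1 says G may retrigger every cycle
bundle f<G: 1>: for<#i> @[G, G+#i+1] #W

// accepted: every index is live for at most #N cycles
bundle f<G: #N>: for<#i> @[G, G+#i+1] #W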

Automatically mark ports as `@stable` in Calyx backend

The @stable annotation is used by the Calyx canonicalization pass to sort assignments in dataflow order. However, because Filament doesn't mark any primitive ports as @stable, canonicalization often fails, leading us to disable it. It occurs to me that we can do a conservative analysis to figure out which ports must be registered and are therefore @stable. Specifically, any port that isn't alive in the cycle [G, G+1] must be registered, while ports active during that interval might have a combinational path from their inputs.
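As a sketch, consider the two ports of a two-stage multiplier:

// not alive during [G, G+1]: must come from a register, so the
// corresponding Calyx port can safely be marked @stable
@[G+2, G+3] out: 32
// alive during [G, G+1]: may sit on a combinational path, so it
// cannot be conservatively marked @stable
@[G, G+1] left: 32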

This can be discussed in the paper as well, as a utility of the type system.

`with` construct to clean up pipeline stage definition

Construct to Define Event Equalities

When working with big pipelines, it is common to have a bunch of code that looks like this:

// stage 1
add := new Add<G>(left, right);
gt := new Gt<G>(add.out, 10);
...

// stage 2
add := new Add<G+1>(left, right);
gt := new Gt<G+1>(add.out, 10);
...

It might be useful to have an operator that allows us to define the start time of the current pipeline stage:

with S1 = G {
  add := new Add<S1>(left, right);
  gt := new Gt<S1>(add.out, 10);
  ...
}
with S2 = G+1 {
  add := new Add<S2>(left, right);
  gt := new Gt<S2>(add.out, 10);
  ...
}

Of course, under the hood, all this is doing is defining an equality between the events S1 and G, but from the perspective of the user, it is a nice way to visually see which portions of the code are associated with a particular stage.

Correctness of Guard Synthesis

Compilation of high-level Filament programs needs to turn all invocations into guarded assignments that forward the source values into the instance's ports as specified by the schedule.

In short, it must turn this high-level program:

M := new Mult; m0 := M<G>(a.out);

into:

M := new Mult; m0 := invoke M<G>; m0.left = guard ? a.out;

The core problem during lowering is synthesizing guard expressions for each invocation. Here, we'll show that for a well-typed high-level Filament program, we can always generate correct guards.

Restrictions: We only address compilation of programs without max expressions and adhering to the restrictions from #27, which disallow relating events using ordering constraints.

Ports on Pipelined FSMs

Generating FSMs: We assume that for each event G in the current component, there is an FSM called Gf that is triggered by the interface port for G:

component main<G>(@interface[G, G+d] go_G: 1) -> (...) {
    fsm Gf[n](go_G);
}

The FSMs for each event can be thought of as a series of shift registers of length n. The port Gf._i is active when the $i^{th}$ register has a token in it. The interval type for a port is:

Gf._i : @[G+i-d, G+i+d] + @exact[G+i, G+i+1]

where d is the delay for the port that triggers the FSM.

First, we look at the active part of the specification: @exact[G+i, G+i+1]. This means that the guard has a non-zero value $i$ cycles after a token enters the FSM.

The availability of the guard is more complex: [G+i-d, G+i+d]. It states that the guard is guaranteed to have a known value during this interval: the guard does not observe a new token for d cycles after it gets a token, nor for d cycles before it observes the next one.

The intuition for this is that if an FSM is triggered at most once every d cycles, each token is separated by d cycles in both the past and the future. The first token is a special case, but we assume that resetting the circuit initializes everything to $0$.

Constraints on Low-level Assignments

In the high-level program

m0 := M<G>(a.out);

We have the restriction that a.out is available for at least as long as m0.left requires. Thus, if we have the availabilities $[G+m_s, G+n_s]$ and $[G+m_d, G+n_d]$ for a.out and m0.left respectively, we have $[G+m_s, G+n_s] \supseteq [G+m_d, G+n_d]$.

When generating the assignment:

m0.left = guard ? a.out;

We need to synthesize a guard that is available for $[G+m_s, G+n_s]$ and active for $[G+m_d, G+n_d]$, i.e., it has a non-zero value in the interval that the destination requires and a known value for as long as the source port is driven.

Lemma: The delay for the event $G$ is greater than or equal to $n_s - m_s$.
Proof. Since a.out is an output from some instance related to the invocation a, we know that G must obey the delay constraints (as specified in #20). Specifically, the delay must be at least the length of the interval of any input or output port. Similar reasoning applies to any input port of the component that can be used in this position.

We also know that $n_s - m_s \geq n_d - m_d$ due to the subset relationship between the source and destination port.

Synthesizability of Guards

Here is the rule we use to generate guards: If the requirement of a destination port is $[G+m_d, G+n_d]$, generate the expression:

Gf._{m_d} | Gf._{m_d+1} | ... | Gf._{n_d-1}

So, if we have the interval $[G+3, G+5]$, we'll generate the expression:

Gf._3 | Gf._4

To show that this is correct, we need to show that:

  1. The guard is active for as long as the destination requires
  2. The guard is available for as long as the source is

We have (1) by construction since we use the destination to construct the guard expression.

For (2), we have $n_s - m_s \leq d$ (by the lemma) and $m_s \leq m_d$, thus:

$$ n_s \leq m_s + d \leq m_d + d $$

This shows that the guard is available at least until the end of the interval for the source port.

Next, similar reasoning shows that the guard is available from before the start of the source interval. We have $n_d \leq n_s$ and, by the lemma, $n_s - d \leq m_s$, thus:

$$ n_d - d \leq n_s - d \leq m_s $$

Thus, we have that the guard is available for at least as long as the source port and active as long as the destination requires.

Implementation

The implementation in the compiler currently does not have a way to talk about terms like G+i-d since subtraction on events is not supported. If we want the lowered programs to type check, we need some way of doing this. If not, we can skip implementing this and rely on the pen-and-paper proof.

Bundles in component signatures

Parametric components might benefit from being able to specify that their IO interface depends on the parameters. For example, suppose we want to build a shift register where each of its intermediate states is visible:

comp Shift[N]<G: 1>(@[G, G+1] in: 32) -> (out[N]: for #i. @[G+i, G+i+1])

Here, the port out is a bundle (similar to an array) that can provide values at different time steps. The nice thing is that we don't need any extra machinery to check such IO bundles because they are already supported by the compiler inside component bodies.

I think bundle types are also safe to occur in the IO binding position because they are dependent pairs and not forall-quantified.
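As a sketch of how a caller might use such a signature (the s.out[2] indexing syntax is an assumption, mirroring how bundles are indexed inside component bodies):

comp main<G: 1>(@[G, G+1] in: 32) -> (@[G+2, G+3] mid: 32) {
  S := new Shift[4];
  s := S<G>(in);
  mid = s.out[2];  // the third intermediate state, live during @[G+2, G+3]
}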

A package manager and module system for Filament

A key part of a language ecosystem is a well-thought-out package manager and module system, so let's build one for Filament! There are a couple of key challenges with designing something like this for Filament:

  1. Supporting hardware generators. For example, it should be possible for someone to write a hardware generator in Chisel and provide a command-line invocation to generate the corresponding Verilog file for a given Filament extern. We should figure out exactly what the interface is here, but I'm hopeful that something like "consume Filament signatures and generate Verilog files" would be a good first pass.
  2. Supporting black-box IP. We should support things like Xilinx's LogiCORE IP generator. This is a similar problem to hardware generators, except that we don't get access to the Verilog files anymore: we are tied to a particular compilation flow (Xilinx vs. Synopsys) and the Verilog is only visible during synthesis.
  3. Downloading packages. It's too early for us to build a package repository. I suggest we take a cue from cargo and allow people to point us to a particular GitHub repository, or even a commit, to grab a particular file that implements the desired modules. We should probably disallow arbitrary file downloads for safety reasons.

Module System

The space of module systems is vast and complicated. I suggest we start with something simple and extensible like Rust's module and namespacing scheme. We should have the ability to mark components private or public and to refer to them using the :: syntax.

@sampsyo's writeup on modules for Calyx is a good starting point for other ideas. Specifically, take a look at the "related work" links which mentions verilog package managers: calyxir/calyx#419

Designs to Implement

Here is an ever-growing collection of designs to implement in Filament as well as interesting links worth reading.

FFT

Papers

Processors

  • RISC-V, especially if we can show that going from a 3-stage to a 5-stage pipeline produces the right errors
    • PicoRV32 from Chris Batten's class

Actionable Apps

Less Actionable

  • Match action tables?
    • Memories have a specific latency
    • Some network processing function
    • They are often fixed-latency
  • Extract pipelines from Rosetta benchmarks
  • Polybench Benchmark Suite
  • Kernels from google/xls?
  • BF16 float mult

Links

Optional defaults for event variables

It would be useful to specify default values for event variables. For example, we often want to use a register for exactly one cycle, but we have to write:

r := new Register[32]<G, G+1, G+2>(...)

Instead, it would be nice to just say:

r := new Register[32]<G>(...)

We can add some syntax for default bindings for event variables:

component Register[WIDTH]<G, ?S = G+1, ?L = G+2>(...)

The ?S syntax tells the compiler to bind S to G+1 if no binding is provided for the variable. ?X variables must appear at the end of the event variable list.

`@interface` specifications should provide delay instead of intervals

Currently, we specify interfaces like this:

@interface[G, L]

which denotes that this is an interface for the event G with length L - G. However, this doesn't allow us to specify interfaces whose delay is something like L-G-1 because L-1 is not a valid event. For example, consider the Latch interface:

  component Latch<G, L>(
    @interface[G, L] write_en: 1,
    @[G, G+1] in: 32
  ) -> (
    @[G+1, L] out: 32
  ) where L > G+1;

It says that the Latch may not be re-executed more than once every L-G cycles. However, for a concrete use of the latch, we'll have something like:

L := new Latch;
l := L<G, G+2>(in)

in order to make the in signal available between G+1 and G+2. However, the interface then forces us to not reuse the latch for G+2 - G = 2 cycles, which is too restrictive.

This problem came up in the broader context of trying to implement an II=1 pipeline:

component FastMult<G>(
  @interface[G, G+1] go_G: 1,
  @[G, G+1] left: 32,
  @[G, G+1] right: 32,
) -> (
  @[G+3, G+4] out: 32,
) {
  // First stage, register the inputs
  L := new Latch;
  R := new Latch;
  l := L<G, G+2>(left);
  r := R<G, G+2>(right);

  // Second stage, perform the computation and save it
  M := new MultComb;
  m := M<G+1>(l.out, r.out);
  OutTmp := new Latch;
  ot := OutTmp<G+1, G+3>(m.out);

  // Third stage, forward the value from temp out to out register
  Out := new Latch;
  final := Out<G+2, G+4>(ot.out);

  // Connect the output to the out register
  out = final.out;
}

This pipeline cannot type check because G pulses every cycle but the interface requirements on the Latch will not allow it to pulse more often than every two cycles.

Enforcing stronger bounds on dynamic events

The following program type checks:

import "primitives/core.fil";

component main<G, H, L>(
  @interface[G, G+3] go_G: 1,
  @interface[H, L] go_H: 1,
  @interface[L, L+1] go_L: 1,
  @[H, L] in: 32
) -> () where G >= H, L >= G+3 {
  M := new Mult;
  m0 := M<G>(in, in);
}

The constraints on H and L ensure that the requirements of m0 are met. However, these constraints are not enough to make the program pipeline safe.

This is because G may re-trigger after 3 cycles but L from the first iteration may still be active. This means that the second iteration of this pipeline would not be able to satisfy the requirement H <= G.

I think we need to say something about the constraints on the constraints themselves. Specifically, given a set of constraints, we need to make sure that they are correct for this execution as well as for the pipelined execution. In this case, I think the thing to enforce is G + dG >= H + dH and L + dL >= G + 3 + dG, where dX is the delay of the interface for X. These constraints ensure that the requirements implied by the first set of constraints are satisfiable during pipelined execution.

Checking sharing with `for` loops

If we have the following program:

A := new Add[32];
for #i in 0..#W {
  a := A<G+#i>(10, 20); // Uses the adder #W times
}

We need some way to check that the various shared uses of A are safe. The naive encoding is of course just unrolling the loop and adding constraints between each pair of uses. However, we can use the fact that the shared uses come from a structured loop to generate more efficient constraints. Namely:

  1. To check that two invocations do not conflict, we need to enforce that #i >= delay of A. This is sufficient because each use of A is guaranteed to be #i cycles after the previous one.
  2. To check that the parent component's delay is long enough, we can require that max(#i) + delay of A - min(#i) <= delay. This is again sufficient because all uses of A are going to be contained within the range of #i.

The bigger challenge is what to do when we have uses of A inside multiple contexts. I think the above can serve as a guideline for generating efficient constraints instead of the usual O(n^2) constraints between each pair of uses.

Should `@interface` be longer than any of the outputs?

Seems like they probably should? This program type checks on 8d9af95:

extern "dummy.sv" {
  component Merge<G, L, E>(
    @interface[G, L] go_G: 1,
    @interface[L, E] go_L: 1,
    @[G, L] in1: 32,
    @[L, E] in2: 32,
  ) -> (
    @[G, E] out: 32,
  ) where L > G, E > L, E > G;
}

But I think that @interface[G, L] is wrong because the out signal needs to be held between [G, E]. Shouldn't the interface for G extend to at least E?

Existentially quantified parameters

Parameters in Filament are universally quantified. For example, for the following module:

comp Foo[#W]() -> () where #W >= 32 { ... }

The parameter #W is universally quantified: for all values of #W that respect the constraints, we generate a component that is well-pipelined. Such universal quantification allows the calling context to decide how to use the module by providing a concrete instantiation for #W. In return, the calling context must prove that the binding for #W satisfies the constraint.

Existential types are the dual of universally quantified types. Instead of the calling context deciding how to use the module, the module tells the context how it's allowed to use it. For example:

comp ShiftSome[exists #L]<G: 1>(@[G, G+1] in: 32) -> (@[G+#L, G+#L+1] out: 32) 

This module delays the signal by some amount #L that is unknown to the calling context. Instead, when the calling context instantiates a shift circuit, it must use the specific instance of #L bound to that ShiftSome component:

comp main<G: 1>(... in: 32) -> () {
  S := new ShiftSome; // We don't provide a parameter here
  s0 := S<G>(in);
  mux := new Mux[32]<G+S.#L>(sel, s0.out, ...); // Scheduling must use S.#L to correctly use the signal s0.out
}

Note that when scheduling this program, we have to use S.#L to schedule things. S.#L is unknown when we write the program but will be turned into a concrete value during compilation. While all invocations of S will get the same constant, different instantiations of ShiftSome might get different values of #L. This is important because it opens the door to something really cool: per-instance retiming.

Constraints on existential parameters must be satisfied by the body of the component defining them and can be assumed by the calling context.

Compilation

The compilation strategy for existentials involves turning them into universally quantified types. This has the effect of turning the quantifiers "inside out". For each instance, we decide (somehow) the binding for each existential variable and replace all instantiations to use those constants. For example, in the above program, if we decide #L=10 is a good binding for the variable, we change:

S := new ShiftSome;

Into:

S := new ShiftSome[10]

And change the signature of the component so that #L is a universally quantified parameter.

This is not fully developed yet but I think it's a good starting point.

Allow terminating recursion

With parametric modules, recursion starts to make more sense if we can prove that the recursion terminates. This can be done if we can show that all parameters to a recursive call are smaller than their current values (maybe only one parameter needs to decrease?).

Maybe Ryan knows if this is the same as Coq's termination check.

Bundle interval well-formedness

We should establish that for each bundle, the interval is well-formed. For example:

bundle f[#N]: for<#i> @[G, G+#i] #W

Ensure that: $\forall i \in [0, N): G+i > G$

Require parameter usage to be non-negative

We should check that each instantiation site only uses non-negative parameter expressions. This will allow us to come up with a well-founded induction scheme over the parameters and enable #85 in the future.

Separate out type checking from interval check

The interval check pass has gotten quite big and complicated because it is currently handling two different tasks:

  1. Standard well-formedness checks like indexes in bundles being within bounds
  2. Filament's interval checking algorithm

The reason for this is the original author's laziness and hesitation in adding another pass that calls into an SMT solver: because (1) also requires calls into an SMT solver, it was just easier to keep everything in one place. However, this has gotten extremely unwieldy. The correct solution is separating these two things out so we can write a standard type checking and well-formedness pass followed by an interval checker.

The nice thing about this issue is that it mostly involves code reorganization.

Different Definitions of Initiation Interval

There are two possible definitions of initiation intervals of pipelines (apologies in advance for the unwieldy names):

  1. Exact retriggerability
  2. Forever retriggerability

Exact Retriggerability

If $P(t)$ represents the execution of a pipeline at time $t$, then $i$ is the initiation interval if and only if $P(t)$ and $P(t+i)$ do not conflict. There may be an $n$ such that $P(t)$ and $P(t+i+n)$ do conflict.

Forever Retriggerability

If $P(t)$ represents the execution of a pipeline at time $t$, then $i$ is the initiation interval if and only if $P(t)$ and $P(t+i)$ do not conflict and there is no $n$ such that $P(t)$ and $P(t+i+n)$ conflict. In other words, once a user waits for the $i$ cycles, it can choose to retrigger the pipeline anytime in the future.

Which one to pick?

Both definitions of initiation intervals (II) seem reasonable and there doesn't seem to be any a priori consensus on which one to pick. For example, in software pipelining, the initiation interval is considered to be "the number of cycles between the launch of successive loop iterations". This corresponds to our "exact retriggerability" definition. I suspect that HLS tools subscribe to this definition as well.

In hardware pipelines, it often seems to be "the number of cycles that must elapse before instruction issue to the same unit". This corresponds to our "forever retriggerability" definition.

Without much motivation, I'm going to pick two criteria for a good definition of II:

  1. Enforceability: The definition can be tractably enforced by the type system
  2. Composition: The definition does not have to change in response to every internal implementation detail

Enforceability

There are two places where the definition of initiation interval is used and checked:

  1. Repeated invocations of the same instance
  2. Implicit pipelining of an invocation

For (1), we have something like this:

M := new Mult; m0 := M<G>(...); m1 := M<G+10>(...);

We need to ensure that m0 and m1 do not conflict. In the exact case, we must show that after G occurs, G+10-G is exactly a multiple of the initiation interval of Mult. For n invocations, we must do $O(n^2)$ such checks.

In the forever case, we must check that G+10-G is greater than or equal to the II of the pipeline. This gives us a little more flexibility in that we can reuse an instance any time after the II of the instance has elapsed. On the other hand, we might have to wait for fewer cycles in the exact case.

For (2), we need to ensure that the event used to trigger an invocation triggers in accordance with the instance's II. In the exact case, we would have to ensure that G triggers at some multiple of the II of Mult. In the forever case, we have to ensure that G triggers no more often than the II of Mult allows. Again, forever gives us more flexibility but may require the pipeline to wait longer.

Composition

Questions about composition are more focused on interfaces:

component Double<G>(
    @interface[G, G+100] go_G: 1,
    ...
) -> (...) {
    M := new Mult; // II = 3
    m0 := M<G>(...); m1 := M<G+3>(...)
}

The interface for G waits 100 cycles even though the pipeline takes 6 cycles. In the exact case, we'd have to have the interface line up exactly with the II of G, along with other constraints.

The more interesting case of the brittleness of the exact definition is when we use an II for which $P(t)$ and $P(t+i)$ do not conflict but there is some $n$ such that $P(t)$ and $P(t+i+n)$ do conflict. In this case, a small change to the pipeline may cause the entire interface to change, creating cascading effects up the pipeline.

Time multiplex bundles onto the same wire

It seems that if we can prove that none of the values in a bundle overlap, then we can use the same wire to receive the inputs. What is the right way to do this? It probably needs special handling in the backend.

`struct` definitions in Filament

With complex bundle types showing up in the arguments of components, it seems that a more general technique to represent IO interfaces is needed. Borrowing ideas from Chisel, I think we can use the concept of a Bundle (different from Filament bundles, which are just arrays).

The high-level idea is that a struct definition has a delay and it ensures that all sub-fields hold onto values for less time than the delay of the struct:

struct PacketInfo<G: 1> {
  id: @[G, G+4] 32,
  fields: for<#i> @[G+#i, G+#i+1] 32, 
}

Type checking should reject this struct definition because the id field holds onto values for 4 cycles while PacketInfo specifies that the event G reoccurs every cycle; providing a new PacketInfo every cycle would therefore cause conflicts.

Components like Mac can be defined like this now:

struct MacInputs<G: 1> {
  left: @[G, G+1] 32,
  right: @[G, G+1] 32,
  acc: @[G+2, G+3] 32
}
comp Mac<G: 1>(inp: MacInputs<G>) -> (@[G, G+1] out: 32) { ... }

Of course, you should be able to specify parameters for your structs.

Support for existential variables

Add support for existential variables, which can occur in signatures and need to be used in interval checking.

Syntax

Choice 1
We've been using the following syntax for whiteboard discussions. It's kinda janky:

extern component Mult<G, @exists(L)>(
  @within(G, G+1) left: 32,
) -> (
  @exact(G+L, G+L+1) valid: 1
);

However, it does make clear that L is only bound within the definition of the signature. This also implicitly means that L cannot be used in the body of the component. This might be an unintended restriction.

Choice 2
We can use an OCaml-like syntax for existential type variables:

extern component Mult<G>(
  @within(G, G+1) left: 32,
) -> (
  @exact(G+L, G+L+1) valid: 1
) {
  exists L;
};

This has the benefit that an invocation only needs to instantiate the exact time variables that are specified in the signature. The downside is the complexity in detaching a Signature from a component definition.

Interval Checker

TODO: Flesh out the proposal after basic interval checker is working with non-existential programs.

Allow user-level components to specify `@phantom` ports

@phantom ports are currently used to model the initiation intervals of modules that don't have an explicit go signal. For example, a combinational component has a @phantom @interface[G, G+1] go: 1 port to model the fact that they cannot be used multiple times in the same cycle.

The idea here is to allow user-level components to specify @phantom ports iff the lowered component will not need to instantiate an FSM. This is possible only when:

  1. All sub-components have only @phantom go ports, and
  2. The component does not have any resource sharing

In these cases, we do not need to instantiate an FSM and can additionally make the go port corresponding to the event a @phantom.
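A sketch of a component that would qualify under these rules (assuming Add is a combinational primitive whose own go port is already @phantom):

component AddTwice<G>(
  @phantom @interface[G, G+1] go_G: 1,
  @[G, G+1] a: 32, @[G, G+1] b: 32, @[G, G+1] c: 32
) -> (@[G, G+1] out: 32) {
  // two distinct instances: no resource sharing, so no FSM is required
  A0 := new Add;
  A1 := new Add;
  a0 := A0<G>(a, b);
  a1 := A1<G>(a0.out, c);
  out = a1.out;
}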

Case studies

Some ideas for case studies. We need to find components that have purely static behavior:

  • Systolic array
  • Pipelined multiply-add

Disjoint write analysis

The disjoint write analysis provides the second guarantee of the Filament type checker: that resource usage never conflicts. A naive way to enforce this is to require that writes to ports of the same instance are all disjoint. For example:

component MultTwice<G>(@[G, G+1] left: 32, @[G, G+1] right: 32) -> (@[G+2, G+3] out: 32) {
  M := new Mult; // 1-cycle latency Mult
  m0 := M<G>(left, right);
  m1 := M<G+1>(m0.out, m0.out);
}

We can try to generate the following disjointness conditions, where L gives the liveness of a port:

L(left) /\ L(m0.out) = ∅
L(right) /\ L(m0.out) = ∅

In general, for a component with n input ports and k uses, we'd generate n constraints of size k. However, I want to demonstrate that we can get away with checking that the assignments to the abstract event variables do not conflict with each other.

The @interface Ports

The @exact modality allows us to reason about when signals will have zero and non-zero values. For example:

@[G, G+5] + @exact[G, G+1] go: 1

This states that we can reason about the value of go between [G, G+5] and rely on the fact that it will only be non-zero during [G, G+1].

An @interface definition like the following:

@interface<G, 2> go: 1

Is just sugar for:

@[G, G+2] + @exact[G, G+1]

Using this "required zero" semantics, @interface ports allow us to enforce requirements on repeated usage: a signal may not pulse more often than allowed by its interface requirement.

Correct Interface Ports

When compiling a high-level Filament program, the compiler infers interface ports that correspond to each event in the component.

Well-formedness of @interface: If a signal is used between [G+i, G+i+n], the interface signal should have a delay of at least n.

In other words, if a design is using the signal for n cycles, it is certainly not safe to trigger the design more than once every n cycles.

Correct Interface Usage Implies Disjoint Use

Going back to our original example, if the design uses an input signal between [G+i, G+i+n], then we know that the interface signal will require a delay of at least n.

m0 := M<G>    // <- first assignment to interface
m1 := M<G+1>  // <- second assignment to interface

If we can ensure that G and G+1 are at least n apart, then we know that all signals associated with them will also be disjoint.

In other words, the disjointness of @interface ports is a sufficient constraint for disjointness of all other ports that make use of the related event.

Borrow instance syntax

Extend the instance creation syntax with the option of defining the length of the "borrow" for the instance. This is meant to pre-commit to an interval during which the instance can be reused. I think that, in general, parametric programs will need this for the shareability check to work (especially for #76).

Implicit Pipelining with Dynamic Events

In Filament, components are implicitly pipelined. For example, the following component can execute in a pipelined fashion:

component MultAdd<G>(
    @interface[G, G+1] go_G: 1,
    @[G, G+1] left: 32,
    @[G, G+1] right: 32,
    @[G+1, G+2] acc: 32
) -> (
    @[G+1, G+2] out: 32
) {
    M := new Mult;
    A := new Add;
    m0 := M<G>(left, right);
    a0 := A<G+1>(m0.out, acc);
    out = a0.out;
}

Since the interface for G specifies that it may trigger every cycle, the Filament type checker must check that repeated re-execution is safe.

Safety is checked by assuming that G will trigger any time after its delay (in this case, 1) and showing that no re-execution of the pipeline after G's delay would conflict with the current iteration.

Pipelining with Dynamic Events

Consider the following component signature:

component Dyn<G, L>(
    @interface[G, G+3] go_G: 1,
    @interface[L, L+1] go_L: 1, ...
) -> ( ... ) where L >= G+3 { ... }

The interface specifies two events G and L and provides an ordering constraint between them, L >= G+3. The ordering constraint requires that any caller of the component ensure that L arrives at least 3 cycles after G does.

In order to ensure safe pipelining, we have to reason about the retriggering behavior of both G and L and ensure that their delays do not violate the constraint on the interface. As given, the interface is incorrect; here is a trace of events that is accepted by the interface but violates the constraint:

  1. Cycle 0: $G_0$ triggers
  2. Cycle 3: $G_1$ triggers, $L_0$ triggers
  3. Cycle 4: $L_1$ triggers

Note that this trace is accepted by the delay specifications for $G$ and $L$ since $G_1 - G_0 = 3$ and $L_1 - L_0 = 1$. However, it violates the constraint on the interface since $L_1 \ngeq G_1 + 3$.

Restricting Interfaces

The key problem is that the delay specifications for G and L do not constrain the events in such a way that subsequent retriggers obey the constraints of the interface. Ideally, we could change the delay specification such that the constraint holds. We would like the trace history to look something like:

G0 --> G0+3 --> L0 --> G1

This implies that the delay for G must be dependent on the event L occurring; G may not retrigger unless L has triggered:

@interface[G, L] go_G: 1

Next, we need to ensure that L1 obeys the interface constraint:

... --> L0 --> G1 --> G1+3 --> L1

which is the same as (where dG is the delay for G):

... --> L0 --> G0+dG --> G0+dG+3 --> L1

This means that the constraint on the delay for L looks something like:

L + dL >= G + dG + 3

This shows that the delay for L depends on the delay of G; however, the delay of G is itself dependent on L.

TLDR: The history of L and G are interdependent; in order to figure out when the next L should occur, we need to know when the last G occurred. In other words, the @interface for both events need to be able to talk about the trace of the other event directly so that the constraint can be obeyed.

The Problem

The problem here is the constraint between G and L. By specifying it, we relate the traces of events in some manner, and we need the delay specifications to encode this information. Since interfaces only talk about delays, there is no way to relate histories of multiple events.

At this point we have two solutions:

  1. Extend the delay specifications to handle more general histories
  2. Disallow constraints over events in the language

I do not think (1) is worth doing. This is because even if we can make it work, this way of relating dynamic events together does not correspond well with dynamic pipelines. I think the right way of doing this probably requires some mechanism of specifying type-level "back pressure"; the retrigger of G and L is not dependent on the two events but rather the availability of the resources. The resources should be able to apply back pressure to the rest of the circuit to stall events from arriving. Doing all of this is definitely out of scope for the first paper.

This means that the proposed solution is disallowing ordering constraints in the language.

The Fix

The proposed fix is to disallow constraints on user-level components but to allow them on primitives. For example, the register primitive is still allowed to specify the constraint:

component Latch<G, L>(
    @interface[G, L] go_G: 1,
    @[G, G+1] in: 32,
) -> (
    @[G+1, L] out: 32
) where L > G+1;

Since user-level components cannot have ordering constraints, there is no way to construct a dynamic use of the Latch:

component Main<G, L>(@[G, G+1] in: 32) -> (...) {
    Lt := new Latch;
    l0 := Lt<G, L>(in); // <- no way to prove L > G+1
}

This program will not type-check since the component cannot specify L > G+1.

However, uses of such Latch that use the same event are still valid:

l0 := L<G, G+3>(in) // G+3 > G+1

Furthermore, it might still be possible to use max expressions to express a limited form of dynamism in the language.

Components to implement

Some components we should implement:

  • Sequential read/write memories
  • Iterative Divider
  • Pipelined divider (do these exist?)
  • Square root (for polybench benchmarks)

Requirements on Low and High-level programs

Filament programs exist at two levels of abstraction: high and low. A high-level Filament program can use abstract scheduling, whereas a low-level Filament program must explicitly use FSMs to trigger and manage resources. Both levels of abstraction are checked for the Filament type safety property.

High-level Filament Programs

A high-level Filament program defines all the events it uses using the generics syntax and only uses those events to perform scheduling:

component Mult<G>(@[G, G+1] left: 32, @[G, G+2] right: 32) -> (@[G+2, G+3] out: 32) {
  A := new Add;
  a0 := A<G>(left, right);
  out = a0.out;
}

A high-level Filament program is not allowed to use @interface signals, the fsm construct, or low-level invoke statements inside it.

High-level filament programs have the following type checking rules:

  • Ports provided as inputs must be available for a superset of the interval requirement of the arguments.

Low-level Filament Programs

A low-level Filament program does not use high-level invokes to perform scheduling. Instead, it explicitly instantiates fsms to coordinate the use of resources and signals.

The specific requirements of a low-level Filament program are:

  • There must be an @interface signal for each defined event in the component
  • There must not be any high-level invoke statements in the component; only the low-level invoke syntax is allowed.

The type checking rules are:

  • Input ports must not be driven for longer than the initiation interval of the corresponding signal. (Note: This needs to be explained in detail.)

Given these requirements, we can guarantee that:

  • A low-level filament program can be directly compiled into a Calyx program

Compiling High-level Filament Programs

Compiling a high-level Filament program to a low-level one involves:

  1. Infer all the @interface signals
  2. Generate FSMs for each event
  3. Compile high-level invokes into low-level invokes with connections

The compilation process must ensure that the generated code follows the requirements laid out for the low-level filament programs.

Are Guards Needed in Absence of Sharing?

For all the data ports, why do we even need a guard? In the absence of sharing, the port will either receive a 0 (because of how Calyx generates muxes) or always receive the same input. If data ports behave as described in calyxir/calyx#1169, then we can just always forward the input because the Filament type system guarantees that we will not read outputs corresponding to 'x.

In other words, because the default assignments for data ports should be 'x (according to the Calyx proposal above), it is safe to replace the assignment: in = guard ? data : 'x with in = data because 'x can be any value.

Passing binary values in with data files

Rachit and I talked about this briefly - I thought it would be convenient if we could represent inputs in binary/hex when testing components. I don't think JSON files will accept things that aren't in decimal, though.

This came to mind when I was trying to test a decoder for RISC-V instructions. To test it, I was writing out the instructions in binary, converting it to decimal, and putting that value in the JSON file. This was a bit unwieldy and also made it less clear what each test was targeting.

Using indicator variable for constraint checking

Using Indicator Variables

Currently, we use the following loop to check whether a given constraint P is valid, i.e., satisfied by all assignments to the variables:

(push)
(assert (not P))
(check-sat)
(pop)

Instead of doing that, we can attempt to generate "indicator variables" which essentially tell us which constraints can be violated.
For example, we can have a boolean constant t0 along with the formula:

(assert (and t0 (not P)))

This formula is satisfiable if and only if both t0 and (not P) can be assigned true at the same time. This means that if we allocate a different indicator variable for each assert and the check-sat query says SAT, we can look at the model and see which indicator variables were set to true to know which constraints were violated.

Getting all Violated Constraints

Of course, in the above model, we will only get a nondeterministic subset of the indicator variables set to true when the constraints are violated. In order to get all the violated constraints, we can ask the solver to maximize the value of (sum t0 .. tn), which forces it to set as many ti to true as possible.
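A minimal Z3-flavored sketch with two toy constraints standing in for real Filament constraints (it uses implications rather than the conjunction above so that each indicator can be toggled independently, and Z3's optimization extension for maximize):

(declare-const x Int)
(declare-const t0 Bool)
(declare-const t1 Bool)
; setting ti to true forces constraint i to be violated in the model
(assert (=> t0 (not (> x 0))))  ; P0: x > 0
(assert (=> t1 (not (> x 5))))  ; P1: x > 5
(maximize (+ (ite t0 1 0) (ite t1 1 0)))
(check-sat)
(get-model)  ; indicators set to true mark the violated constraints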

Consistency checks for `assume`

With #117, we have support for assume statements that add assumptions in the interval checking context. The expectation is that users only use these for sound assumptions with respect to the current set of constraints. However, it is going to be very easy to run into trouble with them. For example, the following is completely legal code currently:

comp Foo[#W]() -> () where #W > 10 {}
comp Bar[#N]() -> () {
  assume 1 = 0;
  F := new Foo[#N]; // type checks because the assumption makes everything unsat
}

It is easy to do a simple consistency check for assume statements: if, for the current path condition (the set of assumptions in scope), an assume statement makes the path condition unsat, then we know that the assume statement is unsound.
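The check itself fits the existing solver loop; a sketch, where PC stands for the conjunction of assumptions currently in scope and A for the new assume statement (both are placeholders):

(push)
(assert PC) ; current path condition
(assert A)  ; the new assume statement
(check-sat) ; unsat => the assume contradicts the path condition
(pop)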

Dynamic pipelines in Filament

I was wondering a couple of related questions:

  1. Where do non-offset intervals come from and what do they really mean? Ex: [G, L] or [G, max(G, L)+1] as opposed to [G, G+n].
  2. What can components that non-trivially use multiple events actually do?

A trivial multi-event component is one that essentially has different "paths" running through it, with each path's signals associated with an individual event:

component two_paths<G, L>(
  @[G, G+1] in0: 32,
  @[L, L+1] in1: 32,
) -> (
  @[G+3, G+4] out0: 32,
  @[L+2, L+3] out1: 32,
)

The signature of this component tells us that the two events G and L do not interact in any observable way; if they did, at least one of the outputs would have a lifetime that depended on both of them.

One of the simplest non-trivial multi-event component is a Latch:

component Latch<G, L>(
  @interface<G, L> write_en: 1,
  @[G, G+1] in: 32
) -> (
  @[G+1, L] out: 32
)

The out signal of the latch requires that it be available until the event L occurs. The write_en signal enforces that the latch cannot be used again until at least the event L has been observed. Because the implementation of the Latch does not provide a way to observe the L event, there is no Filament program that can construct a signal available during [G+1, L]; this means Latch is necessarily implemented as an extern.

Another non-trivial multi-event component is the following Join component, which provides its output when both of the input tokens, left and right, arrive:

// A component that triggers when both left and right tokens are available.
component Join<G, L>(
  @interface<G, max(G, L)> left: 1,
  @interface<L, max(G, L)> right: 1,
) -> (
  @[max(G, L), max(G, L) + 1] out: 1,
) {
  LL := new Latch;
  LR := new Latch;
  A := new CombAnd;

  l := LL<G, max(G, L)+1>(left);
  r := LR<L, max(G, L)+1>(right);
  and := A<max(G, L)>(l.out, r.out);

  out = and.out;
}

This is a truly dynamic component: G and L are entirely unrelated and unordered, which means either event can fire first. In fact, different invocations can have different orders of G and L firing.
The interface ports for the two events require that the module not be reused until max(G, L) has been observed, which, again, is a truly dynamic event in the system.

Enumerations and methods on components

Oftentimes, components have different sets of behavior depending on the subset of signals being assigned to. For example, a memory will often have two different interfaces: one for reads and one for writes:

comp Mem(addr: 8, in: 32, read_en: 1, write_en: 1) -> (out: 32, write_done: 1);

This component has two interfaces:

  1. write_en tells the component that the data on the input needs to be written to the address.
  2. read_en tells the component that the user is trying to read data from an address.

In Filament land, we can write the following signature:

comp Mem<G: 1>(
  @interface[G] read_en: 1, 
  @interface[G] write_en: 1, 
  @[G, G+1] addr: 8, 
  @[G, G+1] in: 32
) -> (
  @[G+1, G+2] out: 32,
  @[G+2, G+3] write_done: 1
);

Of course, this gives you the usual goodness of knowing that writes take two cycles while reads take one cycle. However, the weird thing is that it has two interface ports: both read_en and write_en represent the event G. Additionally, only a subset of the ports are meaningful at a time. For example, when the memory is performing a read, the out port will have a meaningful output; while performing a write, it won't. Similarly, the in port doesn't require an input if we're performing a read.

Enumeration: Representing Method Calls

The solution proposed here is using enumerations (similar to Rust's enum) as a way to represent method calls. For example:

enum MemInterface {
  Read(@interface[G] read_en: 1) -> (@[G+1, G+2] out: 32),
  Write(@interface[G] write_en: 1, @[G, G+1] in: 32) -> (@[G+2, G+3] write_done: 1)
}

(We use the function-like port signature to represent the inputs and outputs related to the call)

Next, we can change the definition of the component to use this enum:

comp Mem<G: 1>(addr: 8, interface: MemInterface) -> (...interface.outs) {...}

(TK: Not sure what the best way to represent the output associated with the enum is here. Just using something dumb for now)

When using this component, you must explicitly construct the enumeration associated with the "method" you're trying to use: either a read or a write

comp main<G: 1>(...) {
  M := new Mem;
  mw := M<G>(1 /*addr*/, MemInterface::Write { in: 15 }); // again, not the best syntax here
  out = mw.out;  // error because we called the Write method
}

This all gives us the usual goodness of Filament of checking that signals are available at the right times etc. However, this leaves out a big question.

A Matter of Delay

Delays are the heart of Filament's reasoning about pipelining. The challenge with enumerations is that different methods might have different effects on pipelining: reads and writes may use the same underlying circuits and therefore affect each other's delays. The simple solution is just making the whole enum have the same delay, which might be a deal breaker but is perhaps something to start with.

The other question is how to represent methods at the source level. This proposal only talks about external primitives, but overall we want people to be able to implement their own methods. I think the solution to this second question will probably elucidate the answer to the first one.

Inter-pipeline Data Access

Filament enforces the read-validity property of the type system on intra-iteration accesses, that is, accesses to data within the same pipeline iteration. A simple example of this is a two-stage multiplier:

component Mult<G>(
  @interface[G, G+1] go: 1,
  @[G, G+1] left: 32,
  @[G, G+1] right: 32,
) -> (
  @[G+2, G+3] out: 32
) {
  // Register the inputs
  LL := new Latch;
  LR := new Latch;
  ll := LL<G, G+1, G+2>(left);
  lr := LR<G, G+1, G+2>(right);

  // Perform combinational multiply and return the answer
  M := new MultComb;
  m0 := M<G+2>(ll.out, lr.out);

  out = m0.out;
}

The multiplier registers the inputs left and right allowing it to accept another set of inputs on the next iteration while the multiplication is being performed. The multiplication stage accesses the data that was registered in the previous cycle. The type system ensures that when the multiplier reads the data, it is still valid.

A simple example of inter-iteration data access is something like a MAC unit, which accumulates products in the same register. However, Filament will not allow a user to write this program:

component Mac<G>(
  @interface[G, G+1] go: 1,
  @[G, G+1] left: 32,
  @[G, G+1] right: 32
) -> (
  @[G, G+1] out: 32
) {
  Acc := new Latch;
  M := new MultComb;
  A := new Add;

  m0 := M<G>(left, right);
  a := A<G>(m0.out, acc.out);
  acc := Acc<G, G+1, G+2>(a.out);
}

The problem is that Filament considers the input acc.out to be live only during $[G+1, G+2]$ whereas the addition requires it to be available at $[G, G+1]$. Of course, this is expected because our definition of the register states that a written value only becomes available on the output one cycle after the write occurs:

component Latch<G, S, L>(
  @interface[G, G+1] write_en: 1,
  @interface[S, L] _go_S: 1,
  @[G, G+1] in: 32
) -> (
  @[S, L] out: 32
) where S = G+1, L > S;

Registers that look into the Past

What we would like to express is that we're accessing the previous value stored in the register. One way to do this is to change the Register component to expose a prev signal that exposes the previous value stored in the register. This is a simple change to the component:

component Latch<G, S, L>(
  @interface[G, G+1] write_en: 1,
  @interface[S, L] _go_S: 1,
  @[G, G+1] in: 32
) -> (
  @[S, L] out: 32,
  @[G, G+1] prev: 32
) where S = G+1, L > S;

With this change, we can implement the Mac component as follows:

component Mac<G>(...) {
  ...
  m0 := M<G>(left, right);
  a := A<G>(m0.out, acc.prev);
  acc := Acc<G, G+1, G+2>(a.out);
}

Within the Verilog implementation of the register, the prev signal is just connected to the register's output, i.e., it only exists for modeling purposes and does not imply any additional hardware since the register already contains its "previous" value.

This opens up a couple of important questions:

  1. How do we know that the prev signal is valid? For example, at the start of the computation, the register might contain garbage and needs to be initialized to zero.
  2. In general, how to distinguish between "reuse" of the same Mac unit and a new usage. Repeated invocations are going to continue accumulating values into the Mac unit. We need some way to explicitly reset the internal state.
  3. This acc.prev signal adds a form of aliasing to the system since a register that is reused has two ways to access the same logical value. Is this going to be a problem?

Out of Band Data Circulation

A different approach to this problem is having the data be "recirculated" into the pipeline, i.e., we make the driver of the pipeline responsible for providing both the current value and the previous value. We do this by exposing the value of the accumulator stored in the Mac:

component Mac<G>(
  @[G, G+1] left: 32,
  @[G, G+1] right: 32,
  @[G, G+1] acc: 32
) -> (
  ...
  @[G+1, G+2] out: 32
) {
  ...
  m0 := M<G>(left, right);
  a := A<G>(m0.out, acc);
  acc := Acc<G, G+1, G+2>(a.out);
  out = acc.out;
}

Then, the driver of the pipeline provides the value of the accumulator by "tying" the acc signal to the output of the Mac unit after the first iteration:

component Driver<G>(...) {
  M := new Mac;
  m0 := M<G>(l0, r0, 0);
  m1 := M<G+1>(l1, r1, m0.out);
  m2 := M<G+2>(l2, r2, m1.out);
  ...
}

This solution quickly becomes unwieldy: in a $4\times4$ systolic array, we'd have to recirculate $16$ such signals. The benefit is that things pretty much just work and we don't have to tackle the questions posed above.

Linearity check for port assignments

We need to ensure that the output ports of a component and all bundle locations have exactly one assignment to them (i.e., are linear). The former isn't super hard, but we'd probably need more machinery to encode the constraints needed to prove that assignments to all bundle locations are unique.
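For example, both of the following should be rejected (the f[#i] write syntax is an assumption):

// two assignments to the same output port
out = a0.out;
out = a1.out;

// assignments to bundle locations that may overlap: f[0] is written
// both by the loop's first iteration and by the final statement
for #i in 0..#N {
  f[#i] = a.out;
}
f[0] = b.out;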

Expressing backend capabilities using parameter bounds

One way to use parameter bounds for black-box external tools is to provide bounds that guarantee the backend tool can generate a corresponding module. This can be useful for tools like Reticle, which cannot necessarily generate dot-product hardware for all possible vector sizes.

Generate efficient FSMs for slow modules

When a module does not have an II of 1, we should be able to generate smaller, counter-based FSMs. This is particularly important if we're interacting with high-latency modules, since we may need to wait a long time and the generated pipelined FSM can get quite large.

Stream ports

Here is a component that takes three inputs and adds them all up:

import "primitives/core.fil";

component Add3<G>(
  @interface[G, G+1] go_G: 1,
  @[G, G+1] in0: 32,
  @[G+1, G+2] in1: 32,
  @[G+2, G+3] in2: 32
) -> (
  @[G+2, G+3] out: 32
) {
  // Register first input
  L0 := new Latch;
  l0 := L0<G, G+1, G+2>(in0);

  // Add the second input and register it
  A0 := new Add;
  a0 := A0<G+1>(l0.out, in1);
  L1 := new Latch;
  l1 := L1<G+1, G+2, G+3>(a0.out);

  // Add the final input
  A1 := new Add;
  a1 := A1<G+2>(l1.out, in2);

  // Produce the output
  out = a1.out;
}

Note: This can be turned into a finite impulse response (FIR) filter by providing a delay parameter for each input.

Looking at the signature, we can see that the inputs all arrive at different times:

component Add3<G>(
  @[G, G+1] in0: 32,
  @[G+1, G+2] in1: 32,
  @[G+2, G+3] in2: 32

Because of this, we should be able to map the logical ports to one physical wire since we know that the inputs arrive at disjoint time intervals.

This is not true if Add3 is pipelined; in that case, in0 and in1 may arrive at the same time and cannot be multiplexed onto the same input wire.

In addition to making the interface more concise, this also allows us to express the true intention behind this component: its interface is a stream of inputs that we'd like to process using the pipeline.

Describing Streaming Interfaces

Defining a Stream Interface

Here is some made up syntax:

component Add3<G>(
  @stream([G, G+1], delay=1, num=3) in: 32

The syntax tells us that the first input arrives at @[G, G+1], the next input arrives after a delay of 1, i.e. at @[G+1, G+2], and there will be three such inputs. Therefore, normal ports can be written down as:

component Add3<G>(
  @stream([G, G+1], delay=1, num=1) in0: 32

Accessing Stream Ports

Inside the body of the component, we use the syntax port[idx] to access the idx-th port of a stream:

component Add3<G>(
  @stream([G, G+1], delay=1, num=3) in: 32
) -> (...) {
  // Register first input
  L0 := new Latch;
  l0 := L0<G, G+1, G+2>(in[0]);

  // Add the second input and register it
  A0 := new Add;
  a0 := A0<G+1>(l0.out, in[1]);

The semantics of stream ports are straightforward: for a stream port declared @stream([G, G+n], delay=d, num=k), the ith port (for 0 <= i < k) arrives at @[G + i*d, G + n + i*d]. For example, with @stream([G, G+1], delay=1, num=3), the ports arrive at @[G, G+1], @[G+1, G+2], and @[G+2, G+3].

Constructing Stream Ports

Now that we have a component that takes a stream of inputs, we need a way to construct one. This can be done using the Stream construct:

component Add3<G>(
  @stream([G, G+1], delay=1, num=3) in: 32, ...) {
  ...
}

component Main<T>(
  @[T, T+1] in0: 32,
  @[T+1, T+2] in1: 32,
  @[T+2, T+3] in2: 32
) -> (...) {
  A := new Add3;
  Stream inputs([T, T+1], delay=1) = [in0, in1, in2];
  a := A<T>(inputs);

The syntax is a bit verbose, but that's okay for now.

Type checking the Stream construct is straightforward:

  1. The start time of the stream is dictated by the first input, i.e. in0 in this case.
  2. The delay between the inputs must match the delay parameter for the stream; a sketch of a violation follows.
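
For instance, rule 2 should reject a construction where the third input arrives too late (a hypothetical ill-typed program):

component Main<T>(
  @[T, T+1] in0: 32,
  @[T+1, T+2] in1: 32,
  @[T+4, T+5] in2: 32
) -> (...) {
  // Rejected: with delay=1, in2 must arrive at [T+2, T+3], not [T+4, T+5].
  Stream inputs([T, T+1], delay=1) = [in0, in1, in2];
}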

Compiling Stream Ports

There are two parts to this: compiling @stream port definitions and compiling the construction of stream ports.

Stream port definitions need to be compiled into a set of normal ports. For example, the Add3 component above would be compiled into:

component Add3<G>(
  @[G, G+1] in0: 32,
  @[G+1, G+2] in1: 32,
  @[G+2, G+3] in2: 32
) -> (...) {
  ...
}

Each use of a stream port is replaced with the corresponding normal port.

Constructing stream ports is straightforward: we simply forward each signal to the corresponding normal port in the component.
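
Concretely, the Main component above would compile to something like the following sketch, given the expanded Add3 signature shown earlier:

component Main<T>(
  @[T, T+1] in0: 32,
  @[T+1, T+2] in1: 32,
  @[T+2, T+3] in2: 32
) -> (...) {
  A := new Add3;
  // The Stream construct disappears; each element of the stream is
  // forwarded to the corresponding expanded port of Add3.
  a := A<T>(in0, in1, in2);
}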

Invalid pipeline with resource sharing passes type checking

The following program defines an unsafe pipeline:

import "primitives/core.fil";

component Main<G>(
  @interface[G, G+3] go_G: 1,
  @[G, G+1] l0: 32,
  @[G, G+1] r0: 32,
  @[G+9, G+10] l1: 32,
  @[G+9, G+10] r1: 32
) -> () {
  M := new Mult;

  m0 := M<G>(l0, r0);
  m1 := M<G+9>(l1, r1);
}

It runs the same multiplier on cycles 0 and 9 of its execution.
However, this pipeline is unsafe; here is a sequence of events that will cause a conflicting use of M:

  1. Cycle 0: Main is triggered (invocation 1) and m0 starts executing.
  2. Cycle 3: M completes m0's execution.
  3. Cycle 9: Main is triggered again (invocation 2). However, invocation 1 of Main attempts to execute m1, which conflicts with invocation 2's execution of m0.

The @interface port should be constrained so that the module cannot be restarted until all executions of an instance have had a chance to run.

Initiation Intervals for Pipelines

Definition: For a pipeline $P$, let $P(t)$ be the invocation of $P$ at time $t$. The initiation interval is the smallest value $i$ such that for all cycles $t$, $P(t)$ and $P(t+i)$ do not conflict. In other words, $i$ is the minimum number of cycles after which $P$ can be safely executed at any point in the future.

Definition (safe execution of a pipeline): A pipeline execution is safe if no resource used in the pipeline conflicts with any other usage of the pipeline. In other words, all active invocations of the pipeline use disjoint sets of resources.

Note: The use of "anytime" in the definition of an initiation interval is important because there might exist other values $l$ with $l < i$ such that $P(t)$ and $P(t+l)$ do not conflict; however, there may exist some time $n$ such that $P(t)$ and $P(t+l+n)$ do conflict. The existence of such an $n$ makes $l$ an invalid initiation interval. (In the Main example above, $P(t)$ and $P(t+3)$ do not conflict, but $P(t)$ and $P(t+9)$ do.) Filament could only infer $l$ as an initiation interval if there were a way to force the calling context to invoke the pipeline exactly $l$ cycles later.
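
Putting the definition and this note together, one way to state the initiation interval precisely (a sketch of the formalization):

$$\mathrm{II}(P) = \min \{\, i \mid \forall t.\ \forall j \ge i.\ P(t) \text{ and } P(t+j) \text{ do not conflict} \,\}$$

Quantifying over every $j \ge i$ is what rules out values like $l$ above that work for one specific restart offset but not for all later ones.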

The current restrictions on @interface are:

  1. Its delay, computed as the difference between the end and start time, is longer than the delay of any input or output port that uses the interface's corresponding event.
  2. Its delay is longer than the delay required by any invocation that uses the corresponding event for scheduling.
  3. Its delay is sufficient to ensure that reuse of an instance does not conflict with any other use.

We can divide the requirements into safety conditions for one invocation and safety conditions for pipelined use.
(3), along with the availability checking of signals within the body of the component, ensures that one use of a pipeline is correct; all signals are valid when they are read and resource usage is non-conflicting.

(1) & (2) attempt to ensure that pipelined usages of the component are safe.
(1) ensures that subsequent pipelined invocations do not break the requirements or guarantees of a previous invocation.
(2) ensures that all components used inside the module are only invoked when they can be safely pipelined.

The missing piece is this: the @interface signal needs to ensure that a pipelined invocation of a resource does not conflict with the current use of the components.
At a high level, we can imagine "ghost" versions of all components floating around that represent the execution of the pipeline at time interval $i$. We need to ensure that none of them conflict with the current execution.

My current thought on how to do this is to ensure that the end time of @interface is greater than or equal to the maximum of the end times of all usages of an instance. I have to convince myself that this is necessary and sufficient.
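
Applying that rule to the Main example above, the end time of @interface must be at least G+10 (the end of m1's usage of M). A sketch of the constrained signature, assuming Mult itself can accept a new input every cycle:

component Main<G>(
  @interface[G, G+10] go_G: 1,
  @[G, G+1] l0: 32,
  @[G, G+1] r0: 32,
  @[G+9, G+10] l1: 32,
  @[G+9, G+10] r1: 32
) -> () { ... }

With this delay, a second invocation cannot start before G+10, so its m0 can never overlap with the first invocation's m1 at cycle 9.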

Interpreting interface signals in continuous pipelines

Some components have signatures like this:

component Mux<G>(
  @interface[G, G+1] _go: 1, ...

Where the signal _go is not used inside the component and is only provided to model the II of the component (in this case, a combinational Mux that is available every cycle).

However, there is another way to interpret an "unused" interface signal: the signal is "internal" to the module and cannot be provided by the outside world. For example, consider a pipelined systolic array that always moves its internal values regardless of whether a new input was provided. In this case, the calling context cannot provide this pipeline with a go signal since the pipeline is continuously performing its computation. I'll call such pipelines "continuous pipelines". Another benefit of continuous pipelines is that they do not need internal pipelined FSMs because their internal logic is always executing. Because of this, we can completely eliminate guards in certain cases as well (#37).
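
One could imagine hypothetical syntax that marks such a signal as internal, so the type checker knows the caller cannot drive it (the internal keyword is made up here):

component Systolic<G>(
  // Hypothetical `internal` marker: the event still models the II,
  // but the calling context cannot (and need not) provide it.
  @interface[G, G+1] internal _go: 1, ...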

The challenge with such pipelines is that there isn't a clear way to force the calling context to continuously provide valid data to them, which might break Filament's guarantees. I say might because Filament's guarantees may allow some wiggle room in interpreting the meaning of absent signals vs. present 'x signals.
