Exceeds buffer capacity (timeloop, CLOSED)

nvlabs commented on August 17, 2024
Exceeds buffer capacity

from timeloop.

Comments (10)

angshuman-parashar commented on August 17, 2024

Factors are multiplicatively cumulative, so to determine the tile size at the Global buffer I'll need to know the factors at all levels inside of the Global buffer as well.

aleczhanshi commented on August 17, 2024

@angshuman-parashar Thanks. Below are the factors of the global buffer. Is that what we need to compute the tile size?

    {
      target = 4;
      type = "spatial";
      factors = "R1 S1 P1 Q8 C1 K2 N1";
      permutation = "QKRSPCN";
      split = 2;
    }, 
    {
      target = 4;
      type = "temporal";
      factors = "R1 S1 P112 Q1 C1 K1 N1";
      permutation = "PRSQCKN";
    }, 

angshuman-parashar commented on August 17, 2024

No, that's not enough. As you can see, that's storage level #4. I need to know the factors for levels 0, 1, 2 and 3 as well: the product of all of those factors gives you the tile size at level 4. Perhaps that explains why your buffer is overflowing?

aleczhanshi commented on August 17, 2024

@angshuman-parashar Thanks! What are the equations behind this? For example, is the tile size for level 0 the product of all factors (R, S, P, Q, C, K, N)? For upper levels, could you show me the equation to compute the tile size based on the lower levels and itself? I'm putting all the factors below. Thanks!

    {
      target = 0;
      type = "temporal";
      factors = "R1 S1 P1 Q1 C1 K16 N1";
      permutation = "KRSPQCN";
    }, 
    {
      target = 1;
      type = "temporal";
      factors = "R7 S1 P1 Q1 C1 K1 N1";
      permutation = "RSPQCKN";
    }, 
    {
      target = 2;
      type = "temporal";
      factors = "R1 S1 P1 Q1 C1 K1 N1";
      permutation = "RSPQCKN";
    }, 
    {
      target = 3;
      type = "spatial";
      factors = "R1 S7 P1 Q1 C1 K2 N1";
      permutation = "SKRPQCN";
      split = 0;
    }, 
    {
      target = 3;
      type = "temporal";
      factors = "R1 S1 P1 Q1 C1 K1 N1";
      permutation = "RSPQCKN";
    }, 
    {
      target = 4;
      type = "spatial";
      factors = "R1 S1 P1 Q8 C1 K2 N1";
      permutation = "QKRSPCN";
      split = 2;
    }, 
    {
      target = 4;
      type = "temporal";
      factors = "R1 S1 P112 Q1 C1 K1 N1";
      permutation = "PRSQCKN";
    }, 

angshuman-parashar commented on August 17, 2024

First, calculate each dimension as the product of all of its factors. E.g., multiplying over all levels (temporal + spatial) from 0 through 4, we get: R = 7, S = 7, P = 112, Q = 8, C = 1, K = 64, N = 1. This gives us the problem-space (or iteration-space) tile at level 4. Next, project this problem-space tile into the data-spaces (i.e., tensors) to obtain the tile shapes for those spaces. E.g., weights = R*S*C*K = 3,136, outputs = N*K*Q*P = 57,344 and inputs = N*C*(S+(Q-1)*Hstride)*(R+(P-1)*Wstride) = 4,809 (assuming Wstride = Hstride = 2 and dilation = 1), giving us a total of 65,289 entries. You can multiply that by the word size to get the capacity in bytes.
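
The calculation described above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic, not Timeloop code: the helper names (`parse_factors`, `cumulative_tile`, `data_space_tiles`) are made up for illustration, the factor strings are copied from the mapping posted in this thread, and the strides of 2 are taken from the problem spec posted further down.

```python
from math import prod

# Factors copied from the mapping in this thread, storage levels 0 through 4.
# Temporal and spatial entries both contribute to the tile.
LEVEL_FACTORS = [
    "R1 S1 P1 Q1 C1 K16 N1",   # level 0, temporal
    "R7 S1 P1 Q1 C1 K1 N1",    # level 1, temporal
    "R1 S1 P1 Q1 C1 K1 N1",    # level 2, temporal
    "R1 S7 P1 Q1 C1 K2 N1",    # level 3, spatial
    "R1 S1 P1 Q1 C1 K1 N1",    # level 3, temporal
    "R1 S1 P1 Q8 C1 K2 N1",    # level 4, spatial
    "R1 S1 P112 Q1 C1 K1 N1",  # level 4, temporal
]

def parse_factors(text):
    """Turn 'R1 S7 K2' into {'R': 1, 'S': 7, 'K': 2} (hypothetical helper)."""
    return {tok[0]: int(tok[1:]) for tok in text.split()}

def cumulative_tile(factor_strings):
    """Multiply each dimension's factors across all of the given levels."""
    parsed = [parse_factors(s) for s in factor_strings]
    return {dim: prod(p[dim] for p in parsed) for dim in "RSPQCKN"}

def data_space_tiles(t, wstride, hstride):
    """Project the iteration-space tile onto the three tensors (dilation = 1)."""
    weights = t["R"] * t["S"] * t["C"] * t["K"]
    outputs = t["N"] * t["K"] * t["Q"] * t["P"]
    inputs = (t["N"] * t["C"]
              * (t["S"] + (t["Q"] - 1) * hstride)
              * (t["R"] + (t["P"] - 1) * wstride))
    return {"Weights": weights, "Inputs": inputs, "Outputs": outputs}

tile = cumulative_tile(LEVEL_FACTORS)
sizes = data_space_tiles(tile, wstride=2, hstride=2)
print(tile)   # R=7, S=7, P=112, Q=8, C=1, K=64, N=1
print(sizes)  # Weights=3136, Inputs=4809, Outputs=57344
print(sum(sizes.values()))  # 65289
```

The result is a count of entries (words), not bytes.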

Now I'm curious, because it doesn't match the error message (unless I messed up the math somewhere above). Could you please email or upload the entire .cfg (arch, mapping, everything) so that I can reproduce at my end?

aleczhanshi commented on August 17, 2024

@angshuman-parashar Thanks for doing the computation! I really appreciate it. The error for the set of parameters below is "ERROR: couldn't map level GlobalBuffer: mapped tile size 62153 exceeds buffer capacity 32768". I did the math and got the same result as you, 65289, but Timeloop reports 62153 instead. That is not much of a difference, but do you have any idea why this is the case?

arch : 
{
  arithmetic : 
  {
    name = "MACs";
    instances = 256;
    word-bits = 16;
    meshX = 16;
  };
  storage = ( 
    {
      name = "PsumRegFile";
      entries = 16;
      instances = 256;
      meshX = 16;
      word-bits = 16;
      read_bandwidth = 2;
      write_bandwidth = 2;
    }, 
    {
      name = "WeightRegFile";
      entries = 192;
      instances = 256;
      meshX = 16;
      word-bits = 16;
      read_bandwidth = 2;
      write_bandwidth = 2;
    }, 
    {
      name = "InputRegFile";
      entries = 12;
      instances = 256;
      meshX = 16;
      word-bits = 16;
      read_bandwidth = 2;
      write_bandwidth = 2;
    }, 
    {
      name = "DummyBuffer";
      entries = 0;
      instances = 16;
      meshX = 16;
      word-bits = 16;
    }, 
    {
      name = "GlobalBuffer";
      sizeKB = 64;
      instances = 1;
      meshX = 1;
      word-bits = 16;
      block-size = 4;
      read_bandwidth = 16;
      write_bandwidth = 16;
    }, 
    {
      name = "DRAM";
      technology = "DRAM";
      instances = 1;
      word-bits = 16;
    } );
};

problem : 
{
  R = 7;
  S = 7;
  P = 112;
  Q = 112;
  C = 3;
  K = 64;
  N = 1;
  Wstride = 2;
  Hstride = 2;
};

mapping = (
    {
      target = 0;
      type = "datatype";
      keep = [ "Outputs" ];
      bypass = [ "Weights", "Inputs" ];
    }, 
    {
      target = 1;
      type = "datatype";
      keep = [ "Weights" ];
      bypass = [ "Inputs", "Outputs" ];
    }, 
    {
      target = 2;
      type = "datatype";
      keep = [ "Inputs" ];
      bypass = [ "Weights", "Outputs" ];
    }, 
    {
      target = 3;
      type = "datatype";
      keep = [ ];
      bypass = [ "Weights", "Inputs", "Outputs" ];
    }, 
    {
      target = 4;
      type = "datatype";
      keep = [ "Inputs", "Outputs" ];
      bypass = [ "Weights" ];
    }, 
    {
      target = 5;
      type = "datatype";
      keep = [ "Weights", "Inputs", "Outputs" ];
      bypass = [ ];
    }, 
    {
      target = 0;
      type = "temporal";
      factors = "R1 S1 P1 Q1 C1 K16 N1";
      permutation = "KRSPQCN";
    }, 
    {
      target = 1;
      type = "temporal";
      factors = "R7 S1 P1 Q1 C1 K1 N1";
      permutation = "RSPQCKN";
    }, 
    {
      target = 2;
      type = "temporal";
      factors = "R1 S1 P1 Q1 C1 K1 N1";
      permutation = "RSPQCKN";
    }, 
    {
      target = 3;
      type = "spatial";
      factors = "R1 S7 P1 Q1 C1 K2 N1";
      permutation = "SKRPQCN";
      split = 0;
    }, 
    {
      target = 3;
      type = "temporal";
      factors = "R1 S1 P1 Q1 C1 K1 N1";
      permutation = "RSPQCKN";
    }, 
    {
      target = 4;
      type = "spatial";
      factors = "R1 S1 P1 Q8 C1 K2 N1";
      permutation = "QKRSPCN";
      split = 2;
    }, 
    {
      target = 4;
      type = "temporal";
      factors = "R1 S1 P112 Q1 C1 K1 N1";
      permutation = "PRSQCKN";
    }, 
    {
      target = 5;
      type = "temporal";
      factors = "R1 S1 P1 Q14 C3 K1 N1";
      permutation = "CQKRSPN";
    }
);
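
As a side note on units, the capacity of 32768 in the error message is consistent with the GlobalBuffer spec above if the capacity is counted in 16-bit words and 1 KB is read as 1024 bytes. The sketch below only checks that arithmetic; `capacity_in_words` is a hypothetical helper, not a Timeloop function, and the unit interpretation is an assumption.

```python
def capacity_in_words(size_kb, word_bits):
    """Number of words that fit in the buffer, assuming 1 KB = 1024 bytes."""
    return size_kb * 1024 * 8 // word_bits

# GlobalBuffer from the arch spec: sizeKB = 64, word-bits = 16.
global_buffer_words = capacity_in_words(size_kb=64, word_bits=16)
mapped_tile = 62153  # value reported in the error message

print(global_buffer_words)                # 32768
print(mapped_tile > global_buffer_words)  # True, hence the error
```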

aleczhanshi commented on August 17, 2024

@angshuman-parashar Another question: I assume that the permutation does not affect the tile size. Is that true?

Further, I guess that only the non-unit factors matter in the permutation as far as performance is concerned. For example, with R1 S1 P1 Q8 C1 K2 N1, only the relative order of Q and K affects performance, because all the other factors are 1. In other words, {QK}RSPCN should be equivalent to RSPCN{QK} and to {QK}PCNRS. Is that correct?

angshuman-parashar commented on August 17, 2024

Re. your earlier question: Look at the bypass settings. Weights are being bypassed at that level. 65289 - 62153 = 3136, which is the weight tile :).
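
To make the bypass accounting concrete, here is a minimal sketch that sums only the data spaces a level keeps. The tile sizes are the ones computed earlier in this thread, and `occupied_entries` is a hypothetical helper, not part of Timeloop.

```python
# Tile sizes at the GlobalBuffer level, as computed earlier in the thread.
TILES = {"Weights": 3136, "Inputs": 4809, "Outputs": 57344}

def occupied_entries(tiles, keep):
    """Sum only the data spaces that the storage level actually keeps."""
    return sum(size for name, size in tiles.items() if name in keep)

# Level 4 directive in the mapping: keep Inputs and Outputs, bypass Weights.
print(occupied_entries(TILES, keep={"Inputs", "Outputs"}))  # 62153
# If nothing were bypassed, all three tiles would count.
print(occupied_entries(TILES, keep=set(TILES)))             # 65289
```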

Re. your most recent question: Correct, permutation does not affect size. And correct, permutations of only non-unit factors affect performance/energy efficiency. In fact, this is something that the mapper exploits to prune the search space.
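
One way to picture why such permutations are equivalent is to drop every dimension whose factor is 1 and compare what remains. The sketch below is purely illustrative: `significant_order` is a made-up helper, and nothing here describes how the Timeloop mapper implements its pruning internally.

```python
def significant_order(permutation, factors):
    """Keep only the dimensions whose loop bound is greater than 1."""
    bounds = {tok[0]: int(tok[1:]) for tok in factors.split()}
    return "".join(dim for dim in permutation if bounds[dim] > 1)

factors = "R1 S1 P1 Q8 C1 K2 N1"

# The three orderings from the question reduce to the same loop order.
print(significant_order("QKRSPCN", factors))  # QK
print(significant_order("RSPCNQK", factors))  # QK
print(significant_order("QKPCNRS", factors))  # QK
# Swapping the two non-unit dimensions gives a genuinely different order.
print(significant_order("KQRSPCN", factors))  # KQ
```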

aleczhanshi commented on August 17, 2024

@angshuman-parashar Thanks! It makes a lot of sense. I really appreciate it!

agarwal-ayushi commented on August 17, 2024

Hi @aleczhanshi and @angshuman-parashar: I am facing a similar issue while trying to convert the mapper output map.txt file to .yaml format for timeloop-model. I am working on the tutorial example timeloop-accelergy-exercises/workspace/exercises/2020.ispass/timeloop/06-mapper-convlayer-eyeriss.

Source mapping: the one given in ref-output (timeloop-mapper.map.txt).
Motivation: I want to use sparse-opt in timeloop-model on a particular mapping to study the impact of sparsity. timeloop-model takes a map.yaml, hence this effort.
I wrote the following map.yaml file:

mapping:
  - target: DRAM
    type: temporal
    factors: Q=4 M=4 C=8 P=1 R=1 S=1 N=1
    permutation: CMQPRSN

  - target: shared_glb
    type: temporal
    factors: M=4 P=56 Q=1 R=1 S=1 C=1 N=1
    permutation: QMPRSCN

  - target: shared_glb
    type: spatial
    factors: Q=14 M=1 P=1 C=1 R=1 S=1 N=1
    permutation: QMPCRSN
    split: 1

  - target: DummyBuffer
    type: temporal
    factors: Q=1 M=1 C=1 S=1 P=1 R=1 N=1
    permutation: MSCQPRN

  - target: DummyBuffer
    type: spatial
    factors: Q=1 C=4 S=3 P=1 R=1 N=1 M=1
    permutation: PRNMQSC
    split: 4

  - target: ifmap_spad
    type: temporal
    factors: Q=1 M=1 C=1 S=1 P=1 R=1 N=1
    permutation: CMQSPRN

  - target: weights_spad
    type: temporal
    factors: R=3 C=4 N=1 S=1 P=1 Q=1 M=1
    permutation: CRNSPQM

  - target: psum_spad
    type: temporal
    factors: M=16 R=1 C=1 N=1 S=1 P=1 Q=1
    permutation: MRCNSPQ

However, when I run:
timeloop-model arch/eyeriss_like.yaml arch/components/*.yaml prob/VGG02_layer5.yaml trial_map.yaml
I get the error below. I have been unable to figure out the problem in my mapping, and no other files have been modified. Any help would be great.

Sparse optimization configuration complete.
ERROR: couldn't map level psum_spad: mapped tile size 33 exceeds buffer capacity 16
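
This question was not answered in the thread. One observation, offered as a hypothesis rather than a confirmed diagnosis: applying the tile-size arithmetic from earlier in this thread to the psum_spad factors gives exactly 33 if all three data spaces are kept at that level, which is what would happen if the map.yaml contains no bypass directives. The sketch assumes that M plays the role of K (output channels) and that psum_spad is the innermost level, so its tile is just its own factors.

```python
# psum_spad factors copied from the map.yaml above.
factors = "M=16 R=1 C=1 N=1 S=1 P=1 Q=1"
t = {name: int(value) for name, value in (tok.split("=") for tok in factors.split())}

# Placeholder strides: their value is irrelevant here because P = Q = 1
# makes the stride terms vanish.
wstride = hstride = 1

weights = t["R"] * t["S"] * t["C"] * t["M"]
outputs = t["N"] * t["M"] * t["Q"] * t["P"]
inputs = (t["N"] * t["C"]
          * (t["S"] + (t["Q"] - 1) * hstride)
          * (t["R"] + (t["P"] - 1) * wstride))

print(weights, inputs, outputs)    # 16 1 16
print(weights + inputs + outputs)  # 33, versus 16 for Outputs alone
```

If that reading is right, only the Outputs tile (16 entries) would fit in psum_spad, so it may be worth checking whether the original timeloop-mapper.map.txt carried keep/bypass information that was lost in the conversion.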
