Comments (5)

hikettei commented on May 24, 2024

Breaking change to the tests: they are now split into two modes, VM execution and interpreter execution (lake test / lake vm-test).
GitHub Actions on the master branch now runs both.

from cl-waffe2.

hikettei commented on May 24, 2024

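Introduced the cl-waffe2 IR.
The IR expressed with defnode is topologically sorted, graph-level optimizations such as in-place mutation are then applied, and the result is flattened into a one-dimensional instruction sequence.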
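To illustrate the sort-then-flatten step, here is a minimal sketch in plain Common Lisp. It does not use cl-waffe2's actual data structures: the graph format (an alist of `(name op . inputs)` entries), `topo-sort`, `linearize`, and `*graph*` are all hypothetical names chosen for this example. It assumes the graph is acyclic, and the in-place mutation pass is omitted.

```lisp
;; Hypothetical mini-IR, not cl-waffe2's real representation.
;; Each entry is (name op . input-names); names without an entry are leaf inputs.
(defparameter *graph*
  '((loss   mean                  sce)
    (sce    softmax-cross-entropy logits y)
    (logits linear                x w)))

(defun topo-sort (nodes outputs)
  "Return all reachable names so that every input precedes its users.
Uses a depth-first post-order walk; assumes NODES forms a DAG."
  (let ((seen (make-hash-table :test #'eq))
        (order '()))
    (labels ((visit (name)
               (unless (gethash name seen)
                 (setf (gethash name seen) t)
                 ;; Visit the inputs first, then record this node.
                 (dolist (input (cddr (assoc name nodes)))
                   (visit input))
                 (push name order))))
      (dolist (out outputs)
        (visit out)))
    (nreverse order)))

(defun linearize (nodes outputs)
  "Flatten the graph into a list of (out op in...) instructions."
  (loop for name in (topo-sort nodes outputs)
        for node = (assoc name nodes)
        when node ; leaf inputs produce no instruction
          collect (list* name (second node) (cddr node))))

(linearize *graph* '(loss))
;; => ((LOGITS LINEAR X W) (SCE SOFTMAX-CROSS-ENTROPY LOGITS Y) (LOSS MEAN SCE))
```

Each element of the result plays the same role as one `WfInst` line in the dumps in this thread: an output, the operation, and its arguments, listed in an order that is safe to execute sequentially.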

Sample

(defsequence CNN ()
  (Conv2D 3 16 `(3 3))
  (asnode #'!relu)
  (MaxPool2D `(2 2))
  (Conv2D 16 32 `(5 5))
  (asnode #'!relu)
  (MaxPool2D `(2 2))
  (asnode #'!reshape t (* 32 5 5))
  (LinearLayer (* 32 5 5) 10))

(defmethod train ((model CNN) x y)
  (!mean
   (softmax-cross-entropy
    (call model x)
    y)))


(train (CNN) (randn `(64 3 32 32)) (randn `(64 10)))
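The `(* 32 5 5)` passed to `!reshape` and `LinearLayer` can be checked by tracing the spatial size of the 32x32 input through the layers. The sketch below is plain Common Lisp and independent of cl-waffe2. The helper names `conv-out`, `pool-out`, and `cnn-feature-size` are hypothetical, and it assumes stride 1 with no padding for the convolutions and a pooling stride equal to the window, rounding down. Those assumptions are inferred from the shapes in the sample, not taken from the library's documentation.

```lisp
;; Hypothetical helpers for checking shapes by hand; not part of cl-waffe2.
(defun conv-out (size kernel &key (stride 1) (padding 0))
  "Output length of a convolution along one spatial axis."
  (1+ (floor (- (+ size (* 2 padding)) kernel) stride)))

(defun pool-out (size window)
  "Output length of pooling along one axis, assuming stride = window."
  (floor size window))

(defun cnn-feature-size (input-size channels)
  "Return (spatial-size flattened-size) after the four conv/pool layers of the sample."
  (let* ((h (conv-out input-size 3)) ; Conv2D 3 16 (3 3)   : 32 -> 30
         (h (pool-out h 2))          ; MaxPool2D (2 2)     : 30 -> 15
         (h (conv-out h 5))          ; Conv2D 16 32 (5 5)  : 15 -> 11
         (h (pool-out h 2)))         ; MaxPool2D (2 2)     : 11 -> 5
    (list h (* channels h h))))

(cnn-feature-size 32 32)
;; => (5 800)
```

The result of 800 features per sample agrees with `(* 32 5 5)` in the model definition.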

Example of the compiled forward IR

 
(<WfInst[Compiled: VIEWTENSORNODE-T] : TID82611.state <= apply( TID82611(1 1) TID82609(1 1) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID82488.state <= apply( TID82488(1 1) TID82483(1 1) )>

 <WfInst[Compiled: MOVESCALARTENSORNODE-SCALARTENSOR] : TID82496.state <= apply( TID82496(1) TID82485(1) )>

 <WfInst[Compiled: SCALARMUL-LISPTENSOR] : TID82488.state <= apply( TID82488(1 1) TID82496(1) )>

 <WfInst[Compiled: VIEWTENSORNODE-T] : TID82505.state <= apply( TID82505(64 10) TID82488(1 1) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID82513.state <= apply( TID82513(64 10) TID82505(64 10) )>

 <WfInst[Compiled: VIEWTENSORNODE-T] : TID82157.state <= apply( TID82157(64 32 11 11) TID82155(64 32 12 12) )>

 <WfInst[Compiled: VIEWTENSORNODE-T] : TID81644.state <= apply( TID81644(64 16 30 30) TID81642(64 16 31 31) )>

 <WfInst[Compiled: IM2COLNODE-T] : TID81305.state <= apply( TID81301(64 3 32 32) TID81305(64 3 3 3 30 30) )>

 <WfInst[Compiled: PERMUTE-NODE-T] : TID81316.state <= apply( TID81305(64 3 3 3 30 30) TID81316(64 30 30 3 3 3) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID81325.state <= apply( TID81325(64 30 30 3 3 3) TID81316(64 30 30 3 3 3) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID81362.state <= apply( TID81362(64 30 30 3 3 3) TID81325(64 30 30 3 3 3) )>

 <WfInst[Compiled: RESHAPETENSORNODE-T] : TID81322.state <= apply( TID81362(64 30 30 3 3 3) TID81322(57600 27) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID81398.state <= apply( TID81398(57600 27) TID81322(57600 27) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID81376.state <= apply( TID81376(16 3 3 3) <Param>TID81291(16 3 3 3) )>

 <WfInst[Compiled: RESHAPETENSORNODE-T] : TID81373.state <= apply( TID81376(16 3 3 3) TID81373(16 27) )>

 <WfInst[Compiled: PERMUTE-NODE-T] : TID81386.state <= apply( TID81373(16 27) TID81386(27 16) )>

 <WfInst[Compiled: LAZYTRANSPOSENODE-T] : TID81386.state <= apply( TID81386(27 16) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID81406.state <= apply( TID81406(27 16) TID81386(27 16) )>

 <WfInst[Compiled: MATMULNODE-CPUTENSOR] : TID81395.state <= apply( TID81398(57600 27) TID81406(27 16) TID81395(57600 16) )>

 <WfInst[Compiled: <DELETED>] : TID81438.state <= apply( TID81438(57600 16) TID81395(57600 16) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID81454.state <= apply( TID81454(57600 16) TID81438(57600 16) )>

 <WfInst[Compiled: FLEXIBLE-RANK-NODE-T] : TID81295.state <= apply( <Param>TID81295(16) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID81446.state <= apply( TID81446(16) TID81295(16) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID81465.state <= apply( TID81465(16) TID81446(16) )>

 <WfInst[Compiled: RESHAPETENSORNODE-T] : TID81462.state <= apply( TID81465(16) TID81462(1 16) )>

 <WfInst[Compiled: VIEWTENSORNODE-T] : TID81475.state <= apply( TID81475(57600 16) TID81462(1 16) )>

 <WfInst[Compiled: ADDNODE-CPUTENSOR] : TID81454.state <= apply( TID81454(57600 16) TID81475(57600 16) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID81494.state <= apply( TID81494(57600 16) TID81454(57600 16) )>

 <WfInst[Compiled: RESHAPETENSORNODE-T] : TID81491.state <= apply( TID81494(57600 16) TID81491(64 30 30 16) )>

 <WfInst[Compiled: PERMUTE-NODE-T] : TID81504.state <= apply( TID81491(64 30 30 16) TID81504(64 16 30 30) )>

 <WfInst[Compiled: <DELETED>] : TID81537.state <= apply( TID81537(64 16 30 30) TID81504(64 16 30 30) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID81572.state <= apply( TID81572(64 16 30 30) TID81537(64 16 30 30) )>

 <WfInst[Compiled: WHERE-OPERATION-NODE-LISPTENSOR] : TID81510.state <= apply( TID81504(64 16 30 30) TID81510(64 16 30 30) )>

 <WfInst[Compiled: <DELETED>] : TID81564.state <= apply( TID81564(64 16 30 30) TID81510(64 16 30 30) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID81599.state <= apply( TID81599(64 16 30 30) TID81564(64 16 30 30) )>

 <WfInst[Compiled: MULNODE-LISPTENSOR] : TID81572.state <= apply( TID81572(64 16 30 30) TID81599(64 16 30 30) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID81644.state <= apply( TID81644(64 16 30 30) TID81572(64 16 30 30) )>

 <WfInst[Compiled: VIEWTENSORNODE-T] : TID81673.state <= apply( TID81673(64 16 31 31) TID81644(64 16 30 30) )>

 <WfInst[Compiled: IM2COLNODE-T] : TID81678.state <= apply( TID81673(64 16 31 31) TID81678(64 16 2 2 15 15) )>

 <WfInst[Compiled: PERMUTE-NODE-T] : TID81689.state <= apply( TID81678(64 16 2 2 15 15) TID81689(64 15 15 16 2 2) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID81698.state <= apply( TID81698(64 15 15 16 2 2) TID81689(64 15 15 16 2 2) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID81735.state <= apply( TID81735(64 15 15 16 2 2) TID81698(64 15 15 16 2 2) )>

 <WfInst[Compiled: RESHAPETENSORNODE-T] : TID81695.state <= apply( TID81735(64 15 15 16 2 2) TID81695(14400 64) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID81749.state <= apply( TID81749(14400 64) TID81695(14400 64) )>

 <WfInst[Compiled: RESHAPETENSORNODE-T] : TID81746.state <= apply( TID81749(14400 64) TID81746(230400 4) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID81763.state <= apply( TID81763(230400 4) TID81746(230400 4) )>

 <WfInst[Compiled: RESHAPETENSORNODE-T] : TID81760.state <= apply( TID81763(230400 4) TID81760(230400 4) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID81777.state <= apply( TID81777(230400 4) TID81760(230400 4) )>

 <WfInst[Compiled: MAXVALUE-NODE-CPUTENSOR] : TID81774.state <= apply( TID81777(230400 4) TID81774(230400 1) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID81803.state <= apply( TID81803(230400 1) TID81774(230400 1) )>

 <WfInst[Compiled: RESHAPETENSORNODE-T] : TID81800.state <= apply( TID81803(230400 1) TID81800(64 15 15 16) )>

 <WfInst[Compiled: PERMUTE-NODE-T] : TID81813.state <= apply( TID81800(64 15 15 16) TID81813(64 16 15 15) )>

 <WfInst[Compiled: IM2COLNODE-T] : TID81818.state <= apply( TID81813(64 16 15 15) TID81818(64 16 5 5 11 11) )>

 <WfInst[Compiled: PERMUTE-NODE-T] : TID81829.state <= apply( TID81818(64 16 5 5 11 11) TID81829(64 11 11 16 5 5) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID81838.state <= apply( TID81838(64 11 11 16 5 5) TID81829(64 11 11 16 5 5) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID81875.state <= apply( TID81875(64 11 11 16 5 5) TID81838(64 11 11 16 5 5) )>

 <WfInst[Compiled: RESHAPETENSORNODE-T] : TID81835.state <= apply( TID81875(64 11 11 16 5 5) TID81835(7744 400) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID81911.state <= apply( TID81911(7744 400) TID81835(7744 400) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID81889.state <= apply( TID81889(32 16 5 5) <Param>TID81277(32 16 5 5) )>

 <WfInst[Compiled: RESHAPETENSORNODE-T] : TID81886.state <= apply( TID81889(32 16 5 5) TID81886(32 400) )>

 <WfInst[Compiled: PERMUTE-NODE-T] : TID81899.state <= apply( TID81886(32 400) TID81899(400 32) )>

 <WfInst[Compiled: LAZYTRANSPOSENODE-T] : TID81899.state <= apply( TID81899(400 32) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID81919.state <= apply( TID81919(400 32) TID81899(400 32) )>

 <WfInst[Compiled: MATMULNODE-CPUTENSOR] : TID81908.state <= apply( TID81911(7744 400) TID81919(400 32) TID81908(7744 32) )>

 <WfInst[Compiled: <DELETED>] : TID81951.state <= apply( TID81951(7744 32) TID81908(7744 32) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID81967.state <= apply( TID81967(7744 32) TID81951(7744 32) )>

 <WfInst[Compiled: FLEXIBLE-RANK-NODE-T] : TID81281.state <= apply( <Param>TID81281(32) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID81959.state <= apply( TID81959(32) TID81281(32) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID81978.state <= apply( TID81978(32) TID81959(32) )>

 <WfInst[Compiled: RESHAPETENSORNODE-T] : TID81975.state <= apply( TID81978(32) TID81975(1 32) )>

 <WfInst[Compiled: VIEWTENSORNODE-T] : TID81988.state <= apply( TID81988(7744 32) TID81975(1 32) )>

 <WfInst[Compiled: ADDNODE-CPUTENSOR] : TID81967.state <= apply( TID81967(7744 32) TID81988(7744 32) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID82007.state <= apply( TID82007(7744 32) TID81967(7744 32) )>

 <WfInst[Compiled: RESHAPETENSORNODE-T] : TID82004.state <= apply( TID82007(7744 32) TID82004(64 11 11 32) )>

 <WfInst[Compiled: PERMUTE-NODE-T] : TID82017.state <= apply( TID82004(64 11 11 32) TID82017(64 32 11 11) )>

 <WfInst[Compiled: <DELETED>] : TID82050.state <= apply( TID82050(64 32 11 11) TID82017(64 32 11 11) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID82085.state <= apply( TID82085(64 32 11 11) TID82050(64 32 11 11) )>

 <WfInst[Compiled: WHERE-OPERATION-NODE-LISPTENSOR] : TID82023.state <= apply( TID82017(64 32 11 11) TID82023(64 32 11 11) )>

 <WfInst[Compiled: <DELETED>] : TID82077.state <= apply( TID82077(64 32 11 11) TID82023(64 32 11 11) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID82112.state <= apply( TID82112(64 32 11 11) TID82077(64 32 11 11) )>

 <WfInst[Compiled: MULNODE-LISPTENSOR] : TID82085.state <= apply( TID82085(64 32 11 11) TID82112(64 32 11 11) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID82157.state <= apply( TID82157(64 32 11 11) TID82085(64 32 11 11) )>

 <WfInst[Compiled: VIEWTENSORNODE-T] : TID82186.state <= apply( TID82186(64 32 12 12) TID82157(64 32 11 11) )>

 <WfInst[Compiled: IM2COLNODE-T] : TID82191.state <= apply( TID82186(64 32 12 12) TID82191(64 32 2 2 5 5) )>

 <WfInst[Compiled: PERMUTE-NODE-T] : TID82202.state <= apply( TID82191(64 32 2 2 5 5) TID82202(64 5 5 32 2 2) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID82211.state <= apply( TID82211(64 5 5 32 2 2) TID82202(64 5 5 32 2 2) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID82248.state <= apply( TID82248(64 5 5 32 2 2) TID82211(64 5 5 32 2 2) )>

 <WfInst[Compiled: RESHAPETENSORNODE-T] : TID82208.state <= apply( TID82248(64 5 5 32 2 2) TID82208(1600 128) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID82262.state <= apply( TID82262(1600 128) TID82208(1600 128) )>

 <WfInst[Compiled: RESHAPETENSORNODE-T] : TID82259.state <= apply( TID82262(1600 128) TID82259(51200 4) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID82276.state <= apply( TID82276(51200 4) TID82259(51200 4) )>

 <WfInst[Compiled: RESHAPETENSORNODE-T] : TID82273.state <= apply( TID82276(51200 4) TID82273(51200 4) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID82290.state <= apply( TID82290(51200 4) TID82273(51200 4) )>

 <WfInst[Compiled: MAXVALUE-NODE-CPUTENSOR] : TID82287.state <= apply( TID82290(51200 4) TID82287(51200 1) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID82316.state <= apply( TID82316(51200 1) TID82287(51200 1) )>

 <WfInst[Compiled: RESHAPETENSORNODE-T] : TID82313.state <= apply( TID82316(51200 1) TID82313(64 5 5 32) )>

 <WfInst[Compiled: PERMUTE-NODE-T] : TID82326.state <= apply( TID82313(64 5 5 32) TID82326(64 32 5 5) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID82335.state <= apply( TID82335(64 32 5 5) TID82326(64 32 5 5) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID82362.state <= apply( TID82362(64 32 5 5) TID82335(64 32 5 5) )>

 <WfInst[Compiled: RESHAPETENSORNODE-T] : TID82332.state <= apply( TID82362(64 32 5 5) TID82332(64 800) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID82387.state <= apply( TID82387(64 800) TID82332(64 800) )>

 <WfInst[Compiled: FLEXIBLE-RANK-NODE-T] : TID81261.state <= apply( <Param>TID81261(10 800) )>

 <WfInst[Compiled: PERMUTE-NODE-T] : TID82375.state <= apply( TID81261(10 800) TID82375(800 10) )>

 <WfInst[Compiled: LAZYTRANSPOSENODE-T] : TID82375.state <= apply( TID82375(800 10) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID82395.state <= apply( TID82395(800 10) TID82375(800 10) )>

 <WfInst[Compiled: MATMULNODE-CPUTENSOR] : TID82384.state <= apply( TID82387(64 800) TID82395(800 10) TID82384(64 10) )>

 <WfInst[Compiled: <DELETED>] : TID82427.state <= apply( TID82427(64 10) TID82384(64 10) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID82443.state <= apply( TID82443(64 10) TID82427(64 10) )>

 <WfInst[Compiled: FLEXIBLE-RANK-NODE-T] : TID81265.state <= apply( <Param>TID81265(10) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID82435.state <= apply( TID82435(10) TID81265(10) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID82454.state <= apply( TID82454(10) TID82435(10) )>

 <WfInst[Compiled: RESHAPETENSORNODE-T] : TID82451.state <= apply( TID82454(10) TID82451(1 10) )>

 <WfInst[Compiled: VIEWTENSORNODE-T] : TID82464.state <= apply( TID82464(64 10) TID82451(1 10) )>

 <WfInst[Compiled: ADDNODE-CPUTENSOR] : TID82443.state <= apply( TID82443(64 10) TID82464(64 10) )>

 <WfInst[Compiled: SOFTMAX-CROSS-ENTROPY-NODE-T] : TID82480.state <= apply( TID82443(64 10) TID81303(64 10) )>

 <WfInst[Compiled: ADDNODE-CPUTENSOR] : TID82513.state <= apply( TID82513(64 10) TID82480(64 10) )>

 <WfInst[Compiled: VIEWTENSORNODE-T] : TID82543.state <= apply( TID82543(1 1) TID82513(64 10) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID82611.state <= apply( TID82611(1 1) TID82543(1 1) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID82639.state <= apply( TID82639(1 1) TID82611(1 1) )>

 <WfInst[Compiled: MOVESCALARTENSORNODE-SCALARTENSOR] : TID82553.state <= apply( TID82553(1) TID82548(1) )>

 <WfInst[Compiled: MOVESCALARTENSORNODE-SCALARTENSOR] : TID82565.state <= apply( TID82565(1) TID82553(1) )>

 <WfInst[Compiled: MOVESCALARTENSORNODE-SCALARTENSOR] : TID82559.state <= apply( TID82559(1) TID82550(1) )>

 <WfInst[Compiled: MOVESCALARTENSORNODE-SCALARTENSOR] : TID82571.state <= apply( TID82571(1) TID82559(1) )>

 <WfInst[Compiled: SCALARANDSCALARMUL-SCALARTENSOR] : TID82565.state <= apply( TID82565(1) TID82571(1) )>

 <WfInst[Compiled: MOVESCALARTENSORNODE-SCALARTENSOR] : TID82582.state <= apply( TID82582(1) TID82565(1) )>

 <WfInst[Compiled: MOVESCALARTENSORNODE-SCALARTENSOR] : TID82594.state <= apply( TID82594(1) TID82582(1) )>

 <WfInst[Compiled: MOVESCALARTENSORNODE-SCALARTENSOR] : TID82588.state <= apply( TID82588(1) TID82579(1) )>

 <WfInst[Compiled: MOVESCALARTENSORNODE-SCALARTENSOR] : TID82600.state <= apply( TID82600(1) TID82588(1) )>

 <WfInst[Compiled: SCALARANDSCALARMUL-SCALARTENSOR] : TID82594.state <= apply( TID82594(1) TID82600(1) )>

 <WfInst[Compiled: MOVESCALARTENSORNODE-SCALARTENSOR] : TID82631.state <= apply( TID82631(1) TID82594(1) )>

 <WfInst[Compiled: MOVESCALARTENSORNODE-SCALARTENSOR] : TID82656.state <= apply( TID82656(1) TID82631(1) )>

 <WfInst[Compiled: SCALARDIV-LISPTENSOR] : TID82639.state <= apply( TID82639(1 1) TID82656(1) )>
)

Example of the backward IR

(<WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID87615.state <= apply( TID87615(1 1) TID87612(1 1) )>

 <WfInst[Compiled: MOVESCALARTENSORNODE-SCALARTENSOR] : TID87573.state <= apply( TID87573(1) TID87536(1) )>

 <WfInst[Compiled: MOVESCALARTENSORNODE-SCALARTENSOR] : TID87623.state <= apply( TID87623(1) TID87573(1) )>

 <WfInst[Compiled: SCALARDIV-LISPTENSOR] : TID87615.state <= apply( TID87615(1 1) TID87623(1) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID87870.state <= apply( TID87870(1 1) TID87615(1 1) )>

 <WfInst[Compiled: VIEWTENSORNODE-T] : TID87806.state <= apply( TID87806(1 1) TID87804(1 1) )>

 <WfInst[Compiled: SCALARMUL-LISPTENSOR] : TID87735.state <= apply( TID87735(1 1) TID87737(1) )>

 <WfInst[Compiled: VIEWTENSORNODE-T] : TID87743.state <= apply( TID87743(1 1) TID87735(1 1) )>

 <WfInst[Compiled: VIEWTENSORNODE-T] : TID87700.state <= apply( TID87700(1 1) TID87698(1 1) )>

 <WfInst[Compiled: VIEWTENSORNODE-T] : TID87655.state <= apply( TID87655(1 1) TID87653(1 1) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID87553.state <= apply( TID87553(1 1) TID87485(1 1) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID87655.state <= apply( TID87655(1 1) TID87553(1 1) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID87635.state <= apply( TID87635(1 1) TID87612(1 1) )>

 <WfInst[Compiled: MOVESCALARTENSORNODE-SCALARTENSOR] : TID87643.state <= apply( TID87643(1) TID87632(1) )>

 <WfInst[Compiled: SCALARMUL-LISPTENSOR] : TID87635.state <= apply( TID87635(1 1) TID87643(1) )>

 <WfInst[Compiled: MULNODE-LISPTENSOR] : TID87655.state <= apply( TID87655(1 1) TID87635(1 1) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID87700.state <= apply( TID87700(1 1) TID87655(1 1) )>

 <WfInst[Compiled: MOVESCALARTENSORNODE-SCALARTENSOR] : TID87689.state <= apply( TID87689(1) TID87573(1) )>

 <WfInst[Compiled: SCALAR-SQUARENODE-SCALARTENSOR] : TID87689.state <= apply( TID87573(1) TID87689(1) )>

 <WfInst[Compiled: MOVESCALARTENSORNODE-SCALARTENSOR] : TID87720.state <= apply( TID87720(1) TID87689(1) )>

 <WfInst[Compiled: SCALARDIV-LISPTENSOR] : TID87700.state <= apply( TID87700(1 1) TID87720(1) )>

 <WfInst[Compiled: ADDNODE-CPUTENSOR] : TID87743.state <= apply( TID87743(1 1) TID87700(1 1) )>

 <WfInst[Compiled: VIEWTENSORNODE-T] : TID87762.state <= apply( TID87762(1 1) TID87743(1 1) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID87806.state <= apply( TID87806(1 1) TID87762(1 1) )>

 <WfInst[Compiled: MOVESCALARTENSORNODE-SCALARTENSOR] : TID87772.state <= apply( TID87772(1) TID87767(1) )>

 <WfInst[Compiled: MOVESCALARTENSORNODE-SCALARTENSOR] : TID87778.state <= apply( TID87778(1) TID87769(1) )>

 <WfInst[Compiled: SCALARANDSCALARMUL-SCALARTENSOR] : TID87772.state <= apply( TID87772(1) TID87778(1) )>

 <WfInst[Compiled: MOVESCALARTENSORNODE-SCALARTENSOR] : TID87789.state <= apply( TID87789(1) TID87772(1) )>

 <WfInst[Compiled: MOVESCALARTENSORNODE-SCALARTENSOR] : TID87795.state <= apply( TID87795(1) TID87786(1) )>

 <WfInst[Compiled: SCALARANDSCALARMUL-SCALARTENSOR] : TID87789.state <= apply( TID87789(1) TID87795(1) )>

 <WfInst[Compiled: MOVESCALARTENSORNODE-SCALARTENSOR] : TID87826.state <= apply( TID87826(1) TID87789(1) )>

 <WfInst[Compiled: SCALARDIV-LISPTENSOR] : TID87806.state <= apply( TID87806(1 1) TID87826(1) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID87844.state <= apply( TID87844(1 1) TID87806(1 1) )>

 <WfInst[Compiled: RESHAPETENSORNODE-T] : TID87841.state <= apply( TID87844(1 1) TID87841(1) )>

 <WfInst[Compiled: MAT->SCALARNODE-T] : TID87864.state <= apply( TID87841(1) TID87864(1) )>

 <WfInst[Compiled: MOVESCALARTENSORNODE-SCALARTENSOR] : TID87878.state <= apply( TID87878(1) TID87864(1) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID88199.state <= apply( TID88199(1 1) TID87870(1 1) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID88207.state <= apply( TID88207(1 1) TID88199(1 1) )>

 <WfInst[Compiled: VIEWTENSORNODE-T] : TID88214.state <= apply( TID88214(1 1) TID88207(1 1) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID87485.state <= apply( TID87485(1 1) TID88214(1 1) )>

 <WfInst[Compiled: VIEWTENSORNODE-T] : TID88233.state <= apply( TID88233(64 10) TID87485(1 1) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID88239.state <= apply( TID88239(64 10) TID88233(64 10) )>

 <WfInst[Compiled: SOFTMAX-CROSS-ENTROPY-NODE-BACKWARD-T] : TID88256.state <= apply( TID88239(64 10) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID88265.state <= apply( TID88265(64 10) TID88256(64 10) )>

 <WfInst[Compiled: INSTANTKERNELNODE-T] : TID88239.state <= apply( TID88239(64 10) )>

 <WfInst[Compiled: VIEWTENSORNODE-T] : TID88280.state <= apply( TID88280(64 10) TID88265(64 10) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID87406.state <= apply( TID87406(64 10) TID88280(64 10) )>

 <WfInst[Compiled: VIEWTENSORNODE-T] : TID88295.state <= apply( TID88295(1 10) TID87406(64 10) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID88301.state <= apply( TID88301(1 10) TID88295(1 10) )>

 <WfInst[Compiled: RESHAPETENSORNODE-T] : TID88309.state <= apply( TID88301(1 10) TID88309(10) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID88315.state <= apply( TID88315(10) TID88309(10) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID88323.state <= apply( TID88323(10) TID88315(10) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID88331.state <= apply( TID88331(10) TID88323(10) )>

 <WfInst[Compiled: FLEXIBLE-RANK-NODE-T] : TID88331.state <= apply( TID88331(10) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID88355.state <= apply( TID88355(64 10) TID88265(64 10) )>

 <WfInst[Compiled: LAZYTRANSPOSENODE-T] : TID87317.state <= apply( TID87317(800 10) )>

 <WfInst[Compiled: PERMUTE-NODE-T] : TID88362.state <= apply( TID87317(800 10) TID88362(10 800) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID88371.state <= apply( TID88371(10 800) TID88362(10 800) )>

 <WfInst[Compiled: MATMULNODE-CPUTENSOR] : TID88368.state <= apply( TID88355(64 10) TID88371(10 800) TID88368(64 800) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID88423.state <= apply( TID88423(64 800) TID88368(64 800) )>

 <WfInst[Compiled: RESHAPETENSORNODE-T] : TID87274.state <= apply( TID87304(64 32 5 5) TID87274(64 800) )>

 <WfInst[Compiled: PERMUTE-NODE-T] : TID88399.state <= apply( TID87274(64 800) TID88399(800 64) )>

 <WfInst[Compiled: LAZYTRANSPOSENODE-T] : TID88399.state <= apply( TID88399(800 64) )>

 <WfInst[Compiled: MATMULNODE-CPUTENSOR] : TID88408.state <= apply( TID88399(800 64) TID88355(64 10) TID88408(800 10) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID88431.state <= apply( TID88431(800 10) TID88408(800 10) )>

 <WfInst[Compiled: INSTANTKERNELNODE-T] : TID88448.state <= apply( TID88448(800 10) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID88454.state <= apply( TID88454(800 10) TID88448(800 10) )>

 <WfInst[Compiled: PERMUTE-NODE-T] : TID88470.state <= apply( TID88454(800 10) TID88470(10 800) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID88476.state <= apply( TID88476(10 800) TID88470(10 800) )>

 <WfInst[Compiled: FLEXIBLE-RANK-NODE-T] : TID88476.state <= apply( TID88476(10 800) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID88509.state <= apply( TID88509(64 800) TID88423(64 800) )>

 <WfInst[Compiled: RESHAPETENSORNODE-T] : TID88517.state <= apply( TID88509(64 800) TID88517(64 32 5 5) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID88523.state <= apply( TID88523(64 32 5 5) TID88517(64 32 5 5) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID88531.state <= apply( TID88531(64 32 5 5) TID88523(64 32 5 5) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID88539.state <= apply( TID88539(64 32 5 5) TID88531(64 32 5 5) )>

 <WfInst[Compiled: PERMUTE-NODE-T] : TID88546.state <= apply( TID88539(64 32 5 5) TID88546(64 5 5 32) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID88552.state <= apply( TID88552(64 5 5 32) TID88546(64 5 5 32) )>

 <WfInst[Compiled: RESHAPETENSORNODE-T] : TID88579.state <= apply( TID88552(64 5 5 32) TID88579(51200 1) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID88585.state <= apply( TID88585(51200 1) TID88579(51200 1) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID88593.state <= apply( TID88593(51200 1) TID88585(51200 1) )>

 <WfInst[Compiled: RESHAPETENSORNODE-T] : TID87215.state <= apply( TID87218(51200 4) TID87215(51200 4) )>

 <WfInst[Compiled: RESHAPETENSORNODE-T] : TID88601.state <= apply( TID87215(51200 4) TID88601(51200 4) )>

 <WfInst[Compiled: MAXVALUE-NODE-CPUTENSOR] : TID88607.state <= apply( TID88601(51200 4) TID88607(51200 1) )>

 <WfInst[Compiled: VIEWTENSORNODE-T] : TID88624.state <= apply( TID88624(51200 4) TID88607(51200 1) )>

 <WfInst[Compiled: COMPARE-OPERATION-NODE-LISPTENSOR] : TID88630.state <= apply( TID87215(51200 4) TID88624(51200 4) TID88630(51200 4) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID88657.state <= apply( TID88657(51200 4) TID88630(51200 4) )>

 <WfInst[Compiled: VIEWTENSORNODE-T] : TID88651.state <= apply( TID88651(51200 4) TID88593(51200 1) )>

 <WfInst[Compiled: MULNODE-LISPTENSOR] : TID88657.state <= apply( TID88657(51200 4) TID88651(51200 4) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID88679.state <= apply( TID88679(51200 4) TID88657(51200 4) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID88687.state <= apply( TID88687(51200 4) TID88679(51200 4) )>

 <WfInst[Compiled: RESHAPETENSORNODE-T] : TID88695.state <= apply( TID88687(51200 4) TID88695(51200 4) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID88701.state <= apply( TID88701(51200 4) TID88695(51200 4) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID88709.state <= apply( TID88709(51200 4) TID88701(51200 4) )>

 <WfInst[Compiled: RESHAPETENSORNODE-T] : TID88717.state <= apply( TID88709(51200 4) TID88717(1600 128) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID88723.state <= apply( TID88723(1600 128) TID88717(1600 128) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID88731.state <= apply( TID88731(1600 128) TID88723(1600 128) )>

 <WfInst[Compiled: RESHAPETENSORNODE-T] : TID88739.state <= apply( TID88731(1600 128) TID88739(64 5 5 32 2 2) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID88745.state <= apply( TID88745(64 5 5 32 2 2) TID88739(64 5 5 32 2 2) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID88753.state <= apply( TID88753(64 5 5 32 2 2) TID88745(64 5 5 32 2 2) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID88761.state <= apply( TID88761(64 5 5 32 2 2) TID88753(64 5 5 32 2 2) )>

 <WfInst[Compiled: PERMUTE-NODE-T] : TID88768.state <= apply( TID88761(64 5 5 32 2 2) TID88768(64 32 2 2 5 5) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID88774.state <= apply( TID88774(64 32 2 2 5 5) TID88768(64 32 2 2 5 5) )>

 <WfInst[Compiled: COL2IMNODE-T] : TID88811.state <= apply( TID88774(64 32 2 2 5 5) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID88814.state <= apply( TID88814(64 32 12 12) TID88811(64 32 12 12) )>

 <WfInst[Compiled: VIEWTENSORNODE-T] : TID88821.state <= apply( TID88821(64 32 12 12) TID88814(64 32 12 12) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID87128.state <= apply( TID87128(64 32 12 12) TID88821(64 32 12 12) )>

 <WfInst[Compiled: VIEWTENSORNODE-T] : TID88831.state <= apply( TID88831(64 32 11 11) TID87128(64 32 12 12) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID88837.state <= apply( TID88837(64 32 11 11) TID88831(64 32 11 11) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID88864.state <= apply( TID88864(64 32 11 11) TID88837(64 32 11 11) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID88872.state <= apply( TID88872(64 32 11 11) TID88864(64 32 11 11) )>

 <WfInst[Compiled: <DELETED>] : TID87019.state <= apply( TID87019(64 32 11 11) TID86965(64 32 11 11) )>

 <WfInst[Compiled: MULNODE-LISPTENSOR] : TID88872.state <= apply( TID88872(64 32 11 11) TID87019(64 32 11 11) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID88917.state <= apply( TID88917(64 32 11 11) TID88872(64 32 11 11) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID88885.state <= apply( TID88885(64 32 11 11) TID88864(64 32 11 11) )>

 <WfInst[Compiled: <DELETED>] : TID86992.state <= apply( TID86992(64 32 11 11) TID86959(64 32 11 11) )>

 <WfInst[Compiled: MULNODE-LISPTENSOR] : TID88885.state <= apply( TID88885(64 32 11 11) TID86992(64 32 11 11) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID88944.state <= apply( TID88944(64 32 11 11) TID88885(64 32 11 11) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID88952.state <= apply( TID88952(64 32 11 11) TID88944(64 32 11 11) )>

 <WfInst[Compiled: INSTANTKERNELNODE-T] : TID88960.state <= apply( TID88960(64 32 11 11) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID88966.state <= apply( TID88966(64 32 11 11) TID88960(64 32 11 11) )>

 <WfInst[Compiled: INSTANTKERNELNODE-T] : TID88993.state <= apply( TID88993(64 32 11 11) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID88999.state <= apply( TID88999(64 32 11 11) TID88993(64 32 11 11) )>

 <WfInst[Compiled: PERMUTE-NODE-T] : TID89025.state <= apply( TID88999(64 32 11 11) TID89025(64 11 11 32) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID89031.state <= apply( TID89031(64 11 11 32) TID89025(64 11 11 32) )>

 <WfInst[Compiled: RESHAPETENSORNODE-T] : TID89058.state <= apply( TID89031(64 11 11 32) TID89058(7744 32) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID89064.state <= apply( TID89064(7744 32) TID89058(7744 32) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID89072.state <= apply( TID89072(7744 32) TID89064(7744 32) )>

 <WfInst[Compiled: VIEWTENSORNODE-T] : TID89079.state <= apply( TID89079(7744 32) TID89072(7744 32) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID86930.state <= apply( TID86930(7744 32) TID89079(7744 32) )>

 <WfInst[Compiled: VIEWTENSORNODE-T] : TID89094.state <= apply( TID89094(1 32) TID86930(7744 32) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID89100.state <= apply( TID89100(1 32) TID89094(1 32) )>

 <WfInst[Compiled: RESHAPETENSORNODE-T] : TID89108.state <= apply( TID89100(1 32) TID89108(32) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID89114.state <= apply( TID89114(32) TID89108(32) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID89122.state <= apply( TID89122(32) TID89114(32) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID89130.state <= apply( TID89130(32) TID89122(32) )>

 <WfInst[Compiled: FLEXIBLE-RANK-NODE-T] : TID89130.state <= apply( TID89130(32) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID89154.state <= apply( TID89154(7744 32) TID89072(7744 32) )>

 <WfInst[Compiled: LAZYTRANSPOSENODE-T] : TID86841.state <= apply( TID86841(400 32) )>

 <WfInst[Compiled: PERMUTE-NODE-T] : TID89161.state <= apply( TID86841(400 32) TID89161(32 400) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID89170.state <= apply( TID89170(32 400) TID89161(32 400) )>

 <WfInst[Compiled: MATMULNODE-CPUTENSOR] : TID89167.state <= apply( TID89154(7744 32) TID89170(32 400) TID89167(7744 400) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID89222.state <= apply( TID89222(7744 400) TID89167(7744 400) )>

 <WfInst[Compiled: RESHAPETENSORNODE-T] : TID86777.state <= apply( TID86817(64 11 11 16 5 5) TID86777(7744 400) )>

 <WfInst[Compiled: PERMUTE-NODE-T] : TID89198.state <= apply( TID86777(7744 400) TID89198(400 7744) )>

 <WfInst[Compiled: LAZYTRANSPOSENODE-T] : TID89198.state <= apply( TID89198(400 7744) )>

 <WfInst[Compiled: MATMULNODE-CPUTENSOR] : TID89207.state <= apply( TID89198(400 7744) TID89154(7744 32) TID89207(400 32) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID89230.state <= apply( TID89230(400 32) TID89207(400 32) )>

 <WfInst[Compiled: INSTANTKERNELNODE-T] : TID89247.state <= apply( TID89247(400 32) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID89253.state <= apply( TID89253(400 32) TID89247(400 32) )>

 <WfInst[Compiled: PERMUTE-NODE-T] : TID89269.state <= apply( TID89253(400 32) TID89269(32 400) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID89275.state <= apply( TID89275(32 400) TID89269(32 400) )>

 <WfInst[Compiled: RESHAPETENSORNODE-T] : TID89292.state <= apply( TID89275(32 400) TID89292(32 16 5 5) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID89298.state <= apply( TID89298(32 16 5 5) TID89292(32 16 5 5) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID89311.state <= apply( TID89311(32 16 5 5) TID89298(32 16 5 5) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID89319.state <= apply( TID89319(7744 400) TID89222(7744 400) )>

 <WfInst[Compiled: RESHAPETENSORNODE-T] : TID89327.state <= apply( TID89319(7744 400) TID89327(64 11 11 16 5 5) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID89333.state <= apply( TID89333(64 11 11 16 5 5) TID89327(64 11 11 16 5 5) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID89341.state <= apply( TID89341(64 11 11 16 5 5) TID89333(64 11 11 16 5 5) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID89349.state <= apply( TID89349(64 11 11 16 5 5) TID89341(64 11 11 16 5 5) )>

 <WfInst[Compiled: PERMUTE-NODE-T] : TID89356.state <= apply( TID89349(64 11 11 16 5 5) TID89356(64 16 5 5 11 11) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID89362.state <= apply( TID89362(64 16 5 5 11 11) TID89356(64 16 5 5 11 11) )>

 <WfInst[Compiled: COL2IMNODE-T] : TID89399.state <= apply( TID89362(64 16 5 5 11 11) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID89402.state <= apply( TID89402(64 16 15 15) TID89399(64 16 15 15) )>

 <WfInst[Compiled: PERMUTE-NODE-T] : TID89428.state <= apply( TID89402(64 16 15 15) TID89428(64 15 15 16) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID89434.state <= apply( TID89434(64 15 15 16) TID89428(64 15 15 16) )>

 <WfInst[Compiled: RESHAPETENSORNODE-T] : TID89461.state <= apply( TID89434(64 15 15 16) TID89461(230400 1) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID89467.state <= apply( TID89467(230400 1) TID89461(230400 1) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID89475.state <= apply( TID89475(230400 1) TID89467(230400 1) )>

 <WfInst[Compiled: RESHAPETENSORNODE-T] : TID86702.state <= apply( TID86705(230400 4) TID86702(230400 4) )>

 <WfInst[Compiled: RESHAPETENSORNODE-T] : TID89483.state <= apply( TID86702(230400 4) TID89483(230400 4) )>

 <WfInst[Compiled: MAXVALUE-NODE-CPUTENSOR] : TID89489.state <= apply( TID89483(230400 4) TID89489(230400 1) )>

 <WfInst[Compiled: VIEWTENSORNODE-T] : TID89506.state <= apply( TID89506(230400 4) TID89489(230400 1) )>

 <WfInst[Compiled: COMPARE-OPERATION-NODE-LISPTENSOR] : TID89512.state <= apply( TID86702(230400 4) TID89506(230400 4) TID89512(230400 4) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID89539.state <= apply( TID89539(230400 4) TID89512(230400 4) )>

 <WfInst[Compiled: VIEWTENSORNODE-T] : TID89533.state <= apply( TID89533(230400 4) TID89475(230400 1) )>

 <WfInst[Compiled: MULNODE-LISPTENSOR] : TID89539.state <= apply( TID89539(230400 4) TID89533(230400 4) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID89561.state <= apply( TID89561(230400 4) TID89539(230400 4) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID89569.state <= apply( TID89569(230400 4) TID89561(230400 4) )>

 <WfInst[Compiled: RESHAPETENSORNODE-T] : TID89577.state <= apply( TID89569(230400 4) TID89577(230400 4) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID89583.state <= apply( TID89583(230400 4) TID89577(230400 4) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID89591.state <= apply( TID89591(230400 4) TID89583(230400 4) )>

 <WfInst[Compiled: RESHAPETENSORNODE-T] : TID89599.state <= apply( TID89591(230400 4) TID89599(14400 64) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID89605.state <= apply( TID89605(14400 64) TID89599(14400 64) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID89613.state <= apply( TID89613(14400 64) TID89605(14400 64) )>

 <WfInst[Compiled: RESHAPETENSORNODE-T] : TID89621.state <= apply( TID89613(14400 64) TID89621(64 15 15 16 2 2) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID89627.state <= apply( TID89627(64 15 15 16 2 2) TID89621(64 15 15 16 2 2) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID89635.state <= apply( TID89635(64 15 15 16 2 2) TID89627(64 15 15 16 2 2) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID89643.state <= apply( TID89643(64 15 15 16 2 2) TID89635(64 15 15 16 2 2) )>

 <WfInst[Compiled: PERMUTE-NODE-T] : TID89650.state <= apply( TID89643(64 15 15 16 2 2) TID89650(64 16 2 2 15 15) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID89656.state <= apply( TID89656(64 16 2 2 15 15) TID89650(64 16 2 2 15 15) )>

 <WfInst[Compiled: COL2IMNODE-T] : TID89693.state <= apply( TID89656(64 16 2 2 15 15) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID89696.state <= apply( TID89696(64 16 31 31) TID89693(64 16 31 31) )>

 <WfInst[Compiled: VIEWTENSORNODE-T] : TID89703.state <= apply( TID89703(64 16 31 31) TID89696(64 16 31 31) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID86615.state <= apply( TID86615(64 16 31 31) TID89703(64 16 31 31) )>

 <WfInst[Compiled: VIEWTENSORNODE-T] : TID89713.state <= apply( TID89713(64 16 30 30) TID86615(64 16 31 31) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID89719.state <= apply( TID89719(64 16 30 30) TID89713(64 16 30 30) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID89746.state <= apply( TID89746(64 16 30 30) TID89719(64 16 30 30) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID89754.state <= apply( TID89754(64 16 30 30) TID89746(64 16 30 30) )>

 <WfInst[Compiled: <DELETED>] : TID86506.state <= apply( TID86506(64 16 30 30) TID86452(64 16 30 30) )>

 <WfInst[Compiled: MULNODE-LISPTENSOR] : TID89754.state <= apply( TID89754(64 16 30 30) TID86506(64 16 30 30) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID89799.state <= apply( TID89799(64 16 30 30) TID89754(64 16 30 30) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID89767.state <= apply( TID89767(64 16 30 30) TID89746(64 16 30 30) )>

 <WfInst[Compiled: <DELETED>] : TID86479.state <= apply( TID86479(64 16 30 30) TID86446(64 16 30 30) )>

 <WfInst[Compiled: MULNODE-LISPTENSOR] : TID89767.state <= apply( TID89767(64 16 30 30) TID86479(64 16 30 30) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID89826.state <= apply( TID89826(64 16 30 30) TID89767(64 16 30 30) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID89834.state <= apply( TID89834(64 16 30 30) TID89826(64 16 30 30) )>

 <WfInst[Compiled: INSTANTKERNELNODE-T] : TID89842.state <= apply( TID89842(64 16 30 30) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID89848.state <= apply( TID89848(64 16 30 30) TID89842(64 16 30 30) )>

 <WfInst[Compiled: INSTANTKERNELNODE-T] : TID89875.state <= apply( TID89875(64 16 30 30) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID89881.state <= apply( TID89881(64 16 30 30) TID89875(64 16 30 30) )>

 <WfInst[Compiled: PERMUTE-NODE-T] : TID89907.state <= apply( TID89881(64 16 30 30) TID89907(64 30 30 16) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID89913.state <= apply( TID89913(64 30 30 16) TID89907(64 30 30 16) )>

 <WfInst[Compiled: RESHAPETENSORNODE-T] : TID89940.state <= apply( TID89913(64 30 30 16) TID89940(57600 16) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID89946.state <= apply( TID89946(57600 16) TID89940(57600 16) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID89954.state <= apply( TID89954(57600 16) TID89946(57600 16) )>

 <WfInst[Compiled: VIEWTENSORNODE-T] : TID89961.state <= apply( TID89961(57600 16) TID89954(57600 16) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID86417.state <= apply( TID86417(57600 16) TID89961(57600 16) )>

 <WfInst[Compiled: VIEWTENSORNODE-T] : TID89976.state <= apply( TID89976(1 16) TID86417(57600 16) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID89982.state <= apply( TID89982(1 16) TID89976(1 16) )>

 <WfInst[Compiled: RESHAPETENSORNODE-T] : TID89990.state <= apply( TID89982(1 16) TID89990(16) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID89996.state <= apply( TID89996(16) TID89990(16) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID90004.state <= apply( TID90004(16) TID89996(16) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID90012.state <= apply( TID90012(16) TID90004(16) )>

 <WfInst[Compiled: FLEXIBLE-RANK-NODE-T] : TID90012.state <= apply( TID90012(16) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID90036.state <= apply( TID90036(57600 16) TID89954(57600 16) )>

 <WfInst[Compiled: LAZYTRANSPOSENODE-T] : TID86328.state <= apply( TID86328(27 16) )>

 <WfInst[Compiled: PERMUTE-NODE-T] : TID90043.state <= apply( TID86328(27 16) TID90043(16 27) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID90052.state <= apply( TID90052(16 27) TID90043(16 27) )>

 <WfInst[Compiled: MATMULNODE-CPUTENSOR] : TID90049.state <= apply( TID90036(57600 16) TID90052(16 27) TID90049(57600 27) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID90104.state <= apply( TID90104(57600 27) TID90049(57600 27) )>

 <WfInst[Compiled: RESHAPETENSORNODE-T] : TID86264.state <= apply( TID86304(64 30 30 3 3 3) TID86264(57600 27) )>

 <WfInst[Compiled: PERMUTE-NODE-T] : TID90080.state <= apply( TID86264(57600 27) TID90080(27 57600) )>

 <WfInst[Compiled: LAZYTRANSPOSENODE-T] : TID90080.state <= apply( TID90080(27 57600) )>

 <WfInst[Compiled: MATMULNODE-CPUTENSOR] : TID90089.state <= apply( TID90080(27 57600) TID90036(57600 16) TID90089(27 16) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID90112.state <= apply( TID90112(27 16) TID90089(27 16) )>

 <WfInst[Compiled: INSTANTKERNELNODE-T] : TID90129.state <= apply( TID90129(27 16) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID90135.state <= apply( TID90135(27 16) TID90129(27 16) )>

 <WfInst[Compiled: PERMUTE-NODE-T] : TID90151.state <= apply( TID90135(27 16) TID90151(16 27) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID90157.state <= apply( TID90157(16 27) TID90151(16 27) )>

 <WfInst[Compiled: RESHAPETENSORNODE-T] : TID90174.state <= apply( TID90157(16 27) TID90174(16 3 3 3) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID90180.state <= apply( TID90180(16 3 3 3) TID90174(16 3 3 3) )>

 <WfInst[Compiled: MOVETENSORNODE-CPUTENSOR] : TID90193.state <= apply( TID90193(16 3 3 3) TID90180(16 3 3 3) )>

 <WfInst[Compiled: ADDNODE-CPUTENSOR] : TID86235.state <= apply( TID86235(16 3 3 3) TID90180(16 3 3 3) )>

 <WfInst[Compiled: ADDNODE-CPUTENSOR] : TID86239.state <= apply( TID86239(16) TID90012(16) )>

 <WfInst[Compiled: ADDNODE-CPUTENSOR] : TID86221.state <= apply( TID86221(32 16 5 5) TID89298(32 16 5 5) )>

 <WfInst[Compiled: ADDNODE-CPUTENSOR] : TID86225.state <= apply( TID86225(32) TID89130(32) )>

 <WfInst[Compiled: ADDNODE-CPUTENSOR] : TID86205.state <= apply( TID86205(10 800) TID88476(10 800) )>

 <WfInst[Compiled: ADDNODE-CPUTENSOR] : TID86209.state <= apply( TID86209(10) TID88331(10) )>
)


hikettei avatar hikettei commented on May 24, 2024

(If I feel like it, I'll turn this into slides, add them to the documentation, and translate them into English.)

Execution procedure

Data structures

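In cl-waffe2, a computation graph is built by combining multiple AbstractNodes. Occasionally we also define a Composite, a data structure that is a collection of AbstractNodes, but the smallest unit is the AbstractNode.

The most primitive AbstractNode is a node of the form: destination <- argument Tensor1, argument Tensor2, ... (expressed in the Subscript DSL), and its operation is expressed as an S-expression. In addition, since an AbstractNode is a CLOS class, it holds further information in its slots, such as the definition of backpropagation.

There are also forms derived from AbstractNode, such as define-static-node, but these merely wrap defnode and define-impl in the define-static-node macro. (The S-expression is the most general data structure, so the most primitive AbstractNode is also an S-expression.)

An operation is run by taking an S-expression that has been expanded to support multi-dimensional arrays, views, and multiple dtypes (achieved by the call-with-view function), compiling it dynamically into a lambda function, and then calling that function. To improve reuse, setting :cache-when-compiled=t skips the compilation the second time.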
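The compile-then-cache idea in the paragraph above can be sketched in plain Common Lisp. This is only an illustration under stated assumptions: compile-kernel, make-axpy-form, *kernel-cache* and *compile-count* are hypothetical names, not part of cl-waffe2, and the real expansion done by call-with-view is far more involved.

```lisp
;; Hypothetical sketch: none of these names belong to cl-waffe2.
(defvar *kernel-cache* (make-hash-table :test #'equal)
  "Maps a lambda S-expression to its compiled function.")

(defvar *compile-count* 0
  "Number of times COMPILE was actually invoked.")

(defun compile-kernel (form &key (cache t))
  "Compile FORM, a (lambda ...) S-expression, into a function.
When CACHE is true, an EQUAL form is compiled only once."
  (or (and cache (gethash form *kernel-cache*))
      (let ((fn (compile nil form)))
        (incf *compile-count*)
        (when cache
          (setf (gethash form *kernel-cache*) fn))
        fn)))

(defun make-axpy-form (alpha)
  "Build, as data, the S-expression of a kernel computing alpha*x + y."
  `(lambda (x y out)
     (dotimes (i (length out) out)
       (setf (aref out i) (+ (* ,alpha (aref x i)) (aref y i))))))

;; Usage: the kernel is generated at runtime, compiled, then called.
(funcall (compile-kernel (make-axpy-form 2))
         (vector 1 2 3) (vector 10 20 30) (make-array 3))
```

Keying the cache on the S-expression itself means that two structurally identical kernels share one compiled function, which is roughly what a cache-when-compiled style option would buy.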

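As a basic policy, high-level features are implemented by wrapping things in macros, and operations are represented as lambda functions. The cl-waffe2 VM therefore only has to worry about one thing: how to execute AbstractNodes, the lowest-level feature, in large numbers and as fast as possible.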

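Executing the IR

The computation graph obtained by calling forward/call on AbstractNodes is executed differently depending on which of the following two execution modes is used.

Interpreter mode (proceed)

The computation graph is executed as-is, in its tree structure. This is meant for cases where immediate execution is wanted, such as debugging in the REPL or preprocessing training data.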
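As a rough picture of what executing the tree as-is means, here is a minimal sketch of an eager tree walk. The tnode structure and eval-tree function are hypothetical and stand in for the real AbstractNode machinery; leaves are plain numbers instead of tensors.

```lisp
;; Hypothetical sketch of eager, tree-shaped evaluation.
(defstruct tnode
  fn     ; function applied to the evaluated children
  args)  ; list of child TNODEs or plain values (leaves)

(defun eval-tree (x)
  "Evaluate X eagerly: children first, then the node's own function."
  (if (tnode-p x)
      (apply (tnode-fn x) (mapcar #'eval-tree (tnode-args x)))
      x))

;; Usage: 1 + (2 * 3)
(eval-tree (make-tnode :fn #'+
                       :args (list 1 (make-tnode :fn #'* :args (list 2 3)))))
```

Nothing is sorted or optimized here, which is why this style suits interactive use more than repeated training steps.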

VM mode (build)

Execution follows these steps:

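(forward mode)
[Computation graph of AbstractNodes]
 ↓
[Topologically sort to resolve dependencies]
 ↓
[Convert into the one-dimensional cl-waffe2 IR]
 ↓
[Count the references to each variable and delete the MoveTensorNode at the last reference (in-place mutation)] ... (a)
 ↓
[Execute on the VM]
(reverse mode)
[Visit every node of the computation graph as of (a) and call the corresponding backward]
 ↓
[Write everything out as a computation graph]
 ↓
[Same steps as forward mode]
 ↓
[Embed the gradient adders at the end]
 ↓
[Execute on the VM]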
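The first forward-mode steps (topological sort, flattening into a one-dimensional instruction list, execution) can be sketched as follows. This is a simplified model, not the cl-waffe2 implementation: gnode, topological-sort, linearize and run-vm are hypothetical names, and values are plain numbers rather than tensors.

```lisp
;; Hypothetical sketch: graph -> topological order -> linear IR -> VM.
(defstruct (gnode (:constructor make-gnode (out fn &rest inputs)))
  out      ; symbol naming the result variable
  fn       ; function computing the result
  inputs)  ; list of GNODEs or symbols naming external inputs

(defun topological-sort (root)
  "Return the nodes reachable from ROOT, dependencies first."
  (let ((seen (make-hash-table :test #'eq))
        (order '()))
    (labels ((visit (n)
               (when (and (gnode-p n) (not (gethash n seen)))
                 (setf (gethash n seen) t)
                 (mapc #'visit (gnode-inputs n))
                 (push n order))))
      (visit root))
    (nreverse order)))

(defun linearize (root)
  "Flatten the graph into instructions of the form (out fn arg-name ...)."
  (mapcar (lambda (n)
            (list* (gnode-out n) (gnode-fn n)
                   (mapcar (lambda (in) (if (gnode-p in) (gnode-out in) in))
                           (gnode-inputs n))))
          (topological-sort root)))

(defun run-vm (ir env)
  "Execute IR in order. ENV is an alist of (name . value) for the inputs.
Returns the value written by the last instruction."
  (let ((vars (make-hash-table :test #'eq))
        (result nil))
    (loop for (name . value) in env
          do (setf (gethash name vars) value))
    (loop for (out fn . args) in ir
          do (setf result (apply fn (mapcar (lambda (a) (gethash a vars)) args))
                   (gethash out vars) result))
    result))

;; Usage: t1 <- x + y, t2 <- t1 * t1 (t1 is shared but emitted once).
(let* ((a (make-gnode 't1 #'+ 'x 'y))
       (b (make-gnode 't2 #'* a a)))
  (run-vm (linearize b) '((x . 1) (y . 2))))
```

Once the graph is a flat list, later passes such as the reference counting in step (a) become simple scans over that list.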

(Note: the cl-waffe2 IR has the form X <- f(a b c...), where f is a lambda function.)
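Step (a) of the forward pipeline, deleting a move when its source is never referenced again, might look roughly like the pass below. It is a simplified model under several assumptions: instructions are lists of the form (out op arg ...), a copy is written as (dst :move src), each name is the destination of at most one :move, and eliminate-moves with its :keep argument is a hypothetical name, not the cl-waffe2 API.

```lisp
;; Hypothetical sketch of move elimination on a linear IR.
(defun eliminate-moves (ir &key keep)
  "Drop (dst :move src) when SRC is dead afterwards, renaming DST to SRC.
Names listed in KEEP (for example model inputs) are never overwritten."
  (let ((alias '())    ; alist: deleted dst -> surviving src
        (result '()))
    (flet ((resolve (name) (or (cdr (assoc name alias)) name)))
      (loop for (inst . tail) on ir
            for (out op . args) = inst
            for real-args = (mapcar #'resolve args)
            for src = (first real-args)
            do (if (and (eq op :move)
                        (not (member src keep))
                        ;; SRC must not be read by any later instruction.
                        (notany (lambda (later)
                                  (member src (mapcar #'resolve (cddr later))))
                                tail))
                   (push (cons out src) alias)
                   (push (list* (resolve out) op real-args) result))))
    (nreverse result)))

;; Usage: x is an input and must survive; t1 is dead after the second move.
(eliminate-moves '((t1 :move x)
                   (t1 :relu t1)
                   (t2 :move t1)
                   (t2 :exp t2)
                   (t3 :add t2 y))
                 :keep '(x y))
```

In this example the first move stays because x is protected, while the second disappears and the following instructions write directly into t1.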

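(Currently) there are two speed-ups that are not done yet: FuseOps and multithreading. With some work on the data structure of call-with-view, both could be done automatically without going through a JIT, but since I want AVX512 instructions anyway, I decided to handle them with the Extensible JIT. (Parallelization only needs a small rewrite of call-with-view. Operator fusion will wrap existing features in some macro, without any breaking changes.)

With this, compiling the CNN takes about 0.5 s. With serious effort it could probably get as fast as Petalisp, but honestly I am not very interested in compile time.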

Extensible JIT

(I'm thinking of changing the spec, but nothing is decided yet.)

References

http://www.utkuevci.com/ml/autograd/


hikettei avatar hikettei commented on May 24, 2024

Plan going forward:
- Confirm that Forward/Backward pass all tests and that MLP/CNN train correctly
- Rewrite JITCPUTensor so that it compiles from the cl-waffe2 IR
- Fix the GitHub Actions workflow
-> Merge into develop/master step by step


hikettei avatar hikettei commented on May 24, 2024

Successfully Merged at #75
