Regarding the discussion in <a class="issue-link js-issue-link" data-error-text="Faile

Nice! And cool that this is very simple to implement with the <code class="notranslate

Manual allocation strategy about tensoroperations.jl HOT 2 CLOSED

amilsted commented on September 26, 2024

Manual allocation strategy

from tensoroperations.jl.

Comments (2)

lkdvos commented on September 26, 2024 2

As promised, I've started experimenting with some strategies, it is still very much WIP but you can follow it here: AllocationKit

So far I've tried just plain malloc/free, and the finalizer version which is less prone to memory leaks, and initial benchmarks on my laptop report the underlying times. The testcase is the application of a single-site update encountered for example in DMRG or VUMPS for an Ising-like hamiltonian, repeated 30 times. (benchmark here )

In summary, it seems as if just doing this already leads to various improvements, depending on the tensor sizes. I might continue playing around this week with something like Bumper.jl, allthough I ran into some issues trying to get the StrideArray/PtrArray to work nicely with Strided.jl, so I will need to fix that first.

D = 64:

julia> result["default"][D]
BenchmarkTools.Trial: 500 samples with 30 evaluations.
 Range (min … max):  400.062 μs … 715.187 μs  ┊ GC (min … max): 13.62% … 35.15%
 Time  (median):     509.756 μs               ┊ GC (median):    11.73%
 Time  (mean ± σ):   503.847 μs ±  51.078 μs  ┊ GC (mean ± σ):  13.61% ±  3.73%

      ▁                       ▂▂█▅▁▁       ▁                     
  ▃▂▄▄█▄▇▄▇▆▅▅▅▄▃▃▃▄▄▄▄▄▅▄▄▄▄▆██████▆▅█▇█▇▇█▅▆▅▅▅▄▄▁▃▃▄▂▃▃▁▃▃▃▃ ▄
  400 μs           Histogram: frequency by time          617 μs <

 Memory estimate: 1.63 MiB, allocs estimate: 38.

julia> result["malloc"][D]
BenchmarkTools.Trial: 500 samples with 30 evaluations.
 Range (min … max):  271.277 μs … 727.405 μs  ┊ GC (min … max):  0.00% … 37.25%
 Time  (median):     345.764 μs               ┊ GC (median):    13.97%
 Time  (mean ± σ):   360.340 μs ±  61.898 μs  ┊ GC (mean ± σ):   8.75% ±  6.93%

    ▁▄▅          ▃▄▆▃█▅ ▁ ▂▁                                ▃    
  ▃████▄▆▆█▆▅▆▄▄▇████████▇████▆▅▅▅▆▃▅▄▃▃▄██▇▅▄▄▁▃▁▁▁▁▁▁▁▁▄█▇█▇▃ ▅
  271 μs           Histogram: frequency by time          493 μs <

 Memory estimate: 899.11 KiB, allocs estimate: 39.

julia> result["safemalloc"][D]
BenchmarkTools.Trial: 500 samples with 30 evaluations.
 Range (min … max):  276.067 μs … 568.403 μs  ┊ GC (min … max):  0.00% … 37.13%
 Time  (median):     362.419 μs               ┊ GC (median):    13.79%
 Time  (mean ± σ):   365.961 μs ±  39.841 μs  ┊ GC (mean ± σ):   8.66% ±  6.82%

                      ▃▃▃▇▅▁ ▁             ▁▇█▇▃▃                
  ▃▅▄▄▆▄▄▂▃▃▂▁▅▃▃▂▃▃▅▆██████▇█▅▇▄▆▅▄▃▄▃▄▂▆████████▅▄▄▃▃▄▂▁▁▁▁▁▂ ▄
  276 μs           Histogram: frequency by time          448 μs <

 Memory estimate: 899.11 KiB, allocs estimate: 39.

D = 128

julia> result["default"][D]
BenchmarkTools.Trial: 500 samples with 30 evaluations.
 Range (min … max):  1.929 ms …   3.871 ms  ┊ GC (min … max): 11.19% … 6.77%
 Time  (median):     2.403 ms               ┊ GC (median):    10.51%
 Time  (mean ± σ):   2.416 ms ± 271.936 μs  ┊ GC (mean ± σ):  10.48% ± 1.48%

                  ▁ ▃▄▇█▅                                      
  ▄▅▅▅▄▄▄▃▄▃▃▃▃▃▄▆███████▇▆▅▅▄▄▅▅▃▃▂▃▂▃▃▂▁▃▃▃▂▁▃▃▃▃▃▂▃▂▂▃▁▂▁▃ ▃
  1.93 ms         Histogram: frequency by time        3.29 ms <

 Memory estimate: 6.50 MiB, allocs estimate: 38.

julia> result["malloc"][D]
BenchmarkTools.Trial: 500 samples with 30 evaluations.
 Range (min … max):  1.625 ms …   3.481 ms  ┊ GC (min … max): 5.90% … 3.47%
 Time  (median):     1.734 ms               ┊ GC (median):    6.02%
 Time  (mean ± σ):   1.886 ms ± 299.257 μs  ┊ GC (mean ± σ):  6.39% ± 1.41%

   ▂█▃▇▁                                                       
  ▅█████▆▄▃▄▅▄▄▃▃▄▃▃▃▃▃▂▃▂▂▃▃▃▃▃▃▃▃▂▃▃▃▃▃▃▄▂▂▂▂▂▂▁▃▂▃▂▂▂▂▁▁▁▂ ▃
  1.62 ms         Histogram: frequency by time        2.85 ms <

 Memory estimate: 3.50 MiB, allocs estimate: 39.

julia> result["safemalloc"][D]
BenchmarkTools.Trial: 500 samples with 30 evaluations.
 Range (min … max):  1.625 ms …   2.556 ms  ┊ GC (min … max): 6.02% … 7.28%
 Time  (median):     1.716 ms               ┊ GC (median):    6.02%
 Time  (mean ± σ):   1.761 ms ± 134.372 μs  ┊ GC (mean ± σ):  6.81% ± 1.37%

    ▂▄█▄▄▃▆▆                                                   
  ▃▄█████████▇▇▅▄▄▃▃▃▃▃▃▄▃▁▂▃▃▃▃▃▁▂▃▃▃▃▃▃▂▂▃▂▂▃▂▁▂▃▂▁▂▂▁▂▂▁▂▃ ▃
  1.63 ms         Histogram: frequency by time        2.26 ms <

 Memory estimate: 3.50 MiB, allocs estimate: 39.

from tensoroperations.jl.

amilsted commented on September 26, 2024

Nice! And cool that this is very simple to implement with the tensoralloc scheme.

from tensoroperations.jl.

Manual allocation strategy about tensoroperations.jl HOT 2 CLOSED

Comments (2)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent