
Comments (2)

lkdvos commented on September 26, 2024

As promised, I've started experimenting with some strategies. It is still very much WIP, but you can follow it here: AllocationKit

So far I've tried plain malloc/free and a finalizer-based version, which is less prone to memory leaks; initial benchmarks on my laptop give the timings below. The test case is the application of a single-site update, as encountered for example in DMRG or VUMPS for an Ising-like Hamiltonian, repeated 30 times (benchmark here).
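For readers unfamiliar with the two strategies, here is a minimal sketch of what "plain malloc/free" and a "finalizer version" could look like in Julia. The names `malloc_array`, `free_array` and `safemalloc_array` are hypothetical and chosen for illustration only; this is not the AllocationKit code, just one way the idea can be written with `Libc.malloc`, `unsafe_wrap` and `finalizer` from Base.

```julia
# Hypothetical helpers for illustration; not the AllocationKit API.

# Plain malloc: memory lives outside the GC heap, so the caller is
# responsible for calling `free_array` exactly once when done.
function malloc_array(::Type{T}, dims::Int...) where {T}
    nbytes = sizeof(T) * prod(dims)
    ptr = Ptr{T}(Libc.malloc(nbytes))
    (ptr == C_NULL && nbytes > 0) && throw(OutOfMemoryError())
    # own=false: Julia will NOT free this memory by itself.
    return unsafe_wrap(Array, ptr, dims; own=false)
end

# Manually release memory obtained from `malloc_array`.
free_array(A::Array) = Libc.free(pointer(A))

# Finalizer variant: the GC calls `free_array` once the array becomes
# unreachable, so a forgotten manual free no longer leaks memory.
# Do not also call `free_array` by hand on such an array (double free).
function safemalloc_array(::Type{T}, dims::Int...) where {T}
    A = malloc_array(T, dims...)
    finalizer(free_array, A)
    return A
end

# Usage: a manually managed temporary and a finalizer-managed one.
tmp = malloc_array(Float64, 64, 64)
fill!(tmp, 1.0)
total = sum(tmp)
free_array(tmp)   # `tmp` must not be used after this point

safe = safemalloc_array(Float64, 64, 64)
fill!(safe, 2.0)
```

The trade-off in this sketch is that the first variant frees memory immediately but leaks if an error is thrown before `free_array` runs, while the second defers the free to whenever the GC finalizes the wrapper.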

In summary, it seems that just doing this already leads to improvements, depending on the tensor sizes: in the runs below, median times drop by roughly 30% and the memory estimate is about halved for both D = 64 and D = 128. I might continue playing around this week with something like Bumper.jl, although I ran into some issues getting StrideArray/PtrArray to work nicely with Strided.jl, so I will need to fix that first.

D = 64:

julia> result["default"][D]
BenchmarkTools.Trial: 500 samples with 30 evaluations.
 Range (min … max):  400.062 μs … 715.187 μs  ┊ GC (min … max): 13.62% … 35.15%
 Time  (median):     509.756 μs               ┊ GC (median):    11.73%
 Time  (mean ± σ):   503.847 μs ±  51.078 μs  ┊ GC (mean ± σ):  13.61% ±  3.73%

      ▁                       ▂▂█▅▁▁       ▁                     
  ▃▂▄▄█▄▇▄▇▆▅▅▅▄▃▃▃▄▄▄▄▄▅▄▄▄▄▆██████▆▅█▇█▇▇█▅▆▅▅▅▄▄▁▃▃▄▂▃▃▁▃▃▃▃ ▄
  400 μs           Histogram: frequency by time          617 μs <

 Memory estimate: 1.63 MiB, allocs estimate: 38.

julia> result["malloc"][D]
BenchmarkTools.Trial: 500 samples with 30 evaluations.
 Range (min … max):  271.277 μs … 727.405 μs  ┊ GC (min … max):  0.00% … 37.25%
 Time  (median):     345.764 μs               ┊ GC (median):    13.97%
 Time  (mean ± σ):   360.340 μs ±  61.898 μs  ┊ GC (mean ± σ):   8.75% ±  6.93%

    ▁▄▅          ▃▄▆▃█▅ ▁ ▂▁                                ▃    
  ▃████▄▆▆█▆▅▆▄▄▇████████▇████▆▅▅▅▆▃▅▄▃▃▄██▇▅▄▄▁▃▁▁▁▁▁▁▁▁▄█▇█▇▃ ▅
  271 μs           Histogram: frequency by time          493 μs <

 Memory estimate: 899.11 KiB, allocs estimate: 39.

julia> result["safemalloc"][D]
BenchmarkTools.Trial: 500 samples with 30 evaluations.
 Range (min … max):  276.067 μs … 568.403 μs  ┊ GC (min … max):  0.00% … 37.13%
 Time  (median):     362.419 μs               ┊ GC (median):    13.79%
 Time  (mean ± σ):   365.961 μs ±  39.841 μs  ┊ GC (mean ± σ):   8.66% ±  6.82%

                      ▃▃▃▇▅▁ ▁             ▁▇█▇▃▃                
  ▃▅▄▄▆▄▄▂▃▃▂▁▅▃▃▂▃▃▅▆██████▇█▅▇▄▆▅▄▃▄▃▄▂▆████████▅▄▄▃▃▄▂▁▁▁▁▁▂ ▄
  276 μs           Histogram: frequency by time          448 μs <

 Memory estimate: 899.11 KiB, allocs estimate: 39.

D = 128:

julia> result["default"][D]
BenchmarkTools.Trial: 500 samples with 30 evaluations.
 Range (min … max):  1.929 ms …   3.871 ms  ┊ GC (min … max): 11.19% … 6.77%
 Time  (median):     2.403 ms               ┊ GC (median):    10.51%
 Time  (mean ± σ):   2.416 ms ± 271.936 μs  ┊ GC (mean ± σ):  10.48% ± 1.48%

                  ▁ ▃▄▇█▅                                      
  ▄▅▅▅▄▄▄▃▄▃▃▃▃▃▄▆███████▇▆▅▅▄▄▅▅▃▃▂▃▂▃▃▂▁▃▃▃▂▁▃▃▃▃▃▂▃▂▂▃▁▂▁▃ ▃
  1.93 ms         Histogram: frequency by time        3.29 ms <

 Memory estimate: 6.50 MiB, allocs estimate: 38.

julia> result["malloc"][D]
BenchmarkTools.Trial: 500 samples with 30 evaluations.
 Range (min … max):  1.625 ms …   3.481 ms  ┊ GC (min … max): 5.90% … 3.47%
 Time  (median):     1.734 ms               ┊ GC (median):    6.02%
 Time  (mean ± σ):   1.886 ms ± 299.257 μs  ┊ GC (mean ± σ):  6.39% ± 1.41%

   ▂█▃▇▁                                                       
  ▅█████▆▄▃▄▅▄▄▃▃▄▃▃▃▃▃▂▃▂▂▃▃▃▃▃▃▃▃▂▃▃▃▃▃▃▄▂▂▂▂▂▂▁▃▂▃▂▂▂▂▁▁▁▂ ▃
  1.62 ms         Histogram: frequency by time        2.85 ms <

 Memory estimate: 3.50 MiB, allocs estimate: 39.

julia> result["safemalloc"][D]
BenchmarkTools.Trial: 500 samples with 30 evaluations.
 Range (min … max):  1.625 ms …   2.556 ms  ┊ GC (min … max): 6.02% … 7.28%
 Time  (median):     1.716 ms               ┊ GC (median):    6.02%
 Time  (mean ± σ):   1.761 ms ± 134.372 μs  ┊ GC (mean ± σ):  6.81% ± 1.37%

    ▂▄█▄▄▃▆▆                                                   
  ▃▄█████████▇▇▅▄▄▃▃▃▃▃▃▄▃▁▂▃▃▃▃▃▁▂▃▃▃▃▃▃▂▂▃▂▂▃▂▁▂▃▂▁▂▂▁▂▂▁▂▃ ▃
  1.63 ms         Histogram: frequency by time        2.26 ms <

 Memory estimate: 3.50 MiB, allocs estimate: 39.

from tensoroperations.jl.

amilsted commented on September 26, 2024

Nice! And cool that this is very simple to implement with the tensoralloc scheme.

