Comments (2)
As promised, I've started experimenting with some strategies. It is still very much WIP, but you can follow it here: AllocationKit
So far I've tried plain malloc/free and a finalizer version, which is less prone to memory leaks; initial benchmarks on my laptop give the timings below. The test case is the application of a single-site update, as encountered for example in DMRG or VUMPS, for an Ising-like Hamiltonian, repeated 30 times (benchmark here).
In summary, even this simple change already gives a noticeable improvement, with the size of the gain depending on the tensor dimensions: in the runs below the median time drops by roughly 30% for both D = 64 and D = 128, and the memory estimate roughly halves. I might continue playing around this week with something like Bumper.jl, although I ran into some issues trying to get the StrideArray/PtrArray to work nicely with Strided.jl, so I will need to fix that first.
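For readers who want a concrete picture of the two strategies mentioned above, here is a minimal sketch using only Base Julia. The names `malloc_array`, `free_array!` and `safemalloc_array` are hypothetical and chosen for illustration; this is not the AllocationKit code or the TensorOperations allocation interface, just one way such allocators could look.

```julia
# Hypothetical helpers for illustration only; not the AllocationKit API.

# Strategy 1: plain malloc/free. The caller is responsible for freeing.
function malloc_array(::Type{T}, dims::Dims{N}) where {T,N}
    ptr = convert(Ptr{T}, Libc.malloc(prod(dims) * sizeof(T)))
    ptr == C_NULL && throw(OutOfMemoryError())
    # own = false: Julia's GC will not free this buffer on its own.
    return unsafe_wrap(Array, ptr, dims; own = false)
end

# Must be called exactly once per array created by `malloc_array`.
free_array!(A::Array) = Libc.free(pointer(A))

# Strategy 2: same allocation, but a finalizer frees the buffer once the
# array becomes unreachable, so a forgotten free no longer leaks memory.
# Do not call `free_array!` on these arrays (that would be a double free).
function safemalloc_array(::Type{T}, dims::Dims{N}) where {T,N}
    A = malloc_array(T, dims)
    finalizer(a -> Libc.free(pointer(a)), A)
    return A
end

# Usage: a temporary buffer that is freed by hand ...
tmp = malloc_array(Float64, (4, 4))
fill!(tmp, 2.0)
total = sum(tmp)
free_array!(tmp)

# ... and one that is cleaned up by the finalizer.
buf = safemalloc_array(Float64, (3, 3))
fill!(buf, 1.0)
```

The trade-off is the usual one: the manual version releases memory immediately but leaks if a free is missed, while the finalizer version only releases memory when the GC gets around to collecting the wrapper.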
D = 64:
julia> result["default"][D]
BenchmarkTools.Trial: 500 samples with 30 evaluations.
Range (min … max): 400.062 μs … 715.187 μs ┊ GC (min … max): 13.62% … 35.15%
Time (median): 509.756 μs ┊ GC (median): 11.73%
Time (mean ± σ): 503.847 μs ± 51.078 μs ┊ GC (mean ± σ): 13.61% ± 3.73%
▁ ▂▂█▅▁▁ ▁
▃▂▄▄█▄▇▄▇▆▅▅▅▄▃▃▃▄▄▄▄▄▅▄▄▄▄▆██████▆▅█▇█▇▇█▅▆▅▅▅▄▄▁▃▃▄▂▃▃▁▃▃▃▃ ▄
400 μs Histogram: frequency by time 617 μs <
Memory estimate: 1.63 MiB, allocs estimate: 38.
julia> result["malloc"][D]
BenchmarkTools.Trial: 500 samples with 30 evaluations.
Range (min … max): 271.277 μs … 727.405 μs ┊ GC (min … max): 0.00% … 37.25%
Time (median): 345.764 μs ┊ GC (median): 13.97%
Time (mean ± σ): 360.340 μs ± 61.898 μs ┊ GC (mean ± σ): 8.75% ± 6.93%
▁▄▅ ▃▄▆▃█▅ ▁ ▂▁ ▃
▃████▄▆▆█▆▅▆▄▄▇████████▇████▆▅▅▅▆▃▅▄▃▃▄██▇▅▄▄▁▃▁▁▁▁▁▁▁▁▄█▇█▇▃ ▅
271 μs Histogram: frequency by time 493 μs <
Memory estimate: 899.11 KiB, allocs estimate: 39.
julia> result["safemalloc"][D]
BenchmarkTools.Trial: 500 samples with 30 evaluations.
Range (min … max): 276.067 μs … 568.403 μs ┊ GC (min … max): 0.00% … 37.13%
Time (median): 362.419 μs ┊ GC (median): 13.79%
Time (mean ± σ): 365.961 μs ± 39.841 μs ┊ GC (mean ± σ): 8.66% ± 6.82%
▃▃▃▇▅▁ ▁ ▁▇█▇▃▃
▃▅▄▄▆▄▄▂▃▃▂▁▅▃▃▂▃▃▅▆██████▇█▅▇▄▆▅▄▃▄▃▄▂▆████████▅▄▄▃▃▄▂▁▁▁▁▁▂ ▄
276 μs Histogram: frequency by time 448 μs <
Memory estimate: 899.11 KiB, allocs estimate: 39.
D = 128:
julia> result["default"][D]
BenchmarkTools.Trial: 500 samples with 30 evaluations.
Range (min … max): 1.929 ms … 3.871 ms ┊ GC (min … max): 11.19% … 6.77%
Time (median): 2.403 ms ┊ GC (median): 10.51%
Time (mean ± σ): 2.416 ms ± 271.936 μs ┊ GC (mean ± σ): 10.48% ± 1.48%
▁ ▃▄▇█▅
▄▅▅▅▄▄▄▃▄▃▃▃▃▃▄▆███████▇▆▅▅▄▄▅▅▃▃▂▃▂▃▃▂▁▃▃▃▂▁▃▃▃▃▃▂▃▂▂▃▁▂▁▃ ▃
1.93 ms Histogram: frequency by time 3.29 ms <
Memory estimate: 6.50 MiB, allocs estimate: 38.
julia> result["malloc"][D]
BenchmarkTools.Trial: 500 samples with 30 evaluations.
Range (min … max): 1.625 ms … 3.481 ms ┊ GC (min … max): 5.90% … 3.47%
Time (median): 1.734 ms ┊ GC (median): 6.02%
Time (mean ± σ): 1.886 ms ± 299.257 μs ┊ GC (mean ± σ): 6.39% ± 1.41%
▂█▃▇▁
▅█████▆▄▃▄▅▄▄▃▃▄▃▃▃▃▃▂▃▂▂▃▃▃▃▃▃▃▃▂▃▃▃▃▃▃▄▂▂▂▂▂▂▁▃▂▃▂▂▂▂▁▁▁▂ ▃
1.62 ms Histogram: frequency by time 2.85 ms <
Memory estimate: 3.50 MiB, allocs estimate: 39.
julia> result["safemalloc"][D]
BenchmarkTools.Trial: 500 samples with 30 evaluations.
Range (min … max): 1.625 ms … 2.556 ms ┊ GC (min … max): 6.02% … 7.28%
Time (median): 1.716 ms ┊ GC (median): 6.02%
Time (mean ± σ): 1.761 ms ± 134.372 μs ┊ GC (mean ± σ): 6.81% ± 1.37%
▂▄█▄▄▃▆▆
▃▄█████████▇▇▅▄▄▃▃▃▃▃▃▄▃▁▂▃▃▃▃▃▁▂▃▃▃▃▃▃▂▂▃▂▂▃▂▁▂▃▂▁▂▂▁▂▂▁▂▃ ▃
1.63 ms Histogram: frequency by time 2.26 ms <
Memory estimate: 3.50 MiB, allocs estimate: 39.
from tensoroperations.jl.
Nice! And cool that this is very simple to implement with the tensoralloc scheme.