Comments (8)
Yeah, the small number of evaluations was just during testing, as I thought it might have fallen back to the CPU. Usually I use much, much bigger numbers...
MonteCarlo is also not convincing. It is also very slow (much slower than the CPU) and even threw some weird exceptions at times...
As I have to pay for the TPU usage, I might not test too much. I am using a V100 NVIDIA card at the moment, which is quite fine...
Thanks for your support!
from torchquad.
Good tip, but it does not help :(
Hi @dsmic !
Thanks for posting this.
On a practical level, my first suspicion would be this:
torch.set_default_device(dev)
set_up_backend("torch", data_type="float64", torch_enable_cuda=False)
set_up_backend probably calls this code, leading to a call of
torch.set_default_tensor_type("float64")
which may not be correct for TPU? 🤔
If that is not it, just to be sure: are you certain the problem is within torchquad? Could you try a different torch / torch_xla version to check whether you get more verbose feedback there?
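A quick way to verify the suspicion above (a hypothetical sanity check on my side, not something torchquad provides): create a fresh tensor right after the setup calls and inspect where the defaults actually put it.

```python
import torch

# After torch.set_default_device(...) and set_up_backend(...), a newly
# created tensor reveals which device and dtype are really the defaults.
t = torch.empty(3)
print(t.device, t.dtype)  # 'cpu' here would mean the XLA device was reset
```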
Thx for the response. I did some digging, and it seems it is just awfully slow, taking 20 seconds to prepare the next call to my function:
counter = 0
def Norm(wf, x):
    global counter
    print('deb', x.device)
    res = (torch.conj(wf(x)) * wf(x)).real
    print(counter, 'res', res)
    counter += 1
    return res
My function returns almost immediately :(
So I am not sure what is so expensive on the TPU...
Can you check which device your tensors are on? I suspect you are using the CPU and not the TPU, because torch.set_default_tensor_type("float64") makes the CPU the default device. I am not quite sure what default tensor type should be used with TPU/XLA. You could try not setting up the backend at all, but I am not sure that works 🤔 Alternatively, try moving your torch.set_default_device(dev) call after the set_up_backend call?
If neither works, we might need a dedicated backend type for TPUs. Not sure if we ever tried them before.
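If reordering helps, the setup might look like this (a sketch, assuming torch_xla's xm.xla_device() is how dev is obtained; set_up_backend as used earlier in this thread):

```python
import torch
import torch_xla.core.xla_model as xm
from torchquad import set_up_backend

# Let torchquad configure torch first, then set the XLA device as default,
# so set_up_backend cannot silently reset the default back to the CPU.
set_up_backend("torch", data_type="float64", torch_enable_cuda=False)
dev = xm.xla_device()
torch.set_default_device(dev)
```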
Yes, the tensors are on the device (my debug print shows the device). I increased the log level, and the time seems to be spent within torchquad:
13:57:19|TQ-INFO| Setting Torch's default tensor type to Float64 (CUDA not initialized).
13:57:19|TQ-DEBUG| Setting LogLevel to TRACE
13:57:19|TQ-DEBUG| Checking inputs to Integrator.
13:57:19|TQ-DEBUG|
VEGAS integrating a 6-dimensional fn with 10000 points over [[-1.2999999523162842, -1.2899999523162842], [tensor(-1.3000, device='xla:1'), tensor(1.3000, device='xla:1')], [tensor(-1.3000, device='xla:1'), tensor(1.3000, device='xla:1')], [-0.1, 0.1], [-0.1, 0.1], [-0.1, 0.1]]
13:57:19|TQ-DEBUG| Setting up integration domain.
13:57:19|TQ-DEBUG| Starting VEGAS
13:57:19|TQ-DEBUG| Running Map Warmup with warmup_N_it=5, N_samples=80...
13:57:19|TQ-DEBUG| | Iter | N_Eval | Result | Error | Acc | Total Evals
tensor([1., 1., 3., 1., 1.], device='xla:1')
xla:1
(0, 1, 2) 0
(0, 2, 1) 1
(1, 0, 2) 1
(1, 2, 0) 0
(2, 0, 1) 0
(2, 1, 0) 1
[[0 1 2 3 4 5]
[0 2 1 3 4 5]
[1 0 2 3 4 5]
[1 2 0 3 4 5]
[2 0 1 3 4 5]
[2 1 0 3 4 5]]
(6, 6)
deb xla:1
0 res xla:1
13:57:19|TQ-DEBUG| The integrand was not evaluated in 28 of 240 VEGASMap intervals. Filling the weights for some of them with neighbouring values.
13:57:20|TQ-DEBUG| remaining intervals: 1
13:57:20|TQ-DEBUG| remaining intervals: 0
13:57:35|TQ-DEBUG| | 0| 80| 4.000101e-05 | 3.852720e-11 | 1.551718e-01%| 80
deb xla:1
1 res xla:1
13:57:35|TQ-DEBUG| The integrand was not evaluated in 22 of 240 VEGASMap intervals. Filling the weights for some of them with neighbouring values.
13:57:35|TQ-DEBUG| remaining intervals: 0
13:58:12|TQ-DEBUG| | 1| 80| 4.332120e-05 | 3.020238e-11 | 1.268587e-01%| 160
deb xla:1
2 res xla:1
13:58:12|TQ-DEBUG| The integrand was not evaluated in 34 of 240 VEGASMap intervals. Filling the weights for some of them with neighbouring values.
13:58:13|TQ-DEBUG| remaining intervals: 1
13:58:13|TQ-DEBUG| remaining intervals: 0
13:59:12|TQ-DEBUG| | 2| 80| 4.083231e-05 | 1.591607e-11 | 9.770439e-02%| 240
deb xla:1
3 res xla:1
13:59:13|TQ-DEBUG| The integrand was not evaluated in 33 of 240 VEGASMap intervals. Filling the weights for some of them with neighbouring values.
13:59:13|TQ-DEBUG| remaining intervals: 1
13:59:13|TQ-DEBUG| remaining intervals: 0
14:00:38|TQ-DEBUG| | 3| 80| 4.592329e-05 | 1.429694e-11 | 8.233576e-02%| 320
deb xla:1
4 res xla:1
14:00:38|TQ-DEBUG| The integrand was not evaluated in 38 of 240 VEGASMap intervals. Filling the weights for some of them with neighbouring values.
14:00:39|TQ-DEBUG| remaining intervals: 2
14:00:39|TQ-DEBUG| remaining intervals: 0
---------------------------------------------------------------------------
KeyboardInterrupt Traceback (most recent call last)
Hmmm, okay, that's good.
Then it could be that the problem is specific to VEGAS. I noticed you are using a comparatively small number of evaluation points, which is usually quite inefficient with VEGAS (those evaluations are split between a number of iterations, so in the end you parallelize over only a small number of points). Could you try a different integrator to see if that is better?
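Rough arithmetic (pure illustration, using the numbers visible in the log above) for why a small total point budget hurts VEGAS on an accelerator:

```python
# VEGAS spreads the total evaluation budget over warmup and main iterations,
# so each device launch only sees a small batch of points.
N_total = 10_000          # points requested from the integrator (see log)
samples_per_warmup = 80   # N_samples per warmup iteration (see log)

# Each warmup launch evaluates just 0.8% of the budget, far too little work
# to keep a TPU busy, while still paying the full launch/compile overhead.
batch_fraction = samples_per_warmup / N_total
print(f"{batch_fraction:.1%} of the budget per warmup call")
```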
Okay, one final thought maybe: I noticed you are using float64; could this be the problem? TPUs are targeted at bfloat16 (rather than float64) if I am not mistaken?
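If precision is the issue, a quick experiment (a sketch reusing the set_up_backend signature from earlier in this thread) would be to drop to float32:

```python
from torchquad import set_up_backend

# TPUs have no native float64; requesting float32 keeps the computation on
# the hardware's fast path and may change the timings significantly.
set_up_backend("torch", data_type="float32", torch_enable_cuda=False)
```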