Comments (22)

albapa commented on August 20, 2024

We've studied this quite a bit with @Sideboard and @jameskermode, and our conclusion was that the linear system is so ill-conditioned that even the smallest perturbation will change the weights quite considerably. The good thing is that the physics of the model does not seem to be sensitive. Our idea for a test was a Delta-type test, and those work very reliably.
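
(To get a feel for the scale of the effect, here is a minimal, hypothetical illustration using a Hilbert matrix, a standard example of an ill-conditioned system. The dimensions and perturbation size are arbitrary, and this is not the actual GAP fit.)

import numpy as np

# Hypothetical illustration, not the GAP fit: a Hilbert matrix is a
# classic ill-conditioned system. A ~1e-10 perturbation of the
# right-hand side changes the solution weights by a much larger amount.
n = 12
A = np.array([[1.0 / (i + j + 1) for j in range(n)] for i in range(n)])
b = A @ np.ones(n)

x = np.linalg.solve(A, b)
x_pert = np.linalg.solve(A, b + 1e-10 * np.random.default_rng(0).standard_normal(n))

print("condition number:", np.linalg.cond(A))
print("max change in weights:", np.max(np.abs(x - x_pert)))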

albapa commented on August 20, 2024

And even for differences way larger than 1e-8, the model is close to equivalent.

bernstei commented on August 20, 2024

Why is there any change? It should be bit-exact. My problem is the usual one: I use the GAP to do minimizations, make another GAP from those, etc., and eventually it's different. I agree that the physics isn't different, but it makes it harder to write a unit test. I can work around it, but why is it even an issue?

albapa commented on August 20, 2024

Can you change the test to something that uses np.isclose() or equivalent?
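
(Something along these lines, for example - a sketch only; the file names and tolerance are placeholders.)

import numpy as np

# Sketch of a tolerance-based check instead of a bit-exact comparison.
# "ref_energies.txt" / "new_energies.txt" are placeholder file names.
ref = np.loadtxt("ref_energies.txt")
new = np.loadtxt("new_energies.txt")

assert np.allclose(new, ref, rtol=0.0, atol=1e-5), \
    f"max deviation {np.max(np.abs(new - ref)):.3e} exceeds 1e-5"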

bernstei commented on August 20, 2024

Is the seed being applied only to the sparse points but not the jitter, or something?

albapa commented on August 20, 2024

> Why is there any change? It should be bit-exact. My problem is the usual one: I use the GAP to do minimizations, make another GAP from those, etc., and eventually it's different. I agree that the physics isn't different, but it makes it harder to write a unit test. I can work around it, but why is it even an issue?

I remember a discussion on precisely this on Slack - the conclusion was that floating-point arithmetic isn't exact, that's why.

bernstei commented on August 20, 2024

> I remember a discussion on precisely this on Slack - the conclusion was that floating-point arithmetic isn't exact, that's why.

It's not exact, but it is deterministic (in the absence of OpenMP anyway - I guess I should check that it's disabled for this test, but I think it is).

albapa commented on August 20, 2024

The jitter is a simple constant, although that's somewhat counterintuitive. If the sparse points are the same (and they should be) then it's down to the way the covariance matrix is built, and then the solution of the linear system.
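
(For readers following along: schematically, the jitter is a constant added to the diagonal of the covariance matrix before the solve. A sketch with made-up data, not the gap_fit internals.)

import numpy as np

# Schematic sketch, not the gap_fit internals: a constant jitter on the
# diagonal regularises an otherwise ill-conditioned covariance solve.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))                       # stand-in sparse points
K = np.exp(-np.sum((X[:, None] - X[None, :])**2, -1))  # squared-exponential kernel
y = rng.standard_normal(50)                            # stand-in targets

jitter = 1e-8
alpha = np.linalg.solve(K + jitter * np.eye(len(K)), y)
print("cond(K):           ", np.linalg.cond(K))
print("cond(K + jitter*I):", np.linalg.cond(K + jitter * np.eye(len(K))))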

albapa commented on August 20, 2024

Are the array operations in Fortran deterministic? Like addition, multiplication, sums, etc.? We make extensive use of those.

But @Sideboard built in a series of prints that can dump all kinds of intermediate matrices, so it is possible to check where things go awry.

bernstei commented on August 20, 2024

I believe that without OpenMP they are deterministic, but I can investigate this. If you could tell me the interface to the intermediate print stuff (just verbosity?) it'd save me the time to add exactly that.
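
(For reference, what I mean by disabling OpenMP is pinning it to one thread in the environment of whatever launches gap_fit. A sketch - the gap_fit arguments below are placeholders, not a complete command line.)

import os
import subprocess

# Sketch: pin OpenMP to a single thread for the fit. The gap_fit
# arguments are placeholders, not a working command line.
env = dict(os.environ, OMP_NUM_THREADS="1")
subprocess.run(["gap_fit", "at_file=train.xyz", "gp_file=gap.xml"],
               env=env, check=True)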

bernstei commented on August 20, 2024

The alphas are similar, so I assume that the sparse points are the same.

bernstei commented on August 20, 2024

Interesting - it looks like it may be an OpenMP thing, because when I run it twice manually it is exactly reproducible. If I convince myself of this, I'll close this issue.

gabor1 commented on August 20, 2024

Is this true also for turbosoap?

bernstei commented on August 20, 2024

Don't know yet. But I currently think it's a false positive, and my fault for not fully disabling OpenMP.

gabor1 commented on August 20, 2024

For unit testing you could make the design matrix small enough, and the environments distinct enough, that the condition number is not huge - then you'd get better reproducibility.

bernstei commented on August 20, 2024

It's already not bad, even with OpenMP - for the first 2 iterations I compare the predicted energies and they are within 1e-5. It's only that it builds on itself, and by the 3rd iteration the RSS minima are different, and then it falls apart. But as I said, I'm pretty sure I just failed to disable OpenMP everywhere (specifically in the run generating the reference data, which is eventually used by the final pytest, where OpenMP was disabled).

[added] The design matrix is already small, just so the fits are fast.

bernstei commented on August 20, 2024

Looks like it's a subtle np.dot() issue, perhaps a bug. Apparently I get a slightly different dot product, therefore a different config energy_sigma, therefore a different GAP, and it all builds from there.

bernstei commented on August 20, 2024

It's not a determinism issue, apparently - just different results from np.dot(v1, v2) and np.sum(v1 * v2) on different machines (a compute node generating the reference data and the head node running the final pytest).

bernstei commented on August 20, 2024

Here's a script that shows the problem. On our older CPUs it prints 0.0, but on newer CPUs (with OpenBLAS, set to 1 thread) it prints a number of order 1e-16. The head node running pytest is new enough to give the 1e-16, while the node generating reference data was old enough to give 0.0.

import numpy as np

# Two mathematically identical ways to compute the same dot product.
p = np.asarray([16.90513420661038423986610723659396171569824218750000,
                0.50000000000000000000000000000000000000000000000000,
                -1.71583683999999991875995419832179322838783264160156])
v = np.asarray([0.01628648030646704172874628113731887424364686012268,
                -0.17188353256547875269610869963798904791474342346191,
                -0.98498264035059979182307188239064998924732208251953])

val_dot = np.dot(v, p)   # dispatched to the BLAS dot kernel
val_sum = np.sum(v * p)  # elementwise multiply, then numpy's own reduction

np.show_config()         # shows which BLAS numpy is linked against
print("dot - sum", val_dot - val_sum)
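
(If anyone needs a reduction that is identical on every machine, one option is math.fsum, which returns the correctly rounded sum regardless of which kernel numpy dispatches to. A sketch using the same vectors:)

import math
import numpy as np

# Sketch: math.fsum gives the correctly rounded sum of the elementwise
# products, so the result does not depend on CPU or BLAS kernel.
p = np.asarray([16.90513420661038423986610723659396171569824218750000,
                0.50000000000000000000000000000000000000000000000000,
                -1.71583683999999991875995419832179322838783264160156])
v = np.asarray([0.01628648030646704172874628113731887424364686012268,
                -0.17188353256547875269610869963798904791474342346191,
                -0.98498264035059979182307188239064998924732208251953])

print("fsum dot:", math.fsum(v * p))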

bernstei commented on August 20, 2024

I opened OpenMathLib/OpenBLAS#3583 for this. Once I'm 100% sure it's this, I'll close this issue, but we should all be careful with numpy, which uses OpenBLAS by default.

Sideboard commented on August 20, 2024

I remember having different results for identical Turbomole runs once (in the last digits). The reason was, as far as I remember, different math libraries on the nodes. I suppose you use a cluster with uniform nodes, so that shouldn't be a problem.

For testing gap_fit I also encountered the problem that even OpenMP runs with identical options (threads, chunk size) are non-deterministic, since floating-point addition is not associative and threads may finish faster or slower. This can be mitigated by an explicit ordered reduction. I have a branch where I tested it in one spot. It works but makes the code more verbose, so the question is whether it's worth the effort. The restriction to identical OpenMP options would still apply.
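
(The non-associativity is easy to demonstrate:)

# Floating-point addition is commutative but not associative, so the
# order in which partial sums are combined can change the result.
a, b, c = 1e16, -1e16, 1.0
print((a + b) + c)  # 1.0
print(a + (b + c))  # 0.0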

bernstei commented on August 20, 2024

Sorry, I should have closed this. Serial gap_fit is deterministic. I agree that it may be possible to make OpenMP and/or MPI deterministic as well (I know I did something like that a few years ago for CP2K with ScaLAPACK or PBLAS or something), but I don't think it's worth the effort for what I need. My real problem is that np.dot is architecture-dependent when using OpenBLAS (at least the version provided by conda), because on AVX512 machines it uses a kernel that is slightly more accurate than the default. That was affecting my GAP multistage fit script, and therefore giving slightly different potentials.
