
Problem with gcc 6 and i386 · erfa (closed, 22 comments)

liberfa commented on August 15, 2024
Problem with gcc 6 and i386

from erfa.

Comments (22)

astrofrog commented on August 15, 2024

@sergiopasra - just to check, does this depend on the optimization level you compile with?

sergiopasra commented on August 15, 2024

@astrofrog Yes, with -O0 and -O1 it works; with -O2 and -O3 it doesn't (only on i386 -- x86_64 works with all of them).

astrofrog commented on August 15, 2024

@sergiopasra - do you see the same issue when compiling SOFA? If so, we'll need to report it upstream.

sergiopasra commented on August 15, 2024

@astrofrog I get the same result if I change the compilation flags of SOFA from "-pedantic -Wall -W -O" to "-pedantic -Wall -W -O2".
I'm not sure they are going to consider this a bug, though.

timj commented on August 15, 2024

How do we know this isn't a GCC6 compiler bug? How much does the value change between GCC5 and GCC6? Does reordering some expressions fix it? How much does the tolerance have to change to make the test pass (not very much)? What is the actual required precision of this calculation?

olebole commented on August 15, 2024

This also affects Debian: Bug 835105.

olebole commented on August 15, 2024

I played around a bit with the different optimization options: the source file in question is src/starpv.c, and the decisive option is -fcaller-saves.
Compiling with gcc -c starpv.c -O2 -fno-caller-saves works well, while -O1 -fcaller-saves shows the failure.

timj commented on August 15, 2024

So this looks like a GCC compiler bug then.

olebole commented on August 15, 2024

Agreed. I could extract the following code from starpv.c:

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

double f(double x) {
  return sqrt(1.0 - x) - 1.0;
}

double g(double x) {
  double res = f(x);
  printf("r =%.20g\n", res);
  return res;
}

int main(void) {
  double x = 3.1740879016271733482e-09;
  double r1 = f(x);
  double r2 = g(x);
  printf("r1=%.20g r2=%.20g diff=%20g\n", r1, r2, r2-r1);
  exit(0);
}  

This code should print the same number for r1 and r2, and a diff of 0. When compiled with gcc -O2 -o m m.c -lm, I get the result

r =-1.5870439520936432953e-09
r1=-1.5870439407095204842e-09 r2=-1.5870439520936432953e-09 diff=        -1.13841e-17

When I add -fno-caller-saves, everything is fine. The numbers are taken from the ERFA test case.

olebole commented on August 15, 2024

FYI: I filed a GCC bug for this issue.

juliantaylor commented on August 15, 2024

Not really a gcc bug: i386 does not have reliable floating-point math due to the 80-bit x87 FPU. If you need reliable results, try -fexcess-precision=standard, -ffloat-store, or -mfpmath=sse.
see https://gcc.gnu.org/wiki/x87note (or the super old gcc issue 323)

olebole commented on August 15, 2024

@juliantaylor When thinking about the numbers here, I don't think this is the issue: with double precision (64 bit), we have a mantissa of 52(+1) bits and a precision of ~16 digits. The two numbers in the test case, however, already differ in the 8th digit. Having 80 bits here, or converting between 80 and 64 bits, cannot explain this.

juliantaylor commented on August 15, 2024

I don't know what you are computing, but the fact that they differ in the 8th digit does not necessarily imply that excess precision is not to blame: rounding differences can easily be amplified.
Also, that -fno-caller-saves "fixes" it speaks for this explanation; that flag basically causes a limited form of -ffloat-store (but is possibly worse for performance, as it also affects non-float register allocations).
Does -fexcess-precision=standard fix the issue?

olebole commented on August 15, 2024

The code is above, and I see nothing there that could amplify the error. From rounding 80 -> 64 bit, I see no reason why an error in the 8th digit should be legitimate; the error should be on the order of the precision. -fexcess-precision=standard does fix the problem, however.

juliantaylor commented on August 15, 2024

I don't think it's obvious to say that from that code; there are several things in it that could amplify it: subtractions, divisions, it's got it all.
An actual compiler mis-compilation that produces an 8th-digit accuracy difference, on the other hand, seems less likely to me, in particular one that disabling caller saves fixes.
In the end only a proper numerical analysis or looking at the disassembly can prove what is to blame, but for me the evidence that it is the well-known x87 issue is pretty overwhelming, while a mis-compilation is still just speculation.
It's not really worthwhile to dive deeper into this; you should never do numerics on x87 FPUs without proper compiler flags anyway (-mfpmath=sse is the best option for determinism and performance).

olebole commented on August 15, 2024

OK, agreed (and I am going to close the gcc bug as well). However, I am not happy with just enabling these flags -- IMO they don't solve the underlying problem, but hide it: the function eraStarpv() is quite inaccurate because of sqrt(1.0 - betr*betr - bett*bett) - 1.0 when betr and bett are very small. In the test case of astropy, they are ~10^-4, which translates into an accuracy of only ~8 digits. Astropy, however, seems to need a higher accuracy in this loop when handling ephemerides.
IMO, there is one thing needed/possible here:

  • the mentioned code line should be changed for small values of betr and bett (use a polynomial instead of the square root in this case, or similar). This should be discussed with upstream. Or, at least, the limited precision should be documented.
  • Astropy should not assume that the precision of eraStarpv() is very high, and should change the loop accordingly. I will copy my comments to astropy/astropy#5425 as well. Edit: I mixed up things here; this was nonsense ;-)

The point here, IMO, is that the x87 FPU does not give wrong results, just different ones. Both results are as expected within the accuracy; the 64-bit results are just more predictable. This gives us the chance to find exactly such problems, and therefore I would vote against just setting -mfpmath=sse or similar. Instead, i386 CI tests would be quite useful.

juliantaylor commented on August 15, 2024

Indeed, that expression is poor for small values.
Herbie recommends this replacement: http://herbie.uwplse.org/demo/6c625a8ebdbd46954b959c975040214b/graph.html

olebole commented on August 15, 2024

Funnily (but as expected), the i386 result seems to be the better one ;-)

juliantaylor commented on August 15, 2024

It's not actually very surprising: the x87 FPU does have 80-bit precision for intermediate results, so it being better is not unusual. But its indeterminism often outweighs this. If you have something that needs 80 bits, you can use long doubles, at some performance cost on newer machines (and with less portability, as some arches don't have 80-bit long doubles).

olebole commented on August 15, 2024

Well, deterministically inaccurate results are not really what one wants...

juliantaylor commented on August 15, 2024

You can easily construct cases where 80 bits are not enough either; it just shifts the problem a tiny bit.
Determinism is far more valuable.

olebole commented on August 15, 2024

An algorithm should give predictable results within the expected accuracy. If it does not, it is the algorithm that is to blame, not the "unpredictability" of the FPU. If we had just enabled -mfpmath=sse, we would not have detected the cause of the problem. Unpredictable results within the specified accuracy therefore help to keep algorithms good, while -mfpmath=sse just makes them look good and hides the real problem.
Just make sure the algorithm is robust; then one doesn't need to care about additional unexpected precision.

