Coder Social home page Coder Social logo

Comments (4)

fgebhart avatar fgebhart commented on May 18, 2024 1

@kw4tts wow what an investigation! Thank you so much for this thorough comment.

I totally agree on your suggestion of using haversine instead of the custom math formula. By now I actually already have some other tool published which in fact uses haversine. I have to admit though, that I simply adopted the haversine pypi package because of its convenience and not the improved handling of coordinates with small proximity. Thanks to your in depth investigation I now also have much more important argument at hand.

I will try to come up with an implementation which replaces the current math formula with the haversine approach soon.

from workoutizer.

fgebhart avatar fgebhart commented on May 18, 2024

Hey! Thanks for filing that issue and providing a proposed fix! 👍

As I'm on a somewhat lengthy family trip, I won't be able to investigate this issue any time soon.

However, your proposed fix looks good to me at first glance. If you want and have time a PR would be highly appreciated. I believe adding a little reproducing test for the mentioned function would help in gaining confidence that your proposed change actually fixes the bug.

Let me know if I should provide more support/info.

from workoutizer.

kw4tts avatar kw4tts commented on May 18, 2024

Hello! Thank you for your reply. Please, enjoy the trip, as this "bug" is far from critical.

As it currently stands, I have no experience writing unit tests; that's why I stated I am happy to discuss whether the "bodge" is appropriate (and whether it fits the PR anyhow).

I have been, however, doing some research in the meanwhile. Mainly discovering, why does this precise GPX log file trigger this error.

As a write-up; it's difficult to logically sum up past 6 days; but here we go - it's a long, and messy one.

EDIT: a late-night post is never a good idea. This comment has been updated 2022-05-20 at 9:30AM CEST.


Chapter 1: Reproducing the error (with fictional coordinates)

First, I've been experimenting with fictional coordinates - [1,1] and [1.0001,1.0001], slowly closing them together - a small increment at a time. At first, I could not reproduce the problem in its original form.
As the matter of fact, with the pair [1,1] and [1.00001,1.0001] (and with the delta step of simply ridiculous -=0.00000000000001), the first few test result can be seen fluctuating around two values, suggesting the presence of rounding errors.

As the distance was 15 meters (and we already know this is far enough not to trigger error in WKZ), I decided to close the gap further, starting from [1,1] and [1.00001,1.00001] - distance of 1.5 meter - and with a bit more "modest" delta step of -=0.0000000001 degree:

from math import acos,cos,sin,pi
coordinate_1=[1,1]; coordinate_2=[1.00001,1.00001]
def _to_rad(deg): return deg*pi/180.0

while (coordinate_2[0]>coordinate_1[0]) and (coordinate_2[1]>coordinate_1[1]):
  try:
    angle1=sin(_to_rad(coordinate_1[0])) * sin(_to_rad(coordinate_2[0])) + cos(_to_rad(coordinate_1[0])) * cos(_to_rad(coordinate_2[0])) * cos(_to_rad(coordinate_1[1] - coordinate_2[1]))
    distance1 = acos(angle1)* 6378100; distance2 = acos(max(0,min(1,angle1))) * 6378100
    print("Angle: ",angle1,"\t","distance1:\t",distance1,"\t\tDistance2:\t",distance2,"\t\tDistance equal:", distance1==distance2)
    coordinate_2[0]-=0.0000000001; coordinate_2[1]-=0.0000000001
  except:
    print("Angle: ",angle1,"\t","distance1:\t",distance1,"\t\tDistance2:\t",distance2);
    print("Coordinate 2:\t",coordinate_2);
    print("Error");
    break

This was the computational output:

Angle:  1.0 	 distance1:	 0.0 		Distance2:	 0.0 		Distance equal: True
Angle:  1.0 	 distance1:	 0.0 		Distance2:	 0.0 		Distance equal: True
Angle:  1.0 	 distance1:	 0.0 		Distance2:	 0.0 		Distance equal: True
Angle:  1.0 	 distance1:	 0.0 		Distance2:	 0.0 		Distance equal: True
Angle:  0.9999999999999999 	 distance1:	 0.09504109621047974 		Distance2:	 0.09504109621047974 		Distance equal: True
Angle:  1.0 	 distance1:	 0.0 		Distance2:	 0.0 		Distance equal: True
Angle:  1.0 	 distance1:	 0.0 		Distance2:	 0.0 		Distance equal: True
Angle:  0.9999999999999999 	 distance1:	 0.09504109621047974 		Distance2:	 0.09504109621047974 		Distance equal: True
Angle:  1.0 	 distance1:	 0.0 		Distance2:	 0.0 		Distance equal: True
Angle:  1.0 	 distance1:	 0.0 		Distance2:	 0.0 		Distance equal: True
Angle:  0.9999999999999999 	 distance1:	 0.09504109621047974 		Distance2:	 0.09504109621047974 		Distance equal: True
Angle:  1.0 	 distance1:	 0.0 		Distance2:	 0.0 		Distance equal: True
Angle:  1.0 	 distance1:	 0.0 		Distance2:	 0.0 		Distance equal: True
Angle:  0.9999999999999999 	 distance1:	 0.09504109621047974 		Distance2:	 0.09504109621047974 		Distance equal: True
Angle:  1.0 	 distance1:	 0.0 		Distance2:	 0.0 		Distance equal: True
Angle:  1.0 	 distance1:	 0.0 		Distance2:	 0.0 		Distance equal: True
Angle:  1.0 	 distance1:	 0.0 		Distance2:	 0.0 		Distance equal: True
Angle:  1.0 	 distance1:	 0.0 		Distance2:	 0.0 		Distance equal: True
Angle:  1.0 	 distance1:	 0.0 		Distance2:	 0.0 		Distance equal: True
Angle:  1.0 	 distance1:	 0.0 		Distance2:	 0.0 		Distance equal: True
Angle:  1.0 	 distance1:	 0.0 		Distance2:	 0.0 		Distance equal: True
Angle:  1.0 	 distance1:	 0.0 		Distance2:	 0.0 		Distance equal: True
Angle:  1.0 	 distance1:	 0.0 		Distance2:	 0.0 		Distance equal: True
Angle:  1.0 	 distance1:	 0.0 		Distance2:	 0.0 		Distance equal: True
Angle:  1.0 	 distance1:	 0.0 		Distance2:	 0.0 		Distance equal: True
Angle:  1.0 	 distance1:	 0.0 		Distance2:	 0.0 		Distance equal: True
Angle:  1.0 	 distance1:	 0.0 		Distance2:	 0.0 		Distance equal: True
Angle:  1.0000000000000002 	 distance1:	 0.0 		Distance2:	 0.0
Coordinate 2:	 [1.000000415499207, 1.000000415499207]
Error

There we go.

In the theoretical case of two neighboring points, which "happen" to have the same offset in both the latitude as well as in longitude, the "degree delta" between the latitude and longitude, smaller than 0.000000415499207 would therefore be enough to trigger a rounding error, and therefore failing the calculation. As said before, we also see some distance fluctuations between "9.5cm" and "0cm".

What we also see is that the smallest possibile distance we can calculate is 9.5 centimeters.


Chapter 2: Reproducing the error (with the GPX file)

I have later tried to reproduce the error with the original GPX file. I have extracted the functions, used by WKZ, into a single separated script and did some "print debugging" on it.

It occured to me, that in my case, the pair of "point, next_point" (to be using get_total_distance_of_trace() nomenclature) that triggered the computational error was (46.24446, 14.191566), (46.24446, 14.191565).

Quick search throughout the GPX file contains these records - to a decimal place precisely, indicating no raw data has been rounded while (for example) loading them into Pandas array, so that's OK-good:

   <trkpt lat="46.2444600" lon="14.1915660">
    <ele>688.5</ele>
    <time>2020-07-28T06:07:31Z</time>
   </trkpt>
   <trkpt lat="46.2444600" lon="14.1915650">
    <ele>688.5</ele>
    <time>2020-07-28T06:07:32Z</time>

Chapter 3: How close is far enough, really? (The distance between two points)

I plugged the coordinates into the NHC calculator and got NaN error. The NHC calculator is based on Great Circle Calculator, which returned value of aprox. 7.69cm, smaller than 9.5cm I've discovered as a "limit" in the beginning of this comment, thus hinting the cause for original calculation formula failure. This would also indicate, that NHC's calculator probably resides on the formula, currently in use by WKZ.

The NHC's calculator page has a link that has brought me to a chapter about distance calculation.

I began to dig around and search for the different methods of calculating the distance between two objects on the sphere. Sure enough, I quickly found a Wikipedia Article about Great circle distance and therefore also learned about a spherical law of cosines (method, used by WKZ). The article also quickly confirmed the sentence by Mr. Williams, about the rounding errors for two nearby coordinates:

Mr. Williams (about alternative formula for distance calculation compared to Spherical law of cosines):

A mathematically equivalent formula, which is less subject to rounding error for short distances

Wikipedia:

On computer systems with low floating point precision, the spherical law of cosines formula can have large rounding errors if the distance is small (if the two points are a kilometer apart on the surface of the Earth, the cosine of the central angle is near 0.99999999). For modern 64-bit floating-point numbers, the spherical law of cosines formula, given above, does not have serious rounding errors for distances larger than a few meters on the surface of the Earth.

Opening Google maps and using coordinates from my GPX file does confirm, that these are quite "a bit" closer, compared to "a few meters". I'd rather say, they are pretty much few centimeters apart (minus the HDOP and gps accuracy, but that's a story for another day).

This lead me to Haversine formula. Further reading on Wikipedia there states:

When using these formulae, one must ensure that h does not exceed 1 due to a floating point error (d is only real for 0 ≤ h ≤ 1). h only approaches 1 for antipodal points (on opposite sides of the sphere).

Thinking about that for a second brought me to a conclusion, that one rarely changes his location while working out, in a manner so rapid, causing his two consecutive track points being separated by half of a Earth's sphere.


Chapter 4: How does the alternative stack up to the existing solution?

Comparison time, then. In order to compare Haversine result with currently used, I implemented Haversine function in a totally-not-this-repo-friendly manner, also implementing my bodge for existing calculation formula (in order to finish calculations at all)

from math import acos, cos, pi, sin, asin, sqrt
from typing import List, Tuple
import gpxpy
import pandas as pd

def _to_rad(degree: float) -> float:
    return degree / 180 * pi

def calculate_distance_between_points(coordinate_1: Tuple[float], coordinate_2: Tuple[float]) -> float:
    if coordinate_1[0] == coordinate_2[0] and coordinate_1[1] == coordinate_2[1]:
        return 0.0
    
    AngleBetweenPoints1=sin(_to_rad(coordinate_1[0])) * sin(_to_rad(coordinate_2[0])) + cos(_to_rad(coordinate_1[0])) * cos(_to_rad(coordinate_2[0])) * cos(_to_rad(coordinate_2[1] - coordinate_1[1]))
#    if(AngleBetweenPoints1>1):
#        print(coordinate_1,"\t",coordinate_2)
    try:
        distance = acos(AngleBetweenPoints)
        return distance * 6378100
    except:
        #print("RoundErr")
        distance = acos(max(0,min(AngleBetweenPoints1,1)))
        #print(distance)
        return distance * 6378100

def haversine(lat1, lon1, lat2, lon2):
    dLat = (lat2 - lat1) * pi / 180.0
    dLon = (lon2 - lon1) * pi / 180.0
    lat1 = (lat1) * pi / 180.0
    lat2 = (lat2) * pi / 180.0
    a = (pow(sin(dLat / 2), 2) +
         pow(sin(dLon / 2), 2) *
             cos(lat1) * cos(lat2));
    rad = 6378100 # sphere radius in m
    c = 2 * asin(sqrt(a))
    return rad * c

latitude_list=[]; longitude_list=[]

gpx_file=open('BMC.gpx','r')
gpxData=gpxpy.parse(gpx_file)
for track in gpxData.tracks:
    for segment in track.segments:
        for point in segment.points:
            latitude_list.append(point.latitude)
            longitude_list.append(point.longitude)

coordinates_df = pd.DataFrame({"lat": latitude_list, "lon": longitude_list})
coordinates_df.dropna(inplace=True)

#legacy method
total_distance1 = 0
for index, row in coordinates_df.iterrows():
    if index < len(coordinates_df) - 1:
        point = (row["lat"], row["lon"])
        next_point = (coordinates_df.at[index + 1, "lat"], coordinates_df.at[index + 1, "lon"])
        dist = calculate_distance_between_points(point, next_point)
        #if dist==0.0: print(index)
        total_distance1+=dist

#haversines method
total_distance2 = 0
for index, row in coordinates_df.iterrows():
    if index < len(coordinates_df) - 1:
        dist=haversine(row["lat"], row["lon"], coordinates_df.at[index + 1, "lat"], coordinates_df.at[index + 1, "lon"])
        total_distance2+=dist

print("Total distance (Legacy):\t",total_distance1/1000,"km")
print("Total distance (Haversine):\t",total_distance2/1000,"km")

The output:

>>> print("Total distance (Legacy):\t",total_distance1/1000,"km")
Total distance (Legacy):	 100.57842160860646 km
>>> print("Total distance (Haversine):\t",total_distance2/1000,"km")
Total distance (Haversine):	 100.58544858153573 km
>>> 

Chapter 5: Results are in, but how and why?

OK, so the distance differs for 10 meters. Bravo, me. Slow clap. Saved the world. Bra-vo. Give yourself a pat on the back. (/s)

But where does the difference come from?

Closely examining the legacy code seems to suggest that there are lot of points in my recording, that are (accidentally?) very close together:

zero_distance=[]
coordinates_df = pd.DataFrame({"lat": latitude_list, "lon": longitude_list})
coordinates_df.dropna(inplace=True)
for index, row in coordinates_df.iterrows():
    if index < len(coordinates_df) - 1:
        point = (row["lat"], row["lon"])
        next_point = (coordinates_df.at[index + 1, "lat"], coordinates_df.at[index + 1, "lon"])
        dist = calculate_distance_between_points(point, next_point)
        if dist==0: zero_distance.append(index)

>>> print(len(zero_distance))
178

Tis many close-proximity points could accumulate for a difference of 10 meters. Heck, it could accumulate for even more.
Of course, one could always assume they are on same coordinates.

I am not going to assume or dig down this rabbit hole too much. :)


Chapter 6: Testing the new function again - with our first, fictional scenario.

So, how does Haversine stack up in the rounding errors on ultra-small coordinate differences?

I have decided to run the test from the beginning of my comment on Haversines:

from math import pi, pow, sin, cos, asin, sqrt
coordinate_1=[1,1]; coordinate_2=[1.00001,1.00001]
def _to_rad(deg): return deg*pi/180.0

def haversine(lat1, lon1, lat2, lon2):
  dLat = (lat2 - lat1) * pi / 180.0
  dLon = (lon2 - lon1) * pi / 180.0
  lat1 = (lat1) * pi / 180.0
  lat2 = (lat2) * pi / 180.0
  a = (pow(sin(dLat / 2), 2) +
       pow(sin(dLon / 2), 2) *
       cos(lat1) * cos(lat2));
  rad = 6378100
  c = 2 * asin(sqrt(a))
  return rad * c

while (coordinate_2[0]>coordinate_1[0]) and (coordinate_2[1]>coordinate_1[1]):
  distance=haversine(coordinate_1[0],coordinate_1[1],coordinate_2[0],coordinate_2[1])
  print("Distance: \t",distance,"\tCoordinate: \t",coordinate_2)
  coordinate_2[0]-=0.0000000001
  coordinate_2[1]-=0.0000000001

Last few lines of output:

Distance: 	 9.43197503796711e-05 	Coordinate: 	 [1.0000000005991727, 1.0000000005991727]
Distance: 	 7.85780858509159e-05 	Coordinate: 	 [1.0000000004991727, 1.0000000004991727]
Distance: 	 6.283642132216042e-05 	Coordinate: 	 [1.0000000003991727, 1.0000000003991727]
Distance: 	 4.7094756793404725e-05 	Coordinate: 	 [1.0000000002991727, 1.0000000002991727]
Distance: 	 3.135309226464879e-05 	Coordinate: 	 [1.0000000001991727, 1.0000000001991727]
Distance: 	 1.5611427735892607e-05 	Coordinate: 	 [1.0000000000991727, 1.0000000000991727]

This would seem to suggest that the accuracy of Haversine function is far greater and error-prone. Even lowering the "delta step" led me to the results in range of "1e-08".


I know this comment is an exhausting giant mess, I won't lie. At least I hope, that it illustrates that the cause of the problem is the rounding error (in cosine function), and that for small proximity, cosines should apparently be avoided. :)

As stated above, as our sport activities will never jump across half the globe between two recordings, my personal suggestion is that we replace the existing mathematical formula with Haversines.

Take all the time you need and let me know what do you think.

Kind regards!

P.S. Also, are you sure that this is multiplied with "centimeters"? I believe it should stand for "meters" :)

from workoutizer.

fgebhart avatar fgebhart commented on May 18, 2024

This should be it.

(Just need to fix the failing ci pipeline, hasn't been touched for a while...)

from workoutizer.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.