Comments (4)
@kw4tts wow what an investigation! Thank you so much for this thorough comment.
I totally agree on your suggestion of using haversine instead of the custom math formula. By now I actually already have some other tool published which in fact uses haversine. I have to admit though, that I simply adopted the haversine pypi package because of its convenience and not the improved handling of coordinates with small proximity. Thanks to your in depth investigation I now also have much more important argument at hand.
I will try to come up with an implementation which replaces the current math formula with the haversine approach soon.
from workoutizer.
Hey! Thanks for filing that issue and providing a proposed fix! 👍
As I'm on a somewhat lengthy family trip, I won't be able to investigate this issue any time soon.
However, your proposed fix looks good to me at first glance. If you want and have time a PR would be highly appreciated. I believe adding a little reproducing test for the mentioned function would help in gaining confidence that your proposed change actually fixes the bug.
Let me know if I should provide more support/info.
from workoutizer.
Hello! Thank you for your reply. Please, enjoy the trip, as this "bug" is far from critical.
As it currently stands, I have no experience writing unit tests; that's why I stated I am happy to discuss whether the "bodge" is appropriate (and whether it fits the PR anyhow).
I have been, however, doing some research in the meanwhile. Mainly discovering, why does this precise GPX log file trigger this error.
As a write-up; it's difficult to logically sum up past 6 days; but here we go - it's a long, and messy one.
EDIT: a late-night post is never a good idea. This comment has been updated 2022-05-20 at 9:30AM CEST.
Chapter 1: Reproducing the error (with fictional coordinates)
First, I've been experimenting with fictional coordinates - [1,1]
and [1.0001,1.0001]
, slowly closing them together - a small increment at a time. At first, I could not reproduce the problem in its original form.
As the matter of fact, with the pair [1,1]
and [1.00001,1.0001]
(and with the delta step of simply ridiculous -=0.00000000000001
), the first few test result can be seen fluctuating around two values, suggesting the presence of rounding errors.
As the distance was 15 meters (and we already know this is far enough not to trigger error in WKZ), I decided to close the gap further, starting from [1,1]
and [1.00001,1.00001]
- distance of 1.5 meter - and with a bit more "modest" delta step of -=0.0000000001
degree:
from math import acos,cos,sin,pi
coordinate_1=[1,1]; coordinate_2=[1.00001,1.00001]
def _to_rad(deg): return deg*pi/180.0
while (coordinate_2[0]>coordinate_1[0]) and (coordinate_2[1]>coordinate_1[1]):
try:
angle1=sin(_to_rad(coordinate_1[0])) * sin(_to_rad(coordinate_2[0])) + cos(_to_rad(coordinate_1[0])) * cos(_to_rad(coordinate_2[0])) * cos(_to_rad(coordinate_1[1] - coordinate_2[1]))
distance1 = acos(angle1)* 6378100; distance2 = acos(max(0,min(1,angle1))) * 6378100
print("Angle: ",angle1,"\t","distance1:\t",distance1,"\t\tDistance2:\t",distance2,"\t\tDistance equal:", distance1==distance2)
coordinate_2[0]-=0.0000000001; coordinate_2[1]-=0.0000000001
except:
print("Angle: ",angle1,"\t","distance1:\t",distance1,"\t\tDistance2:\t",distance2);
print("Coordinate 2:\t",coordinate_2);
print("Error");
break
This was the computational output:
Angle: 1.0 distance1: 0.0 Distance2: 0.0 Distance equal: True
Angle: 1.0 distance1: 0.0 Distance2: 0.0 Distance equal: True
Angle: 1.0 distance1: 0.0 Distance2: 0.0 Distance equal: True
Angle: 1.0 distance1: 0.0 Distance2: 0.0 Distance equal: True
Angle: 0.9999999999999999 distance1: 0.09504109621047974 Distance2: 0.09504109621047974 Distance equal: True
Angle: 1.0 distance1: 0.0 Distance2: 0.0 Distance equal: True
Angle: 1.0 distance1: 0.0 Distance2: 0.0 Distance equal: True
Angle: 0.9999999999999999 distance1: 0.09504109621047974 Distance2: 0.09504109621047974 Distance equal: True
Angle: 1.0 distance1: 0.0 Distance2: 0.0 Distance equal: True
Angle: 1.0 distance1: 0.0 Distance2: 0.0 Distance equal: True
Angle: 0.9999999999999999 distance1: 0.09504109621047974 Distance2: 0.09504109621047974 Distance equal: True
Angle: 1.0 distance1: 0.0 Distance2: 0.0 Distance equal: True
Angle: 1.0 distance1: 0.0 Distance2: 0.0 Distance equal: True
Angle: 0.9999999999999999 distance1: 0.09504109621047974 Distance2: 0.09504109621047974 Distance equal: True
Angle: 1.0 distance1: 0.0 Distance2: 0.0 Distance equal: True
Angle: 1.0 distance1: 0.0 Distance2: 0.0 Distance equal: True
Angle: 1.0 distance1: 0.0 Distance2: 0.0 Distance equal: True
Angle: 1.0 distance1: 0.0 Distance2: 0.0 Distance equal: True
Angle: 1.0 distance1: 0.0 Distance2: 0.0 Distance equal: True
Angle: 1.0 distance1: 0.0 Distance2: 0.0 Distance equal: True
Angle: 1.0 distance1: 0.0 Distance2: 0.0 Distance equal: True
Angle: 1.0 distance1: 0.0 Distance2: 0.0 Distance equal: True
Angle: 1.0 distance1: 0.0 Distance2: 0.0 Distance equal: True
Angle: 1.0 distance1: 0.0 Distance2: 0.0 Distance equal: True
Angle: 1.0 distance1: 0.0 Distance2: 0.0 Distance equal: True
Angle: 1.0 distance1: 0.0 Distance2: 0.0 Distance equal: True
Angle: 1.0 distance1: 0.0 Distance2: 0.0 Distance equal: True
Angle: 1.0000000000000002 distance1: 0.0 Distance2: 0.0
Coordinate 2: [1.000000415499207, 1.000000415499207]
Error
There we go.
In the theoretical case of two neighboring points, which "happen" to have the same offset in both the latitude as well as in longitude, the "degree delta" between the latitude and longitude, smaller than 0.000000415499207
would therefore be enough to trigger a rounding error, and therefore failing the calculation. As said before, we also see some distance fluctuations between "9.5cm" and "0cm".
What we also see is that the smallest possibile distance we can calculate is 9.5 centimeters.
Chapter 2: Reproducing the error (with the GPX file)
I have later tried to reproduce the error with the original GPX file. I have extracted the functions, used by WKZ, into a single separated script and did some "print debugging" on it.
It occured to me, that in my case, the pair of "point, next_point" (to be using get_total_distance_of_trace() nomenclature) that triggered the computational error was (46.24446, 14.191566), (46.24446, 14.191565)
.
Quick search throughout the GPX file contains these records - to a decimal place precisely, indicating no raw data has been rounded while (for example) loading them into Pandas array, so that's OK-good:
<trkpt lat="46.2444600" lon="14.1915660">
<ele>688.5</ele>
<time>2020-07-28T06:07:31Z</time>
</trkpt>
<trkpt lat="46.2444600" lon="14.1915650">
<ele>688.5</ele>
<time>2020-07-28T06:07:32Z</time>
Chapter 3: How close is far enough, really? (The distance between two points)
I plugged the coordinates into the NHC calculator and got NaN
error. The NHC calculator is based on Great Circle Calculator, which returned value of aprox. 7.69cm, smaller than 9.5cm I've discovered as a "limit" in the beginning of this comment, thus hinting the cause for original calculation formula failure. This would also indicate, that NHC's calculator probably resides on the formula, currently in use by WKZ.
The NHC's calculator page has a link that has brought me to a chapter about distance calculation.
I began to dig around and search for the different methods of calculating the distance between two objects on the sphere. Sure enough, I quickly found a Wikipedia Article about Great circle distance and therefore also learned about a spherical law of cosines (method, used by WKZ). The article also quickly confirmed the sentence by Mr. Williams, about the rounding errors for two nearby coordinates:
Mr. Williams (about alternative formula for distance calculation compared to Spherical law of cosines):
A mathematically equivalent formula, which is less subject to rounding error for short distances
Wikipedia:
On computer systems with low floating point precision, the spherical law of cosines formula can have large rounding errors if the distance is small (if the two points are a kilometer apart on the surface of the Earth, the cosine of the central angle is near 0.99999999). For modern 64-bit floating-point numbers, the spherical law of cosines formula, given above, does not have serious rounding errors for distances larger than a few meters on the surface of the Earth.
Opening Google maps and using coordinates from my GPX file does confirm, that these are quite "a bit" closer, compared to "a few meters". I'd rather say, they are pretty much few centimeters apart (minus the HDOP and gps accuracy, but that's a story for another day).
This lead me to Haversine formula. Further reading on Wikipedia there states:
When using these formulae, one must ensure that h does not exceed 1 due to a floating point error (d is only real for 0 ≤ h ≤ 1). h only approaches 1 for antipodal points (on opposite sides of the sphere).
Thinking about that for a second brought me to a conclusion, that one rarely changes his location while working out, in a manner so rapid, causing his two consecutive track points being separated by half of a Earth's sphere.
Chapter 4: How does the alternative stack up to the existing solution?
Comparison time, then. In order to compare Haversine result with currently used, I implemented Haversine function in a totally-not-this-repo-friendly manner, also implementing my bodge for existing calculation formula (in order to finish calculations at all)
from math import acos, cos, pi, sin, asin, sqrt
from typing import List, Tuple
import gpxpy
import pandas as pd
def _to_rad(degree: float) -> float:
return degree / 180 * pi
def calculate_distance_between_points(coordinate_1: Tuple[float], coordinate_2: Tuple[float]) -> float:
if coordinate_1[0] == coordinate_2[0] and coordinate_1[1] == coordinate_2[1]:
return 0.0
AngleBetweenPoints1=sin(_to_rad(coordinate_1[0])) * sin(_to_rad(coordinate_2[0])) + cos(_to_rad(coordinate_1[0])) * cos(_to_rad(coordinate_2[0])) * cos(_to_rad(coordinate_2[1] - coordinate_1[1]))
# if(AngleBetweenPoints1>1):
# print(coordinate_1,"\t",coordinate_2)
try:
distance = acos(AngleBetweenPoints)
return distance * 6378100
except:
#print("RoundErr")
distance = acos(max(0,min(AngleBetweenPoints1,1)))
#print(distance)
return distance * 6378100
def haversine(lat1, lon1, lat2, lon2):
dLat = (lat2 - lat1) * pi / 180.0
dLon = (lon2 - lon1) * pi / 180.0
lat1 = (lat1) * pi / 180.0
lat2 = (lat2) * pi / 180.0
a = (pow(sin(dLat / 2), 2) +
pow(sin(dLon / 2), 2) *
cos(lat1) * cos(lat2));
rad = 6378100 # sphere radius in m
c = 2 * asin(sqrt(a))
return rad * c
latitude_list=[]; longitude_list=[]
gpx_file=open('BMC.gpx','r')
gpxData=gpxpy.parse(gpx_file)
for track in gpxData.tracks:
for segment in track.segments:
for point in segment.points:
latitude_list.append(point.latitude)
longitude_list.append(point.longitude)
coordinates_df = pd.DataFrame({"lat": latitude_list, "lon": longitude_list})
coordinates_df.dropna(inplace=True)
#legacy method
total_distance1 = 0
for index, row in coordinates_df.iterrows():
if index < len(coordinates_df) - 1:
point = (row["lat"], row["lon"])
next_point = (coordinates_df.at[index + 1, "lat"], coordinates_df.at[index + 1, "lon"])
dist = calculate_distance_between_points(point, next_point)
#if dist==0.0: print(index)
total_distance1+=dist
#haversines method
total_distance2 = 0
for index, row in coordinates_df.iterrows():
if index < len(coordinates_df) - 1:
dist=haversine(row["lat"], row["lon"], coordinates_df.at[index + 1, "lat"], coordinates_df.at[index + 1, "lon"])
total_distance2+=dist
print("Total distance (Legacy):\t",total_distance1/1000,"km")
print("Total distance (Haversine):\t",total_distance2/1000,"km")
The output:
>>> print("Total distance (Legacy):\t",total_distance1/1000,"km")
Total distance (Legacy): 100.57842160860646 km
>>> print("Total distance (Haversine):\t",total_distance2/1000,"km")
Total distance (Haversine): 100.58544858153573 km
>>>
Chapter 5: Results are in, but how and why?
OK, so the distance differs for 10 meters. Bravo, me. Slow clap. Saved the world. Bra-vo. Give yourself a pat on the back. (/s)
But where does the difference come from?
Closely examining the legacy code seems to suggest that there are lot of points in my recording, that are (accidentally?) very close together:
zero_distance=[]
coordinates_df = pd.DataFrame({"lat": latitude_list, "lon": longitude_list})
coordinates_df.dropna(inplace=True)
for index, row in coordinates_df.iterrows():
if index < len(coordinates_df) - 1:
point = (row["lat"], row["lon"])
next_point = (coordinates_df.at[index + 1, "lat"], coordinates_df.at[index + 1, "lon"])
dist = calculate_distance_between_points(point, next_point)
if dist==0: zero_distance.append(index)
>>> print(len(zero_distance))
178
Tis many close-proximity points could accumulate for a difference of 10 meters. Heck, it could accumulate for even more.
Of course, one could always assume they are on same coordinates.
I am not going to assume or dig down this rabbit hole too much. :)
Chapter 6: Testing the new function again - with our first, fictional scenario.
So, how does Haversine stack up in the rounding errors on ultra-small coordinate differences?
I have decided to run the test from the beginning of my comment on Haversines:
from math import pi, pow, sin, cos, asin, sqrt
coordinate_1=[1,1]; coordinate_2=[1.00001,1.00001]
def _to_rad(deg): return deg*pi/180.0
def haversine(lat1, lon1, lat2, lon2):
dLat = (lat2 - lat1) * pi / 180.0
dLon = (lon2 - lon1) * pi / 180.0
lat1 = (lat1) * pi / 180.0
lat2 = (lat2) * pi / 180.0
a = (pow(sin(dLat / 2), 2) +
pow(sin(dLon / 2), 2) *
cos(lat1) * cos(lat2));
rad = 6378100
c = 2 * asin(sqrt(a))
return rad * c
while (coordinate_2[0]>coordinate_1[0]) and (coordinate_2[1]>coordinate_1[1]):
distance=haversine(coordinate_1[0],coordinate_1[1],coordinate_2[0],coordinate_2[1])
print("Distance: \t",distance,"\tCoordinate: \t",coordinate_2)
coordinate_2[0]-=0.0000000001
coordinate_2[1]-=0.0000000001
Last few lines of output:
Distance: 9.43197503796711e-05 Coordinate: [1.0000000005991727, 1.0000000005991727]
Distance: 7.85780858509159e-05 Coordinate: [1.0000000004991727, 1.0000000004991727]
Distance: 6.283642132216042e-05 Coordinate: [1.0000000003991727, 1.0000000003991727]
Distance: 4.7094756793404725e-05 Coordinate: [1.0000000002991727, 1.0000000002991727]
Distance: 3.135309226464879e-05 Coordinate: [1.0000000001991727, 1.0000000001991727]
Distance: 1.5611427735892607e-05 Coordinate: [1.0000000000991727, 1.0000000000991727]
This would seem to suggest that the accuracy of Haversine function is far greater and error-prone. Even lowering the "delta step" led me to the results in range of "1e-08".
I know this comment is an exhausting giant mess, I won't lie. At least I hope, that it illustrates that the cause of the problem is the rounding error (in cosine function), and that for small proximity, cosines should apparently be avoided. :)
As stated above, as our sport activities will never jump across half the globe between two recordings, my personal suggestion is that we replace the existing mathematical formula with Haversines.
Take all the time you need and let me know what do you think.
Kind regards!
P.S. Also, are you sure that this is multiplied with "centimeters"? I believe it should stand for "meters" :)
from workoutizer.
(Just need to fix the failing ci pipeline, hasn't been touched for a while...)
from workoutizer.
Related Issues (20)
- Make the auto-import more flexible HOT 15
- ENH: Provide feedback on progress of "Reimport" in UI
- Introduce mypy to code base
- Introduce isort to code base HOT 1
- [BUG] run_docker.sh does not forward container port HOT 3
- [BUG] Readme in setup dir does not install latest version of workoutizer HOT 5
- [BUG] processing demo activity's does not complete HOT 7
- [ENH] Officially support Python 3.7 and run on Raspberry Pi (ARM) HOT 3
- [BUG] 920xt not mounting automatically HOT 9
- TST: Test results not consistent
- [BUG] Incorrect if statement causes block devices to never be mounted
- [BUG] Altitude is shown incorrectly HOT 6
- [BUG] Block device not settled yet when mount triggerd by wkz_mount.service HOT 7
- [ENH] Make workoutizer Windows compatible
- [BUG] macos build pipeline seems broken
- Trouble getting workoutizer to work HOT 13
- Make path to activities inside garmin device configurable
- Add import/copy command to workoutizer HOT 2
- BUG: Error in filter when e.g. avg_speed is not available in fit file
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from workoutizer.