muktihari / fit

A FIT SDK for decoding and encoding Garmin FIT files in Go supporting FIT Protocol V2.

License: BSD 3-Clause "New" or "Revised" License

Languages: Go 99.87%, Makefile 0.13%

Topics: binary, decoding, encoding, fit, fitness, garmin, garmin-fit, go, golang, golang-fit-sdk, protocol, sdk

fit's Introduction

Hi there, I'm Mukti 👋

I'm a software engineer interested in code optimization and in creating libraries optimized for specific purposes, with a main focus on the Go programming language.

More about me: LinkedIn

Stats

Mukti's GitHub stats · Mukti's Top Langs

fit's People

Contributors

dependabot[bot], muktihari

fit's Issues

DecodeWithContext and EncodeWithContext's spawned goroutine still running in background

While the caller receives an immediate context-cancellation error, the goroutine spawned by these methods keeps running in the background, wasting resources, since its lifecycle is not tied to the calling function.

decoder:

fit/decoder/decoder.go

Lines 274 to 285 in 1ea859a

done := make(chan struct{})
go func() {
	fit, err = d.Decode()
	close(done)
}()
select {
case <-ctx.Done():
	return nil, ctx.Err()
case <-done:
	return
}

encoder:

fit/encoder/encoder.go

Lines 208 to 219 in 1ea859a

done := make(chan struct{})
go func() {
	err = e.Encode(fit)
	close(done)
}()
select {
case <-ctx.Done():
	return ctx.Err()
case <-done:
	return
}

Proof: https://go.dev/play/p/RHGgnBkTmMh

This needs to change if we still want to support context propagation. Decoding/encoding FIT files is generally fast and may not need a context, but we might have to deal with very large FIT files and process hundreds, if not thousands, of them at the same time; early termination to save resources would be appreciated.

If we were to support this, there are some trade-offs to consider. We wouldn't want a context variable passed around to every method; that is not necessarily useful, since the dominant factor in how long the process takes is how many messages are in the FIT file. Having a context in every method only "pollutes" the logic: adding it to decodeCRC, for example, adds complexity and makes the code harder to read without adding value.

Moreover, adding a context check to methods called by Decode(), which explicitly does not use a context, only adds overhead. Even though it's small, doing nothing is better than doing something we would not use anyway. This would also affect SDK adopters: for instance, people using tinygo would no longer be able to use this SDK, since context is not supported there. But why do we care about tinygo?

First, in our opinion, tinygo is currently the best fit for WASM, since its binaries are compact compared to standard Go's relatively huge ones. Second, there might be a tinygo embedded-device project that uses this SDK to create FIT files; one I can think of is a GPS tracker device such as a cycling computer (?)

Well, the chances are small but not zero, and this is only anticipation (or speculation), but it's worthwhile if we can accommodate more adopters while it's still achievable. So let's create a layer of abstraction to support this without affecting our existing code too much; one possible shape is sketched below.
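The sketch wraps the io.Reader so that reads fail once the context is cancelled; the decode then terminates at its next read instead of running to completion, and no detached goroutine is needed at all. This is illustrative only, not the SDK's actual API; ctxReader is a hypothetical name.

import (
	"context"
	"io"
)

// ctxReader is a hypothetical io.Reader wrapper that fails fast once
// ctx is cancelled, so a decode loop reading from it terminates early.
type ctxReader struct {
	ctx context.Context
	r   io.Reader
}

func (c *ctxReader) Read(p []byte) (int, error) {
	if err := c.ctx.Err(); err != nil {
		return 0, err // surfaces context.Canceled / context.DeadlineExceeded
	}
	return c.r.Read(p)
}

The decoder would then be constructed as usual, e.g. decoder.New(&ctxReader{ctx: ctx, r: f}), keeping context handling out of every internal method.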

hash/crc16: reduce pointer value dereference on Write method

  1. c.crc is dereferenced on each iteration, making the process slower. We can dereference it once into a local variable, use that variable inside the loop, and only after the loop finishes copy the value back to c.crc. See the sketch after the snippet below.
    https://github.com/muktihari/fit/blob/8b94fa933ac5cca0285f30d1a0409e21f2c21c89/kit/hash/crc16/crc16.go#L38C1-L43C2
func (c *crc16) Write(p []byte) (n int, err error) {
	for _, b := range p {
		c.crc = c.compute(c.crc, b) // <- this
	}
	return len(p), nil
}
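A minimal sketch of the proposed fix: hoist the field into a local, iterate on the local, and write it back once after the loop.

func (c *crc16) Write(p []byte) (n int, err error) {
	crc := c.crc // dereference once
	for _, b := range p {
		crc = c.compute(crc, b)
	}
	c.crc = crc // copy the local value back after the loop finishes
	return len(p), nil
}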
  2. Potential misuse may occur by altering the value of the global variable fitTable returned by MakeFitTable(); it should instead return a new copy of the table.
    https://github.com/muktihari/fit/blob/8b94fa933ac5cca0285f30d1a0409e21f2c21c89/kit/hash/crc16/crc16.go#L18C1-L25C1
var fitTable = &Table{
	0x0000, 0xCC01, 0xD801, 0x1400, 0xF001, 0x3C00, 0x2800, 0xE401,
	0xA001, 0x6C00, 0x7800, 0xB401, 0x5000, 0x9C01, 0x8801, 0x4400,
}

// MakeFitTable is the table defined in [https://developer.garmin.com/fit/protocol]
func MakeFitTable() *Table { return fitTable } // <- this
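A sketch of the copy-returning version, assuming Table is an array type so that dereferencing copies the values:

func MakeFitTable() *Table {
	t := *fitTable // copy the table values; callers get their own instance
	return &t
}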
  3. After we make MakeFitTable() return a new copy, table allocation increases for each Decoder. To address this, let's reuse fitTable when a nil table is passed, and in decoder.New() and encoder.New(), change crc16.New(crc16.MakeFitTable()) to crc16.New(nil).
    https://github.com/muktihari/fit/blob/8b94fa933ac5cca0285f30d1a0409e21f2c21c89/kit/hash/crc16/crc16.go#L29C1-L31C2
func New(table *Table) hash.Hash16 {
	return &crc16{table: table}
}
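With the nil fallback from point 3, New would look roughly like this:

func New(table *Table) hash.Hash16 {
	if table == nil {
		table = fitTable // reuse the shared table, no extra allocation
	}
	return &crc16{table: table}
}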
  4. Rename MakeFitTable to MakeFITTable: since FIT is an abbreviation, the word should be written either "fit" or "FIT", not "Fit".
  5. Consider putting this hash package under internal, since it is only used internally; this drops support for any external dependents of the package. However, the hash package should remain exported since we have RawDecoder.

decoder: []byte buffer extra copying elimination

In the current implementation, each read operation copies the bytes read into bytesArray to avoid an alloc (make([]byte, n)):

https://github.com/muktihari/fit/blob/8b94fa933ac5cca0285f30d1a0409e21f2c21c89/decoder/decoder.go#L48C2-L50C26

When working with an io.Reader that requires syscalls, e.g. *os.File, it's encouraged to wrap it with a buffer such as *bufio.Reader to reduce syscalls, as a syscall is typically more expensive than copying into a buffer (memmove).

However, when the two implementations coincide, we get extra copying, O(2n), as follows:

  1. copy from *os.File's Read([]byte) into *bufio.Reader's buf
  2. copy from *bufio.Reader's buf into the Decoder's bytesArray; only then can we consume it

We could eliminate the extra copy by implementing our own custom buffer, since we would then have full control over buf.

type readbuf struct {
	buf []byte // 765 + at least 4096 (default)
	cur int    // cursor
}

We can create the API somewhat like this:

func (r *readbuf) ReadN(n int) ([]byte, error) {} // `n` will never exceed 765 bytes
  1. Get n bytes from the readbuf; when buf is empty, fill it with 4096 bytes from the *os.File, then return [n]byte, cur = n+1.
  2. When n == cap(buf)-1, fill buf[0:4096+1] again, cur = 0.
  3. Else, if n > rem (rem = cap(buf) - cur), copy(buf[:rem+1], buf[cur:]), then fill buf[rem+1:] with 4096 bytes from the *os.File, cur = 0, return [n]byte -> cur = n+1. Repeat the process.

This way, we only copy the bytes from the *os.File into buf once, plus a small splice (< 765 bytes) each time we hit point 3, reducing the O(2n) operation to almost O(n). A sketch of ReadN along these lines follows.
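A sketch of ReadN under these assumptions (buf sized 4096+765, n never exceeding 765); the `last` field tracking the end of valid data is an addition not in the outline above, and the exact offsets may differ in a real implementation:

import "io"

type readbuf struct {
	r    io.Reader // e.g. *os.File
	buf  []byte    // make([]byte, 4096+765)
	cur  int       // read cursor
	last int       // end of valid data in buf
}

func (r *readbuf) ReadN(n int) ([]byte, error) {
	if r.last-r.cur < n { // not enough buffered bytes
		rem := copy(r.buf, r.buf[r.cur:r.last]) // splice remainder (< 765 bytes) to front
		r.cur, r.last = 0, rem
		for r.last-r.cur < n { // fill from the underlying reader
			m, err := r.r.Read(r.buf[r.last:])
			if m == 0 && err != nil {
				return nil, err
			}
			r.last += m
		}
	}
	b := r.buf[r.cur : r.cur+n] // no copy: a view into buf, valid until the next ReadN
	r.cur += n
	return b, nil
}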

proto: Value takes ownership of a slice, update documentation

When creating a Value from a slice and retrieving a slice from a Value, the underlying data is not copied, to avoid unnecessary allocs, since Value is designed for one-time, short-lived use. However, this invites misuse: if the user is unaware and modifies the slice, the slice inside the Value changes as well. Let's make the documentation clearer.

https://github.com/muktihari/fit/blob/8b94fa933ac5cca0285f30d1a0409e21f2c21c89/proto/value.go#L731C1-L734C2

// SliceFloat64 converts []float64 as Value. <- update this
func SliceFloat64[S []E, E ~float64](s S) Value {
        return Value{num: uint64(len(s)), any: unsafe.SliceData(*(*[]float64)(unsafe.Pointer(&s)))} // <- takes ownership
}

https://github.com/muktihari/fit/blob/8b94fa933ac5cca0285f30d1a0409e21f2c21c89/proto/value.go#L361C1-L369C1

// SliceFloat64 returns Value as []float64, if it's not a valid []float64 value, it returns nil. <- update this
func (v Value) SliceFloat64() []float64 {
        ptr, ok := v.any.(*float64)
        if !ok {
                return nil
        }
        return unsafe.Slice(ptr, v.num) // <- takes ownership
}
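Suggested doc wording for both (bodies unchanged from the snippets above):

// SliceFloat64 converts []float64 into Value. The Value takes ownership of s:
// the underlying array is not copied, so the caller must not modify s after
// this call.
func SliceFloat64[S []E, E ~float64](s S) Value {
	return Value{num: uint64(len(s)), any: unsafe.SliceData(*(*[]float64)(unsafe.Pointer(&s)))}
}

// SliceFloat64 returns the Value as []float64; if it does not hold a valid
// []float64, it returns nil. The returned slice shares memory with the Value
// and must not be modified.
func (v Value) SliceFloat64() []float64 {
	ptr, ok := v.any.(*float64)
	if !ok {
		return nil
	}
	return unsafe.Slice(ptr, v.num)
}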

Decoder Benchmark Results on Windows with 12th Gen Intel i7-12700H Laptop Version

Here are the Decoder benchmark results on Windows with a 12th Gen Intel i7-12700H (laptop version), as requested.

Syntax:

cd decoder/
go test -bench ^BenchmarkDecode -benchmem -count 10

Result

goos: windows
goarch: amd64
pkg: github.com/muktihari/fit/decoder
cpu: 12th Gen Intel(R) Core(TM) i7-12700H
BenchmarkDecodeMessageData-20             516486              2262 ns/op            3456 B/op          1 allocs/op
BenchmarkDecodeMessageData-20             588439              2337 ns/op            3456 B/op          1 allocs/op
BenchmarkDecodeMessageData-20             489655              2233 ns/op            3456 B/op          1 allocs/op
BenchmarkDecodeMessageData-20             525909              2437 ns/op            3456 B/op          1 allocs/op
BenchmarkDecodeMessageData-20             518671              2523 ns/op            3456 B/op          1 allocs/op
BenchmarkDecodeMessageData-20             474898              2348 ns/op            3456 B/op          1 allocs/op
BenchmarkDecodeMessageData-20             501902              2630 ns/op            3456 B/op          1 allocs/op
BenchmarkDecodeMessageData-20             585914              2479 ns/op            3456 B/op          1 allocs/op
BenchmarkDecodeMessageData-20             482737              2426 ns/op            3456 B/op          1 allocs/op
BenchmarkDecodeMessageData-20             428689              2365 ns/op            3456 B/op          1 allocs/op
BenchmarkDecode-20                            25          45101612 ns/op        77060420 B/op     100039 allocs/op
BenchmarkDecode-20                            26          46000412 ns/op        77060603 B/op     100039 allocs/op
BenchmarkDecode-20                            24          47202175 ns/op        77060396 B/op     100039 allocs/op
BenchmarkDecode-20                            26          46334173 ns/op        77060399 B/op     100039 allocs/op
BenchmarkDecode-20                            24          46923171 ns/op        77060405 B/op     100039 allocs/op
BenchmarkDecode-20                            22          46817300 ns/op        77060398 B/op     100039 allocs/op
BenchmarkDecode-20                            22          47535132 ns/op        77060389 B/op     100039 allocs/op
BenchmarkDecode-20                            25          47507604 ns/op        77060475 B/op     100039 allocs/op
BenchmarkDecode-20                            25          45809960 ns/op        77060403 B/op     100039 allocs/op
BenchmarkDecode-20                            24          44326008 ns/op        77060401 B/op     100039 allocs/op
BenchmarkDecodeWithFiledef-20                 20          62280170 ns/op        96960784 B/op     200047 allocs/op
BenchmarkDecodeWithFiledef-20                 22          58066368 ns/op        96960767 B/op     200047 allocs/op
BenchmarkDecodeWithFiledef-20                 20          61198835 ns/op        96960806 B/op     200047 allocs/op
BenchmarkDecodeWithFiledef-20                 19          58542326 ns/op        96960792 B/op     200047 allocs/op
BenchmarkDecodeWithFiledef-20                 19          57530174 ns/op        96960809 B/op     200047 allocs/op
BenchmarkDecodeWithFiledef-20                 21          57851810 ns/op        96960798 B/op     200047 allocs/op
BenchmarkDecodeWithFiledef-20                 19          58226537 ns/op        96960802 B/op     200047 allocs/op
BenchmarkDecodeWithFiledef-20                 19          57598268 ns/op        96960782 B/op     200047 allocs/op
BenchmarkDecodeWithFiledef-20                 19          56857942 ns/op        96960844 B/op     200048 allocs/op
BenchmarkDecodeWithFiledef-20                 19          57165637 ns/op        96960904 B/op     200048 allocs/op
PASS
ok      github.com/muktihari/fit/decoder        43.805s

decoder: reduce alloc + copy before passing the mesg to Listener

The current implementation of decoding message data allocates a new slice for Fields (and DeveloperFields) for every decoded message, regardless of whether the message is retained to be returned by the decoder as Messages on proto.FIT.

https://github.com/muktihari/fit/blob/8b94fa933ac5cca0285f30d1a0409e21f2c21c89/decoder/decoder.go#L630C2-L631C37

func (d *Decoder) decodeMessageData(header byte) error {
	...
	mesg.Fields = make([]proto.Field, len(mesg.Fields)) // <- unnecessary alloc if broadcastOnly is true.
	copy(mesg.Fields, d.fieldsArray[:])                 // <- unnecessary copy if broadcastOnly is true.
	...
	for _, mesgListener := range d.mesgListeners {
		mesgListener.OnMesg(mesg)
	}
	...
}

Every listener shares the same slice anyway; if any listener modifies a value, the other listeners are impacted. This was intentionally designed to reduce allocs, since making a copy for every listener would hurt performance significantly.

However, I think the current implementation is still inefficient. We could make the lifecycle of the mesg object short-lived: the mesg is guaranteed to be valid and unchanged only until OnMesg returns for each listener. The rule would be that every listener must handle the mesg completely before returning, and a listener designed to be non-blocking must copy the mesg before processing it concurrently.

But wouldn't having every listener copy the mesg hurt performance, as stated before? It depends: a listener can reuse a mesg object to avoid Fields and DeveloperFields allocations, which would likely be more efficient, outweighing the tradeoff we're making. A sketch of such a listener is below.
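A sketch of a listener that reuses one mesg object across calls; the field names assume proto.Message exposes Num, Fields, and DeveloperFields, so treat this as illustrative rather than the SDK's exact API:

type reusingListener struct {
	mesg proto.Message // reused across OnMesg calls to avoid per-message allocs
}

func (l *reusingListener) OnMesg(mesg proto.Message) {
	l.mesg.Num = mesg.Num
	l.mesg.Fields = append(l.mesg.Fields[:0], mesg.Fields...) // reuse the backing array
	l.mesg.DeveloperFields = append(l.mesg.DeveloperFields[:0], mesg.DeveloperFields...)
	// Handle l.mesg fully before returning; if processing must be
	// concurrent, hand off this copy, not the incoming mesg.
}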
