I'm a software engineer who is interested in code optimization and creating libraries optimized for specific purposes, with a main focus on Go programming language.
A FIT SDK for decoding and encoding Garmin FIT files in Go supporting FIT Protocol V2.
License: BSD 3-Clause "New" or "Revised" License
While the user receives an immediate context cancellation error, the goroutine spawned by these methods keeps running in the background, wasting resources, since its lifecycle is not tied to the caller's.
decoder:
Lines 274 to 285 in 1ea859a
encoder:
Lines 208 to 219 in 1ea859a
Proof: https://go.dev/play/p/RHGgnBkTmMh
This needs to be changed if we still want to support context propagation. Decoding/encoding FIT files is generally fast and may not need a context, but there is a chance we will deal with very large FIT files, or need to process hundreds, if not thousands, of them at the same time; early termination to save resources would be appreciated.
If we were to support this, there are some trade-offs to consider. We wouldn't want to pass a context variable around to every method; this is not necessarily useful, since the factor that most affects how long the process takes is how many messages there are in the FIT file. Having a context in every method only "pollutes" the logic; e.g. adding it to decodeCRC only adds complexity and hurts readability, without added value.
Moreover, having a context in a method called by Decode(), which explicitly doesn't use a context, only adds context-checking overhead; even though it's tiny, doing nothing is better than doing something we wouldn't use anyway. This would also affect SDK adopters: people using TinyGo would no longer be able to use this SDK, since context is not supported there. But why do we care about TinyGo?
First, in our opinion, TinyGo is currently the best fit for WASM, since its binary size is compact compared to standard Go's relatively huge binaries. Second, there might be a TinyGo embedded-device project that uses this SDK to create FIT files; one I can think of is a GPS tracker device such as a cycling computer.
Well, the chances are small but not zero, and this is only anticipation (or speculation), but it's worth accommodating more adopters while it's still achievable. So let's create a layer of abstraction to support this without affecting our existing code too much.
func (c *crc16) Write(p []byte) (n int, err error) {
	for _, b := range p {
		c.crc = c.compute(c.crc, b) // <- this
	}
	return len(p), nil
}
fitTable is returned directly by MakeFitTable(); it should instead return a new copy of the table.

var fitTable = &Table{
	0x0000, 0xCC01, 0xD801, 0x1400, 0xF001, 0x3C00, 0x2800, 0xE401,
	0xA001, 0x6C00, 0x7800, 0xB401, 0x5000, 0x9C01, 0x8801, 0x4400,
}

// MakeFitTable is the table defined in [https://developer.garmin.com/fit/protocol]
func MakeFitTable() *Table { return fitTable } // <- this
In decoder.New() and encoder.New(), change crc16.New(crc16.MakeFitTable()) to crc16.New(nil):

func New(table *Table) hash.Hash16 {
	return &crc16{table: table}
}
Rename MakeFitTable to MakeFITTable: the word should be either "fit" or "FIT", since FIT is an abbreviation.

Move the hash package under internal, since it is only used internally; this drops support for external dependencies that might use this package. On second thought, the hash package should stay exported, since we have RawDecoder.
In the current implementation, each read operation copies the read []byte into bytesArray to avoid an allocation (make([]byte, n)):
When working with an io.Reader that requires syscalls, e.g. *os.File, it's encouraged to wrap it with a buffer such as *bufio.Reader to reduce syscalls, as a syscall is typically more expensive than copying to a buffer (memmove). However, when the two implementations coincide, we end up with extra copying, O(2n), as follows:

1. Read from the *os.File ([]byte) into the *bufio.Reader's buf.
2. Copy from buf into the *Decoder's bytesArray; only then can we consume it.

We could eliminate the extra copying by implementing our own custom buffer, since we would have full control over the buffer buf.
type readbuf struct {
	buf []byte // 765 + at least 4096 (default)
	cur int    // cursor
}
We can create the API somewhat like this:
func (r *readbuf) ReadN(n int) ([]byte, error) {} // `n` will never exceed 765 bytes
Inside the readbuffer:

1. When buf is empty, fill it with 4096 bytes from the *os.File, then return [n]byte, cur = n+1.
2. When n == cap(buf)-1, fill buf[0:4096+1] again, cur = 0.
3. When n > rem (rem = cap(buf) - cur), copy(buf[:rem+1], buf[cur:]), then fill buf[rem+1:] with 4096 bytes from the *os.File; cur = 0, return [n]byte -> cur = n+1. Repeat the process.

This way, we only need to copy the bytes from the *os.File to buf once, plus one small splice (< 765 bytes) each time we hit point 3, reducing the O(2n) operation to almost O(n).
When creating a Value from a slice, and when retrieving a slice from a Value, the underlying data is not copied, to reduce unnecessary allocation, since Value is designed for one-time, short-lived use. However, this opens up potential misuse: if the user is unaware of this and modifies the slice, the slice inside the Value changes as well. Let's make the documentation clearer.
// SliceFloat64 converts []float64 as Value. <- update this
func SliceFloat64[S []E, E ~float64](s S) Value {
	return Value{num: uint64(len(s)), any: unsafe.SliceData(*(*[]float64)(unsafe.Pointer(&s)))} // <- takes ownership
}

// SliceFloat64 returns Value as []float64, if it's not a valid []float64 value, it returns nil. <- update this
func (v Value) SliceFloat64() []float64 {
	ptr, ok := v.any.(*float64)
	if !ok {
		return nil
	}
	return unsafe.Slice(ptr, v.num) // <- takes ownership
}
Here are the Decoder benchmark results on Windows with a 12th Gen Intel i7-12700H (laptop version), as requested.

Command:
cd decoder/
go test -bench ^BenchmarkDecode -benchmem -count 10
Result
goos: windows
goarch: amd64
pkg: github.com/muktihari/fit/decoder
cpu: 12th Gen Intel(R) Core(TM) i7-12700H
BenchmarkDecodeMessageData-20 516486 2262 ns/op 3456 B/op 1 allocs/op
BenchmarkDecodeMessageData-20 588439 2337 ns/op 3456 B/op 1 allocs/op
BenchmarkDecodeMessageData-20 489655 2233 ns/op 3456 B/op 1 allocs/op
BenchmarkDecodeMessageData-20 525909 2437 ns/op 3456 B/op 1 allocs/op
BenchmarkDecodeMessageData-20 518671 2523 ns/op 3456 B/op 1 allocs/op
BenchmarkDecodeMessageData-20 474898 2348 ns/op 3456 B/op 1 allocs/op
BenchmarkDecodeMessageData-20 501902 2630 ns/op 3456 B/op 1 allocs/op
BenchmarkDecodeMessageData-20 585914 2479 ns/op 3456 B/op 1 allocs/op
BenchmarkDecodeMessageData-20 482737 2426 ns/op 3456 B/op 1 allocs/op
BenchmarkDecodeMessageData-20 428689 2365 ns/op 3456 B/op 1 allocs/op
BenchmarkDecode-20 25 45101612 ns/op 77060420 B/op 100039 allocs/op
BenchmarkDecode-20 26 46000412 ns/op 77060603 B/op 100039 allocs/op
BenchmarkDecode-20 24 47202175 ns/op 77060396 B/op 100039 allocs/op
BenchmarkDecode-20 26 46334173 ns/op 77060399 B/op 100039 allocs/op
BenchmarkDecode-20 24 46923171 ns/op 77060405 B/op 100039 allocs/op
BenchmarkDecode-20 22 46817300 ns/op 77060398 B/op 100039 allocs/op
BenchmarkDecode-20 22 47535132 ns/op 77060389 B/op 100039 allocs/op
BenchmarkDecode-20 25 47507604 ns/op 77060475 B/op 100039 allocs/op
BenchmarkDecode-20 25 45809960 ns/op 77060403 B/op 100039 allocs/op
BenchmarkDecode-20 24 44326008 ns/op 77060401 B/op 100039 allocs/op
BenchmarkDecodeWithFiledef-20 20 62280170 ns/op 96960784 B/op 200047 allocs/op
BenchmarkDecodeWithFiledef-20 22 58066368 ns/op 96960767 B/op 200047 allocs/op
BenchmarkDecodeWithFiledef-20 20 61198835 ns/op 96960806 B/op 200047 allocs/op
BenchmarkDecodeWithFiledef-20 19 58542326 ns/op 96960792 B/op 200047 allocs/op
BenchmarkDecodeWithFiledef-20 19 57530174 ns/op 96960809 B/op 200047 allocs/op
BenchmarkDecodeWithFiledef-20 21 57851810 ns/op 96960798 B/op 200047 allocs/op
BenchmarkDecodeWithFiledef-20 19 58226537 ns/op 96960802 B/op 200047 allocs/op
BenchmarkDecodeWithFiledef-20 19 57598268 ns/op 96960782 B/op 200047 allocs/op
BenchmarkDecodeWithFiledef-20 19 56857942 ns/op 96960844 B/op 200048 allocs/op
BenchmarkDecodeWithFiledef-20 19 57165637 ns/op 96960904 B/op 200048 allocs/op
PASS
ok github.com/muktihari/fit/decoder 43.805s
The current implementation of decoding message data allocates a new slice for Fields (and DeveloperFields) for every decoded message, regardless of whether the message is retained to be returned by the decoder as Messages in proto.FIT.
func (d *Decoder) decodeMessageData(header byte) error {
	...
	mesg.Fields = make([]proto.Field, len(mesg.Fields)) // <- unnecessary alloc if broadcastOnly is true.
	copy(mesg.Fields, d.fieldsArray[:])                 // <- unnecessary copy if broadcastOnly is true.
	...
	for _, mesgListener := range d.mesgListeners {
		mesgListener.OnMesg(mesg)
	}
	...
}
Every listener receives a shared slice anyway; if any listener modifies a value, the other listeners are impacted. This was intentionally designed to reduce allocations, since making a copy for every listener would hurt performance a lot.
However, I think the current implementation is still inefficient. We could make the lifecycle of the mesg object short-lived: the mesg is guaranteed to be valid and unchanged only until OnMesg returns for each listener. We would then make a rule that every listener must handle the mesg completely before returning, and that a listener designed in a non-blocking way must copy the mesg before processing it concurrently.
But wouldn't having every listener copy the mesg hurt performance a lot, as stated before? It depends: a listener can keep a reusable mesg object to avoid Fields and DeveloperFields allocations on every call. That approach would likely be more efficient, outweighing the trade-off we're making.