The posit from tochinet

Posit Library for Arduino

This is an C/C++ library for posit8 and posit16 floating point arithmetic support in Arduino.

Posit Arithmetic was invented by John Gustafson. It is an alternative floating point format to IEEE 754 that promises a more efficient and balanced precision, especially useful for AI. Any number representation system using N bits can represent exactly 2^n values. All other numbers have to be rounded (up or down). While IEEE 754 numbers have fixed sizes for exponent (power of two) and mantissa (fractional part), hence can always express 2^mantissa values between any successive powers of two, such as between 1 and 2, or between -1/64 and -1/32, posits have a variable length exponent, so they can exactly express more values around +/-1, where more calculations typically take place, and are less precise for very big or very small numbers (like 4096 or 1E-6). The Posit Standard was released in 2022 and defines the storage format, operation behavior and required mathematical functions for Posits. It differs in some design choices from previous publications on posit arithmetic, and is only partially covered here, since not all decisions are equally applicable to the Arduino environment.

Posits can be any size from 2 to 32 bits or even more. Only 8-bit and 16-bit are considered in this library, and only the ATmega368 architecture is targeted (UNO etc.) at this stage, since 32-bit architectures often have hardware acceleration for IEEE floats.

No code was copied from any existing work, but some inspiration came from the SoftPosit C reference library, from section IV of the https://arxiv.org/pdf/2308.03425 paper (on division algorithms and rounding, leading to the conclusion that rounding to nearest even is not likely worth pursuing on Arduino), and from many other pages on the Internet (Quora, Stack Overflow, etc.).

WokWi was extensively used for the development of this library, as it responds much faster to code change than the Arduino software.

Status

This library is a work in progress, and my first experience in creating an Arduino library and publishing on GitHub, so expect errors and mistakes, stupid or not, and don't hesitate to contribute and propose correction and ameliorations. I tried to follow the official guide. As with all WIP, expect many, frequent and breaking changes. Remember this is also a way for me to learn.

Today, the library allows to create posit8 objects (both with 0 and 2 exponent bits) and posit16,2 objects from raw unsigned byte, from int (16 bits), float32 and double (also 32 bits on Arduino platform), to convert a posit back to float, to add, subtract, multiply and divide two posits.

Planned in coming iterations : prior/next, square root, comparisons, overloading of comparisons and += etc. operators, refactoring ... Maybe a better way to round operations will sneak in if it doesn't break the simplicity rule. Trigonometry functions may also be considered.

Some explanations on Floats and Posits.

IEEE 754 float representation

The standard Arduino library supports one type of floating point numbers, 32-bit IEEE 754. This is the most common standard for floating point calculations. A float consist of one bit sign (like signed integers), 8 bits exponent (power of two, biased by adding 127), and 23 bits of mantissa (the bits after the "1", that is not coded). There is no inversion of bits (2's complement) like for signed integers.

Expressing -10.5 in float requires the following steps :

coding the absolute value in binary (1010.1)
finding the power of 2 (3) and add 127 (total 130)
coding the sign, the exponent and the mantissa

Hence the 32-bit float representation of -10.5is 0b1100 0001 0010 1000 0000 0000 0000 0000 (with exponent bits bold and italicized).

Funny enough many simple reals cannot be expressed exactly. For example 0.1 is rounded tp 0b0 0011 1101 1100 1100 1100 1100 1100 1101 (positive, power -4, mantissa 1.6000000238418579), or 0,1000000014901161. IEEE 754 also defines some exceptions such as +/- infinity (exponent 255, mantissa all zeros), two different zeros (+0 is all zeros and -0 is 1 followed by all zeros), subnormal numbers (very small), and different representations of Not a Number (NAN).

Posit float representation

The major invention in the posit format was to add one additional variable-length field (called "regime") between the sign and the exponent. All regime bits are equal, and the first different bit marks the end of the regime field. The (fixed) number of exponent bits ("E" field in the picture, also called "es" bits) was an externally defined parameter. This means there were multiple, incompatible versions of posits, depending on the es parameter. To remove that caveat, the posit standard imposes es=2 for all sizes. However, the library supports both Posit8,0 (no exponent field) and Posit8,2.

Fig.1 : General Posit Format (from Posit Standard(2022))

Precision extension (for example from 8 to 16 bits) can be done simply by adding zeros at the end, expressing the very same numbers. Posit arithmetic implies that there is never underflow or overflow, only "rounding errors". For example, in posit8,2 arithmetic, 1024+224 is ... 1024 and there are no values expressing numbers between 16384 and 32768. There are also as many numbers bigger than one than between zero and one.

Examples of tiny posit sizes make it easier to understand the concept: since a posit2 number has 2 bits, it can obviously only express a set of 4 values. These are zero, one, minus one, and infinity(Not a Real, or NaR). Each time one bit is added, one value is inserted between each value already in the set. So a 3-bit posit will be able to express exactly the same four values, but also four new values : +/-2 and +/-0.5 if es=0, or +/-16 and +/-1/16 if es=2. Posit4,0 adds positive and negative values 1/4, 3/4, 3/2 and 4, and so on. An interactive visualisation project shows you all possibilities for Posits up to 8 bits.

The number of exponent bits, in the field between the regime and the mantissa, extends the limits of expressiveness, to the detriment of precision : posit3,1 numbers express exactly 4 and 0.25 instead of 2 and 0.5. Similarly, posit4,1 can express 0, 1/16, 1/4, 1/2, 1, 2, 4, 16 and infinity (and negatives). Most often the numbers expressed above and below one are reciprocal (their product is one), but this isn't true for values with mantissa bits : the reciprocal of 1.5 is 2/3=0.6666... but the posit number is 0.75 (3/4).

(Assumed) deviations from the standard

The main purpose of this library is to provide a more efficient alternative to the existing float arithmetic on constrained microcontrollers (ATmega328). This has several implications :

Support of 32-bit Posits is not considered, because the increased precision (vs. float32) is a non-objective, and float64 doesn't exist in Arduino.
The absence of overflow is a positive aspect of posits. However, underflow to zero is likely desirable for IoT applications etc. Hence the library will round down posits smaller than 1E-6 (1ppm) to zero by default for Posit8, and its square 1E-12 for Posit16. This is parametrizable by defining ESPILON in your sketch before including the library.
Rounding towards zero is preferred to "Rounding to nearest even" because it comes with much lower complexity (no guard/round/sticky bits to process). Beware that this means that 64.0 - 0.5 = 32.0.

tochinet / posit Goto Github PK

posit's Introduction

Posit Library for Arduino

Status

Some explanations on Floats and Posits.

IEEE 754 float representation

Posit float representation

(Assumed) deviations from the standard

posit's People

Contributors

Stargazers

Watchers

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent