V1.1.0 Improved precision and better performance

All basic functions were restructured to allow access to the full internal 56-bit precision. This was necessary to completely rewrite all trigonometric functions and to update the logarithm and exponential functions. As a result, most higher math functions (like sin, cos, tan, asin, acos, atan, log and exp) now have increased precision and reduced execution time, while code size remains small. As an example, execution time for fp64_sin is now between 600 and 650 microseconds on a standard 16 MHz Arduino MEGA 2560, or about 9500 to 10500 clock ticks.

V1.1.0 also includes some minor bugfixes and code improvements.

V1.0.5 available

I fixed a bug that caused incorrect rounding when a carry propagated across all digits, including the one before the decimal point. Check out the latest version via the Arduino library manager.

Initial release V1.0 available

After 2 years of development, reading several hundred pages of algorithms, after typing more than half a million lines of code, after creating, testing and verifying more than 19000 test cases, after endless hours/nights/weekends of hunting down nasty bugs, after documenting more than 60 top level functions, it’s finally good enough to be released as V1.0.

The initial release of fp64lib for Atmel AVR 328 microprocessors is available here as a downloadable library for the Arduino IDE.

Have fun using it. I am happy to receive any feedback via mail (at) fp64lib (dot) org.

Conversion functions

As fp64lib is an add-on library, getting data into the float64_t datatype and out of it is key. The following will not work:

float64_t x = 1.0;
float64_t y = (float64_t) 3.141;
float64_t z = 10;

The above statements will all load some data into x, y, and z – but it will definitely not be 1.0, 3.141 or 10! Instead, use the following code that will work correctly:

float64_t x = fp64_sd(1.0);
float64_t y = fp64_sd(3.141);
float64_t z = fp64_uint32_to_float64(10);
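
To see why the plain assignments go wrong, here is a small sketch that runs on a desktop compiler, not on the AVR. It assumes that float64_t is nothing more than a 64-bit unsigned integer holding an IEEE 754 bit pattern, and that the host's double is IEEE 754 binary64. The names sim_float64_t, naive_assign, ieee754_bits and show are made up for this illustration and are not part of fp64lib.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Assumption: stand-in for fp64lib's float64_t, taken here to be a plain
   64-bit unsigned integer that only stores the IEEE 754 bit pattern. */
typedef uint64_t sim_float64_t;

/* What "float64_t x = value;" amounts to under that assumption: an
   ordinary numeric conversion to an integer, not an encoding step. */
sim_float64_t naive_assign(double value) {
    return (sim_float64_t)value;
}

/* Bit pattern a correctly encoded 64-bit float would have.
   Host-only helper; relies on the host double being binary64. */
uint64_t ieee754_bits(double value) {
    uint64_t bits;
    assert(sizeof bits == sizeof value);
    memcpy(&bits, &value, sizeof bits);
    return bits;
}

/* Print both results side by side for comparison. */
void show(double value) {
    printf("%g: naive = 0x%016" PRIX64 ", encoded = 0x%016" PRIX64 "\n",
           value, naive_assign(value), ieee754_bits(value));
}
```

Calling show(1.0), show(3.141) and show(10.0) from a main() prints the truncated integer next to the encoded bit pattern. Under the stated assumption, the naive assignment stores the integer 1, 3 or 10, which read back as a 64-bit float is a tiny subnormal number rather than the intended value.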

fp64lib provides a number of functions to convert data to and from all native C data types:

C data type           convert to float64_t           convert from float64_t
long long             fp64_int64_to_float64()        fp64_to_int64()
unsigned long long    fp64_uint64_to_float64()       fp64_to_uint64()
long                  fp64_int32_to_float64()        fp64_to_int32()
                      fp64_long_to_float64()
unsigned long         fp64_uint32_to_float64()       fp64_to_uint32()
int                   fp64_int32_to_float64() *      fp64_to_int16()
unsigned int          fp64_uint32_to_float64() *     fp64_to_uint16()
char                  fp64_int32_to_float64() *      fp64_to_int8()
unsigned char         fp64_uint32_to_float64() *     fp64_to_uint8()
float                 fp64_sd()                      fp64_ds()
char*                 fp64_strtod()                  fp64_to_decimalExp()
                                                     fp64_to_string()

* For these data types, no dedicated routine was needed, as the compiler automatically extends (“coerces”) the smaller data type to a signed or unsigned long. So the following two lines are equivalent:

float64_t x = fp64_int32_to_float64( 17 );
float64_t x = fp64_int32_to_float64( (long) 17 );

So you can go with the short version.
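
The coercion can be checked on any C compiler with a short sketch. The two functions below are hypothetical stand-ins for converters that take a 32-bit argument (such as fp64_int32_to_float64() and fp64_uint32_to_float64()); they simply return their argument so the widened value can be inspected.

```c
#include <stdint.h>

/* Hypothetical stand-in for a converter with a signed 32-bit parameter.
   Returns the argument unchanged so the widened value is visible. */
int32_t as_int32_argument(int32_t value) {
    return value;
}

/* Hypothetical stand-in for a converter with an unsigned 32-bit parameter. */
uint32_t as_uint32_argument(uint32_t value) {
    return value;
}
```

Passing a signed char, a short or an int to as_int32_argument() yields the same numeric value, sign included, so no explicit cast is required. One caveat from the C conversion rules: a negative value passed to the unsigned variant wraps around, so as_uint32_argument(-1) returns 4294967295. Signed data should therefore go through the signed converter.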

More to come soon…

Soon, the initial version of a full 64-bit floating point library for the Atmel AVR 328 microprocessors, which are used for example in the popular Arduino boards, will be released.

Besides the basic mathematical operations (+, -, *, /), the library implements all the necessary standard functions of math.h, like sin(), sqrt() or log(), and most IEEE 754 features like NaN, Inf, signed zero and subnormal numbers.

All routines are optimized for minimal size, leaving enough flash space for your application.