Creating a full-blown 64-bit floating point library requires some planning and a working structure. The following design and coding principles helped me maintain consistency across the dozens of files and algorithms.
Design principles are overall guidelines, independent of the current algorithm. They cover both the overall architecture of the library and its files, and some more implementation-specific guidelines.
For some topics, such as the trigonometric functions (e.g. sine or cosine), I had to decide which algorithm to choose. These special decisions are covered in the chapter “Design Decisions” below.
- The overall objective is to create a full-blown 64-bit floating point library for the family of Atmel AVR microprocessors in assembly language. “Full-blown” means that all functions of “math.h” are available.
- In addition, all functions implemented in avr_f64, the 64-bit C library, have to be made available.
- Minimize the code size, even at the cost of more execution time! Every byte counts!
- As a consequence, common routines (like shifting) are to be placed in separate files (1 routine per file) as a subroutine.
- In order to facilitate later integration into the AVR GCC compiler, structure and naming of files follows the already existing 32-bit float library, called libm32 throughout the rest of this document.
- All functions are callable via the C language interface.
Externally called functions are prefixed by “fp64_”.
Internal functions are prefixed by “__fp64_”.
- The library is not designed to be re-entrant, i.e. it supports neither calls from multiple threads nor calls from interrupts. As even the simplest 64-bit floating point operation takes several hundred instruction cycles, these operations should not be used in interrupt routines.
- If possible, all values should be stored in registers.
- With the AVR 328 architecture, it is not possible to store more than four 64-bit values in its 32 registers. If an algorithm needs more space for variables, static memory is preferred over stack space, as it is easier to address and to handle. However, using static memory is only possible because the library is designed not to be re-entrant. If you call library routines inside interrupts, unexpected side effects may occur due to the use of static memory!
- Macros are to be avoided as they increase code size. Exceptions are the commonly used “XCALL” and “XJMP” macros used in avr32 to hide code generation that depends on the target architecture.
- Follow the IEEE 754 standard very closely with the following exceptions:
- Only one rounding mode is implemented. → The rounding mode cannot be set.
- fp64lib only supports “silent signalling”.
- For testing the implementation, the results of avr64 are the “gold” standard. An algorithm is considered correct if its results match the avr64 results, with the following exceptions:
- fp64lib implements calculations with subnormal numbers – these are not supported by avr64.
- Processing of non-normal numbers (NaN, +Inf, -Inf) in avr64 does not always follow IEEE 754. fp64lib adheres more strictly to IEEE 754.
- fp64lib is targeted for Arduino and Arduino IDE users. Therefore, normal Arduino users have to be able to follow development and testing of the library. As a consequence, no special tools (like AVRsim or AVRice) can be used for testing and debugging.
- The significand is 52 bits plus the “hidden” leading bit, i.e. 53 bits in total, which can be stored in seven 8-bit registers. The significand is always stored left-aligned including the “hidden” leading bit, i.e. the most significant bit (MSB) is always set. As a consequence of left-alignment, there are 3 “empty” bits at the right end (LSB) of the significand. These 3 bits are used to extend the significand to 56 bits. All internal calculations have to be made with at least 56-bit precision.
- For multiplication, the internal calculation is made with 72 bits (nine 8-bit registers) to minimize rounding errors. A full 106-bit (2*53 bit) multiplication provides no benefit over 72-bit precision, as both results are rounded to 53 bits for the end result.
- All constant values should be stored in program memory to avoid using memory for variables.
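
The left-aligned significand layout described above (53 bits including the hidden bit, extended by 3 low-order bits to 56) can be illustrated with a minimal C sketch. The struct `unpacked64` and the function `unpack64` are hypothetical names chosen for this illustration; fp64lib itself is written in assembly and keeps these parts in registers. The sketch only covers normal numbers.

```c
#include <stdint.h>

/* Hypothetical container for illustration only; not fp64lib's register layout. */
typedef struct {
    uint8_t  sign;         /* 0 = positive, 1 = negative */
    int16_t  exponent;     /* biased exponent as stored, 0..2047 */
    uint64_t significand;  /* 56 bits used: bit 55 = leading bit, bits 2..0 = extra precision */
} unpacked64;

/* Split the raw bit pattern of an IEEE 754 double into its parts.
 * Assumption: zero, subnormals, Inf and NaN are not treated specially here;
 * for them the hidden bit is simply not inserted. */
static unpacked64 unpack64(uint64_t bits)
{
    unpacked64 u;
    uint64_t fraction = bits & 0x000FFFFFFFFFFFFFULL;    /* 52 stored bits */

    u.sign = (uint8_t)(bits >> 63);
    u.exponent = (int16_t)((bits >> 52) & 0x7FF);
    if (u.exponent != 0 && u.exponent != 0x7FF)
        fraction |= 1ULL << 52;                          /* make the hidden bit explicit */
    u.significand = fraction << 3;                       /* left-align 53 bits inside 56 bits */
    return u;
}
```

For the bit pattern of 1.5 (`0x3FF8000000000000`), the 56-bit significand becomes `0xC0000000000000`: the two top bits are set and the three lowest bits are free to hold extra precision during a calculation.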
- For the trigonometric functions, the CORDIC algorithm was chosen because it is more space-efficient than the commonly used Taylor approximations. The main advantages of CORDIC are:
- The same table of constants is used both for trigonometric functions (sin, cos, tan) and the inverse trigonometric functions (arcsin, arccos, arctan). Taylor approximations use different tables for inverse trigonometric functions.
- In the main definition area 0 to PI/2, precision is independent of the input value. It depends solely on the number of table entries, which also determines the number of iterations, i.e. precision and timing are guaranteed. The precision of Taylor approximations depends strongly on both the input value and the number of table entries.
- CORDIC calculates both sin and cos of the input value in the same run. As a consequence, adding a simple division returns tan, as tan(x) = sin(x) / cos(x). For Taylor, either two runs are needed (one for sin(x), one for cos(x)) followed by a division, or a separate Taylor series is created for tan, needing more space for constant table entries.
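
To make the CORDIC properties listed above more concrete, here is a minimal sketch of CORDIC in rotation mode. It is not the fp64lib implementation: it uses C `double` arithmetic instead of 56-bit register shifts, only 16 table entries, and the hypothetical names `cordic_atan`, `CORDIC_GAIN` and `cordic_sincos`. It is meant for inputs in the range 0 to PI/2.

```c
#define CORDIC_ITERATIONS 16

/* Table of atan(2^-i); the number of entries fixes both precision and run time. */
static const double cordic_atan[CORDIC_ITERATIONS] = {
    0.7853981633974483,     0.4636476090008061,     0.24497866312686414,
    0.12435499454676144,    0.06241880999595735,    0.031239833430268277,
    0.015623728620476831,   0.007812341060101111,   0.0039062301319669718,
    0.0019531225164788188,  0.0009765621895593195,  0.0004882812111948983,
    0.00024414062014936177, 0.00012207031189367021, 6.103515617420877e-05,
    3.0517578115526096e-05
};

/* Reciprocal of the accumulated CORDIC gain, used as the start value for x. */
static const double CORDIC_GAIN = 0.6072529350088813;

/* Rotate the vector (CORDIC_GAIN, 0) by 'angle' in fixed micro-rotations.
 * Both results come out of the same loop: x converges to cos, y to sin. */
static void cordic_sincos(double angle, double *sin_out, double *cos_out)
{
    double x = CORDIC_GAIN;
    double y = 0.0;
    double z = angle;      /* remaining angle still to be rotated */
    double pow2 = 1.0;     /* 2^-i; a shift in a fixed-point implementation */

    for (int i = 0; i < CORDIC_ITERATIONS; i++) {
        double x_new;
        if (z >= 0.0) {
            x_new = x - y * pow2;
            y = y + x * pow2;
            z -= cordic_atan[i];
        } else {
            x_new = x + y * pow2;
            y = y - x * pow2;
            z += cordic_atan[i];
        }
        x = x_new;
        pow2 *= 0.5;
    }
    *cos_out = x;
    *sin_out = y;
}
```

The loop always runs the same number of iterations regardless of the input, and tan is obtained by dividing the two results. With 16 table entries the results are accurate to roughly four decimal places; more entries give more precision at the cost of more iterations.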
- If an algorithm was taken over from avr32, code structure (including naming of code labels) follows avr32 structure.
- To stay within the limited range of AVR conditional branch instructions, special cases are checked at the very beginning of the code, and the code handling these special cases is placed before the main entry point of the file.
- Overall structure of a file is:
- Comment header, including license
- File specific definitions, e.g. declaration of register usage
- Code for handling special cases
- Main Entry
- Checking for special cases
- Implementation of algorithm
- Returning value
- Constant table(s), if needed
- Static memory reservation(s), if needed
- To allow internal calling of algorithms, routines should provide an entry point after arguments were unpacked and checked.
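
The idea of a second entry point behind unpacking and checking can be sketched in C as follows. All names (`parts64`, `unpack_checked`, `internal_double`, `demo_mul2`, `demo_mul4`) are hypothetical and exist only for this illustration; the library itself does this in assembly with registers. The example operation is a multiplication by 2, chosen because it only touches the exponent.

```c
#include <stdint.h>

#define SIGN_MASK 0x8000000000000000ULL
#define FRAC_MASK 0x000FFFFFFFFFFFFFULL
#define EXP_MASK  0x7FFu

/* Hypothetical unpacked form, for illustration only. */
typedef struct {
    uint8_t  sign;
    uint16_t exponent;   /* biased exponent */
    uint64_t fraction;   /* 52 stored bits, hidden bit not included */
} parts64;

/* Unpack and check: returns 1 for normal numbers, 0 for special cases. */
static int unpack_checked(uint64_t bits, parts64 *p)
{
    p->sign = (uint8_t)(bits >> 63);
    p->exponent = (uint16_t)((bits >> 52) & EXP_MASK);
    p->fraction = bits & FRAC_MASK;
    return p->exponent != 0 && p->exponent != EXP_MASK;
}

static uint64_t pack(parts64 p)
{
    if (p.exponent == EXP_MASK)
        p.fraction = 0;                      /* overflow becomes signed infinity */
    return ((uint64_t)p.sign << 63) | ((uint64_t)p.exponent << 52) | p.fraction;
}

/* Internal entry point: works on already unpacked and checked values. */
static parts64 internal_double(parts64 p)
{
    if (p.exponent < EXP_MASK)
        p.exponent += 1;                     /* x * 2 only increments the exponent */
    return p;
}

/* External entry point: unpack, handle special cases, then run the algorithm. */
uint64_t demo_mul2(uint64_t bits)
{
    parts64 p;
    if (!unpack_checked(bits, &p)) {
        if (p.exponent == EXP_MASK)
            return bits;                     /* Inf and NaN stay unchanged */
        /* zero and subnormals: shift the fraction, keep the sign */
        return (bits & SIGN_MASK) | ((bits << 1) & ~SIGN_MASK);
    }
    return pack(internal_double(p));
}

/* Another routine reuses the internal entry point without unpacking twice. */
uint64_t demo_mul4(uint64_t bits)
{
    parts64 p;
    if (!unpack_checked(bits, &p))
        return demo_mul2(demo_mul2(bits));   /* special cases take the public path */
    return pack(internal_double(internal_double(p)));
}
```

Here `demo_mul4` unpacks and checks its argument once and then calls the internal entry point twice, which avoids repeating the unpacking and checking work in every step.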