The Floating Point Unit (FPU) is designed to speed up calculations on real numbers, which are encoded in a computer as floating-point numbers. In the early x86 processors, the FPU was a separate integrated circuit. Since the i486DX, it has been a standard element of every processor. The FPU operates on single-, double-, or extended-precision values, using its own set of registers and instructions. For details on FPU registers, see the “Register Set” section.
Registers in the FPU are organised as a stack with 8 levels of depth. Physical registers are named R0-R7, while registers visible to the FPU instructions are named ST(0)-ST(7). ST(0), also referred to as ST, is always the top of the stack, while ST(7) is the bottom of the stack. A three-bit field in the FPU status word register points to the physical register that currently holds the top of the stack. Each time data is loaded into the FPU, the stack top is decremented; each time data is popped from the stack, it is incremented. The initial state is as shown in figure 1.
When data is loaded into the FPU, it is pushed onto the stack, as shown in figure 2.
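The relation between the stack top field and the physical registers can be sketched with a small Python model. This is only an illustration, not real FPU code: the class name `RegisterStack` and its methods are hypothetical, and the model assumes the stack top field is 0 after initialisation, so the first loaded value lands in R7.

```python
class RegisterStack:
    """Toy model of the eight-register FPU stack (illustration only)."""

    def __init__(self):
        self.regs = [None] * 8   # physical registers R0..R7, None = empty
        self.top = 0             # three-bit stack top field of the status word

    def st(self, i):
        """Index of the physical register currently visible as ST(i)."""
        return (self.top + i) % 8

    def push(self, value):
        self.top = (self.top - 1) % 8   # decrement the stack top first (0 wraps to 7)
        self.regs[self.top] = value     # then write the new ST(0)

    def pop(self):
        value = self.regs[self.top]
        self.regs[self.top] = None      # mark the register as empty
        self.top = (self.top + 1) % 8   # increment the stack top
        return value


fpu = RegisterStack()
fpu.push(1.5)   # stored in R7, stack top = 7
fpu.push(2.5)   # stored in R6, stack top = 6
```

After the two pushes, ST(0) maps to R6 and ST(1) maps to R7, so the value loaded first has moved one level deeper without being copied.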
The stack organisation of registers makes it easier to implement mathematical calculations using RPN (Reverse Polish Notation), also called postfix notation. Further in this section, we'll present the FPU coprocessor's instructions. They can be grouped as data transfer, load constant, basic arithmetic, comparison, transcendental, and control instructions.
Data in memory used by the FPU can be stored as a single-precision, double-precision, or double-extended-precision floating-point value, as an integer of the type word, doubleword, or quadword, or as an 18-digit packed BCD number. The FPU always converts the data to the double-extended-precision floating-point format while loading it into an internal register, and converts it to the required format while storing it back into memory.
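The effect of converting a value to a narrower format on a store can be sketched in Python. This is a rough model under stated assumptions: a Python float (double precision) stands in for the wider internal format, the helper names `store_single` and `store_int16` are hypothetical, and the integer store assumes the default rounding mode (round to nearest, ties to even).

```python
import struct


def store_single(value):
    """Round a value to single precision, as a 32-bit floating-point store would."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


def store_int16(value):
    """Round to an integer with ties going to the even neighbour.

    Python's round() uses the same tie rule as the default FPU rounding mode.
    Range checking for the 16-bit destination is left out of this sketch.
    """
    return round(value)


x = 0.1
as_single = store_single(x)   # close to 0.1, but not identical to the wider value
as_word = store_int16(2.5)    # the tie goes to the even neighbour
```

The stored single-precision value differs slightly from the wider one, which is why intermediate results are best kept on the register stack until the calculation is finished.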
Load instructions always first decrement the stack top field in the status word register, then load the value onto the new top of the register stack.
Store instructions copy the value from the top of the FPU register stack to memory; the popping versions additionally remove it from the stack.
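The difference between loading, storing, and storing with a pop can be sketched as follows. The Python list `stack` and the dictionary `memory` are hypothetical stand-ins for the register stack and for named memory cells; the function names simply mirror the fld, fst, and fstp mnemonics.

```python
stack = []    # stack[0] plays the role of ST(0)
memory = {}   # hypothetical named memory cells


def fld(value):
    stack.insert(0, value)        # push: the value becomes the new ST(0)


def fst(name):
    memory[name] = stack[0]       # copy ST(0) to memory, stack unchanged


def fstp(name):
    memory[name] = stack.pop(0)   # copy ST(0) to memory, then pop it


fld(3.0)
fld(4.0)
fst("a")    # memory["a"] = 4.0, ST(0) is still 4.0
fstp("b")   # memory["b"] = 4.0, ST(0) is now 3.0
```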
There are eight conditional move instructions, summarised in table 1. Each of them copies ST(i) to ST(0) if the condition encoded in the mnemonic is met.
| Mnemonic | flags checked | description |
|---|---|---|
| fcmove | ZF=1 | equal |
| fcmovne | ZF=0 | not equal |
| fcmovb | CF=1 | below |
| fcmovbe | CF=1 or ZF=1 | below or equal |
| fcmovnb | CF=0 | not below |
| fcmovnbe | CF=0 and ZF=0 | not below or equal |
| fcmovu | PF=1 | unordered |
| fcmovnu | PF=0 | not unordered |
The unordered condition means that at least one of the compared values was not a number (NaN).
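The conditions from table 1 can be written down as predicates on the CF, ZF, and PF flags. The sketch below is a Python model, not FPU code: the dictionary `CONDITIONS` and the function `fcmov` are hypothetical names, and the flags are passed in by hand instead of being produced by a comparison.

```python
CONDITIONS = {
    "fcmove":   lambda cf, zf, pf: zf == 1,
    "fcmovne":  lambda cf, zf, pf: zf == 0,
    "fcmovb":   lambda cf, zf, pf: cf == 1,
    "fcmovbe":  lambda cf, zf, pf: cf == 1 or zf == 1,
    "fcmovnb":  lambda cf, zf, pf: cf == 0,
    "fcmovnbe": lambda cf, zf, pf: cf == 0 and zf == 0,
    "fcmovu":   lambda cf, zf, pf: pf == 1,
    "fcmovnu":  lambda cf, zf, pf: pf == 0,
}


def fcmov(mnemonic, stack, i, cf, zf, pf):
    """Copy ST(i) into ST(0) when the condition holds; otherwise change nothing."""
    if CONDITIONS[mnemonic](cf, zf, pf):
        stack[0] = stack[i]
    return stack


# Branch-free maximum: with ST(0) = 2.0 and ST(1) = 5.0 a comparison
# reports "below" (CF = 1), so fcmovb replaces ST(0) with the larger value.
result = fcmov("fcmovb", [2.0, 5.0], 1, cf=1, zf=0, pf=0)
```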
Some constant values can be pushed onto the FPU register stack without the need to define them in memory. Loading a constant this way is faster than loading it from memory. The instructions are summarised in table 2.
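As a quick reference, the load-constant mnemonics of the x87 instruction set can be modelled as a lookup table. This is a sketch only: the values are Python double-precision approximations, whereas the FPU holds the constants in its internal extended format, and the names `LOAD_CONSTANTS` and `load_constant` are hypothetical.

```python
import math

LOAD_CONSTANTS = {
    "fld1":   1.0,                  # +1.0
    "fldz":   0.0,                  # +0.0
    "fldpi":  math.pi,              # pi
    "fldl2e": math.log2(math.e),    # log2(e)
    "fldl2t": math.log2(10.0),      # log2(10)
    "fldlg2": math.log10(2.0),      # log10(2)
    "fldln2": math.log(2.0),        # ln(2)
}

stack = []   # stack[0] plays the role of ST(0)


def load_constant(mnemonic):
    stack.insert(0, LOAD_CONSTANTS[mnemonic])   # push like any other load


load_constant("fldpi")
load_constant("fld1")   # ST(0) = 1.0, ST(1) = pi
```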
This group of instructions includes addition, subtraction, multiplication, and division in various forms. The arguments to these instructions determine their behaviour. Let's consider some examples.
If the instruction has a single argument, it must be a memory argument that specifies a single- or double-precision floating-point number. The result is always stored in ST(0).
The version with two arguments works only with registers. The order of arguments determines the order of the calculation and the placement of the result. For example, fsub ST(0), ST(i) subtracts ST(i) from ST(0) and stores the result in ST(0), while fsub ST(i), ST(0) subtracts ST(0) from ST(i) and stores the result in ST(i).
The popping version with two arguments additionally pops the stack. For example, fsubp ST(i), ST(0) subtracts ST(0) from ST(i), stores the result in ST(i) and pops the stack.
The version with no arguments implies ST(1) as the destination and ST(0) as the source argument. For example, fsubp subtracts ST(0) from ST(1), stores the result in ST(1) and pops the stack, so the result ends up at the top of the stack.
Basic arithmetic instructions are summarised in table 3, where float represents a single-precision argument in memory, double represents a double-precision argument in memory, and ST(i) is the i-th FPU register.
| Mnemonic | operation | result | pop |
|---|---|---|---|
| ADDITION | |||
| fadd float | ST(0) + float | ST(0) | no |
| fadd double | ST(0) + double | ST(0) | no |
| fadd ST(0), ST(i) | ST(0) + ST(i) | ST(0) | no |
| fadd ST(i), ST(0) | ST(i) + ST(0) | ST(i) | no |
| faddp ST(i), ST(0) | ST(i) + ST(0) | ST(i) | yes |
| faddp | ST(1) + ST(0) | ST(1) | yes |
| SUBTRACTION | |||
| fsub float | ST(0) - float | ST(0) | no |
| fsub double | ST(0) - double | ST(0) | no |
| fsub ST(0), ST(i) | ST(0) - ST(i) | ST(0) | no |
| fsub ST(i), ST(0) | ST(i) - ST(0) | ST(i) | no |
| fsubp ST(i), ST(0) | ST(i) - ST(0) | ST(i) | yes |
| fsubp | ST(1) - ST(0) | ST(1) | yes |
| MULTIPLICATION | |||
| fmul float | ST(0) * float | ST(0) | no |
| fmul double | ST(0) * double | ST(0) | no |
| fmul ST(0), ST(i) | ST(0) * ST(i) | ST(0) | no |
| fmul ST(i), ST(0) | ST(i) * ST(0) | ST(i) | no |
| fmulp ST(i), ST(0) | ST(i) * ST(0) | ST(i) | yes |
| fmulp | ST(1) * ST(0) | ST(1) | yes |
| DIVISION | |||
| fdiv float | ST(0) / float | ST(0) | no |
| fdiv double | ST(0) / double | ST(0) | no |
| fdiv ST(0), ST(i) | ST(0) / ST(i) | ST(0) | no |
| fdiv ST(i), ST(0) | ST(i) / ST(0) | ST(i) | no |
| fdivp ST(i), ST(0) | ST(i) / ST(0) | ST(i) | yes |
| fdivp | ST(1) / ST(0) | ST(1) | yes |
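The popping forms from table 3 map directly onto postfix notation. The following Python sketch models them on a list in which `stack[0]` plays the role of ST(0); the helper `arith_pop` and the sample values are made up for the illustration.

```python
import operator

stack = []   # stack[0] plays the role of ST(0)


def fld(value):
    stack.insert(0, value)


def arith_pop(op, dst=1):
    """Model of the popping form: ST(dst) = ST(dst) <op> ST(0), then pop."""
    stack[dst] = op(stack[dst], stack[0])
    stack.pop(0)


def faddp(dst=1):
    arith_pop(operator.add, dst)


def fsubp(dst=1):
    arith_pop(operator.sub, dst)


def fmulp(dst=1):
    arith_pop(operator.mul, dst)


def fdivp(dst=1):
    arith_pop(operator.truediv, dst)


# (1 + 2) * (10 - 4) in postfix form: 1 2 + 10 4 - *
fld(1.0)
fld(2.0)
faddp()      # ST(0) = 3.0
fld(10.0)
fld(4.0)
fsubp()      # ST(0) = 6.0, ST(1) = 3.0
fmulp()      # ST(0) = 18.0
```

Each operator consumes the two topmost values and leaves its result on top, so no temporary variables in memory are needed.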
The addition and multiplication operations are commutative, while subtraction and division are not. That's why the reversed versions of subtraction and division are implemented (table 4). The difference is the order of the operands, while the destination remains the same as in the non-reversed versions.
| Mnemonic | operation | result | pop |
|---|---|---|---|
| REVERSED SUBTRACTION | |||
| fsubr float | float - ST(0) | ST(0) | no |
| fsubr double | double - ST(0) | ST(0) | no |
| fsubr ST(0), ST(i) | ST(i) - ST(0) | ST(0) | no |
| fsubr ST(i), ST(0) | ST(0) - ST(i) | ST(i) | no |
| fsubrp ST(i), ST(0) | ST(0) - ST(i) | ST(i) | yes |
| fsubrp | ST(0) - ST(1) | ST(1) | yes |
| REVERSED DIVISION | |||
| fdivr float | float / ST(0) | ST(0) | no |
| fdivr double | double / ST(0) | ST(0) | no |
| fdivr ST(0), ST(i) | ST(i) / ST(0) | ST(0) | no |
| fdivr ST(i), ST(0) | ST(0) / ST(i) | ST(i) | no |
| fdivrp ST(i), ST(0) | ST(0) / ST(i) | ST(i) | yes |
| fdivrp | ST(0) / ST(1) | ST(1) | yes |
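The difference between the plain and the reversed popping forms can be sketched in Python as follows. The functions operate on a list whose first element models ST(0); they are hypothetical helpers mirroring the no-argument rows of tables 3 and 4.

```python
def fsubp(stack):
    """ST(1) = ST(1) - ST(0), then pop."""
    st0 = stack.pop(0)            # after the pop, stack[0] is the old ST(1)
    stack[0] = stack[0] - st0
    return stack


def fsubrp(stack):
    """ST(1) = ST(0) - ST(1), then pop."""
    st0 = stack.pop(0)
    stack[0] = st0 - stack[0]
    return stack


def fdivrp(stack):
    """ST(1) = ST(0) / ST(1), then pop."""
    st0 = stack.pop(0)
    stack[0] = st0 / stack[0]
    return stack


plain = fsubp([2.0, 10.0])       # 10.0 - 2.0
reversed_ = fsubrp([2.0, 10.0])  # 2.0 - 10.0
quotient = fdivrp([2.0, 10.0])   # 2.0 / 10.0
```

The reversed forms are useful when the operands happen to be on the stack in the opposite order to the one the calculation needs, because they avoid an extra exchange of registers.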
There are also versions of the four basic arithmetic instructions that operate on an integer memory argument (see table 5). The argument can be a word or a doubleword.
| Mnemonic | operation | result | pop |
|---|---|---|---|
| ADDITION | |||
| fiadd word | ST(0) + word | ST(0) | no |
| fiadd doubleword | ST(0) + doubleword | ST(0) | no |
| SUBTRACTION | |||
| fisub word | ST(0) - word | ST(0) | no |
| fisub doubleword | ST(0) - doubleword | ST(0) | no |
| REVERSED SUBTRACTION | |||
| fisubr word | word - ST(0) | ST(0) | no |
| fisubr doubleword | doubleword - ST(0) | ST(0) | no |
| MULTIPLICATION | |||
| fimul word | ST(0) * word | ST(0) | no |
| fimul doubleword | ST(0) * doubleword | ST(0) | no |
| DIVISION | |||
| fidiv word | ST(0) / word | ST(0) | no |
| fidiv doubleword | ST(0) / doubleword | ST(0) | no |
| REVERSED DIVISION | |||
| fidivr word | word / ST(0) | ST(0) | no |
| fidivr doubleword | doubleword / ST(0) | ST(0) | no |
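An integer memory argument is a signed two's complement value that is converted to floating point before the operation. A minimal Python sketch, assuming little-endian memory and using the hypothetical helpers `fiadd_word` and `fiadd_doubleword`:

```python
import struct


def fiadd_word(st0, raw_bytes):
    """Model of fiadd with a word argument: ST(0) + signed 16-bit integer."""
    (value,) = struct.unpack("<h", raw_bytes)   # little-endian signed word
    return st0 + float(value)


def fiadd_doubleword(st0, raw_bytes):
    """Model of fiadd with a doubleword argument: ST(0) + signed 32-bit integer."""
    (value,) = struct.unpack("<i", raw_bytes)   # little-endian signed doubleword
    return st0 + float(value)


word_in_memory = struct.pack("<h", -3)
result = fiadd_word(1.5, word_in_memory)        # 1.5 + (-3)
```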
The group of basic arithmetic instructions also contains instructions for other calculations.
The comparison instructions compare two floating-point values and set flags appropriate to the result.
The operand of the fcom instruction can be a memory operand or another FPU register. It is always compared with the top of the stack. If no operand is specified, it compares ST(0) and ST(1).
The popping version, fcomp, additionally pops ST(0) off the stack.
The instruction fcompp, with a double “P” at the end, takes no arguments; it compares ST(0) and ST(1) and pops both registers off the stack.
If one of the arguments is NaN, these instructions generate the invalid arithmetic operand exception. To avoid unwanted exceptions, there are unordered versions of the comparison instructions.
Unordered comparison instructions do not operate with memory arguments.
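The difference between the ordered and unordered comparisons can be sketched in Python. The model below returns the condition-code bits (C3, C2, C0) of the status word and makes two simplifying assumptions: the invalid arithmetic operand exception is unmasked and is represented by a hypothetical Python exception class, and every NaN is treated as a quiet NaN.

```python
import math


class InvalidOperand(Exception):
    """Stand-in for the FPU invalid arithmetic operand exception."""


def fucom(st0, src):
    """Unordered comparison: returns (C3, C2, C0)."""
    if math.isnan(st0) or math.isnan(src):
        return (1, 1, 1)   # unordered
    if st0 > src:
        return (0, 0, 0)
    if st0 < src:
        return (0, 0, 1)
    return (1, 0, 0)       # equal


def fcom(st0, src):
    """Ordered comparison: same codes, but a NaN operand raises the exception."""
    if math.isnan(st0) or math.isnan(src):
        raise InvalidOperand("NaN operand in an ordered comparison")
    return fucom(st0, src)


codes = fucom(float("nan"), 1.0)   # reported as unordered, no exception
```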
Two instructions are implemented to compare integers.
They have a single memory argument that can be a word or doubleword, which is compared with the top of the stack.
The original instructions set the flags C0, C2, and C3 in the FPU status word register. After the FPU became an integral part of the processor, a new set of instructions appeared that set the flags in the FLAGS register directly.
Their first argument is always ST(0), the second is another FPU register.
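The flags produced by these instructions follow the same pattern as the condition codes: ZF corresponds to C3, PF to C2, and CF to C0. The Python sketch below models that result together with three conditions that can be tested afterwards; the function names are hypothetical and exceptions are ignored.

```python
import math


def fcomi(st0, sti):
    """Return (ZF, PF, CF) for a comparison of ST(0) with ST(i)."""
    if math.isnan(st0) or math.isnan(sti):
        return (1, 1, 1)   # unordered
    if st0 > sti:
        return (0, 0, 0)
    if st0 < sti:
        return (0, 0, 1)
    return (1, 0, 0)       # equal


def above(zf, pf, cf):
    return cf == 0 and zf == 0   # "not below or equal"


def below(zf, pf, cf):
    return cf == 1


def unordered(zf, pf, cf):
    return pf == 1


flags = fcomi(1.0, 2.0)   # ST(0) is below ST(i)
```

Note that an unordered result also sets CF, so a "below" test alone cannot tell a smaller value from a NaN; checking PF first resolves the ambiguity.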
The fxam and ftst instructions also belong to the group of comparison instructions. They return their result in the C0, C2 and C3 flags.
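As an illustration, the classification reported by fxam can be modelled in Python. The codes follow the x87 encoding of the value class in (C3, C2, C0), with C1 carrying the sign. The sketch rests on several assumptions: a Python float stands in for the register contents, `None` stands for an empty register, the denormal threshold is that of double precision and not of the internal extended format, and the "unsupported" class cannot be produced here.

```python
import math
import sys

FXAM_CODES = {               # class -> (C3, C2, C0)
    "unsupported": (0, 0, 0),
    "nan":         (0, 0, 1),
    "normal":      (0, 1, 0),
    "infinity":    (0, 1, 1),
    "zero":        (1, 0, 0),
    "empty":       (1, 0, 1),
    "denormal":    (1, 1, 0),
}


def classify(value):
    if value is None:
        return "empty"
    if math.isnan(value):
        return "nan"
    if math.isinf(value):
        return "infinity"
    if value == 0.0:
        return "zero"
    if abs(value) < sys.float_info.min:   # below the smallest normal double
        return "denormal"
    return "normal"


def fxam(value):
    """Return (C3, C2, C1, C0); C1 is the sign bit (0 for the empty model value)."""
    sign = 0 if value is None else int(math.copysign(1.0, value) < 0)
    c3, c2, c0 = FXAM_CODES[classify(value)]
    return (c3, c2, sign, c0)


minus_zero = fxam(-0.0)   # zero class with the sign bit set
```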
The transcendental instructions perform calculations of advanced mathematical functions.
After calculating the tangent, the fptan instruction pushes the value 1.0 onto the stack to make it easier to calculate the cotangent later by executing the fdivr instruction. The word “partial” in the instruction name means that it handles only a limited range of input arguments.
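The tangent and cotangent sequence can be sketched in Python. The model assumes the accepted argument range of fptan is below 2^63 in magnitude; outside that range the instruction sets the C2 flag and leaves the stack unchanged. The functions operate on a list whose first element models ST(0).

```python
import math


def fptan(stack):
    """Replace ST(0) with its tangent and push 1.0; returns the C2 flag."""
    if abs(stack[0]) >= 2.0 ** 63:
        return 1                     # out of range: C2 = 1, stack unchanged
    stack[0] = math.tan(stack[0])
    stack.insert(0, 1.0)             # ST(0) = 1.0, ST(1) = tangent
    return 0


def fdivrp(stack):
    """ST(1) = ST(0) / ST(1), then pop."""
    st0 = stack.pop(0)
    stack[0] = st0 / stack[0]


stack = [0.5]
c2 = fptan(stack)   # stack is now [1.0, tan(0.5)]
fdivrp(stack)       # stack is now [1.0 / tan(0.5)], the cotangent
```

If only the tangent is needed, the pushed 1.0 has to be removed from the stack, for example by a store with a pop into ST(0).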
The instructions for exponential and logarithmic functions are summarised in table 6.
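Two x87 instructions from this group, fyl2x and f2xm1, are modelled below in Python to show how other functions are composed from them. This is a sketch under assumptions: the list `stack` stands in for the register stack, and raising a Python error for an out-of-range f2xm1 argument is a choice of this model (on real hardware the result is undefined).

```python
import math


def fyl2x(stack):
    """ST(1) = ST(1) * log2(ST(0)), then pop."""
    st0 = stack.pop(0)
    stack[0] = stack[0] * math.log2(st0)


def f2xm1(stack):
    """ST(0) = 2**ST(0) - 1, defined for arguments from -1.0 to +1.0."""
    if not -1.0 <= stack[0] <= 1.0:
        raise ValueError("f2xm1 argument out of range")
    stack[0] = 2.0 ** stack[0] - 1.0


# Natural logarithm: ln(x) = ln(2) * log2(x)
stack = [math.log(2.0)]   # as if loaded with fldln2
stack.insert(0, 8.0)      # load x
fyl2x(stack)              # ST(0) = ln(8)
```

The multiplication built into fyl2x makes it possible to obtain a logarithm with any base by loading a suitable constant first.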
The FPU control instructions help the programmer to save and restore the contents of chosen registers if there is a need to use them in an interrupt handler or inside a function. It is also possible to initialise the FPU's state or clear errors.
The following set of instructions can perform error checking during execution (instructions without “N”) or perform the operation without checking for error conditions (instructions with “N”).