FPU

The Floating Point Unit is designed to speed up calculations on real numbers, which are encoded in a computer as floating-point numbers. At the beginning of the history of x86 processors, the FPU was a separate integrated circuit. Since the i486DX, it has been a standard element of every processor. The FPU operates on single-, double-, or extended-precision values, using its own set of registers and instructions. For details on FPU registers, please refer to the “Register Set” section.

Although modern extensions to processors can currently perform real-number calculations faster using vector instructions, they do not achieve the precision available with extended-precision encoding.

Registers in the FPU are organised as a stack with 8 levels of depth. Physical registers are named R0-R7, while registers visible to FPU instructions are named ST(0)-ST(7). ST(0), also referred to as ST, is always the top of the stack, while ST(7) is the bottom of the stack. The top of the stack is indicated by a three-bit field (TOP) in the FPU status word register. Each time data is loaded into the FPU, the stack top is decremented; each time data is popped from the stack, it is incremented. The initial state is shown in figure 1.

Diagram showing FPU stack initial state: 8 registers ST(0) through ST(7) arranged vertically, all empty. Stack pointer initially at ST(0). — Figure 1: The Initial State of FPU Registers and Stack

When data is loaded into the FPU, it is pushed onto the stack, as shown in figure 2.

Diagram showing FPU stack after loading single data element: new value pushed to ST(0), previous value moves to ST(1), stack pointer decrements. — Figure 2: The FPU Stack after Pushing Single Data
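
The push and pop behaviour described above can be sketched as a small Python model. This is only an illustration, not a description of the hardware: the class name `FpuStack` and its methods are hypothetical, empty registers are represented by `None`, and stack overflow or underflow checks are left out.

```python
class FpuStack:
    """Toy model of the FPU register stack: 8 physical slots plus a 3-bit TOP field."""

    def __init__(self):
        self.regs = [None] * 8   # physical registers R0..R7, None means empty
        self.top = 0             # 3-bit TOP field of the status word

    def push(self, value):
        # A load first decrements TOP (modulo 8), then writes the new ST(0).
        self.top = (self.top - 1) & 7
        self.regs[self.top] = value

    def pop(self):
        # A pop reads ST(0), marks the register empty, then increments TOP.
        value = self.regs[self.top]
        self.regs[self.top] = None
        self.top = (self.top + 1) & 7
        return value

    def st(self, i):
        # ST(i) is the physical register number (TOP + i) mod 8.
        return self.regs[(self.top + i) & 7]


fpu = FpuStack()
fpu.push(1.5)                 # TOP: 0 -> 7
fpu.push(2.5)                 # TOP: 7 -> 6
print(fpu.top, fpu.st(0), fpu.st(1))
```

In this model the value loaded first ends up in ST(1) after the second load, which matches the situation shown in figure 2.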

The stack organisation of registers makes it easier to implement mathematical calculations using RPN (Reverse Polish Notation), also called postfix notation. Further in this section, we'll present the FPU coprocessor's instructions. They can be grouped as:

data transfer instructions,
load constants instructions,
basic arithmetic instructions,
comparison instructions,
transcendental instructions,
FPU control instructions.
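
To illustrate why a register stack suits postfix notation, the following Python sketch evaluates an RPN expression with an ordinary list acting as the stack. The function `eval_rpn` is a hypothetical helper written for this illustration; the comments only point out which FPU operation each step resembles.

```python
def eval_rpn(tokens):
    """Evaluate a postfix expression given as a list of string tokens."""
    ops = {
        "+": lambda x, y: x + y,
        "-": lambda x, y: x - y,
        "*": lambda x, y: x * y,
        "/": lambda x, y: x / y,
    }
    stack = []
    for tok in tokens:
        if tok in ops:
            y = stack.pop()                 # plays the role of ST(0)
            x = stack.pop()                 # plays the role of ST(1)
            stack.append(ops[tok](x, y))    # like faddp/fsubp/fmulp/fdivp
        else:
            stack.append(float(tok))        # like fld
    return stack.pop()


# (3 + 4) * (10 - 6) written in postfix notation
result = eval_rpn("3 4 + 10 6 - *".split())
print(result)  # 28.0
```

Each operand corresponds to a load, and each operator corresponds to one popping arithmetic instruction, so no temporary variables are needed.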

Data transfer instructions

Data in memory used by the FPU can be stored as a single-precision, double-precision, or double-extended-precision floating-point value, as a word, doubleword, or quadword integer, or as an 18-digit BCD number. The FPU always converts the data to double-extended-precision floating-point format while loading it into an internal register, and converts it to the required format while storing it back into memory.
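
The effect of converting to a narrower format on store can be approximated in Python with the `struct` module. This is a sketch under a stated assumption: Python's own floats are double precision, so the 80-bit internal format cannot be reproduced here; only the single- and double-precision memory formats are shown.

```python
import struct

x = 0.1

# Round trip through a 4-byte single-precision memory operand.
as_single = struct.unpack("<f", struct.pack("<f", x))[0]

# Round trip through an 8-byte double-precision memory operand.
as_double = struct.unpack("<d", struct.pack("<d", x))[0]

print(as_single)
print(as_double)
print(abs(as_single - x))   # rounding error introduced by the narrower format
```

The double-precision round trip returns the original value, while the single-precision one returns a slightly different number because fewer significand bits are kept.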

Load instructions always first decrement the stack top field in the status word register, then load the value into the new top of the register stack.

fld - instruction loads a single-precision, double-precision or double-extended-precision value onto the FPU register stack. It can also copy data from other FPU registers into ST.
fild - instruction loads an integer value of the type word, doubleword or quadword.
fbld - instruction loads an 18-digit binary-coded decimal (BCD) value.

Store instructions take data off the FPU register stack and place it in the memory.

fst - instruction stores a single-precision or a double-precision value to the memory. It can also copy a value to another FPU register.
fstp - instruction also works with 80-bit double extended precision values and additionally pops data off the stack by incrementing the stack top field in the status word register. It can also copy a value to another FPU register, popping it off the FPU register stack.
fist - converts the value from the top of the FPU register stack into a word or doubleword integer and stores it in memory.
fistp - can also store a 64-bit quadword integer and pop the value from the stack top.
fbstp - pops a value from the FPU register stack and writes it as an 80-bit BCD-encoded integer to memory.
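
The 80-bit packed BCD format used by fbld and fbstp can be sketched in Python as follows. The helper names `pack_bcd80` and `unpack_bcd80` are hypothetical; the layout assumed here is two decimal digits per byte in the nine low-order bytes, least significant digits first, with the sign in the most significant bit of the tenth byte.

```python
def pack_bcd80(value):
    """Encode an integer as 10 bytes of packed BCD (little-endian)."""
    if abs(value) >= 10**18:
        raise ValueError("packed BCD holds at most 18 decimal digits")
    digits = f"{abs(value):018d}"            # 18 digits, most significant first
    out = bytearray(10)
    for i in range(9):
        low = int(digits[17 - 2 * i])        # lower digit of the pair
        high = int(digits[16 - 2 * i])       # higher digit of the pair
        out[i] = (high << 4) | low
    out[9] = 0x80 if value < 0 else 0x00     # sign bit in the last byte
    return bytes(out)


def unpack_bcd80(raw):
    """Decode 10 bytes of packed BCD back into an integer."""
    value = 0
    for i in reversed(range(9)):
        value = value * 100 + (raw[i] >> 4) * 10 + (raw[i] & 0x0F)
    return -value if raw[9] & 0x80 else value


encoded = pack_bcd80(-1234)
print(encoded.hex())          # 34120000000000000080
print(unpack_bcd80(encoded))  # -1234
```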

Please note that it is not possible to exchange values directly between the FPU stack and CPU registers. It is also not possible to load a constant encoded as an immediate value. Both can be done using temporary variables stored in memory.

fxch - instruction exchanges values in two FPU registers, where one of them is the top of the stack. This instruction, used without any argument, exchanges ST and ST(1).
fcmovcc - instructions provide conditional data transfer. They were introduced with the P6 processor family to avoid conditional jumps. They test the condition based on flags in the EFLAGS register. The source operand can be any FPU register, while the destination is always ST(0). The fcmovcc instructions do not modify the stack top in the FPU.

There are eight such instructions, summarised in table 1.

Table 1: Variations of fcmovcc Instruction

Mnemonic	flags checked	description
fcmove	ZF=1	equal
fcmovne	ZF=0	not equal
fcmovb	CF=1	below
fcmovbe	CF=1 or ZF=1	below or equal
fcmovnb	CF=0	not below
fcmovnbe	CF=0 and ZF=0	not below or equal
fcmovu	PF=1	unordered
fcmovnu	PF=0	not unordered

Unordered means that at least one of the arguments of the comparison instruction does not represent a proper numerical value (is NaN).
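
The conditions from table 1 can be written down as predicates over the flags. The following Python sketch is only a model: the dictionary `FCMOV_CONDITIONS`, the function `fcmov`, and the representation of EFLAGS as a plain dictionary are assumptions made for this illustration.

```python
# Each mnemonic maps to the flag test listed in table 1.
FCMOV_CONDITIONS = {
    "fcmove":   lambda f: f["ZF"] == 1,
    "fcmovne":  lambda f: f["ZF"] == 0,
    "fcmovb":   lambda f: f["CF"] == 1,
    "fcmovbe":  lambda f: f["CF"] == 1 or f["ZF"] == 1,
    "fcmovnb":  lambda f: f["CF"] == 0,
    "fcmovnbe": lambda f: f["CF"] == 0 and f["ZF"] == 0,
    "fcmovu":   lambda f: f["PF"] == 1,
    "fcmovnu":  lambda f: f["PF"] == 0,
}


def fcmov(mnemonic, st0, sti, flags):
    """Return the new ST(0): ST(i) if the condition holds, otherwise the old ST(0)."""
    return sti if FCMOV_CONDITIONS[mnemonic](flags) else st0


# Flags as they would look after comparing ST(0) = 2.0 with ST(i) = 5.0 (below).
flags = {"ZF": 0, "PF": 0, "CF": 1}
larger = fcmov("fcmovb", 2.0, 5.0, flags)
print(larger)  # 5.0
```

Used after a comparison that sets EFLAGS, a single conditional move selects the larger of two values without a jump.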

Load constants instructions

Some constant values can be pushed onto the FPU register stack without the need to define them in memory. Such loading is faster than loading the same value from memory. The instructions are summarised in table 2.

Table 2: Load Constants Instructions

Mnemonic	value loaded into ST
fldz	0
fld1	1
fldpi	π
fldl2e	log2(e)
fldl2t	log2(10)
fldlg2	log10(2)
fldln2	ln(2)
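
For reference, the values pushed by the load-constant instructions can be reproduced with Python's `math` module. This is a sketch: the dictionary name `FPU_CONSTANTS` is hypothetical, and the values are double-precision approximations, whereas the FPU holds them in extended precision.

```python
import math

# Mnemonic -> value pushed onto the stack (double-precision approximation).
FPU_CONSTANTS = {
    "fldz":   0.0,
    "fld1":   1.0,
    "fldpi":  math.pi,
    "fldl2e": math.log2(math.e),   # log2(e)
    "fldl2t": math.log2(10.0),     # log2(10)
    "fldlg2": math.log10(2.0),     # log10(2)
    "fldln2": math.log(2.0),       # ln(2)
}

for name, value in FPU_CONSTANTS.items():
    print(f"{name:7s} {value!r}")
```

The logarithmic constants come in reciprocal pairs: log2(e) multiplied by ln(2) gives 1, and so does log2(10) multiplied by log10(2).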

Basic arithmetic instructions

This group of instructions includes addition, subtraction, multiplication, and division in various forms. The arguments to these instructions determine their behaviour. Let's consider some examples.

If the instruction has a single argument, it must be a memory argument that specifies a single- or double-precision floating-point number. The result is always stored in ST(0).

The version with two arguments works only with registers. The order of arguments determines the order of the calculation and the placement of the result. For example, fsub ST(0), ST(i) subtracts ST(i) from ST(0) and stores the result in ST(0), while fsub ST(i), ST(0) subtracts ST(0) from ST(i) and stores the result in ST(i).

The popped version with two arguments additionally pops the stack. For example, fsubp ST(i), ST(0) subtracts ST(0) from ST(i), stores the result in ST(i) and pops the stack.

The version without arguments implies ST(1) as the destination and ST(0) as the source argument. For example, fsubp subtracts ST(0) from ST(1), stores the result in ST(1) and pops the stack. The result is then at the top of the stack.

Basic arithmetic instructions are summarised in table 3, where float represents a single-precision argument in memory, double represents a double-precision argument in memory, and ST(i) is the i-th FPU register.

Table 3: Basic Floating Point Arithmetic Instructions

Mnemonic	operation	result	pop
ADDITION
fadd float	ST(0) + float	ST(0)	no
fadd double	ST(0) + double	ST(0)	no
fadd ST(0), ST(i)	ST(0) + ST(i)	ST(0)	no
fadd ST(i), ST(0)	ST(i) + ST(0)	ST(i)	no
faddp ST(i), ST(0)	ST(i) + ST(0)	ST(i)	yes
faddp	ST(1) + ST(0)	ST(1)	yes
SUBTRACTION
fsub float	ST(0) - float	ST(0)	no
fsub double	ST(0) - double	ST(0)	no
fsub ST(0), ST(i)	ST(0) - ST(i)	ST(0)	no
fsub ST(i), ST(0)	ST(i) - ST(0)	ST(i)	no
fsubp ST(i), ST(0)	ST(i) - ST(0)	ST(i)	yes
fsubp	ST(1) - ST(0)	ST(1)	yes
MULTIPLICATION
fmul float	ST(0) * float	ST(0)	no
fmul double	ST(0) * double	ST(0)	no
fmul ST(0), ST(i)	ST(0) * ST(i)	ST(0)	no
fmul ST(i), ST(0)	ST(i) * ST(0)	ST(i)	no
fmulp ST(i), ST(0)	ST(i) * ST(0)	ST(i)	yes
fmulp	ST(1) * ST(0)	ST(1)	yes
DIVISION
fdiv float	ST(0) / float	ST(0)	no
fdiv double	ST(0) / double	ST(0)	no
fdiv ST(0), ST(i)	ST(0) / ST(i)	ST(0)	no
fdiv ST(i), ST(0)	ST(i) / ST(0)	ST(i)	no
fdivp ST(i), ST(0)	ST(i) / ST(0)	ST(i)	yes
fdivp	ST(1) / ST(0)	ST(1)	yes
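
The register forms from table 3 can be modelled in Python with a list whose index 0 stands for ST(0). This is a sketch for the subtraction rows only; the function names mirror the mnemonics but are hypothetical helpers, and the list-based stack is an assumption of this illustration.

```python
def fsub(stack, dest, src):
    """fsub ST(dest), ST(src): one of the two registers must be ST(0)."""
    assert dest == 0 or src == 0
    stack[dest] = stack[dest] - stack[src]


def fsubp(stack, dest=1):
    """fsubp ST(dest), ST(0): subtract, store in ST(dest), then pop ST(0)."""
    stack[dest] = stack[dest] - stack[0]
    stack.pop(0)                 # the old ST(1) becomes the new ST(0)


stack = [2.0, 10.0, 7.0]         # ST(0), ST(1), ST(2)

fsub(stack, 0, 2)                # ST(0) = ST(0) - ST(2) = 2 - 7
print(stack)                     # [-5.0, 10.0, 7.0]

fsub(stack, 2, 0)                # ST(2) = ST(2) - ST(0) = 7 - (-5)
print(stack)                     # [-5.0, 10.0, 12.0]

fsubp(stack)                     # ST(1) = ST(1) - ST(0) = 10 - (-5), then pop
print(stack)                     # [15.0, 12.0]
```

After the popping form, the result that was written to ST(1) is found in ST(0), because the pop renumbers the registers.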

The addition and multiplication operations are commutative, while subtraction and division are not. That's why reversed versions of subtraction and division are implemented (table 4). The difference is the order of the operands in the calculation, while the destination remains the same as in the non-reversed versions.

Table 4: Reversed Floating Point Arithmetic Instructions

Mnemonic	operation	result	pop
REVERSED SUBTRACTION
fsubr float	float - ST(0)	ST(0)	no
fsubr double	double - ST(0)	ST(0)	no
fsubr ST(0), ST(i)	ST(i) - ST(0)	ST(0)	no
fsubr ST(i), ST(0)	ST(0) - ST(i)	ST(i)	no
fsubrp ST(i), ST(0)	ST(0) - ST(i)	ST(i)	yes
fsubrp	ST(0) - ST(1)	ST(1)	yes
REVERSED DIVISION
fdivr float	float / ST(0)	ST(0)	no
fdivr double	double / ST(0)	ST(0)	no
fdivr ST(0), ST(i)	ST(i) / ST(0)	ST(0)	no
fdivr ST(i), ST(0)	ST(0) / ST(i)	ST(i)	no
fdivrp ST(i), ST(0)	ST(0) / ST(i)	ST(i)	yes
fdivrp	ST(0) / ST(1)	ST(1)	yes
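
The difference between the normal and the reversed forms can be checked with a short Python model, again using a list whose index 0 stands for ST(0). The helper functions are hypothetical and cover division only.

```python
def fdiv(stack, dest, src):
    """fdiv ST(dest), ST(src): ST(dest) = ST(dest) / ST(src)."""
    stack[dest] = stack[dest] / stack[src]


def fdivr(stack, dest, src):
    """fdivr ST(dest), ST(src): ST(dest) = ST(src) / ST(dest)."""
    stack[dest] = stack[src] / stack[dest]


def fdivrp(stack, dest=1):
    """fdivrp ST(dest), ST(0): ST(dest) = ST(0) / ST(dest), then pop."""
    stack[dest] = stack[0] / stack[dest]
    stack.pop(0)


normal = [8.0, 2.0]              # ST(0), ST(1)
fdiv(normal, 0, 1)               # ST(0) = 8 / 2
print(normal)                    # [4.0, 2.0]

reversed_form = [8.0, 2.0]
fdivr(reversed_form, 0, 1)       # ST(0) = 2 / 8
print(reversed_form)             # [0.25, 2.0]

popped = [8.0, 2.0]
fdivrp(popped)                   # ST(1) = 8 / 2, then pop
print(popped)                    # [4.0]
```

In both forms the destination register is the same; only the roles of dividend and divisor are swapped.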

There are also versions of the four basic arithmetic instructions that operate on an integer memory argument (see table 5). The argument can be a word or a doubleword.

Table 5: Basic Integer Arithmetic Instructions

Mnemonic	operation	result	pop
ADDITION
fiadd word	ST(0) + word	ST(0)	no
fiadd doubleword	ST(0) + doubleword	ST(0)	no
SUBTRACTION
fisub word	ST(0) - word	ST(0)	no
fisub doubleword	ST(0) - doubleword	ST(0)	no
REVERSED SUBTRACTION
fisubr word	word - ST(0)	ST(0)	no
fisubr doubleword	doubleword - ST(0)	ST(0)	no
MULTIPLICATION
fimul word	ST(0) * word	ST(0)	no
fimul doubleword	ST(0) * doubleword	ST(0)	no
DIVISION
fidiv word	ST(0) / word	ST(0)	no
fidiv doubleword	ST(0) / doubleword	ST(0)	no
REVERSED DIVISION
fidivr word	word / ST(0)	ST(0)	no
fidivr doubleword	doubleword / ST(0)	ST(0)	no

The basic arithmetic instructions also contain instructions for other calculations.

fprem - calculates the partial remainder obtained from dividing the value in the ST(0) register by the value in the ST(1) register, with the quotient truncated toward zero.
fprem1 - calculates the partial remainder of the same division as defined by the IEEE 754 standard, with the quotient rounded to the nearest integer.
fabs - calculates the absolute value of ST(0).
fchs - changes the sign of ST(0).
frndint - rounds ST(0) to an integer.
fscale - scales ST(0) by a power of two taken from ST(1).
fxtract - separates the value in ST(0) into the exponent placed in ST(0) and the significand, which is pushed onto the stack. As a result, the exponent is in ST(1) and the significand in ST(0).
fsqrt - calculates the square root of ST(0).
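
Some of these instructions have close counterparts in Python's `math` module, which makes their results easy to check. The following sketch assumes finite, nonzero inputs and models only the final result: the real fprem and fprem1 may need to be repeated to complete the reduction, and special values are not handled. The function names are hypothetical helpers named after the mnemonics.

```python
import math


def fprem(st0, st1):
    # Remainder with the quotient truncated toward zero.
    return math.fmod(st0, st1)


def fprem1(st0, st1):
    # IEEE 754 remainder: quotient rounded to the nearest integer.
    return math.remainder(st0, st1)


def fscale(st0, st1):
    # ST(0) * 2**n, where n is ST(1) truncated toward zero.
    return math.ldexp(st0, math.trunc(st1))


def fxtract(st0):
    # Returns (significand, exponent) with 1 <= |significand| < 2.
    m, e = math.frexp(st0)        # frexp yields 0.5 <= |m| < 1
    return m * 2.0, float(e - 1)


print(fprem(7.0, 4.0))     # 3.0
print(fprem1(7.0, 4.0))    # -1.0
print(fscale(3.0, 2.9))    # 12.0
print(fxtract(12.0))       # (1.5, 3.0)
```

The two remainders differ because 7 / 4 = 1.75 truncates to 1 but rounds to 2.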

Comparison instructions

The comparison instructions compare two floating-point values and set flags appropriate to the result.

fcom - compare two floating-point values
fcomp - compare two floating-point values with pop
fcompp - compare two floating-point values with double pop

The operand of the fcom instruction can be a memory operand or another FPU register. It is always compared with the top of the stack. If no operand is specified, the instruction compares ST(0) and ST(1).
The popped version, fcomp, additionally pops ST(0) off the stack.
The instruction fcompp, with a double “P” at the end, can't have any argument; it compares ST(0) and ST(1) and pops both registers off the stack.
If one of the arguments is NaN, these instructions raise the invalid arithmetic operand exception. To avoid unwanted exceptions, there are unordered versions of the comparison instructions, which raise this exception only for signalling NaNs.

fucom - compare two floating-point values, one can be NaN
fucomp - compare two floating-point values with pop, one can be NaN
fucompp - compare two floating-point values with double pop, one can be NaN

Unordered comparison instructions do not operate with memory arguments.
Two instructions are implemented to compare integers.

ficom - compare ST with integer
ficomp - compare ST with integer and pop

They have a single memory argument that can be a word or doubleword, which is compared with the top of the stack.

The original instructions set the flags C0, C2, and C3 in the FPU status word register. With the P6 processor family, a new set of instructions appeared that set the ZF, PF, and CF flags in the EFLAGS register directly.

fcomi - compare two floating-point values, set EFLAGS
fcomip - compare two floating-point values with pop, set EFLAGS
fucomi - unordered compare of two floating-point values, set EFLAGS
fucomip - unordered compare of two floating-point values with pop, set EFLAGS

Their first argument is always ST(0), the second is another FPU register.
The fxam and ftst instructions also belong to the group of comparison instructions. They return the information in the C0, C2 and C3 flags.

fxam - instruction classifies the value of ST(0).
ftst - instruction compares ST(0) with the value of 0.0.
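
The way comparison results are encoded in the flags can be summarised in a short Python model. The functions `fcom_flags` and `fcomi_flags` are hypothetical helpers; the encoding assumed is the usual one, where "greater" clears all three flags, "less" sets C0, "equal" sets C3, and "unordered" sets all three, with ZF, PF and CF mirroring C3, C2 and C0.

```python
import math


def fcom_flags(st0, src):
    """Return (C3, C2, C0) for a comparison of ST(0) with the source operand."""
    if math.isnan(st0) or math.isnan(src):
        return (1, 1, 1)          # unordered
    if st0 > src:
        return (0, 0, 0)
    if st0 < src:
        return (0, 0, 1)
    return (1, 0, 0)              # equal


def fcomi_flags(st0, sti):
    """Return the EFLAGS bits for the fcomi-style comparison of ST(0) with ST(i)."""
    c3, c2, c0 = fcom_flags(st0, sti)
    return {"ZF": c3, "PF": c2, "CF": c0}


print(fcom_flags(2.0, 1.0))            # (0, 0, 0)
print(fcom_flags(1.0, 2.0))            # (0, 0, 1)
print(fcom_flags(1.0, 1.0))            # (1, 0, 0)
print(fcom_flags(float("nan"), 1.0))   # (1, 1, 1)
print(fcomi_flags(1.0, 2.0))
```

Because CF and ZF are set the same way as after an unsigned integer comparison, the result of the EFLAGS-setting forms can be tested with the "below" and "above" conditions.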

Transcendental instructions

The transcendental instructions perform calculations of advanced mathematical functions.

fsin - calculates the sine of the argument stored in ST(0).
fcos - calculates the cosine of the argument stored in ST(0).
fsincos - calculates both sine and cosine with the same instruction. The sine is returned in ST(1), the cosine in ST(0).
fptan - calculates the partial tangent of the argument stored in ST(0).
fpatan - calculates the partial arctangent of ST(1)/ST(0).

After calculating the tangent, the value 1.0 is pushed onto the stack to make it easier to calculate the cotangent later by executing the fdivr instruction. “Partial” means that the instruction handles only a limited range of input arguments.
The instructions for exponential and logarithmic functions are summarised in table 6.

Table 6: Transcendental Arithmetic Instructions

Mnemonic	operation	note on operands
f2xm1	2^x - 1	x is ST(0), in the range -1 to 1
fyl2x	y * log2(x)	y is ST(1); x is ST(0)
fyl2xp1	y * log2(x + 1)	y is ST(1); x is ST(0)
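
These three instructions are usually combined to build other functions. As a hedged illustration, the Python sketch below models them with the `math` module and composes a power function x^y for positive x in a way that keeps the argument of f2xm1 within its range of -1 to 1. All function names are hypothetical helpers, and the results have double precision rather than extended precision.

```python
import math


def f2xm1(x):
    # 2**x - 1, defined only for -1 <= x <= 1.
    assert -1.0 <= x <= 1.0
    return 2.0 ** x - 1.0


def fyl2x(y, x):
    # y * log2(x)
    return y * math.log2(x)


def fyl2xp1(y, x):
    # y * log2(x + 1), intended for x close to zero.
    return y * math.log1p(x) / math.log(2.0)


def power(x, y):
    """x**y for x > 0, built from the operations modelled above."""
    t = fyl2x(y, x)                  # t = y * log2(x)
    n = float(round(t))              # like frndint: integer part of t
    frac = t - n                     # |frac| <= 0.5, safe for f2xm1
    return math.ldexp(f2xm1(frac) + 1.0, int(n))   # like fscale: times 2**n


print(power(2.0, 10.0))   # 1024.0
print(power(9.0, 0.5))    # approximately 3.0
```

The split into an integer part and a fraction is needed because 2^t cannot be computed by f2xm1 alone when t lies outside its allowed range.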

FPU control instructions

The FPU control instructions help the programmer to save and restore the contents of chosen registers if there is a need to use them in an interrupt handler or inside a function. It is also possible to initialise the FPU unit's state or clear errors.

fincstp - increments the FPU register stack pointer
fdecstp - decrements the FPU register stack pointer.

The following set of instructions can perform error checking during execution (instructions without “N”) or perform the operation without checking for error conditions (instructions with “N”).

finit and fninit instructions initialise the FPU (either after checking error conditions or without checking them).
fclex and fnclex clear floating-point exception flags.
fstcw and fnstcw store the FPU control word.
fldcw loads the FPU control word.
fstenv and fnstenv store the FPU environment. The environment consists of the FPU control word, status word, tag word, instruction pointer, data pointer, and last opcode register.
fldenv loads the FPU environment.
fsave and fnsave save the FPU state. The state is the operating environment and full register stack.
frstor restores the FPU state.
fstsw and fnstsw store the FPU status word. There is no instruction for restoring the status word.
wait or fwait waits for the FPU to finish the operation.
fnop instruction is the no-operation instruction for the FPU.
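
Once the status word has been stored with fstsw or fnstsw, its fields can be extracted with shifts and masks. The Python sketch below assumes the usual layout of the status word, with C0, C1 and C2 in bits 8 to 10, the stack top field in bits 11 to 13, and C3 in bit 14; the function `decode_status_word` is a hypothetical helper.

```python
def decode_status_word(sw):
    """Extract the stack top field and condition code flags from a 16-bit status word."""
    return {
        "TOP": (sw >> 11) & 0b111,   # bits 11-13
        "C0": (sw >> 8) & 1,
        "C1": (sw >> 9) & 1,
        "C2": (sw >> 10) & 1,
        "C3": (sw >> 14) & 1,
    }


# Example value: TOP = 7 and C3 set, as after an "equal" comparison
# performed with one value on the stack.
fields = decode_status_word(0x7800)
print(fields)
```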
