In this scenario, we will write the program using FPU to calculate some mathematical equations.
Prerequisites
Read the sections “Scenario x64_2: Creating static libraries. Integer to text conversion function.” and “Scenario x64_4: Converting floating point values to text.” You should have the library with the conversion functions ready.
Scenario
Write the code of an assembler main program performing calculation with PFU and display the resulting values with the previously written conversion library. Use various floating-point types: single precision, double precision and double extended precision.
Result
The results are calculated and displayed in a console.
Start
You can create a new assembly source file or modify the file that uses the conversion library created in “Scenario x64_4: Converting floating point values to text.”
Step 1
We will start with a simple addition of two single-precision floating-point numbers to test if all works correctly. At the beginning, we need to include all required libraries and a file containing function prototypes.
; include the system library and our convert library includelib kernel32.lib includelib convert.lib include convert.inc
We need to declare external system functions and a constant for GetStdHandle.
; declare external functions EXTERN WriteConsoleA:PROC EXTERN GetStdHandle:PROC ; constant required by the GetStdHandle system function STD_OUTPUT_HANDLE equ -11
The data section of our program contains the buffer for the text representation of the number to be displayed, the variables required for the proper operation of system functions, our two addends, and a place for the result.
.data buffer db 32 dup(0) ; buffer for a string hOut dq ? ; placeholder for console handle dummy dq ? ; place for dummy parameter var_x real4 4.89 ; first addend var_y real4 17.45 ; second addend result real4 ? ; place for result
The code of the main function is similar to the one presented in “Scenario x64_2: Creating static libraries. Integer to text conversion function.” Here we show the FPU calculations and the conversion function call. Remember that the floating point argument to the conversion function is passed by XMM0.
; perform FPU calculations fld var_x ; load first addend to FPU stack fadd var_y ; add second addend fstp result ; store the result ; convert float to text lea rdx, buffer ; prepare the address of a buffer movss xmm0, result ; pass the result as an argument to a function call float_to_ascii ; call conversion function
Step 2
While we confirmed that our simple program works properly, we can write more complex FPU calculations. Let's compute the average of the values in a table.
We start with the definition of a table and a variable that holds the table length.
.data ... table_real real4 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0 table_count DW LENGTHOF table_real
The code to compute the average starts with the preparation for a loop. The number of iterations is stored in rcx. We zero-extend the word variable table_count to the size of rcx with the use of the movzx instruction. We need to prepare the address register to access table elements with indirect addressing. Here, the address register is rdx. We decrement rcx and add 4 to rdx because the first element is pre-loaded to the FPU stack before the loop starts.
; perform FPU calculations - average movzx rcx, table_count lea rdx, table_real fld dword ptr [rdx] sub rcx, 1 add rdx, 4
Inside a loop, we compute the sum of all elements. After finishing a loop, we divide the sum by the number of elements with the use of the integer divide instruction fidiv.
average_loop: fadd dword ptr [rdx] add rdx, 4 loop average_loop fidiv table_count fstp result
Step 3
In this step, we'll solve the quadratic equation. The classical method is to calculate the delta, and later to calculate two possible results: root_1 and root_2. Let's start with writing formulas in Reverse Polish Notation (postfix notation). The formula to calculate delta looks like the following.
** Delta ** b b * 4 a * c * -
This can be calculated with the code below.
.data aa real4 3.0 ; a bb real4 7.0 ; b cc real4 2.0 ; c const_4 real4 4.0 ; constant 4 .code ; calculating delta fld bb ; load b fld bb ; load b fmul ; multiply b * b fld const_4 ; load 4 fld aa ; load a fmul ; multiply 4 * a fld cc ; load c fmul ; multiply 4*a * c fsub ; subtract b*b - 4*a*c
After executing the code, we have the delta at the FPU stack top. We will use it to compute both roots. Before we'll test if delta is not negative.
; here we test if delta is non-negative ftst ; test st(0) fstsw ax ; store status word to ax sahf ; store ah to flags jb no_roots ; jump if delta is negative
It's time to calculate roots. The RPN formula for this calculation looks like this:
** X1 ** b neg delta sqrt + 2 a * /
Writing the code for this calculation, we obtain:
.data const_2 real4 2.0 ; constant 2 root_1 real4 ? ; place for the first result root_2 real4 ? ; place for the second result .code ; calculate first root fld bb ; load b fchs ; change sign of b fld st(1) ; copy delta into st(0) fsqrt ; square root of delta fadd ; add (-b) + square root of delta fld const_2 ; load 2 fld aa ; load a fmul ; multiply 2 * a fdiv ; divide ; we have the first root fstp root_1
The RPN-noted formula for the second root differs by one operation only. Minus instead of plus.
** X2 ** b neg delta sqrt - 2 a * /
Please write the code for the second root calculation on your own. Please note that after calculating the first root, we still have the delta in st(0). Write code to properly handle the situation when we don't have a real solution by displaying the appropriate message instead of results.
Step 4
Our previous program was almost a direct translation from RPN to FPU mnemonics. We can optimise the code in some places. Here are some examples:
The optimised version of the code is shown in the following fragments. They also show the contents of the FPU stack after executing each instruction to track the algorithm.
; Calculating delta st(0); st(1); st(2); st(3); fld1 ; Get constant 1 1; ; ; ; fadd st,st ; Add 1 to 1 to get 2 2; ; ; ; fld st ; Copy it 2; 2; ; ; fmul aa ; = 2a 2a; 2; ; ; fmul st(1),st ; = 4a 2a; 4a; ; ; fxch ; Exchange st and st(1) 4a; 2a; ; ; fmul cc ; = 4ac 4ac; 2a; ; ; fld bb ; Load b b; 4ac; 2a; ; fmul st,st ; = b*b b*b; 4ac; 2a; ; fsubr ; = b*b - 4ac (always pop) delta; 2a; ; ; ; you can test here if the delta is non-negative
The following code presents computing both roots in the same fragment of a program. Please note that d represents delta, while pv and mv results before division by 2a.
; Compute both roots st(0); st(1); st(2); st(3); fsqrt ; = sqrt(b*b - 4ac) sqr(d); 2a; ; ; fld bb ; Load b b; sqr(d); 2a; ; fchs ; Make it negative -b; sqr(d); 2a; ; fxch ; Exchange sqr(d); -b; 2a; ; fld st ; Copy square root sqr(d); sqr(d); -b; 2a; fadd st,st(2) ; pv = -b + root(b*b - 4ac) pv; sqr(d); -b; 2a; fxch ; Exchange sqr(d); pv; -b; 2a; fsubp st(2),st ; mv = -b - root(b*b - 4ac) pv; mv; 2a; ; fdiv st,st(2) ; Divide plus version pv/2a; mv; 2a; ; fstp root_1 ; Store it mv; 2a; ; ; fdivr ; Divide minus version mv/2a; ; ; ; fstp root_2 ; Store it ; ; ; ;
This optimised implementation requires detailed stack tracing, but can yield significant speed improvements.
Step 5
Make calculations with double-precision and double-extended precision values. Compare results. Note that for longer data formats, you need to modify the conversion library, remembering that SSE does not operate on double-extended precision values.
Result validation
The program should display proper floating-point results of calculations. Please check the positive and negative input values.
If it does not print the proper value?
Please remember the order and placement of the arguments and the result of the function. Please keep the Windows ABI calling convention and preserve any non-volatile register before use. You can use FPU registers freely, remembering that they are organised as a stack.