In our first scenario, we will modify the conversion library, adding another function which should convert integer input into a hexadecimal representation. We can copy the int_to_ascii function and introduce some simple modifications. First, we need to divide the input value by 16, not by 10.
mov rbx, 16
After each division operation, we obtain a remainder in the range 0-15. We can't convert it into an ASCII digit the same way as in decimal, because the digits 0-9 and the letters A-F do not form a contiguous range of character codes. We can deal with this situation in different ways. One approach is to check if dl is greater than 9 and, if so, shift it so that it points to the letter characters.
    cmp dl, 9             ; test if dl > 9
    jna zero_to_nine      ; if not jump over adjustment
    add dl, "A"-"9"-1     ; adjust dl with the distance between A and 9
zero_to_nine:
    add dl, "0"           ; convert to ASCII
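To check the adjustment constant, the same logic can be modelled in C. This is only a sketch: the function name remainder_to_hex_digit is hypothetical and not part of the conversion library, and the input is assumed to be in the range 0-15 already.

```c
#include <stdint.h>

/* Hypothetical helper: converts a remainder in the range 0-15
 * into an ASCII hexadecimal digit, mirroring the assembly above. */
char remainder_to_hex_digit(uint8_t r)
{
    if (r > 9)                 /* cmp dl, 9 / jna zero_to_nine */
        r += 'A' - '9' - 1;    /* skip the 7 characters between '9' and 'A' */
    return (char)(r + '0');    /* add dl, "0" */
}
```

The distance 'A' - '9' - 1 equals 7, so the remainder 10 becomes 10 + 7 + 48 = 65, the ASCII code of 'A'.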
Another approach is to define the table of characters (lookup table) in the data section containing all digits and letters, and pick the correct character using the xlatb instruction or the mov with proper indirect addressing mode.
.data
hex_digits db "0123456789ABCDEF"
.code
    ...
    lea rcx, hex_digits          ; load address of lookup table
    and rdx, 0000000Fh           ; limit the range to 15
    mov dl, byte ptr [rcx+rdx]   ; convert remainder into ASCII
    ...
In the second approach, we use indirect addressing with the sum of the rcx and rdx registers. The base address of the table must be loaded into rcx with the lea instruction; it cannot be used as a constant. The reason is that an instruction we could use in 32-bit mode:
    mov dl, hex_digits[rdx] ; This instruction is NOT VALID in 64-bit mode
signals an error when used in 64-bit long mode. The address of the lookup table is a 64-bit number, but the constant encoded in this form of the mov instruction can't exceed 32 bits.
To use the xlatb instruction mentioned above, we have to preserve rax before the conversion, because xlatb takes its index from al and returns the result in al. We do it by storing rax temporarily in rcx. The rbx register now has two roles: in each iteration, we set it to 16 before dividing and to the lookup table address before xlatb.
.code
    ...
    mov rbx, 16            ; prepare divisor
    div rbx                ; rax / 16 → remainder in rdx
    mov rcx, rax           ; store temporarily rax
    lea rbx, hex_digits    ; load address of lookup table
    and rdx, 0000000Fh     ; limit the range to 15
    mov al, dl             ; prepare index in al
    xlatb                  ; convert remainder into ASCII
    mov [rdi], al          ; put character to resulting table
    mov rax, rcx           ; restore rax
To improve the performance of our code, in the case of hexadecimal numbers, it is possible to replace the time-consuming division instruction with an instruction to shift the number by four bit positions right. We leave the implementation of this optimisation to the reader.
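Before writing the assembly version, it may help to see the idea in C. The sketch below is a reference model, not the library code: the function name u64_to_hex is hypothetical, and it assumes that digits are written from the end of the buffer backwards and that at least 16 characters are available before the end pointer. Masking with 0Fh gives the same result as the remainder of a division by 16, and shifting right by four bits gives the quotient.

```c
#include <stdint.h>

/* Lookup table, the C counterpart of hex_digits in the data section. */
static const char hex_digits[] = "0123456789ABCDEF";

/* Hypothetical model: writes the hexadecimal digits of value backwards,
 * ending just before end, and returns a pointer to the first digit.
 * The caller must provide at least 16 characters before end. */
char *u64_to_hex(uint64_t value, char *end)
{
    char *p = end;
    do {
        *--p = hex_digits[value & 0x0F];  /* same as the remainder of value / 16 */
        value >>= 4;                      /* same as the quotient of value / 16 */
    } while (value != 0);                 /* at least one digit, also for 0 */
    return p;
}
```

For example, calling it with the value 255 and a pointer just past a 16-character buffer returns a pointer to the two characters FF.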
As the second scenario, we will add to our library a function for displaying floating-point values. This function will allow us to display the results of calculations we implement in further scenarios. According to x64 Windows ABI rules, floating-point values should be passed through XMM registers. We will display a single value, so we'll use the XMM0 register.
Displaying floating-point numbers is a much more complex task than displaying an integer. We will split it into a conversion of the fractional part and a conversion of the integer part. First, we'll copy the argument from XMM0 into XMM1 so that the original value stays unchanged.
Let's start with a check to see if the value is positive or negative. Floating-point numbers are stored as absolute values, with the sign bit in the most significant position. The encodings of a positive and a negative number with the same absolute value differ only in the sign bit. To test whether a number is negative, we can use the movmskps instruction, which copies the sign bits from all elements of a vector into the destination register. As our argument is a scalar, the bit we're interested in is at the lowest position. Rotating the register one position to the right with rcr moves this bit into the carry flag, which we can test with a conditional jump. If the argument is negative, we'll change it into positive by clearing the sign bit. The andps instruction with the clear_sign_bit variable clears exactly one bit in the XMM1 register: the sign bit of the lowest element.
.data
align 16               ; andps requires a 16-byte aligned memory operand
clear_sign_bit dword 07FFFFFFFh, 0FFFFFFFFh, 0FFFFFFFFh, 0FFFFFFFFh
...
.code
    ...
    ; test if the number is positive or negative
    movq xmm1, xmm0
    movmskps rax, xmm1
    rcr rax, 1
    jnc float_positive
    ; change the sign of the scalar
    andps xmm1, xmmword ptr clear_sign_bit
    ; do not change the sign
float_positive:
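The sign test and the mask can be checked with a small C model. This is a sketch under the assumption that float is a 32-bit IEEE 754 single-precision value; the function names float_sign_bit and float_clear_sign are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 if the sign bit of f is set. This is the bit that
 * movmskps places in the lowest position of the destination. */
int float_sign_bit(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);   /* reinterpret the float as raw bits */
    return (int)(bits >> 31);
}

/* Clears the sign bit, like andps with the 07FFFFFFFh mask. */
float float_clear_sign(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    bits &= 0x7FFFFFFFu;              /* all bits kept except the sign bit */
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

Because only the sign bit changes, float_clear_sign leaves positive values untouched, which is why the assembly could also apply the mask unconditionally.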
We will start the conversion from the least significant digit of the fractional part, limiting precision to thousandths. We obtain the fractional part by subtracting the integer part from the original argument. The integer part is obtained with the cvttss2si instruction, which simply truncates the fractional part of a number. We store the result in rcx for further use.
.data
const1000 real4 1000.0
.code
    ...
    ; convert fractional part
    cvttss2si rax, xmm1      ; convert float to int with truncation
    mov rcx, rax             ; store for conversion of an integer part
    cvtsi2ss xmm2, rax       ; convert back into float
    subss xmm1, xmm2         ; subtract integer part
    mulss xmm1, const1000    ; we want three fractional digits
    cvttss2si rax, xmm1
    mov rbx, 10
    mov r8, 3                ; digit counter (r8 is assumed to be free here)
convert_fraction:
    dec rdi                  ; starting from the end of the text (least significant)
    xor rdx, rdx             ; prepare to divide rdx:rax by rbx
    div rbx                  ; rax / 10 → remainder in rdx
    add dl, "0"              ; convert remainder into ASCII
    mov [rdi], dl            ; write character to buffer
    dec r8                   ; always write three digits, so that leading
    jnz convert_fraction     ; zeros such as in .050 are not lost
We separate the fractional and integer parts with a dot.
    ; add dot
    dec rdi
    mov byte ptr [rdi], '.'
The integer part is converted with the same algorithm as the fractional part, but first we restore its value from rcx.
    ; convert integer part
    mov rax, rcx             ; restore integer part
convert_integer:
    dec rdi                  ; starting from the end of the text (least significant)
    xor rdx, rdx             ; prepare to divide rdx:rax by rbx
    div rbx                  ; rax / 10 → remainder in rdx
    add dl, "0"              ; convert remainder into ASCII
    mov [rdi], dl            ; write character to buffer
    test rax, rax            ; test if there is still a value for conversion
    jne convert_integer
After converting the integer part, we need to add a minus sign for a negative value. We test the sign again with the same method as at the beginning. This is why we kept the original argument in XMM0.
    ; test if the number is positive or negative
    movmskps rax, xmm0
    rcr rax, 1
    jnc end_float
    ; add minus if needed
    dec rdi                  ; add minus character
    mov byte ptr [rdi], '-'
end_float:
The final part, calculating the string length, is the same as in the conversion of integers.
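The whole conversion can be summarised in a C reference model, which is convenient for comparing results with the assembly function. It is a sketch under several assumptions: the name float_to_ascii is hypothetical, the buffer is filled backwards from the end pointer and is large enough, the integer part fits in 64 bits, and the model always writes exactly three fractional digits so that leading zeros in the fraction are kept.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical reference model: writes value with three fractional digits
 * backwards, ending just before end, and returns a pointer to the first
 * character. The caller provides the buffer and its terminator. */
char *float_to_ascii(float value, char *end)
{
    char *p = end;
    uint32_t bits;
    float magnitude;

    memcpy(&bits, &value, sizeof bits);
    int negative = (int)(bits >> 31);                  /* movmskps + rcr */
    bits &= 0x7FFFFFFFu;                               /* andps clear_sign_bit */
    memcpy(&magnitude, &bits, sizeof magnitude);

    uint64_t integer_part = (uint64_t)magnitude;       /* cvttss2si (truncation) */
    float fraction = magnitude - (float)integer_part;  /* cvtsi2ss + subss */
    uint64_t scaled = (uint64_t)(fraction * 1000.0f);  /* mulss + cvttss2si */

    for (int i = 0; i < 3; i++) {                      /* three fractional digits */
        *--p = (char)('0' + scaled % 10);
        scaled /= 10;
    }
    *--p = '.';                                        /* separator */
    do {                                               /* integer part */
        *--p = (char)('0' + integer_part % 10);
        integer_part /= 10;
    } while (integer_part != 0);
    if (negative)
        *--p = '-';                                    /* minus for negative values */
    return p;
}
```

Values that are exactly representable in binary make good test inputs: 2.5 should give 2.500 and -1.25 should give -1.250. Because the fraction is truncated rather than rounded, 3.0625 gives 3.062.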
In the next scenario, we will create a library with functions performing simple calculations on integers and floating-point numbers. We will write functions for adding six integers and six floating-point values. This example presents argument passing through the registers and also through the stack, showing the order and addresses of stack-allocated arguments. The simplest version of a function adds six integers without advanced stack manipulation. Let's present the code of the function first. It takes the first four arguments from registers and the last two from the stack. Please note that for each argument, an 8-byte space is reserved on the stack.
.code
; ---------------------------------
; sum of six integer arguments
; this is a leaf function
; does not need to reserve shadow space
; arguments as passed by MSVC
; a = RCX
; b = RDX
; c = R8
; d = R9
; e = [RSP + 28h]
; f = [RSP + 30h]
; ---------------------------------
sum_6_int proc
    mov rax, rcx
    add rax, rdx                    ; a + b
    mov rcx, r8
    add rax, rcx                    ; + c
    mov rcx, r9
    add rax, rcx                    ; + d
    mov rcx, QWORD PTR [rsp + 28h]
    add rax, rcx                    ; + e
    mov rcx, QWORD PTR [rsp + 30h]
    add rax, rcx                    ; + f
    ret
sum_6_int endp
From the function's perspective, the stack looks as shown in Fig. 1.
The caller passes arguments according to the Windows x64 ABI. The first four arguments are passed through the RCX, RDX, R8 and R9 registers. Further arguments are placed onto the stack. The caller is also responsible for reserving stack space for all arguments before the call, even those passed through registers; the 32 bytes that correspond to the four register arguments are called the shadow space. That's why 32 bytes are reserved before the return address is automatically placed on the stack by the call instruction.
Please note the order of arguments. They are placed onto the stack in reverse order, so the last argument is placed on the stack first. That's why the 6th argument is at the highest address, below it is the 5th argument, and below that is the shadow space for arguments 1-4. From the perspective of the function, the first argument (or rather its shadow) is just after the return address. As the return address consumes 8 bytes, the shadow space for the first argument is at address RSP+8.
How to call such a function? Putting the first four parameters into registers is quite simple. To place remaining arguments onto the stack, it is possible to use the push instruction.
    ;call sum of 6 integers function
    mov rcx, 1        ; 1st argument
    mov rdx, 2        ; 2nd argument
    mov r8, 3         ; 3rd argument
    mov r9, 4         ; 4th argument
    mov r11, 6
    push r11          ; 6th argument
    mov r10, 5
    push r10          ; 5th argument
    sub rsp, 20h      ; shadow space
    call sum_6_int    ; function call
    add rsp, 30h      ; stack cleanup
    mov rcx, rax      ; result in rax
The figure shows the stack organisation from the caller's perspective. First, the 6th argument is pushed onto the stack. Next, the 5th argument is pushed. Next, the shadow space is reserved with the subtraction instruction sub rsp, 20h. Finally, the return address is pushed by the call instruction. The arrows point to the addresses (where RSP points) after the specified instructions.
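The offsets 28h and 30h used inside sum_6_int can be verified by replaying the call sequence on a simulated stack. The C sketch below is only a model of the layout: the Stack type and all function names are hypothetical, each array slot stands for one 8-byte stack entry, and the return address is represented by a placeholder value.

```c
#include <stdint.h>

enum { STACK_SLOTS = 16 };

/* Hypothetical model of the stack: each slot is one 8-byte qword. */
typedef struct {
    uint64_t slot[STACK_SLOTS];
    int sp;                       /* index of the slot RSP points to */
} Stack;

/* push: decrement RSP by 8, then store the value. */
static void push(Stack *s, uint64_t v) { s->slot[--s->sp] = v; }

/* Reads the qword at [RSP + byte_offset]. */
static uint64_t at(const Stack *s, int byte_offset)
{
    return s->slot[s->sp + byte_offset / 8];
}

/* Model of sum_6_int: four register arguments, two read from the stack. */
static uint64_t sum_6_int_model(const Stack *s, uint64_t rcx, uint64_t rdx,
                                uint64_t r8, uint64_t r9)
{
    return rcx + rdx + r8 + r9 + at(s, 0x28) + at(s, 0x30);
}

/* Replays the caller's instruction sequence and returns the computed sum. */
uint64_t call_sum_6_int_model(void)
{
    Stack s = { {0}, STACK_SLOTS };   /* empty stack, RSP at the top */
    push(&s, 6);                      /* push r11: 6th argument */
    push(&s, 5);                      /* push r10: 5th argument */
    s.sp -= 0x20 / 8;                 /* sub rsp, 20h: shadow space */
    push(&s, 0);                      /* call: return address (placeholder) */
    return sum_6_int_model(&s, 1, 2, 3, 4);
}
```

After the simulated call, the slot at offset 28h holds 5 and the slot at offset 30h holds 6, so the model returns 1 + 2 + 3 + 4 + 5 + 6 = 21.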