====== Data Types and Encoding ======

The x86 family of processors supports computations on integer, floating-point, and vector data. This data is stored in memory as a series of bytes of varying length. In this chapter, we present the fundamental data types and the types derived from them in assembly language programs.

===== Fundamental data types =====

Fundamental data types are the data elements stored in memory as 8-bit (Byte), 16-bit (Word), 32-bit (Doubleword), 64-bit (Quadword), or 128-bit (Double quadword) values, as shown in figure {{ref>fundamentaldata}}. Many instructions process these data types without any special interpretation; it is up to the programmer to decide how to interpret the operands and results of an instruction. Data types that contain more than a single byte are stored in little-endian order: the least significant byte is stored at the lowest address, and this address is also the address of the data item as a whole.
{{ :en:multiasm:cs:fundamental_data_types.png?600 |Diagram showing fundamental data types: byte (8 bits), word (16 bits), doubleword (32 bits), quadword (64 bits), and double quadword (128 bits). Shows binary representation with bit positions and byte organization.}} Fundamental Data Types
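The little-endian byte order described above can be observed from a high-level language. The following is a minimal sketch in C, not code taken from this chapter; the function names ''dword_bytes'', ''is_little_endian'' and ''dword_from_le'' are hypothetical helpers introduced only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy the in-memory bytes of a 32-bit doubleword into out[0..3],
   lowest address first. */
void dword_bytes(uint32_t value, uint8_t out[4])
{
    memcpy(out, &value, sizeof value);
}

/* Returns 1 when the host stores the least significant byte at the
   lowest address (little-endian, as on x86), otherwise 0. */
int is_little_endian(void)
{
    uint8_t bytes[4];
    dword_bytes(0x12345678u, bytes);
    return bytes[0] == 0x78;
}

/* Rebuild a doubleword from four bytes stored in little-endian order.
   This works the same way regardless of the host byte order. */
uint32_t dword_from_le(const uint8_t bytes[4])
{
    assert(bytes != NULL);
    return (uint32_t)bytes[0]
         | ((uint32_t)bytes[1] << 8)
         | ((uint32_t)bytes[2] << 16)
         | ((uint32_t)bytes[3] << 24);
}
```

On an x86 machine, ''dword_bytes(0x12345678, out)'' is expected to fill ''out'' with 0x78, 0x56, 0x34, 0x12, that is, with the least significant byte at the lowest address.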
The data types that can be used in x64-architecture processors are integers and floating-point numbers. Integers are processed by the main CPU as single values (scalars). They can also be packed into vectors and processed with specific SIMD instructions, including MMX and, in part, SSE and AVX. Integers can be interpreted as unsigned or signed. They can also represent a pointer to a variable or an address within the code. The FPU processes scalar real numbers. SSE and AVX instructions support calculations with scalars or vectors composed of real values. All possible variants of data types are stored within the fundamental data types. In the remainder of this chapter, we describe the scalar and packed integer and floating-point data types.

===== Integers =====

Integers are numbers without a fractional part. In the x64 architecture, it is possible to define a variety of data types of different sizes, all of which are based on bytes. A single byte is the smallest item of information that can be stored in memory, even if not all of its bits are effectively used. Two bytes form a word, so in the x64 architecture the Word data type is 16 bits long. Two words form the Double Word data type (32 bits), and four words form the Quad Word data type (64 bits). With the large registers of modern processors, a few instructions can also use the Double Quad Word data type, containing 128 bits (sometimes called Octal Word).

===== Integer scalar data types =====

Integer data types can be one, two, four or eight bytes in length. Unsigned integers are encoded in natural binary code. This means that the minimum value is 0 (all bits cleared), and the maximum value has all available bits set to 1. The x64 architecture supports the unsigned integer data types shown in table {{ref>uinttypes}}.
^ Name ^ Number of bits ^ Minimum value ^ Maximum value ^ Minimum value (hex) ^ Maximum value (hex) ^
| Byte | 8 | 0 | 255 | 0x00 | 0xFF |
| Word | 16 | 0 | 65535 | 0x0000 | 0xFFFF |
| Doubleword | 32 | 0 | 4294967295 | 0x00000000 | 0xFFFFFFFF |
| Quadword | 64 | 0 | 18446744073709551615 | 0x0000000000000000 | 0xFFFFFFFFFFFFFFFF |
Unsigned Integer Data Types
Signed integers are encoded in two's complement binary code. The most significant bit of the value is the sign bit. If it is zero, the number is non-negative; if it is one, the value is negative. This means that the minimum value is encoded with the highest bit set to 1 and all other bits set to 0, while the maximum value is encoded with the highest bit set to 0 and all other bits set to 1. The x64 architecture supports the signed integer data types shown in table {{ref>sinttypes}}.

^ Name ^ Number of bits ^ Minimum value ^ Maximum value ^ Minimum value (hex) ^ Maximum value (hex) ^
| Signed Byte | 8 | -128 | 127 | 0x80 | 0x7F |
| Signed Word | 16 | -32768 | 32767 | 0x8000 | 0x7FFF |
| Signed Doubleword | 32 | -2147483648 | 2147483647 | 0x80000000 | 0x7FFFFFFF |
| Signed Quadword | 64 | -9223372036854775808 | 9223372036854775807 | 0x8000000000000000 | 0x7FFFFFFFFFFFFFFF |
Signed Integer Data Types
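The same bit pattern yields different values depending on whether it is read as unsigned or as signed. The sketch below illustrates this for a single byte in C; the helper names ''as_unsigned_byte'', ''as_signed_byte'' and ''encode_signed_byte'' are assumptions made for this example and do not come from any library.

```c
#include <assert.h>
#include <stdint.h>

/* Interpret an 8-bit pattern as an unsigned byte (0..255). */
unsigned as_unsigned_byte(uint8_t bits)
{
    return bits;
}

/* Interpret the same pattern as a signed byte in two's complement:
   when the sign bit (bit 7) is set, the value is bits - 256. */
int as_signed_byte(uint8_t bits)
{
    return (bits & 0x80u) ? (int)bits - 256 : (int)bits;
}

/* Encode a value from -128..127 as its 8-bit two's complement pattern. */
uint8_t encode_signed_byte(int value)
{
    assert(value >= -128 && value <= 127);
    return (uint8_t)value;   /* conversion to unsigned is modulo 256 */
}
```

For example, the pattern 0x80 reads as 128 when unsigned and as -128 when signed, which matches the minimum value of the Signed Byte type in the table above.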
===== Integer vector data types =====

Vector data types were introduced with SIMD instructions, starting with the MMX extension and continuing with the SSE and AVX extensions. They form packed data types that contain multiple elements of the same size. The elements can be treated as signed or unsigned, depending on the algorithm and the instructions used. The 64-bit packed integer data type contains eight Bytes, four Words or two Doublewords, as shown in figure {{ref>packedint64}}.
{{ :en:multiasm:cs:packed_integers_64.png?600 |Diagram of 64-bit packed integer formats showing: 8 packed bytes (8 bits each), 4 packed words (16 bits each), 2 packed doublewords (32 bits each). Shows how different element sizes pack into 64-bit register.}} 64-Bit Packed Integer Data Types
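To make the idea of packed elements concrete, the following C sketch emulates in plain scalar code what a wraparound packed-word addition does to four 16-bit words held in one 64-bit value. It is only an illustration of the data layout, not SIMD code; ''get_word'' and ''add_packed_words'' are hypothetical names chosen for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Read 16-bit word number `lane` (0 = least significant) from a
   64-bit packed value. */
uint16_t get_word(uint64_t packed, unsigned lane)
{
    assert(lane < 4);
    return (uint16_t)(packed >> (16u * lane));
}

/* Add four packed words lane by lane. Each sum wraps around within its
   own 16 bits, so a carry never spills into the neighbouring lane. */
uint64_t add_packed_words(uint64_t a, uint64_t b)
{
    uint64_t result = 0;
    for (unsigned lane = 0; lane < 4; ++lane) {
        uint16_t sum = (uint16_t)(get_word(a, lane) + get_word(b, lane));
        result |= (uint64_t)sum << (16u * lane);
    }
    return result;
}
```

Adding 0x0001000100010001 to 0x0001FFFF00020003 in this way gives 0x0002000000030004: the lane holding 0xFFFF wraps to 0x0000 without disturbing the lane above it, which an ordinary 64-bit addition would not guarantee.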
The 128-bit packed integer data type contains sixteen Bytes, eight Words, four Doublewords or two Quadwords as shown in figure {{ref>packedint128}}.
{{ :en:multiasm:cs:packed_integers_128.png?600 |Diagram of 128-bit packed integer formats showing: 16 packed bytes, 8 packed words, 4 packed doublewords, 2 packed quadwords. Shows element organization within 128-bit XMM register.}} 128-Bit Packed Integer Data Types
The 256-bit packed integer data type contains thirty-two Bytes, sixteen Words, eight Doublewords, four Quadwords or two Double Quadwords as shown in figure {{ref>packedint256}}.
{{ :en:multiasm:cs:packed_integers_256.png?600 |Diagram of 256-bit packed integer formats showing: 32 packed bytes, 16 packed words, 8 packed doublewords, 4 packed quadwords, 2 packed double quadwords. Shows packing options for AVX 256-bit registers.}} 256-Bit Packed Integer Data Types
The 512-bit packed integer data type contains sixty-four Bytes, thirty-two Words, sixteen Doublewords, eight Quadwords or four Double Quadwords as shown in figure {{ref>packedint512}}. Double Quadwords are not used as operands; they are the results of some operations only.
{{ :en:multiasm:cs:packed_integers_512.png?600 |Diagram of 512-bit packed integer formats showing: 64 packed bytes, 32 packed words, 16 packed doublewords, 8 packed quadwords, 4 packed double quadwords. Shows maximum packing for AVX-512 registers.}} 512-Bit Packed Integer Data Types
===== Floating point values =====

Floating-point values store data encoded for calculations on real numbers. Depending on the algorithm's precision requirements, we can use different data sizes. Scalar data types are supported by the FPU (Floating Point Unit), which offers single-precision, double-precision, and double-extended-precision real numbers. In C/C++, they typically correspond to the float, double, and long double data types, respectively, although the mapping of long double depends on the compiler. Vector (packed) floating-point data types can be processed by many SSE and AVX instructions, offering fast vector, matrix or artificial intelligence calculations. Vector units can process half-precision, single-precision and double-precision formats. The 16-bit Brain Float format was introduced to compute dot products and improve the efficiency of AI training and inference algorithms. Floating point data types are shown in figure {{ref>floattypes}} and described in table {{ref>tablefloattypes}}. The table shows the number of bits stored in each field. For normalized values, the effective mantissa is one bit longer than shown, because the highest bit represents the integer part, which is always "1" and therefore does not need to be stored (except in the Double extended format, where the integer bit is stored explicitly).
{{ :en:multiasm:cs:floating_types.png?600 |Diagram showing floating-point data types: double extended (80-bit), double precision (64-bit), single precision (32-bit), half precision (16-bit), and brain float (16-bit). Shows sign, exponent, and mantissa field layout for each.}} Floating Point Data Types in x64 Architecture
^ Name ^ Bits ^ Mantissa bits ^ Exponent bits ^ Min value ^ Max value ^
| Double extended | 80 | 64 | 15 | {{:en:multiasm:cs:float_ep_min.png?105 }} | {{:en:multiasm:cs:float_ep_max.png?105 }} |
| Double precision | 64 | 52 | 11 | {{:en:multiasm:cs:float_dp_min.png?100 }} | {{:en:multiasm:cs:float_dp_max.png?95 }} |
| Single precision | 32 | 23 | 8 | {{:en:multiasm:cs:float_sp_min.png?100 }} | {{:en:multiasm:cs:float_sp_max.png?90 }} |
| Half precision | 16 | 10 | 5 | {{:en:multiasm:cs:float_hp_min.png?90 }} | {{:en:multiasm:cs:float_hp_max.png?80 }} |
| Brain Float | 16 | 7 | 8 | {{:en:multiasm:cs:float_bf_min.png?100 }} | {{:en:multiasm:cs:float_bf_max.png?90 }} |
Floating Point Data Types
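The field layout of the single-precision format (1 sign bit, 8 exponent bits, 23 mantissa bits) can be inspected with a few shifts and masks. The C sketch below assumes that the compiler's ''float'' is the 32-bit IEEE 754 single-precision format, which is the usual case on x64 but is still an assumption; the ''single_fields'' structure and ''decode_single'' function are hypothetical names used only here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Fields of a single-precision value, as stored in memory. */
struct single_fields {
    unsigned sign;      /* 1 bit */
    unsigned exponent;  /* 8 bits, stored with a bias of 127 */
    uint32_t mantissa;  /* 23 stored fraction bits (no integer bit) */
};

struct single_fields decode_single(float value)
{
    uint32_t bits;
    struct single_fields f;

    /* Assumption: float is the 32-bit IEEE 754 single-precision format. */
    assert(sizeof value == sizeof bits);
    memcpy(&bits, &value, sizeof bits);

    f.sign     = bits >> 31;
    f.exponent = (bits >> 23) & 0xFFu;
    f.mantissa = bits & 0x7FFFFFu;
    return f;
}
```

For the value 1.0 the stored mantissa is 0 and the stored exponent is 127: the leading integer bit "1" is implied rather than stored, as noted above for all formats except Double extended.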
===== Floating point vector data types =====

Floating-point vectors are formed with single-precision or double-precision packed data formats. They are processed by SSE or AVX instructions using a SIMD approach. A 128-bit packed data format can store four single-precision values or two double-precision values. A 256-bit packed data format can store eight single-precision values or four double-precision values. A 512-bit packed data format can store 16 single-precision values or 8 double-precision values. These packed data types are shown in figure {{ref>packedfloattypes}}. Instructions operating on 16-bit half-precision values or Brain Floats can process twice as many elements simultaneously as instructions operating on single-precision data. It is worth mentioning that some instructions operate on a single floating-point value (scalar), using only the lowest elements of the operands.
{{ :en:multiasm:cs:packed_floats.png?600 |Diagram of packed floating-point formats: 128-bit with 4 single-precision or 2 double-precision, 256-bit with 8 single-precision or 4 double-precision, 512-bit with 16 single-precision or 8 double-precision. Shows element packing for SSE/AVX operations.}} Packed Floating Point Data Types in x64 Architecture
===== Bit field data type =====

A bit field is a data type whose size is measured in bits rather than bytes. A bit field can start at any bit position within a fundamental data type and can be up to 32 bits long. MASM supports it with the RECORD data type. The bit field type is shown in figure {{ref>bitfieldtype}}.
{{ :en:multiasm:cs:bit_field_type.png?300 |Diagram of bit field data type showing variable-length field starting at arbitrary bit position within fundamental data type, can span from 1 to 32 bits in length.}} The Bit Field Data Type
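Reading or writing a bit field comes down to a shift and a mask. The following C sketch shows one possible way to do this for a field of 1 to 32 bits located anywhere in a 64-bit quadword; ''extract_field'' and ''insert_field'' are hypothetical helpers and are unrelated to the MASM RECORD syntax.

```c
#include <assert.h>
#include <stdint.h>

/* Extract `width` bits (1..32) starting at bit `start` (0 = least
   significant) from a 64-bit quadword. */
uint32_t extract_field(uint64_t value, unsigned start, unsigned width)
{
    assert(width >= 1 && width <= 32);
    assert(start + width <= 64);

    uint64_t mask = (UINT64_C(1) << width) - 1;  /* width <= 32: no overflow */
    return (uint32_t)((value >> start) & mask);
}

/* Return `value` with the same field replaced by the low bits of `field`. */
uint64_t insert_field(uint64_t value, unsigned start, unsigned width,
                      uint32_t field)
{
    assert(width >= 1 && width <= 32);
    assert(start + width <= 64);

    uint64_t mask = ((UINT64_C(1) << width) - 1) << start;
    return (value & ~mask) | (((uint64_t)field << start) & mask);
}
```

For instance, ''extract_field(0xABCD1234, 4, 8)'' returns 0x23, the eight bits that start at bit position 4.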
===== Pointers =====

Pointers store the address of a memory location that holds either data or an instruction. When segmentation is enabled, pointers can be near or far. A far pointer contains the logical address, formed from the segment and offset parts. A near pointer contains only the offset. The offset can be 16, 32 or 64 bits long. The segment selector is always stored as a 16-bit number. The possible pointer types are illustrated in figure {{ref>pointertypes}}.
{{ :en:multiasm:cs:pointers_types.png?600 |Diagram showing near and far pointer types: near pointers (16-bit, 32-bit, or 64-bit offset only), far pointers (16-bit segment selector + 16/32-bit offset). Shows memory layout and logical vs linear addressing.}} The Near and Far Pointers Types
The offset is often the result of complex addressing-mode calculations and is called the effective address.