Specific Elements of the x86 and x64 Architectures

The x86 architecture was created in the late 1970s. The technology available at that time didn't allow the implementation of advanced integrated circuits containing millions of transistors on a single silicon die. The 8086, the first processor in the x86 family, required additional support circuitry to operate. Its designers therefore had to compromise among efficiency, computational capabilities, silicon size, and cost. This is why Intel introduced certain unusual elements of the architecture. One of them is the segmentation mechanism, which extends the address space from the 64 kB typically available to 16-bit processors to 1 MB. In this chapter, we present some specific features of the x86 and x64 architectures.

Segmented addressing in real mode

The 8086 can address memory only in so-called real mode. In this mode, the address is calculated using two 16-bit elements: the segment and the offset. The 8086 implements four special registers to store the segment part of the address: CS, DS, ES, and SS. During program execution, all addresses are calculated relative to one of these registers. A typical program is divided into three segments containing its main elements. The code segment contains processor instructions and their immediate operands. Instruction addresses are relative to the CS register. The data segment, addressed relative to the DS register, contains data allocated by the program. The stack segment contains the program stack and is addressed relative to the SS register. If needed, it is possible to use an additional segment addressed relative to the ES register. String instructions use ES by default for their destination operand.

Diagram showing segmented memory in 8086 real mode with four segment registers (CS, DS, SS, ES) pointing to code, data, stack, and extra segments in 1MB address space. Shows how multiple logical segments map to physical memory regions. — Figure 1: Segments and segment registers in real mode

Although the 8086 processor has only four segment registers, a program can define many segments. The limitation is that the processor can access only four of them at the same time, as presented in figure 1. To access another segment, the program must load a new value into one of the segment registers.

An address that consists of two elements, the segment and the offset, is named a logical address. Both numbers that form a logical address are 16-bit. So, how does the processor calculate a 20-bit address using two 16-bit values? The segment part, always taken from the chosen segment register, is shifted left by 4 bit positions, which is equivalent to multiplying it by 16. The four bits on the right side are filled with zeros, forming a 20-bit value. The offset value is then added to the result of the shift. The result of this calculation is named the linear address. It is presented in figure 2. In the 8086 processor, the linear address equals the physical address, which is provided via the address bus to the computer's memory.

Diagram showing address calculation in real mode: segment register value shifted left by 4 bits (creating 20-bit base), then offset added to produce 20-bit linear/physical address. Shows example with segment and offset values combining. — Figure 2: Address calculation in real mode
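The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not a model of the processor: the function name real_mode_linear is hypothetical, and the wraparound at 1 MB reflects the 20 address lines of the 8086.

```python
def real_mode_linear(segment: int, offset: int) -> int:
    """Return the 20-bit linear address for a 16-bit segment:offset pair."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must be 16-bit values")
    # Shift the segment left by 4 bits (multiply by 16), then add the offset.
    # Only 20 address bits exist on the 8086, so the sum wraps at 1 MB.
    return ((segment << 4) + offset) & 0xFFFFF


# Segment 0x1234 shifted left gives 0x12340; adding offset 0x0010 gives 0x12350.
print(hex(real_mode_linear(0x1234, 0x0010)))
```

For example, the logical address 1234h:0010h corresponds to the linear address 12350h.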

A segment in memory is called a physical segment. The maximum size of a single physical segment is 64 kB (65536 B), and it must start at an address that is evenly divisible by 16. In the program, we define logical segments, which can be smaller than 64 kB, can overlap, or can even start at the same address.
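Because segments may overlap, two different logical addresses can refer to the same byte of memory. The sketch below illustrates this under the real-mode rule of shifting the segment left by 4 bits and adding the offset; the helper names linear and segment_range are hypothetical and exist only for this example.

```python
def linear(segment: int, offset: int) -> int:
    """Real-mode address: segment shifted left by 4 bits plus offset, 20 bits wide."""
    return ((segment << 4) + offset) & 0xFFFFF


def segment_range(segment: int) -> tuple:
    """First and last address of a full 64 kB physical segment (wraparound ignored)."""
    start = segment << 4          # always a multiple of 16
    return start, start + 0xFFFF


# Two segments that start 0x100 bytes apart overlap almost entirely.
a = linear(0x1000, 0x0100)
b = linear(0x1010, 0x0000)
print(hex(a), hex(b), a == b)
print([hex(x) for x in segment_range(0x1000)])
```

Here 1000h:0100h and 1010h:0000h are two logical addresses for the same location, 10100h.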

Segmented addressing in protected mode

With the introduction of processors capable of addressing larger memory spaces, the real addressing mode was replaced by descriptor-based addressing. We will briefly describe this mechanism, taking the 80386 as an example. The 80386 processor is a 32-bit machine built according to the IA-32 architecture. Using 32 bits, it is possible to address 4 GB of linear address space. Segmentation in 32-bit processors can be used to implement the protection mechanisms. They prevent access to segments created by other processes, ensuring that processes can operate simultaneously without interfering. Every segment is described by its descriptor. The descriptor contains important information about the segment, including the starting address, limit (the segment size), and attributes. As it is possible to define many segments at the same time, descriptors are stored in the memory in descriptor tables. IA-32 processors contain two additional segment registers, FS and GS, which can be used to access two extra data segments (figure 3), but all segment registers are still 16-bit.

Diagram showing segment registers in 32-bit protected mode with six registers (CS, DS, SS, ES, FS, GS) and their corresponding descriptor table entries containing base address, limit, and attributes for each segment's protection and memory management. — Figure 3: Segments and segment registers in protected mode

The segment register in protected mode holds the segment selector, which contains an index into the table of descriptors. The table is stored in memory and is created and managed by the operating system. Each segment register has an additional part that is hidden from direct access. To speed up operation, the processor automatically loads the descriptor from the table into this hidden part each time the segment register is modified. This is schematically presented in figure 4.

Diagram showing address calculation in protected mode: segment register holds index into descriptor table in memory, descriptor automatically loaded contains base address, which is added to offset to produce 32-bit linear address passed to paging unit. — Figure 4: Address calculation in protected mode
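The following sketch shows, in simplified form, how a selector and a descriptor could be used to form a linear address. Several things here are assumptions beyond the text above: the selector layout (index in bits 15..3, a table indicator in bit 2, a requested privilege level in bits 1..0) is taken from the IA-32 selector format, while the Descriptor class, the function names, and the byte-granular limit check are hypothetical simplifications that ignore attributes such as granularity and expand-down segments.

```python
from dataclasses import dataclass


@dataclass
class Descriptor:
    base: int    # starting linear address of the segment
    limit: int   # highest valid offset (simplified: byte granularity)


def decode_selector(selector: int) -> tuple:
    """Split a 16-bit selector into (index, table indicator, RPL)."""
    index = (selector >> 3) & 0x1FFF       # entry number in the descriptor table
    table_indicator = (selector >> 2) & 1  # 0 = global table, 1 = local table
    rpl = selector & 0x3                   # requested privilege level
    return index, table_indicator, rpl


def protected_linear(desc: Descriptor, offset: int) -> int:
    """Add the segment base to the offset after checking the limit."""
    if offset > desc.limit:
        raise ValueError("offset exceeds segment limit")
    return (desc.base + offset) & 0xFFFFFFFF


print(decode_selector(0x0023))
print(hex(protected_linear(Descriptor(base=0x00400000, limit=0xFFFF), 0x1234)))
```

With a descriptor whose base is zero and whose limit is the maximum value, protected_linear returns the offset unchanged.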

Although segmentation allows for advanced memory management and the implementation of memory protection, popular modern operating systems, including Windows, Linux-based systems, and macOS, make practically no use of it. In Windows, all segment registers, via descriptors, point to the zero base address and the maximal limit, resulting in the flat memory model. In this approach, segmentation de facto does not function, and memory protection is implemented at the paging level. The flat memory model is shown in figure 5.

Diagram of flat memory addressing showing all segment registers (CS, DS, SS, ES, FS, GS) set to zero base address with maximum limit, causing offset to equal linear address directly, effectively bypassing segmentation. — Figure 5: Flat memory addressing

Paging

Paging is a mechanism for translating linear addresses to physical addresses. Operating systems extensively use it to implement memory protection and virtual memory mechanisms. The virtual memory mechanism allows the operating system to use the entire address space with less physical memory installed in the computer. Paging is supported by the memory management unit using a table structure stored in memory. In 32-bit mode, the tables are organised in a two-level structure, with a single page directory table and a set of page tables. The pages can be 4kB or 4MB in size. The 1-bit information about the page size (named PS) is stored in the page directory entry.

4kB pages

Each entry in the page directory holds a pointer to a page table, and every page table stores pointers to the pages. The linear address is divided into three parts. The first (highest) part is a 10-bit index into the page directory table, the middle 10 bits form an index into the page table, and the last (least significant) 12 bits are the offset within the page in memory. This mechanism is shown in figure 6.

Diagram of 32-bit mode paging with 4kB pages: linear address divided into 10-bit page directory index, 10-bit page table index, and 12-bit offset. Two-level page table structure translates linear to physical address. — Figure 6: Paging in 32-bit mode with 4kB pages
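The 10/10/12 split can be illustrated with a short Python sketch. The functions split_4k and translate_4k are hypothetical, and the tables are modelled as plain nested dictionaries that map an index directly to the next level; real directory and table entries also carry flag bits, which are left out here.

```python
def split_4k(linear: int) -> tuple:
    """Split a 32-bit linear address into (directory index, table index, offset)."""
    pd_index = (linear >> 22) & 0x3FF   # highest 10 bits
    pt_index = (linear >> 12) & 0x3FF   # middle 10 bits
    offset = linear & 0xFFF             # lowest 12 bits
    return pd_index, pt_index, offset


def translate_4k(linear: int, page_directory: dict) -> int:
    """Walk a toy two-level structure; a missing key stands in for a page fault."""
    pd_index, pt_index, offset = split_4k(linear)
    page_table = page_directory[pd_index]   # first level: page directory
    page_base = page_table[pt_index]        # second level: page table
    return page_base + offset


# Toy mapping: directory entry 1 -> table whose entry 3 -> page at 0x0009A000.
toy_directory = {1: {3: 0x0009A000}}
print(split_4k(0x00403025))
print(hex(translate_4k(0x00403025, toy_directory)))
```

Two levels of 1024 entries each, combined with 4 kB pages, cover 1024 × 1024 × 4 kB = 4 GB, which is the whole 32-bit linear address space.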

4MB pages

For 4MB pages, the page table level is omitted. The page directory entry holds a direct pointer to the page in memory. In this situation, the entry in the page directory is indexed with 10 bits, and the offset within the page is 22 bits long. This is shown in figure 7.

Diagram of 32-bit mode paging with 4MB pages: linear address divided into 10-bit page directory index and 22-bit offset within 4MB page. Single-level page directory directly maps to physical pages. — Figure 7: Paging in 32-bit mode with 4MB pages
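For comparison, a sketch of the 10/22 split used with 4MB pages follows. As before, split_4m is a hypothetical helper written only to show the bit arithmetic.

```python
def split_4m(linear: int) -> tuple:
    """Split a 32-bit linear address into (directory index, offset in 4 MB page)."""
    pd_index = (linear >> 22) & 0x3FF   # highest 10 bits select the directory entry
    offset = linear & 0x3FFFFF          # remaining 22 bits address bytes in the page
    return pd_index, offset


pd_index, offset = split_4m(0x00C12345)
print(pd_index, hex(offset))
```

A 22-bit offset spans 2^22 bytes, which is exactly 4 MB.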

In 64-bit mode, the tables are organised in a four-level or five-level structure. In a 4-level structure, the highest level is a single page map level-4 (PML4) table; next are page directory pointer tables, page directory tables, and page tables. With 4kB pages, the linear address is divided into six parts: 16 unused upper bits, four 9-bit table indices, and a 12-bit offset within the page. Each table is indexed with 9 bits and can store up to 512 entries. This is shown in figure 8. The pages can be 4kB, 2MB, or 1GB in size. For 2MB pages, the page table level is omitted, while for 1GB pages, the page directory level is omitted as well.

Diagram of 64-bit mode paging with 4kB pages: linear address divided into four 9-bit indices (PML4, PDPT, PD, PT) for four-level hierarchy and 12-bit page offset. Shows hierarchical page table structure for 64-bit translation. — Figure 8: Paging in 64-bit mode with 4kB pages
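The 4-level split can be sketched the same way. The names split_4level and page_offset are hypothetical; the second function only shows how the offset widens to 21 or 30 bits when the lower table levels are omitted for 2MB and 1GB pages.

```python
INDEX_MASK = 0x1FF  # 9 bits, so each table holds up to 512 entries


def split_4level(linear: int) -> dict:
    """Split a linear address into four 9-bit indices and a 12-bit offset."""
    return {
        "pml4": (linear >> 39) & INDEX_MASK,
        "pdpt": (linear >> 30) & INDEX_MASK,
        "pd": (linear >> 21) & INDEX_MASK,
        "pt": (linear >> 12) & INDEX_MASK,
        "offset": linear & 0xFFF,
    }


def page_offset(linear: int, page_size: str = "4k") -> int:
    """Offset within the page for 4 kB, 2 MB, or 1 GB pages."""
    bits = {"4k": 12, "2m": 21, "1g": 30}[page_size]
    return linear & ((1 << bits) - 1)


# Build an address from known indices so the split is easy to follow.
addr = (1 << 39) | (2 << 30) | (3 << 21) | (4 << 12) | 0x5
print(split_4level(addr))
print(hex(page_offset(addr, "2m")), hex(page_offset(addr, "1g")))
```

Four 9-bit indices plus the 12-bit offset add up to 48 bits.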

As currently built computers have significantly less physical memory than can theoretically be addressed with a 64-bit linear address, processor manufacturers decided to limit the usable address space. That's why the 4-level paging mechanism recognises only 48 bits of the linear address. In 5-level paging, 57 bits are recognised. The remaining upper bits cannot hold arbitrary values: each of them must be a copy of the highest recognised bit (bit 47 or bit 56). As that bit plays the role of the sign bit of a 48-bit or 57-bit number, copying it into the upper bits is known as sign extension.

Sign-extended addresses having 48 or 57 recognised bits are known as canonical addresses, while others are non-canonical. It is presented in figure 9. In current machines, only canonical addresses are valid.

Diagram showing canonical addresses in 64-bit mode with 48-bit and 57-bit variants. Shows how bits 47 or 56 are sign-extended into upper bits, making certain address ranges valid (canonical) and others invalid (non-canonical) for processor access. — Figure 9: Canonical addresses in 64-bit mode
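A canonical-address check can be sketched as below. The functions is_canonical and sign_extend are hypothetical helpers; the bits parameter selects 48 recognised bits (4-level paging) or 57 (5-level paging).

```python
def is_canonical(addr: int, bits: int = 48) -> bool:
    """True if all bits above the recognised range copy the highest recognised bit."""
    top = addr >> (bits - 1)                 # highest recognised bit and all above it
    all_ones = (1 << (64 - bits + 1)) - 1    # the same field with every bit set
    return top == 0 or top == all_ones


def sign_extend(addr: int, bits: int = 48) -> int:
    """Build the canonical 64-bit address from its low recognised bits."""
    low_mask = (1 << bits) - 1
    addr &= low_mask
    if addr >> (bits - 1):                   # highest recognised bit is set
        addr |= ((1 << 64) - 1) ^ low_mask   # fill the upper bits with ones
    return addr


print(is_canonical(0x00007FFFFFFFFFFF))      # top of the lower canonical half
print(is_canonical(0x0000800000000000))      # bit 47 set, upper bits clear
print(hex(sign_extend(0x0000800000000000)))
```

Note that an address which is non-canonical with 48 recognised bits may be canonical with 57, as the second example would be.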

Addressing in x64 processors

Because operating systems made practically no use of segmentation, AMD and Intel decided to abandon it in 64-bit processors. For backwards compatibility, modern processors can still run in 32-bit mode with segmentation, but newer operating systems use 64-bit addressing, known as long mode and referred to as x64. In 64-bit mode, the only available addressing scheme is the flat memory model: the bases of the CS, DS, ES, and SS segments are treated as zero, so every segment begins at the start of the linear address space.

In 64-bit mode, some instructions can still use the FS and GS segment registers as base registers. Operating system kernels also use them.

The 64-bit mode theoretically allows the processor to address a vast amount of memory, up to 16 exabytes. Such a large memory is not expected to be installed in currently built computers, so processors limit the available address space not only at the paging level but also physically, by providing a limited number of address bus lines.
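The sizes mentioned in this chapter follow directly from the number of address bits, as the short calculation below shows. The helper address_space_bytes is hypothetical, and the units are binary (1 EiB = 2^60 B), which is the sense in which 64 bits give 16 exabytes.

```python
def address_space_bytes(bits: int) -> int:
    """Number of distinct byte addresses available with the given address width."""
    return 1 << bits


MIB, GIB, TIB, PIB, EIB = (1 << 20), (1 << 30), (1 << 40), (1 << 50), (1 << 60)

print(address_space_bytes(20) // MIB, "MiB with 20 bits (8086 real mode)")
print(address_space_bytes(32) // GIB, "GiB with 32 bits (IA-32)")
print(address_space_bytes(48) // TIB, "TiB with 48 bits (4-level paging)")
print(address_space_bytes(57) // PIB, "PiB with 57 bits (5-level paging)")
print(address_space_bytes(64) // EIB, "EiB with 64 bits")
```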
