This shows you the differences between two versions of the page.
| en:multiasm:papc:chapter_6_15 [2026/02/20 11:21] – [The use of inc and dec instructions] ktokarz | en:multiasm:papc:chapter_6_15 [2026/02/20 11:49] (current) – [Cache temporal locality] ktokarz | ||
|---|---|---|---|
| Line 2: | Line 2: | ||
| Optimisation strongly depends on the microarchitecture of the processor. Some optimisation recommendations change together with new versions of processors. Producers usually publish the most up-to-date recommendations. The last release of the Intel documentation is " | Optimisation strongly depends on the microarchitecture of the processor. Some optimisation recommendations change together with new versions of processors. Producers usually publish the most up-to-date recommendations. The last release of the Intel documentation is " | ||
| A selection of specific optimisation recommendations is described in this section. | A selection of specific optimisation recommendations is described in this section. | ||
| + | |||
| + | ===== Data placement ===== | ||
| + | It is recommended to place variables in memory at their natural boundaries. This means that the address of a 16-byte data item should be evenly divisible by 16, and the address of an 8-byte data item should be divisible by 8. | ||
| + | |||
| + | ===== Registers use ===== | ||
| + | It is recommended to use registers instead of memory for scalar data if possible. Keeping data in registers eliminates the need to load and store it in memory. | ||
| ===== The use of inc and dec instructions ===== | ===== The use of inc and dec instructions ===== | ||
| Line 8: | Line 14: | ||
| ===== Versions of logic instructions ===== | ===== Versions of logic instructions ===== | ||
| While new extensions are introduced, several new instructions appear. In addition to advanced data processing instructions, | While new extensions are introduced, several new instructions appear. In addition to advanced data processing instructions, | ||
| - | |||
| - | |||
| - | ===== Data placement ===== | ||
| - | It is recommended to place variables in memory at their natural boundaries. This means that the address of a 16-byte data item should be evenly divisible by 16, and the address of an 8-byte data item should be divisible by 8. | ||
| - | |||
| - | ===== Registers use ===== | ||
| - | It is recommended to use registers instead of memory for scalar data if possible. Keeping data in registers eliminates the need to load and store it in memory. | ||
| ===== Pause instruction ===== | ===== Pause instruction ===== | ||
| A common way to wait briefly for an event is to spin in a loop. For a short wait, this method is more efficient than calling an operating system function that waits for the event. On modern processors, the **pause** instruction should be used inside such a loop. It helps the internal mechanisms of the processor to allocate hardware resources temporarily to another logical processor. | A common way to wait briefly for an event is to spin in a loop. For a short wait, this method is more efficient than calling an operating system function that waits for the event. On modern processors, the **pause** instruction should be used inside such a loop. It helps the internal mechanisms of the processor to allocate hardware resources temporarily to another logical processor. | ||
| - | |||
| ===== Cache utilisation ===== | ===== Cache utilisation ===== | ||
| Line 33: | Line 31: | ||
| * Object-oriented programming helps to utilise cache because members of the class are grouped. | * Object-oriented programming helps to utilise cache because members of the class are grouped. | ||
| ===== Cache temporal locality ===== | ===== Cache temporal locality ===== | ||
| - | | + | Cache temporal locality is the feature |
| - | In a situation where the data processed | + | |
| ===== Cache support instructions ===== | ===== Cache support instructions ===== | ||
| Line 44: | Line 41: | ||
| Fence instructions guarantee that the load and/or store instructions before the fence are completed before the corresponding instructions after the fence. | Fence instructions guarantee that the load and/or store instructions before the fence are completed before the corresponding instructions after the fence. | ||
| - | * **spence** forces the memory–cache synchronisation after store instructions | + | * **sfence** forces the memory–cache synchronisation after store instructions | ||
| * **lfence** forces the memory–cache synchronisation after load instructions | * **lfence** forces the memory–cache synchronisation after load instructions | ||
| * **mfence** forces the memory–cache synchronisation after load and store instructions | * **mfence** forces the memory–cache synchronisation after load and store instructions | ||
| Line 51: | Line 48: | ||
| * **prefetch** a hint to the processor, | * **prefetch** a hint to the processor, | ||
| * **clflush** flushes a cache line from all levels of cache. | * **clflush** flushes a cache line from all levels of cache. | ||
| - | |||
| - | |||
| - | |||
| - | |||
| - | |||
| - | |||
| ===== Further reading ===== | ===== Further reading ===== | ||