This shows you the differences between two versions of the page.
| en:multiasm:papc:chapter_6_16 [2025/11/22 13:44] – [Cache temporal locality] ktokarz | en:multiasm:papc:chapter_6_16 [Unknown date] (current) – external edit (Unknown date) 127.0.0.1 | ||
|---|---|---|---|
| Line 1: | Line 1: | ||
| - | ====== Optimisation | + | ====== Optimisation ====== |
| Optimisation strongly depends on the microarchitecture of the processor. Some optimisation recommendations change together with new versions of processors. Producers usually publish the most up-to-date recommendations. The last release of the Intel documentation is " | Optimisation strongly depends on the microarchitecture of the processor. Some optimisation recommendations change together with new versions of processors. Producers usually publish the most up-to-date recommendations. The last release of the Intel documentation is " | ||
| A selection of specific optimisation recommendations is described in this section. | A selection of specific optimisation recommendations is described in this section. | ||
| Line 20: | Line 20: | ||
| - | ===== Cache support instructions | + | ===== Cache utilisation |
| - | In modern microarchitectures, | + | In modern microarchitectures, |
| + | The cache works on two main principles: | ||
| + | * temporal locality | ||
| + | * spatial locality. | ||
| + | The term temporal locality refers to the fact that if a program recently used a certain portion of data, it is likely to need it again soon. It means that if data is used, it remains in a cache for a certain amount of time until other data is loaded into the cache. It is efficient | ||
| + | The term spatial locality refers | ||
| + | Whatever the programming language, it is recommended to write programs with these rules in mind. Some recommendations | ||
| + | * The program should do as much work as possible on one small area of code and data before moving on to the next part. | ||
| + | * The program should avoid frequent jumps between distant regions of memory. | ||
| + | * While processing big multidimensional data arrays, keep in mind their placement in memory (row-wise or column-wise), | ||
| + | * Object-oriented programming can help to utilise the cache because the data members of an object are stored close together in memory. | ||
| + | ===== Cache temporal locality ===== | ||
| + | This feature helps improve performance in situations where the program uses the same variables repeatedly, e.g. in a loop. | ||
| + | When the processed data exceeds half the size of the level 1 cache, it is recommended to use the non-temporal data move instructions **movntq** and **movntdq** to store data from registers to memory. These instructions are hints to the processor to bypass the cache if possible. It doesn' | ||
| + | ===== Cache support instructions ===== | ||
| There are also instructions which allow the programmer to support the processor with cache utilisation. | There are also instructions which allow the programmer to support the processor with cache utilisation. | ||
| * **movntq** stores the contents of an MMX register to memory, bypassing the cache | * **movntq** stores the contents of an MMX register to memory, bypassing the cache | ||
| Line 35: | Line 48: | ||
| * **mfence** forces memory–cache synchronisation after load and store instructions | * **mfence** forces memory–cache synchronisation after load and store instructions | ||
| + | Some instructions are hints to the processor, indicating either that the programmer expects the data to be kept in the cache rather than only in memory, or that the data in the cache will not be needed any more. | ||
| * **prefetch** a hint to the processor, | * **prefetch** a hint to the processor, | ||
| * **clflush** flushes a cache line from all levels of the cache. | * **clflush** flushes a cache line from all levels of the cache. | ||
| Line 42: | Line 56: | ||
| - | ===== Cache temporal locality ===== | + | |
| - | The term temporal locality refers to the fact that if data is used, it remains in a cache for a certain amount of time until other data is loaded into the cache. It is efficient to keep data in a cache instead of reloading it. This feature helps improve performance in situations where the program uses the same variables repeatedly, e.g. in a loop. | + | |
| - | In a situation where the data processed exceeds half the size of a level 1 cache, it is recommended to use the non-temporal data move instructions **movntq** and **movntdq** to store data from registers to memory. These instructions are hints to the processor to omit the cache if possible. It doesn' | + | |
| ===== Further reading ===== | ===== Further reading ===== | ||
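
The recommendation in the new revision about row-wise or column-wise placement of multidimensional arrays can be illustrated with a minimal C sketch. It is not taken from the page itself: the function names `sum_row_wise` and `sum_column_wise` are hypothetical, and the matrix is assumed to be stored row-wise, as C stores it. Both functions return the same sum; only the order of memory accesses differs.

```c
#include <assert.h>
#include <stddef.h>

/* Sum a rows x cols matrix stored row-wise (the C layout), visiting the
   elements in storage order: consecutive accesses touch consecutive
   addresses, which matches the spatial locality principle. */
long sum_row_wise(const int *m, size_t rows, size_t cols)
{
    assert(m != NULL);
    long sum = 0;
    for (size_t r = 0; r < rows; r++)
        for (size_t c = 0; c < cols; c++)
            sum += m[r * cols + c];
    return sum;
}

/* Same result, but the inner loop advances by cols elements at a time,
   so on a large matrix successive accesses may fall in different
   cache lines. */
long sum_column_wise(const int *m, size_t rows, size_t cols)
{
    assert(m != NULL);
    long sum = 0;
    for (size_t c = 0; c < cols; c++)
        for (size_t r = 0; r < rows; r++)
            sum += m[r * cols + c];
    return sum;
}
```

On a matrix much larger than the cache, the row-wise version would typically be expected to run faster, although the actual difference depends on the processor, the cache sizes and the compiler's optimisations.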