This shows you the differences between two versions of the page.
| en:multiasm:papc:chapter_6_16 [2025/11/22 13:47] – [Cache support instructions] ktokarz | en:multiasm:papc:chapter_6_16 [Unknown date] (current) – external edit (Unknown date) 127.0.0.1 | ||
|---|---|---|---|
| Line 1: | Line 1: | ||
| - | ====== Optimisation | + | ====== Optimisation ====== |
| Optimisation strongly depends on the microarchitecture of the processor. Some optimisation recommendations change with new processor generations. Manufacturers usually publish the most up-to-date recommendations. The latest release of the Intel documentation is " | Optimisation strongly depends on the microarchitecture of the processor. Some optimisation recommendations change with new processor generations. Manufacturers usually publish the most up-to-date recommendations. The latest release of the Intel documentation is " | ||
| A selection of specific optimisation recommendations is described in this section. | A selection of specific optimisation recommendations is described in this section. | ||
| Line 20: | Line 20: | ||
| - | ===== Cache support instructions | + | ===== Cache utilisation |
| - | In modern microarchitectures, | + | In modern microarchitectures, |
| + | The cache works on two main principles: | ||
| + | * temporal locality | ||
| + | * spatial locality. | ||
| + | The term temporal locality refers to the fact that if a program recently used a certain portion of data, it is likely to need it again soon. It means that if data is used, it remains in a cache for a certain amount of time until other data is loaded into the cache. It is efficient | ||
| + | The term spatial locality refers | ||
| + | It is recommended to write the programs in any programming language, keeping these rules in mind. Some recommendations | ||
| + | * The program should do as much work as possible on one small area of code and data; after doing the job, it can move to the next part. | ||
| + | * The program should avoid frequent jumps between distant regions of memory. | ||
| + | * While processing big multidimensional data arrays, keep in mind their placement in memory (row-wise or column-wise), | ||
| + | * Object-oriented programming can help cache utilisation because the data members of an object are stored together in memory. | ||
| + | ===== Cache temporal locality ===== | ||
| + | This feature helps improve performance in situations where the program uses the same variables repeatedly, e.g. in a loop. | ||
| + | In a situation where the data processed exceeds half the size of a level 1 cache, it is recommended to use the non-temporal data move instructions **movntq** and **movntdq** to store data from registers to memory. These instructions are hints to the processor to omit the cache if possible. It doesn' | ||
| + | ===== Cache support instructions ===== | ||
| There are also instructions which allow the programmer to support the processor with cache utilisation. | There are also instructions which allow the programmer to support the processor with cache utilisation. | ||
| * **movntq** saving the contents of the MMX register, bypassing cache | * **movntq** saving the contents of the MMX register, bypassing cache | ||
| Line 43: | Line 56: | ||
| - | ===== Cache temporal locality ===== | + | |
| - | The term temporal locality refers to the fact that if data is used, it remains in a cache for a certain amount of time until other data is loaded into the cache. It is efficient to keep data in a cache instead of reloading it. This feature helps improve performance in situations where the program uses the same variables repeatedly, e.g. in a loop. | + | |
| - | In a situation where the data processed exceeds half the size of a level 1 cache, it is recommended to use the non-temporal data move instructions **movntq** and **movntdq** to store data from registers to memory. These instructions are hints to the processor to omit the cache if possible. It doesn' | + | |
| ===== Further reading ===== | ===== Further reading ===== | ||
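
The recommendation in the cache utilisation section about the placement of multidimensional arrays in memory can be illustrated with a minimal C sketch. It assumes a row-major layout, which is what C uses for two-dimensional arrays. The names `fill_matrix`, `sum_row_major` and `sum_col_major`, and the sizes `ROWS` and `COLS`, are hypothetical and chosen only for this illustration. Both functions return the same sum; they differ only in the order in which memory is visited.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical matrix size, chosen only for illustration. */
#define ROWS 512
#define COLS 512

/* Fill the matrix with a simple repeating pattern. */
void fill_matrix(int32_t m[ROWS][COLS]) {
    for (size_t r = 0; r < ROWS; r++)
        for (size_t c = 0; c < COLS; c++)
            m[r][c] = (int32_t)((r + c) & 0xFF);
}

/* Row-wise traversal: the inner loop visits consecutive addresses,
   so each loaded cache line is fully used before moving on. */
int64_t sum_row_major(int32_t m[ROWS][COLS]) {
    int64_t sum = 0;
    for (size_t r = 0; r < ROWS; r++)
        for (size_t c = 0; c < COLS; c++)
            sum += m[r][c];
    return sum;
}

/* Column-wise traversal of the same row-major array: consecutive
   accesses are COLS * sizeof(int32_t) bytes apart, so each access
   is likely to touch a different cache line. */
int64_t sum_col_major(int32_t m[ROWS][COLS]) {
    int64_t sum = 0;
    for (size_t c = 0; c < COLS; c++)
        for (size_t r = 0; r < ROWS; r++)
            sum += m[r][c];
    return sum;
}
```

On most processors the row-wise version is expected to run faster on large matrices, but the actual difference depends on the cache sizes and the hardware prefetcher, so it should be measured rather than assumed.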