Differences

This shows you the differences between two versions of the page.

en:multiasm:papc:chapter_6_16 [2025/11/22 14:17] – [Cache support instructions] ktokarz
en:multiasm:papc:chapter_6_16 [Unknown date] (current) – external edit (Unknown date) 127.0.0.1
Line 1: Line 1:
-====== Optimisation (DRAFT) ======
+====== Optimisation ======
 Optimisation strongly depends on the microarchitecture of the processor. Some optimisation recommendations change with new processor generations. Manufacturers usually publish the most up-to-date recommendations. The latest release of the Intel documentation is "Intel® 64 and IA-32 Architectures Optimization" ((https://www.intel.com/content/www/us/en/developer/articles/technical/intel64-and-ia32-architectures-optimization.html)). AMD published the document "Software Optimization Guide for the AMD Zen5 Microarchitecture" ((https://docs.amd.com/v/u/en-US/58455_1.00)).
 A selection of specific optimisation recommendations is described in this section.
Line 20: Line 20:
  
  
-===== Cache support instructions =====
+===== Cache utilisation =====
 In modern microarchitectures, the cache memory is essential for improving performance. In general, the processor handles cache memory in the most efficient way possible; however, it is easy to write a program that prevents the processor from utilising this mechanism effectively.
 The cache works on two main principles:
Line 32: Line 32:
   * While processing big multidimensional data arrays, keep in mind their placement in memory (row-wise or column-wise), which is specific to the programming language.
   * Object-oriented programming helps to utilise cache because members of the class are grouped.
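The point about array placement can be illustrated with a minimal C sketch. C stores two-dimensional arrays row-wise, so the first function below walks consecutive addresses, while the second jumps a whole row between accesses and is likely to use the cache less effectively on large arrays. The names ''ROWS'', ''COLS'', ''sum_row_wise'' and ''sum_column_wise'' are hypothetical and chosen only for this illustration; the actual speed difference depends on the processor and the array size.

```c
#include <stddef.h>

#define ROWS 256   /* hypothetical array dimensions, for illustration only */
#define COLS 256

/* Row-wise traversal: the inner loop visits neighbouring elements,
   which matches the row-major layout that C uses for arrays. */
long sum_row_wise(int m[ROWS][COLS])
{
    long sum = 0;
    for (size_t r = 0; r < ROWS; r++)
        for (size_t c = 0; c < COLS; c++)
            sum += m[r][c];
    return sum;
}

/* Column-wise traversal: same result, but consecutive accesses are
   COLS * sizeof(int) bytes apart, so each one may touch a new cache line. */
long sum_column_wise(int m[ROWS][COLS])
{
    long sum = 0;
    for (size_t c = 0; c < COLS; c++)
        for (size_t r = 0; r < ROWS; r++)
            sum += m[r][c];
    return sum;
}
```

Both functions return the same value; only the order of memory accesses differs. In a language with column-wise placement the preferred loop order would be reversed.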
+===== Cache temporal locality =====
+This feature helps improve performance in situations where the program uses the same variables repeatedly, e.g. in a loop.
+When the processed data exceeds half the size of the level 1 cache, it is recommended to use the non-temporal data move instructions **movntq** and **movntdq** to store data from registers to memory. These instructions are hints to the processor to bypass the cache if possible. This does not mean that the data is written to memory immediately: it can remain in the processor's internal buffers, so the latest version may not yet be visible to other units of the computer. It is the programmer's responsibility to synchronise the data using the **sfence** (Store Fence) instruction.
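The combination of a non-temporal store followed by a store fence can be sketched in C with compiler intrinsics. This is an illustrative sketch under stated assumptions, not the author's code: ''stream_copy16'' is a hypothetical helper, the destination is assumed to be 16-byte aligned, and the SSE2 path is taken only when the compiler defines ''%%__SSE2__%%''. The intrinsic ''_mm_stream_si128'' corresponds to **movntdq** and ''_mm_sfence'' to **sfence**.

```c
#include <stddef.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>   /* _mm_loadu_si128, _mm_stream_si128, _mm_sfence */
#endif

/* Hypothetical helper: copies n16 blocks of 16 bytes from src to dst.
   Assumption: dst is 16-byte aligned (required by the non-temporal store). */
void stream_copy16(void *dst, const void *src, size_t n16)
{
#if defined(__SSE2__)
    __m128i *d = (__m128i *)dst;
    const __m128i *s = (const __m128i *)src;
    for (size_t i = 0; i < n16; i++) {
        __m128i v = _mm_loadu_si128(s + i);  /* ordinary load, no alignment required */
        _mm_stream_si128(d + i, v);          /* non-temporal store (movntdq) */
    }
    _mm_sfence();  /* order the streamed stores ahead of any later stores */
#else
    memcpy(dst, src, n16 * 16);  /* portable fallback without a cache hint */
#endif
}
```

Issuing the fence once after the loop, rather than after every store, keeps the cost of synchronisation low while still ordering all streamed data before the function returns.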
  
+===== Cache support instructions =====
 There are also instructions which allow the programmer to support the processor with cache utilisation.
   * **movntq** saving the contents of the MMX register, bypassing cache
Line 52: Line 56:
  
  
-===== Cache temporal locality =====
-This feature helps improve performance in situations where the program uses the same variables repeatedly, e.g. in a loop.
-When the processed data exceeds half the size of the level 1 cache, it is recommended to use the non-temporal data move instructions **movntq** and **movntdq** to store data from registers to memory. These instructions are hints to the processor to bypass the cache if possible. This does not mean that the data is written to memory immediately: it can remain in the processor's internal buffers, so the latest version may not yet be visible to other units of the computer. It is the programmer's responsibility to synchronise the data using the **sfence** (Store Fence) instruction.
  
 ===== Further reading =====
en/multiasm/papc/chapter_6_16.1763813829.txt.gz · Last modified: by ktokarz
CC Attribution-Share Alike 4.0 International